DevOps Information

Namespace Resources

There are three main types of namespace resources that have to be monitored:

CPU

Each pod has a request and a limit for the amount of CPU (computing power) that it needs. The request roughly represents the amount the pod normally uses, and the limit is the most the pod can spike to during high load. CPU is measured in m (millicores). If no unit is shown, the value is in cores (1 core = 1000 millicores). The CPU settings are in the YAML for DeploymentConfigs and StatefulSets.
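As a rough illustration, the CPU settings usually sit under the container's resources block in the pod template. The container name and the values below are hypothetical and are not the actual CHEFS settings:

```yaml
# Hypothetical fragment of a DeploymentConfig or StatefulSet pod template.
spec:
  template:
    spec:
      containers:
        - name: app            # hypothetical container name
          resources:
            requests:
              cpu: 50m         # normal usage: 50 millicores (0.05 cores)
            limits:
              cpu: 250m        # ceiling during spikes: 250 millicores (0.25 cores)
```

A value written without the m suffix, such as cpu: 1, would mean one whole core.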

If the application is sluggish, check the CPU metrics for the pods. You can view the metrics through the OCP console:

Note:

the orange horizontal line in the metrics is the CPU request
the blue horizontal line at the top of the metrics is the CPU limit
the data in the graphs is downsampled to an average and may hide short spikes
you can drill down into the metrics by clicking the graph

If any of the pods have their CPU pegged, it is probably the reason for the problem. Adding CPU, though, won’t necessarily fix the problem so do look for the underlying cause. For example, if the database is at 100% CPU then perhaps it needs some indexes added for long-running queries.

If you do need to adjust the CPU, note that changing the values in the OCP console will only change the values until the next deployment of CHEFS. It’s a good way to try something out without too much effort. To make the change permanent, though, you will need to update the values in the /openshift files in the repo.

Memory

Each pod has a request and a limit for the amount of memory that it needs. The request roughly represents the amount the pod normally uses, and the limit is the most the pod can spike to during high load. Memory is typically measured in Mi (mebibytes) or Gi (gibibytes). The memory settings are in the YAML for DeploymentConfigs and StatefulSets.
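As a rough illustration, the memory settings usually sit under the container's resources block in the pod template. The container name and the values below are hypothetical and are not the actual CHEFS settings:

```yaml
# Hypothetical fragment of a DeploymentConfig or StatefulSet pod template.
spec:
  template:
    spec:
      containers:
        - name: app            # hypothetical container name
          resources:
            requests:
              memory: 256Mi    # normal usage: 256 mebibytes
            limits:
              memory: 1Gi      # ceiling during spikes: 1 gibibyte (1024 Mi)
```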

If the application is sluggish, check the memory metrics for the pods. You can view the metrics through the OCP console:

Note:

the orange horizontal line in the metrics is the memory request
the blue horizontal line at the top of the metrics is the memory limit
the data in the graphs is downsampled to an average and may hide short spikes
you can drill down into the metrics by clicking the graph

If any of the pods have their memory pegged, it is probably the reason for the problem. Adding memory, though, won’t necessarily fix the problem so do look for the underlying cause. For example, if the database is at 100% memory then perhaps it needs some indexes added for long-running queries.

If you do need to adjust the memory, note that changing the values in the OCP console will only change the values until the next deployment of CHEFS. It’s a good way to try something out without too much effort. To make the change permanent, though, you will need to update the values in the /openshift files in the repo.

Storage

Most storage in the pods is ephemeral and disappears when the pod is deleted. However, some pods have persistent storage that survives pod restarts, which is needed for things like database data. When storage fills to 100% it makes things much harder to recover, so it is best to expand storage long before it hits capacity.
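Persistent storage is requested through a PersistentVolumeClaim (PVC), and its size is set in the claim's YAML. The sketch below uses a hypothetical name and size, not the actual CHEFS values:

```yaml
# Hypothetical PersistentVolumeClaim; name and size are illustrative only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data           # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi             # requested capacity; raise this value to expand
```

Raising the storage value only works if the storage class behind the claim allows volume expansion, so check that before relying on it.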

You can view the capacity of storage in the OCP console:

Note that the top two storage items don’t show the “Used” amount. This is because those PersistentVolumeClaims (PVCs) are not currently mounted to a pod - they are used for cron jobs that only run for a few minutes per day. However, they are monitored and will produce an alert in the #chefs-sysdig channel on Rocket.Chat if they reach 90% of capacity.