Resource Requests and Limits
Resource Requests and Limits are a feature of Kubernetes Pods. They are used to help Kubernetes decide where pods can and should be deployed. There are two resources supported by Kubernetes, CPU and memory.
Requests: Reserved amount of a resource for a pod.
For example, if you
request1 CPU core then your pod will be allocated to a Kubernetes Node where there is at least 1 CPU core available.
Limits: The maximum amount of a resource a Pod will be allowed to use.
For example, if you
limityour Pod’s memory to 1GB and then try to use more than 1GB, Kubernetes will tell your Pod that it is out of memory.
Why are they important?
To ensure your pods have enough resources to do their job
To ensure other pods do not affect your pod by using up resources you need
To ensure our Kubernetes cluster has enough Nodes to support our services
To ensure our Kubernetes cluster doesn’t have too many nodes, which would waste our budget
Tips and Tricks
Measuring CPU and Memory Usage
Try to measure how much CPU and Memory your service will use in production. During a load test is a good time to measure CPU and memory usage!
Don’t set very low limits or requests before the first deployment to production, it is better to lower the limits and requests after you know how much you really need.
If you are running your service in a Docker container locally, you can see current resource usage with: docker stats.
If you are running your service in Kubernetes (e.g. in Dev), you can see current resource usage with: kubectl top pod -n ${NAMESPACE}.
In addition to CPU usage, you should also monitor CPU throttling (docker.cpu.throttled on Datadog): under normal production load this metric should be almost always 0. If it’s frequently non-0 it means your CPU limits are too low.
Recommendations for initial limits and requests
The following are recommended initial limits/requests for a stateless microservice.
You can use this until you have performed load tests to understand the actual resources needed by your service. Note that if your service keeps significant amount of data cached in memory, or has to handle a high number of concurrent requests, the following numbers may not be appropriate.
requests:
cpu: 500m # 1/3x as limits.cpu; should be low enough to keep CPU usage equal to
# the HPA target multipled by requests.cpu
memory: 256Mi # should be the same as limits.memory, and high enough to avoid
# OOM during normal operations and in case of dependency failures
limits:
cpu: 1500m # 3x as requests.cpu; should be high enough to keep `docker.cpu.throttled`
# down to 0 during normal operations
memory: 256Mi # should be the same as requests.memory
After doing a representative load test of production workloads, and especially after releasing in production, you should review your resources requests and limits and fix them as needed. Refer to the next section for general recommendations regarding how to choose limits and requests.
Recommendations for limits and requests
NOTE: These recommendations may not apply to all services or workloads. Do not blindly apply them without load tests!
In general, it is advisable to have many small pods instead of very few big pods, and in any case you should have at least 3 pods for availability.
It is normally not advised to run too many pods, as running too many pods can cause resource exhaustion or overload in your dependencies (e.g. too many open connections on your database or caching server), make debugging/troubleshooting more difficult, and slow down the deployment process.
In general, you should allocate just enough resource headroom to ensure that your pods can work correctly under nominal conditions and likely exceptional circumstances. Avoid needlessly reserving too many resources.
Document the rationale for choosing specific values for the requests and limits.
Memory limits should be equal to memory request: this makes it unlikely for kubernetes to kill your pod due to the memory consumption of other pods.
The overall CPU utilization (
docker.cpu.usage) of your deployment should ideally always be the one set for the HPA (e.g. if your CPU request is 500m and your HPA target is 65%, your CPU usage in each pod should always be 65% of 500m, i.e. ~325m[^metric]). Considering that the recommendation is to set HPA min replicas to 3, you may need to lower your CPU requests.Understand the differences between RSS and the size of the Go heap.
If your service performs reads/writes on the filesystem during regular operations you should raise your memory request/limits to ensure there is enough memory for the page cache.
Periodically review the historical resource usage of your services and perform corrective actions to ensure your service is not consuming too many resources.
Golang services
The golang GC can interact negatively with the CFS sched and cause high tail latencies. If tail latency is a concern:
GOMAXPROCS should be set to
floor(limit.cpu * 2 / 3)(or you should use something likegithub.com/uber-go/automaxprocsto do it automatically)CPU limit should be at 2x~4x of CPU request
CPU limit should not be less than
1000m, and ideally at least1500m, sinceGOMAXPROCScan not be less than 1.