
Autoscaling in Kubernetes: Understanding Vertical Pod Autoscaler (VPA)
Historically, companies scaled their servers and allocated resources according to predictable usage patterns that followed fixed daily, weekly, or yearly cycles. For example, administrators would scale up an office’s internal systems during business hours, and a retailer might add server capacity ahead of holiday sales spikes. Today, a viral TikTok post or a breaking news story can change demand immediately and without warning, so usage is far less predictable and manual scaling can no longer keep up.
To address this and avoid the poor performance or wasted spend caused by under- or over-provisioning, cloud service providers and Kubernetes introduced features that automatically scale resources up or down based on real-world metrics such as load and traffic.
Kubernetes offers three main autoscalers:
- Vertical Pod Autoscaler (VPA)
- Horizontal Pod Autoscaler (HPA)
- Cluster Autoscaler (CA)
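Each autoscaler targets a different layer of the cluster and a different need: VPA adjusts the CPU and memory requests of individual pods, HPA changes the number of pod replicas, and the Cluster Autoscaler adds or removes nodes. In this series of articles, I will explore each of them in more depth, starting with the Vertical Pod Autoscaler.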