GKE Autopilot is here, but not for everyone, for now...

Tomáš Papež
4 min read · Mar 1, 2021

GKE Autopilot is GA! That means less stress for me as an SRE taking care of the cluster, a better SLA for applications and clients, and so on. But it has its dark side as well, and I will describe some of it here.

What is GKE Autopilot?

Autopilot is a new mode of operation for creating and managing Kubernetes clusters in Google Kubernetes Engine (GKE). In this mode, GKE configures and manages the underlying infrastructure, including nodes and node pools, enabling users to focus only on the target workloads and pay per Pod resource request (CPU, memory, and ephemeral storage). In addition to GKE’s SLA on hosts and the control plane, Autopilot also includes an SLA on Pods.

From a developer’s perspective, nothing really changes here, but this new mode does free up teams to focus on the actual workloads and less on managing Kubernetes clusters. With Autopilot, users still get the benefits of Kubernetes, but without all of the routine management and maintenance related to the cluster and resources.

First bad news for us

Allowed resource ranges: Pod vCPU is available in increments of 0.25 vCPU. In addition to the minimum values, the CPU:memory ratio must be in the range of 1:1 to 1:6.5; resources outside the allowed ratio range will be scaled up. Why is this bad for us? Our smallest services request just 50 mCPU and 64 MiB of memory, and they usually burst to limits around 200 mCPU and 128 MiB. That means a lot of services will be over-provisioned just because of this limitation. Another case is the 10 GiB ephemeral storage limit: one client shares 150 GiB zip files with us that we usually download to disk, unzip, and process. We can work around that, but the point is that it is not just lift & shift from GKE Standard to Autopilot.
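To see how this bites, here is a small sketch of the rounding rules as I read them from the docs (0.25 vCPU increments, 1:1 to 1:6.5 vCPU:GiB ratio). The function and its exact behavior are my approximation, not Autopilot's actual admission logic:

```python
import math

# Approximation of Autopilot's resource adjustment at GA -- illustrative only.
CPU_STEP = 0.25          # Pod vCPU comes in 0.25 increments
MIN_RATIO, MAX_RATIO = 1.0, 6.5  # allowed CPU:memory (vCPU:GiB) ratio window

def autopilot_adjust(cpu_vcpu: float, mem_gib: float) -> tuple:
    """Return the (cpu, memory) a request would be scaled up to."""
    # round CPU up to the next 0.25 vCPU increment
    cpu = math.ceil(cpu_vcpu / CPU_STEP) * CPU_STEP
    # if memory exceeds the max ratio, CPU is scaled up (never memory down)
    if mem_gib > cpu * MAX_RATIO:
        cpu = math.ceil(mem_gib / MAX_RATIO / CPU_STEP) * CPU_STEP
    # if memory is below the min ratio, memory is scaled up
    mem = max(mem_gib, cpu * MIN_RATIO)
    return cpu, mem

# our smallest service: 50 mCPU / 64 MiB
print(autopilot_adjust(0.05, 0.0625))  # -> (0.25, 0.25): 5x CPU, 4x memory
```

So a 50 mCPU / 64 MiB service is billed as at least 250 mCPU / 256 MiB, which is where the over-provisioning comes from.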

No privileged Pods and no customizations on hosts

That's a real blocker for us right now. We need to run an application that uses an SMB mount, and to do so we either have to modify the hosts or run the Pod in privileged mode. Again, we can work around it by migrating this custom application to a VM or keeping GKE Standard, but neither option looks good to me, as I want to minimize operations.
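Concretely, this is the kind of manifest Autopilot rejects at admission (names and image are hypothetical, only the `securityContext` matters):

```yaml
# Hypothetical workload that needs host-level access for an SMB mount.
apiVersion: v1
kind: Pod
metadata:
  name: smb-client              # hypothetical name
spec:
  containers:
    - name: app
      image: example/smb-client:latest   # hypothetical image
      securityContext:
        privileged: true        # allowed on GKE Standard, blocked by Autopilot
```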

No preemptible nodes

For now, Autopilot does not support preemptible nodes, which would mean a huge price increase for all our non-production workloads, since those run on preemptible node pools to save costs. I do believe this feature will be added soon.
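For context, on GKE Standard we can pin cheap workloads to preemptible capacity with a node selector on the well-known node label; Autopilot has no equivalent knob today because you don't manage node pools at all. A fragment of a hypothetical Deployment's Pod template:

```yaml
# Pod template fragment (GKE Standard only): schedule onto preemptible nodes.
spec:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"   # label set by GKE on preemptible nodes
```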

Limits vs VPA/HPA

On our non-production clusters, we leverage Kubernetes requests and limits. We use low requests, mostly 100 mCPU and 128 MiB of memory, and where needed the app has higher limits so it can peak up to 500 mCPU and 256 MiB to process whatever comes in. There is no need to scale up; it just uses the free resources on the node. This way we run tens of containers on a single node, and even then the node is underutilized, because those applications only process requests while QA or developers are using them. For this use case, VPA or HPA is just too slow, and over-provisioning resources doesn't make sense either. Maybe I will reconsider once Autopilot supports preemptible nodes, trading a somewhat higher price for less ops work.
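The pattern above looks like this in a container spec. On Standard, the container can burst above its request into idle node capacity; on Autopilot, billing follows the request, so this cheap-request/high-limit trick loses much of its value:

```yaml
# Container resources fragment: low steady-state request, generous burst limit.
resources:
  requests:
    cpu: 100m        # what the scheduler reserves (and what Autopilot bills)
    memory: 128Mi
  limits:
    cpu: 500m        # burst ceiling, served from idle node capacity on Standard
    memory: 256Mi
```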

Security

Overall, the level of security with Autopilot should be better than if you managed your own GKE cluster, but we're missing CMEK (customer-managed encryption keys) support, which we need because of regulations in the healthcare and fintech sectors. I want to dig deeper into this topic later and compare it with Cloud Functions and Cloud Run.

Optimize to save $$$

As Autopilot is "pay as you go", the best thing you can do is set up proper monitoring, whether it's the Datadog integration or Google Cloud Monitoring, and look for underutilized Pods.

Leverage the HPA and VPA as much as possible to lower the base running costs!
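A minimal HPA looks like this (the Deployment name is hypothetical); on Autopilot every replica it removes directly cuts the bill, since you pay per Pod request:

```yaml
# Scale the hypothetical "api" Deployment between 1 and 5 replicas on CPU load.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                          # hypothetical target Deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70   # add Pods above 70% average CPU
```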

Autopilot vs Cloud Run

Google Cloud, with Cloud Run, already lets users run containerized applications in a managed, serverless fashion, offering an alternative to AWS Fargate or Azure Container Instances. Cloud Run still has its place in the universe: unlike GKE Autopilot, you don't pay for a control plane when the application is scaled to 0, and its simplicity and its integration with GCP's event-driven ecosystem are, for now, much better than on GKE Autopilot.

Single API everywhere!

The biggest advantage of Autopilot is that it lets users keep Kubernetes as a container orchestrator, with its standardized API for deploying applications, while getting a managed platform with a pay-per-use model.
With Autopilot, you don't need to worry about any extra configuration, and you get concrete automation for auto-scaling, auto-upgrades, and other day-2 operations.

I love seeing the progress and the direction K8s is heading, and I'm with it: from the first self-managed clusters on bare metal, where it was easier to re-create the cluster than to upgrade it, to Google Cloud and GKE, where after some tuning it just worked, to a fully managed platform. And I'm still using the same API and YAML files I used years ago, with just a few changes and the same concepts.

I will be closely watching new releases and support for more workloads, like Istio, and will run some long-term tests to determine the financial benefits and downsides.
Wait for it…
