In this edition, we feature Amazon EKS support for Kubernetes version 1.35 and several EKS-related re:Invent announcements, including EKS Capabilities, Provisioned Control Plane, AWS Backup support, and enhanced network observability.
New AWS services and features
Amazon EKS and Amazon EKS Distro now support Kubernetes version 1.35
- Amazon EKS now supports Kubernetes version 1.35 in all the AWS Regions where EKS is available, including the AWS GovCloud (US) Regions.
- Kubernetes Version 1.35 brings significant improvements such as In-Place Pod Resource Updates for CPU and memory adjustments without pod restarts, PreferSameNode Traffic Distribution to reduce latency, Node Topology Labels via Downward API, and Image Volumes for delivering AI models using OCI container images.
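As a rough illustration of In-Place Pod Resource Updates, the sketch below builds the JSON body for a resize request. The container name and sizes are hypothetical, and setting requests equal to limits is only a simplification for the example, not a requirement of the feature.

```python
import json


def build_resize_patch(container: str, cpu: str, memory: str) -> str:
    """Return a JSON patch body that sets new CPU and memory values
    for a single container. Requests and limits are set to the same
    values here purely to keep the sketch short."""
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    patch = {"spec": {"containers": [{"name": container, "resources": resources}]}}
    return json.dumps(patch)


# Hypothetical container name and sizes, for illustration only.
print(build_resize_patch("app", "800m", "512Mi"))
```

Upstream Kubernetes documentation describes applying a body like this through the pod's resize subresource, for example with `kubectl patch pod <pod-name> --subresource resize --patch '<body>'`. Check the exact behavior against your cluster and kubectl versions before relying on it.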
Amazon EKS introduces enhanced network security policies
- Amazon EKS announced enhanced network security policies for clusters running Kubernetes 1.29 or later.
- The new capabilities include ClusterNetworkPolicy for centrally enforcing network access filters across entire clusters, and DNS-based egress policies to prevent unauthorized access to external resources using FQDNs.
- ClusterNetworkPolicy is available in all EKS cluster launch modes with VPC CNI v1.21.1 or later, whereas DNS-based policies are supported only on EC2 instances launched by EKS Auto Mode.
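To make the idea of FQDN-based egress filtering concrete, here is a conceptual sketch of an allow-list check on hostnames. It is not the EKS implementation or its policy syntax; the function name, the patterns, and the wildcard semantics are all assumptions chosen for illustration.

```python
from fnmatch import fnmatchcase


def egress_allowed(hostname: str, allowed_patterns: list[str]) -> bool:
    """Return True if hostname matches any allowed FQDN pattern.

    Matching is case-insensitive and ignores a trailing dot.
    In this sketch "*" matches any run of characters, including dots;
    a real policy engine may define wildcards differently.
    """
    name = hostname.rstrip(".").lower()
    return any(
        fnmatchcase(name, pattern.rstrip(".").lower())
        for pattern in allowed_patterns
    )


# Hypothetical allow list for illustration only.
ALLOWED = ["*.example.com", "s3.amazonaws.com"]
print(egress_allowed("api.example.com", ALLOWED))
print(egress_allowed("example.org", ALLOWED))
```

The point of matching on names rather than addresses is that the rule stays valid when the external service's IP addresses change.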
Announcing Amazon EKS Capabilities
- Amazon EKS Capabilities is now generally available in all AWS Regions except the AWS GovCloud (US) and China Regions.
- With this release, EKS provides a fully managed, extensible set of Kubernetes-native platform features that includes continuous deployment with Argo CD, AWS resource management through AWS Controllers for Kubernetes (ACK), and dynamic resource orchestration using Kube Resource Orchestrator (KRO), with AWS handling automatic scaling, patching, and upgrades.
Amazon EKS introduces Provisioned Control Plane
- With this release, Amazon EKS introduced a new feature that gives you the ability to select your cluster’s control plane capacity from a set of well-defined scaling tiers to ensure predictable, high performance for the most demanding workloads.
- Besides handling traffic spikes or unpredictable bursts, this also supports ultra-scale scenarios requiring thousands of worker nodes for AI training/inference and high-performance computing.
Amazon EKS and Amazon ECS announce fully managed MCP servers in preview
- Amazon EKS and Amazon ECS announced a preview of fully managed MCP servers that enable AI-powered development and operations experiences.
- These AWS-hosted servers provide standardized interfaces with automatic updates, AWS IAM integration, and CloudTrail audit logging, and they eliminate the need for local installation.
Amazon EKS introduces enhanced container network observability
- Amazon EKS introduced enhanced container network observability powered by Amazon CloudWatch Network Flow Monitor.
- The new capabilities provide granular network metrics for anomaly detection, network monitoring visualizations in the AWS console, and the ability to identify top talkers and the flows causing retransmissions and timeouts.
AWS Backup now supports Amazon EKS
- AWS Backup now supports Amazon EKS with fully managed, centralized backup for cluster state and persistent application data.
- This agent-free solution provides automated scheduling, retention management, immutable vaults, and cross-Region and cross-account copies, and it can restore entire clusters, specific namespaces, or individual persistent volumes.
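As a sketch of how scheduling, retention, and cross-Region or cross-account copies fit together, the function below assembles a backup plan document in the general shape that the AWS Backup `CreateBackupPlan` API accepts. The plan name, vault name, and schedule are hypothetical, and how EKS clusters are assigned to the plan is not shown here.

```python
from typing import Optional


def build_backup_plan(plan_name: str, vault_name: str, retention_days: int,
                      copy_vault_arn: Optional[str] = None) -> dict:
    """Build a backup plan with one daily rule.

    retention_days controls when recovery points are deleted.
    copy_vault_arn, if given, adds a copy to another vault
    (for example in a different Region or account).
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    rule = {
        "RuleName": f"{plan_name}-daily",
        "TargetBackupVaultName": vault_name,
        "ScheduleExpression": "cron(0 5 ? * * *)",  # daily at 05:00 UTC
        "Lifecycle": {"DeleteAfterDays": retention_days},
    }
    if copy_vault_arn:
        rule["CopyActions"] = [{"DestinationBackupVaultArn": copy_vault_arn}]
    return {"BackupPlanName": plan_name, "Rules": [rule]}


# Hypothetical names, for illustration only.
plan = build_backup_plan("eks-prod", "eks-vault", retention_days=35)
print(plan["Rules"][0]["ScheduleExpression"])
```

A dictionary like this could be passed as the `BackupPlan` argument of the boto3 `backup` client's `create_backup_plan` call. Treat it as a starting point and confirm the fields against the AWS Backup documentation.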
AWS blogs
[Blog] Implementing assurance pipeline for Amazon EKS Platform
- Organizations struggle to validate that EKS clusters are production-ready, facing infrastructure validation gaps, siloed testing approaches, limited policy enforcement testing, complex non-functional testing, difficulty assessing resilience, and time-consuming manual validation processes.
- This blog article serves as a comprehensive guide for platform engineering teams to build an assurance pipeline using six validation frameworks: Terraform test for early infrastructure validation, Pytest BDD for behavioral testing, Helm testing for package validation, Chainsaw for policy compliance, Locust for performance assessment, and AWS Resilience Hub with AWS Fault Injection Service (AWS FIS) for resilience testing.
[Blog] Efficient image and model caching strategies for AI/ML and generative AI workloads on Amazon EKS
- Organizations face performance bottlenecks and cost inefficiencies in AI/ML workloads due to slow container image pull times and storage systems whose inadequate performance leaves expensive GPU resources underutilized.
- This blog article serves as a practical guide to implement container image caching using Bottlerocket data volumes (up to 100% reduction in startup times) or secondary EBS volumes, and leverage appropriate storage solutions including Amazon S3 for cost-effective scalability, S3 Express One Zone for 10x faster performance, FSx for Lustre for high-performance file systems with GPU optimization, and optimized data loading with S3 Connector for PyTorch.
[Blog] Enhance Amazon EKS network security posture with DNS and admin network policies
- Organizations need sophisticated network security that simplifies operations at scale, facing challenges with managing external endpoint access with constantly changing IP addresses, lack of centralized policy management across namespaces, operational complexity in maintaining IP-based filtering, and difficulty enforcing consistent security standards across multi-tenant environments.
- This blog article serves as an implementation guide for DNS-based network policies, which use FQDNs for stable access control to external services, and for admin network policies (ClusterNetworkPolicy), which provide centralized cluster-wide enforcement with hierarchical, tiered management. It also shows how to combine both approaches for defense-in-depth security.
[Blog] Deep dive: Streamlining GitOps with Amazon EKS capability for Argo CD
- Running Argo CD in production requires managing high availability, upgrades, SSO configuration, and cross-cluster connectivity. The operational overhead grows across Regions and AWS accounts, where the setup involves VPC peering, IAM role chaining, ECR token refresh, and OIDC configurations.
- This blog article serves as a deep dive into the fully managed Amazon EKS Capability for Argo CD, which uses a hub-and-spoke architecture for centralized multi-cluster management. It features native integrations including AWS IAM Identity Center for SSO, EKS Access Entries for cross-account authentication, automatic ECR authentication, AWS Secrets Manager for credential management, and AWS CodeConnections for private Git repositories.
[Blog] Amazon EKS introduces Provisioned Control Plane
- Organizations running AI training/inference at ultra scale, multi-tenant SaaS platforms, or mission-critical web applications need absolute predictability and guaranteed control plane responsiveness before peak demand arrives.
- This blog article serves as a technical deep dive into EKS Provisioned Control Plane, which lets you pre-allocate control plane capacity from scaling tiers (XL, 2XL, 4XL) with well-defined performance metrics for API request concurrency, pod scheduling rate, and cluster database size. The feature is built on architectural innovations, including enhanced storage, API optimization, improved controllers, and a new-generation etcd architecture, that enable clusters to handle up to 40,000 nodes and 640,000 pods.
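One way to think about choosing a tier is to pick the smallest one whose published limits cover every requirement of the workload. The sketch below shows that selection logic only. The metric names and the capacity values are placeholders in arbitrary units, not AWS figures; the real per-tier numbers should come from the EKS documentation.

```python
from typing import Dict, List, Optional, Tuple

Capacity = Dict[str, float]


def pick_tier(required: Capacity,
              tiers: List[Tuple[str, Capacity]]) -> Optional[str]:
    """Return the first tier that meets every requirement.

    tiers must be ordered from smallest to largest.
    Returns None when no tier is large enough.
    """
    for name, capacity in tiers:
        if all(capacity.get(metric, 0) >= need
               for metric, need in required.items()):
            return name
    return None


# Placeholder capacities in arbitrary units, NOT published AWS limits.
PLACEHOLDER_TIERS = [
    ("XL", {"api_concurrency": 1, "pods_per_second": 1, "db_size": 1}),
    ("2XL", {"api_concurrency": 2, "pods_per_second": 2, "db_size": 2}),
    ("4XL", {"api_concurrency": 4, "pods_per_second": 4, "db_size": 4}),
]

print(pick_tier({"api_concurrency": 1.5, "db_size": 1}, PLACEHOLDER_TIERS))
```

Because a tier is chosen only if it satisfies all metrics at once, a single demanding dimension, such as database size, can push the choice to a larger tier even when the others are modest.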
[Blog] Data-driven Amazon EKS cost optimization: A practical guide to workload analysis
- Organizations waste cloud resources through overprovisioning with three critical patterns: greedy workloads with oversized pod resource requests, pet workloads with excessive replica counts due to overly strict configurations, and isolated workloads with fragmented node pools creating stranded capacity.
- This blog article serves as a practical guide to apply data-driven rightsizing using tools like Kubecost, Goldilocks, and Vertical Pod Autoscaler to match resource requests to actual usage, optimize topology spread constraints, consolidate NodePools to enable resource sharing, and configure appropriate PDBs to enable node consolidation while maintaining performance and reliability.
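The core of rightsizing, matching requests to observed usage, can be sketched as a percentile of usage samples plus some headroom. This is a simplified illustration and not how Kubecost, Goldilocks, or Vertical Pod Autoscaler compute their recommendations; the percentile, headroom, and sample values are assumptions.

```python
import math
from typing import Sequence


def recommend_request(usage_millicores: Sequence[int],
                      percentile: int = 95,
                      headroom_pct: int = 15) -> int:
    """Recommend a CPU request in millicores.

    Takes the nearest-rank percentile of the observed usage and
    adds headroom_pct percent on top, rounding up.
    """
    if not usage_millicores:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(usage_millicores)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return math.ceil(ordered[rank - 1] * (100 + headroom_pct) / 100)


# Hypothetical usage samples for a pod that currently requests 1000m.
samples = [120, 150, 180, 200, 900]
print(recommend_request(samples, percentile=80))
```

Choosing a lower percentile ignores rare spikes and yields a tighter request, while a higher percentile protects against throttling at the cost of more reserved capacity.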
Community news and articles
Ingress NGINX: Statement from the Kubernetes Steering and Security Response Committees
- Ingress NGINX will be retired in March 2026 due to insufficient maintainers and accumulated technical debt that creates security vulnerabilities.
- This article underscores the importance of migrating to alternatives, such as Gateway API or one of the many third-party ingress controllers, as soon as possible.
Experimenting with Gateway API using kind
- This article serves as a walkthrough for setting up a local experimental environment with Gateway API on kind.
Videos and webinars
Open source projects
- Kthena
- Kthena, a new sub-project of Volcano, is a Kubernetes-native AI serving platform for scalable model serving.
- The platform addresses the complexity of deploying and managing LLM workloads at scale by offering prefill-decode disaggregation for optimized hardware utilization, cost-driven autoscaling with budget constraints, intelligent routing with model load-aware and KV-cache aware strategies, and support for multiple inference engines (vLLM, SGLang, Triton) through familiar Kubernetes-native APIs.
Get Hands-on with Amazon EKS
- Want to dive deeper into the topics covered in this newsletter? Join our Get hands-on with EKS event series by registering to take part in a workshop.