EKS News 077

In this edition, we feature Amazon EKS support for Kubernetes version 1.34 and enhancements related to Amazon SageMaker HyperPod and EKS Auto Mode.

New AWS services and features

Amazon EKS and Amazon EKS Distro now support Kubernetes version 1.34
- Amazon EKS now supports Kubernetes version 1.34 in all the AWS Regions where EKS is available, including the AWS GovCloud (US) Regions.
- Kubernetes version 1.34 brings significant improvements in resource management: the Dynamic Resource Allocation (DRA) core APIs reach GA for efficient management of GPUs and other specialized hardware, and pod-level resource requests and limits (Beta) enable shared resource pools for multi-container applications.
Amazon EKS Auto Mode now available in AWS GovCloud (US-East) and (US-West)
- Amazon EKS Auto Mode is now available in the AWS GovCloud (US-East) and (US-West) Regions.
- With this release, EKS Auto Mode now supports FIPS-validated cryptographic modules through its Amazon Machine Images (AMIs) to help customers meet FedRAMP compliance requirements.
Amazon SageMaker HyperPod now supports autoscaling using Karpenter
- Amazon SageMaker HyperPod now supports managed node autoscaling using Karpenter.
- With this release, SageMaker HyperPod provides inference workloads with automatic capacity scaling to handle production traffic bursts, cost reduction through intelligent node consolidation during idle periods, and seamless integration with event-driven pod autoscalers like KEDA.
Amazon EKS introduces a new catalog of community add-ons in the AWS GovCloud (US) Regions
- With this release, Amazon EKS now supports a catalog of community add-ons that includes metrics-server, kube-state-metrics, cert-manager, prometheus-node-exporter, fluent-bit, and external-dns in all AWS GovCloud (US) Regions.
- Each add-on has been packaged, scanned, and validated for compatibility by EKS, with container images securely hosted in an EKS-owned private Amazon Elastic Container Registry (ECR) repository.

AWS blogs

[Blog] Unlocking next-generation AI performance with Dynamic Resource Allocation on Amazon EKS and Amazon EC2 P6e-GB200
- Kubernetes Dynamic Resource Allocation (DRA) enables topology-aware scheduling and cross-node GPU communication through ComputeDomain and ResourceClaims, automatically managing IMEX primitives, NVLink partition management, and hardware initialization for distributed AI workloads.
- Amazon EC2 P6e-GB200 UltraServers feature NVIDIA GB200 Grace Blackwell Superchips that integrate two NVIDIA Blackwell GPUs with an NVIDIA Grace CPU over an NVLink Chip-to-Chip connection providing 900 GB/s of bidirectional bandwidth.
- This blog article serves as guidance to deploy distributed AI training workloads on Amazon EKS using EC2 P6e-GB200 UltraServers, demonstrating how DRA and IMEX channels create memory-coherent GPU clusters that span multiple nodes for training trillion-parameter models with near-bare-metal performance.
[Blog] How to build highly available Kubernetes applications with Amazon EKS Auto Mode
- Amazon EKS Auto Mode automates control plane updates, streamlines add-on management, and keeps clusters aligned with current best practices. It dynamically adds or removes nodes based on application demand and replaces each node after at most 21 days so that security patches are applied.
- With Kubernetes, resilience features including Pod Disruption Budgets (PDBs), Pod Readiness Gates, Topology Spread Constraints, and proper lifecycle management work together to maintain high availability during cluster events and node disruptions.
- This blog article serves as guidance to build highly available Kubernetes applications with Amazon EKS Auto Mode, demonstrating comprehensive testing scenarios.
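As a rough illustration of two of the resilience features named above, the following sketch builds a Pod Disruption Budget manifest and a zone-level topology spread constraint as plain Python dictionaries. The application name `checkout`, the `app` label key, and the helper function names are hypothetical and are not taken from the blog article; only the Kubernetes field names follow the upstream API.

```python
import json


def pod_disruption_budget(app: str, min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget for pods labeled app=<app>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},  # hypothetical naming scheme
        "spec": {
            # Voluntary disruptions (for example node replacement) must
            # leave at least this many matching pods running.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def zone_spread_constraint(app: str, max_skew: int = 1) -> dict:
    """Build one entry for a pod spec's topologySpreadConstraints list."""
    return {
        "maxSkew": max_skew,
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": {"app": app}},
    }


if __name__ == "__main__":
    print(json.dumps(pod_disruption_budget("checkout", 2), indent=2))
    print(json.dumps(zone_spread_constraint("checkout"), indent=2))
```

The printed JSON can be applied as a manifest or merged into a Deployment's pod template; whether `DoNotSchedule` or `ScheduleAnyway` is the better choice depends on how strictly the workload must be spread across zones.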
[Blog] Implementing granular failover in multi-Region Amazon EKS
- Multi-Region, multi-tenant Amazon EKS architectures face a cascading failure problem: a single application failure causes Route 53 to mark the entire ALB as unhealthy, redirecting all traffic to other Regions even for healthy applications.
- This blog article serves as guidance to implement granular failover in multi-Region Amazon EKS designs by configuring dedicated Route 53 health checks for each application, setting the “Evaluate Target Health” attribute to No in alias records, and enabling selective failover that redirects only affected applications while maintaining optimal routing for healthy services.
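The record configuration described above might look roughly like the sketch below, which builds one Route 53 change entry per application and Region. It assumes latency-based routing (the `Region` field); the blog article may use a different routing policy. The hostname, hosted zone ID, ALB DNS name, and health check ID in the example are placeholders, and the helper name is hypothetical.

```python
def app_alias_record(hostname: str, region: str, alb_dns_name: str,
                     alb_hosted_zone_id: str, health_check_id: str) -> dict:
    """Build one change entry for a Route 53 ChangeBatch.

    Each application gets its own alias record and its own health check,
    so an unhealthy application fails over without affecting the others
    that share the same ALB.
    """
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": hostname,
            "Type": "A",
            "SetIdentifier": f"{hostname}-{region}",
            "Region": region,  # assumption: latency-based routing
            # Dedicated per-application health check.
            "HealthCheckId": health_check_id,
            "AliasTarget": {
                "HostedZoneId": alb_hosted_zone_id,
                "DNSName": alb_dns_name,
                # "Evaluate Target Health" set to No: ALB-wide health
                # no longer decides routing for this record.
                "EvaluateTargetHealth": False,
            },
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    change = app_alias_record(
        hostname="app1.example.com",
        region="us-east-1",
        alb_dns_name="placeholder-alb.us-east-1.elb.amazonaws.com",
        alb_hosted_zone_id="ZPLACEHOLDER",
        health_check_id="00000000-0000-0000-0000-000000000000",
    )
    print(change)
```

A list of such entries could be passed as `ChangeBatch={"Changes": [...]}` to the boto3 Route 53 client's `change_resource_record_sets` call, once per application and Region.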
[Blog] SaaS deployment architectures with Amazon EKS
- SaaS Anywhere deployment models enable SaaS providers to extend their applications into customers’ environments.
- GitOps practices and Kubernetes-native tools like AWS Controllers for Kubernetes (ACK), Crossplane, and Helm charts enable consistent application packaging and deployment across distributed environments, while maintaining systematic rollout of updates and coordinated infrastructure changes.
- This blog article serves as guidance for implementing SaaS deployment architectures with Amazon EKS, exploring patterns for managing remote environments, establishing shared responsibility models, and addressing day-2 operations including tenant onboarding, upgrade management, and cross-account monitoring across different deployment models.
[Blog] How to manage EKS Pod Identities at scale using Argo CD and AWS ACK
- Amazon EKS Pod Identity simplifies IAM permissions management for Kubernetes applications by providing fine-grained access control at the pod level.
- AWS Controllers for Kubernetes (ACK) enables Kubernetes-native management of AWS resources through Custom Resource Definitions (CRDs), allowing the PodIdentityAssociation custom resource to map IAM roles to service accounts.
- This blog article serves as guidance to manage EKS Pod Identities at scale using Argo CD and AWS ACK, demonstrating how to implement validation jobs that verify IAM role associations before deploying application workloads.
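To make the mapping concrete, here is a minimal sketch of a PodIdentityAssociation manifest built as a Python dictionary. The `apiVersion` and the spec field names (`clusterName`, `namespace`, `serviceAccount`, `roleARN`) reflect our understanding of the ACK EKS controller's CRD and should be checked against the controller version in use; the cluster, namespace, service account, and role ARN values are hypothetical.

```python
import json


def pod_identity_association(name: str, cluster: str, namespace: str,
                             service_account: str, role_arn: str) -> dict:
    """Build an ACK PodIdentityAssociation mapping an IAM role to a service account."""
    # Basic sanity check on the ARN shape only; this does not verify
    # that the role exists or that its trust policy is correct.
    if not role_arn.startswith("arn:") or ":role/" not in role_arn:
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    return {
        # Assumed API group/version of the ACK EKS controller.
        "apiVersion": "eks.services.k8s.aws/v1alpha1",
        "kind": "PodIdentityAssociation",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterName": cluster,
            "namespace": namespace,
            "serviceAccount": service_account,
            "roleARN": role_arn,
        },
    }


if __name__ == "__main__":
    manifest = pod_identity_association(
        name="orders-s3-access",                # hypothetical
        cluster="demo-cluster",                 # hypothetical
        namespace="orders",                     # hypothetical
        service_account="orders-sa",            # hypothetical
        role_arn="arn:aws:iam::111122223333:role/orders-s3-read",
    )
    print(json.dumps(manifest, indent=2))
```

Committing the generated manifest to the Git repository that Argo CD watches would let the association be reconciled alongside the application it serves.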
[Blog] Kubernetes Gateway API in action
- The Kubernetes Gateway API provides a unified, standardized approach to traffic management that replaces traditional Ingress resources and vendor-specific annotations, supporting HTTP, HTTPS, gRPC, and TLS traffic routing through declarative Kubernetes-native resources like HTTPRoute, GRPCRoute, and TLSRoute.
- This blog article serves as guidance for putting the Kubernetes Gateway API into practice using a calendar web application (Cal-LLM), demonstrating three key use cases: exposing applications with hostname-based routing (North-South traffic), enabling canary deployments between microservices using gRPC (East-West traffic), and restricting communication to external services through egress controls with Linkerd service mesh and Envoy Gateway.
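A canary split of the kind mentioned above is typically expressed with weighted `backendRefs`. The sketch below builds a GRPCRoute manifest as a Python dictionary that sends a given percentage of traffic to a canary Service. The route, parent, and Service names are hypothetical, and the route is attached to a Gateway for simplicity; in a service mesh setup the parent reference is usually a Service instead.

```python
import json


def canary_grpc_route(name: str, parent: str, stable_svc: str,
                      canary_svc: str, port: int, canary_percent: int) -> dict:
    """Build a GRPCRoute that splits traffic between stable and canary backends."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "GRPCRoute",
        "metadata": {"name": name},
        "spec": {
            # Assumption: attached to a Gateway named <parent>.
            "parentRefs": [{"name": parent}],
            "rules": [{
                "backendRefs": [
                    # Weights are relative; here they sum to 100.
                    {"name": stable_svc, "port": port,
                     "weight": 100 - canary_percent},
                    {"name": canary_svc, "port": port,
                     "weight": canary_percent},
                ],
            }],
        },
    }


if __name__ == "__main__":
    route = canary_grpc_route("calendar-canary", "demo-gateway",
                              "calendar-v1", "calendar-v2", 50051, 10)
    print(json.dumps(route, indent=2))
```

Raising `canary_percent` step by step and reapplying the manifest shifts traffic gradually; setting it to 100 completes the rollout.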
[Blog] Extending GPU Fractionalization and Orchestration to the edge with NVIDIA Run:ai and Amazon EKS
- NVIDIA Run:ai provides dynamic GPU fractionalization, node-level scheduling, and priority-based sharing capabilities that improve GPU utilization.
- This blog article serves as guidance to extend GPU fractionalization and orchestration to the edge with NVIDIA Run:ai and Amazon EKS, demonstrating how to deploy Run:ai control planes across AWS Regions while managing GPU worker nodes in Local Zones, Outposts, and on-premises environments through consistent APIs and security controls for distributed training and inference scenarios.

Community news and articles

Introducing Headlamp Plugin for Karpenter - Scaling and Visibility
- Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let users explore, manage, and debug cluster resources.
- The newly released Headlamp Karpenter plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI.
SaaS Anywhere: Setting Up Amazon EKS Hybrid Nodes
- This article serves as a walkthrough for setting up Amazon EKS Hybrid Nodes for SaaS Anywhere deployments.

Videos and webinars

Open source projects

Karpor
- Karpor is a Kubernetes visualization tool that brings advanced search, insight, and AI capabilities to improve visibility into Kubernetes clusters.