In this edition we feature several blog posts on GPUs and how you can use them efficiently when running certain types of workloads.
New AWS services and features
- Amazon EMR on EKS now supports managed Apache Flink (Public Preview)
- Customers who already use EMR can now run their Apache Flink applications along with other types of applications on the same Amazon EKS cluster.
- To learn more and get started, please visit the Apache Flink section in the documentation.
- Amazon GuardDuty introduces cluster configurability in EKS Runtime Monitoring
- You can now selectively configure which Amazon Elastic Kubernetes Service (Amazon EKS) clusters are monitored for threat detection.
- AWS Identity and Access Management provides action last accessed information for more than 140 services
- You can use action last accessed information as part of your periodic review process to restrict the access granted to IAM roles, including roles assigned to pods with IRSA, to just the required permissions.
- Amazon CloudWatch adds Amazon EKS control plane logs as Vended Logs
- With this change you can take advantage of volume-based tiered pricing for CloudWatch Logs.
- Automate Packet Acceleration configuration using DPDK on Amazon EKS
- Explains how to install and configure SR-IOV and DPDK to optimize workloads for maximum network throughput.
- [Blog] Maximizing GPU utilization with NVIDIA’s Multi-Instance GPU (MIG) on Amazon EKS: Running more pods per GPU for enhanced performance
- Explains how to configure and optimize the NVIDIA multi-instance GPU (MIG) device plugin on EKS worker nodes so you can derive the most value from each GPU.
- [KB Article] External Traffic Fails to Reach EKS Pods with NLB Client IP Preservation
- [Tutorials] The first five tutorials in this series walk you through different EKS cluster setups.
- [Blog] GPU sharing on Amazon EKS with NVIDIA time-slicing and accelerated EC2 instances
- Time-slicing refers to the method where multiple tasks or processes share the GPU resources in small time intervals, ensuring efficient utilization and task concurrency.
- Time-slicing becomes important in scenarios where GPU demands are dynamic, where multiple tasks or users need concurrent access, or where maximizing the efficiency of GPU resources is a priority.
- This blog explains how to enable GPU sharing on EKS with the NVIDIA Kubernetes device plugin.
- Deploy Generative AI Models on Amazon EKS
- This post walks you through an end-to-end stack (JARK) and example (Dreambooth) for building Generative AI systems on Amazon EKS.
- Run Spark-RAPIDS ML workloads with GPUs on Amazon EMR on EKS
- To enhance the capabilities of NVIDIA GPUs within the Spark ecosystem, NVIDIA developed Spark-RAPIDS.
- Spark-RAPIDS is an extension library that uses RAPIDS libraries built on CUDA to enable high-performance data processing and ML training on GPUs.
- Starting from Amazon EMR on EKS 6.9, customers can use the power of the NVIDIA Spark-RAPIDS Accelerator without the need for creating and maintaining custom images.
- This post explores the capabilities of the NVIDIA Spark-RAPIDS Accelerator and its impact on the ML workflow using Apache Spark.
- Argo CD Application Controller Scalability Testing on Amazon EKS
- This post builds on prior work done by the industry to look at the scalability of Argo CD.
- It presents the findings from experiments deploying 10,000 Argo CD applications to 1, 10, and 97 remote clusters.
- It includes observed scalability bottlenecks and modifications that were made to improve our efforts to scale Argo CD on any Kubernetes cluster, including Amazon EKS.
- Policy management in Kubernetes is changing
- AI for Kubernetes; good or evil?
- AIOps and K8SGPT
- Build your OCI images with ko while still using GoReleaser!
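
The time-slicing idea described in the GPU sharing item above (multiple tasks taking turns on one GPU in small time intervals) can be illustrated with a toy round-robin simulation. This is only a conceptual sketch: the function name `time_slice`, the pod names, and the abstract "work units" are hypothetical, and it does not reflect how the NVIDIA driver or the Kubernetes device plugin actually schedules work.

```python
from collections import deque


def time_slice(tasks, quantum):
    """Toy round-robin scheduler (illustrative only).

    tasks:   dict mapping a task name to its remaining work units
    quantum: maximum work units a task may run before yielding
    Returns a list of (task name, units run) in execution order.
    """
    queue = deque(tasks.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        schedule.append((name, run))
        # A task with work left goes to the back of the queue.
        if remaining > run:
            queue.append((name, remaining - run))
    return schedule


if __name__ == "__main__":
    # Hypothetical pods sharing a single GPU.
    pods = {"pod-a": 5, "pod-b": 2, "pod-c": 3}
    for name, units in time_slice(pods, quantum=2):
        print(f"{name} runs for {units} unit(s)")
```

With a small quantum, every pod gets a turn early instead of waiting for the longest task to finish, which is the concurrency benefit the blog post describes.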
Videos and webinars
- Just in time access to Kubernetes
- Sveltos workshop, Simplifying Kubernetes Add-ons Across Multitudes of Clusters
- Hands-on with VPC Lattice
- All about Bottlerocket
- Kubernetes Ingress with ngrok