AWS has launched G7e instances on Amazon SageMaker AI, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Configurations with 1, 2, 4, and 8 GPUs—each GPU offering 96 GB of GDDR7 memory—are now available for inference workloads, including single‑node hosting of several 120B‑class and mid‑sized open foundation models.

  • NVIDIA RTX PRO 6000 Blackwell GPUs with 96 GB GDDR7 per GPU
  • Node options: 1, 2, 4, and 8 GPUs; single‑node G7e.2xlarge supports 120B‑class FMs
  • Available now on Amazon SageMaker AI for inference of open foundation models

What happened

Amazon SageMaker AI now offers the G7e instance family built on NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Customers can provision nodes with 1, 2, 4, or 8 GPUs; each GPU includes 96 GB of GDDR7 memory. AWS highlights that a single‑node G7e.2xlarge configuration can host large open foundation models such as GPT-OSS-120B, Nemotron-3-Super-120B-A12B (NVFP4 variant), and Qwen3.5-35B-A3B for inference.


Why it matters

The combination of high VRAM per GPU and multi‑GPU node options lowers the operational complexity of serving large models by reducing dependence on model sharding and extreme model parallelism. That can simplify deployment, shorten iteration cycles, and provide a more cost‑efficient path for organizations that want to run large open foundation models for production inference. The GDDR7 memory and Blackwell architecture also position these instances to better handle memory‑heavy tasks such as long‑context inference and larger batch sizes.
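To see why 96 GB per GPU matters, a back‑of‑envelope estimate of weights plus KV cache is useful. The sketch below uses illustrative numbers only (layer count, head counts, context length, and 4‑bit quantization are assumptions, not published specs for any of the models named above):

```python
# Back-of-envelope check of whether a model's weights plus KV cache fit in
# a single GPU's memory. All model shape numbers below are illustrative
# assumptions, not published specs.

def weights_gib(params_billion: float, bits_per_param: float) -> float:
    """Memory for model weights in GiB."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x batch."""
    return (2 * layers * kv_heads * head_dim
            * context_len * batch * bytes_per_elem) / 2**30

GPU_MEM_GIB = 96  # per-GPU memory on these instances

# Hypothetical 120B-parameter model quantized to 4 bits (NVFP4-style):
w = weights_gib(120, 4)
kv = kv_cache_gib(layers=60, kv_heads=8, head_dim=128,
                  context_len=32_768, batch=4)
print(f"weights ~ {w:.1f} GiB, KV cache ~ {kv:.1f} GiB, "
      f"fits on one 96 GiB GPU: {w + kv < GPU_MEM_GIB}")
```

Under these assumptions the weights come to roughly 56 GiB and the KV cache to roughly 30 GiB, which is why a 4‑bit 120B‑class model can plausibly run on a single 96 GB GPU, while long contexts or large batches quickly eat the remaining headroom.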

What to watch next

Teams should benchmark their target models on G7e instances to validate latency, throughput, and cost versus existing instance types and multi‑node setups. Watch for official AWS documentation and region availability updates, published performance benchmarks, pricing details, and expanded model compatibility or optimized runtimes for these GPUs. Expect vendors and open‑model maintainers to publish guidance and tuning options specific to Blackwell‑based inference soon.
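A minimal harness for the latency/throughput comparison described above might look like the following. `invoke` is a stand‑in for whatever client call your deployment uses (for example, a SageMaker runtime invocation); swap in the real call and payload before drawing any conclusions:

```python
# Minimal latency-benchmark sketch. `invoke` is a placeholder for the
# real endpoint call; the dummy below just sleeps to simulate inference.
import time
import statistics

def benchmark(invoke, payload, warmup=3, iters=20):
    """Return p50/p95 latency (ms) and throughput (req/s) for `invoke`."""
    for _ in range(warmup):           # warm caches before measuring
        invoke(payload)
    latencies = []
    start = time.perf_counter()
    for _ in range(iters):
        t0 = time.perf_counter()
        invoke(payload)
        latencies.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
        "throughput_rps": iters / elapsed,
    }

# Usage with a dummy endpoint that sleeps for 10 ms per request:
stats = benchmark(lambda p: time.sleep(0.01), {"inputs": "hello"})
print(stats)
```

Running the same harness against candidate instance types with identical payloads gives a like‑for‑like basis for the cost and latency comparison; sequential single‑client loops like this one understate achievable throughput, so add concurrent clients for a saturation test.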

Source: This briefing is based on an announcement from the AWS Machine Learning Blog.