NVIDIA Vera Rubin NVL72: Slashing AI Inference Costs by 90%

At Dell Technologies World, NVIDIA CEO Jensen Huang described the current surge in AI interest as "utterly parabolic," pointing to rapidly accelerating enterprise adoption. Central to this momentum is the NVIDIA Vera Rubin NVL72, a platform designed for agentic AI inference. It reportedly reduces the cost per token to one-tenth of previous levels, while the Vera CPU accelerates enterprise data queries by up to 3x. More than 5,000 enterprises, including Lilly, Samsung, and Honeywell, already use Dell AI Factories, and the NVIDIA and Dell collaboration is positioned as the infrastructure for large-scale AI workloads. The shift toward agentic AI, supported by faster sandboxes and more efficient processing, marks a step in the industrialization of artificial intelligence.

Key Takeaways

Parabolic Demand: NVIDIA CEO Jensen Huang describes the surge in AI interest as "utterly parabolic," pointing to rapidly accelerating demand for AI infrastructure across the enterprise sector.
Drastic Cost Reduction: The NVIDIA Vera Rubin NVL72 reportedly enables agentic AI inference at one-tenth the cost per token compared with previous levels.
Performance Breakthroughs: Agent sandboxes run 50% faster on NVIDIA Vera hardware than on traditional CPUs, while the Vera CPU speeds up enterprise data queries by up to 3x.
Massive Enterprise Adoption: Over 5,000 organizations, including Honeywell, Samsung, and Lilly, are now running AI workloads through Dell AI Factories.

In-Depth Analysis

The Economics of Agentic AI Inference

Jensen Huang's announcement at Dell Technologies World points to a shift in the economics of artificial intelligence. With the NVIDIA Vera Rubin NVL72, NVIDIA is targeting a primary barrier to widespread AI deployment: the cost of inference. If the claim holds in production, performing agentic AI inference at one-tenth the cost per token would materially change what enterprises can afford to run.

As enterprises move beyond simple chatbots toward "agentic AI" (systems capable of autonomous reasoning and multi-step task execution), the volume of tokens processed per task increases significantly. Inference spend still grows in proportion to token volume, but at one-tenth the cost per token a business can process roughly ten times as many tokens on the same budget. That is more than a marginal improvement: it changes the cost-benefit analysis for agents and opens up use cases that were previously financially unviable.
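The budget arithmetic behind this argument can be sketched in a few lines of Python. Every figure below (the baseline price, the tokens per agent task, and the monthly budget) is a hypothetical value chosen for illustration; the announcement gives only the one-tenth ratio, not absolute prices.

```python
def inference_cost(tokens: int, price_per_million: float) -> float:
    """Return the cost in dollars of processing `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million


def tokens_for_budget(budget: float, price_per_million: float) -> float:
    """Return how many tokens a dollar budget buys at a given price."""
    return budget / price_per_million * 1_000_000


# Hypothetical figures, for illustration only (not from the announcement).
BASELINE_PRICE = 10.0                # dollars per million tokens (assumed)
REDUCED_PRICE = BASELINE_PRICE / 10  # one-tenth the cost per token, as claimed
AGENT_TASK_TOKENS = 40_000           # assumed tokens for one multi-step agent task
MONTHLY_BUDGET = 1_000.0             # assumed dollars per month

baseline_task_cost = inference_cost(AGENT_TASK_TOKENS, BASELINE_PRICE)
reduced_task_cost = inference_cost(AGENT_TASK_TOKENS, REDUCED_PRICE)
baseline_capacity = tokens_for_budget(MONTHLY_BUDGET, BASELINE_PRICE)
reduced_capacity = tokens_for_budget(MONTHLY_BUDGET, REDUCED_PRICE)

print(f"Cost per agent task: ${baseline_task_cost:.2f} -> ${reduced_task_cost:.3f}")
print(f"Tokens per budget:   {baseline_capacity:,.0f} -> {reduced_capacity:,.0f}")
```

Whatever absolute prices are plugged in, the ratio between the two capacities stays at ten, which is the only part of the calculation the source actually supports.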

Hardware Acceleration for Enterprise Data and Sandboxing

Beyond cost, the Vera architecture is pitched on speed. The Vera CPU is said to accelerate enterprise data queries by up to 3x. In a corporate environment where data is often siloed and massive in scale, faster queries directly improve the responsiveness of AI-driven insights.

Furthermore, the 50% performance boost in agent sandboxes matters for the development cycle of AI agents. Sandboxes are the controlled environments where AI agents are tested and refined before deployment. If these environments run 50% faster on Vera hardware than on traditional CPUs, each test cycle finishes sooner, which shortens the R&D lifecycle for enterprise AI. That speed allows for more rapid iteration, more test runs of autonomous behaviors before release, and faster time-to-market for AI-driven services.
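To make the speed claims concrete, the short Python sketch below converts a speedup factor into a run time. It assumes that "50% faster" means 1.5x throughput (the source does not define the term) and uses a hypothetical 60-second baseline for both a sandbox run and a data query.

```python
def runtime_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Return the run time once work proceeds `speedup` times as fast."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


def time_saved_fraction(speedup: float) -> float:
    """Return the fraction of the original run time that is eliminated."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


# Hypothetical baseline, for illustration only (not from the announcement).
BASELINE_SECONDS = 60.0
SANDBOX_SPEEDUP = 1.5  # assumption: "50% faster" read as 1.5x throughput
QUERY_SPEEDUP = 3.0    # upper bound of the "up to 3x" query claim

sandbox_seconds = runtime_after_speedup(BASELINE_SECONDS, SANDBOX_SPEEDUP)
query_seconds = runtime_after_speedup(BASELINE_SECONDS, QUERY_SPEEDUP)

print(f"Sandbox run: {BASELINE_SECONDS:.0f}s -> {sandbox_seconds:.0f}s "
      f"({time_saved_fraction(SANDBOX_SPEEDUP):.0%} less time)")
print(f"Data query:  {BASELINE_SECONDS:.0f}s -> {query_seconds:.0f}s "
      f"({time_saved_fraction(QUERY_SPEEDUP):.0%} less time)")
```

Under this reading, a 50% speedup removes about a third of the wall-clock time rather than half of it, which is worth keeping in mind when comparing the sandbox figure with the 3x query figure.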

The Scale of the Dell and NVIDIA Partnership

The collaboration between NVIDIA and Dell Technologies now operates at considerable scale: more than 5,000 enterprises use Dell AI Factories. Named customers such as Lilly, Samsung, and Honeywell indicate that demand is not limited to the tech sector but spans pharmaceuticals, electronics, and industrial manufacturing.

These "Dell AI Factories" represent a standardized approach to AI infrastructure, combining NVIDIA’s specialized hardware, such as the Vera Rubin NVL72, with Dell’s enterprise deployment capabilities. Huang's description of demand as "utterly parabolic" suggests that appetite for generative and agentic AI may be growing faster than infrastructure can be built out. The partnership is positioned to supply the compute that increasingly complex enterprise workloads require.

Industry Impact

The move to the Vera Rubin architecture and its claimed performance gains point to a change of emphasis for the AI industry. By focusing on the efficiency of the Vera CPU and the NVL72 system, NVIDIA is shifting the conversation from raw training power to the practicalities of enterprise-scale inference. The up-to-3x gain in query speed and the 50% faster sandboxing suggest that the next phase of AI competition will favor those who can integrate AI most deeply into existing data workflows.

For the broader industry, the "parabolic" demand mentioned by Huang implies a sustained investment cycle in hardware. As 5,000 enterprises lead the way, the success of companies like Samsung and Honeywell in utilizing these AI factories will likely serve as a blueprint for the rest of the global market. The focus on reducing the cost per token specifically for "agentic AI" indicates that the industry is moving toward more autonomous, high-token-usage applications, necessitating the high-efficiency hardware NVIDIA is now delivering.

Frequently Asked Questions

Question: What is the primary cost benefit of the NVIDIA Vera Rubin NVL72?

According to the announcement, the Vera Rubin NVL72 allows for agentic AI inference at one-tenth the cost per token compared with previous levels, a 90% reduction in per-token cost. Total inference spending still depends on how many tokens a workload consumes.

Question: How does the Vera CPU improve enterprise data operations?

The Vera CPU is reported to run enterprise data queries up to 3x faster than traditional systems, which speeds up the retrieval of insights from large datasets.

Question: Which major companies are currently using Dell AI Factories for their workloads?

Over 5,000 enterprises are utilizing the platform, with prominent examples including the pharmaceutical leader Lilly, electronics giant Samsung, and industrial conglomerate Honeywell.
