Nvidia CEO Jensen Huang believes that Physical AI will be the next major technological breakthrough. AI-powered robots are expected to take many forms and become integral to a wide range of environments. Nvidia envisions a future where robots are ubiquitous, populating kitchens, factories, doctors’ offices, and highways, among other settings, and taking on an ever-larger share of repetitive tasks. Nvidia’s role will be pivotal, providing the AI software and hardware essential for developing and operating these systems.
Huang describes the current AI landscape as being in the “pioneering phase,” focused on developing foundational models and tools. This phase is about creating the base technologies that will later be refined for specific applications. The next phase, Enterprise AI, is already underway and involves boosting productivity through chatbots and AI models for employees, partners, and customers. At its peak, individuals will have personal AI assistants, or even multiple AI systems tailored to specific tasks. In these initial phases, AI primarily conveys information by generating likely sequences of words or tokens. However, Huang highlights the emergence of the final phase: Physical AI. This phase involves AI embodied in physical forms that interact with their environments, requiring the integration of data from sensors and the manipulation of objects in three-dimensional space.
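To make the “likely sequences of tokens” idea concrete, here is a deliberately toy sketch: a hand-written bigram table stands in for a trained language model, which in practice learns such probabilities from vast text corpora. The vocabulary and probabilities below are invented purely for illustration.

```python
import random

# Toy "next-token" table: for each token, the probability of what comes next.
# A real language model learns these probabilities; here they are hard-coded.
next_token_probs = {
    "<start>":   {"robots": 0.6, "factories": 0.4},
    "robots":    {"assemble": 0.5, "inspect": 0.3, "move": 0.2},
    "factories": {"run": 0.7, "ship": 0.3},
    "assemble":  {"<end>": 1.0}, "inspect": {"<end>": 1.0},
    "move":      {"<end>": 1.0}, "run": {"<end>": 1.0}, "ship": {"<end>": 1.0},
}

token, sentence = "<start>", []
while token != "<end>":
    choices = next_token_probs[token]
    # Sample the next token in proportion to its probability.
    token = random.choices(list(choices), weights=list(choices.values()))[0]
    if token != "<end>":
        sentence.append(token)

print(" ".join(sentence))
```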
“Building foundation models for general humanoid robots is one of the most exciting challenges in AI today,” said Jensen Huang, founder and CEO of NVIDIA. “The enabling technologies are converging, allowing leading roboticists worldwide to make significant strides toward artificial general robotics.” Designing both the robot and its intelligence is a task for AI. However, testing these robots against countless unpredictable circumstances, many of which cannot be anticipated or replicated physically, is a significant challenge. The solution is to use AI to simulate the environments and interactions the robots will encounter.
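Conceptually, that simulation-driven testing looks like the loop below: a control policy is evaluated across many randomized episodes so that rare conditions show up statistically rather than by accident. This is a minimal, self-contained sketch with a toy stand-in environment; it is not Isaac Sim's actual API, and the class, dynamics, and numbers are illustrative only.

```python
import random

class SimEnv:
    """Hypothetical simulated environment; stands in for a physics simulator."""

    def reset(self, seed=None):
        random.seed(seed)
        # Randomize starting conditions so every episode differs, mimicking
        # domain randomization of object poses, lighting, friction, etc.
        self.state = [random.uniform(-1.0, 1.0) for _ in range(4)]
        return self.state

    def step(self, action):
        # Toy dynamics: state drifts toward the commanded action, plus sensor noise.
        self.state = [s + 0.1 * (a - s) + random.gauss(0, 0.01)
                      for s, a in zip(self.state, action)]
        reward = -sum(abs(s) for s in self.state)        # closer to the goal is better
        done = all(abs(s) < 0.05 for s in self.state)    # success threshold
        return self.state, reward, done

def policy(obs):
    # Placeholder controller; a trained model would produce actions here.
    return [0.0 for _ in obs]

# Evaluate the same policy across many randomized episodes.
env, successes, episodes = SimEnv(), 0, 1000
for episode in range(episodes):
    obs, done = env.reset(seed=episode), False
    for _ in range(200):
        obs, reward, done = env.step(policy(obs))
        if done:
            successes += 1
            break

print(f"success rate over {episodes} randomized episodes: {successes / episodes:.1%}")
```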
“We’re going to need three computers: one to create the AI, one to simulate the AI, and one to run the AI,” said Huang, referring to Nvidia’s suite of hardware and software solutions. The process begins with Nvidia H100 and B100 servers for creating the AI, progresses to workstations and servers running Nvidia Omniverse on RTX GPUs for simulation and testing, and concludes with Nvidia Jetson (soon to be equipped with Blackwell GPUs) for real-time sensing and control. Nvidia has also introduced GR00T, short for Generalist Robot 00 Technology, which is designed to understand and replicate human movements, learning coordination and dexterity to navigate and interact with the real world. Huang demonstrated several GR00T-powered robots during his GTC keynote.
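The three computers can be read as a three-stage pipeline, sketched below. The stage names and artifact descriptions are a paraphrase of the keynote, not an Nvidia API.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str          # which of the "three computers" this is
    hardware: str      # hardware class named in the keynote
    produces: str      # what the stage hands to the next one

# Illustrative summary of the create -> simulate -> run workflow.
pipeline = [
    Stage("create",   "H100/B100 servers",            "candidate policy weights"),
    Stage("simulate", "Omniverse on RTX workstations", "simulation-validated policy"),
    Stage("run",      "Jetson (Blackwell-class SoC)",  "real-time sensing and control"),
]

for stage in pipeline:
    print(f"{stage.name:>8}: {stage.hardware:<32} -> {stage.produces}")
```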
Two new AI NIMs will help roboticists build simulation workflows for generative Physical AI in Nvidia Isaac Sim, a reference application built on the Nvidia Omniverse platform. The MimicGen NIM microservice generates synthetic motion data from teleoperated data recorded with spatial computing devices such as the Apple Vision Pro. The Robocasa NIM microservice creates robot tasks and simulation-ready environments in OpenUSD, the Universal Scene Description framework that underpins Omniverse’s 3D world-building and collaboration tools. Additionally, Nvidia OSMO is a cloud-native managed service designed to orchestrate and scale complex robotics development workflows across distributed computing resources, both on premises and in the cloud. OSMO simplifies the creation of robot training and simulation workflows, cutting deployment and development times from months to less than a week. Users can manage tasks such as generating synthetic data, training models, running reinforcement learning, and testing at scale for humanoids, autonomous mobile robots, and industrial manipulators.
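NIM microservices are packaged as containers that expose web endpoints, so a workflow step typically amounts to an HTTP request. The sketch below assumes a locally deployed MimicGen-style service; the URL, route, and payload fields are placeholders for illustration, not the documented interface.

```python
import json
import urllib.request

# Hypothetical local deployment of a motion-data-generation microservice.
# The route and request schema below are illustrative placeholders only;
# consult the actual NIM documentation for the real interface.
NIM_URL = "http://localhost:8000/v1/generate"

request_body = {
    "source_demos": "teleop_session_042.hdf5",   # recorded teleoperation data (hypothetical file)
    "num_variations": 500,                       # synthetic trajectories to produce
    "randomize": ["object_pose", "lighting"],    # axes of variation for the synthetic data
}

req = urllib.request.Request(
    NIM_URL,
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(f"generated {len(result.get('trajectories', []))} synthetic motion clips")
```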
To help design robots that can handle objects without crushing or dropping them, the Nvidia Isaac Manipulator offers advanced dexterity and AI capabilities. The system is built on a collection of foundation models, and its early ecosystem partners include Yaskawa, Universal Robots (a Teradyne company), PickNik Robotics, Solomon, READY Robotics, and Franka Robotics. To train robots to “see,” Isaac Perceptor provides multi-camera, 3D surround-vision capabilities. These are increasingly used in autonomous mobile robots in manufacturing and fulfillment operations to improve efficiency and worker safety while reducing error rates and costs. Early adopters such as ArcBest, BYD, and KION Group are using the technology to achieve higher levels of autonomy in material handling.
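The core idea behind multi-camera surround vision can be shown with a deliberately simplified sketch: detections from several cameras, each mounted at a known pose on the robot, are projected into one robot-centric occupancy map. The poses, detections, and grid resolution below are made up for illustration; Isaac Perceptor's actual pipeline is far more involved.

```python
import math

def to_world(cam_pose, point_cam):
    """Rotate and translate a camera-frame point (x forward, y left) into the robot frame."""
    x, y, yaw = cam_pose
    px, py = point_cam
    wx = x + px * math.cos(yaw) - py * math.sin(yaw)
    wy = y + px * math.sin(yaw) + py * math.cos(yaw)
    return wx, wy

# Four cameras facing front, left, back, and right (pose = x, y, yaw in the robot frame).
camera_poses = [(0.3, 0.0, 0.0), (0.0, 0.2, math.pi / 2),
                (-0.3, 0.0, math.pi), (0.0, -0.2, -math.pi / 2)]

# Hypothetical per-camera obstacle detections, expressed in each camera's own frame.
detections = [[(1.2, 0.1)], [(0.8, -0.3)], [], [(0.5, 0.0), (2.0, 0.4)]]

CELL = 0.25                      # occupancy-grid resolution in metres
occupied = set()
for pose, points in zip(camera_poses, detections):
    for p in points:
        wx, wy = to_world(pose, p)
        occupied.add((round(wx / CELL), round(wy / CELL)))

print(f"{len(occupied)} occupied cells detected around the robot")
```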
For operating robots, the new Jetson Thor SoC features a Blackwell GPU with a transformer engine delivering 800 teraflops of 8-bit floating-point AI performance, enough to run multimodal generative AI models like GR00T. Equipped with a functional safety processor, a high-performance CPU cluster, and 100GB of Ethernet bandwidth, it significantly simplifies design and integration efforts. Huang believes that robots will need to take human-like forms because the factories and environments where they will operate are designed for human operators; it is more cost-effective to build humanoid robots than to redesign the factories and spaces in which they will work.
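To put the 800-teraflop FP8 figure in rough perspective, a common back-of-the-envelope estimate is that a transformer forward pass costs about two FLOPs per parameter per token. The model size and utilization factor below are assumptions for illustration only; sustained throughput on a real workload is always well below a chip's peak rating.

```python
# Rough estimate of on-robot inference throughput from the quoted FP8 peak.
peak_flops = 800e12            # 800 teraflops of 8-bit floating-point performance
model_params = 2e9             # assumed policy-model size (illustrative, not GR00T's)
flops_per_token = 2 * model_params   # ~2 FLOPs per parameter per generated token

theoretical_tokens_per_s = peak_flops / flops_per_token
achievable = 0.3 * theoretical_tokens_per_s   # assume ~30% utilization in practice

print(f"theoretical peak:    {theoretical_tokens_per_s:,.0f} tokens/s")
print(f"at 30% utilization:  {achievable:,.0f} tokens/s")
```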