ICrypto


Nvidia’s Eureka helps robot dog perfect yoga ball balance

Researchers have utilized Nvidia’s Eureka platform, a human-level reward design algorithm, to train a quadruped robot to balance and walk on top of a yoga ball.

Derived from the platform, DrEureka is a large language model (LLM) agent specialized in crafting code to train robots’ skills within simulations and to develop solutions that overcome the challenges of the simulation-reality gap.
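To make the idea concrete, a reward function of the kind such an LLM agent might write could look like the following minimal sketch. This is purely illustrative: the function name, inputs, and weightings are assumptions for a ball-balancing task, not the reward DrEureka actually generated.

```python
import math

def balance_reward(ball_xy, body_xy, body_up, joint_torques):
    """Hypothetical reward for balancing atop a ball (illustration only;
    not the reward function Eureka actually produced)."""
    # Stay centered over the ball: penalize horizontal offset.
    dx = body_xy[0] - ball_xy[0]
    dy = body_xy[1] - ball_xy[1]
    centering = -math.hypot(dx, dy)
    # Stay upright: z-component of the body's up-vector (1.0 = fully upright).
    uprightness = body_up[2]
    # Smoothness: small penalty on actuation effort.
    effort = -0.001 * sum(t * t for t in joint_torques)
    return centering + uprightness + effort
```

A policy trained in simulation would then be optimized to maximize this scalar signal at every timestep, which is what "crafting code to train robots' skills" amounts to in practice.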

Researchers claim that it operates seamlessly, automating the entire process from initial skill acquisition to real-world implementation. This ensures a smooth transition from virtual environments to practical deployment.

The team used the platform to train the robot dog in simulation and then transferred it to real-world conditions. The quadruped completed the task on its first attempt, with no fine-tuning required.

The details of the study from the team of researchers from the University of Pennsylvania, University of Texas at Austin, and Nvidia were published on GitHub.

Automating sim-to-real robotics

Researchers highlight that leveraging policies acquired in simulation for real-world applications holds significant promise in scaling up robot skill acquisition.

Nonetheless, sim-to-real methodologies often necessitate manual configuration and adjustment of task reward functions and simulation physics parameters, leading to slow progress and requiring substantial human effort.

“Traditionally, the sim-to-real transfer is achieved by domain randomization, a tedious process that requires expert human roboticists to stare at every parameter and adjust by hand,” said Jim Fan, senior research manager & lead of embodied AI at Nvidia, in a post on X.

DrEureka starts by taking task and safety instructions, along with the environment source code, to initiate Eureka. Eureka then produces a standardized reward function and policy. These are tested across various simulation conditions to develop a physics prior that’s sensitive to rewards.

This is then utilized by the LLM to generate a range of domain randomization (DR) parameters. Finally, leveraging the synthesized reward and DR parameters, DrEureka trains policies ready for real-world deployment.
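The steps above can be sketched as a simple pipeline. All function names and stub implementations below are placeholders standing in for the LLM calls and RL training runs the article describes; they are not the actual DrEureka code.

```python
# Illustrative sketch of the DrEureka pipeline; every name here is a
# placeholder, and the stubs return canned values for demonstration.

def generate_reward(task, safety, env_source):
    # Stand-in for Eureka's LLM reward synthesis from task/safety
    # instructions and the environment source code.
    return f"reward for {task!r} under constraints {safety!r}"

def train_in_simulation(reward_fn, dr_params=None):
    # Stand-in for RL training in simulation; returns a mock policy.
    return {"reward": reward_fn, "dr": dr_params}

def build_physics_prior(policy):
    # Stand-in: probe the policy across perturbed simulation conditions
    # and record parameter ranges where it still succeeds
    # (the "reward-aware physics prior"). Ranges are made up.
    return {"ground_friction": (0.4, 1.2), "gravity_z": (-10.5, -9.0)}

def sample_dr_parameters(prior):
    # Stand-in for the LLM choosing domain randomization ranges
    # from the physics prior.
    return dict(prior)

def dr_eureka(task, safety, env_source):
    reward_fn = generate_reward(task, safety, env_source)   # step 1
    initial_policy = train_in_simulation(reward_fn)         # step 2
    prior = build_physics_prior(initial_policy)             # step 3
    dr_params = sample_dr_parameters(prior)                 # step 4
    # step 5: retrain with the synthesized reward and DR parameters
    return train_in_simulation(reward_fn, dr_params)
```

The key design choice, per the article, is that the domain randomization ranges are not hand-tuned: they are derived from where the initial policy still performs well and then proposed by the LLM.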

Cutting-edge LLMs such as GPT-4 come equipped with an extensive built-in understanding of physical concepts like friction, damping, stiffness, gravity, and more. “We are (mildly) surprised to find that DrEureka can tune these parameters competently and explain its reasoning well,” said Fan.
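As an example of what such LLM-proposed domain randomization might look like, here is a hypothetical set of parameter ranges sampled per training episode. The parameter names and bounds are invented for illustration and do not come from the DrEureka paper.

```python
import random

# Hypothetical domain randomization ranges an LLM might propose;
# names, units, and bounds are illustrative assumptions.
DR_RANGES = {
    "ground_friction": (0.4, 1.2),    # dimensionless
    "joint_damping":   (0.5, 2.0),    # N*m*s/rad
    "motor_stiffness": (15.0, 35.0),  # N*m/rad
    "gravity_z":       (-10.5, -9.0), # m/s^2, perturbed around -9.81
}

def sample_episode_physics(rng=random):
    """Draw one physics configuration for a single training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in DR_RANGES.items()}
```

Training across many such randomized configurations is what lets a simulation-trained policy tolerate the messier physics of the real world.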

Real-world adaptability

Assessing quadrupedal locomotion, the team systematically tested DrEureka’s policies across various real-world terrains.

Results show their robustness and superior performance compared to policies trained with manually designed reward and domain randomization settings.

“DrEureka policy exhibits impressive robustness in the real world, adeptly balancing and walking atop a yoga ball under various real-world, un-controlled terrain condition changes and disturbances,” said the researchers in the study.

Furthermore, DrEureka's LLM reward-design subroutine extends Eureka's capabilities by integrating safety instructions. The researchers assert that this is essential for crafting reward functions safe enough for real-world deployment.

Key findings reveal that leveraging the initial Eureka policy to build the reward-aware physics prior is central to DrEureka's success. Additionally, using the LLM to sample domain randomization parameters is vital for optimizing real-world performance.

Looking ahead, researchers say there are numerous ways to enhance DrEureka further. For instance, presently, DrEureka policies are solely trained in simulation, but using real-world failures as feedback could help LLMs better fine-tune sim-to-real methods in subsequent iterations.

Additionally, all tasks and policies in the study relied solely on the robot’s internal sensory inputs, and integrating vision or other sensors could enhance policy performance and the LLM feedback loop.

05.05.2024

