Harnessing Visual Data to Create Robot Training Simulations: The Future of AI in Robotics

Two recent studies present artificial intelligence systems that use either videos or photos to create simulations for training robots on real-world tasks. The approach could significantly reduce the cost of training robots to work in complex environments.

Researchers developing large AI models such as ChatGPT can train them on vast amounts of internet text, photos, and videos. Those training physical robots face a different problem: robot data is expensive to collect, and because so few robots are out navigating the real world, there is no readily available trove of data to teach them to perform well in dynamic settings such as people’s homes.

To address this issue, some researchers have begun using simulations for robot training. However, this method, which often requires the input of a graphic designer or engineer, can be both time-consuming and expensive.

The two new studies from University of Washington researchers introduce AI systems that use either videos or photos to generate simulations for training robots in real-world environments, which could substantially lower the cost of training robots to function in complex settings.

In the first study, a user quickly scans a space with a smartphone to capture its layout. The system, named RialTo, then constructs a “digital twin” simulation of the space, in which the user defines how different elements function (how a drawer opens, for example). The robot can then practice the same movements in the simulation, with slight variations, to learn to perform them reliably. The second study introduces URDFormer, a system that analyzes images of real environments found online to rapidly generate realistic simulation environments in which robots can train.
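
To make the “digital twin” idea more concrete, here is a minimal Python sketch of how user annotations of a scanned scene could be represented. The class names, fields, and the kitchen_scan.obj file are hypothetical stand-ins for this example and are not taken from RialTo.

```python
from dataclasses import dataclass, field

# Hypothetical data structures for a scanned "digital twin" scene; RialTo's real
# internal representation is not described in the article.

@dataclass
class ArticulatedPart:
    name: str          # e.g. "top_drawer"
    joint_type: str    # "prismatic" for sliding drawers, "revolute" for hinged doors
    axis: tuple        # direction of motion in the part's local frame
    limits: tuple      # (min, max) travel in meters or radians

@dataclass
class DigitalTwinScene:
    mesh_file: str                             # geometry reconstructed from the phone scan
    parts: list = field(default_factory=list)

    def annotate(self, part: ArticulatedPart) -> None:
        """Record a user-supplied description of how a component moves."""
        self.parts.append(part)

# The user marks that the kitchen drawer slides out 45 cm along its local x-axis.
scene = DigitalTwinScene(mesh_file="kitchen_scan.obj")
scene.annotate(ArticulatedPart("top_drawer", "prismatic", (1.0, 0.0, 0.0), (0.0, 0.45)))
```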

Both studies were presented at the Robotics: Science and Systems conference in Delft, Netherlands; the first study was presented on July 16 and the second on July 19.

“Our goal is to facilitate systems that can transition from real-world settings to simulations affordably,” explained Abhishek Gupta, an assistant professor at UW’s Paul G. Allen School of Computer Science & Engineering and co-senior author of both studies. “These systems allow robots to train in simulated environments, enhancing their functionality in physical spaces. This is crucial for safety; poorly trained robots can cause damage or harm. Additionally, if robots can be programmed to work in homes simply by scanning with a smartphone, it makes this technology more accessible.”

While many robots excel in structured environments like assembly lines, teaching them to engage with people and navigate less controlled settings presents greater challenges.

“In factories, tasks are highly repetitive,” noted Zoey Chen, lead author of the URDFormer study and a doctoral student at UW’s Allen School. “While these tasks can be challenging, once programmed, a robot can perform them continuously. In contrast, homes are unique and ever-evolving, featuring a wide variety of objects, tasks, layouts, and people. This is where AI becomes incredibly beneficial for robotic engineers.”

The two systems tackle these challenges with different approaches.

RialTo, developed by Gupta with a team at the Massachusetts Institute of Technology, requires a person to move through an environment and capture video of its geometry and its moving parts. In a kitchen, for example, they would film opening the cabinets, the toaster, and the refrigerator. The system then uses existing AI models, along with a small amount of user input through a graphical interface showing how the parts move, to build a simulated kitchen from the footage. A virtual robot then trains itself by trial and error in the simulation, repeatedly attempting tasks such as opening the toaster oven, a practice method called reinforcement learning.
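
As a rough illustration of that trial-and-error loop, the self-contained Python sketch below runs repeated practice episodes in a toy simulator. The ToasterTaskEnv class and the random action policy are invented stand-ins for the digital twin and the learning agent; they are not RialTo's code.

```python
import random

class ToasterTaskEnv:
    """Toy stand-in for a digital-twin simulator: press the toaster lever all the way down."""

    def reset(self) -> float:
        self.lever = 0.0                   # how far the lever is pressed (0 = up, 1 = down)
        return self.lever

    def step(self, action: float):
        self.lever = min(1.0, max(0.0, self.lever + action))
        done = self.lever >= 1.0           # success once the lever is fully pressed
        reward = 1.0 if done else -0.01    # small per-step penalty encourages efficiency
        return self.lever, reward, done

env = ToasterTaskEnv()
for episode in range(100):
    obs = env.reset()
    total_reward = 0.0
    for t in range(200):                          # cap the episode length
        action = random.uniform(-0.1, 0.3)        # a real agent would improve this policy
        obs, reward, done = env.step(action)      # trial ...
        total_reward += reward                    # ... and the feedback used to learn from error
        if done:
            break
```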

Through this simulated practice, the robot refines its skills and learns to cope with variations and obstacles in the environment, such as a mug placed next to the toaster. The robot can then transfer what it has learned to the physical space, performing nearly as well as a robot trained directly in the real kitchen.
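
One simple way to build in that robustness is to randomize the simulated scene between practice episodes, as in the illustrative sketch below. The Scene class and the specific mug placements are invented for the example and do not reflect RialTo's actual interface.

```python
import random

class Scene:
    """Minimal stand-in for a simulated kitchen scene."""
    def __init__(self):
        self.objects = {"toaster": (0.0, 0.0)}   # fixed landmark

    def place(self, name: str, pos: tuple) -> None:
        self.objects[name] = pos

def randomized_scene() -> Scene:
    scene = Scene()
    if random.random() < 0.5:                    # clutter only appears some of the time
        dx = random.uniform(0.25, 0.40)          # mug lands at a slightly different
        dy = random.uniform(-0.10, 0.10)         # offset from the toaster each episode
        scene.place("mug", (dx, dy))
    return scene

# Every practice episode sees a slightly different kitchen.
episodes = [randomized_scene() for _ in range(1000)]
```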

URDFormer, by contrast, prioritizes generating simulations quickly and cheaply over exact accuracy in a single setting. It takes images found online and pairs them with existing models of how objects such as kitchen drawers and cabinets typically move, letting it quickly produce many varied, generalized kitchen simulations in which researchers can train robots across a broad range of scenarios. The trade-off is that these simulations are less accurate than those RialTo produces.
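
To give a sense of what pairing an image with an established movement model can look like, the sketch below fills a standard URDF joint template with drawer dimensions that might be estimated from a photo of a cabinet. The template and the numbers are illustrative assumptions, not URDFormer's actual output.

```python
# A standard prismatic-joint URDF template for a cabinet with one sliding drawer.
# The dimensions passed in below are made-up examples, not predictions from URDFormer.
URDF_TEMPLATE = """<robot name="cabinet">
  <link name="body"/>
  <link name="drawer"/>
  <joint name="drawer_slide" type="prismatic">
    <parent link="body"/>
    <child link="drawer"/>
    <origin xyz="0 0 {height:.2f}"/>
    <axis xyz="1 0 0"/>
    <limit lower="0.0" upper="{travel:.2f}" effort="50" velocity="0.5"/>
  </joint>
</robot>
"""

def cabinet_urdf(drawer_height_m: float, drawer_travel_m: float) -> str:
    """Fill the joint template with dimensions estimated from an image."""
    return URDF_TEMPLATE.format(height=drawer_height_m, travel=drawer_travel_m)

print(cabinet_urdf(drawer_height_m=0.75, drawer_travel_m=0.40))
```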

“These two approaches can work in tandem,” Gupta mentioned. “URDFormer delivers invaluable training on numerous scenarios. Conversely, RialTo proves especially useful when deploying a pre-trained robot in a home environment, aiming for a success rate of around 95%.”

Looking ahead, the RialTo team hopes to deploy its system in real homes (so far, it has been tested mostly in the lab), and Gupta said he wants to explore incorporating small amounts of real-world training data into these systems to improve their effectiveness.

“Ideally, even a minimal amount of real-world data can resolve issues,” Gupta stated. “However, we still need to determine the best method to merge costly real-world data with more affordable but imperfect simulation data.”

Additional co-authors on the URDFormer paper include UW students Aaron Walsman, Marius Memmel, Alex Fang, Karthikeya Vemuri, and Alan Wu, along with Kaichun Mo, a research scientist at NVIDIA, and Professor Dieter Fox of the Allen School. Additional co-authors on the RialTo paper include MIT doctoral students Marcel Torne, Anthony Simeonov, and Tao Chen, research assistant Zechu Li, and undergraduate student April Chan; Pulkit Agrawal, an assistant professor at MIT, was a co-senior author. The URDFormer research was partially funded by Amazon Science Hub, while RialTo was partially funded by the Sony Research Award, the U.S. Government, and Hyundai Motor Company.