In Star Trek: The Next Generation, Captain Picard and the crew of the U.S.S. Enterprise use the holodeck, an empty room that can create 3D environments, to get ready for missions and have fun by simulating various settings like lush jungles and the London of Sherlock Holmes. These holodeck environments are highly immersive and completely interactive, and they can be customized in countless ways using just language – the crew simply needs to ask. Engineers have now developed a tool that can generate similar 3D environments using AI technology, based on everyday language prompts.
In computer science, interactive virtual environments serve a similar purpose: today they are used to train robots before those robots are deployed in real life, in a process known as “Sim2Real.” Yet such virtual environments are in short supply. According to Yue Yang, a doctoral student in the labs of Mark Yatskar and Chris Callison-Burch, artists must build these environments by hand, and a single one can take a week of time-consuming, decision-heavy work.
There is a scarcity of virtual environments available for training robots to navigate real-world complexities. Neural networks, which power AI systems, need large amounts of data, in this case simulations of the physical world. “Generative AI systems like ChatGPT are trained on trillions of words, and image generators like Midjourney and DALL-E are trained on billions of images,” says Callison-Burch. “We only have a fraction of that amount of 3D environments for so-called ‘embodied AI.’ If we want to use generative AI techniques to develop robots that can safely navigate in real-world environments, then we will need to create millions or billions of simulated environments.”
Enter Holodeck, a system for generating interactive 3D environments co-created by Callison-Burch, Yatskar, Yang and Lingjie Liu, Aravind K. Joshi Assistant Professor in CIS, along with collaborators at Stanford, the University of Washington, and the Allen Institute for Artificial Intelligence (AI2). Named for its Star Trek forebear, Holodeck generates a virtually limitless range of indoor environments, using AI to interpret users’ requests.

Users control Holodeck with plain language, Yang notes: describe whatever environment you want, and the system builds it for training AI agents. Holodeck draws on the knowledge embedded in large language models (LLMs), the systems underlying ChatGPT and other chatbots, which absorb a great deal about spatial design from the enormous amount of text they consume during training. In essence, Holodeck works by holding a conversation with an LLM, using a carefully structured series of hidden queries to break a user’s request down into specific parameters.

Just as Captain Picard might ask Star Trek’s Holodeck to simulate a speakeasy, scientists can ask Penn’s Holodeck to generate “a 1b1b apartment of a researcher who has a cat.” The request is carried out in multiple steps: first the floor and walls are created, then the doorway and windows. Next, Holodeck searches Objaverse, a vast library of premade digital objects, for suitable furnishings: a coffee table, a cat tower, and so on. Finally, Holodeck queries a layout module, which the researchers designed to constrain the placement of objects so that nothing ends up in an awkward position, such as a toilet extending horizontally from a wall.

To assess Holodeck’s realism and accuracy, the researchers compared it to ProcTHOR, an earlier environment generator from AI2, by creating 120 scenes with each tool and asking Penn Engineering students to choose their preferred version without knowing which tool produced it.
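The pipeline above, from a language request to a constrained layout, can be sketched in miniature. In the snippet below, the JSON blob stands in for the structured parameters an LLM might return (the field names are hypothetical, not Holodeck’s actual schema), and the layout step is reduced to a single toy constraint: every object sits flush against a wall, the kind of rule that prevents results like a toilet jutting horizontally into the room.

```python
import json
from dataclasses import dataclass

# Hypothetical example of the structured parameters an LLM might return
# when asked to break down "a 1b1b apartment of a researcher who has a cat".
LLM_RESPONSE = """
{
  "room": {"width": 4.0, "depth": 3.0},
  "objects": [
    {"name": "sofa", "width": 2.0, "depth": 0.9},
    {"name": "coffee table", "width": 1.5, "depth": 0.8},
    {"name": "cat tower", "width": 1.0, "depth": 0.6}
  ]
}
"""

@dataclass
class Placement:
    name: str
    offset: float  # distance along the wall from its corner, in meters
    wall: str

def place_along_walls(spec):
    """Toy layout step: place each object flush against a wall, left to
    right, moving on to the next wall when the current one is full."""
    room, objects = spec["room"], spec["objects"]
    walls = [("south", room["width"]), ("north", room["width"]),
             ("west", room["depth"]), ("east", room["depth"])]
    placements, wall_idx, cursor = [], 0, 0.0
    for obj in objects:
        # Advance to the next wall until the object fits on it.
        while wall_idx < len(walls) and cursor + obj["width"] > walls[wall_idx][1]:
            wall_idx, cursor = wall_idx + 1, 0.0
        if wall_idx == len(walls):
            raise ValueError(f"no wall space left for {obj['name']}")
        placements.append(Placement(obj["name"], cursor, walls[wall_idx][0]))
        cursor += obj["width"]
    return placements

for p in place_along_walls(json.loads(LLM_RESPONSE)):
    print(f"{p.name}: {p.wall} wall, {p.offset} m from the corner")
```

In the real system, the LLM is queried repeatedly for floors, walls, doors, and object choices, the furnishings are retrieved from Objaverse, and the layout module solves far richer spatial constraints than this single flush-to-wall rule.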
In every aspect – asset selection, layout coherence, and overall preference – the students consistently favored the environments generated by Holodeck.

The researchers also tested Holodeck’s ability to create scenes that are less common in robotics research and harder to generate by hand than apartment interiors, such as stores, public spaces, and offices. When comparing Holodeck’s results to those of ProcTHOR, whose scenes are produced by human-authored rules rather than AI-generated text, the researchers again found that human evaluators preferred the scenes created by Holodeck. This preference held across a wide range of indoor environments, from science labs to art studios, locker rooms to wine cellars.
Finally, the researchers used scenes generated by Holodeck to “fine-tune” an embodied AI agent. According to Yatskar, “The best way to test Holodeck is to use it to assist robots in safely interacting with new environments, preparing them to inhabit unfamiliar places.” Across various virtual spaces, including offices, daycares, gyms, and arcades, Holodeck significantly improved the agent’s ability to navigate new areas. For example, when pre-trained using ProcTHOR, the agent found a piano in a music room only about 6% of the time, even after taking around 400 million virtual steps. After being fine-tuned on 100 music rooms generated by Holodeck, the agent found the piano over 30% of the time.
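The benchmark behind those numbers reduces to a simple success-rate metric: across many trial episodes, the fraction in which the agent finds the target object within its step budget. A minimal sketch of that bookkeeping follows; the episode lists are illustrative stand-ins, not the study’s data.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target
    object before exhausting its step budget."""
    if not episodes:
        return 0.0
    return sum(1 for e in episodes if e["found_target"]) / len(episodes)

# Illustrative numbers only, mirroring the rough pre-trained vs.
# fine-tuned gap described in the article (about 6% vs. over 30%).
pretrained = [{"found_target": i < 6} for i in range(100)]
finetuned = [{"found_target": i < 32} for i in range(100)]
print(success_rate(pretrained), success_rate(finetuned))  # 0.06 0.32
```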
“For a long time, the field has been focused on researching residential spaces,” Yang explains. “But there are so many different environments out there, and efficiently generating a large number of environments to train robots has always been a major challenge. Holodeck provides this functionality.”
Journal Reference:
- Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark. “Holodeck: Language Guided Generation of 3D Embodied AI Environments.” Submitted to arXiv, 2024. DOI: 10.48550/arXiv.2312.09067.