Researchers have developed a headphone prototype that lets users hear voices within a customizable sound bubble ranging from 3 to 6 feet. Sounds from outside this bubble are reduced by about 49 decibels, even when they are louder than the sounds inside.
Picture this scenario: You’re at work wearing noise-canceling headphones to minimize background noise. When a co-worker approaches your desk to ask a question, you can hear them clearly without needing to take off your headphones. Meanwhile, the noise from conversations at the other end of the office is barely noticeable. Or consider being in a bustling restaurant, where you can hear every word from your dining companions, while the surrounding clamor is muted.
A research team from the University of Washington has created a headphone prototype that can establish this kind of “sound bubble.” The team paired artificial intelligence algorithms with its headphone hardware so that users hear voices within a customizable radius of 3 to 6 feet, while sounds from outside the bubble are reduced by an average of 49 decibels (roughly the difference in volume between a vacuum cleaner and rustling leaves). Even distant sounds that are louder than those inside the bubble are still quieted.
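As a rough back-of-the-envelope check (using typical textbook noise levels rather than figures from the paper), the 49-decibel reduction can be translated into linear attenuation factors like this:

```python
# Rough sanity check of the decibel figures above; the reference levels are
# typical textbook values, not measurements from the study.
vacuum_db, leaves_db = 75, 25                 # approximate levels of a vacuum cleaner and rustling leaves
print(vacuum_db - leaves_db)                  # ~50 dB gap, close to the reported 49 dB average reduction

reduction_db = 49
pressure_ratio = 10 ** (reduction_db / 20)    # attenuation factor in sound pressure
power_ratio = 10 ** (reduction_db / 10)       # attenuation factor in acoustic power
print(round(pressure_ratio), round(power_ratio))  # roughly 282x in pressure, ~79,000x in power
```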
The researchers shared their results on November 14 in Nature Electronics. They’ve made the code for this proof-of-concept device available for others to develop further. Additionally, the researchers are working on a startup to commercialize this innovative technology.
“Humans struggle to gauge sound distances, especially when faced with multiple noise sources,” explained Shyam Gollakota, a senior author and professor at the Paul G. Allen School of Computer Science & Engineering at UW. “Concentrating on the people around us can be difficult in loud venues like restaurants, so creating sound bubbles in headphones hasn’t been possible until now. Our AI can learn to differentiate the distance of each sound source in a room and process this data in real time, within just 8 milliseconds, on the device itself.”
The research team assembled the prototype using standard noise-canceling headphones, equipping the headband with six small microphones. A neural network, running on a compact onboard computer attached to the headphones, monitors the sounds each microphone picks up. When sounds are detected outside the bubble, the system suppresses them while playing back and slightly amplifying the sounds from within the bubble (since noise-canceling headphones physically let some sound through).
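The sketch below shows, in simplified Python, the kind of frame-based loop this description implies. The placeholder model, frame size, and gain values are assumptions for illustration, not details of the published system:

```python
# A minimal sketch (not the authors' released code) of a frame-based sound-bubble loop:
# read a short frame from six microphones, estimate the "inside the bubble" signal,
# suppress everything else, and slightly boost what remains.
import numpy as np

SAMPLE_RATE = 48_000   # assumed sample rate
FRAME = 384            # 8 ms frames at 48 kHz, matching the stated latency budget
NUM_MICS = 6           # six microphones on the headband

def separation_model(frame: np.ndarray) -> np.ndarray:
    """Placeholder for the neural network: returns the estimated in-bubble signal
    for one 6-channel frame. Here it simply averages the channels as a stand-in."""
    return frame.mean(axis=0)

def process_frame(frame: np.ndarray, boost_db: float = 3.0) -> np.ndarray:
    """Suppress sounds outside the bubble and slightly amplify those inside.

    frame: array of shape (NUM_MICS, FRAME) with raw microphone samples.
    Returns a single-channel frame to play through the headphones.
    """
    inside = separation_model(frame)           # sounds estimated to be within the bubble
    gain = 10 ** (boost_db / 20)               # small boost, since some sound leaks through anyway
    return np.clip(gain * inside, -1.0, 1.0)   # keep samples in a valid range

# Example: run the loop over one second of synthetic 6-channel audio.
audio = np.random.randn(NUM_MICS, SAMPLE_RATE) * 0.01
output = np.concatenate([
    process_frame(audio[:, i:i + FRAME])
    for i in range(0, SAMPLE_RATE - FRAME + 1, FRAME)
])
```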
“Previously, we had developed a smart speaker system that utilized microphones spaced across a table, under the assumption that greater distance was necessary to gather sound distance information,” Gollakota shared. “However, we began to question this assumption. We discovered that we could achieve the desired ‘sound bubble’ just using the microphones on the headphones, and we were able to do this in real-time, which was quite unexpected.”
To train their system for sound bubble creation across different settings, the researchers recognized the need for a dataset based on real-world sound distances, which did not exist. To obtain this data, they placed the headphones on a mannequin head and used a robotic platform to rotate the head while a moving speaker played sounds from varying distances. The team collected information using this mannequin system as well as through trials with human users in 22 diverse indoor environments such as offices and homes.
The researchers identified two key reasons for the system’s effectiveness. First, the wearer’s head reflects sound, which helps the neural network distinguish sounds coming from different distances. Second, sounds such as human speech contain many frequencies, and each frequency goes through different phases as it travels from its source. The researchers believe their AI algorithm compares the phases of these frequencies to estimate the distance of any sound source (such as someone talking).
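The toy calculation below illustrates the second cue under simplified assumptions (two microphones, a free-field point source, made-up geometry). It is not the team’s algorithm, only a demonstration that per-frequency phase differences between microphones carry some distance information in the near field:

```python
# Toy illustration (not the authors' algorithm): in the near field, the phase
# difference between two microphones at each frequency depends on the source's
# distance, not just its direction. Geometry and frequencies are assumed values.
import numpy as np

SPEED_OF_SOUND = 343.0                            # m/s
MICS = np.array([[-0.08, 0.0], [0.08, 0.0]])      # two headband mics, 16 cm apart (assumed)

def phase_vs_frequency(source_xy, freqs_hz):
    """Phase difference (radians) between the two mics at each frequency,
    for a point source at position source_xy (meters)."""
    dists = np.linalg.norm(MICS - np.asarray(source_xy), axis=1)
    delay = (dists[0] - dists[1]) / SPEED_OF_SOUND     # arrival-time difference between mics
    return 2 * np.pi * freqs_hz * delay                # phase shift grows with frequency

freqs = np.array([250.0, 500.0, 1000.0, 2000.0])        # a few speech-band frequencies
direction = np.array([np.sin(np.radians(40)), np.cos(np.radians(40))])
near = phase_vs_frequency(0.3 * direction, freqs)       # source 0.3 m away
far = phase_vs_frequency(3.0 * direction, freqs)        # same direction, 3 m away

# The two phase patterns differ slightly even though the direction is identical;
# a learned model can exploit such distance-dependent structure, along with head reflections.
print(np.round(near, 3))
print(np.round(far, 3))
```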
While headphones like Apple’s AirPods Pro 2 can amplify the voice of a person directly in front of the user while reducing some background noise, they do so by tracking head position and boosting sound from a particular direction rather than gauging distance. As a result, they cannot enhance multiple speakers at once, lose their effect if the user turns away from the target speaker, and are less effective at reducing loud sounds that come from that speaker’s direction.
Currently, the system is designed for indoor use only, since capturing clean audio for training outdoors proves challenging. Next, the researchers aim to adapt the technology for use in hearing aids and noise-canceling earbuds, necessitating a new approach for microphone placement.
Other contributors to this research include Malek Itani and Tuochao Chen, doctoral students at UW’s Allen School; Sefik Emre Eskimez, a senior researcher at Microsoft; and Takuya Yoshioka, director of research at AssemblyAI. This research was supported by a Moore Inventor Fellow award, funding from the UW CoMotion Innovation Gap Fund, and the National Science Foundation.