Researchers Move Closer to Giving Voice to Those Who Cannot Speak
A “new voice” for people who cannot verbally communicate, such as those living with amyotrophic lateral sclerosis (ALS) or recovering from a stroke, is a step closer to becoming reality.
Researchers from Columbia University’s Zuckerman Institute have developed a computer algorithm that can read the brain waves accompanying a person’s thoughts and transform them into intelligible speech.
This breakthrough, which combines the power of speech synthesizers and artificial intelligence, may provide valuable data for the development of the next generation of speech brain-computer interface systems.
This brain translator algorithm was described in a study, “Towards reconstructing intelligible speech from the human auditory cortex,” published in the journal Scientific Reports.
“Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating,” Nima Mesgarani, PhD, principal investigator at the Mortimer B. Zuckerman Mind Brain Behavior Institute and senior author of the study, said in a news release. “With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”
When people speak, or even think about speaking, telltale patterns of activity appear in their brains. Distinct, recognizable signals also emerge when we listen to someone speak, or imagine listening.
Researchers have long tried to decode these patterns into something easily recognizable as speech, but progress has been held back largely by the limitations of the simple computer models used in earlier attempts.
To overcome this, Mesgarani’s team set out to develop a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people speaking.
“This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions,” said Mesgarani, who is also an associate professor of electrical engineering at Columbia’s Fu Foundation School of Engineering and Applied Science.
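To make the synthesis stage concrete, the sketch below shows how a waveform can be recovered from a magnitude spectrogram, the kind of spectral representation a vocoder turns into speech. It uses Griffin-Lim phase reconstruction from the librosa library on a synthetic tone; this is an illustrative stand-in, not the vocoder the Columbia team built.

```python
# Minimal sketch of spectrogram-to-waveform synthesis, the final stage of a
# vocoder pipeline. Griffin-Lim stands in for the study's actual vocoder;
# the synthetic tone stands in for recorded speech.
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 220 * t)  # stand-in for recorded speech

# Analysis: magnitude spectrogram, the representation a brain decoder
# would be trained to predict.
S = np.abs(librosa.stft(audio, n_fft=512, hop_length=128))

# Synthesis: estimate the missing phase and recover an audible waveform.
reconstructed = librosa.griffinlim(S, n_iter=32, hop_length=128)
```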
The team recruited five patients who were undergoing brain surgery to treat epilepsy and recorded their brain waves as the patients listened to continuous speech sounds. The collected data was then used to teach the vocoder how to interpret those patterns.
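In outline, that training step amounts to learning a mapping from neural activity to speech features. The sketch below uses ridge regression on synthetic data to show the structure of the problem; the study itself used deep neural networks and real electrode recordings, and every shape and name here is an assumption.

```python
# Hypothetical sketch of decoder training: map per-frame neural features to
# spectrogram frames. Ridge regression and synthetic data stand in for the
# study's deep neural networks and electrode recordings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_frames, n_electrodes, n_freq_bins = 5000, 128, 257  # assumed sizes

X = rng.standard_normal((n_frames, n_electrodes))           # neural features
W = 0.1 * rng.standard_normal((n_electrodes, n_freq_bins))  # fake ground truth
Y = X @ W + 0.05 * rng.standard_normal((n_frames, n_freq_bins))  # spectrogram frames

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

decoder = Ridge(alpha=1.0).fit(X_train, Y_train)
print(f"held-out R^2: {decoder.score(X_test, Y_test):.3f}")

# In a full pipeline, predicted frames would be handed to the vocoder
# (e.g., the Griffin-Lim sketch above) to synthesize audible speech.
```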
To test the system, the team had the patients listen to two men and two women reciting the numbers zero through nine; the brain signals recorded during listening were then run through the vocoder and cleaned up by neural networks. The result was a robotic-sounding voice reciting the numbers back, which 11 volunteers with normal hearing identified correctly 75% of the time.
“We found that people could understand and repeat the sounds about 75 percent of the time, which is well above and beyond any previous attempts,” Mesgarani said. “The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy.”
The team plans to develop the brain-decoding algorithm further, testing it on more complicated words and sentences. They expect the system could one day be part of an implant that translates a person’s thoughts directly into words.
“In this scenario, if the wearer thinks, ‘I need a glass of water,’ our system could take the brain signals generated by that thought and turn them into synthesized, verbal speech,” Mesgarani said. “This would be a game changer. It would give anyone who has lost their ability to speak, whether through injury or disease, the renewed chance to connect to the world around them.”