IISc trying to teach machines to understand voice amid noise

By combining machine learning with neuroscience, the lab is not only training systems to identify speech and emotion, but also uncovering how the brain processes sound.
A view of IISc, Bengaluru. (File Photo | Express)

BENGALURU: At a lab in the Indian Institute of Science’s (IISc) Department of Electrical Engineering, Dr Sriram Ganapathy is trying to teach machines to do what most humans do easily — understand speech amid real-world noise.

At his Learning and Extraction of Acoustic Patterns (LEAP) lab, he focuses on decoding how humans speak, listen and interpret one another, particularly in noisy environments where current artificial intelligence (AI) models fail. His research holds promise for smarter hearing aids in the future.

Over the past nine years, the LEAP lab has worked on everything from speech recognition systems that function in noisy settings to studies of how the brain distinguishes between two speakers in a conversation. Its research has attracted collaborations with companies such as Samsung, Sony and Google.

More recently, the LEAP team has been trying to use large language models (LLMs) — the same kind that power ChatGPT — to detect emotions in speech, a task that even the best AI tools still find difficult. The lab has also run experiments to study how humans can tell when a new person starts talking.

In one test, people had to press a button when they noticed a change in speaker. Interestingly, those who did not understand the language were quicker at identifying the change. “If the language is unknown, the brain pays more attention to the voice and tone, not the meaning,” explained Ganapathy, adding that this insight could help design smarter hearing aids.

A big part of Ganapathy’s work is in “representation learning”, where machines are trained to pick up patterns from audio. These patterns help machines identify different voices, accents and emotions — even when the speech is not very clear. Apart from better hearing aids, this research can help build virtual assistants that sound more human.
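
For readers curious what representation learning looks like in practice, the sketch below (not the LEAP lab’s own code) uses a publicly available self-supervised speech model to turn a raw audio clip into a sequence of embedding vectors that downstream systems could use to tell voices, accents or emotions apart; the model name and the input file are illustrative assumptions.

```python
# Minimal sketch of audio representation learning with a pretrained model.
# Assumptions: the "facebook/wav2vec2-base" checkpoint and the file "clip.wav"
# are placeholders, not anything used by the LEAP lab.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

waveform, sr = torchaudio.load("clip.wav")                      # raw audio
waveform = torchaudio.functional.resample(waveform, sr, 16_000) # model expects 16 kHz

inputs = extractor(waveform.squeeze().numpy(),
                   sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    frames = model(**inputs).last_hidden_state   # (1, num_frames, 768) learned patterns

clip_embedding = frames.mean(dim=1)              # one vector summarising the whole clip
```

A vector like this, rather than the raw waveform, is what a speaker-identification or emotion-recognition system would typically compare and classify.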

The lab is also working on explainable AI (XAI), which examines how trustworthy and understandable an AI system’s decisions are. In one project, the lab trained image-recognition models to highlight the visual regions that correspond to words in a caption, and the approach performed better than older models. Similar techniques can be applied to speech models, helping AI explain its answers more clearly.

During Covid-19, Ganapathy’s team created Coswara, an app that used cough and voice sounds to detect possible Covid infections. The model was trained on samples from people across India. Although the Indian Council of Medical Research (ICMR) discussed scaling it up, the project did not move forward, and the team published its findings in 2023.

Ganapathy’s journey began in Kerala, where he studied Electronics and Telecommunications at the College of Engineering, Trivandrum (CET). He later joined IISc for an MTech in signal processing.

Everything changed when he attended a talk by Hynek Hermansky of EPFL (the Swiss Federal Institute of Technology in Lausanne), Switzerland. Ganapathy contacted him and joined his lab there. After completing his PhD, he worked at the IBM T J Watson Research Center.
