IIIT-Hyderabad researchers kick off mammoth crowdsourcing speech project

Over 50 per cent of the Indian population uses devices that are embedded with AI-based speech recognition technology.

Published: 08th January 2021 08:24 AM  |   Last Updated: 08th January 2021 08:32 AM

International Institute of Information Technology, Hyderabad

By Express News Service

HYDERABAD: Researchers at the International Institute of Information Technology, Hyderabad (IIIT-H) on Thursday kicked off one of the largest crowdsourcing speech projects to connect voice with vernacular languages, including Telugu, to build an automatic speech recognition system.  

Over 50 per cent of the Indian population uses devices that are embedded with AI-based speech recognition technology. But in this multi-lingual country, which has 22 official languages and 12 different scripts, these voice-enabled devices are dominated by English-speaking assistants.   

As a pilot, the team is inviting volunteers to contribute Telugu speech data. The aim is to collect around 2,000 hours of spoken Telugu over the course of one year. IIIT-H has joined forces with the government to develop an Automatic Speech Recognition (ASR) module for the translation of Indian languages. The project is being led by Prakash Yalla, head of the Technology Transfer Office, and Dr Anil Kumar Vuppala, an associate professor at the Speech Processing Centre. 

However, building AI-enabled automatic speech recognition systems requires thousands of hours of speech data, along with transcribed text of the same, for each language. “We have been working on speech recognition technology for the last 10 years and have collected data too, but it is only of 50-60 hours. The challenge is not limited to the audio or speech file. We need to fragment the speech files and write them down in the form of text, which is a laborious process,” Dr Vuppala said. In line with Prof. Raj Reddy’s vision of reaching out to the common man, conversational AI assumes importance.

Hence, datasets containing speech recorded in as natural a setting as possible are crucial. For this, the project is turning to crowdsourcing of speech data as a cost-effective measure. “We plan on liaisoning with academic institutions across TS and AP, and conducting Just-A-Minute and debate competitions. Another approach is via the existing Telugu Wikipedia community which consists of learned scholars and lovers of the language,” Prakash said.

Talking about the likely challenges, Dr Vuppala said, “The volunteers should have relevant experience in transcription too, as it will reflect in the quality of the ASR we are building.” The initial collection of Telugu speech data will help establish protocols and systems for crowdsourcing of data for all Indian languages. 
