IIIT-Hyderabad researchers kick off mammoth crowdsourcing speech project
Over 50 per cent of the Indian population uses devices that are embedded with AI-based speech recognition technology.
HYDERABAD: Researchers at the International Institute of Information Technology, Hyderabad (IIIT-H) on Thursday kicked off one of the largest crowdsourced speech projects, which will collect voice data in vernacular languages, including Telugu, to build an automatic speech recognition system.
Over 50 per cent of the Indian population uses devices that are embedded with AI-based speech recognition technology. But in this multi-lingual country, which has 22 official languages and 12 different scripts, these voice-enabled devices are dominated by English-speaking assistants.
As a pilot, the team is inviting volunteers to contribute Telugu speech data. The idea is to collect around 2,000 hours of spoken Telugu over the course of one year. IIIT-H has joined forces with the government to develop the Automatic Speech Recognition (ASR) module for the translation of Indian languages. The project is being led by Prakash Yalla, the head of the Technology Transfer Office, and Dr Anil Kumar Vuppala, an associate professor at the Speech Processing Centre.
Building AI-enabled automatic speech recognition systems, however, requires thousands of hours of speech data, along with transcriptions of that speech, for each language. “We have been working on speech recognition technology for the last 10 years and have collected data too, but it is only of 50-60 hours. The challenge is not limited to the audio or speech file. We need to fragment the speech files and write them down in the form of text, which is a laborious process,” Dr Vuppala said. With Prof. Raj Reddy’s vision of reaching out to the common man, conversational AI assumes importance.
Hence, datasets containing speech in as natural a setting as possible are crucial. For this, the project is looking to crowdsourcing as a cost-effective way of collecting speech data. “We plan on liaisoning with academic institutions across TS and AP, and conducting Just-A-Minute and debate competitions. Another approach is via the existing Telugu Wikipedia community which consists of learned scholars and lovers of the language,” Prakash said.
Talking about the likely challenges, Dr Vuppala said, “The volunteers should have relevant experience in transcription too, as it will reflect in the quality of the ASR we are building.” The initial collection of Telugu speech data will help establish protocols and systems for crowdsourcing of data for all Indian languages.