IIIT-B students develop web search in colloquial Kannada

The application is based on community radio recordings by an organisation, ‘Namma Halli Radio’.
BENGALURU: India’s rural areas hold rich knowledge that is often lost or remains inaccessible on the internet because it is primarily oral. To bridge this gap, Aparna Madva, a Master’s in Data Science student at the International Institute of Information Technology, Bangalore (IIIT-B), has introduced Graama-Kannada Audio Search, a search engine built on colloquial audio content in Kannada.

The application is based on community radio recordings by an organisation, ‘Namma Halli Radio’. The recordings include interactions with villagers and the community residing in Tumakuru district.

“Small communities have immense knowledge, but don’t store it formally or write books on it. This audio corpus was created through a local radio show which greatly helps even people with low literacy levels,” explained Aparna.

She added that the audio search engine builds on the principles of Large Language Models (LLMs), with the training approach adapted to the task, and was developed under the guidance of Srinath Srinivasa, Professor and Dean (R&D), Web Science Lab, IIIT-B. “We fine-tune state-of-the-art Automatic Speech Recognition (ASR) models using limited audio data to reduce the Word Error Rate (WER) for colloquial audio data to acquire transcripts for the audio, followed by creating an interface to search for keywords using simple fuzzy matching technique for n-gram inputs,” Aparna said. The fuzzy match tolerates spelling errors in the query and still returns appropriate results.
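The article does not say which fuzzy matching algorithm the team uses, so the following is only a minimal sketch of the general idea, assuming Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the segment ids, the 0.8 threshold and the sample transcripts (written in Latin script for readability) are all hypothetical and not taken from the project.

```python
from difflib import SequenceMatcher


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fuzzy_search(query, transcripts, threshold=0.8):
    """Find transcript segments containing an n-gram similar to the query.

    `transcripts` maps a hypothetical segment id to its transcribed text.
    The query is compared against n-grams of the same word count, so a
    two-word query is matched against every two-word window of the text.
    Returns (segment_id, matched_text, score) tuples, best score first.
    """
    query = " ".join(query.lower().split())
    n = len(query.split())
    if n == 0:
        return []
    hits = []
    for segment_id, text in transcripts.items():
        best_score, best_gram = 0.0, None
        for gram in ngrams(text.lower().split(), n):
            # ratio() is 1.0 for identical strings and falls towards 0.0
            # as the strings diverge; the threshold is an assumed cut-off.
            score = SequenceMatcher(None, query, gram).ratio()
            if score > best_score:
                best_score, best_gram = score, gram
        if best_gram is not None and best_score >= threshold:
            hits.append((segment_id, best_gram, best_score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Invented sample data; the real corpus is transcribed Kannada audio.
transcripts = {
    "clip-01": "the old temple near the lake was built by our elders",
    "clip-02": "boil neem leaves in water as a remedy for fever",
}

# A misspelled one-word query still finds the clip that mentions the temple.
results = fuzzy_search("tempel", transcripts)
```

In this sketch, a misspelled query such as "tempel" is still close enough to "temple" to clear the threshold, which is the behaviour the article describes. A production system would likely need a measure better suited to Kannada script and an index rather than a linear scan.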

With this model, the team aimed to build a search engine that anyone can use by entering relevant keywords. Popular concepts have also been compiled and offered as options to make searching easier. The web application was built on five hours of audio recordings from the local radio, which yielded around 150 pages of text.

Users can type in the English script and have it converted into Kannada, or type directly in Kannada, and the application will display results from the audio corpus. Explaining the real-world impact of this model, Aparna said that if a villager wants to know more about a nearby temple, they can simply search for it. “It will display all the times the temple has been mentioned and they can learn more about it. From education and health to ancient remedies and bedtime stories, the audio will help villagers develop their local knowledge. The next step is to work on a voice-based search mechanism and not just typing, as it is meant for rural areas. These individuals have very little formal education. Through this feature, we want to reach the last mile,” said Aparna.

Team members

Sharath Srivatsa (PhD Scholar, Web Science Lab, IIIT Bangalore)

Sai Madhavan G (iMTECH student, IIIT Bangalore)

TB Dinesh (iruWay Rural Research Lab, Janastu)

The New Indian Express
www.newindianexpress.com