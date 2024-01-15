BENGALURU: India’s rural areas hold rich knowledge that is often lost or never reaches the internet because it is primarily oral. To bridge this gap, Aparna Madva, a Master’s student in Data Science at the International Institute of Information Technology, Bangalore (IIIT-B), has introduced Graama-Kannada Audio Search, a search engine built on colloquial audio content in Kannada.

The application is based on community radio recordings by an organisation, ‘Namma Halli Radio’. The recordings include interactions with villagers and the community residing in Tumakuru district.

“Small communities have immense knowledge, but don’t store it formally or write books on it. This audio corpus was created through a local radio show which greatly helps even people with low literacy levels,” explained Aparna.

She added that the audio search engine was developed on the principles of Large Language Models (LLMs), with the training approach adjusted slightly, under the guidance of Srinath Srinivasa, Professor and Dean (R&D), Web Science Lab, IIIT-B. “We fine-tune state-of-the-art Automatic Speech Recognition (ASR) models using limited audio data to reduce the Word Error Rate (WER) for colloquial audio data to acquire transcripts for the audio, followed by creating an interface to search for keywords using simple fuzzy matching technique for n-gram inputs,” Aparna said. The fuzzy matching tolerates spelling errors in the query and still returns relevant results.
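
For readers curious how fuzzy matching over n-gram inputs might work, here is a minimal sketch in Python. It is not the project's actual code: the function names, the 0.8 similarity threshold, the sample transcripts and the use of Python's standard `difflib` module are all assumptions made for illustration. The idea is to slide a window of as many words as the query contains across each transcript and keep the windows that are similar enough to the query, so a misspelt keyword can still find its clip.

```python
from difflib import SequenceMatcher


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fuzzy_search(query, transcripts, threshold=0.8):
    """Return (clip_id, matched_text, score) tuples, best match first.

    transcripts: dict mapping a hypothetical clip id to its transcript text.
    threshold: assumed similarity cutoff between 0 and 1 (not from the article).
    """
    query = " ".join(query.lower().split())
    n = len(query.split())
    if n == 0:
        return []  # an empty query matches nothing
    hits = []
    for clip_id, text in transcripts.items():
        best = None
        # Compare the query with every n-word window of the transcript.
        for window in ngrams(text.lower().split(), n):
            score = SequenceMatcher(None, query, window).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (window, score)
        if best is not None:
            hits.append((clip_id, best[0], best[1]))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Invented sample transcripts, transliterated for readability.
sample = {
    "clip1": "ragi bele chennagi bandide",
    "clip2": "male baralilla",
}
results = fuzzy_search("raagi bele", sample)  # query contains a spelling slip
```

In this sketch the misspelt query “raagi bele” still retrieves the first clip, because the two-word window “ragi bele” differs from it by a single character. A real system would run the same comparison on Kannada-script transcripts produced by the ASR model.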