THIRUVANANTHAPURAM: In a major breakthrough in multilingual automatic speech recognition (ASR) -- the technology that transcribes speech in more than one language -- three researchers from the state have detected a flaw in the evaluation suites of AI models used by multinational companies to transcribe speech in Indic scripts.
AI companies usually claim that these models are precise because they are automated. However, research carried out by a team led by Elizabeth Sherly, head of the Virtual Resource Centre for Language Computing (VRCLC) at the Digital University Kerala, alongside computational linguist Kavya Manohar and research scientist Leena G Pillai, found that AI systems failed to evaluate Indian languages, including Malayalam, Hindi and Tamil, accurately.
“In Google Translate, we can find errors when Malayalam, which has many vowel signs, is used,” Elizabeth told TNIE.
“When we tested it, we found that these ASR models didn’t recognise the vowel signs correctly. AI systems usually remove the comma and the full stop at the processing stage. However, we found that accuracy is very low at the evaluation stage too, with the failure to recognise vowel signs. When we corrected the error, the system responded positively,” she said.
The accuracy error was found in ASR models designed to transcribe speech into text, including OpenAI’s Whisper, Meta’s MMS and Seamless, and AssemblyAI’s Conformer.
“All companies claim that their ASR models based on AI are accurate,” said Kavya.
“However, there are instances in which these models fail to recognise Indic languages. We first tested OpenAI’s accuracy-checking process by preparing a standard speech sample and its expected transcript in Malayalam. We found that their accuracy-checking process fails to account for vowel signs.”
The researchers carried out the same test on other ASR models in Tamil and Hindi and found the vowel signs missing there too. The English and Finnish language models, however, did not have the issue.
According to Kavya, there is a reason for this: when the evaluation programme was written for English, the companies stripped out the comma and other punctuation marks.
“Hence, in English, when speech is transcribed into text, only those punctuation marks are removed. In the case of Indic languages, however, it removes the vowel signs as well. So, during the evaluation of a transcription of the words ‘Digital University’ in Malayalam, the text would be processed as ‘DAJATTAL YANAVAZHASATTA’, losing its readability. This leads to errors that go undetected, even as the companies claim the systems are accurate. This issue was detected in the Thai language too,” she said.
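The mechanism Kavya describes can be sketched in a few lines of Python. The function below is purely illustrative, not any company’s actual evaluation code: it mimics an English-centric cleanup step that strips every Unicode “mark” character. English text, which has no combining marks, passes through unchanged, while a Malayalam word loses its vowel signs and virama and is reduced to an unreadable consonant skeleton.

```python
import unicodedata

def naive_normalize(text: str) -> str:
    """Illustrative English-centric normalizer (hypothetical, for demonstration).

    Drops every character whose Unicode general category starts with 'M'
    (marks). In Malayalam, vowel signs (category Mc) and the virama
    (category Mn) fall in this class, so they are silently deleted.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        c for c in decomposed
        if not unicodedata.category(c).startswith("M")
    )

# English is unaffected: there are no marks to strip.
print(naive_normalize("digital university"))   # digital university

# Malayalam for "digital" loses its vowel signs and virama.
print(naive_normalize("ഡിജിറ്റൽ"))              # ഡജററൽ
```

Because the mangled reference and the mangled model output tend to lose the same characters, an error-rate score computed after this step can look far better than the transcription actually is.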
The discovery has caught the interest of international researchers as well. The team virtually presented a research paper, ‘What is lost in Normalisation? Exploring Pitfalls in Multilingual ASR Model Evaluation’, at the prestigious international conference EMNLP (Empirical Methods in Natural Language Processing) 2024, held in Florida in the US. The Association for Computational Linguistics (ACL) also awarded the team a grant to support their presentation.