IIT Madras faculty develop AI to process text in 11 Indian languages

They released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi
IIT Madras (File photo | EPS)

CHENNAI: Faculty at the Indian Institute of Technology Madras (IIT-M) have developed Artificial Intelligence (AI) models and datasets to process text in 11 Indian languages. According to a statement issued by the institute, the work was carried out jointly with ‘AI4Bharat,’ a platform for building AI solutions for local problems.

Elaborating on the initiative, Mitesh M Khapra of the Department of Computer Science and Engineering said, “As we move towards a digital economy, it is important that our languages find a space online. This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages.”

For example, imagine a learner who posts a question on an e-learning platform in Tamil, Hindi or another Indian regional language. There is a need for tools that can automatically process such questions written in Indian languages and classify them into specific topics.

The researchers released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi. The statement added that an accompanying research paper describing the methodology and evaluation has been accepted at EMNLP-Findings, a companion publication of one of the top Natural Language Processing conferences.

AI4Bharat is an initiative co-founded by Khapra and Pratyush Kumar of IIT-M that works to solve India-specific problems in a community-driven, open-source manner. Both are also associated with the Robert Bosch Centre for Data Science and Artificial Intelligence.

Over the past year, a team of researchers comprising students, faculty and volunteers from IIT Madras and AI4Bharat worked on collecting data and training powerful models for processing text written in Indian languages, the statement said, adding that the models take advantage of the similarities between Indian languages to make efficient use of data. These open-source models are freely available and can be downloaded from a GitHub repository (https://indicnlp.ai4bharat.org/).

The New Indian Express
www.newindianexpress.com