IIT Madras faculty develop AI to process text in 11 Indian languages

They released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi

Updated on: 22 Sep 2020, 7:12 pm

CHENNAI: Indian Institute of Technology Madras (IIT-M) faculty have developed Artificial Intelligence (AI) models and datasets to process texts in 11 Indian languages. According to a statement issued by the institute, this was taken up jointly with ‘AI4Bharat,’ a platform for building AI solutions for local problems.

Elaborating on this initiative, Mitesh M Khapra of the Department of Computer Science and Engineering said, “As we move towards a digital economy, it is important that our languages find a space online. This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages.”

For example, imagine a learner who posts a question on an e-learning platform in Tamil, Hindi or another Indian regional language. There is a need for tools that can automatically process such questions written in Indian languages and classify them into specific topics.

The team released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi. The statement added that an accompanying research paper describing the research methodologies and evaluation has been accepted at EMNLP-Findings, a companion publication of one of the top Natural Language Processing conferences.

AI4Bharat is an initiative co-founded by Khapra and Pratyush Kumar of IIT-M that works to solve India-specific problems in a community-driven, open-source manner. Both are also associated with the Robert Bosch Centre for Data Science and Artificial Intelligence.

Over the past year, a team of researchers comprising students, faculty and volunteers from IIT Madras and AI4Bharat worked on collecting data and training powerful models for processing text written in Indian languages, the statement said, adding that the AI took advantage of the similarities between Indian languages to make efficient use of data. These open-source models are freely available and can be downloaded from a GitHub repository (https://indicnlp.ai4bharat.org/).
