The tussle with AI over language

Learning, speculation and dreams are embedded in language. It’s also a primary resource for AI to stay relevant. Tech firms training their models on language content owe it to the community to support language learning.
Image used for representational purposes only.

Every once in a while, I have an argument standing at my doorstep. It’s with a delivery agent who asks me for an OTP. The package has been ordered by a family member, who is not at home or is inaccessible at the moment. I point out to them that they are at my door, the address is right and the package is paid for. I am informed that without the OTP the delivery cannot be closed in their system. I offer to sign a paper, like we do for India Post deliveries. However, rules are rules and processes are processes, so they leave with the package, saying they will attempt to deliver the next day. Our tech-driven life sometimes leaves no room for common sense and practicality.

Most people are smitten by technology, especially technology driven by artificial intelligence. It has indeed made many aspects of decision-making smoother and faster. Recently, the head of the tech giant Microsoft expressed a dislike for the term artificial intelligence. He is quoted as having said, “Because I have my intelligence. I don’t need any artificial intelligence.” He also said ‘different intelligence’ would be a better term. It is doubtful whether a change in nomenclature would change the game. Rather, there is a need to assert the stupendous dimensions of human intelligence.

The ability to use complex language is a distinct feature of the human species. Language is not merely a means of communication. It is a repository of knowledge, a tool of creativity and the medium of cultural and social bonding. It is unsurprising that AI machines are also being trained to acquire language skills. With the advent of large language models or LLMs, AI is able to generate content, which is proving helpful in analysis, writing project reports and even class assignments for students.

While the ease of access to resources and the reduction in time consumed are obvious advantages, there are serious concerns about the quality of learning and originality in thinking. There are also ethical questions about how much of the output is plagiarism. The flaws in the output are often bizarre. A recent campaign in the UK, ‘I am not a typo’, highlights the distortions of autocorrect for African, Asian and eastern European names. The predictive text becomes presumptive, armed with suggestions and altering the output of its own sweet will.

Many old-timers, irrespective of their accents and speaking styles, take pride in their writing skills in a second or third language, not merely in their mother tongue. Schoolteachers of a bygone era drilled the rigours of grammar and syntax, spelling and punctuation with great thoroughness. Learning languages meant listening to stories, reciting poems and enacting plays. The culture and tradition relating to the language came alive in this pedagogy. However, the easily available tech tools for grammar, syntax, spell checks and translation across software platforms have made knowing and learning even the basic rules seem unnecessary. While technology may help overcome functional barriers, it acts as a disincentive to language learning.

The National Education Policy lays emphasis on multilingual flexibility. In reality, language subjects are low-priority for students, especially while preparing for competitive exams. An emphasis on science, technology, engineering and math in the curriculum offers a ticket to a professional course and future employment. The result is a gradual attrition in language skills. Language and literature are a substratum of human culture. The Indian Constitution recognises 22 major ‘scheduled’ languages. These are the major literary languages with a considerable volume of writing in them. Human learning, speculations and dreams are embedded in language. It cannot be evaluated merely for its utility in terms of job procurement.

Language is a primary resource that AI needs to stay relevant. It is trained on terabytes of data obtained by scraping the internet. However, language is more than a token or data. When AI churns out a sentence, it is matching patterns in the data. While this is creditable, it is not the same as human use of language with an understanding of the meaning. It is ironic that while we increasingly depend on AI models for numerous transactions, AI models feed off human intellectual capital acquired over the years. Therefore, tech companies training their LLMs on content that is a product of human thought owe it to the community to sponsor and fund core language learning, as they largely use this content without paying for it.

There is also concern among big AI players about a possible data crunch for training future models. The data available on the internet is finite. Once the models are trained on the available data, they need larger swathes. If future models are trained on AI-generated content or synthetic data, this will only dilute the quality of subsequent iterations, as there will be no residue of originality. The human capacity to use language to store knowledge and push the boundaries of thought with the use of imagination is potentially limitless. Technology is on an upward trajectory. However, it is a derivative of just one aspect of human knowledge. There is a need to consciously decouple from technology and reclaim a part of our lives to celebrate the epiphanies of art, music and literature.

Geetha Ravichandran

Former bureaucrat and author, most recently, of The Spell of the Rain Tree

(The author is not a technology expert. Views are personal)



The New Indian Express