How AI empowers The Hindu to ask questions

Mar 25, 2026 at 06:00 pm by admin


The figures are astronomic… but so are the results The Hindu is achieving with AI embedded into the Indian masthead’s data journalism.

Delegates at WAN-Ifra’s AI in Media forum in Bangalore heard from deputy national editor Srinivasan Ramani how large language models are helping reporters process vast document sets, write scripts and build interactive tools.

He said the goal was not automated storytelling, but expanding the scale and speed of investigations, WAN-Ifra research editor Neha Gupta reports.

In recent months, journalists at The Hindu parsed nearly 22 million voter records across three Indian states, built an election results interface without writing a line of code manually, and assembled low-cost heat sensors to measure how different workers experience extreme temperatures.

Much of that work, Ramani said in Bangalore, was accelerated by LLMs used not to generate prose but to process documents, write code and structure investigations. “AI is a very sophisticated intern,” he said. “You tell it exactly what to do. It does it. But you remain in control.”

One of the most extensive projects examined India’s ‘special intensive revision’, a periodic update of voter rolls conducted by the Election Commission. In the latest round, authorities released records listing deleted voters and the reasons cited.

The data was not analysis-ready, but came in the form of image-based PDFs – effectively photographs of forms – in Hindi.

In Bihar alone, the team processed around 90,000 files covering 65 lakh (6.5 million) records. Tamil Nadu involved roughly 78,000 files and 97 lakh (9.7 million) records; West Bengal about 80,000 files and 58 lakh (5.8 million) records. In total, the three states accounted for roughly 22 million (2.2 crore) records.
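The state figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are assumptions, and the inputs are the rounded counts quoted above (1 lakh = 100,000; 1 crore = 10 million).

```python
# Rounded record counts quoted in the article, in lakh (1 lakh = 100,000).
LAKH = 100_000
CRORE = 10_000_000

records_lakh = {"Bihar": 65, "Tamil Nadu": 97, "West Bengal": 58}

total_records = sum(records_lakh.values()) * LAKH

print(f"{total_records / 1_000_000:.1f} million")  # 22.0 million
print(f"{total_records / CRORE:.1f} crore")        # 2.2 crore
```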

The newsroom used OCR to convert image-based files into machine-readable text, translated them into English and stored the results in databases. Ramani relied on LLMs to generate SQL queries through natural-language prompts rather than writing database commands manually.
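As a rough illustration of the last two steps, the sketch below stores a few rows in an in-memory SQLite database and runs a query of the kind a model might return for a plain-language prompt. It is not The Hindu's code: the table, column names and rows are invented, the database engine is an assumption, and the OCR and translation stages are taken as already done.

```python
import sqlite3

# Hypothetical rows standing in for text already extracted by OCR and
# translated into English. Columns and values are invented for illustration.
rows = [
    ("Bihar", "Booth 12", "F", 34, "Shifted"),
    ("Bihar", "Booth 12", "M", 61, "Deceased"),
    ("Bihar", "Booth 40", "F", 45, "Deceased"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deleted_voters ("
    "state TEXT, booth TEXT, gender TEXT, age INTEGER, reason TEXT)"
)
conn.executemany("INSERT INTO deleted_voters VALUES (?, ?, ?, ?, ?)", rows)

# Plain-language prompt: "How many deleted voters are listed for each reason?"
# A query of the kind a model might generate in response:
query = """
    SELECT reason, COUNT(*) AS deletions
    FROM deleted_voters
    GROUP BY reason
    ORDER BY deletions DESC, reason
"""
counts_by_reason = conn.execute(query).fetchall()
print(counts_by_reason)
```

Because the generated SQL is short and runs against a known table, a reporter can read it and spot-check the counts against the source files before relying on it.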

The analysis surfaced patterns that prompted further reporting. For example, in Bihar, more women than men appeared to have been deleted from voter rolls despite higher male out-migration. And in several polling booths, large shares of deleted voters were marked as deceased even though many were under 50.
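Both patterns come down to simple aggregate checks once the records are in a table. The following is a minimal sketch under stated assumptions: the schema and the six sample rows are made up, and the queries are one plausible way to frame the checks, not the newsroom's actual analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deleted_voters "
    "(booth TEXT, gender TEXT, age INTEGER, reason TEXT)"
)
# Invented sample rows; real records would come from the processed PDFs.
conn.executemany(
    "INSERT INTO deleted_voters VALUES (?, ?, ?, ?)",
    [
        ("A", "F", 32, "Deceased"),
        ("A", "F", 47, "Deceased"),
        ("A", "M", 71, "Deceased"),
        ("A", "F", 29, "Shifted"),
        ("B", "M", 66, "Deceased"),
        ("B", "F", 38, "Shifted"),
    ],
)

# Check 1: how many deletions fall on each gender?
by_gender = dict(
    conn.execute(
        "SELECT gender, COUNT(*) FROM deleted_voters GROUP BY gender"
    ).fetchall()
)

# Check 2: per booth, what share of voters marked 'Deceased' were under 50?
# In SQLite the comparison (age < 50) evaluates to 1 or 0, so AVG gives a share.
under_50_share = dict(
    conn.execute(
        """
        SELECT booth, AVG(age < 50)
        FROM deleted_voters
        WHERE reason = 'Deceased'
        GROUP BY booth
        """
    ).fetchall()
)

print(by_gender)
print(under_50_share)
```

A high share in the second check is a lead rather than a finding; it flags booths where ground reporting might be worthwhile.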

Scrutiny widened after the Supreme Court of India directed the Election Commission to release full deletion records. The Hindu built a searchable database of deleted names and reasons and published separate state-level investigations.

“These were not conclusions drawn by AI,” Ramani said. “The hypothesis was ours. The political and social context was ours. AI helped us process the scale.”

The findings were discussed in Parliament and in court proceedings, and in Bihar, some corrections to voter rolls followed public scrutiny and ground reporting.

Ramani said AI use extended beyond document processing. For India’s 2019 and 2024 general elections – national parliamentary polls – the team built interactive maps allowing users to filter results by region, state, rural-urban classification and urban clusters.

The application used JavaScript, HTML and D3, but Ramani did not manually write the code. “I did not write a single line myself,” he noted. “The entire application was built over two weeks using prompts in ChatGPT, Gemini and Claude.”

The team collected publicly available election data, broke the interface into components – filters, maps, list views – and used models to generate annotated code for each, enabling verification.
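The filter component is the easiest of the three to picture in isolation. The team's application was written in JavaScript; the sketch below uses Python purely to show the logic, and every name in it (the `Result` fields, `filter_results`, the placeholder constituencies and parties) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Result:
    constituency: str
    state: str
    region: str
    area_type: str  # "rural" or "urban"
    winner: str


def filter_results(
    results: List[Result],
    state: Optional[str] = None,
    region: Optional[str] = None,
    area_type: Optional[str] = None,
) -> List[Result]:
    """Keep results matching every filter that is set; None means 'any'."""
    wanted = {"state": state, "region": region, "area_type": area_type}
    return [
        r
        for r in results
        if all(v is None or getattr(r, k) == v for k, v in wanted.items())
    ]


# Placeholder data, not real election results.
results = [
    Result("Constituency 1", "State X", "North", "urban", "Party A"),
    Result("Constituency 2", "State X", "North", "rural", "Party B"),
    Result("Constituency 3", "State Y", "South", "urban", "Party A"),
]

urban_in_x = filter_results(results, state="State X", area_type="urban")
print([r.constituency for r in urban_in_x])
```

Keeping each component this small is what makes generated code checkable: a filter can be tested against a handful of known rows before it is wired to a map or list view.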

India’s general elections involve nearly a billion eligible voters. Building tools that allow constituency-level filtering at that scale is technically demanding, particularly under deadline.

Previously, such projects required in-house engineers or outside volunteers. AI-assisted development shortened that loop. “Deadlines are sacrosanct in journalism,” he said. “Now we don’t have to extend them because we’re waiting for technical help.”

Ramani emphasised that AI tools fit into an established data journalism pipeline: hypothesis formation, data collection – via scraping, public records requests or mining structured sources – cleaning and structuring, analysis, visualisation, and publication.

His team’s work fell into five types: simple trend analysis; correlation studies; factor analysis; causal investigations; and deep-dive accountability reporting. AI now assists at multiple stages: generating web-scraping scripts, processing unstructured documents, suggesting database queries and building front-end interfaces.

But human oversight, he said, remains central.

In one instance, an AI-generated script processed documents sequentially, slowing the analysis. Only after a technologist suggested multi-threading, a form of parallel processing, and the model was prompted accordingly did it produce a more efficient version.

“You need human insight to tell it what to optimise,” Ramani said.
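The difference between the two versions can be sketched as follows. This is an assumed reconstruction, not the script Ramani described: `process_document` is a hypothetical stand-in whose short sleep mimics waiting on a file or network call, and the worker count is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_document(doc_id: int) -> int:
    """Stand-in for per-file work; the sleep mimics waiting on I/O."""
    time.sleep(0.05)
    return doc_id * 2


doc_ids = list(range(20))

# Sequential version: one document at a time.
start = time.perf_counter()
sequential = [process_document(d) for d in doc_ids]
sequential_secs = time.perf_counter() - start

# Threaded version: up to 10 documents in flight at once.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    threaded = list(pool.map(process_document, doc_ids))  # map keeps input order
threaded_secs = time.perf_counter() - start

print(f"sequential: {sequential_secs:.2f}s, threaded: {threaded_secs:.2f}s")
```

Threads help most when the work is dominated by waiting, as here. For CPU-heavy steps in Python, a process pool is usually the better fit.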

He cautioned against using AI to draw editorial conclusions. In structured tasks – extracting data, generating code – hallucination risks are lower, he argued, because outputs can be tested directly.

From graphics to investigations

Ramani traced the evolution of data journalism at The Hindu over the past decade, from visual add-ons accompanying traditional reporting to a dedicated function in which data journalists, designers and editorial coders build applications and investigations.

Among its major projects was an excess deaths analysis during the COVID-19 pandemic. Using civil registration data, the newsroom estimated that official COVID death counts were underreported by a factor of five to six.

The finding was contested at the time, but later analyses by the World Health Organisation and subsequent official data revisions pointed to substantial undercounting.

“Today, data-driven reporting is integrated across print and digital operations rather than siloed as a specialist unit. Many of these investigations are published as premium stories,” Ramani said, adding that the newsroom has seen higher subscriptions and engagement for such work.

“We want a more informed audience. This kind of work helps us move in that direction. Across projects, AI does not replace journalistic judgement. It expands the scale at which it can operate,” he said.

–WAN-Ifra/Neha Gupta with thanks

