Title: How much data: measuring quantity, visibility and topics of Indian open governmental data using machine algorithm approaches
Authors: Priya Tiwari; Bhaskar Mukherjee
Addresses: Department of Library and Information Science, Banaras Hindu University, Varanasi, India; Department of Library and Information Science, Banaras Hindu University, Varanasi, India
Abstract: This study evaluates India's open data ecosystem through data.gov.in by analysing sector-wise and state-wise data production, usability, and major content topics. Using 6,907 datasets (2012-2023) extracted via a Python-based API, the analysis highlights key areas of interest: primary healthcare data (health), crop production statistics (agriculture), and demographic data (census). States like Tamil Nadu and Assam focus heavily on health-related data, while Andhra Pradesh and Bihar prioritise census abstracts. Tamil Nadu leads contributions to transport data (75.44%) and livestock census data (16.53%). Comparisons with USA and UK portals reveal India's strength in regional datasets but emphasise gaps in technical infrastructure and user engagement. Despite disparities between displayed and actual catalogues, India's tailored local data fosters transparency and data-driven governance. This first-of-its-kind study benchmarks India's portal globally, providing insights into sectoral focus, identifying best practices, and addressing socio-economic governance challenges.
Keywords: governmental data; Indian government; open data; machine algorithm; data visibility; sectoral data; state data; open science; content analysis; India; UK; USA.
Electronic Government, an International Journal, 2026 Vol.22 No.1, pp.1 - 19
Received: 15 Feb 2024
Accepted: 03 Dec 2024
Published online: 03 Dec 2025