The government, in its Economic Survey on Thursday (July 4), acknowledged what the tech giants have always known and built their business models on: data is the new gold. The difference is that this commodity, unlike conventional gold, is available in plenty to all, yet its potential has remained largely unrecognised.

So what does data do, and how is it helpful for governance?

The Economic Survey itself gives the answer. “A district education officer can make better decisions if he knows, for each school in his district, attendance rates of students and teachers, average test scores and status of school toilets. Similarly, parents can make better decisions about which school to send their children to if they know the average absenteeism rate of teachers in their village and can compare the rate to that in the neighboring village. A multitude of scenarios exist in which harnessing the marginal unit of data can lead to sharp increases in public welfare.” What this shows is that data can greatly improve governance and the delivery of government services.

Tech activists and NGOs have been calling for digital data to be treated as a public good because of the value it provides by supplementing scarce national statistics and helping in times of emergency. The ‘responsible data’ movement has evolved to discuss guidelines and frameworks that could eventually establish ethical principles for data sharing. However, this movement is not gaining traction with those who hold the highest-value data, particularly mobile network operators and global web 2.0 firms, which are proving reluctant to make data collected in low- and middle-income countries accessible through intermediaries (infomediaries).


A large part of the technology and the necessary data was created by all of us (“crowdsourced”), and logically one could conclude that it should belong to the ‘crowd’ and hence be a public good. The underlying infrastructure that all these companies rely on was created collectively (via taxes and the auction of bandwidth that built the internet), and it also leverages network effects that the crowd produces together. There is indeed no reason why the public’s data should not be owned by a public repository that sells the data to the Googles and Facebooks of the world, rather than the other way around, as is the case at present. But the key issue here is not just sending a portion of the profits from “data” back to citizens but also allowing them to shape the digital economy in ways that satisfy public needs, using big data and AI to improve the services provided by the welfare state, from health care to education to affordable housing.

“As people shift their day-to-day activities online, they leave digital footprints of these activities. Put differently, people produce data about themselves and store this data on public and private servers, every day, of their own accord. Data that would have involved a laborious survey to gather a few decades ago is today accumulating online at a near-zero cost, although it is scattered across sources”

Chapter 4 of the Economic Survey (Volume 1), tabled in Parliament on Thursday, talks about data as a public good. The chapter starts with a quote from Deng Xiaoping, the former Chinese leader widely credited with changing the direction of the Chinese economy: “Cross the river by feeling the stones.”

The survey also goes on to point out that, in the present Indian context, not everyone participates in the digital economy. A majority of the poor still have no digital footprint. Among those who do, the range of activities undertaken online is quite limited. However, the cost of gathering data is still much lower than it was a few decades ago. Even if a door-to-door survey is the only way to gather a certain kind of data, we now have technologies to log data online in real time, avoiding the laborious paper-based survey and tedious data entry process that were needed earlier.

Together, advances in gathering, storing, processing and disseminating data have lowered its marginal cost to unprecedented levels. At the same time, the marginal benefit of data, which clearly exceeds the marginal cost of obtaining and storing it, is higher than ever.

In other words, the Economic Survey emphasises what many in the tech industry have always known: that data, with the right set of analytical tools, is indeed money, and that data will do for economic advancement in this century what oil did in the 20th. In the Indian context, the Government does maintain data about its citizens; however, most of it is dispersed across registries maintained by different ministries, in effect locked in impenetrable silos.

This is why, every time citizens have to access a new service, they are asked to collect all the documents needed to prove their identity and their claim, and ever more identity cards are issued by the various departments and ministries of the government. What the government has not considered when collecting data is whether these datasets are interoperable. The Economic Survey does recognise the problem: government data collection in India is extremely decentralised, so there is a need to integrate data collection and storage across ministries and agencies.

“If the information embedded in these datasets is utilised together, data offers potential to reduce targeting error in welfare schemes. For example, consider a hypothetical individual who is affluent enough to own a car but is able to avail BPL welfare schemes, though unwarranted. When datasets are unconnected, the vehicle registry does not speak to, say, the public distribution system registry. Consequently, the public distribution system continues to subsidise this individual erroneously. However, if the two datasets are integrated, such inclusion error can be minimised, saving valuable Government resources. In the same way, exclusion errors can be rectified,” the survey notes.
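The cross-check the survey describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made up, the field names (`citizen_id`, `vehicle_type`, `bpl_card`) are hypothetical, and it assumes both registries share a common identifier, which is exactly the interoperability that is missing today.

```python
# Hypothetical, made-up records; the real schema of either registry is not known.
vehicle_registry = [
    {"citizen_id": "A101", "vehicle_type": "car"},
    {"citizen_id": "C303", "vehicle_type": "two-wheeler"},
]

pds_registry = [
    {"citizen_id": "A101", "bpl_card": True},
    {"citizen_id": "B202", "bpl_card": True},
    {"citizen_id": "C303", "bpl_card": False},
]


def possible_inclusion_errors(vehicles, pds):
    """Return IDs that hold a BPL card and also have a car registered."""
    # Set of IDs with a registered car, for fast membership checks.
    car_owners = {v["citizen_id"] for v in vehicles if v["vehicle_type"] == "car"}
    # Keep only BPL card holders whose ID also appears among car owners.
    return sorted(
        p["citizen_id"] for p in pds if p["bpl_card"] and p["citizen_id"] in car_owners
    )


flagged = possible_inclusion_errors(vehicle_registry, pds_registry)
print(flagged)
```

Records flagged this way would only be candidates for review, not proof of ineligibility. The reverse check, eligible households missing from the distribution registry, would address the exclusion errors the survey mentions.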

But does the government have access to the cutting-edge machine learning and AI tools available to private players in Silicon Valley and elsewhere, who by virtue of being early movers in this space are now far ahead when it comes to leveraging data for social good? That is a question to which we need an honest answer.

So, to come back to the question of whether data should be a public good or privately owned, the answer would be to quote Deng Xiaoping yet again: “It does not matter if it is a black cat or a white cat, as long as it does its job of catching mice”!