Covid-19 and the use of Big Data

As the Covid-19 virus continues to spread, accurate data and information are more sought after than ever, and Big Data lies at the heart of efforts to comprehend and forecast the impact that coronavirus will have on all of us. It helps governments track confirmed cases and fatalities, journalists inform the public about what is going on, and organisations make critical decisions. But the use of Big Data is also raising several important questions about trust and privacy.
Can we trust the data that is out there? Chinese authorities now seem to be saying that they have defeated the virus and that no new cases are being reported, but these are the same authorities that suppressed essential information and reporting on the initial outbreak, which delayed the response and had severe consequences.
The first case of someone suffering from Covid-19 can be traced back to 17 November, according to media reports on unpublished Chinese government data. Yet official statements by the Chinese government to the World Health Organisation reported only that the first confirmed case had been diagnosed on 8 December. Doctors who tried to raise the alarm with colleagues about a new disease in late December were reprimanded, and it was not until 21 January that authorities publicly conceded there was human-to-human transmission.

Last week, senior British government minister Michael Gove told the BBC that “some of the reporting from China was not clear about the scale, the nature, the infectiousness of the virus”. US President Donald Trump also said last week that China's reported death toll and infection figures seemed “a little bit on the light side”.

Models built on Big Data supported the prediction of a pandemic. The near real-time Covid-19 trackers that continuously pull data from sources around the world are helping healthcare workers, scientists, epidemiologists and policymakers aggregate and synthesise incident data on a global basis. But how is this data gathered?
Google has, for instance, published charts, based on an analysis of location data, showing how the coronavirus has brought hard-hit Italy to a standstill.

The company has access to location data from billions of Google users’ phones, and it is the largest public dataset available to help health authorities assess whether people are abiding by shelter-in-place and similar orders issued across the world to rein in the virus.

Google has released reports for 131 countries with charts that compare traffic to retail and recreational venues, train and bus stations, grocery stores and workplaces between 16 February and 29 March with traffic during a five-week period earlier this year. There were no reports for China and Iran, where Google services are blocked.
The company said it published the reports to avoid any confusion about what it was providing to authorities, given the global debate that has emerged about balancing privacy-invasive location tracking with the need to prevent further outbreaks.
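The baseline comparison behind such mobility charts can be illustrated with a short Python sketch. It is not Google's published method: the function name, the place categories and all visit counts are made up for illustration, and using the median of the five baseline weeks as the reference value is an assumption.

```python
from statistics import median


def percent_change_from_baseline(baseline_visits, current_visits):
    """Percentage change of current_visits relative to the baseline.

    The baseline is taken as the median of the baseline weeks; this
    choice is an assumption made for this sketch, not a documented
    detail of the reports.
    """
    baseline = median(baseline_visits)
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (current_visits - baseline) / baseline


# Hypothetical weekly visit counts for a five-week baseline period.
baseline_weeks = {
    "retail_and_recreation": [1000, 1100, 950, 1050, 1000],
    "workplaces": [800, 820, 780, 800, 810],
    "grocery": [500, 520, 480, 510, 500],
}

# Hypothetical visit counts for one week during the restrictions.
current_week = {
    "retail_and_recreation": 600,
    "workplaces": 400,
    "grocery": 525,
}

changes = {
    place: percent_change_from_baseline(baseline_weeks[place], current_week[place])
    for place in baseline_weeks
}

for place, change in changes.items():
    print(f"{place}: {change:+.1f}% compared with baseline")
```

A negative value means fewer visits than in the baseline period. Only relative changes per category are reported in this sketch, not individual locations, which mirrors the aggregated form of the charts described above.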
Tech companies, governments, and international agencies have all announced measures to help contain the spread of the virus. Some of these measures impose severe restrictions on people’s freedoms, including the right to privacy and other human rights. Unprecedented levels of surveillance, data exploitation, and misinformation are being tested across the world.
In the data-intensive world of 2020, ubiquitous data points and digital surveillance tools can easily exacerbate privacy concerns. China is reportedly using pervasive sensor data and health-check apps to curb the spread of the disease, but according to a New York Times report there is little transparency about how these data are cross-checked and reused for surveillance purposes.

While large-scale collection of data can help curb the pandemic, it should not neglect privacy and public trust. Best practices must be identified to maintain responsible data-collection and data-processing standards at a global scale.

The use of Big Data for prediction and surveillance, for instance to identify people who have travelled to areas where the disease has spread or to trace and isolate the contacts of infected people, is of paramount importance in the fight against the pandemic.
It is equally important, however, to use these data and algorithms in a responsible manner, in compliance with data-protection regulations and with due respect for privacy and confidentiality. Failing to do so risks undermining public trust, which studies have shown will make people less likely to follow public health advice or recommendations and more likely to have poorer health outcomes.