Detecting and Monitoring Diseases with Big Data

In an increasingly globalized world, diseases can spread more easily than ever.

Early detection and diagnosis are critical to preventing epidemics such as the 2014 Ebola outbreak. Even for diseases that are not contagious, such as cancer, early diagnosis can save lives. Big data can be a versatile tool to assist in the detection, monitoring, and diagnosis of infectious diseases and cancer.

Monitoring Disease Outbreaks with Big Data

Past efforts to combat disease outbreaks typically relied on information collected through physical channels, such as laboratory test results and public health records, to build predictive models of how a disease might spread.

The big data approach, by contrast, draws on medical information, internet resources, social media, and other sources to track disease outbreaks in real time. This approach has shown promise: according to a report published in Scientific Reports, researchers at Boston Children's Hospital detected cases of influenza up to a week before the CDC reported them. And unlike physical methods of distributing information, where materials must be printed and then shipped wherever they are needed, digital data can be transferred almost instantly and at little cost.
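To make "real-time tracking" more concrete, here is a minimal Python sketch of one common way to flag an unusual rise in a daily signal, such as the number of flu-related posts or searches from a region: compare each day's count to the mean and standard deviation of the preceding week. This is a generic baseline-and-threshold detector, not the method used by the Boston Children's Hospital researchers. The `flag_surges` function, the 7-day window, the threshold of 3 standard deviations, and the sample counts are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_surges(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is more than `threshold` standard
    deviations above the mean of the preceding `window` days.

    Window and threshold are illustrative defaults, not values from any study.
    """
    flagged = []
    for t in range(window, len(daily_counts)):
        baseline = daily_counts[t - window:t]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip perfectly flat baselines, where a z-score is undefined.
        if sigma > 0 and (daily_counts[t] - mu) / sigma > threshold:
            flagged.append(t)
    return flagged


# Hypothetical daily counts of flu-related posts from one region.
counts = [20, 22, 19, 21, 20, 23, 21, 22, 45, 60]
print(flag_surges(counts))  # [8, 9]
```

Because each day is judged only against the days before it, the check can run as soon as a new day's count arrives, which is what makes this kind of signal useful for early warning.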

Scientists can now use big data to build accurate, real-time maps of outbreaks. This is a much-needed improvement, because traditional maps quickly become outdated as outbreaks spread and change. Combining data from hospitals, internet databases, and social media feeds with accurate geolocation data allows researchers and analysts to see the full extent of an outbreak.
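A minimal sketch of the map-building step might look like the following. It assumes each signal arrives as a hypothetical `Report` record with a source, a day number, and a latitude/longitude. Recent reports from every source are pooled into one-degree grid cells, and the per-cell counts form one layer of a real-time outbreak map. The record layout, the cell size, and the time window are illustrative choices, not drawn from any particular system.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """A single geo-tagged signal of illness (hypothetical schema for this sketch)."""
    source: str   # e.g. "hospital", "search", "social"
    day: int      # day the signal was recorded
    lat: float
    lon: float


def grid_cell(lat, lon, cell_deg=1.0):
    """Index of the cell_deg-by-cell_deg grid square containing (lat, lon)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def outbreak_map(reports, since_day, cell_deg=1.0):
    """Count recent reports per grid cell, pooling every data source.

    Reports older than `since_day` are dropped so the map reflects current activity.
    """
    return Counter(
        grid_cell(r.lat, r.lon, cell_deg) for r in reports if r.day >= since_day
    )


reports = [
    Report("hospital", day=12, lat=8.48, lon=-13.23),
    Report("social", day=13, lat=8.41, lon=-13.29),
    Report("search", day=13, lat=6.30, lon=-10.80),
    Report("hospital", day=2, lat=6.31, lon=-10.79),  # stale, excluded below
]
heat = outbreak_map(reports, since_day=10)
print(dict(heat))  # {(8, -14): 2, (6, -11): 1}
```

Dropping stale reports is what keeps such a map current. A fixed, static snapshot is exactly the short-lived kind of map the paragraph above contrasts it with.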

Big data is also useful for predicting where outbreaks will occur and how far they will spread. As soon as new data indicates that a disease has spread to a new location, predictive models can be adjusted to reflect it. A data management system for simulating epidemics, called epiDMS, already does this, and as more information on outbreaks becomes available, the accuracy of epidemic modeling is likely to improve.
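To show how a projection can be revised when new reports arrive, here is a minimal discrete-time SIR (susceptible-infected-recovered) model in Python. It is a textbook compartmental model and should not be read as epiDMS or its interface. The transmission rate `beta` is re-estimated from recently reported case counts using the early-growth approximation I[t+1]/I[t] ≈ 1 + beta - gamma, which holds only while nearly everyone is still susceptible. The recovery rate `gamma`, the population size, and the case counts are hypothetical.

```python
def sir_projection(s, i, r, beta, gamma, days):
    """Project a discrete-time SIR model forward; returns a list of (S, I, R) per day."""
    n = s + i + r
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / n  # contacts between infected and susceptible people
        new_recoveries = gamma * i         # a fixed fraction of infected people recover each day
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


def estimate_beta(observed_infected, gamma):
    """Rough beta from early growth, assuming S is close to N so I[t+1]/I[t] ~ 1 + beta - gamma."""
    ratios = [b / a for a, b in zip(observed_infected, observed_infected[1:]) if a > 0]
    if not ratios:
        raise ValueError("need at least two consecutive nonzero counts")
    return max(0.0, sum(ratios) / len(ratios) - 1 + gamma)


# Hypothetical: a new location reports its first few days of active cases.
population = 100_000
gamma = 0.1
reported = [10, 12, 14.4]
beta = estimate_beta(reported, gamma)  # about 0.3 for these counts
projection = sir_projection(population - reported[-1], reported[-1], 0, beta, gamma, days=60)
peak_day, (_, peak_infected, _) = max(enumerate(projection), key=lambda x: x[1][1])
print(f"beta={beta:.2f}, projected peak of {peak_infected:.0f} active cases on day {peak_day}")
```

Each time a new day of counts comes in, `estimate_beta` can be re-run on the updated list and the projection regenerated. That is a very simple version of "adjusting the model based on new information."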

Detecting Disease Symptoms with Big Data

In addition to changing how disease outbreaks are monitored, big data has the potential to dramatically change how diseases are detected.

A study published in the Journal of Oncology Practice used data gathered from Bing searches to identify individuals likely to be diagnosed with pancreatic cancer based on their web searches. The researchers first identified people whose searches provided compelling evidence that they had recently been diagnosed with the disease. They then went back through those users' earlier searches for symptoms, looking for query patterns that preceded an eventual pancreatic cancer diagnosis. The search data were anonymized to protect the privacy of the users who generated them.
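The retrospective labeling step described above can be sketched in a few lines of Python. Everything here is an illustrative assumption: the search log format (a list of `(day, query)` pairs per anonymized user), the `DIAGNOSIS_PHRASES` used to spot a likely recent diagnosis, and the `SYMPTOM_TERMS` list are hypothetical stand-ins, not the classifiers or vocabulary the study actually used. Plain substring matching is used only to keep the sketch short.

```python
from typing import List, Optional, Tuple

# Hypothetical phrase lists; the study's actual query classifiers are not reproduced here.
DIAGNOSIS_PHRASES = ("diagnosed with pancreatic cancer", "pancreatic cancer treatment")
SYMPTOM_TERMS = ("back pain", "yellow skin", "itchy skin", "unexplained weight loss", "dark urine")

SearchLog = List[Tuple[int, str]]  # (day, query) pairs for one anonymized user


def diagnosis_day(log: SearchLog) -> Optional[int]:
    """Day of the first query strongly suggesting a recent diagnosis, or None."""
    days = [day for day, q in log if any(p in q.lower() for p in DIAGNOSIS_PHRASES)]
    return min(days) if days else None


def prior_symptoms(log: SearchLog) -> List[str]:
    """Symptom terms a user searched for *before* their first diagnosis-like query."""
    cutoff = diagnosis_day(log)
    if cutoff is None:
        return []  # no diagnosis signal: this user is not a positive example
    found = []
    for day, q in sorted(log):
        if day >= cutoff:
            break  # only look backward from the diagnosis signal
        found.extend(t for t in SYMPTOM_TERMS if t in q.lower())
    return found


log = [
    (1, "back pain when lying down"),
    (5, "yellow skin and eyes causes"),
    (9, "itchy skin at night"),
    (20, "just diagnosed with pancreatic cancer what now"),
    (25, "dark urine chemo side effect"),  # after the cutoff, ignored
]
print(diagnosis_day(log))   # 20
print(prior_symptoms(log))  # ['back pain', 'yellow skin', 'itchy skin']
```

Users with a diagnosis-like query become positive examples, and the symptom searches that preceded it are the signals a predictive model would learn from.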

The researchers state that they successfully found patterns of queries which can predict future queries that are “highly suggestive” of pancreatic cancer, and that they could correctly identify 5 to 15 percent of pancreatic cancer cases while maintaining a false positive rate as low as 1 in 100,000.
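To see what those two figures measure, here is a small Python calculation of the true positive rate (the share of actual cases that are flagged) and the false positive rate (the share of people without the disease who are wrongly flagged). The counts are made up for illustration and are not the study's data. Flagging 10 of 100 true cases gives a 10 percent detection rate, and 10 false alarms among 1,000,000 people without the disease gives a false positive rate of 1 in 100,000.

```python
def screening_rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate, precision) from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # fraction of real cases that were flagged
    fpr = fp / (fp + tn)        # fraction of healthy people wrongly flagged
    precision = tp / (tp + fp)  # fraction of flags that are real cases
    return tpr, fpr, precision


# Hypothetical counts, chosen only to illustrate the scale of the reported figures.
tpr, fpr, precision = screening_rates(tp=10, fn=90, fp=10, tn=999_990)
print(f"TPR = {tpr:.0%}, FPR = 1 in {round(1 / fpr):,}, precision = {precision:.0%}")
# TPR = 10%, FPR = 1 in 100,000, precision = 50%
```

The precision figure shows why such a low false positive rate matters. Because the disease is rare, even 1 false alarm per 100,000 people can, in a large population, produce as many false alarms as true detections.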

Pancreatic cancer, like many similar diseases, has a higher survival rate the earlier it is diagnosed, so early detection is important. The goal of the research was not to diagnose the disease. Instead, it aimed to encourage people showing known symptoms to contact a medical professional who can make the diagnosis. The study was meant to demonstrate that an alternative detection or monitoring system for diseases is feasible. Its authors hope their work will spur further investigation into how big data can help identify hard-to-detect diseases.

Monitoring and detecting diseases is still a relatively new application of big data, but as technology improves and data sets become more comprehensive, its impact and applications will grow. As with every new application of technology, there are undoubtedly ethical concerns to address, such as protecting privacy when using disease-detection algorithms, but these will likely be resolved in time. In the meantime, this exciting development in data science could save many lives.