A Primer on Biodefense Data Science and Technology for Pandemic Preparedness

March 9, 2020

Biodefense is a term commonly associated with protections from the use of bioweapons, although the tactics and techniques of biodefense are also applicable to pathogen risks that don’t involve a human perpetrator [1]. Ever since the anthrax attacks that followed soon after the 9/11 terror attacks in 2001, the U.S. biodefense field has mostly focused on developing vaccines against biological weapons, albeit with checkered results [2]. Traditional biodefense strategies tend to rely upon hindsight and historical knowledge of pre-existing threats — an approach that can hamper effectiveness. Responding to novel threats such as viral pandemics requires new techniques, new capabilities, and imagination.

The need for biodefense is significant, because most elements of our society are vulnerable to disruption by disease. The outbreak of Covid-19, the disease caused by the novel coronavirus SARS-CoV-2, has already rattled stock markets, sharply curtailed global travel, limited the availability of essential goods, closed schools, and even caused modifications to the planned 2020 Olympic Torch route. In the face of such uncertainty and disruption, the data science and technology communities can play vital roles in ensuring the best possible outcomes. Further, the Director-General of the World Health Organization (WHO) has issued an open call for help and innovation.

Many of us are already playing important roles in responding to the outbreak. The data and tools of epidemiology, public health, and digital disease detection are already part of the response’s critical path. Similarly, drug and vaccine discovery, development, regulatory review, pharmacovigilance, and computational support of basic science are all essential activities. In addition, there are highly specialized branches of data science that will rapidly see increased demand if the Covid-19 outbreak reaches pandemic status. Below, I provide an overview of biodefense use cases and capabilities that comprise key components of preventative, curative, and restorative activities with the hope of inspiring others to consider lending their expertise to biodefense.

Understanding Risk

When applied in a business context, resilience denotes the capacity to continue to execute on business imperatives despite difficulties such as a natural disaster, fire, cyberattack, or infectious disease outbreak. Cybersecurity provides an excellent context for understanding the specific concepts of resilience. Cyber resilience planning helps organizations protect against cyber risks, defend against and limit the severity of attacks, and ensure continued survival despite an attack. The first step in designing resilience is to understand risk. In cybersecurity, risk is frequently represented as a simple equation:

Risk = Threat × Vulnerability × Impact × Likelihood.

Each organization is unique and the differences in the magnitudes of these variables can be substantial. Let’s consider the “impact” variable for a moment: if a prestigious medical journal were to be taken offline, the field of medicine would lose an important informational resource. If the critical care systems of a hospital were taken offline, patient care (and wellbeing) could suffer. All organizations have varying levels of threats, business-specific vulnerabilities, and likelihoods, and there are specific methods for determining each of these elements. 
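To make the equation concrete, here is a minimal sketch that multiplies the four terms for the two examples above. The 0-to-1 scales, the asset names, and every score are assumptions chosen purely for illustration; a real assessment would derive each term with an organization-specific method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskInputs:
    """The four terms of the risk equation, each on an assumed 0-1 scale."""
    threat: float
    vulnerability: float
    impact: float
    likelihood: float


def risk_score(r: RiskInputs) -> float:
    """Risk = Threat x Vulnerability x Impact x Likelihood."""
    for name, value in vars(r).items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return r.threat * r.vulnerability * r.impact * r.likelihood


# Hypothetical scores: the two assets differ only in the "impact" term.
assets = {
    "journal_website": RiskInputs(0.5, 0.5, 0.4, 0.5),
    "hospital_critical_care": RiskInputs(0.5, 0.5, 1.0, 0.5),
}

# Rank assets from highest to lowest risk.
ranked = sorted(assets, key=lambda name: risk_score(assets[name]), reverse=True)
for name in ranked:
    print(f"{name}: {risk_score(assets[name]):.3f}")
```

Because the terms are multiplied, a single term at or near zero drives the whole score toward zero, which is one reason the individual scales deserve careful thought.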

The term attack surface describes the level of exposure to malicious cyber activity for any given person or institution. It’s often used to provide a quantitative measure for the vulnerability term. Attack surface reduction (that is, reducing the total reachable and exploitable vulnerabilities on a system, application, network, or person) is a common method of minimizing vulnerability. The easiest way to envision this concept is to consider one’s personal attack surface: the total number of connected devices; network, system, and application accounts; email addresses; and social media footprints all add up to it. Of course, these concepts are complex and interrelated: depending upon the specific method used, attack surface may inform the “likelihood” term as well as the “vulnerability” term.
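As a rough illustration of a personal attack surface, the sketch below tallies a hypothetical inventory by category and then applies attack surface reduction by dropping items that are no longer needed. The inventory, the category names, and the idea of a simple count as the measure are all assumptions; real methods weight items by how reachable and exploitable they are.

```python
from collections import Counter

# Hypothetical personal inventory: (category, item, still_needed)
INVENTORY = [
    ("device", "laptop", True),
    ("device", "old_tablet", False),
    ("account", "work_vpn", True),
    ("account", "unused_forum_login", False),
    ("email", "personal_address", True),
    ("social", "dormant_profile", False),
]


def attack_surface(inventory):
    """Count exposed items per category; the total is a crude surface measure."""
    return Counter(category for category, _item, _needed in inventory)


def reduce_surface(inventory):
    """Attack surface reduction: keep only the items that are still needed."""
    return [entry for entry in inventory if entry[2]]


before = attack_surface(INVENTORY)
after = attack_surface(reduce_surface(INVENTORY))
print("before:", sum(before.values()), dict(before))
print("after: ", sum(after.values()), dict(after))
```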

Within epidemiology, the concept of contact tracing provides some interesting analogs. The WHO defines contact tracing as the process of identifying and monitoring people who have had close contact with someone who has been infected with a virus. The WHO notes that “…closely watching contacts after exposure to an infected person will help the contacts get care and treatment, and will prevent further transmission of the virus.” In practical terms, contact tracing consists of three elements: contact identification, contact listing, and contact follow-up. When overlaid with geolocation data, these steps enable the creation of a comprehensive map of risk and exposure of specific people and the places they have been.

The importance of such data cannot be overstated. During the 2014-2016 Ebola outbreak in West Africa, patients carrying the disease could be asymptomatic for up to 21 days. This, when coupled with the fact that the earliest symptoms (fever, headache, gastrointestinal upset) were often indistinguishable from endemic illnesses such as malaria, made the task of triage highly complex and challenging. The presence of a confirmed positive contact was often the key differentiator. It was, and is, that important.
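The three elements of contact tracing described above (identification, listing, and follow-up) can be sketched in a few functions. Everything here is hypothetical: the contact records, the names, the place labels standing in for geolocation data, and the default 21-day monitoring window, which simply borrows the Ebola figure mentioned above. Treat it as a toy model, not a description of any tool used in the field.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical contact records: (person_a, person_b, date_of_contact, place)
CONTACTS = [
    ("case_1", "alice", date(2020, 3, 1), "clinic"),
    ("case_1", "bob", date(2020, 3, 2), "market"),
    ("bob", "carol", date(2020, 3, 3), "school"),
    ("dave", "erin", date(2020, 3, 3), "market"),
]


def identify_contacts(records, case):
    """Contact identification: find everyone with a recorded direct contact."""
    found = {}
    for a, b, day, place in records:
        if case in (a, b):
            other = b if a == case else a
            # Keep only the most recent exposure for each contact.
            if other not in found or day > found[other][0]:
                found[other] = (day, place)
    return found


def list_contacts(found):
    """Contact listing: turn the identified contacts into a sorted register."""
    rows = (
        {"person": person, "last_exposure": day, "place": place}
        for person, (day, place) in found.items()
    )
    return sorted(rows, key=lambda row: row["person"])


def follow_up_schedule(listing, monitoring_days=21):
    """Contact follow-up: the date through which each contact is monitored."""
    window = timedelta(days=monitoring_days)
    return {row["person"]: row["last_exposure"] + window for row in listing}


def places_of_exposure(listing):
    """Group contacts by place, a stand-in for overlaying geolocation data."""
    by_place = defaultdict(list)
    for row in listing:
        by_place[row["place"]].append(row["person"])
    return dict(by_place)


found = identify_contacts(CONTACTS, "case_1")
listing = list_contacts(found)
schedule = follow_up_schedule(listing)
exposure_map = places_of_exposure(listing)
```

Note that this sketch only follows direct contacts of a single case. If one of those contacts later tests positive, the same steps would be repeated with that person as the new case.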

Understanding Resilience

In considering the specifics of business resilience and continuity, data scientists are ideally positioned to clearly identify and rank priorities regarding systems and data that must be protected and kept running. In the United States, the Federal Emergency Management Agency (FEMA) recommends that every business have a business continuity plan. The essential elements most applicable to our skillsets are IT and data systems continuity planning, emergency connectivity for employees, enabling employees with special needs, ensuring adequate supplies, crisis communications, and cybersecurity. However, as we consider this, we should also bear in mind a 2014 study that showed that 70% of companies lack adequate preparations for disaster recovery. Further, disasters are often disproportionately catastrophic for small innovators: FEMA claims that 40 to 60 percent of small businesses never recover from a disaster.

Techies are already on the critical path for most of these activities, although this may be less true in larger, more complex organizations where responsibility for data and systems may be ambiguous between centralized technology groups and decentralized innovation. Effective continuity planning mandates that such ambiguities be clarified and resolved. Another essential function is testing of continuity plans. Plans can only be trusted when they’ve been properly validated; unfortunately, few institutions actually test their disaster plans. This gap is common in the technology world, where businesses that have not tested their backups often find that they cannot restore systems when they need to do so. Data scientists can develop test scripts, lead training, identify gaps, and offer a host of other services that close those gaps and enhance the resilience of continuity plans.
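One simple kind of test script is a restore check: restore a backup to a scratch location and confirm that every file matches the source. The sketch below assumes the backup has already been restored to a directory by whatever backup tool is in use; the function names and the report format are my own, and the demo files are fabricated in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_tree(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_restore(source: Path, restored: Path) -> dict:
    """Compare a restored copy against the source and report discrepancies."""
    src, dst = checksum_tree(source), checksum_tree(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "corrupted": sorted(n for n in set(src) & set(dst) if src[n] != dst[n]),
        "unexpected": sorted(set(dst) - set(src)),
    }


# Demo with fabricated files: one restored file is damaged, one is absent.
with tempfile.TemporaryDirectory() as tmp:
    source, restored = Path(tmp) / "source", Path(tmp) / "restored"
    source.mkdir()
    restored.mkdir()
    (source / "patients.csv").write_text("id,status\n1,ok\n")
    (source / "config.ini").write_text("[db]\nhost=local\n")
    (source / "notes.txt").write_text("weekly summary\n")
    (restored / "patients.csv").write_text("id,status\n1,ok\n")
    (restored / "config.ini").write_text("[db]\nhost=\n")
    report = verify_restore(source, restored)

print(report)
```

A check like this reads whole files into memory, which is fine for a sketch but would need chunked hashing for large datasets. It also says nothing about whether the restored system actually runs, so it complements rather than replaces a full recovery drill.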

Amid all these activities, it is essential that all stakeholders be able to trust in truthful, accurate communications. According to the WHO, the coronavirus outbreak has led to an “infodemic” of misinformation and malinformation comprising everything from conspiracy theories asserting that the outbreak is a man-made biological attack, to badly distorted death statistics, to quack cures being aggressively hawked on social media. We all must be the custodians of truth, a job that includes everything from ensuring the accuracy of all communications containing business data to serving as vigilant sentries for misinformation campaigns. Unfortunately, all resilience plans must include provisions to deal with those looking to directly enrich themselves and promote their agendas in the wake of a catastrophe. Whether it is diligently monitoring systems for signs of increased cyber activity, monitoring supply chain integrity for signs of counterfeiting, or surveilling the web for malicious misinformation, disinformation, and malinformation, we must be prepared to work to prevent and minimize the effects of these inevitable bad guys. The Covid-19 outbreak is already being aggressively exploited by cybercriminals, and data and IT staff must be on the lookout for more than just direct attacks and scams. For example, one of the most important tasks during Covid-19 may be assuring the resilience of voting systems for the 2020 presidential election. Be imaginative; the bad guys certainly are.

Response and Recovery

During the onset of an event such as the one we’re now experiencing, resilience is the key priority. Secure your systems and protect your family and business. Remember, cybercrime spreads just as easily from personal devices to work devices as viruses do between people. Biodefense may have previously been considered the domain of the military and antiterrorism experts, but all of us now have a potential role to play. Please consider lending your time and expertise.

In this post, the first of a series exploring the concepts of biodefense, we’ve focused on the topic of resilience. Once resilience has been established, the focus then shifts to response and recovery, which we will cover in the next installment in our series.


  1. Gronvall GK. Biodefense in the 21st century. Science. 2017;356(6338):588. doi:10.1126/science.aan1118
  2. US biodefense--shocking and awful. Nat Biotechnol. 2007;25(6):603. doi:10.1038/nbt0607-603

See more blog posts by Eric Perakslis