Artificial intelligence and machine learning have undisputed potential for distilling large bodies of data into clinical action. There are compelling justifications for the use of these technologies in healthcare: data sources such as genomics (and other –omics), social data, socioeconomic variables, and streaming data from wearable devices can all yield data directly relevant to human health. Currently, however, many clinicians are overwhelmed even by the volume of “conventional” health data encountered in typical electronic health records (EHRs). Machine learning (ML) has the potential to bridge this gap by condensing large, complex, and multilayered datasets into actionable insights. Doing so could free clinicians to make the most of their time and could increase the quantity of high-quality data available for research and for informing decisions about health and healthcare made by patients, clinicians, administrators, policymakers, and the public.
However, algorithmic applications have a substantial and demonstrated capacity for encoding and propagating biases, whether inadvertently or intentionally. The social cost of bias incorporated into machine learning applications in healthcare can be clearly seen in the case of a widely used medical algorithm that systematically underestimated the severity of illness in Black patients, leading to systematic undertreatment. Algorithm developers, regulators, and ultimately clinicians, patients, and the public would all benefit from a structured approach to identifying, evaluating, and countering bias in algorithmic products with clinical or health-related applications.
To this end, Duke Forge (Duke University’s Center for Health Data Science) convened a conference of experts that included a former U.S. Food and Drug Administration (FDA) Commissioner, representatives from the FDA, journalists, computer scientists, experts in the law and ethics of algorithmic applications, quantitative experts, and clinicians. The goal was exploratory work to support the development of a reference architecture for evaluating bias in algorithms—one that could potentially be used by the scientific community and regulatory bodies for vetting algorithms used in healthcare. The impetus for this meeting grew out of conversations centered on three themes: the increasing excitement in the world of medicine about the potential of artificial intelligence (AI) and machine learning; the prevailing puzzlement about why these technologies have yet to meaningfully permeate clinical practice (beyond some relatively simple linear equations); and concerns about the potential for algorithmic technologies to introduce or exacerbate harmful biases.
The “Algorithmic Bias in Machine Learning” conference was held September 19–20, 2019, at the J.B. Duke Hotel on the Duke University campus in Durham, North Carolina. The symposium represented an effort to extend work previously funded by the Gordon and Betty Moore Foundation, namely the “Human Intelligence and Artificial Intelligence Symposium” conducted at Stanford University (April 2018) and the “Regulatory Oversight of Artificial Intelligence & Machine Learning” meeting sponsored by Duke’s Robert J. Margolis, MD, Center for Health Policy. The overarching purpose of the symposium was to move concretely toward a practical framework for evaluating artificial intelligence and machine learning applications in the context of their use in health and healthcare. The resulting report, titled Algorithmic Bias in Machine Learning, summarizes and distills the discussions that took place during this expert meeting and presents a set of consensus priorities for future action.
The Algorithmic Bias in Machine Learning conference was hosted by Duke Forge (Duke University, Durham, North Carolina) and supported by a grant from the Gordon and Betty Moore Foundation.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453.