Approaching Health Data Science with Humility

October 3, 2019

“The only wisdom we can hope to acquire

Is the wisdom of humility: humility is endless.”

—T. S. Eliot, “East Coker,” from Four Quartets (1939)

“One, two, three, four, five, six, seven, eight…”

Elbows locked, scrubs damp with sweat, glasses sliding down your nose, you keep counting the chest compressions. Amid the vortex of urgency and adrenaline, klaxon blaring in the background, a team has materialized: respiratory therapist at the head of the bed, a nurse by the code cart, sticky pads to the chest and side, wires, tubing, junior residents on either side placing chest tubes…

A “code” — the medical response to a person going into cardiopulmonary arrest — at once represents both the triumph and failure of modern medicine. Viewed from one angle, we see an impressively universal, uniform approach that rigorously applies life-saving techniques, allowing us to yank a patient back from the precipice thanks to teamwork and an extraordinary suite of technology. But there’s another side: the sheer, violent unexpectedness of a code. Pounding on someone’s chest and shocking them is not an expected outcome for just about any intervention – by definition, something has gone awry.

Even when everyone has done everything right, the unexpected still happens. You may have applied the most up-to-date knowledge to select a treatment, done a thorough preoperative assessment, consulted the right specialists, and executed the procedure perfectly, but the complexity of human pathophysiology and our all-too-fallible efforts to alter its course come with no guarantees of success.

Practicing medicine is an exercise in humility. And in the context of healthcare, practicing data science should be no different. As a data scientist, you might employ your craft to the best of your knowledge and abilities, but as the volume of data swells and data products are bound up with the ways we deliver healthcare, there will inevitably be unexpected effects—including unintended harm.

How? Sometimes harm arises from problems with data quality. Electronic health records (EHRs) in use today provide such a poor user experience that busy, overworked clinicians unintentionally enter incorrect information or choose expedience over accuracy (for example, by cutting and pasting previous chart notes). When we use these same data to train a machine learning algorithm, we cannot do so uncritically, because the algorithm will learn the vagaries of harried clinician data entry rather than the "ground truth." And even under optimal conditions, no algorithm is perfect, just as no drug or procedure is perfect. An algorithm may correctly flag an impressive 95% of the people who need a new cancer treatment, but that still means it misses the other 5%. And that 5% represents real people.
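To make that last point concrete, here is a minimal sketch, using hypothetical numbers, of what a 5% miss rate means once we count people rather than percentages:

```python
# Hypothetical illustration: translating a miss rate into people.
# Suppose an algorithm screens a population in which 10,000 patients
# truly need a new therapy, and it correctly flags 95% of them.
patients_needing_treatment = 10_000
sensitivity = 0.95

flagged = int(patients_needing_treatment * sensitivity)  # correctly identified
missed = patients_needing_treatment - flagged            # false negatives

print(f"Correctly flagged: {flagged}")  # 9,500 people
print(f"Missed entirely:   {missed}")   # 500 real people
```

Five hundred missed patients is not a rounding error; it is a ward full of people.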

[Image: side-by-side photos of guacamole and a tabby cat. Left: guacamole. Right: also guacamole, according to the algorithm. Image credit: Steve Buissinne (guacamole) and Jae Park (cat).]

If an algorithm gets something wrong, does it do so in a way that makes sense to us? MIT's LabSix group famously manipulated an image of a tabby cat in ways imperceptible to human eyes, convincing one of Google's vaunted pre-trained neural networks that the cat was "guacamole." No doubt the conditions under which the research team made this happen were somewhat artificial, but a toddler will be 100% accurate in determining that the image, manipulated or not, is a cat. Note, too, that the "failure mode" of the AI is entirely unintuitive to a human: why "guacamole," and not a leopard or a tiger? Is it possible that a doctor or nurse juggling a busy service might accept the diagnostic equivalent of "guacamole"? Have we designed our systems to prevent such a thing from occurring?
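The mechanics behind such attacks can be sketched in a few lines. What follows is a toy, synthetic example in the spirit of the fast gradient sign method; the "model" is just a hand-picked linear classifier, not a real neural network, and all numbers are invented for illustration:

```python
# Toy sketch of an adversarial perturbation (FGSM-style) against a
# linear classifier. The weights and input are synthetic, chosen only
# to show how a small, structured nudge can flip a prediction.
import numpy as np

# A "trained" linear model: score = w . x + b; positive score -> "cat".
w = np.array([1.0, -2.0, 0.5])
b = 0.1

x = np.array([2.0, 0.5, 1.0])   # an input the model classifies as "cat"

# For a linear model, the gradient of the score w.r.t. the input is w,
# so nudging each feature against sign(w) pushes the score down fastest.
epsilon = 1.5
x_adv = x - epsilon * np.sign(w)

print("original score: ", w @ x + b)      # positive -> "cat"
print("perturbed score:", w @ x_adv + b)  # negative -> misclassified
```

Real attacks on deep networks work the same way in principle, following the gradient of the loss with respect to the pixels, which is why the perturbation can stay imperceptible to us while devastating the model.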

Training an algorithm may not seem to have the same urgency or stakes as being scrubbed into a surgical field and gently prying a cancerous pancreas away from the portal vein. But when we're programming at a keyboard, we must approach data science with a delicacy and an awareness of dire consequences like those of a hepatobiliary surgeon. There are many data hazards, the digital equivalent of a hemorrhage, that must be avoided.

We might think that unlike a human clinician, an impersonal, tireless algorithm is free of prejudices. But like any human creation, algorithms are inextricably linked with the people and culture that give rise to them. Three years ago, Microsoft learned this lesson the hard way when it unveiled a Twitter chatbot named “Tay.” Tay’s AI algorithms were designed to learn “casual and playful conversation” from the daily give-and-take of user interactions on Twitter. Unfortunately, about 24 hours later, Microsoft’s researchers were chagrined to find that Tay had turned into a hate-spewing neo-Nazi. Other AI misfires have been less spectacular but still deeply worrisome: Reuters reported late last year that Amazon had quietly turned off an algorithm for sifting through the résumés of job applicants. It had been trained on a decade’s worth of résumés and was found to favor male candidates over female ones, in large part because the data sources it learned from were dominated by male applicants.

Technocrats like clinicians and data scientists have skills that can feel like superpowers: the primary care doc can assimilate information about her patient’s chronic conditions and provide a life-extending, quality-of-life enhancing treatment regimen; an interventional cardiologist can unclog a lethally blocked coronary artery; a data scientist can crack open a massive dataset and build the digital equivalent of a crystal ball, or find data “signatures” for precision cancer therapies.

Such powers are gratifying but can also be intoxicating. Hence the need for humility. As Spider-Man's Uncle Ben says: "With great power comes great responsibility."

Modern medicine has multiple frameworks designed to inculcate this sense of responsibility. They are admittedly imperfect, but they are woven into our culture in a fundamental way. Starting with the Hippocratic Oath, they branch outward to encompass professional codes, licensure requirements, regulatory bodies such as the FDA, and legal frameworks governing malpractice. As data science merges with practice in the delivery of health, we will need to develop similar conventions. A data scientist at Google designing a product that mistargets advertisements to a small percentage of people is one thing; an algorithm mistargeting a cancer therapy with its attendant cost, side effects, or complications is quite another.

Here are a few things for us to think about as we apply data science to health:

Algorithms will be biased.

As the example of Amazon’s résumé screening shows, if we do not attend to the biases embedded in the data we collect and the conclusions we derive from them, we are potentially providing the means for virulent automation of that bias. Practically, we might find that an algorithm predicts complications associated with diabetes with acceptable accuracy across a broad population. Yet we might also find that its accuracy varies across different subpopulations, based on characteristics such as race, ethnicity, or gender. We must develop methods for identifying and addressing such biases.
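One practical starting point is simply to disaggregate performance metrics. Here is a minimal sketch, on entirely synthetic records with hypothetical subgroup names, of auditing a model's accuracy by subgroup rather than trusting a single overall number:

```python
# Hypothetical sketch: auditing accuracy by subgroup.
# Each record is (subgroup, true_label, predicted_label); all values
# are synthetic and exaggerated to make the disparity obvious.
from collections import defaultdict

records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 0),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, truth, pred in records:
    total[group] += 1
    correct[group] += int(truth == pred)

overall = sum(correct.values()) / sum(total.values())
print(f"overall accuracy: {overall:.2f}")
for group in sorted(total):
    print(f"{group}: accuracy = {correct[group] / total[group]:.2f}")
```

An overall accuracy figure can look "acceptable" while hiding the fact that one subgroup is served markedly worse than another; the audit is only a first step, but it is the step that makes the disparity visible at all.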

Data scientists must become better acquainted with patients and the daily flow of health delivery.

Because our diagnostics and interventions always exist on a continuum of risk and benefit, data scientists must understand what this really means to the patients and clinicians involved. One of the founding figures of data science at Duke, Frank Starmer, rounded weekly with the Chair of Medicine, Gene Stead. Embedded among attending physicians, residents, interns, and medical students, he learned the connection between the data being collected on computers he was installing in Duke’s intensive care units and the practical demands of clinical medicine. Data scientists must understand at a visceral level the consequences of good decisions and incorrect ones. In medicine, nothing can be more gratifying, and nothing can be more humbling.

Data scientists must be terrific listeners.

David Shaywitz wrote that Flatiron Health, a successful oncology data startup, was built around a "deeply human core." This was in large part due to its former chief medical officer (and now FDA Deputy Commissioner) Amy Abernethy. A former Duke house officer, chief resident, and faculty member, Dr. Abernethy spent many years in the trenches of academic medicine, accumulating thousands of conversations with patients, nurses, physicians, administrators, and researchers. That unique knowledge base informed the creation of a product that led to a $2.1 billion acquisition by Roche. The Flatiron model meets a compelling need that exists across all of medicine, not just in oncology. But it took a lot of listening to get there.

Quantifying isn’t everything.

Physicians Pamela Hartzband and Jerome Groopman have cautioned that enthusiasm for "medical Taylorism" and the accompanying drive to quantify and maximize profits in health delivery (often through the EHR) has a downside. Many aspects of human health don't move neatly and sequentially down a production line. Quantification is important, but it is not the endgame. Helping people is. The "scientific management" approach of Taylorism, they write, "…creates a fertile field for the sorts of cognitive errors that result in medical mistakes." So we must ask: how do we create data science applications that mesh naturally with human behavior while remaining rigorous?

In a recent JAMA perspective, Zeke Emanuel and Bob Wachter noted that “…data, analytics, AI, and machine learning are about identification. But they have little role in establishing the structures, culture, and incentives necessary to change the behaviors of clinicians and patients.” Better application of data to health is really a combination of anthropology and AI. Anthropology requires empathy to observe and understand “cultures” and “incentives.”

… After the code, the patient’s room is a brightly lit chasm. He has been whisked to the operating room by the cardiothoracic fellow and an anesthesia team. The floor is strewn with sterile pack wrappers, torn medication boxes from the code cart, oxygen and suction tubing. A wall-mounted vacuum flask is three-quarters full of cherry-colored fluid. The charge nurse is outside the door finishing paperwork. Tonight is a save—the other day wasn’t. You lean against the wall, the catecholamine sizzle in your system subsiding, humbled by your duty, humbled by the privilege of being trained to help people, humbled by this brush with mortality.

These are the stakes data science needs to understand and incorporate to become an integral part of health and healing. 

See more blog posts from Dr. Huang.