Hard Lessons: What the World of Health AI Can Learn from Aviation

January 6, 2021

“When fixing an aircraft you don’t want to design in the next crash”

The morning of October 29, 2018, was a balmy 80 degrees in Jakarta, Indonesia, with a 5-knot wind blowing out of the southwest. Under clear skies, Lion Air Flight 610 pushed back from the gate at Soekarno-Hatta International Airport, bound for Depati Amir Airport on Bangka Island. On board were 188 souls: two pilots, five flight attendants, and 181 passengers.

During takeoff, as the Boeing 737 Max 8 accelerated to rotation speed and the pilots pulled back on the control column to lift the nose of the plane to the sky, the left control column “stick shaker” activated and continued shaking for most of the few remaining minutes of the flight—a warning sign that the plane was at risk of stalling, or losing lift.

Twelve minutes later, the aircraft plummeted into the Java Sea 21 miles offshore, killing all aboard. Five months later (a timespan during which Boeing claimed that it would be able to identify and remedy whatever the problem was), Ethiopian Airlines Flight 302, also a Boeing 737 Max 8, crashed en route from Addis Ababa to Nairobi under suspiciously similar circumstances, with the loss of all 157 passengers and crew.

Despite initial protestations from Boeing’s then-CEO Dennis Muilenburg that the aircraft were “properly designed” and crew error was responsible for the total 346 lives lost, the 387 operational 737 Max 8s across the world remain grounded today, more than two years later, although the FAA recently announced a series of actions that could return the U.S. fleet to service.

Since then, investigations have raised many questions. Chief among them: how the aerodynamic characteristics of the 737 Max 8 differ from those of its predecessors, the workings of the plane’s Maneuvering Characteristics Augmentation System (MCAS) flight control system, what “airmanship” means, and the Federal Aviation Administration’s (FAA) ability to adequately oversee the certification of new aircraft. Meanwhile, Boeing’s course for readying the plane for its return to service has been rocky. Along the way, revisions of the MCAS have produced a failure state deemed “catastrophic” for a passenger airliner, there have been errors with indicator lights, wiring bundles have been found routed too close together, and a software glitch that “prevents the flight control computers from powering up and verifying they are ready for flight” appeared in testing.

Airplanes and Algorithms

You may reasonably ask: “Why is Duke Forge blogging about airplanes when it is dedicated to promoting actionable health data science?” Well, it’s all about algorithms: especially in mission-critical areas such as aviation and health, where a single error can have catastrophic results. Let’s dig in a bit further.

While the story of the 737’s double-decker big sibling, the Boeing 747, as a “bet the company” gamble is well-known, the original 737 was designed to be simple and cheap. It opportunistically borrowed the 707’s nose and the 727’s cross-section, mating them with small, high-lift wings and small-diameter engines. These engines, mounted tightly under the aircraft’s wings, permitted the use of a short undercarriage, facilitating ease of access for engine maintenance and loading and unloading of baggage.


As with any product, the 737 is the result of design decisions. At its genesis, Boeing recognized a market for small commercial jets with a capacity of approximately 100 passengers and a range of around 1,000 miles. This prompted development of a new plane that satisfied this market: the 737. As time passed, customers asked for more passenger capacity and longer range, and as regulators introduced new demands related to fuel economy and emissions, Boeing continuously massaged the 737 design to accommodate these new parameters. Prior to this most recent evolution that culminated in the 737 Max, over five decades the 737 had already grown 44 feet longer, gained 20 feet of wingspan, put on almost 80,000 pounds, and nearly doubled its range compared with the original design—arguably not an evolution, but an outright transmogrification of the airframe.

Most recently, facing strong competition from the latest and most efficient iteration of Airbus’ A320Neo, Boeing needed a quick and economical riposte in the form of a new Boeing “737 Max.” The need to respond quickly to a competitor for its market niche meant that Boeing would need to maximize speed to production and minimize cost. Given existing regulatory requirements, this meant that “…the redesign had to lie within the original 1968 FAA certification of the type and not be treated officially as a new airplane.”[1] This prompts the question: Is the 737 Max a new airplane or a variation on an established theme? Especially with an original certification more than a half century old?

Marketing Meets Engineering

Detail of a Boeing 737 Max 8 airliner on the ground, showing the plane’s right wingtip. Image credit: Oleg V. Belyakov via Wikipedia (CC BY-SA 3.0)

Part of the appeal of the 737 Max, per Boeing’s own advertising campaign, was that the 737 Max has the “…same pilot type rating, same ground handling, same maintenance program, same flight simulators, same reliability…” as its predecessors. However, a key component of this “sameness” resides in the aircraft’s flight handling characteristics. This is the point at which the MCAS becomes a focus for attention. The system was designed to make pilots at the controls of the 737 Max—with its new, larger engines and greater wingspan—feel just like they were flying earlier generations of 737s.

The MCAS is not a safety device, per se. Rather, it’s a software package essentially designed to make a new aircraft with significantly different flight characteristics and performance envelope feel (and respond to pilot input) like the old 737—a sort of “translation layer” between the pilot and the plane. The larger point of MCAS was that it would help Boeing avoid reclassifying the Max as a new aircraft, with all the regulatory burden a new certification requires. According to William Langewiesche,[1] the artificial feel the MCAS brought to flying the 737 Max was so indistinguishable from earlier 737s that “Boeing convinced itself (and the FAA) that there was no need to even introduce the MCAS to the airplane’s future pilots.”

There are, however, two problems with this. First, MCAS had catastrophic design flaws. Second, pilots were unaware that MCAS even existed.

Langewiesche notes that the crisp problem-solving that accompanies excellent piloting skills might have prevented both crashes, but Boeing’s ambition to make the 737 Max ubiquitous across the globe suggests the company should have been more judicious in accounting for wide variation in pilot skill. This adds a third problem: a mismatch between a product and its users.

When you board a typical commercial jet from the jet bridge, you can often observe below the pilot’s windshield a small moveable fin (known as a “vane”) projecting from the fuselage. This is the “angle-of-attack” (AoA) sensor. During flight it aligns with the airflow and indicates whether the plane’s nose is pointed up or down relative to the oncoming air. Modern commercial planes possess redundant AoA vanes so that if one fails (for instance, by freezing in place), another can be used. This redundancy also matters for systems that depend on AoA readings: most modern avionics use logic the pilots understand to reconcile disagreement between sensors. MCAS, however, was fatefully designed to accept input from only one vane, so a single malfunctioning vane could invoke a vigorous—and erroneous—deflection of the horizontal stabilizer, pointing the nose of the aircraft down. This design flaw proved fatal for Lion Air 610.
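To make the redundancy principle concrete, here is a minimal sketch in Python of cross-checking two redundant sensors before permitting an automated control action. All names, thresholds, and logic are hypothetical illustrations of the design principle; this is emphatically not Boeing’s actual MCAS implementation.

```python
# Hypothetical sketch: cross-check redundant angle-of-attack (AoA) vanes
# before allowing automation to act. Thresholds are illustrative only.

DISAGREEMENT_THRESHOLD_DEG = 5.5  # assumed tolerance between the two vanes

def reconcile_aoa(left_vane_deg, right_vane_deg):
    """Return (aoa, trustworthy) from two redundant sensor readings."""
    if abs(left_vane_deg - right_vane_deg) > DISAGREEMENT_THRESHOLD_DEG:
        # Sensors disagree: flag the reading as untrustworthy and let the
        # crew, not the automation, resolve the conflict.
        return None, False
    return (left_vane_deg + right_vane_deg) / 2, True

def should_command_nose_down(left_vane_deg, right_vane_deg, stall_aoa_deg=14.0):
    aoa, trustworthy = reconcile_aoa(left_vane_deg, right_vane_deg)
    # Act only when BOTH sensors agree the plane is near a stall.
    return trustworthy and aoa > stall_aoa_deg

# A single frozen vane (left stuck at 20°, right reading 4°) does not
# trigger an automated nose-down command.
assert should_command_nose_down(20.0, 4.0) is False
# Agreement at high AoA does.
assert should_command_nose_down(16.0, 15.0) is True
```

The point of the sketch is the `trustworthy` flag: a system that consumes only one sensor has no way to compute it, which is the structural flaw at issue.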

 In the wake of the Indonesian and Ethiopian tragedies, reporting revealed the existence of an indicator that would alert pilots to a disagreement between the AoA sensors. But this was a costly option offered by Boeing, one that an economy airline such as Lion Air unsurprisingly eschewed.

Larger Lessons from a Tragedy

Having taken a deep dive into commercial aviation, we’ll now try to make clear why these examples are relevant to healthcare. Are there lessons to be learned from the ongoing tragic saga of the Boeing 737 Max 8? Absolutely.

The Role of Automation in Mission-Critical Work

The ongoing story of the 737 Max 8 is—at its core—a question of appropriate use of automation. If the fundamental aerodynamic characteristics of an aircraft change, it’s worth thinking very seriously about whether using software to synthetically change the feel of the plane to avoid additional training or recertification is appropriate. Similarly in healthcare, where clinician burnout is rife,[2] automation has the potential to relieve much of that burden. Yet what and how we automate is a question that requires deep consideration. What happens if we accomplish the equivalent of spackling AI or software over obsolete designs?


A Politico article about the deployment of an electronic health record (EHR) in Denmark notes that a new AI application had to be built just to extract useful content from EHR-generated discharge letters. In this case, AI is being used to remediate inherent flaws in an EHR instead of fixing the flawed EHR in the first place. EHRs may sound modern, but they’ve been around for quite some time—an old “airframe” built on obsolete software architecture principles and even a core programming language dating from the 1960s. How much AI should we stack on top of this groaning airframe without reconsidering whether we should reengineer the airframe itself? AI shouldn’t be a technological Band-Aid: if there are deficiencies in technologies such as EHRs, then we should fix them. And if pragmatic considerations require AI as a stopgap measure, we should decide when we will address the underlying problem rather than using AI indefinitely as the equivalent of a “healthcare MCAS.”

Dealing with Conflicting Information

Black-and-white photograph of a jetliner cockpit showing instruments and controls. Image credit: September20th via Pexels

One of the fundamental problems faced by clinical trainees and professionals alike is conflicting data. A modern clinical environment includes a mix of objective data such as laboratory tests, derived information (in many cases, a seemingly objective value such as a “diagnosis” is a composite of multiple facts), and unrecorded, hidden information such as clinician practice patterns. In many cases, the information points in different directions. And usually, as with the Lion Air 737 Max 8, there is no indicator for such contradictions. Most uses of AI in healthcare skim structured “objective” data or AI-amenable imaging data, while much deeper, latent, and contextual information remains inaccessible. This means that clinicians still need to reconcile potential contradictions between the outputs of machine learning and other information they have about their patients. Algorithms provide a tempting path out of this thicket. But can they recognize context that goes unrecorded in billing and charting systems?

Failure Modes that Are Graceful Rather than Catastrophic

In order to make the 737 Max 8 “feel like” earlier models, Boeing designed a system with the potential to fail catastrophically, catching flight crews literally by surprise—crews who were oblivious to the very existence of the MCAS. Professional clinicians are undoubtedly familiar with operating amid ambiguity and uncertainty—that is part of the business—but algorithms have a factitious air of certitude or objectivity that is tempting to accept without further scrutiny.[3] One thing we will need to consider in the “user experience” of AI is an algorithm’s ability to convey varying levels of confidence (many implementations misleadingly display no degree of uncertainty at all) and to contextualize a prediction amid the presence or absence of information we have about our patients.
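As one way such a “user experience” might look, here is a minimal Python sketch of surfacing a prediction together with its uncertainty and data caveats, rather than a bare score. The function name, confidence bands, and example inputs are all hypothetical, not drawn from any particular clinical system.

```python
# Hypothetical sketch: present a risk prediction with its uncertainty
# interval and a note about missing inputs, instead of a bare number.
# Band thresholds below are illustrative assumptions.

def present_risk(point_estimate, interval_low, interval_high, missing_inputs=()):
    """Render a risk prediction with its uncertainty and data caveats."""
    width = interval_high - interval_low
    if width > 0.3:
        confidence = "low confidence"
    elif width > 0.1:
        confidence = "moderate confidence"
    else:
        confidence = "high confidence"
    message = (f"Predicted risk {point_estimate:.0%} "
               f"(95% interval {interval_low:.0%}-{interval_high:.0%}, {confidence})")
    if missing_inputs:
        # Contextualize the prediction: tell the clinician which inputs
        # were unavailable rather than silently imputing them.
        message += "; missing inputs: " + ", ".join(missing_inputs)
    return message

print(present_risk(0.42, 0.20, 0.65, missing_inputs=("recent creatinine",)))
# prints: Predicted risk 42% (95% interval 20%-65%, low confidence); missing inputs: recent creatinine
```

The design choice mirrors the AoA-disagreement lesson: the interface admits what it does not know instead of projecting false certitude.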

Training and “Airmanship”

Algorithms do not obviate the professionalism of health care providers. It is tempting to perform “bake-offs” between algorithms and clinicians—and examples already abound—but algorithms do not take care of patients. Professionals do. And part of that professional obligation is contextualizing information within a patient’s own goals, not for how an algorithm was quantitatively optimized. Think: who would you rather have take care of you? The best doctor you've ever known, or an algorithm? Something to be attentive to is whether clinicians begin to cede their skills and capabilities to algorithms, a process known as “de-skilling.”

Mismatch Between Product & Users

Who is the primary recipient of output from an AI? If an algorithm is more effective for screening a population, is it useful to layer an additional metric onto a front-line clinician amid the cacophony of signals she receives daily? Should centralized teams, viewing predictions from an operations center and serving as consultants, offload the alert burden from busy services and provide a buffer and an implementation path for algorithms? There is strong temptation to use algorithms in adjacent disease areas or use cases that they were not explicitly designed to serve—the equivalent of everything looking like a nail when all you have is a hammer. Certainly this can be done, but it requires careful analysis of the algorithm’s performance in adjacent settings and in clinical workflows that are likely to be very different from its training environment.

Regulatory Issues

In December, the Senate Committee on Commerce, Science, and Transportation released a report on its investigation of the FAA’s role in the Boeing 737 Max 8 debacle. The Committee concluded that Boeing “inappropriately coached” FAA test pilots toward a predetermined outcome for recertification, and that some tests were performed on simulators that did not even include a simulated MCAS function. “Regulatory capture,” a term describing the cozy relationships that can develop between supposedly independent regulators and those they ostensibly oversee, undermines the intent of regulation*. In the world of algorithms in healthcare, agencies such as the Food and Drug Administration and the Federal Trade Commission are slowly and carefully developing policies for protecting consumers. In the meantime, most algorithm builders perform their own analyses of the performance characteristics of their work. Even with the strong culture of transparency in the machine learning world, builders are, by definition, conflicted. Therefore, organizations that intend to use algorithms must develop independent oversight, regardless of how prepared federal agencies may be or how immature state and federal legislation may remain.

1868 photograph of an early unsuccessful flying machine - Jean-Marie Le Bris' Albatross II - sitting on a pair of carriage wheels
Image via Wikimedia Commons

Learning to Fly

With any new and promising technology, we often begin with phases of breathless optimism and hubris that are followed by disillusionment and cynicism. The application of artificial intelligence and algorithms in health care will likely follow a similar trajectory. At the same time, health care is labor-intensive, data-intensive, mission-critical, and…human. The busy frontline clinicians who literally bring their work home with them and wrestle with electronic health records after the dishes are washed and the children put to bed need all the help we can get them. By paying close attention to the lessons of the Boeing 737 Max 8, we can use our consciousness and our consciences to effectively and safely blend algorithms into how we care for our patients and our communities.

*On January 7, 2021, the U.S. Department of Justice announced that it had reached a settlement in the form of a deferred prosecution agreement with Boeing, under which the company would admit to “criminal misconduct” and pay a total of $2.5 billion, most of which would go to compensate airlines for purchases of the 737 Max 8. A total of $500 million is set aside for compensation to crash victims’ families, and nearly $244 million will be paid in criminal penalties. Details of the settlement, including the possibility of future action against individuals employed by Boeing, are available in this story by NPR’s David Schaper.


[1] Langewiesche W. What really brought down the Boeing 737 Max? New York Times. September 18, 2019; updated December 29, 2020. Available at: https://www.nytimes.com/2019/09/18/magazine/boeing-737-max-crashes.html

[2] Tseng P, Kaplan RS, Richman BD, Shah MA, Schulman KA. Administrative Costs Associated With Physician Billing and Insurance-Related Activities at an Academic Health Care System. JAMA. 2018 Feb 20;319(7):691-697. doi: 10.1001/jama.2017.19148. PMID: 29466590; PMCID: PMC5839285.

[3] Kompa B, Snoek J, Beam AL. Second opinion needed: communicating uncertainty in medical machine learning. NPJ Digit Med. 2021 Jan 5;4(1):4. doi: 10.1038/s41746-020-00367-3. PMID: 33402680.


Portrait of Duke Forge Director Erich S. Huang, MD, PhD