APIs and Health Data: Chipping Away at the Tower of Babel

September 20, 2018

In the kickoff symposium for Duke Forge in 2017, I noted that a 2012 issue of the Harvard Business Review heralded “data scientist” as the “sexiest job of the 21st century,” with “sexy” in this case construed as “having rare qualities that are much in demand.” Indeed, considering this summer’s oversubscription of Duke’s Summer Course in Machine Learning, many see working familiarity with TensorFlow, convolutional neural networks (CNNs), long short-term memory (LSTM), and generative adversarial networks (GANs) as the attributes that will prompt prospective employers to “swipe right” on their resumés.

In contrast to the Harvard Business Review, however, the New York Times reported that good data science requires a lot of “janitor work”. In the article, a computer scientist stated, “it’s an absolute myth that you can send an algorithm over raw data and have insights pop up.”

I heartily agree.

More often than not, good data science requires a laborious slog into the raw muck and vagaries of a dataset before you can fire up and tune an algorithm to wring meaning from it. Many cite a rule of thumb of 80% of time spent cleaning data and 20% applying machine learning to it. But without discounting the real labor that, say, an Airbnb or Stitch Fix data scientist puts into pointing personalized recommendations at customers, our colleagues in the tech industry have made it easier to store data of diverse types at scale, distribute it to computing resources, and push results to individual consumers.

Another thing tech folk do well is create “ecosystems” for their data science to flourish. Airbnb and Lyft would not work nearly as well without preexisting services like Google Maps. Braintree Payments provides technology for clearing credit card transactions for Stitch Fix; Warby Parker uses Stripe for the same. In turn, these businesses make their services available to other businesses: in Apple Maps you can request a Lyft, your local restaurants use Uber Eats to deliver, and you can watch HBO NOW on an Apple TV.

Such ecosystems depend on application programming interfaces, or APIs.

APIs are software ports that provide standardized routes for moving data around a network or the internet. APIs are how your browser or your mobile phone talks to Instagram or Yelp. When you open the Dropbox app, it relies on APIs much as your TV or laptop relies on its HDMI, video, power, or data ports: standardized connections that any compatible device can plug into.
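To make the “standardized route” idea concrete, here is a minimal sketch in Python using only the standard library. It stands up a toy HTTP service on the local machine and calls it the way a browser or phone app would: a GET request to a URL path with a query parameter, answered with JSON. The route `/v1/places`, the `type` parameter, and the place records are all hypothetical; real services such as Google Maps or Yelp define their own routes, authentication, and response formats.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical records standing in for a real service's database.
PLACES = [
    {"name": "Corner Grocery", "type": "grocery"},
    {"name": "Quick Mart", "type": "convenience"},
    {"name": "Farmers Market", "type": "grocery"},
]


class PlacesHandler(BaseHTTPRequestHandler):
    """Server side: answers GET /v1/places?type=... with JSON (hypothetical route)."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/v1/places":
            self.send_error(404, "Unknown route")
            return
        wanted = parse_qs(url.query).get("type", [None])[0]
        results = [p for p in PLACES if wanted is None or p["type"] == wanted]
        body = json.dumps({"results": results}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def fetch_places(host, port, place_type):
    """Client side: send GET to the route and decode the JSON response."""
    conn = HTTPConnection(host, port, timeout=5)  # direct connection, no proxy
    try:
        conn.request("GET", "/v1/places?" + urlencode({"type": place_type}))
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"unexpected status {resp.status}")
        return json.loads(resp.read())["results"]
    finally:
        conn.close()


# Port 0 asks the OS for any free port; serve from a background thread.
server = HTTPServer(("127.0.0.1", 0), PlacesHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    groceries = fetch_places("127.0.0.1", server.server_port, "grocery")
    print([p["name"] for p in groceries])
finally:
    server.shutdown()
    server.server_close()
```

The client knows nothing about how the server stores its data; it only needs the route and the response format. Any other program that speaks HTTP and JSON could call the same route, which is what lets ecosystems form around an API.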

The willingness to participate in and contribute to ecosystems that power APIs is exactly where healthcare is at its weakest. Many healthcare systems have only recently converted from paper charts to electronic records, and there is much talk about “interoperability” and the potential of more facile data exchange. Yet there is little institutional knowledge of or comfort with enabling the kinds of interconnected ecosystems the tech world is building elsewhere.

For the most part, the clinical world is still held back by “static” data analysis. You spend months, if not years, navigating the governance required to access a piece of data. You wait for a data delivery team to extract a stale, non-real-time chunk of data; you analyze it, and perhaps you publish or report results that were relevant last year. And if you happen to have built a machine learning algorithm that you now want to use in the course of patient care, it’s very difficult to deploy it in ways that allow it to receive and respond to live information.
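One way to let a model receive and respond to live information is to put it behind an API endpoint, so that other systems can send it new observations as they arrive instead of exporting batches to it. The sketch below is an illustration under stated assumptions, not a deployment recipe: the logistic model and its coefficients are made up for demonstration and have no clinical validity, and the field names `glucose_mg_dl` and `age_years` are hypothetical. It uses Python’s standard-library WSGI interface; the helper `call` invokes the app in-process so the example runs without a network.

```python
import io
import json
import math
from wsgiref.util import setup_testing_defaults

# Made-up coefficients for illustration only; NOT a validated clinical model.
COEFFS = {"intercept": -6.0, "glucose_mg_dl": 0.02, "age_years": 0.03}


def risk_score(obs):
    """Toy logistic model: maps one observation to a probability in (0, 1)."""
    z = COEFFS["intercept"] + sum(
        COEFFS[name] * obs[name] for name in ("glucose_mg_dl", "age_years")
    )
    return 1.0 / (1.0 + math.exp(-z))


def app(environ, start_response):
    """WSGI endpoint: POST one JSON observation, receive a JSON risk score."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST a JSON observation"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    obs = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"risk": round(risk_score(obs), 3)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(wsgi_app, payload):
    """Invoke the app in-process with a POST body, as a WSGI server would."""
    data = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(data)),
               "wsgi.input": io.BytesIO(data)}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = wsgi_app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


status, result = call(app, {"glucose_mg_dl": 180, "age_years": 60})
print(status, result)
```

For local experimentation, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` would expose the same function over HTTP; a production deployment would sit behind a proper application server with authentication and logging.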

How can we help build healthcare ecosystems that are more responsive and “real time”? APIs might very well be the answer.

Amid the current enthusiasm for machine learning and artificial intelligence, APIs may seem far from “sexy,” but the machine learning part is far downstream of accessing and understanding data and preparing it for those fancy algorithms. A much-touted concept in medicine is that of “learning health systems,” in which we constantly assimilate the data we generate in the course of healthcare, analyze it, and act upon it. This is hardly feasible when the processes associated with data analysis are so manual and ponderous. You can be sure that Lyft is looking at and adjusting to its data in real time. It is a “learning rideshare”: it converts real-world, real-time evidence into action.

We should strive to do the same in health.

If you want to play with the Google Maps Places API, you’re allowed to “hit” it 25,000 times a day for free. Within this free tier, I’ve been able to look at how far diabetic patients in different parts of town would have to walk or drive to different types of food retail locations. It’s virtually impossible to do the same with our own data in the healthcare world. In the broader tech world, community standards for APIs have become the lingua franca of web and application development. In healthcare, we have an impenetrable Tower of Babel.
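The walking and driving distances mentioned above come from routing services; a cruder first pass needs only coordinates. The sketch below computes great-circle (haversine) distance from a patient’s location to the nearest store in a list. This is a swapped-in, simpler technique, not what the Places API or a routing API returns: straight-line distance ignores roads and sidewalks, and the coordinates and store names are hypothetical.

```python
import math

# Mean Earth radius in kilometres; adequate for city-scale distances.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(origin, places):
    """Return (name, km) for the place closest to origin = (lat, lon)."""
    return min(
        ((p["name"], haversine_km(origin[0], origin[1], p["lat"], p["lng"])) for p in places),
        key=lambda pair: pair[1],
    )


# Hypothetical locations, loosely placed around Durham, NC.
stores = [
    {"name": "Store A", "lat": 36.00, "lng": -78.95},
    {"name": "Store B", "lat": 36.10, "lng": -78.90},
]
name, km = nearest((36.00, -78.90), stores)
print(f"{name}: {km:.1f} km")
```

In practice the store list would come from a places query and the distances from a routing query; the straight-line version is mainly useful as a quick sanity check or when routing calls would exceed a quota.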

One thing we must do is start teaching our trainees a way out of our Babel: how to use APIs and what a good API looks like. When you can easily merge Google Maps data on retail locations with Zillow mortgage data, you begin to wonder why we make it so hard on ourselves to serve our own patients. None of this means discounting privacy or security concerns. If anything, APIs provide the opportunity to become more granular in controlling access to sensitive data and more rigorous in auditing their use.
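As a rough illustration of how an API layer can make access control more granular and auditing more rigorous, the sketch below puts a tiny gateway in front of a data store: each caller’s token is granted specific scopes, and every request, allowed or denied, is written to an audit log. The token names, scope strings such as `labs:read`, and the record layout are hypothetical; real systems typically build on established standards such as OAuth 2.0 rather than hand-rolled checks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Gateway:
    """Toy API gateway: scope-based access checks plus an append-only audit log."""
    grants: dict                      # token -> set of allowed scopes (hypothetical)
    store: dict                       # (patient_id, scope) -> record
    audit_log: list = field(default_factory=list)

    def get(self, token, scope, patient_id):
        allowed = scope in self.grants.get(token, set())
        # Log every attempt before deciding, so denials are auditable too.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "token": token,
            "scope": scope,
            "patient": patient_id,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"token {token!r} lacks scope {scope!r}")
        return self.store[(patient_id, scope)]


gw = Gateway(
    grants={"research-app": {"labs:read"}},
    store={("p1", "labs:read"): {"glucose_mg_dl": 126},
           ("p1", "notes:read"): {"text": "(clinical note)"}},
)
labs = gw.get("research-app", "labs:read", "p1")    # allowed: scope granted
try:
    gw.get("research-app", "notes:read", "p1")      # denied: scope not granted
except PermissionError as err:
    print("denied:", err)
print([entry["allowed"] for entry in gw.audit_log])
```

Because every route passes through one checkpoint, access can be granted per data type rather than per database, and the log answers “who asked for what, and when” without relying on each downstream analyst to keep records.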

All in all, while we need to teach our trainees about “Adagrad” and “Adam” for optimizing neural networks, they also need regular exposure to best practices in accessing data via APIs and should learn how to pipeline these data to algorithms. These skills should be table stakes and a fundamental part of their training for the future ecosystem we need in health and healthcare.
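A minimal sketch of such a pipeline: parse an API’s JSON response, do the janitor work (drop missing values, normalize units), and reduce each patient’s data to features an algorithm can consume. The payload shape is hypothetical, and the glucose conversion uses the common approximation of 18 mg/dL per mmol/L; real feeds differ by vendor and standard and need their own cleaning rules.

```python
import json
from statistics import mean

# Hypothetical API response; real payload shapes vary by vendor and standard.
RAW = """{"observations": [
  {"patient": "p1", "code": "glucose", "value": 7.0, "unit": "mmol/L"},
  {"patient": "p1", "code": "glucose", "value": 140, "unit": "mg/dL"},
  {"patient": "p2", "code": "glucose", "value": null, "unit": "mg/dL"},
  {"patient": "p2", "code": "glucose", "value": 95, "unit": "mg/dL"}
]}"""

MG_DL_PER_MMOL_L = 18.0  # common approximation for glucose (exact is about 18.016)


def clean_value(obs):
    """Janitor work: return glucose in mg/dL, or None if missing or unit unknown."""
    if obs.get("code") != "glucose" or obs.get("value") is None:
        return None
    if obs["unit"] == "mmol/L":
        return obs["value"] * MG_DL_PER_MMOL_L
    if obs["unit"] == "mg/dL":
        return float(obs["value"])
    return None  # unrecognized unit: skip rather than guess


def to_features(payload):
    """Parse the JSON payload and summarize each patient's cleaned readings."""
    readings = {}
    for obs in json.loads(payload)["observations"]:
        value = clean_value(obs)
        if value is not None:
            readings.setdefault(obs["patient"], []).append(value)
    return {
        pid: {"mean_glucose_mg_dl": mean(vals), "n_readings": len(vals)}
        for pid, vals in readings.items()
    }


features = to_features(RAW)
print(features)
```

The per-patient feature dictionaries are what would then be handed to a model; in a live setting the same function would run on each new API response rather than on a one-off extract.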

Author: Erich S. Huang, MD, PhD, Director, Duke Forge