Health Data Science Internship Program

The Health Data Science Internship Program is a partnership between the Duke Clinical Research Institute (DCRI), which hosts and manages the interns, and Duke Forge, which convenes and catalyzes the projects interns are assigned to. Projects are part of the Forge Demonstration Program, in which transdisciplinary teams use machine learning methods and health data to “demonstrate the art of the possible.” The HDS internship program runs for 17 months. Interns work under the direction of quantitative experts, are paired with biostatistician staff mentors, and receive dedicated technical and professional skills training, including a summer journal club, weekly one-on-one sessions with their mentors, and weekly meetings with the faculty quantitative lead.

The Health Data Science Internship Program launched in May 2017 and has formed 3 intern cohorts comprising 15 students in total. Duke graduate students pursuing master’s-level studies in any quantitative discipline are eligible to apply to the program. Graduates have gone on to PhD programs and to positions at organizations including the Dana-Farber Cancer Institute, the NYU School of Medicine, QuantumBlack, SAP Leonardo, the University of Wisconsin-Madison, Wells Fargo, and VF Corporation.

Duke Health Data Science Intern Cohorts, 2017-2019

Health Data Science Internship Mentors
Biostatistical Mentors. L-R: Matt Phelan, MS, Hillary Mulder, MS, Robert AJ Overton, MS, Peter Merrill, PhD, Allison Dunning, MS. Not pictured: Jyotsna Garg, MS, and Steven Lippman, PhD

Learning Objectives for Students in the Duke HDS Internship Program
  1. Comprehend the big-picture clinical question. Interns will demonstrate that they understand how a clinical question or problem translates into an analyzable scope, and will be able to articulate the objectives of the project clearly.
  2. Understand the data, including preparation and processing stages. They will demonstrate a thorough understanding of the end-to-end process of working with and preparing the data. This includes consideration of the underlying population and data generation process; understanding potential sources of bias and confounders; and assessment of data quality and cleanliness.
  3. Apply appropriate methodologies for data exploration, description, analysis, and modeling. Interns will demonstrate comprehension of the methods applied, including assumptions and limitations of a given approach.
  4. Ensure reproducibility of their results through good documentation and code provenance. They will carefully manage their code and develop accurate descriptions of what they are doing, and why, in a way that others can understand.
  5. Use best practices in programming. Interns will demonstrate tool-specific skills and apply common software development practices such as modularizing code, use of code repositories, and version control.
  6. Engage with their teams at a professional level and develop collegial relationships. They will engage as an active contributor with their transdisciplinary team, communicate clearly and regularly, bring issues forward, and work with the team in developing solutions.
  7. Tailor their communications to the appropriate audience, including clinician and quantitative team members. Beyond giving a static presentation, interns will have the ability to listen to others and the confidence to speak up professionally. They will demonstrate the ability to discuss and converse in many situations with many audiences, and tailor their use of language and concepts to the people they are engaging with.
  8. Cultivate self-awareness and use knowledge of their strengths and weaknesses to guide continuous learning. They will be able to evaluate areas where they don't have adequate knowledge and be able to discuss and actively develop this understanding.

These learning objectives originally appeared in an abstract poster presented at the American Medical Informatics Association (AMIA) Annual Symposium, November 3-7, 2018, in San Francisco, California.

Health Data Science Internship Program Leadership
  • Lisa Wruck, PhD: Director of the Center for Predictive Medicine, Duke Clinical Research Institute
  • Larisa Rodgers, MSW: Health Data Science Program Coordinator, Duke Clinical Research Institute
  • Ricardo Henao, PhD: Principal Data Scientist, Duke Forge; Assistant Professor, Biostatistics & Bioinformatics; Assistant Professor, Electrical and Computer Engineering; Faculty, Duke Clinical Research Institute
  • Shelley Rusincovitch, MMCi: Associate Director of Informatics, Duke Forge
  • Ursula Rogers, BS: Senior Informaticist, Duke Forge
  • With key program advisers including Amy Herring, DSci, Victoria Christian, Michael Pencina, PhD, and Larry Carin, PhD

Acknowledgements: This work was supported by Duke Forge and the Duke Clinical Research Institute. Health Data Science at Duke is supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002553.