2017-2018 Health Data Science Interns Get Hands-On Experience with Duke Forge and DCRI

January 15, 2018

Six graduate students from Duke University have been paired with staff biostatistician mentors for internships that offer substantive hands-on experience in health data science.

The 2017-2018 cohort includes Yimeng Jia, Arpita Mandan, Xilin (Cecilia) Shi, Muyao Sun, Lanqiu (Kate) Yao, and Ran Zhou.

Interns are based in the Duke Clinical Research Institute Center for Predictive Medicine. “This first cohort of interns began last summer, and since that time we’ve seen them rapidly progress and develop,” commented Lisa Wruck, PhD, the center’s director. “We’re excited to bring a second cohort of interns in the spring and continue to expand and refine the program.”

The mentors have played an important role in the success of the program. In addition to guiding the interns through their roles on the analytic teams, the mentors have made critical contributions to the projects and are developing their own leadership and technical skills. Mentors include Allison Dunning, MS, Peter Merrill, PhD, Hillary Mulder, MS, Robert (AJ) Overton, MS, and Matt Phelan, MS.

As members of transdisciplinary teams, the interns contribute to projects convened by the Forge, Duke’s center for health data science, in collaboration with clinical and quantitative leaders across Duke. Michael Pencina, PhD, Director of Biostatistics for the Duke Clinical Research Institute, noted that “the internship program provides students with important opportunities to work with real problems in health data science and learn from experts in the field.”

Projects involve advanced quantitative methods, including statistics and machine learning, and give interns the opportunity to be contributing members of multidisciplinary teams. “The students have worked hard, and we’re looking forward to the next semester with them,” said Ricardo Henao, PhD, principal data scientist for Duke Forge.

More about each student and their internship experience:

Yimeng Jia, Master of Statistical Science, Class of 2018
Prior to joining Duke, Yimeng obtained her Bachelor of Science at the University of California, Los Angeles. She likes cutting-edge science and technology and enjoys applying new research methods to real-world problems. As a health data science intern at the Duke Clinical Research Institute, she is currently working on predicting medication adherence among patients with type II diabetes using clinical notes. She is proud that her background in statistics and data science can help generate innovative insights in the healthcare field, and she plans to continue solving data science problems in the future.

Arpita Mandan, Master of Statistical Science, Class of 2018
Arpita completed her Integrated Master’s in Chemistry at the Indian Institute of Technology Kanpur, in India. During her undergraduate studies, she completed multiple internships in biochemistry at UW Madison. Arpita got involved in health data science because it allows her to work with statistics while using data to contribute to healthcare research. The best part of her health data science internship so far has been working with data from real patients with type 2 diabetes. The objective of the project is to develop individualized medication recommendations for patients with diabetes. Analyzing real data to understand the different kinds of patients has been exciting.

Xilin (Cecilia) Shi, Master of Statistical Science, Class of 2018
Before coming to Duke, Xilin obtained her bachelor’s degree in statistics and financial mathematics from the Hong Kong University of Science and Technology. She is passionate about using data to solve real-world problems, and she sees health data science as an emerging field with many opportunities to explore. Through her health data science internship, she has gained hands-on experience working with massive data sets, fitting different models, and comparing the results. In the future, she would like to become a data scientist and use machine learning techniques to extract information and find hidden patterns in data.

Muyao Sun, Master of Statistical Science, Class of 2018
Muyao obtained her bachelor’s degree at the University of Michigan and is part of the neonatal intensive care unit (NICU) predictive modeling project. The purpose of this project is to improve early diagnosis and prediction of necrotizing enterocolitis. Muyao is excited to use both machine learning and conventional statistical methods to gain insights from the data. After graduation, she plans to work as a data scientist in the healthcare industry.

Lanqiu (Kate) Yao, Master of Biostatistics, Class of 2018
Before joining Duke, Kate earned bachelor’s degrees in clinical medicine and in statistics at Peking University in China. She is working on the NICU prediction project, which aims to help doctors predict the occurrence of necrotizing enterocolitis (NEC) in newborns. She enjoys her work at DCRI because it gives her the opportunity to collaborate with experienced statisticians and doctors, as well as a chance to compare machine learning methods with traditional biostatistical methods in healthcare. Her main research interests include machine learning theory and methods, with applications to high-dimensional and network data analysis as well as image analysis. Hoping to help people fight disease, she plans to pursue a PhD and continue working in this field.

Ran Zhou, Master of Statistical Science, Class of 2018
Ran earned her bachelor’s degree at the University of Illinois at Urbana-Champaign with a double major in mathematics and statistics. Her undergraduate studies in statistics drew her toward a career in data science. She is currently working on a project building a natural language processing model using neural networks. The experience has shown Ran the potential of machine learning for solving medical problems and has made her more certain about a future career in data science. Working on natural language processing has also boosted her confidence in handling text data and in pursuing a data science position focused on natural language processing.