Academic Seminar

Title: Reproducible, Reusable and Robust Data Science: from Theory to Practice

Speaker: Ana Trisovic, Associate Professor, Harvard University

Host: Li Wei (李伟)

Time: December 21, 2020, 9:55-10:40 a.m.

Tencent Meeting ID: 300 853 885

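About the speaker: Ana Trisovic is a Sloan Postdoctoral Fellow at the Institute for Quantitative Social Science (IQSS) at Harvard University. Her research focuses on computational reproducibility, data preservation, and data science. Working with the Dataverse team, she studies how automation, metadata, and encapsulation can facilitate the reuse of research data and code. Previously, she was a CLIR Postdoctoral Fellow at the University of Chicago, where she worked with the Energy Policy Institute (EPIC) and the Library. She completed her PhD in Computer Science at the University of Cambridge in 2018, with a thesis on data preservation and reproducibility at the LHCb experiment at CERN. While at CERN, she worked with the LHCb collaboration, CERN Open Data, and the CERN Analysis Preservation group. During her doctoral studies she was a Muir Wood Scholar at Newnham College, a member of the CERN Doctoral Student Programme, and a recipient of the Google Anita Borg Memorial Scholarship.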

Abstract: A new challenge in data science and machine learning is to ensure that published results are reliable and robust, which is essential for research verification and trustworthiness. In recent years, however, researchers have had difficulty recreating and replicating machine learning models, which undermines reproducibility, defined here as obtaining consistent results using the same input data, methods, and code. A reproducibility crisis has also been reported, as many published results cannot be reproduced. To enable reproducibility, a researcher needs actionable steps that facilitate implementation and help mark progress in practice. This talk focuses on such steps for enabling reproducibility and reuse. In particular, we will discuss several aspects that can help improve the robustness of a data science analysis, such as data provenance, feature provenance, model provenance, and the software environment. The talk will outline concrete guidelines and checklists as tools for building a reproducible machine learning pipeline and effectively sharing data science results.
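
As a rough illustration of the aspects named in the abstract (data provenance, fixed randomness, and the software environment), the following is a minimal sketch and not the speaker's method. The function names `build_provenance_record` and `run_analysis`, the toy input bytes, and the choice of recorded fields are all assumptions made for this example. It uses only the Python standard library.

```python
import hashlib
import json
import platform
import random
import sys


def build_provenance_record(data: bytes, seed: int, params: dict) -> dict:
    """Collect facts needed to rerun an analysis on identical inputs.

    Hypothetical helper: the set of fields is an assumption, not a standard.
    """
    return {
        "data_sha256": hashlib.sha256(data).hexdigest(),  # data provenance
        "seed": seed,                                     # fixed randomness
        "params": params,                                 # analysis settings
        "python_version": sys.version.split()[0],         # software environment
        "platform": platform.platform(),
    }


def run_analysis(data: bytes, seed: int, n_samples: int) -> float:
    """Toy stand-in for a model fit: mean of a seeded random sample of bytes."""
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in range(n_samples)]
    return sum(sample) / len(sample)


data = b"example input data"  # placeholder input, assumed for illustration
record = build_provenance_record(data, seed=42, params={"n_samples": 100})

# Rerunning with the recorded seed and parameters gives the same result.
first = run_analysis(data, record["seed"], record["params"]["n_samples"])
second = run_analysis(data, record["seed"], record["params"]["n_samples"])

print(json.dumps(record, indent=2, sort_keys=True))
print("identical results:", first == second)
```

Storing such a record next to each published result lets a reader check that they hold the same input data (via the hash) and rerun the analysis with the same seed and settings. A real pipeline would likely also record package versions and the code revision.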
