Course Repository
The official course repository containing all the relevant material (slides, recordings, images, etc.) is: https://github.com/federicoruggeri/phdlectures-r3
Introduction
The number of scientific articles published in Computer Science (and similar fields) increases steadily every year. This is mainly due to breakthroughs like Deep Learning and, more recently, Large Language Models.
Paradoxically, researchers are struggling even more to reproduce published research. This issue affects all possible aspects of research, including methodology, data curation, approach comparison, and implementation.
In this course, we’ll introduce and discuss the concept of ‘reproducibility’ in research. In particular, we’ll survey current issues in research and existing attempts to address them. We’ll focus on data curation, experimental setup, model comparison, and programming best practices.
This course is recommended for all types of researchers, from those who have just embarked on their journey to those who have always wondered how certain research managed to get published. See Section Prerequisites for more details.
Part 1: Reproducibility in Research
We discuss the current risks of doing research, characterized by non-reproducible findings, resources (e.g., data, code, artifacts) that are not publicly available, and an unsustainable pressure to publish large numbers of papers in a short time. In this world, whether as an aspiring or an experienced researcher, would you accept (trust) a work that (i) doesn’t provide sufficient information for reproducibility; (ii) doesn’t provide any code; (iii) doesn’t provide the data or guidelines for collecting its contributing dataset; (iv) doesn’t provide training details like model hyper-parameters and data partitioning?
Lecture recordings
Readings
- Stodden and Miguez, 2013 - Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research.
- McNutt, 2014 - Reproducibility.
- Allison et al., 2016 - Reproducibility: A tragedy of errors.
- Baker, 2016 - 1,500 scientists lift the lid on reproducibility.
- Edwards and Roy, 2017 - Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition.
- Raff, 2019 - A Step Toward Quantifying Independently Reproducible Machine Learning Research.
- Bouthillier et al., 2019 - Unreproducible Research is Reproducible.
- Pineau et al., 2020 - Improving Reproducibility in Machine Learning Research.
- Serra-Garcia and Gneezy, 2021 - Nonreplicable publications are cited more than replicable ones.
- Leonelli, 2023 - Philosophy of Open Science.
- Kapoor and Narayanan, 2023 - Leakage and the reproducibility crisis in machine-learning-based science.
Part 2: Data Collection and Annotation
Reproducibility can target different aspects of the research pipeline, from the conceptualization of ideas to collecting resources and running experiments. We cover several aspects of data collection, since it often represents the backbone of most machine learning research. In particular, we cover annotation paradigms, requirements for collecting and annotating data, issues and risks when collecting data, and evaluating annotation quality.
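As a purely illustrative sketch of the last point, annotation quality is often estimated via inter-annotator agreement. Assuming scikit-learn is available and using made-up labels from two hypothetical annotators:

```python
# Minimal sketch: estimating annotation quality via inter-annotator agreement.
# The annotators and labels below are made-up illustrative data.
from sklearn.metrics import cohen_kappa_score

# Two hypothetical annotators labelling the same 8 items (binary task).
annotator_a = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 1, 0, 0, 1, 1, 0]

# Cohen's kappa corrects raw agreement for agreement expected by chance.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```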
Lecture recordings
- Lecture 3 – Data Collection and Annotation.
Readings
- Geva et al., 2019 - Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets.
- Liao et al., 2021 - Are We Learning Yet? A Meta-Review of Evaluation Failures Across Machine Learning.
- Paullada et al., 2021 - Data and its (dis)contents: A survey of dataset development and use in machine learning research.
- Koch et al., 2021 - Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research.
- Röttger et al., 2022 - Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks.
- Cabitza et al., 2023 - Toward a Perspectivist Turn in Ground Truthing for Predictive Computing.
- Ruggeri et al., 2025 - Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology.
Part 3: Modeling and Experimenting
Reproducibility is equally essential in the remaining stages of a machine learning pipeline: modeling, experimenting, and evaluation. We cover data partitioning, data leakage, random seeding, performance comparison, and metrics.
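As a purely illustrative sketch of two of these points (fixing and reporting random seeds, and fitting preprocessing on the training split only to avoid leakage), assuming scikit-learn and synthetic data:

```python
# Minimal sketch: fixed random seed + leakage-free preprocessing.
# Data and model choices are illustrative; scikit-learn is assumed available.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

SEED = 42  # report the seed(s) you used; ideally repeat runs over several seeds

rng = np.random.default_rng(SEED)
X, y = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)

# Partition BEFORE fitting any preprocessing statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y
)

# Fit the scaler on the training split only, then apply it to the test split:
# fitting it on the full dataset would leak test statistics into training.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(random_state=SEED).fit(scaler.transform(X_train), y_train)
print("test accuracy:", model.score(scaler.transform(X_test), y_test))
```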
Lecture recordings
- Lecture 4 – Modeling and Experimenting.
Readings
- Blagec et al., 2020 - A critical analysis of metrics used for measuring progress in artificial intelligence.
- Henderson et al., 2018 - Deep Reinforcement Learning That Matters.
- Dror et al., 2018 - The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing.
- Gorman and Bedrick, 2019 - We need to talk about standard splits.
- Amrhein et al., 2019 - Scientists rise up against statistical significance.
- Forde and Paganini, 2019 - The Scientific Method in the Science of Machine Learning.
- Hovy and Prabhumoye, 2020 - Five sources of bias in natural language processing.
- Azer et al., 2020 - Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses.
- Dehghani et al., 2021 - The Benchmark Lottery.
- Søgaard et al., 2021 - We Need to Talk About Random Splits.
- van der Goot, 2021 - We Need to Talk About train-dev-test Splits.
- Bethard, 2021 - We need to talk about random seeds.
- Marie et al., 2021 - Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers.
- Kapoor and Narayanan, 2023 - Leakage and the reproducibility crisis in machine-learning-based science.
- Lones, 2024 - How to avoid machine learning pitfalls: a guide for academic researchers.
Part 4: Responsible Research
While there are many reproducibility issues we might encounter, there is also an ongoing effort to develop solutions that mitigate them. One such solution, well aligned with reproducible research, is the adoption of recommendation checklists and documentation artifacts such as model cards and datasheets.
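As a purely illustrative sketch (all field values are placeholders), a model-card-like record loosely following the sections proposed by Mitchell et al. (2019) could be captured alongside your code or configuration:

```python
# Minimal sketch: a model-card-like record, loosely following the sections
# proposed in Mitchell et al. (2019). All field values are placeholders.
model_card = {
    "model_details": {"developers": "...", "version": "...", "license": "..."},
    "intended_use": {"primary_uses": "...", "out_of_scope_uses": "..."},
    "training_data": {"source": "...", "preprocessing": "..."},
    "evaluation_data": {"source": "...", "splits": "..."},
    "metrics": {"accuracy": None, "f1": None},
    "ethical_considerations": "...",
    "caveats_and_recommendations": "...",
}
```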
Lecture recordings
- Lecture 5 – Responsible Research.
Readings
- Arnold et al., 2019 - FactSheets: Increasing trust in AI services through supplier’s declarations of conformity.
- Mitchell et al., 2019 - Model Cards for Model Reporting.
- Gebru et al., 2021 - Datasheets for Datasets.
- Pushkarna et al., 2022 - Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI.
- Kapoor and Narayanan, 2023 - Leakage and the reproducibility crisis in machine-learning-based science.
- Mancini et al., 2025 - Promoting the Responsible Development of Speech Datasets for Mental Health and Neurological Disorders Research.
Part 5: Programming Best Practices
Whether you like it or not, your experimental setting will likely require you to write some code.
Good research code translates to (see the minimal sketch after this list):
- Transparency (don’t you dare do some cheap tricks!)
- Correctness (your code should reflect your paper statements)
- Readability (please, don’t make this a nightmare)
- Efficiency (time is money)
- Maintainability (I’m sure you’ll re-use this code)
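As a purely illustrative example of some of these points (type hints and a docstring for readability and transparency, plus a pytest-style test for correctness; the function and file names are made up):

```python
# metrics.py -- minimal sketch: typed, documented, and testable research code.
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions matching the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)


# test_metrics.py -- run with `pytest`
def test_accuracy() -> None:
    assert accuracy([1, 0, 1], [1, 1, 1]) == 2 / 3
```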
Lecture recordings
Readings
- The Theory of Type Hints
- The state of type hints in Python
- The Correct Way to Overload Functions in Python
- Ultimate Guide to Python Debugging
- Pytest Features, That You Need in Your (Testing) Life
- Making Python Programs Blazingly Fast
- Optimizing Memory Usage in Python Applications
- Python CLI Tricks That Don’t Require Any Code Whatsoever
- Profiling and Analyzing Performance of Python Programs
Part 6: Cinnamon, a Lightweight Python Library for Research
Cinnamon is a lightweight Python library for general-purpose configuration and code logic decoupling.
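To give a feel for the underlying idea (this is not Cinnamon's actual API; all names below are invented for illustration), configuration/logic decoupling means the experiment logic never hard-codes its parameters but receives them from a separately defined configuration object:

```python
# Minimal sketch of configuration/logic decoupling in plain Python.
# This is NOT Cinnamon's actual API; names are invented for illustration.
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    learning_rate: float = 1e-3
    batch_size: int = 32
    epochs: int = 10


def train(config: TrainingConfig) -> None:
    # The logic only reads parameters from the config object,
    # so changing an experiment never requires touching this function.
    for epoch in range(config.epochs):
        print(f"epoch {epoch}: lr={config.learning_rate}, batch={config.batch_size}")


if __name__ == "__main__":
    train(TrainingConfig(learning_rate=5e-4))  # override defaults per experiment
```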
Lecture recordings
- Lecture 8 – Cinnamon: a lightweight python library for research.
Readings
Course History
- 2024-2025: “Robust and Reproducible Research” (16 hours)
- 2022-2023: “Robust and Reproducible Experimental Deep Learning Setting” (10 hours)
Course Info
16 hours. Lecture format: 2-hour-long hybrid lectures.
Prerequisites
Lectures are meant to be interactive.
- Programming: Intermediate
- Deep Learning Theory: Intermediate
- Jupyter Notebook: Beginner