Course Repository

The official course repository, containing all relevant material (slides, recordings, images, etc.), is: https://github.com/federicoruggeri/phdlectures-r3

Introduction

The number of scientific articles published in Computer Science (and related fields) increases steadily every year, driven mainly by breakthroughs such as Deep Learning and, more recently, Large Language Models.

Paradoxically, researchers are struggling even more to reproduce published research. This issue affects all possible aspects of research, including methodology, data curation, approach comparison, and implementation.

In this course, we’ll introduce and discuss the concept of ‘reproducibility’ in research. In particular, we’ll overview current issues in research and existing attempts to address them. We’ll focus on data curation, experimental setup, model comparison, and programming best practices.

This course is recommended for all types of researchers, from those who have just embarked on their journey to those who have always wondered how certain research managed to get published. See Section Prerequisites for more details.

Part 1: Reproducibility in Research

We discuss the risks that characterize research nowadays: non-reproducible findings, resources (e.g., data, code, artifacts) that are not publicly available, and an unsustainable pressure to publish large numbers of papers in a short time. In this world, whether as an aspiring or an experienced researcher, would you accept (trust) a work that (i) doesn’t provide sufficient information for reproducibility; (ii) doesn’t provide any code; (iii) doesn’t provide the data or guidelines for collecting its contributing dataset; (iv) doesn’t provide training details such as model hyper-parameters and data partitioning?

Lecture recordings
  • Lecture 1 – Reproducibility in Research (Pt. I).
  • Lecture 2 - Reproducibility in Research (Pt. II).
Readings

Part 2: Data Collection and Annotation

Reproducibility can target different aspects of the research pipeline, from the conceptualization of ideas to the collection of resources and the execution of experiments. We cover several aspects of data collection, since it often represents the backbone of machine learning research: annotation paradigms, requirements for collecting and annotating data, issues and risks when collecting data, and the evaluation of annotation quality.
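
One standard tool for evaluating annotation quality is Cohen's kappa, which corrects raw agreement between two annotators for agreement expected by chance. As an illustrative sketch (the helper name `cohen_kappa` is ours, not from the course material), a minimal pure-Python version:

```python
from collections import Counter

def cohen_kappa(ann_a, ann_b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(ann_a) == len(ann_b) and len(ann_a) > 0
    n = len(ann_a)
    # Observed agreement: fraction of items labelled identically
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected agreement if both annotators labelled independently,
    # each following their own empirical label distribution
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n)
              for label in set(ann_a) | set(ann_b))
    return (p_o - p_e) / (1 - p_e)

# Two annotators agree on 4 of 5 items, but part of that is chance
print(cohen_kappa([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]))  # ~0.615, not 0.8
```

Reporting a chance-corrected coefficient rather than raw agreement is exactly the kind of detail that makes an annotation study reproducible and comparable.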

Lecture recordings
  • Lecture 3 – Data Collection and Annotation.
Readings
  • Geva et al., 2019 - Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets.
  • Liao et al., 2021 - Are We Learning Yet? A Meta-Review of Evaluation Failures Across Machine Learning.
  • Paullada et al., 2021 - Data and its (dis)contents: A survey of dataset development and use in machine learning research.
  • Koch et al., 2021 - Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research.
  • Röttger et al., 2022 - Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks.
  • Cabitza et al., 2023 - Toward a Perspectivist Turn in Ground Truthing for Predictive Computing.
  • Ruggeri et al., 2025 - Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology.

Part 3: Modeling and Experimenting

Reproducibility is equally essential in the remaining stages of a machine learning pipeline: modeling, experimenting, and evaluation. We cover data partitioning, data leakage, random seeding, performance comparison, and metrics.
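
Two of these points, deterministic partitioning and data leakage, can be sketched in a few lines of standard-library Python (helper names are ours, for illustration only): the split is reproducible because it is driven by an explicit seed, and normalization statistics are fitted on the training split only.

```python
import random

def split_indices(n, test_ratio=0.2, seed=42):
    """Deterministic train/test index split: same seed, same split."""
    rng = random.Random(seed)  # local RNG; does not touch global state
    indices = list(range(n))
    rng.shuffle(indices)
    cut = int(n * (1 - test_ratio))
    return indices[:cut], indices[cut:]

def standardize(train, test):
    """Fit mean/std on the training split ONLY, then apply to both.

    Computing the statistics on the full dataset would leak
    test-set information into preprocessing.
    """
    mean = sum(train) / len(train)
    std = (sum((x - mean) ** 2 for x in train) / len(train)) ** 0.5 or 1.0
    scale = lambda xs: [(x - mean) / std for x in xs]
    return scale(train), scale(test)
```

Reporting the seed alongside the split procedure makes the exact partition recoverable by anyone re-running the code.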

Lecture recordings
Readings

Part 4: Responsible Research

While there are several reproducibility issues we might encounter, there is also an ongoing effort to develop solutions that mitigate them. One such solution, in line with reproducible research, is the adoption of recommendation checklists.

Lecture recordings
Readings

Part 5: Programming Best Practices

Whether you like it or not, an experimental setting will likely require you to write some code.

Writing good code translates to:

  • Transparency (don’t you dare do some cheap tricks!)
  • Correctness (your code should reflect your paper statements)
  • Readability (please, don’t make this a nightmare)
  • Efficiency (time is money)
  • Maintainability (I’m sure you’ll re-use this code)
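
As a toy illustration of the correctness and readability points above (an assumed example, not taken from the course material), compare an opaque one-liner with a small, documented function that fails loudly instead of silently mis-scoring:

```python
# Opaque and fragile: zip truncates silently if the lists differ in length
# acc = sum(1 for p, g in zip(preds, gold) if p == g) / len(preds)

def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels.

    Raising on a length mismatch is a one-line correctness check that
    prevents silently reporting a wrong score in the paper.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels differ in length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```
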
Lecture recordings
  • Lecture 6 – Programming Best Practices (Pt. I).
  • Lecture 7 – Programming Best Practices (Pt. II).
Readings

Part 6: Cinnamon, a Lightweight Python Library for Research

Cinnamon is a lightweight Python library for general-purpose decoupling of configuration from code logic.

Lecture recordings
  • Lecture 8 – Cinnamon: a lightweight Python library for research.
Readings

Course History

  • 2024-2025 → “Robust and Reproducible Research” (16 hours)
  • 2022-2023 → “Robust and Reproducible Experimental Deep Learning Setting” (10 hours)

Course Info

16 hours total, delivered as 2-hour hybrid lectures.

Prerequisites

Lectures are meant to be interactive.

  • Programming: Intermediate
  • Deep Learning Theory: Intermediate
  • Jupyter Notebook: Beginner