# Introduction

Reproducibility is the hallmark of the scientific method, and data serves as the assay that enables it.

In modern science, however, data collection is not just about recording a few measurements in a lab. It may involve gathering noisy data from many sensors across the globe. Likewise, data processing is no longer a matter of adding a few numbers together; it may require complicated software to process terabytes or even petabytes of data.

While we hope the ultimate scientific findings are algorithm-independent, reality is not that ideal. There are always bugs in software, and sometimes they matter. The data that come out at the end of a processing pipeline therefore depend not only on the assumptions made in the pipeline, but also on the specific version of the pipeline. To disentangle the true scientific results from such "systematic errors", perhaps the experiment setup and the data processing pipeline should themselves be considered "input data" to the scientific results.
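
One way to make this concrete is to record the pipeline's version, source revision, and configuration alongside every output it produces. Below is a minimal Python sketch of such a provenance record; the field names, the schema, and the hypothetical pipeline version `1.4.2` are illustrative assumptions, not a prescribed standard.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone


def current_git_commit() -> str:
    """Return the current git commit hash, or 'unknown' outside a repo."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"


def config_fingerprint(config: dict) -> str:
    """Hash the pipeline configuration so two runs can be compared exactly."""
    canonical = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def provenance_record(pipeline_version: str, config: dict) -> dict:
    """Bundle everything the output depends on besides the raw input data."""
    return {
        "pipeline_version": pipeline_version,
        "git_commit": current_git_commit(),
        "config_sha256": config_fingerprint(config),
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Hypothetical configuration for an imaginary pipeline run.
    config = {"calibration": "2024a", "sigma_clip": 3.0}
    # Store this record next to the pipeline's output files.
    print(json.dumps(provenance_record("1.4.2", config), indent=2))
```

With a record like this attached to each data product, two results that disagree can be traced back to a change in the pipeline itself rather than mistaken for a genuine scientific difference.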