The 21st century is surely the century of data. New technologies now permit the collection of data by virtually all scientific, educational, governmental, societal, and commercial enterprises, leading to an explosion not only in the amount of data but also in its diversity, complexity, and velocity. The opportunities for extracting information and knowledge from these data are enormous; however, they present new challenges to reproducibility and verifiability. In this talk I will outline reproducibility issues in the big data context and motivate both technical and nontechnical solutions. I will present ResearchCompendia.org, a tool I have been collaboratively developing both to persistently associate data and code with published findings and to verify those findings. I will also present recent empirical research intended to illuminate data and code sharing practices and to inform policy steps that enable really reproducible research.