Detecting causal associations in time series datasets is a key challenge for gaining novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in such high-dimensional dynamical systems often involve time delays, nonlinearity, and strong autocorrelations. These present major challenges for causal discovery techniques such as traditional Granger causality. Further method development and comparison require datasets with known causal ground truth.
The CauseMe platform provides such benchmark datasets, generated from synthetic models that mimic real data challenges, as well as real datasets where the causal structure is known with high confidence. This makes it possible to assess the performance of causal inference methods on time series problems and helps in choosing the right method for a particular problem. We believe that data sharing and reproducibility will greatly advance progress on method development. The available benchmark datasets vary in dimensionality, complexity and sophistication. The aim is to assess the methods' capabilities under a common experimental framework.
There are two ways to contribute:
The datasets are released to the scientific community for analysis and experimentation. Method developers can upload their predictions (matrices of causal connections), and the platform evaluates and ranks the methods according to different performance metrics. CauseMe currently contains several different datasets, but it is ready to scale up to many more! We encourage contributions from applied scientists and practitioners in many areas.
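As an illustration of how a predicted matrix of causal connections can be compared against a known ground truth, here is a minimal sketch. It is not the platform's evaluation code: the function name `link_rates`, the convention that entry `[i][j]` marks a link from variable i to variable j, the choice to skip self-links, and the use of true and false positive rates as metrics are all assumptions made for this example.

```python
def link_rates(truth, predicted):
    """Return (true_positive_rate, false_positive_rate) for binary link matrices.

    Assumption for this sketch: entry [i][j] == 1 means "variable i causes
    variable j", and diagonal entries (self-links) are ignored.
    """
    n = len(truth)
    tp = fp = fn = tn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-links are skipped in this sketch
            if truth[i][j] and predicted[i][j]:
                tp += 1  # link exists and was detected
            elif predicted[i][j]:
                fp += 1  # link was reported but does not exist
            elif truth[i][j]:
                fn += 1  # link exists but was missed
            else:
                tn += 1  # no link, none reported
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Toy ground truth: 0 -> 1 and 1 -> 2.
truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
# Toy prediction: 0 -> 1 (correct) and 0 -> 2 (spurious); 1 -> 2 is missed.
predicted = [[0, 1, 1],
             [0, 0, 0],
             [0, 0, 0]]

tpr, fpr = link_rates(truth, predicted)
print(tpr, fpr)
```

In this toy case one of the two true links is found and one of the four absent links is wrongly reported.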
Each experiment (identified by its Id in the table below) consists of several hundred datasets. This makes it possible to assess not only detection power on a single dataset, but also the robustness of the method. Each dataset contains a set of N time series of length T. Some further information (such as the maximum time lag of interactions, nonlinearity, etc.) may also be given. Causal methods need to be run on all datasets of an experiment. After registering and logging in, you will find more information, zip files of the datasets for each experiment, and code snippets in Python, MATLAB and R to help in the process.
| Id | N | T | Further information |
|----|----|-----|----|
| 1 | 5 | 150 | max. time lag 5, linear dependencies |
| 2 | 10 | 150 | max. time lag 5, linear dependencies |
| 3 | 20 | 150 | max. time lag 5, linear dependencies |
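The workflow of running one method over every dataset of an experiment can be sketched as follows. This is only an illustration under stated assumptions: the datasets here are generated by a hypothetical helper `toy_dataset` rather than loaded from the platform's zip files, each dataset is held as a list of N series of length T, and the "method" is a deliberately naive lagged-correlation score that stands in for a real causal discovery method.

```python
import random


def lagged_corr(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t]."""
    a, b = x[:-lag], y[lag:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5 if var_a > 0 and var_b > 0 else 0.0


def score_matrix(data, max_lag=5):
    """Return an N x N matrix; scores[i][j] is the largest absolute
    correlation between series i at t - lag and series j at t,
    over lags 1..max_lag. A naive stand-in, not a causal method."""
    n = len(data)
    return [
        [
            max(abs(lagged_corr(data[i], data[j], lag))
                for lag in range(1, max_lag + 1))
            for j in range(n)
        ]
        for i in range(n)
    ]


def toy_dataset(n=5, t=150, seed=0):
    """Hypothetical stand-in for one benchmark dataset: series 0 drives
    series 1 linearly at lag 2, all other series are pure noise."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(t)] for _ in range(n)]
    for k in range(2, t):
        data[1][k] += 0.8 * data[0][k - 2]
    return data


# An "experiment" is a collection of datasets; the method runs on all of them.
experiment = [toy_dataset(seed=s) for s in range(10)]
results = [score_matrix(d) for d in experiment]
print(len(results), len(results[0]), len(results[0][0]))
```

Each entry of `results` is one N x N score matrix per dataset, which is the kind of per-dataset output a method would need to produce before uploading its predictions.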
If you find the platform useful, please acknowledge it by citing these references:
CauseMe: An online system for benchmarking causal inference methods. J. Muñoz-Marí, G. Mateo, J. Runge, and G. Camps-Valls. In preparation (2018)
Inferring causation from time series: A fresh look at an old problem. J. Runge et al. Under review at Nature Communications (2018).
Detecting causal associations in large nonlinear time series datasets. J. Runge, P. Nowack, M. Kretschmer, S. Flaxman, and D. Sejdinovic. arXiv:1702.07007v2 (2018).
Causal network reconstruction from time series: From theoretical assumptions to practical estimation. J. Runge. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28:7 (2018).