Detecting causal associations in time series datasets is a key challenge for gaining novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in such high-dimensional dynamical systems often involve time delays, nonlinearity, and strong autocorrelation. These properties present major challenges for causal discovery techniques such as traditional Granger causality. Further method development and comparison require datasets with known causal ground truth. Our platform extends previous causality challenges, as listed below under Links.
The CauseMe platform provides such benchmark datasets, generated both from synthetic models mimicking real-data challenges and from real datasets where the causal structure is known with high confidence. This allows users to assess the performance of causal inference methods on time series problems and helps them choose the right method for a particular problem. We believe that data sharing and reproducibility will greatly advance progress on method development. The available benchmark datasets vary in dimensionality, complexity, and sophistication. The aim is to assess methods' capabilities under a common experimental framework.
There are two ways to contribute:
The datasets are released to the scientific community for analysis and experimentation. Method developers can upload their predictions (matrices of causal connections), and the platform evaluates and ranks the methods according to different performance metrics. CauseMe currently contains several different datasets, but it is ready to scale up to many more! We encourage contributions from applied scientists and practitioners in many areas.
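As an illustration, a submission for one dataset boils down to a score matrix over ordered variable pairs, where entry (i, j) quantifies the evidence that variable i causes variable j. The sketch below assumes nothing about CauseMe's actual file formats or evaluation code; it uses a deliberately simple lagged-correlation score just to show the matrix shape that a method must produce.

```python
import numpy as np

def lagged_corr_scores(data, max_lag=3):
    """Toy baseline: score each link i -> j by the maximum absolute
    lagged cross-correlation between past values of variable i and
    variable j. A real submission would replace this with an actual
    causal discovery method; only the (n_vars x n_vars) score-matrix
    shape is the part that matters here."""
    T, n = data.shape
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            scores[i, j] = max(
                abs(np.corrcoef(data[:-lag, i], data[lag:, j])[0, 1])
                for lag in range(1, max_lag + 1)
            )
    return scores

# Toy dataset: variable 0 drives variable 1 with a lag of 1.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.empty(500)
y[0] = 0.0
y[1:] = 0.8 * x[:-1] + 0.2 * rng.standard_normal(499)
data = np.column_stack([x, y])

scores = lagged_corr_scores(data)
```

On this toy example the score for the true link 0 -> 1 comes out far larger than for the spurious reverse direction, which is exactly the kind of asymmetry the platform's metrics reward.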
Please contact us if you want to contribute!
CauseMe covers a wide range of synthetic model data mimicking a number of real-data challenges. These cover time delays, autocorrelation, nonlinearity, chaotic dynamics, extreme events, and measurement error, and will be extended with many more.
To further assess the scalability of methods, for each such model we provide experiments with different numbers of variables and time series lengths. Each experiment consists of several hundred datasets, not only to assess detection power on a single dataset, but also to assess the robustness of a method. After registering and logging in, you can access more information, zip files of the datasets for each experiment, and code snippets in Python, MATLAB, and R to help in the process.
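In practice this means running a method once per dataset and collecting one score matrix for each, so that performance can be judged across the whole collection rather than on a single draw. The sketch below illustrates that loop; the in-memory random datasets and the toy lag-1 scorer are illustrative stand-ins, not the platform's actual data format or required code.

```python
import numpy as np

def score_dataset(data):
    """Toy scorer: |lag-1 cross-correlation| for each ordered pair.
    Stand-in for a real causal discovery method."""
    n = data.shape[1]
    s = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                s[i, j] = abs(np.corrcoef(data[:-1, i], data[1:, j])[0, 1])
    return s

# Stand-in for loading the experiment's datasets from the downloaded zip:
# here, 5 datasets of 200 time steps and 3 variables each.
rng = np.random.default_rng(42)
datasets = [rng.standard_normal((200, 3)) for _ in range(5)]

# One score matrix per dataset; robustness is judged across all of them.
results = np.stack([score_dataset(d) for d in datasets])
print(results.shape)  # (5, 3, 3)
```

Stacking the per-dataset matrices into one array keeps the submission self-contained: one entry per dataset, each with the same (n_vars, n_vars) shape.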
If you find the platform useful, please acknowledge it by citing these references:
CauseMe: An online system for benchmarking causal inference methods. J. Muñoz-Marí, G. Mateo, J. Runge, and G. Camps-Valls. In preparation (2018)
Inferring causation from time series with perspectives in Earth system sciences. J. Runge et al. Under review at Nature Communications (2019).
Detecting causal associations in large nonlinear time series datasets. J. Runge, P. Nowack, M. Kretschmer, S. Flaxman, and D. Sejdinovic. arXiv e-prints: 1702.07007v2 (2018).
Causal network reconstruction from time series: From theoretical assumptions to practical estimation. J. Runge. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28:7 (2018).