Data and Models

experiments

Below is a list of available datasets. Currently, they come from dynamical model systems featuring different challenges for causal discovery from time series, as discussed in the accompanying Nature Communications Perspective paper. Information on how to contribute real-world datasets or model systems is given at the end of this page. Clicking on a model name brings you to a description of the model and a list of experimental datasets. Please see the CauseMe workflow description in HowTo for how to upload your results for these experiments.

You can search through the database by name, description or tags.


| Name | Long name | Type | Tags |
|---|---|---|---|
| linear-VAR | Linear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, linear |
| linear-VAR_aggregated | Time-aggregated linear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, linear, time-aggregation |
| linear-VAR_dense | Linear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, linear, dense interactions |
| linear-VAR_multirealizations | Linear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, linear |
| linear-VAR_noisy | Linear vector-autoregressive time series model with observational noise | Synthetic | Autocorrelation, time delays, linear, observational noise |
| linear-VAR_subsampled | Time-subsampled linear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, linear, time-subsampling |
| logistic-deterministic | Chaotic logistic map model | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| logistic-largenoise | Chaotic logistic map model with dynamical noise | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| logistic-lownoise | Chaotic logistic map model with dynamical noise | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| nongauss-VAR | Linear vector-autoregressive time series model with gaussian and non-gaussian noise | Synthetic | Autocorrelation, time delays, linear, non-gaussian noise |
| nonlinear-VAR | Nonlinear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, nonlinear |
| TestCLIM1-2 | Linear climate-type datasets (Testing phase) | Hybrid | Autocorrelation, time delays, linear |
| TestClimNoise1-01 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| TestCLIMnoise1-1 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| TestCLIMnoise1-05 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| TestCLIMnoise1-01 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| TestCLIMnonstat1 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, non-stationary trends |
| TestCLIMnonstat5 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, non-stationary trends |
| linear-joint-VAR | Linear joint vector-autoregressive time series model | Synthetic | Linear, autocorrelation, time delays, joint model |
| FinalCLIM2 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| FinalCLIMnoise2-05 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| FinalCLIMnoise2-02 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| FinalCLIMnoise2-01 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| FinalCLIMnoise2-1 | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| Finallinear-VAR | FinalLinear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, linear |
| Testlinear-VAR | TestLinear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, linear |
| Finallogistic-deterministic | FinalChaotic logistic map model | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| Testlogistic-deterministic | TestChaotic logistic map model | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| Finallogistic-lownoise | FinalChaotic logistic map model with dynamical noise | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| Testlogistic-lownoise | TestChaotic logistic map model with dynamical noise | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| Finallogistic-largenoise | FinalChaotic logistic map model with dynamical noise | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| Testlogistic-largenoise | TestChaotic logistic map model with dynamical noise | Synthetic | Autocorrelation, time delays, nonlinear, chaotic |
| Finalnonlinear-VAR | FinalNonlinear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, nonlinear |
| Testnonlinear-VAR | TestNonlinear vector-autoregressive time series model | Synthetic | Autocorrelation, time delays, nonlinear |
| TestCLIM | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation |
| TestCLIMnoise | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| TestCLIMnonstat | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, nonstationarity |
| FinalCLIM | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation |
| FinalCLIMnoise | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, observational noise |
| FinalCLIMnonstat | Climate datasets | Hybrid | Autocorrelation, time delays, linear, time-aggregation, nonstationarity |
| FinalWEATHsub | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear, time-subsampling |
| FinalWEATHnoise | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear, observational noise |
| FinalWEATH | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear |
| FinalWEATHmiss | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear, missing values |
| TestWEATHmiss | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear, missing values |
| TestWEATHnoise | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear, observational noise |
| TestWEATHsub | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear, time-subsampling |
| TestWEATH | Weather datasets | Hybrid | Autocorrelation, time delays, non-linear |
| data_river | data_river | | |
| CTM-EI | The output from Chemistry Transport Model (CTM) driven by ERA-Interim Reanalysis | Hybrid | Linear, real data, contemporaneous links |
| bSCMC_size | bivariate structural causal model characteristics data (bSCMC): split by sample sizes; aggregated over functional dependencies, cause types, noise types and mutual informations | Synthetic, Bivariate | Iid, various functions, various noises, various dependence strengths, contemporaneous, non-timeseries |
| bSCMC_funcType_size | bivariate structural causal model characteristics data (bSCMC): split by functional dependencies and sample sizes; aggregated over cause types, noise types and mutual informations | Synthetic, Bivariate | Iid, various functions, various noises, various dependence strengths, contemporaneous, non-timeseries |
| bSCMC_funcType_causeType_mi_size | bivariate structural causal model characteristics data (bSCMC): split by functional dependencies, cause types, mutual informations and sample sizes; aggregated over noise types | Synthetic, Bivariate | Iid, various functions, various noises, various dependence strengths, contemporaneous, non-timeseries |
| bSCMC_funcType_causeType_noiseType_mi_size | bivariate structural causal model characteristics data (bSCMC): split by functional dependencies, cause types, noise types, mutual informations and sample sizes | Synthetic, Bivariate | Iid, various functions, various noises, various dependence strengths, contemporaneous, non-timeseries |
| bSCMC_funcType_causeType_size | bivariate structural causal model characteristics data (bSCMC): split by functional dependencies, cause types and sample sizes; aggregated over noise types and mutual informations | Synthetic, Bivariate | Iid, various functions, various noises, various dependence strengths, contemporaneous, non-timeseries |
| river-runoff | River runoff data | Real | Real data, contemporaneous time lag |


User uploaded data files



Name Long name Type Tags

Contribute data


If you are interested in contributing a new model or real-world dataset with known ground truth, contact us at 'info at causeme.net'. Notes:

  • Datasets must be accompanied by ground truth on the existence or absence of causal links.
  • In addition, ground truth on time lags must be provided (write to us if you think this should be optional).
  • Ideally, provide more than one dataset with the same underlying challenges to allow for more robust method evaluation. This is especially useful for synthetic datasets; for real data, the datasets may correspond to different measurement locations of the same variables. Submissions with just one dataset are also welcome.
  • ALL such datasets must have the same number of variables and time series length, but can have different ground truth.
  • Datasets should be described accurately, and credit must be given to the data sources.
  • Ideally, provide a description paper URL.
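
The requirement that all datasets of a model share the same number of variables and the same time series length can be checked before submission. The following is a minimal sketch, not part of CauseMe: the function names are hypothetical, and it assumes each dataset is stored as a list of T rows (time steps) with N values (variables) per row.

```python
def dataset_shape(dataset):
    """Return (T, N) for a dataset given as a list of T rows of N values.

    The row-per-time-step layout is an assumption made for this sketch.
    """
    widths = {len(row) for row in dataset}
    if len(widths) != 1:
        raise ValueError("dataset is empty or its rows differ in length")
    return len(dataset), widths.pop()


def check_same_shape(datasets):
    """Raise ValueError unless every dataset has the same (T, N) shape."""
    shapes = {dataset_shape(dataset) for dataset in datasets}
    if len(shapes) != 1:
        raise ValueError(f"datasets differ in shape: {sorted(shapes)}")
    return shapes.pop()


# Two toy datasets, each with T = 2 time steps and N = 3 variables.
toy_datasets = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
]
common_shape = check_same_shape(toy_datasets)
```

Running the check on every dataset of a model before packaging it catches a ragged or truncated series early.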

To help us with the integration into CauseMe, please use this script. It helps you set up a plain JSON dictionary file with the following fields (follow the comments in the provided script closely):

  • name: The name should only contain letters, numbers, and hyphens.
  • longname: Longer title that will be shown in the dataset list on CauseMe.
  • tags: Provide tags such as linear, nonlinear, time delays, contemporaneous links, autocorrelation, nonstationarity, chaotic, time-aggregation (see Fig. 4 in Runge et al., Nature Comm. (2019)).
  • type: Indicate type of data among 'synthetic', 'real', or 'hybrid'.
  • description: A full description of the model (see the examples in the list).
  • experiment: Must be of the form name_N-XX_T-XX where N denotes the number of variables and T the sample size/length of time series.
  • url_paper: Optional entry if description paper is available.
  • datasets: List of datasets for that model. For instance, a model can have 200 datasets, each composed of 3 time series of length 150. This is encoded as a list of vectors such as [[[1.06265, -0.61032, -0.22016], [0.41801, -1.12347, -0.51735], [0.32098, 1.23107, 0.23685], ...]]. A single real dataset is then a list with only one element.
  • truths: Row-major flattened matrices of integers 0 or 1 that encode the ground truth causal relationships. There must be as many matrices as datasets. In the example above, there are 200 matrices of size 3x3 (each flattened into a vector of 9 entries). An entry A_ij = 1, in the i-th row and j-th column of this matrix, indicates a DIRECT causal link i --> j, i.e., variable i causes variable j. An entry A_ij = 0 indicates NO DIRECT causal link. Here 'direct' means that the link does not go through any of the other variables in the dataset; there may still be an INDIRECT causal effect. Self-causation, i.e., entries on the diagonal, can also be filled, but these are currently not evaluated by CauseMe. Feedbacks, i.e., i --> j and j --> i, are also allowed.
  • lag_matrix: These must be in the same format as truths and contain non-negative integers 0, 1, 2,... indicating the causal time lag for each non-zero entry in the truths matrix. A lag can also be 0 (contemporaneous).
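
The fields above can be assembled roughly as follows. This is a hedged sketch and not the provided script: the function names are hypothetical, the value types (for example tags as a list of strings, url_paper as an empty string when absent) are assumptions, each dataset is assumed to be a list of T rows with N values per row, and the XX placeholders in the experiment name are read as plain integers.

```python
import json
import re


def flatten_row_major(matrix):
    """Flatten a matrix given as a list of rows, row by row."""
    return [value for row in matrix for value in row]


def build_model_dict(name, longname, tags, data_type, description,
                     datasets, truths, lags, url_paper=""):
    """Assemble a dictionary with the fields listed above (types assumed)."""
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        raise ValueError("name may only contain letters, numbers, and hyphens")
    if data_type not in ("synthetic", "real", "hybrid"):
        raise ValueError("type must be 'synthetic', 'real', or 'hybrid'")
    if not len(datasets) == len(truths) == len(lags):
        raise ValueError("need one truth and one lag matrix per dataset")

    # Assumed layout: T rows (time steps), each with N values (variables).
    n_steps, n_vars = len(datasets[0]), len(datasets[0][0])
    flat_truths = [flatten_row_major(matrix) for matrix in truths]
    flat_lags = [flatten_row_major(matrix) for matrix in lags]
    for flat_truth, flat_lag in zip(flat_truths, flat_lags):
        if len(flat_truth) != n_vars ** 2 or len(flat_lag) != n_vars ** 2:
            raise ValueError("truth and lag matrices must be N x N")
        if any(value not in (0, 1) for value in flat_truth):
            raise ValueError("truth entries must be 0 or 1")
        if any(not isinstance(value, int) or value < 0 for value in flat_lag):
            raise ValueError("lags must be non-negative integers")

    return {
        "name": name,
        "longname": longname,
        "tags": tags,
        "type": data_type,
        "description": description,
        "experiment": f"{name}_N-{n_vars}_T-{n_steps}",
        "url_paper": url_paper,
        "datasets": datasets,
        "truths": flat_truths,
        "lag_matrix": flat_lags,
    }


# Toy example: one dataset with N = 3 variables and T = 4 time steps.
# Ground truth: 0 --> 1 at lag 1 and 1 --> 2 at lag 2 (illustrative values).
model = build_model_dict(
    name="toy-var",
    longname="Toy linear model",
    tags=["linear", "time delays"],
    data_type="synthetic",
    description="Illustrative toy model.",
    datasets=[[[1.06265, -0.61032, -0.22016],
               [0.41801, -1.12347, -0.51735],
               [0.32098, 1.23107, 0.23685],
               [0.0, 0.5, -0.5]]],
    truths=[[[0, 1, 0], [0, 0, 1], [0, 0, 0]]],
    lags=[[[0, 1, 0], [0, 0, 2], [0, 0, 0]]],
)
json_text = json.dumps(model)
```

In this toy case the truth matrix flattens to [0, 1, 0, 0, 0, 1, 0, 0, 0]: position i*N + j holds the entry for the link i --> j, so the link 0 --> 1 lands at index 1 and the link 1 --> 2 at index 5.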