netket.stats.statistics

netket.stats.statistics(data)

Returns statistics of a given array (or matrix, see below) containing a stream of data. This is particularly useful for analyzing Markov chain data, but it can also be used for other types of time series. Assumes that the data has the same shape on all MPI processes.

Parameters:

data (vector or matrix) – The input data. It can be real or complex valued.

* If a vector, it is assumed to be a single time series of data (not necessarily independent).
* If a matrix, it is assumed that the rows data[i] contain independent time series.

Returns:

A dictionary-compatible class containing the average (.mean, ["Mean"]), the variance (.variance, ["Variance"]), the Monte Carlo standard error of the mean (.error_of_mean, ["Sigma"]), an estimate of the autocorrelation time (.tau_corr, ["TauCorr"]), and the Gelman-Rubin split-Rhat diagnostic (.R_hat, ["R_hat"]).

If the flag NETKET_EXPERIMENTAL_FFT_AUTOCORRELATION is set, the autocorrelation is computed exactly using an FFT, and an extra field tau_corr_max is added to the statistics object.
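To illustrate what an FFT-based autocorrelation estimate looks like, here is a minimal NumPy sketch for a single real-valued chain. It is not NetKet's implementation: the helper names `autocorr_fft` and `integrated_tau` are hypothetical, the convention tau = sum of rho(t) for t >= 1 is an assumption, and truncating the sum at the first non-positive value is just one simple heuristic.

```python
import numpy as np


def autocorr_fft(x):
    """Normalized autocorrelation function of a 1D real series via FFT.

    Hypothetical helper for illustration; not part of NetKet.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    # Zero-pad to length 2n so the circular convolution does not wrap around.
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f), n=2 * n)[:n]
    return acov / acov[0]


def integrated_tau(rho):
    """Sum rho[t] for t >= 1, stopping at the first non-positive value.

    Assumed convention: tau = sum_{t>=1} rho(t), so that the effective
    sample size is roughly N / (1 + 2 * tau).
    """
    tau = 0.0
    for r in rho[1:]:
        if r <= 0:
            break
        tau += r
    return tau


# AR(1) test series x[t] = phi * x[t-1] + noise, for which rho(t) = phi**t
# and therefore tau = phi / (1 - phi) = 1 when phi = 0.5.
rng = np.random.default_rng(1)
phi = 0.5
noise = rng.normal(size=20_000)
x = np.empty_like(noise)
x[0] = noise[0]
for t in range(1, x.size):
    x[t] = phi * x[t - 1] + noise[t]

rho = autocorr_fft(x)
tau = integrated_tau(rho)
print("estimated tau:", tau)
```

For the AR(1) series above, the estimate should land near the analytic value of 1; longer chains give a less noisy result.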

All fields of the returned object can be accessed using either the attribute syntax or the dictionary-style syntax (both are given in parentheses in the list of returned fields).

The split-Rhat diagnostic is based on comparing intra-chain and inter-chain statistics of the sample and is thus only available for 2D array inputs whose rows are independently sampled MCMC chains. For an ideal MCMC sample, R_hat should be 1.0. If it deviates too much from this value, this indicates MCMC convergence issues. Thresholds such as R_hat > 1.1 or even R_hat > 1.01 have been suggested in the literature for when to discard a sample. (See, e.g., Gelman et al., Bayesian Data Analysis, or Vehtari et al., arXiv:1903.08008.)
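The comparison of intra-chain and inter-chain statistics can be sketched as follows. This is a textbook-style split-Rhat for real-valued data written with NumPy, under the assumption that the rows of the input are independent chains; the function name `split_r_hat` is hypothetical, and the code is not claimed to match NetKet's implementation detail for detail.

```python
import numpy as np


def split_r_hat(chains):
    """Split-Rhat for a real-valued array of shape (n_chains, n_samples).

    Hypothetical helper for illustration; not part of NetKet.
    """
    chains = np.asarray(chains, dtype=float)
    n_chains, n_samples = chains.shape
    half = n_samples // 2
    # Split every chain into two halves (the middle sample is dropped
    # when n_samples is odd), giving 2 * n_chains shorter chains.
    split = np.concatenate(
        [chains[:, :half], chains[:, n_samples - half:]], axis=0
    )
    n = half
    # W: average of the within-chain variances.
    within = split.var(axis=1, ddof=1).mean()
    # B: n times the variance of the chain means.
    between = n * split.mean(axis=1).var(ddof=1)
    # Pooled variance estimate, compared against W.
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))


rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))      # four chains, same distribution
bad = good + np.arange(4)[:, None]     # four chains with different means

r_good = split_r_hat(good)
r_bad = split_r_hat(bad)
print("well-mixed chains:", r_good)
print("chains with shifted means:", r_bad)
```

The well-mixed chains should give a value close to 1.0, while the chains with shifted means should clearly exceed the 1.1 threshold mentioned above.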

Return type:

Stats