## Calculating the mean and standard deviation in one pass

October 27th, 2009

It is possible to calculate the mean and the standard deviation of a serie of double values while iterating only once on the whole set of data. This approach is thus also suitable for streaming data; it could update the mean and standard deviation on-the-fly.

Here I present an implementation of the algorithm shown by Mark Hoemmen in his paper Computing the standard deviation efficiently as a static Java method. The values are provided by an `Iterator`

. Note that in Java 5+ the iterator could directly force the objects to implement the interface. Here I have chosen to delegate the extraction of the double value to an extractor object (I’m using it on Java 1.4).

/** * A method to calculate the mean and standard deviation of a series * of double. The objects are provided by an {@link Iterator} and the * double value is extracted using the passed extractor. * It skips <code>NaN</code> values.<br> * * Returns a <code>double[3]</code> where <br> * [0] is the number of values used in the mean<br> * [1] contains the mean value of the series<br> * [2] contains the standard deviation (sigma) of the complete * population<br> * Returns <code>NaN</code> for the mean and std dev if no valid value * could be found in the series.<br> * * Algorithm taken from "Computing the standard deviation efficiently" * by Mark Hoemmen * (http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/variance.pdf) * * @return the number of values, mean and std-dev of the serie * */ public static double[] getMeanAndStdDev(Iterator it, DoubleExtractor e){ while(it.hasNext()){// while initial value is NaN, try next double xk = e.doubleValue(it.next()); if(Double.isNaN(xk)){ continue; } int k = 1; double Mk = xk; double Qk = 0; while(it.hasNext()){ xk = e.doubleValue(it.next()); if(Double.isNaN(xk)){ continue; } k++; double d = xk-Mk; // is actually xk - Mk-1, // as Mk was not yet updated Qk += (k-1)*d*d/k; Mk += d/k; } return new double[]{k,Mk,Math.sqrt(Qk/k)}; } return new double[]{0,Double.NaN,Double.NaN}; }

using the following interface

/** * An object which can return a double representation of passed objects. * * @author Philipp Buluschek * */ public interface DoubleExtractor { /** * @return the double representation of the passed object. */ public double doubleValue(Object o); }

Thanks! here a similar C implementation:

typedef struct statistics_s {

size_t k; // sample count

double Mk; // mean

double Qk; // standard variance

} statistics_t;

static void

statistics_record (statistics_t *stats, double x) {

stats->k++;

if (1 == stats->k) {

stats->Mk = x;

stats->Qk = 0;

} else {

double d = x – stats->Mk; // is actually xk – M_{k-1},

// as Mk was not yet updated

stats->Qk += (stats->k-1)*d*d/stats->k;

stats->Mk += d/stats->k;

}

}

static double

statistics_stdev (statistics_t *stats)

{

return sqrt (stats->Qk/stats->k);

}

static size_t

statistics_nsamples (statistics_t *stats)

{

return stats->k;

}

static double

statistics_variance (statistics_t *stats)

{

return stats->Qk / (stats->k – 1);

}

static double

statistics_stdvar (statistics_t *stats)

{

return stats->Qk / stats->k;

}

static void

statistics_init (statistics_t *stats)

{

stats->k = 0;

}