Home > Uncategorized > Calculating the mean and standard deviation in one pass

## Calculating the mean and standard deviation in one pass

October 27th, 2009

It is possible to calculate the mean and the standard deviation of a serie of double values while iterating only once on the whole set of data. This approach is thus also suitable for streaming data; it could update the mean and standard deviation on-the-fly.

Here I present an implementation of the algorithm shown by Mark Hoemmen in his paper Computing the standard deviation efficiently as a static Java method. The values are provided by an `Iterator`. Note that in Java 5+ the iterator could directly force the objects to implement the interface. Here I have chosen to delegate the extraction of the double value to an extractor object (I’m using it on Java 1.4).

```	/**
* A method to calculate the mean and standard deviation of a series
* of double. The objects are provided by an {@link Iterator} and the
* double value is extracted using the passed extractor.
* It skips <code>NaN</code> values.<br>
*
* Returns a <code>double</code> where <br>
*  is the number of values used in the mean<br>
*  contains the mean value of the series<br>
*  contains the standard deviation (sigma) of the complete
* population<br>
* Returns <code>NaN</code> for the mean and std dev if no valid value
* could be found in the series.<br>
*
* Algorithm taken from "Computing the standard deviation efficiently"
* by Mark Hoemmen
* (http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/variance.pdf)
*
* @return the number of values, mean and std-dev of the serie
*
*/
public static double[] getMeanAndStdDev(Iterator it, DoubleExtractor e){
while(it.hasNext()){// while initial value is NaN, try next
double xk = e.doubleValue(it.next());
if(Double.isNaN(xk)){
continue;
}
int k = 1;
double Mk = xk;
double Qk = 0;
while(it.hasNext()){
xk = e.doubleValue(it.next());
if(Double.isNaN(xk)){
continue;
}
k++;
double d = xk-Mk; // is actually xk - Mk-1,
// as Mk was not yet updated
Qk += (k-1)*d*d/k;
Mk += d/k;
}
return new double[]{k,Mk,Math.sqrt(Qk/k)};
}
return new double[]{0,Double.NaN,Double.NaN};
}
```

using the following interface

```/**
* An object which can return a double representation of passed objects.
*
* @author Philipp Buluschek
*
*/
public interface DoubleExtractor {

/**
* @return the double representation of the passed object.
*/
public double doubleValue(Object o);
}
```
1. May 1st, 2012 at 04:39 | #1

Thanks! here a similar C implementation:

typedef struct statistics_s {
size_t k; // sample count
double Mk; // mean
double Qk; // standard variance
} statistics_t;

static void
statistics_record (statistics_t *stats, double x) {
stats->k++;
if (1 == stats->k) {
stats->Mk = x;
stats->Qk = 0;
} else {
double d = x – stats->Mk; // is actually xk – M_{k-1},
// as Mk was not yet updated
stats->Qk += (stats->k-1)*d*d/stats->k;
stats->Mk += d/stats->k;
}
}

static double
statistics_stdev (statistics_t *stats)
{
return sqrt (stats->Qk/stats->k);
}

static size_t
statistics_nsamples (statistics_t *stats)
{
return stats->k;
}

static double
statistics_variance (statistics_t *stats)
{
return stats->Qk / (stats->k – 1);
}

static double
statistics_stdvar (statistics_t *stats)
{
return stats->Qk / stats->k;
}

static void
statistics_init (statistics_t *stats)
{
stats->k = 0;
}