Calculating the mean and standard deviation in one pass

It is possible to calculate the mean and the standard deviation of a serie of double values while iterating only once on the whole set of data. This approach is thus also suitable for streaming data; it could update the mean and standard deviation on-the-fly.

Here I present an implementation of the algorithm shown by Mark Hoemmen in his paper Computing the standard deviation efficiently as a static Java method. The values are provided by an Iterator. Note that in Java 5+ the iterator could directly force the objects to implement the interface. Here I have chosen to delegate the extraction of the double value to an extractor object (I’m using it on Java 1.4).

[sourcecode lang=”java”]
/**
* A method to calculate the mean and standard deviation of a series
* of double. The objects are provided by an {@link Iterator} and the
* double value is extracted using the passed extractor.
* It skips <code>NaN</code> values.<br>
*
* Returns a <code>double[3]</code> where <br>
* [0] is the number of values used in the mean<br>
* [1] contains the mean value of the series<br>
* [2] contains the standard deviation (sigma) of the complete
* population<br>
* Returns <code>NaN</code> for the mean and std dev if no valid value
* could be found in the series.<br>
*
* Algorithm taken from "Computing the standard deviation efficiently"
* by Mark Hoemmen
* (http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/variance.pdf)
*
* @return the number of values, mean and std-dev of the serie
*
*/
public static double[] getMeanAndStdDev(Iterator it, DoubleExtractor e){
while(it.hasNext()){// while initial value is NaN, try next
double xk = e.doubleValue(it.next());
if(Double.isNaN(xk)){
continue;
}
int k = 1;
double Mk = xk;
double Qk = 0;
while(it.hasNext()){
xk = e.doubleValue(it.next());
if(Double.isNaN(xk)){
continue;
}
k++;
double d = xk-Mk; // is actually xk – Mk-1,
// as Mk was not yet updated
Qk += (k-1)*d*d/k;
Mk += d/k;
}
return new double[]{k,Mk,Math.sqrt(Qk/k)};
}
return new double[]{0,Double.NaN,Double.NaN};
}
[/sourcecode]

using the following interface
[sourcecode lang=”java”]
/**
* An object which can return a double representation of passed objects.
*
* @author Philipp Buluschek
*
*/
public interface DoubleExtractor {

/**
* @return the double representation of the passed object.
*/
public double doubleValue(Object o);
}
[/sourcecode]


Posted

in

by