Home > Uncategorized > Calculating the mean and standard deviation in one pass

Calculating the mean and standard deviation in one pass

October 27th, 2009

It is possible to calculate the mean and the standard deviation of a serie of double values while iterating only once on the whole set of data. This approach is thus also suitable for streaming data; it could update the mean and standard deviation on-the-fly.

Here I present an implementation of the algorithm shown by Mark Hoemmen in his paper Computing the standard deviation efficiently as a static Java method. The values are provided by an Iterator. Note that in Java 5+ the iterator could directly force the objects to implement the interface. Here I have chosen to delegate the extraction of the double value to an extractor object (I’m using it on Java 1.4).

	/**
	 * A method to calculate the mean and standard deviation of a series 
	 * of double. The objects are provided by an {@link Iterator} and the 
	 * double value is extracted using the passed extractor.
	 * It skips <code>NaN</code> values.<br>
	 * 
	 * Returns a <code>double[3]</code> where <br>
	 * [0] is the number of values used in the mean<br>
	 * [1] contains the mean value of the series<br>
	 * [2] contains the standard deviation (sigma) of the complete 
	 * population<br>
	 * Returns <code>NaN</code> for the mean and std dev if no valid value 
	 * could be found in the series.<br>
	 * 
	 * Algorithm taken from "Computing the standard deviation efficiently" 
	 * by Mark Hoemmen 
	 * (http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/variance.pdf)
	 * 
	 * @return the number of values, mean and std-dev of the serie
	 * 
	 */
	public static double[] getMeanAndStdDev(Iterator it, DoubleExtractor e){
		while(it.hasNext()){// while initial value is NaN, try next
			double xk = e.doubleValue(it.next());
			if(Double.isNaN(xk)){ 
				continue;
			}
			int k = 1;
			double Mk = xk;
			double Qk = 0;
			while(it.hasNext()){
				xk = e.doubleValue(it.next());
				if(Double.isNaN(xk)){
					continue;
				}
				k++;
				double d = xk-Mk; // is actually xk - Mk-1, 
				                  // as Mk was not yet updated
				Qk += (k-1)*d*d/k;
				Mk += d/k;
			}
			return new double[]{k,Mk,Math.sqrt(Qk/k)};
		} 
		return new double[]{0,Double.NaN,Double.NaN};
	}

using the following interface

/**
 * An object which can return a double representation of passed objects.
 * 
 * @author Philipp Buluschek
 *
 */
public interface DoubleExtractor {
	
	/**
	 * @return the double representation of the passed object.
	 */
	public double doubleValue(Object o);
}

Uncategorized , , ,

  1. Alfons Laarman
    May 1st, 2012 at 04:39 | #1

    Thanks! here a similar C implementation:

    typedef struct statistics_s {
    size_t k; // sample count
    double Mk; // mean
    double Qk; // standard variance
    } statistics_t;

    static void
    statistics_record (statistics_t *stats, double x) {
    stats->k++;
    if (1 == stats->k) {
    stats->Mk = x;
    stats->Qk = 0;
    } else {
    double d = x – stats->Mk; // is actually xk – M_{k-1},
    // as Mk was not yet updated
    stats->Qk += (stats->k-1)*d*d/stats->k;
    stats->Mk += d/stats->k;
    }
    }

    static double
    statistics_stdev (statistics_t *stats)
    {
    return sqrt (stats->Qk/stats->k);
    }

    static size_t
    statistics_nsamples (statistics_t *stats)
    {
    return stats->k;
    }

    static double
    statistics_variance (statistics_t *stats)
    {
    return stats->Qk / (stats->k – 1);
    }

    static double
    statistics_stdvar (statistics_t *stats)
    {
    return stats->Qk / stats->k;
    }

    static void
    statistics_init (statistics_t *stats)
    {
    stats->k = 0;
    }

  1. No trackbacks yet.