Calculating the mean and standard deviation in one pass

October 27th, 2009

It is possible to calculate the mean and the standard deviation of a serie of double values while iterating only once on the whole set of data. This approach is thus also suitable for streaming data; it could update the mean and standard deviation on-the-fly.

Here I present an implementation of the algorithm shown by Mark Hoemmen in his paper Computing the standard deviation efficiently as a static Java method. The values are provided by an Iterator. Note that in Java 5+ the iterator could directly force the objects to implement the interface. Here I have chosen to delegate the extraction of the double value to an extractor object (I’m using it on Java 1.4).

	/**
	 * A method to calculate the mean and standard deviation of a series 
	 * of double. The objects are provided by an {@link Iterator} and the 
	 * double value is extracted using the passed extractor.
	 * It skips <code>NaN</code> values.<br>
	 * 
	 * Returns a <code>double[3]</code> where <br>
	 * [0] is the number of values used in the mean<br>
	 * [1] contains the mean value of the series<br>
	 * [2] contains the standard deviation (sigma) of the complete 
	 * population<br>
	 * Returns <code>NaN</code> for the mean and std dev if no valid value 
	 * could be found in the series.<br>
	 * 
	 * Algorithm taken from "Computing the standard deviation efficiently" 
	 * by Mark Hoemmen 
	 * (http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/variance.pdf)
	 * 
	 * @return the number of values, mean and std-dev of the serie
	 * 
	 */
	public static double[] getMeanAndStdDev(Iterator it, DoubleExtractor e){
		while(it.hasNext()){// while initial value is NaN, try next
			double xk = e.doubleValue(it.next());
			if(Double.isNaN(xk)){ 
				continue;
			}
			int k = 1;
			double Mk = xk;
			double Qk = 0;
			while(it.hasNext()){
				xk = e.doubleValue(it.next());
				if(Double.isNaN(xk)){
					continue;
				}
				k++;
				double d = xk-Mk; // is actually xk - Mk-1, 
				                  // as Mk was not yet updated
				Qk += (k-1)*d*d/k;
				Mk += d/k;
			}
			return new double[]{k,Mk,Math.sqrt(Qk/k)};
		} 
		return new double[]{0,Double.NaN,Double.NaN};
	}

using the following interface

/**
 * An object which can return a double representation of passed objects.
 * 
 * @author Philipp Buluschek
 *
 */
public interface DoubleExtractor {
	
	/**
	 * @return the double representation of the passed object.
	 */
	public double doubleValue(Object o);
}

Uncategorized , , ,

Removing french language in Eclipse’s findbugs plugin

September 16th, 2009

When using the findbugs plugin in Eclipse in an environment where the locale is french, all bug descriptions appear in french. Theses messages are not displayed correctly, supposedly because of an encoding incompatbility with HTML.

In order to get the original, english messages, we will remove the french messages from the plugin. Do the following:

  • locate and go to your Eclipse plugin folder (maybe at c:\Program Files\Eclipse\plugins)
  • in the plugins find the findbugs folder (edu.umd.cs.findbugs.plugin.eclipse_x.x.x.xxxxxxxx, where the x stand for a version number)
  • in that folder, you find the plugin as a jar (findbugs-plugin.jar), open it with a zip utility and rename the file messages_fr.xml to (for example) messages_fr.xml.noFrench so that it won’t be found on startup

Uncategorized

Getting the location of an executing batch script (.bat file)

July 24th, 2009

A Windows batch script (.bat file) has an associated directory; the current working folder. It can be accessed in a batch file by the system variable %cd%. If instead, you want to obtain the location of the currently executing script, use %~dp0.

Note that both may be the same, but are not necessarily.

Uncategorized

htpasswd.exe to make your apache password files

May 27th, 2009

It happened several times that I was looking for htpasswd.exe which is distributed with the apache web server, but had it not handy. The apache foundation unfortunately does not distribute this exec separately, and installing / uninstalling the server just to get the password making program is a bit tedious. So here it is:

htpasswd.exe

.
Note that the Windows executable cannot encode the password using the CRYPT method (although it says so in the help screen). If you need to make passwords for a system expecting CRYPT, you may use the service at http://www.askapache.com/online-tools/htpasswd-generator/

Uncategorized , , ,

Quote or escape your Java classpath in cygwin!

April 17th, 2009

If you are using Java in cygwin, a little attention is necessary because two worlds collide. On the one hand, the Windows world, where the java.exe lives and on the other the bash of cygwin in which you are starting the command. In particular, when setting the classpath you need to pay attention to the differences.

Java on Linux separates its classpath entries using the colon (:) while Java on Windows separates the entries with a semi-colon (;). There is a good reason for this; on Windows the colon would have been impractical as it is used in file pathes (c:\…), on Linux the semi-colon is impractical, as it is used to end a command line.

Thus using the Windows Java in cygwin forces us to use the semi-colon (because that’s what the java.exe expects) in an enviornment where the semi-colon is a command line end. So to make it work, you need to either quote the argument or escape the semi-colons
$ java -cp "path/to/include;path/to/some.jar" Main
$ java -cp path/to/include\;path/to/some.jar Main

Else the following errors occur.
When using the semi-colon un-escaped:

$ java -cp .;Printer.jar PrintTest
Usage: java [-options] class [args...]
(to execute a class)
or java [-options] -jar jarfile [args...]
(to execute a jar file)
[...]

When using the colon (which would be correct in a Linux environment)

$ java -cp .:Printer.jar PrintTest
Exception in thread "main" java.lang.NoClassDefFoundError: PrintTest
Caused by: java.lang.ClassNotFoundException: PrintTest
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
Could not find the main class: PrintTest. Program will exit.

Uncategorized , ,

Setting ls to reasonable colors in cygwin

April 15th, 2009

I often navigate folders in Windows explorer without using the mouse. This is quick, as you can jump to any file or folder by typing the first letters of the name and use backspace to go up one level. The speed is probably faster than what you achieve in a bash shell, where you complete names with tab.

In the Windows explorer, folders are listed first and have a yellow icon, files are just plain black-on-white. Using the bash shell in cygwin, I wanted to retain these mnemonic features.

Here is what goes into my .bashrc to achieve this:

alias ls='ls -A --color=auto --group-directories-first'
alias ll='ls -lh'
LS_COLORS='no=00:di=33;01:tw=33;01:ow=33;01'
LS_COLORS=$LS_COLORS':fi=00:ln=00:pi=00:so=00:bd=00:cd=00:or=00:mi=00:ex=00'
LS_COLORS=$LS_COLORS':*.sh=31:*.sh=31:*.exe=31:*.bat=31:*.com=31'
export LS_COLORS

A few words of explanation. It defines two aliases, one shadowing the ls command itself to 1) hide the . and .. folders in each output using the -A option 2) use coloring on device which support it only with the --color=auto 3) have folders listed first with the --group-directories-first. The second alias is really just a shortcut (using the first!) long listed dirs (with human readable file sizes).

The colors are basically bright-yellow (33;01) for all dirs, red (31) for executable files and default (00) for all other files and text. Note that it doesn’t really make sense to define a color for executable files in cygwin, as all existing files are executable per default.

Uncategorized , , ,

Getting the number of dimensions of a multidimensional array in Java

January 21st, 2009

Based on my previous post about cloning multidimensional arrays, I have come up with a short static function to get the number of dimensions which a multidimensional array has.

Note that this method only returns the number of dimensions of the original object passed as argument. It does not expand into the contained objects themselves. Example:

Object[] a = new Object[3][0];
System.out.println(getNumberOfDimensions(a));

prints 2.

Object[] a = new Object[1];
a[0] = new float[2][5][1];
System.out.println(getNumberOfDimensions(a));

prints 1.

Here’s the code:

/** 
 * Returns the number of dimensions which a multidimensional 
 * array has as a positive integer. For example, returns 3 
 * when passed a <code>float[][][]</code> 
 * @param src the multidimensional array 
 * @return the number of dimensions of src 
 * @author Philipp Buluschek bulu@romandie.com 
 */  
public static int getNumberOfDimensions(Object[] src){  
  int dim = 1;  
  Class cl = src.getClass().getComponentType();  
  while(cl.isArray()){  
    dim++;  
    cl = cl.getComponentType(); // will not return null as we tested isArray() above  
  }  
  return dim;  
}  

java , , , ,

Cloning multidimensional arrays in Java

January 13th, 2009

In Java, calling the clone() method on an array returns a copy of the array, that is a new array containing references to the same objects as the source array. In particular, the objects themselves are not copied. As multidimensional arrays are just arrays of arrays, cloning a multidimensional array result in only the first dimension being copied, the array of the second and following dimensions are the same in the source and the destination array.

Below I present a static method to get a real clone of a multidimensional array, meaning one where all dimensions are copied, but not the objects. I use it for example, if I have a private multidimensional array in a class, and want to provide an accessor for it. To assure that the array which is returned by the accessor is totally independant of the one inside the class, all dimensions must be copied correctly (defensive copy).

Note that this implementation does not support arrays containing themselves. This will result in an infinite recursion and eventually a stack overflow.

So without further discussion, here’s the code (compatible JRE 1.3)

/** 
 * Makes a copy of all dimensions of a multidimensional 
 * array. This function does not copy the actual objects, but 
 * only the references. The behavior is the same as {@link #clone()} 
 * but extends to all dimensions of an array.<br> 
 * Note that this implementation does not handle arrays which 
 * contain themselves correctly. It results in a infinite recursion
 * ending in a {@link StackOverflowError}.
 * 
 * @param src a multidimensional array, either of {@link Object}s 
 * or of a primitive type 
 * @return a copy of src, so that each dimension is copied, but 
 * the objects are not cloned 
 * @author Philipp Buluschek bulu@romandie.com 
 */  
public static Object[] deepClone(Object[] src){
  if(src == null){
    return null;
  }
  Object[] dest = (Object[])src.clone();
  for (int i = 0; i < dest.length; i++) {
    Object e = dest[i];
    if (e != null &amp;amp;&amp;amp; e.getClass().isArray()) { 
      // if it is null or not an array, it was taken care of by the clone()
      if (e instanceof Object[]) {
        // using recursion to reach all dimensions
        dest[i] = deepClone((Object[])e);
      }
      else {
        // primitive arr
        if(e instanceof byte[])   dest[i] = ((byte[]) e).clone();
        else if(e instanceof short[])  dest[i] = ((short[]) e).clone();
        else if(e instanceof int[])    dest[i] = ((int[]) e).clone();
        else if(e instanceof long[])   dest[i] = ((long[]) e).clone();
        else if(e instanceof float[])  dest[i] = ((float[]) e).clone();
        else if(e instanceof double[]) dest[i] = ((double[]) e).clone();
        else if(e instanceof boolean[])dest[i] = ((boolean[]) e).clone();
      }
    }
  }
  return dest;
}

The same code using generics for JRE >= 1.5

/** 
 * Makes a copy of all dimensions of a multidimensional 
 * array. This function does not copy the actual objects, but 
 * only the references. The behavior is the same as {@link #clone()} 
 * but extends to all dimensions of an array.<br> 
 * Note that this implementation does not handle arrays which 
 * contain themselves correctly. It results in a infinite recursion
 * ending in a {@link StackOverflowError}.
 * 
 * @param src a multidimensional array, either of {@link Object}s 
 * or of a primitive type 
 * @return a copy of src, so that each dimension is copied, but 
 * the objects are not cloned 
 * @author Philipp Buluschek bulu@romandie.com 
 */  

@SuppressWarnings("unchecked")
public static <T> T[] deepClone(T[] src){
  if(src == null){
    return null;
  }
  T[] dest = src.clone();
  for (int i = 0; i < dest.length; i++) {
    Object e = dest[i];
    if (e != null &amp;amp;&amp;amp; e.getClass().isArray()) { 
      // if it is null or not an array, it was taken care of by the clone()
      if (e instanceof Object[]) {
        // using recursion to reach all dimensions
        dest[i] = (T)(deepClone((Object[])e));
      }
      else {
        // primitive arr
        if(e instanceof byte[])        dest[i] = (T)((byte[]) e).clone();
        else if(e instanceof short[])  dest[i] = (T)((short[]) e).clone();
        else if(e instanceof int[])    dest[i] = (T)((int[]) e).clone();
        else if(e instanceof long[])   dest[i] = (T)((long[]) e).clone();
        else if(e instanceof float[])  dest[i] = (T)((float[]) e).clone();
        else if(e instanceof double[]) dest[i] = (T)((double[]) e).clone();
        else if(e instanceof boolean[])dest[i] = (T)((boolean[]) e).clone();
      }
    }
  }
  return dest;
}

java , , ,