Simple Statistics

The first rule about measurement is: Know what you're measuring.

The second rule about measurement is: Repeat it.

How good is your result?

You need to calculate the mean and standard deviation, and possibly the standard error.

Mean and Standard Deviation

It's easy enough, given a series of measurements of the same thing, to calculate the mean and standard deviation.

mean = sum(sample0, sample1, ...)/nsamples

stddev = sqrt((sum(sqr(sample0), sqr(sample1), ...) - sqr(sum(sample0, sample1, ...))/nsamples)/(nsamples - 1))
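
The two formulas above translate almost directly into code. The following is a minimal sketch in Python; the function names and the sample data are made up for illustration. It uses the same sum and sum-of-squares form as the formula, and cross-checks the result against the standard library's statistics.stdev.

```python
import math
import statistics


def mean(samples):
    """Arithmetic mean: sum of the samples divided by their count."""
    return sum(samples) / len(samples)


def stddev(samples):
    """Sample standard deviation via the sum / sum-of-squares formula.

    Divides by (n - 1), so at least two samples are required.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    total = sum(samples)
    total_sq = sum(x * x for x in samples)
    return math.sqrt((total_sq - total * total / n) / (n - 1))


runs = [10.0, 12.0, 14.0]       # hypothetical repeated measurements
print(mean(runs))               # 12.0
print(stddev(runs))             # 2.0
print(statistics.stdev(runs))   # cross-check with the standard library
```

This one-pass form can lose precision when the samples are large numbers that differ only slightly (for example, raw cycle counts), because it subtracts two nearly equal quantities. statistics.stdev avoids that problem and is the safer choice in practice.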

You can quote this as mean(stddev), where the number in parentheses is the standard deviation expressed in units of the last digit(s) of the mean.

So, for example, 100(20) means a mean of 100 and a standard deviation of 20; 1.00(3) means a mean of 1 and a standard deviation of 0.03.
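
One way to produce this notation mechanically is sketched below. The helper name concise and its decimals parameter (how many decimal places to show for the mean) are hypothetical. Choosing how many digits to quote is left to the caller.

```python
def concise(mean, sd, decimals):
    """Format a mean and standard deviation as mean(sd).

    `decimals` is the number of decimal places shown for the mean; the
    value in parentheses is sd expressed in units of that last place.
    """
    scaled_sd = round(sd * 10 ** decimals)
    return f"{mean:.{decimals}f}({scaled_sd})"


print(concise(100, 20, 0))    # 100(20)
print(concise(1, 0.03, 2))    # 1.00(3)
```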

Another useful figure is the standard error, which for our purposes is the standard deviation divided by the square root of the number of samples.

stderr = stddev/sqrt(nsamples)

Quote this as a ± figure: 100±20 means a mean of 100 with a standard error of 20.
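
A minimal sketch of computing the standard error and quoting the result in ± form, using Python's statistics module. The helper names stderr and plus_minus and the sample data are made up for illustration.

```python
import math
import statistics


def stderr(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def plus_minus(samples, decimals=1):
    """Format samples as 'mean±stderr', rounded to `decimals` places."""
    m = statistics.mean(samples)
    return f"{m:.{decimals}f}±{stderr(samples):.{decimals}f}"


runs = [9.0, 13.0, 13.0, 13.0]   # hypothetical repeated measurements
print(statistics.stdev(runs))    # sample standard deviation
print(plus_minus(runs))          # 12.0±1.0
```

Because the standard error shrinks as 1/sqrt(nsamples), quadrupling the number of runs only halves it.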

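The standard error (of the mean) estimates the standard deviation of the sample mean itself: how much the mean would vary if you repeated the whole set of nsamples measurements many times, given the standard deviation of the population from which the samples are drawn.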
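
This interpretation can be checked by simulation. The sketch below assumes normally distributed measurements with a known, made-up population standard deviation (POP_SD, N and TRIALS are all hypothetical). It repeats the whole experiment many times and compares the observed spread of the means with POP_SD/sqrt(N). It also shows that the standard error computed from a single run gives a rough estimate of that same spread.

```python
import math
import random
import statistics

random.seed(1)

POP_SD = 20.0   # assumed population standard deviation
N = 25          # samples per experiment
TRIALS = 2000   # number of times the whole experiment is repeated

# Repeat the experiment many times, keeping only each experiment's mean.
means = [
    statistics.mean(random.gauss(100.0, POP_SD) for _ in range(N))
    for _ in range(TRIALS)
]

observed = statistics.stdev(means)   # actual spread of the sample means
predicted = POP_SD / math.sqrt(N)    # theoretical value: 20 / 5 = 4

# A single experiment's standard error is an estimate of that spread.
one_run = [random.gauss(100.0, POP_SD) for _ in range(N)]
estimate = statistics.stdev(one_run) / math.sqrt(N)

print(f"observed {observed:.2f}, predicted {predicted:.2f}, "
      f"one-run estimate {estimate:.2f}")
```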

For more details, see the Wikipedia article.

Measuring Performance

Often in systems work we want to measure whether a change we've made (a different algorithm, or scalability fixes for some platform or other) results in any performance change. On LKML I frequently read "The difference is down in the noise." I immediately want to run screaming to the writer with a statistics textbook.

Typically, we have a benchmark, and we want to show that its results do not differ between two circumstances.

There are two cases to consider:

1. The benchmark gives a single figure of merit (in general it's best to avoid such benchmarks!).
2. The benchmark gives a series of points that can be plotted as performance against workload: for example, network throughput against packet size, or CPU usage against the number of concurrent jobs.

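In either case, the null hypothesis is that the two curves are the same, or that the two figures of merit are the same, and we hope to show that the data are consistent with it. Strictly, a statistical test can only fail to reject the null hypothesis; it cannot prove the two are identical. The appropriate test is usually Student's t-test.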
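
As an illustration, here is a minimal sketch of the classic two-sample Student's t-test in plain Python. It assumes that both sets of runs come from populations with equal variance. The function name and the benchmark numbers are made up. The 5% critical value for 6 degrees of freedom (about 2.447) is taken from a standard t table. If SciPy is available, scipy.stats.ttest_ind(before, after) computes the same pooled-variance statistic and also returns a p-value. For a benchmark that produces a curve, one simple approach is to repeat the comparison at each workload point.

```python
import math
import statistics


def students_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes both samples come from populations with equal variance.
    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pooled estimate of the common variance.
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, na + nb - 2


before = [8.0, 11.0, 11.0, 10.0]    # made-up runs, old build
after = [11.0, 14.0, 14.0, 13.0]    # made-up runs, new build
t, df = students_t(before, after)
print(f"t = {t:.2f}, df = {df}")    # t = -3.00, df = 6

T_CRIT_5PCT_DF6 = 2.447             # two-sided 5% critical value, df = 6
if abs(t) > T_CRIT_5PCT_DF6:
    print("difference is significant at the 5% level")
else:
    print("no significant difference detected")
```

Here |t| = 3.00 exceeds 2.447, so these made-up runs would not support a claim that the difference is "down in the noise".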

IA64wiki: SimpleStatistics (last edited 2009-03-12 04:01:54 by PeterChubb)

Gelato@UNSW is sponsored by the University of New South Wales, National ICT Australia, the Gelato Federation, Hewlett-Packard Company and the Australian Research Council.