Tuesday, 21 December 2010

Sample Size Calculations

Please type any requests you may have for a "best in class" sample size website into the "Comments" section at the end of this post. I will include the best suggestions in the finished website ...


Sample Size Website - Log

Make no mistake - working out sample sizes is pretty difficult!

My goal is to set up the best, free, sample size calculation website on the internet today. This blog is my story of how the site comes to be.

So far, I have scoped out the project (and it is indeed a project, not a task). The site will work out estimated sample sizes for:

  • Means
  • Proportions
  • Standard Deviations
  • Rates
It will allow the user to specify 1-tailed or 2-tailed tests, and also to specify whether 1-sample or 2-sample trials are being used. For means, the user will be able to specify whether the standard deviations are known or estimated, and whether they are equal (in the 2-sample case).

As a statistician, I already know what equations need to be used. As a programmer, I have written Java classes that calculate the Complementary Error Function (ERFC), and its t-test equivalent.
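To show the kind of calculation such a class performs, here is a minimal sketch. It is not the class described above: the names ErfcSketch, erfc and upperTail are made up for this example, and the method uses the well-known Chebyshev-fitted approximation popularised by Numerical Recipes, whose fractional error is below about 1.2e-7.

```java
// Illustrative stand-in only; class and method names are hypothetical.
final class ErfcSketch {

    // Complementary error function via a Chebyshev-fitted approximation
    // (fractional error below about 1.2e-7 for all x).
    static double erfc(double x) {
        double z = Math.abs(x);
        double t = 1.0 / (1.0 + 0.5 * z);
        double poly = -1.26551223 + t * (1.00002368 + t * (0.37409196
                + t * (0.09678418 + t * (-0.18628806 + t * (0.27886807
                + t * (-1.13520398 + t * (1.48851587 + t * (-0.82215223
                + t * 0.17087277))))))));
        double ans = t * Math.exp(-z * z + poly);
        // erfc(-x) = 2 - erfc(x)
        return x >= 0.0 ? ans : 2.0 - ans;
    }

    // Upper-tail probability of the standard normal: P(Z > z).
    static double upperTail(double z) {
        return 0.5 * erfc(z / Math.sqrt(2.0));
    }
}
```

The link to sample sizes is the upperTail method: the tail area of the standard normal distribution is what connects a chosen significance level or power to a z-value.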

Fortunately, Java already provides the arcsine function (Math.asin), for use with the Dobson-Gebski correction for small proportions!
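As a rough illustration of where the arcsine comes in, the sketch below computes a per-group sample size for comparing two proportions using the plain arcsine (variance-stabilising) transformation. It deliberately leaves out the continuity correction, so it is not the Dobson-Gebski method itself. The class and method names are hypothetical, and the z-values for the significance level and power are passed in by the caller.

```java
// Illustrative sketch only; names are hypothetical and no continuity
// correction is applied.
final class ArcsineSampleSize {

    // Per-group n for comparing proportions p1 and p2.
    // zAlpha: z-value for the significance level (e.g. 1.959964 for 2-tailed 5%).
    // zBeta:  z-value for the desired power (e.g. 0.841621 for 80% power).
    static long perGroup(double p1, double p2, double zAlpha, double zBeta) {
        if (p1 < 0.0 || p1 > 1.0 || p2 < 0.0 || p2 > 1.0 || p1 == p2) {
            throw new IllegalArgumentException("need two different proportions in [0, 1]");
        }
        // asin(sqrt(p)) has approximate variance 1 / (4n), whatever p is.
        double diff = Math.asin(Math.sqrt(p1)) - Math.asin(Math.sqrt(p2));
        double zSum = zAlpha + zBeta;
        double n = (zSum * zSum) / (2.0 * diff * diff);
        return (long) Math.ceil(n);
    }
}
```

For example, ArcsineSampleSize.perGroup(0.1, 0.2, 1.959964, 0.841621) asks how many subjects per group are needed to detect a change from 10% to 20% with a 2-tailed 5% test and 80% power.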

Next steps?

Test the "ERFC" Java class from user inputs. This should be shortly after the new year ...

Links to Definitions used in Sample Size Calculations

Sample sizes are almost always calculated as part of an experiment, to test a hypothesis. The following describes the hypothesis testing process, with hyperlinks to useful web pages.

Note that where possible, it is more efficient to use means, especially when the data set follows a Gaussian distribution.

What assumptions have been made?
  • For means, is the data set Gaussian? If not, can the central limit theorem be used?
  • For proportions, are there enough points? (e.g. n × p > 10)
Define the conditions to be tested
Then, run the sample size calculation using the appropriate equation
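As one example of such an equation, the sketch below covers the simplest case: means with a known standard deviation, using the standard normal-approximation formula n = ((z_alpha + z_beta) × sigma / delta)². The class and method names are hypothetical, sigma is the known standard deviation, delta is the smallest difference worth detecting, and the two z-values are supplied by the caller.

```java
// Illustrative sketch only; names are hypothetical. Assumes Gaussian data
// (or a large enough sample for the central limit theorem) and known sigma.
final class MeanSampleSize {

    // 1-sample case: n = ((zAlpha + zBeta) * sigma / delta)^2, rounded up.
    static long oneSample(double sigma, double delta, double zAlpha, double zBeta) {
        return (long) Math.ceil(core(sigma, delta, zAlpha, zBeta));
    }

    // 2-sample case with equal, known sigma: twice the 1-sample value, per group.
    static long twoSamplePerGroup(double sigma, double delta, double zAlpha, double zBeta) {
        return (long) Math.ceil(2.0 * core(sigma, delta, zAlpha, zBeta));
    }

    private static double core(double sigma, double delta, double zAlpha, double zBeta) {
        if (sigma <= 0.0 || delta == 0.0) {
            throw new IllegalArgumentException("sigma must be positive and delta non-zero");
        }
        double root = (zAlpha + zBeta) * sigma / delta;
        return root * root;
    }
}
```

For a 2-tailed test at the 5% level use zAlpha = 1.959964, and for a 1-tailed test use 1.644854; 80% power corresponds to zBeta = 0.841621.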



The formula provides an estimate of the sample size needed. The experiment is then run using that sample size. After the experiment, the appropriate statistical test is used to decide whether to reject, or fail to reject, the null hypothesis.