Statistics

Beyond p-values

Biologists, medical doctors and students in these fields often struggle with statistics, but many of them eventually reach a basic understanding of “significance” and “p-values”. The bad news is that neither concept sufficiently characterizes the reliability of our statistical decisions. This tutorial is intended for practicing biologists and medical doctors who are interested in the limitations of simply reporting significance or p-values. It is not a comprehensive introduction to hypothesis testing, and a basic understanding of its principles is assumed. Practical, how-to sections are written on a gray background. At the end of the tutorial, a brief summary without any theory is provided.

Download the PDF file here:

A Practical Guide to Significance Beyond p-values

Statistics with Excel

It is often overlooked that Excel can perform most of the statistical calculations that biologists ordinarily need. I have created a macro-enabled Excel workbook that performs all the statistical calculations a biologist is likely to need.
The capabilities of the program include:

descriptive statistics (mean, SD, SEM, median, mode, skewness, kurtosis)
normality tests
calculations with the normal, binomial and Poisson distributions
z-test, Student's t-tests, Welch test, F test
ANOVA (1-, 2- and 3-way, repeated-measures ANOVA with one factor)
Levene's test
non-parametric tests: Wilcoxon test, sign test, median test, Mann-Whitney test
chi-squared test of independence
Kolmogorov-Smirnov test
tests for population proportions
Kaplan-Meier log-rank test
linear and polynomial regression with p-value estimation, linear regression on ranks (Spearman)
Deming linear regression (when observations of both the X and Y variables are associated with error)
general purpose fitting
false discovery rate calculation including the Benjamini-Hochberg and the Storey methods

Download the Excel workbook here:
Peter_ManyStatProbes_with_Excel.xlsm

The workbook

requires Excel 2010 or later and the Solver add-in.
updates itself automatically.

n-way ANOVA from summary statistics in Matlab

The Matlab program anovanFromSumStat can perform one-way, two-way, ... n-way ANOVA for the main and interaction effects when only summary statistics (mean, SD and size of each group) are available.
The program runs in four different modes, depending on the first argument:

anovaArray=anovanFromSumStat('gen'): generates the array containing the mean, SD and size of each group.
anovaArray=anovanFromSumStat('regen',anovaArray): modifies an anovaArray created with the 'gen' option.
varargout=anovanFromSumStat('calc',anovaArray): performs ANOVA on the array created in the previous steps.
anovanFromSumStat('ver'): displays the version of the program.

Help is available by typing 'help anovanFromSumStat' at the Matlab command prompt.

Download the Matlab P-file here:
anovanFromSumStat.p

The program is also available on MATLAB Central:
https://www.mathworks.com/matlabcentral/fileexchange/41036-n-way-anova-from-summary-statistics
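
To illustrate why summary statistics are sufficient for ANOVA, here is a minimal Python sketch of the one-way case only. It is not the anovanFromSumStat program and does not reproduce its interface; the function name one_way_anova_from_summary is hypothetical. The between-group sum of squares follows from the group means and sizes, and the within-group sum of squares follows from the SDs and sizes.

```python
def one_way_anova_from_summary(means, sds, sizes):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper for illustration; inputs are per-group
    means, sample standard deviations and group sizes.
    """
    k = len(means)
    n_total = sum(sizes)

    # Grand mean is the size-weighted average of the group means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

    # Between-group sum of squares from means and sizes.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))

    # Within-group sum of squares: each sample variance times (n - 1).
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(sizes, sds))

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_from_summary(
    means=[10.0, 12.0, 14.0], sds=[2.0, 2.0, 2.0], sizes=[5, 5, 5]
)
print(f_stat, df_b, df_w)
```

The p-value is the upper tail of the F distribution with these degrees of freedom. The sketch stops at the F statistic to stay within the Python standard library.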

Estimation of the false discovery rate

This Matlab program estimates the false discovery rate in a single comparison (t-test).
A more detailed description of the program and the principles it is based on is available in this tutorial:
A Practical Guide to Significance Beyond p-values

Download the Matlab M-file here:
fdrEstimation.m
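
As a rough illustration of the idea, the Python sketch below uses one common definition of the false discovery rate for a single test: the fraction of "significant" results that are false positives. It assumes a prior probability that a real effect exists. This may differ from the calculation in fdrEstimation.m, and the function name and its parameters are hypothetical.

```python
def false_discovery_rate(alpha, power, prior):
    """Fraction of significant results that are false positives.

    alpha: significance level of the test
    power: probability of detecting a real effect
    prior: assumed probability that a real effect exists
    (hypothetical helper; all three inputs are assumptions of the sketch)
    """
    false_positives = alpha * (1 - prior)  # no real effect, yet significant
    true_positives = power * prior         # real effect, and significant
    return false_positives / (false_positives + true_positives)


# With alpha = 0.05, power = 0.8 and a 10% prior probability of a real effect:
print(round(false_discovery_rate(0.05, 0.8, 0.1), 2))  # 0.36
```

Under these assumed inputs, about a third of significant results would be false discoveries, even though the significance level is 0.05.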

Correction for false discoveries in multiple comparisons

If an investigation requires multiple statistical tests, the probability of making false discoveries can be frighteningly high. This Matlab program corrects for this using the Benjamini-Hochberg and the Storey methods. A more detailed description of the program and the principles it is based on is available in this tutorial:
A Practical Guide to Significance Beyond p-values

Download the Matlab M-file here:
correctFDR.m
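
For readers who want to see the Benjamini-Hochberg step-up procedure itself, here is a minimal Python sketch that computes adjusted p-values. It is an independent illustration and not the code of correctFDR.m. The Storey method is not covered, and the function name benjamini_hochberg is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value
    # is the minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p))
```

A test is then called a discovery if its adjusted p-value does not exceed the chosen false discovery rate, for example 0.05.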

Determine the power of a two-sample t-test

The power of a statistical test is the probability that the test will reject the null hypothesis, given that it is indeed false. The power can be calculated if the effect size is known. A more detailed description of the program and the principles it is based on is available in this tutorial:
A Practical Guide to Significance Beyond p-values

Download the Matlab M-file here:
determinePowerTtest.m
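
The dependence of power on effect size and sample size can be sketched in a few lines of Python. The example below uses a normal approximation to the two-sided, two-sample t-test with equal group sizes, so it slightly overestimates power for small samples. It is not the method of determinePowerTtest.m, and the function name and the use of a standardized effect size d (difference of means divided by the common SD) are assumptions of the sketch.

```python
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    d: standardized effect size (mean difference / common SD)
    n_per_group: number of observations in each group
    Normal approximation; hypothetical helper for illustration.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = abs(d) * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(approx_power_two_sample(d=0.5, n_per_group=64))
```

With d = 0 the function returns alpha, as expected: when there is no effect, the only rejections are false positives.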

Determine the required sample size to reach a certain power in a two-sample t-test

The size of the effect expected in an investigation can often be estimated. For such an effect to be detectable in a statistical test, a certain minimum sample size is required, which this Matlab program determines. A more detailed description of the program and the principles it is based on is available in this tutorial:
A Practical Guide to Significance Beyond p-values

Download the Matlab M-file here:
sampleSizeForTtest.m
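
A quick estimate of the required sample size can be obtained from the normal approximation n = 2((z_(1-alpha/2) + z_power) / d)^2 per group, sketched below in Python. This is an approximation and not the algorithm of sampleSizeForTtest.m; a calculation based on the t distribution typically gives a slightly larger number. The function name and the standardized effect size d are assumptions of the sketch.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(d, power=0.8, alpha=0.05):
    """Approximate sample size per group for a two-sided, two-sample test.

    d: standardized effect size (mean difference / common SD), non-zero
    Normal approximation; hypothetical helper for illustration.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


print(approx_n_per_group(d=0.5))  # 63
```

Halving the expected effect size roughly quadruples the required sample size, because n scales with 1/d^2.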