This is a brief guide to using the pwr package by way of a few examples.
Recall that for a statistical test the following factors are inter-related:

- the sample size
- the effect size (e.g. a difference in means or proportions)
- the variability of the data (e.g. the standard deviation)
- the significance level
- the statistical power

Thus knowing any four factors will provide an estimate for the remaining fifth factor.
In base R, the stats package has some functions for calculating power, namely power.t.test(), power.prop.test(), and power.anova.test(). The pwr package includes substitutes for these functions plus a few more.
pwr package

The pwr package has various functions useful for power calculations. The first four below overlap with the above-mentioned stats package functions:
- pwr.t.test(): 1-, 2-sample, and paired t-test
- pwr.t2n.test(): 2-sample t-test
- pwr.2p.test(): 2-sample test of proportions (equal size)
- pwr.anova.test(): balanced 1-way ANOVA
- pwr.2p2n.test(): 2-sample test of proportions (unequal size)
- pwr.p.test(): 1-sample test of proportions
- pwr.r.test(): correlation test
- pwr.chisq.test(): chi-squared goodness of fit or association test
- pwr.f2.test(): test of linear model coefficients

One difference between the base stats and the pwr functions is that the latter generally expect standardised (Cohen) effect sizes as an argument rather than sample statistics such as proportions, means, or variances.
More detailed documentation on the pwr package can be found in its vignette on CRAN.
Install the pwr package using the install.packages() command:
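# Install once from CRAN
install.packages("pwr")

# Load the package for the examples below
library(pwr)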
The model for multiple linear regression is as follows:
\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon,\]
where \(\epsilon\) is the random error term.
The null hypothesis is that none of the \(p\) explanatory variables \(x_i\) explain any of the variability in the response variable \(y\). This would mean their regression coefficients, \(\beta_i\), are all statistically indistinguishable from zero.
The alternative hypothesis is that at least one of the coefficients is not equal to zero.
\(H_0: \beta_i = 0, \quad \forall i = 1, 2, \dots, p.\)
\(H_A: \textrm{At least one}\; \beta_i \ne 0,\; \textrm{for}\;i = 1, 2, \dots, p.\)
The pwr function for calculating sample sizes for multiple linear regression is pwr.f2.test().
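Its arguments and their defaults can be listed with args():

args(pwr.f2.test)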
## function (u = NULL, v = NULL, f2 = NULL, sig.level = 0.05, power = NULL)
## NULL
The (numerator) degrees of freedom, \(u\), is the number of coefficients in your model (excluding the intercept).
The (denominator) degrees of freedom, \(v\), is the number of error degrees of freedom: \(v = n - u - 1\). Rearranging gives an expression for the sample size, \(n = v + u + 1\) (always rounding \(v\) up to the next integer).
The effect size is \(f^2 = \frac{R^2}{1 - R^2}\), where \(R^2\) is the coefficient of determination, otherwise understood as the proportion of variance in the response variable explained by the multiple regression model.
One way to determine the effect size parameter is by first hypothesising an \(R^2\) value, i.e., the proportion of variance that the model will explain.
For example, suppose we have:

- 6 explanatory variables, so \(u = 6\),
- a hypothesised \(R^2\) of 0.2, giving \(f^2 = \frac{0.2}{1 - 0.2} = 0.25\),
- a significance level of 0.05, and
- a desired power of 0.8;

then passing these to the pwr.f2.test() function:
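# Solve for v by leaving it unspecified (NULL)
pwr.f2.test(u = 6,
            f2 = 0.25,
            sig.level = 0.05,
            power = 0.80)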
##
## Multiple regression power calculation
##
## u = 6
## v = 54.09317
## f2 = 0.25
## sig.level = 0.05
## power = 0.8
we get \(v = 55\) (rounding up from 54.09).
From this we can calculate the sample size: \(n = v + u + 1 = 55 + 6 + 1 = 62\).
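The same arithmetic can be done in R by extracting the u and v components of the object returned by pwr.f2.test(); this is just a convenience sketch of the calculation above:

# Round v up, then compute n = v + u + 1
res <- pwr.f2.test(u = 6, f2 = 0.25, sig.level = 0.05, power = 0.80)
ceiling(res$v) + res$u + 1
## [1] 62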
Alternatively, Cohen (1982) suggests that \(f^2\) values of 0.02, 0.15, and 0.35 represent small, medium, and large effect sizes respectively. These values are conveniently stored in the pwr package and retrieved using the cohen.ES() function:
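# Cohen's conventional effect sizes for the f2 test
cohen.ES(test = "f2", size = "small")
cohen.ES(test = "f2", size = "medium")
cohen.ES(test = "f2", size = "large")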
##
## Conventional effect size from Cohen (1982)
##
## test = f2
## size = small
## effect.size = 0.02
##
## Conventional effect size from Cohen (1982)
##
## test = f2
## size = medium
## effect.size = 0.15
##
## Conventional effect size from Cohen (1982)
##
## test = f2
## size = large
## effect.size = 0.35
Therefore we have:
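# Sample size calculation for a large effect size (f2 = 0.35)
pwr.f2.test(u = 6,
            f2 = cohen.ES(test = "f2", size = "large")$effect.size,
            sig.level = 0.05,
            power = 0.80)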
##
## Multiple regression power calculation
##
## u = 6
## v = 38.62994
## f2 = 0.35
## sig.level = 0.05
## power = 0.8
Calculating the sample size gives \(n = v + u + 1 = 39 + 6 + 1 = 46\). That is, to achieve a power of 80% and be able to detect a large effect size, a sample size of 46 is needed.
A balanced one-way ANOVA is an ANOVA where each group has the same number of samples.
The null hypothesis is that the means of the groups are all the same.
The alternative hypothesis is that the mean of at least one group is different from the others.
\(H_0: \mu_1 = \mu_2 = \dots = \mu_k\)
\(H_A: \textrm{at least one}\; \mu_i\; \textrm{is different from the others}\)
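To detect a small difference in effects between the groups, we first retrieve Cohen's conventional small effect size for ANOVA using cohen.ES():

cohen.ES(test = "anov", size = "small")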
##
## Conventional effect size from Cohen (1982)
##
## test = anov
## size = small
## effect.size = 0.1
Therefore we have:
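# Solve for the per-group sample size with k = 3 groups and a small effect
pwr.anova.test(k = 3,
               f = cohen.ES(test = "anov", size = "small")$effect.size,
               sig.level = 0.05,
               power = 0.80)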
##
## Balanced one-way analysis of variance power calculation
##
## k = 3
## n = 322.157
## f = 0.1
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
Therefore to have 80% power and be able to detect a small difference in effects between groups, 323 samples are needed in each group. That makes a total of 969 samples! Large numbers of samples are needed if you want to detect small effects reliably.
For a two-sample t-test where each group has the same number of samples, the null hypothesis is that the means of the two groups are the same.
The alternative hypothesis is that the mean of group 2 is larger than that of group 1.
\(H_0: \mu_1 = \mu_2\)
\(H_A: \mu_1 < \mu_2\)
# Looking for a large effect size
pwr.t.test(d = cohen.ES(test = "t", size = "large")$effect.size,
           power = 0.80,
           sig.level = 0.05,
           alternative = "greater")
##
## Two-sample t test power calculation
##
## n = 20.03277
## d = 0.8
## sig.level = 0.05
## power = 0.8
## alternative = greater
##
## NOTE: n is number in *each* group
Rounding up to 21 samples per group, the required total sample size is \(n = 21 \times 2 = 42\).
# Compare result with the built-in R function
power.t.test(n = 20,
             sd = 1,
             sig.level = 0.05,
             power = 0.8,
             alternative = "one.sided")
##
## Two-sample t test power calculation
##
## n = 20
## delta = 0.8006829
## sd = 1
## sig.level = 0.05
## power = 0.8
## alternative = one.sided
##
## NOTE: n is number in *each* group
Working backwards from the sample size, we see that power.t.test() returns a similar effect size estimate (delta = 0.8006829) to the d = 0.8 used with the pwr function.