1 Introduction

This is a brief guide to using the pwr package [1] by way of a few examples.

Recall that for a statistical test the following factors are inter-related:

  1. The desired level of significance, \(\alpha\) = P(Type I error)
  2. The power of the test, (1 - \(\beta\)) = 1 - P(Type II error)
  3. The sample size, \(n\)
  4. The minimal effect size of interest
  5. The variance in the response variable

Thus, knowing any four of these factors determines the remaining fifth.

In base R the stats package has some functions for calculating power, namely:

  • power.t.test(),
  • power.prop.test(), and
  • power.anova.test().

The pwr package includes substitutes for these functions plus a few more.
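
As a minimal sketch of how fixing four of the factors determines the fifth, the base R function power.t.test() can be called with one quantity left unspecified. The numbers used here (a mean difference of 0.5 and a standard deviation of 1) are illustrative assumptions, not values taken from this guide.

```r
# Illustrative values only: mean difference 0.5, standard deviation 1.
# Leave n unspecified to solve for the sample size per group.
res_n <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
n_per_group <- ceiling(res_n$n)  # round up to a whole number of samples

# Now fix n and leave power unspecified to solve for the power instead.
res_power <- power.t.test(n = n_per_group, delta = 0.5, sd = 1, sig.level = 0.05)

n_per_group
res_power$power
```

The same pattern, leaving out the quantity you want solved, applies to the other power functions discussed in this guide.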

1.1 pwr package

The pwr package has various functions useful for power calculations. The first four below overlap with the above-mentioned stats package functions:

  1. pwr.t.test(): 1-sample, 2-sample, and paired t-test
  2. pwr.t2n.test(): 2-sample t-test (unequal size)
  3. pwr.2p.test(): 2-sample test of proportions (equal size)
  4. pwr.anova.test(): balanced 1-way ANOVA
  5. pwr.2p2n.test(): 2-sample test of proportions (unequal size)
  6. pwr.p.test(): 1-sample test of proportions
  7. pwr.r.test(): correlation test
  8. pwr.chisq.test(): chi-squared goodness of fit or association test
  9. pwr.f2.test(): test of linear model coefficients

One difference between the base stats and the pwr functions is that the latter generally expect standardised (Cohen [2]) effect sizes as arguments rather than sample statistics such as proportions, means, or variances.
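
To illustrate the difference, here is a sketch of converting sample statistics into standardised effect sizes by hand. The means, standard deviation, and proportions are made-up values for illustration only. The formulas are Cohen's d for a difference in means and Cohen's h (arcsine transformation) for a difference in proportions; the pwr package also provides helpers such as ES.h() for the latter.

```r
# Hypothetical sample statistics (assumed for illustration only).
mean_1 <- 10.0
mean_2 <- 10.5
sd_pooled <- 1.0
p_1 <- 0.6
p_2 <- 0.5

# Cohen's d: difference in means in units of the (pooled) standard deviation.
d <- abs(mean_1 - mean_2) / sd_pooled

# Cohen's h: difference between arcsine-transformed proportions.
h <- 2 * asin(sqrt(p_1)) - 2 * asin(sqrt(p_2))

d
h
```

Values such as d and h are what the pwr functions take as their effect size arguments, whereas power.t.test() and power.prop.test() take the raw differences, standard deviations, or proportions.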

More detailed documentation on the pwr package can be found in its vignette on CRAN.

Install the pwr package using the install.packages() command:
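
A typical way to do this is sketched below. The requireNamespace() guard and the explicit CRAN mirror are additions made here (assumptions, not part of the original guide) so that the package is only installed when it is missing.

```r
# Install pwr from CRAN only if it is not already available.
# The mirror URL is an assumption; any CRAN mirror should work.
if (!requireNamespace("pwr", quietly = TRUE)) {
  install.packages("pwr", repos = "https://cloud.r-project.org")
}

# Load the package so that its functions can be called directly.
library(pwr)
```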

2 Examples

2.1 Multiple linear regression

The model for multiple linear regression is as follows, where \(\varepsilon\) is the random error term:

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon\]

The null hypothesis is that none of the \(p\) explanatory variables \(x_i\) explain any of the variability in the response variable \(y\). This would mean that their regression coefficients, \(\beta_i\), are all equal to zero.

The alternative hypothesis is that at least one of the coefficients is not equal to zero.

\(H_0: \beta_i = 0, \quad \forall i = 1, 2, \dots, p.\)

\(H_A: \textrm{At least one}\; \beta_i \ne 0,\; \textrm{for}\;i = 1, 2, \dots, p.\)

The pwr function for calculating sample sizes for multiple linear regression is pwr.f2.test().

## function (u = NULL, v = NULL, f2 = NULL, sig.level = 0.05, power = NULL) 
## NULL
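
The signature above is the kind of output that args() prints. As a sketch, assuming the pwr package is installed, it can be inspected as follows. The argument that is left as NULL is the quantity the function solves for.

```r
library(pwr)

# Print the argument list of pwr.f2.test(), including default values.
args(pwr.f2.test)

# The argument names can also be extracted programmatically.
arg_names <- names(formals(pwr.f2.test))
arg_names
```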

The numerator degrees of freedom, \(u\), is the number of coefficients in your model, excluding the intercept.

The denominator degrees of freedom, \(v\), is the number of error degrees of freedom, \(v = n - u - 1\). Rearranging gives an expression for the sample size, \(n = v + u + 1\), where \(v\) is always rounded up to the next integer.

The effect size is \(f^2 = \frac{R^2}{1 - R^2}\), where \(R^2\) is the coefficient of determination, i.e., the proportion of variance in the response variable explained by the multiple regression model.

2.1.1 Determining effect size \(f^2\)

One way to determine the effect size parameter is by first hypothesising an \(R^2\) value, i.e., the proportion of variance that the model will explain.

For example, if we have:

  • six explanatory variables,
  • \(R^2 = 20\%\) which gives \(f^2 = \frac{0.2}{1 - 0.2} = 0.25\),
  • significance level of \(\alpha = 5\%\), and
  • power of \(1 - \beta = 0.8\),

then passing these to the pwr.f2.test() function:

## 
##      Multiple regression power calculation 
## 
##               u = 6
##               v = 54.09317
##              f2 = 0.25
##       sig.level = 0.05
##           power = 0.8

we get \(v = 55\) (rounding up).

From this we can calculate the sample size: \(n = v + u + 1 = 55 + 6 + 1 = 62\).
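
The calculation above can be sketched in R as follows, assuming the pwr package is installed. The variable names are chosen here for illustration; the call leaves v unspecified so that pwr.f2.test() solves for it.

```r
library(pwr)

u  <- 6               # number of explanatory variables
r2 <- 0.20            # hypothesised R-squared
f2 <- r2 / (1 - r2)   # effect size f2 derived from R-squared

# v is left unspecified, so pwr.f2.test() solves for it.
res <- pwr.f2.test(u = u, f2 = f2, sig.level = 0.05, power = 0.8)

v <- ceiling(res$v)   # round the error degrees of freedom up
n <- v + u + 1        # required sample size
n
```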

2.1.2 Cohen’s suggested \(f^2\) values

Alternatively, Cohen (1988) [3] suggests that \(f^2\) values of 0.02, 0.15, and 0.35 represent small, medium, and large effect sizes, respectively.

These values are conveniently stored in the pwr package and retrieved using the cohen.ES() function:

## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = f2
##            size = small
##     effect.size = 0.02
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = f2
##            size = medium
##     effect.size = 0.15
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = f2
##            size = large
##     effect.size = 0.35
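
Output like the above can be produced by calling cohen.ES() once per size label. The sketch below, which assumes the pwr package is installed, collects the three values into a single named vector rather than printing each result in full.

```r
library(pwr)

# Look up Cohen's conventional f2 effect size for each size label.
sizes <- c("small", "medium", "large")
f2_values <- sapply(sizes, function(s) {
  cohen.ES(test = "f2", size = s)$effect.size
})
f2_values
```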

Using the large effect size, we have:

  • six explanatory variables,
  • \(f^2 = 0.35\) (large effect size),
  • significance level of \(\alpha = 5\%\), and
  • power of \(1 - \beta = 0.8\).

Passing these to the pwr.f2.test() function gives:

## 
##      Multiple regression power calculation 
## 
##               u = 6
##               v = 38.62994
##              f2 = 0.35
##       sig.level = 0.05
##           power = 0.8

Rounding \(v\) up to 39, the sample size is \(n = v + u + 1 = 39 + 6 + 1 = 46\). That is, to achieve a power of 80% when detecting a large effect size, a sample size of 46 is needed.
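
As a sketch of how this could be scripted, assuming the pwr package is installed, the helper below (a hypothetical function named here for illustration) repeats the calculation for each of Cohen's conventional sizes. Only the large-effect case corresponds to the worked example in this section.

```r
library(pwr)

u <- 6  # number of explanatory variables

# Hypothetical helper: required sample size for a conventional f2 size label.
n_for_size <- function(size) {
  f2 <- cohen.ES(test = "f2", size = size)$effect.size
  res <- pwr.f2.test(u = u, f2 = f2, sig.level = 0.05, power = 0.8)
  ceiling(res$v) + u + 1  # round v up, then n = v + u + 1
}

n_required <- sapply(c("small", "medium", "large"), n_for_size)
n_required
```

Comparing the three results shows how quickly the required sample size grows as the effect size of interest shrinks.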

2.2 Balanced ANOVA

In a balanced ANOVA each group has the same number of samples.

The null hypothesis is that the means of all groups are the same.

The alternative hypothesis is that the mean of at least one group differs from the others.

\(H_0: \mu_1 = \mu_2 = \dots = \mu_k\)

\(H_A: \textrm{at least one}\; \mu_i\; \textrm{is different from the others}\)

Cohen's conventional small effect size for ANOVA, \(f\), can be retrieved using the cohen.ES() function:

## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = anov
##            size = small
##     effect.size = 0.1

Using this effect size, we have:

  • \(k = 3\) groups,
  • \(f = 0.1\) (small effect size),
  • significance level of \(\alpha = 5\%\), and
  • power of \(1 - \beta = 0.8\).

Passing these to the pwr.anova.test() function gives:

## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 322.157
##               f = 0.1
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: n is number in each group

Therefore, to have 80% power to detect a small effect between groups, 323 samples are needed in each group (rounding up). That makes a total of 969 samples! Large numbers of samples are needed if you want to detect small effects reliably.
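
A minimal sketch of this calculation, assuming the pwr package is installed, is given below. The variable names are illustrative; n is left unspecified so that pwr.anova.test() solves for the per-group sample size.

```r
library(pwr)

k <- 3  # number of groups
f_small <- cohen.ES(test = "anov", size = "small")$effect.size

# n is left unspecified, so pwr.anova.test() solves for it.
res <- pwr.anova.test(k = k, f = f_small, sig.level = 0.05, power = 0.8)

n_per_group <- ceiling(res$n)   # round up to whole samples
n_total <- n_per_group * k      # balanced design: same n in every group
c(per_group = n_per_group, total = n_total)
```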

2.3 Two-sample t-Test

For a two-sample t-test where each group has the same number of samples, the null hypothesis is that the means of the two groups are the same.

The alternative hypothesis is that the mean of group 2 is larger.

\(H_0: \mu_1 = \mu_2\)

\(H_A: \mu_1 < \mu_2\)

Using pwr.t.test() with a large effect size (\(d = 0.8\)), a significance level of \(\alpha = 5\%\), and a power of \(1 - \beta = 0.8\) gives:

## 
##      Two-sample t test power calculation 
## 
##               n = 20.03277
##               d = 0.8
##       sig.level = 0.05
##           power = 0.8
##     alternative = greater
## 
## NOTE: n is number in *each* group

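Rounding up to 21 samples per group, the required total sample size is \(21 \times 2 = 42\).

For comparison, the base R function power.t.test() can be given \(n = 20\) samples per group and asked for the detectable difference: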

## 
##      Two-sample t test power calculation 
## 
##               n = 20
##           delta = 0.8006829
##              sd = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = one.sided
## 
## NOTE: n is number in *each* group

Working backwards from the sample size, we see that power.t.test() returns an effect size (delta of about 0.80 with sd = 1) similar to the \(d = 0.8\) used with the pwr function.
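
The two calculations in this section can be sketched as follows, assuming the pwr package is installed. The argument values are inferred from the printed outputs (for example, alternative = "greater" for pwr.t.test() and alternative = "one.sided" for power.t.test()), so treat them as a plausible reconstruction rather than the exact original calls.

```r
library(pwr)

# pwr: solve for n per group given a standardised effect size d.
res_pwr <- pwr.t.test(d = 0.8, sig.level = 0.05, power = 0.8,
                      type = "two.sample", alternative = "greater")
n_per_group <- ceiling(res_pwr$n)  # round up to whole samples
n_total <- 2 * n_per_group         # two groups of equal size

# Base R: fix n = 20 per group and solve for the detectable difference delta.
# With sd = 1, delta is on the same scale as Cohen's d.
res_base <- power.t.test(n = 20, sd = 1, sig.level = 0.05, power = 0.8,
                         type = "two.sample", alternative = "one.sided")

c(n_total = n_total, delta = res_base$delta)
```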


  1. pwr package

  2. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

  3. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.