McNemar's Test for Paired Data with Python

McNemar's test is a test for paired data, as in the case of 2x2 contingency tables with a dichotomous trait. The McNemar test determines whether the row and column marginal frequencies are equal, a property known as marginal homogeneity. For example, McNemar's test can be used to compare positive/negative results of two tests on the same subjects, surgery vs. non-surgery in siblings and non-siblings, and other paired designs. McNemar's test is also particularly suited to 'before and after' experiments in which each subject serves as their own control. The test was developed by Quinn McNemar in 1947.
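McNemar's test works on a 2x2 table of paired outcomes, so paired raw data must first be cross-tabulated. The sketch below assumes each subject's two results are coded as 1 (positive) or 0 (negative) in two equal-length arrays, with rows indexing the first measurement and columns the second. The helper name `paired_table` and the eight subjects are hypothetical, chosen only for illustration.

```python
import numpy as np

def paired_table(before, after):
    """Cross-tabulate paired binary outcomes into a 2x2 table (hypothetical helper).

    Row/column 0 = positive, row/column 1 = negative. Rows index the first
    measurement and columns the second, so table[0, 1] counts pairs that were
    positive before and negative after.
    """
    before = np.asarray(before, dtype=bool)
    after = np.asarray(after, dtype=bool)
    return np.array([
        [np.sum(before & after), np.sum(before & ~after)],
        [np.sum(~before & after), np.sum(~before & ~after)],
    ])

# Hypothetical before/after results for eight subjects (1 = positive).
before = [1, 1, 0, 1, 0, 0, 1, 1]
after = [1, 0, 0, 1, 1, 0, 0, 1]
table = paired_table(before, after)
print(table)
```

Only the two discordant cells, `table[0, 1]` and `table[1, 0]`, enter McNemar's statistic; the concordant pairs on the diagonal carry no information about a change.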

Consider a 2x2 contingency table with four cells, where $n_{rc}$ denotes the observed count in row $r$ and column $c$, and $p_{rc}$ denotes the corresponding cell probability in the population. The null hypothesis of marginal homogeneity states that the marginal probabilities of each outcome are the same:

$$ p_{11} + p_{12} = p_{11} + p_{21} $$

$$ p_{12} + p_{22} = p_{21} + p_{22} $$

Both equations simplify to $p_{12} = p_{21}$. Therefore the null hypothesis can be stated more simply as:

$$ H_0: p_{12} = p_{21} $$
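The simplification works because the diagonal cells cancel. In a sample, the observed margins for each outcome therefore differ by exactly $n_{12} - n_{21}$. A quick numerical check on the table used in the Python example later in this post (the variable names are illustrative):

```python
import numpy as np

# Example table used later in this post: a[r - 1, c - 1] holds n_rc.
a = np.array([[59, 6], [16, 80]])

row_margins = a.sum(axis=1)  # [n11 + n12, n21 + n22]
col_margins = a.sum(axis=0)  # [n11 + n21, n12 + n22]

# First equation: (n11 + n12) - (n11 + n21); second: (n12 + n22) - (n21 + n22).
diff_first = row_margins[0] - col_margins[0]
diff_second = col_margins[1] - row_margins[1]
print(row_margins, col_margins)  # [65 96] [75 86]
print(diff_first, diff_second, a[0, 1] - a[1, 0])  # -10 -10 -10
```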

In other words, the null hypothesis states that the two off-diagonal (discordant) cell probabilities of the 2x2 contingency table are equal, while the alternative hypothesis states that they differ. To test this hypothesis, McNemar's test statistic is computed from the observed discordant counts:

$$ \chi^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}} $$

This is also known as the asymptotic McNemar test. Under the null hypothesis, and provided the number of discordant pairs $n_{12} + n_{21}$ is large enough, the statistic $\chi^2$ approximately follows a chi-square distribution with $1$ degree of freedom.
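A minimal sketch of the asymptotic test, assuming the table is laid out as above (discordant counts at positions [0, 1] and [1, 0]) and that SciPy is available. The function name `mcnemar_asymptotic` is hypothetical:

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_asymptotic(table):
    """Asymptotic McNemar test (no continuity correction) for a 2x2 table.

    Returns the chi-square statistic and its upper-tail p-value from a
    chi-square distribution with 1 degree of freedom.
    """
    table = np.asarray(table)
    n12, n21 = table[0, 1], table[1, 0]
    if n12 + n21 == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    statistic = (n12 - n21) ** 2 / (n12 + n21)
    return statistic, chi2.sf(statistic, df=1)

stat, p = mcnemar_asymptotic([[59, 6], [16, 80]])
print(stat, p)
```

`chi2.sf` is the survival function (1 minus the CDF), which avoids the loss of precision from computing `1 - chi2.cdf(...)` for small p-values.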

A continuity correction, proposed by Edwards, can be applied to the asymptotic McNemar test. The continuity-corrected version approximates the McNemar exact conditional test described below and is defined as:

$$ z = \frac{|n_{12} - n_{21}| - 1}{\sqrt{n_{12} + n_{21}}} $$

Under the null hypothesis, $z$ is approximately standard normal; equivalently, $z^2 = \frac{(|n_{12} - n_{21}| - 1)^2}{n_{12} + n_{21}}$ can be compared against a chi-square distribution with $1$ degree of freedom.

Fagerland et al. recommend the asymptotic McNemar test in most cases. The continuity corrected version is not recommended as it has been shown to be overly conservative.
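The sketch below computes Edwards' corrected statistic, confirms that the two-sided normal p-value of $z$ matches the chi-square p-value of $z^2$, and compares it with the uncorrected p-value for the same table to show the conservativeness in a single case. The function name `mcnemar_corrected` is hypothetical; taking `abs(z)` in the normal tail is an assumption that keeps the two p-values consistent when $n_{12} = n_{21}$ makes the numerator negative.

```python
import numpy as np
from scipy.stats import chi2, norm

def mcnemar_corrected(table):
    """Edwards' continuity-corrected McNemar test for a 2x2 table.

    Returns z, its two-sided normal p-value, and the chi-square (1 df)
    p-value of z**2; the two p-values agree.
    """
    table = np.asarray(table)
    n12, n21 = table[0, 1], table[1, 0]
    z = (abs(n12 - n21) - 1) / np.sqrt(n12 + n21)
    p_normal = 2 * norm.sf(abs(z))
    p_chi2 = chi2.sf(z ** 2, df=1)
    return z, p_normal, p_chi2

table = [[59, 6], [16, 80]]
z, p_normal, p_chi2 = mcnemar_corrected(table)

# Uncorrected asymptotic p-value for the same table, for comparison.
t = np.asarray(table)
n12, n21 = t[0, 1], t[1, 0]
p_uncorrected = chi2.sf((n12 - n21) ** 2 / (n12 + n21), df=1)
print(z ** 2, p_normal, p_chi2, p_uncorrected)
```

The corrected p-value comes out larger than the uncorrected one, which is the direction Fagerland et al. describe as overly conservative.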

There also exist several variations of the original McNemar test that may perform better in specific cases.

Variations of the McNemar Test

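When the number of discordant pairs $n = n_{12} + n_{21}$ is small (small being subjective, but generally taken to be $n < 30$), an exact binomial test can be used in place of the chi-square approximation. This is known as the McNemar exact conditional test. Conditional on $n$, each discordant pair is equally likely to fall in either off-diagonal cell under $H_0$, so the discordant counts follow a $\text{Binomial}(n, \frac{1}{2})$ distribution. With $b = \max(n_{12}, n_{21})$, the one-sided test is defined as:

$$ p_{exact} = \sum^n_{i=b} \binom{n}{i} \left( \frac{1}{2} \right)^i \left( 1 - \frac{1}{2} \right)^{n - i} $$

The two-sided p-value is found by multiplying $p_{exact}$ by $2$ (capping the result at $1$).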
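A minimal sketch of the exact conditional test that evaluates the sum above directly and then cross-checks it against SciPy's exact binomial test. It assumes a SciPy version that provides `scipy.stats.binomtest` (1.7 or later); the function name `mcnemar_exact` is hypothetical.

```python
import numpy as np
from scipy.special import comb
from scipy.stats import binomtest

def mcnemar_exact(table):
    """Two-sided McNemar exact conditional p-value for a 2x2 table.

    Conditional on n = n12 + n21, the discordant counts are Binomial(n, 1/2)
    under H0. Sums the upper tail from b = max(n12, n21), doubles it, caps at 1.
    """
    table = np.asarray(table)
    n12, n21 = int(table[0, 1]), int(table[1, 0])
    n = n12 + n21
    b = max(n12, n21)
    i = np.arange(b, n + 1)
    one_sided = np.sum(comb(n, i) * 0.5 ** i * (1 - 0.5) ** (n - i))
    return min(1.0, 2 * one_sided)

table = [[59, 6], [16, 80]]
p_exact = mcnemar_exact(table)

# Cross-check: SciPy's two-sided exact binomial test with p = 0.5.
t = np.asarray(table)
p_scipy = binomtest(int(t[0, 1]), n=int(t[0, 1] + t[1, 0]), p=0.5).pvalue
print(p_exact, p_scipy)
```

Because $\text{Binomial}(n, \frac{1}{2})$ is symmetric, summing the upper tail from the larger count gives the same value as summing the lower tail up to the smaller count, which is what the Python example later in this post does.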

Fagerland et al. do not recommend the exact conditional test, as it was found to perform worst, in terms of Type I error rate and power, among the McNemar test variations they compared.

The McNemar mid-p test is calculated by subtracting half the point probability of the observed count $b = \max(n_{12}, n_{21})$, out of $n = n_{12} + n_{21}$ discordant pairs, from the one-sided $p_{exact}$ value given by the equation above. The result is then doubled to obtain the two-sided mid-p-value. Stated more formally, the McNemar mid-p test is defined as:

$$ p_{mid} = 2 \left[ \sum^n_{i=b} \binom{n}{i} \left( \frac{1}{2} \right)^i \left( 1 - \frac{1}{2} \right)^{n - i} - \frac{1}{2} \binom{n}{b} \left( \frac{1}{2} \right)^b \left( 1 - \frac{1}{2} \right)^{n - b} \right] $$

The mid-p test can also be written more simply as:

$$ p_{mid} = 2 p_{exact} - \binom{n}{b} \left( \frac{1}{2} \right)^b \left( 1 - \frac{1}{2} \right)^{n - b} $$

where $p_{exact}$ is the one-sided exact p-value.
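The simplified formula can be sketched with SciPy's binomial distribution instead of an explicit sum. This version uses the lower tail up to $k = \min(n_{12}, n_{21})$, which equals the upper tail from $b$ by symmetry of $\text{Binomial}(n, \frac{1}{2})$, and likewise $P(X = k) = P(X = b)$. The function name `mcnemar_mid_p` is hypothetical.

```python
import numpy as np
from scipy.stats import binom

def mcnemar_mid_p(table):
    """Two-sided McNemar mid-p value for a 2x2 table.

    Computes 2 * P(X <= k) - P(X = k) with X ~ Binomial(n, 1/2),
    n = n12 + n21 and k = min(n12, n21).
    """
    table = np.asarray(table)
    n12, n21 = int(table[0, 1]), int(table[1, 0])
    n = n12 + n21
    k = min(n12, n21)
    tail = binom.cdf(k, n, 0.5)  # one-sided exact p-value, P(X <= k)
    return 2 * tail - binom.pmf(k, n, 0.5)

p_mid = mcnemar_mid_p([[59, 6], [16, 80]])
print(p_mid)
```

When $n_{12} = n_{21}$, the result is exactly 1, so no cap is needed for the mid-p-value.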

According to Fagerland et al., the McNemar mid-p test performs considerably better than the McNemar exact conditional test and is a reasonable alternative to the McNemar exact unconditional test, which is substantially more complex to compute.

Example with Python

The following is an example of how to calculate and perform McNemar's test for a $2 \times 2$ contingency table with Python. To begin, we import the libraries that we will leverage throughout the example.

In [1]:
import numpy as np
from scipy.special import comb
from scipy.stats import chi2, binom

We represent the $2 \times 2$ contingency table as a numpy array with the following values.

In [2]:
a = np.array([[59, 6], [16, 80]])
a
array([[59,  6],
       [16, 80]])

To calculate McNemar's test statistic, we need the discordant cells $n_{12}$ and $n_{21}$ of the contingency table. Because numpy indexing starts at 0, these are the elements a[0, 1] and a[1, 0]. We begin with the continuity-corrected statistic.

In [3]:
# Continuity-corrected (Edwards) McNemar statistic: (|n12 - n21| - 1)^2 / (n12 + n21)
x2_statistic = (np.absolute(a[0, 1] - a[1, 0]) - 1) ** 2 / (a[0, 1] + a[1, 0])

This gives a continuity-corrected McNemar statistic of $(|6 - 16| - 1)^2 / 22 = 81/22 \approx 3.68$. With the test statistic known, we can calculate the associated p-value. Because the statistic approximately follows a chi-square distribution, we use the survival function of scipy's chi2 distribution with one degree of freedom.

In [4]:
p_value = chi2.sf(x2_statistic, 1)

The p-value is just above the $0.05$ significance level, so we fail to reject the null hypothesis $H_0$ at that level. However, we can also compute the exact p-value and the mid-p-value to see whether they provide further evidence for rejecting or retaining the null hypothesis.

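In [5]:
# Under H0, the discordant counts are Binomial(n, 0.5) conditional on n = n12 + n21
i = min(a[0, 1], a[1, 0])  # smaller discordant count
n = a[1, 0] + a[0, 1]
i_n = np.arange(i + 1, n + 1)

# P(X <= i) = 1 - P(X > i), doubled for the two-sided exact p-value
p_value_exact = 1 - np.sum(comb(n, i_n) * 0.5 ** i_n * (1 - 0.5) ** (n - i_n))
p_value_exact *= 2

# Mid-p: subtract the point probability P(X = i) from the two-sided exact p-value
mid_p_value = p_value_exact - binom.pmf(i, n, 0.5)
p_value_exact = min(p_value_exact, 1.0)  # a two-sided p-value cannot exceed 1

print('p-value Exact: ', p_value_exact)
print('Mid p-value: ', mid_p_value)
p-value Exact:  0.052478790283203125
Mid p-value:  0.034689664840698256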

The mid-p-value is below the $0.05$ significance level, while the exact p-value is just slightly above it. The conclusion is therefore borderline: whether there is enough evidence to reject the null hypothesis depends on which version of the test is used.

Asymptotic McNemar's Test (Non-Continuity Corrected)

As noted above, Fagerland et al. find that the continuity-corrected version of the McNemar test can be overly conservative, so the uncorrected test, also known as the asymptotic McNemar test, is often recommended instead. The exact and mid-p-values do not involve the continuity correction and are therefore unchanged; we only need to recalculate the test statistic and its associated p-value.

In [6]:
# Asymptotic McNemar statistic without continuity correction: (n12 - n21)^2 / (n12 + n21)
x2_statistic2 = (a[0, 1] - a[1, 0]) ** 2 / (a[0, 1] + a[1, 0])

p_value = chi2.sf(x2_statistic2, 1)

print('Asymptotic McNemar test statistic: ', x2_statistic2)
print('Asymptotic test p-value: ', p_value)
Asymptotic McNemar test statistic:  4.545454545454546
Asymptotic test p-value:  0.0330062576612325

The asymptotic version of the test gives a p-value of $0.033$, which is below the significance level of $0.05$. Therefore, the asymptotic test gives us evidence that the null hypothesis should be rejected in favor of the alternative.
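The four variants discussed in this post can be collected into one helper so their conclusions sit side by side. This is a sketch, assuming the same table layout as above and at least one discordant pair; the function name `mcnemar_summary` and the fixed 0.05 threshold are illustrative choices, not part of any library.

```python
import numpy as np
from scipy.stats import binom, chi2

def mcnemar_summary(table):
    """Return the asymptotic, continuity-corrected, exact and mid-p McNemar p-values."""
    table = np.asarray(table)
    n12, n21 = int(table[0, 1]), int(table[1, 0])
    n = n12 + n21
    if n == 0:
        raise ValueError("no discordant pairs; McNemar's test is undefined")
    k = min(n12, n21)
    tail = binom.cdf(k, n, 0.5)  # one-sided exact p-value

    return {
        "asymptotic": chi2.sf((n12 - n21) ** 2 / n, df=1),
        "corrected": chi2.sf((abs(n12 - n21) - 1) ** 2 / n, df=1),
        "exact": min(1.0, 2 * tail),
        "mid_p": 2 * tail - binom.pmf(k, n, 0.5),
    }

results = mcnemar_summary([[59, 6], [16, 80]])
for name, p in results.items():
    verdict = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{name:>10}: p = {p:.4f} ({verdict})")
```

For the example table, the asymptotic and mid-p tests fall below 0.05 while the continuity-corrected and exact conditional tests do not, matching the results computed step by step above.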


Edwards, A. L. (1948). Note on the “correction for continuity” in testing the significance of the difference between correlated proportions. Psychometrika, 13(3), 185–187.

Fagerland, M. W., Lydersen, S., & Laake, P. (2013). The McNemar test for binary matched-pairs data: Mid-p and asymptotic are better than exact conditional. Retrieved April 14, 2018, from

Gibbons, J. D., & Chakraborti, S. (2010). Nonparametric statistical inference. London: Chapman & Hall.

Wikipedia contributors. (2018, April 29). McNemar's test. In Wikipedia, The Free Encyclopedia. Retrieved 12:24, August 15, 2018, from
