Van der Waerden's Normal Scores Test

The Van der Waerden test is a non-parametric test for testing the hypothesis that $k$ sample distribution functions are equal. Van der Waerden's test is similar to the Kruskal-Wallis one-way analysis of variance test in that it converts the data to ranks and then to standard normal distribution quantiles. The ranked data is known as the 'normal scores'. Hence, the Van der Waerden test is sometimes referred to as a 'normal scores test'.

The benefit of Van der Waerden's test is that it performs well compared to ANOVA (analysis of variance) when the group population samples are normally distributed and the Kruskal-Wallis test when the samples are not normally distributed.

The null and alternative hypotheses of the Van der Waerden test can be generally stated as follows:

  • $H_0$: All of the $k$ population distribution functions are equal.
  • $H_A$: At least one of the $k$ population distribution functions are not equal and tend to yield larger observations to the other distribution functions.

Test Procedure

Let $n_j$, be the number of samples for each of the $k$ groups where $j$ is the $j$-th group. $N$ is the number of total samples in all groups, while $X_{ij}$ is the $i$-th value of the $j$-th group. The normal scores used in the Van der Waerden test are calculated as:

$$ A_{ij} = \Phi^{-1} \left( \frac{R \left( X_{ij} \right)}{N + 1} \right) $$

where $R(X_{ij})$ and $\Phi^{-1}$ are the ranks of the $X_{ij}$ observation and the quantile function (percent point function) of the normal distribution, respectively. The average normal scores can then be calculated as:

$$ \bar{A}_j = \frac{1}{n_j} \sum^{n_j}_{i=1} A_{ij} \qquad j = 1, 2, \cdots, k $$

The variance $s^2$ of the normal scores is defined as:

$$ s^2 = \frac{1}{N - 1} \sum^k_{i=1} \sum^{n_i}_{j=1} A^2_{ij} $$

The Van der Waerden test statistic, $T_1$, is defined as:

$$ T_1 = \frac{1}{s^2} \sum^k_{i=1} n_i \bar{A}_i^2 $$

As the test is approximate to a chi-square distribution, the critical region for a significance level $\alpha$ is:

$$ T_1 = \chi^2_{\alpha, k-1} $$

When the null hypothesis is rejected (the p-value is within the critical region) and at least one of the sample distribution functions differs, a post-hoc multiple comparions test can be performed to get a better sense of which populations differ from the others. Two sample populations, $j_1$ and $j_2$, tend to be different if the following is true:

$$ | \bar{A}_{j_1} - \bar{A}_{j_2} | > s \space t_{1-\frac{\alpha}{2}} \sqrt{\frac{N-1-T_1}{N-k}} \sqrt{\frac{1}{n_{j_1}} + \frac{1}{n_{j_2}}} $$

Van der Waerden's Test in Python

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import rankdata, norm, chi2, t
import numpy_indexed as npi
from itertools import combinations

The PlantGrowth dataset is available in R as part of its standard datasets and can also be downloaded here. After downloading the data, we load it into memory with pandas' read_csv function. Once the data is loaded, we transform the resulting DataFrame into a numpy array with the .to_numpy method. The first three rows of the dataset are then printed to get a sense of what the data contains.

In [2]:
plants = pd.read_csv('../../data/PlantGrowth.csv')
plants = plants.to_numpy()
array([[1, 4.17, 'ctrl'],
       [2, 5.58, 'ctrl'],
       [3, 5.18, 'ctrl']], dtype=object)

As the dataset description stated, there are two columns (three including the index column), one containing the plant weight of the sample and the sample in which the group belongs. There are three sample groups in the dataset, which we can confirm using numpy's unique function.

In [3]:
['ctrl', 'trt1', 'trt2']

With the data loaded and inspected, we are ready to proceed with creating Van der Waerden's test! As the test employs ranks of the observations rather than the observed values, we first rank the plant weight column using scipy's rankdata function. The returned ranked array is then merged back into our original array with numpy's column_stack. Similar to before, we then print the first five rows of the new data array to confirm our operations were successful.

In [4]:
ranks = rankdata(plants[:, 1], 'average')
plants = np.column_stack([plants, ranks])
array([[1, 4.17, 'ctrl', 3.5],
       [2, 5.58, 'ctrl', 24.0],
       [3, 5.18, 'ctrl', 17.0],
       [4, 6.11, 'ctrl', 28.0],
       [5, 4.5, 'ctrl', 7.0]], dtype=object)

Now that the sampled observations are ranked, we can calculate the normal scores, denoted as $A_{ij}$. We first find the number of total samples, denoted as $n$, using the .shape method of the numpy array and the number of groups, denoted as $k$. The normal scores for each ranked observation are then computed by employing the normal distribution quantile function (also known as the point percent function) from scipy.stats.norm. As before, the computed array is then combined with the plants array with column_stack.

In [5]:
n = plants.shape[0]
k = len(np.unique(plants[:,2]))

aij = norm.ppf(list(plants[:, 3] / (n + 1)))
plants = np.column_stack([plants, aij])

The calculated normal scores can now be used to find the sample group average scores, denoted $\bar{A}_j$, where $j = 1, 2, \cdots, k$, and the total score variance, $s^2$. The numpy_indexed package is handy for grouping numpy arrays. Using the group_by function in the numpy_indexed library, we can find the average normal scores of each group.

In [6]:
avg_scores = npi.group_by(plants[:, 2], plants[:, 4], np.mean)
score_variance = np.sum(plants[:, 4] ** 2) / (n - 1)

print('Average Scores:', avg_scores)
print('Score Variance:', score_variance)
Average Scores: [('ctrl', -0.061896845217548804), ('trt1', -0.5431352145948504), ('trt2', 0.6058987377889883)]
Score Variance: 0.8402744001083048

After obtaining the average normal scores of each group and the score variance, we can compute the $T_1$ Van der Waerden test statistic and the associated p-value. The test statistic is approximated by a chi-square distribution. Therefore we use the scipy chi2 variable for finding the p-value.

In [7]:
average_scores = np.array([i for _, i in avg_scores])
group_obs = np.array([i for _, i in npi.group_by(plants[:, 2], plants[:, 2], len)])
t1 = np.sum(group_obs * average_scores ** 2) / score_variance

p_value = chi2.sf(t1, k - 1)

print('Test Statistic:', t1)
print('p-value:' , p_value)
Test Statistic: 7.925272519897477
p-value: 0.019012925151783353

The reported p-value is below $0.05$. Thus we reject the null hypothesis $H_0$ that the sample population distributions are equal. When the null hypothesis is rejected, a post-hoc multiple comparisons test can be employed to compare each group sample to the others simultaneously to see which pairs of group populations differ. The following code block gets all possible combinations of the groups using the combinations function from Python's standard itertools library. The group combinations are then used to construct a pandas DataFrame where we then compute the multiple comparisons inequality described above.

In [8]:
sample_sizes = 1 / np.array(list(combinations(group_obs, 2)))[:, 0] + \
               1 / np.array(list(combinations(group_obs, 2)))[:, 1]

group_names = np.unique(plants[:, 2])

groups = pd.DataFrame(np.array(list(combinations(group_names, 2))))

groups['groups'] = groups[0] + ' - ' + groups[1]
groups['score'] = average_scores

average_score_differences = np.abs(np.array(list(combinations(average_scores, 2)))[:, 0] - \
                            np.array(list(combinations(average_scores, 2)))[:, 1])

groups['difference'] = average_score_differences > np.sqrt(score_variance) * \
                       t.ppf(1 - 0.05 / 2, n - k) * \
                       np.sqrt((n - 1 - t1) / (n - k)) * np.sqrt(sample_sizes)

del groups[0]
del groups[1]

groups score difference
0 ctrl - trt1 -0.061897 False
1 ctrl - trt2 -0.543135 False
2 trt1 - trt2 0.605899 True

The multiple comparisons test shows us the two treatment groups, trt1 and trt2, differ from one another. Still, all other group combinations tend to not differ.


Related Posts