The Van der Waerden test is a non-parametric test of the hypothesis that $k$ sample distribution functions are equal. Van der Waerden's test is similar to the Kruskal-Wallis one-way analysis of variance test in that it converts the data to ranks; the ranks are then transformed into standard normal distribution quantiles, known as 'normal scores'. Hence, the Van der Waerden test is sometimes referred to as a 'normal scores test'.

The benefit of Van der Waerden's test is that it performs nearly as well as ANOVA (analysis of variance) when the group populations are normally distributed, and nearly as well as the Kruskal-Wallis test when they are not.

The null and alternative hypotheses of the Van der Waerden test can be generally stated as follows:

- $H_0$: All of the $k$ population distribution functions are equal.
- $H_A$: At least one of the $k$ population distribution functions is not equal to the others and tends to yield larger observations than the other distribution functions.

### Test Procedure

Let $n_j$ be the number of samples in the $j$-th of the $k$ groups, let $N$ be the total number of samples across all groups, and let $X_{ij}$ be the $i$-th value of the $j$-th group. The normal scores used in the Van der Waerden test are calculated as:

$$ A_{ij} = \Phi^{-1} \left( \frac{R \left( X_{ij} \right)}{N + 1} \right) $$

where $R(X_{ij})$ is the rank of the observation $X_{ij}$ and $\Phi^{-1}$ is the quantile function (percent point function) of the standard normal distribution. The average normal scores can then be calculated as:

$$ \bar{A}_j = \frac{1}{n_j} \sum^{n_j}_{i=1} A_{ij} \qquad j = 1, 2, \cdots, k $$

The variance $s^2$ of the normal scores is defined as:

$$ s^2 = \frac{1}{N - 1} \sum^k_{j=1} \sum^{n_j}_{i=1} A^2_{ij} $$

The Van der Waerden test statistic, $T_1$, is defined as:

$$ T_1 = \frac{1}{s^2} \sum^k_{j=1} n_j \bar{A}_j^2 $$

Under the null hypothesis, $T_1$ approximately follows a chi-square distribution with $k - 1$ degrees of freedom, so the critical region for a significance level $\alpha$ is:

$$ T_1 > \chi^2_{\alpha, k-1} $$

When the null hypothesis is rejected (the test statistic falls within the critical region) and at least one of the sample distribution functions therefore differs, a post-hoc multiple comparisons test can be performed to get a better sense of which populations differ from the others. Two sample populations, $j_1$ and $j_2$, tend to be different if the following inequality holds:

$$ | \bar{A}_{j_1} - \bar{A}_{j_2} | > s \, t_{1-\frac{\alpha}{2}, N-k} \sqrt{\frac{N-1-T_1}{N-k}} \sqrt{\frac{1}{n_{j_1}} + \frac{1}{n_{j_2}}} $$

where $t_{1-\frac{\alpha}{2}, N-k}$ is a quantile of the $t$-distribution with $N - k$ degrees of freedom.

### Van der Waerden's Test in Python

```
import numpy as np
import pandas as pd
from scipy.stats import rankdata, norm, chi2, t
import numpy_indexed as npi
from itertools import combinations
```
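Before working through a real dataset, the normal-scores transformation itself can be illustrated on a small made-up sample. The five values below are hypothetical and chosen only for the illustration:

```python
import numpy as np
from scipy.stats import rankdata, norm

# A hypothetical sample of five observations
x = np.array([4.2, 5.1, 3.8, 5.6, 4.9])
N = len(x)

# Rank the observations, then map the ranks to standard normal quantiles
ranks = rankdata(x)
scores = norm.ppf(ranks / (N + 1))
```

Because the ranks are evenly spaced over $1, \ldots, N$, the resulting scores are symmetric about zero.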

The `PlantGrowth` dataset is available in R as part of its standard datasets and can also be downloaded here. After downloading the data, we load it into memory with pandas' `read_csv` function. Once the data is loaded, we transform the resulting `DataFrame` into a `numpy` array with the `.to_numpy` method. The first three rows of the dataset are then printed to get a sense of what the data contains.

```
plants = pd.read_csv('../../data/PlantGrowth.csv')
plants = plants.to_numpy()
plants[:3]
```

As the dataset description stated, there are two columns (three including the index column): one containing the plant weight of the sample and one identifying the group to which the sample belongs. There are three sample groups in the dataset, which we can confirm using numpy's `unique` function.

```
list(np.unique(plants[:,2]))
```

With the data loaded and inspected, we are ready to perform Van der Waerden's test! As the test employs the ranks of the observations rather than the observed values, we first rank the plant weight column using scipy's `rankdata` function. The returned array of ranks is then merged back into our original array with numpy's `column_stack`. Similar to before, we then print the first five rows of the new data array to confirm our operations were successful.

```
ranks = rankdata(plants[:, 1], 'average')
plants = np.column_stack([plants, ranks])
plants[:5]
```

Now that the sampled observations are ranked, we can calculate the normal scores, denoted $A_{ij}$. We first find the total number of samples, denoted `n` in the code (the $N$ of the formulas above), from the `.shape` attribute of the numpy array, along with the number of groups, denoted $k$. The normal scores for each ranked observation are then computed with the normal distribution quantile function (also known as the percent point function) from `scipy.stats.norm`. As before, the computed array is combined with the `plants` array using `column_stack`.

```
n = plants.shape[0]
k = len(np.unique(plants[:,2]))
aij = norm.ppf(list(plants[:, 3] / (n + 1)))
plants = np.column_stack([plants, aij])
```

The calculated normal scores can now be used to find the sample group average scores, denoted $\bar{A}_j$, where $j = 1, 2, \cdots, k$, and the total score variance, $s^2$. The `numpy_indexed` package is handy for grouping `numpy` arrays. Using the `group_by` function in the `numpy_indexed` library, we can find the average normal scores of each group.

```
avg_scores = npi.group_by(plants[:, 2], plants[:, 4], np.mean)
score_variance = np.sum(plants[:, 4] ** 2) / (n - 1)
print('Average Scores:', avg_scores)
print('Score Variance:', score_variance)
```
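`numpy_indexed` is a third-party dependency; if it is not available, the same per-group means can be computed with plain numpy. A minimal sketch using hypothetical stand-in arrays for the group-label and normal-score columns:

```python
import numpy as np

# Hypothetical stand-ins for the group and normal-score columns
groups = np.array(['ctrl', 'ctrl', 'trt1', 'trt1', 'trt2', 'trt2'])
scores = np.array([-0.5, 0.5, -1.0, 1.2, 0.3, -0.4])

# Mean normal score per group, in sorted label order
labels = np.unique(groups)
group_means = np.array([scores[groups == g].mean() for g in labels])
```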

After obtaining the average normal scores of each group and the score variance, we can compute the $T_1$ Van der Waerden test statistic and the associated p-value. The test statistic is approximately chi-square distributed with $k - 1$ degrees of freedom, so we use scipy's `chi2` distribution to find the p-value.

```
average_scores = np.array([i for _, i in avg_scores])
group_obs = np.array([i for _, i in npi.group_by(plants[:, 2], plants[:, 2], len)])
t1 = np.sum(group_obs * average_scores ** 2) / score_variance
p_value = chi2.sf(t1, k - 1)
print('Test Statistic:', t1)
print('p-value:' , p_value)
```

The reported p-value is below $0.05$. Thus we reject the null hypothesis $H_0$ that the sample population distributions are equal. When the null hypothesis is rejected, a post-hoc multiple comparisons test can be employed to compare each group sample to the others simultaneously to see which pairs of group populations differ. The following code block gets all possible pairs of the groups using the `combinations` function from Python's standard `itertools` library. The group pairs are then used to construct a pandas `DataFrame` in which we compute the multiple comparisons inequality described above.

```
# Pairwise combinations of the group sample sizes
pair_sizes = np.array(list(combinations(group_obs, 2)))
sample_sizes = 1 / pair_sizes[:, 0] + 1 / pair_sizes[:, 1]

group_names = np.unique(plants[:, 2])
groups = pd.DataFrame(list(combinations(group_names, 2)), columns=['group1', 'group2'])
groups['groups'] = groups['group1'] + ' - ' + groups['group2']

# Absolute differences of the pairwise average normal scores
pair_scores = np.array(list(combinations(average_scores, 2)))
groups['score'] = np.abs(pair_scores[:, 0] - pair_scores[:, 1])

# Right-hand side of the multiple comparisons inequality
critical = np.sqrt(score_variance) * t.ppf(1 - 0.05 / 2, n - k) * \
    np.sqrt((n - 1 - t1) / (n - k)) * np.sqrt(sample_sizes)
groups['difference'] = groups['score'] > critical

groups[['groups', 'score', 'difference']]
```

The multiple comparisons test shows that the two treatment groups, `trt1` and `trt2`, differ from one another, while all other group pairs do not appear to differ.