Bartlett's test, developed by Maurice Stevenson Bartlett, is a statistical procedure for testing if $k$ population samples have equal variances. Equality of variances in population samples is assumed in commonly used comparison of means tests, such as Student's t-test and analysis of variance. Therefore, a procedure such as Bartlett's test can be conducted to accept or reject the assumption of equal variances across group samples.

Levene's test is an alternative to Barlett's test and is less sensitive to non-normal samples. Thus, Levene's test is generally preferred in most cases, particularly when the underlying distribution of the samples is known. Equality of variances is also known as homoscedasticity or homogeneity of variances.

Bartlett's test statistic, denoted $\chi^2$ (also sometimes denoted as $T$), is approximately chi-square distributed with $k - 1$ degrees of freedom, where $k$ is the number of sample groups. The chi-square approximation does not hold sufficiently when the sample size of a group is $n_i > 5$. The test statistic is defined as:

$$ \chi^2 = \frac{(n - k) \ln(S^2_p) - \sum^k_{i=1} (n_i - 1) \ln(S^2_i)}{1 + \frac{1}{3(k - 1)} \left(\sum^k_{i=1} (\frac{1}{n_i - 1}) - \frac{1}{n - k} \right)} $$where,

- $n$ is the total number of samples across all groups
- $k$ is the number of groups
- $S^2_i$ are the sample variances.

$S^2_p$, the pooled estimate of the samples' variance, is defined as:

$$ S^2_p = \frac{1}{n - k} \sum_i (n_i - 1) S^2_i $$### Bartlett's Test in Python¶

```
import numpy as np
import pandas as pd
from scipy.stats import chi2
import numpy_indexed as npi
```

The `PlantGrowth`

dataset is available in R as part of its standard datasets and can also be downloaded here. After downloading the data, we load it into memory with pandas' `read_csv`

function. Once the data is loaded, we transform the resulting `DataFrame`

into a `numpy`

array with the `.to_numpy`

method. The first three rows of the dataset are then printed to get a sense of what the data contains. We also confirm there are indeed three sample groups in the data using numpy's `unique`

function.

```
plants = pd.read_csv('../../data/PlantGrowth.csv')
plants = plants.to_numpy()
print(plants[:3])
print(list(np.unique(plants[:,2])))
```

Implementing the test procedure in Python is comparatively straightforward. First, we require the total number of samples, $n$, and the number of sample groups, $k$. The number of samples can be found by indexing the first value returned from the `plants`

array `.shape`

method, while the number of unique groups is obtained by taking the length of the array returned by the `unique`

function. We also require the total number of samples within each group and the group's variance, which we compute by grouping the `plants`

array by `len`

and `numpy`

's `var`

functions. The `numpy-indexed`

library and its `group_by`

function are convenient for performing this grouping operation on `numpy`

arrays.

```
n = plants.shape[0]
k = len(np.unique(plants[:,2]))
group_n = np.array([i for _, i in npi.group_by(plants[:, 2], plants[:, 1], len)])
group_variance = np.array([i for _, i in npi.group_by(plants[:, 2], plants[:, 1], np.var)])
```

The sample pooled variance, $S^2_p$, is then computed.

```
pool_var = 1 / (n - k) * np.sum((group_n - 1) * group_variance)
```

As the Bartlett test statistic equation is rather hefty, we split the computation into two variables, the numerator and denominator of the test statistic, to help keep the code easier to read. The numerator and denominator are then divided, which returns the computed test statistic.

```
x2_num = (n - k) * np.log(pool_var) - np.sum((group_n - 1) * np.log(group_variance))
x2_den = 1 + 1 / (3 * (k - 1)) * (np.sum(1 / (group_n - 1)) - 1 / (n - k))
x2 = x2_num / x2_den
x2
```

The Bartlett test statistic, $\chi^2$, is approximately $2.279$. To find the associated p-value of the test statistic, we using the `.cdf`

method from `scipy.stats`

's `chi2`

variable with $k - 1$ degrees of freedom.

```
p = 1 - chi2.cdf(x2, k - 1)
p
```

The reported p-value is over $0.05$. Therefore we accept the null hypothesis that the group variances are equal.