
Aaron Schlegel's Notebook of Interesting Things

Median Test

The median test, sometimes referred to as Mood's median test, is a nonparametric procedure for investigating whether the medians of the populations from which $k$ sample groups are drawn are equal. The test is a particular case of the chi-square test of independence.
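
As a rough illustration of how the test reduces to a chi-square computation, the sketch below builds the 2 x $k$ table of counts above and at-or-below the grand median and computes the Pearson chi-square statistic. The function name `median_test_statistic` is hypothetical, ties with the grand median are counted as "below" by assumption, and no continuity correction is applied, so library implementations may report a different value for two groups.

```python
from statistics import median


def median_test_statistic(*groups):
    """Return (chi2, dof, table) for a basic Mood's median test sketch."""
    pooled = [x for g in groups for x in g]
    grand_median = median(pooled)

    # Row 0: counts strictly above the grand median.
    # Row 1: counts at or below it (assumed handling of ties).
    above = [sum(x > grand_median for x in g) for g in groups]
    below = [len(g) - a for g, a in zip(groups, above)]

    n = len(pooled)
    row_totals = (sum(above), sum(below))

    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for observed_row, row_total in zip((above, below), row_totals):
        for observed, g in zip(observed_row, groups):
            expected = row_total * len(g) / n
            chi2 += (observed - expected) ** 2 / expected

    return chi2, len(groups) - 1, (above, below)


chi2, dof, table = median_test_statistic([1, 2, 3, 4], [5, 6, 7, 8])
print(chi2, dof, table)
```

The statistic would then be compared against a chi-square distribution with $k - 1$ degrees of freedom.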

Read more

Tagged as : Statistics Python
Chi-Square Test of Independence for R x C Contingency Tables

The chi-square test is often used to assess the significance (if any) of the differences among $k$ different

Read more

Tagged as : Statistics Python
Wald-Wolfowitz Two-Sample Runs Test

The Wald-Wolfowitz runs test is used to test the hypothesis that two independent samples have been drawn from the

Read more

Tagged as : Python statistics
Games-Howell Post-Hoc Multiple Comparisons Test with Python

The Games-Howell test is a nonparametric post hoc analysis approach for performing multiple comparisons between two or more sample populations. It is similar to Tukey's post hoc test but, unlike Tukey's test, does not assume homogeneity of variances or equal sample sizes. Thus, the Games-Howell test can be applied in settings where the assumptions of Tukey's test do not hold. The two tests will often report similar results when the data have equal variances and equal sample sizes.

Read more

Tagged as : Python statistics
Bartlett's Test for Equality of Variances with Python

Bartlett's test, developed by Maurice Stevenson Bartlett, is a statistical procedure for testing if $k$ population samples have equal variances. Equality of variances in population samples is assumed in commonly used comparison of means tests, such as Student's t-test and analysis of variance. Therefore, a procedure such as Bartlett's test can be conducted to accept or reject the assumption of equal variances across group samples.

Read more

Tagged as : Python statistics
Levene's Test for Equality of Variances with Python

Levene's test is a statistical procedure for testing equality of variances (also called homoscedasticity or homogeneity of variances) between two or more sample populations. Several commonly used statistical routines, such as the t-test and analysis of variance, assume the populations have equal variances. Therefore, Levene's test is often employed to check this assumption before performing these tests.

Read more

Tagged as : Python statistics
Van der Waerden's Normal Scores Test

The Van der Waerden test is a non-parametric test of the hypothesis that $k$ sample distribution functions are equal. Van der Waerden's test is similar to the Kruskal-Wallis one-way analysis of variance test in that it converts the data to ranks, but it then converts those ranks to standard normal distribution quantiles. The resulting quantiles are known as the 'normal scores'. Hence, the Van der Waerden test is sometimes referred to as a 'normal scores test'.
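
The rank-to-quantile step can be sketched with the standard library alone. This is a minimal illustration, assuming the commonly used score $\Phi^{-1}(R_i / (N + 1))$ for the pooled rank $R_i$ of each of $N$ observations and average ranks for ties; the function name `normal_scores` is hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Convert pooled observations to normal scores via average ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])

    # Assign 1-based ranks, averaging over runs of tied values.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Map each rank to the standard normal quantile at rank / (n + 1).
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


print(normal_scores([10, 30, 20]))
```

The smallest and largest observations receive scores of equal magnitude and opposite sign, and the middle observation of an odd-sized sample receives a score of zero.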

Read more

Tagged as : Python statistics
McNemar's Test for Paired Data with Python

McNemar's test is a test for paired data, as in the case of 2x2 contingency tables with a dichotomous trait. The

Read more

Tagged as : statistics Python
Tukey's Test for Post-Hoc Analysis

After a multivariate test, it is often desired to know more about the specific groups to find out if they are significantly different or similar. This step after analysis is referred to as 'post-hoc analysis' and is a major step in hypothesis testing. One common and popular method of post-hoc analysis is Tukey's Test. The test is known by several different names. Tukey's test compares the means of all treatments to the mean of every other treatment and is considered the best available method in cases when confidence intervals are desired or if sample sizes are unequal.

Read more

Tagged as : R statistics
Kruskal-Wallis One-Way Analysis of Variance of Ranks

The Kruskal-Wallis test extends the Mann-Whitney-Wilcoxon rank sum test to more than two groups. Like the Mann-Whitney test, it is nonparametric: it does not assume the data are normally distributed and can therefore be used when the assumption of normality is violated. This example will employ the Kruskal-Wallis test on the PlantGrowth dataset as used in previous examples. Although the data appear to be approximately normally distributed as seen before, the Kruskal-Wallis test performs just as well as a parametric test.

Read more

Tagged as : R statistics
Calculating and Performing One-way Multivariate Analysis of Variance (MANOVA)

MANOVA, or Multivariate Analysis of Variance, is an extension of Analysis of Variance (ANOVA) to several dependent variables. The approach to MANOVA is similar to ANOVA in many regards and requires the same assumptions (normally distributed dependent variables with equal covariance matrices).

Read more

Tagged as : R statistics
Calculating and Performing One-way Analysis of Variance (ANOVA)

ANOVA, or Analysis of Variance, is a commonly used approach to testing a hypothesis when dealing with two or more groups. One-way ANOVA, which is what will be explored in this post, can be considered an extension of the t-test when more than two groups are being tested. The factor, or categorical variable, is often referred to as the 'treatment' in the ANOVA setting. ANOVA involves partitioning the data's total variation into variation between and within groups. This procedure is thus known as Analysis of Variance as sources of variation are examined separately.
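
The partition of total variation into between-group and within-group parts can be checked numerically. The following is a minimal Python sketch with made-up data; the function name `anova_partition` is hypothetical and the example is not taken from the post itself.

```python
from statistics import mean


def anova_partition(groups):
    """Return (ss_total, ss_between, ss_within, f_stat) for one-way ANOVA."""
    pooled = [x for g in groups for x in g]
    grand_mean = mean(pooled)

    # Total variation around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in pooled)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    k, n = len(groups), len(pooled)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_total, ss_between, ss_within, f_stat


ss_total, ss_between, ss_within, f_stat = anova_partition(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
)
print(ss_total, ss_between, ss_within, f_stat)
```

The between and within sums of squares add up to the total, and the F statistic is the ratio of their mean squares with $k - 1$ and $n - k$ degrees of freedom.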

Read more

Tagged as : R statistics
Computing Working-Hotelling and Bonferroni Simultaneous Confidence Intervals

There are two procedures for forming simultaneous confidence intervals: the Working-Hotelling and Bonferroni procedures. Each estimates intervals of the mean response using a family confidence coefficient. The Working-Hotelling coefficient is denoted by $W$ and the Bonferroni coefficient by $B$. In practice, it is recommended to perform both procedures to determine which results in a tighter interval. The Bonferroni method will be explored first.
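
For reference, the two multipliers are commonly defined as follows, assuming a simple linear regression fit to $n$ observations, $g$ simultaneous intervals, and family confidence level $1 - \alpha$ (the regression setting and the symbols $n$, $g$, and $\alpha$ are assumptions, not stated in the excerpt):

$$W^2 = 2F(1 - \alpha;\ 2,\ n - 2), \qquad B = t\left(1 - \frac{\alpha}{2g};\ n - 2\right)$$

Under these assumptions, each interval takes the form $\hat{Y}_h \pm W s\{\hat{Y}_h\}$ or $\hat{Y}_h \pm B s\{\hat{Y}_h\}$, so the procedure with the smaller multiplier gives the tighter interval.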

Read more

Tagged as : R statistics
Predicting Cat Genders with Logistic Regression

Consider a data set of 144 observations of household cats. The data contains the cats' gender, body weight and height. Can we model and accurately predict the gender of a cat based on previously observed values using logistic regression?

Read more

Tagged as : R statistics
Hierarchical Clustering Nearest Neighbors Algorithm in R

Hierarchical clustering is a widely used and popular tool in statistics

Read more

Tagged as : R clustering statistics
Factor Analysis with the Principal Component Method and R Part Two

In the first post on factor analysis, we examined computing the estimated covariance matrix $S$ of the rootstock data and proceeded to find two factors that fit most of the variance of the data. However, the variables in the data are not on the same scale of measurement, which can cause variables with comparatively large variances to dominate the diagonal of the covariance matrix and the resulting factors. The correlation matrix, therefore, makes more intuitive sense to employ in factor analysis.

Read more

Tagged as : R statistics factor analysis linear algebra
Factor Analysis with the Principal Component Method and R

The goal of factor analysis, similar to principal component analysis, is to reduce the original variables into a smaller number of factors that allows for easier interpretation. PCA and factor analysis still differ in several respects. One difference is that principal components are defined as linear combinations of the variables, while in factor analysis the variables are modeled as linear combinations of the underlying latent factors.

Read more

Tagged as : R statistics factor analysis linear algebra
