The median test, sometimes referred to as Mood's median test, is a nonparametric procedure for investigating whether the medians of the populations from which \(k\) sample groups are drawn are equal. The test is a particular case of the chi-square test of independence.
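As a rough illustration of the procedure described above, the sketch below uses SciPy's `median_test` function. The three groups are made-up values chosen only for demonstration and are not taken from the article.

```python
from scipy.stats import median_test

# Illustrative (made-up) samples for k = 3 groups.
g1 = [10, 14, 14, 18, 20, 22, 24, 25, 31, 31, 32, 39, 43, 43, 48, 49]
g2 = [28, 30, 31, 33, 34, 35, 36, 40, 44, 55, 57, 61, 91, 92, 99]
g3 = [0, 3, 9, 22, 23, 25, 25, 33, 34, 34, 40, 45, 46, 48, 62, 67, 84]

# median_test pools the samples, finds the grand median, builds a
# 2 x k table of counts above / not above that median, and runs a
# chi-square test on the table.
stat, p_value, grand_median, table = median_test(g1, g2, g3)

print("chi-square statistic:", stat)
print("p-value:", p_value)
print("grand median:", grand_median)
print("contingency table:\n", table)
```

The returned 2 x \(k\) table is the contingency table on which the chi-square test is performed, which is why the median test can be viewed as a special case of that test.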
Articles by Aaron Schlegel
The chi-square test is often used to assess the significance (if any) of the differences among \(k\) different
The Wald-Wolfowitz runs test is used to test the hypothesis that two independent samples have been drawn from the
Matrix norms are an extension of vector norms to matrices and are used to define a measure of distance on a space of matrices. The most commonly occurring matrix norms in matrix analysis are the Frobenius, \(L_1\), \(L_2\), and \(L_\infty\) norms. The following will investigate these norms, along with some Python implementations of the calculation of the matrix norm.
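A minimal sketch of the four norms named above, assuming NumPy's `numpy.linalg.norm` is an acceptable stand-in for the article's own implementations. The matrix `A` is an arbitrary example.

```python
import numpy as np

# Arbitrary example matrix.
A = np.array([[1.0, -2.0, 3.0],
              [4.0, 5.0, -6.0],
              [-7.0, 8.0, 9.0]])

frobenius = np.linalg.norm(A, "fro")   # sqrt of the sum of squared entries
l1 = np.linalg.norm(A, 1)              # maximum absolute column sum
l2 = np.linalg.norm(A, 2)              # largest singular value
l_inf = np.linalg.norm(A, np.inf)      # maximum absolute row sum

# The same quantities computed directly from their definitions.
frobenius_manual = np.sqrt(np.sum(A ** 2))
l1_manual = np.abs(A).sum(axis=0).max()
l2_manual = np.linalg.svd(A, compute_uv=False).max()
l_inf_manual = np.abs(A).sum(axis=1).max()

print(frobenius, l1, l2, l_inf)
```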
Just as the absolute value measures the distance between two real scalars on the real line, vector norms allow us to get a sense of the distance or magnitude of a vector. In fact, a vector with a single component is simply a scalar. Norms are often used in regularization methods and other machine learning procedures, as well as many different matrix and vector operations in linear algebra.
The Games-Howell test is a nonparametric post hoc analysis approach for performing multiple comparisons for two or more sample populations. The Games-Howell test is somewhat similar to Tukey's post hoc test. Still, unlike Tukey's test, it does not assume homogeneity of variances or equal sample sizes. Thus, the Games-Howell test can be applied in settings when the assumptions of Tukey's test do not hold. The Games-Howell test and Tukey's test will often report similar results with data that is assumed to have equal variance and equal sample sizes.
Bartlett's test, developed by Maurice Stevenson Bartlett, is a statistical procedure for testing whether \(k\) samples are drawn from populations with equal variances. Equality of variances across populations is assumed in commonly used comparison of means tests, such as Student's t-test and analysis of variance. Therefore, a procedure such as Bartlett's test can be conducted to assess whether the assumption of equal variances across group samples is reasonable.
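The sketch below shows one way such a check might look, assuming SciPy's `bartlett` function and simulated data. The groups are generated here purely for illustration, with the third group deliberately given a larger spread.

```python
import numpy as np
from scipy.stats import bartlett

# Simulated (not real) samples: the third group has a larger standard deviation.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=1.0, size=30)
group_b = rng.normal(loc=10.0, scale=1.0, size=30)
group_c = rng.normal(loc=10.0, scale=3.0, size=30)

# Null hypothesis: all groups come from populations with equal variances.
bartlett_stat, bartlett_p = bartlett(group_a, group_b, group_c)

print("Bartlett statistic:", bartlett_stat)
print("p-value:", bartlett_p)
```

A small p-value would suggest the equal-variance assumption is doubtful for these groups.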
Levene's test is a statistical procedure for testing equality of variances (also sometimes called homoscedasticity or homogeneity of variances) between two or more sample populations. Several commonly used statistical routines, such as the t-test and analysis of variance, assume the populations have equal variances. Therefore, Levene's test is often employed to test this assumption before performing these tests.
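As a hedged example of this kind of pre-check, the following uses SciPy's `levene` function on simulated data. Note that SciPy centers on the median by default (the Brown-Forsythe variant), so `center="mean"` is passed explicitly here to obtain the classic form of the test. The data are generated only for demonstration.

```python
import numpy as np
from scipy.stats import levene

# Simulated (not real) samples with unequal spread.
rng = np.random.default_rng(1)
sample_1 = rng.normal(loc=0.0, scale=1.0, size=25)
sample_2 = rng.normal(loc=0.0, scale=1.0, size=25)
sample_3 = rng.normal(loc=0.0, scale=2.5, size=25)

# center="mean" gives the classic Levene test;
# center="median" (SciPy's default) gives the Brown-Forsythe variant.
levene_stat, levene_p = levene(sample_1, sample_2, sample_3, center="mean")

print("Levene statistic:", levene_stat)
print("p-value:", levene_p)
```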
The Van der Waerden test is a non-parametric test for testing the hypothesis that \(k\) sample distribution functions are equal. Van der Waerden's test is similar to the Kruskal-Wallis one-way analysis of variance test in that it converts the data to ranks, but it then transforms the ranks to standard normal distribution quantiles. These quantiles are known as the 'normal scores'. Hence, the Van der Waerden test is sometimes referred to as a 'normal scores test'.
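The rank-to-quantile conversion can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the three groups are made-up values, and the test statistic follows the usual textbook form, compared against a chi-square distribution with \(k - 1\) degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2, norm, rankdata

# Made-up samples for k = 3 groups (no tied values).
groups = [
    np.array([4.1, 5.3, 3.8, 6.0]),
    np.array([7.2, 6.5, 8.1, 7.7]),
    np.array([5.9, 9.4, 8.8, 10.2]),
]

pooled = np.concatenate(groups)
n_total = pooled.size

# Step 1: rank the pooled data. Step 2: map ranks to standard normal quantiles.
ranks = rankdata(pooled)
scores = norm.ppf(ranks / (n_total + 1))  # the 'normal scores'

# Variance of the normal scores and the per-group mean scores.
s2 = np.sum(scores ** 2) / (n_total - 1)
split_points = np.cumsum([g.size for g in groups])[:-1]
group_scores = np.split(scores, split_points)

# Test statistic: sum over groups of n_i * (mean score)^2, scaled by s2.
t_stat = sum(g.size * g.mean() ** 2 for g in group_scores) / s2
p_value = chi2.sf(t_stat, df=len(groups) - 1)

print("normal scores:", np.round(scores, 3))
print("statistic:", t_stat, "p-value:", p_value)
```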
In the first part of this series, we extracted adoptable cat and dog information from Petfinder. We found the tones used in the descriptions of the adoptable animals using IBM Watson's Tone Analyzer. These datasets were then combined and cleaned to create a single, unified dataset that can be analyzed with standard Python data analysis packages. In this post, we will explore the dataset and try to answer our original question. Is there a significant difference in tones used in adoptable animal descriptions depending on the species or other factors?
Many animals listed on Petfinder are also given a description by the shelter that provides further details and information on the pet. These descriptions are useful for increasing interest among potential adopters by helping to establish a more personal connection to the animal beyond just cute pictures (not to say I can't get enough of cute cat pictures). Do these descriptions vary in tone depending on the type of animal or the animal's age or other statistics? Through the combination of several Python libraries (petpy, textacy, and pandas) and the IBM Watson Tone Analyzer API, we will take the first step in answering these questions and more by cleaning and transforming the extracted data and adoptable pet descriptions from the Petfinder API.
Due to the number of different extensions and options on possible underlying assets, a generalized Black-Scholes model was created to simplify computations by significantly reducing the number of equations. In this post, we will explore several of the Black-Scholes option pricing models for different underlying assets and then introduce the generalized Black-Scholes pricing formula.
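One common statement of the generalized formula introduces a cost-of-carry rate \(b\), so that a single function covers several underlying assets. The sketch below assumes that formulation; the function name `generalized_black_scholes` and its parameter names are illustrative choices, not taken from the post.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def generalized_black_scholes(s, k, t, r, b, sigma, option="call"):
    """Price a European option with cost-of-carry rate b (hypothetical helper).

    s: spot price, k: strike, t: time to expiry in years,
    r: risk-free rate, b: cost of carry, sigma: volatility.
    b = r      -> stock paying no dividends
    b = r - q  -> stock or index with continuous dividend yield q
    b = 0      -> futures option
    b = r - rf -> currency option with foreign rate rf
    """
    d1 = (log(s / k) + (b + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    carry = exp((b - r) * t)
    discount = exp(-r * t)
    if option == "call":
        return s * carry * norm_cdf(d1) - k * discount * norm_cdf(d2)
    if option == "put":
        return k * discount * norm_cdf(-d2) - s * carry * norm_cdf(-d1)
    raise ValueError("option must be 'call' or 'put'")


# Setting b = r recovers the classic Black-Scholes price for a non-dividend stock.
call = generalized_black_scholes(100, 100, 1.0, 0.05, 0.05, 0.2, "call")
put = generalized_black_scholes(100, 100, 1.0, 0.05, 0.05, 0.2, "put")
print(round(call, 4), round(put, 4))
```

Changing only the value passed for `b` switches between the asset-specific models, which is the simplification the generalized formula is meant to provide.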
The Austin Animal Center provides its animal intake and outcome datasets on Socrata. When an animal is taken into the shelter, it is given a unique identifier that is also used in the outcomes dataset. We have already investigated and performed exploratory data analysis on the Austin Animal Center's intakes and animal outcomes individually and found several interesting facets of information. In this analysis, we merge the intakes and outcomes datasets using pandas to enable us to perform exploratory data analysis on the merged data. With the data merged, we will be able to explore in more depth the transition from intake to outcome.
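A merge on the shared identifier might look roughly like the sketch below. The column names (`animal_id`, `intake_date`, `outcome_date`, `outcome_type`) and the toy rows are hypothetical placeholders, not the actual schema or data of the Austin Animal Center datasets.

```python
import pandas as pd

# Toy rows with hypothetical column names, for illustration only.
intakes = pd.DataFrame({
    "animal_id": ["A1", "A2", "A3"],
    "intake_date": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-10"]),
})
outcomes = pd.DataFrame({
    "animal_id": ["A1", "A2"],
    "outcome_date": pd.to_datetime(["2020-01-04", "2020-01-20"]),
    "outcome_type": ["Adoption", "Transfer"],
})

# Inner merge keeps only animals that appear in both datasets.
merged = intakes.merge(outcomes, on="animal_id", how="inner")

# With both dates on one row, the intake-to-outcome transition can be measured.
merged["days_in_shelter"] = (merged["outcome_date"] - merged["intake_date"]).dt.days

print(merged)
```

In real data an animal may enter the shelter more than once, so a merge on the identifier alone can pair an intake with the wrong outcome. Additional keys or ordering logic would be needed in that case.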
In this example, we will walk through a possible use case of the nasapy library by extracting the next 10 years of close-approaching objects to Earth identified by NASA's Jet Propulsion Laboratory's Small-Body Database. The close_approach method of the nasapy library allows one to access the JPL SBDB to extract data related to known meteoroids and asteroids within proximity to Earth. Setting the parameter return_df=True automatically coerces the returned JSON data into a pandas DataFrame.
In this example, we will go through one possible use of the nasapy library by extracting a decade of fireball data from the NASA API and visualizing it on a map. Using the nasapy library, we can extract the last 10 years of fireball data as a pandas DataFrame by calling the fireballs function. The fireballs method does not require authentication to the NASA API, so we can go straight to getting the data.
McNemar's test is a test for paired data, as in the case of 2x2 contingency tables with a dichotomous trait. The
Integration by parts is another technique for simplifying integrands. As we saw in previous posts, each differentiation rule has a corresponding integration rule. In the case of integration by parts, the corresponding differentiation rule is the Product Rule. This post will introduce the integration by parts formula as well as several worked-through examples.
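As a small worked illustration, the sketch below applies the formula \(\int u \space dv = uv - \int v \space du\) to \(\int x e^x \space dx\) with SymPy, using the choice \(u = x\) and \(dv = e^x \space dx\). The example integrand is our own and may differ from those in the post.

```python
import sympy as sp

x = sp.symbols("x")

# Choose u and dv for the integrand x * exp(x).
u = x
dv = sp.exp(x)

du = sp.diff(u, x)        # du/dx = 1
v = sp.integrate(dv, x)   # v = exp(x)

# Integration by parts: u*v - integral of v*du.
by_parts = u * v - sp.integrate(v * du, x)

# Compare against SymPy's direct integration of the original integrand.
direct = sp.integrate(x * sp.exp(x), x)

print(by_parts)                            # x*exp(x) - exp(x)
print(sp.simplify(by_parts - direct))      # 0
```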
L'Hospital's Rule allows us to simplify the evaluation of limits that involve indeterminate forms. An indeterminate form is an expression, such as \(\frac{0}{0}\) or \(\frac{\infty}{\infty}\), that does not by itself give enough information to determine the value of the original limit. In this post, we explore several examples of indeterminate forms and how to calculate their limits using L'Hospital's Rule. We also leverage Python and SymPy to verify our answers.
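A brief sketch of how such a verification could look with SymPy, using \(\lim_{x \to 0} \frac{\sin x}{x}\) as our own example of a \(\frac{0}{0}\) form:

```python
import sympy as sp

x = sp.symbols("x")

numerator = sp.sin(x)
denominator = x

# Both numerator and denominator tend to 0 as x -> 0, giving the form 0/0.
print(sp.limit(numerator, x, 0), sp.limit(denominator, x, 0))

# L'Hospital's Rule: take the limit of the ratio of the derivatives instead.
lhospital = sp.limit(sp.diff(numerator, x) / sp.diff(denominator, x), x, 0)

# SymPy's direct evaluation of the original limit, for comparison.
direct = sp.limit(numerator / denominator, x, 0)

print(lhospital, direct)
```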
The Fundamental Theorem of Calculus is a theorem that connects the two branches of calculus, differential and integral, into a single framework. We saw previously that computing antiderivatives is the same process as integration; thus, we know that differentiation and integration are inverse processes. The Fundamental Theorem of Calculus formalizes this connection. The theorem is given in two parts, which we will explore in turn along with Python examples to verify our results.
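Both parts can be checked symbolically. The following is a minimal sketch with SymPy, using \(f(t) = t^2\) as an example function of our own choosing.

```python
import sympy as sp

x, t = sp.symbols("x t")
f = t ** 2

# Part 1: differentiating the accumulation function recovers the integrand.
accumulated = sp.integrate(f, (t, 0, x))   # integral of t^2 from 0 to x
part_one = sp.diff(accumulated, x)         # should equal x^2

# Part 2: a definite integral equals F(b) - F(a) for any antiderivative F.
F = sp.integrate(x ** 2, x)                # an antiderivative of x^2
part_two_direct = sp.integrate(x ** 2, (x, 1, 3))
part_two_ftc = F.subs(x, 3) - F.subs(x, 1)

print(part_one)
print(part_two_direct, part_two_ftc)
```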
As we noted in the previous sections on the Fundamental Theorem of Calculus and Antiderivatives, indefinite integrals are also called antiderivatives, and computing either is the same process. Indefinite integrals are expressed without upper and lower limits of integration. The notation \(\int f(x) \space dx\) is used to denote an antiderivative \(F\) of \(f\), that is, a function satisfying \(F^\prime(x) = f(x)\). Therefore, \(\int f(x) \space dx = F(x) + C\), where \(C\) is an arbitrary constant.
- Data Science
- Linear Algebra
- Machine Learning