In this example, we will walk through a possible use case of the nasapy library by extracting the next 10 years of close-approaching objects to Earth identified by NASA's Jet Propulsion Laboratory's Small-Body Database. The close_approach method of the nasapy library allows one to access the JPL SBDB to extract data related to known meteoroids and asteroids within proximity to Earth. Setting the parameter return_df=True automatically coerces the returned JSON data into a pandas DataFrame.

## Plot Earth Fireball Impacts with nasapy, pandas and folium

In this example, we will go through one possible use of the nasapy library by extracting a decade of fireball data from the NASA API and visualizing it on a map. Using the nasapy library, we can extract the last 10 years of fireball data as a pandas DataFrame by calling the fireballs function. The fireballs method does not require authentication to the NASA API, so we can go straight to getting the data.

## Integration by Parts

Integration by parts is another technique for simplifying integrands. As we saw in previous posts, each differentiation rule has a corresponding integration rule. In the case of integration by parts, the corresponding differentiation rule is the Product Rule. This post will introduce the integration by parts formula as well as several worked-through examples.
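As a brief sketch of where the formula comes from (the symbols \(u\) and \(v\) are the usual textbook choices and are assumed here, not taken from the post), integrating the Product Rule \((uv)^\prime = u^\prime v + u v^\prime\) and rearranging gives:

\[
\int u \, v^\prime \space dx = u v - \int u^\prime \, v \space dx
\]

For example, with \(u = x\) and \(v^\prime = e^x\):

\[
\int x e^x \space dx = x e^x - \int e^x \space dx = x e^x - e^x + C
\]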

## L'Hospital's Rule for Calculating Limits and Indeterminate Forms

L'Hospital's Rule allows us to simplify the evaluation of limits that involve indeterminate forms. An indeterminate form is defined as a limit that does not give enough information to determine the original limit. In this post, we explore several examples of indeterminate forms and how to calculate their limits using L'Hospital's Rule. We also leverage Python and SymPy to verify our answers.
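A standard illustration, chosen here as an assumption rather than taken from the post, is the \(\frac{0}{0}\) form that arises from \(\frac{\sin x}{x}\) as \(x \to 0\). Differentiating the numerator and denominator separately resolves it:

\[
\lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1
\]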

## The Fundamental Theorem of Calculus

The Fundamental Theorem of Calculus connects the two branches of calculus, differential and integral, into a single framework. We saw previously that computing antiderivatives is the same process as integration; thus differentiation and integration are inverse processes. The Fundamental Theorem of Calculus formalizes this connection. The theorem is given in two parts, which we will explore in turn along with Python examples to verify our results.

## Indefinite Integrals

As we noted in the previous sections on the Fundamental Theorem of Calculus and Antiderivatives, indefinite integrals are also called antiderivatives, and finding one is the same process as finding the other. Indefinite integrals are expressed without upper and lower limits on the integral sign. The notation \(\int f(x) \space dx\) denotes an antiderivative \(F\) of \(f\). Therefore, \(\int f(x) \space dx = F(x)\) means \(F^\prime(x) = f(x)\).

## Substitution Rule

The Substitution Rule is another technique for integrating complex functions and is to integration what the Chain Rule is to differentiation. The Substitution Rule is applicable to a wide variety of integrals, but is most effective when the integrand resembles the result of applying the Chain Rule. In this post, the Substitution Rule is explored with several examples. Python and SymPy are also used to verify our results.

## Antiderivatives

Antiderivatives, which are also referred to as indefinite integrals or primitive functions, are essentially the opposite of derivatives (hence the name). More formally, an antiderivative \(F\) is a function whose derivative is equivalent to the original function \(f\), or stated more concisely: \(F^\prime(x) = f(x)\). The Fundamental Theorem of Calculus defines the relationship between differential and integral calculus. We will see later that an antiderivative can be thought of as a restatement of an indefinite integral. Therefore, the discussion of antiderivatives provides a nice segue from differential to integral calculus. The process of finding an antiderivative of a function is known as antidifferentiation and is the reverse of differentiating a function.

## Newton's Method for Finding Equation Roots

Newton's method, also known as Newton-Raphson, is an approach for finding the roots of nonlinear equations and is one of the most common root-finding algorithms due to its relative simplicity and speed. A root of a function is a point at which \(f(x) = 0\). This post explores how Newton's Method works for finding roots of equations and walks through several examples with SymPy to verify our answers.
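The iteration behind the method is \(x_{n+1} = x_n - \frac{f(x_n)}{f^\prime(x_n)}\). The following is a minimal sketch in plain Python, not the post's own code; the function name `newton`, its tolerance and its iteration cap are assumptions chosen for illustration.

```python
def newton(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Approximate a root of f, starting from the initial guess x0."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            # The tangent line is horizontal, so the update is undefined.
            raise ZeroDivisionError("derivative is zero; try another x0")
        step = f(x) / slope
        x -= step  # x_{n+1} = x_n - f(x_n) / f'(x_n)
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


# Example: the positive root of x^2 - 2, i.e. the square root of 2.
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
```

The derivative is passed in explicitly here to keep the sketch dependency-free; a symbolic library such as SymPy could supply it instead.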

## Implicit Differentiation

An implicit function defines an algebraic relationship between variables. In this post, implicit differentiation is explored with several examples including solutions using Python code.

## The Chain Rule of Differentiation

The chain rule is a powerful and useful differentiation technique that allows us to differentiate functions that would not be straightforward or possible with only the previously discussed rules at our disposal. The rule takes advantage of the "compositeness" of a function. In this post, we will explore several examples of the chain rule and will also confirm our results using the SymPy symbolic computation library.

## Limit of a Function

A function limit, roughly speaking, describes the behavior of a function around a specific value. Limits play a role in the definition of the derivative and function continuity and are also used in the study of convergent sequences. In this post, we will explore the definition of a function limit and some limit laws using examples with Python.

## Derivatives of Logarithmic Functions

Implicit differentiation can also be employed to find the derivatives of logarithmic functions, which are of the form \(y = \log_a{x}\). In this post, we explore several derivatives of logarithmic functions and also prove some commonly used derivatives. The symbolic computation library SymPy is also employed to verify our answers.

## Product, Quotient and Power Rules of Differentiation

Several rules exist for finding the derivatives of functions with several components such as \(x \sin x\). With these rules and the chain rule, which will be explored later, the derivative of any such function can be found (assuming it exists). There are five rules that help simplify the computation of derivatives, each of which will be explored in turn. We will also take advantage of SymPy to perform symbolic computations to confirm our results.

## Continuous Functions

In this post, we explore the definition of a continuous function and introduce several examples with Python code.

## Tukey's Test for Post-Hoc Analysis

After a multivariate test, it is often desired to know more about the specific groups to find out if they are significantly different or similar. This step after analysis is referred to as 'post-hoc analysis' and is a major step in hypothesis testing. One common and popular method of post-hoc analysis is Tukey's Test. The test is known by several different names. Tukey's test compares the means of all treatments to the mean of every other treatment and is considered the best available method in cases when confidence intervals are desired or if sample sizes are unequal.

## Kruskal-Wallis One-Way Analysis of Variance of Ranks

The Kruskal-Wallis test extends the Mann-Whitney-Wilcoxon Rank Sum test to more than two groups. Like the Mann-Whitney test, it is nonparametric: it does not assume the data are normally distributed and can, therefore, be used when the assumption of normality is violated. This example will employ the Kruskal-Wallis test on the `PlantGrowth` dataset as used in previous examples. Although the data appear to be approximately normally distributed as seen before, the Kruskal-Wallis test performs just as well as a parametric test.

## Quadratic Discriminant Analysis of Several Groups

Quadratic discriminant analysis for classification is a modification of linear discriminant analysis that does not assume equal covariance matrices amongst the groups (\(\Sigma_1, \Sigma_2, \cdots, \Sigma_k\)). Similar to LDA for several groups, quadratic discriminant analysis for several groups classification seeks to find the group that maximizes the quadratic classification function and assign the observation vector \(y\) to that group.

## Quadratic Discriminant Analysis of Two Groups

LDA assumes the groups in question have equal covariance matrices (\(\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k\)). Therefore, when the groups do not have equal covariance matrices, observations are frequently assigned to groups with large variances on the diagonal of their corresponding covariance matrices (Rencher, n.d., p. 321). Quadratic discriminant analysis is a modification of LDA that does not assume equal covariance matrices amongst the groups. In quadratic discriminant analysis, the respective covariance matrix \(S_i\) of the \(i^{th}\) group is employed in predicting the group membership of an observation, rather than the pooled covariance matrix \(S_{pl}\) used in linear discriminant analysis.

## Linear Discriminant Analysis for the Classification of Several Groups

Similar to the two-group linear discriminant analysis for classification case, LDA for classification into several groups seeks to find the mean vector that the new observation \(y\) is closest to and assign \(y\) accordingly using a distance function. The several group case also assumes equal covariance matrices amongst the groups (\(\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k\)).

## Linear Discriminant Analysis for the Classification of Two Groups

In this post, we will use the discriminant functions found in the first post to classify the observations. We will also employ cross-validation on the predicted groups to get a realistic sense of how the model would perform in practice on new observations. Linear classification analysis assumes the populations have equal covariance matrices (\(\Sigma_1 = \Sigma_2\)) but does not assume the data are normally distributed.

## Discriminant Analysis for Group Separation

Discriminant analysis assumes the two samples or populations being compared have the same covariance matrix \(\Sigma\) but distinct mean vectors \(\mu_1\) and \(\mu_2\) with \(p\) variables. The discriminant function that maximizes the separation of the groups is the linear combination of the \(p\) variables. The linear combination denoted \(z = a^\prime y\) transforms the observation vectors to a scalar. The discriminant functions thus take the form:

## Discriminant Analysis of Several Groups

Discriminant analysis is also applicable in the case of more than two groups. In the first post on discriminant analysis, there was only one linear discriminant function as the number of linear discriminant functions is \(s = \min(p, k - 1)\), where \(p\) is the number of dependent variables and \(k\) is the number of groups. In the case of more than two groups, there will be more than one linear discriminant function, which allows us to examine the groups' separation in more than one dimension.

## Building a Poetry Database in PostgreSQL with Python, poetpy, pandas and Sqlalchemy

In this example, we walk through a sample use case of extracting data through an API and then structuring that data in a cohesive manner that allows us to create a relational database that we can then query with SQL statements. The database we will create with the extracted data will use PostgreSQL. The Python libraries that will be used in this example are poetpy, a Python wrapper for the PoetryDB API written by yours truly, pandas for transforming and cleansing the data as needed, and SQLAlchemy for handling the SQL side of things.

## Introduction to Rpoet

The Rpoet package is a wrapper of the PoetryDB API, which enables developers and other users to extract a vast amount of English-language poetry from nearly 130 authors. The package provides a simple R interface for interacting and accessing the PoetryDB database. This vignette will introduce the basic functionality of Rpoet and some example usages of the package.

## Calculating and Performing One-way Multivariate Analysis of Variance (MANOVA)

MANOVA, or Multivariate Analysis of Variance, is an extension of Analysis of Variance (ANOVA) to several dependent variables. The approach to MANOVA is similar to ANOVA in many regards and requires the same assumptions (normally distributed dependent variables with equal covariance matrices).

## Calculating and Performing One-way Analysis of Variance (ANOVA)

ANOVA, or Analysis of Variance, is a commonly used approach to testing a hypothesis when dealing with two or more groups. One-way ANOVA, which is what will be explored in this post, can be considered an extension of the t-test when more than two groups are being tested. The factor, or categorical variable, is often referred to as the 'treatment' in the ANOVA setting. ANOVA involves partitioning the data's total variation into variation between and within groups. This procedure is thus known as Analysis of Variance as sources of variation are examined separately.
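The partition of variation described above can be sketched in a few lines of plain Python. This is an illustrative outline under assumptions, not the post's implementation: the function name `one_way_anova` and the toy data are made up for the example.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, F) for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Hypothetical toy data: three treatments with three observations each.
ss_b, ss_w, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F statistic indicates that the variation between groups is large relative to the variation within them.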

## Introduction to poetpy

The poetpy library is a Python wrapper for the PoetryDB API. The library provides a Pythonic interface for interacting with and extracting information from the PoetryDB database. In this introductory example, we will explore some of the basic functionality of the poetpy library for interacting with the PoetryDB database.

## Computing Working-Hotelling and Bonferroni Simultaneous Confidence Intervals

There are two procedures for forming simultaneous confidence intervals, the Working-Hotelling and Bonferroni procedures. Each estimates intervals of the mean response using a family confidence coefficient. The Working-Hotelling coefficient is defined by \(W\) and Bonferroni \(B\). In practice, it is recommended to perform both procedures to determine which results in a tighter interval. The Bonferroni method will be explored first.

## Predicting Cat Genders with Logistic Regression

Consider a data set of 144 observations of household cats. The data contains the cats' gender, body weight and height. Can we model and accurately predict the gender of a cat based on previously observed values using logistic regression?

## PetfindeR, R Wrapper for the Petfinder API, Introduction Part Two

The first post introduced and explored the basic usage of the PetfindeR library. In this post, we take a quick look at some of the additional uses of the library and its methods to extract data from the Petfinder database.

## PetfindeR, R Wrapper for the Petfinder API, Introduction Part One

The goal of the PetfindeR package is to provide a simple and straightforward interface for interacting with the Petfinder API through R. The Petfinder database contains approximately 300,000 adoptable pet records and 11,000 animal welfare organization records, which makes it a handy and valuable source of data for those in the animal welfare community. However, the outputs from the Petfinder API are in messy JSON format, which makes it time-consuming and often frustrating to coerce the output data into a form that is workable with R.

## From Intake to Outcome: Analyzing the Austin Animal Center's Intake and Outcomes Datasets

The Austin Animal Center provides its animal intake and outcome datasets on Socrata. When an animal is taken into the shelter, it is given a unique identifier that is also used in the outcomes dataset. We have already investigated and performed exploratory data analysis on the Austin Animal Center's intakes and animal outcomes individually and found several interesting facets of information. In this analysis, we merge the intakes and outcomes datasets using pandas to enable us to perform exploratory data analysis on the merged data. With the data merged, we will be able to explore in more depth the transition from intake to outcome.

## Austin Animal Center Intakes Exploratory Data Analysis with Python, Pandas and Seaborn

The Austin Animal Center, the largest no-kill municipal shelter in the United States, makes available its collected data on Austin's Open Data Portal. This data includes both animals coming into the shelter and the animals' outcomes. In this post, we perform some exploratory data analysis on the intakes dataset to see if we can find any noticeable trends or interesting pieces of information in the data. First, we will extract the data from Austin's Data Portal, which is supported by Socrata. We will then perform some data transformation and cleaning steps to get the data ready for analysis.

## Extract and Analyze the Seattle Pet Licenses Dataset

The city of Seattle makes available its database of pet licenses issued from 2005 to the beginning of 2017 as part of the city's ongoing Open Data Initiative. This post will explore extracting the data from Seattle's Open Data portal using requests, then transform the extracted JSON data into a workable dataset with pandas to analyze and investigate the pet license database.

## Predicting Shelter Cat Adoptions and Transfers with Scikit-learn and Machine Learning

Following from the previous analyses of the Austin Animal Center's shelter outcomes dataset, we now take what we learned from the exploratory data analysis component of the investigation and build and train a machine learning model for predicting if a cat entering the shelter will be adopted or transferred to a partner facility. Adoptions and transfers make up about 90% of all the outcomes.

## Exploratory Data Analysis of Shelter Cat Outcomes with Pandas and Seaborn

The previous post walked through how to extract and transform the shelter outcome data to make it tidy and suitable for data analysis. In this post, we perform exploratory data analysis using pandas and seaborn to investigate and visualize the shelter outcomes of cats. The findings from the exploratory data analysis step can help tremendously in the model building phase when we need to select the important features of the data.

## Analyzing Nationwide Utility Rates with R, SQL and Plotly

R and SQL make excellent complements for analyzing data due to their respective strengths. The sqldf package provides an interface for working with SQL in R by querying data from a database into an R data.frame. This post will demonstrate how to query and analyze data using the sqldf package in conjunction with the graphing libraries plotly and ggplot2 as well as some other packages that provide useful statistical tests and other functions.

## Analyzing the Consumer Complaints Database with Python, SQL and Plotly

The consumer complaints database is a collection of complaints received by the Bureau of Consumer Financial Protection related to financial products and services. This post explores creating a database file using SQLite and analyzing the data with Pandas and Plotly.

## Extraction and Feature Engineering of Austin Animal Center's Shelter Outcomes Dataset using Requests and Pandas

The Austin Animal Center, the largest no-kill municipal animal shelter in the United States, makes available its shelter animal outcomes dataset as part of the City of Austin's Open Data program. This post demonstrates how to extract the data from the City of Austin's Open Data portal using the requests library and convert the resulting JSON to a tabular pandas DataFrame. We will then enrich the data with feature engineering to add more information, which should help improve the outcome prediction model.

## Cartesian Product and Ordered and Unordered Pairs

A pair set is a set with two members, for example, {2, 3}, which can also be thought of as an unordered pair, in that {2, 3} = {3, 2}. However, we seek a stricter and richer object that tells us more about two sets and how their elements are ordered.
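The distinction can be illustrated with a short Python sketch, offered as an assumption-laden aside rather than the post's own code: Python sets behave like unordered pairs, tuples behave like ordered pairs, and `itertools.product` builds the Cartesian product. The sets `A` and `B` are arbitrary examples.

```python
from itertools import product

# Sets ignore order, so {2, 3} and {3, 2} are the same unordered pair.
unordered_equal = {2, 3} == {3, 2}

# Tuples respect order, so (2, 3) and (3, 2) are different ordered pairs.
ordered_equal = (2, 3) == (3, 2)

# The Cartesian product A x B is the set of all ordered pairs (a, b)
# with a taken from A and b taken from B.
A = {2, 3}
B = {"a", "b"}
cartesian = set(product(A, B))
```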

## Algebra of Sets with R

The set operations, union and intersection, the relative complement − and the inclusion relation (subsets) are known as the algebra of sets. The algebra of sets can be used to find many identities related to set relations.

## Measuring Sensitivity to Derivatives Pricing Changes with the "Greeks" and Python

The Greeks are risk measures that represent how sensitive the price of a derivative is to changes in its underlying parameters.

## Black-Scholes Formula and Python Implementation

Introduces the call and put option pricing using the Black-Scholes formula and Python implementations.

## N-Union and Intersection Set Operations

Set unions and intersections can be extended to any number of sets. This post introduces notation to simplify the expression of n-sets and the set union and intersection operations themselves with R.

## Implied Volatility Calculations with Python

Discusses calculations of the implied volatility measure in pricing security options with the Black-Scholes model.

## Put-Call Parity of Vanilla European Options and Python Implementation

Introduces the put-call parity as identified by Hans Stoll in 1969 as well as Python code for computing the put-call parity both numerically and symbolically.
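For a European option on a non-dividend-paying asset, put-call parity states \(C - P = S - K e^{-rT}\). The sketch below checks this numerically using Black-Scholes prices. It is a minimal illustration under assumptions, not the post's code: the function names and the input values are hypothetical, and the normal CDF is built from `math.erf` to avoid external dependencies.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes(s, k, r, sigma, t):
    """Return (call, put) prices for a European option, no dividends."""
    d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    call = s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)
    put = k * exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)
    return call, put


# Hypothetical inputs: spot, strike, risk-free rate, volatility, years.
s, k, r, sigma, t = 100.0, 95.0, 0.05, 0.2, 0.5
call, put = black_scholes(s, k, r, sigma, t)

# Parity: C - P should equal S - K * exp(-r * T) up to rounding error.
parity_gap = (call - put) - (s - k * exp(-r * t))
```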

## Download 45,000 Adoptable Cat Images in 6.5 Minutes with petpy and multiprocessing

Using petpy and multiprocessing to download 45,000 cat images in 6.5 minutes.

## Introduction to petpy

Introduction to using the petpy Python library for interacting with the Petfinder API.

## Combined Linear Congruential Generator for Pseudo-random Number Generation

Combined linear congruential generators, as the name implies, are a type of PRNG (pseudorandom number generator) that combine two or more LCGs (linear congruential generators). The combination of two or more LCGs into one random number generator can result in a marked increase in the period length of the generator which makes them better suited for simulating more complex systems.

## Multiplicative Congruential Random Number Generators with R

Multiplicative congruential generators, also known as Lehmer random number generators, are a type of linear congruential generator for generating uniform pseudorandom numbers. The multiplicative congruential generator, often abbreviated as MLCG or MCG, is defined as a recurrence relation similar to the LCG.

## Linear Congruential Generator for Pseudo-random Number Generation with R

Linear congruential generators (LCGs) are a class of pseudorandom number generator (PRNG) algorithms used for generating sequences of random-like numbers. The generation of random numbers plays a large role in many applications ranging from cryptography to Monte Carlo methods. Linear congruential generators are one of the oldest and most well-known methods for generating random numbers primarily due to their comparative ease of implementation and speed and their need for little memory.
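An LCG is defined by the recurrence \(X_{n+1} = (a X_n + c) \bmod m\). The post itself works in R; the following is a small Python sketch for illustration only. The function name `lcg` is an assumption, and the default constants are the widely cited Numerical Recipes parameters, not necessarily those used in the post.

```python
def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    """Yield pseudorandom floats in [0, 1) from a linear congruential generator."""
    state = seed
    while True:
        # Core recurrence: X_{n+1} = (a * X_n + c) mod m
        state = (a * state + c) % m
        yield state / m


gen = lcg(seed=42)
sample = [next(gen) for _ in range(5)]
```

The generator stores only a single integer of state, which reflects the low memory requirement mentioned above.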

## Set Union and Intersections with R

The set operations 'union' and 'intersection' should ring a bell for those who've worked with relational databases and Venn Diagrams. The 'union' of two sets A and B represents a set that comprises all members of A or B (or both).
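The post itself uses R; as a quick language-neutral illustration, the same operations can be sketched with Python's built-in `set` type. The sets `A` and `B` below are arbitrary examples.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

# Union: every element that belongs to A or B (or both).
union = A | B

# Intersection: only the elements that belong to both A and B.
intersection = A & B
```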

## Introduction to Sets and Set Theory with R

Sets define a 'collection' of objects, or things typically referred to as 'elements' or 'members.' The concept of sets arises naturally when dealing with any collection of objects, whether it be a group of numbers or anything else.

## Hierarchical Clustering Nearest Neighbors Algorithm in R

Hierarchical clustering is a widely used and popular tool in statistics

## Factor Analysis with the Iterated Factor Method and R

The iterated principal factor method is an extension of the principal

## Factor Analysis with Principal Factor Method and R

As discussed in a previous post on the principal component method of factor analysis, the \(\hat{\Psi}\) term in the estimated covariance matrix \(S\), \(S = \hat{\Lambda} \hat{\Lambda}' + \hat{\Psi}\), was excluded and we proceeded directly to factoring \(S\) and \(R\). The principal factor method of factor analysis (also called the principal axis method) finds an initial estimate of \(\hat{\Psi}\) and factors \(S - \hat{\Psi}\), or \(R - \hat{\Psi}\) for the correlation matrix.

## Factor Analysis with the Principal Component Method and R Part Two

In the first post on factor analysis, we examined computing the estimated covariance matrix \(S\) of the rootstock data and proceeded to find two factors that fit most of the variance of the data. However, the variables in the data are not on the same scale of measurement, which can cause variables with comparatively large variances to dominate the diagonal of the covariance matrix and the resulting factors. The correlation matrix, therefore, makes more intuitive sense to employ in factor analysis.

## Factor Analysis with the Principal Component Method and R

The goal of factor analysis, similar to principal component analysis, is to reduce the original variables into a smaller number of factors that allows for easier interpretation. PCA and factor analysis still differ in several respects. One difference is that principal components are defined as linear combinations of the variables, while in factor analysis the variables are expressed as linear combinations of the underlying latent factors.

## Image Compression with Principal Component Analysis

Image compression with principal component

## Principal Component Analysis with R Example

Often, it is not helpful or informative to look only at all the variables in a dataset for correlations or covariances. A preferable approach is to derive new variables from the original variables that preserve most of the information given by their variances. Principal component analysis is a widely used and popular statistical method for reducing data with many dimensions (variables) by projecting the data onto fewer dimensions using linear combinations of the variables, known as principal components.

## Image Compression with Singular Value Decomposition

The method of image compression with singular value decomposition is

## Singular Value Decomposition and R Example

SVD underpins many statistical and real-world

## Cholesky Decomposition with R Example

Cholesky decomposition, also known as Cholesky factorization, is a

## How to Calculate the Inverse Matrix for 2×2 and 3×3 Matrices

The inverse of a number is its reciprocal. For example, the inverse of 8
