Aaron Schlegel's Notebook of Interesting Things

Calculating and Performing One-way Analysis of Variance (ANOVA)

ANOVA, or Analysis of Variance, is a commonly used approach to testing a hypothesis when dealing with two or more groups. One-way ANOVA, which is what will be explored in this post, can be considered an extension of the t-test when more than two groups are being tested. The factor, or categorical variable, is often referred to as the 'treatment' in the ANOVA setting. ANOVA involves partitioning the data's total variation into variation between and within groups. This procedure is thus known as Analysis of Variance as sources of variation are examined separately.
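The partition of variation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the post's own R code; the function name `one_way_anova` and the toy measurements are made up for the example.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, f_statistic) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of treatment groups
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # Ratio of the between-group and within-group mean squares.
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, f_stat


# Hypothetical toy data: three treatment groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_between, ss_within, f_stat = one_way_anova(groups)

# The two pieces add up to the total sum of squares.
grand = mean(x for g in groups for x in g)
ss_total = sum((x - grand) ** 2 for g in groups for x in g)
```

A large F statistic indicates that the variation between group means is large relative to the variation within groups.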

Read more

Tagged as : R statistics
Introduction to poetpy

The poetpy library is a Python wrapper for the PoetryDB API. The library provides a Pythonic interface for interacting with and extracting information from the PoetryDB database. In this introductory example, we will explore some of the basic functionality of the library.

Read more

Tagged as : Python APIs poetpy
Computing Working-Hotelling and Bonferroni Simultaneous Confidence Intervals

Two common procedures for forming simultaneous confidence intervals are the Working-Hotelling and Bonferroni procedures. Each estimates intervals for the mean response using a family confidence coefficient. The Working-Hotelling coefficient is denoted $W$ and the Bonferroni coefficient $B$. In practice, it is recommended to perform both procedures to determine which results in a tighter interval. The Bonferroni method will be explored first.
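For reference, the two coefficients are usually written as follows. This assumes a simple linear regression fit on $n$ observations with $g$ simultaneous intervals and family confidence level $1 - \alpha$, a setting the excerpt does not spell out:

$$
W^2 = 2\,F(1 - \alpha;\ 2,\ n - 2), \qquad B = t\!\left(1 - \frac{\alpha}{2g};\ n - 2\right)
$$

The intervals for the mean response at level $X_h$ then take the form $\hat{Y}_h \pm W\,s\{\hat{Y}_h\}$ and $\hat{Y}_h \pm B\,s\{\hat{Y}_h\}$, so the procedure with the smaller coefficient gives the tighter interval.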

Read more

Tagged as : R statistics
Predicting Cat Genders with Logistic Regression

Consider a data set of 144 observations of household cats. The data contains the cats' gender, body weight and height. Can we model and accurately predict the gender of a cat based on previously observed values using logistic regression?

Read more

Tagged as : R statistics
PetfindeR, R Wrapper for the Petfinder API, Introduction Part Two

The first post introduced and explored the basic usage of the PetfindeR library. In this post, we take a quick look at some of the additional uses of the library and its methods to extract data from the Petfinder database.

Read more

Tagged as : R PetfindeR APIs
PetfindeR, R Wrapper for the Petfinder API, Introduction Part One

The goal of the PetfindeR package is to provide a simple and straightforward interface for interacting with the Petfinder API through R. The Petfinder database contains approximately 300,000 adoptable pet records and 11,000 animal welfare organization records, which makes it a valuable source of data for those in the animal welfare community. However, the Petfinder API returns messy JSON, which makes coercing the output into a form that is workable in R time-consuming and often frustrating.

Read more

Tagged as : R PetfindeR APIs
From Intake to Outcome: Analyzing the Austin Animal Center's Intake and Outcomes Datasets

The Austin Animal Center provides its animal intake and outcome datasets on Socrata. When an animal is taken into the shelter, it is given a unique identifier that is also used in the outcomes dataset. We have already performed exploratory data analysis on the Austin Animal Center's intakes and outcomes individually and found several interesting facets of information. In this analysis, we merge the intakes and outcomes datasets using pandas so that we can perform exploratory data analysis on the merged data. With the data merged, we will be able to explore in more depth the transition from intake to outcome.

Read more

Tagged as : Python data analysis animal welfare
Austin Animal Center Intakes Exploratory Data Analysis with Python, Pandas and Seaborn

The Austin Animal Center, the largest no-kill municipal shelter in the United States, makes its collected data available on Austin's Open Data Portal. This data includes both the animals coming into the shelter and the animals' outcomes. In this post, we perform some exploratory data analysis on the intakes dataset to see if we can find any noticeable trends or interesting pieces of information in the data. First, we will extract the data from Austin's Data Portal, which is supported by Socrata. We will then perform some data transformation and cleaning steps to get the data ready for analysis.

Read more

Tagged as : Python data analysis animal welfare
Extract and Analyze the Seattle Pet Licenses Dataset

The city of Seattle makes available its database of pet licenses issued from 2005 to the beginning of 2017 as part of the city's ongoing Open Data Initiative. This post will explore extracting the data from Seattle's Open Data portal using requests, then transforming the extracted JSON data into a workable dataset with pandas to analyze and investigate the pet license database.

Read more

Tagged as : Python data analysis animal welfare
Predicting Shelter Cat Adoptions and Transfers with Scikit-learn and Machine Learning

Following from the previous analyses of the Austin Animal Center's shelter outcomes dataset, we now take what we learned from the exploratory data analysis component of the investigation and build and train a machine learning model for predicting if a cat entering the shelter will be adopted or transferred to a partner facility. Adoptions and transfers make up about 90% of all the outcomes.

Read more

Tagged as : Python animal welfare scikit-learn machine learning
Exploratory Data Analysis of Shelter Cat Outcomes with Pandas and Seaborn

The previous post walked through how to extract and transform the shelter outcome data to make it tidy and suitable for data analysis. In this post, we perform exploratory data analysis using pandas and seaborn to investigate and visualize the shelter outcomes of cats. The findings from the exploratory data analysis step can help tremendously in the model building phase when we need to select the important features of the data.

Read more

Tagged as : Python animal welfare pandas seaborn
Analyzing Nationwide Utility Rates with R, SQL and Plotly

R and SQL make excellent complements for analyzing data due to their respective strengths. The sqldf package provides an interface for working with SQL in R by querying data from a database into an R data.frame. This post will demonstrate how to query and analyze data using the sqldf package in conjunction with the graphing libraries plotly and ggplot2 as well as some other packages that provide useful statistical tests and other functions.

Read more

Tagged as : R Plotly SQL
Analyzing the Consumer Complaints Database with Python, SQL and Plotly

The consumer complaints database is a collection of complaints received by the Bureau of Consumer Financial Protection related to financial products and services. This post explores creating a database file using SQLite and analyzing the data with Pandas and Plotly.
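As a rough sketch of the SQLite side of that workflow, the snippet below builds a small table with Python's built-in `sqlite3` module and runs an aggregate query against it. The table name, column names and rows are hypothetical placeholders, not the actual schema or contents of the consumer complaints database, and an in-memory database stands in for a database file.

```python
import sqlite3

# An in-memory database keeps the example self-contained; passing a file
# path to connect() would create a database file instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE complaints (product TEXT, company TEXT, state TEXT)")

# Hypothetical rows, for illustration only.
rows = [
    ("Mortgage", "Bank A", "TX"),
    ("Credit card", "Bank B", "CA"),
    ("Mortgage", "Bank B", "CA"),
]
conn.executemany("INSERT INTO complaints VALUES (?, ?, ?)", rows)
conn.commit()

# Count complaints per product, most frequent first.
counts = conn.execute(
    "SELECT product, COUNT(*) AS n FROM complaints "
    "GROUP BY product ORDER BY n DESC, product"
).fetchall()
conn.close()
```

The same query string could be passed to `pandas.read_sql_query` along with an open connection to load the result straight into a DataFrame for plotting.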

Read more

Tagged as : Python SQL Plotly
Extraction and Feature Engineering of Austin Animal Center's Shelter Outcomes Dataset using Requests and Pandas

The Austin Animal Center, the largest no-kill municipal animal shelter in the United States, makes available its shelter animal outcomes dataset as part of the City of Austin's Open Data program. This post demonstrates how to extract the data from the City of Austin's Open Data portal using the requests library and convert the resulting JSON to a tabular pandas DataFrame. We will then enrich the data with feature engineering to add more information, which should help improve the outcome prediction model.

Read more

Tagged as : Python animal welfare pandas seaborn
Cartesian Product and Ordered and Unordered Pairs

A pair set is a set with two members, for example, {2, 3}, which can also be thought of as an unordered pair, in that {2, 3} = {3, 2}. However, we seek a stricter and richer object that tells us more about two sets and how their elements are ordered.
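One standard way to build such an object out of plain sets is the Kuratowski encoding, $(a, b) = \{\{a\}, \{a, b\}\}$. The sketch below uses Python frozensets rather than R, and the encoding is offered as a common construction, not necessarily the one the post develops; the helper name `ordered_pair` is made up for the example.

```python
def ordered_pair(a, b):
    """Kuratowski encoding of the ordered pair (a, b) as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


# Unordered pairs ignore order: {2, 3} and {3, 2} are the same set.
unordered_equal = {2, 3} == {3, 2}

# The encoded ordered pairs differ once the elements are swapped.
ordered_equal = ordered_pair(2, 3) == ordered_pair(3, 2)

# Encoding the same pair twice gives equal objects.
same_pair_equal = ordered_pair(2, 3) == ordered_pair(2, 3)
```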

Read more

Tagged as : R set theory
Algebra of Sets with R

The set operations, union and intersection, the relative complement − and the inclusion relation (subsets) are known as the algebra of sets. The algebra of sets can be used to find many identities related to set relations.
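These operations map directly onto Python's built-in `set` type, which makes it easy to check an identity on concrete sets. The sketch below uses Python rather than R, with arbitrary example sets, and verifies one of De Morgan's laws relative to a small universe `U`.

```python
U = set(range(10))   # universe, used for complements
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B                  # elements in A or B
intersection = A & B           # elements in both A and B
relative_complement = A - B    # elements of A that are not in B
is_subset = intersection <= A  # inclusion relation

# De Morgan's law: the complement of a union is the intersection
# of the complements.
de_morgan_holds = (U - (A | B)) == ((U - A) & (U - B))
```

A check on specific sets is not a proof, but it is a quick sanity test when working through identities by hand.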

Read more

Tagged as : R set theory
Measuring Sensitivity to Derivatives Pricing Changes with the "Greeks" and Python

The Greeks are risk measures that represent how sensitive the price of a derivative is to changes in its underlying parameters.
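As one concrete case, delta measures the sensitivity of an option's price to the price of the underlying asset. The sketch below assumes a European call on a non-dividend-paying asset under the Black-Scholes model and compares the closed-form delta, $N(d_1)$, with a finite-difference estimate. The function names and parameter values are illustrative choices, not taken from the post.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def d1(S, K, T, r, sigma):
    return (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))


def call_price(S, K, T, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d_1 = d1(S, K, T, r, sigma)
    d_2 = d_1 - sigma * sqrt(T)
    return S * norm_cdf(d_1) - K * exp(-r * T) * norm_cdf(d_2)


def call_delta(S, K, T, r, sigma):
    """Closed-form delta of the call: the derivative of price w.r.t. S."""
    return norm_cdf(d1(S, K, T, r, sigma))


# Illustrative inputs: spot, strike, years to expiry, rate, volatility.
S, K, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.2

analytic_delta = call_delta(S, K, T, r, sigma)

# Central finite difference: bump the spot price slightly up and down.
h = 1e-4
numeric_delta = (call_price(S + h, K, T, r, sigma)
                 - call_price(S - h, K, T, r, sigma)) / (2 * h)
```

The other Greeks follow the same pattern, each differentiating the price with respect to a different input such as volatility, time or the interest rate.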

Read more

Tagged as : Python finance mathematics
Black-Scholes Formula and Python Implementation

Introduces the call and put option pricing using the Black-Scholes formula and Python implementations.
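A minimal sketch of the formula for European options on a non-dividend-paying asset, using only the standard library. The function names and example inputs are illustrative and not necessarily those used in the post.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes(S, K, T, r, sigma):
    """Return (call, put) prices under the Black-Scholes model.

    S: spot price, K: strike, T: time to expiry in years,
    r: continuously compounded risk-free rate, sigma: volatility.
    """
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    call = S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
    put = K * exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put


# Illustrative inputs only.
call, put = black_scholes(S=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2)

# Put-call parity, C - P = S - K * exp(-r * T), is a useful sanity check.
parity_gap = (call - put) - (100.0 - 100.0 * exp(-0.05 * 1.0))
```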

Read more

Tagged as : Python finance mathematics
N-Union and Intersection Set Operations

Set unions and intersections can be extended to any number of sets. This post introduces notation to simplify the expression of n-sets and the set union and intersection operations themselves with R.
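The extension to $n$ sets amounts to folding the two-set operation over a collection. The sketch below shows this in Python rather than R, using `functools.reduce`; the helper names `n_union` and `n_intersection` are made up for the example.

```python
from functools import reduce


def n_union(sets):
    """Union of any number of sets; the union of no sets is empty."""
    return reduce(set.union, sets, set())


def n_intersection(sets):
    """Intersection of one or more sets."""
    sets = list(sets)
    if not sets:
        # Without a universe, the intersection of no sets is undefined here.
        raise ValueError("n_intersection needs at least one set")
    return reduce(set.intersection, sets)


family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
union_all = n_union(family)
intersection_all = n_intersection(family)
```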

Read more

Tagged as : R set theory
Implied Volatility Calculations with Python

Discusses calculations of the implied volatility measure in pricing security options with the Black-Scholes model.
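Implied volatility has no closed form, so it is found numerically by searching for the volatility at which the model price matches the observed price. The sketch below uses bisection on a European call price, which works because the call price increases with volatility. Bisection is one simple choice and not necessarily the method used in the post, and the function names, search bounds and inputs are illustrative.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def call_price(S, K, T, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


def implied_volatility(price, S, K, T, r, lo=1e-6, hi=5.0,
                       tol=1e-8, max_iter=200):
    """Find sigma such that call_price(...) matches price, by bisection."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if call_price(S, K, T, r, mid) > price:
            hi = mid   # model price too high, so volatility is too high
        else:
            lo = mid   # model price too low, so volatility is too low
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price a call at a known volatility, then recover it.
true_sigma = 0.2
observed = call_price(100.0, 100.0, 1.0, 0.05, true_sigma)
recovered = implied_volatility(observed, 100.0, 100.0, 1.0, 0.05)
```

Newton's method using vega converges faster, but bisection is harder to break when the starting guess is poor.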

Read more

Tagged as : Python finance mathematics
