1. Integration by Parts

    Integration by parts is another technique for simplifying integrals. As we saw in previous posts, each differentiation rule has a corresponding integration rule; in the case of integration by parts, the corresponding differentiation rule is the Product Rule. The technique allows us to evaluate integrals of the form: $$ \int f(x) g(x) \space dx $$
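
    The rule follows from integrating both sides of the Product Rule and is commonly written (taking $u = f(x)$ and $dv = g(x) \space dx$ for an integrand of the form above) as:

    $$ \int u \space dv = uv - \int v \space du $$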

  2. L'Hospital's Rule for Calculating Limits and Indeterminate Forms

    L'Hospital's Rule allows us to simplify the evaluation of limits that involve indeterminate forms. An indeterminate form arises when the limiting behavior of the individual parts of an expression does not provide enough information to determine the overall limit. The most common indeterminate forms that occur in calculus and other areas of mathematics include:

    $$ \frac{0}{0}, \qquad \frac{\infty}{\infty}, \qquad 0 \times \infty, \qquad 1^\infty, \qquad \infty - \infty, \qquad 0^0, \qquad \infty^0 $$
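
    As a quick illustration of the $\frac{0}{0}$ case (a standard example, not necessarily the one worked in the post), differentiating the numerator and denominator gives:

    $$ \lim_{x \to 0} \frac{\sin{x}}{x} = \lim_{x \to 0} \frac{\cos{x}}{1} = 1 $$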

  3. The Fundamental Theorem of Calculus

    The Fundamental Theorem of Calculus is a theorem that connects the two branches of calculus, differential and integral, into a single framework. We saw previously that computing an antiderivative is the same process as (indefinite) integration; thus differentiation and integration are inverse processes. The Fundamental Theorem of Calculus formalizes this connection. The theorem is given in two parts.
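
    In brief, the two parts state (for $f$ continuous on $[a, b]$ and $F$ any antiderivative of $f$):

    $$ \frac{d}{dx} \int_a^x f(t) \space dt = f(x), \qquad \int_a^b f(x) \space dx = F(b) - F(a) $$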

  4. Indefinite Integrals

    As we noted in the previous sections on the Fundamental Theorem of Calculus and antiderivatives, indefinite integrals are also called antiderivatives and are found by the same process. An indefinite integral is written without upper and lower limits of integration; the notation $\int f(x) \space dx = F(x)$ denotes that $F$ is an antiderivative of $f$, that is, $F^\prime(x) = f(x)$.

  5. Substitution Rule

    The Substitution Rule is another technique for integrating complex functions and is the counterpart in integration to the chain rule in differentiation.

    The Substitution Rule is applicable to a wide variety of integrals, but is most effective when the integral in question is of the form:

    $$ \int F\big(g(x)\big) g^\prime (x) \space dx $$
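
    As a quick illustration (a standard example, not necessarily the post's), substituting $u = x^2$ so that $du = 2x \space dx$ gives:

    $$ \int 2x \cos{(x^2)} \space dx = \int \cos{u} \space du = \sin{u} + C = \sin{(x^2)} + C $$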

  6. Antiderivatives

    Antiderivatives, which are also referred to as indefinite integrals or primitive functions, are essentially the opposite of derivatives (hence the name). More formally, an antiderivative $F$ is a function whose derivative is equal to the original function $f$, or stated more concisely: $F^\prime(x) = f(x)$.

    The Fundamental Theorem of Calculus defines the relationship between differential and integral calculus. We will see later that an indefinite integral is simply another way of expressing an antiderivative. The discussion of antiderivatives therefore provides a natural segue from differential to integral calculus.

  7. Newton's Method for Finding Equation Roots

    Newton's method, also known as Newton-Raphson, is an approach for finding the roots of nonlinear equations and is one of the most common root-finding algorithms due to its relative simplicity and speed. The root of a function is the point at which $f(x) = 0$. Many equations have more than one root. Every real polynomial of odd degree has an odd number of real roots ("Zero of a function," 2016). Newton-Raphson is an iterative method that begins with an initial guess of the root. The method uses the derivative of the function $f'(x)$ as well as the original function $f(x)$, and thus only works when the derivative can be determined.
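
    A minimal sketch of the iteration in R (the function names and tolerance here are illustrative, not the post's implementation), using the update $x_{n+1} = x_n - f(x_n)/f^\prime(x_n)$:

    ```r
    # Newton-Raphson iteration: requires the function f and its derivative fprime
    newton <- function(f, fprime, x0, tol = 1e-8, max_iter = 100) {
      x <- x0
      for (i in seq_len(max_iter)) {
        step <- f(x) / fprime(x)
        x <- x - step
        if (abs(step) < tol) return(x)   # converged
      }
      warning("did not converge")
      x
    }

    # Example: root of f(x) = x^2 - 2 (i.e., sqrt(2)), starting from x0 = 1
    newton(function(x) x^2 - 2, function(x) 2 * x, x0 = 1)
    ```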

  8. Implicit Differentiation

    An explicit function takes the form that should be most familiar, with the dependent variable written directly in terms of the independent variable, such as:

    $$ f(x) = x^2 + 3 $$ $$ y = \sin{x} $$

    An implicit function, in contrast, defines an algebraic relationship between the variables. These functions have forms similar to the following:

    $$ x^2 + y^2 = 25 $$ $$ y^5 + xy = 3 $$
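
    For the circle $x^2 + y^2 = 25$, differentiating both sides with respect to $x$ while treating $y$ as a function of $x$ gives a standard illustration of the technique:

    $$ 2x + 2y \frac{dy}{dx} = 0 \qquad \Rightarrow \qquad \frac{dy}{dx} = -\frac{x}{y} $$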

  9. The Chain Rule of Differentiation

    The chain rule is a powerful and useful differentiation technique that allows us to differentiate functions that would not be straightforward or possible with only the previously discussed rules at our disposal. The rule takes advantage of the "compositeness" of a function. For example, consider the function:

    $$ f(x) = \sin{4x} $$
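
    Viewing $f$ as the composite of the outer function $\sin{u}$ and the inner function $u = 4x$, the chain rule gives:

    $$ f^\prime(x) = \cos{(4x)} \cdot \frac{d}{dx}(4x) = 4 \cos{(4x)} $$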

  10. Limit of a Function

    A function limit, roughly speaking, describes the behavior of a function around a specific value. Limits play a role in the definitions of the derivative and of function continuity and are also used in the study of convergent sequences.

    Before getting to the precise definition of a limit, we can investigate the limit of a function by plotting it and examining the region around the limit value.

  11. Derivatives of Logarithmic Functions

    Implicit differentiation, which we explored in the last section, can also be employed to find the derivatives of logarithmic functions, which are of the form $y = \log_a{x}$. This also includes the natural logarithmic function $y = \ln{x}$.

    Proving $\frac{d}{dx} (\log_a{x}) = \frac{1}{x \ln{a}}$

    Taking advantage of the fact that $y = \log_a{x}$ can be rewritten as an exponential equation, $a^y = x$, we can derive the derivative of $\log_a{x}$ through implicit differentiation.
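
    Sketching the standard argument: differentiating $a^y = x$ implicitly with respect to $x$ gives $a^y \ln{a} \space \frac{dy}{dx} = 1$, and substituting $a^y = x$ yields the result stated above:

    $$ \frac{d}{dx} (\log_a{x}) = \frac{1}{a^y \ln{a}} = \frac{1}{x \ln{a}} $$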

  12. Product, Quotient and Power Rules of Differentiation

    Several rules exist for finding the derivatives of functions with several components, such as $x \sin{x}$. With these rules and the chain rule, which will be explored later, the derivative of any such function can be found (assuming it exists). There are five rules that help simplify the computation of derivatives, each of which will be explored in turn.
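
    Two of the most frequently used of these are the Product and Quotient Rules, which (for differentiable $f$ and $g$, with $g \neq 0$ in the quotient) read:

    $$ (fg)^\prime = f^\prime g + f g^\prime, \qquad \left( \frac{f}{g} \right)^\prime = \frac{f^\prime g - f g^\prime}{g^2} $$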

  13. Continuous Functions

    A function is said to be continuous at a point $a$ if the following statements hold:

    • the function $f$ is defined at $a$
    • the limit $\lim_{x \to a} \space f(x)$ exists
    • the limit is equal to $f(a)$, $\lim_{x \to a} \space f(x) = f(a)$

    Continuity of a function can also be expressed more compactly by the statement: $f(x) \to f(a) \space \text{as} \space x \to a$

  14. Tukey's Test for Post-Hoc Analysis

    After an omnibus test such as an analysis of variance, it is often desired to know more about the specific groups and whether they are significantly different from or similar to each other. This step following the initial analysis is referred to as 'post-hoc analysis' and is a major step in hypothesis testing. One common and popular method of post-hoc analysis is Tukey's Test, which is known by several names, including Tukey's honestly significant difference (HSD) test. Tukey's test compares the mean of each treatment to the mean of every other treatment and is considered the best available method in cases when confidence intervals are desired or if sample sizes are unequal.
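
    A minimal sketch in R (using the built-in PlantGrowth data purely for illustration; the post's data and options may differ):

    ```r
    # Fit a one-way ANOVA, then compare every pair of treatment means with Tukey's test
    fit <- aov(weight ~ group, data = PlantGrowth)
    TukeyHSD(fit, conf.level = 0.95)   # pairwise differences with adjusted confidence intervals
    ```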

    Tagged as : R statistics
  15. Kruskal-Wallis One-Way Analysis of Variance of Ranks

    The Kruskal-Wallis test extends the Mann-Whitney-Wilcoxon Rank Sum test to more than two groups. Like the Mann-Whitney test, it is nonparametric and as such does not assume the data are normally distributed; it can, therefore, be used when the assumption of normality is violated. This example will employ the Kruskal-Wallis test on the PlantGrowth dataset as used in previous examples. Although the data appear to be approximately normally distributed as seen before, the Kruskal-Wallis test performs just as well as a parametric test.
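
    In R, the test can be run directly on the PlantGrowth data mentioned above (a sketch; the post may present additional steps):

    ```r
    # Kruskal-Wallis rank sum test of weight across the three treatment groups
    kruskal.test(weight ~ group, data = PlantGrowth)
    ```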

    Tagged as : R statistics
  16. Quadratic Discriminant Analysis of Several Groups

    Quadratic discriminant analysis for classification is a modification of linear discriminant analysis that does not assume equal covariance matrices amongst the groups (\(\Sigma_1, \Sigma_2, \cdots, \Sigma_k\)). Similar to LDA for several groups, quadratic discriminant analysis for several groups classification seeks to find the group that maximizes the quadratic classification function and assign the observation vector \(y\) to that group.

  17. Quadratic Discriminant Analysis of Two Groups

    LDA assumes the groups in question have equal covariance matrices (\(\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k\)). When the groups do not have equal covariance matrices, observations are frequently assigned to the groups with larger variances on the diagonals of their corresponding covariance matrices (Rencher, n.d., p. 321). Quadratic discriminant analysis is a modification of LDA that does not assume equal covariance matrices amongst the groups. In quadratic discriminant analysis, the respective covariance matrix \(S_i\) of the \(i^{th}\) group is employed in predicting the group membership of an observation, rather than the pooled covariance matrix \(S_{pl}\) used in linear discriminant analysis.
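
    For reference, one common form of the quadratic classification function (following Rencher's development and assuming equal prior probabilities; the post's exact expression may differ) assigns an observation \(y\) to the group \(i\) that maximizes:

    $$ Q_i(y) = -\frac{1}{2} \ln{|S_i|} - \frac{1}{2} (y - \bar{y}_i)^\prime S_i^{-1} (y - \bar{y}_i) $$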

  18. Linear Discriminant Analysis for the Classification of Several Groups

    Similar to the two-group linear discriminant analysis for classification case, LDA for classification into several groups seeks to find the mean vector that the new observation \(y\) is closest to and assign \(y\) accordingly using a distance function. The several group case also assumes equal covariance matrices amongst the groups (\(\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k\)).

  19. Linear Discriminant Analysis for the Classification of Two Groups

    In this post, we will use the discriminant functions found in the first post to classify the observations. We will also employ cross-validation on the predicted groups to get a realistic sense of how the model would perform in practice on new observations. Linear classification analysis assumes the populations have equal covariance matrices (\(\Sigma_1 = \Sigma_2\)) but does not assume the data are normally distributed.

  20. Discriminant Analysis for Group Separation

    Discriminant analysis assumes the two samples or populations being compared have the same covariance matrix \(\Sigma\) but distinct mean vectors \(\mu_1\) and \(\mu_2\) with \(p\) variables. The discriminant function that maximizes the separation of the groups is a linear combination of the \(p\) variables. The linear combination, denoted \(z = a'y\), transforms an observation vector into a scalar. The discriminant functions thus take the form:
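
    $$ z = a^\prime y = (\bar{y}_1 - \bar{y}_2)^\prime S_{pl}^{-1} y $$

    Here \(a = S_{pl}^{-1} (\bar{y}_1 - \bar{y}_2)\), where \(\bar{y}_1\) and \(\bar{y}_2\) are the sample mean vectors and \(S_{pl}\) is the pooled sample covariance matrix, is the standard choice of coefficient vector that maximizes the separation between the two groups; the post's notation may differ slightly.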

  21. Discriminant Analysis of Several Groups

    Discriminant analysis is also applicable in the case of more than two groups. In the first post on discriminant analysis, there was only one linear discriminant function as the number of linear discriminant functions is \(s = min(p, k − 1)\), where \(p\) is the number of dependent variables and \(k\) is the number of groups. In the case of more than two groups, there will be more than one linear discriminant function, which allows us to examine the groups' separation in more than one dimension.

  22. Building a Poetry Database in PostgreSQL with Python, poetpy, pandas and Sqlalchemy

    The PoetryDB API stores its data in MongoDB, a popular NoSQL database. Indeed, a NoSQL database is a solid choice for the type of data that is stored in PoetryDB (unstructured text, for example). However, what if we wanted to create a more traditional SQL database with the PoetryDB API data for use in other projects where a relational database would be preferred? By extracting the data from the PoetryDB API using a combination of a few Python libraries, we can recreate the NoSQL PoetryDB database as a SQL database which will allow us more freedom to create additional data features and avoid the need to hit the PoetryDB database more than necessary.

  23. Introduction to Rpoet

    The Rpoet package is a wrapper of the PoetryDB API, which enables developers and other users to extract a vast amount of English-language poetry from nearly 130 authors. The package provides a simple R interface for interacting and accessing the PoetryDB database. This vignette will introduce the basic functionality of Rpoet and some example usages of the package.

    Tagged as : R APIs poetry
  24. Calculating and Performing One-way Analysis of Variance (ANOVA)

    ANOVA, or Analysis of Variance, is a commonly used approach to testing a hypothesis when dealing with two or more groups. One-way ANOVA, which is what will be explored in this post, can be considered an extension of the t-test when more than two groups are being tested. The factor, or categorical variable, is often referred to as the 'treatment' in the ANOVA setting. ANOVA involves partitioning the data's total variation into variation between and within groups. This procedure is thus known as Analysis of Variance as sources of variation are examined separately.
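
    A minimal example in R (the PlantGrowth data and formula here are illustrative; the post may use a different dataset):

    ```r
    # One-way ANOVA: weight is the response, group is the treatment factor
    fit <- aov(weight ~ group, data = PlantGrowth)
    summary(fit)   # ANOVA table: between- and within-group variation, F statistic, p-value
    ```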

    Tagged as : R statistics
  25. Introduction to poetpy

    The poetpy library is a Python wrapper for the PoetryDB API. The library provides a Pythonic interface for interacting with and extracting information from the PoetryDB database to explore nearly 130 poets and more than 3,200 poems. In this introductory notebook, we will explore some of the basic functionality for interacting with the PoetryDB database.

    Tagged as : Python APIs poetpy
  26. Computing Working-Hotelling and Bonferroni Simultaneous Confidence Intervals

    Two common procedures for forming simultaneous confidence intervals are the Working-Hotelling and Bonferroni procedures. Each estimates intervals of the mean response using a family confidence coefficient. The Working-Hotelling multiple is denoted \(W\) and the Bonferroni multiple \(B\). In practice, it is recommended to compute both and use whichever procedure results in the tighter intervals. The Bonferroni method will be explored first.
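
    For the simple linear regression setting, the two multiples are commonly defined as follows, where \(g\) is the number of interval estimates in the family (a standard formulation; the post's notation may differ slightly):

    $$ W^2 = 2 F(1 - \alpha; \space 2, \space n - 2), \qquad B = t(1 - \alpha / (2g); \space n - 2) $$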

    Tagged as : R statistics
  27. PetfindeR, R Wrapper for the Petfinder API, Introduction Part One

    The goal of the PetfindeR package is to provide a simple and straightforward interface for interacting with the Petfinder API through R. The Petfinder database contains approximately 300,000 adoptable pet records and 11,000 animal welfare organization records, which makes it a handy and valuable source of data for those in the animal welfare community. However, the output from the Petfinder API is in a messy JSON format, which makes it time-consuming and often frustrating to coerce into a form that is workable in R.

    Tagged as : R PetfindeR APIs
  28. Austin Animal Center Intakes Exploratory Data Analysis with Python, Pandas and Seaborn

    The Austin Animal Center, the largest no-kill municipal shelter in the United States, makes its collected data available on Austin's Open Data Portal. The data include both animals coming into the shelter and the animals' outcomes. In this post, we perform some exploratory data analysis on the intakes dataset to see if we can find any noticeable trends or interesting pieces of information in the data. First, we will extract the data from Austin's Data Portal, which is supported by Socrata.

  29. Predicting Shelter Cat Adoptions and Transfers with Scikit-learn and Machine Learning

    In the previous notebook analysis, we identified several likely candidate features and variables that could be significant in predicting a cat's outcome as it enters the shelter. Using that information and scikit-learn, we can train a machine learning model to predict if a cat will be adopted or transferred to a partner facility. For this first task, we are only interested in the adoption and transfer outcomes, to see if our assumptions based on experience and the information we learned from the previous analysis align with the predicted results. Adoptions and transfers represent over 90% of all outcomes in the Austin Animal Center shelter system; therefore, focusing on these outcomes and their more specific subtypes when building a predictive model is still quite valuable.

  30. Exploratory Data Analysis of Shelter Cat Outcomes with Pandas and Seaborn

    In this step, we visualize the data we extracted from the AAC database along with the additional features that were added in the previous notebook. Visualizing the outcomes and the variables in which we have an interest will help us better understand the data and how the variables relate to each other. This knowledge will be crucial when selecting which variables to focus on and include in our prediction model during the model building phase.

  31. Analyzing Nationwide Utility Rates with R, SQL and Plotly

    R and SQL make excellent complements for analyzing data due to their respective strengths. The sqldf package provides an interface for working with SQL in R by querying data from a database into an R data.frame. This post will demonstrate how to query and analyze data using the sqldf package in conjunction with the graphing libraries plotly and ggplot2 as well as some other packages that provide useful statistical tests and other functions.
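
    A minimal sketch of the sqldf workflow (the data frame and query here are hypothetical stand-ins, not the post's utility rate data):

    ```r
    library(sqldf)

    # A small illustrative data.frame standing in for the utility rate data
    rates <- data.frame(state = c("TX", "TX", "CA", "CA"),
                        rate  = c(11.2, 10.8, 16.4, 17.1))

    # Query the data.frame with SQL; the result is returned as an R data.frame
    sqldf("SELECT state, AVG(rate) AS avg_rate FROM rates GROUP BY state")
    ```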

    Tagged as : R Plotly SQL
  32. Analyzing the Consumer Complaints Database with Python, SQL and Plotly

    The consumer complaints database, provided by the Bureau of Consumer Financial Protection, can be downloaded as a 190 MB CSV file.

    Although the CSV file is not large relative to other available datasets that can exceed many gigabytes in size, it still provides good motivation for aggregating the data using SQL and outputting the results into a pandas DataFrame. This can all be done conveniently with pandas's I/O tools.

    Tagged as : Python SQL Plotly
  33. Extraction and Feature Engineering of the Austin Animal Center's Shelter Outcomes Dataset using Requests and Pandas

    The Austin Animal Center is the largest no-kill animal shelter in the United States and shelters and protects over 18,000 animals each year. As part of the City of Austin's Open Data Initiative, the Center makes available its data detailing shelter pet intakes and outcomes. According to the data portal, over 90% of animal outcomes are adoptions, transfers to other shelter partners, or lost pets returned to their owners.

  34. Algebra of Sets with R

    The set operations union and intersection, the relative complement, and the inclusion relation (subsets) are collectively known as the algebra of sets. The algebra of sets can be used to find many identities related to set relations.
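
    Base R provides these operations directly (a small illustration with numeric vectors standing in for sets):

    ```r
    A <- c(1, 2, 3, 4)
    B <- c(3, 4, 5, 6)

    union(A, B)          # 1 2 3 4 5 6
    intersect(A, B)      # 3 4
    setdiff(A, B)        # 1 2   (relative complement A - B)
    all(c(3, 4) %in% A)  # TRUE: {3, 4} is a subset of A
    ```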

    Tagged as : R set theory
  35. Black-Scholes Formula and Python Implementation

    The Black-Scholes model was first introduced by Fischer Black and Myron Scholes in 1973 in the paper "The Pricing of Options and Corporate Liabilities". Since being published, the model has become a widely used tool by investors and is still regarded as one of the best ways to determine fair prices of options.
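
    For reference, the model's closed-form price of a European call on an underlying with spot price $S$, strike $K$, time to expiry $T$, risk-free rate $r$, dividend yield $q$ and volatility $\sigma$ is commonly written as:

    $$ C = S e^{-qT} N(d_1) - K e^{-rT} N(d_2), \qquad d_1 = \frac{\ln{(S/K)} + (r - q + \sigma^2/2) T}{\sigma \sqrt{T}}, \qquad d_2 = d_1 - \sigma \sqrt{T} $$

    where $N(\cdot)$ is the standard normal cumulative distribution function.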

    Tagged as : Python finance mathematics
  36. Implied Volatility Calculations with Python

    Implied volatility $\sigma_{imp}$ is the volatility value $\sigma$ that makes the Black-Scholes value of the option equal to the traded price of the option.

    Recall that in the Black-Scholes model, the volatility parameter $\sigma$ is the only parameter that can't be directly observed. All other parameters can be determined from market data (for example, the risk-free rate $r$ and the dividend yield $q$) at the time the option is quoted. This being the case, the volatility parameter is the result of a numerical root-finding or optimization technique applied to the Black-Scholes model.
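
    In other words, given a traded call price $C_{\text{market}}$, the implied volatility is the root of:

    $$ f(\sigma) = C_{BS}(S, K, T, r, q; \space \sigma) - C_{\text{market}} = 0 $$

    which is typically found with a numerical root-finding method such as Newton's method or bisection.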

    Tagged as : Python finance mathematics
  37. Download 45,000 Adoptable Cat Images in 6.5 Minutes with petpy and multiprocessing

    Combining the multiprocessing package, for concurrent use of multiple CPUs, with the petpy package, for interacting with the Petfinder API, allows one to find and download a vast number of animal images for use in other tasks, such as image classification.

    This post will introduce how to use the multiprocessing and petpy packages to quickly and easily download a large set of cat images of all the different breeds available in the Petfinder database. We will end up with a collection of just under 45,000 cat images sorted by user-defined breed classifications.

  38. Introduction to petpy

    The following post introduces the petpy package and its methods for interacting with the Petfinder API. The goal of the petpy library is to enable other users to interact with the rich data available in the Petfinder database with an easy-to-use and straightforward Python interface. Methods for coercing the often messy JSON and XML API outputs into pandas DataFrames are also provided.

    Tagged as : Python APIs petpy
  39. Combined Linear Congruential Generator for Pseudo-random Number Generation

    Combined linear congruential generators, as the name implies, are a type of PRNG (pseudorandom number generator) that combine two or more LCGs (linear congruential generators). The combination of two or more LCGs into one random number generator can result in a marked increase in the period length of the generator which makes them better suited for simulating more complex systems.
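
    A minimal sketch of such a generator in R, using L'Ecuyer's well-known parameters (these are illustrative and not necessarily the exact generator built in the post):

    ```r
    # Combine two LCG streams; the difference of the streams (mod m1 - 1) yields the output
    combined_lcg <- function(n, seed1 = 1, seed2 = 1) {
      m1 <- 2147483563; a1 <- 40014
      m2 <- 2147483399; a2 <- 40692
      x <- seed1; y <- seed2
      u <- numeric(n)
      for (i in seq_len(n)) {
        x <- (a1 * x) %% m1
        y <- (a2 * y) %% m2
        z <- (x - y) %% (m1 - 1)
        u[i] <- if (z > 0) z / m1 else (m1 - 1) / m1   # scale to (0, 1)
      }
      u
    }

    combined_lcg(5)
    ```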

  40. Linear Congruential Generator for Pseudo-random Number Generation with R

    Linear congruential generators (LCGs) are a class of pseudorandom number generator (PRNG) algorithms used for generating sequences of random-like numbers. The generation of random numbers plays a large role in many applications, ranging from cryptography to Monte Carlo methods. Linear congruential generators are one of the oldest and most well-known methods for generating random numbers, primarily due to their comparative ease of implementation, their speed, and their need for little memory.
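
    A minimal LCG in R, implementing the recurrence $x_{n+1} = (a x_n + c) \bmod m$ with the Park-Miller "minimal standard" parameters as defaults (illustrative; the post's parameters may differ):

    ```r
    # Linear congruential generator: x_{n+1} = (a * x_n + c) mod m, scaled to [0, 1)
    lcg <- function(n, seed = 1, a = 16807, c = 0, m = 2^31 - 1) {
      x <- seed
      u <- numeric(n)
      for (i in seq_len(n)) {
        x <- (a * x + c) %% m
        u[i] <- x / m
      }
      u
    }

    lcg(5)   # five pseudorandom values in [0, 1)
    ```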

  41. Set Union and Intersections with R

    The set operations 'union' and 'intersection' should ring a bell for those who've worked with relational databases and Venn diagrams. The 'union' of two sets A and B is the set comprising all elements that are members of A or B (or both).

    Tagged as : R set theory
  42. Factor Analysis with Principal Factor Method and R

    As discussed in a previous post on the principal component method of factor analysis, the \(\hat{\Psi}\) term in the estimated covariance matrix \(S\), \(S = \hat{\Lambda} \hat{\Lambda}' + \hat{\Psi}\), was excluded and we proceeded directly to factoring \(S\) and \(R\). The principal factor method of factor analysis (also called the principal axis method) finds an initial estimate of \(\hat{\Psi}\) and factors \(S - \hat{\Psi}\), or \(R - \hat{\Psi}\) for the correlation matrix.

    Tagged as : R factor analysis
  43. Factor Analysis with the Principal Component Method and R Part Two

    In the first post on factor analysis, we examined computing the estimated covariance matrix \(S\) of the rootstock data and proceeded to find two factors that fit most of the variance of the data. However, the variables in the data are not on the same scale of measurement, which can cause variables with comparatively large variances to dominate the diagonal of the covariance matrix and the resulting factors. The correlation matrix, therefore, makes more intuitive sense to employ in factor analysis.

  44. Factor Analysis with the Principal Component Method and R

    The goal of factor analysis, similar to principal component analysis, is to reduce the original variables into a smaller number of factors that allows for easier interpretation. PCA and factor analysis still differ in several respects. One difference is that principal components are defined as linear combinations of the variables, while factors are defined as linear combinations of the underlying latent variables.

  45. Principal Component Analysis with R Example

    Often, it is not helpful or informative to only look at all the variables in a dataset for correlations or covariances. A preferable approach is to derive new variables from the original variables that preserve most of the information given by their variances. Principal component analysis is a widely used and popular statistical method for reducing data with many dimensions (variables) by projecting the data onto fewer dimensions using linear combinations of the variables, known as principal components.
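
    A minimal example in R (using the built-in iris measurements purely for illustration; the post's data differ):

    ```r
    # Principal components from the correlation matrix (scale. = TRUE standardizes the variables)
    pca <- prcomp(iris[, 1:4], scale. = TRUE)
    summary(pca)     # proportion of total variance explained by each component
    pca$rotation     # loadings: the linear combinations defining each component
    ```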
