R and SQL complement each other well for data analysis because each has its own strengths. The sqldf package provides an interface for working with SQL in R: it runs a SQL query and returns the result as an R data.frame. This post demonstrates how to query and analyze data using the sqldf package together with the graphing libraries plotly and ggplot2, as well as some other packages that provide useful statistical tests and other functions.
The consumer complaints database provided by the Bureau of Consumer Financial Protection can be downloaded as a 190 MB CSV file.
Although the CSV file is not large relative to other available datasets, which can exceed many gigabytes in size, it still provides good motivation for aggregating the data with SQL and returning the result as an R data.frame. This can all be done conveniently with the sqldf package.
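As a minimal sketch of the idea, the snippet below runs a SQL aggregation with sqldf and gets an R data.frame back. The small `complaints` data frame and its column names (`Product`, `State`) are hypothetical stand-ins for the real file, not the actual schema of the download.

```r
library(sqldf)

# Hypothetical stand-in for the complaints data; the column names
# and values are assumptions made for illustration only.
complaints <- data.frame(
  Product = c("Mortgage", "Mortgage", "Credit card", "Debt collection"),
  State   = c("CA", "NY", "CA", "TX"),
  stringsAsFactors = FALSE
)

# sqldf finds `complaints` in the calling environment, copies it into a
# temporary SQLite database, runs the query, and returns a data.frame.
by_product <- sqldf("
  SELECT Product, COUNT(*) AS n
  FROM complaints
  GROUP BY Product
  ORDER BY n DESC, Product
")

print(by_product)
```

For a file on disk, sqldf also provides `read.csv.sql`, which applies a SQL statement while reading a CSV, so only the aggregated or filtered rows end up in R's memory.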