Introduction to poetpy

The poetpy library is a Python wrapper for the PoetryDB API. The library provides a Pythonic interface for interacting with and extracting information from the PoetryDB database to explore nearly 130 poets and more than 3,200 poems. In this introductory notebook, we will explore some of the basic functionality for interacting with the PoetryDB database.

First Steps

Make sure the poetpy library is installed. The easiest way to install the library is through pip.

pip install poetpy

An alternative installation option is to clone or download the GitHub repo of poetpy and run the setup.py installation command from the repository's root directory.

python setup.py install

Using the API

The get_poetry function is the primary interface for interacting with the PoetryDB API.

In [1]:
from poetpy import get_poetry

Basic Usage

The only required parameter for accessing the PoetryDB API is the input_term. The input_term can be any one or a combination of 'author', 'title', 'lines', or 'linecount'. For example, let's say we are interested in finding all of the authors currently in the database.

In [2]:
authors = get_poetry('author')

Because the output will be somewhat lengthy, let's just print the length of the returned object to see how many authors are in the database.

In [3]:
len(authors['authors'])
Out[3]:
129

We can count the poems and sonnets in the database the same way by changing the input_term from 'author' to 'title'.

In [4]:
titles = get_poetry('title')
len(titles['titles'])
Out[4]:
2972

We see there are just under 3,000 poems and 129 authors currently in the PoetryDB. With this information, we can find the average number of poems per author.

In [5]:
len(titles['titles']) / len(authors['authors'])
Out[5]:
23.03875968992248

Specifying Search Parameters

In addition to the input_term parameter, a corresponding search_term parameter can also be passed to refine the returned results. For example, let's say we are interested in finding William Shakespeare's poetry.

In [6]:
shakespeare = get_poetry('author', 'William Shakespeare')
In [7]:
len(shakespeare)
Out[7]:
162

The search found 162 matching poems and sonnets for William Shakespeare! Let's presume we are only interested in one of Shakespeare's sonnets. Rather than going through the relatively large JSON object that was returned in the previous search, we can edit the query to look for the title of the sonnet we want to return.

In [8]:
get_poetry('title', 'Sonnet 1: From fairest creatures we desire increase')
Out[8]:
[{'author': 'William Shakespeare',
  'linecount': '14',
  'lines': ['From fairest creatures we desire increase,',
   "That thereby beauty's rose might never die,",
   'But as the riper should by time decease,',
   'His tender heir might bear his memory:',
   'But thou contracted to thine own bright eyes,',
   "Feed'st thy light's flame with self-substantial fuel,",
   'Making a famine where abundance lies,',
   'Thy self thy foe, to thy sweet self too cruel:',
   "Thou that art now the world's fresh ornament,",
   'And only herald to the gaudy spring,',
   'Within thine own bud buriest thy content,',
   "And tender churl mak'st waste in niggarding:",
   '  Pity the world, or else this glutton be,',
   "  To eat the world's due, by the grave and thee."],
  'title': 'Sonnet 1: From fairest creatures we desire increase'}]
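
Each record in the returned list is a plain dictionary, so it can be inspected with ordinary Python. The sketch below hard-codes the record shown above rather than calling the API, and uses a hypothetical helper, linecount_matches (not part of poetpy), to check that the reported linecount agrees with the number of lines.

```python
# The record is copied from the output above so the example runs offline.
sonnet = {
    'author': 'William Shakespeare',
    'linecount': '14',
    'lines': ['From fairest creatures we desire increase,',
              "That thereby beauty's rose might never die,",
              'But as the riper should by time decease,',
              'His tender heir might bear his memory:',
              'But thou contracted to thine own bright eyes,',
              "Feed'st thy light's flame with self-substantial fuel,",
              'Making a famine where abundance lies,',
              'Thy self thy foe, to thy sweet self too cruel:',
              "Thou that art now the world's fresh ornament,",
              'And only herald to the gaudy spring,',
              'Within thine own bud buriest thy content,',
              "And tender churl mak'st waste in niggarding:",
              '  Pity the world, or else this glutton be,',
              "  To eat the world's due, by the grave and thee."],
    'title': 'Sonnet 1: From fairest creatures we desire increase',
}


def linecount_matches(poem):
    """Return True when the reported linecount equals len(poem['lines']).

    Hypothetical helper for illustration. int() accepts both the string
    form ('14') and an integer form of linecount.
    """
    return int(poem['linecount']) == len(poem['lines'])


print(sonnet['title'])
print(linecount_matches(sonnet))

# Print the closing couplet, stripping its leading indentation.
for line in sonnet['lines'][-2:]:
    print(line.strip())
```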

Now, let's say we do not know the name of the sonnet we wish to find, but we do happen to know one of the lines from the sonnet. By specifying 'lines' in the input_term parameter and then passing the known line in the search_term parameter, the same result as before will be returned.

In [9]:
get_poetry('lines', 'But thou contracted to thine own bright eyes,')
Out[9]:
[{'author': 'William Shakespeare',
  'linecount': '14',
  'lines': ['From fairest creatures we desire increase,',
   "That thereby beauty's rose might never die,",
   'But as the riper should by time decease,',
   'His tender heir might bear his memory:',
   'But thou contracted to thine own bright eyes,',
   "Feed'st thy light's flame with self-substantial fuel,",
   'Making a famine where abundance lies,',
   'Thy self thy foe, to thy sweet self too cruel:',
   "Thou that art now the world's fresh ornament,",
   'And only herald to the gaudy spring,',
   'Within thine own bud buriest thy content,',
   "And tender churl mak'st waste in niggarding:",
   '  Pity the world, or else this glutton be,',
   "  To eat the world's due, by the grave and thee."],
  'title': 'Sonnet 1: From fairest creatures we desire increase'}]
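
A line search can also match nothing. The sketch below assumes, without confirmation from the poetpy documentation, that an unsuccessful search comes back as a dictionary such as {'status': 404, 'reason': 'Not found'} rather than a list of poems. The helper name poems_or_empty is hypothetical.

```python
def poems_or_empty(result):
    """Return result if it is a list of poem records, otherwise [].

    Hypothetical helper: it treats anything that is not a list
    (for example an assumed not-found dictionary) as "no poems".
    """
    if isinstance(result, list):
        return result
    return []


# A successful search, shaped like the output above (fields trimmed).
found = [{'author': 'William Shakespeare',
          'title': 'Sonnet 1: From fairest creatures we desire increase'}]

# Assumed shape of an unsuccessful search; not confirmed by the source.
not_found = {'status': 404, 'reason': 'Not found'}

for result in (found, not_found):
    poems = poems_or_empty(result)
    print(len(poems), [poem['title'] for poem in poems])
```

Normalising the result this way lets later code loop over the poems without first checking which shape came back.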

Filtering Returned Results

The get_poetry function also provides an output parameter that can filter the data returned from a query. As an example, let's use the previous search but assume we are only interested in returning the author, title, and linecount of the matching sonnet.

In [10]:
get_poetry('lines', 'But thou contracted to thine own bright eyes,', 'author,title,linecount')
Out[10]:
[{'author': 'William Shakespeare',
  'linecount': '14',
  'title': 'Sonnet 1: From fairest creatures we desire increase'}]

Similar to the input_term parameter, the output parameter can be one or any combination of 'author', 'title', 'lines', 'linecount'.
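
Conceptually, the output parameter keeps only the named keys of each record. The following local sketch illustrates that idea on a hard-coded record; select_fields is a hypothetical helper written for this example and does not describe how poetpy or PoetryDB implement the filtering.

```python
def select_fields(poems, output):
    """Keep only the comma-separated fields named in output.

    Hypothetical helper for illustration; fields missing from a
    record are skipped rather than raising an error.
    """
    fields = [field.strip() for field in output.split(',')]
    return [{field: poem[field] for field in fields if field in poem}
            for poem in poems]


# A record shaped like the earlier output; 'lines' is truncated for brevity.
poems = [{
    'author': 'William Shakespeare',
    'linecount': '14',
    'lines': ['From fairest creatures we desire increase,'],
    'title': 'Sonnet 1: From fairest creatures we desire increase',
}]

print(select_fields(poems, 'author,title,linecount'))
```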

Combination Searches

Multiple terms can be specified in the input_term parameter, separated by commas, to combine several search criteria within one API call. Each input term should have a corresponding search_term, with the search terms separated by semicolons in the same order as the input terms. For example, let's say we want to find the full title and the line count of John Milton's poetry with Paradise Lost in the title.

In [11]:
get_poetry('title,author', 'Paradise Lost;Milton', 'title,linecount')
Out[11]:
[{'linecount': 640, 'title': 'Paradise Lost: Book 07'},
 {'linecount': 912, 'title': 'Paradise Lost: Book 06'},
 {'linecount': 907, 'title': 'Paradise Lost: Book 05'},
 {'linecount': 1055, 'title': 'Paradise Lost: Book 02'},
 {'linecount': 650, 'title': 'Paradise Lost: Book 12'},
 {'linecount': 653, 'title': 'Paradise Lost: Book 08'},
 {'linecount': 925, 'title': 'Paradise Lost: Book 11'},
 {'linecount': 798, 'title': 'Paradise Lost: Book 01'},
 {'linecount': 743, 'title': 'Paradise Lost: Book 03'},
 {'linecount': 1188, 'title': 'Paradise Lost: Book 09'},
 {'linecount': 1012, 'title': 'Paradise Lost: Book 04'},
 {'linecount': 1073, 'title': 'Paradise Lost: Book 10'}]
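
The books are not returned in reading order, and the linecount values here are integers, whereas the earlier sonnet output showed a string. A minimal sketch, using the output above as hard-coded data, sorts the books by title and totals their line counts. int() is applied so that either representation works.

```python
# Copied from the output above so the example runs without calling the API.
books = [
    {'linecount': 640, 'title': 'Paradise Lost: Book 07'},
    {'linecount': 912, 'title': 'Paradise Lost: Book 06'},
    {'linecount': 907, 'title': 'Paradise Lost: Book 05'},
    {'linecount': 1055, 'title': 'Paradise Lost: Book 02'},
    {'linecount': 650, 'title': 'Paradise Lost: Book 12'},
    {'linecount': 653, 'title': 'Paradise Lost: Book 08'},
    {'linecount': 925, 'title': 'Paradise Lost: Book 11'},
    {'linecount': 798, 'title': 'Paradise Lost: Book 01'},
    {'linecount': 743, 'title': 'Paradise Lost: Book 03'},
    {'linecount': 1188, 'title': 'Paradise Lost: Book 09'},
    {'linecount': 1012, 'title': 'Paradise Lost: Book 04'},
    {'linecount': 1073, 'title': 'Paradise Lost: Book 10'},
]

# The book numbers are zero-padded, so a plain string sort gives reading order.
ordered = sorted(books, key=lambda book: book['title'])

# int() tolerates linecount arriving as either a string or an integer.
total_lines = sum(int(book['linecount']) for book in ordered)

for book in ordered:
    print(book['title'], book['linecount'])
print('Total lines:', total_lines)
```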

As another example, let's say we are interested in finding all of William Shakespeare's poems and sonnets that are fourteen lines long (a sonnet is traditionally a poem of fourteen lines).

In [13]:
fourteen_lines = get_poetry('author,linecount', 'William Shakespeare;14', 'title,linecount')
len(fourteen_lines)
Out[13]:
152
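
For orientation, the three arguments appear to correspond to path segments of a PoetryDB request. The sketch below assumes the URL layout https://poetrydb.org/<input fields>/<search terms>/<output fields>; build_poetrydb_url is a hypothetical helper written for illustration, and poetpy may construct its requests differently.

```python
from urllib.parse import quote

BASE_URL = 'https://poetrydb.org'  # assumed base URL of the API


def build_poetrydb_url(input_term, search_term=None, output=None):
    """Join the arguments into a PoetryDB-style URL (assumed layout).

    Hypothetical helper, not part of poetpy. Semicolons are left
    unescaped because they separate the individual search terms.
    """
    if output is not None and search_term is None:
        raise ValueError('this sketch only supports output with a search_term')
    parts = [BASE_URL, input_term]
    if search_term is not None:
        parts.append(quote(search_term, safe=';'))
    if output is not None:
        parts.append(output)
    return '/'.join(parts)


print(build_poetrydb_url('author'))
print(build_poetrydb_url('author,linecount', 'William Shakespeare;14',
                         'title,linecount'))
```

Seeing the request in this form makes it clear why the search terms must be listed in the same order as the input terms they belong to.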

Other Resources

The PoetryDB API GitHub page contains information on the implementation and design of the PoetryDB and its API, along with more examples for working with the API (though they are not in Python).
