Analyzing American Cookbooks

Like many people around this time of year, I am faced with deciding what to make for Thanksgiving. Recipes, traditional and modern, fill my head as I decide what my guests will accept in their holiday meal. Recent conversations about my recipes have made me think about the history of American cooking, so for this assignment I decided to look at American cookbooks to see which ingredients and cooking methods were popular during our nation’s early history.

I used Michigan State University’s Feeding America Digital Project to find cookbooks published in America and written by American authors. The site has 76 cookbooks from the late 18th century to the early 20th century, available in multiple download formats. I used seven cookbooks published between 1798 and 1909:

  • American cookery by Amelia Simmons (1798)
  • The frugal housewife by Susannah Carter (1803)
  • The Virginia housewife: or, Methodical cook by Mary Randolph (1838)
  • The American economical housekeeper, and family receipt book by E. A. Howland (1845)
  • The great western cook book: or, Table receipts, adapted to western housewifery by Anna Maria Collins (1857)
  • The Boston cooking-school cook book by Fannie Merritt Farmer (1896)
  • The good housekeeping woman’s home cook book by Good Housekeeping Magazine (1909)

After looking through the different text-mining tools, I decided to use Voyant. I downloaded the cookbooks as .txt files and renamed each one by its date of publication for a cleaner look. I then uploaded the files to Voyant. Here is the breakdown of my corpus:

Most frequent words in the corpus.

The Summary tab gives an overview of the corpus: 7 documents with 381,754 total words and 9,972 unique word forms. The longest document is the 1896 cookbook at 151,148 words, and the shortest is the 1798 cookbook at 14,800 words. The 1798 cookbook has the highest vocabulary density and the 1896 cookbook the lowest. The five most frequent words in the corpus are ‘water’ (3,074), ‘butter’ (2,481), ‘add’ (2,439), ‘salt’ (2,250) and ‘cup’ (2,224). I wanted to see more results, so I moved the ‘Items’ slider at the bottom, which gave me the 25 most frequent words in the corpus. The above chart shows the relative frequency of the top five words. I found it interesting how often the word ‘cup’ was used throughout the corpus. In the 1798 cookbook, ‘cup’ was counted 6 times and ‘cups’ 8 times; in the 1909 cookbook, ‘cup’ was counted 516 times and ‘cups’ 158 times. Granted, the 1909 text is much longer than the 1798 text (by approximately 35,000 words), but I wondered what measurement terms were used in 1798. I searched for common measurement terms and found that the 1798 cookbook used ‘pint’ 68 times, ‘quart’ 94 times, ‘spoonful’ 9 times and ‘ounce’ 12 times. Interestingly, neither ‘teaspoon(s)’ nor ‘tablespoon(s)’ appears in the 1798 or 1803 cookbooks.
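Voyant does these counts for you, but to make the arithmetic behind the Summary tab concrete, here is a minimal Python sketch. It assumes a simple lowercase tokenizer (my own regular expression, not Voyant's, so its numbers won't match Voyant's exactly), treats vocabulary density as unique word forms divided by total words, and normalizes raw counts to occurrences per 10,000 words. The per-10,000 scale and the helper names are my own assumptions; Voyant's charts may use a different scale.

```python
import re
from collections import Counter

# Assumption: a simple lowercase word tokenizer that keeps internal
# apostrophes (e.g. "sear'd"). Voyant's own tokenizer differs.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return WORD.findall(text.lower())


def summarize(text: str, top_n: int = 5) -> dict:
    """Return total words, unique forms, vocabulary density and top terms."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "total_words": total,
        "unique_forms": len(counts),
        # Vocabulary density here = unique forms / total words (assumed definition).
        "vocabulary_density": len(counts) / total if total else 0.0,
        "top_terms": counts.most_common(top_n),
    }


def relative_frequency(raw_count: int, total_words: int, per: int = 10_000) -> float:
    """Normalize a raw count to occurrences per `per` words."""
    return raw_count * per / total_words
```

Using the corpus totals above, `relative_frequency(3_074, 381_754)` gives about 80.5 occurrences of ‘water’ per 10,000 words, and `relative_frequency(6, 14_800)` gives about 4.1 for ‘cup’ in the 1798 cookbook. Normalizing like this is what lets a short text such as the 1798 cookbook be compared fairly with a much longer one such as the 1909 cookbook.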

I then looked at common cooking terms and ran into my first issue. I know modern cooking terms, but which terms were common in the 18th and 19th centuries? The word cloud Voyant produced helped, as did the most frequent terms for each cookbook. After creating my initial list, I had to decide how to enter the terms. Should I include only ‘sear’, or all words beginning with sear (‘sear*’), so I don’t miss the same word in different tenses? Searching for ‘sear’ alone gave me 5 counts, but ‘sear*’ gave 15. I looked them up: 18 ‘sear’ terms were cooking related (one was ‘sear’d’) and the other 2 instances were the word ‘search’. I’m sure that adding ‘*’ to every term has skewed the results a bit, but for now I would rather be inclusive.
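The difference between searching for ‘sear’ and ‘sear*’ can be sketched the same way. In this hypothetical `count_term` helper, a trailing `*` is treated as a prefix wildcard, mirroring how I used it in Voyant, and the result is a `Counter` of every matching word form, so false hits like ‘search’ are visible and can be thrown out by hand. The tokenizer keeps internal apostrophes so that ‘sear’d’ counts as one word; that is an assumption of this sketch, not a description of how Voyant tokenizes.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase letters with optional internal apostrophes.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def count_term(text: str, query: str) -> Counter:
    """Count word forms matching `query`.

    A trailing '*' is treated as a prefix wildcard (like 'sear*');
    otherwise only the exact word form is counted.
    """
    tokens = WORD.findall(text.lower())
    if query.endswith("*"):
        prefix = query[:-1].lower()
        return Counter(t for t in tokens if t.startswith(prefix))
    return Counter(t for t in tokens if t == query.lower())


sample = "Sear the steak, then let it rest. Once sear'd, search the pantry for salt."
exact = count_term(sample, "sear")      # only the form 'sear'
wild = count_term(sample, "sear*")      # 'sear', "sear'd" and the false hit 'search'
```

Printing `wild` lists each matched form separately, which is the step I did by hand in Voyant when I checked which ‘sear*’ hits were really about cooking.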

Cooking Methods

Next, I wanted to focus on food items, so I added measurements and related words to the list of stopwords: add, cup, teaspoon, tablespoon, half, make, bake, pour, hot and dish. The results changed slightly, but some measurement terms still showed up, and I realized that I also had to add the plural forms of words like tablespoon and cup. I considered removing ‘pound’, but there are recipes for pound cake, and the 1798 cookbook calls one pear the ‘Pound Pear’, so I decided to keep it. After these words were added, the most common words in the corpus were ‘water’, ‘flour’, ‘butter’, ‘sugar’, ‘milk’ and ‘eggs’. I then looked at which meats were used and when. ‘Fish’ was the most frequent, with ‘turkey’ and ‘pigeon’ at the bottom.
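The plural problem with stopwords comes from exact matching: a stoplist that contains ‘cup’ does not remove ‘cups’. This sketch (hypothetical helpers `with_naive_plurals` and `top_terms`, tested on a made-up sentence rather than the cookbooks) shows the effect and one naive fix that appends ‘s’ to each stopword. That rule is an assumption and misses irregular plurals such as ‘dishes’ or ‘loaves’, so those would still need to be added by hand.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase letters with optional internal apostrophes.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def with_naive_plurals(words: set[str]) -> set[str]:
    """Return the stopwords plus each word with an 's' appended.

    Deliberately naive: covers 'cup' -> 'cups' and 'tablespoon' ->
    'tablespoons', but not irregular plurals like 'dish' -> 'dishes'.
    """
    return words | {w + "s" for w in words}


def top_terms(text: str, stopwords: set[str], n: int = 6) -> list[tuple[str, int]]:
    """Most frequent words after exact-match stopword removal."""
    tokens = (t for t in WORD.findall(text.lower()) if t not in stopwords)
    return Counter(tokens).most_common(n)


recipe = "Add two cups of flour, one cup of milk and two tablespoons of butter. Add flour."
stop = {"add", "cup", "tablespoon", "of", "and", "one", "two"}

before = top_terms(recipe, stop)                      # 'cups' and 'tablespoons' slip through
after = top_terms(recipe, with_naive_plurals(stop))   # only food words remain
```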

Meat terms in the corpus.

A side part of this project is to see when American cookbooks included what we think of as traditional Thanksgiving fare. I decided to look up pumpkin (pie), cranberry sauce and sweet potato. I also wanted to look up cocoa/chocolate. Looking up pumpkin was interesting: ‘pumpkin’ appears in every cookbook except the 1798 one. Through an internet search, I found out that pumpkin was spelled ‘pompkin’ at that time, and when I searched for that spelling, I found ‘pompkin’ 3 times in the 1798 book, in two variations of a pompkin pudding. By 1803, ‘pompkin’ had changed to ‘pumpkin’, and there was a recipe for a pie. ‘Pumpkin’ came up the most in the 1857 cookbook. Cranberries were first mentioned in 1803, in a recipe for tarts, and cranberry sauce recipes were added over time. Sweet potatoes were first mentioned in 1838, and cocoa/chocolate was also first included in the 1838 cookbook.
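Finding when a dish first shows up, including older spellings like ‘pompkin’, amounts to checking the documents in date order for any of a list of spelling variants. Since I renamed the files by publication year, this sketch assumes a folder of files named like `1798.txt` and keys the documents by year. The helper names are hypothetical, and the sample texts in the test are made up, not quotations from the cookbooks.

```python
import re
from pathlib import Path

# Assumed tokenizer: lowercase letters with optional internal apostrophes.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def load_corpus(folder: str) -> dict[int, str]:
    """Read files named by year (assumed layout, e.g. 1798.txt) into {year: text}."""
    return {int(p.stem): p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")}


def counts_by_year(docs: dict[int, str], variants: list[str]) -> dict[int, int]:
    """Total hits for all spelling variants in each document, oldest first."""
    result = {}
    for year in sorted(docs):
        tokens = WORD.findall(docs[year].lower())
        result[year] = sum(tokens.count(v) for v in variants)
    return result


def first_appearance(docs: dict[int, str], variants: list[str]):
    """Return (year, hits) for the earliest document using any variant, else None."""
    for year, hits in counts_by_year(docs, variants).items():
        if hits:
            return year, hits
    return None


# Matching is exact, so old spellings and plurals must be listed explicitly.
PUMPKIN = ["pompkin", "pompkins", "pumpkin", "pumpkins"]
```

Searching only for `["pumpkin", "pumpkins"]` would skip any text that spells it ‘pompkin’, which is exactly the trap I first fell into with the 1798 cookbook.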


I really enjoyed looking through the cookbooks, but if I were to expand on this project, I would need to do more research about traditional cooking terms and recipes to get more accurate results. My current knowledge of this topic is not enough to decide which terms to focus on or which words I can safely add to the list of stopwords. I would also need a larger collection of cookbooks. Feeding America was a great site for an introduction to text analysis, but it has only 76 cookbooks, and some of them are more guides for women and the home than cookbooks. I need to look into how many cookbooks were actually published during this period. There were also books that focused on Swedish, German or Jewish-American cuisine, but they were published in the late 19th and early 20th centuries. I would like to investigate that topic further: when were ‘immigrant cookbooks’ first published? Voyant was a good fit for this project, and I would recommend it to anyone dipping their toes into text analysis. It was easy to upload the .txt files and fun to play around with the different tools in each section (Scatter Plot, Mandala, etc.). For a bigger project, though, I would have to find out whether Voyant is as easy to use with a much larger corpus.