19 December 2010

Playing with the Ngram Viewer

The Google Books Ngram Viewer has been in the news (for example, on Wired). If you want to know how often one or more words or phrases have appeared in the 15 million books Google has scanned, you search for those words or phrases and get a graph showing the occurrences of each (as percentages) over the years, starting, I think, in 1800.
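To make "occurrences as percentages" concrete, here is a minimal Python sketch of the idea. It is only my guess at the general principle, not Google's actual method: the two-year corpus is made up, and the names TOY_CORPUS and yearly_percentage are hypothetical. For each year it counts how many of that year's words match the search word exactly, including capitalization, and divides by the total number of words for that year.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration only):
# year -> all the text "published" in that year.
TOY_CORPUS = {
    1900: "Constantinople is a great city and Constantinople is old",
    1950: "Istanbul was once called Constantinople but Istanbul is the name now",
}


def yearly_percentage(corpus, word):
    """Return {year: percentage of that year's words that equal `word`}.

    Matching is exact, so the search is case sensitive.
    """
    result = {}
    for year, text in corpus.items():
        tokens = text.split()
        if not tokens:
            # Avoid dividing by zero for a year with no text.
            result[year] = 0.0
            continue
        counts = Counter(tokens)
        result[year] = 100.0 * counts[word] / len(tokens)
    return result


if __name__ == "__main__":
    for query in ("Constantinople", "Istanbul", "istanbul"):
        print(query, yearly_percentage(TOY_CORPUS, query))
```

Because the comparison is exact, a lowercase search such as istanbul finds nothing in this toy corpus, while the capitalized Istanbul does.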

This is supposed to be helpful to those who are researching cultural and literary trends. Here is what I got when I searched for Constantinople (blue) and Istanbul (red).

It all makes sense. In the 19th and early 20th centuries, the city in question was known to English writers (and readers) almost exclusively as Constantinople. Its alternative names, especially Istanbul, which its Turkish residents preferred, were not in common circulation outside of Turkey. In the graph we see that the use of Istanbul picked up soon after 1923, when the Turkish Republic was founded, while the occurrences of Constantinople started to decline*.

The Ngram Viewer also has the potential to help historians of science. I got the next graph when I searched for slug (blue) and snail (red).

The occurrences of the word snail in books and hence, presumably, the interest in snails remained roughly the same between 1800 and 1920. An increase followed, and then a decrease to pre-1920 levels. The occurrences of the word slug, on the other hand, increased steadily from 1800 until about 1920. Around 1920 the slug curve also jumped suddenly and then dropped to a steady level.

I don't know what caused the occurrences of the words snail and slug in books to increase for several decades after 1920.

*The original post featured a different graph, the result of a search in which the names Constantinople and Istanbul were not capitalized. After the reader David Winter pointed out in the comments below that the Ngram Viewer is case sensitive, I did a new search with capitalized names and revised the graph and the post accordingly.

David Winter said...

The ngram viewer is case sensitive (I only found out by making a similar error). It seems Constantinople is still in pretty wide circulation.