Tuesday, February 28, 2012

Tools: Bookworm

What is Bookworm?

Bookworm demonstrates a new way of interacting with the millions of recently digitized library books. The Harvard Cultural Observatory already collaborated with Google Books on the Google ngrams viewer that has data for years. Bookworm doesn't work so closely with Google Books: instead, it uses books in the public domain so you can explore the information we know about a book from many angles at once: genre, author information, publication place, and so on. We're submitting it as part of the Digital Public Library of America's Beta Sprint initiative.

What can I do with it?

Library metadata makes all sorts of interesting queries possible. For example:

  • Say you want to know about the history of Social Darwinism: when did "evolution" cross over from the sciences into the social sciences? You can compare the paths of keywords like "evolution" in different genres.

  • You can also use geographical information to make comparisons. Suppose that you want to know whether British or American fiction has more female characters. Searching for female pronouns shows you that American literature does seem to use 'she' a little bit more. But you'll need to do some more searches, and look at some books, to be sure.

  • Although it doesn't work the way you'd expect on multiword phrases, you are able to combine words if you want to search for things like plurals or places that have two names; you can, for example, examine the history of the "long-s" by comparing the usage of the words "fo" and "so" together and apart.

  • You don't have to plot by publication year, either: you can use a number of different variables, including the age of the author when the book was published. Death and taxes may be the only two constants in life, but authors seem to care about them at different ages: the young and old talk more about death, while only the safely middle-aged seem to care about taxes.
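The word-combining idea from the list above can be illustrated with a small Python sketch. This is not Bookworm's own code: the function name `combined_share` and the toy counts are made up for illustration. It only shows the arithmetic of looking at "fo" and "so" together versus "fo" on its own, as a share of all words printed in a given year.

```python
from collections import Counter


def combined_share(counts_by_year, totals_by_year, words):
    """Return {year: fraction of all words in that year that are any of `words`}."""
    shares = {}
    for year, total in totals_by_year.items():
        counts = counts_by_year.get(year, Counter())
        hits = sum(counts[w] for w in words)  # a Counter returns 0 for missing words
        shares[year] = hits / total if total else 0.0
    return shares


# Hypothetical toy data: OCR reads the long-s in "so" as "fo" in older books.
counts_by_year = {
    1790: Counter({"fo": 80, "so": 20}),
    1830: Counter({"fo": 5, "so": 95}),
}
totals_by_year = {1790: 10000, 1830: 10000}

together = combined_share(counts_by_year, totals_by_year, ["fo", "so"])
fo_only = combined_share(counts_by_year, totals_by_year, ["fo"])

for year in sorted(totals_by_year):
    print(year, together[year], fo_only[year])
```

In this made-up data the combined share stays flat while the share of "fo" alone collapses, which is the kind of pattern you would look for when dating the disappearance of the long-s.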

What books does this use?

All of our site builds on the amazing work of the Open Library and Internet Archive projects. The Internet Archive makes scans of books publicly available with Optical Character Recognition already performed. The books come mostly from major research libraries and are scanned by the Internet Archive itself, Google, Microsoft and other scanning initiatives. The Open Library is the Internet Archive's cataloging wing; they hope to create a publicly editable library catalogue with an entry for every book ever published. We try to include all the books available in both the Open Library and the Internet Archive. Currently, that means about 950,000 books. When you build a corpus, you can see exactly how many books you are searching in the construction box.
