About Northwestern Books


Project history and scope

Northwestern University Library (NUL) uses a Kirtas automated book scanner and the services of external vendors to reformat brittle books, digitize special collections, and fulfill targeted patron digitization requests. Before 2007, however, the Library had no way to make these digital book facsimiles available online. In fall 2007, the Andrew W. Mellon Foundation awarded NUL a one-year grant to develop software to gather page scans, apply book structure, prepare OCR and JPEG2000 derivatives, generate technical and preservation metadata, submit the resulting book objects to a Fedora repository, and present a public book viewer with both page-turning and search capabilities. The public book viewing site, Northwestern Books (books.northwestern.edu), became available in beta preview on April 6, 2009. The software developed through the Mellon-funded Mounting Books project is now available as open source via SourceForge.net (summer 2009).


Help with the online book viewer

The Northwestern Books home page contains a simple search mechanism; enter any word and click the Search button to find the word in metadata or in the full text of a book.

Clicking Search without entering a term retrieves all currently available books. To limit a search to one field, use the "Anywhere" drop-down menu to search only the Title, Author, Publisher, Subject or full Text. To build a more focused search, click the "Add more search terms" button and combine terms with the And, Or and Not operators.
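
The site's query code is not reproduced here, so the following is only a minimal sketch, assuming the SOLR index exposes one field per option in the "Anywhere" menu. The field names in `FIELD_MAP` and the functions `build_query` and `_term_clause` are hypothetical. It shows how a fielded And/Or/Not search might be turned into a query string in the standard Lucene/SOLR syntax, with an empty search mapped to `*:*`, which matches every indexed book.

```python
# Hypothetical mapping from the "Anywhere" menu to SOLR field names.
FIELD_MAP = {
    "Anywhere": None,        # no prefix: search the default field(s)
    "Title": "title",
    "Author": "author",
    "Publisher": "publisher",
    "Subject": "subject",
    "Text": "fulltext",
}

# Characters with special meaning in the standard Lucene/SOLR query syntax.
SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def _term_clause(scope: str, term: str) -> str:
    """Turn one search row into a query clause, quoting multi-word phrases."""
    term = term.strip()
    if any(ch.isspace() for ch in term):
        value = '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'
    else:
        value = "".join("\\" + ch if ch in SPECIAL else ch for ch in term)
    field = FIELD_MAP[scope]
    return f"{field}:{value}" if field else value


def build_query(first=None, more=()):
    """first is (scope, term); more is a sequence of (operator, scope, term)."""
    if first is None or not first[1].strip():
        return "*:*"  # empty search: match every indexed book
    query = _term_clause(*first)
    for op, scope, term in more:
        op = op.upper()
        if op not in ("AND", "OR", "NOT"):
            raise ValueError(f"unsupported operator: {op}")
        # Parenthesize what came before so rows combine left to right.
        query = f"({query}) {op} {_term_clause(scope, term)}"
    return query
```

For example, a search for "fire" anywhere, And "Chicago" in Subject, Not "smith" in Author becomes `((fire) AND subject:Chicago) NOT author:smith`.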

Once a book has been retrieved, thumbnails of all its pages are displayed.

By default, the first 100 thumbnails are shown; if the book is longer than 100 pages, use the navigation controls at the very bottom of the screen to reach the rest.

These controls can be used to jump ahead through thumbnails, or to change the number of thumbnails shown per page.
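
The thumbnail paging described above reduces to simple arithmetic. This sketch assumes 1-based page numbering and uses the 100-thumbnail default mentioned above; the function names `page_count` and `thumbnails_on`, and the word "screen" for one page of thumbnails, are hypothetical and not taken from the viewer's code.

```python
import math


def page_count(total_pages: int, per_page: int = 100) -> int:
    """Number of thumbnail screens needed to show every page of a book."""
    if total_pages < 0 or per_page < 1:
        raise ValueError("need total_pages >= 0 and per_page >= 1")
    return max(1, math.ceil(total_pages / per_page))


def thumbnails_on(screen: int, total_pages: int, per_page: int = 100) -> range:
    """Book page numbers (1-based) whose thumbnails appear on a given screen."""
    if not 1 <= screen <= page_count(total_pages, per_page):
        raise ValueError(f"screen {screen} out of range")
    start = (screen - 1) * per_page + 1
    end = min(screen * per_page, total_pages)  # last screen may be partial
    return range(start, end + 1)
```

A 250-page book therefore needs three screens at the default setting, the last showing pages 201 to 250; changing the number shown per page to 50 raises that to five screens.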

To see a page at full size, click its thumbnail. The page opens, and a new set of buttons appears at the top of the screen; use these buttons to adjust your view or return to the panel of thumbnails.

The "Raw Text" button displays the uncorrected text generated by the Optical Character Recognition (OCR) software; this is the text used by the book search feature. The "Zoom In" and "Zoom Out" tools present the page at a limited number of preset zoom levels. The "Use Flash" button calls up a more sophisticated viewer that supports dynamic zooming and panning and can also rotate the page to any angle.

Once activated, the "Use Flash" viewer remains the default for page viewing until the button is clicked again to deactivate it.

To move forward or backward through a book one page at a time, use the navigation controls at the bottom of the screen.

These are the same navigation controls used in the thumbnail view, but because this view shows one page at a time, the "Items per page" option is greyed out and unavailable.

The navigation panel on the left-hand side of the screen can be hidden to maximize the page viewing area; click the small double-arrow icon to hide or reveal it.

The left-hand navigation panel also offers Search inside this book and Explore this Book features. If page numbering or structural elements such as a Table of Contents and chapters have been added to a book, they can be viewed in Explore this Book.

The Link to this Book and Download/Print this Book features have not yet been fully implemented as of April 2009; see Known issues and future development (below) for more information.


The Mounting Books software

The Mounting Books software is a suite of web-based tools for publishing book objects to a Fedora repository and presenting a public book viewer. The suite has two main components: the Book Workflow Interface and the Book Viewer. The software will be available for download on June 1, 2009.

Book Workflow Interface

The Book Workflow Interface (also known as BWI, and codenamed "Crabcake" by the project team) includes a jBPM workflow management suite that manages the book digitization process, moving scans from initial capture on the Kirtas scanner through a series of post-processing and quality review steps. The jBPM process culminates in an elegant book structure tool, the Book Builder, which uses drag-and-drop in the web browser to arrange and number pages and to add structure (chapters, sections, etc.) to digitized books. A Queue Server accepts page images and submits them to conversion tools that prepare OCR text, JPEG2000 derivatives, and technical metadata. The final step of the BWI process generates permanent Handle URLs, submits the book objects to a Fedora repository, and updates a SOLR index for searching. The BWI can also import pages scanned elsewhere, a feature used both for vendor-digitized books and for foldouts scanned separately.
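
The BWI drives this process with jBPM, a Java workflow engine. The Python sketch below does not use jBPM; it only illustrates the stage sequence described above as a simple linear state machine. The stage names, the `BookJob` class, and the placement of the Queue Server conversion after the Book Builder step are assumptions, and imported vendor scans and foldouts are assumed to enter at the same first stage as Kirtas captures.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Assumed ordering of the BWI steps described above.
    CAPTURED = "captured"      # scans from the Kirtas scanner, or imported from elsewhere
    REVIEWED = "reviewed"      # post-processing and quality review complete
    STRUCTURED = "structured"  # Book Builder: pages arranged, numbered, chapters/sections added
    CONVERTED = "converted"    # Queue Server: OCR, JPEG2000 derivatives, technical metadata
    PUBLISHED = "published"    # Handle URL generated, Fedora ingest, SOLR index updated


ORDER = list(Stage)  # Enum iteration follows definition order


@dataclass
class BookJob:
    book_id: str
    stage: Stage = Stage.CAPTURED
    history: list[Stage] = field(default_factory=list)

    def advance(self) -> Stage:
        """Move the book to the next stage; a published book cannot advance further."""
        i = ORDER.index(self.stage)
        if i == len(ORDER) - 1:
            raise ValueError(f"{self.book_id} is already published")
        self.history.append(self.stage)
        self.stage = ORDER[i + 1]
        return self.stage
```

For example, `BookJob("demo").advance()` returns `Stage.REVIEWED`. A real workflow engine adds what this sketch leaves out: branching for failed quality review, human task assignment, and persistence of process state.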

Book Viewer

The Book Viewer software provides SOLR-based searching of fielded metadata and full text. It uses the Aware JPEG2000 software to present a zoomable view of book pages and a page-turning interface. See Help with the online book viewer, above, for more on the viewer's features.
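
A "Search inside this book" request could be expressed as a SOLR `select` query. This is a minimal sketch assuming a hypothetical SOLR core at `http://localhost:8983/solr/books` in which each page is indexed as its own document with `book_id`, `page_number` and `page_text` fields; none of these names come from the Mounting Books code. The `q`, `df`, `fq`, `fl`, `hl`, `hl.fl`, `rows` and `wt` parameters are standard SOLR request parameters.

```python
from urllib.parse import urlencode

# Hypothetical SOLR endpoint and core name.
SOLR_SELECT = "http://localhost:8983/solr/books/select"


def search_inside_book_url(book_id: str, words: str, rows: int = 20) -> str:
    """Build a SOLR select URL for pages of one book whose text matches `words`."""
    params = {
        "q": words,                      # the user's search words
        "df": "page_text",               # default field: the page's OCR text
        "fq": f'book_id:"{book_id}"',    # filter: restrict hits to a single book
        "fl": "id,page_number",          # return just enough to jump to the page
        "hl": "true",                    # ask SOLR to highlight matches...
        "hl.fl": "page_text",            # ...in the OCR text field
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"
```

Putting the book restriction in `fq` rather than `q` keeps it out of relevance scoring and lets SOLR cache the filter across searches within the same book.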

External software components

The Mounting Books software also employs a number of external software components. Depending on the needs and development capacity of a given library or institution, these components may be replaceable. For example, an institution that uses a different OCR engine or JPEG2000 viewer could substitute it by modifying the BWI code. External components include:

  • Perl
  • MySQL
  • jBPM
  • Ext JS
  • Fedora digital repository
  • SOLR
  • ABBYY OCR engine
  • Aware JPEG2000 software
  • Yaz Proxy Z39.50 gateway
  • Handle permanent URL system
  • Voyager Processing Server/Crabcake Assistant (Voyager OPAC link publisher)
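
One way to keep a component such as the OCR engine replaceable is to hide it behind a small interface. The BWI is described only as being adaptable by modifying its code, so the following Python sketch is a hypothetical design, not the BWI's actual structure: `OcrEngine`, `FixedTextOcr` and `ocr_book` are invented names, and a real adapter would call ABBYY (or another engine) inside `recognize`.

```python
from typing import Iterable, Protocol


class OcrEngine(Protocol):
    """Hypothetical interface any OCR adapter must satisfy."""

    def recognize(self, image_path: str) -> str:
        """Return the recognized text for one page image."""
        ...


class FixedTextOcr:
    """Stand-in engine that returns the same text for every page (useful in tests)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def recognize(self, image_path: str) -> str:
        return self.text


def ocr_book(page_images: Iterable[str], engine: OcrEngine) -> dict[str, str]:
    """Run the chosen engine over every page image; the caller never names a vendor."""
    return {path: engine.recognize(path) for path in page_images}
```

Swapping engines then means passing a different object to `ocr_book`; the rest of the pipeline is unchanged.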

Known issues and future development

Between the beta preview launch (April 6, 2009) and the first release of the Mounting Books software, the development team will be working on a limited number of known issues. Additional development and features may be added in the future, either by the Northwestern/OSS teams or by external users. A brief list of possible enhancements appears under Possible future development, below.

Known issues

Possible future development


Credits

Financial support for the Mounting Books project was provided by the Andrew W. Mellon Foundation's Scholarly Communications program.

Open Sky Solutions and James Chartrand of OSS were the primary software developers for this project.

Northwestern University Library project team:

Claire Stewart, project manager, Head, Digital Collections

Steve DiDomenico, lead library developer, Repository Architect, NUL Technology Division

Stu Baker, Associate University Librarian for Information Technology

Paul Clough, Kirtas production supervisor, Digital Collections

Karen Miller, Cataloger and metadata specialist, Bibliographic Services

Bill Parod, Repository Architect, NUL Technology Division

Julie Patton, Digital Projects Librarian, Digital Collections

Dan Zellner, Multimedia Services Specialist and head of production, Digital Collections

Scott Devine, Head of Preservation, served as Principal Investigator and managed financial reports and contracts for the project.

Northwestern University's Academic & Research Technologies division provided the Flash viewer for the public interface.

Anne Karle-Zenith and the staff at the University of Michigan provided information and assistance relating to copyright research.


Frequently Asked Questions (coming soon)

As questions are submitted, they will be gathered here.

Please direct questions about this project or the digitized books found on this site to digitalcollections@northwestern.edu.


News, events, publications and presentations

Book Workflow Interface software now available on SourceForge (summer 2009)

Open Repositories 2009. Mounting Books Project presentation, May 19, 2009.

CNI Spring 2009 Task Force Meeting, project briefing, April 7, 2009. Google Docs presentation.


Trivia and fun stuff

Why Crabcake?

Stu Baker, Associate University Librarian for Information Technology, deserves the credit for the codename "Crabcake." The logic was: Book Workflow Interface = BWI = airport code for Baltimore = Maryland = home of the delicious blue crab = crabcakes. Genius!