- Project history and scope
- Help with the online book viewer
- The Mounting Books software
- Known issues and future development
- Frequently Asked Questions (coming soon)
- News, events, publications and presentations
Project history and scope
Northwestern University Library (NUL) uses a Kirtas automated book scanner and the services of external vendors to reformat brittle books, digitize special collections, and fulfill targeted patron digitization requests. Prior to 2007, however, the Library lacked a mechanism for making digital book facsimiles available online. In fall 2007, the Andrew W. Mellon Foundation awarded NUL a one-year grant to develop software to gather page scans, apply book structure, prepare OCR and JPEG2000 derivatives, generate technical and preservation metadata, submit these book objects to a Fedora repository, and present a public book viewer with both page turning and search capabilities. The public book viewing site, Northwestern Books (books.northwestern.edu), became available in beta preview on April 6, 2009. The software developed through the Mellon-funded Mounting Books project is now available as open source via SourceForge.net (summer 2009).
Help with the online book viewer
The Northwestern Books home page contains a simple search mechanism; enter any word and click the Search button to find the word in metadata or in the full text of a book.
Clicking Search without entering a term will retrieve all currently available books. Searches can be narrowed using the "Anywhere" drop-down menu to find a word only in the Title, Author, Publisher, Subject or full Text. To combine several terms with And, Or and Not operators, click the "Add more search terms" button.
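The way fielded terms and operators combine can be sketched in a few lines of Python. This is only an illustration of the idea, not the Northwestern Books implementation: the function name `build_query`, the `FIELDS` mapping and the lowercase index field names are all assumptions made for the example. The output uses Lucene-style query syntax of the kind accepted by a SOLR index, where `*:*` matches every document.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from the drop-down labels to index field names.
# The real field names used by the site are not documented here.
FIELDS: Dict[str, Optional[str]] = {
    "Anywhere": None,  # no field prefix: search the default field
    "Title": "title",
    "Author": "author",
    "Publisher": "publisher",
    "Subject": "subject",
    "Text": "text",
}


def _quote(word: str) -> str:
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(terms: List[Tuple[str, str, str]]) -> str:
    """Build a Lucene-style query from (operator, field label, word) rows.

    The operator of the first non-empty row is ignored. An empty search
    returns the match-all query, mirroring "Search with no term entered".
    """
    parts: List[str] = []
    for operator, label, word in terms:
        word = word.strip()
        if not word:
            continue  # skip rows the user left blank
        field = FIELDS[label]
        clause = f"{field}:{_quote(word)}" if field else _quote(word)
        if parts:
            parts.append(operator.upper())  # AND, OR or NOT
        parts.append(clause)
    return " ".join(parts) if parts else "*:*"


if __name__ == "__main__":
    print(build_query([("", "Title", "whale"), ("Not", "Author", "melville")]))
    # prints: title:"whale" NOT author:"melville"
```

Each row of the "Add more search terms" form corresponds to one tuple, so a search with no rows filled in falls back to retrieving everything.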
Once a book has been retrieved, thumbnails of all pages will be displayed.
By default, the first 100 thumbnails are shown; if the book is longer than 100 pages, use the navigation controls at the very bottom of the screen.
These controls can be used to jump ahead through thumbnails, or to change the number of thumbnails shown per page.
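As a rough illustration of how the thumbnail screens divide up a book, the arithmetic can be written as a small Python function. The name `thumbnail_ranges` and its parameters are hypothetical; only the default of 100 thumbnails per screen comes from the description above.

```python
from typing import List, Tuple


def thumbnail_ranges(total_images: int, per_page: int = 100) -> List[Tuple[int, int]]:
    """Return the (first, last) image numbers shown on each thumbnail screen."""
    if total_images < 0 or per_page < 1:
        raise ValueError("total_images must be >= 0 and per_page >= 1")
    return [
        (start, min(start + per_page - 1, total_images))
        for start in range(1, total_images + 1, per_page)
    ]


if __name__ == "__main__":
    # A 250-image book at the default setting needs three screens.
    print(thumbnail_ranges(250))
    # prints: [(1, 100), (101, 200), (201, 250)]
```

Changing the "Items per page" setting corresponds to passing a different `per_page` value, for example `thumbnail_ranges(250, per_page=50)` for five screens.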
To see a page at full size, simply click its thumbnail. The page will open at full size, and a new set of buttons will appear at the top. Use these buttons to adjust your view or return to the panel of thumbnails.
The "Raw Text" button displays the uncorrected text generated by the Optical Character Recognition (OCR) software. This is the text that is used in the book search feature. The "Zoom In" and "Zoom Out" tools present the page at a limited number of pre-selected zoom levels. The "Use Flash" button calls up a more sophisticated tool that supports dynamic zooming and panning, and can also rotate the page to any angle.
Once the "Use Flash" feature has been activated, it will remain the default option for page viewing until it is clicked a second time to deactivate it.
You may navigate forwards and backwards within a book, page by page, using the navigation controls at the bottom of the screen.
Note that these are the same navigation buttons used in the thumbnail view, but because this view shows one page at a time, the "Items per page" option is greyed out and unavailable.
The navigation panel on the left-hand side of the screen can be hidden to maximize the page viewing area. Click the small double-arrow icon to hide or reveal the navigation panel.
Additional features available within the left-hand navigation panel include a Search inside this book feature and an Explore this Book feature. If page numbering or structural elements such as a Table of Contents and chapters have been added to the book, these elements will be viewable in Explore this Book.
The Link to this Book and Download/Print this Book features have not yet been fully implemented as of April 2009; see Known issues and future development (below) for more information.
The Mounting Books software
The Mounting Books software is a suite of web-based tools for publishing book objects to a Fedora repository and presenting a public book viewer. There are two main components to the suite: the Book Workflow Interface and the Book Viewer. The software will be available for download on June 1, 2009.
Book Workflow Interface
The Book Workflow Interface (also known as BWI, and codenamed "Crabcake" by the project team) includes a jBPM workflow management suite that manages the book digitization process, moving scans from initial capture on the Kirtas scanner through a series of post-processing and quality review steps. The jBPM process culminates in an elegant book structure tool, the Book Builder, which uses drag-and-drop inside the web browser to support page arrangement and numbering and to add structure to digitized books (chapters, sections, etc.). A Queue Server accepts page images and submits them to conversion tools that prepare OCR, JPEG2000 derivatives, and technical metadata. The final step of the BWI process generates permanent Handle URLs, submits the book objects to a Fedora repository and updates a SOLR index for searching. The BWI can also import page scans made elsewhere, a feature used both for vendor-digitized books and for foldouts scanned separately.
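The overall flow described above can be pictured as an ordered list of stages that each book job passes through. The Python sketch below is a simplified model for illustration only: it is not the jBPM process definition used by the BWI, the stage names and the `BookJob` class are invented for the example, and the exact ordering of the derivative step relative to the Book Builder is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative stage names, loosely following the description of the BWI.
# The ordering is an assumption, not the project's actual process definition.
STAGES: List[str] = [
    "capture",              # scans from the Kirtas scanner, or imported scans
    "post_processing",
    "quality_review",
    "book_builder",         # page arrangement, numbering, chapters and sections
    "derivatives",          # OCR, JPEG2000, technical metadata via the Queue Server
    "assign_handle",        # permanent Handle URL
    "submit_to_fedora",
    "update_search_index",  # SOLR index used by the public viewer
]


@dataclass
class BookJob:
    """A single book moving through the stages in order."""

    title: str
    completed: List[str] = field(default_factory=list)

    @property
    def next_stage(self) -> Optional[str]:
        """Name of the next pending stage, or None once the job is finished."""
        if len(self.completed) < len(STAGES):
            return STAGES[len(self.completed)]
        return None

    def advance(self) -> str:
        """Mark the next pending stage as done and return its name."""
        stage = self.next_stage
        if stage is None:
            raise RuntimeError("job has already completed every stage")
        self.completed.append(stage)
        return stage


if __name__ == "__main__":
    job = BookJob("Sample book")
    while job.next_stage is not None:
        print("completed:", job.advance())
```

A workflow engine adds much more than this model shows, such as operator assignments and the ability to send a job back to an earlier step, but the linear backbone is the same.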
Book Viewer
The Book Viewer presents a SOLR-based search over fielded metadata and full text. It uses the Aware JPEG2000 software to present a zoomable view of book pages and a page-turning interface. See the Help section, above, for further explanation of the features of the book viewer.
External software components
The Mounting Books software also employs a number of external software components. Depending on the needs and development capacity of a given library or institution, these components may be replaceable. For example, an institution could substitute a different OCR engine or a different JPEG2000 viewer by modifying the BWI code. External components include:
- Ext JS
- Fedora digital repository
- ABBYY OCR engine
- Aware JPEG2000 software
- Yaz Proxy Z39.50 gateway
- Handle permanent URL system
- Voyager Processing Server/Crabcake Assistant (Voyager OPAC link publisher)
Known issues and future development
Between the beta preview launch (April 6, 2009) and the first release of the Mounting Books software, the development team will be working on a limited number of known issues, listed first below. Additional features may be added in the future, either by the Northwestern/OSS teams or by external users; a brief list of possible enhancements follows under Possible future development.
- Add a link to full metadata in Voyager OPAC
- Minor fixes to display of metadata on public books site: suppress empty fields, better subject heading handling, etc.
- Add PDF download feature, including user-specified page range
- Improve function of Explore this book navigation; clicking a folder should open it or limit the thumbnail views
- Improve default labeling of pages: possibly change from "Page x" to "Image x" to reduce confusion between numbered pages and unnumbered but sequential image scans.
- Statistical reports on book jobs
- Add preservation metadata (PREMIS) capture to the BWI system for digital life cycle tracking
- Improve OCR performance and the handling of languages and non-Roman alphabets
- Add capability to copy original scans back to the processing workstation to fix QC errors as needed
- Streamline jBPM system to automatically advance through selected steps where operator intervention is not strictly necessary
- Improve page numbering/tagging features in the Book Builder by adding all features to the right-click menu
Possible future development
- Submit books to Google Books/HathiTrust
- Enhance access controls to encompass a broader range of rules and copyright situations
- Expand jBPM system to support more sophisticated text structure and non-text workflow processes
- Extend Book Building features to subject specialists, catalogers, and, potentially, to faculty and other scholars
- Integrate METS/ALTO or TEI features for deeper tagging and structure of text objects
Financial support for the Mounting Books project was provided by the Andrew W. Mellon Foundation's Scholarly Communications program.
Open Sky Solutions (OSS) and James Chartrand of OSS were the primary software developers for this project.
Northwestern University Library project team:
Claire Stewart, project manager, Head, Digital Collections
Steve DiDomenico, lead library developer, Repository Architect, NUL Technology Division
Stu Baker, Associate University Librarian for Information Technology
Paul Clough, Kirtas production supervisor, Digital Collections
Karen Miller, Cataloger and metadata specialist, Bibliographic Services
Bill Parod, Repository Architect, NUL Technology Division
Julie Patton, Digital Projects Librarian, Digital Collections
Dan Zellner, Multimedia Services Specialist and head of production, Digital Collections
Scott Devine, Head of Preservation, served as Principal Investigator and managed financial reports and contracts for the project
Northwestern University's Academic & Research Technologies division provided the Flash viewer for the public interface
Anne Karle-Zenith and the staff at the University of Michigan provided information and assistance relating to copyright research
Frequently Asked Questions
As questions are submitted, they will be gathered here.
Please direct questions about this project or the digitized books found on this site to email@example.com.
News, events, publications and presentations
Open Repositories 2009. Mounting Books Project presentation, May 19, 2009.
Stu Baker, Associate University Librarian for Information Technology, deserves the credit for the codename "Crabcake." The logic was: Book Workflow Interface = BWI = airport code for Baltimore = Maryland = home of the delicious blue crab = crabcakes. Genius!