It’s high time this was posted to my blog. The canonical copy of this document can be found at smb://Storage/LIBRARY/mcfatem/DG-Book-Ingest-Workflow.md.

Valid Book Datastream Structure

I want to begin here by showing what I see as a “proper” working book datastream structure in Digital.Grinnell. The image below is a screen grab of the datastreams from the Grinnell College Yearbook 1961, DG object grinnell:23749:

Figure 1 · Valid Book Datastreams

Creating a Valid Book Structure

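One of the biggest problems I have encountered when ingesting books is uploading very large multi-page PDFs. Fortunately, I’ve crafted the following procedure for working around that limitation: upload an empty placeholder PDF through the ingest form, then overwrite that placeholder inside the container with the real file before the final submit.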

Procedure

On the host workstation (DGDocker1 in the case of Digital.Grinnell) open a command terminal and…

  • Mount the network share holding the .pdf file(s) for the book(s) at the host’s /mnt/storage folder using something like: sudo mount -t cifs -o username=mcfatem //storage.grinnell.edu/MEDIADB/DGIngest /mnt/storage
  • Find or create an empty book.pdf file on the host using something like: touch ~/book.pdf

Open a browser from your local workstation and…

  • Login to https://digital.grinnell.edu as an admin (Library Staff for example).
  • Navigate your browser to the book’s intended parent collection object in DG.
  • Append /manage to the end of the parent object address and return.
  • Click the link to Add an object to this Collection.
  • Choose the Islandora Internet Archive Book Content Model content type.
  • Enter necessary MODS metadata and submit the form.
  • In the PDF Upload field navigate to the empty book.pdf file created earlier with touch and upload it. The file should now appear in the isle-apache-dg container as /tmp/book.pdf

Return to the open command terminal and…

  • Copy the book’s actual .pdf file into the isle-apache-dg container like so: docker cp /mnt/storage/name-of-book.pdf isle-apache-dg:/tmp/book.pdf
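The terminal step above can be wrapped in a small script. This is only a minimal sketch: the function names replace_placeholder and run, and the BOOK_CONTAINER, BOOK_TARGET, and DRY_RUN variables, are my own hypothetical names and not part of ISLE or Islandora. It assumes the placeholder upload really did land at /tmp/book.pdf in the container, and it checks for that file before overwriting it.

```shell
#!/bin/sh
# Hypothetical helper, not part of ISLE or Islandora.
# Assumes the placeholder upload is at /tmp/book.pdf inside isle-apache-dg.

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

replace_placeholder() {
  # $1 is the path to the real PDF on the host.
  container="${BOOK_CONTAINER:-isle-apache-dg}"
  target="${BOOK_TARGET:-/tmp/book.pdf}"

  # Confirm the placeholder exists in the container before overwriting it.
  run docker exec "$container" test -f "$target" || {
    echo "placeholder $target not found in $container" >&2
    return 1
  }

  # Overwrite the placeholder with the real PDF.
  run docker cp "$1" "$container:$target"
}
```

Running DRY_RUN=1 replace_placeholder /mnt/storage/name-of-book.pdf prints the two docker commands without touching the container, which is a cheap way to check the paths before the real copy.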

When the docker cp command is finished return to your browser and…

  • Complete the process by clicking Submit at the end of the form.

Now sit back and watch the magic.

Incomplete Book… Orphaned pages

Unfortunately, I also have some “broken” books that have a slew of ingested book pages all pointing to the wrong parent. One of my entries in Trello, for the 1962 Cyclone, relates to one such case:

grinnell:25521 - 25862 [Pages but NO Book!]  Empty book object is grinnell:23747

The problem in the case of grinnell:23747 is twofold:

  1. That book object has no PDF. This condition can be corrected simply by uploading the appropriate PDF file as a new PDF datastream in the book/parent object.
  2. The pages/children of that book/parent all incorrectly reference grinnell:25520 as their book/parent. Fortunately, there’s an easy fix for this too, see below.

Since grinnell:25520 never ingested properly, all of the pages are effectively “orphans”. In the case of the 1962 Cyclone I used this command, run inside the isle-apache-dg container, to replace every grinnell:25520 reference in the pages’ RELS-EXT datastreams with grinnell:23747:

root@28eb71ea69bf:/var/www/html/sites/default# drush -u 1 iduF grinnell:25521-25862 ChangeText --find="grinnell:25520" --replace="grinnell:23747" --dsid=RELS-EXT
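Before running a bulk change like the one above, it can help to confirm exactly which objects the PID range covers. The following is a minimal sketch using a hypothetical helper of my own, pid_range, which is not part of drush or Islandora. It simply expands a namespace and a numeric range into one PID per line.

```shell
#!/bin/sh
# Hypothetical helper, not part of drush or Islandora.
# Prints every PID from namespace:first through namespace:last, one per line.
pid_range() {
  namespace="$1"
  n="$2"
  last="$3"
  while [ "$n" -le "$last" ]; do
    printf '%s:%s\n' "$namespace" "$n"
    n=$((n + 1))
  done
}
```

For the 1962 Cyclone, pid_range grinnell 25521 25862 | wc -l reports 342 page objects, and piping the same output through head or tail gives a few PIDs to spot-check in the browser after the replacement.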