Book Ingest in Digital.Grinnell
It’s high time this was posted to my blog, but the canonical copy of this document can be found in
Valid Book Datastream Structure
I want to begin here by showing what I see as a “proper” working book datastream structure in Digital.Grinnell. The image below is a screen grab of the datastreams from the Grinnell College Yearbook 1961, DG object
Creating a Valid Book Structure
One of the biggest problems I have encountered with ingest of books is uploading very large multi-page PDFs. Fortunately, I’ve crafted the following procedure for working around that limitation.
On the host workstation (DGDocker1 in the case of Digital.Grinnell) open a command terminal and…
- Mount the .pdf file(s) representing the book(s) into the host’s /mnt/storage folder using something like:
sudo mount -t cifs -o username=mcfatem //storage.grinnell.edu/MEDIADB/DGIngest /mnt/storage
- Find or create an empty book.pdf file on the host using something like:
Open a browser from your local workstation and…
- Log in to https://digital.grinnell.edu as an admin (Library Staff, for example).
- Navigate your browser to the book’s intended parent collection object in DG.
- Append /manage to the end of the parent object address and press return.
- Click the link to Add an object to this Collection.
- Choose the Islandora Internet Archive Book Content Model content type.
- Enter necessary MODS metadata and submit the form.
- In the PDF Upload field navigate to the book.pdf file created in Step 2 and upload it. The file should now appear in the
Return to the open command terminal and…
- Copy the book’s actual .pdf file into the isle-apache-dg container like so:
docker cp /mnt/storage/name-of-book.pdf isle-apache-dg:/tmp/book.pdf
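Because the whole point of this step is to move a very large file, it may be worth confirming that the copy arrived intact before going back to the browser. The following is only a sketch and not part of the original procedure: the stage_book function and the DOCKER and CONTAINER variables are hypothetical names, and the check simply compares byte counts on the host and inside the container.

```shell
# Sketch: copy a PDF into the container, then compare byte counts on both sides.
# DOCKER and CONTAINER are hypothetical overrides introduced for this example.
DOCKER="${DOCKER:-docker}"
CONTAINER="${CONTAINER:-isle-apache-dg}"

stage_book() {
  # $1 = PDF on the host, $2 = destination inside the container (optional)
  sb_src="$1"
  sb_dest="${2:-/tmp/book.pdf}"
  [ -f "$sb_src" ] || { echo "missing source: $sb_src" >&2; return 1; }

  # Same copy as the docker cp command above.
  "$DOCKER" cp "$sb_src" "$CONTAINER:$sb_dest" || return 1

  # Byte count on the host and in the container; tr strips any padding from wc.
  sb_host=$(wc -c < "$sb_src" | tr -d ' ')
  sb_cont=$("$DOCKER" exec "$CONTAINER" sh -c "wc -c < '$sb_dest'" | tr -d ' ')

  if [ "$sb_host" = "$sb_cont" ]; then
    echo "ok: $sb_dest is $sb_host bytes"
  else
    echo "size mismatch: host=$sb_host container=$sb_cont" >&2
    return 1
  fi
}
```

With the function defined, stage_book /mnt/storage/name-of-book.pdf would perform the copy and report the result. A matching byte count is a weak check compared with a checksum, but it is quick and catches a truncated copy.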
Once the docker cp command is finished, return to your browser and…
- Complete the process by clicking Submit at the end of the form.
Now sit back and watch the magic.
Incomplete Book… Orphaned pages
Unfortunately, I also have some “broken” books that have a slew of ingested book pages all pointing to the wrong parent. One of my entries in Trello, for the 1962 Cyclone, relates to one such case:
grinnell:25521 - 25862 [Pages but NO Book!] Empty book object is grinnell:23747
The problem in the case of grinnell:23747 is twofold:
- That book object has no PDF - This condition can be corrected simply by uploading the appropriate PDF file as a new
- The pages/children of that book/parent all incorrectly reference grinnell:25520 as their book/parent - Fortunately, there’s an easy fix for this too; see below.
Because grinnell:25520 never ingested properly, all of the pages are effectively “orphans”. In the case of the 1962 Cyclone I used this command running inside the
root@28eb71ea69bf:/var/www/html/sites/default# drush -u 1 iduF grinnell:25521-25862 ChangeText --find="grinnell:25520" --replace="grinnell:23747" --dsid=RELS-EXT
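To make the effect of that find/replace concrete, here is a small sketch that applies the same substitution to a sample RELS-EXT fragment using sed. The fragment is illustrative rather than copied from a real page object, the repoint_parent function is a hypothetical name, and the assumption (judging only by the option names) is that ChangeText performs a plain text substitution in the named datastream.

```shell
# Illustrative RELS-EXT fragment for one orphaned page (not taken from a real object).
rels_ext='<islandora:isPageOf rdf:resource="info:fedora/grinnell:25520"/>
<fedora:isMemberOf rdf:resource="info:fedora/grinnell:25520"/>'

# Hypothetical helper: rewrite parent references read from stdin.
repoint_parent() {
  # $1 = old parent PID, $2 = new parent PID
  # Anchoring on the info:fedora/ prefix and the closing quote leaves a longer
  # PID such as grinnell:255201 untouched.
  sed "s|info:fedora/$1\"|info:fedora/$2\"|g"
}

printf '%s\n' "$rels_ext" | repoint_parent grinnell:25520 grinnell:23747
```

Both sample relationships end up pointing at grinnell:23747. The sed pattern here is stricter than a bare string match on the PID, which matters if other PIDs happen to begin with the same digits.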