Book Ingest in Digital.Grinnell
It’s high time this was posted to my blog, but the canonical copy of this document can be found in smb://Storage/LIBRARY/mcfatem/DG-Book-Ingest-Workflow.md.
Valid Book Datastream Structure
I want to begin here by showing what I see as a “proper” working book datastream structure in Digital.Grinnell. The image below is a screen grab of the datastreams from the Grinnell College Yearbook 1961, DG object grinnell:23749:
Creating a Valid Book Structure
One of the biggest problems I have encountered with ingest of books is uploading very large multi-page PDFs. Fortunately, I’ve crafted the following procedure for working around that limitation.
Procedure
On the host workstation (DGDocker1 in the case of Digital.Grinnell) open a command terminal and…
- Mount the .pdf file(s) representing the book(s) into the host’s /mnt/storage folder using something like:
  sudo mount -t cifs -o username=mcfatem //storage.grinnell.edu/MEDIADB/DGIngest /mnt/storage
- Find or create an empty book.pdf file on the host using something like:
  touch ~/book.pdf
Open a browser from your local workstation and…
- Log in to https://digital.grinnell.edu as an admin (Library Staff, for example).
- Navigate your browser to the book’s intended parent collection object in DG.
- Append /manage to the end of the parent object address and press Return.
- Click the link to Add an object to this Collection.
- Choose the Islandora Internet Archive Book Content Model content type.
- Enter necessary MODS metadata and submit the form.
- In the PDF Upload field navigate to the book.pdf file created in Step 2 and upload it. The file should now appear in the isle-apache-dg container as /tmp/book.pdf.
Return to the open command terminal and…
- Copy the book’s actual .pdf file into the isle-apache-dg container like so:
  docker cp /mnt/storage/name-of-book.pdf isle-apache-dg:/tmp/book.pdf
When the docker cp command is finished, return to your browser and…
- Complete the process by clicking Submit at the end of the form.
Now sit back and watch the magic.
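The copy step in the terminal is the one place where a typo in the path can quietly leave the empty placeholder in place, so it may be worth wrapping it in a small guard. What follows is only a sketch under a few assumptions: the container name and target path are the ones used in the procedure above, while the function name stage_book_pdf and the DOCKER override variable are hypothetical conveniences I made up so the function can be dry-run without touching a real container.

```shell
#!/bin/sh
# Sketch: swap the real PDF in for the placeholder inside the container.
# CONTAINER and TARGET are taken from the procedure above.
CONTAINER="isle-apache-dg"
TARGET="/tmp/book.pdf"

# stage_book_pdf is a hypothetical helper name, not part of ISLE or Islandora.
stage_book_pdf() {
  src="$1"
  # Refuse to copy a missing or zero-byte file over the placeholder.
  if [ ! -s "$src" ]; then
    echo "error: $src is missing or empty" >&2
    return 1
  fi
  # DOCKER is an assumed override hook; set DOCKER=echo to print the
  # command instead of running it.
  "${DOCKER:-docker}" cp "$src" "$CONTAINER:$TARGET"
}
```

Something like `stage_book_pdf /mnt/storage/name-of-book.pdf` would then stand in for the bare docker cp command, and `DOCKER=echo stage_book_pdf /mnt/storage/name-of-book.pdf` should just show what would be run.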
Incomplete Book… Orphaned pages
Unfortunately, I also have some “broken” books that have a slew of ingested book pages all pointing to the wrong parent. One of my entries in Trello, for the 1962 Cyclone, relates to one such case:
grinnell:25521 - 25862 [Pages but NO Book!] Empty book object is grinnell:23747
The problem in the case of grinnell:23747 is twofold:
- That book object has no PDF. This condition can be corrected simply by uploading the appropriate PDF file as a new PDF datastream in the book/parent object.
- The pages/children of that book/parent all incorrectly reference grinnell:25520 as their book/parent. Fortunately, there’s an easy fix for this too; see below.
Since grinnell:25520 never ingested properly, all of the pages are effectively “orphans”. In the case of the 1962 Cyclone I used this command running inside the isle-apache-dg container:
  root@28eb71ea69bf:/var/www/html/sites/default# drush -u 1 iduF grinnell:25521-25862 ChangeText --find="grinnell:25520" --replace="grinnell:23747" --dsid=RELS-EXT
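To make the effect of that find/replace a little more concrete, here is a minimal sketch that applies the same substitution to a made-up RELS-EXT fragment. Everything about the sample is an assumption: the two relationship lines are illustrative rather than copied from a real page object, and sed is only standing in for whatever ChangeText does internally, which I am not trying to reproduce here.

```shell
#!/bin/sh
# Hypothetical RELS-EXT fragment for one orphaned page (illustrative only,
# not copied from a real Digital.Grinnell object).
sample='<islandora:isPageOf rdf:resource="info:fedora/grinnell:25520"/>
<fedora:isMemberOf rdf:resource="info:fedora/grinnell:25520"/>'

# Same find/replace pair as the drush command above, applied with sed.
# repoint_parent is a hypothetical name; sed is a stand-in, not the
# actual mechanism behind ChangeText.
repoint_parent() {
  sed 's|grinnell:25520|grinnell:23747|g'
}

fixed=$(printf '%s\n' "$sample" | repoint_parent)
printf '%s\n' "$fixed"
```

The point is simply that every reference to the failed parent in a page’s RELS-EXT ends up pointing at the real book object, which is what makes the pages stop being orphans.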