Attention: On 21-May-2020 an optional, but recommended, sixth step was added to this workflow in the form of a new Drush command: islandora_mods_post_processing, an addition to my previous work in islandora_mods_via_twig. See my new post, Islandora MODS Post Processing for complete details.
A 5-Step Workflow
This document is a follow-up, with technical details, to Exporting, Editing, & Replacing MODS Datastreams, post 069, in my blog. In case you missed it, that post was written specifically for metadata editors working on the 2020 Grinnell College Libraries review of Digital Grinnell MODS metadata.
Attention: This document uses a shorthand ./ in place of the frequently referenced //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/ directory. For example, ./social-justice is equivalent to the Social Justice collection sub-directory at //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/social-justice.
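The shorthand described above can be illustrated with a small helper. This is only a minimal sketch in Python; the function name expand_shorthand and the constant REVIEW_ROOT are hypothetical and are not part of any tool used in this workflow.

```python
# Full path that the "./" shorthand stands for in this document.
REVIEW_ROOT = "//STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1"


def expand_shorthand(path: str) -> str:
    """Expand a leading './' into the full metadata review directory path."""
    if path.startswith("./"):
        return REVIEW_ROOT + "/" + path[2:]
    # Anything that does not use the shorthand is returned unchanged.
    return path


print(expand_shorthand("./social-justice"))
```

Paths that do not begin with ./ pass through untouched, so the helper is safe to apply to already-expanded paths.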
Briefly, the five steps in this workflow are:
Export of all grinnell:* MODS datastreams using drush islandora_datastream_export. This step, last performed on April 14, 2020, was responsible for creating all of the grinnell_<PID>_MODS.xml exports found in ./<collection-PID>.
Execute my Map-MODS-to-MASTER Python 3 script on iMac MA8660 to create a mods.tsv file for each collection, along with associated grinnell_<PID>_MODS.log and grinnell_<PID>_MODS.remainder files for each object. The resultant ./<collection-PID>/mods.tsv files are tab-separated-value (.tsv) files, and they are key to this process.
Use drush islandora_mods_via_twig in each ready-for-update collection to generate new .xml MODS datastream files. For a specified collection, this command will find and read the ./<collection-PID>/mods-imvt.tsv and create one ./<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file for each object.
Execute the drush islandora_datastream_replace command once for each collection. This command will process each ./<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file and replace the corresponding object’s MODS datastream with the contents of the .xml file. The digital_grinnell branch version of the islandora_datastream_replace command also performs an implicit update of the object’s “Title”, a transform of the new MODS to DC (Dublin Core), and a re-indexing of the new metadata in Solr.
The remainder of this document provides technical details, frequently in the form of command lines used to build and use the aforementioned tools.
Step 1a - Installation of Drush islandora_datastream_export and islandora_datastream_replace Commands
Local tests of these commands were successful, so I proceeded to install them in the production instance of Digital Grinnell at dgdocker1.grinnell.edu. Before doing that I needed to change the definition of Apache to reflect the production instance of our Apache container, like so: Apache=isle-apache-dg.
Created a Fork of Islandora Datastream Replace
I also chose to “fork” the islandora_datastream_replace project so that I could do a little Digital.Grinnell customization of it. The fork I’m working with is here and my work is limited to the digital_grinnell branch of that fork.
In the digital_grinnell branch I modified the behavior of the islandora_datastream_replace command so that it implicitly performs an UpdateFromMODS operation that lives in our idu, or Islandora Drush Utilities, module. The UpdateFromMODS operation, performed immediately after each datastream replace operation, does the following:
Updates the object “Title”, one of its properties, to match the new value of /mods:mods/mods:titleInfo[not(@type)]/mods:title.
Invokes the iduF DCTransform operation which runs the default XSLT transform of the new MODS to DC (Dublin Core) and creates a new “DC” datastream for the object.
The iduF DCTransform operation also concludes with an implicit iduF IndexSolr operation to ensure that the new object metadata is properly indexed in Solr.
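To make the title lookup concrete, here is a minimal sketch in Python of selecting the value that /mods:mods/mods:titleInfo[not(@type)]/mods:title points to. It is not the idu module's code (that module is a Drush utility); the function name primary_title and the sample record are hypothetical. Python's xml.etree.ElementTree does not support the not() predicate, so the sketch filters out typed titleInfo elements by hand and assumes the first untyped one is the title of interest.

```python
import xml.etree.ElementTree as ET

# Standard MODS version 3 namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def primary_title(mods_xml):
    """Return the title of the first titleInfo that has no 'type' attribute."""
    root = ET.fromstring(mods_xml)  # root is expected to be the mods:mods element
    for title_info in root.findall("mods:titleInfo", MODS_NS):
        if "type" in title_info.attrib:
            continue  # skip alternative, translated, and other typed titles
        title = title_info.find("mods:title", MODS_NS)
        if title is not None and title.text:
            return title.text.strip()
    return None


# Hypothetical sample record, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo type="alternative"><title>Alt Title</title></titleInfo>
  <titleInfo><title>Main Title</title></titleInfo>
</mods>"""

print(primary_title(SAMPLE))
```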
Step 1b - Installation of Drush islandora_datastream_export and islandora_datastream_replace Commands in Production
To install the commands in production I opened a terminal to dgdocker1.grinnell.edu as user islandora and executed the following commands there:
Attention! This step, and some that come later, will require that the network storage path //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1 be accessible to our production instance of Digital.Grinnell. To make that possible I had to run this sequence on DGDocker1:
Unfortunately, the islandora_datastream_export results in my local test were woefully incomplete… NONE of the child objects with a compound parent were exported. I’m still not entirely sure why child objects were omitted, since the query I used should have captured all objects. In testing I did find that this seems to be a flaw in the islandora_datastream_export command, and specifically in its implementation of any Solr query.
Fortunately, the aforementioned command also has a SPARQL query option, and after some trial-and-error I got it to work properly. To do so I created an export.sh bash script, shown below, and used it on dgdocker1.grinnell.edu like so:
In the case of the Digital Grinnell social-justice collection, for example, this script produced 32 .xml files, the correct number. Each collection’s set of exported .xml files can be found in the collection-specific subdirectory of //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/ and all have filenames of the form: grinnell_<PID>_MODS.xml. Note that objects which have no MODS datastream were not exported.
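The filename convention can be read back into an object PID and datastream ID. The following is a minimal sketch in Python, assuming the <PID> portion of the filename is the numeric part of a PID such as grinnell:1234; the function name parse_export_name and the sample filename are hypothetical.

```python
import re

# Matches names of the form <namespace>_<number>_<DSID>.xml,
# for example grinnell_1234_MODS.xml (sample name is hypothetical).
EXPORT_NAME = re.compile(r"^(?P<namespace>[^_]+)_(?P<number>\d+)_(?P<dsid>[^_.]+)\.xml$")


def parse_export_name(filename):
    """Return (pid, dsid) for an exported datastream filename, or None."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        return None
    pid = match.group("namespace") + ":" + match.group("number")
    return pid, match.group("dsid")


print(parse_export_name("grinnell_1234_MODS.xml"))
```

A helper like this could be used to count the exported files per collection and compare that count against the expected number of objects.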
Step 2 - Map-MODS-to-MASTER Python 3 Script
The Map-MODS-to-MASTER script was developed, in Python 3, on iMac MA8660 at ~/GitHub/Map-MODS-to-MASTER to facilitate generation of mods.tsv and accompanying .log files for each Digital Grinnell collection from the .xml files found in subdirectories of //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/.
The Map-MODS-to-MASTER project can be found in the master branch of https://github.com/DigitalGrinnell/Map-MODS-to-MASTER. I chose to execute it using PyCharm from iMac MA8660 since the directory holding all of the .xml files and folders is already mapped to /Volumes/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1 on that iMac. Note that this //STORAGE location was chosen because the ./ALLSTAFF directory, and its subordinates, are accessible to all staff in the Grinnell College Libraries.
It should not be necessary to run this script ever again… NEVER. However, if it becomes necessary to look back at this code and process, details can be found in Map-MODS-to-MASTER. Note: If it should ever become necessary to repeat the Map-MODS-to-MASTER process, it might be wise to look at replacing the Python 3 script with a new Drush command, maybe islandora_map_mods_to_master, written in PHP and installed directly into the production instance of Digital.Grinnell.
As each individual collection mods-imvt.tsv file is made ready-for-update, it will be necessary to run a drush islandora_mods_via_twig command to process the .tsv data. Running --help with that command produces:
[islandora@dgdocker1 ~]$ docker exec -it isle-apache-dg bash
root@122092fe8182:/# cd /var/www/html/sites/default/
root@122092fe8182:/var/www/html/sites/default# drush -u 1 islandora_mods_via_twig --help
Generate MODS .xml files from the mods-imvt.tsv file for a specified collection.
drush -u 1 islandora_mods_via_twig social-justice Process ../social-justice/mods-imvt.tsv, for example.
collection The name of the collection to be processed. Defaults to "social-justice".
So, my command sequence to run islandora_mods_via_twig for the “Social Justice” collection, as an example, was:
When the islandora_mods_via_twig command is run, it processes the corresponding //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/<collection-PID>/mods-imvt.tsv file and creates one //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file for each object.
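The one-row-in, one-file-out shape of this step can be sketched in Python. This is not the islandora_mods_via_twig implementation, which is a Drush command built on a Twig template; it is only an illustration under stated assumptions. The column names PID and Title, the PID format grinnell:1234, and the function name generate_mods_files are all hypothetical, and the generated MODS contains just a title.

```python
import csv
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

MODS_URI = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_URI)


def generate_mods_files(collection_dir):
    """Write one <namespace>_<number>_MODS.xml file per row of mods-imvt.tsv."""
    collection_dir = Path(collection_dir)
    out_dir = collection_dir / "ready-for-datastream-replace"
    out_dir.mkdir(exist_ok=True)
    written = []
    tsv_path = collection_dir / "mods-imvt.tsv"
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Build a deliberately tiny MODS record holding only the title.
            mods = ET.Element("{%s}mods" % MODS_URI)
            title_info = ET.SubElement(mods, "{%s}titleInfo" % MODS_URI)
            ET.SubElement(title_info, "{%s}title" % MODS_URI).text = row["Title"]
            # Assumed PID form "grinnell:1234" becomes "grinnell_1234_MODS.xml".
            target = out_dir / (row["PID"].replace(":", "_") + "_MODS.xml")
            ET.ElementTree(mods).write(str(target), encoding="utf-8", xml_declaration=True)
            written.append(target)
    return written


# Small demonstration against a throwaway directory with hypothetical data.
with tempfile.TemporaryDirectory() as tmp:
    sample = "PID\tTitle\ngrinnell:1234\tSample Title\n"
    (Path(tmp) / "mods-imvt.tsv").write_text(sample, encoding="utf-8")
    for path in generate_mods_files(tmp):
        print(path.name)
```

The real command covers the full set of MODS fields carried in the .tsv file; the sketch only shows how each row maps to a single file in ready-for-datastream-replace.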
Step 5 - Run drush islandora_datastream_replace
The point of this entire process is to arrive at a set of reviewed and modified .xml files in a collection-specific //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/<collection-PID>/ready-for-datastream-replace/ subdirectory, so that we can replace existing object MODS datastreams with new data. We use the drush islandora_datastream_replace command to do this.
Running --help for the aforementioned command produced this:
root@122092fe8182:/var/www/html/sites/default# drush -u 1 islandora_datastream_replace --help
Replaces a datastream in all objects given a file list in a directory.
drush -u 1 islandora_datastream_replace --source=/mnt/metadata-review/social-justice/ready-for-datastream-replace
Replacing MODS datastream for objects in --source using the digital_grinnell branch of code.
--dsid The datastream id of the datastream. Required.
--namespace The namespace of the pids. Required.
--source The directory to get the datastreams and pid# from. Required.
It’s worth noting that this command looks for any files named MODS in whatever ABSOLUTE directory is named with the --source parameter. The command shown below was executed inside the Apache container, isle-apache-dg, on node DGDocker1, in order to process Digital Grinnell’s social-justice collection.