# Collection Migration Template

This document lists, without much detail, the commands and steps that should be taken to migrate a DG collection. Use this document as a template for recording an actual collection migration, where additional detail may be necessary.
- Check our Migration Google Sheet to verify that there is NO worksheet for the target collection. If one exists, rename it for safe-keeping to get it out of the way.
- Map `smb://storage/mediadb/DGingest/Migration-to-Alma/outputs` and `smb://storage/mediadb/DGingest/Migration-to-Alma/exports` to the workstation as `/Volumes/outputs` and `/Volumes/exports`, respectively.
- On your workstation, `cd` into the `migrate-MODS-to-dcterms` project directory and verify that the `main` branch and its `venv` are active. Your prompt should look something like this: `(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/migrate-MODS-to-dcterms ‹main›`. Use the `source .venv/bin/activate` command if needed.

  ```
  cd ~/GitHub/migrate-MODS-to-dcterms
  source .venv/bin/activate
  ```
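Before running any of the scripts below, it can save time to confirm that both shares are actually mounted. A minimal sketch (the paths are the ones named above; the helper function is my own, not part of the project):

```python
from pathlib import Path

def check_mount(path: str) -> bool:
    """Report whether a share appears to be mounted at the given path."""
    mounted = Path(path).is_dir()
    print(("OK: " if mounted else "MISSING: ") + path)
    return mounted

# The two shares this workflow expects, per the mapping step above
for share in ("/Volumes/outputs", "/Volumes/exports"):
    check_mount(share)
```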
- Run `main.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 main.py --collection_name __target-collection__
  ```
- Run `to-google-sheet.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 to-google-sheet.py --collection_name __target-collection__
  ```
- The collection's stakeholder(s) should be engaged to clean up the metadata found in the new `__target-collection__` worksheet in our Migration Google Sheet.
- Run `manage-collections.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 manage-collections.py --collection_name __target-collection__
  ```

  Note that the `manage-collections.py` script does LOTS of things for you. It will take care of rearranging most "compound" object data to achieve our intended Alma/Primo structures. Don't skip this step!
- The scripts may fail to change compound parent and child objects' `collection_id` as they should, so you should intervene and copy the `pending-review` ID (`81313013130004641`) into every row of the spreadsheet, replacing ALL `collection_id` values. This will import the objects into the suppressed `pending-review` collection so they can be reviewed before being moved to the proper sub-collection.
Note: The above was inserted as step 8 on 2024-Aug-22.
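The pending-review overwrite above is a manual spreadsheet edit, but if the worksheet has been exported to CSV the same change can be scripted. A hedged sketch (the function name and CSV layout are assumptions; only the `collection_id` column name and ID come from the step above):

```python
import csv

PENDING_REVIEW_ID = "81313013130004641"  # the suppressed pending-review collection

def overwrite_collection_ids(in_path: str, out_path: str) -> None:
    """Copy a worksheet CSV export, forcing every collection_id to pending-review."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["collection_id"] = PENDING_REVIEW_ID  # replace ALL values
            writer.writerow(row)
```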
- Navigate to `~/GitHub/worksheet-file-finder`, set your venv as needed, and run the `streamlit_app.py` script/application to check the `file_name_1` column (typically column `AW`) values against network storage, typically `/Volumes/exports/__target-collection__/OBJ`. This step is now a Streamlit Python app, so it should be largely self-explanatory; see the app's `README.md` file for additional details if needed. The `streamlit_app.py` app will also check that the column headings in your worksheet are correct!
Note: The above was inserted as step 9 on 2024-Aug-22 and dramatically modified on 2024-Oct-15.
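For reference, the core check that the app performs, verifying that each `file_name_1` value exists under the `OBJ` directory, boils down to something like this sketch (the function name and return shape are my own, not the app's actual code):

```python
from pathlib import Path

def find_missing_files(file_names, obj_dir):
    """Return the file_name_1 values with no matching file under obj_dir."""
    base = Path(obj_dir)
    return [name for name in file_names if not (base / name).exists()]
```

In practice `obj_dir` would point at `/Volumes/exports/__target-collection__/OBJ`.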
- Note that there are options in `worksheet-file-finder` to automatically generate thumbnail images (`.clientThumb` files in the case of Alma migration) AND copy both the found `OBJ` and `clientThumb` to a new `/Volumes/outputs/OBJs` subdirectory.
Note: The above was inserted as step 10 on 2024-Dec-18.
- Run `expand-csv.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 expand-csv.py --collection_name __target-collection__
  ```

  Attention! The `expand-csv.py` script now accepts optional parameters `--first_row` (or `-f`) and `--last_row` (or `-l`) that can be used to limit the rows of the `__target-collection__` Google Sheet that the `values.csv` file contains. For example:

  ```
  time python3 expand-csv.py --collection_name __target-collection__ --first_row 50 --last_row 500
  ```

  If `--first_row` is omitted it defaults to `2`, and if `--last_row` is omitted it defaults to `5000`. The `--last_row` limit is automatically trimmed to the last row of data in the sheet, so specifying a number larger than the row count effectively includes all rows from `--first_row` onward.
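The row-limit behavior described above can be illustrated with a small sketch (this is not the script's actual code, just the documented defaults and trimming rule):

```python
def resolve_row_range(sheet_row_count, first_row=2, last_row=5000):
    """Illustrate expand-csv.py's documented behavior: first_row defaults to 2,
    last_row defaults to 5000, and last_row is trimmed to the sheet's final data row."""
    return first_row, min(last_row, sheet_row_count)
```

So on a 350-row sheet, omitting both options includes rows 2 through 350.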
- Examine the `/Volumes/outputs/__target-collection__` directory to confirm that a new `values.csv` file has been created.
- In Alma, invoke the `Digital Uploader` via the `Resources | Digital Uploader` menu selection.
- In the `Digital Uploader`, choose `Add New Ingest` from the menu tab in the upper center of the window.
- Give the ingest a descriptive and unique name (`Ingest Details | Name`) and note the all-important `ID` value displayed below it.
- Select `Add Files`, then navigate to the `/Volumes/outputs/__target-collection__` network directory and select the `values.csv` file there.
- Click on `Upload All` to send the `values.csv` file off for later processing, and click `OK`.
- The `Digital Uploader` page should show the aforementioned `ID` with a status of `Upload Complete`.
- Return to the terminal prompt, where we will now enter some `aws s3 ...` commands following guidance provided in `AWS-S3-Storage-Info.md`.
- List the contents of our `upload` directory like so:

  ```
  aws s3 ls s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/ --recursive --human-readable --summarize
  ```
- Verify that there's a short list of files in `../upload/` including our `values.csv` file. Copy the ID portion of the `values.csv` path (two subdirectories after `upload`) for use in the next step.
- Paste the copied path into the following `aws s3` command AND be sure to change `__target-collection__` to our intended target.

  ```
  aws s3 cp /Volumes/outputs/OBJs/ s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/__PASTE__/ --recursive
  ```

  This should copy the `/Volumes/outputs/OBJs` subdirectory contents of our collection into AWS for ingest.
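When copying the ID portion out of the `aws s3 ls` listing, the piece you want is everything between `upload/` and the file name. A small sketch of extracting it (the example key is hypothetical, not a real ingest ID):

```python
def ingest_path_from_key(key: str) -> str:
    """Return the subdirectory portion between 'upload/' and the file name;
    per the step above, this is the part to paste into the cp command."""
    parts = key.strip("/").split("/")
    return "/".join(parts[parts.index("upload") + 1:-1])

# Hypothetical key from an `aws s3 ls --recursive` listing:
print(ingest_path_from_key("01GCL_INST/upload/abc123/def456/values.csv"))
```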
- List the contents of our `upload` directory to verify, like so:

  ```
  aws s3 ls s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/ --recursive --human-readable --summarize
  ```
- Return to the `Digital Uploader`, select the ingest (it should be at the top of the list), and click `Submit Selected`.
- Wait for the `Run MD Import` button to be enabled, then click it.
- After a short time you should see a pop-up that says: `Import job xxxxxxxxxxxx04641 submitted successfully`.
- Navigate in the menus to `Resources` | `Monitor and View Imports` to check on progress.
- Be patient while the ingest takes place, and you're almost done!
- Once the import/ingest is complete, report all of the new `MMS_ID` values and copy/paste them back into the corresponding `mms_id` cells of the `__target-collection__` worksheet.
- Move the completed `__target-collection__` worksheet and its `..._READY-FOR-EXPANSION` companion to the `Migration-Archive` Google Sheet. See the `Migration-to-Alma-D` Google Sheet `README` tab for instructions.
Note: The two sections above were appended as steps 28 and 29 (now 29 and 30) on 2024-Aug-22.
- Use `Resources` and `Manage Collections` to pull up the `Pending Review` collection, select up to 50 records at a time, click the `Move Selected` option, then search for and select the `__target-collection__`. Moving all of the `Pending Review` content in this manner may take many iterations.
Note: Section 30 (now 31) above was appended on 2024-Oct-15.