Collection Migration Template
This document lists, without a lot of detail, the commands and steps required to migrate a DG collection. Use it as a template for recording an actual collection migration, where additional details may be necessary.
- Check our Migration Google Sheet to verify that there is NO worksheet for the target collection. If one exists, rename it for safe-keeping to get it out of the way.
- Map the smb://storage/mediadb/DGingest/Migration-to-Alma/outputs and smb://storage/mediadb/DGingest/Migration-to-Alma/exports shares to the workstation as /Volumes/outputs and /Volumes/exports, respectively.
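If you prefer the terminal to the Finder's Connect to Server dialog, the macOS open command can ask the Finder to mount each share under /Volumes; treat this as a convenience sketch, not a required method:
open 'smb://storage/mediadb/DGingest/Migration-to-Alma/outputs'
open 'smb://storage/mediadb/DGingest/Migration-to-Alma/exports'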
- On your workstation, cd into the migrate-MODS-to-dcterms project directory and verify that the main branch and its venv are active. Your prompt should look something like this: (.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/migrate-MODS-to-dcterms ‹main›. Use the source .venv/bin/activate command if needed.
cd ~/GitHub/migrate-MODS-to-dcterms
source .venv/bin/activate
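If your prompt does not show the branch and venv decorations, two quick checks (plain git and shell commands, nothing project-specific) can confirm both; the first should print main and the second should point inside .venv:
git branch --show-current
which python3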
- Run main.py. Assuming a target collection ID of __target-collection__, it should look like this:
time python3 main.py --collection_name __target-collection__
- Run to-google-sheet.py. Assuming a target collection ID of __target-collection__, it should look like this:
time python3 to-google-sheet.py --collection_name __target-collection__
- The collection’s stakeholder(s) should be engaged to clean up the metadata found in the new __target-collection__ worksheet in our Migration Google Sheet.
- Run manage-collections.py. Assuming a target collection ID of __target-collection__, it should look like this:
time python3 manage-collections.py --collection_name __target-collection__
Note that the manage-collections.py script does LOTS of things for you. It will take care of rearranging most “compound” object data to achieve our intended Alma/Primo structures. Don’t skip this step!
- The scripts may fail to change compound parent and child objects’ collection_id as they should, so you should intervene and copy the pending-review ID (81313013130004641) into every row in the spreadsheet, replacing ALL collection_id values.
This will import the objects into the suppressed pending-review collection so they can be reviewed before being moved to the proper sub-collection.
Note: The above was inserted as step 8 on 2024-Aug-22.
- Navigate to ~/GitHub/worksheet-file-finder, set your VENV as needed, and run the streamlit_app.py script/application to check the file_name_1 column (typically column AW) values against network storage, typically /Volumes/exports/__target-collection__/OBJ. This step is now a Streamlit Python app so it should be largely self-explanatory; see the app’s README.md file for additional details if needed. The streamlit_app.py app will also check that the column headings in your worksheet are correct!
Note: The above was inserted as step 9 on 2024-Aug-22 and dramatically modified on 2024-Oct-15.
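For reference, launching the worksheet-file-finder app typically looks something like the following (this assumes the project keeps its virtual environment in .venv; adjust to match your local setup):
cd ~/GitHub/worksheet-file-finder
source .venv/bin/activate
streamlit run streamlit_app.py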
- Note that there are options in worksheet-file-finder to automatically generate thumbnail images (.clientThumb files in the case of Alma migration) AND copy both the found OBJ and clientThumb to a new /Volumes/outputs/OBJs subdirectory.
Note: The above was inserted as step 10 on 2024-Dec-18.
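If a thumbnail ever has to be regenerated by hand outside the app, an ImageMagick sketch along these lines will produce a small derivative; the 200x200 geometry and file names here are illustrative assumptions, not necessarily what the app uses internally:
magick OBJ.jpg -thumbnail 200x200 thumb.jpg
mv thumb.jpg OBJ.clientThumb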
- Run expand-csv.py. Assuming a target collection ID of __target-collection__, it should look like this:
time python3 expand-csv.py --collection_name __target-collection__
Attention! The expand-csv.py script now accepts optional parameters --first_row (or -f) and --last_row (or -l) that can be used to limit which rows of the __target-collection__ Google Sheet end up in the values.csv file. For example:
time python3 expand-csv.py --collection_name __target-collection__ --first_row 50 --last_row 500
If the --first_row parameter is omitted it defaults to 2, and if --last_row is omitted it defaults to 5000. The --last_row limit is automatically trimmed to the last row of data in the sheet, so specifying a number larger than the row count effectively includes every row from --first_row onward.
- Examine the /Volumes/outputs/__target-collection__ directory to confirm that a new values.csv file has been created.
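A quick terminal check of the same thing (an ordinary directory listing, nothing migration-specific) might look like:
ls -lh /Volumes/outputs/__target-collection__/values.csv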
- In Alma, invoke the Digital Uploader via the Resources | Digital Uploader menu selection.
- In the Digital Uploader choose Add New Ingest from the menu tab in the upper center of the window.
- Give the ingest a descriptive and unique name (Ingest Details | Name) and note the all-important ID value displayed below it.
- Select Add Files, then navigate to the /Volumes/outputs/__target-collection__ network directory and select the values.csv file there.
- Click on Upload All to send the values.csv file off for later processing and click OK.
- The Digital Uploader page should show the aforementioned ID with a status of Upload Complete.
- Return to the terminal prompt where we will now enter some aws s3 ... commands following guidance provided in AWS-S3-Storage-Info.md.
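Before issuing the S3 commands it may be worth confirming that the AWS CLI is installed and configured; AWS-S3-Storage-Info.md has the authoritative details, and this is just a quick sanity check:
aws configure list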
- List the contents of our upload directory like so:
aws s3 ls s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/ --recursive --human-readable --summarize
- Verify that there’s a short list of files in ../upload/ including our values.csv file. Copy the ID portion of the values.csv path (two subdirectories after upload) for use in the next step.
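If the listing is long, an ordinary grep over the same output (not a special AWS CLI feature) can help isolate the values.csv line:
aws s3 ls s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/ --recursive | grep values.csv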
- Paste the copied path into the following aws s3 command in place of __PASTE__, AND be sure any __target-collection__ placeholder is changed to our intended target.
aws s3 cp /Volumes/outputs/OBJs/ s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/__PASTE__/ --recursive
This should copy the /Volumes/outputs/OBJs subdirectory contents of our collection into AWS for ingest.
- List the contents of our upload directory to verify, like so:
aws s3 ls s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/ --recursive --human-readable --summarize
- Return to the Digital Uploader, select the ingest (it should be at the top of the list), and click Submit Selected.
- Wait for the Run MD Import button to be enabled, then click it.
- After a short time you should see a pop-up that says: Import job xxxxxxxxxxxx04641 submitted successfully.
- Navigate in the menus to Resources | Monitor and View Imports to check on progress.
- Be patient while the ingest takes place and you’re almost done!
- Once the import/ingest is complete, report all of the new MMS_ID values and copy/paste them back into the corresponding mms_id cells of the __target-collection__ worksheet.
- Move the completed __target-collection__ worksheet and its ..._READY-FOR-EXPANSION companion to the Migration-Arcive Google Sheet. See the Migration-to-Alma-D Google Sheet README tab for instructions.
Note: The two sections above were appended as steps 28 and 29 (now 29 and 30) on 2024-Aug-22.
- Use Resources and Manage Collections to pull up the Pending Review collection, select up to 50 records at a time, click the Move Selected option, then search for and select the __target-collection__. Moving all of the Pending Review content in this manner may take many iterations.
Note: Section 30 (now 31) above was appended on 2024-Oct-15.