# Collection Migration Template

This document lists, without much detail, the commands and steps that should be taken to migrate a DG collection. Use this document as a template for recording an actual collection migration, where additional detail may be necessary.
- Check our Migration Google Sheet to verify that there is NO worksheet for the target collection. If one exists, rename it for safe-keeping to get it out of the way.
- Map `smb://storage/mediadb/DGingest/Migration-to-Alma/outputs` and `smb://storage/mediadb/DGingest/Migration-to-Alma/exports` to the workstation as `/Volumes/outputs` and `/Volumes/exports`, respectively.
- On your workstation, `cd` into the `migrate-MODS-to-dcterms` project directory and verify that the `main` branch and its `venv` are active. Your prompt should look something like this: `(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/migrate-MODS-to-dcterms ‹main›`. Use the `source .venv/bin/activate` command if needed.

  ```
  cd ~/GitHub/migrate-MODS-to-dcterms
  source .venv/bin/activate
  ```
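Before running any of the scripts below, it can save time to confirm that both shares are actually mounted. A minimal sketch (the paths are the ones named above; the helper function is my own, not part of the project):

```python
from pathlib import Path

def check_mount(path: str) -> bool:
    """Report whether a share appears to be mounted at the given path."""
    mounted = Path(path).is_dir()
    print(("OK: " if mounted else "MISSING: ") + path)
    return mounted

# The two shares this workflow expects, per the mapping step above
for share in ("/Volumes/outputs", "/Volumes/exports"):
    check_mount(share)
```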
- Run `main.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 main.py --collection_name __target-collection__
  ```
- Run `to-google-sheet.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 to-google-sheet.py --collection_name __target-collection__
  ```
- The collection's stakeholder(s) should be engaged to clean up the metadata found in the new `__target-collection__` worksheet in our Migration Google Sheet.
- Run `manage-collections.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 manage-collections.py --collection_name __target-collection__
  ```

  Note that the `manage-collections.py` script does LOTS of things for you. It will take care of rearranging most "compound" object data to achieve our intended Alma/Primo structures. Don't skip this step!
- The scripts may fail to change compound parent and child objects' `collection_id` as they should, so you should intervene and copy the `pending-review` ID (`81313013130004641`) into every row of the spreadsheet, replacing ALL `collection_id` values. This will import the objects into the suppressed `pending-review` collection so they can be reviewed before being moved to the proper sub-collection.
Note: The above was inserted as step 8 on 2024-Aug-22.
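The pending-review overwrite above is a manual spreadsheet edit, but if the worksheet has been exported to CSV the same change can be scripted. A hedged sketch (the function name and CSV layout are assumptions; only the `collection_id` column name and ID come from the step above):

```python
import csv

PENDING_REVIEW_ID = "81313013130004641"  # the suppressed pending-review collection

def overwrite_collection_ids(in_path: str, out_path: str) -> None:
    """Copy a worksheet CSV export, forcing every collection_id to pending-review."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["collection_id"] = PENDING_REVIEW_ID  # replace ALL values
            writer.writerow(row)
```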
- Navigate to `~/GitHub/worksheet-file-finder`, set your venv as needed, and run the `streamlit_app.py` script/application to check the `file_name_1` column (typically column `AW`) values against network storage, typically `/Volumes/exports/__target-collection__/OBJ`. This step is now a Streamlit Python app, so it should be largely self-explanatory; see the app's `README.md` file for additional details if needed. The `streamlit_app.py` app will also check that the column headings in your worksheet are correct!
Note: The above was inserted as step 9 on 2024-Aug-22 and dramatically modified on 2024-Oct-15.
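For reference, the core check that the app performs, verifying that each `file_name_1` value exists under the `OBJ` directory, boils down to something like this sketch (the function name and return shape are my own, not the app's actual code):

```python
from pathlib import Path

def find_missing_files(file_names, obj_dir):
    """Return the file_name_1 values with no matching file under obj_dir."""
    base = Path(obj_dir)
    return [name for name in file_names if not (base / name).exists()]
```

In practice `obj_dir` would point at `/Volumes/exports/__target-collection__/OBJ`.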
- Note that there are options in `worksheet-file-finder` to automatically generate thumbnail images (`.clientThumb` files in the case of Alma migration) AND copy both the found `OBJ` and `clientThumb` to a new `/Volumes/outputs/OBJs` subdirectory.
Note: The above was inserted as step 10 on 2024-Dec-18.
- Run `expand-csv.py`. Assuming a target collection ID of `__target-collection__`, it should look like this:

  ```
  time python3 expand-csv.py --collection_name __target-collection__
  ```

  Attention! The `expand-csv.py` script now accepts optional parameters `--first_row` (or `-f`) and `--last_row` (or `-l`) that can be used to limit the rows of the `__target-collection__` Google Sheet that the `values.csv` file contains. For example:

  ```
  time python3 expand-csv.py --collection_name __target-collection__ --first_row 50 --last_row 500
  ```

  If `--first_row` is omitted it defaults to `2`, and if `--last_row` is omitted it defaults to `5000`. The `--last_row` limit is automatically trimmed to the last row of data in the sheet, so specifying a number larger than the row count effectively includes all rows from `--first_row` onward.
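The row-limit behavior described above can be illustrated with a small sketch (this is not the script's actual code, just the documented defaults and trimming rule):

```python
def resolve_row_range(sheet_row_count, first_row=2, last_row=5000):
    """Illustrate expand-csv.py's documented behavior: first_row defaults to 2,
    last_row defaults to 5000, and last_row is trimmed to the sheet's final data row."""
    return first_row, min(last_row, sheet_row_count)
```

So on a 350-row sheet, omitting both options includes rows 2 through 350.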
- Examine the `/Volumes/outputs/__target-collection__` directory to confirm that a new `values.csv` file has been created.
- In Alma, invoke the `Digital Uploader` via the `Resources | Digital Uploader` menu selection.
- In the `Digital Uploader`, choose `Add New Ingest` from the menu tab in the upper center of the window.
- Give the ingest a descriptive and unique name (`Ingest Details | Name`) and note the all-important `ID` value displayed below it.
- Select `Add Files`, then navigate to the `/Volumes/outputs/__target-collection__` network directory and select the `values.csv` file there.
- Click on `Upload All` to send the `values.csv` file off for later processing, and click `OK`.
- The `Digital Uploader` page should show the aforementioned `ID` with a status of `Upload Complete`.
- Return to the terminal prompt, where we will now enter some `aws s3 ...` commands following guidance provided in `AWS-S3-Storage-Info.md`.
- List the contents of our `upload` directory like so:

  ```
  aws s3 ls s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/ --recursive --human-readable --summarize
  ```
- Verify that there's a short list of files in `../upload/` including our `values.csv` file. Copy the ID portion of the `values.csv` path (two subdirectories after `upload`) for use in the next step.
- Paste the copied path into the following `aws s3` command AND be sure to change `__target-collection__` to our intended target.

  ```
  aws s3 cp /Volumes/outputs/OBJs/ s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/__PASTE__/ --recursive
  ```

  This should copy the `/Volumes/outputs/OBJs` subdirectory contents of our collection into AWS for ingest.
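When copying the ID portion out of the `aws s3 ls` listing, the piece you want is everything between `upload/` and the file name. A small sketch of extracting it (the example key is hypothetical, not a real ingest ID):

```python
def ingest_path_from_key(key: str) -> str:
    """Return the subdirectory portion between 'upload/' and the file name;
    per the step above, this is the part to paste into the cp command."""
    parts = key.strip("/").split("/")
    return "/".join(parts[parts.index("upload") + 1:-1])

# Hypothetical key from an `aws s3 ls --recursive` listing:
print(ingest_path_from_key("01GCL_INST/upload/abc123/def456/values.csv"))
```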
- List the contents of our `upload` directory to verify, like so:

  ```
  aws s3 ls s3://na-st01.ext.exlibrisgroup.com/01GCL_INST/upload/ --recursive --human-readable --summarize
  ```
- Return to the `Digital Uploader`, select the ingest (it should be at the top of the list), and click `Submit Selected`.
- Wait for the `Run MD Import` button to be enabled, then click it.
- After a short time you should see a pop-up that says: `Import job xxxxxxxxxxxx04641 submitted successfully`.
- Navigate in the menus to `Resources` | `Monitor and View Imports` to check on progress.
- Be patient while the ingest takes place, and you're almost done!
- Once the import/ingest is complete, report all of the new `MMS_ID` values and copy/paste them back into the corresponding `mms_id` cells of the `__target-collection__` worksheet.
- Move the completed `__target-collection__` worksheet and its `..._READY-FOR-EXPANSION` companion to the `Migration-Archive` Google Sheet. See the `Migration-to-Alma-D` Google Sheet `README` tab for instructions.
Note: The two sections above were appended as steps 28 and 29 (now 29 and 30) on 2024-Aug-22.
- Use `Resources` and `Manage Collections` to pull up the `Pending Review` collection, select up to 50 records at a time, click the `Move Selected` option, then search for and select the `__target-collection__`. Moving all of the `Pending Review` content in this manner may take many iterations.
Note: Section 30 (now 31) above was appended on 2024-Oct-15.