Problems with dc:identifier Elements
The addition of scholar profiles from LASIR, specifically the module’s introduction of /mods/identifier[@type='u1']
and /mods/identifier[@type='u2']
fields, has caused a few problems in Digital.Grinnell. Perhaps the most sinister of these… these fields are transformed into DC or Dublin Core elements that wreak havoc with our OAI export and subsequent import into Primo VE.
OAI Exports
While on the subject of OAI, it’s worth noting here that we can query to see the OAI that Digital.Grinnell exported by visiting a URL like: https://digital.grinnell.edu/oai2?verb=ListRecords&metadataPrefix=oai_dc&from=2022-02-15
.
Note the from=
parameter at the end of the address. Specifying a date here will show us what was exported on the specified date (and since that date?).
A Possible Solution?
I’ve confirmed that if all <dc:identifier>
elements containing u1:*
or u2:*
values are removed from an object’s DC datastream, the object’s behavior in Digital.Grinnell is not impacted, and the objects’ import to Primo appear to be successful.
Identifying XML elements based on their “value” can be tricky, but I found that an xpath query like *[contains(.,’u1:’) or contains(.,'u2:')]
works nicely.
Fixing the MODS-to-DC Transform?
Unfortunately, this is NOT an attractive option because our XSLT is so complex. To be honest, I think ALL XSLT is too complex! I hate XSLT with a passion. The transform that Digital.Grinnell uses can be found in two places on DGDocker1:
- /var/www/html/sites/all/modules/islandora/dg7/xslt/mods_to_dc_grinnell.xsl, and
- /var/www/html/sites/all/modules/islandora/islandora_xml_forms/builder/transforms/mods_to_dc_grinnell.xsl
Yes, these two files are IDENTICAL, but a necessary evil due the way that the islandora_xml_forms
module is built. I hate that module almost as much as I hate XSLT, maybe more. 😦
The XSLT that we apply for MODS-to-DC transform reads like this:
<xsl:template match="mods:identifier">
<dc:identifier>
<xsl:variable name="type" select="translate(@type,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')"/>
<xsl:choose>
<!-- 2.0: added identifier type attribute to output, if it is present-->
<xsl:when test="contains(.,':')">
<xsl:value-of select="."/>
</xsl:when>
<xsl:when test="@type">
<xsl:value-of select="$type"/>: <xsl:value-of select="."/>
</xsl:when>
<xsl:when test="contains ('isbn issn uri doi lccn', $type)">
<xsl:value-of select="$type"/>: <xsl:value-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</dc:identifier>
</xsl:template>
A Viable Solution
What I’m going to discuss here isn’t an ideal fix because every time we ingest a new object with u1:
and/or u2:
MODS identifiers, our transform will put them in the object’s corresponding DC record. Like I said, not ideal. But there is a relatively easy way to remove what the transforms deposit.
For this purpose I’ve resurrected an old, broken drush iduF
command. So a command of the form drush -u 1 iduF grinnell:31898-31902 PurgeElements --dsid=DC --xpath="*[contains(.,'u1:') or contains(.,'u2:')]" --verbose
can be used to strip the DC object(s) of its/their offending element(s). The command is PurgeElements
and the --xpath
clause shown above is CRITICAL. As with all drush iduF
commands, the --verbose
parameter is optional, and --dyrRun
is also available for testing.
That command is worth highlighting one more time.
drush -u 1 iduF grinnell:31898-31902 PurgeElements --dsid=DC --xpath="*[contains(.,'u1:') or contains(.,'u2:')]" --verbose
An XML Namespace Issue?
It’s not been confirmed just yet, but there is speculation that the real root of the problem here stems from the apparent existence of srw_dc
namespace references that appear in DC records only when identifier u1:
or u2:
elements exist. You may get some sense of what these references look like on OAI exports from the screen capture shared below.
If this is indeed the root of the problem, then my prayer (🙏) is that removing all unnecessary <dc:identifier>
u1:
and u2:
elements will make the problem go away. 🍀
And that’s a wrap. Until next time, stay safe and wash your hands! 😄