Problems with dc:identifier Elements
The addition of scholar profiles from LASIR, specifically the module’s introduction of
/mods/identifier[@type='u2'] fields, has caused a few problems in Digital.Grinnell. Perhaps the most sinister of these is that the fields are transformed into DC (Dublin Core) elements that wreak havoc with our OAI export and the subsequent import into Primo VE.
While on the subject of OAI, it’s worth noting here that we can query to see the OAI that Digital.Grinnell exported by visiting a URL like:
Note the from= parameter at the end of the address. In OAI-PMH, from is an inclusive lower bound on record datestamps, so specifying a date here shows what was exported on or after that date.
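To make the from= idea concrete, here is a minimal sketch that assembles such a query URL. The base URL below is a placeholder and not Digital.Grinnell's real endpoint, and the oai_dc metadata prefix is an assumption; verb, metadataPrefix, and from are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the repository's real OAI-PMH base URL.
OAI_BASE_URL = "https://repository.example.edu/oai2"


def build_list_records_url(from_date, base_url=OAI_BASE_URL, metadata_prefix="oai_dc"):
    """Return a ListRecords URL limited to records stamped on or after from_date."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # assumed prefix; a harvester may use another
        "from": from_date,                  # inclusive lower bound, e.g. YYYY-MM-DD
    }
    return f"{base_url}?{urlencode(params)}"


url = build_list_records_url("2021-03-01")
print(url)
```

Pasting the printed address into a browser (with the real base URL swapped in) should list whatever the repository exposes from that date forward.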
A Possible Solution?
I’ve confirmed that if all
<dc:identifier> elements containing
u2:* values are removed from an object’s DC datastream, the object’s behavior in Digital.Grinnell is not affected, and the object’s import into Primo appears to be successful.
Identifying XML elements based on their “value” can be tricky, but I found that an xpath query like
*[contains(.,'u1:') or contains(.,'u2:')] works nicely.
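For anyone who wants to see what that predicate selects, here is a small sketch in Python. The standard library's ElementTree does not support contains(), so the test is approximated in plain Python against each child's full text. The sample record and the "u2: 12345" value are made up for illustration, and I'm assuming the query is evaluated with the DC root element as its context.

```python
import xml.etree.ElementTree as ET

# Invented sample DC record, for illustration only.
SAMPLE_DC = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample object</dc:title>
  <dc:identifier>grinnell:31898</dc:identifier>
  <dc:identifier>u2: 12345</dc:identifier>
  <dc:creator>Example, Author</dc:creator>
</oai_dc:dc>"""


def matches_predicate(element, markers=("u1:", "u2:")):
    """Approximate contains(.,'u1:') or contains(.,'u2:') on one element."""
    text = "".join(element.itertext())  # string value, including descendant text
    return any(marker in text for marker in markers)


root = ET.fromstring(SAMPLE_DC)
hits = [child for child in root if matches_predicate(child)]  # '*' = child elements
for hit in hits:
    print(hit.tag, repr(hit.text))
```

Note that the predicate matches on content, not on element name, so any child element whose text includes one of the markers would be selected.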
Fixing the MODS-to-DC Transform?
Unfortunately, this is NOT an attractive option because our XSLT is so complex. To be honest, I think ALL XSLT is too complex! I hate XSLT with a passion. The transform that Digital.Grinnell uses can be found in two places on DGDocker1:
- /var/www/html/sites/all/modules/islandora/dg7/xslt/mods_to_dc_grinnell.xsl, and
Yes, these two files are IDENTICAL; the duplication is a necessary evil due to the way that the
islandora_xml_forms module is built. I hate that module almost as much as I hate XSLT, maybe more. 😦
The XSLT that we apply for MODS-to-DC transform reads like this:
<xsl:template match="mods:identifier">
  <dc:identifier>
    <xsl:variable name="type" select="translate(@type,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')"/>
    <xsl:choose>
      <!-- 2.0: added identifier type attribute to output, if it is present-->
      <xsl:when test="contains(.,':')">
        <xsl:value-of select="."/>
      </xsl:when>
      <xsl:when test="@type">
        <xsl:value-of select="$type"/>: <xsl:value-of select="."/>
      </xsl:when>
      <xsl:when test="contains ('isbn issn uri doi lccn', $type)">
        <xsl:value-of select="$type"/>: <xsl:value-of select="."/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </dc:identifier>
</xsl:template>
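If XSLT isn't your thing either, the branching in that template can be traced with a rough Python mirror. This is only my reading of the template, not code that runs anywhere in Digital.Grinnell; the function name and sample values are hypothetical, and the exact whitespace in real output may differ.

```python
KNOWN_TYPES = "isbn issn uri doi lccn"


def dc_identifier_text(value, type_attr=None):
    """Mirror the xsl:choose branches for one mods:identifier element."""
    # translate() in the XSLT lowercases ASCII letters; lower() is close enough here.
    type_lower = (type_attr or "").lower()
    if ":" in value:                # value already carries a prefix: copy as-is
        return value
    if type_attr is not None:       # test="@type": the attribute is present
        return f"{type_lower}: {value}"
    if type_lower in KNOWN_TYPES:   # substring test, like XPath contains()
        return f"{type_lower}: {value}"
    return value


print(dc_identifier_text("12345", "U2"))
print(dc_identifier_text("u2:12345", "u2"))
```

Either way, a u2 identifier lands in DC with a u2: marker in its text, which is the string the xpath query looks for.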
A Viable Solution
What I’m going to discuss here isn’t an ideal fix because every time we ingest a new object with
u2: MODS identifiers, our transform will put them in the object’s corresponding DC record. Like I said, not ideal. But there is a relatively easy way to remove what the transforms deposit.
For this purpose I’ve resurrected an old, broken
drush iduF command. So a command of the form
drush -u 1 iduF grinnell:31898-31902 PurgeElements --dsid=DC --xpath="*[contains(.,'u1:') or contains(.,'u2:')]" --verbose can be used to strip the offending element(s) from the DC datastream of each object in the range. The command is
PurgeElements and the
--xpath clause shown above is CRITICAL. As with all
drush iduF commands, the
--verbose parameter is optional, and
--dryRun is also available for testing.
That command is worth highlighting one more time.
drush -u 1 iduF grinnell:31898-31902 PurgeElements --dsid=DC --xpath="*[contains(.,'u1:') or contains(.,'u2:')]" --verbose
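To illustrate the intended effect on a single DC record, here is a standalone sketch. It is not the iduF implementation, just an approximation of the same purge in plain Python; the sample record and its u1:/u2: values are invented, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
# Keep the familiar prefixes when the tree is serialized again.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def purge_marked_children(dc_xml, markers=("u1:", "u2:")):
    """Drop child elements whose text contains a marker; return (xml, count)."""
    root = ET.fromstring(dc_xml)
    removed = 0
    for child in list(root):  # iterate over a copy, since we mutate root
        if any(marker in "".join(child.itertext()) for marker in markers):
            root.remove(child)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Invented sample record, for illustration only.
sample = (
    f'<oai_dc:dc xmlns:oai_dc="{OAI_DC_NS}" xmlns:dc="{DC_NS}">'
    "<dc:title>Sample object</dc:title>"
    "<dc:identifier>grinnell:31898</dc:identifier>"
    "<dc:identifier>u1: abc</dc:identifier>"
    "<dc:identifier>u2: 12345</dc:identifier>"
    "</oai_dc:dc>"
)
cleaned, removed = purge_marked_children(sample)
print(removed)
print(cleaned)
```

The ordinary grinnell: identifier survives because it contains neither marker, which is the behavior I want from the real command too.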
An XML Namespace Issue?
It’s not been confirmed just yet, but there is speculation that the real root of the problem here stems from the apparent existence of
srw_dc namespace references that appear in DC records only when identifier
u2: elements exist. You may get some sense of what these references look like on OAI exports from the screen capture shared below.
If this is indeed the root of the problem, then my prayer (🙏) is that removing all unnecessary
u2: elements will make the problem go away. 🍀
And that’s a wrap. Until next time, stay safe and wash your hands! 😄