Call it The GCDLADB if you like.
Here There Be WARCs 8 min read Jun 17, 2020 Superseded by posts/126-creating-warc-from-a-wordpress-clone
The term WARC, an abbreviation of Web ARChive, always reminds me of hobbits, elves, dark lords, and, of course, orcs. But this post has nothing to do with those things, so I need to clear my head and press on. A WARC is essentially a file format used to capture the content and organization of a web site. Recently, I was asked to add a pair of WARCs to Digital.Grinnell. Doing so proved to be quite an adventure, but I am pleased to report that we now have these two objects to show for it: ...
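To make "a file format used to capture the content of a web site" a little more concrete, here is a minimal, stdlib-only Python sketch of the record layout described in the WARC/1.0 specification (ISO 28500). Each record has a version line, named header fields, a blank line, the captured content block, and two trailing CRLFs. This is only an illustration, not the tooling used to build the Digital.Grinnell objects. The function names `write_record` and `read_records` are hypothetical. The sketch assumes uncompressed files containing only `resource` records. Real crawls typically use `request`/`response` records with per-record gzip compression and are better handled by a dedicated library.

```python
import io
import uuid
from datetime import datetime, timezone


def write_record(out, target_uri: str, payload: bytes, content_type: str = "text/html") -> None:
    """Append one uncompressed WARC/1.0 'resource' record to a binary stream (hypothetical helper)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),  # length of the block in bytes
    ]
    out.write(b"WARC/1.0\r\n")  # version line
    for name, value in headers:
        out.write(f"{name}: {value}\r\n".encode("utf-8"))
    out.write(b"\r\n")      # blank line ends the header fields
    out.write(payload)      # the captured content itself
    out.write(b"\r\n\r\n")  # two CRLFs terminate every record


def read_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC stream (hypothetical helper).

    Assumption: header names are unique within a record, so a dict is enough.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # tolerate stray blank lines between records
        headers = {}
        while True:
            line = stream.readline().rstrip(b"\r\n")
            if not line:
                break  # blank line: header fields are done
            # Split on the first colon only; values such as URIs contain colons.
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the trailing CRLF CRLF
        yield headers, payload


if __name__ == "__main__":
    buf = io.BytesIO()
    write_record(buf, "http://example.com/", b"<html>hello</html>")
    write_record(buf, "http://example.com/about", b"<html>about</html>")
    buf.seek(0)
    for headers, payload in read_records(buf):
        print(headers["WARC-Target-URI"], len(payload))
```

Because the reader trusts `Content-Length` rather than scanning for delimiters, a payload that itself contains blank lines or CRLF pairs is read back intact.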
PDF Ingest in Digital.Grinnell 2 min read Jul 24, 2019
A set of 21 PDF objects was ingested into Digital.Grinnell’s Faculty Scholarship collection using IMI on 22-July-2019. Unfortunately, none of these PDFs contained OCR (optical character recognition) or “text recognition” data, so none of them generated a valid FULL_TEXT datastream. FULL_TEXT datastreams are required to make PDFs, and similar text content, searchable and discoverable in Digital.Grinnell. To confirm that the lack of OCR was in fact the problem, I ran a little test on https://digital.grinnell.edu/islandora/object/grinnell:26702, one of the 21 objects. ...