presenter notes Last week, we started discussing born-digital processing and the hands-on work of managing and preserving digital materials through imaging and transfers. We'll continue exploring different aspects of that work in the coming weeks. This week, we're zooming out to look at systems: the tools that help us do this work at scale. To effectively manage digital collections, we need systems that can:
a.) Store and process data about files, such as those created through imaging or digitization.
b.) Make this data accessible to end-users (researchers, patrons, and casual browsers) so they can search, discover, and make sense of these materials.
I'll walk through two real-life examples of archival systems in use:
- At Yale (where I work)
- At the Bentley Historical Library at the University of Michigan
These examples will show how different systems work together to support digital archiving. Beyond just having systems, we also need to ensure that they communicate with each other; this is where interoperability comes in. If a system can send, receive, or exchange data with another system, it means they are interoperable. One of the primary ways that systems "talk" to one another is through Application Programming Interfaces, or APIs. Today, we'll introduce APIs and discuss how they allow different systems to share data, automate tasks, and enhance access to digital collections.
presenter notes To approach the topic of systems used in a digital repository (and defining what a digital repository is), let's start by exploring the systems that support a physical archival repository. An archival repository is a physical place where archives are kept. The physical components of an archival repository are stored on shelves (aka "stacks") in, ideally, highly monitored, climate-controlled spaces, to ensure the materials are protected from risks posed by things like rodents, bugs, and humidity, but also organized in a way so items can be retrieved for access or remediation.
presenter notes Let's imagine we are working in a purely physical repository: just record boxes on shelves, no digital items. That would be pretty rare today, but for the sake of this example, let's focus only on the systems that help manage physical collections. Let's pretend it's the year 1995. Here's a non-exhaustive list of systems we might be using in this environment. Oftentimes, these systems work separately from one another. For example, the archivist would enter finding aid data into a word processing document; a spreadsheet or a simple database would track object locations and reading room requests; a separate website would publish a list of archival collections; and an online catalog entry for the archival collection might include a special instruction to email the archives to schedule an appointment. Once you got to your appointment, the archivist would hand you a paper-based finding aid.
presenter notes Link to Rachael's blog post: https://www.tdl.org/2019/04/what-is-a-digital-repository/
Digital repositories function much like physical archival repositories: both are designed to organize, store, and provide access to materials. The way digital repositories are set up, managed, and maintained often mirrors how physical stacks work, following similar principles of organization and preservation. However, digital repositories introduce added complexity. The biggest difference is in how materials are managed, requiring multiple layers of systems to track different aspects of digital objects.
For example, a floppy disk in a box would be tracked in a collection management system, which records its physical location: inside a specific box, on a particular shelf, in a room with controlled temperature and humidity. But that same floppy disk may also have a digital presence that needs additional tracking. A separate system would record that a disk image exists, where it is stored on a server, and when its checksum was last verified. Modern archival systems also need to connect the physical object to its digital manifestation, ensuring that both are managed in relation to each other. They also need to make that information available to both repository managers and end-users, in a way that is understandable and discoverable.
A digital repository is just one layer in a larger system that tracks both the physical and digital characteristics, versions, and events of an archival object. These layers work together to maintain integrity, access, and preservation across different formats. Just like physical stacks are supported by multiple systems, a digital repository is supported by multiple systems too.
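To make the layering concrete, here is a minimal sketch of a record that links a physical item to its digital manifestation. The class names, field names, and sample values are all hypothetical and not drawn from any particular system, and SHA-256 is assumed as the checksum algorithm.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PhysicalLocation:
    """Where the physical item lives (hypothetical fields)."""
    room: str
    shelf: str
    box: str


@dataclass
class DiskImage:
    """The digital manifestation of the item (hypothetical fields)."""
    storage_path: str
    sha256: str
    last_verified: Optional[datetime] = None

    def verify(self, content: bytes) -> bool:
        """Recompute the checksum and record when the check happened."""
        matches = hashlib.sha256(content).hexdigest() == self.sha256
        self.last_verified = datetime.now(timezone.utc)
        return matches


@dataclass
class ArchivalObject:
    """Ties the physical and digital records together."""
    title: str
    location: PhysicalLocation
    image: Optional[DiskImage] = None


# Stand-in bytes for a disk image; a real image would be read from storage.
image_bytes = b"example disk image contents"

floppy = ArchivalObject(
    title="Floppy disk 1",
    location=PhysicalLocation(room="Stacks A", shelf="12", box="3"),
    image=DiskImage(
        storage_path="/preservation/images/floppy-001.img",
        sha256=hashlib.sha256(image_bytes).hexdigest(),
    ),
)

print(floppy.image.verify(image_bytes))         # True: the content is unchanged
print(floppy.image.verify(image_bytes + b"!"))  # False: the content differs
```

In a real environment these three pieces would usually live in different systems; the point of the sketch is only that something has to hold the link between them.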
Presenter notes While no perfect system of integrated systems exists, we are seeing more of an effort to make each of these system types work with one another. There are many reasons behind this push, and it likely has a lot to do with collections becoming increasingly online and networked. By nature, a network exchanges information, and in order to do that, systems need to be able to "talk" to each other in a mutually understandable way. We also saw an extra push during the COVID lockdowns, which raised patron expectations of being able to look at materials in an online-only setting.
presenter notes Now that we've talked about how different archival systems interact, both for physical and digital materials, let's step back and look at how these systems fit together structurally. In the world of technology, we often refer to these interconnected systems as a technology stack or simply a stack. A stack is a layered set of technologies, where different systems handle different responsibilities. Here, we're using the term broadly to describe how different archival and library systems fit together (hence the quotes around the term "Stack"), rather than a specific programming/software stack. Examining examples of technology stacks helps us see how different systems communicate, where different types of data live, and how they integrate to form a functional ecosystem, whether we're managing physical collections, digital archives, or both.
presenter notes As you can see, the systems that support digital repository operations are varied. I'll go through each of these next, but it's worth keeping in mind that they rarely work in isolation. In practice, digital repository systems are often bundled together and handle multiple functions at the same time. Also note that these categories describe systems in terms of their functions.
presenter notes While I'm going to define each system function one-by-one, remember: real institutions often bundle these functions together. One system can do multiple jobs at once, and we'll see that in the Bentley case study (coming up).
Presenter notes Please note that my use of "front-facing" or "front-end" refers to the parts of the system meant to be seen and consumed by an end-user (a patron, a researcher, a casual browser).
presenter notes If you want to take a deep dive into all the systems that are out there, there are a couple of resources to check out. The first one is a crowd-sourced Google Sheet, "The Collection Management System Collection", which was kick-started by Ashley Blewer, a software developer, educator, writer, and artist who has done incredible work, especially within the field of audio/visual and moving image preservation. In 2017, she made this spreadsheet publicly available for folks in the field to contribute system descriptions in a matrix form. Another helpful resource is the Community Owned Digital Preservation Tool Registry (COPTR) Tools Grid, which uses a Wiki format. This grid starts off with a matrix of general digital preservation object types like "audio" or "ebook" on the Y-axis, and broad digital preservation functional areas on the X-axis. You can click on any of the numbers to see a list of relevant tools for that object type/functional area, and further drill down into other functional area sub-categories. There are nearly 600 tools described in this Wiki. These lists show how many tools can fill the same function.
presenter notes Image from GIF Cities (https://web.archive.org/web/20091027084349/http://hk.geocities.com/kieou/3.htm) In the early days of digital archiving and preservation, a variety of platforms emerged to better automate, standardize, and streamline various processes. Systems like ArchivesSpace were designed to enable archivists to accession collections, describe them accurately, and create and publish finding aids. Yet these systems were built in isolation, tailored to specific tasks without consideration for the full lifecycle of digital records.
presenter notes Systems integration describes "[a] functional coupling between software applications to act as a coordinated whole." This quote comes from Max Eckard's book *Making Your Tools Work for You*, which was originally "adopted from... the ArchivesSpace Technical Advisory Committee (TAC) Integrations sub-team, which goes on to state that 'a defining characteristic of all integrations is communication, i.e., seamless data flow – without a manual, intermediary step – between systems'" (4). Integration characterizes the ability of one or multiple systems to "talk" to one another. The development of interoperable standards and the adoption of holistic digital asset management solutions have started to bridge the gaps between previously isolated systems. These integrated platforms streamline the archival process, from digitization to online accessibility, reducing redundancy, minimizing errors, and significantly improving the discoverability of digital archives. In addition, integration allows you to maintain your current system "ecosystem," which is advantageous because no single system can do everything. In fact, having a single all-encompassing system might not be ideal. This modular approach enables systems ecosystems to be more flexible and adaptable over time.
presenter notes This is a screenshot of the system Archivematica. You will be using the Archivematica sandbox next week during your weekly activity.
presenter notes Don't worry, "SaaS", "open source" and "microservice" are defined in the next few slides!
presenter notes Rather than having to download software on your computer, you can access this software using a web browser. The software and all its data are hosted and maintained on remote servers by a third-party provider. Popular archives-specific examples of SaaS platforms are Archivematica and ArchivesSpace. SaaS services you might have encountered in your own work include Google Drive and Zoom.
presenter notes The key idea behind open-source software is that it promotes collaboration and transparency, enabling developers and users to contribute to its improvement, adapt it to their needs, and share it freely.
presenter notes A microservice is an application designed to perform a single function within the digital curation and preservation process. The concept of a microservice was developed by the California Digital Library (CDL), which in 2009 introduced a new approach to the curation and preservation of digital objects. This reconceptualization challenged the assumption that "the curation and preservation of digital objects required the installation and operation of a single, long-lived application combining the necessary functions behind one user interface." Instead, CDL proposed that "small, relatively simple utilities would pose fewer challenges in their development, deployment, maintenance, and enhancement than a large, integrated system, especially in the context of constant technological change." Additionally, they noted that users could "easily adapt a set of distributed services to local conditions in different divisions and departments of the university, and easily replace each of them upon their obsolescence."
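As a rough illustration of the idea (not CDL's actual tooling), the sketch below treats each curation function as a small, independent utility and chains them together. The function names and the report format are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

# Each "service" does one job: it takes file content and returns one fact.
Service = Callable[[bytes], Dict[str, object]]


def checksum_service(content: bytes) -> Dict[str, object]:
    """Fixity: compute a SHA-256 checksum."""
    return {"sha256": hashlib.sha256(content).hexdigest()}


def size_service(content: bytes) -> Dict[str, object]:
    """Characterization: measure the size in bytes."""
    return {"size_bytes": len(content)}


def run_services(content: bytes, services: List[Service]) -> Dict[str, object]:
    """Run each service in turn and merge their results into one report."""
    report: Dict[str, object] = {}
    for service in services:
        report.update(service(content))
    return report


report = run_services(b"hello", [checksum_service, size_service])
print(report["size_bytes"])  # 5
```

Because the services share only an input and an output shape, one can be replaced or retired without touching the others, which is the property the CDL quotes emphasize.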
presenter notes ArchivesSpace (aka ASpace) is an archival system primarily used throughout the accessioning, arrangement and description of archival collections. The data entered into ASpace can be used to produce finding aids in EAD XML format, so they may be viewed on the web. Collections, or bodies of work, are called "resources". Within each resource, you will find various levels of hierarchy that describe how a particular body of work is arranged, such as series or sub-series, which in ArchivesSpace are known as "archival objects". Archival objects can also be rolled up into what are known as "Top containers", which represent the physical containers/boxes that may be requested or circulated in a reading room or other special collection setting. So, ArchivesSpace also has a collection management side, as well as places to accession materials, and make connections between archival objects and digital objects.
presenter notes In our system-type terms, DSpace is doing two jobs at Bentley:
1 - It stores and organizes access-ready objects
2 - It is the public interface where users find and retrieve them
That overlap is normal. It's an example of the bundled functions we've been discussing throughout today.
presenter notes In the Bentley article, you read about how they integrated Archivematica, ASpace, and DSpace. The integration was modeled after the Digital Curation Centre (DCC) Curation Lifecycle Model. Before we look at the Bentley Library example, we should understand what the DCC model is, what it is for, and, specifically, how it differs from the OAIS model.
presenter notes The DCC Curation Lifecycle Model provides a high-level graphical overview of the stages required for successful curation and preservation of data, starting from initial conceptualization or receipt. This model can be used to plan activities within an organization or consortium to ensure all necessary stages are undertaken in the correct sequence. It enables granular functionality to be mapped against the lifecycle, helping to define roles and responsibilities and to build a framework of standards and technologies for implementation. Additionally, it helps identify extra steps that may be required or actions that are unnecessary for specific situations or disciplines, and it ensures that processes and policies are thoroughly documented. For more information, refer to the [DCC Curation Lifecycle Model PDF](https://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf).
presenter notes Here's my cleaned-up, more accessible version of the Bentley diagram. The Bentley Historical Library's integrated system achieved the following:
- Provided archivists with access to the ArchivesSpace interface directly from the context of the Archivematica system. This allowed them to use information generated during the Archivematica ingest process to inform appraisal tasks.
- Enabled archivists to view ArchivesSpace resource records, add or edit archival descriptions, and create digital object instances in the finding aid, all without switching over to ArchivesSpace, using a tab within Archivematica.
- Allowed Archivematica to create preservation packages (AIPs) and deposit content into DSpace, which Bentley uses as the repository and access layer.
presenter notes The interoperability of these systems at Bentley was achieved using APIs. Application Programming Interfaces, or APIs, provide a way for different software applications to communicate and request services or data from each other without needing to understand the internal workings of the other system. They enable applications to interact and collaborate, simplifying the development of interoperability. While not always required, APIs often use web protocols (sets of instructions specific to computers or servers within a network) to execute requests, update data, and perform other tasks. APIs are very commonly used throughout digital repositories.
presenter notes Let's unpack what we just said about APIs and the web. Why do they often go hand in hand? A common way we send instructions to other computers around the world is by opening a web browser and typing in a URL to access a website. Here we are unknowingly prompting our web browser to send instructions to a server somewhere in the world. We are all very used to using HTTP for our own, human-centric purposes, especially for browsing the web. However, websites and the servers that host them contain parts or areas that enable them to speak to other computers, with or without a human prompting that communication.
presenter notes
- HTTP enables browsers to load web pages by requesting and receiving content from servers.
- It also powers APIs, allowing applications to send and receive data over the web using URLs.
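Here is a minimal sketch of what that looks like from a program's side, using Python's standard library. The URL is a placeholder (`example.org`), and the request is only constructed, not sent, so the snippet runs without a network connection.

```python
from urllib.request import Request

# Placeholder URL; substitute a real API address to try this against a live server.
url = "https://example.org/api/items"

# A browser and a script send the same kind of HTTP request. Here we ask
# for JSON instead of an HTML page by setting the Accept header.
request = Request(url, headers={"Accept": "application/json"}, method="GET")

print(request.get_method())          # GET
print(request.full_url)              # https://example.org/api/items
print(request.get_header("Accept"))  # application/json

# To actually send it, pass the request to urllib.request.urlopen(request).
```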
presenter notes
- Designed to be easy to read/write for humans and machines.
- Many APIs return data in JSON format because it is widely supported.
When we make an API request, the response we get back needs to be structured in a way that both humans and computers can understand. One of the most common formats for this is JSON, or JavaScript Object Notation. JSON is a lightweight, easy-to-read format used for exchanging data between systems. It's widely used in APIs because it's simple for machines to process while still being human-readable. When we requested data from the Library of Congress API, the response came back in JSON format, structured as key-value pairs that represent information. On the next slide, we'll take a look at how JSON is structured and why it's useful for APIs.
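A small sketch of reading JSON in Python. The response text below is made up for illustration; it is not an actual Library of Congress response, and the keys are hypothetical.

```python
import json

# Made-up response text, shaped like what an API might return.
response_text = '''
{
  "title": "Example collection",
  "date": "1995",
  "subjects": ["archives", "floppy disks"],
  "online": true
}
'''

record = json.loads(response_text)  # JSON text -> Python dictionary

print(record["title"])        # Example collection
print(record["subjects"][1])  # floppy disks
print(record["online"])       # True

# Going the other way: Python data -> indented ("pretty-printed") JSON text.
print(json.dumps(record, indent=2))
```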
presenter notes Think of an endpoint as a doorway to an API. Each API has multiple endpoints, each designed for a specific task, like searching for weather data or retrieving digitized images. More formally, an API endpoint is a specific point of interaction between an API and the outside world, typically represented by a URL where the API can receive requests and send responses. ArchivesSpace offers online documentation listing all of its available endpoints. Using our cooking analogy, the documentation is like a restaurant menu, and each endpoint is an item you can order. In this case, I want to "order up" a list of repositories. To do this, I would search the ASpace REST API documentation for the keyword "repository" to see what it offers. Sure enough, there is an endpoint called "Get a List of Repositories," which seems to be exactly what I need. You can check out the documentation here: [Get a List of Repositories](https://archivesspace.github.io/archivesspace/api/#get-a-list-of-repositories) The documentation tells me that the specific endpoint is called `/repositories`. So, what does this mean for me?
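One way to read it: the endpoint path gets appended to the address of a particular ArchivesSpace instance to form the URL a request is sent to. The sketch below shows that step only. The base address is hypothetical (each institution has its own), and depending on configuration a real instance may require authentication before it answers.

```python
from urllib.parse import urljoin

# Hypothetical base address for an ArchivesSpace API; every institution has its own.
base_url = "https://aspace.example.edu/api/"


def endpoint_url(base: str, endpoint: str) -> str:
    """Join an API base address with an endpoint path from the documentation."""
    return urljoin(base, endpoint.lstrip("/"))


print(endpoint_url(base_url, "/repositories"))
# https://aspace.example.edu/api/repositories
```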
presenter notes If you do not have the pretty-print option in your browser, you can copy and paste the data into this online tool: https://jsonformatter.org/json-pretty-print
presenter notes In the request URL, `q` stands for "query" and `fo` stands for "format".
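A short sketch of how those two parameters end up in a URL. It assumes the Library of Congress search address `https://www.loc.gov/search/` and a made-up search term; nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed search address and an example search term (both are illustrative).
base = "https://www.loc.gov/search/"
params = {"q": "baseball cards", "fo": "json"}

# urlencode turns the dictionary into a query string, escaping the space.
url = f"{base}?{urlencode(params)}"
print(url)  # https://www.loc.gov/search/?q=baseball+cards&fo=json

# Reading the parameters back out of a URL:
print(parse_qs(urlsplit(url).query))
# {'q': ['baseball cards'], 'fo': ['json']}
```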
presenter notes Back to the Bentley! In the Bentley integration, we learned that Archivematica, a web-based system, can talk to ASpace, another web-based system, using an API. They do this using HTTP together with REST, which is a set of conventions (an architectural style rather than a separate protocol) for structuring API requests around resources identified by URLs.
presenter notes Here we are seeing Archivematica calling up the archival object tree for a single ArchivesSpace resource using the ASpace API, and presenting it to the user as a list of nested folders.
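Here is a rough sketch of how a nested tree can be turned into an indented, folder-like listing. The sample data and its `title`/`children` keys are a simplified, assumed shape for illustration, not the actual ArchivesSpace API response or Archivematica's code.

```python
from typing import Dict, List

# Made-up tree; the "title"/"children" keys are an assumed, simplified shape.
tree = {
    "title": "Jane Doe papers",
    "children": [
        {"title": "Series 1: Correspondence", "children": [
            {"title": "Letters, 1990-1995", "children": []},
        ]},
        {"title": "Series 2: Digital files", "children": []},
    ],
}


def render(node: Dict, depth: int = 0) -> List[str]:
    """Return one indented line per node, parents before children."""
    lines = ["  " * depth + node["title"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(tree)))
```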