presenter notes Last week, we started discussing born-digital processing and the hands-on work of managing and preserving digital materials through imaging and transfers. We’ll continue exploring different aspects of that work in the coming weeks. This week, we’re zooming out to look at systems: the tools that help us do this work at scale. To effectively manage digital collections, we need systems that can: a.) Store and process data about files, such as those created through imaging or digitization. b.) Make this data accessible to end-users (researchers, patrons, and casual browsers) so they can search, discover, and make sense of these materials. I’ll walk through two real-life examples of archival systems in use: one at Yale (where I work) and one at the Bentley Historical Library at the University of Michigan. These examples will show how different systems work together to support digital archiving. Beyond just having systems, we also need to ensure that they communicate with each other, and this is where interoperability comes in. If two systems can send, receive, or exchange data with each other, they are interoperable. One of the primary ways that systems "talk" to one another is through Application Programming Interfaces, or APIs. Today, we’ll introduce APIs and discuss how they allow different systems to share data, automate tasks, and enhance access to digital collections.

presenter notes To approach the topic of systems used in a digital repository (and to define what a digital repository is), let’s start by exploring the systems that support a physical archival repository. An archival repository is a physical place where archives are kept. The physical components of an archival repository are stored on shelves (aka “stacks”), ideally in closely monitored, climate-controlled spaces. These spaces protect the materials from risks such as rodents, bugs, and humidity, and they are organized so that items can be retrieved for access or remediation.

presenter notes Let’s imagine we are working in a purely physical repository: just record boxes on shelves, no digital items. That would be pretty rare today, but for the sake of this example, let's focus only on the systems that help manage physical collections. Let's pretend it's the year 1995. Here’s a non-exhaustive list of systems we might be using in this environment. Oftentimes, these systems work separately from one another. For example, an archivist might enter finding aid data into a word processing document, a spreadsheet or simple database might track object locations and reading room requests, a separate website might publish a list of archival collections, and the online catalog might have an entry for each archival collection, perhaps with a note to email the archives to schedule an appointment. Once you arrived for your appointment, the archivist would hand you a paper-based finding aid.

presenter notes Link to Rachael’s blog post: https://www.tdl.org/2019/04/what-is-a-digital-repository/ Digital repositories function much like physical archival repositories: both are designed to organize, store, and provide access to materials. The way digital repositories are set up, managed, and maintained often mirrors how physical stacks work, following similar principles of organization and preservation. However, digital repositories introduce added complexity. The biggest difference is in how materials are managed, which requires multiple layers of systems to track different aspects of digital objects. For example, a floppy disk in a box would be tracked in a collection management system, which records its physical location: inside a specific box, on a particular shelf, in a room with controlled temperature and humidity. But that same floppy disk may also have a digital presence that needs additional tracking. A separate system would record that a disk image exists, where it is stored on a server, and when its checksum was last verified. Modern archival systems also need to connect the physical object to its digital manifestation so that both are managed in relation to each other, and they need to make that information available to both repository managers and end-users in a way that is understandable and discoverable. A digital repository is just one layer in a larger system that tracks the physical and digital characteristics, versions, and events of an archival object. These layers work together to maintain integrity, access, and preservation across different formats. Just as physical stacks are supported by multiple systems, a digital repository is supported by multiple systems too.
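The checksum check mentioned above (often called a fixity check) can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the algorithm and hypothetical field names for the event record; real preservation systems record these events in their own formats (PREMIS is a common standard for this).

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's SHA-256 checksum, reading it in chunks (1 MiB by default)
    so large disk images never have to fit in memory all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected: str) -> dict:
    """Recompute the checksum and return a simple event record (hypothetical field names)."""
    actual = sha256_of_file(path)
    return {
        "file": str(path),
        "expected": expected,
        "actual": actual,
        "outcome": "pass" if actual == expected else "fail",
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "disk001.img"  # hypothetical disk image file
        image.write_bytes(b"pretend floppy contents")
        stored = sha256_of_file(image)  # checksum recorded at ingest
        print(verify_fixity(image, stored)["outcome"])  # pass
        image.write_bytes(b"corrupted contents")  # simulate damage to the file
        print(verify_fixity(image, stored)["outcome"])  # fail
```

The stored checksum is the "last known good" value; any later mismatch signals that the bits have changed and the copy needs attention.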

presenter notes While no perfect set of fully integrated systems exists, we are seeing more effort to make each of these system types work with the others. There are many reasons behind this push, but it likely has a lot to do with collections becoming increasingly online and networked. By nature, a network exchanges information, and to do that, systems need to be able to "talk" to each other in a way that is mutually understandable. We also saw an extra push during the COVID lockdowns, which increased patron expectations to be able to view materials in an online-only setting.

presenter notes Now that we’ve talked about how different archival systems interact, both for physical and digital materials, let’s step back and look at how these systems fit together structurally. In the world of technology, we often refer to these interconnected systems as a technology stack or simply a stack. A stack is a layered set of technologies, where different systems handle different responsibilities. Here, we’re using the term broadly to describe how different archival and library systems fit together (hence the quotes around the term "Stack"), rather than a specific programming/software stack. Examining examples of technology stacks helps us see how different systems communicate, where different types of data live, and how they integrate to form a functional ecosystem, whether we’re managing physical collections, digital archives, or both.

presenter notes As you can see, the systems that support digital repository operations are varied. I’ll go through each of these next, but it’s worth keeping in mind that they rarely work in isolation. In practice, digital repository systems are often bundled together and handle multiple functions at the same time. Also note that these categories describe systems by the functions they perform, not by specific products.

presenter notes While I’m going to define each system function one-by-one, remember: real institutions often bundle these functions together. One system can do multiple jobs at once — and we’ll see that in the Bentley case study (coming up).

presenter notes Please note that my use of "front-facing" or "front-end" refers to the parts of the system meant to be seen and used by an end-user (a patron, a researcher, a casual browser).

presenter notes If you want to take a deep dive into all the systems that are out there, there are a couple of resources to check out. The first is a crowd-sourced Google Sheet, “The Collection Management System Collection”, which was kick-started by Ashley Blewer, a software developer, educator, writer, and artist who has done incredible work, especially within the field of audio/visual and moving image preservation. In 2017, she made this spreadsheet publicly available so that people in the field could contribute system descriptions in matrix form. Another helpful resource is the Community Owned Digital Preservation Tool Registry (COPTR) Tools Grid, which uses a wiki format. This grid starts with a matrix of general digital preservation object types like “audio” or “ebook” on the Y-axis and broad digital preservation functional areas on the X-axis. You can click on any of the numbers to see a list of relevant tools for that object type and functional area, and drill down further into functional-area sub-categories. There are nearly 600 tools described in this wiki. These lists show how many different tools can fill the same function.

presenter notes Image from GIF Cities (https://web.archive.org/web/20091027084349/http://hk.geocities.com/kieou/3.htm) In the early days of digital archiving and preservation, a variety of platforms emerged to automate, standardize, and streamline various processes. Systems like ArchivesSpace, for example, were designed to let archivists accession collections, describe them accurately, and create and publish finding aids. Yet these systems were built in isolation, tailored to specific tasks without consideration for the full lifecycle of digital records.

presenter notes Systems integration describes “[a] functional coupling between software applications to act as a coordinated whole.” This definition comes from Max Eckard’s book *Making Your Tools Work for You*, where he notes that it was “adopted from... the ArchivesSpace Technical Advisory Committee (TAC) Integrations sub-team, which goes on to state that ‘a defining characteristic of all integrations is communication, i.e., seamless data flow–without a manual, intermediary step–between systems’” (4). Integration describes the ability of two or more systems to “talk” to one another. The development of interoperable standards and the adoption of holistic digital asset management solutions have started to bridge the gaps between previously isolated systems. These integrated platforms streamline the archival process, from digitization to online accessibility, reducing redundancy, minimizing errors, and significantly improving the discoverability of digital archives. In addition, integration allows you to keep your current system “ecosystem,” which is advantageous because no single system can do everything. In fact, a single all-encompassing system might not be ideal. This modular approach makes system ecosystems more flexible and adaptable over time.

presenter notes This is a screenshot of the system Archivematica. You will be using the Archivematica sandbox next week during your weekly activity.

presenter notes Don't worry, "SaaS", "open source" and "microservice" are defined in the next few slides!

presenter notes Rather than installing software on your computer, you access SaaS software through a web browser. The software and all its data are hosted and maintained on remote servers by a third-party provider. Archives-specific examples include hosted versions of Archivematica and ArchivesSpace, which service providers offer as SaaS. SaaS products you might have encountered in your own work include Google Drive and Zoom.

presenter notes The key idea behind open-source software is that it promotes collaboration and transparency, enabling developers and users to contribute to its improvement, adapt it to their needs, and share it freely.

presenter notes A microservice is an application designed to perform a single function within the digital curation and preservation process. In digital curation, this approach was developed by the California Digital Library (CDL), which in 2009 introduced a new way of thinking about the curation and preservation of digital objects. This reconceptualization challenged the assumption that “the curation and preservation of digital objects required the installation and operation of a single, long-lived application combining the necessary functions behind one user interface.” Instead, CDL proposed that “small, relatively simple utilities would pose fewer challenges in their development, deployment, maintenance, and enhancement than a large, integrated system, especially in the context of constant technological change.” Additionally, they noted that users could “easily adapt a set of distributed services to local conditions in different divisions and departments of the university, and easily replace each of them upon their obsolescence.”
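To make the idea concrete, here is a toy pipeline of three single-purpose functions in Python. The format table and normalization targets are hypothetical, and this is not how Archivematica implements its microservices; it only illustrates why small, independent steps are easy to rearrange or replace.

```python
import uuid
from pathlib import PurePath
from typing import Callable

# Each "microservice" is a small function that does one job:
# it takes a record describing a file and returns an updated copy.
Record = dict
Service = Callable[[Record], Record]

# Hypothetical lookup tables for illustration only; real tools (for example,
# Archivematica's Format Policy Registry) are far more detailed.
FORMAT_BY_EXTENSION = {".doc": "Microsoft Word", ".wav": "WAVE audio", ".jpg": "JPEG image"}
PRESERVATION_TARGET = {"Microsoft Word": "PDF/A", "JPEG image": "TIFF"}


def assign_identifier(record: Record) -> Record:
    return {**record, "id": str(uuid.uuid4())}


def identify_format(record: Record) -> Record:
    suffix = PurePath(record["filename"]).suffix.lower()
    return {**record, "format": FORMAT_BY_EXTENSION.get(suffix, "unknown")}


def plan_normalization(record: Record) -> Record:
    # If no preservation target is listed, keep the original format.
    target = PRESERVATION_TARGET.get(record["format"], record["format"])
    return {**record, "normalize_to": target}


def run_pipeline(record: Record, services: list[Service]) -> Record:
    """Pass the record through each service in turn."""
    for service in services:
        record = service(record)
    return record


pipeline = [assign_identifier, identify_format, plan_normalization]
result = run_pipeline({"filename": "Letter.DOC"}, pipeline)
print(result["format"], "->", result["normalize_to"])  # Microsoft Word -> PDF/A
```

Because each step only reads and adds fields on the record, swapping the toy extension lookup for a real format-identification tool would not require touching the other steps, which mirrors CDL's point about replacing services upon obsolescence.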

presenter notes ArchivesSpace (aka ASpace) is an archival system used primarily for the accessioning, arrangement, and description of archival collections. The data entered into ASpace can be exported as finding aids in EAD XML format so they can be viewed on the web. Collections, or bodies of work, are called “resources”. Within each resource, you will find various levels of hierarchy that describe how a particular body of work is arranged, such as series or sub-series, which in ArchivesSpace are known as "archival objects". Archival objects can also be linked to what are known as "top containers", which represent the physical containers or boxes that may be requested or circulated in a reading room or other special collections setting. So ArchivesSpace also has a collection management side, as well as places to accession materials and to make connections between archival objects and digital objects.
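The resource / archival object / top container hierarchy can be pictured as a small tree. The Python sketch below is a deliberately simplified model, not ArchivesSpace's actual data model: in ArchivesSpace, the link from an archival object to a top container goes through an instance and sub-container record, which is omitted here. The collection and its contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopContainer:
    indicator: str  # the box number a patron would request


@dataclass
class ArchivalObject:
    title: str
    level: str  # e.g., "series", "subseries", "file"
    containers: list[TopContainer] = field(default_factory=list)
    children: list["ArchivalObject"] = field(default_factory=list)


@dataclass
class Resource:
    title: str
    children: list[ArchivalObject] = field(default_factory=list)


def boxes_for(resource: Resource) -> list[str]:
    """Walk the hierarchy and list each box once, in the order first encountered."""
    found: list[str] = []

    def walk(objects: list[ArchivalObject]) -> None:
        for obj in objects:
            for box in obj.containers:
                if box.indicator not in found:
                    found.append(box.indicator)
            walk(obj.children)

    walk(resource.children)
    return found


# Hypothetical collection for illustration
box1, box2 = TopContainer("1"), TopContainer("2")
papers = Resource("Jane Doe Papers", [
    ArchivalObject("Correspondence", "series", children=[
        ArchivalObject("Letters, 1990-1992", "file", [box1]),
        ArchivalObject("Letters, 1993", "file", [box1, box2]),
    ]),
    ArchivalObject("Floppy disks", "series", [box2]),
])
print(boxes_for(papers))  # ['1', '2']
```

Several archival objects can share one box, which is why the same top container shows up in more than one place in the tree.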

presenter notes In our system-type terms, DSpace is doing two jobs at Bentley: 1 - It stores and organizes access-ready objects. 2 - It is the public interface where users find and retrieve them. That overlap is normal: it’s an example of the bundled functions we've been discussing throughout today.

presenter notes In the Bentley article, you read about how they integrated Archivematica, ASpace, and DSpace. The integration was modeled after the Digital Curation Centre (DCC) Curation Lifecycle Model. Before we look at the Bentley Library example, we should understand what the DCC Model is, what it is for, and specifically how it differs from the OAIS Reference Model.

presenter notes The DCC Curation Lifecycle Model provides a high-level graphical overview of the stages required for successful curation and preservation of data, starting from initial conceptualization or receipt. This model can be used to plan activities within an organization or consortium to ensure all necessary stages are undertaken in the correct sequence. It enables granular functionality to be mapped against the lifecycle, helping to define roles and responsibilities and to build a framework of standards and technologies for implementation. Additionally, it supports the identification of extra steps that may be required, actions that are unnecessary for specific situations or disciplines, and ensures that processes and policies are thoroughly documented. For more information, refer to the [DCC Curation Lifecycle Model PDF](https://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf).
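As a rough illustration of using the model to check that "all necessary stages are undertaken in the correct sequence," the Python sketch below lists the model's sequential actions and compares a workflow plan against them. The plan and its system assignments are hypothetical, loosely echoing the Bentley example; they are not taken from the Bentley article.

```python
# Sequential actions from the DCC Curation Lifecycle Model
DCC_SEQUENTIAL_ACTIONS = [
    "Conceptualise",
    "Create or Receive",
    "Appraise and Select",
    "Ingest",
    "Preservation Action",
    "Store",
    "Access, Use and Reuse",
    "Transform",
]


def check_plan(plan: list[tuple[str, str]]) -> dict:
    """Report stages the plan skips, stages the model doesn't define,
    and whether the known stages appear in model order."""
    stages = [stage for stage, _system in plan]
    known = [s for s in stages if s in DCC_SEQUENTIAL_ACTIONS]
    positions = [DCC_SEQUENTIAL_ACTIONS.index(s) for s in known]
    return {
        "missing": [s for s in DCC_SEQUENTIAL_ACTIONS if s not in stages],
        "unknown": [s for s in stages if s not in DCC_SEQUENTIAL_ACTIONS],
        "in_order": positions == sorted(positions),
    }


# Hypothetical plan: (lifecycle stage, system or actor responsible)
plan = [
    ("Create or Receive", "donor transfer"),
    ("Appraise and Select", "Archivematica"),
    ("Ingest", "Archivematica"),
    ("Preservation Action", "Archivematica"),
    ("Store", "DSpace"),
    ("Access, Use and Reuse", "DSpace"),
]
report = check_plan(plan)
print(report["missing"])   # ['Conceptualise', 'Transform']
print(report["in_order"])  # True
```

A "missing" stage is not automatically a problem: as the model itself allows, some actions may be unnecessary for a given situation, so the report is a prompt for discussion and documentation rather than a verdict.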

presenter notes Here’s my cleaned-up, more accessible version of the Bentley diagram. The Bentley Historical Library's integrated system achieved the following: - Provided archivists with access to the ArchivesSpace interface directly from within Archivematica. This allowed them to use information generated during the Archivematica ingest process to inform appraisal tasks. - Enabled archivists to view ArchivesSpace resource records, add or edit archival descriptions, and create digital object instances in the finding aid, all from a tab within Archivematica, without switching over to ArchivesSpace. - Archivematica creates preservation packages (AIPs) and can deposit content into DSpace, which Bentley uses as the repository and access layer.
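To give a sense of the data that moves between systems in an integration like this, the Python sketch below builds (but does not send) two JSON bodies in the shape ArchivesSpace's API uses: a digital object record, which a client would POST to `/repositories/:repo_id/digital_objects`, and an instance that links that digital object to an archival object. This is not Bentley's code; the payloads are simplified (real records accept many more fields), and the title, identifier, and URLs are hypothetical.

```python
import json


def digital_object_payload(title: str, identifier: str, file_uri: str) -> dict:
    """Simplified JSON body for creating a digital object record in ArchivesSpace."""
    return {
        "jsonmodel_type": "digital_object",
        "title": title,
        "digital_object_id": identifier,
        # Points to where the access copy lives (at Bentley, that would be DSpace)
        "file_versions": [{"jsonmodel_type": "file_version", "file_uri": file_uri}],
    }


def digital_object_instance(digital_object_uri: str) -> dict:
    """Instance that attaches an existing digital object to an archival object."""
    return {
        "jsonmodel_type": "instance",
        "instance_type": "digital_object",
        "digital_object": {"ref": digital_object_uri},
    }


# Hypothetical values for illustration
payload = digital_object_payload(
    "Disk image, 1995",
    "example-0001",
    "https://repository.example.edu/handle/0000/0001",
)
print(json.dumps(payload, indent=2))
print(json.dumps(digital_object_instance("/repositories/2/digital_objects/1")))
```

In an automated workflow, a script (or a tool like the Bentley integration) would send the first body, read the new record's URI from the response, and add the second body to the archival object's list of instances, with no manual copy-and-paste step in between.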

presenter notes The interoperability of these systems at Bentley was achieved using APIs. Application Programming Interfaces, or APIs, provide a way for different software applications to communicate and request services or data from each other without needing to understand the internal workings of the other system. Because each application only needs to know the other’s published interface, APIs make interoperability much simpler to build. APIs do not have to run over the web, but many use web protocols (agreed-upon rules for how computers on a network exchange messages) to execute requests, update data, and perform other tasks. APIs are used throughout digital repositories.

presenter notes Let’s unpack what we just said about APIs and the web. Why do they often go hand in hand? A common way we send instructions to other computers around the world is by opening a web browser and typing in a URL to access a website. When we do this, we are prompting our web browser to send instructions to a server somewhere in the world, usually without thinking about it. We are all very used to using HTTP for our own, human-centric purposes, especially for browsing the web. However, websites and the servers that host them can also expose interfaces designed for other programs, so that computers can exchange data with each other, with or without a human prompting that communication.

presenter notes - HTTP enables browsers to load web pages by requesting and receiving content from servers. - It also powers APIs, allowing applications to send and receive data over the web using URLs.
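Here is a small Python sketch of what happens when a URL becomes an HTTP request. It uses only the standard library and builds the request without sending it. The example URL assumes that loc.gov returns JSON when `fo=json` is added to a search URL, as described in the Library of Congress API documentation; the search term is just an illustration.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request


def build_get(base_url: str, params: dict) -> Request:
    """Build (but do not send) a GET request with URL query parameters."""
    return Request(f"{base_url}?{urlencode(params)}", headers={"Accept": "application/json"})


# Assumption: loc.gov returns JSON when "fo=json" is added to a search URL.
req = build_get("https://www.loc.gov/search/", {"q": "floppy disks", "fo": "json"})
parts = urlsplit(req.full_url)

print(req.get_method())  # GET (no body is attached)
print(parts.netloc)      # www.loc.gov (which server to contact)
print(parts.path)        # /search/ (which resource on that server)
print(parts.query)       # q=floppy+disks&fo=json (extra instructions)

# Roughly the text the client writes to the server when the request is sent
# (real clients also add headers such as User-Agent):
raw_request = (
    f"{req.get_method()} {parts.path}?{parts.query} HTTP/1.1\r\n"
    f"Host: {parts.netloc}\r\n"
    "Accept: application/json\r\n\r\n"
)
print(raw_request)

# To actually send it (requires network access):
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       body = resp.read().decode("utf-8")
```

A browser does essentially the same thing when you type a URL; the difference with an API is that the program reading the response is another piece of software rather than a person.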

presenter notes - Designed to be easy to read/write for humans and machines. - Many APIs return data in JSON format because it is widely supported. When we make an API request, the response we get back needs to be structured in a way that both humans and computers can understand. One of the most common formats for this is JSON, or JavaScript Object Notation. JSON is a lightweight, easy-to-read format used for exchanging data between systems. It’s widely used in APIs because it’s simple for machines to process while still being human-readable. When we requested data from the Library of Congress API, the response came back in JSON format—structured as key-value pairs that represent information. On the next slide, we’ll take a look at how JSON is structured and why it’s useful for APIs.
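As a quick illustration of key-value pairs, the Python sketch below parses a JSON string with the standard `json` module. The response text is hypothetical and heavily simplified; it is not real Library of Congress output.

```python
import json

# A hypothetical, simplified API response (not real Library of Congress output).
response_text = """
{
  "title": "Letter to the editor",
  "date": "1995",
  "online": true,
  "subjects": ["newspapers", "correspondence"],
  "location": {"box": 4, "folder": 12}
}
"""

record = json.loads(response_text)      # JSON text -> Python dict
print(record["title"])                  # Letter to the editor
print(record["location"]["box"])        # 4
print(type(record["online"]).__name__)  # bool
back_to_text = json.dumps(record, indent=2)  # Python dict -> JSON text
```

Each key ("title", "location") names a piece of information, and values can be text, numbers, true/false, lists, or further nested objects.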

presenter notes Think of an endpoint as a doorway to an API. An API endpoint is a specific point of interaction between an API and the outside world, typically represented by a URL where the API can receive requests and send responses. Each API has multiple endpoints, each designed for a specific task, like searching for weather data or retrieving digitized images. ArchivesSpace offers online documentation listing all of its available API endpoints. Using our cooking analogy, that documentation is like the menu of a restaurant. In this case, I want to "order up" a list of repositories. To do this, I would search the ASpace REST API documentation for the keyword "repository" to see what it offers. Sure enough, there is an endpoint called "Get a List of Repositories," which seems to be exactly what I need. You can check out the documentation here: [Get a List of Repositories](https://archivesspace.github.io/archivesspace/api/#get-a-list-of-repositories) The documentation tells me that the specific endpoint is called `/repositories`. So, what does this mean for me?
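Before calling `/repositories`, an ArchivesSpace client typically logs in with `POST /users/:username/login` and then sends the returned session token in an `X-ArchivesSpace-Session` header. The Python sketch below assumes a local ArchivesSpace backend on port 8089 (the default API port) and uses a placeholder token. It builds the request without sending it and parses a hypothetical, trimmed response body.

```python
import json
from urllib.request import Request

# Assumption: a local ArchivesSpace backend on its default API port.
BASE_URL = "http://localhost:8089"


def repositories_request(session_token: str) -> Request:
    """Build (but do not send) a GET request for the /repositories endpoint."""
    return Request(
        f"{BASE_URL}/repositories",
        headers={"X-ArchivesSpace-Session": session_token},
    )


def summarize(response_text: str) -> list[tuple[str, str]]:
    """Pull (repo_code, uri) pairs out of a /repositories JSON response."""
    return [(repo["repo_code"], repo["uri"]) for repo in json.loads(response_text)]


req = repositories_request("example-session-token")  # placeholder, not a real token
print(req.get_method(), req.full_url)  # GET http://localhost:8089/repositories

# Sending it would look like this (requires a running ArchivesSpace instance):
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(summarize(resp.read().decode("utf-8")))

# A hypothetical, trimmed response body for illustration:
sample = '[{"uri": "/repositories/2", "repo_code": "DEMO", "name": "Demo Repository"}]'
print(summarize(sample))  # [('DEMO', '/repositories/2')]
```

The `uri` values in the response are themselves paths to further endpoints, so a script can move from the list of repositories to the records inside each one.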

Week 4

Digital Archives Systems

Today

Announcements

Digital Archives Systems

Definition

Archival Repository (Physical)

System Types - Physical Based

System Types

Systems Integration

Definition

Technology "Stack"

List of Digital Repository System Functions

Descriptive and Bibliographic

Digitization Workflow

Digital/Media Asset Management

Digital Preservation

Public Access and Discovery

Metadata Management

Storage Infrastructure

Rights and Access

Workflow and Project Management

There are so many systems!

Integrating Systems

How to Knit Software Systems Together

Question

Why might one system need to communicate with another?

Challenges of Disconnected Systems

Definition

Systems Integration

Short Stack: Bentley Systems Integration

Archivematica + ArchivesSpace + DSpace

Digital Preservation System

Archivematica

Definition

Software as a Service (SaaS) - 1/2

Definition

Software as a Service (SaaS) - 2/2

Definition

Open source

Definition

Microservice

Just Some Archivematica Microservices

Definition

Normalize

Descriptive & Bibliographic System

ArchivesSpace (ASpace)

Digital/Media Asset Management System

DSpace

Definition

Digital Curation

Definition

Application Programming Interface (API)

You have likely used an API unknowingly!

Definition

Hypertext Transfer Protocol (HTTP)

Definition

JavaScript Object Notation (JSON)

Definition

Endpoint

Definition

REpresentational State Transfer (REST)

REST Method: GET

REST Method: POST

REST Method: PUT

REST Method: DELETE
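The four REST methods map onto the basic ways of working with a record: read (GET), create (POST), update or replace (PUT), and remove (DELETE). The Python sketch below builds one request of each kind against a hypothetical API at `api.example.org` without sending anything. Not every API follows this mapping exactly; ArchivesSpace, for example, uses POST rather than PUT to update most records.

```python
import json
from typing import Optional
from urllib.request import Request

# Hypothetical API used only for illustration
COLLECTION_URL = "https://api.example.org/items"
ITEM_URL = "https://api.example.org/items/42"


def make_request(method: str, url: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request, encoding any body as JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


examples = [
    make_request("GET", ITEM_URL),                               # read item 42
    make_request("POST", COLLECTION_URL, {"title": "New item"}),  # create a new item
    make_request("PUT", ITEM_URL, {"title": "Revised title"}),    # replace item 42
    make_request("DELETE", ITEM_URL),                            # remove item 42
]
for req in examples:
    print(req.get_method(), req.full_url)
```

GET and DELETE carry no body because the URL alone identifies what to read or remove; POST and PUT carry JSON describing the record being created or changed.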

APIs enable...

Weekly Activity

Short Stack