presenter notes This semester’s syllabus is hosted on GitHub. GitHub is an online platform used to store and version information, and it is widely used in the digital archives and preservation fields. We will cover what GitHub is in more detail later in the semester, and see some "real life" examples of digital archiving and preservation repositories. For now, you will be using it primarily to access the class syllabus, as well as assignments and other documents we will be using for in-class activities. Syllabus link: https://github.com/kiddmary/HIST-GA-1011
presenter notes I want to step you through some basic concepts about what digital information is and, in particular, how it is encoded.
presenter notes Lyons, Bertram. "Digital Preservation." In The Digital Archives Handbook: A Guide to Creation, Management, and Preservation, edited by Aaron D. Purcell, 3-18. Rowman & Littlefield Publishers, 2019. Accessed September 11, 2023. http://ebookcentral.proquest.com/lib/nyulibrary-ebook/detail.action?docID=5646172, 3.
presenter notes Let's unpack this definition by thinking a bit about Data Objects we encounter through our life and work. We will return to defining Bitstreams later on.
Data Objects encapsulate various forms of digital content, such as documents, media, or software. Every Data Object, whether it is a single file or an entire application, requires some combination of specialized software, hardware, emulation, and knowledge to be faithfully rendered and understood, which in turn supports its long-term accessibility and preservation.
presenter notes
presenter notes In this next section, we will talk about how data objects use binary digits to encode digital information, and how computers in turn use that information to render content. Before we do so, it's good to pause and think broadly about how we use numbers to represent things in the world. My favorite example of this is a zip code. In the United States at least, a zip code is composed of five digits that enable the postal service to quickly identify where a piece of mail is bound or returning. Take a second here to think through why a zip code might be more efficient than a non-numeric system for representing a place in the world: that is, writing out the specific location where a piece of mail is heading. A good example is disambiguating between street addresses. Let's say I'm sending a letter to 11 91st Street. Though I haven't counted, there are likely many, many 11 91st Streets throughout the United States. Now, you can further clarify where the mail is going by writing out the city and state, which we commonly do when we prepare mail to be sent. However, this still might not be enough information. For example, in New York City, where we live, there are several 11 91st Streets, depending on the borough: there's one in Queens, another in Brooklyn, and an 11 East 91st Street in Manhattan. And, maybe this is a very New York-y thing, but I've received letters where the city and state are listed as New York, New York rather than Brooklyn: technically, both are true, since Brooklyn is a part of New York City. This is where zip codes come in handy. They're not random groups of five digits; instead, they are structured in a way that indicates with increasing granularity where something is going. The first digit represents a certain group of U.S. states, the second and third digits together represent a region in that group (or perhaps a large city), and the fourth and fifth digits represent a group of delivery addresses within that region.
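The structure of a zip code described in these notes can be sketched in a few lines of Python. This is only an illustration: the function name `split_zip`, the dictionary keys, and the sample value "11201" are assumptions made for this example, not anything the postal service defines.

```python
def split_zip(zip_code: str) -> dict:
    """Split a five-digit zip code into the three parts described in the notes."""
    if len(zip_code) != 5 or not zip_code.isdigit():
        raise ValueError("expected exactly five digits")
    return {
        "state_group": zip_code[0],       # first digit: a group of U.S. states
        "region": zip_code[1:3],          # second and third digits: a region or large city
        "delivery_group": zip_code[3:5],  # fourth and fifth digits: a group of delivery addresses
    }


# "11201" is a sample value chosen only for illustration.
parts = split_zip("11201")
print(parts)
```

Each part narrows down the destination a little more, which is the same idea of position carrying meaning that we will see with place values.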
presenter notes Over these next slides, I will be talking about the binary, or base-2, counting system. Why should we care about this? Because binary is as close as you can get to the underlying physicality of any computer. In our day-to-day work we are several layers removed from what goes on inside our computers: we are likely only interfacing with a GUI (graphical user interface: the buttons, windows, words, etc. displayed on your screen), or programming something using a specific language. But all of this is abstracted up from what are essentially just billions of transistors and logic gates.
presenter notes Binary is an encoding scheme that, instead of the decimal digits (0-9) we are used to, uses binary digits (1 and 0), known more commonly as "bits". So, a 1 is a bit, and a 0 is a bit, and that's all there is in a binary system: 1 or 0. Since there are only two possible values, binary is known as a base-2 system. Along with bits, binary also uses place values to represent information. Place value is a term we were all probably introduced to in elementary or middle school. So, let's switch gears and look at the encoding scheme we are most used to: the base-10 decimal system.
presenter notes Image source: https://commons.wikimedia.org/wiki/File:HP_1813-0091_top_case_removed.jpg#Summary
presenter notes If you did not know this already, the numbers that you and I are most familiar with are written in a "base-10" decimal system. The 10 in base-10 refers to the fact that it uses 10 digits (0-9) to represent numeric values.
presenter notes Therefore, when we write out a number like twelve (12), we don't have a single digit that represents 12 (if every number up to 12 had its own digit, we would need at least a base-13 system). Instead, we combine a 1 and a 2 to form 12. The 2 is in the "ones" place, which we know to be the right-most digit, and the 1 is in the "tens" place. By combining digits and using place values, we can represent any number.
presenter notes As in our "12" example, for 6,478,341, each digit’s **place** has a **weight**, a power of 10, that we subconsciously add together. We may even insert a nice comma in there to separate chunks of 3 places to make large numbers like this easier to read.
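The addition we do subconsciously can be written out explicitly. The short Python sketch below (variable names are just for illustration) breaks 6,478,341 into its digits, multiplies each by its power-of-10 weight, and adds the results back together.

```python
number = 6_478_341
digits = str(number)

total = 0
# Walk the digits from right to left, so place 0 is the "ones" place.
for place, digit in enumerate(reversed(digits)):
    weight = 10 ** place          # 1, 10, 100, 1,000, ...
    value = int(digit) * weight   # what this digit contributes
    total += value
    print(f"place {place}: digit {digit} x weight {weight:,} = {value:,}")

# The contributions add back up to the original number.
print(total == number)
```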
presenter notes A byte is a fixed-length grouping of bits. In the slide, we have an example of a byte whose length is 8 bits. You can think of a byte as a container that holds a certain number of bits. On nearly all modern computers a byte is 8 bits; what varies between systems is how many bits the processor handles at once (its word size), commonly 8, 16, 32, or 64.
presenter notes An 8-bit byte system means each byte contains 8 bits. Each bit represents 1 of 2 values: a 1 or a 0. To calculate how many different combinations of eight 1s and 0s are possible, we raise the number 2 (the 2 possible values) to the power of 8 (the 8 total bits). From this, we get 2^8 = 256 possible values.
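As a quick check of that arithmetic, the Python sketch below computes 2 to the power of 8 and then, as an independent confirmation, actually generates every possible 8-bit pattern and counts them. The variable names are illustrative only.

```python
from itertools import product

bits_per_byte = 8

# Two possible values per bit, eight bits per byte.
combinations = 2 ** bits_per_byte
print(combinations)

# Confirm by brute force: build every string of eight 1s and 0s.
all_bytes = ["".join(bits) for bits in product("01", repeat=bits_per_byte)]
print(len(all_bytes))
print(all_bytes[0], all_bytes[-1])
```

The first and last patterns printed are the all-zeros and all-ones bytes, the smallest and largest values an 8-bit byte can hold.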
presenter notes Here we compare an 8-bit Nintendo Entertainment System to a 16-bit system side by side. There are more colors, shades, textures, and tones in the right-hand screen. The more values you can encode, the more colors and other visual details you can represent on-screen.
presenter notes So how do we get from bits to Mario, or in my example in the slide, an image of Pikachu? The constituent parts of an image are known as pixels, which are tiny squares of one particular color. The color of a single pixel can be encoded in what is known as the Red, Green and Blue color model, aka RGB. The RGB color model creates colors by combining various levels of red, green, and blue. Let’s pretend that the system we are using to render Pikachu uses 8 bits per color, which means that each of the red, green and blue values is represented by a combination of eight 1s and 0s corresponding to the intensity of that color in the final color we see on the screen. We can express each of these 8-bit bitstreams as a decimal number ranging from 0 to 255. Each of these three values can be translated further into what are known as hexadecimal values. Hexadecimal uses sixteen characters (0-9 and A-F), each of which represents 4 bits. Since we are using 8 bits per color, each of the red, green and blue values corresponds to a 2-character hex value. Hex values can then be broken down into bits. In this case, F stands for 1111, so two Fs equal 11111111.
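To make the decimal, hex, and bit views of a pixel concrete, here is a minimal Python sketch. The pixel value (255, 255, 0), a pure yellow, is an assumption chosen for illustration and is not taken from the slide's Pikachu image; `describe_channel` is likewise a made-up helper name.

```python
def describe_channel(level: int) -> tuple[str, str]:
    """Return the 2-character hex value and the 8-bit string for one color channel."""
    if not 0 <= level <= 255:
        raise ValueError("an 8-bit channel must be between 0 and 255")
    return format(level, "02X"), format(level, "08b")


# Hypothetical pixel: full red, full green, no blue (a pure yellow).
pixel = {"red": 255, "green": 255, "blue": 0}

for channel, level in pixel.items():
    hex_value, bits = describe_channel(level)
    print(f"{channel:>5}: decimal {level:>3} = hex {hex_value} = bits {bits}")

# The three 2-character hex values are often written together as one color code.
hex_color = "#" + "".join(describe_channel(level)[0] for level in pixel.values())
print(hex_color)
```

Note how the full-intensity channels come out as FF and 11111111, matching the "two Fs" example in the notes.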
presenter notes Let’s shift from the raw binary representation to something more familiar—an actual word. In this case, let's use the word "OK" as an example. When you see the word "OK" on a computer screen, you’re looking at an abstraction built on several layers of encoded data. The process that brings that simple word to your screen involves multiple transformations, from human-readable characters to machine-interpretable code. In the table on the slide, the left-hand column names each of these layers, while the right-hand column shows how the computer encodes and interprets the information. We are going to "drill down" through these layers, one-by-one.
presenter notes The first layer is what you see: the letters "O" and "K." Notice how, in the chart, I call these "ASCII" (pronounced ask-key).
presenter notes Image source: https://upload.wikimedia.org/wikipedia/commons/1/1b/ASCII-Table-wide.svg
presenter notes Each letter is assigned a decimal number through a computer’s internal dictionary, also known as the ASCII table. The letter "O" corresponds to the decimal number 79, and "K" corresponds to 75.
presenter notes Then, these decimal values are often written in hexadecimal for compactness, where "O" becomes 4F and "K" becomes 4B. You can think of hexadecimal, sometimes called "hex" for short, as a kind of shorthand for bytes.
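These first layers can be reproduced with Python's built-in `ord` function, which looks up a character's numeric code, and `format`, which writes that number in hexadecimal. This is a small sketch for illustration; the variable names are arbitrary.

```python
word = "OK"

# For each character: the character itself, its decimal code, and its hex form.
rows = [(ch, ord(ch), format(ord(ch), "02X")) for ch in word]

for ch, decimal, hex_value in rows:
    print(f"{ch} -> decimal {decimal} -> hex {hex_value}")
```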
presenter notes These values are converted into their binary representations: 01001111 for "O" and 01001011 for "K." At their core, computers understand and process everything in bits and bytes. In this case, each character in "OK" is made up of 8 bits, with a specific combination of 1s and 0s. These bits are then stored physically in hardware.
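The sketch below goes one layer further down, turning "OK" into its two bytes and printing each byte as eight bits. It then reverses the process to show that the same bits can be read back into the original word. It assumes plain ASCII text and is only meant as an illustration.

```python
word = "OK"

# Encode the text as ASCII bytes, then show each byte as 8 bits.
encoded = word.encode("ascii")
bits = [format(byte, "08b") for byte in encoded]
print(bits)  # ['01001111', '01001011']

# Go back the other way: bits -> byte values -> text.
decoded = bytes(int(b, 2) for b in bits).decode("ascii")
print(decoded)  # OK
```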
presenter notes If we could microscopically zoom into the physical storage, like a hard drive or memory chip, we would see that these bits are stored as electrical charges or magnetic orientations. Think of each 1 and 0 as a tiny "on" or "off" switch, or a north/south magnetic direction. For example, a 1 might be represented by a magnetic field pointing in one direction, while a 0 is stored as the magnetic field pointing in the opposite direction. On a hard drive or chip, this encoding happens for every single bit, ensuring that what you see on the screen is faithfully represented by physical signals underneath. So, whether you're reading a word, watching a video, or listening to music, it's all fundamentally encoded in binary and stored physically as on/off signals or magnetic impressions. This entire process, from the word "OK" you see on the screen down to the magnetic signals on a storage device, is how modern computing translates information into a format both humans and machines can understand.
presenter notes Here is a sample list of binary values and their corresponding decimal values in an 8-bit system. In the right-most column, we have 10 decimal values, 0 through 9, alongside their corresponding binary values. In an 8-bit system, the complete list would show 256 possible values. You may have noticed that there seems to be a pattern in the placement of 1s and 0s as the decimal values go up in succession. Bytes are not arbitrarily assigned to decimal values: there is a mathematical system behind them, implemented physically by chains of logic gates that carry out arithmetic (adding, subtracting, etc.). Because of that system, if you take a binary value, you can work out, in a few steps, the decimal value it represents.
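A list like the one on the slide can be regenerated with a couple of lines of Python, which makes the pattern in the 1s and 0s easy to see. The column order here (binary on the left, decimal on the right) is an assumption about the slide's layout.

```python
# Pair each decimal value 0 through 9 with its 8-bit binary form.
table = [(format(decimal, "08b"), decimal) for decimal in range(10)]

for binary, decimal in table:
    print(f"{binary}  {decimal}")
```

Changing `range(10)` to `range(256)` would print the complete list for an 8-bit system.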
presenter notes Each bit has its own place or position, which is mapped out on the slide. In an 8-bit system, we have 8 possible place values, starting from place 0, up to place 7. Places are read from right to left.
presenter notes What do we mean by weight? A good example comes from the base-10 decimal system we are most familiar with.
presenter notes
- The 1 in Place 0 carries a weight of 2^0, or 1. We multiply 1 by 1 to get a Value of 1.
- The 1 in Place 1 carries a weight of 2^1, or 2. We multiply 2 by 1 to get a Value of 2.
- The 1 in Place 2 carries a weight of 2^2, or 4. We multiply 4 by 1 to get a Value of 4.
- Add together all values: 4 + 2 + 1 = 7
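The same place-and-weight steps can be sketched as a short Python function. This assumes the byte being worked through is 00000111 (1s in places 0, 1, and 2), and `binary_to_decimal` is a name made up for this example; Python's built-in `int(bits, 2)` does the same job in one call.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the weight of every place that holds a 1."""
    total = 0
    # Places are read from right to left, starting at place 0.
    for place, bit in enumerate(reversed(bits)):
        weight = 2 ** place        # 1, 2, 4, 8, ...
        total += int(bit) * weight
    return total


print(binary_to_decimal("00000111"))  # 7
```

Trying it on the "O" from earlier, `binary_to_decimal("01001111")` returns 79, the same decimal value we found in the ASCII table.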