Speaker 1: I'm Oluwashiong Obashola, a librarian from Nigeria working at the British Library in London on a project tagged "Big Data and Libraries". The video you're about to watch focuses on digital archiving. The aim of the video is to provide basic knowledge about digital archiving for librarians and archivists in training, especially in Africa. Digital archiving starts with digitization of archival materials and involves the use of technology, both hardware and software, in managing valuable records that are born digital and those converted into digital form. Materials for digital archiving should be original, authentic, reliable and usable. Digital archiving involves planning, creation of digital objects, acquisition or ingestion, cataloguing, preservation and storage, access for use and reuse, and evaluation. However, the processes can vary depending on the type of material.
Speaker 2: To plan well for digitization, one needs to spend quite a bit of time on scoping for projects, and this would involve answering quite a few questions, such as what are we to digitize and why is this important, who is the audience we are digitizing this for, how are we going to achieve the digitisation, what access would look like, and what do we know about the collection, do we have metadata available, and of course to consider what the budget will be, what the condition of the collection is, and are there any rights issues with the collection.
Speaker 1: Archiving ensures preservation of valuable records for long-term use and efficient retrieval of important records for use and reuse. It prevents loss of valuable records.
Speaker 3: From an archiving point of view, the main thing we're worried about is that web resources rot; they disappear very quickly. We did some studies on the stuff we've collected over the years and whether it's still on the web now, and we found that even within one or two years, more than half of the content we'd archived had already disappeared from the live web. It's not just about collecting a snapshot of how the stuff looked at a given time, it's also the fact that most of this content simply disappears within a year or two of being published. And we found that when you're 10 years out, less than about 5% of content is still on the web, unchanged as it was 10 years ago.
Speaker 1: Archiving ensures the provision of records that can tell stories about events, individuals, people or organisations. The materials can also be used as evidence during legal proceedings to ensure justice. During preparation for digital archiving, there is the need for careful consideration of both the hardware and the software that will be used, to avoid digital obsolescence.
Speaker 4: When you have a book, you can put a book on a shelf very safely for a long time without anything happening to it. But if you have a digital object and you don't check it regularly, it can become inaccessible very quickly, and that's because the world around it changes so quickly. So the means that you needed to have in order to access it may not be available anymore, but we may not be aware of it. So what we need to do is we need to, what we call, characterize it. So we need to make an inventory of all the very important properties of this digital object. And then we need to keep a watch and see how the world changes and whether there is a certain risk arising that may affect the digital object. So when we find out that, for example, a file format is no longer supported or a license for a software product expires and we have no right anymore to access it, then we need to think about what preservation actions we need to take in order to ensure long-term, continued access to it. And so what we have is we have constant watchfulness, and we have, as I said earlier, the provenance, where we're going from possibly one migration to another, so we keep the thing accessible. Or alternatively, if we don't want to migrate, for example, we could say we emulate the platform on which the digital object was used. So if I have an old computer system, and this computer system is just not used anymore, but there are lots of files that all used to run on it, what I can do instead, I can take a modern computer system and I can emulate that old system on it. And that means then on the emulation platform, I can still use all those digital objects.
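The characterize-and-watch routine described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the properties recorded (name, extension, size, SHA-256 checksum) and the `AT_RISK_EXTENSIONS` watch list are hypothetical choices, not the inventory or risk register used at the British Library.

```python
import hashlib
from pathlib import PurePath

# Hypothetical watch list: formats an archive might have flagged as at risk.
AT_RISK_EXTENSIONS = {".wpd", ".swf"}


def characterize(name, data):
    """Record a few basic properties of a digital object for the inventory."""
    return {
        "name": name,
        "extension": PurePath(name).suffix.lower(),
        "size_bytes": len(data),
        # A checksum lets later checks confirm the bytes have not changed.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def needs_preservation_action(record, at_risk=AT_RISK_EXTENSIONS):
    """Flag objects whose format appears on the watch list."""
    return record["extension"] in at_risk
```

In practice the bytes would be read from storage and the inventory re-checked on a schedule, so that a format added to the watch list triggers a decision about migration or emulation.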
Speaker 1: Hardware used for digital archiving includes computers, scanners, cameras and storage devices.
Speaker 5: The scanners, the main scanners we use, the Zeutschel 1200 scanners, they're a scanner that has a glass platen, but that platen doesn't have to be used. We can scan without the glass on those machines. So for more fragile items, stuff that we have to be particularly careful with, those machines are very good for that sort of work. So the software will correct out the curvature in the pages to a large degree. The beds are highly adjustable, so regardless of the size of the item, we can get them to sit on there nicely. The larger machines like the one behind me, although we can scan without the glass, it's a little more restrictive on a machine like this, however on these machines the beds are highly adjustable. The pressure of both the glass and the bed itself can be adjusted to a very high degree to make sure that there's no damage to the material, but some material is sent to us and we are told you cannot touch the surface with the glass. So this will dictate which machines we use for which projects. As I was speaking about the collection care issues earlier. Also within one project we might have different types of material.
Speaker 6: This is the machine we would use for any cut sheet items or loose sheet items. We get a lot of that. What people send could be anything from card files; we might have material that's actually been cut so we can feed it through as individual sheets, and this automatically scans both sides at the same time. It can also detect color, black and white or grayscale, and then we can output to all the different formats from this as well. The operator only has to select whichever job with the predetermined job settings in. So they select them, it defaults to up and we size the tray at the bottom. It has ultrasonic sensors which detect there's paper there and will also detect two sheets feeding at once as well and will give us a warning, so when we're happy it's loaded okay we can then start the batch scan, which I've just done. And as you can see, it's very fast. I haven't actually got the auto-rotate on, but we can turn the auto-rotate on. We don't always have it on because sometimes you have images that aren't meant to be rotated. We do that manually afterwards, but as you can see, it's captured front and back. If it's identified some color, it'll capture in color. If it hasn't identified color, it'll capture in black and white or grayscale. For these card files it's actually got ignore blank page on, so it's not captured the back because there's nothing on there and it's just captured the front of the cards. We would then, once it's finished, just click on the finalize button and, if you can see this, it's starting to process the batch. That will be creating from the job setting: either we've asked it to just send a TIFF or a multi-TIFF to a particular folder on the network, or it might be creating a PDF and it might be doing the OCRing of that PDF as well, and delivering it to a folder we've specified already for that job for someone to then check the work and post-process it further.
Speaker 1: The date the archival object was created should be in the ISO 8601:2004 standard format. The ISO 8601:2004 standard format provides a consistent method for version tracking over the years. It is also important to state the context in which a digital object was created. For an estimation of the file size for an image, the formula displayed can be used. For easy access either globally or locally, archival materials can be deployed by adapting the Open Archival Information System (OAIS), ISO standard 14721:2003. See ICRC standards for more information.
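The formula displayed in the video is not reproduced in this transcript. As a rough sketch, the Python below shows an ISO 8601 calendar date and one commonly used estimate for an uncompressed image: pixel count times bit depth, divided by 8 to give bytes. The function names and the choice of inches and dots per inch as inputs are assumptions for illustration, and compressed formats will come out smaller.

```python
from datetime import date


def iso8601_date(year, month, day):
    """Return a calendar date in ISO 8601 extended form (YYYY-MM-DD)."""
    return date(year, month, day).isoformat()


def estimate_image_bytes(width_in, height_in, dpi, bit_depth):
    """Estimate uncompressed image size in bytes.

    Assumed formula: (width x dpi) x (height x dpi) x bit depth / 8.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bit_depth // 8
```

For example, an 8 by 10 inch page scanned at 300 dpi in 24-bit color comes to 2400 x 3000 pixels, or about 21.6 million bytes before compression.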
Speaker 7: The key, I think, first step if you want people to access your information is to think about who are the likely user groups that may want to come to you and where are they likely to find the information that you want to get them to access. So having your own library catalogue or archival catalogue or sort of a page of information of the collections that you have is quite important, but equally you have to make sure that other services can pick up the information. So for example when you design your website it's quite important to think about how Google and the other search engines can index that website, and to have enough information on there that's set up in a way that Google can find it. The keyword here is search engine optimization, and there are quite a lot of websites that provide basic information on how to set this up. So it's effectively thinking outside the existing library catalogue and asking: what are my users using? Are there social networks that are perhaps prominent in the country that you're in? Or are there tools that are provided by publishers and by others?
Speaker 1: Digital object identifiers, DOIs, should be obtained for archival materials. A DOI is a special persistent tag for an archival material. DOIs link the user to three things: the object, its metadata, and the current provider's commitment statement.
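As a small illustration of how a DOI leads a user to an object, the Python sketch below turns a DOI into a link through the public doi.org resolver. The pattern used to check the DOI is a simplified assumption that covers most modern DOIs, not a formal definition, and the DOI in the usage note is a made-up placeholder.

```python
import re
from urllib.parse import quote

# Simplified shape check (an assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Build a resolver link for a DOI, rejecting strings that do not look like one."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    # Percent-encode unusual characters in the suffix but keep the slash.
    return "https://doi.org/" + quote(doi, safe="/")
```

Calling `doi_to_url("10.1234/example.5678")` with this placeholder DOI gives `https://doi.org/10.1234/example.5678`, which the resolver would redirect to wherever the current provider hosts the object.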
Speaker 8: We use, as an overall framework, METS, the format that was developed at the Library of Congress, for maintaining information about digital objects. We use that as a framework. And within that, we may plug in then other metadata elements relating to items like preservation and description. So for preservation, we would use the PREMIS standard. For descriptive metadata, we'd use the descriptive standard which is appropriate to the type of material. So in the case of books and serials, traditional material that is catalogued by the library, we would use something like the MARC 21 or MARCXML descriptive framework, and use the new RDA, Resource Description and Access, standard for cataloguing rules for that material. But for something like manuscript material, we would use a different set of descriptive standards which are appropriate to that archival world. So, for example, the ISAD(G) standard for general materials. We might, in the case of doing some detailed digitisation of a manuscript, a particular Asian material at the moment, we're experimenting with the Text Encoding Initiative metadata standard for enriching the descriptive information around the manuscript itself and also creating metadata. We're holding that in our integrated archive and manuscript system and we are then exporting that to users as required. With commercial material, often you can use standard identifiers, such as in the case of an e-book, an ISBN, or an e-journal, an ISSN, to sometimes derive additional data to enhance the description. But in the case of the unique manuscript and older digital material, that's much harder to do due to the lack of widely available unique identifiers for that type of material.
But one thing we are also looking at doing is, where possible, enhancing certain elements within the descriptive metadata itself through the addition of things like the new International Standard Name Identifier (ISNI), effectively the ISBN for authors, a unique identifier that enables us to assign identities. Incredibly wide range of material to deal with, a huge amount of challenge to handle and a lot of expectations to deal with as well, and that's perhaps the greatest challenge of all because people are always seeing the most cutting edge developments in the world.
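The idea of METS as an outer framework with descriptive metadata plugged in can be sketched with Python's standard XML library. This is a minimal illustration, not the British Library's profile: it wraps a single Dublin Core title (chosen here for brevity in place of MARC, ISAD(G) or TEI), the identifiers are hypothetical, and the output is not claimed to validate against the full METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_mets(object_id, title):
    """Return a small METS document wrapping one descriptive metadata record."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("dc", DC_NS)
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    # Descriptive metadata section: the slot where a descriptive standard plugs in.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    # Structural map: points the object's top-level division at its description.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    ET.SubElement(struct, f"{{{METS_NS}}}div", {"DMDID": "DMD1"})
    return ET.tostring(mets, encoding="unicode")
```

Preservation metadata such as PREMIS would sit in a separate administrative section of the same document, which is what makes the framework useful as a single container for each digital object.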
Speaker 1: I hope you have enjoyed this video and you found it both useful and informative. Many thanks to Chevening, British Library and British Library staff for supporting the project.