In this installment of the Insights Interviews series, a project of the National Digital Stewardship Alliance Innovation Working Group, I’m talking to Dirk von Suchodoletz from the Department of Computer Science at the University of Freiburg, Germany, and a representative to the Open Planets Foundation. He and his research partner Klaus Rechert visited the Library in October of 2012 to give a public presentation of their work on emulation, and we thought it would be useful to get him to discuss it in even more detail for the blog.
This is Part One of a two-part interview. Part Two will run on Tuesday Dec. 11, 2012.
Tell us briefly about your work at the University of Freiburg and your involvement in the Planets Project and the Open Planets Foundation.
I currently hold a position as a lecturer and principal researcher at the chair of “communication systems” at the Institute for Computer Science at Freiburg University. Besides doing research, I teach seminars, plan and organize courses, and supervise the scientific work and thesis preparation of the students working with the chair’s various research groups. The courses are at both the Bachelor and Master levels and focus on communication networks and computer systems, from the introduction of routing principles to telecommunication in large networks, covering issues like mobile networks, location-based services and privacy. Practical issues such as programming special client/server applications in Open Source environments or building blocks of our digital preservation workflows are part of the supervised student work.
My main research interests are in emulation-based digital preservation and access. I’m looking into the building blocks that could be the foundation of “Emulation as a Service,” a cloud service providing remote access to a wide range of different emulation services and allowing for object migration, access and interaction in the objects’ original environments.
Originally I got involved with emulation and digital preservation through a colleague’s thesis idea on maintaining access to popular computer games of the home computer era of the 1980s. This led to a first small research group joining the German nestor initiative in 2005, and we were invited to participate in the large-scale EU integrated project PLANETS, which ran from 2006 to 2010. We became part of the preservation action team and especially the emulation working group. This brought us in contact with practitioners at major memory institutions like the British Library and the National Libraries of the Netherlands, Denmark and Austria, as well as national archives.
In our research we looked into requirements for reliable emulation, automation of preservation workflows and remote access to original software and hardware environments. After the end of PLANETS we were among the founding members of the Open Planets Foundation (OPF) and started to look into digital software archiving, reference workstations for end-user access to diverse emulation services as well as migration-by-emulation projects. The OPF brings together the major players in the domain of digital preservation around Europe and coordinates events like the regular hackathons, bringing together practitioners and researchers to work on practical challenges, like the hackathon on emulation we just had here in Freiburg.
In 2011 our research group joined a Baden-Württemberg state-sponsored project on practical emulation workflows to complement services in libraries and archives. Recently I cooperated with the National Archives of New Zealand on an emulated office environment reference project and an early-1990s floppy disk recovery project.
Our readers may not fully understand some of the concepts behind emulation for digital preservation. Could you give us a little background?
Emulation is a concept in digital preservation for keeping things, especially hardware architectures, as they were. As the hardware itself might not be preservable as a physical entity, it can be preserved very well in a software reproduction. There are a variety of tools available to run a second operating system on top of your actual working environment. These so-called guest operating systems optimally do not “see” any difference between the real thing (the metal-and-circuits physical object) and its software reproduction. Thus it is no longer the hardware that needs to be kept functional, but only a piece of software.
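To make the host/guest relationship concrete, here is a minimal sketch of how a preserved environment might be launched with the open-source QEMU emulator. The disk and floppy image file names are hypothetical examples, and actually booting the guest would require QEMU to be installed on the host:

```python
# Sketch: assemble a QEMU invocation that boots a preserved guest
# environment. Image file names below are invented examples.

def emulator_command(disk_image, floppy_image=None, memory_mb=32):
    """Build the command line for an emulated x86 guest machine."""
    cmd = ["qemu-system-i386", "-m", str(memory_mb), "-hda", disk_image]
    if floppy_image is not None:
        # Attach a digital object (e.g. a document on a floppy image)
        # so it becomes visible inside the emulated environment.
        cmd += ["-fda", floppy_image]
    return cmd

if __name__ == "__main__":
    print(" ".join(emulator_command("msdos622.img", "object.ima")))
    # To launch for real: subprocess.run(emulator_command(...))
```

The guest system installed on the disk image would see an ordinary x86 PC, regardless of what the actual host hardware looks like.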
This approach brings a lot of advantages: emulators and installed original environments can easily be distributed and replicated. For memory institutions, old digital artifacts become easier to handle. They can be viewed, rendered and interacted with in their original environments and do not need to be adapted to our modern ones, avoiding the risk of unintentionally modifying some of the artifact’s significant properties. Instead of trying to mass-migrate every object in the institution’s holdings, objects are handled only on access request, significantly shifting the preservation effort.
Plus, the emulation approach maintains access to every type of object ever produced, provided an appropriate emulator and the original software are available. It offers the unique chance of using objects in their creation environment. In most cases the applications or operating systems developed by the format vendors or software producers are the best candidates for handling a specific object of a certain type. The vendors are expected to have the most complete knowledge of their own data formats, and especially for proprietary formats the publicly available format information is often incomplete. Often there is no alternative to these original environments, because of the proprietary nature of many digital objects and formats.
In short: emulation takes a different approach than the well-established migration strategies in digital preservation. Emulation strategies usually do not operate on the object itself but instead preserve the object’s original digital environment. Emulation helps us become independent of future technological developments and avoids modifying the digital artifacts held in a long-term archive.
How did you get interested in emulation as a digital preservation solution?
It was a combination of several strands. Working at a university requires you to find a PhD topic, and a colleague came up with the idea of eternal computer game access using emulation. Another strand was an interest in virtualization: I’m involved in an infrastructure project to make Windows maintainable and usable in student computer pool environments, where a wide range of different environments for courses has to be made available on top of the same machines. Combining these strands brought me to the digital preservation domain of emulation. This strategy can be used not just on today’s operating systems but on deprecated ones as well.
For example, imagine we are able to maintain the course installations that we’ve created today. In 10 years’ time those will still be the perfect environments in which to access and re-run digital artifacts that were originally produced there. This approach was proposed by Euan Cochrane at National Archives NZ and Maurice van den Dobbelsteen at the National Archive in The Hague: preserve typical governmental office environments in order to be able to reproduce typical artifacts from a certain era. Having been in the field of computer science for quite a while, I lost faith in the migration of proprietary formats years ago. Standards might be a nice thing, but if I cannot reproduce a “standard artifact” like a PowerPoint 4.0 presentation in today’s environments, the standard fails to serve me.
The computer industry and software vendors understandably have different agendas than memory institutions required to preserve the digital heritage. Emulation is in my opinion a very good compromise here, helping both sides: the industry can march forward and does not need to carry a long legacy, while memory institutions are enabled to preserve objects in an authentic way. The legal framework simply needs to be tweaked a bit to fully support this kind of solution. And of course it would be great if a vendor like Apple were more open to virtualizing Apple desktop machines and emulating mobile devices like the iPad. This would make initiatives like iPad-based schoolbooks much more sustainable and acceptable for both sides.
What are the key technical challenges that must be addressed for emulation to become a widely-used strategy?
Emulators face the same problems as every other software package and digital object, so perpetuating emulators for future use is a central component of a reliable preservation strategy. Hence, emulators need to be adapted regularly to current hardware and operating system combinations.
If the emulator is available as an Open Source package, a timely adaptation to each new computer platform can be ensured. Digital objects cannot be used by themselves; they require a suitable context, the already-mentioned working environment, in order to be accessed. This context must combine suitable hardware and software components so that the object’s creation environment, or a suitable equivalent, is generated, depending on the type of artifact a user is interested in. Up to now, the additional software components needed have been used implicitly but are not categorized and officially archived. The only memory institution I’m aware of that does this is the National Library of Australia, which preserves a big collection of software boxes and instruction material. Thus a missing operating system or firmware ROM of a home computer might render a digital object completely unusable, even with a perfectly running virtual replacement of the original machine.
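The dependency chain described here, from a digital object through an application and operating system down to an emulatable hardware platform, is sometimes called a “view path” in the emulation literature. A minimal Python sketch, with invented component names, shows both how such a path could be resolved from a software archive and how a single missing component breaks the chain:

```python
# Sketch: resolve a "view path" from a digital object's format down to
# an emulatable hardware platform. All component names are invented
# examples; a real software archive would hold many alternatives.

SOFTWARE_ARCHIVE = {
    "WordPerfect 5.1 document": ["WordPerfect 5.1"],  # rendering application
    "WordPerfect 5.1": ["MS-DOS 6.22"],               # required OS
    "MS-DOS 6.22": ["x86 PC"],                        # required platform
    "x86 PC": [],  # base case: provided by an emulator on today's host
}

def view_path(fmt, archive):
    """Walk the dependency chain, picking the first archived option."""
    path = [fmt]
    while True:
        options = archive.get(path[-1])
        if options is None:
            # A missing OS, application or firmware ROM makes the
            # object unusable, as noted above.
            raise LookupError("no environment archived for %r" % path[-1])
        if not options:
            return path  # reached an emulatable hardware platform
        path.append(options[0])
```

Calling `view_path("WordPerfect 5.1 document", SOFTWARE_ARCHIVE)` yields the full four-step chain down to the emulated x86 PC, while any component absent from the archive raises an error instead of producing a usable environment.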
In addition to storing and handling the digital objects themselves, it is essential that we store and manage the required set of software components. In order to allow non-technical individuals to access deprecated user environments, the tasks of setting up and configuring an emulator and of injecting digital objects into (and retrieving them from) the emulated environment have to be provided as easy-to-use services. Making these services web-based allows a large and virtually global user base to access and work with emulated systems.
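To illustrate what such an easy-to-use, web-based service could look like, here is a hedged sketch of a tiny session-creation endpoint using only the Python standard library. The URL layout and JSON fields are invented for this example and do not correspond to any real Emulation-as-a-Service API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch: a minimal web endpoint that starts an "emulation session"
# for a requested environment. A real service would launch an emulator
# and stream its display; this sketch only records the request.

SESSIONS = {}

class EmulationServiceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        session_id = str(len(SESSIONS) + 1)
        SESSIONS[session_id] = {
            "environment": request.get("environment", "unknown"),
            "status": "started",
        }
        body = json.dumps({"session": session_id}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

def make_server(port=0):
    """Bind to an ephemeral port by default."""
    return HTTPServer(("127.0.0.1", port), EmulationServiceHandler)
```

A browser front end could then POST the name of a preserved environment and receive a session handle, hiding all emulator configuration from the end user.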
Alongside managing the software components and associated documentation, a software archive must tackle the legal and technical problems of software licensing. For proprietary software, licensing may severely limit the rights of the institution to use the software to provide preservation services. Furthermore, technical approaches to protecting intellectual property, such as Digital Rights Management (DRM), copy protection mechanisms, and online update or registration requirements, all create significant problems for a software archive.
Tackling this problem will require software manufacturers to cooperate with a designated software archiving institution by providing suitably licensed, unprotected copies of software for long-term preservation purposes. We recommend developing an approach similar in concept to the legal deposit used by many national or copyright libraries.
End of Part One. Part Two will run on Tuesday Dec. 11, 2012.
Note: Edited on 12/10/12 to add research partner. Edited 12/12/12 to fix a typo.