Modeling as a Sustainability Strategy for DH Software Applications

Fischer, Anna
Data Center for the Humanities, University of Cologne, Germany
anna.fischer@uni-koeln.de

Harzenetter, Lukas
Institute of Architecture of Application Systems, University of Stuttgart, Germany
harzenetter@iaas.uni-stuttgart.de

Schildkamp, Philip
Data Center for the Humanities, University of Cologne, Germany
philip.schildkamp@uni-koeln.de

Breitenbücher, Uwe
Institute of Architecture of Application Systems, University of Stuttgart, Germany
breitenbuecher@iaas.uni-stuttgart.de

Neuefeind, Claes
Data Center for the Humanities, University of Cologne, Germany
c.neuefeind@uni-koeln.de

Leymann, Frank
Institute of Architecture of Application Systems, University of Stuttgart, Germany
leymann@iaas.uni-stuttgart.de

Mathiak, Brigitte
Data Center for the Humanities, University of Cologne, Germany
bmathiak@uni-koeln.de


1. Introduction

Given the growing relevance of digital methodologies for the Humanities, research within this diverse field yields an increasingly broad and heterogeneous range of software solutions to produce, present and persist its scientific findings (Buddenbohm et al. 2016; Reiche et al. 2014). Applications developed in the Digital Humanities (DH) include presentation systems, interactive visualizations, queryable research databases and digital editions, among others. In contrast to static research data, such as (collections of) digital text documents or audio recordings, research applications are highly dependent on their surrounding ecosystem, i.e., on the infrastructure they run on, on their operating system and on other components such as a web server or database. These ecosystems evolve alongside continuous technological advances and deprecations and can be subject to software aging (Grottke et al. 2008). Furthermore, applications might need to be relocated for (infra-)structural reasons, e.g., when researchers and their software projects move to another institution. Thus, research applications must be actively maintained to remain accessible and usable, which frequently exceeds the financial and technical resources of fixed-term projects.

Against this backdrop, researchers need practicable sustainability strategies for their applications to ensure scientific reproducibility, comparable to the well-established long-term archiving techniques for classic paper media or the standards of Research Data Management (RDM). In our presentation at EADH 2021, we take stock of the findings of the DFG-funded project SustainLife (2018–2021), which explored how cloud deployment methods can be adapted to the needs and prerequisites of DH research applications.

2. Sustainability strategies

Considering the diversity of applications developed within the Digital Humanities, individualized consultancy and guidance are essential for finding a suitable sustainability approach that meets a project’s technological and methodological as well as its financial and operational requirements (Smithies et al. 2019). The Data Center for the Humanities (DCH, University of Cologne) is engaged in numerous efforts to disseminate RDM standards, ranging from local RDM consulting to participation in national and international research data infrastructure projects (Witt et al. 2018). While the DCH can provide extensive and standardized guidance on publishing, versioning and archiving digital research data, advising researchers on how to preserve their individual applications for the future is a challenging and time-consuming task.

Even though well-established technological strategies for software maintenance exist, standards for the sustainability of research applications in the Humanities are still scarce. The various strategies developed across the technological landscape not only differ in their implementation but also rest on different theoretical backgrounds and standards: Besides keeping spare hardware to replace failing components, full system emulation and virtualization are the most traditional means of software sustainability (Rosenthal 2015). Newer strategies, such as containerization, encapsulate an application together with all its dependencies in binary images that run on a reusable hardware abstraction (Burton et al. 2020). Between these divergent strategies, many hybrid forms of hardware abstraction exist, each with its own means of reproducing the desired execution environment. In terms of preservation and reproducibility, the approaches mentioned above are industrial-grade solutions that offer sophisticated means of snapshotting applications and migrating them between systems. However, none of them is sufficiently interoperable, since all of them are vendor-dependent or bound to specific infrastructures and technologies.

Because they are tied to institutional or technological constraints, most of the emerging DH-specific technical approaches to software sustainability face similar problems. Within the DH, we encountered strategies that (1) invest resources to keep software up to date or even reimplement it (Smithies et al. 2019), (2) transform dynamic applications into static snapshots, i.e., archive the contained data at a fixed moment in time (Arneil et al. 2019), (3) enforce a technology stack from the start for the sake of consistency in archiving and replication (Arneil et al. 2019), and (4) employ virtualization or containerization to encapsulate and bundle software (Smithies et al. 2019). To overcome these issues, the DFG-funded project SustainLife examined how cloud deployment methods and technologies can be adapted to research applications developed in the Digital Humanities in order to facilitate their maintenance and provisioning (Neuefeind et al. 2018). By using the standardized modeling language TOSCA, the Topology and Orchestration Specification for Cloud Applications (OASIS 2013; OASIS 2020), our solution makes it possible to combine existing sustainability approaches within an additional abstraction layer.

3. The TOSCA-based sustainability approach developed within SustainLife

TOSCA is an OASIS standard for describing, provisioning and managing applications in a portable as well as vendor- and technology-independent manner. In TOSCA, an application’s structure is modeled by its components, called Node Templates, and the relations between them, called Relationship Templates; together, they form a declarative deployment model called a Topology Template. Node Templates and Relationship Templates are in turn typed by Node Types and Relationship Types. This type system enables users to share and reuse types that have already been defined, which yields synergies: a change to a Node or Relationship Type is directly reflected in all applications that use it. Thus, the more types are defined, the less effort it takes to model new applications (Schildkamp et al. 2020).
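To make the relationship between types and templates concrete, the following Python sketch mirrors these concepts with plain data classes. It is neither TOSCA syntax (TOSCA models are written in XML or YAML, see OASIS 2013; OASIS 2020) nor the OpenTOSCA API; all class names, the example types `Ubuntu`, `ApacheWebServer` and `HostedOn`, and the property names are illustrative assumptions. It shows how templates reference a shared type, so that a single change to the type (here, the default operating system version) propagates to every application that uses it.

```python
from dataclasses import dataclass, field


# Hypothetical, heavily simplified stand-ins for TOSCA concepts. Real TOSCA
# types also define interfaces, requirements, capabilities and artifacts.
@dataclass
class NodeType:
    name: str
    defaults: dict[str, str] = field(default_factory=dict)  # default property values


@dataclass
class RelationshipType:
    name: str


@dataclass
class NodeTemplate:
    name: str
    type: NodeType  # a reference to the shared type, not a copy
    overrides: dict[str, str] = field(default_factory=dict)  # template-specific values

    def properties(self) -> dict[str, str]:
        # Type defaults first, then template-specific values on top.
        return {**self.type.defaults, **self.overrides}


@dataclass
class RelationshipTemplate:
    type: RelationshipType
    source: NodeTemplate
    target: NodeTemplate


@dataclass
class TopologyTemplate:
    nodes: list[NodeTemplate]
    relationships: list[RelationshipTemplate]


# Shared types, defined once (e.g., in a type repository).
UBUNTU = NodeType("Ubuntu", {"version": "20.04"})
APACHE = NodeType("ApacheWebServer", {"port": "80"})
HOSTED_ON = RelationshipType("HostedOn")


def web_application(name: str, port: str = "80") -> TopologyTemplate:
    """Model a two-component application: a web server hosted on an OS."""
    os_node = NodeTemplate(f"{name}-os", UBUNTU)
    server = NodeTemplate(f"{name}-server", APACHE, {"port": port})
    return TopologyTemplate(
        nodes=[os_node, server],
        relationships=[RelationshipTemplate(HOSTED_ON, server, os_node)],
    )


edition = web_application("edition")
database = web_application("database", port="8080")

# Updating the shared type once affects every application that uses it.
UBUNTU.defaults["version"] = "22.04"
for app in (edition, database):
    print(app.nodes[0].name, app.nodes[0].properties())
```

Keeping the type as a reference rather than copying its defaults into each template is what lets the update in the last lines reach both applications; template-level values such as the database's port still take precedence.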

Using the open-source TOSCA ecosystem OpenTOSCA (Breitenbücher et al. 2016), we modeled several DH-specific use cases, which we selected by (1) ascertaining local demands (Neuefeind et al. 2019a), (2) polling the participants of a sustainability workshop on digital editions that we organized (Neuefeind et al. 2019b), and (3) analyzing the technical structure of common DH research applications (Helling et al. 2019). Furthermore, we adapted the OpenTOSCA ecosystem to the methodological and technical needs of the DH community by iteratively feeding our modeling and usage experience back into the development of OpenTOSCA.

One of the results of the project is a repository, now hosted by the DCH, that contains our use cases modeled in TOSCA. While our self-hosted OpenTOSCA instance is only available within the local area network of the University of Cologne (OpenTOSCA 2019), the repository is publicly available (SustainLife 2021). Furthermore, as part of the project, we extended the OpenTOSCA ecosystem with support for versioning applications (Harzenetter 2018), for freezing and defrosting applications (Harzenetter et al. 2019a), and for automatically enriching running applications with management operations that can be executed on demand (Harzenetter et al. 2019b; Harzenetter et al. 2021). Moreover, we introduced a method that simplifies modeling applications in TOSCA by using abstract design patterns instead of requiring users to understand all technical components and how they must be configured to achieve a certain behavior (Harzenetter et al. 2018; Harzenetter et al. 2020). All these features make operating and maintaining long-running applications more cost-efficient.
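Deriving management workflows from a deployment model requires, among other things, knowing in which order the components must be handled. As a minimal sketch under that assumption (it is not the workflow generator of Harzenetter et al. 2019b or of OpenTOSCA), the following Python code derives such an order from a hypothetical topology with the standard library’s `graphlib.TopologicalSorter`: every component is processed only after the components it is hosted on or depends on, and a shutdown traverses the same order in reverse. The component names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical topology of a DH research database: each entry maps a
# component to the components it is hosted on or depends on.
requires: dict[str, list[str]] = {
    "vm": [],
    "ubuntu": ["vm"],
    "mysql": ["ubuntu"],
    "apache_php": ["ubuntu"],
    "research_db_app": ["apache_php", "mysql"],
}


def deployment_order(graph: dict[str, list[str]]) -> list[str]:
    """Return an order in which every component follows its prerequisites."""
    # static_order() raises graphlib.CycleError if the relationships form a cycle.
    return list(TopologicalSorter(graph).static_order())


def shutdown_order(graph: dict[str, list[str]]) -> list[str]:
    """Stop dependents before the components they run on."""
    return list(reversed(deployment_order(graph)))


print(deployment_order(requires))
print(shutdown_order(requires))
```

Because the order is computed from the declarative model rather than written by hand, it stays valid when a component is swapped for another one with the same relationships.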

Our TOSCA-based approach not only reduces the resources necessary to maintain long-running applications but is also designed to support the long-term archiving of research applications. For this purpose, we use the Cloud Service Archive (CSAR) defined in the TOSCA standard, a packaging format that bundles applications modeled in TOSCA together with all their dependencies. CSARs may contain only the application logic, or they can be completely self-contained and bundle everything needed to provision the contained application. Additionally, if the Freeze and Defrost approach (Harzenetter et al. 2019a) is used, the generated CSAR contains not only the application’s components but also its internal state. Thus, a frozen application can be redeployed at any point in time in the state in which it was frozen, i.e., defrosted, so that the application and all its functionality and data become accessible again. Consequently, as long as a TOSCA runtime such as the OpenTOSCA Runtime Container (Binz et al. 2013) is available, an application bundled as a CSAR can be instantiated anywhere and at any time.
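To illustrate what a CSAR is at the file level, the following Python sketch builds one in memory with the standard library’s `zipfile` module and reads its entry point back. It assumes the layout of the TOSCA Simple Profile in YAML (OASIS 2020), i.e., a ZIP archive with a `TOSCA-Metadata/TOSCA.meta` file that names the entry definitions; the header values are assumptions to be checked against the specification. The optional `state/snapshot.json` file is a purely hypothetical stand-in for the internal state added by the Freeze and Defrost approach and does not reproduce that approach’s actual format.

```python
import io
import json
import zipfile
from typing import Optional

# Assumed TOSCA.meta header, modeled on the CSAR layout of the TOSCA Simple
# Profile in YAML; check the exact keys and versions against the spec.
META_TEMPLATE = (
    "TOSCA-Meta-File-Version: 1.1\n"
    "CSAR-Version: 1.1\n"
    "Created-By: {creator}\n"
    "Entry-Definitions: {entry}\n"
)


def build_csar(definitions: dict[str, str], entry: str, creator: str,
               state: Optional[dict] = None) -> bytes:
    """Bundle TOSCA definition files (and optionally a state snapshot) as a ZIP."""
    if entry not in definitions:
        raise ValueError(f"entry definitions file {entry!r} is not in the archive")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as csar:
        csar.writestr("TOSCA-Metadata/TOSCA.meta",
                      META_TEMPLATE.format(creator=creator, entry=entry))
        for path, content in definitions.items():
            csar.writestr(path, content)
        if state is not None:
            # Hypothetical location for frozen state; not the Freeze and
            # Defrost file layout.
            csar.writestr("state/snapshot.json", json.dumps(state, indent=2))
    return buffer.getvalue()


def read_entry_definitions(csar_bytes: bytes) -> str:
    """Locate the entry definitions via TOSCA.meta and return their content."""
    with zipfile.ZipFile(io.BytesIO(csar_bytes)) as csar:
        meta = csar.read("TOSCA-Metadata/TOSCA.meta").decode("utf-8")
        headers = dict(line.split(": ", 1) for line in meta.splitlines() if line)
        return csar.read(headers["Entry-Definitions"]).decode("utf-8")


archive = build_csar(
    {"definitions/app.yaml": "tosca_definitions_version: tosca_simple_yaml_1_3\n"},
    entry="definitions/app.yaml",
    creator="SustainLife example",
    state={"database_dump": "dumps/db.sql"},
)
print(read_entry_definitions(archive))
```

Recording the entry point in `TOSCA.meta` means a runtime needs nothing but the archive itself to find the model; which further files a real CSAR contains (implementation artifacts, installation scripts, binaries) depends on how self-contained it is meant to be.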

4. Summary and future work

In summary, our TOSCA-based sustainability approach reduces the resources necessary to keep long-running applications up to date and to archive them for the long term by (1) reusing existing Node Types across different applications, (2) updating components independently of each other and of the application they make up, and (3) decoupling applications from their environmental dependencies and from technological deprecation. Furthermore, TOSCA allows us to accommodate the diversity of applications produced within DH research. Based on our experience with the TOSCA standard and the advantages outlined above, we conclude that the OpenTOSCA ecosystem is a highly suitable candidate for a community-wide standard to ensure application sustainability within the DH.

While TOSCA eases the modeling of applications in general, it adds a certain complexity to the modeling of small-scale applications, e.g., a simple PHP-based weblog. Since TOSCA combines many different concepts and options for deploying and managing applications, inexperienced users may be overwhelmed by the wealth of possibilities the framework offers. Future work on TOSCA-based software sustainability for the DH should therefore continue to focus on enhancing OpenTOSCA’s usability and on easing the modeling process even further. To establish TOSCA as a community-wide standard for sustaining research software in the DH, future work should also extend and document best practices and offer extensive user guides and tutorials as well as automated assistance in OpenTOSCA.

Appendix A

Bibliography
  1. Arneil, Stewart / Holmes, Martin / Newton, Greg (2019): “Project Endings: Early Impressions From Our Recent Survey On Project Longevity In DH”, in: Book of Abstracts of the 30th Annual Conference on Digital Humanities (DH 2019) <https://dev.clariah.nl/files/dh2019/boa/0891.html> [14.05.2021].
  2. Binz, Tobias / Breitenbücher, Uwe / Haupt, Florian / Kopp, Oliver / Leymann, Frank / Nowak, Alexander / Wagner, Sebastian (2013): “OpenTOSCA – A Runtime for TOSCA-based Cloud Applications”, in: Proceedings of the 11th International Conference on Service-Oriented Computing (ICSOC 2013), Springer DOI: 10.1007/978-3-642-45005-1_62.
  3. Breitenbücher, Uwe / Endres, Christian / Képes, Kálmán / Kopp, Oliver / Leymann, Frank / Wagner, Sebastian / Wettinger, Johannes / Zimmermann, Michael (2016): “The OpenTOSCA Ecosystem. Concept & Tools”, in: European Project Space on Smart Systems, Big Data, Future Internet - Towards Serving the Grand Societal Challenges (EPS Rome 2016) 1: 112–130 DOI: 10.5220/0007903201120130.
  4. Buddenbohm, Stefan / Engelhardt, Claudia / Wuttke, Ulrike (2016): “Angebotsgenese für ein geisteswissenschaftliches Forschungsdatenzentrum”, in: Zeitschrift für digitale Geisteswissenschaften (ZfdG) 1 DOI: 10.17175/2016_003.
  5. Burton, Matt / Lavin, Matthew J. / Otis, Jessica / Weingart, Scott B. (2020): “Digits: Two Reports on New Units of Scholarly Publication”, in: Journal of Electronic Publishing 22, 1 DOI: 10.3998/3336451.0022.105.
  6. Grottke, Michael / Matias Jr., Rivalino / Trivedi, Kishor S. (2008): “The Fundamentals of Software Aging”, in: Proceedings of the 1st International Workshop on Software Aging and Rejuvenation, 19th International Conference on Software Reliability Engineering (ISSRE 2008) DOI: 10.1109/ISSREW.2008.5355512.
  7. Harzenetter, Lukas (2018): Versioning of Applications Modeled in TOSCA. Master Thesis, University of Stuttgart DOI: 10.18419/opus-9710.
  8. Harzenetter, Lukas / Binz, Tobias / Breitenbücher, Uwe / Leymann, Frank / Wurster, Michael (2021): “Automated Generation of Management Workflows for Running Applications by Deriving and Enriching Instance Models”, in: Proceedings of the 11th International Conference on Cloud Computing and Services Science (CLOSER 2021). SciTePress DOI: 10.5220/0010477900990110.
  9. Harzenetter, Lukas / Breitenbücher, Uwe / Falkenthal, Michael / Guth, Jasmin / Krieger, Christoph / Leymann, Frank (2018): “Pattern-based Deployment Models and Their Automatic Execution”, in: 11th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2018) 41–52 DOI: 10.1109/UCC.2018.00013.
  10. Harzenetter, Lukas / Breitenbücher, Uwe / Falkenthal, Michael / Guth, Jasmin / Leymann, Frank (2020): “Pattern-based Deployment Models Revisited. Automated Pattern-driven Deployment Configuration”, in: Proceedings of the 12th International Conference on Pervasive Patterns and Applications (PATTERNS 2020) 40–49 ISBN: 978-1-61208-783.
  11. Harzenetter, Lukas / Breitenbücher, Uwe / Képes, Kálmán / Leymann, Frank (2019a): “Freezing and Defrosting Cloud Applications. Automated Saving and Restoring of Running Applications”, in: Software-Intensive Cyber-Physical Systems 35: 101–114 DOI: 10.1007/s00450-019-00415-8.
  12. Harzenetter, Lukas / Breitenbücher, Uwe / Leymann, Frank (2019b): “Automated Generation of Management Workflows for Applications Based on Deployment Models”, in: 23rd IEEE International Enterprise Distributed Object Computing Conference (EDOC 2019) 216–225 DOI: 10.1109/EDOC.2019.00034.
  13. Helling, Patrick / Schildkamp, Philip / Mathiak, Brigitte (2019): “Nachhaltigkeit von Forschungsdateninfrastrukturen am Beispiel von Digitalen Editionen – Was steht noch in 2039”, in: Postersession of the 20th Annual DINI Conference (DINI 2019) <https://dini.de/fileadmin/jahrestagungen/2018/DINI_Jahrestagung_2019_Helling.pdf> [14.05.2021].
  14. Neuefeind, Claes / Harzenetter, Lukas / Schildkamp, Philip / Breitenbücher, Uwe / Mathiak, Brigitte / Barzen, Johanna / Leymann, Frank (2018): “The SustainLife Project – Living Systems in Digital Humanities”, in: IBM Research Report – Papers from the 12th Advanced Summer School On Service-Oriented Computing (SummerSOC’18) 101–112 <https://dominoweb.draco.res.ibm.com/reports/RC25681.pdf> [14.05.2021].
  15. Neuefeind, Claes / Schildkamp, Philip / Mathiak, Brigitte / Marčić, Aleksander / Hentschel, Frank / Harzenetter, Lukas / Breitenbücher, Uwe / Barzen, Johanna / Leymann, Frank (2019a): “Sustaining the Musical Competitions Database – A TOSCA-based Approach to Application Preservation in the Digital Humanities”, in: Book of Abstracts of the 30th Annual Conference on Digital Humanities (DH 2019) <https://dev.clariah.nl/files/dh2019/boa/0574.html> [14.05.2021].
  16. Neuefeind, Claes / Schildkamp, Philip / Mathiak, Brigitte / Harzenetter, Lukas / Barzen, Johanna / Breitenbücher, Uwe / Leymann, Frank (2019b): “Technologienutzung im Kontext Digitaler Editionen – Eine Landschaftsvermessung”, in: Book of Abstracts of the 6th Annual Conference of the Digital Humanities Association in German-speaking Countries (DHd 2019) 219–222 DOI: 10.5281/zenodo.2596094.
  17. OASIS (2013): Topology and Orchestration Specification for Cloud Applications Version 1.0 – OASIS Standard <http://docs.oasis-open.org/tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.html> [14.05.2021].
  18. OASIS (2020): TOSCA Simple Profile in YAML Version 1.3 – OASIS Standard <https://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.3/os/TOSCA-Simple-Profile-YAML-v1.3-os.html> [14.05.2021].
  19. OpenTOSCA (2019): Use the Extensive Functions of the OpenTOSCA Container. Getting Started with the OpenTOSCA Runtime Environment <http://www.opentosca.org/sites/use_opentosca.html> [14.05.2021].
  20. Reiche, Ruth / Becker, Rainer / Bender, Michael / Munson, Mathew / Schmunk, Stephan / Schöch, Christof (2014): “Verfahren der Digital Humanities in den Geistes- und Kulturwissenschaften”, in: DARIAH-DE working papers 4. Göttingen <http://resolver.sub.uni-goettingen.de/purl/?dariah-2014-2> [14.05.2021].
  21. Rosenthal, David S. H. (2015): Emulation & Virtualization as Preservation Strategies <https://mellon.org/Rosenthal-Emulation-2015> [14.05.2021].
  22. Schildkamp, Philip / Harzenetter, Lukas / Leymann, Frank / Mathiak, Brigitte / Neuefeind, Claes / Breitenbücher, Uwe / Fischer, Anna (2020): “Workshop on Modelling and Maintaining Research Applications in TOSCA”, in: Book of Abstracts of the 31st Annual Conference on Digital Humanities (DH 2020) <https://dh2020.adho.org/wp-content/uploads/2020/07/120_WorkshoponModellingandMaintainingResearchApplicationsinTOSCA.html> [14.05.2021].
  23. Smithies, James / Westling, Carina / Sichani, Anna-Maria / Mellen, Pam / Ciula, Arianna (2019): “Managing 100 Digital Humanities Projects. Digital Scholarship & Archiving in King’s Digital Lab”, in: Digital Humanities Quarterly 13, 1 <http://www.digitalhumanities.org/dhq/vol/13/1/000411/000411.html> [14.05.2021].
  24. SustainLife (2021): OpenTOSCA Definitions implemented by the Data Center for the Humanities at the University of Cologne <https://gitlab.cceh.uni-koeln.de/sustainlife/tosca-definitions-dch> [14.05.2021].
  25. Witt, Andreas / Blumtritt, Jonathan / Helling, Patrick / Mathiak, Brigitte / Rau, Felix (2018): “Forschungsdatenmanagement in den Geisteswissenschaften an der Universität zu Köln”, in: Das offene Bibliotheksjournal 5, 3: 104–117 DOI: 10.5282/o-bib/2018H3S104-117.