Digital Humanities

DH2010

King's College London, 3rd - 6th July 2010


Dingler-Online – The Digitized "Polytechnisches Journal" on Goobi Digitization Suite


Hug, Marius
Humboldt-Universität zu Berlin
marius.hug@culture.hu-berlin.de

Kassung, Christian
Humboldt-Universität zu Berlin
CKassung@culture.hu-berlin.de

Meyer, Sebastian
SLUB-Dresden
sebastian.meyer@slub-dresden.de

This project, based at Humboldt-Universität zu Berlin, sets out to digitize Dingler’s "Polytechnical Journal" ("Polytechnisches Journal"), 1820-1931. In addition to digitizing the journal’s images, we encode the OCRed text according to the Text Encoding Initiative Guidelines (TEI P5). Our online edition of Dingler’s journal will be freely available on a state-of-the-art system called Goobi Digitization Suite, a new production and presentation solution funded by the DFG (German Research Foundation).

Dingler's Journal

In 1820 the German chemist and industrialist J. G. Dingler started publishing the "Polytechnisches Journal". This journal was to include a personal but representative selection of a broad variety of articles. Originally most of these articles had been published in magazines all over Europe and, though most of them originated from the UK, there were also specimens from France, Italy, and Russia.

The "Polytechnisches Journal" was published over a period of 111 years. It thus became an extremely important source for the history of 19th-century knowledge, as it documents a period that included industrialization, advances in transport and communication, and the differentiation of various technologies. For instance, the journal covers the discovery of electromagnetism by Hans Christian Oersted (1820) and the theory of relativity by Albert Einstein (1905/1915). It contains articles on steam engines and locomotives, as well as bicycles and automobiles.

Synchronic and diachronic transfer of knowledge and technique
Articles published in Dingler’s journal give us an example of the emergence of culture in a technical context. In the process of industrialization, new technical achievements profoundly affected everyday life. It is in the interplay of science and knowledge that the journal evolves its epistemic significance. The "Polytechnisches Journal" is unique and highly relevant for very different research fields which focus on the cultural history as it emerged from Europe’s technical transformations. It is significant not only for people engaged in the history of science but for anyone interested in the cultural heritage of Europe.

Dingler-Online – The encoding

Linking
Since Dingler annotated his editorial work very thoroughly, we find all necessary metadata on each article within the journal itself. He even went one step further: Dingler cross-referenced other source material on the issues addressed in each article. "Linking" is therefore one of the main tasks in enriching the text. By doing this consistently from the very beginning of our project, we aim at a network of digitized knowledge for that period covering the whole of Europe. Any digitized 19th-century magazine with a technical background is a potential link target.

Indexing
As is true for any non-digitally published magazine, researching the contents of journals is a time-consuming task. Right from the beginning, Dingler knew he would have to assist those accessing his material, so he compiled an index once a year. In 1843 the first so-called "Real-Index" was published, a third-hand work which covered the first 78 volumes of the journal. All in all, there are four of these "Real-Indexes".

Based on these two kinds of indexes (the annual indexes and the "Real-Indexes") as well as our index-related TEI encoding, we will be able to provide a fine-grained, dynamically generated index. It will consist of a register of persons (differentiated according to their role, i.e. author, translator, originator etc.), a register of objects, and, among others, a register of the journals that were the sources of the published articles.
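To illustrate how a register of persons differentiated by role could be generated from encoded text, here is a minimal sketch in Python. It assumes that articles are marked up as div elements with type="article" and that persons are tagged as persName with a role attribute; these encoding choices, the sample fragment, and the function name build_person_register are illustrative assumptions, not the project's actual schema or tooling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample fragment; the markup is an assumption for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="article" xml:id="ar001">
      <persName role="author">Oersted</persName>
      <persName role="translator">Dingler</persName>
    </div>
    <div type="article" xml:id="ar002">
      <persName role="author">Oersted</persName>
    </div>
  </body></text>
</TEI>"""


def build_person_register(tei_xml):
    """Map role -> person name -> list of article ids mentioning that person."""
    root = ET.fromstring(tei_xml)
    register = defaultdict(lambda: defaultdict(list))
    for article in root.iterfind(".//tei:div[@type='article']", TEI_NS):
        article_id = article.get(XML_ID)
        for person in article.iterfind(".//tei:persName", TEI_NS):
            role = person.get("role", "unknown")
            name = (person.text or "").strip()
            register[role][name].append(article_id)
    return register


if __name__ == "__main__":
    for role, names in build_person_register(SAMPLE_TEI).items():
        for name, article_ids in sorted(names.items()):
            print(role, name, article_ids)
```

Because the register is computed from the encoded text on demand, it stays in step with the encoding as new volumes are added, which is the sense in which such an index is "dynamically generated".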

The articles – our key component
Dingler’s journal comes in 360 volumes, each including 4 to 6 issues. The key components of our edition are the 50 to 170 articles in each volume. Even at this very basic level, we distinguish between two types of articles: besides the reprinted articles, a number of articles were published for the first time in the "Polytechnisches Journal". We extract all these articles from the volume and provide access to downloadable PDF versions as well as bibliographical metadata in different established formats. In the long run we aim at providing access to PDFs generated dynamically via XSL-FO.

Text and images
The editors of the journal strictly adhered to the medial conditions of their time. In 1820 Dingler started using Gothic type for text, and copper engravings for the imprints. In later issues (starting in the 1870s) we find Antiqua letters and floating images integrated within the text.

Our aim is a re-interpretation of the relationship between text and images. Dingler completed each volume with technical drawings and visualizations on additional plates. Each of the up to 40 figures on a plate is therefore encoded with its specific coordinates using the Image Markup Tool developed by the University of Victoria. Via hyperlink we are able to provide access to a zoomable view of each figure. This approach has two immediate advantages. Firstly, it enables parallel reading of text and image and thereby follows the original layout, in which plates were attached to the back of each volume as foldouts. Secondly, we can provide a new kind of readability. For economic reasons the plates were densely packed with images. Highlighting individual figures on mouseover is much more convenient: it allows readers to inspect them in more detail and thus enables a closer integration of text and images.
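The coordinate-based encoding of figures can be sketched as follows. The example assumes that each plate is recorded as a TEI surface with one zone per figure, using the ulx, uly, lrx and lry attributes (upper-left and lower-right corners in pixels); the file name, identifiers, and the function figure_regions are hypothetical. It converts each zone into the origin, width and height that a viewer would need to highlight or crop a figure.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical plate description; file name and ids are invented.
SAMPLE_FACSIMILE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <graphic url="plate_01.png"/>
      <zone xml:id="fig1" ulx="100" uly="150" lrx="600" lry="900"/>
      <zone xml:id="fig2" ulx="650" uly="150" lrx="1200" lry="700"/>
    </surface>
  </facsimile>
</TEI>"""


def figure_regions(tei_xml):
    """Return figure id -> image name plus pixel rectangle (x, y, width, height)."""
    root = ET.fromstring(tei_xml)
    regions = {}
    for surface in root.iterfind(".//tei:surface", TEI_NS):
        graphic = surface.find("tei:graphic", TEI_NS)
        image = graphic.get("url") if graphic is not None else None
        for zone in surface.iterfind("tei:zone", TEI_NS):
            ulx, uly, lrx, lry = (
                int(zone.get(key)) for key in ("ulx", "uly", "lrx", "lry")
            )
            regions[zone.get(XML_ID)] = {
                "image": image,
                "x": ulx,
                "y": uly,
                "width": lrx - ulx,
                "height": lry - uly,
            }
    return regions


if __name__ == "__main__":
    for figure_id, region in figure_regions(SAMPLE_FACSIMILE).items():
        print(figure_id, region)
```

A reference to a figure in the article text can then point at the zone identifier, so that the same rectangle serves both the mouseover highlight on the plate and the zoomable detail view.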

Since Dingler insisted right from the beginning on very detailed and thus expensive lithographs rather than wood engravings, we have made it our task to maintain the standard he set.

Dingler-Online II – The appearance

Not least because we face the challenging task of digitizing Gothic type in more than 220 volumes of the "Polytechnisches Journal", we find ourselves in good company with two rather impressive German digitization projects: the Grimms' "Deutsches Wörterbuch" and Krünitz’s "Oeconomische Encyclopädie". Both made use of double-keying and therefore sent their books or page images to Asia, where the text digitization took place. Afterwards, so-called TUSTEP routines were employed to compare the two text versions.
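The principle behind double-keying is that two independently typed versions of the same page are compared, and every disagreement is flagged for review. The sketch below illustrates that comparison step only. It uses Python's standard difflib module as a stand-in and is not a reproduction of the TUSTEP routines mentioned above; the function name and the sample sentences are invented for illustration.

```python
import difflib


def keying_disagreements(version_a, version_b):
    """Return (text_in_a, text_in_b) pairs where two keyed versions differ.

    The comparison is word-based; an empty string in a pair means that one
    version contains words the other lacks at that position.
    """
    words_a = version_a.split()
    words_b = version_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    disagreements = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append(
                (" ".join(words_a[a_start:a_end]), " ".join(words_b[b_start:b_end]))
            )
    return disagreements


if __name__ == "__main__":
    # Invented sample: the two typists disagree on one digit.
    first = "Die Dampfmaschine wurde im Jahre 1820 verbessert"
    second = "Die Dampfmaschine wurde im Jahre 1826 verbessert"
    print(keying_disagreements(first, second))
```

Since two typists rarely make the same mistake at the same position, the passages on which both versions agree can be accepted with high confidence, and only the flagged pairs need to be checked against the page image.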

In the following we take a closer look at several aspects that take our project one step further than the aforementioned approaches.

Goobi Digitization Suite

With the so-called Goobi Digitization Suite, a software solution funded by the DFG and developed by the SLUB-Dresden (Sächsische Landesbibliothek – Staats- und Universitätsbibliothek) and the SUB-Göttingen (Niedersächsische Staats- und Universitätsbibliothek), we will be using a completely new technology.

The Goobi Suite consists of two parts: Goobi.Production and Goobi.Presentation. Goobi.Production is a web-based tool for managing a digitization workflow using Java technology. Among other features, it comes with a very flexible metadata editor, a user-based permission system, and visually enhanced statistics.

Since the Goobi Suite was not yet available at the beginning of our project, we turned to an experienced service provider for text digitization and (semi-)automatic encoding: Editura GmbH. Their OCR produces very good results even for Gothic type, provided that the images are scanned at 600 dpi.

Editura encodes the OCRed text and already enriches it according to the TEI-P5 guidelines. This step includes "tagging" the structure and special attributes of the text to an encoding level between 3 and 4. The text digitization thus already includes more than a basic structural encoding, and we can concentrate on a more scholarly encoding approach that goes beyond other projects of comparable extent.

Apart from XML files in TEI encoding and images encoded using the Image Markup Tool, our service provider delivers elaborate METS files. These are necessary for presenting the edition in the so-called DFG-Viewer as well as in Goobi.Presentation, the second part of the Goobi Digitization Suite and the one we use. Goobi.Presentation is a full-featured web presentation layer for digital material and is based on the TYPO3 CMS framework, which can hold a regular website, too. Hence, Goobi.Presentation integrates seamlessly into any page inside the CMS.

The whole software suite is open source and freely available to everyone. As can be seen in our project, Goobi.Presentation can be used independently of Goobi.Production. This modularity of Goobi is ensured by the consistent use of the international standards METS, MODS and TEI.
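The reason a standard container format such as METS makes the two parts interchangeable is that any presentation layer can read the same file inventory and logical structure, whichever tool produced them. The following sketch reads a deliberately small, hypothetical METS document; the identifiers, labels, URL, and the function summarize_mets are assumptions for illustration and do not reproduce the files delivered in this project or Goobi's own code.

```python
import xml.etree.ElementTree as ET

METS_URI = "http://www.loc.gov/METS/"
NS = {"mets": METS_URI}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical minimal METS document; ids, labels and URL are invented.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/images/0001.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="volume" LABEL="Volume 1">
      <mets:div ID="LOG_0001" TYPE="article" LABEL="Sample article"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def summarize_mets(mets_xml):
    """Return (file id -> location, list of (type, label) of logical divisions)."""
    root = ET.fromstring(mets_xml)
    files = {}
    for file_el in root.iterfind(".//mets:file", NS):
        locat = file_el.find("mets:FLocat", NS)
        if locat is not None:
            files[file_el.get("ID")] = locat.get(XLINK_HREF)
    divisions = []
    logical = root.find("mets:structMap[@TYPE='LOGICAL']", NS)
    if logical is not None:
        # iter() walks nested divisions in document order.
        for div in logical.iter("{%s}div" % METS_URI):
            divisions.append((div.get("TYPE"), div.get("LABEL")))
    return files, divisions


if __name__ == "__main__":
    files, divisions = summarize_mets(SAMPLE_METS)
    print(files)
    print(divisions)
```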

Customizing Goobi

The more data there is to present, the less useful unstructured information becomes. This is why encoding and directed access to data, via searching or browsing, become more and more important.

Goobi.Presentation makes it possible to customize the search engine. Naturally, one will be able to search for any term anywhere in the text. In addition, it is possible to narrow the search results by different criteria. For instance, someone looking for all patent applications concerning steam engines published in the journal in the 1840s will just have to search for "steam engine", then restrict the search to "text type" patent application and "time" 1840s.
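The steam engine example amounts to a full-text match combined with two metadata restrictions. A minimal sketch of that logic is given below. It is not Goobi.Presentation's search implementation; the Article record, its field names, and the sample entries are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    text: str
    text_type: str  # hypothetical category, e.g. "patent application"
    year: int


def search(articles, term, text_type=None, decade=None):
    """Full-text search, optionally restricted by text type and decade.

    `decade` is the first year of the decade, e.g. 1840 for the 1840s.
    """
    needle = term.lower()
    hits = []
    for article in articles:
        if needle not in article.text.lower():
            continue
        if text_type is not None and article.text_type != text_type:
            continue
        if decade is not None and not (decade <= article.year < decade + 10):
            continue
        hits.append(article)
    return hits


# Invented sample records, not taken from the journal.
SAMPLE_ARTICLES = [
    Article("A", "Improvements to the steam engine boiler", "patent application", 1843),
    Article("B", "On the steam engine in mining", "essay", 1846),
    Article("C", "A patented steam engine valve", "patent application", 1852),
    Article("D", "A new loom", "patent application", 1841),
]

if __name__ == "__main__":
    for hit in search(SAMPLE_ARTICLES, "steam engine",
                      text_type="patent application", decade=1840):
        print(hit.title, hit.year)
```

Restrictions of this kind are only possible because text type and date are recorded explicitly in the encoding; a plain full-text search over the OCRed pages could not tell a patent application from an essay.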

Conclusion

Dingler-Online is an enriched digitization that is neither simply image-based nor mass-produced. It is a user-friendly platform that invites broad use: it is not restricted to historians of technology or, for that matter, to researchers, but is open to the interested public in general.


© 2010 Centre for Computing in the Humanities

Last Updated: 30-06-2010