Copernico's engine room
Copernico relies on a wide range of databases to curate its data in a way that is as sustainable, structured, and standards-based as possible. To support this, additional in-house databases and input interfaces have also been developed.
Many websites today are maintained using content management systems (CMS). Among the best-known examples are WordPress, Drupal, and TYPO3. Copernico also uses such a system, namely Drupal, but the content itself is not maintained directly within it. Instead, it is managed in dedicated specialist databases designed specifically for this purpose. These databases are queried once an hour via an OAI-PMH interface, and the content is automatically transferred or updated.
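To make the hourly query more concrete, here is a minimal sketch of what a single OAI-PMH harvesting step could look like. It is not Copernico's actual code: the base URL, the metadata prefix "lido", and the sample response are assumptions for illustration. Only the request parameters (verb, metadataPrefix, from, resumptionToken) and the XML namespace come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="lido",
                           from_date=None, resumption_token=None):
    """Build a ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH, so it must
    not be combined with metadataPrefix or from.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in
                   root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical, heavily shortened response used only for demonstration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:article-1</identifier></header></record>
    <record><header><identifier>oai:example.org:article-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://example.org/oai", from_date="2024-01-01"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A scheduler such as a cron job would run a step like this once an hour, pass the time of the previous run as from_date, and keep requesting pages until no resumption token is returned.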
Our Databases
All portal content is recorded in databases based on digiCULT.web, software developed by digiCULT.Verbund eG. It is widely used in the museum sector and also underpins major portals such as “Sammlungsgut aus kolonialen Kontexten,” “Dehio OME,” “Menora – 900 Jahre Jüdisches Leben in Thüringen,” and “Rheinische Geschichte.” Since 2019, dedicated input forms based on the core classes of the CIDOC CRM have been developed for the various types of content, such as thematic articles, exhibitions, and projects. The data is exported in LIDO (Lightweight Information Describing Objects), an XML schema widely used in the cultural heritage sector, and transferred to the portal’s content management system via the OAI-PMH interface.
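For readers unfamiliar with LIDO, the following sketch reads two values from a strongly simplified LIDO fragment. The element names and the namespace follow the LIDO schema. The record itself is invented for illustration and is not schema-complete, since a real export contains many more mandatory elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the LIDO schema.
LIDO_NS = {"lido": "http://www.lido-schema.org"}

# Hypothetical, simplified record; identifier and title are placeholders.
SAMPLE_LIDO = """
<lido:lido xmlns:lido="http://www.lido-schema.org">
  <lido:lidoRecID lido:type="local">example-article-1</lido:lidoRecID>
  <lido:descriptiveMetadata xml:lang="en">
    <lido:objectIdentificationWrap>
      <lido:titleWrap>
        <lido:titleSet>
          <lido:appellationValue>Example thematic article</lido:appellationValue>
        </lido:titleSet>
      </lido:titleWrap>
    </lido:objectIdentificationWrap>
  </lido:descriptiveMetadata>
</lido:lido>
"""


def read_lido_record(xml_text):
    """Extract the record identifier and the first title from a LIDO record."""
    root = ET.fromstring(xml_text.strip())
    rec_id = root.findtext("lido:lidoRecID", namespaces=LIDO_NS)
    title = root.findtext(".//lido:titleSet/lido:appellationValue",
                          namespaces=LIDO_NS)
    return {"id": rec_id, "title": title}


if __name__ == "__main__":
    print(read_lido_record(SAMPLE_LIDO))
```

On the receiving side, a content management system would map values like these onto its own fields when importing a record.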
Within digiCULT.web, several interconnected databases work together. For example, in order for a thematic article to appear on the portal, it is not sufficient simply to complete the input form for the article itself. People and institutions referenced in keywords or bibliographic entries, as well as images and publication titles, are each recorded in their own dedicated databases. The advantage of this approach is that once a dataset—such as a person or an academic article—has been created, it can also be reused in other contributions. Over time, this results in extensive collections of data on people and organizations, image materials, and referenced literature. Four years after the launch of the portal, for instance, the literature database already contains 2,713 records, while the media management database holds 4,970 records.
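The principle of reuse can be illustrated with a small data model. This is a sketch under simplifying assumptions and not the digiCULT.web data model: all class names, field names, and IDs are hypothetical. The point is that an article stores only references to person records, so one person record can serve any number of articles.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    person_id: str
    name: str


@dataclass
class Article:
    article_id: str
    title: str
    # The article keeps IDs (references), not copies of the person data.
    person_ids: list[str] = field(default_factory=list)


# Hypothetical stand-in for the people database, keyed by ID.
people = {"p1": Person("p1", "Example Person")}

articles = [
    Article("a1", "First example article", ["p1"]),
    Article("a2", "Second example article", ["p1"]),
]


def articles_referencing(person_id, articles):
    """Return the IDs of all articles that link to the given person record."""
    return [a.article_id for a in articles if person_id in a.person_ids]


def resolve_people(article, people):
    """Look up the full person records behind an article's stored IDs."""
    return [people[pid] for pid in article.person_ids]


if __name__ == "__main__":
    print(articles_referencing("p1", articles))
    print(resolve_people(articles[0], people))
```

Because the name lives in exactly one place, correcting it in the person record updates every article that references it.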
A thematic article on the portal also consists of several records from the thematic text database. At a minimum, two internally linked records are required: a thematic text record, in which the article itself is managed, and a record containing the biography of the person who is displayed as the author.
Take, for example, the introductory article “Jewish History in Eastern Europe: The Nineteenth Century” by Anke Hilbrenner, which is composed of the article record and the corresponding biographical record.
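The minimum requirement of two linked records can be expressed as a simple completeness check. The sketch below is an illustration only: the field names and record IDs are hypothetical, and the title and author name are taken from the example above.

```python
def assemble_article(text_record, biographies):
    """Combine a thematic text record with its linked author biography.

    Raises ValueError if the link is missing or points to no record,
    because both records are needed before the article can be displayed.
    """
    author_id = text_record.get("author_bio_id")
    if author_id is None:
        raise ValueError("thematic text record has no linked author biography")
    if author_id not in biographies:
        raise ValueError(f"author biography {author_id!r} not found")
    return {
        "title": text_record["title"],
        "author": biographies[author_id]["name"],
    }


# Hypothetical records; the IDs "bio-1" and the field names are assumptions.
biographies = {"bio-1": {"name": "Anke Hilbrenner"}}
text_record = {
    "title": "Jewish History in Eastern Europe: The Nineteenth Century",
    "author_bio_id": "bio-1",
}

if __name__ == "__main__":
    print(assemble_article(text_record, biographies))
```

A check of this kind catches an article whose author link is missing before it ever reaches the portal.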
Data Fields, Controlled Vocabularies, and Our Thesaurus
A look at the data entry form for one of these thematic articles shows that it contains a large number of fields. Many of these are not free-text fields in which text can be entered arbitrarily, but rather selection lists. These draw on so-called controlled vocabularies, value lists, a location database, and a thesaurus developed specifically for Copernico. The term “controlled” already indicates the purpose of this approach: the aim is to minimize potential sources of error and avoid inconsistencies when populating our databases.
One example is the field “Thementexttyp” (type of thematic article). Here it is specified whether a contribution is, for example, an introduction, a background article, an interview, or an image gallery. Without such a controlled list, different variants—such as “introductory text,” “introduction,” or “introductory article”—could easily be entered. This would undermine the consistent structure of the data and make searching within the portal more difficult.
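The effect of such a controlled list can be shown in a few lines. This is a minimal sketch, not the validation used in our databases: the English labels are taken from the examples above, while the function name and the normalization rules are assumptions.

```python
# Permitted values, using the English labels from the examples above.
ALLOWED_TEXT_TYPES = frozenset(
    {"introduction", "background article", "interview", "image gallery"}
)


def validate_text_type(value):
    """Return the normalized value, or raise ValueError if it is not permitted."""
    normalized = value.strip().lower()
    if normalized not in ALLOWED_TEXT_TYPES:
        allowed = ", ".join(sorted(ALLOWED_TEXT_TYPES))
        raise ValueError(f"{value!r} is not a permitted type; choose one of: {allowed}")
    return normalized


if __name__ == "__main__":
    print(validate_text_type(" Introduction "))
    try:
        validate_text_type("introductory text")
    except ValueError as error:
        print(error)
```

A selection list in an input form enforces the same rule up front, because a variant such as "introductory text" cannot be chosen in the first place.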
Our controlled vocabularies, thesaurus, and location database are managed in the software digiCULT.xTree, which is also provided by digiCULT.Verbund eG. This is a web-based database that enables collaborative work independent of location.
In particular, our indexing relies on our information retrieval thesaurus. The key aspect here is information retrieval—that is, our vocabularies are designed with the portal search and the wider web in mind. In other words, our goal is to ensure that Copernico’s content can be found as easily as possible. And spoiler alert: it works!
The same applies to our Historical Thesaurus. It structures and guides our work, since we cannot simply assign any keywords we like. Each new entry first requires careful research and consideration: Is the subject term used to index an article truly appropriate and unambiguous? Is the designation we have chosen actually used and clearly defined? Only then can the content of the contribution be captured precisely. In doing so, we also orient ourselves toward other vocabularies, especially established authority files such as the Integrated Authority File (GND) of the German National Library (DNB).
Since March 2026, it has also been possible to explore the thesaurus through its own dedicated viewer. This provides an additional way to research the portal’s content: Copernico entries linked to a particular subject term can now be viewed and accessed at a glance.
Quality Assurance Through the Stage System
After an article has been entered into our databases and indexed, it is not published immediately. Instead, Copernico uses a so-called stage system before publication. In structure and appearance it corresponds to the live portal, but it is not publicly accessible.
In this protected environment, we primarily check whether the links between images and text established in the databases are correct. Authors are also given the opportunity—via password-protected access—to review their previously peer-reviewed and copy-edited contribution one final time. At this stage, minor inconsistencies often still come to light, such as incorrect captions, missing spaces, or typographical errors.
The editorial teams responsible for individual thematic focuses also make use of the stage system. There they can assess how the section works as a whole: whether the planned title image (the so-called mood image) is appropriate and whether the arrangement of the articles appears coherent. They can also take one last look at all the texts.
Before a new thematic focus is published, we transfer the arrangement defined in the stage system into the Copernico backend. The resulting compilation of contributions, here for the thematic focus “Jewish Life,” appears in a structure such as the following:
Together with our agency Outermedia, we developed a range of layouts during the early development phase of Copernico, which we have been gradually refining and expanding ever since. In the configuration shown here, for example, the contributions are presented in a three-column gallery view. In the frontend, this appears as follows:
A look inside Copernico’s “engine room” reveals how many interconnected work steps, technical structures, and editorial decisions are required for content to appear on the portal in its present form. Much of this work is intentionally kept invisible to users. Yet it is precisely these carefully developed processes—from controlled vocabularies and interconnected databases to multi-stage review procedures—that ensure the portal’s content remains reliable, consistent, and usable in the long term. The engine room is therefore not an end in itself, but a foundation that allows Copernico to function as a scientifically grounded, sustainable, and user-friendly portal.