Case VIRTA: Finnish National Publication Data to OpenAIRE

VIRTA

Providing Finnish National Publication Data to OpenAIRE

Efficient dissemination and visibility of research results across the boundaries of scientific communication infrastructures depends closely on standards for describing scientific information and on communication protocols. Metadata should be as complete and consistent as possible, because its quality carries over to the services built on it and is therefore a prerequisite for its use and acceptance by researchers and the public. At the same time, authors should not be burdened with additional effort and redundant input of bibliographic data.

To achieve this goal, collaborations, resources and active contributions from different infrastructures and their organizations are required.
One example of such a collaborative effort is the integration of the Finnish VIRTA Publication Information Service into OpenAIRE, which has been in progress since mid-2018 and aims to be in production in the first half of this year. The goal of the integration is to provide Finnish publication metadata from the national aggregator, VIRTA, to OpenAIRE and thus improve the quality of metadata and the visibility of Finnish research results at a European and international level. Integrating the aggregator, which serves as a national Current Research Information System (CRIS), creates a single point of entry. The effort needed to be OpenAIRE compliant is thus shifted from many institutional publication infrastructures to one central content provider.

The integration of VIRTA into OpenAIRE will also address the following issues. In Finland, none of the commercial CRIS platforms is currently compatible with OpenAIRE aggregation requirements. Moreover, Finnish repositories do not cover the complete research output of academic institutions in Finland. VIRTA makes it possible to answer questions such as what share of the total publication output is Open Access and what share is published in the native languages. Integrating (national) CRIS systems with OpenAIRE provides answers to such questions and enables comparison across national borders. In parallel, the integration of institutional CRIS systems is important, as it will greatly improve the coverage and quality of metadata in OpenAIRE and expand the monitoring capabilities provided by the OpenAIRE portal and dashboards.

VIRTA Publication Information Service
VIRTA Publication Information Service is an advanced data warehouse solution that integrates institutional data at the national level in Finland. VIRTA was launched in spring 2016. The service is developed by CSC – IT Center for Science and owned by the Finnish Ministry of Education and Culture. As a data hub, VIRTA holds up-to-date bibliographic information on all scientific publications from 54 Finnish organizations, which use different local solutions for collecting publication data, such as commercial CRISes, self-made publication registers and institutional publication repositories (Figure 1). About 60,000 scientific, professional and non-scholarly publications are transferred per year, covering all scientific fields. Publication metadata in VIRTA is based on a national data model that fulfills the requirements of the funding model for national higher education institutions and other needs for monitoring research and development activities.

Figure 1. VIRTA Publication Information Service metadata flows and integrations to both organizational CRIS systems as well as national and international services.

Two (or three) steps to OpenAIRE integration
The OpenAIRE Guidelines for CRIS Managers version 1.1 were released in June 2018, with minor updates last December (current version 1.1.1). These guidelines are available at: https://openaire-guidelines-for-cris-managers.readthedocs.io/en/latest/index.html.
They aim to instruct CRIS managers on how to expose their metadata in a way that is compatible with the OpenAIRE infrastructure and thus allows integration into it. National aggregated CRIS systems, such as VIRTA, can also be compliant with these Guidelines by providing additional provenance information about their records. In the following we describe three major steps toward becoming compliant with the OpenAIRE Guidelines for CRIS Managers.

1. Mapping the data model to CERIF
The first step of integration is to map the data model in your CRIS system to the CERIF data model as described in the Guidelines. The work needed may vary considerably between source systems and their data models. Investing sufficient time and resources at this point is nevertheless highly recommended, as it improves the interoperability and quality of the metadata and makes the later validation phase run more smoothly.
Fortunately, the VIRTA and CERIF data models had many similarities to start with. However, some key differences had to be addressed. These included, for example, the vocabulary of publication types and the use of IDs, both persistent identifiers and person IDs. Moreover, open access classifications needed to be harmonized, and Finnish national classifications, e.g. scientific fields, needed to be taken into account when representing metadata in both the human-readable and machine-readable formats required in CERIF. The mapping resulted in a rather long table, which lists each source VIRTA element, the equivalent CERIF element and examples of both. This up-to-date mapping is available at: https://wiki.eduuni.fi/x/lRLTB
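As a rough illustration of what such a mapping table does in practice, the sketch below translates one source record into a target model. It is only a minimal sketch: every field name, type code and target term in it is a hypothetical placeholder and not an actual VIRTA or CERIF element; the real mapping is the one published in the table linked above.

```python
# Illustrative sketch only: all field names, codes and terms below are
# hypothetical placeholders, not actual VIRTA or CERIF elements.

# Hypothetical mapping table: source field -> target field.
FIELD_MAP = {
    "publication_name": "Title",
    "publication_year": "PublicationDate",
    "doi": "DOI",
}

# Hypothetical vocabulary mapping: national publication type code -> target term.
TYPE_MAP = {
    "A1": "journal-article",
    "C1": "book",
}


def map_record(source: dict) -> dict:
    """Translate one source record into the target model using the tables above."""
    target = {}
    # Plain fields are copied under their new names when present.
    for src_field, dst_field in FIELD_MAP.items():
        if src_field in source:
            target[dst_field] = source[src_field]
    # Controlled vocabularies need a lookup rather than a straight copy.
    code = source.get("publication_type")
    if code is not None:
        # Unknown codes stay visible instead of being silently dropped.
        target["Type"] = TYPE_MAP.get(code, f"unmapped:{code}")
    return target


example = {
    "publication_name": "An example title",
    "publication_year": 2018,
    "publication_type": "A1",
}
mapped = map_record(example)
```

Keeping unmapped vocabulary codes visible, rather than dropping them, is one way to surface gaps in the mapping before the validation phase.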

2. Providing the data in CERIF-XML via OAI-PMH endpoint
As stated in the Guidelines, OpenAIRE harvests metadata using the OAI-PMH protocol through an endpoint provided by the source system. This endpoint should provide the metadata in CERIF-XML, produced by applying the mapping to the source system's data model.
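To give an idea of what harvesting over OAI-PMH involves on the harvester side, the following minimal sketch reads a ListRecords response, collects the record identifiers and picks up the resumption token used to request the next page. The sample response is shortened and made up for illustration (the identifiers and token are assumptions), and the CERIF-XML payload of each record is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Shortened, made-up ListRecords response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:Publications/1</identifier>
        <datestamp>2019-01-15</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:Publications/2</identifier>
        <datestamp>2019-02-01</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from one response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.findall(
            "./oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS
        )
    ]
    token_element = root.find("./oai:ListRecords/oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken means the list is complete.
    if token_element is not None and token_element.text:
        token = token_element.text
    else:
        token = None
    return identifiers, token


identifiers, token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would repeat the request with the returned token until no token comes back.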
OAI-PMH was already implemented in VIRTA to provide metadata in both Dublin Core and VIRTA-XML formats (Figure 2). This implementation served as the basis for implementing the OpenAIRE specifications. It was extended and now supports the additional metadata prefix oai_cerif_openaire and the following sets:
- openaire_cris_publications; 
- openaire_cris_persons; 
- openaire_cris_events; 
- openaire_cris_orgUnits.
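The metadata prefix and sets listed above translate into harvesting requests roughly as follows. This is a minimal sketch: the base URL is a hypothetical placeholder, not VIRTA's actual endpoint address, while the verb and argument names come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, used only for illustration.
BASE_URL = "https://cris.example.org/oai"

METADATA_PREFIX = "oai_cerif_openaire"
SETS = [
    "openaire_cris_publications",
    "openaire_cris_persons",
    "openaire_cris_events",
    "openaire_cris_orgUnits",
]


def list_records_url(base_url, oai_set, resumption_token=None):
    """Build a ListRecords request URL for one set."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": METADATA_PREFIX,
            "set": oai_set,
        }
    return f"{base_url}?{urlencode(params)}"


urls = [list_records_url(BASE_URL, oai_set) for oai_set in SETS]
```

Each URL in `urls` starts the harvest of one set; follow-up pages are requested by passing the token returned by the previous response.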

Figure 2. VIRTA’s technical architecture related to data flows, procedures and APIs

3. Any extra steps?
Source systems aiming for OpenAIRE integration may require additional effort before they can be harvested by OpenAIRE. This might be due to metadata ownership and GDPR-related issues, to technological or infrastructure solutions that cannot support the required endpoints, or to other issues that are not directly related to OpenAIRE but have to be solved at the source system level.
For VIRTA, the extra step arose because VIRTA currently only stores a copy of the metadata, while the research organizations act as registrars, i.e. owners, of the metadata. Written permissions were therefore needed before external services could use this metadata. In these permissions, each research organization could allow OpenAIRE to harvest the records affiliated with that organization via VIRTA's OAI-PMH endpoint. The plans and the data model mapping were coordinated with the research organizations together with the Finnish OpenAIRE National Open Access Desk (NOAD).
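The effect of such organization-level permissions on what is exposed for harvesting can be sketched as a simple filter. This is an illustrative sketch only: the organization identifiers and record structure are hypothetical, and treating a record as harvestable when at least one of its affiliated organizations has granted permission is an assumption, not a description of how VIRTA implements it.

```python
# Hypothetical identifiers of organizations that have granted permission.
PERMITTED_ORGS = {"org-001", "org-002"}


def harvestable(records, permitted_orgs):
    """Keep records with at least one affiliated organization that granted permission.

    Assumption: a single permitting organization is enough to expose a record.
    """
    return [
        record
        for record in records
        if permitted_orgs.intersection(record["organizations"])
    ]


# Made-up records for illustration.
records = [
    {"id": "pub-1", "organizations": ["org-001"]},
    {"id": "pub-2", "organizations": ["org-003"]},
    {"id": "pub-3", "organizations": ["org-003", "org-002"]},
]
exposed_ids = [record["id"] for record in harvestable(records, PERMITTED_ORGS)]
```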

Summary

As the VIRTA-OpenAIRE integration goes into production in the coming months, the metadata of more than 350 000 scientific, professional and non-scholarly publications can be added to OpenAIRE's database and explored via the OpenAIRE portal.
By using VIRTA's OpenAIRE integration, Finnish research organizations do not need to invest in their own solutions for OpenAIRE compliance. This is highly cost-efficient, greatly enhances the interoperability of Finnish publication metadata at the European level, and expands OpenAIRE's coverage of national metadata aggregators.
Storing full text in JATS XML is also one of the requirements in the Plan S Technical Guidance.

 

Other resources:

• About VIRTA Publication Information Service https://wiki.eduuni.fi/display/cscvirtajtp/VIRTA+in+English
• OpenAIRE Guidelines for CRIS Managers 1.1.1 https://openaire-guidelines-for-cris-managers.readthedocs.io/en/latest/index.html
• Up-to-date mapping table between VIRTA and CERIF data models https://wiki.eduuni.fi/pages/viewpage.action?pageId=80941717

This text was published on the OpenAIRE blog (23rd April 2019)
