Presentation of the project

The catalogue of the Biblioteca Virtual Miguel de Cervantes contains about 200,000 records, which were originally created in compliance with the MARC21 standard. The entries in the catalogue were recently migrated to a new relational database whose data model adheres to the conceptual models promoted by the International Federation of Library Associations and Institutions (IFLA), in particular the FRBR and FRAD specifications.

The database content was then mapped, by means of an automated procedure, to RDF triples that mainly employ the RDA (Resource Description and Access) vocabulary to describe the entities, as well as their properties and relationships. In contrast to a direct transformation, the intermediate relational model provides tighter control over the process, for example through referential integrity, and therefore better validation of the output. This RDF-based semantic description of the catalogue is now accessible online.
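As an illustration of why the intermediate relational step helps, the following minimal sketch loads toy rows into an in-memory SQLite database with a foreign key and serialises them as N-Triples. It is not the project's actual procedure (which uses Apache Jena): the table layout, the "person" URI path, the sample rows and the use of dc:title and dc:creator are all assumptions made for the example.

```python
import sqlite3

DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://data.cervantesvirtual.com/"


def build_demo_db():
    """Create an in-memory database with two toy tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE manifestation ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " creator_id INTEGER NOT NULL REFERENCES person(id))"
    )
    return conn


def literal(text):
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(conn):
    """Map every manifestation row to two triples: a title and a creator link."""
    rows = conn.execute(
        "SELECT id, title, creator_id FROM manifestation ORDER BY id"
    )
    triples = []
    for manifestation_id, title, person_id in rows:
        subject = f"<{BASE}manifestation/{manifestation_id}>"
        triples.append(f"{subject} <{DC}title> {literal(title)} .")
        # The "person/" path is an assumption, not taken from the site.
        triples.append(f"{subject} <{DC}creator> <{BASE}person/{person_id}> .")
    return triples


conn = build_demo_db()
conn.execute("INSERT INTO person VALUES (1, 'Example Author')")
conn.execute("INSERT INTO manifestation VALUES (10, 'Example Title', 1)")
for triple in to_ntriples(conn):
    print(triple)
```

Because the foreign key is enforced, inserting a manifestation that points to a non-existent person raises `sqlite3.IntegrityError`, so dangling creator links are rejected before any triple is generated.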

HTML pages

Data.cervantesvirtual.com displays structured data on four kinds of pages: manifestation, author, language and date.

  • On the author pages: citations of all the works authored by a person, corporate body or family (as author, illustrator, translator, etc.). Example: see Lope de Vega
  • On the manifestation pages: information about works of which the Cervantesvirtual digital library holds at least one exemplar: manuscripts, electronic publications, digitized items, audio-visual adaptations, etc. Example: see the page Cervantes o la casa encantada
  • On pages devoted to languages. Example: Español
  • On pages devoted to dates: statements of the events that happened during a given year, e.g. the creation of a work, the birth of an author, etc. Example: 1562

Every record has been assigned a unique and permanent identifier derived from its catalogue identifier. For example: http://data.cervantesvirtual.com/manifestation/224029.
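The identifier scheme can be sketched as a small helper that combines the base address, an entity type and the catalogue identifier. Only the "manifestation" path is confirmed by the example above; the function name and the assumption that catalogue identifiers are purely numeric are hypothetical.

```python
BASE_URI = "http://data.cervantesvirtual.com"


def mint_uri(entity_type, catalogue_id):
    """Build a permanent URI from an entity type and a catalogue identifier.

    Assumes numeric catalogue identifiers; anything else is rejected so that
    malformed identifiers never reach the published data.
    """
    identifier = str(catalogue_id)
    if not identifier.isdigit():
        raise ValueError(f"unexpected catalogue identifier: {catalogue_id!r}")
    return f"{BASE_URI}/{entity_type}/{identifier}"


print(mint_uri("manifestation", 224029))
# → http://data.cervantesvirtual.com/manifestation/224029
```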

Our data in RDF

The data comes from the main catalogue. It is produced in RDF and stored in a triple store, which makes the items interoperable.

The data model used in data.cervantesvirtual.com makes it possible to include links to external sources such as VIAF or DBpedia. Resources produced by the Cervantesvirtual digital library (authority and catalogue records) are assigned permanent identifiers that enable the creation of persistent links.

External links

Data.cervantesvirtual.com is part of the Web and provides external links to Web sites, either maintained by the Cervantesvirtual digital library (full text search, dedicated websites) or external.

There are several kinds of links:

  • Links to other external repositories such as VIAF (Virtual International Authority File) or ISNI (International Standard Name Identifier).
  • Links to search forms in which query terms (author name, work title) are automatically pre-typed: Cervantesvirtual digital library catalogue, Europeana or Wikipedia.
  • Links to Wikipedia, which provides author thumbnails and short biographies, both retrieved from DBpedia.
  • Links to the Library of Congress, which provides information about the international standard for language codes.
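The pre-filled search links mentioned in the list above can be sketched as follows. The two URL patterns (the Spanish Wikipedia search page and the Europeana search page) are assumptions about what such links might look like, not the addresses the site actually generates.

```python
from urllib.parse import urlencode

# Assumed search entry points; the real site may use different ones.
WIKIPEDIA_SEARCH = "https://es.wikipedia.org/w/index.php"
EUROPEANA_SEARCH = "https://www.europeana.eu/en/search"


def wikipedia_search_url(terms):
    """Return a Wikipedia search link with the query terms already filled in."""
    return f"{WIKIPEDIA_SEARCH}?{urlencode({'search': terms})}"


def europeana_search_url(terms):
    """Return a Europeana search link with the query terms already filled in."""
    return f"{EUROPEANA_SEARCH}?{urlencode({'query': terms})}"


print(wikipedia_search_url("Lope de Vega"))
# → https://es.wikipedia.org/w/index.php?search=Lope+de+Vega
```

Using `urlencode` rather than string concatenation keeps names with spaces, accents or ampersands from breaking the link.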

Ontologies and vocabularies

We preferred to reuse existing vocabularies in order to foster interoperability. The prefixes and namespaces used are:

  • dc http://purl.org/dc/elements/1.1/
  • skos http://www.w3.org/2004/02/skos/core#
  • rdfs http://www.w3.org/2000/01/rdf-schema#
  • rdac http://rdaregistry.info/Elements/c/
  • rdaw http://rdaregistry.info/Elements/w/
  • rdae http://rdaregistry.info/Elements/e/
  • rdamt http://rdaregistry.info/termList/RDAMediaType/
  • rdact http://rdaregistry.info/termList/RDACarrierType/
  • rdau http://rdaregistry.info/Elements/u/
  • rdam http://rdaregistry.info/Elements/m/
  • rdai http://rdaregistry.info/Elements/i/
  • rdaa http://rdaregistry.info/Elements/a/
  • time http://www.w3.org/2006/time#
  • madsrdf http://www.loc.gov/mads/rdf/v1#
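A prefix table like the one above is typically used to expand compact names such as dc:title into full URIs. The sketch below shows one way to do that; the `expand` helper is hypothetical, while the namespaces are copied from the list.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdac": "http://rdaregistry.info/Elements/c/",
    "rdaw": "http://rdaregistry.info/Elements/w/",
    "rdae": "http://rdaregistry.info/Elements/e/",
    "rdamt": "http://rdaregistry.info/termList/RDAMediaType/",
    "rdact": "http://rdaregistry.info/termList/RDACarrierType/",
    "rdau": "http://rdaregistry.info/Elements/u/",
    "rdam": "http://rdaregistry.info/Elements/m/",
    "rdai": "http://rdaregistry.info/Elements/i/",
    "rdaa": "http://rdaregistry.info/Elements/a/",
    "time": "http://www.w3.org/2006/time#",
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#",
}


def expand(compact_name):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, separator, local_name = compact_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {compact_name!r}")
    return PREFIXES[prefix] + local_name


print(expand("dc:title"))
# → http://purl.org/dc/elements/1.1/title
```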

Embedded data: Schema.org

Author and work pages are open on the Web and can be reached by search engines. This is why, in addition to the traditional methods used for indexing the homepage, we have chosen to embed structured data in these pages:

Schema.org provides a vocabulary for adding information to the HTML content, in microdata format, to improve indexing by search engines. data.cervantesvirtual.com uses http://schema.org/Person, http://schema.org/Organization and http://schema.org/Book.

Open Graph Protocol (OG), so that the pages can be represented in social networks. It is a very simple vocabulary, encoded in RDFa, whose metadata is retrieved when a user adds the resource to their Facebook profile.
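To make the two embedding techniques concrete, here is a minimal sketch that renders a Schema.org Person in microdata and an Open Graph meta tag. The function names, the chosen properties (name, url, og:title) and the sample page address are assumptions; the markup the site actually emits may differ.

```python
from html import escape


def person_microdata(name, url):
    """Render an author as HTML microdata typed as http://schema.org/Person."""
    return (
        '<div itemscope itemtype="http://schema.org/Person">'
        f'<a itemprop="url" href="{escape(url)}">'
        f'<span itemprop="name">{escape(name)}</span>'
        "</a></div>"
    )


def og_meta(prop, content):
    """Render one Open Graph property as an RDFa-style meta element."""
    return f'<meta property="og:{escape(prop)}" content="{escape(content)}">'


# Hypothetical author page address, used only for illustration.
print(person_microdata("Lope de Vega", "http://data.cervantesvirtual.com/person/1"))
print(og_meta("title", "Lope de Vega"))
```

Escaping every value with `html.escape` keeps titles that contain quotes or angle brackets from breaking the attributes.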

Icons from the Glyphicon Halflings set are included in the framework.

Software

We use the free RDF API Apache Jena to transform the catalogue records into RDF. Jena is an open source Java library for developing Semantic Web applications and is published under the Apache License, Version 2.0.

Once the RDF triples are generated, we use the Sesame server for storing and querying them. Sesame is a powerful Java framework for processing and handling RDF data. This includes creating, parsing, storing, inferencing and querying over such data. It offers an easy-to-use API that can be connected to all leading RDF storage solutions.
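Querying a Sesame repository over HTTP can be sketched as below. The request is only built, not sent. The server address, the repository name "bvmc" and the assumption that manifestations carry a dc:title are hypothetical; the `/repositories/<id>?query=...` pattern follows the Sesame HTTP protocol.

```python
import urllib.parse
import urllib.request

# Hypothetical deployment details, used only for illustration.
SESAME_SERVER = "http://localhost:8080/openrdf-sesame"
REPOSITORY_ID = "bvmc"

QUERY = (
    "SELECT ?title WHERE { "
    "<http://data.cervantesvirtual.com/manifestation/224029> "
    "<http://purl.org/dc/elements/1.1/title> ?title }"
)


def build_sparql_request(server, repository_id, query):
    """Build (but do not send) a GET request that evaluates a SPARQL query."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{server}/repositories/{repository_id}?{params}"
    request = urllib.request.Request(url, method="GET")
    # Ask for the standard SPARQL JSON results format.
    request.add_header("Accept", "application/sparql-results+json")
    return request


request = build_sparql_request(SESAME_SERVER, REPOSITORY_ID, QUERY)
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would execute the query against a running server.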

Use and share

To facilitate its use and sharing, the BVMC repository has been published at datahub.io.

Publications

  • Gustavo Candela Romero, Maria Pilar Escobar Esteban, Manuel Marco Such, Rafael C. Carrasco

    Transformation of a Library Catalogue into RDA Linked Open Data.

    TPDL 2015: 321-325

  • Gustavo Candela Romero, Maria Pilar Escobar Esteban, Manuel Marco Such, Rafael C. Carrasco

    Migration of a library catalogue into RDA linked open data.

    Semantic Web Journal 2017. Online