OAI-PMH service for K2N Fedora Commons based repository

From Biowikifarm Metawiki
Revision as of 21:45, 15 November 2009 by LiaVeja (Talk | contribs) (Conclusions: Is this service reliable for our purpose or not?)

This document should be considered a draft. We will complete it as we learn more about the OAI Provider service for Fedora Commons based repositories.

Generalities regarding OAI-PMH protocol

The Open Archives Initiative Protocol for Metadata Harvesting (referred to as the OAI-PMH in the remainder of this document) provides an application-independent interoperability framework based on metadata harvesting. There are two classes of participants in the OAI-PMH framework:

  • Data Providers administer systems that support the OAI-PMH as a means of exposing metadata; and
  • Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services.
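
From the harvester's side, an OAI-PMH request is an ordinary HTTP request whose query string carries a verb and its arguments. The following is a minimal Python sketch of how a Service Provider might build such requests. The helper name build_request is hypothetical (it is not part of any library), the base URL is a placeholder, and oai_dc is used only because it is the Dublin Core format every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

def build_request(base_url, verb, **arguments):
    """Return the GET URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)

# Placeholder base URL; replace it with the address of a real data provider.
BASE_URL = "http://localhost:8080/oaiprovider/"

print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

The sketch only constructs URLs; sending them and handling resumption tokens or error responses is left out.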

For the moment, we are interested in exposing our K2N metadata as a Data Provider to harvesters (Service Providers) such as Europeana. Fedora Commons has shipped with a rudimentary OAI provider service since version 2.2.1. Since version 3.0, with the major changes to the Fedora Commons content model approach, the OAI Provider service appears to be very flexible and, we believe, reliable. The current implementation is compatible with Fedora Commons 3.0-3.2.

OAI Provider for Fedora Commons Repository

The new provider is based on PROAI, an open source caching, polling OAI provider. It has the following features:

  • Supports any metadata format available through your Fedora Repository via a Datastream or dissemination service
  • Supports sets that are expressed as RDF relationships in your digital objects' RELS-EXT Datastreams as exposed via the Resource Index
  • Runs as a Web application in any servlet container, acting as a Web service client to the Fedora Repository
  • Caches the content of Fedora Repository disseminations and Datastreams intended to be exposed as OAI records, allowing for fast response times and ensuring that the OAI provider can continue to run even when the Fedora Repository is temporarily stopped.

OAI Provider Service Installation

Download the oaiprovider distribution for Fedora Commons 3.0 from Fedora Commons Services. The source distribution (oaiprovider-1.2-src.zip) contains very useful .foxml demo files under the "src/demo/" directory. For advanced users who are already familiar with the oaiprovider service, the plain oaiprovider-1.2.zip is enough.

Installation

To install the service:

    1. Make sure you have a suitable database installed (MySQL, PostgreSQL, Oracle, or McKoi) and a database user account that can create tables in the database.
    2. Make sure your Fedora Repository is running with the Resource Index turned ON. This is necessary because the OAI provider periodically queries the resource index to discover which records of interest have changed.
    3. Deploy the oaiprovider.war file into your servlet container.
    4. Configure the OAI Provider as described in the Configuration section below.
    5. Re-start the Web application (this is often done by restarting the servlet container itself).

Demo ingestion

  • Complete installation steps 1-4 above. Start with the default values in the proai.properties configuration file and ensure the following properties are set according to your own Fedora Repository installation:
    • driver.fedora.baseURL
    • driver.fedora.user
    • driver.fedora.pass
  • Make sure your Fedora Repository installation is configured to retain PIDs of objects in the "demo" PID namespace on ingest. You can check this in your fedora.fcfg file: If one of the values of "retainPIDs" is "demo" or "*" (asterisk), your repository is configured correctly. Otherwise, you should add this value and re-start it.
  • Use the fedora-admin GUI or fedora-ingest command-line utility to ingest all demonstration objects in the src/test/foxml directory of the Fedora OAI Provider service source distribution.
  • Start the Web application.
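
The two checks described in the list above (the three driver.fedora.* properties and the "retainPIDs" value) can be sketched in Python before ingesting the demo objects. This is only an illustration under stated assumptions: the parser handles plain "key = value" lines rather than the full Java properties syntax, the "retainPIDs" value is assumed to be a whitespace-separated list, and all sample values are made up.

```python
REQUIRED_KEYS = (
    "driver.fedora.baseURL",
    "driver.fedora.user",
    "driver.fedora.pass",
)

def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def missing_keys(props):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]

def retains_demo_pids(retain_pids_value):
    """True if the retainPIDs value lists 'demo' or '*' (assumed space-separated)."""
    names = retain_pids_value.split()
    return "demo" in names or "*" in names

# Sample values are invented for illustration only.
sample = """
# proai.properties (excerpt)
driver.fedora.baseURL = http://localhost:8080/fedora/
driver.fedora.user = fedoraAdmin
driver.fedora.pass =
"""

print(missing_keys(parse_properties(sample)))   # ['driver.fedora.pass']
print(retains_demo_pids("demo test changeme"))  # True
print(retains_demo_pids("changeme"))            # False
```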

How to configure OAI Provider Service

See also: OAI Provider Configuration Reference

First successful tests

The first tests were performed on a locally installed Fedora Commons 3.2. Some elements of the proai.properties file were set as follows:

proai.validateUpdates = false

After the first tests and step-by-step modifications to both the K2N Fedora Commons repository and the proai.properties file, some results finally came into sight. For the request:

http://localhost:8080/oaiprovider/?verb=ListRecords&metadataPrefix=k2n

the response for the RELS-EXT datastream of a newly ingested digital object is:

<?xml version="1.0" encoding="UTF-8"?>
 <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                            http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2009-11-12T21:38:41Z</responseDate>
  <request verb="ListRecords" metadataPrefix="k2n">http://localhost:8080/oaiprovider/</request>
<ListRecords>
<record xmlns="http://www.openarchives.org/OAI/2.0/">
 <header>
   <identifier>oai:example.org:item22</identifier>
   <datestamp>2009-11-12T21:33:39Z</datestamp>
 </header>
 <metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:k2n="http://example.org/k2n/"
xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns="http://www.fedora.info/definitions/"
xmlns:oai="http://www.openarchives.org/OAI/2.0/">
 <rdf:Description rdf:about="info:fedora/K2N:SI_8fdaa4be929fad338e7442f58c5c13">
      <k2n:Normal_Preview_Availability>online (free)</k2n:Normal_Preview_Availability>
      <k2n:Creation_Date>2008</k2n:Creation_Date>
      <k2n:Scientific_Names>Halimeda tuna (J. Ellis et Solander) J.V.Lamouroux</k2n:Scientific_Names>
      <k2n:Format>jpg</k2n:Format>
      <k2n:Country_Names>Italy</k2n:Country_Names>
      <k2n:Creators>Diego Poloniato</k2n:Creators>
      <k2n:Best_Quality_URI>http://dbiodbs.units.it/quint/al/foto/AL000501.jpg.jpg</k2n:Best_Quality_URI>
      <k2n:Metadata_Creator>Annalisa Falace</k2n:Metadata_Creator>
      <k2n:Best_Quality_Availability>online (free)</k2n:Best_Quality_Availability>
      <k2n:Copyright_Statement>Copyright of the author</k2n:Copyright_Statement>
      <k2n:Normal_Preview_URI>http://dbiodbs.units.it/quint/al/foto/pics/AL000501.jpg.jpg</k2n:Normal_Preview_URI>
      <k2n:License_Statement>To be discussed with the author</k2n:License_Statement>
      <k2n:Collection_By_Resource_ID>Algae_(UNITS)</k2n:Collection_By_Resource_ID>
      <k2n:Resource_ID>http://dbiodbs.units.it/quint/al/foto/AL000501.jpg</k2n:Resource_ID>
      <k2n:Metadata_Language>en</k2n:Metadata_Language>
      <k2n:Taxon_Category>Algae</k2n:Taxon_Category>
      <oai:itemID>oai:example.org:item22</oai:itemID>
     <fedora:isMemberOf rdf:resource="info:fedora/K2N:Collection_1e52cf53dfa639a4c94dc1396f37aa"></fedora:isMemberOf>
     <fedora:serviceProvidedBy rdf:resource="info:fedora/K2N:Provider_4f2e3b2b2b85ac2b1f9638b08b87a8"></fedora:serviceProvidedBy>
   </rdf:Description>
  </rdf:RDF>
 </metadata>
</record>
<record xmlns="http://www.openarchives.org/OAI/2.0/">
 <header>
   <identifier>"""oai:example.org:item22"""</identifier>
   <datestamp>2009-11-12T20:09:51Z</datestamp>
 </header>
 <metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <rdf:Description rdf:about="info:fedora/demo:SI_ex1">
   <itemID xmlns="http://www.openarchives.org/OAI/2.0/">"oai:example.org:item22"</itemID>
   <isMemberOf xmlns="info:fedora/fedora-system:def/relations-external#" rdf:resource="info:fedora/demo:SetPrime"></isMemberOf>
   <serviceProvidedBy xmlns="info:fedora/fedora-system:def/relations-external#" rdf:resource="info:fedora/demo:SetPrime">
</serviceProvidedBy>
   <Scientific_Names xmlns="http://example.org/k2n/">"Halimeda tuna (J. Ellis et Solander) J.V.Lamouroux"</Scientific_Names>
 </rdf:Description>
</rdf:RDF>
 </metadata>
</record>
<record xmlns="http://www.openarchives.org/OAI/2.0/">
 <header>
   <identifier>"""\\""oai:example.org:item22\\"""""</identifier>
   <datestamp>2009-11-12T20:11:54Z</datestamp>
 </header>
 <metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <rdf:Description rdf:about="info:fedora/demo:SI_ex1">
   <itemID xmlns="http://www.openarchives.org/OAI/2.0/">"\"oai:example.org:item22\""</itemID>
   <isMemberOf xmlns="info:fedora/fedora-system:def/relations-external#" rdf:resource="info:fedora/demo:SetPrime"></isMemberOf>
   <serviceProvidedBy xmlns="info:fedora/fedora-system:def/relations-external#" rdf:resource="info:fedora/demo:SetPrime"> 
</serviceProvidedBy>
   <Scientific_Names xmlns="http://example.org/k2n/">Halimeda tuna (J. Ellis et Solander) J.V.Lamouroux</Scientific_Names>
 </rdf:Description>
</rdf:RDF>
 </metadata>
</record>
<record xmlns="http://www.openarchives.org/OAI/2.0/">
 <header>
   <identifier>oai:example.org:item23</identifier>
   <datestamp>2009-11-12T21:13:26Z</datestamp>
 </header>
 <metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"  
xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:k2n="http://example.org/k2n/"  
xmlns:oai="http://www.openarchives.org/OAI/2.0/" xmlns:fedora-model="info:fedora/fedora-system:def/model#" 
xmlns="http://www.fedora.info/definitions/">
              <rdf:Description rdf:about="info:fedora/demo:SI_ex2">
                 <k2n:Normal_Preview_Availability>online (free)</k2n:Normal_Preview_Availability>
                 <k2n:Creation_Date>2008</k2n:Creation_Date>
                 <k2n:Scientific_Names>Halimeda tuna (J. Ellis et Solander) J.V.Lamouroux</k2n:Scientific_Names>
                 <k2n:Format>jpg</k2n:Format>
                 <k2n:Country_Names>Italy</k2n:Country_Names>
                 <k2n:Creators>Diego Poloniato</k2n:Creators>
                 <k2n:Best_Quality_URI>http://dbiodbs.units.it/quint/al/foto/AL000500.jpg.jpg</k2n:Best_Quality_URI>
                 <k2n:Metadata_Creator>Annalisa Falace</k2n:Metadata_Creator>
                 <k2n:Best_Quality_Availability>online (free)</k2n:Best_Quality_Availability>
                 <k2n:Copyright_Statement>Copyright of the author</k2n:Copyright_Statement>
                 <k2n:Normal_Preview_URI>http://dbiodbs.units.it/quint/al/foto/pics/AL000500.jpg.jpg</k2n:Normal_Preview_URI>
                 <k2n:License_Statement>To be discussed with the author</k2n:License_Statement>
                 <k2n:Collection_By_Resource_ID>Algae_(UNITS)</k2n:Collection_By_Resource_ID>
                 <k2n:Resource_ID>http://dbiodbs.units.it/quint/al/foto/AL000500.jpg</k2n:Resource_ID>
                 <k2n:Metadata_Language>en</k2n:Metadata_Language>
                 <k2n:Taxon_Category>Algae</k2n:Taxon_Category>
                 <oai:itemID>oai:example.org:item23</oai:itemID>
                 <fedora:isMemberOf rdf:resource="info:fedora/K2N:Collection_1e52cf53dfa639a4c94dc1396f37aa"></fedora:isMemberOf>
                 <fedora:serviceProvidedBy rdf:resource="info:fedora/K2N:Provider_4f2e3b2b2b85ac2b1f9638b08b87a8"> 
</fedora:serviceProvidedBy>
              </rdf:Description>
           </rdf:RDF>
 </metadata>
</record>

For an updated RELS-EXT datastream, the response is:

<record xmlns="http://www.openarchives.org/OAI/2.0/">
 <header>
   <identifier>K2N:SI_8fdaa4be929fad338e7442f58c5c13</identifier>
   <datestamp>2009-11-12T21:37:42Z</datestamp>
 </header>
<metadata>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/K2N:SI_8fdaa4be929fad338e7442f58c5c13">
   <Normal_Preview_Availability xmlns="http://example.org/k2n/">"online (free)"</Normal_Preview_Availability>
   <Creation_Date xmlns="http://example.org/k2n/">"2008"</Creation_Date>
   <Scientific_Names xmlns="http://example.org/k2n/">"Halimeda tuna (J. Ellis et Solander) J.V.Lamouroux"</Scientific_Names>
   <Format xmlns="http://example.org/k2n/">"jpg"</Format>
   <Country_Names xmlns="http://example.org/k2n/">"Italy"</Country_Names>
   <Creators xmlns="http://example.org/k2n/">"Diego Poloniato"</Creators>
   <Best_Quality_URI xmlns="http://example.org/k2n/">http://dbiodbs.units.it/quint/al/foto/AL000501.jpg.jpg</Best_Quality_URI>
   <Metadata_Creator xmlns="http://example.org/k2n/">Annalisa Falace</Metadata_Creator>
   <Best_Quality_Availability xmlns="http://example.org/k2n/">online (free)</Best_Quality_Availability>
   <Copyright_Statement xmlns="http://example.org/k2n/">"Copyright of the author"</Copyright_Statement>
   <Normal_Preview_URI xmlns="http://example.org/k2n/">http://dbiodbs.units.it/quint/al/foto/pics/AL000501.jpg.jpg</Normal_Preview_URI>
   <License_Statement xmlns="http://example.org/k2n/">"To be discussed with the author"</License_Statement>
   <Collection_By_Resource_ID xmlns="http://example.org/k2n/">"Algae_(UNITS)"</Collection_By_Resource_ID>
   <Resource_ID xmlns="http://example.org/k2n/">"http://dbiodbs.units.it/quint/al/foto/AL000501.jpg"</Resource_ID>
   <Metadata_Language xmlns="http://example.org/k2n/">"en"</Metadata_Language>
   <Taxon_Category xmlns="http://example.org/k2n/">"Algae"</Taxon_Category>
   <itemID xmlns="http://www.openarchives.org/OAI/2.0/">K2N:SI_8fdaa4be929fad338e7442f58c5c13</itemID>
   <isMemberOf xmlns="info:fedora/fedora-system:def/relations-external#"
rdf:resource="info:fedora/K2N:Collection_1e52cf53dfa639a4c94dc1396f37aa"></isMemberOf>
   <serviceProvidedBy xmlns="info:fedora/fedora-system:def/relations-external#" 
rdf:resource="info:fedora/K2N:Provider_4f2e3b2b2b85ac2b1f9638b08b87a8"></serviceProvidedBy>
 </rdf:Description>
</rdf:RDF>
 </metadata>
</record>
</ListRecords>
</OAI-PMH>
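
To compare records across test runs it can help to pull just the identifier and datestamp out of each record header. A minimal Python sketch using the standard library follows; the function name list_headers is hypothetical, and the sample is a trimmed response in the same shape as the one above, with the metadata omitted for brevity.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_headers(response_xml):
    """Return (identifier, datestamp) pairs for each record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", default="", namespaces=OAI_NS)
        pairs.append((identifier.strip(), datestamp.strip()))
    return pairs

# Trimmed sample response; the metadata elements are left out.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item22</identifier>
        <datestamp>2009-11-12T21:33:39Z</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:item23</identifier>
        <datestamp>2009-11-12T21:13:26Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

for identifier, datestamp in list_headers(sample):
    print(identifier, datestamp)
```

The function expects a complete, well-formed response document, so fragments copied from a page like this one would need their closing tags restored first.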

Conclusions: Is this service reliable for our purpose or not?

  • It seems that the RELS-EXT datastream is treated in a dual manner: as the special RELS-EXT datastream for expressing relationships, and as a normal datastream (and we should take advantage of this).
  • A special element is necessary in the RELS-EXT datastream to identify the record ID for OAI-PMH:
<oai:itemID>K2N:SI_8fdaa4be929fad338e7442f58c5c13</oai:itemID>
  • We need an answer from the Europeana library on whether this format is acceptable to them; since it is delivered via OAI-PMH, it should be. But we still need a schema.
  • Another solution: build a wrapper to expose our K2N metadata to harvesters. This solution has some further implications:
    • It would be necessary to build a wrapper for each harvester, which is not a reliable solution;
    • adopt a standard format such as EAD or MODS to expose our metadata;
    • build a schema for our metadata inside the RELS-EXT datastream and use the Fedora Commons OAI Provider service to expose it. A wrapper application between our OAI Provider and the harvesters might still be necessary.
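
The oai:itemID element mentioned in the conclusions above could be generated programmatically when RELS-EXT datastreams are prepared. The following Python sketch builds a minimal RDF description containing only that element. The function name rels_ext_with_item_id is hypothetical, and a real RELS-EXT datastream would also carry the K2N properties and the relationship elements shown in the responses above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OAI = "http://www.openarchives.org/OAI/2.0/"

# Use readable prefixes when the tree is serialized.
ET.register_namespace("rdf", RDF)
ET.register_namespace("oai", OAI)

def rels_ext_with_item_id(pid, item_id):
    """Return a minimal RELS-EXT RDF document carrying an oai:itemID element."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF}}}Description",
        {f"{{{RDF}}}about": f"info:fedora/{pid}"},
    )
    item = ET.SubElement(description, f"{{{OAI}}}itemID")
    item.text = item_id
    return ET.tostring(root, encoding="unicode")

print(rels_ext_with_item_id(
    "K2N:SI_8fdaa4be929fad338e7442f58c5c13",
    "oai:example.org:item22",
))
```

Building the element through an XML library rather than by string concatenation avoids stray quote characters ending up inside element values.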

Discussions are welcome.