Difference between revisions of "FEDORA Batch Import"
GiselaWeber (Talk | contribs)
Revision as of 14:06, 19 January 2010
Manual batch import test
Manual batch import is described in FEDORA_HOME/docs/userdocs/client/batch/batchtool.doc or http://fedora.info/download/2.2.1/userdocs//client/batch/batchtool.doc. With fedora-admin it is possible to create a batch of digital objects, to ingest such a batch, or to do both in one step.
To create a batch of digital objects, a general template containing the data common to all objects of the batch is needed. This must be a Fedora METS or a Fedora FOXML XML document. The object-specific substitutions must be in separate XML documents. There are demo files for the METS template, the FOXML template (File:Foxml-template.xml) and the object-specific files (e.g. File:Americanacademy Beispiel.xml) in FEDORA_HOME/client/demo/batch-demo.
For a simple test I made a copy of foxml-template.xml and 4 copies of one of the example object-specific documents. The example batch was intended to contain 4 objects with 3 datastreams in each object: "DC" (Dublin Core metadata, here dc:title and dc:identifier), "RELS-EXT" for the metadata for the resource index (k2n:ScientificName, k2n:Country and k2n:Url), and "Image" as an externally referenced image/jpeg file. There is also a disseminator with the same bDef and bMech for all objects. In this test I used the bDef "demo:27" and the bMech "demo:28", which belong to the demos delivered with Fedora and use a Java servlet, "ImageManipulation", also delivered with Fedora.
In the copies of the example files, one can delete all the datastream elements which are not needed and fill in the data for the needed elements (File:Foxml-template Beispiel Ranunculus.xml). Since the disseminator is the same for all objects, there is no disseminator element in the object-specific files, only in the template:
<foxml:disseminator BDEF_CONTRACT_PID="demo:27" ID="DISS1" STATE="A" VERSIONABLE="true">
  <foxml:disseminatorVersion BMECH_SERVICE_PID="demo:28" ID="DISS1.0" LABEL="Ranunculus disseminator">
    <foxml:serviceInputMap>
      <foxml:datastreamBinding DATASTREAM_ID="Image" KEY="url" LABEL="Binding to IMAGE" ORDER="0"/>
    </foxml:serviceInputMap>
  </foxml:disseminatorVersion>
</foxml:disseminator>
On the other hand, the content of the DC and RELS-EXT datastreams and the external link for the Image are provided by each object-specific file (File:Ranunculus angustifolius Beispiel.xml).
The objects are created in fedora-admin with Tools -> Batch -> BuildBatch. In the window that opens, one has to enter the template file, an input directory containing all and only the object-specific files, an output directory that will hold all and only the created object files, and a freely chosen file path for the object processing map (an output file that maps the object-specific files to the objects built).
After successful building of the objects, they can be ingested with Tools -> Batch -> IngestBatch.
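Because the input directory must contain all and only the object-specific files, a quick check before running BuildBatch can save a failed build. The following is a minimal Python sketch, not part of Fedora: the function name `check_object_specs` and the sample file names are made up for illustration, and the check only tests that every entry is a regular file holding well-formed XML. It does not validate against the batch tool's schema.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def check_object_specs(input_dir):
    """Return a list of problems found in a BuildBatch input directory."""
    problems = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            problems.append(f"{path.name}: not a regular file")
            continue
        try:
            ET.parse(path)  # raises ParseError if the file is not well-formed XML
        except ET.ParseError as err:
            problems.append(f"{path.name}: not well-formed XML ({err})")
    return problems


# Self-contained demonstration with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "spec1.xml").write_text("<input><title>ok</title></input>", encoding="utf-8")
    Path(tmp, "notes.txt").write_text("not an object-specific file", encoding="utf-8")
    report = check_object_specs(tmp)
    for problem in report:
        print(problem)
```

An empty list means that every entry parsed as XML; anything else should be moved out of the directory before the batch is built.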
Search tests
iTQL
The example ingested in this way is also searchable for the k2n metadata ScientificName, Country and Url in the Fedora Resource Index Query Service. After the same batch import as described above was done for the example proposed in FEDORA Evaluation#Test driving Fedora, the following iTQL query, entered in the fedora/risearch FindTuples user interface:
select $subject $title $identifier $ScientificName $URL $Country
from <#ri>
where $subject <http://key2nature.eu/ns/test-rels-ext/ScientificName> $ScientificName
and $subject <http://key2nature.eu/ns/test-rels-ext/url> $URL
and $subject <dc:title> $title
and $subject <dc:identifier> $identifier
and $subject <http://key2nature.eu/ns/test-rels-ext/Country> $Country
and $Country <tucana:is> 'it'
gives the following result in the Sparql output format:
<sparql>
  <head>
    <variable name="subject"/>
    <variable name="title"/>
    <variable name="identifier"/>
    <variable name="ScientificName"/>
    <variable name="URL"/>
    <variable name="Country"/>
  </head>
  <results>
    ...
    <result>
      <subject uri="info:fedora/demo:K2NBatchtest2"/>
      <title>Lichtnelke</title>
      <identifier>demo:K2NBatchtest2</identifier>
      <ScientificName>Silene italica</ScientificName>
      <URL>http://www.funghiitaliani.it/uploads/post-5-1141817448.jpg</URL>
      <Country>it</Country>
    </result>
    <result>
      <subject uri="info:fedora/demo:K2NBatchtest5"/>
      <title>Melandrium</title>
      <identifier>demo:K2NBatchtest5</identifier>
      <ScientificName>Melandrium alba</ScientificName>
      <URL>http://www.ckkaempfe.de/chr/2005-05-eifel/pd5241.jpg</URL>
      <Country>it</Country>
    </result>
  </results>
</sparql>
Example 1, with the title "Flowers of Silene italica", is not included in the result because it has no ScientificName element. It seems, however, that iTQL has no wildcards, so a query such as "k2n:ScientificName starts with 'GenusName'" might not be possible.
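The same query can also be sent to the risearch service over HTTP and the Sparql-format result read by a program. The following Python sketch is only an illustration under stated assumptions: the base URL `http://localhost:8080/fedora/risearch` is a guess at a default local installation, the helper names are made up, and the sample document is a shortened copy of the result shown above. Tags are compared by local name, so the parser works whether or not the real response carries an XML namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed location of the service; adjust host, port and context as needed.
RISEARCH = "http://localhost:8080/fedora/risearch"


def risearch_url(query, lang="itql", fmt="Sparql"):
    """Build a FindTuples request URL for the given query text."""
    params = {"type": "tuples", "lang": lang, "format": fmt, "query": query}
    return RISEARCH + "?" + urlencode(params)


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_tuples(xml_text):
    """Turn each <result> element into a dict of variable name -> value."""
    rows = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "result":
            continue
        row = {}
        for field in elem:
            # Resources carry their value in a uri attribute, literals in text.
            row[local_name(field.tag)] = field.get("uri") or (field.text or "")
        rows.append(row)
    return rows


SAMPLE = """<sparql>
  <results>
    <result>
      <subject uri="info:fedora/demo:K2NBatchtest2"/>
      <ScientificName>Silene italica</ScientificName>
      <Country>it</Country>
    </result>
    <result>
      <subject uri="info:fedora/demo:K2NBatchtest5"/>
      <ScientificName>Melandrium alba</ScientificName>
      <Country>it</Country>
    </result>
  </results>
</sparql>"""

url = risearch_url("select $s $o from <#ri> where $s <dc:title> $o")
rows = parse_tuples(SAMPLE)
for row in rows:
    print(row["subject"], row["ScientificName"], row["Country"])
```

Fetching `url` (for example with `urllib.request.urlopen`) is left out so that the sketch runs without a Fedora server.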
RDQL
The query language RDQL, on the other hand, can handle regular expressions, so that the following RDQL query:
select ?subject ?identifier ?ScientificName ?URL ?Country
from <#ri>
where (?subject <dc:identifier> ?identifier ),
(?subject <http://key2nature.eu/ns/test-rels-ext/url> ?URL),
(?subject <http://key2nature.eu/ns/test-rels-ext/Country> ?Country),
(?subject <http://key2nature.eu/ns/test-rels-ext/ScientificName> ?ScientificName)
AND ?ScientificName=~ /^Silene/
gives the result (in "Simple" format):
subject : <info:fedora/demo:K2NBatchtest2>
identifier : "demo:K2NBatchtest2"
ScientificName : "Silene italica"
URL : "http://www.funghiitaliani.it/uploads/post-5-1141817448.jpg"
Country : "it"

subject : <info:fedora/demo:K2NBatchtest2>
identifier : "demo:K2NBatchtest2"
ScientificName : "Silene italica"
URL : "http://www.funghiitaliani.it/uploads/post-5-1141817448.jpg"
Country : "ch"

subject : <info:fedora/demo:K2NBatchtest4>
identifier : "demo:K2NBatchtest4"
ScientificName : "Silene italica"
URL : "http://flora.nhm-wien.ac.at/Bilder-Thumbnails/Silene-italica.jpg"
Country : "de"
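The result above lists demo:K2NBatchtest2 twice, once for each of its Country values. A short Python sketch of how such "Simple" output might be grouped by object follows. It assumes only what is visible above, namely `name : value` lines with a new record starting at each `subject` line. The function name `parse_simple` is made up, and the sample is an abridged copy of the result.

```python
SIMPLE = """subject : <info:fedora/demo:K2NBatchtest2>
identifier : "demo:K2NBatchtest2"
ScientificName : "Silene italica"
Country : "it"
subject : <info:fedora/demo:K2NBatchtest2>
identifier : "demo:K2NBatchtest2"
ScientificName : "Silene italica"
Country : "ch"
subject : <info:fedora/demo:K2NBatchtest4>
identifier : "demo:K2NBatchtest4"
ScientificName : "Silene italica"
Country : "de"
"""


def parse_simple(text):
    """Split 'name : value' lines into one dict per result record."""
    records = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or unexpected lines
        key, value = line.split(" : ", 1)
        key = key.strip()
        value = value.strip().strip('<>"')  # drop <...> and "..." delimiters
        if key == "subject" or not records:
            records.append({})  # a subject line opens a new record
        records[-1][key] = value
    return records


# Collect all Country values per object.
countries = {}
for record in parse_simple(SIMPLE):
    countries.setdefault(record["subject"], []).append(record["Country"])

for subject, values in countries.items():
    print(subject, values)
```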
If the line "(?subject <http://key2nature.eu/ns/test-rels-ext/Country> "it")," is added to the query, only demo:K2NBatchtest2 is returned. A combined filter expression is also possible. The query:
select ?subject ?identifier ?ScientificName ?URL ?Country ?title
from <#ri>
where (?subject <dc:identifier> ?identifier ),
(?subject <dc:title> ?title ),
(?subject <http://key2nature.eu/ns/test-rels-ext/url> ?URL),
(?subject <http://key2nature.eu/ns/test-rels-ext/Country> ?Country),
(?subject <http://key2nature.eu/ns/test-rels-ext/ScientificName> ?ScientificName)
AND ((?ScientificName=~ /^Silene/) || (?title=~ /silene/i))
returns those objects whose ScientificName starts with "Silene" or whose title contains "silene" (case insensitive). For such a query, the RELS-EXT of objects without a ScientificName must contain an empty tag, <k2n:ScientificName/>. Otherwise the triple pattern for ScientificName finds no match for these objects, and they are dropped from the result before the filter is evaluated.
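The effect of the empty tag can be reproduced outside Fedora. The following Python sketch imitates the matching on a small, invented set of triples (the PIDs demo:A to demo:D and their values are hypothetical). Every triple pattern must find a value before the filter is applied, so an object with a matching title but no ScientificName triple never reaches the filter.

```python
import re

# Invented (subject, predicate, object) triples for illustration only.
TRIPLES = [
    ("demo:A", "title", "Flowers of Silene italica"),  # no ScientificName at all
    ("demo:B", "title", "Lichtnelke"),
    ("demo:B", "ScientificName", "Silene italica"),
    ("demo:C", "title", "Flowers of silene"),
    ("demo:C", "ScientificName", ""),                  # empty tag in RELS-EXT
    ("demo:D", "title", "Melandrium"),
    ("demo:D", "ScientificName", "Melandrium alba"),
]


def values(subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def query():
    """Objects whose name starts with 'Silene' or whose title contains 'silene'."""
    hits = set()
    for subject in {s for s, _, _ in TRIPLES}:
        for title in values(subject, "title"):
            # If there is no ScientificName triple, this loop body never runs.
            for name in values(subject, "ScientificName"):
                if re.search(r"^Silene", name) or re.search(r"silene", title, re.I):
                    hits.add(subject)
    return sorted(hits)


print(query())
```

Here demo:A is left out although its title matches, while demo:C is found through its title because it has an empty ScientificName value.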
Improved Batch Import
To create a larger number of object-specific files for batch ingestion, a single object-specific template file was created with placeholders for all values that differ between the individual objects. Currently, these values (metadata from the first metadata survey for secondary data) are stored in a database table. A Java program reads the values from the table and replaces the placeholders with the values specific to each object. The program can not only replace values inside XML elements that already exist in the template, but can also write the XML elements themselves. This is necessary if a metadata field contains a list of values, so that several elements of the same type are needed. A metadata value can also be represented as an object-to-object relationship in the RELS-EXT datastream. For example, type = StillImage can be expressed by the line
<fedora:isMemberOf rdf:resource="info:fedora/demo:StillImageCollection"/>
in RELS-EXT.
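The substitution step can be sketched as follows. This is not the Java program described above but a short Python illustration under stated assumptions: the placeholder names, the row layout and the function `build_rels_ext` are invented, the row stands in for one record read from the database table, and namespace declarations are omitted, so the output is a fragment rather than a complete RELS-EXT datastream.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical fragment template; $pid and $elements are the placeholders.
TEMPLATE = Template("""<rdf:Description rdf:about="info:fedora/$pid">
$elements
</rdf:Description>""")


def build_rels_ext(row):
    """Fill the template for one table row, writing one element per list value."""
    lines = []
    for name in row.get("scientific_names", []):
        lines.append(f"  <k2n:ScientificName>{escape(name)}</k2n:ScientificName>")
    for country in row.get("countries", []):
        lines.append(f"  <k2n:Country>{escape(country)}</k2n:Country>")
    if row.get("type") == "StillImage":
        # A metadata value expressed as an object-to-object relationship.
        lines.append('  <fedora:isMemberOf rdf:resource="info:fedora/demo:StillImageCollection"/>')
    return TEMPLATE.substitute(pid=row["pid"], elements="\n".join(lines))


# Stand-in for one database record (values partly taken from the test above).
row = {
    "pid": "demo:K2NBatchtest2",
    "scientific_names": ["Silene italica"],
    "countries": ["it", "ch"],
    "type": "StillImage",
}
fragment = build_rels_ext(row)
print(fragment)
```

Writing the elements in code rather than in the template is what allows a field with several values, such as Country here, to produce several elements of the same type.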