Implementation details of Java-based Fedora Ingestion code
From Biowikifarm Metawiki
Revision as of 17:44, 20 September 2009 by LiaVeja
Implementation Phase
- A harvest tool, which periodically polls the MediaWiki address for metadata published online since the last ingest;
- A parsing tool, which downloads the new metadata and performs a simple syntactic analysis in order to generate a FOXML file that is as correct as possible;
- An identification tool, which derives the object identifier from an MD5 checksum computed over all metadata prepared for ingestion;
- A search tool, which looks for objects that have already been ingested. This tool is based on SOAP clients (Fedora API-A, API-M and the GSearch SOAP client);
- A preparation tool, which prepares the FOXML files for ingest and establishes the relationships between digital objects;
- An ingest tool, which ingests the prepared FOXML files into Fedora;
- A MediaWiki editing tool used to write messages for metadata providers. The outcome consists of MediaWiki pages, linked to existing provider/collection pages.
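As an illustration of the identification step, the following is a minimal sketch of how an object identifier could be derived from an MD5 checksum of the metadata. It is not the project's actual code: the class name `MetadataIdentifier`, the methods `md5Hex` and `toPid`, and the `k2n` namespace prefix are assumptions made for this example, and the page does not specify how the metadata are serialised before hashing (UTF-8 text is assumed here).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MetadataIdentifier {
    // Hypothetical PID namespace; the real namespace is not stated on this page.
    static final String NAMESPACE = "k2n";

    /** Returns the MD5 checksum of the metadata text as 32 lowercase hex digits. */
    static String md5Hex(String metadata) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the metadata are hashed as UTF-8 encoded text.
            byte[] digest = md.digest(metadata.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Builds an identifier of the form namespace:checksum. */
    static String toPid(String metadata) {
        return NAMESPACE + ":" + md5Hex(metadata);
    }

    public static void main(String[] args) {
        String metadata = "Title=Example image; Creator=Example provider";
        System.out.println(toPid(metadata));
    }
}
```

Because the identifier depends only on the metadata content, identical metadata always yield the same identifier, which is what allows the search tool to recognise objects that were already ingested.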
Workflow
- Metadata providers insert metadata into the K2N MediaWiki, using a template;
- The metadata are pulled from MediaWiki into a cache database;
- The ingest tool is sensitive to new metadata: it ingests them and creates new Fedora digital objects;
- The GSearch indexing service is triggered via Fedora’s messaging service and adds the new digital objects to its indexes;
- Any messages for providers can be shown on a page with the following path: Provider/Collection/Metadata_Aggregation_Report;
- The search tool, based on the GSearch engine, is able to retrieve new digital objects after every new ingest.
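Two steps of this workflow can be sketched in a few lines of Java: selecting the cached metadata that are new since the last ingest, and building the title of the report page for a provider and collection. This is only a sketch under assumptions. The page does not describe the cache schema or how new metadata are detected, so the `CachedRecord` class, its fields, and the use of a modification timestamp are all hypothetical; only the Provider/Collection/Metadata_Aggregation_Report path pattern comes from the text above.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IngestWorkflowSketch {
    /** Hypothetical shape of one row in the cache database. */
    static final class CachedRecord {
        final String provider;
        final String collection;
        final String metadata;
        final Instant modified;

        CachedRecord(String provider, String collection, String metadata, Instant modified) {
            this.provider = provider;
            this.collection = collection;
            this.metadata = metadata;
            this.modified = modified;
        }
    }

    /** Returns the records modified strictly after the last ingest (assumed criterion). */
    static List<CachedRecord> newSince(List<CachedRecord> cache, Instant lastIngest) {
        List<CachedRecord> fresh = new ArrayList<>();
        for (CachedRecord record : cache) {
            if (record.modified.isAfter(lastIngest)) {
                fresh.add(record);
            }
        }
        return fresh;
    }

    /** Builds the report page title following the path pattern given in the workflow. */
    static String reportPage(String provider, String collection) {
        return provider + "/" + collection + "/Metadata_Aggregation_Report";
    }

    public static void main(String[] args) {
        List<CachedRecord> cache = Arrays.asList(
            new CachedRecord("ProviderA", "Collection1", "old metadata",
                Instant.parse("2009-09-01T00:00:00Z")),
            new CachedRecord("ProviderA", "Collection1", "new metadata",
                Instant.parse("2009-09-20T00:00:00Z")));
        Instant lastIngest = Instant.parse("2009-09-10T00:00:00Z");

        // Only the second record is newer than the last ingest.
        for (CachedRecord record : newSince(cache, lastIngest)) {
            System.out.println(record.metadata + " -> "
                + reportPage(record.provider, record.collection));
        }
    }
}
```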