Implementation details of Java-based Fedora Ingestion code

From Biowikifarm Metawiki
Revision as of 17:44, 20 September 2009 by LiaVeja

Implementation Phase

  • A harvest tool that periodically polls the MediaWiki address for metadata published online since the last ingest;
  • A parsing tool that downloads the new metadata and performs a simple syntactic analysis in order to generate a FOXML file that is as correct as possible;
  • An identification tool that derives the object identifier from an MD5 checksum calculated over all metadata prepared for ingestion;
  • A search tool that looks for already ingested objects. It is based on SOAP clients (Fedora API-A, API-M and the GSearch SOAP client);
  • A preparation tool that prepares FOXML files for ingest and establishes the relationships between digital objects;
  • An ingest tool;
  • A MediaWiki editing tool that writes messages for metadata providers. Its output consists of MediaWiki pages linked to the existing provider/collection pages.
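
The identification step listed above can be illustrated with a minimal Java sketch. It is not the actual ingestion code: the class name, the method names and the "k2n" namespace are assumptions made for illustration, and the metadata is assumed to be already serialized as a single string. Only the use of an MD5 checksum over the metadata comes from this page.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives an object identifier from the metadata text.
final class ObjectIdentifier {

    // Assumed PID namespace; the namespace really used is not stated on this page.
    static final String NAMESPACE = "k2n";

    // Returns the MD5 checksum of the metadata as 32 lowercase hex digits.
    static String md5Hex(String metadata) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(metadata.getBytes(StandardCharsets.UTF_8));
            // %032x pads with zeros so that leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Builds an identifier in the Fedora "namespace:id" form.
    static String pidFor(String metadata) {
        return NAMESPACE + ":" + md5Hex(metadata);
    }

    public static void main(String[] args) {
        System.out.println(pidFor("abc"));
    }
}
```

Because the identifier depends only on the metadata content, submitting identical metadata twice yields the same identifier, which is what allows the search tool to recognise already ingested objects.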

Workflow

  • Metadata providers insert metadata into the K2N MediaWiki using a template;
  • The metadata are pulled from MediaWiki into a cache database;
  • The ingest tool is "new metadata sensitive": it ingests the new metadata and creates new Fedora digital objects;
  • The GSearch indexing service is triggered via Fedora’s messaging service and adds the new digital objects to its indexes;
  • If there are any messages for providers, they can be shown on a page with the following path: Provider/Collection/Metadata_Aggregation_Report;
  • The search tool, based on the GSearch engine, can retrieve the new digital objects after every ingest.
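
Two steps of the workflow above can be sketched in Java: selecting only the cached metadata that has not been ingested yet, and building the path of the report page for a provider. This is a simplified illustration under stated assumptions. The class and method names are hypothetical, the cache database is modelled as an in-memory map, and the set of ingested identifiers stands in for the result of a repository search; no real Fedora or GSearch calls are made.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "new metadata sensitive" selection step.
final class IngestWorkflowSketch {

    // Returns the identifiers present in the cache but not yet in the
    // repository, in the iteration order of the cache map.
    static List<String> selectNew(Map<String, String> cachedMetadataById,
                                  Set<String> ingestedIds) {
        List<String> toIngest = new ArrayList<>();
        for (String id : cachedMetadataById.keySet()) {
            if (!ingestedIds.contains(id)) {
                toIngest.add(id);
            }
        }
        return toIngest;
    }

    // Builds the report page path Provider/Collection/Metadata_Aggregation_Report.
    // Spaces become underscores, as in MediaWiki page titles.
    static String reportPage(String provider, String collection) {
        return provider.replace(' ', '_') + "/"
                + collection.replace(' ', '_')
                + "/Metadata_Aggregation_Report";
    }

    public static void main(String[] args) {
        Map<String, String> cache = new LinkedHashMap<>();
        cache.put("k2n:a", "<metadata/>");
        cache.put("k2n:b", "<metadata/>");
        System.out.println(selectNew(cache, java.util.Collections.singleton("k2n:a")));
        System.out.println(reportPage("Some Provider", "Some Collection"));
    }
}
```

In a real deployment the ingested identifiers would come from the search tool rather than from a set held in memory.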