Preliminary test results for Fedora Ingestion Service
From Biowikifarm Metawiki
First Test results
- Notice regarding the GSearch service: during the tests we noticed that if the Ingest Service stops for any reason (RPC errors, Internet connection problems, cancellation of the task by the operator), GSearch deletes the index file and afterwards indexes only the objects that are successfully ingested after the incident.
- Proposed solution: handle the Fedora Commons messaging service in these cases so that a notification is sent to the repository administrator or, better, for massive ingests, stop GSearch indexing before the ingest and start it again afterwards in a separate background thread.
- Manual indexing took 1,462,351 milliseconds (about 24 minutes) for 212,713 objects. It was needed because the index file had been deleted after the workflow was cancelled from the console.
- Manual indexing took 1,920,544 milliseconds (about 32 minutes) for 258,493 objects after a massive ingest, with the messaging service disabled.
- Memory used by Ingestion Service application:
- The Ingestion Service uses at most 320 MB of memory and 50% of a 1.8 GHz processor, because it relies on the Xalan-Java 2.7.1 XSLT processor.
- The peak memory use occurs during preparation for the ingest; the effective ingest uses only 60 MB of memory.
- Test results: testing the Ingest Service on the largest collection, Vascular plants (UNITS), gave the following results:
- Preparing for ingest takes 52 minutes. Of this total, file validation and splitting of multi-valued items take 35 minutes for 49,851 objects.
- The effective ingest ran from 12:50:00 to 17:45:00 (4 hours 55 minutes), about 10,000 objects/hour.
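The rates implied by the figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and it assumes that the 49,851 objects counted during preparation are the same objects that were ingested between 12:50:00 and 17:45:00, which the results do not state explicitly.

```python
from datetime import datetime


def objects_per_second(objects: int, elapsed_ms: int) -> float:
    """Average rate for a run whose duration is given in milliseconds."""
    return objects * 1000 / elapsed_ms


def objects_per_hour(objects: int, start: str, end: str) -> float:
    """Average rate for a run given by 24-hour start/end times on the same day."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return objects / (elapsed.total_seconds() / 3600)


# Manual GSearch indexing runs reported above.
print(round(objects_per_second(212_713, 1_462_351), 1), "objects/s")
print(round(objects_per_second(258_493, 1_920_544), 1), "objects/s")

# Effective ingest; the object count is an assumption (taken from the
# preparation step for the same collection).
print(round(objects_per_hour(49_851, "12:50:00", "17:45:00")), "objects/hour")
```

Under that assumption the ingest rate comes out slightly above 10,000 objects per hour, which agrees with the figure reported above, and both manual indexing runs fall in the range of roughly 130 to 150 objects per second.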
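The proposed solution for massive ingests (stop GSearch indexing, run the ingest, then start indexing again in a separate background thread) could be organised roughly as below. This is a hedged sketch, not the Ingestion Service's actual code: `pause_indexing`, `ingest_one` and `rebuild_index` are hypothetical callables standing in for whatever mechanism the repository uses, and no real Fedora Commons or GSearch API is shown. Starting the reindex even when the ingest is interrupted is a design assumption of this sketch.

```python
import threading
from typing import Callable, Iterable, List, Tuple


def bulk_ingest(
    objects: Iterable[str],
    ingest_one: Callable[[str], None],      # hypothetical: ingests one object
    pause_indexing: Callable[[], None],     # hypothetical: stops index updates
    rebuild_index: Callable[[], None],      # hypothetical: full reindex
) -> Tuple[List[str], threading.Thread]:
    """Ingest objects with indexing paused, then reindex in the background."""
    pause_indexing()
    ingested: List[str] = []
    # Non-daemon thread, so the reindex can finish even if the caller returns.
    worker = threading.Thread(target=rebuild_index, name="reindex")
    try:
        for obj in objects:
            ingest_one(obj)
            ingested.append(obj)
    finally:
        # Assumption: reindex also after an interrupted ingest, so the index
        # covers everything ingested so far rather than only later objects.
        worker.start()
    return ingested, worker


# Usage with stand-in callables that only record the order of events.
events: list = []
done, reindex_thread = bulk_ingest(
    ["obj:1", "obj:2"],
    ingest_one=lambda o: events.append(("ingest", o)),
    pause_indexing=lambda: events.append("pause"),
    rebuild_index=lambda: events.append("reindex"),
)
reindex_thread.join()  # wait only if the caller needs the finished index
print(events)
```

The caller receives the thread object so that it can either continue immediately or wait for the reindex to complete before reporting success to the repository administrator.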