FEDORA Installation

From Biowikifarm Metawiki
Revision as of 13:54, 19 January 2010 by GiselaWeber (Talk | contribs)

Note: additional details on earlier installations and modifications were last available here.

Fedora 3.1

Installation of Fedora under Debian Linux 4.0: this is the first production installation, using Debian's primary Tomcat 5.5 servlet container and the MySQL database. The installation follows the Fedora Installation and Configuration Guide (http://fedora-commons.org/confluence/display/FCR30/Installation+and+Configuration+Guide); all details are given below.

Update: Currently the Tomcat 5.5 on the Debian server is not working, so this installation is also using the Tomcat included in Fedora.

Fedora 3.1 was installed with MySQL rather than the included McKoi database, which may give better performance. The installation guide states that the built-in McKoi database should not be used for any production repository. During installation, the following values were entered:

database = fedora31
defaultcharacterset = utf8
defaultcollation = utf8_bin
user = k2nFedora
JDBC URL = (default)
driverClass = (default)
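
The values above correspond roughly to the MySQL statements printed by the following sketch. It is an illustration, not a record of the commands actually run for this installation: the function name fedora_db_sql, the host 'localhost' and the password argument are assumptions, and the GRANT ... IDENTIFIED BY form is the MySQL 5.x syntax.

```shell
#!/bin/sh
# Print the SQL for creating a Fedora database and granting a user access to it.
# Nothing is sent to MySQL here; the statements are only written to stdout.
# Arguments: database name, MySQL user name, password (all supplied by the caller).
fedora_db_sql() {
    db=$1
    user=$2
    password=$3
    cat <<EOF
CREATE DATABASE $db DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_bin;
GRANT ALL ON $db.* TO '$user'@'localhost' IDENTIFIED BY '$password';
EOF
}
```

To apply the statements, the output could be piped to the client, for example fedora_db_sql fedora31 k2nFedora 'your-password' | mysql -u root -p.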

The Fedora installation can be accessed at http://160.45.63.55:8183/fedora/search and http://160.45.63.55:8183/fedora/risearch. The risearch interface now also supports SPARQL as a query language.

Update: Fedora is now running on the virtual host and can be reached at http://fedora.keytonature.net/fedora/ . All of Fedora's services run under this address.

The Fedora object data were placed in /var/lib/fedora/data/objects, with a symbolic link from the original directory $FEDORA_HOME/data/objects. In $FEDORA_HOME/server/config/fedora.fcfg the parameter <param name="object_store_base" value="data/objects"> was changed accordingly. In $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/repository/BasicRepos/repository.properties the property fgsrepository.fedoraObjectDir was also changed accordingly.
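
To confirm that a properties file really carries the new value, the setting can be read back from the command line. The following is a minimal sketch; the function name get_property is hypothetical, and it assumes the simple "key = value" layout with # comments rather than the full Java properties syntax (no line continuations or escapes).

```shell
#!/bin/sh
# Print the value of KEY from a simple "key = value" properties FILE.
# Only the first uncommented match is printed; nothing is printed if KEY is absent.
get_property() {
    file=$1
    key=$2
    awk -v k="$key" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next                   # not a key = value line
            name = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the key
            if (name == k) {
                value = substr($0, pos + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }' "$file"
}
```

For example, get_property repository.properties fgsrepository.fedoraObjectDir should print the object directory configured for GSearch.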

To start Fedora on Linux, run

./startup.sh

in $FEDORA_HOME/tomcat/bin.

To make Fedora start when the machine boots, a script called start_fedora was added to /etc/init.d:

#!/bin/sh
# /etc/init.d/start_fedora -- startup script for the fedora tomcat
#
# Written by Gisela Weber 2008-12-15
FEDORA_HOME=/usr/share/fedora
JAVA_HOME=/usr/lib/jvm/java-6-sun
JRE_HOME=/usr/lib/jvm/java-6-sun/jre
# export the variables so that the Tomcat scripts started below can see them
export FEDORA_HOME JAVA_HOME JRE_HOME
case "$1" in
  start)
	echo "Starting fedora 3.1"
	"$FEDORA_HOME/tomcat/bin/startup.sh"
	;;
  stop)
	echo "Stopping fedora 3.1"
	"$FEDORA_HOME/tomcat/bin/shutdown.sh"
	;;
  *)
	# any other argument: print the supported actions and fail
	echo "Usage: $0 {start|stop}" >&2
	exit 1
	;;
esac

and update-rc.d start_fedora defaults was run (see [1]).

To allow several Fedora installations to run at the same time if necessary (some with the built-in Tomcat and/or one using the existing Tomcat), edit .../tomcat/bin/catalina.sh and add

#set CATALINA_HOME for this fedora instance
export CATALINA_HOME=/path/to/tomcat
#set FEDORA_HOME for this fedora instance
export FEDORA_HOME=/path/to/fedora

for each Fedora/Tomcat pair, using the correct paths for your installation. However, this currently does not work for GSearch.


GSearch

GSearch was installed on this Fedora installation in the same way as for Fedora 3.0, following the documentation included in the genericsearch-2.1.1.zip archive and FEDORA Evaluation#Gsearch Installation. It runs under http://160.45.63.55:8183/fedoragsearch/rest.

Update: GSearch is running at http://fedora.keytonature.net/fedoragsearch/services/FgsOperations

As with Fedora 3.0, the /BasicIndex directory was empty and an error message reported that files segment* were not found in directory .../gsearch/BasicIndex. These files were then copied from the previous installation. The directory ../fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes contains several subdirectories (config, configBasic and several others with identical structure) to support different types of index. It was not clear whether the changes to stylesheets such as basicFoxmlToLucene.xslt (see FEDORA Evaluation#Gsearch Installation) should be made in config or configBasic, so at first the stylesheet was changed in both directories. However, it appears that the indexing is changed by editing /config/index/BasicIndex/basicFoxmlToLucene.xslt, and the design (e.g. the title of the user interface page) by editing /configBasic/rest/basiccommon.xslt.

 <!-- Part of .../config/index/BasicIndex/basicFoxmlToLucene.xslt -->
 <xsl:for-each select="foxml:datastream/foxml:datastreamVersion[last()]/foxml:xmlContent/oai_dc:dc/*">
   <xsl:choose>
     <xsl:when test="name()='dc:language'">
       <IndexField index="UN_TOKENIZED" store="YES" termVector="NO">
         <xsl:attribute name="IFname">
           <xsl:value-of select="concat('dc.', substring-after(name(),':'))"/>
         </xsl:attribute>
         <xsl:value-of select="text()"/>
       </IndexField>
     </xsl:when>
     <xsl:otherwise>
       <IndexField index="TOKENIZED" store="YES" termVector="YES">
         <xsl:attribute name="IFname">
           <xsl:value-of select="concat('dc.', substring-after(name(),':'))"/>
         </xsl:attribute>
         <xsl:value-of select="text()"/>
       </IndexField>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:for-each>

 <!-- RELS-EXT -->
 <xsl:for-each select="foxml:datastream/foxml:datastreamVersion[last()]/foxml:xmlContent/rdf:RDF/rdf:Description/*">
   <xsl:choose>
     <xsl:when test="name()='fedora:isMemberOf' or name()='fedora:serviceProvidedBy'">
       <IndexField index="TOKENIZED" store="YES" termVector="YES">
         <xsl:attribute name="IFname">
           <xsl:value-of select="concat('k2nrelation.', substring-after(name(),':'))"/>
         </xsl:attribute>
         <xsl:value-of select="@rdf:resource"/>
       </IndexField>
     </xsl:when>
     <xsl:when test="name()='k2n:Country_Codes' or name()='k2n:Metadata_Language'">
       <IndexField index="UN_TOKENIZED" store="YES" termVector="NO">
         <xsl:attribute name="IFname">
           <xsl:value-of select="concat('k2n.', substring-after(name(),':'))"/>
         </xsl:attribute>
         <xsl:value-of select="text()"/>
       </IndexField>
     </xsl:when>
     <xsl:otherwise>
       <IndexField index="TOKENIZED" store="YES" termVector="YES">
         <xsl:attribute name="IFname">
           <xsl:value-of select="concat('k2n.', substring-after(name(),':'))"/>
         </xsl:attribute>
         <xsl:value-of select="text()"/>
       </IndexField>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:for-each>

The fields "dc:language", "k2n:Country_Codes" and "k2n:Metadata_Language" need to be UN_TOKENIZED, because otherwise Lucene's StandardAnalyzer filters out "stop words" such as "it", "or" and "the". Thus, "it" for Italy or Italian would not be found. However, without the StandardAnalyzer the search is case-sensitive, so 'toLowerCase()' has to be applied to the values of these fields at ingestion.
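
Since the untokenized fields are matched exactly, the same lowercasing has to be applied to any value that is compared against them, for example when metadata is prepared outside the ingestion code. The following sketch shows the equivalent step in shell; the function name normalize_code is hypothetical and the snippet is not part of the ingestion code described here.

```shell
#!/bin/sh
# Lowercase a field value (e.g. a language or country code) so that it
# matches the form stored in the UN_TOKENIZED index fields.
normalize_code() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}
```

With this, normalize_code IT and normalize_code it both yield the same value, so a query term and the indexed value agree regardless of how the code was typed.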

To update the GSearch index from the user interface, first click updateIndex on http://fedora.keytonature.net/fedoragsearch/rest. The following options are available:

* createEmpty - creating or emptying the index. For a new index, you have to run createEmpty once,
  before you can run the other actions.
* fromFoxmlFiles ( filePath ) - indexing FOXML records; filePath may be null, in which case the configured
  Fedora Object Directory is used, so that the whole of the Fedora registry is indexed.
* fromPid ( PID ) - indexing one FOXML record, as exported by Fedora API-M; in case a previous index document
  with the same PID exists, it is first deleted. This is the incremental update operation
  that should be called after every Fedora API-M operation that modifies a Fedora object.
* deletePid ( PID ) - deleting one index document.
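
The same actions can presumably be triggered without the browser by requesting the REST address directly. The sketch below only builds the request URL. The query parameter names operation, action and value are assumptions about the GSearch REST interface and should be checked against the GSearch documentation; the function name gsearch_update_url is hypothetical.

```shell
#!/bin/sh
# Build the URL for one updateIndex action of the GSearch REST interface.
# Arguments: base URL of the REST servlet, action name (one of the options
# listed above), and an optional value (the filePath or PID argument).
# The value is appended as given; it is not URL-encoded.
gsearch_update_url() {
    base=$1
    action=$2
    value=${3:-}
    url="$base?operation=updateIndex&action=$action"
    if [ -n "$value" ]; then
        url="$url&value=$value"
    fi
    printf '%s\n' "$url"
}
```

The resulting URL could then be passed to a client such as curl, for example curl -u USER:PASSWORD "$(gsearch_update_url http://fedora.keytonature.net/fedoragsearch/rest fromPid SOME_PID)", where USER, PASSWORD and SOME_PID are placeholders.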

To set access restrictions, add the following to .../tomcat/webapps/fedoragsearch/WEB-INF/web.xml:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>AdminResources</web-resource-name>
    <url-pattern>/rest/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>fedorausers</role-name>
  </auth-constraint>
</security-constraint>
<security-role>
  <role-name>fedorausers</role-name>
</security-role>

Then add in .../tomcat/conf/tomcat-users.xml the role fedorausers and a user and password with that role.
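
After restarting Tomcat, it is worth checking that the restriction is actually active. The following sketch sends an anonymous request and treats a 401 or 403 answer as "protected"; which of the two is returned depends on how authentication is configured, so both are accepted. The function names http_status and is_protected are hypothetical, and curl is assumed to be installed.

```shell
#!/bin/sh
# Print the HTTP status code returned for URL when no credentials are sent.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed only if an anonymous request to URL is refused (401 or 403).
is_protected() {
    case "$(http_status "$1")" in
        401|403) return 0 ;;
        *)       return 1 ;;
    esac
}
```

For example, is_protected http://fedora.keytonature.net/fedoragsearch/rest/ && echo "restricted" prints the message only if the anonymous request was refused.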

To place the index data in /var/cache/fedora, the property fgsindex.indexDir in $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/configBasic/index/BasicIndex/index.properties was set to /var/cache/fedora (after creating that directory) and Fedora was restarted, but this had no effect. After changing the same property in .../classes/config/index/BasicIndex/index.properties and restarting Fedora, the same error message appeared as when GSearch was first installed: files segment* were not found in directory /var/cache/fedora. After running updateIndex createEmpty and updateIndex fromFoxmlFiles, the segments files and a .cfs file had been created in /var/cache/fedora, even without first copying files into the directory. To keep the directory structure that GSearch uses originally, the property was then changed to /var/cache/fedora/gSearch/BasicIndex. A symbolic link was created from the original $FEDORA_HOME/gSearch/BasicIndex to /var/cache/fedora/gSearch/BasicIndex.

Symbolic links to prevent base partition overflow

Note: Symbolic links must point from $FEDORA_HOME folders to other locations in the Debian file system: persistent data to /var/lib/fedora/, temporary logs and rebuildable cache data to /mnt/dump/var/cache/fedora/. These changes have to be repeated with every update to a new Fedora version!

In particular, the following links need to be checked:

/usr/share/fedora-3.1/gsearch/
/usr/share/fedora-3.1/data/resourceIndex/

to

 /mnt/dump/var/cache/fedora/gsearch
 /mnt/dump/var/cache/fedora/data/resourceIndex

and

/usr/share/fedora-3.1/data/objects

to

/var/lib/fedora/data/objects

Also the changes in fedora.fcfg and index.properties have to be repeated manually after updating to a new Fedora version.

Some $FEDORA_HOME/tomcat folders also have to be linked, and this must be repeated manually after every update to a new Fedora version:

/usr/share/fedora-3.1/tomcat/logs

to

/mnt/dump/var/log/fedora-tomcat

and

/usr/share/fedora-3.1/tomcat/work

to

/mnt/dump/var/cache/fedora-tomcat
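
Because these links have to be re-created after every update, a small check can help to spot a missing or wrong link. The following is a minimal sketch; the function name check_link is hypothetical, and it compares the link target literally as stored, without resolving relative paths. It relies on the readlink utility.

```shell
#!/bin/sh
# Succeed if LINK is a symbolic link whose stored target is exactly TARGET.
# Problems are reported on stderr, a correct link on stdout.
check_link() {
    link=$1
    target=$2
    if [ ! -L "$link" ]; then
        echo "NOT A SYMLINK: $link" >&2
        return 1
    fi
    actual=$(readlink "$link")
    if [ "$actual" != "$target" ]; then
        echo "WRONG TARGET: $link -> $actual (expected $target)" >&2
        return 1
    fi
    echo "ok: $link -> $target"
}

# Example calls for two of the links listed above:
#   check_link /usr/share/fedora-3.1/data/objects /var/lib/fedora/data/objects
#   check_link /usr/share/fedora-3.1/tomcat/logs  /mnt/dump/var/log/fedora-tomcat
```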


Example of linking, should the data be in /usr/share/ instead of /mnt/dump/var/cache/

# clear cache folder:
rm -r /mnt/dump/var/cache/fedora/data-resourceIndex/*
# move existing data:
mv /usr/share/fedora-3.1/data/resourceIndex/* /mnt/dump/var/cache/fedora/data-resourceIndex
# remove the now empty original directory:
rmdir /usr/share/fedora-3.1/data/resourceIndex
cd /usr/share/fedora-3.1/data
# create softlink (-T: always treat the link name as a normal file)
ln -s -T /mnt/dump/var/cache/fedora/data-resourceIndex resourceIndex
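
The same steps can be wrapped in a reusable function for the other directories. The sketch below is a variant of the example above, not the procedure actually used: instead of clearing an existing target folder it refuses to overwrite one, and it moves the whole directory rather than its contents. The function name relocate_dir is hypothetical. Fedora should be stopped before any data directory is moved.

```shell
#!/bin/sh
# Move directory SRC to DST and leave a symbolic link at SRC pointing to DST.
# Does nothing if SRC is already a symbolic link; fails if DST already exists.
relocate_dir() {
    src=$1
    dst=$2
    if [ -L "$src" ]; then
        echo "already linked: $src"
        return 0
    fi
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi
    # create the parent of DST, move the data, then link the old path to it
    mkdir -p "$(dirname "$dst")" &&
    mv "$src" "$dst" &&
    ln -s "$dst" "$src"
}
```

Running the function a second time on the same pair of paths is harmless, because the first test detects the existing link.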

Paths

The application is installed in /usr/share/fedora-3.1 (with a softlink set from /usr/share/fedora for version-independent linking).

The persistent data should be placed in /var/lib/fedora (the MySQL database will automatically be placed in /var/lib/mysql).

Generic Search Service (GSearch) 2.1.1

TODO

Notes

(Andrei Homodi, UTC-N) Because the search results were not sorted properly by result score, I've modified the index.properties files, where available, by commenting out:

#fgsindex.defaultSortFields = PID,AUTO,true
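
Since several index.properties files exist and the change has to be repeated after updates, the edit could be scripted. The following is a sketch only, not the way the files were actually changed; the function name comment_out_property is hypothetical. It prefixes every line that starts with the given key with "#", so it also affects keys that merely begin with the same text, and lines that are already commented out are left alone.

```shell
#!/bin/sh
# Comment out every line of FILE that starts with KEY by prefixing it with "#".
# The file is rewritten in place via a scratch file.
comment_out_property() {
    cop_file=$1
    cop_key=$2
    cop_scratch=$(mktemp) || return 1
    awk -v k="$cop_key" '
        index($0, k) == 1 { print "#" $0; next }   # line starts with the key
        { print }                                  # keep all other lines
    ' "$cop_file" > "$cop_scratch" && cat "$cop_scratch" > "$cop_file"
    rm -f "$cop_scratch"
}
```

For example, comment_out_property index.properties fgsindex.defaultSortFields would disable the default sort fields in that file.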