Extension:Template Parameter Index

From Biowikifarm Metawiki
Revision as of 12:03, 4 July 2018 by Andreas Plank (Talk | contribs) (Parsed Template Paremeter: DO_PARSE_TPL2PAR_PREG -> PREG_DO_PARSE_PAR_OF_TPL)

Template Parameter Index is a MediaWiki extension that harvests the parameters of any template used on wiki pages into an index (cache) table, much as MediaWiki itself tracks categories, links and other relations. The extension uses the MediaWiki XML API and can harvest other wikis in addition to its own data source. External wikis are defined on a configuration page that is linked from Special:TemplateParameterIndex. On the local wiki, page updates automatically update the index.


Installation

In your $IP/extensions folder create a TemplateParameterIndex folder, and decompress the files to this folder. Activate the extension by adding to LocalSettings.php:

include_once("$IP/extensions/TemplateParameterIndex/TemplateParameterIndex.php");

Then run the update maintenance script, which should add the tables from template_parameter_usage.sql and template_parameter_jobs.sql to your wiki database. If it does not, execute the CREATE TABLE statements from these files manually in MySQL. Run the maintenance script as follows:

sudo -u www-data php ./maintenance/update.php --dbuser wikiadmin --conf ./LocalSettings.php

If external wikis are to be harvested, create a page "MediaWiki:TemplateParameterIndex/configurations" (this is the default name; it can be changed in LocalSettings.php) and enter the parameters there in the form: * parametername = parametervalue.
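
The "* parametername = parametervalue" form can be read with a few lines of code. The following is a minimal sketch in Python, not the extension's own parser: the function name, the parameter names and the example URL are hypothetical, and it assumes that each parameter sits on its own bullet line and that lines in any other form are ignored.

```python
import re

# One "* name = value" bullet per line; the name ends at the first "=".
CONFIG_LINE = re.compile(r"^\*\s*(?P<name>[^=]+?)\s*=\s*(?P<value>.*?)\s*$")


def parse_configurations(wikitext):
    """Collect (name, value) pairs in page order; a name may occur more than once."""
    pairs = []
    for line in wikitext.splitlines():
        match = CONFIG_LINE.match(line.strip())
        if match:
            pairs.append((match["name"], match["value"]))
    return pairs


# Hypothetical page content; these parameter names are placeholders only.
page = """
Some introductory text that is not a parameter.
* exampleparameter = example value
* anotherparameter = http://wiki.example.org/w/
"""
print(parse_configurations(page))
```

Returning a list of pairs rather than a dictionary keeps repeated names, which may matter if several wikis are listed under the same parameter name.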

$wgGroupPermissions['sysop']['templateparameterindex'] = true; 
$wgGroupPermissions['user']['templateparameteranalyzer'] = true; 
$wgGroupPermissions['user']['templateparameterexport'] = true;

The three $wgGroupPermissions settings above are the default group permissions. They can be overridden in LocalSettings.php.


Former lines in /etc/crontab

### K2N Ingest application
# 3 0 * * * root wget -q -O /dev/null "http://biowikifarm.net/testwiki/index.php?title=Special:TemplateParameterIndex&action=send&do=insertAll&mode=update&since=1"
# execute command on wiki to harvest template updates of the last day from testwiki (G.Weber)
# 13 0 * * * root wget -q -O /dev/null "http://biowikifarm.net/metawiki/index.php?title=Special:TemplateParameterIndex&action=send&do=insertAll&mode=update&since=1"
# execute command on wiki to harvest template updates of the last day from metawiki (G.Weber)
# 23 0 * * * root wget -q -O /dev/null "http://biowikifarm.net/testwiki2/index.php?title=Special:TemplateParameterIndex&action=send&do=insertAll&mode=update&since=1"
# execute command on wiki to harvest template updates of the last day from testwiki2 (G.Weber)
# 35 * * * * root wget -q -O /dev/null "http://biowikifarm.net/testwiki/index.php?title=Special:TemplateParameterIndex&action=send&do=insertAll&mode=resume"
# execute command on wiki to resume initializing the template table for testwiki (G.Weber)
45 * * * * root wget -q -O /dev/null "http://offene-naturfuehrer.de/w/index.php?title=Special:TemplateParameterIndex&action=send&do=insertAll&mode=resume"
# execute command on wiki to resume initializing the template table for offene-naturfuehrer wiki (G.Weber)
# 55 * * * * root wget -q -O /dev/null "http://biowikifarm.net/metawiki/index.php?title=Special:TemplateParameterIndex&action=send&do=insertAll&mode=resume"
# execute command on wiki to resume initializing the template table for metawiki (G.Weber)
# 5 * * * * root wget -q -O /dev/null "http://biowikifarm.net/testwiki2/index.php?title=Special:TemplateParameterIndex&action=send&do=insertAll&mode=resume"
# execute command on wiki to resume initializing the template table for testwiki2 (G.Weber)

12 2-9 * * * root sh /usr/share/FedoraIngestEngine/FedoraIngestEngine.sh & # Ingest Application
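
The harvest URLs in the crontab entries above all share the same query parameters and differ only in the wiki and the mode. A minimal sketch in Python of how such a URL is assembled follows; the function name is hypothetical, and the reading of since=1 as "the last day" is taken from the crontab comments, not from the extension's code.

```python
from urllib.parse import urlencode


def harvest_url(index_php, mode, since=None):
    """Build a harvest URL like those in the crontab (mode: "update" or "resume")."""
    query = {
        "title": "Special:TemplateParameterIndex",
        "action": "send",
        "do": "insertAll",
        "mode": mode,
    }
    # "since" only appears together with mode=update in the crontab entries.
    if since is not None:
        query["since"] = since
    # safe=":" leaves the colon in the page title unescaped, as in the crontab.
    return index_php + "?" + urlencode(query, safe=":")


print(harvest_url("http://biowikifarm.net/metawiki/index.php", "update", since=1))
print(harvest_url("http://offene-naturfuehrer.de/w/index.php", "resume"))
```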

Documentation

The harvested information is exposed through an analysis function on the special page and through an XML interface. The latter is used to ingest information into a FEDORA repository (see Definition_specs_for_Fedora_Ingestion_Service#Interaction_between_Ingest_and_Wiki_TemplateParameterIndex).

The XML export is accessible via the user interface or via URL.

Current Main menu of user interface:

   * Manage Index
         o Clear Index     [= deletes all data from the data table]
         o Update Index (Update for defined time period) [= queries api.php for recentchanges in articles, newly uploaded 
           files and file overrides in the given period, passes the found pages to harvest]
         o Rebuild Entire Index (Clear index and restart) [= deletes all data from the data table, queries api.php for 
           allpages, passes the found pages to harvest]
         o Rebuild Indices Selectively By Source (not yet implemented)
         o Update Page [= user chooses wiki via BaseUrl, user enters page name, both are passed to harvest]
         o Configure TemplateParameterIndex [= link to configuration page on which parameters are defined e.g. the URLs of the
           Wikis that are to be harvested]
   * Analyze Index
         o Show Templates - pages, parameters [shows all indexed templates, for each template the pages which use it, 
           and all parameters which it uses, for the parameters the pages and values]
   * Export Index
         o Export Index by Recent Changes and Parameters [= queries the data table with parameters entered by user, 
           exports result as XML]
         

Currently 3 special pages are created:

  • Special:TemplateParameterIndex, which displays the complete menu with all functions
  • Special:TemplateParameterAnalyzer, which offers only the Analyze Index functions
  • Special:TemplateParameterExport, which offers XML export either via the user interface or via URL (http://.../index.php?title=Spezial:TemplateParameterExport&action=submit&do=export) with the following parameters:
    • selectwiki - article path of the selected wiki; the user interface offers all available wikis in a select box; if not set, all are returned
    • template - one template name, by default all are returned
    • pagename - one wiki page, by default all are returned
    • from - start date
    • to - end date
    • parname - name of a parameter to restrict the result by; currently only one is possible
    • parvalue - value of that parameter for the desired restriction
    • returnpar - list of parameters that are to be returned, separated by ";"
    • outxml - set to "old" to get output XML with elements not nested; the default is nested
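
The export parameters listed above are passed as ordinary query-string arguments. A minimal sketch in Python of building such a URL follows; the function name and the wiki address are hypothetical, and the page title may need the localized namespace on non-English wikis (the URL above uses "Spezial:"). The date format expected by "from" and "to" is not documented here, so the example leaves them out.

```python
from urllib.parse import urlencode


def build_export_url(index_php, filters=None, title="Special:TemplateParameterExport"):
    """Build an export URL; filters whose value is None are left out."""
    query = {"title": title, "action": "submit", "do": "export"}
    for name, value in (filters or {}).items():
        if value is None:
            continue
        # returnpar takes several parameter names separated by ";"
        if name == "returnpar" and not isinstance(value, str):
            value = ";".join(value)
        query[name] = value
    return index_php + "?" + urlencode(query)


url = build_export_url(
    "https://wiki.example.org/w/index.php",  # hypothetical wiki address
    {"template": "Key Start", "returnpar": ["title", "taxon_name"]},
)
print(url)
```

The filters are passed as a dictionary because "from" is a reserved word in Python and could not be used as a keyword argument.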

Parsed Template Parameters

By default, all parameters are stored as they are. The following parameters, however, are parsed and rendered to HTML so that complex formatting is stored in its rendered form. The PHP code snippet shows the regular expression pattern for each template:

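// Template name => case-insensitive pattern of the parameter names to be parsed and rendered to HTML
public static $PREG_DO_PARSE_PAR_OF_TPL = array(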
    "Key Start" => "@(?"
       . ":description"
       . "|parent_key_text|publicity"
       . "|remarks"
       . "|source|status"
       . "|taxon_name|title"
       . ")@i",             
    "Key_Start" => "@(?"
       . ":description"
       . "|parent_key_text|publicity"
       . "|remarks"
       . "|source|status"
       . "|taxon_name|title"
       . ")@i",             
    "Lead" => "@(?"
      . ":description"
      . "|caption_[a-z]"
      . "|imagesfooter"
      . "|remarks|result_?qualifier|result_?text"
      . "|scientific_name|subheading|synonyms"
      . "|unnamed2"
      . ")@i",
    "Switch" => "@(?"
      . ":caption_[1-9][0-9]?[a-z]?"
      . "|parent_key_text"
      . "|remarks|result_?qualifier|result_?text"
      . "|scientific_name_[1-9][0-9]?|subheading|synonyms_[1-9][0-9]?"
      . "|parent_taxon|taxon_name|title"
      . ")@i",
    "ImageSwitch" => "@(?"
      . ":lead_[1-9][0-9]?"
      . "|result_?text"
      . "|scientific_name_[1-9][0-9]?|synonyms_[1-9][0-9]?"
      . "|title"
      . ")@i",
    "Bildweiche" => "@(?"
      . ":Beschreibung"
      . "|Ergebnis_?Text"
      . "|Titel"
      . "|Wissenschaftlicher_name_[1-9][0-9]?|Synonyme_[1-9][0-9]?"
      . ")@i",
    "Decision S2" => "@(?"
      . ":subheading"
      . "|scientific_name_[1-8]|synonyms_[1-8]"
      . "|text"
      . ")@i",
    "Decision_S2" => "@(?"
      . ":subheading"
      . "|scientific_name_[1-8]|synonyms_[1-8]"
      . "|text"
      . ")@i",
    "Decision Horizontal" => "@(?"
      . ":caption"
      . "|description"
      . "|intro_text"
      . "|remarks|result_?qualifier|result_?text"
      . "|scientific_name_[1-8]|synonyms_[1-8]"
      . "|subheading"
      . ")@i",
    "Decision_Horizontal" => "@(?"
      . ":caption"
      . "|description"
      . "|intro_text"
      . "|remarks|result_?qualifier|result_?text"
      . "|scientific_name_[1-8]|synonyms_[1-8]"
      . "|subheading"
      . ")@i",
    "Decision Horizontal2" => "@(?"
      . ":caption_\d[a-z]?"
      . "|description"
      . "|lead|lead_main"
      . "|remarks|result_?qualifier|result_?text"
      . "|scientific_name_[1-8]|synonyms_[1-8]"
      . "|subheading"
      . ")@i",
    "Decision_Horizontal2" => "@(?"
      . ":caption_\d[a-z]?"
      . "|description"
      . "|lead|lead_main"
      . "|remarks|result_?qualifier|result_?text"
      . "|scientific_name_[1-8]|synonyms_[1-8]"
      . "|subheading"
      . ")@i",
    "Decision" => "@(?"
      . ":caption_\d[a-z]?"
      . "|description"
      . "|imagesfooter"
      . "|lead|lead_main"
      . "|remarks|result_?qualifier|result_?text"
      . "|scientific_name_[1-3]|synonyms_[1-3]"
      . "|subheading"
      . ")@i"
  );
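
How such a pattern table might be applied can be sketched in Python. This is not the extension's PHP code: only two of the patterns above are copied, the function name should_parse is hypothetical, and it is an assumption that the extension tests the parameter name with an unanchored match (as preg_match would do with these patterns).

```python
import re

# Two patterns copied from the PHP array; the "@...@i" delimiters and flag
# become re.IGNORECASE here.
PARSE_PATTERNS = {
    "Key Start": re.compile(
        r"(?:description|parent_key_text|publicity|remarks"
        r"|source|status|taxon_name|title)",
        re.IGNORECASE,
    ),
    "Decision": re.compile(
        r"(?:caption_\d[a-z]?|description|imagesfooter|lead|lead_main"
        r"|remarks|result_?qualifier|result_?text"
        r"|scientific_name_[1-3]|synonyms_[1-3]|subheading)",
        re.IGNORECASE,
    ),
}


def should_parse(template, parameter):
    """Return True if this parameter of this template matches the parse pattern."""
    pattern = PARSE_PATTERNS.get(template)
    # Assumption: unanchored search, since the patterns contain no ^ or $.
    return pattern is not None and pattern.search(parameter) is not None


print(should_parse("Key Start", "Title"))
print(should_parse("Decision", "scientific_name_4"))
```

Because the patterns are not anchored, a search of this kind also matches names that merely contain one of the alternatives: in this sketch, "subtitle" matches the "title" alternative of "Key Start".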