MediaWiki XML page importing
Notes:
1. It is possible to create MediaWiki XML from Microsoft Access tables and queries. However, when pasting the result into a text editor, observe the following:
- Putting everything into one calculated field often fails, because problems occur when a calculated field exceeds a certain size.
- Exporting in multiple columns may work better. The following then needs to be fixed in the pasted text:
  - remove the first line, which contains the field names
  - remove the tab characters
  - fix the double-quote escaping, which affects both XML attributes (e.g. xml:space="preserve") and the element content: replace '"<text' with '<text' and '</page>"' with '</page>' (the same fix for '"<page>' and '</comment>"' is normally not necessary), then replace '""' with '"'.
- Normally, all revision elements of a page are placed inside a single page element. However, they can also be imported in separate page elements (one per revision, each repeating the page title), which greatly simplifies some imports.
- When importing through the web interface, an additional revision is created for each page, dated with the time of the import. In this case the sequence of the imports, rather than the revision dates, counts, because these additional revisions get the date and time of the import. Avoid the web interface when importing revision histories.
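The post-fixing steps for a multi-column Access export can be sketched in Python. This is a minimal sketch, assuming the pasted text is available as one string and that Access wrapped quoted cells in double quotes and doubled the quotes inside them, as described in the notes; the function name postfix_access_export is hypothetical.

```python
def postfix_access_export(raw: str) -> str:
    """Clean up MediaWiki XML pasted from a multi-column Access export."""
    # 1. Drop the first line, which holds the field names.
    _, _, body = raw.partition("\n")
    # 2. Remove the tab characters that separated the columns.
    body = body.replace("\t", "")
    # 3. Remove the double quotes Access put around quoted cells.
    #    The last two pairs are listed in the notes as normally not
    #    necessary; they are included here for completeness.
    for wrapped, plain in (
        ('"<text', "<text"),
        ('</page>"', "</page>"),
        ('"<page>', "<page>"),
        ('</comment>"', "</comment>"),
    ):
        body = body.replace(wrapped, plain)
    # 4. Collapse the doubled quotes, e.g. xml:space=""preserve"".
    return body.replace('""', '"')
```

The replacements run in the order given in the notes; it is worth opening the result in an XML-aware editor or parser before importing it.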
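As an alternative to building the XML in Access calculated fields, a script can generate it; an XML library then takes care of escaping, so the quote post-fixing is not needed. The Python sketch below writes one page element per revision, the layout mentioned in the notes. It is an illustration under assumptions: the function name revisions_to_xml and its dictionary keys are hypothetical, and the namespace URI and version attribute are assumed values that should be checked against an export from your own wiki (Special:Export).

```python
import xml.etree.ElementTree as ET

# Assumption: schema namespace and version of the target wiki.
EXPORT_NS = "http://www.mediawiki.org/xml/export-0.10/"
EXPORT_VERSION = "0.10"


def revisions_to_xml(revisions):
    """Build import XML with one <page> element per revision.

    `revisions` is an iterable of dicts with the (hypothetical) keys
    title, timestamp (ISO 8601, e.g. 2010-01-01T00:00:00Z), username,
    comment and text. ElementTree escapes &, < and > in the content.
    """
    root = ET.Element("mediawiki", {"xmlns": EXPORT_NS, "version": EXPORT_VERSION})
    for rev in revisions:
        # A separate page element for every revision; revisions of the
        # same page simply repeat the title.
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = rev["title"]
        revision = ET.SubElement(page, "revision")
        ET.SubElement(revision, "timestamp").text = rev["timestamp"]
        contributor = ET.SubElement(revision, "contributor")
        ET.SubElement(contributor, "username").text = rev["username"]
        ET.SubElement(revision, "comment").text = rev["comment"]
        text = ET.SubElement(revision, "text", {"xml:space": "preserve"})
        text.text = rev["text"]
    return ET.tostring(root, encoding="unicode")
```

For example, two dicts with the title "Test" and different timestamps produce two page elements for the same wiki page, which can be written to Test.xml and imported as usual.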
Importing Data from Command Line Interface
This is the preferred method, as it does not create additional revisions (compare the next section).
Transfer the XML file to the server and run, for example:
cd /var/www/testwiki; php ./maintenance/importDump.php /var/www/testwiki/Test.xml --conf ./LocalSettings.php
cd /var/www/testwiki; php ./maintenance/rebuildall.php --conf ./LocalSettings.php
(Running rebuildrecentchanges.php, or rebuildall.php as above, is necessary after the import; rebuildall.php may be very slow.)
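The two steps (import, then rebuild) can be wrapped in a small Python helper so that the rebuild only runs if the import succeeded. This is a minimal sketch, assuming it runs on the wiki server with PHP on the PATH; the function name import_and_rebuild and its parameters are hypothetical, while the maintenance scripts and the --conf option are the ones used in the example above.

```python
import subprocess
from pathlib import Path


def import_and_rebuild(wiki_dir, xml_file, php="php",
                       rebuild_script="rebuildrecentchanges.php"):
    """Run importDump.php, then a rebuild script, from the wiki directory.

    The commands run with the wiki's installation directory as working
    directory, so ./maintenance/... and ./LocalSettings.php resolve.
    Returns the executed command lines.
    """
    steps = [
        [php, "./maintenance/importDump.php", str(xml_file),
         "--conf", "./LocalSettings.php"],
        [php, f"./maintenance/{rebuild_script}", "--conf", "./LocalSettings.php"],
    ]
    for cmd in steps:
        # check=True raises CalledProcessError, so the rebuild step is
        # skipped when the import itself fails.
        subprocess.run(cmd, cwd=Path(wiki_dir), check=True)
    return steps
```

For example, import_and_rebuild("/var/www/testwiki", "/var/www/testwiki/Test.xml") runs the import followed by rebuildrecentchanges.php; pass rebuild_script="rebuildall.php" for the slower full rebuild.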
Importing Data through Special Pages Web Interface
The web interface under Special:Import creates extra revisions (in addition to those imported) that name the importing user. If you do not want to record who did a transfer, it may therefore be preferable to use the command-line import (see above). For the web import, it may be useful to create a dedicated "Import-User" account, so that the recorded name documents authorship better than the normal username of whoever uploads the XML file.
Importing creates two revisions for each page: revision 1 is the imported revision, revision 2 documents the import process. If the imported data document this on their own (e.g. when they already use Import-User and an appropriate comment), the second revisions can be deleted in the database (assuming Import-User has ID 4; PREFIX_ stands for the wiki's table prefix):
DELETE FROM PREFIX_revision
 WHERE PREFIX_revision.rev_user = 4
   AND PREFIX_revision.rev_minor_edit = 1;
-- Then fix the latest revision stored in page: every page whose page_latest
-- now points at a deleted row gets its newest remaining revision.
UPDATE PREFIX_page
   SET page_latest = (SELECT R2.rev_id FROM PREFIX_revision AS R2
                       WHERE R2.rev_page = PREFIX_page.page_id
                       ORDER BY R2.rev_timestamp DESC, R2.rev_id DESC
                       LIMIT 1)
 WHERE NOT EXISTS (SELECT 1 FROM PREFIX_revision AS R1
                    WHERE R1.rev_id = PREFIX_page.page_latest);
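Because these statements delete rows directly, it can help to rehearse the logic first. The Python sketch below replays the same two steps (delete Import-User's minor revisions, then repoint page_latest to the newest remaining revision of each affected page) on a throwaway in-memory SQLite database. It is an illustration only: the tables are cut down to the columns the statements use, PREFIX_ is dropped, the syntax is SQLite's rather than MySQL's, the function name cleanup_import_revisions is hypothetical, and the sample rows are made up.

```python
import sqlite3


def cleanup_import_revisions(conn, user_id=4):
    """Delete import-marker revisions and repair page_latest (SQLite sketch)."""
    # Step 1: remove the minor revisions recorded under Import-User.
    conn.execute(
        "DELETE FROM revision WHERE rev_user = ? AND rev_minor_edit = 1",
        (user_id,),
    )
    # Step 2: pages whose page_latest points at a deleted row get their
    # newest remaining revision (by timestamp, then by id).
    conn.execute("""
        UPDATE page
           SET page_latest = (SELECT r.rev_id FROM revision AS r
                               WHERE r.rev_page = page.page_id
                               ORDER BY r.rev_timestamp DESC, r.rev_id DESC
                               LIMIT 1)
         WHERE NOT EXISTS (SELECT 1 FROM revision AS r1
                            WHERE r1.rev_id = page.page_latest)
    """)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                           rev_user INTEGER, rev_minor_edit INTEGER,
                           rev_timestamp TEXT);
    -- Made-up sample: page 1 has two imported revisions (10, 11) plus the
    -- import marker 12; page 2 has one imported revision (20) plus marker 21.
    INSERT INTO revision VALUES
        (10, 1, 4, 0, '20100101000000'),
        (11, 1, 4, 0, '20110101000000'),
        (12, 1, 4, 1, '20240101000000'),
        (20, 2, 4, 0, '20120101000000'),
        (21, 2, 4, 1, '20240101000000');
    INSERT INTO page VALUES (1, 12), (2, 21);
""")
cleanup_import_revisions(conn)
print(conn.execute("SELECT page_id, page_latest FROM page ORDER BY page_id").fetchall())
# → [(1, 11), (2, 20)]
```

Choosing the newest remaining revision by rev_timestamp matters when a page received several imported revisions: in the sample, page 1 ends up at revision 11, the newer of its two imported revisions.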