Copying or merging wiki user accounts

From Biowikifarm Metawiki
Revision as of 11:46, 5 January 2015 by Alvaro Ortiz-Troncoso

There are several use cases in which user accounts need to be copied from one wiki to another, for example:

Merging two wikifarms
Wikifarms store user accounts in the metawiki database, which is shared by all wikis in the wikifarm. If the wikis of a wikifarm (or a selection thereof) need to be moved to another wikifarm, then a merge of the user accounts becomes necessary.
Moving a wikifarm to another server
Strictly speaking, a wikifarm can be moved to another server by copying the whole database. In practice, however, the new wikifarm often becomes a separate branch of the original, so a merge becomes necessary.

Overview

The process consists of a sequence of activities, described in more detail below:

  1. Back up the user database
  2. Resolve conflicts
  3. Merge the user accounts
  4. Anonymize user accounts which should be discarded
  5. Clean up

  • The source database server is referred to as "FROM".
  • The target (sink) database server is referred to as "TO".

1. Backup

Make a backup of the tables in TO that will be modified:

  • the user table in the metawiki database
  • the tables user_groups, user_newtalk, user_openid, user_properties, recentchanges, revision and watchlist in every individual wiki database.
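The backup can be scripted, for example with mysqldump. The following Python sketch only builds the command lines, so that they can be reviewed before they are run. It assumes MySQL, InnoDB tables (for which --single-transaction gives a consistent snapshot) and credentials stored in an option file such as ~/.my.cnf. The database names metawiki, wiki_a and wiki_b and the output file names are hypothetical. If the OpenID extension is not installed, user_openid should be removed from the list, since mysqldump stops at tables it cannot find.

```python
from pathlib import Path

# Per-wiki tables that contain user ids (list taken from the backup step).
WIKI_TABLES = ["user_groups", "user_newtalk", "user_openid", "user_properties",
               "recentchanges", "revision", "watchlist"]


def build_backup_commands(metawiki_db, wiki_dbs, out_dir="backup"):
    """Return one mysqldump argument list per database to back up.

    Assumes credentials come from an option file (e.g. ~/.my.cnf), so no
    password appears on the command line.
    """
    out = Path(out_dir)
    # The shared user table lives in the metawiki database.
    commands = [[
        "mysqldump", "--single-transaction",
        f"--result-file={out / (metawiki_db + '_user.sql')}",
        metawiki_db, "user",
    ]]
    # Every individual wiki database gets a dump of its user-related tables.
    for db in wiki_dbs:
        commands.append([
            "mysqldump", "--single-transaction",
            f"--result-file={out / (db + '_tables.sql')}",
            db, *WIKI_TABLES,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical database names; adjust to the actual wikifarm.
    for argv in build_backup_commands("metawiki", ["wiki_a", "wiki_b"]):
        print(" ".join(argv))  # review first, then: subprocess.run(argv, check=True)
```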

2. Resolve conflicts

User accounts can conflict if FROM and TO have accounts with the same ids but with different data. This can happen if:

  • accounts in FROM have been updated after TO was copied.
  • accounts have been created in TO, but not in FROM.
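The two cases can be told apart by comparing the user tables of FROM and TO. A minimal sketch in Python, assuming that each account is available as a dict of user-table columns and that an account in TO has an equivalent in FROM if both share the same user_id and user_name; user_touched serves as a rough indicator of an update. find_incomplete applies the example criterion of action 1 below (missing e-mail address). All function names are hypothetical.

```python
def find_incomplete(accounts):
    """Return the ids of accounts without an e-mail address."""
    return sorted(a["user_id"] for a in accounts if not a.get("user_email"))


def classify_to_accounts(from_accounts, to_accounts):
    """Split the accounts of TO into (updated, conflicting) lists of ids.

    An account in TO is assumed to have an equivalent in FROM if FROM has an
    account with the same user_id and the same user_name.
    """
    from_by_id = {a["user_id"]: a for a in from_accounts}
    updated, conflicting = [], []
    for acc in to_accounts:
        twin = from_by_id.get(acc["user_id"])
        if twin is None or twin["user_name"] != acc["user_name"]:
            # No equivalent in FROM: created in TO after it was copied.
            conflicting.append(acc["user_id"])
        elif twin["user_touched"] != acc["user_touched"]:
            # Same account, but changed in FROM or TO since the copy.
            updated.append(acc["user_id"])
    return sorted(updated), sorted(conflicting)


if __name__ == "__main__":
    from_accounts = [
        {"user_id": 1, "user_name": "Alice", "user_touched": "20140101000000"},
        {"user_id": 2, "user_name": "Bob", "user_touched": "20140101000000"},
        {"user_id": 3, "user_name": "Carol", "user_touched": "20150101000000"},
    ]
    to_accounts = [
        {"user_id": 1, "user_name": "Alice", "user_touched": "20140101000000"},
        {"user_id": 2, "user_name": "Bob", "user_touched": "20141201000000"},
        {"user_id": 3, "user_name": "Dave", "user_touched": "20141115000000"},
        {"user_id": 4, "user_name": "Eve", "user_touched": "20141120000000"},
    ]
    print(classify_to_accounts(from_accounts, to_accounts))  # ([2], [3, 4])
```

In this example, Bob (id 2) was changed after the copy, while Dave (id 3, clashing with Carol) and Eve (id 4) have no equivalent in FROM and would have to be moved.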

Resolving conflicts involves the following actions:

  1. Search for incomplete accounts in FROM (e.g. without an e-mail address). If incomplete accounts are found, stop.
  2. List all conflicting accounts in TO, i.e. accounts which have no equivalent in FROM because they were created after TO was copied.
  3. Reserve a buffer of e.g. 2000 account ids in TO, to accommodate future account merges.
  4. Move the conflicting accounts to ids after the buffer.
  5. Update user ids in the individual wiki databases, in the tables user_groups, user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist.
  6. Check that conflicts are resolved, else stop.
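Actions 3 to 5 amount to an id mapping that is applied to every table holding user ids. The sketch below assumes that the buffer starts above the highest id found in either FROM or TO, so that no new id can collide with an existing one. It uses the user-id column names of the MediaWiki schema of that time (later versions moved most of them to the actor table); uoi_user belongs to the OpenID extension. SQLite stands in for MySQL so that the example runs as is; with a MySQL driver such as PyMySQL, the placeholders become %s and identifiers are quoted with backticks. The function names are hypothetical.

```python
import sqlite3

# User-id column of each per-wiki table in the MediaWiki schema of that time.
# uoi_user belongs to the OpenID extension. Verify against the installed version.
USER_COLUMNS = {
    "user_groups": "ug_user",
    "user_newtalk": "user_id",
    "user_openid": "uoi_user",
    "user_properties": "up_user",
    "recentchanges": "rc_user",
    "revision": "rev_user",
    "watchlist": "wl_user",
}


def remap_ids(conflicting_ids, highest_id, buffer=2000):
    """Map each conflicting id to a new id placed after the buffer.

    highest_id is the largest user_id found in FROM or TO, so every new id is
    larger than any existing one and cannot collide with it.
    """
    start = highest_id + buffer + 1
    return {old: start + i for i, old in enumerate(sorted(conflicting_ids))}


def apply_remap(conn, mapping, columns=USER_COLUMNS):
    """Rewrite user ids in the given tables of one database."""
    with conn:  # one transaction: commits on success, rolls back on error
        for table, column in columns.items():
            for old, new in mapping.items():
                conn.execute(
                    f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
                    (new, old),
                )
```

The same mapping would be applied once to the metawiki user table, e.g. apply_remap(metawiki_conn, mapping, {"user": "user_id"}) (action 4), and then to each individual wiki database with the default columns (action 5).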

3. Merge

Merge the user accounts in TO with accounts in FROM.

  1. Copy the table user_groups from all individual wiki databases that will be merged.
  2. Update user data in the tables user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist from all individual wiki databases that will be merged.
  3. Copy accounts which exist in FROM but not in TO, or which were updated in FROM after TO was copied.
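Action 3 can be expressed as a filter over the accounts in FROM. A sketch, assuming that conflicts have already been resolved, that the time at which TO was copied is known, and that user_touched (a MediaWiki timestamp of the form YYYYMMDDHHMMSS, which sorts correctly as a string) is an acceptable proxy for "updated". Because MediaWiki also bumps user_touched for reasons other than changes to the account data, the filter may select more accounts than strictly necessary. The function name is hypothetical.

```python
def accounts_to_copy(from_accounts, to_accounts, copied_at):
    """Return the accounts of FROM that must be copied to TO.

    An account is copied if its id does not exist in TO, or if it was touched
    in FROM after TO was copied. copied_at and user_touched are MediaWiki
    timestamps ("YYYYMMDDHHMMSS"), which compare correctly as strings.
    """
    to_ids = {a["user_id"] for a in to_accounts}
    return [
        a for a in from_accounts
        if a["user_id"] not in to_ids or a["user_touched"] > copied_at
    ]
```

The selected rows could then be written to TO with an upsert, e.g. MySQL's INSERT ... ON DUPLICATE KEY UPDATE, so that new and updated accounts are handled by the same statement.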

4. Anonymize

In some cases, not all accounts should be merged. However, to keep page revisions, discussions etc. intact, user accounts cannot simply be deleted; instead, they have to be anonymized:

  1. Make a list of users who will be kept, e.g.
    1. Users who belong to a specific user group
    2. Users who are registered with a specific wiki in TO
    3. Users with certain e-mail addresses
    4. Administrators, bureaucrats
    5. Bots
  2. Anonymize all users who are not in the list obtained in the previous step:
    1. Erase the e-mail address and related columns
    2. Replace the login name with some string, e.g. the id
    3. Replace the password and related columns with an empty string or NULL
    4. Replace the real name with the id
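The selection and the anonymization could look as follows in Python. This sketch implements only the group-based and e-mail-based criteria; sysop, bureaucrat and bot are MediaWiki's default group names, while keep_domains and the function names are hypothetical. The column names are those of the MediaWiki user table. Because user_name must be unique, it is worth checking beforehand that no kept account already has a purely numeric name. Note also that the revision and recentchanges tables of that period store the user name as well (rev_user_text, rc_user_text); this sketch does not touch them.

```python
# Members of these groups are always kept (MediaWiki's default group names).
KEEP_GROUPS = {"sysop", "bureaucrat", "bot"}


def keep_user(user, groups, keep_domains=()):
    """Return True if the account should be kept unchanged.

    groups: set of the user's group names (from user_groups).
    keep_domains: hypothetical list of e-mail domains whose users are kept.
    """
    if groups & KEEP_GROUPS:
        return True
    email = (user.get("user_email") or "").lower()
    return any(email.endswith("@" + domain.lower()) for domain in keep_domains)


def anonymize(user):
    """Return a copy of a user-table row with the personal data removed."""
    uid = str(user["user_id"])
    anon = dict(user)
    anon.update(
        user_name=uid,                   # login name replaced by the id
        user_real_name=uid,              # real name replaced by the id
        user_email="",                   # e-mail address and related columns
        user_email_authenticated=None,
        user_email_token=None,
        user_email_token_expires=None,
        user_password="",                # password and related columns
        user_newpassword="",
        user_token="",
    )
    return anon
```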

Script

Merging wikifarms can be complicated, so it will generally be an iterative process. A script can automate the most tedious tasks and make the process safer, because the merge can first be (unit) tested on a test server.

There are several ways of implementing this:

  • Using the MediaWiki HTTP API. In PHP, the Snoopy library, which simulates a web browser, can come in handy.
  • Directly modifying the wiki database, e.g. using PDO, the data-access abstraction layer for PHP. MediaWiki does not recommend this practice, but given the potentially large number of modifications needed here, it can be the most efficient solution.
  • Using the database abstraction layer provided by MediaWiki.
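For the first approach, reading the account list is straightforward. The following Python sketch queries list=allusers through api.php and follows the API's continuation mechanism; the wiki URL passed in is an assumption, and fetch can be replaced, e.g. by a stub for testing. Sending an empty continue parameter with the first request asks older MediaWiki versions for the same continuation format that newer ones use by default. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(api_url, params):
    """Send a GET request to api.php and decode the JSON response."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def list_users(api_url, fetch=fetch_json):
    """Yield (userid, name) for every account of the wiki behind api_url."""
    params = {
        "action": "query",
        "list": "allusers",
        "aulimit": "500",   # maximum batch size for ordinary users
        "format": "json",
        "continue": "",     # request the current continuation format
    }
    while True:
        data = fetch(api_url, params)
        for user in data["query"]["allusers"]:
            yield user["userid"], user["name"]
        if "continue" not in data:
            break
        # Pass the continuation values back unchanged to get the next batch.
        params = {**params, **data["continue"]}
```

For example, list(list_users("https://wiki.example.org/api.php")) (hypothetical URL) would return all accounts of that wiki as (id, name) pairs, which can then be compared with the other wikifarm.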

A script was written for automatically merging the wikis of the Museum für Naturkunde Berlin (MfN). The script implements the steps listed above in four classes:

Backuper
Backup user tables.
Resolver
Detect and resolve conflicting user accounts.
Merger
Merge user accounts.
Anonymizer
Anonymize user accounts, keep selected accounts.

The interaction between these classes is modelled in the following UML activity diagram:

UML activity diagram of merging or copying the users from one wikifarm to another (FROM->TO).
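The sequence of the four classes, including the points at which the process stops, can be sketched as follows. Only the class names come from the MfN script; the run(ctx) interface, the ctx keys and MergeAborted are hypothetical, and the method bodies are placeholders for the actions described in sections 1 to 4.

```python
class MergeAborted(Exception):
    """A step found a condition under which the merge must stop."""


class Backuper:
    def run(self, ctx):
        ctx["done"].append("backup")      # placeholder: dump the tables of step 1


class Resolver:
    def run(self, ctx):
        if ctx["incomplete_accounts"]:
            raise MergeAborted("incomplete accounts in FROM")
        ctx["done"].append("resolve")     # placeholder: buffer, move ids, update wikis
        if ctx["remaining_conflicts"]:
            raise MergeAborted("conflicts left after resolving")


class Merger:
    def run(self, ctx):
        ctx["done"].append("merge")       # placeholder: copy groups and accounts


class Anonymizer:
    def run(self, ctx):
        ctx["done"].append("anonymize")   # placeholder: anonymize discarded accounts


def run_merge(ctx, steps=(Backuper, Resolver, Merger, Anonymizer)):
    """Run the steps in order; a MergeAborted exception stops the remaining ones."""
    for step in steps:
        step().run(ctx)
    return ctx["done"]
```

Raising an exception instead of returning a status makes it hard to continue accidentally after a failed check, which matters when the same script is rerun iteratively against a test server.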