Copying or merging wiki user accounts
Revision as of 11:46, 5 January 2015
There are several use cases where user accounts need to be copied from one wiki to another, e.g.
- Merging two wikifarms
  - Wikifarms store user accounts in the metawiki database, which is shared by all wikis in the wikifarm. If the wikis of a wikifarm (or a selection thereof) need to be moved to another wikifarm, then a merge of the user accounts becomes necessary.
- Moving a wikifarm to another server
  - Strictly speaking, moving a wikifarm to another server can be done by copying the whole database. In practice, however, the new wikifarm will often be a separate branch of the original wiki, so a merge becomes necessary.
Overview
The process consists of a sequence of activities, described in more detail below:
- Back up the user database
- Resolve conflicts
- Merge the user accounts
- Anonymize user accounts which should be discarded
- Clean-up
In what follows:
- The source database server is referred to as "FROM".
- The sink (destination) database server is referred to as "TO".
1. Backup
Make a backup of the tables in TO that will be modified:
- user table in metawiki database
- user tables user_groups, user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist in all individual wiki databases.
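The backup step can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module so that it runs anywhere, whereas a real wikifarm will usually run on MySQL/MariaDB, where a dump tool would be the more common choice. The function name `backup_tables` and the `_backup` suffix are assumptions made for this example.

```python
import sqlite3

def backup_tables(conn, tables, suffix="_backup"):
    """Copy each table into a new table named <table><suffix>.

    Hypothetical helper: returns the number of rows copied per table.
    """
    counts = {}
    for table in tables:
        backup = f"{table}{suffix}"
        # Replace any backup left over from a previous run.
        conn.execute(f'DROP TABLE IF EXISTS "{backup}"')
        conn.execute(f'CREATE TABLE "{backup}" AS SELECT * FROM "{table}"')
        counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{backup}"').fetchone()[0]
    conn.commit()
    return counts
```

Comparing the returned row counts with the original tables gives a quick sanity check before any table in TO is modified.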
2. Resolve conflicts
User accounts can conflict if FROM and TO have accounts with the same ids but with different data. This can happen if:
- accounts in FROM have been updated after TO was copied.
- accounts have been created in TO, but not in FROM.
Resolving conflicts involves the following actions:
- Search for incomplete accounts in FROM (e.g. without an e-mail address). If incomplete accounts are found, stop.
- List all conflicting accounts in TO, i.e. accounts which have no equivalent in FROM because they were created after TO was copied.
- Reserve a buffer of e.g. 2000 account ids in TO, to accommodate future account merges.
- Move conflicting accounts after the buffer.
- Update user ids in the individual wiki databases, in the tables user_groups, user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist.
- Check that conflicts are resolved, else stop.
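The conflict resolution actions listed above might look roughly like this. It is a simplified sketch under several assumptions: sqlite3 stands in for the real database server, the user table and the referencing tables live in one database (in a wikifarm they are split between the metawiki database and the individual wiki databases), a conflict is detected by comparing user names only, and the column names in `USER_REF_COLUMNS` are modelled on the MediaWiki schema but should be checked against the installed version. All function names are hypothetical.

```python
import sqlite3

# Assumed mapping (not exhaustive): table -> column holding the user id.
USER_REF_COLUMNS = {"user_groups": "ug_user", "revision": "rev_user", "watchlist": "wl_user"}

def find_conflicts(from_db, to_db):
    """Return ids that exist in both databases but belong to different user names."""
    from_users = dict(from_db.execute('SELECT user_id, user_name FROM "user"'))
    to_users = dict(to_db.execute('SELECT user_id, user_name FROM "user"'))
    return sorted(uid for uid, name in to_users.items()
                  if uid in from_users and from_users[uid] != name)

def max_user_id(db):
    return db.execute('SELECT MAX(user_id) FROM "user"').fetchone()[0] or 0

def resolve_conflicts(from_db, to_db, buffer_size=2000):
    """Move conflicting TO accounts behind a buffer of free ids; return {old_id: new_id}."""
    incomplete = from_db.execute(
        'SELECT COUNT(*) FROM "user" WHERE user_email IS NULL OR user_email = \'\''
    ).fetchone()[0]
    if incomplete:
        raise RuntimeError(f"{incomplete} incomplete account(s) in FROM, stopping")

    conflicts = find_conflicts(from_db, to_db)
    # The first free id lies behind the highest existing id plus the buffer.
    next_id = max(max_user_id(from_db), max_user_id(to_db)) + buffer_size + 1
    mapping = {}
    with to_db:  # one transaction: commit on success, roll back on error
        for old_id in conflicts:
            to_db.execute('UPDATE "user" SET user_id = ? WHERE user_id = ?', (next_id, old_id))
            for table, column in USER_REF_COLUMNS.items():
                to_db.execute(f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
                              (next_id, old_id))
            mapping[old_id] = next_id
            next_id += 1

    if find_conflicts(from_db, to_db):
        raise RuntimeError("conflicts remain after resolution, stopping")
    return mapping
```

Returning the old-to-new id mapping makes it possible to log what was moved and to repeat the same renumbering in further wiki databases.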
3. Merge
Merge the user accounts in TO with accounts in FROM.
- Copy the table user_groups from all individual wiki databases that will be merged.
- Update user data in the tables user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist from all individual wiki databases that will be merged.
- Copy accounts which exist in FROM but not in TO, or which were updated in FROM after TO was copied.
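The last merge action, copying new or updated accounts, could be sketched like this. The example assumes that conflicts have already been resolved, that `user_id` is the primary key of the user table, and that the `user_touched` column holds a sortable timestamp string such as `20150105114600`, so that a plain string comparison tells which copy is newer. The function name `merge_users` and the reduced column set are assumptions; sqlite3 is used only to keep the example runnable.

```python
import sqlite3

COLUMNS = ("user_id", "user_name", "user_email", "user_touched")  # reduced, assumed column set

def merge_users(from_db, to_db):
    """Copy accounts that are missing in TO or newer in FROM; return the copied ids."""
    to_touched = dict(to_db.execute('SELECT user_id, user_touched FROM "user"'))
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    copied = []
    with to_db:  # one transaction: commit on success, roll back on error
        for row in from_db.execute(f'SELECT {column_list} FROM "user" ORDER BY user_id'):
            user_id, touched = row[0], row[3]
            if user_id not in to_touched or touched > to_touched[user_id]:
                # REPLACE overwrites the existing row with the same primary key.
                to_db.execute(
                    f'INSERT OR REPLACE INTO "user" ({column_list}) VALUES ({placeholders})', row)
                copied.append(user_id)
    return copied
```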
4. Anonymize
In some cases, not all accounts should be merged. However, user accounts cannot simply be deleted, because page revisions, discussions etc. refer to them, so the discarded accounts have to be anonymized:
- Make a list of users which will be kept, e.g.
  - Users which belong to a specific user group
  - Users which are registered with a specific wiki in TO
  - Users with certain e-mail addresses
  - Administrators, bureaucrats
  - Bots
- Anonymize all users which are not in the list obtained in the previous step:
  - Erase the e-mail address and related columns
  - Replace the login with some string, e.g. the id
  - Replace the password and related columns by an empty string or null
  - Replace the real name by the id
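A possible shape for the anonymization step is shown below. It is a sketch, not the MfN script: the keep-list is built from user groups only, the group names `sysop`, `bureaucrat` and `bot` are used as an example, and only four columns of the user table are overwritten, whereas a real installation has further e-mail and password related columns that would need the same treatment. The function names are hypothetical and sqlite3 again stands in for the real database server.

```python
import sqlite3

def users_in_groups(db, groups):
    """Return the ids of all users that belong to at least one of the given groups."""
    placeholders = ", ".join("?" for _ in groups)
    rows = db.execute(
        f"SELECT DISTINCT ug_user FROM user_groups WHERE ug_group IN ({placeholders})",
        list(groups))
    return {row[0] for row in rows}

def anonymize_users(db, keep_ids):
    """Anonymize every account whose id is not in keep_ids; return the anonymized ids."""
    keep = set(keep_ids)
    all_ids = [row[0] for row in db.execute('SELECT user_id FROM "user" ORDER BY user_id')]
    anonymized = [uid for uid in all_ids if uid not in keep]
    with db:  # one transaction: commit on success, roll back on error
        db.executemany(
            'UPDATE "user" SET user_name = ?, user_real_name = ?, '
            "user_email = '', user_password = '' WHERE user_id = ?",
            # Login and real name are replaced by the id, as suggested above.
            [(str(uid), str(uid), uid) for uid in anonymized])
    return anonymized
```

Since user names have to stay unique, replacing the login by the bare id only works if no kept account already uses that number as its name; a prefix would avoid this.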
Script
Merging wikifarms can be complicated, so it will generally be an iterative process. A script can automate the most tedious tasks and make the process safer by allowing the merge to be (unit) tested on a test server.
There are several ways of implementing this:
- Using the MediaWiki HTTP API. In PHP, the Snoopy library, which simulates a browser, can come in handy.
- Directly modifying the wiki database, e.g. using the PDO data abstraction layer for PHP. This practice is not recommended by MediaWiki, but in this case, given the potentially large number of modifications necessary, it can be the most efficient solution.
- Using the database abstraction layer provided by MediaWiki (http://www.mediawiki.org/wiki/Manual:Database_access).
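As an illustration of the first option, the following sketch builds a request URL for the `list=allusers` query module of the MediaWiki HTTP API, which lists the accounts of a wiki. It is written in Python rather than PHP and only constructs the URL without sending it; the endpoint `https://wiki.example.org/w/api.php` and the function name `allusers_url` are placeholders chosen for this example.

```python
from urllib.parse import urlencode

def allusers_url(api_endpoint, limit=50, start_from=None):
    """Build a MediaWiki API URL that lists user accounts as JSON."""
    params = {"action": "query", "list": "allusers", "aulimit": limit, "format": "json"}
    if start_from is not None:
        # Continue the listing at the given user name.
        params["aufrom"] = start_from
    return f"{api_endpoint}?{urlencode(params)}"

print(allusers_url("https://wiki.example.org/w/api.php", limit=10))
```

The HTTP API is convenient for reading account data like this, but changing thousands of user ids through it is not practical, which is one reason the direct database approach can be the more efficient one here.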
A script was written for automatically merging the wikis of Museum für Naturkunde Berlin (MfN). The script implements the steps listed above in four classes:
- Backuper
  - Backs up the user tables.
- Resolver
  - Detects and resolves conflicting user accounts.
- Merger
  - Merges the user accounts.
- Anonymizer
  - Anonymizes user accounts, keeping selected accounts.
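The interaction between these classes is modelled in the following UML activity diagram:
[Figure: UML activity diagram of merging or copying the users from one wikifarm to another (FROM->TO).]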