Converting Word to Mediawiki text

From Biowikifarm Metawiki
Revision as of 17:29, 13 December 2010 by Gregor Hagedorn (Talk | contribs) (first public notes)

Internal notes:

OpenOffice

The main automatic converter we use is OpenOffice 3.1.1. This is not the current version: in the current version the MediaWiki export has officially been moved to an extension, but I did not find that extension. In OO 3.1.1, the export is in the File menu under Export; at the bottom of the dialog box, select "Mediawiki" instead of the default PDF.

Reports of experience with a current OpenOffice version, or links to instructions for installing the export there again, are welcome!

The automatic converter is very useful for tables and for most formatting. It inserts <nowiki> tags far more often than necessary, so I prefer to remove ALL <nowiki> and </nowiki> tags and then add back manually the few that are truly needed.
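For larger exports, the tag removal can be scripted. The following is a minimal sketch in Python, not part of the original workflow; the function name strip_nowiki is made up for illustration, and it assumes the exported wikitext is available as a plain string. It deletes only the tags and keeps the text between them.

```python
import re

# Matches <nowiki>, </nowiki> and the self-closing <nowiki/> form.
NOWIKI_TAG = re.compile(r"</?nowiki\s*/?>", re.IGNORECASE)


def strip_nowiki(wikitext: str) -> str:
    """Remove every nowiki tag but keep the text the tags enclosed.

    Hypothetical helper: the few tags that are truly needed still have
    to be added back by hand afterwards.
    """
    return NOWIKI_TAG.sub("", wikitext)


if __name__ == "__main__":
    sample = "<nowiki>Plain text</nowiki> and ''italic'' text"
    print(strip_nowiki(sample))
```

Any passage that relied on the tags to show literal wiki markup will be interpreted as markup after this step, so the result should be proofread.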

Unfortunately, the converter does not convert small caps, colored text, or colored backgrounds. Therefore, as a first step (before using OpenOffice), some conversions are done manually in MS Word.

MS Word

Smallcaps: Given a Template:Smallcaps that renders text in small capital letters, we use the Replace dialog in MS Word as follows:

find-text: (empty)
find-formatting: smallcaps
replace with text: {{Smallcaps|^&}}
replace-formatting: not smallcaps

^& is the Word placeholder that inserts the found text back into the replacement text.
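For readers more familiar with regular expressions, the role of ^& can be illustrated with a loose analogy in Python, where \g<0> in a replacement string stands for the whole match. This is only a sketch: plain text carries no formatting, so the example matches a literal word instead of small-caps formatting, and the function name wrap_in_smallcaps is hypothetical.

```python
import re


def wrap_in_smallcaps(text: str, pattern: str) -> str:
    """Wrap every match of `pattern` in the Smallcaps template.

    \\g<0> re-inserts the matched text, much as ^& re-inserts the
    found text in the Word Replace dialog.
    """
    return re.sub(pattern, r"{{Smallcaps|\g<0>}}", text)


if __name__ == "__main__":
    print(wrap_in_smallcaps("described by Linnaeus in 1753", r"Linnaeus"))
```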

Although the automatic converter (see above) will convert italics, bold, and underline, with big documents it can be advantageous to do this manually as well. For example, by replacing italics with <i> and </i> one can fix spacing errors that are frequently invisible in Word documents: a space at the end or start of a bold or italic run, or a lone space that is itself bold or italic. The advantage of <i> and </i> over the default double apostrophes is that they are directional: the opening and closing tags differ, so a misplaced space can be found and moved by a simple search. First replace:

find-text: (empty)
find-formatting: italics
replace with text: <i>^&</i>
replace-formatting: not italics

Then run some cleanup, replacing:

"<i> </i>" with " "
"<i> " with " <i>"
" </i>" with "</i> "
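The same cleanup can be applied outside Word once the text has been exported. The sketch below applies the three replacements above in the listed order; the function name tidy_italics is hypothetical, and repeating the pass until nothing changes is an added assumption meant to cover runs of several spaces.

```python
# The three cleanup rules from the list above, in the same order.
CLEANUP_RULES = [
    ("<i> </i>", " "),   # a lone italic space becomes a plain space
    ("<i> ", " <i>"),    # move a leading space in front of the opening tag
    (" </i>", "</i> "),  # move a trailing space behind the closing tag
]


def tidy_italics(text: str) -> str:
    """Apply the cleanup rules repeatedly until the text stops changing."""
    previous = None
    while previous != text:
        previous = text
        for old, new in CLEANUP_RULES:
            text = text.replace(old, new)
    return text


if __name__ == "__main__":
    print(tidy_italics("the genus<i> Rosa </i>is large"))
```

A run consisting only of two or more italic spaces ends up as an empty <i></i> pair under these rules, so a final search for empty tags may be worthwhile.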