Difference between revisions of "Converting Word to Mediawiki text"
Revision as of 09:34, 6 January 2014
Introduction
These notes are publicly visible, but were nevertheless created mostly for internal use. Use them at your own risk; contributions are of course most welcome!
Conversion is usually most effective on a single, large document. If many pages exist, try to combine them into one document, divided by a unique separator. The separator may later allow the document to be converted into an XML import file that can import hundreds of pages in a single operation.
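The separator idea can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not a tested import tool: the separator string and the function name are hypothetical, the first line of each chunk is assumed to be the page title, and the XML is a simplified skeleton of the page/title/revision/text structure. Real import files carry a namespace declaration and further metadata, so the output may need adjusting for the target wiki.

```python
import xml.etree.ElementTree as ET

# Hypothetical separator; choose a string that never occurs in the real text.
SEPARATOR = "=====PAGE====="


def build_import_xml(combined_text, separator=SEPARATOR):
    """Split combined wikitext on the separator and wrap each chunk as a page.

    Assumption: the first line of each chunk is the page title,
    and the rest of the chunk is the page text.
    """
    root = ET.Element("mediawiki")
    for chunk in combined_text.split(separator):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty chunks, e.g. after a trailing separator
        title, _, body = chunk.partition("\n")
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title.strip()
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes <, > and & in the wikitext automatically.
        ET.SubElement(revision, "text").text = body.strip()
    return ET.tostring(root, encoding="unicode")
```

Calling build_import_xml on the converted text of the whole document returns one XML string with one page element per chunk, which could then be saved to a file and tried on a test wiki first.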
LibreOffice/OpenOffice
The main automatic converter we used was OpenOffice 3.1.1. This is not the current version; in the current version the MediaWiki export has officially been moved to an extension, but I did not find that extension. In OO 3.1.1, the export is in the File menu under Export: at the bottom of the dialog box, select "Mediawiki" instead of the default PDF.
According to http://www.libreoffice.org/features/writer/ and http://www.libreoffice.org/features/extensions/ the wiki export may now be part of the system. However, it may no longer export a text file and may instead write directly to an online wiki. This has advantages (it is fast) and disadvantages (the export usually needs some post-processing).
- Reports of experience with a current OpenOffice version, or links explaining how to install the export there again, are welcome!
The automatic converter is very useful for tables and much of the formatting. It uses <nowiki> tags too often, so I prefer to remove ALL <nowiki></nowiki> tags and then add the few truly needed ones back manually.
Unfortunately, the converter does not convert smallcaps, colored text, or colored backgrounds. Therefore, as a first step (before using OpenOffice), some conversions are done manually in MS Word.
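Removing all the nowiki tags from the exported text can also be scripted. The following is a minimal sketch in Python, assuming the export was saved as plain text; the function name is hypothetical. It removes the opening, closing, and self-closing forms of the tag and leaves the enclosed text untouched.

```python
import re

# Matches <nowiki>, </nowiki>, <nowiki/> and <nowiki />, in any letter case.
NOWIKI_TAG = re.compile(r"</?nowiki\s*/?>", re.IGNORECASE)


def strip_nowiki(wikitext):
    """Remove every nowiki tag but keep the text it enclosed."""
    return NOWIKI_TAG.sub("", wikitext)
```

After running this, the few places where wiki markup characters really must be shown literally still need their nowiki tags added back by hand.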
MS Word: Manual search and replace
Generally important steps: Search for uses of ^l (the new line character, i.e. a manual line break) and consider replacing them with ^p (the paragraph mark). Replace any ^p formatted as italic, bold, or smallcaps with a ^p in the default paragraph style (not bold, italic, or smallcaps).
- The special find and replace symbols like ^p ^l ^& are specific to Microsoft Word and will not work in Open Office. If you have equivalent instructions for Open Office, please help by providing them here.
Smallcaps: Given a Template:Smallcaps that renders text in small capital letters, we use MS Word to replace as follows:
- find-text: (empty)
- find-formatting: smallcaps
- replace with text: {{Smallcaps|^&}}
- replace-formatting: not smallcaps
^& is the Word placeholder that inserts the found text back into the replacement text.
Although the automatic converter (see above) will convert italics, bold, and underline, with big documents it can be advantageous to do this manually as well. For example, by replacing italics with <i> and </i>, one can fix the spacing errors that are frequently invisible in Word documents (a space at the end or start of a formatted run, or a lone space, being bold or italic). The advantage of <i> and </i> over the default double apostrophes is that they are directional, so after replacing:
- find-text: (empty)
- find-formatting: italics
- replace with text: <i>^&</i>
- replace-formatting: not italics
One can run some cleanup:
- "<i> " with " <i>"
- "<i>^t" with "^t<i>"
- " </i>" with "</i> "
- "^t</i>" with "</i>^t"
- "<i></i>" with ""
- "  " with " " (two blanks with one)
- "</i> <i>" with " "
- "</i>^t<i>" with " "
The same is done for bold, underline, and if necessary superscript/subscript.
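If the text has already left Word, the same cleanup can be approximated on the plain text. The following is a minimal sketch in Python, not the procedure used for these notes: the function name is hypothetical, Word's ^t is taken to be a tab character, and the replacements are repeated until nothing changes, so that runs of several blanks are fully collapsed.

```python
def tidy_tag(text, tag="i"):
    """Move whitespace out of <tag>...</tag> runs and drop empty or redundant tags."""
    open_t, close_t = f"<{tag}>", f"</{tag}>"
    previous = None
    while previous != text:  # repeat until a full pass changes nothing
        previous = text
        text = text.replace(f"{open_t} ", f" {open_t}")    # "<i> "   -> " <i>"
        text = text.replace(f"{open_t}\t", f"\t{open_t}")  # "<i>^t"  -> "^t<i>"
        text = text.replace(f" {close_t}", f"{close_t} ")  # " </i>"  -> "</i> "
        text = text.replace(f"\t{close_t}", f"{close_t}\t")  # "^t</i>" -> "</i>^t"
        text = text.replace(f"{open_t}{close_t}", "")      # empty pair removed
        text = text.replace("  ", " ")                     # two blanks -> one
        text = text.replace(f"{close_t} {open_t}", " ")    # merge adjacent runs
        text = text.replace(f"{close_t}\t{open_t}", " ")   # as listed above
    return text
```

Calling tidy_tag(text) handles italics; tidy_tag(text, tag="b") or tag="u" would cover bold and underline in the same way, assuming those were replaced with the corresponding tags.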
MS Word: Converter Extension
See http://en.wikipedia.org/wiki/Help:WordToWiki