Converting Word to Mediawiki text
Latest revision as of 21:39, 24 August 2017
Introduction
These notes are publicly visible, but they were created mostly for internal use. Use them at your own risk; contributions are, of course, most welcome!
Conversion is usually most effective on a single large document. If there are many pages, try to combine them into one document, separated by a unique separator. Such a separator makes it possible to later convert the result into an XML import file, which can import hundreds of pages in a single operation.
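A minimal Python sketch of that idea, assuming a hypothetical separator convention in which each page starts with a line `##PAGE## Title`. It splits the combined wiki text into pages and builds a bare-bones MediaWiki XML import document. The `export-0.10` namespace is an assumption, and depending on the MediaWiki version, Special:Import may also expect elements such as `<siteinfo>`, `<model>` or `<format>`, so compare the result with an export from your own wiki before importing:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical separator: a line "##PAGE## <title>" starts each page.
SEPARATOR = re.compile(r"^##PAGE## (.+)$", re.MULTILINE)

# Namespace of the MediaWiki XML dump format (version 0.10 assumed).
MW_NS = "http://www.mediawiki.org/xml/export-0.10/"


def split_pages(combined: str) -> list[tuple[str, str]]:
    """Split the combined wiki text into (title, text) pairs.

    Any text before the first separator line is ignored.
    """
    matches = list(SEPARATOR.finditer(combined))
    pages = []
    for i, match in enumerate(matches):
        # A page's text runs from the end of its separator line
        # to the start of the next separator (or the end of the file).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(combined)
        pages.append((match.group(1).strip(), combined[match.end():end].strip()))
    return pages


def to_import_xml(pages: list[tuple[str, str]]) -> str:
    """Build a minimal import document with one revision per page."""
    root = ET.Element("mediawiki", xmlns=MW_NS)
    for title, text in pages:
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes <, > and & in the wiki text automatically.
        ET.SubElement(revision, "text").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    combined = "##PAGE## First\nHello ''world''\n##PAGE## Second\nA < B & C\n"
    print(to_import_xml(split_pages(combined)))
```

Saved to a UTF-8 file, the output can then be tried with Special:Import or the `importDump.php` maintenance script, ideally on a test wiki first.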
LibreOffice/OpenOffice
The main automatic converter we used was OpenOffice 3.1.1. This is not the current version; in current versions the MediaWiki export has officially been moved to an extension, but I did not find that extension. In OO 3.1.1, choose Export in the File menu and, at the bottom of the dialog box, select "MediaWiki" instead of the default PDF.
According to http://www.libreoffice.org/features/writer/ and http://www.libreoffice.org/features/extensions/, the wiki export may now be part of the standard installation again. However, it may no longer export a text file but write directly to an online wiki instead. This has advantages (it is fast) and disadvantages (the export usually needs some post-processing).
- Experience with a current OpenOffice version, or links to instructions for reinstalling the export there, are welcome!
- I regularly use the OpenOffice extension wikipublisher.oxt. You can download it at Formavia (http://www.formavia.fr/wiki/index.php/Aide:OpenOffice_et_Mediawiki; the page is in French). I agree that many nowiki tags appear; I don't know why. Michel Chauvet (talk) 11:28, 24 March 2014 (CET).
The automatic converter is very useful for tables and most formatting. It uses <nowiki> tags too often, so I prefer to remove ALL <nowiki> and </nowiki> tags and then manually add back the few that are truly needed.
Unfortunately, the converter does not convert smallcaps, colored text, or colored backgrounds. Therefore, as a first step (before using OpenOffice), some conversions are done manually in MS Word.
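If the export is post-processed with a script rather than a text editor, removing all these tags takes one regular expression. A minimal Python sketch (the function name is illustrative); it deletes only the tags and keeps the text between them, so the places that truly need <nowiki> still have to be fixed by hand afterwards:

```python
import re

# Matches <nowiki>, </nowiki> and the empty form <nowiki/>, in any letter case.
NOWIKI_TAG = re.compile(r"</?nowiki\s*/?>", re.IGNORECASE)


def strip_nowiki(wikitext: str) -> str:
    """Remove every nowiki tag but keep the text it enclosed."""
    return NOWIKI_TAG.sub("", wikitext)


if __name__ == "__main__":
    sample = "<nowiki>*</nowiki> Item with <nowiki>[brackets]</nowiki><nowiki/>"
    print(strip_nowiki(sample))  # prints: * Item with [brackets]
```

In the sample, the leading `*` becomes a list marker once its tags are gone, which is exactly the kind of case where a <nowiki> has to be put back manually.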
You can also export your file to HTML and use the command-line tool pandoc (https://pandoc.org) to convert it to MediaWiki markup:
pandoc --from html --to mediawiki myfile.html > myfile.wiki
MS Word: Manual search and replace
Generally important steps: search for ^l (the manual line break) and consider replacing it with ^p (the paragraph mark). Also find ^p formatted as italic, bold, or smallcaps and replace it with ^p in the default paragraph style, formatted as not bold, not italic, and not smallcaps; otherwise the formatting replacements below may wrap tags around paragraph breaks.
- The special find-and-replace symbols such as ^p, ^l and ^& are specific to Microsoft Word and will not work in OpenOffice. If you have equivalent instructions for OpenOffice/LibreOffice, please help by providing them here.
Smallcaps: Given a Template:Smallcaps that renders text in small capital letters, we use MS Word to replace as follows:
- find-text: (empty)
- find-formatting: smallcaps
- replace with text: {{Smallcaps|^&}}
- replace-formatting: not smallcaps
^& is the Word placeholder that inserts the found text into the replacement text.
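Word applies this replacement to every stretch of smallcaps text and puts the found text back in place of ^&. To make that behavior concrete, here is a small Python model. The run representation (a list of (text, formats) pairs) and the function names are hypothetical; this is not a Word or python-docx API, only an illustration of what the dialog settings do:

```python
from itertools import groupby

# Hypothetical model of a paragraph: (text, formats) runs, e.g.
# ("Smith", frozenset({"smallcaps"})).
Run = tuple[str, frozenset]


def wrap_formatted(runs: list[Run], fmt: str, before: str, after: str) -> list[Run]:
    """Model Word's find-formatting replace with ^&.

    Each maximal stretch of runs carrying `fmt` is surrounded by `before`
    and `after` (the ^& part is the original text itself), and `fmt` is
    removed from those runs ("replace-formatting: not ...").
    Other formats on the runs are kept.
    """
    result: list[Run] = []
    for has_fmt, stretch in groupby(runs, key=lambda run: fmt in run[1]):
        if not has_fmt:
            result.extend(stretch)
            continue
        result.append((before, frozenset()))
        result.extend((text, formats - {fmt}) for text, formats in stretch)
        result.append((after, frozenset()))
    return result


def plain_text(runs: list[Run]) -> str:
    """Concatenate the text of all runs, ignoring formatting."""
    return "".join(text for text, _ in runs)


if __name__ == "__main__":
    runs = [("Written by ", frozenset()), ("Smith", frozenset({"smallcaps"}))]
    print(plain_text(wrap_formatted(runs, "smallcaps", "{{Smallcaps|", "}}")))
```

Called with `"italic", "<i>", "</i>"`, the same function models the italics replacement described below.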
Although the automatic converter (see above) converts italics, bold, and underline, with big documents it can be advantageous to do this manually as well. For example, by replacing italics with <i> and </i>, one can fix spacing errors that are frequently invisible in Word documents (a space at the start or end of an italic or bold stretch, or a space that is italic or bold on its own). The advantage of <i> and </i> over the default double apostrophes ('') is that they are directional: an opening tag can be told apart from a closing one. After replacing:
- find-text: (empty)
- find-formatting: italics
- replace with text: <i>^&</i>
- replace-formatting: not italics
one can run some cleanup replacements:
- "<i> " with " <i>"
- "<i>^t" with "^t<i>"
- " </i>" with "</i> "
- "^t</i>" with "</i>^t"
- "<i></i>" with ""
- "  " with " " (two blanks with one)
- "</i> <i>" with " "
- "</i>^t<i>" with " "
The same is done for bold, underline, and if necessary superscript/subscript.
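The same cleanup can also be scripted on the exported wiki text. A minimal Python sketch, assuming the text already contains the directional tags and that Word's ^t corresponds to a tab character; the function names are illustrative. The pairs are applied in the order listed above, and the whole list is repeated until the text no longer changes, which corresponds to running Replace All again until nothing more is found:

```python
def cleanup_pairs(tag: str) -> list[tuple[str, str]]:
    """Find/replace pairs from the cleanup list, for one tag name."""
    opening, closing = f"<{tag}>", f"</{tag}>"
    return [
        (f"{opening} ", f" {opening}"),    # move a space out in front of the opening tag
        (f"{opening}\t", f"\t{opening}"),  # same for a tab (^t)
        (f" {closing}", f"{closing} "),    # move a space out behind the closing tag
        (f"\t{closing}", f"{closing}\t"),  # same for a tab
        (f"{opening}{closing}", ""),       # drop empty pairs
        ("  ", " "),                       # two blanks with one
        (f"{closing} {opening}", " "),     # merge stretches split by a space
        (f"{closing}\t{opening}", " "),    # ... or by a tab (as in the list)
    ]


def clean_tags(text: str, tags: tuple[str, ...] = ("i", "b", "u")) -> str:
    """Apply the cleanup for each tag until the text no longer changes."""
    previous = None
    while text != previous:
        previous = text
        for tag in tags:
            for old, new in cleanup_pairs(tag):
                text = text.replace(old, new)
    return text


if __name__ == "__main__":
    print(clean_tags("A<i> word </i>and<i></i> more  text <i>one</i> <i>two</i>"))
```

For superscript and subscript, pass the extra tag names, e.g. `clean_tags(text, ("i", "b", "u", "sup", "sub"))`.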
MS Word: Converter Extension
See http://en.wikipedia.org/wiki/Help:WordToWiki