Difference between revisions of "Converting Word to Mediawiki text"

From Biowikifarm Metawiki
Latest revision as of 21:39, 24 August 2017

Introduction

These notes are publicly visible but were written mostly for internal use. Use them at your own risk; contributions are most welcome, of course!

Conversion is usually most effective on a single, large document. If there are many pages, try to combine them into one document, divided by a unique separator. The separator may later allow the converted document to be turned into an XML import file that can import hundreds of pages in a single operation.
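
The separator idea can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the procedure used for these notes: the separator string, the rule that the first line of each chunk is the page title, and the function names are all hypothetical. The element names follow the general shape of a MediaWiki XML dump, but a real import may need further elements or attributes, so compare the result with a file produced by Special:Export on the target wiki.

```python
import xml.etree.ElementTree as ET

# Hypothetical separator line; pick any string that cannot occur in the real text.
SEPARATOR = "=====PAGEBREAK====="


def split_pages(wikitext, separator=SEPARATOR):
    """Split converted wikitext into (title, body) pairs.

    Assumption: the first non-empty line of each chunk is the page title.
    """
    pages = []
    for chunk in wikitext.split(separator):
        lines = chunk.strip().splitlines()
        if not lines:
            continue  # skip empty chunks, e.g. before a leading separator
        title = lines[0].strip()
        body = "\n".join(lines[1:]).strip()
        pages.append((title, body))
    return pages


def build_import_xml(pages):
    """Build a minimal <mediawiki> document with one <page> per entry."""
    root = ET.Element("mediawiki")
    for title, body in pages:
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        ET.SubElement(revision, "text").text = body
    # ElementTree escapes &, < and > in the page text automatically.
    return ET.tostring(root, encoding="unicode")


sample = "Page one\nFirst text.\n=====PAGEBREAK=====\nPage two\nSecond text.\n"
pages = split_pages(sample)
xml_doc = build_import_xml(pages)
```

Building the file with an XML library rather than by string concatenation means that characters such as < and & in the wiki text are escaped correctly.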


LibreOffice/OpenOffice

The main automatic converter we used was OpenOffice 3.1.1. This is not the current version: in the current version the MediaWiki export has officially been moved to an extension, but I did not find that extension. In OO 3.1.1, the export is in the File menu under Export; at the bottom of the dialog box, select "Mediawiki" instead of the default PDF.

According to http://www.libreoffice.org/features/writer/ and http://www.libreoffice.org/features/extensions/ the wiki export may now be part of the system. However, it may no longer export a text file and may instead write directly to an online wiki. This has advantages (it is fast) and disadvantages (the export usually needs some post-processing).

Experience with a current OpenOffice version, or links to instructions for installing the export there again, are welcome!

I regularly use the OpenOffice extension wikipublisher.oxt. You can download it at Formavia (http://www.formavia.fr/wiki/index.php/Aide:OpenOffice_et_Mediawiki; I only have the address in French). I agree that many nowiki tags appear; I don't know why. Michel Chauvet (talk) 11:28, 24 March 2014 (CET).

The automatic converter is very useful for tables and for much of the formatting. It inserts <nowiki> tags too often, so I prefer to remove ALL <nowiki></nowiki> tags and then add back the few that are truly needed by hand.
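
Removing the tags can be done with any search-and-replace tool. A minimal Python sketch follows, reading "ALL" as every opening, closing, or self-closing nowiki tag; that reading, and the function name, are assumptions rather than something these notes prescribe:

```python
import re

# Matches <nowiki>, </nowiki> and the self-closing <nowiki/> or <nowiki />.
NOWIKI_TAG = re.compile(r"</?nowiki\s*/?>", re.IGNORECASE)


def strip_nowiki(wikitext):
    """Remove every nowiki tag but keep the text between the tags."""
    return NOWIKI_TAG.sub("", wikitext)


cleaned = strip_nowiki(
    "A <nowiki>plain</nowiki> word and an empty <nowiki></nowiki>pair."
)
```

Any text that really needs protection from wiki parsing loses it as well, so the result should be checked and the few necessary tags restored manually.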

Unfortunately, the converter does not convert small caps, colored text, or colored backgrounds. Therefore, as a first step (before using OpenOffice), some conversions are done manually in MS Word.

You can also export your file to HTML and use the command-line tool pandoc (https://pandoc.org) to convert it to MediaWiki markup:

pandoc --from html --to mediawiki myfile.html > myfile.wiki

MS Word: Manual search and replace

Generally important steps: Search for uses of ^l (the manual line break) and consider replacing them with ^p (the paragraph mark). Replace every ^p that is formatted as italic, bold, or small caps with a ^p in the default paragraph style (not bold, not italic, not small caps).

The special find and replace symbols like ^p ^l ^& are specific to Microsoft Word and will not work in OpenOffice. If you have equivalent instructions for OpenOffice, please help by providing them here.

Smallcaps: Given a Template:Smallcaps that renders text in small capital letters, we use MS Word to replace as follows:

find-text: (empty)
find-formatting: smallcaps
replace with text: {{Smallcaps|^&}}
replace-formatting: not smallcaps.

^& is the Word placeholder that inserts the found text back into the replacement text.

Although the automatic converter (see above) converts italics, bold, and underline, with big documents it can be advantageous to do this manually as well. For example, by replacing italics with <i> and </i>, one can fix spacing errors that are frequently invisible in Word documents (spaces at the start or end of a formatted run, or standing alone, that are themselves bold or italic). The advantage of <i> and </i> over the default double apostrophes is that they are directional, so after replacing:

find-text: (empty)
find-formatting: italics
replace with text: <i>^&</i>
replace-formatting: not italics

One can run some cleanup:

"<i> " with " <i>"
"<i>^t" with "^t<i>"
" </i>" with "</i> "
"^t</i>" with "</i>^t"
"<i></i>" with ""
"  " with " " (two blanks with one)
"</i> <i>" with " "
"</i>^t<i>" with " "


The same is done for bold, underline, and if necessary superscript/subscript.
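
If the text has already left Word, the same cleanup can be approximated outside it. The following Python sketch applies the rules listed above in the same order, writing Word's ^t as a tab character. The function name and the `tag` parameter are hypothetical, and repeating the rules until nothing changes is an assumption (in Word one would press "Replace All" again as needed).

```python
def tidy_tag_spacing(text, tag="i"):
    """Apply the search-and-replace cleanup listed above to one inline tag."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    # (search, replacement) pairs, in the order given in the notes.
    rules = [
        (open_tag + " ", " " + open_tag),     # move leading blank out of the tag
        (open_tag + "\t", "\t" + open_tag),   # same for a leading tab
        (" " + close_tag, close_tag + " "),   # move trailing blank out of the tag
        ("\t" + close_tag, close_tag + "\t"), # same for a trailing tab
        (open_tag + close_tag, ""),           # drop empty tag pairs
        ("  ", " "),                          # two blanks become one
        (close_tag + " " + open_tag, " "),    # merge runs split by a blank
        (close_tag + "\t" + open_tag, " "),   # merge runs split by a tab (becomes a blank, as in the notes)
    ]
    previous = None
    while previous != text:  # repeat until no rule changes anything
        previous = text
        for old, new in rules:
            text = text.replace(old, new)
    return text


italic_fixed = tidy_tag_spacing("<i> word </i>")
bold_fixed = tidy_tag_spacing("<b>x</b>\t<b>y</b>", tag="b")
```

Calling the function once per tag (i, b, u, sup, sub) covers the other formats. Note that the two-blanks rule applies to the whole text, not only to the tagged parts.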

MS Word: Converter Extension

See http://en.wikipedia.org/wiki/Help:WordToWiki