Converting Word to Mediawiki text

From Biowikifarm Metawiki

Revision as of 19:09, 5 January 2011

Introduction

These notes are publicly visible but were created mostly for internal use. Use them at your own risk; contributions are, of course, most welcome!

Conversion is usually most effective on a single large document. If there are many pages, try to combine them into one document, with a unique separator between the pages. The separator makes it possible to later convert the result into an XML import file, which can import hundreds of pages in a single operation.
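The separator approach can be sketched in Python. This is a minimal sketch under stated assumptions. The separator convention is hypothetical: a line starting with #PAGE# followed by the page title. The output is only a bare skeleton of the MediaWiki XML export format, and namespace version 0.10 is assumed. Depending on the MediaWiki version, Special:Import may expect further elements (for example a timestamp or contributor), so compare the result with a file produced by Special:Export on your own wiki.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical separator convention: each page starts with a line "#PAGE# Title".
SEPARATOR = re.compile(r"^#PAGE#\s*(.+?)\s*$", re.MULTILINE)

def split_pages(text):
    """Split the combined wiki text into (title, body) pairs.

    Any text before the first separator line is ignored.
    """
    matches = list(SEPARATOR.finditer(text))
    pages = []
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages.append((m.group(1), text[start:end].strip("\n")))
    return pages

def build_import_xml(pages):
    """Build a minimal import document (export schema 0.10 assumed)."""
    root = ET.Element("mediawiki", xmlns="http://www.mediawiki.org/xml/export-0.10/")
    for title, body in pages:
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        ET.SubElement(revision, "text").text = body  # ElementTree escapes & and <
    return ET.tostring(root, encoding="unicode")
```

For example, `build_import_xml(split_pages(converted_text))` returns one string that can be saved as an .xml file and uploaded via Special:Import.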


OpenOffice

The main automatic converter we use is OpenOffice 3.1.1. This is not the current version: in the current version the MediaWiki export has officially been moved to an extension, but I did not find that extension. In OO 3.1.1 the export is in the File menu under Export; at the bottom of the dialog box, select "Mediawiki" instead of the default PDF.

Reports from users of a current OpenOffice version, or links explaining how to install the export there again, are welcome!

The automatic converter is very useful for tables and most formatting. However, it inserts <nowiki> tags too often, so I prefer to remove ALL <nowiki></nowiki> tags and then manually add back the few that are truly needed.
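The blanket removal of <nowiki> tags can also be scripted on the exported wiki text. The following is a minimal Python sketch. It assumes the export may also contain the self-closing form <nowiki/>, which is not confirmed for the OpenOffice converter, and strips that form too. It also reports how many tags were removed, which gives a rough idea of how many might need to be restored by hand.

```python
import re

# Matches <nowiki>, </nowiki> and the self-closing <nowiki/> (case-insensitive).
NOWIKI_TAG = re.compile(r"</?nowiki\s*/?>", re.IGNORECASE)

def strip_nowiki(wikitext):
    """Remove every nowiki tag but keep the enclosed text.

    Returns the cleaned text and the number of tags removed.
    """
    cleaned, count = NOWIKI_TAG.subn("", wikitext)
    return cleaned, count
```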

Unfortunately, the converter does not convert smallcaps, colored text, or colored backgrounds. Therefore, as a first step (before using OpenOffice), some conversions are done manually in MS Word.

MS Word

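Generally important steps: search for uses of ^l (the new line character, i.e. a manual line break) and consider replacing them with ^p (the paragraph mark). Also replace any ^p formatted as italic, bold, or smallcaps with ^p in the default paragraph style (not bold, italic, or smallcaps). Otherwise the formatting-based replacements below would also wrap paragraph marks in markup.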

The special find and replace symbols like ^p, ^l, and ^& are specific to Microsoft Word and will not work in OpenOffice. If you have equivalent instructions for OpenOffice, please help by adding them here.

Smallcaps: Given a Template:Smallcaps that renders text in small capital letters, we use MS Word to replace as follows:

find-text: (empty)
find-formatting: smallcaps
replace with text: {{Smallcaps|^&}}
replace-formatting: not smallcaps.

^& is the Word placeholder that inserts the found text into the replacement text.
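After this replacement, one pitfall is worth checking. MediaWiki treats text before an "=" inside a template argument as a parameter name, and treats a "|" as the start of a second parameter. As a result, {{Smallcaps|a=b}} would not pass "a=b" to the template as its first parameter. The Python sketch below lists such calls in the exported text; the function name is illustrative only. An "=" can be handled by writing {{Smallcaps|1=a=b}}. A "|" must be escaped, for example with {{!}} where the wiki supports it.

```python
import re

# Matches {{Smallcaps|...}} calls as produced by the Word replacement above.
SMALLCAPS = re.compile(r"\{\{Smallcaps\|(.*?)\}\}", re.DOTALL)

def risky_smallcaps(wikitext):
    """Return arguments containing "=" or "|", which MediaWiki would not
    pass through as one plain positional parameter."""
    return [m.group(1) for m in SMALLCAPS.finditer(wikitext)
            if "=" in m.group(1) or "|" in m.group(1)]
```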

Although the automatic converter (see above) converts italics, bold, and underline, with big documents it can be advantageous to do this manually as well. For example, replacing italics with <i> and </i> lets one fix spacing errors that are frequently invisible in Word documents: a space at the start or end of a formatted run, or a lone space, being bold or italic. The advantage of <i> and </i> over the default double apostrophes is that they are directional (opening vs. closing), so after replacing:

find-text: (empty)
find-formatting: italics
replace with text: <i>^&</i>
replace-formatting: not italics

one can run some cleanup replacements:

"<i> " with " <i>"
"<i>^t" with "^t<i>"
" </i>" with "</i> "
"^t</i>" with "</i>^t"
"<i></i>" with ""
"  " with " " (two blanks with one)
"</i> <i>" with " "
"</i>^t<i>" with " "
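Once the text is out of Word, for example in the exported wiki text, the same cleanup can be scripted. The following is a minimal Python sketch. It applies the replacements above in the listed order, with ^t written as a tab character. It repeats the whole list until nothing changes, much like pressing Replace All again until Word finds no more matches. The sketch follows the list literally, including replacing "</i>^t<i>" with a single space.

```python
# The cleanup replacements listed above, in the same order; Word's ^t is "\t".
ITALIC_CLEANUP = [
    ("<i> ", " <i>"),
    ("<i>\t", "\t<i>"),
    (" </i>", "</i> "),
    ("\t</i>", "</i>\t"),
    ("<i></i>", ""),
    ("  ", " "),          # two blanks with one
    ("</i> <i>", " "),
    ("</i>\t<i>", " "),   # as listed above; use "\t" instead if tabs must survive
]

def clean_italics(text):
    """Apply all rules repeatedly until the text no longer changes."""
    while True:
        before = text
        for find, replacement in ITALIC_CLEANUP:
            text = text.replace(find, replacement)
        if text == before:
            return text
```

Repeating the list matters for runs of spaces: a single pass of "  " with " " turns four blanks into two, not one.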


The same is done for bold, underline, and if necessary superscript/subscript.
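Each formatting type is handled in a separate pass. As a result, formatting that only partly overlaps in Word (for example, an italic run that overlaps a bold run) may produce crossing tags such as <i>a<b>b</i>c</b>, which MediaWiki may not render as intended. The following is a minimal Python sketch that checks tag nesting line by line with a stack. The tag set (i, b, u, sup, sub) and the function names are assumptions chosen for illustration. Checking per line fits MediaWiki, where each line of text forms a paragraph.

```python
import re

# Tags produced by the formatting passes (italic, bold, underline, super/subscript).
TAG = re.compile(r"<(/?)(i|b|u|sup|sub)>")

def nesting_problems(line):
    """Return crossing, stray, or unclosed tags found in one line."""
    stack, problems = [], []
    for m in TAG.finditer(line):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append(name)
        elif name not in stack:
            problems.append(f"stray </{name}>")
        else:
            if stack[-1] != name:
                problems.append(f"</{name}> crosses open <{stack[-1]}>")
            # Close the innermost matching open tag (normally the top of the stack).
            index = len(stack) - 1 - stack[::-1].index(name)
            del stack[index]
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems

def check_text(text):
    """Yield (line_number, problems) for every line with nesting problems."""
    for number, line in enumerate(text.splitlines(), start=1):
        problems = nesting_problems(line)
        if problems:
            yield number, problems
```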