Difference between revisions of "Excel to SDD converter"

From Biowikifarm Metawiki
Jump to: navigation, search
(Porting page from http://www.keytonature.eu/wiki/Excel_to_SDD_converter - with full list of earlier contributors)
 
(Removing the list of earlier contributors to make it more readable, please refer to the first version to see all authors)
Line 1: Line 1:
The following text is originally from http://www.keytonature.eu/wiki/Excel_to_SDD_converter and published under the same Creative Commons by-sa license. It is now ported to biowikifarm to centralize help files.
 
 
The earlier contributors (http://www.keytonature.eu/w/index.php?title=Excel_to_SDD_converter&action=history) are:
 
* 09:18, 29 October 2010 WikiSysop (9,669 bytes)
 
* 12:09, 18 September 2010 Gregor Hagedorn (9,594 bytes) (→Metadata rows)
 
* 10:49, 16 September 2010 Gregor Hagedorn (9,532 bytes) (→Metadata rows)
 
* 10:48, 16 September 2010 Gregor Hagedorn (9,522 bytes) (→Metadata rows)
 
* 10:20, 16 September 2010 Gregor Hagedorn (9,505 bytes)
 
* 22:36, 13 September 2010 AndreiHomodi m (9,328 bytes) (→Bracketed-Image-Method)
 
* 22:36, 13 September 2010 AndreiHomodi m (9,403 bytes) (→Character columns)
 
* 14:03, 9 September 2010 Sabine von Mering (9,434 bytes) (→Bracketed-Image-Method)
 
* 10:30, 9 September 2010 Gregor Hagedorn (9,318 bytes)
 
* 10:28, 8 September 2010 Sabine von Mering (8,059 bytes) (→Character columns)
 
* 13:19, 3 September 2010 Sabine von Mering m (7,663 bytes) (-typo)
 
* 23:13, 22 August 2010 AndreiHomodi (7,664 bytes) (→Taxon columns)
 
* 23:13, 22 August 2010 AndreiHomodi (7,663 bytes) (→Metadata rows)
 
* 09:36, 21 August 2010 Gregor Hagedorn (7,676 bytes) (→Metadata rows)
 
* 09:32, 21 August 2010 Gregor Hagedorn (7,535 bytes) (→Metadata rows)
 
* 00:09, 21 August 2010 AndreiHomodi (7,441 bytes) (→Metadata rows)
 
* 00:07, 21 August 2010 AndreiHomodi (7,391 bytes) (→Character columns)
 
* 23:55, 20 August 2010 AndreiHomodi (6,791 bytes) (→Taxon columns)
 
* 20:38, 14 July 2010 AndreiHomodi (6,667 bytes) (→Character columns)
 
* 20:36, 14 July 2010 AndreiHomodi (6,612 bytes) (→Metadata rows)
 
* 15:10, 14 July 2010 Gregor Hagedorn (6,659 bytes)
 
* 10:12, 14 July 2010 AndreiHomodi (6,376 bytes)
 
* 10:01, 14 July 2010 AndreiHomodi (6,176 bytes)
 
* 07:53, 6 July 2010 Gregor Hagedorn (6,015 bytes)
 
* 10:04, 2 July 2010 Gregor Hagedorn (4,822 bytes)
 
* 09:43, 2 July 2010 Sabine von Mering (4,864 bytes)
 
* 18:10, 1 July 2010 Gregor Hagedorn (4,811 bytes)
 
* 17:56, 1 July 2010 Gregor Hagedorn (4,290 bytes) (new)
 
 
 
 
To allow the creation of matrix keys without the need for complicated technical overhead, UTCN (Prof. M. Giurgiu and A. Homodi) has created a converter from Excel to a format that can be used by identification software (the SDD format).
 
To allow the creation of matrix keys without the need for complicated technical overhead, UTCN (Prof. M. Giurgiu and A. Homodi) has created a converter from Excel to a format that can be used by identification software (the SDD format).
  
 
To allow a conversion, the Excel spreadsheet must have the following form:
 
To allow a conversion, the Excel spreadsheet must have the following form:
 
+
* The key must be on the first sheet in an Excel file (regardless of name).
The key must be on the first sheet in an Excel file (regardless of name).
+
* The first row contains column headings, all further rows (until the first blank row) contain organism names and characters for the organisms.
 
+
* The first column always contains the organism names (= also known as "taxon" names). For a simple key, this may be enough, and all following columns may already contain characters (see "Character columns" further down). However, it is possible to have up to 5 taxon-related columns for scientific name, synonyms, taxon images, and taxon-web-page links. This is explained in detail under "Taxon columns" below.
The first row contains column headings, all further rows (until the first blank row) contain organism names and characters for the organisms.
+
* After a blank row, at the end of the spreadsheet, the first two columns may contain a number of general information about the key. See "Metadata rows" below.
 
+
The first column always contains the organism names (= also known as "taxon" names). For a simple key, this may be enough, and all following columns may already contain characters (see "Character columns" further down). However, it is possible to have up to 5 taxon-related columns for scientific name, synonyms, taxon images, and taxon-web-page links. This is explained in detail under "Taxon columns" below.
+
 
+
After a blank row, at the end of the spreadsheet, the first two columns may contain a number of general information about the key. See "Metadata rows" below.
+
  
 
==Taxon columns==
 
==Taxon columns==

Revision as of 06:08, 22 June 2011

To allow the creation of matrix keys without the need for complicated technical overhead, UTCN (Prof. M. Giurgiu and A. Homodi) has created a converter from Excel to a format that can be used by identification software (the SDD format).

To allow a conversion, the Excel spreadsheet must have the following form:

  • The key must be on the first sheet in an Excel file (regardless of name).
  • The first row contains column headings, all further rows (until the first blank row) contain organism names and characters for the organisms.
  • The first column always contains the organism names (= also known as "taxon" names). For a simple key, this may be enough, and all following columns may already contain characters (see "Character columns" further down). However, it is possible to have up to 5 taxon-related columns for scientific name, synonyms, taxon images, and taxon-web-page links. This is explained in detail under "Taxon columns" below.
  • After a blank row, at the end of the spreadsheet, the first two columns may contain a number of general information about the key. See "Metadata rows" below.

Taxon columns

  1. The first column in the spreadsheet must always be the primary name of the organism (= taxon). If this is the only name column, it may be either the common or scientific name (but see below).
    • The language must be indicated in brackets (=[]) after the column heading (= in the first row). Example: the column heading could be "Name [de]" for names in German.
  2. If any of the 2nd to 4th column is labeled "Scientific Name", "scientific name", or "Wissenschaftlicher Name", it will be considered the scientific name of the organism. The first column will be the common name.
  3. If any of the 2nd to 4th column is labeled "Images", "images", "Image", "image", "Bilder" or "Bild", it will be considered to contain a list of organism (taxon) images. Image references starting with http:// are direct web links, those starting with "File:" or "Datei:" are internal links to files. Each link must be written between double square brackets (e.g. [[File:...]]). No delimiter is needed for multiple images.
  4. If any of the 2nd to 4th column is labeled "URL" or "url", it will be considered a link to a species page. Only links to web pages are supported (e.g. http://...) and each link must be written in double square brackets ([[http://...]]).

Note: synonyms and taxon hierarchies (supported in SDD) are presently not supported by the converter. If you need synonyms, you may want to add them to the scientific name column, like "Aus bus (syn. Cus bus)".

Character columns

All further columns create the character matrix. The character matrix may start in column 2, but then none of the columns 2-5 may be labeled with names reserved for taxon columns.

Characters may be quantitative (like counts or lengths) or categorical (like "round", "elliptical", or "red", "blue", etc.). The name of the character is always given in the first row (i.e., in the column heading).

  • For any kind of character, a character image (i.e., an image that applies to all states of the character, e.g., for leaf length a diagram to explain how measure it (incl. or excl. the petiole)), may be given after the character name using the Bracketed-Image-Method (see below).
  • For quantitative characters, a measurement unit (like mm, cm, mg, etc.) may be given after the character name (i.e., in the first row) in single square brackets. Example: "Leaf length [mm]".
  • Characters can also be grouped by adding a group name. In the character list, the groups are then shown first, with the contained characters expandable for each group. Only a single grouping level (hierarchy) is presently supported by the converter.
    • Method A: The group for which the character belongs is written in double curly brackets after the character name (e. g., Petal Color {{Flowers}}).
    • Method B (preferred): The groups are in a second row below the character names. This group is recognized by the Words "Char.Group", "Chargroup" or "Merkmalsgruppe" in the first column (the one normally containing the organism/taxon names).

For each organism/taxon row, the states are listed directly in the cell under the character heading. Example: "red" under the heading "flower color" in the row corresponding to a plant with red flowers.

  • Some organisms have more than one state for a character, e.g., for organism:daisy states for heading: "flower color" may be "white; yellow". Character states in a single cell may be semicolon separated.
  • If no data is available this may be indicated by a question mark or by simply leaving the cell empty. (SDD: empty cell result in no output in SDD at all, "?" results in character with status="DataUnavailable")
  • If a character is inapplicable for a certain taxon (e.g. "stipule form" for taxa without stipules) this may be indicated by using the abbreviations "n.a." OR "N/A" OR "n/a". (SDD: maps to status="DataUnavailable")
  • State images, illustrating a particular character state may be added after the character state label using the Bracketed-Image-Method (see below).

Each state may bear a comment at the beginning or end, surrounded ("DELTA-like") by "<>". Comments at the beginning are created as modifiers, comments at the end as remarks/annotations. Example: the text in a cell may be: "<rarely>red<especially immediately after anthesis>;<usually>blue"

A character may have more than one state in a given taxon. The creator of the data can create 2 or more excel columns with identical character labels. The converter will recognize this and merge the states into a single character.

Notes: 1. comma and semicolon may occur inside the <> without splitting into two states. 2. single > or < may occur in state labels, as in: "<mostly>sepal > petal length<a comment>". Search modifier and comment from right and left.


Bracketed-Image-Method

This method is used for character and character state images. After the character or character state label, the image name and image captions are added in double square brackets, with a vertical bar ("|") in front of the caption. A URL linking to additional file information (author, licence etc.) may be present after another vertical bar ("|").

Images may be stored anywhere on the Internet or on the Wiki. In the first cases the name must start with "http://", else a wiki name is assumed. Note that re-using images from other servers is often a violation of copyright, unless you own the image yourself. The use of wiki images is therefore easier, because the license is present and documented. It is either required or good practice to attribute any image authors different from the key authors in the image caption.

Example: An Excel cell may contain a state label followed by two images: "blue [[File:Blue campanula flower.jpg | Blue flower (Campanula), (c) X. Smith, cc-by-sa 3.0]][[http://x/x.jpg | caption for second image]]".

Whereas character labels appear only a single time (first row), state labels may appear multiple times. The image that becomes the state image is the first state label with an image from the top. It is permissible (but not required) to add the image each time. However, if different images are used with the same state label in the same character, only the first set of images is accepted, all later ones are ignored.

Note that multiple images are separated just by their double brackets, not by any additional "," or ";" characters.

Metadata rows

Below the main matrix table, a blank row and a second row containing the text [METADATA] indicates the start of key-wide metadata. The first column contains a field name, the second the value. The following fields are recommended:

[METADATA]
Title: A key to XXX
Creators: A. First; B. Second
Description: Free form text, including information about source of information, literature used, geographic or taxonomic scope, etc. (Mapping to SDD:Dataset/Representation/Detail)
Language: en
Copyright: (c) A. First and B. Second 2010
License: Creative commons by-sa 3.0

Note: The keywords are recognized error tolerant with and without the colon, i.e. both Title: and Title are accepted.


Things to do:

Ideally, the conversion process would create an xml file and also a simple template to copy into a wiki page, so that after uploading the xml file to the wiki, it would only be necessary to paste this into a wiki page to get a running key.