Difference between revisions of "Excel to SDD converter"
m (→Character columns: Wording import for Artenquiz as blockquote) |
m (→Wording import for Artenquiz) |
||
Line 61: | Line 61: | ||
* CharacterState.wording = blue | * CharacterState.wording = blue | ||
− | '''table-cell/attribute''' e.g. “<<span style="color:blue">mostly</span>>green<<span style="color:red">only during spring time</span>>”: | + | '''table-cell/attribute''' e.g. “<<span style="color:blue">mostly</span>>green<<span style="color:red">only during spring time</span>>”:<br/>(see also [[{{TALKPAGENAME}}#modifier_import|the talk about import modifier]]) |
* <span style="color:blue">mostly</span> → Modifier | * <span style="color:blue">mostly</span> → Modifier | ||
* <span style="color:red">only during spring time</span> → note/comment e.g. in CategoricalCharacterData.notes | * <span style="color:red">only during spring time</span> → note/comment e.g. in CategoricalCharacterData.notes |
Revision as of 10:21, 30 April 2012
To allow the creation of matrix keys without the need for complicated technical overhead, UTCN (Prof. M. Giurgiu and A. Homodi) has created a converter from Excel to a format that can be used by identification software (the SDD format).
To allow a conversion, the Excel spreadsheet must have the following form:
- The key must be on the first sheet in an Excel file (regardless of name).
- The first row contains column headings, all further rows (until the first blank row) contain organism names and characters for the organisms.
- The first column always contains the organism names (= also known as "taxon" names). For a simple key, this may be enough, and all following columns may already contain characters (see "Character columns" further down). However, it is possible to have up to 5 taxon-related columns for scientific name, synonyms, taxon images, and taxon-web-page links. This is explained in detail under "Taxon columns" below.
- After a blank row, at the end of the spreadsheet, the first two columns may contain a number of general information about the key. See "Metadata rows" below.
Contents
Taxon columns
- The first column in the spreadsheet must always be the primary name of the organism (= taxon). If this is the only name column, it may be either the common or scientific name (but see below).
- The language must be indicated in brackets (=[]) after the column heading (= in the first row). Example: the column heading could be "Name [de]" for names in German.
- If any of the 2nd to 4th column is labeled "Scientific Name", "scientific name", or "Wissenschaftlicher Name", it will be considered the scientific name of the organism. The first column will be the common name.
- If any of the 2nd to 4th column is labeled "Images", "images", "Image", "image", "Bilder" or "Bild", it will be considered to contain a list of organism (taxon) images. Image references starting with http:// are direct web links, those starting with "File:" or "Datei:" are internal links to files. Each link must be written between double square brackets (e.g. [[File:...]]). No delimiter is needed for multiple images.
- If any of the 2nd to 4th column is labeled "URL" or "url", it will be considered a link to a species page. Only links to web pages are supported (e.g. http://...) and each link must be written in double square brackets ([[http://...]]).
Note: synonyms and taxon hierarchies (supported in SDD) are presently not supported by the converter. If you need synonyms, you may want to add them to the scientific name column, like "Aus bus (syn. Cus bus)".
Character columns
All further columns create the character matrix. The character matrix may start in column 2, but then none of the columns 2-5 may be labeled with names reserved for taxon columns.
Characters may be quantitative (like counts or lengths) or categorical (like "round", "elliptical", or "red", "blue", etc.). The name of the character is always given in the first row (i.e., in the column heading).
- For any kind of character, a character image (i.e., an image that applies to all states of the character, e.g., for leaf length a diagram to explain how measure it (incl. or excl. the petiole)), may be given after the character name using the Bracketed-Image-Method (see below).
- For quantitative characters, a measurement unit (like mm, cm, mg, etc.) may be given after the character name (i.e., in the first row) in single square brackets. Example: "Leaf length [mm]".
- Characters can also be grouped by adding a group name. In the character list, the groups are then shown first, with the contained characters expandable for each group. Only a single grouping level (hierarchy) is presently supported by the converter.
- Method A: The group for which the character belongs is written in double curly brackets after the character name (e. g.,
Petal Color {{Flowers}}
). - Method B (preferred): The groups are in a second row below the character names. This group is recognized by the Words "Char.Group", "Chargroup" or "Merkmalsgruppe" in the first column (the one normally containing the organism/taxon names).
- Method A: The group for which the character belongs is written in double curly brackets after the character name (e. g.,
For each organism/taxon row, the states are listed directly in the cell under the character heading. Example: "red" under the heading "flower color" in the row corresponding to a plant with red flowers.
- Some organisms have more than one state for a character, e.g., for organism:daisy states for heading: "flower color" may be "white; yellow". Character states in a single cell may be semicolon separated.
- If no data is available this may be indicated by a question mark or by simply leaving the cell empty. (SDD: empty cell result in no output in SDD at all, "?" results in character with status="DataUnavailable")
- If a character is inapplicable for a certain taxon (e.g. "stipule form" for taxa without stipules) this may be indicated by using the abbreviations "n.a." OR "N/A" OR "n/a". (SDD: maps to status="DataUnavailable")
- State images, illustrating a particular character state may be added after the character state label using the Bracketed-Image-Method (see below).
Each state may bear a comment at the beginning or end, surrounded ("DELTA-like") by "<>". Comments at the beginning are created as modifiers, comments at the end as remarks/annotations. Example: the text in a cell may be: "<rarely>red<especially immediately after anthesis>;<usually>blue"
- Notes: 1. comma and semicolon may occur inside the <> without splitting into two states. 2. single > or < may occur in state labels, as in: "<mostly>sepal > petal length<a comment>". Search modifier and comment from right and left.
A character may have more than one state in a given taxon. The creator of the data can create 2 or more excel columns with identical character labels. The converter will recognize this and merge the states into a single character.
Wording import for Artenquiz
There may be given a < … >-syntax following DELTA. See also Artenquiz documentation
character e.g. “flower <color>”:
- Character.label = flower (color)
- Character.wording = flower
state e.g. “blue <maybe purple violet>”:
- CharacterState.label = blue (maybe purple violet)
- CharacterState.wording = blue
table-cell/attribute e.g. “<mostly>green<only during spring time>”:
(see also the talk about import modifier)
- mostly → Modifier
- only during spring time → note/comment e.g. in CategoricalCharacterData.notes
Bracketed-Image-Method
This method is used for character and character state images. After the character or character state label, the image name and image captions are added in double square brackets, with a vertical bar ("|") in front of the caption. A URL linking to additional file information (author, licence etc.) may be present after another vertical bar ("|").
Images may be stored anywhere on the Internet or on the Wiki. In the first cases the name must start with "http://", else a wiki name is assumed. Note that re-using images from other servers is often a violation of copyright, unless you own the image yourself. The use of wiki images is therefore easier, because the license is present and documented. It is either required or good practice to attribute any image authors different from the key authors in the image caption.
- Example: An Excel cell may contain a state label followed by two images: "blue [[File:Blue campanula flower.jpg | Blue flower (Campanula), (c) X. Smith, cc-by-sa 3.0]][[http://x/x.jpg | caption for second image]]".
Whereas character labels appear only a single time (first row), state labels may appear multiple times. The image that becomes the state image is the first state label with an image from the top. It is permissible (but not required) to add the image each time. However, if different images are used with the same state label in the same character, only the first set of images is accepted, all later ones are ignored.
Note that multiple images are separated just by their double brackets, not by any additional "," or ";" characters.
Metadata rows
Below the main matrix table, a blank row and a second row containing the text [METADATA]
indicates the start of key-wide metadata. The first column contains a field name, the second the value. The following fields are recommended:
[DATA MATRIX] | [DATA MATRIX] |
‹blank line› | ‹blank line› |
[METADATA] | |
Title: | A key to XXX |
Creators: | A. First; B. Second |
Description: | Free form text, including information about source of information, literature used, geographic or taxonomic scope, etc. (Mapping to SDD:Dataset/Representation/Detail) |
Language: | en |
Copyright: | (c) A. First and B. Second 2010 |
License: | Creative commons by-sa 3.0 |
Note: The keywords are recognized error tolerant with and without the colon, i.e. both Title: and Title are accepted.
Things to do:
Ideally, the conversion process would create an xml file and also a simple template to copy into a wiki page, so that after uploading the xml file to the wiki, it would only be necessary to paste this into a wiki page to get a running key.