Batch importing media files into MediaWiki

Note: This page is about importing binary files such as images from the command line. See also: [[Mediawiki XML page importing]] (i.e. importing page text using an XML format).
  
==Importing binary files manually (maintenance importImages.php)==
  
{{Ombox
|text=It is important to set the owner of all import media files to www-data (<code>sudo chown -R www-data:www-data ./my-import-media-folder</code>). If a file's owner is root, no image scaling is possible.
|type=content
}}
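To verify the ownership afterwards, <code>find</code>'s <code>-user</code> test can list any offending files. A runnable sketch using a temporary demo folder; for a real import, point it at the media folder and test against <code>www-data</code>:

```shell
# list files NOT owned by the expected user; no output means the chown is complete
mkdir -p /tmp/chown-demo && touch /tmp/chown-demo/img.jpg   # demo folder, owned by us
find /tmp/chown-demo ! -user "$(id -un)" -print             # prints nothing: we own everything
# for the real import folder:  find ./my-import-media-folder ! -user www-data -print
```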
 
<blockquote>
<source lang="bash">
#!/bin/bash
# USAGE: php importImages.php [options] <dir>
# options:
# --comment=<text>      Set upload summary comment, default 'Importing image file'
# --comment-file=<file> Set upload summary comment to the content of <file>
# --comment-ext=<ext>   Set extension for comment file
# --dry                 Dry run, don't import anything (*but create the page*)
# --overwrite           Overwrite existing images with the same name (default is to skip them)
# --user=<username>     Set username of uploader, default 'Maintenance script'
# ... some more options
#######################################
# run php ./maintenance/importImages.php as user www-data
#   with images from /var/www/v-species/o/my-import-media-folder and log it to
#   /var/www/v-species/o/my-import-media.log
#######################################
# step 0: prepare your media and store them in a temporary folder
#         in wiki openmedia (/var/www/v-species/o/my-import-media-folder)
#         make sure the file names are informative, e.g. “what” and “from whom”:
#         “Zygiobia carpini Loew, 1874 on Carpinus betulus (Michal Maňas, 2013).jpg”
#         set the owner of all import media files to www-data:
#         cd /var/www/v-species/o/my-import-media-folder && sudo chown -R www-data:www-data ./
#######################################
# step 1: go to the wiki root (here: openmedia)
# hint: in bash a trailing \ continues the command over multiple lines
#######################################
cd /var/www/v-species/o
sudo -u www-data php ./maintenance/importImages.php --conf ./LocalSettings.php --comment="{{Provider XX ZZZNAME}}
{{Collection XX ZZZNAME}}
{{Metadata
 | Type  = StillImage
 | Title         =
 | Description   = {{Metadata Description de|1= }}{{en|1=}}
 | Locality      =
 | Identified By =
 | Subject Sex   = female
 | Scientific Names =
 | Common Names     =
 | Language         = zxx
 | Creators         = XX ZZZNAME
 | Subject Category = Amphibia
 | General Keywords =
}}" --user="XX ZZZNAME" /var/www/v-species/o/my-import-media-folder > /var/www/v-species/o/my-import-media.log

# update indices / job queue
cd /var/www/v-species/o
# optionally rebuild the links and indices used for searching your site:
# sudo -u www-data php ./maintenance/rebuildall.php --dbuser wikiadmin --conf ./LocalSettings.php
# manually force the job queue to run:
sudo -u www-data php ./maintenance/runJobs.php --dbuser wikiadmin --conf ./LocalSettings.php --procs=3
</source>
</blockquote>
If no comment is given, MediaWiki inserts the default comment "Importing image file". The default user, if none is given, is "Maintenance script".

The comment can contain REAL line breaks, since the text is inside double quotes, but the comment ''is inserted on the page '''only the first time''''', at import. You ''cannot overwrite'' page content via <code>importImages.php --overwrite</code>; in that case you must [[Mediawiki XML page importing|create an XML import file]].

To reduce server load, one may add the sleep option (time in seconds between files), e.g. <code>--sleep=2</code>.

It is advisable to redirect output to a file (here <code>> my-import-media.log</code>) to be able to check the import success carefully. To check script progress, one can watch the size of the output file (open a second SSH shell) to see whether the import has stopped.

For user names the name, not the ID, counts. If the name does not exist yet, the file will be imported nevertheless and the name stored like an IP number.
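Checking progress via the output-file size can be scripted; a runnable sketch with a demo log in /tmp (for a real run, substitute the my-import-media.log path used above):

```shell
# does the log file still grow? stat -c %s prints the size in bytes;
# run it twice with a pause — a growing size means the import is still running
logfile=/tmp/demo-import.log                     # stand-in for my-import-media.log
echo "File:Example.jpg imported" > "$logfile"    # demo content
stat -c %s "$logfile"                            # size in bytes
```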
  
By default, the XML importing version of the web interface limits file sizes to around 1.4 MB. This can be changed by the server admin via the upload limits in php.ini; for larger imports, and to prevent timeouts, use the command line interface described above.
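The upload limits mentioned above live in two places; a config sketch (directive names are the standard PHP ones, the values are examples only):

```ini
; php.ini — limits for uploads through the web interface
upload_max_filesize = 100M
post_max_size = 100M
```

MediaWiki itself additionally caps uploads via <code>$wgMaxUploadSize</code> (in bytes) in LocalSettings.php.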

=== Import large files (e.g. ZIP) ===

<syntaxhighlight lang="bash">
# import ALL zip files in folder /tmp/DiversityGazetteer_010013/
cd /var/www/v-species/o # the wiki path (here OpenMedia http://species-id.net/openmedia/)
# HELP:
# php ./maintenance/importImages.php --help
# https://www.mediawiki.org/wiki/Manual:ImportImages.php

# --comment="..."   becomes the wiki text on the file's page
# --user="..."      a valid user name, see page Special:ListUsers of the wiki
# --extensions=     comma-separated list of allowed extensions, defaults to the wiki's $wgFileExtensions setting
sudo -u www-data php ./maintenance/importImages.php --conf ./LocalSettings.php --comment="
{{Metadata
 | Type  = Dataset
 | Title         = Dataset DiversityGazetteer
 | Description   = DiversityGazetteer is a tool to visualize places from a DiversityGazetteer database within a geographical environment.
 | Locality      =
 | Identified By =
 | Subject Sex   =
 | Scientific Names =
 | Common Names     =
 | Language         = zxx
 | Creators         =
 | Subject Category =
 | General Keywords =
}}" --user="A Correct User Name" --extensions=zip /tmp/DiversityGazetteer_010013/
</syntaxhighlight>
  
 
== Problem removing files from temporary import folder ==

A common problem after importing is that Linux is unable to delete the files in the temporary folder used for the import. A plain "rm *" or "rm -r *" terminates with the message '''Argument list too long'''. Typically, the combined length of all file names may not exceed 128 kB (this applies to all commands, not just rm), which is easily exceeded by a few thousand images with file names suitable as wiki titles. Solution:
<blockquote>
<source lang="bash">
# remove non-interactively
cd TheFolder; sudo find . -maxdepth 1 -name '*.?*' -exec rm {} ';'
# interactively, confirming each file with “rm -i”
cd TheFolder; sudo find . -name '*' -exec rm -i {} ';'
# with a pipe “|” and xargs rm
cd TheFolder; sudo find . -maxdepth 1 -name '*.?*' | xargs rm

# check what would be deleted
find . -maxdepth 1 -name '*.?*' -exec echo {} ';'
# {}  → the found file name
# ';' → terminates the -exec arguments (“+” works too but is somewhat fragile, see «man find»)
</source>
</blockquote>
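The “Argument list too long” limit quoted above is the kernel's <code>ARG_MAX</code>; you can query the actual value on your system:

```shell
# maximum combined length of command-line arguments, in bytes;
# older Linux kernels used 131072 (128 kB), current ones allow far more
getconf ARG_MAX
```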
  
 
== Writing file names in a folder under Windows to text file ==

* Open a command prompt (type cmd into the Windows Start menu command box)
* Change to the folder, switch from DOS/OEM codepage 850 to ANSI, and write all file names to the file 000.txt:
<source lang="dos">
cd TheFolder
chcp 1252
dir /w > 000.txt
</source>
* edit and process using a text editor
  
== Batch renaming files in Linux ==

Often a prefix must be added to or dropped from file names. To add a prefix "XXX_" use:
<blockquote>
<source lang="bash">
# note: a bare “sudo for …” fails, because “for” is a shell keyword, not a command
cd ../foto01; sudo bash -c 'for f in *.jpg; do mv -i "$f" "XXX_$f"; done'
</source>
</blockquote>

Should this have been executed twice, or should a prefix be removed, use rename (Perl utilities) with a Perl substitution expression (s/fromold/tonew/), like:
<blockquote>
<source lang="bash">
sudo rename 's/XXX_XXX_/XXX_/' *.jpg
</source>
</blockquote>

The util-linux-ng rename has no such Perl substitution mechanism, only simple string replacement:
<blockquote>
<source lang="bash">
rename oldstring newstring whichFiles
rename .htm .html *.htm
rename image image0 image?? # → image001, image002, ...
</source>
</blockquote>

But the same substitution can be done with a small bash script, running a for loop and piping (“|”) to sed:
<blockquote>
<source lang="bash">
################################
# replace all space characters ' ' with underscore characters '_'
# list all *.jpg but replace each ' ' with '|' → loop variable $i
# (the '|' is a placeholder for the later replacement; the original
#  file names must not contain a “|”, otherwise it is replaced too)
for i in `ls *.jpg | sed 's/ /|/g'` ; do
  # $old: the old file name (spaces restored)
  old=`echo "$i" | sed 's/|/ /g'`
  # $new: the new file name with '_'
  new=`echo "$i" | sed 's/|/_/g'`
  mv --force "$old" "$new"
done
</source>
</blockquote>
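As an aside, modern bash can do the space-to-underscore substitution without the sed placeholder trick, using the parameter expansion <code>${f// /_}</code>; a runnable sketch in a demo folder:

```shell
# rename "… with spaces.jpg" → "…_with_spaces.jpg" using pure bash
rm -rf /tmp/rename-demo && mkdir -p /tmp/rename-demo && cd /tmp/rename-demo
touch "Zygiobia carpini demo.jpg"   # a file name containing spaces
for f in *.jpg; do
  mv -- "$f" "${f// /_}"            # ${f// /_} replaces every space with '_'
done
ls                                  # → Zygiobia_carpini_demo.jpg
```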
  

== Change ownership from root to www-data ==

Thumbnails created while files were owned by root may themselves be owned by root; hand them over to www-data:
<source lang="bash">
find /var/www/v-species/*/media/thumb/     -maxdepth 3 -user root -exec sudo chown -R www-data:www-data '{}' ';'
find /var/www/v-species/*/*/media/thumb/   -maxdepth 3 -user root -exec sudo chown -R www-data:www-data '{}' ';'
find /var/www/v-species/*/*/*/media/thumb/ -maxdepth 3 -user root -exec sudo chown -R www-data:www-data '{}' ';'
</source>

== Preparing MetaData for image files from text files ==

<source lang="bash">
#!/bin/bash
# concatenate several metadata text files into a prepared MediaWiki XML import
# (the result may need manual correction; assumes iso-8859-1 source files)
# metadata files (here e.g. file.meta) are selected via filterExtension

filterExtension="*.meta"
sourceEncoding="iso-8859-1"
targetEncoding="utf-8"

userName="WikiSysop"
comment="Bot generated metadata update"

xmlWriteToFile="allmetadata_utf8.xml"
nFiles=`ls ${filterExtension} 2>/dev/null | wc -l` # number of all files

# some info
echo "Concatenate ${nFiles} files as MediaWiki import into ${xmlWriteToFile}…"
# write the header to ${xmlWriteToFile}
cat > ${xmlWriteToFile} <<HEADER
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/ http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5" xml:lang="en">
  <siteinfo>
    <sitename>OpenMedia</sitename>
    <base>http://species-id.net/openmedia/Main_Page</base>
    <generator>MediaWiki 1.18.0</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="-1" case="first-letter">Special</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
      <namespace key="3" case="first-letter">User talk</namespace>
      <namespace key="4" case="first-letter">OpenMedia</namespace>
      <namespace key="5" case="first-letter">OpenMedia talk</namespace>
      <namespace key="6" case="first-letter">File</namespace>
      <namespace key="7" case="first-letter">File talk</namespace>
      <namespace key="8" case="first-letter">MediaWiki</namespace>
      <namespace key="9" case="first-letter">MediaWiki talk</namespace>
      <namespace key="10" case="first-letter">Template</namespace>
      <namespace key="11" case="first-letter">Template talk</namespace>
      <namespace key="12" case="first-letter">Help</namespace>
      <namespace key="13" case="first-letter">Help talk</namespace>
      <namespace key="14" case="first-letter">Category</namespace>
      <namespace key="15" case="first-letter">Category talk</namespace>
      <namespace key="102" case="first-letter">Property</namespace>
      <namespace key="103" case="first-letter">Property talk</namespace>
      <namespace key="106" case="first-letter">Form</namespace>
      <namespace key="107" case="first-letter">Form talk</namespace>
      <namespace key="108" case="first-letter">Concept</namespace>
      <namespace key="109" case="first-letter">Concept talk</namespace>
      <namespace key="170" case="first-letter">Filter</namespace>
      <namespace key="171" case="first-letter">Filter talk</namespace>
      <namespace key="198" case="first-letter">Internal</namespace>
      <namespace key="199" case="first-letter">Internal talk</namespace>
      <namespace key="200" case="first-letter">Portal</namespace>
      <namespace key="201" case="first-letter">Portal talk</namespace>
      <namespace key="202" case="first-letter">Bibliography</namespace>
      <namespace key="203" case="first-letter">Bibliography talk</namespace>
      <namespace key="204" case="first-letter">Draft</namespace>
      <namespace key="205" case="first-letter">Draft talk</namespace>
      <namespace key="206" case="first-letter">Submission</namespace>
      <namespace key="207" case="first-letter">Submission talk</namespace>
      <namespace key="208" case="first-letter">Reviewed</namespace>
      <namespace key="209" case="first-letter">Reviewed talk</namespace>
      <namespace key="274" case="first-letter">Widget</namespace>
      <namespace key="275" case="first-letter">Widget talk</namespace>
    </namespaces>
  </siteinfo>
HEADER

progressInfo="."
nFile=0

for metafile in ${filterExtension}; do
  echo "<page><title>File:${metafile}</title>" >> ${xmlWriteToFile}
  date=`date +'%Y-%m-%dT%H:%M:%SZ'` # e.g. 2011-12-19T10:43:11Z
  echo "<revision><timestamp>${date}</timestamp>" >> ${xmlWriteToFile}
  echo "<contributor><username>${userName}</username></contributor><comment>${comment}</comment>" >> ${xmlWriteToFile}
  # convert the metadata to UTF-8 and use it as the page text
  text=`iconv -f ${sourceEncoding} -t ${targetEncoding} "${metafile}"`
  echo "<text xml:space='preserve'>${text}</text>" >> ${xmlWriteToFile}
  echo "</revision>" >> ${xmlWriteToFile}
  echo "</page>" >> ${xmlWriteToFile}

  nFile=$(( $nFile + 1 ))

  # progress info: 100 dots, then a line break (modulo)
  if [ $(( $nFile % 100 )) == 0 ]; then
    echo "$progressInfo"
  else
    echo -n "$progressInfo"
  fi
done
echo "</mediawiki>" >> ${xmlWriteToFile}
# some info
echo -e "\n … (done)"
</source>
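Before importing the generated file, a quick well-formedness check catches unescaped <code>&amp;</code> or <code>&lt;</code> in the metadata (the script above does not XML-escape <code>${text}</code>). A sketch assuming python3 is available; <code>xmllint --noout file.xml</code> works as well:

```shell
# well-formedness check; replace the stand-in with the real allmetadata_utf8.xml
xmlfile=/tmp/allmetadata_utf8.xml
printf '<mediawiki><siteinfo/></mediawiki>' > "$xmlfile"   # stand-in for the generated file
python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1]); print("well-formed")' "$xmlfile"   # → well-formed
```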
[[Category: Import]]
[[Category:MediaWiki]]

Latest revision as of 15:51, 2 June 2015
