Batch importing media files into MediaWiki
Note: This page is about importing binary files such as images. See also: MediaWiki XML page importing (i.e. importing the text of pages).
Importing the binary data
#!/bin/bash
# USAGE: php importImages.php [options] <dir>
# options:
#   --comment=<text>       Set upload summary comment, default 'Importing image file'
#   --comment-file=<file>  Set upload summary comment to the content of <file>
#   --dry                  Dry run, don't import anything
#   --overwrite            Overwrite existing images with the same name (default is to skip them)
#   --user=<username>      Set username of uploader, default 'Maintenance script'
#   ... some more options
#######################################
# run php ./maintenance/importImages.php as user www-data
# with images from /var/www/v-species/o/atmp_OR_adump and log it to
# /var/www/v-species/o/aaaimport.log
#######################################
cd /var/www/v-species/o
# a backslash at the end of a line continues the command on the next line
sudo -u www-data php ./maintenance/importImages.php \
  --conf ./LocalSettings.php \
  --comment="{{Provider XX ZZZNAME}}
{{Collection XX ZZZNAME}}
{{Metadata
| Type = StillImage
| Title =
| Description = {{de|1= }}{{en|1=}}
| Locality =
| Identified By =
| Subject Sex = female
| Scientific Names =
| Common Names =
| Language = zxx
| Creators = XX ZZZNAME
| Subject Category = Amphibia
| General Keywords =
}}" \
  --user="XX ZZZNAME" /var/www/v-species/o/atmp_OR_adump > /var/www/v-species/o/aaaimport.log
# ...
cd /var/www/v-species/o
# ...
php ./maintenance/runJobs.php --conf ./LocalSettings.php
If no comment is given, MediaWiki inserts the default comment "Importing image file". If no user is given, the default user is "Maintenance script".
The comment can contain real line breaks, since the text is inside double quotes. Using \n may or may not work; real line breaks are preferred.
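One likely reason why \n is unreliable: inside double quotes the shell does not turn \n into a line break but passes the two characters on unchanged. The following minimal sketch illustrates this; the helper count_lines is a hypothetical name, and the two template lines are taken from the example above.

```shell
#!/bin/bash
# Hypothetical helper: count the lines of a string exactly as the shell
# would hand it to a program (printf '%s' does not interpret escapes).
count_lines() {
    printf '%s\n' "$1" | wc -l | tr -d ' '
}

# Real line break inside double quotes
real_breaks="{{Provider XX ZZZNAME}}
{{Collection XX ZZZNAME}}"

# Backslash-n inside double quotes stays a literal backslash and an n
escaped="{{Provider XX ZZZNAME}}\n{{Collection XX ZZZNAME}}"

count_lines "$real_breaks"   # prints 2
count_lines "$escaped"       # prints 1
```

Whether the receiving program later converts a literal \n is up to that program, which is why real line breaks are the safer choice.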
To reduce server load, one may add the sleep option, time in seconds between files, e.g. --sleep=2.
It is advisable to redirect output to a file (here > aaaimport.log), to be able to carefully check import success. To check script progress one can use the size of the output file (second terminal or WinSCP) to see if import has stopped.
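Checking the size of the log file can also be scripted. The following is a minimal sketch under the assumption that a log which stops growing means the import has stopped or finished; log_size and log_is_growing are hypothetical helper names, not part of MediaWiki.

```shell
#!/bin/bash
# Hypothetical helper: print the size of a file in bytes.
log_size() {
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: succeeds (exit status 0) if the file grew while
# waiting, fails otherwise.
#   $1 = log file, $2 = seconds to wait (default 30)
log_is_growing() {
    size_before=$(log_size "$1")
    sleep "${2:-30}"
    size_after=$(log_size "$1")
    [ "$size_after" -gt "$size_before" ]
}
```

From a second terminal one could then run, for example, log_is_growing /var/www/v-species/o/aaaimport.log 60 || echo "no new output for 60 seconds". Alternatively, tail -f on the log file shows new lines as they are written.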
For user names, the name counts, not the ID. If the name does not exist yet, it will be treated as an IP address...
By default, the XML importing version of the web interface limits file sizes to around 1.4 MB. The server admin (or you, via upload_max_filesize in php.ini) can change this limit; for larger imports and to prevent timeouts, use the command line interface described above.
Depending on the file names, an alternative approach is to write a for-loop that extracts parts of each file name into bash variables, which allows more flexible imports. Writing such a script may look complicated at first:
#!/bin/bash
#######################################
# general for-loop with simple substitution ${...} see below
for myfile in *.JPG; do
  # save a variable
  fileName=${myfile%.*}
  fileExt=${myfile#*.}   # no space before "=", otherwise bash tries to run a command
  echo "$fileName"       # print it to the terminal
  echo "$fileExt"
  # do something
done

# for-loop: words
for file in a b c; do
  echo "$file copied"
done
# a copied
# b copied
# c copied

# for-loop: a sequence 001 002 003 etc.
for i in $(seq --format=%03.f 1 150); do
  echo "$i"
done

#######################################
# some bash replaces/substitutions in general
# ${parameter/pattern search/string replaced}
# example: |-> remove
# removes on the left side
longPath='./hi/structure/file.ext'
echo ${longPath#*/*}   # extracts hi/structure/file.ext
# ## → instead takes the longest match (away)
echo ${longPath##*/}   # extracts 'file.ext'
# example: remove <-|
# removes on the right side
echo ${longPath%*/*}   # extracts ./hi/structure
# %% → instead takes the longest match (away)
echo ${longPath%%/*}   # extracts .

#######################################
# snippet for counting JPG-files in the current directory with output
nFiles=$(ls *.JPG | wc -l)             # number of lines
nCharMax=$(ls *.JPG | wc -L)           # longest line
nCharsNumber=$(echo "$nFiles" | wc -m) # number of characters (digits plus newline)
i=0                                    # start with zero
for file in *.JPG; do
  i=$((i + 1))                         # add 1
  fileName=${file%.*}
  fileExt=${file#*.}
  printf "%${nCharMax}s %2.${nCharsNumber}d of %1d\n" "$file" "$i" "$nFiles"
done
# might give:
# DSC03678.JPG 0001 of 362
# DSC03679.JPG 0002 of 362
# ...
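As an illustration of how such substitutions could feed an import, the following sketch builds a per-file comment from the file name. It assumes a purely hypothetical naming scheme Genus_species_NNN.JPG; the function name build_comment is made up, and the template fields are the ones used in the --comment example above.

```shell
#!/bin/bash
# Assumption (hypothetical naming scheme): Genus_species_NNN.JPG
# Prints a Metadata template whose Scientific Names field is taken
# from the file name.
build_comment() {
    base=${1##*/}        # strip the directory part
    name=${base%.*}      # strip the extension
    genus=${name%%_*}    # everything before the first underscore
    rest=${name#*_}      # everything after the first underscore
    species=${rest%%_*}  # second underscore-separated part
    printf '{{Metadata\n| Type = StillImage\n| Scientific Names = %s %s\n}}\n' \
        "$genus" "$species"
}

build_comment ./photos/Rana_temporaria_001.JPG
# {{Metadata
# | Type = StillImage
# | Scientific Names = Rana temporaria
# }}
```

How the generated text is then handed to importImages.php (for example one import run per group of files, each with its own --comment) depends on the setup and is left open here.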
Problem removing files from temporary import folder
A common problem after importing is that Linux is unable to delete the files in the temporary folder used for import. A normal "rm *" or "rm -r *" terminates with the message "Argument list too long". Typically, the combined length of all file names may not exceed 128 kB (this applies to all commands, not just rm). This limit is easily exceeded by a few thousand images with file names appropriate as wiki titles. Solution:
cd TheFolder; sudo find . -type f -print0 | xargs -0 rm
will do the job.
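To see how close a folder is to the limit, the combined length of its file names can be compared with the value the system reports. This is a rough sketch only: names_length is a hypothetical helper, and the real limit also covers the environment variables, so the comparison is an estimate.

```shell
#!/bin/bash
# Hypothetical helper: bytes that "rm *" would have to pass for the file
# names in a folder (each name plus one separator byte; hidden files are
# skipped, as with *).
names_length() {
    ( cd "$1" && ls | wc -c | tr -d ' ' )
}

# Limit for arguments plus environment on this system, in bytes.
getconf ARG_MAX
```

For example, names_length TheFolder prints the number to compare against the output of getconf ARG_MAX.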
Writing file names in a folder under Windows to text file
- Open a command prompt (type: cmd in the Windows Start button command box)
- Type: chcp 1252 to switch from DOS/OEM code page 850 to ANSI
- Type: dir /w > 000.txt to write all file names to the file 000.txt
- Edit and process the file using a text editor
Batch renaming files in Linux
Often a prefix has to be added to or removed from file names. To add the prefix "XXX_" use:
cd ../foto01; for i in *.jpg; do sudo mv -i "$i" "XXX_$i"; done
Should this have been executed twice, or should a prefix be removed, use rename with a Perl substitution string (s/fromold/tonew/), like
sudo rename 's/XXX_XXX_/XXX_/' *.jpg
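The double-prefix problem can also be avoided up front. The following sketch uses only bash substitutions and mv, so it does not depend on the Perl version of rename, which is not installed on every distribution. The function names add_prefix and remove_prefix are hypothetical; files owned by another user would additionally need sudo in front of mv.

```shell
#!/bin/bash
# Hypothetical helper: add a prefix to all *.jpg files in a folder,
# skipping files that already carry it (safe to run twice).
#   $1 = prefix, $2 = folder
add_prefix() {
    for path in "$2"/*.jpg; do
        [ -e "$path" ] || continue       # no match: the pattern stays literal
        name=${path##*/}
        case $name in
            "$1"*) continue ;;           # already prefixed, skip
        esac
        mv -i -- "$path" "$2/$1$name"
    done
}

# Hypothetical helper: remove a prefix from all *.jpg files that have it.
#   $1 = prefix, $2 = folder
remove_prefix() {
    for path in "$2"/"$1"*.jpg; do
        [ -e "$path" ] || continue
        name=${path##*/}
        new=${name#"$1"}                 # strip the prefix once
        mv -i -- "$path" "$2/$new"
    done
}
```

Usage would be add_prefix XXX_ ../foto01 and, to undo it, remove_prefix XXX_ ../foto01.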