Installing GNU Gift on QemuOemrServer
Assumptions
We're going to assume:
- you're using a virtual machine manufactured to the specifications of the FAI configuration maintained by user juri_ at gitorious.org
Preparing Source
Following the directions on Savannah, use CVS to download the newest version:
mkdir -p /home/gift/gift/src/orig/
cd /home/gift/gift/src/orig/
cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/gift co gift
mv gift gnuift-0.1.14
Create an original tarball for building the Debian package later:
tar -cvzf gnuift_0.1.14.orig.tar.gz gnuift-0.1.14/
Building from Source
Generate files required to build:
cd gnuift-0.1.14
./bootstrap-cvs.sh
Set up the build system:
./configure
Build Gift:
make
Building a Debian Package
Now that you have built the tree, let's get the tools to build a Debian package out of our source tree.
mkdir -p /home/gift/gift/src/debian_package/
cd /home/gift/gift/src/debian_package/
git clone git://gitorious.org/gnu-gift-debian-package/gnu-gift-debian-package.git debian
tar -cvzf gnuift_0.1.14.debian.tar.gz debian/
Copy in the tarball we created earlier:
cp /home/gift/gift/src/orig/*.tar.gz /home/gift/gift/src/debian_package/
Extract gift, adding a debian control directory:
tar -xzf gnuift_0.1.14.orig.tar.gz
cd gnuift-0.1.14
tar -xzf ../gnuift_0.1.14.debian.tar.gz
Now build the package:
debuild
Installing Your New Debian Packages
Image indexing takes twice(!!) as long with the version packaged in Debian as with the packages we just built, so make sure to install the packages you just created.
sudo dpkg -i /home/gift/gift/src/debian_package/*.deb
Obtaining Sample Image Data
I grabbed my sample image data from The Open Clip Art Project.
OpenClipArt 0.19
Specifically, I downloaded their 0.19 release, then ran the following commands to copy all the .png files into one flat directory for gift to index:
cd /home/gift/
wget http://www.openclipart.org/downloads/0.19/openclipart-0.19.tar.bz2
tar -xjf openclipart-0.19.tar.bz2
mkdir -p gift/openclipart-0.19/
find openclipart-0.19 -name '*.png' -exec cp -a {} gift/openclipart-0.19/ \;
OpenClipArt 2.0
Download the 2.0 release, then copy all the .png files into one flat directory for gift to index:
cd /home/gift/
wget http://www.openclipart.org/downloads/2.0/openclipart-2.0-full.tar.bz2
tar -xjf openclipart-2.0-full.tar.bz2
mkdir -p gift/openclipart-2.0/
find openclipart-2.0-full -name '*.png' -exec cp -a {} gift/openclipart-2.0/ \;
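The flattening step is the same for both releases, so it can be wrapped in a small helper. This is only a sketch: the function name flatten_pngs is made up for illustration and is not part of gift or OpenClipArt.

```shell
#!/bin/sh
# flatten_pngs SRC DEST (hypothetical helper): copy every .png found anywhere
# under SRC into the single flat directory DEST.
flatten_pngs() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    # Quoting '*.png' keeps the shell from expanding the pattern before find
    # sees it. Using -exec copes with spaces in file names and with file lists
    # too long to fit on one command line.
    find "$src" -type f -name '*.png' -exec cp -a {} "$dest"/ \;
}
```

For the 0.19 release this would be called as: flatten_pngs openclipart-0.19 gift/openclipart-0.19. Note that two files with the same name in different subdirectories end up as one file in the flat directory, since the later copy overwrites the earlier one.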
Indexing your Collection
To index your collection, first copy in a default configuration file, then run gift-add-collection with the full path of the images you want to import.
The gift packaged in lenny and the current CVS version both fail to create a config file. Copying the default file into place before running gift-add-collection.pl works around the problem.
cp /usr/share/libmrml1/gift-config.mrml /home/gift/
gift-add-collection.pl /home/gift/gift/openclipart-*/
Now wait a long, long time.
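Since a missing config file is behind several of the failures described further down, it may be worth checking for it before starting a long indexing run. A minimal sketch, assuming the paths used on this page; the function name ensure_config is hypothetical.

```shell
#!/bin/sh
# ensure_config DEFAULT TARGET (hypothetical helper): copy the default config
# into place, but only when TARGET does not exist yet.
ensure_config() {
    default=$1
    target=$2
    if [ -f "$target" ]; then
        echo "keeping existing $target"
    else
        cp "$default" "$target" && echo "copied $default to $target"
    fi
}
```

On this setup the call would be: ensure_config /usr/share/libmrml1/gift-config.mrml /home/gift/gift-config.mrml. The helper deliberately leaves an existing file alone, so if you have wiped the gift data directory you still need to restore the default config by hand.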
Speeding this up
Downloading and installing the new version from CVS results in this step taking half(!) as long.
In addition, you can create the thumbnails using a separate script, at the same time. gift-add-collection.pl will skip generating a thumbnail if you create them for it either ahead of time, or while it is running.
NOTE: this script is maintained in CVS, and the version in this web page may become out-of-date.
gift-generate-thumbnails.sh
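#!/bin/sh
# first argument: path of directory containing images to thumbnail
# dirname/basename normalizes the path (drops any trailing slash)
target=`dirname "$1"`/`basename "$1"`
thumbnail_dir="${target}_thumbnails"
# make sure the output directory exists when run ahead of gift-add-collection.pl
mkdir -p "$thumbnail_dir"
echo "converting images in $target, placing them in $thumbnail_dir."
find "$target" -maxdepth 1 -type f | while IFS= read -r path; do
    each=`basename "$path"`
    # replace the last dot: foo.png becomes foo_thumbnail_png
    convname=`echo "$each" | sed "s/\(.*\)[.]/\1_thumbnail_/"`
    # skip images that already have a thumbnail
    if [ ! -f "$thumbnail_dir/${convname}.jpg" ]; then
        echo "converting $each"
        convert -geometry 128x128 -quality 100 "$path" "$thumbnail_dir/${convname}.jpg"
    fi
done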
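The thumbnail naming rule in the script is easy to misread, so here is the same sed expression pulled out on its own. The function name thumbnail_name is only for illustration.

```shell
#!/bin/sh
# thumbnail_name FILE (hypothetical helper): print the thumbnail file name the
# script above derives from an image file name. The last dot in the name is
# replaced by "_thumbnail_", then ".jpg" is appended.
thumbnail_name() {
    printf '%s.jpg\n' "$(echo "$1" | sed 's/\(.*\)[.]/\1_thumbnail_/')"
}

thumbnail_name clip.png       # prints clip_thumbnail_png.jpg
thumbnail_name a.b.png        # prints a.b_thumbnail_png.jpg
```

A name without any dot is left unchanged apart from the added .jpg suffix.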
Fixing Problems
Failure to add Collection ID
PROGRESS: 99%
Copying /home/gift/gift-config.mrml to /home/gift/gift-config.mrml-old
XML::DOM::Attr=ARRAY(0x9e06b78)
Can't locate object method "getAttribute" via package "XML::DOM::Attr" at /usr/bin/gift-add-collection.pl line 855, <LOCALELIST> line 274.
----> collection-id c-59-50-8-18-9-111-2-290-0 <----
This happened after a 'successful' run where the mergesort had gone sideways. I had removed the gift data directory and forgotten to restore the gift-config to the default.
To fix it: since the collection had already been written out successfully, just with the wrong collection ID in this file, it is merely required to find the collection ID in the file and replace it with the one presented in the error message (not the one from this site!).
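The replacement can be done by hand in an editor, or with sed. The sketch below assumes the old ID appears in the config file as a plain string made up only of letters, digits and hyphens (as in the error message above); the function name and both IDs in the usage line are placeholders.

```shell
#!/bin/sh
# replace_collection_id CONFIG OLD_ID NEW_ID (hypothetical helper): swap every
# occurrence of OLD_ID in CONFIG for NEW_ID, keeping a backup in CONFIG.bak.
# Assumes the IDs contain nothing that sed would treat specially.
replace_collection_id() {
    config=$1
    old=$2
    new=$3
    cp "$config" "$config.bak"
    sed "s/$old/$new/g" "$config.bak" > "$config"
}
```

Usage would look like: replace_collection_id /home/gift/gift-config.mrml OLD_ID NEW_ID, where OLD_ID is the ID currently in the file and NEW_ID is the one printed in your own error message.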
MergeSort Blowup
...finished before mergesort
Starting quicksort: 1048576 elements per page.
Sorting files /home/gift/gift-indexing-data/images//gift-auxiliary-1 to /home/gift/gift-indexing-data/images//gift-auxiliary-2
NOW ALLOCATING A PAGE1048576
HIERFIRSTLEVELQUICK226868124;0
................gift-generate-inverted-file: ../../libGIFTAcInvertedFile/include/merge_sort_streams.h:282: void first_level_quicksort(int, const char*, const char*) [with T = CIFBuilderTriplet]: Assertion `lTemporary' failed.
PROGRESS: 99%
This was the result of running out of disk space during the mergesort. I removed everything, freed up some disk space, and started over.
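A quick look at free disk space before starting can save a wasted run. This is a sketch only: the function names are made up, and this page does not say how much space the mergesort needs, so the threshold is something you have to pick yourself.

```shell
#!/bin/sh
# free_kb DIR (hypothetical helper): print the free space, in kilobytes, on the
# filesystem that holds DIR.
free_kb() {
    # -P forces one output line per filesystem, -k reports 1024-byte blocks;
    # the fourth column of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_kb DIR NEEDED_KB (hypothetical helper): succeed only when at
# least NEEDED_KB kilobytes are free on DIR's filesystem.
require_free_kb() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

For example, require_free_kb /home/gift 1048576 || echo "less than 1 GB free" would warn before indexing; the 1 GB figure is an arbitrary example, not a measured requirement.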
Out of Memory
PROGRESS: 99%
Copying /home/gift/gift-config.mrml to /home/gift/gift-config.mrml-old
Ran out of memory for input buffer at /usr/lib/perl5/XML/Parser/Expat.pm line 469, <LOCALELIST> line 270.
Ran out of memory? Create a 256 MB swap file, and use it:
dd if=/dev/zero of=/home/gift/swapfile bs=4k count=65536
sudo mkswap /home/gift/swapfile
sudo swapon /home/gift/swapfile
Now skip down to the section on re-starting your run.
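The count given to dd is just the swap size divided by the block size. A small sketch of that arithmetic, in case you want a different size; the function name is made up for illustration.

```shell
#!/bin/sh
# swap_block_count SIZE_MB BLOCK_KB (hypothetical helper): print how many
# blocks of BLOCK_KB kilobytes dd has to write to produce SIZE_MB megabytes.
swap_block_count() {
    echo $(( $1 * 1024 / $2 ))
}

swap_block_count 256 4    # prints 65536, the count used above with bs=4k
```

Once swapon has run, cat /proc/swaps lists the active swap areas, which is a simple way to confirm the file is in use.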
Infinity when re-starting a run
STARTING mit MERGESIZE1
MERGESORT MergeSize 12
endmerge after mergesort. The last file I used was /home/gift/gift-indexing-data/clipart//gift-auxiliary-2
Opening sorted stream for reading. State (should be '1'): 0xbff84bec
[inFeatureID:4/0;inPosition:16/0==0]20
Writing Chunk for Feature ID 0. The Offset is 0x0=0
The collection frequency is: inf
gift-generate-inverted-file: CInvertedFileChunk.cc:117: bool CInvertedFileChunk::writeBinary(std::ostream&, TID, size_t) const: Assertion `!"collection frequency out of range"' failed.
PROGRESS: 99%
Copying /home/gift/gift-config.mrml to /home/gift/gift-config.mrml-old
Ran out of memory for input buffer at /usr/lib/perl5/XML/Parser/Expat.pm line 469, <LOCALELIST> line 274.
This happened when I re-ran gift-add-collection.pl after a run that had failed because there was no config file.
Because the re-run did not regenerate any .fts files, the script fails at the next step.
Fix the root cause of the failure, then follow the directions on re-starting a broken run:
cp /usr/share/libmrml1/gift-config.mrml /home/gift/
Re-starting A Broken Run
Assuming you have fixed the root cause of whatever failure you encountered...
Back up the completed mapping between URIs and feature files:
cp /home/gift/gift-indexing-data/openclipart-*/url2fts.xml /home/gift
Next, clean out generated files.
rm /home/gift/gift-indexing-data/openclipart-*/InvertedFile*
rm /home/gift/gift-indexing-data/openclipart-*/gift-aux*
rm /home/gift/gift-indexing-data/openclipart-*/00*
Re-run gift-add-collection.pl, then put the backed-up url2fts.xml file back in place.
gift-add-collection.pl /home/gift/gift/openclipart-*/
cp /home/gift/url2fts.xml /home/gift/gift-indexing-data/openclipart-*/
Now get a count of the images in your image repository, and insert the count into /home/gift/gift-config.mrml.
cd /home/gift/
sed 's/images="[0-9]*/images="'`find gift/openclipart-*/ -type f | wc -l`'/' gift-config.mrml > gift-config.new
cp gift-config.new gift-config.mrml
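The count-and-substitute step can also be written as a function that takes the config file and image directory explicitly, which makes it easier to run from any directory. A sketch only; the function name set_image_count is hypothetical, and it relies on the same images="N" attribute that the sed command on this page edits.

```shell
#!/bin/sh
# set_image_count CONFIG IMAGE_DIR (hypothetical helper): count the regular
# files under IMAGE_DIR and write that number into every images="N" attribute
# found in CONFIG.
set_image_count() {
    config=$1
    image_dir=$2
    # tr strips the padding some wc implementations put around the number.
    count=$(find "$image_dir" -type f | wc -l | tr -d ' ')
    # Write to a new file first, and only replace CONFIG if sed succeeded.
    sed "s/images=\"[0-9]*/images=\"$count/" "$config" > "$config.new" &&
        mv "$config.new" "$config"
}
```

For example: set_image_count /home/gift/gift-config.mrml /home/gift/gift/openclipart-0.19. Point it at the image directory only, so that thumbnails stored elsewhere are not counted.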
Finally, have gift-add-collection update the collection ID in the configuration file.
gift-add-collection.pl -fix-config /home/gift/gift/openclipart-*/
Setting up a Frontend
RainbowSock (derived from the historic monosock) is available via gitorious. To grab a copy:
mkdir -p /home/gift/rainbowsock
cd /home/gift/rainbowsock
git clone git://gitorious.org/rainbowsock/rainbowsock.git rainbowsock
Now, create a tarball of the distribution, so that we can extract it into the webroot at /var/www/:
cd rainbowsock
git archive -o ../rainbowsock-master.tar master
Extract the front-end into the webroot:
cd /var/www/
sudo tar -xf /home/gift/rainbowsock/rainbowsock-master.tar
Remove the default "it works!" page.
sudo rm /var/www/index.html
Add an alias for /home/gift/gift, to serve the images and their thumbnails to the public through apache.
Add the following in /etc/apache2/sites-available/default, between the VirtualHost tags.
Alias /gift/ "/home/gift/gift/"
<Directory /home/gift/gift/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Order allow,deny
    allow from all
</Directory>
Finally, edit /var/www/include/config.php, and change the two 'file' URIs to match the directory names of your image path, and thumbnail path, respectively.
Improving the Frontend
See GiftNewFrontend
Using GIFT for sorting images
Import Batch of Images
We need a web interface for importing a collection of images that stores image metadata (relationships of one image to another, whether an image is part of the image database or a 'stock' image, tags, etc.) in an SQL database.
Create a default set that is all images in the imported collection, named 'collectionname-origin'. This set should be read-only.
Sort Images into Sets
Using something similar to the current 'search' page, decide form type by making a search that only returns forms of the type you are looking for. To do this, run a search, and create a set from that search's results plus a floor value for how similar a result must be to count. Create a derived anti-set to pull out any false positives, and you're done sorting.
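None of this exists yet, but the set logic itself is simple enough to sketch. The example below assumes a made-up input format: a results file with one "score name" pair per line (names without spaces) and an anti-set file with one name per line. Neither format comes from gift; they are placeholders for whatever the web interface ends up storing.

```shell
#!/bin/sh
# build_set RESULTS FLOOR ANTISET (hypothetical helper): print the names from
# RESULTS whose score is at least FLOOR, minus any name listed in ANTISET.
build_set() {
    results=$1
    floor=$2
    antiset=$3
    # Keep results at or above the floor, then drop the known false positives
    # (-x matches whole lines only, -F treats the names as fixed strings).
    awk -v floor="$floor" '$1 + 0 >= floor + 0 { print $2 }' "$results" |
        grep -v -x -F -f "$antiset"
}
```

With a floor of 0.8, a results file listing a.png at 0.95, b.png at 0.90 and c.png at 0.40, and an anti-set containing b.png, only a.png would be left in the set.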
Cutting Forms Up
Create a new collection out of the result set. Instead of using convert to resize the entire image in the import script, crop it to a known region (look at CSS for defining regions?).
This will cut out the field segments we want. You may want to import them into a new collection, alongside some 'known' field images (known signatures, printed client IDs, etc.).
Getting Results
Compare the imported/generated images with the cut fields. Matching fields should score a high match percentage. Use this correlation to file forms by field match.
Importing into OpenEMR
Sort forms based on type AND field match. Perform an action with each form based on its form type.
