Monday, July 09, 2007

EQO mobile: the way I see it

Well, it has markedly improved. It is easier to install and it no longer uses Skype, so you do not need to leave your computer running Skype.
But now you have to buy credit from yet another company instead of using Skype credit. The other downside is that you can no longer IM Skype users sitting in front of their computers for free. You can only communicate for free with people who have installed EQO mobile on their phones, and even this is not entirely free, as you have to pay your network provider for data transfer (unless you have an all-inclusive deal). I think that EQO mobile has now lost interesting functionality. O.K., it was a pain to have your computer on all the time, but at least you could communicate with the millions of Skype users. Additionally, EQO calls to Mexico, for example, cost more than they used to with the Skype set-up.

Wednesday, July 04, 2007

Tuesday, July 03, 2007

Installing tesseract command line OCR on MacOS X

Installing libpng from source:
http://kenno.wordpress.com/2006/04/20/compiling-libpng-for-mac-os-x/

fink install libjpeg aspell aspell-en

I will want to create my own aspell dictionary using taxonomic names:
http://www.mail-archive.com/code4lib@listserv.nd.edu/msg01545.html

Downloading and installing tesseract, following the install instructions:
http://code.google.com/p/tesseract-ocr/downloads/list

Installing xpdf with fink to get pdfimages, which extracts images from a PDF:
pdfimages -j LandPlants_paper.pdf LandPlantImg

To convert to TIFF with ImageMagick for tesseract (pdfimages numbers its output files, e.g. LandPlantImg-000.jpg):
convert LandPlantImg-000.jpg -compress None test.tif

Using tesseract (it appends .txt to the output name itself, so this writes out.txt):
tesseract test.tif out
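
The three steps above can be chained in a small script. This is only a sketch, not the script I actually use: the function names ocr_commands and run_pipeline are made up for illustration, it assumes pdfimages, convert and tesseract are all on the PATH, and it only picks up the images that pdfimages writes as JPEG.

```python
import glob
import os
import subprocess


def ocr_commands(image_path):
    """Return the convert and tesseract commands for one extracted image."""
    base = os.path.splitext(image_path)[0]
    tif_path = base + ".tif"
    return [
        # Uncompressed TIFF, as in the convert command above.
        ["convert", image_path, "-compress", "None", tif_path],
        # tesseract adds ".txt" to the output base name itself.
        ["tesseract", tif_path, base],
    ]


def run_pipeline(pdf_path, root):
    """Extract the images from pdf_path, then convert and OCR each JPEG."""
    subprocess.run(["pdfimages", "-j", pdf_path, root], check=True)
    # pdfimages numbers its output: root-000.jpg, root-001.jpg, ...
    for image_path in sorted(glob.glob(root + "-*.jpg")):
        for command in ocr_commands(image_path):
            subprocess.run(command, check=True)
```

Calling run_pipeline("LandPlants_paper.pdf", "LandPlantImg") should leave one .txt file next to each extracted image.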

I have now got a script to extract the names and check them against a dictionary of taxonomic names from spira.
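
A rough idea of what such a check can look like, as a sketch rather than the actual script: it assumes the dictionary is a plain list with one name per line, uses Python's difflib for fuzzy matching (my choice for this illustration), and matches single words only, so a genus and species are checked separately. The names load_names and match_names and the 0.8 cutoff are made up for the example.

```python
import difflib
import re


def load_names(lines):
    """Build a set of lower-cased names from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def match_names(ocr_text, dictionary, cutoff=0.8):
    """Map each OCR word to its closest dictionary name, if there is one."""
    matches = {}
    candidates = sorted(dictionary)
    for token in re.findall(r"[A-Za-z]+", ocr_text):
        word = token.lower()
        if word in dictionary:
            # Exact hit, no fuzzy search needed.
            matches[token] = word
            continue
        # Otherwise accept the single best match above the cutoff.
        close = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if close:
            matches[token] = close[0]
    return matches
```

With a dictionary file open as f, match_names(ocr_text, load_names(f)) returns the words that look like known names, including slightly misread ones. Lowering the cutoff catches more OCR errors at the cost of more false matches.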
I am thinking that using information from the article itself might provide even better results. When tesseract 2.0 comes out, there will also be a way of training the program to improve the character recognition. OCRopus also looks like an interesting program for layout detection, but it doesn't work on Mac OS X yet.
The line extraction is proving to be much more difficult than first thought, mainly because of the lack of a consistent format and the labelling at the nodes, which gets in the way of edge detection. I have tried a number of methods for cleaning up the image and, bit by bit, I will get there, I hope.





