X-Mailer: XFMail 1.4.7 on Linux
From: Karl-Heinz Herrmann <k.-h.herrmann@fz-juelich.de>
To: linux-questions-only@ssc.com
Subject: RE: [TAG] searching PDFs made from faxes
Date: Tue, 01 Jul 2003 22:25:52 +0200 (CEST)

+-+--------------------------------------------------------------------+-+
+-+ You've asked a question of The Answer Gang, so you've been sent the
+-+ reply directly as a courtesy. The TAG list has also been copied.
+-+ Please send all replies to linux-questions-only@ssc.com so that
+-+ we can help our other readers by publishing the exchange in our monthly
+-+ web magazine Linux Gazette, http://www.LinuxGazette.com.
+-+--------------------------------------------------------------------+-+

On 01-Jul-2003 Faber Fedor wrote:
> Hey Gang,
>
> Is anyone aware of a way to search PDF files that were created from
> faxes, e.g. tiff files?
>
> I'm guessing that OCR has to be utilized here, right? I've come
> across things like pdftotext, but the fact that the PDF started life
> as a TIFF is, I think, a complication.
>
> For the record, I'm putting together a fax server solution for a
> client. The ability to search the faxes for text strings would be
> killer.

Hi,

Your guess is quite right: if the PDF contains only a large graphic
and no actual text, you will need OCR. gocr or claraocr might come in
handy (gocr seems to come already trained, while claraocr uses a quite
different method). gocr produced reasonable results for me 1 or 2
years back. BUT: I had clean 300dpi scans. With a jagged-looking
fax... I guess you are facing serious problems.
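
A minimal sketch of how such a search could be wired together, assuming
that pdfimages (from xpdf) and gocr are installed and on the PATH. The
function names and the work directory are made up for illustration, and
nothing here has been tuned for fax-quality input:

```python
import subprocess
from pathlib import Path

# Image formats pdfimages may write and gocr can read (PNM family).
PNM_SUFFIXES = {".pbm", ".pgm", ".ppm"}


def pdfimages_cmd(pdf, prefix):
    # pdfimages writes each embedded image as PREFIX-NNN.pbm / .ppm
    return ["pdfimages", str(pdf), str(prefix)]


def gocr_cmd(image):
    # gocr prints the recognized text to stdout when no output file is given
    return ["gocr", "-i", str(image)]


def ocr_pdf(pdf, workdir):
    """Return the OCR text of all images embedded in one PDF."""
    pdf = Path(pdf)
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_cmd(pdf, workdir / pdf.stem), check=True)

    pages = []
    for image in sorted(workdir.iterdir()):
        if (image.name.startswith(pdf.stem + "-")
                and image.suffix in PNM_SUFFIXES):
            result = subprocess.run(gocr_cmd(image), check=True,
                                    capture_output=True, text=True)
            pages.append(result.stdout)
    return "\n".join(pages)


def find_matches(texts, needle):
    """texts maps a fax name to its OCR text; the match ignores case."""
    needle = needle.lower()
    return sorted(name for name, text in texts.items()
                  if needle in text.lower())
```

One would call ocr_pdf() once per incoming fax, keep the results in a
dict keyed by file name, and pass that dict to find_matches(). Note
that a plain substring match only finds words the OCR got exactly
right, so on noisy faxes many hits will probably be missed.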