X-Mailer: XFMail 1.4.7 on Linux
From: Karl-Heinz Herrmann <k.-h.herrmann@fz-juelich.de>
To: linux-questions-only@ssc.com
Subject: RE: [TAG] searching PDFs made from faxes
Date: Tue, 01 Jul 2003 22:25:52 +0200 (CEST)
+-+--------------------------------------------------------------------+-+
+-+ You've asked a question of The Answer Gang, so you've been sent the
+-+ reply directly as a courtesy. The TAG list has also been copied.
+-+ Please send all replies to linux-questions-only@ssc.com so that
+-+ we can help our other readers by publishing the exchange in our monthly
+-+ web magazine Linux Gazette, http://www.LinuxGazette.com.
+-+--------------------------------------------------------------------+-+
On 01-Jul-2003 Faber Fedor wrote:
> Hey Gang,
>
> Is anyone aware of a way to search PDF files that were created from
> faxes, e.g. tiff files?
>
> I'm guessing that OCR has to be utilized here, right? I've come
> across things like pdftotext, but the fact that the PDF started life
> as a TIFF is, I think, a complication.
>
> For the record, I'm putting together a fax server solution for a
> client. The ability to search the faxes for text strings would be
> killer.
Hi,

Your guess is quite right: if the PDF contains only a large graphic
and no actual text, you will need OCR. gocr or Clara OCR might come in
handy (gocr seems to come already trained, while Clara OCR uses quite a
different method). gocr already produced reasonable results for me one
or two years back. BUT: I had clean 300 dpi scans. With a jagged-looking
fax, I guess you are facing serious problems.
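
The idea above could be sketched roughly like this in Python. This is
only a sketch under assumptions, not something I have run against real
faxes: it assumes pdfimages (from xpdf or poppler) and gocr are
installed and on the PATH, and the names ocr_pdf and contains are made
up for illustration. How much usable text gocr recovers from fax-quality
images is exactly the open question.

```python
import subprocess
import tempfile
from pathlib import Path


def ocr_pdf(pdf_path):
    """Return the OCR text of every image embedded in pdf_path.

    Assumes the external tools pdfimages and gocr are installed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "page"
        # pdfimages writes the embedded images as page-000.pbm,
        # page-001.pbm, ... (PBM for monochrome, PPM otherwise).
        subprocess.run(["pdfimages", str(pdf_path), str(root)], check=True)
        pages = []
        for image in sorted(Path(tmp).glob("page-*")):
            # gocr prints the recognized text on stdout.
            result = subprocess.run(
                ["gocr", "-i", str(image)],
                capture_output=True, text=True, check=True,
            )
            pages.append(result.stdout)
    return "\n".join(pages)


def contains(text, needle):
    """Case-insensitive search that ignores differences in whitespace."""
    def normalize(s):
        return " ".join(s.lower().split())
    return normalize(needle) in normalize(text)
```

Usage would be something like contains(ocr_pdf("fax.pdf"), "invoice"),
where fax.pdf is a placeholder file name. Since OCR is slow, for a fax
server it probably makes more sense to run ocr_pdf once per incoming fax
and store the text next to the PDF, then search the stored text.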