[sf-lug] PDF embedded graphics

Jeff Bragg jackofnotrades at gmail.com
Sun Jul 10 14:13:23 PDT 2011


It looks like pdftohtml got most, but not all, of the images from the file
I'm testing with.  Much better than none.  I'll still try to get PDFBox to
do what I want, since I think it may extract them more reliably, but
pdftohtml definitely does better than Apache Tika, which gives no images
at all.
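
For anyone who wants to script the pdftohtml approach, here is a minimal
sketch, not a tested recipe. It assumes pdftohtml (from poppler-utils) is on
the PATH and that it writes the extracted images next to the HTML output; the
function names, the "page" output prefix, and the list of image suffixes are
my own choices for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: these are the image types worth collecting; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def build_command(pdf_path, out_prefix):
    """Build the pdftohtml argument list.

    -q suppresses messages; -noframes writes one HTML file instead of a frameset.
    """
    return ["pdftohtml", "-q", "-noframes", str(pdf_path), str(out_prefix)]


def list_images(directory):
    """Return the image files found directly in `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )


def extract_images(pdf_path, out_dir):
    """Run pdftohtml on `pdf_path` and return whatever images landed in `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "page" is a hypothetical output prefix; pdftohtml derives file names from it.
    subprocess.run(build_command(pdf_path, out_dir / "page"), check=True)
    return list_images(out_dir)
```

Calling extract_images("report.pdf", "out") would then return the image paths
found in out/ (report.pdf is a placeholder name).  Since pdftohtml misses some
images on some PDFs, it is worth comparing the count against what you see in
the document.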

On Sun, Jul 10, 2011 at 1:52 PM, Akkana Peck <akkana at shallowsky.com> wrote:

> Jeff Bragg writes:
> > Anyone know anything about extracting _embedded_ graphics (charts,
> > tables, figures) from PDF files?
>
> The pdftohtml program does this as part of its operation, at least
> for some PDFs.  It doesn't work on all of them. Worth a try.
>
>        ...Akkana