[sf-lug] PDF embedded graphics

Jeff Bragg jackofnotrades at gmail.com
Sun Jul 10 13:27:02 PDT 2011


I still haven't found a Ghostscript-based way to do this, but it does look
like I might be able to get the embedded images via Apache's PDFBox API.  I
think it'll work, if I'm willing to soil my brain with Java.

http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/PDResources.html#getImages()
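
For anyone who wants to check whether a given PDF keeps its pictures as
separate objects before committing to Java, here is a minimal sketch in
Python. It is not the PDFBox route; it is a crude byte scan using only the
standard library. It relies on the fact that embedded raster images are stored
as stream objects marked /Subtype /Image, and that those using the /DCTDecode
filter hold ordinary JPEG data. The function name extract_jpeg_streams is made
up for illustration. The sketch assumes an unencrypted file, a single
/DCTDecode filter, and stream data that does not happen to contain the bytes
"endobj". Charts or tables drawn with vector operators are not image objects,
so this will not find them.

```python
import re
from typing import List

# One indirect object: "<num> <gen> obj ... endobj".
# Assumption: the binary stream data never contains the bytes "endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")


def extract_jpeg_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the raw stream data of every JPEG (DCTDecode) image object."""
    images = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        # Split the object into its dictionary and its stream data.
        head, keyword, rest = body.partition(b"stream")
        if not keyword:
            continue  # not a stream object
        if not IMAGE_RE.search(head) or b"/DCTDecode" not in head:
            continue  # not a JPEG-encoded image
        end = rest.rfind(b"endstream")
        if end == -1:
            continue  # malformed or truncated object
        data = rest[:end]
        # Drop the end-of-line marker that follows the "stream" keyword.
        if data.startswith(b"\r\n"):
            data = data[2:]
        elif data.startswith(b"\n"):
            data = data[1:]
        # Drop the end-of-line marker that precedes "endstream".
        images.append(data.rstrip(b"\r\n"))
    return images
```

To try it, read a PDF in binary mode, pass the bytes to
extract_jpeg_streams(), and write each returned item to its own .jpg file. If
the list comes back empty, the figures are probably vector drawings or use a
different filter (for example /FlateDecode), which needs real PDF parsing of
the kind PDFBox provides.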

On Sun, Jul 10, 2011 at 12:25 PM, Jeff Bragg <jackofnotrades at gmail.com> wrote:

> Anyone know anything about extracting _embedded_ graphics (charts, tables,
> figures) from PDF files?  Note that I am _not_ interested in extracting
> pages as images (I already know how to do that, and it doesn't separate
> embedded elements), only embedded elements.  It seems like there should be a
> way to do this in Ghostscript, but I can't seem to track one down.
>
> It also occurs to me, given that PDF is an image-based format, that it may
> not be possible to separate embedded elements (they may get merged with
> relevant page info to form a single image/page without any metadata retained
> to guide later element separation).
>
