PDF-editing programs mentioned below:
- pdfbox (Java)
- pjx (Java)
- PDF::API2 Perl module bundle
- PDF Chain
- PDF Shuffler
- PDF Split and Merge (aka pdfsam)
- mbt PDF Assembler (aka Mad Builder PDF Assembler) (recommended)
- PDFedit (recommended)
- poppler-utils (and Poppler library)
- pyPDF library
- LibreOffice / OpenOffice.org
- iText library
- ImageMagick's convert tool
- Qoppa Software's PDF Studio (proprietary)
From: Rick Moen <firstname.lastname@example.org>
Subject: Re: [vox-tech] PDF Editing
Date: Tue, 21 Dec 2004 17:15:23 -0800
X-Mas: Bah humbug.
These links brought to you courtesy of feedback postings to a recent "Grumpy Editor's Guide to PDF Viewers" feature on LWN.net: A reader asked about PDF editors.
Reader evgeny said:
Reader DrBubba said:
Perl has the PDF::API2 bundle, which I've used to break a series of PDF files down into pages and then reassemble them into a single document. This will require a little bit of coding on your part, and the documentation with the module is a little spotty.
Reader tekNico said:
$ apt-cache show pdftk
If PDF is electronic paper, then pdftk is an electronic stapler-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a simple tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:
- Merge PDF documents
- Split PDF pages into a new document
- Decrypt input as necessary (password required)
- Encrypt output as desired
- Burst a PDF document into single pages
- Report on PDF metrics, including metadata and bookmarks
- Uncompress and re-compress page streams
- Repair corrupted PDF (where possible)
Author: Sid Steward <email@example.com>
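Most of the pdftk operations listed above map onto one short command line each. The sketch below is a thin Python wrapper, assuming pdftk is installed and on the PATH; the helper function names are hypothetical, but the `cat`, `output`, `burst`, `input_pw` and `dump_data` keywords are pdftk's own. The functions only build argument lists, so they can be inspected before anything is run.

```python
import subprocess
from typing import List


def merge_cmd(inputs: List[str], output: str) -> List[str]:
    """Merge: pdftk a.pdf b.pdf cat output merged.pdf"""
    return ["pdftk", *inputs, "cat", "output", output]


def extract_pages_cmd(src: str, page_range: str, output: str) -> List[str]:
    """Copy a page range such as "1-3" or "5-end" into a new document."""
    return ["pdftk", src, "cat", page_range, "output", output]


def burst_cmd(src: str) -> List[str]:
    """Burst every page into its own file (pdftk names them pg_0001.pdf, ...)."""
    return ["pdftk", src, "burst"]


def decrypt_cmd(src: str, password: str, output: str) -> List[str]:
    """Open a password-protected file and write an unencrypted copy."""
    return ["pdftk", src, "input_pw", password, "output", output]


def report_cmd(src: str) -> List[str]:
    """Print metadata, bookmarks and page count to standard output."""
    return ["pdftk", src, "dump_data"]


def run(cmd: List[str]) -> str:
    """Run a command line and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run(merge_cmd(["a.pdf", "b.pdf"], "merged.pdf"))` concatenates two files. Passing a list (rather than a shell string) avoids quoting problems with filenames that contain spaces.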
[RM: See also: http://www.linux.com/article.pl?sid=06/04/17/1943230 A reader comments that pdftk does not appear to be capable of PDF resizing, but that a promising approach would be to convert the PDF to PostScript, use native PostScript tools, then convert the file back.] See also the separate entry for Ghostscript.
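The PDF-to-PostScript-and-back resizing approach can be sketched as a three-step pipeline. This assumes poppler-utils (`pdftops`), psutils (`psresize`, where lowercase `-p` sets the output paper size and uppercase `-P` the input size) and Ghostscript (`ps2pdf`) are installed; the function names and temporary filenames are hypothetical, and option spellings should be checked against the man pages of the installed versions.

```python
import subprocess
from typing import List


def resize_pipeline(src: str, dst: str, in_paper: str = "letter",
                    out_paper: str = "a4",
                    tmp_prefix: str = "resize_tmp") -> List[List[str]]:
    """Build the commands for PDF -> PS -> resized PS -> PDF."""
    ps_in = f"{tmp_prefix}_in.ps"    # hypothetical temporary filenames
    ps_out = f"{tmp_prefix}_out.ps"
    return [
        # 1. PDF to PostScript (pdftops from poppler-utils or xpdf)
        ["pdftops", src, ps_in],
        # 2. Rescale every page: -P = input paper size, -p = output paper size
        ["psresize", f"-P{in_paper}", f"-p{out_paper}", ps_in, ps_out],
        # 3. PostScript back to PDF via Ghostscript's ps2pdf wrapper
        ["ps2pdf", f"-sPAPERSIZE={out_paper}", ps_out, dst],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

As the reader's comment implies, each round trip through PostScript can change fonts or lose PDF-only features (links, form fields), so the output deserves a visual check.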
Which in turn got two replies. Reader kfiles said:
And pdftk itself uses the handy Java iText libraries for actual PDF composition/decomposition. iText can specifically address the author's desire to modify PDF content inline. And if you don't want to use run-time Java interpreting, you can copy pdftk's technique of precompiling to native code using gcj.
Reader liamh said:
I had good luck with "Mad Builder PDF Assembler" http://thierry.schmit.free.fr/dev/mbtPdfAsm/enMbtPdfAsm2. It took a little while to figure out - you have to create an assembly/disassembly script - but it seems quite versatile.
I haven't personally investigated any of this stuff, but I've been meaning to, Real Soon Now.
PDF Chain is a GUI for pdftk written with gtkmm. You can merge several PDF files into one, split a PDF file, set a background or stamp, or add attachments to a PDF file. There are also some additional options and tools.
Merging and splitting PDFs are a large part of how computer users deal with PDF files. For instance, sometimes only a few pages of a file are needed, so you need to split the PDF into multiple sections; at other times, you need to merge several PDF files into one to keep the information together. In addition to merging and splitting, adding a watermark to a PDF is also a frequent task. How can all of these be handled with one tool? PDF Chain can help.
PDF Chain is a graphical front end for the PDF Toolkit (pdftk) that allows you to merge, split, watermark, rotate, add attachments to, and set permissions for existing PDF documents. In short, PDF Chain can serve as a PDF merger, PDF splitter, and watermark maker.
PDF Chain's simple interface and features make it easy for users of all levels to learn. First, follow the installation instructions to install PDF Chain. Then you can use it to merge PDFs, split PDFs, and add watermarks to PDF files.
To merge PDF documents, click the + button to add the PDF documents you want to merge. Use the arrow buttons to move documents up and down; the order in which they appear in the Add window is the order in which they will be merged. When you finish adding your PDF files, you can select the ID for the merged PDF. When all is done, click the Save button and give the new document a name to begin merging.
PDF-Shuffler is a small python-gtk application that helps the user merge or split PDF documents and rotate, crop, and rearrange their pages using an interactive and intuitive graphical interface. It is a front end for python-pyPdf.
PDF Split and Merge is a very simple, easy to use, free, open source utility to split and merge pdf files. It has a simple graphical interface to let the user choose pdf files, split or merge them.
You can also use pdfchain to split and merge PDFs. The package includes features designed to handle PDF files in an easy way. Basically, it can merge, split, add backgrounds or stamps, and add attachments. There are some tools for extended needs, too. The GUI is written in gtkmm, a C++ interface for GTK+.
You can also use pdfjam to split and merge PDFs. PDFjam is a small collection of shell scripts that work similarly to the well-known psutils (psmerge, psnup). They provide a simple interface to some of the functionality of the pdfpages package for pdfLaTeX. At present, the utilities available are pdfnup, pdfjoin, and pdf90. PDFjam depends on a working installation of (pdf)LaTeX.
- pdfnup puts multiple document pages together on one physical page at a reduced size
- pdfjoin concatenates multiple PDF documents
- pdf90 rotates the pages of PDF documents
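The three PDFjam utilities listed above can be driven from a script. This is a minimal sketch assuming the wrapper scripts are installed and accept `--outfile` (and `--nup COLSxROWS` for pdfnup); recent pdfjam releases have moved these wrappers out of the main package, so check that your installation still ships them. The function names are hypothetical. The last helper shows the arithmetic behind n-up printing: with cols x rows logical pages per sheet, the number of physical sheets is the page count divided by cols x rows, rounded up.

```python
from typing import List


def pdfjoin_cmd(inputs: List[str], output: str) -> List[str]:
    """Concatenate several PDFs into one."""
    return ["pdfjoin", "--outfile", output, *inputs]


def pdfnup_cmd(src: str, cols: int, rows: int, output: str) -> List[str]:
    """Place cols x rows reduced pages on each physical page."""
    return ["pdfnup", "--nup", f"{cols}x{rows}", "--outfile", output, src]


def pdf90_cmd(src: str, output: str) -> List[str]:
    """Rotate every page by 90 degrees."""
    return ["pdf90", "--outfile", output, src]


def nup_sheet_count(pages: int, cols: int, rows: int) -> int:
    """Physical pages needed for `pages` logical pages at cols x rows per sheet."""
    per_sheet = cols * rows
    return -(-pages // per_sheet)  # integer ceiling division
```

For example, an 11-page document printed 2x2 needs `nup_sheet_count(11, 2, 2)` = 3 sheets, the last one only three-quarters full.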
Date: Tue, 21 Dec 2004 18:05:30 -0800 (PST)
From: "Dylan Beaudette" <firstname.lastname@example.org>
To: "lugod's technical discussion forum" <email@example.com>
Subject: Re: [vox-tech] PDF Editing/Data Entry Question
I have heard that Scribus can do things like this. I have used it for various tasks that I have traditionally used Illustrator to do.
[RM: Scribus works by first converting the PDF to PostScript, which is cumbersome and can require experimentation to get right, depending on the conversion tools used and the versions of PDF and PostScript involved.]
The KDE/KOffice "KWord" word processor can edit PDFs. However, in doing so, it first converts the document into its own data format, which can lose complex formatting.
The heavy-duty GIMP graphics program can open individual pages of a PDF file and edit them as an image. It can then output them only to PostScript.
mbt PDF Assembler
(the aforementioned "Mad Builder PDF Assembler")
This tool acts as an in-line tool for assembling/merging PDF files, extracting information from PDF files, and updating PDF files' metadata.
In assembling mode (the default), this tool concatenates pages, either in full-file mode or in page-list mode. In page-list mode, outlines are not concatenated. However, this tool makes it possible to add outlines via an outline definition file (-o option).
In extraction mode (-g[...]) (note: this extracts information, not data), information is printed to standard output in CSV format.[...]
In update mode (-u), the files matching the mask(s) are updated according to the command-line options.[...]
[RM: You can also use this utility to add page numbers to existing PDFs.]
PDFedit offers complete native editing of PDF documents. You can either change raw PDF objects (for advanced users) or use predefined GUI functions. The tool can be used from either the GUI or the command line. Functions can easily be added, as everything is based on a scripting facility. The Qt 3.x widget set is required. The code is in C++, with calls to xpdf, Qt, and QSA. As of 2007-03, this utility is at beta version 0.2.5.
pdftohtml is a utility (probably identical with or derived from the one in poppler-utils; see next entry) that converts PDF files into HTML and XML formats. It generates its output in the current working directory and, at least for some PDFs, extracts embedded graphics as part of its operation.
Poppler is a PDF-rendering library based on xpdf that has multiple UI front ends. poppler-utils is one: a set of command-line utilities (based on Poppler) for getting information about PDF documents or converting them to other formats:
- pdffonts -- font analyzer
- pdfimages -- image extractor
- pdfinfo -- document information
- pdftohtml -- PDF to HTML converter
- pdftoppm -- PDF to PPM/PNG/JPEG image converter
- pdftops -- PDF to PostScript (PS) converter
- pdftotext -- text extraction
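pdfinfo prints one "Key: value" line per property, which makes it easy to use from scripts. The sketch below assumes that format; the sample text in the usage note is illustrative, not captured from a real run, and the function names are hypothetical. It splits only on the first colon, because values such as dates contain colons of their own.

```python
import subprocess
from typing import Dict


def parse_pdfinfo(text: str) -> Dict[str, str]:
    """Turn pdfinfo's "Key:   value" lines into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # first colon only
        if sep:  # ignore lines without a colon
            info[key.strip()] = value.strip()
    return info


def pdf_info(path: str) -> Dict[str, str]:
    """Run pdfinfo on a file and parse its output."""
    out = subprocess.run(["pdfinfo", path], check=True,
                         capture_output=True, text=True).stdout
    return parse_pdfinfo(out)
```

Given input such as `"Pages:          12\nPage size:      612 x 792 pts (letter)\n"`, `parse_pdfinfo` returns a dict in which `"Pages"` maps to `"12"`; the values stay strings, so convert them (e.g. `int(info["Pages"])`) as needed.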
Inkscape
Date: Thu, 7 Jul 2011 23:03:49 +1000 (EST)
From firstname.lastname@example.org Thu Jul 07 06:04:20 2011
From: John Simmons
Subject: Re: Embedding images in PDFs
I have occasionally used Inkscape (open source) for editing PDFs - with some success. It really is a handy package, and its ability to edit PDFs has been very useful to me for several "once off" tasks.
Date: Wed, 6 Jul 2011 22:12:19 +1000
From: Matthew Cengia
Subject: Re: Embedding images in PDFs
If it's only a one-page PDF, I'd highly recommend trying to open it in Inkscape (which I believe only lets you open one page from a given PDF). Inkscape has a reasonable PDF decoder, and (for example) anything within the PDF that is currently a vector graphic will remain as such. Using this, you should be able to import an image, add a text field for the date, and then save it as another PDF. Unfortunately, I often find that Inkscape isn't very efficient at saving PDFs, and they end up somewhat large. One partial solution I've found is to save as a PostScript file, and then use 'ps2pdf', which creates a much more efficient PDF.
pyPdf is a Python library for working with PDFs. Abilities:
- extracting document information (title, author, ...),
- splitting documents page by page,
- merging documents page by page,
- cropping pages,
- merging multiple pages into a single page,
- encrypting and decrypting PDF files.
Date: Thu, 07 Jul 2011 10:57:11 +1000
Subject: Re: Embedding images in PDFs
[RM notes: Thread had posed the problem of electronically signing a PDF.]
Interesting problem; it caused me to try out pyPdf (http://pybrary.net/pyPdf/). Based on their example code from that page, I was able to successfully overlay my signature on the 2nd page of an existing document. I had to create a PDF with a transparent background, and used Inkscape for that.
The Python code I used was:
from pyPdf import PdfFileWriter, PdfFileReader

fname = "paintissues.pdf"
output = PdfFileWriter()
# Input files must stay open until output.write(), since pyPdf reads pages lazily.
input1 = PdfFileReader(open(fname, "rb"))

# add page 1 from input1 to output document, unchanged
output.addPage(input1.getPage(0))

# add page 2 from input1, but first add a watermark from another pdf:
page2 = input1.getPage(1)

# testsign.pdf is a small (5cm(W) x 3cm(H)) pdf with my
# signature created in inkscape with transparent background
signature = PdfFileReader(open("testsign.pdf", "rb"))
# page2.mergePage(signature.getPage(0))  # alternative: overlay at the origin

# Use tx & ty (PDF points, 1/72 inch) to move my small pdf around the page
tx = 150
ty = 150
page2.mergeTranslatedPage(signature.getPage(0), tx, ty)
output.addPage(page2)

# finally, write "output" to test-output.pdf
outputStream = open("test-output.pdf", "wb")
output.write(outputStream)
outputStream.close()
LibreOffice / OpenOffice.org
If you have the PDF Import Extension installed, PDFs can be opened in LibreOffice Draw / OpenOffice.org Draw, modified to some extent, and then exported back out as PDF. Please note that this method does not convert PDFs into LibreOffice word-processor data; rather, it stores the input as vector graphics data.
Because the PDF Import Extension was merged into the Novell-sponsored Go-OO fork, it has also been merged into LibreOffice, Go-OO's successor project.
Ghostscript gives you the power to combine files, convert files, and much more, all from the command line.
It is easy to combine several input files into one combined PDF using Ghostscript:
[RM note: The above was the beginning of an excellent article on the subject by Kurt Pfeifle, which I recommend but cannot reproduce here for copyright reasons. Article continues on the Linux Journal site: http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files
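Independently of that article, the usual Ghostscript invocation for combining files uses the `pdfwrite` output device. The sketch below only builds the command line and assumes `gs` is on the PATH; the function name is hypothetical. Note that pdfwrite re-interprets and rewrites the content rather than copying pages byte for byte, so file size and some details may differ from the inputs.

```python
import subprocess
from typing import List


def gs_combine_cmd(inputs: List[str], output: str) -> List[str]:
    """Build a Ghostscript command that combines PDF/PostScript files into one PDF."""
    return [
        "gs",
        "-dBATCH",               # exit when done instead of waiting at a prompt
        "-dNOPAUSE",             # don't pause between pages
        "-q",                    # suppress the startup banner
        "-sDEVICE=pdfwrite",     # produce PDF output
        f"-sOutputFile={output}",
        *inputs,                 # inputs are processed in the order given
    ]


def combine(inputs: List[str], output: str) -> None:
    subprocess.run(gs_combine_cmd(inputs, output), check=True)
```

For example, `combine(["cover.ps", "body.pdf"], "combined.pdf")` mixes a PostScript and a PDF input in one run.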
pstoedit is a free computer program that converts PostScript and PDF files to other vector formats. It supports many output formats, including WMF/EMF, PDF, DXF, CGM, and HTML, and by means of free/shareware plugins SVG, MIF and RTF. The author and maintainer is Wolfgang Glunz.
pstoedit uses ghostscript to perform the first part of the conversion process. Ghostscript converts the PostScript (or PDF) file to a more basic PostScript format, translating complex functions to basic functions, such as line draw commands. The second part of the conversion process consists of translating these basic functions into basic functions of the output format.
iText is a free and open source library for creating and manipulating PDF files in Java. It was written by Bruno Lowagie, Paulo Soares, and others. As of version 5.0.0 (released Dec 7, 2009) it is distributed under the Affero General Public License version 3. Previous versions of iText (Java: up to 2.1.7 and C# up to 4.1.6) were distributed under the Mozilla Public License or the LGPL. iText is also available through a proprietary license, distributed by iText Software Corp.
- Serve PDF to a browser
- Generate dynamic documents from XML file or databases
- Use PDF's many interactive features
- Add bookmarks, page numbers, watermarks, barcodes, etc.
- Split, concatenate and manipulate PDF pages
- Automate filling out PDF forms
- Add digital signatures to a PDF file
pdfrecycle is an open source cross-platform tool to create a PDF file by composing pages from other PDF files. It can add PDF bookmarks and metadata, scale, rotate and crop pages and put multiple logical pages onto each physical sheet. pdfrecycle is based on pdfTeX.
QPDF is a command-line program that does structural, content-preserving transformations on PDF files. It could have been called something like pdf-to-pdf. It also provides many useful capabilities to developers of PDF-producing software or for people who just want to look at the innards of a PDF file to learn more about how they work.
QPDF is capable of creating linearized (also known as Web-optimized) files and encrypted files. It is also capable of converting PDF files with object streams (also known as compressed objects) to files with no compressed objects or to generate object streams from files that don't have them (or even those that already do). QPDF also supports a special mode designed to allow you to edit the content of PDF files in a text editor. For more details, please see the documentation links below.
QPDF is not a PDF content creation library, a PDF viewer, or a program capable of converting PDF into other formats. In particular, QPDF knows nothing about the semantics of PDF content streams. If you are looking for something that can do that, you should look elsewhere. However, once you have a valid PDF file, QPDF can be used to transform that file in ways perhaps your original PDF creation can't handle. For example, many programs generate simple PDF files but can't password-protect them, Web-optimize them, or perform other transformations of that type....
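The transformations described above each correspond to a qpdf option. This sketch builds the command lines, assuming qpdf is installed; the helper names are hypothetical, while `--linearize`, `--encrypt USER OWNER BITS --`, `--qdf` and `--object-streams=disable|generate|preserve` are qpdf options.

```python
from typing import List


def linearize_cmd(src: str, dst: str) -> List[str]:
    """Web-optimize (linearize) a file."""
    return ["qpdf", "--linearize", src, dst]


def encrypt_cmd(src: str, dst: str, user_pw: str, owner_pw: str,
                bits: int = 256) -> List[str]:
    """Password-protect a file; '--' ends the encryption options."""
    return ["qpdf", "--encrypt", user_pw, owner_pw, str(bits), "--", src, dst]


def qdf_cmd(src: str, dst: str) -> List[str]:
    """Rewrite into QDF form, intended for editing in a text editor."""
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def object_streams_cmd(src: str, dst: str, mode: str) -> List[str]:
    """'disable' removes compressed objects; 'generate' creates object streams."""
    if mode not in ("disable", "generate", "preserve"):
        raise ValueError(f"unknown object-streams mode: {mode}")
    return ["qpdf", f"--object-streams={mode}", src, dst]
```

After hand-editing a QDF file, qpdf's companion `fix-qdf` program can repair stream lengths and cross-reference offsets that the edit invalidated, e.g. `fix-qdf edited.pdf > fixed.pdf`.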
flpsed - a Postscript and PDF annotator
- Add arbitrary text to existing PostScript documents.
- Re-edit text that has been added with flpsed.
- The overall structure of the PostScript document is not modified. flpsed only adds the additional text.
- Batch processing (no X11 required) to modify tagged text lines that have been entered interactively with flpsed before. This is very useful for repeatedly filling in forms.
- Text lines can be imported from other flpsed-modified documents.
- Import and export PDF. Therefore it can be used as a PDF editor as well.
Requires X11, fltk (Fast Light ToolKit), and Ghostscript.
ImageMagick's convert tool
Among the large number of command-line tools included in the ImageMagick suite is "convert", whose capabilities include conversion of PDF to various image formats such as PNG.
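A typical PDF-to-PNG conversion with convert sets the rasterizing resolution with `-density`, which has to come before the input file to take effect when the PDF is read; a zero-based `[N]` suffix on the filename selects a single page. This is a sketch assuming ImageMagick (which relies on Ghostscript for PDF input) is installed; the function name is hypothetical, and ImageMagick 7 installs the same tool as `magick`.

```python
from typing import List, Optional


def pdf_to_png_cmd(src: str, dst: str, dpi: int = 150,
                   page: Optional[int] = None) -> List[str]:
    """Build a convert command; page is zero-based, None converts every page."""
    source = src if page is None else f"{src}[{page}]"
    # -density precedes the input so it applies when the PDF is rasterized
    return ["convert", "-density", str(dpi), source, dst]
```

Converting all pages of a multi-page PDF to `out.png` makes convert write numbered files (`out-0.png`, `out-1.png`, and so on).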
Qoppa Software's PDF Studio (proprietary)
"PDF Studio is a powerful, easy to use PDF editor that provides a large number of functions on PDF documents at a fraction of the cost of Adobe Acrobat and other PDF tools. PDF Studio maintains full compatibility with the PDF Standard." Looks to be Java-based.
Xournal is personal journal-creation/editing software. However, it includes PDF annotation/editing functions courtesy of the Poppler library:
"Xournal can be used to annotate PDF files, by loading the pages of a PDF file as backgrounds for a journal. As of version 0.4.5 this is done using the poppler library (previous versions used the pdftoppm converter, which is part of the xpdf utilities or the poppler utilities depending on distributions). The Annotate PDF command in the File menu can be used to load a PDF file into a new (empty) journal. The page backgrounds and page sizes correspond to the contents of the PDF file. (Most unencrypted PDF files should be supported)."