Reading PDF Properties with Ruby

Posted by Peter Wagenet Sat, 01 Nov 2008 20:07:00 GMT

I'll admit up front that this is not the most interesting topic I could write about. However, I recently had to do some PDF processing that involved reading custom properties from a PDF. It took me a while to stumble onto a solution, so this may serve as a useful fast track.

What I eventually found was PDF::Reader, available on GitHub. In the README, the author gives instructions for extracting metadata, which was exactly what I wanted. Unfortunately, the setup is a bit awkward: you have to define a separate receiver object whose callback methods are handed the metadata.
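If the receiver idea is unfamiliar, here is a rough sketch of the pattern in plain Ruby. ToyParser, TitleReceiver and the event list are made-up stand-ins for illustration only; this is not PDF::Reader's code, just the general shape of a parser that calls back into an object you supply.

```ruby
# Hypothetical stand-in for a callback-driven parser (NOT PDF::Reader itself).
class ToyParser
  # events: array of [callback_name, payload] pairs standing in for parsed content
  def initialize(events)
    @events = events
  end

  # Hand each event to the receiver, skipping callbacks it does not implement.
  def run(receiver)
    @events.each do |name, payload|
      receiver.send(name, payload) if receiver.respond_to?(name)
    end
    receiver
  end
end

# A receiver only defines the callbacks it cares about.
class TitleReceiver
  attr_reader :info

  def metadata(data)
    @info = data
  end
end

# Made-up events: one the receiver ignores, one it handles.
events = [
  [:begin_page, nil],
  [:metadata, { :Title => "Example", :Author => "Someone" }]
]

receiver = ToyParser.new(events).run(TitleReceiver.new)
puts receiver.info[:Title]
```

The point is that the parser never returns the data directly. It pushes it into the receiver, and you read it back off the receiver afterwards.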

To simplify things, I wrote the following snippet:

require 'pdf/reader'

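# Receiver that simply stores whatever metadata PDF::Reader hands it.
class MetaDataReceiver
  attr_accessor :regular
  attr_accessor :xml

  # Callback for the document's regular metadata
  def metadata(data)
    @regular = data
  end

  # Callback for the document's XML metadata
  def metadata_xml(data)
    @xml = data
  end
end

# Reopen PDF::Reader to add a convenience class method.
class PDF::Reader
  # Reads only the metadata (no page content) and returns the populated receiver.
  def self.metadata(path)
    receiver = MetaDataReceiver.new
    PDF::Reader.file(path, receiver, :pages => false, :metadata => true)
    receiver
  end
end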

Getting metadata from your PDFs is now as simple as calling:

PDF::Reader.metadata(PATH_TO_PDF).regular
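Once you have that value, pulling out a single custom property is a small step. The sketch below assumes the metadata comes back as a Hash keyed by symbols, which is worth checking against your version of PDF::Reader. The sample hash, the Department property and the pdf_property helper are all hypothetical, standing in for whatever your PDF actually contains.

```ruby
# Sample data standing in for PDF::Reader.metadata(path).regular.
# The keys and values here are made up for illustration.
info = {
  :Title => "Quarterly Report",
  :Author => "Jane Doe",
  :Department => "Finance" # hypothetical custom property
}

# Look up a property by name, accepting a String or a Symbol, and
# fall back to a default when the key (or the whole hash) is missing.
def pdf_property(info, name, default = nil)
  return default unless info.respond_to?(:fetch)
  info.fetch(name.to_sym) { info.fetch(name.to_s, default) }
end

puts pdf_property(info, :Department)
puts pdf_property(info, "Author")
puts pdf_property(info, :Missing, "n/a")
```

The nil guard is there because a PDF without any metadata may leave the receiver's value unset.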

On the PDF::Reader GitHub page you'll also find instructions on how to count the pages in a PDF, write tests to check a generated PDF, and more. It's definitely worth checking out.
