Reading PDF Properties with Ruby
I'll admit up front that this is not the most interesting topic I could write about. However, I recently had to do some PDF processing that involved reading custom properties from a PDF. It took me a little while to stumble onto a solution, so this may serve as a useful fast track.
What I eventually found was PDF::Reader, available on GitHub. In the README, the author gives instructions for extracting metadata, which was exactly what I wanted. Unfortunately, the setup is a bit awkward: you have to define a separate receiver class to collect the results.
To simplify things, I wrote the following snippet:
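
require 'pdf/reader'

# Collects the metadata that PDF::Reader passes to its callbacks.
class MetaDataReceiver
  attr_accessor :regular
  attr_accessor :xml

  # Callback for the regular metadata.
  def metadata(data)
    @regular = data
  end

  # Callback for the XML metadata.
  def metadata_xml(data)
    @xml = data
  end
end

# Reopen PDF::Reader to add a one-call convenience method.
class PDF::Reader
  def self.metadata(path)
    receiver = MetaDataReceiver.new
    # Skip the page content and ask only for the metadata callbacks.
    PDF::Reader.file(path, receiver, :pages => false, :metadata => true)
    receiver
  end
end

Getting metadata from your PDFs is now as simple as calling: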

PDF::Reader.metadata(PATH_TO_PDF).regular

On the PDF::Reader GitHub page you'll also find instructions on how to count the pages in a PDF, write tests to check a generated PDF, and more. It's definitely worth checking out.
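
For custom properties, a small lookup helper can make the result easier to work with. The following is only a minimal sketch: it assumes that `regular` comes back as a plain Hash keyed by property name, and both the `pdf_property` helper and the `:ReviewStatus` key are made up for illustration. The sample hash stands in for a real PDF so the snippet runs on its own.

```ruby
# Hypothetical helper (not part of PDF::Reader): look up one property
# in the metadata Hash. Keys may be symbols or strings, so both are tried.
def pdf_property(info, name, default = nil)
  # Guard against a missing result, e.g. a PDF with no metadata at all.
  return default unless info.respond_to?(:fetch)

  info.fetch(name.to_sym) { info.fetch(name.to_s, default) }
end

# Stand-in for PDF::Reader.metadata(PATH_TO_PDF).regular.
# :ReviewStatus is an invented custom property.
sample_info = { :Author => "Jane Doe", :ReviewStatus => "approved" }

puts pdf_property(sample_info, :ReviewStatus)    # custom property
puts pdf_property(sample_info, "Author")         # string name also works
puts pdf_property(sample_info, :Missing, "n/a")  # falls back to the default
```

With a real file, the first argument would be the `regular` value returned by the call shown earlier instead of `sample_info`.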