Ruby Operators and Assignment Shortcuts
Recently I was looking through some source code (probably Rails) and discovered a handy little shortcut. When you use the boolean operators || and &&, Ruby doesn’t return true or false; it returns the value of the last operand it evaluated. I’m sure most of you are familiar with using the OR operator this way. Here’s a simple example:
a = nil
b = "hello"
c = a || b # => "hello"
As you can see, the variable “c” isn’t assigned true or false. Instead, Ruby evaluates “a”, finds that it is nil (false would behave the same way) and then moves on to “b”, whose value it assigns to “c”.
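A few more cases make the rule concrete. Only nil and false are falsy in Ruby, so values like 0 and the empty string are returned by || rather than skipped. first_truthy is a hypothetical helper used only for illustration.

```ruby
# Hypothetical helper: returns whatever `left || right` evaluates to.
# `||` returns `left` if it is truthy (anything except nil and false);
# otherwise it returns `right`, the last operand it evaluated.
def first_truthy(left, right)
  left || right
end

p first_truthy(nil, "hello")    # => "hello"
p first_truthy(false, "hello")  # => "hello"
p first_truthy(0, "hello")      # => 0  (0 is truthy in Ruby)
p first_truthy("", "hello")     # => "" (so is the empty string)
p first_truthy(nil, false)      # => false (the last value evaluated)
```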
What you might not have realized is that this same principle can also be applied to the AND operator.
To see how this works, take the following example:
a = "hello"
b = a ? a.length : nil #=> 5
Here we’re calling a method on “a” that won’t exist if “a” is nil. So first, we check whether “a” is set (neither nil nor false). If it is, we call the method on it; otherwise we return nil.
Now this works fine, but using what we’ve seen about how operators work, we can abbreviate this even more:
a = "hello"
b = a && a.length # => 5
Here, Ruby first evaluates “a” and, if it’s neither nil nor false, moves on to the next part: “a.length”. Since “a” isn’t nil, a.length can be called safely and returns 5. If, however, “a” had been nil or false, evaluation would have stopped there and “b” would have been assigned the value of “a” (nil or false).
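Side by side, the ternary and the && forms behave identically except when “a” is false: the ternary returns nil, while && returns false itself. The two wrapper methods below are hypothetical names used only to compare them.

```ruby
# Hypothetical wrappers comparing the two guard styles.
def length_with_ternary(a)
  a ? a.length : nil
end

def length_with_and(a)
  a && a.length # a.length is only evaluated when a is truthy
end

p length_with_ternary("hello") # => 5
p length_with_and("hello")     # => 5
p length_with_and(nil)         # => nil (nil.length is never called)
p length_with_ternary(false)   # => nil
p length_with_and(false)       # => false (the value of `a` itself)
```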
I admit that this isn’t a huge shortcut, but I find that it reduces a bit of redundancy in some cases and makes my code just a little bit cleaner. After all, every little bit adds up towards making nice clean code.
Regexp.new with Multiple Modifiers
This may be a no-brainer, but I was working with Regexp.new and wanted to make a regexp with multiple modifiers. Making one with a single modifier is easy enough; the docs show you how:
r = Regexp.new('dog', Regexp::EXTENDED) #=> /dog/x
But what if you want more than one modifier? The docs say they should be “or-ed” together. This is as simple as:
r = Regexp.new('test', Regexp::IGNORECASE | Regexp::MULTILINE)
#=> /test/mi
You can even use all three:
r = Regexp.new('test', Regexp::IGNORECASE | Regexp::MULTILINE | Regexp::EXTENDED)
#=> /test/mix
Sort of a no-brainer once you realize that “or-ed” means combining the integer option constants with the bitwise OR operator, |.
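Because “or-ed” here means a bitwise OR, it can help to look at the values involved. This sketch prints the option constants and the combined options of the three-modifier regexp; the multiline variable is just for illustration.

```ruby
# Each option constant is a single-bit Integer flag...
p Regexp::IGNORECASE # => 1
p Regexp::EXTENDED   # => 2
p Regexp::MULTILINE  # => 4

# ...so OR-ing them sets several bits at once.
r = Regexp.new('test', Regexp::IGNORECASE | Regexp::MULTILINE | Regexp::EXTENDED)
p r.options # => 7

# A bitwise AND checks whether one particular flag is set.
multiline = (r.options & Regexp::MULTILINE) != 0
p multiline         # => true
p r.match?("TEST")  # => true, thanks to IGNORECASE
```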
Get a Backtrace for Debugging Without Throwing an Exception
In one of my current projects, I ran into a very annoying situation where an object attribute was getting set to an incorrect value. The problem was that I didn’t know where this was happening. Furthermore, as far as I could tell, it was being set in a method that had a complex series of alias_method_chains and callbacks. After spending far too long debugging what turned out to be a simple issue, I realized I needed a better solution for the future.
What I really wanted was a backtrace showing what method chain had led to setting the attribute incorrectly. However, as far as I could tell, there was no easy way to get a backtrace outside of raising an Exception. So I came up with a little helper to make this process easier:
class Object
  def backtrace
    raise # a bare raise with no current exception raises a RuntimeError
  rescue RuntimeError => e
    e.backtrace[1..-1] # Leave off the first line since it only points at this method
  end
end
As you can see, nothing complex is going on here. We’ve added a new method to the Object class, which means it’s available on every object. We raise an error, immediately rescue it and return its backtrace. We cut off the first line of the backtrace since it only refers to the backtrace method itself, which we’re not interested in. (And yes, I know this means the title is a bit of a lie since an exception is raised in our backtrace method, but at least we avoid raising one in the method we’re debugging.)
Here’s a usage example for an ActiveRecord model:
def write_attribute(attr_name, value)
  logger.info "write_attribute backtrace\n#{backtrace.join("\n")}"
  super
end
The output will look like a normal backtrace, but without the clutter of raising an exception in the method itself.
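For what it’s worth, Ruby’s built-in Kernel#caller returns the current call stack as an array of strings without raising anything, so the helper could likely be reduced to a single call. The sketch below is a hypothetical variant, not the helper shown above; set_attribute_somewhere is a made-up method standing in for the code being debugged.

```ruby
class Object
  # Hypothetical variant of the helper above. Kernel#caller returns the
  # call stack as an array of strings and, by default, omits the frame of
  # the method it is called from (here, `backtrace` itself), so there is
  # nothing to slice off and no exception is raised.
  def backtrace
    caller
  end
end

# Made-up method standing in for the code being debugged.
def set_attribute_somewhere
  backtrace
end

trace = set_attribute_somewhere
puts trace.first # the line in set_attribute_somewhere that called backtrace
```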
Stop Hash.from_xml from Killing XML Attributes
UPDATE: I’ve submitted a patch for this to core. Check it out here.
Before I get into the solution, I should probably begin by explaining the problem. Consider the following line of code:
Hash.from_xml('<variable type="integer">5</variable>')
#=> { "variable" => 5 }
There are two important things to notice here. First, the 5 is parsed as an integer; second, we lose the “type” attribute. Behind the scenes, Rails looks for a “type” attribute to determine how to typecast the element contents. Under most circumstances this is all well and good, but what if you run into a situation where the “type” attribute doesn’t match any of the types known to Rails?
Hash.from_xml('<variable type="product_code">5</variable>')
#=> { "variable" => "5" }
Now we not only lose the benefit of typecasting, since Rails doesn’t know what to do with type “product_code” (the 5 comes back as a string), we also lose any information telling us that the type was “product_code”. Unfortunately, it gets worse:
Hash.from_xml('<variable type="product_code" subtype="upc">5</variable>')
#=> { "variable" => "5" }
We also lose any other attributes on this element! This is true of every element that has text content but no child elements.
Since a project I’m working on requires me to be able to get the attributes in these cases, I had to do a little hacking in the Hash class. My results are as follows:
class Hash
  def self.from_xml(xml, preserve_attributes = false)
    # TODO: Refactor this into something much cleaner that doesn't rely on XmlSimple
    typecast_xml_value(undasherize_keys(XmlSimple.xml_in_string(xml,
      'forcearray'   => false,
      'forcecontent' => true,
      'keeproot'     => true,
      'contentkey'   => '__content__')
    ), preserve_attributes)
  end
  def self.typecast_xml_value(value, preserve_attributes = false)
    case value.class.to_s
    when 'Hash'
      if value['type'] == 'array'
        child_key, entries = value.detect { |k,v| k != 'type' } # child_key is throwaway
        # Parenthesize the assignment so c holds the content before blank? is called
        if entries.nil? || ((c = value['__content__']) && c.blank?)
          []
        else
          case entries.class.to_s # something weird with classes not matching here. maybe singleton methods breaking is_a?
          when "Array"
            entries.collect { |v| typecast_xml_value(v, preserve_attributes) }
          when "Hash"
            [typecast_xml_value(entries, preserve_attributes)]
          else
            raise "can't typecast #{entries.inspect}"
          end
        end
      elsif value.has_key?("__content__")
        content = value["__content__"]
        if parser = XML_PARSING[value["type"]]
          parser.arity == 2 ? parser.call(content, value) : parser.call(content)
        elsif preserve_attributes && value.keys.size > 1
          # Unknown type: keep every attribute and move the text into "content"
          value["content"] = value.delete("__content__")
          value
        else
          content
        end
      elsif value['type'] == 'string' && value['nil'] != 'true'
        ""
      # blank or nil parsed values are represented by nil
      elsif value.blank? || value['nil'] == 'true'
        nil
      # If the type is the only element which makes it then
      # this still makes the value nil, except if type is
      # a XML node(where type['value'] is a Hash)
      elsif value['type'] && value.size == 1 && !value['type'].is_a?(::Hash)
        nil
      else
        xml_value = value.inject({}) do |h,(k,v)|
          h[k] = typecast_xml_value(v, preserve_attributes)
          h
        end
        # Turn { :files => { :file => #<StringIO> } into { :files => #<StringIO> } so it is compatible with
        # how multipart uploaded files from HTML appear
        xml_value["file"].is_a?(StringIO) ? xml_value["file"] : xml_value
      end
    when 'Array'
      value.map! { |i| typecast_xml_value(i, preserve_attributes) }
      case value.length
      when 0 then nil
      when 1 then value.first
      else value
      end
    when 'String'
      value
    else
      raise "can't typecast #{value.class.name} - #{value.inspect}"
    end
  end
  # A bare `private` has no effect on `def self.` methods, so hide the helper explicitly
  private_class_method :typecast_xml_value
end
Now that’s a bit lengthy, but all you have to worry about is the new parameter for #from_xml called “preserve_attributes”. Setting it to true gives slightly less elegant, but much more useful, results in the cases above. With it, you get something like this:
Hash.from_xml('<variable type="product_code" subtype="upc">5</variable>', true)
# => {"variable"=>{"type"=>"product_code", "content"=>"5", "subtype"=>"upc"}}
The only downside to this solution is that the variable’s value is now stored under “content”. It’s a touch less elegant, but I find it a worthwhile trade-off for not losing any data.
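To make the preserve_attributes branch easier to follow, here is a stripped-down, plain-Ruby sketch of just the “__content__” case, operating on the kind of hash XmlSimple produces for a childless element. KNOWN_TYPES and typecast_leaf are hypothetical names for illustration, and KNOWN_TYPES stands in for (and is much smaller than) Rails’ XML_PARSING table.

```ruby
# Hypothetical, tiny stand-in for Rails' XML_PARSING table.
KNOWN_TYPES = {
  "integer" => ->(s) { s.to_i },
  "boolean" => ->(s) { s.strip == "true" }
}

# Mirrors the "__content__" branch of the patched typecast_xml_value.
# `node` looks like {"type" => "product_code", "__content__" => "5"}.
def typecast_leaf(node, preserve_attributes = false)
  content = node["__content__"]
  if (parser = KNOWN_TYPES[node["type"]])
    parser.call(content)                 # known type: typecast the text
  elsif preserve_attributes && node.keys.size > 1
    # Unknown type: keep the attributes and expose the text as "content"
    node.reject { |k, _| k == "__content__" }.merge("content" => content)
  else
    content                              # unknown type: bare text, attributes lost
  end
end

leaf = { "type" => "product_code", "subtype" => "upc", "__content__" => "5" }
p typecast_leaf({ "type" => "integer", "__content__" => "5" }) # => 5
p typecast_leaf(leaf)                                          # => "5"
p typecast_leaf(leaf, true)
```

Unlike the patched method, this sketch builds a new hash instead of mutating its argument with delete, which makes it a little easier to test in isolation.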
As an added bonus, it’s not too hard to make use of this with ActiveResource, which is actually where I ran into the problem originally. All we have to do is add this Format to ActiveResource:
module ActiveResource
  module Formats
    module AttributePreservingXmlFormat
      extend XmlFormat
      def self.decode(xml)
        from_xml_data(Hash.from_xml(xml, true))
      end
    end
  end
end
And then in your model:
class SomeResource < ActiveResource::Base
  self.format = :attribute_preserving_xml
end
We’ve now prevented Hash.from_xml from killing our attributes.
Reading PDF Properties with Ruby
I’ll admit right off the bat that this is not the most interesting topic I could write about. However, I recently had to do some PDF processing in which I needed to read some custom properties from a PDF. It took me a while to stumble onto a solution, so this may serve as a useful fast track.
What I eventually found was PDF::Reader, available on GitHub. Its README gives instructions for extracting metadata, which was exactly what I wanted. Unfortunately, the setup is a bit awkward, since it requires passing in a separate receiver object.
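If the receiver idea is unfamiliar: instead of returning data, a callback-style reader calls methods on an object you hand it as it parses. The sketch below imitates that pattern with a hypothetical FakeCallbackReader and TitleReceiver. Neither is part of PDF::Reader, and the respond_to? check is an assumption about how such a reader might decide which callbacks to fire, not a description of the library’s internals.

```ruby
# Hypothetical callback-style reader: for each event it "parses", it calls
# the receiver method of the same name, but only if the receiver defines it.
class FakeCallbackReader
  def initialize(events)
    @events = events # e.g. { metadata: { Title: "Report" }, page_count: 3 }
  end

  def parse(receiver)
    @events.each do |name, data|
      receiver.public_send(name, data) if receiver.respond_to?(name)
    end
    receiver
  end
end

# A receiver only implements the callbacks it cares about.
class TitleReceiver
  attr_reader :title

  def metadata(data)
    @title = data[:Title]
  end
end

reader = FakeCallbackReader.new({ metadata: { Title: "Report" }, page_count: 3 })
p reader.parse(TitleReceiver.new).title # => "Report"
```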
To simplify things, I wrote the following snippet:
require 'pdf/reader'
# Collects whatever metadata PDF::Reader passes to these callbacks
class MetaDataReceiver
  attr_accessor :regular
  attr_accessor :xml
  def metadata(data)
    @regular = data
  end
  def metadata_xml(data)
    @xml = data
  end
end
class PDF::Reader
  def self.metadata(path)
    receiver = MetaDataReceiver.new
    PDF::Reader.file(path, receiver, :pages => false, :metadata => true)
    receiver
  end
end
Getting metadata from your PDFs is now as simple as calling:
PDF::Reader.metadata(PATH_TO_PDF).regular
On the PDF::Reader GitHub page you’ll also find instructions on how to count the pages in a PDF, write tests that check a generated PDF, and more. It’s definitely worth checking out.