Ruby Operators and Assignment Shortcuts

Posted by Peter Wagenet Fri, 23 Jan 2009 15:13:00 GMT

Recently I was looking through some source code (probably Rails) and discovered a handy little shortcut. When you use the boolean operators, Ruby doesn’t return true or false; it returns the actual value of the last operand that it evaluated. I’m sure most of you are familiar with using the OR operator for this. Here’s a simple example:

a = nil
b = "hello"
c = a || b # => "hello"

As you can see, the variable “c” isn’t assigned a true or false value. Instead Ruby tries to evaluate “a”, discovers that it returns nil (a false value would behave the same) and then moves on to “b” which it assigns to “c”.
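To make the rule concrete, here is a minimal sketch of a few more cases. The `pick` method is a hypothetical wrapper that exists only for this illustration; it simply returns whatever `||` returns.

```ruby
# Hypothetical helper: returns exactly what `first || second` evaluates to.
def pick(first, second)
  first || second
end

pick(nil, "hello")    # => "hello"  (nil is falsy, so the right side is returned)
pick(false, "hello")  # => "hello"  (false behaves the same as nil)
pick("hi", "hello")   # => "hi"     (left side is truthy, right side is never used)
pick(nil, false)      # => false    (both falsy: the last one evaluated wins)
pick(false, nil)      # => nil
pick(0, "hello")      # => 0        (0 is truthy in Ruby)
```

Note that only nil and false count as false values here. Zero and the empty string are both truthy.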

What you might not have realized is that this same principle can also be applied to the AND operator.

To see how this works, take the following example:

a = "hello"
b = a ? a.length : nil #=> 5

Here we’re calling a method on “a” that won’t exist if “a” is nil. So first, we check to see if “a” exists. If it exists, we call a method on it, otherwise we return nil.

Now this works fine, but using what we’ve seen about how operators work, we can abbreviate this even more:

a = "hello"
b = a && a.length # => 5

Here, Ruby first evaluates “a”. If it’s neither nil nor false, Ruby moves on to the next part: “a.length”. Since “a” isn’t nil, a.length can be called safely and returns 5. If, however, “a” had been nil or false, evaluation would have stopped there and “b” would have been assigned the value of “a” (nil or false).
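Here is a small sketch of both outcomes, plus a chained version. The method names `safe_length` and `city_name` and the address hash are assumptions made up for this example.

```ruby
# Hypothetical helper: only calls #length when value is neither nil nor false.
def safe_length(value)
  value && value.length
end

safe_length("hello")  # => 5
safe_length(nil)      # => nil    (evaluation stops at the nil)
safe_length(false)    # => false  (evaluation stops at the false)

# The same idea chains: each step runs only if the previous one was truthy.
def city_name(address)
  address && address[:city] && address[:city].upcase
end

city_name({ :city => "oslo" })  # => "OSLO"
city_name({})                   # => nil (no :city key)
city_name(nil)                  # => nil (no address at all)
```

One thing to keep in mind: the result can be false rather than nil, so code that checks the result with `nil?` may behave differently from the ternary version.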

I admit that this isn’t a huge shortcut, but I find that it reduces a bit of redundancy in some cases and makes my code just a little bit cleaner. After all, every little bit adds up towards making nice clean code.

Regexp.new with Multiple Modifiers

Posted by Peter Wagenet Tue, 18 Nov 2008 16:31:00 GMT

This may be a no-brainer, but I was working with Regexp.new and wanted to make a regexp with multiple modifiers. It’s easy enough to make one with a single modifier; the docs show you how:

r = Regexp.new('dog', Regexp::EXTENDED)   #=> /dog/x

But what if you want more than one modifier? The docs say they should be “or-ed” together. This is as simple as:

r = Regexp.new('test', Regexp::IGNORECASE | Regexp::MULTILINE)
#=> /test/mi

You can even use all three as such:

r = Regexp.new('test', Regexp::IGNORECASE | Regexp::MULTILINE | Regexp::EXTENDED)
#=> /test/mix

Sort of a no-brainer once you realize what it means.
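If it helps to see why OR-ing works, each modifier constant is an integer with a single bit set, so combining them produces a bitmask. The sketch below wraps that in a small helper. The `REGEXP_FLAGS` table and the `regexp_with` method are hypothetical names invented for this example, not part of Ruby.

```ruby
# Hypothetical lookup table from short names to the Regexp option constants.
REGEXP_FLAGS = {
  :ignorecase => Regexp::IGNORECASE,
  :extended   => Regexp::EXTENDED,
  :multiline  => Regexp::MULTILINE
}

# Hypothetical helper: OR the requested flags together into one integer.
def regexp_with(source, *flag_names)
  options = flag_names.inject(0) { |mask, name| mask | REGEXP_FLAGS.fetch(name) }
  Regexp.new(source, options)
end

r = regexp_with('test', :ignorecase, :multiline)
r.inspect                               # => "/test/mi"
(r.options & Regexp::IGNORECASE) != 0   # => true  (flag is set)
(r.options & Regexp::EXTENDED) != 0     # => false (flag is not set)
```

Regexp#options returns the combined bitmask, so AND-ing it with a single constant tells you whether that modifier is active.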

Get a Backtrace for Debugging Without Throwing an Exception

Posted by Peter Wagenet Tue, 11 Nov 2008 18:18:00 GMT

In one of my current projects, I ran into a very annoying situation where an object attribute was getting set to an incorrect value. The problem was that I didn’t know where this was happening. Furthermore, as far as I could tell, it was being set in a method that had a complex series of alias_method_chains and callbacks. After spending far too long debugging what turned out to be a simple issue, I realized I needed a better solution for the future.

What I really wanted was a backtrace showing what method chain had led to setting the attribute incorrectly. However, as far as I could tell, there was no easy way to get a backtrace outside of raising an Exception. So I came up with a little helper to make this process easier:

class Object
  def backtrace
    raise
  rescue Exception => e
    e.backtrace[1..-1] # Leave off first line since we don't care about it
  end
end

As you can see, nothing complex is going on here. We’ve added a new method to the Object class, which means that it’s available to all other classes. We raise an error, immediately rescue it and return its backtrace. We cut off the first line of the backtrace since it’s only a reference to the backtrace method itself, which we’re not too interested in. (And yes, I know this means the title is a bit of a lie since an exception is thrown in our backtrace method, but at least we can avoid it in the main method.)

Here’s a usage example for an ActiveRecord model:

def write_attribute(attr_name, value)
  logger.info "write_attribute backtrace\n#{backtrace.join("\n")}"
  super
end

Output will look like a normal backtrace, but without the clutter of raising an Exception.
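For anyone who wants to try this outside of Rails, here is a self-contained sketch. The `Account` class and `apply_refund` method are hypothetical and exist only to give the stack a couple of frames. It also records the result of Ruby's built-in Kernel#caller, which returns the call stack as an array of strings without raising anything, so the two can be compared.

```ruby
# Same helper as above, repeated so this snippet runs on its own.
class Object
  def backtrace
    raise
  rescue Exception => e
    e.backtrace[1..-1] # Drop the frame for the backtrace method itself
  end
end

# Hypothetical class whose setter records how it was reached.
class Account
  attr_reader :balance, :helper_trace, :caller_trace

  def balance=(value)
    @helper_trace = backtrace  # via the raise/rescue helper
    @caller_trace = caller(0)  # built-in; 0 means "include the current frame"
    @balance = value
  end
end

# Hypothetical caller, so the trace has more than one interesting frame.
def apply_refund(account)
  account.balance = 100
end

account = Account.new
apply_refund(account)
puts account.helper_trace.first(2)
puts account.caller_trace.first(2)
```

Both arrays start with the frame for balance= followed by the frame for apply_refund. One caveat with the helper: a bare `raise` re-raises the exception currently stored in `$!`, so calling it from inside a rescue block would return that exception's backtrace instead, which `caller` does not do.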

Stop Hash.from_xml from Killing XML Attributes

Posted by Peter Wagenet Thu, 06 Nov 2008 00:41:00 GMT

UPDATE: I’ve submitted a patch for this to core. Check it out here.

Before I get into the solution, I should probably begin by explaining the problem. Consider the following line of code:

Hash.from_xml('<variable type="integer">5</variable>')
#=> { "variable" => 5 }

There are two important things to notice here. First, the 5 is parsed as an integer. Second, we lose the “type” attribute. What’s going on behind the scenes is that Rails is looking for a "type" attribute to determine how to typecast the element contents. Now under most circumstances, this is all well and good, but what if you run into a situation where the “type” attribute doesn’t match up with ones known to Rails?

Hash.from_xml('<variable type="product_code">5</variable>')
#=> { "variable" => "5" }

Now we not only lose the benefit of typecasting, since Rails doesn’t know what to do with type “product_code”, but we also lose any information telling us that the type was “product_code”. Unfortunately, it gets worse:

Hash.from_xml('<variable type="product_code" subtype="upc">5</variable>')
#=> { "variable" => "5" }

We also lose any other attributes on this element! This is true of all elements that are childless but not empty.

Since a project I’m working on requires me to be able to get the attributes in these cases, I had to do a little hacking in the Hash class. My results are as follows:

class Hash

  def self.from_xml(xml, preserve_attributes = false)
    # TODO: Refactor this into something much cleaner that doesn't rely on XmlSimple
    typecast_xml_value(undasherize_keys(XmlSimple.xml_in_string(xml,
      'forcearray'   => false,
      'forcecontent' => true,
      'keeproot'     => true,
      'contentkey'   => '__content__')
    ), preserve_attributes)
  end

  private

    def self.typecast_xml_value(value, preserve_attributes = false)
      case value.class.to_s
        when 'Hash'
          if value['type'] == 'array'
            child_key, entries = value.detect { |k,v| k != 'type' }   # child_key is throwaway
            if entries.nil? || ((c = value['__content__']) && c.blank?)   # parenthesized so c is assigned before c.blank? runs
              []
            else
              case entries.class.to_s   # something weird with classes not matching here.  maybe singleton methods breaking is_a?
              when "Array"
                entries.collect { |v| typecast_xml_value(v, preserve_attributes) }
              when "Hash"
                [typecast_xml_value(entries, preserve_attributes)]
              else
                raise "can't typecast #{entries.inspect}"
              end
            end
          elsif value.has_key?("__content__")
            content = value["__content__"]
            if parser = XML_PARSING[value["type"]]
              if parser.arity == 2
                XML_PARSING[value["type"]].call(content, value)
              else
                XML_PARSING[value["type"]].call(content)
              end
            elsif preserve_attributes && value.keys.size > 1
              value["content"] = value.delete("__content__")
              value
            else
              content
            end
          elsif value['type'] == 'string' && value['nil'] != 'true'
            ""
          # blank or nil parsed values are represented by nil
          elsif value.blank? || value['nil'] == 'true'
            nil
          # If the type is the only element which makes it then 
          # this still makes the value nil, except if type is
          # a XML node(where type['value'] is a Hash)
          elsif value['type'] && value.size == 1 && !value['type'].is_a?(::Hash)
            nil
          else
            xml_value = value.inject({}) do |h,(k,v)|
              h[k] = typecast_xml_value(v, preserve_attributes)
              h
            end

            # Turn { :files => { :file => #<StringIO> } into { :files => #<StringIO> } so it is compatible with
            # how multipart uploaded files from HTML appear
            xml_value["file"].is_a?(StringIO) ? xml_value["file"] : xml_value
          end
        when 'Array'
          value.map! { |i| typecast_xml_value(i, preserve_attributes) }
          case value.length
            when 0 then nil
            when 1 then value.first
            else value
          end
        when 'String'
          value
        else
          raise "can't typecast #{value.class.name} - #{value.inspect}"
      end
    end
end

Now that’s a bit lengthy, but all you have to worry about is the new parameter for #from_xml called “preserve_attributes”. Setting this to true will provide slightly less elegant, but much more useful, results in the above cases. You get something like this:

Hash.from_xml('<variable type="product_code" subtype="upc">5</variable>', true)
# => {"variable"=>{"type"=>"product_code", "content"=>"5", "subtype"=>"upc"}}

The only downside to this solution is that our variable value is now stored in “content”. It is a touch less elegant, but I find it a worthwhile trade-off for not losing any data.
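Since the full method is long, here is a stripped-down sketch of just the branch that matters for childless elements, runnable without Rails. `PARSERS` is a hypothetical two-entry stand-in for Rails' XML_PARSING table, `typecast_leaf` is a made-up name, and the input hashes mimic what the XML parser hands over for a single element. It works on a copy of the hash rather than modifying it in place, which differs from the code above.

```ruby
# Hypothetical stand-in for Rails' XML_PARSING table; only two types here.
PARSERS = {
  "integer" => lambda { |content| content.to_i },
  "string"  => lambda { |content| content.to_s }
}

# Simplified version of the "__content__" branch of typecast_xml_value.
def typecast_leaf(value, preserve_attributes = false)
  content = value["__content__"]
  if (parser = PARSERS[value["type"]])
    parser.call(content)                 # known type: typecast, drop attributes
  elsif preserve_attributes && value.keys.size > 1
    kept = value.dup                     # copy so the caller's hash is untouched
    kept["content"] = kept.delete("__content__")
    kept                                 # unknown type: keep every attribute
  else
    content                              # unknown type: raw string only
  end
end

known   = { "type" => "integer", "__content__" => "5" }
unknown = { "type" => "product_code", "subtype" => "upc", "__content__" => "5" }

typecast_leaf(known)          # => 5
typecast_leaf(unknown)        # => "5"
typecast_leaf(unknown, true)  # => {"type"=>"product_code", "subtype"=>"upc", "content"=>"5"}
```

In this sketch the content stays a string whenever no parser matches, because nothing has typecast it.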

As an added bonus, it’s not too hard to make use of this with ActiveResource, which is actually where I ran into the problem originally. All we have to do is add this Format to ActiveResource:

module ActiveResource
  module Formats
    module AttributePreservingXmlFormat
      extend XmlFormat

      def self.decode(xml)
        from_xml_data(Hash.from_xml(xml, true))
      end
    end
  end
end

And then in your model:

class SomeResource < ActiveResource::Base
  self.format = :attribute_preserving_xml
end

We’ve now prevented Hash.from_xml from killing our attributes.

Reading PDF Properties with Ruby

Posted by Peter Wagenet Sat, 01 Nov 2008 20:07:00 GMT

So off the bat I’ll admit that this is not the most interesting topic that I could write about. However, I did have to do some PDF processing recently in which I had to read some custom properties from a PDF. It took me a little while to stumble onto a solution, so this may serve as a useful fast track.

What I eventually found was PDF::Reader, available over on GitHub. In the README, the author gives instructions for extracting metadata, which was exactly what I wanted. Unfortunately, the setup is a bit awkward, requiring an additional receiver.

To simplify things, I wrote the following snippet:

require 'pdf/reader'

class MetaDataReceiver
  attr_accessor :regular
  attr_accessor :xml

  def metadata(data)
    @regular = data
  end

  def metadata_xml(data)
    @xml = data
  end
end

class PDF::Reader
  def self.metadata(path)
    receiver = MetaDataReceiver.new
    PDF::Reader.file(path, receiver, :pages => false, :metadata => true)
    receiver
  end
end

Getting metadata from your PDFs is now as simple as calling:

PDF::Reader.metadata(PATH_TO_PDF).regular

On the PDF::Reader GitHub page you’ll also find instructions on how to count the pages in a PDF, write tests to check a generated PDF and more. It’s definitely worth checking out.