Stop Hash.from_xml from Killing XML Attributes
UPDATE: I’ve submitted a patch for this to core. Check it out here.
Before I get into the solution, I should probably begin by explaining the problem. Consider the following line of code:
Hash.from_xml('<variable type="integer">5</variable>')
#=> { "variable" => 5 }
There are two important things to notice here. First, the 5 is parsed as an integer. Second, we lose the “type” attribute. Behind the scenes, Rails looks for a "type" attribute to decide how to typecast the element’s contents. Under most circumstances this is all well and good, but what happens when the “type” attribute doesn’t match one of the types Rails knows about?
Hash.from_xml('<variable type="product_code">5</variable>')
#=> { "variable" => "5" }
Now we not only lose the benefit of typecasting, since Rails doesn’t know what to do with type “product_code” and hands back the string "5"; we also lose any record that the type was “product_code”. Unfortunately, it gets worse:
Hash.from_xml('<variable type="product_code" subtype="upc">5</variable>')
#=> { "variable" => "5" }
We also lose every other attribute on this element! This is true of all elements that are childless but not empty.
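The lookup described above can be sketched without Rails at all. This is only a rough illustration, not the ActiveSupport source: PARSERS and typecast_node are hypothetical names, and the input hashes merely imitate what an XML parser hands back for a childless element (attributes as keys, text under "__content__").

```ruby
# Hypothetical stand-in for the table Rails consults; not the real XML_PARSING.
PARSERS = {
  "integer" => lambda { |content| content.to_i },
  "float"   => lambda { |content| content.to_f }
}

# node imitates the parsed form of a childless element.
def typecast_node(node)
  content = node["__content__"]
  parser  = PARSERS[node["type"]]
  # Known type: convert the content. Unknown type: return the raw string.
  # Either way only the content comes back, so every attribute is dropped.
  parser ? parser.call(content) : content
end

known   = typecast_node({ "type" => "integer", "__content__" => "5" })
unknown = typecast_node({ "type" => "product_code", "subtype" => "upc", "__content__" => "5" })

p known    # 5
p unknown  # "5"
```

Nothing in the return value distinguishes an element that had no attributes from one whose attributes were discarded, which is the problem the rest of this post works around.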
Since a project I’m working on requires me to be able to get the attributes in these cases, I had to do a little hacking in the Hash class. My results are as follows:
class Hash
  def self.from_xml(xml, preserve_attributes = false)
    # TODO: Refactor this into something much cleaner that doesn't rely on XmlSimple
    typecast_xml_value(undasherize_keys(XmlSimple.xml_in_string(xml,
      'forcearray'   => false,
      'forcecontent' => true,
      'keeproot'     => true,
      'contentkey'   => '__content__')
    ), preserve_attributes)
  end

  # Note: `private` does not affect methods defined with `def self.`
  private
    def self.typecast_xml_value(value, preserve_attributes = false)
      case value.class.to_s
        when 'Hash'
          if value['type'] == 'array'
            child_key, entries = value.detect { |k,v| k != 'type' } # child_key is throwaway
            # Parenthesize the assignment so c holds the content before blank? is called on it
            if entries.nil? || ((c = value['__content__']) && c.blank?)
              []
            else
              case entries.class.to_s # something weird with classes not matching here. maybe singleton methods breaking is_a?
                when "Array"
                  entries.collect { |v| typecast_xml_value(v, preserve_attributes) }
                when "Hash"
                  [typecast_xml_value(entries, preserve_attributes)]
                else
                  raise "can't typecast #{entries.inspect}"
              end
            end
          elsif value.has_key?("__content__")
            content = value["__content__"]
            if parser = XML_PARSING[value["type"]]
              # Known type: typecast as before (attributes are still dropped here)
              if parser.arity == 2
                parser.call(content, value)
              else
                parser.call(content)
              end
            elsif preserve_attributes && value.keys.size > 1
              # Unknown type with attributes: keep them and expose the text under "content"
              value["content"] = value.delete("__content__")
              value
            else
              content
            end
          elsif value['type'] == 'string' && value['nil'] != 'true'
            ""
          # blank or nil parsed values are represented by nil
          elsif value.blank? || value['nil'] == 'true'
            nil
          # If the type is the only element which makes it then
          # this still makes the value nil, except if type is
          # a XML node(where type['value'] is a Hash)
          elsif value['type'] && value.size == 1 && !value['type'].is_a?(::Hash)
            nil
          else
            xml_value = value.inject({}) do |h,(k,v)|
              h[k] = typecast_xml_value(v, preserve_attributes)
              h
            end
            # Turn { :files => { :file => #<StringIO> } into { :files => #<StringIO> } so it is compatible with
            # how multipart uploaded files from HTML appear
            xml_value["file"].is_a?(StringIO) ? xml_value["file"] : xml_value
          end
        when 'Array'
          value.map! { |i| typecast_xml_value(i, preserve_attributes) }
          case value.length
            when 0 then nil
            when 1 then value.first
            else value
          end
        when 'String'
          value
        else
          raise "can't typecast #{value.class.name} - #{value.inspect}"
      end
    end
end
Now that’s a bit lengthy, but all you have to worry about is the new second parameter to Hash.from_xml, “preserve_attributes”. Setting it to true gives slightly less elegant, but much more useful, results in the cases above. Elements whose type Rails recognizes are still typecast as before; for the rest, you get something like this:
Hash.from_xml('<variable type="product_code" subtype="upc">5</variable>', true)
# => {"variable"=>{"type"=>"product_code", "content"=>"5", "subtype"=>"upc"}}
The only downside to this solution is that the variable’s value is now stored under “content”. It is a touch less elegant, but I find it a worthwhile trade-off for not losing any data.
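One practical consequence is that a value can now come back in two shapes: a bare value when the element had a recognized type or no attributes, and a hash with a “content” key otherwise. A caller might smooth that over with a pair of small helpers. This is a sketch only: content_of and attributes_of are hypothetical names, not part of the patch, and they assume no element has a real child or attribute named "content".

```ruby
# Hypothetical helpers for reading results produced with preserve_attributes = true.

# Returns the element text whether or not attributes were preserved.
def content_of(value)
  value.is_a?(Hash) && value.key?("content") ? value["content"] : value
end

# Returns the preserved attributes, or an empty hash when there are none.
def attributes_of(value)
  return {} unless value.is_a?(Hash) && value.key?("content")
  value.reject { |key, _| key == "content" }
end

preserved = { "type" => "product_code", "subtype" => "upc", "content" => "5" }
bare      = 5

p content_of(preserved)     # "5"
p attributes_of(preserved)  # {"type"=>"product_code", "subtype"=>"upc"}
p content_of(bare)          # 5
p attributes_of(bare)       # {}
```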
As an added bonus, it’s not too hard to make use of this with ActiveResource, which is actually where I ran into the problem originally. All we have to do is add this Format to ActiveResource:
module ActiveResource
  module Formats
    module AttributePreservingXmlFormat
      extend XmlFormat
      def self.decode(xml)
        from_xml_data(Hash.from_xml(xml, true))
      end
    end
  end
end
And then in your model:
class SomeResource < ActiveResource::Base
  self.format = :attribute_preserving_xml
end
We’ve now prevented Hash.from_xml from killing our attributes.