In and out of JSONness: Parsing Logic with Swift and Aldwych


If you can parse XML in and out of a JSON structure, you have something that can be manipulated natively, just as Dictionaries and Arrays are. You also have something that is essentially cross-platform: the logic of the loops that get you from XML to JSON is pretty much the same in any language if you keep it simple enough. And simplicity is the key.

The main source of mind bending is the nesting of tags/JSON, because it's a bit like a mirror placed in front of a mirror that reflects forever. But once you have one level of nesting working, the same step can be repeated over and over, and you needn't worry so much about the depth.

Target structure

You're going to need a logical structure for the JSON to end up in, and I have a fairly simple one: a tag and its content become a dictionary with the tag name as the key and the content as the value. (There'll also be an "attributes" key in each dictionary containing the attribute dictionary for the tag, which incidentally echoes the NSXMLParser approach.) The value of the tag name key is an array of the content found between the tags, and the XML file itself is treated as one long ordered array.
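To make that shape concrete, here is a small sketch of what one element would look like. It uses plain Swift dictionaries and arrays as stand-ins for Aldwych's JSONDictionary and JSONArray, it is written in current Swift syntax rather than the older syntax of the listings in this post, and the sample XML and the variable names are made up for illustration.

```swift
// Hypothetical input: <note id="1"><to>Ann</to>Hello</note>
// [String: Any] and [Any] stand in for JSONDictionary and JSONArray.
let to: [String: Any] = [
    "to": ["Ann"] as [Any],               // content array holding one string
    "attributes": [String: String]()      // <to> has no attributes
]
let note: [String: Any] = [
    "note": [to, "Hello"] as [Any],       // nested element first, then text, in document order
    "attributes": ["id": "1"]
]

// Walking back down the structure
let noteContent = note["note"] as! [Any]
let nested = noteContent[0] as! [String: Any]
let toContent = nested["to"] as! [Any]
print(toContent[0] as! String)    // Ann
print(noteContent[1] as! String)  // Hello
```

Each level repeats the same two-key dictionary, which is what lets the parser treat every tag identically however deep it sits.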

If the content of every tag were purely text then there wouldn't be any complexity here, but we have to handle a nested tag, and a nested tag inside the nested tag, and so on. So we have to come up with a system that handles nesting to any depth. Luckily we're only going to need two arrays for this: one array of JSONDictionary values and one array of JSONArray values.

Parsing Logic

So this is what happens inside the parsing class, which is an NSXMLParserDelegate. We start with two instance variables:
var elementArray = [JSONDictionary]()
var contentArray = [JSONArray]()
When a tag opens, a JSONDictionary is created and goes into the JSONDictionary array, and a JSONArray is created and goes into the other array.
func parser(parser: NSXMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [NSObject : AnyObject]) {

    // the current dictionary is the newly opened tag
    elementArray.append(JSONDictionary(dict: [elementName:"", "attributes":attributeDict], restrictTypeChanges: false))

    // every new tag also gets its own array in contentArray
    contentArray.append(JSONArray(restrictTypeChanges: false))

}

Every time there's a string it goes into the last JSONArray in contentArray:
func parser(parser: NSXMLParser, foundCharacters string: String?) {
    if let str = string {

        // the current array is always the last item in contentArray, so add the string to it
        contentArray[contentArray.count-1].append(str)

    }
}

And every time we have a new open tag we add a dictionary and an array, as we did at the start.
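One caveat on the string handling: NSXMLParser may deliver the text of a single element across several foundCharacters calls, so the content array can end up holding consecutive fragments of what was one piece of text. Below is a minimal sketch of one way to merge them as they arrive. The appendText helper is hypothetical and not part of Aldwych, a plain [Any] stands in for JSONArray, and the syntax is current Swift.

```swift
// Hypothetical helper: append text to a content array, merging it into a preceding text chunk.
// [Any] stands in for Aldwych's JSONArray.
func appendText(_ chunk: String, to chunks: inout [Any]) {
    if let previous = chunks.last as? String {
        // The previous entry is also text, so extend it instead of adding a new entry.
        chunks[chunks.count - 1] = previous + chunk
    } else {
        // The array is empty or ends in a nested element, so start a new text entry.
        chunks.append(chunk)
    }
}

var chunks = [Any]()
appendText("Hello, ", to: &chunks)
appendText("world", to: &chunks)
print(chunks.count)           // 1
print(chunks[0] as! String)   // Hello, world
```

Text that follows a nested element still gets its own entry, so the document order of text and tags is preserved.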

Closing time

The important stuff happens when we arrive at a closing tag. Here the last object in the array of JSONArrays is the content of the tag, so we make it the value of the tag name key in the last dictionary of the JSONDictionary array. Next that dictionary is appended to the penultimate JSONArray in the array of JSONArrays, which places it in its parent's nest. Finally the last entry of the dictionary array and the last entry of the JSONArray array are both removed. It's the most difficult part to understand, but it can be dealt with in a few lines of code:
func parser(parser: NSXMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {

    // the current array, which might be one string or a nested set of elements, becomes the value of the tag name key in the current dictionary
    if contentArray.count > 0 {
        for (k, _) in elementArray.last! {
            if k != "attributes" {
                elementArray[elementArray.count-1][k] = contentArray.last
            }
        }
    }

    // add the current dictionary to the previous array if there is one, i.e. if this isn't the document's root tag
    if contentArray.count > 1 {

        // add the dictionary to the previous JSONArray in contentArray
        contentArray[contentArray.count-2].append(elementArray[elementArray.count-1])

        // remove the last JSONDictionary from the element array
        elementArray.removeLast()

        // remove the last JSONArray from the content array
        contentArray.removeLast()
    }
}
The magic really happens in repeating this step for every element. When only one dictionary is left in the dictionary array, we have arrived at the closing tag of the document's outermost element and everything is finished: this element can be returned without first being placed in a parent nest.
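If you want to try the open/text/close logic without NSXMLParser or Aldwych, the three delegate methods can be condensed into a small stand-alone sketch. Everything here is an assumption made for illustration: TreeBuilder and its method names are invented, [String: Any] and [Any] stand in for JSONDictionary and JSONArray, the events are fed in by hand, and the syntax is current Swift.

```swift
// Plain-Swift stand-in for the delegate, driven by hand-written events.
struct TreeBuilder {
    var elementStack = [[String: Any]]()   // one dictionary per open tag
    var contentStack = [[Any]]()           // one content array per open tag

    // Opening tag: push a fresh dictionary and a fresh content array.
    mutating func open(_ name: String, attributes: [String: String] = [:]) {
        elementStack.append([name: "", "attributes": attributes])
        contentStack.append([])
    }

    // Text: append to the content array of the innermost open tag.
    mutating func text(_ string: String) {
        guard !contentStack.isEmpty else { return }
        contentStack[contentStack.count - 1].append(string)
    }

    // Closing tag: attach the content, then fold the element into its parent.
    mutating func close(_ name: String) {
        guard let content = contentStack.last, !elementStack.isEmpty else { return }
        elementStack[elementStack.count - 1][name] = content
        // The root element has no parent array, so it stays on the stack as the result.
        guard contentStack.count > 1 else { return }
        let finished = elementStack.removeLast()
        contentStack.removeLast()
        contentStack[contentStack.count - 1].append(finished)
    }
}

var builder = TreeBuilder()
// Events for the hypothetical document <a><b>hi</b>there</a>
builder.open("a")
builder.open("b")
builder.text("hi")
builder.close("b")
builder.text("there")
builder.close("a")

let root = builder.elementStack[0]
let children = root["a"] as! [Any]
let b = children[0] as! [String: Any]
let bContent = b["b"] as! [Any]
print(bContent[0] as! String)   // hi
print(children[1] as! String)   // there
```

Both stacks grow by one on every opening tag and shrink by one on every closing tag except the last, so their depth always equals the current nesting depth.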

Aldwych JSON parsing

At this stage you might be thinking this is all very well, but there are no JSONArray or JSONDictionary classes in the Swift standard library. And you'd be correct: these are part of a JSON parser I created called Aldwych, which is available on GitHub.

With all of the files from the Aldwych repository added to your project you can parse XML to JSON with the following code:
var error: NSError?
// pathForResource returns a file path rather than a URL
if let path = NSBundle.mainBundle().pathForResource("test", ofType: "xml"),
   d = NSData(contentsOfFile: path)
   {
        let a = XMLParser()
        let jsonData = a.parse(d).jsonData(options: NSJSONWritingOptions.PrettyPrinted, error: &error)
   }

This is only one feature of the JSON parsing library; you'll be hearing plenty more about the repository in future posts.

Update: I've now added conversion from JSON to XML to Aldwych, so you can convert the JSON back to XML. (Note: At present it only converts JSON that shares the structure Aldwych generates from XML, not arbitrary JSON.)
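To show why the reverse trip only works on this particular structure, here is a rough sketch of a serializer for it. This is not Aldwych's own code: xmlString(from:) is a hypothetical function, plain Swift collections stand in for the Aldwych classes, the syntax is current Swift, and escaping of characters such as & and < is deliberately left out.

```swift
// Hypothetical serializer for the tag/attributes structure described in this post.
func xmlString(from element: [String: Any]) -> String {
    // The tag name is the one key that is not "attributes".
    guard let name = element.keys.first(where: { $0 != "attributes" }) else { return "" }
    let attributes = element["attributes"] as? [String: String] ?? [:]
    // Sort the attributes so the output is deterministic.
    let attributeText = attributes.sorted { $0.key < $1.key }
        .map { " \($0.key)=\"\($0.value)\"" }
        .joined()
    let content = element[name] as? [Any] ?? []
    // Strings are emitted as text, nested dictionaries recurse, anything else is skipped.
    let body = content.map { item -> String in
        if let text = item as? String { return text }
        if let child = item as? [String: Any] { return xmlString(from: child) }
        return ""
    }.joined()
    return "<\(name)\(attributeText)>\(body)</\(name)>"
}

let bold: [String: Any] = ["b": ["hi"] as [Any], "attributes": [String: String]()]
let greeting: [String: Any] = [
    "greeting": [bold, " there"] as [Any],
    "attributes": ["lang": "en"]
]
print(xmlString(from: greeting))   // <greeting lang="en"><b>hi</b> there</greeting>
```

The function leans on every dictionary having exactly one tag name key plus "attributes", which is why JSON of any other shape can't be converted this way.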

