
A Beginner’s Guide to XML Parsing in Swift

Róbert Klacso

2016-08-24


Many document formats are built on XML syntax, such as RSS, Atom, SOAP and XHTML, so it’s good to know how to work with them. If you are not familiar with XML, it’s basically a strictly structured text format that can be parsed into an array of objects containing the precious information.

A good tutorial about XML can be found here.

Here I will show you how to use NSXMLParser (part of the iOS SDK) and SWXMLHash from GitHub.

We will work with the following sample XML:

<items>
	<item>
		<author>Robi</author>
		<description>My article about Olympics</description>
		<tag name = "Olympics" count = "3"/>
		<tag name = "Rio"/>
	</item>
	<item>
		<author>Robi</author>
		<description>I can't wait Spa-Francorchamps!!</description>
		<tag name = "Formula One"/>
		<tag name = "Eau Rouge" count = "5"/>
	</item>
</items>

Let’s start with the NSXMLParser.

First of all, we need to create our custom objects:

class Item { 
    var author = "";
    var desc = "";
    var tag = [Tag]();
}

class Tag {
    var name = "";
    var count: Int?;
}

As you can see, we created a separate class for every element that can occur multiple times within its parent (so we can collect them into arrays).

To use NSXMLParser, our class needs to conform to the NSXMLParserDelegate protocol (e.g. class ViewController: UIViewController, NSXMLParserDelegate {}).

Now we need to convert our XML string into data the parser can read, create a parser and set its delegate (let the name of our sample data be xmlString):

let xmlData = xmlString.dataUsingEncoding(NSUTF8StringEncoding)!
let parser = NSXMLParser(data: xmlData)
        
parser.delegate = self;
        
parser.parse()
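The snippet above uses the Swift 2 spelling. As a minimal sketch, assuming a current Swift toolchain (where the class is called XMLParser), creating and running the parser could look like the following. It also checks the Bool that parse() returns, which the code above ignores. The two sample strings are made up for illustration.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // on Linux, XMLParser lives in this separate module
#endif

// Illustrative input, not the full sample from this post.
let xmlString = "<items><item><author>Robi</author></item></items>"

// Converting through the UTF-8 view cannot fail, so no force unwrap is needed.
let xmlData = Data(xmlString.utf8)
let parser = XMLParser(data: xmlData)

// parse() runs synchronously and returns false when the document is malformed.
let succeeded = parser.parse()
if !succeeded {
    print("Parsing failed: \(String(describing: parser.parserError))")
}

// A document with mismatched tags is rejected.
let brokenParser = XMLParser(data: Data("<items><item></items>".utf8))
let brokenSucceeded = brokenParser.parse()
print("well-formed: \(succeeded), broken: \(brokenSucceeded)")
```

No delegate is set here, so nothing is extracted yet; the point is only the setup and the success check.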

The delegate contains many functions, but only 4 of them are interesting to us:

  1. func parser(didStartElement): is called every time the parser finds an opening tag (<key>). In our example, it will be called when the parser reaches <items>, <item>, <author>, <description> and <tag>.
  2. func parser(didEndElement): is called every time the parser finds a closing tag (</key>). In our example, it will be called when the parser reaches </author>, </description>, </item>, …
  3. func parser(foundCharacters): is called every time the parser enters the text of an element, and it may stop at line breaks and “special characters” (e.g. í, ö), delivering the rest in a further call. The second part looks like a problem, but with the most common usage of the parser this problem is totally eliminated (more on that later).
  4. func parserDidEndDocument: is called when the parser has finished the document.

The parser runs through the whole document once. This means that the parser will start with the <items> key, then <item> and <author> (calling didStartElement on each of them), followed by foundCharacters on “Robi”, didEndElement on </author>, … Keep in mind that foundCharacters is also called for the whitespace between the tags. The last element callback will be didEndElement on </items>.
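To see this order for yourself, here is a small sketch that records every callback while parsing a one-item document. It assumes a current Swift toolchain, where the types are named XMLParser and XMLParserDelegate; CallbackTracer and its events array are hypothetical names made up for this illustration.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // on Linux, XMLParser lives in this separate module
#endif

// Hypothetical helper: stores the name of every delegate callback in order.
final class CallbackTracer: NSObject, XMLParserDelegate {
    var events: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        events.append("start \(elementName)")
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Whitespace-only chunks between tags are skipped to keep the log short.
        let text = string.trimmingCharacters(in: .whitespacesAndNewlines)
        if !text.isEmpty { events.append("chars \(text)") }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        events.append("end \(elementName)")
    }

    func parserDidEndDocument(_ parser: XMLParser) {
        events.append("end document")
    }
}

let tracer = CallbackTracer()
let traceParser = XMLParser(data: Data("<items><item><author>Robi</author></item></items>".utf8))
traceParser.delegate = tracer
let traced = traceParser.parse()

// Print the recorded callbacks, one per line.
tracer.events.forEach { print($0) }
```

The start events arrive from the outermost element inwards, the end events in the reverse order, and parserDidEndDocument comes last.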

Now that we know how our parser works, we need to parse the elements. For this we will need a new Item array, an empty string for the found characters and an Item property on our class (we need it because the delegate’s methods are called many times and we have to keep the parsed values until we finish):

var items = [Item]();
var item = Item();
var foundCharacters = "";

and use the delegate’s methods:

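func parser(parser: NSXMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String]) {
	// Drop the whitespace collected between tags, so it does not end up in front of this element's text
	self.foundCharacters = "";
	if elementName == "tag" {
		let tempTag = Tag();
		// Attributes arrive as strings; "name" is copied as-is
		if let name = attributeDict["name"] {
			tempTag.name = name;
		}
		// "count" is optional and only stored if it is a valid integer
		if let c = attributeDict["count"] {
			if let count = Int(c) {
				tempTag.count = count;
			}
		}
		self.item.tag.append(tempTag);
	}
}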

As you can see, if we have an in-line value (e.g. <tag name = "Rio"/>), we cannot use didEndElement to read it, since there is no text between the tags. Luckily, the attributeDict [String: String] dictionary is here to help us. You just need to name the attribute you want to read and set.
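The attribute handling can also be tried without a parser. The sketch below feeds hand-written dictionaries, shaped like the attributeDict the delegate receives, into a helper that builds a Tag. makeTag(from:) is a hypothetical function introduced only for this example.

```swift
// Same shape as the Tag class used in this post.
final class Tag {
    var name = ""
    var count: Int?
}

// Hypothetical helper: turns an attribute dictionary into a Tag.
func makeTag(from attributes: [String: String]) -> Tag {
    let tag = Tag()
    // A missing "name" attribute falls back to an empty string.
    tag.name = attributes["name"] ?? ""
    // "count" stays nil when the attribute is missing or not a valid integer.
    tag.count = attributes["count"].flatMap { Int($0) }
    return tag
}

let olympics = makeTag(from: ["name": "Olympics", "count": "3"])
let rio = makeTag(from: ["name": "Rio"])
let broken = makeTag(from: ["name": "Oops", "count": "three"])

print(olympics.name, olympics.count as Any)
print(rio.name, rio.count as Any)
print(broken.name, broken.count as Any)
```

Keeping count as an optional Int means a missing attribute and an unparsable one are both represented as nil instead of a misleading 0.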

func parser(parser: NSXMLParser, foundCharacters string: String) {
	self.foundCharacters += string;
}

Another thing to mention is the foundCharacters variable. As I mentioned above, the parser(foundCharacters) call can be interrupted, but the parser processes the whole document synchronously and in order, so the next call continues from where the previous one stopped. Appending every chunk to the same string therefore rebuilds the full text.

func parser(parser: NSXMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
	if elementName == "author" {
		self.item.author = self.foundCharacters;
	}
        
	if elementName == "description" {
		self.item.desc = self.foundCharacters;
 	}
        
	if elementName == "item" {
		let tempItem = Item();
		tempItem.author = self.item.author;
		tempItem.desc = self.item.desc;
		tempItem.tag = self.item.tag;
		self.items.append(tempItem);
		self.item.tag.removeAll();
	}
	self.foundCharacters = ""
}

You should notice that we empty our foundCharacters value when the parser reaches the end of an element. Therefore the fragments add up to exactly the string we wanted: problem solved :).
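The append-then-reset pattern is easy to check in isolation. The following sketch drives it by hand instead of through a real parser: TextAccumulator is a hypothetical stand-in for the delegate, and the way the text is split into chunks is an assumption chosen for illustration.

```swift
// Hypothetical stand-in for the delegate's text handling.
struct TextAccumulator {
    private(set) var buffer = ""
    private(set) var finished: [String: String] = [:]

    // Mirrors parser(foundCharacters): every chunk is appended to the buffer.
    mutating func foundCharacters(_ chunk: String) {
        buffer += chunk
    }

    // Mirrors parser(didEndElement): store the collected text, then reset.
    mutating func didEndElement(_ name: String) {
        finished[name] = buffer
        buffer = ""
    }
}

var accumulator = TextAccumulator()

// The description arrives in two pieces, as it might from the real parser.
accumulator.foundCharacters("I can't wait ")
accumulator.foundCharacters("Spa-Francorchamps!!")
accumulator.didEndElement("description")

// Because of the reset, the next element starts from an empty buffer.
accumulator.foundCharacters("Robi")
accumulator.didEndElement("author")

print(accumulator.finished["description"] ?? "")
print(accumulator.finished["author"] ?? "")
```

Without the reset in didEndElement, the author would come out as the description followed by "Robi".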

func parserDidEndDocument(parser: NSXMLParser) {
	for item in self.items {
		print("\(item.author)\n\(item.desc)");
		for tags in item.tag {
			if let count = tags.count {
				print("\(tags.name), \(count)")
			} else {
				print("\(tags.name)")
			}
		}
		print("\n")
	}
}

When the parser finishes the document and calls parserDidEndDocument, we log our newly created item list:

"Robi
My article about Olympics
Olympics, 3
Rio

Robi
I can't wait Spa-Francorchamps!!
Formula One
Eau Rouge, 5"
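For readers on a current Swift toolchain, here is a compact, self-contained sketch of the same approach. It is not the code from this post: it assumes the modern XMLParser and XMLParserDelegate names, swaps the classes for structs, and starts a fresh Item whenever an <item> opens instead of copying fields at the end. ItemParser is a name made up for this example, and the sample is shortened to one item.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // on Linux, XMLParser lives in this separate module
#endif

struct Tag { var name = ""; var count: Int? }
struct Item { var author = ""; var desc = ""; var tags: [Tag] = [] }

final class ItemParser: NSObject, XMLParserDelegate {
    private(set) var items: [Item] = []
    private var current = Item()
    private var text = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        text = ""   // drop whitespace collected since the previous tag
        switch elementName {
        case "item":
            current = Item()   // start collecting a new item
        case "tag":
            current.tags.append(Tag(name: attributeDict["name"] ?? "",
                                    count: attributeDict["count"].flatMap { Int($0) }))
        default:
            break
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text += string   // text may arrive in several chunks
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        switch elementName {
        case "author": current.author = text
        case "description": current.desc = text
        case "item": items.append(current)
        default: break
        }
        text = ""
    }
}

let xmlString = """
<items>
    <item>
        <author>Robi</author>
        <description>My article about Olympics</description>
        <tag name = "Olympics" count = "3"/>
        <tag name = "Rio"/>
    </item>
</items>
"""

let delegate = ItemParser()   // kept in a constant so it outlives parse()
let parser = XMLParser(data: Data(xmlString.utf8))
parser.delegate = delegate
if parser.parse() {
    for item in delegate.items {
        print(item.author, item.desc, item.tags.map { $0.name })
    }
} else {
    print("Parsing failed: \(String(describing: parser.parserError))")
}
```

Since Item is a struct here, appending current to items stores a copy, so the manual field-by-field copy and the removeAll() call are not needed in this variant.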

SWXMLHash

To use this framework, I recommend installing it with CocoaPods and then importing SWXMLHash in the class that uses it.

The custom objects are the same as before:

class Item { 
    var author = "";
    var desc = "";
    var tag = [Tag]();
}

class Tag {
    var name = "";
    var count: Int?;
}

Using SWXMLHash, you will need to make an XMLIndexer from your xmlString by parsing it (you can use configurations; they are documented on GitHub):

let xml = SWXMLHash.config { // the xml variable is our XMLIndexer
            config in
            config.shouldProcessLazily = false
}.parse(self.xmlString);

Create a new Item array that you want to fill:

var items = [Item]();

Now that you have an indexer, you can get the values by simply stating the keys the parser should look up:

xml["items"]
xml["items"]["item"]

As you can see, if you want to get values from deeper in the tree, just write the name of the next key. After that you can get the value of an element through its text property (it returns an optional value). Because our sample contains more than one <item>, pick one by index before going deeper, e.g. the text of the first item’s <author> key:

xml["items"]["item"][0]["author"].element?.text

Note that you have to define the name of the key you want to get. The indexer behaves like an array, so you can iterate through it:

for elem in xml["items"]["item"] {
	let newItem = Item();
	var tags = [Tag]()
	if let val = elem["author"].element?.text {
		newItem.author = val;
	}
	if let val = elem["description"].element?.text {
		newItem.desc = val;
	}
	for tag in elem["tag"] {
		let tempTag = Tag();
		if let val = tag.element?.attributes["name"] {
			tempTag.name = val;
		}
		if let val = tag.element?.attributes["count"] {
			if let count = Int(val) {
        		tempTag.count = count;
			}
		}
		tags.append(tempTag);
	}
	newItem.tag = tags;
	self.items.append(newItem);
}

Note that you don’t need to use foundCharacters, since you get the whole text between <author> and </author> at once. Don’t forget that in this case the variable elem is an indexer pointing at one <item> element, not the value the key contains. If you have an in-line value, you have to read attributes["key"] on the element instead of its text. attributes is a [String: String] dictionary, so the lookup returns an optional String.

Let’s see the whole function with a logger:

func parseXML() {
	let xml = SWXMLHash.config {
		config in
		config.shouldProcessLazily = false
		}.parse(self.xmlString);
        
	for elem in xml["items"]["item"] {
		let newItem = Item();
		var tags = [Tag]();
		if let val = elem["author"].element?.text {
			newItem.author = val;
		}
		if let val = elem["description"].element?.text {
			newItem.desc = val;
		}
		for tag in elem["tag"] {
			let tempTag = Tag();
			if let val = tag.element?.attributes["name"] {
				tempTag.name = val;
			}
			if let val = tag.element?.attributes["count"] {
				if let count = Int(val) {
					tempTag.count = count;
				}
			}
			tags.append(tempTag);
		}
		newItem.tag = tags;
		self.items.append(newItem);
	}
 
	for item in self.items {
		print("\(item.author)\n\(item.desc)");
		for tag in item.tag {
			if let count = tag.count {
				print("\(tag.name), \(count)");
			} else {
				print("\(tag.name)");
			}
		}
		print("\n")
	}
}

and our output is:

"Robi
My article about Olympics
Olympics, 3
Rio

Robi
I can't wait Spa-Francorchamps!!
Formula One
Eau Rouge, 5"

Conclusion

As you can see, you need to know the structure of your XML in both cases. Then you can get the data out of it pretty easily. The downside of NSXMLParser is that you have to work with 3-4 delegate functions, although each of them stays relatively short, and you get to use a native API without an extra dependency. With SWXMLHash you can parse your whole XML in one function, which can grow pretty long with a more complex structure.



Róbert Klacso

He is a junior iOS developer at Wanari and has been with us since January. His coding style and favorite language are the same: Swift.
