Tuesday, February 28, 2012

XML - SAX Parser

DOM parsers work well if the XML file is small and can be loaded into memory all at once.

But XML files often contain millions of records, for example when the file is a dump from a database. In this case, attempting to load the file into memory all at once will cause memory issues.

SAX (Simple API for XML) reads an XML file sequentially, element by element, so it never needs to hold the whole document in memory. It calls a user-specified "handler" each time it reads part of an element.

This style of parsing is often called "event-driven". Each time the SAX parser encounters the start of an element, the text inside it, or the end of an element, it reports an event to the handler.
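
To make the order of events concrete, here is a minimal sketch that records each event as the parser reports it. The class name SaxEventTrace and the tiny in-memory document are made up for illustration and are not part of the project; the document is only an assumed shape, not the real users.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only
class SaxEventTrace {

    // Parses the given XML text and returns the SAX events in the order they were reported
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Assumed sample document
        String xml = "<users><user><firstName>Joe</firstName></user></users>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

The events arrive in document order: every start event is matched by an end event, and nested elements finish before their parents do.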

Let us see how to parse the users.xml file, this time with a SAX parser.

For this, we need to write a "handler".

In your /src/org/confucius folder, create a class UserSAXHandler.java, like this:
package org.confucius;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class UserSAXHandler extends DefaultHandler {
    // Flags record which element the parser is currently inside
    private boolean firstNameFlag = false;
    private boolean lastNameFlag = false;
    // Buffers collect the text, because characters() may be called more than once per element
    private final StringBuilder firstName = new StringBuilder();
    private final StringBuilder lastName = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        if (qName.equals("firstName")) {
            firstNameFlag = true;
            firstName.setLength(0); // start with an empty buffer
        }
        else if (qName.equals("lastName")) {
            lastNameFlag = true;
            lastName.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Append rather than assign, so text delivered in several chunks is not lost
        if (firstNameFlag) {
            firstName.append(ch, start, length);
        }
        else if (lastNameFlag) {
            lastName.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("firstName")) {
            firstNameFlag = false;
        }
        else if (qName.equals("lastName")) {
            lastNameFlag = false;
            // lastName closes after firstName, so both values are available here
            System.out.println("got User: " + firstName + " " + lastName);
        }
    }
}

We are handling three SAX events: the start of an element, the character data inside it, and the end of an element.
The flags set in startElement tell characters which field the text belongs to, and endElement clears them again.
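
One detail worth knowing: a SAX parser is allowed to deliver the text of a single element in several calls to characters, which often happens around entities such as &apos;. A handler should therefore append each chunk to a buffer rather than keep only the last one. The following is a sketch of that idea, not the handler used in this project; the class name BufferedUserHandler, its helper methods, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only
class BufferedUserHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> users = new ArrayList<String>();
    private String firstName = "";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // drop any whitespace collected between elements
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may run several times for one element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("firstName")) {
            firstName = text.toString();
        } else if (qName.equals("lastName")) {
            users.add(firstName + " " + text.toString());
        }
    }

    List<String> getUsers() {
        return users;
    }

    // Parses XML text held in memory and returns the collected names
    static List<String> parse(String xml) {
        BufferedUserHandler handler = new BufferedUserHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.getUsers();
    }

    public static void main(String[] args) {
        // Assumed sample document; the entity may split the text into chunks
        String xml = "<users><user><firstName>Joe</firstName>"
                + "<lastName>O&apos;Brien</lastName></user></users>";
        System.out.println(parse(xml)); // prints [Joe O'Brien]
    }
}
```

Collecting the names in a list instead of printing them also makes the handler easier to reuse, since the caller decides what to do with the result.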

There are other "events" that we could have handled if we wished to, such as the start of the document and the end of the document. The ContentHandler interface lists all the events that a SAX parser can report to a handler. DefaultHandler provides empty implementations of all of them, so we only override the ones we need.
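
As a sketch of how the document-level events might be used, the handler below resets a counter in startDocument and reports a total in endDocument. The class name UserCountHandler and the element name "user" are assumptions for illustration; adjust the element name to whatever your file actually uses.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only
class UserCountHandler extends DefaultHandler {
    private int userCount;

    @Override
    public void startDocument() {
        userCount = 0; // called once, before any element is reported
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("user")) { // assumed element name
            userCount++;
        }
    }

    @Override
    public void endDocument() {
        // called once, after the last element has been closed
        System.out.println("parsed " + userCount + " users");
    }

    int getUserCount() {
        return userCount;
    }

    // Counts the assumed "user" elements in XML text held in memory
    static int count(String xml) {
        UserCountHandler handler = new UserCountHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.getUserCount();
    }

    public static void main(String[] args) {
        count("<users><user/><user/><user/></users>"); // prints "parsed 3 users"
    }
}
```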

Let us now write a test for this parser.

In your /src/org/confucius, create a class TestSAXParser.java, like this:
package org.confucius;

import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

public class TestSAXParser {
    public static void main(String[] args) {
        try {
            DefaultHandler handler = new UserSAXHandler();
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            // The parser reads the file and calls the handler for each event
            saxParser.parse(new File("data/users.xml"), handler);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

We create a SAXParser, then pass it the XML file and the handler.

Right-click on the TestSAXParser.java file in your Eclipse navigator.
Select Run As --> Java Application.

You will see the users printed to the console.