Friday, March 2, 2012

XML - STAX Parser

With a SAX parser, you do not have control over the parsing events. As the SAX parser runs through the XML document, it sees elements and calls your handlers. You cannot, for example, ask the SAX parser to go back to an earlier event, or jump to a future event.

This is called a "push-type" event parser - because the events are pushed to the application handlers.
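To make the "push" idea concrete, here is a minimal sketch of a SAX handler. The class name PushStyleSketch, the method elementNames, and the sample XML string are made up for illustration; only the standard JDK SAX classes are used. Notice that our code never asks for an event: the parser calls startElement for every start tag it meets.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of the original project
public class PushStyleSketch {

    // Collects element names in the order the parser reports them
    public static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // The parser pushes every start tag to us; we cannot skip ahead or go back
                names.add(qName);
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch
        System.out.println(elementNames("<users><user><firstName>Joe</firstName></user></users>"));
    }
}
```

The handler gets called for users, user and firstName whether we care about them or not.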

Sometimes we want finer control over the parsing. For example, we might want to parse only a section of the XML. Or we might want to save an event which happens earlier in the XML, and parse it when another event happens later in the XML.

A parser that gives us this control is called a "pull-type" event parser - because the application directs the parser to pull in events as needed.

STAX (STreaming API for XML) is such a "pull-type" parser.
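As a rough sketch of what "pulling" looks like, the code below asks the parser for one event at a time and stops as soon as it has what it wants. The class PullStyleSketch and the method firstText are hypothetical names, and the XML is passed in as a string only to keep the sketch self-contained.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of the original project
public class PullStyleSketch {

    // Returns the text of the first element with the given name, or null if there is none
    public static String firstText(String xml, String elementName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Nothing is parsed until the application asks for the next event
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    // Found it: return now and never pull the remaining events
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample document invented for this sketch
        String xml = "<users><user><firstName>Joe</firstName><lastName>Smith</lastName></user></users>";
        System.out.println(firstText(xml, "firstName"));
    }
}
```

Because the loop is ours, we decide when to stop, which is how we can parse only a section of the XML.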

As an example, let us parse the users.xml - this time using STAX.

We will use STAX to zero in on the elements of interest to us - firstName and lastName - and ignore all the rest.

In your /src/org/confucius folder, create a class UserSTAXParser.java, like this:
package org.confucius;

import java.io.FileInputStream;
import java.io.FileNotFoundException;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

public class UserSTAXParser {

    public static void main(String[] args) {
        try {
            // Create a pull parser over the users file
            XMLInputFactory xmlif = XMLInputFactory.newInstance();
            XMLStreamReader xmlr = xmlif.createXMLStreamReader("stream_1", new FileInputStream("data/users.xml"));

            String firstName = null;
            String lastName = null;

            // Pull events one at a time; we only react to start tags
            while (xmlr.hasNext()) {
                if (xmlr.next() == XMLEvent.START_ELEMENT) {

                    if (xmlr.getLocalName().equals("firstName")) {
                        // Keep pulling until we reach the text inside <firstName>
                        while (xmlr.hasNext()) {
                            if (xmlr.next() == XMLEvent.CHARACTERS) {
                                firstName = xmlr.getText();
                                break;
                            }
                        }
                    }

                    else if (xmlr.getLocalName().equals("lastName")) {
                        // Keep pulling until we reach the text inside <lastName>
                        while (xmlr.hasNext()) {
                            if (xmlr.next() == XMLEvent.CHARACTERS) {
                                lastName = xmlr.getText();
                                // firstName was saved from an earlier event
                                System.out.println("Got user: " + firstName + " " + lastName);
                                break;
                            }
                        }
                    }
                }
            }

        } catch (XMLStreamException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}


This code is a lot more verbose than what we are used to. It also looks more complicated. But it has let us do two things. One, like a DOM parser, it has allowed us to pick and choose which elements we are interested in. Two, like a SAX parser, it is event driven - it loads only the requested events into memory - so it has a very small memory footprint, even if the XML file is humongous.

In this code, we have selected only those events which parse first and last names, and ignored the rest.
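If the inner loops in the listing above feel heavy, one possible variation is to let XMLStreamReader.getElementText() read the text of the current element for us. The sketch below is only an illustration under some assumptions: the class UserNamesSketch and method fullNames are hypothetical, the XML is passed as a string instead of being read from data/users.xml, and the sample document layout is invented because the real users.xml is not shown here. Note one difference in behavior: for an empty element, getElementText() returns an empty string, whereas the loop above would keep going until the next text event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of the original project
public class UserNamesSketch {

    // Returns "firstName lastName" for every lastName element found
    public static List<String> fullNames(String xml) throws XMLStreamException {
        List<String> result = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            String firstName = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // ignore everything except start tags
                }
                String name = reader.getLocalName();
                if (name.equals("firstName")) {
                    // Save the earlier value until we meet the matching lastName
                    firstName = reader.getElementText();
                } else if (name.equals("lastName")) {
                    result.add(firstName + " " + reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample document invented for this sketch
        String xml = "<users><user><firstName>Joe</firstName><age>30</age>"
                + "<lastName>Smith</lastName></user></users>";
        for (String fullName : fullNames(xml)) {
            System.out.println("Got user: " + fullName);
        }
    }
}
```

getElementText() only works on elements that contain text and nothing else, so it suits simple leaf elements like these two.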

Note: The STAX library comes standard with JDK 1.6. For earlier JDK versions, you will need to download stax-api.jar by updating your ivy.xml.

Right-click on the UserSTAXParser.java file in your Eclipse Navigator, and select Run As --> Java Application.

You will see the users printed to the console.
