Process a Bio-Informatics XML Document in Java
(For more code samples, visit the Official VTD-XML Blog.)
(Separate code-only VTD-XML tutorials are available in C, C++, Java and C#.)
This example shows how to process an XML file representing DNA information
using the Java version of VTD-XML. The file structure representing the DNA
sequence is highly complex. The goal is to count the number of occurrences
of certain elements. The corresponding XML file and the Java source files
can be downloaded using the links below:
bioinfo.xml
stats.java (without XPath)
stats2.java (with XPath)
The sample imports the following packages:
import com.ximpleware.*;
import com.ximpleware.xpath.*;
import java.io.*;
As always, the first step is to read
the file into a byte buffer.
File f = new File("bioinfo.xml"); // the XML file linked above
// counts all child elements of parlist
int count = 0;
// counts child elements of parlist named "par"
int par_count = 0;
FileInputStream fis = new FileInputStream(f);
byte[] b = new byte[(int) f.length()];
fis.read(b);
fis.close();
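One caveat with the snippet above: a single call to FileInputStream.read(byte[]) is allowed to return before the buffer is full, so it does not guarantee that the whole file has been read. The following is a minimal sketch of a safer alternative that relies only on the JDK's java.nio.file.Files.readAllBytes. The class and method names (XmlBytes, load) are hypothetical and are not part of stats.java.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original sample.
final class XmlBytes {
    // Reads the entire file into memory; readAllBytes keeps reading until end of file.
    static byte[] load(Path xmlFile) {
        try {
            return Files.readAllBytes(xmlFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary file so the sketch runs without bioinfo.xml.
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, "<bix/>".getBytes(StandardCharsets.UTF_8));
        byte[] b = load(tmp);
        System.out.println(b.length + " bytes read");
        Files.delete(tmp);
    }
}
```

The returned array can be handed to vg.setDoc(b) in the next step, just like the buffer filled by the original code.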
Then, this example instantiates
VTDGen, and parses the input XML document.
VTDGen vg = new VTDGen();
vg.setDoc(b);
vg.parse(true); // true turns on namespace awareness
Next, this example manually moves the cursor down the element hierarchy.
The key method is "toElement(...)", which takes an integer constant that
determines the direction of navigation: VTDNav.FC for the first child,
VTDNav.NS for the next sibling, VTDNav.P for the parent and VTDNav.ROOT for
the root element. Call "getAttrVal(...)" and "getText(...)" to get the VTD
index of an attribute value or of text data. Unlike DOM, no node casting is
needed.
VTDNav vn = vg.getNav();
if (vn.matchElement("bix")) { // match bix
    // to first child named "package"
    if (vn.toElement(VTDNav.FC, "package")) {
        do {
            System.out.println("package");
            // to first child named "command"
            if (vn.toElement(VTDNav.FC, "command")) {
                do {
                    System.out.println("command");
                    if (vn.toElement(VTDNav.FC, "parlist")) {
                        do {
                            System.out.println("parlist");
                            if (vn.toElement(VTDNav.FC)) {
                                do {
                                    count++; // increment count
                                    if (vn.matchElement("par"))
                                        par_count++;
                                } while (vn.toElement(VTDNav.NS));
                                vn.toElement(VTDNav.P);
                            }
                        } while (vn.toElement(VTDNav.NS, "parlist"));
                        vn.toElement(VTDNav.P);
                    }
                }
                // to next sibling named "command"
                while (vn.toElement(VTDNav.NS, "command"));
                vn.toElement(VTDNav.P); // go up one level
            } else
                System.out.println(" no child element named 'command' ");
        }
        // to next sibling named "package"
        while (vn.toElement(VTDNav.NS, "package"));
        vn.toElement(VTDNav.P); // go up one level
    } else
        System.out.println(" no child element named 'package' ");
} else
    System.out.println(" Root is not 'bix' ");
// print out the results
System.out.println(" count ====> " + count);
System.out.println(" par_count ==> " + par_count);
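In this step, the sample code instantiates AutoPilot and uses its node
iterator to verify the par count obtained in the previous step. The root
element "bix" is at depth 0, so "par" elements under
bix/package/command/parlist are at depth 4.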
// verify results using iterators
int v = 0;
vn.toElement(VTDNav.ROOT);
AutoPilot ap = new AutoPilot(vn);
ap.selectElement("par");
while (ap.iterate()) {
    if (vn.getCurrentDepth() == 4) {
        v++;
    }
}
System.out.println(" verify ==> " + v);
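The depth filter can be illustrated without VTD-XML. The sketch below swaps in the JDK's StAX reader (javax.xml.stream) and counts elements with a given name at a given depth, with the root element at depth 0 as in the AutoPilot check above. The class name DepthCount, the method countAtDepth and the miniature document are all made up for illustration. Note that a depth test accepts any matching element at that depth, whatever its ancestors are called.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration using JDK StAX, not the VTD-XML API.
final class DepthCount {
    // Counts elements called `name` that sit exactly `wantedDepth` levels below the root.
    static int countAtDepth(byte[] xml, String name, int wantedDepth) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml));
            int depth = -1; // becomes 0 when the root element starts
            int hits = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == wantedDepth && name.equals(reader.getLocalName())) {
                        hits++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
            return hits;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up document: "par" appears at depths 1, 4 (twice) and 5.
        String sample =
            "<bix><par/><package><command><parlist>"
          + "<par/><par><par/></par>"
          + "</parlist></command></package></bix>";
        byte[] xml = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(" par at depth 4 ==> " + countAtDepth(xml, "par", 4));
    }
}
```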
As an alternative, the XPath version of the sample code is much shorter and
simpler.
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot();
ap.bind(vn);
ap.selectXPath("/bix/package/command/parlist/*"); // all child elements of parlist
while (ap.evalXPath() != -1)
    count++;
ap.selectXPath("/bix/package/command/parlist/par");
while (ap.evalXPath() != -1)
    par_count++;
// print out the results
System.out.println(" count ====> " + count);
System.out.println(" par_count ==> " + par_count);
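When choosing the XPath expressions, it helps to keep in mind that a path ending in parlist selects the parlist elements themselves, a path ending in parlist/* selects all of their child elements, and a path ending in parlist/par selects only the children named "par". The sketch below checks this on a made-up miniature document using the JDK's built-in DOM and XPath APIs (javax.xml.xpath) rather than VTD-XML. The class name XPathCrossCheck and the helper count are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical cross-check using JDK DOM and XPath, not the VTD-XML API.
final class XPathCrossCheck {
    // Returns the number of nodes selected by `path`, via the XPath count() function.
    static int count(byte[] xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath cross-check failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up document with the same nesting as the paths in this tutorial.
        String sample =
            "<bix><package><command><parlist>"
          + "<par/><par/><other/>"
          + "</parlist></command></package></bix>";
        byte[] xml = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(" parlist   ==> " + count(xml, "/bix/package/command/parlist"));
        System.out.println(" parlist/* ==> " + count(xml, "/bix/package/command/parlist/*"));
        System.out.println(" par       ==> " + count(xml, "/bix/package/command/parlist/par"));
    }
}
```

On this sample the three counts differ (one parlist, three children, two of them named "par"), which makes it easy to see which expression corresponds to which counter in the manual navigation code.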