VTD-XML: The Future of XML Processing


Process a Bio-Informatics XML Document in Java

(For more code samples, visit Official VTD-XML Blog)

(Separate code-only VTD-XML tutorials are available in C, C++, Java and C#)

This example shows how to process an XML file representing DNA information using the Java version of VTD-XML. The file structure representing the DNA sequence is highly complex. The goal is to count the child elements of every "parlist" element, and how many of those children are named "par". The corresponding XML file and the Java source files can be downloaded using the links below:

bioinfo.xml

stats.java (without XPath)

stats2.java (with XPath)

The sample imports the following packages:

import com.ximpleware.*;
import com.ximpleware.xpath.*;
import java.io.*;

As always, the first step is to read the file into a byte buffer.

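// bioinfo.xml is the sample file linked above
File f = new File("bioinfo.xml");
// counts all child elements of every "parlist"
int count = 0;
// counts the child elements of "parlist" that are named "par"
int par_count = 0;
byte[] b = new byte[(int) f.length()];
// readFully fills the whole buffer; a single read() call may return fewer bytes
try (DataInputStream dis = new DataInputStream(new FileInputStream(f))) {
    dis.readFully(b);
}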
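
As an alternative sketch of this step, java.nio.file.Files.readAllBytes returns the complete file contents in one call, which avoids relying on a single read(...) call filling the buffer. The class and method names below are made up for illustration and are not part of VTD-XML or of stats.java; the resulting array could be handed to vg.setDoc(...) in the same way as b.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the VTD-XML distribution.
class WholeFileReader {
    // Returns every byte of the file, or throws; never a partially filled buffer.
    static byte[] readAll(Path path) {
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the tutorial's sample file; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "bioinfo.xml");
        byte[] b = readAll(path);
        System.out.println("read " + b.length + " bytes");
    }
}
```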

Then the example instantiates VTDGen and parses the input XML document. The argument true passed to "parse(...)" turns on namespace awareness.

VTDGen vg = new VTDGen();
vg.setDoc(b);
vg.parse(true);

Next, the example manually moves the cursor down the element hierarchy. The key method is "toElement(...)", which takes an integer constant that determines the direction of navigation (VTDNav.FC for first child, VTDNav.NS for next sibling, VTDNav.P for parent) and returns false, leaving the cursor where it is, when no such element exists. Call "getAttrVal(...)" and "getText(...)" to get the VTD index of an attribute value or of text data. Unlike DOM, no node casting is needed.

VTDNav vn = vg.getNav();
// the root element must be "bix"
if (vn.matchElement("bix")) {
    // to first child named "package"
    if (vn.toElement(VTDNav.FC, "package")) {
        do {
            System.out.println("package");
            // to first child named "command"
            if (vn.toElement(VTDNav.FC, "command")) {
                do {
                    System.out.println("command");
                    // to first child named "parlist"
                    if (vn.toElement(VTDNav.FC, "parlist")) {
                        do {
                            System.out.println("parlist");
                            // to first child of "parlist", whatever its name
                            if (vn.toElement(VTDNav.FC)) {
                                do {
                                    count++; // every child element of "parlist"
                                    if (vn.matchElement("par"))
                                        par_count++;
                                } while (vn.toElement(VTDNav.NS));
                                vn.toElement(VTDNav.P); // back up to "parlist"
                            }
                        } while (vn.toElement(VTDNav.NS, "parlist")); // to next sibling named "parlist"
                        vn.toElement(VTDNav.P); // back up to "command"
                    }
                } while (vn.toElement(VTDNav.NS, "command")); // to next sibling named "command"
                vn.toElement(VTDNav.P); // go up one level, back to "package"
            } else
                System.out.println(" no child element named 'command' ");
        } while (vn.toElement(VTDNav.NS, "package")); // to next sibling named "package"
        vn.toElement(VTDNav.P); // go up one level, back to "bix"
    } else
        System.out.println(" no child element named 'package' ");
} else
    System.out.println(" Root is not 'bix' ");
// print out the results
System.out.println(" count ====> " + count);
System.out.println(" par_count ==> " + par_count);
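
For contrast with the remark about DOM, here is a rough sketch of the same two counts written against the JDK's built-in DOM API instead of VTD-XML. It is a swapped-in technique shown only for comparison: the class name, the helper method, and the tiny inline document (including the element named "other") are invented for illustration and say nothing about the real contents of bioinfo.xml. Note the node-type check and the cast to Element that the cursor-based code does not need.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical comparison class; uses JDK DOM, not VTD-XML.
class DomParlistStats {
    // Made-up miniature document with the bix/package/command/parlist shape.
    static final String SAMPLE =
        "<bix><package>"
        + "<command><parlist><par/><par/><other/></parlist></command>"
        + "<command><parlist><par/></parlist></command>"
        + "</package><package/></bix>";

    // Returns {count, parCount}: all element children of every parlist,
    // and the subset of those children named "par".
    static int[] count(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            int count = 0, parCount = 0;
            if (!"bix".equals(root.getTagName())) return new int[] {0, 0};
            for (Element pkg : children(root, "package"))
                for (Element cmd : children(pkg, "command"))
                    for (Element parlist : children(cmd, "parlist"))
                        for (Element child : children(parlist, null)) {
                            count++;
                            if ("par".equals(child.getTagName())) parCount++;
                        }
            return new int[] {count, parCount};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Element children of parent, filtered by tag name (null means any name).
    static List<Element> children(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) { // skip text and comments
                Element e = (Element) n;                // the cast DOM requires
                if (name == null || name.equals(e.getTagName())) out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = count(SAMPLE);
        System.out.println(" count ====> " + r[0]);    // 4 for SAMPLE
        System.out.println(" par_count ==> " + r[1]);  // 3 for SAMPLE
    }
}
```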

In this step, the sample code instantiates AutoPilot and uses its node iterator to verify the "par" count obtained in the previous step.

// verify results using iterators
int v = 0;
vn.toElement(VTDNav.ROOT);
AutoPilot ap = new AutoPilot(vn);
ap.selectElement("par");
while (ap.iterate()) {
    // "par" under /bix/package/command/parlist sits at depth 4 (the root is depth 0)
    if (vn.getCurrentDepth() == 4) {
        v++;
    }
}
System.out.println(" verify ==> " + v);
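
The depth test matters because iterating by element name alone would also visit "par" elements that sit elsewhere in the tree. The sketch below illustrates the same idea (scan in document order, keep only matches at a given depth, with the root at depth 0) using the JDK's StAX reader rather than AutoPilot. The class name and the inline document are invented for illustration; whether bioinfo.xml actually contains "par" elements at other depths is not something this sketch assumes.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; uses JDK StAX, not VTD-XML.
class DepthFilteredCount {
    // Made-up document with "par" at depths 1, 4 (twice) and 5.
    static final String SAMPLE =
        "<bix><par/><package><command><parlist>"
        + "<par/><par><par/></par>"
        + "</parlist></command></package></bix>";

    // Counts elements called name whose depth equals wantedDepth (root = 0).
    static int countAtDepth(String xml, String name, int wantedDepth) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            int depth = -1; // becomes 0 when the root element starts
            int hits = 0;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == wantedDepth && name.equals(r.getLocalName())) hits++;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            r.close();
            return hits;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the two "par" elements directly under parlist are at depth 4.
        System.out.println(" verify ==> " + countAtDepth(SAMPLE, "par", 4));
    }
}
```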

Alternatively, the XPath version of the sample code is much shorter and simpler.

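int count = 0;
int par_count = 0;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot();
ap.bind(vn);
// all child elements of every "parlist" (the trailing /* matches the count from the manual version)
ap.selectXPath("/bix/package/command/parlist/*");
while (ap.evalXPath() != -1)
    count++;

// only the child elements of "parlist" named "par"
ap.selectXPath("/bix/package/command/parlist/par");
while (ap.evalXPath() != -1)
    par_count++;

// print out the results
System.out.println(" count ====> " + count);
System.out.println(" par_count ==> " + par_count);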
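
When adapting the location paths, note that the last step decides what gets counted: a path ending in "parlist" selects the parlist elements themselves, "parlist/*" selects all of their child elements, and "parlist/par" selects only the children named "par". The sketch below checks this on a small inline document using the JDK's javax.xml.xpath package rather than VTD-XML's AutoPilot; the class name and the sample document (including the element named "other") are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical illustration class; uses the JDK XPath engine, not VTD-XML.
class XPathCounts {
    // Made-up miniature document with the bix/package/command/parlist shape.
    static final String SAMPLE =
        "<bix><package>"
        + "<command><parlist><par/><par/><other/></parlist></command>"
        + "<command><parlist><par/></parlist></command>"
        + "</package><package/></bix>";

    // Number of nodes selected by the given location path.
    static int count(String xml, String path) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            Double n = (Double) xp.evaluate("count(" + path + ")",
                new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE, "/bix/package/command/parlist"));     // 2
        System.out.println(count(SAMPLE, "/bix/package/command/parlist/*"));   // 4
        System.out.println(count(SAMPLE, "/bix/package/command/parlist/par")); // 3
    }
}
```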

 

 

 

 
