VTD-XML: The Future of XML Processing

Process a Bio-Informatics XML document in C

(For more code samples, visit the Official VTD-XML Blog)

(Separate code-only VTD-XML tutorials are available in C, C++, Java and C#)

This example shows how to process an XML file representing DNA information using the C version of VTD-XML. The file has a complex, nested structure (the elements counted here sit at paths such as /bix/package/command/parlist/par), and the goal is to count the occurrences of certain elements. The corresponding XML file and the C source file can be downloaded using the links below:

bioinfo.xml

stats.c (without XPath)

We are going to include the following header files:

#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "xpath1.h"
#include "helper.h"
#include "vtdGen.h"

In the C version, the exception context must be declared as a global variable:
_thread struct exception_context the_exception_context[1];

Next, this example declares all variables before they are used.
exception e;
FILE *f = NULL;
int i = 0,count=0,par_count=0,v=0;

char* filename = "./bioinfo.xml";
struct stat s;
UByte *xml = NULL;    // buffer holding the XML content; UByte means unsigned byte
VTDGen *vg = NULL;    // the VTDGen that parses the XML
VTDNav *vn = NULL;    // the VTDNav that navigates the VTD records
AutoPilot *ap = NULL; // the AutoPilot that evaluates XPath and iterates over nodes

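Then, the code opens the XML file, queries its size with stat(), and reads the contents into a byte buffer.
f = fopen(filename,"rb"); // binary mode, so the bytes are read unchanged
if (f == NULL || stat(filename,&s) != 0){
      wprintf(L"cannot open %s \n",filename);
      exit(1);
}
i = (int) s.st_size;
wprintf(L"size of the file is %d \n",i);
xml = (UByte *)malloc(sizeof(UByte) *i);
if (xml == NULL){
      exit(1);
}
i = fread(xml,sizeof(UByte),i,f);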

An instance of VTDGen is created to parse the input XML.
vg = createVTDGen();
setDoc(vg,xml,i);
parse(vg,TRUE); // TRUE enables namespace awareness

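After obtaining the VTDNav with getNav(), the code creates an AutoPilot by calling createAutoPilot2(); the AutoPilot is subsequently used to compile and evaluate XPath expressions. Each call to evalXPath() moves to the next matching node and returns -1 when no more nodes match.
vn = getNav(vg);
ap = createAutoPilot2();
bind(ap,vn); // since version 1.5, bind() replaces rebind() and setVTDNav()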
if (selectXPath(ap,L"/bix/package/command/parlist")){
      while(evalXPath(ap)!= -1){
           count++;
      }
}

if (selectXPath(ap,L"/bix/package/command/parlist/par")){
      while(evalXPath(ap)!= -1){
           par_count++;
      }
}
wprintf(L"count ==> %d \n",count);
wprintf(L"par_count ==> %d \n",par_count);

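Then, the code verifies par_count with the node iterator: starting from the root element, it visits every par element and counts only those at depth 4, which is the depth of /bix/package/command/parlist/par when the root element is at depth 0.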
toElement(vn,ROOT);       // move the cursor back to the root element
selectElement(ap,L"par"); // iterate over descendant elements named par
while(iterateAP(ap)){
    if (getCurrentDepth(vn) == 4){
          v++;
    }
}
wprintf(L"verify ==> %d \n",v);

As the last step, the example closes the file and frees all allocated data structures.
fclose(f);
// C has no automatic garbage collector,
// so everything must be deallocated manually.

freeVTDNav(vn);
freeVTDGen(vg);
freeAutoPilot(ap);
