Process a Bio-Informatics XML document in C
(For more code samples, visit Official VTD-XML Blog)
(Separate code-only VTD-XML tutorials are available in C, C++, Java and C#)
This example shows how to process an XML file representing DNA information
using the C version of VTD-XML. The file structure representing the DNA
sequence is highly complex. The goal is to count the number of occurrences of
certain elements. The corresponding XML file and the C source file can be
downloaded using the links below:
bioinfo.xml
stats.c (without XPath)
We are going to include the following header files:
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "xpath1.h"
#include "helper.h"
#include "vtdGen.h"
In the C version, the exception context needs to be set up as a global
variable.
_thread struct exception_context the_exception_context[1];
Next, this example declares all variables before they
are used.
exception e;
FILE *f = NULL;
int i = 0,count=0,par_count=0,v=0;
char* filename = "./bioinfo.xml";
struct stat s;
UByte *xml = NULL; // this is the buffer containing the XML content, UByte means unsigned byte
VTDGen *vg = NULL; // This is the VTDGen that parses XML
VTDNav *vn = NULL; // This is the VTDNav that navigates the VTD records
AutoPilot *ap = NULL;
Then, the code opens the XML file and copies the
input into a byte buffer.
f = fopen(filename,"rb"); // binary mode, so the bytes are read unmodified
stat(filename,&s);
i = (int) s.st_size;
wprintf(L"size of the file is %d \n",i);
xml = (UByte *)malloc(sizeof(UByte) *i);
i = (int) fread(xml,sizeof(UByte),i,f);
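The snippet above does not check whether the file could be opened or whether the whole file was read. As a minimal sketch of a more defensive version, the hypothetical helper below (`read_file_bytes` is an illustrative name, not part of VTD-XML) uses only the standard C library. It sizes the file with `fseek`/`ftell` instead of `stat`, which is an assumption made here for portability.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of VTD-XML): reads an entire file into a
 * newly allocated buffer. Returns NULL on failure; on success stores the
 * byte count in *out_len. The caller owns the buffer and must free() it. */
unsigned char *read_file_bytes(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");      /* binary mode: no newline translation */
    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    /* Allocate at least one byte so an empty file still yields non-NULL. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    /* A short read is treated as an error. */
    size_t n = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (n != (size_t)size) { free(buf); return NULL; }

    *out_len = n;
    return buf;
}
```

The returned buffer and length could then be passed to setDoc in place of xml and i, with a cast to UByte * and int.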
An instance of VTDGen is created to parse the
input XML.
vg = createVTDGen();
setDoc(vg,xml,i);
parse(vg,TRUE); // TRUE turns on namespace awareness
An instance of AutoPilot is created by calling "createAutoPilot2()," which is
subsequently used to parse and evaluate XPath expressions.
vn = getNav(vg);
ap = createAutoPilot2();
bind(ap,vn); // in version 1.5, bind replaces rebind and setVTDNav
// count the parlist elements
if (selectXPath(ap,L"/bix/package/command/parlist")){
    while(evalXPath(ap)!= -1){
        count++;
    }
}
// count the par elements nested inside parlist
if (selectXPath(ap,L"/bix/package/command/parlist/par")){
    while(evalXPath(ap)!= -1){
        par_count++;
    }
}
wprintf(L"count ==> %d \n",count);
wprintf(L"par_count ==> %d \n",par_count);
Then, the code uses the node iterator to verify the
results.
toElement(vn,ROOT);
selectElement(ap,L"par");
while(iterateAP(ap)){
    if (getCurrentDepth(vn) == 4){
        v++;
    }
}
wprintf(L"verify ==> %d \n",v);
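The depth check works because the root element bix sits at depth 0, so package, command, parlist and par fall at depths 1, 2, 3 and 4. Note that the check matches any par element at depth 4, whatever its ancestors are, so it agrees with the XPath count only if the document has no other such elements. To illustrate the depth bookkeeping without the library, here is a rough sketch of a tiny scanner. The helper `count_at_depth` is hypothetical, is not how VTD-XML works internally, and assumes simple well-formed XML with no comments, CDATA sections, processing instructions or DOCTYPE.

```c
#include <string.h>

/* Hypothetical helper, not part of VTD-XML: counts start tags named `name`
 * that sit at nesting depth `depth` (root element = depth 0).
 * Assumes simple well-formed XML: no comments, CDATA, processing
 * instructions, DOCTYPE, or '>' inside attribute values. */
int count_at_depth(const char *xml, const char *name, int depth)
{
    int cur = -1;   /* depth of the innermost open element; -1 = outside root */
    int hits = 0;
    size_t nlen = strlen(name);
    const char *p = xml;

    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');
        if (end == NULL)
            break;                  /* truncated tag: stop scanning */
        if (p[1] == '/') {
            cur--;                  /* end tag closes the current element */
        } else {
            cur++;                  /* start tag opens a new element */
            /* The name must match in full, so "par" does not match "param". */
            if (cur == depth && strncmp(p + 1, name, nlen) == 0) {
                char c = p[1 + nlen];
                if (c == '>' || c == '/' || c == ' ' ||
                    c == '\t' || c == '\n' || c == '\r')
                    hits++;
            }
            if (end[-1] == '/')
                cur--;              /* empty element <x/> closes at once */
        }
        p = end + 1;
    }
    return hits;
}
```

For a document shaped like bioinfo.xml, count_at_depth(xml, "par", 4) plays the same role as the loop over iterateAP above.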
As the last step, the example closes the file and frees all allocated data
structures.
fclose(f);
// remember C has no automatic garbage collector
// needs to de-allocate manually.
freeVTDNav(vn);
freeVTDGen(vg);
freeAutoPilot(ap);