VTD-XML: The Future of XML Processing


7. C Version of VTD-XML


(A separate code-only VTD-XML tutorial in C is available at

Starting with Version 0.8, VTD-XML has a C port that delivers the same set of features as the Java version. To reduce porting effort, the VTD-XML project team made a conscious decision to keep the C implementation as close to the Java code as possible. However, the two still differ in a few places for the obvious reason: C is a different language from Java.

So the best way to describe the C implementation of VTD-XML is to first enumerate the differences between C and Java, and then describe how the C version deals with each of them.

The Notion of Class

The notion of class is key to Java's OO programming concepts. In C, the closest thing to a class is a struct. So the C version of VTD-XML uses struct pointers to emulate Java's objects. For instance, C's "VTDGen *vg" is equivalent to Java's "VTDGen vg".

 

Constructors

A Java object is instantiated by calling its constructors using "new." Since C doesn't have the notion of class, the C version allocates and initializes a struct using functions.

The following Java statement constructs an instance of VTDGen:

VTDGen vg = new VTDGen();

The C version accomplishes the same thing with this:

VTDGen *vg = createVTDGen();

 

Garbage Collectors

Since C doesn't have an automatic garbage collector, any allocated data structures need to be freed manually. The C version of VTD-XML defines a set of "garbage-collect" functions for this purpose. The following statement frees the VTDGen that vg points to.

freeVTDGen(vg);
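
The create/free pairing described above can be sketched in plain C. The sketch does not use the VTD-XML headers: the Parser struct and the createParser/freeParser names are hypothetical stand-ins, and the real VTDGen layout is certainly different. It only illustrates how one function plays the role of Java's "new" and another plays the role of the garbage collector.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a library "class"; not the real VTDGen layout. */
typedef struct {
    unsigned char *doc;   /* document buffer (not owned by the parser) */
    int docLen;           /* length of the document in bytes */
} Parser;

/* Plays the role of "new Parser()": allocate, then initialize every field. */
Parser *createParser(void) {
    Parser *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;      /* the caller must check for allocation failure */
    p->doc = NULL;
    p->docLen = 0;
    return p;
}

/* Plays the role of the garbage collector: release what createParser allocated. */
void freeParser(Parser *p) {
    free(p);              /* free(NULL) is a no-op, so a NULL pointer is accepted */
}
```

Every successful createParser() call should be matched by exactly one freeParser() call; using the pointer after it has been freed is an error that C will not detect for you.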

 

Methods/Functions Overloading

C doesn't support function overloading: a function name can have only one parameter list. So VTD-XML's C version appends a number to the function name to distinguish the variants. See the example below.

// Set the XMLDoc container.
void setDoc(VTDGen *vg, UByte *byteArray, int arrayLen);

// Set the XMLDoc container. Also set the offset and length of the document in the buffer.
void setDoc2(VTDGen *vg, UByte *byteArray, int arrayLen, int offset, int docLen);
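
As a rough illustration of the numbered-suffix convention, the sketch below defines two variants of one operation on a hypothetical Doc struct. The names setBuffer and setBuffer2 are invented for this example and are not part of VTD-XML; whether the library's shorter variants delegate to the longer ones in this way is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for the object that holds the XML bytes. */
typedef struct {
    const unsigned char *buf;  /* whole buffer */
    int bufLen;                /* size of the whole buffer */
    int offset;                /* where the document starts inside buf */
    int len;                   /* length of the document */
} Doc;

/* Variant 2: the caller states where the document sits inside the buffer. */
void setBuffer2(Doc *d, const unsigned char *buf, int bufLen, int offset, int len) {
    d->buf = buf;
    d->bufLen = bufLen;
    d->offset = offset;
    d->len = len;
}

/* Variant 1: the document fills the whole buffer, so delegate to variant 2. */
void setBuffer(Doc *d, const unsigned char *buf, int bufLen) {
    setBuffer2(d, buf, bufLen, 0, bufLen);
}
```

In Java both functions could share one name; in C the suffix is what tells the compiler (and the reader) which parameter list is meant.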

 

Function Calling Convention

Virtually every method in the Java version of VTD-XML has an equivalent function in the C version. The general way to translate a call from Java to C is to pass the object the method is invoked on as the first argument of the C function.

The following statement calls "parse()" in Java.

vg.parse(true);

The C version of "parse()" takes vg as its first parameter.

parse(vg, TRUE);
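
The same translation rule can be shown on a self-contained toy type. Gen, parseDoc and getParseCount are hypothetical names used only for this sketch, and it uses the standard bool type where the library uses its own Boolean type.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an object with two "methods". */
typedef struct {
    bool nsAware;     /* was the last parse namespace aware? */
    int  parseCount;  /* how many times parseDoc has been called */
} Gen;

/* Java: gen.parse(true);        C: parseDoc(&gen, true); */
void parseDoc(Gen *g, bool ns) {
    g->nsAware = ns;
    g->parseCount++;
}

/* Java: gen.getParseCount();    C: getParseCount(&gen); */
int getParseCount(const Gen *g) {
    return g->parseCount;
}
```

The pointer in the first parameter does the job of Java's implicit "this" reference.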

 

Exception handling

The C version of VTD-XML uses the "cexcept" package, which provides basic Try/Catch exception handling similar to Java's.

http://cexcept.sourceforge.net/

http://www.nicemice.net/cexcept/src/latest/rationale
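
For readers new to exception handling in C, the sketch below shows the standard-library mechanism (setjmp/longjmp) on which Try/Catch packages of this kind are generally built. It deliberately does not use cexcept or its macros; the error codes and function names are hypothetical, and the single global jmp_buf is a simplification that is not safe for nested handlers or multiple threads.

```c
#include <setjmp.h>

/* Hypothetical error codes for this sketch. */
typedef enum { ERR_NONE = 0, ERR_PARSE = 1 } ErrorCode;

static jmp_buf handler;   /* the place a "throw" jumps back to */

/* Halves n, "throwing" ERR_PARSE when n is negative. */
static int checkedHalf(int n) {
    if (n < 0)
        longjmp(handler, ERR_PARSE);   /* "throw": never returns */
    return n / 2;
}

/* Runs checkedHalf inside a try/catch-like region.
 * On success stores the result in *out and returns ERR_NONE;
 * on a "throw" leaves *out untouched and returns ERR_PARSE. */
ErrorCode tryHalf(int n, int *out) {
    if (setjmp(handler) == 0) {        /* "try" block */
        *out = checkedHalf(n);
        return ERR_NONE;
    } else {                           /* "catch" block */
        return ERR_PARSE;
    }
}
```

Unlike Java, nothing is cleaned up automatically when control jumps to the handler, so any memory allocated inside the "try" region still has to be freed by hand.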

 

String Literal

The C version of VTD-XML uses "wchar_t" arrays as its string data type. When declaring a string literal, one needs to prefix it with an L to denote that it is a wide-character (UCS) string.
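
A minimal sketch of wide string literals in standard C follows. It uses only <wchar.h>; the isBookElement helper and the "book" element name are hypothetical. The function lists on this page spell the string type UCSChar, which this sketch assumes to be an alias for wchar_t.

```c
#include <wchar.h>

/* L"..." has element type wchar_t; a plain "..." literal has element type char. */
static const wchar_t ELEMENT_NAME[] = L"book";

/* Returns 1 when name equals the wide literal above, 0 otherwise. */
int isBookElement(const wchar_t *name) {
    return wcscmp(name, ELEMENT_NAME) == 0;
}
```

Passing a plain "book" literal where a wchar_t pointer is expected is a type error, so the L prefix is needed on every literal handed to the library.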

 

List of constructors/destructors

// create VTDGen
VTDGen *createVTDGen();

// free VTDGen
void freeVTDGen(VTDGen *vg);

//Free VTDNav object
void freeVTDNav(VTDNav *vn);

//create AutoPilot
AutoPilot *createAutoPilot(VTDNav *v);
AutoPilot *createAutoPilot2();

//create XMLModifier
XMLModifier *createXMLModifier();
XMLModifier *createXMLModifier2(VTDNav *v);

// free XMLModifier
void freeXMLModifier(XMLModifier *xm);

// free AutoPilot
void freeAutoPilot(AutoPilot *ap);

 

List of functions in VTDGen

// clear the internal state of VTDGen so it can process
// the next XML file
void clear(VTDGen *vg);

// Returns the VTDNav object after parsing, it also cleans
// internal state so VTDGen can process the next file.
VTDNav *getNav(VTDGen *vg);

// Generating VTD tokens and Location cache info.
// One specifies whether the parsing is namespace aware or not.
void parse(VTDGen *vg, Boolean ns);

Boolean parseFile(VTDGen *vg, Boolean ns,char *fileName);

// Set the XMLDoc container.
void setDoc(VTDGen *vg, UByte *byteArray, int arrayLen);

// Set the XMLDoc container. Also set the offset and length of the document.
void setDoc2(VTDGen *vg, UByte *byteArray, int arrayLen, int offset, int docLen);
 

/* Load VTD+XML from a FILE pointer */
VTDNav* loadIndex(VTDGen *vg, FILE *f);

/* load VTD+XML from a byte array */
VTDNav* loadIndex2(VTDGen *vg, UByte* ba,int len);
 

/* Write VTD+XML into a FILE pointer */
Boolean writeIndex(VTDGen *vg, FILE *f);
 

List of functions in VTDNav


//Return the attribute count of the element at the cursor position.
int getAttrCount(VTDNav *vn);

//Get the token index of the attribute value given an attribute name.    
int getAttrVal(VTDNav *vn, UCSChar *attrName);

//Get the token index of the attribute value of given URL and local name.
//If ns is not enabled, the lookup will return -1, indicating not found.
//Also namespace nodes are invisible using this method.
int getAttrValNS(VTDNav *vn, UCSChar* URL, UCSChar *localName);


//Get the depth (>=0) of the current element.
extern inline int getCurrentDepth(VTDNav *vn);

// Get the index value of the current element.
extern inline int getCurrentIndex(VTDNav *vn);

// Get the starting offset and length of an element
// encoded in a long, upper 32 bit is length; lower 32 bit is offset
Long getElementFragment(VTDNav *vn);

/**
 * Get the encoding of the XML document.
 * <pre>   0  ASCII       </pre>
 * <pre>   1  ISO-8859-1  </pre>
 * <pre>   2  UTF-8       </pre>
 * <pre>   3  UTF-16BE    </pre>
 * <pre>   4  UTF-16LE    </pre>
 */

extern inline encoding getEncoding(VTDNav *vn);

// Get the maximum nesting depth of the XML document (>0).
// max depth is nestingLevel -1
extern inline int getNestingLevel(VTDNav *vn);

// Get root index value.
extern inline int getRootIndex(VTDNav *vn);

// This function returns the token index of the character data or CDATA token.
// Note that it is intended to support data-oriented XML (not mixed-content XML).
int getText(VTDNav *vn);

//Get total number of VTD tokens for the current XML document.
extern inline int getTokenCount(VTDNav *vn);

//Get the depth value of a token (>=0)
int getTokenDepth(VTDNav *vn, int index);

//Get the token length at the given index value
//please refer to VTD spec for more details
int getTokenLength(VTDNav *vn, int index);

//Get the starting offset of the token at the given index.
extern inline int getTokenOffset(VTDNav *vn, int index);

// Get the XML document
extern inline UByte* getXML(VTDNav *vn);

//Get the token type of the token at the given index value.
extern inline tokenType getTokenType(VTDNav *vn, int index);

//Test whether current element has an attribute with the matching name.
Boolean hasAttr(VTDNav *vn, UCSChar *attrName);

//Test whether the current element has an attribute with
//matching namespace URL and localname.
Boolean hasAttrNS(VTDNav *vn, UCSChar *URL, UCSChar *localName);

//Test if the current element matches the given name.
Boolean matchElement(VTDNav *vn, UCSChar *en);

//Test whether the current element matches the given namespace URL and localname.
//URL, when set to "*", matches any namespace (including null); when set to null, it defines an "always-no-match"
//ln is the localname that, when set to "*", matches any localname
Boolean matchElementNS(VTDNav *vn, UCSChar *URL, UCSChar *ln);

//Match the string against the token at the given index value. When a token
//is an attribute name or starting tag, qualified name is what gets matched against
Boolean matchRawTokenString(VTDNav *vn, int index, UCSChar *s);
 

//This method matches two VTD tokens of 2 VTDNavs
Boolean matchTokens(VTDNav *vn, int i1, VTDNav *vn2, int i2);

//Match the string against the token at the given index value. When a token
//is an attribute name or starting tag, qualified name is what gets matched against
Boolean matchTokenString(VTDNav *vn, int index, UCSChar *s);

int compareRawTokenString(VTDNav *vn, int index, UCSChar *s);
 

//This method  lexically compares two VTD tokens
int compareTokens(VTDNav *vn, int i1, VTDNav *vn2, int i2);

//Compare the string against the token at the given index value. When a token
//is an attribute name or starting tag, qualified name is what gets matched against
int compareTokenString(VTDNav *vn, int index, UCSChar *s);

Boolean overWrite(VTDNav *vn, int index, UByte* ba, int offset, int len);


//Convert a vtd token into a double.
double parseDouble(VTDNav *vn, int index);

//Convert a vtd token into a float.
float parseFloat(VTDNav *vn, int index);

//Convert a vtd token into an int
int parseInt(VTDNav *vn, int index);

//Convert a vtd token into a long
Long parseLong(VTDNav *vn, int index);

//Load the context info from the ContextBuffer.
//The saved info includes the LC and the current state of the context.
Boolean pop(VTDNav *vn);

//Store the context info into the ContextBuffer.
//The saved info includes the LC and the current state of the context.
Boolean push(VTDNav *vn);

// A generic navigation method.
// Move the cursor to the element according to the direction constants
// If no such element, no position change and return false (0).
/* Legal direction constants are <br>
 * <pre> ROOT            0  </pre>
 * <pre> PARENT          1  </pre>
 * <pre> FIRST_CHILD     2  </pre>
 * <pre> LAST_CHILD      3  </pre>
 * <pre> NEXT_SIBLING    4  </pre>
 * <pre> PREV_SIBLING    5  </pre>
 * <br>
 */

Boolean toElement(VTDNav *vn, navDir direction);

/**
 * A generic navigation method.
 * Move the cursor to the element according to the direction
 * constants and the element name
 * If no such element, no position change and return false (0).
 * "*" matches any element
 * Legal direction constants are <br>
 * <pre> ROOT            0  </pre>
 * <pre> PARENT          1  </pre>
 * <pre> FIRST_CHILD     2  </pre>
 * <pre> LAST_CHILD      3  </pre>
 * <pre> NEXT_SIBLING    4  </pre>
 * <pre> PREV_SIBLING    5  </pre>
 * <br>
 * for ROOT and PARENT, element name will be ignored.
 */

Boolean toElement2(VTDNav *vn, navDir direction, UCSChar *en);
/*
 * A generic navigation function with namespace support.
 * Move the cursor to the element according to the direction constants and the namespace URL and local name
 * If no such element, no position change and return false(0).
 * URL * matches any namespace, including undefined namespaces
 * a null URL means the namespace prefix is undefined for the element
 * ln *  matches any localname
 * Legal direction constants are<br>
 * <pre> ROOT            0  </pre>
 * <pre> PARENT          1  </pre>
 * <pre> FIRST_CHILD     2  </pre>
 * <pre> LAST_CHILD      3  </pre>
 * <pre> NEXT_SIBLING    4  </pre>
 * <pre> PREV_SIBLING    5  </pre>
 * <br>
 * for ROOT and PARENT, element name will be ignored.
 * If not ns enabled, return false immediately with no position change.
 */

Boolean toElementNS(VTDNav *vn, navDir direction, UCSChar *URL, UCSChar *ln);

//This method normalizes a token into a string in a way that resembles DOM.
//The leading and trailing white space characters will be stripped.
//The entity and character references will be resolved
//Multiple whitespace characters will be collapsed into one.
UCSChar *toNormalizedString(VTDNav *vn, int index);

//Convert a token at the given index to a String,
//(built-in entity and char references not resolved)
//(entities and char references not expanded).
UCSChar *toRawString(VTDNav *vn, int index);

//Convert a token at the given index to a String, (entities and char
//references resolved).
// An attribute name or an element name will get the UCS2 string of qualified name
UCSChar *toString(VTDNav *vn, int index);
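
The comment on getElementFragment above says the returned value packs the length into the upper 32 bits and the offset into the lower 32 bits. The helpers below sketch how such a value can be taken apart. They are hypothetical and not part of VTD-XML, and they assume the library's Long type is a 64-bit integer, represented here as int64_t.

```c
#include <stdint.h>

/* Hypothetical helper: build a packed value in the layout described for
 * getElementFragment (upper 32 bits = length, lower 32 bits = offset). */
int64_t packFragment(int offset, int length) {
    return (int64_t)(((uint64_t)(uint32_t)length << 32) | (uint32_t)offset);
}

/* Extract the starting offset from the lower 32 bits. */
int fragmentOffset(int64_t fragment) {
    return (int)((uint64_t)fragment & 0xFFFFFFFFu);
}

/* Extract the length from the upper 32 bits. */
int fragmentLength(int64_t fragment) {
    return (int)((uint64_t)fragment >> 32);
}
```

With both numbers in hand, the element's bytes are the length bytes that start at the offset in the buffer returned by getXML(vn). Whether the offset and length count bytes or characters for UTF-16 documents is not stated on this page and should be checked against the VTD spec.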

 

List of functions in AutoPilot

// bind VTDNav to AutoPilot
void bind(AutoPilot *ap, VTDNav *vn);

//Select the element name before iterating
void selectElement(AutoPilot *ap, UCSChar *en);

//Select the element name (name space version) before iterating.
// * URL, if set to *, matches every namespace
// * URL, if set to null, indicates the namespace is undefined.
// * localname, if set to *, matches any localname
void selectElementNS(AutoPilot *ap, UCSChar *URL, UCSChar *ln);

//Iterate over all the selected element nodes.

Boolean iterateAP(AutoPilot *ap);

/*
 * This function selects the string representing the XPath expression.
 * Usually evalXPath is called afterwards.
 * Returns true if the XPath is valid.
 */

Boolean selectXPath(AutoPilot *ap, UCSChar *s);

/*
 * Evaluate XPath
 */

int evalXPath(AutoPilot *ap);

double evalXPathToNumber(AutoPilot *ap);

UCSChar* evalXPathToString(AutoPilot *ap);

Boolean evalXPathToBoolean(AutoPilot *ap);



/*
 * Reset XPath
 */

void resetXPath(AutoPilot *ap);

/*
 * Declare prefix/URL binding
 */


void declareXPathNameSpace(AutoPilot *ap, UCSChar *prefix, UCSChar *URL);

 

List of functions in XMLModifier

// bind VTDNav to XMLModifier
void bind4XMLModifer(XMLModifier *xm, VTDNav *vn);

// remove whatever is at the cursor position
void remove4XMLModifier(XMLModifier *xm);

void removeAttribute(XMLModifier *xm, UCSChar *attrName);

void removeToken(XMLModifier *xm, int index);

void insertBeforeElement(XMLModifier *xm, UCSChar *s);

void insertAttribute(XMLModifier *xm, UCSChar *attr);

void insertAfterElement(XMLModifier *xm);

void updateToken(XMLModifier *xm, int index);

void output(XMLModifier *xm, FILE *f);

void resetXMLModifier(XMLModifier *xm, FILE *f);



 


 
