7. C Version of VTD-XML
(A separate code-only
VTD-XML tutorial in C is available at
Starting from Version 0.8, VTD-XML has a C port that delivers the same set of features as the Java version. To reduce porting effort, the VTD-XML project team made a conscious decision to keep the C implementation as close to the Java code as possible. However, the two still differ slightly for the obvious reason: C is a different language from Java.
The best way to describe the C implementation of VTD-XML is therefore to enumerate the differences between C and Java first, and then describe how the C version deals with each of them.
The Notion of Class
The notion of class is key to Java's OO programming concepts. In C, the closest thing to a class is a struct, so the C version of VTD-XML uses struct pointers to emulate Java's objects. For instance, C's "VTDGen *vg" is equivalent to Java's "VTDGen vg".
Constructors
A Java object is instantiated by calling one of its constructors with "new." Since C has no notion of class, the C version allocates and initializes a struct with a function instead.
The following Java statement constructs an instance
of VTDGen:
VTDGen vg = new VTDGen();
The C version accomplishes the same thing with this:
VTDGen *vg = createVTDGen();
Garbage Collectors
Since C has no automatic garbage collector, any allocated data structures need to be freed manually. The C version of VTD-XML defines a set of "garbage-collect" functions for this purpose. The following statement frees the VTDGen that vg points to.
freeVTDGen(vg);
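The create/free pairing described above can be sketched with a small stand-in type. This is only an illustration of the pattern: `Counter`, `createCounter`, `increment`, and `freeCounter` are hypothetical names and are not part of VTD-XML.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Java class with a single field. */
typedef struct {
    int count;
} Counter;

/* Plays the role of "new Counter()": allocate, then initialize.
   Returns NULL if the allocation fails. */
Counter *createCounter(void) {
    Counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->count = 0;
    }
    return c;
}

/* Plays the role of a Java instance method: the struct pointer
   stands in for the object the method is called on. */
int increment(Counter *c) {
    assert(c != NULL);
    c->count += 1;
    return c->count;
}

/* Plays the role of the garbage collector: the caller must call it
   exactly once when the struct is no longer needed. */
void freeCounter(Counter *c) {
    free(c);
}
```

A caller would pair each `createCounter()` with one `freeCounter()`, in the same way each `createVTDGen()` is paired with `freeVTDGen()`.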
Methods/Functions Overloading
C doesn't allow one function name to have multiple parameter lists. So VTD-XML's C version appends a number to the function name to distinguish functions that would be overloads in Java. See the example below.
// Set the XMLDoc container.
void setDoc(VTDGen *vg, UByte *byteArray, int arrayLen);
// Set the XMLDoc container. Also set the offset and length of the document in the buffer.
void setDoc2(VTDGen *vg, UByte *byteArray, int arrayLen, int offset, int docLen);
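To make the numbered-suffix convention concrete, here is a minimal sketch with a stand-in type. `DocHolder`, `holdDoc`, and `holdDoc2` are hypothetical names, and having the short form delegate to the long form is an assumption made for illustration, not a statement about how VTD-XML implements setDoc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object that remembers where a document
   lives inside a caller-supplied buffer. */
typedef struct {
    const unsigned char *buf;
    int bufLen;
    int docOffset;
    int docLen;
} DocHolder;

/* Long form: the document occupies docLen bytes starting at offset. */
void holdDoc2(DocHolder *h, const unsigned char *buf, int bufLen,
              int offset, int docLen) {
    assert(h != NULL && buf != NULL);
    assert(offset >= 0 && docLen >= 0 && offset + docLen <= bufLen);
    h->buf = buf;
    h->bufLen = bufLen;
    h->docOffset = offset;
    h->docLen = docLen;
}

/* Short form: the document fills the whole buffer. In Java this would be
   an overload with the same name; in C it needs a distinct name. */
void holdDoc(DocHolder *h, const unsigned char *buf, int bufLen) {
    holdDoc2(h, buf, bufLen, 0, bufLen);
}
```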
Function Calling Convention
Virtually every Java method of VTD-XML has an equivalent function in the C version. The general way to translate from Java to C is to pass the object on which the Java method is invoked as the first argument of the C function call.
The following statement calls "parse()" in Java.
vg.parse(true);
The C version of "parse()" takes vg as its first parameter.
parse(vg, TRUE);
Exception Handling
The C version of VTD-XML uses the "cexcept" package, which provides basic Try/Catch exception handling similar to that in Java.
http://cexcept.sourceforge.net/
http://www.nicemice.net/cexcept/src/latest/rationale
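The sketch below does not use cexcept itself. It only illustrates, with the standard setjmp/longjmp facility that Try/Catch-style packages for C are typically layered on, how control can jump from a failing callee back to a handler. The names `tryDouble`, `checkedDouble`, and `ERR_PARSE` are hypothetical, and a single global handler is a simplification.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical error code for this sketch. */
enum { ERR_PARSE = 1 };

/* Single global jump target; a simplification for illustration. */
static jmp_buf handler;

/* "Throws" ERR_PARSE when the input is negative. */
static int checkedDouble(int x) {
    if (x < 0) {
        longjmp(handler, ERR_PARSE);   /* unwind to the matching setjmp */
    }
    return 2 * x;
}

/* Returns the doubled value, or -1 if an error was "caught". */
int tryDouble(int x) {
    if (setjmp(handler) == 0) {
        /* "try" body: setjmp returns 0 when called directly */
        return checkedDouble(x);
    } else {
        /* "catch" body: reached when longjmp passes a nonzero code */
        return -1;
    }
}
```

With cexcept, the same shape would be written with the package's own Try and Catch macros rather than explicit setjmp calls.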
String Literal
The C version of VTD-XML uses a "wchar_t" array as the string data type. When declaring a string literal, one needs to put an L prefix in front of it to denote that it is a wide-character (UCS) string.
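As a small illustration of the L prefix, the following sketch compares a wide string against a wide literal using the standard `<wchar.h>` functions. The helper `isBook` and the element name "book" are hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <wchar.h>

/* Returns 1 when the wide string s equals the element name "book". */
int isBook(const wchar_t *s) {
    assert(s != NULL);
    /* The L prefix makes the literal an array of wchar_t, not char. */
    return wcscmp(s, L"book") == 0;
}
```

The same prefix is needed for any literal passed to a function that takes a wide string parameter, such as the element and attribute name arguments in the lists below.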
List of constructors/destructors
// create VTDGen
VTDGen *createVTDGen();
// free VTDGen
void freeVTDGen(VTDGen *vg);
// free VTDNav
void freeVTDNav(VTDNav *vn);
// create AutoPilot
AutoPilot *createAutoPilot(VTDNav *v);
AutoPilot *createAutoPilot2();
// create XMLModifier
XMLModifier *createXMLModifier();
XMLModifier *createXMLModifier2(VTDNav *v);
// free XMLModifier
void freeXMLModifier(XMLModifier *xm);
// free AutoPilot
void freeAutoPilot(AutoPilot *ap);
List of functions in VTDGen
// Clear the internal state of VTDGen so it can process the next XML file.
void clear(VTDGen *vg);
// Return the VTDNav object after parsing; it also cleans the
// internal state so VTDGen can process the next file.
VTDNav *getNav(VTDGen *vg);
// Generate VTD tokens and location cache info.
// One specifies whether the parsing is namespace aware or not.
void parse(VTDGen *vg, Boolean ns);
Boolean parseFile(VTDGen *vg, Boolean ns, char *fileName);
// Set the XMLDoc container.
void setDoc(VTDGen *vg, UByte *byteArray, int arrayLen);
// Set the XMLDoc container. Also set the offset and length of the document.
void setDoc2(VTDGen *vg, UByte *byteArray, int arrayLen, int offset, int docLen);
/* Load VTD+XML from a FILE pointer */
VTDNav *loadIndex(VTDGen *vg, FILE *f);
/* Load VTD+XML from a byte array */
VTDNav *loadIndex2(VTDGen *vg, UByte *ba, int len);
/* Write VTD+XML into a FILE pointer */
Boolean writeIndex(VTDGen *vg, FILE *f);
List of functions in VTDNav
// Return the attribute count of the element at the cursor position.
int getAttrCount(VTDNav *vn);
// Get the token index of the attribute value given an attribute name.
int getAttrVal(VTDNav *vn, UCSChar *attrName);
// Get the token index of the attribute value of the given URL and local name.
// If ns is not enabled, the lookup will return -1, indicating not found.
// Also, namespace nodes are invisible to this method.
int getAttrValNS(VTDNav *vn, UCSChar *URL, UCSChar *localName);
// Get the depth (>=0) of the current element.
extern inline int getCurrentDepth(VTDNav *vn);
// Get the index value of the current element.
extern inline int getCurrentIndex(VTDNav *vn);
// Get the starting offset and length of an element,
// encoded in a long: the upper 32 bits are the length; the lower 32 bits are the offset.
Long getElementFragment(VTDNav *vn);
/**
 * Get the encoding of the XML document.
 *   0  ASCII
 *   1  ISO-8859-1
 *   2  UTF-8
 *   3  UTF-16BE
 *   4  UTF-16LE
 */
extern inline encoding getEncoding(VTDNav *vn);
// Get the maximum nesting depth of the XML document (>0).
// max depth is nestingLevel - 1
extern inline int getNestingLevel(VTDNav *vn);
// Get root index value.
extern inline int getRootIndex(VTDNav *vn);
// Return the token index of the character data or CDATA token.
// Note that it is intended to support data-oriented XML (not mixed-content XML).
int getText(VTDNav *vn);
// Get the total number of VTD tokens for the current XML document.
extern inline int getTokenCount(VTDNav *vn);
// Get the depth value of a token (>=0).
int getTokenDepth(VTDNav *vn, int index);
// Get the token length at the given index value.
// Please refer to the VTD spec for more details.
int getTokenLength(VTDNav *vn, int index);
// Get the starting offset of the token at the given index.
extern inline int getTokenOffset(VTDNav *vn, int index);
// Get the XML document.
extern inline UByte *getXML(VTDNav *vn);
// Get the token type of the token at the given index value.
extern inline tokenType getTokenType(VTDNav *vn, int index);
// Test whether the current element has an attribute with the matching name.
Boolean hasAttr(VTDNav *vn, UCSChar *attrName);
// Test whether the current element has an attribute with
// matching namespace URL and localname.
Boolean hasAttrNS(VTDNav *vn, UCSChar *URL, UCSChar *localName);
// Test if the current element matches the given name.
Boolean matchElement(VTDNav *vn, UCSChar *en);
// Test whether the current element matches the given namespace URL and localname.
// URL, when set to "*", matches any namespace (including null); when set to
// null, it defines an "always-no-match".
// ln is the localname that, when set to "*", matches any localname.
Boolean matchElementNS(VTDNav *vn, UCSChar *URL, UCSChar *ln);
// Match the string against the token at the given index value. When a token
// is an attribute name or starting tag, the qualified name is what gets matched against.
Boolean matchRawTokenString(VTDNav *vn, int index, UCSChar *s);
// This method matches two VTD tokens of 2 VTDNavs.
Boolean matchTokens(VTDNav *vn, int i1, VTDNav *vn2, int i2);
// Match the string against the token at the given index value. When a token
// is an attribute name or starting tag, the qualified name is what gets matched against.
Boolean matchTokenString(VTDNav *vn, int index, UCSChar *s);
int compareRawTokenString(VTDNav *vn, int index, UCSChar *s);
// This method lexically compares two VTD tokens.
int compareTokens(VTDNav *vn, int i1, VTDNav *vn2, int i2);
// Compare the string against the token at the given index value. When a token
// is an attribute name or starting tag, the qualified name is what gets matched against.
int compareTokenString(VTDNav *vn, int index, UCSChar *s);
Boolean overWrite(VTDNav *vn, int index, UByte *ba, int offset, int len);
// Convert a VTD token into a double.
double parseDouble(VTDNav *vn, int index);
// Convert a VTD token into a float.
float parseFloat(VTDNav *vn, int index);
// Convert a VTD token into an int.
int parseInt(VTDNav *vn, int index);
// Convert a VTD token into a long.
Long parseLong(VTDNav *vn, int index);
// Load the context info from the ContextBuffer.
// Info saved includes LC and the current state of the context.
Boolean pop(VTDNav *vn);
// Store the context info into the ContextBuffer.
// Info saved includes LC and the current state of the context.
Boolean push(VTDNav *vn);
// A generic navigation method.
// Move the cursor to the element according to the direction constants.
// If no such element, no position change and return false (0).
/* Legal direction constants are
 *   ROOT          0
 *   PARENT        1
 *   FIRST_CHILD   2
 *   LAST_CHILD    3
 *   NEXT_SIBLING  4
 *   PREV_SIBLING  5
 */
Boolean toElement(VTDNav *vn, navDir direction);
/**
 * A generic navigation method.
 * Move the cursor to the element according to the direction
 * constants and the element name.
 * If no such element, no position change and return false (0).
 * "*" matches any element.
 * Legal direction constants are
 *   ROOT          0
 *   PARENT        1
 *   FIRST_CHILD   2
 *   LAST_CHILD    3
 *   NEXT_SIBLING  4
 *   PREV_SIBLING  5
 * For ROOT and PARENT, the element name will be ignored.
 */
Boolean toElement2(VTDNav *vn, navDir direction, UCSChar *en);
/*
 * A generic navigation function with namespace support.
 * Move the cursor to the element according to the direction constants and
 * the prefix and local names.
 * If no such element, no position change and return false (0).
 * URL "*" matches any namespace, including undefined namespaces.
 * A null URL means the namespace prefix is undefined for the element.
 * ln "*" matches any localname.
 * Legal direction constants are
 *   ROOT          0
 *   PARENT        1
 *   FIRST_CHILD   2
 *   LAST_CHILD    3
 *   NEXT_SIBLING  4
 *   PREV_SIBLING  5
 * For ROOT and PARENT, the element name will be ignored.
 * If not ns enabled, return false immediately with no position change.
 */
Boolean toElementNS(VTDNav *vn, navDir direction, UCSChar *URL, UCSChar *ln);
// This method normalizes a token into a string in a way that resembles DOM.
// The leading and trailing white space characters will be stripped.
// The entity and character references will be resolved.
// Multiple whitespace characters will be collapsed into one.
UCSChar *toNormalizedString(VTDNav *vn, int index);
// Convert a token at the given index to a string
// (entity and character references not resolved).
UCSChar *toRawString(VTDNav *vn, int index);
// Convert a token at the given index to a string
// (entity and character references resolved).
// An attribute name or an element name will get the UCS2 string of the qualified name.
UCSChar *toString(VTDNav *vn, int index);
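The comment on getElementFragment says the returned long packs the length into the upper 32 bits and the offset into the lower 32 bits. A minimal sketch of unpacking such a value follows. The helper names `fragmentOffset`, `fragmentLength`, and `packFragment` are hypothetical, `int64_t` stands in for VTD-XML's Long on the assumption that it is a 64-bit integer, and both fields are assumed to be non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 32 bits: starting offset of the fragment. */
int fragmentOffset(int64_t fragment) {
    return (int)(fragment & 0xFFFFFFFFLL);
}

/* Upper 32 bits: length of the fragment. */
int fragmentLength(int64_t fragment) {
    return (int)(fragment >> 32);
}

/* Inverse of the two helpers above; used here only to build sample values. */
int64_t packFragment(int offset, int length) {
    assert(offset >= 0 && length >= 0);
    return ((int64_t)length << 32) | (int64_t)(uint32_t)offset;
}
```

Given the two unpacked values, the bytes of the element would be the `fragmentLength` bytes starting at `fragmentOffset` in the buffer returned by getXML.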
List of functions in AutoPilot
// Bind VTDNav to AutoPilot.
void bind(AutoPilot *ap, VTDNav *vn);
// Select the element name before iterating.
void selectElement(AutoPilot *ap, UCSChar *en);
// Select the element name (namespace version) before iterating.
//  * URL, if set to "*", matches every namespace.
//  * URL, if set to null, indicates the namespace is undefined.
//  * localname, if set to "*", matches any localname.
void selectElementNS(AutoPilot *ap, UCSChar *URL, UCSChar *ln);
// Iterate over all the selected element nodes.
Boolean iterateAP(AutoPilot *ap);
/*
 * This function selects the string representing the XPath expression.
 * Usually evalXPath is called afterwards.
 * Returns true if the XPath is valid.
 */
Boolean selectXPath(AutoPilot *ap, UCSChar *s);
/*
 * Evaluate XPath
 */
int evalXPath(AutoPilot *ap);
double evalXPathToNumber(AutoPilot *ap);
UCSChar *evalXPathToString(AutoPilot *ap);
Boolean evalXPathToBoolean(AutoPilot *ap);
/*
 * Reset XPath
 */
void resetXPath(AutoPilot *ap);
/*
 * Declare prefix/URL binding
 */
void declareXPathNameSpace(AutoPilot *ap, UCSChar *prefix, UCSChar *URL);
List of functions in XMLModifier
// Bind VTDNav to XMLModifier.
void bind4XMLModifier(XMLModifier *xm, VTDNav *vn);
// Remove whatever is at the cursor position.
void remove4XMLModifier(XMLModifier *xm);
void removeAttribute(XMLModifier *xm, UCSChar *attrName);
void removeToken(XMLModifier *xm, int index);
void insertBeforeElement(XMLModifier *xm, UCSChar *s);
void insertAttribute(XMLModifier *xm, UCSChar *attr);
void insertAfterElement(XMLModifier *xm);
void updateToken(XMLModifier *xm, int index);
void output(XMLModifier *xm, FILE *f);
void resetXMLModifier(XMLModifier *xm, FILE *f);