5. Conclusion
In this paper we
introduced the concepts of the Virtual Token Descriptor (VTD) and the
location cache, both of which are designed to enable a “non-extractive”
XML processing model. We also described the processing model in detail and
showed how to navigate the element hierarchy as represented by the
combination of VTD tokens and the location cache. To achieve most of
DOM’s functionality without incurring its resource overhead, the
processing model makes extensive use of 64-bit integers, which avoids the
per-object overhead associated with most object-based hierarchies. The
benchmark results suggest that we have met most of our design goals.
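
As a rough illustration of how a 64-bit integer can stand in for a token object, the following Java sketch packs a token's type, depth, length, and starting offset into a single long. The class name PackedToken and the field widths (4, 8, 20, and 30 bits) are assumptions made for this example; they are not necessarily the layout used by the reference implementation.

```java
// Illustrative sketch only: field widths below are assumed, not taken from the paper.
final class PackedToken {
    static final int OFFSET_BITS = 30; // assumed width of the starting-offset field
    static final int LENGTH_BITS = 20; // assumed width of the token-length field
    static final int DEPTH_BITS  = 8;  // assumed width of the nesting-depth field
    static final int TYPE_BITS   = 4;  // assumed width of the token-type field

    static final int LENGTH_SHIFT = OFFSET_BITS;
    static final int DEPTH_SHIFT  = LENGTH_SHIFT + LENGTH_BITS;
    static final int TYPE_SHIFT   = DEPTH_SHIFT + DEPTH_BITS;

    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;
    static final long LENGTH_MASK = (1L << LENGTH_BITS) - 1;
    static final long DEPTH_MASK  = (1L << DEPTH_BITS) - 1;
    static final long TYPE_MASK   = (1L << TYPE_BITS) - 1;

    private PackedToken() {}

    // Packs the four fields into one long, rejecting values that do not fit.
    static long pack(int type, int depth, int length, int offset) {
        check(type, TYPE_MASK, "type");
        check(depth, DEPTH_MASK, "depth");
        check(length, LENGTH_MASK, "length");
        check(offset, OFFSET_MASK, "offset");
        return ((long) type << TYPE_SHIFT)
             | ((long) depth << DEPTH_SHIFT)
             | ((long) length << LENGTH_SHIFT)
             | (long) offset;
    }

    static int offset(long rec) { return (int) (rec & OFFSET_MASK); }
    static int length(long rec) { return (int) ((rec >>> LENGTH_SHIFT) & LENGTH_MASK); }
    static int depth(long rec)  { return (int) ((rec >>> DEPTH_SHIFT) & DEPTH_MASK); }
    static int type(long rec)   { return (int) ((rec >>> TYPE_SHIFT) & TYPE_MASK); }

    private static void check(int value, long max, String name) {
        if (value < 0 || value > max) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }
}
```

Under these assumed widths, a whole document's tokens can be kept in a single long[] rather than in one object per token. The fixed widths also bound what a record can express: a 30-bit offset field addresses at most 2^30 positions, and widening one field means taking bits from another.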
However, we acknowledge that the work presented in this paper is still in
progress. Our processing model also has some limitations worth
mentioning. First, because VTD uses 64-bit integers with fixed-size fields
to encode values such as offsets, documents that are very large (over
1 GB) or very deeply nested may require reassigning bits among the fields,
or even adding another 32 bits to each VTD record, to meet the actual
processing requirement. Second, the current implementation does not resolve entities
other than the five built-in ones (&amp;, &lt;, &gt;, &apos;, and &quot;). In
addition, our reference implementation supports neither DTD nor Schema
validation. Last, a Java array can hold at most 2^31 - 1 elements (about
2 GB for a byte array), which is also the largest document that the
processing model can currently handle. A possible workaround is to use
chunk-based byte buffers to overcome this limit.
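
The chunk-based workaround mentioned above could look roughly like the following Java sketch, which spreads the bytes over several smaller arrays and addresses them with a long index. The class ChunkedByteBuffer and its methods are hypothetical names introduced for illustration; this is one possible design, not the approach taken by the reference implementation.

```java
// Illustrative sketch only: a byte store addressed by a long index and backed
// by several fixed-size chunks, so the total size is not bound by the limit
// on a single Java array.
final class ChunkedByteBuffer {
    private final int chunkBits;   // log2 of the chunk size in bytes
    private final int chunkMask;   // chunk size minus one, for the in-chunk index
    private final byte[][] chunks;
    private final long size;

    ChunkedByteBuffer(long size, int chunkBits) {
        if (size < 0 || chunkBits < 1 || chunkBits > 30) {
            throw new IllegalArgumentException("invalid size or chunkBits");
        }
        this.size = size;
        this.chunkBits = chunkBits;
        this.chunkMask = (1 << chunkBits) - 1;
        int chunkSize = 1 << chunkBits;
        long chunkCount = (size + chunkSize - 1) >>> chunkBits;
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks");
        }
        chunks = new byte[(int) chunkCount][];
        long remaining = size;
        for (int i = 0; i < chunks.length; i++) {
            // The last chunk may be shorter than the others.
            chunks[i] = new byte[(int) Math.min(chunkSize, remaining)];
            remaining -= chunks[i].length;
        }
    }

    long size() { return size; }

    byte get(long index) {
        checkIndex(index);
        return chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)];
    }

    void put(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

Using a power-of-two chunk size keeps the index arithmetic to one shift and one mask per access. Note that offsets into such a buffer would need more than 31 bits, so this change would go hand in hand with widening the offset field of the token record.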