xmlwrapp
Lightweight C++ XML parsing library
Introduction

XML and C++

Even before the W3C published the first recommendation for the Extensible Markup Language (XML), it had a huge amount of hype surrounding it. Very quickly the terms XML and Java were practically synonymous. To this day it is hard to find any documentation about XML that does not include examples in Java.

The extreme hype around Java may be going down with the dotcom ship, but XML has maintained its market strength. This is true in almost all business circles. It is not uncommon to hear non-technical people throwing around the term XML even though they have no clue what it means. That is why it is so amazing that XML really does deliver what it promises and is a solid technology.

Programmers who are not wrapped up in the Java world have been slower to adopt XML. I attribute this to the lack of good XML parsers for other languages. You can see how the use of XML has exploded in languages such as Perl and Python after good parsers were made available. C and C++ programmers have not been so lucky.

That's not to say that there are no XML parsers available for C or C++. There are plenty of working parsers, and some, such as expat, have been around for a long time. The problem with parsers like expat is their limited functionality when you consider everything under the XML umbrella. XML has always been a moving target, and that is not likely to change any time soon.

There have been few parsers that really try to expose all the power that XML offers. A C++ example would be Xerces, the Apache Foundation XML parser. In its attempt to provide all of that functionality, it has become hard to comprehend. In addition, Xerces is designed to work with older, pre-standard C++ code. This means no modern C++ concepts such as iterators.

In the C world, one parser has quickly become the de facto standard: libxml2. This library alone can support SAX and DOM parsing, XPath, XInclude and many other features. Combined with its companion library, libxslt, it lets C programmers take advantage of the power of XSLT. The libxml2 parser suffers from the same comprehension problem that Xerces does. With over 1200 public functions, libxml2 can be a little difficult to get your mind around.

The xmlwrapp library takes all the power of libxml2 and provides a clean and modern C++ interface for it. Although xmlwrapp does not expose all the functionality of libxml2, it does provide C++ interfaces for the most useful API calls. The main goal for xmlwrapp is to give C++ programmers access to XML technologies without drowning them in a sea of classes or functions. It meets this goal, and it does so efficiently.

xmlwrapp is the library you would write for yourself if you had the time.

xmlwrapp Software License

xmlwrapp is released under a 3-clause Berkeley-style (BSD) license. You can find its exact text in the file named LICENSE in the top directory of the xmlwrapp distribution archive.

Intended Audience

This book is targeted at C++ programmers who love using the C++ standard library containers. I am not going to explain what iterators are, or how to use them. If you understand how to work with standard containers like vectors you will have no problems using xmlwrapp.

In addition, this book assumes that you understand the technologies around XML. I will cover some topics that relate to XML, but for the most part you should already understand XML basics such as elements, attributes and namespaces. Even if you have little exposure to these technologies, you might want to read this book anyway. Keep your favorite XML reference book handy just in case.

Terminology

If you have worked with XML parsers such as libxml2 in the past, you probably have a good understanding of the terminology involved. This section is just a quick review so that we are all on the same page.

XML Nodes

Almost everything in an XML document is made out of XML nodes. Elements, processing instructions, comments and even attributes are examples of XML nodes. It is important to remember that XML nodes are not just the elements inside the XML document.

It is also important to remember that everything in the document will be preserved, including whitespace. That's right, the newlines and tabs that are between the elements will be placed into XML nodes!
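
To make the point about whitespace concrete, here is a minimal sketch in standard C++. It does not use xmlwrapp or libxml2 at all. The piece type and both functions are hypothetical names invented for this illustration, and the splitter assumes very simple input (no comments, CDATA sections, or '>' characters inside attribute values). It only shows that the character data between elements, including runs of newlines and tabs, is content in its own right.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy type, not part of xmlwrapp: one piece of the input,
// either markup ("<...>") or a run of character data between markup.
struct piece {
    bool is_text;
    std::string content;
};

// Split a small XML fragment into markup pieces and text pieces.
// Assumption: simple input only (no comments, CDATA, or '>' in attributes).
std::vector<piece> split_pieces(const std::string &xml)
{
    std::vector<piece> out;
    std::string::size_type pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            // Markup runs up to and including the next '>'.
            std::string::size_type end = xml.find('>', pos);
            if (end == std::string::npos) end = xml.size() - 1;
            out.push_back({false, xml.substr(pos, end - pos + 1)});
            pos = end + 1;
        } else {
            // Character data runs up to the next '<' (or end of input).
            std::string::size_type end = xml.find('<', pos);
            if (end == std::string::npos) end = xml.size();
            out.push_back({true, xml.substr(pos, end - pos)});
            pos = end;
        }
    }
    return out;
}

// Count the text pieces that contain nothing but whitespace.
std::size_t count_whitespace_only(const std::vector<piece> &pieces)
{
    std::size_t n = 0;
    for (const piece &p : pieces) {
        if (p.is_text &&
            p.content.find_first_not_of(" \t\r\n") == std::string::npos)
            ++n;
    }
    return n;
}
```

Feeding this a root element that holds two child elements, each on its own indented line, yields more pieces than elements: the indentation between the tags shows up as separate whitespace-only text pieces. A real parser represents those runs as text nodes in much the same spirit.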

Parsing

The act of parsing an XML document involves translating the XML text document into something that you can work with in C++. As you will learn a bit later, this can be done completely in the parser or using a class you provide.
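
The "class you provide" style can be sketched in plain standard C++ as follows. This is only an illustration of the callback idea under simplifying assumptions (no attributes, comments, or empty-element tags). The handler interface, the scan function, and the name_collector class are hypothetical names made up for this sketch and are not xmlwrapp's actual classes.

```cpp
#include <string>
#include <vector>

// Hypothetical callback interface for this sketch only.
struct handler {
    virtual ~handler() {}
    virtual void start_element(const std::string &name) = 0;
    virtual void end_element(const std::string &name) = 0;
    virtual void text(const std::string &content) = 0;
};

// Toy scanner: walks the input once and reports each piece to the handler.
// Assumption: tags carry no attributes and there are no comments or CDATA.
void scan(const std::string &xml, handler &h)
{
    std::string::size_type pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::string::size_type end = xml.find('>', pos);
            if (end == std::string::npos) return; // unterminated markup: stop
            std::string tag = xml.substr(pos + 1, end - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                h.end_element(tag.substr(1));
            else if (!tag.empty())
                h.start_element(tag);
            pos = end + 1;
        } else {
            std::string::size_type end = xml.find('<', pos);
            if (end == std::string::npos) end = xml.size();
            h.text(xml.substr(pos, end - pos));
            pos = end;
        }
    }
}

// A class the caller provides: it records element names in document order
// and ignores everything else.
struct name_collector : handler {
    std::vector<std::string> names;
    void start_element(const std::string &name) override { names.push_back(name); }
    void end_element(const std::string &) override {}
    void text(const std::string &) override {}
};
```

The scanner never builds a document in memory. It hands each piece to your class as it goes, and your class decides what to keep. The other style does the opposite: the parser builds the whole document first and you walk it afterwards.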

Well Formed XML and Valid XML

An XML document is "Well Formed" when there are no syntax errors such as forgetting to close an XML element. No matter what parsing style you choose, an XML document that is not well formed will be rejected. Therefore, the document must at least be well formed if you want to work with it using xmlwrapp.
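
As a rough illustration of the "forgot to close an element" kind of error, the following standard C++ sketch checks only that start and end tags are balanced and properly nested. It is not how xmlwrapp or libxml2 check documents, and it covers just one small part of well-formedness (it ignores comments, CDATA sections, processing instructions, the single-root rule, and '>' inside attribute values). The function name tags_balanced is a hypothetical name chosen for this example.

```cpp
#include <string>
#include <vector>

// Return true when every start tag has a matching, properly nested end tag.
// Assumption: simple input only; see the limitations listed in the text.
bool tags_balanced(const std::string &xml)
{
    const char *ws = " \t\r\n";
    std::vector<std::string> open; // names of elements not yet closed
    std::string::size_type pos = 0;

    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::string::size_type end = xml.find('>', pos);
        if (end == std::string::npos) return false; // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;

        if (tag.empty()) return false; // "<>" is never legal

        if (tag[0] == '/') {
            // End tag: must match the most recently opened element.
            std::string name = tag.substr(1, tag.find_first_of(ws) - 1);
            if (open.empty() || open.back() != name) return false;
            open.pop_back();
        } else if (tag[tag.size() - 1] == '/') {
            // Empty-element tag such as <b/>: opens and closes at once.
        } else {
            // Start tag: the name ends at the first whitespace character.
            open.push_back(tag.substr(0, tag.find_first_of(ws)));
        }
    }
    return open.empty(); // anything left open was never closed
}
```

A fragment in which an inner element is opened but the outer element's end tag arrives first fails this check, which is the situation described above. A real parser rejects such input for the same underlying reason, along with many other syntax errors this sketch does not look for.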

In addition to being well formed, an XML document is said to be Valid when it is correctly formatted according to either a Document Type Definition (DTD) or an XML Schema. Both of these documents tell the parser where elements and attributes should be placed relative to one another.

It is important to understand that a well formed document is not necessarily valid. A valid XML document is, of course, well formed.

What This Book Covers

This book is a guide to working with the xmlwrapp library. It will not cover all the functionality of the library, but instead give you everything that you need to get started. This includes the general layout of the API and classes.

After you have read this book, you should be able to read the xmlwrapp header files to understand the complete interface. You should also know which header file to look in for each class.

Don't be afraid to crack open the xmlwrapp header files in your favorite text editor. They are very clean and contain extensive documentation about each of the member functions.