xmlwrapp
Lightweight C++ XML parsing library
As I mentioned earlier, almost everything in an XML document is a node. The first thing that you probably think of is the XML element, but XML elements are just one type of XML node.
Processing instructions, XML comments and even the whitespace around XML elements are nodes. When working with the xml::node class, you are going to want to know what type of node you have. The xml::node::get_type() member function is what you should use.
The xmlwrapp/node.h header file defines the node types as an enum. Using that enum and the xml::node::get_type() member function, you can discover what a node is holding.
All xmlwrapp nodes have name data, but it might not be what you expect. The definition for a node's name is different for each type of node. For example, an element node's name is the tag name, but the name for a text node is a string constant, "text".
The xml::node::get_name() function returns the name of the node. You probably guessed that xml::node::set_name() will set the name of a node. The following table lists the most common node types and what they store in their name data.
node type | get_name() return value |
---|---|
element node (type_element) | the name of the tag, for example, "root" for "<root/>" |
text node (type_text) | the string constant "text" |
CDATA node (type_cdata) | NULL (zero) pointer |
processing instruction node (type_pi) | the processing instruction application name |
comment node (type_comment) | the string constant "comment" |
Very much like a node's name data, its contents vary depending on what type of node it is. For text nodes and CDATA nodes, the node's contents are the text data inside the node or CDATA block. The xml::node::get_content() function is used to get the node's content.
You should be aware of some magic that this function does. If you call it on an element node, which has no content by the way, it will try to return the content of its child text nodes, if it has any. An example should clarify things:
Here, the entry node does not contain any contents because it is an element node. It does, however, contain a child node that is a text node. Calling the xml::node::get_content() function on that child text node should return "Pick up a Sun Enterprise 10000 while you are out shopping.". What you might not expect is that calling the xml::node::get_content() function on the entry node returns the same string! Very cool if you ask me.
To be complete, here is a table that explains what the xml::node::get_content() function returns for the common node types:
node type | get_content() return value |
---|---|
element node (type_element) | the contents of its child nodes |
text node (type_text) | the text inside the node |
CDATA node (type_cdata) | the text inside the CDATA block |
processing instruction node (type_pi) | the processing instruction data |
comment node (type_comment) | the comment text |
The function for setting the node's content is xml::node::set_content(). This function performs the same magic that the get_content() function does. That is, if you call it on an element node, it will remove all of the node's children and replace them with a text node.
From what we have seen in the past few sections, XML nodes can have children. Almost all node types can have children. In order to make effective use of a node tree, you are going to want to access the children of a node. In xmlwrapp, this is done with iterators.
Using iterators you can walk the node tree, add nodes, remove nodes and even replace nodes. xmlwrapp iterators are just as useful as the standard library iterators. Please notice, however, that all iterators provided by the library are forward iterators only, not random access or even bidirectional. That is, they can be used to iterate over a collection of nodes, but not to access arbitrary nodes directly or to iterate in the reverse direction.
Just like the standard containers, the xml::node class has xml::node::begin() and xml::node::end() member functions that return either a xml::node::iterator or a xml::node::const_iterator. They are very useful for working with the children of a xml::node, which are xml::nodes themselves.
Example:
You can use the xml::node::find() member function to find an element node by its name. There are four different versions of the xml::node::find() function. All of them return a xml::node::iterator or a xml::node::const_iterator that either points to the found element node, or is equal to the iterator that xml::node::end() returns.
Example:
If you need to do something with all child element nodes, you can use the xml::node::elements() method to obtain a view of child nodes. The view, implemented by the xml::nodes_view and xml::const_nodes_view classes, behaves like a standard container in that it lets you iterate over the nodes in the usual way. The difference between the iterator returned by xml::nodes_view::begin() and the one from xml::node::begin() is that the latter iterates over all child nodes, whereas the former iterates only over selected elements.
The usage is similar to iterating over all child nodes:
Similarly, you can iterate over all child elements with a given name using the xml::node::elements(const char*) method:
There are two ways of adding a child to a xml::node. Which one you choose depends mostly on your style. The two member functions that allow you to do this are xml::node::push_back() and xml::node::insert().
The xml::node::push_back() member function takes a const reference to a xml::node. It will copy that node and then insert it as the last child.
Example:
The xml::node::push_back() function is just like the push_back() function for the standard library containers. You should be aware that it inserts a copy of the given node, and not the node itself.
In the example above, if you modify the child node after it has been inserted into the parent node, those modifications will not be reflected in the parent node's copy of child.
See the next section for a better way of adding child nodes when you want to modify them after adding them to another node.
The xml::node::insert() function is a lot more flexible compared to the xml::node::push_back() function. For starters, you can insert a node anywhere in the list of children. This means you can add the node to the back or even the front of the child list.
More importantly, xml::node::insert() returns a xml::node::iterator that points to the copy of the node that was inserted. This means that you can modify the new child after it has been inserted into the parent.
Example:
Using iterators, it is possible to remove children and to replace them with other nodes. To remove a child node, you first need a xml::node::iterator that points to it, then you can call xml::node::erase().
The xml::node::erase() function will return a xml::node::iterator that points to the node after the one being removed. This should help you if you are iterating over all the children and want to remove one without losing your place.
To replace a node for which you have an iterator, you can use the xml::node::replace() function. It will remove and clean up the old node, and insert the new node in its place.
In addition to possibly having children, an element node may have attributes. In xmlwrapp, these attributes are stored in a xml::attributes class. You can use the xml::node::get_attributes() function to get a reference to a node's xml::attributes object.