XML

Introduction

The core of this library is a validating XML parser with full DTD processing. Built on top of it are an API for manipulating XML data in a DOM-like fashion and a serialization API. As a bonus there is also an XPath implementation, although it is limited to XPath 1.0.

The DOM API

libzeep takes a modern C++ approach to accessing and manipulating data. The following code gives an idea:

#include <iostream>
#include <string>

#include <zeep/xml/document.hpp>

int main()
{
    using namespace zeep::xml::literals; 

    /* Construct an XML document in memory using a string literal */
    auto doc = 
        R"(<persons>
            <person id="1">
                <firstname>John</firstname>
                <lastname>Doe</lastname>
            </person>
            <person id="2">
                <firstname>Jane</firstname>
                <lastname>Jones</lastname>
            </person>
        </persons>)"_xml;

    /* Iterate over an XPath result set */
    for (auto& person: doc.find("//person")) 
    {
        std::string firstname, lastname;

        /* Iterate over the element nodes inside the person element */
        for (auto& name: *person)
        {
            if (name.name() == "firstname") firstname = name.str();
            if (name.name() == "lastname") lastname = name.str();
        }

        std::cout << person->get_attribute("id") << ": " << lastname << ", " << firstname << std::endl;
    }

    return 0;
}

XML nodes

The class zeep::xml::node is the base class for all classes in the DOM API. The class is not copy constructible, and subclasses use move semantics to offer a simple API while remaining efficient in memory use and performance. Nodes can have siblings and a parent but no children.

The class zeep::xml::element is the main class. It implements a full XML node with child nodes and attributes. Both the children and the attributes are stored as linked lists.

The class zeep::xml::text contains the text between XML elements. A zeep::xml::cdata class is derived from zeep::xml::text and other possible child nodes for an XML element are zeep::xml::processing_instruction and zeep::xml::comment.

XML elements also contain attributes, stored in the zeep::xml::attribute class. Namespace information is stored in these attributes as well. Attributes support structured binding, so the following works:

zeep::xml::attribute a("x", "1");
auto& [name, value] = a; // name == "x", value == "1"
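
For readers wondering how a class can take part in structured bindings at all, the following is a minimal sketch of the tuple protocol that C++17 looks for. The class `sketch_attribute` and the helper `describe` are hypothetical names made up for this illustration; this is not libzeep's implementation, which may well be organised differently.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an attribute class; not libzeep's implementation.
class sketch_attribute
{
  public:
    sketch_attribute(std::string name, std::string value)
        : m_name(std::move(name)), m_value(std::move(value)) {}

    // Index 0 yields the name, index 1 yields the value.
    template <std::size_t I>
    const std::string& get() const
    {
        static_assert(I < 2, "an attribute has exactly two parts");
        if constexpr (I == 0)
            return m_name;
        else
            return m_value;
    }

  private:
    std::string m_name, m_value;
};

// The tuple protocol that structured bindings look for.
namespace std
{
template <>
struct tuple_size<sketch_attribute> : integral_constant<size_t, 2> {};

template <size_t I>
struct tuple_element<I, sketch_attribute> { using type = const string; };
} // namespace std

// Usage: bind both parts and join them as name=value.
inline std::string describe(const sketch_attribute& a)
{
    const auto& [name, value] = a;
    return name + "=" + value;
}
```

The sketch only exposes read-only access; a class that also wants writable bindings would add a non-const `get` overload.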

Input and output

The class zeep::xml::document derives from zeep::xml::element and can load from and write to files.

streaming I/O

You can use std::iostream to read and write zeep::xml::document objects. Reading is as simple as:

zeep::xml::document doc;
std::cin >> doc;

Writing is just as simple. Be warned, though, that round-trip fidelity is not guaranteed, for a few reasons. First, the default is to replace CDATA sections in a file with their content. If this is not the desired behaviour, call zeep::xml::document::set_preserve_cdata() with the argument true.

Another issue is that text nodes containing only white space are present in documents read from disk while these are absent by default in documents created on the fly. When writing out XML using iostream you can specify to wrap and indent a document. But if the document was read in, the result will have extraneous spacing.

Indentation is specified like this:

std::cout << std::setw(2) << doc;

That will indent with two spaces for each level.
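
To illustrate how a stream inserter can pick up the value passed to std::setw and treat it as an indent size, here is a small self-contained sketch. The type `mini_element` and the functions `write_element` and `render` are hypothetical and exist only for this example; the way libzeep itself reads the width is assumed here, not taken from its source.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature element type; not libzeep's zeep::xml::element.
struct mini_element
{
    std::string name;
    std::string text;                    // used only when there are no children
    std::vector<mini_element> children;
};

// Write `e` at nesting depth `level`, using `indent` spaces per level.
// With indent == 0 everything ends up on a single line.
// No escaping of special characters is done in this sketch.
inline void write_element(std::ostream& os, const mini_element& e, std::size_t indent, std::size_t level)
{
    const std::string pad(indent * level, ' ');
    const char* eol = indent > 0 ? "\n" : "";

    os << pad << '<' << e.name << '>';
    if (e.children.empty())
        os << e.text;
    else
    {
        os << eol;
        for (const auto& child : e.children)
            write_element(os, child, indent, level + 1);
        os << pad;
    }
    os << "</" << e.name << '>' << eol;
}

// The inserter reads the field width set by std::setw and uses it as the indent size.
inline std::ostream& operator<<(std::ostream& os, const mini_element& e)
{
    const auto indent = static_cast<std::size_t>(os.width());
    os.width(0); // reset, so the width does not pad the first string written
    write_element(os, e, indent, 0);
    return os;
}

// Usage: render a small tree with the requested indent.
inline std::string render(const mini_element& e, int indent)
{
    std::ostringstream os;
    os << std::setw(indent) << e;
    return os.str();
}
```

Because the field width is reset after every formatted output operation, the indent has to be given again for each document written.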

validation

By default, reading a document does not validate the XML against its DTD. If you do want to validate and process the DTD, you have to specify where to find this DTD and other external entities. You can either use zeep::xml::document::set_base_dir() or specify an entity loader using zeep::xml::document::set_entity_loader().

As an example, take the following DTD file:

<!ELEMENT foo (bar)>
<!ELEMENT bar (#PCDATA)>
<!ENTITY hello "Hello, world!">

And an XML document containing:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo SYSTEM "sample.dtd">
<foo>
<bar>&hello;</bar>
</foo>

When we want to see the &hello; entity replaced with ‘Hello, world!’ as specified in the DTD, we need to provide a way to load this DTD. The following code does this with an entity loader. Of course, in this example a simple call to zeep::xml::document::set_base_dir() would have been sufficient.

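    /* `fs` is assumed to be an alias for std::filesystem (requires <filesystem>) */
    namespace fs = std::filesystem;

    /* Define an entity loader function */
    auto loader = []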
        (const std::string& base, const std::string& pubid, const std::string& sysid) -> std::istream*
    {
        if (base == "." and pubid.empty() and fs::exists(sysid))
            return new std::ifstream(sysid);
        
        throw std::invalid_argument("Invalid arguments passed in loader");
    };

    /* Create document and set the entity loader */
    zeep::xml::document doc;
    doc.set_entity_loader(loader);

    /* Read a file */
    std::ifstream is("sample.xml");
    is >> doc;

    using namespace zeep::xml::literals;

    /* Compare the doc with an in-memory constructed document, note that spaces are ignored */
    if (doc == R"(<foo><bar>Hello, world!</bar></foo>)"_xml)
        std::cout << "ok" << std::endl;
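
A loader does not have to read from disk. The sketch below uses the same parameter list and return type as the loader above, but serves the DTD from an in-memory table, which can be handy in tests. The names `known_entities` and `memory_loader` are hypothetical, and the assumption that the caller takes ownership of the returned stream is inferred from the `new std::ifstream` in the example above.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical table of known external entities, keyed by system identifier.
inline const std::map<std::string, std::string>& known_entities()
{
    static const std::map<std::string, std::string> table{
        { "sample.dtd",
          "<!ELEMENT foo (bar)>\n"
          "<!ELEMENT bar (#PCDATA)>\n"
          "<!ENTITY hello \"Hello, world!\">\n" }
    };
    return table;
}

// Same parameter list and return type as the loader in the example above.
// The caller is assumed to take ownership of the returned stream.
inline std::istream* memory_loader(const std::string& /*base*/, const std::string& /*pubid*/,
    const std::string& sysid)
{
    const auto it = known_entities().find(sysid);
    if (it == known_entities().end())
        throw std::invalid_argument("Unknown system identifier: " + sysid);

    return new std::istringstream(it->second);
}
```

Assuming set_entity_loader() accepts any callable with this signature, as the lambda above suggests, `memory_loader` could be passed to it in place of `loader`.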

Serialization

An alternative way to read and write XML files is serialization. To do this, we first define a struct called Person. We add a templated serialize function to this struct, as in Boost.Serialization, and then we can read the file.

struct Person
{
    std::string firstname;
    std::string lastname;

    /* A struct we want to serialize needs a `serialize` method */
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar & zeep::make_nvp("firstname", firstname)
           & zeep::make_nvp("lastname", lastname);
    }
};

int main()
{
    /* Read in a text document containing XML and parse it into a document object */
    std::ifstream file("test.xml");
    zeep::xml::document doc(file);
    
    std::vector<Person> persons;
    /* Deserialize all persons into an array */
    doc.deserialize("persons", persons);

    doc.clear();

    /* Serialize all persons back into an XML document again */
    doc.serialize("persons", persons);

    return 0;
}

attributes

To serialize a value into an XML attribute instead of a child element, replace zeep::make_nvp with zeep::make_attribute_nvp.
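
To make the difference between element and attribute name-value pairs concrete, here is a self-contained sketch of the archive pattern with a toy writer. Everything in it (`sketch_nvp`, `element_nvp`, `attribute_nvp`, `sketch_writer`, `sketch_person`, `to_xml`) is hypothetical and only mimics the shape of the API; it is not how libzeep implements its archives.

```cpp
#include <sstream>
#include <string>

// All names in this block are hypothetical; they only mimic the shape of the API.
template <typename T>
struct sketch_nvp
{
    const char* name;
    T& value;
    bool as_attribute;
};

template <typename T>
sketch_nvp<T> element_nvp(const char* name, T& value) { return { name, value, false }; }

template <typename T>
sketch_nvp<T> attribute_nvp(const char* name, T& value) { return { name, value, true }; }

// A write-only archive that collects attributes and child elements separately.
class sketch_writer
{
  public:
    template <typename T>
    sketch_writer& operator&(const sketch_nvp<T>& nvp)
    {
        if (nvp.as_attribute)
            m_attributes << ' ' << nvp.name << "=\"" << nvp.value << '"';
        else
            m_children << '<' << nvp.name << '>' << nvp.value << "</" << nvp.name << '>';
        return *this;
    }

    // Assemble the final element; no escaping is done in this sketch.
    std::string str(const std::string& tag) const
    {
        return '<' + tag + m_attributes.str() + '>' + m_children.str() + "</" + tag + '>';
    }

  private:
    std::ostringstream m_attributes, m_children;
};

struct sketch_person
{
    int id;
    std::string firstname, lastname;

    // Same shape as the serialize method shown earlier: one function describes the layout.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/)
    {
        ar & attribute_nvp("id", id)
           & element_nvp("firstname", firstname)
           & element_nvp("lastname", lastname);
    }
};

// Usage: run the serialize method against the toy writer.
inline std::string to_xml(sketch_person p)
{
    sketch_writer ar;
    p.serialize(ar, 0);
    return ar.str("person");
}
```

The point of the pattern is that the same serialize method could be driven by a reading archive as well, so the layout of the struct is described only once.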

custom types

During serialization, structured data types are deconstructed into parts that can be converted into text strings. This final conversion is done by zeep::value_serializer helper classes. zeep::value_serializer is a template, and specializations for the default types are given in <zeep/value_serializer.hpp>. You can create your own specializations of this class for custom data types; look at the one for std::chrono::system_clock::time_point for inspiration.
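
As a rough idea of what such a specialization involves, the sketch below converts a custom type to and from text. It uses a hypothetical template `sketch_value_serializer` and a hypothetical type `rgb_colour`, defined here so the example stands on its own. The member names `to_string` and `from_string` are an assumption about the general shape; check <zeep/value_serializer.hpp> for the interface libzeep actually expects.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical primary template, declared here so the sketch is self-contained.
// libzeep's own class lives in <zeep/value_serializer.hpp> and may differ.
template <typename T>
struct sketch_value_serializer;

// Hypothetical custom type: a colour written as #rrggbb.
struct rgb_colour
{
    unsigned r, g, b;
};

template <>
struct sketch_value_serializer<rgb_colour>
{
    // Value to text, two lowercase hex digits per channel.
    static std::string to_string(const rgb_colour& c)
    {
        char buffer[8];
        std::snprintf(buffer, sizeof(buffer), "#%02x%02x%02x", c.r & 0xffu, c.g & 0xffu, c.b & 0xffu);
        return buffer;
    }

    // Text to value; throws when the text does not look like #rrggbb.
    static rgb_colour from_string(const std::string& s)
    {
        rgb_colour c{};
        if (s.size() != 7 || std::sscanf(s.c_str(), "#%2x%2x%2x", &c.r, &c.g, &c.b) != 3)
            throw std::invalid_argument("not a colour: " + s);
        return c;
    }
};
```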

enums

For conversion of enums you can use the zeep::value_serializer specialization for enums:

enum class MyEnum { FOO, BAR };
zeep::value_serializer<MyEnum>::instance()
    ("foo", MyEnum::FOO)
    ("bar", MyEnum::BAR);

There’s also a new interface, somewhat more intuitive from a modern C++ viewpoint:

enum class MyEnum { aap, noot, mies };
zeep::value_serializer<MyEnum>::init("MyEnum",
{
    { MyEnum::aap, "aap" },
    { MyEnum::noot, "noot" },
    { MyEnum::mies, "mies" }
});

/* `json` is assumed to be an alias for libzeep's JSON element type */
json j{ MyEnum::aap };
assert(j.as<std::string>() == "aap");
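
Conceptually, an enum serializer of this kind is a lookup table that works in both directions. The following sketch shows that idea with a hypothetical helper class `enum_names` and accessor `my_enum_names`; it is an illustration under that assumption, not libzeep's value_serializer.

```cpp
#include <initializer_list>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class MyEnum { aap, noot, mies };

// Hypothetical helper for illustration; this is not libzeep's value_serializer.
template <typename E>
class enum_names
{
  public:
    enum_names(std::initializer_list<std::pair<E, std::string>> names)
        : m_names(names) {}

    // Enum value to text; throws if the value was never registered.
    const std::string& to_string(E value) const
    {
        for (const auto& [v, name] : m_names)
            if (v == value)
                return name;
        throw std::invalid_argument("unregistered enum value");
    }

    // Text to enum value; throws on an unknown name.
    E from_string(const std::string& text) const
    {
        for (const auto& [v, name] : m_names)
            if (name == text)
                return v;
        throw std::invalid_argument("unknown enum name: " + text);
    }

  private:
    std::vector<std::pair<E, std::string>> m_names;
};

// One shared table per enum type, built on first use.
inline const enum_names<MyEnum>& my_enum_names()
{
    static const enum_names<MyEnum> names{
        { MyEnum::aap, "aap" },
        { MyEnum::noot, "noot" },
        { MyEnum::mies, "mies" }
    };
    return names;
}
```

A linear search is fine for the handful of values a typical enum has; a larger table might use a map instead.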

XPath 1.0

libzeep comes with an [XPath 1.0](http://www.w3.org/TR/xpath/) implementation, which you can use to locate elements in a DOM tree easily. For a complete description of XPath, read the specification at http://www.w3.org/TR/xpath/ or the tutorial at https://www.w3schools.com/xml/xpath_intro.asp.

In libzeep you call find() on a zeep::xml::element object. It returns a zeep::xml::element_set, which is a std::list of pointers to the zeep::xml::element objects that match the XPath expression passed to find(). The alternative method find_first() returns only the first matching element.

An example where we look for the first person in our test file with the lastname Jones:

zeep::xml::element* jones = doc.child()->find_first("//person[lastname='Jones']");

variables

XPath expressions can reference variables. As an example, suppose you need to find nodes in a specific XML namespace but do not want to look up which prefix is bound to that namespace. You could do something like this:

int main()
{
    using namespace zeep::xml::literals;

    auto doc = R"(<bar xmlns:z="https://www.hekkelman.com/libzeep">
        <z:foo>foei</z:foo>
    </bar>)"_xml;

    /* Create an xpath context and store our variable */
    zeep::xml::context ctx;
    ctx.set("ns", "https://www.hekkelman.com/libzeep");

    /* Create an xpath object with the specified XPath using the variable `ns` */
    auto xp = zeep::xml::xpath("//*[namespace-uri() = $ns]");

    /* Iterate over the result of the evaluation of this XPath, the result will consist of zeep::xml::element object pointers */
    for (auto n: xp.evaluate<zeep::xml::element>(doc, ctx))
        std::cout << n->str() << std::endl;

    return 0;
}

Note

Please note that the evaluation of an XPath returns pointers to XML nodes. These are only valid as long as you do not modify the document in which they are contained.