zenmumbler

Writing a Modern C++ JSON Reader (part 1)

July 15, 2013

(This is part 1 of N in a series about writing a modern C++ JSON engine, the next post will be available soon.)

Does the world need another JSON parser? Perhaps not, but I made one anyway. In this series of posts I'll summarize how I did it and what I've learned.

The nice thing about a JSON engine, especially one written in a non-scripting language, is that it presents a couple of nice programming problems on a small enough scale that you can experiment with the implementation quickly:

For my implementation, named krystal, I chose to focus on the points below. I will elaborate on each in the coming posts.

Motivation

I was looking at writing a simple static blog generator and first made a version that is serverless, but which generates the site largely on the client in JavaScript. This is the site you're looking at now.

Moving things to a C++ generator would require me to work with JSON data that stores e.g. site generation settings and post metadata. I needed a C++ JSON DOM parser. It wasn't long until I found the json_benchmark project that compared a couple of the popular JSON engines on parsing speed.

rapidjson is by far the fastest of them all and to get that speed it pulls no punches: it creates objects in place inside the json text, all objects are custom and use a memory pool allocator, it has SSE3 and 4 optimized white-space skipping functions, etc.

Now, I like performance, I wrote a NES emulator in PowerPC assembly. I am down with low-level stuff. Right now though, I'm trying to learn me some C++.

I came back to C++ recently and I for now I like to keep things as "pure" as possible as to not let old coding styles and habits take hold in my new projects. There's a lot of "C+" code out there, C and C++ concepts mixed together with a ton of MACROS thrown in for good measure. That's not for me, so my goal became clear:

To make a JSON parser that gets as close as possible to rapidjson speed-wise while only using standard library storage and string classes. I required C++11 to use the things like std::unordered_map, lambdas, move semantics and other new goodies.

The Result

I will start at the end and show how you might use krystal. This will set the stage and hopefully make things clearer later on.

auto file = std::ifstream{"game_data.json"};
auto doc = krystal::parse_stream(file);

if (! doc.is_null()) {
    auto delay = doc["levels"][0]["zombie spawn delay"].number();
    auto name = doc["levels"][0]["level name"].string();

    for (auto kv : doc["levels"]) {
        auto level_index = kv.first.number_as<int>();
        auto& level_data = kv.second;
        ...
    }
}

parse_stream returns a document which is the root of the tree of values of the JSON data. Each value is a variant type. JSON supports 7 types of values: null, true, false, bool, number, string, array and object. The first 5 are pure values and the last 2 are containers. arrays map ints to values. objects map strings to values.

krystal allows direct indexing into arrays and objects. It also allows iteration over the contents of arrays and objects. Because krystal exposes begin and end methods for values, you can use C++11 ranged for loops. For consistency, each iteration returns a std::pair<value,value&>. The first value is the key (a number or a string) and second is a ref to the actual value.

Each value has value kind1 tests and data extraction methods. Calling the wrong data method will throw an exception.

// variant kind tests
value_kind type() const;
bool is_a(const value_kind type) const;
bool is_null() const;
bool is_false() const;
bool is_true() const;
bool is_bool() const;
bool is_number() const;
bool is_string() const;
bool is_array() const;
bool is_object() const;
bool is_container() const { return is_object() || is_array(); }

// data extractors
bool boolean();
double number();
template <typename T> T number_as() const;
std::string string();

Finally, there are methods that deal with containers. Again, calling container or array/object specific methods on values of a different kind will yield little but exceptions.

size_t size() const; // number of items in container
bool contains(const std::string& key) const; // object key check

const value& operator[](const std::string& key) const; // objects
const value& operator[](const size_t index) const; // arrays
iterator begin() const; // all iterators are const_iterators
iterator end() const;

By only having a std::string key param and no const char* alternative I avoid the gotcha present in rapidjson where the zeroth array item has to be indexed like val[size_t{0}] to avoid ambiguities with NULL. Type Win!

And on that note I will end this first part of this article. Next time I will cover the how and why of the evolution of the class design. In the mean time please have a look at the project on github and, if you have Clang 3.2 or 3.3, try it out for yourself.

  1. I first named it value_type but changed this after I moved to templated code for reasons you may understand.

Reply @zenmumbler to discuss this article.