Lean object encoding alternative to JSON

JSON is Great, but is it Great for Everything?

JavaScript Object Notation (JSON) has exploded in popularity in recent years, and for good reason. It is simpler and less verbose than XML, and it is well supported out of the box on the majority of modern software platforms. Native support and ease of use have made it the go-to format for API data exchange, small-scale data storage, and configuration files, to name a few. With that in mind, though, we need to remember that something built for one job is not necessarily ideal for every other job.

JSON is extremely flexible and versatile. A developer can define a variety of object property types and names, and the syntax is simple, straightforward, and easily read by people and machines.

{
   "id": 1,
   "abbreviation": "appl",
   "name": "Apple",
   "types": ["Macintosh", "Golden Delicious", "Fuji", "Gala"]
}

This syntax is clean, simple, lean, and easy to read. These are the reasons JSON has become so popular, and they make it a great choice for encoding code/data objects.

Seems Like You're Pro-JSON

So why consider reinventing the wheel? This wheel works great, you said so yourself! Let's go back to the earlier point about other uses for JSON, such as a data exchange format for APIs. APIs often deal with multiple objects at one time, especially when updating a client with a list of information. Let's see what a list of fruits looks like.

[
   {"id": 1, "abbreviation": "appl", "name": "Apple"},
   {"id": 2, "abbreviation": "pear", "name": "Pear"},
   {"id": 3, "abbreviation": "bana", "name": "Banana"},
   {"id": 4, "abbreviation": "bkby", "name": "Blackberry"},
   {"id": 5, "abbreviation": "strw", "name": "Strawberry"},
   {"id": 6, "abbreviation": "pech", "name": "Peach"},
   {"id": 7, "abbreviation": "plum", "name": "Plum"}
]

Notice the problem? All the objects in the array have the same set of properties, and yet the property names are repeated for each object; in the final result the property names take up more space in the data set than the data itself. If you're wondering how common a set of data like this is, just try doing a select from a database table and encode the results.
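
To put a rough number on that claim, here is a small Python sketch that counts how many characters of the encoded list go to property names versus values. The function name `key_overhead` is made up for this illustration, and the count ignores brackets, colons, commas, and whitespace, so treat it as a ballpark rather than an exact measurement.

```python
import json

fruits = [
    {"id": 1, "abbreviation": "appl", "name": "Apple"},
    {"id": 2, "abbreviation": "pear", "name": "Pear"},
    {"id": 3, "abbreviation": "bana", "name": "Banana"},
    {"id": 4, "abbreviation": "bkby", "name": "Blackberry"},
    {"id": 5, "abbreviation": "strw", "name": "Strawberry"},
    {"id": 6, "abbreviation": "pech", "name": "Peach"},
    {"id": 7, "abbreviation": "plum", "name": "Plum"},
]


def key_overhead(records):
    """Return (key_chars, value_chars) for a list of flat dicts.

    Each key and value is measured as JSON would write it, quotes included.
    Structural characters (braces, colons, commas) are not counted.
    """
    key_chars = sum(len(json.dumps(k)) for r in records for k in r)
    value_chars = sum(len(json.dumps(v)) for r in records for v in r.values())
    return key_chars, value_chars


key_chars, value_chars = key_overhead(fruits)
print(f"property names: {key_chars} chars, values: {value_chars} chars")
```

For this list the property names come out well ahead of the values, and the gap only widens as more rows are added.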

Do I Care?

But hold on, wait; is this actually a problem? What's the big deal about a few more bytes? The answer is that it depends. Many people would say that a high-speed internet connection on a mobile device or desktop can easily absorb the extra data with a negligible drop in performance. Data storage keeps improving in size and speed as well, so inefficient storage is less of a worry all the time.

So does this matter? Well, high-speed internet does not mean you have good service. At busy festivals and events in locations that don't normally see a high density of mobile devices, phones will often show full bars but have limited practical connectivity. Rural and under-served urban areas also suffer peaks of traffic that limit throughput.

This debate could go on and on, with arguments from all corners being technically right, so the answer again is that it depends. Does your application need a leaner method of encoding data that can reduce bandwidth and storage requirements? If it does, then regular JSON has some shortcomings.

What's the Solution?

Introducing Lean Object Encoding Notation (LOEN, pronounced "loan"), a form of encoding notation based on JSON. LOEN takes the next steps in the path set out by JSON, and focuses on removing anything that isn't needed. Let's start with that same fruit example.

{
   id +1
   abbreviation :appl
   name :Apple
   types [:Macintosh :"Golden Delicious" :Fuji :Gala]
}

Kind of similar so far, really; it follows the same flow as JSON and is still human readable. The biggest difference between JSON and LOEN is the absence of double quotes around property names and around strings that don't require them; double quotes are now used only for strings and property names that might otherwise break the encoding. To differentiate between value types such as integers, floats, and strings, a variety of value prefixes are used instead of a colon every time. Commas are still used to separate properties, but they may be omitted if, as in the above example, whitespace is present.
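
For a feel of how those rules might translate into code, here is a minimal Python sketch of an encoder that covers only what the example above shows: `+` for non-negative integers, `:` for strings, square brackets for arrays, and curly braces for objects. Everything else is an assumption made for illustration, not the LOEN specification: the function names, the set of characters that trigger quoting, and the use of JSON-style string quoting are all guesses, and types whose prefixes are not shown here (floats, negatives, booleans, null) are rejected rather than invented.

```python
import json
import re

# Assumption: quote a string when it is empty or contains whitespace or a
# character that LOEN syntax appears to use. The real rules may differ.
_UNSAFE = re.compile(r'[\s\[\]{}<>,:+"]')


def _bare_or_quoted(text):
    """Return text as-is when it looks safe, otherwise JSON-style quoted."""
    if text == "" or _UNSAFE.search(text):
        return json.dumps(text)  # assumption: JSON string syntax for quotes
    return text


def encode(value):
    """Encode ints, strings, lists, and dicts in the style shown above."""
    if isinstance(value, bool):
        raise TypeError("boolean encoding is not covered by this sketch")
    if isinstance(value, int):
        if value < 0:
            raise TypeError("negative integers are not covered by this sketch")
        return f"+{value}"
    if isinstance(value, str):
        return ":" + _bare_or_quoted(value)
    if isinstance(value, list):
        return "[" + " ".join(encode(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = (f"{_bare_or_quoted(k)} {encode(v)}" for k, v in value.items())
        return "{" + " ".join(pairs) + "}"
    raise TypeError(f"{type(value).__name__} is not covered by this sketch")


apple = {
    "id": 1,
    "abbreviation": "appl",
    "name": "Apple",
    "types": ["Macintosh", "Golden Delicious", "Fuji", "Gala"],
}
print(encode(apple))
```

The output is the same fruit object on a single line, with whitespace doing the job of the commas.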

This is neat, I suppose, but losing some double quotes is probably not going to save enough characters to justify a whole new encoding scheme.

Lean Lists

As I mentioned before, it is with lists of data that JSON really starts to break down. Let's see what the list of fruit looks like when LOEN encoded.

<
   [:id :abbreviation :name]
   [+1 :appl :Apple]
   [+2 :pear :Pear]
   [+3 :bana :Banana]
   [+4 :bkby :Blackberry]
   [+5 :strw :Strawberry]
   [+6 :pech :Peach]
   [+7 :plum :Plum]
>

Ok now we're talking. Coming in at just over half the size of the JSON equivalent, this short list shows just how much potential there is for savings; add gzip on top of this and your storage and/or bandwidth requirements are significantly reduced.

In an array of objects, JSON repeats the property names for each object in order to guarantee that the output matches the input. In many lists the properties are the same for all entries, so the extra labels are unnecessary. LOEN scans an input array before encoding to verify that all entries have identical properties; if they do, a new array is created whose first entry contains the keys for all subsequent entries. This special type of array is denoted by the less-than and greater-than symbols (< and >). The encoding system supports sub-arrays and nested properties, so there are no restrictions on the complexity of the objects supported.
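
The scan-then-compress step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used by any LOEN library: the function name `to_lean_list` is hypothetical, and it returns plain Python lists (a header row followed by value rows) rather than encoded text.

```python
def to_lean_list(records):
    """Return [keys, row1, row2, ...] if every record has the same keys.

    Returns None when the list is empty, contains a non-dict entry, or the
    entries do not all share one set of property names, in which case the
    list would have to be encoded as an ordinary array of objects.
    """
    if not records or not all(isinstance(r, dict) for r in records):
        return None
    keys = list(records[0])
    key_set = set(keys)
    if any(set(r) != key_set for r in records):
        return None
    # Emit values in the header's key order, whatever order each dict used.
    return [keys] + [[r[k] for k in keys] for r in records]


fruits = [
    {"id": 1, "abbreviation": "appl", "name": "Apple"},
    {"name": "Pear", "id": 2, "abbreviation": "pear"},  # same keys, new order
]
print(to_lean_list(fruits))
print(to_lean_list([{"id": 1}, {"id": 2, "name": "Pear"}]))  # keys differ
```

The first call yields a header row plus one row of values per fruit; the second yields `None` because the two entries do not share the same properties.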

Can I Give it a Try Before I Commit?

The LOEN project is open source, and free to use for any purpose. More details on the encoding methods and some libraries for implementation are available on the project's GitHub page.

What's the Plan?

The LOEN project is not intended for building libraries, but rather to develop the encoding into a standard from which libraries may be developed. Some libraries will be developed and maintained as examples, to demonstrate LOEN and encourage development; I highly encourage anyone with resources, skills, and/or interest, to develop new and/or improved LOEN libraries. With any luck, we'll one day be able to rid ourselves of all those annoying extra double quotes and commas!
