#imagine - the future of the ebook


  1. Imagine a student whose native language is English and who is learning Spanish, French and German: wouldn't it be great if they could switch between all four languages in a book with the simplest of gestures?
  2. Imagine if a technical book could increase in complexity every time you read it, so that complex ideas, at first discussed in brief, could be expanded on subsequent reads.
  3. Imagine a children's book used in a school setting that could have its reading age adjusted, or its number of images increased or decreased, for each child without sourcing separate editions.
  4. Imagine a book where all previous editions, including their prefatory material and notes, were contained in the same book, and could be switched between at will at paragraph or whole-book level.
  5. Imagine all of these things not having to be produced separately for each format, but being created once for all formats, current and future.

I'm thinking of an expansion of books like the bilingual books from Doppeltext, but with the technology baked into the ereading apps themselves. That would mean less JavaScript, better searching across versions and more flexibility, and it would allow the books to be made available across the web through an API (standardised as far as possible) that is ready for future formats.
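As a rough illustration of what such a book could look like as data, here is a minimal Python sketch in which each paragraph stores all of its versions, keyed by language and then by complexity level, and the reader's settings pick the version to display. The field names (`paragraphs`, `versions`), the numbered levels and the fallback rules are hypothetical assumptions, not part of EPUB or any other existing standard.

```python
import json

# A hypothetical JSON book: each paragraph stores every available version,
# keyed first by language code and then by complexity level.
BOOK_JSON = """
{
  "title": "Sample",
  "paragraphs": [
    {"id": "p1", "versions": {
      "en": {"1": "The cat sat.", "2": "The cat sat on the mat by the fire."},
      "es": {"1": "El gato se sentó."},
      "fr": {"1": "Le chat s'est assis."},
      "de": {"1": "Die Katze saß."}
    }}
  ]
}
"""

def render(book: dict, lang: str, level: int, fallback_lang: str = "en") -> list[str]:
    """Return one string per paragraph for the requested language and level.

    If the language is missing, fall back to fallback_lang. If the exact level
    is missing, use the highest level not above the request, or else the
    simplest level available.
    """
    out = []
    for para in book["paragraphs"]:
        versions = para["versions"].get(lang) or para["versions"][fallback_lang]
        available = sorted(int(k) for k in versions if int(k) <= level)
        chosen = available[-1] if available else min(int(k) for k in versions)
        out.append(versions[str(chosen)])
    return out

book = json.loads(BOOK_JSON)
print(render(book, "en", 2))  # ['The cat sat on the mat by the fire.']
print(render(book, "es", 2))  # ['El gato se sentó.']
```

Editions could be added as a further key in the same way; the point is only that every version lives in one file and the choice is made at display time.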

What point am I trying to make?

In order to achieve (or at least work towards) the above aims in full, it is necessary not to become fixated on any one endpoint, but to leave open all possible endpoints (or containers) for our books and publications. For dynamic delivery, the obvious answer at present appears to be JSON (JavaScript Object Notation), or some adaptation of it, because it works so well with PHP, JavaScript, Objective-C, Android and so on.

What about XML?

At the print and document-origin end of things, XML cannot be ignored, because everything from Word to InDesign plays well with it. There are ways to convert files from XML to JSON, for example with oXygen, but we need to keep the workflow together as much as possible to prevent fragmentation and errors. We also need to think carefully about how JSON documents are structured and how XML and JSON relate to each other, given that XML has so far been far more dominant in publishing.
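To show the kind of XML-to-JSON step meant here, the sketch below uses only Python's standard library to turn a small XML fragment into JSON with one entry per paragraph and its versions grouped by language and level. The element and attribute names (`paragraph`, `version`, `lang`, `level`) are hypothetical and not taken from any publishing schema; a real workflow, oXygen-based or otherwise, would need a proper mapping for inline markup.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical source markup: each <paragraph> holds several <version> children.
SOURCE_XML = """
<book title="Sample">
  <paragraph id="p1">
    <version lang="en" level="1">The cat sat.</version>
    <version lang="en" level="2">The cat sat on the mat by the fire.</version>
    <version lang="de" level="1">Die Katze saß.</version>
  </paragraph>
</book>
"""

def xml_to_json_book(xml_text: str) -> dict:
    """Convert the hypothetical XML above into a JSON-ready dictionary."""
    root = ET.fromstring(xml_text.strip())
    book = {"title": root.get("title"), "paragraphs": []}
    for para in root.iter("paragraph"):
        versions: dict[str, dict[str, str]] = {}
        for v in para.findall("version"):
            # itertext() keeps the words but flattens away any inline elements.
            text = "".join(v.itertext()).strip()
            versions.setdefault(v.get("lang"), {})[v.get("level")] = text
        book["paragraphs"].append({"id": para.get("id"), "versions": versions})
    return book

print(json.dumps(xml_to_json_book(SOURCE_XML), ensure_ascii=False, indent=2))
```

Losing inline markup in `itertext()` is exactly the kind of silent fragmentation a joined-up workflow has to plan for.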

For a more detailed discussion of the translation between JSON and XML, see 'Embracing JSON? Of course, but how?' by Eric van der Vlist, which appears in the proceedings of XML Prague 2013 (free to download). See also the HPub format, which uses JSON for its manifest file and HTML5 for its content.

For a wider view of how JSON and XML are beginning to combine see this earlier post.

When will the technology be available to action this from the user end of things?

It's already with us, inside ereader apps that support JavaScript (e.g. iBooks). We just need to put pressure on the IDPF and on ereader makers and app developers to raise the level of sophistication ever so slightly, so that their software can recognise and search multiple versions, for example by reading data (e.g. JSON or XML) in a standardised way, or through some interface that makes this possible.
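Searching multiple versions need not be complicated once the versions are stored as data. The sketch below assumes a hypothetical layout in which each paragraph holds its versions keyed by language and then by level, and reports every version that contains a search term. It is an illustration only, not an API offered by iBooks or any other ereader.

```python
from typing import Iterator

# Hypothetical book data: paragraphs -> versions keyed by language, then level.
BOOK = {
    "paragraphs": [
        {"id": "p1", "versions": {
            "en": {"1": "The cat sat.", "2": "The cat sat on the mat by the fire."},
            "fr": {"1": "Le chat s'est assis sur le tapis."},
        }},
        {"id": "p2", "versions": {
            "en": {"1": "Then it slept."},
            "fr": {"1": "Puis il a dormi."},
        }},
    ]
}

def search_all_versions(book: dict, term: str) -> Iterator[tuple[str, str, str]]:
    """Yield (paragraph id, language, level) for every version containing term.

    Matching is a case-insensitive substring test.
    """
    needle = term.casefold()
    for para in book["paragraphs"]:
        for lang, levels in para["versions"].items():
            for level, text in levels.items():
                if needle in text.casefold():
                    yield para["id"], lang, level

print(list(search_all_versions(BOOK, "mat")))  # [('p1', 'en', '2')]
```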

Isn't JSON a fad, and aren't you becoming fixated on it as an 'endpoint'?

A fad is always a danger in the tech world, but JSON has been around long enough, and has proven itself transformative enough, to be more than that. As to whether I've become fixated on it: the benefit of JSON is that, like XML, it is a plain-text, human-readable format for storing data. It can also be manipulated in similar ways to XML, but, as I've already argued, it does so in a way that fits better, most of the time, with the way the web is evolving. That is something XML can't ignore, and in turn neither can XHTML and EPUB.

As @aramanc of Pubfluence reminds us in this article, we are only at the beginning of the new digital pathway in publishing. The decision-making process isn't over; there's still a great distance left to run, and we all need to be involved.

Why do you think these views are important?

Much of modern programming is built around the model-view-controller (MVC) pattern, in which data (the model) is separated from presentation (the view) by an intermediary (the controller). This separation has been a major factor in speeding up software development, because code becomes easier to repurpose and, being contained in smaller blocks, clearer to read.
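Applied to an ebook, that separation might look like the toy sketch below: the model is plain book data, two interchangeable views render it for different containers, and a small controller chooses the view. The function names and the two output formats are illustrative assumptions only.

```python
import html

# Model: plain data, independent of any output format.
MODEL = {"title": "Sample", "paragraphs": ["The cat sat.", "Then it slept."]}

# Views: each turns the same model into a different presentation.
def html_view(model: dict) -> str:
    body = "".join(f"<p>{html.escape(p)}</p>" for p in model["paragraphs"])
    return f"<h1>{html.escape(model['title'])}</h1>{body}"

def plain_text_view(model: dict) -> str:
    return "\n\n".join([model["title"].upper(), *model["paragraphs"]])

VIEWS = {"html": html_view, "text": plain_text_view}

# Controller: maps a request for a format onto the model/view pair.
def controller(fmt: str, model: dict = MODEL) -> str:
    try:
        view = VIEWS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return view(model)

print(controller("html"))  # <h1>Sample</h1><p>The cat sat.</p><p>Then it slept.</p>
```

Supporting a new container means writing one more view; the model, which is the book itself, does not change.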

Recently, with ebooks, there seems to be a (slight?) trend in the opposite direction, encouraging us towards less separation (see 'Is a book a BLOB?'): not only moving away from data formats like XML where they are deemed unnecessary, but also thinking of formats like EPUB 3 in isolation.

My view is that if we begin turning away from the book as data, rather than evolving it, it will be more difficult to implement the kinds of things I imagine the book can become with ease and grace, now and in the future.

Additionally, it makes sense not to let the data side of the ebook remain stubbornly attached to XML and XHTML if all the technologies it interfaces with would work better and faster with JSON.

Finally, it is essential that this type of discussion takes place so that the roadmaps for EPUB 4 and EPUB 5 can be drawn.

Another piece to the jigsaw

At a slight tangent, but still a part of this debate, @JulietaLionetti recently pointed to an article called 'A Publisher’s Job Is to Provide a Good API for Books: You can start with your index' by Hugh McGuire. It is a brilliant article that hits the nail on the head with indexes.

Hugh's article highlights that an index is ripe for sharing through a web-based API. I would add that a JSON representation seems an obvious choice: the index terms become keys, and the page numbers are arrays stored as the values of those keys. If publishers followed this pattern, comparing indexes for matching keys across books should be a simple matter for programmers.
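A minimal sketch of that pattern, assuming two hypothetical books whose indexes are published as JSON objects mapping each term to an array of page numbers. Finding shared topics is then just an intersection of the two sets of keys. The terms and page numbers are invented for illustration.

```python
import json

# Hypothetical index data as a publisher's API might return it: term -> pages.
INDEX_A = json.loads('{"metadata": [12, 40], "EPUB": [3, 7, 88], "XML": [15]}')
INDEX_B = json.loads('{"EPUB": [21], "JSON": [30, 31], "XML": [2, 9]}')

def shared_terms(
    a: dict[str, list[int]], b: dict[str, list[int]]
) -> dict[str, dict[str, list[int]]]:
    """Return each term found in both indexes, with its pages in each book."""
    return {term: {"a": a[term], "b": b[term]} for term in sorted(a.keys() & b.keys())}

print(shared_terms(INDEX_A, INDEX_B))
# {'EPUB': {'a': [3, 7, 88], 'b': [21]}, 'XML': {'a': [15], 'b': [2, 9]}}
```

The match here is exact and case-sensitive; indexes from different publishers would in practice need some agreed normalisation of terms before comparison.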

Manipulating indexes in this way would be very similar to using a hashtag on Twitter. Imagine clicking on a topic and seeing not a list of tweets but a list of book entries, which could then be filtered to provide exactly what you are looking for without the hit-and-miss of a Google (Books) search.
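The hashtag analogy amounts to inverting many per-book indexes into a single topic lookup. The sketch below does that for a few hypothetical books and then applies a simple filter by book; the titles, topics, page numbers and the filter itself are invented for illustration.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical per-book indexes: book title -> {topic: [page numbers]}.
INDEXES = {
    "Book One": {"ebooks": [4, 19], "metadata": [33]},
    "Book Two": {"ebooks": [101], "typography": [7]},
    "Book Three": {"metadata": [2, 58, 60]},
}

def build_topic_lookup(indexes: dict[str, dict[str, list[int]]]) -> dict[str, list[tuple[str, int]]]:
    """Invert the indexes: lower-cased topic -> list of (book title, page)."""
    lookup: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for title, index in indexes.items():
        for topic, pages in index.items():
            lookup[topic.lower()].extend((title, page) for page in pages)
    return dict(lookup)

def topic_entries(lookup: dict, topic: str, books: Optional[set] = None) -> list[tuple[str, int]]:
    """'Click' a topic: return its entries, optionally restricted to some books."""
    entries = lookup.get(topic.lower(), [])
    return [e for e in entries if books is None or e[0] in books]

lookup = build_topic_lookup(INDEXES)
print(topic_entries(lookup, "ebooks"))
# [('Book One', 4), ('Book One', 19), ('Book Two', 101)]
```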

We are not so very far from this becoming a reality.

Concluding remarks

I view what I've described here (before 'Another piece to the jigsaw') as the final prong of a three-pronged approach to the book as data. The three prongs are: (1) metadata, which links books by their categories and exposes a work to the book-buying world; (2) making a book connect to other books at a fine-grained level using the index, as described by Hugh McGuire (see 'Another piece to the jigsaw'); (3) allowing the book to contain multiple language versions, levels of complexity and editions without excessive bloating.

What's next?

Tomorrow, I will publish a blog post outlining my vision of what a book constructed entirely in JSON looks like at code level. It will be called 'A list of rules for a JSON book – a manifesto of sorts'. I'll then keep very quiet and see how much agreement there is out there for all of this.

Basically, that means it's up to you to comment and tweet, and make this future a reality if you think it's a good one.

Who's in?

