Is a book a BLOB?

In an important post on the Safari Books website, 'The unXMLing of digital books', Liza Daly argues that XML and XSLT are unnecessary for certain books (trade in particular, but also non-fiction), and that we should free the EPUB standard from the chains of XHTML, or at least create a new mini-standard for books that don't require its full weight.

While I'm in agreement with much of the argument, and can see that real-world digital workflows can become encumbered by the expectations of XML and its associated standards, there are two things that I struggle to accept.

First, I struggle to accept that a book is ever a BLOB: a set of data that a machine is unable to penetrate and only a human can understand. Yes, the structure of a book follows a fuzzy logic and fuzzy connections that align with human responses, but are books unstructured? Or are they structured to such a complex level that these structures can never be uncovered? No. Uncovering these connections is the work of literary criticism, and there is no reason to believe that the sophistication of literary criticism won't grow as it gets up to speed with modern types of data and programming.

In this vein, there has already been Google's work on mapping the areas of the globe mentioned within books (and a recent adaptation of Crime and Punishment for iOS that plots the novel through modern-day Petersburg). Although many in the literary field would connect such activities to formalism and structuralism, they represent a start and an indication of the direction literary criticism might take, and they also point to how tech-enabled writers of fiction and non-fiction could change the way we look at books.

Second, HTML is typically an end point, to be displayed in a browser (or in this case an ereader). In this sense it is like a PDF. A few years ago there was a great wave of panic across the publishing industry when publishers realised that, even if they did hold the print-ready PDFs for their titles, these weren't as useful as the InDesign, Quark or XML files when they wanted to convert the material into ebooks.

So what if books go the other way, and we want ebooks to become print books? We need enough structure for InDesign or Quark to apply its styles, and for the book to become something the typesetting package understands without too much hacking about.

The more hacking about we do, the greater the risk that errors are introduced. This is one of the most fundamental issues in maintaining a print and electronic workflow: each format should be sourced from the same raw text, so that the formats don't vary unintentionally, their edits don't slip out of sync with each other, and we save time over the life of a book.

XML provides this raw text solution: it can be imported into typesetting packages, its elements can be mapped to styles, and we get something as close to consistency across print book, PDF and ebook as possible. It also frees us from dependence on a single piece of software.
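To make the single-source idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the element names (chapter, title, para, note), the print style names and the HTML tags are hypothetical, not taken from any real publisher's schema or from how InDesign or Quark actually import XML. It only shows one XML source feeding both a print style mapping and an HTML rendering.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source markup: one chapter with three structural elements.
SOURCE = """<chapter>
  <title>One</title>
  <para>It was a dark night.</para>
  <note>See the appendix.</note>
</chapter>"""

# Hypothetical mappings from element names to print styles and HTML tags.
PRINT_STYLES = {"title": "Chapter Title", "para": "Body Text", "note": "Footnote"}
HTML_TAGS = {"title": "h1", "para": "p", "note": "aside"}


def to_print(root):
    """Pair each element's text with the paragraph style a typesetter would apply."""
    return [(PRINT_STYLES[el.tag], el.text) for el in root]


def to_html(root):
    """Render the same elements as HTML for a browser or ereader."""
    return "\n".join(
        f"<{HTML_TAGS[el.tag]}>{escape(el.text)}</{HTML_TAGS[el.tag]}>" for el in root
    )


root = ET.fromstring(SOURCE)
print_blocks = to_print(root)   # list of (style name, text) pairs
html_output = to_html(root)     # HTML string built from the same source
```

Because both outputs are derived from the same parsed tree, a correction made in the source text reaches print and ebook together, which is the point of the workflow described above.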

Further, abandoning XML creates a couple of issues. The first is that it assumes that EPUB or Kindle will remain the standard for ebooks forever, or at least that there will only be incremental updates that maintain compatibility, rather than some epic change that pushes aside HTML (it could happen!). The second is that we restrict the possibilities for how text can be manipulated and transmitted across devices. This risks locking us into the ereader mentality, when what we actually want is to repurpose material across the web and mobile apps, not simply package it in ebooks or present it in web browsers.

To this end, rather than take away structure and accept that books are big jelly blobs of matter that can't be understood as data, it appears to me that most books can be structured for the modern Internet. We can do this by thinking about how to simplify the XML and lighten the weight of that data, along the lines of JSON, so that repurposing becomes easier and faster and everything is less clunky.
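As a rough illustration of what "along the lines of JSON" might look like, here is a small Python sketch. The shape of the data (a title, a list of chapters, typed blocks of text) is my own assumption, not an existing standard, and the plain-text renderer is just one hypothetical way of repurposing it.

```python
import json

# A hypothetical lightweight structure for a book: enough to tell a
# paragraph from a note, without the full weight of an XML schema.
book = {
    "title": "A Short Book",
    "chapters": [
        {
            "title": "One",
            "blocks": [
                {"type": "para", "text": "It was a dark night."},
                {"type": "note", "text": "See the appendix."},
            ],
        },
    ],
}


def to_plain_text(book):
    """Repurpose the structure as plain text, e.g. for a mobile app or an email."""
    lines = [book["title"].upper(), ""]
    for chapter in book["chapters"]:
        lines.append(chapter["title"])
        for block in chapter["blocks"]:
            prefix = "[Note] " if block["type"] == "note" else ""
            lines.append(prefix + block["text"])
    return "\n".join(lines)


serialised = json.dumps(book)       # what would be sent between devices
restored = json.loads(serialised)   # what the receiving end works with
plain = to_plain_text(restored)
```

The structure survives the round trip through JSON unchanged, so any device that receives it can still tell a note from a paragraph and decide for itself how to present each one.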

Postscript

I was pointed to this related article on the topic of EPUB3 by Nellie McKesson, thanks to @bbirdiman and @__mharrison__, and while I agree wholeheartedly with the concept that we can:
  • Take the material for the printed book and translate it into an eBook, but redefine what the building blocks mean in the context of the screen. (“What is a sidebar? What is a note? What is an index?”)
I find myself unable to accept the idea that we should:
  • Make an eBook-only product with no regard for PDF or print, targeted at the screen and designed accordingly.
This is not because all books should be printed on paper or released as a PDF, but because of the discipline imposed by thinking of the content outside of the container. For, while future versions might not echo those of the past, they'll certainly be different from the present.
