Digging the Meta in EPUB 3: Using the meta element to extend metadata in EPUB 3, 3.0.1 (not 3.1) and 3.2

One of the main areas addressed in recent changes to the EPUB spec is metadata, which is undeniably important. So I'm taking the opportunity here to look briefly but closely at what the content.opf file lets us do to deepen our metadata.

The EPUB documentation shows that we can use ONIX code lists to indicate, for example, that an identifier is a DOI. This is done through the meta/refines/property trio, which becomes a quartet when a scheme is also identified, as it is here:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:doi:10.1016/j.iheduc.2008.03.001</dc:identifier>
    <meta refines="#pub-id" property="identifier-type" scheme="onix:codelist5">06</meta>
    …
</metadata>
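Or to indicate that an identifier is an ISBN, here for both the publication itself and, via dc:source, the publication it is derived from: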
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    …
    <dc:identifier id="isbn-id">urn:isbn:9780101010101</dc:identifier>
    <meta refines="#isbn-id" property="identifier-type" scheme="onix:codelist5">15</meta>
    
    <dc:source id="src-id">urn:isbn:9780375704024</dc:source>
    <meta refines="#src-id" property="identifier-type" scheme="onix:codelist5">15</meta>
    …
</metadata>
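To make the reading-system side of this concrete, here is a minimal Python sketch (standard library only) of what a consumer has to do with the identifiers above. It follows each refines pointer back to its dc:identifier or dc:source, translates the ONIX code into something meaningful and, for ISBN-13s, verifies the check digit. The two-entry ONIX_CODELIST_5 table, the function names and the wrapping <package> element are my own assumptions for illustration. A real reading system would need the full ONIX code list.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Assumption: a partial excerpt of ONIX code list 5 (product identifier type)
# covering only the codes used in the examples above.
ONIX_CODELIST_5 = {"06": "DOI", "15": "ISBN-13"}

SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="isbn-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="isbn-id">urn:isbn:9780101010101</dc:identifier>
    <meta refines="#isbn-id" property="identifier-type" scheme="onix:codelist5">15</meta>
    <dc:source id="src-id">urn:isbn:9780375704024</dc:source>
    <meta refines="#src-id" property="identifier-type" scheme="onix:codelist5">15</meta>
  </metadata>
</package>"""


def isbn13_ok(digits):
    """ISBN-13 check digit: weights alternate 1, 3, 1, 3 ... over the first 12 digits."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def typed_identifiers(opf_xml):
    """Pair each dc:identifier / dc:source with the type declared by its refining meta."""
    metadata = ET.fromstring(opf_xml).find(f"{OPF}metadata")
    types = {}
    for meta in metadata.findall(f"{OPF}meta"):
        if meta.get("property") == "identifier-type" and meta.get("scheme") == "onix:codelist5":
            target = meta.get("refines", "").lstrip("#")
            types[target] = ONIX_CODELIST_5.get((meta.text or "").strip(), "unknown")
    results = []
    for el in metadata.findall(f"{DC}identifier") + metadata.findall(f"{DC}source"):
        kind = types.get(el.get("id"), "untyped")
        value = (el.text or "").strip()
        if kind == "ISBN-13":
            value = value.removeprefix("urn:isbn:")  # Python 3.9+
            kind += " (valid)" if isbn13_ok(value) else " (bad check digit)"
        results.append((el.tag.replace(DC, "dc:"), value, kind))
    return results


for row in typed_identifiers(SAMPLE):
    print(row)
```

Run against the ISBN example, it reports ('dc:identifier', '9780101010101', 'ISBN-13 (bad check digit)') and ('dc:source', '9780375704024', 'ISBN-13 (valid)'). The placeholder ISBN does not carry a valid check digit, which is exactly the sort of thing a stricter consumer could flag.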
But, true to the open nature of EPUB, we might decide to use MARC on some occasions, for instance to describe a creator's role (author, editor, illustrator, etc.) with a MARC relator code:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    …
    <dc:creator id="creator">Haruki Murakami</dc:creator>
    <meta refines="#creator" property="role" scheme="marc:relators" id="role">aut</meta>
    …
</metadata>
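A reading system that wants to display "Haruki Murakami (Author)" rather than a bare name therefore has to know MARC relator codes as well as ONIX ones. The following is a minimal sketch under stated assumptions. The four-entry MARC_RELATORS table is a hypothetical subset, and creator_roles is a helper name of my own.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Assumption: an illustrative subset of MARC relator codes, not the full list.
MARC_RELATORS = {"aut": "Author", "edt": "Editor", "ill": "Illustrator", "trl": "Translator"}

SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:creator id="creator">Haruki Murakami</dc:creator>
    <meta refines="#creator" property="role" scheme="marc:relators" id="role">aut</meta>
  </metadata>
</package>"""


def creator_roles(opf_xml):
    """Return (name, role label) for each dc:creator, using MARC-scheme role refinements."""
    metadata = ET.fromstring(opf_xml).find(f"{OPF}metadata")
    roles = {}
    for meta in metadata.findall(f"{OPF}meta"):
        if meta.get("property") == "role" and meta.get("scheme") == "marc:relators":
            code = (meta.text or "").strip()
            # Fall back to the raw code if it is not in our partial table.
            roles[meta.get("refines", "").lstrip("#")] = MARC_RELATORS.get(code, code)
    return [(c.text, roles.get(c.get("id"), "unspecified")) for c in metadata.findall(f"{DC}creator")]


print(creator_roles(SAMPLE))  # [('Haruki Murakami', 'Author')]
```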
And going even further, a mix of XSD, ONIX, MARC and EPUB-specific values appears to be encouraged: a veritable pick and mix of schemes and vocabularies.
Notice in each case the choice of property from the list of available meta properties.
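A consumer can also check the choice of property. The sketch below flags unprefixed property names that aren't in the default EPUB meta properties vocabulary. The EPUB_META_PROPERTIES set is my own transcription of the EPUB 3.0.1 list and should be treated as an assumption. Prefixed names such as dcterms:modified are deliberately left for prefix resolution.

```python
# Assumption: default (unprefixed) meta properties as I understand the EPUB 3.0.1
# list, which added belongs-to-collection and collection-type to the 3.0 set.
EPUB_META_PROPERTIES = {
    "alternate-script", "belongs-to-collection", "collection-type", "display-seq",
    "file-as", "group-position", "identifier-type", "meta-auth", "role",
    "source-of", "title-type",
}


def unknown_properties(properties):
    """Return unprefixed property names missing from the default vocabulary.

    Prefixed names (containing ':') are skipped; they need prefix resolution instead.
    """
    return [p for p in properties if ":" not in p and p not in EPUB_META_PROPERTIES]


# A misspelt property is caught; the prefixed dcterms:modified is ignored here.
print(unknown_properties(["identifier-type", "role", "dcterms:modified", "identifer-type"]))
# ['identifer-type']
```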

Considering the programmers of reading systems

If we are going to cherry-pick from different schemes and vocabularies, reading systems must be able to interpret all of them for the metadata to be of any use. Can they? In theory, yes. In practice, nobody seems sure how sophisticated the platforms are, or how much of this metadata is actually used or useful, especially when distributors, for example, prefer to receive metadata outside the EPUB, either as ONIX or in more primitive forms.
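Interpreting these schemes starts with resolving prefixes such as dcterms:, marc: and onix: to full IRIs. A reading system has to do at least this much before it can look anything up. The rough sketch below assumes the reserved-prefix IRIs listed, which reflect my reading of the EPUB 3.0.1 spec and are worth verifying. It also picks up any extra prefixes declared in the package element's prefix attribute. expand and parse_prefix_attr are hypothetical helper names.

```python
import re

# Assumption: reserved prefixes (usable without declaration) as I understand
# them from the EPUB 3.0.1 spec; verify the IRIs before relying on them.
RESERVED_PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "marc": "http://id.loc.gov/vocabulary/",
    "onix": "http://www.editeur.org/ONIX/book/codelists/current.html#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def parse_prefix_attr(value):
    """Parse a package prefix attribute of the form 'name: IRI name: IRI ...'."""
    return dict(re.findall(r"(\S+):\s+(\S+)", value or ""))


def expand(term, declared=None):
    """Expand a prefixed property or scheme value to a full IRI.

    Unprefixed terms belong to the default EPUB meta vocabulary and are returned as-is.
    """
    if ":" not in term:
        return term
    prefixes = {**RESERVED_PREFIXES, **(declared or {})}
    prefix, reference = term.split(":", 1)
    if prefix not in prefixes:
        raise ValueError(f"undeclared prefix {prefix!r} in {term!r}")
    return prefixes[prefix] + reference


print(expand("dcterms:modified"))  # http://purl.org/dc/terms/modified
print(expand("foaf:name", parse_prefix_attr("foaf: http://xmlns.com/foaf/spec/")))
```

For example, expand("marc:relators") gives http://id.loc.gov/vocabulary/relators. A term with an undeclared prefix raises an error rather than being silently ignored, which is one of the error opportunities a parser has to handle.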

I would personally question the wisdom of allowing this mix and match of schemes and vocabularies, because of the complexity it adds for parsers and the opportunities for error it creates. And, since only a limited subset of these schemes is likely to be used, it might well be preferable to have a bespoke EPUB vocabulary (which EPUB already uses in part). This would avoid the sometimes puzzling decisions that come from adopting other standards. For example, most of our metadata can be contained within Dublin Core elements, e.g. <dc:date>, and yet for the modification date of the package we must use <meta property="dcterms:modified">, which seems rather incongruous. This is because it fits Dublin Core logic but not EPUB logic: the Dublin Core metadata EPUB borrows is drawn mostly from the original fifteen "elements", whereas "modified" is a later DCMI term that exists only in the dcterms namespace, so EPUB has to express it as a meta property. (This, I would argue, makes no sense in the context of an EPUB's metadata.)
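The dcterms:modified awkwardness has a practical consequence too. Unlike <dc:date>, its value must take the exact form CCYY-MM-DDThh:mm:ssZ, in UTC. Here is a small sketch of generating and checking that value. modified_meta and is_valid_modified are hypothetical helpers, not part of any EPUB tool.

```python
import re
from datetime import datetime, timezone

# EPUB 3 requires dcterms:modified in the form CCYY-MM-DDThh:mm:ssZ (UTC).
MODIFIED_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
MODIFIED_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def modified_meta(when=None):
    """Build the dcterms:modified meta element; 'when' should be timezone-aware."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f'<meta property="dcterms:modified">{when.strftime(MODIFIED_FORMAT)}</meta>'


def is_valid_modified(value):
    """True if value matches the required CCYY-MM-DDThh:mm:ssZ shape exactly."""
    return MODIFIED_RE.fullmatch(value) is not None


print(modified_meta(datetime(2011, 1, 1, 12, 0, tzinfo=timezone.utc)))
# <meta property="dcterms:modified">2011-01-01T12:00:00Z</meta>
```

A date-only value such as 2011-01-01, which is perfectly acceptable in dc:date, fails is_valid_modified.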

Changes from EPUB 3.1 to EPUB 3.2: The Brief Return and Subsequent Disappearance of the OPF Namespace

Programmers of reading systems also need to keep an eye on what's on the horizon: it's worth knowing which changes may come in future iterations of EPUB, and where missteps have been taken.

The meta tag and its refines attribute make an interesting example here, because in EPUB 3.1, the current recommended standard, refines is deprecated and meta is removed in favour of the opf namespace (which has been resurrected from EPUB 2). Yet even though EPUB 3.1 is the current standard, you should definitely not update your workflow to it. Why? First, because the tools are not yet EPUB 3.1 ready. The GitHub page for EpubCheck notably warns "There is currently a severe shortage of developers working on the tool ... barely enough to handle critical bug fixes, and not nearly enough to undertake the necessary upgrades needed to keep the tool relevant, such as developing support for EPUB 3.1." Second, the article "Why Specs Change: EPUB 3.2 and the Evolution of the Ebook Ecosystem" on the EPUBSecrets site explains that EPUB 3.2 will roll back to EPUB 3 in places to address the lack of adoption (including reinstating meta and refines, and making the opf namespace obsolete once again). It also says that, if anything, the best thing to do now is to migrate to 3.0.1 rather than 3.1.

This is confirmed by the W3C's own report:
EPUB 3.2 can be considered as a successor to both EPUB 3.0.1 and EPUB 3.1. While reverting certain incompatible changes to EPUB 3.0.1 that EPUB 3.1 introduced, it also retains most of the rest of the revisions that were made to EPUB 3.1.
Since EPUB 3.1 did not receive widespread adoption, the community group has decided to retain a list of all changes made during the EPUB 3.1 revision that are still reflected in EPUB 3.2. This decision was made to simplify comprehension of what has changed since EPUB 3.0.1 (i.e., readers do not have to review EPUB 3.1 before being able to determine how this new version updates EPUB 3.0.1).
This is good news for those who have stuck with EPUB 3. Even so, I'd recommend you do nothing, not even move to 3.0.1, unless you are confident that everyone in your distribution chain can handle the version you plan to adopt. That holds even though EpubCheck has validated EPUB 3.0 publications against 3.0.1 since version 4.0.0. Otherwise your work might well be wasted. Move forward cautiously and test as widely as you can.

It's worth noting that I see no mention of EPUB 3.0.1 in Apple's iBooks documentation or FAQs, for example. Maybe Apple thinks it isn't worth mentioning because it causes no issues, or maybe the 3.0.1 changes aren't supported yet. The changes themselves are, as the version numbering suggests, fairly minor. They include lifting restrictions on the number of dc:source and dc:type elements in an EPUB's metadata, tightening the spine requirements to include all content, adding a new belongs-to-collection meta property for describing series and sets, and deprecating oeb-page-head and oeb-page-foot. Most important of all, though, the version number in the <package> tag of your content.opf file should remain 3.0.
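To illustrate that last point, here is a sketch of a package that still declares version="3.0" while using the 3.0.1 belongs-to-collection property with collection-type and group-position refinements. The series title is made up. Treating any version other than 3.0 as an error is a house rule drawn from the advice above, not something the spec requires. The collections helper is hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta property="belongs-to-collection" id="c01">A Hypothetical Series</meta>
    <meta refines="#c01" property="collection-type">series</meta>
    <meta refines="#c01" property="group-position">2</meta>
  </metadata>
</package>"""


def collections(opf_xml):
    """List belongs-to-collection entries with their refinements, insisting on version 3.0."""
    root = ET.fromstring(opf_xml)
    if root.get("version") != "3.0":  # house rule from the advice above, not a spec rule
        raise ValueError(f"package version is {root.get('version')!r}, expected '3.0'")
    metas = root.find(f"{OPF}metadata").findall(f"{OPF}meta")
    found = []
    for meta in metas:
        if meta.get("property") == "belongs-to-collection":
            entry = {"name": (meta.text or "").strip()}
            # Gather every meta that refines this collection entry.
            for ref in metas:
                if ref.get("refines") == f"#{meta.get('id')}":
                    entry[ref.get("property")] = (ref.text or "").strip()
            found.append(entry)
    return found


print(collections(SAMPLE))
# [{'name': 'A Hypothetical Series', 'collection-type': 'series', 'group-position': '2'}]
```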

An EPUB that adheres to EPUB 3 should not break in a reading system that supports 3.0.1. But can the same be said of a reading system that strictly enforces EPUB 3 when it is presented with a 3.0.1 EPUB that it has no way of distinguishing from an older EPUB 3, since both declare version 3.0? While iBooks and others make no mention of 3.0.1, I'm not going to make any assumptions. What I am going to do is keep an eye on EPUB 3.2, because its deprecations act as a warning not to bother implementing features you haven't got round to yet: for example, epub:switch and epub:trigger.
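If you want to know whether existing content leans on these two elements, a quick audit of the XHTML content documents is easy to sketch. This assumes well-formed XHTML that the standard library parser can read, meaning no external entities such as &nbsp;. It matches elements in the EPUB ops namespace, http://www.idpf.org/2007/ops. The count_deprecated function name is hypothetical.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"
DEPRECATED_IN_32 = ("switch", "trigger")

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <epub:switch id="eq1">
      <epub:case required-namespace="http://www.w3.org/1998/Math/MathML"><p>MathML here</p></epub:case>
      <epub:default><p>Fallback here</p></epub:default>
    </epub:switch>
  </body>
</html>"""


def count_deprecated(xhtml):
    """Count epub:switch and epub:trigger elements in one XHTML content document."""
    root = ET.fromstring(xhtml)
    return {name: sum(1 for _ in root.iter(f"{{{EPUB_NS}}}{name}")) for name in DEPRECATED_IN_32}


print(count_deprecated(SAMPLE))  # {'switch': 1, 'trigger': 0}
```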

Based on what appears to be happening in EPUB 3.2, it seems that the people setting the EPUB standards are noticing the extraneous features that just don't get used, and are willing to prune when they think it necessary. I'm hoping that this appreciation goes further and that EPUB becomes more efficient to employ, explore and experiment with, to the point where it fosters greater innovation.

Further Reading

EPUB 3 metadata vocabulary: http://www.idpf.org/epub/30/spec/epub30-publications.html#sec-package-metadata-vocab

EPUB 3.0.1 changes to 3.0: http://www.idpf.org/epub/301/spec/epub-changes.html

EPUB 3.1 changes to 3.0.1: http://www.idpf.org/epub/31/spec/epub-changes.html

EPUB 3.2 changes to 3.0.1: https://w3c.github.io/publ-epub-revision/epub32/spec/epub-changes.html

