EPUB2 to EPUB3: Lessons Learnt in Conversion


Following a recent conversion of an EPUB2 file created with Sigil, I'm recording the real-world issues (and realisations) that arose. (Note: in order to understand this post you should be aware of the zipping and unzipping process for EPUBs.)

META-INF folder and container.xml file

First, nothing about your META-INF folder or mimetype file changes. As always, the container.xml file points to the root file, which remains the OPF one.
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml">
</rootfile>
</rootfiles>
</container>
Note: the folder needn't be called OEBPS and the file needn't be called content.opf. By convention the filename ends in .opf, and the full-path attribute of the rootfile tag needs to point to it.

Changes to the content.opf file

Update version attribute of package tag from 2.0 to 3.0

<package prefix="cc: http://creativecommons.org/ns#" unique-identifier="uid" version="3.0" xml:lang="en" xmlns="http://www.idpf.org/2007/opf"></package>
This tells any parser (and epubcheck) that this is an EPUB3.

Removal of opf: namespace from content.opf file

Change the opening metadata tag:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
to:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
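Once the namespace declaration is gone, the opf:-prefixed attributes that relied on it (opf:scheme, opf:role and opf:event) have to go too. The next sections take them one at a time.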

Remove opf:scheme attribute from dc:identifier

<dc:identifier id="uid" opf:scheme="UUID">56ca9730-7e4a-446b-962b-74db6533d168</dc:identifier>
or
<dc:identifier id="uid" opf:scheme="UUID">urn:uuid:56ca9730-7e4a-446b-962b-74db6533d168</dc:identifier>
now looks like:
<dc:identifier id="uid">56ca9730-7e4a-446b-962b-74db6533d168</dc:identifier>
or
<dc:identifier id="uid">urn:uuid:56ca9730-7e4a-446b-962b-74db6533d168</dc:identifier>
Notice that the value of the id attribute ("uid") in the dc:identifier tag pairs it with the attribute unique-identifier in the opening package tag, which has the same value.

Change opf:role to id-linked meta refines

<dc:creator opf:role="aut">Nathaniel Stern</dc:creator>

now looks like:

<dc:creator id="creator">Nathaniel Stern</dc:creator>
<meta refines="#creator" property="role" scheme="marc:relators">aut</meta>
Under the marc:relators scheme, "aut" is author, "edt" is editor, "ill" is illustrator and there are many more. (Remember that an id must be unique within the file, so you might use "creator1", "creator2", etc.)
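As a rough sketch of how this looks with more than one creator: the second name and both id values below are made up for illustration, and only the "aut" and "edt" codes come from the list above.

```xml
<!-- Each dc:creator gets its own unique id; each meta points back to it with refines -->
<dc:creator id="creator1">Nathaniel Stern</dc:creator>
<meta refines="#creator1" property="role" scheme="marc:relators">aut</meta>

<!-- Hypothetical second creator, shown only to illustrate the id pattern -->
<dc:creator id="creator2">Jane Example</dc:creator>
<meta refines="#creator2" property="role" scheme="marc:relators">edt</meta>
```

The value of refines is the id with a leading #, exactly like a fragment link in HTML.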

When not to use dc:creator

While there is "pbl" to denote a publisher, you actually use:
<dc:publisher>Gylphi Limited</dc:publisher>
In addition to the dc:creator tag there is also dc:contributor, which is identical in its usage but denotes a secondary rather than primary role in the publication. Note also that where you have a foreign author, for example, you can provide an "alternate-script" property to give the name in a different language or script. This is just one example from the list of metadata properties.
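To make the two points above concrete, here is a minimal sketch. The names, ids and language code are illustrative assumptions rather than anything from the converted book.

```xml
<!-- Primary author, with the name repeated in its original script -->
<dc:creator id="creator">Haruki Murakami</dc:creator>
<meta refines="#creator" property="role" scheme="marc:relators">aut</meta>
<meta refines="#creator" property="alternate-script" xml:lang="ja">村上 春樹</meta>

<!-- Secondary role: same pattern, but dc:contributor instead of dc:creator -->
<dc:contributor id="contributor1">Jane Example</dc:contributor>
<meta refines="#contributor1" property="role" scheme="marc:relators">ill</meta>
```

The xml:lang attribute on the alternate-script meta tells the reading system which language or script the alternative form is in.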

Remove opf:event from dc:date

<dc:date opf:event="publication">2000-01-01T00:00:00Z</dc:date>
now simply looks like:
<dc:date>2000-01-01T00:00:00Z</dc:date>
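One related item that is easy to miss: as far as I understand the EPUB3 spec, the metadata section also needs exactly one last-modified timestamp, expressed as a meta tag with the dcterms:modified property. A sketch, with a made-up timestamp:

```xml
<!-- Publication date, as above -->
<dc:date>2000-01-01T00:00:00Z</dc:date>

<!-- Last-modified time in UTC, in the form CCYY-MM-DDThh:mm:ssZ (value is illustrative) -->
<meta property="dcterms:modified">2013-02-01T12:00:00Z</meta>
```

The dcterms prefix is one of the reserved prefixes, so it should not need declaring in the package tag.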

When a toc.ncx file is present, spine open tag should read:

<spine toc="ncx"></spine>
where ncx is the id given to the toc.ncx item in the manifest. If you do not include an NCX file, simply remove toc="ncx".
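A short sketch of how the two ends join up. The chapter file name and its id are hypothetical; the important part is that the toc attribute on spine matches the id of the NCX item.

```xml
<manifest>
  <!-- The id here is what the spine's toc attribute refers to -->
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  <!-- Hypothetical content file -->
  <item id="chapter1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
</manifest>
<spine toc="ncx">
  <itemref idref="chapter1"/>
</spine>
```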

The xhtml navigation document

An EPUB3 must include a navigation document. This is a regular xhtml file. It can double as your contents page if you wish, by including it in the .opf spine listing (but this is not required).

It must have at the very least a nav element with epub:type toc, like so:
<nav epub:type="toc" id="toc"></nav>
Inside the nav tag is an ordered list of contents. Besides "toc", the epub:type attribute on a nav element can also take the values "landmarks" and "page-list". Both are optional, but "landmarks" is encouraged, and "page-list" is worth including if there is a print version of the book.

The rest of the xhtml file is down to presentation, aside from the aforementioned requirement that entries inside the nav tags are written in the form of ordered lists.
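Putting those pieces together, a navigation document might look roughly like the sketch below. The chapter file names, headings and link text are hypothetical, and the hrefs assume the navigation file sits in the same folder as the chapters.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
<head>
  <meta charset="utf-8"/>
  <title>Contents</title>
</head>
<body>
  <!-- Required: the table of contents, written as an ordered list -->
  <nav epub:type="toc" id="toc">
    <h1>Contents</h1>
    <ol>
      <li><a href="chapter1.xhtml">Chapter 1</a></li>
      <li><a href="chapter2.xhtml">Chapter 2</a>
        <!-- Sub-entries are a nested ordered list inside the parent li -->
        <ol>
          <li><a href="chapter2.xhtml#section1">Section 2.1</a></li>
        </ol>
      </li>
    </ol>
  </nav>
  <!-- Optional but encouraged: landmarks, hidden from the rendered page -->
  <nav epub:type="landmarks" id="landmarks" hidden="">
    <h2>Landmarks</h2>
    <ol>
      <li><a epub:type="toc" href="#toc">Contents</a></li>
      <li><a epub:type="bodymatter" href="chapter1.xhtml">Start of content</a></li>
    </ol>
  </nav>
</body>
</html>
```

Hiding the landmarks nav keeps it out of the visible page when the file doubles as the contents page, while reading systems can still use it.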

Entry of the xhtml navigation document in the opf manifest section

The file itself must be listed in the .opf manifest section, with the additional properties attribute identifying it as the nav file.
<item href="content.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav"></item>
As the spec puts it: "All Publication Resources must declare any applicable descriptive metadata properties as defined in Manifest item Properties via the item element properties attribute. Exactly one item must be declared as the EPUB Navigation Document using the nav property" (http://www.idpf.org/epub/30/spec/epub30-publications.html).

Update: Further properties and ids

To find out more about properties and ids in the .opf document, turn to Liz Castro's SlideShare presentation from Digital Book World 2013. There we find additional recommended uses of the properties attribute. For example, the value "cover-image" marks the cover image within the manifest, and an id attribute on each dc:title tag lets you describe levels of title and subtitle.
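A minimal sketch of both ideas, assuming a hypothetical cover file at Images/cover.jpg and placeholder titles. The title-type property is what I understand the spec to offer for distinguishing main titles from subtitles.

```xml
<!-- In the metadata section: each dc:title has an id, refined by a title-type -->
<dc:title id="title">Main Title</dc:title>
<meta refines="#title" property="title-type">main</meta>
<dc:title id="subtitle">A Subtitle</dc:title>
<meta refines="#subtitle" property="title-type">subtitle</meta>

<!-- In the manifest section: the image file flagged as the cover -->
<item id="cover-image" href="Images/cover.jpg" media-type="image/jpeg" properties="cover-image"/>
```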

HTML5 doctype declaration and removing DTDs

Finally, all xhtml files (including the navigation one) must use the HTML5 doctype declaration:
<!DOCTYPE html>
followed by an html opening tag similar to this:
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
Note: you are not allowed to link to an external DTD inside the doctype declaration.
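For a file that came out of Sigil as EPUB2, the change typically amounts to swapping one line for another. The "before" line below is the XHTML 1.1 doctype I would expect to find in such a file; check your own files, as yours may differ.

```xml
<!-- Before (EPUB2): doctype with a public identifier and an external DTD. Remove this. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<!-- After (EPUB3): the bare HTML5 doctype -->
<!DOCTYPE html>
```

The two lines are alternatives shown side by side; a real file contains only one doctype.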

Verifying the EPUB

You are now ready for a verification check with epubcheck, and hopefully it reports no errors or warnings.
If you want an additional check, try Pagina, which was recommended to me by @JeanKaplansky on Twitter. There's also epubtest.org, a site which Richard Pipe has written about, and FlightDeck, the new tool from Joshua Tallent.

Further reading

EPUB3 Now! at IDPF Digital Book World 2013 (Pigs, Gourds, and Wikis)
