Some useful GREP to help recycle content.opf and toc.ncx elements in an EPUB 3 toc.xhtml file


Transforming a compliant EPUB 2 into an EPUB 3 doesn't take a huge number of steps. The main requirements are adding a toc.xhtml file and specifying your EPUB as version 3.0 in your content.opf file.

If you have existing EPUBs or follow a workflow in which you create EPUB 2 first, then you might well find a need to manually build the toc.xhtml file.

Typing each entry into the toc.xhtml file by hand would be tedious. Instead I'd recommend using a GREP- or RegEx-capable text editor like TextWrangler (OS X).

Apple's sample EPUB 3 from iTunesConnect

Apple includes three main sections in its sample toc.xhtml file (inside the EPUB 3 sample file available from iTunesConnect).

<nav epub:type="toc" id="toc">
<nav epub:type="landmarks">
<nav epub:type="page-list">
The elements within each of these are list items arranged in ordered lists. Start with epub:type="toc": if you compare its entries with an EPUB 2, you'll notice that they are the equivalent of the navPoints in a toc.ncx file. So we'll deal with those first and then move through the others using GREP.

Convert navPoints (in toc.ncx) to toc.xhtml entries

The navMap tags simply become ol tags in the toc.xhtml equivalent, and a GREP find and replace can do the rest. With the line breaks removed first (see the note further down), something like this:

Find:

<navPoint id="navPoint-[0-9]{1,}" playOrder="[0-9]{1,}"> <navLabel> <text>([^<]+)</text> </navLabel> <content src="([^<]+)"/> </navPoint>

Replace:

<li>
<a href="../\2">\1</a>
</li>
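
If you'd rather script the same find and replace than run it in a text editor, here is a minimal Python sketch of the idea. It mirrors the pattern above with two deviations that are my own assumptions: it uses \s* between tags so it tolerates line breaks, and it matches the src attribute with [^"]+. The sample navPoint and the function name are made up for illustration.

```python
import re

# Hypothetical sample entry, in the shape the Find pattern above expects.
ncx = (
    '<navPoint id="navPoint-1" playOrder="1"> <navLabel> <text>Chapter 1</text> '
    '</navLabel> <content src="Text/ch1.xhtml"/> </navPoint>'
)

# Same capture groups as the GREP pattern: \1 is the label, \2 is the src.
NAVPOINT = re.compile(
    r'<navPoint id="navPoint-[0-9]+" playOrder="[0-9]+">\s*'
    r'<navLabel>\s*<text>([^<]+)</text>\s*</navLabel>\s*'
    r'<content src="([^"]+)"/>\s*</navPoint>'
)


def navpoints_to_list_items(text):
    """Replace every un-nested navPoint in text with a toc.xhtml list item."""
    return NAVPOINT.sub(r'<li><a href="../\2">\1</a></li>', text)


print(navpoints_to_list_items(ncx))
```

As with the editor version, this only handles navPoints that have no navPoints nested inside them.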

Remember that navPoints can contain nested navPoints. For a navPoint with a single nested child:

Find:

<navPoint id="navPoint-[0-9]{1,}" playOrder="[0-9]{1,}"> <navLabel> <text>([^<]+)</text> </navLabel> <content src="([^<]+)"/> <navPoint id="navPoint-[0-9]{1,}" playOrder="[0-9]{1,}"> <navLabel> <text>([^<]+)</text> </navLabel> <content src="([^<]+)"/> </navPoint> </navPoint>

Replace:

<li>

<a href="../\2">\1</a><ol><li style="list-style-type: none"><a href="../\4">\3</a></li></ol>

</li>

Note: Use BBEdit's Text > Remove Line Breaks option to flatten text, making finding easier.
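
The nested pattern above covers one child navPoint. For deeper or uneven nesting, a regular expression gets unwieldy, so here is a sketch of a different technique: parsing the toc.ncx with Python's xml.etree.ElementTree and walking the navPoints recursively. This is not GREP and not part of the workflow above, just an alternative under a few assumptions: the file uses the standard NCX namespace, the function name and sample data are hypothetical, and the list-style-type styling from the replacement above is left out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"


def nav_points_to_ol(parent, href_prefix="../"):
    """Build an <ol> string from parent's navPoint children, recursing into nested ones."""
    items = []
    for point in parent.findall(NCX + "navPoint"):
        label = point.find(NCX + "navLabel/" + NCX + "text").text
        src = point.find(NCX + "content").get("src")
        nested = nav_points_to_ol(point, href_prefix)  # empty string if no children
        items.append(
            "<li><a href=%s>%s</a>%s</li>"
            % (quoteattr(href_prefix + src), escape(label), nested)
        )
    return ("<ol>" + "".join(items) + "</ol>") if items else ""


# Made-up sample: one navPoint with one nested navPoint.
sample = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
<navMap>
<navPoint id="navPoint-1" playOrder="1"><navLabel><text>Part One</text></navLabel><content src="Text/part1.xhtml"/>
<navPoint id="navPoint-2" playOrder="2"><navLabel><text>Chapter 1</text></navLabel><content src="Text/ch1.xhtml"/></navPoint>
</navPoint>
</navMap>
</ncx>"""

nav_map = ET.fromstring(sample).find(NCX + "navMap")
html = nav_points_to_ol(nav_map)
print(html)
```

The href_prefix argument plays the same role as the ../ in the replacement strings, so adjust it to match where your toc.xhtml sits relative to the text files.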

From Guide (content.opf) element to landmark

If you've used the add semantics function in an app like Sigil then this would've auto-built a guide section for you in the content.opf file. Now here's how to transform those entries with GREP. (This time changing <guide> to <ol> first.)

Find:

<reference href="([^"]*)" title="([^"]*)" type="([^"]*)" xmlns="http://www.idpf.org/2007/opf" />

Replace:

<li><a epub:type="\3" href="\1">\2</a></li>

Note: if your toc.xhtml file is in the main OEBPS folder and the text is within a subfolder you might need to alter the replacement href - e.g. href="Text/\1" - to make this work.
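
The guide-to-landmarks replacement can be sketched in Python as well. This assumes the attributes appear in the order href, title, type, as in the Find pattern above; treating the xmlns attribute as optional and adding an href_prefix parameter (for the subfolder case in the note) are my own additions, and the sample references are made up.

```python
import re

# Hypothetical guide entries from a content.opf file.
guide = (
    '<reference href="Text/cover.xhtml" title="Cover" type="cover" '
    'xmlns="http://www.idpf.org/2007/opf" />\n'
    '<reference href="Text/toc.xhtml" title="Contents" type="toc" '
    'xmlns="http://www.idpf.org/2007/opf" />'
)

# Groups: 1 = href, 2 = title, 3 = type. The xmlns attribute is optional here.
REFERENCE = re.compile(
    r'<reference href="([^"]*)" title="([^"]*)" type="([^"]*)"'
    r'(?: xmlns="http://www\.idpf\.org/2007/opf")? ?/>'
)


def guide_to_landmarks(text, href_prefix=""):
    """Turn each guide reference into a landmarks list item."""
    return REFERENCE.sub(
        lambda m: '<li><a epub:type="%s" href="%s%s">%s</a></li>'
        % (m.group(3), href_prefix, m.group(1), m.group(2)),
        text,
    )


print(guide_to_landmarks(guide))
```

Calling guide_to_landmarks(guide, href_prefix="Text/") would prepend the folder name to every href, in the same way as editing the replacement string by hand.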

Convert spine in content.opf to page list entries

Let's suppose you (shock horror) didn't take advantage of the page-list functionality in EPUB 2 and now want to build a page-list in EPUB 3, having seen the light and decided to start behaving like a grown-up. Easily done: take the spine from content.opf and use this GREP:

Find:

<itemref idref="([^"]*)" />

Replace:

<li><a href="\1#pageno.">pageno.</a></li>

Note: the same caveat about the href applies here as in the previous example, and the pageno. placeholder needs replacing in both places with your actual page numbers. Sorry, nothing I can do to save you time here!
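
For completeness, a Python sketch of the spine replacement. Like the GREP above, it assumes each idref doubles as the file name (if your idrefs are plain ids, you'd need to look up the matching manifest href instead). The sample spine and the function name are hypothetical, and the pageno. placeholder still has to be filled in by hand afterwards.

```python
import re

# Made-up spine entries; idref values are assumed to match the file names.
spine = '<itemref idref="ch1.xhtml" />\n<itemref idref="ch2.xhtml" />'

ITEMREF = re.compile(r'<itemref idref="([^"]*)" ?/>')


def spine_to_page_list(text, placeholder="pageno."):
    """Turn each spine itemref into a page-list item with a page number placeholder."""
    return ITEMREF.sub(
        lambda m: '<li><a href="%s#%s">%s</a></li>'
        % (m.group(1), placeholder, placeholder),
        text,
    )


print(spine_to_page_list(spine))
```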

