Hyperlinking endnotes in ePUBs the easy way: Using Regular Expressions

Following an earlier post on Kindle and ePUB footnotes/endnotes, I wanted to quickly sketch out a way of automating the coding using regular expressions in OpenOffice.

One problem with Regular Expressions and GREP is that, while it is easy to identify the end of a paragraph, there is no equivalent way to identify a sentence. It seems logical enough that a sentence begins with a capital, or follows a full stop, a space and then a capital, but a pattern that pins down the beginning of a sentence precisely and without hiccups, working back from its end, has eluded me so far.

So here's my solution thus far, which may need tweaking depending on the type of project you're working on. If you find and replace all at once, proof the results afterwards.

Anyway, what I would recommend, based on the type of work I do, is to replace all note referents with [1], [2], [3] and so on in the text while you edit. Then do the following.

Find:

(<p1>|<p>|<blockquote>|\. )(.{1,600})(\[)([:digit:]{1,2})(\])

Replace:

$1<a id="noteref$4"></a>$2<sup id="ref$4"><a href="#note$4">$4</a></sup>

The code will look unwieldy to those not familiar with Regular Expressions (or Regex), but what it does is find note referents one or two digits long enclosed in square brackets, then look back between 1 and 600 characters (more rather than less) for either a full stop followed by a space or an opening paragraph or blockquote tag. If you aren't fussy about where the <a id="noteref1"></a> code is placed in your text (see earlier post) then this won't matter so much, but I like to keep things as neat as possible.
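
For anyone who would rather test the idea outside OpenOffice, here is a rough Python sketch of the same find and replace. It is an adaptation, not the OpenOffice expression itself: Python writes [:digit:] as \d and $1 as \1, the brackets are not captured (so the note number is group 3 rather than group 4), and it assumes an extra lookahead so that the match starts from the nearest full stop or opening tag rather than the first one in the paragraph. The names OPENER, NOTE_REF and link_note_refs are made up for illustration.

```python
import re

# Assumption: a sentence starts after one of these tags or after ". ".
OPENER = r"<p1>|<p>|<blockquote>|\. "

NOTE_REF = re.compile(
    r"(" + OPENER + r")"                    # group 1: where the sentence starts
    r"((?:(?!" + OPENER + r").){1,600}?)"   # group 2: up to 600 chars with no other opener
    r"\[(\d{1,2})\]"                        # group 3: the note number in brackets
)

REPLACEMENT = (
    r'\1<a id="noteref\3"></a>\2'
    r'<sup id="ref\3"><a href="#note\3">\3</a></sup>'
)

def link_note_refs(html):
    """Turn [n] markers into a sentence anchor plus a superscript link."""
    return NOTE_REF.sub(REPLACEMENT, html)

if __name__ == "__main__":
    sample = "<p>A plain opener. A claim that needs support[1]. A second claim[2].</p>"
    print(link_note_refs(sample))
```

In this sketch the referent sits before the full stop. If your referents follow the full stop, the next sentence starts after "] " rather than ". ", so the opener list would need adjusting.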

One word of warning: if you have, for example, author initials in the text followed by full stops, either keep a close eye on the results or adapt the code to take account of this. You may also wish to tweak the code based on the length of sentences in the work you edit.
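
One possible way to adapt for initials, sketched in Python rather than OpenOffice syntax: treat a full stop and space as a sentence start only when the full stop does not follow a lone capital letter. This is an assumption about what counts as an initial, and it will wrongly skip a sentence that really does end in a single capital ("Plan B. Next..."), so the results still need proofing. All names here (TAGS, STOP, build_pattern, link_note_refs) are hypothetical.

```python
import re

TAGS = r"<p1>|<p>|<blockquote>"
# ". " counts as a sentence start unless the full stop follows a lone
# capital letter, such as the "J" in "J. K. Rowling" (an assumption).
STOP = r"(?<!\b[A-Z])\. "

REPLACEMENT = (
    r'\1<a id="noteref\3"></a>\2'
    r'<sup id="ref\3"><a href="#note\3">\3</a></sup>'
)

def build_pattern(opener):
    """Opener, then up to 600 chars containing no other opener, then [n]."""
    return re.compile(
        r"(" + opener + r")"
        r"((?:(?!" + opener + r").){1,600}?)"
        r"\[(\d{1,2})\]"
    )

def link_note_refs(html, skip_initials=True):
    stop = STOP if skip_initials else r"\. "
    return build_pattern(TAGS + "|" + stop).sub(REPLACEMENT, html)

if __name__ == "__main__":
    text = "<p>As J. K. Rowling wrote, magic matters[1].</p>"
    print(link_note_refs(text))                       # anchor at the sentence start
    print(link_note_refs(text, skip_initials=False))  # anchor lands after "K. "
```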

Note: This should only be regarded as a starting point.

The replace code places the <a id="noteref1"></a> at the very beginning of the sentence and the <sup id="ref1"><a href="#note1">1</a></sup> at the end (I'm using note 1 as an example but of course this will automate note 2, 3, 4, etc. as well - that's what Regex does).
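
Since a replace-all should be proofed afterwards, a small script can do a first pass before you read through by eye. The Python sketch below assumes the exact markup produced by the replace code above and reports three things: bracketed numbers that were never converted, note numbers that have an anchor without a link (or the reverse), and anchors that appear more than once. The function name check_note_refs is made up for illustration, and a clean report does not prove the anchors sit at the right sentence.

```python
import re

def check_note_refs(html):
    """Report note markers that look unconverted or badly paired."""
    # Bracketed numbers still in the text were missed by the replace.
    leftover = [int(n) for n in re.findall(r"\[(\d{1,2})\]", html)]
    # Sentence anchors: <a id="noterefN"></a>
    anchors = [int(n) for n in re.findall(r'<a id="noteref(\d{1,2})"></a>', html)]
    # Superscript links whose id, href and visible number all agree.
    links = [int(n) for n in re.findall(
        r'<sup id="ref(\d{1,2})"><a href="#note\1">\1</a></sup>', html)]
    return {
        "leftover": leftover,
        "unpaired": sorted(set(anchors) ^ set(links)),
        "duplicated": sorted(n for n in set(anchors) if anchors.count(n) > 1),
    }

if __name__ == "__main__":
    page = (
        '<p><a id="noteref1"></a>Claim'
        '<sup id="ref1"><a href="#note1">1</a></sup>. Missed one[2].</p>'
    )
    print(check_note_refs(page))
```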

Formatting the notes section is much easier, because we are working in paragraphs, but since this has been such a long post already, I'll leave that until another time.

If you have any hints and tips to extend this post, or if you've spotted an error, please comment - I'm learning too and all feedback is appreciated.
