What's in an EPUB 3 filename? Not much, it's all in the extension ... most of the time.


An interesting question popped up on Twitter today, and it reminded me how used we become to the names we give files and folders within our EPUB structures. It is easy to forget which names are set in stone and which we've chosen ourselves or had arbitrarily assigned for us by apps like InDesign.

Thankfully it is fairly clear-cut. The mimetype file and the META-INF folder, along with the files inside that folder, have fixed names. The only file that is required inside META-INF is container.xml, which identifies the root file of the EPUB package, i.e. the one that identifies the location of all the others. All other files and folders can have their names freely chosen (within the limits of Unicode).

META-INF/container.xml

The IDPF provides the following example of what the container.xml file might look like (and largely does):
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="EPUB/My Crazy Life.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
The text of this file is fixed in all but the full-path, which is entirely set by the creator but must point to a file with the OPF extension. This means that the folder name and file name contained in the path can be called anything, as long as they match the folder and file name in the actual EPUB. The rootfiles element can also hold more than one rootfile, which allows different versions of the book to be contained within the same package (see IDPF example).
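To make the lookup concrete, here is a minimal Python sketch of how a reading tool might pull the full-path values out of container.xml. The function name find_rootfiles and the SAMPLE string are my own for illustration (the sample simply repeats the IDPF example above); only the namespace and attribute names come from the file itself.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the container element in container.xml.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def find_rootfiles(container_xml):
    """Return the full-path of every rootfile listed in container.xml."""
    root = ET.fromstring(container_xml)
    query = f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile"
    return [element.get("full-path") for element in root.findall(query)]


# The IDPF example, repeated here so the sketch runs on its own.
SAMPLE = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="EPUB/My Crazy Life.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>"""

print(find_rootfiles(SAMPLE))  # ['EPUB/My Crazy Life.opf']
```

Because the function returns a list, it handles the multiple-rootfile case without any changes.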

Additional files in the META-INF folder

Other files that may be included in the META-INF folder are: (1) encryption.xml, which must be added if any content is encrypted; (2) manifest.xml and metadata.xml, which are reserved names that may be used but whose purpose and inner tags have yet to be defined by the spec; (3) rights.xml, which is available to handle DRM but whose use is once again undefined by the IDPF; and (4) signatures.xml, which provides a means of including digital signatures, with the documentation giving details of the content for this file.
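As a rough illustration, the fixed names in META-INF can be checked against a list of archive paths such as the one returned by Python's zipfile.ZipFile.namelist(). The function audit_meta_inf and the sample paths below are hypothetical; the set of reserved names is just the one described above.

```python
# Fixed file names inside META-INF, as described in the text above.
RESERVED_META_INF = {
    "container.xml", "encryption.xml", "manifest.xml",
    "metadata.xml", "rights.xml", "signatures.xml",
}


def audit_meta_inf(archive_names):
    """Report which fixed-name META-INF files appear in a list of archive paths."""
    prefix = "META-INF/"
    found = {
        name[len(prefix):]
        for name in archive_names
        if name.startswith(prefix) and not name.endswith("/")  # skip folder entries
    }
    return {
        "has_container": "container.xml" in found,  # the only required file
        "reserved": sorted(found & RESERVED_META_INF),
        "other": sorted(found - RESERVED_META_INF),
    }


# Hypothetical archive listing, for illustration only.
report = audit_meta_inf([
    "mimetype",
    "META-INF/container.xml",
    "META-INF/encryption.xml",
    "EPUB/book.opf",
])
print(report)
```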

mimetype

The mimetype file is a fixed and simple one-line file:
application/epub+zip
But there are certain rules about this file:
OCF ZIP Containers must include a mimetype file as the first file in the Container, and the contents of this file must be the MIME type string application/epub+zip encoded in US-ASCII [US-ASCII].
The contents of the mimetype file must not contain any leading padding or whitespace, must not begin with the Unicode signature (or Byte Order Mark), and the case of the MIME type string must be exactly as presented above. The mimetype file additionally must not be compressed or encrypted, and there must not be an extra field in its ZIP header.
Note in particular the requirement of US-ASCII text-encoding rather than Unicode, and the avoidance of any whitespace around the text.
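These rules are easy to break with a general-purpose zip tool, so here is a minimal sketch of writing the container with Python's standard zipfile module. The function name write_epub and the placeholder container.xml content are assumptions for illustration, and this is one way of satisfying the rules rather than the only one.

```python
import io
import zipfile

# US-ASCII bytes: no BOM, no leading whitespace, no trailing newline.
MIMETYPE = b"application/epub+zip"


def write_epub(target, files):
    """Write files (a dict of archive path -> bytes) into a ZIP with mimetype first."""
    with zipfile.ZipFile(target, "w") as archive:
        # mimetype must be the first entry and must be stored, not compressed.
        # A freshly created ZipInfo has an empty extra field.
        info = zipfile.ZipInfo("mimetype")
        info.compress_type = zipfile.ZIP_STORED
        archive.writestr(info, MIMETYPE)
        # Everything else may be compressed as usual.
        for path, data in files.items():
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)


buffer = io.BytesIO()
# Placeholder content only; a real EPUB needs a complete container.xml.
write_epub(buffer, {"META-INF/container.xml": b"<container/>"})

raw = buffer.getvalue()
# With no extra field, the name and the MIME string sit at fixed offsets.
print(raw[30:38], raw[38:58])
```

The final two lines show a handy side effect of the rules: because the entry is first, uncompressed and has no extra field, the string application/epub+zip always starts at byte 38 of the file.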

Content of the OPF file

The OPF file (or Package Document) has a single root element after the XML declaration, the package element, which must carry version and unique-identifier attributes. It must also contain three child elements: metadata, manifest and spine, which in turn have their own nested elements.
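The top-level requirements can be checked in a few lines. The sketch below is my own and only looks at the outer structure described here, not at the nested elements; check_package and SAMPLE_OPF are hypothetical names, while the namespace is the one used by OPF package documents.

```python
import xml.etree.ElementTree as ET

# Namespace of the OPF package document.
OPF_NS = "http://www.idpf.org/2007/opf"


def check_package(opf_xml):
    """Return a list of problems found in the top-level structure of an OPF file."""
    problems = []
    root = ET.fromstring(opf_xml)
    if root.tag != f"{{{OPF_NS}}}package":
        problems.append("root element is not package")
    for attribute in ("version", "unique-identifier"):
        if root.get(attribute) is None:
            problems.append(f"package is missing the {attribute} attribute")
    for child in ("metadata", "manifest", "spine"):
        if root.find(f"{{{OPF_NS}}}{child}") is None:
            problems.append(f"package is missing the {child} element")
    return problems


# Skeleton only: the three children are left empty for illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf"
    version="3.0" unique-identifier="pub-id">
    <metadata/>
    <manifest/>
    <spine/>
</package>"""

print(check_package(SAMPLE_OPF))  # []
```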

These essential requirements of the OPF file give us the bare bones of an EPUB document. One of the most crucial is the manifest's requirement that exactly one item must be identified as the EPUB Navigation Document (and that this file must exist in the EPUB). As long as we fulfil these requirements, the OPF file, the META-INF content and the mimetype file will give us a valid EPUB once zipped together.
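The Navigation Document is flagged in the manifest by an item whose properties attribute includes nav. A minimal sketch of the "exactly one, and it must exist" check follows; the function names, the sample manifest and the file names in it are all hypothetical, and I am assuming hrefs are resolved relative to the folder holding the OPF file.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = "http://www.idpf.org/2007/opf"


def find_nav_paths(opf_xml, opf_path):
    """Return archive paths of manifest items flagged with the nav property."""
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_path)  # hrefs are taken relative to the OPF file
    paths = []
    for item in root.findall(f"{{{OPF_NS}}}manifest/{{{OPF_NS}}}item"):
        # properties is a space-separated list, so split before testing for nav.
        if "nav" in (item.get("properties") or "").split():
            href = unquote(item.get("href"))
            paths.append(posixpath.normpath(posixpath.join(base, href)))
    return paths


def nav_is_valid(opf_xml, opf_path, archive_names):
    """True when exactly one nav item is declared and its file is in the archive."""
    paths = find_nav_paths(opf_xml, opf_path)
    return len(paths) == 1 and paths[0] in archive_names


# Hypothetical manifest, trimmed to the part this check reads.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf"
    version="3.0" unique-identifier="pub-id">
    <manifest>
        <item id="toc" href="nav.xhtml"
            media-type="application/xhtml+xml" properties="nav"/>
        <item id="c1" href="chapter1.xhtml"
            media-type="application/xhtml+xml"/>
    </manifest>
</package>"""

print(find_nav_paths(SAMPLE_OPF, "EPUB/package.opf"))  # ['EPUB/nav.xhtml']
```

Note that the file name nav.xhtml is itself a free choice; it is the nav property, not the name, that marks the file out.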

Note: as a point of interest, the EPUB Navigation Document can also be reused as a table of contents inside the EPUB itself.


