Adventures in PDF: Swift and PDFKit


Working with PDFs in OS X is straightforward using PDFKit (update: and in iOS since iOS 11).

PDFDocument

It all starts with the PDFDocument class. To use it, import Quartz and load a document:
import Quartz

// The bundle lookup and PDFDocument(url:) both return optionals, so unwrap them together
if let url = Bundle.main.url(forResource: "myPDF", withExtension: "pdf"),
   let pdf = PDFDocument(url: url) {
    pdf.pageCount // number of pages in document
    pdf.string // entire text of document
}
As you can see, counting the pages and accessing the text is straightforward. There's plenty more to explore here, but I want to move quickly along to working with pages.

PDFPage

While you can access the entire document as a string, which is useful for searches, more often you'll want to work with PDFPage instances. You access pages by their zero-based index:
import Quartz

if let url = Bundle.main.url(forResource: "myPDF", withExtension: "pdf"),
   let pdf = PDFDocument(url: url),
   let page = pdf.page(at: 10) { // a PDFPage instance; index 10 is the eleventh page
    page.attributedString // attributed string for the PDFPage instance
}
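
For the search case mentioned above, here is a minimal sketch of finding which pages contain a term. It assumes the current Swift names for the PDFKit calls (findString(_:withOptions:), PDFSelection.pages and index(for:)); the helper name pageIndexes(of:in:) is made up for illustration.

```swift
import Quartz

/// Returns the zero-based indexes of the pages on which `term` occurs.
/// (Hypothetical helper, not part of PDFKit.)
func pageIndexes(of term: String, in pdf: PDFDocument) -> [Int] {
    // findString returns one PDFSelection per match
    let matches = pdf.findString(term, withOptions: [.caseInsensitive])
    var indexes = Set<Int>()
    for selection in matches {
        // a selection can span more than one page
        for page in selection.pages {
            indexes.insert(pdf.index(for: page))
        }
    }
    return indexes.sorted()
}
```

Given a loaded document, pageIndexes(of: "swift", in: pdf) would return the indexes you can then pass to the page accessor shown above.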

You could convert an entire PDF to a single NSAttributedString:
import Quartz

if let url = Bundle.main.url(forResource: "myPDF", withExtension: "pdf"),
   let pdf = PDFDocument(url: url) {
    let docStr = NSMutableAttributedString()
    for i in 0..<pdf.pageCount {
        // skip pages that have no extractable text
        if let pageStr = pdf.page(at: i)?.attributedString {
            docStr.append(pageStr)
        }
    }
}

but this would take a good deal of time for a long PDF. If you simply want to display the PDF, then PDFView is the class you're looking for.

PDFView

Creating a PDFView is once again straightforward:
import Quartz

let view = PDFView(frame: CGRect(x: 0, y: 0, width: 500, height: 750))
if let url = Bundle.main.url(forResource: "myPDF", withExtension: "pdf"),
   let pdf = PDFDocument(url: url) {
    view.document = pdf // hand the document to the view for display
}
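
A slightly fuller sketch of the same setup, assuming the current Swift property names on PDFView (document, autoScales and displayMode). The helper name makePDFView(showing:) is hypothetical, and the frame is just the one used above.

```swift
import Quartz

/// Builds a PDFView for an already loaded document.
/// (Hypothetical helper, not part of PDFKit.)
func makePDFView(showing pdf: PDFDocument) -> PDFView {
    let view = PDFView(frame: CGRect(x: 0, y: 0, width: 500, height: 750))
    view.document = pdf
    view.autoScales = true                    // scale pages to fit the view
    view.displayMode = .singlePageContinuous  // one column of pages, scrolled vertically
    return view
}
```

The returned view can be added to a window's content view like any other view.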

PDF Extras

Beyond the document, page and view classes there are PDFOutline, PDFSelection, and PDFAnnotation, the last of which is a superclass of twelve further classes. Completing the PDFKit classes, there's also PDFBorder, which is used for adding decoration to annotations.
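
As a taste of PDFOutline, here is a minimal sketch that flattens a document's outline (its table of contents) into indented lines. It assumes the current Swift names numberOfChildren, child(at:) and label; the function name outlineLabels(of:depth:) is made up for illustration.

```swift
import Quartz

/// Collects the labels of every outline item below `item`,
/// indenting each label by two spaces per nesting level.
/// (Hypothetical helper, not part of PDFKit.)
func outlineLabels(of item: PDFOutline, depth: Int = 0) -> [String] {
    var lines: [String] = []
    for i in 0..<item.numberOfChildren {
        guard let child = item.child(at: i) else { continue }
        let indent = String(repeating: "  ", count: depth)
        lines.append(indent + (child.label ?? "(untitled)"))
        // recurse into nested entries
        lines += outlineLabels(of: child, depth: depth + 1)
    }
    return lines
}
```

For a loaded document you would start from its root item, for example: if let root = pdf.outlineRoot { outlineLabels(of: root) }. A PDF without an outline has no root item, so nothing is collected.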

To explore in depth, read through the PDFKit Programming Guide.

Comments

  1. Hi Anthony - quick question. In the returned PDFPage.attributedString(), are the x,y,width,height of each word attributed?

  2. Hello there!

    This article is great

    One question: how can I open PDFs that are anywhere on the hard drive?

    Replies
    1. Hi Wilson,
      You can either read the Apple Docs on File Handling or use a library I created called SwiftFiles. With the latter, having copied the files from the playground into your project, you can then write for example:
      let data = FileLoad.loadData("myFile.pdf", directory: NSSearchPathDirectory.DocumentDirectory, subdirectory: "")
      let pdf = PDFDocument(data: data)

    2. let url = NSURL(string: "file:////Volumes/SPACE/Documents/a.pdf")

      for example

      let's not make a meal out of this!

  3. How do I generate a PDF file with attributed text in Swift?

    Replies
    1. There's an ObjC post on Cocoanetics that I'm in the middle of translating into Swift that might help.

  4. This is great, as I'm looking at doing some Swift scripting on PDFKit. I'm a bit new to the language. How do you get the actual page number out of "pdf.pageCount()"?
    I've tried "let pageNum = pdf.pageCount()", but that doesn't seem to work.

    Replies
    1. A PDF page can tell you its page number based on its physical position in the PDF, but it might also have a different page number assigned from InDesign, for example, if the numbering doesn't begin at one. Sample code (Swift 3):

      import Quartz

      if let url = Bundle.main.url(forResource: "0.5-Glass", withExtension: "pdf"), let pdf = PDFDocument(url: url) {
          let page = pdf.page(at: 10) // returns a PDFPage instance
          page?.label // page number given to page
          page?.pageRef?.pageNumber // page number based on position in PDF
      }

  5. pageNum = pdf.numberOfPages
    works for me.
    pageNum = pdf.pageCount()
    doesn't.
    Thanks.
