Original methods (using Cocoa)
A week or so ago, I wrote a series of methods for a fork of a Gist String extension that simply borrowed, unthinkingly, from the Cocoa Framework:
extension String {
    func splitStringByCharacters() -> [String] {
        var arr = [String]()
        self.enumerateSubstringsInRange(Range(start: self.startIndex, end: self.endIndex),
            options: NSStringEnumerationOptions.ByComposedCharacterSequences,
            { (substring, substringRange, enclosingRange, bool) -> () in arr.append(substring)})
        return arr
    }
    func splitStringBySentences() -> [String] {
        var arr = [String]()
        self.enumerateSubstringsInRange(Range(start: self.startIndex, end: self.endIndex),
            options: NSStringEnumerationOptions.BySentences,
            { (substring, substringRange, enclosingRange, bool) -> () in arr.append(substring)})
        return arr
    }
    func splitStringByLines() -> [String] {
        var arr = [String]()
        self.enumerateSubstringsInRange(Range(start: self.startIndex, end: self.endIndex),
            options: NSStringEnumerationOptions.ByLines,
            { (substring, substringRange, enclosingRange, bool) -> () in arr.append(substring)})
        return arr
    }
    func splitStringByWords() -> [String] {
        var arr = [String]()
        self.enumerateSubstringsInRange(Range(start: self.startIndex, end: self.endIndex),
            options: NSStringEnumerationOptions.ByWords,
            { (substring, substringRange, enclosingRange, bool) -> () in arr.append(substring)})
        return arr
    }
    func splitStringByParagraphs() -> [String] {
        var arr = [String]()
        self.enumerateSubstringsInRange(Range(start: self.startIndex, end: self.endIndex),
            options: NSStringEnumerationOptions.ByParagraphs,
            { (substring, substringRange, enclosingRange, bool) -> () in arr.append(substring)})
        return arr
    }
}
So I decided it was time to rewrite these following the principles of pure Swift.
Rewriting without Cocoa
Three of the methods were beautifully simple to rewrite:
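extension String {
    // Each Character of the string becomes one array element
    func splitStringByCharacters() -> [Character] {
        return map(self){return $0}
    }
    // Splits on line separators, discarding empty pieces
    func splitStringByLines() -> [String] {
        return split(self, {contains("\u{2028}\n\r", $0)}, allowEmptySlices: false)
    }
    // Splits on spaces, common punctuation and line/paragraph separators
    func splitStringByWords() -> [String] {
        return split(self, {contains(" .,!:;()[]{}<>?\"'\u{2028}\u{2029}\n\r", $0)}, allowEmptySlices: false)
    }
    // Splits on paragraph separators, discarding empty pieces
    func splitStringByParagraphs() -> [String] {
        return split(self, {contains("\u{2029}\n\r", $0)}, allowEmptySlices: false)
    }
}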
Although I might not have captured every eventuality in my selection of separator characters, it covers most situations.
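For readers on a current toolchain, here is a minimal sketch of the same idea. It assumes Swift 5 or later, where the free functions `split` and `contains` used above are no longer available and the equivalent is the `split(whereSeparator:)` method. The helper name `pieces(separatedBy:)` and the two separator sets are illustrative, not part of the original extension. Note that in current Swift a carriage return followed by a line feed forms a single `Character`, so it is listed explicitly.

```swift
extension String {
    /// Hypothetical helper: splits the string wherever a character
    /// from `separators` occurs and drops the empty pieces.
    func pieces(separatedBy separators: Set<Character>) -> [String] {
        return split(whereSeparator: { separators.contains($0) }).map { String($0) }
    }
}

// Illustrative separator sets mirroring the character lists above.
let lineBreaks: Set<Character> = ["\n", "\r", "\r\n", "\u{2028}"]
let wordBreaks: Set<Character> = [
    " ", ".", ",", "!", ":", ";", "(", ")", "[", "]", "{", "}",
    "<", ">", "?", "\"", "'", "\u{2028}", "\u{2029}", "\n", "\r", "\r\n"
]

let lines = "one\ntwo\r\nthree".pieces(separatedBy: lineBreaks)
let words = "Hello, world! (Again)".pieces(separatedBy: wordBreaks)
```

Passing the separators in as a set keeps the choice of dividing characters in the caller's hands, which is the main thing gained by leaving Cocoa behind.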
The last hurdle
It was then that I was faced with splitting sentences and realised that while I could easily split sentences based on their end punctuation, if I did this then I would lose the punctuation mark each time (which is something that NSStringEnumerationOptions.BySentences keeps).
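To make the problem concrete, here is a small sketch of the naive approach, written for Swift 5 or later as an assumption (the sample text and variable names are made up for illustration). Splitting on the terminators discards them, and the leading spaces of the following sentences remain.

```swift
// Sentence terminators, matching the ones considered in this post.
let terminators: Set<Character> = [".", "?", "!", "\u{2026}"]
let sample = "Is it done? Yes. Good!"

// Splitting on the terminators throws the terminators away.
let naive = sample.split(whereSeparator: { terminators.contains($0) }).map { String($0) }
// naive == ["Is it done", " Yes", " Good"]
```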
So here's my solution:
extension String {
    func splitStringBySentences() -> [String] {
        // characters treated as sentence terminators
        let arr: [Character] = ["\u{2026}", ".", "?", "!"]
        var startInd = self.startIndex
        var strArr = [String]()
        for b in enumerate(self) {
            for a in arr {
                if a == b.element {
                    // index of the terminator that was just found
                    let endInd = advance(self.startIndex, b.index, self.endIndex)
                    //TODO: add method to allow for multiple punctuation at end of sentence, e.g. ??? or !!!
                    var str = self[startInd...endInd]
                    // removes a single leading space or line break from the sentence
                    if contains(" \u{2028}\u{2029}\n\r", first(str)!) {
                        str = dropFirst(str)
                    }
                    strArr.append(str)
                    // the next sentence starts just after the terminator
                    startInd = advance(endInd, 1, self.endIndex)
                }
            }
        }
        // note: any trailing text without a terminator is not included
        return strArr
    }
}
This method is more cumbersome than before, and others might have a more succinct approach, but actually converting these methods into pure Swift provides us with greater control over the choice of dividing characters in every instance, and also greater control over what ends up in the array.
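As one possible take on a more succinct approach, here is a sketch for Swift 5 or later (an assumption, since the listing above relies on older free functions such as `enumerate` and `advance`). The method name `sentencesKeepingPunctuation` is hypothetical. It departs from the version above in three ways: it treats a run of terminators such as "?!" as one sentence ending, it strips all leading whitespace rather than a single character, and it keeps trailing text that has no terminator. It shares the limitation that any full stop ends a sentence, so a decimal such as 3.14 would still be split.

```swift
extension String {
    /// Hypothetical sketch: splits into sentences, keeping the end punctuation.
    func sentencesKeepingPunctuation() -> [String] {
        let terminators: Set<Character> = [".", "?", "!", "\u{2026}"]
        var sentences = [String]()
        var current = ""
        var previousWasTerminator = false

        for character in self {
            let isTerminator = terminators.contains(character)
            // A sentence ends at the first non-terminator after a run of terminators.
            if previousWasTerminator && !isTerminator {
                sentences.append(current)
                current = ""
            }
            // Skip spaces and line breaks at the start of a sentence.
            if current.isEmpty && character.isWhitespace {
                previousWasTerminator = false
                continue
            }
            current.append(character)
            previousWasTerminator = isTerminator
        }
        // Keep whatever is left, with or without a terminator.
        if !current.isEmpty {
            sentences.append(current)
        }
        return sentences
    }
}

let sentences = "Really?! Yes. Good".sentencesKeepingPunctuation()
// sentences == ["Really?!", "Yes.", "Good"]
```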
Conclusion
We still need the Cocoa Framework to build apps, but when working with Strings and Arrays there's very little that can't be done in Swift. And the more you work in Swift, the more you think in Swift.
This code makes assumptions about the text that will break for a lot of languages.