Playing with the Primitives: Characters, chars and strings in Swift and Objective-C (Xcode 6.1)



Going on a Swift safari

The purpose of this post is not to explain the C data type named char but instead to make some observations. These observations may or may not be of use or interest to you; I merely report them here as a way of recording my findings from a recent safari into the Swift language.

char and CChar

In the Objective-C documentation Apple describes char as follows: "The term C string refers to the standard char * type." A char * can be assigned a single character or even a whole string of characters:
char *cc = "xcdewd";
NSString *cd = [NSString stringWithCString:cc encoding:NSUTF8StringEncoding];
NSLog(@"%@",cd);
Whereas in Swift we cannot use char as a type at all. Instead we use Swift's equivalent type CChar. A CChar can only be assigned a single Int8 (of which it is a typealias), so we must write:
var char:[CChar] = [97,98,99,0] // C strings are null-terminated, so we append a 0
String.fromCString(&char) // "abc" (optional string)
String(UTF8String: &char)  // "abc" (optional string)
To decode the array back into a String, we can use fromCString() or the UTF8String initializer (both return an optional, since the bytes might not be valid UTF-8).
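Going the other way, from a String to a null-terminated CChar array, can be sketched like this (using the Xcode 6.1-era APIs discussed in this post; the conversion to CChar assumes the UTF-8 bytes fit in an Int8, so we keep it to ASCII for simplicity):

```swift
// From String to null-terminated [CChar] and back again.
// ASCII-only sketch: UTF-8 bytes above 127 would not fit in Int8.
let source = "abc"
var bytes: [CChar] = Array(source.utf8).map { CChar($0) } + [0]
bytes // [97, 98, 99, 0]
String.fromCString(&bytes) // "abc" (optional string)
```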

unichar

Switching back to Objective-C, I now want to look at a unichar array, which is suited to storing data units the same size as UTF-16 code units and can only contain number values (i.e. a unichar cannot be created from characters or strings):
unichar cs[2];
cs[0] = 97;
cs[1] = 104;
NSString *cds = [NSString stringWithCharacters:cs length:2];
The same is true in Swift, but with the added convenience of counting the number of unichar values via the array's count property. In Objective-C we would have to compute the length ourselves (e.g. sizeof(cs)/sizeof(unichar)).
var cc:[unichar] = [97,104]
NSString(characters: &cc, length: cc.count)
Note: A point of interest is that UniChar is declared in MacTypes.h and unichar is declared in NSString.h but both are type aliases for UInt16 in Swift.
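Because unichar values are UTF-16 code units, a single character outside the Basic Multilingual Plane takes two of them. A quick sketch (the surrogate pair 0xD834, 0xDD1E encodes U+1D11E, MUSICAL SYMBOL G CLEF):

```swift
import Foundation

// One character, two UTF-16 code units: a surrogate pair.
var clef: [unichar] = [0xD834, 0xDD1E] // U+1D11E
NSString(characters: &clef, length: clef.count) // "𝄞"
```

Note that cs.count here reports code units, not characters, which is exactly the distinction the Playground keeps in front of us.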

Switch chars for ints

In Objective-C a unichar is equivalent to an unsigned short, which is an Integer type, "Not smaller than char. At least 16 bits." (cplusplus.com) and in fact we can rewrite our code like this
unsigned short cs[2];
cs[0] = 97;
cs[1] = 104;
NSString *cds = [NSString stringWithCharacters:cs length:2]; // "ah"
without any ill effects. And since unichar is a type alias of UInt16 and CChar is a type alias of Int8 we could rewrite the earlier Swift code without using unichar and CChar declarations as well:
var cc:[UInt16] = [97,104]
NSString(characters: &cc, length: cc.count)

var char:[Int8] = [97,98,99,0] // null-terminated, as C strings require
String.fromCString(&char) // "abc" (optional string)
String(UTF8String: &char)  // "abc" (optional string)

Further experimentation

Looking a bit closer at the CChar and UniChar types
// MacTypes.h
UTF8Char("X") // 88
UTF16Char("X") // error - Does not conform to ExtendedGraphemeClusterLiteralConvertible
UniChar("X") // error - Does not conform to ExtendedGraphemeClusterLiteralConvertible
UTF32Char("X") // 88

// Swift
CChar("X") // error - Does not conform to ExtendedGraphemeClusterLiteralConvertible
CChar16("X") // error - Does not conform to ExtendedGraphemeClusterLiteralConvertible
CChar32("X") // 88
we find that the ability to initialise from a character literal varies from type to type, but every one of them can be initialised from a number.

UnicodeScalars and Character

As well as CChar16 (equivalent to char16_t) and CChar32 (equivalent to char32_t), Swift also has CWideChar (equivalent to wchar_t), which is a type alias of UnicodeScalar. A UnicodeScalar (and in turn a CWideChar) can be initialised from a UInt32, UInt16, UInt8 or another UnicodeScalar, while a Character can be initialised from a single-character String or a UnicodeScalar.
let aScalar = UnicodeScalar("a")
let bString = "a"
Character(aScalar)
Character(bString)
In addition to Character, Swift also contains the following three structs: UTF8, UTF16 and UTF32. All three conform to the UnicodeCodecType protocol. You'll notice, however, that there is something only UTF8 and UTF32 can do:
UTF8.CodeUnit("x") // 120
UTF16.CodeUnit(121) // 121
UTF32.CodeUnit("z") // 122
Throughout, Swift keeps us close to the bytes via the Playground environment (and the documentation). A UnicodeScalar is displayed in a Playground's preview area not as a character but as its Unicode code point number.
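To get at that number directly in code, a UnicodeScalar exposes its code point through its value property (a UInt32). A minimal sketch:

```swift
// The numeric code point behind a UnicodeScalar is exposed
// by its value property (a UInt32).
let scalar = UnicodeScalar("a")
scalar.value // 97
String(scalar) // "a"
```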

Byte arrays

As an aside, to convert strings into byte arrays we can do the following:
[UInt8]("é".utf8) // [195,169]
[UInt16]("é".utf16) // [233]
If you are unclear about why the two arrays contain different numbers for the same character, please refer to discussion of unicode and code points in this earlier post.
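There is also a third view worth knowing about: unicodeScalars, which yields the code points themselves, independent of any particular encoding. A small sketch:

```swift
// The unicodeScalars view exposes code points directly,
// independent of UTF-8 or UTF-16 encoding.
var points = [UInt32]()
for scalar in "é".unicodeScalars {
    points.append(scalar.value)
}
points // [233]
```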

Strings

In the String section, I'm going to focus solely on Swift (since that is the greater point of interest here). In Swift there are NSStrings and native Strings. A String has the NSString methods available to it, but the opposite is not true, so you will rarely need to use NSString directly.
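As a small illustration of this bridging (a sketch, assuming Foundation is imported): pathExtension is defined on NSString, not on String, yet it is available on a native String all the same:

```swift
import Foundation

// NSString API available on a native String via bridging
// (Xcode 6.1-era behaviour).
let filename = "hello.txt"
filename.pathExtension // "txt" (an NSString property)
NSString(string: filename).pathExtension // the same, explicitly
```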

A String is a struct that can be initialised from a literal string, a Character, a UnicodeScalar or a repeated number of Characters or UnicodeScalars.
let aScalar = UnicodeScalar("a")
let bString = "a"
Character(aScalar)
let g = Character(bString)

String("Hello World")
String(g)
String(aScalar)
String(count: 10, repeatedValue:aScalar)
String(count: 10, repeatedValue: g)
The String type can also be initialised from signed and unsigned integer types, returning binary, octal, decimal or hexadecimal string representations depending on the radix.
String(-10) // "-10"
String(10) // "10"
String(10, radix: 2) // "1010"
String(2564, radix: 16, uppercase: true) // "A04"

An instance of String can append a Character or UnicodeScalar and be extended with a String or sequence.
var strA = "abc"
strA.append(Character("d"))
strA.append(UnicodeScalar("e"))
strA // abcde

strA.extend("fgh")
strA.extend(["i","j","k"])
strA // abcdefghijk
The appended sequence in the example above is an array of single-character strings, which are interpreted as Characters. If we wanted to be pedantic, we'd write:
strA.extend([Character("i"),Character("j"),Character("k")])
This is a reminder that a Swift String can itself always be broken down into a sequence of Characters using for-in:
for a in strA {
    println(a)
}
And there's something important to take away here about the relationship between strings and characters, whether we are discussing NSString and a char array in Objective-C or String and Character in Swift. No matter how many hoops we must jump through to transform one into the other, there is always a connection to be made between a string and a sequence (or array) of characters. Those characters are in turn reducible to one or more code units, which are simply numbers whose size in bits is determined (when working with Unicode characters) by the relevant UTF specification.
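That chain from string to characters to numbers can be sketched in a couple of lines:

```swift
// A String decomposes into Characters, and each Character
// in turn into code units, which are just numbers.
let word = "hi"
let characters = Array(word)      // ["h", "i"] as [Character]
let codeUnits = Array(word.utf8)  // [104, 105] as [UInt8]
```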

 The end.


Comments

  1. "we have the added functionality of being able to count the number of unichar characters. Something that is not possible in Objective-C." Hu? [NSString stringWithCharacters:cs length:sizeof(cs)/sizeof(unichar)]; or sizeof(cs)/sizeof(typeof(cs[0])) or a macro of that.

  2. Thank you for the comment and for pulling me up on this.
