Explaining the CATransform3D Matrix: Translation, Scale and Rotation (Swift, iOS, Xcode)


To understand the basics of using CATransform3D functions, see this earlier post and this earlier post. The current post explains in greater detail what is meant by the matrices presented in the Apple documentation for the CATransform3D functions, and it is advisable first to read this post on CGAffineTransform matrices in order to fully understand what a transformation matrix is.

Translation

CATransform3DMakeTranslation() tells us that it
Returns a transform that translates by '(tx, ty, tz)'. t' = [1 0 0 0; 0 1 0 0; 0 0 1 0; tx ty tz 1]
As with the CGAffineTransform matrix, the semicolons separate the rows, so this should be read as a grid, top to bottom:

1  0  0  0
0  1  0  0
0  0  1  0
tx ty tz 1


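Each transformed coordinate is found by multiplying the point (x, y, z, 1), term by term, by one column of the grid read from top to bottom, and adding the results. So the translated x position should be read: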
x'  = 1 * x + 0 * y + 0 * z + tx

The translated y position should similarly be read:
y'  = 0 * x + 1 * y + 0 * z + ty
The translated z position:
z'  = 0 * x + 0 * y + 1 * z + tz
And the fourth (homogeneous) coordinate, which stays at 1:
1 =  0 * x + 0 * y + 0 * z + 1 * 1
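
The column-by-column reading above amounts to multiplying the row vector (x, y, z, 1) by the matrix. Here is a minimal pure-Swift sketch of that arithmetic, assuming a plain `[[Double]]` grid as a stand-in for CATransform3D so that it runs without QuartzCore; `transform(_:by:)` and `translation(_:_:_:)` are hypothetical helpers, not Core Animation API.

```swift
import Foundation

/// Hypothetical helper: multiplies the row vector (x, y, z, 1) by a 4x4 grid
/// laid out like the one above, so grid[row][column] plays the role of m(row+1)(column+1).
func transform(_ point: (x: Double, y: Double, z: Double), by grid: [[Double]]) -> [Double] {
    let v = [point.x, point.y, point.z, 1.0]
    // Each output coordinate reads one column of the grid from top to bottom.
    return (0..<4).map { column in
        (0..<4).reduce(0.0) { sum, row in sum + v[row] * grid[row][column] }
    }
}

/// Hypothetical helper mirroring the layout documented for CATransform3DMakeTranslation.
func translation(_ tx: Double, _ ty: Double, _ tz: Double) -> [[Double]] {
    return [
        [1,  0,  0,  0],
        [0,  1,  0,  0],
        [0,  0,  1,  0],
        [tx, ty, tz, 1],
    ]
}

// Only the fourth row differs from the identity, so each coordinate simply gains tx, ty or tz.
let moved = transform((x: 5, y: 10, z: 0), by: translation(100, -20, 3))
print(moved) // [105.0, -10.0, 3.0, 1.0]
```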

Scaling

CATransform3DMakeScale() tells us that it
Returns a transform that scales by '(sx, sy, sz)'. t' = [sx 0 0 0; 0 sy 0 0; 0 0 sz 0; 0 0 0 1]
Once again the matrix can be viewed like so:

sx 0  0  0
0  sy 0  0
0  0  sz 0
0  0  0  1
Where the scaled x position should be read:
x'  = sx * x + 0 * y + 0 * z + 0
The scaled y position should be read:
y'  = 0 * x + sy * y + 0 * z + 0
The scaled z position:
z'  = 0 * x + 0 * y + sz * z + 0
And the fourth (homogeneous) coordinate, which stays at 1:
1 =  0 * x + 0 * y + 0 * z + 1 * 1
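
The same row-vector arithmetic shows scaling at work. This sketch repeats the hypothetical `transform(_:by:)` helper so that it stands alone; `scale(_:_:_:)` is an illustrative stand-in for CATransform3DMakeScale, not the real function.

```swift
import Foundation

/// Hypothetical helper: (x, y, z, 1) multiplied by a 4x4 grid, one column per output coordinate.
func transform(_ point: (x: Double, y: Double, z: Double), by grid: [[Double]]) -> [Double] {
    let v = [point.x, point.y, point.z, 1.0]
    return (0..<4).map { column in
        (0..<4).reduce(0.0) { sum, row in sum + v[row] * grid[row][column] }
    }
}

/// Hypothetical helper mirroring the layout documented for CATransform3DMakeScale.
func scale(_ sx: Double, _ sy: Double, _ sz: Double) -> [[Double]] {
    return [
        [sx, 0,  0,  0],
        [0,  sy, 0,  0],
        [0,  0,  sz, 0],
        [0,  0,  0,  1],
    ]
}

// Doubling x and halving y; z is multiplied by sz = 1, so it is unchanged.
let scaled = transform((x: 40, y: 30, z: 7), by: scale(2, 0.5, 1))
print(scaled) // [80.0, 15.0, 7.0, 1.0]
```

Because the fourth row adds nothing, a point at the origin stays where it is: scaling happens about the origin, which for a layer is its anchorPoint.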

Rotation

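Rotation is the trickiest of the three transformations to understand because, rather than taking a separate value for each axis, the function takes an angle and a vector that defines the axis of rotation, and the documentation does not show the resulting matrix. CATransform3DMakeRotation() tells us that it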
Returns a transform that rotates by 'angle' radians about the vector '(x, y, z)'. If the vector has length zero the identity transform is returned.
The x, y and z values are not a distance but a multiplication factor. A value of 0.0 means the axis of rotation has no component in that direction; only when all three are 0.0 (a vector of length zero) is the "identity transform" returned and nothing changes. It is the relationship between the values that determines the result of the rotation, since only the direction of the vector matters: (0.0, 2.0, 2.0) describes the same axis as (0.0, 1.0, 1.0). If z were set to 2.0 and y to 1.0, the rotation of any shape would be weighted to a greater extent towards rotation about the z-axis than about the y-axis, because the vector would be pulled more in that direction than the other.
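
Apple's header does not spell out the rotation matrix. As a sketch, assuming Core Animation builds it with the standard axis-angle (Rodrigues) formula written in the same row layout as the grids above (an assumption worth checking in a playground), both behaviours described here can be seen: the axis is normalized, so only the ratio between x, y and z matters, and only a zero-length vector yields the identity. The `rotation(_:_:_:_:)` helper is hypothetical.

```swift
import Foundation

/// Hypothetical helper: a 4x4 rotation grid in the row-vector layout used above
/// (grid[0][1] plays the role of m12, and so on), built with the standard
/// axis-angle formula. This is an assumption, not Apple's published code.
func rotation(_ angle: Double, _ x: Double, _ y: Double, _ z: Double) -> [[Double]] {
    let length = (x * x + y * y + z * z).squareRoot()
    // A zero-length vector defines no axis, so return the identity transform.
    guard length > 0 else {
        return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    }
    // Normalize: only the direction of (x, y, z) matters, not its length.
    let ux = x / length, uy = y / length, uz = z / length
    let c = cos(angle)
    let s = sin(angle)
    let t = 1 - c
    let m11 = c + ux * ux * t, m12 = ux * uy * t + uz * s, m13 = ux * uz * t - uy * s
    let m21 = ux * uy * t - uz * s, m22 = c + uy * uy * t, m23 = uy * uz * t + ux * s
    let m31 = ux * uz * t + uy * s, m32 = uy * uz * t - ux * s, m33 = c + uz * uz * t
    return [[m11, m12, m13, 0], [m21, m22, m23, 0], [m31, m32, m33, 0], [0, 0, 0, 1]]
}

// (0, 2, 2) and (0, 1, 1) point the same way, so they describe the same rotation.
let doubled = rotation(Double.pi / 3, 0, 2, 2)
let unit = rotation(Double.pi / 3, 0, 1, 1)
// Setting x and y to 0 does not give the identity: this is a 90-degree turn about z.
let aboutZ = rotation(Double.pi / 2, 0, 0, 1)
// Only the zero-length vector (0, 0, 0) gives the identity.
let unchanged = rotation(Double.pi / 2, 0, 0, 0)
print(aboutZ[0][1], aboutZ[1][0], unchanged == [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
```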

The rotation is "about the vector", not a stretch along the vector. The words here are carefully chosen in the Apple documentation: there is no elongation along the vector. Scaling and rotation are kept separate, although this doesn't exclude a sense of lengthening and shortening as parts of a rotated drawing turn towards or away from us.

Further into three-dimensional space

To explain, if we were to draw a line one pixel wide straight up from what we typically think of as an x,y-origin (rather than iOS's inverted origin), anchor the line at the origin and animate a 360-degree rotation around the z-axis, it would appear to go around a clock face. Now imagine a line one pixel high and 200 pixels wide that runs from the origin along the x-axis and rotates around the y-axis, and imagine yourself sat at the top of the y-axis looking straight down, rather than viewing the axes face on as before. Here you will see the line rotate as before, as if around a clock face. Finally, imagine a line drawn from the origin, 1 pixel wide and 200 pixels upwards, that rotates around the x-axis, and sit yourself at the end of the x-axis looking down its length. Once again you will see the line rotate as if around a clock face.

Because we cannot view a single object from all three positions at once, and have only one position from which we view it, we experience all but one of these rotations as something other than what they are. We experience the x and y rotations as foreshortening: the y rotation width-wise and the x rotation height-wise. So what we see and what is really happening (as we would see it from a different vantage point) are two different things.
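
The foreshortening can be put into numbers. In this sketch the rotations about the x- and y-axes are written out by hand in the same row-vector layout as the grids above, and the face-on view simply drops z, which is what a layer shows when no perspective (m34) is set. The `Point3` alias and both helpers are hypothetical. A 200-point line turned 60 degrees keeps its true length of 200 but appears only about 100 points tall (x rotation) or wide (y rotation).

```swift
import Foundation

typealias Point3 = (x: Double, y: Double, z: Double)

/// Hypothetical helper: rotation about the x-axis in the row-vector layout
/// (m22 = cos, m23 = sin, m32 = -sin, m33 = cos).
func rotateAboutX(_ p: Point3, by angle: Double) -> Point3 {
    let c = cos(angle)
    let s = sin(angle)
    return (x: p.x, y: p.y * c - p.z * s, z: p.y * s + p.z * c)
}

/// Hypothetical helper: rotation about the y-axis in the same layout
/// (m11 = cos, m13 = -sin, m31 = sin, m33 = cos).
func rotateAboutY(_ p: Point3, by angle: Double) -> Point3 {
    let c = cos(angle)
    let s = sin(angle)
    return (x: p.x * c + p.z * s, y: p.y, z: -p.x * s + p.z * c)
}

let sixtyDegrees = Double.pi / 3

// A vertical line 200 points tall, rotated about the x-axis.
let tipX = rotateAboutX((x: 0, y: 200, z: 0), by: sixtyDegrees)
// Its true length in 3D is unchanged: rotation does not stretch.
let trueLength = (tipX.x * tipX.x + tipX.y * tipX.y + tipX.z * tipX.z).squareRoot()
// Viewed face on, z is dropped, so only the y extent is visible (height-wise shortening).
let apparentHeight = tipX.y

// A horizontal line 200 points wide, rotated about the y-axis (width-wise shortening).
let tipY = rotateAboutY((x: 200, y: 0, z: 0), by: sixtyDegrees)
let apparentWidth = tipY.x

print(trueLength, apparentHeight, apparentWidth)
```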

But a computer can take into account all points of perspective at once. So if we were to rotate an object by 90 degrees about a vector with equal y and z factors, an axis lying midway between the y and z axes, the resulting line would appear to run close to 45 degrees, but it is not the same as a 45-degree rotation about the z-axis alone, or a line drawn at 45 degrees to the x and y axes (in two-dimensional thinking). Its position is instead calculated from a vector taking into account x, y and z values, and has the appearance of the corner of a Necker cube (which presents one face square-on to us and yet lets us see another side in perspective).
And since every corner of a cube face has a 90-degree angle, it appears as if this line has at once dropped 90 degrees from the y-axis and ticked 90 degrees around from the x-axis when taking into account a three-dimensional perspective.

The transformation code is actually written:
CATransform3DMakeRotation(degree2radian(90), 0.0, 1.0, 1.0)
with the equal y and z factors weighting the rotation evenly towards the y and z axes. But what we do and what we see have a complex relationship in three-dimensional space.
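
As a rough check of where that line ends up, the sketch below rotates the tip of a 200-point vertical line by 90 degrees about (0, 1, 1). It assumes the standard axis-angle (Rodrigues) formula and y-up coordinates, and `rotate(_:_:about:_:_:)` is a hypothetical helper rather than Apple's implementation. The tip lands at roughly (-141.4, 100, 100); seen face on, with z dropped, the line is about 173 points long and sits about 54.7 degrees from the vertical, near to, but not the same as, a flat 45-degree line.

```swift
import Foundation

typealias Point3 = (x: Double, y: Double, z: Double)

/// Hypothetical helper: rotates `p` by `angle` radians about the axis (ax, ay, az)
/// using the standard axis-angle (Rodrigues) formula. Assumed, not Apple's code.
func rotate(_ p: Point3, _ angle: Double, about ax: Double, _ ay: Double, _ az: Double) -> Point3 {
    let length = (ax * ax + ay * ay + az * az).squareRoot()
    guard length > 0 else { return p }  // zero-length axis: identity
    let kx = ax / length, ky = ay / length, kz = az / length
    let c = cos(angle)
    let s = sin(angle)
    let dot = kx * p.x + ky * p.y + kz * p.z  // k · p
    let crossX = ky * p.z - kz * p.y          // components of k × p
    let crossY = kz * p.x - kx * p.z
    let crossZ = kx * p.y - ky * p.x
    return (x: p.x * c + crossX * s + kx * dot * (1 - c),
            y: p.y * c + crossY * s + ky * dot * (1 - c),
            z: p.z * c + crossZ * s + kz * dot * (1 - c))
}

let tip = rotate((x: 0, y: 200, z: 0), Double.pi / 2, about: 0, 1, 1)
// Face on (no perspective), z is dropped: the visible line runs from the origin to (tip.x, tip.y).
let apparentLength = (tip.x * tip.x + tip.y * tip.y).squareRoot()
let degreesFromVertical = atan2(-tip.x, tip.y) * 180 / Double.pi
print(tip, apparentLength, degreesFromVertical)
```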

Here a 120-degree rotation is applied in equal measure to the y and z axes:
A far sharper sense of perspective is achieved as the x-axis appears to swing round towards us and the clock hand rotating about the y-axis has ticked on further away from us, but appearances are deceptive. If we add back the line that was drawn in the first image (here coloured green), then we see that the new line is foreshortened compared to the original, due to the additional rotation about the y-axis, and its tip is actually lower.

The new line seems to represent something nearer to 60 degrees, but as before this isn't precise, because the rotation is calculated about an axis in three-dimensional space rather than as a straight division of what we think of as x and y in two-dimensional space. For reference, this is what a (blue) line rotated 60 degrees would look like:
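
The comparison between the 90-degree and 120-degree rotations can be sketched with the same assumptions: the standard axis-angle formula, y-up coordinates, z dropped for a face-on view, and a hypothetical `rotate(_:_:about:_:_:)` helper. Under those assumptions the tip of the 120-degree line sits lower (about 50 rather than 100) and the visible line is shorter (about 132 rather than 173 points), in line with the observation above. Measured from the vertical, the visible lines sit at roughly 55 and 68 degrees, close to but not exactly the 45 and 60 degrees they resemble.

```swift
import Foundation

typealias Point3 = (x: Double, y: Double, z: Double)

/// Hypothetical helper: Rodrigues rotation of `p` by `angle` radians about (ax, ay, az).
/// Assumed to match CATransform3DMakeRotation in y-up coordinates; not Apple's code.
func rotate(_ p: Point3, _ angle: Double, about ax: Double, _ ay: Double, _ az: Double) -> Point3 {
    let length = (ax * ax + ay * ay + az * az).squareRoot()
    guard length > 0 else { return p }
    let kx = ax / length, ky = ay / length, kz = az / length
    let c = cos(angle)
    let s = sin(angle)
    let dot = kx * p.x + ky * p.y + kz * p.z
    let crossX = ky * p.z - kz * p.y
    let crossY = kz * p.x - kx * p.z
    let crossZ = kx * p.y - ky * p.x
    return (x: p.x * c + crossX * s + kx * dot * (1 - c),
            y: p.y * c + crossY * s + ky * dot * (1 - c),
            z: p.z * c + crossZ * s + kz * dot * (1 - c))
}

// The same 200-point vertical line, rotated about (0, 1, 1) by 90 and by 120 degrees.
let ninety = rotate((x: 0, y: 200, z: 0), Double.pi / 2, about: 0, 1, 1)
let oneTwenty = rotate((x: 0, y: 200, z: 0), 2 * Double.pi / 3, about: 0, 1, 1)

for (label, tip) in [("90°", ninety), ("120°", oneTwenty)] {
    // Face on, z is dropped: measure only what is visible in the x-y plane.
    let visibleLength = (tip.x * tip.x + tip.y * tip.y).squareRoot()
    let fromVertical = atan2(-tip.x, tip.y) * 180 / Double.pi
    print(label, "tip height:", tip.y, "visible length:", visibleLength, "degrees from vertical:", fromVertical)
}
```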



For more detailed explanations see here, here and also here.

Note: in the first illustration the black line representing the x-axis is actually a line drawn exactly the same as the one standing vertically, but rotated 90 degrees about the z-axis. In the second illustration the same line is rotated 120 degrees. In fact every rotation is of a vertical line starting in the same position with the same height, and no scaling is performed.

Final Point

A CATransform3D has 16 property values, named m11-m14, m21-m24, m31-m34 and m41-m44. The first digit corresponds to the matrix row and the second to the column, so m43 is the value in row 4, column 3 when the matrix is arranged in the format shown above (for a translation, that is tz).
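
The row/column naming can be mirrored on a plain grid. In this sketch `m(_:_:of:)` is a hypothetical helper that reads entry mRC (1-based, as in m43) from a `[[Double]]` laid out like the matrices above; on Apple platforms the same value is simply the `m43` property of a CATransform3D.

```swift
import Foundation

/// Hypothetical helper: returns m<row><column> (1-based, as in m43)
/// from a 4x4 grid laid out like the matrices above.
func m(_ row: Int, _ column: Int, of grid: [[Double]]) -> Double {
    return grid[row - 1][column - 1]
}

let (tx, ty, tz) = (100.0, -20.0, 3.0)
// The translation grid as documented for CATransform3DMakeTranslation.
let translationGrid: [[Double]] = [
    [1,  0,  0,  0],
    [0,  1,  0,  0],
    [0,  0,  1,  0],
    [tx, ty, tz, 1],
]
// m41, m42 and m43 hold tx, ty and tz.
print(m(4, 1, of: translationGrid), m(4, 2, of: translationGrid), m(4, 3, of: translationGrid)) // 100.0 -20.0 3.0
```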

Note: A good way to explore matrix values is to use a Playground file, because every CATransform3D that is created shows its matrix values in the preview area.
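
A sketch of that exploration, assuming a playground or any target that can import QuartzCore: `degree2radian` is defined here as a hypothetical stand-in for the helper from the earlier post, and the `#if canImport(QuartzCore)` guard skips the Core Animation part on platforms without it. Printing the sixteen properties row by row makes the mRC numbering explicit. If Core Animation uses the standard axis-angle formula, the second row should come out near -0.707, 0.5, 0.5, 0.

```swift
import Foundation
#if canImport(QuartzCore)
import QuartzCore
#endif

/// Hypothetical stand-in for the degree2radian() helper used earlier in this post.
func degree2radian(_ degrees: CGFloat) -> CGFloat {
    return degrees * .pi / 180
}

#if canImport(QuartzCore)
// Print the rotation used above one matrix row per line: m11...m14, then m21...m24, and so on.
let t = CATransform3DMakeRotation(degree2radian(90), 0.0, 1.0, 1.0)
print(t.m11, t.m12, t.m13, t.m14)
print(t.m21, t.m22, t.m23, t.m24)
print(t.m31, t.m32, t.m33, t.m34)
print(t.m41, t.m42, t.m43, t.m44)
#endif
```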

Final thoughts

CATransform3D functions manipulate two-dimensional objects in three-dimensional space, rather than three-dimensional objects in three-dimensional space. For the latter purpose there are the SceneKit and OpenGL frameworks.

To make CATransform3D work for manipulating three-dimensional objects, we would need to think in terms of building 3D from 2D, as illustrated by this post, and take account of the positioning and rotation needs of each element of a drawing.

Note: Apple include discussion of 2D vectors in the Swift documentation.

