A survey of Cartesian coordinate system orientation in 3D software

Why this survey?

3D software needs a way to represent the objects users interact with through its interface. This representation can be done using geometric shapes, which are easily manipulated through mathematical operations implemented in the software. To define these geometric shapes, a coordinate system is necessary. The software in this survey uses the Cartesian coordinate system; other software uses different systems, like the spherical coordinate system, or even several systems simultaneously.

A Cartesian coordinate system is defined by an origin and as many orthogonal axes as the dimension of the system. Neither the orientation of the axes, nor the location of the origin relative to the environment simulated by the software, nor the labels of the axes are standardized. This leads to various interpretations and implementations, which confuse a great number of people (1,2,3, ...). This also leads to many pointless arguments about what should be the "correct" coordinate system (1, 2, ...). And by the way, besides coordinate systems, have a look at time zones, colors, character encoding, PNG gamma correction (sorry, seems to be a broken link, look here instead), RGB, ... Yes, we're doomed, there is no way out.

This survey checks for the existence, or non-existence, of a prominent choice in the orientation and labels of the coordinate systems, to help settle the arguments (if such a thing is possible), and to give a little help to those who are lost in all that mess.

How was it done?

I've used the lists of 3D rendering software and 3D modelling software available on Wikipedia. For each entry I've tried to find information on the Internet about how its coordinate system is oriented: which vector represents the upward direction (direction from feet to head of a standing human, in case someone would like to argue about that too) and which handedness is used. I've discarded the software for which I couldn't find this information, and I've added a few others I found during the survey. I've looked preferentially at official documentation, but when it wasn't available I've referred to screenshots (identified with a '!' in results), or forum/blog messages (identified with a '?' in results). I believe screenshots to be relatively reliable; I'm much less confident about information found on forums and blogs. When a piece of software uses several coordinate systems, I've counted each one as a different system, added information in the results for each one and specified which one it is (camera, world, scene, ...).

It was quite a lot of work to find all this information. I did it honestly and double-checked it, but I may have made mistakes or misunderstood some systems (I don't know most of this software at all and it's far from explicit in some cases). If you find a mistake, let me know by email (providing proof of the mistake), I'll be happy to correct it. However, I would rather not add new data you may send me, cause I can't know if you're honest or trying to bias the results toward one particular coordinate system (the whole point of this survey!). You can still let me know about software/libraries/... not in the list. I may add new data I have looked up myself in the future.

Results

| | left-handed | right-handed | Total |
|---|---|---|---|
| +y is up | 17 (18.2%) | 34 (36.5%) | 51 (54.8%) |
| +z is up | 5 (5.3%) | 33 (35.4%) | 38 (40.8%) |
| -y is up | 0 (0.0%) | 3 (3.2%) | 3 (3.2%) |
| -z is up | 0 (0.0%) | 1 (1.0%) | 1 (1.0%) |
| Total | 22 (23.6%) | 71 (76.3%) | 93 (100.0%) |
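For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the total and a couple of shares from the raw counts in the table. The names `counts` and `share` are mine and purely illustrative, and the percentages are rounded rather than truncated, so the last digit may differ from the table.

```python
# Raw counts from the results table: (up axis, handedness) -> number of systems.
counts = {
    ("+y", "left"): 17, ("+y", "right"): 34,
    ("+z", "left"): 5,  ("+z", "right"): 33,
    ("-y", "left"): 0,  ("-y", "right"): 3,
    ("-z", "left"): 0,  ("-z", "right"): 1,
}

# Total number of coordinate systems surveyed.
total = sum(counts.values())


def share(predicate):
    """Percentage of all systems whose (up, handedness) key satisfies predicate."""
    matching = sum(n for key, n in counts.items() if predicate(key))
    return 100.0 * matching / total


print(total)
print(round(share(lambda key: key[1] == "right"), 2))  # right-handed systems
print(round(share(lambda key: key[0] == "+y"), 2))     # systems with +y up
```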

Conclusion

- 90.3% of the coordinate systems are one of (+y is up, left-handed), (+y is up, right-handed) or (+z is up, right-handed).
- 9.6% of the coordinate systems come from another solar system.
- The most often used are (+y is up, right-handed, 36.5%) and (+z is up, right-handed, 35.4%).
- Right-handed is more often used than left-handed (76.3% against 23.6%).
- +y representing the upward direction is more often used than +z representing the upward direction (54.8% against 40.8%).

With the most often used system being just above one third, it seems difficult to speak of one particular system as the 'standard'. Across different software from the same company (Autodesk), or even inside the same software (Blender), you can find different systems. Not mentioned in this survey, even the color encoding XYZ→RGB isn't unanimous (TrueSpace). It's a complete mess. Maybe the least polemical system would be (+y is up, right-handed), as it's the combination of the most often used orientation and the most often used handedness?

Edited on 2022/06/10: added some more data. See also this page for a similar survey leading to the results: +y,left:16.3% +y,right:53.0% +z,left:4.0% +z,right:26.5%.

My two cents' worth

If I were still young and fresh, and maybe also cause I have Crusader ancestors, DNA you know, I would surely have a hero and fight for it. Now those days are gone, I've accepted that the world is a total mess and I won't make it better in any way. I have a job to do; if the client is happy with the result and I can get back home tonight early and safe, that's already more than enough. When I use a piece of software/code/library/... I'm using it as it is and don't waste time "correcting" it to match some presumptuous view. If I need to use several at the same time, I prefer to stick to each one's preferred system and add some (as few as possible) conversion functions at strategic locations to glue everything together.
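To give an idea of what such a glue function can look like, here is a minimal Python sketch. It assumes one hypothetical library working in (+x right, +y forward, +z up, right-handed) and another working in (+x right, +y up, +z forward, left-handed). Both conventions and the function names are made up for illustration, so check the actual conventions of whatever you're gluing together.

```python
def zup_rh_to_yup_lh(point):
    """Convert (x right, y forward, z up) to (x right, y up, z forward).

    Both systems are hypothetical examples. Swapping two axes is what
    flips the handedness from right-handed to left-handed.
    """
    x, y, z = point
    return (x, z, y)


def yup_lh_to_zup_rh(point):
    """Inverse conversion; swapping the same two axes undoes the first swap."""
    x, y, z = point
    return (x, z, y)


# A point 2 units forward and 3 units up in the z-up system...
p_zup = (1.0, 2.0, 3.0)
# ...is still 3 units up and 2 units forward in the y-up system.
p_yup = zup_rh_to_yup_lh(p_zup)
print(p_yup)
print(yup_lh_to_zup_rh(p_yup) == p_zup)
```

Keeping the conversion in one pair of small functions, called only where data crosses from one library to the other, avoids sprinkling axis swaps all over the code.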

Now, if you insist and really want to know which coordinate system I prefer, which one I would choose if I were living in Utopia land where I'm alone, free and owner of every single line of code in the world: that's (+y is up, left-handed) and here is why (sorry, it's going to be a long, strongly opinionated, highly subjective story).

We are considering here software dealing with 3D geometry displayed on a 2D medium (screen, projector, ... I'll call it the screen to simplify). Let's start from the goal: the 2D representation. I need a coordinate system to locate the displayed elements on the 2D screen. The Cartesian coordinate system seems a sound choice, as it has been around for almost 400 years and many seem happy with it, as shown by this survey. But I now have the same problem for 2D as for 3D: how should the axes be oriented and where should the origin be located? I'll leave the 2D version of this survey as homework for the reader, but from my personal experience I can tell you there is a lot of fun here too. First, the labels. 'x' for the abscissa and 'y' for the ordinate, everybody agrees about that, right? ... right?? Well, not necessarily. Ok, I use my 'Utopia land' joker card, 'x' IS the abscissa and 'y' IS the ordinate cause that's what I've been taught at school. Good, now let's talk about the direction. Technically, any combination of (x/y), (positive/negative), (up/down) and (left/right) would do the job and lead to the exact same result. I have a crush on +x/right and +y/up and I like the idea that it's due to cultural influence and how my brain's hemispheres are arranged. Also for cultural reasons (cause historically the first display devices were developed in countries that have been writing top→down for centuries), monitors/printers/... have a preference for +y/down, I know. I stubbornly keep +y/up and I'll come back to that later. Last but not least, I want to look at the 2D screen the way I usually look at the world: standing on my two feet, looking at the horizon in front of me. Which brings me to the 2D coordinate system: origin at bottom-left, +x abscissa toward the right, +y ordinate toward the top. Phew!

Now the 3D coordinate system. I've chosen the orientation of the 2D coordinate system that will display what's described by the 3D coordinate system, so why on Earth would I not choose the same orientation for the 3D coordinate system? If +x is right and +y is up on the 2D screen, then let's use +x/right and +y/up in 3D too and save a few headaches! In 3D we also need a third axis, let's use 'z' cause that's the letter coming after 'y' in the alphabet, which follows the logic of choosing 'y' cause it's the letter coming after 'x' (once again, completely arbitrary and culturally influenced). To be a Cartesian coordinate system, z must be orthogonal to x and y, and I've set +x/right and +y/up, thus z is the rear/front axis. About the direction, I feel that's going to be the most controversial decision of all, but thinking of what's in front of me as positive looks more natural to me (call it an over-optimistic view of the world!). Probably some subconscious cultural influence here too. Also, with a very egocentric view, thinking of the scene as in front of me, that brings what I care about (cause that's what I'm looking at) into the +z area and makes me feel it's going to save me a few '-' signs here and there. What about the origin? Well, it doesn't really make sense here. In 2D, we are talking about the origin relative to the (finite) screen, but the 3D coordinate system is here to describe the world, in its infinite entirety, relative to nothing in particular. Finally, all of this brings me to the 3D coordinate system: +x/right, +y/up, +z/forward (which is a left-handed coordinate system). Phew, bis!
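If you'd rather check the handedness claim than trust me, here's a small Python sketch. It expresses the right, up and forward directions in a conventional right-handed reference frame (+x right, +y up, +z toward the viewer, so forward is -z), then looks at the sign of the determinant of the three axes: positive means right-handed, negative means left-handed. The reference frame and all the names are my assumptions for the sake of the example.

```python
# Physical directions expressed in a right-handed reference frame where
# +x is right, +y is up and +z points toward the viewer (an assumption).
RIGHT = (1, 0, 0)
UP = (0, 1, 0)
FORWARD = (0, 0, -1)  # away from the viewer


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def handedness(x_axis, y_axis, z_axis):
    """Classify a system from the physical directions of its x, y and z axes."""
    d = det3(x_axis, y_axis, z_axis)
    if d == 0:
        raise ValueError("axes are not linearly independent")
    return "right-handed" if d > 0 else "left-handed"


# +x/right, +y/up, +z/forward
print(handedness(RIGHT, UP, FORWARD))
# +x/right, +y/forward, +z/up
print(handedness(RIGHT, FORWARD, UP))
```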

One word about color encoding. The RGB color model is not the only color model, but it has been around since the 19th century, works well and is widely accepted and used in computer graphics (another survey?). As with everything else, talking about a standard here is a bit risky (1, 2, ...), but since OpenCV's BGR model tends to give me epileptic seizures, I'll stick to RGB. Then, given XYZ and RGB as the chosen orders, I won't add one more layer of confusion by not choosing the direct association X/R, Y/G, Z/B. Keeping things simple has the appreciated property of allowing direct mapping from coordinates to colors: for example, the axis vectors <1,0,0>, <0,1,0>, <0,0,1> can be used directly as the RGB vectors of their corresponding colors.
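As a tiny illustration of that direct mapping, here is a Python sketch turning a vector with components in [0, 1] into an 8-bit RGB triple. The function name `axis_color` and the clamping of out-of-range components are my own choices for the example, not something dictated by the X/R, Y/G, Z/B association.

```python
def axis_color(vector):
    """Map (x, y, z) with components in [0, 1] to an 8-bit (R, G, B) triple.

    Components outside [0, 1] are clamped (an assumption of this sketch).
    """
    return tuple(int(round(255 * max(0.0, min(1.0, c)))) for c in vector)


# The three axis vectors give the three primary colors directly.
for axis in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    print(axis, axis_color(axis))
```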

Practically, how do these choices translate into the calculation of the coordinates \(\vec{P_s}\) in the screen 2D coordinate system from the coordinates \(\vec{P_w}\) in the world 3D coordinate system? The calculation depends on the type of projection and camera model, and the goal here is not to introduce all of them, so I'll consider only the perspective projection and pinhole camera model (with no lens distortion) and you'll have to check the others yourself.

\(\vec{P_s}=[c_x+S_x/S_z, c_y+S_y/S_z, S_z]\)

where

$$ \vec{S}^T= \left[ \begin{array}{ccc} \phi&0&0\\ 0&\phi&0\\ 0&0&1\\ \end{array} \right] \left[ \begin{array}{ccc} r_x&r_y&r_z\\ u_x&u_y&u_z\\ f_x&f_y&f_z\\ \end{array} \right] (\vec{P_w}-\vec{C})^T $$

and

- \(w\),\(h\) are the width and height of the screen in pixels
- \(\phi\) is the focal length and can be obtained from the field of view \(\alpha\) as follows: \(\phi=\frac{w}{2\tan(\alpha/2)}\)
- \(c_x\),\(c_y\) are the pixel coordinates on the screen of the point the camera is looking at; if you're not using a weird device it will always be the center of the screen: \((c_x,c_y)=(w/2,h/2)\)
- \(\vec{r}\),\(\vec{u}\),\(\vec{f}\) are respectively the normalised vectors in the world coordinate system expressing the camera's right direction, up direction and front direction
- \(\vec{C}\) is the position of the camera in the world coordinate system
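The equations above can be sketched in a few lines of Python. This is only an illustration under a few assumptions of mine: the function and parameter names are invented, the field of view is given in radians, and points at or behind the camera plane (\(S_z \le 0\)) are simply rejected by returning `None`, a case the equations themselves say nothing about.

```python
import math


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(i * j for i, j in zip(a, b))


def project(p_world, cam_pos, right, up, front, width, height, fov):
    """Pinhole perspective projection of a world point onto the screen.

    right, up, front are the normalised camera axes in world coordinates,
    fov is the field of view in radians. Returns (x, y, depth) with +x to
    the right and +y up, or None if the point is not in front of the camera
    (that last rule is an assumption of this sketch).
    """
    phi = width / (2.0 * math.tan(fov / 2.0))      # focal length in pixels
    cx, cy = width / 2.0, height / 2.0             # center of the screen
    d = [p - c for p, c in zip(p_world, cam_pos)]  # P_w - C
    sx = phi * dot(right, d)                       # 1st row of the pose matrix
    sy = phi * dot(up, d)                          # 2nd row
    sz = dot(front, d)                             # 3rd row
    if sz <= 0.0:
        return None
    return (cx + sx / sz, cy + sy / sz, sz)


# Default camera: identity pose at the origin, 90 degree field of view.
default_pose = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(project((1.0, 1.0, 2.0), (0.0, 0.0, 0.0), *default_pose, 640, 480, math.pi / 2))
```

With this 640x480 screen and a 90 degree field of view, the point (1, 1, 2) lands at roughly (480, 400) with a depth of 2, i.e. to the right of and above the center of the screen.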

The second matrix is the pose of the camera. I personally find it much easier to think of it as a composition of the right, up and front directions rather than a composition of rotation matrices using angles and trigonometry. Can't remember in which order right, up and front are in the matrix? From the reasoning above I immediately recall x/right, y/up, z/front, then x,y,z→1,2,3→right/1st row, up/2nd row, front/3rd row. Not sure anymore if it was column or row? As I write one line per row, I write one axis per row. Want to know in which direction the camera is looking? Simple, that's front→z→the 3rd row of the pose matrix.

Wondering what a default (i.e. unrotated, untranslated) camera pose corresponds to? That's the identity for the pose matrix and a null vector for \(\vec{C}\), which looks natural to me. With a default camera, \(P_{wx}\) increases → \(P_{sx}\) increases, \(P_{wy}\) increases → \(P_{sy}\) increases, \(P_{wz}\) increases (i.e. the object is going away from me) → \(P_{sx}\) and \(P_{sy}\) converge to the center of the screen, as I would expect. Etc...
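To make this explicit, here is the default camera case worked out. With the identity as pose matrix and \(\vec{C}=\vec{0}\), the expression of \(\vec{S}\) reduces to a simple scaling of the x and y components:

$$ \vec{S}=[\phi P_{wx}, \phi P_{wy}, P_{wz}] \quad\Rightarrow\quad \vec{P_s}=\left[c_x+\phi\frac{P_{wx}}{P_{wz}},\ c_y+\phi\frac{P_{wy}}{P_{wz}},\ P_{wz}\right] $$

For a point in front of the camera (\(P_{wz}>0\)), \(P_{sx}\) grows with \(P_{wx}\) and \(P_{sy}\) grows with \(P_{wy}\), and as \(P_{wz}\) grows both fractions tend to 0, so the point tends to \((c_x,c_y)\).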

Finally, I need to account for the case when +y is downward in the screen coordinate system. Simply use the following equation instead of the one above to correct the y component: \(\vec{P_s}=[c_x+S_x/S_z, h-(c_y+S_y/S_z), S_z]\). No need to complicate my life with several different systems.
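In code, this correction is a one-liner applied after the projection. The Python sketch below assumes the projected point is given as an (x, y, depth) tuple computed with +y up; the function name is mine.

```python
def to_y_down(p_screen, height):
    """Convert a projected point from +y up to +y down screen coordinates.

    p_screen is (x, y, depth) as computed with +y up, height is the screen
    height in pixels. Only the y component changes.
    """
    x, y, depth = p_screen
    return (x, height - y, depth)


# A point 400 pixels above the bottom of a 480 pixel high screen
# is 80 pixels below its top.
print(to_y_down((480.0, 400.0, 2.0), 480))
```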

If you have a doubt about the equations above, you'll find here a dirty little piece of C to test them. It renders the Stanford bunny in red using POV-Ray, then in blue using an implementation of the equations introduced here, and composes the resulting images. As you can see below, they match so perfectly that the red dots are completely hidden by the blue dots.

There you are. If you've read this far (you probably should not have), thank you. Don't contact me to tell me you agree or disagree, I really don't care. There is no "correct" system, there will never be. Use what exists as it is; if you make one from scratch, make it the way that looks natural to you and let others do the same. Focus on the only important thing: fulfill the specs and meet the deadlines.

May this egg war end!!