OOP in C - Methods

The object-oriented programming paradigm is based on the concept of object. An object is a structure which can contain both data and code; the code of an object is organised into functions called its methods. In C, structures usually contain only data, and code is applied to these structures via separate functions. Wouldn't it be nice if it were also possible to have methods in C?

How to define and use methods in plain standard C.

Ideally I would like to be able to write something that resembles the C++ syntax: instance->method(arguments). Each component is immediately identifiable and appears only once, and the syntax is short. For the declaration, to emphasize that the method is part of the object, at the same level as the data, it should appear inside the structure declaration.

For the declaration, it is possible to use pointers to functions in the structure declaration, for example as follows:
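A minimal sketch of such a declaration, assuming for illustration that aField is an int and that aMethod takes and returns an int (the actual types are not given in the text):

```c
/* Data and code declared side by side, at the same level, in the structure.
   The int types are illustrative assumptions. */
struct MyObject {
  int aField;           /* data */
  int (*aMethod)(int);  /* method: a pointer to a function */
};
```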

aField and aMethod clearly belong to MyObject and are declared at the same level and same place. If instance is an instance of MyObject, they can be used as instance.aField and instance.aMethod(...). So far so good.

The field aField is accessed and modified as usual, but the method needs to be linked to its body. To create a new instance of MyObject, I define, for each object type, a function struct MyObject MyObjectCreate() which creates the instance and possibly initialises the fields of the structure based on its arguments (not written here). This function, equivalent to the constructor in C++, also has the duty to link the function pointers of the instance with the appropriate function bodies. It gives something like:
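A possible sketch of the constructor, assuming the illustrative signature int MyObjectMethod(int) and no constructor arguments; the method body itself is a placeholder:

```c
struct MyObject {
  int aField;
  int (*aMethod)(int);
};

/* Body of the method (placeholder behaviour, illustrative signature). */
static int MyObjectMethod(int x) {
  return x + 1;
}

/* Constructor: creates the instance, initialises its fields and links each
   function pointer to the corresponding function body. */
struct MyObject MyObjectCreate(void) {
  struct MyObject that = {
    .aField = 0,
    .aMethod = MyObjectMethod,
  };
  return that;
}
```

Designated initialisers (C99) keep the pairing between each function pointer and its body visible at a glance.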

Now I can create an object instance and use its method as I expected:
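A sketch of the usage, under the same illustrative assumptions (the helper useMyObject is hypothetical). Note that the call syntax also works directly on a returned value:

```c
struct MyObject {
  int aField;
  int (*aMethod)(int);
};

static int MyObjectMethod(int x) {
  return x + 1;
}

struct MyObject MyObjectCreate(void) {
  struct MyObject that = { .aField = 0, .aMethod = MyObjectMethod };
  return that;
}

int useMyObject(void) {
  struct MyObject instance = MyObjectCreate();
  instance.aField = 40;                       /* fields as usual */
  int r = instance.aMethod(instance.aField);  /* C++-like call: 41 */
  /* Also valid on a returned structure, without a named variable. */
  return r + MyObjectCreate().aMethod(0);     /* 41 + 1 */
}
```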

However, the method misses a crucial property: the ability to refer to its invoking instance. Inside the body of MyObjectMethod() there is no way to access the instance at the origin of the function call. In C++ that is the role of the keyword this. The solution to this problem I keep seeing among those who do OOP in C is to add one argument to the function MyObjectMethod() and pass instance to the function, as follows:
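A sketch of that common workaround, assuming the instance is passed by pointer and the extra parameter is named self (both choices are illustrative, not taken from the text):

```c
struct MyObject {
  int aField;
  /* The invoking instance now appears in the signature of every method. */
  int (*aMethod)(struct MyObject *self, int x);
};

static int MyObjectMethod(struct MyObject *self, int x) {
  return self->aField + x;   /* the method can finally read its instance */
}

struct MyObject MyObjectCreate(int aField) {
  struct MyObject that = { .aField = aField, .aMethod = MyObjectMethod };
  return that;
}

int callWithInstance(void) {
  struct MyObject instance = MyObjectCreate(40);
  /* instance is written twice: to find the method and as its argument. */
  return instance.aMethod(&instance, 2);
}
```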

I personally find this super-mega-ugly. At each method definition and declaration you need to bother about that extra argument, and at each call to the method you need to write the instance variable twice. This is even more of a problem when you want to write things like functionWhichReturnAnInstanceOfMyObject().aMethod(...). What do you use as argument? Another call to functionWhichReturnAnInstanceOfMyObject()? If the function returns a newly allocated instance, or has side effects, you're in trouble...
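To make the trouble concrete, here is a sketch where functionWhichReturnAnInstanceOfMyObject() is a hypothetical factory returning a pointer and having a side effect (it hands out a fresh instance from a small pool at each call). Repeating the call as argument runs the factory twice:

```c
#include <stdlib.h>

struct MyObject {
  int aField;
  int (*aMethod)(struct MyObject *self, int x);
};

static int MyObjectMethod(struct MyObject *self, int x) {
  return self->aField + x;
}

/* Hypothetical factory with a side effect: each call hands out a new
   instance from a small pool and increments a counter. */
static struct MyObject pool[8];
static int nbCreated = 0;

struct MyObject *functionWhichReturnAnInstanceOfMyObject(void) {
  if (nbCreated >= 8) abort();
  struct MyObject *p = &pool[nbCreated++];
  p->aField = 40;
  p->aMethod = MyObjectMethod;
  return p;
}

int doubleCallDemo(void) {
  /* The instance expression has to be repeated as argument, so the factory
     runs twice: the method is looked up on one instance and receives the
     other one as its instance. */
  return functionWhichReturnAnInstanceOfMyObject()->aMethod(
      functionWhichReturnAnInstanceOfMyObject(), 2);
}
```

The result looks right here only because both instances happen to hold the same value; two instances were created for what should have been a single call.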

Then I looked for another, more elegant, solution. It took me a lot of time and led me to many dead-ends. Let me first show some of what I've tried that didn't work.

Macros! The preprocessor is an underestimated friend; what can it do for us in this kind of situation? You could define something like the following macro: #define M(instance, method)(arguments) instance.method(instance, arguments). You don't need to write instance twice any more, but you're still in trouble if instance is the kind of function I was speaking of earlier. Also, what about arguments? They will be different for each method... To solve this problem you may think of variadic macros and try #define M(instance, method, ...) instance.method(instance, __VA_ARGS__). That creates another problem: in standard C, a variadic macro needs at least one argument for its ... part. If you have a method with no argument (other than instance), you won't be able to write M(instance, method);, and even if the preprocessor accepted it, the expansion would end with a dangling comma. GCC accepts __VA_OPT__, added in C++20, which expands its content (here, the comma) only when __VA_ARGS__ is not empty. That would be exactly what you need here, but you're venturing out of the C standard (as far as I can tell, and at the time I'm writing this). This link gives a hint toward a convoluted way to emulate __VA_OPT__ with the current C standard. I haven't explored that way further. Another workaround would be to always have at least one argument for the methods, including a dummy useless one if necessary. You can guess what I think about that.
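A small sketch of the variadic attempt, assuming pointer-based methods and a hypothetical argument-less method aReset. Calls with arguments work; the argument-less call has to be written by hand:

```c
struct MyObject {
  int aField;
  void (*aMethod)(struct MyObject *self, int x);  /* one extra argument */
  void (*aReset)(struct MyObject *self);           /* no extra argument */
};

static void MyObjectMethod(struct MyObject *self, int x) { self->aField += x; }
static void MyObjectReset(struct MyObject *self) { self->aField = 0; }

struct MyObject MyObjectCreate(void) {
  struct MyObject that = { 0, MyObjectMethod, MyObjectReset };
  return that;
}

/* instance is typed once by the user, but still expanded (and evaluated)
   twice, so a function call as instance remains a problem. */
#define M(instance, method, ...) (instance)->method((instance), __VA_ARGS__)

/* With __VA_OPT__ (C++20, later also C23) the comma could become optional:
   #define M(instance, method, ...) (instance)->method((instance) __VA_OPT__(,) __VA_ARGS__) */

int variadicDemo(void) {
  struct MyObject obj = MyObjectCreate();
  M(&obj, aMethod, 5);   /* expands to (&obj)->aMethod((&obj), 5) */
  /* M(&obj, aReset); is not valid C17: the ... needs at least one
     argument, and the expansion would end with a dangling comma. */
  obj.aReset(&obj);      /* so the argument-less method is called by hand */
  M(&obj, aMethod, 42);
  return obj.aField;
}
```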

Anyway, there is still the problem of instance. Here you may think of using a temporary variable as follows: #define M(instance, method)(arguments) struct MyObject i=instance; i.method(i, arguments) (I omit the usual do {...} while(0) guard for conciseness). Now you're good to go even if instance is one of those problematic functions, but you've just made your macro depend on the type of instance, which may be solved using _Generic. The real problem is rather the variable i: its name must not already be used in the scope where you use the macro, a scope you know nothing about, hence it must be chosen carefully. You could still use this macro twice in the same scope thanks to the do guard. Given all these complications, it's better to let the user define that temporary variable before using the macro (which then no longer needs to worry about it).
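A sketch of the temporary-variable macro with its do guard, assuming a pointer variant of the text's by-value version; M_TMP, getInstance and nbCalls are hypothetical names used to show that the instance expression is now evaluated only once:

```c
struct MyObject {
  int aField;
  void (*aMethod)(struct MyObject *self, int x);
};

static void MyObjectMethod(struct MyObject *self, int x) { self->aField += x; }

/* Hypothetical function returning an instance, with an observable side
   effect (it counts its calls). */
static struct MyObject shared = { 0, MyObjectMethod };
static int nbCalls = 0;

struct MyObject *getInstance(void) {
  ++nbCalls;
  return &shared;
}

/* The temporary i makes the instance expression evaluate only once, but the
   macro is now tied to struct MyObject, and i silently shadows any variable
   named i that the caller's expression might use. */
#define M_TMP(instance, method, ...)    \
  do {                                  \
    struct MyObject *i = (instance);    \
    i->method(i, __VA_ARGS__);          \
  } while (0)

int tempMacroDemo(void) {
  M_TMP(getInstance(), aMethod, 40);
  M_TMP(getInstance(), aMethod, 2);  /* fine: each do block has its own i */
  return shared.aField;
}
```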

Yet, in all that struggle there are opportunities and a bit of hope. Building on them, I finally came up with the following solution (explanation following the code).

First I define a global variable void* that_;. Its name must not conflict with any other variable, so it must be chosen carefully; I'm using 'that_' here as an example. The structure declarations, with their fields and methods, are as introduced previously. Then I define three other macros to call a method and emulate C++'s 'this' keyword. (Note: you'll need this variable to be _Thread_local if you plan to use these methods in a multithreaded environment.)
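A sketch of the global variable and of two structure declarations in this style. Operand and Operator reuse the names appearing later in the text, but their fields and method signatures here are assumptions for illustration. The methods no longer take the instance as argument:

```c
/* Global memorising the invoking instance. The name that_ is an example;
   _Thread_local (C11) gives each thread its own copy. */
_Thread_local void *that_;

/* Hypothetical objects: an operand holding a value, and an operator
   combining two values. */
struct Operand {
  int value;
  int (*get)(void);
};

struct Operator {
  char symbol;            /* '+' or '*' */
  int (*get)(int, int);
};
```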

The first macro, #define $(a, b) ((__typeof__(a))(that_=(a)))->b, reuses the idea of the temporary variable to avoid problems when the instance is the return value of a function. That temporary variable is the one introduced in the previous paragraph, and its type, a void pointer, allows it to handle any structure, even those the user would define later. An approach using _Generic, as I wrote in a previous version of this article, works too, but creates problems when the macro has been defined several times in separate libraries which the user wants to use in a common project: he would have to edit these definitions to merge them into a single _Generic covering all the types. The __typeof__ solution avoids that, with #ifndef to guard against redefinition in the case I've just described. About __typeof__ being or not being standard, this is a nice read. It passes gcc -std=c18 -pedantic -Wall -Wextra silently, good enough for me! (Ah, if you wonder why we need the cast: an assignment expression has the type of its left operand, hence void*, which is dereferenced by ->b, and dereferencing a void pointer is invalid, hence the need for a cast.)
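A sketch of the $ macro as given in the text, wrapped in the #ifndef guard. Keep in mind that '$' in identifiers and __typeof__ are compiler extensions accepted by GCC and Clang (C23 standardises typeof). The Operand structure, getOperand and nbCalls are hypothetical, used to check that the instance expression is evaluated only once:

```c
_Thread_local void *that_;

#ifndef $
/* Memorise the instance in that_, cast the void* back to the instance's own
   type, then select the method. __typeof__ only inspects the type of a, so a
   is evaluated once, in the assignment. */
#define $(a, b) ((__typeof__(a))(that_ = (a)))->b
#endif

struct Operand {
  int value;
  int (*get)(void);
};

/* The method recovers its invoking instance from that_. */
static int OperandGet(void) {
  struct Operand *that = (struct Operand *)that_;
  return that->value;
}

/* Hypothetical function returning an instance, counting its calls. */
static struct Operand shared = { 42, OperandGet };
static int nbCalls = 0;

struct Operand *getOperand(void) {
  ++nbCalls;
  return &shared;
}

int dollarDemo(void) {
  /* Equivalent to ((struct Operand *)(that_ = (getOperand())))->get() */
  return $(getOperand(), get)();
}
```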

I'm calling this macro $() because I want it to be as short as possible (it will be used over and over), and in the same spirit as the * operator (*a gives the value pointed to by a; $(i, m) gives the function pointer of the method m for the instance pointed to by i). Note that here I don't care about the arguments any more. This macro is only about getting the function corresponding to the method of an instance, and doing some magic under the hood to memorise the invoking instance. It is then used as $(instance, method)(arguments), and arguments can be whatever you want, including no argument at all. Also, the macro expands to a single expression, so there is no need for a do guard. And even better, the invoking instance is no longer passed as an argument, which simplifies the interface of the methods.
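A sketch of calls through $(), with one argument and with none. The set method and the OperandCreate constructor are hypothetical additions for illustration:

```c
_Thread_local void *that_;
#define $(a, b) ((__typeof__(a))(that_ = (a)))->b

struct Operand {
  int value;
  int (*get)(void);
  void (*set)(int);     /* hypothetical setter */
};

static int OperandGet(void) {
  struct Operand *that = (struct Operand *)that_;
  return that->value;
}

static void OperandSet(int value) {
  struct Operand *that = (struct Operand *)that_;
  that->value = value;
}

struct Operand OperandCreate(int value) {
  struct Operand that = { value, OperandGet, OperandSet };
  return that;
}

int usageDemo(void) {
  struct Operand op = OperandCreate(0);
  /* The macro only yields the method; arguments are written as usual. */
  $(&op, set)(41);             /* one argument */
  if ($(&op, get)() == 41)     /* no argument at all */
    $(&op, set)(42);           /* single expression: no do guard needed */
  return $(&op, get)();
}
```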

The second and third macros, #define THAT_OPERAND struct Operand* that=(struct Operand*)that_ and #define THAT_OPERATOR struct Operator* that=(struct Operator*)that_, are to be used at the beginning of each method's body. They declare the variable that, which the method can use to refer to its invoking instance. Such a macro must be defined for each object type. That's the only downside I see in my approach, and I think the advantages are worth it. If you split your code properly into one compilation unit per object, you can also use a shorter name like THAT and reduce the burden of writing it at the head of each method. If you forget to define the macro for a new object type, the compiler will notice it immediately, and it's just one line to add. If you get the type wrong in these macros, it will most likely be detected at compilation too, as soon as the body uses a member that the wrong type doesn't have. It's straightforward, simple, light and safe. (Note that you'll also probably want a #define CONST_THAT struct XXX const* that=(struct XXX const*)that_.)
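A sketch of methods using these macros, including a const variant in the spirit of CONST_THAT. The Operand and Operator fields, the set method and the operator semantics are assumptions for illustration:

```c
_Thread_local void *that_;
#define $(a, b) ((__typeof__(a))(that_ = (a)))->b

struct Operand {
  int value;
  int (*get)(void);
  void (*set)(int);
};

struct Operator {
  char symbol;            /* '+' or '*' */
  int (*get)(int, int);
};

/* One line per object type: declare 'that' from the memorised instance. */
#define THAT_OPERAND struct Operand *that = (struct Operand *)that_
#define CONST_THAT_OPERAND struct Operand const *that = (struct Operand const *)that_
#define THAT_OPERATOR struct Operator *that = (struct Operator *)that_

static int OperandGet(void) {
  CONST_THAT_OPERAND;     /* always the first line of the body */
  return that->value;
}

static void OperandSet(int value) {
  THAT_OPERAND;
  that->value = value;
}

static int OperatorGet(int a, int b) {
  THAT_OPERATOR;
  return that->symbol == '*' ? a * b : a + b;
}

int thatDemo(void) {
  struct Operand op1 = { 0, OperandGet, OperandSet };
  struct Operand op2 = { 7, OperandGet, OperandSet };
  struct Operator mul = { '*', OperatorGet };
  $(&op1, set)(6);
  int a = $(&op1, get)();
  int b = $(&op2, get)();
  return $(&mul, get)(a, b);   /* 6 * 7 */
}
```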

If you're worried about the invoking instance being passed around through that one single global variable, you're right, but you (almost) shouldn't be. This global variable is only here to memorise the address of the invoking instance between the moment the method is called and the moment THAT is executed (which I take care to be the first line of the body of each method). Yes, there is a possibility for a bug here. If you invoke a method before THAT in the body of a method, that_ will be clobbered and no one will warn you. Know it, stick to the rule "THAT always first in the method's body" and you'll be fine. Then, what could happen between the call and the first line of the body that would affect that_? Nothing... almost! There is actually a chance for trouble. What happens if the arguments of the method contain another method call, like in $(&operator, get)($(&op1, get)(), $(&op2, get)())? The standard leaves the order of evaluation of the function designator and the arguments unspecified, and since the assignments to that_ made in each of them are unsequenced with respect to each other, the behaviour is undefined. The order doesn't matter for the arguments themselves, but it is a big problem for the function designator. It MUST be the last to be evaluated, or the assignment of that_ by $() during the evaluation of the arguments will clobber the assignment made by $(&operator, get), and THAT_OPERATOR will get the wrong instance. Personally, I don't see that as a problem. The first reason is that the compiler does its job correctly: the line above does not compile and leads to the following error message:

Then you don't risk anything: the compiler protects you. The second reason, which is more an opinion than a reason, is that I personally find calling functions in the argument list of another function extremely dirty, and I've had the habit of not doing so since well before I started thinking about OOP in C. So, to me at least, it's really not a problem.
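A sketch of that habit applied to the problematic line: each inner result goes into a local variable first, so every call completes (and its THAT reads that_) before the next $() runs. applyOperator is a hypothetical helper; the structures are the same illustrative assumptions as before:

```c
_Thread_local void *that_;
#define $(a, b) ((__typeof__(a))(that_ = (a)))->b

struct Operand {
  int value;
  int (*get)(void);
};

struct Operator {
  char symbol;            /* '+' or '*' */
  int (*get)(int, int);
};

#define THAT_OPERAND struct Operand *that = (struct Operand *)that_
#define THAT_OPERATOR struct Operator *that = (struct Operator *)that_

static int OperandGet(void) {
  THAT_OPERAND;
  return that->value;
}

static int OperatorGet(int a, int b) {
  THAT_OPERATOR;
  return that->symbol == '*' ? a * b : a + b;
}

/* Avoided form, where the writes to that_ are unsequenced:
     $(operator, get)($(op1, get)(), $(op2, get)())
   Each statement below ends with a sequence point instead. */
int applyOperator(struct Operator *operator, struct Operand *op1,
                  struct Operand *op2) {
  int a = $(op1, get)();
  int b = $(op2, get)();
  return $(operator, get)(a, b);
}
```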

The advantages of using methods.

Great, I now have a satisfying way to declare and use methods in C. Let's see the advantages of using them.

The disadvantages of using methods.

To be fair I must also consider the disadvantages of using them.

Conclusion

I've found this solution attractive enough to decide to give it a try in a large personal project, LibCapy, to see how it performs in reality. I may update this article later with some insights gained doing so. Until then, I hope you will give it a try too! Let me know by email.

If this article was interesting to you, you may also want to read: OOP in C - Inheritance, OOP in C - Genericity.

2021-09-07
Copyright 2021-2024 Baillehache Pascal