Baillehache Pascal's personal website

OOP in C - Methods

The object oriented programming paradigm is based on the concept of object. An object is a structure which can contain both data and code. The code of an object is called its methods. In C, structures usually contain only data, and code is applied to these structures via functions. Wouldn't it be nice if it was also possible to have methods in C?

How to define and use methods in plain standard C.

Ideally I would like to be able to write something that resembles the C++ syntax: instance->method(arguments). Each component is immediately identifiable and appears only once, and the syntax is short. For the declaration of the method, to emphasize the fact that the method is part of the object, at the same level as the data, it should appear inside the structure declaration.

For the declaration, it is possible to use pointers to functions in the structure declaration, for example as follows:

struct MyObject {
  int aField;
  void (*aMethod)(int argumentOfTheMethod);
};

aField and aMethod clearly belong to MyObject and are declared at the same level and same place. If instance is an instance of MyObject, they can be used as instance.aField and instance.aMethod(...). So far so good.

The field value is accessed and modified as usual, but the method needs to be linked to its body. To create a new instance of MyObject, I define, for each object, a function struct MyObject MyObjectCreate() which creates the instance and, if needed, initialises the fields of the structure based on its arguments (not written here). This function, equivalent to the constructor in C++, also has the duty to link the function pointers of the instance with the appropriate function bodies. It gives something like:

void MyObjectMethod(int argument) {
  ...
}
struct MyObject MyObjectCreate(int fieldInitialValue) {
  struct MyObject instance;
  instance.aField = fieldInitialValue;
  instance.aMethod = MyObjectMethod;
  return instance;
}

Now I can create an object instance and use its method as I expected:

struct MyObject instance = MyObjectCreate(0);
instance.aMethod(1);

However the method misses a crucial property: the ability to refer to its invoking object instance. Inside the body of MyObjectMethod() there is no way to access the instance at the origin of the function call. In C++ it's the role of the keyword this. The solution to this problem I keep seeing among those who do OOP in C is to add one argument to the function MyObjectMethod() and pass the instance to the function, as follows:

void MyObjectMethod(struct MyObject* instance, int argument) {
  ...
}
...
instance.aMethod(&instance, ...);

which I personally find super-mega-ugly. At each method definition and declaration you need to bother about that extra argument, and at each call to the method you need to write the instance variable twice. That is even more of a problem when you want to write things like functionWhichReturnAnInstanceOfMyObject().aMethod(...). What do you use as argument? Another call to functionWhichReturnAnInstanceOfMyObject()? If the function returns a newly allocated instance, or has side effects, you're in trouble...

Then I looked for another, more elegant, solution. It took me a lot of time and led me to many dead-ends. Let me first show some of what I've tried and what didn't work.
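To make that trouble concrete, here is a minimal sketch of the explicit-argument approach. It assumes a method returning int (instead of void) so the effect is observable, and the counter nbInstancesCreated is a hypothetical side effect added only for the illustration. The only safe way to call the method on a returned instance is to store it in a variable first.

```c
#include <assert.h>
#include <stddef.h>

struct MyObject {
  int aField;
  // The invoking instance is passed explicitly as first argument
  int (*aMethod)(struct MyObject* instance, int argument);
};

// Body of the method: adds the argument to the field and returns the result
static int MyObjectMethod(struct MyObject* instance, int argument) {
  assert(instance != NULL);
  instance->aField += argument;
  return instance->aField;
}

struct MyObject MyObjectCreate(int fieldInitialValue) {
  struct MyObject instance;
  instance.aField = fieldInitialValue;
  instance.aMethod = MyObjectMethod;
  return instance;
}

// Hypothetical side effect: counts how many instances were created
static int nbInstancesCreated = 0;

struct MyObject functionWhichReturnAnInstanceOfMyObject(void) {
  ++nbInstancesCreated;
  return MyObjectCreate(10);
}

int CallOnReturnedInstance(void) {
  // Calling the function twice (once for the method, once for the argument)
  // would create two instances, so the result is stored in a variable first
  struct MyObject tmp = functionWhichReturnAnInstanceOfMyObject();
  return tmp.aMethod(&tmp, 1);
}
```

The instance name still appears twice in the call, and the caller has to introduce the temporary variable by hand.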

Macros! The preprocessor is an underestimated friend, so what can it do for us in this kind of situation? You could imagine defining something like the following macro: #define M(instance, method)(arguments) instance.method(instance, arguments). You don't need to write instance twice any more, but you're still in trouble if instance is the kind of function I was speaking of earlier. Also, what about arguments? They will be different for each method... To solve this problem you may think of variadic macros and try #define M(instance, method, ...) instance.method(instance, __VA_ARGS__). That creates another problem: the variadic part of such a macro needs at least one argument. If you have a method with no argument (other than instance), you won't be able to write M(instance, method);. GCC accepts __VA_OPT__, added in C++20, which lets a variadic macro emit the comma before __VA_ARGS__ only when variadic arguments are actually given. That would be exactly what you need here but you're venturing out of the C standard (as far as I can tell, and at the time I'm writing this). This link gives hints toward a convoluted way to emulate __VA_OPT__ with the current C standard. I haven't explored that way further. Another workaround would be to always have at least one argument for the methods, including a dummy useless one if necessary. You can guess what I think about that.
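As an illustration of the variadic attempt, here is a small sketch. The Counter structure and its functions are hypothetical names made up for the example, and the macro takes a pointer to the instance rather than the instance itself. Note the dummy argument needed for the method which otherwise takes none.

```c
#include <assert.h>
#include <stddef.h>

struct Counter {
  int value;
  int (*add)(struct Counter* that, int amount);
  // This method needs no argument, but gets a dummy one for the macro's sake
  int (*get)(struct Counter* that, int unused);
};

// The instance is written once, but it is still evaluated twice
#define M(instance, method, ...) \
  (instance)->method((instance), __VA_ARGS__)

static int CounterAdd(struct Counter* that, int amount) {
  assert(that != NULL);
  that->value += amount;
  return that->value;
}

static int CounterGet(struct Counter* that, int unused) {
  (void)unused; // dummy argument, ignored
  assert(that != NULL);
  return that->value;
}

struct Counter CounterCreate(int value) {
  return (struct Counter){.value = value, .add = CounterAdd, .get = CounterGet};
}

int CounterDemo(void) {
  struct Counter counter = CounterCreate(1);
  M(&counter, add, 5);
  // M(&counter, get) would leave the variadic part empty, hence the dummy 0
  return M(&counter, get, 0);
}
```

Because the macro expands its first parameter twice, passing a function call as instance would still call that function twice.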

Anyway there is still the problem of instance. Here you may think of using a temporary variable as follows: #define M(instance, method)(arguments) MyObject i=instance; i.method(i, arguments) (I omit the usual do {...} while(0) guard for conciseness). Now you're good to go even if instance is one of those problematic functions, but you've just made your macro depend on the type of instance, which may be solved using _Generic. The variable i is the bigger problem: its name must not already be used in the scope where you use the macro, a scope which you don't know anything about, hence it must be chosen carefully. You could still use this macro twice in the same scope thanks to the do guard. Given all these complications it's better to let the user define that temporary variable before using the macro (which then doesn't need to worry about it any more).

Yet, in all that struggle we see there are opportunities and a bit of hope. Building on them, I finally came up with the following solution (explanation following the code).

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

void* that_;

struct Operand {
  int a;
  int (*get)(void);
};

struct Operator {
  int a;
  int (*get)(int, int);
};

#ifndef $
  #define $(a, b) ((__typeof__(a))(that_=(a)))->b
#endif
#define THAT_OPERAND struct Operand* that=(struct Operand*)that_
#define THAT_OPERATOR struct Operator* that=(struct Operator*)that_

int OperandGet(void) {
  THAT_OPERAND;
  return that->a;
}

int OperatorGet(int a, int b) {
  THAT_OPERATOR;
  return that->a + a + b;
}

struct Operand OperandCreate(int a) {
  return (struct Operand){.a = a, .get = OperandGet};
}

struct Operator OperatorCreate(int a) {
  return (struct Operator){.a = a, .get = OperatorGet};
}

int main(void) {
  struct Operand op1 = OperandCreate(1);
  struct Operand op2 = OperandCreate(2);
  struct Operator operator = OperatorCreate(3);
  int a = $(&operator, get)(1, 2);
  printf("a = %d\n", a);
  assert(a == 6);
  int val1 = $(&op1, get)();
  int val2 = $(&op2, get)();
  int b = $(&operator, get)(val1, val2);
  printf("val1 = %d, val2 = %d, b = %d\n", val1, val2, b);
  assert(b == 6);
  return 0;
}

// Compiled with
// gcc -std=c18 -pedantic -Wall -Wextra -Werror -Wfatal-errors -o main main.c

First I define a global variable void* that_;. The name must not conflict with any other variable, so it must be chosen carefully. I'm using 'that_' here as an example. (Note that you'll need this variable to be _Thread_local if you plan to use these methods in a multithreaded environment.) The structure declarations, with their fields and methods, are as introduced previously. Then, I define three macros to call a method and emulate C++'s 'this' keyword.

The first macro, #define $(a, b) ((__typeof__(a))(that_=(a)))->b , reuses the idea of the temporary variable to avoid problems when the instance is the return value of a function. That temporary variable is the one introduced in the previous paragraph, and its void pointer type allows it to handle any structure, even those the user would define later. An approach using _Generic, as I wrote in a previous version of this article, works too but creates problems when the macro has been declared several times in separate libraries which the user wants to use in a common project. They would have to edit these declarations to have only one _Generic including all the others. The __typeof__ solution avoids that, with #ifndef to guard against redefinition in the case I've just described. About __typeof__ being or not being standard, this is a nice read. It passes gcc -std=c18 -pedantic -Wall -Wextra silently, good enough for me! (Ah, if you wonder why we need the cast: an assignment expression has the type of its left operand, hence void*, which would then be dereferenced by ->b, and dereferencing a void pointer is invalid, hence the need for a cast.)

I'm calling this macro $() as I want it to be as short as possible (it will be reused over and over), and in the same manner as the * operator (*a gives the value pointed to by a, $(i, m) gives the function body of the method m for the instance i). Note that here I don't care any more about the arguments. This macro is only about getting the function corresponding to the method of an instance and doing some magic under the hood to memorise the invoking instance. It is then used as $(instance, method)(arguments) and arguments can be whatever you want, including no argument at all. Also, the macro is a single expression, so there is no need for the do guard. And even better, the invoking instance is not passed through the arguments any more, which simplifies the interface of the methods.

The second and third macros, #define THAT_OPERAND struct Operand* that=(struct Operand*)that_ and #define THAT_OPERATOR struct Operator* that=(struct Operator*)that_, are to be used at the beginning of each method's body. They declare the that variable which the method can use to refer to its invoking instance. Such a macro must be defined for each object, as was the case for the _Generic version of the first macro. That's the only downside I see in my approach, and I think the advantages are worth it. If you split your code properly into one compilation unit per object you can also use a shorter name like THAT and reduce the burden of having to write it at the head of each method. If you forget to add a new object type, the compiler will notice it immediately and it's just one line to add. If you get the type wrong in these macros, that will also be detected at compilation. It's straightforward, simple, light and safe. (Note that you'll also probably want a #define CONST_THAT struct XXX const* that=(struct XXX*)that_)
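As a sketch of what one compilation unit dedicated to a single object could look like, assuming a hypothetical Accumulator object, here is the same machinery with the short THAT name, a CONST_THAT variant for methods which only read the instance, and that_ declared _Thread_local as suggested for multithreaded use. Like the listing above, it relies on __typeof__ and on $ being accepted in identifiers by the compiler.

```c
#include <assert.h>
#include <stddef.h>

// One that_ per thread
_Thread_local void* that_;

#ifndef $
  #define $(a, b) ((__typeof__(a))(that_=(a)))->b
#endif

struct Accumulator {
  int value;
  int (*get)(void);
  void (*add)(int);
};

// Short names, valid for this compilation unit only
#define THAT struct Accumulator* that=(struct Accumulator*)that_
#define CONST_THAT \
  struct Accumulator const* that=(struct Accumulator const*)that_

// Read-only method: the instance is accessed through a pointer to const
static int AccumulatorGet(void) {
  CONST_THAT;
  assert(that != NULL);
  return that->value;
}

// Modifying method
static void AccumulatorAdd(int amount) {
  THAT;
  assert(that != NULL);
  that->value += amount;
}

struct Accumulator AccumulatorCreate(int value) {
  return (struct Accumulator){
    .value = value, .get = AccumulatorGet, .add = AccumulatorAdd};
}

int AccumulatorExample(void) {
  struct Accumulator acc = AccumulatorCreate(1);
  $(&acc, add)(5);
  return $(&acc, get)();
}
```

With CONST_THAT, an accidental write such as that->value = 0 inside AccumulatorGet() is rejected at compilation.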

If you're worrying about the invoking instance being passed around through that one single global variable, you're right, but you (almost) shouldn't. This global variable is only here to memorise the address of the invoking instance between the moment the method is called and the moment THAT is executed (which I took care to make the first line of the body of each method). Yes, there is a possibility for a bug here. If, in the body of a method, you invoke another method before THAT, that_ will be clobbered and no one will warn you. Know it and stick to the rule "THAT always first in the method's body" and you'll be fine. Then, what could happen between the call and the first line of the body that would affect that_? Nothing... almost! There is actually a chance for trouble. What happens if the arguments of the method contain another method call, like in $(&operator, get)($(&op1, get)(), $(&op2, get)())? The standard leaves the order of evaluation of the function designator and the arguments unspecified, and as that_ is then assigned several times with no sequence point in between, the behaviour is undefined. That's not a problem for the arguments, their order doesn't matter here. But that's a big problem for the function designator. It MUST be the last to be evaluated, or the assignment of that_ by $() during the evaluation of the arguments will clobber the assignment made by $(&operator, get) and THAT_OPERATOR will pick up the wrong instance. Personally I don't see that as a problem. The first reason is that the compiler does its job correctly: with the flags used above, that line does not compile and leads to the following error message:

main.c:xx:46: error: operation on ‘that_’ may be undefined
 [-Werror=sequence-point]
   xx |   struct Operator*: ((struct Operator*)(that_=(a)))->b)
      |                                        ~~~~~~^~~~~
main.c:yy:11: note: in expansion of macro ‘$’
   yy |   int b = $(&operator, get)($(&op1, get)(), $(&op2, get)());
      |           ^

So you don't risk anything, the compiler protects you. The second reason, which is more an opinion than a reason, is that I personally find calling a function in the argument list of another function extremely dirty, and I have had the habit of not doing so since well before I started thinking about OOP in C. So, to me at least, it's really not a problem.
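The rule "THAT always first in the method's body" can be illustrated with a method which itself invokes a method on another instance. This is only a sketch, with a hypothetical Node object forming a linked list. As that is captured before anything else, the nested call may freely overwrite that_.

```c
#include <assert.h>
#include <stddef.h>

void* that_;

#ifndef $
  #define $(a, b) ((__typeof__(a))(that_=(a)))->b
#endif

struct Node {
  int value;
  struct Node* next;
  int (*sum)(void);
};

#define THAT_NODE struct Node* that=(struct Node*)that_

// Sum of the values from the invoking node to the end of the list
static int NodeSum(void) {
  // Must come first: the nested call below reassigns that_
  THAT_NODE;
  assert(that != NULL);
  if (that->next == NULL) return that->value;
  // The result is stored in a variable rather than nested in an expression
  int sumOfRest = $(that->next, sum)();
  return that->value + sumOfRest;
}

struct Node NodeCreate(int value, struct Node* next) {
  return (struct Node){.value = value, .next = next, .sum = NodeSum};
}

int NodeExample(void) {
  struct Node third = NodeCreate(3, NULL);
  struct Node second = NodeCreate(2, &third);
  struct Node first = NodeCreate(1, &second);
  return $(&first, sum)();
}
```

If THAT_NODE were moved after the nested call, that would silently refer to the last node visited instead of the invoking one.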

The advantages of using methods.

Great, I now have a satisfying way to declare and use methods in C. Let's see the advantages of using them.

Cleaner and shorter code.
Probably the biggest advantage I see in methods is that they enable me to write much shorter and cleaner code. Imagine you have two structures, Square and Circle. You need functions to display them. Of course the obvious name for these functions would be display() but you can't use the same name for two different functions. So you have no choice but to use something like circleDisplay(Circle* that) and squareDisplay(Square* that). Now, if you can use methods, there is no problem any more. display() is a field in a structure, and nothing prevents you from reusing that name in several structures! Even the underlying functions can share the same name, if you properly use one compilation unit per object (and static), as they will be in different units. I also really like the fact that in the format introduced here it's clear to see 'who' is processing 'what', compared to the function where 'who' and 'what' are mixed together in the argument list, plus you have one less argument to write for each method.
Define the behaviour of the object at instance level.
I wrote that the method's body should be initialised at the creation of the instance. First, you're free to initialise it to a different function based on the arguments of the creation function. Second, nothing forbids reassigning the pointer to function to modify the behaviour of an instance dynamically. An example would be a state machine. It would have a step() method which a user would call without worrying about anything else, and the state machine would internally update that method to the appropriate function depending on its state. Nice and clean. This example also shows that it may avoid branching and storing the current state, as both are encoded in the function the method is currently pointing to, hence a possible gain in performance and memory. Of course this is available without the whole machinery introduced here, all you need is (love and) pointers to function, but the present solution gives you other advantages at the same time.
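The state machine idea could be sketched as follows, assuming a hypothetical TrafficLight object cycling through three colours. Each state is a function, and step() is reassigned by the function currently playing that role, so there is no switch on a stored state.

```c
#include <assert.h>
#include <stddef.h>

void* that_;

#ifndef $
  #define $(a, b) ((__typeof__(a))(that_=(a)))->b
#endif

enum Colour { RED, GREEN, ORANGE };

struct TrafficLight {
  enum Colour colour;
  // Points to the function handling the transition out of the current state
  void (*step)(void);
};

#define THAT struct TrafficLight* that=(struct TrafficLight*)that_

static void StepFromRed(void);
static void StepFromGreen(void);
static void StepFromOrange(void);

static void StepFromRed(void) {
  THAT;
  assert(that != NULL);
  that->colour = GREEN;
  that->step = StepFromGreen;
}

static void StepFromGreen(void) {
  THAT;
  assert(that != NULL);
  that->colour = ORANGE;
  that->step = StepFromOrange;
}

static void StepFromOrange(void) {
  THAT;
  assert(that != NULL);
  that->colour = RED;
  that->step = StepFromRed;
}

// A new traffic light starts red
struct TrafficLight TrafficLightCreate(void) {
  return (struct TrafficLight){.colour = RED, .step = StepFromRed};
}
```

The user only ever writes $(&light, step)(); and the instance moves from red to green to orange and back to red on successive calls.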

The disadvantages of using methods.

To be fair I must also consider the disadvantages of using them.

Use more memory.
Given that objects store data and code (even if it's only a pointer to function), it's natural that they'll take more memory than a traditional C structure holding only data. Where this bothers me the most is in the duplication of these pointers to function. Imagine you have a sample object. If you have a huge data set with lots of samples, the pointers to function for the methods of the sample object will be duplicated once for each sample instance. Nowadays computers have insane amounts of memory and in most cases you probably don't need to care, but that's not a good excuse anyway. Methods should be used reasonably: it's not because you can use them that you must. Where a good old structure is better there is no reason not to use it. In the example of the data set, sample instances could be traditional structures with only data to stay light, and a data set object using the convenience of methods could be responsible for manipulating these sample instances. It's really a matter of code design depending on the project you're working on.
May be slower.
The management of the variable that requires two assignments (one at the call of the method and one at the entrance of the method) which a normal function call does not need, but the method has one less argument to push on the stack. If the body of the method is extremely simple, the overhead of the extra assignments may make it slower than its function equivalent. For methods with a body more complex than these assignments (which honestly should be much more common) the difference should be less significant. Also, the creation of an instance takes more time due to the initialisation of the methods, but here too you probably spend much more time using the instances than creating them, so overall that's probably not a big deal.
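To put a number on the memory cost discussed above, here is a small sketch comparing a data-only structure with the same structure carrying two methods. The names SampleData, SampleObject and SampleMethodOverhead are hypothetical, and the exact sizes depend on the platform, so none are assumed here.

```c
#include <assert.h>
#include <stddef.h>

// Traditional structure, data only
struct SampleData {
  float x;
  float y;
};

// Same data, plus two methods stored as pointers to function
struct SampleObject {
  float x;
  float y;
  float (*norm)(void);
  void (*scale)(float);
};

// Checked at compilation: the methods make each instance bigger
_Static_assert(
  sizeof(struct SampleObject) > sizeof(struct SampleData),
  "methods are expected to add to the size of each instance");

// Extra bytes used by nbSamples instances carrying their own methods
size_t SampleMethodOverhead(size_t nbSamples) {
  return nbSamples *
    (sizeof(struct SampleObject) - sizeof(struct SampleData));
}
```

The overhead grows linearly with the number of instances, which is why keeping the samples as plain structures and putting the methods on the data set object is the lighter design.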

Conclusion

I've found this solution attractive enough to decide to give it a try in a large personal project, LibCapy, to see how it performs in reality. I may update this article later with some insights gained doing so. Until then, I hope you will give it a try too! Let me know by email.

If this article was interesting to you, you may also want to read: OOP in C - Inheritance, OOP in C - Genericity.

2021-09-07