OOP in C - Genericity

Generic programming is a style in computer programming in which the implementation of data structures and algorithms is decorrelated from the type of the data they manipulates. For example, implementing an array and the functions to manipulate it independently of the type of the data in the array. The implementation of data structures and algorithms in C requires the type of the data to be set at compile time. For example, there should be one implementation for arrays of integer: struct ArrayInt {int* data;};, int ArrayIntGet(struct ArrayInt* array, int idx); and one for arrays of float: struct ArrayFloat {float* data;};, float ArrayFloatGet(struct ArrayFloat* array, int idx);. The code in the 'Get' functions is exactly the same: return array->data[idx];, and if we later need an array of another type we have to implement it too and duplicate once again the exact same code. Then, it seems at first glance not possible to do generic programming in C. In this article I'll explore ways to bring genericity into C, through the examples of generic arrays and generic linked list.

Using void*.

A well known solution to achieve genericity is to use pointers to void. A void* typed variable contains the address of a location in memory without precising the type of the data at this location. It's the responsibility of the programmer to cast the pointed data into a non-void type before using it. Knowing which type means the void pointer is accompanied with some kind of identification indicating which type it should be cast into. To illustrate this, below is a dummy example of a function taking a void pointer and an identifier as arguments, casting the void pointer as necessary and printing it.

Using void* it is possible to create a generic array as follow:

An enumeration could be a better choice compare to an integer for dataType as it would prevent reuse of the same value for different types and clarify the code thanks to names instead of numbers.

And a generic linked list:

The array and list themselves are somehow generics but you can't define much functions to manipulate them. Even a 'Get' function would be impractical: its return type would need to be set to void*, raising the question of how other functions would use it. void* are dangerous because the compiler can't check any more if you're using variables of the correct type, and they limit the genericity to pointers only. It isn't possible to have a list of, for example, integer, only a list of pointer to integer. To extend genericity to base types, a solution is to use an union instead:

Which would be used as, for example:

Base types and pointers are then both available, at the cost of memory: the size of the union is the size of its biggest type. A linked list of char would use as much memory (per element) as a linked list of long long int. If the usage of memory is not critical this isn't a problem but in some case it may not be acceptable, and the user should use that kind of generic structure carefully based on its needs.

Another heavy downside of using void pointer and/or union is the need to maintain the enumeration of possible data types and the switch in the functions' body. If they are intended to be provided through a library, when the user data types are unknown at time of implementation, it isn't possible for the user to extend them himself with its own types.

Using _Generic.

The _Generic keyword has been introduced in C11. It allows to switch at compile time between several expressions based on the type of another one. It is introduced as a way to achieve genericity and can be used as follow:

From the point of view of the user it effectively looks like ArrayGet is a generic function, but to me it is very unsatisfying. Behind the scene, the structure and get function are implemented for each type, hence the goal of genericity (implement it only once) is not achieved. It is still necessary to maintain the list of supported types and add the corresponding functions for any new type, Meaning that the user can't extend the genericity to its own types if the code is burried into a library.

Using the preprocessor.

_Generic hints toward the use of the preprocessor to try to solve the genericity problem. Here the solution is to create macros that expand to the implementation of data structures and functions given a type in argument. This could be achieved as follow:

And for generic linked lists:

Now, we have data structures and functions truly implemented only once, any type available, no need to maintain lists of type identifier or _Generic, and with the macros exposed in a header file, an eventual user could add arrays and lists of any other types. That looks very promising !

However there are inconvenients here too. First, everything being defined in a macro, it must be on one single line, or split using the \ character. This makes editing these macros annoying, generates confusing debug message (a problem wherever in the macro is reported as if it was at the #define line), and worst of all prevents from writing comments in the functions created by these macros. Second, having to use a different name for each type (GenericArrayIntGet(), GenericArrayFloatGet(), ...) doesn't make it look very generic.

The "macros are one line" problem can be mitigated by splitting the declare/define macros into several submacros:

These line splitters are still a bit cumbersome but it's better, it gives room for comments at the beginning of each submacro, and no pain no gain !

Separate names have also their work around: object oriented programming.

Using OOP.

If you haven't read my article about OOP methods in C, read it first, you'll need it to understand the rest of this article.

Do you remember how methods help to resolve nicely name conflicts ? It's handy too to make the previous solution friendlier. Instead of having one function name per type, if you use methods, from the user point of view it becomes only one shared name accross all types. For example, using LibCapy for the method framework:

This brings us to a solution to the genericity problem which I find very interesting. No code duplication, extandable to any type (including not yet defined ones), non occlusive to type checking by the compiler, simple and intuitive usage for the user, and relatively simple to implement for the developer.

Using casting.

In the examples above I was only defining the get function to keep things short and simple. In reality these structures would have also functions to set, add, find, remove, count, etc... What about a function to sort ? It's a common operation on such structures so it sounds legitimate. However, if you think about it, you realise that sorting algorithms do not depend on the type of the containers they sort. Quick sort is conceptually the same for a generic array and a generic list. Hence, if you've implemented it as a method of a generic array, you'll need to implement it again as a method of a generic list. Considering we are interested in genericity here that's pity. Wouldn't it be possible to apply the same solution described above to implement a generic sorting function ? The code below gives an example (skipping the code for the generic array and list which has been given in the previous sections):

Note the unspecified arguments for the cmp pointer to function in the generic sorting function. This allows to pass cmpInt or cmpFloat as necessary.

It works has desired, but can be improved. The generic function relies on the shared name for the method getPtr and swap. This creates an unnecessary constraint on the data structure acceptable as type. Also, the cmp function is a bit of a burden. We want to let the user to choose upon which criteria the container is sorted, but it would be handy if a default criteria could be used without having to be specified. Lastly, the name of the generic function, including the type of container, is quite long. Using the OOP approach we can again improve on all these points. The previous solution, relying on shared name for the necessary methods, points toward another object of higher conceptual level, whose interface is the one needed to implement the sorting algorithm. I'll call this object here a SortableContainer. To sort a generic array or list, first cast it into a SortableContainer, then use this new object to execute the sorting. The code below shows how to implement it:

Here I introduce an operator which I haven't spoken about in my article about OOP: #define $$(instance, actor, method) ((capyThat=(actor)) ? (instance)->method : NULL). It allows to execute the method of the instance as if actor was the instance.

Then I add to the generic array and list a method to cast into a SortableContainer and use it as follow:

Conclusion.

In this article I have introduced solutions to the problem of genericity in C. In particular, using the preprocessor, OOP's method and void* pointers it is possible to implement truly generic data structures and functions, written only once and later extendable to any applicable data type in a rather simple way, even for complex scenario as the sorting functionality. The only downside is the edition of the preprocessor commands. If the developper accept this burden, he can provide to the user very easy to use generic code and benefit from the advantages of genericity: faster development, easier maintanability and more readable code.

Finally, here is a minimal listing for a generic array and list, and their sorting with a generic sorter:

2021-10-03
in All, C programming language,
31 views
Copyright 2021 Baillehache Pascal