Extensions for automatic allocation of flexible array members

In this document, we examine extensions that some C compilers have implemented to make flexible array members ("FAMs") more usable, look at alternatives for compilers that do not implement them, and ask whether the lack of a portable and straightforward way to automatically allocate memory for a FAM upon initialization is a deficiency in the ISO C standard.

FAMs are a feature introduced in the C99 standard. To motivate use cases, let's consider a deficient interface that could be helped by them: the Berkeley sockets API. With the sockets API, one fills in the following structure to create or connect to a UNIX domain socket:

#define _POSIX_C_SOURCE 200809L
#include <sys/un.h>
struct sockaddr_un {
	sa_family_t sun_family;
	char sun_path[/* unspecified size */];
};
This structure may also contain additional non-standard members which a portable application need not be concerned with. Here sun_family should always be set to the magic value AF_UNIX, and sun_path contains a not-necessarily null-terminated pathname in the filesystem for the socket. POSIX leaves the size of the sun_path array unspecified, and in fact it is not clear whether it is permitted to be a flexible array member. For the purposes of this discussion, we'll assume hereafter that sun_path is not a FAM, as is the case in every major implementation. (After all, the sockets API predates the introduction of FAMs.)

In practice, sun_path is usually quite small (108 bytes on Linux and 104 bytes on the BSDs, for example), and its size is chosen arbitrarily by the standard library. This is an undesirable situation. If the pathname may contain multibyte characters, one may be limited to a pathname with as few as sizeof((struct sockaddr_un){0}.sun_path)/MB_LEN_MAX characters. If the size of the sun_path array were instead omitted from its declaration, then, being the last member of a structure that has at least one other named member, it would be a flexible array member.
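
To make the worst-case figure above concrete, here is a minimal sketch that computes it. The helper name sun_path_worst_case_chars is hypothetical and not part of any API; the expression inside it is the one given above. With a 108-byte sun_path and an MB_LEN_MAX of 16, for instance, the result would be 6 characters.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>   /* size_t */
#include <sys/un.h>   /* struct sockaddr_un */

/* Hypothetical helper: the number of characters guaranteed to fit in
 * sun_path if every character needs the worst-case MB_LEN_MAX bytes.
 * sizeof does not evaluate its operand, so no object is created here. */
static size_t sun_path_worst_case_chars(void) {
	return sizeof((struct sockaddr_un){0}.sun_path) / MB_LEN_MAX;
}
```

Calling sun_path_worst_case_chars() and printing the result with the %zu conversion shows the limit on a given system.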

The idea is that one can allocate extra memory past the end of the structure to provide as much room for the flexible array member as one wishes; the flexible array member then gives this extra room a name. One could use it like this (error checking omitted for brevity):

#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h> /* AF_UNIX */
#include <sys/un.h>

#define PATH "/run/foo/bar/socket"
int main(void) {
	/* Note that sizeof() returns the size of the structure as if the flexible
	 * array member were not there (since the compiler doesn't know the size of
	 * the memory that we're managing for it), except that there might be extra
	 * padding at the end as necessary to satisfy alignment requirements. */
	struct sockaddr_un *const addr = malloc(sizeof(*addr) + sizeof(PATH));

	/* This seems to be the only strictly conforming way to clear
	 * out the non-standard members of the structure. An all-bits-zero
	 * representation (i.e. the product of using memset()) need not be
	 * the same thing as what we'd get with default initialization. */
	memcpy(addr, &(struct sockaddr_un){.sun_family = AF_UNIX}, sizeof(struct sockaddr_un));
	memcpy(addr->sun_path, PATH, sizeof(PATH));
	/* do something with our structure */
	free(addr);
}
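
Since sun_path is not actually a FAM in real implementations, the same pattern can be tried out with a stand-in structure. The following is only a sketch: struct path_addr and path_addr_new are hypothetical names invented for illustration, and the family member is a plain unsigned short rather than sa_family_t. Unlike the listing above, it checks the result of malloc().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sockaddr_un whose path is a real FAM. */
struct path_addr {
	unsigned short family;
	char path[];            /* flexible array member */
};

/* Allocate a path_addr with room for a copy of `path`, including its
 * terminating null byte. Returns NULL if the allocation fails; the
 * caller releases the result with free(). */
static struct path_addr *path_addr_new(unsigned short family, const char *path) {
	const size_t len = strlen(path) + 1;
	/* sizeof(*addr) does not count the FAM, so its room is added by hand. */
	struct path_addr *addr = malloc(sizeof(*addr) + len);
	if (addr == NULL)
		return NULL;
	addr->family = family;
	memcpy(addr->path, path, len);
	return addr;
}
```

Because of trailing padding, sizeof(struct path_addr) may be larger than offsetof(struct path_addr, path), so this can allocate a few more bytes than strictly needed, which is harmless.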

However, for most uses dynamic allocation is overkill. If the string, or at least its length, is known at compile time, we'd like to simply write

struct sockaddr_un addr = {
	.sun_family = AF_UNIX,
	.sun_path = "/run/foo/bar/socket"
};
Unfortunately, if sun_path is a flexible array member, this is no longer possible in strictly conforming ISO C. When a structure with a FAM has automatic storage duration, it is given no room for the FAM, yet there is little sense in not "doing the right thing" when an explicit initializer is provided, as is done here. Some compilers, such as TinyCC, support this as an extension: if an initializer is provided for a FAM, then the appropriate amount of room is allocated.
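
As a related data point, GCC documents a narrower form of this extension: a FAM may be given an initializer when the object has static storage duration and is the top-level object, and room for the initializer is then reserved. As far as we know, GCC rejects the same initializer on an automatic variable. The sketch below assumes a compiler with this GNU extension; struct path_addr, default_addr and default_path are hypothetical names used only for illustration.

```c
/* Hypothetical stand-in for a sockaddr_un whose path is a real FAM. */
struct path_addr {
	unsigned short family;
	char path[];            /* flexible array member */
};

/* GNU C extension (not ISO C): static initialization of a FAM. The
 * compiler reserves enough room after the structure for the string. */
static struct path_addr default_addr = {
	.family = 1,
	.path = "/run/foo/bar/socket"
};

/* Accessor showing that the initialized FAM is readable as usual. */
static const char *default_path(void) {
	return default_addr.path;
}
```

Note that sizeof(default_addr) still reports the size of the structure without the FAM, so the extra room is not visible through sizeof.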

If the initializer is known to have a fixed size, then a disgusting but portable and strictly conforming way to get exactly what we want is to do our own "memory management" on the stack using a union. One member of the union is an array of char that is big enough to provide room for our structure together with its flexible array member.

#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h> /* AF_UNIX */
#include <sys/un.h>

#define PATH "/run/foo/bar/socket"
int main(void) {
	union {
		struct sockaddr_un addr;
		char spc[sizeof(struct sockaddr_un) + sizeof(PATH)];
	} addr = {0};
	addr.addr.sun_family = AF_UNIX;
	memcpy(addr.addr.sun_path, PATH, sizeof(PATH));
}

Given that this is possible, one wonders why C does not support it more cleanly and directly. With a GCC extension that permits variable-length arrays inside unions and structures, a similar trick works to allocate a variable amount of space for the flexible array member. This might be useful where malloc() is not available and where the structure with the flexible array member is only needed within a block of code.
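
The variable-length variant might look roughly like the sketch below. It relies on the GNU C extension that accepts a variable-length array as a member of a union declared inside a function, so it is not ISO C, and Clang is known to reject it. struct path_addr and with_stack_addr are hypothetical names used only for illustration. Since an object of variable size cannot take an initializer, memset() stands in for the = {0} used earlier.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a sockaddr_un whose path is a real FAM. */
struct path_addr {
	unsigned short family;
	char path[];            /* flexible array member */
};

/* Build a path_addr on the stack with room for `path`, then return the
 * length of the stored string to show that the FAM is usable. */
static size_t with_stack_addr(unsigned short family, const char *path) {
	const size_t len = strlen(path) + 1;
	/* GNU C extension: `spc` is a variable-length array inside a union. */
	union {
		struct path_addr addr;
		char spc[sizeof(struct path_addr) + len];
	} u;
	memset(&u, 0, sizeof(u)); /* variable-size objects cannot use = {0} */
	u.addr.family = family;
	memcpy(u.addr.path, path, len);
	/* do something with u.addr before the block ends */
	return strlen(u.addr.path);
}
```

The storage disappears when the function returns, so this only suits structures that are needed within a single block, as described above.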