Trap representations and padding bits
June 16, 2016
The effects of padding and trap representations in C
The C programming language does not hide from you how the values you manipulate are represented. One consequence is that when padding happens, its presence may have observable effects in carelessly crafted programs. Padding is well-known to appear between members of a struct, and also possibly after the last member of a struct. The remaining space in a union when the active member is not the widest one is also considered padding. A C programmer who only cares about usual x86 platforms might be excused for thinking that, for them, this is it. As for trap representations, these may be believed to be reserved for weird hardware that uses one’s complement or explicit parity bits.
Padding in structs and unions
Naïve attempts at making an x86-64 compiler take advantage of the unspecified nature of struct padding fail, as in example functions f, g and h, but the function i, provided by Alexander Cherepanov, shows that the padding of a struct x does not have to be copied along with the rest of the struct in the two consecutive struct assignments y = x; z = y;
#include <string.h>

int i(void)
{
    struct { char c; int i; } x, y, z;
    memset(&x, 1, sizeof x);
    memset(&z, 0, sizeof z);
    y = x;
    z = y;
    return *((char *)&z + 1); /* on x86-64, a padding byte between c and i */
}
The function i is optimized to return 0, meaning that not all of the memory dedicated to x was copied from x to z: the padding byte read at the end still holds the 0 that memset wrote into z.
These occurrences of padding can be prevented when programming for a specific architecture, or a couple of architectures(*), with interstitial character array members. If, in a new project, you are implementing processing so sophisticated that it requires you to define a struct, C may be the wrong language to use in 2016. However, if you are going to do it anyway, you might want to insert these explicit interstitial members yourself: nothing the compiler can do with these unused bytes is worse than what the compiler can do with padding. Clang has a warning, -Wpadded, to help with this.
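As a minimal sketch of this advice, the struct below replaces the padding with an explicit member and asks the compiler to reject any layout that still needs padding. The type name record, the member name unused and the helper copied_filler_byte are made up for illustration, and the layout assumes that int is aligned on its own size, as on usual x86 ABIs.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memset */

/* Hypothetical record type; names are illustrative only.
   Assumes int is aligned on sizeof(int), as on usual x86 ABIs. */
struct record {
    char c;
    char unused[sizeof(int) - 1]; /* explicit member where padding would be */
    int i;
};

/* Fail the build if the compiler still had to insert padding. */
static_assert(offsetof(struct record, i) == sizeof(int), "padding before i");
static_assert(sizeof(struct record) == 2 * sizeof(int), "padding after i");

/* Same shape as the function i above, but the byte read at the end
   belongs to a member, and member values survive struct assignment. */
int copied_filler_byte(void)
{
    struct record x, y, z;
    memset(&x, 1, sizeof x);
    memset(&z, 0, sizeof z);
    y = x;
    z = y;
    return z.unused[0];
}
```

Here copied_filler_byte has to return 1, because unused is a member and its value is part of what y = x; z = y; copies.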
Consequences
Since padding does not have to be copied in a struct assignment, and since any struct member assignment can, according to the standard, set padding to unspecified values, memcmp is in general the wrong way to compare structs that have padding. Sending an entire struct wholesale to an interlocutor across the network (by passing its address to write) can leak information that was not intended to get out.
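Both pitfalls can be avoided by working member by member. The sketch below is only one possible shape for this: the type msg and the functions msg_equal and msg_serialize are hypothetical names, and the wire layout (host byte order, no framing) is an arbitrary choice made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message type; it has padding after c on usual x86 ABIs. */
struct msg {
    char c;
    int i;
};

/* Compare the members, not the bytes: padding never takes part. */
bool msg_equal(const struct msg *a, const struct msg *b)
{
    return a->c == b->c && a->i == b->i;
}

/* Copy only the members into a caller-provided buffer, so that no
   padding byte reaches the wire. Returns the number of bytes written,
   or 0 if the buffer is too small. */
size_t msg_serialize(const struct msg *m, unsigned char *out, size_t cap)
{
    const size_t need = sizeof m->c + sizeof m->i;

    assert(m != NULL && out != NULL);
    if (cap < need)
        return 0;
    memcpy(out, &m->c, sizeof m->c);
    memcpy(out + sizeof m->c, &m->i, sizeof m->i);
    return need;
}
```

The buffer filled by msg_serialize, rather than the struct itself, is what would then be passed to write.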
The rest of this post discusses padding and trap representations in scalar types, which, as we will show, our exemplar C programmer for usual x86 platforms might encounter after all. Also, padding in scalar types cannot be eliminated by adding members, so the problem, if rarely noticed, is in some ways more annoying than struct padding.
Padding in scalar types
It would be easy to assume that in such an ordinary architecture as x86, padding only happens because of structs and unions, as opposed to scalar types. The x86 architecture is not that ordinary: its first historical floating-point type occupied 80 bits. Many compilation platforms still make that floating-point type available as long double, for some reason as a type of 12 or 16 bytes in total (in 32-bit and 64-bit mode respectively), of which 2 and 6 bytes respectively are padding. Padding may or may not be modified when the value is assigned, as shown in another example from Alexander Cherepanov:
#include <string.h>

int f(void)
{
    long double x, y;
    memset(&x, 0, sizeof x);
    memset(&y, -1, sizeof y);
    y = x;
    return ((unsigned char *)&y)[10]; /* first byte after the 80-bit value */
}
The function f above is compiled as if it were return 255; although the entire memory assigned to x was set to 0 before copying x to y with y = x;.
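The practical consequence is the same as for structs: compare long double objects by value, not by bytes. The sketch below uses two hypothetical helpers. The first one guesses the number of padding bytes under the assumption that a 64-bit significand means the 80-bit x87 format stored in 10 bytes, which need not hold on every platform. The second one repeats the assignment from f and compares values.

```c
#include <assert.h>
#include <float.h>    /* LDBL_MANT_DIG */
#include <string.h>   /* memset */

/* Number of padding bytes in long double, or -1 when the format is not
   recognized. Assumption: a 64-bit significand means the 80-bit x87
   extended format, whose value occupies 10 bytes of the object. */
int long_double_padding_bytes(void)
{
#if LDBL_MANT_DIG == 64
    return (int)sizeof(long double) - 10;
#else
    return -1;
#endif
}

/* After y = x the values are equal, whatever the padding bytes hold. */
int values_equal_after_assignment(void)
{
    long double x, y;
    memset(&x, 0, sizeof x);
    memset(&y, -1, sizeof y);
    y = x;
    return x == y; /* compares values; memcmp would also compare padding */
}
```

Under the stated assumption, long_double_padding_bytes is expected to give the 2 and 6 bytes mentioned above for the 12-byte and 16-byte variants of long double.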
Trap representations
Trap representations are a particular case of padding bits. A symptom of padding bits is that a type t has fewer than 2^(CHAR_BIT × sizeof(t)) distinct values. In the case of trap representations, some of the bit patterns are considered erroneous for type t. Accessing such erroneous representations with an lvalue of type t is undefined behavior.
The latest draft of the C11 standard contains this footnote(**):
53) Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.
This footnote is in the context of the representation of integer types, which are made of value bits, padding bits, and in the case of signed integers, one sign bit (6.2.6.2).
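For an unsigned integer type, the number of padding bits can be measured directly, since every value bit is set in the maximum value. The following is a small sketch with made-up function names, shown for unsigned int only.

```c
#include <assert.h>
#include <limits.h>   /* UINT_MAX, CHAR_BIT */

/* Number of value bits of unsigned int, counted from UINT_MAX. */
int unsigned_value_bits(void)
{
    int bits = 0;
    for (unsigned int m = UINT_MAX; m != 0; m >>= 1)
        bits++;
    return bits;
}

/* Padding bits are whatever the object has beyond its value bits. */
int unsigned_padding_bits(void)
{
    assert(sizeof(unsigned int) * CHAR_BIT <= INT_MAX);
    return (int)(sizeof(unsigned int) * CHAR_BIT) - unsigned_value_bits();
}
```

On usual x86 platforms unsigned int has 32 value bits in 4 bytes, so unsigned_padding_bits should report no padding at all, which is exactly what the ordinary programmer expects.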
“I have no access to no parity bits,” the ordinary programmer might think. “Even if redundant internal representations are part of the implementation of my high-performance computer, the interface presented to me is that of ordinary memory, where all bits are value bits (with sometimes a sign bit).”
In fact, as implemented by GCC and Clang, the _Bool type has two values and 254 trap representations. This is visible in the following example:
int f(_Bool *b)
{
    if (*b)
        return 1;
    else
        return 0;
}
The function f in this example, as compiled, returns 123 when passed the address of a byte that contains 123. This value does not correspond to either of the two possible execution paths of the function. Undefined behavior is the only excuse the compiler has for generating code with this behavior, and this means that GCC and Clang, for the sake of this optimization, choose to interpret bytes containing a value other than 0 or 1 as trap representations for the type _Bool.
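When a byte comes from a source that is not guaranteed to contain 0 or 1 (a file, the network, uninitialized memory), one way to stay clear of the undefined behavior is to read it as unsigned char, a type without trap representations, and to convert only afterwards. The sketch below uses the hypothetical names bool_from_byte and f_checked. Mapping every nonzero byte to true is a policy chosen for the example, and rejecting values above 1 would be just as defensible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Normalize a byte from an untrusted source into a bool.
   Policy (an assumption of this sketch): any nonzero byte is true. */
bool bool_from_byte(unsigned char byte)
{
    return byte != 0;
}

/* Variant of f that never reads the byte through a _Bool lvalue. */
int f_checked(const void *p)
{
    unsigned char byte;

    assert(p != NULL);
    memcpy(&byte, p, 1); /* unsigned char has no trap representations */
    return bool_from_byte(byte) ? 1 : 0;
}
```

Passed the address of a byte that contains 123, f_checked returns 1, one of the two results the source code allows.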
This post owes much to Alexander Cherepanov’s examples, John Regehr’s encouragement and Miod Vallat’s remarks.
Footnotes:
(*) I would have recommended using fixed-width integer types from stdint.h, such as uint32_t, to make the layout fixed, but that does not solve the problem with floating-point types or pointer types.
(**) The latest C11 draft contains two very similar footnotes 53 and 54 for which it is not clear that both were intended to be present in the final revision.