Out-of-bounds pointers: a common pattern and how to avoid it

April 20, 2016

Recent evolutions of TrustInSoft Interpreter

Pointer

A quick tis-interpreter update

At long last, tis-interpreter was released as open-source last month. It is somewhat difficult to install, and as a remedy I will regularly prepare binary snapshots for Linux (and maybe OpenBSD, OS X). At the time of this writing, the first publicly available binary snapshot can be downloaded from the bottom of the tis-interpreter homepage.

 The rest of this post is about extremely recent evolutions in tis-interpreter.

A newfound strict stance on pointer arithmetic

As tis-interpreter was getting easier to use and was getting used, the feedback from applying it led to what may seem like drastic changes. Here is one example.

 Among other software components, John Regehr applied tis-interpreter to Google’s implementation of their WebP image format, and then to SQLite. The former is only a compressed format for static images, but the latter is a large library with an even larger testsuite. Both were worth looking at, but the latter was a significant effort.

 In libwebp, among other problems, John reported a subtle bug in a pointer comparison. libwebp would check whether the size declared for a sub-object was consistent by comparing what would be the address to continue parsing at (after the sub-object, if it had the size it said it had) to the end pointer. This idiom is dangerous. tis-interpreter warned for the dangerous comparison, I wrote a short essay, John reported the bug, and it was the end of that story.

On the SQLite front, progress was slow. The very same “dangerous pointer comparison” warning was being emitted because of one-indexed arrays implemented as int *one_indexed = ptr – 1;, as explained in John’s post. John complained that the “pointer comparison” warning was emitted on an eventual one_indexed == NULL, arbitrarily far from the allocation of the one-indexed array, which was the real culprit, and in any case, the line he wished to change so as to suppress all warnings coming from that pointer in one swoop. After multiple requests, I finally implemented detection of invalid pointer arithmetic in tis-interpreter, so that the warning would be on ptr - 1 instead.

A surprise lurking in already visited code

This week, my colleague Anne Pacalet found herself going back to analyzing libwebp. She naturally started by using the latest version of tis-interpreter, with its all-new early warnings about invalid pointer arithmetic. She discovered that apart from the reported, since long fixed end_ptr idiom where a pointer was compared after going out of bounds, other invalid pointers are computed inside libwebp. This happens when decompressing examples provided as testcases by Google itself. The line sporting the tis-interpreter warning is this one, and the surrounding code looks like this:

  for (j = 0; j < height; ++j) {
    for (i = 0; i < width; ++i) {
      const uint32_t alpha_value = alpha[i];
      dst[4 * i] = alpha_value;
      alpha_mask &= alpha_value;
    }
    alpha += alpha_stride;
    dst += dst_stride; // goes out of bounds on the last iteration
  }

Do not confuse this with a buffer overflow or a similar immediate problem! The pointer dst goes out of bounds when it is computed at the end of the last iteration, and it is never used after that. Besides, it may look like this last value of dst is a one-past-the-end pointer as allowed by the C standard, but that is only true if dst started from 0.

The function in question is called from EmitAlphaRGB (through the WebPDispatchAlpha function pointer). At line 183, dst may have been incremented by 3 before being passed as argument, and in this case, the last value of dst is way out of bounds (it is four-past-the-end, but there is no “four-past-the-end” exception for pointer arithmetic in the C standard).

Fixing the problem

It may help both to communicate the problem and to convince the maintainers to remove it if it is reported with an example of change that would keep the computation the same and remove the undefined behavior. We looked at different variations. I thought I had a solution that wasn’t too ugly when I arrived at the following change:

-  for (j = 0; j < height; ++j) {
+  j = 0;
+  while (1) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
       dst[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
+    if (j == height - 1) break;
+    ++j;
     alpha += alpha_stride;
     dst += dst_stride;
   }

The above change applied, the exit test is now after dst has been used in the iteration but before it is incremented, so that the ultimate undefined-behavior-invoking incrementation does not take place. The generated assembly for the code above seems fine, but there is a subtle difference: the original function never entered the loop if height was 0, whereas this one always does at least one iteration. This is something that would have to be discussed with the developers, but the problem does not seem worth a discussion. We could pre-emptively guard the loop with if (height>0), but then the source code gets uglier, at the very least, and the generated assembly code might be longer than what was generated for the original function.

Jamey Sharp's solution

Jamey Sharp provided a brilliant insight. His remark was: “what about just doing the obvious multiply and letting strength reduction turn it into this code? Easier to maintain and correct”

Translated to code, Jamey’s remark amounts to this change:

   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      dst[j * dst_stride + 4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
   }

Indeed, we are removing more code than we are adding. It solves the undefined behavior problem because now, in the source code, j dst_stride is only added to dst at the time of accessing the memory location (in the source code, dst + height dst_stride is never computed). The beauty of this solution is that an optimizing compiler generates exactly the same assembly code, to the comma, for this version as it did for the original version:

$ diff -u old.s jamey.s
$

In other words, we stick to legal (and concise and readable) operations in the source code. The compiler does the tedious work of knowing that on the target platform, pointers are just integers and that a multiplication by the index inside a loop can be transformed into an addition. This is the correct division of labor between developer and computer.

Miod Vallat's solution

Miod Vallat pointed out that the solution above required you to trust that the compiler would always succeed at the strength-reduction optimization. Miod’s solution instead expresses in the source code the fact that pointers are just integers. It should work fine for all the ordinary platforms we are used to, because of footnote 67 in the C11 standard.

   int i, j;
+  uintptr_t dst_int = (uintptr_t) dst;
 
   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      ((uint8_t*)dst_int)[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
+    dst_int += dst_stride;
   }

With Miod’s solution too, the compiler (GCC 5.2) produces the exact same assembly as before:

$ diff -u old.s miod.s
$

Conclusion

The new “bug” discussed here is, as far as I know, harmless with current compilers. In this particular instance, by removing the undefined pointer arithmetic, we are making libwebp safer from agressive compiler optimizations for the next ten or fifteen years, but we are not solving any immediate problem. It is not always that clear-cut. The pointer comparison problem that had been reported earlier in libwebp, with current compilers and execution platforms, would never have noticeable consequences on the ordinary computers security researchers use to look for problems, but it could have consequences in a peculiar configuration.

 The option remains to disable the warnings on the creation of invalid pointers: if that is what you want, use:

tis-interpreter.sh -no-val-warn-pointer-arithmetic-out-of-bounds …

Newsletter