An old quirky libksba bug

June 27, 2017

Finding bugs with TIS Interpreter

Finding bugs with TrustInSoft Interpreter

The libksba library, used by GnuPG, provides functions for parsing X.509 cryptographic certificates. I was testing libksba with TIS Interpreter a little over a year ago. One of the bugs I found then illustrates a point I would like to make now.

The bug

Consider this function, as it was present in libksba two years ago:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  gpg_error_t err;
  AsnNode n;
  char *algo;
  size_t nread;

  if (!cert)
    return NULL;  /* Ooops (can't set cert->last_error :-().  */

  if (!cert->initialized)
    {
       cert->last_error = gpg_error (GPG_ERR_NO_DATA);
       return NULL;
    }

  if (cert->cache.digest_algo)
    return cert->cache.digest_algo;

  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The source code above contains the bug. The bug can almost be found by inspection of the function, inferring contents of types and behavior of callees from the way they are used. I have only removed commented-out code from the original. If you think that you have identified the bug I found one year ago, but that it may depend how the function ksba_cert_get_digest_algo is used, you may be on the right track. Here is how ksba_cert_get_digest_algo is invoked in tests/cert-basic.c:

  oid = ksba_cert_get_digest_algo (cert);
  s = get_oid_desc (oid);
  printf ("  hash algo.: %s%s%s%s\n",
          oid?oid:"(null)", s?" (":"",s?s:"",s?")":"");

If on the other hand, even with this clue, you are still finding the entire function ksba_cert_get_digest_algo too tedious to review, here is a synthetic view that makes that bug stand out:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  …
  char *algo;
  …
  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The bug is that the automatic variable algo remains uninitialized until ksbaasn_find_node is called. When that function succeeds, &algo is passed as argument of the function ksbaparse_algorithm_identifier (without looking at it, we can assume that that function does initialize algo in all cases). The fact remains that when ksbaasn_find_node fails because of the condition !n || n->off == -1 being true, the variable algo remains uninitialized until its “value” is returned as the result of the function.

The anecdote

The funny thing about this bug is that it is very easy to produce inputs for. I was using afl, an efficient and easy-to-use fuzzer, to generate testcases. This bug was the second one I found, right after I set up afl and TIS Interpreter to work together on libksba. It turned up just after afl generated an input that demonstrated a crash caused by an unrelated first bug. Whenever the target software crashes, afl generates a README.txt file to invite the user to report their success to afl’s author. This is smart timing: when using a fuzzer to find bugs, producing an input that causes a direct crash is one of the best possible outcomes. afl’s README file is thus a subtle declaration of victory: you only see it when afl has done a good job.

 The README, a pure ASCII file intended for human consumption, of all inputs pretending to be X.509 certificates, also happens to trigger the uninitialized-variable-use bug described in the first part of this post. Let they who would not have used the shell command for i in findings_dir/crashes/* ; do run_in_tis_interpreter.sh $i ; done throw the first stone.

 No great coincidence happened here, neither was the README.txt file generated designed to look fiendishly similar to a X.509 certificate. Any brute-force purely random fuzzer would have instantly produced a file that triggered the bug shown. In good time, afl would have found one too—and, importantly, it would have recognized that such an input caused a different execution path than less obviously incorrect certificates to be followed inside libksba’s cert-basic test program, and would have set it apart for later inspection.

Epilog

The crash initially found by afl was fixed in a timely manner by Werner Koch in commit a7eed17, and the uninitialized-variable-use bug described in this post was fixed in commit 3f74c2c.

 I never formally reported any of afl’s numerous findings to afl’s author, despite the instructions in the input file. Well, this post can be my report: all the bugs reported by me in libksba in 2016 were found using afl and TIS Interpreter together. This post is but an example to illustrate how well these two tools work together.

 afl had been used on libksba before (for instance this 2014 vulnerability is listed as having been found using afl). But although the uninitialized-variable-use bug is very shallow and can be evidenced by throwing any README file at cert-basic, the bug was probably missed because it did not cause a crash. Barring weird optimizations, when executing a binary not instrumented to detect the use of uninitialized variables, the variable’s value is what happened to be on the stack. The stack is full of valid addresses: addresses of variables and buffers that have previously served, but also addressed of call sites to return to. Reading from any of these addresses, it is likely that a zero will be found before the end of the page (code contains plenty of null bytes because of how constants in instructions are typically encoded, and data contains plenty of null bytes too, of course). This means that running cert-basic directly does not reveal, via a crash, that something wrong happened. All that happens is that instructions or (possibly secret) data is printed as if it were a string. Since there are no a priori expectations for what cert-basic should print when passed an invalid file, this is difficult to notice.

 One solution, of course, is to have used MemorySanitizer (MSan for short) to compile libksba’s cert-basic. One might speculate that the bug was not found earlier because MSan’s instrumentation is incompatible with ASan’s, and that users of afl, if they use a sanitizer at all, use ASan in order to find the most dangerous memory bugs.

 TIS Interpreter takes more time to run a test program than it takes to compile and execute this test program with MSan or ASan instrumentation, but TIS Interpreter finds all the problems that these two sanitizers find, and more.

 afl can produce minimal test suites that exert all the different execution paths it was able to identify in the target programs. This feature enables TIS Interpreter to be used harmoniously together with it.

 If the only way to use the two together was to run TIS Interpreter on every input that afl generated and discarded, then the collaboration between the two tools would be fruitless. The interesting inputs would be lost in a sea of uninteresting ones and TIS Interpreter’s comparatively slow speed would make it seem like it is not finding anything. But since afl can produce minimal test suites, covering a lot of different behaviors of a typical parser with a few hundred testcases, these few hundred testcases should definitely be run in the best detector of undefined behavior that exists, even if that detector uses up a few seconds per testcase. And, for C programs, the best detector of undefined behavior in terms of detected undefined behaviors is arguably TIS Interpreter.

Newsletter