David

Verifying loops in C and C++ - Part 1

When it comes to formal verification of single-threaded programs, one of the hardest constructs to verify is the humble loop. If a function contains no loops and no function calls, then a static analyzer can trace through the function, looking for constructs (such as indexing an array, or dividing one number by another) that have an implied precondition, and relating these preconditions back to the function inputs. For example, consider this function:

uint32 elemDiff(const uint32 * array arr, size_t p, size_t q) {
  return arr[p] - arr[q];
}

This function contains two array index expressions, therefore we can infer preconditions arr.lwb <= p <= arr.upb and arr.lwb <= q <= arr.upb immediately. If it weren’t for the fact that eCv requires nullable pointers to be declared as such, we would infer arr != NULL too. Finally, we’re taking the difference of two unsigned numbers, so we need a further precondition arr[p] >= arr[q]. If you’re wondering what the word array is doing in there, see my post on The taming of the pointer – part 2.

In this way, a static analyzer can calculate the weakest precondition that must be satisfied if run-time errors and other undefined behaviour is to be avoided, and the postcondition that will be achieved if the weakest precondition is satisfied. In the above example, the postcondition is simply result == arr[p] – arr[q].

We can extend this to include functions that call other functions. If function A calls function B, then we establish the precondition and postcondition of B first. Wherever A calls B, we relate the precondition of B back to A‘s inputs, and use the postcondition of B to calculate the program state just after the call.

In principle we could analyze the whole program in this way, so that we can determine whether there are any inputs for which the program uses features of the C language in a way that give rise to (in the words of the ISO Standard) “undefined behaviour”. Calls to library functions for which we have no source code need not be a problem, if the library functions are sufficiently well-documented for us to supply appropriate pre- and post-conditions.

This happy situation disappears if we use recursion. If a function is directly or indirectly recursive, then we can’t calculate weakest preconditions so easily, because calculating the precondition of a recursive function A requires knowing the precondition and postcondition of A at the point that it calls itself. I won’t go into possible solutions, because when writing critical software, recursion is normally banned (e.g. MISRA C 2004 rule 16.2).

Loops also spoil the party. More precisely, a loop makes analysis difficult if it changes the state of the program, and we can’t easily work out how many times the loop body will be executed. Consider the following:

int foo(const int * array arr, int size, int key) {
  for (int i = 0; i < size && key < arr[i]; ++i) {
  }
  return arr[i + 1];
}

What are the weakest precondition and the postcondition of foo? They are not easy to determine automatically. Even if we can work them out, the weakest precondition is quite likely not as strong as the precondition intended by the user. For example, the author of foo() probably intended that it should be called with size equal to (or at least no larger than) the number of elements in arr; but the weakest precondition allows size to be greater than the number of elements as long as there is an element greater than key. By way of another example, consider the call strncpy(dst, src, count). Everyone knows that count should be no more than the number of elements in the array dst. However, if we analyze the code for strncpy we find that the corresponding weakest precondition is that min(strlen(src) + 1, count) is no more than the number of elements in dst.

eCv requires that all function preconditions be declared explicitly. Even if you’re not using formal analysis, it’s a good idea to write preconditions. Good software engineers always prefix functions with comments such as /* ‘size’ is the number of elements in ‘arr’ */ . Writing a precondition is just a more precise – and usually more concise – way of expressing the same thing. I find it amazing that, more than 40 years after the importance of preconditions was first recognized, very few programming languages even have syntactic support for them.

The other sort of information that eCv requires you to write is a loop invariant for every loop. If you want eCv to prove that the loop terminates, then you’ll also need to provide a loop variant, unless eCv can work one out for itself.

Loop invariants and variants are by no means specific to eCv. You’ll find them in other formal verification systems too, such as Vcc, Perfect Developer, SPARK Ada and Spec#. So, if you want to get into formal verification, they’re well worth learning about. I’ll introduce loop invariants in my next post.

ARTICLES   List of Articles     TOPTOP