Mutable structures: Arrays


Nearly all languages with imperative elements have some kind of an array. In C
an array is just a typed pointer to a memory region. You can address the
different elements by giving an offset to the start of the memory region. The
compiler multiplies the offset with the size of the objects contained in the
array to get the actual offset in bytes.

The C view of arrays has certain advantages: It is memory efficient because it
uses just one pointer. Accessing elements is easy because it just needs some
simple address arithmetic.

But the C view has some pitfalls as well. First you might fail to allocate
memory for the array. Then the pointer to the memory region has some undefined
value. Second you might read elements of the array which have not yet been
initialized. Third you might read from or write to elements of the array which
are not within the allocated region.

Modern languages avoid this pitfalls by certain methods which can be executed
at compile time or at runtime. The failure to initialize the pointer to the
memory region is usually handled by initializing all pointers with zero. Any
access to a zero pointer results in a runtime exception. Addressing elements
outside the region is usually handled at runtime as well by storing with the
array its capacity and generating an exception if elements outside the region
are addressed.

All these solutions avoid memory corruption but they have a cost. Additional
code is required to check null pointer accesses and to check out of bound
accesses. The single pointer implementation is no longer sufficient because
the size of the allocated region has to be stored in the runtime object.

A language which allows formal verification can avoid all these pitfalls
without any cost. Without any cost? Well — without any runtime and memory
footprint cost. But nothing is free. Some assertions must be provable at
compile time. I.e. whenever you want to access some element of the array you
have to be sure that the index is within the array bounds. And it is not
sufficient that you are sure. You have to convince the verifier of this fact.

In the following article we show an array structure in our programming
language which allows us to convince the verifier that we make no illegal

Since an array is a mutable structure we have to address the framing problem
as well. The framing problem is addressed appropriately if for any modifying
procedure it is clear not only what it does change but also what it leaves

A lot of effort is done currently in the verification community to address the
framing problem. Frame specifications like modify and use clauses for each
procedure, separation logic, different sophisticated heap models etc.

In this article we demonstrate that the framing problem can be solved without
all these complicated structures. It is sufficient to regard access to data
structures as functions and have a sufficient powerful function model. It is
amazing to see that proper abstraction makes some difficult seeming problems a
lot easier.

As a warm up let us see what kind of operations we expect from arrays. First
of all we want to be able to declare variables of type array which hold
elements of a certain type.

          s: STRING
          a: ARRAY[STRING]

The declaration does not allocate the object. Therefore accessing any element
of the array has to be illegal at this point.

          a[4] := "Hello"     -- Illegal, array not yet allocated!

To allocate the object we need some kind of object creation.

          create a.with_capacity(10)

This instruction creates an array object with the possibility to store up to
10 strings. The creation just allocates the structure but does not populate
the array with any string.

At this point it is illegal to read elements.

          s := a[0]           -- Illegal, access to an uninitialized object!

But we can write some elements of the array and access these elements

          a[0] := "first string"
          a[1] := "second string"

          s    := a[1]

          s    := a[2]         -- Illegal, access to an uninitialized object!
          s    := a[100]       -- Illegal, out of bound access!

If the verifying compiler is able to detect these illegal accesses and let
pass only the legal accesses, then the actual runtime code can have the same
size as the corresponding C code and the actual array object can be
represented by a single pointer to the memory region.

Basic declarations and access functions

The type ARRAY is declared within the module array

  -- module: array

We need a formal generic which represents the type of the elements.

  G: ANY              -- Element type

There are no constraints on the element type. Any type can be used as element

The generic class ARRAY has three basic access functions and an invariant.

      capacity: ghost NATURAL
          -- The allocated capacity of the array.

      domain:   ghost NATURAL?
          -- The set of indices of elements which have been initialized.

      [] (i:NATURAL): G
              -- Element at position `i'.
              i in domain
      domain <= {i: i<capacity}    
             -- Note: '<=' for sets is the subset relation

‘capacity’ returns the allocated capacity of the array. Since it is a ghost
attribute it can be used only in assertions and cannot flow into actual
computations. The attribute ‘domain’ represents the set of indices of array
elements which have already been initialized. This is a ghost attribute as

The bracket operator ([]) represents the function which returns the element
at position ‘i’. It uses ‘domain’ in its precondition to express that only
elements at initialized positions can be allocated.

The invariant states that only elements within the allocated capacity can be

We have presented the above declaration of the class ARRAY in an object
oriented manner. Programmers familiar with object oriented languages like
C++, Java, C# should have no problems to read such a declaration. But it
should be noted that such a form of declaration hides one important thing. All
routines declared within the class ARRAY have one implicit argument which is
not explicitly mentioned, namely the array object (called ‘this’ in C++ and
Java). In order to make this argument more explicit we present this
declaration in a different but equivalent form.


      capacity(a:CURRENT): ghost NATURAL

      domain(a:CURRENT):   ghost NATURAL?

      [](a:CURRENT, i:NATURAL): G
              i in a.domain

              a.domain <= {i: i < a.capacity}

This form of the declaration makes transparent that the used identifiers and
operators have the following types:

      capacity: ghost CURRENT->NATURAL

      domain:   ghost CURRENT->NATURAL?

      ([]):     [CURRENT,NATURAL] -> G

Remark: In our programming language we use the Haskell convention that an
operator (unary or binary) put between parentheses can be used like an
ordinary function. I.e. for numbers the expressions ‘7+3’ and ‘(+)(7,3)’ are
equivalent, but the first one is preferable because it is much more readable.

If something looks like an attribute in object oriented notation it is
actually a function because it needs an argument. Therefore an assignment to
an attribute is something fundamentally different from the assignment to a
local variable. An assignment of a new value to an attribute is a modification
of the corresponding function.

Note that the unfolded form of the class invariant makes evident that the
invariant is not only an object invariant but an invariant for the whole
system. In the case of such simple classes as ARRAY this makes no
difference. But if we look at linked structures the difference is fundamental.

Implicit invariant

The function domain has been used in the precondition of the function
([]). This creates the following implicit invariant.

      ([]).domain = {a:CURRENT, i:NATURAL: i in a.domain}

Since procedures might modify the functions ([]) and domain, they have to
respect the invariant, the explicit and the implicit invariant.

Dependence of the access functions

The basic access functions capacity, domain and ([]) are independent in the
sense that the value of one function does not depend on the value of any other
or these basic functions. There is a consistency condition between domain and
([]) because the function domain is used in the precondition of ([]) (see
previous chapter). However the value returned by ([]) does not depend on the
function domain.

The basic access functions capacity and domain have another consistency
condition expressed in the invariant. Note that consistency conditions do not
establish a dependence.

Based on the basic access functions we can define other functions. E.g. we
might define a function is_full which returns true if all positions in the
array are accessible.

      is_full(a:CURRENT): ghost
              -- Are all positions of the array `a' accessible?
              Result = ({i: i < a.capacity} = a.domain)

This function has a clear dependency: It depends on the functions capacity and
domain. This implies that any modification of the functions capacity and/or
domain modifies implicitly the function is_full as well.

Creation of an object

In order to be able to create an object of type array we need a creation
procedure (or a constructor called in languages like C++, Java and C#).

              -- Create an array with capacity `c'.
              domain   = 0          -- '0' is the empty set
              capacity = c

In order to make the hidden argument more transparent we take the declaration
out of the class and get the following equivalent declaration.

      with_capacity(a:CURRENT, c:NATURAL)
              a.domain   = 0
              a.capacity = c

Each creation procedure receives as its first argument an allocated but
uninitialized object of the corresponding type. The task of the creation
procedure is to initialize the object such that it satisfies the consistency
conditions expressed in the invariant.

The postcondition states that the creation procedure has an effect on the
functions ‘domain’ and ‘capacity’. Remember that the corresponding types are

Expressions like ‘a.domain=0’ in postconditions of procedures are interpreted
in our language rather strictly. It reads: the identifier ‘domain’ identifies
a new function which is the same as the function which it identified before
the call except for the argument ‘a’. Only functions mentioned in the
postcondition (and functions which depend on them) are modified by a
procedure. I.e. the declaration of the creation procedure is interpreted in the
following manner.

      with_capacity(a:CURRENT, c:NATURAL)
              domain   = old (domain + [a,0])
              capacity = old (capacity + [a,c])
              ([])     = old ([])

For any function f:A->B the expression ‘f+[a,b]’ is defined as

      f + [a,b] =  (x -> if x=a then b else f(x) end)

with the consequences

      (f+[a,b]).domain = f.domain + {a}

      x=a  => (f+[a,b])(x) = b

      x/=a => (f+[a,b])(x) = f(x)

I.e. we view functions as sets of key value pairs then the expression ‘f+[a,b]’
is the set of key value pairs of the function ‘f’ except for the key ‘a’ where
the pair [a,b] either overrides the one already contained in ‘f’ or is added
to the set of key value pairs.

The strict interpretation of the postcondition of ‘with_capacity’ corresponds
to our intuition if we read expressions like ‘a.domain=0’ in the postcondition
of a procedure. Our intuition tells us the ‘a.domain’ has been set to ‘0’ but
‘b.domain’, ‘c.domain’, … for ‘b’ and ‘c’ different from ‘a’ have still the
same value.

The strict interpretation of postconditions of procedures allows us to
interpret the postconditions as framing conditions. A postcondition of a
procedure contains everything the procedure has modified and all information
needed to derived the things which have not been modified. Postconditions of
procedures are treated as implicit framing conditions.

Put elements into the array

We need a procedure to put elements at certain positions.

      put(i:NATURAL, v:G)
              -- Update (or set) the value `v' at position `i'.
              i < capacity
              domain     = old (domain + {i})
              Current[i] = v

In order to make the arguments more transparent we use the following
equivalent declaration of the procedure ‘put’.

      put(a:CURRENT, i:NATURAL, v:G)
              i < a.capacity
              a.domain   = old (a.domain + {i})
              Current[i] = v

The procedure ‘put’ modifies two functions, the ghost function ‘domain’ and
the element access function represented by the bracket operator ([]). The
verifier extracts the following strict interpretation of this procedure.

      put(a:CURRENT, i:NATURAL, v:G)
              i < a.capacity
              domain   = old (domain + [a, a.domain + {i}])
              ([])     = old (([])   + [[a,i],v])
              capacity = old capacity

I.e. the procedure put assigns new values to the functions domain and
([]). Since the function capacity is independent of the functions domain and
([]) and the postcondition does not mention neither the function capacity nor
any function which depend on it, the strict postcondition is extracted that
capacity remains unchanged.

There is one interesting thing to note about the procedure put and the
function ([]). The procedure put modifies the function ([]) at just one
position and updates the domain appropriately. The signatures of ([]) and put
are consistent in the sense that they share the same arguments. The procedure
put has just one argument more which is the value to be set. These facts are
sufficient for the compiler to derive that put is an assigner for the function
([]). We can use this to write more readable code. Instead of writing


we can use the more suggestive notation

      a[i] := v

Swap elements

It is easy to write a procedure which swaps two elements of an array.

      swap(a:CURRENT, i,j:NATURAL)
              i in a.domain
              j in a.domain
              a[i] = old a[j]
              a[j] = old a[i]

Because of the fact that the function ([]) has the assigner command put the
procedure swap needs no explicit implementation. The postcondition is
sufficient because it states that the function ([]) gets new values at the
argument positions [a,i] and [a,j]. The assigner command put can be used to
derive the implementation. The automatically derived implementation looks like

          tmp1, tmp2:G := a[i],a[j]

As for the previous procedures we give the strict interpretation of the

      swap(a:CURRENT, i,j:NATURAL)
              i in a.domain
              j in a.domain
              domain   = old domain
              capacity = old capacity

              ([]) = old (([]) <+ {i -> a[j], j -> a[j]})

In this postcondition we have used a notation which needs some
explanation. In our programming language we can define a function f:A->B by
giving a set of key value pairs.

           f := {"one" -> 1, "two" -> 2}

           -- An illegal assignment
           f := {"one" -> 1, "two" -> 2, "one" -> 2}

The verifier generates the proof obligation that the same key must have the
same value. Therefore the second assignment in the example is illegal. Let us
look at the proof obligation generated for the second assignment.

      a1: "one" = "two" => 1 = 2
      a2: "one" = "one" => 1 = 2

The assertion a1 is provable because the antecedent of the implication is
false and from a false antecedent anything can be in the consequent. The
assertion a2 is definitely not provable because it requires the verifier to
prove ‘1=2’ which is not possible.

Going back to the procedure swap we can see that ‘{i->a[j],j->[a[i]}’ is a
function of type NATURAL->G with generates the proof obligation

      i = j  =>  a[i] = a[j]

which is trivially valid.

Furthermore in the postcondition we have used the function override operator
‘<+’ which has the following definition for two functions ‘f,g:A->B’.

      f <+ g = (x -> if     x in g.domain then g(x)
                     elseif x in f.domain then f(x)

with the consequences

      (f<+g).domain = f.domain + g.domain

      x in g.domain          => (f<+g)(x) = g(x)

      x in f.domain-g.domain => (f<+g)(x) = f(x)

Using these definitions one can verify that the above mentioned strict
postcondition has the intended meaning.

  1. #1 by Jesper Nordenberg on April 4, 2013 - 2:57 am

    I don’t see how this can possibly work in the general case as the representation of the array domain must be able to contain roughly as many elements as the array which means you possibly would need as much memory when type checking the program as when running it. Not very practical.

    • #2 by helmutbrandl on April 4, 2013 - 2:37 pm

      The domain of the array and the capacity are ghost attributes. Therefore they are just used for reasoning and are not needed in any physical representation (ghost attributes/function cannot flow into actual computation, they can just be used within assertions). The physical representation of an array is a pointer to the corresponding memory region and nothing else.

      • #3 by Jesper Nordenberg on April 5, 2013 - 1:39 am

        I see. What if the computation of the indexes to write in the array is a complex function which takes a lot of time to execute (ie it’s extremely costly to check if a NATURAL is a member of the index set)? Or if the indexes are calculated in some function that simply returns NATURAL’s? For example:

        timesTwo : NATURAL -> NATURAL

        for (i in 1..10)
        a[i] := timesTwo(i)

        b = a[2] // Illegal?

      • #4 by helmutbrandl on April 5, 2013 - 7:59 am

        I am not sure whether I understand your question.

        Assertions are verified during compile time by the proof engine. This checking is done by reasoning and not by calculation. Therefore it does not matter if an operation is costly or not.

        The operation ‘b:=a[2]’ is legal if compiler can verify that ‘2 in a.domain’ is valid. This is not difficult in your example. The loop has the postcondition ‘all(i) i in (1..10) => i in a.domain and a[i]=times_two(i)’. Since ‘2 in (1..10) is valid, the verifier can derive the validity of ‘2 in a.domain’. I.e. the statement ‘b:=a[2]’ is legal and accepted by the verifier.

  2. #5 by Jesper Nordenberg on April 5, 2013 - 8:32 am

    I mis-wrote, I meant:

    timesTwo : NATURAL -> NATURAL

    for (i in 1..10) {
    a[timesTwo(i)] := i

    b = a[2] // Illegal?

    Here’s an example of a costly indexing operation:

    foo(i : Int) : Double = if (i < 3) i / 3.33 else foo(i – 1) * 0.33 + foo(i – 2) * 0.51 + foo(i – 3) * 0.71

    for (i in 1..100) {
    a[foo(i).toInt] := i

    b = a[844] // Illegal?

    • #6 by helmutbrandl on April 5, 2013 - 9:40 am

      First example:
      The loop has the postcondition ‘all(i) i in (1..10) => times_two(i) in a.domain and a[times_two(i)]=i’. With this we can do the reasoning ‘1 in (1..10)’ which implies ‘time_two(1) in a.domain’ which together with ‘2=times_two(1)’ implies ‘2 in a.domain’.

      Second example:
      The loop has the postcondition ‘all(i) i in (1..100) => foo(i) in a.domain and a[foo(i).to_natural]=i’. In order to proof that ‘844 in a.domain’ we need to proof ‘some(i) i in (1..10) and foo(i).to_natural = 844’. In order to proof this it is necesary to derive some assertions about the function foo.

      • #7 by Jesper Nordenberg on April 5, 2013 - 4:04 pm

        But the whole point is that you dont know “2=times_two(i)”, all you know is timesTwo : NATURAL -> NATURAL.

        I just gave you the implementation of foo, you’re free to derive any assertions you can from it. Can the verifier check if the arrary read is valid or not in reasonable time?

      • #8 by helmutbrandl on April 5, 2013 - 7:24 pm

        1. Function specification:

        All functions need a specification. The type NATURAL->NATURAL is not a specification it is just a type. A specification of times_two can look like

        times_two(i:NATURAL): NATURAL ensure Result = 2*i end

        Don’t confuse the specification with an implementation (in this trivial case the specification is nearly identical with the implementation).

        The compiler/verifier “knows” the specification and can therefore derive ‘2=times_two(1)’.

        2. Undecidable problems:

        Clearly the verifier cannot resolve problems which are in its general case undecidable. If you state any assertion and expect the verifier to tell you that the assertion is true or false I have to disappoint you. This is impossible!

        3. Your function foo:

        You designed some function foo. Then you want an answer to the question if there is some i in (1..100) such that foo(i).to_natural = 844 is valid. In order to answer this question I would have to analyze your function definition in detail. Since you have designed this function, you are better suited to analyze it. I can offer you the following: If you can convince yourself that ‘some(i) i in (1..100) and foo(i).to_natural=844’ is valid, write down your arguments and post them here. Then I can show you how to convince that verifier about this fact. If you are not able to convince yourself about this fact don’t expect that you can convince the verifier.

  3. #9 by Jesper Nordenberg on April 6, 2013 - 4:22 am

    Well, you compare your language to languages like C++, Java and C#. In these languages I don’t have to write specifications for my functions, I just write the implementation and type, and I get runtime array index checks. This shows the downside of static verification, it sometimes places an unnecessary burden on the programmer. The specification for the timesTwo is as you write trivial to come up with, but I can’t write the specification for foo (can you?), still I can write a valid implementation.

    • #10 by helmutbrandl on April 7, 2013 - 11:18 am

      1. Specification and implementation:

      For many functions (including your function foo) the specification and implementation is practically identical. Look at the function times_two

      times_two(i:NATURAL): NATURAL
         ensure Result = 2*i end

      Each time you can express that result of a function as a closed expression (i.e. in the form Result = …) you have a valid specification. As long as the right hand side of ‘Result=…’ is computable (i.e. contains no ghost functions) the specification is sufficient in my programming language. The compiler derives the implementation automatically. The compiler treats the above definition of ‘times_two’ as if it had been written

      times_two(i:NATURAL): NATURAL
              Result := 2*i
              Result = 2*i

      But it is not necessary to write implementation and specification if the specification has the form ‘Result=…’.

      Let’s look at your function foo. In my programming language it is sufficient to write

      foo(i:NATURAL): FLOAT
              Result = if i<3
                       then i/3.33 
                       else foo(i-1)*0.33

      As you can see there is practically no difference between my programming language and languages like C#, java, etc. in the complexity of writing such a function.

      2. Runtime exceptions and static verification:

      Let’s assume you use your function foo to do some indexing operations on an array a like ‘v := a[foo(i).to_natural]’. In java and C# each array access at runtime includes an out of bound check. In case that the expression ‘foo(i).to_natural’ is out of the array bounds you get a runtime exception. What are you going to do if you get such a runtime exception? You certainly will analyze what values can become attached to the variable i and you certainly have to analyze the possible return values of the function ‘foo’.

      Static verification requires you to do this analysis upfront. You have to convince the verifier that the expression ‘foo(i).to_natural’ never yields an index out of the bounds of the array ‘a’.

      Therefore you gain nothing by not doing any static analysis unless you are willing to accept that your program might have errors which have not yet popped up with the tests you have done up to now.

      3. Skills:

      It is undeniable that programming with static analysis requires some skills which many programmers do not yet have. Static analysis can be done successfully only if one is capable of writing assertions and capable of reasoning with assertions. But these skills are nothing magic. They can be learned like any other technique (e.g. using a debugger etc.).

      • #11 by Jesper Nordenberg on April 8, 2013 - 3:13 am

        Oh, it’s nice that you only have to write the spec+impl once. That surely saves a lot of work. I agree that you need to cover all error cases, either statically or at runtime, but my concern is the performance of the verifier. The foo function is extremely slow to execute for larger i’s unless it’s memoized. It’s also not easy to check if i == foo(x).to_natural without executing foo(x) since it contains recursive floating point calculations. So, how is the performance of your verifier with the specification of foo that you gave? Will it complete the verification of foo(50) == in reasonable time?

  4. #12 by helmutbrandl on April 8, 2013 - 10:24 am

    Hi Jesper,

    the definition of your function foo is recursive and the execution of this
    function in a naive way recursively is highly imperformant. This statement is
    true for the verifier and for any runtime execution as well.

    Let me make some general comments first and then adress the performance of your
    specific function foo.

    1. Needed calculations and compile/verfication time

    In my experience actual calculations are rarely necessary during
    verification. The question whether a function f returns a specific value when
    fed with a constant is usually not necessary. Why do you want the verifier to
    answer the question “Is 4! = 24 true?”? You usually ask the verifier if your
    specific implementation of the factorial function returns the same result as
    the mathematical definition. But in some cases actual calculations are
    necessary. Since the verifier will do calculations (if needed) in an
    interpretative manner, the performance of this needed calculations will be
    slower than the calculations done by software compiled to native code.

    2. Recursive and iterative evaluation of a function

    Your function foo is highly imperformant if executed recursively. Its like the
    fibonacci function. When you do recursive evalutation you get bad performance
    and when you do iterative evaluation you get excellent results. Therefore my
    language allows to give a definition/specification of a function and an
    implementation of a function.

    In the case of your function foo I would use the recursive definition as the
    specification and provide an iterative implementation.

    foo(i:NATURAL): FLOAT
            if i<3 then
                Result := i/3.33
                    j: NATURAL
                        j     := 2
                        a,b,c := 0/3.33, 1/3.33, 2/3.33
                        2 <= j <= i
                        a = foo(j-2)
                        b = foo(j-1)
                        c = foo(j)
                        i - j
                        i = j
                        j,a,b,c := 
                    Result := c
            Result = if i<3
                     then i/3.33 
                     else foo(i-1)*0.33

    For all values of i with i>=3 a loop is used to calculate foo(i). The
    important thing for the verifier is the invariant of the loop. At all
    iterations the invariant is maintained i.e. it is valid before and after the
    body of the loop. Note that the function foo can be used within the assertions
    of the implementation because it has a definition/specification (as shown in
    the postcondition) which is independent of the implementation of the function.

    With this code the execution of the funciton foo will be excellent. This
    applies to interpretation and compilation to native code as well. The
    calculation of foo(50) requires just 48 iterations with a trivial loop body.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: