# Introduction

Nearly all languages with imperative elements have some kind of an array. In C

an array is just a typed pointer to a memory region. You can address the

different elements by giving an offset to the start of the memory region. The

compiler multiplies the offset with the size of the objects contained in the

array to get the actual offset in bytes.

The C view of arrays has certain advantages: It is memory efficient because it

uses just one pointer. Accessing elements is easy because it just needs some

simple address arithmetic.

But the C view has some pitfalls as well. First you might fail to allocate

memory for the array. Then the pointer to the memory region has some undefined

value. Second you might read elements of the array which have not yet been

initialized. Third you might read from or write to elements of the array which

are not within the allocated region.

Modern languages avoid this pitfalls by certain methods which can be executed

at compile time or at runtime. The failure to initialize the pointer to the

memory region is usually handled by initializing all pointers with zero. Any

access to a zero pointer results in a runtime exception. Addressing elements

outside the region is usually handled at runtime as well by storing with the

array its capacity and generating an exception if elements outside the region

are addressed.

All these solutions avoid memory corruption but they have a cost. Additional

code is required to check null pointer accesses and to check out of bound

accesses. The single pointer implementation is no longer sufficient because

the size of the allocated region has to be stored in the runtime object.

A language which allows formal verification can avoid all these pitfalls

without any cost. Without any cost? Well — without any runtime and memory

footprint cost. But nothing is free. Some assertions must be provable at

compile time. I.e. whenever you want to access some element of the array you

have to be sure that the index is within the array bounds. And it is not

sufficient that you are sure. You have to convince the verifier of this fact.

In the following article we show an array structure in our programming

language which allows us to convince the verifier that we make no illegal

accesses.

Since an array is a mutable structure we have to address the framing problem

as well. The framing problem is addressed appropriately if for any modifying

procedure it is clear not only what it does change but also what it leaves

unchanged.

A lot of effort is done currently in the verification community to address the

framing problem. Frame specifications like modify and use clauses for each

procedure, separation logic, different sophisticated heap models etc.

In this article we demonstrate that the framing problem can be solved without

all these complicated structures. It is sufficient to regard access to data

structures as functions and have a sufficient powerful function model. It is

amazing to see that proper abstraction makes some difficult seeming problems a

lot easier.

As a warm up let us see what kind of operations we expect from arrays. First

of all we want to be able to declare variables of type array which hold

elements of a certain type.

local s: STRING a: ARRAY[STRING] do ...

The declaration does not allocate the object. Therefore accessing any element

of the array has to be illegal at this point.

a[4] := "Hello" -- Illegal, array not yet allocated!

To allocate the object we need some kind of object creation.

create a.with_capacity(10)

This instruction creates an array object with the possibility to store up to

10 strings. The creation just allocates the structure but does not populate

the array with any string.

At this point it is illegal to read elements.

s := a[0] -- Illegal, access to an uninitialized object!

But we can write some elements of the array and access these elements

a[0] := "first string" a[1] := "second string" s := a[1] s := a[2] -- Illegal, access to an uninitialized object! s := a[100] -- Illegal, out of bound access! ... end

If the verifying compiler is able to detect these illegal accesses and let

pass only the legal accesses, then the actual runtime code can have the same

size as the corresponding C code and the actual array object can be

represented by a single pointer to the memory region.

# Basic declarations and access functions

The type ARRAY is declared within the module array

-- module: array

We need a formal generic which represents the type of the elements.

G: ANY -- Element type

There are no constraints on the element type. Any type can be used as element

type.

The generic class ARRAY has three basic access functions and an invariant.

class ARRAY[G] feature capacity: ghost NATURAL -- The allocated capacity of the array. domain: ghost NATURAL? -- The set of indices of elements which have been initialized. [] (i:NATURAL): G -- Element at position `i'. require i in domain note built_in end invariant domain <= {i: i<capacity} -- Note: '<=' for sets is the subset relation end

‘capacity’ returns the allocated capacity of the array. Since it is a ghost

attribute it can be used only in assertions and cannot flow into actual

computations. The attribute ‘domain’ represents the set of indices of array

elements which have already been initialized. This is a ghost attribute as

well.

The bracket operator ([]) represents the function which returns the element

at position ‘i’. It uses ‘domain’ in its precondition to express that only

elements at initialized positions can be allocated.

The invariant states that only elements within the allocated capacity can be

initialized.

We have presented the above declaration of the class ARRAY in an object

oriented manner. Programmers familiar with object oriented languages like

C++, Java, C# should have no problems to read such a declaration. But it

should be noted that such a form of declaration hides one important thing. All

routines declared within the class ARRAY have one implicit argument which is

not explicitly mentioned, namely the array object (called ‘this’ in C++ and

Java). In order to make this argument more explicit we present this

declaration in a different but equivalent form.

class ARRAY[G] end feature capacity(a:CURRENT): ghost NATURAL domain(a:CURRENT): ghost NATURAL? [](a:CURRENT, i:NATURAL): G require i in a.domain note built_in end end invariant all(a:CURRENT) ensure a.domain <= {i: i < a.capacity} end end

This form of the declaration makes transparent that the used identifiers and

operators have the following types:

capacity: ghost CURRENT->NATURAL domain: ghost CURRENT->NATURAL? ([]): [CURRENT,NATURAL] -> G

Remark: In our programming language we use the Haskell convention that an

operator (unary or binary) put between parentheses can be used like an

ordinary function. I.e. for numbers the expressions ‘7+3’ and ‘(+)(7,3)’ are

equivalent, but the first one is preferable because it is much more readable.

If something looks like an attribute in object oriented notation it is

actually a function because it needs an argument. Therefore an assignment to

an attribute is something fundamentally different from the assignment to a

local variable. An assignment of a new value to an attribute is a modification

of the corresponding function.

Note that the unfolded form of the class invariant makes evident that the

invariant is not only an object invariant but an invariant for the whole

system. In the case of such simple classes as ARRAY this makes no

difference. But if we look at linked structures the difference is fundamental.

# Implicit invariant

The function domain has been used in the precondition of the function

([]). This creates the following implicit invariant.

invariant ([]).domain = {a:CURRENT, i:NATURAL: i in a.domain} end

Since procedures might modify the functions ([]) and domain, they have to

respect the invariant, the explicit and the implicit invariant.

# Dependence of the access functions

The basic access functions capacity, domain and ([]) are independent in the

sense that the value of one function does not depend on the value of any other

or these basic functions. There is a consistency condition between domain and

([]) because the function domain is used in the precondition of ([]) (see

previous chapter). However the value returned by ([]) does not depend on the

function domain.

The basic access functions capacity and domain have another consistency

condition expressed in the invariant. Note that consistency conditions do not

establish a dependence.

Based on the basic access functions we can define other functions. E.g. we

might define a function is_full which returns true if all positions in the

array are accessible.

is_full(a:CURRENT): ghost -- Are all positions of the array `a' accessible? ensure Result = ({i: i < a.capacity} = a.domain) end

This function has a clear dependency: It depends on the functions capacity and

domain. This implies that any modification of the functions capacity and/or

domain modifies implicitly the function is_full as well.

# Creation of an object

In order to be able to create an object of type array we need a creation

procedure (or a constructor called in languages like C++, Java and C#).

class ARRAY[G] create with_capacity(c:NATURAL) -- Create an array with capacity `c'. note built_in ensure domain = 0 -- '0' is the empty set capacity = c end end

In order to make the hidden argument more transparent we take the declaration

out of the class and get the following equivalent declaration.

create with_capacity(a:CURRENT, c:NATURAL) note built_in ensure a.domain = 0 a.capacity = c end end

Each creation procedure receives as its first argument an allocated but

uninitialized object of the corresponding type. The task of the creation

procedure is to initialize the object such that it satisfies the consistency

conditions expressed in the invariant.

The postcondition states that the creation procedure has an effect on the

functions ‘domain’ and ‘capacity’. Remember that the corresponding types are

CURRENT->NATURAL? and CURRENT->NATURAL.

Expressions like ‘a.domain=0’ in postconditions of procedures are interpreted

in our language rather strictly. It reads: the identifier ‘domain’ identifies

a new function which is the same as the function which it identified before

the call except for the argument ‘a’. Only functions mentioned in the

postcondition (and functions which depend on them) are modified by a

procedure. I.e. the declaration of the creation procedure is interpreted in the

following manner.

create with_capacity(a:CURRENT, c:NATURAL) note built_in ensure domain = old (domain + [a,0]) capacity = old (capacity + [a,c]) ([]) = old ([]) end end

For any function f:A->B the expression ‘f+[a,b]’ is defined as

f + [a,b] = (x -> if x=a then b else f(x) end)

with the consequences

(f+[a,b]).domain = f.domain + {a} x=a => (f+[a,b])(x) = b x/=a => (f+[a,b])(x) = f(x)

I.e. we view functions as sets of key value pairs then the expression ‘f+[a,b]’

is the set of key value pairs of the function ‘f’ except for the key ‘a’ where

the pair [a,b] either overrides the one already contained in ‘f’ or is added

to the set of key value pairs.

The strict interpretation of the postcondition of ‘with_capacity’ corresponds

to our intuition if we read expressions like ‘a.domain=0’ in the postcondition

of a procedure. Our intuition tells us the ‘a.domain’ has been set to ‘0’ but

‘b.domain’, ‘c.domain’, … for ‘b’ and ‘c’ different from ‘a’ have still the

same value.

The strict interpretation of postconditions of procedures allows us to

interpret the postconditions as framing conditions. A postcondition of a

procedure contains everything the procedure has modified and all information

needed to derived the things which have not been modified. Postconditions of

procedures are treated as implicit framing conditions.

# Put elements into the array

We need a procedure to put elements at certain positions.

class ARRAY[G] feature put(i:NATURAL, v:G) -- Update (or set) the value `v' at position `i'. require i < capacity note built_in ensure domain = old (domain + {i}) Current[i] = v end end

In order to make the arguments more transparent we use the following

equivalent declaration of the procedure ‘put’.

feature put(a:CURRENT, i:NATURAL, v:G) require i < a.capacity note built_in ensure a.domain = old (a.domain + {i}) Current[i] = v end end

The procedure ‘put’ modifies two functions, the ghost function ‘domain’ and

the element access function represented by the bracket operator ([]). The

verifier extracts the following strict interpretation of this procedure.

feature put(a:CURRENT, i:NATURAL, v:G) require i < a.capacity note built_in ensure domain = old (domain + [a, a.domain + {i}]) ([]) = old (([]) + [[a,i],v]) capacity = old capacity end end

I.e. the procedure put assigns new values to the functions domain and

([]). Since the function capacity is independent of the functions domain and

([]) and the postcondition does not mention neither the function capacity nor

any function which depend on it, the strict postcondition is extracted that

capacity remains unchanged.

There is one interesting thing to note about the procedure put and the

function ([]). The procedure put modifies the function ([]) at just one

position and updates the domain appropriately. The signatures of ([]) and put

are consistent in the sense that they share the same arguments. The procedure

put has just one argument more which is the value to be set. These facts are

sufficient for the compiler to derive that put is an assigner for the function

([]). We can use this to write more readable code. Instead of writing

a.put(i,v)

we can use the more suggestive notation

a[i] := v

# Swap elements

It is easy to write a procedure which swaps two elements of an array.

feature swap(a:CURRENT, i,j:NATURAL) require i in a.domain j in a.domain ensure a[i] = old a[j] a[j] = old a[i] end end

Because of the fact that the function ([]) has the assigner command put the

procedure swap needs no explicit implementation. The postcondition is

sufficient because it states that the function ([]) gets new values at the

argument positions [a,i] and [a,j]. The assigner command put can be used to

derive the implementation. The automatically derived implementation looks like

local tmp1, tmp2:G := a[i],a[j] do a.put(i,tmp2) a.put(j,tmp1) end

As for the previous procedures we give the strict interpretation of the

postcondition.

feature swap(a:CURRENT, i,j:NATURAL) require i in a.domain j in a.domain ensure domain = old domain capacity = old capacity ([]) = old (([]) <+ {i -> a[j], j -> a[j]}) end end

In this postcondition we have used a notation which needs some

explanation. In our programming language we can define a function f:A->B by

giving a set of key value pairs.

local f:STRING->NATURAL do ... f := {"one" -> 1, "two" -> 2} -- An illegal assignment f := {"one" -> 1, "two" -> 2, "one" -> 2} ... end

The verifier generates the proof obligation that the same key must have the

same value. Therefore the second assignment in the example is illegal. Let us

look at the proof obligation generated for the second assignment.

a1: "one" = "two" => 1 = 2 a2: "one" = "one" => 1 = 2

The assertion a1 is provable because the antecedent of the implication is

false and from a false antecedent anything can be in the consequent. The

assertion a2 is definitely not provable because it requires the verifier to

prove ‘1=2’ which is not possible.

Going back to the procedure swap we can see that ‘{i->a[j],j->[a[i]}’ is a

function of type NATURAL->G with generates the proof obligation

i = j => a[i] = a[j]

which is trivially valid.

Furthermore in the postcondition we have used the function override operator

‘<+’ which has the following definition for two functions ‘f,g:A->B’.

f <+ g = (x -> if x in g.domain then g(x) elseif x in f.domain then f(x) end)

with the consequences

(f<+g).domain = f.domain + g.domain x in g.domain => (f<+g)(x) = g(x) x in f.domain-g.domain => (f<+g)(x) = f(x)

Using these definitions one can verify that the above mentioned strict

postcondition has the intended meaning.

#1 by

Jesper Nordenbergon April 4, 2013 - 2:57 amI don’t see how this can possibly work in the general case as the representation of the array domain must be able to contain roughly as many elements as the array which means you possibly would need as much memory when type checking the program as when running it. Not very practical.

#2 by helmutbrandl on April 4, 2013 - 2:37 pm

The domain of the array and the capacity are ghost attributes. Therefore they are just used for reasoning and are not needed in any physical representation (ghost attributes/function cannot flow into actual computation, they can just be used within assertions). The physical representation of an array is a pointer to the corresponding memory region and nothing else.

#3 by

Jesper Nordenbergon April 5, 2013 - 1:39 amI see. What if the computation of the indexes to write in the array is a complex function which takes a lot of time to execute (ie it’s extremely costly to check if a NATURAL is a member of the index set)? Or if the indexes are calculated in some function that simply returns NATURAL’s? For example:

timesTwo : NATURAL -> NATURAL

for (i in 1..10)

a[i] := timesTwo(i)

b = a[2] // Illegal?

#4 by helmutbrandl on April 5, 2013 - 7:59 am

I am not sure whether I understand your question.

Assertions are verified during compile time by the proof engine. This checking is done by reasoning and not by calculation. Therefore it does not matter if an operation is costly or not.

The operation ‘b:=a[2]’ is legal if compiler can verify that ‘2 in a.domain’ is valid. This is not difficult in your example. The loop has the postcondition ‘all(i) i in (1..10) => i in a.domain and a[i]=times_two(i)’. Since ‘2 in (1..10) is valid, the verifier can derive the validity of ‘2 in a.domain’. I.e. the statement ‘b:=a[2]’ is legal and accepted by the verifier.

#5 by

Jesper Nordenbergon April 5, 2013 - 8:32 amI mis-wrote, I meant:

timesTwo : NATURAL -> NATURAL

for (i in 1..10) {

a[timesTwo(i)] := i

}

b = a[2] // Illegal?

Here’s an example of a costly indexing operation:

foo(i : Int) : Double = if (i < 3) i / 3.33 else foo(i – 1) * 0.33 + foo(i – 2) * 0.51 + foo(i – 3) * 0.71

for (i in 1..100) {

a[foo(i).toInt] := i

}

b = a[844] // Illegal?

#6 by helmutbrandl on April 5, 2013 - 9:40 am

First example:

The loop has the postcondition ‘all(i) i in (1..10) => times_two(i) in a.domain and a[times_two(i)]=i’. With this we can do the reasoning ‘1 in (1..10)’ which implies ‘time_two(1) in a.domain’ which together with ‘2=times_two(1)’ implies ‘2 in a.domain’.

Second example:

The loop has the postcondition ‘all(i) i in (1..100) => foo(i) in a.domain and a[foo(i).to_natural]=i’. In order to proof that ‘844 in a.domain’ we need to proof ‘some(i) i in (1..10) and foo(i).to_natural = 844’. In order to proof this it is necesary to derive some assertions about the function foo.

#7 by

Jesper Nordenbergon April 5, 2013 - 4:04 pmBut the whole point is that you dont know “2=times_two(i)”, all you know is timesTwo : NATURAL -> NATURAL.

I just gave you the implementation of foo, you’re free to derive any assertions you can from it. Can the verifier check if the arrary read is valid or not in reasonable time?

#8 by helmutbrandl on April 5, 2013 - 7:24 pm

1. Function specification:

All functions need a specification. The type NATURAL->NATURAL is not a specification it is just a type. A specification of times_two can look like

times_two(i:NATURAL): NATURAL ensure Result = 2*i end

Don’t confuse the specification with an implementation (in this trivial case the specification is nearly identical with the implementation).

The compiler/verifier “knows” the specification and can therefore derive ‘2=times_two(1)’.

2. Undecidable problems:

Clearly the verifier cannot resolve problems which are in its general case undecidable. If you state any assertion and expect the verifier to tell you that the assertion is true or false I have to disappoint you. This is impossible!

3. Your function foo:

You designed some function foo. Then you want an answer to the question if there is some i in (1..100) such that foo(i).to_natural = 844 is valid. In order to answer this question I would have to analyze your function definition in detail. Since you have designed this function, you are better suited to analyze it. I can offer you the following: If you can convince yourself that ‘some(i) i in (1..100) and foo(i).to_natural=844’ is valid, write down your arguments and post them here. Then I can show you how to convince that verifier about this fact. If you are not able to convince yourself about this fact don’t expect that you can convince the verifier.

#9 by

Jesper Nordenbergon April 6, 2013 - 4:22 amWell, you compare your language to languages like C++, Java and C#. In these languages I don’t have to write specifications for my functions, I just write the implementation and type, and I get runtime array index checks. This shows the downside of static verification, it sometimes places an unnecessary burden on the programmer. The specification for the timesTwo is as you write trivial to come up with, but I can’t write the specification for foo (can you?), still I can write a valid implementation.

#10 by helmutbrandl on April 7, 2013 - 11:18 am

1. Specification and implementation:

For many functions (including your function foo) the specification and implementation is practically identical. Look at the function times_two

Each time you can express that result of a function as a closed expression (i.e. in the form Result = …) you have a valid specification. As long as the right hand side of ‘Result=…’ is computable (i.e. contains no ghost functions) the specification is sufficient in my programming language. The compiler derives the implementation automatically. The compiler treats the above definition of ‘times_two’ as if it had been written

But it is not necessary to write implementation and specification if the specification has the form ‘Result=…’.

Let’s look at your function foo. In my programming language it is sufficient to write

As you can see there is practically no difference between my programming language and languages like C#, java, etc. in the complexity of writing such a function.

2. Runtime exceptions and static verification:

Let’s assume you use your function foo to do some indexing operations on an array a like ‘v := a[foo(i).to_natural]’. In java and C# each array access at runtime includes an out of bound check. In case that the expression ‘foo(i).to_natural’ is out of the array bounds you get a runtime exception. What are you going to do if you get such a runtime exception? You certainly will analyze what values can become attached to the variable i and you certainly have to analyze the possible return values of the function ‘foo’.

Static verification requires you to do this analysis upfront. You have to convince the verifier that the expression ‘foo(i).to_natural’ never yields an index out of the bounds of the array ‘a’.

Therefore you gain nothing by not doing any static analysis unless you are willing to accept that your program might have errors which have not yet popped up with the tests you have done up to now.

3. Skills:

It is undeniable that programming with static analysis requires some skills which many programmers do not yet have. Static analysis can be done successfully only if one is capable of writing assertions and capable of reasoning with assertions. But these skills are nothing magic. They can be learned like any other technique (e.g. using a debugger etc.).

#11 by

Jesper Nordenbergon April 8, 2013 - 3:13 amOh, it’s nice that you only have to write the spec+impl once. That surely saves a lot of work. I agree that you need to cover all error cases, either statically or at runtime, but my concern is the performance of the verifier. The foo function is extremely slow to execute for larger i’s unless it’s memoized. It’s also not easy to check if i == foo(x).to_natural without executing foo(x) since it contains recursive floating point calculations. So, how is the performance of your verifier with the specification of foo that you gave? Will it complete the verification of foo(50) == in reasonable time?

#12 by helmutbrandl on April 8, 2013 - 10:24 am

Hi Jesper,

the definition of your function foo is recursive and the execution of this

function in a naive way recursively is highly imperformant. This statement is

true for the verifier and for any runtime execution as well.

Let me make some general comments first and then adress the performance of your

specific function foo.

1. Needed calculations and compile/verfication time

In my experience actual calculations are rarely necessary during

verification. The question whether a function f returns a specific value when

fed with a constant is usually not necessary. Why do you want the verifier to

answer the question “Is 4! = 24 true?”? You usually ask the verifier if your

specific implementation of the factorial function returns the same result as

the mathematical definition. But in some cases actual calculations are

necessary. Since the verifier will do calculations (if needed) in an

interpretative manner, the performance of this needed calculations will be

slower than the calculations done by software compiled to native code.

2. Recursive and iterative evaluation of a function

Your function foo is highly imperformant if executed recursively. Its like the

fibonacci function. When you do recursive evalutation you get bad performance

and when you do iterative evaluation you get excellent results. Therefore my

language allows to give a definition/specification of a function and an

implementation of a function.

In the case of your function foo I would use the recursive definition as the

specification and provide an iterative implementation.

For all values of i with i>=3 a loop is used to calculate foo(i). The

important thing for the verifier is the invariant of the loop. At all

iterations the invariant is maintained i.e. it is valid before and after the

body of the loop. Note that the function foo can be used within the assertions

of the implementation because it has a definition/specification (as shown in

the postcondition) which is independent of the implementation of the function.

With this code the execution of the funciton foo will be excellent. This

applies to interpretation and compilation to native code as well. The

calculation of foo(50) requires just 48 iterations with a trivial loop body.