A dependently Typed Assembly Language (Joint work with Robert Harper)
tarix 25.03.2017 ölçüsü 476 b. #12519
(Joint work with Robert Harper )
The general motivation Q: Why do we want to type low level languages? A: We want to reap the benefits of type systems at low levels.
Advantages of type systems Capturing program errors at compile-time (well-known) Enabling aggressive compiler optimizations (recent) Supporting sophisticated module systems (SML) Facilitating program verification (NuPRL, Coq, PVS) Serving as program documentation
The goal of this work Memory Safety = Type Safety + Safe Array Access
Array bounds checking problem Array bounds checking refers to determining whether the value of an expression is within the bounds of an array when the expression is used to index the array.
Byte copy: A version in C void bcopy(int src[], int dst[]) { int i; if (length(src) != length(dst) { printf “bcopy: unequal lengths\n”; exit(1); } for(i=1; i < length(src), i++) dst[i] = src[i]; }
Dynamic array bounds checking is required for safe languages such as Java, Modula-3, ML, Pascal can be expensive in practice (e.g. numerical computation) bounds violation is a rich source of program errors in unsafe languages such as C, C++ (e.g. off-by-one error)
Static array bounds checking Flow Analysis no annotations required (fully automatic) limited to simple cases and sensitive to program structures little or no feedback for detecting program errors
Static array bounds checking Type-based approaches ML type system is too coarse full dependent types is too fine dependent ML provides an intermediate type system with practical type-checking it is adequate for array bounds checking elimination the programmer must write some type annotations
What are dependent types? Dependent types depend on the values of language expressions. For instance, type : dependent type array : array(x) int : int(x) stack : stack(x)
Byte copy: A version in de Caml let bcopy src dst = begin for i = 0 to vect_length(src) - 1 do dst..(i) <- src..(i) done end withtype {n:nat} int vect(n) -> int vect(n) -> unit
Array bounds checking in mobile code It is difficult to eliminate since the machine which executes the code may not trust the source of the code It is time-consuming to be compiled away
Some key applications of DTAL Compiler verification Mobile code security Mobile code efficiency
inc: pop r1 add r1, r1, 1 pop r2 push r1 jmp r2
State types A state type corresponds to code continuation. It records the type information about register file and stack. For instance, [r1: int(i), r2: int array(i)] (‘a)[r1: ‘a, r2: [r1: ‘a]] (‘a,‘b)[r1: ‘a, r2: ‘b, r3: [r1: ‘a, r2: ‘b]] {n:nat}[sp: [sp: stack(n)] :: stack(n)]
Register file We use an array representation for register file.
Stack We use a list representation for stack.
Increment: A version in DTAL inc:{i:int}{n:nat} [sp: int(i):: [sp: int(i+1)::stack(n)]::stack(n)] pop r1 // r1: int(i) add r1, r1, 1 // r1: int(i+1) pop r2 // r2: [sp:int(i+1)::stack(n)] push r1 // sp: int(i+1)::stack(n) jmp r2
Type index objects index i,j ::= a | c | i+j | i-j | i*j | i/j index prop P ::= false | true | i=j| i>j | not P | P1 and P2 | P1 or P2 index sort gamma ::= int | {a: gamma | P} For instance, nat is a shorthand for {a: int | a >= 0}
Types types tau ::= alpha | sigma | int(x) | tau array(x) | stack(x) | prod(tau1,…,taun) | {a:gamma}.tau state types sigma ::= [(alpha1,…,alphan){a1:gamma1,…,an:gamman}.state] state state ::= (register file,stack) Note: int is for {a:int }.int(a) nat is for {a:nat }.int(a)
Instructions in DTAL instructions ins ::= aop rd, rs, v | bop r, v | jmp v | load rd, rs(v) | store r | newtuple[tau] r | newarray[tau] r | mov r, v | pop r | push r values v ::= () | i | l | r instruction sequences I ::= halt | ins; I | l; I
Programs in DTAL label maps Lambda ::= (l1: sigma1, …, ln: sigman) programs ::= (Lambda, I)
Memory allocation
Memory allocation: an example A tuple of type prod(int, prod(int, int)):
Array types are non-variant tau1 <= tau2 does not implies tau1 array(n) <= tau2 array(n) Here is a counterexample: r1: nat array(2) r2: int array(2)
State types are contra-variant state state’ implies [state] <= [state’]
Typing unconditional jumps
Typing conditional jumps
Byte copy: A flow chart
Byte copy: A version in DTAL bcopy:{i:nat} [r1: int(i), r2: int array(i), r3: int array(i)] mov r4, 0 // r4 <- 0 jmp loop // start loop
Byte copy: a version in DTAL loop: {i:nat, k:nat} [r1: int(i), r2: int array(i) r3: int array(i), r4: int(k)] sub r5, r1, r4 // r5 = r1 - r4 blte r5, finish // r5 <= 0 ? load r5, r2(r4) // safe load store r3(r4), r5 // safe store add r4, r4, 1 // r4 <- r4 + 1 jmp loop // loop again finish:[] halt
Operational semantics of DTAL We use a standard abstract machine for assigning operational semantics to DTAL programs. The machine consists of three finite maps:
Soundness The execution of a DTAL program can either terminate normally, or run forever, or stall. A well-typed DTAL program can never stall.
Related work Here is a (partial) list of some closely related work. Dependent types in practical programming (Xi & Pfenning) TALC Compiler (Morrisett et al at Cornell) Safe C compiler (Necula & Lee) TIL compiler (the Fox project at CMU)
Current status & Future work We have finished the following. Theoretical development of DTAL A prototype implementation of a type-checker for DTAL We are working on the following. Designing a dependent type system of JVML (de JVML) Compiling (a subset of Java) into de JVML
Conclusion We have demonstrated some uses of dependent types at assembly level. It can help compiler debugging and verification It can certify the memory safety property of mobile code It can lead to safer programs without compromising efficiency
Dostları ilə paylaş: