Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking)

Progress tracking for master's thesis Translating Time Bounds of Functional Programs to a Deeply-Embedded Language.

Progress

March 24th–April 28th

Read through Proof-Producing Translation of Functional Programs into a Time & Space Reasonable Model which describes existing work on functional correctness.

Read A Fistful of Dollars: Formalizing Asymptotic Complexity Claims via Deductive Program Verification which develops a framework for interactive proofs of running time intended as part of an interface specification. The method first recursively descends through the function to calculate its running time up to recursive calls, e.g. T_f(x) = 2 + T_f(x - 1). The user is asked to provide a "shape" of the running time bound they wish to specify, e.g. a * x, where a is left as a variable to solve for. Then induction is performed on the goal e.g. 2 + T_f(x - 1) <= a * x, which through induction then becomes 2 + a * (x - 1) <= a * x, thus providing a constraint on a suitable a. This is essentially the "substitution method" of Cormen et al..

April 28th–May 12th

Investigated how to generate running time constraints in Isabelle by recursive tactics on the structure of the IMP-Tailcall term, as in previous work on correctness. The issue to overcome is that we must work with a predicate of the form \exists c. \forall s. T_f_IMP s <= c * T_f (s "x"), which requires providing a single witness c for the linear factor. Previous work on correctness split the single goal recursively at sequences and conditionals into goals for each sub-term, but this approach doesn't work directly with an existential quantor: the same constant must be a witness in both subterms.

Instead of working directly with the existential, we can "provide" a witness up-front in the form of a schematic variable, and instantiate the schematic step-by-step. Thus we work with a predicate that bounds the running time by e.g. c * T_f (s "x"), without the existential quantor, and compute suitable constraints on the way down. Once we have computed a suitable c, it is obviously a witness for the existential quantor and we can close the goal.

This poses an additional challenge when previously verified functions are called, as the scheme devised above requires a concrete running time bound, not quantified with the existence of a constant. For this, we use Isabelle's LEAST operator to materialize the existential c from the running time of the auxiliary function's bound. (The obvious approach of "obtain"ing the constant with Isabelle's proof functionality does not work, as the schematic witness c of the function being analyzed can not depend on a value obtained later). We can then work with this (LEAST c. ...) * T_aux ... term just like any other bound.

After analyzing the term, we can instantiate and apply an induction hypothesis where applicable, this essentially corresponds to Cormen's "substitution method".

We are left with constraints on the constant c which must be solved, then c can be instantiated and the goals closed.

To Discuss

Should we investigate other approaches? For example, computing the running time up-front.

How can we automatically solve for suitable constants from the resulting constraints?

To-do for Next Week

Examine more complex functions which:

call functions with non-constant running time
have preconditions on the well-formedness of their inputs
use data types (this will only work to relate to HOL-Nat)

The condition of an if/else may depend on the input beyond just determining which case of the original function is being called, i.e. cannot necessarily be solved directly based on the inductive case being considered. Thus we cannot assume that we can solve the if/else if we introduce one into our timing constraint. The natural idea to use a maximum of both branches' running time won't work, consider for example the simple expression if b then 1 else g y; then, taking the maximum on the running time would result in a constraint like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * T_f (s "x"), which upon expanding T_f in the right-hand-side and taking the lower bound of the resulting if/else, would yield something like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * 1, which is obviously nonsense. Instead, we could try creating one inequality constraint for each branch.

Investigate the use of SMT-solvers or other automated solvers for the resulting inequality constraints.

May 12th–26th

Better rule for if. Gather running time into constraint for each terminal (tail-call or assignment to return register), "run" function, rearrange constraint to find lower bound on c, take the maximum of all lower bounds as suitable c.

To Discuss

To-do for Next Week

Prove that the running time of compiled functions always has the assumed form. Use the definition of the compiler (Kappelmann et al.) and the definition of generated HOL running times.
From here, argue why or why not an SMT solver could be used to solve the resulting constraints.
Adapt the running/gather method to (tail) recursive functions.

May 26th–June 6th

Formalized a semantics for HOL-TCNat, such that T_f x1 ... xn = e_(x1, ..., xn) + 1 for a function f x1 ... xn = e and its generated running time function T_f. The big-step predicate has the form: (tail-call term :: HOL_TCNat, argument vector :: nat list) |- (term :: HOL_TCNat, let-bound variables :: Nat list) =>^(running time :: nat) (result :: nat).

So we define:

(f,xs) |- (Number n,bs) =>^0 n (and similar for LetBound n and Arg n, by accessing bs and xs respectively).
If forall i. (f,xs) |- (ts_i,bs) =>^zs_i vs_i and g vs = v' and z' = sum(zs) + T_g vs, then (f,xs) |- (Call g ts,bs) =>^z' v'
etc.

Then, for the body f of a HOL-TCN function F assuming:

there are correctness lemmas available for each non-primitive auxiliary function, and
each primitive aux. function has constant running time,

we can show:

forall r_0. forall t, r.
 exist fs. fs_i is called in t --> exist cs, k.
  forall xs, b.
   forall s. s arg_f,i = xs_i -->
    exist ss'.
     (f,xs) |- (t,b) =>^(sum_i(fs_i ss'_i) z for some z
     and [[f]]_(r_0)^([]) |- ([[t]]_r^b,s)
         =>^(k + sum_i(cs_i * fs_i s's_i)) s' for some s'

The proof is an induction on the rule of F, followed by an induction on the term f. The Let-case requires a lemma about substitution.

Thus the running time always has the assumed form, and approach from last time (almost) works: for tail-recursive functions, the procedure for finding a lower bound on c would try and instantiate c with a term containing c... This can be fixed. In general, we have to show:

k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i)

In the case where one of the fs_i is ?c * T_f s', we rearrange to:

k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i) - ?c * T_f s'

removing c_f, T_f, s' from the vectors on the LHS. On the RHS, one of the fs/s's is c_f/T_f, which cancels out the subtracted term.

Then we can rearrange to find a lower bound on ?c, as before.

June 6th–June 23rd

Finished writing up the proof from last week.

Discussion

Problem: when doing induction over the structure of the term t, in the recursive call case, there is no subterm to apply the induction hypothesis to. Instead, we can do the proof by rule induction over the semantics of HOL-TCN. This means we are assuming that there is an inference for the term in the semantics (i.e. it terminates with a time+value); the justification for this assumption is that the HOL function has a termination order, from which we can construct a termination order for the HOL-TCN term.

A certified translation of Coq functions into deeply-embedded lambda terms: https://drops.dagstuhl.de/storage/00lipics/lipics-vol141-itp2019/LIPIcs.ITP.2019.17/LIPIcs.ITP.2019.17.pdf. Question: what about the step from lambda terms to an imperative language?

To-do for next week

Write-up proof in TeX. Start implementation in ML.

June 23rd–July 21st

Idea from last time doesn't work. Need to explicitly calculate called functions of HOL-TCN term, then relate to called IMP programs of compiled IMP-TC program. See PDF (Zulip).

To-do for next time

Talk about "traces" instead of "running to leaves", use bigstep notation (time is replaced by trace, end state/value by pre-tail state).

Instead of putting the trace semantics into one function/predicate, one could investigate an alternative approach. The trace semantics and the subsequent interpretation as running time can essentially be split into three steps: for a given state, 1. compute a linear "list" of instructions by resolving all branches from the given state, 2. from the linearization, extract the list of calls/constants that actually compose the trace, 3. interpret the trace into a concrete (i.e. just an integer) running time. Steps 1. and 2. are currently combined into one predicate, but could be split. This should be investigated, as it could lead to (potentially more but) easier/simpler proofs.

July 21st–August 11th

Formalized (in Isabelle) the trace semantics that were presented last time, along with the HOL-TCN to IMP-TC compiler. Proved the compiler correct (for terms without recursive calls), and formalized the proof that the traces of the compiled program are related to the traces of the HOL-TCN term.

To-do for next time

A few (smaller?) lemmas are still required, finish those. Prove the theorems relating traces and running times for HOL-TCN and IMP-TC. Prove the corollary of related traces that running times are also related.

Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking)

Contents

Progress

March 24th–April 28th

April 28th–May 12th

To Discuss

To-do for Next Week

May 12th–26th

To Discuss

To-do for Next Week

May 26th–June 6th

June 6th–June 23rd

Discussion

To-do for next week

June 23rd–July 21st

To-do for next time

July 21st–August 11th

To-do for next time

Navigation menu

Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking)

Progress

March 24th–April 28th

April 28th–May 12th

To Discuss

To-do for Next Week

May 12th–26th

To Discuss

To-do for Next Week

May 26th–June 6th

June 6th–June 23rd

Discussion

To-do for next week

June 23rd–July 21st

To-do for next time

July 21st–August 11th

To-do for next time

Navigation menu

Search