Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking): Difference between revisions

Revision as of 12:31, 6 June 2025

Progress tracking for master's thesis Translating Time Bounds of Functional Programs to a Deeply-Embedded Language.

Progress

March 24th–April 28th

Read through Proof-Producing Translation of Functional Programs into a Time & Space Reasonable Model which describes existing work on functional correctness.

Read A Fistful of Dollars: Formalizing Asymptotic Complexity Claims via Deductive Program Verification which develops a framework for interactive proofs of running time intended as part of an interface specification. The method first recursively descends through the function to calculate its running time up to recursive calls, e.g. T_f(x) = 2 + T_f(x - 1). The user is asked to provide a "shape" of the running time bound they wish to specify, e.g. a * x, where a is left as a variable to solve for. Then induction is performed on the goal e.g. 2 + T_f(x - 1) <= a * x, which through induction then becomes 2 + a * (x - 1) <= a * x, thus providing a constraint on a suitable a. This is essentially the "substitution method" of Cormen et al..

April 28th–May 12th

Investigated how to generate running time constraints in Isabelle by recursive tactics on the structure of the IMP-Tailcall term, as in previous work on correctness. The issue to overcome is that we must work with a predicate of the form \exists c. \forall s. T_f_IMP s <= c * T_f (s "x"), which requires providing a single witness c for the linear factor. Previous work on correctness split the single goal recursively at sequences and conditionals into goals for each sub-term, but this approach doesn't work directly with an existential quantor: the same constant must be a witness in both subterms.

Instead of working directly with the existential, we can "provide" a witness up-front in the form of a schematic variable, and instantiate the schematic step-by-step. Thus we work with a predicate that bounds the running time by e.g. c * T_f (s "x"), without the existential quantor, and compute suitable constraints on the way down. Once we have computed a suitable c, it is obviously a witness for the existential quantor and we can close the goal.

This poses an additional challenge when previously verified functions are called, as the scheme devised above requires a concrete running time bound, not quantified with the existence of a constant. For this, we use Isabelle's LEAST operator to materialize the existential c from the running time of the auxiliary function's bound. (The obvious approach of "obtain"ing the constant with Isabelle's proof functionality does not work, as the schematic witness c of the function being analyzed can not depend on a value obtained later). We can then work with this (LEAST c. ...) * T_aux ... term just like any other bound.

After analyzing the term, we can instantiate and apply an induction hypothesis where applicable, this essentially corresponds to Cormen's "substitution method".

We are left with constraints on the constant c which must be solved, then c can be instantiated and the goals closed.

To Discuss

Should we investigate other approaches? For example, computing the running time up-front.

How can we automatically solve for suitable constants from the resulting constraints?

To-do for Next Week

Examine more complex functions which:

call functions with non-constant running time
have preconditions on the well-formedness of their inputs
use data types (this will only work to relate to HOL-Nat)

The condition of an if/else may depend on the input beyond just determining which case of the original function is being called, i.e. cannot necessarily be solved directly based on the inductive case being considered. Thus we cannot assume that we can solve the if/else if we introduce one into our timing constraint. The natural idea to use a maximum of both branches' running time won't work, consider for example the simple expression if b then 1 else g y; then, taking the maximum on the running time would result in a constraint like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * T_f (s "x"), which upon expanding T_f in the right-hand-side and taking the lower bound of the resulting if/else, would yield something like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * 1, which is obviously nonsense. Instead, we could try creating one inequality constraint for each branch.

Investigate the use of SMT-solvers or other automated solvers for the resulting inequality constraints.

May 12th–26th

Better rule for if. Gather running time into constraint for each terminal (tail-call or assignment to return register), "run" function, rearrange constraint to find lower bound on c, take the maximum of all lower bounds as suitable c.

To Discuss

To-do for Next Week

Prove that the running time of compiled functions always has the assumed form. Use the definition of the compiler (Kappelmann et al.) and the definition of generated HOL running times.
From here, argue why or why not an SMT solver could be used to solve the resulting constraints.
Adapt the running/gather method to (tail) recursive functions.

May 26th–June 6th

Formalized a semantics for HOL-TCNat, such that T_f x1 ... xn = e_(x1, ..., xn) + 1 for a function f x1 ... xn = e and its generated running time function T_f. The big-step predicate has the form: (tail-call term :: HOL_TCNat, argument vector :: nat list) |- (term :: HOL_TCNat, let-bound variables :: Nat list) =>^(running time :: nat) (result :: nat).

So we define:

(f,xs) |- (Number n,bs) =>^0 n (and similar for LetBound n and Arg n, by accessing bs and xs respectively).
If forall i. (f,xs) |- (ts_i,bs) =>^zs_i vs_i and g vs = v' and z' = sum(zs) + T_g vs, then (f,xs) |- (Call g ts,bs) =>^z' v'
etc.

Then, for the body f of a HOL-TCN function F assuming:

there are correctness lemmas available for each non-primitive auxiliary function, and
each primitive aux. function has constant running time,

we can show:

forall r_0. forall t, r.
 exist fs. fs_i is called in t --> exist cs, k.
  forall xs, b.
   forall s. s arg_f,i = xs_i -->
    exist ss'.
     (f,xs) |- (t,b) =>^(sum_i(fs_i ss'_i) z for some z
     and [[f]]_(r_0)^([]) |- ([[t]]_r^b,s)
         =>^(k + sum_i(cs_i * fs_i s's_i)) s' for some s'

The proof is an induction on the rule of F, followed by an induction on the term f. The Let-case requires a lemma about substitution.

Thus the running time always has the assumed form, and approach from last time (almost) works: for tail-recursive functions, the procedure for finding a lower bound on c would try and instantiate c with a term containing c... This can be fixed. In general, we have to show:

k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i)

In the case where one of the fs_i is ?c * T_f s', we rearrange to:

k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i) - ?c * T_f s'

removing c_f, T_f, s' from the vectors on the LHS. On the RHS, one of the fs/s's is c_f/T_f, which cancels out the subtracted term.

Then we can rearrange to find a lower bound on ?c, as before.

@@ Line 39: / Line 39: @@
 === May 12th–26th ===
+Better rule for <code>if</code>. Gather running time into constraint for each terminal (tail-call or assignment to return register), "run" function, rearrange constraint to find lower bound on c, take the maximum of all lower bounds as suitable c.
 ==== To Discuss ====
@@ Line 50: / Line 52: @@
 === May 26th–June 6th ===
-Formalized a semantics for HOL-TCNat, such that the running time of a term <code>e</code> obtained from <code>f x1 ... xn = e</code> corresponds to <code>T_f x1 ... xn</code>.
+Formalized a semantics for HOL-TCNat, such that <code>T_f x1 ... xn = [[e]]_(x1, ..., xn) + 1</code> for a function <code>f x1 ... xn = e</code> and its generated running time function <code>T_f</code>. The big-step predicate has the form: <code>(tail-call term :: HOL_TCNat, argument vector :: nat list) |- (term :: HOL_TCNat, let-bound variables :: Nat list) =>^(running time :: nat) (result :: nat)</code>.
+So we define:
+* <code>(f,xs) |- (Number n,bs) =>^0 n</code> (and similar for <code>LetBound n</code> and <code>Arg n</code>, by accessing <code>bs</code> and <code>xs</code> respectively).
+* If <code>forall i. (f,xs) |- (ts_i,bs) =>^zs_i vs_i</code> and <code>g vs = v'</code> and <code>z' = sum(zs) + T_g vs</code>, then <code>(f,xs) |- (Call g ts,bs) =>^z' v'</code>
+* etc.
+Then, for the body <code>f</code> of a HOL-TCN function <code>F</code> assuming:
+* there are correctness lemmas available for each non-primitive auxiliary function, and
+* each primitive aux. function has constant running time,
+we can show:
+ <nowiki>
+forall r_0. forall t, r.
+ exist fs. fs_i is called in t --> exist cs, k.
+  forall xs, b.
+   forall s. s arg_f,i = xs_i -->
+    exist ss'.
+     (f,xs) |- (t,b) =>^(sum_i(fs_i ss'_i) z for some z
+     and [[f]]_(r_0)^([]) |- ([[t]]_r^b,s)
+         =>^(k + sum_i(cs_i * fs_i s's_i)) s' for some s'</nowiki>
+The proof is an induction on the rule of <code>F</code>, followed by an induction on the term <code>f</code>. The <code>Let</code>-case requires a lemma about substitution.
+Thus the running time always has the assumed form, and approach from last time (almost) works: for tail-recursive functions, the procedure for finding a lower bound on c would try and instantiate c with a term containing c... This can be fixed. In general, we have to show:
+ <nowiki>
+k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i)</nowiki>
+In the case where one of the <code>fs_i</code> is <code>?c * T_f s'</code>, we rearrange to:
+ <nowiki>
+k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i) - ?c * T_f s'</nowiki>
+removing c_f, T_f, s' from the vectors on the LHS. On the RHS, one of the fs/s's is c_f/T_f, which cancels out the subtracted term.
+Then we can rearrange to find a lower bound on ?c, as before.

Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking): Difference between revisions

Revision as of 12:31, 6 June 2025

Contents

Progress

March 24th–April 28th

April 28th–May 12th

To Discuss

To-do for Next Week

May 12th–26th

To Discuss

To-do for Next Week

May 26th–June 6th

Navigation menu

Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking): Difference between revisions

Revision as of 12:31, 6 June 2025

Progress

March 24th–April 28th

April 28th–May 12th

To Discuss

To-do for Next Week

May 12th–26th

To Discuss

To-do for Next Week

May 26th–June 6th

Navigation menu

Search