Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking): Difference between revisions
progress 1 |
No edit summary |
||
Line 39: | Line 39: | ||
=== May 12th–26th === | === May 12th–26th === | ||
Better rule for <code>if</code>. Gather running time into constraint for each terminal (tail-call or assignment to return register), "run" function, rearrange constraint to find lower bound on c, take the maximum of all lower bounds as suitable c. | |||
==== To Discuss ==== | ==== To Discuss ==== | ||
Line 50: | Line 52: | ||
=== May 26th–June 6th === | === May 26th–June 6th === | ||
Formalized a semantics for HOL-TCNat, such that | Formalized a semantics for HOL-TCNat, such that <code>T_f x1 ... xn = [[e]]_(x1, ..., xn) + 1</code> for a function <code>f x1 ... xn = e</code> and its generated running time function <code>T_f</code>. The big-step predicate has the form: <code>(tail-call term :: HOL_TCNat, argument vector :: nat list) |- (term :: HOL_TCNat, let-bound variables :: Nat list) =>^(running time :: nat) (result :: nat)</code>. | ||
So we define: | |||
* <code>(f,xs) |- (Number n,bs) =>^0 n</code> (and similar for <code>LetBound n</code> and <code>Arg n</code>, by accessing <code>bs</code> and <code>xs</code> respectively). | |||
* If <code>forall i. (f,xs) |- (ts_i,bs) =>^zs_i vs_i</code> and <code>g vs = v'</code> and <code>z' = sum(zs) + T_g vs</code>, then <code>(f,xs) |- (Call g ts,bs) =>^z' v'</code> | |||
* etc. | |||
Then, for the body <code>f</code> of a HOL-TCN function <code>F</code> assuming: | |||
* there are correctness lemmas available for each non-primitive auxiliary function, and | |||
* each primitive aux. function has constant running time, | |||
we can show: | |||
<nowiki> | |||
forall r_0. forall t, r. | |||
exist fs. fs_i is called in t --> exist cs, k. | |||
forall xs, b. | |||
forall s. s arg_f,i = xs_i --> | |||
exist ss'. | |||
(f,xs) |- (t,b) =>^(sum_i(fs_i ss'_i) z for some z | |||
and [[f]]_(r_0)^([]) |- ([[t]]_r^b,s) | |||
=>^(k + sum_i(cs_i * fs_i s's_i)) s' for some s'</nowiki> | |||
The proof is an induction on the rule of <code>F</code>, followed by an induction on the term <code>f</code>. The <code>Let</code>-case requires a lemma about substitution. | |||
Thus the running time always has the assumed form, and approach from last time (almost) works: for tail-recursive functions, the procedure for finding a lower bound on c would try and instantiate c with a term containing c... This can be fixed. In general, we have to show: | |||
<nowiki> | |||
k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i)</nowiki> | |||
In the case where one of the <code>fs_i</code> is <code>?c * T_f s'</code>, we rearrange to: | |||
<nowiki> | |||
k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i) - ?c * T_f s'</nowiki> | |||
removing c_f, T_f, s' from the vectors on the LHS. On the RHS, one of the fs/s's is c_f/T_f, which cancels out the subtracted term. | |||
Then we can rearrange to find a lower bound on ?c, as before. |
Latest revision as of 12:31, 6 June 2025
Progress tracking for master's thesis Translating Time Bounds of Functional Programs to a Deeply-Embedded Language.
Progress
March 24th–April 28th
Read through Proof-Producing Translation of Functional Programs into a Time & Space Reasonable Model which describes existing work on functional correctness.
Read A Fistful of Dollars: Formalizing Asymptotic Complexity Claims via Deductive Program Verification which develops a framework for interactive proofs of running time intended as part of an interface specification. The method first recursively descends through the function to calculate its running time up to recursive calls, e.g. T_f(x) = 2 + T_f(x - 1)
. The user is asked to provide a "shape" of the running time bound they wish to specify, e.g. a * x
, where a
is left as a variable to solve for. Then induction is performed on the goal e.g. 2 + T_f(x - 1) <= a * x
, which through induction then becomes 2 + a * (x - 1) <= a * x
, thus providing a constraint on a suitable a
. This is essentially the "substitution method" of Cormen et al..
April 28th–May 12th
Investigated how to generate running time constraints in Isabelle by recursive tactics on the structure of the IMP-Tailcall term, as in previous work on correctness. The issue to overcome is that we must work with a predicate of the form \exists c. \forall s. T_f_IMP s <= c * T_f (s "x"), which requires providing a single witness c for the linear factor. Previous work on correctness split the single goal recursively at sequences and conditionals into goals for each sub-term, but this approach doesn't work directly with an existential quantor: the same constant must be a witness in both subterms.
Instead of working directly with the existential, we can "provide" a witness up-front in the form of a schematic variable, and instantiate the schematic step-by-step. Thus we work with a predicate that bounds the running time by e.g. c * T_f (s "x"), without the existential quantor, and compute suitable constraints on the way down. Once we have computed a suitable c, it is obviously a witness for the existential quantor and we can close the goal.
This poses an additional challenge when previously verified functions are called, as the scheme devised above requires a concrete running time bound, not quantified with the existence of a constant. For this, we use Isabelle's LEAST operator to materialize the existential c from the running time of the auxiliary function's bound. (The obvious approach of "obtain"ing the constant with Isabelle's proof functionality does not work, as the schematic witness c of the function being analyzed can not depend on a value obtained later). We can then work with this (LEAST c. ...) * T_aux ... term just like any other bound.
After analyzing the term, we can instantiate and apply an induction hypothesis where applicable, this essentially corresponds to Cormen's "substitution method".
We are left with constraints on the constant c which must be solved, then c can be instantiated and the goals closed.
To Discuss
Should we investigate other approaches? For example, computing the running time up-front.
How can we automatically solve for suitable constants from the resulting constraints?
To-do for Next Week
Examine more complex functions which:
- call functions with non-constant running time
- have preconditions on the well-formedness of their inputs
- use data types (this will only work to relate to HOL-Nat)
The condition of an if/else may depend on the input beyond just determining which case of the original function is being called, i.e. cannot necessarily be solved directly based on the inductive case being considered. Thus we cannot assume that we can solve the if/else if we introduce one into our timing constraint. The natural idea to use a maximum of both branches' running time won't work, consider for example the simple expression if b then 1 else g y
; then, taking the maximum on the running time would result in a constraint like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * T_f (s "x")
, which upon expanding T_f
in the right-hand-side and taking the lower bound of the resulting if/else, would yield something like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * 1
, which is obviously nonsense. Instead, we could try creating one inequality constraint for each branch.
Investigate the use of SMT-solvers or other automated solvers for the resulting inequality constraints.
May 12th–26th
Better rule for if
. Gather running time into constraint for each terminal (tail-call or assignment to return register), "run" function, rearrange constraint to find lower bound on c, take the maximum of all lower bounds as suitable c.
To Discuss
To-do for Next Week
- Prove that the running time of compiled functions always has the assumed form. Use the definition of the compiler (Kappelmann et al.) and the definition of generated HOL running times.
- From here, argue why or why not an SMT solver could be used to solve the resulting constraints.
- Adapt the
running
/gather
method to (tail) recursive functions.
May 26th–June 6th
Formalized a semantics for HOL-TCNat, such that T_f x1 ... xn = e_(x1, ..., xn) + 1
for a function f x1 ... xn = e
and its generated running time function T_f
. The big-step predicate has the form: (tail-call term :: HOL_TCNat, argument vector :: nat list) |- (term :: HOL_TCNat, let-bound variables :: Nat list) =>^(running time :: nat) (result :: nat)
.
So we define:
(f,xs) |- (Number n,bs) =>^0 n
(and similar forLetBound n
andArg n
, by accessingbs
andxs
respectively).- If
forall i. (f,xs) |- (ts_i,bs) =>^zs_i vs_i
andg vs = v'
andz' = sum(zs) + T_g vs
, then(f,xs) |- (Call g ts,bs) =>^z' v'
- etc.
Then, for the body f
of a HOL-TCN function F
assuming:
- there are correctness lemmas available for each non-primitive auxiliary function, and
- each primitive aux. function has constant running time,
we can show:
forall r_0. forall t, r. exist fs. fs_i is called in t --> exist cs, k. forall xs, b. forall s. s arg_f,i = xs_i --> exist ss'. (f,xs) |- (t,b) =>^(sum_i(fs_i ss'_i) z for some z and [[f]]_(r_0)^([]) |- ([[t]]_r^b,s) =>^(k + sum_i(cs_i * fs_i s's_i)) s' for some s'
The proof is an induction on the rule of F
, followed by an induction on the term f
. The Let
-case requires a lemma about substitution.
Thus the running time always has the assumed form, and approach from last time (almost) works: for tail-recursive functions, the procedure for finding a lower bound on c would try and instantiate c with a term containing c... This can be fixed. In general, we have to show:
k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i)
In the case where one of the fs_i
is ?c * T_f s'
, we rearrange to:
k + sum_i(cs_i * fs_i s's_i) <= ?c * sum_i(fs_i s's_i) - ?c * T_f s'
removing c_f, T_f, s' from the vectors on the LHS. On the RHS, one of the fs/s's is c_f/T_f, which cancels out the subtracted term.
Then we can rearrange to find a lower bound on ?c, as before.