Translating Time Bounds of Functional Programs to a Deeply-Embedded Language (Thesis Project Tracking)

From max.wiki
Jump to navigation Jump to search

Progress tracking for master's thesis Translating Time Bounds of Functional Programs to a Deeply-Embedded Language.

Progress

March 24th–April 28th

Read through Proof-Producing Translation of Functional Programs into a Time & Space Reasonable Model which describes existing work on functional correctness.

Read A Fistful of Dollars: Formalizing Asymptotic Complexity Claims via Deductive Program Verification which develops a framework for interactive proofs of running time intended as part of an interface specification. The method first recursively descends through the function to calculate its running time up to recursive calls, e.g. T_f(x) = 2 + T_f(x - 1). The user is asked to provide a "shape" of the running time bound they wish to specify, e.g. a * x, where a is left as a variable to solve for. Then induction is performed on the goal e.g. 2 + T_f(x - 1) <= a * x, which through induction then becomes 2 + a * (x - 1) <= a * x, thus providing a constraint on a suitable a. This is essentially the "substitution method" of Cormen et al..

April 28th–May 12th

Investigated how to generate running time constraints in Isabelle by recursive tactics on the structure of the IMP-Tailcall term, as in previous work on correctness. The issue to overcome is that we must work with a predicate of the form \exists c. \forall s. T_f_IMP s <= c * T_f (s "x"), which requires providing a single witness c for the linear factor. Previous work on correctness split the single goal recursively at sequences and conditionals into goals for each sub-term, but this approach doesn't work directly with an existential quantor: the same constant must be a witness in both subterms.

Instead of working directly with the existential, we can "provide" a witness up-front in the form of a schematic variable, and instantiate the schematic step-by-step. Thus we work with a predicate that bounds the running time by e.g. c * T_f (s "x"), without the existential quantor, and compute suitable constraints on the way down. Once we have computed a suitable c, it is obviously a witness for the existential quantor and we can close the goal.

This poses an additional challenge when previously verified functions are called, as the scheme devised above requires a concrete running time bound, not quantified with the existence of a constant. For this, we use Isabelle's LEAST operator to materialize the existential c from the running time of the auxiliary function's bound. (The obvious approach of "obtain"ing the constant with Isabelle's proof functionality does not work, as the schematic witness c of the function being analyzed can not depend on a value obtained later). We can then work with this (LEAST c. ...) * T_aux ... term just like any other bound.

After analyzing the term, we can instantiate and apply an induction hypothesis where applicable, this essentially corresponds to Cormen's "substitution method".

We are left with constraints on the constant c which must be solved, then c can be instantiated and the goals closed.

To Discuss

Should we investigate other approaches? For example, computing the running time up-front.

How can we automatically solve for suitable constants from the resulting constraints?

To-do for Next Week

Examine more complex functions which:

  • call functions with non-constant running time
  • have preconditions on the well-formedness of their inputs
  • use data types (this will only work to relate to HOL-Nat)

The condition of an if/else may depend on the input beyond just determining which case of the original function is being called, i.e. cannot necessarily be solved directly based on the inductive case being considered. Thus we cannot assume that we can solve the if/else if we introduce one into our timing constraint. The natural idea to use a maximum of both branches' running time won't work, consider for example the simple expression if b then 1 else g y; then, taking the maximum on the running time would result in a constraint like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * T_f (s "x"), which upon expanding T_f in the right-hand-side and taking the lower bound of the resulting if/else, would yield something like max 3 ((LEAST c. ...) * T_g (s' "y")) <= ?c * 1, which is obviously nonsense. Instead, we could try creating one inequality constraint for each branch.

Investigate the use of SMT-solvers or other automated solvers for the resulting inequality constraints.

May 12th–26th

To Discuss

To-do for Next Week