
CSE 291 - Program Synthesis

Paper Notes

Lecture 1 - Intro

  • Understand what program synthesis can do and how
  • Use existing synthesis tools
  • Contribute to synthesis techniques and tools towards a publication in an academic conference

Intro to Program Synthesis

  • goal: automate programming

  • examples:
    • MS Excel FlashFill
    • simpler problem: Isolate the least significant zero bit in a word
  • other ideas:
    • define program synthesis goal as a type:
      • the type defines constraints on the output result.

Overall, program synthesis starts from a high-level specification and turns it into a program. Synthesis differs from compilation in that there is a search over a space of candidate programs to find the right one.

  • Dimensions of program synthesis
    • search strategy
      • how does the system search for the right program?
    • behavioral constraints (specification)
      • how to tell the system what the program should do
    • structural constraints
      • what is the space of programs to explore?
  • Behavioral constraints
    • how to tell the system what the program should do?
      • what is the input language/format?
      • what is the interaction model?
      • what happens when intent is ambiguous?
    • examples of behavioral constraints
      • input/output examples
      • equivalent program
      • format specifications
      • natural language
  • Structural constraints
    • what is the space of programs to explore?
      • large enough to contain interesting programs, yet small enough to exclude garbage/search efficiently
      • built-in or user defined?
      • can domain knowledge be extracted from existing code?
    • structure constraint examples:
      • built-in DSL
      • user-defined DSL
      • user-provided components
      • languages with synthesis constructs (e.g. generators in Sketch)
  • Search strategies
    • synthesis in search:
      • find a program in space defined by structural constraints that satisfies the behavioral constraints
    • challenge: space is astronomically large
    • how does the system find the program you want?
      • how does it know it’s the program you want?
      • how can it leverage structural constraints to guide the search?
      • how can it leverage behavioral constraints to guide the search?
    • examples:
      • enumerative search: exhaustively enumerate all programs in the language in the order of increasing size
      • stochastic search: random exploration of search space guided by a fitness function
      • representation-based search: use a data structure to represent a large set of programs
      • constraint-based search: translate to constraints and use a solver

Course structure

  • Module 1: Synthesis of Simple Programs
    • easy to decide when a program is correct
    • challenge: search in large space
  • Module 2: Synthesis of Complex Programs
    • deciding when a program is correct can be hard
    • search in a large space – still a problem
  • Module 3: Advanced Topics
    • quantitative synthesis, human aspects, applications

Lecture 2

Synthesis from Examples

  • Synthesis from Examples
    • also:
      • programming by example
      • inductive synthesis
      • inductive programming
  • The “Zendo Game”
    • teacher makes up a “rule”, gives two examples, one passing, one not
    • students attempt to guess the rule
  • 1980s/90s: not much progress in learning by example (LBE)

  • key issues in inductive learning
    1. how to find a program that matches observations
    2. how to know if it is the program you are looking for?
  • Traditional ML emphasizes problem (2) - knowing if it is the program you are looking for
    • fixes the space so that (1) is easy
  • modern emphasis
    • if you can do really well with (1) you can “win”
    • (2) is still important
    • decrease the search space as much as possible
  • key idea: parameterize the search by structural constraints - make the program space domain-specific

Syntax-Guided Synthesis

  • context-free grammars
L ::= sort(L) |
      L[N..N] |
      L + L   |
      [N]     |
      x
N ::= find(L, N) |
      0
  • Grammar composed of
    • terminals,
    • nonterminals
    • rules (productions)
    • starting nonterminal
  • \(\langle T, N, R, S \rangle\)
  • Sentential forms: \(\alpha \in (N\cup T)^*\)
  • Rewrites to: \(\alpha A \beta \rightarrow \alpha \gamma \beta \iff (A \rightarrow \gamma) \in R\)
  • (Incomplete) terms/programs: \(\{\alpha \in (N\cup T)^* \mid S \rightarrow^* \alpha\}\)
  • Ground programs
    • programs without holes (complete)
  • Whole programs
    • roughly, programs of the right type
  • context-free grammars (CFGs) define the space of programs, which contains all ground and whole programs

  • How big is the space of programs for a simple CFG?

CFG

E ::= x | E @ E

For this grammar, the number of programs with syntax trees of depth at most \(d\) follows the recurrence \(N(d) = N(d-1)^2 + 1\), with \(N(1) = 1\).

Size explodes as the complexity of the program increases; for more complex grammars it is even larger.
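To see this concretely, here is a quick sketch that iterates the recurrence (the helper name is my own):

# Count programs of E ::= x | E @ E with syntax-tree depth <= d,
# using the recurrence N(d) = N(d-1)^2 + 1 with N(1) = 1.
def num_programs(d):
    n = 1                      # depth 1: just "x"
    for _ in range(d - 1):
        n = n * n + 1          # "x", or "@" over two smaller trees
    return n

print([num_programs(d) for d in range(1, 7)])
# [1, 2, 5, 26, 677, 458330] -- doubly exponential growth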

  • Idea: Sample programs from the grammar one by one, and test them against examples
  • Challenge: how to systematically enumerate all programs?

  • techniques:
    • bottom-up enumeration
    • top-down enumeration
  • Bottom-up enumeration
    • start from the nullary productions (variables and constants)
    • combine sub-programs into larger programs using productions

Given a grammar and examples: \((\langle T, N, R, S \rangle, [i \rightarrow o])\)

bank := [t | (A ::= t) in R, t is ground]
while (true)
    forall (p in bank)
        if (p([i]) = [o])
            return p
    bank += grow(bank)

grow(bank) {
    bank' := []
    forall ((A ::= rhs) in R)
        bank' += [rhs[B -> p] | p in bank, B ->* p]
    return bank'
}
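To make the pseudocode concrete, here is a minimal runnable sketch of bottom-up enumeration in Python for a tiny arithmetic DSL (E ::= x | 1 | E + E | E * E). The DSL and all names are illustrative choices, not from the lecture:

# Programs are 'x', '1', or (op, left, right) tuples.
def evaluate(p, x):
    if p == 'x': return x
    if p == '1': return 1
    op, l, r = p
    return evaluate(l, x) + evaluate(r, x) if op == '+' else evaluate(l, x) * evaluate(r, x)

def bottom_up(examples, max_rounds=3):
    bank = ['x', '1']                       # nullary productions
    for _ in range(max_rounds):
        for p in bank:
            if all(evaluate(p, i) == o for i, o in examples):
                return p
        # grow: combine existing bank programs with every production
        bank = bank + [(op, l, r) for op in '+*' for l in bank for r in bank]
    return None

print(bottom_up([(1, 2), (5, 26)]))         # finds ('+', ('*', 'x', 'x'), '1')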

Lecture 3

Enumerative Search – Cont’d

  • The top-down algorithm (the worklist can be explored breadth-first or depth-first) expands programs from the root of the tree down to greater depths
def topDown(<T, N, R, S>, [i --> o]) {
    wl := [S]
    while (wl != []) {
        p := wl.dequeue();
        if (ground(p) && p([i]) = [o]) {
            return p;
        }
        wl.enqueue(unroll(p));
    }
}

def unroll(p) {
    wl' := []
    A := left-most non-terminal in p
    forall (A ::= rhs in R) {
        wl' += p[A --> rhs]
    }
    return wl'
}
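And a matching minimal sketch of top-down enumeration for the same tiny DSL, reusing evaluate from the bottom-up sketch in Lecture 2; 'E' plays the role of the nonterminal (hole):

from collections import deque

def ground(p):
    return p != 'E' and (isinstance(p, str) or (ground(p[1]) and ground(p[2])))

def unroll(p):
    # expand the left-most nonterminal with every production of E
    if p == 'E':
        return ['x', '1', ('+', 'E', 'E'), ('*', 'E', 'E')]
    if isinstance(p, str):
        return []
    op, l, r = p
    if not ground(l):
        return [(op, l2, r) for l2 in unroll(l)]
    return [(op, l, r2) for r2 in unroll(r)]

def top_down(examples, budget=100000):
    wl = deque(['E'])                       # BFS worklist
    while wl and budget > 0:
        budget -= 1
        p = wl.popleft()
        if ground(p):
            if all(evaluate(p, i) == o for i, o in examples):
                return p
        else:
            wl.extend(unroll(p))
    return None

print(top_down([(1, 2), (2, 3)]))           # finds ('+', 'x', '1')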

  • bottom up:
    • candidates may be ground, but might not be whole programs
      • can always run on inputs
      • may not be able to relate to outputs
  • top-down:
    • candidates are whole, but might not be ground
      • cannot always run on inputs
      • can always relate to outputs

Search Space Pruning

  • When can we discard a subprogram?
    • it’s equivalent to something we already explored:
      • e.g. sort(x) vs sort(sort(x))
      • (equivalence reduction/symmetry breaking)
    • no matter what it is combined with, it will not satisfy the specification
  • how to do equivalence reduction with an expensive “equiv” check?
    • expensive for SyGuS problems
    • expensive checks on every candidate defeat the purpose of pruning the space
  • observational equivalence (see the sketch after this list)
    • in PBE we only care about equivalence on the given inputs
      • easy to check efficiently
      • many more programs are equivalent on particular inputs than on the entire input space
  • term-rewriting system (TRS)
    • transforms one term into another form of the term
    • issues with deciding which rules to include, or when to apply them
      • e.g. commutativity rules could rewrite a term forever - leading to an infinite loop
  • just don't enumerate non-normal-form expressions - remove a candidate as soon as an equivalence is identified
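A minimal sketch of observational-equivalence pruning for the tiny DSL from the earlier sketches: candidates are deduplicated by their outputs on the example inputs only, which is cheap to check:

def prune(bank, inputs):
    seen, kept = set(), []
    for p in bank:
        sig = tuple(evaluate(p, i) for i in inputs)   # outputs on the examples
        if sig not in seen:                           # new behavior: keep it
            seen.add(sig)
            kept.append(p)
    return kept

print(prune(['x', ('+', 'x', '1'), ('+', '1', 'x')], [0, 1]))
# ['x', ('+', 'x', '1')] -- the two equivalent additions collapse to one

Calling prune(bank, inputs) after each grow step keeps the bottom-up bank small.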

  • Built-in equivalences
    • for a set of ops - equivalence reduction can be hard-coded into the tool or directly into the grammar
      • can modify the grammar so that equivalences are not generated
  • Equivalence reduction:
    • Observational:
      • Pros:
        • very general, no additional user input
        • finds more equivalences
      • Cons:
        • can be costly (many large examples, large outputs)
        • when new examples are added, the search must be restarted
    • user-specified
      • Pros:
        • fast
      • Cons:
        • requires equations
    • built-in:
      • Pros:
        • even faster
      • Cons:
        • restricted to built-in operators
        • only certain symmetries can be eliminated by modifying the grammar

Lecture 4

  • when to discard subprogram?
    • equivalent to something already explored
    • the program doesn’t fit the spec
  • idea: once we pick a production, infer specification for subprograms

  • property of cons (list construction operator) that allows decomposition?
    • injective: every output is produced by at most one input, so output examples determine the sub-goals
  • property of an operator that makes it good for top down enumeration?
    • works when a function is “sufficiently injective”
      • output examples have a small pre-image
  • Conditional Abduction
    • form of top-down propagation

eusolver discussion:

  • constraints:
    • behavioral
      • linear arithmetic, first order logical formulas
        • f(x, y) >= x && f(x, y) >= y && …
    • structural
      • conditional expression grammar
    • search strategy
      • enumerative search strategy
  • specification needs to be pointwise
    • what is pointwise vs non-pointwise?
      • non-pointwise example: f(x) > f(x+1), which relates the outputs of multiple inputs
    • how would it break the enumerative solver?
      • CEGIS (counterexample guided inductive synthesis) reduces the spec to concrete examples
      • a non-pointwise spec cannot be decomposed into independent input/output examples, so the CEGIS loop breaks
  • what pruning/decomposition techniques does EUSolver use to speed up the search?
    • condition abduction + a special form of equivalence reduction
    • why generate additional terms when all inputs are already covered?
      • the predicates might not separate the existing terms in a way that solves the problem
  • naive alternative to decision tree learning for synthesizing branch conditions?
    • learn atomic predicates that precisely classify the points
      • why worse? it enumerates over many possible conditions - nearly as bad as ESolver, not much better
    • next best is decision tree learning without heuristics
      • why is this worse? the learned decision trees are larger

Lecture 5

  • Order of search:
    • enumerative search explores programs by depth/size
      • good default bias: small solution is likely to generalize?
      • result?
        • scales poorly with the size of the smallest solution to a given spec
        • if the spec is insufficient: plays monkey's paw (returns a technically-correct but unintended program)
  • biasing search:
    • idea: explore programs in order of likelihood rather than size
      • q1: how do we know which programs are likely
        • learn some kind of statistical/probabilistic models
      • q2: how do we use that information to guide search
  • statistical language model:
    • originated in NLP
    • in general, a probability distribution over sentences in a language
      • \(P(s)\) for \(s \in L\)
      • what are the natural programs (sentences) in a DSL?
    • kinds of corpora:
      • all programs from DSL: what is natural in DSL?
      • solutions to specific tasks
      • spec-program pairs: what are the likely programs?
    • kinds of __:
    • example:
      • Code completion with statistical language models: SLANG
        • predicts completions for a sequence of API calls
        • treats programs as a set of abstract histories
        • training learns bigrams, n-grams, RNNs on histories
          • inference: given a history with holes
            • bigrams to get some possibilities
            • n-grams/RNN to rank them
            • combine history completions into a coherent program
          • features: fast
          • limitations: all invocation pairs must appear in the training set
      • neural corrector for MOOCs
        • input: incorrect program + test suite
        • output: correct program
        • treats programs as sequence of tokens, abstracting variable names
        • uses skipgram model to predict which statement is most likely to occur between the begin and end statement
        • features: repair syntax errors; limitations: needs all algorithmically distinct solutions to appear in the training set
      • DeepCoder:
        • a neural net that, given input/output examples, predicts which operations from the grammar are most likely to appear in the program
        • features:
          • trains on synthetic data:
          • can be easily combined with any enumerative search
          • significant speedups for a small list-manipulation DSL
  • weighted top-down search (see the sketch below)
    • idea: explore programs in the order of likelihood, not size
    • assign weights to edges (e.g. negative log probability) so that a smaller distance from the root means a more likely program
      • can search with Dijkstra's (or A*)
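A minimal sketch of weighted top-down search over the tiny DSL from the earlier sketches (reusing ground, unroll, and evaluate); the production probabilities are made up for illustration:

import heapq, math

PROBS = {'x': 0.4, '1': 0.3, '+': 0.2, '*': 0.1}    # a toy PCFG for E

def cost(p):
    # total -log probability of the productions used; holes ('E') are free
    if p == 'E': return 0.0
    if isinstance(p, str): return -math.log(PROBS[p])
    op, l, r = p
    return -math.log(PROBS[op]) + cost(l) + cost(r)

def weighted_top_down(examples, budget=100000):
    tie, heap = 0, [(0.0, 0, 'E')]
    while heap and budget > 0:
        budget -= 1
        _, _, p = heapq.heappop(heap)                # cheapest = most likely
        if ground(p):
            if all(evaluate(p, i) == o for i, o in examples):
                return p
        else:
            for q in unroll(p):
                tie += 1
                heapq.heappush(heap, (cost(q), tie, q))
    return None

print(weighted_top_down([(1, 2), (2, 3)]))          # likely programs pop first

Since each expansion only adds nonnegative cost, popping the cheapest partial program first is exactly Dijkstra's algorithm over the expansion graph.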

Lecture 6

  • a PCFG doesn't provide enough information to synthesize efficiently
    • instead, use a PHOG, which captures more information than a PCFG
      • productions are conditioned on a context
  • synthesize a program that computes the context
    • start with a CFG
    • parse training programs into ASTs
    • learn a "context" and a probability for each production of the grammar
  • PHOGs are supposed to be good for:
    • code completion, deobfuscation, PL translation, statistical bug detection
  • Euphony constraints:
    • behavioral: input/output examples
    • structural: PHOG
    • search strategy: top-down enumerative w/ A*
  • Euphony q2: what would the productions for Rep(x, "-", S) look like if we use a PCFG / 3-gram?
    • S --> "-" = 0.2
    • S --> "." = 0.3
    • ...
  • equivalence vs weak equivalence
    • equivalence: no matter how holes are filled, the programs are equivalent
    • weak equivalence: subsequences of sentential forms are equivalent on some points
  • paper contributions
    • efficient way to guide search by probabilistic grammar
    • transfer learning for PHOGs
    • extend observational equivalence to top-down search
  • weaknesses
    • requires high-quality training data
    • transfer learning requires manually designed features

Lecture 7

  • representation-based + stochastic search
  • representation-based search
    • idea:
      • build a data structure that represents a large part of the search space
      • search within the data structure
    • useful when
      • need to return multiple results/rank results
      • can pre-process search space / use for multiple queries
    • tradeoff: easy to build vs easy to search
  • representations
    • version space algebra (VSA)
    • Finite Tree Automaton (FTA)
    • Type Transition Net (TTN)
  • VSA
    • build a data structure succinctly representing the set of programs consistent with examples
    • operations on version spaces:
      • learn: <i, o> -> VS
      • intersect: VS1 ∩ VS2 -> VS
      • pick: VS -> program
    • a version space is a DAG
      • leaves: sets of programs
      • union nodes: represent the union of two sets of programs
      • join nodes: correspond to an operation applied to pairs of programs
      • join/union nodes can appear anywhere
    • Volume of a VSA –> V(VSA) (number of nodes)
    • Size of a VSA –> |VSA| (number of programs)
    • VSA DSL restrictions
      • every operator has a small easily computable inverse
      • every recursive rule generates a strictly smaller subproblem
  • PROSE framework

Lecture 8

  • Finite Tree Automaton (FTA)

example:

N ::= id(V) | N + T | N * T
T ::= 2 | 3
V ::= x

spec:
1 -> 9
  • an FTA is \(A = (Q, F, Q_f, \Delta)\)
    • states, alphabet, final states, transitions
  • an FTA can have a lot of states, even with simple grammars (see the construction sketch at the end of this lecture)
    • idea: instead of one state per value, use one state per set of values
  • Type transition net (TTN)
  • context: component-based synthesis
    • given a library of components + query: type + examples
      • synthesize composition of components
  • idea: build a compact graph of the search space using
    • types as nodes
    • components as transitions
  • generate a Petri net (a special kind of graph) in which reachability corresponds to the existence of a program
  • Petri nets
    • bipartite graph
      • two node types; edges only go between nodes of different types
    • transitions consume tokens
      • and produce a token for the return type
  • representation-based vs enumerative
    • enumerative unfolds the search space in time, while representation-based stores the search space in memory
    • FTA –> bottom-up
      • with observational equivalence
    • VSA –> top-down
      • with top-down propagation
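A minimal sketch of the FTA-style construction for the Lecture 8 example grammar (N ::= id(V) | N + T | N * T; T ::= 2 | 3; V ::= x) and spec x = 1 -> 9. States are (nonterminal, value) pairs; the value bound is my own device to force the fixpoint to terminate:

def fta_accepts(x, target, bound=100):
    # initial states from the nullary productions
    states = {('V', x), ('T', 2), ('T', 3)}
    while True:
        new = set()
        for (a, v) in states:
            if a == 'V':                              # N ::= id(V)
                new.add(('N', v))
        for (a1, v1) in states:
            for (a2, v2) in states:
                if a1 == 'N' and a2 == 'T':           # N ::= N + T | N * T
                    new.add(('N', v1 + v2))
                    new.add(('N', v1 * v2))
        new = {(a, v) for (a, v) in new if abs(v) <= bound} - states
        if not new:
            break                                     # fixpoint reached
        states |= new
    return ('N', target) in states                    # final state reachable?

print(fta_accepts(1, 9))    # True: e.g. (id(x) + 2) * 3 maps 1 to 9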

Lecture 9

  • topics
    • stochastic search
    • constraint solvers
    • constraint-based search
  • search techniques
    • CFG - enumerative, good for small programs
    • PCFG/PHOG - weighted enumerative search. Still quite large exploration
    • local search - explore by randomly mutating candidate programs
  • naive local search
    • to find the best program
    • pick a starting point, mutate and check if new mutation is better than old.
    • likely results in getting stuck at a local minimum
    • MCMC sampling gives a chance to escape local optima
  • stochastic sampling requires a good cost function
  • \[C_s(p) = \text{eq}_s(p) + \text{perf}(p)\]
    • components
      • s: the source (reference) program
      • eq: penalty for wrong results
      • perf: penalty for being slow
  • local search:
    • can explore program spaces with no a-priori bias
    • limitations?
      • only applicable with a good cost function
      • counterexample: round to next power of two
  • Intro to SAT and SMT solvers
    • synthesis is combinatorial search, so is SAT
    • SAT solvers are quite good
    • ???
    • profit!!
  • Boolean SATisfiability
    • logical operators/propositional variables
    • \((\text{gin} \lor \text{tonic}) \land (\text{minor} \Rightarrow \lnot\text{gin}) \land \text{minor}\)
  • SMT - Satisfiability Modulo Theories
    • not all problems are naturally modeled with boolean variables
    • SMT adds solver support for non-boolean theories
  • Using SMT solvers
    • example: array partitioning (see the z3 sketch after this list)
      • partition an array of size N evenly into P sub-ranges
      • what happens when N is not divisible by P?
        • sizes of partitions shouldn't differ by more than 1
  • popular theories
    • equality and uninterpreted functions (axioms of equality and congruence)
    • linear integer arithmetic - e.g. {0, 1, …, +, -}
    • Arrays - select/store at particular index
    • theories can be combined
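A minimal sketch of the array-partitioning example with the z3 Python bindings (the z3-solver package); N, P, and the variable names are illustrative:

from z3 import Int, Solver, sat

N, P = 10, 3
sizes = [Int(f's{j}') for j in range(P)]     # size of each sub-range
s = Solver()
s.add(sum(sizes) == N)                       # the sub-ranges cover the array
for sz in sizes:
    s.add(sz >= 0)
for a in sizes:
    for b in sizes:
        s.add(a - b <= 1)                    # sizes differ by at most 1
if s.check() == sat:
    print([s.model()[sz] for sz in sizes])   # e.g. [4, 3, 3]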

Lecture 10

  • Constraint-based search
    • idea: encode synthesis problem as a SAT/SMT problem
    • the program space is encoded as a set of parameters
    • two constraints by default: a semantic correctness constraint + a well-formedness constraint
    • phi(C, i, o) - the behavior/function of the program
  • how to define an encoding (see the sketch after this list)
    • define a parameter space => C = {c1, c2, c3, …, cn}
      • encode: program –> C
      • decode: C –> program
    • define a formula wf(c1, …, cn)
      • must hold iff decode[C] is a well-formed program
    • define a formula phi(c1, …, cn, i, o)
      • holds iff (decode[C])(i) = o
  • properties of a good encoding
    • sound: any solution C of the constraints decodes to a well-formed, semantically correct program
    • complete: every well-formed, correct program in the space corresponds to some solution C
    • small parameter space to avoid symmetries
    • solver-friendly: decidable logic, compact constraints
  • DSL limitations
    • program space can be parameterized with a finite set of parameters
    • program semantics should be expressible as a decidable SAT/SMT formula
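A minimal sketch of such an encoding with z3, for a toy space of programs of the form x + c or x * c. The parameter space C is one boolean plus one integer; all names and the toy DSL are my own:

from z3 import Bool, Int, If, Solver, is_true, sat

op_is_add, c = Bool('op_is_add'), Int('c')   # the parameters C

def phi(i, o):
    # semantic constraint: decode(C) applied to i yields o
    return If(op_is_add, i + c, i * c) == o

s = Solver()                                 # wf is trivially true here
for i, o in [(1, 3), (5, 7)]:                # behavioral constraint: examples
    s.add(phi(i, o))
if s.check() == sat:
    m = s.model()
    op = '+' if is_true(m[op_is_add]) else '*'
    print(f'x {op} {m[c]}')                  # decode the model: x + 2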

Lecture 11

  • Rich specifications (why not PBE?)
    • reference implementation
      • easy to compute result, but hard to do so efficiently or under structural constraints
    • assertions
      • hard to compute result, but easy to check desired properties
    • pre/post conditions
      • logical expressions that must hold before and after execution
    • refinement types
      • push the logical expressions inside the types
  • why else go beyond examples?
    • examples contain too little information
    • outputs can be difficult to construct by hand
  • reasoning about non-functional properties (e.g., security protocols)

  • why is this hard?
    • compute GCD – infinitely many inputs
      • also, infinitely many paths (loops)
  • constraint-based synthesis from specifications
    • how to solve constraints on infinitely many inputs?
      • CEGIS/SAT Solver
    • CBS from specifications
      • relate inputs and outputs with a formula \(\varphi\)
      • find a program P such that for all inputs i, \(\varphi(i, P(i))\) holds
  • CEGIS, formally (see the sketch after this list)
    • given a harness, create an encoding:
    • \(\exists c.\ \forall i.\ P(c, i) \Rightarrow \text{assert}(i)\) - there exist parameter values such that, for all inputs, the program satisfies the assertion/constraint
    • a linear constraint is guaranteed to satisfy the bounded observation hypothesis (a linear constraint only needs two points)
    • how to find the "good" inputs for which the BOH holds?
      • rely on an oracle to generate counterexamples
  • constraint-based synthesis from specifications
    • different constraints for different problems
      • CFGs
      • Components
      • “just figure out the constants”
    • generators in Sketch: a plain function with a hole is always filled in the same way
      • a generator, by contrast, can be filled in differently at each context where it is called
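A minimal CEGIS sketch with z3. The template (f(x) = c1*x + c2) and toy spec (f(x) >= 2x for all integers x) are my own, but the synthesize/verify loop follows the structure above; it usually converges here in a handful of iterations:

from z3 import Int, Solver, Not, sat

c1, c2, x = Int('c1'), Int('c2'), Int('x')
spec = lambda xv, a, b: a * xv + b >= 2 * xv

examples = [0]                                        # finite input set
while True:
    synth = Solver()                                  # exists c1, c2 ...
    for xv in examples:
        synth.add(spec(xv, c1, c2))                   # ... good on all examples
    assert synth.check() == sat, 'unrealizable'
    m = synth.model()
    v1 = m.eval(c1, model_completion=True).as_long()
    v2 = m.eval(c2, model_completion=True).as_long()

    verify = Solver()                                 # oracle: forall x?
    verify.add(Not(spec(x, v1, v2)))
    if verify.check() != sat:                         # no counterexample left
        print(f'f(x) = {v1}*x + {v2}')
        break
    examples.append(verify.model()[x].as_long())      # add the counterexample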

Lecture 12

  • Encoding arbitrary semantics in programs with holes/constraints
  • symbolic execution: no concrete values for each variable
    • variables represented as formulas instead of concrete values
  • expression reads a state and produces a value
  • states are modeled as a map (sigma) from variables to values
    • using lambda calculus to model Alpha, Sigma, Psi (symbols, state, constraints of valid executions)
    • \(A[[x]]\sigma\) – value of x at current state
    • \(\mathcal{C}[[\text{command}]]\langle\sigma, \psi\rangle = \langle \sigma', \psi' \rangle\) - a command maps a (state, assertions) pair to an updated state and updated assertions
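A minimal sketch of this idea with z3: the state sigma maps variables to formulas over the symbolic inputs, and checking an assertion becomes a satisfiability query:

from z3 import Int, Solver, Not, sat

x0 = Int('x0')                     # symbolic initial value of x
sigma = {'x': x0}                  # state: variables -> formulas

# symbolically execute: x := x + 1; x := x * 2
sigma['x'] = sigma['x'] + 1
sigma['x'] = sigma['x'] * 2

# does the assertion x > 0 hold for every initial state with x0 >= 0?
s = Solver()
s.add(x0 >= 0, Not(sigma['x'] > 0))
print('assertion holds' if s.check() != sat else 'counterexample')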

Lecture 13

  • Type-driven program synthesis
  • specification –> types (programmer-friendly, informative)
  • use types to prune search space
  • synthesis accepts a type, a context, and produces a program
  • Type systems
    • deductive system for proving facts about programs and types
    • defined using inference rules over judgements
    • “under context \(\Gamma\), term \(e\) has type \(T\)”
  • a simple type system:
    • terms: \[e ::= 0 \mid e + 1 \mid x \mid e\ e \mid \lambda x.e\]
    • types: \[T ::= \text{Int} \mid T \rightarrow T\]
    • contexts: \[\Gamma ::= \cdot \mid x : T, \Gamma\]
  • syntax guided enumeration
    • only enumerate programs with the proper types
      • still need to type-check every candidate before discarding it
    • better idea: type-guided enumeration
      • enumerate the derivations generated by the type system
      • extract terms from derivations
    • three ways to do typing judgements
      • type checking (term and type known)
      • type inference (term is known, type is unknown)
      • synthesis (term unknown, type is known)
    • search strategy: recursively transform synthesis goal using typing rules until finding a complete derivation
  • naive generation produces many equivalent programs
    • only generate programs in normal form
    • restrict the type system to make redundant programs ill-typed
  • type-driven synthesis in 3 steps:
    • annotate types with extra specifications (examples, logical predicates, resources)
    • design a type system for the annotated types (propagate as much info as possible from conclusion to premises)
    • turn the typing rules into a synthesis algorithm
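A minimal sketch of type-guided enumeration: terms are built only when a component's result type matches the goal, so ill-typed programs are never generated. The component library and names are illustrative:

import itertools

COMPONENTS = {                          # name -> (argument types, result type)
    'x':    ([], 'Int'),
    'inc':  (['Int'], 'Int'),
    'pair': (['Int', 'Int'], 'Pair'),
}

def enumerate_terms(goal, depth):
    # all terms of type `goal` whose nesting depth is at most `depth`
    if depth == 0:
        return
    for name, (args, result) in COMPONENTS.items():
        if result != goal:
            continue                    # prune: wrong type, never even built
        if not args:
            yield name
        else:
            subterms = [list(enumerate_terms(t, depth - 1)) for t in args]
            for combo in itertools.product(*subterms):
                yield f"{name}({', '.join(combo)})"

print(list(enumerate_terms('Pair', 3)))
# ['pair(x, x)', 'pair(x, inc(x))', 'pair(inc(x), x)', 'pair(inc(x), inc(x))']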

Lecture 14

  • polymorphic types:
    • a type that can take another type as a parameter
    • e.g. generics in Java
  • refinement types: augment types with predicates
  • more complex types:
    • dependent function types: name arguments of a function, then use arguments later in the function/predicate
      • example, max function: max :: x : Int -> y: Int -> {v: Int | x <= v && y <= v }
  • subtyping: T' is a subtype of T if all values of type T' also belong to T
    • written T' <: T
  • synquid limitations
    • user interaction:
      • types can be large and hard to write
      • components need to be annotated
    • expressiveness
      • specs are tricky/impossible to express
      • cannot synthesize recursive auxiliary functions
    • condition abduction is limited to liquid predicates
    • cannot generate arbitrary constraints
  • synquid questions
    • behavioral: refinement types?

Lecture 15

  • constraint-based synthesis
    • loops are hard!
      • loops create many paths, integers gives many inputs
    • Sketch handles loops by unrolling them to a particular depth
    • if we can't unroll enough, the constraints become unsatisfiable
  • Hoare logic: a logic for simple imperative programs
    • particularly loop invariants

The imp language:

e ::= n | x |
      e + e | e - e | e * e |
      e = e | e < e | e > e | !e | e && e
+ assignment/if/then/while logic

Hoare triples

  • judgments grouped as {P} c {Q}
    • precondition P
    • if execution of c
    • postcondition Q
  • if P holds in an initial state \(\sigma\), and the execution of c from \(\sigma\) terminates in a state \(\sigma'\), then Q holds in \(\sigma'\)
    • “partial correctness” –> the program may not terminate
    • “total correctness” –> the program also terminates
  • need to be able to express that values in final states are related to values in initial states
    • solution: introduce logical variables which don't appear in the actual program
    • this makes pre/post conditions more expressive
  • “skip” –> {P} skip {P} –> state holds before and after
  • assignment semantics in hoare logic
    • {P[x --> e]} x := e {P}
    • P[x --> e] is P with e substituted for x; it must hold in the state before the assignment
  • semantics of compositions
    • need to invent an intermediate assertion
  • rule of consequence –> a triple that assumes less at the beginning but proves more at the end is at least as strong as the original, so it can be substituted for it
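A worked instance of the assignment rule plus the rule of consequence (my own example):

$$\{(x = 2)[x \mapsto x + 1]\}\ x := x + 1\ \{x = 2\} \;\equiv\; \{x + 1 = 2\}\ x := x + 1\ \{x = 2\}$$

Since \(x = 1 \Rightarrow x + 1 = 2\), the rule of consequence strengthens the precondition, giving \(\{x = 1\}\ x := x + 1\ \{x = 2\}\).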

Lecture 16

  • synthesizing C code: verbose, unstructured, pointers (yuck)
  • Hoare logic is not enough to prove heap-manipulating programs correct when two variables might point to the same location in memory (aliasing)

  • suslik approach:
    • separation logic –> deductive synthesis –> provable program
  • suslik swap example:

Precondition:  {x -> A * y -> B}
Postcondition: {x -> B * y -> A}

  • the special part of separation logic: the separating conjunction *
    • assertions joined with * must hold over disjoint heap locations
  • judgement form:
    • {P} ~> {Q} | c
    • a state satisfying P can be transformed into a state satisfying Q using program c
  • use deduction rules to simplify the pre/postconditions until reaching “skip” (do nothing)
  • proof search

Lecture 17

  • Nope paper contributions:
    • first to prove unrealizability for infinite program spaces
    • CEGIS for unrealizability
    • sound for synthesis and unrealizability
  • Nope limitations:
    • can timeout/run forever
    • situations where it can fail to terminate:
      • a program exists that works on finitely many examples (but not on all inputs)
      • very large/difficult problems
    • in practice, it works on fewer than half of the benchmarks
    • limited to SyGuS problems

Lecture 18

  • Programming with humans
    • snippy: projection boxes show the program state while writing the program
    • live programming is a good environment for synthesis
      • inputs already exist (for simple programs)
      • encourages “small-step” PBE (easier for synthesizer)
    • snippy was more limited, but it was faster, imposed a lower cognitive burden, and synthesized more compact solutions
    • Hoogle+
      • synthesizes from types, which is difficult because types are ambiguous specifications
        • outputting all programs with the correct type yields too many irrelevant programs
      • which solution is the right one?
        • use test-based filtering
      • helps beginners with types
    • RESL
      • REPL but with synthesis
        • syntactic specs + sketching + debugger
        • give granular feedback about grammar to synthesizer

Lecture 19

  • synthesis with users cont’d
  • today: Rousillon, Wrex, Regae

  • Rousillon / Helena
    • web scraping tool for social scientists
    • problem with traditional scrapers: need to reverse-engineer web-page DOM
    • types of web data:
      • distributed: user must navigate between many pages, click, use forms, etc
      • hierarchical: tree-structured data
  • Wrex:
    • goals: built for data scientists; not a black box
    • users create a data frame and sample it
    • Wrex provides an interactive grid where users derive a new column by giving transformation examples
    • shows a code window with the synthesized code
    • the synthesized code can be applied to the full data frame for analysis and plotting
