CSE 291: Program Synthesis
Paper Notes
Lecture 1: Intro
 Understand what program synthesis can do and how
 Use existing synthesis tools
 Contribute to synthesis techniques and tools, towards a publication at an academic conference
Intro to Program Synthesis

goal: automate programming
 examples:
 MS Excel FlashFill
 simpler problem: Isolate the least significant zero bit in a word
 other ideas:
 define program synthesis goal as a type:
 the type defines constraints on the output result.
Overall, program synthesis starts from a high-level specification and turns it into a program. Synthesis differs from compilation because there is a search space to explore to find the right program.
 Dimensions of program synthesis
 search strategy
 how does the system find the right program?
 behavioral constraints (specification)
 how to tell the system what the program should do
 structural constraints
 what is the space of programs to explore?
 Behavioral constraints
 how to tell the system what the program should do?
 what is the input language/format?
 what is the interaction model?
 what happens when intent is ambiguous?
 examples of behavioral constraints
 input/output examples
 equivalent program
 formal specifications
 natural language
 Structural constraints
 what is the space of programs to explore?
 large enough to contain interesting programs, yet small enough to exclude garbage/search efficiently
 built-in or user-defined?
 can domain knowledge be extracted from existing code?
 structural constraint examples:
 built-in DSL
 user-defined DSL
 user-provided components
 languages with synthesis constructs (e.g. generators in Sketch)
 Search strategies
 synthesis as search:
 find a program in space defined by structural constraints that satisfies the behavioral constraints
 challenge: space is astronomically large
 how does the system find the program you want?
 how does it know it’s the program you want?
 how can it leverage structural constraints to guide the search?
 how can it leverage behavioral constraints to guide the search
 examples:
 enumerative search: exhaustively enumerate all programs in the language in order of increasing size
 stochastic search: random exploration of the search space guided by a fitness function
 representation-based search: use a data structure to represent a large set of programs
 constraint-based search: translate to constraints and use a solver
Course structure
 Module 1: Synthesis of Simple Programs
 easy to decide when a program is correct
 challenge: search in large space
 Module 2: Synthesis of Complex Programs
 deciding when a program is correct can be hard
 search in a large space – still a problem
 Module 3: Advanced Topics
 quantitative synthesis, human aspects, applications
Lecture 2
Synthesis from Examples
 Synthesis from Examples
 also:
 programming by example
 inductive synthesis
 inductive programming
 The “Zendo Game”
 teacher makes up a “rule”, gives two examples, one passing, one not
 students attempt to guess the rule

1980s/90s: not much progress in learning by example (LBE)
 key issues in inductive learning
 how to find a program that matches observations
 how to know if it is the program you are looking for?
 Traditional ML emphasizes problem (2), knowing if it is the program you are looking for
 fixes the space so that (1) is easy
 modern emphasis
 if you can do really well with (1) you can “win”
 (2) is still important
 decrease the search space as much as possible
 key idea: parameterize the search by structural constraints, i.e., make the program space domain-specific
Syntax-Guided Synthesis
 context-free grammars
L ::= sort(L)
    | L[N..N]
    | L + L
    | [N]
    | x
N ::= find(L, N)
    | 0
 Grammar composed of
 terminals,
 nonterminals
 rules (productions)
 starting nonterminal
 \(\langle T, N, R, S \rangle\)
 Sentential forms \({\alpha \in (N\cup T)^*}\)
 Rewrites to: \(\alpha A \beta \rightarrow \alpha \gamma \beta \iff (A \rightarrow \gamma) \in R\)

(Incomplete) terms/programs: \(\{\alpha \in (N\cup T)^{*} \mid S \rightarrow^{*} \alpha\}\)
 Ground programs
 programs without holes (complete)
 Whole programs
 roughly, programs of the right type

A context-free grammar (CFG) defines the space of programs, i.e., all ground and whole programs
 How big is the space of programs for a simple CFG?
CFG
E ::= x | E @ E
The number of programs with syntax trees of depth at most \(d\) satisfies the recurrence \(N(d) = N(d-1)^2 + 1\)
The size explodes as program depth increases; for more complex grammars it is even larger
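A quick sanity check of the recurrence (a hypothetical snippet, not from the lecture):

```python
def num_programs(d):
    """Number of programs of depth <= d for the grammar E ::= x | E @ E.

    A program is either the leaf x (the +1 term) or @ applied to two
    subprograms of depth <= d-1 (the squared term).
    """
    n = 1  # depth 1: only the leaf x
    for _ in range(d - 1):
        n = n * n + 1
    return n

print([num_programs(d) for d in range(1, 6)])  # [1, 2, 5, 26, 677]
```

Even this two-production grammar reaches hundreds of programs by depth 5, which is the blow-up the notes refer to.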
Enumerative Search
 Idea: Sample programs from the grammar one by one, and test them against examples

Challenge: how to systematically enumerate all programs?
 techniques:
 bottom-up enumeration
 top-down enumeration
 Bottom-up enumeration
 start from nullary productions (variables and constants)
 combine subprograms into larger programs using productions
Given a grammar and examples: \((\langle T, N, R, S\rangle, [i \rightarrow o])\)
bank := [t | A ::= t in R]
while (true)
  forall (p in bank)
    if (p([i]) = [o])
      return p
  bank += grow(bank)

grow(bank) {
  bank' := []
  forall (A ::= rhs in R)
    bank' += [rhs[B -> p] | p in bank, for each nonterminal B in rhs]
  return bank'
}
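The pseudocode above can be sketched in runnable Python for a toy integer DSL (my own example grammar of `x`, constants, `+`, and `*`, not the lecture's sort/find DSL):

```python
import itertools

# Toy DSL: programs are "x", integer constants, or (op, left, right) tuples.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
LEAVES = ["x", 1]

def evaluate(p, x):
    if p == "x":
        return x
    if isinstance(p, int):
        return p
    op, l, r = p
    return OPS[op](evaluate(l, x), evaluate(r, x))

def bottom_up(examples, max_rounds=3):
    """Grow a bank of programs in rounds; return the first program
    consistent with all (input, output) examples."""
    bank = list(LEAVES)
    for _ in range(max_rounds):
        for p in bank:
            if all(evaluate(p, i) == o for i, o in examples):
                return p
        # grow: combine every pair of bank programs with every operator
        bank += [(op, l, r) for op in OPS
                 for l, r in itertools.product(bank, repeat=2)]
    return None

# Synthesize a program equivalent to 2*x + 1 from three examples
print(bottom_up([(0, 1), (1, 3), (2, 5)]))
```

Note how the bank fills with syntactic duplicates such as `x + 1` and `1 + x`; the pruning techniques later in the notes address exactly this.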
Lecture 3
Enumerative Search – Cont’d
 The top-down algorithm (the search can be breadth- or depth-first) works from the root of the tree down to greater depths
def topDown(<T, N, R, S>, [i -> o]) {
  wl := [S]
  while (wl != []) {
    p := wl.dequeue();
    if (ground(p) && p([i]) = [o]) {
      return p;
    }
    wl.enqueue(unroll(p));
  }
}

def unroll(p) {
  wl' := []
  A := leftmost nonterminal in p
  forall (A ::= rhs in R) {
    wl' += p[A -> rhs]
  }
  return wl'
}
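A runnable sketch of the same top-down loop, over a toy grammar `E ::= x | 1 | E + E` of my own choosing (not the lecture's DSL); `"E"` marks a hole:

```python
from collections import deque

def is_ground(p):
    """A term is ground when it contains no hole "E"."""
    if p == "E":
        return False
    if isinstance(p, tuple):
        return all(is_ground(c) for c in p[1:])
    return True

def fill_leftmost(p, rhs):
    """Replace the leftmost hole "E" in p by rhs; returns (new term, done)."""
    if p == "E":
        return rhs, True
    if isinstance(p, tuple):
        parts = list(p)
        for i in range(1, len(parts)):
            parts[i], done = fill_leftmost(parts[i], rhs)
            if done:
                return tuple(parts), True
    return p, False

def evaluate(p, x):
    if p == "x":
        return x
    if isinstance(p, int):
        return p
    return evaluate(p[1], x) + evaluate(p[2], x)

def top_down(examples, limit=100_000):
    wl = deque(["E"])              # worklist seeded with the start nonterminal
    for _ in range(limit):
        if not wl:
            return None
        p = wl.popleft()           # FIFO gives breadth-first; a stack gives DFS
        if is_ground(p):
            if all(evaluate(p, i) == o for i, o in examples):
                return p
        else:
            for rhs in ["x", 1, ("+", "E", "E")]:
                wl.append(fill_leftmost(p, rhs)[0])
    return None

print(top_down([(0, 2), (3, 5)]))
```

Ground candidates are tested against the examples; non-ground ones are unrolled at the leftmost hole, exactly as in the pseudocode.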
 bottom-up:
 candidates may be ground, but might not be whole programs
 can always run on inputs
 may not be able to relate to outputs
 top-down:
 candidates are whole, but might not be ground
 cannot always run on inputs
 can always relate to outputs
Search Space Pruning
 When can we discard a subprogram?
 it’s equivalent to something we already explored:
 e.g.
sort(x)
vs. sort(sort(x))
 (equivalence reduction/symmetry breaking)
 no matter what it is combined with, it will not satisfy the specification
 how to do equivalence reduction with an expensive “equiv” check?
 expensive for SyGuS problems
 expensive checks on every candidate defeat the purpose of space pruning
 observational equivalence
 in PBE we only care about equivalence on given inputs
 easy to check efficiently
 many more programs equivalent on particular inputs than on the entire input space
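A sketch of observational equivalence in a bottom-up grower (toy integer DSL, my own example): the bank keeps one representative program per vector of outputs on the example inputs.

```python
def evaluate(p, x):
    if p == "x":
        return x
    if isinstance(p, int):
        return p
    op, l, r = p
    lv, rv = evaluate(l, x), evaluate(r, x)
    return lv + rv if op == "+" else lv * rv

def grow_with_oe(inputs, rounds=2):
    """Bank maps output signature -> one representative program.

    Programs that agree on all example inputs are observationally
    equivalent, so only the first one seen is kept."""
    bank = {}
    for leaf in ["x", 0, 1]:
        bank.setdefault(tuple(evaluate(leaf, i) for i in inputs), leaf)
    for _ in range(rounds):
        snapshot = list(bank.values())
        for op in ["+", "*"]:
            for l in snapshot:
                for r in snapshot:
                    p = (op, l, r)
                    sig = tuple(evaluate(p, i) for i in inputs)
                    bank.setdefault(sig, p)  # drop observational duplicates
    return bank

bank = grow_with_oe([0, 1, 2])
print(len(bank))  # observationally distinct programs, far fewer than syntactic ones
```

With this bank, `x + 1` and `1 + x` collapse into one entry, which is why the check is cheap compared to full semantic equivalence.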
 term-rewriting system (TRS)
 transforms one term into an equivalent term
 issues with deciding which rules to include, and when to apply them
 e.g. commutativity rules can rewrite a term forever, leading to an infinite loop

just don’t enumerate non-normal-form expressions; remove a term as soon as an equivalence is identified
 Built-in equivalences
 for a set of ops, equivalence reduction can be hard-coded into the tool or directly into the grammar
 can modify the grammar so that equivalences are not generated
 Equivalence reduction:
 Observational:
 Pros:
 very general, no additional user input
 finds more equivalences
 Cons:
 can be costly (many large examples, large outputs)
 when new samples are added, the search must be restarted
 User-specified:
 Pros:
 fast
 Cons:
 requires equations
 Built-in:
 Pros:
 even faster
 Cons:
 restricted to built-in operators
 only certain symmetries can be eliminated by modifying the grammar
Lecture 4
 when to discard subprogram?
 equivalent to something already explored
 the program doesn’t fit the spec

idea: once we pick a production, infer specifications for the subprograms
 property of cons (the list construction operator) that allows decomposition? it is injective: every output has exactly one preimage
 property of an operator that makes it good for top down enumeration?
 works when a function is “sufficiently injective”
 output examples have a small preimage
 works when a function is “sufficiently injective”
 Conditional Abduction
 form of topdown propagation
EUSolver discussion:
 constraints:
 behavioral
 linear arithmetic, first-order logical formulas
 f(x, y) >= x && f(x, y) >= y && …
 structural
 conditional expression grammar
 search strategy
 enumerative search strategy
 specification needs to be pointwise
 what is pointwise vs. non-pointwise?
 f(x) > f(x+1)
 how would it break the enumerative solver?
 CEGIS (counterexample-guided inductive synthesis)
 if the spec is not monotonically increasing/decreasing, CEGIS would break
 What pruning/decomposition techniques does EUSolver use to speed up the search?
 condition abduction + special form of equivalence reduction
 why generate additional terms when all inputs are covered
 the predicates might not cover all the terms in a manner which solves the problem
 naive alternative to decision tree learning for synthesizing branch conditions?
 learn atomic predicates that precisely classify points
 why is this worse?
 as bad as ESolver? enumerating over many possible conditions is not much better
 next best is decision tree learning w/o heuristics
 why is this worse? more space used by the decision tree
Lecture 5
 Order of search:
 enumerative search explores programs by depth/size
 good default bias: a small solution is likely to generalize
 result?
 scales poorly with the size of the smallest solution to a given spec
 if the spec is insufficient: plays monkey’s paw (satisfies the letter of the spec, not the intent)
 biasing search:
 idea: explore programs in order of likelihood rather than size
 q1: how do we know which programs are likely?
 learn some kind of statistical/probabilistic model
 q2: how do we use that information to guide the search?
 statistical language model:
 originated in NLP
 in general, a probability distribution over sentences in a language
 \(P(s)\) for \(s \in L\)
 what are the natural programs (sentences) in a DSL?
 kinds of corpora:
 all programs from the DSL: what is natural in the DSL?
 solutions to specific tasks
 spec/program pairs: what are the likely programs?
 example:
 Code completion with statistical language models: SLANG
 predicts completions for a sequence of API calls
 treats programs as a set of abstract histories
 training learns bigrams, n-grams, RNNs on histories
 inference: given a history with holes
 bigrams to get some possibilities
 n-grams/RNN to rank them
 combine history completions into a coherent program
 features: fast
 limitations: all invocation pairs must appear in the training set
 neural corrector for MOOCs
 input: incorrect program + test suite
 output: correct program
 treats programs as sequences of tokens, abstracting variable names
 uses a skip-gram model to predict which statement is most likely to occur between the begin and end statements
 features: repairs syntax errors; limitations: needs all algorithmically distinct solutions to appear in the training set
 DeepCoder:
 a neural net that, given input/output examples, guesses which operations and operators are most likely to appear in the program, biasing the grammar
 features:
 trains on synthetic data
 can be easily combined with any enumerative search
 significant speed-ups for a small list DSL
 weighted top-down search
 idea: explore programs in the order of likelihood, not size
 assign weights to edges such that programs at a smaller weighted distance from the root are more likely
 can search with Dijkstra’s algorithm (or A*)
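A Dijkstra-flavored version of the top-down loop (toy grammar `E ::= x | 1 | E + E` and made-up production probabilities; the priority of a partial program is the sum of `-log p` of the productions used, so likelier programs pop first):

```python
import heapq, itertools, math

# Assumed production probabilities; real systems learn them (PCFG/PHOG).
PRODUCTIONS = [("x", 0.5), (1, 0.3), (("+", "E", "E"), 0.2)]

def fill_leftmost(p, rhs):
    """Replace the leftmost hole "E" in p by rhs; returns (new term, done)."""
    if p == "E":
        return rhs, True
    if isinstance(p, tuple):
        parts = list(p)
        for i in range(1, len(parts)):
            parts[i], done = fill_leftmost(parts[i], rhs)
            if done:
                return tuple(parts), True
    return p, False

def is_ground(p):
    return p != "E" and (not isinstance(p, tuple)
                         or all(is_ground(c) for c in p[1:]))

def evaluate(p, x):
    if p == "x":
        return x
    if isinstance(p, int):
        return p
    return evaluate(p[1], x) + evaluate(p[2], x)

def weighted_top_down(examples, limit=100_000):
    tie = itertools.count()          # tiebreak so the heap never compares terms
    heap = [(0.0, next(tie), "E")]   # cost = -log P(partial program)
    for _ in range(limit):
        if not heap:
            return None
        cost, _, p = heapq.heappop(heap)
        if is_ground(p):
            if all(evaluate(p, i) == o for i, o in examples):
                return p
        else:
            for rhs, prob in PRODUCTIONS:
                q, _ = fill_leftmost(p, rhs)
                heapq.heappush(heap, (cost - math.log(prob), next(tie), q))
    return None

print(weighted_top_down([(0, 1), (4, 5)]))
```

Since edge costs `-log p` are non-negative, popping the minimum-cost frontier is exactly Dijkstra's algorithm; adding an admissible estimate of the remaining cost would turn it into A*.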
Lecture 6
 PCFG doesn’t provide enough information to synthesize
 instead, use a PHOG, which represents more information than a PCFG
 contains a context object
 learn a model that describes the programs
 start with CFG
 synthesize into AST
 learn a “context” and probability for the grammar
 PHOGs supposed to be good for:
 code completion, deobfuscation, PL translation, statistical bug detection
 Euphony constraints:
 behavioral: input/output examples
 structural: PHOG (probabilistic grammar)
 search strategy: top-down enumerative w/ A*
 Euphony q2: what would productions of
Rep(x,"", S)
look like if we use a PCFG / 3-gram? S → "" = 0.2 | S → "." = 0.3 | ...
 equivalence vs weak equivalence
 equivalence: no matter how holes are filled, the programs are equivalent
 weak equivalence: subsequences of sentential forms are equivalent on some points
 paper contributions
 efficient way to guide search by probabilistic grammar
 transfer learning for PHOGs
 extend observational equivalence to top-down search
 weaknesses
 requires high-quality training data
 transfer learning requires manually designed features
Lecture 7
 representation-based + stochastic search
 representation-based search
 idea:
 build a data structure that represents a large part of the search space
 search within the data structure
 useful when
 need to return multiple results/rank results
 can preprocess search space / use for multiple queries
 tradeoff: easy to build vs easy to search
 representations
 version space algebra (VSA)
 Finite Tree Automaton (FTA)
 Type Transition Net (TTN)
 VSA
 build a data structure succinctly representing the set of programs consistent with examples
 operations on version spaces:
learn: <i, o> → VS
intersect: VS1 ∩ VS2 → VS
pick: VS → program
 version space is a DAG
 leaves: set of programs
 union nodes: represent the union of two sets of programs
 join node: corresponds to an operation on pairs of programs
 join/union can be anywhere
 Volume of a VSA → V(VSA) (number of nodes)
 Size of a VSA → |VSA| (number of programs)
 VSA DSL restrictions
 every operator has a small easily computable inverse
 every recursive rule generates a strictly smaller subproblem
 PROSE framework
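The volume/size distinction can be made concrete with a minimal sketch (classes and DSL strings are my own illustration, not PROSE's API): union nodes add the sizes of their children, join nodes multiply them, which is what lets a small graph stand for many programs.

```python
# Minimal VSA sketch: a DAG whose leaves hold explicit program sets.
class Leaf:
    def __init__(self, programs):
        self.programs = set(programs)
    def size(self):                      # number of programs represented
        return len(self.programs)

class Union:
    """Represents the union of the program sets of its children."""
    def __init__(self, *children):
        self.children = children
    def size(self):
        return sum(c.size() for c in self.children)

class Join:
    """Represents op applied to every combination of child programs."""
    def __init__(self, op, *children):
        self.op, self.children = op, children
    def size(self):
        n = 1
        for c in self.children:
            n *= c.size()                # one independent choice per argument
        return n

# 3 choices per argument of "concat" => 3 * 3 = 9 programs,
# plus 2 alternatives in the other branch => 11 programs total,
# stored in a DAG with only a handful of nodes.
vsa = Union(Leaf({"input", "trim(input)"}),
            Join("concat", Leaf({"a", "b", "c"}), Leaf({"x", "y", "z"})))
print(vsa.size())  # 11
```

The multiplication at join nodes is the source of compactness: volume (node count) grows additively while size (program count) grows multiplicatively.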
Lecture 8
 Finite Tree Automaton (FTA)
example:
N ::= id(V) | N + T | N * T
T ::= 2 | 3
V ::= x
spec:
1 → 9
 FTA is \(A = (Q, F, Q_f, \Delta)\)
 states + alphabet, final states, transitions
 FTA can have a lot of states, even with simple grammars
 idea: instead of one state/one value:
 Type transition net (TTN)
 context: component-based synthesis
 given a library of components + query: type + examples
 synthesize composition of components
 idea: build a compact graph of the search space using
 types as nodes
 components as transitions
 generate a Petri net (a special kind of graph) on which reachability corresponds to building a program of the query type
 Petri nets
 bipartite graph
 two node types (places and transitions); edges only connect nodes of different types
 transitions consume tokens
 produce token for return type
 representation-based vs enumerative
 enumerative unfolds the search space in time, while representation-based stores the search space in memory
 FTA → bottom-up
 with observational equivalence
 VSA → top-down
 with top-down propagation
Lecture 9
 topics
 stochastic search
 constraint solvers
 constraint-based search
 search techniques
 CFG: enumerative search, good for small programs
 PCFG/PHOG: weighted enumerative search; the exploration is still quite large
 local search:
 naive local search
 to find the best program
 pick a starting point, mutate and check if new mutation is better than old.
 likely results in getting stuck at a local minimum
 MCMC sampling gives a chance to get out of a local optimum
 stochastic sampling requires a good cost function

\[C_s(p) = \mathrm{eq}_s(p) + \mathrm{perf}(p)\]
 components
 source program
 penalty for wrong results
 penalty for being slow
 local search:
 can explore program spaces with no a priori bias
 limitations?
 only applicable with a good cost function
 counterexample: round to next power of two
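A Metropolis-style local search over a tiny made-up program space (straight-line lists of (op, constant) steps; the cost counts wrong outputs, and worse candidates are accepted with probability `exp(-beta * delta)` to escape local optima; every detail here is illustrative):

```python
import math, random

random.seed(0)          # deterministic for the example
OPS = ["add", "mul"]

def run(prog, x):
    """Execute a straight-line program: a list of (op, constant) steps."""
    for op, k in prog:
        x = x + k if op == "add" else x * k
    return x

def cost(prog, examples):
    """Correctness part of the cost function: number of wrong outputs."""
    return sum(run(prog, i) != o for i, o in examples)

def mutate(prog):
    """Propose a neighbor: replace one random step with a random step."""
    prog = list(prog)
    idx = random.randrange(len(prog))
    prog[idx] = (random.choice(OPS), random.randint(0, 3))
    return prog

def mcmc(examples, steps=50_000, beta=1.0):
    cur = [("add", 0), ("add", 0)]
    cur_cost = cost(cur, examples)
    for _ in range(steps):
        if cur_cost == 0:
            break
        cand = mutate(cur)
        delta = cost(cand, examples) - cur_cost
        # Metropolis criterion: always accept improvements, sometimes accept worse
        if delta <= 0 or random.random() < math.exp(-beta * delta):
            cur, cur_cost = cand, cur_cost + delta
    return cur, cur_cost

# Target behavior: f(x) = 2*x + 3
prog, final_cost = mcmc([(0, 3), (1, 5), (5, 13)])
print(prog, final_cost)
```

The `beta` knob trades exploration against exploitation; a real system like a superoptimizer would also add a performance term to the cost, as in the formula above.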
 Intro to SAT and SMT solvers
 synthesis is combinatorial search, so is SAT
 SAT solvers are quite good
 ???
 profit!!
 Boolean SATisfiability
 logical operators/propositional variables
 (gin ∨ tonic) ∧ (minor ⇒ ¬gin) ∧ minor
 SMT: Satisfiability Modulo Theories
 not all problems are naturally modeled with boolean variables
 SMT adds support for non-boolean theories
 Using SMT solvers
 example: Array partitioning
 partition an array of N elements evenly into P subranges
 what happens when N is not divisible by P?
 sizes of partitions don’t differ by more than 1
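The partition constraint has a closed-form witness, which is handy for checking what a solver should produce (hypothetical helper, not a solver call): the first `N mod P` chunks get one extra element, so sizes differ by at most 1.

```python
def partition_sizes(n, p):
    """Sizes of p chunks of an n-element array, differing by at most 1."""
    base, extra = divmod(n, p)                 # base size and leftover elements
    return [base + 1 if i < extra else base for i in range(p)]

print(partition_sizes(10, 3))  # [4, 3, 3]
```

An SMT encoding would instead assert `sum(sizes) == n` and `max - min <= 1` over integer variables and let the solver find such a model.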
 popular theories
 equality and uninterpreted functions (axioms of equality and congruence)
 linear integer arithmetic, e.g. {0, 1, …, +, −}
 arrays: select/store at a particular index
 theories can be combined
Lecture 10
 Constraint-based search
 idea: encode synthesis problem as a SAT/SMT problem
 program space can be encoded into parameters
 two constraints by default: a semantic correctness constraint + a well-formedness constraint
 \(\varphi(C, i, o)\): behavior/function of the program
 how to define an encoding
 define parameter space => C = {c1, c2, c3, …, cn}
 encode: program –> C
 decode: C –> program
 define a formula wf(c1, …, cn)
 must hold iff decode[C] is a well-formed program
 define a formula \(\varphi\)(c1, …, cn, i, o)
 holds iff (decode[C])(i) = o
 properties of a good encoding
 sound: any solution decodes to a well-formed, semantically correct program
 complete: any program in the space that satisfies the constraints can be encoded by some parameters
 small parameter space to avoid symmetries
 solverfriendly : decidable logic, compact constraint
 DSL limitations
 program space can be parameterized with a finite set of parameters
 program semantics should be expressible as a decidable SAT/SMT formula
Lecture 11
 Rich specifications (why not PBE?)
 reference implementation
 easy to compute result, but hard to do so efficiently or under structural constraints
 assertions
 hard to compute result, but easy to check desired properties
 pre/post-conditions
 logical expressions asserted to hold before and after
 refinement types
 push the logical expressions inside the types
 why else to go beyond examples?
 examples contain too little information
 output can be difficult to construct (outputs hard to compute)

reasoning about non-functional properties (e.g., security protocols)
 why is this hard?
 compute GCD: infinitely many inputs
 also, infinitely many paths (loops)
 constraint-based synthesis from specifications
 how to solve constraints on infinitely many inputs?
 CEGIS/SAT Solver
 CBS from specifications
 relate inputs and outputs by a specification \(\varphi\)
 find a program such that, for all inputs, its behavior satisfies \(\varphi\)
 CEGIS, formally
 given a harness create an encoding:
 there exists [unknown variable set] for all [input/output examples] such that the [program] implies [an assertion/constraint]
 a linear constraint is guaranteed to satisfy the bounded observation hypothesis (a linear constraint only needs two points)
 how to find the “good” inputs for which BOH holds
 rely on oracle to generate counterexamples
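The loop can be sketched end to end with brute force standing in for both solver calls (the spec `f(x) >= 2*x`, the domain, and the parameter ranges are all made up for illustration):

```python
import itertools

# Spec: the synthesized f(x) = c1*x + c2 must satisfy f(x) >= 2*x on DOMAIN.
SPEC = lambda f, x: f(x) >= 2 * x
DOMAIN = range(0, 101)

def synthesize(counterexamples):
    """Inductive step: find parameters consistent with the examples so far
    (a solver call in a real CEGIS loop)."""
    for c1, c2 in itertools.product(range(0, 5), repeat=2):
        f = lambda x, a=c1, b=c2: a * x + b
        if all(SPEC(f, x) for x in counterexamples):
            return c1, c2
    return None

def verify(c1, c2):
    """Oracle: return a counterexample input, or None if the spec holds."""
    for x in DOMAIN:
        if not SPEC(lambda v: c1 * v + c2, x):
            return x
    return None

def cegis():
    cexs = []
    while True:
        params = synthesize(cexs)
        if params is None:
            return None            # unrealizable within the parameter space
        cex = verify(*params)
        if cex is None:
            return params          # verified on the whole domain
        cexs.append(cex)           # refine with the new counterexample
```

Each iteration either terminates or adds a genuinely new counterexample, so the candidate is only ever checked against a small, growing set of "good" inputs rather than the whole domain.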
 different constraints for different problems
 CFGs
 Components
 “just figure out the constants”
 generators in Sketch: a plain hole is always filled in the same way
 a generator can be filled in multiple ways, depending on the context where it is called
Lecture 12
 Encoding arbitrary semantics in programs with holes/constraints
 symbolic execution: no concrete values for each variable
 variables represented as formulas instead of concrete values
 expression reads a state and produces a value
 states are modeled as a map (sigma) from variables to values
 using lambda calculus to model \(A\), \(\sigma\), \(\psi\) (expression semantics, state, constraints of valid executions)
 \(A[[x]]\sigma\) – value of x in the current state
 \(C[[\text{command}]]\langle\sigma, \psi\rangle = \langle \text{State}, \text{Assertions} \rangle\)
Lecture 13
 Type-driven program synthesis
 specification → types (programmer-friendly, informative)
 use types to prune search space
 synthesis accepts a type and a context, and produces a program
 Type systems
 deductive system for proving facts about programs and types
 defined using inference rules over judgements
 “under context \(\Gamma\), term \(e\) has type \(T\)”
 a simple type system:
 terms: \(e ::= 0 \mid e + 1 \mid x \mid e\;e \mid \lambda x.e\)
 types: \(T ::= \mathrm{Int} \mid T \rightarrow T\)
 contexts: \(\Gamma ::= \cdot \mid x : T, \Gamma\)
 syntax-guided enumeration
 only enumerate programs with the proper types
 still need to type check all programs before skipping
 better idea: type-guided enumeration
 enumerate all derivations generated by the type systems
 extract terms from derivations
 three ways to do typing judgements
 type checking (term and type known)
 type inference (term is known, type is unknown)
 synthesis (term unknown, type is known)
 search strategy: recursively transform synthesis goal using typing rules until finding a complete derivation
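The three judgement modes can be grounded with a small checker for the toy language above (the term representation and the lambda annotation are my own choices; the lecture's system need not annotate):

```python
# Terms: ("zero",), ("succ", e), ("var", x), ("lam", x, T, e), ("app", e1, e2)
# Types: "Int" or ("fun", T1, T2)

def check(env, e):
    """Type checking: term and context known, compute/verify the type."""
    kind = e[0]
    if kind == "zero":                    # 0 : Int
        return "Int"
    if kind == "succ":                    # e + 1 : Int, provided e : Int
        if check(env, e[1]) != "Int":
            raise TypeError("succ of non-Int")
        return "Int"
    if kind == "var":                     # x : env(x)
        return env[e[1]]
    if kind == "lam":                     # lambda x:T. e  :  T -> T'
        _, x, t, body = e
        return ("fun", t, check({**env, x: t}, body))
    if kind == "app":                     # e1 e2 : T'  if e1 : T -> T', e2 : T
        tf, ta = check(env, e[1]), check(env, e[2])
        if not (isinstance(tf, tuple) and tf[0] == "fun" and tf[1] == ta):
            raise TypeError("bad application")
        return tf[2]
    raise TypeError("unknown term")

# (lambda x:Int. x + 1) 0  has type Int
inc = ("lam", "x", "Int", ("succ", ("var", "x")))
print(check({}, ("app", inc, ("zero",))))  # Int
```

Type-guided synthesis runs these rules in the opposite direction: the type is known, the term is the unknown, and each rule application spawns subgoals for the premises.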
 generating some programs can result in equivalences
 only generate programs in normal form
 restrict the type system to make redundant programs illtyped
 type-driven synthesis in 3 steps:
 annotate types with extra specifications (examples, logical predicates, resources)
 design a type system for annotated types (propagate as much info as possible from conclusion to premises)
Lecture 14
 polymorphic types:
 a type that can take another type as a parameter
 e.g. generics in Java
 refinement types: augment types with predicates
 more complex types:
 dependent function types: name the arguments of a function, then use them later in the function/predicate
 example, max function:
max :: x: Int -> y: Int -> {v: Int | x <= v && y <= v}
 subtyping: T’ is a subtype of T if all values of type T’ also belong to T
 written
T' <: T
 Synquid limitations
 user interaction:
 types can be large and hard to write
 components need to be annotated
 expressiveness
 some specs are tricky/impossible to express
 cannot synthesize recursive auxiliary functions
 condition abduction is limited to liquid predicates
 cannot generate arbitrary constraints
 Synquid questions
 behavioral: refinement types?
Lecture 15
 constraint-based synthesis
 loops are hard!
 loops create many paths; integers give many inputs
 Sketch handles loops by unrolling to a particular depth
 if we can’t unroll enough, we get an unsatisfiable problem
 Hoare logic: a logic for simple imperative programs
 particularly loop invariants
The imp language:
e ::= n | x
    | e + e | e - e | e * e
    | e = e | e < e | e > e | !e | e && e
+ assignment/if/then/while logic
Hoare triples
 judgments written as {P} c {Q}
 precondition P
 if execution of c
 postcondition Q

if P holds in an initial state \(\sigma\), and the execution of c from \(\sigma\) terminates in a state \(\sigma'\), then Q holds in \(\sigma'\)
 “partial correctness” → the program may not terminate
 “total correctness” → the program terminates
 need to be able to express values in final states are equivalent to values in beginning states
 solution: introduce logical variables which don’t appear in the actual program
 this can make expressions for pre/post conditions more expressive
 “skip” → {P} skip {P} → whatever holds before also holds after
 assignment semantics in Hoare logic
{P[x -> e]} x := e {P}
 \(P[x \mapsto e]\) holds in \(\sigma\) exactly when \(P\) holds in the state after the assignment
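For instance (my own example, not from the lecture), instantiating the assignment rule with \(P = (x > 0)\) and \(e = x + 1\):

```latex
% Backward application of the assignment rule {P[x -> e]} x := e {P}
% with P = (x > 0) and e = x + 1:
\{\, x + 1 > 0 \,\} \quad x := x + 1 \quad \{\, x > 0 \,\}
```

Substituting \(e\) for \(x\) in the postcondition yields exactly the precondition that must hold before the assignment.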
 semantics of compositions
 need to invent an intermediate assertion
 rule of consequence → if you “need” less to prove at the beginning but can “prove” more at the end, the triple is at least as strong as the original
Lecture 16
 synthesizing C code: verbose, unstructured, pointers (yuck)

Hoare logic is not enough to prove heap-manipulating programs when two variables might point to the same location in memory (aliasing)
 SuSLik approach:
 separation logic → deductive synthesis → provable program
 SuSLik swap example:
Precondition: {x ↦ A * y ↦ B}
Postcondition: {x ↦ B * y ↦ A}
 the special part of separation logic: the separating conjunction
*
 items joined with this operator must be in disjoint heap locations
 judgement of a program:
\(\{P\} \leadsto \{Q\} \mid c\)
a state satisfying P can be transformed into a state satisfying Q using a program c

 use rules to deduce/simplify pre/post-conditions until “skip” (do nothing)
 proof search
Lecture 17
 Nope paper contributions:
 first to prove unrealizability for infinite program spaces
 CEGIS for unrealizability
 sound for synthesis and unrealizability
 NOPE limitations:
 can timeout/run forever
 situations where it can fail to terminate:
 a program that works on finitely many examples (but not on all inputs)
 a very large/difficult problem
 in practice: works on fewer than half of the benchmarks
 limited to SyGuS
Lecture 18
 Programming with humans
 SnipPy: projection boxes show the program state while writing the program
 live programming is a good environment for synthesis
 inputs already exist (for simple programs)
 encourages “small-step” PBE (easier for the synthesizer)
 SnipPy was more limited, but faster, had lower cognitive burden, and synthesized more compact solutions
 Hoogle+
 synthesize from types, but difficult because they are ambiguous
 if output all programs with correct type, too many irrelevant programs
 which solution is the right one?
 use a testbased filtering
 help beginners with types
 RESL
 REPL but with synthesis
 syntactic specs + sketching + debugger
 give granular feedback about grammar to synthesizer
Lecture 19
 synthesis with users cont’d

today: Rousillon, Wrex, Regae
 Rousillon / Helena
 web scraping tool for social scientists
 problem with traditional scrapers: need to reverse-engineer the webpage DOM
 types of web data:
 distributed: user must navigate between many pages, click, use forms, etc.
 hierarchical: treestructured data
 Wrex:
 goals: for data scientists, not a black box:
 users create a data frame and sample it
 Wrex has an interactive grid where users derive a new column and give transformation examples
 give code window with synthesized code
 apply the synthesized code to the full data frame for plotting
