Introduction

Introduction

DSL

Approach and InputDataGraphs

Questions

Summary

The authors of BlinkFill present an improvement to the FlashFill predecessor introduced in Microsoft Excel. The authors devise an interesting approach to compressing the total program space utilized during the learning phase. As the amount of input data grows, they are able to contain the graph size to a much smaller size. This helps them improve the speed of learning the synthesis program and enables an efficient graph search with dijkstra’s to pick the most likely program candidate once the initial set of programs is generated.

Strengths

Weaknesses

The main weaknesses of BlinkFill is that is it constrained to string manipulation programs, and even then, maintains a limited set of possible regex operations. It does not support conditionals or loops like its predecessor FlashFill which potentially limits its use cases depending on how complicated the data transformations may be

What does BlinkFill use as behavioral constraints? Structural constraints? Search strategy?

What is the main technical insight of BlinkFill wrt FlashFill?

The main technical insight with BlinkFill is the ability to represent the input space in a compressed format as the InputDataGaph and once the possible programs are generated from the InputDataGraph a most likely candidate is chosen by ranking the possible programs which reduces the number of examples required to synthesize a correct program.

-

Write a program in the BlinkFill DSL (L_s) that extracts conference acronyms and years; the program should satisfy the following examples:

> "Programming Language Design and Implementation (PLDI), 2019, Phoenix AZ" -> "PLDI 2019"

> "Principles of Programming Languages (POPL), 2020, New Orleans LA" -> "POPL 2020",
Concat(SubStr(Position("(", 1, Start), Position(")", 1, Start)), SubStr(Position(", ", 1, Start), Position(",", 2, Start)))

As described in the paper, the position expressions of the BlinkFill DSL only support matching a single token from Table 1, e.g. C. Could we extend the algorithm to support sequences of tokens, e.g. C ws $ to match caps followed by whitespace follows by end of string? How would this change the construction and intersection of input data graphs (use Fig 6 as an example).

I think it is possible, but would be difficult and likely affect the performance benefits of BlinkFill. If the DSL supported compound regex formulas I presume that the IDG would probably be much larger in size as you would likely need to express exponentially more edges since the combination of operators increases the search space of the synthesis program

Introduction - zac blanco