# TACAS'21 Artifact: Syntax-Guided Quantifier Instantiation

Aina Niemetz (1), Mathias Preiner (1), Andrew Reynolds (2), Clark Barrett (1), and Cesare Tinelli (2)

(1) Stanford University, (2) The University of Iowa

This artifact contains the log files and data to replicate the experiments in the paper "Syntax-Guided Quantifier Instantiation" accepted to TACAS'21.

## Artifact Directory Structure

### Files

* `cmpr.py`: Python script to extract, analyze, compare, and print runtime data from solver runs.
* `generate-tables.sh`: Script to generate Tables 2, 3, and 4 from the paper.
* `install-packages.sh`: Script to install the required Python packages in directory `packages`.
* `paper.pdf`: Revised version of the paper.
* `run-solver.sh`: Script to rerun experiments for a specific solver configuration.

### Directories

* `benchmarks`: Contains all SMT-LIB benchmarks used in our experiments (non-incremental logics `BV`, `BVFP`, `BVFPLRA`, `FP`, `FPLRA`, `LIA`, `LRA`, `NIA`, `NRA`, and `UFBV`).
* `binaries`: Contains the binaries for CVC4 ([branch](https://github.com/CVC4/CVC4/tree/sygus-inst)) and Z3 (version 4.8.9) that were used for the experiments in the paper.
* `data-table2`: Runtime data for the comparison of term selection strategies (Table 2).
* `data-table3`: Runtime data for the comparison of lemma selection strategies (Table 3).
* `data-table4`: Runtime data for the comparison against other techniques (Table 4).
* `packages`: Contains Python packages downloaded via `pip3 download`, which are required for generating Tables 2-4 from the paper.

#### Structure of `data-tableN` directories

Each runtime data directory `data-tableN` contains subdirectories of runtime data grouped by solver/configuration. In each configuration directory the runtime data is grouped by SMT-LIB logic, e.g., `non_incremental_BV/`. Each configuration directory further contains a file called `options`, which records the command used to run the specific configuration/solver. For example, the options used for configuration `both-both` in Table 2 can be found in `data-table2/both-both/options` and include the following two lines.

```
command: cvc4 --no-e-matching --sygus-inst --no-cegqi --fp-exp --sygus-inst-scope=both --sygus-inst-term-sel=both --sygus-inst-mode=interleave --pre-skolem-quant
runexec: --timelimit 300 --memlimit 8000MB
```

The first line is the command used to execute a configuration on a given benchmark, whereas the second line specifies options passed to the tool `runexec`, which is part of the benchmarking and resource management framework [`benchexec`](https://github.com/sosy-lab/benchexec) and which we used to limit runtime and memory.

The runtime data directories for each SMT-LIB logic are structured based on the benchmark structure used in SMT-LIB. Our cluster setup produces two log files per benchmark, which are stored as follows:

```
<configuration>/<logic>/.../<benchmark>/{output.log,run.out}
```

* `output.log`: Contains the configuration/solver output printed to `stdout` and `stderr`.
* `run.out`: Contains the actual runtime data measured by `runexec`.

**Note**: The experiments were performed on a cluster with Intel Xeon E5-2620 CPUs (2.1GHz) and 128GB memory.
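The per-benchmark log files can also be skimmed directly with standard shell tools. The commands below are a minimal sketch for illustration only; they are not part of the artifact scripts and merely assume the directory layout described above.

```
# Illustration only (not part of the artifact scripts): count how many BV
# benchmarks the both-both configuration reported as unsat, based on the
# solver output stored in the output.log files.
grep -rlx --include=output.log unsat data-table2/both-both/non_incremental_BV | wc -l

# Print the CPU time measured by runexec for each BV benchmark of the same
# configuration (one line per run.out file).
grep -r --include=run.out '^cputime=' data-table2/both-both/non_incremental_BV
```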
## Steps to generate Tables 2-4

1. **Install Python Packages**

   Generating the tables in the paper from the provided log files requires the Python package `pandas`. The following script installs the required Python dependencies from directory `packages`.

   **Note**: Errors and warnings printed during package installation can be ignored.

   ```
   ./install-packages.sh
   ```

2. **Generating Tables 2, 3, and 4**

   ```
   ./generate-tables.sh
   ```

   This script will generate `table2.tex`, `table3.tex`, and `table4.tex`, which correspond to Tables 2, 3, and 4 in the paper. Note that the layout of Table 4 in the paper was manually changed after generating the initial version; the numbers, however, match those in `table4.tex`.

**Note (Changes after artifact review)**:

1. The artifact now contains the data for two more solvers (Boolector, Vampire) in Table 4. The comparison against these two solvers was added for the camera-ready version of the paper and therefore was not part of the initial version of the artifact that was reviewed by the TACAS 2021 artifact evaluation committee.
2. The artifact further includes updated numbers on 316 benchmarks for the solver Z3. In the initial experiments, Z3 had issues with filenames containing the character `=`. As a consequence, we had to rerun Z3 on these benchmarks and updated the numbers in Table 4 accordingly. This change also happened after the artifact was reviewed by the TACAS 2021 artifact evaluation committee.

## Replicating Data not in Tables

* In the paper we reported that the CVC4 configurations and Z3 disagree on 10 benchmarks of logic `FP`, which is due to a known problem in Z3 related to the operator `fp.rem`. To list the corresponding benchmarks use:

  ```
  ./cmpr.py data-table4/* --show-dis
  ```

* Uniquely solved instances for all four CVC4 configurations:

  ```
  ./cmpr.py -t data-table4/{cegqi,ematch,enum,syqi} -c uniq
  ```

* Uniquely solved instances between the `cegqi` and `syqi` CVC4 configurations:

  ```
  ./cmpr.py -t data-table4/{cegqi,syqi} -c uniq
  ```

## Rerunning Experiments

Since rerunning the experiments on all benchmarks is not feasible in the context of this artifact (the full set of experiments required 3710 hours of CPU time on our cluster), we list below the steps to rerun selected benchmarks. The options used for CVC4 (`binaries/cvc4`) and Z3 (`binaries/z3`) can be found in each configuration's `options` file. For example, for configuration `both-both` in Table 2, the options can be found in `data-table2/both-both/options`. CVC4 can then be called on a selected benchmark from the `benchmarks` directory as follows, followed by the path to the benchmark file:

```
time binaries/cvc4 --no-e-matching --sygus-inst --no-cegqi --fp-exp --sygus-inst-scope=both --sygus-inst-term-sel=both --sygus-inst-mode=interleave --pre-skolem-quant
```
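To also enforce the time and memory limits we used on the cluster (the second line of the `options` file), the command can be wrapped in benchexec's `runexec` tool. The following is a sketch only; it assumes `benchexec` is installed on your machine, since it is not bundled with the artifact.

```
# Sketch (assumes benchexec's runexec is installed separately): run the
# both-both configuration on one benchmark with the limits from the options
# file (300s CPU time, 8000MB memory).
runexec --timelimit 300 --memlimit 8000MB -- \
  binaries/cvc4 --no-e-matching --sygus-inst --no-cegqi --fp-exp \
  --sygus-inst-scope=both --sygus-inst-term-sel=both \
  --sygus-inst-mode=interleave --pre-skolem-quant \
  benchmarks/BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2
```

`runexec` should print its measurements (walltime, cputime, memory) to `stdout` in the same `key=value` form as the provided `run.out` files.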
### Example

Running `both-both` on benchmark `BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2` produces the following output:

```
$ time binaries/cvc4 --no-e-matching --sygus-inst --no-cegqi --fp-exp --sygus-inst-scope=both --sygus-inst-term-sel=both --sygus-inst-mode=interleave --pre-skolem-quant benchmarks/BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2
unsat

real    0m1.276s
user    0m1.268s
sys     0m0.007s
```

**Note**: This was run on a workstation with a Ryzen 1800X CPU (3.6GHz).

The runtime data for this benchmark can either be checked via the `cmpr.py` script:

```
$ ./cmpr.py data-table2/both-both/ -f /BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2
                        status result exit_code time_cpu memory
benchmark config
usb-phy-fixpoint-1.smt2     ok  unsat         0      2.4   58.2
```

or by inspecting the provided log files:

```
$ cat data-table2/both-both/non_incremental_BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2/run.out
c host: barrett33.stanford.edu
c start: Tue Oct 13 04:38:32 PDT 2020
c arrayjobid: 163160053
c jobid: 163344143
c command: /sygus-both-both-interleave/cvc4 --no-e-matching --sygus-inst --no-cegqi --fp-exp --sygus-inst-scope=both --sygus-inst-term-sel=both --sygus-inst-mode=interleave --pre-skolem-quant /SMT-LIB/non-incremental/BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2
c args: /SMT-LIB/non-incremental/BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2
2020-10-13 04:38:32,265 - INFO - LXCFS is not available, some host information like the uptime leaks into the container.
2020-10-13 04:38:32,268 - INFO - Starting command /sygus-both-both-interleave/cvc4 --no-e-matching --sygus-inst --no-cegqi --fp-exp --sygus-inst-scope=both --sygus-inst-term-sel=both --sygus-inst-mode=interleave --pre-skolem-quant /SMT-LIB/non-incremental/BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2
2020-10-13 04:38:32,268 - INFO - Writing output to /sygus-both-both-interleave/non_incremental_BV/wintersteiger/fmsd13/fixpoint/usb-phy-fixpoint-1.smt2/output.log
starttime=2020-10-13T04:38:32.462632-07:00
returnvalue=0
walltime=2.468061118386686s
cputime=2.44409025s
cputime-cpu18=2.443987523s
cputime-cpu7=0.000102727s
memory=58245120B
blkio-read=0B
blkio-write=0B
c done
```

## Rerunning Experiments in a more Automated Way

For convenience, we also provide the script `run-solver.sh`, which allows rerunning subsets of the experiments. The script is invoked as follows:

```
./run-solver.sh data-table4/syqi/options benchmarks/NIA/
```

where the first argument specifies the solver configuration to run and the second argument points to the benchmarks to run. The above command runs the `syqi` configuration from Table 4 on the `NIA` benchmark set.

### Example Output

```
Benchmark: NIA/tptp/ARI123=1.smt2
CPU time from log: 0.015325465
Status from log: unsat
CPU time from run: 0.007
Status from run: unsat

Benchmark: NIA/tptp/NUM882=1.smt2
CPU time from log: 0.005401276
Status from log: sat
CPU time from run: 0.004
Status from run: sat

Benchmark: NIA/tptp/NUM878=1.smt2
CPU time from log: 0.007369659
Status from log: unsat
CPU time from run: 0.000
Status from run: unsat

Benchmark: NIA/tptp/NUM879=1.smt2
CPU time from log: 0.005251154
Status from log: sat
CPU time from run: 0.000
Status from run: sat

...
```

## Publication of Artifact

The artifact will be published at [https://cvc4.github.io/papers/tacas2021-syqi](https://cvc4.github.io/papers/tacas2021-syqi) and will be permanently archived on [https://zenodo.org](https://zenodo.org).