CAV 2014 has asked authors to provide repositories of data, log files, sources, and binaries used for experimentation. To this end, we are including in this repository the sources/binaries used for the associated tools: Kaluza, Z3-str (including Z3), and CVC4. These are described first, followed by benchmarks, data, log files, and experimental scripts.

============= TOOLS =============

Kaluza is covered by a non-commercial, no-derivatives Creative Commons license. We made minor changes to Kaluza for our work, and only to the "RunKal.sh" script; according to the Creative Commons FAQ on the subject, this does not appear to constitute a derivative work, since the change is neither substantial nor creative. However, it must be stated that the work is a minor modification. See KaluzaBin/LICENSE.txt for details of the license. Kaluza requires HAMPI (covered by the MIT license; see KaluzaBin/hampi/LICENSE.txt), and the exact version used is included here. Kaluza also requires Yices 1.0.27, which *CANNOT BE INCLUDED IN THIS ARCHIVE* due to licensing restrictions. The exact version used can be downloaded from SRI's website (see KaluzaBin/yices-1.0.27-README.txt). HAMPI must be altered to work because it incorporates hardcoded paths; see especially KaluzaBin/hampi/hampi.sh.

Z3-str is covered by the MIT license, so we can distribute it. It depends on Z3 4.1.1, which is covered by a license that allows us to redistribute it. Z3-str's Python script is slightly modified to run with an older version of Python and to include a (necessary) hardcoded path; it is otherwise unmodified. Z3 is unmodified. See Z3-str/LICENSE.txt and Z3-str/z3/LICENSE.txt for details of these licenses. Z3-str must be altered to work because it incorporates hardcoded paths; see especially Z3-str/Z3-str.py.

We used a development version of CVC4 that corresponds to git commit ef5d5880 in our GitHub.com repository:

  https://github.com/CVC4/CVC4/tree/ef5d5880ad48d3659db33477c08a45eba44aab0d

CVC4 is covered by a BSD-style license, and we hold the copyright, so of course we can redistribute it. See cvc4-ef5d5880/src/COPYING for details of the CVC4 license.

Additionally, an old version of "TreeLimitedRun", developed for CASC and also used for SMT-COMP for a number of years, was used to ensure that memory and time limits were respected. We believe Geoff Sutcliffe is the author of this tool. The source is a single file, TreeLimitedRun.c. See http://www.cs.miami.edu/~tptp/CASC for details of CASC.

Several scripts were developed for these experiments. They are described below, under EXPERIMENTAL SETUP.

============= BENCHMARKS =============

Benchmarks are in the "benchmarks/Kaluza" directory. The original benchmarks from the Kaluza project are found in the "original" subdirectory. We translated these into SMT-LIB format with CVC4-style string constraints; these translations are found in the "SMTLIB" subdirectory. Finally, these SMT-LIB versions were turned into Z3-str's format (the same as SMT-LIB, except that the string constraints are slightly different; see below); these are in the "Z3-STR" subdirectory. Each subdirectory contains 47284 benchmarks.

The "KaluzaTranslator" translates Kaluza-format benchmarks into SMT-LIB (CVC4-style) benchmarks. Source is available in benchmarks/KaluzaTranslator.

In Kaluza, the CapturedBrack function is not well-documented. It takes two arguments, a regular expression and an integer. If the second argument is 0, the term simply denotes the regular expression itself; we have confirmed this. If the second argument is not 0, we do not know its meaning, so we simply removed those benchmarks.
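For concreteness, the following is a small, purely illustrative constraint in the CVC4-style SMT-LIB format found in the "SMTLIB" subdirectory. It is not one of the actual benchmarks (the variable names and the exact header lines, such as the set-logic command, are ours), but it uses the operators discussed below:

    (set-logic QF_S)
    (declare-fun x () String)
    (declare-fun y () String)
    ; concatenation uses str.++; regular expression membership uses str.in.re
    (assert (= (str.++ x "b" y) "abc"))
    (assert (str.in.re x (str.to.re "a")))
    (check-sat)

A satisfying model here is x = "a", y = "c". In the Z3-str versions of the benchmarks, str.++ appears as Concat and trivial regular expression memberships become string equalities, as produced by the z3str-translator described next.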
The "z3str-translator" binary is a version of CVC4 built simply to parse SMT-LIB and then print expressions with string constraints in Z3-str's format. Source is in z3str-translator-src. It is in our GitHub at https://github.com/mdeters/CVC4/tree/z3str-translator (note that this is not an archival link; it is likely to change, or to be removed after similar functionality is merged into the mainline). The z3str-translator translates CVC4-style string operations like "str.++" into Z3-str's (like "Concat"). In the case of str.++/Concat, it also has to fix N-ary applications of str.++, since Concat only handles two arguments. Further, the regular expressions in our benchmark set were largely trivial (e.g., (str.in.re s (str.to.re "x")) is the same as (= s "x")), so these were converted to equalities. One benchmark, sat/small/search.readable.smt2.out.check, is not converted by z3str-translator, because its use of regular expressions is a bit (though not much) more complicated. This one benchmark was converted by hand into Z3-str's format.

============= RESULTS =============

The main collected results are in cav-paper-results.csv. The columns are:

- the benchmark name
- the file size in bytes of the SMT-LIB (CVC4) version
- CVC4's answer (sat/unsat/timeout)
- CVC4's user time
- CVC4's wall clock time
- CVC4's model status (blank when the answer was unsat/timeout, else "confirmed" or "confirmed-cvc4-only"; see below)
- Kaluza (corrected script) answer (sat/unsat/timeout/unknown, or "crash" for a script error)
- Kaluza (corrected script) user time
- Kaluza (corrected script) wall clock time
- Kaluza (original script) answer (sat/unsat/timeout/unknown, or "crash" for a script error)
- Kaluza (original script) user time
- Kaluza (original script) wall clock time
- Z3-str's answer (sat/unsat/timeout, or "error" for parse errors (escapes embedded in strings))
- Z3-str's user time
- Z3-str's wall clock time
- Z3-str's model status (blank when the answer was unsat/timeout/error, else "confirmed" or "bogus"; see below)

The "model status" columns are calculated as follows. For each benchmark that CVC4 (respectively, Z3-str) answered "sat", a model was extracted and turned into assertions. These assertions were added to the original benchmark. The result was then fed to both CVC4 and Z3-str for confirmation of the model. If the result is "sat" for both, the model status is "confirmed". If the result is "unsat", the model status is "bogus", i.e., the model was rejected. Because Z3-str cannot handle some cases of quotes embedded in strings (for example, the string literal "\\\"" gives it trouble), it cannot confirm CVC4's models for certain benchmarks. In that case, the model status is "confirmed-cvc4-only", as there cannot be confirmation from Z3-str.

The results show that all of CVC4's models are confirmed, and some of Z3-str's are confirmed (and some are bogus). Again, all "confirmed" and "bogus" statuses are agreed upon by *both* tools. There was no disagreement on the correctness of models in our tests.
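To make the confirmation step concrete, here is a sketch (not an actual generated "check"/"checkcvc" file, whose exact form may differ) of how a model for the illustrative benchmark above would be re-asserted and re-checked:

    (declare-fun x () String)
    (declare-fun y () String)
    (assert (= (str.++ x "b" y) "abc"))
    (assert (str.in.re x (str.to.re "a")))
    ; model values from the "sat" run, re-asserted for confirmation:
    (assert (= x "a"))
    (assert (= y "c"))
    (check-sat)   ; "sat" confirms the model; "unsat" would mean it is bogus

The actual "check" (Z3-str input) and "checkcvc" (CVC4 input) files are produced by the model-checker scripts described under EXPERIMENTAL SETUP below.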
============= RAW LOGS =============

The logs are collected in cvc4-ef5d5880.output, KaluzaBin-original.output, KaluzaBin-corrected-script.output, and Z3-str.output.

========================= EXPERIMENTAL SETUP =========================

The "run" scripts were used to run benchmarks and create logfiles. E.g., "run-cvc4-experiments" runs cvc4 on all the benchmarks and produces output files in the cvc4-ef5d5880.output directory. It logs standard output, standard error, and the exit code. Kaluza's run script is similar, but it also cleans up some things between Kaluza runs (Kaluza depends on files in /tmp, for example) and logs some additional things from runs (the "stdout" and "stderr" files collected by Kaluza's script, which log the internal tools' stdout and stderr and can indicate intermediate-stage failures). Z3-str's run script is straightforward.

To collect results from the raw logs, we have a "collect-output" script. This script takes a single argument, the directory containing the output files, and produces a list of results.

To check models, the "model-checker" scripts go through the output files, match them to the benchmarks, and produce new "check" (input for Z3-str) and "checkcvc" (input for CVC4) benchmarks in the output directory. These new inputs are used for model confirmation. A list of results is produced.

Regards,

The authors of the CAV 2014 submission entitled "A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions":

- Tianyi Liang
- Andrew Reynolds
- Cesare Tinelli
- Clark Barrett
- Morgan Deters

This explanation, and the repository, is complete and truthful to the best of our knowledge, and adequately and responsibly documents our experimental process for the results included in the paper.

Sat, 08 Feb 2014 00:42:24 -0500