CAV 2014 has asked authors to provide repositories of data, log files, sources, and binaries used for experimentation. To this end, we are including in this repository the sources/binaries used for the associated tools: Kaluza, Z3-str (including Z3), and CVC4. These are described first, followed by benchmarks, data, log files, and experimental scripts.

============= TOOLS =============

Kaluza is covered by a non-commercial, no-derivatives Creative Commons license. We made minor changes to Kaluza for our work, and only to the "RunKal.sh" script; according to the Creative Commons FAQ on the subject, this does not appear to constitute a derivative work, since the change is neither substantial nor creative. However, it must be stated that the work is a minor modification. See KaluzaBin/LICENSE.txt for details of the license. Kaluza requires HAMPI (covered by the MIT license; see KaluzaBin/hampi/LICENSE.txt), and the exact version used is included here. Kaluza also requires Yices 1.0.27, which *CANNOT BE INCLUDED IN THIS ARCHIVE* due to licensing restrictions. The exact version used can be downloaded from SRI's website (see KaluzaBin/yices-1.0.27-README.txt). HAMPI must be altered to work because it incorporates hardcoded paths; see especially KaluzaBin/hampi/hampi.sh.

Z3-str is covered by the MIT license, so we can distribute it. It depends on Z3 4.1.1, which is covered by a license that allows us to redistribute it. Z3-str's Python script is slightly modified to run with an older version of Python and to include a (necessary) hardcoded path; it is otherwise unmodified. Z3 is unmodified. See Z3-str/LICENSE.txt and Z3-str/z3/LICENSE.txt for details of these licenses. Z3-str must be altered to work because it incorporates hardcoded paths; see especially Z3-str/Z3-str.py.

We used a development version of CVC4 that corresponds to git commit ef5d5880 in our GitHub.com repository:

  https://github.com/CVC4/CVC4/tree/ef5d5880ad48d3659db33477c08a45eba44aab0d

CVC4 is covered by a BSD-style license, and we hold the copyright, so of course we can redistribute it. See cvc4-ef5d5880/src/COPYING for details of the CVC4 license.

Additionally, an old version of "TreeLimitedRun", developed for CASC and also used for SMT-COMP for a number of years, was used to ensure that memory and time limits were respected. We believe Geoff Sutcliffe is the author of this tool. The source is a single file, TreeLimitedRun.c. See http://www.cs.miami.edu/~tptp/CASC for details of CASC.

Several scripts were developed for these experiments. They are described below, under EXPERIMENTAL SETUP.

============= BENCHMARKS =============

Benchmarks are in the "benchmarks/Kaluza" directory. The original benchmarks from the Kaluza project are found in the "original" subdirectory. We translated these into SMT-LIB format with CVC4-style string constraints; these translations are found in the "SMTLIB" subdirectory. Finally, these SMT-LIB versions were turned into Z3-str's format (the same as SMT-LIB, except that the string constraints are slightly different; see below); these are in the "Z3-STR" subdirectory. Each subdirectory contains 47284 benchmarks.

The "KaluzaTranslator" translates Kaluza-format benchmarks into SMT-LIB (CVC4-style) benchmarks. Source is available in benchmarks/KaluzaTranslator.

In Kaluza, the CapturedBrack function is not well-documented. It takes two arguments, a regular expression and an integer. If the second argument is 0, the term simply denotes the regular expression itself; we have confirmed this. If the second argument is not 0, we do not know its meaning, so we simply removed those benchmarks.
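For concreteness, the following is a small, purely illustrative constraint in the CVC4-style SMT-LIB format found in the "SMTLIB" subdirectory. It is not one of the actual benchmarks (the variable names and the exact header lines, such as the set-logic command, are ours), but it uses the operators discussed below:

    (set-logic QF_S)
    (declare-fun x () String)
    (declare-fun y () String)
    ; concatenation uses str.++; regular expression membership uses str.in.re
    (assert (= (str.++ x "b" y) "abc"))
    (assert (str.in.re x (str.to.re "a")))
    (check-sat)

A satisfying model here is x = "a", y = "c". In the Z3-str versions of the benchmarks, str.++ appears as Concat and trivial regular expression memberships become string equalities, as produced by the z3str-translator described next.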
The "z3str-translator" binary is a version of CVC4 built simply to parse SMT-LIB and then print expressions with string constraints in Z3-str's format. Source is in z3str-translator-src. It is in our GitHub at https://github.com/mdeters/CVC4/tree/z3str-translator (note that this is not an archival link; it is likely to change, or to be removed after similar functionality is merged into the mainline). The z3str-translator translates CVC4-style string operations like "str.++" into Z3-str's (like "Concat"). In the case of str.++/Concat, it also has to fix N-ary applications of str.++, since Concat only handles two arguments. Further, the regular expressions in our benchmark set were largely trivial (e.g., (str.in.re s (str.to.re "x")) is the same as (= s "x")), so these were converted to equalities. One benchmark, sat/small/search.readable.smt2.out.check, is not converted by z3str-translator, because its use of regular expressions is a bit (though not much) more complicated. This one benchmark was converted by hand into Z3-str's format.

============= RESULTS =============

The main collected results are in cav-paper-results.csv. The columns are:

- the benchmark name
- the file size in bytes of the SMT-LIB (CVC4) version
- CVC4's answer (sat/unsat/timeout)
- CVC4's user time
- CVC4's wall clock time
- CVC4's model status (blank when the answer was unsat/timeout, else "confirmed" or "confirmed-cvc4-only"; see below)
- Kaluza (corrected script) answer (sat/unsat/timeout/unknown, or "crash" for a script error)
- Kaluza (corrected script) user time
- Kaluza (corrected script) wall clock time
- Kaluza (original script) answer (sat/unsat/timeout/unknown, or "crash" for a script error)
- Kaluza (original script) user time
- Kaluza (original script) wall clock time
- Z3-str's answer (sat/unsat/timeout, or "error" for parse errors (escapes embedded in strings))
- Z3-str's user time
- Z3-str's wall clock time
- Z3-str's model status (blank when the answer was unsat/timeout/error, else "confirmed" or "bogus"; see below)

The "model status" columns are calculated as follows. For each benchmark that CVC4 (respectively, Z3-str) answered "sat", a model was extracted and turned into assertions. These assertions were added to the original benchmark. The result was then fed to both CVC4 and Z3-str for confirmation of the model. If the result is "sat" for both, the model status is "confirmed". If the result is "unsat", the model status is "bogus", i.e., the model was rejected. Because Z3-str cannot handle some cases of quotes embedded in strings (for example, the string literal "\\\"" gives it trouble), it cannot confirm CVC4's models for certain benchmarks. In that case, the model status is "confirmed-cvc4-only", as there cannot be confirmation from Z3-str.

The results show that all of CVC4's models are confirmed, and some of Z3-str's are confirmed (and some are bogus). Again, all "confirmed" and "bogus" statuses are agreed upon by *both* tools. There was no disagreement on the correctness of models in our tests.
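To make the confirmation step concrete, here is a sketch (not an actual generated "check"/"checkcvc" file, whose exact form may differ) of how a model for the illustrative benchmark above would be re-asserted and re-checked:

    (declare-fun x () String)
    (declare-fun y () String)
    (assert (= (str.++ x "b" y) "abc"))
    (assert (str.in.re x (str.to.re "a")))
    ; model values from the "sat" run, re-asserted for confirmation:
    (assert (= x "a"))
    (assert (= y "c"))
    (check-sat)   ; "sat" confirms the model; "unsat" would mean it is bogus

The actual "check" (Z3-str input) and "checkcvc" (CVC4 input) files are produced by the model-checker scripts described under EXPERIMENTAL SETUP below.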
============= RAW LOGS =============

The logs are collected in cvc4-ef5d5880.output, KaluzaBin-original.output, KaluzaBin-corrected-script.output, and Z3-str.output.

========================= EXPERIMENTAL SETUP =========================

The "run" scripts were used to run benchmarks and create logfiles. E.g., "run-cvc4-experiments" runs cvc4 on all the benchmarks and produces output files in the cvc4-ef5d5880.output directory. It logs standard output, standard error, and the exit code. Kaluza's run script is similar, but it also cleans up some things between Kaluza runs (Kaluza depends on files in /tmp, for example) and logs some additional things from runs (the "stdout" and "stderr" files collected by Kaluza's script, which log the internal tools' stdout and stderr and can indicate intermediate-stage failures). Z3-str's run script is straightforward.

To collect results from the raw logs, we have a "collect-output" script. This script takes a single argument, the directory containing the output files, and produces a list of results.

To check models, the "model-checker" scripts go through the output files, match them to the benchmarks, and produce new "check" (input for Z3-str) and "checkcvc" (input for CVC4) benchmarks in the output directory. These new inputs are used for model confirmation. A list of results is produced.

Regards,

The authors of the CAV 2014 submission entitled "A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions":

- Tianyi Liang
- Andrew Reynolds
- Cesare Tinelli
- Clark Barrett
- Morgan Deters

This explanation, and the repository, is complete and truthful to the best of our knowledge, and adequately and responsibly documents our experimental process for the results included in the paper.

Sat, 08 Feb 2014 00:42:24 -0500