Strings
This page describes support for the theory of strings in CVC4.
Syntax
This document focuses on input written in SMTLIB 2 format. A frontend for CVC4's native syntax is not available yet.
We highly recommend that users use SMTLIB Version 2.5, instead of Version 2.0. The major difference is in the definition of escape sequences for string literals.
The syntax below is for CVC4 version > 1.4. Version 1.3 has only partial support for syntax in this document.
Since the string (sub)solver is still relatively new, the current stable version of CVC4 (1.4) does not provide the latest version of that solver. Please use our latest Development version instead.
Currently, the string solver supports string constants over a set of characters limited to the printable ASCII characters. Other characters must be encoded with escape sequences. For arbitry alphabets, we plan to provide later a separate solver for a theory of parametric sequences.
To use the string solver it is important to declare initially
(using the setlogic
command) an SMTLIB logic that includes strings.
Since the SMTLIB standard does not have an official theory of strings and
related logics yet, the logic names described below are tentative and
might change later.
The basic logic is QF_S
consisting of quantifierfree formulas
over just the theory of strings, e.g.:
(setlogic QF_S)
A summary of the relevant syntax for strings in the SMT2, CVC, and API is below. Note that regular expressions are not yet supported in the CVC format. More details on these operators can be found later in this page.
CVC language  SMTLIB language  C++ API  

Logic string  Not needed  preappend "S" for strings  preappend "S" for strings 
(setlogic QF_SLIA)

smt.setLogic("QF_SLIA");
 
String Sort  STRING

String

em.stringType();

String literals  "abcdef"

"abcdef"

em.mkConst( ::CVC4::String("abcdef") );

Concatenation  CONCAT( X1, ..., Xn )

(str.++ X1 ... Xn )

em.mkExpr(kind::STRING_CONCAT, X1, ..., Xn);

Length  LENGTH( x )

(str.len X)

em.mkExpr(kind::STRING_LENGTH, X);

String contains  CONTAINS( X, Y )

(str.contains X Y)

em.mkExpr(kind::STRING_STRCTN, X, Y);

Index of  INDEXOF( X, Y, N )

(str.indexof X Y N)

em.mkExpr(kind::STRING_STRIDOF, X, Y, N);

Replace  REPLACE( X, Y, Z )

(str.replace X Y Z)

em.mkExpr(kind::STRING_STRREPL, X, Y, Z);

Substring  SUBSTR( X, Y, Z )

(str.substr X Y Z)

em.mkExpr(kind::STRING_SUBSTR, X, Y, Z);

Prefix of  PREFIXOF( X, Y )

(str.prefixof X Y)

em.mkExpr(kind::STRING_PREFIX, X, Y);

Suffix of  SUFFIXOF( X, Y )

(str.suffixof X Y)

em.mkExpr(kind::STRING_SUFFIX, X, Y);

String to Integer  STRING_TO_INTEGER( X )

(str.to.int X)

em.mkExpr(kind::STRING_STOI, X);

Integer to String  INTEGER_TO_STRING( X )

(int.to.str X)

em.mkExpr(kind::STRING_ITOS, X);

String to Integer (16bit)  STRING_TO_UINT16( X )

(str.to.u16 X)

em.mkExpr(kind::STRING_STOU16, X);

Integer (16bit) to String  UINT16_TO_STRING( X )

(u16.to.str X)

em.mkExpr(kind::STRING_U16TOS, X);

String to Integer (32bit)  STRING_TO_UINT32( X )

(str.to.u32 X)

em.mkExpr(kind::STRING_STOU32, X);

Integer (32bit) to String  UINT32_TO_STRING( X )

(u32.to.str X)

em.mkExpr(kind::STRING_U32TOS, X);

Character at  CHARAT( X, N )

(str.at X N)

em.mkExpr(kind::STRING_CHARAT, X, N);

Regular expression sort  n/a

RegExp

em.regExpType();

Membership in regular expression  n/a

(str.in.re X R)

em.mkExpr(kind::STRING_IN_REGEXP, X, R);

String to regular expression  n/a

(str.to.re X)

em.mkExpr(kind::STRING_TO_REGEXP, X);

Regular expression concatenation  n/a

(re.++ R1 ... Rn)

em.mkExpr(kind::REGEXP_CONCAT, R1, ..., Rn);

Regular expression union  n/a

(re.union R1 ... Rn)

em.mkExpr(kind::REGEXP_UNION, R1, ..., Rn);

Regular expression intersection  n/a

(re.inter R1 ... Rn)

em.mkExpr(kind::REGEXP_INTER, R1, ..., Rn);

Regular expression Kleene star  n/a

(re.* R)

em.mkExpr(kind::REGEXP_STAR, R);

Regular expression plus  n/a

(re.+ R)

em.mkExpr(kind::REGEXP_PLUS, R);

Regular expression option  n/a

(re.opt R)

em.mkExpr(kind::REGEXP_OPT, R);

Options
Some functions in the theory are have only experimental support currently and
are disabled by default (even in the ALL_SUPPORTED
logic:
To use them:
(setoption :stringsexp true)
The solver can be run in finite model finding mode which guarantees termination for satisfiable problems. This mode is disabled by default. To enable it:
(setoption :stringsfmf true)
Note that in this mode the solver is much slower than in default mode, so we recommend it only as a fall back option when the default mode fails to find a solution with a reasonably large timeout.
To select the strategy of LB rule application: 0lazy, 1eager, 2no (0 by default):
(setoption :stringslb 1)
To set up string alphabet cardinality (256 by default, expert option):
(setoption :stringsalphabetcard n)
This is a reserved option for the extension of the sequence theory.
Alphabet
Currently, the solver's theory is based on an alphabet consisting of the 256 characters from (8bit) Extended ASCII. Since there are several versions of Extended ASCII, we allow string constants to contain only printable US ASCII characters, which are encoded in the same way in all Extended ASCII versions.
Note: The alphabet will change to the one prescribed by the SMTLIB standard once there is one.
Printable Characters
A printable character is any character with numerical value between 0x20 and 0x7e in the standard US ASCII encoding.
Escape Sequences in String Constants
String constants are denoted by SMTLIB string literals consisting of sequences of printable characters delimited by doublequotes ("
).
We support escape sequences used in most programming languages to represent nonprintable characters.
\0 … \9

represents ASCII character 0 … 9, respectively 
\a , \b , \e , \f , \n , \r , \t , \v

represents its corresponding ASCII character (C++ convention) 
\ ooo

encodes a single ASCII character where ooo consists of exactly three digits in the octal encoding of the character (from 0 to 377). For example, \101 represents A . Note: going beyond value 377 might give unexpected results. For instance, \437 will be translated in the twocharacter string #7 .

\x NN

encodes a single ASCII character, where NN consists of exactly two digits in the exadecimal encoding of the character. 
The backslash character (\
) is silently ignored when it is followed by a sequence of characters not recognized as an escape sequence. For example, \$
, say, is parsed as if it was just $
.
When CVC4 outputs a string constant, a nonprintable/extended ASCII character is printed in the exadecimal format \x
NN, except for the character denoted by the escape sequences \a
, \b
, \e
, \f
, \n
, \r
, \t
, \v
, which are printed using those escape sequences.
Note:
These escape sequences are specific to string constants in the theory of strings. They are 'not' escape sequences in SMTLIB 2 per se.
SMTLIB 2.5 has only one escape sequence for string literals: ""
,
which denotes the double quotes character.
This means that a string literal like "a""c"
is read by the solver as the string constant consisting of the characters a
, "
, and c
.
The same constant can be entered as "a\042c"
or "a\x22c"
.
Theory Signature
To define a string variable, i.e., a free string constant:
(declarefun x () String)
Alternatively:
(declareconst x () String)
String Concatenation:
(str.++ s1 s2 ... sn)
where s1, s2, ..., and sn are string terms. String concatenation takes at least 2 arguments.
String Length:
(str.len s)
where s is a string term.
Character in String:
(str.at s i)
where s is a string term and i is an integer. The index is starting from 0.
SubString:
(str.substr s i j )
where s is a string term, i and j are integers.
Escape Sequences for Regular Expressions
Currently, it is for CVC format only. (Coming soon.)
Symbolic Regular Expression
Membership Constraint:
(str.in.re s r)
where s is a string term and r is a regular expression.
String to Regular Expression Conversion:
(str.to.re s)
where s is a string term. The statement turns a regular expression that only contains a string s.
Regular Expression Concatenation:
(re.++ r_1 r_2 ... r_n)
where r_1, r_2, ..., r_n are regular expressions.
Regular Expression Alternation:
(re.union r_1 r_2 ... r_n)
where r_1, r_2, ..., r_n are regular expressions. re.or is for releases before March, 2014.
Regular Expression Intersection:
(re.inter r_1 r_2 ... r_n)
where r_1, r_2, ..., r_n are regular expressions. re.itr is for releases before March, 2014.
Regular Expression KleeneStar:
(re.* r)
where r is a regular expression.
Regular Expression KleeneCross:
(re.+ r)
where r is a regular expression.
Regular Expression Option:
(re.opt r)
where r is a regular expression.
Regular Expression Range:
(re.range s t)
where s, t are single characters in double quotes, e.g. "a", "b". It returns a regular expression that contains any character between s and t.
Regular Expression Loop:
(re.loop r l u)
where r is a regular expression, l is a nonnegative constant integer, and u is a nonnegative constant integer. It returns a regular expression that contains at least l repetitions of r and at most u repetitions of r. If l >= u, it returns exactly l repetitions of r.
Regular Expression Loop2:
(re.loop r l)
where r is a regular expression, and l is a nonnegative constant integer. It returns a regular expression that contains at least l repetitions of r.
The Empty Regular Expression:
re.nostr
The Regular Expression that contains all characters:
re.allchar
Extended Functions
Following functions are available when using the stringsexp option.
String CharAt:
(str.at s i)
where s is a string term and i is an integer term. i is the position. If i is negative or greater than or equal to the length of s, then (str.at s i) returns the empty string.
String Substring:
(str.substr s i j)
where s is a string term and i, j are integer terms. i is the starting position, and j is the offset. If i is negative, it returns the empty string; otherwise, it returns the substring (of s) that begins at the specified index i and extends to the length j (or to the last character of s if the length of s is shorter than i + j).
String Contain:
(str.contains s t)
where s and t are string terms. It returns true if the string s contains the string t. This function determines whether the string t can be found within the string s, returning true or false as appropriate.
String IndexOf:
(str.indexof s t i)
where s is a string, t is a nonempty string and i is a nonnegative integer. This function returns the position of the first occurrence of the specified value t in the string s after the index i. It returns 1 if the value to search for does not occur.
String Replacement:
(str.replace s t1 t2)
where s, t1 and t2 are string terms, t1 is nonempty. This function searches the string s for the specified value t1, and returns a new string where the first occurrence of the specified value t1 is replaced by the string t2.
String PrefixOf:
(str.prefixof s t)
where s and t are string terms. It returns true if the string s is a prefix of the string t.
String SuffixOf:
(str.suffixof s t)
where s and t are string terms. It returns true if the string s is a suffix of the string t.
String To Integer Conversion:
(str.to.int s)
where s is a string term. It returns the corresponding natural number if s is string of digits; otherwise, it returns 1.
Integer To String Conversion:
(int.to.str i)
where i is an integer term. It returns the corresponding string if i is a natural number; otherwise, it returns an empty string.
Extensions
Together with other engine in CVC4, we can extend new functionality in the theory of strings. For example,
(definefun fun1 ((?x String) (?s String)) Bool (or (= ?x ?s) (> (str.len ?x) (str.len ?s)) ))
Quantifiers over bounded Integers (with strings in the body) are supported in the experimental mode; however, quantifiers over strings are still under development.
Limitations
The decidability of this theory is unknown. For satisfiable problems (without extensions), our solver is sound, complete and terminating in the FMF mode (although the FMF mode will be slower than the default mode in general). For unsatisfiable problems, termination is not guaranteed; however, users can tune the options for termination.
The current version of the solver supports ASCII characters only. We might move on to UNICODE in future versions.
Examples
Find an assignment for x, where x."ab"="ba".x and the length of x equals to 7.
(setlogic QF_S) (declarefun x () String) (assert (= (str.++ x "ab") (str.++ "ba" x))) (assert (= (str.len x) 7)) (checksat)
Find assignments for x and y, where x and y are distinct and their lengths are equal.
(setlogic QF_S) (declarefun x () String) (declarefun y () String) (assert (not (= x y))) (assert (= (str.len x) (str.len y))) (checksat)
Find assignments for x and y, where x.y != y.x.
(setlogic QF_S) (declarefun x () String) (declarefun y () String) (assert (not (= (str.++ x y) (str.++ y x)))) (checksat)
Find a model for x, y and z, where x."ab".y=y."ba".z and z=x.y and x."a"!="a".x.
(setlogic QF_S) (declarefun x () String) (declarefun y () String) (declarefun z () String) (assert (= (str.++ x "ab" y) (str.++ y "ba" z))) (assert (= z (str.++ x y))) (assert (not (= (str.++ x "a") (str.++ "a" x)))) (checksat)
Find a model for x and y, where both x and y are in the RegEx (a*b)* and they are different but have the same length.
(setlogic QF_S) (setoption :stringsfmf true) (declarefun x () String) (declarefun y () String) (assert (str.in.re x (re.* (re.++ (re.* (str.to.re "a") ) (str.to.re "b") )))) (assert (str.in.re y (re.* (re.++ (re.* (str.to.re "a") ) (str.to.re "b") )))) (assert (not (= x y))) (assert (= (str.len x) (str.len y))) (checksat)
API
More details can be found in the Tutorials.
C++
The example can be found in examples/api/strings.cpp.
If setting the logic, use "S" to enable theory of strings.
smt.setLogic("S");
To create a string type, call mkSetType
in the ExprManager
.
Type string = em.stringType();
Make some string literals:
// std::string std::string std_str_ab("ab"); // CVC4::String CVC4::String cvc4_str_ab(std_str_ab); CVC4::String cvc4_str_abc("abc"); // String constants Expr ab = em.mkConst(cvc4_str_ab); Expr abc = em.mkConst(CVC4::String("abc"));
Make some string variables:
Expr x = em.mkVar("x", string); Expr y = em.mkVar("y", string); Expr z = em.mkVar("z", string);
Make some string constraints:
// String concatenation: x.ab.y Expr lhs = em.mkExpr(kind::STRING_CONCAT, x, ab, y); // String concatenation: abc.z Expr rhs = em.mkExpr(kind::STRING_CONCAT, abc, z); // x.ab.y = abc.z Expr formula1 = em.mkExpr(kind::EQUAL, lhs, rhs); // Length of y: y Expr leny = em.mkExpr(kind::STRING_LENGTH, y); // y >= 0 Expr formula2 = em.mkExpr(kind::GEQ, leny, em.mkConst(Rational(0))); // Regular expression: (ab[ce]*f)gh Expr r = em.mkExpr(kind::REGEXP_UNION, em.mkExpr(kind::REGEXP_CONCAT, em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("ab"))), em.mkExpr(kind::REGEXP_STAR, em.mkExpr(kind::REGEXP_RANGE, em.mkConst(String("c")), em.mkConst(String("e")))), em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("f")))), em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("g"))), em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("h")))); // String variables Expr s1 = em.mkVar("s1", string); Expr s2 = em.mkVar("s2", string); // String concatenation: s1.s2 Expr s = em.mkExpr(kind::STRING_CONCAT, s1, s2); // s1.s2 in (ab[ce]*f)gh Expr formula3 = em.mkExpr(kind::STRING_IN_REGEXP, s, r);
Make a query:
Expr q = em.mkExpr(kind::AND, formula1, formula2, formula3);
Check the result:
Result result = smt.checkSat(q); std::cout << "CVC4 reports: " << q << " is " << result << "." << std::endl; if(result == Result::SAT) { std::cout << " x = " << smt.getValue(x) << std::endl; std::cout << " s1.s2 = " << smt.getValue(s) << std::endl; }
Java
The example can be found in examples/api/java/Strings.java.
Unsat Cores
The string solver supports the generation of unsatisfiable cores. As with other subsolvers though, you must enable proofs at configuration time, and then run CVC with "dumpunsatcores" flag.
References
 Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark Barrett, and Morgan Deters. An efficient SMT solver for string constraints. Formal Methods in System Design. 2016.
 Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark Barrett. A Decision Procedure for Regular Membership and Length Constraints over Unbounded Strings. In Proceedings of the 10th International Symposium on Frontiers of Combining Systems (FroCoS'15), Wroclaw, Poland, 2015.
 Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett and Morgan Deters. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions.In Proceedings of the 26th International Conference on Computer Aided Verification (CAV'14), Vienna, Austria, 2014.
 Tianyi Liang. Automated reasoning over string constraints. PhD Dissertation, Department of Computer Science, The University of Iowa, Dec 2014.