Difference between revisions of "Strings"

From CVC4
Jump to: navigation, search
(Experimental Mode)
(Extended Functions)
 
(101 intermediate revisions by 5 users not shown)
Line 1: Line 1:
This page is about strings in CVC4
+
This page describes support for the theory of strings in CVC4.
  
 
=Syntax=
 
=Syntax=
This document is for the SMT-Lib v2 format.
+
This document focuses on input written in SMT-LIB 2 format.  
  
The Theory of Strings (Quantifier-Free) logic symbol:
+
'''We highly recommend that users use SMT-LIB [http://smt-lib.org/language.shtml Version 2.5] or greater,
  (set-logic QF_S)
+
instead of Version 2.0.'''
 +
The major difference is in the definition of escape sequences for string literals.
  
The logic symbol is '''IMPORTANT''' to the Theory of String, and it has to be set up.
+
'''The syntax below is for CVC4 version > 1.4.''' Version 1.3 has only
 +
''partial'' support for syntax in this document.
  
If the constraints contain more theories, please add them accordingly, e.g. if it contains BitVector, the symbol should be QF_SBV.
+
Currently, the string solver supports string constants over a set of characters
 +
limited to the printable ASCII characters. Other characters ''must'' be encoded
 +
with escape sequences.  
 +
For arbitrary alphabets, we plan to provide later a separate solver
 +
for a theory of parametric sequences.
  
The Theory of Quantified Strings logic symbol:
+
To use the string solver it is important to declare initially
   (set-logic UFSLIA)
+
(using the <code>set-logic</code> command) an SMT-LIB logic that includes strings.
 +
Since the SMT-LIB standard does not have an official theory of strings and
 +
related logics yet (although one is in development http://smtlib.cs.uiowa.edu/theories-UnicodeStrings.shtml),
 +
the logic and operator names described below are tentative and might change later.
 +
 
 +
The basic logic is <code>QF_S</code> consisting of quantifier-free formulas
 +
over just the theory of strings, e.g.:
 +
  (set-logic QF_S)
 +
For string applications that require reasoning about length, the logic should be extended to also include linear integer arithmetic:
 +
   (set-logic QF_SLIA)
 +
 
 +
A summary of the relevant syntax for strings in the SMT2, CVC, and API is below.  Note that regular expressions are not yet supported in the CVC format.  More details on these operators can be found later in this page.
 +
 
 +
{| class="wikitable" cellpadding="5" border="0" style="border-collapse:collapse"
 +
|-
 +
!
 +
! CVC language
 +
! SMTLIB language
 +
! C++ API
 +
|-
 +
| Logic string
 +
| Not needed
 +
| preappend "S" for strings
 +
| preappend "S" for strings
 +
|-
 +
|
 +
|
 +
| <code>(set-logic QF_'''S'''LIA)</code>
 +
| <code>smt.setLogic("QF_'''S'''LIA");</code>
 +
|-
 +
| String Sort
 +
| <code>STRING</code>
 +
| <code>String</code>
 +
| <code>em.stringType();</code>
 +
|-
 +
| String literals
 +
| <code>"abcdef"</code>
 +
| <code>"abcdef"</code>
 +
| <code>em.mkConst( '''::CVC4::String'''("abcdef") );</code>
 +
|-
 +
| Concatenation
 +
| <code>'''CONCAT'''( X1, ..., Xn )</code>
 +
| <code>('''str.++''' X1 ... Xn )</code>
 +
| <code>em.mkExpr('''kind::STRING_CONCAT''', X1, ..., Xn);</code>
 +
|-
 +
| Length
 +
| <code>'''LENGTH'''( x )</code>
 +
| <code>('''str.len''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_LENGTH''', X);</code>
 +
|-
 +
| String contains
 +
| <code>'''CONTAINS'''( X, Y )</code>
 +
| <code>('''str.contains''' X Y)</code>
 +
| <code>em.mkExpr('''kind::STRING_STRCTN''', X, Y);</code>
 +
|-
 +
| Index of
 +
| <code>'''INDEXOF'''( X, Y, N )</code>
 +
| <code>('''str.indexof''' X Y N)</code>
 +
| <code>em.mkExpr('''kind::STRING_STRIDOF''', X, Y, N);</code>
 +
|-
 +
| Replace
 +
| <code>'''REPLACE'''( X, Y, Z )</code>
 +
| <code>('''str.replace''' X Y Z)</code>
 +
| <code>em.mkExpr('''kind::STRING_STRREPL''', X, Y, Z);</code>
 +
|-
 +
| Replace
 +
| <code>'''REPLACEALL'''( X, Y, Z )</code>
 +
| <code>('''str.replaceall''' X Y Z)</code>
 +
| <code>em.mkExpr('''kind::STRING_STRREPLALL''', X, Y, Z);</code>
 +
|-
 +
| Substring
 +
| <code>'''SUBSTR'''( X, Y, Z )</code>
 +
| <code>('''str.substr''' X Y Z)</code>
 +
| <code>em.mkExpr('''kind::STRING_SUBSTR''', X, Y, Z);</code>
 +
|-
 +
| Prefix of
 +
| <code>'''PREFIXOF'''( X, Y )</code>
 +
| <code>('''str.prefixof''' X Y)</code>
 +
| <code>em.mkExpr('''kind::STRING_PREFIX''', X, Y);</code>
 +
|-
 +
| Suffix of
 +
| <code>'''SUFFIXOF'''( X, Y )</code>
 +
| <code>('''str.suffixof''' X Y)</code>
 +
| <code>em.mkExpr('''kind::STRING_SUFFIX''', X, Y);</code>
 +
|-
 +
| String to Integer
 +
| <code>'''STRING_TO_INTEGER'''( X )</code>
 +
| <code>('''str.to.int''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_STOI''', X);</code>
 +
|-
 +
| Integer to String
 +
| <code>'''INTEGER_TO_STRING'''( X )</code>
 +
| <code>('''int.to.str''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_ITOS''', X);</code>
 +
|-
 +
| String to Integer (16-bit)
 +
| <code>'''STRING_TO_UINT16'''( X )</code>
 +
| <code>('''str.to.u16''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_STOU16''', X);</code>
 +
|-
 +
| Integer (16-bit) to String
 +
| <code>'''UINT16_TO_STRING'''( X )</code>
 +
| <code>('''u16.to.str''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_U16TOS''', X);</code>
 +
|-
 +
| String to Integer (32-bit)
 +
| <code>'''STRING_TO_UINT32'''( X )</code>
 +
| <code>('''str.to.u32''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_STOU32''', X);</code>
 +
|-
 +
| Integer (32-bit) to String
 +
| <code>'''UINT32_TO_STRING'''( X )</code>
 +
| <code>('''u32.to.str''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_U32TOS''', X);</code>
 +
|-
 +
| Character at
 +
| <code>'''CHARAT'''( X, N )</code>
 +
| <code>('''str.at''' X N)</code>
 +
| <code>em.mkExpr('''kind::STRING_CHARAT''', X, N);</code>
 +
|-
 +
| Regular expression sort
 +
| <code>n/a</code>
 +
| <code>RegExp</code>
 +
| <code>em.regExpType();</code>
 +
|-
 +
| Membership in regular expression
 +
| <code>n/a</code>
 +
| <code>('''str.in.re''' X R)</code>
 +
| <code>em.mkExpr('''kind::STRING_IN_REGEXP''', X, R);</code>
 +
|-
 +
| String to regular expression
 +
| <code>n/a</code>
 +
| <code>('''str.to.re''' X)</code>
 +
| <code>em.mkExpr('''kind::STRING_TO_REGEXP''', X);</code>
 +
|-
 +
| Regular expression concatenation
 +
| <code>n/a</code>
 +
| <code>('''re.++''' R1 ... Rn)</code>
 +
| <code>em.mkExpr('''kind::REGEXP_CONCAT''', R1, ..., Rn);</code>
 +
|-
 +
| Regular expression union
 +
| <code>n/a</code>
 +
| <code>('''re.union''' R1 ... Rn)</code>
 +
| <code>em.mkExpr('''kind::REGEXP_UNION''', R1, ..., Rn);</code>
 +
|-
 +
| Regular expression intersection
 +
| <code>n/a</code>
 +
| <code>('''re.inter''' R1 ... Rn)</code>
 +
| <code>em.mkExpr('''kind::REGEXP_INTER''', R1, ..., Rn);</code>
 +
|-
 +
| Regular expression Kleene star
 +
| <code>n/a</code>
 +
| <code>('''re.*''' R)</code>
 +
| <code>em.mkExpr('''kind::REGEXP_STAR''', R);</code>
 +
|-
 +
| Regular expression plus
 +
| <code>n/a</code>
 +
| <code>('''re.+''' R)</code>
 +
| <code>em.mkExpr('''kind::REGEXP_PLUS''', R);</code>
 +
|-
 +
| Regular expression option
 +
| <code>n/a</code>
 +
| <code>('''re.opt''' R)</code>
 +
| <code>em.mkExpr('''kind::REGEXP_OPT''', R);</code>
 +
|}
  
 +
We refer to all functions apart from string length and string concatenation as ''extended functions'' in the following.
  
 
==Options==
 
==Options==
To use the experimental functions (disabled by default):
+
The extended functions in the theory are disabled by default,
 +
even in the <code>ALL_SUPPORTED</code> logic. To enable them, use:
 
   (set-option :strings-exp true)
 
   (set-option :strings-exp true)
'''The strings-exp option is REQUIRED for the releases after March 1, 2014'''
 
  
To use finite model finding mode (false by default):
+
The solver can be run in ''finite model finding mode'' which guarantees
 +
termination for satisfiable problems where all strings are interpreted with small lengths.
 +
This mode is disabled by default. To enable it:
 
   (set-option :strings-fmf true)
 
   (set-option :strings-fmf true)
 +
Note that in this mode the solver can be much '''slower''' than in default mode,
 +
so we recommend it as a fall back option when the default mode fails
 +
to find a solution with a reasonably large timeout.
  
To select the strategy of LB rule application: 0-lazy, 1-eager, 2-no (0 by default):
+
Currently, the solver's theory is based on an alphabet consisting of the 256
   (set-option :strings-lb 1)
+
characters from (8-bit) Extended ASCII.
 +
Since there are several versions of Extended ASCII, we allow string constants
 +
to contain only ''printable US ASCII characters'', i.e. those with numerical value between 0x20 and 0x7e
 +
in the standard US ASCII encoding.
 +
To limit the alphabet of the strings solver to the 128 printable ASCII characters, use:
 +
   (set-option :--strings-print-ascii)
 +
'''Note:''' The alphabet will change to the one prescribed by the SMT-LIB standard
 +
once there is one.
  
To set up string alphabet cardinality (256 by default, expert option):
+
==Escape Sequences in String Constants==
  (set-option :strings-alphabet-card n)
+
String constants are denoted by SMT-LIB string literals consisting of sequences of printable characters delimited by double-quotes (<code>"</code>).
  
To set up depth of unrolling regular expression used by the theory of strings (10 by default, deprecated):
+
We support escape sequences used in most programming languages
  (set-option :strings-regexp-depth 20)
+
to represent non-printable characters.  
The unrolling option will be removed in a new release. Our string engine no longer depends on it.
+
  
==Escaped Character Literals==
 
 
{| border="1"
 
{| border="1"
 
|-
 
|-
| \0 … \9
+
| <code>\0</code> <code>\9</code>
 
| represents ASCII character 0 … 9, respectively
 
| represents ASCII character 0 … 9, respectively
 
|-
 
|-
| \a, \b, \f, \n, \r, \t, \v
+
| <code>\a</code>, <code>\b</code>, <code>\e</code>, <code>\f</code>, <code>\n</code>, <code>\r</code>, <code>\t</code>, <code>\v</code>
 
| represents its corresponding ASCII character (C++ convention)
 
| represents its corresponding ASCII character (C++ convention)
 
|-
 
|-
| \ooo
+
| <code>\</code>''ooo''
| matches an ASCII character, where ooo consists of (no more than) three digits that represent the octal character code (from 0 to 377). For example, \101 represents ‘A’, while “\437” represent a string with two characters “#7”( *important* ).
+
| encodes a single ASCII character where ''ooo'' consists of exactly three digits in the octal encoding of the character (from 0 to 377). For example, <code>\101</code> represents <code>A</code>. '''Note:''' going beyond value 377 might give unexpected results. For instance, <code>\437</code> will be translated in the two-character string <code>#7</code>.
 
|-
 
|-
| \xNN
+
| <code>\x</code>''NN''
| matches an ASCII character, where NN is a two-digit hexadecimal character code. NN has to be exactly two hex-digits. If not, an exception will be raised.
+
| encodes a single ASCII character, where ''NN'' consists of exactly two digits in the exadecimal encoding of the character.
 
|}
 
|}
  
The backslash character (\) : when followed by a character that is not recognized as an escaped character, matches that character. For example, \" matches a double quote(").
+
The backslash character (<code>\</code>) is silently ignored when it is followed by a sequence of characters not recognized as an escape sequence. For example, <code>\$ </code>, say, is parsed as if it was just <code>$</code>.
 
   
 
   
In CVC4 string literal output, a non-printable character is printed in the format \xNN, where NN matches its ASCII character, except for \a, \b, \f, \n, \r, \t, \v, which are printed directly.
+
When CVC4 outputs a string constant, a non-printable/extended ASCII character is printed in the exadecimal format <code>\x</code>''NN'', except for the character denoted by the escape sequences <code>\a</code>, <code>\b</code>, <code>\e</code>, <code>\f</code>, <code>\n</code>, <code>\r</code>, <code>\t</code>, <code>\v</code>, which are printed using those escape sequences.
  
==Strings==
+
'''Note''':
To define a string variable:
+
These escape sequences are specific to string constants in the theory of strings. They are 'not' escape sequences in SMT-LIB 2 per se.
 +
SMT-LIB 2.5 has only one escape sequence for string literals: <code>""</code>,
 +
which denotes the double quotes character.
 +
This means that a string literal like  <code>"a""c"</code> is read by the solver as the string constant consisting of the characters <code>a</code>, <code>"</code>, and <code>c</code>.
 +
The same constant can be entered as <code>"a\042c"</code> or <code>"a\x22c"</code>.
 +
 
 +
==Theory Signature==
 +
To define a string variable, i.e., a free string constant:
 
   (declare-fun x () String)
 
   (declare-fun x () String)
 +
Alternatively:
 +
  (declare-const x () String)
  
 
String Concatenation:
 
String Concatenation:
   (str.++ s t)
+
   (str.++ s1 s2 ... sn)
where s, t are string terms.
+
where s1, s2, ..., and sn are string terms. String concatenation takes at least 2 arguments.
  
 
String Length:
 
String Length:
Line 65: Line 255:
 
where s is a string term.
 
where s is a string term.
  
Character in String:
+
==Regular Expression Memberships==
  (str.at s i)
+
where s is a string term and i is a natural number. (see partial functions)
+
The index is starting from 0.
+
 
+
Sub-String:
+
  (str.substr s i j )
+
where s is a string term, i and j are natural numbers. (see partial functions)
+
 
+
==Regular Expression==
+
 
Membership Constraint:
 
Membership Constraint:
 
   (str.in.re s r)
 
   (str.in.re s r)
Line 112: Line 293:
 
It returns a regular expression that contains any character between s and t.
 
It returns a regular expression that contains any character between s and t.
  
==Experimental Mode==
+
Regular Expression Loop:
Following functions are under the --strings-exp option. They are under active refinement. Once they are stable, we will move them to the default mode. Please let us know when you have some suggestions.
+
  (re.loop r l u)
 +
where r is a regular expression, l is a non-negative constant integer, and u is a non-negative constant integer.
 +
It returns a regular expression that contains at least l repetitions of r and at most u repetitions of r.
 +
If l >= u, it returns exactly l repetitions of r.
 +
 
 +
Regular Expression Loop-2:
 +
  (re.loop r l)
 +
where r is a regular expression, and l is a non-negative constant integer.
 +
It returns a regular expression that contains at least l repetitions of r.
 +
 
 +
 
 +
The Empty Regular Expression:
 +
  re.nostr
 +
 
 +
The Regular Expression that contains all characters:
 +
  re.allchar
 +
 
 +
==Extended Functions==
 +
Following functions are available when using the --strings-exp option.  
 +
 
 +
String Char-At:
 +
  (str.at s i)
 +
where s is a string term and i is an integer term. i is the position. If i is negative or greater than or equal to the length of s, then (str.at s i) returns the empty string.
 +
 
 +
String Sub-string:
 +
  (str.substr s i j)
 +
where s is a string term and i, j are integer terms. i is the starting position, and j is the offset. If i is negative, it returns the empty string; otherwise, it returns the substring (of s) that begins at the specified index i and extends to the length j (or to the last character of s if the length of s is shorter than i + j).
  
 
String Contain:
 
String Contain:
Line 122: Line 329:
 
String IndexOf:
 
String IndexOf:
 
  (str.indexof s t i)
 
  (str.indexof s t i)
 +
where s is a string, t is a non-empty string and i is a non-negative integer.
 
This function returns the position of the first occurrence of the specified value t in the string s after the index i.
 
This function returns the position of the first occurrence of the specified value t in the string s after the index i.
It returns -1 if the value to search for never occurs.
+
It returns -1 if the value to search for does not occur.
  
 
String Replacement:
 
String Replacement:
 
  (str.replace s t1 t2)
 
  (str.replace s t1 t2)
where s, t1 and t2 are string terms.
+
where s, t1 and t2 are string terms, t1 is non-empty.
This function searches the string s for the specified value t1, and returns a new string where the first occurrence of the specified value s1 is replaced by the string s2.
+
This function searches the string s for the specified value t1, and returns a new string where the first occurrence of the specified value t1 is replaced by the string t2.
  
 
String PrefixOf:
 
String PrefixOf:
Line 140: Line 348:
 
String To Integer Conversion:
 
String To Integer Conversion:
 
  (str.to.int s)
 
  (str.to.int s)
where s is a string term. It returns the corresponding natural number if s is valid; otherwise, it returns -1.
+
where s is a string term. It returns the corresponding natural number if s is string of digits; otherwise, it returns -1.
The string s has no leading zeros, e.g. (str.to.int "05") is equal to -1.
+
  
 
Integer To String Conversion:
 
Integer To String Conversion:
Line 147: Line 354:
 
where i is an integer term. It returns the corresponding string if i is a natural number; otherwise, it returns an empty string.
 
where i is an integer term. It returns the corresponding string if i is a natural number; otherwise, it returns an empty string.
  
=Partial Functions=
+
=Limitations=
By the definition of partial functions in [http://www.smtlib.org/ SMT-Lib Document], the following constraint is satisfiable:
+
  (assert (= (str.substr "a" 2 1) "b"))
+
However, the following constraints are unsatisfiable:
+
  (assert (= (str.substr "a" 2 1) "b"))
+
  (assert (= (str.substr "a" 2 1) "c"))
+
To achieve a desirable goal, it requires users to guard proper conditions. For example,
+
  (assert (= (str.at x j) "b"))
+
  (assert (> j 0))
+
  (assert (> (str.len x) j))
+
 
+
=Extension=
+
Together with other engine in CVC4, we can extend new functionality in the theory of strings. For example,
+
  (define-fun fun1 ((?x String) (?s String)) Bool
+
    (or (= ?x ?s)
+
        (> (str.len ?x) (str.len ?s))
+
    ))
+
Quantifiers over bounded Integers (with strings in the body) are supported in the experimental mode; however, quantifiers over strings are still under development.
+
 
+
=Limitation=
+
 
The decidability of this theory is unknown.
 
The decidability of this theory is unknown.
For satisfiable problems (without extensions), our solver is sound, complete and terminating in the FMF mode. For unsatisfiable problems, the termination is not guaranteed; however, user can tune the options for termination.
+
For satisfiable problems (without extensions), our solver is sound, complete and terminating in the FMF mode (although the FMF mode will be slower than the default mode in general). For unsatisfiable problems, termination is not guaranteed; however, users can tune the options for termination.
 +
 
 +
The current version of the solver supports ASCII characters only. We might move on to UNICODE in future versions.
  
 
=Examples=
 
=Examples=
Line 231: Line 421:
 
    
 
    
 
   (check-sat)
 
   (check-sat)
 +
 +
=API=
 +
More details can be found in the [http://cvc4.cs.stanford.edu/wiki/Tutorials Tutorials].
 +
 +
==C++==
 +
The example can be found in [https://github.com/CVC4/CVC4/blob/master/examples/api/strings.cpp examples/api/strings.cpp].
 +
 +
If setting the logic, use "S" to enable theory of strings.
 +
  smt.setLogic("S");
 +
 +
To create a string type, call <code>mkSetType</code> in the <code>ExprManager</code>.
 +
  Type string = em.stringType();
 +
 +
Make some string literals:
 +
  // std::string
 +
  std::string std_str_ab("ab");
 +
  // CVC4::String
 +
  CVC4::String cvc4_str_ab(std_str_ab);
 +
  CVC4::String cvc4_str_abc("abc");
 +
  // String constants
 +
  Expr ab = em.mkConst(cvc4_str_ab);
 +
  Expr abc = em.mkConst(CVC4::String("abc"));
 +
 +
Make some string variables:
 +
  Expr x = em.mkVar("x", string);
 +
  Expr y = em.mkVar("y", string);
 +
  Expr z = em.mkVar("z", string);
 +
 +
Make some string constraints:
 +
  // String concatenation: x.ab.y
 +
  Expr lhs = em.mkExpr(kind::STRING_CONCAT, x, ab, y);
 +
  // String concatenation: abc.z
 +
  Expr rhs = em.mkExpr(kind::STRING_CONCAT, abc, z);
 +
  // x.ab.y = abc.z
 +
  Expr formula1 = em.mkExpr(kind::EQUAL, lhs, rhs);
 +
  // Length of y: |y|
 +
  Expr leny = em.mkExpr(kind::STRING_LENGTH, y);
 +
  // |y| >= 0
 +
  Expr formula2 = em.mkExpr(kind::GEQ, leny, em.mkConst(Rational(0)));
 +
  // Regular expression: (ab[c-e]*f)|g|h
 +
  Expr r = em.mkExpr(kind::REGEXP_UNION,
 +
  em.mkExpr(kind::REGEXP_CONCAT,
 +
  em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("ab"))),
 +
  em.mkExpr(kind::REGEXP_STAR,
 +
  em.mkExpr(kind::REGEXP_RANGE, em.mkConst(String("c")), em.mkConst(String("e")))),
 +
  em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("f")))),
 +
  em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("g"))),
 +
  em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("h"))));
 +
  // String variables
 +
  Expr s1 = em.mkVar("s1", string);
 +
  Expr s2 = em.mkVar("s2", string);
 +
  // String concatenation: s1.s2
 +
  Expr s = em.mkExpr(kind::STRING_CONCAT, s1, s2);
 +
  // s1.s2 in (ab[c-e]*f)|g|h
 +
  Expr formula3 = em.mkExpr(kind::STRING_IN_REGEXP, s, r);
 +
 +
Make a query:
 +
  Expr q = em.mkExpr(kind::AND,
 +
    formula1,
 +
    formula2,
 +
    formula3);
 +
 +
Check the result:
 +
  Result result = smt.checkSat(q);
 +
  std::cout << "CVC4 reports: " << q << " is " << result << "." << std::endl;
 +
  if(result == Result::SAT) {
 +
    std::cout << " x = " << smt.getValue(x) << std::endl;
 +
    std::cout << " s1.s2 = " << smt.getValue(s) << std::endl;
 +
  }
 +
 +
==Java==
 +
The example can be found in [https://github.com/CVC4/CVC4/blob/master/examples/api/java/Strings.java examples/api/java/Strings.java].
 +
 +
==Unsat Cores==
 +
The string solver supports the generation of unsatisfiable cores.
 +
As with other subsolvers though, you must enable proofs at configuration time,
 +
and then run CVC with "--dump-unsat-cores" flag.
 +
 +
=References=
 +
* Andrew Reynolds, Maverick Woo, Clark Barrett, David Brumley, Tianyi Liang, Cesare Tinelli. [http://homepage.divms.uiowa.edu/~ajreynol/cav17a.pdf Scaling Up DPLL(T) String Solvers Using Context-Dependent Simplification]. CAV 2017.
 +
* Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark Barrett, and Morgan Deters. [http://dl.acm.org/citation.cfm?id=2994123 An efficient SMT solver for string constraints]. Formal Methods in System Design. 2016.
 +
* Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark Barrett. [http://link.springer.com/chapter/10.1007/978-3-319-24246-0_9 A Decision Procedure for Regular Membership and Length Constraints over Unbounded Strings]. In Proceedings of the 10th International Symposium on Frontiers of Combining Systems (FroCoS'15), Wroclaw, Poland, 2015.
 +
* Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett and Morgan Deters. [http://link.springer.com/chapter/10.1007%2F978-3-319-08867-9_43 A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions].In Proceedings of the 26th International Conference on Computer Aided Verification (CAV'14), Vienna, Austria, 2014.
 +
* Tianyi Liang. [http://ir.uiowa.edu/etd/1478/ Automated reasoning over string constraints]. PhD Dissertation, Department of Computer Science, The University of Iowa, Dec 2014.

Latest revision as of 10:35, 11 December 2018

This page describes support for the theory of strings in CVC4.

Syntax

This document focuses on input written in SMT-LIB 2 format.

We highly recommend that users use SMT-LIB Version 2.5 or greater, instead of Version 2.0. The major difference is in the definition of escape sequences for string literals.

The syntax below is for CVC4 version > 1.4. Version 1.3 has only partial support for syntax in this document.

Currently, the string solver supports string constants over a set of characters limited to the printable ASCII characters. Other characters must be encoded with escape sequences. For arbitrary alphabets, we plan to provide later a separate solver for a theory of parametric sequences.

To use the string solver it is important to declare initially (using the set-logic command) an SMT-LIB logic that includes strings. Since the SMT-LIB standard does not have an official theory of strings and related logics yet (although one is in development http://smtlib.cs.uiowa.edu/theories-UnicodeStrings.shtml), the logic and operator names described below are tentative and might change later.

The basic logic is QF_S consisting of quantifier-free formulas over just the theory of strings, e.g.:

 (set-logic QF_S)

For string applications that require reasoning about length, the logic should be extended to also include linear integer arithmetic:

 (set-logic QF_SLIA)

A summary of the relevant syntax for strings in the SMT2, CVC, and API is below. Note that regular expressions are not yet supported in the CVC format. More details on these operators can be found later in this page.

CVC language SMTLIB language C++ API
Logic string Not needed preappend "S" for strings preappend "S" for strings
(set-logic QF_SLIA) smt.setLogic("QF_SLIA");
String Sort STRING String em.stringType();
String literals "abcdef" "abcdef" em.mkConst( ::CVC4::String("abcdef") );
Concatenation CONCAT( X1, ..., Xn ) (str.++ X1 ... Xn ) em.mkExpr(kind::STRING_CONCAT, X1, ..., Xn);
Length LENGTH( x ) (str.len X) em.mkExpr(kind::STRING_LENGTH, X);
String contains CONTAINS( X, Y ) (str.contains X Y) em.mkExpr(kind::STRING_STRCTN, X, Y);
Index of INDEXOF( X, Y, N ) (str.indexof X Y N) em.mkExpr(kind::STRING_STRIDOF, X, Y, N);
Replace REPLACE( X, Y, Z ) (str.replace X Y Z) em.mkExpr(kind::STRING_STRREPL, X, Y, Z);
Replace REPLACEALL( X, Y, Z ) (str.replaceall X Y Z) em.mkExpr(kind::STRING_STRREPLALL, X, Y, Z);
Substring SUBSTR( X, Y, Z ) (str.substr X Y Z) em.mkExpr(kind::STRING_SUBSTR, X, Y, Z);
Prefix of PREFIXOF( X, Y ) (str.prefixof X Y) em.mkExpr(kind::STRING_PREFIX, X, Y);
Suffix of SUFFIXOF( X, Y ) (str.suffixof X Y) em.mkExpr(kind::STRING_SUFFIX, X, Y);
String to Integer STRING_TO_INTEGER( X ) (str.to.int X) em.mkExpr(kind::STRING_STOI, X);
Integer to String INTEGER_TO_STRING( X ) (int.to.str X) em.mkExpr(kind::STRING_ITOS, X);
String to Integer (16-bit) STRING_TO_UINT16( X ) (str.to.u16 X) em.mkExpr(kind::STRING_STOU16, X);
Integer (16-bit) to String UINT16_TO_STRING( X ) (u16.to.str X) em.mkExpr(kind::STRING_U16TOS, X);
String to Integer (32-bit) STRING_TO_UINT32( X ) (str.to.u32 X) em.mkExpr(kind::STRING_STOU32, X);
Integer (32-bit) to String UINT32_TO_STRING( X ) (u32.to.str X) em.mkExpr(kind::STRING_U32TOS, X);
Character at CHARAT( X, N ) (str.at X N) em.mkExpr(kind::STRING_CHARAT, X, N);
Regular expression sort n/a RegExp em.regExpType();
Membership in regular expression n/a (str.in.re X R) em.mkExpr(kind::STRING_IN_REGEXP, X, R);
String to regular expression n/a (str.to.re X) em.mkExpr(kind::STRING_TO_REGEXP, X);
Regular expression concatenation n/a (re.++ R1 ... Rn) em.mkExpr(kind::REGEXP_CONCAT, R1, ..., Rn);
Regular expression union n/a (re.union R1 ... Rn) em.mkExpr(kind::REGEXP_UNION, R1, ..., Rn);
Regular expression intersection n/a (re.inter R1 ... Rn) em.mkExpr(kind::REGEXP_INTER, R1, ..., Rn);
Regular expression Kleene star n/a (re.* R) em.mkExpr(kind::REGEXP_STAR, R);
Regular expression plus n/a (re.+ R) em.mkExpr(kind::REGEXP_PLUS, R);
Regular expression option n/a (re.opt R) em.mkExpr(kind::REGEXP_OPT, R);

We refer to all functions apart from string length and string concatenation as extended functions in the following.

Options

The extended functions in the theory are disabled by default, even in the ALL_SUPPORTED logic. To enable them, use:

 (set-option :strings-exp true)

The solver can be run in finite model finding mode which guarantees termination for satisfiable problems where all strings are interpreted with small lengths. This mode is disabled by default. To enable it:

 (set-option :strings-fmf true)

Note that in this mode the solver can be much slower than in default mode, so we recommend it as a fall back option when the default mode fails to find a solution with a reasonably large timeout.

Currently, the solver's theory is based on an alphabet consisting of the 256 characters from (8-bit) Extended ASCII. Since there are several versions of Extended ASCII, we allow string constants to contain only printable US ASCII characters, i.e. those with numerical value between 0x20 and 0x7e in the standard US ASCII encoding. To limit the alphabet of the strings solver to the 128 printable ASCII characters, use:

 (set-option :--strings-print-ascii)

Note: The alphabet will change to the one prescribed by the SMT-LIB standard once there is one.

Escape Sequences in String Constants

String constants are denoted by SMT-LIB string literals consisting of sequences of printable characters delimited by double-quotes (").

We support escape sequences used in most programming languages to represent non-printable characters.

\0\9 represents ASCII character 0 … 9, respectively
\a, \b, \e, \f, \n, \r, \t, \v represents its corresponding ASCII character (C++ convention)
\ooo encodes a single ASCII character where ooo consists of exactly three digits in the octal encoding of the character (from 0 to 377). For example, \101 represents A. Note: going beyond value 377 might give unexpected results. For instance, \437 will be translated in the two-character string #7.
\xNN encodes a single ASCII character, where NN consists of exactly two digits in the exadecimal encoding of the character.

The backslash character (\) is silently ignored when it is followed by a sequence of characters not recognized as an escape sequence. For example, \$ , say, is parsed as if it was just $.

When CVC4 outputs a string constant, a non-printable/extended ASCII character is printed in the exadecimal format \xNN, except for the character denoted by the escape sequences \a, \b, \e, \f, \n, \r, \t, \v, which are printed using those escape sequences.

Note: These escape sequences are specific to string constants in the theory of strings. They are 'not' escape sequences in SMT-LIB 2 per se. SMT-LIB 2.5 has only one escape sequence for string literals: "", which denotes the double quotes character. This means that a string literal like "a""c" is read by the solver as the string constant consisting of the characters a, ", and c. The same constant can be entered as "a\042c" or "a\x22c".

Theory Signature

To define a string variable, i.e., a free string constant:

 (declare-fun x () String)

Alternatively:

 (declare-const x () String)

String Concatenation:

 (str.++ s1 s2 ... sn)

where s1, s2, ..., and sn are string terms. String concatenation takes at least 2 arguments.

String Length:

 (str.len s)

where s is a string term.

Regular Expression Memberships

Membership Constraint:

 (str.in.re s r)

where s is a string term and r is a regular expression.

String to Regular Expression Conversion:

 (str.to.re s)

where s is a string term. The statement turns a regular expression that only contains a string s.

Regular Expression Concatenation:

 (re.++ r_1 r_2 ... r_n)

where r_1, r_2, ..., r_n are regular expressions.

Regular Expression Alternation:

 (re.union r_1 r_2 ... r_n)

where r_1, r_2, ..., r_n are regular expressions. re.or is for releases before March, 2014.

Regular Expression Intersection:

 (re.inter r_1 r_2 ... r_n)

where r_1, r_2, ..., r_n are regular expressions. re.itr is for releases before March, 2014.

Regular Expression Kleene-Star:

 (re.* r)

where r is a regular expression.

Regular Expression Kleene-Cross:

 (re.+ r)

where r is a regular expression.

Regular Expression Option:

 (re.opt r)

where r is a regular expression.

Regular Expression Range:

 (re.range s t)

where s, t are single characters in double quotes, e.g. "a", "b". It returns a regular expression that contains any character between s and t.

Regular Expression Loop:

 (re.loop r l u)

where r is a regular expression, l is a non-negative constant integer, and u is a non-negative constant integer. It returns a regular expression that contains at least l repetitions of r and at most u repetitions of r. If l >= u, it returns exactly l repetitions of r.

Regular Expression Loop-2:

 (re.loop r l)

where r is a regular expression, and l is a non-negative constant integer. It returns a regular expression that contains at least l repetitions of r.


The Empty Regular Expression:

 re.nostr

The Regular Expression that contains all characters:

 re.allchar

Extended Functions

Following functions are available when using the --strings-exp option.

String Char-At:

 (str.at s i)

where s is a string term and i is an integer term. i is the position. If i is negative or greater than or equal to the length of s, then (str.at s i) returns the empty string.

String Sub-string:

 (str.substr s i j)

where s is a string term and i, j are integer terms. i is the starting position, and j is the offset. If i is negative, it returns the empty string; otherwise, it returns the substring (of s) that begins at the specified index i and extends to the length j (or to the last character of s if the length of s is shorter than i + j).

String Contain:

 (str.contains s t)

where s and t are string terms. It returns true if the string s contains the string t. This function determines whether the string t can be found within the string s, returning true or false as appropriate.

String IndexOf:

(str.indexof s t i)

where s is a string, t is a non-empty string and i is a non-negative integer. This function returns the position of the first occurrence of the specified value t in the string s after the index i. It returns -1 if the value to search for does not occur.

String Replacement:

(str.replace s t1 t2)

where s, t1 and t2 are string terms, t1 is non-empty. This function searches the string s for the specified value t1, and returns a new string where the first occurrence of the specified value t1 is replaced by the string t2.

String PrefixOf:

(str.prefixof s t)

where s and t are string terms. It returns true if the string s is a prefix of the string t.

String SuffixOf:

(str.suffixof s t)

where s and t are string terms. It returns true if the string s is a suffix of the string t.

String To Integer Conversion:

(str.to.int s)

where s is a string term. It returns the corresponding natural number if s is string of digits; otherwise, it returns -1.

Integer To String Conversion:

(int.to.str i)

where i is an integer term. It returns the corresponding string if i is a natural number; otherwise, it returns an empty string.

Limitations

The decidability of this theory is unknown. For satisfiable problems (without extensions), our solver is sound, complete and terminating in the FMF mode (although the FMF mode will be slower than the default mode in general). For unsatisfiable problems, termination is not guaranteed; however, users can tune the options for termination.

The current version of the solver supports ASCII characters only. We might move on to UNICODE in future versions.

Examples

Find an assignment for x, where x."ab"="ba".x and the length of x equals to 7.

 (set-logic QF_S)
 
 (declare-fun x () String)
 
 (assert (= (str.++ x "ab") (str.++ "ba" x)))
 (assert (= (str.len x) 7))
 
 (check-sat)

Find assignments for x and y, where x and y are distinct and their lengths are equal.

 (set-logic QF_S)
 
 (declare-fun x () String)
 (declare-fun y () String)
 
 (assert (not (=  x y)))
 (assert (= (str.len x) (str.len y)))
 
 (check-sat)

Find assignments for x and y, where x.y != y.x.

 (set-logic QF_S)
 
 (declare-fun x () String)
 (declare-fun y () String)
 (assert (not (= (str.++ x y) (str.++ y x))))
 
 (check-sat)

Find a model for x, y and z, where x."ab".y=y."ba".z and z=x.y and x."a"!="a".x.

 (set-logic QF_S)
 
 (declare-fun x () String)
 (declare-fun y () String)
 (declare-fun z () String)
 
 (assert (= (str.++ x "ab" y) (str.++ y "ba" z)))
 (assert (= z (str.++ x y)))
 (assert (not (= (str.++ x "a") (str.++ "a" x))))
 
 (check-sat)

Find a model for x and y, where both x and y are in the RegEx (a*b)* and they are different but have the same length.

 (set-logic QF_S)
 (set-option :strings-fmf true)
 
 (declare-fun x () String)
 (declare-fun y () String)
 
 (assert
   (str.in.re x
      (re.* (re.++ (re.* (str.to.re "a") ) (str.to.re "b") ))))
 (assert (str.in.re y
      (re.* (re.++ (re.* (str.to.re "a") ) (str.to.re "b") ))))
 
 (assert (not (= x y)))
 (assert (= (str.len x) (str.len y)))
 
 (check-sat)

API

More details can be found in the Tutorials.

C++

The example can be found in examples/api/strings.cpp.

If setting the logic, use "S" to enable theory of strings.

 smt.setLogic("S");

To create a string type, call mkSetType in the ExprManager.

 Type string = em.stringType();

Make some string literals:

 // std::string
 std::string std_str_ab("ab");
 // CVC4::String
 CVC4::String cvc4_str_ab(std_str_ab);
 CVC4::String cvc4_str_abc("abc");
 // String constants
 Expr ab = em.mkConst(cvc4_str_ab);
 Expr abc = em.mkConst(CVC4::String("abc"));

Make some string variables:

 Expr x = em.mkVar("x", string);
 Expr y = em.mkVar("y", string);
 Expr z = em.mkVar("z", string);

Make some string constraints:

 // String concatenation: x.ab.y
 Expr lhs = em.mkExpr(kind::STRING_CONCAT, x, ab, y);
 // String concatenation: abc.z
 Expr rhs = em.mkExpr(kind::STRING_CONCAT, abc, z);
 // x.ab.y = abc.z
 Expr formula1 = em.mkExpr(kind::EQUAL, lhs, rhs);
 // Length of y: |y|
 Expr leny = em.mkExpr(kind::STRING_LENGTH, y);
 // |y| >= 0
 Expr formula2 = em.mkExpr(kind::GEQ, leny, em.mkConst(Rational(0)));
 // Regular expression: (ab[c-e]*f)|g|h
 Expr r = em.mkExpr(kind::REGEXP_UNION,
 em.mkExpr(kind::REGEXP_CONCAT,
 em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("ab"))),
 em.mkExpr(kind::REGEXP_STAR,
 em.mkExpr(kind::REGEXP_RANGE, em.mkConst(String("c")), em.mkConst(String("e")))),
 em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("f")))),
 em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("g"))),
 em.mkExpr(kind::STRING_TO_REGEXP, em.mkConst(String("h"))));
 // String variables
 Expr s1 = em.mkVar("s1", string);
 Expr s2 = em.mkVar("s2", string);
 // String concatenation: s1.s2
 Expr s = em.mkExpr(kind::STRING_CONCAT, s1, s2);
 // s1.s2 in (ab[c-e]*f)|g|h
 Expr formula3 = em.mkExpr(kind::STRING_IN_REGEXP, s, r);

Make a query:

 Expr q = em.mkExpr(kind::AND,
   formula1,
   formula2,
   formula3);

Check the result:

 Result result = smt.checkSat(q);
 std::cout << "CVC4 reports: " << q << " is " << result << "." << std::endl;
 if(result == Result::SAT) {
   std::cout << " x = " << smt.getValue(x) << std::endl;
   std::cout << " s1.s2 = " << smt.getValue(s) << std::endl;
 }

Java

The example can be found in examples/api/java/Strings.java.

Unsat Cores

The string solver supports the generation of unsatisfiable cores. As with other subsolvers though, you must enable proofs at configuration time, and then run CVC with "--dump-unsat-cores" flag.

References