Thursday, March 31, 2011

C++11 Raw String Literals: A simple example

NOTE: I've edited the title of this article from "C++0x Raw String Literals: A simple example" to "C++11 Raw String Literals: A simple example", since several years have passed since the name was officially adopted.

C++ raw string literals, and with them multiline string literals, are now possible. Raw string literals are a feature of C++11 which I've been waiting for. Since the C++0x draft is now complete, I don't think we need to worry about the feature changing anymore.

Part of the problem with C++98 (ISO/IEC 14882:1998) is that it did not allow a string literal to span multiple lines of code. The closest one could get was to place several literals side by side or line by line and let the compiler concatenate them:

C++ 1998 code:
string s = "this is line 1, "
           "followed by line 2, "
           "followed by line 3, etc...\n";

C++ concatenates adjacent string literals at compile time when they are separated only by whitespace (including line breaks), as long as each piece is properly enclosed in double quotes. Consider a more problematic example, such as attempting to output HTML code:

C++ 1998 code:
string s = 
"<HTML>"
"<HEAD>"
"    <TITLE>this is my title</TITLE>"
"</HEAD>"
"<BODY>"
"    <P>This is a paragraph</P>"
"</BODY>"
"</HTML>";

An alternative is to end each line with a backslash, which continues the literal onto the next line:
C++ 1998 code:
string s = 
"<HTML>\
<HEAD>\
    <TITLE>this is my title</TITLE>\
</HEAD>\
<BODY>\
    <P>This is a paragraph</P>\
</BODY>\
</HTML>";

Both approaches are very messy and hard to maintain at best. It's not all bad, though: if you make a mistake, like forgetting a double quote, the result is a compile-time error, which is fairly easy to catch.

C++11, on the other hand, adds raw string literals: a literal written as R"( ... )" can span multiple lines, and everything between the parentheses is taken verbatim.
Here is an example of how useful this facility is:

C++11 code:
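#include <iostream>
#include <string>
using namespace std;

int main()
{
        // Everything between R"( and )" is kept exactly as written,
        // including the line breaks and the indentation of the tags.
        string s = R"(<HTML>
<HEAD>
        <TITLE>This is a test</TITLE>
</HEAD>
<BODY>
        <P>Hello, C++ HTML World!</P>
</BODY>
</HTML>
)";

        cout << s << endl;
        return 0;
}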

Compile the above code using the GNU GCC g++ command:

g++ -Wall -std=c++0x ./test.cpp -o ./test

UPDATE (2014): Note that the -std flag is still required, but you can use either c++0x or c++11.  As of November 2014, GNU's C++11 support is still experimental: https://gcc.gnu.org/projects/cxx0x.html

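The HTML example above is valid C++11 code, and you can imagine how useful raw string literals become once regular expressions come into play: regex patterns are full of backslashes, and in an ordinary literal every one of them has to be doubled. See the WG21 paper on raw string literals: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2146.html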

Note that while the standard was being drafted, the raw string delimiters were initially [ and ]; they have since been changed to ( and ).

You can optionally place your own delimiter sequence between the opening quote and the parenthesis, repeating it between the closing parenthesis and the closing quote. For example, instead of R"( ... )" you could write R"x( ... )x". To quote from the standard: "The terminating d-char-sequence of a raw string literal shall be the same sequence of characters as the initial d-char-sequence. The maximum length of d-char-sequence shall be 16 characters."

This matters because a raw string literal whose contents contain the character sequence )" would otherwise be terminated at that point, typically resulting in a compile-time error. To avoid this, the standard lets you specify your own delimiter of up to 16 characters. So you could write this:

C++11 code:
string s =
R"X*X(A C++11 raw string literal can be specified like this: R"(This is my raw string)" )X*X";
cout << s << endl;

The program's output: A C++11 raw string literal can be specified like this: R"(This is my raw string)"

As you can see, being able to specify your own delimiter is essential whenever the contents of a raw string literal contain the sequence )".

PHP has a similar feature called "Heredoc": http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc
