Add string escape function or escape functionality built into CAPTURE macro #2967

Open
toughengineer opened this issue Mar 28, 2025 · 13 comments

Comments

@toughengineer
toughengineer commented Mar 28, 2025

The problem

This:

const auto asciiControlsString =
  "null \0 tab \t carriage return \r line feed \n whatever this is \xf quotation mark \" chars"sv;
CAPTURE(asciiControlsString);

results in a message like this:

with message:
 line feed rolsString := "null  tab      carriage return
   whatever this is � quotation mark " chars"

Notice that the carriage return character \r garbles the message: everything printed after it overwrites the start of the line.

 line feed rolsString := "null  tab      carriage return
 ^         ^
text       remnants of the variable name
after \r
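
The overwrite effect can be reproduced without a terminal. Below is a minimal sketch, assuming a simplified terminal model in which \r only moves the cursor back to column 0; `renderLine` is a hypothetical helper written for illustration and is not part of Catch2.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Catch2): approximates how a terminal
// draws a single line. '\r' moves the cursor back to column 0, and any
// characters written afterwards overwrite what was already there.
std::string renderLine(std::string_view text) {
    std::string screen;      // what ends up visible on the line
    std::size_t cursor = 0;  // current column
    for (const char c : text) {
        if (c == '\r') {
            cursor = 0;      // carriage return: back to the start of the line
            continue;
        }
        if (cursor < screen.size()) {
            screen[cursor] = c;   // overwrite an already printed character
        } else {
            screen.push_back(c);  // extend the line
        }
        ++cursor;
    }
    return screen;
}
```

Feeding it the two-space indent, the variable name, and the text up to and just after the \r yields a line starting with " line feed rolsString", which matches the garbled message above: the 11 characters of " line feed " replace the indent plus "asciiCont".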

Proposed solution

Having an escape() function to escape non-printable and some special characters (like carriage return) provided by Catch2 would be wonderful so people don't have to reinvent the wheel every time.

The result of code like this:

CAPTURE(escape(asciiControlsString));

can look like this:

with message:
  escape(asciiControlsString) := "null \0 tab \t carriage return \r
  line feed \n whatever this is � quotation mark " chars"

Alternatively, the CAPTURE() macro could incorporate this functionality, or Catch2 could provide an additional macro with such functionality, e.g. CAPTURE_ESCAPED().

@ChrisThrasher
Collaborator

Could you solve your problem by using raw string literals? That way you can type \0 without that being converted into a null character. Instead you'll have \ and 0 end up in the string itself.

@toughengineer
Author

Sometimes you need to have strings containing special characters as test input, and you want to have a message containing the input as a context to make diagnosing easier when the test fails, e.g.

const auto asciiControlsString =
  "null \0 tab \t carriage return \r line feed \n whatever this is \xf quotation mark \" chars"sv;
CAPTURE(asciiControlsString);

@ChrisThrasher, I don't understand how your suggestion to use raw string literals solves the problem of garbled up output of strings containing special characters.

@ChrisThrasher
Collaborator

Try this

const auto asciiControlsString =
  R"(null \0 tab \t carriage return \r line feed \n whatever this is \xf quotation mark " chars)"sv;

Raw string literals ensure those escape sequences are not interpreted by the compiler. This is the canonical C++ solution for writing strings that contain escape sequences that you want to keep as literal text.
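
To make the difference concrete, here is a small sketch (assuming C++17) comparing an ordinary literal with a raw one. The names `cooked` and `raw` are made up for illustration. The raw literal contains no control character at all, so it sidesteps the garbling only by changing the test input.

```cpp
#include <string_view>

using namespace std::string_view_literals;

// Ordinary literal: the compiler turns \r into one carriage return character.
constexpr auto cooked = "a\rb"sv;
// Raw literal: the backslash and the letter 'r' stay as two separate characters.
constexpr auto raw = R"(a\rb)"sv;

static_assert(cooked.size() == 3);
static_assert(raw.size() == 4);
static_assert(cooked[1] == '\r');
static_assert(raw[1] == '\\' && raw[2] == 'r');
```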

@toughengineer
Author

Again, the point is to have special characters in the input string in order to test some functionality.
At the same time straightforwardly printing such a string (e.g. capturing it with CAPTURE() macro) as context to help diagnose problems in case the test fails garbles up the output.

To try to completely eliminate misunderstanding, imagine that the input is defined like this:

const auto asciiControlsString =
  std::string{'\x6e', '\x75', '\x6c', '\x6c', '\x20', '\x0', '\x20', '\x74', '\x61', '\x62',
              '\x20', '\x9', '\x20', '\x63', '\x61', '\x72', '\x72', '\x69', '\x61', '\x67',
              '\x65', '\x20', '\x72', '\x65', '\x74', '\x75', '\x72', '\x6e', '\x20', '\xd',
              '\x20', '\x6c', '\x69', '\x6e', '\x65', '\x20', '\x66', '\x65', '\x65', '\x64',
              '\x20', '\xa', '\x20', '\x77', '\x68', '\x61', '\x74', '\x65', '\x76', '\x65',
              '\x72', '\x20', '\x74', '\x68', '\x69', '\x73', '\x20', '\x69', '\x73', '\x20',
              '\xf', '\x20', '\x71', '\x75', '\x6f', '\x74', '\x61', '\x74', '\x69', '\x6f',
              '\x6e', '\x20', '\x6d', '\x61', '\x72', '\x6b', '\x20', '\x22', '\x20', '\x63',
              '\x68', '\x61', '\x72', '\x73'};
CAPTURE(asciiControlsString);

Your suggestion to not have special characters in the input string when you want to test input with special characters does not solve any problems.

@ChrisThrasher
Collaborator

ChrisThrasher commented Apr 9, 2025

Okay, I see. It took me a minute to understand what you're asking for. You just want a free function that takes a given string and returns that same string but with certain ASCII characters like \n converted to \ and n. Can you elaborate more on the use case for this? What makes this worth including in the library when it could be a free function in a given codebase's test suite?

@toughengineer
Author

As I stated in the initial message, having to reinvent such an escape function in every codebase's test suite that needs it is cumbersome,
so it would be worthwhile for Catch2 to offer this functionality out of the box.

@ChrisThrasher
Collaborator

Right, that’s just repeating what your issue originally stated. I’m asking for use cases. You have described what you want the function to do but haven’t really talked about the use case that is motivating you in the first place.

Does your use case apply beyond just you? Are you aware of any other codebases that have the same problem and would benefit from such a feature?

@toughengineer
Author

toughengineer commented Apr 9, 2025

I thought the use case was self-evident.

Sometimes you need to have strings containing special characters as test input, and you want to have a message containing the input as context to make diagnosing easier when the test fails.
Here is a slightly expanded example from my specific case:

//...
  SECTION("strings with ASCII control characters and \"") {
    const auto asciiControlsString =
      "null \0 tab \t carriage return \r line feed \n whatever this is \xf quotation mark \" chars"sv;

    SECTION("unchanged in relaxed mode") {
      CAPTURE(asciiControlsString);
      CHECK(minjson::unescape(asciiControlsString, minjson::UnescapeMode::Relaxed) == asciiControlsString);
    }


    SECTION("error in strict mode and during parsing") {
      SECTION("unescape") {
        CAPTURE(asciiControlsString);
        CHECK(minjson::unescape(asciiControlsString).empty());
        CHECK(minjson::unescape(asciiControlsString, minjson::UnescapeMode::Strict).empty());
      }
//...

I imagine this situation is quite common for certain types of code, like a JSON library, although I don't have a list of codebases that actually have this problem.

A quick search turned up https://github.com/nlohmann/json. Although they use doctest instead of Catch, their usage of CAPTURE() in tests illustrates the use case, e.g.:
https://github.com/nlohmann/json/blob/00ecc7ed7ab2f37fbdf5bc7eca46503301999547/tests/src/unit-testsuites.cpp#L241-L252

        auto TEST_STRING = [](const std::string & json_string, const std::string & expected)
        {
            CAPTURE(json_string)
            CAPTURE(expected)
            CHECK(json::parse(json_string)[0].get<std::string>() == expected);
        };


        TEST_STRING("[\"\"]", "");
        TEST_STRING("[\"Hello\"]", "Hello");
        TEST_STRING(R"(["Hello\nWorld"])", "Hello\nWorld");
        //TEST_STRING("[\"Hello\\u0000World\"]", "Hello\0World");
        TEST_STRING(R"(["\"\\/\b\f\n\r\t"])", "\"\\/\b\f\n\r\t");

In this particular example when expected == "\"\\/\b\f\n\r\t" the output of CAPTURE(expected) would be unreadable.

@ChrisThrasher
Collaborator

Can you elaborate on how certain escape sequences will be replaced? One concern I have is different users will want to replace escape sequences with different replacement strings. For example, one user may want to print \n as an actual newline while someone else wants to replace that with the string "\n" or even the hex value of the newline character. I'm not sure how we can make a central API that satisfies all those use cases, not to mention other non-printable characters like ASCII control codes.

@toughengineer
Author

Anything that makes the output not garbled and understandable is acceptable to me.

The obvious way is to use the same rules as in C++, e.g. \0, \b, \f, \n, \r, \t, \\ for the standard escapes and \xN for other non-printable characters.

In this case you want to escape the backslash itself as \\ to distinguish e.g. a single newline character (printed as \n) from the two literal characters backslash and lowercase 'n' (printed as \\n).

You can also decide to escape \" and maybe \', which makes the result copy-pastable right into code, a nice feature.
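
A minimal sketch of such a function, following the rules above. This is a hypothetical `escape()` written for illustration and is not an existing Catch2 API. Assumptions: backslash and double quote are escaped, other ASCII control characters and DEL become two-digit \xNN escapes, and bytes of 0x80 and above pass through untouched so that UTF-8 text stays readable.

```cpp
#include <string>
#include <string_view>

// Hypothetical escape() helper (not part of Catch2): rewrites control
// characters, backslashes and double quotes using C++-style escapes.
std::string escape(std::string_view input) {
    static constexpr char hexDigits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(input.size());
    for (const char c : input) {
        switch (c) {
            case '\0': out += "\\0";  break;
            case '\b': out += "\\b";  break;
            case '\f': out += "\\f";  break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            default: {
                const auto u = static_cast<unsigned char>(c);
                if (u < 0x20 || u == 0x7F) {
                    // Remaining ASCII control characters: \xNN
                    out += "\\x";
                    out += hexDigits[u >> 4];
                    out += hexDigits[u & 0x0F];
                } else {
                    out += c;  // printable ASCII and bytes >= 0x80 pass through
                }
            }
        }
    }
    return out;
}
```

With this sketch, CAPTURE(escape(asciiControlsString)) would print the whole string on a single line. One caveat about copy-pasting the result back into source code: in C++ a \0 or \xNN escape followed directly by a digit (or hex digit) is read as one longer escape sequence, so a fully round-trippable version would need to split the literal at such points.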

A side note.

Detecting incorrect UTF-8 code points and outputting escaped code units would be amazing, e.g.:

this code point is missing one continuation byte: "\xF0\x9F\x98"

But that probably should not be the responsibility of this function.
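
For illustration only, a rough sketch of what such a check could look like. `escapeMalformedUtf8` is a hypothetical name, and the validation is deliberately simplified: it only checks the lead byte and continuation byte structure and does not reject overlong encodings, surrogates, or code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Catch2). Copies structurally well-formed
// UTF-8 sequences through and prints the bytes of malformed ones as \xNN.
std::string escapeMalformedUtf8(std::string_view input) {
    static constexpr char hexDigits[] = "0123456789ABCDEF";
    const auto byteAt = [&](std::size_t i) {
        return static_cast<unsigned char>(input[i]);
    };
    const auto isContinuation = [&](std::size_t i) {
        return i < input.size() && (byteAt(i) & 0xC0) == 0x80;
    };

    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char lead = byteAt(i);
        // Expected sequence length; 0 means "not a valid lead byte".
        std::size_t length = 0;
        if (lead < 0x80) length = 1;
        else if ((lead & 0xE0) == 0xC0) length = 2;
        else if ((lead & 0xF0) == 0xE0) length = 3;
        else if ((lead & 0xF8) == 0xF0) length = 4;

        // Count the lead byte plus the continuation bytes actually present.
        std::size_t present = (length == 0) ? 0 : 1;
        while (present < length && isContinuation(i + present)) ++present;

        if (length != 0 && present == length) {
            out.append(input.data() + i, length);  // well-formed: copy as-is
            i += length;
        } else {
            // Malformed: escape the bytes consumed so far (at least one).
            const std::size_t bad = (present == 0) ? 1 : present;
            for (std::size_t k = 0; k < bad; ++k) {
                const unsigned char b = byteAt(i + k);
                out += "\\x";
                out += hexDigits[b >> 4];
                out += hexDigits[b & 0x0F];
            }
            i += bad;
        }
    }
    return out;
}
```

For the truncated sequence from the example above, the three bytes are printed as escaped code units, while a complete four-byte sequence is passed through unchanged.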

The purpose of the hypothetical function is to offer functionality, not to satisfy all potential users.
The same logic applies to Catch's current functionality: you had to decide which functionality to provide and stop there, instead of wondering what each individual user might want it to be.

@toughengineer
Author

Today I discovered the -i option, so similar functionality already exists, but it has to be turned on explicitly.
It would be great if this functionality could be turned on for specific invocations of e.g. CAPTURE().

@philsquared
Collaborator

Hey, remember me? :-) Notifications for these issues still bubble up to me sometimes.
In this case, since it was me that added -i originally I thought I'd chip in.

The purpose of that switch was for exactly what you describe (AFAICS) - although do say if you think there is anything missing in terms of what it converts (I think I added that back in the day when I was the "primary customer").

Sounds like you want to be able to apply the same feature, but programmatically for certain strings, only?

IIRC the conversion happens at the reporting stage, so it might not be trivial to accomplish that by just building on what's already there. Would a per-test-case option be sufficient? (no promises, just asking)

@toughengineer
Author

> Sounds like you want to be able to apply the same feature, but programmatically for certain strings, only?

Pretty much.

Anything per test case/per section/per macro invocation would be sufficient for me.
Per test case (or even per section) escaping of "invisibles" makes a lot of sense now that you mentioned it, but a separate escape function would also do.

I want it to be baked into the test code itself, so that someone (probably me a year from now) doesn't have to stare at the garbled output and somehow discover that the -i option is a thing.
