Improved readme #231

Closed
wants to merge 3 commits into from
228 changes: 161 additions & 67 deletions README.md
@@ -4,10 +4,27 @@

Fast compile-time regular expressions with support for matching/searching/capturing during compile-time or runtime.

You can use the single header version from directory `single-header`. This header can be regenerated with `make single-header`. If you are using CMake, you can add this directory as a subdirectory and link to the target `ctre`.

More info at [compile-time.re](https://compile-time.re/)

- [What this library can do](#What-this-library-can-do)
- [Unicode support](#Unicode-support)
- [Unknown character escape behaviour](#Unknown-character-escape-behaviour)
- [Supported compilers](#Supported-compilers)
- [API Overview](#API-Overview)
- [Range outputing API](#Range-outputing-API)
- [Functors](#Functors)
- [Possible subjects (inputs)](#Possible-subjects-\(inputs\))
- [Template UDL syntax](#Template-UDL-syntax)
- [C++17 syntax](#C++17-syntax)
- [C++20 syntax](#C++20-syntax)
- [Examples](#Examples)
- [Extracting number from input](#Extracting-number-from-input)
- [Extracting values from date](#Extracting-values-from-date)
- [Using captures](#Using-captures)
- [Lexer](#Lexer)
- [Range over input](#Range-over-input)
- [Unicode](#Unicode)
- [Integration](#Integration)

ToC is already provided by GitHub's UI.

Author

I am using the github.com website, which is why I created the ToC.

Yes, GitHub UI provides a ToC already:

image

Author

Oh I didn't know that

## What this library can do

```c++
@@ -38,6 +55,12 @@ The library is implementing most of the PCRE syntax with a few exceptions:

More documentation on [pcre.org](https://www.pcre.org/current/doc/html/pcre2syntax.html).

### Unicode support

To enable Unicode support, include:
* `<ctre-unicode.hpp>`
* or `<ctre.hpp>` and `<unicode-db.hpp>`

### Unknown character escape behaviour

Not all escaped characters are automatically inserted as themselves. The library treats escaped characters as having special meaning, and an unknown escaped character is a syntax error.
@@ -46,7 +69,16 @@ Explicitly allowed character escapes which insert only the character are:

```\-\"\<\>```

## Basic API
## Supported compilers

* clang 6.0+ (template UDL, C++17 syntax)
* xcode clang 10.0+ (template UDL, C++17 syntax)
* clang 12.0+ (C++17 syntax, C++20 cNTTP syntax)
* gcc 8.0+ (template UDL, C++17 syntax)
* gcc 9.0+ (C++17 & C++20 cNTTP syntax)
* MSVC 15.8.8+ (C++17 syntax only) (semi-supported, I don't have a Windows machine)

## API Overview

This is an approximate API specification from a user's perspective (omitting `constexpr` and `noexcept`, which are everywhere, and using C++20 syntax even though the API is C++17 compatible):
```c++
@@ -107,23 +139,8 @@ if (matcher(input)) ...
* `std::string`-like objects (`std::string_view` or your own string if it's providing `begin`/`end` functions with forward iterators)
* pairs of forward iterators

### Unicode support

To enable Unicode support, include:
* `<ctre-unicode.hpp>`
* or `<ctre.hpp>` and `<unicode-db.hpp>`

Otherwise you will get missing symbols if you try to use the unicode support without enabling it.

## Supported compilers

* clang 6.0+ (template UDL, C++17 syntax)
* xcode clang 10.0+ (template UDL, C++17 syntax)
* clang 12.0+ (C++17 syntax, C++20 cNTTP syntax)
* gcc 8.0+ (template UDL, C++17 syntax)
* gcc 9.0+ (C++17 & C++20 cNTTP syntax)
* MSVC 15.8.8+ (C++17 syntax only) (semi-supported, I don't have a Windows machine)

### Template UDL syntax

The compiler must support extension N3599, for example as GNU extension in gcc (not in GCC 9.1+) and clang.
@@ -151,21 +168,28 @@ constexpr auto match(std::string_view sv) noexcept {

(this is tested in MSVC 15.8.8)

[link to compiler explorer](https://gcc.godbolt.org/z/hc4x9f3s1)

### C++20 syntax

Currently, the only compiler which supports cNTTP syntax `ctre::match<PATTERN>(subject)` is GCC 9+.
The only compilers which support cNTTP syntax `ctre::match<PATTERN>(subject)` are GCC 9+ and Clang 12+.

```c++
constexpr auto match(std::string_view sv) noexcept {
return ctre::match<"h.*">(sv);
}
```
[link to compiler explorer](https://gcc.godbolt.org/z/Yv3PjK7Pd)

## Examples

### Extracting number from input

```c++
#include <ctre.hpp>
#include <optional>
#include <string_view>

std::optional<std::string_view> extract_number(std::string_view s) noexcept {
if (auto m = ctre::match<"[a-z]+([0-9]+)">(s)) {
return m.get<1>().to_view();
@@ -175,68 +199,92 @@ std::optional<std::string_view> extract_number(std::string_view s) noexcept {
}
```

[link to compiler explorer](https://gcc.godbolt.org/z/5U67_e)
[link to compiler explorer](https://gcc.godbolt.org/z/MqfWaYPMG)

### Extracting values from date

```c++
struct date { std::string_view year; std::string_view month; std::string_view day; };
#include <ctre.hpp>
#include <optional>
#include <string_view>

struct date {
std::string_view year;
std::string_view month;
std::string_view day;
};

std::optional<date> extract_date(std::string_view s) noexcept {
using namespace ctre::literals;
if (auto [whole, year, month, day] = ctre::match<"(\\d{4})/(\\d{1,2})/(\\d{1,2})">(s); whole) {
return date{year, month, day};
} else {
return std::nullopt;
}
if (auto [whole, year, month, day] = ctre::match<"^(\\d{4})/(\\d{1,2})/(\\d{1,2})">(s); whole) {
return date{year, month, day};
} else {
return std::nullopt;
}
}

//static_assert(extract_date("2018/08/27"sv).has_value());
//static_assert((*extract_date("2018/08/27"sv)).year == "2018"sv);
//static_assert((*extract_date("2018/08/27"sv)).month == "08"sv);
//static_assert((*extract_date("2018/08/27"sv)).day == "27"sv);
// using namespace std::literals;
// static_assert(extract_date("2018/08/27"sv).has_value());
// static_assert((*extract_date("2018/08/27"sv)).year == "2018"sv);
// static_assert((*extract_date("2018/08/27"sv)).month == "08"sv);
// static_assert((*extract_date("2018/08/27"sv)).day == "27"sv);
```

[link to compiler explorer](https://gcc.godbolt.org/z/x64CVp)
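
For readers who want to try the same extraction without the library installed, here is a minimal runtime sketch using `std::regex` from the standard library. It is a stand-in for illustration only, not CTRE's API: the names `date_parts` and `extract_date_runtime` are hypothetical, the pattern is compiled at run time rather than compile time, and the fields are copied into `std::string` objects instead of views.

```c++
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for this sketch (owning strings, not views).
struct date_parts {
    std::string year;
    std::string month;
    std::string day;
};

// Runtime counterpart of the example above, written with std::regex.
// This is an assumption for illustration and does not use CTRE.
std::optional<date_parts> extract_date_runtime(const std::string& s) {
    static const std::regex pattern(R"((\d{4})/(\d{1,2})/(\d{1,2}))");
    std::smatch m;
    // std::regex_match succeeds only if the whole input matches the pattern.
    if (std::regex_match(s, m, pattern)) {
        return date_parts{m[1].str(), m[2].str(), m[3].str()};
    }
    return std::nullopt;
}
```

Capture group 0 is the whole match, so the three fields come from groups 1 to 3, mirroring the structured binding `[whole, year, month, day]` in the CTRE version.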

### Using captures

```c++
auto result = ctre::match<"(?<year>\\d{4})/(?<month>\\d{1,2})/(?<day>\\d{1,2})">(s);
return date{result.get<"year">(), result.get<"month">, result.get<"day">};

// or in C++ emulation, but the object must have a linkage
static constexpr ctll::fixed_string year = "year";
static constexpr ctll::fixed_string month = "month";
static constexpr ctll::fixed_string day = "day";
return date{result.get<year>(), result.get<month>, result.get<day>};

// or use numbered access
// capture 0 is the whole match
return date{result.get<1>(), result.get<2>, result.get<3>};
#include <ctre.hpp>
#include <optional>
#include <string_view>

struct date {
std::string_view year;
std::string_view month;
std::string_view day;
};

// const char * s = "2021/01/01";
extern std::string_view s;

std::optional<date> extract_date() noexcept {
auto result =
ctre::match<"(?<year>\\d{4})/(?<month>\\d{1,2})/(?<day>\\d{1,2})">(s);

// or in C++ emulation, but the object must have a linkage
static constexpr ctll::fixed_string year = "year";
static constexpr ctll::fixed_string month = "month";
static constexpr ctll::fixed_string day = "day";
return date{result.get<year>(), result.get<month>(), result.get<day>()};

// or use numbered access
// capture 0 is the whole match
return date{result.get<1>(), result.get<2>(), result.get<3>()};
}
```

### Lexer

```c++
enum class type {
unknown, identifier, number
};
#include <ctre.hpp>
#include <optional>
#include <string_view>

enum class type { unknown, identifier, number };

struct lex_item {
type t;
std::string_view c;
type t;
std::string_view c;
};

std::optional<lex_item> lexer(std::string_view v) noexcept {
if (auto [m,id,num] = ctre::match<"([a-z]+)|([0-9]+)">(v); m) {
if (id) {
return lex_item{type::identifier, id};
} else if (num) {
return lex_item{type::number, num};
}
if (auto [m, id, num] = ctre::match<"^([a-z]++)|([0-9]++)$">(v); m) {
if (id) {
return lex_item{type::identifier, id};
} else if (num) {
return lex_item{type::number, num};
}
return std::nullopt;
}
return std::nullopt;
}
```
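
The key idea in the lexer is that only one branch of the alternation participates in a match, so checking which capture group is set tells you the token type. The sketch below shows that idea with `std::regex` as a runtime stand-in (not CTRE); `token_type`, `token`, and `lex_runtime` are hypothetical names, and the possessive quantifiers from the pattern above are replaced with plain `+` because `std::regex` does not support them.

```c++
#include <optional>
#include <regex>
#include <string>

// Hypothetical token description for this sketch.
enum class token_type { identifier, number };

struct token {
    token_type type;
    std::string text;
};

// Classifies the whole input as an identifier or a number.
// std::regex is used here only as a stand-in for illustration.
std::optional<token> lex_runtime(const std::string& input) {
    static const std::regex pattern("([a-z]+)|([0-9]+)");
    std::smatch m;
    if (!std::regex_match(input, m, pattern)) {
        return std::nullopt;
    }
    // Exactly one of the two alternatives took part in the match.
    if (m[1].matched) {
        return token{token_type::identifier, m[1].str()};
    }
    if (m[2].matched) {
        return token{token_type::number, m[2].str()};
    }
    return std::nullopt;
}
```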

@@ -247,35 +295,81 @@ std::optional<lex_item> lexer(std::string_view v) noexcept {
This support is preliminary; the API will probably change.

```c++
auto input = "123,456,768"sv;
#include <ctre.hpp>
#include <iostream>
#include <string_view>

// auto input = "123,456,768";
extern const char *input;

for (auto match: ctre::range<"([0-9]+),?">(input)) {
int main(void) {
auto matches = ctre::range<"([0-9]+),?">(input);
for (auto match : matches) {
std::cout << std::string_view{match.get<0>()} << "\n";
}
return 0;
}
```
[link to compiler explorer](https://gcc.godbolt.org/z/s4zedb68n)
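
Iterating over successive matches can also be tried with the standard library alone. This sketch uses `std::sregex_iterator` as a runtime stand-in for `ctre::range`; it is not CTRE's API, and `collect_numbers` is a hypothetical helper. It collects capture group 1 (the digits only), whereas the example above prints group 0, which includes the trailing comma when one is present.

```c++
#include <regex>
#include <string>
#include <vector>

// Collects every run of digits from a comma-separated input.
// std::sregex_iterator stands in for ctre::range in this sketch.
std::vector<std::string> collect_numbers(const std::string& input) {
    static const std::regex pattern("([0-9]+),?");
    std::vector<std::string> out;
    std::sregex_iterator it(input.begin(), input.end(), pattern);
    const std::sregex_iterator end;  // default-constructed: end of sequence
    for (; it != end; ++it) {
        out.push_back((*it)[1].str());  // group 1: digits without the comma
    }
    return out;
}
```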

### Unicode

```c++
#include <ctre-unicode.hpp>
#include <iostream>

// needed if you want to output to the terminal
std::string_view cast_from_unicode(std::u8string_view input) noexcept {
return std::string_view(reinterpret_cast<const char *>(input.data()), input.size());
return std::string_view(reinterpret_cast<const char *>(input.data()), input.size());
}
int main()
{
using namespace std::literals;
std::u8string_view original = u8"Tu es un génie"sv;

for (auto match : ctre::range<"\\p{Letter}+">(original))
std::cout << cast_from_unicode(match) << std::endl;
return 0;

int main() {
using namespace std::literals;
std::u8string_view original = u8"Tu es un génie"sv;

for (auto match : ctre::range<"\\p{Letter}+">(original))
std::cout << cast_from_unicode(match) << std::endl;
return 0;
}
```

[link to compiler explorer](https://godbolt.org/z/erTshe6sz)

## Integration
You can get [ctre.hpp](https://github.com/hanickadot/compile-time-regular-expressions/blob/main/single-header/ctre.hpp) from the directory `single-header`.
You need to add:

```C++
#include "ctre.hpp"

//using namespace ctre;
```
This header can be regenerated with `make single-header`.
### CMake
If you are using CMake, you can add this directory as a subdirectory and link to the target `ctre`.

```CMake
cmake_minimum_required(VERSION 3.8.0)
include(FetchContent)

project(MyProject VERSION 1.0)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED True)
Comment on lines +356 to +357

CMake 3.8 does not know about C++20, you must not use this value. Please read the docs:

New in version 3.12

This is not the way to set the standard requirement of a target, you must use compile features for that.

Author

Oh,

For compilers that have no notion of a standard level, such as Microsoft Visual C++ before 2015 Update 3, this has no effect.


FetchContent_Declare(
ctre
GIT_REPOSITORY https://github.com/hanickadot/compile-time-regular-expressions.git
GIT_TAG 95c63867bf0f6497825ef6cf44a7d0791bd25883 # v3.4.1
)
Comment on lines +359 to +363

The include is missing and please read the docs:

New in version 3.11

Author

Oh I totally forgot that
include(FetchContent)

Author

Suggested change
FetchContent_Declare(
ctre
GIT_REPOSITORY https://github.com/hanickadot/compile-time-regular-expressions.git
GIT_TAG 95c63867bf0f6497825ef6cf44a7d0791bd25883 # v3.4.1
)


FetchContent_MakeAvailable(ctre)

Please read the docs:

New in version 3.14

If #230 is merged then this becomes entirely wrong, as using ctre would simply be:

find_package(ctre REQUIRED)
target_link_libraries(project_target PRIVATE ctre::ctre)

Author

I think that if the package was not found we could do

(pseudocode)

find_package(ctre)
if(package not found){
    FetchContent(ctre)
    include_target_directories(ctre dir)
}

That's not the responsibility of this project. There are many ways to "polyfill" a find_package call. It should instead just use idiomatic CMake in examples.

Author

I guess

Suggested change
FetchContent_MakeAvailable(ctre)
find_package(ctre REQUIRED)
target_link_libraries(project_target PRIVATE ctre::ctre)

include_directories("${ctre_SOURCE_DIR}/single-header")

Don't use directory scope commands. Prefer target_include_directories.


# Add an executable with the above sources

with the above sources

What sources?

Author

Ctre

ctre is a header-only library, so it has no source files. To add the headers to the include path, you use target_link_libraries(project_target PRIVATE ctre::ctre) (once the appropriate PR is merged).

add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME})
Comment on lines +369 to +370

Suggested change
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME})
add_executable(example main.cpp)
target_link_libraries(example PRIVATE ctre::ctre)

Author

@alexios-angel, Nov 14, 2021

I feel that the CMake example would be more of a template if we did

Suggested change
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME})
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME} PRIVATE ctre::ctre)

Using ${PROJECT_NAME} is pointless either way.

Author

${PROJECT_NAME} makes it so there are fewer things to rewrite.

Find and replace exists. How often does the project name change that this abstraction needs to exist? This also just obfuscates the target's name. There are only downsides to using this.

Author

The project name changes every time someone copies the CMake example template and puts it in their own project. I would say that anyone wanting to not use the variable ${PROJECT_NAME} is already further along in their project where they would only need to Ctrl+C the middle portion of the template.

If this is just for a copy-paste example, then using an explicit name makes no semantic difference, but syntactically aligns better with what one should write to begin with. Anyone sufficiently knowledgeable regarding CMake will not bother with this example, as the only interesting thing is the exported target(s) in the install interface.

```

## Running tests (for developers)

Just run `make` in the root of this project.