Dependencies

Direct dependencies: Foundation, Memory
All dependencies: Foundation, Memory

Statistics

Lines of code (excluding comments): 3387
Lines of code (including comments): 4950

Features

Class	Description
SC::String	A non-modifiable owning string with associated encoding.
SC::StringBuilder	Builds String out of a sequence of StringView or formatting through StringFormat.
SC::StringConverter	Converts String to a different encoding (UTF8, UTF16).
SC::StringIterator	A position inside a fixed range `[start, end)` of UTF code points.
SC::StringIteratorASCII	A string iterator for ASCII strings.
SC::StringIteratorUTF8	A string iterator for UTF8 strings.
SC::StringIteratorUTF16	A string iterator for UTF16 strings.
SC::StringView	Non-owning view over a range of characters with UTF Encoding.
SC::StringAlgorithms	Algorithms operating on strings (glob / wildcard).
SC::StringViewTokenizer	Splits a StringView in tokens according to separators.
SC::StringFormat	Formats String with a simple DSL embedded in the format string.
SC::Path	Parse and compose filesystem paths for windows and posix.
SC::Console	Writes to console using SC::StringFormat.

Status

🟩 Usable
Library is usable and can be successfully used to mix operations with strings made in different encodings.

Blog

Some relevant blog posts are:

July 2025 Update

Definition

StringView

Non-owning view over a range of characters with UTF Encoding.

It additionally holds the SC::StringEncoding information (ASCII, UTF8 or UTF16). During construction the encoding information and the null-termination state must be specified. All methods are const because it's not possible to modify a string through it.
Example (Construct)

StringView s("asd");
SC_ASSERT_RELEASE(s.sizeInBytes() == 3);
SC_ASSERT_RELEASE(s.isNullTerminated());

Example (Construct from null terminated string)

const char* someString = "asdf";
// construct only "asd", not null terminated (as there is 'f' after 'd')
StringView s({someString, strlen(someString) - 1}, false, StringEncoding::Ascii);
SC_ASSERT_RELEASE(s.sizeInBytes() == 3);
SC_ASSERT_RELEASE(not s.isNullTerminated());
//
// ... or
StringView s2 = StringView::fromNullTerminated(someString, StringEncoding::Ascii); // s2 == "asdf"

StringView::containsString

Check if StringView contains another StringView with compatible encoding.

Parameters

str	The other StringView to check with current

Returns: Returns true if this StringView contains str

Warning: This method will assert if the strings have non-compatible encodings. Compatibility can be checked beforehand with StringView::hasCompatibleEncoding(str).

Example:

StringView asd = "123 456";
SC_TRY(asd.containsString("123"));
SC_TRY(asd.containsString("456"));
SC_TRY(not asd.containsString("124"));
SC_TRY(not asd.containsString("4567"));

StringView::compare

Ordering comparison between non-normalized StringView (operates on code points, not on utf graphemes)

Parameters

other The string being compared to current one

Returns: Result of the comparison (smaller, equals or bigger)

Example:

// àèìòù (1 UTF16-LE sequence, 2 UTF8 sequence)
SC_ASSERT_RELEASE("\xc3\xa0\xc3\xa8\xc3\xac\xc3\xb2\xc3\xb9"_u8.compare(
                    "\xe0\x0\xe8\x0\xec\x0\xf2\x0\xf9\x0"_u16) == StringView::Comparison::Equals);
 
// 日本語語語 (1 UTF16-LE sequence, 3 UTF8 sequence)
StringView stringUtf8  = StringView("\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e\xe8\xaa\x9e\xe8\xaa\x9e"_u8);
StringView stringUtf16 = StringView("\xE5\x65\x2C\x67\x9E\x8a\x9E\x8a\x9E\x8a\x00"_u16); // LE
// Comparisons are on code points NOT grapheme clusters!!
SC_ASSERT_RELEASE(stringUtf8.compare(stringUtf16) == StringView::Comparison::Equals);
SC_ASSERT_RELEASE(stringUtf16.compare(stringUtf8) == StringView::Comparison::Equals);
SC_ASSERT_RELEASE(stringUtf8 == stringUtf16);
SC_ASSERT_RELEASE(stringUtf16 == stringUtf8);
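
What comparing "on code points" across encodings means can be sketched in standard C++, independently of the library. The helpers below (decodeUtf8, decodeUtf16LE and sameCodePoints are hypothetical names, not part of SC::StringView) assume well-formed input, decode both byte sequences to code points and compare the results. This is only an illustration of the idea, not how the library implements compare.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decodes well-formed UTF-8 bytes into code points (sketch: no validation).
std::vector<char32_t> decodeUtf8(const std::string& bytes)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < bytes.size();)
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // The leading byte tells how many bytes the sequence uses.
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Reads one little-endian 16 bit code unit starting at byte offset i.
char32_t readUnitLE(const std::string& bytes, std::size_t i)
{
    return static_cast<unsigned char>(bytes[i]) | (static_cast<unsigned char>(bytes[i + 1]) << 8);
}

// Decodes well-formed UTF-16LE bytes into code points (sketch: no validation).
std::vector<char32_t> decodeUtf16LE(const std::string& bytes)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        char32_t unit = readUnitLE(bytes, i);
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 3 < bytes.size())
        {
            // Surrogate pair: combine high and low surrogates.
            const char32_t low = readUnitLE(bytes, i + 2);
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
            i += 2;
        }
        out.push_back(unit);
    }
    return out;
}

// True if both byte sequences encode the same sequence of code points.
bool sameCodePoints(const std::string& utf8, const std::string& utf16le)
{
    return decodeUtf8(utf8) == decodeUtf16LE(utf16le);
}
```

Because the comparison is done on decoded code points, two strings that look identical to a reader but use different normalization forms would still compare as different.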

StringView::fullyOverlaps

Check if this StringView is equal to other StringView (operates on code points, not on utf graphemes).

Returns the number of code points that are the same in both StringView-s.

Parameters

other	The StringView to be compared to
commonOverlappingPoints	number of equal code points in both StringView

Returns: true if the two StringViews are equal

Example:

StringView asd = "123 456"_a8;
size_t overlapPoints = 0;
SC_TEST_EXPECT(not asd.fullyOverlaps("123___", overlapPoints) and overlapPoints == 3);

StringView::startsWithAnyOf

Check if StringView starts with any utf code point in the given span.

Parameters

codePoints The utf code points to check against

Returns: Returns true if this StringView starts with any code point inside codePoints

Example:

SC_TEST_EXPECT("123 456"_a8.startsWithAnyOf({'1', '8'})); // '1' will match

StringView::endsWithAnyOf

Check if StringView ends with any utf code point in the given span.

Parameters

codePoints The utf code points to check against

Returns: Returns true if this StringView ends with any code point inside codePoints

Example:

SC_TEST_EXPECT("123 456"_a8.endsWithAnyOf({'a', '6'})); // '6' will match

StringView::startsWith

Check if StringView starts with another StringView.

Parameters

str	The other StringView to check with current

Returns: Returns true if this StringView starts with str

Example:

SC_TEST_EXPECT("123 456"_a8.startsWith("123"));

StringView::endsWith

Check if StringView ends with another StringView.

Parameters

str	The other StringView to check with current

Returns: Returns true if this StringView ends with str

Example:

SC_TEST_EXPECT("123 456"_a8.endsWith("456"));

StringView::containsCodePoint

Check if StringView contains given utf code point.

Parameters

c	The utf code point to check against

Returns: Returns true if this StringView contains code point c

StringView::sliceStartEnd

Get slice [start, end) starting at offset start and ending at end (measured in utf code points)

Parameters

start	The initial code point where the slice starts
end	One after the final code point where the slice ends

Returns: The [start, end) StringView slice

Example:

StringView str = "123_567";
SC_TEST_EXPECT(str.sliceStartEnd(0, 3) == "123");
SC_TEST_EXPECT(str.sliceStartEnd(4, 7) == "567");
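
Offsets measured in code points differ from byte offsets as soon as the text is not ASCII. The following standard C++ sketch shows one way a half-open code point range can be mapped to bytes for well-formed UTF-8. The names byteOffsetOfCodePoint and sliceCodePoints are hypothetical and the code is not taken from the library.

```cpp
#include <cstddef>
#include <string_view>

// Byte offset of the code point with the given index (well-formed UTF-8 assumed).
// Returns utf8.size() when index is past the last code point.
std::size_t byteOffsetOfCodePoint(std::string_view utf8, std::size_t index)
{
    std::size_t offset = 0;
    while (offset < utf8.size() && index > 0)
    {
        ++offset; // step over the leading byte
        // step over continuation bytes, which look like 10xxxxxx
        while (offset < utf8.size() && (static_cast<unsigned char>(utf8[offset]) & 0xC0) == 0x80)
            ++offset;
        --index;
    }
    return offset;
}

// Returns the half-open slice [start, end) measured in code points.
std::string_view sliceCodePoints(std::string_view utf8, std::size_t start, std::size_t end)
{
    if (end < start)
        return {}; // choice made by this sketch for inverted ranges
    const std::size_t first = byteOffsetOfCodePoint(utf8, start);
    const std::size_t last  = byteOffsetOfCodePoint(utf8, end);
    return utf8.substr(first, last - first);
}
```

For ASCII input the result matches plain byte slicing, so sliceCodePoints("123_567", 4, 7) yields "567" just like the example above.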

StringView::sliceStartLength

Get slice [start, start+length) starting at offset start and spanning length code points.

Parameters

start	The initial code point where the slice starts
length	The number of code points in the slice

Returns: The [start, start+length) StringView slice

Example:

StringView str = "123_567";
SC_TEST_EXPECT(str.sliceStartLength(7, 0) == "");
SC_TEST_EXPECT(str.sliceStartLength(0, 3) == "123");

StringView::sliceStart

Get slice [offset, end) measured in utf code points, where end is the end of the StringView.

Parameters

offset The initial code point where the slice starts

Returns: The sliced StringView [offset, end)

Example:

StringView str = "123_567";

SC_TEST_EXPECT(str.sliceStart(4) == "567");

StringView::sliceEnd

Get slice [start, end-offset) measured in utf code points, that is the StringView without its last offset code points.

Parameters

offset The number of code points to drop from the end

Returns: The sliced StringView [start, end-offset)

Example:

StringView str = "123_567";

SC_TEST_EXPECT(str.sliceEnd(4) == "123");

StringView::trimEndAnyOf

Returns a shortened StringView removing ending utf code points matching the codePoints span.

Parameters

codePoints The span of utf code points to look for

Returns: The trimmed StringView

Example:

SC_TEST_EXPECT("myTest_\n__"_a8.trimEndAnyOf({'_', '\n'}) == "myTest");

SC_TEST_EXPECT("myTest"_a8.trimEndAnyOf({'_'}) == "myTest");

StringView::trimStartAnyOf

Returns a shortened StringView removing starting utf code points matching the codePoints span.

Parameters

codePoints The span of utf code points to look for

Returns: The trimmed StringView

Example:

SC_TEST_EXPECT("__\n_myTest"_a8.trimStartAnyOf({'_', '\n'}) == "myTest");

SC_TEST_EXPECT("_myTest"_a8.trimStartAnyOf({'_'}) == "myTest");

StringViewTokenizer

Splits a StringView in tokens according to separators.

StringViewTokenizer::tokenizeNext

Splits the string along a list of separators.

Parameters

separators	List of separators
options	Whether to skip empty tokens or not

Returns: true if there are additional tokens to parse

Example:

StringViewTokenizer tokenizer("bring,me,the,horizon");
while (tokenizer.tokenizeNext(',', StringViewTokenizer::SkipEmpty))
{
    console.printLine(tokenizer.component);
}

StringViewTokenizer::countTokens

Count the number of tokens that exist in the string view passed to the constructor, when split along the given separators.

Parameters

separators Separators to split the original string with

Returns: Current StringViewTokenizer to inspect SC::StringViewTokenizer::numSplitsNonEmpty or SC::StringViewTokenizer::numSplitsTotal.

Example:

SC_TEST_EXPECT(StringViewTokenizer("___").countTokens('_').numSplitsNonEmpty == 0);
SC_TEST_EXPECT(StringViewTokenizer("___").countTokens('_').numSplitsTotal == 3);
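
The counts in the example above ("___" gives 3 total splits and 0 non-empty ones) are consistent with a tokenizer where every separator closes one token. A minimal standard C++ sketch of that behaviour follows. TokenizerSketch, TokenCounts and countTokensSketch are hypothetical names, and the real StringViewTokenizer may differ in details such as handling several separators at once.

```cpp
#include <cstddef>
#include <string_view>

// Single-separator tokenizer sketch over std::string_view.
struct TokenizerSketch
{
    std::string_view remaining; // text not yet consumed
    std::string_view component; // last token produced by next()

    explicit TokenizerSketch(std::string_view text) : remaining(text) {}

    // Advances to the next token, optionally skipping empty ones.
    // Returns false once the input is exhausted.
    bool next(char separator, bool skipEmpty)
    {
        while (!remaining.empty())
        {
            const std::size_t pos = remaining.find(separator);
            component             = remaining.substr(0, pos);
            remaining = pos == std::string_view::npos ? std::string_view() : remaining.substr(pos + 1);
            if (!(skipEmpty && component.empty()))
                return true;
        }
        return false;
    }
};

struct TokenCounts
{
    std::size_t nonEmpty = 0;
    std::size_t total    = 0;
};

// Counts all tokens and the non-empty ones in a single pass.
TokenCounts countTokensSketch(std::string_view text, char separator)
{
    TokenizerSketch tokenizer(text);
    TokenCounts     counts;
    while (tokenizer.next(separator, false))
    {
        ++counts.total;
        if (!tokenizer.component.empty())
            ++counts.nonEmpty;
    }
    return counts;
}
```

No memory is allocated: each token is a view into the original text, which is the same reason the library tokenizer can work directly on a StringView.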

StringBuilder

Builds String out of a sequence of StringView or formatting through StringFormat.

The output can be a SC::Buffer or a SC::SmallBuffer (see Foundation)

StringBuilder::format

Uses StringFormat to format the given StringView against args, replacing destination contents.

Template Parameters

Types Type of Args

Parameters

fmt	The format string
args	Arguments to format

Returns: true if format succeeded

Example:

String buffer(StringEncoding::Ascii); // Or SmallString<N>
StringBuilder builder(buffer);
SC_TRY(builder.format("[{1}-{0}]", "Storia", "Bella"));
SC_ASSERT_RELEASE(builder.view() == "[Bella-Storia]");

StringBuilder::append

Uses StringFormat to format the given StringView against args, appending to destination contents.

Template Parameters

Types Type of Args

Parameters

fmt	The format string
args	Arguments to format

Returns: true if format succeeded

Example:

String buffer(StringEncoding::Ascii); // Or SmallString<N>
StringBuilder builder(buffer);
SC_TRY(builder.append("Salve"));
SC_TRY(builder.append(" {1} {0}!!!", "tutti", "a"));
SC_ASSERT_RELEASE(builder.view() == "Salve a tutti!!!");

StringBuilder::appendReplaceAll

Appends source to destination buffer, replacing every occurrence of the `occurrencesOf` StringView with the `with` StringView.

Parameters

source	The StringView to be appended
occurrencesOf	The StringView to be searched inside `source`
with	The replacement StringView to be written in destination buffer

Returns: true if append succeeded

Example:

    String        buffer(StringEncoding::Ascii);
    StringBuilder builder(buffer);
    SC_TEST_EXPECT(builder.appendReplaceAll("123 456 123 10", "123", "1234"));
    SC_TEST_EXPECT(buffer == "1234 456 1234 10");
    buffer = String();
    SC_TEST_EXPECT(builder.appendReplaceAll("088123", "123", "1"));
    SC_TEST_EXPECT(buffer == "0881");
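
The append-with-replacement behaviour shown above can be approximated with std::string in a few lines. The function appendReplaceAllSketch below is a hypothetical stand-in, not the library implementation. Rejecting an empty search string is a choice made by this sketch and is not documented behaviour of StringBuilder.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Appends source to destination, replacing each occurrence of occurrencesOf with `with`.
bool appendReplaceAllSketch(std::string& destination, std::string_view source, std::string_view occurrencesOf,
                            std::string_view with)
{
    if (occurrencesOf.empty())
        return false; // an empty pattern would match at every position
    std::size_t cursor = 0;
    for (;;)
    {
        const std::size_t found = source.find(occurrencesOf, cursor);
        if (found == std::string_view::npos)
            break;
        destination.append(source.substr(cursor, found - cursor)); // text before the match
        destination.append(with);                                  // the replacement
        cursor = found + occurrencesOf.size();
    }
    destination.append(source.substr(cursor)); // remaining tail
    return true;
}
```

The source is scanned once from left to right, so replacements are never searched again for further matches.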

StringBuilder::appendReplaceMultiple

Appends source to destination buffer, replacing multiple substitutions pairs.

Parameters

source	The StringView to be appended
substitutions	For each substitution in the span, the first is searched and replaced with the second.

Returns: true if append succeeded

Example:

    String        buffer(StringEncoding::Utf8);
    StringBuilder sb(buffer);
    SC_TEST_EXPECT(sb.appendReplaceMultiple("asd\\salve\\bas"_u8, {{"asd", "un"}, {"bas", "a_tutti"}, {"\\", "/"}}));
    SC_TEST_EXPECT(buffer == "un/salve/a_tutti");

StringBuilder::appendHex

Appends given binary data escaping it as hexadecimal ASCII characters.

Parameters

data	Binary data to append to destination buffer
casing	Specifies if it should be appended using upper case or lower case

Returns: true if append succeeded

Example:

    uint8_t bytes[4] = {0x12, 0x34, 0x56, 0x78};
 
    String        buffer;
    StringBuilder builder(buffer);
    SC_TEST_EXPECT(builder.appendHex({bytes, sizeof(bytes)}, StringBuilder::AppendHexCase::UpperCase));
    SC_TEST_EXPECT(buffer.view() == "12345678");
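
Hex escaping maps every input byte to two ASCII characters, one per nibble. A small standard C++ sketch of that mapping is shown below. The function toHexSketch is a hypothetical helper, with a bool standing in for StringBuilder::AppendHexCase.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the hexadecimal ASCII representation of size bytes starting at data.
std::string toHexSketch(const std::uint8_t* data, std::size_t size, bool upperCase)
{
    const char* digits = upperCase ? "0123456789ABCDEF" : "0123456789abcdef";
    std::string out;
    out.reserve(size * 2); // two characters per byte
    for (std::size_t i = 0; i < size; ++i)
    {
        out.push_back(digits[data[i] >> 4]);   // high nibble
        out.push_back(digits[data[i] & 0x0F]); // low nibble
    }
    return out;
}
```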

String

A non-modifiable owning string with associated encoding.

SC::String is (currently) implemented as a SC::Vector with the associated string encoding. A SC::StringView can be obtained from it by calling the SC::String::view method, but it's up to the user to make sure that the usage of such SC::StringView doesn't exceed the lifetime of the SC::String it originated from (thankfully, Address Sanitizer will catch the issue if it goes unnoticed).

StringIterator

A position inside a fixed range [start, end) of UTF code points.

It's a range of bytes (start and end pointers) with a current pointer pointing at a specific code point of the range. There are three classes derived from it (SC::StringIteratorASCII, SC::StringIteratorUTF8 and SC::StringIteratorUTF16) and they allow doing operations along the string view in UTF code points.

Note: Code points are not the same as perceived characters (that would be grapheme clusters).

Invariants: start <= end and it >= start and it <= end.

Template Parameters

CharIterator StringIteratorASCII, StringIteratorUTF8 or StringIteratorUTF16

StringFormat

Formats String with a simple DSL embedded in the format string.

This is a small implementation to format using a minimal string-based DSL, but it is good enough for simple usages. It uses {} placeholders and supports positional arguments.
StringFormat::format(output, "{1} {0}", "World", "Hello") is formatted as "Hello World".
Inside the {}, after a colon (:), a specification string can be used to indicate how to format the given value. As the backend for the actual number to string formatting is snprintf, such specification strings are the same as what would be given to snprintf. For example passing "{:02}" is transformed to "%.02f" when passed to snprintf.
{ is escaped if immediately followed by another {. In other words format("{{") will print a single {.

Example:

String        buffer(StringEncoding::Ascii);
StringBuilder builder(buffer);
SC_TEST_EXPECT(builder.format("{1}_{0}_{1}", 1, 0));
SC_TEST_EXPECT(buffer == "0_1_0");
SC_TEST_EXPECT(builder.format("{0:.2}_{1}_{0:.4}", 1.2222, "salve"));
SC_TEST_EXPECT(buffer == "1.22_salve_1.2222");
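
To make the positional placeholder rules concrete, here is a minimal standard C++ sketch that handles only {N}, the automatic {} index and the {{ escape, with arguments already converted to strings. The name formatPositionalSketch is hypothetical, and the ":" specification strings forwarded to snprintf are deliberately left out. This is not how SC::StringFormat is implemented.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Replaces {N} (or {} for the next automatic index) with args[N].
// Returns false on a malformed placeholder or an out-of-range index.
bool formatPositionalSketch(std::string& output, std::string_view fmt, const std::vector<std::string>& args)
{
    output.clear();
    std::size_t nextAutomatic = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        const char c = fmt[i];
        if (c == '{' && i + 1 < fmt.size() && fmt[i + 1] == '{')
        {
            output.push_back('{'); // "{{" prints a single '{'
            ++i;
            continue;
        }
        if (c != '{')
        {
            output.push_back(c); // ordinary character
            continue;
        }
        const std::size_t close = fmt.find('}', i);
        if (close == std::string_view::npos)
            return false; // unterminated placeholder
        const std::string_view spec = fmt.substr(i + 1, close - i - 1);
        std::size_t index = 0;
        if (spec.empty())
        {
            index = nextAutomatic++; // "{}" takes arguments in order
        }
        else
        {
            for (const char digit : spec)
            {
                if (digit < '0' || digit > '9')
                    return false; // only plain indices are supported here
                index = index * 10 + static_cast<std::size_t>(digit - '0');
            }
        }
        if (index >= args.size())
            return false;
        output += args[index];
        i = close; // continue after the closing brace
    }
    return true;
}
```

With this sketch, the format string "{1} {0}" and the arguments "World" and "Hello" produce "Hello World", matching the behaviour described above.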

Note: It's not convenient to use SC::StringFormat directly, as you should probably use SC::StringBuilder

Template Parameters

RangeIterator Type of the specific StringIterator used

StringConverter

Converts String to a different encoding (UTF8, UTF16).

SC::StringConverter converts strings between different UTF encodings and can add null-terminator if requested. When the SC::StringSpan is already null-terminated, the class just forwards the original SC::StringSpan.

Example:

    const char utf8String1[]  = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"; // "日本語" in UTF-8
    const char utf16String1[] = "\xE5\x65\x2C\x67\x9E\x8a";             // "日本語" in UTF-16LE
 
    SmallBuffer<255> buffer;
 
    StringView input, output, expected;
 
    input    = StringView({utf8String1, sizeof(utf8String1) - 1}, false, StringEncoding::Utf8);
    expected = StringView({utf16String1, sizeof(utf16String1) - 1}, false, StringEncoding::Utf16);
    buffer.clear();
    SC_TEST_EXPECT(StringConverter::convertEncodingToUTF16(input, buffer, &output, StringConverter::AddZeroTerminator));
    SC_TEST_EXPECT(output == expected);
 
    input    = StringView({utf16String1, sizeof(utf16String1) - 1}, false, StringEncoding::Utf16);
    expected = StringView({utf8String1, sizeof(utf8String1) - 1}, false, StringEncoding::Utf8);
    buffer.clear();
    SC_TEST_EXPECT(
        StringConverter::convertEncodingToUTF8(input, buffer, &output, StringConverter::DoNotAddZeroTerminator));
    SC_TEST_EXPECT(output == expected);

StringAlgorithms

Algorithms operating on strings (glob / wildcard).

Example

SC_ASSERT(StringAlgorithms::matchWildcard("", ""));
SC_ASSERT(StringAlgorithms::matchWildcard("1?3", "123"));
SC_ASSERT(StringAlgorithms::matchWildcard("1*3", "12223"));
SC_ASSERT(StringAlgorithms::matchWildcard("*2", "12"));
SC_ASSERT(not StringAlgorithms::matchWildcard("*1", "12"));
SC_ASSERT(not StringAlgorithms::matchWildcard("*1", "112"));
SC_ASSERT(not StringAlgorithms::matchWildcard("**1", "112"));
SC_ASSERT(not StringAlgorithms::matchWildcard("*?1", "112"));
SC_ASSERT(StringAlgorithms::matchWildcard("1*", "12123"));
SC_ASSERT(StringAlgorithms::matchWildcard("*/myString", "myString/myString/myString"));
SC_ASSERT(StringAlgorithms::matchWildcard("**/myString", "myString/myString/myString"));
SC_ASSERT(not StringAlgorithms::matchWildcard("*/String", "myString/myString/myString"));
SC_ASSERT(StringAlgorithms::matchWildcard("*/Directory/File.cpp", "/Root/Directory/File.cpp"));
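
The expectations above are consistent with classic glob matching, where ? matches exactly one character and * matches any run of characters (including /). A compact backtracking sketch in standard C++ follows. The name matchWildcardSketch is hypothetical, it works on bytes rather than code points, and it is not claimed to be the algorithm used by StringAlgorithms.

```cpp
#include <cstddef>
#include <string_view>

// Returns true if text matches pattern, where '?' matches one character
// and '*' matches any (possibly empty) run of characters.
bool matchWildcardSketch(std::string_view pattern, std::string_view text)
{
    constexpr std::size_t none = std::string_view::npos;

    std::size_t p = 0, t = 0;
    std::size_t starPattern = none; // position of the last '*' seen in pattern
    std::size_t starText    = 0;    // text position where that '*' started matching
    while (t < text.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            starPattern = p++; // remember the star and first try matching nothing
            starText    = t;
        }
        else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t]))
        {
            ++p;
            ++t;
        }
        else if (starPattern != none)
        {
            p = starPattern + 1; // backtrack: let the last star absorb one more character
            t = ++starText;
        }
        else
        {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p; // trailing stars match the empty remainder
    return p == pattern.size();
}
```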

Console

Writes to console using SC::StringFormat.

Example:

// Create a buffer used for UTF conversions (if necessary)
SmallBuffer<512 * sizeof(native_char_t)> consoleConversionBuffer;
// Construct console with the buffer (assumed constructor taking the conversion buffer)
Console console(consoleConversionBuffer);
String str = StringView("Test Test\n");
// Have fun printing
console.print(str.view());

Path

Parse and compose filesystem paths for windows and posix.

Path::isAbsolute

Checks if a path is absolute.

For example:

Path::isAbsolute("/dirname/basename", Path::AsPosix) == true;        // Posix Absolute
Path::isAbsolute("./dirname/basename", Path::AsPosix) == false;      // Posix Relative
Path::isAbsolute("C:\\dirname\\basename", Path::AsWindows) == true;  // Windows with Drive
Path::isAbsolute("\\\\server\\dir", Path::AsWindows) == true;        // Windows with Network
Path::isAbsolute("\\\\?\\C:\\server\\dir", Path::AsWindows) == true; // Windows with Long
Path::isAbsolute("..\\dirname\\basename", Path::AsWindows) == false; // Windows relative

Parameters

[in]	input	The StringView with path to be parsed. Trailing separators are ignored.
[in]	type	Specify to parse as Windows or Posix path

Returns: true if input is absolute
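
The six cases listed above can be reproduced by a small prefix check. The sketch below is standard C++ with hypothetical names (PathKind, isAbsoluteSketch). Treating only a drive letter followed by a separator, or a leading double backslash, as absolute on Windows is an assumption of this sketch and may not match every case the library handles.

```cpp
#include <string_view>

enum class PathKind
{
    Posix,
    Windows
};

// Returns true if path looks absolute for the given path kind.
bool isAbsoluteSketch(std::string_view path, PathKind kind)
{
    if (kind == PathKind::Posix)
        return !path.empty() && path.front() == '/';

    // Network shares (\\server\dir) and long paths (\\?\C:\dir) start with two backslashes.
    if (path.size() >= 2 && path[0] == '\\' && path[1] == '\\')
        return true;

    // Drive paths look like C:\dir (a forward slash is also accepted here).
    const bool hasDriveLetter =
        !path.empty() && ((path[0] >= 'A' && path[0] <= 'Z') || (path[0] >= 'a' && path[0] <= 'z'));
    return hasDriveLetter && path.size() >= 3 && path[1] == ':' && (path[2] == '\\' || path[2] == '/');
}
```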

Path::dirname

Returns the directory name of a path.

Trailing separators are ignored.

For example:

Path::dirname("/dirname/basename", Path::AsPosix) == "/dirname";
Path::dirname("/dirname/basename//", Path::AsPosix) == "/dirname";
Path::dirname("C:\\dirname\\basename", Path::AsWindows) == "C:\\dirname";
Path::dirname("\\dirname\\basename\\\\", Path::AsWindows) == "\\dirname";

Parameters

[in]	input	The StringView with path to be parsed. Trailing separators are ignored.
[in]	type	Specify to parse as Windows or Posix path
	repeat	how many directory levels should be removed `dirname("/1/2/3/4", repeat=1) == "/1/2"`

Returns: Substring of input holding the directory name

Path::basename

Returns the base name of a path.

Trailing separators are ignored.

For example:

Path::basename("/a/basename", Path::AsPosix) == "basename";
Path::basename("/a/basename//", Path::AsPosix) == "basename";

Parameters

[in]	input	The StringView with path to be parsed. Trailing separators are ignored.
[in]	type	Specify to parse as Windows or Posix path

Returns: Substring of input holding the base name
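
Both dirname and basename return substrings of the input, so no allocation is needed. The following standard C++ sketch shows the idea for Posix paths only. The names posixDirnameSketch and posixBasenameSketch are hypothetical, the repeat parameter is not modelled, and the results for edge cases such as "/" or a path without separators are choices of this sketch rather than documented library behaviour.

```cpp
#include <cstddef>
#include <string_view>

// Removes trailing '/' characters, keeping a lone "/" intact.
std::string_view stripTrailingSeparators(std::string_view path)
{
    while (path.size() > 1 && path.back() == '/')
        path.remove_suffix(1);
    return path;
}

// Returns the part after the last separator, ignoring trailing separators.
std::string_view posixBasenameSketch(std::string_view path)
{
    path = stripTrailingSeparators(path);
    const std::size_t slash = path.rfind('/');
    return slash == std::string_view::npos ? path : path.substr(slash + 1);
}

// Returns the part before the last separator, ignoring trailing separators.
std::string_view posixDirnameSketch(std::string_view path)
{
    path = stripTrailingSeparators(path);
    const std::size_t slash = path.rfind('/');
    if (slash == std::string_view::npos)
        return {}; // no directory part
    return slash == 0 ? path.substr(0, 1) : path.substr(0, slash);
}
```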

Path::parseNameExtension

Splits a StringView of type "name.ext" into "name" and "ext".

Parameters

[in]	input	An input path coded as UTF8 sequence (ex. "name.ext")
[out]	name	Output string holding name ("name" in "name.ext")
[out]	extension	Output string holding extension ("ext" in "name.ext")

Returns: false if both name and extension will be empty after trying to parse them

Example:

    StringView name, ext;
    SC_TEST_EXPECT(Path::parseNameExtension("name.ext", name, ext));
    SC_TEST_EXPECT(name == "name");
    SC_TEST_EXPECT(ext == "ext");
 
    SC_TEST_EXPECT(!Path::parseNameExtension("", name, ext));
    SC_TEST_EXPECT(name.isEmpty());
    SC_TEST_EXPECT(ext.isEmpty());
 
    SC_TEST_EXPECT(!Path::parseNameExtension(".", name, ext));
    SC_TEST_EXPECT(name.isEmpty());
    SC_TEST_EXPECT(ext.isEmpty());
 
    SC_TEST_EXPECT(Path::parseNameExtension(".ext", name, ext));
    SC_TEST_EXPECT(name.isEmpty());
    SC_TEST_EXPECT(ext == "ext");
 
    SC_TEST_EXPECT(Path::parseNameExtension("name.", name, ext));
    SC_TEST_EXPECT(name == "name");
    SC_TEST_EXPECT(ext.isEmpty());
 
    SC_TEST_EXPECT(Path::parseNameExtension("name.name.ext", name, ext));
    SC_TEST_EXPECT(name == "name.name");
    SC_TEST_EXPECT(ext == "ext");
 
    SC_TEST_EXPECT(Path::parseNameExtension("name..", name, ext));
    SC_TEST_EXPECT(name == "name.");
    SC_TEST_EXPECT(ext.isEmpty());
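
All the cases above are consistent with splitting at the last dot. A standard C++ sketch of that rule follows. The name parseNameExtensionSketch is hypothetical, and treating an input without any dot as a name with an empty extension is an assumption not covered by the examples.

```cpp
#include <cstddef>
#include <string_view>

// Splits input at the last '.', writing both halves to the output parameters.
// Returns false if both name and extension end up empty.
bool parseNameExtensionSketch(std::string_view input, std::string_view& name, std::string_view& extension)
{
    const std::size_t dot = input.rfind('.');
    if (dot == std::string_view::npos)
    {
        name      = input; // assumption: no dot means no extension
        extension = {};
    }
    else
    {
        name      = input.substr(0, dot);
        extension = input.substr(dot + 1);
    }
    return !(name.empty() && extension.empty());
}
```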

Path::normalize

Resolves all .. to output a normalized path String.

For example:

Path::normalize("/Users/SC/../Documents/", &path, Path::AsPosix);
SC_ASSERT_RELEASE(path == "/Users/Documents");

Parameters

[out]	output	Reference to String that will receive the normalized Path
	view	The path to be normalized (but it should not be a view() of the output String)
	type	Specify to parse as Windows or Posix path

Returns: true if the Path was successfully parsed and normalized
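
Resolving .. can be done with a stack of components: each ordinary component is pushed, and each .. pops the previous one. The sketch below shows this for Posix paths in standard C++. The name normalizePosixSketch is hypothetical, and the handling of "." components and of leading ".." in relative paths is a choice of this sketch, not documented behaviour of Path::normalize.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Returns path with "." and ".." components resolved and separators collapsed.
std::string normalizePosixSketch(std::string_view path)
{
    const bool absolute = !path.empty() && path.front() == '/';

    std::vector<std::string_view> parts;
    std::size_t begin = 0;
    while (begin <= path.size())
    {
        std::size_t end = path.find('/', begin);
        if (end == std::string_view::npos)
            end = path.size();
        const std::string_view part = path.substr(begin, end - begin);
        if (part == "..")
        {
            if (!parts.empty() && parts.back() != "..")
                parts.pop_back(); // step out of the previous directory
            else if (!absolute)
                parts.push_back(part); // keep leading ".." of relative paths
        }
        else if (!part.empty() && part != ".")
        {
            parts.push_back(part);
        }
        begin = end + 1;
    }

    std::string out = absolute ? "/" : "";
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
        if (i != 0)
            out += '/';
        out += parts[i];
    }
    return out;
}
```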

Path::relativeFromTo

Get relative path that appended to source resolves to destination.

For example:

Path::relativeFromTo("/a/b/1/2/3", "/a/b/d/e", path, Path::AsPosix, Path::AsPosix);
SC_TEST_EXPECT(path == "../../../d/e");

Parameters

[in]	source	The source Path
[in]	destination	The destination Path
[out]	output	The output relative path computed that transforms source into destination
[in]	type	Specify to parse as Windows or Posix path
[in]	outputType	Specify if the output relative path should be formatted as a Posix or Windows path

Returns: true if source and destination paths can be properly parsed as absolute paths
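
A relative path between two absolute paths can be computed by skipping their common leading components, then emitting one .. for every remaining source component followed by the remaining destination components. The standard C++ sketch below does this for Posix paths. The names splitComponents and relativeFromToSketch are hypothetical, and returning "." for identical paths is a choice of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits a path on '/' and drops empty components.
std::vector<std::string_view> splitComponents(std::string_view path)
{
    std::vector<std::string_view> parts;
    std::size_t begin = 0;
    while (begin < path.size())
    {
        std::size_t end = path.find('/', begin);
        if (end == std::string_view::npos)
            end = path.size();
        if (end > begin)
            parts.push_back(path.substr(begin, end - begin));
        begin = end + 1;
    }
    return parts;
}

// Writes to output the relative path leading from source to destination.
// Returns false if either path is not absolute.
bool relativeFromToSketch(std::string_view source, std::string_view destination, std::string& output)
{
    if (source.empty() || destination.empty() || source.front() != '/' || destination.front() != '/')
        return false;
    const std::vector<std::string_view> from = splitComponents(source);
    const std::vector<std::string_view> to   = splitComponents(destination);

    std::size_t common = 0; // number of shared leading components
    while (common < from.size() && common < to.size() && from[common] == to[common])
        ++common;

    output.clear();
    for (std::size_t i = common; i < from.size(); ++i)
    {
        if (!output.empty())
            output += '/';
        output += ".."; // climb out of each remaining source component
    }
    for (std::size_t i = common; i < to.size(); ++i)
    {
        if (!output.empty())
            output += '/';
        output += to[i]; // descend into the destination
    }
    if (output.empty())
        output = "."; // same directory
    return true;
}
```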

Implementation

A design choice of the library is that strings cannot be modified. Strings are either read-only (SC::StringView) or they need to be built from scratch with SC::StringBuilder.

Another design choice is to support different encodings (ASCII, UTF8 or UTF16). ASCII is efficient when it's known that the strings manipulated have code points made of a single byte. UTF8 is useful on Posix platforms, and UTF16 is needed because that's the default encoding used by the Win32 API.

All functions interacting with the filesystem, for example the ones in FileSystem or FileSystemIterator, return strings in the operating system's native encoding. This means that on Windows they are UTF16 strings and on Apple devices (or Linux) they are UTF8.

Roadmap

We need to understand if we want to allow iterating grapheme clusters (perceived end-user 'characters') or advanced capabilities like normalization and uppercase / lowercase conversions. As doing these operations from scratch is non-trivial, we will investigate whether there are OS functions that provide this functionality.

🟦 Complete Features:

UTF Normalization
UTF Case Conversion

💡 Unplanned Features:

UTF word breaking
Grapheme Cluster iteration
