🟩 String formatting / conversion / manipulation (ASCII / UTF8 / UTF16)
The Strings library provides read-only and write string operations and UTF conversions.
Class | Description |
---|---|
SC::String | A non-modifiable owning string with associated encoding. |
SC::StringBuilder | Builds String out of a sequence of StringView or formatting through StringFormat. |
SC::StringConverter | Converts String to a different encoding (UTF8, UTF16). |
SC::StringIterator | A position inside a fixed range [start, end) of UTF code points. |
SC::StringIteratorASCII | A string iterator for ASCII strings. |
SC::StringIteratorUTF8 | A string iterator for UTF8 strings. |
SC::StringIteratorUTF16 | A string iterator for UTF16 strings. |
SC::StringView | Non-owning view over a range of characters with UTF Encoding. |
SC::StringAlgorithms | Algorithms operating on strings (glob / wildcard). |
SC::StringViewTokenizer | Splits a StringView in tokens according to separators. |
SC::StringFormat | Formats String with a simple DSL embedded in the format string. |
SC::Console | Writes to console using SC::StringFormat. |
🟩 Usable
The library is usable and can mix operations on strings that use different encodings.
Non-owning view over a range of characters with UTF Encoding. It additionally holds the SC::StringEncoding information (ASCII, UTF8 or UTF16). During construction the encoding information and the null-termination state must be specified. All methods are const because it's not possible to modify a string with it.
Example (Construct)
Example (Construct from null terminated string)
Check if StringView contains another StringView with compatible encoding.
str | The other StringView to check with current |
Returns true if this StringView contains str.
Example:
Ordering comparison between non-normalized StringView (operates on code points, not on utf graphemes)
other | The string being compared to current one |
Example:
Check if this StringView is equal to other StringView (operates on code points, not on utf graphemes). Returns the number of code points that are the same in both StringView-s.
other | The StringView to be compared to |
commonOverlappingPoints | number of equal code points in both StringView |
Returns true if the two StringViews are equal.
Example:
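The original example is not present here. As a rough illustration of the idea only, the sketch below uses `std::string_view` and assumes ASCII input (one byte per code point). The function name `equalsCountingCommon` is hypothetical and is not part of the library, and counting the common code points from the start of both strings is an assumption about the intended semantics.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the SC API. Assumes ASCII input, so that one byte
// corresponds to one code point.
// Counts how many leading characters are identical in both views and reports
// whether the two views are fully equal.
inline bool equalsCountingCommon(std::string_view a, std::string_view b,
                                 std::size_t& commonOverlappingPoints)
{
    commonOverlappingPoints = 0;
    while (commonOverlappingPoints < a.size() && commonOverlappingPoints < b.size() &&
           a[commonOverlappingPoints] == b[commonOverlappingPoints])
    {
        ++commonOverlappingPoints;
    }
    // Equal only if the common run covers both views entirely.
    return commonOverlappingPoints == a.size() && a.size() == b.size();
}
```

With this sketch, comparing `"hello"` against `"help"` reports three common characters and returns false.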
Check if StringView starts with any utf code point in the given span.
codePoints | The utf code points to check against |
Returns true if this StringView starts with any code point inside codePoints.
Example:
Check if StringView ends with any utf code point in the given span.
codePoints | The utf code points to check against |
Returns true if this StringView ends with any code point inside codePoints.
Example:
Check if StringView starts with another StringView.
str | The other StringView to check with current |
Returns true if this StringView starts with str.
Example:
Check if StringView ends with another StringView.
str | The other StringView to check with current |
Returns true if this StringView ends with str.
Example:
Check if StringView contains given utf code point.
c | The utf code point to check against |
Returns true if this StringView contains code point c.
Get slice [start, end) starting at offset start and ending at end (measured in utf code points).
start | The initial code point where the slice starts |
end | One after the final code point where the slice ends |
Returns the [start, end) StringView slice.
Example:
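The original example is not present here. To illustrate what slicing by code points (rather than bytes) means for UTF8 text, here is a minimal standard-library sketch. The names `utf8SequenceLength`, `byteOffsetOfCodePoint` and `sliceCodePoints` are hypothetical and not part of the library. The sketch assumes well-formed UTF8 and clamps out-of-range indices, which may differ from what SC::StringView does.

```cpp
#include <cstddef>
#include <string_view>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Assumes well-formed input; continuation bytes are not validated.
inline std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;           // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    return 4;                            // 11110xxx
}

// Byte offset of the code point with index `codePoint`, clamped to text.size().
inline std::size_t byteOffsetOfCodePoint(std::string_view text, std::size_t codePoint)
{
    std::size_t offset = 0;
    while (codePoint > 0 && offset < text.size())
    {
        offset += utf8SequenceLength(static_cast<unsigned char>(text[offset]));
        --codePoint;
    }
    return offset < text.size() ? offset : text.size();
}

// Hypothetical helper (not the SC API): slice [start, end) measured in code points.
inline std::string_view sliceCodePoints(std::string_view text, std::size_t start, std::size_t end)
{
    if (end < start) return {};
    const std::size_t first = byteOffsetOfCodePoint(text, start);
    const std::size_t last  = byteOffsetOfCodePoint(text, end);
    return text.substr(first, last - first);
}
```

For an ASCII string every code point is one byte, so the offset lookup is a plain index. For UTF8 the string has to be scanned from the beginning.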
Get slice [start, start+length) starting at offset start and of length code points.
start | The initial code point where the slice starts |
length | The number of code points in the slice |
Returns the [start, start+length) StringView slice.
Example:
Get slice [offset, end] measured in utf code points.
offset | The initial code point where the slice starts |
Returns the [offset, end] StringView slice.
Example:
Get slice [end-offset, end] measured in utf code points.
offset | The initial code point where the slice starts |
Returns the [end-offset, end] StringView slice.
Example:
Returns a shortened StringView removing ending utf code points matching the codePoints span.
codePoints | The span of utf code points to look for |
Example:
Returns a shortened StringView removing starting utf code points matching the codePoints span.
codePoints | The span of utf code points to look for |
Example:
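The original examples for the two trimming operations are not present here. The following sketch shows the same idea with `std::string_view`, assuming ASCII input so that each byte is one code point. The free functions `trimLeading` and `trimTrailing` are hypothetical names and not part of the library.

```cpp
#include <string_view>

// Hypothetical helper (not the SC API): drops leading characters found in `chars`.
inline std::string_view trimLeading(std::string_view text, std::string_view chars)
{
    const auto first = text.find_first_not_of(chars);
    return first == std::string_view::npos ? std::string_view{} : text.substr(first);
}

// Hypothetical helper (not the SC API): drops trailing characters found in `chars`.
inline std::string_view trimTrailing(std::string_view text, std::string_view chars)
{
    const auto last = text.find_last_not_of(chars);
    return last == std::string_view::npos ? std::string_view{} : text.substr(0, last + 1);
}
```

Both functions return a shortened view over the same memory, so nothing is copied or modified, in keeping with the read-only nature of a string view.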
Splits a StringView in tokens according to separators.
Splits the string along a list of separators.
separators | List of separators |
options | If to skip empty tokens or not |
Returns true if there are additional tokens to parse.
Counts the number of tokens that exist in the string view passed in the constructor, when split along the given separators.
separators | Separators to split the original string with |
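To make the effect of the "skip empty tokens" option concrete, here is a minimal standard-library sketch of splitting along a list of single-character separators. The `tokenize` function and the `EmptyTokens` enum are hypothetical names chosen for illustration and are not the SC::StringViewTokenizer interface. ASCII separators are assumed.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical option type (not the SC API).
enum class EmptyTokens { Skip, Include };

// Splits `text` at every character contained in `separators`.
// Tokens are views into `text`; no characters are copied.
inline std::vector<std::string_view> tokenize(std::string_view text, std::string_view separators,
                                              EmptyTokens options)
{
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (start <= text.size())
    {
        std::size_t end = text.find_first_of(separators, start);
        if (end == std::string_view::npos) end = text.size();
        const std::string_view token = text.substr(start, end - start);
        if (!token.empty() || options == EmptyTokens::Include) tokens.push_back(token);
        start = end + 1; // continue after the separator
    }
    return tokens;
}
```

Counting tokens, as described above, then amounts to taking the size of the returned vector.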
Builds String out of a sequence of StringView or formatting through StringFormat. The output can be a SC::Vector (or a SC::SmallVector, see Containers).
Uses StringFormat to format the given StringView against args, replacing destination contents.
Types | Type of Args |
fmt | The format strings |
args | arguments to format |
Returns true if format succeeded.
Uses StringFormat to format the given StringView against args, appending to destination contents.
Types | Type of Args |
fmt | The format strings |
args | arguments to format |
Returns true if format succeeded.
Appends source to destination buffer, replacing every occurrence of the occurrencesOf StringView with the with StringView.
source | The StringView to be appended |
occurrencesOf | The StringView to be searched inside source |
with | The replacement StringView to be written in destination buffer |
Returns true if append succeeded.
Example:
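The original example is not present here. The sketch below shows one way the "append while replacing" behaviour can be expressed with the standard library. The function name `appendReplacing` is hypothetical, and returning false for an empty search string is an assumption of this sketch rather than documented library behaviour.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not the SC API): appends `source` to `destination`,
// replacing every occurrence of `occurrencesOf` with `with`.
inline bool appendReplacing(std::string& destination, std::string_view source,
                            std::string_view occurrencesOf, std::string_view with)
{
    if (occurrencesOf.empty()) return false; // assumption: an empty needle is rejected
    std::size_t position = 0;
    while (true)
    {
        const std::size_t found = source.find(occurrencesOf, position);
        if (found == std::string_view::npos) break;
        destination.append(source.substr(position, found - position)); // text before the match
        destination.append(with);                                      // the replacement
        position = found + occurrencesOf.size();
    }
    destination.append(source.substr(position)); // remaining tail
    return true;
}
```

The source view is never modified. Existing contents of the destination are kept and the transformed text is appended after them.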
Appends source to destination buffer, replacing multiple substitutions pairs.
source | The StringView to be appended |
substitutions | For each substitution in the span, the first is searched and replaced with the second. |
Returns true if append succeeded.
Example:
Appends given binary data escaping it as hexadecimal ASCII characters.
data | Binary data to append to destination buffer |
casing | Specifies if it should be appended using upper case or lower case |
Returns true if append succeeded.
Example:
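The original example is not present here. As an illustration of hexadecimal escaping with a casing option, here is a minimal standard-library sketch. The names `appendHexSketch` and `HexCasing` are hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical option type (not the SC API).
enum class HexCasing { Upper, Lower };

// Appends two hexadecimal ASCII digits per input byte to `destination`.
inline void appendHexSketch(std::string& destination, const std::uint8_t* data, std::size_t size,
                            HexCasing casing)
{
    const char* digits = casing == HexCasing::Upper ? "0123456789ABCDEF" : "0123456789abcdef";
    for (std::size_t i = 0; i < size; ++i)
    {
        destination.push_back(digits[data[i] >> 4]);   // high nibble
        destination.push_back(digits[data[i] & 0x0F]); // low nibble
    }
}
```

Each byte always produces exactly two characters, so the output length is twice the input length.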
A non-modifiable owning string with associated encoding. SC::String is (currently) implemented as a SC::Vector with the associated string encoding. A SC::StringView can be obtained from it by calling the SC::String::view method, but it's up to the user to make sure that the SC::StringView is not used beyond the lifetime of the SC::String it originated from (when enabled, Address Sanitizer can catch the issue if it goes unnoticed).
A position inside a fixed range [start, end) of UTF code points. It's a range of bytes (start and end pointers) with a current pointer pointing at a specific code point of the range. There are three classes derived from it (SC::StringIteratorASCII, SC::StringIteratorUTF8 and SC::StringIteratorUTF16) and they allow doing operations along the string view in UTF code points.
CharIterator | StringIteratorASCII, StringIteratorUTF8 or StringIteratorUTF16 |
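The start / end / current layout described above can be sketched for UTF8 as follows. This is only an illustration under stated assumptions: the struct `Utf8IteratorSketch` and its member names are hypothetical and are not the SC::StringIteratorUTF8 interface. The input is assumed to be well-formed UTF8, and continuation bytes are not validated.

```cpp
#include <cstddef>

// Hypothetical sketch (not the SC API): a byte range with a current position
// that always sits on the first byte of a code point, or at `end`.
struct Utf8IteratorSketch
{
    const char* start;   // first byte of the range
    const char* end;     // one past the last byte
    const char* current; // position of the next code point to read

    Utf8IteratorSketch(const char* rangeStart, const char* rangeEnd)
        : start(rangeStart), end(rangeEnd), current(rangeStart)
    {}

    bool isAtEnd() const { return current >= end; }

    // Decodes the code point at `current` and advances past it.
    bool advanceRead(char32_t& codePoint)
    {
        if (isAtEnd()) return false;
        const unsigned char lead = static_cast<unsigned char>(*current);
        int length = 1;
        codePoint  = lead;
        if ((lead & 0xE0) == 0xC0) { length = 2; codePoint = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { length = 3; codePoint = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { length = 4; codePoint = lead & 0x07; }
        if (end - current < length) return false; // truncated sequence
        for (int k = 1; k < length; ++k)
            codePoint = (codePoint << 6) | (static_cast<unsigned char>(current[k]) & 0x3F);
        current += length;
        return true;
    }

    // Moves `current` back to the first byte of the previous code point.
    bool stepBackward()
    {
        if (current == start) return false;
        do { --current; }
        while (current > start && (static_cast<unsigned char>(*current) & 0xC0) == 0x80);
        return true;
    }
};
```

An ASCII variant would advance by exactly one byte per code point, which is why a dedicated ASCII iterator can be cheaper than the UTF8 one.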
Formats String with a simple DSL embedded in the format string. This is a small implementation to format using a minimal string based DSL, but good enough for simple usages. It uses the {} placeholder syntax and supports positional arguments.
StringFormat::format(output, "{1} {0}", "World", "Hello") is formatted as "Hello World".
Inside the {}, after a colon (:), a specification string can be used to indicate how to format the given value. As the backend for actual number to string formatting is snprintf, such specification strings are the same as what would be given to snprintf. For example passing "{:02}" is transformed to "%.02f" when passed to snprintf.
{ is escaped if found next to another {. In other words format("{{") will print a single {.
Example:
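The original example is not present here. To illustrate how positional {} placeholders and {{ escaping can be resolved, here is a deliberately tiny standard-library sketch. It is not SC::StringFormat: the function `formatSketch` is hypothetical, it accepts only string arguments, and it does not handle the specification string after a colon.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not the SC API). Supports "{}" (next argument),
// "{N}" (positional argument) and "{{" (literal '{').
// Returns std::nullopt on an unterminated placeholder, a non-numeric index
// or an index that is out of range.
inline std::optional<std::string> formatSketch(std::string_view fmt,
                                               const std::vector<std::string_view>& args)
{
    std::string output;
    std::size_t nextAutomatic = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        if (fmt[i] != '{') { output.push_back(fmt[i]); continue; }
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') { output.push_back('{'); ++i; continue; }
        const std::size_t close = fmt.find('}', i);
        if (close == std::string_view::npos) return std::nullopt;
        const std::string_view inside = fmt.substr(i + 1, close - i - 1);
        std::size_t index = 0;
        if (inside.empty())
        {
            index = nextAutomatic++;
        }
        else
        {
            for (char c : inside)
            {
                if (c < '0' || c > '9') return std::nullopt;
                index = index * 10 + static_cast<std::size_t>(c - '0');
            }
        }
        if (index >= args.size()) return std::nullopt;
        output.append(args[index]);
        i = close; // resume after the closing brace
    }
    return output;
}
```

Calling `formatSketch("{1} {0}", {"World", "Hello"})` yields "Hello World", mirroring the positional example given in the text.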
RangeIterator | Type of the specific StringIterator used |
Converts String to a different encoding (UTF8, UTF16). SC::StringConverter converts strings between different UTF encodings and can add null-terminator if requested. When the SC::StringView is already null-terminated, the class just forwards the original SC::StringView.
Example:
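The original example is not present here. As background on what a UTF8 to UTF16 conversion involves, the following standard-library sketch decodes each UTF8 sequence into a code point and re-encodes it as one or two UTF16 code units. The function `utf8ToUtf16` is a hypothetical name and not the SC::StringConverter interface. It rejects truncated sequences and bad lead or continuation bytes, but does not reject overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not the SC API): appends the UTF-16 encoding of
// `source` (UTF-8) to `destination`. Returns false on malformed input.
inline bool utf8ToUtf16(std::string_view source, std::u16string& destination)
{
    std::size_t i = 0;
    while (i < source.size())
    {
        const unsigned char lead = static_cast<unsigned char>(source[i]);
        char32_t    codePoint = 0;
        std::size_t length    = 0;
        if (lead < 0x80) { codePoint = lead; length = 1; }
        else if ((lead & 0xE0) == 0xC0) { codePoint = lead & 0x1F; length = 2; }
        else if ((lead & 0xF0) == 0xE0) { codePoint = lead & 0x0F; length = 3; }
        else if ((lead & 0xF8) == 0xF0) { codePoint = lead & 0x07; length = 4; }
        else return false; // stray continuation byte or invalid lead byte
        if (i + length > source.size()) return false; // truncated sequence
        for (std::size_t k = 1; k < length; ++k)
        {
            const unsigned char next = static_cast<unsigned char>(source[i + k]);
            if ((next & 0xC0) != 0x80) return false; // not a continuation byte
            codePoint = (codePoint << 6) | (next & 0x3F);
        }
        if (codePoint > 0x10FFFF) return false;
        if (codePoint < 0x10000)
        {
            destination.push_back(static_cast<char16_t>(codePoint));
        }
        else
        {
            // Code points above the BMP become a surrogate pair.
            codePoint -= 0x10000;
            destination.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
            destination.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
        }
        i += length;
    }
    return true;
}
```

A null terminator, when needed for an operating system call, would be appended after the conversion. `std::u16string` already guarantees one through `c_str()`.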
Algorithms operating on strings (glob / wildcard).
Example
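The original example is not present here. For readers unfamiliar with glob / wildcard matching, here is a minimal standard-library sketch of the technique. It assumes that `*` matches any run of characters (including none) and `?` matches exactly one character. Whether SC::StringAlgorithms uses exactly these rules is an assumption, and the function name `matchWildcard` is used here only for illustration. ASCII input is assumed.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not the SC API): iterative wildcard matching with
// backtracking to the most recent '*'.
inline bool matchWildcard(std::string_view pattern, std::string_view text)
{
    std::size_t p = 0, t = 0;
    std::size_t starPattern = std::string_view::npos; // pattern index just after the last '*'
    std::size_t starText    = 0;                      // text index the last '*' currently ends at
    while (t < text.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            starPattern = ++p; // remember where to resume in the pattern
            starText    = t;   // '*' initially matches nothing
        }
        else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t]))
        {
            ++p;
            ++t;
        }
        else if (starPattern != std::string_view::npos)
        {
            p = starPattern; // mismatch: let the last '*' absorb one more character
            t = ++starText;
        }
        else
        {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p; // trailing '*' match the empty rest
    return p == pattern.size();
}
```

In this sketch `*` is always treated as a wildcard, so there is no way to match a literal `*` character.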
Writes to console using SC::StringFormat. Example:
A design choice of the library is that strings cannot be modified. Strings are either read-only (SC::StringView) or they need to be built from scratch with SC::StringBuilder. Another design choice is to support different encodings (ASCII, UTF8 or UTF16). The reason is that ASCII is efficient when it's known that the strings being manipulated have code points made of a single byte. UTF8 is useful on Posix platforms and UTF16 is needed because that's the default encoding used by the Win32 API. All functions interacting with the filesystem, for example the ones in FileSystem or FileSystemIterator, return strings in the operating system's native encoding. This means that on Windows they will be UTF16 strings and on Apple devices (or Linux) they are UTF8.
We need to understand if we want to allow iterating grapheme clusters (perceived end-user 'characters') or advanced capabilities like normalization and uppercase / lowercase conversions. As doing these operations from scratch is non-trivial, we will investigate whether there are OS functions that provide that functionality.
🟦 Complete Features:
💡 Unplanned Features: