🟩 String formatting / conversion / manipulation (ASCII / UTF8 / UTF16)
The Strings library allows read-only and write string operations and UTF conversions. Path can parse and manipulate Windows and Posix paths.
Class | Description |
---|---|
SC::String | A non-modifiable owning string with associated encoding. |
SC::StringBuilder | Builds String out of a sequence of StringView or formatting through StringFormat. |
SC::StringConverter | Converts String to a different encoding (UTF8, UTF16). |
SC::StringIterator | A position inside a fixed range [start, end) of UTF code points. |
SC::StringIteratorASCII | A string iterator for ASCII strings. |
SC::StringIteratorUTF8 | A string iterator for UTF8 strings. |
SC::StringIteratorUTF16 | A string iterator for UTF16 strings. |
SC::StringView | Non-owning view over a range of characters with UTF Encoding. |
SC::StringAlgorithms | Algorithms operating on strings (glob / wildcard). |
SC::StringViewTokenizer | Splits a StringView in tokens according to separators. |
SC::StringFormat | Formats String with a simple DSL embedded in the format string. |
SC::Path | Parse and compose filesystem paths for windows and posix. |
SC::Console | Writes to console using SC::StringFormat. |
🟩 Usable
The library is usable and can mix operations on strings held in different encodings.
Some relevant blog posts are:
Non-owning view over a range of characters with UTF Encoding.
It additionally holds the SC::StringEncoding information (ASCII, UTF8 or UTF16). During construction the encoding information and the null-termination state must be specified. All methods are const because it's not possible to modify a string with it.
Example (Construct)
Example (Construct from null terminated string)
Ordering comparison between non-normalized StringView (operates on code points, not on utf graphemes)
other | The string being compared to current one |
Example:
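Comparing by code point means that canonically equivalent strings can still differ. A minimal sketch of that behaviour, assuming C++17 and using std::u32string in place of StringView (compareCodePoints is a hypothetical helper, not part of SC):

```cpp
#include <string>

// Orders two strings by code point value, with no Unicode normalization.
// Returns a negative, zero or positive value (hypothetical helper, not SC API).
inline int compareCodePoints(const std::u32string& a, const std::u32string& b)
{
    return a.compare(b);
}
```

With this helper, U"\u00E9" (precomposed é) and U"e\u0301" (e followed by a combining acute accent) compare as different, even though they render as the same grapheme.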
Check if this StringView is equal to other StringView (operates on code points, not on utf graphemes).
Returns the number of code points that are the same in both StringView-s.
other | The StringView to be compared to |
commonOverlappingPoints | number of equal code points in both StringView |
Returns true if the two StringViews are equal.
Example:
Check if StringView starts with any utf code point in the given span.
codePoints | The utf code points to check against |
Returns true if this StringView starts with any code point inside codePoints.
Example:
Check if StringView ends with any utf code point in the given span.
codePoints | The utf code points to check against |
Returns true if this StringView ends with any code point inside codePoints.
Example:
Check if StringView starts with another StringView.
str | The other StringView to check with current |
Returns true if this StringView starts with str.
Example:
Check if StringView ends with another StringView.
str | The other StringView to check with current |
Returns true if this StringView ends with str.
Example:
Check if StringView contains another StringView with compatible encoding.
str | The other StringView to check with current |
Returns true if this StringView contains str.
Example:
Check if StringView contains given utf code point.
c | The utf code point to check against |
Returns true if this StringView contains code point c.
Get slice [start, end) starting at offset start and ending at end (measured in utf code points).
start | The initial code point where the slice starts |
end | One after the final code point where the slice ends |
Returns the [start, end) StringView slice.
Example:
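Because offsets are measured in code points rather than bytes, slicing a UTF8 string has to walk the encoding. A minimal sketch of the idea, assuming C++17, valid UTF8 input and std::string_view in place of StringView (both helpers are hypothetical, not the SC implementation):

```cpp
#include <cstddef>
#include <string_view>

// Byte offset of the n-th code point in valid UTF-8 text,
// or text.size() when n is past the end (hypothetical helper).
inline std::size_t byteOffsetOfCodePoint(std::string_view text, std::size_t n)
{
    std::size_t i = 0;
    while (i < text.size() && n > 0)
    {
        ++i; // consume the lead byte
        while (i < text.size() && (static_cast<unsigned char>(text[i]) & 0xC0) == 0x80)
            ++i; // skip continuation bytes (10xxxxxx)
        --n;
    }
    return i;
}

// Returns the half-open slice [start, end), measured in code points.
inline std::string_view sliceCodePoints(std::string_view text, std::size_t start, std::size_t end)
{
    if (end < start)
        return {};
    const std::size_t first = byteOffsetOfCodePoint(text, start);
    const std::size_t last  = first + byteOffsetOfCodePoint(text.substr(first), end - start);
    return text.substr(first, last - first);
}
```

The returned view still points into the original bytes, so no copy is made and the source must outlive it.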
Get slice [start, start+length) starting at offset start and of length code points.
start | The initial code point where the slice starts |
length | The number of code points in the slice |
Returns the [start, start+length) StringView slice.
Example:
Get slice [offset, end] measured in utf code points.
offset | The initial code point where the slice starts |
Returns the slice [offset, end].
Example:
Get slice [end-offset, end] measured in utf code points.
offset | The initial code point where the slice starts |
Returns the slice [end-offset, end].
Example:
Returns a shortened StringView removing ending utf code points matching the codePoints span.
codePoints | The span of utf code points to look for |
Example:
Returns a shortened StringView removing starting utf code points matching the codePoints span.
codePoints | The span of utf code points to look for |
Example:
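Both trimming operations only shrink the view and never touch the underlying characters. A minimal ASCII-only sketch, assuming C++17 and std::string_view in place of StringView (trimTrailing and trimLeading are hypothetical names, not the SC API):

```cpp
#include <cstddef>
#include <string_view>

// Drops trailing characters that appear in `chars` (ASCII only, hypothetical helper).
inline std::string_view trimTrailing(std::string_view text, std::string_view chars)
{
    const std::size_t last = text.find_last_not_of(chars);
    return last == std::string_view::npos ? std::string_view{} : text.substr(0, last + 1);
}

// Drops leading characters that appear in `chars` (ASCII only, hypothetical helper).
inline std::string_view trimLeading(std::string_view text, std::string_view chars)
{
    const std::size_t first = text.find_first_not_of(chars);
    return first == std::string_view::npos ? std::string_view{} : text.substr(first);
}
```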
Splits a StringView in tokens according to separators.
Splits the string along a list of separators.
separators | List of separators |
options | Whether to skip empty tokens or not |
Returns true if there are additional tokens to parse.
Count the number of tokens that exist in the string view passed in constructor, when split along the given separators.
separators | Separators to split the original string with |
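The tokenizing loop and the token count can be sketched without the library. The following assumes C++17, single-byte separators and std::string_view in place of StringView; Tokenizer, next and countTokens are hypothetical names chosen for illustration, not the SC interface:

```cpp
#include <cstddef>
#include <string_view>

// Walks a view token by token (hypothetical sketch, not SC::StringViewTokenizer).
struct Tokenizer
{
    std::string_view remaining;        // text not yet consumed
    std::string_view token;            // last token produced by next()
    bool             finished = false; // true once the final token was produced

    // Advances to the next token. Returns false when no tokens are left.
    bool next(std::string_view separators, bool skipEmpty)
    {
        while (!finished)
        {
            const std::size_t pos = remaining.find_first_of(separators);
            if (pos == std::string_view::npos)
            {
                token     = remaining; // last token, no separator follows
                remaining = {};
                finished  = true;
            }
            else
            {
                token = remaining.substr(0, pos);
                remaining.remove_prefix(pos + 1); // drop token and separator
            }
            if (!(skipEmpty && token.empty()))
                return true;
        }
        return false;
    }
};

// Counts tokens by running a fresh tokenizer to the end.
inline std::size_t countTokens(std::string_view text, std::string_view separators, bool skipEmpty)
{
    Tokenizer   tokenizer{text};
    std::size_t count = 0;
    while (tokenizer.next(separators, skipEmpty))
        ++count;
    return count;
}
```

Tokens are views into the original text, so splitting allocates nothing.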
Builds String out of a sequence of StringView or formatting through StringFormat.
The output can be a SC::Buffer or a SC::SmallBuffer (see Foundation)
Uses StringFormat to format the given StringView against args, replacing destination contents.
Types | Type of Args |
fmt | The format strings |
args | arguments to format |
Returns true if format succeeded.
Uses StringFormat to format the given StringView against args, appending to destination contents.
Types | Type of Args |
fmt | The format strings |
args | arguments to format |
Returns true if format succeeded.
Appends source to destination buffer, replacing every occurrence of the occurrencesOf StringView with the with StringView.
source | The StringView to be appended |
occurrencesOf | The StringView to be searched inside source |
with | The replacement StringView to be written in destination buffer |
Returns true if append succeeded.
Example:
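The append-with-replacement behaviour can be sketched as a single left-to-right scan. This assumes C++17 and uses std::string as the destination instead of an SC buffer; appendReplacing is a hypothetical helper, not the SC method:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Appends `source` to `destination`, replacing each occurrence of `from` with `to`
// (hypothetical helper, not SC API).
inline void appendReplacing(std::string& destination, std::string_view source,
                            std::string_view from, std::string_view to)
{
    if (from.empty())
    {
        destination += source; // nothing to search for
        return;
    }
    std::size_t pos = 0;
    while (true)
    {
        const std::size_t hit = source.find(from, pos);
        if (hit == std::string_view::npos)
            break;
        destination += source.substr(pos, hit - pos); // text before the match
        destination += to;                            // the replacement
        pos = hit + from.size();
    }
    destination += source.substr(pos); // remaining tail
}
```

Existing destination contents are preserved, which matches the "append" wording above.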
Appends source to destination buffer, replacing multiple substitutions pairs.
source | The StringView to be appended |
substitutions | For each substitution in the span, the first is searched and replaced with the second. |
Returns true if append succeeded.
Example:
Appends given binary data escaping it as hexadecimal ASCII characters.
data | Binary data to append to destination buffer |
casing | Specifies if it should be appended using upper case or lower case |
Returns true if append succeeded.
Example:
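Hex escaping emits two ASCII digits per input byte. A minimal sketch, assuming std::string as the destination and a boolean in place of the casing option (appendHexSketch is a hypothetical name, not the SC method):

```cpp
#include <cstddef>
#include <string>

// Appends each byte of `data` as two hexadecimal ASCII digits (hypothetical helper).
inline void appendHexSketch(std::string& destination, const unsigned char* data,
                            std::size_t size, bool upperCase)
{
    const char* digits = upperCase ? "0123456789ABCDEF" : "0123456789abcdef";
    for (std::size_t i = 0; i < size; ++i)
    {
        destination += digits[data[i] >> 4];   // high nibble
        destination += digits[data[i] & 0x0F]; // low nibble
    }
}
```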
A non-modifiable owning string with associated encoding.
SC::String is (currently) implemented as a SC::Vector with the associated string encoding. A SC::StringView can be obtained from it by calling the SC::String::view method, but it is up to the user to make sure that the SC::StringView does not outlive the SC::String it originated from (thankfully Address Sanitizer, when enabled, will catch the issue if it goes unnoticed).
A position inside a fixed range [start, end) of UTF code points.
It's a range of bytes (start and end pointers) with a current pointer pointing at a specific code point of the range. There are three classes derived from it (SC::StringIteratorASCII, SC::StringIteratorUTF8 and SC::StringIteratorUTF16) and they allow doing operations along the string view in UTF code points.
CharIterator | StringIteratorASCII, StringIteratorUTF8 or StringIteratorUTF16 |
Formats String with a simple DSL embedded in the format string.
This is a small implementation that formats using a minimal string-based DSL, good enough for simple usages. It uses {} placeholder syntax and supports positional arguments.
StringFormat::format(output, "{1} {0}", "World", "Hello") is formatted as "Hello World".
Inside the {}, after a colon (:), a specification string can be used to indicate how to format the given value. As the backend for the actual number-to-string formatting is snprintf, such specification strings are the same as what would be given to snprintf. For example, passing "{:02}" is transformed to "%.02f" when passed to snprintf.
{ is escaped if it is immediately followed by another {. In other words, format("{{") will print a single {.
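Positional placeholders and {{ escaping can be sketched in a few lines. The following assumes C++17 and arguments that are already strings; it deliberately does not handle the colon specification described above (such placeholders make it return false), and formatPositional is a hypothetical helper, not StringFormat itself:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Replaces {} and {N} placeholders with entries of `args`; "{{" emits one '{'.
// Returns false on an unterminated placeholder, a non-numeric index
// or an index out of range (hypothetical sketch, not SC::StringFormat).
inline bool formatPositional(std::string& out, std::string_view fmt,
                             const std::vector<std::string>& args)
{
    out.clear();
    std::size_t nextAutomatic = 0; // index used by empty {} placeholders
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        if (fmt[i] != '{')
        {
            out += fmt[i];
            continue;
        }
        if (i + 1 < fmt.size() && fmt[i + 1] == '{')
        {
            out += '{'; // escaped brace
            ++i;
            continue;
        }
        const std::size_t close = fmt.find('}', i);
        if (close == std::string_view::npos)
            return false;
        const std::string_view spec  = fmt.substr(i + 1, close - i - 1);
        std::size_t            index = 0;
        if (spec.empty())
            index = nextAutomatic++;
        for (const char digit : spec)
        {
            if (digit < '0' || digit > '9')
                return false;
            index = index * 10 + static_cast<std::size_t>(digit - '0');
        }
        if (index >= args.size())
            return false;
        out += args[index];
        i = close; // continue after the closing brace
    }
    return true;
}
```

Calling it with "{1} {0}" and the arguments "World", "Hello" produces "Hello World", matching the behaviour described above.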
Example:
RangeIterator | Type of the specific StringIterator used |
Converts String to a different encoding (UTF8, UTF16).
SC::StringConverter converts strings between different UTF encodings and can add a null terminator if requested. When the SC::StringSpan is already null-terminated, the class just forwards the original SC::StringSpan.
Example:
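To show what a UTF8 to UTF16 conversion involves, here is a minimal standalone sketch, assuming C++17 and the standard std::u16string as output. It performs only basic validation (overlong forms and encoded surrogates are not rejected), and utf8ToUtf16 is a hypothetical helper, not the SC::StringConverter implementation:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Converts UTF-8 to UTF-16. Returns false on truncated or malformed sequences
// (hypothetical sketch with minimal validation).
inline bool utf8ToUtf16(std::string_view in, std::u16string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t            cp;
        std::size_t         extra; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return false; // stray continuation byte or invalid lead
        if (in.size() - i <= extra)
            return false; // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char byte = static_cast<unsigned char>(in[i + k]);
            if ((byte & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (byte & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            return false;
        if (cp >= 0x10000)
        {
            cp -= 0x10000; // encode as a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            out += static_cast<char16_t>(cp);
        }
    }
    return true;
}
```

Code points above U+FFFF need two UTF16 units, which is why code-point counts and unit counts differ between encodings.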
Algorithms operating on strings (glob / wildcard).
Example
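A common way to implement wildcard matching is a linear scan that backtracks to the most recent star. The sketch below assumes C++17, single-byte characters, and that * matches any run of characters while ? matches exactly one; whether SC::StringAlgorithms supports exactly these wildcards is an assumption, and matchWildcard is a hypothetical name:

```cpp
#include <cstddef>
#include <string_view>

// Returns true if `text` matches `pattern`, where '*' matches any run of
// characters and '?' matches a single character (hypothetical sketch).
inline bool matchWildcard(std::string_view pattern, std::string_view text)
{
    constexpr std::size_t none = std::string_view::npos;
    std::size_t p = 0, t = 0;
    std::size_t starPattern = none; // position of the last '*' seen in pattern
    std::size_t starText    = 0;    // text position that '*' currently extends to
    while (t < text.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            starPattern = p++; // remember the star, try matching zero characters
            starText    = t;
        }
        else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t]))
        {
            ++p;
            ++t;
        }
        else if (starPattern != none)
        {
            p = starPattern + 1; // backtrack: let the star absorb one more character
            t = ++starText;
        }
        else
        {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p; // trailing stars match the empty string
    return p == pattern.size();
}
```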
Writes to console using SC::StringFormat.
Example:
Parse and compose filesystem paths for windows and posix.
Checks if a path is absolute.
For example:
[in] | input | The StringView with path to be parsed. Trailing separators are ignored. |
[in] | type | Specify to parse as Windows or Posix path |
Returns true if input is absolute.
Returns the directory name of a path.
Trailing separators are ignored.
For example:
[in] | input | The StringView with path to be parsed. Trailing separators are ignored. |
[in] | type | Specify to parse as Windows or Posix path |
repeat | how many directory levels should be removed dirname("/1/2/3/4", repeat=1) == "/1/2" |
Returns a substring of input holding the directory name.
Returns the base name of a path.
Trailing separators are ignored.
For example:
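Directory name and base name can both be computed as views into the input, after ignoring trailing separators. A minimal Posix-only sketch, assuming C++17 and std::string_view; the behaviour for inputs without any separator is an assumption, and dirnamePosix and basenamePosix are hypothetical names, not the SC::Path functions:

```cpp
#include <cstddef>
#include <string_view>

// Directory part of a Posix path. `repeat` removes that many additional levels,
// so dirnamePosix("/1/2/3/4", 1) yields "/1/2" (hypothetical sketch).
inline std::string_view dirnamePosix(std::string_view path, int repeat = 0)
{
    for (int level = 0; level <= repeat; ++level)
    {
        while (path.size() > 1 && path.back() == '/')
            path.remove_suffix(1); // ignore trailing separators
        const std::size_t slash = path.rfind('/');
        if (slash == std::string_view::npos)
            return {}; // assumption: no directory component gives an empty view
        path = path.substr(0, slash == 0 ? 1 : slash); // keep the root "/"
    }
    return path;
}

// Last component of a Posix path, ignoring trailing separators (hypothetical sketch).
inline std::string_view basenamePosix(std::string_view path)
{
    while (path.size() > 1 && path.back() == '/')
        path.remove_suffix(1);
    const std::size_t slash = path.rfind('/');
    return slash == std::string_view::npos ? path : path.substr(slash + 1);
}
```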
[in] | input | The StringView with path to be parsed. Trailing separators are ignored. |
[in] | type | Specify to parse as Windows or Posix path |
Returns a substring of input holding the base name.
Splits a StringView of type "name.ext" into "name" and "ext".
[in] | input | An input path coded as UTF8 sequence (ex. "name.ext") |
[out] | name | Output string holding name ("name" in "name.ext") |
[out] | extension | Output string holding extension ("ext" in "name.ext") |
Returns false if both name and extension are empty after trying to parse them.
Example:
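The split can be sketched by searching for the last dot. This assumes C++17 and std::string_view outputs; treating an input without a dot as a name with an empty extension is an assumption, and splitNameExtension is a hypothetical helper, not the SC::Path function:

```cpp
#include <cstddef>
#include <string_view>

// Splits "name.ext" at the last '.' into `name` and `extension`.
// Returns false when both parts end up empty (hypothetical sketch).
inline bool splitNameExtension(std::string_view input, std::string_view& name,
                               std::string_view& extension)
{
    const std::size_t dot = input.rfind('.');
    if (dot == std::string_view::npos)
    {
        name      = input; // assumption: no dot means no extension
        extension = {};
    }
    else
    {
        name      = input.substr(0, dot);
        extension = input.substr(dot + 1);
    }
    return !(name.empty() && extension.empty());
}
```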
Resolves all .. to output a normalized path String.
For example:
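Normalization can be sketched as a stack of path components, popping one entry for every "..". The following is Posix-only and assumes C++17 with std::string as output; dropping "." and empty components, and failing when an absolute path would climb above the root, are assumptions of this sketch rather than documented SC behaviour, and normalizePosix is a hypothetical name:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Resolves ".." components of a Posix path into `output`.
// `path` must not alias `output`, because components are views into `path`.
inline bool normalizePosix(std::string_view path, std::string& output)
{
    const bool                    absolute = !path.empty() && path.front() == '/';
    std::vector<std::string_view> parts;
    std::size_t                   pos = 0;
    while (pos <= path.size())
    {
        std::size_t next = path.find('/', pos);
        if (next == std::string_view::npos)
            next = path.size();
        const std::string_view part = path.substr(pos, next - pos);
        pos = next + 1;
        if (part.empty() || part == ".")
            continue; // assumption: skip duplicate separators and "."
        if (part != "..")
            parts.push_back(part);
        else if (!parts.empty() && parts.back() != "..")
            parts.pop_back(); // ".." cancels the previous component
        else if (absolute)
            return false; // assumption: cannot go above the root
        else
            parts.push_back(part); // keep leading ".." of a relative path
    }
    output = absolute ? "/" : "";
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
        if (i > 0)
            output += '/';
        output += parts[i];
    }
    return true;
}
```

The aliasing restriction in the comment mirrors the note on the view parameter: writing the output while still reading views into it would corrupt the components.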
[out] | output | Reference to String that will receive the normalized Path |
[in] | view | The path to be normalized (but it should not be a view() of the output String) |
[in] | type | Specify to parse as Windows or Posix path |
Returns true if the Path was successfully parsed and normalized.
Get relative path that, appended to source, resolves to destination.
For example:
[in] | source | The source Path |
[in] | destination | The destination Path |
[out] | output | The output relative path computed that transforms source into destination |
[in] | type | Specify to parse as Windows or Posix path |
[in] | outputType | Specify if the output relative path should be formatted as a Posix or Windows path |
Returns true if source and destination paths can be properly parsed as absolute paths.
A design choice of the library is that strings cannot be modified. Strings are either read-only (SC::StringView) or they need to be built from scratch with SC::StringBuilder. Another design choice is to support different encodings (ASCII, UTF8 or UTF16). ASCII is efficient when the strings being manipulated are known to have code points made of a single byte. UTF8 is useful on Posix platforms, and UTF16 is needed because it is the default encoding used by the Win32 API. All functions interacting with the filesystem, for example the ones in FileSystem or FileSystemIterator, return strings in the operating system's native encoding. This means that on Windows they are UTF16 strings and on Apple devices (or Linux) they are UTF8.
We need to understand if we want to allow iterating grapheme clusters (perceived end-user 'characters') or advanced capabilities like normalization and uppercase / lowercase conversions. As doing these operations from scratch is non-trivial, we will investigate whether there are OS functions that provide this functionality.
🟦 Complete Features:
💡 Unplanned Features: