The SaneCppStrings.h library provides read-only and write string operations and UTF conversions. SC::Path can parse and manipulate Windows and POSIX paths.
Note: SC::String and SC::SmallString, the "classic C++ style" string containers, are defined in Memory because they are based on SC::Buffer and use the dynamic memory allocation provided by the Memory library.
Dependencies
- Dependencies: (none)
- All dependencies: (none)
Features
| Class | Description |
|---|---|
| SC::StringBuilder | SC::StringBuilder |
| SC::StringConverter | SC::StringConverter |
| SC::StringIterator | SC::StringIterator |
| SC::StringIteratorASCII | SC::StringIteratorASCII |
| SC::StringIteratorUTF8 | SC::StringIteratorUTF8 |
| SC::StringIteratorUTF16 | SC::StringIteratorUTF16 |
| SC::StringView | SC::StringView |
| SC::StringAlgorithms | SC::StringAlgorithms |
| SC::StringViewTokenizer | SC::StringViewTokenizer |
| SC::StringFormat | SC::StringFormat |
| SC::Path | SC::Path |
| SC::Console | SC::Console |
Status
Usable. The library is usable and can successfully mix operations on strings in different encodings.
Blog
Some relevant blog posts are:
Definition
StringBuilder
StringView
StringView::containsString
StringView::fullyOverlaps
StringView::startsWithAnyOf
StringView::endsWithAnyOf
StringView::startsWith
StringView::endsWith
StringView::containsCodePoint
StringView::sliceStartEnd
StringView::sliceStartLength
StringView::sliceStart
StringView::sliceEnd
StringView::trimEndAnyOf
StringView::trimStartAnyOf
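The slicing and trimming operations named above can be illustrated with a minimal sketch written against `std::string_view`. These free functions are hypothetical stand-ins that only borrow the names from the list; their signatures and exact semantics (half-open ranges, empty view on invalid input, byte offsets) are assumptions and not the actual SC::StringView API.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins built on std::string_view; NOT the SC::StringView API.
// All offsets are in bytes, so this sketch is only meaningful for single-byte (ASCII) text.

// Returns the half-open range [start, end), or an empty view if the range is invalid.
constexpr std::string_view sliceStartEnd(std::string_view text, std::size_t start, std::size_t end)
{
    if (start > end || end > text.size())
        return {};
    return text.substr(start, end - start);
}

// Returns `length` bytes beginning at `start`, or an empty view if the range is invalid.
constexpr std::string_view sliceStartLength(std::string_view text, std::size_t start, std::size_t length)
{
    if (start > text.size() || length > text.size() - start)
        return {};
    return text.substr(start, length);
}

// Drops every leading character that appears in `chars`.
constexpr std::string_view trimStartAnyOf(std::string_view text, std::string_view chars)
{
    const std::size_t first = text.find_first_not_of(chars);
    return first == std::string_view::npos ? std::string_view{} : text.substr(first);
}

// Drops every trailing character that appears in `chars`.
constexpr std::string_view trimEndAnyOf(std::string_view text, std::string_view chars)
{
    const std::size_t last = text.find_last_not_of(chars);
    return last == std::string_view::npos ? std::string_view{} : text.substr(0, last + 1);
}
```

None of these functions copies or modifies the underlying characters: each one returns a narrower view over the same memory, which is what makes read-only string operations cheap.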
StringViewTokenizer
StringViewTokenizer::tokenizeNext
StringViewTokenizer::countTokens
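As a rough illustration of what a view-based tokenizer does, the sketch below splits a `std::string_view` on a single separator character and skips empty tokens. The `Tokenizer` type, its members, and its skip-empty behavior are assumptions made for this example; it mirrors the names listed above but is not the SC::StringViewTokenizer implementation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical tokenizer over std::string_view; NOT the SC::StringViewTokenizer API.
struct Tokenizer
{
    std::string_view remaining;     // text not yet consumed
    std::string_view component;     // last token produced by tokenizeNext
    std::size_t      numTokens = 0; // tokens produced so far

    explicit Tokenizer(std::string_view text) : remaining(text) {}

    // Extracts the next non-empty token delimited by `separator`.
    // Returns false once the input is exhausted.
    bool tokenizeNext(char separator)
    {
        while (!remaining.empty())
        {
            const std::size_t pos = remaining.find(separator);
            if (pos == std::string_view::npos)
            {
                component = remaining;
                remaining = {};
            }
            else
            {
                component = remaining.substr(0, pos);
                remaining.remove_prefix(pos + 1);
            }
            if (!component.empty())
            {
                ++numTokens;
                return true;
            }
        }
        return false;
    }

    // Consumes the rest of the input and reports how many tokens were produced in total.
    std::size_t countTokens(char separator)
    {
        while (tokenizeNext(separator)) {}
        return numTokens;
    }
};
```

Each token is itself a view into the original text, so tokenizing allocates nothing; the caller only has to keep the source string alive while the tokens are in use.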
StringIterator
StringFormat
StringConverter
StringAlgorithms
Console
Path
Path::isAbsolute
Path::dirname
Path::basename
Path::parseNameExtension
Path::normalize
Path::relativeFromTo
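To make the difference between Windows and POSIX path handling concrete, here is a small sketch of `isAbsolute`, `dirname` and `basename` over `std::string_view`. The `path_sketch` namespace, the `PathStyle` enum and all signatures are hypothetical and chosen for this example only; the real SC::Path functions may differ in parameters and edge-case behavior (trailing separators, for instance, are not handled here).

```cpp
#include <cstddef>
#include <string_view>

namespace path_sketch // hypothetical namespace, NOT the SC::Path API
{
enum class PathStyle { Posix, Windows };

// POSIX accepts only '/', while Windows accepts both '/' and '\\'.
constexpr bool isSeparator(char c, PathStyle style)
{
    return c == '/' || (style == PathStyle::Windows && c == '\\');
}

// Index of the last separator in `path`, or npos if there is none.
constexpr std::size_t lastSeparator(std::string_view path, PathStyle style)
{
    for (std::size_t i = path.size(); i > 0; --i)
        if (isSeparator(path[i - 1], style))
            return i - 1;
    return std::string_view::npos;
}

// Everything after the last separator (the whole path if there is no separator).
constexpr std::string_view basename(std::string_view path, PathStyle style)
{
    const std::size_t pos = lastSeparator(path, style);
    return pos == std::string_view::npos ? path : path.substr(pos + 1);
}

// Everything before the last separator; an empty view if there is no separator.
constexpr std::string_view dirname(std::string_view path, PathStyle style)
{
    const std::size_t pos = lastSeparator(path, style);
    if (pos == std::string_view::npos)
        return {};
    return pos == 0 ? path.substr(0, 1) : path.substr(0, pos);
}

constexpr bool isAsciiLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

// POSIX: starts with '/'. Windows: drive root such as "C:\" or a "\\" UNC prefix.
constexpr bool isAbsolute(std::string_view path, PathStyle style)
{
    if (style == PathStyle::Posix)
        return !path.empty() && path[0] == '/';
    const bool driveRoot = path.size() >= 3 && isAsciiLetter(path[0]) && path[1] == ':' &&
                           isSeparator(path[2], style);
    const bool uncRoot = path.size() >= 2 && path[0] == '\\' && path[1] == '\\';
    return driveRoot || uncRoot;
}
} // namespace path_sketch
```

Passing the path style explicitly, rather than inferring it from the host platform, lets the same code parse Windows paths on a POSIX machine and vice versa.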
Implementation
A design choice of the library is that strings cannot be modified in place.
Strings are either read-only (SC::StringView) or must be built from scratch with SC::StringBuilder.
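The split between read-only views and a builder can be sketched with standard library types. In the example below, `Builder` and `replaceAll` are hypothetical names invented for illustration, with `std::string_view` standing in for a read-only string and `std::string` for the destination buffer; this is not how SC::StringBuilder is declared.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical builder: it owns the buffer being written and only exposes append operations.
class Builder
{
  public:
    Builder& append(std::string_view piece)
    {
        buffer.append(piece);
        return *this;
    }

    // Hands the finished text over to the caller, leaving the builder empty.
    std::string finalize() { return std::exchange(buffer, {}); }

  private:
    std::string buffer;
};

// "Replacing" text never mutates `input`: a new string is assembled from read-only slices.
inline std::string replaceAll(std::string_view input, std::string_view from, std::string_view to)
{
    if (from.empty())
        return std::string(input);
    Builder     builder;
    std::size_t cursor = 0;
    for (std::size_t hit = input.find(from); hit != std::string_view::npos; hit = input.find(from, cursor))
    {
        builder.append(input.substr(cursor, hit - cursor)).append(to);
        cursor = hit + from.size();
    }
    builder.append(input.substr(cursor));
    return builder.finalize();
}
```

Because the input is never written to, the same view can be shared freely between callers, and every write goes through one clearly identified object.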
Another design choice is to support different encodings (ASCII, UTF8 or UTF16).
ASCII is efficient when every code point in the strings being manipulated is known to fit in a single byte.
UTF8 is useful on Posix platforms, and UTF16 is needed because it is the default encoding of the Win32 API.
All functions interacting with the filesystem, for example the ones in FileSystem or FileSystemIterator, return strings in the operating system's native encoding.
This means they return UTF16 strings on Windows and UTF8 strings on Apple devices and Linux.
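To show why the encoding matters when crossing between platforms, the following sketch converts UTF-8 text to UTF-16 code units by hand. The function names `decodeUtf8` and `utf8ToUtf16` are hypothetical and are not the SC::StringConverter API; the sketch also assumes well-formed input and performs no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Decodes one code point from well-formed UTF-8 starting at `pos` and advances `pos`.
// Assumption: the input is valid and not truncated; malformed input is out of scope here.
inline char32_t decodeUtf8(std::string_view text, std::size_t& pos)
{
    const auto byte = [&](std::size_t i) { return static_cast<std::uint8_t>(text[i]); };
    const std::uint8_t lead = byte(pos);
    char32_t codePoint = 0;
    if (lead < 0x80) // 1 byte: plain ASCII
    {
        codePoint = lead;
        pos += 1;
    }
    else if ((lead & 0xE0) == 0xC0) // 2 bytes
    {
        codePoint = ((lead & 0x1F) << 6) | (byte(pos + 1) & 0x3F);
        pos += 2;
    }
    else if ((lead & 0xF0) == 0xE0) // 3 bytes
    {
        codePoint = ((lead & 0x0F) << 12) | ((byte(pos + 1) & 0x3F) << 6) | (byte(pos + 2) & 0x3F);
        pos += 3;
    }
    else // 4 bytes
    {
        codePoint = ((lead & 0x07) << 18) | ((byte(pos + 1) & 0x3F) << 12) |
                    ((byte(pos + 2) & 0x3F) << 6) | (byte(pos + 3) & 0x3F);
        pos += 4;
    }
    return codePoint;
}

// Re-encodes UTF-8 text as UTF-16, emitting a surrogate pair for code points above U+FFFF.
inline std::u16string utf8ToUtf16(std::string_view text)
{
    std::u16string out;
    for (std::size_t pos = 0; pos < text.size();)
    {
        const char32_t codePoint = decodeUtf8(text, pos);
        if (codePoint < 0x10000)
        {
            out.push_back(static_cast<char16_t>(codePoint));
        }
        else
        {
            const char32_t offset = codePoint - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

For pure ASCII input every code point takes the single-byte branch, which is the case where treating the string as ASCII avoids any decoding work.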
Roadmap
We need to decide whether to support iterating grapheme clusters (the 'characters' an end user perceives) or advanced capabilities like normalization and uppercase / lowercase conversion. Because implementing these operations from scratch is non-trivial, we will investigate whether there are OS functions that provide this functionality.
Complete Features:
- UTF Normalization
- UTF Case Conversion
💡 Unplanned Features:
- UTF word breaking
- Grapheme Cluster iteration
Statistics
| Type | Lines Of Code | Comments | Sum |
|---|---|---|---|
| Headers | 1012 | 1081 | 2093 |
| Sources | 3115 | 398 | 3513 |
| Sum | 4127 | 1479 | 5606 |