Library

Strings

String formatting / conversion / manipulation (ASCII / UTF8 / UTF16)

The SaneCppStrings.h library provides read-only and write string operations and UTF conversions. Path can parse and manipulate Windows and POSIX paths.

Note: SC::String and SC::SmallString, the "classic C++ style" string containers, are defined in Memory because they are based on SC::Buffer and use the dynamic memory allocation provided by the Memory library.

Dependencies

  • Dependencies: (none)
  • All dependencies: (none)

Dependency Graph

Features

Class
SC::StringBuilder
SC::StringConverter
SC::StringIterator
SC::StringIteratorASCII
SC::StringIteratorUTF8
SC::StringIteratorUTF16
SC::StringView
SC::StringAlgorithms
SC::StringViewTokenizer
SC::StringFormat
SC::Path
SC::Console

Status

Usable: the library is usable and can be used to mix operations on strings in different encodings.

Blog

Some relevant blog posts are:

Definition

StringBuilder

StringView

StringView::containsString

StringView::fullyOverlaps

StringView::startsWithAnyOf

StringView::endsWithAnyOf

StringView::startsWith

StringView::endsWith

StringView::containsCodePoint

StringView::sliceStartEnd

StringView::sliceStartLength

StringView::sliceStart

StringView::sliceEnd

StringView::trimEndAnyOf

StringView::trimStartAnyOf
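
The kind of read-only operations listed above can be sketched with the standard library. This is an analog built on std::string_view, not the SC::StringView API: the free functions below are hypothetical, reuse the names of the headings only for readability, and work on bytes (so they are only correct for ASCII or when prefixes and trim characters are themselves ASCII).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical free functions over std::string_view (byte-wise comparison).
// None of them allocates or modifies the viewed characters.

bool startsWith(std::string_view text, std::string_view prefix)
{
    return text.size() >= prefix.size() && text.substr(0, prefix.size()) == prefix;
}

bool endsWith(std::string_view text, std::string_view suffix)
{
    return text.size() >= suffix.size() &&
           text.substr(text.size() - suffix.size()) == suffix;
}

// Returns the view with any trailing characters contained in `chars` removed.
std::string_view trimEndAnyOf(std::string_view text, std::string_view chars)
{
    const std::size_t last = text.find_last_not_of(chars);
    return last == std::string_view::npos ? std::string_view() : text.substr(0, last + 1);
}

// Returns the view with any leading characters contained in `chars` removed.
std::string_view trimStartAnyOf(std::string_view text, std::string_view chars)
{
    const std::size_t first = text.find_first_not_of(chars);
    return first == std::string_view::npos ? std::string_view() : text.substr(first);
}
```

Every function returns either a boolean or a narrower view over the same memory, which is what makes this style of API safe to use on read-only data.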

StringViewTokenizer

StringViewTokenizer::tokenizeNext

StringViewTokenizer::countTokens
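
As a rough illustration of tokenizing a read-only view, here is a standard-library sketch. It is not the SC::StringViewTokenizer API: splitSkippingEmpty is a hypothetical helper, and skipping empty tokens is an assumption made for the example.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on `separator` and returns views into
// the original characters. Empty tokens are skipped (assumed behavior).
std::vector<std::string_view> splitSkippingEmpty(std::string_view text, char separator)
{
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (start <= text.size())
    {
        std::size_t end = text.find(separator, start);
        if (end == std::string_view::npos)
            end = text.size(); // last token runs to the end of the text
        if (end > start)
            tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}
```

Counting tokens is then just the size of the returned vector; the tokens themselves stay valid only as long as the original text does.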

StringIterator

StringFormat

StringConverter

StringAlgorithms

Console

Path

Path::isAbsolute

Path::dirname

Path::basename

Path::parseNameExtension

Path::normalize

Path::relativeFromTo
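
For readers who know the standard library better, the operations named above have rough std::filesystem counterparts. The sketch below is an analog, not the SC::Path API: the wrapper functions are hypothetical, and std::filesystem parses paths using the host platform's rules, whereas Path is described as handling both Windows and POSIX paths.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical wrappers around std::filesystem. All of them are purely
// lexical: the filesystem is never accessed and symlinks are not resolved.

// Collapses "." and ".." components.
std::string normalizeLexically(const std::string& path)
{
    return fs::path(path).lexically_normal().generic_string();
}

// Expresses `to` relative to the directory `from`.
std::string relativeLexically(const std::string& from, const std::string& to)
{
    return fs::path(to).lexically_relative(from).generic_string();
}

// Directory part of a path.
std::string dirnameOf(const std::string& path)
{
    return fs::path(path).parent_path().generic_string();
}

// Last component of a path.
std::string basenameOf(const std::string& path)
{
    return fs::path(path).filename().generic_string();
}
```

generic_string() is used so that results use forward slashes regardless of the host platform.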

Implementation

A design choice of the library is that strings cannot be modified. Strings are either read-only (SC::StringView) or built from scratch with SC::StringBuilder.

Another design choice is to support different encodings (ASCII, UTF8 or UTF16). ASCII is efficient when the strings being manipulated are known to contain only single-byte code points. UTF8 is useful on POSIX platforms, and UTF16 is needed because it is the default encoding used by the Win32 API. All functions interacting with the filesystem, for example the ones in FileSystem or FileSystemIterator, return strings in the operating system's native encoding: UTF16 on Windows and UTF8 on Apple devices and Linux.
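
To make the encoding trade-off concrete, the sketch below counts code points in UTF8 and UTF16 data using only the standard library. Both helpers are hypothetical (they are not part of the library) and assume well-formed input. In ASCII the number of code points equals the number of bytes, so no scan of this kind is needed, which is the efficiency argument made above.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in well-formed UTF-8 by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
std::size_t countCodePointsUtf8(std::string_view bytes)
{
    std::size_t count = 0;
    for (unsigned char c : bytes)
    {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Hypothetical helper: counts code points in well-formed UTF-16 by skipping
// low surrogates (0xDC00-0xDFFF), the second half of a surrogate pair.
std::size_t countCodePointsUtf16(std::u16string_view units)
{
    std::size_t count = 0;
    for (char16_t u : units)
    {
        if (u < 0xDC00 || u > 0xDFFF)
            ++count;
    }
    return count;
}
```

The same code point can therefore take one to four bytes in UTF8 and one or two 16-bit units in UTF16, which is why a view over string data needs to know its encoding before it can iterate or slice safely.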

Roadmap

We need to decide whether to support iterating grapheme clusters (perceived end-user 'characters') or advanced capabilities like normalization and uppercase / lowercase conversions. As implementing these operations from scratch is non-trivial, we will investigate whether there are OS functions that provide this functionality.

Complete Features:

  • UTF Normalization
  • UTF Case Conversion

💡 Unplanned Features:

  • UTF word breaking
  • Grapheme Cluster iteration

Statistics

Type       Lines Of Code   Comments   Sum
Headers    1012            1081       2093
Sources    3115            398        3513
Sum        4127            1479       5606