Sane C++ Libraries
C++ Platform Abstraction Libraries
Serialization Binary

🟨 Serialize to and from a binary format using Reflection

This is a versioned binary serializer / deserializer built on top of Reflection.
It uses struct member iterators on the Reflection schema to serialize all members, and the recursively Packed property as an optimization that reduces the number of reads / writes (or memcpy calls) needed.

Features

  • No heap allocations
  • Serialize primitive types (Little Endian)
  • Serialize Vector-like types (including SC::Vector, SC::Array, SC::String)
  • Serialize C-Array-like types (T[N])
  • Serialize Structs composed of above types or other structs
  • Optimization for Packed types
  • Optimized fast code path when deserializing data generated with same schema version
  • Automatic versioned deserialization (without losing data) of data generated with a different schema version, handling the following changes:
    • Dropping fields
    • Adding new fields
    • Dropping excess array members
    • Moving fields in structs
    • Integer to / from float conversions

Status

🟨 MVP
Under the described limitations the library should be usable, but it needs more testing and support for all of the relevant additional container data types.

Description

SerializationBinary::write

SC::SerializationBinary::write is used to serialize data. The schema itself is not used at all, but it could be written along with the binary data so that, when reading the data back in a later version of the program, the correct choice can be made between deserializing with SerializationBinary::loadVersioned (slower, but allows for missing fields and conversions) and deserializing with SerializationBinary::loadExact (faster, but the schema must match 1:1).

Template Parameters
  • T: Type of object to be serialized (must be described by Reflection)
Parameters
  • value: The object to be serialized
  • buffer: The buffer that will receive serialized bytes
  • numberOfWrites: If provided, will return the number of serialization operations
Returns
  • true if serialization succeeded
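
The idea of storing schema information next to the binary data can be illustrated with a minimal standalone sketch. It does not use the library: `wrapWithSchemaHash`, `chooseLoadPath`, `payloadOf` and `LoadPath` are hypothetical names, and the 8-byte hash header is an assumed layout. Note that a hash is only enough to pick the path; the versioned path additionally needs the source schema itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper types and functions, not part of the library.
enum class LoadPath
{
    Exact,    // schema hashes match: the fast path can be used
    Versioned // schema hashes differ: the slower versioned path is needed
};

// Prepends an 8-byte schema hash (assumed layout) to the serialized payload.
inline std::vector<std::uint8_t> wrapWithSchemaHash(std::uint64_t schemaHash, const std::vector<std::uint8_t>& payload)
{
    std::vector<std::uint8_t> file(sizeof(schemaHash) + payload.size());
    std::memcpy(file.data(), &schemaHash, sizeof(schemaHash));
    if (!payload.empty())
        std::memcpy(file.data() + sizeof(schemaHash), payload.data(), payload.size());
    return file;
}

// Reads the stored hash and compares it with the hash of the current schema.
// Returns false if the file is too short to contain the header.
inline bool chooseLoadPath(const std::vector<std::uint8_t>& file, std::uint64_t currentSchemaHash, LoadPath& path)
{
    std::uint64_t storedHash = 0;
    if (file.size() < sizeof(storedHash))
        return false;
    std::memcpy(&storedHash, file.data(), sizeof(storedHash));
    path = storedHash == currentSchemaHash ? LoadPath::Exact : LoadPath::Versioned;
    return true;
}

// Returns the payload bytes that follow the hash header.
inline std::vector<std::uint8_t> payloadOf(const std::vector<std::uint8_t>& file)
{
    assert(file.size() >= sizeof(std::uint64_t)); // header must have been validated by chooseLoadPath
    return std::vector<std::uint8_t>(file.begin() + static_cast<std::ptrdiff_t>(sizeof(std::uint64_t)), file.end());
}
```

A caller would run `chooseLoadPath` first and then hand the result of `payloadOf` to either the exact or the versioned deserializer.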

Assuming the following struct:

struct SC::SerializationSuiteTest::PrimitiveStruct
{
    uint8_t arrayValue[4] = {0, 1, 2, 3};
    float   floatValue    = 1.5f;
    int64_t int64Value    = -13;

    bool operator!=(const PrimitiveStruct& other) const
    {
        for (size_t i = 0; i < TypeTraits::SizeOfArray(arrayValue); ++i)
        {
            if (arrayValue[i] != other.arrayValue[i])
                return true;
        }
        if (floatValue != other.floatValue)
            return true;
        if (int64Value != other.int64Value)
            return true;
        return false;
    }
};
SC_REFLECT_STRUCT_VISIT(SC::SerializationSuiteTest::PrimitiveStruct)
SC_REFLECT_STRUCT_FIELD(0, arrayValue)
SC_REFLECT_STRUCT_FIELD(1, floatValue)
SC_REFLECT_STRUCT_FIELD(2, int64Value)
SC_REFLECT_STRUCT_LEAVE()

PrimitiveStruct can be written to a binary buffer with the following code:

PrimitiveStruct objectToSerialize;
Vector buffer;
SC_TRY(SerializerWriter::write(objectToSerialize, buffer));

SerializationBinary::loadExact

SC::SerializationBinary::loadExact can deserialize binary data into a struct whose schema has not changed since SerializationBinary::write was used to generate that binary data. In other words, the schema of the type passed to SerializationBinary::write must match the schema of the type currently being deserialized. If the hashes of the two schemas match, it's possible to use this fast path, which skips all versioning checks.

Template Parameters
  • T: Type of object to be deserialized (must be described by Reflection)
Parameters
  • value: The object to be deserialized
  • buffer: The buffer holding actual bytes for deserialization
  • numberOfReads: If provided, will return the number of deserialization operations
Returns
  • true if deserialization succeeded

Assuming the following structs:

struct SC::SerializationSuiteTest::PrimitiveStruct
{
    uint8_t arrayValue[4] = {0, 1, 2, 3};
    float   floatValue    = 1.5f;
    int64_t int64Value    = -13;

    bool operator!=(const PrimitiveStruct& other) const
    {
        for (size_t i = 0; i < TypeTraits::SizeOfArray(arrayValue); ++i)
        {
            if (arrayValue[i] != other.arrayValue[i])
                return true;
        }
        if (floatValue != other.floatValue)
            return true;
        if (int64Value != other.int64Value)
            return true;
        return false;
    }
};
SC_REFLECT_STRUCT_VISIT(SC::SerializationSuiteTest::PrimitiveStruct)
SC_REFLECT_STRUCT_FIELD(0, arrayValue)
SC_REFLECT_STRUCT_FIELD(1, floatValue)
SC_REFLECT_STRUCT_FIELD(2, int64Value)
SC_REFLECT_STRUCT_LEAVE()
struct SC::SerializationSuiteTest::NestedStruct
{
    int16_t         int16Value = 244;
    PrimitiveStruct structsArray[2];
    double          doubleVal = -1.24;
    Array<int, 7>   arrayInt  = {1, 2, 3, 4, 5, 6};

    bool operator!=(const NestedStruct& other) const
    {
        if (int16Value != other.int16Value)
            return true;
        for (size_t i = 0; i < TypeTraits::SizeOfArray(structsArray); ++i)
            if (structsArray[i] != other.structsArray[i])
                return true;
        if (doubleVal != other.doubleVal)
            return true;
        return false;
    }
};
SC_REFLECT_STRUCT_VISIT(SC::SerializationSuiteTest::NestedStruct)
SC_REFLECT_STRUCT_FIELD(0, int16Value)
SC_REFLECT_STRUCT_FIELD(1, structsArray)
SC_REFLECT_STRUCT_FIELD(2, doubleVal)
SC_REFLECT_STRUCT_LEAVE()
struct SC::SerializationSuiteTest::TopLevelStruct
{
    NestedStruct nestedStruct;

    bool operator!=(const TopLevelStruct& other) const { return nestedStruct != other.nestedStruct; }
};
SC_REFLECT_STRUCT_VISIT(SC::SerializationSuiteTest::TopLevelStruct)
SC_REFLECT_STRUCT_FIELD(0, nestedStruct)
SC_REFLECT_STRUCT_LEAVE()

TopLevelStruct can be serialized and de-serialized with the following code:

TopLevelStruct objectToSerialize;
TopLevelStruct deserializedObject;
// Change a field just as a test
objectToSerialize.nestedStruct.doubleVal = 44.4;
// Serialization
Vector buffer;
SC_TRY(SerializerWriter::write(objectToSerialize, buffer));
// Deserialization
SC_TRY(SerializerReader::loadExact(deserializedObject, buffer.toSpanConst()));
SC_ASSERT_RELEASE(objectToSerialize.nestedStruct.doubleVal == deserializedObject.nestedStruct.doubleVal);

SerializationBinary::loadVersioned

The versioned read serializer SC::SerializationBinary::loadVersioned must be used when source and destination schemas do not match. Compatibility flags can be customized through an SC::SerializationBinaryOptions object, making it possible to remap data coming from an older (or just different) version of the schema to the current one. SerializationBinary::loadVersioned uses the memberTag specified in Reflection to match fields between source and destination schemas. When the types of the matched fields differ, a few options control the behaviour.

Template Parameters
  • T: Type of object to be deserialized (must be described by Reflection)
Parameters
  • value: The object to deserialize
  • buffer: The buffer holding the bytes to be used for deserialization
  • schema: The schema used to serialize data in the buffer
  • numberOfReads: If provided, will return the number of deserialization operations
  • options: Options for data conversion (allow dropping fields, array items etc)
Returns
  • true if deserialization succeeded
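
To make the memberTag matching more concrete, here is a minimal standalone sketch of the idea. It is not the library's implementation: `SourceField`, `DestinationField` and `loadByMemberTag` are hypothetical names, fields are assumed to be same-sized primitives laid out back to back, and type conversions are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified description of one serialized field (not the library's schema type).
struct SourceField
{
    int         memberTag;   // tag assigned when the field was reflected
    std::size_t sizeInBytes; // size of the primitive value inside the buffer
};

// Hypothetical description of one field of the object being filled.
struct DestinationField
{
    int         memberTag;
    void*       address;
    std::size_t sizeInBytes;
};

// Walks the source fields in serialized order and copies each one into the
// destination field carrying the same tag. Source fields without a matching
// tag are skipped only when allowDropExcessStructMembers is true.
// Destination fields that never get matched keep their current value.
inline bool loadByMemberTag(const std::vector<std::uint8_t>& buffer, const std::vector<SourceField>& sourceSchema,
                            const std::vector<DestinationField>& destination, bool allowDropExcessStructMembers)
{
    std::size_t offset = 0;
    for (const SourceField& source : sourceSchema)
    {
        assert(offset <= buffer.size());
        if (source.sizeInBytes > buffer.size() - offset)
            return false; // buffer ends before this field
        const DestinationField* match = nullptr;
        for (const DestinationField& candidate : destination)
        {
            if (candidate.memberTag == source.memberTag)
            {
                match = &candidate;
                break;
            }
        }
        if (match != nullptr)
        {
            if (match->sizeInBytes != source.sizeInBytes)
                return false; // conversions between types are out of scope for this sketch
            std::memcpy(match->address, buffer.data() + offset, source.sizeInBytes);
        }
        else if (!allowDropExcessStructMembers)
        {
            return false; // source field has nowhere to go and dropping is not allowed
        }
        offset += source.sizeInBytes;
    }
    return true;
}
```

Because matching is done by tag and not by position, reordering fields in the destination struct does not affect the result.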


Assuming the following structs:

struct SC::SerializationSuiteTest::VersionedStruct1
{
    float          floatValue     = 1.5f;
    int64_t        fieldToRemove  = 12;
    Vector<String> field2ToRemove = {"ASD1", "ASD2", "ASD3"};
    int64_t        int64Value     = -13;
};
SC_REFLECT_STRUCT_VISIT(SC::SerializationSuiteTest::VersionedStruct1)
SC_REFLECT_STRUCT_FIELD(2, field2ToRemove)
SC_REFLECT_STRUCT_FIELD(0, floatValue)
SC_REFLECT_STRUCT_FIELD(1, fieldToRemove)
SC_REFLECT_STRUCT_FIELD(3, int64Value)
SC_REFLECT_STRUCT_LEAVE()
struct SC::SerializationSuiteTest::VersionedStruct2
{
    int64_t int64Value = 55;
    float   floatValue = -2.9f;

    bool operator!=(const VersionedStruct1& other) const
    {
        if (floatValue != other.floatValue)
            return true;
        if (int64Value != other.int64Value)
            return true;
        return false;
    }
};
SC_REFLECT_STRUCT_VISIT(SC::SerializationSuiteTest::VersionedStruct2)
SC_REFLECT_STRUCT_FIELD(3, int64Value)
SC_REFLECT_STRUCT_FIELD(0, floatValue)
SC_REFLECT_STRUCT_LEAVE()

VersionedStruct2 can be deserialized from VersionedStruct1 in the following way:

constexpr auto schema = SerializerSchemaCompiler::template compile<VersionedStruct1>();
VersionedStruct1 objectToSerialize;
VersionedStruct2 deserializedObject;
// Serialization
Vector buffer;
SC_TRY(SerializerWriter::write(objectToSerialize, buffer));
// Deserialization
SC_TRY(SerializerReader::loadVersioned(deserializedObject, buffer.toSpanConst(), schema.typeInfos));
Note
The versioned serializer is greatly simplified in conjunction with Reflection sorting Packed structs by offsetInBytes.

SerializationBinaryOptions

Conversion options for the binary versioned deserializer.

Option                        Description
allowFloatToIntTruncation     Can truncate a float to get an integer value.
allowDropExcessArrayItems     Can drop array items if destination array is smaller.
allowDropExcessStructMembers  Can drop fields not matching any memberTag in destination struct.
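
As an illustration of how such a flag can gate a lossy conversion, the following standalone sketch models allowDropExcessArrayItems. `copyArrayItems` is a hypothetical helper, and failing when the flag is off is an assumption about the behaviour, not something taken from the library sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies deserialized items into a fixed-size destination array.
// When the source holds more items than the destination can store, the excess
// items are dropped only if allowDropExcessArrayItems is true; otherwise the
// copy is refused (assumed behaviour).
template <typename T, std::size_t N>
bool copyArrayItems(const std::vector<T>& source, T (&destination)[N], bool allowDropExcessArrayItems)
{
    if (source.size() > N && !allowDropExcessArrayItems)
        return false;
    const std::size_t count = std::min(source.size(), N);
    assert(count <= N);
    std::copy_n(source.begin(), count, destination);
    return true; // destination items beyond count keep their previous values
}
```

With a five-item source and a three-item destination, the call succeeds only when the flag is set, and the destination receives the first three items.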

Binary Format

The binary format is defined as follows:

  • Primitive types (int, float etc) are Packed by definition and get dumped to binary stream as is (with their native endian-ness)
  • structs are serialized:
    • If Packed == True: In a single memcpy-dump (as if sorted by offsetInBytes)
    • If Packed == False: Serializing each field sorted by their visit order (and not the memberTag)
  • T[N] arrays (fixed number of elements) are serialized:
    • If item type Packed == True: In a single memcpy-dump
    • If item type Packed == False: Serializing each array item in sequence
  • Vector<T> (variable number of elements) are serialized:
    • Serialize number of elements as a uint64_t
    • If item type Packed == True: In a single memcpy-dump
    • If item type Packed == False: Serializing each vector item in sequence
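
The Vector<T> rule for Packed item types can be sketched outside the library as follows. `writePackedVector` and `readPackedVector` are hypothetical helpers built on std::vector, and std::is_trivially_copyable is used only as a stand-in for the Packed property (it does not check for padding bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: appends the element count as a uint64_t (native
// endianness), then all items in a single contiguous dump.
template <typename T>
void writePackedVector(const std::vector<T>& items, std::vector<std::uint8_t>& out)
{
    static_assert(std::is_trivially_copyable<T>::value, "stand-in check for the Packed property");
    const std::size_t   sizeBefore = out.size();
    const std::uint64_t count      = items.size();
    const auto* countBytes = reinterpret_cast<const std::uint8_t*>(&count);
    out.insert(out.end(), countBytes, countBytes + sizeof(count));
    const auto* itemBytes = reinterpret_cast<const std::uint8_t*>(items.data());
    out.insert(out.end(), itemBytes, itemBytes + items.size() * sizeof(T));
    assert(out.size() == sizeBefore + sizeof(count) + items.size() * sizeof(T));
    (void)sizeBefore;
}

// Hypothetical helper: reads the count, validates it against the bytes that
// are actually available, then restores all items with a single memcpy.
template <typename T>
bool readPackedVector(const std::vector<std::uint8_t>& in, std::vector<T>& items)
{
    static_assert(std::is_trivially_copyable<T>::value, "stand-in check for the Packed property");
    std::uint64_t count = 0;
    if (in.size() < sizeof(count))
        return false;
    std::memcpy(&count, in.data(), sizeof(count));
    const std::size_t available = in.size() - sizeof(count);
    if (count > available / sizeof(T))
        return false; // buffer is too short for the declared number of items
    const std::size_t itemCount = static_cast<std::size_t>(count);
    items.resize(itemCount);
    if (itemCount > 0)
        std::memcpy(items.data(), in.data() + sizeof(count), itemCount * sizeof(T));
    return true;
}
```

For three 32-bit integers this produces 8 bytes of count followed by 12 bytes of items, written with two operations regardless of the number of elements.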

Packed types (optimization)

If a struct, T[N] array or content of Vector<T> is made of a recursively Packed type (i.e. no padding bytes at any level inside the given type) it will be serialized in a single operation.
For types that satisfy the Packed property, this can condense a large number of operations into a single one.

Note
It's possible to static_assert the Packed property of a type, if one wants to be sure not to accidentally introduce padding bytes in serialized types.
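
A compile-time padding check can be sketched in plain C++ as below. This is only an approximation of the idea: `sizeMatches`, `PackedPoint` and `PaddedRecord` are hypothetical names, the member sizes are listed by hand, and the library's own Packed property (computed by Reflection) is not used here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example types (not from the library).
struct PackedPoint
{
    std::int32_t x;
    std::int32_t y;
    float        weight;
};

struct PaddedRecord
{
    std::uint8_t flag;
    std::int64_t value; // on common ABIs padding bytes are inserted before this member
};

// A type has no padding when its size equals the sum of the sizes of its members.
template <typename T>
constexpr bool sizeMatches(std::size_t sumOfMemberSizes)
{
    return sizeof(T) == sumOfMemberSizes;
}

// Fails to compile if someone adds a member that introduces padding bytes.
static_assert(sizeMatches<PackedPoint>(2 * sizeof(std::int32_t) + sizeof(float)),
              "PackedPoint must not contain padding bytes");
```

Placing such an assertion next to a serialized type turns an accidental layout change into a build error instead of a silent loss of the single-operation fast path.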

Blog

Some relevant blog posts are:

Alternative implementation

The binary serializer has an additional parallel implementation, see SerializationBinaryTypeErased in Libraries Extra.
This is just an experiment to check whether more runtime code and less compile-time code can further bring down compile times, but a proper benchmark for it still needs to be built.

Roadmap

The binary serializer is not a streaming one, so loading a large data structure will at some point need double the required space in memory.
This can be solved by implementing a streaming binary serializer or by experimenting with memory mapped files to support really large binary data structures.

🟩 Usable

🟦 Complete Features:

  • Streaming serializer

💡 Unplanned Features:

  • None so far