Sane C++ Libraries
C++ Platform Abstraction Libraries
Reflection

🟩 Describe C++ types at compile time for serialization

Reflection generates compile time information about the fields in a structure or class.
Typically this library is used with one of the serialization libraries (Serialization Binary or Serialization Text).

Note
Reflection uses more complex C++ constructs than the other libraries in this repository. To limit the impact, effort has been spent on avoiding obscure C++ meta-programming techniques. The library uses only template partial specialization and constexpr.

Features

  • Reflection info is built at compile time
  • Free of heap allocations
  • Describe primitive types
  • Describe C-Arrays
  • Describe SC::Vector, SC::VectorMap, SC::Array, SC::String
  • Describe Structs composition of any supported type
  • Identify types that can be serialized with a single memcpy

Status

🟩 Usable
Within the described limitations, the library should be usable.

Roadmap

🟦 Complete Features:

  • To be decided

💡 Unplanned Features:

  • None so far

Description

The main target use case is generating reflection information for automatic serialization. There are many rules and limitations so far; the main one is that references and pointers of any kind are not supported. The output of the process is a schema of the reflected type. This schema is an array of SC::Reflection::TypeInfo entries, each tracking the type of a field and its offset (in bytes) within the structure it belongs to.
Fields of non-primitive type (other structs, for example) carry a link index that points to where that type is described elsewhere in the schema.

Packed attribute

The schema contains information about all the types of all fields of the structure and the packing state.
A packed struct is made only of primitive types, all described to the Reflection system, laid out so that the struct contains no padding bytes.
Example of a packed struct:

struct Vec3
{
float x;
float y;
float z;
};

Example of a non-packed struct:

struct Vec3
{
float x;
uint16_t y;
// we have 2 bytes of padding between y and z
float z;
};
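
The difference between the two layouts can be checked by comparing sizeof against the sum of the member sizes. The following is a minimal standalone sketch that does not use the library: PackedVec3, PaddedVec3 and isPacked are hypothetical names, and the expected sizes assume a typical ABI where float is 4 bytes with 4-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the packed and non-packed examples.
struct PackedVec3
{
    float x;
    float y;
    float z;
};

struct PaddedVec3
{
    float         x;
    std::uint16_t y; // followed by padding so that z is aligned
    float         z;
};

// A struct has no padding bytes when its size equals the sum of its member sizes.
template <typename T>
constexpr bool isPacked(std::size_t sumOfMemberSizes)
{
    return sizeof(T) == sumOfMemberSizes;
}

static_assert(isPacked<PackedVec3>(3 * sizeof(float)), "no padding expected");
static_assert(!isPacked<PaddedVec3>(2 * sizeof(float) + sizeof(std::uint16_t)), "padding expected");
```

The library computes the packed state from the reflected schema; the sketch only illustrates what the property means for the memory layout.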

A recursively packed struct is a struct made of other structs or arrays of structs without any padding bytes inside of themselves. Example of recursively packed struct:

struct ArrayOfVec3
{
Vec3 array[10];
};
// Named differently from Vec3 to avoid redefining the struct used by ArrayOfVec3
struct NumberAndArrayOfVec3
{
int32_t someNumber;
ArrayOfVec3 array;
};

The recursively packed property allows binary serializers and deserializers (for example Serialization Binary) to optimize reading and writing with a single memcpy.

Note
This means that serializers like Serialization Binary will not invoke the type's constructor when deserializing a Packed type, as all members are explicitly written by serialization.
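
As a rough illustration of why the packed property matters, the sketch below round-trips a recursively packed struct through a byte buffer with one memcpy in each direction. It is standalone C++ and not the Serialization Binary implementation; Point3, Polyline, writePacked and readPacked are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical recursively packed types: only floats, no padding anywhere.
struct Point3
{
    float x;
    float y;
    float z;
};

struct Polyline
{
    Point3 points[4];
};

static_assert(std::is_trivially_copyable<Polyline>::value, "memcpy requires a trivially copyable type");
static_assert(sizeof(Polyline) == 12 * sizeof(float), "no padding expected");

// Writes the whole object with a single memcpy and returns the number of bytes written.
inline std::size_t writePacked(const Polyline& value, unsigned char* buffer)
{
    std::memcpy(buffer, &value, sizeof(Polyline));
    return sizeof(Polyline);
}

// Reads the whole object back with a single memcpy; no constructor is invoked.
inline void readPacked(const unsigned char* buffer, Polyline& value)
{
    std::memcpy(&value, buffer, sizeof(Polyline));
}
```

Since there are no padding bytes, every byte written carries member data, which is what makes the bulk copy equivalent to writing the members one by one.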

How to use it

Describing a structure is done externally to the struct itself, specializing a SC::Reflection::Reflect<> struct.

For example:

struct TestNamespace::SimpleStructure
{
SC::uint8_t f0 = 0;
SC::uint16_t f1 = 1;
SC::uint32_t f2 = 2;
SC::uint64_t f3 = 3;
SC::int8_t f4 = 4;
SC::int16_t f5 = 5;
SC::int32_t f6 = 6;
SC::int64_t f7 = 7;
float f8 = 8;
double f9 = 9;
int arrayOfInt[3] = {1, 2, 3};
};
namespace SC
{
namespace Reflection
{
template <>
struct Reflect<TestNamespace::SimpleStructure> : ReflectStruct<TestNamespace::SimpleStructure>
{
template <typename Visitor>
static constexpr bool visit(Visitor&& visitor)
{
return visitor(0, &T::f0, "f0", SC_COMPILER_OFFSETOF(T, f0)) and //
visitor(1, &T::f1, "f1", SC_COMPILER_OFFSETOF(T, f1)) and //
visitor(2, &T::f2, "f2", SC_COMPILER_OFFSETOF(T, f2)) and //
visitor(3, &T::f3, "f3", SC_COMPILER_OFFSETOF(T, f3)) and //
visitor(4, &T::f4, "f4", SC_COMPILER_OFFSETOF(T, f4)) and //
visitor(5, &T::f5, "f5", SC_COMPILER_OFFSETOF(T, f5)) and //
visitor(6, &T::f6, "f6", SC_COMPILER_OFFSETOF(T, f6)) and //
visitor(7, &T::f7, "f7", SC_COMPILER_OFFSETOF(T, f7)) and //
visitor(8, &T::f8, "f8", SC_COMPILER_OFFSETOF(T, f8)) and //
visitor(9, &T::f9, "f9", SC_COMPILER_OFFSETOF(T, f9)) and //
visitor(10, &T::arrayOfInt, "arrayOfInt", SC_COMPILER_OFFSETOF(T, arrayOfInt));
}
};
} // namespace Reflection
} // namespace SC
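
To show what a visitor receives, here is a minimal standalone sketch of the same pattern written without the library. PairReflect and SizeVisitor are hypothetical, the standard offsetof macro stands in for SC_COMPILER_OFFSETOF, and C++14 constexpr is assumed.

```cpp
#include <cassert>
#include <cstddef>

struct Pair
{
    int    first;
    double second;
};

// Hypothetical equivalent of a Reflect specialization for Pair.
struct PairReflect
{
    using T = Pair;

    template <typename Visitor>
    static constexpr bool visit(Visitor&& visitor)
    {
        return visitor(0, &T::first, "first", offsetof(T, first)) &&
               visitor(1, &T::second, "second", offsetof(T, second));
    }
};

// Visitor that counts the members and sums their sizes.
struct SizeVisitor
{
    int         numMembers = 0;
    std::size_t totalSize  = 0;

    template <typename Class, typename Member>
    constexpr bool operator()(int, Member Class::*, const char*, std::size_t)
    {
        numMembers += 1;
        totalSize += sizeof(Member); // member type is deduced from the pointer to member
        return true;                 // returning false stops the visit early
    }
};

// Runs the visit entirely at compile time.
constexpr SizeVisitor visitPair()
{
    SizeVisitor visitor{};
    PairReflect::visit(visitor);
    return visitor;
}

static_assert(visitPair().numMembers == 2, "both members are visited");
```

Each call hands the visitor the tag, the pointer to member, the name and the byte offset, so different visitors can extract different information from the same description.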

Struct member info

  • These fields are required for binary and text serialization with Versioning support
    • MemberTag (integer)
    • Pointer to Member
    • Field Name (string)
    • Field Byte Offset in its parent struct
  • This means being able to deserialize data from an older version of the program:
    • For Binary Formats: retaining data in struct members with matching MemberTag
    • For Textual Formats: retaining data in struct members with matching Field Name
    • Specifying both of them allows refactoring the names of C++ struct members without breaking serialization formats
  • The Field Byte Offset is necessary to generate a unique versioning signature of a given Struct
  • The Pointer to Member allows serializing / deserializing without reinterpret_cast<> (we could use Field Byte Offset as an alternative)
Note
Additional considerations regarding the level of repetition:
  • There are techniques to get the field name as a string from a member pointer on all compilers, but they all require C++20 or later.
  • There are techniques to get the compile-time offset of a field from a member pointer, but they are complex and increase compile time unnecessarily.
  • We could hash the Field Name to obtain MemberTag but an explicit integer has been preferred to allow breaking textual formats and binary formats independently.
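
The role of MemberTag in versioning can be sketched as follows. This is a simplified standalone illustration, not the Serialization Binary format: StoredField, Box and loadBox are hypothetical, and every field is assumed to be a 32-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of one field as found in an older binary file.
struct StoredField
{
    std::uint16_t memberTag;
    std::int32_t  value;
};

// Current version of the struct. Compared to the old version:
// - tag 0 was renamed from "width" to "extent" (the tag did not change)
// - tag 1 ("height") was removed
// - tag 2 ("depth") was added
struct Box
{
    std::int32_t extent = -1; // tag 0
    std::int32_t depth  = -1; // tag 2
};

// Retains data only for members whose tag still exists; returns how many matched.
inline int loadBox(const StoredField* fields, int numFields, Box& box)
{
    int matched = 0;
    for (int i = 0; i < numFields; ++i)
    {
        switch (fields[i].memberTag)
        {
        case 0: box.extent = fields[i].value; ++matched; break;
        case 2: box.depth = fields[i].value; ++matched; break;
        default: break; // the member no longer exists, its data is dropped
        }
    }
    return matched;
}
```

Renaming a member leaves the tag untouched, so old binary data still lands in the right place, while members missing from the file keep their default values.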

Reflection Macros

Some handy macros save typing and are generally preferable to writing the specialization by hand.

SC_REFLECT_STRUCT_VISIT(TestNamespace::SimpleStructure)
SC_REFLECT_STRUCT_FIELD(0, f0)
SC_REFLECT_STRUCT_FIELD(1, f1)
SC_REFLECT_STRUCT_FIELD(2, f2)
SC_REFLECT_STRUCT_FIELD(3, f3)
SC_REFLECT_STRUCT_FIELD(4, f4)
SC_REFLECT_STRUCT_FIELD(5, f5)
SC_REFLECT_STRUCT_FIELD(6, f6)
SC_REFLECT_STRUCT_FIELD(7, f7)
SC_REFLECT_STRUCT_FIELD(8, f8)
SC_REFLECT_STRUCT_FIELD(9, f9)
SC_REFLECT_STRUCT_FIELD(10, arrayOfInt);
SC_REFLECT_STRUCT_LEAVE()
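
For intuition, macros of this kind can be built by opening a specialization, chaining one visitor call per field, and then closing everything. The sketch below uses hypothetical DEMO_ macros and a hypothetical Describe template; it is not the library's actual macro definitions, only one plausible way to get the same shape.

```cpp
#include <cassert>
#include <cstddef>

template <typename T>
struct Describe; // hypothetical primary template, specialized per type

// Opens the specialization and starts the boolean chain.
#define DEMO_REFLECT_VISIT(Type)                       \
    template <>                                        \
    struct Describe<Type>                              \
    {                                                  \
        using T = Type;                                \
        template <typename Visitor>                    \
        static constexpr bool visit(Visitor&& visitor) \
        {                                              \
            return true

// Appends one visitor call; #Member turns the member name into a string literal.
#define DEMO_REFLECT_FIELD(Tag, Member) &&visitor(Tag, &T::Member, #Member, offsetof(T, Member))

// Terminates the return statement and closes the function and the struct.
#define DEMO_REFLECT_LEAVE() \
    ;                        \
    }                        \
    }                        \
    ;

struct Color
{
    unsigned char r, g, b;
};

DEMO_REFLECT_VISIT(Color)
DEMO_REFLECT_FIELD(0, r)
DEMO_REFLECT_FIELD(1, g)
DEMO_REFLECT_FIELD(2, b)
DEMO_REFLECT_LEAVE()

// Visitor that only counts how many fields were described.
struct CountVisitor
{
    int count = 0;

    template <typename Member>
    constexpr bool operator()(int, Member, const char*, std::size_t)
    {
        ++count;
        return true;
    }
};

inline int countFields()
{
    CountVisitor visitor{};
    Describe<Color>::visit(visitor);
    return visitor.count;
}
```

The field name is written once and reused for the member pointer, the string and the offset, which removes most of the repetition of the handwritten form.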

Example (print schema)

To better understand how the Serialization library can use this information, let's try to print the schema.
The compile time flat schema can be obtained by calling SC::Reflection::Schema::compile:

using namespace SC;
using namespace SC::Reflection;
constexpr auto SimpleStructureFlatSchema = Schema::compile<TestNamespace::SimpleStructure>();

For example we could print the schema with the following code:

// Copyright (c) Stefano Cristiano
// SPDX-License-Identifier: MIT
#include "../../Strings/Console.h"
#include "../../Strings/String.h"
#include "../../Strings/StringBuilder.h"
#include "../Reflection.h"
namespace SC
{
inline const StringView typeCategoryToStringView(Reflection::TypeCategory type)
{
switch (type)
{
case Reflection::TypeCategory::TypeInvalid: return "TypeInvalid ";
case Reflection::TypeCategory::TypeUINT8: return "TypeUINT8 ";
case Reflection::TypeCategory::TypeUINT16: return "TypeUINT16 ";
case Reflection::TypeCategory::TypeUINT32: return "TypeUINT32 ";
case Reflection::TypeCategory::TypeUINT64: return "TypeUINT64 ";
case Reflection::TypeCategory::TypeINT8: return "TypeINT8 ";
case Reflection::TypeCategory::TypeINT16: return "TypeINT16 ";
case Reflection::TypeCategory::TypeINT32: return "TypeINT32 ";
case Reflection::TypeCategory::TypeINT64: return "TypeINT64 ";
case Reflection::TypeCategory::TypeFLOAT32: return "TypeFLOAT32 ";
case Reflection::TypeCategory::TypeDOUBLE64: return "TypeDOUBLE64";
case Reflection::TypeCategory::TypeStruct: return "TypeStruct ";
case Reflection::TypeCategory::TypeArray: return "TypeArray ";
case Reflection::TypeCategory::TypeVector: return "TypeVector ";
}
Assert::unreachable();
}
template <int NUM_TYPES>
inline void printFlatSchema(Console& console, const Reflection::TypeInfo (&type)[NUM_TYPES],
const Reflection::TypeStringView (&names)[NUM_TYPES])
{
String buffer(StringEncoding::Ascii);
int typeIndex = 0;
while (typeIndex < NUM_TYPES)
{
StringBuilder builder(buffer, StringBuilder::Clear);
typeIndex += printTypes(builder, typeIndex, type + typeIndex, names + typeIndex) + 1;
console.print(buffer.view());
}
}
inline int printTypes(StringBuilder& builder, int typeIndex, const Reflection::TypeInfo* types,
const Reflection::TypeStringView* typeNames)
{
const StringView typeName({typeNames[0].data, typeNames[0].length}, false, StringEncoding::Ascii);
builder.append("[{:02}] {}", typeIndex, typeName);
switch (types[0].type)
{
case Reflection::TypeCategory::TypeStruct:
builder.append(" (Struct with {} members - Packed = {})", types[0].getNumberOfChildren(),
types[0].structInfo.isPacked ? "true" : "false");
break;
case Reflection::TypeCategory::TypeArray:
builder.append(" (Array of size {} with {} children - Packed = {})", types[0].arrayInfo.numElements,
types[0].getNumberOfChildren(), types[0].arrayInfo.isPacked ? "true" : "false");
break;
case Reflection::TypeCategory::TypeVector:
builder.append(" (Vector with {} children)", types[0].getNumberOfChildren());
break;
default: break;
}
builder.append("\n{\n");
for (int idx = 0; idx < types[0].getNumberOfChildren(); ++idx)
{
const Reflection::TypeInfo& field = types[idx + 1];
builder.append("[{:02}] ", typeIndex + idx + 1);
const StringView fieldName({typeNames[idx + 1].data, typeNames[idx + 1].length}, false, StringEncoding::Ascii);
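// Only members of a struct carry a byte offset; children of arrays and vectors do not
if (types[0].type == Reflection::TypeCategory::TypeStruct)
{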
builder.append("Type={}\tOffset={}\tSize={}\tName={}", typeCategoryToStringView(field.type),
field.memberInfo.offsetInBytes, field.sizeInBytes, fieldName);
}
else
{
builder.append("Type={}\t \tSize={}\tName={}", typeCategoryToStringView(field.type),
field.sizeInBytes, fieldName);
}
if (field.hasValidLinkIndex())
{
builder.append("\t[LinkIndex={}]", field.getLinkIndex());
}
builder.append("\n");
}
builder.append("}\n");
return types[0].getNumberOfChildren();
}
} // namespace SC

Called with the following code:

printFlatSchema(report.console, SimpleStructureFlatSchema.typeInfos.values, SimpleStructureFlatSchema.typeNames.values);

It will print the following output for the above struct:

[00] TestNamespace::SimpleStructure (Struct with 11 members - Packed = false)
{
[01] Type=TypeUINT8 Offset=0 Size=1 Name=f0
[02] Type=TypeUINT16 Offset=2 Size=2 Name=f1
[03] Type=TypeUINT32 Offset=4 Size=4 Name=f2
[04] Type=TypeUINT64 Offset=8 Size=8 Name=f3
[05] Type=TypeINT8 Offset=16 Size=1 Name=f4
[06] Type=TypeINT16 Offset=18 Size=2 Name=f5
[07] Type=TypeINT32 Offset=20 Size=4 Name=f6
[08] Type=TypeINT64 Offset=24 Size=8 Name=f7
[09] Type=TypeFLOAT32 Offset=32 Size=4 Name=f8
[10] Type=TypeDOUBLE64 Offset=40 Size=8 Name=f9
[11] Type=TypeArray Offset=48 Size=12 Name=arrayOfInt [LinkIndex=12]
}
[12] Array (Array of size 3 with 1 children - Packed = true)
{
[13] Type=TypeINT32 Size=4 Name=int
}

Another example with a more complex structure building on top of the simple one:

struct TestNamespace::IntermediateStructure
{
SC::Vector<int> vectorOfInt;
SimpleStructure simpleStructure;
};
SC_REFLECT_STRUCT_VISIT(TestNamespace::IntermediateStructure)
SC_REFLECT_STRUCT_FIELD(1, vectorOfInt)
SC_REFLECT_STRUCT_FIELD(0, simpleStructure)
SC_REFLECT_STRUCT_LEAVE()
struct TestNamespace::ComplexStructure
{
SC::uint8_t f1 = 0;
SimpleStructure simpleStructure;
SimpleStructure simpleStructure2;
SC::uint16_t f4 = 0;
IntermediateStructure intermediateStructure;
SC::Vector<SimpleStructure> vectorOfStructs;
};
SC_REFLECT_STRUCT_VISIT(TestNamespace::ComplexStructure)
SC_REFLECT_STRUCT_FIELD(0, f1)
SC_REFLECT_STRUCT_FIELD(1, simpleStructure)
SC_REFLECT_STRUCT_FIELD(2, simpleStructure2)
SC_REFLECT_STRUCT_FIELD(3, f4)
SC_REFLECT_STRUCT_FIELD(4, intermediateStructure)
SC_REFLECT_STRUCT_FIELD(5, vectorOfStructs)
SC_REFLECT_STRUCT_LEAVE()

Printing the schema of ComplexStructure outputs the following:

Note
Packed structs will get their members sorted by offsetInBytes.
For regular structs, they are left in the same order as the visit sequence.
This allows some substantial simplifications in Serialization Binary implementation.
[00] TestNamespace::ComplexStructure (Struct with 6 members - Packed = false)
{
[01] Type=TypeUINT8 Offset=0 Size=1 Name=f1
[02] Type=TypeStruct Offset=8 Size=64 Name=simpleStructure [LinkIndex=7]
[03] Type=TypeStruct Offset=72 Size=64 Name=simpleStructure2 [LinkIndex=7]
[04] Type=TypeUINT16 Offset=136 Size=2 Name=f4
[05] Type=TypeStruct Offset=144 Size=72 Name=intermediateStructure [LinkIndex=19]
[06] Type=TypeVector Offset=216 Size=8 Name=vectorOfStructs [LinkIndex=22]
}
[07] TestNamespace::SimpleStructure (Struct with 11 members - Packed = false)
{
[08] Type=TypeUINT8 Offset=0 Size=1 Name=f0
[09] Type=TypeUINT16 Offset=2 Size=2 Name=f1
[10] Type=TypeUINT32 Offset=4 Size=4 Name=f2
[11] Type=TypeUINT64 Offset=8 Size=8 Name=f3
[12] Type=TypeINT8 Offset=16 Size=1 Name=f4
[13] Type=TypeINT16 Offset=18 Size=2 Name=f5
[14] Type=TypeINT32 Offset=20 Size=4 Name=f6
[15] Type=TypeINT64 Offset=24 Size=8 Name=f7
[16] Type=TypeFLOAT32 Offset=32 Size=4 Name=f8
[17] Type=TypeDOUBLE64 Offset=40 Size=8 Name=f9
[18] Type=TypeArray Offset=48 Size=12 Name=arrayOfInt [LinkIndex=24]
}
[19] TestNamespace::IntermediateStructure (Struct with 2 members - Packed = false)
{
[20] Type=TypeVector Offset=0 Size=8 Name=vectorOfInt [LinkIndex=26]
[21] Type=TypeStruct Offset=8 Size=64 Name=simpleStructure [LinkIndex=7]
}
[22] SC::Vector (Vector with 1 children)
{
[23] Type=TypeStruct Size=64 Name=TestNamespace::SimpleStructure [LinkIndex=7]
}
[24] Array (Array of size 3 with 1 children - Packed = true)
{
[25] Type=TypeINT32 Size=4 Name=int
}
[26] SC::Vector (Vector with 1 children)
{
[27] Type=TypeINT32 Size=4 Name=int
}

Implementation

As mentioned in the introduction, effort has been made to keep the library as readable as possible, within the limits of C++.
The only techniques used are template partial specialization and some care in writing functions that are valid in a constexpr context.
For example, the schema is generated through specialization of the SC::Reflection::Reflect template, by defining the visit constexpr static member function for a given type. Inside the visit function, the Reflection system is told about each field.

The output of reflection is an array of SC::Reflection::TypeInfo, built at compile time and referred to as the Flat Schema.
Such compile time information is used when serializing and deserializing data that has missing fields.

enum class TypeCategory : uint8_t
{
// Invalid type sentinel
TypeInvalid = 0,
// Primitive types
TypeUINT8 = 1,
TypeUINT16 = 2,
TypeUINT32 = 3,
TypeUINT64 = 4,
TypeINT8 = 5,
TypeINT16 = 6,
TypeINT32 = 7,
TypeINT64 = 8,
TypeFLOAT32 = 9,
TypeDOUBLE64 = 10,
// Non primitive types
TypeStruct = 11,
TypeArray = 12,
TypeVector = 13,
};
struct TypeInfo
{
bool hasLink : 1;
TypeCategory type : 7;
union
{
uint8_t numberOfChildren;
uint8_t linkIndex;
};
uint16_t sizeInBytes;
struct EmptyInfo
{
};
struct MemberInfo
{
uint16_t memberTag;
uint16_t offsetInBytes;
constexpr MemberInfo(uint8_t memberTag, uint16_t offsetInBytes)
: memberTag(memberTag), offsetInBytes(offsetInBytes)
{}
};
struct StructInfo
{
bool isPacked : 1;
constexpr StructInfo(bool isPacked) : isPacked(isPacked) {}
};
struct ArrayInfo
{
uint32_t isPacked : 1;
uint32_t numElements : 31;
constexpr ArrayInfo(bool isPacked, uint32_t numElements) : isPacked(isPacked), numElements(numElements) {}
};
union
{
EmptyInfo emptyInfo;
MemberInfo memberInfo;
StructInfo structInfo;
ArrayInfo arrayInfo;
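};
// Constructors and accessors (getNumberOfChildren, hasValidLinkIndex, getLinkIndex) are omitted here
};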

The flat schema is generated by SC::Reflection::SchemaCompiler, which walks the structures and builds an array of SC::Reflection::TypeInfo describing each field.

For example, a struct is defined by a SC::Reflection::TypeInfo whose numberOfChildren holds the number of fields of the struct. That many SC::Reflection::TypeInfo entries, one per field, follow immediately after the struct entry in the flat schema array.
If the type of one of these fields is complex (for example it refers to another struct), the field contains a link.
A link is an index into the flat schema array (the linkIndex field).
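
The layout described above can be sketched with a much simplified entry type. Entry, Kind and countPrimitives below are hypothetical and are not the library's TypeInfo; the sketch only shows how children follow a definition and how a link index is followed.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind : std::uint8_t
{
    Int32,
    Float32,
    Struct
};

// Hypothetical, simplified flat schema entry.
struct Entry
{
    Kind kind;
    int  numberOfChildren; // meaningful for a struct definition entry
    int  linkIndex;        // index of the linked definition, or -1 when there is no link
};

// Flat schema for:
//   struct Inner { float a; float b; };
//   struct Outer { int n; Inner inner; };
constexpr Entry schema[] = {
    {Kind::Struct, 2, -1},  // [0] Outer definition, followed by its 2 members
    {Kind::Int32, 0, -1},   // [1] Outer::n
    {Kind::Struct, 0, 3},   // [2] Outer::inner, links to the definition at [3]
    {Kind::Struct, 2, -1},  // [3] Inner definition, followed by its 2 members
    {Kind::Float32, 0, -1}, // [4] Inner::a
    {Kind::Float32, 0, -1}, // [5] Inner::b
};

// Counts the primitive fields reachable from the struct definition at structIndex.
constexpr int countPrimitives(const Entry* entries, int structIndex)
{
    int count = 0;
    for (int i = 1; i <= entries[structIndex].numberOfChildren; ++i)
    {
        const Entry& member = entries[structIndex + i];
        // A linked member is resolved by jumping to its definition elsewhere in the array
        count += member.linkIndex >= 0 ? countPrimitives(entries, member.linkIndex) : 1;
    }
    return count;
}

static_assert(countPrimitives(schema, 0) == 3, "n, a and b");
```

Because links are plain indices, a type that is used several times needs only one definition in the array, and the whole schema stays a flat block of memory.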

This simple data structure makes it possible to describe hierarchically all types in a structure, decomposing it into its primitive types or into special classes that must be handled specifically (SC::Vector, SC::Array etc.).
It also has the nice property of being trivially serializable: this (compile time known) array can be dumped with a single memcpy, or hashed to obtain a unique signature of the entire type (for versioning purposes).
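
A signature of this kind could be computed, for instance, by hashing the raw bytes of the schema array. The sketch below uses FNV-1a, which is an assumption made for illustration and not necessarily the hash used by the library; Entry and schemaSignature are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified schema entry with no padding bytes.
struct Entry
{
    std::uint8_t  type;
    std::uint8_t  numberOfChildren;
    std::uint16_t sizeInBytes;
};

static_assert(sizeof(Entry) == 4, "Entry is expected to have no padding bytes");

// Hashes the bytes of the schema array with 64-bit FNV-1a.
// The value depends on the byte order of the machine that computes it.
inline std::uint64_t schemaSignature(const Entry* entries, std::size_t numEntries)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(entries);
    std::uint64_t        hash  = 0xcbf29ce484222325ull; // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < numEntries * sizeof(Entry); ++i)
    {
        hash ^= bytes[i];
        hash *= 0x100000001b3ull; // FNV-1a 64-bit prime
    }
    return hash;
}
```

Two builds whose schemas are byte-for-byte identical get the same signature, so a mismatch signals that the layout of the type has changed between versions.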

It's possible to associate a string literal with each type or member, so that text based serializers can refer to it, as done by the JSON Serialization Text.
Lastly, it's also possible to associate a MemberTag integer with a field, which binary serializers can use to keep track of the field in binary formats. This lets binary formats avoid strings entirely for format versioning/evolution, with some executable size benefits. It also means that binary file formats do not break just because a field / member was renamed. This is leveraged by Serialization Binary.

Note
It's also possible to try the experimental SC_REFLECT_AUTOMATIC mode through the Reflection Auto library, which automatically lists struct members. This makes sense only with Serialization Binary: SC_REFLECT_AUTOMATIC cannot obtain field names as strings, so text based serialization formats like Serialization Text cannot work.
Reflection Auto is an experimental library, part of Libraries Extra, that unfortunately uses some more obscure C++ meta-programming techniques.