Data Model

Spiral Tables model data as dictionary-like structures of columnar arrays, sorted and unique by a set of primary key columns. This data model enables:

Efficient sparse columns.
Columns arranged in a nested dictionary-like structure.
Support for large values and cell-level pushdown filtering.
Support for appending columns without rewriting the entire table.
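To make the model concrete, here is a minimal in-memory sketch, assuming a toy representation that is not Spiral's API. The class name `ToyTable` and its methods are hypothetical. Each column is a plain Python list, keys stay sorted and unique, missing values are stored as `None` to stand in for sparse cells, and adding a column leaves the existing columns untouched.

```python
from bisect import bisect_left


class ToyTable:
    """Hypothetical toy model: sorted, unique keys plus one list per value column."""

    def __init__(self, key_names, value_names):
        self.key_names = tuple(key_names)
        self.keys = []  # sorted, unique key tuples
        self.columns = {name: [] for name in value_names}

    def upsert(self, key, values):
        """Insert or overwrite one row, keeping keys sorted and unique."""
        unknown = set(values) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            # Key already present: overwrite only the supplied columns.
            for name, value in values.items():
                self.columns[name][i] = value
            return
        self.keys.insert(i, key)
        for name, column in self.columns.items():
            # Columns not supplied for this row stay sparse (None).
            column.insert(i, values.get(name))

    def add_column(self, name):
        """Add an all-None column; existing columns are not rewritten."""
        self.columns[name] = [None] * len(self.keys)


t = ToyTable(["id"], ["name", "score"])
t.upsert((2,), {"name": "b"})
t.upsert((1,), {"name": "a", "score": 10})
t.upsert((2,), {"score": 7})
t.add_column("tag")
```

After these calls the keys are `[(1,), (2,)]`, even though key `(2,)` was written first and then written again.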

Key Schema

When a table is created, a key schema is defined that represents the primary key and sort order of the table. This schema is fixed and cannot be changed after the table is created.

The key schema can consist of any number of columns of the following types:

(u)int{8,16,32,64}
float{16,32,64}
timestamp
bytes (up to 1KB)
string (up to 1KB)
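The type rules above can be expressed as a small validator. This is a sketch under stated assumptions, not Spiral's implementation. The type-name strings and the functions `validate_key_schema` and `validate_key` are hypothetical. "1KB" is read as 1024 bytes, string length is measured in UTF-8 bytes, and timestamps are represented as Python `datetime` values.

```python
from datetime import datetime

# Hypothetical type names mirroring the list above: int8..int64, uint8..uint64.
INT_BITS = {f"{sign}int{bits}": (sign == "", bits)
            for sign in ("", "u") for bits in (8, 16, 32, 64)}
FLOAT_TYPES = {"float16", "float32", "float64"}
MAX_VAR_LEN = 1024  # assumption: "1KB" read as 1024 bytes
ALLOWED = set(INT_BITS) | FLOAT_TYPES | {"timestamp", "bytes", "string"}


def validate_key_schema(key_schema):
    """Reject key columns whose type is not in the allowed list."""
    for name, type_name in key_schema:
        if type_name not in ALLOWED:
            raise ValueError(f"column {name!r}: type {type_name!r} cannot be a key")


def validate_key(key_schema, key):
    """Check one key tuple against the key schema."""
    if len(key) != len(key_schema):
        raise ValueError("key has the wrong number of columns")
    for (name, type_name), value in zip(key_schema, key):
        if type_name in INT_BITS:
            signed, bits = INT_BITS[type_name]
            lo = -(1 << (bits - 1)) if signed else 0
            hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
            ok = (isinstance(value, int) and not isinstance(value, bool)
                  and lo <= value <= hi)
        elif type_name in FLOAT_TYPES:
            ok = isinstance(value, float)
        elif type_name == "timestamp":
            ok = isinstance(value, datetime)
        elif type_name == "bytes":
            ok = isinstance(value, bytes) and len(value) <= MAX_VAR_LEN
        else:  # string: length measured in UTF-8 bytes (assumption)
            ok = isinstance(value, str) and len(value.encode("utf-8")) <= MAX_VAR_LEN
        if not ok:
            raise ValueError(f"column {name!r}: invalid {type_name} value {value!r}")
```

For example, `validate_key([("created_at", "timestamp"), ("id", "int64")], (datetime(2024, 1, 1), 42))` passes, while a 1025-byte string key is rejected.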

Column Groups

The columns of a table are arranged in a nested dictionary-like structure, which we refer to as the schema tree. Column groups can be thought of as vertical partitions of the table: each group holds a subset of the table's columns. Tables provide complete isolation between column groups, so a scan reads only the column groups a query needs.

For example, the GitHub Archive dataset has a key schema that looks like this:


import pyarrow as pa
 
key_schema = [('created_at', pa.timestamp('us')), ('id', pa.int64())]

And a schema tree of:


schema_tree = {
    'name': pa.string(),
    'public': pa.bool_(),
    'payload': pa.string(),
    'repo': {
        'id': pa.int64(),
        'name': pa.string(),
        'url': pa.string()
    },
    'actor': {
        'id': pa.int64(),
        'login': pa.string(),
        'gravatar_id': pa.string(),
        'url': pa.string(),
        'avatar_url': pa.string()
    },
    'org': {
        'id': pa.int64(),
        'login': pa.string(),
        'gravatar_id': pa.string(),
        'url': pa.string(),
        'avatar_url': pa.string()
    }
}

A column group is a set of sibling leaf columns in the schema tree. For example, the schema tree above has a root ("") column group containing the name, public, and payload columns, as well as repo, actor, and org column groups.
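The grouping rule and the "scan only what you need" behavior can be sketched in a few lines. The helpers `column_groups` and `groups_to_scan` are hypothetical, and the dot-separated path notation (`actor.login`) is an assumption made for illustration. Any non-dict value counts as a leaf, so the function would also accept the pyarrow `schema_tree` above; the example uses type-name strings to stay dependency-free.

```python
def column_groups(tree, prefix=""):
    """Map each column-group path to the names of its direct leaf columns.

    A leaf is any non-dict value (e.g. a pyarrow type); a nested dict is a
    child group. The root group has the empty path "".
    """
    groups = {}
    leaves = [name for name, child in tree.items() if not isinstance(child, dict)]
    if leaves:
        groups[prefix] = leaves
    for name, child in tree.items():
        if isinstance(child, dict):
            path = f"{prefix}.{name}" if prefix else name
            groups.update(column_groups(child, path))
    return groups


def groups_to_scan(groups, projection):
    """Return the column groups a query over `projection` has to read."""
    needed = set()
    for column in projection:
        group, _, leaf = column.rpartition(".")
        if leaf not in groups.get(group, []):
            raise KeyError(column)
        needed.add(group)
    return sorted(needed)


tree = {
    "name": "string", "public": "bool", "payload": "string",
    "repo": {"id": "int64", "name": "string", "url": "string"},
    "actor": {"id": "int64", "login": "string"},
}
groups = column_groups(tree)
```

A query that reads only `actor.login` and `name` would touch the root and actor groups and skip repo entirely: `groups_to_scan(groups, ["actor.login", "name"])` returns `["", "actor"]`.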

Storage Model

Each column group is stored as a log-structured merge (LSM) tree in object storage. An LSM tree stores data as a series of sorted runs rather than as a single sorted file.
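Reading from a set of sorted runs can be illustrated with a point lookup. This is a generic LSM sketch, not Spiral's read path. The function `lsm_get` is hypothetical, runs are assumed to be in-memory lists of `(key, value)` pairs ordered newest to oldest, and deletes (tombstones) are ignored.

```python
from bisect import bisect_left


def lsm_get(runs, key):
    """Look up `key` in sorted runs ordered newest to oldest.

    Each run is a list of (key, value) pairs sorted by key with unique keys.
    The newest run containing the key wins; returns None if no run has it.
    """
    for run in runs:
        # (key,) sorts just before any (key, value) pair with the same key.
        i = bisect_left(run, (key,))
        if i < len(run) and run[i][0] == key:
            return run[i][1]
    return None


runs = [
    [(2, "b2"), (5, "e")],             # newest run
    [(1, "a"), (2, "b1"), (3, "c")],   # older run
]
```

Here `lsm_get(runs, 2)` returns `"b2"` because the newer run shadows the older value. The more runs a key range overlaps, the more runs a read must probe, which is why compaction helps.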

The key columns are split out and stored in key files, while the value columns are stored in fragment files. Background maintenance jobs periodically compact the LSM tree, merging overlapping sorted runs of value columns to improve read performance.
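Compaction of overlapping runs is essentially a k-way merge that keeps only the newest version of each key. The sketch below uses `heapq.merge` on in-memory `(key, value)` runs ordered newest to oldest. The functions `compact` and `split_run` are hypothetical, and `split_run` only mimics the key-file / fragment-file separation with two lists. The real on-disk layout and compaction policy are defined by the Table Format, not by this code.

```python
import heapq


def compact(runs):
    """Merge sorted runs (newest first) into one sorted run with unique keys.

    Entries are tagged with their run index (0 = newest) so that, for equal
    keys, heapq.merge yields the newest version first; older ones are dropped.
    """
    tagged = [[(key, age, value) for key, value in run]
              for age, run in enumerate(runs)]
    merged = []
    for key, _age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # an older version of a key that was already emitted
        merged.append((key, value))
    return merged


def split_run(run):
    """Split a run into a key list and a value list (hypothetical layout)."""
    return [key for key, _ in run], [value for _, value in run]


runs = [
    [(2, "b2"), (5, "e")],             # newest run
    [(1, "a"), (2, "b1"), (3, "c")],   # older run
]
keys, values = split_run(compact(runs))
```

After compaction, the two overlapping runs become one run with keys `[1, 2, 3, 5]`, and key 2 keeps only its newer value `"b2"`.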

See Table Format for a more detailed specification of the storage model.