Anatomy of a Scan
Before we dive into the details of table scans, let’s recall a few key design decisions of the table format:
- Column Groups are co-partitioned sets of columns. Each partition is called a fragment.
- Fragments are stored as Vortex files sorted by primary key, while the keys themselves are stored separately.
- Key Spaces store keys and are optimized for sort-merge joins (especially for dense runs of identical keys).
- Manifests store fragment metadata, including the alignment between each fragment and a key space.
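To make these four concepts concrete, here is a minimal sketch of how they relate, written as plain Python dataclasses. All class and field names are hypothetical and illustrative only; they are not Spiral's API. The sketch also assumes that a fragment's key span is a half-open range of row positions aligned to a key space, and uses integer keys for simplicity.

```python
from dataclasses import dataclass, field

# Hypothetical model of the table-format concepts; names are illustrative
# only and are NOT taken from the Spiral API.

@dataclass(frozen=True)
class Fragment:
    """One partition of a column group, stored as a Vortex file."""
    fragment_id: str
    key_span: range  # assumed: half-open row positions aligned to a key space

@dataclass
class ColumnGroup:
    """A co-partitioned set of columns, split into fragments."""
    name: str
    columns: list[str]
    fragments: list[Fragment] = field(default_factory=list)

@dataclass
class KeySpace:
    """Sorted primary keys, stored separately from the column data."""
    keys: list[int]  # sorted ascending; may contain runs of identical keys

@dataclass
class ManifestEntry:
    """Fragment metadata: alignment to a key space plus column statistics."""
    fragment: Fragment
    key_space_id: str
    stats: dict[str, tuple[float, float]] = field(default_factory=dict)  # column -> (min, max)

# The column groups of the example table, modeled with these hypothetical types.
root = ColumnGroup("root", ["audio_length", "silence_ratio"])
audio = ColumnGroup("audio", ["bytes"])
audio_meta = ColumnGroup("audio.meta", ["size", "e_tag"])
audio.fragments.append(Fragment("xn6rq15db4", range(0, 1177)))
```

Keeping keys in their own `KeySpace` rather than inside each `Fragment` mirrors the design decision above: fragments of different column groups can be partitioned independently and are related only through their alignment to a key space.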

These design decisions have the following implications for table scans:
- Metadata that is frequently used for filtering is partitioned separately from data columns that are usually just projected (for example, it’s very rare to filter on audio bytes).
- Fragments are partitioned by size (optimized for object storage). In practice, fragments that store metadata have many more rows than fragments that store projected data.
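The manifests for the example table in this section make the second point concrete. Below is a back-of-the-envelope calculation using the first fragment of the root and audio column groups. It assumes each key span is a half-open range of row positions and that the reported sizes are binary megabytes (MiB); both are assumptions, though the binary reading matches the averages the manifests report.

```python
# Key spans and sizes copied from the first rows of the root (table_sl6o0u)
# and audio (table_sl6o0u.audio) manifests. Sizes are ASSUMED to be MiB.
MIB = 1024 * 1024

def rows_in_span(start: int, end: int) -> int:
    """Number of rows covered by a half-open key span start..end."""
    return end - start

root_rows = rows_in_span(0, 228261)   # root fragment pqjb420f8r, 20.2MB
audio_rows = rows_in_span(0, 1177)    # audio fragment xn6rq15db4, 129.5MB

root_bytes_per_row = 20.2 * MIB / root_rows
audio_bytes_per_row = 129.5 * MIB / audio_rows

print(f"root:  {root_rows} rows/fragment, ~{root_bytes_per_row:.0f} B/row")
print(f"audio: {audio_rows} rows/fragment, ~{audio_bytes_per_row / 1024:.0f} KiB/row")
print(f"root fragments hold ~{root_rows / audio_rows:.0f}x more rows")
```

Roughly 93 bytes per row in the root group versus about 113 KiB per row in the audio group is why a size-based partitioning puts around 194 times more rows into each root fragment.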
A look at the table
Our table contains audio data with the following schema:
Schema({
    audio_length=f64?,
    silence_ratio=f64?,
    audio={
        bytes=binary?,
        meta={
            size=u64?,
            e_tag=utf8?,
        }?
    }?
})
The table stores audio bytes in a column group called audio, plus two “metadata” column groups: the root group and audio.meta, which holds additional source-specific metadata. The manifests look as follows (truncated):
Key Space manifest
131 fragments, total: 60.6MB, avg: 473.5KB, metadata: 431.7KB
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ID ┃ Size (Metadata) ┃ Format ┃ Key Span ┃ Level ┃ Committed At ┃ Compacted At ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ m86rvtu9y6 │ 260.5KB (3.2KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ 0qaq87gz87 │ 30.4MB (19.6KB) │ vortex │ 0..1294000 │ L0 │ 2025-11-06 18:20:00.224537+00:00 │ N/A │
│ dve28ce6z8 │ 247.1KB (3.1KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ hcjbwo31q5 │ 237.4KB (3.1KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ tzs9ssfgd7 │ 238.0KB (3.2KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
Column Group manifest for table_sl6o0u
6 fragments, total: 113.9MB, avg: 19.0MB, metadata: 111.5KB
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ID ┃ Size (Metadata) ┃ Format ┃ Key Span ┃ Level ┃ Committed At ┃ Compacted At ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ pqjb420f8r │ 20.2MB (19.1KB) │ vortex │ 0..228261 │ L0 │ 2025-11-06 18:20:00.224537+00:00 │ N/A │
│ vrv19qwg1i │ 20.2MB (19.1KB) │ vortex │ 228261..456522 │ L0 │ 2025-11-06 18:20:00.224537+00:00 │ N/A │
│ 9lhiuacvoi │ 20.0MB (19.1KB) │ vortex │ 456522..684783 │ L0 │ 2025-11-06 18:20:00.224537+00:00 │ N/A │
│ yq7dqeed9r │ 20.1MB (19.1KB) │ vortex │ 684783..913044 │ L0 │ 2025-11-06 18:20:00.224537+00:00 │ N/A │
│ 2tbvh0v6td │ 20.1MB (19.1KB) │ vortex │ 913044..1141305 │ L0 │ 2025-11-06 18:20:00.224537+00:00 │ N/A │
Column Group manifest for table_sl6o0u.audio
1165 fragments, total: 137.6GB, avg: 120.9MB, metadata: 2.7MB
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ID ┃ Size (Metadata) ┃ Format ┃ Key Span ┃ Level ┃ Committed At ┃ Compacted At ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ xn6rq15db4 │ 129.5MB (2.4KB) │ vortex │ 0..1177 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ vn4d04pp2r │ 128.7MB (2.4KB) │ vortex │ 1177..2354 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ yqgn69c1rh │ 126.2MB (2.4KB) │ vortex │ 2354..3531 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ 9mmqc78utg │ 129.3MB (2.4KB) │ vortex │ 3531..4708 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ ffsgw7yf5m │ 126.6MB (2.4KB) │ vortex │ 4708..5885 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
Column Group manifest for table_sl6o0u.audio.meta
130 fragments, total: 63.8MB, avg: 502.5KB, metadata: 891.4KB
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ID ┃ Size (Metadata) ┃ Format ┃ Key Span ┃ Level ┃ Committed At ┃ Compacted At ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ lkbhpyhmhq │ 507.0KB (6.9KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ y052mol59q │ 514.9KB (6.9KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ w128y2udmu │ 504.6KB (6.9KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ v36ealhiys │ 502.9KB (6.9KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
│ a03enoo1j5 │ 502.7KB (6.9KB) │ vortex │ 0..10000 │ L0 │ 2025-11-06 20:10:41.359348+00:00 │ N/A │
A typical scan
A typical scan over this table might look like this:
sp.scan(table["audio.bytes"], where=table["silence_ratio"] < 0.1)
When executing this scan, the Spiral client performs the following steps:
Load Fragment Manifest(s)
The client identifies that the scan involves two column groups:
- root, for filtering on silence_ratio
- audio, for projecting audio.bytes
The client scans the table’s manifests to identify the relevant fragments of both column groups.
The client determines that the silence_ratio filter can be pushed down into the root column group scan, and prunes fragments from the manifests using their statistics metadata.
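Pruning with fragment statistics can be illustrated with the classic min/max technique. The sketch below assumes each manifest entry carries the minimum and maximum non-null silence_ratio of its fragment; the `FragmentStats` type, the `prune_less_than` helper and the fragment IDs are all made up for illustration, and Spiral's actual statistics may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FragmentStats:
    """Hypothetical per-fragment statistics stored in a manifest entry."""
    fragment_id: str
    min_value: float  # minimum non-null silence_ratio in the fragment
    max_value: float  # maximum non-null silence_ratio in the fragment

def prune_less_than(stats: list[FragmentStats], threshold: float) -> list[str]:
    """Keep only fragments that may contain a row with value < threshold.

    A fragment whose minimum is already >= threshold cannot satisfy
    `value < threshold`, so it is skipped without ever being read.
    """
    return [s.fragment_id for s in stats if s.min_value < threshold]

# Made-up manifest entries for three root fragments.
manifest = [
    FragmentStats("a", 0.00, 0.35),
    FragmentStats("b", 0.12, 0.90),  # every row has silence_ratio >= 0.12
    FragmentStats("c", 0.05, 0.08),
]
print(prune_less_than(manifest, 0.1))  # ['a', 'c']
```

Only the manifest is consulted here, so a pruned fragment costs no object-storage reads at all.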

Scan Filtered Column Group(s)
The client scans the fragments of the column group involved in filtering (root).
The client applies the filter silence_ratio < 0.1 and produces a row mask, along with any projected columns from this group (in this case, none are projected from it). In practice, this is a columnar file scan over lots of rows (very efficient!).
The row mask indicates which rows satisfy the filter condition. Combined with the alignment metadata from the manifests, it lets the client determine which keys correspond to the filtered rows.
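The filtering step can be sketched for a single root fragment as follows. The column values and keys are hypothetical, and the sketch assumes SQL-like semantics where a null silence_ratio never satisfies the filter. The alignment is modeled simply as a list of keys parallel to the fragment's rows.

```python
# Hypothetical data for one root fragment: silence_ratio values (None = null)
# and the primary keys aligned with the fragment's rows via the manifest.
silence_ratio = [0.02, 0.40, None, 0.07, 0.15, 0.09]
aligned_keys = [101, 102, 103, 104, 105, 106]

def row_mask_less_than(values, threshold):
    """Boolean row mask for `value < threshold`; nulls never match (assumed)."""
    return [v is not None and v < threshold for v in values]

mask = row_mask_less_than(silence_ratio, 0.1)
# Applying the mask to the aligned keys yields the keys of the filtered rows.
selected_keys = [k for k, keep in zip(aligned_keys, mask) if keep]
print(mask)           # [True, False, False, True, False, True]
print(selected_keys)  # [101, 104, 106]
```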

Join Needed Key Spaces
The client identifies the key spaces needed to join the filtered column group with the projected column group.
The client loads these key spaces, applies the row mask produced by the filter, and joins them to produce a new row mask indicating which rows of the projected column group are needed.
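Because key spaces are sorted, this join can be illustrated as a sort-merge semi-join that outputs a mask over the projected group's keys. The sketch below is a plain single-threaded version with hypothetical keys; it handles runs of identical keys by advancing over whole runs, but it does not model the partitioned Merkle hash tries mentioned under About Performance.

```python
def merge_join_mask(filtered_keys, projected_keys):
    """Sort-merge semi-join of two sorted key lists.

    Returns a boolean mask over `projected_keys` marking rows whose key also
    appears in `filtered_keys`. Both inputs must be sorted ascending.
    """
    mask = [False] * len(projected_keys)
    i = j = 0
    while i < len(filtered_keys) and j < len(projected_keys):
        a, b = filtered_keys[i], projected_keys[j]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Mark the whole run of equal keys on the projected side...
            while j < len(projected_keys) and projected_keys[j] == a:
                mask[j] = True
                j += 1
            # ...then skip the matching run on the filtered side.
            while i < len(filtered_keys) and filtered_keys[i] == a:
                i += 1
    return mask

filtered = [101, 104, 106]                            # keys that passed the filter
projected = [100, 101, 101, 102, 104, 105, 106, 107]  # projected group's key space
print(merge_join_mask(filtered, projected))
# [False, True, True, False, True, False, True, False]
```

Each input is traversed once, so the join runs in linear time in the total number of keys, and dense runs of identical keys are consumed in a single pass.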

Take Projected Column Group(s)
The client scans the fragments of the column group involved in the projection (audio, in this case).
The client applies the row mask obtained in the previous step to read only the needed rows from the projected column group.

This is only possible because of the random-access performance of Vortex files.
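Before any random-access reads happen, the needed rows must be mapped to the fragments that hold them. Here is a minimal sketch of that planning step using the first three key spans of the audio manifest, again assuming key spans are half-open ranges of row positions. The `plan_take` helper is hypothetical, and the actual Vortex reads are not shown.

```python
from bisect import bisect_right

# First three fragments of the audio manifest: (id, span start, span end).
fragments = [
    ("xn6rq15db4", 0, 1177),
    ("vn4d04pp2r", 1177, 2354),
    ("yqgn69c1rh", 2354, 3531),
]
starts = [start for _, start, _ in fragments]

def plan_take(row_positions):
    """Group needed row positions by fragment, as fragment-local indices.

    Fragments with no needed rows never appear in the plan, so they are
    never opened.
    """
    plan = {}
    for pos in sorted(row_positions):
        idx = bisect_right(starts, pos) - 1  # last fragment starting <= pos
        frag_id, start, end = fragments[idx]
        if not start <= pos < end:
            raise ValueError(f"row {pos} is outside every known fragment")
        plan.setdefault(frag_id, []).append(pos - start)
    return plan

print(plan_take([5, 1180, 1200, 3000]))
# {'xn6rq15db4': [5], 'vn4d04pp2r': [3, 23], 'yqgn69c1rh': [646]}
```

Each list of local indices would then drive a random-access read of the corresponding Vortex file, which is where its random-access performance pays off.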
About Performance
Scans are optimized for high throughput.
- Filters are pushed down into Vortex file scans over large numbers of rows (10-20x faster than Parquet!).
- Keys are joined efficiently using partitioned Merkle hash tries.
- Projections are evaluated as random access Vortex file reads (100x faster than Parquet!).
This last point means that each batch of rows produced by a scan is expressed simply as a masked Vortex array, enabling zero-copy and zero-decompression transfer of data from storage all the way to the end user. And since Vortex arrays can be decompressed on the GPU, data can be transferred directly into GPU memory!