What is SpiralDB?

SpiralDB is a fast analytical database designed for storing and querying multimodal, multi-rate data streams: the kind of data produced by sensors, cameras, and other instruments in robotics, biology, autonomous systems, and other fields at the intersection of AI and the physical world.

It is built by the creators of Vortex, the open-source columnar file format, and uses Vortex as its storage foundation. This gives SpiralDB best-in-class compression and scan performance out of the box.

The core abstraction is Collections: hierarchical, tree-structured groups of records (similar to nested JSON arrays) backed by sorted columnar storage. They enable:

Multimodal storage with isolated column groups. Columns are stored independently, so tables can be arbitrarily wide — mixing video frames, embeddings, tensors, and metadata — without penalizing queries that only touch a few columns.
Multi-rate alignment. Different sensors record at different rates (a camera at 60 Hz, a robotic arm at 500 Hz). SpiralDB provides a streaming resample primitive that aligns sibling data streams to a shared timeline, so you can combine them in a single query.
Free column appends. Adding derived data like labels, annotations, or embeddings doesn’t require rewriting existing columns. New columns are appended independently through enrichment.
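
To make multi-rate alignment concrete, here is a minimal sketch in plain Python. It does not use SpiralDB's resample primitive or any SpiralDB API. The function name `resample_hold`, the sample data, and the choice of "latest sample at or before each tick" as the alignment rule are all assumptions made for illustration.

```python
from bisect import bisect_right

def resample_hold(timestamps, values, timeline):
    """Align one stream to `timeline` by taking, for each tick, the latest
    sample whose timestamp is at or before that tick (None if there is none).
    `timestamps` must be sorted in ascending order."""
    out = []
    for t in timeline:
        i = bisect_right(timestamps, t) - 1  # index of last sample with timestamp <= t
        out.append(values[i] if i >= 0 else None)
    return out

# Hypothetical sibling streams; timestamps are in milliseconds.
camera_ts = [0, 16, 33, 50]               # roughly 60 Hz
camera_frames = ["f0", "f1", "f2", "f3"]
arm_ts = list(range(0, 52, 2))            # 500 Hz: 0, 2, ..., 50
arm_position = [t * 3 for t in arm_ts]    # made-up encoder ticks

# Use the camera timestamps as the shared timeline and pull the arm stream onto it.
aligned = list(zip(camera_ts, camera_frames,
                   resample_hold(arm_ts, arm_position, camera_ts)))
```

Each row of `aligned` pairs one camera frame with the arm reading that was current at that moment, which is the shape a single query over both streams would need. Other alignment rules (nearest sample, interpolation, windowed aggregation) would slot into the same place.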

Additionally, SpiralDB supports GPU-optimized data loading by streaming data directly from object storage to the GPU, bypassing the traditional CPU-bound staging bottleneck. It integrates with PyTorch via a custom DataLoader with built-in sharding, shuffling, checkpointing, and multi-threaded Rust-based I/O.
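
The sharding, shuffling, and checkpointing behaviour mentioned above can be illustrated with a small standard-library sketch. This is not SpiralDB's DataLoader and does not reflect its API. The class `ShardedShuffledIndex` and its method names are hypothetical, chosen only to show how the three features can fit together over record indices.

```python
import random

class ShardedShuffledIndex:
    """Yields record indices for one worker: a seeded per-epoch shuffle,
    split into disjoint shards, resumable from a saved position."""

    def __init__(self, num_records, rank, world_size, seed=0):
        self.num_records = num_records
        self.rank = rank              # this worker's id, 0 <= rank < world_size
        self.world_size = world_size
        self.seed = seed
        self.epoch = 0
        self.position = 0             # indices already yielded in the current epoch

    def _shard(self):
        order = list(range(self.num_records))
        # Same seed and epoch on every worker gives the same global order.
        random.Random(self.seed + self.epoch).shuffle(order)
        # A strided slice gives each worker a disjoint subset of that order.
        return order[self.rank::self.world_size]

    def __iter__(self):
        shard = self._shard()
        while self.position < len(shard):
            idx = shard[self.position]
            self.position += 1        # count the index as consumed before handing it out
            yield idx
        self.epoch += 1               # epoch finished: move on and reset the cursor
        self.position = 0

    def state_dict(self):
        return {"epoch": self.epoch, "position": self.position}

    def load_state_dict(self, state):
        self.epoch = state["epoch"]
        self.position = state["position"]

# Two workers cover all ten records between them, with no overlap.
worker0 = ShardedShuffledIndex(10, rank=0, world_size=2, seed=42)
worker1 = ShardedShuffledIndex(10, rank=1, world_size=2, seed=42)
covered = sorted(list(worker0) + list(worker1))
```

Because the shuffle is derived from `seed + epoch` and not from shared mutable state, a restarted worker can rebuild the same order and continue from `position` without coordinating with the others.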

Who is SpiralDB for?

SpiralDB is built for teams training AI models on data from the physical world — any domain where you’re capturing observations from sensors, instruments, or simulations and need to turn them into training-ready datasets.

This includes teams working in robotics, autonomous vehicles, biology, neuroscience, organic chemistry, materials science, and similar fields. The common thread is data that is both multimodal (video, audio, point clouds, scalar telemetry, embeddings) and temporal (recorded over time, often at different rates across sources).

If your workflow involves ingesting heterogeneous sensor data, aligning streams across time resolutions, deriving new features like labels or embeddings, and then loading the result into GPUs at high throughput, SpiralDB is designed for exactly that pipeline.

Use Cases

Feature Engineering: Build and enrich large-scale multimodal datasets.
GPU Data Loading: High-throughput data loading for machine learning.

Dive Deeper

Getting Started: Set up Spiral and run through the core workflow.
Spiral Tables: Store, analyze, and query massive or multimodal datasets.
File Systems: Learn how to configure and manage file systems in Spiral.
Access and Permissions: Manage user permissions and access control.