
Try SpiralDB

Ready to take Spiral for a spin? This guide walks through the typical workflow:

  1. Create a project
  2. Connect data
  3. Model your table
  4. Ingest data
  5. Enrich with derived or fetched data
  6. Query for analytics with Polars
  7. Build a training data loader

Use uv (or your favorite package manager) to install the Python client and CLI.

uv add pyspiral
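
To confirm the install, a bare import of the client is enough (run it with uv run python):

# Minimal sanity check: the import succeeding means the client is installed.
from spiral import Spiral

print(Spiral)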

1. Create Project

Create a project first. Projects are the top-level unit used for access control and storage configuration.

uv run spiral projects create --description my-first-project

The CLI prints a <project-id> you can reference from Python:

from spiral import Spiral

sp = Spiral()
project = sp.project("<project-id>")

Learn more: Projects, Command Line

2. Connect a bucket

Before ingesting, connect your project to a backing file system (for example S3, GCS, or Azure Blob).
This is how Spiral reads and writes table data in object storage.

spiral fs update --type s3 --bucket <bucket-name> --region <region> <project-id>

After this is configured, table writes and enrichments can read/write through that file system.

Learn more: File Systems, CLI: spiral fs update

3. Data Modeling

Spiral tables are sorted and unique by a primary key, and support nested column groups for multimodal data. Start by designing a stable key schema and a column layout that matches how you query.

import pyarrow as pa

key_schema = pa.schema([
    ("created_at", pa.timestamp("ms")),
    ("id", pa.string()),
])
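
One way to sketch a nested column group is with an Arrow struct type. The field names below are illustrative, not something Spiral requires, and non-key columns do not have to be declared up front, as the write in step 4 shows:

# Illustrative only: an "image" column group modeled as an Arrow struct,
# sitting next to flat columns such as "type" and "actor".
image_type = pa.struct([
    ("bytes", pa.binary()),
    ("width", pa.int32()),
    ("height", pa.int32()),
])

value_columns = pa.schema([
    ("type", pa.string()),
    ("actor", pa.string()),
    ("image", image_type),
])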

Learn more: Data Model, Tables Overview, Best Practices

4. Ingest

Create a table, then write Arrow tables or Python objects. Every write must include the key columns.

from datetime import datetime

events = project.create_table(
    "getting-started.events",
    key_schema=key_schema,
    exist_ok=True,
)

data = pa.table({
    "created_at": pa.array(
        [datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 1), datetime(2024, 1, 1, 0, 2)],
        type=pa.timestamp("ms"),
    ),
    "id": ["evt-1", "evt-2", "evt-3"],
    "type": ["PullRequestEvent", "PushEvent", "PullRequestEvent"],
    "actor": ["alice", "bob", "carol"],
    "url": [
        "https://picsum.photos/seed/alice/64",
        "https://picsum.photos/seed/bob/64",
        "https://picsum.photos/seed/carol/64",
    ],
})

events.write(data)
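
Writes are not limited to Arrow tables; plain Python objects work too. A sketch, assuming a dict of equal-length column lists is one accepted shape (see Write Tables for the exact formats):

# Assumption: write() also accepts a dict of column lists.
# Key columns ("created_at", "id") must still be present in every write.
events.write({
    "created_at": [datetime(2024, 1, 1, 0, 3)],
    "id": ["evt-4"],
    "type": ["PushEvent"],
    "actor": ["dave"],
    "url": ["https://picsum.photos/seed/dave/64"],
})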

Learn more: Write Tables

5. Enrichment

Enrichment lets you append new columns, including media fetched from URLs, S3, or local files, without rewriting existing data.

from spiral import expressions as se

enrichment = events.enrich(
    se.pack({
        "thumbnail": se.http.get(events["url"]),
    })
)
enrichment.run()
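
Once the enrichment has run, the new data can be read back with a scan like any other column. A sketch, assuming the packed output is addressable as a "thumbnail" column (the exact column path depends on how the enrichment is named):

# Assumption: the enriched output is selectable as "thumbnail".
thumbs = sp.scan(events[["id", "thumbnail"]]).to_polars()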

Learn more: Enrichment

6. Querying Data (Polars Analytics)

Use scan for projection and filtering, then collect the result into Polars for fast analytics.

scan = sp.scan(
    events[["created_at", "type", "actor"]],
    where=events["type"] == "PullRequestEvent",
)
df = scan.to_polars()

summary = (
    df.group_by("type")
    .len()
    .sort("len", descending=True)
)
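
Because the scan already filters to PullRequestEvent rows, the same DataFrame answers follow-up questions with ordinary Polars operations, for example pull requests per actor (using the toy data from step 4):

# Count pull-request events per actor on the collected DataFrame.
per_actor = (
    df.group_by("actor")
    .len()
    .sort("len", descending=True)
)
print(per_actor)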

You can also use a Polars LazyFrame:

lazy = events.to_polars_lazy_frame()
result = lazy.collect()
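
Standard Polars lazy operations compose on top of the LazyFrame before anything is collected, for example (a sketch; pl is the usual polars import):

import polars as pl

# Filter and aggregate lazily; work happens only at collect().
push_counts = (
    events.to_polars_lazy_frame()
    .filter(pl.col("type") == "PushEvent")
    .group_by("actor")
    .agg(pl.len())
    .collect()
)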

Learn more: Scan & Query

7. Constructing a Data Loader

Turn a scan into a PyTorch-compatible loader for training and evaluation.

scan = sp.scan(events)
loader = scan.to_data_loader(batch_size=32, seed=42)

For distributed jobs:

dist_loader = scan.to_distributed_data_loader(batch_size=32, seed=42)
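
Either loader is meant to drop into a training loop. A minimal skeleton, with the batch format left as an assumption (see GPU Data Loading for what each batch actually contains):

# Assumption: the loader is iterable and yields one batch per step,
# as a PyTorch-style DataLoader would.
for epoch in range(2):
    for batch in loader:
        # hypothetical: collate `batch` into tensors and run your train step
        ...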

Learn more: GPU Data Loading, Python API
