Try SpiralDB
Ready to take Spiral for a spin? This guide walks through the typical workflow:
- Create a project
- Connect data
- Model your table
- Ingest data
- Enrich with derived or fetched data
- Query for analytics with Polars
- Build a training data loader
Use uv (or your favorite package manager) to install the Python client and CLI.
```shell
uv add pyspiral
```

1. Create Project
Create a project first. Projects are the top-level unit used for access control and storage configuration.
```shell
uv run spiral projects create --description my-first-project
```

The CLI prints a `<project-id>` you can reference from Python:
```python
from spiral import Spiral

sp = Spiral()
project = sp.project("<project-id>")
```

Learn more: Projects, Command Line
2. Connect a Bucket
Before ingesting, connect your project to a backing file system (for example S3, GCS, or Azure Blob).
This is how Spiral reads and writes table data in object storage.
```shell
spiral fs update --type s3 --bucket <bucket-name> --region <region> <project-id>
```

After this is configured, table writes and enrichments can read and write through that file system.
Learn more: File Systems, CLI: spiral fs update
3. Data Modeling
Spiral tables are sorted and unique by a primary key, and support nested column groups for multimodal data. Start by designing a stable key schema and a column layout that matches how you query.
```python
import pyarrow as pa

key_schema = pa.schema([
    ("created_at", pa.timestamp("ms")),
    ("id", pa.string()),
])
```

Learn more: Data Model, Tables Overview, Best Practices
4. Ingest
Create a table, then write Arrow tables or Python objects. Every write must include the key columns.
```python
from datetime import datetime

events = project.create_table("getting-started.events", key_schema=key_schema, exist_ok=True)

data = pa.table({
    "created_at": pa.array(
        [datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 1), datetime(2024, 1, 1, 0, 2)],
        type=pa.timestamp("ms"),
    ),
    "id": ["evt-1", "evt-2", "evt-3"],
    "type": ["PullRequestEvent", "PushEvent", "PullRequestEvent"],
    "actor": ["alice", "bob", "carol"],
    "url": [
        "https://picsum.photos/seed/alice/64",
        "https://picsum.photos/seed/bob/64",
        "https://picsum.photos/seed/carol/64",
    ],
})
events.write(data)
```

Learn more: Write Tables
5. Enrichment
Enrichment lets you append new columns without rewriting existing data, including media fetched from URLs/S3/files.
```python
from spiral import expressions as se

enrichment = events.enrich(
    se.pack({
        "thumbnail": se.http.get(events["url"])
    })
)
enrichment.run()
```

Learn more: Enrichment
6. Querying Data (Polars Analytics)
Use `scan` for projection and filtering, then collect into Polars for fast analytics.
```python
scan = sp.scan(
    events[["created_at", "type", "actor"]],
    where=events["type"] == "PullRequestEvent",
)
df = scan.to_polars()

summary = (
    df.group_by("type")
    .len()
    .sort("len", descending=True)
)
```

You can also use a Polars LazyFrame:

```python
lazy = events.to_polars_lazy_frame()
result = lazy.collect()
```

Learn more: Scan & Query
7. Constructing a Data Loader
Turn a scan into a PyTorch-compatible loader for training and evaluation.
```python
scan = sp.scan(events)
loader = scan.to_data_loader(batch_size=32, seed=42)
```

For distributed jobs:

```python
dist_loader = scan.to_distributed_data_loader(batch_size=32, seed=42)
```

Learn more: GPU Data Loading, Python API
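Conceptually, a data loader yields fixed-size batches from the scan's row stream. Here is a minimal stdlib sketch of just that batching step (the real loader also handles shuffling via `seed` and, in the distributed variant, sharding across workers; the rows below are made up):

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield successive lists of up to batch_size rows."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


# Stand-in for rows streamed out of a scan.
rows = [{"id": f"evt-{i}"} for i in range(100)]
batches = list(batched(rows, batch_size=32))
print(len(batches), len(batches[-1]))  # 4 batches; the last holds the 4-row remainder
```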