
Try SpiralDB

Ready to take Spiral for a spin? This guide walks through the typical workflow:

  1. Create a project
  2. Connect data
  3. Model your table
  4. Ingest data
  5. Enrich with derived or fetched data
  6. Query for analytics with Polars
  7. Build a training data loader

Use uv (or your favorite package manager) to install the Python client and CLI.

uv add pyspiral
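
To confirm the install, a bare import of the client is enough (run it with uv run python):

# Minimal sanity check: the import succeeding means the client is installed.
from spiral import Spiral

print(Spiral)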

1. Create Project

Create a project first. Projects are the top-level unit used for access control and storage configuration.

uv run spiral projects create --description my-first-project

The CLI prints a <project-id> you can reference from Python:

from spiral import Spiral

sp = Spiral()
project = sp.project("<project-id>")

Learn more: Projects, Command Line

2. Connect a bucket

Before ingesting, connect your project to a backing file system (for example S3, GCS, or Azure Blob).
This is how Spiral reads and writes table data in object storage.

spiral fs update --type s3 --bucket <bucket-name> --region <region> <project-id>

After this is configured, table writes and enrichments can read/write through that file system.

Learn more: File Systems, CLI: spiral fs update

3. Data Modeling

Spiral tables are sorted and unique by a primary key, and support nested column groups for multimodal data. Start by designing a stable key schema and a column layout that matches how you query.

import pyarrow as pa

key_schema = pa.schema([
    ("created_at", pa.timestamp("ms")),
    ("id", pa.string()),
])
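
One way to sketch a nested column group is with an Arrow struct type. The field names below are illustrative, not something Spiral requires, and non-key columns do not have to be declared up front, as the write in step 4 shows:

# Illustrative only: an "image" column group modeled as an Arrow struct,
# sitting next to flat columns such as "type" and "actor".
image_type = pa.struct([
    ("bytes", pa.binary()),
    ("width", pa.int32()),
    ("height", pa.int32()),
])

value_columns = pa.schema([
    ("type", pa.string()),
    ("actor", pa.string()),
    ("image", image_type),
])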

Learn more: Data Model, Tables Overview, Best Practices

4. Ingest

Create a table, then write Arrow tables or Python objects. Every write must include the key columns.

from datetime import datetime

events = project.create_table(
    "getting-started.events",
    key_schema=key_schema,
    exist_ok=True,
)

data = pa.table({
    "created_at": pa.array(
        [datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 1), datetime(2024, 1, 1, 0, 2)],
        type=pa.timestamp("ms"),
    ),
    "id": ["evt-1", "evt-2", "evt-3"],
    "type": ["PullRequestEvent", "PushEvent", "PullRequestEvent"],
    "actor": ["alice", "bob", "carol"],
    "url": [
        "https://picsum.photos/seed/alice/64",
        "https://picsum.photos/seed/bob/64",
        "https://picsum.photos/seed/carol/64",
    ],
})

events.write(data)
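
Writes are not limited to Arrow tables; plain Python objects work too. A sketch, assuming a dict of equal-length column lists is one accepted shape (see Write Tables for the exact formats):

# Assumption: write() also accepts a dict of column lists.
# Key columns ("created_at", "id") must still be present in every write.
events.write({
    "created_at": [datetime(2024, 1, 1, 0, 3)],
    "id": ["evt-4"],
    "type": ["PushEvent"],
    "actor": ["dave"],
    "url": ["https://picsum.photos/seed/dave/64"],
})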

Learn more: Write Tables

5. Enrichment

Enrichment lets you append new columns, including media fetched from URLs, S3, or local files, without rewriting existing data.

from spiral import expressions as se

enrichment = events.enrich(
    se.pack({
        "thumbnail": se.http.get(events["url"]),
    })
)
enrichment.run()
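
Once the enrichment has run, the new data can be read back with a scan like any other column. A sketch, assuming the packed output is addressable as a "thumbnail" column (the exact column path depends on how the enrichment is named):

# Assumption: the enriched output is selectable as "thumbnail".
thumbs = sp.scan(events[["id", "thumbnail"]]).to_polars()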

Learn more: Enrichment

6. Querying Data (Polars Analytics)

Use scan for projection and filtering, then collect the result into Polars for fast analytics.

scan = sp.scan(
    events[["created_at", "type", "actor"]],
    where=events["type"] == "PullRequestEvent",
)
df = scan.to_polars()

summary = (
    df.group_by("type")
    .len()
    .sort("len", descending=True)
)
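
Because the scan already filters to PullRequestEvent rows, the same DataFrame answers follow-up questions with ordinary Polars operations, for example pull requests per actor (using the toy data from step 4):

# Count pull-request events per actor on the collected DataFrame.
per_actor = (
    df.group_by("actor")
    .len()
    .sort("len", descending=True)
)
print(per_actor)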

You can also use a Polars LazyFrame:

lazy = events.to_polars_lazy_frame()
result = lazy.collect()
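
Standard Polars lazy operations compose on top of the LazyFrame before anything is collected, for example (a sketch; pl is the usual polars import):

import polars as pl

# Filter and aggregate lazily; work happens only at collect().
push_counts = (
    events.to_polars_lazy_frame()
    .filter(pl.col("type") == "PushEvent")
    .group_by("actor")
    .agg(pl.len())
    .collect()
)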

Learn more: Scan & Query

7. Constructing a Data Loader

Turn a scan into a PyTorch-compatible loader for training and evaluation.

scan = sp.scan(events)
loader = scan.to_data_loader(batch_size=32, seed=42)

For distributed jobs:

dist_loader = scan.to_distributed_data_loader(batch_size=32, seed=42)
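
Either loader is meant to drop into a training loop. A minimal skeleton, with the batch format left as an assumption (see GPU Data Loading for what each batch actually contains):

# Assumption: the loader is iterable and yields one batch per step,
# as a PyTorch-style DataLoader would.
for epoch in range(2):
    for batch in loader:
        # hypothetical: collate `batch` into tensors and run your train step
        ...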

Learn more: GPU Data Loading, Python API
