Table Indexes
Please contact us before using these features.
Indexes enable different access patterns. Indexes are built on top of Spiral Tables, and querying an index returns a key table. The returned key table can be used with any scan of the original table; see Key Table Scanning for more details.
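As a mental model only, a key table can be pictured as a list of key values (optionally with scores) that restricts a later scan to the matching rows. The sketch below uses plain Python stand-ins rather than the Spiral API: `rows`, `key_table`, and `scan_with_keys` are hypothetical names, and returning rows in key-table order is an assumption of this sketch, not documented Spiral behavior.

```python
# Conceptual sketch only: plain Python stand-ins, not the Spiral API.

# A tiny "table" keyed by `id`.
rows = [
    {"id": "a", "language": "en", "text": "good morning"},
    {"id": "b", "language": "fr", "text": "bonjour"},
    {"id": "c", "language": "en", "text": "good night"},
]

# A "key table" as an index query might return it: keys plus a score.
key_table = [
    {"id": "c", "score": 0.9},
    {"id": "a", "score": 0.4},
]


def scan_with_keys(rows, key_table, columns):
    """Return the requested columns for rows whose key appears in the key table.

    Rows come back in key-table order (an assumption of this sketch).
    """
    by_key = {row["id"]: row for row in rows}
    return [
        {col: by_key[entry["id"]][col] for col in columns}
        for entry in key_table
        if entry["id"] in by_key
    ]


result = scan_with_keys(rows, key_table, ["id", "text"])
print(result)
```

The point of the separation is that the index only has to produce keys; any projection of the original table can then be fetched for those keys.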
The following examples use the FineWeb dataset.
import spiral
from spiral.demo import demo_project, fineweb
sp = spiral.Spiral()
project = demo_project(sp)
fineweb_table = fineweb(sp)

fineweb_table.schema().to_arrow()

pa.schema({
"date": pa.int64(),
"id": pa.string(),
"text": pa.string(),
"dump": pa.string(),
"url": pa.string(),
"file_path": pa.string(),
"language": pa.string(),
"language_score": pa.float64(),
"token_count": pa.int64(),
})

Text Index
Creating a text index is as simple as querying a table: you specify the table projection to index and, optionally, a row filter. Let's create a text index on the 'text' column of the table.
from spiral import expressions as se
fineweb_index = project.create_text_index(
"fineweb-v1-text",
# Select just the 'text' column in the indexing projection
# Use `se.text.field` to configure indexing options such as tokenizer.
fineweb_table.select("text"),
# Only index English documents
where=fineweb_table['language'] == 'en',
# If the index already exists, do not error.
exist_ok=True,
)

After creation, the index is built in the background. New changes to the table are automatically indexed.
New columns added to the projection after the index creation will not be automatically indexed.
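To make the roles of the projection and the row filter concrete, here is a toy inverted index in plain Python. It is a conceptual sketch, not Spiral's implementation: the whitespace tokenizer, the term-count scoring, and the names `build_index` and `search` are all assumptions made for illustration.

```python
# Conceptual sketch only: a toy inverted index, not Spiral's implementation.
from collections import Counter, defaultdict

docs = [
    {"id": "a", "language": "en", "text": "Good morning America"},
    {"id": "b", "language": "fr", "text": "Bonjour America"},
    {"id": "c", "language": "en", "text": "Good night"},
]


def tokenize(text):
    # Assumption: a lowercase whitespace tokenizer.
    return text.lower().split()


def build_index(docs, column, where):
    """Map each token to {key: term count}, using only rows that pass the filter."""
    index = defaultdict(Counter)
    for doc in docs:
        if where(doc):
            for token in tokenize(doc[column]):
                index[token][doc["id"]] += 1
    return index


def search(index, query, limit):
    """Return up to `limit` (key, score) pairs, best score first.

    A key's score is the number of query-term occurrences it matches,
    so rows matching more terms rank higher and rows matching none are omitted.
    """
    scores = Counter()
    for token in tokenize(query):
        for key, count in index.get(token, {}).items():
            scores[key] += count
    # Sort by descending score, then by key so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


index = build_index(docs, "text", where=lambda d: d["language"] == "en")
hits = search(index, "Good Morning America", limit=10)
print(hits)
```

Row `b` never appears in the results because the filter excluded it at indexing time, which mirrors the effect of the `where` argument above.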
import pyarrow as pa

scored_keys: pa.RecordBatchReader = sp.search(
# Number of results to return
10,
# Rank by expression(s), interpreted as `should`.
se.text.find(
fineweb_index["text"],
"Good Morning America", # Search term
),
)

Querying the index returns a key table, and the results can be used with a table scan; see Key Table Scanning for more details.
table_scan = sp.scan(fineweb_table)
results = table_scan.to_table(key_table=scored_keys.read_all())

Vector Index
Please contact us if you are interested in this feature.
…
Key Space Index
Key space indexes can be used to reliably shard a table and maintain balanced shards, even when the table is filtered and projected. Creating a key space index enables global sharding for distributed model training over filtered and projected data.
from spiral import expressions as se
index = project.create_key_space_index(
"fineweb-en-key-space",
# Granularity of the index, in row count.
8192,
# Table projection.
fineweb_table.select("text"),
# Only index English documents
where=fineweb_table['language'] == 'en',
# If the index already exists, do not error.
exist_ok=True,
)

After creation, the index is built in the background. New changes to the table are automatically indexed.
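One way to picture how a row-count granularity could lead to balanced shards is sketched below in plain Python. This is an assumption about the general technique, not a description of Spiral's implementation: it supposes the index records one split key per `granularity` rows of the filtered key space, and the names `build_split_points` and `assign_shards` are hypothetical.

```python
# Conceptual sketch only: plain Python, not Spiral's implementation.


def build_split_points(sorted_keys, granularity):
    """Record the first key of every `granularity`-row block of the filtered key space."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]


def assign_shards(split_points, num_shards):
    """Group blocks into contiguous shards whose sizes differ by at most one block.

    Returns the starting key of each non-empty shard.
    """
    base, extra = divmod(len(split_points), num_shards)
    starts, pos = [], 0
    for shard in range(num_shards):
        size = base + (1 if shard < extra else 0)
        if size:
            starts.append(split_points[pos])
        pos += size
    return starts


# Keys of the rows that pass the filter (for example, English documents), in key order.
filtered_keys = [f"doc-{i:04d}" for i in range(0, 100, 3)]  # 34 keys
split_points = build_split_points(filtered_keys, granularity=4)
shard_starts = assign_shards(split_points, num_shards=3)
print(len(split_points), shard_starts)
```

Because the split points are computed over the filtered rows rather than the raw table, each shard holds a similar number of matching rows even if the filter is very selective in some key ranges. A smaller granularity gives finer balance at the cost of storing more split points.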