
Best Practices

Spiral Tables are designed to be flexible yet performant for a wide variety of use cases. However, there are some best practices to follow when designing your table schema to ensure optimal performance.

Key Schema

The key schema defines the sorting order of the table and is crucial for performance. In general, rows frequently accessed together should be “close” in the key space.

  • Avoid using random keys such as UUIDs, as they lead to low locality and frequent compactions.
  • If possible, use keys that reflect the natural ordering of your data. If the data doesn’t have a natural ordering, consider a time-based key (e.g., UUIDv7 or a timestamp prefix) to group recent data together, as sketched below.
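
For illustration, a time-ordered key can be built in application code by prefixing a timestamp to a random suffix, which keeps recently written rows adjacent in the key space. The helper below is a hypothetical sketch and not part of the Spiral client:

import os
import time

def time_ordered_key() -> bytes:
    # Hypothetical helper (not part of the Spiral client): a big-endian
    # nanosecond timestamp prefix sorts recent rows next to each other,
    # while the random suffix avoids collisions.
    return time.time_ns().to_bytes(8, "big") + os.urandom(8)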

Column Groups

Column groups are vertical partitions of a table that are (almost) completely independent of each other. This design allows Spiral Tables to efficiently support very wide schemas with hundreds of thousands of columns.

  • Columns that aren’t used as filters should be separated from those that commonly appear in filters.
  • Columns that commonly appear in filters together should be placed in the same column group.
  • Columns with large cells (e.g., text), and especially binary columns (e.g., images, files), should be placed in their own column groups, as illustrated below.
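
As a rough illustration of this guidance, a wide schema might be grouped along these lines. The grouping below is hypothetical, and how groups are actually declared depends on your table-creation code:

# Illustrative grouping only -- the group and column names are made up,
# and declaring groups is done through the Spiral table schema, not this dict.
column_groups = {
    # Small columns that frequently appear in filters together.
    "metadata": ["status", "country", "created_at"],
    # Large text cells kept separate from filter columns.
    "content": ["title", "body"],
    # Binary payloads (images, files) in their own group.
    "blobs": ["thumbnail", "attachment"],
}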

When scanning a table, Spiral prunes fragments based on column groups. To do this efficiently, at least one column from a column group must be included in the projection or filter. If no column group is referenced in your query, the scan must consider every column group in the table, which defeats the purpose of independent column groups and can significantly impact performance. Because of this design, it is not currently possible to read or write only the key columns; at least one column from a column group must be included. If this limitation is causing problems for your use case, please reach out.
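As a sketch of what this means in practice, a scan should name at least one column from each group it needs. The table handle and scan call below are assumptions about the client API, shown for illustration only:

from spiral import Spiral

sp = Spiral()

# Hypothetical table handle and scan call -- the method names, arguments,
# and column names are assumptions, not confirmed Spiral client API.
table = sp.table("my_project.events")
rows = table.scan(
    # Projecting a column from one column group lets the scan prune
    # fragments belonging to the other, unreferenced groups.
    projection=["metadata.status"],
)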

Interactive Mode

When working interactively, such as in a Jupyter notebook, it is recommended to enable disk caching to speed up repeated scans of the same table. Caching can be enabled and configured when creating the Spiral client.

from spiral import Spiral sp = Spiral(overrides={"cache.enabled": "1"})

Or with more configuration options:

from spiral import Spiral sp = Spiral(overrides={ "cache.enabled": "1", "cache.memory_capacity_bytes": "1073741824", # 1 GiB "cache.disk_capacity_bytes": "10737418240", # 10 GiB })