Python API: spiral.expressions

aux

def aux(name: builtins.str, dtype: pa.DataType) -> Expr

Create a variable expression referencing a column in the auxiliary table.

The auxiliary table is optionally passed to the Scan#to_record_batches function when reading only specific keys or doing cell pushdown.

Arguments:

  • name - variable name
  • dtype - must match dtype of the column in the auxiliary table.

scalar

def scalar(value: Any) -> Expr

Create a scalar expression.

cast

def cast(expr: ExprLike, dtype: pa.DataType) -> Expr

Cast an expression into another PyArrow DataType.

and_

def and_(expr: ExprLike, *exprs: ExprLike) -> Expr

Create a conjunction of one or more expressions.

or_

def or_(expr: ExprLike, *exprs: ExprLike) -> Expr

Create a disjunction of one or more expressions.

eq

def eq(lhs: ExprLike, rhs: ExprLike) -> Expr

Create an equality comparison.

neq

def neq(lhs: ExprLike, rhs: ExprLike) -> Expr

Create a not-equal comparison.

xor

def xor(lhs: ExprLike, rhs: ExprLike) -> Expr

Create an XOR (exclusive-or) comparison.

lt

def lt(lhs: ExprLike, rhs: ExprLike) -> Expr

Create a less-than comparison.

lte

def lte(lhs: ExprLike, rhs: ExprLike) -> Expr

Create a less-than-or-equal comparison.

gt

def gt(lhs: ExprLike, rhs: ExprLike) -> Expr

Create a greater-than comparison.

gte

def gte(lhs: ExprLike, rhs: ExprLike) -> Expr

Create a greater-than-or-equal comparison.

negate

def negate(expr: ExprLike) -> Expr

Negate the given expression.

not_

def not_(expr: ExprLike) -> Expr

Negate the given expression.

is_null

def is_null(expr: ExprLike) -> Expr

Check if the given expression is null.

is_not_null

def is_not_null(expr: ExprLike) -> Expr

Check if the given expression is not null.

add

def add(lhs: ExprLike, rhs: ExprLike) -> Expr

Add two expressions.

subtract

def subtract(lhs: ExprLike, rhs: ExprLike) -> Expr

Subtract two expressions.

multiply

def multiply(lhs: ExprLike, rhs: ExprLike) -> Expr

Multiply two expressions.

divide

def divide(lhs: ExprLike, rhs: ExprLike) -> Expr

Divide two expressions.

modulo

def modulo(lhs: ExprLike, rhs: ExprLike) -> Expr

Compute the modulo of two expressions.

getitem

def getitem(expr: ExprLike, field: str) -> Expr

Get a field from a struct.

Arguments:

  • expr - The struct expression to get the field from.
  • field - The field to get. A dot-separated string can be used to access nested fields.
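
The dot-separated lookup described above can be pictured with plain Python dictionaries. This is only a sketch of the path semantics, not Spiral's implementation: `get_nested` is a hypothetical helper, and nested dicts stand in for struct values.

```python
from typing import Any, Mapping


def get_nested(struct: Mapping[str, Any], field: str) -> Any:
    """Resolve a dot-separated path against nested mappings (hypothetical helper)."""
    value: Any = struct
    # Each path segment descends one level into the nested structure.
    for part in field.split("."):
        value = value[part]
    return value


# Nested dicts standing in for a struct with nested struct fields.
row = {"id": 7, "user": {"address": {"city": "Oslo"}}}

top_level = get_nested(row, "id")
city = get_nested(row, "user.address.city")
```

Under this model, "user.address.city" is equivalent to three successive single-field lookups.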

pack

def pack(fields: dict[str, ExprLike], *, nullable: bool = False) -> Expr

Assemble a new struct from the given named fields.

Arguments:

  • fields - A dictionary of field names to expressions. The field names will be used as the struct field names.

merge

def merge(*structs: "ExprLike") -> Expr

Merge fields from the given structs into a single struct.

Arguments:

  • *structs - Each expression must evaluate to a struct.

Returns:

A single struct containing all the fields from the input structs. If a field is present in multiple structs, the value from the last struct is used.
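
The "last struct wins" rule can be illustrated with plain dictionaries. This is a sketch of the documented semantics only: `merge_structs` is a hypothetical stand-in, not the Spiral function, and it makes no claim about the field order Spiral produces.

```python
from typing import Any


def merge_structs(*structs: dict[str, Any]) -> dict[str, Any]:
    """Combine fields from several dict 'structs'; later inputs override earlier ones."""
    merged: dict[str, Any] = {}
    for struct in structs:
        # update() overwrites any key already present, so the last struct wins.
        merged.update(struct)
    return merged


a = {"id": 1, "name": "old"}
b = {"name": "new", "score": 0.5}

merged = merge_structs(a, b)
```

Here `name` appears in both inputs, so the merged result carries the value from `b`.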

select

def select(expr: ExprLike, names: list[str] = None, exclude: list[str] = None) -> Expr

Select fields from a struct.

Arguments:

  • expr - The struct-like expression to select fields from.
  • names - Field names to select. If a path contains a dot, it is assumed to be a nested struct field.
  • exclude - List of field names to exclude from the result. Exactly one of names or exclude must be provided.
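
The "exactly one of names or exclude" rule can be sketched with a dictionary model. `select_fields` is a hypothetical helper that handles top-level field names only; how Spiral shapes the result for dotted paths is not modeled here.

```python
from typing import Any, Optional


def select_fields(
    struct: dict[str, Any],
    names: Optional[list[str]] = None,
    exclude: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Keep the listed fields, or drop the excluded ones (hypothetical model)."""
    # Reject both "neither given" and "both given".
    if (names is None) == (exclude is None):
        raise ValueError("Exactly one of names or exclude must be provided")
    if names is not None:
        return {name: struct[name] for name in names}
    excluded = set(exclude)
    return {key: value for key, value in struct.items() if key not in excluded}


row = {"a": 1, "b": 2, "c": 3}

kept = select_fields(row, names=["a", "b"])
dropped = select_fields(row, exclude=["a"])
```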

Expr

class Expr()

Base class for Spiral expressions. All expressions support comparison and basic arithmetic operations.

__getitem__

def __getitem__(item: str | int | list[str]) -> "Expr"

Get an item from a struct or list.

Arguments:

  • item - The key or index to get. If item is a string, it is treated as a field in a struct; a dot-separated string can be used to access nested fields. If item is a list of strings, it is treated as a list of fields in a struct. If item is an integer, it is treated as an index into a list.

cast

def cast(dtype: pa.DataType) -> "Expr"

Cast the expression result to a different data type.

select

def select(*paths: str, exclude: list[str] = None) -> "Expr"

Select fields from a struct-like expression.

Arguments:

  • *paths - Field names to select. If a path contains a dot, it is assumed to be a nested struct field.
  • exclude - List of field names to exclude from the result.

UDF

class UDF(abc.ABC)

A User-Defined Function (UDF). This class should be subclassed to define custom UDFs.

Example:

import spiral
from spiral.demo import fineweb

sp = spiral.Spiral()
fineweb_table = fineweb(sp)

from spiral import expressions as se
import pyarrow as pa
import pyarrow.compute as pc  # imported explicitly so the compute functions are available


class MyAdd(se.UDF):
    def __init__(self):
        super().__init__("my_add")

    def return_type(self, scope: pa.DataType):
        # The result has the same type as the first input field.
        if not isinstance(scope, pa.StructType):
            raise ValueError("Expected struct type as input")
        return scope.field(0).type

    def invoke(self, scope: pa.Array):
        # Add the first two fields of the input struct element-wise.
        if not isinstance(scope, pa.StructArray):
            raise ValueError("Expected struct array as input")
        return pc.add(scope.field(0), scope.field(1))


my_add = MyAdd()
expr = my_add(fineweb_table.select("first_arg", "second_arg"))

__call__

def __call__(scope: ExprLike) -> Expr

Create an expression that calls this UDF with the given arguments.

return_type

@abc.abstractmethod
def return_type(scope: pa.DataType) -> pa.DataType

Must return the return type of the UDF given the input scope type.

All expressions in Spiral must return nullable (Arrow default) types. This applies recursively to nested structs: every field of a struct must be nullable, and if a field is itself a struct, its fields must be nullable too, and so on.

invoke

@abc.abstractmethod
def invoke(scope: pa.Array) -> pa.Array

Must implement the UDF logic given the input scope array.
