aux
def aux(name: builtins.str, dtype: pa.DataType) -> Expr
Create a variable expression referencing a column in the auxiliary table.
The auxiliary table is optionally passed to the Scan#to_record_batches function when reading only specific keys
or doing cell pushdown.
Arguments:
name- The variable name.
dtype- Must match the dtype of the column in the auxiliary table.
scalar
def scalar(value: Any) -> Expr
Create a scalar expression.
cast
def cast(expr: ExprLike, dtype: pa.DataType) -> Expr
Cast an expression into another PyArrow DataType.
and_
def and_(expr: ExprLike, *exprs: ExprLike) -> Expr
Create a conjunction of one or more expressions.
or_
def or_(expr: ExprLike, *exprs: ExprLike) -> Expr
Create a disjunction of one or more expressions.
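Both and_ and or_ take one required expression plus any number of additional ones. As a rough mental model only, the sketch below uses plain Python booleans as stand-ins for expressions and folds them left to right. The helper names all_of and any_of are hypothetical, and the real functions build an Expr rather than evaluating anything.

```python
from functools import reduce
import operator


def all_of(first: bool, *rest: bool) -> bool:
    """Model of a variadic conjunction: first AND rest[0] AND rest[1] ..."""
    return reduce(operator.and_, rest, first)


def any_of(first: bool, *rest: bool) -> bool:
    """Model of a variadic disjunction: first OR rest[0] OR rest[1] ..."""
    return reduce(operator.or_, rest, first)


single = all_of(True)            # a single operand is returned unchanged
mixed_and = all_of(True, False)  # one false operand makes the conjunction false
mixed_or = any_of(False, True)   # one true operand makes the disjunction true
```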
eq
def eq(lhs: ExprLike, rhs: ExprLike) -> Expr
Create an equality comparison.
neq
def neq(lhs: ExprLike, rhs: ExprLike) -> Expr
Create a not-equal comparison.
xor
def xor(lhs: ExprLike, rhs: ExprLike) -> Expr
Create an XOR comparison.
lt
def lt(lhs: ExprLike, rhs: ExprLike) -> Expr
Create a less-than comparison.
lte
def lte(lhs: ExprLike, rhs: ExprLike) -> Expr
Create a less-than-or-equal comparison.
gt
def gt(lhs: ExprLike, rhs: ExprLike) -> Expr
Create a greater-than comparison.
gte
def gte(lhs: ExprLike, rhs: ExprLike) -> Expr
Create a greater-than-or-equal comparison.
negate
def negate(expr: ExprLike) -> Expr
Negate the given expression.
not_
def not_(expr: ExprLike) -> Expr
Negate the given expression.
is_null
def is_null(expr: ExprLike) -> Expr
Check if the given expression is null.
is_not_null
def is_not_null(expr: ExprLike) -> Expr
Check if the given expression is not null.
add
def add(lhs: ExprLike, rhs: ExprLike) -> Expr
Add two expressions.
subtract
def subtract(lhs: ExprLike, rhs: ExprLike) -> Expr
Subtract two expressions.
multiply
def multiply(lhs: ExprLike, rhs: ExprLike) -> Expr
Multiply two expressions.
divide
def divide(lhs: ExprLike, rhs: ExprLike) -> Expr
Divide two expressions.
modulo
def modulo(lhs: ExprLike, rhs: ExprLike) -> Expr
Compute the modulo of two expressions.
getitem
def getitem(expr: ExprLike, field: str) -> Expr
Get a field from a struct.
Arguments:
expr- The struct expression to get the field from.
field- The field to get. A dot-separated string is supported to access nested fields.
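To illustrate what a dot-separated field path means, here is a minimal sketch that resolves such a path against nested Python dicts standing in for structs. The helper get_nested and the sample record are hypothetical and are not part of the library; the sketch only models the lookup order.

```python
from typing import Any


def get_nested(record: dict[str, Any], path: str) -> Any:
    """Resolve a dot-separated path against nested dicts (stand-ins for structs)."""
    value: Any = record
    for part in path.split("."):
        value = value[part]  # raises KeyError if a field along the path is missing
    return value


row = {"meta": {"lang": {"code": "en"}}, "text": "hello"}

nested_code = get_nested(row, "meta.lang.code")  # walks meta, then lang, then code
top_level = get_nested(row, "text")              # a path without dots is a plain field lookup
```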
pack
def pack(fields: dict[str, ExprLike], *, nullable: bool = False) -> Expr
Assemble a new struct from the given named fields.
Arguments:
fields- A dictionary of field names to expressions. The field names will be used as the struct field names.
merge
def merge(*structs: "ExprLike") -> Expr
Merge fields from the given structs into a single struct.
Arguments:
*structs- Each expression must evaluate to a struct.
Returns:
A single struct containing all the fields from the input structs. If a field is present in multiple structs, the value from the last struct is used.
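The last-wins rule for duplicate fields can be modelled with plain dicts. This is a sketch of the documented semantics only, with a hypothetical helper merge_structs; it says nothing about how the library implements merge or how it orders fields.

```python
def merge_structs(*structs: dict) -> dict:
    """Combine dicts left to right; a later dict overwrites earlier values for the same key."""
    merged: dict = {}
    for struct in structs:
        merged.update(struct)
    return merged


# "b" appears in both inputs, so the value from the last one is kept.
merged = merge_structs({"a": 1, "b": 2}, {"b": 20, "c": 3})
```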
select
def select(expr: ExprLike,
           names: list[str] = None,
           exclude: list[str] = None) -> Expr
Select fields from a struct.
Arguments:
expr- The struct-like expression to select fields from.
names- Field names to select. If a path contains a dot, it is assumed to be a nested struct field.
exclude- List of field names to exclude from the result. Exactly one of names or exclude must be provided.
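The "exactly one of names or exclude" contract can be sketched with a dict standing in for a struct. The helper select_fields is hypothetical, handles top-level field names only (no dotted paths), and the error type is an assumption; the library may report the problem differently.

```python
from typing import Any, Optional


def select_fields(
    struct: dict[str, Any],
    names: Optional[list[str]] = None,
    exclude: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Keep the listed fields, or drop the excluded ones; exactly one mode must be chosen."""
    if (names is None) == (exclude is None):
        # Both given or neither given: the call is ambiguous.
        raise ValueError("Exactly one of names or exclude must be provided")
    if names is not None:
        return {name: struct[name] for name in names}
    return {key: value for key, value in struct.items() if key not in exclude}


row = {"id": 1, "text": "hello", "score": 0.5}

kept = select_fields(row, names=["id", "text"])
dropped = select_fields(row, exclude=["score"])
```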
Expr
class Expr()
Base class for Spiral expressions. All expressions support comparison and basic arithmetic operations.
__getitem__
def __getitem__(item: str | int | list[str]) -> "Expr"
Get an item from a struct or list.
Arguments:
item- The key or index to get. If item is a string, it is assumed to be a field in a struct. Dot-separated string is supported to access nested fields. If item is a list of strings, it is assumed to be a list of fields in a struct. If item is an integer, it is assumed to be an index in a list.
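The three accepted item types can be pictured as a dispatch on the argument's type. The sketch below models this with dicts as structs and Python lists as lists. The function get_item is hypothetical, and returning a dict for a list of field names is an assumption about the shape of the result.

```python
from typing import Any, Union


def get_item(value: Any, item: Union[str, int, list]) -> Any:
    """Dispatch on the item type, mirroring the documented str / int / list[str] cases."""
    if isinstance(item, str):
        # A string is a struct field; dots descend into nested structs.
        for part in item.split("."):
            value = value[part]
        return value
    if isinstance(item, int):
        # An integer is an index into a list.
        return value[item]
    if isinstance(item, list):
        # A list of strings picks several struct fields at once.
        return {name: value[name] for name in item}
    raise TypeError(f"Unsupported item type: {type(item).__name__}")


row = {"meta": {"lang": "en"}, "text": "hello", "score": 0.5}

by_path = get_item(row, "meta.lang")
by_index = get_item([10, 20, 30], 1)
by_names = get_item(row, ["text", "score"])
```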
cast
def cast(dtype: pa.DataType) -> "Expr"
Cast the expression result to a different data type.
select
def select(*paths: str, exclude: list[str] = None) -> "Expr"
Select fields from a struct-like expression.
Arguments:
*paths- Field names to select. If a path contains a dot, it is assumed to be a nested struct field.
exclude- List of field names to exclude from the result.
UDF
class UDF(abc.ABC)
A User-Defined Function (UDF). This class should be subclassed to define custom UDFs.
Example:
import spiral
from spiral.demo import fineweb

sp = spiral.Spiral()
fineweb_table = fineweb(sp)

from spiral import expressions as se
import pyarrow as pa
import pyarrow.compute as pc  # imported explicitly so the compute functions are available

class MyAdd(se.UDF):
    def __init__(self):
        super().__init__("my_add")

    def return_type(self, scope: pa.DataType):
        # The scope is a struct of the arguments; the result has the type of the first one.
        if not isinstance(scope, pa.StructType):
            raise ValueError("Expected struct type as input")
        return scope.field(0).type

    def invoke(self, scope: pa.Array):
        # Add the first two struct fields element-wise.
        if not isinstance(scope, pa.StructArray):
            raise ValueError("Expected struct array as input")
        return pc.add(scope.field(0), scope.field(1))

my_add = MyAdd()
expr = my_add(fineweb_table.select("first_arg", "second_arg"))
__call__
def __call__(scope: ExprLike) -> Expr
Create an expression that calls this UDF with the given arguments.
return_type
@abc.abstractmethod
def return_type(scope: pa.DataType) -> pa.DataType
Must return the return type of the UDF given the input scope type.
All expressions in Spiral must return nullable (Arrow default) types. This applies recursively to nested structs: every field of a struct must be nullable, and if a field is itself a struct, its fields must be nullable as well, and so on.
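The recursive nullability rule can be sketched as a check over a nested type description. FieldSpec below is a hypothetical, simplified stand-in for an Arrow field (a name, a nullable flag, and child fields for structs); it is not a PyArrow or Spiral type, and the check is an illustration rather than the validation the library performs.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Simplified stand-in for an Arrow field; children are the fields of a struct."""
    name: str
    nullable: bool = True
    children: list["FieldSpec"] = field(default_factory=list)


def all_nullable(spec: FieldSpec) -> bool:
    """True only if this field and every nested field below it is nullable."""
    return spec.nullable and all(all_nullable(child) for child in spec.children)


valid = FieldSpec("result", children=[
    FieldSpec("a"),
    FieldSpec("b", children=[FieldSpec("c")]),
])

# Same shape, but the innermost field is non-nullable, which breaks the rule.
invalid = FieldSpec("result", children=[
    FieldSpec("a"),
    FieldSpec("b", children=[FieldSpec("c", nullable=False)]),
])

valid_ok = all_nullable(valid)
invalid_ok = all_nullable(invalid)
```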
invoke
@abc.abstractmethod
def invoke(scope: pa.Array) -> pa.Array
Must implement the UDF logic given the input scope array.