srdatalog.dsl

Python front-end DSL for SRDatalog, replacing the Nim macro DSL in lang.nim/syntax.nim.

Rules are constructed with Python objects and operator overloading:

X, Y, Z = Var("X"), Var("Y"), Var("Z")
edge = Relation("Edge", 2)
path = Relation("Path", 2)

Module Contents

Classes

Agg

Aggregation body clause. Binds result_var to the aggregate of rel(args...) using func (C++ aggregator name; “agg” + cpp_type for custom aggregators, mirrors Nim’s AggClause).

ArgKind

Mirrors Nim ClauseArgKind in syntax.nim.

Atom

A relation application, used as head or body clause.

ClauseArg

An argument slot in an atom: a logic var, a compile-time constant, or raw C++ code.

Conjunction

Intermediate: accumulates body clauses under &. Not emitted directly.

Const

A compile-time constant argument wrapping a Python value.

Filter

Inline filter: return <cpp_code> against bound vars. Mostly produced by the constant-rewriting pass (where e.g. R(1, x) becomes R(_c0, x) + Filter((_c0,), "return _c0 == 1;")), but available in the surface DSL too.

HeadGroup

Intermediate: accumulates head atoms under |. Mirrors Nim’s {(A args), (B args)} <-- body multi-head rule form.

Let

Bind a fresh variable to a C++ expression. Produced by the head-constant-rewriting pass when a head has literal args; the head arg is replaced by a fresh variable and a corresponding Let is appended to the body (so the fresh variable is bound before InsertInto reads it).

Negation

Negated atom (~rel(...)). Appears only in rule bodies.

PlanEntry

User-specified plan for a rule variant. Mirrors PlanEntry in syntax.nim.

Program

A Datalog program. Takes rules; the relations list is derived from them via the Relation back-ref on each Atom.

Relation

A relation declaration. Callable to build atoms.

Rule

A Datalog rule: head_1, head_2, ... :- body_1, body_2, ....

Split

Split marker — partitions a rule body into above-split and below-split sections. Mirrors Nim’s SplitClause (split keyword).

Var

A logic variable. Distinct from Python values; used by operator overloads to build AST.

Functions

agg

Build an aggregation body clause.

count

Convenience: count(v, R(x, y)) → (v = count(R(x, y))).

cpp

Raw C++ code as a clause argument (rare; mirrors the $"..." Nim syntax).

sum

Data

API

class srdatalog.dsl.Agg[source]

Aggregation body clause. Binds result_var to the aggregate of rel(args...) using func (C++ aggregator name; “agg” + cpp_type for custom aggregators, mirrors Nim’s AggClause).

Example: count of R(x, y) bound to c: agg(c, "count", r(x, y))

Nim’s HIR emits these into JSON as {"kind": "aggregation", ...} but its lowering pipeline does not construct moAggregate nodes from AggClause (zero such constructions in src/srdatalog). Python mirrors that behavior: Agg round-trips through HIR but does not appear in MIR.

__and__(other: srdatalog.dsl.BodyClauseT | srdatalog.dsl.Conjunction) srdatalog.dsl.Conjunction[source]
args: tuple[srdatalog.dsl.ClauseArg, ...]

None

cpp_type: str = <Multiline-String>
func: str

None

rel: str

None

relation: Relation | None

‘field(…)’

result_var: str

None

class srdatalog.dsl.ArgKind(*args, **kwds)[source]

Bases: enum.Enum

Mirrors Nim ClauseArgKind in syntax.nim.

Initialization

CONST

‘const’

CPP_CODE

‘code’

LVAR

‘var’

class srdatalog.dsl.Atom[source]

A relation application, used as head or body clause.

Build via Relation.__call__, never directly. Supports & to chain into a body conjunction and <= to form a rule with this atom as head.

prov carries rewrite provenance: set by passes like semi-join optimization when a rewritten body clause is emitted in place of the original. Defaults to user-written.

__and__(other) srdatalog.dsl.Conjunction[source]

Compose with Atom / Negation / Filter / Let / Conjunction.

__invert__() srdatalog.dsl.Negation[source]

~atom = negation.

__le__(body) srdatalog.dsl.Rule[source]

head <= body → Rule. Anonymous; call .named(name) to label. body can be any BodyClauseT or a Conjunction of them.

__or__(other: srdatalog.dsl.Atom | srdatalog.dsl.HeadGroup) srdatalog.dsl.HeadGroup[source]

Compose atoms into a multi-head group: A | B | C <= body.
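The operators documented above (&, ~, |, <=) can be illustrated with a small self-contained sketch. The Mini* classes below are hypothetical stand-ins written for this example, not the srdatalog classes; they only show how Python operator overloading can assemble a rule from atoms under the precedence rules Python already provides (~ binds tighter than &, which binds tighter than |, which binds tighter than <=).

```python
from __future__ import annotations

from dataclasses import dataclass


def _as_clauses(body):
    # Normalise a single clause or a conjunction into a tuple of clauses.
    return body.clauses if isinstance(body, MiniConjunction) else (body,)


@dataclass(frozen=True)
class MiniAtom:
    """Hypothetical stand-in for Atom: relation name plus argument names."""
    rel: str
    args: tuple

    def __and__(self, other):
        # atom & clause -> body conjunction
        return MiniConjunction((self,)) & other

    def __invert__(self):
        # ~atom -> negated atom
        return MiniNegation(self)

    def __or__(self, other):
        # atom | atom -> multi-head group
        return MiniHeadGroup((self,)) | other

    def __le__(self, body):
        # head <= body -> single-head rule
        return MiniRule((self,), _as_clauses(body))


@dataclass(frozen=True)
class MiniNegation:
    atom: MiniAtom

    def __and__(self, other):
        return MiniConjunction((self,)) & other


@dataclass(frozen=True)
class MiniConjunction:
    clauses: tuple

    def __and__(self, other):
        return MiniConjunction(self.clauses + _as_clauses(other))


@dataclass(frozen=True)
class MiniHeadGroup:
    atoms: tuple

    def __or__(self, other):
        extra = other.atoms if isinstance(other, MiniHeadGroup) else (other,)
        return MiniHeadGroup(self.atoms + extra)

    def __le__(self, body):
        return MiniRule(self.atoms, _as_clauses(body))


@dataclass(frozen=True)
class MiniRule:
    heads: tuple
    body: tuple
    name: str | None = None

    def named(self, name):
        return MiniRule(self.heads, self.body, name)


def path(*args):
    return MiniAtom("Path", args)


def edge(*args):
    return MiniAtom("Edge", args)


# Path(X, Z) :- Path(X, Y), Edge(Y, Z)
r2 = (path("X", "Z") <= path("X", "Y") & edge("Y", "Z")).named("TCRec")

# Multi-head rule with a negated body atom.
multi = (path("X", "Y") | edge("X", "Y")) <= path("X", "Y") & ~edge("Y", "X")
```

The parentheses around the head group in the last line are optional under Python's precedence rules, but they make the head/body boundary easier to read.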

args: tuple[srdatalog.dsl.ClauseArg, ...]

None

prov: srdatalog.ir.hir.provenance.Provenance

None

rel: str

None

relation: Relation | None

‘field(…)’

srdatalog.dsl.BodyClauseT

None

class srdatalog.dsl.ClauseArg[source]

An argument slot in an atom: a logic var, a compile-time constant, or raw C++ code.

const_cpp_expr: str | None

None

const_value: int | None

None

cpp_code: str | None

None

kind: srdatalog.dsl.ArgKind

None

var_name: str | None

None
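As a rough illustration of how the three argument kinds and the fields above fit together, the sketch below converts Python-side arguments into argument slots. All Mini* names and the to_clause_arg helper are hypothetical, and the conversion rules (in particular, filling const_cpp_expr with str(value) for a bare int) are assumptions for this example rather than the library's documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class MiniArgKind(Enum):
    # Same string values as the ArgKind members listed in this reference.
    LVAR = "var"
    CONST = "const"
    CPP_CODE = "code"


@dataclass(frozen=True)
class MiniClauseArg:
    kind: MiniArgKind
    var_name: str | None = None
    const_value: int | None = None
    const_cpp_expr: str | None = None
    cpp_code: str | None = None


@dataclass(frozen=True)
class MiniVar:
    name: str


def mini_cpp(code):
    # Raw C++ code used directly as an argument.
    return MiniClauseArg(MiniArgKind.CPP_CODE, cpp_code=code)


def to_clause_arg(arg):
    """Assumed conversion: pass slots through, map vars and ints to slots."""
    if isinstance(arg, MiniClauseArg):
        return arg
    if isinstance(arg, MiniVar):
        return MiniClauseArg(MiniArgKind.LVAR, var_name=arg.name)
    if isinstance(arg, int):
        return MiniClauseArg(
            MiniArgKind.CONST, const_value=arg, const_cpp_expr=str(arg)
        )
    raise TypeError(f"unsupported argument type: {type(arg).__name__}")


slots = tuple(to_clause_arg(a) for a in (MiniVar("x"), 1, mini_cpp("x + 1")))
```

Exactly one of the payload fields is populated per slot in this sketch, selected by kind.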

class srdatalog.dsl.Conjunction[source]

Intermediate: accumulates body clauses under &. Not emitted directly.

__and__(other: srdatalog.dsl.BodyClauseT | srdatalog.dsl.Conjunction) srdatalog.dsl.Conjunction[source]
clauses: tuple[srdatalog.dsl.BodyClauseT, ...]

None

class srdatalog.dsl.Const(value, cpp_expr: str | None = None)[source]

A compile-time constant argument wrapping a Python value.

Prefer this over bare int arguments when you want the intent explicit at the call site — e.g., Method_Modifier(Const(abstract_id), meth) instead of Method_Modifier(abstract_id, meth) where abstract_id is a Python int that readers can’t tell apart from a pure-Python value.

For dataset-resolved constants (read from a meta.json at program construction time), this is the recommended shape:

meta = load_meta("batik_meta.json")
ABSTRACT = Const(meta["abstract"])   # Python binding, value baked in
Method_Modifier(ABSTRACT, meth)

cpp_expr overrides the auto-derived C++ literal. For int it defaults to str(value). Other types require an explicit cpp_expr until we need them.
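The cpp_expr defaulting rule can be sketched as follows. MiniConst is a hypothetical stand-in, and raising TypeError for non-int values without an explicit cpp_expr is an assumption; the text above only says that other types require an explicit cpp_expr.

```python
class MiniConst:
    """Hypothetical stand-in for Const; not the srdatalog class."""

    __slots__ = ("cpp_expr", "value")

    def __init__(self, value, cpp_expr=None):
        if cpp_expr is None:
            if isinstance(value, int):
                # For int, the C++ literal defaults to str(value).
                cpp_expr = str(value)
            else:
                # Assumed failure mode for types without a default literal.
                raise TypeError(
                    f"explicit cpp_expr required for {type(value).__name__}"
                )
        self.value = value
        self.cpp_expr = cpp_expr

    def __repr__(self):
        return f"MiniConst({self.value!r}, cpp_expr={self.cpp_expr!r})"


# Stand-in for a dataset-resolved value read at program construction time.
meta = {"abstract": 42}
ABSTRACT = MiniConst(meta["abstract"])
LABEL = MiniConst("abc", cpp_expr='"abc"')
```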

Initialization

__repr__() str[source]
__slots__

(‘cpp_expr’, ‘value’)

class srdatalog.dsl.Filter[source]

Inline filter: return <cpp_code> against bound vars. Mostly produced by the constant-rewriting pass (where e.g. R(1, x) becomes R(_c0, x) + Filter((_c0,), "return _c0 == 1;")), but available in the surface DSL too.
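The body-constant rewrite described above can be sketched on plain tuples. The function name, the tuple encoding, and the per-atom numbering of fresh variables are assumptions made for this example; the real pass may number fresh variables differently (for instance per rule).

```python
def rewrite_body_constants(rel, args):
    """Replace int literals in an atom's args with fresh vars plus filters.

    Returns ((rel, new_args), filters), where each filter is a
    (vars, cpp_code) pair shaped like Filter's two fields.
    """
    new_args = []
    filters = []
    for arg in args:
        if isinstance(arg, int):
            fresh = f"_c{len(filters)}"  # hypothetical naming scheme
            new_args.append(fresh)
            filters.append(((fresh,), f"return {fresh} == {arg};"))
        else:
            new_args.append(arg)
    return (rel, tuple(new_args)), filters


atom, filters = rewrite_body_constants("R", (1, "x"))
```

Here atom is ("R", ("_c0", "x")) and filters holds the single pair (("_c0",), "return _c0 == 1;"), matching the R(1, x) example in the description.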

__and__(other: srdatalog.dsl.BodyClauseT | srdatalog.dsl.Conjunction) srdatalog.dsl.Conjunction[source]
code: str

None

vars: tuple[str, ...]

None

class srdatalog.dsl.HeadGroup[source]

Intermediate: accumulates head atoms under |. Mirrors Nim’s {(A args), (B args)} <-- body multi-head rule form.

__le__(body) srdatalog.dsl.Rule[source]
__or__(other: srdatalog.dsl.Atom | srdatalog.dsl.HeadGroup) srdatalog.dsl.HeadGroup[source]
atoms: tuple[srdatalog.dsl.Atom, ...]

None

class srdatalog.dsl.Let[source]

Bind a fresh variable to a C++ expression. Produced by the head-constant-rewriting pass when a head has literal args; the head arg is replaced by a fresh variable and a corresponding Let is appended to the body (so the fresh variable is bound before InsertInto reads it).
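A minimal sketch of the head-constant rewrite, again on plain tuples. The _h prefix for fresh variables, the ("let", var_name, code, deps) encoding, and the use of the bare literal as the C++ expression are all assumptions for illustration; only the overall shape (replace the literal, append a Let to the body) comes from the description above.

```python
def rewrite_head_constants(head_args, body):
    """Replace int literals in head args with fresh vars bound by Let clauses.

    Returns (new_head_args, new_body). Each appended Let is encoded as
    ("let", var_name, code, deps), mirroring Let's three fields.
    """
    new_head = []
    new_body = list(body)
    fresh_count = 0
    for arg in head_args:
        if isinstance(arg, int):
            fresh = f"_h{fresh_count}"  # hypothetical naming scheme
            fresh_count += 1
            new_head.append(fresh)
            # Appended after the existing clauses, so the variable is bound
            # before the head insertion reads it.
            new_body.append(("let", fresh, str(arg), ()))
        else:
            new_head.append(arg)
    return tuple(new_head), tuple(new_body)


head, body = rewrite_head_constants(("x", 0), (("atom", "Edge", ("x", "y")),))
```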

__and__(other: srdatalog.dsl.BodyClauseT | srdatalog.dsl.Conjunction) srdatalog.dsl.Conjunction[source]
code: str

None

deps: tuple[str, ...]

()

var_name: str

None

class srdatalog.dsl.Negation[source]

Negated atom (~rel(...)). Appears only in rule bodies.

__and__(other: srdatalog.dsl.BodyClauseT | srdatalog.dsl.Conjunction) srdatalog.dsl.Conjunction[source]
atom: srdatalog.dsl.Atom

None

class srdatalog.dsl.PlanEntry[source]

User-specified plan for a rule variant. Mirrors PlanEntry in syntax.nim.

delta == -1 targets the base (non-recursive) variant; otherwise it is the body-clause index used as the delta seed for semi-naive evaluation. var_order and clause_order override the default planning heuristic; when only var_order is given, clause_order is derived from it.

The pragma flags flow through to HirRuleVariant so codegen sees them:

  • fanout -> fan-out work-stealing for Cartesian products

  • work_stealing -> mid-level work-stealing (task queue + steal loop)

  • block_group -> block-group work partitioning

  • dedup_hash -> GPU hash table for in-kernel existential dedup

balanced_root / balanced_sources drive balanced partitioning for skewed joins (not yet lowered in Python).
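To make the meaning of delta concrete, the sketch below labels which version of each body clause a variant would read under a simplified textbook form of semi-naive evaluation: the clause at index delta reads the newly derived ("delta") tuples and every other clause reads the full relation. The function and the "delta"/"full" labels are hypothetical, and the actual lowering in SRDatalog may distinguish more versions than this.

```python
def variant_sources(body_relations, delta):
    """Label each body clause as reading 'delta' or 'full' tuples.

    delta == -1 is the base variant, where no clause is a delta seed.
    """
    if delta != -1 and not 0 <= delta < len(body_relations):
        raise IndexError(f"delta {delta} is not a body-clause index")
    return tuple(
        (rel, "delta" if index == delta else "full")
        for index, rel in enumerate(body_relations)
    )


# Path(X, Z) :- Path(X, Y), Edge(Y, Z), seeded on body clause 0.
seeded = variant_sources(("Path", "Edge"), 0)
base = variant_sources(("Edge",), -1)
```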

balanced_root: tuple[str, ...]

()

balanced_sources: tuple[str, ...]

()

block_group: bool

False

clause_order: tuple[int, ...]

()

dedup_hash: bool

False

delta: int

None

fanout: bool

False

var_order: tuple[str, ...]

()

work_stealing: bool

False

class srdatalog.dsl.Program[source]

A Datalog program. Takes rules; the relations list is derived from them via the Relation back-ref on each Atom.

The previous API took relations=[...] in parallel with rules=[...]. That was a pure bug generator — if a relation was declared but never used, or used but never declared, the downstream passes silently generated wrong code. With the derived list, the schema is exactly the set of relations referenced by some rule, in rule-first-occurrence order (heads before body, body in source order). This matches the Nim-side normalization in hir.nim:normalizeDecls and keeps byte-match across the two ports.
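The derived ordering (rule-first-occurrence, heads before body, body in source order) can be sketched as below. Rules are reduced to (head_names, body_names) pairs of relation names, which is a simplification for this example; derive_relations is a hypothetical helper, not the library's implementation.

```python
def derive_relations(rules):
    """Return relation names in rule-first-occurrence order.

    rules is an iterable of (head_names, body_names) pairs. Within each rule
    the heads are visited before the body, and the body in source order.
    """
    ordered = {}  # dicts preserve insertion order
    for head_names, body_names in rules:
        for name in (*head_names, *body_names):
            ordered.setdefault(name, None)
    return list(ordered)


rules = [
    (("Path",), ("Edge",)),         # Path(X, Y) :- Edge(X, Y)
    (("Path",), ("Path", "Edge")),  # Path(X, Z) :- Path(X, Y), Edge(Y, Z)
]
schema = derive_relations(rules)
```

Because the list is computed from the rules, a relation that no rule references cannot appear in it, and a referenced relation cannot be missing.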

__post_init__() None[source]
add(*items: srdatalog.dsl.Rule) srdatalog.dsl.Program[source]
relations: list[srdatalog.dsl.Relation]

‘field(…)’

rules: list[srdatalog.dsl.Rule]

‘field(…)’

show(*, rule: str | None = None, delta: int | None = None, theme: str = 'dark', include_jit: bool = True, height_px: int = 600) None[source]

Render this program in Jupyter with full options.

Args:

  • rule: when None, shows the ruleset overview (the default that the cell's prog expression already produces). When a string, drills into that rule's plan view: variant access patterns, clause order, and var order with drag-to-reorder.

  • delta: only meaningful with rule. Filters to a single variant of the rule; e.g. delta=0 shows just the variant seeded on body clause 0. Recursive rules emit one variant per body clause for semi-naive evaluation; this is how you isolate one of those "versions". Default None shows all.

  • theme: 'dark' (default), 'light', or 'high-contrast'. Controls the renderer's color palette inside the iframe, independent of VS Code's editor theme.

  • include_jit: include per-rule JIT C++ kernels. Adds ~2-3 MB on doop; off by default in _repr_mimebundle_ for cell rerun speed, on by default here since you're explicitly invoking.

  • height_px: iframe height. Bump for larger rulesets.

Examples:

prog.show()                                   # ruleset, dark, with JIT
prog.show(rule='TCRec')                       # all variants of TCRec
prog.show(rule='TCRec', delta=0)              # just delta-0 variant
prog.show(theme='light')                      # light mode
prog.show(rule='VPT_Load', delta=1, theme='light', height_px=900)

Requires IPython.

class srdatalog.dsl.Relation(name: str, arity: int, column_types: tuple[type, ...] | None = None, *, input_file: str = '', print_size: bool = False, output_file: str = '', index_type: str = '', semiring: str = 'NoProvenance')[source]

A relation declaration. Callable to build atoms.

Arity + column_types are structural metadata. Pragma fields (all optional) mirror Nim’s Relation[…] pragmas:

  • input_file → CSV the load-data block reads into this relation

  • print_size → runner emits a size-readback line after the fixpoint

  • output_file → runner writes the final contents to this path

  • index_type → C++ index template (e.g. “SRDatalog::GPU::Device2LevelIndex”)

  • semiring → override “NoProvenance” (rare — provenance semirings)

Initialization

__call__(*args) srdatalog.dsl.Atom[source]
__repr__() str[source]
__slots__

(‘arity’, ‘column_types’, ‘index_type’, ‘input_file’, ‘name’, ‘output_file’, ‘print_size’, ‘semiring…

class srdatalog.dsl.Rule[source]

A Datalog rule: head_1, head_2, ... :- body_1, body_2, ....

heads is always a tuple of one or more Atoms (mirrors Nim’s Rule.head: seq[HeadClause]). Build multi-head rules with (A | B | C) <= body; single-head still reads A <= body.

plans holds user-provided PlanEntry overrides (one per delta position). count marks a rule as count-only: no materialization, just the cardinality. semi_join opts the rule into the Pass 1.5 semi-join optimization. is_generated is True for compiler-synthesised rules (e.g. the _SJ_Target_Filter_... helpers emitted by semi-join optimization). prov carries rewrite provenance (user vs compiler-gen) — mirrors syntax.nim’s Rule.prov.

body: tuple[srdatalog.dsl.BodyClauseT, ...]

None

count: bool

False

debug_code: str = <Multiline-String>
property head: srdatalog.dsl.Atom

First head (convenience for single-head rules). For multi-head, iterate self.heads.

heads: tuple[srdatalog.dsl.Atom, ...]

None

is_generated: bool

False

name: str | None

None

named(name: str) srdatalog.dsl.Rule[source]
plans: tuple[srdatalog.dsl.PlanEntry, ...]

()

prov: srdatalog.ir.hir.provenance.Provenance

None

semi_join: bool

False

with_count() srdatalog.dsl.Rule[source]

Mark this rule as count-only.

with_inject_cpp(code: str) srdatalog.dsl.Rule[source]

Attach a C++ debug hook to be emitted as an InjectCppHook MIR node once per variant (after the rule’s pipeline runs). Mirrors Nim’s inject_cpp: "..." rule pragma.

with_plan(*, delta: int = -1, var_order: tuple[str, ...] | list[str] | None = None, clause_order: tuple[int, ...] | list[int] | None = None, fanout: bool = False, work_stealing: bool = False, block_group: bool = False, dedup_hash: bool = False, balanced_root: tuple[str, ...] | list[str] | None = None, balanced_sources: tuple[str, ...] | list[str] | None = None) srdatalog.dsl.Rule[source]

Append a single PlanEntry. Can be called multiple times to add entries for different deltas (or use .with_plans(entries) to replace).

with_plans(entries: list[srdatalog.dsl.PlanEntry] | tuple[srdatalog.dsl.PlanEntry, ...]) srdatalog.dsl.Rule[source]

Replace all plans with the given sequence.
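The append-versus-replace distinction between with_plan and with_plans can be sketched with frozen dataclasses. PlanRule and MiniPlanEntry are hypothetical stand-ins with only a few of the documented fields, and returning a new rule from each call (rather than mutating in place) is an assumption based on the Rule return type in the signatures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniPlanEntry:
    delta: int
    var_order: tuple = ()
    clause_order: tuple = ()


@dataclass(frozen=True)
class PlanRule:
    name: str
    plans: tuple = ()

    def with_plan(self, *, delta=-1, var_order=None, clause_order=None):
        # Append one entry; lists are normalised to tuples.
        entry = MiniPlanEntry(
            delta, tuple(var_order or ()), tuple(clause_order or ())
        )
        return replace(self, plans=self.plans + (entry,))

    def with_plans(self, entries):
        # Replace every existing entry with the given sequence.
        return replace(self, plans=tuple(entries))


base = PlanRule("TCRec")
planned = base.with_plan(delta=0, var_order=["X", "Y", "Z"]).with_plan(
    delta=1, clause_order=[1, 0]
)
replaced = planned.with_plans([MiniPlanEntry(delta=-1)])
```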

with_semi_join() srdatalog.dsl.Rule[source]

Opt into semi-join optimization (Pass 1.5). Ignored on rules with <= 2 body clauses (the pass skips them per Nim’s semantics).

srdatalog.dsl.SPLIT

‘Split(…)’

class srdatalog.dsl.Split[source]

Split marker — partitions a rule body into above-split and below-split sections. Mirrors Nim’s SplitClause (split keyword).

Pipeline A writes the above-split output to a temp relation; Pipeline B scans the temp and joins with below-split clauses to produce the head. Useful for negation pushdown / selective join evaluation. At most one Split per rule body.
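The partitioning step can be sketched as follows. The SPLIT marker here is a local placeholder object, partition_body is a hypothetical helper, and treating a body with no marker as entirely above-split is an assumption; the at-most-one check follows the constraint stated above.

```python
SPLIT = object()  # local placeholder for the split marker


def partition_body(body):
    """Split a rule body into (above_split, below_split) clause tuples."""
    positions = [i for i, clause in enumerate(body) if clause is SPLIT]
    if len(positions) > 1:
        raise ValueError("at most one Split per rule body")
    if not positions:
        # Assumed behavior: without a marker, nothing is below the split.
        return tuple(body), ()
    index = positions[0]
    return tuple(body[:index]), tuple(body[index + 1:])


above, below = partition_body(("A(x, y)", "B(y, z)", SPLIT, "~C(z)"))
```

In the terms used above, the above tuple would feed Pipeline A (written to the temp relation) and the below tuple would be joined against that temp relation in Pipeline B.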

__and__(other)[source]
class srdatalog.dsl.Var(name: str)[source]

A logic variable. Distinct from Python values; used by operator overloads to build AST.

Initialization

__repr__() str[source]
__slots__

(‘name’,)

srdatalog.dsl.agg(result_var, func: str, rel_atom: srdatalog.dsl.Atom, cpp_type: str = '') srdatalog.dsl.Agg[source]

Build an aggregation body clause.

result_var may be a Var instance or a bare string var name. rel_atom is the output of Relation(...)(...) — its rel + args become the aggregation’s relation reference.

srdatalog.dsl.count(result_var, rel_atom: srdatalog.dsl.Atom) srdatalog.dsl.Agg[source]

Convenience: count(v, R(x, y)) → (v = count(R(x, y))).
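A minimal sketch of how agg and count relate, using hypothetical Mini* stand-ins. Accepting either a variable object or a bare string for result_var follows the description of agg; implementing count as agg with func fixed to "count" is an assumption consistent with the example in the Agg class description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniVar:
    name: str


@dataclass(frozen=True)
class MiniAtom:
    rel: str
    args: tuple


@dataclass(frozen=True)
class MiniAgg:
    result_var: str
    func: str
    rel: str
    args: tuple
    cpp_type: str = ""


def mini_agg(result_var, func, rel_atom, cpp_type=""):
    # Accept a variable object or a bare string variable name.
    name = result_var.name if isinstance(result_var, MiniVar) else result_var
    # The atom's rel and args become the aggregation's relation reference.
    return MiniAgg(name, func, rel_atom.rel, rel_atom.args, cpp_type)


def mini_count(result_var, rel_atom):
    return mini_agg(result_var, "count", rel_atom)


r_xy = MiniAtom("R", ("x", "y"))
by_var = mini_count(MiniVar("c"), r_xy)
by_name = mini_agg("c", "count", r_xy)
```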

srdatalog.dsl.cpp(code: str) srdatalog.dsl.ClauseArg[source]

Raw C++ code as a clause argument (rare; mirrors the $"..." Nim syntax).

srdatalog.dsl.sum(result_var, rel_atom: srdatalog.dsl.Atom) srdatalog.dsl.Agg[source]

Example rules, using the variables and relations from the module overview:

# Path(X, Y) :- Edge(X, Y)
r1 = Rule(heads=(path(X, Y),), body=[edge(X, Y)], name="TCBase")

# Path(X, Z) :- Path(X, Y), Edge(Y, Z)
r2 = (path(X, Z) <= path(X, Y) & edge(Y, Z)).named("TCRec")

This module defines only the DSL surface; lowering to HIR is in hir_passes.py (TBD).