srdatalog.ir.dialects.iir.cf.ops

iir.cf op definitions — control flow.

All ops here are pure data (D1), frozen+slots dataclasses (D2, D3), @final to lock the closed sum (D11). Names mirror the spec where possible. Fields stay primitives or tuples of Op (no lists) so strategy combinators traverse cleanly.

Naming convention for binders:

Bind(name, expr) — declare auto <name> = <expr>; in the enclosing Block. The name is used verbatim in generated code; the lowering chooses names to match the legacy emitter’s bump order for byte-equivalence.

VarRef(name) — refer to a previously-bound name.

This explicit string-name approach is the M1 pragmatic choice. The spec calls for lexical scoping (D8); a future refactor can replace the string-keyed lookup with proper de Bruijn / Let scopes once the byte-equivalence gate has been validated end-to-end.

Module Contents

Classes

AddCount

Bump the count counter directly. Used by the count-as-product short-circuit (R1) and by counting-only paths.

Bind

Declare auto <name> = <expr>; (or <type> <name> = <expr>;).

BlankLine

Emit a single empty line. Used to match legacy emission where whitespace has structural meaning (e.g. between the degree fetch and the loop preamble).

Block

A sequence of statements emitted in order.

Cartesian2DDecompose

Adaptive 2-source flat-index decomposition.

CartesianFlatLoop

Flat for-loop over the Cartesian product, partitioned by lane.

CartesianNDecompose

Countdown-remainder decomposition for an N-source flat index (N >= 3).

Comment

Emit a // ... comment. Pass-through to the C++ source. The legacy emitter sprinkles these for debugging; the dialect carries them as IR so byte-equivalence preserves them.

GridStrideLoop

Warp-strided grid-stride for-loop with body.

If

if (<cond>) { <body> } — body emitted at the SAME indent as the wrapping if (matches the legacy emitter’s no-inc-indent quirk for filter chains, where the body was rendered before the wrap was applied).

IfContinueIfNot

if (!<cond>) continue; — the inner-loop validity guard.

IfReturnIfNot

if (!<cond>) return; — the validity guard pattern.

IndentBlock

Render contained statements at +extra indent levels.

IntersectIter

Intersect-and-iterate over multiple narrowed handles.

LaneZeroGuard

if (tile.thread_rank() == 0) <body> — single-thread guard applied around output writes when not inside a Cartesian (so 32 cooperating threads don’t all emit the same row).

OuterAnchor

Render body at the surrounding scope’s indent (ctx.indent_level - ctx.segment_depth), regardless of how deep the wrapping D2lSegmentLoops have nested.

ParallelFor

Parallel-execution scaffold. The body is run by N workers according to the strategy. M1 supports only warp_strided (GPU warp-strided grid-stride).

Phase

Counting (mode='C') or materialize (mode='M') scope. The same body emits differently inside each phase via the surrounding OutputContext template; the IR carries the intent but the legacy emitter currently only emits the unified body.

RawString

Escape hatch for emission templates we haven’t dialectified yet. Carries a literal string into the C++ output. The byte-equivalence port uses RawString sparingly to bridge gaps as it ports each MIR op kind. Each use is a candidate for replacement by a proper IR op in a later milestone.

TiledBallotBlock

Multi-output ballot-coalesced write block used inside tiled-Cartesian materialize emission.

VarRef

Refer to a previously-bound name. Renders as the bare name.

WriteOutput

Emit a row to the output context.

API

class srdatalog.ir.dialects.iir.cf.ops.AddCount

Bases: srdatalog.ir.core.Op

Bump the count counter directly. Used by the count-as-product short-circuit (R1) and by counting-only paths.

Lowers to <output_var>.add_count(<delta>);.

delta: srdatalog.ir.core.Op

None

output_var: str

None

class srdatalog.ir.dialects.iir.cf.ops.Bind

Bases: srdatalog.ir.core.Op

Declare auto <name> = <expr>; (or <type> <name> = <expr>;).

expr is an expression-shaped Op; the target lowering renders it via emit_expr().

expr: srdatalog.ir.core.Op

None

name: str

None

type_decl: str

'auto'

class srdatalog.ir.dialects.iir.cf.ops.BlankLine

Bases: srdatalog.ir.core.Op

Emit a single empty line. Used to match legacy emission where whitespace has structural meaning (e.g. between the degree fetch and the loop preamble).

class srdatalog.ir.dialects.iir.cf.ops.Block

Bases: srdatalog.ir.core.Op

A sequence of statements emitted in order.

stmts: tuple[srdatalog.ir.core.Op, ...]

None

class srdatalog.ir.dialects.iir.cf.ops.Cartesian2DDecompose

Bases: srdatalog.ir.core.Op

Adaptive 2-source flat-index decomposition.

Lowers (target.cuda) to:

const bool <major_var> = (<deg1_var> >= <deg0_var>);
uint32_t <idx0_var>, <idx1_var>;
if (<major_var>) {
  <idx0_var> = <flat_idx_var> / <deg1_var>;
  <idx1_var> = <flat_idx_var> % <deg1_var>;
} else {
  <idx1_var> = <flat_idx_var> / <deg0_var>;
  <idx0_var> = <flat_idx_var> % <deg0_var>;
}

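The divisor is picked by relative size: in the lowering above, the larger of the two degrees is used for both the division and the modulus, so the remainder indexes the larger source and the quotient indexes the smaller one. This matches the adaptive shape of the legacy _nested_column_join_multi.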
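The branch in the lowering above can be checked with a small Python model. This is a sketch for illustration only: decompose_2d and covers_all_pairs are hypothetical names, and Python integers stand in for uint32_t.

```python
from itertools import product


def decompose_2d(flat_idx: int, deg0: int, deg1: int) -> tuple[int, int]:
    """Mirror the lowered branch; returns (idx0, idx1)."""
    major = deg1 >= deg0
    if major:
        idx0 = flat_idx // deg1
        idx1 = flat_idx % deg1
    else:
        idx1 = flat_idx // deg0
        idx0 = flat_idx % deg0
    return idx0, idx1


def covers_all_pairs(deg0: int, deg1: int) -> bool:
    # Every flat index in [0, deg0 * deg1) must map to a distinct (idx0, idx1).
    seen = sorted(decompose_2d(f, deg0, deg1) for f in range(deg0 * deg1))
    return seen == sorted(product(range(deg0), range(deg1)))


print(decompose_2d(7, 3, 5))  # (1, 2): deg1 is the divisor
print(decompose_2d(7, 5, 3))  # (2, 1): deg0 is the divisor
```

Either branch enumerates the full product; the flag only changes which index varies fastest.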

deg0_var: str

None

deg1_var: str

None

flat_idx_var: str

None

idx0_var: str

None

idx1_var: str

None

major_var: str

None

class srdatalog.ir.dialects.iir.cf.ops.CartesianFlatLoop

Bases: srdatalog.ir.core.Op

Flat for-loop over the Cartesian product, partitioned by lane.

Lowers (target.cuda) to:

for (uint32_t <idx_var> = <lane_var>; <idx_var> < <bound_var>; <idx_var> += <group_size_var>) {
  <body>
}

Used by nested CartesianJoin: each thread in the tile takes a share of the Cartesian product based on its lane_var = tile.thread_rank() and stride group_size_var = tile.size().
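A short Python model of the lane-strided split may help; it is a sketch only, and lane_indices / partition are hypothetical helper names rather than part of the dialect.

```python
def lane_indices(lane: int, group_size: int, bound: int) -> list[int]:
    # Same shape as: for (idx = lane; idx < bound; idx += group_size)
    return list(range(lane, bound, group_size))


def partition(bound: int, group_size: int) -> list[list[int]]:
    # One share per lane, lane = 0 .. group_size - 1.
    return [lane_indices(lane, group_size, bound) for lane in range(group_size)]


shares = partition(bound=10, group_size=4)
print(shares)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each flat index lands in exactly one lane's share, so the tile covers the whole product without overlap.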

body: srdatalog.ir.core.Op

None

bound_var: str

None

group_size_var: str

None

idx_var: str

None

lane_var: str

None

class srdatalog.ir.dialects.iir.cf.ops.CartesianNDecompose

Bases: srdatalog.ir.core.Op

Countdown-remainder decomposition for an N-source flat index (N >= 3).

Lowers (target.cuda) to:

uint32_t remaining = <flat_idx>;
uint32_t <idx_{N-1}> = remaining % <deg_{N-1}>;
remaining /= <deg_{N-1}>;
uint32_t <idx_{N-2}> = remaining % <deg_{N-2}>;
remaining /= <deg_{N-2}>;
…
uint32_t <idx_0> = remaining % <deg_0>;
(no final div)

The 2-source case has its own adaptive Cartesian2DDecompose with a major_is_1 runtime flag; N >= 3 doesn’t bother with the adaptive branch.
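The countdown-remainder scheme can be modeled in a few lines of Python. This is a sketch under the assumption that Python integers stand in for uint32_t; decompose_nd is a hypothetical name.

```python
from itertools import product


def decompose_nd(flat_idx: int, degs: tuple[int, ...]) -> tuple[int, ...]:
    """Peel indices off from the last source to the first."""
    remaining = flat_idx
    idxs = [0] * len(degs)
    for k in range(len(degs) - 1, 0, -1):
        idxs[k] = remaining % degs[k]
        remaining //= degs[k]
    idxs[0] = remaining % degs[0]  # no final division
    return tuple(idxs)


degs = (2, 3, 4)
print(decompose_nd(23, degs))  # (1, 2, 3)

# The last index varies fastest, so the f-th flat index matches the
# f-th tuple of itertools.product over the per-source ranges.
expected = list(product(*(range(d) for d in degs)))
assert [decompose_nd(f, degs) for f in range(24)] == expected
```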

deg_vars: tuple[str, ...]

None

flat_idx_var: str

None

idx_vars: tuple[str, ...]

None

class srdatalog.ir.dialects.iir.cf.ops.Comment

Bases: srdatalog.ir.core.Op

Emit a // ... comment. Pass-through to the C++ source. The legacy emitter sprinkles these for debugging; the dialect carries them as IR so byte-equivalence preserves them.

text: str

None

class srdatalog.ir.dialects.iir.cf.ops.GridStrideLoop

Bases: srdatalog.ir.core.Op

Warp-strided grid-stride for-loop with body.

Lowers to:

for (uint32_t <idx_name> = warp_id; <idx_name> < <bound>; <idx_name> += num_warps) {
  <body>
}

body: srdatalog.ir.core.Op

None

bound: srdatalog.ir.core.Op

None

idx_name: str

None

class srdatalog.ir.dialects.iir.cf.ops.If

Bases: srdatalog.ir.core.Op

if (<cond>) { <body> } — body emitted at the SAME indent as the wrapping if (matches the legacy emitter’s no-inc-indent quirk for filter chains, where the body was rendered before the wrap was applied).

Use IndentBlock inside body if some inner statements need to go deeper than the outer indent.
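The interaction between If and IndentBlock can be sketched with a toy line renderer. Everything here is an assumption made for illustration: the two-space indent unit, the brace placement, the helper names render_if and render_indent_block, and the C++ snippets used as statements.

```python
INDENT = "  "  # assumed indent unit; the real emitter's width may differ


def render_indent_block(stmts: list[str], extra: int, level: int) -> list[str]:
    # IndentBlock: children render at the current level plus `extra`.
    pad = INDENT * (level + extra)
    return [pad + s for s in stmts]


def render_if(cond: str, body: list[str], level: int) -> list[str]:
    # If: the wrapper adds the braces but does not deepen the body, so
    # `body` arrives already rendered at the same level as the `if`.
    pad = INDENT * level
    return [f"{pad}if ({cond}) {{", *body, f"{pad}}}"]


level = 1
body = render_indent_block(["auto v = view.get(i);"], 0, level)
body += render_indent_block(["out.emit_direct(v);"], 1, level)
lines = render_if("v_valid", body, level)
print("\n".join(lines))
```

In this model the first body line sits flush with the if, while the statement wrapped with extra=1 goes one level deeper.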

body: srdatalog.ir.core.Op

None

cond: srdatalog.ir.core.Op

None

class srdatalog.ir.dialects.iir.cf.ops.IfContinueIfNot

Bases: srdatalog.ir.core.Op

if (!<cond>) continue; — the inner-loop validity guard.

Used inside grid-stride loops over root_unique_values: a failed prefix narrowing on any source means this root_val has no intersection, so skip to the next iteration.

cond: srdatalog.ir.core.Op

None

class srdatalog.ir.dialects.iir.cf.ops.IfReturnIfNot

Bases: srdatalog.ir.core.Op

if (!<cond>) return; — the validity guard pattern.

cond: srdatalog.ir.core.Op

None

class srdatalog.ir.dialects.iir.cf.ops.IndentBlock

Bases: srdatalog.ir.core.Op

Render contained statements at +extra indent levels.

Used to model the legacy emitter’s mixed-indent quirks where some children of a scope are at a different indent than others. The most common case: in a root Scan, the var-bind statements are at the loop’s inner indent while the InsertInto body is at the outer indent (because the body was rendered before inc_indent).

extra: int

None

stmts: tuple[srdatalog.ir.core.Op, ...]

None

class srdatalog.ir.dialects.iir.cf.ops.IntersectIter

Bases: srdatalog.ir.core.Op

Intersect-and-iterate over multiple narrowed handles.

Lowers (target.cuda) to:

auto <intersect_var> = intersect_handles(tile, <iter_exprs...>);
for (auto <iter_var> = <intersect_var>.begin();
     <iter_var>.valid(); <iter_var>.next()) {
  auto <value_var> = <iter_var>.value();
  auto positions = <iter_var>.positions();
  <body>
}

iterator_exprs are expression-shaped ops (typically SaIterators) that produce the per-source iterator pairs handed to intersect_handles. The literal name positions is part of the legacy convention; child_range calls inside the body reference it.

Indent quirk under D2L segment loops: the value/positions lines and the body anchor against the OUTER indent (ctx.indent_level - ctx.segment_depth), not against the IntersectIter’s own indent. This mirrors the legacy _nested_column_join_multi where seg_indent is a string-only offset and ind(ctx) (the structural indent) is unaffected by segment loops. The emit takes care of this via EmitCtx.segment_depth.
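To make the value/positions pairing concrete, here is a host-side Python model of an intersect-and-iterate loop. It is a sketch, not the behaviour of intersect_handles itself: it assumes each source is a sorted, duplicate-free list and that positions holds the per-source offset of the matched value. intersect_sorted is a hypothetical name.

```python
from typing import Iterator, Sequence


def intersect_sorted(*sources: Sequence[int]) -> Iterator[tuple[int, tuple[int, ...]]]:
    """Yield (value, positions) for values present in every source."""
    if not sources:
        return
    cursors = [0] * len(sources)
    while all(c < len(s) for c, s in zip(cursors, sources)):
        # The largest current head is the only candidate that can still match.
        target = max(s[c] for c, s in zip(cursors, sources))
        matched = True
        for i, s in enumerate(sources):
            while cursors[i] < len(s) and s[cursors[i]] < target:
                cursors[i] += 1
            if cursors[i] == len(s):
                return  # one source is exhausted, no further matches
            if s[cursors[i]] != target:
                matched = False
        if matched:
            yield target, tuple(cursors)
            cursors = [c + 1 for c in cursors]


a, b, c = [1, 3, 5, 7], [3, 4, 7, 9], [0, 3, 7]
for value, positions in intersect_sorted(a, b, c):
    print(value, positions)
# 3 (1, 0, 1)
# 7 (3, 2, 2)
```

Each yielded positions tuple plays the role that the literal positions variable plays for child_range calls in the generated loop body.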

body: srdatalog.ir.core.Op

None

intersect_var: str

None

iter_var: str

None

iterator_exprs: tuple[srdatalog.ir.core.Op, ...]

None

value_var: str

None

class srdatalog.ir.dialects.iir.cf.ops.LaneZeroGuard

Bases: srdatalog.ir.core.Op

if (tile.thread_rank() == 0) <body> — single-thread guard applied around output writes when not inside a Cartesian (so 32 cooperating threads don’t all emit the same row).

body: srdatalog.ir.core.Op

None

class srdatalog.ir.dialects.iir.cf.ops.OuterAnchor

Bases: srdatalog.ir.core.Op

Render body at the surrounding scope’s indent (ctx.indent_level - ctx.segment_depth), regardless of how deep the wrapping D2lSegmentLoops have nested.

Used to embed a CJ-multi body_op INSIDE a root-CJ D2lSegmentLoop’s body (so the segment loop’s brace closes AFTER the body) while keeping the body’s first-line indent at the outer kernel level — matches the legacy _root_cj_multi pattern of pre-rendering body at the outer indent and letting the segment loops wrap textually around it.

Resets segment_depth to 0 inside body so any further nested IntersectIter / D2lSegmentLoop in body anchors against the new (outer) base.
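The indent arithmetic can be sketched with a toy context. This is an assumption-laden model: the EmitCtx class below is a hypothetical stand-in carrying only the two fields named in the text, and it assumes each segment loop bumps both indent_level and segment_depth by one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EmitCtx:
    """Hypothetical stand-in for the emitter context."""
    indent_level: int
    segment_depth: int = 0


def outer_level(ctx: EmitCtx) -> int:
    # The anchor indent: structural indent with segment nesting subtracted.
    return ctx.indent_level - ctx.segment_depth


def enter_segment_loop(ctx: EmitCtx) -> EmitCtx:
    # Assumed behaviour: a segment loop deepens both counters together.
    return replace(ctx, indent_level=ctx.indent_level + 1,
                   segment_depth=ctx.segment_depth + 1)


def enter_outer_anchor(ctx: EmitCtx) -> EmitCtx:
    # Body renders at the outer level; nested ops see segment_depth == 0.
    return replace(ctx, indent_level=outer_level(ctx), segment_depth=0)


kernel = EmitCtx(indent_level=2)
nested = enter_segment_loop(enter_segment_loop(kernel))
print(nested, outer_level(nested))
print(enter_outer_anchor(nested))
```

However many segment loops wrap the anchor, the body context in this model comes back to the kernel-level indent.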

body: srdatalog.ir.core.Op

None

class srdatalog.ir.dialects.iir.cf.ops.ParallelFor

Bases: srdatalog.ir.core.Op

Parallel-execution scaffold. The body is run by N workers according to the strategy. M1 supports only warp_strided (GPU warp-strided grid-stride).

Strategy is a string for now; later milestones promote it to a proper sub-dialect (par.data.warp_strided, par.data.tbb_for, …).

body: srdatalog.ir.core.Op

None

strategy: str

None

class srdatalog.ir.dialects.iir.cf.ops.Phase

Bases: srdatalog.ir.core.Op

Counting (mode='C') or materialize (mode='M') scope. The same body emits differently inside each phase via the surrounding OutputContext template; the IR carries the intent but the legacy emitter currently only emits the unified body.

body: srdatalog.ir.core.Op

None

mode: str

None

class srdatalog.ir.dialects.iir.cf.ops.RawString

Bases: srdatalog.ir.core.Op

Escape hatch for emission templates we haven’t dialectified yet. Carries a literal string into the C++ output. The byte-equivalence port uses RawString sparingly to bridge gaps as it ports each MIR op kind. Each use is a candidate for replacement by a proper IR op in a later milestone.

text: str

None

class srdatalog.ir.dialects.iir.cf.ops.TiledBallotBlock

Bases: srdatalog.ir.core.Op

Multi-output ballot-coalesced write block used inside tiled-Cartesian materialize emission.

Lowers (target.cuda) to:

{
  uint32_t _tc_ballot = tile.ballot(<valid_var>);
  uint32_t _tc_active = __popc(_tc_ballot);
  if (_tc_active > 0) {
    uint32_t _tc_mask = (1u << tile.thread_rank()) - 1u;
    uint32_t _tc_off = __popc(_tc_ballot & _tc_mask);
    for each output (dest_idx, values):
      if (<valid_var>) {
        uint32_t _tc_pos_<dest_idx> = old_size_<dest_idx>
          + warp_write_base + warp_local_count + _tc_off;
        output_data_<dest_idx>[col * static_cast<uint32_t>(
          output_stride_<dest_idx>) + _tc_pos_<dest_idx>] = vN;
        ...
      }
    warp_local_count += _tc_active;
  }
}

outputs is a tuple of (dest_idx, sanitized_values, debug_text) entries. Multi-head pipelines emit several entries; the ballot setup and the warp_local_count += _tc_active update happen once around all of them. Replaces the legacy tiled_cartesian_ballot_done flag on CodeGenContext.
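The ballot and popcount arithmetic in the lowering above can be simulated on the host. This is a sketch only: ballot_offsets and write_position are hypothetical names, a Python list of booleans stands in for the per-lane valid flags, and bin(x).count("1") stands in for __popc.

```python
from typing import Optional


def ballot_offsets(valid: list[bool]) -> tuple[int, list[Optional[int]]]:
    """Return (active lane count, per-lane write offset or None)."""
    ballot = 0
    for lane, v in enumerate(valid):
        if v:
            ballot |= 1 << lane            # tile.ballot(valid)
    active = bin(ballot).count("1")        # __popc(_tc_ballot)
    offsets: list[Optional[int]] = []
    for lane, v in enumerate(valid):
        mask = (1 << lane) - 1             # lanes below this one
        off = bin(ballot & mask).count("1")
        offsets.append(off if v else None)
    return active, offsets


def write_position(old_size: int, warp_write_base: int,
                   warp_local_count: int, off: int) -> int:
    # Mirrors the _tc_pos_<dest_idx> sum in the lowering.
    return old_size + warp_write_base + warp_local_count + off


active, offsets = ballot_offsets([True, False, True, True])
print(active, offsets)  # 3 [0, None, 1, 2]
```

Valid lanes receive consecutive offsets in lane order, which is what lets the warp write its rows into one contiguous range before advancing warp_local_count by the active count.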

outputs: tuple[tuple[int, tuple[str, ...], str], ...]

None

valid_var: str

None

class srdatalog.ir.dialects.iir.cf.ops.VarRef

Bases: srdatalog.ir.core.Op

Refer to a previously-bound name. Renders as the bare name.

name: str

None

class srdatalog.ir.dialects.iir.cf.ops.WriteOutput

Bases: srdatalog.ir.core.Op

Emit a row to the output context.

Lowers to <output_var>.emit_direct(<values>) in materialize phase or <output_var>.emit_direct() in count phase (the polymorphic OutputContext template handles the dispatch at C++ level).

output_var: str

None

values: tuple[srdatalog.ir.core.Op, ...]

None