`srdatalog.ir.codegen.cuda.envelope`¶

target.cuda — full-file envelope emission.

The dialect-emitted kernel body sits inside a fixed-shape envelope:

JIT_FILE_PRELUDE (constant header)

banner(rule_name, num_handles)
functor_start(rule_name, …)
[optional DedupTable struct]
view_declarations + (operator() body)
functor_end()
"\n"
JIT_FILE_FOOTER (constant footer)

These are pure string emitters: no algorithm dispatch, and no feature-flag-driven branching beyond the dedup-hash struct injection. They live in the dialect because the envelope shape is target-specific (CUDA cooperative-groups signature, __device__ qualifier, etc.); a target.cpp_tbb envelope would emit a different shape from the same MIR pipeline.
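The envelope order listed above can be sketched as plain string concatenation. This is a minimal illustration, not the module's code: `PRELUDE`, `FOOTER`, `sketch_full_file`, and the `<functor_start ...>` / `<functor_end>` placeholder strings are all hypothetical stand-ins for the real constants and emitters.

```python
# Hypothetical stand-ins for JIT_FILE_PRELUDE / JIT_FILE_FOOTER.
PRELUDE = "// <prelude>\n"
FOOTER = "// <footer>\n"


def sketch_full_file(rule_name: str, num_handles: int, body: str,
                     dedup_struct: str = "") -> str:
    """Concatenate the envelope pieces in the documented order."""
    parts = [
        PRELUDE,
        f"// rule {rule_name}: {num_handles} handles\n",  # banner (placeholder text)
        f"<functor_start {rule_name}>\n",                 # placeholder for the opener
        dedup_struct,                                     # empty unless dedup-hash is on
        body,                                             # view declarations + operator() body
        "<functor_end>\n",                                # placeholder for the closer
        "\n",
        FOOTER,
    ]
    return "".join(parts)


text = sketch_full_file("Path", 2, "    // kernel logic\n")
with_dedup = sketch_full_file("Path", 2, "    // kernel logic\n",
                              dedup_struct="<DedupTable struct>\n")
```

The only conditional piece is the optional struct, which mirrors the single branch the description allows.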

The legacy ir/dialects/target/cuda/ modules still own their own copies of these helpers during the Stage 2 transition (they are imported by the byte-equivalence harness which compares the legacy emitter against the dialect path). Once the legacy inner-body emitters are deleted, the legacy copies go with them.

Module Contents¶

Classes¶

ViewSpec

(rel_name, index, version, handle_idx) — handle_idx is the handle position of the FIRST op that referenced this view spec.

Functions¶

`assign_handle_positions`	Assign `handle_start` to every source-bearing node in pipeline order starting from 0. Mutates `ops` in place.
`collect_unique_view_specs`	Walk the pipeline and collect a deduplicated list of `ViewSpec`s in first-occurrence order. Covers every op that references a view (ColumnJoin, CartesianJoin, Scan, Negation, Aggregate, BalancedScan, PositionedExtract).
`count_handles`	Number of `views[]` slots needed by the kernel — `max(handle_start) + 1`.
`emit_dedup_table_struct`	Emit the DedupTable struct nested inside a rule’s kernel scope.
`emit_full_file`	Wrap a dialect-emitted operator() body in the standard file envelope.
`emit_functor_end`
`emit_functor_start`	Open `struct Kernel_<rule> { ... operator()(...) const {`.
`emit_view_declarations`	Emit the top-of-kernel `auto view_X = views[i];` block.
`first_dest_arity`	Arity of the first InsertInto’s column set. Sizes the DedupTable’s hash function (one v0..vN-1 column per parameter).

Data¶

`JIT_FILE_FOOTER`
`JIT_FILE_PRELUDE`
`__all__`

API¶

srdatalog.ir.codegen.cuda.envelope.JIT_FILE_FOOTER = <Multiline-String>¶

srdatalog.ir.codegen.cuda.envelope.JIT_FILE_PRELUDE = <Multiline-String>¶

class srdatalog.ir.codegen.cuda.envelope.ViewSpec[source]¶

(rel_name, index, version, handle_idx) — handle_idx is the handle position of the FIRST op that referenced this view spec.

handle_idx: int¶

index: list[int]¶

rel_name: str¶

version: str¶

srdatalog.ir.codegen.cuda.envelope.__all__¶: ['JIT_FILE_FOOTER', 'JIT_FILE_PRELUDE', 'ViewSpec', 'assign_handle_positions', 'collect_unique_view_…

srdatalog.ir.codegen.cuda.envelope.assign_handle_positions(ops: list[srdatalog.ir.mir.types.MirNode]) → None[source]¶

Assign handle_start to every source-bearing node in pipeline order starting from 0. Mutates ops in place.

srdatalog.ir.codegen.cuda.envelope.collect_unique_view_specs(ops: list[srdatalog.ir.mir.types.MirNode]) → list[srdatalog.ir.codegen.cuda.envelope.ViewSpec][source]¶

Walk the pipeline and collect a deduplicated list of ViewSpecs in first-occurrence order. Covers every op that references a view (ColumnJoin, CartesianJoin, Scan, Negation, Aggregate, BalancedScan, PositionedExtract).
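First-occurrence deduplication can be sketched with an insertion-ordered dict. This is an illustration under assumptions: the input is a hypothetical flat list of `(rel_name, index, version, handle)` tuples rather than real MIR nodes, the relation and version names are made up, and the dedup key `(rel_name, index, version)` is a guess at what makes two view references equal.

```python
from dataclasses import dataclass


@dataclass
class ViewSpec:
    """Local stand-in with the four documented fields."""
    rel_name: str
    index: list[int]
    version: str
    handle_idx: int


def sketch_collect_unique(refs):
    """Keep one ViewSpec per (rel_name, index, version), in first-seen order."""
    seen = {}  # dicts preserve insertion order
    for rel_name, index, version, handle in refs:
        key = (rel_name, tuple(index), version)  # assumed identity of a view
        if key not in seen:
            # handle_idx records the FIRST op that referenced this view.
            seen[key] = ViewSpec(rel_name, list(index), version, handle)
    return list(seen.values())


refs = [
    ("Edge", [0, 1], "FULL", 0),
    ("Path", [1, 0], "DELTA", 1),
    ("Edge", [0, 1], "FULL", 2),  # repeat of the first view
]
specs = sketch_collect_unique(refs)
```

The repeated `Edge` reference collapses into the first spec, whose `handle_idx` stays at the position of the first op.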

srdatalog.ir.codegen.cuda.envelope.count_handles(ops: list[srdatalog.ir.mir.types.MirNode]) → int[source]¶

Number of views[] slots needed by the kernel — max(handle_start) + 1.
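How handle assignment and the `max(handle_start) + 1` count fit together can be sketched as follows. Everything here is illustrative: `Node` is a hypothetical stand-in for `MirNode`, the `has_source` flag and the `"Filter"` kind are invented, the sketch assumes each source-bearing node takes exactly one handle, and returning 0 for a pipeline with no handles is a choice made for the sketch only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a MIR node."""
    kind: str
    has_source: bool
    handle_start: Optional[int] = None


def sketch_assign_handle_positions(ops):
    """Number source-bearing nodes 0, 1, 2, ... in pipeline order (in place)."""
    next_handle = 0
    for op in ops:
        if op.has_source:
            op.handle_start = next_handle
            next_handle += 1  # assumption: one handle per source-bearing node


def sketch_count_handles(ops):
    """max(handle_start) + 1 over nodes that received a handle."""
    starts = [op.handle_start for op in ops if op.handle_start is not None]
    return max(starts) + 1 if starts else 0  # 0 for "no handles" is assumed


ops = [
    Node("Scan", True),
    Node("Filter", False),
    Node("Negation", True),
    Node("InsertInto", False),
]
sketch_assign_handle_positions(ops)
num_handles = sketch_count_handles(ops)
```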

srdatalog.ir.codegen.cuda.envelope.emit_dedup_table_struct(arity: int) → str[source]¶

Emit the DedupTable struct nested inside a rule’s kernel scope.

GPU hash table over arity 32-bit columns. try_insert uses atomicCAS during the count phase; check_winner is read-only during the materialize phase. Both start at an FNV-1a hash and probe linearly over at most 128 slots. The capacity is a power of 2, and the host-side runner zero-initializes hash_slots between phases.
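The probing scheme can be simulated on the host in plain Python. This is a single-threaded sketch under several assumptions, not the emitted CUDA: it hashes each column byte-wise in little-endian order, stores `(key, owner)` pairs in the slots, treats a full probe window as "keep the tuple", and uses a hypothetical `owner` id to decide winners. The real struct uses atomicCAS and may store something different.

```python
FNV_OFFSET = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193   # standard 32-bit FNV prime
MAX_PROBES = 128


def fnv1a_32(cols):
    """32-bit FNV-1a over the bytes of each 32-bit column (byte order assumed)."""
    h = FNV_OFFSET
    for col in cols:
        for byte in (col & 0xFFFFFFFF).to_bytes(4, "little"):
            h ^= byte
            h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


class DedupTableSketch:
    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of 2")
        self.mask = capacity - 1       # power-of-2 capacity lets us mask, not mod
        self.slots = [None] * capacity

    def _probe(self, key):
        start = fnv1a_32(key)
        for i in range(min(MAX_PROBES, len(self.slots))):
            yield (start + i) & self.mask

    def try_insert(self, cols, owner):
        """True if this owner claimed the tuple, False if it was already there."""
        key = tuple(cols)
        for slot in self._probe(key):
            entry = self.slots[slot]
            if entry is None:          # on the GPU this claim would be an atomicCAS
                self.slots[slot] = (key, owner)
                return True
            if entry[0] == key:
                return False
        return True                    # probe window full: sketch keeps the tuple

    def check_winner(self, cols, owner):
        """Read-only: did `owner` claim this tuple?"""
        key = tuple(cols)
        for slot in self._probe(key):
            entry = self.slots[slot]
            if entry is None:
                return False
            if entry[0] == key:
                return entry[1] == owner
        return True                    # mirrors the full-window case in try_insert


table = DedupTableSketch(8)
first = table.try_insert((1, 2), owner=0)
second = table.try_insert((1, 2), owner=1)
```

The second insert of `(1, 2)` is rejected, and only owner 0 passes `check_winner` for that tuple.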

srdatalog.ir.codegen.cuda.envelope.emit_full_file(ep: srdatalog.ir.mir.types.ExecutePipeline, body: str, *, scalar_mode: bool = False) → str[source]¶

Wrap a dialect-emitted operator() body in the standard file envelope.

body must be everything between operator() { and the closing }, i.e. the view declarations followed by the dialect-emitted kernel logic. The caller composes it as view_decls + emit(iir, emit_ctx).

srdatalog.ir.codegen.cuda.envelope.emit_functor_end() → str[source]¶

srdatalog.ir.codegen.cuda.envelope.emit_functor_start(rule_name: str, *, scalar_mode: bool = False, dedup_hash: bool = False) → str[source]¶

Open struct Kernel_<rule> { ... operator()(...) const {.

When dedup_hash=True, operator() takes an additional DedupTable dedup_table parameter.

srdatalog.ir.codegen.cuda.envelope.emit_view_declarations(specs: list[srdatalog.ir.codegen.cuda.envelope.ViewSpec], pipeline: list[srdatalog.ir.mir.types.MirNode], *, indent_level: int = 4, debug: bool = True, slot_mode: str = 'handle_idx', view_counts: list[int] | None = None) → tuple[str, dict[str, str]][source]¶

Emit the top-of-kernel auto view_X = views[i]; block.

Returns (decls_string, view_vars). view_vars maps both:

spec key (<rel>_<cols>_<VER>) → view variable name
str(handle_idx) → view variable name (so handle-bearing ops can look up “which view does this handle name reference?”)

slot_mode controls how the index into views[] is chosen:

'handle_idx': use sp.handle_idx directly (matches the jit_batch.<rule>.cpp standalone-kernel goldens, which don’t apply slot-offset compaction).
'positional': use cumulative-sum-of-view_counts slot per spec (matches the jit_runner.<rule>.cpp production goldens via compute_view_slot_offsets).

view_counts (per-spec, parallel to specs) is the number of physical view slots each spec consumes. It defaults to all 1s (DSAI). D2L FULL_VER specs consume 2 slots (HEAD + FULL); the dialect’s view decl emits the BASE view at the first slot only (the second slot is referenced by BG histogram via views[base+seg]).
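The difference between the two slot modes can be sketched as a small index computation. This assumes that "cumulative sum" means an exclusive prefix sum starting at slot 0, so each spec's slot is the total of the counts before it. `sketch_view_slots` is a hypothetical helper and does not call `compute_view_slot_offsets`.

```python
from itertools import accumulate


def sketch_view_slots(handle_idxs, slot_mode="handle_idx", view_counts=None):
    """Return the views[] index chosen for each spec."""
    if slot_mode == "handle_idx":
        # Use each spec's handle position directly, with no compaction.
        return list(handle_idxs)
    if slot_mode == "positional":
        counts = view_counts if view_counts is not None else [1] * len(handle_idxs)
        # Exclusive prefix sum: slot of spec i = sum of counts[0:i] (assumed).
        return list(accumulate(counts, initial=0))[:-1]
    raise ValueError(f"unknown slot_mode: {slot_mode!r}")


by_handle = sketch_view_slots([0, 2, 3])
compact = sketch_view_slots([0, 2, 3], slot_mode="positional")
two_slot = sketch_view_slots([0, 2, 3], slot_mode="positional",
                             view_counts=[1, 2, 1])
```

With `view_counts=[1, 2, 1]` the middle spec occupies slots 1 and 2, so the third spec starts at slot 3, while the declaration for the middle spec still points at its first slot only.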

srdatalog.ir.codegen.cuda.envelope.first_dest_arity(ops: list[srdatalog.ir.mir.types.MirNode]) → int[source]¶

Arity of the first InsertInto’s column set. Sizes the DedupTable’s hash function (one v0..vN-1 column per parameter).
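How the arity feeds the hash function's parameter list can be sketched as below. The dict-based node shape (`"kind"`, `"columns"`) is a hypothetical stand-in for `MirNode`, and raising when no InsertInto exists is a choice made for the sketch; the real function's behavior in that case is not documented here.

```python
def sketch_first_dest_arity(ops):
    """Number of columns written by the first InsertInto in the pipeline."""
    for op in ops:
        if op["kind"] == "InsertInto":
            return len(op["columns"])
    raise ValueError("pipeline has no InsertInto")  # assumed behavior


def sketch_hash_params(arity):
    """One parameter name per destination column: v0 .. v{arity-1}."""
    return [f"v{i}" for i in range(arity)]


ops = [
    {"kind": "Scan"},
    {"kind": "InsertInto", "columns": ["x", "y", "z"]},
    {"kind": "InsertInto", "columns": ["x"]},  # ignored: not the first
]
arity = sketch_first_dest_arity(ops)
params = sketch_hash_params(arity)
```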
