srdatalog.ir.codegen.cuda.envelope
target.cuda — full-file envelope emission.
The dialect-emitted kernel body sits inside a fixed-shape envelope:

    JIT_FILE_PRELUDE (constant header)
    banner(rule_name, num_handles)
    functor_start(rule_name, …)
    [optional DedupTable struct]
    view_declarations + (operator() body)
    functor_end()
    "\n"
    JIT_FILE_FOOTER (constant footer)
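The fixed ordering above can be sketched as plain string concatenation. This is a minimal illustration, not the module's code: `assemble_envelope` and its parameters are hypothetical stand-ins for the real emitters, each assumed to return an already-rendered string, and the real module may join or indent the pieces differently.

```python
from typing import Optional


def assemble_envelope(
    prelude: str,
    banner: str,
    functor_start: str,
    dedup_struct: Optional[str],
    body: str,
    functor_end: str,
    footer: str,
) -> str:
    """Concatenate pre-rendered envelope pieces in the fixed order.

    Hypothetical helper: no dispatch beyond the optional dedup struct,
    mirroring the "pure string emitter" design described in the text.
    """
    parts = [prelude, banner, functor_start]
    if dedup_struct is not None:
        # The DedupTable struct is the only optional piece.
        parts.append(dedup_struct)
    # body = view declarations + dialect-emitted operator() logic.
    parts += [body, functor_end, "\n", footer]
    return "".join(parts)
```

Keeping every piece a plain string means a different target (for example a target.cpp_tbb envelope) only needs to supply different pieces, not a different assembly algorithm.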
These are pure string emitters: no algorithm dispatch and no feature-flag-driven
branching beyond the dedup-hash struct injection. The module lives in the dialect
because the envelope shape is target-specific (CUDA cooperative-groups
signature, __device__ qualifier, etc.); a target.cpp_tbb envelope
would emit a different shape from the same MIR pipeline.
During the Stage 2 transition, the legacy ir/dialects/target/cuda/ modules still keep
their own copies of these helpers; they are imported by the byte-equivalence harness,
which compares the legacy emitter against the dialect path. Once the legacy
inner-body emitters are deleted, the legacy copies will be deleted with them.
Module Contents
Classes
| Class | Description |
|---|---|
| ViewSpec | (rel_name, index, version, handle_idx); handle_idx is the handle position of the FIRST op that referenced this view spec. |
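Functions
| Function | Description |
|---|---|
| assign_handle_positions | Assign handle_start to every source-bearing node in pipeline order, starting from 0. |
| collect_unique_view_specs | Walk the pipeline and collect a deduplicated list of ViewSpecs in first-occurrence order. |
| count_handles | Number of views[] slots needed by the kernel. |
| emit_dedup_table_struct | Emit the DedupTable struct nested inside a rule's kernel scope. |
| emit_full_file | Wrap a dialect-emitted operator() body in the standard file envelope. |
| emit_functor_start | Open struct Kernel_<rule> { ... operator()(...) const {. |
| emit_view_declarations | Emit the top-of-kernel auto view_X = views[i]; block. |
| first_dest_arity | Arity of the first InsertInto's column set. Sizes the DedupTable's hash function (one v0..vN-1 column per parameter). |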
Data
API
- srdatalog.ir.codegen.cuda.envelope.JIT_FILE_FOOTER = <Multiline-String>
- srdatalog.ir.codegen.cuda.envelope.JIT_FILE_PRELUDE = <Multiline-String>
- class srdatalog.ir.codegen.cuda.envelope.ViewSpec
(rel_name, index, version, handle_idx) — handle_idx is the handle position of the FIRST op that referenced this view spec.
- srdatalog.ir.codegen.cuda.envelope.__all__
['JIT_FILE_FOOTER', 'JIT_FILE_PRELUDE', 'ViewSpec', 'assign_handle_positions', 'collect_unique_view_…
- srdatalog.ir.codegen.cuda.envelope.assign_handle_positions(ops: list[srdatalog.ir.mir.types.MirNode]) -> None
Assign handle_start to every source-bearing node in pipeline order, starting from 0. Mutates ops in place.
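A minimal sketch of the in-place numbering, assuming a flat pipeline in which each source-bearing node consumes exactly one handle. `Node`, its `has_source` flag, and `assign_handle_positions_sketch` are hypothetical stand-ins; real MIR nodes may nest or carry several sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a MIR pipeline node."""
    kind: str
    has_source: bool = False
    handle_start: Optional[int] = None


def assign_handle_positions_sketch(ops: list[Node]) -> None:
    """Number source-bearing nodes 0, 1, 2, ... in pipeline order, in place.

    Assumes one handle per source-bearing node and no nested pipelines.
    """
    next_handle = 0
    for op in ops:
        if op.has_source:
            op.handle_start = next_handle
            next_handle += 1
```

Nodes without a source keep `handle_start = None`, so later passes can tell them apart.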
- srdatalog.ir.codegen.cuda.envelope.collect_unique_view_specs(ops: list[srdatalog.ir.mir.types.MirNode]) -> list[srdatalog.ir.codegen.cuda.envelope.ViewSpec]
Walk the pipeline and collect a deduplicated list of ViewSpecs in first-occurrence order. Covers every op that references a view (ColumnJoin, CartesianJoin, Scan, Negation, Aggregate, BalancedScan, PositionedExtract).
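First-occurrence deduplication can be sketched with a seen-set keyed on (rel_name, index, version), so a repeated view keeps the handle_idx of the first op that referenced it. `Op` is a hypothetical stand-in that references at most one view, and the version strings are illustrative only.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class ViewSpec(NamedTuple):
    rel_name: str
    index: tuple[int, ...]
    version: str
    handle_idx: int  # handle of the FIRST op referencing this spec


@dataclass
class Op:
    """Hypothetical stand-in for a view-referencing MIR node."""
    rel_name: Optional[str] = None  # None = op references no view
    index: tuple[int, ...] = ()
    version: str = "FULL"
    handle: int = -1


def collect_unique_view_specs_sketch(ops: list[Op]) -> list[ViewSpec]:
    seen: set[tuple[str, tuple[int, ...], str]] = set()
    specs: list[ViewSpec] = []
    for op in ops:
        if op.rel_name is None:
            continue  # op does not reference a view
        key = (op.rel_name, op.index, op.version)
        if key in seen:
            continue  # later duplicates keep the first op's handle_idx
        seen.add(key)
        specs.append(ViewSpec(op.rel_name, op.index, op.version, op.handle))
    return specs
```

Excluding handle_idx from the key is what makes two ops reading the same view share one spec.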
- srdatalog.ir.codegen.cuda.envelope.count_handles(ops: list[srdatalog.ir.mir.types.MirNode]) -> int
Number of views[] slots needed by the kernel: max(handle_start) + 1.
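The max(handle_start) + 1 formula can be sketched over plain handle_start values. `count_handles_sketch` is hypothetical and takes a list of ints (None for nodes without a source) instead of MIR nodes; returning 0 for a pipeline with no handles is an assumption.

```python
from typing import Optional


def count_handles_sketch(handle_starts: list[Optional[int]]) -> int:
    """views[] slots needed: max(handle_start) + 1.

    None entries are nodes without a source; an empty result yields 0
    (assumed behavior for a pipeline with no handles).
    """
    assigned = [h for h in handle_starts if h is not None]
    return max(assigned, default=-1) + 1
```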
- srdatalog.ir.codegen.cuda.envelope.emit_dedup_table_struct(arity: int) -> str
Emit the DedupTable struct nested inside a rule's kernel scope.
GPU hash table over arity 32-bit columns, with two operations: try_insert (atomicCAS, used during the count phase) and check_winner (read-only, used during the materialize phase). It uses linear probing over up to 128 slots starting from an FNV-1a hash; the capacity is a power of two (the host-side runner zero-initializes hash_slots between phases).
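The probing scheme can be modeled sequentially in Python to show how try_insert and check_winner agree on a single winner per tuple. This is a sketch under stated assumptions, not the emitted CUDA: each slot is assumed to store the winning row id + 1 (0 = empty, matching zero-initialization), FNV-1a is assumed to run over the little-endian bytes of each column, the fallback when all probes are exhausted is assumed to keep the row, and the single-threaded write stands in for atomicCAS.

```python
from collections.abc import Iterator

MAX_PROBES = 128            # "up to 128 slots" of linear probing
FNV_OFFSET = 0x811C9DC5     # 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193      # 32-bit FNV prime


def fnv1a_32(cols: tuple[int, ...]) -> int:
    """FNV-1a over the little-endian bytes of each 32-bit column (assumed layout)."""
    h = FNV_OFFSET
    for col in cols:
        for byte in (col & 0xFFFFFFFF).to_bytes(4, "little"):
            h ^= byte
            h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


class DedupTableSim:
    """Single-threaded model of the DedupTable; slot encoding is assumed."""

    def __init__(self, capacity: int, rows: list[tuple[int, ...]]):
        if capacity <= 0 or capacity & (capacity - 1) != 0:
            raise ValueError("capacity must be a power of two")
        self.mask = capacity - 1
        self.rows = rows                 # candidate output tuples, one per thread
        self.slots = [0] * capacity      # 0 = empty, else winner row id + 1

    def _probe(self, row_id: int) -> Iterator[int]:
        h = fnv1a_32(self.rows[row_id])
        for i in range(min(MAX_PROBES, self.mask + 1)):
            yield (h + i) & self.mask    # power-of-two capacity: mask, not modulo

    def try_insert(self, row_id: int) -> bool:
        """Count phase: True if this row claims a fresh slot (models CAS 0 -> id)."""
        for idx in self._probe(row_id):
            owner = self.slots[idx]
            if owner == 0:               # the CAS would succeed here
                self.slots[idx] = row_id + 1
                return True
            if self.rows[owner - 1] == self.rows[row_id]:
                return False             # an equal tuple already won
        return True                      # probes exhausted: keep the row (assumed)

    def check_winner(self, row_id: int) -> bool:
        """Materialize phase, read-only: True if this row owns its tuple's slot."""
        for idx in self._probe(row_id):
            owner = self.slots[idx]
            if owner == 0:
                return True              # never recorded; mirrors insert fallback
            if self.rows[owner - 1] == self.rows[row_id]:
                return owner - 1 == row_id
        return True
```

Because slots are only ever filled during the count phase, a read-only scan along the same probe sequence finds the same owner that try_insert compared against.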
- srdatalog.ir.codegen.cuda.envelope.emit_full_file(ep: srdatalog.ir.mir.types.ExecutePipeline, body: str, *, scalar_mode: bool = False) -> str
Wrap a dialect-emitted operator() body in the standard file envelope.
body must be everything between operator() { and the closing }, i.e. the view declarations followed by the dialect-emitted kernel logic. The caller composes view_decls + emit(iir, emit_ctx).
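Because body is spliced directly between operator() { and its closing }, a caller may want a cheap sanity check before wrapping. `braces_balanced` is a hypothetical helper, not part of this module, and the view_decls / kernel_logic strings in the usage lines are made-up placeholders.

```python
def braces_balanced(body: str) -> bool:
    """Heuristic check that body can sit between `operator() {` and `}`.

    Naive: counts every { and }, including ones inside string literals
    or comments, so it can misfire on such code.
    """
    depth = 0
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # body would close operator() early
    return depth == 0


# Caller-side composition: view declarations first, then kernel logic.
view_decls = "    auto view_edge = views[0];\n"
kernel_logic = "    for (int i = 0; i < n; ++i) { /* ... */ }\n"
body = view_decls + kernel_logic
assert braces_balanced(body)
```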
- srdatalog.ir.codegen.cuda.envelope.emit_functor_start(rule_name: str, *, scalar_mode: bool = False, dedup_hash: bool = False) -> str
Open struct Kernel_<rule> { ... operator()(...) const {. When dedup_hash=True, operator() takes an additional DedupTable dedup_table parameter.
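A minimal sketch of the opening text, with the parameter list left as a caller-supplied placeholder because this page elides it. `functor_start_sketch` and `base_params` are hypothetical; the `__device__ void` spelling and appending the DedupTable parameter at the end of the list are assumptions.

```python
def functor_start_sketch(rule_name: str, base_params: str, dedup_hash: bool = False) -> str:
    """Open `struct Kernel_<rule> { ... operator()(...) const {`.

    base_params stands in for the real (elided) parameter list.
    """
    params = base_params
    if dedup_hash:
        # dedup_hash=True adds a `DedupTable dedup_table` parameter (position assumed).
        params += ", DedupTable dedup_table"
    return (
        f"struct Kernel_{rule_name} {{\n"
        f"  __device__ void operator()({params}) const {{\n"
    )
```

Doubled braces in the f-strings emit literal { characters, leaving both the struct and operator() open for the body and functor_end() to close.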
- srdatalog.ir.codegen.cuda.envelope.emit_view_declarations(specs: list[srdatalog.ir.codegen.cuda.envelope.ViewSpec], pipeline: list[srdatalog.ir.mir.types.MirNode], *, indent_level: int = 4, debug: bool = True, slot_mode: str = 'handle_idx', view_counts: list[int] | None = None) -> tuple[str, dict[str, str]]
Emit the top-of-kernel auto view_X = views[i]; block.
Returns (decls_string, view_vars). view_vars maps both:
  - spec key (<rel>_<cols>_<VER>) → view variable name
  - str(handle_idx) → view variable name (so handle-bearing ops can look up "which view does this handle name reference?")
slot_mode controls how the index into views[] is chosen:
  - 'handle_idx': use sp.handle_idx directly (matches the jit_batch.<rule>.cpp standalone-kernel goldens, which don't apply slot-offset compaction).
  - 'positional': use the cumulative-sum-of-view_counts slot per spec (matches the jit_runner.<rule>.cpp production goldens via compute_view_slot_offsets).
view_counts (per-spec, parallel to specs) is the number of physical view slots each spec consumes. Default = all 1s (DSAI). D2L FULL_VER specs consume 2 slots (HEAD + FULL); the dialect's view decl emits the BASE view at the first slot only (the second slot is referenced by the BG histogram via views[base+seg]).
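The two slot modes and the dual-keyed view_vars map can be sketched as follows. `Spec`, `view_slots`, and `declare_views` are hypothetical; the `<cols>` rendering (column indices joined by underscores) and the `view_<key>` variable name are assumptions, and 'positional' is modeled as an exclusive prefix sum of view_counts, so a spec consuming 2 slots pushes the next spec's slot forward by 2.

```python
from itertools import accumulate
from typing import NamedTuple, Optional


class Spec(NamedTuple):
    """Hypothetical stand-in for ViewSpec."""
    rel_name: str
    index: tuple[int, ...]
    version: str
    handle_idx: int


def view_slots(specs: list[Spec], slot_mode: str = "handle_idx",
               view_counts: Optional[list[int]] = None) -> list[int]:
    if slot_mode == "handle_idx":
        return [sp.handle_idx for sp in specs]
    if slot_mode == "positional":
        counts = view_counts if view_counts is not None else [1] * len(specs)
        # Exclusive prefix sum: spec k starts after the slots of specs 0..k-1.
        return [0, *accumulate(counts)][: len(specs)]
    raise ValueError(f"unknown slot_mode: {slot_mode!r}")


def declare_views(specs: list[Spec], slots: list[int]) -> tuple[str, dict[str, str]]:
    lines: list[str] = []
    view_vars: dict[str, str] = {}
    for sp, slot in zip(specs, slots):
        key = f"{sp.rel_name}_{'_'.join(map(str, sp.index))}_{sp.version}"
        var = f"view_{key}"  # naming scheme assumed
        lines.append(f"    auto {var} = views[{slot}];")  # first (BASE) slot only
        view_vars[key] = var                 # spec key -> variable
        view_vars[str(sp.handle_idx)] = var  # handle -> variable
    return "\n".join(lines) + "\n", view_vars
```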
- srdatalog.ir.codegen.cuda.envelope.first_dest_arity(ops: list[srdatalog.ir.mir.types.MirNode]) -> int
Arity of the first InsertInto's column set. Sizes the DedupTable's hash function (one v0..vN-1 column per parameter).
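A sketch of how the first destination's arity could drive the hash function's parameter list. `InsertInto` and `OtherOp` here are hypothetical stand-ins for MIR nodes; raising when no InsertInto exists and the `uint32_t` parameter type are assumptions.

```python
from dataclasses import dataclass


@dataclass
class InsertInto:
    """Hypothetical stand-in for the MIR InsertInto node."""
    rel_name: str
    columns: tuple[str, ...]


@dataclass
class OtherOp:
    """Hypothetical stand-in for any non-insert node."""
    kind: str


def first_dest_arity_sketch(ops: list[object]) -> int:
    for op in ops:
        if isinstance(op, InsertInto):
            return len(op.columns)  # only the FIRST InsertInto counts
    raise ValueError("pipeline has no InsertInto")  # real behavior not documented


def hash_params(arity: int) -> str:
    # One 32-bit column parameter v0..vN-1 per destination column (type assumed).
    return ", ".join(f"uint32_t v{i}" for i in range(arity))
```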