srdatalog.ir.codegen.cuda.runner
target.cuda — per-rule runner emission.
emit_runner_full(ep, db, rel_index_types) is the canonical entry
point that compile.compile_runner calls into. It produces the
per-rule JitRunner_<rule> struct plus all kernel definitions and
out-of-line phase methods — the content of jit_runner.<rule>.cpp.
Today the implementation delegates to the legacy
ir.codegen.cuda.complete_runner.gen_complete_runner for the runner
scaffolding (phase methods, type aliases, execute() dispatcher,
LaunchParams struct, BG variants, fused kernel) and routes kernel
bodies through compile_kernel_body when _dialect_safe_kernel
holds. Subsequent milestones port the remaining pieces:
N2 Fused composer (count + materialize back-to-back operator())
N4 par.data.block_group dialect (BG warp-cumulative dispatch)
N5 relation.d2l dialect (multi-view plugin dispatch + setup)
N6 Dedup-hash WriteOutput variant
N7 Tiled-Cartesian ballot-reuse on relation.sorted_array
N8 par.data.atomic_ws dialect (WCOJ task queue)
Each milestone collapses one slice of the delegation into native
dialect emission, validated by tests/test_runner_byte_equivalence.py.
The emission output of this module is byte-equivalent (modulo
_cpp_norm) to the upstream Nim jit_runner.<rule>.cpp goldens
on every fixture that the legacy emitter handled.
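The byte-equivalence check described above can be pictured with a small sketch. The normalizer here is a hypothetical stand-in for _cpp_norm, whose actual rules are not documented on this page; it is assumed only to strip trailing whitespace and drop blank lines. The helper name first_mismatch is likewise illustrative and not part of the module.

```python
from __future__ import annotations


def normalize_cpp(text: str) -> str:
    """Hypothetical stand-in for _cpp_norm (assumed rules, not the real ones):
    strip trailing whitespace on every line and drop blank lines."""
    lines = (line.rstrip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def first_mismatch(emitted: str, golden: str) -> int | None:
    """Return the 0-based index of the first normalized line that differs,
    or None when the two sources are equivalent modulo normalization."""
    a = normalize_cpp(emitted).splitlines()
    b = normalize_cpp(golden).splitlines()
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One side is a strict prefix of the other.
        return min(len(a), len(b))
    return None
```

Reporting the first differing line rather than a bare boolean makes a failing golden comparison easier to localize in a long generated file.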
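Module Contents
Functions
emit_execute | <runner_prefix>::execute: top-level dispatcher.
emit_execute_fused | <runner_prefix>::execute_fused: single-pass fused dispatcher.
emit_grid_config_code | Grid configuration template: populates <prefix>num_threads/num_blocks.
emit_launch_count | <runner_prefix>::launch_count: fires kernel_count on the given stream.
emit_launch_fused | <runner_prefix>::launch_fused: fires kernel_fused into the given stream.
emit_launch_materialize | <runner_prefix>::launch_materialize: fires the materialize kernel.
emit_launch_params_struct | LaunchParams block, shared between full and decl emission.
emit_method_forward_decls | Phase-method forward declarations inside struct JitRunner_X.
emit_read_fused_result | <runner_prefix>::read_fused_result: reads back the fused write counts and overflow flag.
emit_read_total | <runner_prefix>::read_total: reads the post-scan total count.
emit_runner_decl | Emit the forward-declaration variant: type aliases + LaunchParams + method declarations only.
emit_runner_full | Emit the full per-rule runner: struct + kernel defs + out-of-line phase methods + execute().
emit_scan_and_resize | <runner_prefix>::scan_and_resize: exclusive prefix-scan, read total, resize each dest relation.
emit_scan_only | <runner_prefix>::scan_only: async prefix-scan, no host sync.
emit_struct_type_aliases | Type alias block shared between full and decl.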
Data
API
- srdatalog.ir.codegen.cuda.runner.__all__
[‘emit_execute’, ‘emit_execute_fused’, ‘emit_grid_config_code’, ‘emit_launch_count’, ‘emit_launch_fu…
- srdatalog.ir.codegen.cuda.runner.emit_execute(rule_name: str, runner_prefix: str, is_count: bool, *, is_block_group: bool = False, is_dedup_hash: bool = False, dest_specs: list[srdatalog.ir.mir.types.InsertInto] | None = None) -> str
<runner_prefix>::execute: top-level dispatcher. For BG materialize rules, it fans out into a 5-step pipeline (histogram → prefix sum → BG count → scan + resize → BG materialize) and adaptively falls back to the baseline path below the size threshold.
- srdatalog.ir.codegen.cuda.runner.emit_execute_fused(ep: srdatalog.ir.mir.types.ExecutePipeline, runner_prefix: str) -> str
<runner_prefix>::execute_fused: single-pass fused dispatcher with a speculative output buffer and automatic capacity growth on overflow.
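The speculative-buffer idea can be sketched on the host side as a retry loop. This is a minimal illustration rather than the emitted C++: fused_pass is a hypothetical stand-in for the fused kernel, and doubling the capacity on overflow is an assumed growth policy that the source does not specify.

```python
def fused_pass(rows: list, capacity: int) -> tuple[list, bool]:
    """Hypothetical stand-in for the fused kernel: write at most `capacity`
    rows and report whether more rows were produced than fit."""
    return rows[:capacity], len(rows) > capacity


def execute_fused(rows: list, initial_capacity: int = 4) -> tuple[list, int, int]:
    """Run the pass speculatively; on overflow, grow the buffer and retry.
    Returns (output, final_capacity, attempts)."""
    capacity = max(1, initial_capacity)
    attempts = 0
    while True:
        attempts += 1
        out, overflow = fused_pass(rows, capacity)
        if not overflow:
            return out, capacity, attempts
        capacity *= 2  # assumed policy: double the capacity on overflow
```

For example, ten rows with an initial capacity of 4 overflow twice (capacities 4 and 8) and succeed on the third attempt with capacity 16.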
- srdatalog.ir.codegen.cuda.runner.emit_grid_config_code(prefix: str, root_is_scan: bool) -> str
Grid configuration template: populates <prefix>num_threads/num_blocks based on whether the rule is a binary join (row-based) or WCOJ (unique-key-based).
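As a rough illustration of the row-based versus unique-key-based distinction, the sketch below sizes a launch from one of two work counts. Everything in it is an assumption made for illustration: the function name, the 256-thread block size, the one-thread-per-work-item mapping, and the row_based flag (how root_is_scan maps onto the two cases is not stated here).

```python
def grid_config(num_rows: int, num_unique_keys: int, row_based: bool,
                threads_per_block: int = 256) -> dict[str, int]:
    """Hypothetical launch sizing: one thread per work item, where a work item
    is a row (binary join) or a unique key (WCOJ)."""
    work_items = num_rows if row_based else num_unique_keys
    # Ceiling division, with at least one block so the launch is never empty.
    num_blocks = max(1, -(-work_items // threads_per_block))
    return {"num_blocks": num_blocks,
            "num_threads": num_blocks * threads_per_block}
```

With 1000 rows and 10 unique keys, the row-based case yields 4 blocks while the key-based case needs only 1.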
- srdatalog.ir.codegen.cuda.runner.emit_launch_count(runner_prefix: str, *, is_block_group: bool = False, is_dedup_hash: bool = False) -> str
<runner_prefix>::launch_count: fires kernel_count (and the BG variant when is_block_group=True) on the given stream after the zero-key fast path. When is_dedup_hash=True, passes p.dedup_table to the kernel.
- srdatalog.ir.codegen.cuda.runner.emit_launch_fused(ep: srdatalog.ir.mir.types.ExecutePipeline, runner_prefix: str) -> str
<runner_prefix>::launch_fused: fires kernel_fused (or kernel_bg_fused with stream-ordered histogram) into the given stream.
- srdatalog.ir.codegen.cuda.runner.emit_launch_materialize(ep: srdatalog.ir.mir.types.ExecutePipeline, runner_prefix: str) -> str
<runner_prefix>::launch_materialize: fires the materialize kernel (and the BG variant when ep.block_group). Pure template; ProvPtrType is always nullptr today (no provenance materialization yet).
- srdatalog.ir.codegen.cuda.runner.emit_launch_params_struct(num_dests: int, is_fused_eligible: bool, is_block_group: bool = False, is_dedup_hash: bool = False, for_decl: bool = False) -> str
LaunchParams block, shared between full and decl emission. When for_decl is True, the BG-block comment uses the decl variant (“must match JIT batch definition exactly!”) to mirror Nim exactly.
- srdatalog.ir.codegen.cuda.runner.emit_method_forward_decls(is_count: bool, is_fused_eligible: bool) -> str
Phase-method forward declarations inside struct JitRunner_X.
- srdatalog.ir.codegen.cuda.runner.emit_read_fused_result(ep: srdatalog.ir.mir.types.ExecutePipeline, runner_prefix: str) -> str
<runner_prefix>::read_fused_result: reads back the fused write counts and overflow flag (call after device sync).
- srdatalog.ir.codegen.cuda.runner.emit_read_total(runner_prefix: str) -> str
<runner_prefix>::read_total: reads the post-scan total count (call after device sync).
- srdatalog.ir.codegen.cuda.runner.emit_runner_decl(ep: srdatalog.ir.mir.types.ExecutePipeline, db_type_name: str, rel_index_types: dict[str, str] | None = None) -> str
Emit the forward-declaration variant: type aliases + LaunchParams + method declarations only. Goes into the main compile unit so the orchestrator can call JitRunner_<rule>::execute().
- srdatalog.ir.codegen.cuda.runner.emit_runner_full(ep: srdatalog.ir.mir.types.ExecutePipeline, db_type_name: str, rel_index_types: dict[str, str] | None = None) -> str
Emit the full per-rule runner: struct + kernel defs + out-of-line phase methods + execute(). Goes into the per-rule jit_batch_N.cpp file at production-build time.
- srdatalog.ir.codegen.cuda.runner.emit_scan_and_resize(ep: srdatalog.ir.mir.types.ExecutePipeline, runner_prefix: str) -> str
<runner_prefix>::scan_and_resize: exclusive prefix-scan over thread_counts, read total, resize each dest relation in place.
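The count, scan, and resize steps can be mirrored in a few lines of host-side Python. This is only a sketch of the general pattern under stated assumptions: the destination is modeled as a plain list, resizing is assumed to append `total` slots after the existing rows, and the materialize helper is hypothetical, added to show how the scanned offsets would be consumed.

```python
from itertools import accumulate


def scan_and_resize(thread_counts: list[int], dest: list) -> tuple[list[int], int, int]:
    """Exclusive prefix scan over per-thread counts, then grow `dest` in place.
    Returns (offsets, total, old_size)."""
    inclusive = list(accumulate(thread_counts))
    total = inclusive[-1] if inclusive else 0
    # Exclusive scan: shift the inclusive scan right by one, starting at 0.
    offsets = [0] + inclusive[:-1] if inclusive else []
    old_size = len(dest)
    dest.extend([None] * total)  # assumed: new rows are appended after old ones
    return offsets, total, old_size


def materialize(per_thread_rows: list[list], dest: list) -> int:
    """Hypothetical materialize step: each 'thread' writes its rows at
    old_size + its exclusive offset, so writes never overlap."""
    counts = [len(rows) for rows in per_thread_rows]
    offsets, total, base = scan_and_resize(counts, dest)
    for rows, off in zip(per_thread_rows, offsets):
        dest[base + off: base + off + len(rows)] = rows
    return total
```

Because each writer's offset equals the sum of all earlier counts, the output regions are disjoint and contiguous, which is why the count pass must finish before materialization starts.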
- srdatalog.ir.codegen.cuda.runner.emit_scan_only(runner_prefix: str) -> str
<runner_prefix>::scan_only: async prefix-scan, no host sync.
- srdatalog.ir.codegen.cuda.runner.emit_struct_type_aliases(rule_name: str, db_type_name: str, first_schema: str, first_version: str, dest_specs: list[srdatalog.ir.mir.types.InsertInto], dest_arities: list[int], total_view_count: int) -> str
Type alias block shared between full and decl. Does NOT include struct JitRunner_X { or the closing brace.