srdatalog.ir.codegen.cuda.api

Public compile entry point for the dialect-based codegen.

compile_pipeline(ep, target='cuda') emits a complete C++ JIT batch file by routing the pipeline through the IIR-sorted-array dialect + target.cuda emit, wrapped in the dialect’s envelope helpers (file prelude, banner, functor struct, view declarations, footer).

compile_kernel_body(ep, ...) is the lower-level entry: emits just the operator() body (view_decls + dialect-emitted kernel logic), parameterized by phase (count vs materialize) and output-var bindings. The runner emit (Phase N3) calls into this for each kernel it wraps.
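The phase split described above can be sketched with a toy emitter (hypothetical names throughout; `emit_kernel_body` is not the real srdatalog function):

```python
# Hypothetical sketch: phase-parameterized kernel-body emission.
# Mirrors the count-vs-materialize split described above; none of
# these names are the actual srdatalog internals.
def emit_kernel_body(view_decls: str, logic: str, *, is_counting: bool,
                     output_var_name: str = "output") -> str:
    """Join view declarations with phase-specific kernel logic."""
    if is_counting:
        # Count phase: only tally output rows, no materialization.
        body = f"{output_var_name}.add_count();"
    else:
        # Materialize phase: emit the real insert logic.
        body = logic
    return f"{view_decls}\n{body}\n"

print(emit_kernel_body("auto v0 = views[0];", "output.insert(tuple);",
                       is_counting=True, output_var_name="output_ctx"))
```

The runner would call such an entry point twice per kernel, once per phase, varying only the keyword knobs.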

Pipeline shapes the dialect doesn’t (yet) handle raise loudly via lower_scan_pipeline(), which is the authoritative scope statement. Adding coverage for a new shape means adding a lowering rule, not a fallback.

The byte-equivalence harnesses:

  • tests/test_byte_equivalence_jit.py — materialize-phase kernel functor against the upstream Nim goldens.

  • tests/test_count_phase_byte_equivalence.py — count-phase body against the legacy jit_pipeline count emit (the only spec for count-phase shape, since the runner files contain count bodies but no isolated count-only goldens exist).

See:

  • docs/stage2_emitter_audit.md — the per-milestone migration plan.

  • docs/ir_lowering_semantics.md — the formal lowering rules.

  • docs/design_principles.md — discipline rules for the rewrite.

Module Contents

Functions

compile_kernel_body

Emit the operator() body for one kernel — view_decls followed by the dialect-emitted kernel logic. Caller is responsible for the envelope (file prelude, kernel signature, OutputContext setup).

compile_pipeline

Compile an MIR ExecutePipeline to target C++ source via the dialect.

compile_runner

Compile an ExecutePipeline to its full per-rule runner — the JitRunner_<rule> struct + kernel definitions + out-of-line phase methods + execute(). Production output: this is what jit_runner.<rule>.cpp golden files capture.

Data

API

srdatalog.ir.codegen.cuda.api.Target

None

srdatalog.ir.codegen.cuda.api.__all__

['Target', 'compile_kernel_body', 'compile_pipeline', 'compile_runner']

srdatalog.ir.codegen.cuda.api.compile_kernel_body(ep: srdatalog.ir.mir.types.ExecutePipeline, *, is_counting: bool, output_var_name: str = 'output', output_vars: dict[str, str] | None = None, slot_mode: str = 'positional', rel_index_types: dict[str, str] | None = None, tiled_cartesian: bool = False, bg_enabled: bool = False) -> str

Emit the operator() body for one kernel — view_decls followed by the dialect-emitted kernel logic. Caller is responsible for the envelope (file prelude, kernel signature, OutputContext setup).

Parameters mirror the legacy _make_kernel_ctx knobs the runner emit (complete_runner.py) twiddles per kernel:

is_counting: True selects count-phase emit (emit_direct() with no args, AddCount-style increments).

output_var_name: name of the OutputContext variable used by the single-output InsertInto path (legacy default 'output'; runner uses 'output_ctx' in count phase, 'output_ctx_0' in materialize).

output_vars: per-relation output-var override map. Multi-head rules use this so each InsertInto resolves to its own dest’s OutputContext. Pass {rel_name: '__skip_counting__'} to suppress count-phase emission for secondary outputs.

slot_mode: 'positional' (default, matches jit_runner.<rule>.cpp production goldens) or 'handle_idx' (matches the standalone jit_batch.<rule>.cpp test fixtures emitted via compile_pipeline). See emit_view_declarations docstring.

rel_index_types: per-relation custom index type (e.g., Device2LevelIndex). Used to compute per-spec view_counts via relation.d2l (and any future index dialect) so positional slots advance by 2 per FULL_VER D2L source — matching legacy compute_view_slot_offsets. Pass {} or None for plain DSAI.

srdatalog.ir.codegen.cuda.api.compile_pipeline(ep: srdatalog.ir.mir.types.ExecutePipeline, *, target: srdatalog.ir.codegen.cuda.api.Target = 'cuda') -> str

Compile an MIR ExecutePipeline to target C++ source via the dialect.

Raises ValueError on unsupported targets. Raises (via lower_scan_pipeline) on pipeline shapes the dialect doesn’t cover — there is no legacy fallback.
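The fail-loud dispatch contract can be sketched with a toy table (names are illustrative, not the real srdatalog internals):

```python
# Toy sketch of the fail-loud target dispatch: unsupported targets
# raise ValueError immediately, and there is no legacy fallback path.
# All names here are placeholders.
_EMITTERS = {"cuda": lambda pipeline: f"// cuda source for {pipeline}"}

def compile_toy(pipeline: str, *, target: str = "cuda") -> str:
    emit = _EMITTERS.get(target)
    if emit is None:
        # Fail loudly: never silently fall back to another backend.
        raise ValueError(f"unsupported target: {target!r}")
    return emit(pipeline)
```

Unsupported pipeline *shapes* are rejected one layer down, inside the lowering itself, following the same no-fallback discipline.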

srdatalog.ir.codegen.cuda.api.compile_runner(ep: srdatalog.ir.mir.types.ExecutePipeline, db_type_name: str, rel_index_types: dict[str, str] | None = None) -> str

Compile an ExecutePipeline to its full per-rule runner — the JitRunner_<rule> struct + kernel definitions + out-of-line phase methods + execute(). Production output: this is what jit_runner.<rule>.cpp golden files capture.

The dialect’s codegen.cuda.runner module owns the runner emission surface. Most pieces (phase methods, execute, BG variants, fused kernel) currently delegate to legacy helpers in ir.codegen.cuda.complete_runner; later milestones (N2/N4/N5/N6/N8) collapse them into native dialect emission.

Kernel bodies (count + materialize) already route through compile_kernel_body when _dialect_safe_kernel holds — see the swap inside complete_runner._gen_kernel_count / _gen_kernel_materialize.

The byte-equivalence gate (tests/test_runner_byte_equivalence.py) anchors this entry point to the upstream goldens throughout the migration.
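A byte-equivalence gate of this kind reduces to an exact byte comparison against a checked-in golden file; a minimal self-contained sketch (paths and the generator are placeholders, not the real harness):

```python
# Minimal sketch of a byte-equivalence gate: regenerate the source and
# compare it byte-for-byte against a golden file. Placeholder paths;
# the real harness lives in tests/test_runner_byte_equivalence.py.
import pathlib
import tempfile

def check_byte_equivalence(generated: str, golden_path: pathlib.Path) -> bool:
    """True iff the generated source matches the golden file exactly."""
    return generated.encode("utf-8") == golden_path.read_bytes()

with tempfile.TemporaryDirectory() as d:
    golden = pathlib.Path(d) / "jit_runner.example.cpp"
    golden.write_bytes(b"struct JitRunner_example {};\n")
    assert check_byte_equivalence("struct JitRunner_example {};\n", golden)
    assert not check_byte_equivalence("// drifted output\n", golden)
```

Comparing raw bytes rather than normalized text is deliberate: any whitespace or encoding drift in the emitter is a migration regression and should fail the gate.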