srdatalog.ir.codegen.cuda.api¶
Public compile entry point for the dialect-based codegen.
`compile_pipeline(ep, target='cuda')` emits a complete C++ JIT batch
file by routing the pipeline through the IIR-sorted-array dialect +
`target.cuda` emit, wrapped in the dialect's envelope helpers (file
prelude, banner, functor struct, view declarations, footer).

`compile_kernel_body(ep, ...)` is the lower-level entry: it emits just
the operator() body (view_decls + dialect-emitted kernel logic),
parameterized by phase (count vs. materialize) and output-var bindings.
The runner emit (Phase N3) calls into this for each kernel it wraps.

Pipeline shapes the dialect doesn't (yet) handle raise loudly via
`lower_scan_pipeline`; `_supported_pipeline()` is the authoritative
scope statement. Adding coverage for a new shape means adding a
lowering rule, not a fallback.
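The entry-point contract above (dispatch on target, raise loudly on uncovered pipeline shapes, no legacy fallback) can be sketched as follows. All names here (`compile_sketch`, `_supported`) are illustrative stand-ins, not the real `srdatalog` API:

```python
# Minimal sketch of the "raise loudly, no fallback" entry-point pattern.
# compile_sketch and _supported are hypothetical stand-ins.

def _supported(shape: str) -> bool:
    # Stand-in for _supported_pipeline(): the single authoritative scope check.
    return shape in {"scan", "scan_join"}

def compile_sketch(shape: str, target: str = "cuda") -> str:
    if target != "cuda":
        # Unsupported targets fail fast with ValueError.
        raise ValueError(f"unsupported target: {target!r}")
    if not _supported(shape):
        # Uncovered shapes raise; adding coverage means adding a lowering
        # rule, never a fallback path.
        raise NotImplementedError(f"no lowering rule for shape {shape!r}")
    return f"// emitted C++ for {shape}\n"
```

The point of the sketch is the control flow: every failure mode is an exception, so there is no silent degradation to a legacy emitter.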
The byte-equivalence harnesses:

- tests/test_byte_equivalence_jit.py — materialize-phase kernel functor against the upstream Nim goldens.
- tests/test_count_phase_byte_equivalence.py — count-phase body against the legacy `jit_pipeline` count emit (the only spec for count-phase shape, since the runner files contain count bodies but no isolated count-only goldens exist).
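A byte-equivalence harness of this kind reduces to a golden-file comparison. The emitter, file names, and golden content below are hypothetical stand-ins, not the real test fixtures:

```python
# Sketch of a byte-equivalence gate: emitted text must match the golden
# file byte-for-byte, so even whitespace drift fails the check.
from pathlib import Path
import tempfile

def emit_kernel_body() -> str:
    # Stand-in for the real emitter under test.
    return "for (int i = 0; i < n; ++i) {\n  out[i] = in[i];\n}\n"

def check_byte_equivalence(golden: Path) -> bool:
    # Compare raw bytes, not normalized lines.
    return golden.read_bytes() == emit_kernel_body().encode()

with tempfile.TemporaryDirectory() as d:
    golden = Path(d) / "jit_batch.rule.cpp"  # illustrative file name
    golden.write_bytes(emit_kernel_body().encode())
    print(check_byte_equivalence(golden))  # True when emit matches golden
```

Byte (rather than semantic) comparison is what lets the migration swap emitters incrementally with zero observable output change.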
See:

- docs/stage2_emitter_audit.md — the per-milestone migration plan.
- docs/ir_lowering_semantics.md — the formal lowering rules.
- docs/design_principles.md — discipline rules for the rewrite.
Module Contents¶
Functions¶
| Function | Description |
|---|---|
| `compile_kernel_body` | Emit the operator() body for one kernel — view_decls followed by the dialect-emitted kernel logic. Caller is responsible for the envelope (file prelude, kernel signature, OutputContext setup). |
| `compile_pipeline` | Compile an MIR ExecutePipeline to target C++ source via the dialect. |
| `compile_runner` | Compile an ExecutePipeline to its full per-rule runner — the `JitRunner_<rule>` struct plus kernel definitions, out-of-line phase methods, and execute(). |
Data¶
API¶
- srdatalog.ir.codegen.cuda.api.Target¶
None
- srdatalog.ir.codegen.cuda.api.__all__¶
['Target', 'compile_kernel_body', 'compile_pipeline', 'compile_runner']
- srdatalog.ir.codegen.cuda.api.compile_kernel_body(ep: srdatalog.ir.mir.types.ExecutePipeline, *, is_counting: bool, output_var_name: str = 'output', output_vars: dict[str, str] | None = None, slot_mode: str = 'positional', rel_index_types: dict[str, str] | None = None, tiled_cartesian: bool = False, bg_enabled: bool = False) → str [source]¶
Emit the operator() body for one kernel — view_decls followed by the dialect-emitted kernel logic. Caller is responsible for the envelope (file prelude, kernel signature, OutputContext setup).
Parameters mirror the legacy `_make_kernel_ctx` knobs the runner emit (complete_runner.py) twiddles per kernel:

- `is_counting`: True selects count-phase emit (`emit_direct()` with no args, AddCount-style increments).
- `output_var_name`: name of the OutputContext variable used by the single-output InsertInto path (legacy default 'output'; the runner uses 'output_ctx' in count phase, 'output_ctx_0' in materialize).
- `output_vars`: per-relation output-var override map. Multi-head rules use this so each InsertInto resolves to its own dest's OutputContext. Pass `{rel_name: '__skip_counting__'}` to suppress count-phase emission for secondary outputs.
- `slot_mode`: 'positional' (default, matches `jit_runner.<rule>.cpp` production goldens) or 'handle_idx' (matches the standalone `jit_batch.<rule>.cpp` test fixtures emitted via `compile_pipeline`). See the `emit_view_declarations` docstring.
- `rel_index_types`: per-relation custom index type (e.g., `Device2LevelIndex`). Used to compute per-spec view_counts via `relation.d2l` (and any future index dialect) so positional slots advance by 2 per FULL_VER D2L source, matching the legacy `compute_view_slot_offsets`. Pass `{}` or `None` for plain DSAI.
- srdatalog.ir.codegen.cuda.api.compile_pipeline(ep: srdatalog.ir.mir.types.ExecutePipeline, *, target: srdatalog.ir.codegen.cuda.api.Target = 'cuda') str[source]¶
Compile an MIR ExecutePipeline to target C++ source via the dialect.
Raises ValueError on unsupported targets. Raises (via `lower_scan_pipeline`) on pipeline shapes the dialect doesn't cover — there is no legacy fallback.
- srdatalog.ir.codegen.cuda.api.compile_runner(ep: srdatalog.ir.mir.types.ExecutePipeline, db_type_name: str, rel_index_types: dict[str, str] | None = None) → str [source]¶
Compile an ExecutePipeline to its full per-rule runner — the `JitRunner_<rule>` struct + kernel definitions + out-of-line phase methods + `execute()`. Production output: this is what `jit_runner.<rule>.cpp` golden files capture.

The dialect's `codegen.cuda.runner` module owns the runner emission surface. Most pieces (phase methods, execute, BG variants, fused kernel) currently delegate to legacy helpers in `ir.codegen.cuda.complete_runner`; later milestones (N2/N4/N5/N6/N8) collapse them into native dialect emission.

Kernel bodies (count + materialize) already route through `compile_kernel_body` when `_dialect_safe_kernel` holds — see the swap inside `complete_runner._gen_kernel_count` / `_gen_kernel_materialize`.

The byte-equivalence gate (tests/test_runner_byte_equivalence.py) anchors this entry point to the upstream goldens throughout the migration.
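The gating described above (dialect emit when the safety predicate holds, legacy helper otherwise) reduces to a small dispatch. All names below are illustrative stand-ins, not the real `complete_runner` internals:

```python
# Sketch of the kernel-body swap: route through the dialect emitter only
# when the safety predicate accepts the pipeline, else keep the legacy path.
from typing import Callable

def gen_kernel_body(ep: object,
                    dialect_safe: Callable[[object], bool],
                    dialect_emit: Callable[[object], str],
                    legacy_emit: Callable[[object], str]) -> str:
    if dialect_safe(ep):
        # Dialect path: covered shapes must emit byte-identical output.
        return dialect_emit(ep)
    # Legacy path: shrinks as later milestones add native dialect coverage.
    return legacy_emit(ep)

out = gen_kernel_body("scan", lambda e: e == "scan",
                      lambda e: "dialect", lambda e: "legacy")
print(out)  # dialect
```

Keeping the predicate as the only branch point is what lets the byte-equivalence gate validate each newly covered shape independently.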