srdatalog.ir.codegen.cuda.render.parallel_data
CUDA renderer for the parallel.data dialect.
Per docs/stage3a_execution_plan.md §7 tasks S3A.3 + S3A.9b.
Two pieces of CUDA emission live here:
- _render_bg_root_cj_multi: the BgRootCjMulti op renderer, split out of the 41-case match in the legacy codegen/cuda/emit.py.
- emit_bg_histogram_kernel: the standalone histogram kernel template. It is a per-rule kernel that is not part of the BG body rendering; complete_runner.py calls it during runner emit. Task S3A.9b relocated it from dialects/parallel/data/block_group.py so that the dialect file contains only ops and their helper data (BgSourceSpec); CUDA emission lives in the codegen, not inside the dialect.
Module Contents
Functions
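emit_bg_histogram_kernel(ep, rel_index_types) | Emit kernel_bg_histogram, a grid-stride loop over unique root keys that writes the per-key work estimate into bg_work_per_key[].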
Data
API
- srdatalog.ir.codegen.cuda.render.parallel_data.__all__
['emit_bg_histogram_kernel']
- srdatalog.ir.codegen.cuda.render.parallel_data.emit_bg_histogram_kernel(ep: srdatalog.ir.mir.types.ExecutePipeline, rel_index_types: dict[str, str]) → str
Emit kernel_bg_histogram, a grid-stride loop over unique root keys that writes the per-key work estimate (the product of the root-source degrees) into bg_work_per_key[].
The body is a hand-crafted prefix+degree sweep, not a jit_pipeline render. It pulls plugin and view-management helpers from the codegen internals (gen_root_handle, plugin_view_count, and the view_slot helpers).
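To make the shape of such an emitter concrete, here is a minimal sketch of a string-template function that produces a grid-stride kernel writing a per-key product of degrees. It is not the srdatalog implementation: the names sketch_histogram_kernel, reference_work_per_key, and the deg_i arrays are hypothetical, and the sketch assumes the per-source degrees are already available as flat arrays, whereas the real emitter takes an ExecutePipeline and derives the degrees with its prefix+degree sweep.

```python
from math import prod


def sketch_histogram_kernel(rule_name: str, num_sources: int) -> str:
    """Return CUDA source for a grid-stride work-estimate kernel (illustrative only)."""
    if num_sources < 1:
        raise ValueError("need at least one root source")
    # Assumption: one precomputed degree array per root source, indexed by key position.
    params = ", ".join(
        f"const unsigned long long* deg_{i}" for i in range(num_sources)
    )
    product = " * ".join(f"deg_{i}[k]" for i in range(num_sources))
    return (
        f'extern "C" __global__ void kernel_bg_histogram_{rule_name}(\n'
        f"    {params},\n"
        "    unsigned long long* bg_work_per_key,\n"
        "    unsigned int num_keys) {\n"
        "  // Grid-stride loop: each thread handles keys tid, tid + stride, ...\n"
        "  unsigned int stride = gridDim.x * blockDim.x;\n"
        "  for (unsigned int k = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "       k < num_keys; k += stride) {\n"
        f"    bg_work_per_key[k] = {product};\n"
        "  }\n"
        "}\n"
    )


def reference_work_per_key(degrees_per_source: list[list[int]]) -> list[int]:
    """CPU reference: per-key product of the root-source degrees."""
    return [prod(column) for column in zip(*degrees_per_source)]
```

The CPU reference computes the same quantity on the host, which can be handy for checking a generated kernel's output on small inputs; for example, reference_work_per_key([[2, 3], [4, 5]]) multiplies the degrees key by key.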