srdatalog.ir.codegen.cuda.render.parallel_data

CUDA renderer for the parallel.data dialect.

Per docs/stage3a_execution_plan.md §7 tasks S3A.3 + S3A.9b.

Two pieces of CUDA emission live here:

  • _render_bg_root_cj_multi: the BgRootCjMulti op renderer (split from the legacy codegen/cuda/emit.py 41-case match).

  • emit_bg_histogram_kernel: the standalone histogram kernel template. This is a per-rule kernel that is not part of the BG body rendering; complete_runner.py calls it during runner emit. It was relocated in S3A.9b from dialects/parallel/data/block_group.py so that the dialect file contains only ops and their helper data (BgSourceSpec); CUDA emission belongs in the codegen, not inside the dialect.

Module Contents

Functions

emit_bg_histogram_kernel

Emit kernel_bg_histogram — a grid-stride loop over unique root keys that writes the per-key work estimate (product of root-source degrees) into bg_work_per_key[].

Data

API

srdatalog.ir.codegen.cuda.render.parallel_data.__all__

['emit_bg_histogram_kernel']

srdatalog.ir.codegen.cuda.render.parallel_data.emit_bg_histogram_kernel(ep: srdatalog.ir.mir.types.ExecutePipeline, rel_index_types: dict[str, str]) → str

Emit kernel_bg_histogram — a grid-stride loop over unique root keys that writes the per-key work estimate (product of root-source degrees) into bg_work_per_key[].

The body is a hand-crafted prefix+degree sweep rather than a jit_pipeline render. It pulls its plugin and view-management helpers from the codegen internals (gen_root_handle, plugin_view_count, and the view_slot helpers).
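
To make the computation concrete, here is a minimal CPU-side sketch in Python of what a kernel of this shape computes: a grid-stride sweep over unique root keys, storing the product of the root-source degrees for each key. It is not the emitted CUDA and not part of srdatalog. The function name bg_histogram_reference, the degrees_per_source layout, and the num_threads parameter are all hypothetical, chosen only for illustration; how the real kernel obtains degrees from its views is not shown here.

```python
from math import prod


def bg_histogram_reference(degrees_per_source, num_threads=4):
    """CPU model of a grid-stride histogram sweep (illustrative only).

    degrees_per_source[s][k] is an assumed layout: the degree of unique
    root key k in root source s. Returns a list playing the role of
    bg_work_per_key[].
    """
    num_keys = len(degrees_per_source[0]) if degrees_per_source else 0
    bg_work_per_key = [0] * num_keys

    # Grid-stride pattern: each simulated thread starts at its own id and
    # advances by the total thread count, so every key is visited exactly
    # once regardless of how many threads are launched.
    for tid in range(num_threads):
        for key in range(tid, num_keys, num_threads):
            # Per-key work estimate: product of the degrees across sources.
            bg_work_per_key[key] = prod(src[key] for src in degrees_per_source)

    return bg_work_per_key


# Two hypothetical root sources over five unique root keys.
degrees = [
    [2, 0, 3, 1, 4],
    [5, 7, 1, 2, 2],
]
work = bg_histogram_reference(degrees, num_threads=2)
print(work)  # [10, 0, 3, 2, 8]
```

Because each key is written by exactly one simulated thread, the result does not depend on num_threads. That property is what lets a grid-stride kernel run with any launch configuration.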