srdatalog.ir.dialects.parallel.data.block_group

par.data.block_group — block-group work-balanced parallelism strategy.

The block-group strategy assigns each CUDA block a contiguous slice of the total flat work-space [0, bg_total_work). Per-block:

  1. Binary-search bg_cumulative_work[] for the starting key.

  2. Iterate keys from there until the block’s work budget is consumed.

  3. Inside each key, redistribute work across warps in proportion to the first source’s degree.
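Steps 1 and 2 can be simulated on the host to see which keys a block touches. The following is a minimal sketch, not the emitted CUDA: the function name keys_for_block, the round-up split of the work-space, and the (key, begin, end) return shape are assumptions made for illustration.

```python
from bisect import bisect_right


def keys_for_block(block_id, num_blocks, cumulative, total_work):
    """Return (key, begin, end) work slices handled by one block."""
    # Assumption: the flat work-space is split evenly, rounding up.
    work_per_block = (total_work + num_blocks - 1) // num_blocks
    block_begin = block_id * work_per_block
    block_end = min(block_begin + work_per_block, total_work)
    if block_begin >= block_end:
        return []  # out-of-range block: nothing to do

    # Step 1: binary search for the last key whose exclusive prefix
    # is <= block_begin (this also skips keys with zero work).
    key = bisect_right(cumulative, block_begin) - 1

    slices = []
    # Step 2: walk keys until the block's work budget is consumed.
    while key < len(cumulative) and cumulative[key] < block_end:
        last = key + 1 == len(cumulative)
        key_end = total_work if last else cumulative[key + 1]
        lo = max(block_begin, cumulative[key]) - cumulative[key]
        hi = min(block_end, key_end) - cumulative[key]
        if lo < hi:
            slices.append((key, lo, hi))
        key += 1
    return slices


# Exclusive prefix sum of the per-key work [6, 1, 20, 2]; total is 29.
cumulative = [0, 6, 7, 27]
for b in range(4):
    print(b, keys_for_block(b, 4, cumulative, 29))
```

With these sample values the heavy key 2 (20 units of work) is split across all four blocks, which is the balancing effect the strategy aims for.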

Used for skewed root-key workloads, where uniform warp-strided dispatch would leave warps idle. The host produces three runtime values:

  • bg_work_per_key[]: filled by kernel_bg_histogram (per-key work estimate = product of root-source degrees)

  • bg_cumulative_work[]: exclusive prefix-sum over the histogram

  • bg_total_work: sum of all per-key work
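The relationship between these three values can be shown with a small host-side sketch. It is not kernel_bg_histogram itself: the helper name build_bg_arrays and the input layout (one tuple of root-source degrees per key) are assumptions made for illustration.

```python
from itertools import accumulate
from math import prod


def build_bg_arrays(degrees_per_key):
    """degrees_per_key[k] holds the degree of each root source at key k."""
    # Per-key work estimate: product of the root-source degrees.
    work_per_key = [prod(degs) for degs in degrees_per_key]
    # Exclusive prefix sum: entry k is the total work preceding key k.
    cumulative = list(accumulate(work_per_key, initial=0))[:-1]
    # Total work across all keys.
    total = sum(work_per_key)
    return work_per_key, cumulative, total


# Four keys, two root sources each (hypothetical degrees).
work, cum, total = build_bg_arrays([(3, 2), (1, 1), (4, 5), (2, 1)])
print(work, cum, total)
```

Because the prefix sum is exclusive, cum[k] is the offset in [0, total) where key k's work begins, which is what lets a block locate its starting key by binary search.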

This module owns the block-group dialect ops:

  • BgRootCjMulti: the BG dispatch shape for root multi-source ColumnJoin (count/materialize/fused kernel bodies). Lifted (N4.1) from the legacy jit_root_column_join_block_group. Bundles the work-assignment preamble, binary-search key loop, per-source handle narrowing, warp-row redistribution, optional D2L segment loops, and the wrapped body into one IR op.

  • BgSourceSpec: per-source descriptor consumed by BgRootCjMulti.

CUDA emission lives in codegen/cuda/render/parallel_data.py — both _render_bg_root_cj_multi (BgRootCjMulti’s renderer) and emit_bg_histogram_kernel (the standalone histogram template called by the runner). Per docs/stage3a_execution_plan.md §7 task S3A.9b, the dialect file holds only data; rendering is the codegen’s job.

Module Contents

Classes

BgRootCjMulti

Block-group root multi-source ColumnJoin (count/materialize body).

BgSourceSpec

Per-source descriptor for BgRootCjMulti.

Data

__all__

API

class srdatalog.ir.dialects.parallel.data.block_group.BgRootCjMulti

Bases: srdatalog.ir.core.Op

Block-group root multi-source ColumnJoin (count/materialize body).

Lifts legacy jit_root_column_join_block_group into a single dialect op. Emits the full BG scaffolding around body:

  • block-level work assignment preamble (work_per_block, block_begin/end, return-if-out-of-range — with thread_counts[thread_id] = 0; in count mode);

  • binary search for the starting key index;

  • per-key loop with key-range checks;

  • per-source handle narrowing, with the first source using the key_idx hint; multi-view (D2L FULL_VER) non-first sources defer their handle bind to a _bg_seg_<idx> segment loop;

  • warp-row redistribution, narrowing the first source’s handle in proportion to its degree;

  • segment loops (when present) wrapping the body;

  • body emit;

  • segment-loop close braces, bg_remaining_begin = ...;, key-loop close brace.
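The warp-row redistribution step can be pictured as splitting the first source's rows for the current key among the warps of a block. The sketch below shows one plausible even split; the emitted arithmetic is not shown on this page, so the function name warp_row_range and the floor-division formula are assumptions, not the renderer's actual code.

```python
def warp_row_range(degree, warp_id, warps_per_block):
    """Rows [begin, end) of the first source's handle given to one warp.

    Assumption: rows are split as evenly as possible, so each warp's
    share scales with the first source's degree for this key.
    """
    begin = degree * warp_id // warps_per_block
    end = degree * (warp_id + 1) // warps_per_block
    return begin, end


# A key whose first source has 10 rows, shared by 4 warps.
ranges = [warp_row_range(10, w, 4) for w in range(4)]
print(ranges)
```

The ranges are contiguous and cover every row exactly once, so no warp sits idle as long as the degree is at least the number of warps.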

var_name is the sanitized inner-bind name (auto <var> = root_val_<n>;). is_counting toggles the count-phase thread_counts[...] = 0; early-exit branch. key_idx_var, root_val_var, hint_lo, hint_hi are the counter-allocated names.

sources is in pipeline order: sources[0] is the first (hint) source; each sources[i>0] is either narrowed directly (single-view) or deferred to a segment loop (multi-view).

body: srdatalog.ir.core.Op

hint_hi: str

hint_lo: str

is_counting: bool

key_idx_var: str

root_val_var: str

sources: tuple[srdatalog.ir.dialects.parallel.data.block_group.BgSourceSpec, ...]

var_name: str

class srdatalog.ir.dialects.parallel.data.block_group.BgSourceSpec

Per-source descriptor for BgRootCjMulti.

Carries the legacy state that jit_root_column_join_block_group threads through its emit: rel_name + view/handle var names, multi-view view_count for D2L segment loops, base view-slot for per-segment view rebinding, and the index_type passed to legacy helpers (gen_root_handle, gen_valid).

base_slot: int

handle_var: str

index_type: str

rel_name: str

view_count: int

view_var: str
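To show how the two classes fit together without importing srdatalog, the sketch below mirrors the documented fields in stand-in dataclasses. The stand-in class names, the field order, all sample values, and the reading of "multi-view" as view_count > 1 are assumptions; the real classes derive from srdatalog.ir.core.Op and may be constructed differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BgSourceSpecSketch:
    """Stand-in mirroring the documented BgSourceSpec fields."""
    rel_name: str
    view_var: str
    handle_var: str
    view_count: int
    base_slot: int
    index_type: str


@dataclass(frozen=True)
class BgRootCjMultiSketch:
    """Stand-in mirroring the documented BgRootCjMulti fields."""
    var_name: str
    sources: tuple
    body: Any
    is_counting: bool
    key_idx_var: str
    root_val_var: str
    hint_lo: str
    hint_hi: str


def deferred_sources(op):
    """Indices of non-first sources deferred to a segment loop.

    Assumption: a source is multi-view when view_count > 1.
    """
    return [i for i, s in enumerate(op.sources) if i > 0 and s.view_count > 1]


# Hypothetical values throughout; sources are listed in pipeline order.
first = BgSourceSpecSketch("Edge", "view_0", "h_0", 1, 0, "idx_a")
second = BgSourceSpecSketch("Path", "view_1", "h_1", 2, 1, "idx_b")
op = BgRootCjMultiSketch(
    var_name="x", sources=(first, second), body=None, is_counting=True,
    key_idx_var="key_idx_0", root_val_var="root_val_0",
    hint_lo="hint_lo_0", hint_hi="hint_hi_0",
)
print(deferred_sources(op))
```

Here sources[0] plays the hint role and the second source, having two views, is the only one that would be bound inside a segment loop.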

srdatalog.ir.dialects.parallel.data.block_group.__all__

['BgRootCjMulti', 'BgSourceSpec']