srdatalog.ir.dialects.parallel.data.block_group
par.data.block_group — block-group work-balanced parallelism strategy.
The block-group strategy assigns each CUDA block a contiguous slice of
the total flat work-space [0, bg_total_work). Per-block:
1. Binary-search bg_cumulative_work[] for the starting key.
2. Iterate keys from there until the block’s work budget is consumed.
3. Inside each key, redistribute work across warps proportional to the first source’s degree.
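The first two per-block steps can be modelled on the host in a few lines of Python. This is a minimal sketch, not the emitted CUDA: the function name assign_block_work, the ceil-divide choice for work_per_block, and the (key_idx, local_begin, local_end) piece format are assumptions made for illustration. It takes an exclusive prefix sum over per-key work plus the total, and reports which keys each block would visit.

```python
from bisect import bisect_right


def assign_block_work(bg_cumulative_work, bg_total_work, num_blocks):
    """Host-side model: one list of (key_idx, local_begin, local_end) per block."""
    # Ceil-divide so that num_blocks slices cover [0, bg_total_work).
    work_per_block = -(-bg_total_work // num_blocks)
    plan = []
    for block_id in range(num_blocks):
        block_begin = block_id * work_per_block
        block_end = min(block_begin + work_per_block, bg_total_work)
        pieces = []
        if block_begin < bg_total_work:  # otherwise the block has nothing to do
            # Last key whose exclusive prefix sum is <= block_begin.
            key_idx = bisect_right(bg_cumulative_work, block_begin) - 1
            pos = block_begin
            while pos < block_end:
                key_begin = bg_cumulative_work[key_idx]
                is_last = key_idx + 1 == len(bg_cumulative_work)
                key_end = bg_total_work if is_last else bg_cumulative_work[key_idx + 1]
                take_end = min(block_end, key_end)
                if take_end > pos:  # skip keys with zero work
                    pieces.append((key_idx, pos - key_begin, take_end - key_begin))
                pos = take_end
                key_idx += 1
        plan.append(pieces)
    return plan


# Four keys with work 6, 1, 20, 2 (total 29) split over four blocks.
plan = assign_block_work([0, 6, 7, 27], 29, num_blocks=4)
```

In this example the heavy key (index 2, 20 units of work) ends up shared by all four blocks, which is the behaviour the strategy is after for skewed keys.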
Used for skewed root-key workloads where uniform warp-strided dispatch would leave warps idle.
Three runtime arrays produced by the host:
- bg_work_per_key[]: filled by kernel_bg_histogram (per-key work estimate = product of root-source degrees)
- bg_cumulative_work[]: exclusive prefix-sum over the histogram
- bg_total_work: sum of all per-key work
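The relationship between the three values can be illustrated with a small Python sketch. It assumes the per-key degrees of each root source are already known on the host; build_bg_arrays and its input layout are hypothetical names for illustration only, and the real histogram is filled on the device by kernel_bg_histogram.

```python
from itertools import accumulate
from math import prod


def build_bg_arrays(degrees_per_key):
    """degrees_per_key[k] holds the degree of every root source at key k."""
    # Per-key work estimate: product of the root-source degrees.
    bg_work_per_key = [prod(degrees) for degrees in degrees_per_key]
    # Exclusive prefix sum: entry k is the flat offset where key k's work starts.
    bg_cumulative_work = list(accumulate(bg_work_per_key, initial=0))[:-1]
    # Size of the flat work-space [0, bg_total_work).
    bg_total_work = sum(bg_work_per_key)
    return bg_work_per_key, bg_cumulative_work, bg_total_work


# Four keys, two root sources each.
work, cumulative, total = build_bg_arrays([(2, 3), (1, 1), (4, 5), (1, 2)])
```

Because the prefix sum is exclusive, bg_cumulative_work[k] is the first flat work index owned by key k, which is what makes a binary search for a block's starting key possible.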
This module owns the block-group dialect ops:
- BgRootCjMulti: the BG dispatch shape for root multi-source ColumnJoin (count/materialize/fused kernel bodies). Lifted N4.1 from legacy jit_root_column_join_block_group. Bundles the work-assignment preamble, binary-search key loop, per-source handle narrowing, warp-row redistribution, optional D2L segment-loops, and the wrapped body into one IR op.
- BgSourceSpec: per-source descriptor consumed by BgRootCjMulti.
CUDA emission lives in codegen/cuda/render/parallel_data.py — both
_render_bg_root_cj_multi (BgRootCjMulti’s renderer) and
emit_bg_histogram_kernel (the standalone histogram template called
by the runner). Per docs/stage3a_execution_plan.md §7 task S3A.9b,
the dialect file holds only data; rendering is the codegen’s job.
Module Contents
Classes
BgRootCjMulti | Block-group root multi-source ColumnJoin (count/materialize body).
BgSourceSpec | Per-source descriptor for BgRootCjMulti.
Data
__all__
API
- class srdatalog.ir.dialects.parallel.data.block_group.BgRootCjMulti
Bases: srdatalog.ir.core.Op
Block-group root multi-source ColumnJoin (count/materialize body).
Lifts legacy jit_root_column_join_block_group into a single dialect op. Emits the full BG scaffolding around body:
1. block-level work assignment preamble (work_per_block, block_begin/end, return-if-out-of-range, with thread_counts[thread_id] = 0; in count mode);
2. binary search for the starting key index;
3. per-key loop with key-range checks;
4. per-source handle narrow with the first source using the key_idx hint; multi-view (D2L FULL_VER) non-first sources defer their handle bind to a _bg_seg_<idx> segment loop;
5. warp-row redistribution narrowing the first source handle in proportion to its degree;
6. segment loops (when present) wrapping the body;
7. body emit;
8. segment-loop close braces, bg_remaining_begin = ...;, key-loop close brace.
var_name is the sanitized inner-bind name (auto <var> = root_val_<n>;). is_counting toggles the count-phase thread_counts[...] = 0; early-exit branch. key_idx_var, root_val_var, hint_lo, hint_hi are the counter-allocated names. sources is in pipeline order (sources[0] = first/hint source; sources[i>0] = either single-view direct narrow or multi-view segment-loop deferred).
- body: srdatalog.ir.core.Op
None
- sources: tuple[srdatalog.ir.dialects.parallel.data.block_group.BgSourceSpec, ...]
None
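One plausible reading of the warp-row redistribution step listed for BgRootCjMulti is that each warp narrows the first source's row range for the current key to a contiguous share of it. The sketch below illustrates that reading only; split_rows_across_warps is a hypothetical helper, and the renderer in codegen/cuda/render/parallel_data.py may compute its bounds differently.

```python
def split_rows_across_warps(row_begin, row_end, num_warps):
    """Give each warp a contiguous share of the first source's rows.

    Assumption for illustration: shares are as even as integer division
    allows, so a larger degree means proportionally more rows per warp.
    """
    degree = row_end - row_begin
    return [
        (row_begin + degree * w // num_warps,
         row_begin + degree * (w + 1) // num_warps)
        for w in range(num_warps)
    ]


# A key whose first source has degree 10, handled by a block of 4 warps.
shares = split_rows_across_warps(0, 10, num_warps=4)
```

The shares are disjoint and cover the whole range, so no warp sits idle while the key still has rows left, as long as the degree is at least the number of warps.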
- class srdatalog.ir.dialects.parallel.data.block_group.BgSourceSpec
Per-source descriptor for BgRootCjMulti.
Carries the legacy state that jit_root_column_join_block_group threads through its emit: rel_name + view/handle var names, multi-view view_count for D2L segment loops, base view-slot for per-segment view rebinding, and the index_type passed to legacy helpers (gen_root_handle, gen_valid).
- srdatalog.ir.dialects.parallel.data.block_group.__all__
['BgRootCjMulti', 'BgSourceSpec']