Program and Hardware Design
Software, FPGA design, reused code, and failed attempts.
Program Details
The software preprocessing path computes the tables used by the FPGA sampler. It generates candidate parent sets, calculates local scores, converts scores into fixed-point log-space values, and emits binary/hex files for hardware initialization.
Representative preprocessing command:
uv run python preprocess_bn.py \
    --data cleaned-datasets/asia_samples.csv \
    --output-dir out_bn_tables \
    --target-type discrete \
    --max-parent-size 4 \
    --max-candidates-per-node 10 \
    --bootstrap-iters 20 \
    --min-stability-frequency 0.3 \
    --score bdeu \
    --equivalent-sample-size 1.0 \
    --fixed-point q16.16 \
    --emit-hex
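The `--fixed-point q16.16` and `--emit-hex` options imply a step that turns floating-point log scores into 32-bit words. The sketch below shows one way that step could look. It assumes signed two's-complement Q16.16, saturation at the representable limits, and one 8-digit hex word per line. The function names and sample scores are hypothetical, and the actual output format of `preprocess_bn.py` may differ.

```python
FRAC_BITS = 16                      # Q16.16: 16 integer bits, 16 fractional bits
WORD_BITS = 32
Q_MIN = -(1 << (WORD_BITS - 1))     # most negative representable raw value
Q_MAX = (1 << (WORD_BITS - 1)) - 1  # most positive representable raw value


def to_q16_16(value: float) -> int:
    """Round a float to the nearest Q16.16 raw integer, saturating at the limits."""
    raw = round(value * (1 << FRAC_BITS))
    return max(Q_MIN, min(Q_MAX, raw))


def from_q16_16(raw: int) -> float:
    """Convert a raw Q16.16 integer back to a float."""
    return raw / (1 << FRAC_BITS)


def to_hex_word(raw: int) -> str:
    """Encode a signed raw value as an 8-digit two's-complement hex word."""
    return format(raw & 0xFFFFFFFF, "08x")


# Made-up local log scores; the last one is outside the Q16.16 range.
log_scores = [-12.375, -0.5, -40000.0]
hex_lines = [to_hex_word(to_q16_16(s)) for s in log_scores]
print("\n".join(hex_lines))
```

Saturating is one possible policy for out-of-range scores. Rejecting them during preprocessing would be an equally reasonable choice, because a clipped score no longer reflects the original ranking.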
TODO: Describe the final software modules, file formats, and tricky implementation details.
Likely tricky parts:
- Keeping parent-set masks aligned with score-table entries.
- Choosing fixed-point widths that preserve scoring behavior.
- Validating hardware score output against floating-point software reference output.
- Preventing overflow or underflow in log-space accumulation.
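The first three items above can be checked in software before any table reaches the FPGA. The following sketch assumes Q16.16 scores and builds the mask table and the score table in a single pass so that row `i` of one always corresponds to row `i` of the other. It also reports saturation and worst-case quantization error. The `Entry` record, `build_tables`, and the sample values are hypothetical names and data used only for illustration.

```python
from typing import List, NamedTuple, Tuple

FRAC_BITS = 16
Q_MIN, Q_MAX = -(1 << 31), (1 << 31) - 1


class Entry(NamedTuple):
    node: int          # child node index
    parent_mask: int   # bit i set means node i is a parent
    log_score: float   # floating-point local score (reference value)


def quantize(value: float) -> Tuple[int, bool]:
    """Return (raw Q16.16 value, saturated flag)."""
    raw = round(value * (1 << FRAC_BITS))
    clipped = max(Q_MIN, min(Q_MAX, raw))
    return clipped, clipped != raw


def build_tables(entries: List[Entry]):
    """Emit both tables from one loop so their rows cannot drift apart."""
    masks, scores = [], []
    report = {"saturated": 0, "max_abs_error": 0.0}
    for e in entries:
        raw, saturated = quantize(e.log_score)
        masks.append(e.parent_mask)
        scores.append(raw)
        if saturated:
            report["saturated"] += 1
        else:
            err = abs(raw / (1 << FRAC_BITS) - e.log_score)
            report["max_abs_error"] = max(report["max_abs_error"], err)
    return masks, scores, report


entries = [
    Entry(node=0, parent_mask=0b0000, log_score=-3.2),
    Entry(node=0, parent_mask=0b0110, log_score=-2.75),
    Entry(node=1, parent_mask=0b0001, log_score=-50000.0),  # out of range
]
masks, scores, report = build_tables(entries)
print(report)
```

A nonzero `saturated` count, or a `max_abs_error` above half a least-significant bit, would indicate that the chosen width does not represent the score range faithfully.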
Hardware Details
The FPGA design stores precomputed parent-set masks and local scores in BRAM. It proposes new node orders, checks which parent sets are valid under the proposed order, accumulates scores for each node, and applies the MCMC acceptance rule.
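A floating-point software model of this loop is useful as a reference when checking hardware output. The sketch below is not the hardware implementation. It assumes parent sets are bitmasks over node indices, that the score of an order is the sum over nodes of the log-sum of the compatible local scores, and that the proposal is symmetric so the Metropolis rule applies. All function names and the small score table are hypothetical.

```python
import math
import random


def predecessor_masks(order):
    """For each node, build a bitmask of the nodes that precede it in the order."""
    masks = {}
    seen = 0
    for node in order:
        masks[node] = seen
        seen |= 1 << node
    return masks


def is_compatible(parent_mask, allowed_mask):
    """A parent set is valid when every parent precedes the child."""
    return (parent_mask & ~allowed_mask) == 0


def log_add(a, b):
    """Exact log(exp(a) + exp(b)); -inf acts as the empty accumulator."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def order_log_score(order, table):
    """table[node] lists (parent_mask, log_score) candidates for that node.

    Each node should include the empty parent set (mask 0) so that at least
    one candidate is compatible with every order.
    """
    allowed = predecessor_masks(order)
    total = 0.0
    for node in order:
        acc = -math.inf
        for parent_mask, log_score in table[node]:
            if is_compatible(parent_mask, allowed[node]):
                acc = log_add(acc, log_score)
        total += acc
    return total


def accept(current_score, proposed_score, rng):
    """Metropolis rule for a symmetric proposal, evaluated in log space."""
    u = 1.0 - rng.random()          # uniform in (0, 1], avoids log(0)
    return math.log(u) < proposed_score - current_score


# Made-up three-node table for illustration only.
table = {
    0: [(0b000, -1.0), (0b010, -0.5)],
    1: [(0b000, -2.0), (0b001, -1.0)],
    2: [(0b000, -3.0), (0b011, -1.5)],
}
current = order_log_score([2, 1, 0], table)
proposed = order_log_score([0, 1, 2], table)
print(current, proposed, accept(current, proposed, random.Random(0)))
```

In hardware the uniform draw would come from the LFSR and `log_add` would be replaced by the lookup-table approximation, so small differences from this reference are expected.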
Core hardware blocks:
| Block | Purpose |
|---|---|
| LFSR/random source | Generates random proposal choices and acceptance thresholds. |
| Order representation | Stores the current and proposed topological order. |
| Parent-set memory | Stores candidate parent masks generated by software. |
| Score memory | Stores fixed-point local scores. |
| Compatibility logic | Checks whether a parent set is valid for the proposed order. |
| Per-node score accumulators | Accumulate valid local scores in parallel. |
| Log-add LUT | Approximates log(1 + exp(x)), the correction term needed to add two scores in log space. |
| MCMC controller | Accepts or rejects proposed orders. |
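The log-add LUT relies on the identity log(exp(a) + exp(b)) = max(a, b) + log(1 + exp(-|a - b|)), so only the correction term needs a table. The sketch below models one possible fixed-point version in software. The table size (256 entries), the index shift (12 bits, so each entry covers 1/16), and midpoint sampling are assumptions chosen for illustration, not the parameters of the actual design.

```python
import math

FRAC_BITS = 16
ONE = 1 << FRAC_BITS
LUT_SIZE = 256      # assumed number of table entries
INDEX_SHIFT = 12    # assumed: each entry covers 2**12 raw units, i.e. 1/16

# The table covers differences d in [0, 16). Beyond that, log(1 + exp(-d))
# is smaller than one Q16.16 LSB and is treated as zero.
LUT = [
    round(math.log1p(math.exp(-((i + 0.5) * (1 << INDEX_SHIFT)) / ONE)) * ONE)
    for i in range(LUT_SIZE)
]


def log_add_fixed(a_raw: int, b_raw: int) -> int:
    """Approximate log(exp(a) + exp(b)) on raw Q16.16 inputs."""
    hi, lo = max(a_raw, b_raw), min(a_raw, b_raw)
    index = (hi - lo) >> INDEX_SHIFT
    correction = LUT[index] if index < LUT_SIZE else 0
    return hi + correction


def log_add_exact(a: float, b: float) -> float:
    """Floating-point reference for comparison."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


pairs = [(-1.0, -2.0), (-5.0, -5.0), (-3.0, -30.0)]
errors = [
    abs(log_add_fixed(round(a * ONE), round(b * ONE)) / ONE - log_add_exact(a, b))
    for a, b in pairs
]
print(errors)
```

With these assumed parameters the largest error is about 0.016 and occurs when the two inputs are equal. A finer table near zero would reduce it at the cost of more memory. The sketch does not saturate the final sum or represent an empty accumulator, both of which a hardware version would need to handle.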
TODO: Add enough detail that another group could rebuild the design: module names, interfaces, clocking, reset behavior, memory map, and HPS/FPGA communication.
External Design and Code References
TODO: List every external codebase, hardware design, paper implementation, dataset, or course module reused or adapted.
For each item, include source name, URL/citation, license or usage permission, what was reused, and what was modified.
Things Tried That Did Not Work
TODO: Add failed or abandoned approaches.
Candidates:
- Direct graph-space sampling versus order-space sampling.
- Floating-point hardware scoring versus fixed-point scoring.
- Different LUT ranges or fixed-point widths.
- Designs that exceeded timing, memory, or FPGA resource limits.