Benchmark Mode
Benchmark mode reproduces Sim2Real-ST evaluations. It compares reconstructed
SVCs against ground-truth SVCs and writes per-gene metrics to
metrics_normalized.csv. The main paper-facing metrics are PCC, SSIM, and
MSE; NRMSE is retained in the CSV for compatibility with earlier reports.
Use benchmark mode when you want to evaluate reconstruction behavior under a controlled confounding factor. Use Application Mode instead when you want to run REVISE on real application data without ground-truth SVCs.
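To make the per-gene metrics concrete, here is a minimal sketch of PCC, MSE, and NRMSE for a single gene, written with the standard library only. It is an illustration, not the code that produces metrics_normalized.csv: the function name `gene_metrics` is hypothetical, and normalizing RMSE by the ground-truth range is an assumption, since this page does not state which normalization the benchmark uses. SSIM is omitted because it needs the values arranged on a spatial grid.

```python
import math


def gene_metrics(recon, truth):
    """Compare one gene's reconstructed values against ground truth.

    Both arguments are equal-length sequences of floats, one value per
    cell or spot. Returns a dict with PCC, MSE, and NRMSE.
    """
    if len(recon) != len(truth) or len(truth) == 0:
        raise ValueError("recon and truth must be non-empty and equal length")

    n = len(truth)
    mse = sum((r - t) ** 2 for r, t in zip(recon, truth)) / n

    # Pearson correlation from centered sums; undefined for a constant vector.
    mean_r = sum(recon) / n
    mean_t = sum(truth) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(recon, truth))
    var_r = sum((r - mean_r) ** 2 for r in recon)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    denom = math.sqrt(var_r * var_t)
    pcc = cov / denom if denom > 0 else math.nan

    # Assumption: NRMSE = RMSE / (max - min) of the ground truth.
    value_range = max(truth) - min(truth)
    nrmse = math.sqrt(mse) / value_range if value_range > 0 else math.nan

    return {"PCC": pcc, "MSE": mse, "NRMSE": nrmse}


example = gene_metrics(recon=[0.1, 0.9, 2.0], truth=[0.0, 1.0, 2.0])
```

A constant ground-truth vector yields NaN for both PCC and NRMSE in this sketch, which is one reason to check for NaN values before averaging metrics across genes.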
At a Glance
| Need | Where it is configured |
|---|---|
| Input data | |
| Confounding family | |
| Ground truth | Resolved from the Sim2Real-ST data layout and |
| Output metrics | |
Supported Confounding Factors
| | Profile | Notes |
|---|---|---|
| | | sp-SVC benchmark over segmentation methods. |
| | | sp-SVC benchmark for bin-to-cell assignment. |
| | | sc-SVC super-resolution benchmark across batch-reference settings. |
| | | sc-SVC super-resolution benchmark across spot sizes. |
| | | sc-SVC imputation benchmark for limited gene panels. |
| | | sc-SVC imputation benchmark for dropout. |
Command-Line Usage
Run one confounding-factor family:
python benchmark_main.py \
  --cf segmentation \
  --raw_data_path raw_data/Sim2Real-ST \
  --sample_name P2CRC/cut_part1 \
  --task segmentation \
  --save_path output/benchmark
--seed_scope controls reproducibility granularity. The default value,
process, seeds once for the full script to preserve wrapper parity. The
alternative value, run, resets the seed for each individual case.
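The difference between the two seed scopes can be illustrated with a toy model. This sketch does not use REVISE code: `run_cases` is a hypothetical function, and Python's `random` module stands in for whatever random number generators the benchmark seeds.

```python
import random


def run_cases(case_names, seed, seed_scope="process"):
    """Draw one pseudo-random number per case under the given seed scope."""
    if seed_scope not in ("process", "run"):
        raise ValueError("seed_scope must be 'process' or 'run'")

    rng = random.Random(seed)  # seeded once for the whole script
    draws = {}
    for name in case_names:
        if seed_scope == "run":
            rng = random.Random(seed)  # reseeded for each individual case
        draws[name] = rng.random()
    return draws


cases = ["case_a", "case_b", "case_c"]
per_process = run_cases(cases, seed=0, seed_scope="process")
per_run = run_cases(cases, seed=0, seed_scope="run")
```

In this toy model, a case run under the process scope consumes random state left by the cases before it, so its draw depends on its position in the script. Under the run scope, every case starts from the same state and can be repeated in isolation.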
Run all supported confounding-factor families for one or more sample parts:
SAMPLE_PARTS="part1 part2 part3" bash benchmark_main.sh
benchmark_main.sh writes launcher logs under 0_records/ and, unless SAVE_PATH
is set, saves results to results_unified/benchmark_runs/<timestamp>.
By default, it runs P2CRC/cut_part1 only. Set SAMPLE_PARTS when you
need the complete part1/part2/part3 benchmark set.
The focused launchers under reproduce/benchmark/benchmark_*.sh use the same
benchmark_main.py entry point and accept the same runtime environment:
export RAW_DATA_PATH=raw_data/Sim2Real-ST
export SAMPLE_PATIENT=P2CRC
export SAMPLE_PARTS="part1 part2 part3"
export CONFIG_PATH=revise/revise.yaml
export SAVE_PATH=results_unified/benchmark_runs/sim2real_all
bash benchmark_main.sh
To run the six focused launchers instead, keep the same environment and call the task-specific scripts:
bash reproduce/benchmark/benchmark_segmentation.sh
bash reproduce/benchmark/benchmark_bin2cell.sh
bash reproduce/benchmark/benchmark_batch_effect.sh
bash reproduce/benchmark/benchmark_spot_size.sh
bash reproduce/benchmark/benchmark_gene_panel.sh
bash reproduce/benchmark/benchmark_gene_dropout.sh
With the same SAVE_PATH and sample settings, the six focused launchers
produce the same metrics tree as benchmark_main.sh. The difference is only
orchestration: benchmark_main.sh starts every supported confounding family,
while each focused launcher starts one family.
Benchmark outputs are organized by task, sample, route, and case leaf. Each
completed case writes provenance files and, when evaluation is enabled,
metrics_normalized.csv.
<save_path>/<task>/<sample_name>/<route>/<case>/
├── merged_config.json
├── provenance.json
├── metrics_normalized.csv
└── *.h5ad
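Given the layout above, the per-case metrics files can be gathered into one list of rows for comparison. The following is a minimal sketch using only the standard library. The function name `collect_metrics` is hypothetical, the columns inside metrics_normalized.csv are passed through untouched because they are not listed here, and the sketch assumes that the route and case are each a single directory level while the sample name may span several (as in P2CRC/cut_part1).

```python
import csv
from pathlib import Path


def collect_metrics(save_path):
    """Collect rows from every metrics_normalized.csv under save_path.

    Each returned row keeps the CSV's own columns and adds task,
    sample_name, route, and case, parsed from the directory layout.
    """
    root = Path(save_path)
    rows = []
    for csv_path in sorted(root.rglob("metrics_normalized.csv")):
        parts = csv_path.parent.relative_to(root).parts
        if len(parts) < 4:
            continue  # not a <task>/<sample_name>/<route>/<case> leaf

        # Assumption: first part is the task, last two are route and case,
        # and everything in between is the (possibly nested) sample name.
        task, route, case = parts[0], parts[-2], parts[-1]
        sample_name = "/".join(parts[1:-2])

        with csv_path.open(newline="") as handle:
            for record in csv.DictReader(handle):
                rows.append({
                    **record,
                    "task": task,
                    "sample_name": sample_name,
                    "route": route,
                    "case": case,
                })
    return rows
```

Cases that ran without evaluation have no metrics_normalized.csv and are therefore absent from the result.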
Before comparing cases, inspect merged_config.json and provenance.json
to confirm the route, seed, input fingerprints, and stage trace.
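One way to do this check is to diff the merged configuration of two case directories. The sketch below assumes only that merged_config.json is valid JSON; it does not assume any particular keys. The names `flatten`, `config_diff`, and `MISSING` are hypothetical helpers, not part of REVISE.

```python
import json
from pathlib import Path

MISSING = "<missing>"  # marks a key that is absent from one of the two files


def flatten(value, prefix=""):
    """Flatten nested dicts into {"a.b.c": leaf_value} pairs."""
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
        return flat
    return {prefix.rstrip("."): value}


def config_diff(case_a, case_b, filename="merged_config.json"):
    """Return {dotted_key: (value_in_a, value_in_b)} for differing keys."""
    a = flatten(json.loads((Path(case_a) / filename).read_text()))
    b = flatten(json.loads((Path(case_b) / filename).read_text()))
    diff = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, MISSING), b.get(key, MISSING)
        if left != right:
            diff[key] = (left, right)
    return diff
```

Passing `filename="provenance.json"` applies the same comparison to the provenance file. An empty result means the two cases were configured identically, so any difference in their metrics comes from elsewhere, such as the seed scope or the input data.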
Python API
from revise.framework import REVISEPipeline

pipeline = REVISEPipeline()
svc = pipeline.run(
    profile="benchmark_seg",
    runtime_overrides={"platform": "sim2real", "confounding": "segmentation"},
    io_overrides={
        "data_root": "raw_data/Sim2Real-ST/segmentation",
        "output_root": "output/benchmark",
        "sample_name": "P2CRC/cut_part1",
        "st_file": "xenium_spot.h5ad",
        "gt_svc_file": "selected_xenium.h5ad",
        "sc_ref_file": "real_sc_ref.h5ad",
        "seg_method": "seg_1",
    },
)