Benchmark Mode

Benchmark mode reproduces the Sim2Real-ST evaluations. It compares reconstructed SVCs against ground-truth SVCs and writes per-gene metrics to metrics_normalized.csv. The metrics reported in the paper are PCC, SSIM, and MSE; NRMSE is also kept in the CSV for compatibility with earlier reports.
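To make the per-gene comparison concrete, here is a minimal plain-Python sketch of three of these metrics. It is not REVISE's evaluation code. It assumes the reconstructed and ground-truth values for one gene are aligned lists over the same cells or spots, and it divides RMSE by the ground-truth range, which is only one common NRMSE convention. SSIM is left out because it depends on how values are laid out spatially.

```python
import math
from statistics import fmean


def gene_metrics(pred: list[float], truth: list[float]) -> dict[str, float]:
    """Compare one gene's reconstructed values with its ground truth.

    Hypothetical helper. Assumes both lists are aligned (same cells/spots,
    same order). NRMSE divides RMSE by the ground-truth range; this is an
    assumed convention, not necessarily the one in metrics_normalized.csv.
    """
    if len(pred) != len(truth) or len(pred) < 2:
        raise ValueError("need two aligned vectors with at least 2 entries")
    mean_p, mean_t = fmean(pred), fmean(truth)
    dev_p = [p - mean_p for p in pred]
    dev_t = [t - mean_t for t in truth]
    cov = sum(a * b for a, b in zip(dev_p, dev_t))
    denom = math.sqrt(sum(a * a for a in dev_p) * sum(b * b for b in dev_t))
    # PCC is undefined when either vector is constant; report NaN then.
    pcc = cov / denom if denom > 0 else float("nan")
    mse = fmean((p - t) ** 2 for p, t in zip(pred, truth))
    value_range = max(truth) - min(truth)
    nrmse = math.sqrt(mse) / value_range if value_range > 0 else float("nan")
    return {"PCC": pcc, "MSE": mse, "NRMSE": nrmse}
```

For example, gene_metrics([1, 2, 3], [1, 2, 3]) gives PCC 1.0, MSE 0.0, and NRMSE 0.0.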

Use benchmark mode when you want to evaluate reconstruction behavior under a controlled confounding factor. Use Application Mode instead when you want to run REVISE on real application data without ground-truth SVCs.

At a Glance

Need	Where it is configured
Input data	`--raw_data_path` and `--sample_name` in `benchmark_main.py`.
Confounding family	`--cf` maps to one of the benchmark profiles in `revise/revise.yaml`.
Ground truth	Resolved from the Sim2Real-ST data layout and `io.gt_svc_file`.
Output metrics	`metrics_normalized.csv` under the resolved route/case directory.

Supported Confounding Factors

`--cf`	Profile	Notes
`segmentation`	`benchmark_seg`	sp-SVC benchmark over segmentation methods `seg_1` to `seg_4`.
`bin2cell`	`benchmark_bin2cell`	sp-SVC benchmark for bin-to-cell assignment.
`batch_effect`	`benchmark_sr_batch`	sc-SVC super-resolution benchmark across batch-reference settings.
`spot_size`	`benchmark_sr_spot_size`	sc-SVC super-resolution benchmark across spot sizes.
`gene_panel`	`benchmark_impute_panel`	sc-SVC imputation benchmark for limited gene panels.
`gene_dropout`	`benchmark_impute_dropout`	sc-SVC imputation benchmark for dropout.
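When scripting several runs, it can help to keep the table above as a lookup. The sketch below copies the mapping verbatim; the names CF_TO_PROFILE and profile_for are hypothetical and do not claim to mirror how benchmark_main.py stores it internally.

```python
# Mapping copied from the table above. The names are hypothetical helpers,
# not part of benchmark_main.py.
CF_TO_PROFILE = {
    "segmentation": "benchmark_seg",
    "bin2cell": "benchmark_bin2cell",
    "batch_effect": "benchmark_sr_batch",
    "spot_size": "benchmark_sr_spot_size",
    "gene_panel": "benchmark_impute_panel",
    "gene_dropout": "benchmark_impute_dropout",
}


def profile_for(cf: str) -> str:
    """Return the revise/revise.yaml profile name for a --cf value."""
    try:
        return CF_TO_PROFILE[cf]
    except KeyError:
        supported = ", ".join(sorted(CF_TO_PROFILE))
        raise ValueError(
            f"unsupported --cf {cf!r}; expected one of: {supported}"
        ) from None
```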

Command-Line Usage

Run one confounding-factor family:

```bash
python benchmark_main.py \
  --cf segmentation \
  --raw_data_path raw_data/Sim2Real-ST \
  --sample_name P2CRC/cut_part1 \
  --task segmentation \
  --save_path output/benchmark
```

`--seed_scope` controls reproducibility granularity. With the default value, `process`, the seed is set once for the whole script to preserve wrapper parity; with `run`, the seed is reset before each individual case.
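The difference between the two seed scopes can be illustrated with a toy model in which each case draws one random number. The run_cases helper is hypothetical and only mimics the seeding policy, not what benchmark_main.py does inside a case.

```python
import random


def run_cases(cases: list[str], seed: int, seed_scope: str = "process") -> dict[str, float]:
    """Toy model of the two seeding policies (hypothetical helper).

    Each case draws one number, standing in for any stochastic step.
    """
    if seed_scope not in ("process", "run"):
        raise ValueError("seed_scope must be 'process' or 'run'")
    rng = random.Random(seed)  # "process": one RNG seeded once for all cases
    results = {}
    for case in cases:
        if seed_scope == "run":
            rng = random.Random(seed)  # "run": fresh seed before every case
        results[case] = rng.random()
    return results
```

Under run, a case's draws do not depend on which cases ran before it; under process they do, so reordering or skipping cases changes the values seen by later cases.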

Run all supported confounding-factor families for one or more sample parts:

```bash
SAMPLE_PARTS="part1 part2 part3" bash benchmark_main.sh
```

benchmark_main.sh writes launcher logs under 0_records/ and writes results to results_unified/benchmark_runs/<timestamp> unless SAVE_PATH is set. By default it runs only P2CRC/cut_part1; set SAMPLE_PARTS to run the complete part1/part2/part3 benchmark set.
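The launcher defaults described above can be restated as a small function, which may help when locating outputs in downstream analysis. This is a hypothetical sketch: resolve_launch_settings is not part of the repository, it assumes sample names follow the <patient>/cut_<part> pattern seen in P2CRC/cut_part1, and its timestamp format is illustrative and may not match the launcher's.

```python
import os
from datetime import datetime


def resolve_launch_settings(env=None):
    """Restate the launcher defaults (hypothetical helper).

    Assumes sample names are <SAMPLE_PATIENT>/cut_<part> and that the
    defaults are P2CRC and part1; the timestamp format is illustrative.
    """
    env = os.environ if env is None else env
    patient = env.get("SAMPLE_PATIENT", "P2CRC")
    parts = env.get("SAMPLE_PARTS", "part1").split()
    samples = [f"{patient}/cut_{part}" for part in parts]
    save_path = env.get("SAVE_PATH") or os.path.join(
        "results_unified",
        "benchmark_runs",
        datetime.now().strftime("%Y%m%d_%H%M%S"),
    )
    return samples, save_path
```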

benchmark_main.sh and the focused launchers under reproduce/benchmark/benchmark_*.sh share the same benchmark_main.py entry point and read the same runtime environment. For example, to run every family on all three sample parts:

```bash
export RAW_DATA_PATH=raw_data/Sim2Real-ST
export SAMPLE_PATIENT=P2CRC
export SAMPLE_PARTS="part1 part2 part3"
export CONFIG_PATH=revise/revise.yaml
export SAVE_PATH=results_unified/benchmark_runs/sim2real_all

bash benchmark_main.sh
```

To run the six focused launchers instead, keep the same environment and call the task-specific scripts:

```bash
bash reproduce/benchmark/benchmark_segmentation.sh
bash reproduce/benchmark/benchmark_bin2cell.sh
bash reproduce/benchmark/benchmark_batch_effect.sh
bash reproduce/benchmark/benchmark_spot_size.sh
bash reproduce/benchmark/benchmark_gene_panel.sh
bash reproduce/benchmark/benchmark_gene_dropout.sh
```
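If you prefer to drive the focused launchers from Python, a sequential loop with a shared environment is enough. The helper below is a hypothetical sketch that assumes it runs from the repository root; it stops at the first launcher that exits with a non-zero status, and dry_run=True only returns the commands without running them.

```python
import os
import subprocess

# Script paths copied from the commands above.
LAUNCHERS = [
    "reproduce/benchmark/benchmark_segmentation.sh",
    "reproduce/benchmark/benchmark_bin2cell.sh",
    "reproduce/benchmark/benchmark_batch_effect.sh",
    "reproduce/benchmark/benchmark_spot_size.sh",
    "reproduce/benchmark/benchmark_gene_panel.sh",
    "reproduce/benchmark/benchmark_gene_dropout.sh",
]


def run_focused_launchers(env_overrides, dry_run=False):
    """Run each focused launcher in order (hypothetical helper).

    env_overrides (e.g. SAVE_PATH, SAMPLE_PARTS) is layered over the
    current environment; check=True raises on the first failing script.
    """
    env = {**os.environ, **env_overrides}
    commands = [["bash", script] for script in LAUNCHERS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, env=env, check=True)
    return commands
```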

With the same SAVE_PATH and sample settings, the six focused launchers produce the same metrics tree as benchmark_main.sh. The difference is only orchestration: benchmark_main.sh starts every supported confounding family, while each focused launcher starts one family.
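To confirm that two runs produced the same metrics, you can compare their output trees. The function below is a hypothetical helper that matches metrics_normalized.csv files by relative path and compares them byte for byte. That check is strict: a harmless change in float formatting would count as a difference.

```python
from pathlib import Path


def compare_metrics_trees(root_a, root_b, name="metrics_normalized.csv"):
    """Compare the metrics files of two output trees (hypothetical helper).

    Returns relative paths found in only one tree and paths whose bytes differ.
    """
    root_a, root_b = Path(root_a), Path(root_b)
    files_a = {p.relative_to(root_a) for p in root_a.rglob(name)}
    files_b = {p.relative_to(root_b) for p in root_b.rglob(name)}
    differing = sorted(
        rel
        for rel in files_a & files_b
        if (root_a / rel).read_bytes() != (root_b / rel).read_bytes()
    )
    return {
        "only_in_a": sorted(files_a - files_b),
        "only_in_b": sorted(files_b - files_a),
        "differing": differing,
    }
```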

Benchmark outputs are organized by task, sample, route, and case leaf. Each completed case writes provenance files and, when evaluation is enabled, metrics_normalized.csv.

```text
<save_path>/<task>/<sample_name>/<route>/<case>/
├── merged_config.json
├── provenance.json
├── metrics_normalized.csv
└── *.h5ad
```
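To collect results across cases, you can walk this layout and recover each path component. This hypothetical sketch parses paths from the right so that sample names containing a slash, such as P2CRC/cut_part1, stay intact; it assumes <route> and <case> are each a single directory level.

```python
from pathlib import Path


def find_case_metrics(save_path):
    """List task/sample/route/case for every metrics file (hypothetical helper).

    Assumes <save_path>/<task>/<sample_name>/<route>/<case>/metrics_normalized.csv
    with single-level <route> and <case>; <sample_name> may span several levels.
    """
    root = Path(save_path)
    records = []
    for path in sorted(root.rglob("metrics_normalized.csv")):
        parts = path.relative_to(root).parts
        if len(parts) < 5:
            continue  # too shallow to match the documented layout
        records.append({
            "task": parts[0],
            "sample": "/".join(parts[1:-3]),
            "route": parts[-3],
            "case": parts[-2],
            "path": path,
        })
    return records
```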

Before comparing cases, inspect merged_config.json and provenance.json to confirm the route, seed, input fingerprints, and stage trace.
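A quick way to see how two cases differ is to diff the top-level fields of their merged_config.json or provenance.json files. The sketch below does not assume any particular key names, since the schema is not documented here; diff_top_level is a hypothetical helper and assumes each file holds a JSON object.

```python
import json
from pathlib import Path

_MISSING = object()  # sentinel so a missing key never equals a real value


def diff_top_level(json_a, json_b):
    """Return sorted top-level keys whose values differ (hypothetical helper)."""
    a = json.loads(Path(json_a).read_text())
    b = json.loads(Path(json_b).read_text())
    keys = set(a) | set(b)
    return sorted(k for k in keys if a.get(k, _MISSING) != b.get(k, _MISSING))
```

Keys that differ, such as a seed or an input fingerprint, point to what to check before treating two cases as comparable.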

Python API

The same segmentation case can be run from Python:

```python
from revise.framework import REVISEPipeline

pipeline = REVISEPipeline()
# Run the benchmark_seg profile on one Sim2Real-ST sample,
# evaluating the seg_1 segmentation method against ground truth.
svc = pipeline.run(
    profile="benchmark_seg",
    runtime_overrides={"platform": "sim2real", "confounding": "segmentation"},
    io_overrides={
        "data_root": "raw_data/Sim2Real-ST/segmentation",
        "output_root": "output/benchmark",
        "sample_name": "P2CRC/cut_part1",
        "st_file": "xenium_spot.h5ad",
        "gt_svc_file": "selected_xenium.h5ad",
        "sc_ref_file": "real_sc_ref.h5ad",
        "seg_method": "seg_1",
    },
)
```
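To evaluate all four segmentation methods, the call above can be repeated with only seg_method changed. This is a hypothetical sketch: run_segmentation_sweep is not part of REVISE, and it assumes the other file names stay the same across seg_1 to seg_4, which may not hold for your data layout. It takes the pipeline as an argument, so any object with a compatible run method works.

```python
SEG_METHODS = ["seg_1", "seg_2", "seg_3", "seg_4"]


def run_segmentation_sweep(pipeline, sample_name="P2CRC/cut_part1"):
    """Call pipeline.run once per segmentation method (hypothetical helper).

    Reuses the arguments from the example above, varying only
    io_overrides["seg_method"]; returns {seg_method: result}.
    """
    base_io = {
        "data_root": "raw_data/Sim2Real-ST/segmentation",
        "output_root": "output/benchmark",
        "sample_name": sample_name,
        "st_file": "xenium_spot.h5ad",  # assumed identical for every method
        "gt_svc_file": "selected_xenium.h5ad",
        "sc_ref_file": "real_sc_ref.h5ad",
    }
    results = {}
    for method in SEG_METHODS:
        results[method] = pipeline.run(
            profile="benchmark_seg",
            runtime_overrides={"platform": "sim2real", "confounding": "segmentation"},
            io_overrides={**base_io, "seg_method": method},
        )
    return results
```

For example, run_segmentation_sweep(REVISEPipeline()) returns a dict keyed by seg_1 to seg_4.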
