Benchmark Mode

Benchmark mode reproduces the Sim2Real-ST evaluations. It compares reconstructed SVCs against ground-truth SVCs and writes per-gene metrics to metrics_normalized.csv. The main paper-facing metrics are the Pearson correlation coefficient (PCC), the structural similarity index (SSIM), and the mean squared error (MSE); the normalized root-mean-square error (NRMSE) is retained in the CSV for compatibility with earlier reports.
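To make the per-gene metrics concrete, here is a minimal sketch of how PCC, MSE, and NRMSE could be computed for a single gene. It is not the REVISE implementation: the function name gene_metrics is hypothetical, and normalizing NRMSE by the ground-truth value range is an assumption (other conventions divide by the mean or the standard deviation). SSIM is left out because it depends on how expression is rasterized into a 2-D image, which this page does not describe.

```python
import math


def gene_metrics(recon, truth):
    """Illustrative PCC, MSE, and NRMSE for one gene (hypothetical helper).

    `recon` and `truth` are equal-length sequences of expression values,
    one entry per cell or spot.
    """
    if len(recon) != len(truth) or len(truth) == 0:
        raise ValueError("recon and truth must be non-empty and equally long")

    n = len(truth)
    mean_r = sum(recon) / n
    mean_t = sum(truth) / n

    # Pearson correlation: covariance divided by the product of the spreads.
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(recon, truth))
    var_r = sum((r - mean_r) ** 2 for r in recon)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    denom = math.sqrt(var_r * var_t)
    # A constant vector has no defined correlation; report NaN in that case.
    pcc = cov / denom if denom > 0 else float("nan")

    mse = sum((r - t) ** 2 for r, t in zip(recon, truth)) / n

    # ASSUMPTION: NRMSE is RMSE divided by the ground-truth value range.
    value_range = max(truth) - min(truth)
    nrmse = math.sqrt(mse) / value_range if value_range > 0 else float("nan")

    return {"PCC": pcc, "MSE": mse, "NRMSE": nrmse}
```

A reconstruction that is offset from the ground truth by a constant keeps a PCC of 1 while MSE and NRMSE grow, which is one reason to read the three metrics together.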

Use benchmark mode when you want to evaluate reconstruction behavior under a controlled confounding factor. Use Application Mode instead when you want to run REVISE on real application data without ground-truth SVCs.

At a Glance

Need                Where it is configured
------------------  -----------------------------------------------------------------
Input data          --raw_data_path and --sample_name in benchmark_main.py.
Confounding family  --cf maps to one of the benchmark profiles in revise/revise.yaml.
Ground truth        Resolved from the Sim2Real-ST data layout and io.gt_svc_file.
Output metrics      metrics_normalized.csv under the resolved route/case directory.

Supported Confounding Factors

--cf          Profile                   Notes
------------  ------------------------  ------------------------------------------------------------------
segmentation  benchmark_seg             sp-SVC benchmark over segmentation methods seg_1 to seg_4.
bin2cell      benchmark_bin2cell        sp-SVC benchmark for bin-to-cell assignment.
batch_effect  benchmark_sr_batch        sc-SVC super-resolution benchmark across batch-reference settings.
spot_size     benchmark_sr_spot_size    sc-SVC super-resolution benchmark across spot sizes.
gene_panel    benchmark_impute_panel    sc-SVC imputation benchmark for limited gene panels.
gene_dropout  benchmark_impute_dropout  sc-SVC imputation benchmark for dropout.
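The mapping from --cf values to profile names listed above can be written as a small lookup, which is handy when scripting around the benchmark. This is only a sketch: CF_TO_PROFILE and profile_for are hypothetical names, and the authoritative mapping lives in benchmark_main.py and revise/revise.yaml.

```python
# Hypothetical lookup that mirrors the --cf / profile pairs documented above.
CF_TO_PROFILE = {
    "segmentation": "benchmark_seg",
    "bin2cell": "benchmark_bin2cell",
    "batch_effect": "benchmark_sr_batch",
    "spot_size": "benchmark_sr_spot_size",
    "gene_panel": "benchmark_impute_panel",
    "gene_dropout": "benchmark_impute_dropout",
}


def profile_for(cf):
    """Return the documented profile name for a --cf value."""
    try:
        return CF_TO_PROFILE[cf]
    except KeyError:
        supported = ", ".join(sorted(CF_TO_PROFILE))
        raise ValueError(
            f"unsupported --cf value {cf!r}; expected one of: {supported}"
        ) from None
```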

Command-Line Usage

Run one confounding-factor family:

python benchmark_main.py \
  --cf segmentation \
  --raw_data_path raw_data/Sim2Real-ST \
  --sample_name P2CRC/cut_part1 \
  --task segmentation \
  --save_path output/benchmark

--seed_scope controls reproducibility granularity. The default value, process, seeds once for the full script to preserve wrapper parity; the alternative, run, resets the seed before each individual case.
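The difference between the two seed scopes can be illustrated with a toy example. This sketch assumes nothing about how benchmark_main.py seeds its libraries; run_cases is a hypothetical function that draws one random number per case from Python's standard random module.

```python
import random


def run_cases(cases, seed, seed_scope="process"):
    """Toy model of the two seed scopes (hypothetical, not REVISE code).

    Returns one pseudo-random draw per case so the effect of the scope
    on reproducibility is visible.
    """
    if seed_scope not in ("process", "run"):
        raise ValueError("seed_scope must be 'process' or 'run'")

    # "process": a single generator is seeded once and shared by all cases.
    rng = random.Random(seed)
    draws = {}
    for case in cases:
        if seed_scope == "run":
            # "run": every case starts from a freshly seeded generator.
            rng = random.Random(seed)
        draws[case] = rng.random()
    return draws
```

In this toy model, the value drawn for a case under the process scope depends on which cases ran before it, whereas under the run scope it depends only on the seed.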

Run all supported confounding-factor families for one or more sample parts:

SAMPLE_PARTS="part1 part2 part3" bash benchmark_main.sh

benchmark_main.sh writes launcher logs under 0_records/ and, unless SAVE_PATH is set, writes results to results_unified/benchmark_runs/<timestamp>. By default, it runs P2CRC/cut_part1 only. Set SAMPLE_PARTS when you need the complete part1/part2/part3 benchmark set.

The focused launchers under reproduce/benchmark/benchmark_*.sh use the same benchmark_main.py entry point and read the same environment variables as benchmark_main.sh. For example, to run every family with explicit settings:

export RAW_DATA_PATH=raw_data/Sim2Real-ST
export SAMPLE_PATIENT=P2CRC
export SAMPLE_PARTS="part1 part2 part3"
export CONFIG_PATH=revise/revise.yaml
export SAVE_PATH=results_unified/benchmark_runs/sim2real_all

bash benchmark_main.sh

To run the six focused launchers instead, keep the same environment and call the task-specific scripts:

bash reproduce/benchmark/benchmark_segmentation.sh
bash reproduce/benchmark/benchmark_bin2cell.sh
bash reproduce/benchmark/benchmark_batch_effect.sh
bash reproduce/benchmark/benchmark_spot_size.sh
bash reproduce/benchmark/benchmark_gene_panel.sh
bash reproduce/benchmark/benchmark_gene_dropout.sh

With the same SAVE_PATH and sample settings, the six focused launchers produce the same metrics tree as benchmark_main.sh. The difference is only orchestration: benchmark_main.sh starts every supported confounding family, while each focused launcher starts one family.
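If you prefer to drive the focused launchers from Python, the following sketch builds the six commands from the --cf names and runs them with a shared environment. The names launcher_commands and run_all are hypothetical, and the script paths simply follow the benchmark_<cf>.sh pattern listed above.

```python
import os
import subprocess

# The six --cf values documented on this page.
CONFOUNDING_FAMILIES = (
    "segmentation",
    "bin2cell",
    "batch_effect",
    "spot_size",
    "gene_panel",
    "gene_dropout",
)


def launcher_commands(families=CONFOUNDING_FAMILIES):
    """Build one `bash <script>` command per confounding family."""
    return [
        ["bash", f"reproduce/benchmark/benchmark_{cf}.sh"] for cf in families
    ]


def run_all(env_overrides):
    """Run every focused launcher with the same environment (sketch).

    `env_overrides` is a dict such as {"SAVE_PATH": "...", "SAMPLE_PARTS": "..."}.
    Must be called from the repository root so the relative paths resolve.
    """
    env = {**os.environ, **env_overrides}
    for command in launcher_commands():
        # check=True stops at the first launcher that exits with an error.
        subprocess.run(command, env=env, check=True)
```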

Benchmark outputs are organized by task, sample, route, and case leaf. Each completed case writes provenance files and, when evaluation is enabled, metrics_normalized.csv.

<save_path>/<task>/<sample_name>/<route>/<case>/
├── merged_config.json
├── provenance.json
├── metrics_normalized.csv
└── *.h5ad

Before comparing cases, inspect merged_config.json and provenance.json to confirm the route, seed, input fingerprints, and stage trace.
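As a starting point for that inspection, the sketch below walks an output tree and recovers the task, sample, route, and case from each directory that contains metrics_normalized.csv. It assumes the layout shown above and that the case leaf and route are always the last two path components, so a sample name such as P2CRC/cut_part1 may span several directories. The function names are hypothetical, and because the keys inside provenance.json are not documented here, the file is loaded as a plain dictionary without interpreting it.

```python
import json
from pathlib import Path


def find_cases(save_path):
    """List completed cases under <save_path> (hypothetical helper)."""
    root = Path(save_path)
    cases = []
    for metrics in sorted(root.rglob("metrics_normalized.csv")):
        parts = metrics.parent.relative_to(root).parts
        if len(parts) < 4:
            # Not deep enough to hold task/sample/route/case; skip it.
            continue
        cases.append(
            {
                "task": parts[0],
                # The sample name may contain slashes, e.g. P2CRC/cut_part1.
                "sample_name": "/".join(parts[1:-2]),
                "route": parts[-2],
                "case": parts[-1],
                "metrics": metrics,
                "provenance": metrics.parent / "provenance.json",
            }
        )
    return cases


def load_provenance(case):
    """Return the parsed provenance.json for a case, or None if it is absent."""
    path = case["provenance"]
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Printing the loaded dictionaries side by side for two cases is usually enough to spot a mismatched seed or route before comparing their metrics.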

Python API

from revise.framework import REVISEPipeline

pipeline = REVISEPipeline()
svc = pipeline.run(
    # Profile that --cf segmentation maps to in revise/revise.yaml.
    profile="benchmark_seg",
    runtime_overrides={"platform": "sim2real", "confounding": "segmentation"},
    io_overrides={
        # Input and output locations.
        "data_root": "raw_data/Sim2Real-ST/segmentation",
        "output_root": "output/benchmark",
        "sample_name": "P2CRC/cut_part1",
        "st_file": "xenium_spot.h5ad",
        # Ground-truth SVC file used for evaluation.
        "gt_svc_file": "selected_xenium.h5ad",
        "sc_ref_file": "real_sc_ref.h5ad",
        # One of the segmentation methods seg_1 to seg_4.
        "seg_method": "seg_1",
    },
)

Benchmark Notebooks