Benchmark Mode
==============

Benchmark mode reproduces the Sim2Real-ST evaluations. It compares reconstructed
SVCs against ground-truth SVCs and writes per-gene metrics to
``metrics_normalized.csv``. The main paper-facing metrics are PCC, SSIM, and
MSE. NRMSE is kept in the CSV for compatibility with earlier reports.

Use benchmark mode when you want to evaluate reconstruction behavior under a
controlled confounding factor. Use :doc:`case` instead when you want to run
REVISE on real application data that has no ground-truth SVCs.

At a Glance
-----------

.. list-table::
   :header-rows: 1
   :widths: 1 2

   * - Need
     - Where it is configured
   * - Input data
     - ``--raw_data_path`` and ``--sample_name`` in ``benchmark_main.py``.
   * - Confounding family
     - ``--cf`` maps to one of the benchmark profiles in ``revise/revise.yaml``.
   * - Ground truth
     - Resolved from the Sim2Real-ST data layout and ``io.gt_svc_file``.
   * - Output metrics
     - ``metrics_normalized.csv`` under the resolved route/case directory.

Supported Confounding Factors
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 1 2 2

   * - ``--cf``
     - Profile
     - Notes
   * - ``segmentation``
     - ``benchmark_seg``
     - sp-SVC benchmark over segmentation methods ``seg_1`` to ``seg_4``.
   * - ``bin2cell``
     - ``benchmark_bin2cell``
     - sp-SVC benchmark for bin-to-cell assignment.
   * - ``batch_effect``
     - ``benchmark_sr_batch``
     - sc-SVC super-resolution benchmark across batch-reference settings.
   * - ``spot_size``
     - ``benchmark_sr_spot_size``
     - sc-SVC super-resolution benchmark across spot sizes.
   * - ``gene_panel``
     - ``benchmark_impute_panel``
     - sc-SVC imputation benchmark for limited gene panels.
   * - ``gene_dropout``
     - ``benchmark_impute_dropout``
     - sc-SVC imputation benchmark for dropout.

Command-Line Usage
------------------

Run one confounding-factor family:

.. code-block:: bash

   python benchmark_main.py \
       --cf segmentation \
       --raw_data_path raw_data/Sim2Real-ST \
       --sample_name P2CRC/cut_part1 \
       --task segmentation \
       --save_path output/benchmark

``--seed_scope`` controls reproducibility granularity. The default, ``process``,
seeds once for the whole script to preserve parity with the launcher wrappers.
``run`` resets the seed before each individual case.

Run all supported confounding-factor families for one or more sample parts:

.. code-block:: bash

   SAMPLE_PARTS="part1 part2 part3" bash benchmark_main.sh

``benchmark_main.sh`` writes launcher logs under ``0_records/`` and, unless
``SAVE_PATH`` is set, writes results under ``results_unified/benchmark_runs/``.
By default, it runs ``P2CRC/cut_part1`` only. Set ``SAMPLE_PARTS`` when you need
the complete ``part1``/``part2``/``part3`` benchmark set.

``benchmark_main.sh`` and the focused launchers under
``reproduce/benchmark/benchmark_*.sh`` all call the same ``benchmark_main.py``
entry point and read the same runtime environment. For example, to run every
family over all three sample parts with explicit paths:

.. code-block:: bash

   export RAW_DATA_PATH=raw_data/Sim2Real-ST
   export SAMPLE_PATIENT=P2CRC
   export SAMPLE_PARTS="part1 part2 part3"
   export CONFIG_PATH=revise/revise.yaml
   export SAVE_PATH=results_unified/benchmark_runs/sim2real_all

   bash benchmark_main.sh

To run the six focused launchers instead, keep the same environment and call
the task-specific scripts:

.. code-block:: bash

   bash reproduce/benchmark/benchmark_segmentation.sh
   bash reproduce/benchmark/benchmark_bin2cell.sh
   bash reproduce/benchmark/benchmark_batch_effect.sh
   bash reproduce/benchmark/benchmark_spot_size.sh
   bash reproduce/benchmark/benchmark_gene_panel.sh
   bash reproduce/benchmark/benchmark_gene_dropout.sh

With the same ``SAVE_PATH`` and sample settings, the six focused launchers
produce the same metrics tree as ``benchmark_main.sh``. The only difference is
orchestration: ``benchmark_main.sh`` starts every supported confounding family,
while each focused launcher starts one family.

Benchmark outputs are organized by task, sample, route, and case leaf.
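
Because every case leaf keeps its own ``metrics_normalized.csv``, the results of
a whole run can be gathered by searching the save path recursively. The sketch
below is an illustration, not part of REVISE: ``collect_metrics`` and
``mean_metric`` are hypothetical helpers, and the only assumption about the CSV
is that it has a header row. The column name ``PCC`` in the usage lines is an
assumption as well, so check the header of your own files first.

.. code-block:: python

   import csv
   from pathlib import Path


   def collect_metrics(save_path):
       """Return (case_dir, rows) for every metrics_normalized.csv below save_path.

       case_dir is relative to save_path; rows is a list of dicts keyed by the
       CSV header. No particular directory depth or column set is assumed.
       """
       root = Path(save_path)
       results = []
       # rglob descends through every task/sample/route/case level.
       for csv_path in sorted(root.rglob("metrics_normalized.csv")):
           with csv_path.open(newline="") as handle:
               rows = list(csv.DictReader(handle))
           results.append((csv_path.parent.relative_to(root), rows))
       return results


   def mean_metric(rows, column):
       """Average one numeric column, skipping missing or non-numeric cells."""
       values = []
       for row in rows:
           try:
               values.append(float(row[column]))
           except (KeyError, TypeError, ValueError):
               continue
       return sum(values) / len(values) if values else float("nan")


   if __name__ == "__main__":
       # "PCC" is an assumed column name; adjust it to your CSV header.
       for case_dir, rows in collect_metrics("results_unified/benchmark_runs"):
           print(case_dir, mean_metric(rows, "PCC"))

Searching for the file name rather than hard-coding the directory depth keeps
the helper usable for every confounding family, whatever route and case names
each one produces.
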

Each completed case writes provenance files and, when evaluation is enabled,
``metrics_normalized.csv``:

.. code-block:: text

   /////
   ├── merged_config.json
   ├── provenance.json
   ├── metrics_normalized.csv
   └── *.h5ad

Before comparing cases, inspect ``merged_config.json`` and ``provenance.json``
to confirm the route, seed, input fingerprints, and stage trace.

Python API
----------

The same benchmark can be launched from Python. The example below runs the
``benchmark_seg`` profile for ``P2CRC/cut_part1`` with segmentation method
``seg_1``:

.. code-block:: python

   from revise.framework import REVISEPipeline

   pipeline = REVISEPipeline()

   svc = pipeline.run(
       # Benchmark profile from revise/revise.yaml (see the --cf table above).
       profile="benchmark_seg",
       runtime_overrides={"platform": "sim2real", "confounding": "segmentation"},
       io_overrides={
           "data_root": "raw_data/Sim2Real-ST/segmentation",
           "output_root": "output/benchmark",
           "sample_name": "P2CRC/cut_part1",
           "st_file": "xenium_spot.h5ad",
           # Ground-truth SVCs used for evaluation.
           "gt_svc_file": "selected_xenium.h5ad",
           "sc_ref_file": "real_sc_ref.h5ad",
           # Segmentation method, one of seg_1 to seg_4.
           "seg_method": "seg_1",
       },
   )

Benchmark Notebooks
-------------------

.. toctree::
   :maxdepth: 1

   segmentation benchmark <../benchmark/seg_benchmark>
   spot size benchmark <../benchmark/spot_benchmark>
   batch effect benchmark <../benchmark/batch_benchmark>
   imputation benchmark <../benchmark/imputation_benchmark>
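
When exploring a single case in a notebook, it can help to recompute the simpler
per-gene metrics directly from the reconstructed and ground-truth matrices. The
sketch below is not REVISE's evaluation code: ``per_gene_metrics`` is a
hypothetical helper, it assumes dense cells x genes arrays with identical cell
and gene order, and it defines NRMSE as RMSE divided by each gene's ground-truth
range, which may differ from the normalization behind
``metrics_normalized.csv``. SSIM is omitted because its windowing and data-range
choices are implementation specific.

.. code-block:: python

   import numpy as np


   def per_gene_metrics(pred, truth):
       """Per-gene PCC, MSE, and NRMSE for aligned cells x genes arrays.

       Assumptions (not REVISE's evaluation code): both inputs are dense,
       share the same cell and gene order, and NRMSE is RMSE divided by the
       ground-truth range of each gene.
       """
       pred = np.asarray(pred, dtype=float)
       truth = np.asarray(truth, dtype=float)
       if pred.shape != truth.shape:
           raise ValueError("pred and truth must have the same shape")

       # Centre each gene (column) for the Pearson correlation.
       pred_c = pred - pred.mean(axis=0)
       truth_c = truth - truth.mean(axis=0)
       denom = np.sqrt((pred_c ** 2).sum(axis=0) * (truth_c ** 2).sum(axis=0))

       mse = ((pred - truth) ** 2).mean(axis=0)
       gene_range = truth.max(axis=0) - truth.min(axis=0)

       # Constant genes divide by zero; the result is NaN or inf, not a warning.
       with np.errstate(divide="ignore", invalid="ignore"):
           pcc = (pred_c * truth_c).sum(axis=0) / denom
           nrmse = np.sqrt(mse) / gene_range
       return {"PCC": pcc, "MSE": mse, "NRMSE": nrmse}

For example, adding a constant offset to ``truth`` leaves PCC at 1 for every
gene while MSE becomes the squared offset, which is one reason correlation and
error metrics are reported side by side.
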