Benchmark Mode
==============

Benchmark mode reproduces Sim2Real-ST evaluations. It compares reconstructed
SVCs against ground-truth SVCs and writes per-gene metrics to
``metrics_normalized.csv``. The main paper-facing metrics are PCC, SSIM, and
MSE; NRMSE is retained in the CSV for compatibility with earlier reports.
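
For orientation, here is one way to compute per-gene PCC, MSE, and NRMSE from two cells-by-genes matrices. This is a minimal NumPy sketch, not the REVISE evaluation code. The helper name ``per_gene_metrics``, the per-gene min-max scaling applied before MSE, and the range-based NRMSE definition are all assumptions. SSIM is omitted because it needs the spatial layout.

.. code-block:: python

   import numpy as np

   def per_gene_metrics(pred, gt):
       """Per-gene PCC, MSE and NRMSE for two cells x genes matrices.

       Assumptions (not taken from REVISE): MSE is computed after min-max
       scaling each gene to [0, 1]; NRMSE is raw RMSE divided by the
       ground-truth range of that gene.
       """
       pred = np.asarray(pred, dtype=float)
       gt = np.asarray(gt, dtype=float)
       if pred.shape != gt.shape:
           raise ValueError("pred and gt must have the same shape")

       # Pearson correlation per gene (column); NaN for constant columns.
       pc = pred - pred.mean(axis=0)
       gc = gt - gt.mean(axis=0)
       denom = np.sqrt((pc ** 2).sum(axis=0) * (gc ** 2).sum(axis=0))
       with np.errstate(invalid="ignore", divide="ignore"):
           pcc = (pc * gc).sum(axis=0) / denom

       def minmax(x):
           # Scale each column to [0, 1]; constant columns become all zeros.
           lo, hi = x.min(axis=0), x.max(axis=0)
           span = np.where(hi > lo, hi - lo, 1.0)
           return (x - lo) / span

       mse = ((minmax(pred) - minmax(gt)) ** 2).mean(axis=0)

       rmse = np.sqrt(((pred - gt) ** 2).mean(axis=0))
       gt_range = gt.max(axis=0) - gt.min(axis=0)
       with np.errstate(invalid="ignore", divide="ignore"):
           nrmse = rmse / gt_range
       return {"pcc": pcc, "mse": mse, "nrmse": nrmse}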

Use benchmark mode when you want to evaluate reconstruction behavior under a
controlled confounding factor. Use :doc:`case` instead when you want to run
REVISE on real application data without ground-truth SVCs.

At a Glance
-----------

.. list-table::
   :header-rows: 1
   :widths: 1 2

   * - Need
     - Where it is configured
   * - Input data
     - ``--raw_data_path`` and ``--sample_name`` in ``benchmark_main.py``.
   * - Confounding family
     - ``--cf`` maps to one of the benchmark profiles in
       ``revise/revise.yaml``.
   * - Ground truth
     - Resolved from the Sim2Real-ST data layout and ``io.gt_svc_file``.
   * - Output metrics
     - ``metrics_normalized.csv`` under the resolved route/case directory.


Supported Confounding Factors
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 1 2 2

   * - ``--cf``
     - Profile
     - Notes
   * - ``segmentation``
     - ``benchmark_seg``
     - sp-SVC benchmark over segmentation methods ``seg_1`` to ``seg_4``.
   * - ``bin2cell``
     - ``benchmark_bin2cell``
     - sp-SVC benchmark for bin-to-cell assignment.
   * - ``batch_effect``
     - ``benchmark_sr_batch``
     - sc-SVC super-resolution benchmark across batch-reference settings.
   * - ``spot_size``
     - ``benchmark_sr_spot_size``
     - sc-SVC super-resolution benchmark across spot sizes.
   * - ``gene_panel``
     - ``benchmark_impute_panel``
     - sc-SVC imputation benchmark for limited gene panels.
   * - ``gene_dropout``
     - ``benchmark_impute_dropout``
     - sc-SVC imputation benchmark for dropout.
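
When scripting around ``benchmark_main.py``, you can check ``--cf`` values against the table before starting a long run. The sketch below assumes the table is complete. ``CF_TO_PROFILE`` and ``resolve_profile`` are hypothetical names, and ``revise/revise.yaml`` remains the source of truth.

.. code-block:: python

   # Hypothetical mirror of the --cf -> profile table above; if it ever
   # disagrees with benchmark_main.py / revise/revise.yaml, those win.
   CF_TO_PROFILE = {
       "segmentation": "benchmark_seg",
       "bin2cell": "benchmark_bin2cell",
       "batch_effect": "benchmark_sr_batch",
       "spot_size": "benchmark_sr_spot_size",
       "gene_panel": "benchmark_impute_panel",
       "gene_dropout": "benchmark_impute_dropout",
   }

   def resolve_profile(cf):
       """Return the benchmark profile for a --cf value, failing loudly on typos."""
       try:
           return CF_TO_PROFILE[cf]
       except KeyError:
           valid = ", ".join(sorted(CF_TO_PROFILE))
           raise ValueError(f"unknown --cf {cf!r}; expected one of: {valid}") from None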


Command-Line Usage
------------------

Run one confounding-factor family:

.. code-block:: bash

   python benchmark_main.py \
     --cf segmentation \
     --raw_data_path raw_data/Sim2Real-ST \
     --sample_name P2CRC/cut_part1 \
     --task segmentation \
     --save_path output/benchmark

``--seed_scope`` controls how often the random seed is set. The default,
``process``, seeds once for the whole script to keep parity with the wrapper
scripts. ``run`` resets the seed before each individual case.
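
The difference between the two scopes is illustrated below with a hypothetical ``run_cases`` helper that seeds Python's ``random`` module and NumPy's global generator. This sketch shows the seeding pattern only. The real pipeline may seed additional libraries.

.. code-block:: python

   import random

   import numpy as np

   def run_cases(cases, seed=0, seed_scope="process"):
       """Hypothetical illustration of the two --seed_scope policies.

       'process': seed once, so later cases continue the same random stream.
       'run': reseed before every case, so each case starts from the same state.
       """
       if seed_scope not in {"process", "run"}:
           raise ValueError("seed_scope must be 'process' or 'run'")
       if seed_scope == "process":
           random.seed(seed)
           np.random.seed(seed)
       draws = {}
       for case in cases:
           if seed_scope == "run":
               random.seed(seed)
               np.random.seed(seed)
           # Stand-in for a stochastic step inside one benchmark case.
           draws[case] = float(np.random.rand())
       return draws

Under ``run``, you can rerun a single case in isolation and reproduce its result. Under ``process``, a case's random state depends on which cases ran before it in the same script.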

Run all supported confounding-factor families for one or more sample parts:

.. code-block:: bash

   SAMPLE_PARTS="part1 part2 part3" bash benchmark_main.sh

``benchmark_main.sh`` writes launcher logs under ``0_records/``. Results go to
``results_unified/benchmark_runs/<timestamp>`` unless ``SAVE_PATH`` is set.
By default the script runs only ``P2CRC/cut_part1``. Set ``SAMPLE_PARTS`` to run
the complete ``part1``/``part2``/``part3`` benchmark set.
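
To preview which sample names a launcher invocation will cover, you can expand the environment as shown below. This sketch assumes each part maps to ``<SAMPLE_PATIENT>/cut_<part>``, a rule inferred from the default ``P2CRC/cut_part1``. ``sample_names`` is a hypothetical helper, so confirm the rule in ``benchmark_main.sh``.

.. code-block:: python

   import os

   def sample_names(env=None):
       """Hypothetical expansion of SAMPLE_PATIENT / SAMPLE_PARTS.

       Assumes '<patient>/cut_<part>', inferred from the default
       'P2CRC/cut_part1'; the launcher script is authoritative.
       """
       env = os.environ if env is None else env
       patient = env.get("SAMPLE_PATIENT", "P2CRC")
       parts = env.get("SAMPLE_PARTS", "part1").split()
       return [f"{patient}/cut_{part}" for part in parts]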

The focused launchers under ``reproduce/benchmark/benchmark_*.sh`` use the same
``benchmark_main.py`` entry point and read the same environment variables. For
example, to run all three parts with ``benchmark_main.sh``:

.. code-block:: bash

   export RAW_DATA_PATH=raw_data/Sim2Real-ST
   export SAMPLE_PATIENT=P2CRC
   export SAMPLE_PARTS="part1 part2 part3"
   export CONFIG_PATH=revise/revise.yaml
   export SAVE_PATH=results_unified/benchmark_runs/sim2real_all

   bash benchmark_main.sh

To run the six focused launchers instead, keep the same environment and call
the task-specific scripts:

.. code-block:: bash

   bash reproduce/benchmark/benchmark_segmentation.sh
   bash reproduce/benchmark/benchmark_bin2cell.sh
   bash reproduce/benchmark/benchmark_batch_effect.sh
   bash reproduce/benchmark/benchmark_spot_size.sh
   bash reproduce/benchmark/benchmark_gene_panel.sh
   bash reproduce/benchmark/benchmark_gene_dropout.sh

With the same ``SAVE_PATH`` and sample settings, running all six focused
launchers produces the same metrics tree as ``benchmark_main.sh``. Only the
orchestration differs: ``benchmark_main.sh`` starts every supported confounding
family, while each focused launcher starts one family.
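
To confirm this for your own runs, compare the metrics files under two output roots. The sketch below compares files byte for byte with ``filecmp``. That assumes the runs are fully deterministic (same seeds, same hardware), so small floating-point differences, for example from GPU kernels, will show up as mismatches. ``compare_metrics_trees`` is a hypothetical helper.

.. code-block:: python

   import filecmp
   from pathlib import Path

   def compare_metrics_trees(root_a, root_b, name="metrics_normalized.csv"):
       """Compare every metrics CSV under two benchmark output roots.

       Returns (only_in_a, only_in_b, differing) as sorted relative paths.
       """
       # filecmp caches results by (size, mtime); clear it so regenerated
       # files are always re-read.
       filecmp.clear_cache()
       root_a, root_b = Path(root_a), Path(root_b)
       files_a = {p.relative_to(root_a) for p in root_a.rglob(name)}
       files_b = {p.relative_to(root_b) for p in root_b.rglob(name)}
       differing = [
           rel
           for rel in sorted(files_a & files_b)
           if not filecmp.cmp(root_a / rel, root_b / rel, shallow=False)
       ]
       return sorted(files_a - files_b), sorted(files_b - files_a), differing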

Benchmark outputs are organized by task, sample, route, and case leaf. Each
completed case writes provenance files and, when evaluation is enabled,
``metrics_normalized.csv``.

.. code-block:: text

   <save_path>/<task>/<sample_name>/<route>/<case>/
   ├── merged_config.json
   ├── provenance.json
   ├── metrics_normalized.csv
   └── *.h5ad
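
To compare cases side by side, you can gather the per-case CSVs into one table. The sketch below uses only the standard library. ``collect_metrics`` is a hypothetical helper that passes the CSV columns through unchanged rather than assuming their names. Note that ``<sample_name>`` may contain a slash (for example ``P2CRC/cut_part1``), so it can span two directory levels.

.. code-block:: python

   import csv
   from pathlib import Path

   def collect_metrics(save_path, name="metrics_normalized.csv"):
       """Gather every metrics CSV under save_path into one list of row dicts.

       Layout: <save_path>/<task>/<sample_name>/<route>/<case>/<name>.
       sample_name may contain '/', so it is everything between the task
       directory and the last two (route, case) directories.
       """
       root = Path(save_path)
       rows = []
       for csv_path in sorted(root.rglob(name)):
           parts = csv_path.relative_to(root).parts[:-1]  # drop the file name
           if len(parts) < 4:
               continue  # not a task/sample/route/case leaf
           location = {
               "task": parts[0],
               "sample": "/".join(parts[1:-2]),
               "route": parts[-2],
               "case": parts[-1],
           }
           with csv_path.open(newline="") as fh:
               for record in csv.DictReader(fh):
                   # Path-derived fields win over same-named CSV columns.
                   rows.append({**record, **location})
       return rows

The resulting list of dicts can be passed to ``pandas.DataFrame`` for grouping by ``route`` or ``case``.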

Before comparing cases, inspect ``merged_config.json`` and ``provenance.json``
to confirm the route, seed, input fingerprints, and stage trace.
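
A quick way to see what differs between two cases is to diff the top-level keys of their JSON files. The sketch below makes no assumption about key names and expects each file to hold a JSON object. ``diff_json`` is a hypothetical helper, and it treats a missing key and a ``null`` value the same way.

.. code-block:: python

   import json
   from pathlib import Path

   def diff_json(path_a, path_b):
       """Return {key: (value_a, value_b)} for top-level keys that differ.

       Works for merged_config.json or provenance.json; a key missing on one
       side is reported as None for that side.
       """
       a = json.loads(Path(path_a).read_text())
       b = json.loads(Path(path_b).read_text())
       return {
           key: (a.get(key), b.get(key))
           for key in sorted(set(a) | set(b))
           if a.get(key) != b.get(key)
       }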


Python API
----------

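The same segmentation benchmark can be launched from Python by selecting the
``benchmark_seg`` profile and passing runtime and I/O overrides. This example
runs ``seg_1`` on ``P2CRC/cut_part1``:

.. code-block:: python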

   from revise.framework import REVISEPipeline

   pipeline = REVISEPipeline()
   svc = pipeline.run(
       profile="benchmark_seg",
       runtime_overrides={"platform": "sim2real", "confounding": "segmentation"},
       io_overrides={
           "data_root": "raw_data/Sim2Real-ST/segmentation",
           "output_root": "output/benchmark",
           "sample_name": "P2CRC/cut_part1",
           "st_file": "xenium_spot.h5ad",
           "gt_svc_file": "selected_xenium.h5ad",
           "sc_ref_file": "real_sc_ref.h5ad",
           "seg_method": "seg_1",
       },
   )
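
To sweep all four segmentation methods from Python, repeat the call above and change only ``seg_method``. This sketch assumes that the other I/O settings are shared across ``seg_1`` to ``seg_4`` and that one ``REVISEPipeline`` instance can be reused across runs. ``seg_io_overrides`` and ``run_segmentation_sweep`` are hypothetical helpers. The pipeline is passed in as an argument, so the sketch can be imported without REVISE installed.

.. code-block:: python

   def seg_io_overrides(seg_method, sample_name="P2CRC/cut_part1"):
       """io_overrides for one segmentation case, copied from the example above.

       Assumption: only seg_method changes between seg_1 .. seg_4.
       """
       return {
           "data_root": "raw_data/Sim2Real-ST/segmentation",
           "output_root": "output/benchmark",
           "sample_name": sample_name,
           "st_file": "xenium_spot.h5ad",
           "gt_svc_file": "selected_xenium.h5ad",
           "sc_ref_file": "real_sc_ref.h5ad",
           "seg_method": seg_method,
       }

   def run_segmentation_sweep(pipeline, methods=("seg_1", "seg_2", "seg_3", "seg_4")):
       """Call pipeline.run once per segmentation method; return results by method."""
       results = {}
       for method in methods:
           results[method] = pipeline.run(
               profile="benchmark_seg",
               runtime_overrides={"platform": "sim2real", "confounding": "segmentation"},
               io_overrides=seg_io_overrides(method),
           )
       return results

Call it as ``run_segmentation_sweep(REVISEPipeline())``. If the four runs need separate output directories, adjust ``output_root`` per method.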


Benchmark Notebooks
-------------------

.. toctree::
   :maxdepth: 1

   segmentation benchmark <../benchmark/seg_benchmark>
   spot size benchmark <../benchmark/spot_benchmark>
   batch effect benchmark <../benchmark/batch_benchmark>
   imputation benchmark <../benchmark/imputation_benchmark>