.. _rig-support:

Rig Support
===========

COLMAP has native support for modeling sensor rigs during the reconstruction
process. The sensors in a rig are assumed to have fixed relative poses between
each other, with one reference sensor defining the origin of the rig. A frame
defines a specific instance of the rig with all or a subset of its sensors
exposed at the same time. For example, in a stereo camera rig, one camera would
be defined as the reference sensor and have an identity ``sensor_from_rig``
pose, whereas the second camera would be posed relative to the reference
camera. Each frame would then usually be composed of two images, one measured
by each of the two cameras at the same time.

Workflow
--------

By default, when running the standard reconstruction pipeline, each camera is
modeled as a separate rig and thus each frame contains only a single image. To
model rigs, the recommended workflow is to organize the images by rigs and
cameras in a folder structure as follows (ensure that images corresponding to
the same frame have identical filenames across all folders)::

    rig1/
        camera1/
            image0001.jpg
            image0002.jpg
            ...
        camera2/
            image0001.jpg  # same frame as camera1/image0001.jpg
            image0002.jpg  # same frame as camera1/image0002.jpg
            ...
        ...
    rig2/
        camera1/
            ...
        ...
    ...

As a next step, we extract features using::

    colmap feature_extractor \
        --image_path $DATASET_PATH/images \
        --database_path $DATASET_PATH/database.db \
        --ImageReader.single_camera_per_folder 1

At this point, the resulting database contains a separate rig for each camera
and a separate frame for each image. We must therefore adjust the relationships
in the database to match the desired rig configuration. This is done using::

    colmap rig_configurator \
        --database_path $DATASET_PATH/database.db \
        --rig_config_path $DATASET_PATH/rig_config.json

where the ``rig_config.json`` could look as follows, if the relative sensor
poses in the rig are known a priori::

    [
      {
        "cameras": [
          {
            "image_prefix": "rig1/camera1/",
            "ref_sensor": true
          },
          {
            "image_prefix": "rig1/camera2/",
            "cam_from_rig_rotation": [
              0.7071067811865475,
              0.0,
              0.7071067811865476,
              0.0
            ],
            "cam_from_rig_translation": [
              0,
              0,
              0
            ]
          }
        ]
      },
      {
        "cameras": [
          {
            "image_prefix": "rig2/camera1/",
            "ref_sensor": true
          },
          ...
        ]
      },
      ...
    ]

Notice that this modifies the rig and frame configuration in the database,
which contains the full specification of rigs and frames that is later fed as
input to the downstream processing steps. With known, calibrated camera
parameters, each camera can optionally also specify the ``camera_model_name``
and ``camera_params`` fields. (A small script that generates such a
configuration file from the folder layout is sketched at the end of this
section.) For more fine-grained configuration of rigs and frames, the most
convenient option is to configure the database programmatically using
pycolmap, either through the ``apply_rig_config`` function or, for the most
flexibility, by individually adding the desired rig and frame objects to the
reconstruction.

Next, we run standard feature matching. Note that it is important to configure
the rigs before sequential feature matching, as images in consecutive frames
will be automatically matched against each other.

Finally, we can reconstruct the scene using the standard ``mapper`` command,
with the option of keeping the relative poses in the rig fixed using
``--Mapper.ba_refine_sensor_from_rig 0``.
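
For rigs with many cameras, writing the ``rig_config.json`` by hand can be
tedious. The following is a minimal sketch (not part of COLMAP) that generates
a configuration skeleton from the folder layout shown above; the placeholder
image path and the choice of the first camera folder as the reference sensor
are assumptions, and any known ``cam_from_rig_*`` poses or camera parameters
would still have to be filled in manually::

    import json
    from pathlib import Path

    # Placeholder path to the images organized as rig*/camera*/ shown above.
    images_root = Path("path/to/images")

    configs = []
    for rig_dir in sorted(p for p in images_root.iterdir() if p.is_dir()):
        cameras = []
        for idx, cam_dir in enumerate(
            sorted(p for p in rig_dir.iterdir() if p.is_dir())
        ):
            camera = {"image_prefix": f"{rig_dir.name}/{cam_dir.name}/"}
            if idx == 0:
                # Arbitrarily pick the first camera folder as the reference sensor.
                camera["ref_sensor"] = True
            # Known "cam_from_rig_rotation" / "cam_from_rig_translation" values
            # and optional "camera_model_name" / "camera_params" go here.
            cameras.append(camera)
        configs.append({"cameras": cameras})

    with open("rig_config.json", "w") as f:
        json.dump(configs, f, indent=2)
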
Unknown rig sensor poses
------------------------

If the relative poses of the sensors in the rig are not known a priori and we
only know that a specific set of sensors is rigidly mounted and exposed at the
same time, one can attempt the following two-step reconstruction approach.

Before starting, make sure to organize your images as detailed above and
perform feature extraction with the
``--ImageReader.single_camera_per_folder 1`` option.

Next, reconstruct the scene without rig constraints by modeling each camera as
its own rig (the default behavior of COLMAP without further configuration).
Note that this can be a partial reconstruction from a subset of the full set of
input images. The only requirement is that each camera has at least one
registered image in the same frame as a registered image of the reference
camera. If the reconstruction was successful and the relative poses between
registered images look roughly correct, we can proceed with the next step.

The ``rig_configurator`` can also work without ``cam_from_rig_*``
transformations. Given an existing (partial) reconstruction of the scene, it
computes the average relative rig sensor poses from all registered images (see
the sketch at the end of this section for the intuition behind this
averaging)::

    colmap rig_configurator \
        --database_path $DATASET_PATH/database.db \
        --input_path $DATASET_PATH/sparse-model-without-rigs-and-frames \
        --rig_config_path $DATASET_PATH/rig_config.json \
        [ --output_path $DATASET_PATH/sparse-model-with-rigs-and-frames ]

The provided ``rig_config.json`` must simply omit the respective
``cam_from_rig_rotation`` and ``cam_from_rig_translation`` fields.

Now, we can either run rig bundle adjustment on the (optional) output
reconstruction with configured rigs and frames::

    colmap bundle_adjuster \
        --input_path $DATASET_PATH/sparse-model-with-rigs-and-frames \
        --output_path $DATASET_PATH/bundled-sparse-model-with-rigs-and-frames

or alternatively start the reconstruction process from scratch with rig
constraints, which may lead to more accurate reconstruction results::

    colmap mapper \
        --image_path $DATASET_PATH/images \
        --database_path $DATASET_PATH/database.db \
        --output_path $DATASET_PATH/sparse-model-with-rigs-and-frames
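
To give an intuition for the averaging step, the sketch below shows one way to
estimate an average relative pose from the per-frame poses of a camera and the
reference camera in a reconstruction built without rig constraints. This is an
illustration only and not necessarily the exact scheme implemented by the
``rig_configurator``; the function and its inputs (one 4x4 world-to-camera
matrix per frame for each of the two cameras) are hypothetical::

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    def average_cam_from_ref(cam_from_world_list, ref_from_world_list):
        """Average per-frame relative poses, given 4x4 world-to-camera matrices."""
        quats_xyzw, translations = [], []
        for cam_from_world, ref_from_world in zip(
            cam_from_world_list, ref_from_world_list
        ):
            # Relative pose of this camera w.r.t. the reference sensor in this frame.
            cam_from_ref = cam_from_world @ np.linalg.inv(ref_from_world)
            quats_xyzw.append(R.from_matrix(cam_from_ref[:3, :3]).as_quat())
            translations.append(cam_from_ref[:3, 3])
        # Rotation averaging via the dominant eigenvector of the quaternion
        # outer-product sum (insensitive to the quaternion sign ambiguity).
        A = sum(np.outer(q, q) for q in quats_xyzw)
        avg_q_xyzw = np.linalg.eigh(A)[1][:, -1]
        avg_rotation = R.from_quat(avg_q_xyzw)
        avg_translation = np.mean(translations, axis=0)
        # COLMAP stores rotations as scalar-first quaternions, i.e. [w, x, y, z].
        x, y, z, w = avg_rotation.as_quat()
        return [w, x, y, z], avg_translation.tolist()
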
Example
-------

The following shows an end-to-end example of how to reconstruct one of the
ETH3D rig datasets using COLMAP's rig support::

    wget https://www.eth3d.net/data/terrains_rig_undistorted.7z
    7zz x terrains_rig_undistorted.7z

    colmap feature_extractor \
        --database_path terrains/database.db \
        --image_path terrains/images \
        --ImageReader.single_camera_per_folder 1

The ETH3D dataset conveniently comes with a groundtruth COLMAP reconstruction,
which we use to configure the sensor rig poses as well as the camera models
using::

    colmap rig_configurator \
        --database_path terrains/database.db \
        --rig_config_path terrains/rig_config.json \
        --input_path terrains/rig_calibration_undistorted

with the ``rig_config.json``::

    [
      {
        "cameras": [
          {
            "image_prefix": "images_rig_cam4_undistorted/",
            "ref_sensor": true
          },
          {
            "image_prefix": "images_rig_cam5_undistorted/"
          },
          {
            "image_prefix": "images_rig_cam6_undistorted/"
          },
          {
            "image_prefix": "images_rig_cam7_undistorted/"
          }
        ]
      }
    ]

Notice that we do not specify the sensor poses, because we use an existing
reconstruction (in this case the groundtruth, but it could also be a
reconstruction without rig constraints, as explained in the previous section)
to automatically infer the average rig extrinsics and camera parameters.

Next, we sequentially match the frames, since they were captured as a video::

    colmap sequential_matcher --database_path terrains/database.db

Finally, we reconstruct the scene using the mapper while keeping the
groundtruth sensor rig poses and camera parameters fixed::

    mkdir -p terrains/sparse
    colmap mapper \
        --database_path terrains/database.db \
        --Mapper.ba_refine_sensor_from_rig 0 \
        --Mapper.ba_refine_focal_length 0 \
        --Mapper.ba_refine_extra_params 0 \
        --output_path terrains/sparse

Reconstruction from 360° spherical images
-----------------------------------------

COLMAP can handle collections of 360° panoramas by rendering virtual pinhole
images (similar to a cubemap) and treating them as a camera rig. Since the rig
extrinsics and camera intrinsics are known, the reconstruction process is more
robust. We provide an example Python script to reconstruct a 360° collection::

    python python/examples/panorama_sfm.py \
        --input_image_path image_directory \
        --output_path output_directory
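
As an illustration of the underlying idea (and not of the actual implementation
of ``panorama_sfm.py``), the virtual pinhole views rendered from a panorama are
simply rotated copies of a reference view and can therefore be described by
``cam_from_rig`` rotations, just like the cameras of a physical rig. The sketch
below prints such rotations as scalar-first ``[w, x, y, z]`` quaternions,
COLMAP's convention, for four horizontally yawed virtual cameras; the face
names and yaw angles are made up for the example::

    from scipy.spatial.transform import Rotation as R

    # Hypothetical virtual cameras: yaw (degrees) of each face relative to the
    # reference view, about the vertical axis (y points down in COLMAP's
    # camera convention).
    FACE_YAWS = {"front": 0.0, "right": 90.0, "back": 180.0, "left": -90.0}

    for name, yaw_deg in FACE_YAWS.items():
        # rig_from_cam: the virtual camera is yawed by yaw_deg within the rig frame.
        rig_from_cam = R.from_euler("y", yaw_deg, degrees=True)
        cam_from_rig = rig_from_cam.inv()
        x, y, z, w = cam_from_rig.as_quat()  # scipy returns [x, y, z, w]
        print(f"{name}: cam_from_rig_rotation = [{w:.4f}, {x:.4f}, {y:.4f}, {z:.4f}]")
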