Teaser figure: Input Panorama (left) and Our Predictions (right). Blue and Green represent ground truth labels and our predictions, respectively.
Inherent ambiguity in layout annotations poses significant challenges to developing accurate 360° room layout estimation models. To address this issue, we propose a novel Bi-Layout model capable of predicting two distinct layout types. One stops at ambiguous regions, while the other extends to encompass all visible areas. Our model employs two global context embeddings, where each embedding is designed to capture specific contextual information for each layout type. With our novel feature guidance module, the image feature retrieves relevant context from these embeddings, generating layout-aware features for precise bi-layout predictions.
A unique property of our Bi-Layout model is its ability to inherently detect ambiguous regions by comparing its two predictions. To circumvent the need for manual correction of ambiguous annotations during testing, we also introduce a new metric for disambiguating ground truth layouts. Our method demonstrates superior performance on benchmark datasets, notably outperforming leading approaches. On the MatterportLayout dataset, it improves 3DIoU from 81.70% to 82.57% on the full test set, and from 54.80% to 59.97% on subsets with significant ambiguity.
Prior works generate a single layout prediction from the panorama image. Given a single panorama as input, the state-of-the-art method produces only one prediction, which contains erroneous regions (highlighted by the white box). This failure stems from inherently ambiguous regions in the dataset labels, caused by inconsistent annotation strategies, and is difficult to resolve with a single-prediction method.
The white box indicates the ambiguous regions, where predictions from state-of-the-art methods struggle. We further define two types of ground truth annotations: enclosed and extended. The enclosed annotation encloses the nearest room, while the extended annotation extends to all visible areas. With these two label definitions, our model has a clear target to learn and predict.
Inherent ambiguity in the MatterportLayout dataset. Blue and Green represent ground truth annotations and predictions from the SoTA models, respectively. Layout boundaries are shown on the left, and their bird's-eye-view projections on the right. We define two types of layout annotation: (a) the enclosed type encloses the room; (b) the extended type extends to all visible areas. The dashed lines underscore the ambiguity in the dataset labels.
We propose our Bi-Layout model, which generates two distinct layouts from a single panorama. Between these two predictions, we can select the one that best fits the dataset label, achieving better performance and resolving the ambiguity inherent in the labels.
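To make the selection step concrete, below is a minimal sketch of per-sample disambiguation, assuming each branch's prediction and the ground truth are available as binary bird's-eye-view occupancy masks; the function names and the mask-IoU criterion are illustrative stand-ins, not the paper's exact procedure:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary floor-plan occupancy masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def disambiguate(pred_enclosed: np.ndarray,
                 pred_extended: np.ndarray,
                 gt: np.ndarray):
    """Per sample, keep whichever branch better fits the (possibly
    inconsistently annotated) ground-truth label."""
    iou_enc = mask_iou(pred_enclosed, gt)
    iou_ext = mask_iou(pred_extended, gt)
    if iou_enc >= iou_ext:
        return pred_enclosed, iou_enc
    return pred_extended, iou_ext
```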
Our Bi-Layout model consists of three main components.
(a) The first component is the Feature Extractor, which encodes the panorama. We use a ResNet to produce features at multiple levels and a simplified height compression module to compress and concatenate them into an efficient 1D feature representation (see the first sketch after this list).
(b) The second component is our Global Context Embeddings, which learn global contextual information from the dataset labels during training, one embedding per layout type (see the second sketch below).
(c) The third component is our Shared Feature Guidance Module, which guides the shared panorama feature with the corresponding global context embedding and generates the final feature for each prediction. This is the key part of our design for producing the two distinct layout predictions (see the third sketch below).
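As a rough PyTorch illustration of component (a), the sketch below collapses each ResNet stage's 2D feature map into a per-column 1D feature and concatenates the stages at a common width. The channel counts, the plain height averaging, and the target width are our assumptions, not the paper's exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeightCompression(nn.Module):
    """Collapse the height axis of a 2D feature map into a 1D
    per-column feature. Plain height averaging is an assumption;
    the paper's module may use a learned compression instead."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)      # (B, out_ch, H, W)
        return x.mean(dim=2)  # average over height -> (B, out_ch, W)

class MultiScaleCompressor(nn.Module):
    """Compress each ResNet stage and concatenate the results at a
    common width, yielding one 1D sequence for the whole panorama."""
    def __init__(self, in_chs=(256, 512, 1024, 2048), out_ch=64, width=256):
        super().__init__()
        self.width = width
        self.blocks = nn.ModuleList([HeightCompression(c, out_ch) for c in in_chs])

    def forward(self, feats):
        cols = []
        for block, f in zip(self.blocks, feats):
            c = block(f)  # (B, out_ch, W_i)
            # align all stages to the same width before concatenation
            c = F.interpolate(c, size=self.width, mode="linear", align_corners=False)
            cols.append(c)
        return torch.cat(cols, dim=1)  # (B, len(in_chs) * out_ch, width)
```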
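For component (b), the two global context embeddings can be sketched as two independent sets of learnable tokens, one per layout type; the token count and dimension are placeholder hyperparameters:

```python
import torch
import torch.nn as nn

class GlobalContextEmbeddings(nn.Module):
    """Two independent sets of learnable tokens, one per layout
    type. They are optimized jointly with the network, so each set
    absorbs the contextual statistics of its label type."""
    def __init__(self, num_tokens: int = 64, dim: int = 256):
        super().__init__()
        self.enclosed = nn.Parameter(0.02 * torch.randn(num_tokens, dim))
        self.extended = nn.Parameter(0.02 * torch.randn(num_tokens, dim))

    def forward(self, layout_type: str) -> torch.Tensor:
        return self.enclosed if layout_type == "enclosed" else self.extended
```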
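For component (c), one plausible reading of the guidance step is cross-attention with shared weights, where the shared panorama feature queries a layout-specific embedding; the residual-plus-norm structure and head count are assumptions rather than the paper's verified design:

```python
import torch
import torch.nn as nn

class SharedFeatureGuidance(nn.Module):
    """Shared-weight cross-attention: the shared 1D panorama
    feature queries one layout-specific context embedding, so the
    same module produces two layout-aware features, one per branch."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # feat:    (B, W, dim)  shared per-column panorama features
        # context: (T, dim)     one global context embedding
        ctx = context.unsqueeze(0).expand(feat.size(0), -1, -1)
        guided, _ = self.attn(query=feat, key=ctx, value=ctx)
        return self.norm(feat + guided)  # residual + norm (our assumption)

# Usage (names hypothetical): run the SAME module once per embedding.
# feat_enc = guidance(feat, embeds("enclosed"))
# feat_ext = guidance(feat, embeds("extended"))
```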
Full-set and subset evaluation. "Equivalent branch" denotes the output branch trained with the same labels as the baseline methods; "Disambiguate" is our proposed metric.
Qualitative comparison on the MatterportLayout (top) and ZInD (bottom) datasets. Blue and Green represent ground truth labels and predictions, respectively. Room-layout boundaries are shown on the left, and their bird's-eye-view projections on the right. Our disambiguated results effectively address the ambiguity issue, while the SoTA methods struggle with it, as highlighted by the dashed lines.
@inproceedings{tsai2024no,
  title     = {No More Ambiguity in 360{\textdegree} Room Layout via Bi-Layout Estimation},
  author    = {Tsai, Yu-Ju and Jhang, Jin-Cheng and Zheng, Jingjing and Wang, Wei and Chen, Albert and Sun, Min and Kuo, Cheng-Hao and Yang, Ming-Hsuan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024}
}