ConsistCompose: Unified Multimodal Layout Control for Image Composition
SenseTime Research
Layout constraints are encoded as succinct coordinate expressions in text, allowing the model to bind each subject's identity to its designated spatial position through the shared token space that governs both understanding and generation.
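As a concrete illustration, a layout of this kind can be serialized directly into the prompt string. The `<box>` tag syntax and the normalized coordinate range below are assumptions for illustration only, not the model's actual token scheme:

```python
def serialize_layout(caption, boxes):
    """Flatten (label, bounding-box) pairs into a single prompt string.

    Coordinates are assumed normalized to [0, 1000]; the <box> tag
    format is illustrative, not the exact scheme used by the model.
    """
    parts = [caption]
    for label, (x1, y1, x2, y2) in boxes:
        parts.append(f"{label} <box>({x1},{y1}),({x2},{y2})</box>")
    return " ; ".join(parts)

prompt = serialize_layout(
    "a cat and a dog on a lawn",
    [("cat", (100, 400, 350, 800)), ("dog", (550, 420, 900, 820))],
)
print(prompt)
```

Because the layout lives in ordinary text, no extra input pathway is needed; the same tokenizer and attention layers consume caption words and coordinates alike.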
A coordinate-aware classifier-free guidance mechanism further enhances spatial fidelity during sampling, without altering the backbone architecture or introducing task-specific layout-centric branches.
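One generic way to realize such guidance is a two-branch classifier-free guidance rule that interpolates between an unconditional prediction, a text-only prediction, and a text-plus-layout prediction. The branch structure and scale values below are a sketch of this family of rules, not the paper's exact formulation:

```python
import numpy as np

def layout_aware_cfg(eps_uncond, eps_text, eps_text_layout,
                     s_text=5.0, s_layout=2.0):
    """Compose denoiser outputs from three conditioning branches.

    eps_uncond      : prediction with all conditions dropped
    eps_text        : prediction with the caption only
    eps_text_layout : prediction with caption + coordinate expressions
    The two scales independently strengthen text and layout adherence.
    """
    return (eps_uncond
            + s_text * (eps_text - eps_uncond)
            + s_layout * (eps_text_layout - eps_text))

# With both scales at 1.0 the sum telescopes to the fully conditioned branch.
u, t, tl = np.zeros(4), np.ones(4), np.full(4, 2.0)
print(layout_aware_cfg(u, t, tl, 1.0, 1.0))
```

Raising `s_layout` alone pushes the sample toward the coordinate constraints without re-weighting the caption, which is why such a term can improve spatial fidelity while leaving the backbone untouched.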
We evaluate on COCO-Position for layout controllability and MS-Bench for identity-consistent multi-reference generation. ConsistCompose establishes state-of-the-art performance on both benchmarks.
| Methods | Instance Success Ratio (%) ↑ | | | | | | Image Success Ratio (%) ↑ | | | | | | Position Accuracy ↑ | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | L2 | L3 | L4 | L5 | L6 | Avg | L2 | L3 | L4 | L5 | L6 | Avg | mIoU | AP | AP50 | AP75 |
| GLIGEN | 89.1 | 86.3 | 82.0 | 79.6 | 81.6 | 82.6 | 78.8 | 63.8 | 48.1 | 35.0 | 35.0 | 52.1 | 69.0 | 40.5 | 75.9 | 39.1 |
| InstanceDiffusion | 94.1 | 94.4 | 89.5 | 84.6 | 83.8 | 87.8 | 89.4 | 84.4 | 67.5 | 46.9 | 39.4 | 65.5 | 78.1 | 57.2 | 83.6 | 65.5 |
| MIGC++ | 94.1 | 92.1 | 87.3 | 84.1 | 83.4 | 86.8 | 89.4 | 78.1 | 62.5 | 48.1 | 38.8 | 63.4 | 74.9 | 48.3 | 79.2 | 52.6 |
| CreatiLayout | 81.9 | 76.3 | 73.4 | 73.5 | 71.2 | 74.0 | 69.4 | 48.1 | 36.9 | 31.9 | 26.3 | 42.5 | 64.9 | 32.4 | 61.1 | 31.6 |
| PlanGen | 85.3 | 84.2 | 83.8 | 80.9 | 81.2 | 82.5 | 72.5 | 63.1 | 51.3 | 33.1 | 31.3 | 50.3 | 66.2 | 31.9 | 74.0 | 21.5 |
| Ours | 95.6 | 94.2 | 92.7 | 90.6 | 92.4 | 92.6 | 91.9 | 83.1 | 73.1 | 63.7 | 68.8 | 76.1 | 85.3 | 70.9 | 89.1 | 76.9 |
The training corpus comprises 3.4M samples spanning layout-grounded text-to-image, single-reference, and multi-reference composition tasks.
If you find our work useful, please cite:

```bibtex
@article{shi2025consistcompose,
  title={ConsistCompose: Unified Multimodal Layout Control for Image Composition},
  author={Shi, Xuanke and Li, Boxuan and Han, Xiaoyang and Cai, Zhongang and Yang, Lei and Lin, Dahua and Wang, Quan},
  journal={arXiv preprint arXiv:2511.18333},
  year={2025}
}
```