
Can 3D Generative Models Help 3D Assembly?

ICML 2026

New York University
*, † Equal Contribution, ✉ Corresponding Author

CRAG shows that assembly and generation are mutually reinforcing: assembly provides part-level structural priors for generation, while generation injects holistic shape context that resolves ambiguities in assembly.

Interactive figure with three panels: Condition (unposed part point clouds and an optional reference image), Generation (generated complete shape), and Assembly (predicted assembled parts).

From a set of unposed parts (left) and an optional reference image, CRAG jointly generates a complete shape and predicts the assembled pose for each part. Corresponding parts share colors across the Condition and Assembly views.

Abstract

Most existing 3D assembly methods treat the problem as pure pose estimation, rearranging observed parts via rigid transformations. In contrast, human assembly naturally couples structural reasoning with holistic shape inference. Inspired by this intuition, we reformulate 3D assembly as a joint problem of assembly and generation. We show that these two processes are mutually reinforcing: assembly provides part-level structural priors for generation, while generation injects holistic shape context that resolves ambiguities in assembly. Unlike prior methods that cannot synthesize missing geometry, we propose CRAG, which simultaneously generates plausible complete shapes and predicts poses for input parts. Extensive experiments demonstrate state-of-the-art performance across in-the-wild objects with diverse geometries, varying part counts, and missing pieces.

Methodology


Overview of CRAG. Our model consists of two interacting branches: an Assembly Branch that predicts the pose of each part via SE(3) flow matching, and a Generation Branch that synthesizes the complete shape via flow matching. A Joint Adapter bridges these branches, enabling bidirectional information flow. We employ a two-stage training strategy: learning assembly first, and then jointly finetuning both tasks.
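The caption names flow matching without spelling out the objective. As a rough illustration only, the sketch below shows the generic straight-path (linear interpolation) flow matching recipe on plain Euclidean vectors: interpolate between a noise sample and a data sample, regress the velocity of that path, and integrate the velocity field at sampling time. It is not CRAG's SE(3) formulation or its network; every name here (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, `oracle`) is hypothetical, and the "model" is a hand-written oracle standing in for a learned predictor.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [a + dt * b for a, b in zip(x, v)]
    return x


# Toy setup: a single data point x1 and one Gaussian noise sample x0.
x1 = [1.0, -2.0, 0.5]
rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in x1]


def oracle(x_t, t):
    # Assumption for this toy case: with a single target, the ideal
    # velocity field is (x1 - x_t) / (1 - t), evaluated only for t < 1.
    return [(b - a) / (1.0 - t) for a, b in zip(x_t, x1)]


loss = flow_matching_loss(oracle, x0, x1, t=0.3)   # close to zero
sample = euler_sample(oracle, x0, steps=10)        # lands close to x1
```

In a trained system the oracle would be replaced by a neural network conditioned on the input parts (and, optionally, the image), and the loss would be averaged over random `t`, noise samples, and training shapes.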

Assembly Results

The results below show how CRAG accurately aligns fragments across different material categories.

Table 1. Quantitative comparison on PartNeXt and Breaking Bad under two part-status settings: Complete (all parts observed) and Missing (with missing parts). CRAG consistently achieves the best overall performance across datasets and remains robust in the challenging missing-part setting. The best and second best results are highlighted.

Qualitative Comparison

Each row shows the same shape reassembled by four methods, alongside the ground truth. Corresponding parts share the same color across the entire row.

Columns: Ground Truth, CRAG, Assembler, GARF, RPF.

Generation Results

Table 3. Generation quality on the PartNeXt evaluation set (4,626 shapes), comparing CRAG, TripoSG, and CRAG without image conditioning on CD, F-score, EMD, and failure rate. Part-level evidence from the assembly branch substantially improves image-conditioned generation across all metrics and prevents the VAE-decoding failures observed in image-only TripoSG.
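For readers unfamiliar with two of the reported metrics, the sketch below shows one common way to compute Chamfer distance (CD) and F-score between a generated point set and a ground-truth point set. Conventions differ between papers (squared versus unsquared distances, sum versus mean, choice of threshold), and the page does not state which variant is used, so this is an assumption-laden illustration rather than the evaluation code behind Table 3. The function names and the toy points are hypothetical.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: mean nearest-neighbor distance, both ways."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    return sum(d_pred) / len(d_pred) + sum(d_gt) / len(d_gt)


def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    # Precision: fraction of predicted points lying near the ground truth.
    precision = sum(d <= threshold for d in d_pred) / len(d_pred)
    # Recall: fraction of ground-truth points covered by the prediction.
    recall = sum(d <= threshold for d in d_gt) / len(d_gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction misses one of three ground-truth points.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0)]

cd = chamfer_distance(pred, gt)
fs = f_score(pred, gt, threshold=0.2)
```

The brute-force nearest-neighbor search is quadratic in the number of points, which is fine for a toy example; evaluation on dense point clouds would normally use a spatial index or a GPU implementation.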

Qualitative Comparison — Generation with Image Condition

Each row shows the unposed input parts, the reference image used to condition generation, CRAG’s generated complete shape, and the ground truth. Parts in the Condition column carry distinct macaron colors so individual fragments are easy to track.

Columns: Condition, Image, Generation, Ground Truth.

Qualitative Comparison — Generation without Image Condition

The layout uses the same four columns, but here CRAG receives no reference image, so generation is driven purely by the part-level evidence in the Condition column. The Image column is therefore marked N/A.

Columns: Condition, Image (N/A), Generation, Ground Truth.

BibTeX

@inproceedings{jiang2026CRAG,
 title={CRAG: Can 3D Generative Models Help 3D Assembly?},
 author={Zeyu Jiang and Sihang Li and Siqi Tan and Chenyang Xu and Juexiao Zhang and Julia Galway-Witham and Xue Wang and Scott A. Williams and Radu Iovita and Chen Feng and Jing Zhang},
 year={2026},
 booktitle={International Conference on Machine Learning (ICML)}
}

Acknowledgements

This work was supported in part by NSF Grants 2152565, 2238968, and 2514030, and by NYU IT High Performance Computing resources, services, and staff expertise. This research was also supported by the NVIDIA Academic Grant Program using NVIDIA RTX 6000 Ada GPUs.