📝 Representative Publications

3D/4D Perception & Understanding

  • 3D/4D Perception Foundation Models: Orient Anything v2 (NeurIPS 2025), Orient Anything (ICML 2025), Prior Depth Anything
  • 3D/4D Understanding MLLMs: SpatialCLIP (CVPR 2025), Chat-Scene (NeurIPS 2024), Chat-3D (NAACL 2023)
  • Unified Multimodal Representations: C-MCR (NeurIPS 2023), Ex-MCR (NeurIPS 2024), FreeBind (ICML 2024), OmniBind (ICLR 2025)

Generative World Models

  • World Model: Post-training for World Model (in progress)
  • 3D-aware Visual Generation: SpatialHand, GenSpace (NeurIPS 2025)

I am currently most interested in the synergy between perception and generation for spatial intelligence. I am exploring: (1) employing 3D/4D perception foundation models as reward models to enhance content generation, and (2) using generative models to produce imaginative content that aids 3D/4D perception. A minimal sketch of the first direction is shown below.
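As a rough illustration of the first direction, the sketch below reranks generated candidates with a frozen perception model used as a reward signal (best-of-N selection). Everything here (`ToyGenerator`, `ToyPerceptionRewardModel`, the scoring rule) is a hypothetical placeholder for the idea, not any released model or my actual training pipeline.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Hypothetical generator that proposes candidate images for a prompt."""
    def forward(self, prompt_embedding: torch.Tensor, n_samples: int) -> torch.Tensor:
        # Placeholder: return n random images instead of real samples.
        return torch.rand(n_samples, 3, 256, 256)

class ToyPerceptionRewardModel(nn.Module):
    """Hypothetical frozen perception model that scores geometric plausibility."""
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Placeholder: one scalar score per image (e.g., depth/orientation consistency).
        return images.mean(dim=(1, 2, 3))

@torch.no_grad()
def best_of_n(generator, reward_model, prompt_embedding, n_samples=8):
    """Rerank generated candidates with a perception-based reward (best-of-N)."""
    candidates = generator(prompt_embedding, n_samples)
    rewards = reward_model(candidates)
    best = rewards.argmax()
    return candidates[best], rewards[best]

if __name__ == "__main__":
    gen, rm = ToyGenerator(), ToyPerceptionRewardModel()
    image, reward = best_of_n(gen, rm, torch.zeros(1, 512))
    print(image.shape, float(reward))
```

The same reward could instead drive RL-style post-training of the generator rather than simple reranking; the interface stays the same.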

arXiv 2025
  • Depth Anything with Any Prior. Zehan Wang, Siyu Chen, Lihe Yang, Jialei Wang, Ziang Zhang, Hengshuang Zhao, Zhou Zhao. arXiv, 2025.
    A state-of-the-art zero-shot depth estimation model that can integrate any form of depth measurement as a prior.
NeurIPS 2023
  • Connecting Multi-modal Contrastive Representations. Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao. NeurIPS, 2023.
    Learning multimodal contrastive representations without requiring paired data.
