- [2026.03] Our paper is available on arXiv!
- [2026.03] Code and data will be released once the base model is released. Stay tuned!
Production-ready human video generation requires digital actors to maintain strictly consistent full-body identities across dynamic shots, viewpoints, and motions, a setting that remains challenging for existing methods. Prior methods often exhibit face-centric behavior that neglects body-level consistency, or produce copy-paste artifacts in which subjects appear rigid due to pose locking. We present Actor-18M, a large-scale human video dataset designed to capture identity consistency under unconstrained viewpoints and environments. Actor-18M comprises 1.6M videos with 18M corresponding human images, covering both arbitrary views and canonical three-view representations. Leveraging Actor-18M, we propose WildActor, a framework for any-view conditioned human video generation. We introduce an Asymmetric Identity-Preserving Attention (AIPA) mechanism coupled with a Viewpoint-Adaptive Monte Carlo Sampling strategy. Evaluated on the proposed Actor-Bench, WildActor consistently preserves full-body identity under diverse shot compositions, large viewpoint transitions, and substantial motions, surpassing existing methods in these challenging settings.
- Release inference code and pre-trained weights.
- Release the Actor-18M dataset building code.
Inference scripts and detailed usage guidelines will be provided upon the release of the pre-trained weights.
If you find our work helpful, please consider citing our paper:
@article{guo2026wildactor,
  title={WildActor: Unconstrained Identity-Preserving Video Generation},
  author={Guo, Qin and Yang, Tianyu and He, Xuanhua and Shen, Fei and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Xu, Dan},
  journal={arXiv preprint arXiv:2603.00586},
  year={2026}
}