CamI2V: Camera-Controlled Image-to-Video Diffusion Model

🎥 Gallery

rightward rotation and zoom in (CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)	leftward rotation and zoom in (CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)

zoom in and upward movement (CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)	downward movement and zoom-out (CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)

🌟 News

🔥 25/07/12: Released the model and evaluation code of RealCam-I2V (DynamiCrafter-based), for reproducing and comparing against the results reported in our paper. For the DiT-based version (e.g., CogVideoX), please refer to RealCam-I2V.
🔥 25/06/26: RealCam-I2V has been accepted to ICCV 2025! 🎉🎉
🔥 25/03/26: Released our dataset RealCam-Vid v1 for metric-scale camera-controlled video generation!
🔥 25/03/17: Uploaded the test metadata used in our paper to make evaluation easier.
🔥 25/02/15: Released a demo of RealCam-I2V for real-world applications.
🔥 25/01/12: Released the CamI2V (512x320, 100k) checkpoint with longer training.
🔥 25/01/02: Released the CamI2V (512x320, 50k) checkpoint, which is suitable for research purposes and comparison.
🔥 24/12/23: Released the CamI2V (256x256, 50k) checkpoint.
🔥 24/12/16: Released reproduced, non-official MotionCtrl (256x256, 50k) and CameraCtrl (256x256, 50k) checkpoints built on DynamiCrafter.
🔥 24/12/09: Released training configs and scripts.
🔥 24/12/06: Released dataset pre-processing code for RealEstate10K.
🔥 24/12/02: Released evaluation code for RotErr, TransErr, CamMC and FVD.
🌱 24/11/16: Released the model code of CamI2V, including implementations of MotionCtrl and CameraCtrl.

📈 Performance

All results are measured at 256x256 resolution with 50k training steps, 25 DDIM steps, text-image CFG 7.5 and camera CFG 1.0 (i.e., no camera CFG).

Method	RotErr↓	TransErr↓	CamMC↓	FVD↓ (VideoGPT)	FVD↓ (StyleGAN)
DynamiCrafter	3.3415	9.8024	11.625	106.02	92.196
MotionCtrl	0.8636	2.5068	2.9536	70.820	60.363
CameraCtrl	0.7064	1.9379	2.3070	66.713	57.644
CamI2V	0.4120	1.3409	1.5291	62.439	53.361
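RotErr, TransErr and CamMC compare the camera poses estimated from generated videos (via structure-from-motion; the evaluation folder mentions COLMAP and GLOMAP) against the ground-truth poses. The sketch below shows one common way to compute them from per-frame poses, assuming RotErr is the summed relative rotation angle in radians, TransErr the summed Euclidean distance between translations, and CamMC the summed norm of the difference of the 3x4 [R|t] matrices. It is not the repo's evaluation code: it skips the pose alignment and scale normalization that SfM outputs usually need, and the function name `camera_metrics` is hypothetical.

```python
import numpy as np


def camera_metrics(R_gen, t_gen, R_gt, t_gt):
    """Sketch of RotErr, TransErr and CamMC summed over N frames.

    R_*: arrays of shape (N, 3, 3); t_*: arrays of shape (N, 3).
    Assumption: both trajectories are already in the same coordinate
    frame and scale, so no alignment is performed here.
    """
    R_gen, t_gen = np.asarray(R_gen, float), np.asarray(t_gen, float)
    R_gt, t_gt = np.asarray(R_gt, float), np.asarray(t_gt, float)

    # RotErr: geodesic angle of R_gen @ R_gt^T for each frame, summed.
    rel = np.einsum("nij,nkj->nik", R_gen, R_gt)
    cos = (np.trace(rel, axis1=1, axis2=2) - 1.0) / 2.0
    rot_err = np.arccos(np.clip(cos, -1.0, 1.0)).sum()

    # TransErr: Euclidean distance between translation vectors, summed.
    trans_err = np.linalg.norm(t_gen - t_gt, axis=1).sum()

    # CamMC: Frobenius norm of the [R|t] difference for each frame, summed.
    P_gen = np.concatenate([R_gen, t_gen[:, :, None]], axis=2)
    P_gt = np.concatenate([R_gt, t_gt[:, :, None]], axis=2)
    cam_mc = np.linalg.norm(P_gen - P_gt, axis=(1, 2)).sum()

    return float(rot_err), float(trans_err), float(cam_mc)
```

For example, if one frame of a two-frame trajectory is rotated by 90° about the z-axis and shifted by (3, 4, 0), this gives RotErr = π/2, TransErr = 5 and CamMC = √29.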

Inference Speed and GPU Memory

Method	# Parameters (+ = added to DynamiCrafter)	GPU Memory	Generation Time (RTX 3090)
DynamiCrafter	1.4 B	11.14 GiB	8.14 s
MotionCtrl	+ 63.4 M	11.18 GiB	8.27 s
CameraCtrl	+ 211 M	11.56 GiB	8.38 s
CamI2V	+ 261 M	11.67 GiB	10.3 s
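To read the table in relative terms, the short script below computes each method's overhead over the DynamiCrafter baseline using only the numbers above. The `+` entries are treated as parameters added on top of the 1.4 B base model; since 1.4 B is itself rounded, the parameter percentages are approximate.

```python
# Numbers copied from the table above (RTX 3090).
BASE_PARAMS_M = 1400.0  # DynamiCrafter, 1.4 B parameters, in millions
BASE_MEM_GIB = 11.14
BASE_TIME_S = 8.14

# method: (added parameters in M, GPU memory in GiB, generation time in s)
METHODS = {
    "MotionCtrl": (63.4, 11.18, 8.27),
    "CameraCtrl": (211.0, 11.56, 8.38),
    "CamI2V": (261.0, 11.67, 10.3),
}


def overhead(added_params_m, mem_gib, time_s):
    """Return (params, memory, time) overheads in percent relative to the base model."""
    return (
        100.0 * added_params_m / BASE_PARAMS_M,
        100.0 * (mem_gib - BASE_MEM_GIB) / BASE_MEM_GIB,
        100.0 * (time_s - BASE_TIME_S) / BASE_TIME_S,
    )


for name, stats in METHODS.items():
    p, m, t = overhead(*stats)
    print(f"{name:<11} params +{p:5.1f}%  memory +{m:4.1f}%  time +{t:5.1f}%")
```

For CamI2V this works out to roughly +18.6% parameters, +4.8% GPU memory and +26.5% generation time.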

⚙️ Environment

Quick Start

apt install -y libgl1-mesa-glx libgl1-mesa-dri xvfb  # for Ubuntu
yum install -y mesa-libGL mesa-dri-drivers Xvfb  # for CentOS
conda create -n cami2v python=3.10
conda activate cami2v

conda install -y libstdcxx-ng=12 -c conda-forge
conda install -y pytorch==2.4.1 torchvision==0.19.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y xformers -c xformers
pip install -r requirements.txt
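After installation, a quick sanity check is to confirm that the key packages are present and that PyTorch can see a GPU. The helper below is a minimal sketch, not part of the repo; the package list is an assumption based on the install commands above, and `check_env` is a hypothetical name.

```python
import importlib
import importlib.metadata


def check_env(packages=("torch", "torchvision", "xformers")):
    """Report installed versions of the given packages and CUDA availability."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            report[pkg] = None  # package not installed in this environment
    cuda = None  # unknown unless torch is installed
    if report.get("torch"):
        torch = importlib.import_module("torch")
        cuda = torch.cuda.is_available()
    report["cuda_available"] = cuda
    return report


if __name__ == "__main__":
    for key, value in check_env().items():
        print(f"{key}: {value}")
```

If the conda commands above succeeded, torch should report 2.4.1, torchvision 0.19.1, and `cuda_available` should be True on a GPU machine.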

💫 Inference

Download Model Checkpoints

Model	Resolution	Training Steps
RealCam-I2V	512x320	50k
RealCam-I2V	256x256	50k
CamI2V	512x320	50k, 100k
CamI2V	256x256	50k
CameraCtrl	256x256	50k
MotionCtrl	256x256	50k

We release 256x256 checkpoints (50k training steps) of the DynamiCrafter-based RealCam-I2V, CamI2V, CameraCtrl and MotionCtrl, which are suitable for research purposes and comparison.

We also release 512x320 checkpoints of RealCam-I2V and CamI2V, enabling higher-resolution and more advanced camera-controlled video generation.

Download the checkpoints above and put them under the ckpts folder. If you store them elsewhere, edit ckpt_path in configs/models.json accordingly.

Download Depth Anything V2 (metric version) and put it under the pretrained_models folder; it is used for metric depth estimation.

Download Qwen2-VL and put it under the pretrained_models folder; it is used for image captioning in the Gradio video generation demo. The AWQ-quantized version is preferred for its speed and lower GPU memory usage.

Run Gradio Demo

python cami2v_gradio_app.py --use_qwenvl_captioner  # for cami2v
python realcami2v_gradio_app.py --use_qwenvl_captioner  # for realcam-i2v

If Gradio fails to establish a network connection, retry with --use_host_ip.

🚀 Training

Prepare Dataset

Follow the instructions in the datasets folder of this repo to download the RealEstate10K dataset and pre-process the necessary items, such as video_clips and valid_metadata.

Download Pretrained Models

Download the pretrained weights of the base model DynamiCrafter (256x256 and 512x320) and put them under the pretrained_models folder as follows:

─┬─ pretrained_models/
 ├─┬─ DynamiCrafter/
 │ └─── model.ckpt
 └─┬─ DynamiCrafter_512/
   └─── model.ckpt
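Before launching training, you can check that the base-model weights sit where the tree above expects them. The helper below is hypothetical (not in the repo) and only checks the two paths shown in the tree, relative to the repo root.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the repo root.
EXPECTED = [
    Path("pretrained_models/DynamiCrafter/model.ckpt"),      # 256x256 base model
    Path("pretrained_models/DynamiCrafter_512/model.ckpt"),  # 512x320 base model
]


def missing_weights(root="."):
    """Return the expected checkpoint paths that are not files under root."""
    root = Path(root)
    return [p.as_posix() for p in EXPECTED if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All DynamiCrafter weights found.")
```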

Launch

Start training by passing a config YAML to the --base argument of main/trainer.py. Example training configs are provided in the configs folder. The command below launches 8 processes (one per GPU) on a single node.

torchrun --standalone --nproc_per_node 8 main/trainer.py --train \
    --logdir $(pwd)/logs \
    --base configs/training/003_cami2v_256x256.yaml \
    --name 256_CamI2V

🔧 Evaluation

We compute RotErr, TransErr and CamMC to evaluate camera controllability, and FVD to evaluate visual quality. The code, together with an installation guide for its requirements (including COLMAP and GLOMAP), is provided in the evaluation folder. Support for VBench is planned for the coming months.
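FVD is the Fréchet distance between Gaussians fitted to features of real and generated videos, with features taken from a pretrained I3D network. The sketch below computes only the distance part with NumPy and omits feature extraction; it does not reproduce either implementation behind the two FVD columns in the performance table. Instead of an explicit matrix square root (many implementations call scipy.linalg.sqrtm), it uses the fact that Tr((Σ_a Σ_b)^{1/2}) equals the sum of the square roots of the eigenvalues of Σ_a Σ_b. The name `frechet_distance` is hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_*: arrays of shape (num_videos, feature_dim), e.g. I3D features.
    """
    a = np.asarray(feats_a, dtype=np.float64)
    b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.cov(a, rowvar=False)
    cov_b = np.cov(b, rowvar=False)
    # Eigenvalues of cov_a @ cov_b are real and non-negative for PSD covariances;
    # clip tiny negative/imaginary parts caused by floating-point error.
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance of 0, and shifting every feature of one set by a constant c in d dimensions adds d·c² (the covariance term is unchanged).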

🤗 Related Repo

RealCam-I2V: https://github.com/ZGCTroy/RealCam-I2V

RealCam-Vid: https://github.com/ZGCTroy/RealCam-Vid

Depth Anything V2: https://github.com/DepthAnything/Depth-Anything-V2

CameraCtrl: https://github.com/hehao13/CameraCtrl

MotionCtrl: https://github.com/TencentARC/MotionCtrl

DynamiCrafter: https://github.com/Doubiiu/DynamiCrafter

🗒️ Citation

@article{zheng2024cami2v,
  title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
  author={Zheng, Guangcong and Li, Teng and Jiang, Rui and Lu, Yehao and Wu, Tao and Li, Xi},
  journal={arXiv preprint arXiv:2410.15957},
  year={2024}
}

@article{li2025realcam,
  title={RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control},
  author={Li, Teng and Zheng, Guangcong and Jiang, Rui and Zhan, Shuigen and Wu, Tao and Lu, Yehao and Lin, Yining and Li, Xi},
  journal={arXiv preprint arXiv:2502.10059},
  year={2025}
}

@article{zheng2025realcam,
  title={RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements},
  author={Zheng, Guangcong and Li, Teng and Zhou, Xianpan and Li, Xi},
  journal={arXiv preprint arXiv:2504.08212},
  year={2025}
}
