Releases: xorbitsai/inference
v2.5.0
What's new in 2.5.0 (2026-04-13)
These are the changes in inference v2.5.0.
New features
- feat(sglang): support qwen3.5 by @llyycchhee in #4763
- FEAT: reconnect and reconstruct model replicas after supervisor restart by @leslie2046 in #4731
- feat(audio): support qwen3-tts by @llyycchhee in #4781
- FEAT: [model] Qwen3-TTS-12Hz-1.7B-Base support by @llyycchhee in #4776
- FEAT: [model] Qwen3-TTS-12Hz-0.6B-Base support by @llyycchhee in #4777
- FEAT: [model] Qwen3-TTS-12Hz-1.7B-CustomVoice support by @llyycchhee in #4778
- FEAT: [model] Qwen3-TTS-12Hz-0.6B-CustomVoice support by @llyycchhee in #4779
- FEAT: [model] Qwen3-TTS-12Hz-1.7B-VoiceDesign support by @llyycchhee in #4780
- FEAT(webui): add localstorage management for model deploy configuration by @leslie2046 in #4739
- FEAT: [model] gemma-4 support by @qinxuye in #4768
Enhancements
- ENH: update model "DeepSeek-OCR" JSON by @amumu96 in #4751
- ENH: update 2 models JSON ("Ernie4.5", "qwen3.5") by @XprobeBot in #4754
- ENH: update model "DeepSeek-V3.2" JSON by @amumu96 in #4762
- ENH: update 2 models JSON ("Qwen3-ASR-0.6B", "Qwen3-ASR-1.7B") by @qinxuye in #4765
- ENH: auto-detect PyTorch CUDA version for virtual environment setup by @qinxuye in #4766
- ENH: update model "jina-embeddings-v4" JSON by @qinxuye in #4775
- ENH: Optimize worker details for deployment progress tooltip. by @leslie2046 in #4746
- ENH: update model "qwen3.5" JSON by @llyycchhee in #4782
- ENH: update 2 models JSON ("Kokoro-82M-v1.1-zh", "Kokoro-82M") by @qinxuye in #4795
- ENH: update model "gemma-3-it" JSON by @qinxuye in #4794
- ENH: update models JSON [llm] by @XprobeBot in #4796
- ENH: add lightweight heartbeat mechanism for worker liveness detection by @qinxuye in #4785
- ENH: update model "ChatTTS" JSON by @qinxuye in #4793
- bld: Fix the front-end UI access issue for aarch64 image by @zwt-1234 in #4743
- bld: Fix the front-end UI access issue for aarch64 image by @zwt-1234 in #4749
- bld: Fix the front-end UI access issue by @zwt-1234 in #4758
- BLD: limit gptqmodel installation to specified version by @zwt-1234 in #4798
Bug fixes
- fix: use constant-time comparison for auth credentials (CWE-208) by @spidershield-contrib in #4734
- bug: fix qwen3 reranker vllm precision by @ZhikaiGuo960110 in #4747
- fix: add variable to control template for Qwen3 Reranker Family by @ZhikaiGuo960110 in #4752
- BUG: Fix Qwen3.5 wrong tag in streaming API by @la1ty in #4759
- BUG: Fix Jinja template error for models using {% break %} tag (e.g. Kimi K2.5) by @amumu96 in #4770
- BUG: fix qwen3-vl embedding model for vllm engine by @llyycchhee in #4783
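The constant-time comparison fix above (#4734) addresses CWE-208, an observable timing discrepancy: a naive `==` on strings returns as soon as a byte differs, which can leak how much of a secret matched. A minimal sketch of the technique using the standard library (the actual credential-checking code in Xinference differs):

```python
import hmac

def credentials_match(supplied: str, expected: str) -> bool:
    # hmac.compare_digest takes time independent of where the inputs
    # first differ, so an attacker cannot probe the secret byte by byte.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```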
Others
- Fix #4597: [Bug] v2.0.0 Docker image: ImportError (circular import) a... by @JiwaniZakir in #4757
New Contributors
- @spidershield-contrib made their first contribution in #4734
- @JiwaniZakir made their first contribution in #4757
Full Changelog: v2.4.0...v2.5.0
v2.4.0
What's new in 2.4.0 (2026-03-29)
These are the changes in inference v2.4.0.
New features
- FEAT: introducing OTEL by @leslie2046 in #4666
- FEAT: [UI] add Xagent link by @yiboyasss in #4693
- FEAT: [UI] remove featured/all toggle and prioritize featured models by @yiboyasss in #4694
- feat(vllm): support v0.18.0 by @llyycchhee in #4718
- FEAT: add gpu load metrics by @leslie2046 in #4712
- feat: Upgrade the base image to version 0.17.1 and add support for aarch64 version images by @zwt-1234 in #4726
- feat(ci): fix aarch64 build by @zwt-1234 in #4735
Enhancements
- ENH: update model "qwen3.5" JSON by @qinxuye in #4689
- ENH: update model "qwen3.5" JSON by @llyycchhee in #4707
- ENH: update models JSON [llm] by @XprobeBot in #4710
- ENH: update models JSON [llm] by @XprobeBot in #4713
- enh: adapt normalize param of vllm>0.16.0 for embedding models. by @la1ty in #4729
- BLD: Requirements dependency version adjustment by @zwt-1234 in #4736
- bld: Requirements dependency version adjustment by @zwt-1234 in #4737
- bld: Requirements dependency version adjustment by @zwt-1234 in #4738
- REF: parallelize supervisor model registration listing by @leslie2046 in #4690
Bug fixes
- BUG: Fix async client FormData handling and response lifecycle issues by @qinxuye in #4687
- BUG: MLX backend accumulates intermediate generation steps into final output (tested on 1.17.0, 2.0.0, 2.1.0) #4615 by @nasircsms in #4617
- fix(worker): inject parent site-packages into child venv via .pth file by @nasircsms in #4692
- BUG: launch multi gpu qwen3.5 error by @llyycchhee in #4700
- fix(tool_call): add qwen3.5 by @llyycchhee in #4703
- fix(qwen3.5): support tool calls by @llyycchhee in #4709
- FIX: qwen3.5 reasoning parse by @llyycchhee in #4719
- fix(qwen3.5): support XML-like tool call format in non-streaming mode by @amumu96 in #4715
- FIX: webui crash when gpu_utilization is none by @leslie2046 in #4728
New Contributors
- @nasircsms made their first contribution in #4617
- @octo-patch made their first contribution in #4704
- @la1ty made their first contribution in #4729
Full Changelog: v2.3.0...v2.4.0
v2.3.0
What's new in 2.3.0 (2026-03-13)
These are the changes in inference v2.3.0.
New features
- FEAT: support qwen-3.5 for vllm by @llyycchhee in #4656
- FEAT: add seed and repetition_penalty parameters for precision test by @ZhikaiGuo960110 in #4684
- FEAT: [model] qwen2-audio removed by @ZhikaiGuo960110 in #4683
Enhancements
- ENH: update 2 models JSON ("qwen3.5", "glm-5") by @llyycchhee in #4655
- ENH: update model "MiniMax-M2.5" JSON by @llyycchhee in #4663
- ENH: update model "qwen3.5" JSON by @llyycchhee in #4661
- ENH: update model "qwen3.5" JSON by @Jun-Howie in #4672
- ENH: update 2 models JSON ("glm-5", "Kimi-K2.5") by @llyycchhee in #4662
- ENH: update models JSON [llm] by @XprobeBot in #4682
- ENH: support transformers for qwen 3.5 by @llyycchhee in #4685
- ENH: update models JSON [llm] by @XprobeBot in #4686
- BLD: [CI] fix windows runner SSL can't found by @llyycchhee in #4627
- REF: Implement REST API dependency injection and response handling by @amumu96 in #4620
- REF: extract require_model utility to reduce code duplication by @amumu96 in #4677
Bug fixes
- BUG: fix error WorkerWrapperBase.__init__() got multiple values for argument 'rpc_rank' by @llyycchhee in #4649
- BUG: fix vLLM embedding check for qwen3-vl-embedding by @ace-xc in #4647
- FIX: update the QR code URL by @yiboyasss in #4668
- BUG: fix chat for multiple gpus by @llyycchhee in #4671
- BUG: [UI] initialize formData with default values from modelFormConfig. by @yiboyasss in #4678
- BUG: fix qwen 3.5 vllm since no generation_config.json exists by @llyycchhee in #4681
Documentation
- DOC: add v2.2.0 release notes by @qinxuye in #4643
- DOC: add missing periods in docstrings by @Jah-yee in #4669
Full Changelog: v2.2.0...v2.3.0
v2.2.0
What's new in 2.2.0 (2026-02-28)
These are the changes in inference v2.2.0.
New features
- FEAT: Add DeepSeek V3.1 tool parser and update chat template (#4611) by @amumu96 in #4619
- FEAT: [model] Kimi-K2.5 support by @XprobeBot in #4631
- FEAT: [model] MiniMax-M2.5 support by @XprobeBot in #4630
- FEAT: [model] glm-5 support by @llyycchhee in #4638
- FEAT: [model] qwen3.5 support by @llyycchhee in #4639
- FEAT: support glm-5 and kimi-k2.5 for vllm by @llyycchhee in #4642
Bug fixes
- BUG: Fix create_image_edits method for multiple files handling by @lazariv in #4610
- BUG: replace 55 bare excepts with except Exception by @haosenwang1018 in #4624
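The bare-except cleanup above (#4624) matters because a bare `except:` also traps `KeyboardInterrupt` and `SystemExit`, which inherit from `BaseException` rather than `Exception`, making a server hard to stop cleanly. A minimal before/after sketch (the function here is illustrative, not code from the PR):

```python
def parse_port(value: str, default: int = 9997) -> int:
    # Before: `except:` would also swallow KeyboardInterrupt/SystemExit.
    # Catching Exception (or, better still, the specific ValueError)
    # leaves those control-flow signals free to propagate.
    try:
        return int(value)
    except Exception:
        return default
```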
New Contributors
- @haosenwang1018 made their first contribution in #4624
Full Changelog: v2.1.0...v2.2.0
v2.1.0
What's new in 2.1.0 (2026-02-14)
These are the changes in inference v2.1.0.
New features
- FEAT: [model] GLM-4.7 support by @Jun-Howie in #4565
- FEAT: [model] MinerU2.5-2509-1.2B removed by @OliverBryant in #4568
- FEAT: [model] GLM-4.7-Flash support by @OliverBryant in #4578
- FEAT: [model] Qwen3-ASR-0.6B support by @leslie2046 in #4579
- FEAT: [model] Qwen3-ASR-1.7B support by @leslie2046 in #4580
- FEAT: added support for qwen3-asr models by @leslie2046 in #4581
- FEAT: [model] MinerU2.5-2509-1.2B support by @GaoLeiA in #4569
- FEAT: [model] FLUX.2-klein-4B support by @lazariv in #4602
- FEAT: [model] FLUX.2-klein-9B support by @lazariv in #4603
- FEAT: Add support for FLUX.2-Klein-9B and -4B models by @lazariv in #4596
Enhancements
- ENH: update model "DeepSeek-V3.2" JSON by @OliverBryant in #4563
- ENH: update model "DeepSeek-V3.2-Exp" JSON by @OliverBryant in #4567
- ENH: update models JSON [image] by @XprobeBot in #4606
- BLD: constrain setuptools<82 in Docker images by @qinxuye in #4607
- REF: extract Pydantic request schemas from restful_api.py into xinference/api/schemas/ by @amumu96 in #4598
- REF: extract route registration into domain-specific routers/ by @amumu96 in #4600
Bug fixes
- BUG: vllm embedding model error by @OliverBryant in #4562
- BUG: vllm reranker score error by @OliverBryant in #4573
- BUG: handle async tokenizer in vllm core by @ace-xc in #4577
- BUG: vllm reranker model gpu release error by @OliverBryant in #4575
Others
- BUG: setuptools CI error by @OliverBryant in #4595
New Contributors
- @ace-xc made their first contribution in #4577
- @GaoLeiA made their first contribution in #4569
- @lazariv made their first contribution in #4602
Full Changelog: v2.0.0...v2.1.0
v2.0.0
What's new in 2.0.0 (2026-01-31)
These are the changes in inference v2.0.0.
New features
- FEAT: add video gguf cache_manager.py by @OliverBryant in #4462
- FEAT: [model] Qwen3-VL-Embedding-2B support by @OliverBryant in #4469
- FEAT: [UI] move featured to backend API data-driven; remove frontend hardcoding. by @yiboyasss in #4466
- FEAT: [model] Qwen3-VL-Reranker-8B support by @OliverBryant in #4472
- FEAT: llm cache config in model json to skip unnecessary downloads by @OliverBryant in #4480
- FEAT: [UI] add official website and model hub links. by @yiboyasss in #4493
- FEAT: add custom llm models config json analysis by @OliverBryant in #4478
- FEAT: [model] MinerU2.5-2509-1.2B support by @leslie2046 in #4510
- FEAT: Introduce MinerU 2.5 OCR model. by @leslie2046 in #4511
- FEAT: add chat_template.jinja support by @OliverBryant in #4526
- FEAT: support engines for virtualenv by @OliverBryant in #4497
- FEAT: [model] Z-Image support by @OliverBryant in #4546
- FEAT: [model] GLM-4.6 support by @Jun-Howie in #4525
- FEAT: [model] Qwen3-VL-Embedding-8B support by @OliverBryant in #4470
- FEAT: [UI] use browser locale as default language. by @yiboyasss in #4539
- FEAT: [model] Qwen3-VL-Reranker-2B support by @OliverBryant in #4471
Enhancements
- ENH: update 3 models JSON ("HunyuanVideo", "gme-Qwen2-VL-7B-Instruct", "gme-Qwen2-VL-2B-Instruct") by @OliverBryant in #4464
- ENH: update models JSON [embedding, image, llm, video] by @XprobeBot in #4463
- ENH: update models JSON [llm] by @XprobeBot in #4490
- ENH: update model "Fun-ASR-Nano-2512" JSON by @leslie2046 in #4496
- ENH: update model "Fun-ASR-MLT-Nano-2512" JSON by @leslie2046 in #4498
- ENH: update model "Qwen3-VL-Embedding-2B" JSON by @OliverBryant in #4503
- ENH: update models JSON [embedding, image, llm, rerank] by @XprobeBot in #4524
- ENH: update models JSON [embedding, image, llm, rerank] by @XprobeBot in #4534
- ENH: update model "Qwen3-VL-Embedding-2B" JSON by @OliverBryant in #4552
- BLD: remove Dockerfile for version CU12.4 by @zwt-1234 in #4487
- REF: [UI] remove featureModels array. by @yiboyasss in #4488
Bug fixes
- BUG: fix has_musa_device error by @OliverBryant in #4477
- BUG: [xavier] fix xavier hash function to ensure prefix cache hit by @llyycchhee in #4482
- BUG: image/audio/video download hub exclude modelscope by @OliverBryant in #4483
- BUG: [UI] historical parameter backfill bug. by @yiboyasss in #4479
- BUG: deepseek ocr markdown bug by @OliverBryant in #4491
- BUG: new vllm version cannot launch embedding models by @OliverBryant in #4489
- BUG: Failed to download model 'Fun-ASR-MLT-Nano-2512' after multiple retries by @leslie2046 in #4537
- BUG: transformers version < 5.0.0 by @OliverBryant in #4553
- BUG: cache manager makedirs initialized only once to prevent downloads from getting stuck by @llyycchhee in #4551
Documentation
- DOC: add v1.17.0 release note by @qinxuye in #4467
- DOC: add limitations for Xavier by @ZhikaiGuo960110 in #4486
- DOC: add v2.0 doc by @OliverBryant in #4545
- DOC: add cudnn/nccl/cusparselt error solution in virtualenv's doc by @OliverBryant in #4556
Others
- feat: Upgrade the vllm base image to version 0.13.0 by @zwt-1234 in #4522
- CHORE: modify copyright by @OliverBryant in #4494
Full Changelog: v1.17.0...v2.0.0
v1.17.1
v1.17.1 is a hotfix version of v1.17.0
Full Changelog: v1.17.0...v1.17.1
v1.17.0
What's new in 1.17.0 (2026-01-10)
These are the changes in inference v1.17.0.
New features
- FEAT: add enable_thinking kwarg support by @OliverBryant in #4423
- FEAT: Support MThreads (MUSA) GPU by @yeahdongcn in #4425
- FEAT: support distributed model launch for vllm version>=v0.11.0 by @OliverBryant in #4428
- FEAT: [model] Qwen-Image-Edit-2511 support by @OliverBryant in #4427
- FEAT: add minimax tool call support by @OliverBryant in #4434
- FEAT: [model] Qwen-Image-2512 support by @OliverBryant in #4435
- FEAT: support auto batch for sentence_transformers rerank by @llyycchhee in #4429
- FEAT: add multi engines for ocr && deepseek ocr mlx support by @OliverBryant in #4437
- FEAT: add fp4 support by @OliverBryant in #4450
- FEAT: add video gguf support by @OliverBryant in #4458
- FEAT: add multi engines for image model by @OliverBryant in #4446
Enhancements
- ENH: update 4 models JSON ("Deepseek-V3.1", "deepseek-r1-0528", "deepseek-r1-0528-qwen3", ... +1 more) by @OliverBryant in #4445
- ENH: update model "DeepSeek-OCR" JSON by @OliverBryant in #4444
- ENH: support vllm mtp & rope scaling by @ZhikaiGuo960110 in #4454
Bug fixes
- BUG: fix empty cache for vllm embedding & rerank by @ZhikaiGuo960110 in #4422
- BUG: Selecting the same worker repeatedly by @OliverBryant in #4447
- BUG: fix vllm ocr model cannot stop by @OliverBryant in #4460
- BUG: Models being downloaded cannot be canceled. by @OliverBryant in #4461
Documentation
- DOC: update new models and release notes for v1.16.0 by @qinxuye in #4416
- DOC: update docker docs by @qinxuye in #4419
- DOC: vLLM + Torch + Xinference Compatibility Issue by @qiulang in #4442
New Contributors
- @yeahdongcn made their first contribution in #4425
Full Changelog: v1.16.0...v1.17.0
v1.16.0
What's new in 1.16.0 (2025-12-27)
These are the changes in inference v1.16.0.
New features
- FEAT: [model] DeepSeek-V3.2-Exp support by @Jun-Howie in #4374
- FEAT: Add vLLM backend support for DeepSeek-V3.2 by @Jun-Howie in #4377
- FEAT: Add vLLM backend support for DeepSeek-V3.2-Exp. by @Jun-Howie in #4375
- FEAT: vacc support by @ZhikaiGuo960110 in #4382
- FEAT: support vlm for vacc by @ZhikaiGuo960110 in #4385
- FEAT: [model] Fun-ASR-Nano-2512 support by @leslie2046 in #4397
- FEAT: [model] Qwen-Image-Layered support by @OliverBryant in #4395
- FEAT: [model] Fun-ASR-MLT-Nano-2512 support by @leslie2046 in #4398
- FEAT: continuous batching support for MLX chat models by @qinxuye in #4403
- FEAT: Add the architectures field for llm model launch by @OliverBryant in #4405
- FEAT: [UI] image models support configuration via environment variables and custom parameters. by @yiboyasss in #4413
- FEAT: support rerank async batch by @llyycchhee in #4414
- FEAT: Support VLLM backend for MiniMaxM2ForCausalLM by @Jun-Howie in #4412
Enhancements
- ENH: fix replica assignment so GPU indices are assigned contiguously by @ZhikaiGuo960110 in #4370
- ENH: update model "DeepSeek-V3.2" JSON by @Jun-Howie in #4381
- ENH: update model "glm-4.5" JSON by @OliverBryant in #4383
- ENH: update 2 models JSON ("glm-4.1v-thinking", "glm-4.5v") by @OliverBryant in #4384
- ENH: support torchaudio 2.9.0 by @llyycchhee in #4390
- ENH: update 3 models JSON ("llama-2-chat", "llama-3", "llama-3-instruct") by @OliverBryant in #4400
- ENH: update 4 models JSON ("llama-3.1", "llama-3.1-instruct", "llama-3.2-vision-instruct", ... +1 more) by @OliverBryant in #4401
- ENH: update model "jina-embeddings-v3" JSON by @XprobeBot in #4404
- ENH: update models JSON [audio, embedding, image, llm, video] by @XprobeBot in #4407
- ENH: update models JSON [audio, image] by @XprobeBot in #4408
- ENH: update model "Z-Image-Turbo" JSON by @OliverBryant in #4409
- ENH: update 2 models JSON ("DeepSeek-V3.2", "DeepSeek-V3.2-Exp") by @Jun-Howie in #4392
- ENH: update models JSON [llm] by @XprobeBot in #4415
- BLD: remove python 3.9 support by @OliverBryant in #4387
- BLD: Update Dockerfile to 12.9 to use VLLM v0.11.2 version by @zwt-1234 in #4393
Bug fixes
- BUG: fix PaddleOCR-VL output by @leslie2046 in #4368
- BUG: custom embedding and rerank model analysis error by @OliverBryant in #4367
- BUG: cannot launch model on cpu && multi workers launch error by @OliverBryant in #4361
- BUG: OCR API return is null && add doc for how to modify model_size by @OliverBryant in #4331
- BUG: fix n_gpu parameter by @OliverBryant in #4411
Full Changelog: v1.15.0...v1.16.0
v1.15.0
What's new in 1.15.0 (2025-12-13)
These are the changes in inference v1.15.0.
New features
- FEAT: added more detailed instructions for engine unavailability. by @OliverBryant in #4308
- FEAT: [model] Z-Image-Turbo support by @OliverBryant in #4333
- FEAT: [model] DeepSeek-V3.2 support by @Jun-Howie in #4344
- FEAT: [model] PaddleOCR-VL support by @leslie2046 in #4354
- FEAT: add llama_cpp json schema output by @OliverBryant in #4282
- FEAT: PaddleOCR-VL implementation by @leslie2046 in #4304
- FEAT: multi replicas on a single GPU && add launch strategy by @OliverBryant in #4358
Enhancements
- ENH: update models JSON [llm] by @XprobeBot in #4343
- ENH: update model "MiniMax-M2" JSON by @XprobeBot in #4342
- ENH: update models JSON [llm] by @XprobeBot in #4349
- ENH: support launching with --device cpu by @hubutui in #4352
- ENH: add glm-4.5 tool calls support && vllm StructuredOutputsParams support by @OliverBryant in #4357
Bug fixes
- BUG: fix manage cache models missing by @OliverBryant in #4329
- BUG: [llm, vllm]: support ignore eos by @ZhikaiGuo960110 in #4332
- BUG: Multimodal settings for video parameters are not taking effect. by @OliverBryant in #4338
- BUG: Soft links cannot be completely deleted by @OliverBryant in #4337
- BUG: Packages with identical names in virtual environments error by @OliverBryant in #4348
- BUG: Fix typo in xinference/deploy/docker/Dockerfile.cu128 by @hubutui in #4350
- BUG: custom embedding model register fail by @OliverBryant in #4335
- BUG: [UI] fix the bug in the copy function. by @yiboyasss in #4355
- BUG: [UI] control Select dropdown width to prevent it from becoming too wide. by @yiboyasss in #4356
Others
- Fixed workflow vulnerability by @barakharyati in #4328
- CHORE: add i18n for replica details by @leslie2046 in #4306
New Contributors
- @barakharyati made their first contribution in #4328
- @ZhikaiGuo960110 made their first contribution in #4332
- @hubutui made their first contribution in #4350
Full Changelog: v1.14.0...v1.15.0