
FEAT: [model] gemma-4 support#4768

Merged
qinxuye merged 26 commits into xorbitsai:main from qinxuye:chore/models-sync/user-qinxuye/gemma-4
Apr 12, 2026

Conversation

@qinxuye
Contributor

@qinxuye qinxuye commented Apr 5, 2026

Automated sync for model "gemma-4" (llm) by user qinxuye.

@XprobeBot XprobeBot added this to the v2.x milestone Apr 5, 2026
@qinxuye qinxuye force-pushed the chore/models-sync/user-qinxuye/gemma-4 branch 2 times, most recently from 5ef21fb to 7fab628 on April 10, 2026 15:19
qinxuye and others added 25 commits April 12, 2026 12:07
… support

- Add _sanitize_generate_config to MLXVisionModel to handle stop/stop_token_ids from model family
- Initialize reasoning parser in MLXVisionModel.load() for hybrid models
- Pass reasoning_parser to completion methods for proper content parsing
- Pass tokenizer to get_full_context for correct jinja template rendering
- Set enable_thinking=False as default for MLX models
- Enable stream by default for multimodal interface in Gradio UI
…abled

- Fix _force_virtualenv_engine_params: only include engines with matching specs;
  previously it fell back to all specs when matched_specs was empty
- Add Gemma4ForConditionalGeneration to VLLM_SUPPORTED_MULTI_MODEL_LIST for vllm >= 0.19.0
- VLLMMultiModel: fall back to tokenizer's chat_template when model_family.chat_template is empty
- SGLANGVisionModel: same logic as VLLMMultiModel, align with MLX core behavior
- SGLANGModel: use empty string instead of asserting when chat_template is empty

When chat_template is empty, engines will:
1. Get tokenizer and retrieve its chat_template attribute
2. If still empty, raise ValueError
3. Pass tokenizer to get_full_context for proper template application

This aligns the behavior across vllm, sglang, and mlx engines.
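The fallback steps above can be sketched as a small helper. Names here (`resolve_chat_template`, the plain string/tokenizer arguments) are illustrative assumptions, not the actual engine API:

```python
# Illustrative sketch of the shared fallback logic: prefer the model
# family's template, fall back to the tokenizer's own chat_template
# attribute, and fail loudly if neither exists.
def resolve_chat_template(model_family_template, tokenizer):
    template = model_family_template
    if not template:
        # Step 1: retrieve the tokenizer's chat_template attribute.
        template = getattr(tokenizer, "chat_template", None)
    if not template:
        # Step 2: if still empty, raise ValueError.
        raise ValueError("no chat template found for this model")
    # Step 3 (template application) happens later, when the tokenizer is
    # passed to get_full_context to render the jinja template.
    return template
```

Keeping this logic identical in all three engines is what makes a model with an empty `model_family.chat_template` behave the same regardless of backend.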
When thinking mode is enabled (enable_thinking=True), special tokens are
needed for the thinking/reasoning format. Set skip_special_tokens=False
in this case to preserve the special tokens.
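A minimal sketch of that default, assuming a plain dict-based generate config (the helper name is hypothetical):

```python
# Minimal sketch: when thinking mode is enabled, keep special tokens in the
# decoded output so the reasoning parser can see the delimiter tokens that
# mark the thinking/reasoning block.
def apply_thinking_defaults(generate_config):
    config = dict(generate_config)
    if config.get("enable_thinking"):
        # Only a default: an explicit skip_special_tokens from the caller
        # is left untouched.
        config.setdefault("skip_special_tokens", False)
    return config
```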
@qinxuye qinxuye force-pushed the chore/models-sync/user-qinxuye/gemma-4 branch from f36ea85 to 5fd9e51 on April 12, 2026 04:33
Contributor

@amumu96 amumu96 left a comment


LGTM

@qinxuye qinxuye merged commit d6d1007 into xorbitsai:main Apr 12, 2026
2 checks passed
@qinxuye qinxuye deleted the chore/models-sync/user-qinxuye/gemma-4 branch April 12, 2026 05:55


3 participants