[quantization] Implement tie_word_embeddings #624

@stamalakhov

Description

What

Some models, such as unsloth/Llama-3.2-3B-Instruct, use tie_word_embeddings, so the lm_head weights are just a clone of the input_embeddings. However, this flag is ignored during export: there is no known way to share the weights of input_embeddings and lm_head when exporting to circle (please correct me if I'm wrong). As a result, to fit within the current 2 GB size limit of a circle file (verified: flatbuffers fails with "cannot grow buffer beyond 2 gigabytes"), we have to quantize lm_head to 4 bits, which significantly decreases accuracy. It would be nice to simply share the quantized lm_head weights with input_embeddings in circle.
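For context, here is a minimal PyTorch sketch of what tie_word_embeddings means (the sizes and module names are illustrative, not the actual model or export code). After tying, both modules point at the same tensor, so the weight matrix is stored only once:

```python
import torch
import torch.nn as nn

# Illustrative sizes, much smaller than a real vocab/hidden dim.
vocab_size, hidden = 128, 16

embed = nn.Embedding(vocab_size, hidden)          # input_embeddings
lm_head = nn.Linear(hidden, vocab_size, bias=False)

# Tying: lm_head now shares the exact same storage as embed,
# instead of keeping an independent (vocab_size x hidden) copy.
lm_head.weight = embed.weight

# Both parameters reference the same underlying tensor.
print(lm_head.weight.data_ptr() == embed.weight.data_ptr())  # True
```

The issue is that this sharing is lost at export time: the circle file ends up with two independent copies of the (large) matrix, which is what pushes the file past the 2 GB flatbuffers limit.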
