[quantization] Implement tie_word_embeddings #624

@stamalakhov

Description

What

Some models, such as unsloth/Llama-3.2-3B-Instruct, use tie_word_embeddings, so the lm_head weights are just a clone of the input_embeddings. However, this flag is ignored during export: there is no known way to share the weights of input_embeddings and lm_head when exporting to circle (please correct me if I'm wrong). As a result, to fit within the current 2 GB size limit of a circle file (verified: flatbuffers fails with "cannot grow buffer beyond 2 gigabytes"), we have to quantize lm_head to 4 bits, which significantly decreases accuracy. It would be nice to simply share the quantized lm_head weights with input_embeddings in circle.
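For context, here is a minimal PyTorch sketch of what tie_word_embeddings means (the sizes and module names are illustrative, not the actual model or export code). After tying, both modules point at the same tensor, so the weight matrix is stored only once:

```python
import torch
import torch.nn as nn

# Illustrative sizes, much smaller than a real vocab/hidden dim.
vocab_size, hidden = 128, 16

embed = nn.Embedding(vocab_size, hidden)          # input_embeddings
lm_head = nn.Linear(hidden, vocab_size, bias=False)

# Tying: lm_head now shares the exact same storage as embed,
# instead of keeping an independent (vocab_size x hidden) copy.
lm_head.weight = embed.weight

# Both parameters reference the same underlying tensor.
print(lm_head.weight.data_ptr() == embed.weight.data_ptr())  # True
```

The issue is that this sharing is lost at export time: the circle file ends up with two independent copies of the (large) matrix, which is what pushes the file past the 2 GB flatbuffers limit.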
