What
Some models, like `unsloth/Llama-3.2-3B-Instruct`, use `tie_word_embeddings`, so the `lm_head` weights are just a clone of the `input_embeddings`. But during export this flag is ignored: I know of no way to share the weights of `input_embeddings` and `lm_head` when exporting to `circle` (please correct me if I'm wrong). So, to fit under the current 2 GiB size constraint of a `circle` file (checked: `flatbuffers: cannot grow buffer beyond 2 gigabytes`), we have to quantize `lm_head` to 4 bits, which decreases accuracy significantly. It would be nice to simply share the quantized `lm_head` weights with `input_embeddings` in `circle`.
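To make the size pressure concrete, here is a back-of-envelope sketch of what a duplicated `lm_head` tensor costs at different bit widths. The dimensions (`vocab_size = 128256`, `hidden_size = 3072`) are the usual Llama-3.2-3B values and are an assumption on my part, not taken from the export itself:

```python
# Rough cost of storing lm_head separately instead of sharing it
# with input_embeddings (Llama-3.2-3B dims assumed, not verified here).
vocab_size, hidden_size = 128_256, 3_072
params = vocab_size * hidden_size  # ~394M duplicated parameters

def size_gib(num_params: int, bits: int) -> float:
    """Tensor size in GiB for a given per-weight bit width."""
    return num_params * bits / 8 / 2**30

print(f"fp16: {size_gib(params, 16):.2f} GiB")  # ~0.73 GiB
print(f"int8: {size_gib(params, 8):.2f} GiB")   # ~0.37 GiB
print(f"int4: {size_gib(params, 4):.2f} GiB")   # ~0.18 GiB
```

So an untied `lm_head` kept at 16 bits adds roughly three quarters of a GiB on its own, which is why the export currently only fits under the flatbuffers 2 GiB limit after dropping it to 4 bits; sharing the tensor would remove the duplicate entirely.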