This commit removes redundant copy of key modules.
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
  )
  self.k_proj = PTQWrapper(
-     copy.deepcopy(fp_attn.k_proj), qcfg=k_cfg, fp_name=f"{fp_name}.k_proj"
+     fp_attn.k_proj, qcfg=k_cfg, fp_name=f"{fp_name}.k_proj"
Oh, I actually think the copy is needed, because without it this modifies the original model's weights.
But it's already modified by GPTQ.
The copy consumes a lot of memory with a balanced device map.
Ah, okay. Hmm.. I designed the wrapper to just refer to the original modules. But a memory issue could happen, as you said. Let's remove that overhead instead.
@mhs4670go
I mean the model is loaded, then its weights are modified in GPTQ.
> I mean the model is loaded, then its weights are modified in GPTQ.
Right. I just think that wrappers only wrap the original nodes, unlike GPTQ. Anyway, the current PR seems okay for reducing overhead.
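For readers following the thread, here is a minimal sketch of the trade-off being discussed, using plain Python objects rather than real model layers. `TinyLinear` and `TinyWrapper` are hypothetical stand-ins invented for illustration; they are not the project's `PTQWrapper` and make no claim about how it handles weights internally.

```python
import copy


class TinyLinear:
    """Hypothetical stand-in for a projection layer such as k_proj."""

    def __init__(self, weight):
        self.weight = weight  # plain list of floats standing in for a tensor


class TinyWrapper:
    """Hypothetical stand-in for a wrapper that only stores the module it is given."""

    def __init__(self, module):
        self.module = module


original = TinyLinear([0.5, -1.25, 2.0])

by_reference = TinyWrapper(original)            # pattern after this PR
by_copy = TinyWrapper(copy.deepcopy(original))  # pattern before this PR

# The reference wrapper shares storage; the copy holds a second set of weights,
# which is the extra memory mentioned in the review.
shares_storage = by_reference.module.weight is original.weight
copy_is_separate = by_copy.module.weight is not original.weight

# An in-place change made through the reference wrapper is visible on the
# original module, while the deep copy keeps the old values.
by_reference.module.weight[0] = 0.0
original_after = original.weight[0]
copy_after = by_copy.module.weight[0]
```

Under these assumptions, wrapping by reference avoids duplicating the weights but lets any in-place change reach the original module, which is the concern raised in the first comment.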
This commit removes redundant copy of key modules.
Copy was used for debugging, so we don't need it any more.
Draft: #570
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>