[quantization] Remove copy #606

Merged
mhs4670go merged 1 commit into Samsung:main from stamalakhov:rem_rednt_copy
Apr 3, 2026

@stamalakhov
Contributor

This commit removes the redundant copy of the key modules.
The copy was only used for debugging, so it is no longer needed.

Draft: #570
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>

@stamalakhov stamalakhov self-assigned this Apr 3, 2026
@stamalakhov stamalakhov requested a review from mhs4670go April 3, 2026 06:32
  )
  self.k_proj = PTQWrapper(
-     copy.deepcopy(fp_attn.k_proj), qcfg=k_cfg, fp_name=f"{fp_name}.k_proj"
+     fp_attn.k_proj, qcfg=k_cfg, fp_name=f"{fp_name}.k_proj"
Contributor

Oh, I actually think that it's needed because this modifies the original model's weight.
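
The concern above can be illustrated with a minimal sketch. It uses hypothetical stand-in classes (`TinyLinear`, `TinyWrapper`) and a made-up in-place rounding step; it is not TICO's `PTQWrapper` and does not claim to show what that wrapper actually does to weights. It only shows the general aliasing behavior: a wrapper built on `copy.deepcopy(module)` leaves the original untouched, while a wrapper built on the module itself shares its weights.

```python
import copy


class TinyLinear:
    """Hypothetical stand-in for a projection layer such as k_proj."""

    def __init__(self, weight):
        self.weight = weight  # a plain list used in place of a tensor


class TinyWrapper:
    """Hypothetical stand-in for a quantization wrapper (not the real PTQWrapper)."""

    def __init__(self, module):
        self.module = module  # stores a reference; no copy is made here

    def round_weights_in_place(self):
        # Illustrative in-place modification of the wrapped weights.
        for i, w in enumerate(self.module.weight):
            self.module.weight[i] = float(round(w))


original = TinyLinear([0.4, 1.6, 2.4])

# Wrapping a deep copy: changes stay inside the copy.
copied = TinyWrapper(copy.deepcopy(original))
copied.round_weights_in_place()
original_after_copy_wrap = list(original.weight)

# Wrapping by reference: the same change is visible in the original.
shared = TinyWrapper(original)
shared.round_weights_in_place()
original_after_shared_wrap = list(original.weight)
```

In this sketch `original_after_copy_wrap` still holds the initial values, while `original_after_shared_wrap` holds the rounded ones, which is the trade-off being discussed in this thread.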

Contributor Author

But it's already modified by GPTQ.

Contributor Author

The copy also consumes a lot of memory with a balanced device map.
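
As a rough illustration of the memory point, the sketch below counts how many weight values are held when a layer is referenced versus deep-copied. The names (`TinyLinear`, `unique_weight_count`) are hypothetical and a plain list stands in for a tensor; real usage depends on the model, dtype, and device placement, none of which are modeled here.

```python
import copy


class TinyLinear:
    """Hypothetical stand-in for a layer; a list plays the role of its weights."""

    def __init__(self, num_values):
        self.weight = [0.0] * num_values


def unique_weight_count(modules):
    """Count stored weight values, counting each distinct buffer only once."""
    seen = {}
    for m in modules:
        seen[id(m.weight)] = len(m.weight)
    return sum(seen.values())


fp_layer = TinyLinear(1024)

by_reference = fp_layer             # a wrapper that keeps the same object
by_copy = copy.deepcopy(fp_layer)   # a wrapper that owns a second buffer

count_with_reference = unique_weight_count([fp_layer, by_reference])
count_with_copy = unique_weight_count([fp_layer, by_copy])
```

Here the deep copy doubles the number of stored values for the layer, while the reference adds none.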

Contributor

Ah, okay. Hmm, I designed the wrapper to just refer to the original ones. But memory issues could happen, as you said. Let's remove that overhead instead.

Contributor Author

@mhs4670go
I mean that the model is loaded, and then its weights are modified in GPTQ.

Contributor

> i mean the model is loaded, then its weights are modified in GPTQ.

Right. I just think that wrappers only wrap the original nodes, unlike GPTQ. Anyway, the current PR seems okay for reducing overhead.

Contributor

@mhs4670go mhs4670go left a comment

LGTM

@mhs4670go mhs4670go merged commit 6bd39f7 into Samsung:main Apr 3, 2026
7 checks passed
@stamalakhov stamalakhov deleted the rem_rednt_copy branch April 3, 2026 07:34