[quantization] Fix a bug in AffineObserverBase.compute_qparams #634
Open
dvsav wants to merge 1 commit into Samsung:main from
Conversation
This change fixes an incorrect computation of scale in AffineObserverBase.compute_qparams for asymmetric qschemes.

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
What

This change fixes incorrect computation of scale in AffineObserverBase.compute_qparams for asymmetric qschemes.

Why
Symptoms
The issue was detected when testing the QuantLayerNorm wrapper for torch.nn.LayerNorm. I observed a surprisingly large divergence between the original model (torch.nn.LayerNorm) and the fake-quantized model (QuantLayerNorm). Debugging revealed that QuantLayerNorm was incorrectly quantizing the variance of the input tensor it was trying to normalize. Note that variance is usually strictly positive, which is one of the conditions for the bug (see below).
The Bug
tico/quantization/wrapq/observers/affine_base.py, AffineObserverBase.fake_quant

The bug manifests itself when two conditions are met simultaneously:

1. observer.qscheme.is_symmetric() returns False, for example when you specify default_qscheme=QScheme.PER_TENSOR_ASYMM or default_qscheme=QScheme.PER_CHANNEL_ASYMM in PTQConfig, or when you specify an unsigned default_dtype in PTQConfig (or leave it at the default uint8, which is also unsigned).
2. The observed range does not include 0 (for example, a strictly positive variance tensor).

The calculated zero point then goes outside of the range representable by the quantized type and gets clamped, which leads to an inconsistent combination of scale and zero point. Here's an example:
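The PR's original example is not reproduced in this text. The following is a minimal self-contained sketch of the failure mode, assuming the usual affine formulas (scale = (max - min) / (qmax - qmin), zero_point = qmin - min / scale); it is an illustration, not the actual TICO code:

```python
import numpy as np

QMIN, QMAX = 0, 255  # uint8

def buggy_qparams(min_val, max_val):
    """Asymmetric qparams WITHOUT expanding the range to include zero."""
    scale = (max_val - min_val) / (QMAX - QMIN)
    zero_point = round(QMIN - min_val / scale)
    # For a strictly positive range, zero_point lands far below QMIN;
    # clamping it leaves scale and zero_point mutually inconsistent.
    return scale, int(np.clip(zero_point, QMIN, QMAX))

def fake_quant(x, scale, zero_point):
    q = np.clip(round(x / scale) + zero_point, QMIN, QMAX)
    return (q - zero_point) * scale

# Strictly positive observed range, e.g. a variance tensor in [2.0, 4.0]:
scale, zp = buggy_qparams(2.0, 4.0)
print(zp)                          # 0 (clamped up from -255)
print(fake_quant(4.0, scale, zp))  # 2.0 — clamped to scale * QMAX instead of 4.0
```

With the clamped zero point, every value in the observed range above scale * QMAX = 2.0 collapses to 2.0 after dequantization, which is exactly the large divergence seen in the symptoms.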
In fact, any fp value greater than scale * qmax gets quantized to qmax and is therefore clamped to scale * qmax.

The Solution
The simplest solution is to expand the range of the observed variable so that it includes 0, i.e. take min_val = min(min_val, 0) and max_val = max(max_val, 0) before computing the qparams.
Let's revisit the example above with this fix:
Now the original and the dequantized values are the same.
In AffineObserverBase this solution was expressed as a correction of the variable range:

Unit Tests
The bug has been covered with 4 new regression tests (see below).
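The four tests themselves are not reproduced in this text. As a hedged sketch, a regression test for this bug might look like the following, with hypothetical helper names standing in for the real observer API:

```python
import numpy as np

QMIN, QMAX = 0, 255  # uint8

def qparams(min_val, max_val):
    # Hypothetical stand-in for the fixed compute_qparams: range expanded to 0.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (QMAX - QMIN)
    return scale, round(QMIN - min_val / scale)

def fake_quant(x, scale, zp):
    return (np.clip(round(x / scale) + zp, QMIN, QMAX) - zp) * scale

def test_asymm_strictly_positive_range():
    """Regression: a strictly positive range (e.g. a variance tensor) must
    yield a representable zero point and dequantize max_val faithfully."""
    scale, zp = qparams(2.0, 4.0)
    assert QMIN <= zp <= QMAX                              # no clamping needed
    assert abs(fake_quant(4.0, scale, zp) - 4.0) <= scale  # faithful round-trip

test_asymm_strictly_positive_range()
```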
BEFORE FIX
AFTER FIX