[quantization] Fix a bug in AffineObserverBase.compute_qparams #634
Open
dvsav wants to merge 1 commit into Samsung:main from
Conversation
This change fixes an incorrect computation of scale in AffineObserverBase.compute_qparams for asymmetric qschemes.

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
What

This change fixes incorrect computation of scale in AffineObserverBase.compute_qparams for asymmetric qschemes.

Why
Symptoms
The issue was detected when testing the QuantLayerNorm wrapper for torch.nn.LayerNorm. I observed a surprisingly large divergence between the original model (torch.nn.LayerNorm) and the fake-quantized model (QuantLayerNorm). Debugging revealed that QuantLayerNorm was incorrectly quantizing the variance of the input tensor it was trying to normalize. Note that variance is usually strictly positive, which is one of the conditions for the bug (see below).
The Bug
tico/quantization/wrapq/observers/affine_base.py, AffineObserverBase.fake_quant

The bug manifests itself when two conditions are met simultaneously:

1. observer.qscheme.is_symmetric() returns False, for example when you specify default_qscheme=QScheme.PER_TENSOR_ASYMM or default_qscheme=QScheme.PER_CHANNEL_ASYMM in PTQConfig, or when you specify an unsigned default_dtype in PTQConfig (or leave it at the default uint8, which is also unsigned).
2. The observed range does not include 0 (for example, a strictly positive variance tensor).

The calculated zero point then goes outside of the range representable by the quantized type and gets clamped, which leads to an inconsistent combination of scale and zero point. Here's an example:
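The PR's original example is not reproduced in this text. The following is a minimal self-contained sketch of the failure mode, assuming the usual affine formulas (scale = (max - min) / (qmax - qmin), zero_point = qmin - min / scale); it is an illustration, not the actual TICO code:

```python
import numpy as np

QMIN, QMAX = 0, 255  # uint8

def buggy_qparams(min_val, max_val):
    """Asymmetric qparams WITHOUT expanding the range to include zero."""
    scale = (max_val - min_val) / (QMAX - QMIN)
    zero_point = round(QMIN - min_val / scale)
    # For a strictly positive range, zero_point lands far below QMIN;
    # clamping it leaves scale and zero_point mutually inconsistent.
    return scale, int(np.clip(zero_point, QMIN, QMAX))

def fake_quant(x, scale, zero_point):
    q = np.clip(round(x / scale) + zero_point, QMIN, QMAX)
    return (q - zero_point) * scale

# Strictly positive observed range, e.g. a variance tensor in [2.0, 4.0]:
scale, zp = buggy_qparams(2.0, 4.0)
print(zp)                          # 0 (clamped up from -255)
print(fake_quant(4.0, scale, zp))  # 2.0 — clamped to scale * QMAX instead of 4.0
```

With the clamped zero point, every value in the observed range above scale * QMAX = 2.0 collapses to 2.0 after dequantization, which is exactly the large divergence seen in the symptoms.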
In fact, any fp value greater than scale * qmax gets quantized to qmax and is therefore clamped to scale * qmax.

The Solution
The simplest solution is to expand the range of the observed variable so that it includes 0, i.e. take min_val = min(min_val, 0) and max_val = max(max_val, 0) before computing the qparams.
Let's revisit the example above with this fix:
Now the original and the dequantized values are the same.
In AffineObserverBase this solution was expressed as a correction of the variable range:

Unit Tests
The bug has been covered with 4 new regression tests (see below).
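The four tests themselves are not reproduced in this text. As a hedged sketch, a regression test for this bug might look like the following, with hypothetical helper names standing in for the real observer API:

```python
import numpy as np

QMIN, QMAX = 0, 255  # uint8

def qparams(min_val, max_val):
    # Hypothetical stand-in for the fixed compute_qparams: range expanded to 0.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (QMAX - QMIN)
    return scale, round(QMIN - min_val / scale)

def fake_quant(x, scale, zp):
    return (np.clip(round(x / scale) + zp, QMIN, QMAX) - zp) * scale

def test_asymm_strictly_positive_range():
    """Regression: a strictly positive range (e.g. a variance tensor) must
    yield a representable zero point and dequantize max_val faithfully."""
    scale, zp = qparams(2.0, 4.0)
    assert QMIN <= zp <= QMAX                              # no clamping needed
    assert abs(fake_quant(4.0, scale, zp) - 4.0) <= scale  # faithful round-trip

test_asymm_strictly_positive_range()
```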
BEFORE FIX
AFTER FIX