diff --git a/demo/dygraph/unstructured_pruning/README.md b/demo/dygraph/unstructured_pruning/README.md
index f03c8c256..d7b27b409 100644
--- a/demo/dygraph/unstructured_pruning/README.md
+++ b/demo/dygraph/unstructured_pruning/README.md
@@ -9,8 +9,8 @@
 ## 版本要求
 ```bash
 python3.5+
-paddlepaddle>=2.2.0
-paddleslim>=2.2.0
+paddlepaddle>=2.4.0
+paddleslim>=2.4.0
 ```
 请参照github安装[paddlepaddle](https://github.com/PaddlePaddle/Paddle)和[paddleslim](https://github.com/PaddlePaddle/PaddleSlim)。
diff --git a/demo/ofa/bert/README.md b/demo/ofa/bert/README.md
index 7fbb071d2..4998323cd 100644
--- a/demo/ofa/bert/README.md
+++ b/demo/ofa/bert/README.md
@@ -183,7 +183,7 @@ BERT-base模型是一个迁移能力很强的通用语义表示模型,但是
 ```shell
 pip install paddlenlp
-pip install paddlepaddle_gpu>=2.0rc1
+pip install paddlepaddle_gpu
 ```
 ### 2.2 Fine-tuning
diff --git a/demo/quant/BiBERT/README.md b/demo/quant/BiBERT/README.md
index 5e7daa31b..59e8a7c22 100644
--- a/demo/quant/BiBERT/README.md
+++ b/demo/quant/BiBERT/README.md
@@ -88,11 +88,10 @@ If you find our work useful in your research, please consider citing:
 ```shell
 @inproceedings{Qin:iclr22,
-  author = {Haotong Qin and Yifu Ding and Mingyuan Zhang and Qinghua Yan and
+  author = {Haotong Qin and Yifu Ding and Mingyuan Zhang and Qinghua Yan and
   Aishan Liu and Qingqing Dang and Ziwei Liu and Xianglong Liu},
   title = {BiBERT: Accurate Fully Binarized BERT},
   booktitle = {International Conference on Learning Representations (ICLR)},
   year = {2022}
 }
 ```
-
diff --git a/demo/quant/quant_post/README.md b/demo/quant/quant_post/README.md
index 66f609388..7912f0998 100755
--- a/demo/quant/quant_post/README.md
+++ b/demo/quant/quant_post/README.md
@@ -10,7 +10,7 @@
 ### 环境准备
-PaddlePaddle >= 2.3 或develop版本
+PaddlePaddle >= 2.4.0 或develop版本
 ### 准备数据
diff --git a/demo/unstructured_prune/README.md b/demo/unstructured_prune/README.md
index b25afceea..7783aa180 100644
--- a/demo/unstructured_prune/README.md
+++ b/demo/unstructured_prune/README.md
@@ -9,8 +9,8 @@
 ## 版本要求
 ```bash
 python3.5+
-paddlepaddle>=2.2.0
-paddleslim>=2.2.0
+paddlepaddle>=2.4.0
+paddleslim>=2.4.0
 ```
 请参照github安装[PaddlePaddle](https://github.com/PaddlePaddle/Paddle)和[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)。
diff --git a/docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst b/docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst
index c308413db..7750e2419 100644
--- a/docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst
+++ b/docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst
@@ -12,29 +12,34 @@ AutoCompression
 **参数: **
 - **model_dir(str)** - 需要压缩的推理模型所在的目录。
-- **train_dataloader(paddle.io.DataLoader)** - 训练数据迭代器。注意:如果选择离线量化超参搜索策略的话, ``train_dataloader`` 和 ``eval_callback`` 设置相同的数据读取即可。
-- **model_filename(str)** - 需要压缩的推理模型文件名称。
-- **params_filename(str)** - 需要压缩的推理模型参数文件名称。
+- **train_dataloader(paddle.io.DataLoader)** - 训练数据迭代器。注意:如果选择离线量化超参搜索策略的话, ``train_dataloader`` 和 ``eval_dataloader`` 设置相同的数据读取即可。
+- **model_filename(str)** - 需要压缩的推理模型文件名称。如果压缩的是onnx模型,则本参数设置为 ``None`` 即可。
+- **params_filename(str)** - 需要压缩的推理模型参数文件名称。如果压缩的是onnx模型,则本参数设置为 ``None`` 即可。
 - **save_dir(str)** - 压缩后模型所保存的目录。
-- **train_config(dict)** - 训练配置。可以配置的参数请参考: ``_ 。注意:如果选择离线量化超参搜索策略的话, ``train_config`` 直接设置为 ``None`` 即可。
+- **input_shapes(dict|tuple|list)** - 如果模型除 ``batch size`` 维度外还有可变维度(某一维度为-1意味着当前维度是可变维度),则需要设置此参数在压缩前固定下来。如果设置的是dict类型,则关键字为输入的名字,对应的值为每个输入的具体shape,例如模型中输入 ``X`` 的形状为 ``[-1, 3, -1, -1]`` ,意味着 ``batch size`` 维度、 ``height`` 维度和 ``width`` 维度都是变化的, ``input_shape`` 可以设置为 ``{"X": [-1, 3, 512, 512]}`` 。如果 ``input_shapes`` 设置为list或者tuple形式的话,模型只能有一个输入,并且输入的形状会设置成 ``input_shapes`` 的形状。设置为 ``None`` 的话,就保持原始形状不变,可能会跳过搜索压缩策略的过程。默认: ``None`` 。
+- **train_config(dict)** - 训练配置。可以配置的参数请参考: `TrainConfig `_ 。注意:如果选择离线量化超参搜索策略的话, ``train_config`` 直接设置为 ``None`` 即可。
 - **strategy_config(dict, list(dict), 可选)** - 使用的压缩策略,可以通过设置多个单种策略来并行使用这些压缩方式。字典的关键字必须在:
- ``Quantization`` (量化配置, 可配置的参数参考 ``_ ),
- ``Distillation`` (蒸馏配置, 可配置的参数参考 ``_),
- ``MultiTeacherDistillation`` (多teacher蒸馏配置, 可配置的参数参考 ``_),
- ``HyperParameterOptimization`` (超参搜索配置, 可配置的参数参考 ``_),
- ``Prune`` (剪枝配置, 可配置的参数参考 ``_),
- ``UnstructurePrune`` (非结构化稀疏配置, 可配置的参数参考 ``_) 之间选择。
+ ``QuantAware`` (量化训练配置, 可配置的参数参考 `QuantAware `_ ),
+ ``QuantPost`` (离线量化配置, 可配置的参数参考 `QuantPost `_ ),
+ ``Distillation`` (蒸馏配置, 可配置的参数参考 `Distillation `_),
+ ``MultiTeacherDistillation`` (多teacher蒸馏配置, 可配置的参数参考 `MultiTeacherDistillation `_),
+ ``HyperParameterOptimization`` (超参搜索配置, 可配置的参数参考 `HyperParameterOptimization `_),
+ ``ChannelPrune`` (结构化稀疏配置, 可配置的参数参考 `ChannelPrune `_),
+ ``UnstructurePrune`` (非结构化稀疏配置, 可配置的参数参考 `UnstructurePrune `_),
+ ``ASPPrune`` (ASP半结构化稀疏配置, 可配置的参数参考 `ASPPrune `_),
+ ``TransformerPrune`` (Transformer结构化稀疏配置, 只针对Transformer-encoder结构进行剪枝,可配置的参数参考 `TransformerPrune `_) 之间选择。
 目前关键字只支持以下几种组合策略或者单策略配置:
-    1) ``Quantization`` & ``HyperParameterOptimization``: 离线量化超参搜索策略;
-    2) ``Quantization`` & ``Distillation``: 量化训练和蒸馏的策略;
-    3) ``ChannelPrune`` & ``Distillation``: 结构化剪枝和蒸馏的策略;
-    4) ``ASPPrune`` & ``Distillation``: ASP结构化剪枝和蒸馏的策略;
-    5) ``TransformerPrune`` & ``Distillation``: Transformer结构化剪枝和蒸馏的策略;
+    1) ``QuantPost`` & ``HyperParameterOptimization``: 离线量化超参搜索策略;
+    2) ``QuantAware`` & ``Distillation``: 量化训练和蒸馏的策略;
+    3) ``ChannelPrune`` & ``Distillation``: 结构化稀疏和蒸馏的策略;
+    4) ``ASPPrune`` & ``Distillation``: ASP半结构化稀疏和蒸馏的策略;
+    5) ``TransformerPrune`` & ``Distillation``: Transformer结构化稀疏和蒸馏的策略;
     6) ``UnstructurePrune`` & ``Distillation``: 非结构化稀疏和蒸馏的策略;
-    7) ``Distillation``: 单独单蒸馏策略;
+    7) ``Distillation``: 单独单teacher蒸馏策略;
     8) ``MultiTeacherDistillation``: 多teacher蒸馏策略。
 设置为None的话会自动的选择策略去做压缩。默认:None。
-- **eval_callback(function, 可选)** - eval回调函数,使用回调函数判断模型训练情况, 回调函数的写法参考: ``_ 。 ``eval_callback`` 和 ``eval_dataloader`` 不能都设置为None。默认:None。
+- **target_speedup(float, 可选)** - 目标加速比例,在支持硬件延时表的设备上会根据预估的加速进行压缩策略选择;在硬件延时表不支持的设备上会默认量化相比 ``float32`` 加速70%,剩下的加速比会等价设置成剪枝的比例(压缩后模型实测的加速情况和预计差别可能较大,暂时不太推荐在硬件延时表不支持的设备上使用本参数)。默认: ``None`` 。
+- **eval_callback(function, 可选)** - eval回调函数,使用回调函数判断模型训练情况, 回调函数的写法参考: `custom_function `_ 。 ``eval_callback`` 和 ``eval_dataloader`` 不能都设置为None。默认:None。
 - **eval_dataloader(paddle.io.Dataloader, 可选)** - 如果传入测试数据迭代器,则使用 ``EMD`` 距离判断压缩前后模型之间的差别,目前仅支持离线量化超参搜索使用这种方式判断压缩前后模型的差异。
 - **deploy_hardware(str, 可选)** - 压缩后模型的部署硬件。默认: ``gpu`` 。
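+
+其中 ``eval_callback`` 的函数签名可参考如下最小示意(签名形式来自本仓库自动压缩示例中的测试回调函数;函数体内的评测逻辑为假设的占位写法,仅用于说明“输入执行器与编译后的测试 program,返回单个精度数值”这一约定):
+
+.. code-block:: python
+
+    def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
+        # 遍历评测数据,运行推理并累计精度;返回值为单个可比较的精度数值
+        total, correct = 0, 0
+        for image, label in eval_loader:  # 假设已按文档方式构造好的评测数据迭代器
+            pred = exe.run(compiled_test_program,
+                           feed={test_feed_names[0]: image},
+                           fetch_list=test_fetch_list)
+            correct += int((pred[0].argmax(axis=1) == label.flatten()).sum())
+            total += len(label)
+        return float(correct) / total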
@@ -42,55 +47,30 @@ AutoCompression

 **示例代码:**

-```shell
-
+.. code-block:: python

     import paddle
     from paddleslim.auto_compression import AutoCompression
     default_ptq_config = {
        "quantize_op_types": ["conv2d", "depthwise_conv2d", "mul"],
        "weight_bits": 8,
        "activation_bits": 8,
        "is_full_quantize": False,
        "not_quant_pattern": ["skip_quant"],
     }
     default_distill_config = {
        "loss": args.loss,
        "node": args.node,
        "alpha": args.alpha,
        "teacher_model_dir": args.teacher_model_dir,
        "teacher_model_filename": args.teacher_model_filename,
        "teacher_params_filename": args.teacher_params_filename,
     }
     train_dataloader = Cifar10(mode='train')
     eval_dataloader = Cifar10(mode='eval')
     ac = AutoCompression(model_path, train_dataloader, model_filename, params_filename, save_dir, \
-                         strategy_config={"Quantization": Quantization(**default_ptq_config),
+                         strategy_config={"QuantPost": QuantPost(**default_ptq_config),
                          "Distillation": Distillation(**default_distill_config)}, \
-                         train_config=None, eval_callback=eval_dataloader,devices='gpu')
-
-```
+                         train_config=None, eval_dataloader=eval_dataloader,devices='gpu')

 .. py:method:: paddleslim.auto_compression.AutoCompression.compress()

@@ -130,19 +110,33 @@ TrainConfig

 - **sharding_config(dict, optional)** - 使用fleet api的前提下可以使用sharding 策略。参数按照fleet 接口中所描述的进行配置: `sharding_configs `_ 。
 - **sparse_model(bool, optional)** - 设置 ``sparse_model`` 为 True, 可以移除非结构化稀疏产出的模型中多余的mask tensor的变量,默认: False。

-Quantization
+QuantAware
 ----------

-量化配置。
+量化训练配置。

 **参数:**

-- **quantize_op_types(list[str])** - 需要进行量化的 op 类型。
-- **weight_quantize_type(str)** - 参数量化方式,可选: ['channel_wise_abs_max', 'abs_max']。
-- **weight_bits(int)** - 参数量化bit数。
-- **activation_bits(int)** - 激活量化bit数。
-- **is_full_quantize(bool)** - 是否量化所有可支持op类型。
-- **not_quant_pattern(str|list[str])** - 所有 ``name_scope`` 包含 ``'not_quant_pattern'`` 字符串的 op 都不量化, 设置方式请参考 `fluid.name_scope `_ 。
+- **use_pact(bool)** - 是否开启PACT。一般情况下,开启PACT后,量化产出的模型精度会更高。算法原理请参考: `PACT: Parameterized Clipping Activation for Quantized Neural Networks `_
+- **weight_quantize_type(str)** - 参数量化方式,可选: ['channel_wise_abs_max', 'abs_max', 'moving_average_abs_max', 'range_abs_max']。如果使用 TensorRT 加载量化后的模型来预测,请使用 'channel_wise_abs_max' 。默认 'channel_wise_abs_max' 。
+- **quantize_op_types(list[str])** - 需要进行量化的 op 类型。通过以下代码输出所有支持量化的OP类型:
+
+.. code-block:: python
+
+    from paddleslim.quant.quanter import TRANSFORM_PASS_OP_TYPES,QUANT_DEQUANT_PASS_OP_TYPES
+    print(TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES)
+
+- **onnx_format(bool)** - 量化后的模型是否符合ONNX量化格式标准, **如果需要导出成ONNX,则需要设置为True。** 默认:False。
+- **weight_bits(int)** - 参数量化bit数。默认:8。
+- **activation_bits(int)** - 激活量化bit数。默认:8。
+- **activation_quantize_type(str)** - 激活量化方式,可选 'abs_max' , 'range_abs_max' , 'moving_average_abs_max' 。如果使用 TensorRT 加载量化后的模型来预测,请使用 'range_abs_max' 或 'moving_average_abs_max' 。默认为 'moving_average_abs_max'。
+- **not_quant_pattern(str|list[str])** - 所有 ``name_scope`` 包含 ``'not_quant_pattern'`` 字符串的 op 都不量化, 设置方式请参考 `fluid.name_scope `_ 。默认:'skip_quant'。
+- **window_size(int)** - 'range_abs_max' 量化方式的 window size ,默认10000。
+- **moving_rate(float)** - 'moving_average_abs_max' 量化方式的衰减系数,默认 0.9。
+- **for_tensorrt(bool)** - 量化后的模型是否使用 TensorRT 进行预测。默认值为False。通过以下代码,输出for_tensorrt=True时会量化到的OP:
+
+.. code-block:: python
+
+    from paddleslim.quant.quanter import TENSORRT_OP_TYPES
+    print(TENSORRT_OP_TYPES)
+
+- **is_full_quantize(bool)** - 是否量化所有可支持op类型。默认:False。

 Distillation
 ----------

@@ -151,7 +145,7 @@ Distillation

 **参数:**

-- **loss(str|list[str])** - 蒸馏损失名字,可以设置的损失类型为paddleslim中支持的蒸馏损失,可选的损失函数有: ``fsp``, ``l2``, ``soft_label`` 。如果您需要其他损失函数,可以暂时通过向 `蒸馏损失文件`_ z中添加相应的损失函数计算,或者通过提issue的方式我们来协助解决。
+- **loss(str|list[str])** - 蒸馏损失名字,可以设置的损失类型为paddleslim中支持的蒸馏损失,可选的损失函数有: ``fsp``, ``l2``, ``soft_label`` 。如果您需要其他损失函数,可以暂时通过向 `蒸馏损失文件 `_ 中添加相应的损失函数计算,或者通过提issue的方式我们来协助解决。
 - **node(list[str])** - 蒸馏节点名字列表,可以选择:1. 使用自蒸馏的话,蒸馏结点仅包含学生网络节点即可, 支持多节点蒸馏; 2. 使用其他蒸馏的话,蒸馏节点需要包含教师网络节点和对应的学生网络节点, 每两个节点组成一对,分别属于教师模型和学生模型。
 - **alpha(float|list[float])** - 每一个蒸馏损失的权重,长度需要和 ``loss`` 的长度保持一致。

@@ -167,7 +161,7 @@ MultiTeacherDistillation

 **参数:**

-- **loss(list[str])** - 蒸馏损失名字,可以设置的损失类型为paddleslim中支持的蒸馏损失,可选的损失函数有: ``fsp``, ``l2``, ``soft_label`` 。如果您需要其他损失函数,可以暂时通过向 `蒸馏损失文件`_ z中添加相应的损失函数计算,或者通过提issue的方式我们来协助解决。
+- **loss(list[str])** - 蒸馏损失名字,可以设置的损失类型为paddleslim中支持的蒸馏损失,可选的损失函数有: ``fsp``, ``l2``, ``soft_label`` 。如果您需要其他损失函数,可以暂时通过向 `蒸馏损失文件 `_ 中添加相应的损失函数计算,或者通过提issue的方式我们来协助解决。
 - **node(list[list[str]])** - 蒸馏节点名字嵌套列表,教师模型的个数和外部列表的长度需要保持一致。每一个列表代表一个教师模型和学生模型之间的蒸馏节点,其中每两个节点组成一对,分别属于教师模型和学生模型。
 - **alpha(list[float])** - 每一个蒸馏损失的权重,长度需要和 ``distill_loss`` 的长度保持一致。

@@ -194,17 +188,33 @@ HyperParameterOptimization

 - **batch_num(int|list[int])** - 迭代次数, 设置类型为列表的话,列表中的最大最小值会作为上下界,在上下界范围内进行均匀采样。
 - **max_quant_count(int)** - 超参搜索运行的最大轮数, 默认:20。

-PruneConfig
+ChannelPrune
 ----------

-裁剪配置。
+结构化稀疏配置。

 **参数:**

-- **prune_algo(str)** - 裁剪算法,可设置为: ``prune`` 或者 ``asp`` 。 ``prune`` 暂时只支持对视觉模型进行压缩, ``asp`` 裁剪暂时只支持对 ``FC`` 进行压缩。
-- **pruned_ratio(float)** - 裁剪比例。
-- **prune_params_name(list[str])** - 参与裁剪的参数的名字。
-- **criterion(str)** - 裁剪算法设置为 ``prune`` 时,评估一个卷积层内通道重要性所参考的指标。目前支持 ``l1_norm``, ``bn_scale``, ``geometry_median`` 。
+- **pruned_ratio(float)** - 每个卷积层的通道数被剪裁的比例。
+- **prune_params_name(list[str])** - 参与裁剪的参数的名字。如果设置为 ``None`` , 则会按照传入的剪枝比例对所有可以裁剪的卷积层进行裁剪。合适的卷积层可以通过计算每一层的敏感度来选择,敏感度可以通过 `敏感度计算工具 <../../../../../example/auto_compression/prune_sensitivity_analysis/>`_ 来获得每层的敏感度信息,然后设置合适的裁剪的卷积层名字。也可以使用 `Netron工具 `_ 可视化 `*.pdmodel` 模型文件,选择合适的卷积层进行剪裁。默认: ``None`` 。
+
+TransformerPrune
+----------
+
+针对Transformer结构的结构化剪枝参数。
+
+- **pruned_ratio(float)** - 每个全连接层被剪裁的比例。

 UnstructurePrune
 ----------

@@ -221,5 +231,5 @@ UnstructurePrune
  ``prune_steps(int)`` - 迭代训练多少iteration后,改变稀疏比例。
  ``initial_ratio(float)`` - 初始的稀疏比例。
 其它配置可以参考非结构化稀疏接口中 `configs参数 `_ 的配置。
-- **prune_params_type(str)** - 用以指定哪些类型的参数参与稀疏。目前只支持 ``None`` 和 ``conv1x1_only`` 两个选项,后者表示只稀疏化1x1卷积。而前者表示稀疏化除了归一化的参数。
+- **prune_params_type(str)** - 用以指定哪些类型的参数参与稀疏。目前只支持 ``None`` 和 ``conv1x1_only`` 两个选项,后者表示只稀疏化1x1卷积,前者表示稀疏化除归一化层以外的参数。默认: ``conv1x1_only`` 。
 - **local_sparsity(bool)** - 剪裁比例(ratio)应用的范围: ``local_sparsity`` 开启时意味着每个参与剪裁的参数矩阵稀疏度均为 ``ratio`` , 关闭时表示只保证模型整体稀疏度达到 ``ratio`` ,但是每个参数矩阵的稀疏度可能存在差异。
diff --git a/docs/zh_cn/api_cn/static/auto-compression/custom_function.rst b/docs/zh_cn/api_cn/static/auto-compression/custom_function.rst
index 239d0ee9f..03c1f02fa 100644
--- a/docs/zh_cn/api_cn/static/auto-compression/custom_function.rst
+++ b/docs/zh_cn/api_cn/static/auto-compression/custom_function.rst
@@ -35,7 +35,7 @@
 1.3 自定义计算逻辑
 ##########

-首先需要根据 `如何基于Paddle自定义DataLoader <>`_ 章节定义测试数据集 ``test_dataloader`` 。
+首先需要根据 `如何基于Paddle自定义DataLoader `_ 章节定义测试数据集 ``test_dataloader`` 。

 ```python
diff --git a/docs/zh_cn/tutorials/quant/AnalysisPTQ.md b/docs/zh_cn/tutorials/quant/AnalysisPTQ.md
new file mode 100644
index 000000000..6ad49a98d
--- /dev/null
+++ b/docs/zh_cn/tutorials/quant/AnalysisPTQ.md
@@ -0,0 +1,99 @@
+# PTQ(Post Training Quantization)量化分析工具详细教程
+
+## 1. 量化分析工具功能
+1. 统计分析(statistical_analyse):
+   - 可视化激活和权重箱状图。箱状图可发现是否出现离群点。
+   - 可视化权重和激活直方分布图。直方分布图可观察更具体的数值分布。
+   - 提供量化前后权重和激活的具体数据信息,包括min,max,mean,std等。
+
+2. 精度误差分析(metric_error_analyse):
+   - 遍历量化模型的每层,并计算量化后精度。该功能可以定位具体某层导致的量化损失。
+
+3. 获取目标模型(get_target_quant_model):
+   - 输入预期精度,直接产出符合预期精度的量化模型。
+
+
+## 2. paddleslim.quant.AnalysisPTQ 可传入参数解析
+| **参数名** | **参数释义** |
+|-----------------------------|-----------------------------------------|
+| model_dir | 必须传入的模型文件路径,可为文件夹名;若模型为ONNX类型,直接输入'.onnx'模型文件名称即可 |
+| model_filename | 默认为None,若model_dir为文件夹名,则必须传入以'.pdmodel'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入 |
+| params_filename | 默认为None,若model_dir为文件夹名,则必须传入以'.pdiparams'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入 |
+| eval_function | 若需要验证精度,需要传入自定义的验证函数 |
+| data_loader | 模型校准时使用的数据,DataLoader继承自`paddle.io.DataLoader`。可以直接使用模型套件中的DataLoader,或者根据[paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader)自定义所需要的DataLoader |
+| save_dir | 分析后保存模型精度或pdf等文件的文件夹,默认为`analysis_results`|
+| resume | 是否加载中间分析文件,默认为False|
+| ptq_config | 可传入的离线量化中的参数,详细可参考[离线量化文档](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post) |
+
+
+## 3. 量化分析工具的使用
+**创建量化分析工具** :
+```
+analyzer = AnalysisPTQ(
+    model_dir=config["model_dir"],
+    model_filename=config["model_filename"],
+    params_filename=config["params_filename"],
+    eval_function=eval_function,
+    data_loader=data_loader,
+    save_dir=config['save_dir'],
+    ptq_config=config['PTQ'])
+```
+
+**统计分析**
+```
+analyzer.statistical_analyse()
+```
+
+调用该接口,会统计量化前和量化后每一个可量化权重和其对应激活的数据。只使用该接口可以不输入Eval Function,但需要输入DataLoader,少量数据即可。会产出以下文件:
+- `fp_activation_boxplot.pdf`:量化前Float数据类型的模型激活箱状图
+- `fp_weight_boxplot.pdf`:量化前Float数据类型的模型权重箱状图
+- `quantized_activation_boxplot.pdf`:量化后INT数据类型的模型激活箱状图
+- `quantized_weight_boxplot.pdf`:量化后INT数据类型的模型权重箱状图
+- `fp_activation_histplot.pdf`:量化前Float数据类型的模型激活直方图
+- `fp_weight_histplot.pdf`:量化前Float数据类型的模型权重直方图
+- `quantized_activation_histplot.pdf`:量化后INT数据类型的模型激活直方图
+- `quantized_weight_histplot.pdf`:量化后INT数据类型的模型权重直方图
+- `statistic.csv`:量化前后权重和激活的具体数据信息,表格中会保存的信息有:
+  - Var Name: Variable的名称
+  - Var Type:Variable的类型,Weight或Activation
+  - Corresponding Weight Name:如果为Activation,其对应的Weight名称
+  - FP32 Min:量化前Float数据类型的最小值
+  - FP32 Max:量化前Float数据类型的最大值
+  - FP32 Mean:量化前Float数据类型的平均值
+  - FP32 Std:量化前Float数据类型的标准差
+  - Quantized Min:量化后INT数据类型的最小值
+  - Quantized Max:量化后INT数据类型的最大值
+  - Quantized Mean:量化后INT数据类型的平均值
+  - Quantized Std:量化后INT数据类型的标准差
+  - Diff Min:量化前后该Variable相差的最小值
+  - Diff Max:量化前后该Variable相差的最大值
+  - Diff Mean:量化前后该Variable相差的平均值
+  - Diff Std:量化前后该Variable相差的标准差
+
+
+**精度误差分析**
+```
+analyzer.metric_error_analyse()
+```
+调用该接口,会遍历量化模型中的每一层,并计算量化该层后模型的损失。调用该接口时,需要输入Eval Function。会产出所有只量化一层的模型精度排序,将默认保存在 `./analysis_results/analysis.txt` 中。
+
+
+**直接产出符合预期精度的目标量化模型**
+```
+analyzer.get_target_quant_model(target_metric)
+```
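+
+以下为一个仅作示意的调用例子(其中 0.75 为假设的目标精度数值,实际取值需与 eval_function 返回的指标量纲一致):
+```
+# 假设示例:希望量化后模型的评估指标不低于 0.75。
+# 该接口会结合逐层量化的精度分析结果,跳过量化损失较大的层,
+# 直至产出满足目标精度的量化模型。
+analyzer.get_target_quant_model(target_metric=0.75)
+```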
+
+## 4. 根据分析结果执行离线量化
+执行完量化分析工具后,可根据 `analysis.txt` 中的精度排序,在量化中去掉效果较差的层,具体操作为:在调用 `paddleslim.quant.quant_post_static` 时加入参数 `skip_tensor_list`,将需要去掉的层传入即可。
+
+
+## FAQ:
+- 与QAT(Quantization-Aware Training)量化分析工具的区别:与QAT量化分析工具不同的是,PTQ量化分析工具加载待量化的原模型,对模型所有层依次进行量化,每次量化一层,进行验证获取精度误差分析。而QAT量化分析工具加载量化训练后的量化模型,遍历所有量化的层,依次去掉量化层,加载Float模型的参数,并进行验证获取精度误差分析。
+
+- PTQ量化分析工具设计的原因:PTQ量化分析工具依次量化模型中的每一层,而不是依次去掉量化层,是由于PTQ本身的高效性。依次量化一层并验证,查看对模型精度的损失十分直观。
+
+- 量化分析工具为什么要区分PTQ和QAT:实验证明PTQ和QAT后的量化模型的敏感层并不完全一致,将两种算法分开,敏感度分析结果更加准确。
diff --git a/docs/zh_cn/tutorials/quant/AnalysisQAT.md b/docs/zh_cn/tutorials/quant/AnalysisQAT.md
new file mode 100644
index 000000000..b6386c9c7
--- /dev/null
+++ b/docs/zh_cn/tutorials/quant/AnalysisQAT.md
@@ -0,0 +1,56 @@
+# QAT(Quantization-Aware Training)量化分析工具详细教程
+
+## 1. 量化分析工具功能
+精度误差分析(metric_error_analyse):
+   - 遍历量化训练后模型的每层,去掉量化节点并计算当前层不量化时的模型精度。该功能可以定位具体某层导致的量化损失。
+
+
+## 2. paddleslim.quant.AnalysisQAT 可传入参数解析
+| **参数名** | **参数释义** |
+|-----------------------------|-----------------------------------------|
+| quant_model_dir | 必须传入的量化后的模型文件路径 |
+| float_model_dir | 必须传入的量化前的模型文件路径 |
+| model_filename | 默认为None,若模型路径为文件夹名,则必须传入以'.pdmodel'结尾的模型名称 |
+| params_filename | 默认为None,若模型路径为文件夹名,则必须传入以'.pdiparams'结尾的模型名称 |
+| quantizable_op_type | 需分析的量化的op类型,默认为`conv2d`, `depthwise_conv2d`, `mul` |
+| qat_metric | 量化模型的精度,可不传入,默认为None,不传入时会自动计算 |
+| eval_function | 需要传入自定义的验证函数 |
+| data_loader | 模型校准时使用的数据,DataLoader继承自`paddle.io.DataLoader`。可以直接使用模型套件中的DataLoader,或者根据[paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader)自定义所需要的DataLoader |
+| save_dir | 分析后保存模型精度或pdf等文件的文件夹,默认为`analysis_results`|
+| resume | 是否加载中间分析文件,默认为False|
+
+
+## 3. 量化分析工具的使用
+**创建量化分析工具** :
+```
+analyzer = AnalysisQAT(
+    quant_model_dir=config["quant_model_dir"],
+    float_model_dir=config["float_model_dir"],
+    model_filename=config["model_filename"],
+    params_filename=config["params_filename"],
+    quantizable_op_type=config['quantizable_op_type'],
+    qat_metric=config['qat_metric'],
+    eval_function=eval_function,
+    data_loader=eval_loader,
+    save_dir=config['save_dir'],
+    resume=config['resume'],
+)
+```
+
+
+**精度误差分析**
+```
+analyzer.metric_error_analyse()
+```
+调用该接口,会遍历量化模型中的每一层,去掉量化节点并计算当前层不量化时的模型精度。调用该接口时,需要输入Eval Function。会产出所有去掉一层量化的模型精度排序,将默认保存在 `./analysis_results/analysis.txt` 中。具体使用可参考[GPT量化训练敏感度分析DEMO](../../../../example/quantization_analysis/GPT/README.md)。
+
+
+## FAQ:
+- 与PTQ(Post Training Quantization)量化分析工具的区别:与PTQ量化分析工具不同的是,QAT量化分析工具加载量化训练后的量化模型,遍历所有量化的层,依次去掉量化层,加载Float模型的参数,并进行验证获取精度误差分析。而PTQ量化分析工具则是加载待量化的原模型,对模型所有层依次进行量化,每次量化一层,进行验证获取精度误差分析。
+
+- QAT量化分析工具设计的原因:QAT量化分析工具依次去掉量化层,而不是依次量化一层,是由于QAT需要训练的特性。遍历每层进行量化训练再验证精度比较耗时,直接加载量化训练后的量化模型,依次去掉量化层更高效。
+
+- 量化分析工具为什么要区分PTQ和QAT:实验证明PTQ和QAT后的量化模型的敏感层并不完全一致,将两种算法分开,敏感度分析结果更加准确。
diff --git a/docs/zh_cn/tutorials/quant/AnalysisQuant.md b/docs/zh_cn/tutorials/quant/AnalysisQuant.md
deleted file mode 100644
index 669a126b4..000000000
--- a/docs/zh_cn/tutorials/quant/AnalysisQuant.md
+++ /dev/null
@@ -1,98 +0,0 @@
-# 量化分析工具详细教程
-
-## 1. 量化分析工具功能
-1. statistical_analyse:
-   - 可视化激活和权重箱状图。箱状图可发现是否出现离群点。
-   - 可视化权重和激活直方分布图。直方分布图可观察更具体的数值分布。
-   - 提供量化前后权重和激活的具体数据信息,包括min,max,mean,std等
-
-2. metric_error_analyse:
-   - 遍历量化模型的每层,并计算量化后精度。该功能可以定位具体某层导致的量化损失。
-
-3. get_target_quant_model:
-   - 输入预期精度,直接产出符合预期精度的量化模型。
-
-
-## 2. 
paddleslim.quant.AnalysisQuant 可传入参数解析 -```yaml -model_dir -model_filename: None -params_filename: None -eval_function: None -data_loader: None -save_dir: 'analysis_results' -resume: False -ptq_config -``` -- model_dir: 必须传入的模型文件路径,可为文件夹名;若模型为ONNX类型,直接输入'.onnx'模型文件名称即可。 -- model_filename: 默认为None,若model_dir为文件夹名,则必须传入以'.pdmodel'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入。 -- params_filename: 默认为None,若model_dir为文件夹名,则必须传入以'.pdiparams'结尾的模型名称,若model_dir为'.onnx'模型文件名称,则不需要传入。 -- eval_function:若需要验证精度,需要传入自定义的验证函数。 -- data_loader:模型校准时使用的数据,DataLoader继承自`paddle.io.DataLoader`。可以直接使用模型套件中的DataLoader,或者根据[paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader)自定义所需要的DataLoader。 -- save_dir:分析后保存模型精度或pdf等文件的文件夹,默认为`analysis_results`。 -- resume:是否加载中间分析文件 -- ptq_config:可传入的离线量化中的参数,详细可参考[离线量化文档](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post)。 - - - - -## 3. 量化分析工具的使用 -**创建量化分析工具** : -``` -analyzer = AnalysisQuant( - model_dir=config["model_dir"], - model_filename=config["model_filename"], - params_filename=config["params_filename"], - eval_function=eval_function, - data_loader=data_loader, - save_dir=config['save_dir'], - ptq_config=config['PTQ']) -``` - -**统计分析** -``` -analyzer.statistical_analyse() -``` - -调用该接口,会统计量化前和量化后每一个可量化权重和其对应激活的数据。只使用该接口可以不输入Eval Function,但需要输入DataLoader,少量数据即可。会产出以下文件: -- `fp_activation_boxplot.pdf`:量化前Float数据类型的模型激活箱状图 -- `fp_weight_boxplot.pdf`:量化前Float数据类型的模型权重箱状图 -- `quantized_activation_boxplot.pdf`:量化后INT数据类型的模型激活箱状图 -- `quantized_weight_boxplot.pdf`:量化后INT数据类型的模型权重箱状图 -- `fp_activation_histplot.pdf`:量化前Float数据类型的模型激活直方图 -- `fp_weight_histplot.pdf`:量化前Float数据类型的模型权重直方图 -- `quantized_activation_histplot.pdf`:量化后INT数据类型的模型激活直方图 -- `quantized_weight_histplot.pdf`:量化后INT数据类型的模型权重直方图 -- `statistic.csv`:量化前后权重和激活的具体数据信息,表格中会保存的信息有: - - Var Name: Variable的名称 - - Var Type:Variable的类型,Weight或Activation - - Corresponding Weight Name:如果为Activation,其对应的Weight名称 - - FP32 Min:量化前Float数据类型的最小值 - - FP32 Max:量化前Float数据类型的最大值 - - FP32 Mean:量化前Float数据类型的平均值 - - FP32 Std:量化前Float数据类型的方差值 - - Quantized Min:量化后INT数据类型的最小值 - - Quantized Max:量化后INT数据类型的最大值 - - Quantized Mean:量化后INT数据类型的平均值 - - Quantized Std:量化后INT数据类型的方差值 - - Diff Min:量化前后该Variable的相差的最小值 - - Diff Max:量化前后该Variable的相差的最大值 - - Diff Mean:量化前后该Variable的相差的平均值 - - Diff Std:量化前后该Variable的相差的方差值 - - -**精度误差分析** -``` -analyzer.metric_error_analyse() -``` -调用该接口,会遍历量化模型中的一层,并计算量化该层后模型的损失。调用该接口时,需要输入Eval Function。会产出所有只量化一层的模型精度排序,将默认保存在 `./analysis_results/analysis.txt` 中。 - - - -**直接产出符合预期精度的量化模型** -``` -analyzer.get_target_quant_model(target_metric) -``` - -## 4. 
根据分析结果执行离线量化 -执行完量化分析工具后,可根据 `analysis.txt` 中的精度排序,在量化中去掉效果较差的层,具体操作为:在调用 `paddleslim.quant.quant_post_static` 时加入参数 `skip_tensor_list`,将需要去掉的层传入即可。 diff --git a/docs/zh_cn/tutorials/quant/post_training_quantization.md b/docs/zh_cn/tutorials/quant/post_training_quantization.md index 0077569b1..09d22d17b 100644 --- a/docs/zh_cn/tutorials/quant/post_training_quantization.md +++ b/docs/zh_cn/tutorials/quant/post_training_quantization.md @@ -72,6 +72,9 @@ $$ 说明: - 如果想使用bias_correction,可以在PaddleSlim的[离线量化接口](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static)修改`bias_correction`参数为True即可,默认为False。 - 如果想使用Adaround方法,可以在PaddleSlim的[离线量化接口](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static)修改`round_type`参数为`adaround`即可,默认为`round`。 +- 如果想使用BRECQ方法,可以在PaddleSlim的[量化重构接口](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static)修改`recon_level`参数为`region-wise`即可,默认为`layer-wise`。 +- 如果想使用QDrop方法,可以在PaddleSlim的[量化重构接口](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static)修改`simulate_activation_quant`参数为`True`即可,默认为`False`。 + ### 效果对比 diff --git a/example/auto_compression/README.md b/example/auto_compression/README.md index 4342aa16d..b14b65938 100644 --- a/example/auto_compression/README.md +++ b/example/auto_compression/README.md @@ -27,6 +27,10 @@ PaddleSlim推出全新自动化压缩工具(Auto Compression Toolkit, ACT),旨在通过Source-Free的方式,自动对预测模型进行压缩,压缩后模型可直接部署应用。 +- ACT可以自动处理常见的预测模型,如果有更特殊的改造需求,可以参考:[ACT超参配置教程](./hyperparameter_tutorial.md)来进行单独配置压缩策略。 +- ACT接口各个参数详细含义可以参考: [ACT API文档](../docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst)。 +- 一些问题以及解决方案可以参考:[FAQ](./hyperparameter_tutorial.md#12-faq)。如果FAQ不能解决您的问题,欢迎加入用户群或者通过[GitHub Issues](https://github.com/PaddlePaddle/PaddleSlim/issues)给我们提issues。 + ## **News** 📢 * 🎉 **2022.8.22** [**PaddleSlim v2.3.3**](https://github.com/PaddlePaddle/PaddleSlim/releases/tag/v2.3.3)全新发布!目前已经在图像分类、目标检测、图像分割、NLP等20多个模型验证正向效果。 @@ -68,25 +72,39 @@ ACT相比传统的模型压缩方法, -| 模型类型 | model name | 压缩前
精度(Top1 Acc %) | 压缩后
精度(Top1 Acc %) | 压缩前
推理时延(ms) | 压缩后
推理时延(ms) | 推理
加速比 | 芯片 | -| ------------------------------- | ---------------------------- | ---------------------- | ---------------------- | ---------------- | ---------------- | ---------- | ----------------- | -| [图像分类](./image_classification) | MobileNetV1 | 70.90 | 70.57 | 33.15 | 13.64 | **2.43** | SDM865(骁龙865) | -| [图像分类](./image_classification) | ShuffleNetV2_x1_0 | 68.65 | 68.32 | 10.43 | 5.51 | **1.89** | SDM865(骁龙865) | -| [图像分类](./image_classification) | SqueezeNet1_0_infer | 59.60 | 59.45 | 35.98 | 16.96 | **2.12** | SDM865(骁龙865) | -| [图像分类](./image_classification) | PPLCNetV2_base | 76.86 | 76.43 | 36.50 | 15.79 | **2.31** | SDM865(骁龙865) | -| [图像分类](./image_classification) | ResNet50_vd | 79.12 | 78.74 | 3.19 | 0.92 | **3.47** | NVIDIA Tesla T4 | -| [语义分割](./semantic_segmentation) | PPHGNet_tiny | 79.59 | 79.20 | 2.82 | 0.98 | **2.88** | NVIDIA Tesla T4 | -| [语义分割](./semantic_segmentation) | PP-HumanSeg-Lite | 92.87 | 92.35 | 56.36 | 37.71 | **1.49** | SDM710 | -| [语义分割](./semantic_segmentation) | PP-LiteSeg | 77.04 | 76.93 | 1.43 | 1.16 | **1.23** | NVIDIA Tesla T4 | -| [语义分割](./semantic_segmentation) | HRNet | 78.97 | 78.90 | 8.19 | 5.81 | **1.41** | NVIDIA Tesla T4 | -| [语义分割](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 | -| [NLP](./nlp) | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 | -| [NLP](./nlp) | ERNIE 3.0-Medium | 73.09 | 72.40 | 29.25(fp16) | 19.61 | **1.49** | NVIDIA Tesla T4 | -| [目标检测](./pytorch_yolo_series) | YOLOv5s
(PyTorch) | 37.40 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 | -| [目标检测](./pytorch_yolo_series) | YOLOv6s
(PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 | -| [目标检测](./pytorch_yolo_series) | YOLOv7
(PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 | -| [目标检测](./detection) | PP-YOLOE-s | 43.1 | 42.6 | 6.51 | 2.12 | **3.07** | NVIDIA Tesla T4 | -| [图像分类](./image_classification) | MobileNetV1
(TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDMM865(骁龙865) | +| 模型类型 | model name | 压缩前
精度(Top1 Acc %) | 压缩后
精度(Top1 Acc %) | 压缩前
推理时延(ms) | 压缩后
推理时延(ms) | 推理
加速比 | 芯片 | +| ------------------------------- | ----------------------------- | ---------------------- | ---------------------- | ---------------- | ---------------- | ---------- | --------------- | +| [图像分类](./image_classification) | MobileNetV1 | 70.90 | 70.57 | 33.15 | 13.64 | **2.43** | SDM865(骁龙865) | +| [图像分类](./image_classification) | MobileNetV3_large_x1_0 | 75.32 | 74.04 | 16.62 | 9.85 | **1.69** | SDM865(骁龙865) | +| [图像分类](./image_classification) | MobileNetV3_large_x1_0_ssld | 78.96 | 77.17 | 16.62 | 9.85 | **1.69** | SDM865(骁龙865) | +| [图像分类](./image_classification) | ShuffleNetV2_x1_0 | 68.65 | 68.32 | 10.43 | 5.51 | **1.89** | SDM865(骁龙865) | +| [图像分类](./image_classification) | SqueezeNet1_0_infer | 59.60 | 59.45 | 35.98 | 16.96 | **2.12** | SDM865(骁龙865) | +| [图像分类](./image_classification) | PPLCNetV2_base | 76.86 | 76.39 | 36.50 | 15.79 | **2.31** | SDM865(骁龙865) | +| [图像分类](./image_classification) | ResNet50_vd | 79.12 | 78.74 | 3.19 | 0.92 | **3.47** | NVIDIA Tesla T4 | +| [图像分类](./image_classification) | PPHGNet_tiny | 79.59 | 79.20 | 2.82 | 0.98 | **2.88** | NVIDIA Tesla T4 | +| [图像分类](./image_classification) | InceptionV3 | 79.14 | 78.32 | 4.79 | 1.47 | **3.26** | NVIDIA Tesla T4 | +| [图像分类](./image_classification) | EfficientNetB0 | 77.02 | 74.27 | 1.95 | 1.44 | **1.35** | NVIDIA Tesla T4 | +| [图像分类](./image_classification) | GhostNet_x1_0 | 74.02 | 72.62 | 2.93 | 1.03 | **2.84** | NVIDIA Tesla T4 | +| [图像分类](./image_classification) | ViT_base_patch16_224 | 81.89 | 82.05 | 367.17 | 51.70 | **7.10** | NVIDIA Tesla T4 | +| [语义分割](./semantic_segmentation) | PP-HumanSeg-Lite | 92.87 | 92.35 | 56.36 | 37.71 | **1.49** | SDM710 | +| [语义分割](./semantic_segmentation) | PP-LiteSeg | 77.04 | 76.93 | 1.43 | 1.16 | **1.23** | NVIDIA Tesla T4 | +| [语义分割](./semantic_segmentation) | HRNet | 78.97 | 78.90 | 8.188 | 5.812 | **1.41** | NVIDIA Tesla T4 | +| [语义分割](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 | +| [语义分割](./semantic_segmentation) | Deeplabv3-ResNet50 | 79.90 | 79.26 | 12.766 | 8.839 | **1.44** | NVIDIA Tesla T4 | +| [语义分割](./semantic_segmentation) | BiSeNetV2 | 73.17 | 73.20 | 35.61 | 15.94 | **2.23** | NVIDIA Tesla T4 | +| [NLP](./nlp) | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 | +| [NLP](./nlp) | ERNIE 3.0-Medium | 73.09 | 72.16 | 29.25(fp16) | 19.61 | **1.49** | NVIDIA Tesla T4 | +| [NLP](./pytorch_huggingface) | bert-base-cased(Hugging-Face) | 81.35 | 81.51 | 11.60 | 4.83 | **2.40** | NVIDIA Tesla T4 | +| [目标检测](./detection) | SSD-MobileNetv1 | 73.8(voc) | 73.52 | 4.0 | 1.7 | **2.35** | NVIDIA Tesla T4 | +| [目标检测](./pytorch_yolo_series) | YOLOv5s
(PyTorch) | 37.4 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 | +| [目标检测](./pytorch_yolo_series) | YOLOv6s
(PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 | +| [目标检测](./pytorch_yolo_series) | YOLOv6s_v2(PyTorch) | 43.4 | 43.0 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 | +| [目标检测](./pytorch_yolo_series) | YOLOv7-Tiny(PyTorch) | 37.3 | 37.0 | 5.06 | 1.68 | **3.01** | NVIDIA Tesla T4 | +| [目标检测](./pytorch_yolo_series) | YOLOv7
(PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 | +| [目标检测](./detection) | PP-YOLOE-l | 50.9 | 50.6 | 11.2 | 6.7 | **1.67** | NVIDIA Tesla T4 | +| [目标检测](./detection) | PP-YOLOE-s | 43.1 | 42.6 | 6.51 | 2.12 | **3.07** | NVIDIA Tesla T4 | +| [图像分类](./image_classification) | MobileNetV1
(TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDMM865(骁龙865) | + - 备注:目标检测精度指标为mAP(0.5:0.95)精度测量结果。图像分割精度指标为IoU精度测量结果。 - 更多飞桨模型应用示例及Benchmark可以参考:[图像分类](./image_classification),[目标检测](./detection),[语义分割](./semantic_segmentation),[自然语言处理](./nlp) @@ -94,19 +112,19 @@ ACT相比传统的模型压缩方法, ## **环境准备** -- 安装PaddlePaddle >= 2.3.2:(可以参考[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- 安装PaddlePaddle >= 2.4.0:(可以参考[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) ```shell # CPU - pip install paddlepaddle --upgrade + pip install paddlepaddle # GPU 以CUDA11.2为例 - python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + python -m pip install paddlepaddle_gpu -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html ``` -- 安装PaddleSlim >=2.3.3: +- 安装PaddleSlim >= 2.4.0: ```shell - pip install paddleslim==2.3.3 + pip install paddleslim ``` ## **快速开始** @@ -124,6 +142,8 @@ tar -xf ILSVRC2012_data_demo.tar.gz - **2.运行自动化压缩** +由于目前离线量化超参搜索仅支持Linux系统,以下默认示例需在Linux环境中测试。如果想要在Windows环境中测试,可以使用代码中Windows环境的config,由于Windows环境中配置的压缩策略为量化训练,所以需要全量数据集,否则会有一定的精度下降。 + ```python # 导入依赖包 import paddle @@ -162,7 +182,8 @@ ac = AutoCompression( model_filename="inference.pdmodel", params_filename="inference.pdiparams", save_dir="MobileNetV1_quant", - config={'Quantization': {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}}, + config={"QuantPost": {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}}, + ### config={"QuantAware": {}, "Distillation": {}}, ### 如果您的系统为Windows系统, 请使用当前这一行配置 train_dataloader=train_loader, eval_dataloader=train_loader) ac.compress() @@ -190,7 +211,7 @@ ac.compress() - 量化模型速度的测试依赖推理库的支持,所以确保安装的是带有TensorRT的PaddlePaddle。以下示例和展示的测试结果是基于Tesla V100、CUDA 10.2、Python3.7、TensorRT得到的。 - - 使用以下指令查看本地cuda版本,并且在[下载链接](https://www.paddlepaddle.org.cn/inference/user_guides/download_lib.html#python)中下载对应cuda版本和对应python版本的paddlepaddle安装包。 + - 使用以下指令查看本地cuda版本,并且在[下载链接](https://www.paddlepaddle.org.cn/inference/user_guides/download_lib.html#python)中下载对应cuda版本和对应python版本的PaddlePaddle安装包。 ```shell cat /usr/local/cuda/version.txt ### CUDA Version 10.2.89 @@ -234,6 +255,7 @@ ac.compress() ## 进阶使用 - ACT可以自动处理常见的预测模型,如果有更特殊的改造需求,可以参考[ACT超参配置教程](./hyperparameter_tutorial.md)来进行单独配置压缩策略。 +- ACT接口各个参数详细含义可以参考 [ACT API文档](../../docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst)。 ## 社区交流 diff --git a/example/auto_compression/detection/README.md b/example/auto_compression/detection/README.md index 40973d369..a61b9d9eb 100644 --- a/example/auto_compression/detection/README.md +++ b/example/auto_compression/detection/README.md @@ -35,17 +35,17 @@ ## 3. 
自动压缩流程 #### 3.1 准备环境 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 - PaddleDet >= 2.4 - opencv-python 安装paddlepaddle: ```shell # CPU -pip install paddlepaddle==2.3.2 +pip install paddlepaddle # GPU 以Ubuntu、CUDA 11.2为例 -python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html +python -m pip install paddlepaddle_gpu -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html ``` 安装paddleslim: diff --git a/example/auto_compression/detection/configs/picodet_s_qat_dis.yaml b/example/auto_compression/detection/configs/picodet_s_qat_dis.yaml index 3b1b08e1b..72de6102b 100644 --- a/example/auto_compression/detection/configs/picodet_s_qat_dis.yaml +++ b/example/auto_compression/detection/configs/picodet_s_qat_dis.yaml @@ -18,7 +18,7 @@ Distillation: - conv2d_154.tmp_1 - tmp_8 -Quantization: +QuantAware: use_pact: true activation_quantize_type: 'moving_average_abs_max' weight_bits: 8 diff --git a/example/auto_compression/detection/configs/ppyoloe_l_qat_dis.yaml b/example/auto_compression/detection/configs/ppyoloe_l_qat_dis.yaml index 730fb14ad..d1c28b060 100644 --- a/example/auto_compression/detection/configs/ppyoloe_l_qat_dis.yaml +++ b/example/auto_compression/detection/configs/ppyoloe_l_qat_dis.yaml @@ -11,7 +11,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: onnx_format: true use_pact: true activation_quantize_type: 'moving_average_abs_max' diff --git a/example/auto_compression/detection/configs/ppyoloe_s_qat_dis.yaml b/example/auto_compression/detection/configs/ppyoloe_s_qat_dis.yaml index be324ac7c..2090babab 100644 --- a/example/auto_compression/detection/configs/ppyoloe_s_qat_dis.yaml +++ b/example/auto_compression/detection/configs/ppyoloe_s_qat_dis.yaml @@ -11,7 +11,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: onnx_format: true use_pact: true activation_quantize_type: 'moving_average_abs_max' diff --git a/example/auto_compression/detection/configs/ssd_mbv1_voc_qat_dis.yaml b/example/auto_compression/detection/configs/ssd_mbv1_voc_qat_dis.yaml index fc532a0a9..710a3a97f 100644 --- a/example/auto_compression/detection/configs/ssd_mbv1_voc_qat_dis.yaml +++ b/example/auto_compression/detection/configs/ssd_mbv1_voc_qat_dis.yaml @@ -13,7 +13,7 @@ Distillation: - concat_2.tmp_0 - concat_1.tmp_0 -Quantization: +QuantAware: use_pact: True weight_quantize_type: 'channel_wise_abs_max' activation_quantize_type: 'moving_average_abs_max' diff --git a/example/auto_compression/detection/configs/tinypose_qat_dis.yaml b/example/auto_compression/detection/configs/tinypose_qat_dis.yaml index 237f73643..7cf508fc2 100644 --- a/example/auto_compression/detection/configs/tinypose_qat_dis.yaml +++ b/example/auto_compression/detection/configs/tinypose_qat_dis.yaml @@ -12,7 +12,7 @@ Distillation: node: - conv2d_441.tmp_0 -Quantization: +QuantAware: use_pact: true activation_quantize_type: 'moving_average_abs_max' weight_quantize_type: 'channel_wise_abs_max' # 'abs_max' is layer wise quant diff --git a/example/auto_compression/detection/configs/yolov3_mbv1_qat_dis.yaml b/example/auto_compression/detection/configs/yolov3_mbv1_qat_dis.yaml index bc48a679e..e0cf9a9ba 100644 --- 
a/example/auto_compression/detection/configs/yolov3_mbv1_qat_dis.yaml
+++ b/example/auto_compression/detection/configs/yolov3_mbv1_qat_dis.yaml
@@ -13,7 +13,7 @@ Distillation:
   - conv2d_85.tmp_0
   - conv2d_86.tmp_0

-Quantization:
+QuantAware:
   activation_quantize_type: 'range_abs_max'
   quantize_op_types:
   - conv2d
diff --git a/example/auto_compression/hyperparameter_tutorial.md b/example/auto_compression/hyperparameter_tutorial.md
index 7c95c94f4..045429950 100644
--- a/example/auto_compression/hyperparameter_tutorial.md
+++ b/example/auto_compression/hyperparameter_tutorial.md
@@ -3,15 +3,15 @@

 ## 1.1 各压缩方法超参解析

-### 1.1.1 量化(quantization)
+### 1.1.1 量化训练(quant aware)

 量化参数主要设置量化比特数和量化op类型,其中量化op包含卷积层(conv2d, depthwise_conv2d)和全连接层(mul, matmul_v2)。以下为只量化卷积层的示例:
 ```yaml
-Quantization:
+QuantAware:
   use_pact: false                               # 量化训练是否使用PACT方法
   weight_quantize_type: 'channel_wise_abs_max'  # 权重量化方式
   quantize_op_types: [conv2d, depthwise_conv2d] # 量化OP列表
-  onnx_format: false                            # 是否采用ONNX量化标准格式
+  onnx_format: false                            # 量化后的模型是否符合ONNX量化格式标准
  ############### 不常用,以下参数不用设置 #########################
   activation_bits: 8                            # 激活量化比特数
   weight_bits: 8                                # 权重量化比特数
@@ -34,7 +34,7 @@ Quantization:
 from paddleslim.quant.quanter import TRANSFORM_PASS_OP_TYPES,QUANT_DEQUANT_PASS_OP_TYPES
 print(TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES)
 ```
-- onnx_format: 是否采用ONNX量化格式标准,如果需要导出成ONNX,则需要设置为True。
+- onnx_format: 量化后的模型是否符合ONNX量化格式标准,**如果需要导出成ONNX,则需要设置为True。**
 - activation_bits: 激活量化bit数,可选1~8。默认为8。
 - weight_bits: 参数量化bit数,可选1~8。默认为8。
 - activation_quantize_type: 激活量化方式,可选 'abs_max' , 'range_abs_max' , 'moving_average_abs_max' 。如果使用 TensorRT 加载量化后的模型来预测,请使用 'range_abs_max' 或 'moving_average_abs_max' 。默认为 'moving_average_abs_max'。
@@ -50,7 +50,53 @@ print(TENSORRT_OP_TYPES)

 - is_full_quantize: 是否量化所有可支持op类型。默认值为False。
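+
+作为补充,下面给出一个把上述量化训练超参传入自动化压缩接口(ACT)的最小Python示意(模型路径与 train_loader 均为假设的占位写法,接口用法与本仓库 README 中的快速开始示例一致):
+
+```python
+from paddleslim.auto_compression import AutoCompression
+
+# 假设 train_loader 已按快速开始示例构造好
+ac = AutoCompression(
+    model_dir="./MobileNetV1_infer",              # 假设的推理模型目录
+    model_filename="inference.pdmodel",
+    params_filename="inference.pdiparams",
+    save_dir="MobileNetV1_quant",
+    config={
+        "QuantAware": {                           # 键名与取值对应上文超参
+            "use_pact": True,
+            "weight_quantize_type": "channel_wise_abs_max",
+            "quantize_op_types": ["conv2d", "depthwise_conv2d"],
+            "onnx_format": False,
+        },
+        "Distillation": {},                       # 量化训练通常与蒸馏策略组合使用
+    },
+    train_dataloader=train_loader,
+    eval_dataloader=train_loader)
+ac.compress()
+```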
-### 1.1.2 知识蒸馏(knowledge distillation)
+### 1.1.2 离线量化(post-training quantization)
+离线量化中基本的量化参数和量化训练相同,不再赘述。以下介绍离线量化特有的参数:
+```yaml
+QuantPost:
+  batch_size: 32
+  batch_nums: None
+  algo: 'hist'
+  hist_percent: 0.999
+  bias_correct: False
+  recon_level: None
+  regions: None
+  epochs: 20
+  lr: 0.1
+  simulate_activation_quant: False
+  skip_tensor_list: None
+```
+以上配置项说明如下:
+- batch_size: 设置每个 batch 的图片数量。默认值为32。
+- batch_nums: 离线量化迭代次数。如果设置为 None ,则会一直运行到全部训练数据迭代结束;否则,迭代次数为 batch_nums, 即参与对 Scale 进行校正的样本个数为 batch_nums * batch_size 。
+- algo: 量化时使用的算法名称,可为 'KL','mse', 'hist', 'avg' 或 'abs_max'。当 algo 设置为 'abs_max' 时,使用校正数据的激活值的绝对值的最大值当作 scale 值,当设置为 'KL' 时,则使用KL散度的方法来计算 Scale 值,当设置为 'avg' 时,使用校正数据激活值的最大绝对值平均数作为 scale 值,当设置为 'hist' 时,则使用基于百分比的直方图的方法来计算 scale 值,当设置为 'mse' 时,则使用搜索最小mse损失的方法来计算 scale 值。默认值为 'hist' 。
+- hist_percent: 'hist' 方法的百分位数。默认值为0.9999。
+- bias_correct: 是否使用 bias correction 算法。默认值为 False 。
+- recon_level: 设置该参数将在离线量化之后进行逐区域重建训练,目前支持 'layer-wise' 和 'region-wise'。当设置为'layer-wise'时, 以层为单位进行重建训练;当设置为'region-wise'时,以 `regions` 中每个块区域为单位进行重建训练;当设置为 None 时,则不进行重建训练。 默认值为 None 。
+- regions(list[list]): 当 recon_level 是 'region-wise' 时,需要设置该参数。该列表中每个元素由一个区域的输入和输出变量名组成,可参考该[示例](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/post_training_quantization/pytorch_yolo_series/configs/yolov6s_fine_tune.yaml#L11)。
+- epochs: 逐区域重建训练的训练次数。每个 epoch 内的样本数量为 batch_nums * batch_size 。默认值为20。
+- lr: 设置逐区域重建训练的学习率。
+- simulate_activation_quant: 是否在重建训练中引入激活量化噪声。默认值为 False 。
+- skip_tensor_list: 不进行量化的 Tensor 列表,需填入 Tensor 的 name。Tensor 的name 可以通过可视化工具查看。默认值为 None 。
+
+
+### 1.1.3 离线量化超参优化(hyper parameter optimization)
+超参优化是对离线量化中的超参数进行搜索,以选择最优的超参实现更好的量化效果。离线量化超参优化需要设置 `QuantPost` 和 `HyperParameterOptimization`。
+```yaml
+HyperParameterOptimization:
+  ptq_algo: ["KL", "hist", "avg", "mse"]
+  bias_correct: [True, False]
+  hist_percent: [0.98, 0.999]
+  batch_num: [10, 30]
+```
+以上配置项说明如下:
+- ptq_algo: 设置待搜索的离线量化算法。
+- bias_correct: 是否使用 bias correction 算法。
+- hist_percent: 设置 'hist' 算法阈值的上限和下限,实际百分比在此范围内均匀采样而得。
+- batch_num: 设置 'batch_num' 的上下限,实际数值在此范围内均匀采样而得。
+
+
+### 1.1.4 知识蒸馏(knowledge distillation)

 蒸馏参数主要设置蒸馏节点(`node`)和教师预测模型路径,如下所示:
 ```yaml
@@ -96,7 +142,7 @@ Distillation:

 - teacher_params_filename: 教师模型的参数文件名称,格式为 *.pdiparams 或 __params__。仅当设置`teacher_model_dir`后生效。

-### 1.1.3 结构化稀疏(sparsity)
+### 1.1.5 结构化稀疏(sparsity)

 结构化稀疏参数设置如下所示:
 ```yaml
 ChannelPrune:
   pruned_ratio: 0.25
   prune_params_name:
   - conv2d_26.w_0
   criterion: l1_norm
 ```

 - pruned_ratio: 每个卷积层的通道数被剪裁的比例。
-- prune_params_name: 待剪裁的卷积层的权重名称。通过以下脚本获得推理模型中所有卷积层的权重名称:
-
-```
-import paddle
-paddle.enable_static()
-model_dir="./inference_model"
-exe = paddle.static.Executor(paddle.CPUPlace())
-[inference_program, feed_target_names, fetch_targets] = (
-    paddle.static.load_inference_model(model_dir, exe))
-for var_ in inference_program.list_vars():
-    if var_.persistable and "conv2d" in var_.name:
-        print(f"{var_.name}")
-```
-
-或者,使用[Netron工具](https://netron.app/) 可视化`*.pdmodel`模型文件,选择合适的卷积层进行剪裁。
-
+- prune_params_name: 待剪裁的卷积层的权重名称。如果设置为 "None", 则会按照传入的剪枝比例对所有可以裁剪的卷积层进行裁剪。或者可以参考[结构化剪枝敏感度分析工具](./prune_sensitivity_analysis/README.md)获得合适的要剪枝的参数和比例。也可以使用[Netron工具](https://netron.app/) 可视化`*.pdmodel`模型文件,选择合适的卷积层进行剪裁。默认:"None"。
 - criterion: 评估卷积通道重要性的指标。可选 "l1_norm" , "bn_scale" , "geometry_median"。具体定义和使用可参考[结构化稀疏API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/prune/prune_api.html)。
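+
+下面给出一个仅作示意的 ChannelPrune 策略配置片段(Python dict 形式,可作为 AutoCompression 的 config 参数传入;其中剪枝比例与参数名均为假设值,实际请结合敏感度分析结果选择):
+
+```python
+# 结构化稀疏与蒸馏组合的策略配置示意
+config = {
+    "ChannelPrune": {
+        "pruned_ratio": 0.25,                    # 假设的通道剪裁比例
+        "prune_params_name": ["conv2d_26.w_0"],  # 假设的待剪裁参数名
+        "criterion": "l1_norm",
+    },
+    "Distillation": {},                          # 剪枝通常与蒸馏组合使用
+}
+```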
-### 1.1.4 ASP半结构化稀疏
+### 1.1.6 ASP半结构化稀疏

 半结构化稀疏参数设置如下所示:
 ```yaml
 ASPPrune:
   prune_params_name:
   - conv1_weights
 ```

-- prune_params_name: 待剪裁的卷积层的权重名称。通过以下脚本获得推理模型中所有卷积层的权重名称:
-
-```
-import paddle
-paddle.enable_static()
-model_dir="./inference_model"
-exe = paddle.static.Executor(paddle.CPUPlace())
-[inference_program, feed_target_names, fetch_targets] = (
-    paddle.static.load_inference_model(model_dir, exe))
-for var_ in inference_program.list_vars():
-    if var_.persistable and "conv2d" in var_.name:
-        print(f"{var_.name}")
-```
-
-或者,使用[Netron工具](https://netron.app/) 可视化`*.pdmodel`模型文件,选择合适的卷积层进行剪裁。
+- prune_params_name: 待剪裁的卷积层的权重名称。如果设置为 "None", 则会按照传入的剪枝比例对所有可以裁剪的卷积层进行裁剪。或者,使用[Netron工具](https://netron.app/) 可视化`*.pdmodel`模型文件,选择合适的卷积层进行剪裁。

-### 1.1.5 Transformer结构化剪枝
+### 1.1.7 Transformer结构化剪枝

 针对Transformer结构的结构化剪枝参数设置如下所示:
 ```yaml
 TransformerPrune:
 ```
 - pruned_ratio: 每个全连接层被剪裁的比例。

-### 1.1.6 非结构化稀疏策略
+### 1.1.8 非结构化稀疏策略

 非结构化稀疏参数设置如下所示:
 ```yaml
 UnstructurePrune:
@@ -196,11 +213,11 @@ UnstructurePrune:
     {'pruning_steps': int} # the total times you want to increase the ratio
     {'initial_ratio': float} # the initial ratio value
 ```
-- prune_params_type 目前只支持None和"conv1x1_only"两个选项,前者表示稀疏化除了归一化层的参数,后者表示只稀疏化1x1卷积。
+- prune_params_type 目前只支持None和"conv1x1_only"两个选项,前者表示稀疏化除归一化层以外的参数,后者表示只稀疏化1x1卷积。默认:"conv1x1_only"。
 - local_sparsity 表示剪裁比例(ratio)应用的范围,仅在 'ratio' 模式生效。local_sparsity 开启时意味着每个参与剪裁的参数矩阵稀疏度均为 'ratio', 关闭时表示只保证模型整体稀疏度达到'ratio',但是每个参数矩阵的稀疏度可能存在差异。各个矩阵稀疏度保持一致时,稀疏加速更显著。
 - 更多非结构化稀疏的参数含义详见[非结构化稀疏API文档](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/dygraph/pruners/unstructured_pruner.rst)

-### 1.1.7 训练超参
+### 1.1.9 训练超参

 训练参数主要设置学习率、训练次数(epochs)和优化器等。
 ```yaml
@@ -287,3 +304,7 @@ for var_ in inference_program.list_vars():
     paddle.static.save_inference_model("./infer_model", feed_vars, fetch_targets, exe, program=inference_program)
 ```
+
+### 5. 量化后模型如何导出成ONNX格式
+
+如果想导出ONNX格式的模型,需要在量化的时候设置 ``onnx_format=True``,而且仅支持 PaddlePaddle 2.4rc0 和 PaddleSlim 2.4rc0 以上版本。
diff --git a/example/auto_compression/image_classification/README.md b/example/auto_compression/image_classification/README.md
index 973c04e52..2194eb4db 100644
--- a/example/auto_compression/image_classification/README.md
+++ b/example/auto_compression/image_classification/README.md
@@ -45,6 +45,8 @@
 | MobileNetV3_large_x1_0 | 量化+蒸馏 | 74.04 | - | 9.85 | [Config](./configs/MobileNetV3_large_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV3_large_x1_0_QAT.tar) |
 | MobileNetV3_large_x1_0_ssld | Baseline | 78.96 | - | 16.62 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_ssld_infer.tar) |
 | MobileNetV3_large_x1_0_ssld | 量化+蒸馏 | 77.17 | - | 9.85 | [Config](./configs/MobileNetV3_large_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV3_large_x1_0_ssld_QAT.tar) |
+| ViT_base_patch16_224 | Baseline | 81.89 | 367.17(batch_size=40) | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_224_infer.tar) |
+| ViT_base_patch16_224 | 量化+蒸馏 | 82.05 | 51.70(batch_size=40) | - | [Config](./configs/VIT/qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ViT_base_patch16_224_QAT.tar) |

 - ARM CPU 测试环境:`SDM865(4xA77+4xA55)`
 - Nvidia GPU 测试环境:
@@ -57,15 +59,15 @@

 #### 3.1 准备环境
 - python >= 3.6
-- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装)
-- PaddleSlim >= 2.3
+- PaddlePaddle >= 2.4 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装)
+- PaddleSlim >= 2.4

 安装paddlepaddle:
 ```shell
 # CPU
 pip install paddlepaddle
 # GPU
-pip install paddlepaddle-gpu
+pip install paddlepaddle_gpu
 ```

 安装paddleslim:
@@ -73,6 +75,13 @@ pip install paddlepaddle-gpu
 pip install paddleslim
 ```

+若使用`run_ppclas.py`脚本,需安装paddleclas:
+```shell
+git clone https://github.com/PaddlePaddle/PaddleClas.git -b release/2.5
+cd PaddleClas
+pip install --upgrade -r requirements.txt
+```
+
 #### 3.2 准备数据集
 本案例默认以ImageNet1k数据进行自动压缩实验,如数据集为非ImageNet1k格式数据, 请参考[PaddleClas数据准备文档](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/data_preparation/classification_dataset.md)。将下载好的数据集放在当前目录下`./ILSVRC2012`。
@@ -143,7 +152,12 @@ python -m paddle.distributed.launch run.py --save_dir='./save_quant_mobilev1/' -

 - TensorRT预测:

-环境配置:如果使用 TesorRT 预测引擎,需安装 ```WITH_TRT=ON``` 的Paddle,下载地址:[Python预测库](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
+环境配置:如果使用 TensorRT 预测引擎,需安装的是带有TensorRT的PaddlePaddle,使用以下指令查看本地cuda版本,并且在[下载链接](https://www.paddlepaddle.org.cn/inference/user_guides/download_lib.html#python)中下载对应cuda版本和对应python版本的PaddlePaddle安装包。
+
+  ```shell
+  cat /usr/local/cuda/version.txt ### CUDA Version 10.2.89
+  ### 10.2.89 为cuda版本号,可以根据这个版本号选择需要安装的带有TensorRT的PaddlePaddle安装包。
+  ```

 ```shell
 python paddle_inference_eval.py \
diff --git a/example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml b/example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml
index 461f18e03..1bcc0e73b 100644
--- a/example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/EfficientNetB0/qat_dis.yaml
@@ -11,7 +11,7 @@ Distillation:
   node:
   - softmax_1.tmp_0

-Quantization:
+QuantAware:
   use_pact: true
   activation_bits: 8
   is_full_quantize: false
diff --git a/example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml b/example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml
index 71e2eeaf5..0e91d4c09 100644
--- a/example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/GhostNet_x1_0/qat_dis.yaml
@@ -10,7 +10,7 @@ Distillation:
   loss: l2
   node:
   - softmax_0.tmp_0
-Quantization:
+QuantAware:
   use_pact: true
   activation_bits: 8
   is_full_quantize: false
diff --git a/example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml b/example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml
index 6276f703e..3b1e4084c 100644
--- a/example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/InceptionV3/qat_dis.yaml
@@ -12,7 +12,7 @@ Distillation:
   loss: l2
   node:
   - softmax_1.tmp_0
-Quantization:
+QuantAware:
   is_full_quantize: false
   activation_quantize_type: moving_average_abs_max
   weight_quantize_type: channel_wise_abs_max
diff --git a/example/auto_compression/image_classification/configs/MobileNetV1/qat_dis.yaml b/example/auto_compression/image_classification/configs/MobileNetV1/qat_dis.yaml
index 9c3c2b97f..8f74d745d 100644
--- a/example/auto_compression/image_classification/configs/MobileNetV1/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/MobileNetV1/qat_dis.yaml
@@ -10,7 +10,7 @@ Distillation:
   loss: l2
   node:
   - softmax_0.tmp_0
-Quantization:
+QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml b/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml index e6a2e1049..2da27da8d 100644 --- a/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml +++ b/example/auto_compression/image_classification/configs/MobileNetV3_large_x1_0/qat_dis.yaml @@ -9,7 +9,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml b/example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml index 64d571171..50eb9898d 100644 --- a/example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml +++ b/example/auto_compression/image_classification/configs/PPHGNet_tiny/qat_dis.yaml @@ -11,7 +11,7 @@ Distillation: node: - softmax_1.tmp_0 -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml b/example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml index 00c05888a..ae6f25b01 100644 --- a/example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml +++ b/example/auto_compression/image_classification/configs/PPLCNetV2_base/qat_dis.yaml @@ -11,7 +11,7 @@ Distillation: node: - softmax_1.tmp_0 -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml b/example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml index d588f8a9f..f0e67260a 100644 --- a/example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml +++ b/example/auto_compression/image_classification/configs/PPLCNet_x1_0/qat_dis.yaml @@ -10,7 +10,7 @@ Distillation: loss: l2 node: - softmax_1.tmp_0 -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml b/example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml index 078915aa3..2d0ea1ebc 100644 --- a/example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml +++ b/example/auto_compression/image_classification/configs/ResNet50_vd/qat_dis.yaml @@ -11,7 +11,7 @@ Distillation: node: - softmax_0.tmp_0 -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml b/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml index 0c0ca531f..31c618e4b 100644 --- a/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml +++ b/example/auto_compression/image_classification/configs/ShuffleNetV2_x1_0/qat_dis.yaml @@ -10,7 +10,7 @@ Distillation: loss: l2 node: - softmax_0.tmp_0 -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml b/example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml index 073f38724..4b9964afb 100644 --- 
a/example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/SqueezeNet1_0/qat_dis.yaml
@@ -10,7 +10,7 @@ Distillation:
   loss: l2
   node:
   - softmax_0.tmp_0
-Quantization:
+QuantAware:
   activation_bits: 8
   is_full_quantize: false
   activation_quantize_type: moving_average_abs_max
diff --git a/example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml b/example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml
index ce8f746f4..99f61b775 100644
--- a/example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml
+++ b/example/auto_compression/image_classification/configs/SwinTransformer_base_patch4_window7_224/qat_dis.yaml
@@ -10,7 +10,7 @@ Distillation:
   loss: l2
   node:
   - softmax_48.tmp_0
-Quantization:
+QuantAware:
   use_pact: true
   activation_bits: 8
   is_full_quantize: false
diff --git a/example/auto_compression/nlp/README.md b/example/auto_compression/nlp/README.md
index af1a5cf31..41e9f2de7 100644
--- a/example/auto_compression/nlp/README.md
+++ b/example/auto_compression/nlp/README.md
@@ -30,13 +30,20 @@
 | ERNIE 3.0-Medium | Base模型| 75.35 | 57.45 | 60.17 | 81.16 | 77.19 | 80.59 | 79.70 | 73.09 |
 | ERNIE 3.0-Medium | 剪枝+量化训练| 74.17 | 56.84 | 59.75 | 80.54 | 76.03 | 76.97 | 80.80 | 72.16 |

+| 模型 | 策略 | 报销工单数据 |
+|:------:|:------:|:------:|
+| UIE-base | Base模型 | [91.83](https://bj.bcebos.com/v1/paddle-slim-models/act/uie_base.tar) |
+| UIE-base | 量化训练 | [95.80](https://bj.bcebos.com/v1/paddle-slim-models/act/uie_base_qat_model.tar) |
+
+注:UIE模型精度为在5-shot(每个类别包含5条标注数据)数据集上进行模型微调的结果;压缩后精度更高,可能是因为模型在当前小样本数据集上存在过拟合。
+
 模型在不同任务上平均精度以及加速对比如下:
-| 模型 |策略| Accuracy(avg) | 时延(ms) | 加速比 |
-|:-------:|:--------:|:----------:|:------------:| :------:|
-|PP-MiniLM| Base模型| 72.81 | 128.01 | - |
-|PP-MiniLM| 剪枝+离线量化 | 72.44 | 17.97 | 7.12 |
-|ERNIE 3.0-Medium| Base模型| 73.09 | 29.25(fp16) | - |
-|ERNIE 3.0-Medium| 剪枝+量化训练 | 72.16 | 19.61 | 1.49 |
+| 模型 |策略| Accuracy(avg) | 预测时延FP32
| 预测时延FP16
| 预测时延INT8
| 加速比 | +|:-------:|:--------:|:----------:|:------------:|:------:|:------:|:------:| +|PP-MiniLM| Base模型| 72.81 | 94.49ms | 23.31ms | - | - | +|PP-MiniLM| 剪枝+离线量化 | 71.85 | - | - | 15.76ms | 5.99x | +|ERNIE 3.0-Medium| Base模型| 73.09 | 89.71ms | 20.76ms | - | - | +|ERNIE 3.0-Medium| 剪枝+量化训练 | 72.16 | - | - | 14.08ms | 6.37x | 性能测试的环境为 - 硬件:NVIDIA Tesla T4 单卡 @@ -47,8 +54,8 @@ #### 3.1 准备环境 - python >= 3.6 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 - PaddleNLP >= 2.3 安装paddlepaddle: @@ -56,7 +63,7 @@ # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` 安装paddleslim: @@ -157,7 +164,9 @@ Prune: pruned_ratio: 0.25 ``` -- 优化参数 +- 离线量化超参搜索 + +本示例的离线量化采取了超参搜索策略,以选择最优的超参数取得更好的离线量化效果。首先,配置待搜索的参数: ```yaml HyperParameterOptimization: @@ -177,12 +186,12 @@ HyperParameterOptimization: - channel_wise_abs_max ``` -- 量化参数 +其次,配置离线量化参数: 量化参数主要设置量化比特数和量化op类型,其中量化op包含卷积层(conv2d, depthwise_conv2d)和全连接层(mul,matmul_v2)。 ```yaml -Quantization: +QuantPost: activation_bits: 8 quantize_op_types: - conv2d diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml index f50fa6cd0..9c9f58826 100644 --- a/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml +++ b/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml @@ -10,7 +10,7 @@ TransformerPrune: pruned_ratio: 0.25 HyperParameterOptimization: Distillation: -Quantization: +QuantPost: TrainConfig: epochs: 6 eval_iter: 1070 diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml index 0667d5e6e..98035ed77 100644 --- a/example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml +++ b/example/auto_compression/nlp/configs/pp-minilm/auto/cluewsc.yaml @@ -10,7 +10,7 @@ TransformerPrune: pruned_ratio: 0.25 HyperParameterOptimization: Distillation: -Quantization: +QuantPost: TrainConfig: epochs: 100 eval_iter: 70 diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml index 8a991e0af..e5f21a5e3 100644 --- a/example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml +++ b/example/auto_compression/nlp/configs/pp-minilm/auto/cmnli.yaml @@ -10,7 +10,7 @@ TransformerPrune: pruned_ratio: 0.25 HyperParameterOptimization: Distillation: -Quantization: +QuantPost: TrainConfig: epochs: 6 eval_iter: 2000 diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml index af6e81b2f..47e02597e 100644 --- a/example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml +++ b/example/auto_compression/nlp/configs/pp-minilm/auto/csl.yaml @@ -10,7 +10,7 @@ TransformerPrune: pruned_ratio: 0.25 HyperParameterOptimization: Distillation: -Quantization: +QuantPost: TrainConfig: epochs: 16 eval_iter: 1000 diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml index aaacb2969..d215b192c 100644 --- a/example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml +++ 
b/example/auto_compression/nlp/configs/pp-minilm/auto/iflytek.yaml @@ -10,7 +10,7 @@ TransformerPrune: pruned_ratio: 0.25 HyperParameterOptimization: Distillation: -Quantization: +QuantPost: TrainConfig: epochs: 12 eval_iter: 750 diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml index 8fce64496..eec5b87f4 100644 --- a/example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml +++ b/example/auto_compression/nlp/configs/pp-minilm/auto/ocnli.yaml @@ -10,7 +10,7 @@ TransformerPrune: pruned_ratio: 0.25 HyperParameterOptimization: Distillation: -Quantization: +QuantPost: TrainConfig: epochs: 20 eval_iter: 1050 diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml index 8c57ae30e..1344332e8 100644 --- a/example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml +++ b/example/auto_compression/nlp/configs/pp-minilm/auto/tnews.yaml @@ -10,7 +10,7 @@ TransformerPrune: pruned_ratio: 0.25 HyperParameterOptimization: Distillation: -Quantization: +QuantPost: TrainConfig: epochs: 6 eval_iter: 1110 diff --git a/example/auto_compression/ocr/README.md b/example/auto_compression/ocr/README.md index 82c2d2bdd..66d980cae 100644 --- a/example/auto_compression/ocr/README.md +++ b/example/auto_compression/ocr/README.md @@ -29,15 +29,15 @@ #### 3.1 准备环境 - python >= 3.6 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 安装paddlepaddle: ```shell # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` 安装paddleslim: diff --git a/example/auto_compression/ocr/configs/ppocrv3_det_qat_dist.yaml b/example/auto_compression/ocr/configs/ppocrv3_det_qat_dist.yaml index 7b54d4207..b753dd71c 100644 --- a/example/auto_compression/ocr/configs/ppocrv3_det_qat_dist.yaml +++ b/example/auto_compression/ocr/configs/ppocrv3_det_qat_dist.yaml @@ -9,7 +9,7 @@ Distillation: alpha: 1.0 loss: l2 -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/auto_compression/prune_sensitivity_analysis/README.md b/example/auto_compression/prune_sensitivity_analysis/README.md new file mode 100644 index 000000000..cc9f58780 --- /dev/null +++ b/example/auto_compression/prune_sensitivity_analysis/README.md @@ -0,0 +1,170 @@ +# 结构化剪枝敏感度分析 + +本示例将以自动压缩示例中MobileNetV1为例,介绍如何快速修改示例代码,进行结构化剪枝敏感度分析工具分析模型参数敏感度,从而设置合适的剪枝比例和要剪枝的参数,在保证剪枝后模型精度的前提下进行最大比例的模型剪枝。 +图像分类除MobileNetV1模型外其他模型的结构化剪枝敏感度分析可以直接使用 [run.py](./run.py) 脚本,替换传入的 config_path 文件为其他模型的任一压缩yaml文件,即可对其他图像分类模型进行敏感度分析。 + +## 计算通道剪枝敏感度 + +以下为示例代码每一步的含义,如果您是ACT(自动压缩工具)的用户,加粗文字表示如何把一个自动压缩示例改为一个敏感度分析示例。 + +### 1. 
引入依赖 + +引入一些需要的依赖,可以直接复用以下代码,如果您需要对其他场景下模型进行敏感度分析,需要把其他场景文件下中 ``run.py`` 文件中独有的依赖也导入进来。**或者把最后一个依赖放入自动压缩示例代码中。** + +```python +import os +import sys +import argparse +import pickle +import functools +from functools import partial +import math +from tqdm import tqdm + +import numpy as np +import paddle +import paddle.nn as nn +from paddle.io import DataLoader +import paddleslim +from imagenet_reader import ImageNetDataset +from paddleslim.common import load_config as load_slim_config +from paddleslim.auto_compression.analysis import analysis_prune +``` + +### 2. 定义可传入参数 + +定义一些可以通过指令传入的参数。**此段代码无论您想对任何场景的模型进行分析都无需修改,复制过去替换原本的指令即可** + +```python +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + '--config_path', + type=str, + default=None, + help="path of compression strategy config.", + required=True) + parser.add_argument( + '--analysis_file', + type=str, + default='sensitivity_0.data', + help="directory to save compressed model.") + parser.add_argument( + '--pruned_ratios', + nargs='+', + type=float, + default=[0.1, 0.2, 0.3, 0.4], + help="The ratios to be pruned when compute sensitivity.") + parser.add_argument( + '--target_loss', + type=float, + default=0.2, + help="use the target loss to get prune ratio of each parameter") + + return parser + + +``` + +### 3. 定义eval_function + +需要定义完整的测试流程,可以直接使用对应场景文件夹下 ``run.py`` 文件中的测试流程即可,**把自动压缩示例代码中测试回调函数中下面这一行代码:** + +```python +def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list): +``` +**修改成:** +```python +def eval_function(compiled_test_program, exe, test_feed_names, test_fetch_list): +``` + +最终的测试过程代码如下: +```python +def eval_reader(data_dir, batch_size, crop_size, resize_size, place=None): + val_reader = ImageNetDataset( + mode='val', + data_dir=data_dir, + crop_size=crop_size, + resize_size=resize_size) + val_loader = DataLoader( + val_reader, + places=[place] if place is not None else None, + batch_size=global_config['batch_size'], + shuffle=False, + drop_last=False, + num_workers=0) + return val_loader + + +def eval_function(compiled_test_program, exe, test_feed_names, test_fetch_list): + val_loader = eval_reader( + global_config['data_dir'], + batch_size=global_config['batch_size'], + crop_size=img_size, + resize_size=resize_size) + + results = [] + with tqdm( + total=len(val_loader), + bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}', + ncols=80) as t: + for batch_id, (image, label) in enumerate(val_loader): + # top1_acc, top5_acc + if len(test_feed_names) == 1: + image = np.array(image) + label = np.array(label).astype('int64') + pred = exe.run(compiled_test_program, + feed={test_feed_names[0]: image}, + fetch_list=test_fetch_list) + pred = np.array(pred[0]) + label = np.array(label) + sort_array = pred.argsort(axis=1) + top_1_pred = sort_array[:, -1:][:, ::-1] + top_1 = np.mean(label == top_1_pred) + top_5_pred = sort_array[:, -5:][:, ::-1] + acc_num = 0 + for i in range(len(label)): + if label[i][0] in top_5_pred[i]: + acc_num += 1 + top_5 = float(acc_num) / len(label) + results.append([top_1, top_5]) + else: + # eval "eval model", which inputs are image and label, output is top1 and top5 accuracy + image = np.array(image) + label = np.array(label).astype('int64') + result = exe.run(compiled_test_program, + feed={ + test_feed_names[0]: image, + test_feed_names[1]: label + }, + fetch_list=test_fetch_list) + result = [np.mean(r) for r in result] + results.append(result) + t.update() + result = np.mean(np.array(results), axis=0) + return 
result[0] +``` + +### 4. 加载配置文件 +加载配置文件,获得文件中数据读取部分的相关配置。**使用原始的自动压缩示例代码中的即可** +```python +global global_config +all_config = load_slim_config(args.config_path) + +assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}" +global_config = all_config["Global"] + +global img_size, resize_size +img_size = global_config['img_size'] if 'img_size' in global_config else 224 +resize_size = global_config[ + 'resize_size'] if 'resize_size' in global_config else 256 +``` + +### 4. 进行敏感度分析 + +传入测试回调函数,配置(主要包括模型位置和模型名称等信息),分析文件保存的位置,要分析的裁剪比例和可以接受的精度目标损失。如果不传入可以接受的精度目标损失,则只返回敏感度分析情况。**把自动压缩代码中调用AutoCompression 和 ac.compress 的代码替换成以下代码即可** + +```python +analysis_prune(eval_function, global_config['model_dir'], global_config['model_filename'], global_config['params_filename'], args.analysis_file, + args.pruned_ratios, args.target_loss) +``` diff --git a/example/auto_compression/prune_sensitivity_analysis/run.py b/example/auto_compression/prune_sensitivity_analysis/run.py new file mode 100644 index 000000000..59abd8d59 --- /dev/null +++ b/example/auto_compression/prune_sensitivity_analysis/run.py @@ -0,0 +1,149 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import sys +import argparse +import pickle +import functools +from functools import partial +import math +from tqdm import tqdm + +import numpy as np +import paddle +import paddle.nn as nn +from paddle.io import DataLoader +import paddleslim +from imagenet_reader import ImageNetDataset +from paddleslim.common import load_config as load_slim_config +from paddleslim.auto_compression.analysis import analysis_prune + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + '--config_path', + type=str, + default=None, + help="path of compression strategy config.", + required=True) + parser.add_argument( + '--analysis_file', + type=str, + default='sensitivity_0.data', + help="directory to save compressed model.") + parser.add_argument( + '--pruned_ratios', + nargs='+', + type=float, + default=[0.1, 0.2, 0.3, 0.4], + help="The ratios to be pruned when compute sensitivity.") + parser.add_argument( + '--target_loss', + type=float, + default=0.2, + help="use the target loss to get prune ratio of each parameter") + + return parser + + +def eval_reader(data_dir, batch_size, crop_size, resize_size, place=None): + val_reader = ImageNetDataset( + mode='val', + data_dir=data_dir, + crop_size=crop_size, + resize_size=resize_size) + val_loader = DataLoader( + val_reader, + places=[place] if place is not None else None, + batch_size=global_config['batch_size'], + shuffle=False, + drop_last=False, + num_workers=0) + return val_loader + + +def eval_function(compiled_test_program, exe, test_feed_names, test_fetch_list): + val_loader = eval_reader( + global_config['data_dir'], + batch_size=global_config['batch_size'], + crop_size=img_size, + resize_size=resize_size) + + results = [] + with tqdm( + total=len(val_loader), + 
bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}', + ncols=80) as t: + for batch_id, (image, label) in enumerate(val_loader): + # top1_acc, top5_acc + if len(test_feed_names) == 1: + image = np.array(image) + label = np.array(label).astype('int64') + pred = exe.run(compiled_test_program, + feed={test_feed_names[0]: image}, + fetch_list=test_fetch_list) + pred = np.array(pred[0]) + label = np.array(label) + sort_array = pred.argsort(axis=1) + top_1_pred = sort_array[:, -1:][:, ::-1] + top_1 = np.mean(label == top_1_pred) + top_5_pred = sort_array[:, -5:][:, ::-1] + acc_num = 0 + for i in range(len(label)): + if label[i][0] in top_5_pred[i]: + acc_num += 1 + top_5 = float(acc_num) / len(label) + results.append([top_1, top_5]) + else: + # eval "eval model", which inputs are image and label, output is top1 and top5 accuracy + image = np.array(image) + label = np.array(label).astype('int64') + result = exe.run(compiled_test_program, + feed={ + test_feed_names[0]: image, + test_feed_names[1]: label + }, + fetch_list=test_fetch_list) + result = [np.mean(r) for r in result] + results.append(result) + t.update() + result = np.mean(np.array(results), axis=0) + return result[0] + + +def main(): + global global_config + all_config = load_slim_config(args.config_path) + + assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}" + global_config = all_config["Global"] + + global img_size, resize_size + img_size = global_config['img_size'] if 'img_size' in global_config else 224 + resize_size = global_config[ + 'resize_size'] if 'resize_size' in global_config else 256 + + analysis_prune(eval_function, global_config['model_dir'], + global_config['model_filename'], + global_config['params_filename'], args.analysis_file, + args.pruned_ratios, args.target_loss) + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + args = parser.parse_args() + main() diff --git a/example/auto_compression/pytorch_huggingface/README.md b/example/auto_compression/pytorch_huggingface/README.md index b7cc14374..c5ccaff38 100644 --- a/example/auto_compression/pytorch_huggingface/README.md +++ b/example/auto_compression/pytorch_huggingface/README.md @@ -40,8 +40,8 @@ ## 3. 自动压缩流程 #### 3.1 准备环境 - python >= 3.6 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 - X2Paddle develop版本 - transformers >= 4.18.0 - PaddleNLP >= 2.3 @@ -54,7 +54,7 @@ # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` 安装paddleslim: diff --git a/example/auto_compression/pytorch_yolo_series/README.md b/example/auto_compression/pytorch_yolo_series/README.md index e75ac8b97..e65677391 100644 --- a/example/auto_compression/pytorch_yolo_series/README.md +++ b/example/auto_compression/pytorch_yolo_series/README.md @@ -45,24 +45,30 @@ ## 3. 
自动压缩流程 #### 3.1 准备环境 -- PaddlePaddle >= 2.3.2版本 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)根据相应环境的安装指令进行安装) +- PaddlePaddle >= 2.4.0版本 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)根据相应环境的安装指令进行安装) - PaddleSlim develop 版本 (1)安装paddlepaddle ``` # CPU -pip install paddlepaddle==2.3.2 +pip install paddlepaddle # GPU 以Ubuntu、CUDA 11.2为例 -python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html +python -m pip install paddlepaddle_gpu -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html ``` -(2)安装paddleslim>=2.3.3: +(2)安装paddleslim>=2.4.0: ```shell -pip install paddleslim==2.3.3 +pip install paddleslim ``` +#### 版本对齐 -#### 3.2 准备数据集 +| PaddleSlim | x2paddle | +| :-----------: | :------------: | +| 2.3.x | 1.3.8 | +| develop / 2.4 | 1.3.9 | + +### 3.2 准备数据集 **选择(1)或(2)中一种方法准备数据即可。** @@ -107,7 +113,7 @@ pip install paddleslim==2.3.3 ``` -#### 3.3 准备预测模型 +### 3.3 准备预测模型 (1)准备ONNX模型: @@ -130,7 +136,7 @@ pip install paddleslim==2.3.3 **注意**:目前ACT支持**不带NMS**模型,使用如上命令导出即可。也可以直接下载我们已经准备好的[yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx)。 -#### 3.4 自动压缩并产出模型 +### 3.4 自动压缩并产出模型 蒸馏量化自动压缩示例通过run.py脚本启动,会使用接口```paddleslim.auto_compression.AutoCompression```对模型进行自动压缩。配置config文件中模型路径、蒸馏、量化、和训练等部分的参数,配置完成后便可对模型进行量化和蒸馏。 @@ -160,7 +166,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log - │ ├── calibration.cache # TensorRT可以直接加载的校准表 ``` -#### Paddle Inference部署测试 +### Paddle Inference部署测试 量化模型在GPU上可以使用TensorRT进行加速,在CPU上可以使用MKLDNN进行加速。 @@ -219,7 +225,7 @@ bash compile.sh ./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8 ``` -#### 导出至ONNX使用TensorRT部署 +### 导出至ONNX使用TensorRT部署 加载`quant_model.onnx`和`calibration.cache`,可以直接使用TensorRT测试脚本进行验证,详细代码可参考[TensorRT部署](./TensorRT) diff --git a/example/auto_compression/pytorch_yolo_series/configs/yolov5s_qat_dis.yaml b/example/auto_compression/pytorch_yolo_series/configs/yolov5s_qat_dis.yaml index c2b230d89..683f4a6f0 100644 --- a/example/auto_compression/pytorch_yolo_series/configs/yolov5s_qat_dis.yaml +++ b/example/auto_compression/pytorch_yolo_series/configs/yolov5s_qat_dis.yaml @@ -12,7 +12,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: onnx_format: true use_pact: true activation_quantize_type: 'moving_average_abs_max' diff --git a/example/auto_compression/pytorch_yolo_series/configs/yolov6s_qat_dis.yaml b/example/auto_compression/pytorch_yolo_series/configs/yolov6s_qat_dis.yaml index 9a3f7af3b..ded463063 100644 --- a/example/auto_compression/pytorch_yolo_series/configs/yolov6s_qat_dis.yaml +++ b/example/auto_compression/pytorch_yolo_series/configs/yolov6s_qat_dis.yaml @@ -12,7 +12,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: onnx_format: true activation_quantize_type: 'moving_average_abs_max' quantize_op_types: diff --git a/example/auto_compression/pytorch_yolo_series/configs/yolov6s_v2_qat_dis.yaml b/example/auto_compression/pytorch_yolo_series/configs/yolov6s_v2_qat_dis.yaml index 4c775392b..92acc3be8 100644 --- a/example/auto_compression/pytorch_yolo_series/configs/yolov6s_v2_qat_dis.yaml +++ b/example/auto_compression/pytorch_yolo_series/configs/yolov6s_v2_qat_dis.yaml @@ -13,7 +13,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: onnx_format: 
true activation_quantize_type: 'moving_average_abs_max' quantize_op_types: diff --git a/example/auto_compression/pytorch_yolo_series/configs/yolov7_qat_dis.yaml b/example/auto_compression/pytorch_yolo_series/configs/yolov7_qat_dis.yaml index b7dcce83b..29c92a99d 100644 --- a/example/auto_compression/pytorch_yolo_series/configs/yolov7_qat_dis.yaml +++ b/example/auto_compression/pytorch_yolo_series/configs/yolov7_qat_dis.yaml @@ -12,7 +12,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: onnx_format: true activation_quantize_type: 'moving_average_abs_max' quantize_op_types: diff --git a/example/auto_compression/pytorch_yolo_series/configs/yolov7_tiny_qat_dis.yaml b/example/auto_compression/pytorch_yolo_series/configs/yolov7_tiny_qat_dis.yaml index 7359e0ee6..842902379 100644 --- a/example/auto_compression/pytorch_yolo_series/configs/yolov7_tiny_qat_dis.yaml +++ b/example/auto_compression/pytorch_yolo_series/configs/yolov7_tiny_qat_dis.yaml @@ -12,7 +12,7 @@ Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: onnx_format: true activation_quantize_type: 'moving_average_abs_max' quantize_op_types: diff --git a/example/auto_compression/semantic_segmentation/README.md b/example/auto_compression/semantic_segmentation/README.md index 952c83662..14f1e5dcc 100644 --- a/example/auto_compression/semantic_segmentation/README.md +++ b/example/auto_compression/semantic_segmentation/README.md @@ -47,8 +47,8 @@ #### 3.1 准备环境 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 - PaddleSeg == 2.5.0 安装paddlepaddle: @@ -56,7 +56,7 @@ # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` 安装paddleslim: diff --git a/example/auto_compression/semantic_segmentation/configs/BiSeNetV2/BiSeNetV2_qat.yaml b/example/auto_compression/semantic_segmentation/configs/BiSeNetV2/BiSeNetV2_qat.yaml index 1de0705a4..52700e2d4 100644 --- a/example/auto_compression/semantic_segmentation/configs/BiSeNetV2/BiSeNetV2_qat.yaml +++ b/example/auto_compression/semantic_segmentation/configs/BiSeNetV2/BiSeNetV2_qat.yaml @@ -11,7 +11,7 @@ Distillation: node: - conv2d_103.tmp_1 -Quantization: +QuantAware: onnx_format: True quantize_op_types: - conv2d diff --git a/example/auto_compression/semantic_segmentation/configs/deeplabv3/deeplabv3_qat.yaml b/example/auto_compression/semantic_segmentation/configs/deeplabv3/deeplabv3_qat.yaml index 36c4e34ef..3a2e8c620 100644 --- a/example/auto_compression/semantic_segmentation/configs/deeplabv3/deeplabv3_qat.yaml +++ b/example/auto_compression/semantic_segmentation/configs/deeplabv3/deeplabv3_qat.yaml @@ -11,7 +11,7 @@ Distillation: node: - conv2d_123.tmp_1 -Quantization: +QuantAware: onnx_format: True quantize_op_types: - conv2d diff --git a/example/auto_compression/semantic_segmentation/configs/hrnet/hrnet_qat.yaml b/example/auto_compression/semantic_segmentation/configs/hrnet/hrnet_qat.yaml index 1eec456e2..8f852cdf7 100644 --- a/example/auto_compression/semantic_segmentation/configs/hrnet/hrnet_qat.yaml +++ b/example/auto_compression/semantic_segmentation/configs/hrnet/hrnet_qat.yaml @@ -10,7 +10,7 @@ Distillation: node: - conv2d_613.tmp_1 -Quantization: +QuantAware: onnx_format: True quantize_op_types: - conv2d diff --git 
a/example/auto_compression/semantic_segmentation/configs/pp_humanseg/pp_humanseg_qat.yaml b/example/auto_compression/semantic_segmentation/configs/pp_humanseg/pp_humanseg_qat.yaml index 8893dc35c..5b497a1e6 100644 --- a/example/auto_compression/semantic_segmentation/configs/pp_humanseg/pp_humanseg_qat.yaml +++ b/example/auto_compression/semantic_segmentation/configs/pp_humanseg/pp_humanseg_qat.yaml @@ -10,7 +10,7 @@ Distillation: node: - batch_norm_47.tmp_2 -Quantization: +QuantAware: onnx_format: True quantize_op_types: - conv2d diff --git a/example/auto_compression/semantic_segmentation/configs/pp_liteseg/pp_liteseg_qat.yaml b/example/auto_compression/semantic_segmentation/configs/pp_liteseg/pp_liteseg_qat.yaml index 12eea7e26..f739354a1 100644 --- a/example/auto_compression/semantic_segmentation/configs/pp_liteseg/pp_liteseg_qat.yaml +++ b/example/auto_compression/semantic_segmentation/configs/pp_liteseg/pp_liteseg_qat.yaml @@ -10,7 +10,7 @@ Distillation: node: - conv2d_95.tmp_0 -Quantization: +QuantAware: onnx_format: True quantize_op_types: - conv2d diff --git a/example/auto_compression/semantic_segmentation/configs/unet/unet_qat.yaml b/example/auto_compression/semantic_segmentation/configs/unet/unet_qat.yaml index ff055e2b0..c25033f9e 100644 --- a/example/auto_compression/semantic_segmentation/configs/unet/unet_qat.yaml +++ b/example/auto_compression/semantic_segmentation/configs/unet/unet_qat.yaml @@ -10,7 +10,7 @@ Distillation: node: - conv2d_37.tmp_1 -Quantization: +QuantAware: onnx_format: True quantize_op_types: - conv2d diff --git a/example/auto_compression/tensorflow_mobilenet/README.md b/example/auto_compression/tensorflow_mobilenet/README.md index 0a41d6e96..7370a4ec3 100644 --- a/example/auto_compression/tensorflow_mobilenet/README.md +++ b/example/auto_compression/tensorflow_mobilenet/README.md @@ -31,8 +31,8 @@ ## 3. 
自动压缩流程 #### 3.1 准备环境 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 - [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6 - opencv-python @@ -41,7 +41,7 @@ # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` (2)安装paddleslim: diff --git a/example/auto_compression/tensorflow_mobilenet/configs/mbv1_qat_dis.yaml b/example/auto_compression/tensorflow_mobilenet/configs/mbv1_qat_dis.yaml index 359ac18d1..eda30fa31 100644 --- a/example/auto_compression/tensorflow_mobilenet/configs/mbv1_qat_dis.yaml +++ b/example/auto_compression/tensorflow_mobilenet/configs/mbv1_qat_dis.yaml @@ -38,7 +38,7 @@ Distillation: - batch_norm_26.tmp_3 - conv2d_42.tmp_1 -Quantization: +QuantAware: use_pact: true activation_bits: 8 is_full_quantize: false diff --git a/example/full_quantization/image_classification/README.md b/example/full_quantization/image_classification/README.md index 8c33ca841..9f631f307 100644 --- a/example/full_quantization/image_classification/README.md +++ b/example/full_quantization/image_classification/README.md @@ -31,15 +31,15 @@ #### 3.1 准备环境 - python >= 3.6 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 安装paddlepaddle: ```shell # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` 安装paddleslim: diff --git a/example/full_quantization/image_classification/configs/mobilenetv3_large_qat_dis.yaml b/example/full_quantization/image_classification/configs/mobilenetv3_large_qat_dis.yaml index 52c762196..8c72318b8 100644 --- a/example/full_quantization/image_classification/configs/mobilenetv3_large_qat_dis.yaml +++ b/example/full_quantization/image_classification/configs/mobilenetv3_large_qat_dis.yaml @@ -9,7 +9,7 @@ Global: Distillation: alpha: 1.0 loss: soft_label -Quantization: +QuantAware: use_pact: true activation_bits: 8 activation_quantize_type: moving_average_abs_max diff --git a/example/full_quantization/picodet/README.md b/example/full_quantization/picodet/README.md index a47296e38..45832b312 100644 --- a/example/full_quantization/picodet/README.md +++ b/example/full_quantization/picodet/README.md @@ -30,8 +30,8 @@ ## 3. 
全量化流程 #### 3.1 准备环境 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3.4 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 - PaddleDet >= 2.4 - opencv-python @@ -40,7 +40,7 @@ # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` 安装paddleslim: diff --git a/example/full_quantization/picodet/configs/picodet_npu.yaml b/example/full_quantization/picodet/configs/picodet_npu.yaml index 37f20d7b7..9bfffab4b 100644 --- a/example/full_quantization/picodet/configs/picodet_npu.yaml +++ b/example/full_quantization/picodet/configs/picodet_npu.yaml @@ -11,7 +11,7 @@ Distillation: alpha: 1.0 loss: l2 -Quantization: +QuantAware: # Auto Compression use_pact: true activation_quantize_type: 'moving_average_abs_max' weight_bits: 8 diff --git a/example/full_quantization/picodet/configs/picodet_npu_with_postprocess.yaml b/example/full_quantization/picodet/configs/picodet_npu_with_postprocess.yaml index 4064df0db..6a291be11 100644 --- a/example/full_quantization/picodet/configs/picodet_npu_with_postprocess.yaml +++ b/example/full_quantization/picodet/configs/picodet_npu_with_postprocess.yaml @@ -11,7 +11,7 @@ Distillation: alpha: 1.0 loss: l2 -Quantization: +QuantAware: # Auto Compression use_pact: true activation_quantize_type: 'moving_average_abs_max' weight_bits: 8 diff --git a/example/post_training_quantization/detection/README.md b/example/post_training_quantization/detection/README.md index 3e76ba645..b51f5d581 100644 --- a/example/post_training_quantization/detection/README.md +++ b/example/post_training_quantization/detection/README.md @@ -35,8 +35,8 @@ ## 3. 
离线量化流程 #### 3.1 准备环境 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) -- PaddleSlim >= 2.3 +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim >= 2.4.0 - PaddleDet >= 2.4 - opencv-python @@ -45,7 +45,7 @@ # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` 安装paddleslim: @@ -130,7 +130,7 @@ python eval.py --config_path=./configs/ppyoloe_s_ptq.yaml - 要测试的模型路径可以在配置文件中`model_dir`字段下进行修改。 #### 3.6 提高离线量化精度 -本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisQuant```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisQuant```详解见[AnalysisQuant.md](../../../../docs/zh_cn/tutorials/quant/AnalysisQuant.md)。 +本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisPTQ```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisPTQ```详解见[AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md)。 经过多个实验,包括尝试多种激活算法(avg,KL等)、weight的量化方式(abs_max,channel_wise_abs_max),对PicoDet-s进行离线量化后精度均为0,以PicoDet-s为例,量化分析工具具体使用方法如下: @@ -162,11 +162,11 @@ python post_quant.py --config_path=./configs/picodet_s_analyzed_ptq.yaml --save_ **加速分析过程** 使用量化分析工具时,因需要逐层量化模型并进行验证,因此过程可能较慢,若想加速分析过程,可以在配置文件中设置 `FastEvalDataset` ,输入一个图片数量较少的annotation文件路径。注意,用少量数据验证的模型精度不一定等于全量数据验证的模型精度,若只需分析时获得不同层量化效果的相对排序,可以使用少量数据集;若要求准确精度,请使用全量验证数据集。如需要全量验证数据,将 `FastEvalDataset` 字段删掉即可。 +若需要少量验证数据集来快速验证,可下载:[单张COCO验证数据集](https://bj.bcebos.com/v1/paddle-slim-models/data/small_instances_val2017.json)。 注:分析之后若需要直接产出符合目标精度的量化模型,demo代码不会使用少量数据集验证,会自动使用全量验证数据。 -量化分析工具详细介绍见[量化分析工具介绍](../analysis.md) ## 4.预测部署 预测部署可参考[Detection模型自动压缩示例](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection) diff --git a/example/post_training_quantization/detection/analysis.py b/example/post_training_quantization/detection/analysis.py index 4acd54e43..7b854d265 100644 --- a/example/post_training_quantization/detection/analysis.py +++ b/example/post_training_quantization/detection/analysis.py @@ -23,7 +23,7 @@ from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval from keypoint_utils import keypoint_post_process from post_process import PPYOLOEPostProcess -from paddleslim.quant.analysis import AnalysisQuant +from paddleslim.quant.analysis_ptq import AnalysisPTQ def argsparser(): @@ -161,7 +161,7 @@ def main(): else: raise ValueError("metric currently only supports COCO and VOC.") - analyzer = AnalysisQuant( + analyzer = AnalysisPTQ( model_dir=config["model_dir"], model_filename=config["model_filename"], params_filename=config["params_filename"], diff --git a/example/post_training_quantization/pytorch_yolo_series/README.md b/example/post_training_quantization/pytorch_yolo_series/README.md index e0ed9bfc8..dbf23ef26 100644 --- a/example/post_training_quantization/pytorch_yolo_series/README.md +++ b/example/post_training_quantization/pytorch_yolo_series/README.md @@ -36,7 +36,7 @@ ## 3. 
离线量化流程 #### 3.1 准备环境 -- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddlePaddle >= 2.4.0 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) - PaddleSlim > 2.3版本 - X2Paddle >= 1.3.9 - opencv-python @@ -47,7 +47,7 @@ # CPU pip install paddlepaddle # GPU -pip install paddlepaddle-gpu +pip install paddlepaddle_gpu ``` (2)安装paddleslim: @@ -116,7 +116,9 @@ python eval.py --config_path=./configs/yolov5s_ptq.yaml #### 3.6 提高离线量化精度 -本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisQuant```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisQuant```详解见[AnalysisQuant.md](../../../../docs/zh_cn/tutorials/quant/AnalysisQuant.md)。 + +###### 3.6.1 量化分析工具 +本节介绍如何使用量化分析工具提升离线量化精度。离线量化功能仅需使用少量数据,且使用简单、能快速得到量化模型,但往往会造成较大的精度损失。PaddleSlim提供量化分析工具,会使用接口```paddleslim.quant.AnalysisPTQ```,可视化展示出不适合量化的层,通过跳过这些层,提高离线量化模型精度。```paddleslim.quant.AnalysisPTQ```详解见[AnalysisPTQ.md](../../../docs/zh_cn/tutorials/quant/AnalysisPTQ.md)。 由于YOLOv6离线量化效果较差,以YOLOv6为例,量化分析工具具体使用方法如下: @@ -153,10 +155,10 @@ python post_quant.py --config_path=./configs/yolov6s_analyzed_ptq.yaml --save_di **加速分析过程** 使用量化分析工具时,因需要逐层量化模型并进行验证,因此过程可能较慢,若想加速分析过程,可以在配置文件中设置 `fast_val_anno_path` ,输入一个图片数量较少的annotation文件路径。注意,用少量数据验证的模型精度不一定等于全量数据验证的模型精度,若只需分析时获得不同层量化效果的相对排序,可以使用少量数据集;若要求准确精度,请使用全量验证数据集。如需要全量验证数据,将 `fast_val_anno_path` 设置为None即可。 +若需要少量验证数据集来快速验证,可下载:[单张COCO验证数据集](https://bj.bcebos.com/v1/paddle-slim-models/data/small_instances_val2017.json)。 注:分析之后若需要直接产出符合目标精度的量化模型,demo代码不会使用少量数据集验证,会自动使用全量验证数据。 -量化分析工具详细介绍见[量化分析工具介绍](../analysis.md) ## 4.预测部署 预测部署可参考[YOLO系列模型自动压缩示例](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/pytorch_yolo_series) diff --git a/example/post_training_quantization/pytorch_yolo_series/analysis.py b/example/post_training_quantization/pytorch_yolo_series/analysis.py index 39d879f0c..088c7aec6 100644 --- a/example/post_training_quantization/pytorch_yolo_series/analysis.py +++ b/example/post_training_quantization/pytorch_yolo_series/analysis.py @@ -21,7 +21,7 @@ from post_process import YOLOPostProcess, coco_metric from dataset import COCOValDataset, COCOTrainDataset from paddleslim.common import load_config, load_onnx_model -from paddleslim.quant.analysis import AnalysisQuant +from paddleslim.quant.analysis_ptq import AnalysisPTQ def argsparser(): @@ -103,7 +103,7 @@ def main(): load_onnx_model(config["model_dir"]) inference_model_path = config["model_dir"].rstrip().rstrip( '.onnx') + '_infer' - analyzer = AnalysisQuant( + analyzer = AnalysisPTQ( model_dir=inference_model_path, model_filename='model.pdmodel', params_filename='model.pdiparams', diff --git a/example/post_training_quantization/pytorch_yolo_series/configs/yolov6s_fine_tune.yaml b/example/post_training_quantization/pytorch_yolo_series/configs/yolov6s_fine_tune.yaml index 971a7376e..ad6061212 100755 --- a/example/post_training_quantization/pytorch_yolo_series/configs/yolov6s_fine_tune.yaml +++ b/example/post_training_quantization/pytorch_yolo_series/configs/yolov6s_fine_tune.yaml @@ -1,5 +1,5 @@ arch: YOLOv6 -model_dir: ./yolov6s.onnx +model_dir: ./yolov6s.onnx dataset_dir: /dataset/coco/ model_filename: model.pdmodel params_filename: model.pdiparams @@ -8,25 +8,3 @@ val_image_dir: val2017 train_anno_path: annotations/instances_train2017.json val_anno_path: 
annotations/instances_val2017.json skip_tensor_list: None -regions: [['x2paddle_image_arrays','relu_8.tmp_0'], - ['relu_8.tmp_0','relu_15.tmp_0'], - ['relu_15.tmp_0','relu_21.tmp_0'], - ['concat_1.tmp_0','relu_26.tmp_0'], - ['concat_2.tmp_0', 'relu_30.tmp_0'], - ['relu_30.tmp_0', 'concat_4.tmp_0'], - ['relu_30.tmp_0', 'relu_31.tmp_0'], - ['concat_3.tmp_0', 'relu_35.tmp_0'], - ['relu_35.tmp_0', 'relu_36.tmp_0'], - ['concat_5.tmp_0', 'concat_10.tmp_0'], - ['relu_35.tmp_0', 'concat_8.tmp_0']] -region_weights_names: [['conv2d_0.w_0','conv2d_1.w_0','conv2d_2.w_0','conv2d_3.w_0','conv2d_4.w_0','conv2d_5.w_0','conv2d_6.w_0','conv2d_7.w_0','conv2d_8.w_0'], - ['conv2d_9.w_0','conv2d_10.w_0','conv2d_11.w_0','conv2d_12.w_0','conv2d_13.w_0','conv2d_14.w_0','conv2d_15.w_0'], - ['conv2d_16.w_0','conv2d_17.w_0','conv2d_18.w_0','conv2d_19.w_0','conv2d_20.w_0','conv2d_21.w_0'], - ['conv2d_22.w_0','conv2d_23.w_0','conv2d_24.w_0','conv2d_25.w_0','conv2d_26.w_0'], - ['conv2d_27.w_0','conv2d_28.w_0','conv2d_29.w_0','conv2d_30.w_0'], - ['conv2d_32.w_0','conv2d_34.w_0','conv2d_35.w_0','conv2d_37.w_0','conv2d_38.w_0','conv2d_39.w_0'], - ['conv2d_31.w_0'], - ['conv2d_33.w_0','conv2d_36.w_0','conv2d_40.w_0','conv2d_41.w_0'], - ['conv2d_42.w_0'], - ['conv2d_44.w_0','conv2d_47.w_0','conv2d_51.w_0','conv2d_52.w_0','conv2d_53.w_0','conv2d_54.w_0','conv2d_55.w_0','conv2d_56.w_0','conv2d_57.w_0','conv2d_58.w_0'], - ['conv2d_43.w_0','conv2d_45.w_0','conv2d_46.w_0','conv2d_49.w_0','conv2d_48.w_0','conv2d_50.w_0'],] \ No newline at end of file diff --git a/example/post_training_quantization/pytorch_yolo_series/fine_tune.py b/example/post_training_quantization/pytorch_yolo_series/fine_tune.py index ea777a474..144cde3ea 100755 --- a/example/post_training_quantization/pytorch_yolo_series/fine_tune.py +++ b/example/post_training_quantization/pytorch_yolo_series/fine_tune.py @@ -43,8 +43,6 @@ def argsparser(): help="which device used to compress.") parser.add_argument( '--algo', type=str, default='avg', help="post quant algo.") - parser.add_argument( - '--round_type', type=str, default='adaround', help="round type.") parser.add_argument('--gpu', type=int, default=0, help='gpu index') parser.add_argument( @@ -57,6 +55,12 @@ def argsparser(): type=bool, default=False, help='simulate activation quant') + parser.add_argument( + '--epochs', type=int, default=20, help='steps to reconstruct') + parser.add_argument( + '--lr', type=float, default=0.1, help='learning rate of reconstruct') + parser.add_argument( + '--limit', type=int, default=5, help='size of each region') return parser @@ -102,12 +106,11 @@ def main(): weight_quantize_type='channel_wise_abs_max', recon_level=FLAGS.recon_level, simulate_activation_quant=FLAGS.simulate_activation_quant, - regions=config['regions'], - region_weights_names=config['region_weights_names'], - skip_tensor_list=config['skip_tensor_list'] - if 'skip_tensor_list' in config else None, - epochs=20, - lr=0.1) + regions=None, + region_weights_names=None, + epochs=FLAGS.epochs, + lr=FLAGS.lr, + limit=FLAGS.limit) if __name__ == '__main__': diff --git a/example/quantization_analysis/GPT/README.md b/example/quantization_analysis/GPT/README.md new file mode 100644 index 000000000..007c37ce1 --- /dev/null +++ b/example/quantization_analysis/GPT/README.md @@ -0,0 +1,46 @@ +# GPT量化训练敏感度分析示例 + + +## 1. 
简介 +本示例将以自然语言处理生成模型GPT-3为例,介绍如何使用量化训练敏感度分析工具分析量化模型,以及提升量化训练精度。 + +## 2.Benchmark +| 模型 | 策略 | ACC | Inference模型 | +| :-------- |:-------- | :--------: | :--------: | +| GPT-345M | Baseline | 44.17 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_Baseline.tar) | +| GPT-345M | 量化训练(分析前) | 41.58 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345_QAT_wo_analysis.tar) | +| GPT-345M | 量化训练(分析后) | 44.94 | [Model](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_QAT_w_analysis_infer.tar) | + + +- ACC的指标均在基于[LAMBADA](https://raw.githubusercontent.com/cybertronai/bflm/master/lambada_test.jsonl)数据集,采用 ACC(accuracy) 指标评测得到 + +## 3. 量化分析流程 +#### 3.1 准备环境 +- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装) +- PaddleSlim develop版本 +- PaddleFleetX >= 2.4 + +#### 3.2 准备数据集 + +量化敏感度分析基于验证集获得每层的敏感度,可下载和使用 [LAMBADA](https://raw.githubusercontent.com/cybertronai/bflm/master/lambada_test.jsonl) 或者 [WikiText](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip) 数据集。本示例使用LAMBADA数据集来进行敏感度分析。 + +#### 3.3 准备预测模型 +- [GPT-345M](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345M_Baseline.tar) :Base模型 +- [GPT-345M](https://bj.bcebos.com/v1/paddle-slim-models/GPT_345_QAT_wo_analysis.tar) :分析前量化训练后的模型 + + +#### 3.4 量化敏感度分析 +量化敏感度分析示例通过analysis.py脚本启动,会使用接口```paddleslim.quant.AnalysisQAT```对模型进行敏感度分析。配置config文件中模型路径、数据路径和量化相关的参数,配置完成后便可对模型进行敏感度分析。具体运行命令为: + +```shell +python analysis.py --config_path=./configs/gpt_345M_analysis.yaml +``` + +分析完成后,会产生排序好的层敏感度(敏感度由大到小排序,敏感度越大说明约负向影响模型精度),并保存在```analysis_results/analysis.txt```中。 +敏感度排序前10层分别为:```linear_31```,```linear_27```,```linear_22```,```linear_43```,```linear_83```,```linear_15```,```linear_87```,```linear_3```,```linear_38```,```linear_39```。在这十层中,其中有八层属于```TransformerDecoder```中第二个FFN层,两层属于```TransformerDecoder```中第一个FFN层,而```MultiHeadAttention```中的Linear层都相对不敏感。 + +```paddleslim.quant.AnalysisQAT```详解见[AnalysisQAT.md](../../../docs/zh_cn/tutorials/quant/AnalysisQAT.md)。 + +#### 3.5 重新量化训练 + +根据分析结果,重新量化训练时,去掉了```linear_31```,```linear_27```,```linear_22```,```linear_43```,```linear_83```,```linear_15```,```linear_87```七层Linear的量化,最后量化模型精度达到44.94。 diff --git a/example/quantization_analysis/GPT/analysis.py b/example/quantization_analysis/GPT/analysis.py new file mode 100644 index 000000000..d41818e61 --- /dev/null +++ b/example/quantization_analysis/GPT/analysis.py @@ -0,0 +1,188 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
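+import math  # math.exp is used in eval_function below to compute perplexity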
+ +import os +import sys +import random +import numpy as np +import argparse +import time + +import paddle +from paddleslim.common import load_config as load_slim_config +from paddleslim.quant.analysis_qat import AnalysisQAT +from ppfleetx.data import build_dataloader +from ppfleetx.distributed.apis import env +from utils import parse_config + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + '--config_path', + type=str, + default=None, + help="path of compression strategy config.", + required=True) + parser.add_argument( + '--save_dir', + type=str, + default='analysis_results', + help="directory to save compressed model.") + parser.add_argument( + '--devices', + type=str, + default='gpu', + help="which device used to compress.") + return parser + + +def eval_reader_wrapper(reader): + def gen(): + for data in reader: + tokens, loss_mask, attention_mask, position_ids, labels, info = data + in_dict = {} + in_dict['tokens'] = tokens + in_dict['ids'] = position_ids + yield in_dict, labels, loss_mask, info + + return gen + + +def eval_function(exe, program, feed_names, fetch_list): + tic_eval = time.time() + score_name = "loss" if not global_config['cloze_eval'] else "number correct" + first_step = True + eval_losses = [] + total_score = 0 + for eval_step, (data, labels, loss_mask, info) in enumerate(eval_loader()): + preds = exe.run(program=program, + feed=data, + fetch_list=fetch_list, + return_numpy=False) + + paddle.disable_static() + + labels = paddle.to_tensor(labels) + preds = paddle.to_tensor(preds[0]) + loss_mask = paddle.to_tensor(loss_mask) + info = paddle.to_tensor(info) + + if not global_config['cloze_eval']: + if first_step: + num_original_tokens = info.numpy()[0][0] + num_tokenized_tokens = info.numpy()[0][1] + first_step = False + + masked_lm_loss = paddle.nn.functional.cross_entropy( + preds, labels, reduction="none") + loss = paddle.sum(masked_lm_loss * loss_mask) + eval_losses.append(loss.numpy()[0]) + total_score += loss.numpy() / (num_tokenized_tokens - 1) + + else: + if first_step: + num_examples = info.numpy()[0][0] + first_step = False + outputs = paddle.argmax(preds, -1) + acc = paddle.cast(outputs == labels, 'float32') + acc = paddle.where( + paddle.cast(loss_mask, 'bool'), acc, paddle.ones_like(acc)) + acc = paddle.sum(paddle.prod(acc, -1)) + eval_losses.append(acc.numpy()[0]) + total_score += acc.numpy()[0] + + if eval_step != 0 and (eval_step % 10 == 0): + print("[eval] step: %d, batch: %d, %s: %.9f, speed: %.2f step/s" % + (eval_step, eval_step, score_name, total_score, + 1. 
/ (time.time() - tic_eval))) + tic_eval = time.time() + paddle.enable_static() + + metric = None + if not global_config['cloze_eval']: + total_loss = float(total_score) + ppl = math.exp(min(20, total_loss)) + token_ratio = (num_tokenized_tokens - 1) / (num_original_tokens - 1) + adjusted_ppl = math.exp(min(20, total_loss * token_ratio)) + string = ' validation results on {} | '.format(gpt_config['Data'][ + 'Eval']['dataset']['name']) + string += 'avg loss: {:.4E} | '.format(total_loss) + string += 'ppl: {:.4E} | '.format(ppl) + string += 'adjusted ppl: {:.4E} | '.format(adjusted_ppl) + string += 'token ratio: {} |'.format(token_ratio) + metric = ppl + else: + num_correct = float(total_score) + acc = float(num_correct / num_examples) + string = ' validation results on {} | '.format(gpt_config['Data'][ + 'Eval']['dataset']['name']) + string += 'number correct: {:.4E} | '.format(num_correct) + string += 'total examples: {:.4E} | '.format(num_examples) + string += 'avg accuracy: {:.4E}'.format(acc) + metric = acc + + print(string) + return metric + + +def main(): + global global_config, all_config + all_config = load_slim_config(FLAGS.config_path) + assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format( + all_config) + global_config = all_config["Global"] + + seed = all_config['Global']['seed'] + random.seed(seed) + np.random.seed(seed) + paddle.seed(seed) + env.set_seed(seed) + + global gpt_config + gpt_config = parse_config(global_config['reader_config']) + + if not global_config['cloze_eval']: + gpt_config['Data']['Eval']['dataset']['name'] = "LM_Eval_Dataset" + else: + gpt_config['Data']['Eval']['dataset']['name'] = "Lambada_Eval_Dataset" + + valid_data_loader = build_dataloader(gpt_config['Data'], "Eval") + + global eval_loader + eval_loader = eval_reader_wrapper(valid_data_loader) + + analyzer = AnalysisQAT( + quant_model_dir=global_config["quant_model_dir"], + float_model_dir=global_config["float_model_dir"], + model_filename=global_config["model_filename"], + params_filename=global_config["params_filename"], + quantizable_op_type=global_config['quantizable_op_type'], + qat_metric=global_config['qat_metric'] + if 'qat_metric' in global_config else None, + eval_function=eval_function, + data_loader=eval_loader, + save_dir=FLAGS.save_dir, + resume=global_config['resume'], ) + analyzer.metric_error_analyse() + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu'] + paddle.set_device(FLAGS.devices) + + main() diff --git a/example/quantization_analysis/GPT/configs/gpt_345M_analysis.yaml b/example/quantization_analysis/GPT/configs/gpt_345M_analysis.yaml new file mode 100644 index 000000000..1be19fc55 --- /dev/null +++ b/example/quantization_analysis/GPT/configs/gpt_345M_analysis.yaml @@ -0,0 +1,15 @@ +Global: + device: gpu + seed: 1024 + quant_model_dir: ./GPT_345_QAT_wo_analysis + float_model_dir: ./GPT_345M_Baseline + model_filename: model.pdmodel + params_filename: model.pdiparams + quantizable_op_type: ["mul", "matmul", "matmul_v2"] + resume: False + reader_config: ./configs/gpt_reader.yaml + cloze_eval: True # True for LAMBADA Dataset; False for WikiText + + + + \ No newline at end of file diff --git a/example/quantization_analysis/GPT/configs/gpt_reader.yaml b/example/quantization_analysis/GPT/configs/gpt_reader.yaml new file mode 100644 index 000000000..55612323e --- /dev/null +++ b/example/quantization_analysis/GPT/configs/gpt_reader.yaml @@ -0,0 
+1,13 @@ +Data: + Eval: + dataset: + name: GPTDataset + input_dir: ./lambada_test.jsonl + max_seq_len: 1024 + overlapping_eval: 32 + loader: + num_workers: 1 + return_list: True + collate_fn: gpt_collate_fn + batch_size: 1 + \ No newline at end of file diff --git a/example/quantization_analysis/GPT/utils.py b/example/quantization_analysis/GPT/utils.py new file mode 100644 index 000000000..42e62b5fa --- /dev/null +++ b/example/quantization_analysis/GPT/utils.py @@ -0,0 +1,110 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import codecs +import yaml +import time +import copy + + +class AttrDict(dict): + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __copy__(self): + cls = self.__class__ + result = cls.__new__(cls) + result.__dict__.update(self.__dict__) + return result + + def __deepcopy__(self, memo): + cls = self.__class__ + result = cls.__new__(cls) + memo[id(self)] = result + for k, v in self.__dict__.items(): + setattr(result, k, copy.deepcopy(v, memo)) + for k, v in self.items(): + setattr(result, k, copy.deepcopy(v, memo)) + return result + + def setdefault(self, k, default=None): + if k not in self or self[k] is None: + self[k] = default + return default + else: + return self[k] + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + + def _update_dic(dic, base_dic): + '''Update config from dic based base_dic + ''' + base_dic = base_dic.copy() + dic = dic.copy() + + if dic.get('_inherited_', True) == False: + dic.pop('_inherited_') + return dic + + for key, val in dic.items(): + if isinstance(val, dict) and key in base_dic: + base_dic[key] = _update_dic(val, base_dic[key]) + else: + base_dic[key] = val + dic = base_dic + return dic + + def _parse_from_yaml(path): + '''Parse a yaml file and build config''' + + with codecs.open(path, 'r', 'utf-8') as file: + dic = yaml.load(file, Loader=yaml.FullLoader) + + if '_base_' in dic: + cfg_dir = os.path.dirname(path) + base_path = dic.pop('_base_') + base_path = os.path.join(cfg_dir, base_path) + base_dic = _parse_from_yaml(base_path) + dic = _update_dic(dic, base_dic) + return dic + + yaml_dict = _parse_from_yaml(cfg_file) + yaml_config = AttrDict(yaml_dict) + + create_attr_dict(yaml_config) + return yaml_config diff --git a/paddleslim/auto_compression/__init__.py b/paddleslim/auto_compression/__init__.py index cfc26259d..8ab8686dc 100644 --- a/paddleslim/auto_compression/__init__.py +++ b/paddleslim/auto_compression/__init__.py @@ 
-17,16 +17,11 @@ from .strategy_config import * from .config_helpers import * from .utils import * +from .analysis import * __all__ = [ - "AutoCompression", - "Quantization", - "Distillation", - "MultiTeacherDistillation", - "HyperParameterOptimization", - "Prune", - "UnstructurePrune", - "ProgramInfo", - "TrainConfig", - "predict_compressed_model", + "AutoCompression", "QuantAware", "QuantPost", "Distillation", + "MultiTeacherDistillation", "HyperParameterOptimization", "Prune", + "UnstructurePrune", "ProgramInfo", "TrainConfig", + "predict_compressed_model", "analysis_prune" ] diff --git a/paddleslim/auto_compression/analysis.py b/paddleslim/auto_compression/analysis.py new file mode 100644 index 000000000..3423db4a7 --- /dev/null +++ b/paddleslim/auto_compression/analysis.py @@ -0,0 +1,80 @@ +import sys +import pickle +import logging +import paddle +from ..common import get_logger +from ..common.load_model import load_inference_model +from ..prune import sensitivity, get_ratios_by_loss + +_logger = get_logger(__name__, level=logging.INFO) + +__all__ = ['analysis_prune'] + + +def get_prune_params(program): + params = [] + for block in program.blocks: + for op in block.ops: + if op.type == 'conv2d' and op.attr('groups') == 1: + for inp_name in op.input_arg_names: + if block.var(inp_name).persistable is True: + params.append(inp_name) + return params + + +def analysis_prune(eval_function, + model_dir, + model_filename, + params_filename, + analysis_file, + pruned_ratios, + target_loss=None, + criterion='l1_norm'): + ''' + Args: + eval_func(function): The callback function used to evaluate the model. It should accept a instance of `paddle.static.Program` as argument and return a score on test dataset. + model_dir(str): Directory path to load model. If you want to load onnx model, only set ``model_dir=model.onnx``. + model_filename(str): Specify model_filename. If you want to load onnx model, model filename should be None. + params_filename(str): Specify params_filename. If you want to load onnx model, params filename should be None. + analysis_file(str): The file to save the sensitivities. It will append the latest computed sensitivities into the file. And the sensitivities in the file would not be computed again. This file can be loaded by `pickle` library. + pruned_ratios(list): The ratios to be pruned. + criterion(str|function): The criterion used to sort channels for pruning. Currently supports l1_ norm, bn_scale, geometry_median. Default: l1_norm. 
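+        target_loss(float, optional): The maximum acceptable accuracy loss caused by pruning. If set, the pruned ratio of each parameter that keeps the loss below this value is computed from the sensitivities (via `get_ratios_by_loss`) and returned; otherwise an empty dict is returned. Default: None.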
+ ''' + + devices = paddle.device.get_device().split(':')[0] + places = paddle.device._convert_to_place(devices) + exe = paddle.static.Executor(places) + [eval_program, feed_target_names, fetch_targets] = (load_inference_model( + model_dir, + model_filename=model_filename, + params_filename=params_filename, + executor=exe)) + params = get_prune_params(eval_program) + + _logger.info("start analysis") + sens_0 = sensitivity( + eval_program, + places, + params, + eval_function, + sensitivities_file=analysis_file, + eval_args=[exe, feed_target_names, fetch_targets], + pruned_ratios=pruned_ratios, + criterion=criterion) + + with open(analysis_file, 'rb') as f: + if sys.version_info < (3, 0): + sensitivities = pickle.load(f) + else: + sensitivities = pickle.load(f, encoding='bytes') + + _logger.info("finish analysis: {}".format(sensitivities)) + + ratios = {} + if target_loss is not None: + ratios = get_ratios_by_loss(sensitivities, target_loss) + _logger.info("you can set prune_params_name: {} in ChannelPrune".format( + ratios.keys())) + _logger.info("you can set pruned_ratio: {} in ChannelPrune".format( + ratios.values())) + return ratios diff --git a/paddleslim/auto_compression/auto_strategy.py b/paddleslim/auto_compression/auto_strategy.py index eab962add..cfad16b77 100644 --- a/paddleslim/auto_compression/auto_strategy.py +++ b/paddleslim/auto_compression/auto_strategy.py @@ -125,17 +125,17 @@ def create_strategy_config(strategy_str, model_type): ### only platform is linux can use smac to do hyperparameter optimization ### choose quant_aware to do quantization in other platform if platform.system().lower() == 'linux': - quant_config = Quantization(**default_quant_config) + quant_config = QuantAware(**default_quant_config) hpo_config = HyperParameterOptimization(**hpo_config_tester) configs.append({ - 'Quantization': quant_config, + 'QuantPost': quant_config, 'HyperParameterOptimization': hpo_config }) else: - quant_config = Quantization(**default_quant_config) + quant_config = QuantAware(**default_quant_config) dis_config = Distillation() configs.append({ - 'Quantization': quant_config, + 'QuantAware': quant_config, 'Distillation': dis_config }) @@ -248,18 +248,18 @@ def get_final_quant_config(ptq_loss, model_type=None): return None ### if emd loss less than MAGIC_MAX_EMD_DISTANCE, select quant_post & hpo. elif ptq_loss < MAGIC_MAX_EMD_DISTANCE: - quant_config = Quantization(**default_quant_config) + quant_config = QuantAware(**default_quant_config) hpo_config = HyperParameterOptimization(**default_hpo_config) configs = [{ - 'Quantization': quant_config, + 'QuantPost': quant_config, 'HyperParameterOptimization': hpo_config }] ### if emd loss greater than MAGIC_MAX_EMD_DISTANCE, select qat & dist. 
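+    ### i.e. the PTQ model deviates too much from the original (EMD distance above the threshold), so quant-aware training with distillation is used instead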
else: - quant_config = Quantization(**default_quant_config) + quant_config = QuantAware(**default_quant_config) dis_config = Distillation() - configs = [{'Quantization': quant_config, 'Distillation': dis_config}] + configs = [{'QuantAware': quant_config, 'Distillation': dis_config}] _logger.info("Start Quantization and Distillation Training.") return configs diff --git a/paddleslim/auto_compression/compressor.py b/paddleslim/auto_compression/compressor.py index a74105a23..1692b5095 100644 --- a/paddleslim/auto_compression/compressor.py +++ b/paddleslim/auto_compression/compressor.py @@ -26,9 +26,10 @@ import itertools import paddle.distributed.fleet as fleet from ..quant.quanter import convert, quant_post +from ..quant.reconstruction_quantization import quant_recon_static from ..common.recover_program import recover_inference_program from ..common import get_logger -from ..common.patterns import get_patterns +from ..common.patterns import get_patterns, find_final_nodes from ..common.load_model import load_inference_model, get_model_dir, export_onnx from ..common.dataloader import wrap_dataloader, get_feed_vars from ..common.config_helper import load_config @@ -87,28 +88,30 @@ def __init__(self, Only one strategy(quant_post with hyperparameter optimization) can set train_config to None. Default: None. strategy_config(dict, list(dict), optional): The strategy config. You can set single config to get multi-strategy config, such as - 1. set ``Quantization`` and ``Distillation`` to get quant_aware and distillation compress config. - The Quantization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24`_ . - The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ . - 2. set ``Quantization`` and ``HyperParameterOptimization`` to get quant_post and hyperparameter optimization compress config. - The Quantization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24`_ . - The HyperParameterOptimization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L73`_ . + 1. set ``QuantAware`` and ``Distillation`` to get quant_aware and distillation compress config. + The Quantization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L55`_ . + The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ . + 2. set ``QuantPost`` and ``HyperParameterOptimization`` to get quant_post and hyperparameter optimization compress config. + The QuantPost config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L187`_ . + The HyperParameterOptimization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L160`_ . 3. set ``ChannelPrune`` and ``Distillation`` to get channel prune and distillation compress config. - The ChannelPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ . - The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ . 
+ The ChannelPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L254`_ . + The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ . 4. set ``ASPPrune`` and ``Distillation`` to get asp prune and distillation compress config. - The ASPPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ . - The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ . + The ASPPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L268`_ . + The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ . 5. set ``TransformerPrune`` and ``Distillation`` to get transformer prune and distillation compress config. - The TransformerPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ . - The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ . + The TransformerPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L278`_ . + The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ . 6. set ``UnstructurePrune`` and ``Distillation`` to get unstructure prune and distillation compress config. - The UnstructurePrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L91`_ . - The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ . + The UnstructurePrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L288`_ . + The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ . 7. set ``Distillation`` to use one teacher model to distill the student model. - The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ . + The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ . 8. set ``MultiTeacherDistillation`` to use multiple teachers to distill the student model. - The MultiTeacherDistillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L56`_ . + The MultiTeacherDistillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L134`_ . + 9. set ``QuantPost`` to get quant_post compress config. + The QuantPost config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L187`_ . If set to None, a strategy will be chosen automatically. Default: None.
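For orientation, a minimal sketch of case 9 above, the new single ``QuantPost`` strategy; the model paths, input name, and calibration reader are hypothetical placeholders, not part of this patch:

```python
import numpy as np
from paddleslim.auto_compression import AutoCompression
from paddleslim.auto_compression.strategy_config import QuantPost

def calib_loader():
    # placeholder calibration reader: replace 'x' and the shape with the
    # model's real input; ACT wraps this via wrap_dataloader internally
    for _ in range(10):
        yield {'x': np.random.randn(1, 3, 224, 224).astype('float32')}

ac = AutoCompression(
    model_dir='./inference_model',      # hypothetical model directory
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    save_dir='./act_output',
    train_dataloader=calib_loader,
    strategy_config={'QuantPost': QuantPost(algo='hist', batch_nums=10)},
    train_config=None)  # assumed acceptable for the pure PTQ path added here
ac.compress()
```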
target_speedup(float, optional): target speedup ratio by the way of auto compress. Default: None. @@ -124,7 +127,7 @@ def __init__(self, self.final_dir = save_dir if not os.path.exists(self.final_dir): - os.makedirs(self.final_dir) + os.makedirs(self.final_dir, exist_ok=True) # load config if isinstance(config, str): @@ -155,7 +158,7 @@ def __init__(self, paddle.enable_static() self._exe, self._places = self._prepare_envs() - self.model_type = self._get_model_type() + self.default_distill_node_pair, self.model_type = self._get_model_info() if self.train_config is not None and self.train_config.use_fleet: fleet.init(is_collective=True) @@ -188,7 +191,6 @@ def __init__(self, self._strategy, self._config = self._prepare_strategy( self.strategy_config) - self.train_config = self._get_final_train_config( self.train_config, self._strategy, self.model_type) _logger.info(f"Selected strategies: {self._strategy}") @@ -206,7 +208,7 @@ def _get_final_train_config(self, train_config, strategy_config, ### The TrainConfig for quantization is extrapolate from above. tmp_train_config = copy.deepcopy(train_config.__dict__) ### the epoch, train_iter, learning rate of quant is 10% of the prune compress - if self.model_type != 'transformer': + if self.model_type != 'transformer' and train_config.epochs is not None: tmp_train_config['epochs'] = max( int(train_config.epochs * 0.1), 1) if train_config.train_iter is not None: @@ -261,9 +263,14 @@ def _infer_shape(self, model_dir, model_filename, params_filename, op.desc.infer_shape(block.desc) save_path = os.path.join(save_path, "infered_shape") - os.makedirs(save_path) + os.makedirs(save_path, exist_ok=True) paddle.static.save_inference_model( - save_path, feed_vars, fetch_targets, exe, program=inference_program) + save_path, + feed_vars, + fetch_targets, + exe, + program=inference_program, + clip_extra=False) _logger.info(f"Saved model infered shape to {save_path}") @property @@ -301,17 +308,29 @@ def _prepare_envs(self): exe = paddle.static.Executor(places) return exe, places - def _get_model_type(self): + def _get_model_info(self): [inference_program, _, _] = (load_inference_model( self.model_dir, model_filename=self.model_filename, params_filename=self.params_filename, executor=self._exe)) - _, _, model_type = get_patterns(inference_program) + + ### set the output of final weight node as the default distillation node + distill_node = [] + final_weight_node = find_final_nodes(inference_program) + for out_var in final_weight_node: + distill_node.append('teacher_' + out_var.name()) + distill_node.append(out_var.name()) + + model_type = None + if not isinstance(self.strategy_config, dict): + _, model_type = get_patterns(inference_program) + _logger.info(f"Detect model type: {model_type}") + if self.model_filename is None: opt_model_filename = '__opt_model__' else: - opt_model_filename = 'opt_' + self.model_filename + opt_model_filename = self.model_filename program_bytes = inference_program._remove_training_info( clip_extra=False).desc.serialize_to_string() with open( @@ -321,8 +340,8 @@ def _get_model_type(self): shutil.move( os.path.join(self.updated_model_dir, opt_model_filename), os.path.join(self.updated_model_dir, self.model_filename)) - _logger.info(f"Detect model type: {model_type}") - return model_type + + return distill_node, model_type def _prepare_strategy(self, strategy_config): if not isinstance(strategy_config, list): @@ -331,8 +350,9 @@ def _prepare_strategy(self, strategy_config): strategy = [] config = [] for strategy_c in strategy_config: - 
quant_config = strategy_c.get("Quantization", None) + quant_config = strategy_c.get("QuantAware", None) hpo_config = strategy_c.get("HyperParameterOptimization", None) + ptq_config = strategy_c.get("QuantPost", None) prune_config = strategy_c.get("ChannelPrune", None) asp_config = strategy_c.get("ASPPrune", None) transformer_prune_config = strategy_c.get("TransformerPrune", None) @@ -383,10 +403,10 @@ def _prepare_strategy(self, strategy_config): self._distill_config)) ### case5: quant_config & hpo_config ==> PTQ & HPO - if quant_config is not None and hpo_config is not None: + if ptq_config is not None and hpo_config is not None: only_distillation = False strategy.append('ptq_hpo') - config.append(merge_config(quant_config, hpo_config)) + config.append(merge_config(ptq_config, hpo_config)) ### case6: quant_config & distill config ==> QAT & Distill if quant_config is not None and self._distill_config is not None and 'ptq_hpo' not in strategy: @@ -403,6 +423,11 @@ def _prepare_strategy(self, strategy_config): strategy.append('multi_teacher_dis') config.append(multi_teacher_distill_config) + ### case8: only qtp_config ==> PTQ + if ptq_config is not None and hpo_config is None: + strategy.append('quant_post') + config.append(ptq_config) + ### NOTE: keep quantation in the last step idx = -1 if 'qat_dis' in strategy and strategy.index('qat_dis') != ( @@ -438,8 +463,7 @@ def _prepare_fleet_strategy(train_config): return strategy def _prepare_program(self, program, feed_target_names, fetch_targets, - patterns, default_distill_node_pair, strategy, config, - train_config): + patterns, strategy, config, train_config): train_program = recover_inference_program(program) startup_program = paddle.static.Program() train_program_info = ProgramInfo(startup_program, train_program, @@ -476,7 +500,7 @@ def _prepare_program(self, program, feed_target_names, fetch_targets, strategy, patterns, self.eval_dataloader) if train_config.use_fleet: - dist_strategy = _prepare_fleet_strategy(train_config) + dist_strategy = self._prepare_fleet_strategy(train_config) else: dist_strategy = None @@ -490,7 +514,7 @@ def _prepare_program(self, program, feed_target_names, fetch_targets, train_program_info, pruner=self._pruner, dist_strategy=dist_strategy, - default_distill_node_pair=default_distill_node_pair) + default_distill_node_pair=self.default_distill_node_pair) self._quant_config = None ### add quant_aware program, quant always is last step @@ -548,8 +572,8 @@ def _compiled_program(self, program_info, strategy): def create_tmp_dir(self, base_dir, prefix="tmp"): # create a new temp directory in final dir - s_datetime = strftime("%Y_%m_%d_%H_%M_%S", gmtime()) - tmp_base_name = "_".join([prefix, str(os.getpid()), s_datetime]) + s_datetime = strftime("%Y_%m_%d_%H_%M", gmtime()) + tmp_base_name = "_".join([prefix, str(os.getppid()), s_datetime]) tmp_dir = os.path.join(base_dir, tmp_base_name) if not os.path.exists(tmp_dir): os.makedirs(tmp_dir) @@ -562,6 +586,7 @@ def compress(self): config = None train_config = None strategy_idx = None + self.final_metric = -1.0 for strategy_idx, ( strategy, config, train_config ) in enumerate(zip(self._strategy, self._config, self.train_config)): @@ -580,15 +605,28 @@ def compress(self): self.single_strategy_compress(quant_strategy[0], quant_config[0], strategy_idx, train_config) - tmp_model_path = os.path.join( - self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))) - final_model_path = os.path.join(self.final_dir) if paddle.distributed.get_rank() == 0: + tmp_model_path = 
os.path.join( + self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))) + final_model_path = os.path.join(self.final_dir) for _file in os.listdir(tmp_model_path): _file_path = os.path.join(tmp_model_path, _file) if os.path.isfile(_file_path): shutil.copy(_file_path, final_model_path) shutil.rmtree(self.tmp_dir) + + if self.eval_function is not None and self.final_metric < 0.0: + [inference_program, feed_target_names, fetch_targets]= load_inference_model( \ + final_model_path, \ + model_filename=self.model_filename, params_filename=self.params_filename, + executor=self._exe) + self.final_metric = self.eval_function( + self._exe, inference_program, feed_target_names, + fetch_targets) + if self.eval_function is not None: + _logger.info("==> The metric of final model is {:.4f}".format( + self.final_metric)) + _logger.info( "==> The ACT compression has been completed and the final model is saved in `{}`". format(final_model_path)) @@ -611,41 +649,64 @@ def single_strategy_compress(self, strategy, config, strategy_idx, params_filename=self.params_filename, executor=self._exe) if strategy == 'quant_post': - quant_post( - self._exe, - model_dir=model_dir, - quantize_model_path=os.path.join( - self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))), - data_loader=self.train_dataloader, - model_filename=self.model_filename, - params_filename=self.params_filename, - save_model_filename=self.model_filename, - save_params_filename=self.params_filename, - batch_size=1, - batch_nums=config.batch_num, - algo=config.ptq_algo, - round_type='round', - bias_correct=config.bias_correct, - hist_percent=config.hist_percent, - quantizable_op_type=config.quantize_op_types, - is_full_quantize=config.is_full_quantize, - weight_bits=config.weight_bits, - activation_bits=config.activation_bits, - activation_quantize_type='range_abs_max', - weight_quantize_type=config.weight_quantize_type, - onnx_format=False) + if config.recon_level is None: + quant_post( + self._exe, + model_dir=self.updated_model_dir, + quantize_model_path=os.path.join( + self.tmp_dir, + 'strategy_{}'.format(str(strategy_idx + 1))), + data_loader=self.train_dataloader, + model_filename=self.model_filename, + params_filename=self.params_filename, + save_model_filename=self.model_filename, + save_params_filename=self.params_filename, + batch_size=config.batch_size, + batch_nums=config.batch_nums, + algo=config.algo, + bias_correction=config.bias_correction, + hist_percent=config.hist_percent, + quantizable_op_type=config.quantize_op_types, + is_full_quantize=config.is_full_quantize, + weight_bits=config.weight_bits, + activation_bits=config.activation_bits, + activation_quantize_type=config.activation_quantize_type, + weight_quantize_type=config.weight_quantize_type, + onnx_format=config.onnx_format) + else: + quant_recon_static( + executor=self._exe, + model_dir=self.updated_model_dir, + quantize_model_path=os.path.join( + self.tmp_dir, + 'strategy_{}'.format(str(strategy_idx + 1))), + data_loader=self.train_dataloader, + model_filename=self.model_filename, + params_filename=self.params_filename, + batch_size=config.batch_size, + batch_nums=config.batch_nums, + algo=config.algo, + hist_percent=config.hist_percent, + quantizable_op_type=config.quantize_op_types, + is_full_quantize=config.is_full_quantize, + bias_correction=config.bias_correction, + onnx_format=config.onnx_format, + weight_bits=config.weight_bits, + activation_bits=config.activation_bits, + weight_quantize_type=config.weight_quantize_type, + 
activation_quantize_type=config.activation_quantize_type, + recon_level=config.recon_level, + simulate_activation_quant=config.simulate_activation_quant, + regions=config.regions, + region_weights_names=config.region_weights_names, + skip_tensor_list=config.skip_tensor_list, + epochs=config.epochs, + lr=config.lr) elif strategy == 'ptq_hpo': if platform.system().lower() != 'linux': raise NotImplementedError( "post-quant-hpo is not support in system other than linux") - if self.updated_model_dir != model_dir: - # If model is ONNX, convert it to inference model firstly. - load_inference_model( - model_dir, - model_filename=self.model_filename, - params_filename=self.params_filename, - executor=self._exe) if self.eval_function is None: # If eval function is None, ptq_hpo will use emd distance to eval the quantized model, so need the dataloader without label eval_dataloader = self.train_dataloader @@ -654,7 +715,7 @@ def single_strategy_compress(self, strategy, config, strategy_idx, post_quant_hpo.quant_post_hpo( self._exe, self._places, - model_dir=model_dir, + model_dir=self.updated_model_dir, quantize_model_path=os.path.join( self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))), train_dataloader=self.train_dataloader, @@ -702,18 +763,19 @@ def single_strategy_compress(self, strategy, config, strategy_idx, train_config.origin_metric, metric)) self.metric_before_compressed = metric - patterns, default_distill_node_pair, _ = get_patterns( - inference_program) - + patterns = None + if 'transformer' in strategy: + patterns, _ = get_patterns(inference_program) train_program_info, test_program_info = self._prepare_program( inference_program, feed_target_names, fetch_targets, patterns, - default_distill_node_pair, strategy, config, train_config) - if 'unstructure' in self._strategy: + strategy, config, train_config) + if 'unstructure' in strategy: test_program_info.program._program = remove_unused_var_nodes( test_program_info.program._program) test_program_info = self._start_train( train_program_info, test_program_info, strategy, train_config) - self._save_model(test_program_info, strategy, strategy_idx) + if paddle.distributed.get_rank() == 0: + self._save_model(test_program_info, strategy, strategy_idx) def _start_train(self, train_program_info, test_program_info, strategy, train_config): @@ -721,13 +783,17 @@ def _start_train(self, train_program_info, test_program_info, strategy, total_epochs = train_config.epochs if train_config.epochs else 100 total_train_iter = 0 stop_training = False + + loss_vars = [var for var in train_program_info.loss_dict.values()] + loss_names = [name for name in train_program_info.loss_dict.keys()] + for epoch_id in range(total_epochs): if stop_training: break for batch_id, data in enumerate(self.train_dataloader()): - np_probs_float, = self._exe.run(train_program_info.program, \ + loss = self._exe.run(train_program_info.program, \ feed=data, \ - fetch_list=train_program_info.fetch_targets) + fetch_list=train_program_info.fetch_targets+loss_vars) if not isinstance(train_program_info.learning_rate, float): train_program_info.learning_rate.step() if 'unstructure' in strategy: @@ -738,10 +804,12 @@ def _start_train(self, train_program_info, test_program_info, strategy, else: logging_iter = train_config.logging_iter if batch_id % int(logging_iter) == 0: - _logger.info( - "Total iter: {}, epoch: {}, batch: {}, loss: {}".format( - total_train_iter, epoch_id, batch_id, - np_probs_float)) + print_info = "Total iter: {}, epoch: {}, batch: {}, loss: {}".format( + 
total_train_iter, epoch_id, batch_id, loss[0]) + for idx, loss_value in enumerate(loss[1:]): + print_info += '{}: {} '.format(loss_names[idx], + loss_value) + _logger.info(print_info) total_train_iter += 1 if total_train_iter % int( train_config.eval_iter) == 0 and total_train_iter != 0: @@ -770,7 +838,7 @@ def _start_train(self, train_program_info, test_program_info, strategy, self.metric_before_compressed) ) / self.metric_before_compressed <= 0.005: _logger.info( - "The error rate between the compressed model and original model is less than 5%. The training process ends." + "The error rate between the compressed model and original model is less than 0.5%. The training process ends." ) stop_training = True break @@ -792,8 +860,9 @@ def _start_train(self, train_program_info, test_program_info, strategy, ) if (train_config.train_iter and total_train_iter >= train_config.train_iter) or stop_training: + stop_training = True break - + self.final_metric = best_metric if 'unstructure' in self._strategy or train_config.sparse_model: self._pruner.update_params() @@ -843,21 +912,23 @@ def _save_model(self, test_program_info, strategy, strategy_idx): feed_vars=feed_vars, fetch_vars=test_program_info.fetch_targets, executor=self._exe, - program=test_program) + program=test_program, + clip_extra=False) def export_onnx(self, model_name='quant_model.onnx', deploy_backend='tensorrt'): - infer_model_path = os.path.join(self.final_dir, self.model_filename) - assert os.path.exists( - infer_model_path), 'Not found {}, please check it.'.format( - infer_model_path) - onnx_save_path = os.path.join(self.final_dir, 'ONNX') - if not os.path.exists(onnx_save_path): - os.makedirs(onnx_save_path) - export_onnx( - self.final_dir, - model_filename=self.model_filename, - params_filename=self.params_filename, - save_file_path=os.path.join(onnx_save_path, model_name), - deploy_backend=deploy_backend) + if paddle.distributed.get_rank() == 0: + infer_model_path = os.path.join(self.final_dir, self.model_filename) + assert os.path.exists( + infer_model_path), 'Not found {}, please check it.'.format( + infer_model_path) + onnx_save_path = os.path.join(self.final_dir, 'ONNX') + if not os.path.exists(onnx_save_path): + os.makedirs(onnx_save_path) + export_onnx( + self.final_dir, + model_filename=self.model_filename, + params_filename=self.params_filename, + save_file_path=os.path.join(onnx_save_path, model_name), + deploy_backend=deploy_backend) diff --git a/paddleslim/auto_compression/create_compressed_program.py b/paddleslim/auto_compression/create_compressed_program.py index 011af243c..7217b0331 100644 --- a/paddleslim/auto_compression/create_compressed_program.py +++ b/paddleslim/auto_compression/create_compressed_program.py @@ -24,6 +24,7 @@ from ..common import get_logger from .strategy_config import ProgramInfo from ..common.load_model import load_inference_model +from ..analysis import flops _logger = get_logger(__name__, level=logging.INFO) __all__ = [ @@ -118,7 +119,7 @@ def _parse_distill_loss(distill_node_pair, distill_lambda=1.0): """parse distill loss config""" loss_dist = 0.0 - losses = [] + losses = {} if isinstance(distill_node_pair[0], str): assert isinstance(distill_loss, str) assert isinstance(distill_lambda, float) @@ -128,16 +129,17 @@ def _parse_distill_loss(distill_node_pair, assert len(distill_node_pair) == len(distill_loss) assert len(distill_node_pair) == len(distill_lambda) - for node, loss, lam in zip(distill_node_pair, distill_loss, distill_lambda): - tmp_loss = 0.0 - _logger.info("train 
config.distill_node_pair: {}".format(node, loss, - lam)) + for node, loss_clas, lam in zip(distill_node_pair, distill_loss, + distill_lambda): + tmp_loss = losses.get(loss_clas, 0.0) + _logger.info("train config.distill_node_pair: {}".format( + node, loss_clas, lam)) assert len(node) % 2 == 0, \ "distill_node_pair config wrong, the length needs to be an even number" for i in range(len(node) // 2): - tmp_loss += eval(loss)(node[i * 2], node[i * 2 + 1]) - loss_dist += lam * tmp_loss - losses.append(tmp_loss) + tmp_loss += eval(loss_clas)(node[i * 2], node[i * 2 + 1]) * lam + loss_dist += tmp_loss + losses[loss_clas] = tmp_loss return loss_dist, losses @@ -313,7 +315,7 @@ def build_distill_program(executor, use_dynamic_loss_scaling=True, **train_config['amp_config']) - distill_loss, losses = _parse_distill_loss( + distill_loss, loss_dict = _parse_distill_loss( distill_node_pair, config.get('loss') or 'l2', ### default loss is l2 config.get('alpha') or 1.0) ### default alpha is 1.0 @@ -334,7 +336,7 @@ def build_distill_program(executor, train_program_info = ProgramInfo(startup_program, train_program, feed_target_names, train_fetch_list, - optimizer, learning_rate) + optimizer, learning_rate, loss_dict) test_program_info = ProgramInfo(startup_program, test_program, feed_target_names, fetch_targets) return train_program_info, test_program_info @@ -399,6 +401,33 @@ def _get_label_info(dataloader, feed_target_names): return label_info +def _get_chn_prune_params(program): + params = [] + original_shapes = {} + for block in program.blocks: + for op in block.ops: + if op.type == 'conv2d' and op.attr('groups') == 1: + for inp_name in op.input_arg_names: + var_ = block.var(inp_name) + if var_.persistable is True: + params.append(inp_name) + original_shapes[inp_name] = var_.shape + return params, original_shapes + + +def _get_asp_prune_params(program): + params = [] + for block in program.blocks: + for op in block.ops: + if (op.type == 'conv2d' and op.attr('groups') == 1 + ) or op.type == 'mul' or op.type == 'matmul_v2': + for inp_name in op.input_arg_names: + var_ = block.var(inp_name) + if var_.persistable is True: + params.append(inp_name) + return params + + def build_prune_program(executor, place, config, @@ -428,20 +457,29 @@ def build_prune_program(executor, elif strategy.startswith('channel_prune'): from ..prune import Pruner pruner = Pruner(config["criterion"]) - params = [] - original_shapes = {} - ### TODO(ceci3): set default prune weight - for param in train_program_info.program.global_block().all_parameters(): - if config['prune_params_name'] is not None and param.name in config[ - 'prune_params_name']: - params.append(param.name) - original_shapes[param.name] = param.shape + if config['prune_params_name'] is None: + params, original_shapes = _get_chn_prune_params( + train_program_info.program) + else: + params = [] + original_shapes = {} + for param in train_program_info.program.global_block( + ).all_parameters(): + if config[ + 'prune_params_name'] is not None and param.name in config[ + 'prune_params_name']: + params.append(param.name) + original_shapes[param.name] = param.shape + + origin_flops = flops(train_program_info.program) pruned_program, _, _ = pruner.prune( train_program_info.program, paddle.static.global_scope(), params=params, - ratios=[config['pruned_ratio']] * len(params), + ratios=[config['pruned_ratio']] * len(params) + if isinstance(config['pruned_ratio'], float) else + config['pruned_ratio'], place=place) _logger.info( "####################channel 
pruning##########################") @@ -451,13 +489,22 @@ def build_prune_program(executor, param.name, original_shapes[param.name], param.shape)) _logger.info( "####################channel pruning end##########################") + + final_flops = flops(pruned_program) + pruned_flops = abs(origin_flops - final_flops) / origin_flops + _logger.info("FLOPs before pruning: {}".format(origin_flops)) + _logger.info("FLOPs after pruning: {}. Pruned FLOPs: {}%.".format( + final_flops, round(pruned_flops * 100, 2))) train_program_info.program = pruned_program elif strategy.startswith('asp'): from paddle.static import sparsity pruner = sparsity excluded_params_name = [] - ### TODO(ceci3): set default prune weight + if config['prune_params_name'] is None: + config['prune_params_name'] = _get_asp_prune_params( + train_program_info.program) + for param in train_program_info.program.global_block().all_parameters(): if config['prune_params_name'] is not None: if param.name not in config['prune_params_name']: diff --git a/paddleslim/auto_compression/strategy_config.py b/paddleslim/auto_compression/strategy_config.py index d8b3e90ce..508da6448 100644 --- a/paddleslim/auto_compression/strategy_config.py +++ b/paddleslim/auto_compression/strategy_config.py @@ -16,7 +16,7 @@ __all__ = [ "BaseStrategy", - "Quantization", + "QuantAware", "Distillation", "MultiTeacherDistillation", "HyperParameterOptimization", @@ -29,10 +29,11 @@ "TrainConfig", "SUPPORTED_CONFIG", "TRAIN_CONFIG_NAME", + "QuantPost", ] SUPPORTED_CONFIG = [ - "Quantization", + "QuantAware", "Distillation", "MultiTeacherDistillation", "HyperParameterOptimization", @@ -40,6 +41,7 @@ "UnstructurePrune", "TransformerPrune", "ASPPrune", + "QuantPost", ] TRAIN_CONFIG_NAME = "TrainConfig" @@ -50,7 +52,7 @@ def __init__(self, name): self.name = name -class Quantization(BaseStrategy): +class QuantAware(BaseStrategy): def __init__(self, quantize_op_types=[ 'conv2d', 'depthwise_conv2d', 'conv2d_transpose', 'mul', @@ -85,7 +87,7 @@ def __init__(self, onnx_format(bool): Whether to export the quantized model with format of ONNX. Default is False. is_full_quantize(bool): If True, 'quantoze_op_types' will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES. Default: False. """ - super(Quantization, self).__init__("Quantization") + super(QuantAware, self).__init__("QuantAware") self.quantize_op_types = quantize_op_types self.weight_bits = weight_bits self.activation_bits = activation_bits @@ -182,12 +184,82 @@ def __init__(self, self.max_quant_count = max_quant_count +class QuantPost(BaseStrategy): + def __init__(self, + batch_size=32, + batch_nums=None, + epochs=20, + lr=0.1, + algo='hist', + hist_percent=0.999, + regions=None, + region_weights_names=None, + recon_level=None, + is_full_quantize=False, + bias_correction=False, + weight_quantize_type='channel_wise_abs_max', + activation_quantize_type='range_abs_max', + simulate_activation_quant=False, + skip_tensor_list=None, + onnx_format=False, + quantize_op_types=[ + "conv2d", "depthwise_conv2d", "mul", "matmul", "matmul_v2" + ], + weight_bits=8, + activation_bits=8): + """ + QuantPost Config. + Args: + batch_size(int, optional): The batch size of DataLoader. Default: 1. + batch_nums(int, optional): If batch_nums is not None, the number of calibrate data is 'batch_size*batch_nums'. If batch_nums is None, use all data generated by sample_generator as calibrate data. Default: None. + lr(float, optional): The learning rate of Reconstruction Quanter. Default: 0.1. 
+ epochs(int, optional): The number of epochs used by the Reconstruction Quanter. Default: 20. + algo(str, optional): Post-Training Quantization algorithm; the supported algorithms follow PaddlePaddle's PostTrainingQuantization (see ``). Default: 'hist'. + hist_percent(float, optional): The percentile of histogram for algo hist. Default: 0.999. + regions(list[list], optional): The list of some regions, each region is a subgraph of the fp32 program and it will have exactly 1 input operation and 1 output operation. When the recon-level is region, the reconstruction loss of each region is minimized. Default: None. + region_weights_names(list[list], optional): The weight names inside every region. Default: None. + recon_level(str, optional): The type of reconstruction granularity. Currently supports ['layer-wise', 'region-wise'] types. Only when recon_level isn't None can the Reconstruction Quanter be used. Default: None. + is_full_quantize(bool): If True, 'quantize_op_types' will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES. Default: False. + bias_correction(bool): Whether to use the bias correction method of https://arxiv.org/abs/1810.05723. Default: False. + weight_quantize_type(str): Weight quantize type. Default: 'channel_wise_abs_max'. + activation_quantize_type(str): Activation quantize type. Default: 'range_abs_max'. + simulate_activation_quant(bool, optional): Whether we need the noise caused by activation quantization during the reconstruction process. Default: False. + skip_tensor_list(list): List of tensor names to skip quantizing. Default: None. + onnx_format(bool): Whether to export the quantized model in ONNX format. Default: False. + quantize_op_types(list(str)): Ops whose type is in quantize_op_types will be quantized. Default: ['conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2']. + weight_bits(int): Weight quantize bit num. Default: 8. + activation_bits(int): Activation quantize bit num. Default: 8. + """ + super(QuantPost, self).__init__("PTQ") + self.batch_size = batch_size + self.batch_nums = batch_nums + self.epochs = epochs + self.lr = lr + self.algo = algo + self.hist_percent = hist_percent + self.regions = regions + self.region_weights_names = region_weights_names + self.recon_level = recon_level + self.is_full_quantize = is_full_quantize + self.bias_correction = bias_correction + self.weight_quantize_type = weight_quantize_type + self.activation_quantize_type = activation_quantize_type + self.simulate_activation_quant = simulate_activation_quant + self.skip_tensor_list = skip_tensor_list + self.onnx_format = onnx_format + self.quantize_op_types = quantize_op_types + self.weight_bits = weight_bits + self.activation_bits = activation_bits + + class ChannelPrune: - def __init__(self, pruned_ratio, prune_params_name, criterion='l1_norm'): + def __init__(self, + pruned_ratio, + prune_params_name=None, + criterion='l1_norm'): """ ChannelPrune Config. Args: - pruned_ratio(float): The ratios to be pruned. + pruned_ratio(float|list[float]): The ratios to be pruned. prune_params_name(list(str)): A list of parameter names to be pruned. criterion(str|function): the criterion used to sort channels for pruning, can be chosen from ['l1_norm', 'bn_scale', 'geometry_median']. Default: 'l1_norm'. """ @@ -197,7 +269,7 @@ def __init__(self, pruned_ratio, prune_params_name, criterion='l1_norm'): class ASPPrune: - def __init__(self, prune_params_name): + def __init__(self, prune_params_name=None): """ ASPPrune Config.
Args: @@ -223,7 +295,7 @@ def __init__(self, threshold=0.01, ratio=0.55, gmp_config=None, - prune_params_type=None, + prune_params_type='conv1x1_only', local_sparsity=False): """ UnstructurePrune Config. @@ -359,7 +431,8 @@ def __init__(self, feed_target_names, fetch_targets, optimizer=None, - learning_rate=None): + learning_rate=None, + loss_dict=None): """ ProgramInfo Config. Args: @@ -369,6 +442,7 @@ def __init__(self, fetch_targets(list(Variable)): The fetch variable in the program. optimizer(Optimizer, optional): Optimizer in training. Default: None. learning_rate(float|paddle.optimizer.lr, optional): learning_rate in training. Default: None. + loss_dict(dict): The components of losses. """ self.startup_program = startup_program self.program = program @@ -376,3 +450,4 @@ def __init__(self, self.fetch_targets = fetch_targets self.optimizer = optimizer self.learning_rate = learning_rate + self.loss_dict = loss_dict diff --git a/paddleslim/auto_compression/utils/fake_ptq.py b/paddleslim/auto_compression/utils/fake_ptq.py index 91cccfc2f..bce49b4f1 100644 --- a/paddleslim/auto_compression/utils/fake_ptq.py +++ b/paddleslim/auto_compression/utils/fake_ptq.py @@ -169,5 +169,6 @@ def analysis_and_save_info(op_node, out_var_name): feed_vars=feed_vars, fetch_vars=_fetch_list, executor=executor, - program=_program) + program=_program, + clip_extra=False) print("The quantized model is saved in: " + save_model_path) diff --git a/paddleslim/auto_compression/utils/prune_model.py b/paddleslim/auto_compression/utils/prune_model.py index c0da14ca9..a784aa11d 100644 --- a/paddleslim/auto_compression/utils/prune_model.py +++ b/paddleslim/auto_compression/utils/prune_model.py @@ -95,7 +95,8 @@ def get_sparse_model(executor, places, model_file, param_file, ratio, feed_vars=feed_vars, fetch_vars=fetch_targets, executor=executor, - program=inference_program) + program=inference_program, + clip_extra=False) print("The pruned model is saved in: ", save_path) @@ -170,4 +171,5 @@ def get_prune_model(executor, places, model_file, param_file, ratio, save_path): feed_vars=feed_vars, fetch_vars=fetch_targets, executor=executor, - program=main_program) + program=main_program, + clip_extra=False) diff --git a/paddleslim/common/load_model.py b/paddleslim/common/load_model.py index 0b4bef2c2..208bd7c7e 100644 --- a/paddleslim/common/load_model.py +++ b/paddleslim/common/load_model.py @@ -125,12 +125,13 @@ def load_onnx_model(model_path, version = x2paddle.__version__ v0, v1, v2 = version.split('.') version_sum = int(v0) * 100 + int(v1) * 10 + int(v2) - if version_sum < 139: - _logger.error( - "x2paddle>=1.3.9 is required, please use \"pip install x2paddle\"." + if version_sum != 139: + _logger.warning( + "x2paddle==1.3.9 is required, please use \"pip install x2paddle==1.3.9\"." 
) + os.system('python -m pip install -U x2paddle==1.3.9') except: - os.system('python -m pip install -U x2paddle') + os.system('python -m pip install -U x2paddle==1.3.9') # check onnx installation and version try: pkg.require('onnx') @@ -217,11 +218,11 @@ def export_onnx(model_dir, try: import paddle2onnx version = paddle2onnx.__version__ - if version != '1.0.1': - os.system('python -m pip install -U paddle2onnx==1.0.1') + if version < '1.0.1': + os.system('python -m pip install -U paddle2onnx==1.0.3') except: from pip._internal import main - main(['install', 'paddle2onnx==1.0.1']) + main(['install', 'paddle2onnx==1.0.3']) import paddle2onnx paddle2onnx.command.c_paddle_to_onnx( model_file=os.path.join(model_dir, model_filename), diff --git a/paddleslim/common/patterns.py b/paddleslim/common/patterns.py index c5047d193..3335c65a4 100644 --- a/paddleslim/common/patterns.py +++ b/paddleslim/common/patterns.py @@ -79,8 +79,7 @@ def _is_ffn(pattern_ops, pattern_ops_type): def get_patterns(program, only_final_node=True): - """ distinguish the pattern in the program and get distillation node """ - distill_node = [] + """ distinguish the pattern in the program and get model type """ skip_quant_tensor_list = [] patterns = {} graph = GraphWrapper(program) @@ -124,10 +123,6 @@ def get_patterns(program, only_final_node=True): pattern_name = 'FFN$' + str(block_num) block_num += 1 - if not only_final_node: - distill_node.append('teacher_' + out_var_name) - distill_node.append(out_var_name) - if model_type == 'transformer' and ( 'fetch' in pattern_ops_type or pattern_ops_type[-1] == 'scale'): @@ -140,16 +135,6 @@ def get_patterns(program, only_final_node=True): patterns[pattern_name] = pattern_ops - if model_type != 'transformer' and (not only_final_node): - distill_node.append('teacher_' + out_var_name) - distill_node.append(out_var_name) - - ### add the output of final weight node to distill node - final_weight_node = find_final_nodes(program) - for out_var in final_weight_node: - distill_node.append('teacher_' + out_var.name()) - distill_node.append(out_var.name()) - #### skip quant matmul in attention if model_type == 'transformer': for block_id in range(len(program.blocks)): @@ -158,4 +143,4 @@ def get_patterns(program, only_final_node=True): if inp_name in skip_quant_tensor_list: op._set_attr("op_namescope", "skip_quant") - return patterns, distill_node, model_type + return patterns, model_type diff --git a/paddleslim/dist/__init__.py b/paddleslim/dist/__init__.py index de4b6196a..46a02564e 100755 --- a/paddleslim/dist/__init__.py +++ b/paddleslim/dist/__init__.py @@ -12,5 +12,5 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-from .single_distiller import merge, fsp, l2, soft_label, loss, dkd +from .single_distiller import merge, fsp, l2, soft_label, loss, dkd, skd from .dml import DML diff --git a/paddleslim/dist/single_distiller.py b/paddleslim/dist/single_distiller.py index 8a658a6ae..ac349d589 100644 --- a/paddleslim/dist/single_distiller.py +++ b/paddleslim/dist/single_distiller.py @@ -15,6 +15,7 @@ import numpy as np import paddle from paddleslim.core import GraphWrapper +import paddle.nn.functional as F def merge(teacher_program, @@ -203,8 +204,11 @@ def soft_label(teacher_var_name, teacher_var = paddle.nn.functional.softmax(teacher_var / teacher_temperature) soft_label_loss = paddle.mean( - paddle.fluid.layers.cross_entropy( - student_var, teacher_var, soft_label=True)) + paddle.nn.functional.cross_entropy( + input=student_var, + label=teacher_var, + soft_label=True, + use_softmax=False)) return soft_label_loss @@ -305,3 +309,53 @@ def dkd(teacher_var_name, temperature=temperature, alpha=alpha, beta=beta) + + +def skd(teacher_var_name, student_var_name, program=None, multiplier=None): + """Combine variables from student model and teacher model + by Spherical Knowledge Distillation loss (aka. skd-loss). + Reference: https://github.com/forjiuzhou/Spherical-Knowledge-Distillation + Args: + teacher_var_name(str): The name of teacher_var. + student_var_name(str): The name of student_var. + program(Program): The input distiller program. If not specified, + the default program will be used. Default: None + multiplier(float): The multiplier to recover its norm to the original + level. When it's None, the appropriate multiplier can be computed by + teacher's logits with paddle.std(output_t, axis=1). Default: None. + + Returns: + Variable: skd distiller loss. + """ + if program == None: + program = paddle.static.default_main_program() + + student_var = program.global_block().var(student_var_name) + teacher_var = program.global_block().var(teacher_var_name) + teacher_var.stop_gradient = True + + if multiplier is None: + multiplier = paddle.std(teacher_var, axis=1, keepdim=True) + + logits_student = F.layer_norm( + student_var, + student_var.shape[1:], + weight=None, + bias=None, + epsilon=1e-7) * multiplier + logits_teacher = F.layer_norm( + teacher_var, + teacher_var.shape[1:], + weight=None, + bias=None, + epsilon=1e-7) * multiplier + + student_out = F.softmax(logits_student, axis=1) + teacher_out = F.softmax(logits_teacher, axis=1) + skd_loss = paddle.mean( + F.cross_entropy( + input=student_out, + label=teacher_out, + soft_label=True, + use_softmax=False)) + return skd_loss diff --git a/paddleslim/dygraph/quant/ptq.py b/paddleslim/dygraph/quant/ptq.py index 2d8e47d81..78727d95a 100644 --- a/paddleslim/dygraph/quant/ptq.py +++ b/paddleslim/dygraph/quant/ptq.py @@ -118,7 +118,7 @@ def find_conv_bn_names(self, model): return fuse_list - def save_quantized_model(self, model, path, input_spec=None): + def save_quantized_model(self, model, path, input_spec=None, **kwargs): """ Save the quantized inference model. @@ -131,7 +131,7 @@ def save_quantized_model(self, model, path, input_spec=None): InputSpec or example Tensor. If None, all input variables of the original Layer's forward method would be the inputs of the saved model. Default: None. - + kwargs (dict, optional): Other save configuration options for compatibility. 
Returns: None """ @@ -143,7 +143,7 @@ def save_quantized_model(self, model, path, input_spec=None): model.eval() self.ptq.save_quantized_model( - model=model, path=path, input_spec=input_spec) + model=model, path=path, input_spec=input_spec, **kwargs) if training: model.train() diff --git a/paddleslim/prune/pruner.py b/paddleslim/prune/pruner.py index 4c58c2e1d..d8242f17a 100644 --- a/paddleslim/prune/pruner.py +++ b/paddleslim/prune/pruner.py @@ -188,6 +188,16 @@ def _transform(self, items): for idx in src: idx = idx * repeat target.extend(range(idx, idx + repeat)) + elif "stride" in trans: + stride = trans['stride'] + target = src.repeat(stride) if stride > 1 else src + elif "squeeze" in trans: + repeat = trans['repeat'] + targets_set = set() + for idx in src: + targets_set.add(idx // repeat) # integer division keeps the squeezed indices integral + target = list(targets_set) + src = target ret.append((name, axis, src)) diff --git a/paddleslim/prune/sensitive.py b/paddleslim/prune/sensitive.py index dcddd6c3f..a032f345d 100644 --- a/paddleslim/prune/sensitive.py +++ b/paddleslim/prune/sensitive.py @@ -87,7 +87,7 @@ def sensitivity(program, if eval_args is None: baseline = eval_func(graph.program) else: - baseline = eval_func(eval_args) + baseline = eval_func(graph.program, *eval_args) pruner = Pruner(criterion=criterion) _logger.info("sensitive - param: {}; ratios: {}".format(name, @@ -104,7 +104,7 @@ if eval_args is None: pruned_metric = eval_func(pruned_program) else: - pruned_metric = eval_func(eval_args) + pruned_metric = eval_func(pruned_program, *eval_args) loss = (baseline - pruned_metric) / baseline _logger.info("pruned param: {}; {}; loss={}".format(name, ratio, loss)) diff --git a/paddleslim/quant/analysis.py b/paddleslim/quant/analysis_ptq.py similarity index 97% rename from paddleslim/quant/analysis.py rename to paddleslim/quant/analysis_ptq.py index 7da2a10e6..c207eb56f 100644 --- a/paddleslim/quant/analysis.py +++ b/paddleslim/quant/analysis_ptq.py @@ -37,10 +37,10 @@ _logger = get_logger(__name__, level=logging.INFO) -__all__ = ["AnalysisQuant"] +__all__ = ["AnalysisPTQ"] -class AnalysisQuant(object): +class AnalysisPTQ(object): def __init__(self, model_dir, model_filename=None, params_filename=None, eval_function=None, data_loader=None, save_dir='analysis_results', resume=False, ptq_config=None): """ - AnalysisQuant provides to analysis the sensitivity of each op in the model. + AnalysisPTQ provides sensitivity analysis for each op in the model.
Args: model_dir(str): the path of fp32 model that will be quantized, it can also be '.onnx' @@ -86,8 +86,6 @@ def __init__(self, 'is_full_quantize'] if 'is_full_quantize' in ptq_config else False self.onnx_format = ptq_config[ 'onnx_format'] if 'onnx_format' in ptq_config else False - if 'algo' not in ptq_config: - ptq_config['algo'] = 'avg' if not os.path.exists(self.save_dir): os.mkdir(self.save_dir) @@ -112,7 +110,7 @@ def __init__(self, self.data_loader = wrap_dataloader(data_loader, self.feed_list) # quant model to get quantizable ops - post_training_quantization = self.create_ptq(executor, None, 'avg') + post_training_quantization = self.create_ptq(executor, None) _logger.info('Run PTQ before analysis.') program = post_training_quantization.quantize() @@ -170,7 +168,7 @@ def save_csv(self, data, save_name, csv_columns): writer.writerow(d) _logger.info('Activation Statistic is saved in {}'.format(save_path)) - def create_ptq(self, executor, skip_tensor_list, algo): + def create_ptq(self, executor, skip_tensor_list): return PostTrainingQuantization( executor=executor, data_loader=self.data_loader, @@ -178,7 +176,6 @@ def create_ptq(self, executor, skip_tensor_list, algo): model_filename=self.model_filename, params_filename=self.params_filename, skip_tensor_list=skip_tensor_list, - algo=algo, # avg fastest onnx_format=self.onnx_format, **self.ptq_config) @@ -196,8 +193,7 @@ def sampling(self, executor, program, scope): def eval_quant_model(self, skip_list): executor = paddle.static.Executor(self.places) - post_training_quantization = self.create_ptq( - executor, skip_list, algo='avg') + post_training_quantization = self.create_ptq(executor, skip_list) program = post_training_quantization.quantize() _logger.info('Evaluating...') if self.onnx_format: @@ -313,7 +309,7 @@ def collect_quant_stat(self): _logger.info('Collecting Statistic After PTQ...') executor = paddle.static.Executor(self.places) scope = global_scope() - post_training_quantization = self.create_ptq(executor, None, algo='avg') + post_training_quantization = self.create_ptq(executor, None) program = post_training_quantization.quantize() persistable_var_names = [] @@ -407,7 +403,8 @@ def collect_statistic(self, statistic = [] box_fp_dist, box_q_dist = [], [] hist_fp_dist, hist_q_dist = {}, {} - for var_name in fp_tensors: + fp_tensor_names = sorted(list(fp_tensors.keys())) + for var_name in fp_tensor_names: fp_tensor = fp_tensors[var_name] quant_name = var_name_map[ var_name] if var_name_map is not None else var_name @@ -507,7 +504,9 @@ def plot_hist_distribution(self, hist_data, save_name): for name in hist_data: plt.hist(hist_data[name][0], bins=hist_data[name][1]) plt.xlabel(name) - plt.ylabel("Frequency") + plt.ylabel("Probability") + locs, _ = plt.yticks() + plt.yticks(locs, np.round(locs / len(hist_data[name][0]), 3)) if 'act' in save_name: plt.title("Hist of Activation {}".format(name)) else: @@ -538,11 +537,7 @@ def get_target_quant_model(self, target_metric): skip_list.append(rank_list.pop(0)) _logger.info('Skip Ops: {}'.format(skip_list)) executor = paddle.static.Executor(self.places) - post_training_quantization = self.create_ptq( - executor, - skip_list, - algo=self.ptq_config['algo'] - if 'algo' in self.ptq_config else 'KL') + post_training_quantization = self.create_ptq(executor, skip_list) program = post_training_quantization.quantize() _logger.info('Evaluating...') diff --git a/paddleslim/quant/analysis_qat.py b/paddleslim/quant/analysis_qat.py new file mode 100644 index 000000000..98a990333 --- /dev/null +++ 
b/paddleslim/quant/analysis_qat.py @@ -0,0 +1,266 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import sys +import pickle +import copy +import logging +import numpy as np + +import paddle +from paddle.fluid import core +from paddle.fluid.framework import IrGraph +from ..common import get_logger, load_inference_model + +_logger = get_logger(__name__, level=logging.INFO) + +__all__ = ["AnalysisQAT"] + + +class AnalysisQAT(object): + def __init__(self, + quant_model_dir, + float_model_dir, + model_filename=None, + params_filename=None, + quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"], + qat_metric=None, + eval_function=None, + data_loader=None, + save_dir='analysis_results', + resume=False): + ''' + AnalysisQAT provides sensitivity analysis for each op in the model. + + Args: + quant_model_dir(str): the path of the INT8 model quantized through QAT + float_model_dir(str): the path of the FP32 model that the quant model is based on + model_filename(str, optional): the model file name of the model + params_filename(str, optional): the parameter file name of the model + quantizable_op_type(list of str, optional): the types of ops that will be analyzed + qat_metric(float, optional): the metric of the quantized model, which will be calculated automatically if it is None + eval_function(function): a user-defined eval function that returns the metric of an inference program; it is used to judge the metric of the quantized model + data_loader(Python Generator, Paddle.io.DataLoader, optional): the + Generator or Dataloader that provides calibration data and can + return a batch every time + save_dir(str, optional): the output dir that stores the analyzed information + resume(bool, optional): if analysis was interrupted, this flag resumes it and loads the information already analyzed.
+ ''' + if model_filename is None: + model_filename = 'model.pdmodel' + if params_filename is None: + params_filename = 'model.pdiparams' + self.quant_model_dir = quant_model_dir + self.float_model_dir = float_model_dir + self.model_filename = model_filename + self.params_filename = params_filename + self.quantizable_op_type = quantizable_op_type + self.qat_metric = qat_metric + self.eval_function = eval_function + self.save_dir = save_dir + self.checkpoint_name = os.path.join(save_dir, 'analysis_checkpoint.pkl') + self.nonquant_layer_metrics = {} + if not os.path.exists(self.save_dir): + os.mkdir(self.save_dir) + + devices = paddle.device.get_device().split(':')[0] + self.places = paddle.device._convert_to_place(devices) + executor = paddle.static.Executor(self.places) + [program, self.feed_list, self.fetch_list] = load_inference_model( + self.quant_model_dir, + executor=executor, + model_filename=self.model_filename, + params_filename=self.params_filename) + _logger.info('Loaded model from: {}'.format(quant_model_dir)) + + graph = IrGraph(core.Graph(program.desc), for_test=True) + + # find all inputs for each quantizable op + self.inputs_of_quantized_op = [] + sorted_ops = graph.topology_sort() + for op_node in sorted_ops: + op_name = op_node.name() + if op_name in quantizable_op_type: + input_names = op_node.op().input_arg_names() + for input_name in input_names: + if 'quantized' in input_name: + self.inputs_of_quantized_op.append(input_names) + break + + if self.qat_metric is None: + _logger.info('Calculating the metric of QAT model...') + self.qat_metric = self.eval_function( + executor, program, self.feed_list, self.fetch_list) * 100 + _logger.info('The metric of QAT model is {}'.format( + round(self.qat_metric, 4))) + executor.close() + + def save_checkpoint(self): + if not os.path.exists(self.save_dir): + os.makedirs(self.save_dir) + with open(self.checkpoint_name, 'wb') as f: + pickle.dump(self.nonquant_layer_metrics, f) + _logger.info('Save checkpoint to {}.'.format(self.checkpoint_name)) + + def load_checkpoint(self): + if not os.path.exists(self.checkpoint_name): + _logger.info('Checkpoint path {} does not exist.'.format( + self.checkpoint_name)) + return False + with open(self.checkpoint_name, 'rb') as f: + self.nonquant_layer_metrics = pickle.load(f) + _logger.info('Load checkpoint from {}.'.format(self.checkpoint_name)) + return True + + def get_weight_name(self, inputs_names): + # TODO(xc) + w_idx = 0 if 'w_0' in inputs_names[0] else 1 + weight_name = inputs_names[w_idx].split('.quantized.dequantized')[0] + return weight_name + + def get_new_in_out_map( + self, + input_list, + graph, + float_scope, + quant_scope, ): + + input_rename_map = {} + output_rename_map = {} + removed_ops = [] + for op_node in graph.all_op_nodes(): + if op_node.id() in removed_ops: + continue + in_names = op_node.input_arg_names() + out_names = op_node.output_arg_names() + if len(out_names) == 1 and out_names[0] in input_list: + in_var = graph._find_node_by_name(op_node.inputs, + op_node.input('X')[0]) + out_var = graph._find_node_by_name(op_node.outputs, + op_node.output('Y')[0]) + if 'quantized' in in_var.name(): + # act + for op in graph.all_op_nodes(): + o_ns = op.output_arg_names() + if len(o_ns) == 1 and o_ns[0] == in_var.name(): + in_var_1 = graph._find_node_by_name( + op.inputs, op.input('X')[0]) + graph.safe_remove_nodes(op) + removed_ops.append(op.id()) + input_rename_map[out_var.node] = in_var_1 + else: + # weight + with paddle.static.scope_guard(float_scope): + float_weight = np.array( + 
float_scope.find_var(in_var.name()).get_tensor()) + with paddle.static.scope_guard(quant_scope): + quant_scope.find_var(in_var.name()).get_tensor().set( + float_weight, self.places) + input_rename_map[out_var.node] = in_var + graph.safe_remove_nodes(op_node) + removed_ops.append(op_node.id()) + output_rename_map[in_var.node] = out_var + + return input_rename_map, output_rename_map, removed_ops + + def relink_graph(self, graph, input_rename_map, output_rename_map, + removed_ops): + for op_node in graph.all_op_nodes(): + if op_node.id() in removed_ops: + continue + for var in op_node.inputs: + if var.node in input_rename_map: + old_in = var + new_in = input_rename_map[var.node] + graph.update_input_link(old_in, new_in, op_node) + _logger.info( + f'relink {op_node.name()} \'s input node from {old_in.name()} to {new_in.name()}.' + ) + for var in op_node.outputs: + if var.node in output_rename_map: + old_out = var + new_out = output_rename_map[var.node] + graph.update_input_link(old_out, new_out, op_node) + _logger.info( + f'relink {op_node.name()} \'s output node from {old_out.name()} to {new_out.name()}.' + ) + + return graph.to_program() + + def metric_error_analyse(self): + executor = paddle.static.Executor(self.places) + + float_scope = paddle.static.Scope() + quant_scope = paddle.static.Scope() + + for idx, input_list in enumerate(self.inputs_of_quantized_op): + weight_name = self.get_weight_name(input_list) + _logger.info( + 'Checking {}/{} quant model: without quant layer {}'.format( + idx + 1, len(self.inputs_of_quantized_op), weight_name)) + + with paddle.static.scope_guard(float_scope): + load_inference_model( + self.float_model_dir, + executor=executor, + model_filename=self.model_filename, + params_filename=self.params_filename) + + with paddle.static.scope_guard(quant_scope): + [program, self.feed_list, + self.fetch_list] = load_inference_model( + self.quant_model_dir, + executor=executor, + model_filename=self.model_filename, + params_filename=self.params_filename) + + program_copy = program.clone() + graph = IrGraph(core.Graph(program_copy.desc), for_test=True) + input_rename_map, output_rename_map, removed_ops = self.get_new_in_out_map( + input_list, graph, float_scope, quant_scope) + saved_program = self.relink_graph(graph, input_rename_map, + output_rename_map, removed_ops) + with paddle.static.scope_guard(quant_scope): + _logger.info('Skip quant {}, evaluating....'.format( + weight_name)) + metric = self.eval_function(executor, saved_program, + self.feed_list, + self.fetch_list) * 100 + self.nonquant_layer_metrics[weight_name] = metric + _logger.info( + 'When skip quant {}, the metric is {}, the diff is {}'. 
+ format(weight_name, + round(metric, 4), round(metric - self.qat_metric, + 4))) + self.save_checkpoint() + + executor.close() + + self.sensitivity_ranklist = sorted( + self.nonquant_layer_metrics, + key=self.nonquant_layer_metrics.get, + reverse=True) + _logger.info('Finished computing the sensitivity of the model.') + for name in self.sensitivity_ranklist: + _logger.info("without quant layer name: {}, eval metric: {}".format( + name, self.nonquant_layer_metrics[name])) + + analysis_file = os.path.join(self.save_dir, "analysis.txt") + with open(analysis_file, "w") as analysis_ret_f: + for name in self.sensitivity_ranklist: + analysis_ret_f.write( + "without layer name: {}, eval metric: {}\n".format( + name, self.nonquant_layer_metrics[name])) + _logger.info('Analysis file is saved in {}'.format(analysis_file)) diff --git a/paddleslim/quant/post_quant_hpo.py b/paddleslim/quant/post_quant_hpo.py index 92617a4c5..9d631c6b9 100755 --- a/paddleslim/quant/post_quant_hpo.py +++ b/paddleslim/quant/post_quant_hpo.py @@ -417,8 +417,11 @@ def quant_post_hpo( try: import smac + assert smac.version == '1.4.0' except: - os.system('python -m pip install -U smac') + _logger.warning( + "smac==1.4.0 is required, please use \"pip install smac==1.4.0\".") + os.system('python -m pip install smac==1.4.0') # smac from ConfigSpace.hyperparameters import CategoricalHyperparameter, \ UniformFloatHyperparameter, UniformIntegerHyperparameter diff --git a/paddleslim/quant/quant_aware_with_infermodel.py b/paddleslim/quant/quant_aware_with_infermodel.py index f7c7367e9..89af10236 100644 --- a/paddleslim/quant/quant_aware_with_infermodel.py +++ b/paddleslim/quant/quant_aware_with_infermodel.py @@ -30,7 +30,6 @@ from ..dist import merge, l2, soft_label, fsp from ..auto_compression.create_compressed_program import build_distill_program import logging -logging.getLogger().setLevel(logging.INFO) from ..common import get_logger _logger = get_logger(__name__, level=logging.INFO) diff --git a/paddleslim/quant/quanter.py b/paddleslim/quant/quanter.py index 85d240bc9..7c6af4777 100755 --- a/paddleslim/quant/quanter.py +++ b/paddleslim/quant/quanter.py @@ -319,7 +319,7 @@ def _is_skip_layernorm(program, op): skip_tensor_list = [] same_scale_tensor_list = [] if model_type == 'transformer' and pattern_ops is None: - pattern_ops, _, model_type = get_patterns(program) + pattern_ops, model_type = get_patterns(program) if model_type != 'transformer': _logger.info( 'Warning! After analysis, the real model type is not transformer! If you encounter this situation, please raise an issue let us know in which case "get_patterns" determines model type is not transformer.' 
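To make the new analyser concrete, here is a hedged usage sketch of ``AnalysisQAT`` as introduced in this patch; the model directories and the eval function body are placeholders:

```python
import paddle
from paddleslim.quant.analysis_qat import AnalysisQAT

paddle.enable_static()

def eval_function(exe, program, feed_names, fetch_list):
    # placeholder: run validation and return accuracy in [0, 1];
    # AnalysisQAT multiplies the returned value by 100 internally
    return 0.0

analyzer = AnalysisQAT(
    quant_model_dir='./qat_model',      # hypothetical QAT output directory
    float_model_dir='./fp32_model',     # matching FP32 baseline
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    eval_function=eval_function,
    save_dir='analysis_results')
# Re-evaluates the model with each quantized layer skipped in turn and
# writes the sensitivity ranking to analysis_results/analysis.txt.
analyzer.metric_error_analyse()
```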
diff --git a/paddleslim/quant/reconstruction_quantization.py b/paddleslim/quant/reconstruction_quantization.py old mode 100755 new mode 100644 index e081a6dec..48e233d6f --- a/paddleslim/quant/reconstruction_quantization.py +++ b/paddleslim/quant/reconstruction_quantization.py @@ -23,13 +23,9 @@ import numpy as np import paddle -import paddle.fluid as fluid -from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization -from paddle.fluid.contrib.slim.quantization import utils - from ..dist import merge from ..core.graph_wrapper import GraphWrapper -from ..common import get_logger +from ..common import get_logger, recover_program __all__ = ['ReconstructionQuantization', ] @@ -52,7 +48,8 @@ def _get_config(self): return self._config -class ReconstructionQuantization(PostTrainingQuantization): +class ReconstructionQuantization( + paddle.fluid.contrib.slim.quantization.PostTrainingQuantization): """ Utilizing reconstruction quantization method to quantize the FP32 model, and it uses calibrate data to get the quantization information for all @@ -75,7 +72,6 @@ def quantize(self): Load the FP32 model, and use the calibrate data to calculate the forward-stage. Based on the sample data, we can get the quantization information, and obtain the final quantized model. - Args: None Returns: @@ -96,7 +92,7 @@ def quantize(self): def _preparation(self): batch_id = 0 - with utils.tqdm( + with paddle.fluid.contrib.slim.quantization.utils.tqdm( total=self._batch_nums, bar_format='Preparation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}', ncols=80, ) as t: @@ -116,7 +112,7 @@ def _preparation(self): def _sampling_threshold(self): batch_id = 0 - with utils.tqdm( + with paddle.fluid.contrib.slim.quantization.utils.tqdm( total=self._batch_nums, bar_format='Sampling stage, Run batch:|{bar}| {n_fmt}/{total_fmt}', ncols=80, ) as t: @@ -156,18 +152,27 @@ def _reconstruction(self): scope=self._scope, place=self._place, quantized_op_pairs=self._quantized_op_pairs, + weight_op_pairs=self._weight_op_pairs, weight_quantize_type=self._weight_quantize_type, + activation_bits=self._activation_bits, + weight_bits=self._weight_bits, scale_dict=copy.deepcopy(self._scale_dict), regions=self._config['regions'], region_weights_names=self._config['region_weights_names'], recon_level=self._config['recon_level'], simulate_activation_quant=self._config['simulate_activation_quant'], + skip_tensor_list=self._skip_tensor_list, num_iterations=self._batch_nums, lr=self._config['lr'], bias_correction=self._bias_correction, epochs=self._config['epochs'], - scale_trainable=self._config['scale_trainable']) - self._program = reconstruction_quanter._run() + limit=self._config['limit']) + self._program, self._scale_dict = reconstruction_quanter._run() + + if self._algo in ["KL", "hist"]: + self._quantized_var_threshold = self._scale_dict + else: + self._quantized_threshold = self._scale_dict def _postprocessing(self): if self._algo is 'min_max': @@ -210,27 +215,30 @@ def __init__(self, scope, place, quantized_op_pairs, + weight_op_pairs, weight_quantize_type, + activation_bits, + weight_bits, scale_dict, regions, region_weights_names, recon_level, simulate_activation_quant, + skip_tensor_list=None, num_iterations=1000, lr=0.1, bias_correction=False, epochs=20, - scale_trainable=False, - drop_prob=0.5): + drop_prob=0.5, + limit=5): ''' Reconstruction Quanter, used to optimize the rounding policy by reconstructing the intermediate output. 
-
         Args:
             data_loader(Python Generator, Paddle.io.DataLoader, optional): The
                 Generator or Dataloader provides calibrate data, and it could
                 return a batch every time.
-            executor(fluid.Executor): The executor to load, run and save the
+            executor(paddle.static.Executor): The executor to load, run and save the
                 quantized model.
             scope(fluid.Scope, optional): The scope of the program, use it to load
                 and save variables. If scope=None, get scope by global_scope().
@@ -250,6 +258,7 @@ def __init__(self,
                 Currently support ['layer-wise', 'region-wise'] types. Default is layer-wise.
             simulate_activation_quant(bool, optional): Whether we need the noise caused by
                 activation quantization during the reconstruction process.
+            skip_tensor_list(list): List of tensor names to skip quantization. Default is None.
             regions(list[list], optional): The list of some regions, each region is a subgraph of
                 fp32 program and it will have exact 1 input operation and 1 output operation. When
                 the recon-level is region, the reconstruction loss of each region is minimized.
@@ -259,20 +268,17 @@ def __init__(self,
             lr(float, optional): The learning rate of Reconstruction Quanter. Default is 0.1.
             bias_correction(bool, optional): If set as True, use the bias correction
                 method of https://arxiv.org/abs/1810.05723. Default is False.
-            scale_trainable: Wether weight‘s scale is trainable. Default is False.
-            drop_prob: The dropout probability of activation quantization, and it is valid only if
+            drop_prob(float, optional): The dropout probability of activation quantization, and it is valid only if
                 simulate_activation_quant is True. Default is 0.5.
+            limit(int, optional): The size of each region. Default is 5.
         Returns:
             None
         '''
         assert recon_level in [
             'layer-wise', 'region-wise'
-        ], "recon_level must be one of the ['layer-wise', 'region-wise'],but received: {}".format(
+        ], "recon_level must be one of the ['layer-wise', 'region-wise'], but received: {}".format(
             recon_level)
-        if recon_level == 'region-wise':
-            assert regions is not None, "The regions cannot be None."
-            assert region_weights_names is not None, "The region_weights_names cannot be None."
         self._simulate_activation_quant = simulate_activation_quant
         self._program = fp32_program
         self._data_loader = data_loader
@@ -283,20 +289,31 @@ def __init__(self,
         self._scope = scope
         self._place = place
         self._quantized_op_pairs = quantized_op_pairs
+        self._weight_op_pairs = weight_op_pairs
         self._weight_var_names = list(self._quantized_op_pairs.keys())
         self._weight_quantize_type = weight_quantize_type
         self._scale_dict = scale_dict
+        self._activation_bits = activation_bits
+        self._weight_bits = weight_bits
         self._num_iterations = num_iterations
         self._epochs = epochs
         self._lr = lr
         self._regions = regions
         self._region_weights_names = region_weights_names
         self._bias_correction = bias_correction
-        if self._recon_level == 'layer-wise':
+        self._limit = limit
+        self._skip_tensor_list = skip_tensor_list
+
+        if recon_level == 'region-wise' and regions is None:
+            builder = RegionBuilder(program=self._program)
+            _logger.info('Begin Region division')
+            self._regions, self._region_weights_names = builder._create_regions(
+                limit=self._limit)
+            _logger.info('End Region division')
+        elif self._recon_level == 'layer-wise':
             regions, region_weights_names = self._get_layers()
             self._regions = regions
             self._region_weights_names = region_weights_names
-        self._scale_trainable = scale_trainable
         self._drop_prob = drop_prob

     def _get_layers(self):
@@ -306,13 +323,16 @@ def _get_layers(self):
         self._input_weight_pairs = {}
         for block_id in range(len(self._program.blocks)):
             for op in self._program.blocks[block_id].ops:
-                in_var_names = utils._get_op_input_var_names(op)
+                in_var_names = paddle.fluid.contrib.slim.quantization.utils._get_op_input_var_names(
+                    op)
                 for in_var_name in in_var_names:
                     if in_var_name in persistable_var_names:
                         in_var_names.remove(in_var_name)
                         self._input_weight_pairs[in_var_name] = in_var_names
                         break
         for name in self._weight_var_names:
+            if self._skip_tensor_list is not None and name in self._skip_tensor_list:
+                continue
             region_weights_names.append([name])
             region_ = []
             region_.append(self._input_weight_pairs[name][0])
@@ -321,6 +341,13 @@ def _get_layers(self):
         return regions, region_weights_names

     def _preprocess(self):
+
+        if self._weight_quantize_type == 'channel_wise_abs_max':
+            for name in self._weight_var_names:
+                for i, s in enumerate(self._scale_dict[name]):
+                    if s == 0.0:
+                        self._scale_dict[name][i] = 1e-8
+
         data_name_map = {}
         for name in self._feed_list:
             data_name_map[name] = name
@@ -333,15 +360,7 @@ def _preprocess(self):
             teacher_scope=None,
             name_prefix="teacher_",
             merge_feed=True, )
-        for name in self._weight_var_names:
-            weight_np = utils.load_variable_data(self._scope, name)
-            scale = self._scale_dict[name]
-            weight_np_floor = np.floor(utils.quant_tensor(weight_np, scale))
-            utils.set_variable_data(
-                self._scope,
-                self._place,
-                name,
-                weight_np_floor, )
+
         self._graph = GraphWrapper(self._student_program)

         if self._simulate_activation_quant:
@@ -352,40 +371,43 @@ def _preprocess(self):
     def _run(self):
         self._preprocess()
         startup_program = paddle.static.Program()
+        tmp_program = self._student_program.clone()
         for k in range(len(self._regions)):
             region_ = self._regions[k]
-            names = self._region_weights_names[k]
-            tmp_program = self._student_program.clone()
+            tmp_program.global_block().var(region_[0]).stop_gradient = True
             quant_op_out_name = region_[1]
+            _logger.info(f"Region's input: {region_[0]} output: {region_[1]}")
+
+            names = self._region_weights_names[k]
+            _logger.info(f"Current quantized weights: {names}")
+            loss_function = ReconstructionQuanterLoss(
+                program=tmp_program,
weight_region_names=names) + update_params = [ + tmp_program.global_block().var(name + '.alpha') + for name in names + ] + with paddle.static.program_guard(tmp_program, startup_program): - loss_function = ReconstructionQuanterLoss(tmp_program, names) - quant_op_out_name = region_[1] student_var = tmp_program.global_block().var(quant_op_out_name) teacher_var = tmp_program.global_block().var("teacher_" + quant_op_out_name) - scheduler = paddle.optimizer.lr.CosineAnnealingDecay( - learning_rate=20, - eta_min=2, - T_max=2000, - verbose=True, ) total_loss, recon_loss, round_loss = loss_function.get_loss( student_var, - teacher_var, - scheduler, ) + teacher_var, ) train_fetches_loss = { "total_loss": total_loss, "recon_loss": recon_loss, "round_loss": round_loss, } - optimizer = paddle.optimizer.Adam(learning_rate=self._lr) + optimizer = paddle.optimizer.Adam( + learning_rate=self._lr, parameters=update_params) optimizer.minimize(total_loss) - self._exe.run(startup_program) start_time = time.time() prev_start_time = start_time - loader = self._data_loader() + for epoch in range(self._epochs): - for i, data in enumerate(loader): + for i, data in (enumerate(self._data_loader())): prev_start_time = start_time start_time = time.time() out = self._exe.run( @@ -396,52 +418,64 @@ def _run(self): ], return_numpy=True, ) _logger.info( - "Iter {:d}, lr {}, total_loss {:.5f}, recon_loss {:.5f}, round_loss {:.5f}, time {:.5f}s" - .format(epoch, self._lr, + "Epoch {:d}, Iter {:d}, lr {}, total_loss {:.5f}, recon_loss {:.5f}, round_loss {:.5f}, time {:.5f}s" + .format(epoch, i, self._lr, np.mean(out[0]), np.mean(out[1]), np.mean(out[2]), start_time - prev_start_time), ) sys.stdout.flush() - if i == self._num_iterations: + if i + 1 == self._num_iterations: break + if self._weight_quantize_type == 'channel_wise_abs_max': + self._update_scale() self._update_weights_to_int() if self._bias_correction: self._bias_correction_w() - return self._program + return self._program, self._scale_dict def _init_alpha(self, name, scale): - _tensor = utils.load_variable_data(self._scope, "teacher_" + name) - tensor_scaled = utils.quant_tensor(_tensor, scale) + _tensor = paddle.fluid.contrib.slim.quantization.utils.load_variable_data( + self._scope, "teacher_" + name) + tensor_scaled = paddle.fluid.contrib.slim.quantization.utils.quant_tensor( + x=_tensor, + scale=scale, + weight_bits=self._weight_bits, + quant_axis=0 if self._weight_op_pairs[name] not in paddle.fluid. + contrib.slim.quantization.utils._channelwise_quant_axis1_ops else 1) tensor_floor = np.floor(tensor_scaled) tensor = tensor_scaled - tensor_floor alpha = -np.log((ZETA - GAMMA) / (tensor - GAMMA) - 1) return alpha - def _soft_rounding(self, weight, scale, weight_bits=8): + def _soft_rounding(self, weight, scale): """ Define network of soft rounding. 
Args: weight: The quanted weight with dtype=float32 """ - bnt = (1 << (weight_bits - 1)) - 1 + bnt = (1 << (self._weight_bits - 1)) - 1 + + def _quant(x, scale): + s = scale / bnt + quant_x = x / s + return quant_x def _dequant(x, scale): - s = (scale + 1e-8) / bnt + s = scale / bnt dequant_x = s * x return dequant_x - quantized_weight = paddle.static.data( + weight_copy = paddle.static.data( shape=weight.shape, dtype=weight.dtype, - name=weight.name + '_quant', ) + name=weight.name + '_copy', ) v = paddle.static.create_parameter( shape=weight.shape, dtype=weight.dtype, name=weight.name + ".alpha", - default_initializer=fluid.initializer.NumpyArrayInitializer( - self._alpha, ), ) + default_initializer=paddle.nn.initializer.Assign(self._alpha, ), ) h_v = paddle.clip( paddle.nn.functional.sigmoid(v) * (ZETA - GAMMA) + GAMMA, @@ -453,15 +487,21 @@ def _dequant(x, scale): dtype=weight.dtype, shape=weight.shape, name=weight.name + '.scale', - default_initializer=fluid.initializer.NumpyArrayInitializer( - scale, ), ) + default_initializer=paddle.nn.initializer.Assign(scale, )) else: scale_var = scale - w = _dequant(quantized_weight + h_v, scale_var) + + quantized_weight = _quant(weight_copy, scale_var) + floor_weight = (paddle.floor(quantized_weight) - quantized_weight + ).detach() + quantized_weight + clip_weight = paddle.clip(floor_weight + h_v, -bnt, bnt) + w = _dequant(clip_weight, scale_var) return w def _insert_soft_rounding(self): for name in self._weight_var_names: + if self._skip_tensor_list is not None and name in self._skip_tensor_list: + continue weight = self._graph.var(name) scale = self._scale_dict[name] shape = weight.shape() @@ -470,18 +510,18 @@ def _insert_soft_rounding(self): scale = np.array(scale) scale = scale.reshape(scale.shape[0], 1) if len(shape) == 2: - scale = scale.repeat(shape[0], axis=0) + scale = scale.repeat(shape[0], axis=1).T else: scale = scale.repeat(shape[1] * shape[2] * shape[3], axis=1) - scale = scale.reshape(shape) + scale = scale.reshape(shape) self._insert_func(var=weight, scale=scale, func="_soft_rounding") - def _drop_quant_dequant(self, inputs, scale, weight_bits=8): + def _drop_quant_dequant(self, inputs, scale): x = paddle.static.data( shape=inputs.shape, dtype=inputs.dtype, name=inputs.name + '.tmp', ) - bnt = (1 << (weight_bits - 1)) - 1 + bnt = (1 << (self._weight_bits - 1)) - 1 scale = scale / bnt dequantized_tensor = paddle.round(x / scale) * scale quant_noise = x - dequantized_tensor @@ -491,13 +531,14 @@ def _drop_quant_dequant(self, inputs, scale, weight_bits=8): def _insert_drop_quant_dequant(self): for op in self._graph.ops(): - if op.type() in ['conv2d', 'depthwise_conv2d', 'mul']: + if op.type( + ) in ['conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2']: if op.type() in ['conv2d', 'depthwise_conv2d']: if op.inputs("Filter")[0].name().startswith("teacher"): break else: input = op.inputs("Input")[0] - if op.type() in ['mul']: + if op.type() in ['mul', 'matmul', 'matmul_v2']: if op.inputs("Y")[0].name().startswith("teacher"): break else: @@ -522,7 +563,7 @@ def _insert_func(self, var, scale, func): self._exe.run(startup_program) # create var in program for new_var in new_program.list_vars(): - if new_var.name == var._var.name + '_quant' or new_var.name == var._var.name + '.tmp': + if new_var.name == var._var.name + '_copy' or new_var.name == var._var.name + '.tmp': continue elif new_var.name == var._var.name + '.alpha': program.global_block().create_parameter( @@ -530,7 +571,8 @@ def _insert_func(self, var, scale, func): 
shape=new_var.shape, dtype=new_var.dtype, type=new_var.type, - stop_gradient=new_var.stop_gradient, ) + stop_gradient=False, + trainable=True) elif new_var.name == var._var.name + '.scale': program.global_block().create_parameter( name=new_var.name, @@ -538,7 +580,7 @@ def _insert_func(self, var, scale, func): dtype=new_var.dtype, type=new_var.type, stop_gradient=True, - trainable=self._scale_trainable, ) + trainable=False) else: if func == "_soft_rounding": program.global_block().create_var( @@ -550,7 +592,7 @@ def _insert_func(self, var, scale, func): stop_gradient=new_var.stop_gradient, ) else: program.global_block().create_var( - name=new_var.name, + name=new_var.name + '.qdrop', shape=new_var.shape, dtype=new_var.dtype, type=new_var.type, @@ -561,11 +603,12 @@ def _insert_func(self, var, scale, func): block = var._var.block # prepend new_program's op in program for _op in ops: - if _op.type() not in ['conv2d', 'depthwise_conv2d', 'mul']: + if _op.type() not in [ + 'conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2' + ]: continue idx = block.ops.index(_op._op) for op in op_list: - # _attrs = op.all_attrs() _type = op.type _attrs = { 'use_mkldnn': False, @@ -585,7 +628,7 @@ def _insert_func(self, var, scale, func): 'scale': op.attr('scale'), 'bias_after_scale': op.attr('bias_after_scale'), } - elif _type == 'elementwise_mul': + elif _type in ['elementwise_mul', 'elementwise_div']: _attrs = { 'use_mkldnn': False, 'with_quant_attr': False, @@ -597,43 +640,47 @@ def _insert_func(self, var, scale, func): if func == "_soft_rounding": _outputs = {'Out': op.output('Out')[0] + '.rounding'} - if _type == "elementwise_add": + if _type in [ + "elementwise_add", "elementwise_sub", + "elementwise_mul" + ]: _inputs = { - 'X': var. - _var, # replace tmp var conv.weight_quant with var conv.weight + 'X': op.input('X')[0] + '.rounding', 'Y': op.input('Y')[0] + '.rounding', } - elif _type == "elementwise_mul": + elif _type == "elementwise_div": _inputs = { - 'X': op.input('X')[0] + '.rounding', + 'X': var._var, 'Y': op.input('Y')[0] + '.rounding', } elif (_type == 'scale' and op.input('X')[0].endswith('scale') ) or _type == 'sigmoid': _inputs = {'X': op.input('X')[0]} + elif (_type == 'scale' and + op.input('X')[0].endswith('copy')): + _inputs = {'X': var._var} else: _inputs = {'X': op.input('X')[0] + '.rounding'} elif func == "_drop_quant_dequant": if _type == 'dropout': _outputs = { - 'Out': op.output('Out')[0], - 'Mask': op.output('Mask')[0], + 'Out': op.output('Out')[0] + '.qdrop', + 'Mask': op.output('Mask')[0] + '.qdrop', } else: - _outputs = {'Out': op.output('Out')[0]} + _outputs = {'Out': op.output('Out')[0] + '.qdrop'} if _type == 'elementwise_add' or _type == 'elementwise_sub': _inputs = { - 'X': var. 
- _var, # replace tmp var conv.weight_quant with var conv.weight - 'Y': op.input('Y'), + 'X': var._var, + 'Y': op.input('Y')[0] + '.qdrop', } elif _type == 'scale' and op.input('X')[ 0] == inputs.name + '.tmp': _inputs = {'X': var._var} else: - _inputs = {'X': op.input('X')[0]} + _inputs = {'X': op.input('X')[0] + '.qdrop'} block._insert_op( idx, @@ -642,18 +689,20 @@ def _insert_func(self, var, scale, func): inputs=_inputs, outputs=_outputs, ) for op in ops: - if op.type() not in ['conv2d', 'depthwise_conv2d', 'mul']: + if op.type() not in [ + 'conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2' + ]: continue if op.type() in ['conv2d', 'depthwise_conv2d'] and op.inputs( 'Filter')[0].name().startswith('teacher'): continue - if op.type() in ['mul'] and op.inputs('Y')[0].name().startswith( - 'teacher'): + if op.type() in ['mul', 'matmul', 'matmul_v2'] and op.inputs('Y')[ + 0].name().startswith('teacher'): continue if func == '_soft_rounding': op._op._rename_input(inputs.name, out.name + '.rounding') else: - op._op._rename_input(inputs.name, out.name) + op._op._rename_input(inputs.name, out.name + '.qdrop') def _isolate_regions(self): starts = [region[0] for region in self._regions] @@ -692,37 +741,67 @@ def _duplicate_var(self, var): op_._rename_input(var_.name, duplicated_var.name) return vars + def _update_scale(self): + for _name in self._weight_var_names: + if self._skip_tensor_list is not None and _name in self._skip_tensor_list: + continue + scale_name = _name + '.scale' + scale_tensor = paddle.fluid.contrib.slim.quantization.utils.load_variable_data( + self._scope, scale_name) + scale_list = [] + if self._weight_op_pairs[ + _name] in paddle.fluid.contrib.slim.quantization.utils._channelwise_quant_axis1_ops: + scale_list = list(scale_tensor[0]) + else: + for i in range(scale_tensor.shape[0]): + scale_list.append(scale_tensor[i][0][0][0]) + self._scale_dict[scale_name] = scale_list + def _update_weights_to_int(self): for weight_var_name in self._weight_var_names: - alpha_tensor = utils.load_variable_data( + if self._skip_tensor_list is not None and weight_var_name in self._skip_tensor_list: + continue + alpha_tensor = paddle.fluid.contrib.slim.quantization.utils.load_variable_data( self._scope, weight_var_name + '.alpha', ) h_alpha_tensor = self._compute_soft_rounding_np(alpha_tensor) - weight_quant_tensor = utils.load_variable_data( + weight_tensor = paddle.fluid.contrib.slim.quantization.utils.load_variable_data( self._scope, weight_var_name, ) - utils.set_variable_data( + weight_quant_tensor = paddle.fluid.contrib.slim.quantization.utils.quant_tensor( + x=weight_tensor, + scale=self._scale_dict[weight_var_name], + weight_bits=self._weight_bits, + quant_axis=0 + if self._weight_op_pairs[weight_var_name] not in paddle.fluid. 
+ contrib.slim.quantization.utils._channelwise_quant_axis1_ops + else 1) + + paddle.fluid.contrib.slim.quantization.utils.set_variable_data( self._scope, self._place, weight_var_name, - np.round(weight_quant_tensor + h_alpha_tensor, ), ) + np.floor(weight_quant_tensor) + h_alpha_tensor, ) def _bias_correction_w(self): for weight_var_name in self._weight_var_names: - weight_var_tensor = utils.load_variable_data( + weight_var_tensor = paddle.fluid.contrib.slim.quantization.utils.load_variable_data( self._scope, "teacher_" + weight_var_name, ) - weight_quant_tensor = utils.load_variable_data( + weight_quant_tensor = paddle.fluid.contrib.slim.quantization.utils.load_variable_data( self._scope, weight_var_name, ) scale = self._scale_dict[weight_var_name] - final_weight_tensor = utils.bias_correction_w( + final_weight_tensor = paddle.fluid.contrib.slim.quantization.utils.bias_correction_w( weight_var_tensor, weight_quant_tensor, scale, - quant_axis=0, - weight_bits=8, ) - utils.set_variable_data( + quant_axis=0 + if self._weight_op_pairs[weight_var_name] not in paddle.fluid. + contrib.slim.quantization.utils._channelwise_quant_axis1_ops + else 1, + weight_bits=self._weight_bits, ) + paddle.fluid.contrib.slim.quantization.utils.set_variable_data( self._scope, self._place, weight_var_name, @@ -730,7 +809,8 @@ def _bias_correction_w(self): def _compute_soft_rounding_np(self, alpha_v): return np.clip( - utils.stable_sigmoid(alpha_v) * (ZETA - GAMMA) + GAMMA, + paddle.fluid.contrib.slim.quantization.utils.stable_sigmoid(alpha_v) + * (ZETA - GAMMA) + GAMMA, a_min=0, a_max=1, ) @@ -752,7 +832,6 @@ def __init__(self, weight=0.1): """ The loss function of Rounding Optimizer. - Args: program(Program): The student program. weight_region_names(list, optional): The weight names inside a region. 
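For readers following `_update_weights_to_int` and `_compute_soft_rounding_np` above: the learned rounding is AdaRound-style, where each weight is floored onto the integer grid and a trained offset `h(alpha)` in [0, 1] decides whether it rounds up or down. A minimal numpy sketch (the GAMMA/ZETA values are the usual stretched-sigmoid constants and are assumed here; the module defines its own):

```python
import numpy as np

GAMMA, ZETA = -0.1, 1.1  # assumed stretched-sigmoid constants


def soft_rounding(alpha):
    # h(alpha) in [0, 1]; saturates to exactly 0 or 1 once alpha is trained
    return np.clip(
        1.0 / (1.0 + np.exp(-alpha)) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)


def update_weight_to_int(weight, scale, alpha, weight_bits=8):
    # mirrors _update_weights_to_int for a per-tensor scale: map the weight
    # onto the integer grid, floor it, then add the learned rounding decision
    bnt = (1 << (weight_bits - 1)) - 1
    quant = weight / (scale / bnt)
    return np.floor(quant) + soft_rounding(alpha)
```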
@@ -776,7 +855,7 @@ def compute_soft_rounding(self, alpha_v): paddle.nn.functional.sigmoid(alpha_v) * (ZETA - GAMMA) + GAMMA, 0, 1) - def get_loss(self, student_tensor, teacher_tensor, scheduler): + def get_loss(self, student_tensor, teacher_tensor, scheduler=None): if self.rec_loss_type == 'mse': rec_loss = paddle.nn.functional.mse_loss( student_tensor, @@ -804,6 +883,202 @@ def get_loss(self, student_tensor, teacher_tensor, scheduler): return total_loss, rec_loss, round_loss +class PriorityQueue: + def __init__(self): + self._data = [] + self._ops = set() + self._idx = 0 + self._lazy_tag = True + + def pop(self): + if not self._lazy_tag: + self._data = sorted(self._data, key=lambda x: x[0]) + self._lazy_tag = True + if self._idx >= len(self._data): raise IndexError('Index out of range!') + ele = self._data[self._idx] + self._idx += 1 + return ele + + def push(self, depth, op): + if op in self._ops: return + self._data.append((depth, op)) + self._ops.add(op) + self._lazy_tag = False + + def empty(self): + return self._idx >= len(self._data) + + +class RegionBuilder(object): + def __init__(self, program): + self._program = program + self._graph = GraphWrapper(self._program) + self._op_idx_map = {} + for op in self._graph.ops(): + self._op_idx_map[op.idx()] = op + self._depth = {} + self._init_depth() + self._cache = {} + self._regions = [] + self._region_weights_names = [] + + def _init_depth(self): + for op in self._graph.ops(): + if len(self._graph.pre_ops(op)) == 0: + self._depth[op.idx()] = 0 + continue + + depths_cache = [] + for up_op in self._graph.pre_ops(op): + assert up_op.idx() in self._depth + depths_cache.append(self._depth[up_op.idx()]) + self._depth[op.idx()] = max(depths_cache) + 1 + + def _build(self, op, limit): + def _find_multi_input_ep(op): + least_first_queue = PriorityQueue() + + for down_op in self._graph.next_ops(op): + least_first_queue.push(self._depth[down_op.idx()], + down_op.idx()) + + while not least_first_queue.empty(): + iter_op_idx = least_first_queue.pop()[-1] + iter_op = self._op_idx_map[iter_op_idx] + if (least_first_queue.empty() and + len(self._graph.pre_ops(iter_op)) > 1): + return iter_op + for down_op in self._graph.next_ops(iter_op): + least_first_queue.push(self._depth[down_op.idx()], + down_op.idx()) + return None + + def _find_coherent_ep(op): + ops = self._graph.next_ops(op) + if len(ops) == 1: + following_op = ops[0] + if following_op.type() == 'fetch': + return None + inps = op.all_inputs() + non_parameter_input = 0 + for var in inps: + if not var._var.persistable: + non_parameter_input += 1 + upstream_ops = len(self._graph.pre_ops(following_op)) + if non_parameter_input == 1 and upstream_ops == 1: + return ops[0] + return None + + sp, ep, future_ep = op, op, op + while future_ep is not None: + if len(self._graph.next_ops(ep)) <= 1: + future_ep = _find_coherent_ep(ep) + else: + future_ep = _find_multi_input_ep(ep) + + if future_ep is None or self._depth[future_ep.idx()] - self._depth[ + sp.idx()] >= limit: + return self._create_region(sp, ep) + ep = future_ep + + return self._create_region(sp=sp, ep=ep) + + def _opset_matching(self, sp, ep): + + if sp.idx() in self._cache: return self._cache[sp.idx()] + + ret_collection = set() + + following_ops = self._graph.next_ops(sp) + + if (len(following_ops)) == 0: + return ret_collection.add(sp.idx()) + + for op in following_ops: + if op == ep: + ret_collection.update([sp.idx(), op.idx()]) + else: + further_res = self._opset_matching(sp=op, ep=ep) + + if further_res is None: + return None + + if 
len(further_res) > 0:
+                    ret_collection.update(further_res)
+                    ret_collection.add(sp.idx())
+        self._cache[sp.idx()] = ret_collection
+        return ret_collection
+
+    def opset_matching(self, sp, ep):
+
+        ret_collection, candidates = set(), set()
+        for op in self._graph.ops():
+            if op == sp:
+                candidates.add(op.idx())
+        for idx in candidates:
+            op = self._op_idx_map[idx]
+            partial_matchings = self._opset_matching(sp=op, ep=ep)
+            if partial_matchings is None:
+                return None
+            if len(partial_matchings) > 0:
+                ret_collection.update(partial_matchings)
+        self._cache.clear()
+        return ret_collection
+
+    def _create_region(self, sp, ep):
+        rps = self.opset_matching(sp, ep)
+        return sp, ep, rps
+
+    def _create_regions(self, limit):
+        visited = []
+        for op in self._graph.ops():
+            region = []
+            region_weight_names = []
+            if op.type() == 'fill_constant': continue
+            if op.type() == 'feed': continue
+            if op.type() == 'fetch': continue
+            if op.idx() in visited: continue
+
+            sp, ep, rps = self._build(op=op, limit=limit)
+            if rps is None:
+                continue
+            ops = [self._op_idx_map[idx] for idx in rps]
+
+            # add region's input var
+            inps = sp.all_inputs()
+            for var in inps:
+                if not var._var.persistable:
+                    region.append(var._var.name)
+                    break
+
+            # add region's output var
+            if ep.type() == 'batch_norm':
+                out_var = ep.outputs('Y')
+            else:
+                out_var = ep.all_outputs()
+            if not out_var[0]._var.persistable:
+                region.append(out_var[0]._var.name)
+
+            for idx in rps:
+                visited.append(idx)
+                op = self._op_idx_map[idx]
+                if op.type() not in [
+                        "conv2d", "depthwise_conv2d", "mul", "matmul",
+                        "matmul_v2"
+                ]:
+                    continue
+                inps = op.all_inputs()
+                for var in inps:
+                    if var._var.persistable:
+                        region_weight_names.append(var._var.name)
+
+            if len(region) < 2 or len(region_weight_names) < 1: continue
+            self._regions.append(region)
+            self._region_weights_names.append(region_weight_names)
+
+        return self._regions, self._region_weights_names
+
+
 def quant_recon_static(executor,
                        model_dir,
                        quantize_model_path,
@@ -823,11 +1098,8 @@ def quant_recon_static(executor,
                        hist_percent=0.9999,
                        bias_correction=False,
                        quantizable_op_type=[
-                           "conv2d",
-                           "depthwise_conv2d",
-                           "mul",
-                           "matmul",
-                           "matmul_v2",
+                           "conv2d", "depthwise_conv2d", "mul", "matmul",
+                           "matmul_v2"
                        ],
                        is_full_quantize=False,
                        weight_bits=8,
@@ -842,15 +1114,14 @@ def quant_recon_static(executor,
                        regions=None,
                        region_weights_names=None,
                        epochs=20,
-                       scale_trainable=False,
                        drop_prob=0.5,
-                       lr=0.1):
+                       lr=0.1,
+                       limit=6):
     """
     The function utilizes static post training quantization method to
     quantize the fp32 model. It uses calibrate data to calculate the
     scale factor of quantized variables, and inserts fake quantization
     and dequantization operators to obtain the quantized model.
-
     Args:
         executor(paddle.static.Executor): The executor to load, run and
             save the quantized model.
@@ -918,9 +1189,8 @@ def quant_recon_static(executor,
         skip_tensor_list(list): List of skip quant tensor name.
         is_use_cache_file(bool): This param is deprecated.
         cache_dir(str): This param is deprecated.
-        epochs: The number of steps in the reconstruction proces. Default is 20.
-        scale_trainable: Wether weight‘s scale is trainable. Default is False.
-        drop_prob: The dropout probability of activation quantization, and it is valid only if
+        epochs(int): The number of steps in the reconstruction process. Default is 20.
+        drop_prob(float): The dropout probability of activation quantization, and it is valid only if
             simulate_activation_quant is True. Default is 0.5.
regions(list[list], optional): The list of some regions, each region is a subgraph of fp32 program and it will have exact 1 input operation and 1 output operation. When @@ -928,6 +1198,7 @@ def quant_recon_static(executor, Default is None. region_weights_names(list[list], optional): The weight names inside every region. Default is None. + limit(int): The size of each region. Default is 6. Returns: None """ @@ -963,8 +1234,8 @@ def quant_recon_static(executor, regions=regions, region_weights_names=region_weights_names, epochs=epochs, - scale_trainable=scale_trainable, - lr=lr) + lr=lr, + limit=limit) reconstruction_quantization = ReconstructionQuantization( PTQCollections=PTQCollections, RSQCollections=RSQCollections) diff --git a/requirements.txt b/requirements.txt index 9b645c7ca..770c5f26d 100644 --- a/requirements.txt +++ b/requirements.txt @@ -4,4 +4,5 @@ matplotlib pillow pyyaml scikit-learn -swig \ No newline at end of file +swig +opencv-python==4.6.0.66 diff --git a/setup.py b/setup.py index f47a47522..bc2842802 100644 --- a/setup.py +++ b/setup.py @@ -27,7 +27,7 @@ else: tag_list = subprocess.getoutput('git tag').split('\n') if 'rc' in tag_list[-1]: - if tag_list[-1].split('-')[0] == tag_list[-2]: + if tag_list[-1].split('rc')[0] in tag_list[-2]: slim_version = tag_list[-2] else: slim_version = tag_list[-1] diff --git a/tests/act/qat_dist_train.yaml b/tests/act/qat_dist_train.yaml index 166bc173f..82266ed79 100644 --- a/tests/act/qat_dist_train.yaml +++ b/tests/act/qat_dist_train.yaml @@ -1,8 +1,9 @@ # For unittests -Quantization: +QuantAware: quantize_op_types: - conv2d - depthwise_conv2d + onnx_format: True Distillation: alpha: 1.0 diff --git a/tests/act/test_act_api.py b/tests/act/test_act_api.py index fc7cbf029..fc8be17e9 100644 --- a/tests/act/test_act_api.py +++ b/tests/act/test_act_api.py @@ -119,6 +119,7 @@ def test_compress(self): train_dataloader=train_loader, eval_dataloader=train_loader) # eval_function to verify accuracy ac.compress() + ac.export_onnx() class TestLoadONNXModel(ACTBase): @@ -152,5 +153,55 @@ def test_compress(self): deploy_backend='tensorrt') +class TestDictPTQ(ACTBase): + def __init__(self, *args, **kwargs): + super(TestDictPTQ, self).__init__(*args, **kwargs) + + def test_compress(self): + image = paddle.static.data( + name='data', shape=[-1, 3, 32, 32], dtype='float32') + train_loader = paddle.io.DataLoader( + self.eval_dataset, + feed_list=[image], + batch_size=4, + return_list=False) + ac = AutoCompression( + model_dir=self.tmpdir.name, + model_filename="infer.pdmodel", + params_filename="infer.pdiparams", + save_dir="output", + config={'QuantPost': {}}, + train_dataloader=train_loader, + eval_dataloader=train_loader + ) # eval_function to verify accuracy + ac.compress() + + +class TestDictPTQRecon(ACTBase): + def __init__(self, *args, **kwargs): + super(TestDictPTQRecon, self).__init__(*args, **kwargs) + + def test_compress(self): + image = paddle.static.data( + name='data', shape=[-1, 3, 32, 32], dtype='float32') + train_loader = paddle.io.DataLoader( + self.eval_dataset, + feed_list=[image], + batch_size=4, + return_list=False) + ac = AutoCompression( + model_dir=self.tmpdir.name, + model_filename="infer.pdmodel", + params_filename="infer.pdiparams", + save_dir="output", + config={'QuantPost': { + 'recon_level': 'layer-wise' + }}, + train_dataloader=train_loader, + eval_dataloader=train_loader + ) # eval_function to verify accuracy + ac.compress() + + if __name__ == '__main__': unittest.main() diff --git a/tests/act/test_act_prune.py 
b/tests/act/test_act_prune.py new file mode 100644 index 000000000..c8711da43 --- /dev/null +++ b/tests/act/test_act_prune.py @@ -0,0 +1,278 @@ +import os +import sys +import numpy as np +from tqdm import tqdm +import unittest +sys.path.append("../../") +import paddle +from PIL import Image +from paddle.vision.datasets import DatasetFolder +from paddle.vision.transforms import transforms +from paddleslim.auto_compression import AutoCompression +from paddleslim.auto_compression.analysis import analysis_prune +paddle.enable_static() + + +class ImageNetDataset(DatasetFolder): + def __init__(self, data_dir, image_size=224, mode='train'): + super(ImageNetDataset, self).__init__(data_dir) + self.data_dir = data_dir + normalize = transforms.Normalize( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375]) + self.transform = transforms.Compose([ + transforms.Resize(256), transforms.CenterCrop(image_size), + transforms.Transpose(), normalize + ]) + self.mode = mode + train_file_list = os.path.join(data_dir, 'train_list.txt') + val_file_list = os.path.join(data_dir, 'val_list.txt') + self.mode = mode + if mode == 'train': + with open(train_file_list) as flist: + full_lines = [line.strip() for line in flist] + np.random.shuffle(full_lines) + lines = full_lines + self.samples = [line.split() for line in lines] + else: + with open(val_file_list) as flist: + lines = [line.strip() for line in flist] + self.samples = [line.split() for line in lines] + + def __getitem__(self, idx): + img_path, label = self.samples[idx] + if self.mode == 'train': + return self.transform( + Image.open(os.path.join(self.data_dir, img_path)).convert( + 'RGB')) + else: + return self.transform( + Image.open(os.path.join(self.data_dir, img_path)).convert( + 'RGB')), np.array([label]).astype('int64') + + def __len__(self): + return len(self.samples) + + +def eval_func(program, exe, feed_names, fetch_list, dataloader): + results = [] + with tqdm( + total=len(dataloader), + bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}', + ncols=80) as t: + for batch_id, data in enumerate(dataloader): + image = data[0]['inputs'] + label = data[0]['labels'] + # top1_acc, top5_acc + if len(feed_names) == 1: + image = np.array(image) + label = np.array(label).astype('int64') + pred = exe.run(program, + feed={feed_names[0]: image}, + fetch_list=fetch_list) + pred = np.array(pred[0]) + label = np.array(label) + sort_array = pred.argsort(axis=1) + top_1_pred = sort_array[:, -1:][:, ::-1] + top_1 = np.mean(label == top_1_pred) + top_5_pred = sort_array[:, -5:][:, ::-1] + acc_num = 0 + for i in range(len(label)): + if label[i][0] in top_5_pred[i]: + acc_num += 1 + top_5 = float(acc_num) / len(label) + results.append([top_1, top_5]) + else: + image = np.array(image) + label = np.array(label).astype('int64') + result = exe.run( + program, + feed={feed_names[0]: image, + feed_names[1]: label}, + fetch_list=fetch_list) + result = [np.mean(r) for r in result] + results.append(result) + t.update() + result = np.mean(np.array(results), axis=0) + return result[0] + + +class ACTChannelPrune(unittest.TestCase): + def __init__(self, *args, **kwargs): + super(ACTChannelPrune, self).__init__(*args, **kwargs) + if not os.path.exists('MobileNetV1_infer'): + os.system( + 'wget -q https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar' + ) + os.system('tar -xf MobileNetV1_infer.tar') + if not os.path.exists('ILSVRC2012_data_demo'): + os.system( + 'wget -q 
https://sys-p0.bj.bcebos.com/slim_ci/ILSVRC2012_data_demo.tar.gz' + ) + os.system('tar -xf ILSVRC2012_data_demo.tar.gz') + + self.train_dataloader, self.eval_dataloader = self.create_dataloader() + + def create_dataloader(self): + train_dataset = ImageNetDataset("./ILSVRC2012_data_demo/ILSVRC2012/") + image = paddle.static.data( + name='inputs', shape=[None] + [3, 224, 224], dtype='float32') + label = paddle.static.data( + name='labels', shape=[None] + [1], dtype='float32') + train_dataloader = paddle.io.DataLoader( + train_dataset, + feed_list=[image], + batch_size=32, + shuffle=True, + num_workers=0, + return_list=False) + + def eval_reader(data_dir, + batch_size, + crop_size, + resize_size, + place=None): + val_dataset = ImageNetDataset( + "./ILSVRC2012_data_demo/ILSVRC2012/", mode='val') + val_loader = paddle.io.DataLoader( + val_dataset, + feed_list=[image, label], + batch_size=batch_size, + shuffle=False, + drop_last=False, + num_workers=0, + return_list=False) + return val_loader + + val_loader = eval_reader( + './ILSVRC2012_data_demo/ILSVRC2012/', + batch_size=32, + crop_size=224, + resize_size=256) + return train_dataloader, val_loader + + def get_analysis(self): + def eval_function(compiled_test_program, exe, test_feed_names, + test_fetch_list): + res = eval_func(compiled_test_program, exe, test_feed_names, + test_fetch_list, self.eval_dataloader) + return res + + ratios = analysis_prune(eval_function, './MobileNetV1_infer', + 'inference.pdmodel', 'inference.pdiparams', + 'senti.data', [0.1], 0.05) + return ratios + + def test_ac_prune_name_is_None(self): + def eval_function(exe, compiled_test_program, test_feed_names, + test_fetch_list): + res = eval_func(compiled_test_program, exe, test_feed_names, + test_fetch_list, self.eval_dataloader) + return res + + configs = { + 'Distillation': {}, + 'ChannelPrune': { + 'pruned_ratio': 0.1 + }, + 'TrainConfig': { + 'epochs': 1, + 'eval_iter': 1000, + 'learning_rate': 5.0e-03, + 'optimizer_builder': { + 'optimizer': { + 'type': 'SGD' + }, + "weight_decay": 0.0005, + } + } + } + + ac = AutoCompression( + model_dir='./MobileNetV1_infer', + model_filename="inference.pdmodel", + params_filename="inference.pdiparams", + save_dir="prune_output", + config=configs, + train_dataloader=self.train_dataloader, + eval_callback=eval_function) # eval_function to verify accuracy + ac.compress() + os.system('rm -rf prune_output') + + def test_ac_prune(self): + ratios = self.get_analysis() + + def eval_function(exe, compiled_test_program, test_feed_names, + test_fetch_list): + res = eval_func(compiled_test_program, exe, test_feed_names, + test_fetch_list, self.eval_dataloader) + return res + + configs = { + 'Distillation': {}, + 'TrainConfig': { + 'epochs': 1, + 'eval_iter': 1000, + 'learning_rate': 5.0e-03, + 'optimizer_builder': { + 'optimizer': { + 'type': 'SGD' + }, + "weight_decay": 0.0005, + } + } + } + configs.update({ + 'ChannelPrune': { + 'prune_params_name': list(ratios.keys()) + } + }) + configs['ChannelPrune'].update({'pruned_ratio': list(ratios.values())}) + + ac = AutoCompression( + model_dir='./MobileNetV1_infer', + model_filename="inference.pdmodel", + params_filename="inference.pdiparams", + save_dir="prune_output", + config=configs, + train_dataloader=self.train_dataloader, + eval_callback=eval_function) # eval_function to verify accuracy + ac.compress() + os.system('rm -rf prune_output') + + def test_ac_sparse(self): + def eval_function(exe, compiled_test_program, test_feed_names, + test_fetch_list): + res = 
eval_func(compiled_test_program, exe, test_feed_names, + test_fetch_list, self.eval_dataloader) + return res + + configs = { + 'Distillation': {}, + 'ASPPrune': {}, + 'TrainConfig': { + 'epochs': 1, + 'eval_iter': 1000, + 'learning_rate': 5.0e-03, + 'optimizer_builder': { + 'optimizer': { + 'type': 'SGD' + }, + "weight_decay": 0.0005, + } + } + } + + ac = AutoCompression( + model_dir='./MobileNetV1_infer', + model_filename="inference.pdmodel", + params_filename="inference.pdiparams", + save_dir="asp_output", + config=configs, + train_dataloader=self.train_dataloader, + eval_callback=eval_function) # eval_function to verify accuracy + ac.compress() + os.system('rm -rf asp_output') + + +if __name__ == '__main__': + unittest.main() diff --git a/tests/act/test_demo.py b/tests/act/test_demo.py index 600e0b680..e4e30c61b 100644 --- a/tests/act/test_demo.py +++ b/tests/act/test_demo.py @@ -56,7 +56,7 @@ def test_demo(self): params_filename="inference.pdiparams", save_dir="MobileNetV1_quant", config={ - 'Quantization': {}, + 'QuantPost': {}, "HyperParameterOptimization": { 'ptq_algo': ['avg'], 'max_quant_count': 3 diff --git a/tests/test_reconstruct_quantization.py b/tests/test_reconstruct_quantization.py index b1582c3c5..d98cbbd4c 100755 --- a/tests/test_reconstruct_quantization.py +++ b/tests/test_reconstruct_quantization.py @@ -14,44 +14,51 @@ import sys sys.path.append("../") import unittest +import tempfile import paddle from paddleslim.quant import quant_post_static from static_case import StaticCase sys.path.append("../demo") -from models import MobileNet +from models import * from layers import conv_bn_layer import paddle.dataset.mnist as reader import numpy as np from paddleslim.quant import quant_recon_static -class TestRoundingOptimizer(StaticCase): +class ReconPTQ(unittest.TestCase): def __init__(self, *args, **kwargs): - super(TestRoundingOptimizer, self).__init__(*args, **kwargs) + super(ReconPTQ, self).__init__(*args, **kwargs) paddle.enable_static() + self.tmpdir = tempfile.TemporaryDirectory(prefix="test_") self._gen_model() def _gen_model(self): - image = paddle.static.data( - name='image', shape=[None, 1, 28, 28], dtype='float32') - label = paddle.static.data(name='label', shape=[None, 1], dtype='int64') - model = MobileNet() - out = model.net(input=image, class_dim=10) - cost = paddle.nn.functional.loss.cross_entropy(input=out, label=label) - avg_cost = paddle.mean(x=cost) - acc_top1 = paddle.metric.accuracy(input=out, label=label, k=1) - acc_top5 = paddle.metric.accuracy(input=out, label=label, k=5) - optimizer = paddle.optimizer.Momentum( - momentum=0.9, - learning_rate=0.01, - weight_decay=paddle.regularizer.L2Decay(4e-5)) - optimizer.minimize(avg_cost) - main_prog = paddle.static.default_main_program() - val_prog = main_prog.clone(for_test=True) place = paddle.CUDAPlace(0) if paddle.is_compiled_with_cuda( ) else paddle.CPUPlace() exe = paddle.static.Executor(place) - exe.run(paddle.static.default_startup_program()) + main_program = paddle.static.Program() + startup_program = paddle.static.Program() + with paddle.static.program_guard(main_program, startup_program): + image = paddle.static.data( + name='image', shape=[None, 1, 28, 28], dtype='float32') + label = paddle.static.data( + name='label', shape=[None, 1], dtype='int64') + model = MobileNetV2() + out = model.net(input=image, class_dim=10) + cost = paddle.nn.functional.loss.cross_entropy( + input=out, label=label) + avg_cost = paddle.mean(x=cost) + acc_top1 = paddle.metric.accuracy(input=out, label=label, k=1) + 
acc_top5 = paddle.metric.accuracy(input=out, label=label, k=5) + + val_program = main_program.clone(for_test=True) + optimizer = paddle.optimizer.Momentum( + momentum=0.9, + learning_rate=0.01, + weight_decay=paddle.regularizer.L2Decay(4e-5)) + optimizer.minimize(avg_cost) + exe.run(startup_program) def transform(x): return np.reshape(x, [1, 28, 28]) @@ -95,64 +102,66 @@ def train(program): 'train iter={}, avg loss {}, acc_top1 {}, acc_top5 {}'. format(iter, cost, top1, top5)) - train(main_prog) + train(main_program) paddle.fluid.io.save_inference_model( - dirname='./test_rounding_optimizer', - feeded_var_names=[image.name, label.name], - target_vars=[avg_cost, acc_top1, acc_top5], - main_program=val_prog, + dirname=self.tmpdir.name, + feeded_var_names=[image.name], + target_vars=[out], + main_program=val_program, executor=exe, - model_filename='model', - params_filename='params') - + model_filename='model.pdmodel', + params_filename='params.pdiparams') + print(f"saved infer model to [{self.tmpdir.name}]") self.data_loader = sample_generator_creator() - self._regions = [['image', 'batch_norm_26.tmp_4']] - self._region_weights_names = [[ - 'conv1_weights', 'conv2_1_dw_weights', 'conv2_1_sep_weights', - 'conv2_2_dw_weights', 'conv2_2_sep_weights', 'conv3_1_dw_weights', - 'conv3_1_sep_weights', 'conv3_2_dw_weights', 'conv3_2_sep_weights', - 'conv4_1_dw_weights', 'conv4_1_sep_weights', 'conv4_2_dw_weights', - 'conv4_2_sep_weights', 'conv5_1_dw_weights', 'conv5_1_sep_weights', - 'conv5_2_dw_weights', 'conv5_2_sep_weights', 'conv5_3_dw_weights', - 'conv5_3_sep_weights', 'conv5_4_dw_weights', 'conv5_4_sep_weights', - 'conv5_5_dw_weights', 'conv5_5_sep_weights', 'conv5_6_dw_weights', - 'conv5_6_sep_weights', 'conv6_dw_weights', 'conv6_sep_weights' - ]] - - def test_qdrop(self): + def __del__(self): + self.tmpdir.cleanup() + + +class TestReconRegion(ReconPTQ): + def __init__(self, *args, **kwargs): + super(TestReconRegion, self).__init__(*args, **kwargs) + + def test_qdrop_region(self): place = paddle.CUDAPlace(0) if paddle.is_compiled_with_cuda( ) else paddle.CPUPlace() exe = paddle.static.Executor(place) quant_recon_static( exe, - './test_rounding_optimizer', - quantize_model_path='rsq_out', + self.tmpdir.name, + quantize_model_path='output_region', sample_generator=self.data_loader, - model_filename='model', - params_filename='params', - batch_nums=10, + model_filename='model.pdmodel', + params_filename='params.pdiparams', + batch_nums=1, + epochs=1, algo='abs_max', - regions=self._regions, - region_weights_names=self._region_weights_names, + regions=None, + region_weights_names=None, recon_level='region-wise', simulate_activation_quant=True) - def test_qdrop(self): + +class TestReconLayer(ReconPTQ): + def __init__(self, *args, **kwargs): + super(TestReconLayer, self).__init__(*args, **kwargs) + + def test_qdrop_layer(self): place = paddle.CUDAPlace(0) if paddle.is_compiled_with_cuda( ) else paddle.CPUPlace() exe = paddle.static.Executor(place) quant_recon_static( exe, - './test_rounding_optimizer', - quantize_model_path='rsq_out', + self.tmpdir.name, + quantize_model_path='output_layer', sample_generator=self.data_loader, - model_filename='model', - params_filename='params', - batch_nums=10, + model_filename='model.pdmodel', + params_filename='params.pdiparams', + batch_nums=1, + epochs=1, algo='KL', - regions=self._regions, - region_weights_names=self._region_weights_names, + regions=None, + region_weights_names=None, recon_level='layer-wise', simulate_activation_quant=True, 
bias_correction=True) diff --git a/tests/test_sensitivity.py b/tests/test_sensitivity.py index 94857ee19..9b89e8266 100644 --- a/tests/test_sensitivity.py +++ b/tests/test_sensitivity.py @@ -61,10 +61,10 @@ def eval_func(program): print("acc_val_mean: {}".format(acc_val_mean)) return acc_val_mean - def eval_func_for_args(args): - program = args[0] - feeder = fluid.DataFeeder( - feed_list=['image', 'label'], place=place, program=program) + def eval_func_for_args(program, feed_list): + feeder = paddle.fluid.DataFeeder( + feed_list=feed_list, place=place, program=program) + acc_set = [] for data in val_reader(): acc_np = exe.run(program=program, @@ -93,7 +93,7 @@ def eval_func_for_args(args): eval_program, place, ["conv4_weights"], eval_func_for_args, - eval_args=[eval_program], + eval_args=[['image', 'label']], sensitivities_file="./sensitivites_file_params", pruned_ratios=[0.1, 0.2, 0.3, 0.4]) diff --git a/tests/test_skd_loss.py b/tests/test_skd_loss.py new file mode 100644 index 000000000..19a07b345 --- /dev/null +++ b/tests/test_skd_loss.py @@ -0,0 +1,81 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License" +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import sys +sys.path.append("../") +import unittest +import paddle +from paddleslim.dist import merge, skd +from layers import conv_bn_layer +from static_case import StaticCase + + +class TestSKDLoss(StaticCase): + def test_skd_loss(self): + place = paddle.CPUPlace() + exe = paddle.static.Executor(place) + + student_program = paddle.static.Program() + student_startup = paddle.static.Program() + with paddle.static.program_guard(student_program, student_startup): + with paddle.utils.unique_name.guard(): + input = paddle.static.data( + name="image", shape=[None, 3, 224, 224]) + conv1 = conv_bn_layer(input, 8, 3, "conv1") + conv2 = conv_bn_layer(conv1, 8, 3, "conv2") + student_predict = conv1 + conv2 + + teacher_program = paddle.static.Program() + teacher_startup = paddle.static.Program() + with paddle.static.program_guard(teacher_program, teacher_startup): + with paddle.utils.unique_name.guard(): + input = paddle.static.data( + name="image", shape=[None, 3, 224, 224]) + conv1 = conv_bn_layer(input, 8, 3, "conv1") + conv2 = conv_bn_layer(conv1, 8, 3, "conv2") + sum1 = conv1 + conv2 + conv3 = conv_bn_layer(sum1, 8, 3, "conv3") + conv4 = conv_bn_layer(conv3, 8, 3, "conv4") + sum2 = conv4 + sum1 + conv5 = conv_bn_layer(sum2, 8, 3, "conv5") + teacher_predict = conv_bn_layer(conv5, 8, 3, "conv6") + + exe.run(teacher_startup) + exe.run(student_startup) + + data_name_map = {'image': 'image'} + merge(teacher_program, student_program, data_name_map, place) + merged_ops = [] + for block in student_program.blocks: + for op in block.ops: + merged_ops.append(op.type) + with paddle.static.program_guard(student_program, student_startup): + distill_loss = skd('teacher_' + teacher_predict.name, + student_predict.name, + program=None, + multiplier=None) + + loss_ops = [] + for block in student_program.blocks: + for op in block.ops: + 
loss_ops.append(op.type) + print(f"ret: {set(loss_ops).difference(set(merged_ops))}") + self.assertTrue(set(merged_ops).difference(set(loss_ops)) == set()) + + self.assertTrue({ + 'softmax_with_cross_entropy', 'softmax', 'reduce_mean', 'layer_norm' + }.issubset(set(loss_ops).difference(set(merged_ops)))) + + +if __name__ == '__main__': + unittest.main() diff --git a/tests/test_soft_label_loss.py b/tests/test_soft_label_loss.py index 64544aa67..2e0bf8c8b 100644 --- a/tests/test_soft_label_loss.py +++ b/tests/test_soft_label_loss.py @@ -54,9 +54,12 @@ def test_soft_label_loss(self): for block in paddle.static.default_main_program().blocks: for op in block.ops: loss_ops.append(op.type) + print(f"ret: {set(loss_ops).difference(set(merged_ops))}") self.assertTrue(set(merged_ops).difference(set(loss_ops)) == set()) - self.assertTrue({'cross_entropy', 'softmax', 'reduce_mean'}.issubset( - set(loss_ops).difference(set(merged_ops)))) + + self.assertTrue({ + 'softmax_with_cross_entropy', 'softmax', 'reduce_mean' + }.issubset(set(loss_ops).difference(set(merged_ops)))) if __name__ == '__main__':
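Taken together, the `reconstruction_quantization.py` changes mean region-wise reconstruction no longer requires hand-written `regions`/`region_weights_names`: leaving both as `None` triggers the automatic `RegionBuilder` division, bounded by `limit`. A minimal usage sketch along the lines of the updated tests (the model paths and the random-data calibration reader are placeholders):

```python
import numpy as np
import paddle
from paddleslim.quant import quant_recon_static

paddle.enable_static()
place = paddle.CUDAPlace(0) if paddle.is_compiled_with_cuda() else paddle.CPUPlace()
exe = paddle.static.Executor(place)


def my_sample_generator():  # placeholder calibration reader; shape is an assumption
    for _ in range(10):
        yield [np.random.random((1, 3, 224, 224)).astype('float32')]


quant_recon_static(
    exe,
    './infer_model_dir',                   # placeholder: fp32 inference model dir
    quantize_model_path='recon_output',
    sample_generator=my_sample_generator,
    model_filename='model.pdmodel',
    params_filename='params.pdiparams',
    batch_nums=10,
    algo='abs_max',
    recon_level='region-wise',
    regions=None,                          # None => RegionBuilder divides the graph
    region_weights_names=None,
    simulate_activation_quant=True,
    limit=6)                               # cap on the depth of each auto-built region
```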