[Lightweight] Three Classic Lightweight Networks Explained
1. Introduction
Common approaches to model compression include quantization, distillation, lightweight network architectures, and network pruning (sparsification); a detailed introduction can be found in an earlier article. I have recently been studying HENet, a lightweight network structure provided by Horizon, and this post covers it alongside MobileNetV3 and EfficientNet, which I wrote up a few years ago. Lightweight networks aim to reduce model parameters and computation while maintaining high accuracy; because they lower device power consumption and improve real-time performance, they are widely used in resource-constrained settings such as embedded devices.
2. Classic Lightweight Network Structures
2.1 MobileNetV3
MobileNetV3 builds on MobileNetV2's inverted residual blocks and introduces the SE (Squeeze-and-Excitation) module. The SE module works like an attention mechanism: through global average pooling and two fully connected layers, it computes a weight coefficient for each channel and adaptively recalibrates the features. A sketch of the SE module follows.
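As an illustration, here is a minimal PyTorch sketch of such an SE block. The class name and reduction ratio are assumptions for the example; note that MobileNetV3's own SE variant uses ReLU and a hard-sigmoid gate rather than the plain sigmoid shown here.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # squeeze: global average pooling collapses the spatial dims;
    # excitation: two FC layers produce a per-channel weight in (0, 1)
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))          # squeeze: (B, C)
        w = torch.relu(self.fc1(w))     # excitation, step 1
        w = torch.sigmoid(self.fc2(w))  # excitation, step 2
        return x * w.view(b, c, 1, 1)   # recalibrate each channel

x = torch.randn(2, 16, 32, 32)
print(SEBlock(16)(x).shape)  # torch.Size([2, 16, 32, 32])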
In addition, MobileNetV3 swaps its activation functions for hardswish and ReLU; hardswish is fast to compute and friendly to quantization, and the final 1x1 dimension-reducing projection layer uses a linear activation, improving overall computational efficiency and quantization friendliness. For a detailed code walkthrough, see the earlier article.
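Hardswish replaces the sigmoid inside swish with a piecewise-linear approximation built from ReLU6, which is exactly why it is cheap and quantization-friendly. A quick equivalence check:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(5)
manual = x * F.relu6(x + 3.0) / 6.0     # hardswish(x) = x * ReLU6(x + 3) / 6
builtin = nn.Hardswish()(x)             # PyTorch's built-in equivalent
print(torch.allclose(manual, builtin))  # True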
2.2 EfficientNet
1. Overview: EfficientNet uses NAS (neural architecture search) to jointly consider input resolution, network depth, and network width, balancing the three to build an efficient network. By adjusting the width and depth coefficients to change the channel counts and layer counts, it yields multiple variants from EfficientNet-B0 to B7: B0 is the baseline, and B1-B7 progressively increase complexity and performance on top of it.
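For reference, the compound scaling rule from the EfficientNet paper ties all three factors to a single coefficient φ: depth d = α^φ, width w = β^φ, resolution r = γ^φ, subject to α · β² · γ² ≈ 2 (so FLOPs grow roughly as 2^φ) with α, β, γ ≥ 1; the grid search on B0 found α = 1.2, β = 1.1, γ = 1.15.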
2. MBConv block: consists of a 1x1 standard convolution (expansion), a kxk depthwise convolution (3x3 or 5x5), an SE module, a 1x1 standard convolution (projection), and a Dropout layer. In the SE module, the first fully connected layer has 1/4 as many nodes as the block's input feature map has channels and uses the Swish activation; the second fully connected layer has as many nodes as the depthwise convolution's output channels and uses the Sigmoid activation.
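Putting those pieces together, here is a minimal PyTorch sketch of an MBConv block following the description above. The class name, the default expansion ratio of 4, and the use of plain Dropout2d (the paper actually uses stochastic depth on the residual path) are assumptions for illustration:

import torch
import torch.nn as nn

class MBConv(nn.Module):
    # 1x1 expand -> kxk depthwise -> SE -> 1x1 project (linear) -> dropout + residual
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1,
                 expand_ratio=4, drop_rate=0.2):
        super().__init__()
        mid = in_ch * expand_ratio
        # the residual shortcut only exists when the shape is preserved
        self.use_residual = stride == 1 and in_ch == out_ch
        self.expand = nn.Sequential(              # 1x1 conv, channel expansion
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),                            # Swish activation
        )
        self.dwconv = nn.Sequential(              # kxk depthwise conv
            nn.Conv2d(mid, mid, kernel_size, stride,
                      kernel_size // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),
        )
        se_ch = max(1, in_ch // 4)                # first SE FC: 1/4 of the input channels
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid, se_ch, 1),
            nn.SiLU(),
            nn.Conv2d(se_ch, mid, 1),             # second SE FC: back to depthwise output channels
            nn.Sigmoid(),
        )
        self.project = nn.Sequential(             # 1x1 conv, linear projection (no activation)
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.dropout = nn.Dropout2d(drop_rate)

    def forward(self, x):
        out = self.expand(x)
        out = self.dwconv(out)
        out = out * self.se(out)                  # channel-wise recalibration
        out = self.project(out)
        if self.use_residual:
            out = x + self.dropout(out)
        return out

x = torch.randn(1, 32, 56, 56)
print(MBConv(32, 32)(x).shape)  # torch.Size([1, 32, 56, 56])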
For a detailed code walkthrough, see the earlier article.
3. HENet: Horizon's Efficient Lightweight Network
The theory is already covered well elsewhere, so it is not repeated at length here; the focus below is on using the code.
HENet (Hybrid Efficient Network) is an efficient network designed for Horizon's Journey 6 (征程 6) series chips.
3.1 HENet_TinyM: Theory Overview
HENet_TinyM adopts a pure-CNN architecture divided into four stages, each performing one 2x downsampling. Through different parameter configurations such as depth, block_cls, and width, it builds an efficient feature-extraction network.
1. Basic blocks:
DWCB: the main branch uses a 3x3 depthwise convolution to fuse spatial information and two consecutive pointwise convolutions to fuse channel information; borrowing from the Transformer architecture, a learnable layer_scale is added to the residual branch, balancing performance and computation.
GroupDWCB: an improvement on DWCB that replaces the first pointwise convolution in the main branch with a grouped pointwise convolution. Under certain conditions this achieves a speedup with no accuracy loss (experiments showed that when ① the channel count is not too small and ② the layer is relatively shallow, GroupDWCB is accuracy-lossless while also being faster); it is used in TinyM's second stage (g = 2).
AltDWCB: a variant of DWCB whose depthwise kernels alternate between (1,5) and (5,1). Using it in the third stage improves performance; it suits stages with many layers.
2. Downsampling: S2DDown downsamples with a space-to-depth operation, exploiting the Journey 6 series chips' efficient support for tensor-layout operations to complete downsampling quickly while reshaping the spatial and channel dimensions of the feature map. (Use S2DDown with caution when designing your own networks; a sketch of the operation follows this list.)
3. Building your own effective basic blocks: when constructing a baseline, start with DWCB, then try the GroupDWCB/AltDWCB structures to improve performance.
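To make the space-to-depth idea concrete, here is a minimal sketch of an S2DDown-style block, assuming the standard pixel-unshuffle semantics. The class name, the 1x1 projection, and patch_size=2 are assumptions for illustration; the real implementation ships in Horizon's OE docker.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpaceToDepthDown(nn.Module):
    # space-to-depth with patch_size=2 turns (B, C, H, W) into (B, 4C, H/2, W/2);
    # a 1x1 conv then maps the 4*in_dim channels to out_dim
    def __init__(self, in_dim, out_dim, patch_size=2):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Conv2d(in_dim * patch_size ** 2, out_dim, kernel_size=1)

    def forward(self, x):
        x = F.pixel_unshuffle(x, self.patch_size)  # pure layout change, no arithmetic
        return self.proj(x)

x = torch.randn(1, 64, 56, 56)
print(SpaceToDepthDown(64, 128)(x).shape)  # torch.Size([1, 128, 28, 28])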
3.2 Performance/Accuracy Comparison
Judging from the frame-rate and accuracy numbers, HENet_TinyM and HENet_TinyE perform well on the J6 series chips: compared with other classic lightweight networks, they deliver higher frame rates while maintaining accuracy, making them better suited to real-world applications.
3.3 HENet_TinyM Code Walkthrough
The HENet source code lives in the Horizon docker at: /usr/local/lib/python3.10/dist-packages/hat/models/backbones/henet.py
HENet_TinyM consists of four stages overall, and each stage performs one 2x downsampling. The overall structure configuration is as follows:
# ---------------------- TinyM ----------------------
depth = [4, 3, 8, 6]
block_cls = ["GroupDWCB", "GroupDWCB", "AltDWCB", "DWCB"]
width = [64, 128, 192, 384]
attention_block_num = [0, 0, 0, 0]
mlp_ratios, mlp_ratio_attn = [2, 2, 2, 3], 2
act_layer = ["nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"]
use_layer_scale = [True, True, True, True]
final_expand_channel, feature_mix_channel = 0, 1024
down_cls = ["S2DDown", "S2DDown", "S2DDown", "None"]
Parameter meanings:
depth: the number of blocks in each stage
block_cls: the basic block type used in each stage
width: the output channel count of the blocks in each stage
attention_block_num: the number of attention blocks in each stage, placed at the tail of the stage (not used in TinyM)
mlp_ratios: the channel expansion ratio of the MLP in each stage
act_layer: the activation function used in each stage
use_layer_scale: whether to apply a learnable scale to the residual branch
final_expand_channel: the amount of channel expansion before the pooling at the network tail; 0 means no expansion
feature_mix_channel: the amount of channel expansion before the classification head
down_cls: the downsampling type used for each stage
Code walkthrough:
from typing import Sequence, Tuple

import horizon_plugin_pytorch.nn as hnn
import torch
import torch.nn as nn
from horizon_plugin_pytorch.quantization import QuantStub
from torch.quantization import DeQuantStub

# The basic-module code can be found in Horizon's OE docker:
# /usr/local/lib/python3.10/dist-packages/hat/models/base_modules/basic_henet_module.py
from basic_henet_module import (
    BasicHENetStageBlock,  # HENet's basic stage block
    S2DDown,  # downsampling module
)
from basic_henet_module import ConvModule2d  # 2D convolution module


# subclass torch.nn.Module, the standard way to define a network
class HENet(nn.Module):
    """Module of HENet.

    Args:
        in_channels: The in_channels for the block.
        block_nums: Number of blocks in each stage.
        embed_dims: Output channels in each stage.
        attention_block_num: Number of attention blocks in each stage.
        mlp_ratios: Mlp expand ratios in each stage.
        mlp_ratio_attn: Mlp expand ratio in attention blocks.
        act_layer: activation layers type.
        use_layer_scale: Use a learnable scale factor in the residual branch.
        layer_scale_init_value: Init value of the learnable scale factor.
        num_classes: Number of classes for a Classifier.
        include_top: Whether to include output layer.
        flat_output: Whether to view the output tensor.
        extra_act: Use extra activation layers in each stage.
        final_expand_channel: Channel expansion before pooling.
        feature_mix_channel: Channel expansion is performed before head.
        block_cls: Basic block types in each stage.
        down_cls: Downsample block types in each stage.
        patch_embed: Stem conv style in the very beginning.
        stage_out_norm: Add a norm layer to stage outputs.
            Ignored if include_top is True.
    """

    def __init__(
        self,
        in_channels: int,  # number of input image channels (typically 3)
        block_nums: Tuple[int],  # number of basic blocks in each stage
        embed_dims: Tuple[int],  # feature channels in each stage
        attention_block_num: Tuple[int],  # number of attention blocks in each stage
        mlp_ratios: Tuple[int] = (2, 2, 2, 2),  # MLP expansion ratios
        mlp_ratio_attn: int = 2,
        act_layer: Tuple[str] = ("nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"),  # activation types
        use_layer_scale: Tuple[bool] = (True, True, True, True),
        layer_scale_init_value: float = 1e-5,
        num_classes: int = 1000,
        include_top: bool = True,  # whether to include the classification head (usually nn.Linear)
        flat_output: bool = True,
        extra_act: Tuple[bool] = (False, False, False, False),
        final_expand_channel: int = 0,
        feature_mix_channel: int = 0,
        block_cls: Tuple[str] = ("DWCB", "DWCB", "DWCB", "DWCB"),
        down_cls: Tuple[str] = ("S2DDown", "S2DDown", "S2DDown", "None"),
        patch_embed: str = "origin",  # stem style (conv embedding of the input image)
        stage_out_norm: bool = True,  # whether to add a BatchNorm to stage outputs; better left off
    ):
        super().__init__()

        self.final_expand_channel = final_expand_channel
        self.feature_mix_channel = feature_mix_channel
        self.stage_out_norm = stage_out_norm

        self.block_cls = block_cls

        self.include_top = include_top
        self.flat_output = flat_output

        if self.include_top:
            self.num_classes = num_classes

        # patch_embed converts the input image into features.
        # It contains two ConvModule2d layers, i.e. two 3x3 convolutions with
        # stride=2, which amounts to 4x downsampling of the input image.
        if patch_embed in ["origin"]:
            self.patch_embed = nn.Sequential(
                ConvModule2d(
                    in_channels,
                    embed_dims[0] // 2,
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    norm_layer=nn.BatchNorm2d(embed_dims[0] // 2),
                    act_layer=nn.ReLU(),
                ),
                ConvModule2d(
                    embed_dims[0] // 2,
                    embed_dims[0],
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    norm_layer=nn.BatchNorm2d(embed_dims[0]),
                    act_layer=nn.ReLU(),
                ),
            )

        stages = []  # holds one BasicHENetStageBlock per stage, each at a different channel width
        downsample_block = []  # holds the S2DDown blocks that downsample between stages
        for block_idx, block_num in enumerate(block_nums):
            stages.append(
                BasicHENetStageBlock(
                    in_dim=embed_dims[block_idx],
                    block_num=block_num,
                    attention_block_num=attention_block_num[block_idx],
                    mlp_ratio=mlp_ratios[block_idx],
                    mlp_ratio_attn=mlp_ratio_attn,
                    act_layer=act_layer[block_idx],
                    use_layer_scale=use_layer_scale[block_idx],
                    layer_scale_init_value=layer_scale_init_value,
                    extra_act=extra_act[block_idx],
                    block_cls=block_cls[block_idx],
                )
            )
            if block_idx < len(block_nums) - 1:
                assert eval(down_cls[block_idx]) in [S2DDown], down_cls[
                    block_idx
                ]
                downsample_block.append(
                    eval(down_cls[block_idx])(
                        patch_size=2,
                        in_dim=embed_dims[block_idx],
                        out_dim=embed_dims[block_idx + 1],
                    )
                )
        self.stages = nn.ModuleList(stages)
        self.downsample_block = nn.ModuleList(downsample_block)

        if final_expand_channel in [0, None]:
            self.final_expand_layer = nn.Identity()
            self.norm = nn.BatchNorm2d(embed_dims[-1])
            last_channels = embed_dims[-1]
        else:
            self.final_expand_layer = ConvModule2d(
                embed_dims[-1],
                final_expand_channel,
                kernel_size=1,
                bias=False,
                norm_layer=nn.BatchNorm2d(final_expand_channel),
                act_layer=eval(act_layer[-1])(),
            )
            last_channels = final_expand_channel

        if feature_mix_channel in [0, None]:
            self.feature_mix_layer = nn.Identity()
        else:
            self.feature_mix_layer = ConvModule2d(
                last_channels,
                feature_mix_channel,
                kernel_size=1,
                bias=False,
                norm_layer=nn.BatchNorm2d(feature_mix_channel),
                act_layer=eval(act_layer[-1])(),
            )
            last_channels = feature_mix_channel

        # classification head
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # reduce the feature map to 1x1
            self.head = (
                nn.Linear(last_channels, num_classes)
                if num_classes > 0
                else nn.Identity()
            )
        else:
            stage_norm = []
            for embed_dim in embed_dims:
                if self.stage_out_norm is True:
                    stage_norm.append(nn.BatchNorm2d(embed_dim))
                else:
                    stage_norm.append(nn.Identity())
            self.stage_norm = nn.ModuleList(stage_norm)
            self.up = hnn.Interpolate(
                scale_factor=2, mode="bilinear", recompute_scale_factor=True
            )

        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        if isinstance(x, Sequence) and len(x) == 1:
            x = x[0]
        # pass the feature map through patch_embed, the stages, and the downsample blocks in turn
        x = self.patch_embed(x)
        outs = []
        for idx in range(len(self.stages)):
            x = self.stages[idx](x)
            if not self.include_top:
                x_normed = self.stage_norm[idx](x)
                if idx == 0:
                    outs.append(self.up(x_normed))
                outs.append(x_normed)
            if idx < len(self.stages) - 1:
                x = self.downsample_block[idx](x)
        if not self.include_top:
            return outs
        if self.final_expand_channel in [0, None]:
            x = self.norm(x)
        else:
            x = self.final_expand_layer(x)
        x = self.avgpool(x)
        x = self.feature_mix_layer(x)
        x = self.head(torch.flatten(x, 1))
        x = self.dequant(x)
        if self.flat_output:
            x = x.view(-1, self.num_classes)
        return x


# ---------------------- TinyM ----------------------
depth = [4, 3, 8, 6]
block_cls = ["GroupDWCB", "GroupDWCB", "AltDWCB", "DWCB"]
width = [64, 128, 192, 384]
attention_block_num = [0, 0, 0, 0]
mlp_ratios, mlp_ratio_attn = [2, 2, 2, 3], 2
act_layer = ["nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"]
use_layer_scale = [True, True, True, True]
extra_act = [False, False, False, False]
final_expand_channel, feature_mix_channel = 0, 1024
down_cls = ["S2DDown", "S2DDown", "S2DDown", "None"]
patch_embed = "origin"
stage_out_norm = False

# instantiate the HENet model
model = HENet(
    in_channels=3,  # assume RGB input
    block_nums=tuple(depth),
    embed_dims=tuple(width),
    attention_block_num=tuple(attention_block_num),
    mlp_ratios=tuple(mlp_ratios),
    mlp_ratio_attn=mlp_ratio_attn,
    act_layer=tuple(act_layer),
    use_layer_scale=tuple(use_layer_scale),
    extra_act=tuple(extra_act),
    final_expand_channel=final_expand_channel,
    feature_mix_channel=feature_mix_channel,
    block_cls=tuple(block_cls),
    down_cls=tuple(down_cls),
    patch_embed=patch_embed,
    stage_out_norm=stage_out_norm,
    num_classes=1000,  # assume ImageNet-style 1000-class classification
    include_top=True,
)

# ---------------------- single-frame input ----------------------
# generate a random image tensor; assume a 224x224 RGB input
input_tensor = torch.randn(1, 3, 224, 224)  # [batch, channels, height, width]

# ---------------------- inference ----------------------
model.eval()
with torch.no_grad():  # disable gradient computation to speed up inference
    output = model(input_tensor)

# ---------------------- results ----------------------
print("Output shape:", output.shape)
print("Output type:", type(output))
print("Output length:", len(output))
print(output)
print("Predicted class index:", torch.argmax(output, dim=1).item())  # index of the highest score

# report FLOPs and parameter count
from thop import profile

flops, params = profile(model, inputs=(input_tensor,))
print(f"FLOPs: {flops / 1e6:.2f}M")   # in millions of operations (MFLOPs)
print(f"Params: {params / 1e6:.2f}M")  # in millions of parameters
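As a sanity check on the configuration above: for a 224x224 input, the stem's two stride-2 convolutions bring the feature map to 56x56 with 64 channels; the three S2DDown steps then yield 28x28x128, 14x14x192, and 7x7x384 entering stages 2-4; finally the head pools the 7x7x384 map to 1x1, expands it to 1024 channels via feature_mix_channel, and the Linear layer produces a [1, 1000] output tensor.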
4. Building a Network from Blocks
The following code can be used as a reference:
import torch
from torch import nn
from torch.quantization import DeQuantStub
from typing import Union, Tuple, Optional
from horizon_plugin_pytorch.nn.quantized import FloatFunctional as FF
from torch.nn.parameter import Parameter
from horizon_plugin_pytorch.quantization import QuantStub


class ChannelScale2d(nn.Module):
    """Linearly scale the output feature map of a Conv2d."""

    def __init__(self, num_features: int) -> None:
        super().__init__()
        self.num_features = num_features
        self.weight = Parameter(torch.ones(num_features))  # initialize the weights to 1
        self.weight_quant = QuantStub()

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return input * self.weight_quant(self.weight).reshape(
            self.num_features, 1, 1
        )


class ConvModule2d(nn.Module):
    """Standard 2D convolution block with optional normalization and activation layers."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: Union[int, Tuple[int, int]],
        stride: Union[int, Tuple[int, int]] = 1,
        padding: Union[int, Tuple[int, int]] = 0,
        dilation: Union[int, Tuple[int, int]] = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = "zeros",
        norm_layer: Optional[nn.Module] = None,
        act_layer: Optional[nn.Module] = None,
    ):
        super().__init__()
        layers = [
            nn.Conv2d(
                in_channels,
                out_channels,
                kernel_size,
                stride,
                padding,
                dilation,
                groups,
                bias,
                padding_mode,
            )
        ]
        if norm_layer:
            layers.append(norm_layer)
        if act_layer:
            layers.append(act_layer)
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


class GroupDWCB(nn.Module):
    """Grouped depthwise-separable convolution block."""

    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        kernel_size: int = 3,
        act_layer: str = "nn.ReLU",
        use_layer_scale: bool = True,
        extra_act: Optional[bool] = False,
    ):
        super().__init__()
        self.extra_act = eval(act_layer)() if extra_act else nn.Identity()
        group_width_dict = {
            64: 64,
            128: 64,
            192: 64,
            384: 64,
            256: 128,
            48: 48,
            96: 48,
        }
        group_width = group_width_dict.get(dim, 64)
        self.dwconv = ConvModule2d(
            dim,
            dim,
            kernel_size=kernel_size,
            padding=kernel_size // 2,
            groups=dim,
            norm_layer=nn.BatchNorm2d(dim),
        )
        self.pwconv1 = nn.Conv2d(
            dim, hidden_dim, kernel_size=1, groups=dim // group_width
        )
        self.act = eval(act_layer)()
        self.pwconv2 = nn.Conv2d(hidden_dim, dim, kernel_size=1)
        self.use_layer_scale = use_layer_scale
        if use_layer_scale:
            self.layer_scale = ChannelScale2d(dim)
        self.add = FF()

    def forward(self, x):
        input_x = x
        x = self.dwconv(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.use_layer_scale:
            x = self.add.add(input_x, self.layer_scale(x))
        else:
            x = self.add.add(input_x, x)
        x = self.extra_act(x)
        return x


class CustomModel(nn.Module):
    """Complete model."""

    def __init__(self, d_model=256, output_channels=2):
        super().__init__()
        self.encoder_layer = nn.Sequential(
            GroupDWCB(dim=d_model, hidden_dim=d_model, kernel_size=3, act_layer="nn.ReLU"),
            GroupDWCB(dim=d_model, hidden_dim=d_model, kernel_size=3, act_layer="nn.ReLU"),
        )
        self.out_layer = nn.Sequential(
            ConvModule2d(in_channels=d_model, out_channels=d_model, kernel_size=1),
            nn.BatchNorm2d(d_model),
            nn.ReLU(inplace=True),
            ConvModule2d(in_channels=d_model, out_channels=output_channels, kernel_size=1),
        )
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.encoder_layer(x)
        x = self.out_layer(x)
        x = self.dequant(x)
        return x


# =================== input parameters ===================
d_model = 64
output_channels = 10

model = CustomModel(d_model=d_model, output_channels=output_channels)

# generate input
input_tensor = torch.randn(1, 64, 300, 200)

# forward pass
output = model(input_tensor)
print("The shape of output is:", output.shape)

# report FLOPs and parameter count
from thop import profile

flops, params = profile(model, inputs=(input_tensor,))
print(f"FLOPs: {flops / 1e6:.2f}M")   # in millions of operations (MFLOPs)
print(f"Params: {params / 1e6:.2f}M")  # in millions of parameters
The output is as follows:
The shape of output is: torch.Size([1, 10, 300, 200])
FLOPs: 1382.40M
Params: 0.02M