[Lightweight] Three Classic Lightweight Networks Explained
1. Introduction
Common approaches to model compression include quantization, distillation, lightweight network architectures, and network pruning (sparsification); a detailed introduction can be found in an earlier article. I have recently been studying HENet, a lightweight network structure provided by Horizon, and this post covers it alongside MobileNetV3 and EfficientNet, which I wrote up a few years ago. Lightweight networks aim to reduce model parameters and computation while maintaining high accuracy; because they lower device power consumption and improve real-time performance, they are widely used in resource-constrained settings such as embedded devices.
2. Classic Lightweight Network Structures
2.1 MobileNetV3
MobileNetV3 builds on MobileNetV2's inverted residual blocks and introduces the SE (Squeeze-and-Excitation) module. The SE module works like an attention mechanism: through global average pooling and two fully connected layers, it computes a weight coefficient for each channel and adaptively recalibrates the features. A sketch of the SE module follows.
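As an illustration, here is a minimal PyTorch sketch of such an SE block. The class name and reduction ratio are assumptions for the example; note that MobileNetV3's own SE variant uses ReLU and a hard-sigmoid gate rather than the plain sigmoid shown here.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # squeeze: global average pooling collapses the spatial dims;
    # excitation: two FC layers produce a per-channel weight in (0, 1)
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))          # squeeze: (B, C)
        w = torch.relu(self.fc1(w))     # excitation, step 1
        w = torch.sigmoid(self.fc2(w))  # excitation, step 2
        return x * w.view(b, c, 1, 1)   # recalibrate each channel

x = torch.randn(2, 16, 32, 32)
print(SEBlock(16)(x).shape)  # torch.Size([2, 16, 32, 32])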
In addition, MobileNetV3 swaps its activation functions for hardswish and ReLU; hardswish is fast to compute and friendly to quantization, and the final 1x1 dimension-reducing projection layer uses a linear activation, improving overall computational efficiency and quantization friendliness. For a detailed code walkthrough, see the earlier article.
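Hardswish replaces the sigmoid inside swish with a piecewise-linear approximation built from ReLU6, which is exactly why it is cheap and quantization-friendly. A quick equivalence check:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(5)
manual = x * F.relu6(x + 3.0) / 6.0     # hardswish(x) = x * ReLU6(x + 3) / 6
builtin = nn.Hardswish()(x)             # PyTorch's built-in equivalent
print(torch.allclose(manual, builtin))  # True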
2.2 EfficientNet
1. Overview: EfficientNet uses NAS (neural architecture search) to jointly consider input resolution, network depth, and network width, balancing the three to build an efficient network. By adjusting the width and depth coefficients to change the channel counts and layer counts, it yields multiple variants from EfficientNet-B0 to B7: B0 is the baseline, and B1-B7 progressively increase complexity and performance on top of it.
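For reference, the compound scaling rule from the EfficientNet paper ties all three factors to a single coefficient φ: depth d = α^φ, width w = β^φ, resolution r = γ^φ, subject to α · β² · γ² ≈ 2 (so FLOPs grow roughly as 2^φ) with α, β, γ ≥ 1; the grid search on B0 found α = 1.2, β = 1.1, γ = 1.15.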
2. MBConv block: consists of a 1x1 standard convolution (expansion), a kxk depthwise convolution (3x3 or 5x5), an SE module, a 1x1 standard convolution (projection), and a Dropout layer. In the SE module, the first fully connected layer has 1/4 as many nodes as the block's input feature map has channels and uses the Swish activation; the second fully connected layer has as many nodes as the depthwise convolution's output channels and uses the Sigmoid activation.
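Putting those pieces together, here is a minimal PyTorch sketch of an MBConv block following the description above. The class name, the default expansion ratio of 4, and the use of plain Dropout2d (the paper actually uses stochastic depth on the residual path) are assumptions for illustration:

import torch
import torch.nn as nn

class MBConv(nn.Module):
    # 1x1 expand -> kxk depthwise -> SE -> 1x1 project (linear) -> dropout + residual
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1,
                 expand_ratio=4, drop_rate=0.2):
        super().__init__()
        mid = in_ch * expand_ratio
        # the residual shortcut only exists when the shape is preserved
        self.use_residual = stride == 1 and in_ch == out_ch
        self.expand = nn.Sequential(              # 1x1 conv, channel expansion
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),                            # Swish activation
        )
        self.dwconv = nn.Sequential(              # kxk depthwise conv
            nn.Conv2d(mid, mid, kernel_size, stride,
                      kernel_size // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),
        )
        se_ch = max(1, in_ch // 4)                # first SE FC: 1/4 of the input channels
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid, se_ch, 1),
            nn.SiLU(),
            nn.Conv2d(se_ch, mid, 1),             # second SE FC: back to depthwise output channels
            nn.Sigmoid(),
        )
        self.project = nn.Sequential(             # 1x1 conv, linear projection (no activation)
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.dropout = nn.Dropout2d(drop_rate)

    def forward(self, x):
        out = self.expand(x)
        out = self.dwconv(out)
        out = out * self.se(out)                  # channel-wise recalibration
        out = self.project(out)
        if self.use_residual:
            out = x + self.dropout(out)
        return out

x = torch.randn(1, 32, 56, 56)
print(MBConv(32, 32)(x).shape)  # torch.Size([1, 32, 56, 56])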
For a detailed code walkthrough, see the earlier article.
3. HENet: Horizon's Efficient Lightweight Network
The theory is already covered well elsewhere, so it is not repeated at length here; the focus below is on using the code.
HENet (Hybrid Efficient Network) is an efficient network designed for Horizon's Journey 6 (征程 6) series chips.
3.1 HENet_TinyM: Theory Overview
HENet_TinyM adopts a pure-CNN architecture divided into four stages, each performing one 2x downsampling. Through different parameter configurations such as depth, block_cls, and width, it builds an efficient feature-extraction network.
1. Basic blocks:
DWCB: the main branch uses a 3x3 depthwise convolution to fuse spatial information and two consecutive pointwise convolutions to fuse channel information; borrowing from the Transformer architecture, a learnable layer_scale is added to the residual branch, balancing performance and computation.
GroupDWCB: an improvement on DWCB that replaces the first pointwise convolution in the main branch with a grouped pointwise convolution. Under certain conditions this achieves a speedup with no accuracy loss (experiments showed that when ① the channel count is not too small and ② the layer is relatively shallow, GroupDWCB is accuracy-lossless while also being faster); it is used in TinyM's second stage (g = 2).
AltDWCB: a variant of DWCB whose depthwise kernels alternate between (1,5) and (5,1). Using it in the third stage improves performance; it suits stages with many layers.
2. Downsampling: S2DDown downsamples with a space-to-depth operation, exploiting the Journey 6 series chips' efficient support for tensor-layout operations to complete downsampling quickly while reshaping the spatial and channel dimensions of the feature map. (Use S2DDown with caution when designing your own networks; a sketch of the operation follows this list.)
3. Building your own effective basic blocks: when constructing a baseline, start with DWCB, then try the GroupDWCB/AltDWCB structures to improve performance.
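To make the space-to-depth idea concrete, here is a minimal sketch of an S2DDown-style block, assuming the standard pixel-unshuffle semantics. The class name, the 1x1 projection, and patch_size=2 are assumptions for illustration; the real implementation ships in Horizon's OE docker.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpaceToDepthDown(nn.Module):
    # space-to-depth with patch_size=2 turns (B, C, H, W) into (B, 4C, H/2, W/2);
    # a 1x1 conv then maps the 4*in_dim channels to out_dim
    def __init__(self, in_dim, out_dim, patch_size=2):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Conv2d(in_dim * patch_size ** 2, out_dim, kernel_size=1)

    def forward(self, x):
        x = F.pixel_unshuffle(x, self.patch_size)  # pure layout change, no arithmetic
        return self.proj(x)

x = torch.randn(1, 64, 56, 56)
print(SpaceToDepthDown(64, 128)(x).shape)  # torch.Size([1, 128, 28, 28])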
3.2 Performance/Accuracy Comparison
Judging from the frame-rate and accuracy numbers, HENet_TinyM and HENet_TinyE perform well on the J6 series chips: compared with other classic lightweight networks, they deliver higher frame rates while maintaining accuracy, making them better suited to real-world applications.
3.3 HENet_TinyM Code Walkthrough
The HENet source code lives in the Horizon docker at: /usr/local/lib/python3.10/dist-packages/hat/models/backbones/henet.py
HENet_TinyM consists of four stages overall, and each stage performs one 2x downsampling. The overall structure configuration is as follows:
# ---------------------- TinyM ----------------------
depth = [4, 3, 8, 6]
block_cls = ["GroupDWCB", "GroupDWCB", "AltDWCB", "DWCB"]
width = [64, 128, 192, 384]
attention_block_num = [0, 0, 0, 0]
mlp_ratios, mlp_ratio_attn = [2, 2, 2, 3], 2
act_layer = ["nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"]
use_layer_scale = [True, True, True, True]
final_expand_channel, feature_mix_channel = 0, 1024
down_cls = ["S2DDown", "S2DDown", "S2DDown", "None"]
Parameter meanings:
depth: the number of blocks in each stage
block_cls: the basic block type used in each stage
width: the output channel count of the blocks in each stage
attention_block_num: the number of attention blocks in each stage, placed at the tail of the stage (not used in TinyM)
mlp_ratios: the channel expansion ratio of the MLP in each stage
act_layer: the activation function used in each stage
use_layer_scale: whether to apply a learnable scale to the residual branch
final_expand_channel: the amount of channel expansion before the pooling at the network tail; 0 means no expansion
feature_mix_channel: the amount of channel expansion before the classification head
down_cls: the downsampling type used for each stage
Code walkthrough:
from typing import Sequence, Tuple

import horizon_plugin_pytorch.nn as hnn
import torch
import torch.nn as nn
from horizon_plugin_pytorch.quantization import QuantStub
from torch.quantization import DeQuantStub

# The basic-module code can be found in Horizon's OE docker:
# /usr/local/lib/python3.10/dist-packages/hat/models/base_modules/basic_henet_module.py
from basic_henet_module import (
    BasicHENetStageBlock,  # HENet's basic stage block
    S2DDown,  # downsampling module
)
from basic_henet_module import ConvModule2d  # 2D convolution module


# subclass torch.nn.Module, the standard way to define a network
class HENet(nn.Module):
    """Module of HENet.

    Args:
        in_channels: The in_channels for the block.
        block_nums: Number of blocks in each stage.
        embed_dims: Output channels in each stage.
        attention_block_num: Number of attention blocks in each stage.
        mlp_ratios: Mlp expand ratios in each stage.
        mlp_ratio_attn: Mlp expand ratio in attention blocks.
        act_layer: activation layers type.
        use_layer_scale: Use a learnable scale factor in the residual branch.
        layer_scale_init_value: Init value of the learnable scale factor.
        num_classes: Number of classes for a Classifier.
        include_top: Whether to include output layer.
        flat_output: Whether to view the output tensor.
        extra_act: Use extra activation layers in each stage.
        final_expand_channel: Channel expansion before pooling.
        feature_mix_channel: Channel expansion is performed before head.
        block_cls: Basic block types in each stage.
        down_cls: Downsample block types in each stage.
        patch_embed: Stem conv style in the very beginning.
        stage_out_norm: Add a norm layer to stage outputs.
            Ignored if include_top is True.
    """

    def __init__(
        self,
        in_channels: int,  # number of input image channels (typically 3)
        block_nums: Tuple[int],  # number of basic blocks in each stage
        embed_dims: Tuple[int],  # feature channels in each stage
        attention_block_num: Tuple[int],  # number of attention blocks in each stage
        mlp_ratios: Tuple[int] = (2, 2, 2, 2),  # MLP expansion ratios
        mlp_ratio_attn: int = 2,
        act_layer: Tuple[str] = ("nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"),  # activation types
        use_layer_scale: Tuple[bool] = (True, True, True, True),
        layer_scale_init_value: float = 1e-5,
        num_classes: int = 1000,
        include_top: bool = True,  # whether to include the classification head (usually nn.Linear)
        flat_output: bool = True,
        extra_act: Tuple[bool] = (False, False, False, False),
        final_expand_channel: int = 0,
        feature_mix_channel: int = 0,
        block_cls: Tuple[str] = ("DWCB", "DWCB", "DWCB", "DWCB"),
        down_cls: Tuple[str] = ("S2DDown", "S2DDown", "S2DDown", "None"),
        patch_embed: str = "origin",  # stem style (conv embedding of the input image)
        stage_out_norm: bool = True,  # whether to add a BatchNorm to stage outputs; better left off
    ):
        super().__init__()

        self.final_expand_channel = final_expand_channel
        self.feature_mix_channel = feature_mix_channel
        self.stage_out_norm = stage_out_norm

        self.block_cls = block_cls

        self.include_top = include_top
        self.flat_output = flat_output

        if self.include_top:
            self.num_classes = num_classes

        # patch_embed converts the input image into features.
        # It contains two ConvModule2d layers, i.e. two 3x3 convolutions with
        # stride=2, which amounts to 4x downsampling of the input image.
        if patch_embed in ["origin"]:
            self.patch_embed = nn.Sequential(
                ConvModule2d(
                    in_channels,
                    embed_dims[0] // 2,
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    norm_layer=nn.BatchNorm2d(embed_dims[0] // 2),
                    act_layer=nn.ReLU(),
                ),
                ConvModule2d(
                    embed_dims[0] // 2,
                    embed_dims[0],
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    norm_layer=nn.BatchNorm2d(embed_dims[0]),
                    act_layer=nn.ReLU(),
                ),
            )

        stages = []  # holds one BasicHENetStageBlock per stage, each at a different channel width
        downsample_block = []  # holds the S2DDown blocks that downsample between stages
        for block_idx, block_num in enumerate(block_nums):
            stages.append(
                BasicHENetStageBlock(
                    in_dim=embed_dims[block_idx],
                    block_num=block_num,
                    attention_block_num=attention_block_num[block_idx],
                    mlp_ratio=mlp_ratios[block_idx],
                    mlp_ratio_attn=mlp_ratio_attn,
                    act_layer=act_layer[block_idx],
                    use_layer_scale=use_layer_scale[block_idx],
                    layer_scale_init_value=layer_scale_init_value,
                    extra_act=extra_act[block_idx],
                    block_cls=block_cls[block_idx],
                )
            )
            if block_idx < len(block_nums) - 1:
                assert eval(down_cls[block_idx]) in [S2DDown], down_cls[
                    block_idx
                ]
                downsample_block.append(
                    eval(down_cls[block_idx])(
                        patch_size=2,
                        in_dim=embed_dims[block_idx],
                        out_dim=embed_dims[block_idx + 1],
                    )
                )
        self.stages = nn.ModuleList(stages)
        self.downsample_block = nn.ModuleList(downsample_block)

        if final_expand_channel in [0, None]:
            self.final_expand_layer = nn.Identity()
            self.norm = nn.BatchNorm2d(embed_dims[-1])
            last_channels = embed_dims[-1]
        else:
            self.final_expand_layer = ConvModule2d(
                embed_dims[-1],
                final_expand_channel,
                kernel_size=1,
                bias=False,
                norm_layer=nn.BatchNorm2d(final_expand_channel),
                act_layer=eval(act_layer[-1])(),
            )
            last_channels = final_expand_channel

        if feature_mix_channel in [0, None]:
            self.feature_mix_layer = nn.Identity()
        else:
            self.feature_mix_layer = ConvModule2d(
                last_channels,
                feature_mix_channel,
                kernel_size=1,
                bias=False,
                norm_layer=nn.BatchNorm2d(feature_mix_channel),
                act_layer=eval(act_layer[-1])(),
            )
            last_channels = feature_mix_channel

        # classification head
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # reduce the feature map to 1x1
            self.head = (
                nn.Linear(last_channels, num_classes)
                if num_classes > 0
                else nn.Identity()
            )
        else:
            stage_norm = []
            for embed_dim in embed_dims:
                if self.stage_out_norm is True:
                    stage_norm.append(nn.BatchNorm2d(embed_dim))
                else:
                    stage_norm.append(nn.Identity())
            self.stage_norm = nn.ModuleList(stage_norm)
            self.up = hnn.Interpolate(
                scale_factor=2, mode="bilinear", recompute_scale_factor=True
            )

        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        if isinstance(x, Sequence) and len(x) == 1:
            x = x[0]
        # pass the feature map through patch_embed, the stages, and the downsample blocks in turn
        x = self.patch_embed(x)
        outs = []
        for idx in range(len(self.stages)):
            x = self.stages[idx](x)
            if not self.include_top:
                x_normed = self.stage_norm[idx](x)
                if idx == 0:
                    outs.append(self.up(x_normed))
                outs.append(x_normed)
            if idx < len(self.stages) - 1:
                x = self.downsample_block[idx](x)
        if not self.include_top:
            return outs
        if self.final_expand_channel in [0, None]:
            x = self.norm(x)
        else:
            x = self.final_expand_layer(x)
        x = self.avgpool(x)
        x = self.feature_mix_layer(x)
        x = self.head(torch.flatten(x, 1))
        x = self.dequant(x)
        if self.flat_output:
            x = x.view(-1, self.num_classes)
        return x


# ---------------------- TinyM ----------------------
depth = [4, 3, 8, 6]
block_cls = ["GroupDWCB", "GroupDWCB", "AltDWCB", "DWCB"]
width = [64, 128, 192, 384]
attention_block_num = [0, 0, 0, 0]
mlp_ratios, mlp_ratio_attn = [2, 2, 2, 3], 2
act_layer = ["nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"]
use_layer_scale = [True, True, True, True]
extra_act = [False, False, False, False]
final_expand_channel, feature_mix_channel = 0, 1024
down_cls = ["S2DDown", "S2DDown", "S2DDown", "None"]
patch_embed = "origin"
stage_out_norm = False

# instantiate the HENet model
model = HENet(
    in_channels=3,  # assume RGB input
    block_nums=tuple(depth),
    embed_dims=tuple(width),
    attention_block_num=tuple(attention_block_num),
    mlp_ratios=tuple(mlp_ratios),
    mlp_ratio_attn=mlp_ratio_attn,
    act_layer=tuple(act_layer),
    use_layer_scale=tuple(use_layer_scale),
    extra_act=tuple(extra_act),
    final_expand_channel=final_expand_channel,
    feature_mix_channel=feature_mix_channel,
    block_cls=tuple(block_cls),
    down_cls=tuple(down_cls),
    patch_embed=patch_embed,
    stage_out_norm=stage_out_norm,
    num_classes=1000,  # assume ImageNet-style 1000-class classification
    include_top=True,
)

# ---------------------- single-frame input ----------------------
# generate a random image tensor; assume a 224x224 RGB input
input_tensor = torch.randn(1, 3, 224, 224)  # [batch, channels, height, width]

# ---------------------- inference ----------------------
model.eval()
with torch.no_grad():  # disable gradient computation to speed up inference
    output = model(input_tensor)

# ---------------------- results ----------------------
print("Output shape:", output.shape)
print("Output type:", type(output))
print("Output length:", len(output))
print(output)
print("Predicted class index:", torch.argmax(output, dim=1).item())  # index of the highest score

# report FLOPs and parameter count
from thop import profile

flops, params = profile(model, inputs=(input_tensor,))
print(f"FLOPs: {flops / 1e6:.2f}M")   # in millions of operations (MFLOPs)
print(f"Params: {params / 1e6:.2f}M")  # in millions of parameters
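As a sanity check on the configuration above: for a 224x224 input, the stem's two stride-2 convolutions bring the feature map to 56x56 with 64 channels; the three S2DDown steps then yield 28x28x128, 14x14x192, and 7x7x384 entering stages 2-4; finally the head pools the 7x7x384 map to 1x1, expands it to 1024 channels via feature_mix_channel, and the Linear layer produces a [1, 1000] output tensor.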
4. Building a Network from Blocks
The following code can be used as a reference:
import torch
from torch import nn
from torch.quantization import DeQuantStub
from typing import Union, Tuple, Optional
from horizon_plugin_pytorch.nn.quantized import FloatFunctional as FF
from torch.nn.parameter import Parameter
from horizon_plugin_pytorch.quantization import QuantStub


class ChannelScale2d(nn.Module):
    """Linearly scale the output feature map of a Conv2d."""

    def __init__(self, num_features: int) -> None:
        super().__init__()
        self.num_features = num_features
        self.weight = Parameter(torch.ones(num_features))  # initialize the weights to 1
        self.weight_quant = QuantStub()

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return input * self.weight_quant(self.weight).reshape(
            self.num_features, 1, 1
        )


class ConvModule2d(nn.Module):
    """Standard 2D convolution block with optional normalization and activation layers."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: Union[int, Tuple[int, int]],
        stride: Union[int, Tuple[int, int]] = 1,
        padding: Union[int, Tuple[int, int]] = 0,
        dilation: Union[int, Tuple[int, int]] = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = "zeros",
        norm_layer: Optional[nn.Module] = None,
        act_layer: Optional[nn.Module] = None,
    ):
        super().__init__()
        layers = [
            nn.Conv2d(
                in_channels,
                out_channels,
                kernel_size,
                stride,
                padding,
                dilation,
                groups,
                bias,
                padding_mode,
            )
        ]
        if norm_layer:
            layers.append(norm_layer)
        if act_layer:
            layers.append(act_layer)
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


class GroupDWCB(nn.Module):
    """Grouped depthwise-separable convolution block."""

    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        kernel_size: int = 3,
        act_layer: str = "nn.ReLU",
        use_layer_scale: bool = True,
        extra_act: Optional[bool] = False,
    ):
        super().__init__()
        self.extra_act = eval(act_layer)() if extra_act else nn.Identity()
        group_width_dict = {
            64: 64,
            128: 64,
            192: 64,
            384: 64,
            256: 128,
            48: 48,
            96: 48,
        }
        group_width = group_width_dict.get(dim, 64)
        self.dwconv = ConvModule2d(
            dim,
            dim,
            kernel_size=kernel_size,
            padding=kernel_size // 2,
            groups=dim,
            norm_layer=nn.BatchNorm2d(dim),
        )
        self.pwconv1 = nn.Conv2d(
            dim, hidden_dim, kernel_size=1, groups=dim // group_width
        )
        self.act = eval(act_layer)()
        self.pwconv2 = nn.Conv2d(hidden_dim, dim, kernel_size=1)
        self.use_layer_scale = use_layer_scale
        if use_layer_scale:
            self.layer_scale = ChannelScale2d(dim)
        self.add = FF()

    def forward(self, x):
        input_x = x
        x = self.dwconv(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.use_layer_scale:
            x = self.add.add(input_x, self.layer_scale(x))
        else:
            x = self.add.add(input_x, x)
        x = self.extra_act(x)
        return x


class CustomModel(nn.Module):
    """Complete model."""

    def __init__(self, d_model=256, output_channels=2):
        super().__init__()
        self.encoder_layer = nn.Sequential(
            GroupDWCB(dim=d_model, hidden_dim=d_model, kernel_size=3, act_layer="nn.ReLU"),
            GroupDWCB(dim=d_model, hidden_dim=d_model, kernel_size=3, act_layer="nn.ReLU"),
        )
        self.out_layer = nn.Sequential(
            ConvModule2d(in_channels=d_model, out_channels=d_model, kernel_size=1),
            nn.BatchNorm2d(d_model),
            nn.ReLU(inplace=True),
            ConvModule2d(in_channels=d_model, out_channels=output_channels, kernel_size=1),
        )
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.encoder_layer(x)
        x = self.out_layer(x)
        x = self.dequant(x)
        return x


# =================== input parameters ===================
d_model = 64
output_channels = 10

model = CustomModel(d_model=d_model, output_channels=output_channels)

# generate input
input_tensor = torch.randn(1, 64, 300, 200)

# forward pass
output = model(input_tensor)
print("The shape of output is:", output.shape)

# report FLOPs and parameter count
from thop import profile

flops, params = profile(model, inputs=(input_tensor,))
print(f"FLOPs: {flops / 1e6:.2f}M")   # in millions of operations (MFLOPs)
print(f"Params: {params / 1e6:.2f}M")  # in millions of parameters
The output is as follows:
The shape of output is: torch.Size([1, 10, 300, 200])
FLOPs: 1382.40M
Params: 0.02M