Lightweight Semantic Segmentation Method Fusing Pyramid Pooling and Attention Mechanisms

Fund Project:

Chongqing Key Project of Technology Innovation and Application Development (cstc2019jscx-mbdxX0061)

    Abstract:

    Semantic segmentation is widely used in computer vision tasks such as medical image segmentation, autonomous driving, and remote sensing image segmentation. However, current semantic segmentation methods usually require a large amount of computation and a large number of parameters, which makes them difficult to deploy on embedded platforms with limited computing power and storage. To address this problem, a lightweight semantic segmentation method is designed by jointly considering three aspects of the network: parameter count, computational cost, and performance. The model takes the lightweight network MobileNetV2 as its backbone, uses depthwise separable convolution to compress the model, and propagates features forward along two paths, one for high-level and one for low-level semantic features. To preserve network performance, the high-semantic path fuses pyramid pooling with a dual attention module to obtain accurate contextual information, while the low-semantic path uses multi-scale feature concatenation and an attention-like information transfer module, which passes on high-level semantic information, to obtain clear segmentation boundaries. Finally, the two paths are concatenated to produce the segmentation result. In experiments on the PASCAL VOC 2012 dataset, compared with mainstream network models, the proposed model has 2.31×10^6 parameters, only 4.9% of PSPNet's and 4.2% of DeeplabV3+'s. Its floating-point computation is 7.989 GFLOPs, only 6.7% of PSPNet's and 4.8% of DeeplabV3+'s. Its mean intersection over union is 73.75%, slightly lower than that of PSPNet and DeeplabV3+. The proposed model thus becomes lightweight while maintaining network performance, achieving a better balance between computational efficiency and segmentation accuracy.
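
    The abstract relies on two quantities that are easy to make concrete: the parameter savings from depthwise separable convolution, and the mean intersection over union (mIoU) used to report accuracy. The following is a minimal illustrative sketch, not the authors' implementation. The function names, the 256-channel layer, and the 2-class confusion matrix are hypothetical and chosen only for illustration; bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    Classes absent from both prediction and ground truth are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
    std = standard_conv_params(256, 256, 3)
    sep = depthwise_separable_params(256, 256, 3)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.4f}")

    # Hypothetical 2-class confusion matrix, for illustration only.
    print(f"mIoU: {mean_iou([[50, 10], [5, 35]]):.4f}")
```

    For a k x k kernel, the separable form needs a fraction 1/c_out + 1/k^2 of the standard convolution's weights, which is why it is a common compression choice for lightweight backbones such as MobileNetV2.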

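Cite this article

廖恒锋, 魏延, 杜韩宇. Lightweight Semantic Segmentation Method Fusing Pyramid Pooling and Attention Mechanisms[J]. Journal of Chongqing Normal University (Natural Science), 2023, 40(6): 95-106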

History
  • Published online: 2024-02-27