Efficient tensor decomposition-based filter pruning

Abstract

In this paper, we present CORING (effiCient tensOr decomposition-based filteR prunING), a novel filter pruning methodology for neural networks. Unlike conventional approaches, which rely on vectorized or matricized filter representations, CORING operates on filters as tensors. It uses tensor decompositions, specifically the higher-order singular value decomposition (HOSVD), which preserves the multidimensional nature of filters while providing a low-rank approximation, thus substantially reducing complexity. We also introduce a versatile method for calculating filter similarity from the low-rank approximation offered by the HOSVD. This removes the need for full filters or reshaped versions and improves both the efficiency and the effectiveness of our approach. Extensive experiments across diverse architectures and datasets, spanning vision tasks that include image classification, object detection, instance segmentation, and keypoint detection, validate CORING. It outperforms state-of-the-art methods in reducing MACs and parameters while consistently improving validation accuracy. We complement these quantitative results with a comprehensive ablation study that provides substantial evidence of the efficiency of our tensor-based approach. Finally, qualitative results illustrate CORING's ability to retain essential features within pruned neural networks.
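To make the idea in the abstract concrete, here is a minimal NumPy sketch of a truncated HOSVD applied to a single convolutional filter, followed by one plausible way to compare two filters through their low-rank factors. It is an illustration under stated assumptions, not the implementation used in the paper: the function names (`unfold`, `hosvd_factors`, `core_tensor`, `reconstruct`, `filter_similarity`), the filter shape, and the choice of absolute cosine similarity between leading singular vectors are all hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_factors(tensor, rank=1):
    """Leading `rank` left singular vectors of each mode unfolding."""
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # slicing keeps all columns if rank is larger
    return factors


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (shape: new_dim x old_dim)."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def core_tensor(tensor, factors):
    """Project the tensor onto the factor subspaces to obtain the core."""
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core


def reconstruct(core, factors):
    """Rebuild the (low-rank) tensor from its core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


def filter_similarity(f1, f2):
    """Assumed similarity: mean absolute cosine between the leading singular
    vectors of each mode. The absolute value removes the SVD sign ambiguity."""
    sims = []
    for u1, u2 in zip(hosvd_factors(f1, 1), hosvd_factors(f2, 1)):
        sims.append(abs(float(u1[:, 0] @ u2[:, 0])))  # both vectors have unit norm
    return float(np.mean(sims))


# Hypothetical filter of shape (input channels, kernel height, kernel width).
rng = np.random.default_rng(0)
f = rng.standard_normal((4, 3, 3))
g = rng.standard_normal((4, 3, 3))

rank1 = hosvd_factors(f, rank=1)
approx = reconstruct(core_tensor(f, rank1), rank1)  # rank-(1, 1, 1) approximation of f
score = filter_similarity(f, g)                     # value in [0, 1]
```

In this sketch the similarity only needs one short vector per mode rather than the full filter or a reshaped copy, which is the kind of saving the abstract refers to. A pruning criterion could then rank filters by how similar they are to the others in the same layer, although the exact criterion used by CORING is defined in the paper.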

🔥 News

15.05.2024: 🎉 Accepted! Our paper made it into Neural Networks, 216 days after submission!

🚀 Throughput acceleration

To evaluate CORING's effectiveness in downstream tasks, we used our compressed ResNet-50 (pruned on ImageNet) as the backbone for training Faster/Mask/Keypoint-RCNN on COCO. Our method shows promising precision and recall while reaching higher compression levels than other approaches. CORING also significantly improves inference throughput, yielding over a \(2 \times\) improvement in frames per second (FPS) compared to the baseline models. For instance, Mask-RCNN's end-to-end latency drops from 100 ms to 42 ms, achieving a real-time frame rate of 21 FPS. These measurements were taken on an RTX 3060 GPU, which supports the real-world applicability of our approach. The results highlight CORING's potential for improving neural network efficiency in demanding tasks such as real-world object detection, instance segmentation, and keypoint detection.

Figure 2: Baseline (left) vs Compressed (right) model inference.
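For readers who want to reproduce this kind of latency comparison, the following standard-library sketch shows one common way to time a model call and derive a speedup. It is not the benchmarking script behind the numbers above: `measure_latency_ms`, `speedup`, the warm-up and run counts, and the dummy workload are all assumptions made for illustration.

```python
import time


def measure_latency_ms(fn, warmup=3, runs=20):
    """Average wall-clock time of one call to `fn`, in milliseconds.

    Note: when timing a PyTorch model on a GPU, `fn` should end with
    torch.cuda.synchronize(), because CUDA kernels run asynchronously.
    """
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs


def speedup(baseline_ms, compressed_ms):
    """How many times faster the compressed model is than the baseline."""
    return baseline_ms / compressed_ms


# Dummy workload standing in for a model forward pass (hypothetical).
latency = measure_latency_ms(lambda: sum(range(10_000)))

# Latencies quoted in the text for Mask-RCNN: 100 ms baseline, 42 ms compressed.
ratio = speedup(100.0, 42.0)  # about 2.38, consistent with "over 2x"
```

Averaging over several runs after a warm-up phase reduces the effect of one-off costs such as memory allocation and kernel compilation, which would otherwise inflate the first measurements.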

🔖 Citation

If the code and paper help your research, please kindly cite:

        
          @article{pham2024efficient,
            title={Efficient tensor decomposition-based filter pruning},
            author={Pham, Van Tien and Zniyed, Yassine and Nguyen, Thanh Phuong},
            journal={Neural Networks},
            volume={178},
            pages={106393},
            year={2024},
            doi={10.1016/j.neunet.2024.106393}
          }

👍 Acknowledgements

This work was granted access to the high-performance computing resources of IDRIS under the allocation 2023-103147 made by GENCI. Specifically, our experiments were conducted on the Jean Zay supercomputer, located at IDRIS, the national computing center of the French National Centre for Scientific Research (CNRS).

We thank the Agence Nationale de la Recherche (ANR) for partially supporting our work through the ANR ASTRID ROV-Chasseur project (ANR-21-ASRO-0003).