In this article, we introduce an approach called coupled filters decomposition, which builds on the key observation that redundancy exists among the filters of a convolutional layer: similar filters can produce partially overlapping outputs. Leveraging this insight, we propose a joint decomposition of filters using coupled tensor decompositions, specifically the coupled canonical polyadic decomposition (CPD), which shares a common factor matrix across similar filters. This joint factorization not only reduces the number of parameters but also lowers computational complexity by eliminating redundant computations. To further improve efficiency, we first cluster the filters before decomposition, using a custom metric based on the subspace spanned by the shared-mode factor, so that the coupling constraint is less restrictive within each group. Extensive experiments across various architectures, datasets, and tasks validate the effectiveness of our method, demonstrating competitive performance compared to state-of-the-art model compression techniques.
We highlight a key observation that drives our approach: within a convolutional layer, redundancy exists among the filters, as noted in various CNN compression studies, particularly in similarity-based filter pruning methods. Since all filters extract information from a common input, partially similar filters may produce partially similar output features. To enhance computational efficiency, the redundant computation of these similar parts should be avoided.
For example, in the top half of Figure 1, the filters \(\tens{W}_1\) and \(\tens{W}_3\) exhibit partial similarity, so their output feature maps \(\tens{O}_1\) and \(\tens{O}_3\) share a similar component (shown in blue) that is computed twice. To avoid this duplicated computation and improve efficiency, such filters can be jointly decomposed.
Building on these insights, we introduce the concept of coupled filters decomposition. In this scheme, multiple filters are jointly approximated using coupled tensor decompositions. To demonstrate the use of this method, we employ coupled CPD as a representative example due to its simplicity and efficiency, although our approach can be adapted to other decomposition techniques. Specifically, instead of decomposing each filter individually, we propose jointly factorizing them along a specific mode. After decomposition, the jointly decomposed filters share a common factor matrix in the selected mode while retaining their unique factor matrices in other modes.
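To make the coupling concrete, here is a minimal NumPy sketch of an alternating-least-squares (ALS) routine for a coupled CPD of two 3-way filters with a shared mode-1 factor. It is an illustration under simplifying assumptions, not the paper's exact algorithm: the helper names (`coupled_cpd`, `khatri_rao`, `unfold`), the fixed iteration count, and the plain least-squares updates are all ours.

```python
# Illustrative coupled CPD-ALS: W1 and W3 of shape (C, H, K) are jointly
# factorized so that they share the mode-1 factor A, while each keeps its
# own factors (B_i, C_i) in the remaining modes.
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product: column r equals kron(U[:, r], V[:, r])."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def unfold(T, mode):
    """Mode-n unfolding consistent with X_(n) = F_n @ khatri_rao(others).T."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def coupled_cpd(W1, W3, rank, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    C, H, K = W1.shape
    A = rng.standard_normal((C, rank))            # shared mode-1 factor
    B1, C1 = rng.standard_normal((H, rank)), rng.standard_normal((K, rank))
    B3, C3 = rng.standard_normal((H, rank)), rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Shared factor: one least-squares fit over the concatenated
        # mode-1 unfoldings of both filters (this is the coupling).
        Z = np.vstack([khatri_rao(B1, C1), khatri_rao(B3, C3)])
        X = np.hstack([unfold(W1, 0), unfold(W3, 0)])
        A = np.linalg.lstsq(Z, X.T, rcond=None)[0].T
        # Private factors: ordinary CP-ALS updates, per filter.
        B1 = np.linalg.lstsq(khatri_rao(A, C1), unfold(W1, 1).T, rcond=None)[0].T
        C1 = np.linalg.lstsq(khatri_rao(A, B1), unfold(W1, 2).T, rcond=None)[0].T
        B3 = np.linalg.lstsq(khatri_rao(A, C3), unfold(W3, 1).T, rcond=None)[0].T
        C3 = np.linalg.lstsq(khatri_rao(A, B3), unfold(W3, 2).T, rcond=None)[0].T
    return A, (B1, C1), (B3, C3)
```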
This approach reflects the idea that filters possess both common and particular characteristics. For instance, in the bottom half of Figure 1, the two similar filters \(\tens{W}_1\) and \(\tens{W}_3\) are jointly factorized along the first mode, yielding a common factor matrix \(\matr{A}^{(1)}\) and distinct factor matrices in the other modes, namely \(\matr{B}_1, \matr{C}_1\) and \(\matr{B}_3, \matr{C}_3\). Notably, since these filters share the same input tensor, the computation between the common factor and the input needs to be performed only once, as sketched below.
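The following PyTorch sketch shows how this compute sharing could look at inference time (a simplified illustration, not the repository's implementation; the function `coupled_group_conv`, the factor shapes, and the omission of padding, stride, and bias are our assumptions). The shared 1×1 projection with \(\matr{A}^{(1)}\) is applied once per group, after which each filter only pays for its private separable stages and a rank-sum.

```python
# Sketch: the projection onto the R coupled components is computed a single
# time and reused by every filter of the group.
import torch
import torch.nn.functional as F

def coupled_group_conv(x, A, factors):
    """x: (N, C, H, W) input; A: (C, R) shared mode-1 factor;
    factors: list of (B_i, C_i) with B_i: (Kh, R) and C_i: (Kw, R).
    Returns one output map per filter in the group ('valid' convolution)."""
    R = A.shape[1]
    # Shared 1x1 projection -- computed once for the whole group.
    y = F.conv2d(x, A.t().reshape(R, -1, 1, 1))               # (N, R, H, W)
    outs = []
    for B, C in factors:
        # Per-filter separable stages, depthwise over the R components.
        z = F.conv2d(y, C.t().reshape(R, 1, 1, -1), groups=R)  # along width
        z = F.conv2d(z, B.t().reshape(R, 1, -1, 1), groups=R)  # along height
        outs.append(z.sum(dim=1, keepdim=True))                # sum over rank
    return torch.cat(outs, dim=1)
```

With this factorized form, each additional filter in the group adds only its two small depthwise stages; the cost of the shared projection is amortized over the whole group, which is exactly the saving the blue component in Figure 1 illustrates.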
The central idea of CoDeC is to jointly factorize groups of similar filters along a specific mode, rather than decomposing each filter independently. To enable effective joint decomposition, the framework groups filters by their similarity in the chosen mode, ensuring that filters within the same group are more similar in the joint-mode subspace than filters in different groups. To achieve this, CoDeC adopts a two-stage process, as illustrated in Figure 2: the filters of each layer are first clustered with the subspace-based metric, and each resulting group is then jointly factorized via coupled CPD.
This scheme is applied simultaneously to all convolutional layers of the original model; a toy sketch of the grouping step is given below.
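The exact similarity metric is defined in the paper; as an illustrative stand-in, the sketch below measures the chordal distance between the leading mode-1 singular subspaces of the filters and clusters them hierarchically. The helper `group_filters`, the subspace dimension `k`, and the average-linkage choice are assumptions for the example.

```python
# Toy grouping step: filters whose leading mode-1 subspaces are close
# (small principal angles) end up in the same group.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def group_filters(filters, k=4, n_groups=8):
    """filters: array (n, C, H, W). Requires k <= min(C, H * W)."""
    bases = []
    for W in filters:
        # Orthonormal basis of the leading k-dim mode-1 subspace, shape (C, k).
        U, _, _ = np.linalg.svd(W.reshape(W.shape[0], -1), full_matrices=False)
        bases.append(U[:, :k])
    n = len(bases)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Cosines of the principal angles = singular values of U_i^T U_j.
            s = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            D[i, j] = D[j, i] = np.sqrt(max(k - np.sum(s ** 2), 0.0))
    Z = linkage(squareform(D), method="average")
    return fcluster(Z, n_groups, criterion="maxclust")  # group label per filter
```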
To demonstrate the adaptability of CoDeC, we evaluate four architecture families: VGG-16, ResNet-20/32/56/110 with residual blocks, DenseNet-40 with dense blocks, and SqueezeNet with fire modules. These models are tested on the CIFAR-10/100 datasets. Additionally, to validate the scalability of CoDeC, experiments are conducted on the ImageNet dataset using ResNet-18/34/50/152 architectures. Furthermore, the compressed ResNet-50 model is employed as the backbone network for Faster R-CNN, Mask R-CNN, and Keypoint R-CNN on the COCO-2017 dataset. We compare CoDeC with more than 50 related works, as detailed in the paper; Table 1 presents the ResNet-50 results on ImageNet for clarity. Our method consistently surpasses the other approaches across all compression levels in terms of both accuracy and complexity reduction.
| Method | Type | Top-1 (%) | Top-5 (%) | MACs (↓%) | Params (↓%) |
| --- | --- | --- | --- | --- | --- |
| ResNet-50 (CVPR'16) | Baseline | 76.15 | 92.87 | 4.12G (0) | 25.56M (0) |
| RR-Tu2 (TNNLS'25) | Tucker Decomposition | 76.10 | 92.97 | 2.64G (36) | 17.00M (33) |
| Lee et al. (TNNLS'24) | Pruning + NAS + Knowledge Distillation | 76.23 | 92.87 | 2.48G (39) | 21.56M (15) |
| CoDeC (Ours) | Coupled Canonical Polyadic Decomposition | 76.74 | 93.43 | 2.25G (45) | 14.32M (44) |
| HSC (TPAMI'25) | Pruning | 75.46 | 92.40 | 1.57G (62) | N/A |
| BFP (Neurocomputing'25) | Pruning + Knowledge Distillation | 75.47 | 92.47 | 1.68G (59) | 13.48M (47) |
| CEPD (TNNLS'25) | Tensor Train Decomposition + Pruning | 75.82 | 92.84 | 1.53G (63) | 9.38M (63) |
| LRPET (TNNLS'25) | Singular Value Decomposition | 75.91 | 92.79 | 1.90G (54) | 12.89M (50) |
| CoDeC (Ours) | Coupled Canonical Polyadic Decomposition | 75.96 | 92.91 | 1.42G (66) | 8.81M (66) |
If the code or the paper helps your research, please cite:
```bibtex
@article{pham2025coupled,
  title={Coupled Tensor Decomposition for Compact Network Representation},
  author={Pham, Van Tien and Zniyed, Yassine and Nguyen, Thanh Phuong},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2025},
  pages={1--15},
  doi={10.1109/TNNLS.2025.3609797}
}
```
This work was granted access to the high-performance computing resources of IDRIS under the allocation 2023-103147 made by GENCI. Specifically, our experiments were conducted on the Jean Zay supercomputer, located at IDRIS, the national computing centre of the French National Centre for Scientific Research (CNRS).
We thank the Agence Nationale de la Recherche (ANR) for partially supporting our work through the ANR ASTRID ROV-Chasseur project (ANR-21-ASRO-0003).