In this article, we introduce an approach called coupled filters decomposition, which builds on the key observation that redundancy exists among the filters of a convolutional layer: similar filters can produce partially overlapping outputs. Leveraging this insight, we propose a joint decomposition of filters using coupled tensor decompositions, specifically the coupled canonical polyadic decomposition (CPD), which shares a common factor matrix across similar filters. This joint factorization not only reduces the number of parameters but also lowers computational complexity by eliminating redundant computations. To further improve efficiency, we first cluster the filters before decomposition, using a custom metric based on the subspace spanned by the shared-mode factor, so that the coupling constraint is less restrictive within each group. Extensive experiments across various architectures, datasets, and tasks validate the effectiveness of our method, demonstrating competitive performance compared to state-of-the-art model compression techniques.
We highlight a key observation that drives our approach: within a convolutional layer, redundancy exists among the filters, as noted in various CNN compression studies, particularly in similarity-based filter pruning methods. Since all filters extract information from a common input, partially similar filters may produce partially similar output features. To enhance computational efficiency, the redundant computation of these similar parts should be avoided.
For example, in the top half of Figure 1, the filters \(\tens{W}_1\) and \(\tens{W}_3\) exhibit partial similarity, so their output feature maps \(\tens{O}_1\) and \(\tens{O}_3\) share a similar component (shown in blue) that is computed twice. To avoid this duplicated computation and improve efficiency, such filters can be jointly decomposed.
Building on these insights, we introduce the concept of coupled filters decomposition. In this scheme, multiple filters are jointly approximated using coupled tensor decompositions. To demonstrate the use of this method, we employ coupled CPD as a representative example due to its simplicity and efficiency, although our approach can be adapted to other decomposition techniques. Specifically, instead of decomposing each filter individually, we propose jointly factorizing them along a specific mode. After decomposition, the jointly decomposed filters share a common factor matrix in the selected mode while retaining their unique factor matrices in other modes.
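To make the coupling concrete, here is a minimal NumPy sketch of an alternating-least-squares (ALS) routine for a coupled CPD of two 3-way filters with a shared mode-1 factor. It is an illustration under simplifying assumptions, not the paper's exact algorithm: the helper names (`coupled_cpd`, `khatri_rao`, `unfold`), the fixed iteration count, and the plain least-squares updates are all ours.

```python
# Illustrative coupled CPD-ALS: W1 and W3 of shape (C, H, K) are jointly
# factorized so that they share the mode-1 factor A, while each keeps its
# own factors (B_i, C_i) in the remaining modes.
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product: column r equals kron(U[:, r], V[:, r])."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def unfold(T, mode):
    """Mode-n unfolding consistent with X_(n) = F_n @ khatri_rao(others).T."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def coupled_cpd(W1, W3, rank, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    C, H, K = W1.shape
    A = rng.standard_normal((C, rank))            # shared mode-1 factor
    B1, C1 = rng.standard_normal((H, rank)), rng.standard_normal((K, rank))
    B3, C3 = rng.standard_normal((H, rank)), rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Shared factor: one least-squares fit over the concatenated
        # mode-1 unfoldings of both filters (this is the coupling).
        Z = np.vstack([khatri_rao(B1, C1), khatri_rao(B3, C3)])
        X = np.hstack([unfold(W1, 0), unfold(W3, 0)])
        A = np.linalg.lstsq(Z, X.T, rcond=None)[0].T
        # Private factors: ordinary CP-ALS updates, per filter.
        B1 = np.linalg.lstsq(khatri_rao(A, C1), unfold(W1, 1).T, rcond=None)[0].T
        C1 = np.linalg.lstsq(khatri_rao(A, B1), unfold(W1, 2).T, rcond=None)[0].T
        B3 = np.linalg.lstsq(khatri_rao(A, C3), unfold(W3, 1).T, rcond=None)[0].T
        C3 = np.linalg.lstsq(khatri_rao(A, B3), unfold(W3, 2).T, rcond=None)[0].T
    return A, (B1, C1), (B3, C3)
```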
This approach reflects the idea that filters possess both common and particular characteristics. For instance, in the bottom half of Figure 1, the two similar filters \(\tens{W}_1\) and \(\tens{W}_3\) are jointly factorized along the first mode, yielding a common factor matrix \(\matr{A}^{(1)}\) and distinct factor matrices in the other modes, namely \(\matr{B}_1, \matr{C}_1\) and \(\matr{B}_3, \matr{C}_3\). Notably, since these filters share the same input tensor, the computation between the common factor and the input needs to be performed only once, as sketched below.
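The following PyTorch sketch shows how this compute sharing could look at inference time (a simplified illustration, not the repository's implementation; the function `coupled_group_conv`, the factor shapes, and the omission of padding, stride, and bias are our assumptions). The shared 1×1 projection with \(\matr{A}^{(1)}\) is applied once per group, after which each filter only pays for its private separable stages and a rank-sum.

```python
# Sketch: the projection onto the R coupled components is computed a single
# time and reused by every filter of the group.
import torch
import torch.nn.functional as F

def coupled_group_conv(x, A, factors):
    """x: (N, C, H, W) input; A: (C, R) shared mode-1 factor;
    factors: list of (B_i, C_i) with B_i: (Kh, R) and C_i: (Kw, R).
    Returns one output map per filter in the group ('valid' convolution)."""
    R = A.shape[1]
    # Shared 1x1 projection -- computed once for the whole group.
    y = F.conv2d(x, A.t().reshape(R, -1, 1, 1))               # (N, R, H, W)
    outs = []
    for B, C in factors:
        # Per-filter separable stages, depthwise over the R components.
        z = F.conv2d(y, C.t().reshape(R, 1, 1, -1), groups=R)  # along width
        z = F.conv2d(z, B.t().reshape(R, 1, -1, 1), groups=R)  # along height
        outs.append(z.sum(dim=1, keepdim=True))                # sum over rank
    return torch.cat(outs, dim=1)
```

With this factorized form, each additional filter in the group adds only its two small depthwise stages; the cost of the shared projection is amortized over the whole group, which is exactly the saving the blue component in Figure 1 illustrates.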
The central idea of CoDeC is to jointly factorize groups of similar filters along a specific mode, rather than decomposing each filter independently. To enable effective joint decomposition, the framework groups filters by their similarity in the chosen mode, ensuring that filters within the same group are more similar in the joint-mode subspace than filters in different groups. To achieve this, CoDeC adopts a two-stage process, as illustrated in Figure 2: the filters of each layer are first clustered with the subspace-based metric, and each resulting group is then jointly factorized via coupled CPD.
This scheme is applied simultaneously to all convolutional layers of the original model; a toy sketch of the grouping step is given below.
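The exact similarity metric is defined in the paper; as an illustrative stand-in, the sketch below measures the chordal distance between the leading mode-1 singular subspaces of the filters and clusters them hierarchically. The helper `group_filters`, the subspace dimension `k`, and the average-linkage choice are assumptions for the example.

```python
# Toy grouping step: filters whose leading mode-1 subspaces are close
# (small principal angles) end up in the same group.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def group_filters(filters, k=4, n_groups=8):
    """filters: array (n, C, H, W). Requires k <= min(C, H * W)."""
    bases = []
    for W in filters:
        # Orthonormal basis of the leading k-dim mode-1 subspace, shape (C, k).
        U, _, _ = np.linalg.svd(W.reshape(W.shape[0], -1), full_matrices=False)
        bases.append(U[:, :k])
    n = len(bases)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Cosines of the principal angles = singular values of U_i^T U_j.
            s = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            D[i, j] = D[j, i] = np.sqrt(max(k - np.sum(s ** 2), 0.0))
    Z = linkage(squareform(D), method="average")
    return fcluster(Z, n_groups, criterion="maxclust")  # group label per filter
```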
To demonstrate the adaptability of CoDeC, we evaluate four architecture families: VGG-16, ResNet-20/32/56/110 with residual blocks, DenseNet-40 with dense blocks, and SqueezeNet with fire modules. These models are tested on the CIFAR-10/100 datasets. Additionally, to validate the scalability of CoDeC, experiments are conducted on the ImageNet dataset using ResNet-18/34/50/152 architectures. Furthermore, the compressed ResNet-50 model is employed as the backbone network for Faster R-CNN, Mask R-CNN, and Keypoint R-CNN on the COCO-2017 dataset. We compare CoDeC with more than 50 related works, as detailed in the paper; Table 1 presents the ResNet-50 results on ImageNet for clarity. Our method consistently surpasses the other approaches across all compression levels in terms of both accuracy and complexity reduction.
| Method | Type | Top-1 (%) | Top-5 (%) | MACs (↓%) | Params (↓%) |
| --- | --- | --- | --- | --- | --- |
| ResNet-50 (CVPR'16) | Baseline | 76.15 | 92.87 | 4.12G (0) | 25.56M (0) |
| RR-Tu2 (TNNLS'25) | Tucker Decomposition | 76.10 | 92.97 | 2.64G (36) | 17.00M (33) |
| Lee et al. (TNNLS'24) | Pruning + NAS + Knowledge Distillation | 76.23 | 92.87 | 2.48G (39) | 21.56M (15) |
| CoDeC (Ours) | Coupled Canonical Polyadic Decomposition | 76.74 | 93.43 | 2.25G (45) | 14.32M (44) |
| HSC (TPAMI'25) | Pruning | 75.46 | 92.40 | 1.57G (62) | N/A |
| BFP (Neurocomputing'25) | Pruning + Knowledge Distillation | 75.47 | 92.47 | 1.68G (59) | 13.48M (47) |
| CEPD (TNNLS'25) | Tensor Train Decomposition + Pruning | 75.82 | 92.84 | 1.53G (63) | 9.38M (63) |
| LRPET (TNNLS'25) | Singular Value Decomposition | 75.91 | 92.79 | 1.90G (54) | 12.89M (50) |
| CoDeC (Ours) | Coupled Canonical Polyadic Decomposition | 75.96 | 92.91 | 1.42G (66) | 8.81M (66) |
If the code or the paper helps your research, please cite:
```bibtex
@article{pham2025coupled,
  title={Coupled Tensor Decomposition for Compact Network Representation},
  author={Pham, Van Tien and Zniyed, Yassine and Nguyen, Thanh Phuong},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2025},
  pages={1--15},
  doi={10.1109/TNNLS.2025.3609797}
}
```
This work was granted access to the high-performance computing resources of IDRIS under the allocation 2023-103147 made by GENCI. Specifically, our experiments were conducted on the Jean Zay supercomputer, located at IDRIS, the national computing centre of the French National Centre for Scientific Research (CNRS).
We thank the Agence Nationale de la Recherche (ANR) for partially supporting our work through the ANR ASTRID ROV-Chasseur project (ANR-21-ASRO-0003).