[1] Yu J, Qin Z, Wan T. Feature integration analysis of bag of features model for image retrieval[J]. Neurocomputing, 2013, 120:355-364.
[2] LeCun Y, Boser B, Denker J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4):541-551.
[3] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30:5998-6008.
[4] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]. Online: Proceedings of the International Conference on Learning Representations, 2021:158-162.
[5] Touvron H, Cord M, Douze M, et al. Training data-efficient image transformers & distillation through attention[C]. Online: Proceedings of the International Conference on Machine Learning, 2021:10347-10357.
[6] Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[C]. Online: Proceedings of the European Conference on Computer Vision, 2020:213-229.
[7] Wang W H, Xie E Z, Li X, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C]. Montreal: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021:568-578.
[8] Graham B, El-Nouby A, Touvron H, et al. LeViT: A vision transformer in ConvNet's clothing for faster inference[C]. Montreal: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021:12259-12269.
[9] Mehta S, Rastegari M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer[EB/OL]. (2022-05-22)[2024-01-04]. https://arxiv.org/abs/2110.02178.
[10] Pan J, Bulat A, Tan F, et al. EdgeViTs: Competing light-weight CNNs on mobile devices with vision transformers[C]. Tel Aviv: Proceedings of the European Conference on Computer Vision, 2022:294-311.
[11] Chen X, Zhang S, Song D, et al. Transformer with bidirectional decoder for speech recognition[C]. Shanghai: Proceedings of the Annual Conference of the International Speech Communication Association, 2020:1-12.
[12] Guo J, Han K, Wu H, et al. CMT: Convolutional neural networks meet vision transformers[C]. New Orleans: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022:12175-12185.
[13] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:770-778.
[14] 孙红, 杨晨, 莫光萍. 基于通道特征金字塔的图像分割算法[J]. 电子科技, 2023, 36(12):1007-7082.
Sun Hong, Yang Chen, Mo Guangping. Research on image segmentation algorithm based on channel feature pyramid[J]. Electronic Science and Technology, 2023, 36(12):1007-7082.
[15] Liu Z, Lin Y, Cao Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows[C]. Montreal: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021:10012-10022.
[16] Hassani A, Walton S, Li J, et al. Neighborhood attention transformer[C]. Vancouver: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023:6185-6194.
[17] Lin H, Cheng X, Wu X, et al. CAT: Cross attention in vision transformer[C]. Taipei: IEEE International Conference on Multimedia and Expo, 2022:1-6.
[18] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]. San Francisco: Proceedings of the AAAI Conference on Artificial Intelligence, 2017:1-22.
[19] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images[R]. Toronto: University of Toronto, 2009.
[20] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database[C]. Miami: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009:248-255.
[21] Cubuk E D, Zoph B, Shlens J, et al. RandAugment: Practical automated data augmentation with a reduced search space[C]. Online: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020:702-703.
[22] Zhong Z, Zheng L, Kang G, et al. Random erasing data augmentation[C]. New York: Proceedings of the AAAI Conference on Artificial Intelligence, 2020:13001-13008.
[23] Yun S, Han D, Oh S J, et al. CutMix: Regularization strategy to train strong classifiers with localizable features[C]. Seoul: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019:6023-6032.
[24] Zhang Z, Zhang H, Zhao L, et al. Aggregating nested transformers[EB/OL]. (2021-10-30)[2024-01-12]. https://arxiv.org/abs/2105.12723.
[25] Radosavovic I, Kosaraju R P, Girshick R, et al. Designing network design spaces[C]. Online: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020:10428-10436.
[26] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:770-778.