| [1] |
Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules[C]. Long Beach: Advances in Neural Information Processing Systems, 2017:3856-3866.
|
| [2] |
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]. San Diego: International Conference on Learning Representations, 2015:1-14.
|
| [3] |
He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:770-778.
|
| [4] |
Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017:4700-4708.
|
| [5] |
Zeng R, Song Y. A fast routing capsule network with improved dense blocks[J]. IEEE Transactions on Industrial Informatics, 2021, 18(7):4383-4392.
|
| [6] |
Zeng R, Zhu J L, Song Y, et al. Enhanced capsule networks with improved residual modules[C]. Beijing: IEEE International Conference on Unmanned Systems, 2021:39-44.
|
| [7] |
Rajasegaran J, Jayasundara V, Jayasekara S, et al. DeepCaps: Going deeper with capsule networks[C]. Long Beach: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019:10725-10733.
|
| [8] |
Hinton G E. How to represent part-whole hierarchies in a neural network[J]. Neural Computation, 2023, 35(3):413-452.
|
| [9] |
Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]. Online: International Conference on Learning Representations, 2021:1-21.
|
| [10] |
Garau N, Bisagno N, Sambugaro Z, et al. Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks[C]. New Orleans: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022:13689-13698.
|
| [11] |
Bromley J, Guyon I, LeCun Y, et al. Signature verification using a "Siamese" time delay neural network[C]. Denver: Advances in Neural Information Processing Systems, 1993:737-744.
|
| [12] |
Islam M A, Jia S, Bruce N D B. How much position information do convolutional neural networks encode?[C]. Online: International Conference on Learning Representations, 2020:1-11.
|
| [13] |
Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]. Lille: Proceedings of the Thirty-second International Conference on Machine Learning, 2015:2048-2057.
|
| [14] |
Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations[C]. Online: Proceedings of the Thirty-seventh International Conference on Machine Learning, 2020:1597-1607.
|
| [15] |
Hinton G E, Sabour S, Frosst N. Matrix capsules with EM routing[C]. Vancouver: International Conference on Learning Representations, 2018:1-15.
|
| [16] |
Tankovich V, Hane C, Zhang Y D, et al. HITNet: Hierarchical iterative tile refinement network for real-time stereo matching[C]. Online: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021:14362-14372.
|
| [17] |
Hahn T, Pyeon M, Kim G. Self-routing capsule networks[C]. Vancouver: Advances in Neural Information Processing Systems, 2019:7658-7667.
|
| [18] |
Su J L, Ahmed M, Lu Y, et al. RoFormer: Enhanced transformer with rotary position embedding[J]. Neurocomputing, 2024, 56(8):127063-127075.
|