[1] |
HERATH S, HARANDI M, PORIKLI F. Going Deeper into Action Recognition:A Survey[J]. Image and Vision Computing, 2017, 60:4-21.
doi: 10.1016/j.imavis.2017.01.010
|
[2] |
罗会兰, 童康, 孔繁胜. 基于深度学习的视频中人体动作识别进展综述[J]. 电子学报, 2019, 47(5):1162-1173.
|
|
LUO Huilan, TONG Kang, KONG Fansheng. The Progress of Human Action Recognition in Videos Based on Deep Learning:A Review[J]. Acta Electronica Sinica, 2019, 47(5):1162-1173.
|
[3] |
KRIZHEVSKYA, SUTSKEVER I, HINTON G, et al. Image Net Classification with Deep Convolutional Neural Networks[J]. Advances in Neural Information Processing Systems, 2012, 25(2):1097-1105.
|
[4] |
SIMONYAN K, ZISSERMAN A. Two-Stream Convolutional Networks for Action Recognition in Videos[C]// Advances in Neural Information Processing Systems. San Diego: NIPS, 2014:568-576.
|
[5] |
TRAND, BOURDEV L, FERGUS R, et al. Learning Spatiotemporal Features with 3D Convolutional Networks[C]// IEEE International Conference on Computer Vision.Piscataway:IEEE, 2015:4489-4497.
|
[6] |
WANGL M, XIONG Y J, WANG Z, et al. Temporal Segment Networks:Towards Good Practices for Deep Action Recognition[C]// European Conference on Computer Vision.Heidelberg:Springer, 2016:20-36.
|
[7] |
DONAHUE J, HENDRICKS L A, ROHRBACH M, et al. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4):677-691.
doi: 10.1109/TPAMI.2016.2599174
|
[8] |
XU K, BA J, KIROS R, et al. Show,Attend and Tell:Neural Image Caption Generation with Visual Attention[C]// In International Conference on Machine Learning. New York: ACM, 2015:2048-2057.
|
[9] |
成磊, 王玥, 田春娜. 一种添加残差注意力机制的视觉目标跟踪算法[J]. 西安电子科技大学学报, 2020, 47(6):148-157.
|
|
CHENG Lei, WANG Yue, TIAN Chunna. Residual Attention Mechanism for Visual Tracking[J]. Journal of Xidian University, 2020, 47(6):148-157.
|
[10] |
SHARMA S, KIROS R, SALAKHUTDINOV R. Action Recognition Using Visual Attention[C]// Neural Information Processing Systems Time Series Workshop. San Diego: NIPS, 2015:1-6.
|
[11] |
DU W, WANG Y, QIAO Y. RPAN:An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos[C]// IEEE International Conference on Computer Vision.Piscataway:IEEE, 2017:3745-3754.
|
[12] |
GE H W, YAN Z H, YU W H, et al. An Attention Mechanism Based Convolutional LSTM Network for Video Action Recognition[J]. Multimedia Tools and Applications, 2019, 78(14):1-24.
doi: 10.1007/s11042-018-6670-5
|
[13] |
TONG M, LI MY, BAI H, et al. DKD-DAD:A Novel Framework with Discriminative Kinematic Descriptor and Deep Attention-Pooled Descriptor for Action Recognition[J]. Neural Computing and Applications, 2020,5285-5302.
|
[14] |
WOO S, PARK J, LEE J Y, et al. CBAM:Convolutional Block Attention Module (2018)[J]. [2018-07-17]. https://arxiv.org/abs/1807.06521.
|
[15] |
LI Z, GAVRILYUK K, GAVVES E, et al. VideoLSTM Convolves,Attends and Flows for Action Recognition[J]. Computer Vision and Image Understanding. 2018, 166:41-50.
doi: 10.1016/j.cviu.2017.10.011
|
[16] |
王洁然. 基于高低层特征融合与卷积注意力机制的视频动作识别方法研究[D]. 武汉: 华中科技大学, 2019.
|
[17] |
HE K M, ZHANG X Y, RENS Q, et al. Deep Residual Learning for Image Recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition.Piscataway:IEEE, 2016:770-778.
|
[18] |
SHI X, CHEN Z, WANG H, et al. Convolutional LSTM Network:A Machine Learning Approach for Precipitation Nowcasting (2015)[J/OL]. [2015-06-13]. https://arxiv.org/abs/1506.04214.
|
[19] |
LIU J, LUO J, SHAH M, et al. Recognizing Realistic Actions From Videos “In the Wild”[C]// IEEE Conference on Computer Vision and Pattern Recognition.Piscataway:IEEE, 2009:1996-2003.
|
[20] |
SOOMRO K, ZAMIR A R, SHAH M. UCF101:A Dataset of 101 Human Action Classes From Videos in the Wild (2012)[J/OL]. [2012-12-03]. https://arxiv.org/abs/1212.0402.
|
[21] |
JHUANG H, GARROTE H, POGGIO E, et al. A Large Video Database for Human Motion Recognition[C]// Proceedings of IEEE International Conference on Computer Vision.Piscataway:IEEE, 2011:2556-2563.
|
[22] |
KINGMA D P, BA J. Adam:A Method for Stochastic Optimization (2014)[J]. [2014-12-22]. https://arxiv.org/abs/1412.6980.
|
[23] |
NG Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond Short Snippets:Deep Networks for Video Classification[C]// IEEE Conference on Computer Vision and Pattern Recognition.Piscataway:IEEE, 2015:4694-4702.
|
[24] |
VAROL G, LAPTEV I, SCHMID C. Long-Term Temporal Convolutions for Action Recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6):1510-1517.
doi: 10.1109/TPAMI.2017.2712608
|
[25] |
DIBA A, FAYYAZ M, SHARMAV, et al. Temporal 3D Convnets:New Architecture and Transfer Learning for Video Classification (2017)[J]. [2017-11-22]. https://arxiv.org/abs/1711.08200.
|
[26] |
李庆辉, 李艾华, 王涛. 结合有序光流图和双流卷积网络的行为识别[J]. 光学学报, 2018, 38(6):234-240.
|
|
LI Qinghui, LI Aihua, WANG Tao, et al. Double-Stream Convolution Networks with Sequential Optical Flow Image for Action Recognition[J]. Acta Optical Sinica, 2018, 38(6):234-240.
|