
Table of Contents

    20 October 2021 Volume 48 Issue 5
      
    Editorial: Introduction to the special issue on visual perception and understanding for intelligent monitoring
    GAO Xinbo,WANG Nannan,LIANG Ronghua,ZHENG Weishi,XU Mai,LU Cewu,SONG Yizhe,HAN Jungong
    Journal of Xidian University. 2021, 48(5):  1-7.  doi:10.19665/j.issn1001-2400.2021.05.001
    Video super-resolution based on multi-scale 3D convolution
    ZHAN Keyu,SUN Yue,LI Ying
    Journal of Xidian University. 2021, 48(5):  8-14.  doi:10.19665/j.issn1001-2400.2021.05.002

    Video super-resolution aims to restore high-resolution videos from low-resolution videos, which can effectively improve the display effect of videos. Unlike single-image super-resolution, how to exploit the information between contiguous video frames is important for video super-resolution. In order to improve the performance of video super-resolution and make full use of the spatio-temporal information in video frames, a video super-resolution model based on multi-scale 3D convolution is proposed, which takes continuous video frames as the input and outputs the super-resolution reconstruction result of the intermediate frame. This model consists of three modules: multi-scale feature extraction, feature fusion and high-resolution reconstruction. First, multi-scale 3D convolution is used for preliminary feature extraction. Then, a 3D convolution residual structure is adopted in feature fusion, and the feature maps are split, which can not only fuse the features of different scales but also effectively reduce the number of network parameters. Finally, residual dense blocks and sub-pixel convolution are used for high-resolution reconstruction, and the reconstructed video frame is obtained by combining them with the global residual connection. Experimental results of 3× and 4× super-resolution on the Vid4 dataset show that, compared with other methods, the proposed method can effectively enhance the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) with a better visual effect.
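The sub-pixel convolution mentioned in this abstract performs its final upsampling through a pixel-shuffle rearrangement of channels into spatial positions. A minimal NumPy sketch of that rearrangement (illustrative only, not the paper's implementation; shapes and the scale factor are chosen for the example):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into a (C, H*r, W*r) map,
    as done by sub-pixel convolution for the final upsampling step."""
    c2, h, w = x.shape
    assert c2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c2 // (r * r)
    # split channels into (C, r, r), then interleave into the spatial dims
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# a 4-channel 2x2 map becomes a 1-channel 4x4 map at scale r=2
x = np.arange(16).reshape(4, 2, 2)
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4)
```

Each low-resolution position contributes an r×r block of high-resolution pixels, which is why the preceding convolution must output C·r² channels.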

    Lightweight image super-resolution with the adaptive weight learning network
    ZHANG Yuhao,CHENG Peitao,ZHANG Shuhao,WANG Xiumei
    Journal of Xidian University. 2021, 48(5):  15-22.  doi:10.19665/j.issn1001-2400.2021.05.003

    In recent years, single-image super-resolution (SISR) methods using deep convolutional neural networks (CNN) have achieved remarkable results. The Pixel Attention Network (PAN) is one of the most advanced lightweight super-resolution methods, which can deliver a good reconstruction performance with a very small number of parameters. But the PAN is limited by the parameters of each module, resulting in slow model training and strict training conditions. To address these problems, this paper proposes a Lightweight Adaptive Weight learning Network (LAWN) for image super-resolution. The network uses multiple adaptive weight modules to form a non-linear mapping network, with each module extracting a different level of feature information. In each adaptive weight module, the network employs an attention branch and a non-attention branch to extract the corresponding information, and then an adaptive weight fusion branch is employed to integrate the two. Splitting and fusing the two branches with a specific convolutional layer greatly reduces the number of parameters of the attention branch and the non-attention branch, which helps the network achieve a relative balance between the number of parameters and the performance. Quantitative evaluations on benchmark datasets demonstrate that the proposed LAWN reduces the number of model parameters and performs favorably against state-of-the-art methods in terms of both PSNR and SSIM. Experimental results show that this method can reconstruct more accurate texture details. Qualitative evaluations with better visual effects prove the effectiveness of the proposed method.

    Cloud removal method for the remote sensing image based on the GAN
    WANG Junjun,SUN Yue,LI Ying
    Journal of Xidian University. 2021, 48(5):  23-29.  doi:10.19665/j.issn1001-2400.2021.05.004

    Since remote sensing images are inevitably affected by the climate during acquisition, the obtained images may contain cloud information, which largely affects their subsequent use. Image cloud removal methods based on deep learning can remove clouds well, but they have problems such as a long training time, an insufficient cloud removal effect and color distortion. To solve these problems, a cloud removal method based on an end-to-end generative adversarial network (GAN) is proposed to recover clear images from remote sensing images containing clouds. First, the U-Net is used as the main structure of the generator, and a continuous memory residual module is added between the encoder module and the decoder module to mine the deep characteristics of the input information. Then, a convolutional neural network is adopted as the discriminator to distinguish authenticity. Finally, the loss function is designed by combining the adversarial loss with the absolute (L1) loss, measuring the quality of the model by calculating the gap between the output of the network model and the real data. Experimental results show that the proposed method is superior to existing cloud removal methods in both quantitative indexes (peak signal-to-noise ratio and structural similarity) and running time. Under the same number of parameters, the proposed method has the lowest computation cost (GFLOPs) and a lower algorithm complexity. Besides, the remote sensing images obtained by the proposed method have richer detailed information, almost no color distortion, and a better subjective visual effect.
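The loss described above combines an adversarial term with an absolute (L1) term. A toy sketch of such a combined generator objective; the non-saturating log form of the adversarial term and the trade-off weight `lam` are assumptions for illustration, not details given in the abstract:

```python
import numpy as np

def generator_loss(d_fake, fake, target, lam=100.0):
    """Combined objective: an adversarial term (non-saturating log loss on
    the discriminator's score for the generated image) plus a weighted
    absolute (L1) loss against the cloud-free ground truth."""
    adv = -np.mean(np.log(d_fake + 1e-12))   # push D's score on fakes toward 1
    l1 = np.mean(np.abs(fake - target))      # pixel-wise fidelity
    return adv + lam * l1

# perfect reconstruction with a fully fooled discriminator -> loss ~ 0
img = np.ones((8, 8))
loss = generator_loss(np.array([1.0]), img, img)
```

The large `lam` reflects the common practice of letting the pixel-wise term dominate so the adversarial term only refines texture.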

    Synthesis of the expression image and its application under the dimensional emotion model
    YANG Jingbo,ZHAO Qijun,LYU Zejun
    Journal of Xidian University. 2021, 48(5):  30-37.  doi:10.19665/j.issn1001-2400.2021.05.005

    In order to solve the problem that the training data of deep-learning-based facial expression recognition methods usually cover a limited part of the expression space and have an imbalanced distribution, we propose AV-GAN, a facial expression image synthesis method in the Arousal-Valence dimensional emotion space based on the generative adversarial network, to generate more diverse and balanced facial expression training data. The method uses a label distribution to represent the expression of the face image, and employs an identity control module, an expression control module and adversarial learning to realize the random sampling and generation of expression images in the Arousal-Valence space. Evaluations on the Oulu-CASIA database show that the facial expression recognition accuracy obtained by using the proposed method to augment the training data is 6.5% higher than that obtained with the original training data, which proves that the proposed method can effectively improve facial expression recognition accuracy under imbalanced training data.

    Fire segmentation based on the improved DeeplabV3+ and the analytical method for fire development
    NING Yang,DU Jianchao,HAN Shuo,YANG Chuankai
    Journal of Xidian University. 2021, 48(5):  38-46.  doi:10.19665/j.issn1001-2400.2021.05.006

    Fire detection and development analysis are significant for fire control. A fire segmentation method based on the improved DeeplabV3+ and an analytical method for fire development are proposed. First, low-level feature sources are added to the decoder of the DeeplabV3+ and fused with high-level features, and the image size is gradually recovered by 2× upsampling to retain more details and achieve more accurate fire segmentation. Then, the number of fire pixels obtained from each video frame is assembled into a fire series, and key points are used to segment and linearly fit the series to obtain the key trend of fire development. Experimental results show that the proposed method can effectively analyze the fire development situation on the basis of accurate fire segmentation, and provide effective help for fire detection and control.
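The trend-analysis step above (per-frame fire-pixel counts segmented at key points and linearly fitted) can be sketched as follows. This is a minimal illustration: the key points are assumed to be given, whereas the paper detects them from the series:

```python
import numpy as np

def fire_trend(series, keypoints):
    """Fit a line to each segment of the per-frame fire-pixel-count series
    between consecutive key points and return the slopes (fire growth rates)."""
    slopes = []
    for a, b in zip(keypoints[:-1], keypoints[1:]):
        t = np.arange(a, b + 1)
        k, _ = np.polyfit(t, series[a:b + 1], 1)   # degree-1 fit: slope, intercept
        slopes.append(k)
    return slopes

# synthetic series: fire grows for 10 frames, then holds steady
series = np.concatenate([np.arange(0, 100, 10), np.full(10, 90)])
slopes = fire_trend(series, [0, 9, 19])
print(slopes)  # roughly [10.0, 0.0]
```

A positive slope marks a growing fire, a near-zero slope a stable one, so the slope sequence summarizes the development stages.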

    Semantic segmentation of remote sensing images based on neural architecture search
    ZHOU Peng,YANG Jun
    Journal of Xidian University. 2021, 48(5):  47-57.  doi:10.19665/j.issn1001-2400.2021.05.007

    High-resolution remote sensing image segmentation based on the traditional deep convolutional neural network needs hand-crafted architectures, which is excessively dependent on expert experience, time-consuming and laborious, and the network generalization ability is poor. A neural architecture search method with resource-balanced partial channel sampling is presented in this study. First, a resource-balancing strategy is added to the network architecture parameters to minimize the updating imbalances and discretization discrepancy during pruning, so that the stability of the search algorithm is improved. Second, partial channels are sampled for the mixed operation in the search space, which can effectively reduce the computing cost, improve the search efficiency and alleviate the problem of network overfitting. Finally, according to the complex features, discrete distribution and wide spatial range of high-resolution remote sensing images, the Gumbel-Softmax trick is introduced to improve the sampling efficiency and make the sampling process differentiable for backpropagation. The proposed method achieves 90.93% and 69.53% MIoU on the WHUBuilding and GID datasets, respectively, which outperforms prior work such as SegNet, U-Net, Deeplab v3+ and NAS-HRIS. Experimental results show that the proposed method can search the architecture for high-resolution remote sensing image segmentation efficiently and automatically, and has the advantages of a high segmentation accuracy and low computing resource usage.

    Method for detection of a student’s pose in a multi-scene classroom based on meta-learning
    QIAN Zhihua,GAO Chenqiang,YE Sheng
    Journal of Xidian University. 2021, 48(5):  58-67.  doi:10.19665/j.issn1001-2400.2021.05.008

    To solve the problem of domain shift across different classroom scenes, this paper proposes a multi-scene classroom pose detection method based on meta-learning. In this method, a pose detection meta-model and a domain adaptive optimizer with learnable parameters are designed. Besides, the offline learning mode and the online learning mode are combined to realize fast domain adaptation of the detection model in a specific classroom scene. In the offline learning stage, the method trains the parameters of the pose detection meta-model and the domain adaptive optimizer through two-layer training. In the online learning stage, guided by the domain adaptive optimizer, the meta-model can quickly adapt to the data distribution of the scene with a few labeled images. In addition, this paper also proposes an external training optimizer which can make the two-layer training more stable. Experiments show that the detection accuracy of this method on the multi-scene classroom pose detection dataset is better than that of the currently popular object detection models, and that it also has a good domain adaptation effect for new scenes with a few labeled images.

    Human body detection algorithm in complex monitoring scenes
    ZHANG Shuwei,LI Junmin
    Journal of Xidian University. 2021, 48(5):  68-77.  doi:10.19665/j.issn1001-2400.2021.05.009

    In the video surveillance scene, due to the influence of factors such as a complex background, multiple postures and occlusion, existing human body detection algorithms have problems such as a low accuracy and a weak model generalization ability. In response to the above problems, we have designed a detection network feature fusion method and a feature map generation strategy based on the feature image pyramid and multi-scale receptive field theory. Relying on the lightweight feature image pyramid technology and combining optimization methods such as data augmentation, an anchor box matching strategy and an occlusion loss function, we have further proposed a human body detection algorithm, EFIPNet, based on the deep neural network. Meanwhile, in order to fully verify the effectiveness of the EFIPNet algorithm, this paper establishes 4 diversified video surveillance scene datasets, which involve a total of 50 common human body postures. The validation of the algorithm shows that the human detection network we have designed can effectively improve the detection accuracy of the human body, and achieve accurate and real-time human body detection in complex monitoring scenarios. In addition, in order to verify the effectiveness of different modules in the EFIPNet algorithm, we have used ablation studies to analyze the influence of the main modules in the network on the performance of the human body detection model. On the Person dataset, compared with the SSD detection algorithm, the EFIPNet algorithm improves the detection accuracy of human targets by 4.34% while maintaining a detection speed of 45 frames per second.

    Gait recognition method combining LSTM and CNN
    QI Yanjun,KONG Yueping,WANG Jiajing,ZHU Xudong
    Journal of Xidian University. 2021, 48(5):  78-85.  doi:10.19665/j.issn1001-2400.2021.05.010

    To solve the problem of the influence of the view angle and other external factors of variation on gait recognition, we propose a novel and practical gait recognition method combining Long Short-Term Memory and Convolutional Neural Networks. Focusing on the three-dimensionality of gait, the new method uses human three-dimensional (3D) pose estimation to obtain the 3D coordinates of joints. Then, by analyzing the periodic motion constraint relationships between joints in 3D space, a robust 3D gait constraint model is designed from the time and space dimensions. In the model, the motion constraint matrix characterizes the temporal constraint relationships between joint motion and human body structure, while the action feature matrix characterizes the spatial constraint relationships of the joint positions. In addition, based on the characteristics of the 3D gait constraint model, a parallel deep gait recognition network consisting of Long Short-Term Memory and Convolutional Neural Networks is developed to extract spatiotemporal features of the model. Finally, the proposed method is evaluated on the multi-view gait database CASIA-B. Experimental results show that the recognition rate of the new method is higher than that of some classic methods. At the same time, the recognition rate does not decrease significantly in the case of great view angle changes, illustrating that our method has a state-of-the-art performance and is robust to view changes.

    Face anti-spoofing method using the optical flow features of the background
    KONG Yueping,LIU Chu,ZHU Xudong
    Journal of Xidian University. 2021, 48(5):  86-91.  doi:10.19665/j.issn1001-2400.2021.05.011

    The face recognition system is vulnerable to spoofing attacks that present photos or videos of a valid user, but hand shaking exists when the illegal user holds the photos or videos. Focusing on this phenomenon and representing motion information by optical flow features, a novel face anti-spoofing method which exploits the optical flow features of the background is proposed. The method takes advantage of the background motion difference between the registered user and the invalid user to analyze the optical flow angle distribution outside the face in multiple comparison regions. A similarity function is built to measure the correlation degree between the histograms of the optical flow angle distribution in the comparison regions. According to the measurements of the background comparison regions, their motion consistency is evaluated, and then the real face or the attacked face is detected. Experiments have been done on the public face anti-spoofing databases Replay-Attack and CASIA-FASD. The accuracy of the new method is 97.87% and 90.95%, respectively, which shows that the new method can effectively detect an attacker holding photos or videos against the background region.
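The comparison described above hinges on histograms of optical-flow angles in background regions and a similarity function between them. A minimal sketch of that comparison; the abstract does not specify the similarity function, so histogram intersection is an assumed stand-in, and the bin count is illustrative:

```python
import numpy as np

def angle_histogram(angles, bins=8):
    """Normalized histogram of optical-flow angles (radians) over a region."""
    h, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return h / max(h.sum(), 1)

def similarity(h1, h2):
    """Histogram-intersection similarity between two regions' angle
    distributions: 1 when identical, 0 when the bins do not overlap."""
    return float(np.minimum(h1, h2).sum())

# two background regions shaking the same way -> similarity near 1
rng = np.random.default_rng(0)
a = rng.uniform(0.2, 0.4, 500)           # consistent flow directions
s = similarity(angle_histogram(a), angle_histogram(a + 0.01))
```

High similarity across background regions indicates the globally consistent motion of a hand-held photo or screen; a real user's static background would not produce it.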

    Multi-scale fusion sketch recognition model by dilated convolution
    YANG Yunhang,MIN Lianquan
    Journal of Xidian University. 2021, 48(5):  92-99.  doi:10.19665/j.issn1001-2400.2021.05.012

    Focused on the issue that existing sketch recognition methods based on deep learning still use ordinary convolution as the main method of sketch feature extraction, ignoring the sparsity characteristics of sketch objects, this paper proposes a sketch recognition model based on dilated convolution. This model combines dilated convolution and ordinary convolution by using the dilated convolution's characteristic of expanding the receptive field without increasing the number of effective units of the convolution kernel, to realize the preliminary extraction of the structural features of the sketch. Because the dilated convolution sparsely samples the input signal, there is no correlation between the information obtained by long-distance convolutions, which affects the classification result. Therefore, the model uses dilated convolution and ordinary convolution to extract the input image features separately, and finally merges the features output by the two different convolution methods along the channel dimension. This method not only takes advantage of the sparse sampling characteristics of the dilated convolution, but also makes full use of the multi-scale information from the different convolution methods. Experimental results show that this model achieves a recognition accuracy of 72.6% on the TU-Berlin sketch dataset, indicating that it has certain advantages over the current mainstream sketch recognition methods.
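The receptive-field property the abstract relies on can be shown concretely: a dilated kernel spaces its taps apart, so the covered span grows while the number of effective kernel units stays fixed. A small 1-D sketch (illustrative, not the paper's 2-D implementation):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution with a dilation factor: kernel taps are spaced
    `dilation` samples apart, so a k-tap kernel covers (k-1)*dilation + 1
    input samples without adding effective kernel units."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = [sum(kernel[j] * x[i + j * dilation] for j in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out, dtype=float)

x = np.arange(10, dtype=float)
plain = dilated_conv1d(x, [1, 1, 1], dilation=1)    # receptive field 3
dilated = dilated_conv1d(x, [1, 1, 1], dilation=2)  # receptive field 5
```

Both calls use a 3-tap kernel, but the dilated one aggregates over a span of 5 samples, which is why its output is 2 samples shorter. The gaps between taps are also the source of the long-distance decorrelation the abstract compensates for with the parallel ordinary-convolution branch.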

    Facial expression recognition based on local representation
    CHEN Changchuan,WANG Haining,HUANG Lian,HUANG Tao,LI Lianjie,HUANG Xiangkang,DAI Shaosheng
    Journal of Xidian University. 2021, 48(5):  100-109.  doi:10.19665/j.issn1001-2400.2021.05.013

    Expression is an important embodiment of changes in human inner emotion. Current expression recognition methods usually rely on global facial features, ignoring local feature extraction. Psychologists point out that different facial expressions correspond to different regions of local muscle movement. In this paper, we propose an expression recognition algorithm based on local representation, referred to as EAU-CNN. In order to extract the local features of the face, the whole face image is first divided into 43 sub-regions according to the 68 facial feature points. Then, 8 local candidate regions covered by the muscle motion regions and the salient facial organs are selected as the input of the convolutional neural network. In order to balance the features of the local candidate regions, the EAU-CNN adopts 8 parallel feature extraction branches, each of which dominates a fully connected layer of a different dimension. The outputs of the branches are adaptively connected in terms of attention to highlight the importance of different local candidate regions. Finally, the expressions are divided into 7 categories by the Softmax function: neutral, angry, disgusted, surprised, happy, sad and afraid. The algorithm is evaluated on the CK+, JAFFE and custom FED datasets. The average accuracy of the proposed algorithm is 99.85%, 96.61% and 98.6%, respectively, which is 6.01%, 10.17% and 6.09% higher than that of the S-Patches algorithm. The results show that local representation can improve the performance of expression recognition.

    Multi-scale single object tracking based on the attention mechanism
    SONG Jianfeng,MIAO Qiguang,WANG Chongxiao,XU Hao,YANG Jin
    Journal of Xidian University. 2021, 48(5):  110-116.  doi:10.19665/j.issn1001-2400.2021.05.014

    In the process of single object tracking, problems such as occlusion, disappearance and similar-target interference reduce the tracking accuracy of the algorithm. In order to solve these problems, a multi-scale single target tracking algorithm based on the attention mechanism is proposed, which uses asymmetric convolution to extract multi-scale features while reducing the parameters. It combines local features and global features to improve tracking robustness. An online update algorithm based on the attention mechanism is used, which combines the response map and the attention map to calculate a score used to weed out frames without targets. The attention mechanism strengthens the ability to distinguish the target from the background, makes the network quickly adapt to changes in the appearance of the target, and improves the tracking performance of the algorithm. The algorithm is tested on the OTB-100 dataset against other advanced tracking methods. Compared with the ATOM, the accuracy and success rate of our method are improved by 0.9% and 0.8%, respectively, and it is easier to retrieve the target after it is lost.
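The online update step above scores each frame by combining the response map with the attention map and discards frames whose score is too low. A toy sketch of one way such a score could be formed; the weighted-sum combination, the weighting `alpha` and the threshold are all assumptions for illustration, not the paper's formula:

```python
import numpy as np

def update_score(response, attention, alpha=0.5):
    """Combine the tracker's response map and attention map into a single
    confidence score: the peak response, blended with how much of the
    response mass falls inside the attended region."""
    r = response.max()                                       # peak tracking response
    a = (attention * response).sum() / max(response.sum(), 1e-12)
    return alpha * r + (1 - alpha) * a                       # high only when both agree

def keep_frame(response, attention, thresh=0.4):
    """Discard frames whose combined score suggests the target is absent."""
    return update_score(response, attention) > thresh

# a sharp, attended peak scores high; a flat response map scores low
peak = np.zeros((5, 5)); peak[2, 2] = 1.0
att = np.zeros((5, 5)); att[2, 2] = 1.0
```

Gating model updates this way prevents the tracker from learning from frames where the target has been occluded or lost.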

    Target tracking control algorithm for a small-size quad-rotor helicopter
    GU Zhaojun,CHEN Hui,WANG Jialiang,GAO Bing
    Journal of Xidian University. 2021, 48(5):  117-127.  doi:10.19665/j.issn1001-2400.2021.05.015

    In order to improve the real-time performance and stability of target tracking using the video frames transmitted from the monocular camera of a quad-rotor, an algorithm for video frame preprocessing based on the attention model and for attitude adjustment of the quad-rotor based on feedback control is proposed. First, the attention model is used to preprocess the video images collected by the quad-rotor, and the frames which contribute more to the tracking results are selected. Second, the feedback control error is obtained by calculating the position deviation and area deviation of the pixels of the marquee. Finally, the feedback control error is converted into the numerical change in the motor speed of the quad-rotor, and motor speed control instructions are sent to the aircraft, which can adjust the yaw attitude and the forward and backward movement in time. Actual flight verification is executed on the Tello aircraft platform, and experimental results show that the target tracking algorithm proposed in this paper improves both the real-time performance of image processing and the flight stability of the aircraft. Therefore, the proposed algorithm can provide powerful technical support for image processing applications of quad-rotor aircraft.

    Application of the deep fusion mechanism in object detection of remote sensing images
    DONG Ruchan,JIAO Licheng,ZHAO Jin,SHEN Weiyan
    Journal of Xidian University. 2021, 48(5):  128-138.  doi:10.19665/j.issn1001-2400.2021.05.016

    A new target detection technology for remote sensing images based on the deep fusion mechanism is proposed, which combines multi-scale features, the attention mechanism and the broad learning system on the basis of the deep convolutional neural network. This technology focuses effectively on the high-level semantic information of remote sensing images and the characteristics of small targets. To address the problem of manual adjustment of hyperparameters in the broad learning system, the authors propose a broad learning system based on Bayesian network search, which can intelligently learn a set of parameter values applicable to different remote sensing images and efficiently identify targets. Compared with other state-of-the-art methods, experimental results show that this technology can effectively solve the problems of a slow detection speed, a low recognition accuracy, and small targets in remote sensing image target detection tasks.

    Ballistic target fretting classification network based on Bayesian optimization
    LI Peng,FENG Cunqian,XU Xuguang,TANG Zixiang
    Journal of Xidian University. 2021, 48(5):  139-148.  doi:10.19665/j.issn1001-2400.2021.05.017

    Ballistic target recognition plays an important role in the current military anti-missile system. Different ballistic targets show different fretting characteristics due to their different motion characteristics, so fretting features are widely used in ballistic target recognition. Since manually selecting the structure and parameters of the convolutional neural network for ballistic target fretting classification requires human experience and a large amount of computing time, and does not guarantee optimal parameters, we suggest using the Bayesian optimization algorithm to automatically obtain the optimal parameters and structure of the convolutional neural network, in order to improve its classification performance on fretting features. Experimental results show that the Bayesian optimization algorithm can quickly accomplish the parameter optimization of the convolutional neural network, and that the resulting convolutional neural network model has a good recognition effect and is robust.

    UAV object tracking via the correlation filter with the response divergence constraint
    WANG Haijun,ZHANG Shengyan,DU Yujie
    Journal of Xidian University. 2021, 48(5):  149-155.  doi:10.19665/j.issn1001-2400.2021.05.018

    Aiming at the problem that targets are easily subject to deformation and background clutter interference in drone sequences, this paper proposes a novel unmanned aerial vehicle (UAV) object tracking method based on the correlation filter with a response divergence constraint. According to the consistency of the filter variation between the previous frame and the current frame, the response divergence of different filters acting on the same training sample is modeled. Furthermore, an objective function with the constraint mechanism is built, which can learn the target variation accurately and promote the robustness of the filters. Meanwhile, an auxiliary variable is introduced to construct the optimization function, and the alternating direction method of multipliers is used to optimize the solution of the filter and the auxiliary variable. We have tested the proposed algorithm and eleven state-of-the-art algorithms on three UAV video databases: DTB70, UAV123@10fps and UAVDT. Experimental results demonstrate that our method is superior to the comparison algorithms in two evaluation indicators, tracking accuracy and success rate, and has good robustness to illumination variation, deformation, occlusion, motion blur and other challenging attributes in complex environments from the view of a UAV. Meanwhile, the average tracking rate of our algorithm reaches 21.7 frames per second, which meets the real-time requirements of UAVs.

    Fast detection of objects in the remote sensing airport area based on the improved YOLOv3
    HAN Yongsai,MA Shiping,HE Linyuan,LI Chenghao,ZHU Mingming,ZHANG Fei
    Journal of Xidian University. 2021, 48(5):  156-166.  doi:10.19665/j.issn1001-2400.2021.05.019

    The detection of objects in remote sensing airport regions is of great military and civilian significance. In order to achieve fast and accurate detection results, a dataset that is more mission-specific is independently constructed for the detection task. We use the representative network YOLOv3 of the one-step regression global detection method as the basic framework. For the problem of the uneven distribution of categories in the dataset, the use of generated data is proposed. The method uses generative adversarial networks to perform targeted data expansion to obtain datasets with domain transformation characteristics and a more balanced distribution of different types of data. At the same time, the improved DWFPN detection component is used to fuse deeper, more distinguishable and more robust features. Experiments show that, compared with the original network, the improved network brings a 4.98% increase in the mean average precision (mAP) and an 8.33% increase in the average IOU, reaching 89.07% mAP and 61.97% average IOU, respectively. At the same time, the average detection time of the improved network is 0.0625 s, which is 7 times faster than the RetinaNet-101 network with a similar detection rate. The experiments prove the effectiveness of the network and its practicality for specific tasks.

    Joint spatial reliability and correlation filter learning for visual tracking
    ZHANG Fei,MA Shiping,ZHANG Lichao,HE Linyuan,QIU Zhuling,HAN Yongsai
    Journal of Xidian University. 2021, 48(5):  167-177.  doi:10.19665/j.issn1001-2400.2021.05.020

    The discriminative correlation filter (DCF) uses the cyclic shift to generate negative samples, which inevitably brings boundary effects. The background-aware correlation filter (BACF) attempts to use a clipping matrix to obtain more real negative samples. This method can not only effectively alleviate the influence of the boundary effect, but also enhance the learning of background information. However, the use of the clipping matrix lacks the learning of the spatial reliability of different positions, which may cause the background information to dominate the learning of the filter. In order to solve this problem, this paper introduces the learning of spatial reliability into the correlation filter, and the Alternating Direction Method is used to iteratively obtain the solution of the spatial reliability and the filter. Our method can strengthen the learning of the spatially reliable region and enhance the filter's ability to discriminate between targets and the background. In addition, in order to optimize the model update strategy, an adaptive model update method based on the Perceptual Hash Algorithm is proposed, which improves the effectiveness of filter learning. The proposed algorithm has been comprehensively evaluated on standard visual tracking datasets, and the results verify the effectiveness and real-time performance of the algorithm.
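The perceptual-hash-based update test mentioned above compares compact appearance fingerprints of the target patch across frames. A minimal sketch using the average-hash variant of a perceptual hash; the abstract does not specify which variant the paper uses, so this simple form and the 8×8 size are assumptions:

```python
import numpy as np

def average_hash(img, size=8):
    """Average hash: block-average the grayscale patch down to size x size,
    then threshold each block at the global mean to get a bit vector."""
    h, w = img.shape
    small = img[:h - h % size, :w - w % size]        # crop to a divisible size
    small = small.reshape(size, -1, size, small.shape[1] // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming(h1, h2):
    """Bit distance between hashes; a small distance means the target's
    appearance is stable, so updating the filter model is safe."""
    return int(np.count_nonzero(h1 != h2))

img = np.tile(np.arange(32.0), (32, 1))              # horizontal gradient patch
same = hamming(average_hash(img), average_hash(img + 1.0))
```

A brightness shift leaves the hash unchanged (distance 0), while a structural change such as occlusion flips many bits, signaling that the model update should be skipped for that frame.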

    Vehicle video surveillance and analysis system for the expressway
    MAO Zhaoyong,WANG Yichen,WANG Xin,SHEN Junge
    Journal of Xidian University. 2021, 48(5):  178-189.  doi:10.19665/j.issn1001-2400.2021.05.021
    Abstract ( 304 )   HTML ( 23 )   PDF (3933KB) ( 58 )   Save
    Figures and Tables | References | Related Articles | Metrics

    With the rapid development of video surveillance technology in road-safety applications, and in order to realize intelligent management of the expressway, this paper proposes a vehicle video surveillance and analysis system for the expressway. By detecting and tracking vehicles in surveillance videos, expressway-related vehicle monitoring applications are realized. The system presents a lightweight vehicle detection and tracking algorithm based on bidirectional pyramid multi-scale integration: YOLOv3 with the lightweight EfficientNet backbone, using a bidirectional feature pyramid network (BiFPN) for multi-scale feature fusion, which ensures real-time detection while improving detection accuracy. Furthermore, a multi-scene highway-vehicles dataset is constructed from collected freeway surveillance videos. Experimental results on this dataset show that the detection accuracy of the proposed algorithm is 97.11%, 16.5% higher than that of the original YOLOv3 detector, and that the algorithm runs in real time at 31 frames/s for vehicle tracking when combined with the DeepSORT model. Meanwhile, the monitoring system realizes multi-channel real-time detection for vehicle flow statistics and traffic abnormal-event detection, which is of practical application value.
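    Vehicle flow statistics of the kind this system performs are commonly obtained by checking when tracked centroids cross a virtual count line. The sketch below is an assumption about one simple way to do this with DeepSORT-style trajectories, not the system's actual implementation:

```python
def count_line_crossings(tracks, line_y):
    """Count tracks whose centroid trajectory crosses the horizontal line y = line_y.

    tracks: list of trajectories, each a list of (x, y) centroids ordered by frame.
    Each vehicle (track) is counted at most once.
    """
    count = 0
    for track in tracks:
        for (_, y0), (_, y1) in zip(track, track[1:]):
            if (y0 - line_y) * (y1 - line_y) < 0:  # strict sign change -> crossing
                count += 1
                break
    return count
```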

    Cross-modality person re-identification utilizing the hybrid two-stream neural networks
    CHENG De,HAO Yi,ZHOU Jingyu,WANG Nannan,GAO Xinbo
    Journal of Xidian University. 2021, 48(5):  190-200.  doi:10.19665/j.issn1001-2400.2021.05.022
    Abstract ( 252 )   HTML ( 23 )   PDF (2167KB) ( 61 )   Save
    Figures and Tables | References | Related Articles | Metrics

    Infrared images can effectively make up for the shortcomings of single-modality visible-light image data under low-illumination conditions. Therefore, the study of cross-modality visible-to-infrared person re-identification provides strong technical support for constructing intelligent video surveillance systems under various lighting conditions. The key to cross-modality person re-identification is to construct a unified shared feature representation across multi-modal data, which requires effectively distinguishing the modal-shared and modal-specific feature information in the cross-modal data. On this basis, this paper proposes a cross-modality person re-identification method based on a hybrid two-stream neural network. The method analyzes in depth the influence of the parameter-shared and non-shared layers on the cross-modality re-identification model in the hybrid two-stream architecture. Besides the cross-entropy loss, intra-class distribution and inter-class correlation constraints are used in the loss function to further improve re-identification performance. In the optimization process, an adaptive learning-rate adjustment strategy is used to improve the feature-learning capability of the network. Experimental results illustrate the effectiveness of the proposed method, which achieves superior performance on the two widely used cross-modality person re-identification benchmark datasets, SYSU-MM01 and RegDB.
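    The hybrid two-stream idea of modality-specific (non-shared) layers feeding a shared embedding can be sketched as below. The layer sizes and the single linear layer per stage are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_MID, D_OUT = 8, 6, 4

# Non-shared branches: one projection per modality
W_visible = rng.standard_normal((D_IN, D_MID))
W_infrared = rng.standard_normal((D_IN, D_MID))
# Shared layer: maps both modalities into one common feature space
W_shared = rng.standard_normal((D_MID, D_OUT))

def embed(x, modality):
    """Modality-specific branch followed by the parameter-shared embedding layer."""
    W = W_visible if modality == "visible" else W_infrared
    h = np.maximum(x @ W, 0.0)  # modality-specific feature with ReLU
    return h @ W_shared         # shared representation used for matching

vis = embed(rng.standard_normal(D_IN), "visible")
ir = embed(rng.standard_normal(D_IN), "infrared")
```

Matching is then done by comparing `vis` and `ir` in the shared space, so the shared layers must absorb what is common while the branches absorb what is modality-specific.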

    Person re-identification method combining the DD-GAN and Global feature in a coal mine
    SUN Yanjing,WEI Li,ZHANG Nianlong,YUN Xiao,DONG Kaiwen,GE Min,CHENG Xiaozhou,HOU Xiaofeng
    Journal of Xidian University. 2021, 48(5):  201-211.  doi:10.19665/j.issn1001-2400.2021.05.023
    Abstract ( 366 )   HTML ( 22 )   PDF (3972KB) ( 52 )   Save
    Figures and Tables | References | Related Articles | Metrics

    Controlling and analyzing the video data obtained by the multiple surveillance cameras in each important area of a coal mine, and locating and identifying the workers in the video, are of great significance to smart and safe coal production. However, owing to the dim light and uneven illumination in the mine, existing conventional person re-identification (Re-ID) methods cannot meet the requirements of the coal mine. To solve these problems, this paper proposes a Re-ID method combining a Dual-Discriminator Generative Adversarial Network with global features for the coal mine. First, the Dual-Discriminator Generative Adversarial Network (DD-GAN) is designed to enhance and restore images with dim or uneven illumination, providing a more discriminative image foundation for person re-identification. Second, a Global Feature Network for Re-ID in the coal mine is proposed to solve the miner identification problem, with Random Erasing and re-ranking with k-reciprocal nearest neighbors used to further improve the robustness and accuracy of the Re-ID network. Finally, the Miner-CUMT dataset, suited to the special underground scenes, is constructed, which addresses the single-scene limitation of existing sample sets and improves the generalization of the presented method. The proposed method achieves good results on the Miner-CUMT dataset and in actual coal-mine scenes, laying an important foundation for the development of intelligent and safe production in coal mines.
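    Of the training tricks mentioned, Random Erasing is easy to illustrate: occlude a random region so the network cannot rely on any single body part. The fixed square patch below is a simplifying assumption (practical implementations also sample the aspect ratio and fill value):

```python
import numpy as np

def random_erase(img, area_frac=0.25, fill=0.0, rng=None):
    """Erase a square patch covering roughly area_frac of a 2D image, at a random position."""
    if rng is None:
        rng = np.random.default_rng()
    H, W = img.shape[:2]
    side = max(1, int((area_frac * H * W) ** 0.5))
    side = min(side, H, W)
    top = rng.integers(0, H - side + 1)
    left = rng.integers(0, W - side + 1)
    out = img.copy()                                  # leave the input untouched
    out[top:top + side, left:left + side] = fill
    return out
```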

    Deep asymmetric compression Hashing algorithm
    YAN Jia,CAO Yudong,REN Jiaxing,CHEN Donghao,LI Xiaohui
    Journal of Xidian University. 2021, 48(5):  212-221.  doi:10.19665/j.issn1001-2400.2021.05.024
    Abstract ( 238 )   HTML ( 10 )   PDF (1970KB) ( 45 )   Save
    Figures and Tables | References | Related Articles | Metrics

    Most existing deep supervised hashing algorithms for image retrieval fail to effectively utilize difficult samples and supervised information. To solve this problem, an end-to-end asymmetric compression hashing algorithm is proposed, which divides the output space of the network into a query set and a database set, constructs a supervised data matrix, and effectively uses the global supervised information in an asymmetric way. Meanwhile, the compactness of within-class hash codes and the separation of inter-class hash codes are explicitly constrained in the loss function, which improves the discriminative ability of the model on difficult samples during training. First, a hashing layer and a thresholding layer are added to the improved backbone feature extraction network, SKNet-50, which outputs the query set matrix. Then, the database set matrix is obtained by optimizing the loss function with the alternating direction method of multipliers (ADMM). Finally, the deep model is trained with an alternating optimization method. The proposed method achieves MAP scores of 0.946, 0.923 and 0.811 on the CIFAR-10, NUS-WIDE and MS-COCO datasets, respectively, when 48-bit hash codes are used to retrieve images. Experimental results show that the proposed method learns more discriminative and compact hash codes, and that its retrieval accuracy is superior to that of current mainstream algorithms.
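    Retrieval with learned hash codes reduces to ranking database items by Hamming distance to the query code; for ±1 codes this can be computed with a dot product, as in this minimal sketch (illustrative of hash retrieval in general, not the paper's pipeline):

```python
import numpy as np

def hamming_rank(query, database):
    """Rank database rows by Hamming distance to a query code.

    Codes are ±1 vectors of n bits; hamming = (n - dot) / 2, so sorting by
    the distance derived from the inner product gives the retrieval order.
    """
    dists = (query.shape[0] - database @ query) // 2
    return np.argsort(dists, kind="stable")
```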

    Survey of self-supervised video representation learning
    TIAN Chunna,YE Yanyu,SHAN Xiao,DING Yuxuan,ZHANG Xiangnan
    Journal of Xidian University. 2021, 48(5):  222-230.  doi:10.19665/j.issn1001-2400.2021.05.025
    Abstract ( 579 )   HTML ( 36 )   PDF (1238KB) ( 143 )   Save
    Figures and Tables | References | Related Articles | Metrics

    Learning high-quality video representations helps machines understand video content accurately. Video representation based on supervised learning requires annotating massive amounts of video data, which is extremely time-consuming and laborious. Thus, self-supervised video representation learning, which uses unannotated data, has become a hot research topic. Self-supervised video representation learning exploits massive amounts of unlabeled data, using the temporal-spatial continuity of videos as supervision to design auxiliary (pretext) tasks for representation learning, and then applies the learned representations to downstream tasks. For lack of a survey on new developments in self-supervised video representation learning, we analyze and summarize the methods, most of which were published in the last three years. According to the information used in the pretext tasks, we categorize the methods into three groups: those based on time-series information, temporal-spatial information, and multi-modal information. We compare the experimental results of self-supervised video representation learning on two downstream tasks, action recognition and video retrieval, and then analyze the advantages and disadvantages of the models and the reasons behind them. Finally, we summarize the existing issues and propose promising prospects for self-supervised video representation learning.

    A multi-frame track before detect algorithm utilizing measurement space clustering
    ZHANG Jiaqi,TAO Haihong,ZHANG Xiushe,HAN Chunlei
    Journal of Xidian University. 2021, 48(5):  231-238.  doi:10.19665/j.issn1001-2400.2021.05.026
    Abstract ( 330 )   HTML ( 17 )   PDF (1168KB) ( 48 )   Save
    Figures and Tables | References | Related Articles | Metrics

    A multi-frame track before detect (TBD) algorithm based on measurement-space clustering is proposed to solve the high computational complexity of existing multi-frame TBD algorithms in multi-target scenarios. In the proposed algorithm, a track extrapolation strategy is used to construct an association set among the tracks of continuous data frames, and a label is then constructed from the association set to obtain the distribution of the multiple targets in the measurement space. The multi-frame measurement points are divided into clusters, and the multi-frame TBD algorithm is run within each cluster. Simulation results and performance analysis show that the detection performance of the proposed algorithm is comparable to that of existing multi-frame TBD algorithms, while the computational complexity is greatly reduced.
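    The clustering step, partitioning measurements so that TBD can run independently on each group, can be sketched as connected components under a gating distance. The union-find implementation and Euclidean gate below are assumptions for illustration, not the paper's exact clustering rule:

```python
def gate_clusters(points, gate):
    """Group 2D measurements into clusters: points within `gate` of each other
    (transitively) end up in the same cluster, returned as lists of indices."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= gate * gate:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Running TBD within each returned cluster instead of over all measurements at once is what reduces the combinatorial cost in multi-target scenes.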

    Improved NSGA-II algorithm and research on monitoring antenna optimization deployment
    DU Wenzhan,YU Zhiyong,YANG Jian,JIANG Haibin
    Journal of Xidian University. 2021, 48(5):  239-248.  doi:10.19665/j.issn1001-2400.2021.05.027
    Abstract ( 343 )   HTML ( 84 )   PDF (2548KB) ( 56 )   Save
    Figures and Tables | References | Related Articles | Metrics

    To determine the optimal deployment locations of ground radiation monitoring antennas for accurate monitoring and efficient deployment, a multi-objective optimization model is constructed that maximizes the coverage rate and minimizes the redundancy rate and the number of uncovered obstacles, with communication between antennas as a constraint. By comparing the distance between two obstacles with the maximum radius of the antenna's working coverage range, reference points are preset as candidate antenna deployment locations. The effects of the number of obstacles and the number of reference points on deployment efficiency, such as the coverage rate and redundancy rate, are compared by simulation. It is verified that the number of obstacles has a greater effect on the redundancy rate, and that the number of obstacles in the antenna coverage area is reduced at the expense of an increased redundancy rate. The non-dominated sorting genetic algorithm II (NSGA-II) is improved by using a memory to store previous generations of non-dominated solutions, so that after the environment changes, a new optimal solution can be obtained faster and the convergence speed is improved. Simulation results of the model and the improved algorithm show that the iteration speed is increased by more than 30% on average, and that the average iteration time is reduced by more than 25%, which verifies the effectiveness of the improved algorithm in responding to dynamic changes. The convergence and diversity of the improved algorithm are verified by the evaluation index mIGDB.
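    At the core of NSGA-II is non-dominated sorting; the first (Pareto) front can be extracted as below. This is a minimal minimization sketch of the standard dominance test, not the improved memory-based algorithm itself:

```python
def dominates(a, b):
    """a dominates b (minimization): no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(points):
    """The non-dominated (Pareto) front: points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The improvement described above amounts to seeding the population with fronts remembered from earlier environments, so that re-optimization after a change starts closer to the new Pareto front.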