Journal articles on the topic 'Human-object Interaction Detection'


Consult the top 50 journal articles for your research on the topic 'Human-object Interaction Detection.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1. Li, Weifeng, Hongbing Yang, Zhou Lei, and Dawei Niu. "Distance-based Human-Object Interaction Detection." Journal of Physics: Conference Series 1920, no. 1 (May 1, 2021): 012073. http://dx.doi.org/10.1088/1742-6596/1920/1/012073.

2. Zhang, Jiali, Zuriahati Mohd Yunos, and Habibollah Haron. "Interactivity Recognition Graph Neural Network (IR-GNN) Model for Improving Human–Object Interaction Detection." Electronics 12, no. 2 (January 16, 2023): 470. http://dx.doi.org/10.3390/electronics12020470.

Abstract:
Human–object interaction (HOI) detection is important for promoting the development of many fields, such as human–computer interaction, service robotics, and video security surveillance. A high percentage of human–object pairs with invalid interactions are produced in the object detection phase of conventional HOI detection algorithms, resulting in inaccurate interaction detection. To recognize invalid human–object interaction pairs, this paper proposes the interactivity recognition graph neural network (IR-GNN) model, which infers the probability of human–object interaction directly from a graph architecture. The model consists of three modules. The first is the human posture feature module, which uses key points of the human body to construct relative spatial pose features and uses this pose information to help discriminate human–object interactivity. The second is the human–object interactivity graph module: the spatial human–object distance serves as the initialization weight of the edges, and the graph is updated through attention-based message passing so that edges between interacting node pairs obtain higher weights. The third is the classification module, which uses a fully connected neural network to binarily classify the interactivity of each human–object pair. These three modules work in collaboration to enable effective inference of interaction possibilities. Comparative and ablation experiments on the HICO-DET and V-COCO datasets show that the method improves human–object interaction detection.
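The distance-seeded edge weights described in this abstract are straightforward to prototype. The sketch below is a hypothetical illustration, not the authors' code: it initializes human-object edge weights with a Gaussian decay over box-centre distance (the decay function and the `sigma` value are assumptions; the paper states only that spatial distance initializes the weights before the attention-based updates).

```python
import numpy as np

def box_center(box):
    """Center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def init_edge_weights(human_boxes, object_boxes, sigma=100.0):
    """Initialize human-object edge weights that decay with distance.

    A Gaussian decay over box-centre distance is an assumed choice; the
    paper states only that human-object spatial distance seeds the edge
    weights before the attention-based message-passing updates.
    """
    weights = np.zeros((len(human_boxes), len(object_boxes)))
    for i, h in enumerate(human_boxes):
        for j, o in enumerate(object_boxes):
            d = np.linalg.norm(box_center(h) - box_center(o))
            weights[i, j] = np.exp(-(d ** 2) / (2 * sigma ** 2))
    return weights
```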
3. Wang, Chang, Jinyu Sun, Shiwei Ma, Yuqiu Lu, and Wang Liu. "Multi-stream Network for Human-object Interaction Detection." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 08 (March 12, 2021): 2150025. http://dx.doi.org/10.1142/s0218001421500257.

Abstract:
Detecting the interaction between humans and objects in images is a critical problem for obtaining a deeper understanding of the visual relationships in a scene, and a critical technology in many practical applications, such as augmented reality, video surveillance and information retrieval. However, due to the fine-grained actions and objects in real scenes and the coexistence of multiple interactions in one scene, the problem is far from solved. This paper differs from prior approaches, which focused only on the features of instances, by proposing a method that utilizes a four-stream CNN for human-object interaction (HOI) detection. More detailed visual features, spatial features and pose features from human-object pairs are extracted to tackle this challenging detection task. Specifically, the core idea is that the region where a person interacts with an object contains important identifying cues for specific action classes, and these detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale HOI public benchmarks, V-COCO and HICO-DET, show the effectiveness of the proposed method.
4. Wang, Tianlang, Tao Lu, Wenhua Fang, and Yanduo Zhang. "Human–Object Interaction Detection with Ratio-Transformer." Symmetry 14, no. 8 (August 11, 2022): 1666. http://dx.doi.org/10.3390/sym14081666.

Abstract:
Human–object interaction (HOI) is a human-centered object detection task that aims to identify the interactions between persons and objects in an image. Previous end-to-end methods have used the attention mechanism of a transformer to spontaneously identify the associations between persons and objects in an image, which effectively improved detection accuracy; however, a transformer increases computational demands and slows down detection. In addition, the end-to-end method can result in asymmetry between foreground and background information: the foreground data may be significantly less than the background data, while the latter consumes more computational resources without significantly improving detection accuracy. We therefore propose an input-controlled transformer, the "ratio-transformer", for the HOI task, which limits the amount of information fed to the transformer by setting a sampling ratio and thereby significantly reduces the computational demands while preserving detection accuracy. The ratio-transformer consists of a sampling module and a transformer network. The sampling module divides the input feature map into foreground and background features; the irrelevant background features are passed through a pooling sampler and then fused with the foreground features as input to the transformer. As a result, the amount of valid data fed into the transformer network remains constant while irrelevant information is significantly reduced, which maintains the symmetry between foreground and background information. The proposed network learns the feature information of the target itself and the association features between persons and objects, so it can query the complete HOI interaction triplet. Experiments on the V-COCO dataset show that the proposed method reduces the computational demand of the transformer by 57% without any loss of accuracy compared with other current HOI methods.
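The sampling module lends itself to a compact sketch. The following is a minimal, assumed implementation of the ratio-controlled split: the `fg_scores` input and the single pooled background token are illustrative choices, since the abstract specifies only that background features are pooled and fused with the kept foreground features.

```python
import torch

def ratio_sample(tokens, fg_scores, fg_ratio=0.25):
    """Keep the top-`fg_ratio` tokens as foreground; pool the rest.

    tokens:    (N, C) flattened feature-map tokens
    fg_scores: (N,) per-token foreground score from some scoring head
    Returns the kept foreground tokens plus one pooled background
    token, so the transformer input size stays bounded.
    """
    n_fg = max(1, int(tokens.shape[0] * fg_ratio))
    order = torch.argsort(fg_scores, descending=True)
    fg = tokens[order[:n_fg]]            # foreground tokens kept as-is
    rest = tokens[order[n_fg:]]
    if rest.shape[0] == 0:
        return fg
    bg = rest.mean(dim=0, keepdim=True)  # background pooled to one token
    return torch.cat([fg, bg], dim=0)
```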
5. Xu, Kunlun, Zhimin Li, Zhijun Zhang, Leizhen Dong, Wenhui Xu, Luxin Yan, Sheng Zhong, and Xu Zou. "Effective actor-centric human-object interaction detection." Image and Vision Computing 121 (May 2022): 104422. http://dx.doi.org/10.1016/j.imavis.2022.104422.

6. Kogashi, Kaen, Yang Wu, Shohei Nobuhara, and Ko Nishino. "Human–object interaction detection with missing objects." Image and Vision Computing 113 (September 2021): 104262. http://dx.doi.org/10.1016/j.imavis.2021.104262.

7. Gao, Yiming, Zhanghui Kuang, Guanbin Li, Wayne Zhang, and Liang Lin. "Hierarchical Reasoning Network for Human-Object Interaction Detection." IEEE Transactions on Image Processing 30 (2021): 8306–17. http://dx.doi.org/10.1109/tip.2021.3093784.

8. Fang, Hao-Shu, Yichen Xie, Dian Shao, and Cewu Lu. "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1291–99. http://dx.doi.org/10.1609/aaai.v35i2.16217.

Abstract:
In recent years, human-object interaction (HOI) detection has achieved impressive advances. However, conventional two-stage methods are usually slow in inference. On the other hand, existing one-stage methods mainly focus on the union regions of interactions, which introduces unnecessary visual information as a disturbance to HOI detection. To tackle these problems, we propose DIRV, a novel one-stage HOI detection approach, based on a new concept for the HOI problem: the interaction region. Unlike previous methods, our approach concentrates on densely sampled interaction regions across different scales for each human-object pair, so as to capture the subtle visual features that are most essential to the interaction. Moreover, in order to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy that makes full use of overlapped interaction regions in place of conventional non-maximum suppression (NMS). Extensive experiments on two popular benchmarks, V-COCO and HICO-DET, show that our approach outperforms existing state-of-the-art methods by a large margin with the highest inference speed and the lightest network architecture. Our code is publicly available at www.github.com/MVIG-SJTU/DIRV.
9. Liu, Xinpeng, Yong-Lu Li, and Cewu Lu. "Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 1819–27. http://dx.doi.org/10.1609/aaai.v36i2.20075.

Abstract:
Human-Object Interaction (HOI) detection plays a core role in activity understanding. As a compositional learning problem (human-verb-object), studying its generalization matters. However, the widely used metric mean average precision (mAP) fails to model compositional generalization well. Thus, we propose a novel metric, mPD (mean Performance Degradation), as a complement to mAP, to evaluate the performance gap among compositions of different objects with the same verb. Surprisingly, mPD reveals that previous methods usually generalize poorly. With mPD as a cue, we propose Object Category (OC) Immunity to boost HOI generalization. The idea is to prevent the model from learning spurious object-verb correlations as a shortcut to overfitting the training set. To achieve OC-immunity, we propose an OC-immune network that decouples the inputs from OC, extracts OC-immune representations, and leverages uncertainty quantification to generalize to unseen objects. In both conventional and zero-shot experiments, our method achieves decent improvements. To fully evaluate the generalization, we design a new and more difficult benchmark, on which our method presents a significant advantage. The code is available at https://github.com/Foruck/OC-Immunity.
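The abstract defines mPD only informally, as the performance gap among compositions of different objects with the same verb. One plausible, purely illustrative reading is the relative degradation from each verb's best-scoring composition, sketched below; the exact definition is given in the paper.

```python
def mean_performance_degradation(ap_by_verb):
    """Illustrative reading of mPD (mean Performance Degradation).

    ap_by_verb: dict mapping each verb to a list of APs, one per object
    composed with that verb. For each verb, measure how far its
    compositions fall below the verb's best composition, then average
    over verbs. The paper's exact definition may differ; this sketch
    only conveys the "gap among compositions of the same verb" idea.
    """
    per_verb = []
    for verb, scores in ap_by_verb.items():
        best = max(scores)
        if best > 0:
            per_verb.append(sum(best - s for s in scores) / (len(scores) * best))
    return sum(per_verb) / max(len(per_verb), 1)
```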
10. Zhong, Xubin, Changxing Ding, Xian Qu, and Dacheng Tao. "Polysemy Deciphering Network for Robust Human–Object Interaction Detection." International Journal of Computer Vision 129, no. 6 (April 19, 2021): 1910–29. http://dx.doi.org/10.1007/s11263-021-01458-8.

11. Su, Zhan, Yuting Wang, Qing Xie, and Ruiyun Yu. "Pose graph parsing network for human-object interaction detection." Neurocomputing 476 (March 2022): 53–62. http://dx.doi.org/10.1016/j.neucom.2021.12.085.

12. Liu, Lu, and Robby T. Tan. "Human object interaction detection using two-direction spatial enhancement and exclusive object prior." Pattern Recognition 124 (April 2022): 108438. http://dx.doi.org/10.1016/j.patcog.2021.108438.

13. Yuan, Hangjie, Mang Wang, Dong Ni, and Liangpeng Xu. "Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3206–14. http://dx.doi.org/10.1609/aaai.v36i3.20229.

Abstract:
Human-Object Interaction (HOI) detection is an essential task for understanding human-centric images from a fine-grained perspective. Although end-to-end HOI detection models thrive, their paradigm of parallel human/object detection and verb class prediction loses the two-stage methods' merit: the object-guided hierarchy. The object in an HOI triplet gives direct clues to the verb to be predicted. In this paper, we aim to boost end-to-end models with object-guided statistical priors. Specifically, we propose to utilize a Verb Semantic Model (VSM) and use semantic aggregation to profit from this object-guided hierarchy. A Similarity KL (SKL) loss is proposed to optimize the VSM to align with the HOI dataset's priors. To overcome the static-semantic-embedding problem, we propose to generate cross-modality-aware visual and semantic features via Cross-Modal Calibration (CMC). Combined, the above modules compose the Object-guided Cross-modal Calibration Network (OCN). Experiments conducted on two popular HOI detection benchmarks demonstrate the significance of incorporating this statistical prior knowledge and produce state-of-the-art performance. More detailed analysis indicates that the proposed modules serve as a stronger verb predictor and a superior way of utilizing prior knowledge. The code is available at https://github.com/JacobYuan7/OCN-HOI-Benchmark.
14. Wang, Haoran, Licheng Jiao, Fang Liu, Lingling Li, Xu Liu, Deyi Ji, and Weihao Gan. "IPGN: Interactiveness Proposal Graph Network for Human-Object Interaction Detection." IEEE Transactions on Image Processing 30 (2021): 6583–93. http://dx.doi.org/10.1109/tip.2021.3096333.

15. Xu, Bingjie, Junnan Li, Yongkang Wong, Qi Zhao, and Mohan S. Kankanhalli. "Interact as You Intend: Intention-Driven Human-Object Interaction Detection." IEEE Transactions on Multimedia 22, no. 6 (June 2020): 1423–32. http://dx.doi.org/10.1109/tmm.2019.2943753.

16. Lim, JunYi, Vishnu Monn Baskaran, Joanne Mun-Yee Lim, KokSheik Wong, John See, and Massimo Tistarelli. "ERNet: An Efficient and Reliable Human-Object Interaction Detection Network." IEEE Transactions on Image Processing 32 (2023): 964–79. http://dx.doi.org/10.1109/tip.2022.3231528.

17. Shieh, Ming-Yuan, Chung-Yu Hsieh, and Tsung-Min Hsieh. "Fuzzy visual detection for human-robot interaction." Engineering Computations 31, no. 8 (October 28, 2014): 1709–19. http://dx.doi.org/10.1108/ec-11-2012-0292.

Abstract:
Purpose – The purpose of this paper is to propose a fast object detection algorithm based on structured light analysis, which aims to detect and recognize human gestures and poses and then derive the corresponding commands for human-robot interaction control. Design/methodology/approach – Human poses are estimated and analyzed by the proposed scheme, and the resulting data, interpreted by a fuzzy decision-making system, are used to launch the corresponding robotic motions. An RGB camera and an infrared light module are used to estimate the distance of one or more bodies. Findings – The modules provide not only image perception but also skeleton detection: a laser source in the infrared light module emits invisible infrared light, which passes through a filter and is scattered into a semi-random but constant pattern of small dots projected onto the environment in front of the sensor. The reflected pattern is then detected by an infrared camera and analyzed for depth estimation. Since the depth of an object is a key parameter for pose recognition, the system estimates the distance to each dot and computes depth information from the distance between emitter and receiver. Research limitations/implications – Future work will consider reducing the computation time of the object estimation and tuning parameters adaptively. Practical implications – The experimental results demonstrate the feasibility of the proposed system. Originality/value – This paper achieves real-time human-robot interaction by visual detection based on structured light analysis.
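The depth-estimation step the Findings section describes reduces to standard triangulation for structured-light sensors. The abstract does not give the exact geometry, but the textbook relation between disparity, focal length, and the emitter-camera baseline looks like this:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard structured-light/stereo triangulation.

    disparity_px: observed shift of a projected dot (pixels)
    focal_px:     camera focal length (pixels)
    baseline_m:   emitter-to-camera distance (metres)
    Depth grows as the observed disparity shrinks.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```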
18. Lee, Geonu, Kimin Yun, and Jungchan Cho. "Improved Human-Object Interaction Detection Through On-the-Fly Stacked Generalization." IEEE Access 9 (2021): 34251–63. http://dx.doi.org/10.1109/access.2021.3061208.

19. Ishii, Masaki, Tatsuki Ishijima, and Shinya Fujino. "Basic Study of Object Recognition Based on Detection of Human Interaction." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 30, no. 5 (October 15, 2018): 675–81. http://dx.doi.org/10.3156/jsoft.30.5_675.

20. Li, Zhimin, Cheng Zou, Yu Zhao, Boxun Li, and Sheng Zhong. "Improving Human-Object Interaction Detection via Phrase Learning and Label Composition." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 1509–17. http://dx.doi.org/10.1609/aaai.v36i2.20041.

Abstract:
Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding. We propose PhraseHOI, containing an HOI branch and a novel phrase branch, to leverage language priors and improve relation expression. Specifically, the phrase branch is supervised by semantic embeddings whose ground truths are automatically converted from the original HOI annotations without extra human effort. Meanwhile, a novel label composition method is proposed to deal with the long-tailed problem in HOI, which composes novel phrase labels from semantic neighbors. Further, to optimize the phrase branch, a loss composed of a distilling loss and a balanced triplet loss is proposed. Extensive experiments are conducted to prove the effectiveness of the proposed PhraseHOI, which achieves significant improvement over the baseline and surpasses previous state-of-the-art methods on the Full and NonRare settings of the challenging HICO-DET benchmark.
21. Kim, Daesik, Gyujeong Lee, Jisoo Jeong, and Nojun Kwak. "Tell Me What They're Holding: Weakly-Supervised Object Detection with Transferable Knowledge from Human-Object Interaction." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11246–53. http://dx.doi.org/10.1609/aaai.v34i07.6784.

Abstract:
In this work, we introduce a novel weakly supervised object detection (WSOD) paradigm to detect objects belonging to rare classes with few examples, using transferable knowledge from human-object interactions (HOI). While WSOD shows lower performance than full supervision, we focus on HOI as the main context, which can strongly supervise complex semantics in images. We propose a novel module called RRPN (relational region proposal network), which outputs an object-localizing attention map using only human poses and action verbs. In the source domain, we fully train an object detector and the RRPN with full supervision of HOI. With the localization knowledge transferred from the trained RRPN, a new object detector can learn unseen objects with weak verbal supervision of HOI, without bounding box annotations, in the target domain. Because the RRPN is designed as an add-on, it can be applied not only to object detection but also to other domains such as semantic segmentation. The experimental results on the HICO-DET dataset show that the proposed method can be a cheap alternative to the current supervised object detection paradigm. Moreover, qualitative results demonstrate that our model properly localizes unseen objects on the HICO-DET and V-COCO datasets.
22. Fang, Hao-Shu, Yichen Xie, Dian Shao, Yong-Lu Li, and Cewu Lu. "DecAug: Augmenting HOI Detection via Decomposition." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1300–1308. http://dx.doi.org/10.1609/aaai.v35i2.16218.

Abstract:
Human-object interaction (HOI) detection requires a large amount of annotated data, and current algorithms suffer from insufficient training samples and category imbalance within datasets. To increase data efficiency, in this paper we propose an efficient and effective data augmentation method called DecAug for HOI detection. Based on our proposed object state similarity metric, object patterns across different HOIs are shared to augment local object appearance features without changing their states. Further, we shift the spatial correlation between humans and objects to other feasible configurations with the aid of a pose-guided Gaussian Mixture Model while preserving their interactions. Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on the V-COCO and HICO-DET datasets for two advanced models. Specifically, interactions with fewer samples enjoy more notable improvement. Our method can be easily integrated into various HOI detection models with negligible extra computational consumption.
23. Saad, Aldosary, and Abdallah A. Mohamed. "An integrated human computer interaction scheme for object detection using deep learning." Computers & Electrical Engineering 96 (December 2021): 107475. http://dx.doi.org/10.1016/j.compeleceng.2021.107475.

24. Siadari, Thomhert S., Mikyong Han, and Hyunjin Yoon. "Three‐stream network with context convolution module for human–object interaction detection." ETRI Journal 42, no. 2 (April 2020): 230–38. http://dx.doi.org/10.4218/etrij.2019-0230.

25. Rehman, Amjad, Tanzila Saba, Muhammad Zeeshan Khan, Robertas Damaševičius, and Saeed Ali Bahaj. "Internet-of-Things-Based Suspicious Activity Recognition Using Multimodalities of Computer Vision for Smart City Security." Security and Communication Networks 2022 (October 5, 2022): 1–12. http://dx.doi.org/10.1155/2022/8383461.

Abstract:
Automatic human activity recognition is one of the milestones of smart city surveillance projects. Human activity detection and recognition aim to identify activities based on observations of the subject's actions. Hence, vision-based human activity recognition systems have wide scope in video surveillance, health care systems, and human-computer interaction. Currently, the world is moving towards the smart and safe city concept, and automatic human activity recognition is a major challenge of smart city surveillance. The proposed research employs a fine-tuned YOLO-v4 for activity detection, whereas a 3D-CNN is implemented for classification. Besides classification, the presented model also leverages human-object interaction with the help of intersection over union (IoU). An Internet of Things (IoT) based architecture is implemented to make efficient, real-time decisions. The dataset of exploit classes for activity recognition was taken from the UCF-Crime dataset, while a dataset extracted from MS-COCO is used for suspicious object detection in the human-object interaction step. The approach is also applied to human activity detection and recognition on university premises for real-time suspicious activity detection and automatic alerts. The experiments show that the proposed multimodal approach achieves remarkable activity detection and recognition accuracy.
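The human-object association step here relies on plain intersection over union between detected person and object boxes. A minimal implementation of the measure the abstract names (the corner-coordinate box format is an assumption):

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```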
26. Moutik, Oumaima, Smail Tigani, Rachid Saadane, and Abdellah Chehri. "Hybrid Deep Learning Vision-based Models for Human Object Interaction Detection by Knowledge Distillation." Procedia Computer Science 192 (2021): 5093–103. http://dx.doi.org/10.1016/j.procs.2021.09.287.

27. Neha B., Naveen V., and Angelin Gladston. "Detection of Hands for Hand-Controlled Skyfall Game in Real Time Using CNN." International Journal of Interactive Communication Systems and Technologies 10, no. 2 (July 2020): 15–25. http://dx.doi.org/10.4018/ijicst.2020070102.

Abstract:
As human-computer interaction technology evolves, direct use of the hand as an input device is widely attractive. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection. This paper focuses on creating a hand-controlled web-based skyfall game by building real-time hand detection with a CNN-based technique. A CNN that uses MobileNet as the feature extractor, within the single-shot detector (SSD) framework, achieves robust and fast detection and tracking of hand location. Along with detection and tracking of the hand, the skyfall game has been designed to be played by hand in real time with the TensorFlow framework. Designing the game so that the hand controls the paddle improved player interaction and interest in playing the game. The CNN was trained on the EgoHands dataset for detecting and tracking hands in real time and produced an average accuracy of 0.9 for open hands and 0.6 for closed hands, which in turn improved player and game interactions.
28. Choi, Yujin, Wookho Son, and Yoon Sang Kim. "A Study on Interaction Prediction for Reducing Interaction Latency in Remote Mixed Reality Collaboration." Applied Sciences 11, no. 22 (November 12, 2021): 10693. http://dx.doi.org/10.3390/app112210693.

Abstract:
Various studies on latency in remote mixed reality collaboration (remote MR collaboration) have been conducted, but studies related to interaction latency are scarce. Interaction latency in a remote MR collaboration occurs because detecting the action (such as contact or collision) between a human and a virtual object is required for identifying the interaction performed. Therefore, in this paper, we propose a method based on interaction prediction to reduce the time for detecting the action between humans and virtual objects. The proposed method predicts an interaction based on consecutive joint angles. To examine the effectiveness of the proposed method, an experiment was conducted; the results confirmed that the proposed method can reduce the interaction latency compared to conventional methods.
29. Maraghi, Vali Ollah, and Karim Faez. "Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning." Computational Intelligence and Neuroscience 2021 (June 9, 2021): 1–15. http://dx.doi.org/10.1155/2021/9922697.

Abstract:
Recognition of human activities is an essential field in computer vision, and most human activity consists of interaction between humans and objects. Many successful works on human-object interaction (HOI) recognition have achieved acceptable results in recent years. Still, they are fully supervised and need labeled training data for all HOIs. Due to the enormous space of human-object interactions, listing and providing training data for all possible categories is costly and impractical. We propose an approach for scaling human-object interaction recognition in video data through zero-shot learning to solve this problem. Our method recognizes a verb and an object from the video and composes them into an HOI class. Recognizing verbs and objects instead of whole HOIs allows new combinations of verbs and objects to be identified, so an HOI class unseen by the recognizer can still be detected. We introduce a neural network architecture that can understand and represent the video data. The proposed system learns verbs and objects from the available training data and identifies verb-object pairs in a video at test time, so it can identify HOI classes composed of novel combinations of objects and verbs. We also propose to use lateral information to combine the verbs and objects into valid verb-object pairs; this helps prevent the detection of rare and probably wrong HOIs. The lateral information comes from word embedding techniques. Furthermore, we propose a new feature aggregation method for aggregating high-level features extracted from video frames before feeding them to the classifier, and we show that this aggregation is more effective for actions that include multiple sub-actions. We evaluated our system on the recently introduced and challenging Charades dataset, which contains many HOI categories in videos, and show that it can detect unseen HOI classes in addition to acceptable recognition of seen types. Therefore, the number of classes identifiable by the system is greater than the number of classes used for training.
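The "lateral information" filter can be pictured as a similarity test between verb and object word embeddings. The sketch below is an assumed rendering of that idea: the cosine-similarity threshold rule and the 0.3 value are illustrative, since the abstract describes the mechanism only at a high level.

```python
import numpy as np

def plausible_pairs(verb_vecs, obj_vecs, threshold=0.3):
    """Filter verb-object pairs by word-embedding similarity.

    verb_vecs / obj_vecs: dicts mapping words to embedding vectors
    (e.g., from word2vec or GloVe). Pairs whose cosine similarity
    falls below the threshold are treated as implausible HOIs; the
    thresholding rule and value are assumptions for illustration.
    """
    pairs = []
    for verb, v in verb_vecs.items():
        for obj, o in obj_vecs.items():
            sim = float(np.dot(v, o) / (np.linalg.norm(v) * np.linalg.norm(o)))
            if sim >= threshold:
                pairs.append((verb, obj, sim))
    return pairs
```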
30. Tsai, Tsung Han, Chung-Yuan Lin, and Sz-Yan Li. "Algorithm and Architecture Design of Human–Machine Interaction in Foreground Object Detection With Dynamic Scene." IEEE Transactions on Circuits and Systems for Video Technology 23, no. 1 (January 2013): 15–29. http://dx.doi.org/10.1109/tcsvt.2012.2202193.

31. Wu, Haibin, Bingying Zheng, Haomiao Wang, and Jinhua Ye. "New Flexible Tactile Sensor Based on Electrical Impedance Tomography." Micromachines 13, no. 2 (January 26, 2022): 185. http://dx.doi.org/10.3390/mi13020185.

Abstract:
In order to obtain external information and ensure the safety of human–computer interaction, a tactile sensor with a double sensitive-layer structure is proposed in this paper. Based on the EIT (Electrical Impedance Tomography) method, the sensor converts information from external collisions or contact into local conductivity changes and realizes the detection of one or more contact points. These changes can be processed into an image containing position and force information. Experiments were conducted on an actual sensor sample. The OpenCV toolkit was used to process the position information of contact points; the distribution of errors in position detection was analyzed, and the accuracy of position detection was evaluated. The effectiveness, sensitivity, and contact area of the force detection were analyzed based on the results of the EIT calculations. Furthermore, multi-object pressure tests were conducted. The experimental results indicate that the proposed sensor performs well in detecting the position and force of contact and is suitable for human–robot interaction.
32. Liu, Sheng, Yangqing Wang, Fengji Dai, and Jingxiang Yu. "Simultaneous 3D Motion Detection, Long-Term Tracking and Model Reconstruction for Multi-Objects." International Journal of Humanoid Robotics 16, no. 04 (August 2019): 1950017. http://dx.doi.org/10.1142/s0219843619500178.

Abstract:
Motion detection and object tracking play important roles in unsupervised human–machine interaction systems. Nevertheless, the human–machine interaction becomes invalid when the system fails to detect scene objects correctly due to occlusion and a limited field of view; thus, robust long-term tracking of scene objects is vital. In this paper, we present a 3D motion detection and long-term tracking system with simultaneous 3D reconstruction of dynamic objects. In order to achieve high-precision motion detection, the proposed method provides an optimization framework with a novel motion pose estimation energy function, by which the 3D motion pose of each object can be estimated independently. We also develop an accurate object-tracking method that combines 2D visual information and depth, and we incorporate a novel boundary-optimization segmentation based on 2D visual information and depth to improve the robustness of tracking significantly. Besides, we introduce a new fusion and updating strategy in the 3D reconstruction process, which brings higher robustness to 3D motion detection. Experimental results show that, for synthetic sequences, the root-mean-square error (RMSE) of our system is much smaller than that of Co-Fusion (CF); our system performs extremely well in 3D motion detection accuracy. In the case of occlusion or out-of-view scenarios on real scene data, CF suffers loss of tracking or object-label changes; by contrast, our system keeps tracking robustly and maintains the correct labels for each dynamic object. Therefore, our system is robust to occlusion and out-of-view application scenarios.
33. Moldovan, Constantin Catalin, and Ionel Staretu. "Real-Time Gesture Recognition for Controlling a Virtual Hand." Advanced Materials Research 463-464 (February 2012): 1147–50. http://dx.doi.org/10.4028/www.scientific.net/amr.463-464.1147.

Abstract:
Object tracking in three-dimensional environments is an area of research that has attracted a lot of attention lately for its potential regarding the interaction between man and machine. Hand gesture detection and recognition, in real time, from a video stream plays a significant role in human-computer interaction and, in current digital image processing applications, represents a difficult task. This paper aims to present a new method for human hand control in virtual environments that eliminates the need for the external devices currently used for hand motion capture and digitization. A first step in this direction is the detection of the human hand, followed by the detection of gestures and their use to control a virtual hand in a virtual environment.
34. Feng, Kai Ping, Ke Wan, and Na Luo. "Natural Gesture Recognition Based on Motion Detection and Skin Color." Applied Mechanics and Materials 321-324 (June 2013): 974–79. http://dx.doi.org/10.4028/www.scientific.net/amm.321-324.974.

Abstract:
With the development of virtual reality and next-generation human-machine interaction technology, this paper focuses on object motion detection and skin color analysis and provides a hand gesture segmentation method based on a single camera. The method captures images from the camera and detects the moving object using the time-difference method and a Gaussian model, tracking the hand motion region in real time; it then segments the hand gesture using the skin color features of the specified region after the hand region is extracted. Using both motion detection and skin color features, static gestures are recognized by template matching after extracting the features of the static gesture contour. Experiments show that the segmentation is effective and yields good recognition results.
35. Fiedler, Marc-André, Philipp Werner, Aly Khalifa, and Ayoub Al-Hamadi. "SFPD: Simultaneous Face and Person Detection in Real-Time for Human–Robot Interaction." Sensors 21, no. 17 (September 2, 2021): 5918. http://dx.doi.org/10.3390/s21175918.

Abstract:
Face and person detection are important tasks in computer vision, as they represent the first component in many recognition systems, such as face recognition, facial expression analysis, body pose estimation, face attribute detection, or human action recognition. Their detection rate and runtime are therefore crucial for the performance of the overall system. In this paper, we combine both face and person detection in one framework, with the goal of reaching a detection performance that is competitive with the state of the art of lightweight object-specific networks while maintaining real-time processing speed for both detection tasks together. In order to combine face and person detection in one network, we applied multi-task learning. The difficulty lies in the fact that no datasets are available that contain both face and person annotations. Since we did not have the resources to annotate the datasets manually, which is very time-consuming, and automatic generation of ground truths results in annotations of poor quality, we solve this issue algorithmically by applying a special training procedure and network architecture, without the need to create new labels. Our newly developed method, called Simultaneous Face and Person Detection (SFPD), is able to detect persons and faces at 40 frames per second. Because of this good trade-off between detection performance and inference time, SFPD represents a useful and valuable real-time framework, especially for a multitude of real-world applications such as human–robot interaction.
36. Diete, Alexander, and Heiner Stuckenschmidt. "Fusing Object Information and Inertial Data for Activity Recognition." Sensors 19, no. 19 (September 23, 2019): 4119. http://dx.doi.org/10.3390/s19194119.

Abstract:
In the field of pervasive computing, wearable devices have been widely used for recognizing human activities. One important area in this research is the recognition of activities of daily living, where inertial sensors and interaction sensors (like RFID tags with scanners) in particular are popular choices as data sources. Using interaction sensors, however, has one drawback: they may not differentiate between proper interaction and simple touching of an object. A positive signal from an interaction sensor is not necessarily caused by a performed activity, e.g., when an object is only touched but no interaction occurs afterwards. There are, however, many scenarios, like medicine intake, that rely heavily on correctly recognized activities. In our work, we aim to address this limitation and present a multimodal egocentric activity recognition approach. Our solution relies on object detection that recognizes activity-critical objects in a frame. As it is infeasible to always expect a high-quality camera view, we enrich the vision features with inertial sensor data that monitors the user's arm movement. This way we try to overcome the drawbacks of each respective sensor. We present our results of combining inertial and video features to recognize human activities in different types of scenarios, where we achieve an F1-measure of up to 79.6%.
37. Xu, Jun, Yanxin Ma, Songhua He, and Jiahua Zhu. "3D-GIoU: 3D Generalized Intersection over Union for Object Detection in Point Cloud." Sensors 19, no. 19 (September 22, 2019): 4093. http://dx.doi.org/10.3390/s19194093.

Abstract:
Three-dimensional (3D) object detection is an important research area in 3D computer vision with significant applications in many fields, such as automatic driving, robotics, and human–computer interaction. However, low precision is an urgent problem in the field of 3D object detection. To address it, we present a framework for 3D object detection in point clouds. Specifically, a designed backbone network fuses low-level and high-level features, making full use of the advantages of various kinds of information. Moreover, the two-dimensional (2D) Generalized Intersection over Union is extended to 3D for use as part of the loss function in our framework. Experiments on Car, Cyclist, and Pedestrian detection were conducted on the KITTI benchmark, and the experimental results in terms of average precision (AP) show the effectiveness of the proposed network.
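For reference, the standard 2D Generalized IoU that this paper extends to 3D subtracts from IoU the fraction of the smallest enclosing box not covered by the union of the two boxes; a 2D sketch (the 3D extension in the paper replaces areas with volumes):

```python
def giou(a, b):
    """Standard 2D Generalized IoU; the paper extends the idea to 3D.

    GIoU = IoU minus the fraction of the smallest enclosing box C
    that is not covered by the union of A and B.
    Boxes are (x1, y1, x2, y2).
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou_val = inter / union if union > 0 else 0.0
    # Smallest enclosing box C penalizes distant, non-overlapping boxes.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    if c_area <= 0:
        return iou_val
    return iou_val - (c_area - union) / c_area
```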
38. Choi, Seong-Wook, Kiho Seong, Sukho Lee, Kwang-Hyun Baek, and Yong Shim. "Noise Immunity-Enhanced Capacitance Readout Circuit for Human Interaction Detection in Human Body Communication Systems." Electronics 11, no. 4 (February 14, 2022): 577. http://dx.doi.org/10.3390/electronics11040577.

Abstract:
Recent healthcare systems based on human body communication (HBC) require human interaction sensors. Due to the conductive properties of the human body, capacitive sensors are the most widely known and are applied in many electronic gadgets for communication. Capacitance fluctuations due to human interaction are typically converted to voltage levels using analog circuits, and then analog-to-digital converters (ADCs) convert the analog voltages into digital codes for further processing. However, signals produced by human touch naturally contain large noise, so an active analog filter that consumes a lot of power is required; in addition, the inclusion of ADCs causes the system to use a large area and a large amount of power. The proposed structure adopts a digital moving average filter (MAF) that can effectively operate as a low-pass filter (LPF) instead of a large-area, high-power analog filter. In addition, the proposed ∆C detection algorithm can distinguish between human interaction and object interaction. As a result, two individual digital signals, touch/release and movement, can be generated, and the type and strength of a touch can be effectively expressed without the help of an ADC. A prototype chip of the proposed capacitive sensing circuit was fabricated in a commercial 65 nm CMOS process, and its functionality was fully verified through testing and measurement. The prototype core occupies an active area of 0.0067 mm2, consumes 7.5 µW of power, and has a conversion time of 105 ms.
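The moving average filter the abstract adopts is the textbook digital low-pass: each output is the mean of the last N input samples. A minimal software sketch (the tap count is an assumption; the chip's actual parameters may differ):

```python
from collections import deque

class MovingAverageFilter:
    """N-tap moving average, a simple digital low-pass filter.

    y[n] = (x[n] + x[n-1] + ... + x[n-N+1]) / N
    A small N tracks the signal closely; a large N suppresses more
    noise at the cost of a slower response to real capacitance changes.
    """
    def __init__(self, taps=8):
        self.window = deque(maxlen=taps)

    def update(self, sample):
        self.window.append(sample)
        return sum(self.window) / len(self.window)
```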
39. Ye, Qing, and Yong Mei Zhang. "Moving Object Detection and Tracking Algorithm Based on Background Subtraction." Applied Mechanics and Materials 263-266 (December 2012): 2211–16. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.2211.

Abstract:
Moving object detection and tracking, a core issue in computer vision and human-computer interaction, is the first step of an intelligent video surveillance system. After comparing the temporal difference method and background subtraction, a moving object detection and tracking algorithm based on background subtraction under a static background is proposed, in order to quickly and accurately detect and identify moving objects in an intelligent monitoring system. In this algorithm, we first use a background acquisition method to obtain the background image, then perform background subtraction between the current frame and the acquired background image to extract foreground object information and produce a difference image. Second, we apply threshold segmentation and morphological image processing to the difference image to eliminate noise and obtain a clean binary image of the moving object. Finally, we use centroid tracking to track and mark the moving object. Experimental results show that the algorithm can effectively and quickly detect and track moving objects in video sequences under a static background. The algorithm is easy to implement, runs in real time, and is robust, with automated, self-triggered background updating. It can be used in driver assistance systems, motion capture, virtual reality, and other fields.
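The pipeline this abstract walks through maps almost step-for-step onto common OpenCV primitives. The sketch below is an illustrative reconstruction, not the authors' code: it assumes a pre-acquired grayscale background frame, and the threshold and kernel sizes are placeholder values.

```python
import cv2
import numpy as np

def detect_moving_object(frame, background, thresh=30):
    """Background subtraction -> threshold -> morphology -> centroid.

    frame, background: grayscale images of identical size. Returns the
    (x, y) centroid of the detected moving region, or None if empty.
    """
    diff = cv2.absdiff(frame, background)                 # background subtraction
    _, binary = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((5, 5), np.uint8)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # remove noise
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # fill holes
    m = cv2.moments(binary)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])   # centroid
```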
40. Wersing, Heiko, Stephan Kirstein, Michael Götting, Holger Brandl, Mark Dunn, Inna Mikhailova, Christian Goerick, Jochen Steil, Helge Ritter, and Edgar Körner. "Online Learning of Objects in a Biologically Motivated Visual Architecture." International Journal of Neural Systems 17, no. 04 (August 2007): 219–30. http://dx.doi.org/10.1142/s0129065707001081.

Abstract:
We present a biologically motivated architecture for object recognition that is capable of online learning of several objects based on interaction with a human teacher. The system combines biological principles such as appearance-based representation in topographical feature detection hierarchies and context-driven transfer between different levels of object memory. Training can be performed in an unconstrained environment by presenting objects in front of a stereo camera system and labeling them by speech input. The learning is fully online and thus avoids an artificial separation of the interaction into training and test phases. We demonstrate the performance on a challenging ensemble of 50 objects.
41. Przybylo, Jaromir. "Object detection and tracking for low-cost video surveillance system." Image Processing & Communications 18, no. 2-3 (December 1, 2013): 91–99. http://dx.doi.org/10.2478/v10248-012-0083-2.

Abstract:
Automated and intelligent video surveillance systems play an important role in the modern world. Since the number of video streams that must be analyzed keeps growing, such artificial intelligence systems can assist humans in performing tiresome tasks. As a result, the effectiveness of the response to dangerous situations increases (detecting unexpected movement or unusual behavior that may pose a threat to people, property, and infrastructure). Video surveillance systems have to meet several requirements: they must be accurate and not produce too many false alarms, and they must be able to process the received video stream in real time to provide a sufficient response time. The work presented here focuses on selected challenges of scene analysis in video surveillance systems (object detection/tracking and the effectiveness of the whole system). The aim of the research is to design a low-budget surveillance system that can be used, for example, in home security monitoring. Such a solution can be used not only for surveillance but also to monitor an elderly person at home or to provide new ways of interacting in human-computer interaction systems.
42. WooJang, Seok, and Siwoo Byun. "Facial region detection robust to changing backgrounds." International Journal of Engineering & Technology 7, no. 2.12 (April 3, 2018): 25. http://dx.doi.org/10.14419/ijet.v7i2.12.11028.

Abstract:
Background/Objectives: Many studies have recently been conducted on intelligent robots capable of providing human-friendly service. For natural interaction between humans and robots, mobile-robot-based technology is required that detects human facial regions robustly against dynamically changing real backgrounds. Methods/Statistical analysis: This paper proposes a method for detecting facial regions adaptively through mobile-robot-based monitoring of backgrounds in a dynamic real environment. In the proposed method, the camera-object distance and color changes in the object background are monitored, and the skin color extraction algorithm most suitable for the measured distance and color is applied. In the face detection step, if the searched range is valid, the most suitable skin color detection method is selected to detect facial regions. Findings: The experimental results show that the algorithms differ in performance depending on distance and background color. Overall, the algorithms using a neural network showed stable results. The algorithm using Kismet had a good detection rate for the ground-truth part of an original image, but its skin color detection rate was greatly influenced by pink and yellow background colors similar to skin tone, so its false detection rate for the background was considerably high. With regard to performance as a function of distance, as the distance to the object approached 320 cm, the false detection rate for the background increased sharply. To analyze the performance of each skin color detection algorithm applied to face detection, we examined how much of the skin color of an original image was detected by each algorithm: after establishing the ground truth for the skin of the original image, the skin color detection rate was computed as the number of pixels of skin color detected by each algorithm, where the ground truth denotes the range of the skin color of the original image to detect. Improvements/Applications: We expect the proposed approach to detecting facial regions in a dynamic real environment to be used in a variety of application areas related to computer vision and image processing.
43. Naik, S. Gopi. "Weapon and Object Detection Using Mobile-Net SSD Model in Deep Neural Network." International Journal for Research in Applied Science and Engineering Technology 9, no. 8 (August 31, 2021): 1573–82. http://dx.doi.org/10.22214/ijraset.2021.37622.

Abstract:
The aim is to establish an integrated system that can manage high-quality visual information and detect weapons quickly and efficiently. This is achieved by integrating ARM-based computer vision and optimization algorithms with deep neural networks able to detect the presence of a threat. The whole system is connected to a Raspberry Pi module, which captures a live stream and evaluates it using a deep convolutional neural network. Because of the intimate interplay between object identification and real-time video and image analysis, approaches that generate sophisticated ensembles incorporating various low-level picture features with high-level information from object detection and scenario classifiers can quickly plateau in performance. Deep learning models, which can learn semantic, high-level, deeper features, have been developed to overcome the issues present in optimization algorithms. This paper presents a review of deep-learning-based object detection frameworks that use convolutional neural network layers for a better understanding of object detection. The Mobile-Net SSD model differs in network design, training methods, and optimization functions, among other things. Weapon detection has reduced the crime rate in monitored areas; however, security is always a major concern in human life, and the Raspberry Pi module with computer vision has been extensively used in the detection and monitoring of weapons. Given the growing demands of human safety, privacy, and the integration of live-streaming systems that can detect and analyse images, surveillance of suspicious areas is becoming indispensable. This process uses a Mobile-Net SSD algorithm to achieve automatic weapon and object detection. Keywords: Computer Vision, Weapon and Object Detection, Raspberry Pi Camera, RTSP, SMTP, Mobile-Net SSD, CNN, Artificial Intelligence.
44. Liu, Fang, Liang Zhao, Xiaochun Cheng, Qin Dai, Xiangbin Shi, and Jianzhong Qiao. "Fine-Grained Action Recognition by Motion Saliency and Mid-Level Patches." Applied Sciences 10, no. 8 (April 18, 2020): 2811. http://dx.doi.org/10.3390/app10082811.

Abstract:
Effective extraction of human body parts and operated objects participating in action is the key issue of fine-grained action recognition. However, most of the existing methods require intensive manual annotation to train the detectors of these interaction components. In this paper, we represent videos by mid-level patches to avoid the manual annotation, where each patch corresponds to an action-related interaction component. In order to capture mid-level patches more exactly and rapidly, candidate motion regions are extracted by motion saliency. Firstly, the motion regions containing interaction components are segmented by a threshold adaptively calculated according to the saliency histogram of the motion saliency map. Secondly, we introduce a mid-level patch mining algorithm for interaction component detection, with object proposal generation and mid-level patch detection. The object proposal generation algorithm is used to obtain multi-granularity object proposals inspired by the idea of the Huffman algorithm. Based on these object proposals, the mid-level patch detectors are trained by K-means clustering and SVM. Finally, we build a fine-grained action recognition model using a graph structure to describe relationships between the mid-level patches. To recognize actions, the proposed model calculates the appearance and motion features of mid-level patches and the binary motion cooperation relationships between adjacent patches in the graph. Extensive experiments on the MPII cooking database demonstrate that the proposed method gains better results on fine-grained action recognition.
45. Patil, Rupali, Adhish Velingkar, Mohammad Nomaan Parmar, Shubham Khandhar, and Bhavin Prajapati. "Machine Vision Enabled Bot for Object Tracking." JINAV: Journal of Information and Visualization 1, no. 1 (October 1, 2020): 15–26. http://dx.doi.org/10.35877/454ri.jinav155.

Abstract:
Object detection and tracking are essential and challenging tasks in numerous computer vision applications. To detect an object, the system first has to gather information about it. In this design, the robot can detect an object and track it, turning left and right and then moving forward and backward depending on the object's motion, while maintaining a constant separation between the object and itself. We have designed a webpage that displays a live feed from the camera, and the camera can be controlled efficiently by the user. Machine learning is implemented for detection together with OpenCV, and cloud storage is created. A pan-tilt mechanism, attached to our three-wheel chassis robot through servo motors, is used for camera control. This idea can be used for surveillance, monitoring local premises, and human-machine interaction.
46. Achirei, Stefan-Daniel, Mihail-Cristian Heghea, Robert-Gabriel Lupu, and Vasile-Ion Manta. "Human Activity Recognition for Assisted Living Based on Scene Understanding." Applied Sciences 12, no. 21 (October 24, 2022): 10743. http://dx.doi.org/10.3390/app122110743.

Abstract:
The growing share of the population over the age of 65 is putting pressure on the social health insurance system, especially on institutions that provide long-term care services for the elderly or to people who suffer from chronic diseases or mental disabilities. This pressure can be reduced through the assisted living of the patients, based on an intelligent system for monitoring vital signs and home automation. In this regard, since 2008, the European Commission has financed the development of medical products and services through the ambient assisted living (AAL) program—Ageing Well in the Digital World. The SmartCare Project, which integrates the proposed Computer Vision solution, follows the European strategy on AAL. This paper presents an indoor human activity recognition (HAR) system based on scene understanding. The system consists of a ZED 2 stereo camera and a NVIDIA Jetson AGX processing unit. The recognition of human activity is carried out in two stages: all humans and objects in the frame are detected using a neural network, then the results are fed to a second network for the detection of interactions between humans and objects. The activity score is determined based on the human–object interaction (HOI) detections.
47. Kubota, Naoyuki, Hiroyuki Kojima, Naohide Aizawa, and Dalai Tang. "Dynamic Topological Visualization of Change in Perceptual Information of Partner Robots." International Journal of Information Acquisition 05, no. 03 (September 2008): 247–58. http://dx.doi.org/10.1142/s0219878908001673.

Abstract:
This paper proposes a method for topologically visualizing the perceptual information of a partner robot. First, we explain the methods for human detection, human motion extraction, and object recognition. Next, we explain the perceptual system of the robot based on the detected human and objects. We propose a topological visualization method based on a spring-mass-damper system driven by the perceptual information. Finally, we show several experimental results of the proposed method, which enables a human to understand what the robot perceives in its interaction with the human and the environment.
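A spring-mass-damper layout can be sketched as a simple per-step force integration. The following is a hypothetical illustration of the idea only: unit masses, explicit Euler integration, and the interpretation of the rest lengths as pairwise distances derived from the perceptual data are all assumptions, and the paper's actual dynamics may differ.

```python
import numpy as np

def spring_step(pos, vel, rest_len, k=1.0, c=0.5, dt=0.02):
    """One explicit-Euler step of a spring-mass-damper layout.

    pos: (N, 2) node positions; vel: (N, 2) node velocities;
    rest_len: (N, N) desired pairwise distances (assumed to encode the
    perceptual information). Each pair is pulled toward its rest length
    by a spring (stiffness k) and slowed by damping c; masses are 1.
    """
    force = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = pos[j] - pos[i]
            dist = np.linalg.norm(d) + 1e-9
            force[i] += k * (dist - rest_len[i, j]) * d / dist  # spring pull
    force -= c * vel                                            # damping
    vel = vel + force * dt
    pos = pos + vel * dt
    return pos, vel
```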
48. Zhong, Ming, Yanqiang Zhang, Xi Yang, Yufeng Yao, Junlong Guo, Yaping Wang, and Yaxin Liu. "Assistive Grasping Based on Laser-point Detection with Application to Wheelchair-mounted Robotic Arms." Sensors 19, no. 2 (January 14, 2019): 303. http://dx.doi.org/10.3390/s19020303.

Abstract:
As the aging of the population becomes more severe, wheelchair-mounted robotic arms (WMRAs) are gaining an increased amount of attention. Laser pointer interactions are an attractive method enabling humans to unambiguously point out objects and pick them up. In addition, they bring about a greater sense of participation in the interaction process as an intuitive interaction mode. However, the issue of human–robot interactions remains to be properly tackled, and traditional laser point interactions still suffer from poor real-time performance and low accuracy amid dynamic backgrounds. In this study, combined with an advanced laser point detection method and an improved pose estimation algorithm, a laser pointer is used to facilitate the interactions between humans and a WMRA in an indoor environment. Assistive grasping using a laser selection consists of two key steps. In the first step, the images captured using an RGB-D camera are pre-processed, and then fed to a convolutional neural network (CNN) to determine the 2D coordinates of the laser point and objects within the image. Meanwhile, the centroid coordinates of the selected object are also obtained using the depth information. In this way, the object to be picked up and its location are determined. The experimental results show that the laser point can be detected with almost 100% accuracy in a complex environment. In the second step, a compound pose-estimation algorithm aiming at a sparse use of multi-view templates is applied, which consists of both coarse- and precise-matching of the target to the template objects, greatly improving the grasping performance. The proposed algorithms were implemented on a Kinova Jaco robotic arm, and the experimental results demonstrate their effectiveness. Compared with commonly accepted methods, the time consumption of the pose generation can be reduced from 5.36 to 4.43 s, and synchronously, the pose estimation error is significantly improved from 21.31% to 3.91%.
49. Tang, Zhiyun. "Intelligent Target Detection and Tracking Algorithm for Martial Arts Applications." Wireless Communications and Mobile Computing 2022 (March 23, 2022): 1–10. http://dx.doi.org/10.1155/2022/7008467.

Abstract:
Moving object detection and tracking is the basis and a key technology of intelligent video surveillance, human-computer interaction, mobile robot and vehicle visual navigation, industrial robot systems, and other applications, with important uses in intelligent monitoring, human-computer interaction, and visual navigation. Intelligent technology can greatly facilitate the monitoring of numerous targets. This paper adopts a random motion model to describe the motion state of the target. Building on wireless communication and information security, the paper studies the communication and information security of an intelligent system for a martial arts target detection and tracking algorithm. The basic idea of the mean-shift tracking algorithm is to climb the gradient of a probability density to find a local optimum. The paper uses background subtraction based on the ViBe algorithm to obtain a binary image of the foreground object; a shadow detection algorithm removes the shadow of the foreground image; Haar-like features are selected as the motion detection features, and the feature values of rectangular areas are quickly calculated to describe the differences between adjacent image areas; the resulting image is then passed to the intelligent system for analysis. Experimental data show that the time consumed by the tracking algorithm is less than 20 ms, which meets the real-time requirements of ordinary target tracking systems. The average processing time of the hybrid modeling method is 62.8 ms, and the detection rate is 15.92 frames/s. The results show that the algorithm improves the utilization of particles, greatly reduces complexity, and reduces the degradation of the particle filter.
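The mean-shift idea the abstract summarizes, gradient climbing on a probability density, amounts to repeatedly moving the search window to the mean of the samples inside it. A minimal flat-kernel iteration (illustrative only, not the paper's implementation):

```python
import numpy as np

def mean_shift_step(points, center, bandwidth):
    """One mean-shift iteration with a flat kernel.

    Moves `center` to the mean of all points within `bandwidth`,
    i.e., one step of gradient climbing on the sample density.
    Iterate until the shift between steps becomes negligible.
    """
    d = np.linalg.norm(points - center, axis=1)
    inside = points[d <= bandwidth]
    if len(inside) == 0:
        return center
    return inside.mean(axis=0)
```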
50. Rong, Yao, Naemi-Rebecca Kassautzki, Wolfgang Fuhl, and Enkelejda Kasneci. "Where and What." Proceedings of the ACM on Human-Computer Interaction 6, ETRA (May 13, 2022): 1–22. http://dx.doi.org/10.1145/3530887.

Abstract:
Human drivers use their attentional mechanisms to focus on critical objects and make decisions while driving. As human attention can be revealed from gaze data, capturing and analyzing gaze information has emerged in recent years to benefit autonomous driving technology. Previous works in this context have primarily aimed at predicting "where" human drivers look at and lack knowledge of "what" objects drivers focus on. Our work bridges the gap between pixel-level and object-level attention prediction. Specifically, we propose to integrate an attention prediction module into a pretrained object detection framework and predict the attention in a grid-based style. Furthermore, critical objects are recognized based on predicted attended-to areas. We evaluate our proposed method on two driver attention datasets, BDD-A and DR(eye)VE. Our framework achieves competitive state-of-the-art performance in the attention prediction on both pixel-level and object-level but is far more efficient (75.3 GFLOPs less) in computation.