Academic literature on the topic 'Human-object Interaction Detection'


Consult the lists of relevant articles, books, theses, conference papers, and other scholarly sources on the topic 'Human-object Interaction Detection.'


Journal articles on the topic "Human-object Interaction Detection"

1. Li, Weifeng, Hongbing Yang, Zhou Lei, and Dawei Niu. "Distance-based Human-Object Interaction Detection." Journal of Physics: Conference Series 1920, no. 1 (May 1, 2021): 012073. http://dx.doi.org/10.1088/1742-6596/1920/1/012073.
2. Zhang, Jiali, Zuriahati Mohd Yunos, and Habibollah Haron. "Interactivity Recognition Graph Neural Network (IR-GNN) Model for Improving Human–Object Interaction Detection." Electronics 12, no. 2 (January 16, 2023): 470. http://dx.doi.org/10.3390/electronics12020470.

Abstract:
Human–object interaction (HOI) detection is important for promoting the development of many fields such as human–computer interaction, service robotics, and video security surveillance. A high percentage of human–object pairs with invalid interactions are produced in the object detection phase of conventional HOI detection algorithms, resulting in inaccurate interaction detection. To recognize invalid human–object interaction pairs, this paper proposes the interactivity recognition graph neural network (IR-GNN) model, which can directly infer the probability of human–object interactions from a graph model architecture. The model consists of three modules. The first is the human posture feature module, which uses key points of the human body to construct relative spatial pose features and further facilitates the discrimination of human–object interactivity through human pose information. Second, a human–object interactivity graph module is proposed: the spatial relationship of human–object distance is used as the initialization weight of edges, and the graph is updated by attention-based message passing so that edges between interacting node pairs obtain higher weights. Third, a classification module uses a fully connected neural network to binarily classify the interactivity of human–object pairs. These three modules work in collaboration to enable effective inference of interaction possibilities. Comparative and ablation experiments on the HICO-DET and V-COCO datasets show that this technique improves the detection of human–object interactions.
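The distance-initialized, attention-updated graph described in this abstract can be illustrated with a minimal sketch. Everything below (the function name, the 1/(1+distance) edge initialization, the dot-product attention) is an assumption made for illustration, not the paper's actual implementation:

```python
import math

def interactivity_graph_step(human_feats, object_feats, pos_h, pos_o):
    """One attention-weighted message-passing step on a bipartite
    human-object graph, with edge weights initialised from spatial
    distance (a simplified sketch of the IR-GNN idea)."""
    updated = []
    for hf, hp in zip(human_feats, pos_h):
        # Edge initialization: closer human-object pairs get larger weights.
        dists = [math.dist(hp, op) for op in pos_o]
        init_w = [1.0 / (1.0 + d) for d in dists]
        # Attention: softmax over feature similarity scaled by the init weight.
        scores = [w * sum(a * b for a, b in zip(hf, of))
                  for w, of in zip(init_w, object_feats)]
        exp_s = [math.exp(s) for s in scores]
        total = sum(exp_s)
        attn = [e / total for e in exp_s]
        # Aggregate attention-weighted object messages into the human node.
        msg = [sum(a * of[k] for a, of in zip(attn, object_feats))
               for k in range(len(hf))]
        updated.append([h + m for h, m in zip(hf, msg)])
    return updated
```

In this toy setting, an object that is both nearby and feature-similar contributes most of the message, which is the intuition the abstract describes for discriminating interactive from non-interactive pairs.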
3. Wang, Chang, Jinyu Sun, Shiwei Ma, Yuqiu Lu, and Wang Liu. "Multi-stream Network for Human-object Interaction Detection." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 08 (March 12, 2021): 2150025. http://dx.doi.org/10.1142/s0218001421500257.

Abstract:
Detecting the interaction between humans and objects in images is a critical problem for obtaining a deeper understanding of the visual relationships in a scene, and a key technology in many practical applications, such as augmented reality, video surveillance and information retrieval. However, due to the fine-grained actions and objects in real scenes and the coexistence of multiple interactions in one scene, the problem is far from solved. This paper differs from prior approaches, which focused only on the features of instances, by proposing a method that utilizes a four-stream CNN network for human-object interaction (HOI) detection. More detailed visual features, spatial features and pose features from human-object pairs are extracted to solve this challenging detection task. Specifically, the core idea is that the region where people interact with objects contains important identifying cues for specific action classes, and these detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale HOI public benchmarks, V-COCO and HICO-DET, are carried out and the results show the effectiveness of the proposed method.
4. Wang, Tianlang, Tao Lu, Wenhua Fang, and Yanduo Zhang. "Human–Object Interaction Detection with Ratio-Transformer." Symmetry 14, no. 8 (August 11, 2022): 1666. http://dx.doi.org/10.3390/sym14081666.

Abstract:
Human–object interaction (HOI) is a human-centered object detection task that aims to identify the interactions between persons and objects in an image. Previous end-to-end methods have used the attention mechanism of a transformer to spontaneously identify the associations between persons and objects in an image, which effectively improved detection accuracy; however, a transformer can increase computational demands and slow down detection. In addition, the end-to-end method can result in asymmetry between foreground and background information: the foreground data may be significantly less than the background data, while the latter consumes more computational resources without significantly improving detection accuracy. Therefore, we proposed an input-controlled transformer, the "ratio-transformer", to solve the HOI task, which not only limits the amount of information fed into the transformer by setting a sampling ratio, but also significantly reduces computational demands while preserving detection accuracy. The ratio-transformer consists of a sampling module and a transformer network. The sampling module divides the input feature map into foreground and background features. The irrelevant background features are downsampled by a pooling sampler and then fused with the foreground features as input to the transformer. As a result, the amount of valid data input to the transformer network remains constant while irrelevant information is significantly reduced, which maintains the symmetry between foreground and background information. The proposed network learns the feature information of the target itself and the association features between persons and objects, so that queries can recover the complete HOI interaction triplet. Experiments on the V-COCO dataset showed that the proposed method reduced the computational demand of the transformer by 57% without any loss of accuracy, as compared to other current HOI methods.
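The sampling module's behaviour described in this abstract (split tokens by a foreground score, keep a fixed ratio, pool the rest into one token) can be sketched as follows. The function name, the score input, and the mean-pooling choice are illustrative assumptions rather than the paper's exact design:

```python
def ratio_sample_tokens(tokens, fg_scores, ratio=0.25):
    """Keep the top `ratio` fraction of feature tokens (ranked by a
    foreground score) and average-pool the remaining background tokens
    into a single token, so the transformer's input length stays small
    (a simplified sketch of the ratio-transformer sampling idea)."""
    n = len(tokens)
    k = max(1, int(n * ratio))
    # Rank token indices by foreground score, highest first.
    order = sorted(range(n), key=lambda i: fg_scores[i], reverse=True)
    out = [tokens[i] for i in order[:k]]          # kept foreground tokens
    bg = [tokens[i] for i in order[k:]]           # background tokens
    if bg:
        dim = len(tokens[0])
        # One pooled background token preserves coarse context cheaply.
        out.append([sum(t[d] for t in bg) / len(bg) for d in range(dim)])
    return out
```

With ratio=0.25, an 8-token input shrinks to 3 tokens (2 foreground plus 1 pooled background), which is the kind of fixed, ratio-controlled input the abstract describes.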
5. Xu, Kunlun, Zhimin Li, Zhijun Zhang, Leizhen Dong, Wenhui Xu, Luxin Yan, Sheng Zhong, and Xu Zou. "Effective actor-centric human-object interaction detection." Image and Vision Computing 121 (May 2022): 104422. http://dx.doi.org/10.1016/j.imavis.2022.104422.
6. Kogashi, Kaen, Yang Wu, Shohei Nobuhara, and Ko Nishino. "Human–object interaction detection with missing objects." Image and Vision Computing 113 (September 2021): 104262. http://dx.doi.org/10.1016/j.imavis.2021.104262.
7. Gao, Yiming, Zhanghui Kuang, Guanbin Li, Wayne Zhang, and Liang Lin. "Hierarchical Reasoning Network for Human-Object Interaction Detection." IEEE Transactions on Image Processing 30 (2021): 8306–17. http://dx.doi.org/10.1109/tip.2021.3093784.
8. Fang, Hao-Shu, Yichen Xie, Dian Shao, and Cewu Lu. "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1291–99. http://dx.doi.org/10.1609/aaai.v35i2.16217.

Abstract:
In recent years, human-object interaction (HOI) detection has achieved impressive advances. However, conventional two-stage methods are usually slow in inference. On the other hand, existing one-stage methods mainly focus on the union regions of interactions, which introduce unnecessary visual information as disturbances to HOI detection. To tackle the problems above, we propose DIRV, a novel one-stage HOI detection approach based on a new concept called the interaction region. Unlike previous methods, our approach concentrates on densely sampled interaction regions across different scales for each human-object pair, so as to capture the subtle visual features that are most essential to the interaction. Moreover, to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy that makes full use of overlapping interaction regions in place of conventional Non-Maximal Suppression (NMS). Extensive experiments on two popular benchmarks, V-COCO and HICO-DET, show that our approach outperforms existing state-of-the-art methods by a large margin, with the highest inference speed and the lightest network architecture. Our code is publicly available at www.github.com/MVIG-SJTU/DIRV.
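The voting idea sketched in this abstract (fusing predictions from overlapping interaction regions rather than suppressing all but one with NMS) can be illustrated roughly as below. The confidence-weighted averaging here is an assumption for illustration; the paper's actual voting scheme is more elaborate:

```python
def vote_interaction_scores(region_preds):
    """Fuse action scores from densely sampled, overlapping interaction
    regions covering one human-object pair by confidence-weighted voting,
    instead of keeping only the single highest-scoring region.
    `region_preds` is a list of (confidence, {action: score}) pairs."""
    totals, total_conf = {}, 0.0
    for conf, scores in region_preds:
        total_conf += conf
        for action, s in scores.items():
            # Each region casts a vote weighted by its own confidence.
            totals[action] = totals.get(action, 0.0) + conf * s
    return {a: v / total_conf for a, v in totals.items()}
```

The point of the sketch is that every overlapping region contributes evidence, so a flawed prediction from any single region is averaged out rather than being the sole survivor of suppression.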
9. Liu, Xinpeng, Yong-Lu Li, and Cewu Lu. "Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 1819–27. http://dx.doi.org/10.1609/aaai.v36i2.20075.

Abstract:
Human-Object Interaction (HOI) detection plays a core role in activity understanding. As a compositional learning problem (human-verb-object), studying its generalization matters. However, the widely used metric mean average precision (mAP) fails to model compositional generalization well. Thus, we propose a novel metric, mPD (mean Performance Degradation), as a complement to mAP that evaluates the performance gap among compositions of different objects and the same verb. Surprisingly, mPD reveals that previous methods usually generalize poorly. With mPD as a cue, we propose Object Category (OC) Immunity to boost HOI generalization. The idea is to prevent the model from learning spurious object-verb correlations as a shortcut to over-fit the training set. To achieve OC-immunity, we propose an OC-immune network that decouples the inputs from OC, extracts OC-immune representations, and leverages uncertainty quantification to generalize to unseen objects. In both conventional and zero-shot experiments, our method achieves decent improvements. To fully evaluate the generalization, we design a new and more difficult benchmark, on which we present a significant advantage. The code is available at https://github.com/Foruck/OC-Immunity.
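As a rough illustration of an mPD-style metric, one could average, per verb, how far each object's AP falls below the best object for that verb. This is only one plausible reading of the abstract's description; the function name and formula below are assumptions, not the paper's exact definition:

```python
def mean_performance_degradation(ap_by_verb):
    """Illustrative mPD-style gap metric over per-(verb, object) APs:
    for each verb, measure the relative shortfall of each object's AP
    below the verb's best object, then average over objects and verbs.
    `ap_by_verb` maps verb -> {object: AP}, with positive best APs."""
    per_verb = []
    for obj_aps in ap_by_verb.values():
        best = max(obj_aps.values())
        # Relative degradation of each object composition vs. the best one.
        gaps = [(best - v) / best for v in obj_aps.values()]
        per_verb.append(sum(gaps) / len(gaps))
    return sum(per_verb) / len(per_verb)
```

A model that performs uniformly across objects for the same verb scores near zero here, while a model exploiting object-verb shortcuts shows a large gap, which matches the compositional-generalization concern the abstract raises.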
10. Zhong, Xubin, Changxing Ding, Xian Qu, and Dacheng Tao. "Polysemy Deciphering Network for Robust Human–Object Interaction Detection." International Journal of Computer Vision 129, no. 6 (April 19, 2021): 1910–29. http://dx.doi.org/10.1007/s11263-021-01458-8.

Dissertations / Theses on the topic "Human-object Interaction Detection"

1. Li, Ying. "Efficient and Robust Video Understanding for Human-robot Interaction and Detection." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152207324664654.
2. Rukanskaitė, Julija. "Tuning into uncertainty : A material exploration of object detection through play." Thesis, Malmö universitet, Institutionen för konst, kultur och kommunikation (K3), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-44239.

Abstract:
The ubiquitous yet opaque logic of machine learning complicates both the design process and end-use. Because of this, much of Interaction Design and HCI now focuses on making this logic transparent through human-like explanations and tight control, while disregarding other, non-normative human-AI interactions as technical failures. In this thesis I re-frame such interactions as generative for both material exploration and user experience in non-purpose-driven applications. By expanding on the notion of machine learning uncertainty with play, queering, and more-than-human design, I try to understand them in a designerly way. This re-framing is followed by a material-centred Research through Design process that concludes with Object Detection Radio: a ludic device that sonifies the prediction probabilities of the Tensorflow.js Object Detection API. The design process suggests ways of making machine learning uncertainty explicit in human-AI interaction. In addition, I propose play as an alternative way of relating to and understanding the agency of machine learning technology.
3. Richards, Mark Andrew. "An intuitive motion-based input model for mobile devices." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/16556/1/Mark_Richards_Thesis.pdf.

Abstract:
Traditional methods of input on mobile devices are cumbersome and difficult to use. Devices have become smaller, while their operating systems have become more complex, to the extent that they are approaching the level of functionality found on desktop computer operating systems. The buttons and toggle-sticks currently employed by mobile devices are a relatively poor replacement for the keyboard and mouse style user interfaces used on their desktop computer counterparts. For example, when looking at a screen image on a device, we should be able to move the device to the left to indicate we wish the image to be panned in the same direction. This research investigates a new input model based on the natural hand motions and reactions of users. The model developed by this work uses the generic embedded video cameras available on almost all current-generation mobile devices to determine how the device is being moved and maps this movement to an appropriate action. Surveys using mobile devices were undertaken to determine both the appropriateness and efficacy of such a model, as well as to collect the foundational data with which to build the model. Direct mappings between motions and inputs were achieved by analysing users' motions and reactions in response to different tasks. Once the framework was completed, a proof of concept was created on the Windows Mobile platform. This proof of concept leverages both DirectShow and Direct3D to track objects in the video stream, maps these objects to a three-dimensional plane, and determines device movements from this data. This input model holds the promise of being a simpler and more intuitive method for users to interact with their mobile devices, and has the added advantage that no hardware additions or modifications are required to existing mobile devices.
4. Wieslander, Johan. "Digitizing notes using a moving smartphone : Evaluating Oriented FAST and Rotated BRIEF (ORB)." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302554.

Abstract:
This thesis investigates the problem of tracking objects for an Augmented Reality (AR) setting. More specifically, it investigates the issue of tracking Post-It® notes for use in a Mobile Augmented Reality (MAR) application using the Oriented FAST and Rotated BRIEF (ORB) keypoint extractor and descriptor. This problem explores the relatively new and unexplored territory of tracking specific objects in real time on mobile devices. Since MAR is becoming more prevalent, this is a field that is likely to be explored in more depth in the future. A solution was implemented in an existing note-scanning application. Test sequences, with accompanying ground truth, were created for the applicable scenarios and used to reliably verify and evaluate the implementation with regard to precision, recall, accuracy, and speed. The ground truth was generated in a Mixed-Initiative Computing (MIC) application. The results show that tracking using only ORB is not viable if high precision, recall, or accuracy is needed. While tracking via ORB may not be viable as a standalone solution, the thesis describes methods for using it in a MIC setting, which may be viable.

5. Khalidov, Vasil. "Modèles de mélanges conjugués pour la modélisation de la perception visuelle et auditive." Grenoble, 2010. http://www.theses.fr/2010GRENM064.

Abstract:
In this thesis, the modelling of audio-visual perception with a head-like device is considered. The related problems, namely audio-visual calibration, audio-visual object detection, localization and tracking, are addressed. A spatio-temporal approach to the head-like device calibration is proposed, based on probabilistic multimodal trajectory matching. The formalism of conjugate mixture models is introduced along with a family of efficient optimization algorithms to perform multimodal clustering. One instance of this algorithm family, namely the conjugate expectation maximization (ConjEM) algorithm, is further improved to gain attractive theoretical properties. Multimodal object detection and object number estimation methods are developed, and their theoretical properties are discussed. Finally, the proposed multimodal clustering method is combined with the object detection and object number estimation strategies and known tracking techniques to perform multimodal multi-object tracking. The performance is demonstrated on simulated data and on a database of realistic audio-visual scenarios (the CAVA database).

Books on the topic "Human-object Interaction Detection"

1. Karasulu, Bahadir. Performance Evaluation Software: Moving Object Detection and Tracking in Videos. New York, NY: Springer New York, 2013.
2. Karasulu, Bahadir, and Serdar Korukoglu. Performance Evaluation Software: Moving Object Detection and Tracking in Videos. Springer, 2013.

Book chapters on the topic "Human-object Interaction Detection"

1. Hou, Zhi, Xiaojiang Peng, Yu Qiao, and Dacheng Tao. "Visual Compositional Learning for Human-Object Interaction Detection." In Computer Vision – ECCV 2020, 584–600. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58555-6_35.
2. Zhong, Xubin, Changxing Ding, Xian Qu, and Dacheng Tao. "Polysemy Deciphering Network for Human-Object Interaction Detection." In Computer Vision – ECCV 2020, 69–85. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58565-5_5.
3. Liu, Yang, Qingchao Chen, and Andrew Zisserman. "Amplifying Key Cues for Human-Object-Interaction Detection." In Computer Vision – ECCV 2020, 248–65. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58568-6_15.
4. Liu, Hongyi, Lisha Mo, and Huimin Ma. "Semantic Inference Network for Human-Object Interaction Detection." In Lecture Notes in Computer Science, 518–29. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-34120-6_42.
5. Leonardi, Rosario, Francesco Ragusa, Antonino Furnari, and Giovanni Maria Farinella. "Egocentric Human-Object Interaction Detection Exploiting Synthetic Data." In Image Analysis and Processing – ICIAP 2022, 237–48. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-06430-2_20.
6. Hassan, Mahmudul, and Anuja Dharmaratne. "Attribute Based Affordance Detection from Human-Object Interaction Images." In Image and Video Technology – PSIVT 2015 Workshops, 220–32. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-30285-0_18.
7. Gao, Chen, Jiarui Xu, Yuliang Zou, and Jia-Bin Huang. "DRG: Dual Relation Graph for Human-Object Interaction Detection." In Computer Vision – ECCV 2020, 696–712. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58610-2_41.
8. Wang, Hai, Wei-shi Zheng, and Ling Yingbiao. "Contextual Heterogeneous Graph Network for Human-Object Interaction Detection." In Computer Vision – ECCV 2020, 248–64. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58520-4_15.
9. Odashima, Shigeyuki, Taketoshi Mori, Masamichi Simosaka, Hiroshi Noguchi, and Tomomasa Sato. "Event Understanding of Human-Object Interaction: Object Movement Detection via Stable Changes." In Intelligent Video Event Analysis and Understanding, 195–210. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-17554-1_9.
10. Kim, Bumsoo, Taeho Choi, Jaewoo Kang, and Hyunwoo J. Kim. "UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection." In Computer Vision – ECCV 2020, 498–514. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58555-6_30.

Conference papers on the topic "Human-object Interaction Detection"

1. Bergstrom, Trevor, and Humphrey Shi. "Human-Object Interaction Detection." In MM '20: The 28th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3422852.3423481.
2. Yang, Dongming, and Yuexian Zou. "A Graph-based Interactive Reasoning for Human-Object Interaction Detection." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/155.

Abstract:
Human-Object Interaction (HOI) detection is devoted to learning how humans interact with surrounding objects by inferring triplets of <human, verb, object>. However, recent HOI detection methods mostly rely on additional annotations (e.g., human pose) and neglect powerful interactive reasoning beyond convolutions. In this paper, we present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs, in which interactive semantics implied among visual targets are efficiently exploited. The proposed model consists of a project function that maps related targets from convolution space to a graph-based semantic space, a message-passing process that propagates semantics among all nodes, and an update function transforming the reasoned nodes back to convolution space. Furthermore, we construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet. Beyond inferring HOIs using instance features respectively, the framework dynamically parses pairwise interactive semantics among visual targets by integrating two-level in-Graphs, i.e., scene-wide and instance-wide in-Graphs. Our framework is end-to-end trainable and free from costly annotations like human pose. Extensive experiments show that our proposed framework outperforms existing HOI detection methods on both V-COCO and HICO-DET benchmarks and improves the baseline by about 9.4% and 15% relative, validating its efficacy in detecting HOIs.
3. Yang, Dongming, Yuexian Zou, Can Zhang, Meng Cao, and Jie Chen. "RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/169.

Abstract:
Human-Object Interaction (HOI) detection is devoted to learning how humans interact with surrounding objects. The latest end-to-end HOI detectors lack relation reasoning, which leaves them unable to learn HOI-specific interactive semantics for predictions. In this paper, we therefore propose novel relation reasoning for HOI detection. We first present a progressive Relation-aware Frame, which brings a new structure and parameter-sharing pattern for interaction inference. Upon the frame, an Interaction Intensifier Module and a Correlation Parsing Module are carefully designed, where: a) interactive semantics from humans can be exploited and passed to objects to intensify interactions, and b) interactive correlations among humans, objects and interactions are integrated to promote predictions. Based on the modules above, we construct an end-to-end trainable framework named Relation Reasoning Network (abbr. RR-Net). Extensive experiments show that our proposed RR-Net sets a new state-of-the-art on both V-COCO and HICO-DET benchmarks and improves the baseline by about 5.5% and 9.8% relative, validating that this first effort in exploring relation reasoning and integrating interactive semantics has brought obvious improvement for end-to-end HOI detection.
4. Sun, Xu, Yunqing He, Tongwei Ren, and Gangshan Wu. "Spatial-Temporal Human-Object Interaction Detection." In 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2021. http://dx.doi.org/10.1109/icme51207.2021.9428163.
5. Sugimoto, Masaki, Ryosuke Furuta, and Yukinobu Taniguchi. "Weakly-supervised Human-object Interaction Detection." In 16th International Conference on Computer Vision Theory and Applications. SCITEPRESS - Science and Technology Publications, 2021. http://dx.doi.org/10.5220/0010196802930300.
6. Wang, Tiancai, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, and Jian Sun. "Learning Human-Object Interaction Detection Using Interaction Points." In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020. http://dx.doi.org/10.1109/cvpr42600.2020.00417.
7. Kilickaya, Mert, and Arnold Smeulders. "Diagnosing Rarity in Human-object Interaction Detection." In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2020. http://dx.doi.org/10.1109/cvprw50498.2020.00460.
8. Kogashi, Kaen, Yang Wu, Shohei Nobuhara, and Ko Nishino. "Human-Object Interaction Detection with Missing Objects." In 2021 17th International Conference on Machine Vision and Applications (MVA). IEEE, 2021. http://dx.doi.org/10.23919/mva51890.2021.9511361.
9. Gao, Song, Hongyu Wang, Jilai Song, Fang Xu, and Fengshan Zou. "An Improved Human-Object Interaction Detection Network." In 2019 IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID). IEEE, 2019. http://dx.doi.org/10.1109/icasid.2019.8924999.
10. Zhou, Desen, Zhichao Liu, Jian Wang, Leshan Wang, Tao Hu, Errui Ding, and Jingdong Wang. "Human-Object Interaction Detection via Disentangled Transformer." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.01896.