Academic literature on the topic "Visual attention, artificial intelligence, machine learning, computer vision"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles

Browse the topical lists of articles, books, theses, conference papers, and other academic sources on the topic "Visual attention, artificial intelligence, machine learning, computer vision".

Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Visual attention, artificial intelligence, machine learning, computer vision"

1

Wan, Yijie, and Mengqi Ren. "New Visual Expression of Anime Film Based on Artificial Intelligence and Machine Learning Technology." Journal of Sensors 2021 (June 26, 2021): 1–10. http://dx.doi.org/10.1155/2021/9945187.

Abstract
With the improvement of material living standards, spiritual entertainment has become more and more important. As a more popular spiritual entertainment project, film and television entertainment is gradually receiving attention from people. However, in recent years, the film industry has developed rapidly, and the output of animation movies has also increased year by year. How to quickly and accurately find the user’s favorite movies in the huge amount of animation movie data has become an urgent problem. Based on the above background, the purpose of this article is to study the new visual ex
2

Anh, Dao Nam. "Interestingness Improvement of Face Images by Learning Visual Saliency." Journal of Advanced Computational Intelligence and Intelligent Informatics 24, no. 5 (2020): 630–37. http://dx.doi.org/10.20965/jaciii.2020.p0630.

Abstract
Connecting features of face images with the interestingness of a face may assist in a range of applications such as intelligent visual human-machine communication. To enable the connection, we use interestingness and image features in combination with machine learning techniques. In this paper, we use visual saliency of face images as learning features to classify the interestingness of the images. Applying multiple saliency detection techniques specifically to objects in the images allows us to create a database of saliency-based features. Consistent estimation of facial interestingness and u
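The Anh (2020) entry above classifies the interestingness of face images from visual-saliency features. As a rough, hedged illustration of that kind of pipeline (not the paper's actual implementation), the sketch below computes a spectral-residual saliency map with NumPy/SciPy and feeds a few pooled statistics of the map to a scikit-learn SVM; the feature choices and the images/labels inputs are hypothetical.

```python
# Hedged sketch: saliency-derived features + classifier, loosely in the spirit of
# the Anh (2020) entry above. Feature design and data loading are hypothetical.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.svm import SVC

def spectral_residual_saliency(gray):
    """Spectral-residual saliency (Hou & Zhang style) for a 2D grayscale array in [0, 1]."""
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)   # remove the smooth part of the spectrum
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = uniform_filter(sal, size=5)                       # post-smoothing
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

def saliency_features(gray):
    """Pool the saliency map into a small, fixed-length feature vector."""
    s = spectral_residual_saliency(gray)
    return np.array([s.mean(), s.std(), s.max(), (s > 0.5).mean()])

def train_interestingness_classifier(images, labels):
    """images: list of 2D float face crops; labels: 0/1 interestingness tags (both hypothetical)."""
    X = np.stack([saliency_features(img) for img in images])
    return SVC(kernel="rbf").fit(X, labels)
```

Any saliency detector and classifier could be swapped in; the point is only that the saliency map, rather than the raw pixels, supplies the learning features.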
3

Suma, V. "Computer Vision for Human-Machine Interaction-Review." Journal of Trends in Computer Science and Smart Technology 2019, no. 02 (2019): 131–39. http://dx.doi.org/10.36548/jtcsst.2019.2.006.

Abstract
The paper is a review on the computer vision that is helpful in the interaction between the human and the machines. The computer vision that is termed as the subfield of the artificial intelligence and the machine learning is capable of training the computer to visualize, interpret and respond back to the visual world in a similar way as the human vision does. Nowadays the computer vision has found its application in broader areas such as the health care, safety, security, surveillance etc. due to the progress, developments and latest innovations in the artificial intelligence, deep learning and
4

Prijs, Jasper, Zhibin Liao, Soheil Ashkani-Esfahani, et al. "Artificial intelligence and computer vision in orthopaedic trauma." Bone & Joint Journal 104-B, no. 8 (2022): 911–14. http://dx.doi.org/10.1302/0301-620x.104b8.bjj-2022-0119.r1.

Abstract
Artificial intelligence (AI) is, in essence, the concept of ‘computer thinking’, encompassing methods that train computers to perform and learn from executing certain tasks, called machine learning, and methods to build intricate computer models that both learn and adapt, called complex neural networks. Computer vision is a function of AI by which machine learning and complex neural networks can be applied to enable computers to capture, analyze, and interpret information from clinical images and visual inputs. This annotation summarizes key considerations and future perspectives concerning co
5

Liu, Yang, Anbu Huang, Yun Luo, et al. "Federated Learning-Powered Visual Object Detection for Safety Monitoring." AI Magazine 42, no. 2 (2021): 19–27. http://dx.doi.org/10.1609/aimag.v42i2.15095.

Abstract
Visual object detection is an important artificial intelligence (AI) technique for safety monitoring applications. Current approaches for building visual object detection models require large and well-labeled dataset stored by a centralized entity. This not only poses privacy concerns under the General Data Protection Regulation (GDPR), but also incurs large transmission and storage overhead. Federated learning (FL) is a promising machine learning paradigm to address these challenges. In this paper, we report on FedVision—a machine learning engineering platform to support the development of fe
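Both Liu et al. entries in this list (items 5 and 10) rely on federated learning, whose central aggregation step is federated averaging: each client trains on its own video data and uploads only model weights, which the server averages by sample count. The sketch below illustrates that generic FedAvg step under assumed data structures; it is not the FedVision platform's actual API, and client.train_locally is a hypothetical method.

```python
# Hedged sketch of federated averaging (FedAvg), the aggregation step behind
# federated-learning platforms such as the FedVision system described above.
# The client/weight structures here are illustrative, not FedVision's API.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client model parameters.

    client_weights: list of dicts mapping layer name -> np.ndarray
    client_sizes:   number of local training samples per client
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for name in client_weights[0]:
        global_weights[name] = sum(
            w[name] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
    return global_weights

def federated_round(server_weights, clients):
    """One communication round: clients train locally, only weights and counts are uploaded."""
    updates, sizes = [], []
    for client in clients:
        w, n = client.train_locally(server_weights)   # hypothetical client-side method
        updates.append(w)
        sizes.append(n)
    return fedavg(updates, sizes)
```

The privacy argument in the abstract follows from this structure: raw surveillance video never leaves the client, only the averaged parameters do.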
6

L, Anusha, and Nagaraja G. S. "Outlier Detection in High Dimensional Data." International Journal of Engineering and Advanced Technology 10, no. 5 (2021): 128–30. http://dx.doi.org/10.35940/ijeat.e2675.0610521.

Abstract
Artificial intelligence (AI) is the science that allows computers to replicate human intelligence in areas such as decision-making, text processing, visual perception. Artificial Intelligence is the broader field that contains several subfields such as machine learning, robotics, and computer vision. Machine Learning is a branch of Artificial Intelligence that allows a machine to learn and improve at a task over time. Deep Learning is a subset of machine learning that makes use of deep artificial neural networks for training. The paper proposed on outlier detection for multivariate high dimens
7

Li, Jing, and Guangren Zhou. "Visual Information Features and Machine Learning for Wushu Arts Tracking." Journal of Healthcare Engineering 2021 (August 4, 2021): 1–6. http://dx.doi.org/10.1155/2021/6713062.

Abstract
Martial arts tracking is an important research topic in computer vision and artificial intelligence. It has extensive and vital applications in video monitoring, interactive animation and 3D simulation, motion capture, and advanced human-computer interaction. However, due to the change of martial arts’ body posture, clothing variability, and light mixing, the appearance changes significantly. As a result, accurate posture tracking becomes a complicated problem. A solution to this complicated problem is studied in this paper. The proposed solution improves the accuracy of martial arts tracking
8

Mogadala, Aditya, Marimuthu Kalimuthu, and Dietrich Klakow. "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods." Journal of Artificial Intelligence Research 71 (August 30, 2021): 1183–317. http://dx.doi.org/10.1613/jair.1.11688.

Abstract
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has been made possible with deep learning, a sub-area of machine learning that uses artificial neural networks. This has created significant interest in the integration of vision and language. In this survey, we focus on ten prominent tasks that integrate language and vision by discussi
9

Zhou, Zhiyu, Jiangfei Ji, Yaming Wang, Zefei Zhu, and Ji Chen. "Hybrid regression model via multivariate adaptive regression spline and online sequential extreme learning machine and its application in vision servo system." International Journal of Advanced Robotic Systems 19, no. 3 (2022): 172988062211086. http://dx.doi.org/10.1177/17298806221108603.

Abstract
To solve the problems of slow convergence speed, poor robustness, and complex calculation of the image Jacobian matrix in image-based visual servo systems, a hybrid regression model based on multivariate adaptive regression splines (MARS) and an online sequential extreme learning machine (OS-ELM) is proposed to predict the product of the pseudo-inverse of the image Jacobian matrix and the image feature error. In MOS-ELM, MARS is used to evaluate the importance of input features and selec
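The Zhou et al. entry targets image-based visual servoing, where the classical control law drives the camera with a velocity proportional to the pseudo-inverse of the image Jacobian applied to the feature error; the paper's hybrid MARS/OS-ELM regressor learns to predict that product instead of computing it analytically. Below is a minimal sketch of the classical quantity being approximated; the Jacobian, gain, and feature values are purely illustrative.

```python
# Hedged sketch of the classical image-based visual servo (IBVS) update that the
# hybrid MARS / OS-ELM model in the entry above learns to approximate.
import numpy as np

def ibvs_velocity(J, s, s_star, gain=0.5):
    """Camera velocity command v = -lambda * pinv(J) @ (s - s_star).

    J:      image Jacobian (interaction matrix), shape (2k, 6) for k tracked image points
    s:      current image-feature vector, shape (2k,)
    s_star: desired image-feature vector, shape (2k,)
    """
    error = s - s_star
    return -gain * np.linalg.pinv(J) @ error   # the product the learned regressor predicts

# Illustrative numbers only: two tracked points, a roughly diagonal Jacobian.
J = np.eye(4, 6)
s = np.array([120.0, 80.0, 200.0, 150.0])
s_star = np.array([100.0, 100.0, 180.0, 160.0])
print(ibvs_velocity(J, s, s_star))
```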
10

Liu, Yang, Anbu Huang, Yun Luo, et al. "FedVision: An Online Visual Object Detection Platform Powered by Federated Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 08 (2020): 13172–79. http://dx.doi.org/10.1609/aaai.v34i08.7021.

Abstract
Visual object detection is a computer vision-based artificial intelligence (AI) technique which has many practical applications (e.g., fire hazard monitoring). However, due to privacy concerns and the high cost of transmitting video data, it is highly challenging to build object detection models on centrally stored large training datasets following the current approach. Federated learning (FL) is a promising approach to resolve this challenge. Nevertheless, there currently lacks an easy to use tool to enable computer vision application developers who are not experts in federated learning to co

Theses on the topic "Visual attention, artificial intelligence, machine learning, computer vision"

1

Mahendru, Aroma. "Role of Premises in Visual Question Answering." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/78030.

Abstract
In this work, we make a simple but important observation that questions about images often contain premises -- objects and relationships implied by the question -- and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer based purely on learned language biases, resulting in nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its pr
2

Bui, Anh Duc. "Visual Scene Understanding through Scene Graph Generation and Joint Learning." Thesis, University of Sydney, 2023. https://hdl.handle.net/2123/29954.

Abstract
Deep visual scene understanding is an essential part for the development of high-level visual understanding tasks such as storytelling or Visual Question Answering. One of the proposed solutions for such purposes was Scene Graphs, with the capacity to represent the semantic details of images into abstract elements using a graph structure which is both suitable for machine processing as well as human understanding. However, automatically generating reasonable and informative scene graphs remains a challenge due to the problem of long tail biases present in the annotated data available. Therefo
3

Rochford, Matthew. "Visual Speech Recognition Using a 3D Convolutional Neural Network." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2109.

Abstract
Mainstream automatic speech recognition (ASR) makes use of audio data to identify spoken words, however visual speech recognition (VSR) has recently been of increased interest to researchers. VSR is used when audio data is corrupted or missing entirely and also to further enhance the accuracy of audio-based ASR systems. In this research, we present both a framework for building 3D feature cubes of lip data from videos and a 3D convolutional neural network (CNN) architecture for performing classification on a dataset of 100 spoken words, recorded in an uncontrolled environment. Our 3D-CNN ar
4

Salem, Tawfiq. "Learning to Map the Visual and Auditory World." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/86.

Abstract
The appearance of the world varies dramatically not only from place to place but also from hour to hour and month to month. Billions of images that capture this complex relationship are uploaded to social-media websites every day and often are associated with precise time and location metadata. This rich source of data can be beneficial to improve our understanding of the globe. In this work, we propose a general framework that uses these publicly available images for constructing dense maps of different ground-level attributes from overhead imagery. In particular, we use well-defined probabil
5

Azizpour, Hossein. "Visual Representations and Models: From Latent SVM to Deep Learning." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192289.

Abstract
Two important components of a visual recognition system are representation and model. Both involve the selection and learning of the features that are indicative for recognition and discarding those features that are uninformative. This thesis, in its general form, proposes different techniques within the frameworks of two learning systems for representation and modeling, namely latent support vector machines (latent SVMs) and deep learning. First, we propose various approaches to group the positive samples into clusters of visually similar instances. Given a fixed representation, the sample
6

Warnakulasuriya, Tharindu R. "Context modelling for single and multi agent trajectory prediction." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/128480/1/Tharindu_Warnakulasuriya_Thesis.pdf.

Abstract
This research addresses the problem of predicting future agent behaviour in both single and multi agent settings where multiple agents can enter and exit an environment, and the environment can change dynamically. Both short-term and long-term context was captured in the given domain and utilised neural memory networks to use the derived knowledge for the prediction task. The efficacy of the techniques was demonstrated by applying it to aircraft path prediction, passenger movement prediction in crowded railway stations, driverless car steering, predicting next shot location in tennis and for p
7

Hernández-Vela, Antonio. "From pixels to gestures: learning visual representations for human analysis in color and depth data sequences." Doctoral thesis, Universitat de Barcelona, 2015. http://hdl.handle.net/10803/292488.

Abstract
The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others. In this dissertation we focus on learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem from three different research lines, at different levels of abstraction;
8

Novotný, Václav. "Rozpoznání displeje embedded zařízení." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-376924.

Abstract
This master thesis deals with usage of machine learning methods in computer vision for classification of unknown images. The first part contains research of available machine learning methods, their limitations and also their suitability for this task. The second part describes the processes of creating training and testing gallery. In the practical part, the solution for the problem is proposed and later realised and implemented. Proper testing and evaluation of resulting system is conducted.
9

ALTIERI, ALEX. "Yacht experience, ricerca e sviluppo di soluzioni basate su intelligenza artificiale per il comfort e la sicurezza in alto mare." Doctoral thesis, Università Politecnica delle Marche, 2021. http://hdl.handle.net/11566/287605.

Abstract
The thesis describes the results of research and development work on new technologies based on artificial intelligence techniques, capable of achieving an empathetic interaction and an emotional connection between humans and "machines", so as to improve comfort and safety on board a yacht. This interaction is obtained through the recognition of emotions and behaviours and the subsequent activation of the multimedia devices present in the on-board environment, which adapt to the mood of the person in the room. The prototype system developed during the three years of
10

Zanca, Dario. "Towards laws of visual attention." Doctoral thesis, 2019. http://hdl.handle.net/2158/1159344.

Abstract
Visual attention is a crucial process for humans and foveated animals in general. The ability to select relevant locations in the visual field greatly simplifies the problem of vision. It allows a parsimonious management of the computational resources while catching and tracking coherences within the observed temporal phenomenon. Understanding the mechanisms of attention can reveal a lot about human intelligence. At the same time, it seems increasingly important for building intelligent artificial agents that aim at approaching human performance in real-world visual tasks. For

Books on the topic "Visual attention, artificial intelligence, machine learning, computer vision"

1

Rui, Yong, and Thomas S. Huang, eds. Exploration of Visual Data. Kluwer Academic Publishers, 2003.

2

Visual Saliency Computation: A Machine Learning Perspective. Springer, 2014.

3

Gao, Wen, and Jia Li. Visual Saliency Computation: A Machine Learning Perspective. Springer London, Limited, 2014.

4

Huang, Thomas S. Exploration of Visual Data. Springer My Copy UK, 2003.

5

Huang, Thomas S., Yong Rui, and Sean Xiang Zhou. Exploration of Visual Data (The International Series in Video Computing). Springer, 2003.


Book chapters on the topic "Visual attention, artificial intelligence, machine learning, computer vision"

1

Madani, Kurosh. "Robots’ Vision Humanization Through Machine-Learning Based Artificial Visual Attention." In Communications in Computer and Information Science. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-35430-5_2.

2

Whitworth, Brian, and Hokyoung Ryu. "A Comparison of Human and Computer Information Processing." In Machine Learning. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-60960-818-7.ch101.

Abstract
Over 30 years ago, TV shows from The Jetsons to Star Trek suggested that by the millennium’s end computers would read, talk, recognize, walk, converse, think, and maybe even feel. People do these things easily, so how hard could it be? However, in general we still don’t talk to our computers, cars, or houses, and they still don’t talk to us. The Roomba, a successful household robot, is a functional flat round machine that neither talks to nor recognizes its owner. Its “smart” programming tries mainly to stop it getting “stuck,” which it still frequently does, either by getting jammed somewhere or tangling in things like carpet tassels. The idea that computers are incredibly clever is changing, as when computers enter human specialties like conversation, many people find them more stupid than smart, as any “conversation” with a computer help can illustrate. Computers do easily do calculation tasks that people find hard, but the opposite also applies, for example, people quickly recognize familiar faces but computers still cannot recognize known terrorist faces at airport check-ins. Apparently minor variations, like lighting, facial angle, or expression, accessories like glasses or hat, upset them. Figure 1 shows a Letraset page, which any small child would easily recognize as letter “As” but computers find this extremely difficult. People find such visual tasks easy, so few in artificial intelligence (AI) appreciated the difficulties of computer-vision at first. Initial advances were rapid, but AI has struck a 99% barrier, for example, computer voice recognition is 99% accurate but one error per 100 words is unacceptable. There are no computer controlled “auto-drive” cars because 99% accuracy means an accident every month or so, which is also unacceptable. In contrast, the “mean time between accidents” of competent human drivers is years not months, and good drivers go 10+ years without accidents. Other problems easy for most people but hard for computers are language translation, speech recognition, problem solving, social interaction, and spatial coordination.
3

Khare, Neelu, Brijendra Singh, and Munis Ahmed Rizvi. "Deep Learning Methods for Modelling Emotional Intelligence." In Multidisciplinary Applications of Deep Learning-Based Artificial Emotional Intelligence. IGI Global, 2022. http://dx.doi.org/10.4018/978-1-6684-5673-6.ch015.

Abstract
Machine learning and deep learning play a vital role in making smart decisions, especially with huge amounts of data. Identifying the emotional intelligence levels of individuals helps them to avoid superfluous problems in the workplace or in society. Emotions reflect the psychological state of a person or represent a quick (a few minutes or seconds) reactions to a stimulus. Emotions can be categorized on the basis of a person's feelings in a situation: positive, negative, and neutral. Emotional intelligence seeks attention from computer engineers and psychologists to work together to address EI. However, identifying human emotions through deep learning methods is still a challenging task in computer vision. This chapter investigates deep learning models for the recognition and assessment of emotional states with diverse emotional data such as speech and video streaming. Finally, the conclusion summarises the usefulness of DL methods in assessing human emotions. It helps future researchers carry out their work in the field of deep learning-based emotional artificial intelligence.
4

Cao, Yushi, Yon Shin Teo, Yan Zheng, Yuxuan Toh, and Shang-Wei Lin. "A Holistic Automated Software Structure Exploration Framework for Testing." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2022. http://dx.doi.org/10.3233/faia220259.

Abstract
Exploring the underlying structure of a Human-Machine Interface (HMI) product effectively while adhering to the pre-defined test conditions and methodology is critical for validating the quality of the software. We propose a reinforcement-learning powered Automated Software Structure Exploration Framework for Testing (ASSET), which is capable of interacting with and analyzing the HMI software under testing (SUT). The main challenge is to incorporate the human instructions into the ASSET phase by using the visual feedback such as the downloaded image sequence from the HMI, which could be difficult to analyze. Our framework combines both computer vision and natural language processing techniques to understand the semantic meanings of the visual feedback. Building on the semantic understanding, we develop a rules-guided software exploration algorithm via reinforcement learning and deterministic finite automaton (DFA). We conducted experiments on HMI software in actual production phase and demonstrate that the exploration coverage and efficiency of our framework outperforms current state-of-the-art methods.
5

Whittlestone, Jess. "AI and Decision-Making." In Future Morality. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780198862086.003.0010.

Abstract
This chapter assesses how advances in artificial intelligence (AI) can help us address the biggest global challenges we face today. Psychology research has painted a pessimistic picture of human decision-making in recent decades: documenting a whole host of biases and irrationalities people are prone to. We find it difficult to be motivated by long-term, abstract, or statistical considerations; many global challenges are far too complex for a human brain to understand in its entirety; and we cannot predict far into the future with any degree of certainty. At the same time, advances in AI are receiving increasing amounts of attention, raising the question: might we be able to leverage these AI developments to improve human decision-making on the problems that matter most for humanity’s future? If so, how? Thinking about AI more as supporting and complementing human decisions, than as replacing them, we might find that what we most need is quite far from the most sophisticated machine learning capabilities that are the subject of hype and research attention today. For many important real-world problems, what is most needed is not necessarily better computer vision or natural language processing but simpler ways to do large-scale data analysis, and practical tools for structuring reasoning and decision-making.

Conference papers on the topic "Visual attention, artificial intelligence, machine learning, computer vision"

1

Jiao, Zhicheng, Haoxuan You, Fan Yang, Xin Li, Han Zhang, and Dinggang Shen. "Decoding EEG by Visual-guided Deep Neural Networks." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/192.

Abstract
Decoding visual stimuli from brain activities is an interdisciplinary study of neuroscience and computer vision. With the emerging of Human-AI Collaboration, Human-Computer Interaction, and the development of advanced machine learning models, brain decoding based on deep learning attracts more attention. Electroencephalogram (EEG) is a widely used neurophysiology tool. Inspired by the success of deep learning on image representation and neural decoding, we proposed a visual-guided EEG decoding method that contains a decoding stage and a generation stage. In the classification stage, we designed
2

Zhang, Licheng, Xianzhi Wang, Lina Yao, Lin Wu, and Feng Zheng. "Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/126.

Abstract
Zero-shot object detection (ZSD) has received considerable attention from the community of computer vision in recent years. It aims to simultaneously locate and categorize previously unseen objects during inference. One crucial problem of ZSD is how to accurately predict the label of each object proposal, i.e. categorizing object proposals, when conducting ZSD for unseen categories. Previous ZSD models generally relied on learning an embedding from visual space to semantic space or learning a joint embedding between semantic description and visual representation. As the features in the learned
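The Zhang et al. entry categorizes proposals for unseen classes by relating class word embeddings to visual features. As a hedged illustration of the general idea (a learned linear map from semantic space to visual space followed by nearest-prototype assignment, not the paper's exact model), the sketch below fits the map by ridge-style least squares on seen classes and then scores unseen classes; all dimensions and data are synthetic.

```python
# Hedged sketch of zero-shot recognition of object proposals via an embedding from
# semantic space to visual space (illustrative only; not the Zhang et al. model).
import numpy as np

def learn_semantic_to_visual(class_vecs, visual_feats, labels, reg=1e-3):
    """Least-squares map W so that (class embedding) @ W approximates visual features."""
    S = class_vecs[labels]                       # (n, d_sem): semantic vector of each sample's class
    V = visual_feats                             # (n, d_vis)
    W = np.linalg.solve(S.T @ S + reg * np.eye(S.shape[1]), S.T @ V)
    return W                                     # (d_sem, d_vis)

def classify_proposal(feat, W, unseen_class_vecs):
    """Project unseen class embeddings into visual space and pick the closest prototype."""
    prototypes = unseen_class_vecs @ W           # (k, d_vis) visual-space class prototypes
    distances = np.linalg.norm(prototypes - feat, axis=1)
    return int(np.argmin(distances))

# Tiny synthetic example: 3 seen classes, 2 unseen classes, random features.
rng = np.random.default_rng(0)
seen_vecs, unseen_vecs = rng.normal(size=(3, 8)), rng.normal(size=(2, 8))
feats, labels = rng.normal(size=(50, 16)), rng.integers(0, 3, size=50)
W = learn_semantic_to_visual(seen_vecs, feats, labels)
print(classify_proposal(feats[0], W, unseen_vecs))
```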
3

Wang, Yaxiong, Hao Yang, Xueming Qian, et al. "Position Focused Attention Network for Image-Text Matching." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/526.

Abstract
Image-text matching tasks have recently attracted a lot of attention in the computer vision field. The key point of this cross-domain problem is how to accurately measure the similarity between the visual and the textual contents, which demands a fine understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. In this work, we integrate the object position clue to enhance the visual-text joint-embedding learning. We first split the images into blocks, by which we infer the re
4

Venkata Sai Saran Naraharisetti, Sree Veera, Benjamin Greenfield, Benjamin Placzek, Steven Atilho, Mohamad Nassar, and Mehdi Mekni. "A Novel Intelligent Image-Processing Parking Systems." In 3rd International Conference on Artificial Intelligence and Machine Learning (CAIML 2022). Academy and Industry Research Collaboration Center (AIRCC), 2022. http://dx.doi.org/10.5121/csit.2022.121212.

Abstract
The scientific community is looking for efficient solutions to improve the quality of life in large cities because of traffic congestion, driving experience, air pollution, and energy consumption. This surge exceeds the capacity of existing transit infrastructure and parking facilities. Intelligent Parking Systems (SPS) that can accommodate short-term parking demand are a must-have for smart city development. SPS are designed to count the number of parked automobiles and identify available parking spaces. In this paper, we present a novel SPS based on real-time computer vision techniques. The
5

Diao, Xiaolei. "Building a Visual Semantics Aware Object Hierarchy." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/826.

Abstract
The semantic gap is defined as the difference between the linguistic representations of the same concept, which usually leads to misunderstanding between individuals with different knowledge backgrounds. Since linguistically annotated images are extensively used for training machine learning models, semantic gap problem (SGP) also results in inevitable bias on image annotations and further leads to poor performance on current computer vision tasks. To address this problem, we propose a novel unsupervised method to build visual semantics aware object hierarchy, aiming to get a classification mo
6

Lin, Jianxin, Yingce Xia, Yijun Wang, Tao Qin, and Zhibo Chen. "Image-to-Image Translation with Multi-Path Consistency Regularization." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/413.

Abstract
Image translation across different domains has attracted much attention in both machine learning and computer vision communities. Taking the translation from a source domain to a target domain as an example, existing algorithms mainly rely on two kinds of loss for training: One is the discrimination loss, which is used to differentiate images generated by the models and natural images; the other is the reconstruction loss, which measures the difference between an original image and the reconstructed version. In this work, we introduce a new kind of loss, multi-path consistency loss, which eval
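The Lin et al. entry introduces a multi-path consistency loss for image-to-image translation: translating from a source domain to a target domain directly and via an intermediate domain should give similar images. The PyTorch-style sketch below shows only the shape of that extra term, under the assumption of an L1 penalty between the two paths; the 1x1-convolution "generators" are placeholders for real translation networks.

```python
# Hedged sketch of a multi-path consistency term for image-to-image translation,
# in the spirit of the Lin et al. entry above: the direct translation A -> C and
# the indirect one A -> B -> C should agree. Generators here are placeholders.
import torch
import torch.nn as nn

def multi_path_consistency_loss(x_a, g_ab, g_bc, g_ac):
    """L1 distance between the one-hop and two-hop translations of x_a."""
    direct = g_ac(x_a)            # A -> C in one step
    indirect = g_bc(g_ab(x_a))    # A -> B, then B -> C
    return nn.functional.l1_loss(direct, indirect)

# Placeholder 1x1-conv "generators" just to show the call pattern.
g_ab = nn.Conv2d(3, 3, kernel_size=1)
g_bc = nn.Conv2d(3, 3, kernel_size=1)
g_ac = nn.Conv2d(3, 3, kernel_size=1)

x_a = torch.randn(4, 3, 64, 64)   # a batch of source-domain images
loss = multi_path_consistency_loss(x_a, g_ab, g_bc, g_ac)
loss.backward()                    # in practice added to the GAN and reconstruction losses
print(float(loss))
```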