Journal articles on the topic 'Visual attention, artificial intelligence, machine learning, computer vision'


Consult the top 50 journal articles for your research on the topic 'Visual attention, artificial intelligence, machine learning, computer vision.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Wan, Yijie, and Mengqi Ren. "New Visual Expression of Anime Film Based on Artificial Intelligence and Machine Learning Technology." Journal of Sensors 2021 (June 26, 2021): 1–10. http://dx.doi.org/10.1155/2021/9945187.

Abstract:
With rising material living standards, entertainment has become increasingly important, and film and television are among its most popular forms. In recent years the film industry has developed rapidly and the output of animated films has grown year by year, so quickly and accurately finding a user's favorite films within a huge volume of animated-film data has become an urgent problem. Against this background, the purpose of this article is to study new visual expression in animated films based on artificial intelligence and machine learning technology. Taking the informatization and intelligent upgrading of the film industry as its context, the article uses computer vision and machine learning to explore new methods and models for realizing film visual expression, and proposes strategic thinking to promote the innovative development of film visual expression. Using the Hollywood animated film "Kung Fu Panda" as a sample, the authors apply convolutional neural networks to study its new visual expression. The study found that once the model's parameters were fixed, test-set accuracy changed little, remaining around 57%. This is of great significance for improving the audiovisual quality and creative standards of film works and promoting the healthy, sustainable development of the film industry.
2

Anh, Dao Nam. "Interestingness Improvement of Face Images by Learning Visual Saliency." Journal of Advanced Computational Intelligence and Intelligent Informatics 24, no. 5 (September 20, 2020): 630–37. http://dx.doi.org/10.20965/jaciii.2020.p0630.

Abstract:
Connecting features of face images with the interestingness of a face may assist a range of applications such as intelligent visual human-machine communication. To enable this connection, we use interestingness and image features in combination with machine learning techniques. In this paper, we use the visual saliency of face images as learning features to classify the interestingness of the images. Applying multiple saliency detection techniques specifically to objects in the images allows us to create a database of saliency-based features. Consistent estimation of facial interestingness using multiple saliency methods allows us to estimate, and selectively modify, the interestingness of an image. To investigate interestingness, one of the personal characteristics of a face image, a large benchmark face database is tested with our method. Taken together, the method may advance prospects for further research incorporating other personal characteristics and visual attention related to face images.
3

V., Dr Suma. "COMPUTER VISION FOR HUMAN-MACHINE INTERACTION-REVIEW." Journal of Trends in Computer Science and Smart Technology 2019, no. 02 (December 29, 2019): 131–39. http://dx.doi.org/10.36548/jtcsst.2019.2.006.

Abstract:
The paper reviews computer vision as an aid to interaction between humans and machines. Computer vision, a subfield of artificial intelligence and machine learning, is capable of training a computer to visualize, interpret, and respond to the visual world in much the same way human vision does. Thanks to progress, developments, and the latest innovations in artificial intelligence, deep learning, and neural networks, computer vision has found application in broad areas such as health care, safety and security, and surveillance. The paper presents the enhanced capabilities of computer vision in various applications of human-machine interaction involving artificial intelligence, deep learning, and neural networks.
4

Prijs, Jasper, Zhibin Liao, Soheil Ashkani-Esfahani, Jakub Olczak, Max Gordon, Prakash Jayakumar, Paul C. Jutte, Ruurd L. Jaarsma, Frank F. A. IJpma, and Job N. Doornberg. "Artificial intelligence and computer vision in orthopaedic trauma." Bone & Joint Journal 104-B, no. 8 (August 1, 2022): 911–14. http://dx.doi.org/10.1302/0301-620x.104b8.bjj-2022-0119.r1.

Abstract:
Artificial intelligence (AI) is, in essence, the concept of ‘computer thinking’, encompassing methods that train computers to perform and learn from executing certain tasks, called machine learning, and methods to build intricate computer models that both learn and adapt, called complex neural networks. Computer vision is a function of AI by which machine learning and complex neural networks can be applied to enable computers to capture, analyze, and interpret information from clinical images and visual inputs. This annotation summarizes key considerations and future perspectives concerning computer vision, questioning the need for this technology (the ‘why’), the current applications (the ‘what’), and the approach to unlocking its full potential (the ‘how’). Cite this article: Bone Joint J 2022;104-B(8):911–914.
5

Liu, Yang, Anbu Huang, Yun Luo, He Huang, Youzhi Liu, Yuanyuan Chen, Lican Feng, Tianjian Chen, Han Yu, and Qiang Yang. "Federated Learning-Powered Visual Object Detection for Safety Monitoring." AI Magazine 42, no. 2 (October 20, 2021): 19–27. http://dx.doi.org/10.1609/aimag.v42i2.15095.

Abstract:
Visual object detection is an important artificial intelligence (AI) technique for safety monitoring applications. Current approaches to building visual object detection models require large, well-labeled datasets stored by a centralized entity. This not only poses privacy concerns under the General Data Protection Regulation (GDPR), but also incurs large transmission and storage overhead. Federated learning (FL) is a promising machine learning paradigm for addressing these challenges. In this paper, we report on FedVision, a machine learning engineering platform to support the development of federated-learning-powered computer vision applications, to bridge this important gap. The platform has been deployed through collaboration between WeBank and Extreme Vision to help customers develop computer vision-based safety monitoring solutions in smart city applications. Through actual usage, it has demonstrated significant efficiency improvement and cost reduction while fulfilling privacy-preservation requirements (e.g., reducing communication overhead for one company 50-fold and saving close to 40,000 RMB of network cost per annum). To the best of our knowledge, this is the first practical application of FL to computer vision-based tasks.
6

L, Anusha, and Nagaraja G. S. "Outlier Detection in High Dimensional Data." International Journal of Engineering and Advanced Technology 10, no. 5 (June 30, 2021): 128–30. http://dx.doi.org/10.35940/ijeat.e2675.0610521.

Abstract:
Artificial intelligence (AI) is the science that allows computers to replicate human intelligence in areas such as decision-making, text processing, and visual perception. AI is the broader field that contains several subfields such as machine learning, robotics, and computer vision. Machine learning is a branch of AI that allows a machine to learn and improve at a task over time. Deep learning is a subset of machine learning that makes use of deep artificial neural networks for training. The paper proposes outlier detection for multivariate high-dimensional data using an unsupervised autoencoder model.
7

Li, Jing, and Guangren Zhou. "Visual Information Features and Machine Learning for Wushu Arts Tracking." Journal of Healthcare Engineering 2021 (August 4, 2021): 1–6. http://dx.doi.org/10.1155/2021/6713062.

Abstract:
Martial arts tracking is an important research topic in computer vision and artificial intelligence, with extensive and vital applications in video monitoring, interactive animation and 3D simulation, motion capture, and advanced human-computer interaction. However, due to changes in body posture, clothing variability, and mixed lighting, a martial artist's appearance changes significantly, so accurate posture tracking becomes a complicated problem. This paper studies a solution to this problem that improves tracking accuracy through an image representation method based on the second-generation strip wave transform, applied to video martial arts tracking with a machine learning method.
8

Mogadala, Aditya, Marimuthu Kalimuthu, and Dietrich Klakow. "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods." Journal of Artificial Intelligence Research 71 (August 30, 2021): 1183–317. http://dx.doi.org/10.1613/jair.1.11688.

Abstract:
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has been made possible with deep learning, a sub-area of machine learning that uses artificial neural networks. This has created significant interest in the integration of vision and language. In this survey, we focus on ten prominent tasks that integrate language and vision by discussing their problem formulation, methods, existing datasets, evaluation measures, and compare the results obtained with corresponding state-of-the-art methods. Our efforts go beyond earlier surveys which are either task-specific or concentrate only on one type of visual content, i.e., image or video. Furthermore, we also provide some potential future directions in this field of research with an anticipation that this survey stimulates innovative thoughts and ideas to address the existing challenges and build new applications.
9

Zhou, Zhiyu, Jiangfei Ji, Yaming Wang, Zefei Zhu, and Ji Chen. "Hybrid regression model via multivariate adaptive regression spline and online sequential extreme learning machine and its application in vision servo system." International Journal of Advanced Robotic Systems 19, no. 3 (May 1, 2022): 172988062211086. http://dx.doi.org/10.1177/17298806221108603.

Abstract:
To solve the problems of slow convergence, poor robustness, and complex calculation of the image Jacobian matrix in image-based visual servo systems, a hybrid regression model based on multivariate adaptive regression splines (MARS) and an online sequential extreme learning machine (OS-ELM) is proposed to predict the product of the pseudo-inverse of the image Jacobian matrix and the image feature error. In MOS-ELM, MARS is used to evaluate the importance of input features and select specific features as inputs to the online sequential extreme learning machine, so as to obtain better generalization performance and increase the stability of the regression model. Finally, the method is applied to speed predictive control of a manipulator end effector under image-based visual servoing and to prediction on machine learning datasets. Experimental results show that the algorithm has high prediction accuracy on machine learning datasets and good control performance in image-based visual servoing.
10

Liu, Yang, Anbu Huang, Yun Luo, He Huang, Youzhi Liu, Yuanyuan Chen, Lican Feng, Tianjian Chen, Han Yu, and Qiang Yang. "FedVision: An Online Visual Object Detection Platform Powered by Federated Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 08 (April 3, 2020): 13172–79. http://dx.doi.org/10.1609/aaai.v34i08.7021.

Abstract:
Visual object detection is a computer vision-based artificial intelligence (AI) technique with many practical applications (e.g., fire hazard monitoring). However, due to privacy concerns and the high cost of transmitting video data, it is highly challenging to build object detection models on centrally stored large training datasets following the current approach. Federated learning (FL) is a promising approach to resolving this challenge. Nevertheless, there is currently no easy-to-use tool that enables computer vision application developers who are not experts in federated learning to conveniently leverage this technology and apply it in their systems. In this paper, we report FedVision, a machine learning engineering platform to support the development of federated-learning-powered computer vision applications. The platform has been deployed through a collaboration between WeBank and Extreme Vision to help customers develop computer vision-based safety monitoring solutions in smart city applications. Over four months of usage, it has achieved significant efficiency improvement and cost reduction while removing the need to transmit sensitive data for three major corporate customers. To the best of our knowledge, this is the first real application of FL to computer vision-based tasks.
11

Jasim, Mohammed Saaduldeen, and Mohammed Chachan Younis. "Object-based Classification of Natural Scenes Using Machine Learning Methods." Technium: Romanian Journal of Applied Sciences and Technology 6 (February 8, 2023): 1–22. http://dx.doi.org/10.47577/technium.v6i.8286.

Abstract:
The replication of human intellectual processes by machines, particularly computer systems, is known as artificial intelligence (AI). AI is an intelligent tool used across sectors to improve decision-making, increase productivity, and eliminate repetitive tasks. Machine learning (ML) is a key component of AI since it involves understanding and developing methods that can learn or improve performance on tasks. For the last decade, ML has been applied in computer vision (CV) applications. In computer vision, systems and computers extract meaningful data from digital videos, photos, and other visual sources and use that information to conduct actions or make suggestions. In this work, we solve the image segmentation problem for natural images, segmenting out water, land, and sky. Instead of applying segmentation directly to the images, the images are pre-processed, and statistical and textural features are then passed through a neural network for pixel-wise semantic segmentation. We chose the 5×5 window over the pixel-by-pixel technique since it requires fewer resources and less time for training and testing.
12

Kim, Chris, Xiao Lin, Christopher Collins, Graham W. Taylor, and Mohamed R. Amer. "Learn, Generate, Rank, Explain: A Case Study of Visual Explanation by Generative Machine Learning." ACM Transactions on Interactive Intelligent Systems 11, no. 3-4 (December 31, 2021): 1–34. http://dx.doi.org/10.1145/3465407.

Abstract:
While the computer vision problem of searching for activities in videos is usually addressed by using discriminative models, their decisions tend to be opaque and difficult for people to understand. We propose a case study of a novel machine learning approach for generative searching and ranking of motion capture activities with visual explanation. Instead of directly ranking videos in the database given a text query, our approach uses a variant of Generative Adversarial Networks (GANs) to generate exemplars based on the query and uses them to search for the activity of interest in a large database. Our model is able to achieve comparable results to its discriminative counterpart, while being able to dynamically generate visual explanations. In addition to our searching and ranking method, we present an explanation interface that enables the user to successfully explore the model’s explanations and its confidence by revealing query-based, model-generated motion capture clips that contributed to the model’s decision. Finally, we conducted a user study with 44 participants to show that by using our model and interface, participants benefit from a deeper understanding of the model’s conceptualization of the search query. We discovered that the XAI system yielded a comparable level of efficiency, accuracy, and user-machine synchronization as its black-box counterpart, if the user exhibited a high level of trust for AI explanation.
13

Cao, Zehong. "A review of artificial intelligence for EEG‐based brain−computer interfaces and applications." Brain Science Advances 6, no. 3 (September 2020): 162–70. http://dx.doi.org/10.26599/bsa.2020.9050017.

Abstract:
The advancement of neuroscience and computer science promotes the ability of the human brain to communicate and interact with the environment, making the brain–computer interface (BCI) a top interdisciplinary research field. Furthermore, with modern advances in artificial intelligence (AI), including machine learning (ML) and deep learning (DL) methods, there is vast and growing interest in electroencephalogram (EEG)-based BCIs for AI-related visual, literal, and motion applications. In this review, the literature on the mainstreams of AI for EEG-based BCI applications is investigated to fill gaps in the interdisciplinary BCI field. Specifically, EEG signals and their main applications in BCI are first briefly introduced. Next, the latest AI technologies, including ML and DL models, are presented for monitoring and feeding back human cognitive states. Finally, some BCI-inspired AI applications, including computer vision, natural language processing, and robotic control, are presented. Future research directions for EEG-based BCI are highlighted in line with these AI technologies and applications.
14

Hütten, Nils, Richard Meyes, and Tobias Meisen. "Vision Transformer in Industrial Visual Inspection." Applied Sciences 12, no. 23 (November 23, 2022): 11981. http://dx.doi.org/10.3390/app122311981.

Abstract:
Artificial intelligence as an approach to visual inspection in industrial applications has been considered for decades. Recent successes, driven by advances in deep learning, present a potential paradigm shift and have the potential to facilitate an automated visual inspection, even under complex environmental conditions. Thereby, convolutional neural networks (CNN) have been the de facto standard in deep-learning-based computer vision (CV) for the last 10 years. Recently, attention-based vision transformer architectures emerged and surpassed the performance of CNNs on benchmark datasets, regarding regular CV tasks, such as image classification, object detection, or segmentation. Nevertheless, despite their outstanding results, the application of vision transformers to real world visual inspection is sparse. We suspect that this is likely due to the assumption that they require enormous amounts of data to be effective. In this study, we evaluate this assumption. For this, we perform a systematic comparison of seven widely-used state-of-the-art CNN and transformer based architectures trained in three different use cases in the domain of visual damage assessment for railway freight car maintenance. We show that vision transformer models achieve at least equivalent performance to CNNs in industrial applications with sparse data available, and significantly surpass them in increasingly complex tasks.
15

Xu, Yifan, Huapeng Wei, Minxuan Lin, Yingying Deng, Kekai Sheng, Mengdan Zhang, Fan Tang, Weiming Dong, Feiyue Huang, and Changsheng Xu. "Transformers in computational visual media: A survey." Computational Visual Media 8, no. 1 (October 27, 2021): 33–62. http://dx.doi.org/10.1007/s41095-021-0247-3.

Abstract:
Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
16

Allawadi, Sidhant, Jayaty, Parmod Sharma, Kapil Rohilla, and Gopal Deokar. "Artificial intelligence: A cutting edge technology in agriculture." INTERNATIONAL JOURNAL OF AGRICULTURAL SCIENCES 17, no. 1 (January 15, 2021): 114–20. http://dx.doi.org/10.15740/has/ijas/17.1/114-120.

Abstract:
Attention is currently being paid to the use of smart technologies. Agriculture has provided an important source of food for humans over thousands of years, including the development of appropriate farming methods for the cultivation of different crops. New advanced technologies have the potential to monitor the agricultural environment and ensure high-quality produce. In this context, a systematic review of the application of various Artificial Intelligence (AI) technologies and algorithms, together with the latest solutions for making farming more efficient, remains one of the greatest imperatives. Artificial intelligence can be applied directly in agriculture for various operations. Amid high expectations about how AI will help the common person and transform mindsets and attitudes towards the benefits it may bring, there are also concerns about the ill effects of such sophisticated technologies. This review also focuses on the activation of perceptive technologies and the application of computer vision and machine learning in agriculture.
17

Lioutas, Vasileios, Nikolaos Passalis, and Anastasios Tefas. "Explicit ensemble attention learning for improving visual question answering." Pattern Recognition Letters 111 (August 2018): 51–57. http://dx.doi.org/10.1016/j.patrec.2018.04.031.

18

Chrupała, Grzegorz. "Visually Grounded Models of Spoken Language: A Survey of Datasets, Architectures and Evaluation Techniques." Journal of Artificial Intelligence Research 73 (February 18, 2022): 673–707. http://dx.doi.org/10.1613/jair.1.12967.

Abstract:
This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of indirect and noisy clues, crucially including signals from the visual modality co-occurring with spoken utterances. Several fields have made important contributions to this approach to modeling or mimicking the process of learning language: Machine Learning, Natural Language and Speech Processing, Computer Vision and Cognitive Science. The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas. We discuss the central research questions addressed, the timeline of developments, and the datasets which enabled much of this work. We then summarize the main modeling architectures and offer an exhaustive overview of the evaluation metrics and analysis techniques.
19

Caelli, Terry, and Walter F. Bischof. "The Role of Machine Learning in Building Image Interpretation Systems." International Journal of Pattern Recognition and Artificial Intelligence 11, no. 01 (February 1997): 143–68. http://dx.doi.org/10.1142/s021800149700007x.

Abstract:
Machine learning has been applied to many problems related to scene interpretation. It has become clear from these studies that it is important to develop or choose learning procedures appropriate for the types of data models involved in a given problem formulation. In this paper, we focus on this issue of learning with respect to different data structures and consider, in particular, problems related to the learning of relational structures in visual data. Finally, we discuss problems related to rule evaluation in multi-object complex scenes and introduce some new techniques to solve them.
20

Cui, Shaowei, Rui Wang, Junhang Wei, Jingyi Hu, and Shuo Wang. "Self-Attention Based Visual-Tactile Fusion Learning for Predicting Grasp Outcomes." IEEE Robotics and Automation Letters 5, no. 4 (October 2020): 5827–34. http://dx.doi.org/10.1109/lra.2020.3010720.

21

Jafary, P., D. Shojaei, A. Rajabifard, and T. Ngo. "A FRAMEWORK TO INTEGRATE BIM WITH ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING-BASED PROPERTY VALUATION METHODS." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences X-4/W2-2022 (October 14, 2022): 129–36. http://dx.doi.org/10.5194/isprs-annals-x-4-w2-2022-129-2022.

Abstract:
Property valuation is of extreme importance since variations in the real estate market enormously influence people's lives. The main goal of Automated Valuation Models (AVMs) is to calculate the market value of a large number of properties with acceptable accuracy. The Hedonic Price Model (HPM) is the most widely used AVM for valuation purposes. Despite its simplicity, ease of use, and straightforwardness, HPM lacks the capability to address non-linear relationships between different value-related factors. Hence, researchers have developed other state-of-the-art property valuation methods based on advancements in computer science including Artificial Intelligence (AI), Machine Learning (ML), computer vision, and deep learning. Design, development, and validation of such advanced AVMs require a database covering the different influential factors; two types of factors are used in the literature, textual and visual. Reliable data sources are required for the implementation of AVMs, since the accuracy of the provided valuations is directly linked to the reliability of the underlying real estate databases. Building Information Modelling (BIM) provides precise information on different components of properties. Although some scholars have tried to use BIM for property valuation, its benefits in different valuation procedures have not been fully investigated. Hence, this paper provides a framework that considers how BIM capabilities can be integrated with different stages and processes in property valuation, especially in relation to advanced AVMs based on AI and ML.
22

Wang, Junbo, Wei Wang, Liang Wang, Zhiyong Wang, David Dagan Feng, and Tieniu Tan. "Learning visual relationship and context-aware attention for image captioning." Pattern Recognition 98 (February 2020): 107075. http://dx.doi.org/10.1016/j.patcog.2019.107075.

23

Kaur, Sumit. "Deep Learning Based High-Resolution Remote Sensing Image classification." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 10 (October 30, 2017): 22. http://dx.doi.org/10.23956/ijarcsse.v7i10.384.

Abstract:
Deep learning is an emerging research area in the machine learning and pattern recognition field, presented with the goal of drawing machine learning nearer to one of its original objectives, artificial intelligence. It tries to mimic the human brain, which is capable of processing and learning from complex input data and solving many kinds of complicated tasks well. Deep learning (DL) is based on a set of supervised and unsupervised algorithms that attempt to model higher-level abstractions in data and make it self-learning through hierarchical representations for classification. In recent years, it has attracted much attention due to its state-of-the-art performance in diverse areas such as object perception, speech recognition, computer vision, collaborative filtering, and natural language processing. This paper presents a survey of different deep learning techniques for remote sensing image classification.
24

Wang, T. "A Deep Learning-Based Programming and Creation Algorithm of NFT Artwork." Mobile Information Systems 2022 (September 1, 2022): 1–10. http://dx.doi.org/10.1155/2022/2325179.

Abstract:
In the field of computer vision, using artificial intelligence and deep learning to realize the programming and creation of NFT artwork is a very challenging task. With the continuous development and improvement of deep learning technology, this task has become a reality. The generative adversarial network (GAN) models used in deep learning can generate new images based on the extraction and analysis of image data features and have become an important tool for NFT artwork image generation. To better realize NFT artwork programming, this paper analyzes the working principle of the traditional adversarial generation method and then uses the StyleGAN model to edit higher-level attributes of the image, which can effectively control the style of the generated NFT artwork image. To improve the quality of the generated images, the paper introduces a channel attention mechanism and a spatial attention mechanism to ensure that the generated images are more reasonable and realistic. Finally, a large number of experiments prove that the proposed deep-learning-based NFT artwork programming and creation algorithm can control the overall style of image generation as required, and the generated images have good detail and high visual quality.
APA, Harvard, Vancouver, ISO, and other styles
25

Schlosser, Tobias, Michael Friedrich, Frederik Beuth, and Danny Kowerko. "Improving automated visual fault inspection for semiconductor manufacturing using a hybrid multistage system of deep neural networks." Journal of Intelligent Manufacturing 33, no. 4 (January 25, 2022): 1099–123. http://dx.doi.org/10.1007/s10845-021-01906-9.

Full text
Abstract:
In the semiconductor industry, automated visual inspection aims to improve the detection and recognition of manufacturing defects by leveraging the power of artificial intelligence and computer vision systems, enabling manufacturers to profit from an increased yield and reduced manufacturing costs. Previous domain-specific contributions often utilized classical computer vision approaches, whereas more novel systems deploy deep learning based ones. However, a persistent problem in the domain stems from the recognition of very small defect patterns which are often in the size of only a few µm and pixels within vast amounts of high-resolution imagery. While these defect patterns occur on the significantly larger wafer surface, classical machine and deep learning solutions have problems in dealing with the complexity of this challenge. This contribution introduces a novel hybrid multistage system of stacked deep neural networks (SH-DNN) which allows the localization of the finest structures within pixel size via a classical computer vision pipeline, while the classification process is realized by deep neural networks. The proposed system draws the focus over the level of detail from its structures to more task-relevant areas of interest. As the created test environment shows, our SH-DNN-based multistage system surpasses current approaches of learning-based automated visual inspection. The system reaches a performance (F1-score) of up to 99.5%, corresponding to a relative improvement of the system’s fault detection capabilities by 8.6-fold. Moreover, by specifically selecting models for the given manufacturing chain, runtime constraints are satisfied while improving the detection capabilities of currently deployed approaches.
APA, Harvard, Vancouver, ISO, and other styles
26

Jin, Zhenxun, Fengyan Zhong, Qiang Zhang, Weisong Wang, and Xuanyin Wang. "Visual detection of tobacco packaging film based on apparent features." International Journal of Advanced Robotic Systems 18, no. 3 (May 1, 2021): 172988142110248. http://dx.doi.org/10.1177/17298814211024839.

Full text
Abstract:
The main purpose of this article is to study the detection of transparent film on the surface of tobacco packs. The tobacco production line needs an industrial robot to remove the transparent film in the process of unpacking. Therefore, after the industrial robot removes the transparent film, it is necessary to use machine vision technology to determine whether there is transparent film residue on the surface of the tobacco packaging. In this article, based on a study of the optical features of semitransparent objects, an algorithm for detecting transparent film residue on tobacco packs based on surface features is proposed. According to the differences in surface features between tobacco and film, a probability distribution model considering highlights, saturation, and texture density is designed. Because the probability distribution model integrates many features of tobacco and film, it distinguishes the tobacco and film regions more reliably. In this article, an appropriate foreground box with a trapezoidal mask and the GrabCut image segmentation algorithm are used to segment the foreground area of the tobacco pack more accurately, and the possible film area is obtained by image differencing and morphological processing. Finally, after comparing the effect of various machine learning algorithms on the classification of possible film regions, a support vector machine based on color features is used to judge the possible film regions. Application results show that the method proposed in this article can effectively detect whether there is film residue on the surface of a tobacco pack.
APA, Harvard, Vancouver, ISO, and other styles
27

Mahmoud, Ahmed, and Mohamed Atia. "Improved Visual SLAM Using Semantic Segmentation and Layout Estimation." Robotics 11, no. 5 (September 6, 2022): 91. http://dx.doi.org/10.3390/robotics11050091.

Full text
Abstract:
The technological advances in computational systems have enabled very complex computer vision and machine learning approaches to perform efficiently and accurately. These new approaches can be considered a new set of tools to reshape the visual SLAM solutions. We present an investigation of the latest neuroscientific research that explains how the human brain can accurately navigate and map unknown environments. The accuracy suggests that human navigation is not affected by traditional visual odometry drifts resulting from tracking visual features. It utilises the geometrical structures of the surrounding objects within the navigated space. The identified objects and space geometrical shapes anchor the estimated space representation and mitigate the overall drift. Inspired by the human brain’s navigation techniques, this paper presents our efforts to incorporate two machine learning techniques into a VSLAM solution: semantic segmentation and layout estimation to imitate human abilities to map new environments. The proposed system benefits from the geometrical relations between the corner points of the cuboid environments to improve the accuracy of trajectory estimation. Moreover, the implemented SLAM solution semantically groups the map points and then tracks each group independently to limit the system drift. The implemented solution yielded higher trajectory accuracy and immunity to large pure rotations.
APA, Harvard, Vancouver, ISO, and other styles
28

Papadimitriou, Angeliki, Nikolaos Passalis, and Anastasios Tefas. "Visual representation decoding from human brain activity using machine learning: A baseline study." Pattern Recognition Letters 128 (December 2019): 38–44. http://dx.doi.org/10.1016/j.patrec.2019.08.007.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Zhu, Wenyao, and Shuyue Zhou. "3D Reconstruction Method of Virtual and Real Fusion Based on Machine Learning." Mathematical Problems in Engineering 2022 (May 19, 2022): 1–11. http://dx.doi.org/10.1155/2022/7158504.

Full text
Abstract:
With the continuous development of computer vision technology, people are paying more and more attention to methods that use computers to simulate actual 3D scenes, and the requirements for 3D reconstruction technology are getting higher and higher. Virtual-real fusion refers to combining the virtual environment generated by the computer with the actual scenes around the user through photoelectric displays, sensors, computer graphics, multimedia, and other technologies. It is a technology that can obtain more convenient and direct expressions, and also one for expressing content more abundantly and accurately. The key to virtual-real fusion technology is the registration of virtual objects and real scenes: the system should be able to correctly estimate the position and posture of the camera in the real world and then place the virtual object where it should be. Machine learning is a multifield interdisciplinary subject that specializes in how computers simulate or realize human learning behaviors. It is the core of artificial intelligence and the fundamental way to make computers intelligent, with applications across all fields of artificial intelligence. This article introduces a virtual-real fusion 3D reconstruction method based on machine learning and compares its performance with other algorithms through experiments, drawing the following conclusions: the algorithm in this study is the fastest, with an average speed of 72.9% under different times; evaluating the image acquisition indicators of each algorithm shows that the algorithm in this study has the lowest error rate; and testing the matching accuracy of each algorithm shows that the average matching accuracy of the algorithm in this study is about 0.87, the highest among them.
APA, Harvard, Vancouver, ISO, and other styles
30

Fradkov, Alexander L., and Alexander I. Shepeljavyi. "The history of cybernetics and artificial intelligence: a view from Saint Petersburg." Cybernetics and Physics 11, no. 4 (December 30, 2022): 253–63. http://dx.doi.org/10.35470/2226-4116-2022-11-3-253-263.

Full text
Abstract:
In the article the history of cybernetics and artificial intelligence in the world and, particularly in the USSR is outlined starting from the 1940s-1950s. The rapid development of these areas in the 1960s is described in more detail. Special attention is paid to the results of Leningrad (St. Petersburg) researchers, particularly to the work of Vladimir Yakubovich and his scientific school on machine learning, pattern recognition, adaptive systems, intelligent robots and their importance for the further development of cybernetics and artificial intelligence.
APA, Harvard, Vancouver, ISO, and other styles
31

Gumbs, Andrew A., Vincent Grasso, Nicolas Bourdel, Roland Croner, Gaya Spolverato, Isabella Frigerio, Alfredo Illanes, Mohammad Abu Hilal, Adrian Park, and Eyad Elyan. "The Advances in Computer Vision That Are Enabling More Autonomous Actions in Surgery: A Systematic Review of the Literature." Sensors 22, no. 13 (June 29, 2022): 4918. http://dx.doi.org/10.3390/s22134918.

Full text
Abstract:
This is a review focused on the advances and current limitations of computer vision (CV) and how CV can help us get to more autonomous actions in surgery. It is a follow-up to an article we previously published in Sensors entitled “Artificial Intelligence Surgery: How Do We Get to Autonomous Actions in Surgery?” As opposed to that article, which also discussed issues of machine learning, deep learning, and natural language processing, this review delves deeper into the field of CV. Additionally, non-visual forms of data that can aid computerized robots in the performance of more autonomous actions, such as instrument priors and audio haptics, are also highlighted. Furthermore, the current existential crisis for surgeons, endoscopists, and interventional radiologists regarding more autonomy during procedures is discussed. In summary, this paper discusses how to harness the power of CV to keep doctors who perform interventions in the loop.
APA, Harvard, Vancouver, ISO, and other styles
32

Muhamada, Azhee Wria, and Aree A. Mohammed. "Review on recent Computer Vision Methods for Human Action Recognition." ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal 10, no. 4 (February 8, 2022): 361–79. http://dx.doi.org/10.14201/adcaij2021104361379.

Full text
Abstract:
The subject of human activity recognition has been considered an important goal in the domain of computer vision since the beginning of its development and has reached new levels, though it is by no means a simple procedure. Problems arise in fast-moving and complex scenes, and activity prediction using artificial intelligence (AI) has drawn increasing attention from researchers. Several datasets, with substantial methodological and content-related variations, were created to support the evaluation of these methods. Human activities play an important but challenging role in various fields. Many applications exist in this area, such as smart homes, assistive AI, HCI (Human-Computer Interaction), and advancements in protection in applications such as transportation, education, security, and medication management, including fall detection or helping the elderly with medical drug consumption. The positive impact of deep learning techniques on many vision applications has led to their deployment in video processing. Analysis of human behavior activities involves major challenges when human presence is concerned: one individual can be represented in multiple video sequences through skeleton, motion, and/or abstract characteristics. This work aims to address human presence by combining many options and utilizing a new RNN structure for activities. The paper focuses on recent advances in machine learning-assisted action recognition, covering existing modern techniques for action recognition and prediction as well as the future scope for analysis.
APA, Harvard, Vancouver, ISO, and other styles
33

Otsubo, Shun, Yasutake Takahashi, and Masaki Haruna. "Modular Neural Network for Learning Visual Features, Routes, and Operation Through Human Driving Data Toward Automatic Driving System." Journal of Advanced Computational Intelligence and Intelligent Informatics 24, no. 3 (May 20, 2020): 368–76. http://dx.doi.org/10.20965/jaciii.2020.p0368.

Full text
Abstract:
This paper proposes an automatic driving system based on a combination of modular neural networks processing human driving data. Research on automatic driving vehicles has been actively conducted in recent years. Machine learning techniques are often utilized to realize an automatic driving system capable of imitating human driving operations. Almost all of them adopt a large monolithic learning module, as typified by deep learning. However, it is inefficient to use a monolithic deep learning module to learn human driving operations (accelerating, braking, and steering) using the visual information obtained from a human driving a vehicle. We propose combining a series of modular neural networks that independently learn visual feature quantities, routes, and driving maneuvers from human driving data, thereby imitating human driving operations and efficiently learning a plurality of routes. This paper demonstrates the effectiveness of the proposed method through experiments using a small vehicle.
APA, Harvard, Vancouver, ISO, and other styles
34

Peterson, Marco, Minzhen Du, Bryant Springle, and Jonathan Black. "SpaceDrones 2.0—Hardware-in-the-Loop Simulation and Validation for Orbital and Deep Space Computer Vision and Machine Learning Tasking Using Free-Flying Drone Platforms." Aerospace 9, no. 5 (May 6, 2022): 254. http://dx.doi.org/10.3390/aerospace9050254.

Full text
Abstract:
The proliferation of reusable space vehicles has fundamentally changed how assets are injected into the low earth orbit and beyond, increasing both the reliability and frequency of launches. Consequently, it has led to the rapid development and adoption of new technologies in the aerospace sector, including computer vision (CV), machine learning (ML)/artificial intelligence (AI), and distributed networking. All these technologies are necessary to enable truly autonomous “Human-out-of-the-loop” mission tasking for spaceborne applications as spacecrafts travel further into the solar system and our missions become more ambitious. This paper proposes a novel approach for space-based computer vision sensing and machine learning simulation and validation using synthetically trained models to generate the large amounts of space-based imagery needed to train computer vision models. We also introduce a method of image data augmentation known as domain randomization to enhance machine learning performance in the dynamic domain of spaceborne computer vision to tackle unique space-based challenges such as orientation and lighting variations. These synthetically trained computer vision models then apply that capability for hardware-in-the-loop testing and evaluation via free-flying robotic platforms, thus enabling sensor-based orbital vehicle control, onboard decision making, and mobile manipulation similar to air-bearing table methods. Given the current energy constraints of space vehicles using solar-based power plants, cameras provide an energy-efficient means of situational awareness when compared to active sensing instruments. When coupled with computationally efficient machine learning algorithms and methods, it can enable space systems proficient in classifying, tracking, capturing, and ultimately manipulating objects for orbital/planetary assembly and maintenance (tasks commonly referred to as In-Space Assembly and On-Orbit Servicing). 
Given the inherent dangers of manned spaceflight/extravehicular activities (EVAs) currently employed to perform spacecraft maintenance and the current limitation of long-duration human spaceflight outside the low earth orbit, space robotics armed with generalized sensing and control and machine learning architecture have a unique automation potential. However, the tools and methodologies required for hardware-in-the-loop simulation, testing, and validation at a large scale and at an affordable price point are in developmental stages. By leveraging a drone’s free-flight maneuvering capability, theater projection technology, synthetically generated orbital and celestial environments, and machine learning, this work strives to build a robust hardware-in-the-loop testing suite. While the focus of the specific computer vision models in this paper is narrowed down to solving visual sensing problems in orbit, this work can very well be extended to solve any problem set that requires a robust onboard computer vision, robotic manipulation, and free-flight capabilities.
APA, Harvard, Vancouver, ISO, and other styles
35

Tawiah, Thomas Andzi-Quainoo. "A review of algorithms and techniques for image-based recognition and inference in mobile robotic systems." International Journal of Advanced Robotic Systems 17, no. 6 (November 1, 2020): 172988142097227. http://dx.doi.org/10.1177/1729881420972278.

Full text
Abstract:
Autonomous vehicles include driverless, self-driving and robotic cars, and other platforms capable of sensing and interacting with its environment and navigating without human help. On the other hand, semiautonomous vehicles achieve partial realization of autonomy with human intervention, for example, in driver-assisted vehicles. Autonomous vehicles first interact with their surrounding using mounted sensors. Typically, visual sensors are used to acquire images, and computer vision techniques, signal processing, machine learning, and other techniques are applied to acquire, process, and extract information. The control subsystem interprets sensory information to identify appropriate navigation path to its destination and action plan to carry out tasks. Feedbacks are also elicited from the environment to improve upon its behavior. To increase sensing accuracy, autonomous vehicles are equipped with many sensors [light detection and ranging (LiDARs), infrared, sonar, inertial measurement units, etc.], as well as communication subsystem. Autonomous vehicles face several challenges such as unknown environments, blind spots (unseen views), non-line-of-sight scenarios, poor performance of sensors due to weather conditions, sensor errors, false alarms, limited energy, limited computational resources, algorithmic complexity, human–machine communications, size, and weight constraints. To tackle these problems, several algorithmic approaches have been implemented covering design of sensors, processing, control, and navigation. The review seeks to provide up-to-date information on the requirements, algorithms, and main challenges in the use of machine vision–based techniques for navigation and control in autonomous vehicles. An application using land-based vehicle as an Internet of Thing-enabled platform for pedestrian detection and tracking is also presented.
APA, Harvard, Vancouver, ISO, and other styles
36

Tewes, Federico R. "Artificial Intelligence in the American Healthcare Industry: Looking Forward to 2030." Journal of Medical Research and Surgery 3, no. 5 (October 6, 2022): 107–8. http://dx.doi.org/10.52916/jmrs224089.

Full text
Abstract:
Artificial intelligence (AI) has the potential to speed up the exponential growth of cutting-edge technology, much the way the Internet did. Due to intense competition from the private sector, governments, and businesspeople around the world, the Internet has already reached its peak as an exponential technology. In contrast, artificial intelligence is still in its infancy, and people all over the world are unsure of how it will impact their lives in the future. Artificial intelligence is a field of technology that enables robots and computer programmes to mimic human intellect by applying a predetermined set of software rules that learn by repetition from experience and slowly move toward maximum performance. Although this intelligence is still developing, it has already demonstrated five different levels of independence: first, solving problems; second, thinking through solutions; third, answering questions; fourth, generating forecasts with data analytics; and fifth, making tactical recommendations. Massive data sets and "iterative algorithms," which use lookup tables and other data structures like stacks and queues to solve problems, make all of this possible. Iteration is a strategy in which software rules are regularly adjusted to patterns in the data for a certain number of iterations. The artificial intelligence continuously makes small, incremental improvements that result in exponential growth, which enables the computer to become incredibly proficient at whatever it is trained to do. For each round of data processing, the artificial intelligence tests and measures its performance to develop new expertise. In order to address complicated problems, artificial intelligence aims to create computer systems that can mimic human behavior and exhibit human-like thought processes [1]. In the field of healthcare, artificial intelligence technology is being developed to deliver individualized medicine.
By 2030, six different artificial intelligence sectors will have considerably improved healthcare delivery through the utilization of larger, more accessible data sets. The first is machine learning. This area of artificial intelligence learns automatically and produces improved results by identifying patterns in the data, gaining new insights, and enhancing the outcomes of whatever activity the system is intended to accomplish, all without being trained on a particular topic. Here are several instances of machine learning in the healthcare industry. The first is IBM Watson Genomics, which aids in rapid disease diagnosis and identification by fusing cognitive computing with genome-based tumour sequencing. Second, a project based on Naïve Bayes allows the prediction of diabetes years before an official diagnosis, before it results in harm to the kidneys, the heart, and the nerves. Third, two machine learning approaches termed classification and clustering are employed to analyse the Indian Liver Patient Dataset (ILPD) in order to predict liver illness before this organ that regulates metabolism becomes susceptible to chronic hepatitis, liver cancer, and cirrhosis [2]. The second sector is deep learning. Deep learning employs artificial intelligence to learn from data processing, much like machine learning does. Deep learning, however, makes use of artificial neural networks that mimic human brain function to analyse data, identify relationships between the data, and provide outputs based on positive and negative reinforcement. For instance, in Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), deep learning aids in the processes of image recognition and object detection. Deep learning algorithms for the early identification of Alzheimer's, diabetic retinopathy, and breast nodule ultrasound detection are three applications of this cutting-edge technology in the real world.
Future developments in deep learning will bring considerable improvements in pathology and radiology imaging [3]. The third sector is neural networks. Because the computer learning process resembles a network of neurons in the human brain, the artificial intelligence system can accept massive data sets, find patterns within the data, and respond to queries about the information processed. Let's examine a few application examples now relevant to the healthcare sector. According to studies from Johns Hopkins University, surgical errors are a major contributor to medical malpractice claims, since they happen more than 4,000 times a year in the United States alone due to the human error of surgeons. Neural networks can be used in robot-assisted surgery to model and plan procedures, evaluate the abilities of the surgeon, and streamline surgical activities. In one study of 379 orthopaedic patients, it was discovered that robotic surgery using neural networks results in five times fewer complications than surgery performed by a single surgeon. Another application of neural networks is in visual diagnostics, as demonstrated by Harvard University researchers who inserted an image of a gorilla into x-rays; 83% of the radiologists who saw the images did not recognise the gorilla. The Houston Medical Research Institute has created a breast cancer early detection programme that can analyse mammograms with 99 percent accuracy and offer diagnostic information 30 times faster than a human [4]. The fourth sector is cognitive computing, which aims to replicate the way people and machines interact, showing how a computer may operate like the human brain when handling challenging tasks such as text, speech, or image analysis. Large volumes of patient data have been analysed, with the majority of the research to date focusing on cancer, diabetes, and cardiovascular disease. Companies like Google, IBM, Facebook, and Apple have shown interest in this work.
Cognitive computing made up the greatest component of the artificial intelligence market in 2020, with 39% of the total [5]. Hospitals made up 42% of the market for cognitive computing end users because of the rising demand for individualised medical data. Having predicted the demand for cognitive computing in this sector, IBM invested more than $1 billion in 2014 in developing the WATSON analytics platform ecosystem and collaborating with startups committed to creating various cloud- and application-based systems for the healthcare business. The fifth sector is Natural Language Processing (NLP). This area of artificial intelligence enables computers to comprehend and analyse spoken language. The initial phase of this pre-processing is to divide the data into more manageable semantic units, which simply makes the information easier for the NLP system to understand. Clinical trial development is experiencing exponential expansion in the healthcare sector thanks to NLP. First, NLP uses speech-to-text dictation and structured data entry to extract clinical data at the point of care, reducing the need for manual assessment of complex clinical paperwork. Second, using NLP technology, healthcare professionals can automatically examine enormous amounts of unstructured clinical and patient data to select the most suitable patients for clinical trials, potentially leading to an improvement in the patients' health [6]. The sixth sector is computer vision. Computer vision, an essential part of artificial intelligence, uses visual data as input to process photos and videos continuously in order to get better results faster and with higher quality than would be possible if the same job were done manually. Simply put, doctors can now diagnose their patients with diseases like cancer, diabetes, and cardiovascular disorders more quickly and at an earlier stage. Here are a few examples of real-world applications where computer vision technology is making notable strides.
Mammogram images are analysed by visual systems designed to spot breast cancer at an early stage. Automated cell counting is another real-world example; it dramatically decreases human error and addresses concerns about the accuracy of manual counts, whose results can differ greatly depending on the examiner's experience and degree of focus. A third real-world application of computer vision is the quick and painless early-stage tumour detection enabled by artificial intelligence. Without a doubt, computer vision has unfathomable potential to significantly enhance how healthcare is delivered. Beyond visual data analysis, clinicians can use this technology to enhance their training and skill development. Currently, Gramener is the top company offering computer vision solutions to medical facilities and research organisations [7]. The usage of imperative rather than functional programming languages is one of the key difficulties in creating artificial intelligence software. As artificial intelligence starts to increase exponentially, developers employing imperative programming languages must assume that the machine is stupid and supply detailed instructions that are subject to a high level of maintenance and human error. In software with hundreds of thousands of lines of code, human error detection is challenging. Therefore, the substantial amount of ensuing maintenance may become ridiculously expensive, maintaining the high expenditures of research and development. As a result, software developers have contributed to the unreasonably high cost of medical care. Functional programming languages, on the other hand, demand that the developer use their problem-solving abilities as though the computer were a mathematician. As a result, mathematical functions are orders of magnitude shorter than the lines of code needed by an imperative programme to perform the same operation.
The bulk of software developers that use functional programming languages are well trained in mathematical logic; thus, they reason differently than most American software developers, who are more accustomed to following step-by-step instructions. The market for artificial intelligence in healthcare is expected to increase from $3.4 billion in 2021 to at least $18.7 billion by 2027, or a 30 percent annual growth rate before 2030, according to market research firm IMARC Group. The only outstanding query is whether these operational reductions will ultimately result in less expensive therapies.
APA, Harvard, Vancouver, ISO, and other styles
37

Wu, Jun, Jiaming Dong, Wanyu Nie, and Zhiwei Ye. "A Lightweight YOLOv5 Optimization of Coordinate Attention." Applied Sciences 13, no. 3 (January 30, 2023): 1746. http://dx.doi.org/10.3390/app13031746.

Full text
Abstract:
As machine learning technologies evolve, there is a desire to add vision capabilities to all devices within the IoT in order to enable a wider range of artificial intelligence. However, for most mobile devices, computing power and storage space are constrained by factors such as cost and the tight supply of relevant chips, making it impossible to effectively deploy complex network models to small processors with limited resources and to perform efficient real-time detection. In this paper, YOLOv5 is studied to achieve the goal of lightweight deployment by reducing the number of original network channels. Detection accuracy is then preserved by adding a detection head and a coordinate attention (CA) mechanism. The YOLOv5-RC model proposed in this paper is 30% smaller and lighter than YOLOv5s but still maintains good detection accuracy. The YOLOv5-RC network model achieves a good balance between detection accuracy and detection speed, with potential for widespread use in industry.
APA, Harvard, Vancouver, ISO, and other styles
38

Lin, Yueh-lung, and Conghua Wen. "Vehicle Vision Robust Detection and Recognition Method." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 10 (December 31, 2019): 2055020. http://dx.doi.org/10.1142/s0218001420550204.

Full text
Abstract:
With the rapid growth of the global economy, global car ownership is increasing year by year, causing a series of problems, the most prominent of which are traffic congestion and traffic accidents. To solve these traffic problems, many countries are actively studying intelligent transportation systems, and one of their important research topics is vehicle detection. Vision-based vehicle detection captures vehicle images in the driving environment through a camera and then uses computer vision recognition technology for vehicle detection and recognition. Although computer vision recognition technology has made great progress, improving detection accuracy on the images to be detected remains an important research topic. Research on robust visual detection and recognition methods for intelligent vehicles is of great significance for reducing the growing incidence of traffic accidents, improving road traffic safety and transportation efficiency, and alleviating driver fatigue. This paper further studies robust detection and recognition based on machine vision, a key technology for intelligent vehicle environmental awareness. A particle filter is used to extract the local energy of the image to realize fast segmentation of the region of interest (ROI). To further verify the ROI, a metric learning method based on multi-kernel embedding is proposed, and semantic classification of the ROI is realized by integrating its color, shape and geometric features. Experimental results show that the algorithm can effectively eliminate false-positive ROIs and is robust to complex backgrounds, illumination changes, perspective changes and other conditions.
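The multi-kernel embedding idea, combining colour, shape and geometric cues into one ROI similarity, can be sketched as a weighted sum of per-feature kernels. The feature values, weights and the RBF kernel form below are illustrative assumptions, not the paper's actual formulation:

```python
# Hedged sketch of multi-kernel ROI verification: one RBF kernel per
# feature type (color, shape, geometry), combined by fixed weights.
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def multi_kernel(features_x, features_y, weights):
    """Convex combination of per-feature kernels."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * rbf(fx, fy)
               for w, fx, fy in zip(weights, features_x, features_y))

# color / shape / geometry descriptors for two candidate ROIs (made up)
roi_a = [(0.2, 0.5), (1.0, 0.0), (0.3,)]
roi_b = [(0.25, 0.5), (0.9, 0.1), (0.35,)]
sim = multi_kernel(roi_a, roi_b, weights=[0.5, 0.3, 0.2])
print(round(sim, 3))
```

In a real system the weights would be learned so that true vehicle ROIs score high against class prototypes and background ROIs score low.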
APA, Harvard, Vancouver, ISO, and other styles
39

Liu, Yuanyuan, Xingmei Li, Fang Fang, Fayong Zhang, Jingying Chen, and Zhizhong Zeng. "Visual Focus of Attention and Spontaneous Smile Recognition Based on Continuous Head Pose Estimation by Cascaded Multi-Task Learning." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 07 (June 7, 2019): 1940006. http://dx.doi.org/10.1142/s0218001419400068.

Full text
Abstract:
Multi-person visual focus of attention (M-VFOA) and spontaneous smile (SS) recognition are important for understanding and analyzing people's behavior in class. Recently, promising results have been reported using special hardware in constrained environments. However, M-VFOA and SS recognition remain challenging in natural, crowded classroom environments, e.g. under varied poses, occlusion, expressions, illumination and poor image quality. In this study, a robust and non-invasive M-VFOA and SS recognition system is developed based on continuous head pose estimation in the natural classroom. A novel cascaded multi-task Hough forest (CM-HF), combining weighted Hough voting and multi-task learning, is proposed for continuous head pose estimation, nose-tip location and SS recognition, which improves recognition accuracy and reduces training time. M-VFOA is then recognized based on estimated head poses, environmental cues and prior states in the natural classroom. Meanwhile, SS is classified using CM-HF with local cascaded mouth-eye areas normalized by the estimated head poses. The method is rigorously evaluated for continuous head pose estimation, multi-person VFOA recognition and SS recognition on several publicly available datasets and real-class video sequences. Experimental results show that our method greatly reduces training time and outperforms state-of-the-art methods in both performance and robustness, with average accuracies of 83.5% on head pose estimation, 67.8% on M-VFOA recognition and 97.1% on SS recognition in challenging environments.
APA, Harvard, Vancouver, ISO, and other styles
40

Jaiswal, Shalini, Preeti Singh Bahadur, and Sai Sri Nandan Challapalli. "Latest Advances of Natural Language Processing and their Applications in Everyday life." International Journal for Modern Trends in Science and Technology 6, no. 10 (November 24, 2020): 31–35. http://dx.doi.org/10.46501/ijmtst061006.

Full text
Abstract:
The natural language processing (NLP) area of artificial intelligence (AI) has offered scope to apply and integrate various other traditional AI fields. While the world was working on comparatively simpler aspects like constraint satisfaction and logical reasoning, the last decade saw a dramatic shift in research. Large-scale applications of statistical methods, such as machine learning and data mining, are now in the limelight. At the same time, integrating this understanding with computer vision, a technology that obtains information from visual data through cameras, will pave the way to bring AI-enabled devices closer to the layman as well. This paper gives an overview of the implementation and trend analysis of such technology in the sales and service sectors.
APA, Harvard, Vancouver, ISO, and other styles
41

Lu, Yuanyao, and Kexin Li. "Research on lip recognition algorithm based on MobileNet + attention-GRU." Mathematical Biosciences and Engineering 19, no. 12 (2022): 13526–40. http://dx.doi.org/10.3934/mbe.2022631.

Full text
Abstract:
With the development of deep learning and artificial intelligence, the application of lip recognition is in high demand in computer vision and human-machine interaction. In particular, using automatic lip recognition technology to improve performance during social interactions for those with hearing or pronunciation difficulties is one of the most promising applications of artificial intelligence in medical healthcare and rehabilitation. Lip recognition means recognizing the content expressed by the speaker by analyzing dynamic lip motions. At present, lip recognition research mainly focuses on algorithms and computational performance, but there are relatively few research articles on its practical application. To amend that, this paper focuses on the research of a deep learning-based lip recognition application system, i.e., the design and development of a speech correction system for the hearing impaired, which aims to lay the foundation for the comprehensive implementation of automatic lip recognition technology in the future. First, we used a MobileNet lightweight network to extract spatial features from the original lip image; the extracted features are robust and fault-tolerant. Then, a gated recurrent unit (GRU) network was used to further extract the 2D image features and temporal features of the lip. To further improve the recognition rate, we incorporated an attention mechanism into the GRU network; the performance of this model is illustrated through a large number of experiments. Meanwhile, we constructed a lip similarity matching system to assist hearing-impaired people in learning and correcting their mouth shape for correct pronunciation. The experiments finally show that this system is highly feasible and effective.
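The attention mechanism added on top of the GRU can be sketched as softmax-weighted pooling of per-timestep features. The hidden states and scoring vector below are made-up toy values; in the real model both the GRU states and the attention scorer are learned:

```python
# Minimal sketch of attention pooling over a sequence of hidden states:
# each timestep gets a softmax weight, and the weighted sum becomes the
# clip-level feature. All numbers are illustrative.
import math

def attention_pool(hidden_states, score_vec):
    scores = [sum(h_i * s_i for h_i, s_i in zip(h, score_vec))
              for h in hidden_states]
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

# three GRU timesteps with 2-dimensional hidden states (hypothetical)
H = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
pooled = attention_pool(H, score_vec=[1.0, 0.0])
print([round(v, 3) for v in pooled])
```

The timestep scoring highest under `score_vec` dominates the pooled feature, which is how attention lets the model emphasize the most informative frames of a lip sequence.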
APA, Harvard, Vancouver, ISO, and other styles
42

BELZ, A., T. L. BERG, and L. YU. "From image to language and back again." Natural Language Engineering 24, no. 3 (April 23, 2018): 325–62. http://dx.doi.org/10.1017/s1351324918000086.

Full text
Abstract:
Work in computer vision and natural language processing involving images and text has been experiencing explosive growth over the past decade, with a particular boost coming from the neural network revolution. The present volume brings together five research articles from several different corners of the area: multilingual multimodal image description (Frank et al.), multimodal machine translation (Madhyastha et al., Frank et al.), image caption generation (Madhyastha et al., Tanti et al.), visual scene understanding (Silberer et al.), and multimodal learning of high-level attributes (Sorodoc et al.). In this article, we touch upon all of these topics as we review work involving images and text under the three main headings of image description (Section 2), visually grounded referring expression generation (REG) and comprehension (Section 3), and visual question answering (VQA) (Section 4).
APA, Harvard, Vancouver, ISO, and other styles
43

Takadama, Keiki, and Kazuteru Miyazaki. "Special Issue on Cutting Edge of Reinforcement Learning and its Hybrid Methods." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 5 (September 20, 2017): 833. http://dx.doi.org/10.20965/jaciii.2017.p0833.

Full text
Abstract:
Machine learning has been attracting significant attention again since the potential of deep learning was recognized. Not only has machine learning been improved, but it has also been integrated with "reinforcement learning," revealing other potential applications, e.g., deep Q-networks (DQN) and AlphaGO proposed by Google DeepMind. It is against this background that this special issue, "Cutting Edge of Reinforcement Learning and its Hybrid Methods," focuses on both reinforcement learning and its hybrid methods, including reinforcement learning with deep learning or evolutionary computation, to explore new potentials of reinforcement learning. Of the many contributions received, we finally selected 13 works for publication. The first three propose hybrids of deep learning and reinforcement learning for single agent environments, which include the latest research results in the areas of convolutional neural networks and DQN. The fourth through seventh works are related to the Learning Classifier System, which integrates evolutionary computation and reinforcement learning to develop the rule discovery mechanism. The eighth and ninth works address problems related to goal design or the reward, an issue that is particularly important to the application of reinforcement learning. The last four contributions deal with multiagent environments. These works cover a wide range of studies, from the expansion of techniques incorporating simultaneous learning to applications in multiagent environments. All works are on the cutting edge of reinforcement learning and its hybrid methods. We hope that this special issue constitutes a large contribution to the development of the reinforcement learning field.
APA, Harvard, Vancouver, ISO, and other styles
44

Ahmed, Ibrahim Abdulrab, Ebrahim Mohammed Senan, Taha H. Rassem, Mohammed A. H. Ali, Hamzeh Salameh Ahmad Shatnawi, Salwa Mutahar Alwazer, and Mohammed Alshahrani. "Eye Tracking-Based Diagnosis and Early Detection of Autism Spectrum Disorder Using Machine Learning and Deep Learning Techniques." Electronics 11, no. 4 (February 10, 2022): 530. http://dx.doi.org/10.3390/electronics11040530.

Full text
Abstract:
Eye tracking is a useful technique for detecting autism spectrum disorder (ASD), one of whose most important markers is atypical visual attention. The eye-tracking technique provides useful information about children's visual behaviour for early and accurate diagnosis. It works by scanning the paths of the eyes to extract a sequence of eye projection points on the image and analysing the behaviour of children with autism. In this study, three artificial-intelligence techniques were developed, namely, machine learning, deep learning, and a hybrid of the two, for early diagnosis of autism. The first technique, neural networks [feedforward neural networks (FFNNs) and artificial neural networks (ANNs)], is based on classifying features extracted by a hybrid of the local binary pattern (LBP) and grey level co-occurrence matrix (GLCM) algorithms. This technique achieved a high accuracy of 99.8% for FFNNs and ANNs. The second technique used pre-trained convolutional neural network (CNN) models, GoogleNet and ResNet-18, on the basis of deep feature map extraction. The GoogleNet and ResNet-18 models achieved high performances of 93.6% and 97.6%, respectively. The third technique used hybrids of deep learning (GoogleNet and ResNet-18) and machine learning (SVM), called GoogleNet + SVM and ResNet-18 + SVM. This technique depends on two blocks: the first block uses a CNN to extract deep feature maps, whilst the second block uses an SVM to classify the features extracted from the first block. This technique proved its high diagnostic ability, achieving accuracies of 95.5% and 94.5% for GoogleNet + SVM and ResNet-18 + SVM, respectively.
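The LBP half of the hybrid LBP+GLCM feature extractor can be sketched with the basic 3x3 operator: each pixel is coded by thresholding its eight neighbours against the centre value. The sample patch below is illustrative:

```python
# Sketch of the basic 3x3 local binary pattern (LBP) descriptor.
# Pixel intensities in the sample patch are made up for illustration.

def lbp_code(patch):
    """patch: 3x3 list of lists; returns the 8-bit LBP code of the centre,
    reading neighbours clockwise from the top-left."""
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        if patch[i][j] >= c:        # neighbour at least as bright as centre
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # one texture code per pixel position
```

A histogram of these codes over an image region gives the texture feature vector that a classifier such as an FFNN can then consume.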
APA, Harvard, Vancouver, ISO, and other styles
45

Zvarevashe, Kudakwashe, and Oludayo O. Olugbara. "Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm." Intelligent Data Analysis 24, no. 5 (September 30, 2020): 1065–86. http://dx.doi.org/10.3233/ida-194747.

Full text
Abstract:
Speech emotion recognition has become the heart of most human computer interaction applications in the modern world. The growing need to develop emotionally intelligent devices has opened up a lot of research opportunities. Most researchers in this field have applied handcrafted features and machine learning techniques to recognising speech emotion. However, these techniques require extra processing steps, and handcrafted features are usually not robust. They are computationally intensive, and the curse of dimensionality results in low discriminating power. Research has shown that deep learning algorithms are effective for extracting robust and salient features from datasets. In this study, we have developed a custom 2D-convolution neural network that performs both feature extraction and classification of vocal utterances. The neural network has been evaluated against a deep multilayer perceptron neural network and a deep radial basis function neural network using the Berlin database of emotional speech, the Ryerson audio-visual emotional speech database and the Surrey audio-visual expressed emotion corpus. The described deep learning algorithm achieves the highest precision, recall and F1-scores when compared to other existing algorithms. It is observed that there may be a need to develop customized solutions for different language settings depending on the area of application.
APA, Harvard, Vancouver, ISO, and other styles
46

Qiu, Shaojian, Lu Lu, Siyu Jiang, and Yang Guo. "An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 12 (November 2019): 1959037. http://dx.doi.org/10.1142/s0218001419590377.

Full text
Abstract:
Machine-learning-based software defect prediction (SDP) methods are receiving great attention from researchers of intelligent software engineering. Most existing SDP methods are performed under a within-project setting. However, there is usually little to no within-project training data to learn a supervised prediction model for a new SDP task. Therefore, cross-project defect prediction (CPDP), which uses labeled data of source projects to learn a defect predictor for a target project, was proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and has a great impact on the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, especially to assess the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods by extensive experiments on 31 open-source projects derived from five datasets. Through analyzing a total of 37,504 results, we found that in most cases, the IEL method combining under-sampling and bagging was more effective than the other investigated methods.
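The under-sampling + bagging combination the study found most effective can be sketched as follows. The one-dimensional "feature", the toy threshold learner and the sample values are illustrative assumptions, not the study's actual base classifiers:

```python
# Hedged sketch of UnderBagging: each base learner trains on all minority
# instances plus a random same-sized subset of the majority class, and the
# ensemble votes. A trivial midpoint "learner" stands in for a real model.
import random

def underbag_fit(majority, minority, n_estimators=5, seed=0):
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_estimators):
        sample = rng.sample(majority, len(minority))  # balance the bag
        # toy learner: midpoint between the two class means of a 1-D feature
        t = (sum(sample) / len(sample) + sum(minority) / len(minority)) / 2
        thresholds.append(t)
    return thresholds

def underbag_predict(thresholds, x):
    votes = sum(1 for t in thresholds if x > t)  # above threshold => minority
    return int(votes * 2 > len(thresholds))      # majority vote

majority_x = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.18, 0.22]  # defect-free
minority_x = [0.8, 0.9]                                     # defective
model = underbag_fit(majority_x, minority_x)
print(underbag_predict(model, 0.85), underbag_predict(model, 0.1))
```

Because every bag is balanced, no single learner is swamped by the majority class, while bagging over different majority subsets keeps the ensemble from discarding majority-class information outright.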
APA, Harvard, Vancouver, ISO, and other styles
47

Wu, Xu, Youlong Yang, and Lingyu Ren. "Entropy difference and kernel-based oversampling technique for imbalanced data learning." Intelligent Data Analysis 24, no. 6 (December 18, 2020): 1239–55. http://dx.doi.org/10.3233/ida-194761.

Full text
Abstract:
Class imbalance is often a problem in various real-world datasets, where one class contains a small number of instances and the other contains a large number. It is notably difficult to develop an effective model using traditional data mining and machine learning algorithms without data preprocessing techniques to balance the dataset. Oversampling is often used as a preprocessing method for imbalanced datasets. Specifically, synthetic oversampling techniques balance the number of training instances between the majority class and the minority class by generating extra artificial minority class instances. However, current oversampling techniques simply consider the imbalance in quantity and pay no attention to whether the distribution is balanced. Therefore, this paper proposes an entropy-difference and kernel-based SMOTE (EDKS), which considers the imbalance degree of the dataset from the distribution via entropy difference and overcomes SMOTE's limitation on nonlinear problems by oversampling in the feature space of a support vector machine classifier. First, EDKS maps the input data into a feature space to increase the separability of the data. Then EDKS calculates the entropy difference in kernel space, determines the majority and minority classes, and finds sparse regions in the minority class. Moreover, the proposed method balances the data distribution by synthesizing new instances and evaluating their retention capability. Our algorithm can effectively distinguish datasets with the same imbalance ratio but different distributions. The experimental study evaluates and compares the performance of our method against state-of-the-art algorithms and demonstrates that the proposed approach is competitive with them on multiple benchmark imbalanced datasets.
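The synthetic-instance generation that SMOTE-family methods such as EDKS build on can be sketched as interpolation between a minority point and one of its minority neighbours. The kernel mapping and entropy weighting of EDKS are omitted here, and the points are illustrative:

```python
# Minimal sketch of SMOTE-style interpolation: a synthetic minority
# instance is placed on the segment between a minority point and a
# minority neighbour. Coordinates below are made up for illustration.
import random

def synth_instance(x, neighbour, rng):
    gap = rng.random()  # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbour)]

rng = random.Random(42)
x = [1.0, 2.0]
nb = [2.0, 4.0]
s = synth_instance(x, nb, rng)
print(s)  # lies on the segment between x and nb
```

EDKS's contribution, per the abstract, is to perform this step in kernel space and to target the sparse minority regions identified by the entropy difference rather than interpolating uniformly.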
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Lei. "Application Research of Deep Convolutional Neural Network in Computer Vision." Journal of Networking and Telecommunications 2, no. 2 (August 6, 2020): 23. http://dx.doi.org/10.18282/jnt.v2i2.886.

Full text
Abstract:
As an important research achievement in the field of brain-like computing, the deep convolutional neural network has been widely used in many fields such as computer vision, natural language processing, information retrieval, speech recognition and semantic understanding. It has set off a wave of neural network research in industry and academia and promoted the development of artificial intelligence. At present, deep convolutional neural networks mainly simulate the complex hierarchical cognitive laws of the human brain by increasing the number of network layers, using larger training datasets, and improving the network structure or training algorithm of existing neural networks, so as to narrow the gap with the human visual system and enable the machine to acquire the capability of "abstract concepts". Deep convolutional neural networks have achieved great success in many computer vision tasks such as image classification, target detection, face recognition and pedestrian recognition. Firstly, this paper reviews the development history of convolutional neural networks. Then, the working principle of the deep convolutional neural network is analyzed in detail. Next, the paper introduces representative achievements of convolutional neural networks from two aspects and shows, through examples, the effect of various technical methods on image classification accuracy. From the aspect of adding network layers, the structures of classical convolutional neural networks such as AlexNet, ZF-Net, VGG, GoogLeNet and ResNet are discussed and analyzed. From the aspect of increasing the size of the dataset, the difficulties of manually adding labeled samples and the effect of data augmentation on neural network performance are introduced. The paper focuses on the latest research progress of convolutional neural networks in image classification and face recognition. Finally, the problems and challenges to be solved in future brain-like intelligence research based on deep convolutional neural networks are proposed.
APA, Harvard, Vancouver, ISO, and other styles
49

Fujita, Kosuke, and Hideaki Touyama. "Majority Rule Using Collaborative P300 by Auditory Stimulation." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 7 (November 20, 2017): 1312–20. http://dx.doi.org/10.20965/jaciii.2017.p1312.

Full text
Abstract:
In this study, a new method of realizing majority rule using noninvasive brain activity is presented. With this majority rule based on an electroencephalogram (EEG), a technique to determine the attention of multiple users is proposed. In general, a single-shot EEG ensures a short response time but is inevitably deteriorated by artifacts. To enhance the accuracy of the majority rule, collaborative P300 evoked potentials are used. The collaborative P300 signal is prepared by averaging individual single-shot P300 signals across subjects. In experiments, the EEG signals of twelve volunteers were collected using auditory stimuli. The subjects paid attention to target stimuli and no attention to standard stimuli. The collaborative P300 signal was used to evaluate the performance of the majority rule. The proposed algorithm enables us to estimate the degree of attention of the group. The classification is based on supervised machine learning, and the accuracy is approximately 80%. Applications of this novel technique in multimedia content evaluation as well as neuromarketing and computer-supported co-operative work are discussed.
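The collaborative-P300 construction, averaging single-shot epochs across subjects so that subject-specific artifacts cancel while the event-locked peak survives, can be sketched as follows. The epoch length, peak latency and artifact model are made up for illustration:

```python
# Sketch of averaging single-shot EEG epochs across subjects: the
# event-locked P300 bump adds coherently while opposite-signed artifacts
# cancel. All signals below are synthetic toy data.

def collaborative_average(epochs):
    n = len(epochs)
    return [sum(e[t] for e in epochs) / n for t in range(len(epochs[0]))]

def epoch(artifact_sign):
    e = [0.0] * 10
    e[6] += 2.0                    # event-locked P300-like component
    e[2] += artifact_sign * 3.0    # subject-specific artifact
    return e

# twelve subjects with artifacts of alternating sign
epochs = [epoch(+1 if s % 2 == 0 else -1) for s in range(12)]
avg = collaborative_average(epochs)
print(avg.index(max(avg)))  # artifact cancels; the P300 peak survives
```

A single-shot epoch here peaks at the artifact sample, not the P300 sample, which is exactly the failure mode the collaborative average is designed to suppress.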
APA, Harvard, Vancouver, ISO, and other styles
50

Chan, Hannah O., Rakesh Joshi, Alexander Morzycki, Andrew C. Pun, Joshua N. Wong, and Collins Hong. "96.5 Using Computer Vision-Based Algorithms Trained on Mobile-Device Camera Images for Monitoring Burn Wound Healing." Journal of Burn Care & Research 43, Supplement_1 (March 23, 2022): S64. http://dx.doi.org/10.1093/jbcr/irac012.099.

Full text
Abstract:
Abstract Introduction The appropriate characterization of burn depth and healing is paramount. Unfortunately, the accuracy of approximating thermal injury depth among all physicians is poor. While tools to improve detection accuracy, including laser doppler imaging and laser speckle imaging, exist, these technologies are expensive and limited to specialized burn referral centres. They also do not provide an easy means for quantitative, interval tracking of burn healing. Considering these limitations, the application of artificial intelligence has garnered significant interest. We herein present the use of three novel machine learning and computer vision-based algorithms to track burn wound healing. Methods Convolutional neural network (CNN) models were trained on 1800 2D color burn images to classify them into four burn severities. These CNNs were used to develop saliency algorithms that identify the highest "attention" pixels used to recognize burns. Image-based algorithms that count these attention pixels of the CNN, count pixels representing red granulation of burns, and measure burns were also developed. As proof of concept, we tracked the healing of a localized burn on a 25-year-old female patient. The patient suffered a scald on the dorsum of the foot, resulting in a deep partial-thickness burn. Opting out of surgical intervention, the patient visited the hospital over a 6-week period for treatment with non-adhesive dressings and silver nitrate. High-resolution images of the burn, with and without a fiducial marker, were captured with a smartphone camera every 7 days. Images were taken under institutional lighting and used as algorithmic inputs. Results Data analyses indicate that the healing of the open-wound area was accurately measured in millimetres (+/- 1.7 mm error) using a fiducial marker (18.3 mm diameter). The open-wound area shrank consistently from week 1 to week 6, as seen in Figure 1a-b. The normalized 2D colour images, in which the "red" pixel values were counted (Figure 1a-b), confirm the reduction of the red granulation in the wound. The saliency algorithm also measured a percentage reduction in the machine learning model's total attention pixels over the 6-week period (Figure 1c-d). This suggests that the model was less discerning of the burn wound over time, consistent with healing, which was also clinically validated.
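The fiducial-based measurement described above can be sketched in a few lines: the marker's known 18.3 mm diameter gives a mm-per-pixel scale, and a simple red-dominance rule stands in for the granulation-pixel count. The pixel counts, threshold values and helper names are illustrative assumptions, not the study's algorithms:

```python
# Hedged sketch of fiducial scaling and red-granulation counting.
# Marker pixel span, wound pixel count and RGB samples are made up.

def mm_per_pixel(marker_mm, marker_px):
    return marker_mm / marker_px

def wound_area_mm2(wound_pixel_count, scale):
    return wound_pixel_count * scale ** 2

def count_red(pixels, r_min=150, rg_margin=40):
    """Count pixels whose red channel dominates (toy granulation rule)."""
    return sum(1 for r, g, b in pixels
               if r >= r_min and r - max(g, b) >= rg_margin)

scale = mm_per_pixel(18.3, 122)        # marker spans 122 px in this image
area = wound_area_mm2(5000, scale)     # 5000 segmented wound pixels
pixels = [(200, 80, 60), (90, 90, 90), (180, 120, 130)]
print(round(area, 1), count_red(pixels))
```

Tracking `area` and the red-pixel count over weekly images gives the kind of quantitative, interval healing record the abstract describes.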
APA, Harvard, Vancouver, ISO, and other styles