Journal articles on the topic "Dataset VISION"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic "Dataset VISION".

Next to each source in the reference list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Scheuerman, Morgan Klaus, Alex Hanna, and Emily Denton. "Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development". Proceedings of the ACM on Human-Computer Interaction 5, CSCW2 (October 13, 2021): 1–37. http://dx.doi.org/10.1145/3476058.

Abstract
Data is a crucial component of machine learning. The field is reliant on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation - how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both a structured and thematic content analysis, we document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision datasets authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identify sit in opposition with social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.
2

Geiger, A., P. Lenz, C. Stiller, and R. Urtasun. "Vision meets robotics: The KITTI dataset". International Journal of Robotics Research 32, no. 11 (August 23, 2013): 1231–37. http://dx.doi.org/10.1177/0278364913491297.

3

Liew, Yu Liang, and Jeng Feng Chin. "Vision-based biomechanical markerless motion classification". Machine Graphics and Vision 32, no. 1 (February 16, 2023): 3–24. http://dx.doi.org/10.22630/mgv.2023.32.1.1.

Abstract
This study used stick model augmentation on single-camera motion video to create a markerless motion classification model of manual operations. All videos were augmented with a stick model composed of keypoints and lines by using the programming model, which later incorporated the COCO dataset, OpenCV and OpenPose modules to estimate the coordinates and body joints. The stick model data included the initial velocity, cumulative velocity, and acceleration for each body joint. The extracted motion vector data were normalized using three different techniques, and the resulting datasets were subjected to eight classifiers. The experiment involved four distinct motion sequences performed by eight participants. The random forest classifier performed the best in terms of accuracy in recorded data classification in its min-max normalized dataset. This classifier also obtained a score of 81.80% for the dataset before random subsampling and a score of 92.37% for the resampled dataset. Meanwhile, the random subsampling method dramatically improved classification accuracy by removing noise data and replacing them with replicated instances to balance the class. This research advances methodological and applied knowledge on the capture and classification of human motion using a single camera view.
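As a rough illustration of the normalization-and-classification step this abstract describes, the sketch below min-max normalizes a motion-feature matrix and fits a random forest with scikit-learn. The feature layout, class count, and all parameters are placeholders, not values taken from the paper.

```python
# Hedged sketch: min-max normalization followed by a random forest classifier,
# mirroring the pipeline stage described in the abstract. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 75))        # e.g. 25 joints x 3 motion features (hypothetical layout)
y = rng.integers(0, 4, size=800)      # four motion-sequence classes

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(MinMaxScaler(), RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```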
4

Alyami, Hashem, Abdullah Alharbi, and Irfan Uddin. "Lifelong Machine Learning for Regional-Based Image Classification in Open Datasets". Symmetry 12, no. 12 (December 16, 2020): 2094. http://dx.doi.org/10.3390/sym12122094.

Abstract
Deep Learning algorithms are becoming common in solving different supervised and unsupervised learning problems. Different deep learning algorithms were developed in the last decade to solve different learning problems in different domains such as computer vision, speech recognition, machine translation, etc. In the research field of computer vision, it is observed that deep learning has become overwhelmingly popular. In solving computer vision related problems, we first take a CNN (Convolutional Neural Network) which is trained from scratch, or sometimes a pre-trained model is taken and further fine-tuned based on the dataset that is available. The problem with training the model from scratch on new datasets is catastrophic forgetting, which means that when a new dataset is used to train the model, it forgets the knowledge it has obtained from an existing dataset. In other words, different datasets do not help the model to increase its knowledge. The problem with the pre-trained models is that mostly CNN models are trained on open datasets, where the dataset contains instances from specific regions. This results in predicting disturbing labels when the same model is used for instances of datasets collected in a different region. Therefore, there is a need to find a solution for how to reduce the gap of geo-diversity in different computer vision problems in the developing world. In this paper, we explore the problems of models that were trained from scratch along with models which are pre-trained on a large dataset, using a dataset specifically developed to understand the geo-diversity issues in open datasets. The dataset contains images of different wedding scenarios in South Asian countries. We developed a Lifelong CNN that can incrementally increase knowledge, i.e., the CNN learns labels from the new dataset but includes the existing knowledge of open datasets. The proposed model demonstrates the highest accuracy compared to models trained from scratch or pre-trained models.
5

Bai, Long, Liangyu Wang, Tong Chen, Yuanhao Zhao, and Hongliang Ren. "Transformer-Based Disease Identification for Small-Scale Imbalanced Capsule Endoscopy Dataset". Electronics 11, no. 17 (August 31, 2022): 2747. http://dx.doi.org/10.3390/electronics11172747.

Abstract
Vision Transformer (ViT) is emerging as a new leader in computer vision with its outstanding performance in many tasks (e.g., ImageNet-22k, JFT-300M). However, the success of ViT relies on pretraining on large datasets. It is difficult for us to use ViT to train from scratch on a small-scale imbalanced capsule endoscopic image dataset. This paper adopts a Transformer neural network with a spatial pooling configuration. Transformer’s self-attention mechanism enables it to capture long-range information effectively, and the exploration of ViT spatial structure by pooling can further improve the performance of ViT on our small-scale capsule endoscopy dataset. We trained from scratch on two publicly available datasets for capsule endoscopy disease classification, obtained 79.15% accuracy on the multi-classification task of the Kvasir-Capsule dataset, and 98.63% accuracy on the binary classification task of the Red Lesion Endoscopy dataset.
6

Wang, Zhixue, Yu Zhang, Lin Luo, and Nan Wang. "AnoDFDNet: A Deep Feature Difference Network for Anomaly Detection". Journal of Sensors 2022 (August 16, 2022): 1–14. http://dx.doi.org/10.1155/2022/3538541.

Abstract
This paper proposed a novel anomaly detection (AD) approach of high-speed train images based on convolutional neural networks and the Vision Transformer. Different from previous AD works, in which anomalies are identified with a single image using classification, segmentation, or object detection methods, the proposed method detects abnormal difference between two images taken at different times of the same region. In other words, we cast anomaly detection problem with a single image into a difference detection problem with two images. The core idea of the proposed method is that the “anomaly” commonly represents an abnormal state instead of a specific object, and this state should be identified by a pair of images. In addition, we introduced a deep feature difference AD network (AnoDFDNet) which sufficiently explored the potential of the Vision Transformer and convolutional neural networks. To verify the effectiveness of the proposed AnoDFDNet, we gathered three datasets, a difference dataset (Diff dataset), a foreign body dataset (FB dataset), and an oil leakage dataset (OL dataset). Experimental results on the above datasets demonstrate the superiority of the proposed method. In terms of the F1-score, the AnoDFDNet obtained 76.24%, 81.04%, and 83.92% on Diff dataset, FB dataset, and OL dataset, respectively.
7

Voytov, D. Y., S. B. Vasil’ev, and D. V. Kormilitsyn. "Technology development for determining tree species using computer vision". FORESTRY BULLETIN 27, no. 1 (February 2023): 60–66. http://dx.doi.org/10.18698/2542-1468-2023-1-60-66.

Abstract
A technology has been developed to determine the European white birch (Betula pendula Roth.) species in the photo. The differences of the known neural networks of classifiers with the definition of objects are studied. YOLOv4 was chosen as the most promising for further development of the technology. The mechanism of image markup for the formation of training examples has been studied. The method of marking on the image has been formed. Two different datasets have been formed to retrain the network. An algorithmic increase in the dataset was carried out by transforming images and applying filters. The difference in the results of the classifier is determined. The accuracy when training exclusively on images containing hanging birch was 35 %, the accuracy when training on a dataset containing other trees was 71 %, the accuracy when training on the entire dataset was 75 %. To demonstrate the work, birch trees were identified in photographs taken in the arboretum of the MF Bauman Moscow State Technical University. To improve the technology, additional training is recommended to determine the remaining tree species. The technology can be used for the implementation of taxation of specific tree species; the formation of marked datasets for further development; the primary element in the tree image analysis system, to exclude third-party objects in the original image.
8

Ayana, Gelan, and Se-woon Choe. "BUViTNet: Breast Ultrasound Detection via Vision Transformers". Diagnostics 12, no. 11 (November 1, 2022): 2654. http://dx.doi.org/10.3390/diagnostics12112654.

Abstract
Convolutional neural networks (CNNs) have enhanced ultrasound image-based early breast cancer detection. Vision transformers (ViTs) have recently surpassed CNNs as the most effective method for natural image analysis. ViTs have proven their capability of incorporating more global information than CNNs at lower layers, and their skip connections are more powerful than those of CNNs, which endows ViTs with superior performance. However, the effectiveness of ViTs in breast ultrasound imaging has not yet been investigated. Here, we present BUViTNet breast ultrasound detection via ViTs, where ViT-based multistage transfer learning is performed using ImageNet and cancer cell image datasets prior to transfer learning for classifying breast ultrasound images. We utilized two publicly available ultrasound breast image datasets, Mendeley and breast ultrasound images (BUSI), to train and evaluate our algorithm. The proposed method achieved the highest area under the receiver operating characteristics curve (AUC) of 1 ± 0, Matthew’s correlation coefficient (MCC) of 1 ± 0, and kappa score of 1 ± 0 on the Mendeley dataset. Furthermore, BUViTNet achieved the highest AUC of 0.968 ± 0.02, MCC of 0.961 ± 0.01, and kappa score of 0.959 ± 0.02 on the BUSI dataset. BUViTNet outperformed ViT trained from scratch, ViT-based conventional transfer learning, and CNN-based transfer learning in classifying breast ultrasound images (p < 0.01 in all cases). Our findings indicate that improved transformers are effective in analyzing breast images and can provide an improved diagnosis if used in clinical settings. Future work will consider the use of a wide range of datasets and parameters for optimized performance.
9

Hanji, Param, Muhammad Z. Alam, Nicola Giuliani, Hu Chen, and Rafał K. Mantiuk. "HDR4CV: High Dynamic Range Dataset with Adversarial Illumination for Testing Computer Vision Methods". Journal of Imaging Science and Technology 65, no. 4 (July 1, 2021): 40404–1. http://dx.doi.org/10.2352/j.imagingsci.technol.2021.65.4.040404.

Abstract
Benchmark datasets used for testing computer vision (CV) methods often contain little variation in illumination. The methods that perform well on these datasets have been observed to fail under challenging illumination conditions encountered in the real world, in particular, when the dynamic range of a scene is high. The authors present a new dataset for evaluating CV methods in challenging illumination conditions such as low light, high dynamic range, and glare. The main feature of the dataset is that each scene has been captured in all the adversarial illuminations. Moreover, each scene includes an additional reference condition with uniform illumination, which can be used to automatically generate labels for the tested CV methods. We demonstrate the usefulness of the dataset in a preliminary study by evaluating the performance of popular face detection, optical flow, and object detection methods under adversarial illumination conditions. We further assess whether the performance of these applications can be improved if a different transfer function is used.
10

Li, Jing, and Xueping Luo. "Malware Family Classification Based on Vision Transformer". 電腦學刊 34, no. 1 (February 2023): 087–99. http://dx.doi.org/10.53106/199115992023023401007.

Abstract
Cybersecurity worries intensify as Big Data, the Internet of Things, and 5G technologies develop. Based on code reuse technologies, malware creators are producing new malware quickly, and new malware is continually endangering the effectiveness of existing detection methods. We propose a vision transformer-based approach for malware picture identification because, in contrast to CNN, Transformer’s self-attentive process is not constrained by local interactions and can simultaneously compute long-range mine relationships. We use ViT-B/16 weights pre-trained on the ImageNet21k dataset to improve model generalization capability and fine-tune them for the malware image classification task. This work demonstrates that (i) a pure attention mechanism applies to malware recognition, and (ii) the Transformer can be used instead of traditional CNN for malware image recognition. We train and assess our models using the MalImg dataset and the BIG2015 dataset in this paper. Our experimental evaluation found that the recognition accuracy of transfer learning-based ViT for MalImg samples and BIG2015 samples is 99.14% and 98.22%, respectively. This study shows that training ViT models using transfer learning can perform better than CNN in malware family classification.
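As a hedged sketch of the transfer-learning setup this abstract describes (ViT-B/16 weights pre-trained on ImageNet-21k, fine-tuned for malware image classification), the snippet below uses the timm library. The model tag, dataset path, and class count are assumptions, not the paper's exact configuration.

```python
# Hedged sketch: fine-tune an ImageNet-21k pre-trained ViT-B/16 on malware images
# rendered as RGB pictures in an ImageFolder layout. Paths and class count are placeholders.
import timm
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# The in-21k tag name differs between timm versions
# (e.g. "vit_base_patch16_224_in21k" on older releases).
model = timm.create_model("vit_base_patch16_224.augreg_in21k", pretrained=True, num_classes=25)

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
train_ds = datasets.ImageFolder("malimg/train", transform=tfm)   # hypothetical folder layout
loader = DataLoader(train_ds, batch_size=32, shuffle=True)

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:        # a single epoch, for brevity
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```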
11

Qadri, Salman, Dost Muhammad Khan, Syed Furqan Qadri, Abdul Razzaq, Nazir Ahmad, Mutiullah Jamil, Ali Nawaz Shah, Syed Shah Muhammad, Khalid Saleem, and Sarfraz Ahmad Awan. "Multisource Data Fusion Framework for Land Use/Land Cover Classification Using Machine Vision". Journal of Sensors 2017 (2017): 1–8. http://dx.doi.org/10.1155/2017/3515418.

Abstract
Data fusion is a powerful tool for the merging of multiple sources of information to produce a better output as compared to individual source. This study describes the data fusion of five land use/cover types, that is, bare land, fertile cultivated land, desert rangeland, green pasture, and Sutlej basin river land derived from remote sensing. A novel framework for multispectral and texture feature based data fusion is designed to identify the land use/land cover data types correctly. Multispectral data is obtained using a multispectral radiometer, while digital camera is used for image dataset. It has been observed that each image contained 229 texture features, while 30 optimized texture features data for each image has been obtained by joining together three features selection techniques, that is, Fisher, Probability of Error plus Average Correlation, and Mutual Information. This 30-optimized-texture-feature dataset is merged with five-spectral-feature dataset to build the fused dataset. A comparison is performed among texture, multispectral, and fused dataset using machine vision classifiers. It has been observed that fused dataset outperformed individually both datasets. The overall accuracy acquired using multilayer perceptron for texture data, multispectral data, and fused data was 96.67%, 97.60%, and 99.60%, respectively.
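The fusion step described above (joining optimized texture features with multispectral features before classification) amounts to a column-wise concatenation of per-sample feature vectors. The sketch below illustrates that with placeholder arrays and a scikit-learn MLP; none of the data, shapes, or settings are the study's actual values.

```python
# Hedged sketch of feature-level fusion: texture and spectral features for the same
# samples are concatenated and fed to an MLP classifier. Data here is synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
texture = rng.normal(size=(500, 30))    # 30 optimized texture features (placeholder values)
spectral = rng.normal(size=(500, 5))    # 5 multispectral features (placeholder values)
labels = rng.integers(0, 5, size=500)   # five land use/cover classes

fused = np.hstack([texture, spectral])  # the fused 35-feature dataset
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=1))
print("5-fold CV accuracy:", cross_val_score(clf, fused, labels, cv=5).mean())
```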
12

Lambert, Reeve, Jalil Chavez-Galaviz, Jianwen Li, and Nina Mahmoudian. "ROSEBUD: A Deep Fluvial Segmentation Dataset for Monocular Vision-Based River Navigation and Obstacle Avoidance". Sensors 22, no. 13 (June 21, 2022): 4681. http://dx.doi.org/10.3390/s22134681.

Abstract
Obstacle detection for autonomous navigation through semantic image segmentation using neural networks has grown in popularity for use in unmanned ground and surface vehicles because of its ability to rapidly create a highly accurate pixel-wise classification of complex scenes. Due to the lack of available training data, semantic networks are rarely applied to navigation in complex water scenes such as rivers, creeks, canals, and harbors. This work seeks to address the issue by making a one-of-its-kind River Obstacle Segmentation En-Route By USV Dataset (ROSEBUD) publicly available for use in robotic SLAM applications that map water and non-water entities in fluvial images from the water level. ROSEBUD provides a challenging baseline for surface navigation in complex environments using complex fluvial scenes. The dataset contains 549 images encompassing various water qualities, seasons, and obstacle types that were taken on narrow inland rivers and then hand annotated for use in semantic network training. The difference between the ROSEBUD dataset and existing marine datasets was verified. Two state-of-the-art networks were trained on existing water segmentation datasets and tested for generalization to the ROSEBUD dataset. Results from further training show that modern semantic networks custom made for water recognition, and trained on marine images, can properly segment large areas, but they struggle to properly segment small obstacles in fluvial scenes without further training on the ROSEBUD dataset.
13

Dang, Minh. "Efficient Vision-Based Face Image Manipulation Identification Framework Based on Deep Learning". Electronics 11, no. 22 (November 17, 2022): 3773. http://dx.doi.org/10.3390/electronics11223773.

Abstract
Image manipulation of the human face is a trending topic of image forgery, which is done by transforming or altering face regions using a set of techniques to accomplish desired outputs. Manipulated face images are spreading on the internet due to the rise of social media, causing various societal threats. It is challenging to detect the manipulated face images effectively because (i) there has been a limited number of manipulated face datasets because most datasets contained images generated by GAN models; (ii) previous studies have mainly extracted handcrafted features and fed them into machine learning algorithms to perform manipulated face detection, which was complicated, error-prone, and laborious; and (iii) previous models failed to prove why their model achieved good performances. In order to address these issues, this study introduces a large face manipulation dataset containing vast variations of manipulated images created and manually validated using various manipulation techniques. The dataset is then used to train a fine-tuned RegNet model to detect manipulated face images robustly and efficiently. Finally, a manipulated region analysis technique is implemented to provide some in-depth insights into the manipulated regions. The experimental results revealed that the RegNet model showed the highest classification accuracy of 89% on the proposed dataset compared to standard deep learning models.
14

Mouheb, Kaouther, Ali Yürekli, and Burcu Yılmazel. "TRODO: A public vehicle odometers dataset for computer vision". Data in Brief 38 (October 2021): 107321. http://dx.doi.org/10.1016/j.dib.2021.107321.

15

Jinfeng, Gao, Sehrish Qummar, Zhang Junming, Yao Ruxian, and Fiaz Gul Khan. "Ensemble Framework of Deep CNNs for Diabetic Retinopathy Detection". Computational Intelligence and Neuroscience 2020 (December 15, 2020): 1–11. http://dx.doi.org/10.1155/2020/8864698.

Abstract
Diabetic retinopathy (DR) is an eye disease that damages the blood vessels of the eye. DR causes blurred vision or it may lead to blindness if it is not detected in early stages. DR has five stages, i.e., 0 normal, 1 mild, 2 moderate, 3 severe, and 4 PDR. Conventionally, many hands-on projects of computer vision have been applied to detect DR but cannot code the intricate underlying features. Therefore, they result in poor classification of DR stages, particularly for early stages. In this research, two deep CNN models were proposed with an ensemble technique to detect all the stages of DR by using balanced and imbalanced datasets. The models were trained with the Kaggle dataset on a high-end graphics processing unit. A balanced dataset was used to train both models, and we test these models with balanced and imbalanced datasets. The result shows that the proposed models detect all the stages of DR unlike the current methods and perform better compared to state-of-the-art methods on the same Kaggle dataset.
16

Zadorozhny, Vladimir, Patrick Manning, Daniel J. Bain, and Ruth Mostern. "Collaborative for Historical Information and Analysis: Vision and Work Plan". Journal of World-Historical Information 1, no. 1 (February 20, 2013): 1–14. http://dx.doi.org/10.5195/jwhi.2013.2.

Abstract
This article conveys the vision of a world-historical dataset, constructed in order to provide data on human social affairs at the global level over the past several centuries. The construction of this dataset will allow the routine application of tools developed for analyzing “Big Data” to global, historical analysis. The work is conducted by the Collaborative for Historical Information and Analysis (CHIA). This association of groups at universities and research institutes in the U.S. and Europe includes five groups funded by the National Science Foundation for work to construct infrastructure for collecting and archiving data on a global level. The article identifies the elements of infrastructure-building, shows how they are connected, and sets the project in the context of previous and current efforts to build large-scale historical datasets. The project is developing a crowd-sourcing application for ingesting and documenting data, a broad and flexible archive, and a “data hoover” process to locate and gather historical datasets for inclusion. In addition, the article identifies four types of data and analytical questions to be explored through this data resource, addressing development, governance, social structure, and the interaction of social and natural variables.
17

Buzzelli, Marco, Alessio Albé, and Gianluigi Ciocca. "A Vision-Based System for Monitoring Elderly People at Home". Applied Sciences 10, no. 1 (January 3, 2020): 374. http://dx.doi.org/10.3390/app10010374.

Abstract
Assisted living technologies can be of great importance for taking care of elderly people and helping them to live independently. In this work, we propose a monitoring system designed to be as unobtrusive as possible, by exploiting computer vision techniques and visual sensors such as RGB cameras. We perform a thorough analysis of existing video datasets for action recognition, and show that no single dataset can be considered adequate in terms of classes or cardinality. We subsequently curate a taxonomy of human actions, derived from different sources in the literature, and provide the scientific community with considerations about the mutual exclusivity and commonalities of said actions. This leads us to collecting and publishing an aggregated dataset, called ALMOND (Assisted Living MONitoring Dataset), which we use as the training set for a vision-based monitoring approach. We rigorously evaluate our solution in terms of recognition accuracy using different state-of-the-art architectures, eventually reaching 97% on inference of basic poses, 83% on alerting situations, and 71% on daily life actions. We also provide a general methodology to estimate the maximum allowed distance between camera and monitored subject. Finally, we integrate the defined actions and the trained model into a computer-vision-based application, specifically designed for the objective of monitoring elderly people at their homes.
18

Oppong, Stephen Opoku, Frimpong Twum, James Ben Hayfron-Acquah, and Yaw Marfo Missah. "A Novel Computer Vision Model for Medicinal Plant Identification Using Log-Gabor Filters and Deep Learning Algorithms". Computational Intelligence and Neuroscience 2022 (September 27, 2022): 1–21. http://dx.doi.org/10.1155/2022/1189509.

Abstract
Computer vision is the science that enables computers and machines to see and perceive image content on a semantic level. It combines concepts, techniques, and ideas from various fields such as digital image processing, pattern matching, artificial intelligence, and computer graphics. A computer vision system is designed to model the human visual system on a functional basis as closely as possible. Deep learning and Convolutional Neural Networks (CNNs) in particular which are biologically inspired have significantly contributed to computer vision studies. This research develops a computer vision system that uses CNNs and handcrafted filters from Log-Gabor filters to identify medicinal plants based on their leaf textural features in an ensemble manner. The system was tested on a dataset developed from the Centre of Plant Medicine Research, Ghana (MyDataset) consisting of forty-nine (49) plant species. Using the concept of transfer learning, ten pretrained networks including Alexnet, GoogLeNet, DenseNet201, Inceptionv3, Mobilenetv2, Restnet18, Resnet50, Resnet101, vgg16, and vgg19 were used as feature extractors. The DenseNet201 architecture resulted in the best outcome of 87% accuracy, and GoogLeNet performed the worst with 79%, averaged across six supervised learning algorithms. The proposed model (OTAMNet), created by fusing a Log-Gabor layer into the transition layers of the DenseNet201 architecture achieved 98% accuracy when tested on MyDataset. OTAMNet was tested on other benchmark datasets: Flavia, Swedish Leaf, MD2020, and the Folio dataset. The Flavia dataset achieved 99%, Swedish Leaf 100%, MD2020 99%, and the Folio dataset 97%. A false-positive rate of less than 0.1% was achieved in all cases.
19

Martínez-Villaseñor, Lourdes, Hiram Ponce, Jorge Brieva, Ernesto Moya-Albor, José Núñez-Martínez, and Carlos Peñafort-Asturiano. "UP-Fall Detection Dataset: A Multimodal Approach". Sensors 19, no. 9 (April 28, 2019): 1988. http://dx.doi.org/10.3390/s19091988.

Abstract
Falls, especially in elderly persons, are an important health problem worldwide. Reliable fall detection systems can mitigate negative consequences of falls. Among the important challenges and issues reported in literature is the difficulty of fair comparison between fall detection systems and machine learning techniques for detection. In this paper, we present UP-Fall Detection Dataset. The dataset comprises raw and feature sets retrieved from 17 healthy young individuals without any impairment that performed 11 activities and falls, with three attempts each. The dataset also summarizes more than 850 GB of information from wearable sensors, ambient sensors and vision devices. Two experimental use cases were shown. The aim of our dataset is to help human activity recognition and machine learning research communities to fairly compare their fall detection solutions. It also provides many experimental possibilities for the signal recognition, vision, and machine learning community.
20

Yu, Zhou, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, and Dacheng Tao. "ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9127–34. http://dx.doi.org/10.1609/aaai.v33i01.33019127.

Abstract
Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain where large scale and fully annotated benchmark datasets exists, VideoQA datasets are limited to small scale and are automatically generated, etc. These limitations restrict their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated and large scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos.
21

Schmitt, M., and Y. L. Wu. "REMOTE SENSING IMAGE CLASSIFICATION WITH THE SEN12MS DATASET". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2021 (June 17, 2021): 101–6. http://dx.doi.org/10.5194/isprs-annals-v-2-2021-101-2021.

Abstract
Image classification is one of the main drivers of the rapid developments in deep learning with convolutional neural networks for computer vision. So is the analogous task of scene classification in remote sensing. However, in contrast to the computer vision community that has long been using well-established, large-scale standard datasets to train and benchmark high-capacity models, the remote sensing community still largely relies on relatively small and often application-dependent datasets, thus lacking comparability. With this paper, we present a classification-oriented conversion of the SEN12MS dataset. Using that, we provide results for several baseline models based on two standard CNN architectures and different input data configurations. Our results support the benchmarking of remote sensing image classification and provide insights to the benefit of multi-spectral data and multi-sensor data fusion over conventional RGB imagery.
22

Panchal, Sachin, Ankita Naik, Manesh Kokare, Samiksha Pachade, Rushikesh Naigaonkar, Prerana Phadnis, and Archana Bhange. "Retinal Fundus Multi-Disease Image Dataset (RFMiD) 2.0: A Dataset of Frequently and Rarely Identified Diseases". Data 8, no. 2 (January 28, 2023): 29. http://dx.doi.org/10.3390/data8020029.

Abstract
Irreversible vision loss is a worldwide threat. Developing a computer-aided diagnosis system to detect retinal fundus diseases is extremely useful and serviceable to ophthalmologists. Early detection, diagnosis, and correct treatment could save the eye’s vision. Nevertheless, an eye may be afflicted with several diseases if proper care is not taken. A single retinal fundus image might be linked to one or more diseases. Age-related macular degeneration, cataracts, diabetic retinopathy, Glaucoma, and uncorrected refractive errors are the leading causes of visual impairment. Our research team at the center of excellence lab has generated a new dataset called the Retinal Fundus Multi-Disease Image Dataset 2.0 (RFMiD2.0). This dataset includes around 860 retinal fundus images, annotated by three eye specialists, and is a multiclass, multilabel dataset. We gathered images from a research facility in Jalna and Nanded, where patients across Maharashtra come for preventative and therapeutic eye care. Our dataset would be the second publicly available dataset consisting of the most frequent diseases, along with some rarely identified diseases. This dataset is auxiliary to the previously published RFMiD dataset. This dataset would be significant for the research and development of artificial intelligence in ophthalmology.
23

Lee, Jaewoo, Sungjun Lee, Wonki Cho, Zahid Ali Siddiqui, and Unsang Park. "Vision Transformer-Based Tailing Detection in Videos". Applied Sciences 11, no. 24 (December 7, 2021): 11591. http://dx.doi.org/10.3390/app112411591.

Abstract
Tailing is defined as an event where a suspicious person follows someone closely. We define the problem of tailing detection from videos as an anomaly detection problem, where the goal is to find abnormalities in the walking pattern of the pedestrians (victim and follower). We, therefore, propose a modified Time-Series Vision Transformer (TSViT), a method for anomaly detection in video, specifically for tailing detection with a small dataset. We introduce an effective way to train TSViT with a small dataset by regularizing the prediction model. To do so, we first encode the spatial information of the pedestrians into 2D patterns and then pass them as tokens to the TSViT. Through a series of experiments, we show that the tailing detection on a small dataset using TSViT outperforms popular CNN-based architectures, as the CNN architectures tend to overfit with a small dataset of time-series images. We also show that when using time-series images, the performance of CNN-based architecture gradually drops, as the network depth is increased, to increase its capacity. On the other hand, a decreasing number of heads in Vision Transformer architecture shows good performance on time-series images, and the performance is further increased as the input resolution of the images is increased. Experimental results demonstrate that the TSViT performs better than the handcrafted rule-based method and CNN-based method for tailing detection. TSViT can be used in many applications for video anomaly detection, even with a small dataset.
24

Albattah, Waleed, and Saleh Albahli. "Intelligent Arabic Handwriting Recognition Using Different Standalone and Hybrid CNN Architectures". Applied Sciences 12, no. 19 (October 10, 2022): 10155. http://dx.doi.org/10.3390/app121910155.

Abstract
Handwritten character recognition is a computer-vision-system problem that is still critical and challenging in many computer-vision tasks. With the increased interest in handwriting recognition as well as the developments in machine-learning and deep-learning algorithms, researchers have made significant improvements and advances in developing English-handwriting-recognition methodologies; however, Arabic handwriting recognition has not yet received enough interest. In this work, several deep-learning and hybrid models were created. The methodology of the current study took advantage of machine learning in classification and deep learning in feature extraction to create hybrid models. Among the standalone deep-learning models trained on the two datasets used in the experiments performed, the best results were obtained with the transfer-learning model on the MNIST dataset, with 0.9967 accuracy achieved. The results for the hybrid models using the MNIST dataset were good, with accuracy measures exceeding 0.9 for all the hybrid models; however, the results for the hybrid models using the Arabic character dataset were inferior.
25

Andriyanov, Nikita. "Application of Graph Structures in Computer Vision Tasks". Mathematics 10, no. 21 (October 29, 2022): 4021. http://dx.doi.org/10.3390/math10214021.

Abstract
On the one hand, the solution of computer vision tasks is associated with the development of various kinds of images or random fields mathematical models, i.e., algorithms, that are called traditional image processing. On the other hand, nowadays, deep learning methods play an important role in image recognition tasks. Such methods are based on convolutional neural networks that perform many matrix multiplication operations with model parameters and local convolutions and pooling operations. However, the modern artificial neural network architectures, such as transformers, came to the field of machine vision from natural language processing. Image transformers operate with embeddings, in the form of mosaic blocks of picture and the links between them. However, the use of graph methods in the design of neural networks can also increase efficiency. In this case, the search for hyperparameters will also include an architectural solution, such as the number of hidden layers and the number of neurons for each layer. The article proposes to use graph structures to develop simple recognition networks on different datasets, including small unbalanced X-ray image datasets, the widely known CIFAR-10 dataset, and the Kaggle competition Dogs vs Cats dataset. Graph methods are compared with various known architectures and with networks trained from scratch. In addition, an algorithm for representing an image in the form of graph lattice segments is implemented, for which an appropriate description is created, based on graph data structures. This description provides quite good accuracy and performance of recognition. The effectiveness of this approach, based on the descriptors of the resulting segments, is shown, as well as the graph methods for the architecture search.
26

Bunyamin, Hendra. "Utilizing Indonesian Universal Language Model Fine-tuning for Text Classification". Journal of Information Technology and Computer Science 5, no. 3 (January 25, 2021): 325–37. http://dx.doi.org/10.25126/jitecs.202053215.

Abstract
Inductive transfer learning technique has made a huge impact on the computer vision field. Particularly, computer vision applications including object detection, classification, and segmentation, are rarely trained from scratch; instead, they are fine-tuned from pretrained models, which are products of learning from huge datasets. In contrast to computer vision, state-of-the-art natural language processing models are still generally trained from the ground up. Accordingly, this research attempts to investigate an adoption of the transfer learning technique for natural language processing. Specifically, we utilize a transfer learning technique called Universal Language Model Fine-tuning (ULMFiT) for doing an Indonesian news text classification task. The dataset for constructing the language model is collected from several news providers from January to December 2017 whereas the dataset employed for text classification task comes from news articles provided by the Agency for the Assessment and Application of Technology (BPPT). To examine the impact of ULMFiT, we provide a baseline that is a vanilla neural network with two hidden layers. Although the performance of ULMFiT on validation set is lower than the one of our baseline, we find that the benefits of ULMFiT for the classification task significantly reduce the overfitting, that is the difference between train and validation accuracies from 4% to nearly zero.
27

Baba, Tetsuaki. "VIDVIP: Dataset for Object Detection During Sidewalk Travel". Journal of Robotics and Mechatronics 33, no. 5 (October 20, 2021): 1135–43. http://dx.doi.org/10.20965/jrm.2021.p1135.

Abstract
In this paper, we report on the “VIsual Dataset for Visually Impaired Persons” (VIDVIP), a dataset for obstacle detection during sidewalk travel. In recent years, there have been many reports on assistive technologies using deep learning and computer vision technologies; nevertheless, developers cannot implement the corresponding applications without datasets. Although a number of open-source datasets have been released by research institutes and companies, large-scale datasets are not as abundant in the field of disability support, owing to their high development costs. Therefore, we began developing a dataset for outdoor mobility support for the visually impaired in April 2018. As of May 1, 2021, we have annotated 538,747 instances for 32,036 images in 39 classes of labels. We have implemented and tested navigation systems and other applications that utilize our dataset. In this study, we first compare our dataset with other general-purpose datasets, and show that our dataset has properties similar to those of datasets for automated driving. As a result of the discussion on the characteristics of the dataset, it is shown that the nature of the image shooting location, rather than the regional characteristics, tends to affect the annotation ratio. Accordingly, it is possible to examine the type of location based on the nature of the shooting location, and to infer the maintenance statuses of traffic facilities (such as Braille blocks) from the annotation ratio.
28

Kim, Sangwon, Jaeyeal Nam, and Byoung Chul Ko. "Facial Expression Recognition Based on Squeeze Vision Transformer". Sensors 22, no. 10 (May 13, 2022): 3729. http://dx.doi.org/10.3390/s22103729.

Abstract
In recent image classification approaches, a vision transformer (ViT) has shown an excellent performance beyond that of a convolutional neural network. A ViT achieves a high classification for natural images because it properly preserves the global image features. Conversely, a ViT still has many limitations in facial expression recognition (FER), which requires the detection of subtle changes in expression, because it can lose the local features of the image. Therefore, in this paper, we propose Squeeze ViT, a method for reducing the computational complexity by reducing the number of feature dimensions while increasing the FER performance by concurrently combining global and local features. To measure the FER performance of Squeeze ViT, experiments were conducted on lab-controlled FER datasets and a wild FER dataset. Through comparative experiments with previous state-of-the-art approaches, we proved that the proposed method achieves an excellent performance on both types of datasets.
29

Siva Sundhara Raja, D., and S. Vasuki. "Automatic Detection of Blood Vessels in Retinal Images for Diabetic Retinopathy Diagnosis". Computational and Mathematical Methods in Medicine 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/419279.

Abstract
Diabetic retinopathy (DR) is a leading cause of vision loss in diabetic patients. DR is mainly caused due to the damage of retinal blood vessels in the diabetic patients. It is essential to detect and segment the retinal blood vessels for DR detection and diagnosis, which prevents earlier vision loss in diabetic patients. The computer aided automatic detection and segmentation of blood vessels through the elimination of optic disc (OD) region in retina are proposed in this paper. The OD region is segmented using an anisotropic diffusion filter and subsequently the retinal blood vessels are detected using mathematical binary morphological operations. The proposed methodology is tested on two different publicly available datasets and achieved 93.99% sensitivity, 98.37% specificity, 98.08% accuracy on the DRIVE dataset and 93.6% sensitivity, 98.96% specificity, and 95.94% accuracy on the STARE dataset, respectively.
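The sketch below illustrates only the morphological vessel-extraction idea mentioned in this abstract, using OpenCV; it substitutes CLAHE contrast enhancement for the paper's anisotropic-diffusion optic-disc step, and all kernel sizes, thresholds, and file names are illustrative guesses rather than the authors' values.

```python
# Hedged sketch: enhance retinal vessels on the green channel with a black-hat
# morphological filter and threshold the result. Parameters are illustrative only.
import cv2

img = cv2.imread("fundus.png")        # hypothetical retinal image (e.g. from DRIVE/STARE)
green = img[:, :, 1]                  # vessels show the strongest contrast in the green channel
green = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(green)

# Black-hat filtering highlights thin dark structures (vessels) on a brighter background.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
blackhat = cv2.morphologyEx(green, cv2.MORPH_BLACKHAT, kernel)
_, vessels = cv2.threshold(blackhat, 15, 255, cv2.THRESH_BINARY)
vessels = cv2.medianBlur(vessels, 3)  # suppress isolated noise pixels
cv2.imwrite("vessels_mask.png", vessels)
```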
30

Knyaz, V. A., and P. V. Moshkantsev. "JOINT GEOMETRIC CALIBRATION OF COLOR AND THERMAL CAMERAS FOR SYNCHRONIZED MULTIMODAL DATASET CREATING". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W18 (November 29, 2019): 79–84. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w18-79-2019.

Abstract
With increasing performance and availability of thermal cameras the number of applications using them for various purposes grows noticeably. Nowadays thermal vision is widely used in industrial control and monitoring, thermal mapping of industrial areas, surveillance and robotics which output huge amounts of thermal images. This circumstance creates the necessary basis for applying deep learning which demonstrates the state-of-the-art performance for the most complicated computer vision tasks. Using different modalities for scene analysis allows to outperform results of mono-modal processing, but in case of machine learning it requires synchronized annotated multimodal dataset. The prerequisite condition for such dataset creating is geometric calibration of sensors used for image acquisition. So the purpose of the performed study was to develop a technique for joint calibration of color and long wave infra-red cameras which are to be used for collecting multimodal dataset needed for the tasks of computer vision algorithms developing and evaluating. The paper presents the techniques for camera parameters estimation and experimental evaluation of interior orientation of color and long wave infra-red cameras for further exploiting in datasets collecting. Also the results of geometrically calibrated camera exploiting for 3D reconstruction and 3D model realistic texturing based on visible and thermal imagery are presented. They proved the effectiveness of the developed techniques for collecting and augmenting synchronized multimodal imagery dataset for convolutional neural networks model training and evaluating.
31

Shao, Hongmin, Jingyu Pu, and Jiong Mu. "Pig-Posture Recognition Based on Computer Vision: Dataset and Exploration". Animals 11, no. 5 (April 30, 2021): 1295. http://dx.doi.org/10.3390/ani11051295.

Abstract
Posture changes in pigs during growth are often precursors of disease. Monitoring pigs’ behavioral activities can allow us to detect pathological changes in pigs earlier and identify the factors threatening the health of pigs in advance. Pigs tend to be farmed on a large scale, and manual observation by keepers is time consuming and laborious. Therefore, the use of computers to monitor the growth processes of pigs in real time, and to recognize the duration and frequency of pigs’ postural changes over time, can prevent outbreaks of porcine diseases. The contributions of this article are as follows: (1) The first human-annotated pig-posture-identification dataset in the world was established, including 800 pictures of each of the four pig postures: standing, lying on the stomach, lying on the side, and exploring. (2) When using a deep separable convolutional network to classify pig postures, the accuracy was 92.45%. The results show that the method proposed in this paper achieves adequate pig-posture recognition in a piggery environment and may be suitable for livestock farm applications.
32

Nine, Julkar, and Aarti Kishor Anapunje. "Dataset Evaluation for Multi Vehicle Detection using Vision Based Techniques". Embedded Selforganising Systems 8, no. 2 (December 21, 2021): 8–14. http://dx.doi.org/10.14464/ess.v8i2.492.

Abstract
Vehicle detection is one of the primal challenges of modern driver-assistance systems owing to the numerous factors, for instance, complicated surroundings, diverse types of vehicles with varied appearance and magnitude, low-resolution videos, fast-moving vehicles. It is utilized for multitudinous applications including traffic surveillance and collision prevention. This paper suggests a Vehicle Detection algorithm developed on Image Processing and Machine Learning. The presented algorithm is predicated on a Support Vector Machine (SVM) Classifier which employs feature vectors extracted via Histogram of Gradients (HOG) approach conducted on a semi-real time basis. A comparison study is presented stating the performance metrics of the algorithm on different datasets.
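A minimal sketch of the HOG-plus-linear-SVM combination this abstract names, using scikit-image and scikit-learn; the window size, HOG parameters, and placeholder patches are assumptions, not the authors' configuration.

```python
# Hedged sketch: HOG descriptors from fixed-size windows, classified by a linear SVM
# into vehicle vs. background. Patches and labels here are synthetic placeholders.
import numpy as np
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def hog_features(patch):
    """patch: 64x64 grayscale array -> HOG descriptor vector."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

rng = np.random.default_rng(2)
patches = rng.random(size=(400, 64, 64))   # placeholder windows; real ones come from labeled images
labels = rng.integers(0, 2, size=400)      # 1 = vehicle, 0 = background

X = np.array([hog_features(p) for p in patches])
X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=2)
svm = LinearSVC(C=1.0).fit(X_train, y_train)
print("held-out accuracy:", svm.score(X_test, y_test))
```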
33

Grecu, Lacramioara Lita, and Elena Pelican. "Customized orthogonalization via deflation algorithm with applications in face recognition". Carpathian Journal of Mathematics 30, no. 2 (2014): 231–38. http://dx.doi.org/10.37193/cjm.2014.02.05.

Abstract
The face recognition problem is a topical issue in computer vision. In this paper we propose a customized version of the orthogonalization via deflation algorithm to tackle this problem. We test the new proposed algorithm on two datasets: the well-known ORL dataset and our own face dataset, CTOVF; also, we compare our results (in terms of recognition rate and average query time) with the outcome of a standard algorithm in this class (dimension reduction methods using numerical linear algebra tools).
34

Wang, Hongmiao, Cheng Xing, Junjun Yin, and Jian Yang. "Land Cover Classification for Polarimetric SAR Images Based on Vision Transformer". Remote Sensing 14, no. 18 (September 18, 2022): 4656. http://dx.doi.org/10.3390/rs14184656.

Abstract
Deep learning methods have been widely studied for Polarimetric synthetic aperture radar (PolSAR) land cover classification. The scarcity of PolSAR labeled samples and the small receptive field of the model limit the performance of deep learning methods for land cover classification. In this paper, a vision Transformer (ViT)-based classification method is proposed. The ViT structure can extract features from the global range of images based on a self-attention block. The powerful feature representation capability of the model is equivalent to a flexible receptive field, which is suitable for PolSAR image classification at different resolutions. In addition, because of the lack of labeled data, the Mask Autoencoder method is used to pre-train the proposed model with unlabeled data. Experiments are carried out on the Flevoland dataset acquired by NASA/JPL AIRSAR and the Hainan dataset acquired by the Aerial Remote Sensing System of the Chinese Academy of Sciences. The experimental results on both datasets demonstrate the superiority of the proposed method.
35

Zareen, Syeda Shamaila, Sun Guangmin, Yu Li, Mahwish Kundi, Salman Qadri, Syed Furqan Qadri, Mubashir Ahmad, and Ali Haider Khan. "A Machine Vision Approach for Classification of Skin Cancer Using Hybrid Texture Features". Computational Intelligence and Neuroscience 2022 (July 18, 2022): 1–11. http://dx.doi.org/10.1155/2022/4942637.

Abstract
The main purpose of this study is to observe the importance of machine vision (MV) approach for the identification of five types of skin cancers, namely, actinic-keratosis, benign, solar-lentigo, malignant, and nevus. The 1000 (200 × 5) benchmark image datasets of skin cancers are collected from the International Skin Imaging Collaboration (ISIC). The acquired ISIC image datasets were transformed into texture feature dataset that was a combination of first-order histogram and gray level co-occurrence matrix (GLCM) features. For the skin cancer image, a total of 137,400 (229 × 3 × 200) texture features were acquired on three non-overlapping regions of interest (ROIs). Principal component analysis (PCA) clustering approach was employed for reducing the dimension of feature dataset. Each image acquired twenty most discriminate features based on two different approaches of statistical features such as average correlation coefficient plus probability of error (ACC + POE) and Fisher (Fis). Furthermore, a correlation-based feature selection (CFS) approach was employed for feature reduction, and optimized 12 features were acquired. Furthermore, classification algorithms naive Bayes (NB), Bayes Net (BN), LMT Tree, and multilayer perceptron (MLP) using a 10-fold cross-validation approach were employed on optimized feature datasets and the overall accuracy achieved by MLP is 97.1333%.
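The GLCM texture features mentioned above can be computed per region of interest with scikit-image, as sketched below; the distances, angles, and properties chosen here are illustrative and cover only a handful of the 229 features the study describes (older scikit-image versions name the functions greycomatrix/greycoprops).

```python
# Hedged sketch: a few GLCM texture features for one region of interest.
# The ROI is a synthetic placeholder; parameters are illustrative choices only.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(3)
roi = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # placeholder region of interest

glcm = graycomatrix(roi, distances=[1, 2], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = {prop: graycoprops(glcm, prop).ravel()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print({name: vals.round(3) for name, vals in features.items()})
```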
36

Sharet, Nir, and Ilan Shimshoni. "Analyzing Data Changes using Mean Shift Clustering". International Journal of Pattern Recognition and Artificial Intelligence 30, no. 07 (May 25, 2016): 1650016. http://dx.doi.org/10.1142/s0218001416500166.

Abstract
A nonparametric unsupervised method for analyzing changes in complex datasets is proposed. It is based on the mean shift clustering algorithm. Mean shift is used to cluster the old and new datasets and compare the results in a nonparametric manner. Each point from the new dataset naturally belongs to a cluster of points from its dataset. The method is also able to find to which cluster the point belongs in the old dataset and use this information to report qualitative differences between that dataset and the new one. Changes in local cluster distribution are also reported. The report can then be used to try to understand the underlying reasons which caused the changes in the distributions. On the basis of this method, a transductive transfer learning method for automatically labeling data from the new dataset is also proposed. This labeled data is used, in addition to the old training set, to train a classifier better suited to the new dataset. The algorithm has been implemented and tested on simulated and real (a stereo image pair) datasets. Its performance was also compared with several state-of-the-art methods.
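A minimal sketch of the comparison idea, assuming two unlabeled 2-D feature sets: cluster the old and the new data with scikit-learn's MeanShift and map each new cluster to its nearest old cluster centre. This illustrates the mechanism only and is not the authors' algorithm.

```python
# Hedged sketch: mean-shift clustering of an "old" and a "new" dataset, followed by
# a nearest-centre mapping that flags how the new clusters relate to the old ones.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(4)
old = rng.normal(loc=0.0, size=(300, 2))
new = np.vstack([rng.normal(loc=0.0, size=(200, 2)),
                 rng.normal(loc=3.0, size=(100, 2))])   # a new mode appears in the new data

ms_old = MeanShift(bandwidth=estimate_bandwidth(old)).fit(old)
ms_new = MeanShift(bandwidth=estimate_bandwidth(new)).fit(new)

# Map each new cluster to its nearest old cluster centre and report its size.
nearest_old = pairwise_distances_argmin(ms_new.cluster_centers_, ms_old.cluster_centers_)
for i, j in enumerate(nearest_old):
    size = int(np.sum(ms_new.labels_ == i))
    print(f"new cluster {i} ({size} points) -> closest old cluster {j}")
```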
37

Khan, Sulaiman, Habib Ullah Khan, and Shah Nazir. "Offline Pashto Characters Dataset for OCR Systems". Security and Communication Networks 2021 (July 27, 2021): 1–7. http://dx.doi.org/10.1155/2021/3543816.

Abstract
In computer vision and artificial intelligence, text recognition and analysis based on images play a key role in the text retrieving process. Enabling a machine learning technique to recognize handwritten characters of a specific language requires a standard dataset. Acceptable handwritten character datasets are available in many languages including English, Arabic, and many more. However, the lack of datasets for handwritten Pashto characters hinders the application of a suitable machine learning algorithm for recognizing useful insights. In order to address this issue, this study presents the first handwritten Pashto characters image dataset (HPCID) for the scientific research work. This dataset consists of fourteen thousand, seven hundred, and eighty-four samples—336 samples for each of the 44 characters in the Pashto character dataset. Such samples of handwritten characters are collected on A4-sized paper from different students of the Pashto Department at the University of Peshawar, Khyber Pakhtunkhwa, Pakistan. In total, 336 students and faculty members contributed to the proposed database accumulation phase. This dataset contains multisize, multifont, and multistyle characters of varying structures.
Los estilos APA, Harvard, Vancouver, ISO, etc.
38

Majidifard, Hamed, Peng Jin, Yaw Adu-Gyamfi y William G. Buttlar. "Pavement Image Datasets: A New Benchmark Dataset to Classify and Densify Pavement Distresses". Transportation Research Record: Journal of the Transportation Research Board 2674, n.º 2 (febrero de 2020): 328–39. http://dx.doi.org/10.1177/0361198120907283.

Texto completo
Resumen
Automated pavement distress detection using road images remains a challenging topic in the computer vision research community. Recent developments in deep learning have led to considerable research activity directed towards improving the efficacy of automated pavement distress identification and rating. Deep learning models require a large ground-truth dataset, which is often not readily available in the case of pavements. In this study, a labeled dataset approach is introduced as a first step towards a more robust, easy-to-deploy pavement condition assessment system. The technique is termed herein the pavement image dataset (PID) method. The dataset consists of images captured from two camera views of an identical pavement segment: a wide view and a top-down view. The wide-view images were used to classify the distresses and to train the deep learning frameworks, while the top-down-view images allowed calculation of distress density, which will be used in future studies aimed at automated pavement rating. For the wide-view dataset, 7,237 images were manually annotated and distresses classified into nine categories. Images were extracted using the Google application programming interface (API), selecting street-view images with a Python-based code developed for this project. The new dataset was evaluated using two mainstream deep learning frameworks: You Only Look Once (YOLO v2) and Faster Region Convolutional Neural Network (Faster R-CNN). Accuracy scores using the F1 index were 0.84 for YOLO v2 and 0.65 for the Faster R-CNN model runs, both quite acceptable considering the convenience of utilizing Google Maps images.
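For readers unfamiliar with the F1 index used above, the following generic sketch matches predicted boxes to ground truth at an IoU threshold of 0.5 and computes F1. Box format and threshold are assumptions; this is not the authors' evaluation code.

```python
# Sketch: detection F1 from IoU-matched predictions and ground-truth boxes.
def iou(a, b):
    """a, b: boxes as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def detection_f1(preds, gts, thr=0.5):
    matched, tp = set(), 0
    for p in preds:
        best = max(range(len(gts)), key=lambda i: iou(p, gts[i]), default=None)
        if best is not None and best not in matched and iou(p, gts[best]) >= thr:
            matched.add(best)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```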
Los estilos APA, Harvard, Vancouver, ISO, etc.
39

Shin, Jungpil, Akitaka Matsuoka, Md Al Mehedi Hasan y Azmain Yakin Srizon. "American Sign Language Alphabet Recognition by Extracting Feature from Hand Pose Estimation". Sensors 21, n.º 17 (31 de agosto de 2021): 5856. http://dx.doi.org/10.3390/s21175856.

Texto completo
Resumen
Sign language is designed to assist the deaf and hard of hearing community to convey messages and connect with society. Sign language recognition has been an important domain of research for a long time. Previously, sensor-based approaches have obtained higher accuracy than vision-based approaches, but because of the cost-effectiveness of vision-based approaches, research has also been conducted in this direction despite the accuracy drop. The purpose of this research is to recognize American Sign Language characters using hand images obtained from a web camera. In this work, the MediaPipe Hands algorithm was used to estimate hand joints from RGB images of hands obtained from a web camera, and two types of features were generated from the estimated joint coordinates for classification: the distances between the joint points and the angles between vectors and the 3D axes. The classifiers used to classify the characters were a support vector machine (SVM) and a light gradient boosting machine (GBM). Three character datasets were used for recognition: the ASL Alphabet dataset, the Massey dataset, and the Finger Spelling A dataset. The accuracies obtained were 99.39% for the Massey dataset, 87.60% for the ASL Alphabet dataset, and 98.45% for the Finger Spelling A dataset. The proposed design for automatic American Sign Language recognition is cost-effective, computationally inexpensive, does not require any special sensors or devices, and has outperformed previous studies.
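A hypothetical sketch of the two feature types described above, computed from 21 estimated 3D hand joints (for example, MediaPipe Hands output) and fed to an SVM. The exact feature definitions and bone connectivity in the cited paper may differ; this version uses all pairwise joint distances and consecutive-joint vectors.

```python
# Sketch: distance and axis-angle features from 21 hand landmarks + SVM.
import numpy as np
from sklearn.svm import SVC

def hand_features(joints: np.ndarray) -> np.ndarray:
    """joints: (21, 3) array of x, y, z landmark coordinates."""
    # Pairwise distances between joint points
    diffs = joints[:, None, :] - joints[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    dist_feats = dists[np.triu_indices(21, k=1)]
    # Angles between (simplified) bone vectors and the three coordinate axes:
    # for a unit vector, each component is the cosine of the angle with that axis.
    bones = joints[1:] - joints[:-1]
    bones /= (np.linalg.norm(bones, axis=1, keepdims=True) + 1e-12)
    angle_feats = np.arccos(np.clip(bones, -1.0, 1.0)).ravel()
    return np.concatenate([dist_feats, angle_feats])

# Usage (hypothetical): X = np.vstack([hand_features(j) for j in joint_sets]); y = labels
# clf = SVC(kernel="rbf").fit(X, y)
```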
Los estilos APA, Harvard, Vancouver, ISO, etc.
40

Ullah, Najib, Muhammad Ismail Mohmand, Kifayat Ullah, Mohammed S. M. Gismalla, Liaqat Ali, Shafqat Ullah Khan y Niamat Ullah. "Diabetic Retinopathy Detection Using Genetic Algorithm-Based CNN Features and Error Correction Output Code SVM Framework Classification Model". Wireless Communications and Mobile Computing 2022 (25 de julio de 2022): 1–13. http://dx.doi.org/10.1155/2022/7095528.

Texto completo
Resumen
Diabetic retinopathy (DR) is an eye disease that may occur in individuals suffering from diabetes and can result in vision loss. DR identification and routine diagnosis is a challenging task and may need several screenings. Early identification of DR has the potential to prevent or delay vision loss. For real-time applications, an automated DR identification approach is required to assist and reduce possible human mistakes. In this research work, we propose a deep neural network and genetic algorithm-based feature selection approach. Five advanced convolutional neural network architectures are used to extract features from the fundus images, i.e., AlexNet, NASNet-Large, VGG-19, Inception V3, and ShuffleNet, followed by a genetic algorithm that ranks features as high rank (optimal) or lower rank (unsatisfactory). The non-optimal feature attributes are then dropped from the training and validation feature vectors. A support vector machine (SVM)-based classification model is used to develop the diabetic retinopathy recognition model. Model performance is evaluated using accuracy, precision, recall, and F1 score. The proposed model is tested on three different datasets: the Kaggle dataset, a self-generated custom dataset, and an enhanced custom dataset, with 97.9%, 94.76%, and 96.4% accuracy, respectively. In the enhanced custom dataset, data augmentation was performed due to the smaller size of the dataset and to eliminate noise in the fundus images.
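A minimal sketch of the backbone step of such a pipeline, assuming torchvision (>= 0.13) and scikit-learn: deep features are extracted from fundus images with a pretrained AlexNet and classified with an SVM. The genetic-algorithm feature ranking and the error-correcting-output-code coding of the cited framework are omitted; preprocessing values are the standard ImageNet ones, not taken from the paper.

```python
# Sketch: pretrained CNN feature extraction followed by an SVM classifier.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import SVC

alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alexnet.classifier = alexnet.classifier[:-1]   # drop the final FC layer -> 4096-d features
alexnet.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract(pil_image):
    """Return a 4096-dimensional feature vector for one PIL image."""
    x = preprocess(pil_image).unsqueeze(0)
    return alexnet(x).squeeze(0).numpy()

# Usage (hypothetical): X = np.vstack([extract(img) for img in images]); y = labels
# clf = SVC(kernel="rbf").fit(X, y)
```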
Los estilos APA, Harvard, Vancouver, ISO, etc.
41

Kim, Alexander, Kyuhyup Lee, Seojoon Lee, Jinwoo Song, Soonwook Kwon y Suwan Chung. "Synthetic Data and Computer-Vision-Based Automated Quality Inspection System for Reused Scaffolding". Applied Sciences 12, n.º 19 (8 de octubre de 2022): 10097. http://dx.doi.org/10.3390/app121910097.

Texto completo
Resumen
Regular scaffolding quality inspection is an essential part of construction safety. However, current evaluation methods and quality requirements for temporary structures are based on subjective visual inspection by safety managers. Accordingly, the assessment process and results depend on an inspector’s competence, experience, and human factors, making objective analysis complex. The safety inspections performed by specialized services bring additional costs and increase evaluation times. Therefore, a temporary structure quality and safety evaluation system based on experts’ experience and independent of the human factor is a relevant solution in intelligent construction. This study aimed to present a quality evaluation system prototype for scaffolding parts based on computer vision. The main steps of the proposed system development are preparing a dataset, designing a neural network (NN) model, and training and evaluating the model. Since traditional methods of preparing a dataset are very laborious and time-consuming, this work used mixed real and synthetic datasets modeled in Blender. The resulting datasets were then processed using artificial intelligence algorithms to obtain information about defect type, size, and location. Finally, the tested parts’ quality classes were calculated based on the obtained defect values.
Los estilos APA, Harvard, Vancouver, ISO, etc.
42

Ishiwaka, Yuko, Xiao S. Zeng, Michael Lee Eastman, Sho Kakazu, Sarah Gross, Ryosuke Mizutani y Masaki Nakada. "Foids". ACM Transactions on Graphics 40, n.º 6 (diciembre de 2021): 1–15. http://dx.doi.org/10.1145/3478513.3480520.

Texto completo
Resumen
We present a bio-inspired fish simulation platform, which we call "Foids", to generate realistic synthetic datasets for use in training computer vision algorithms. This is a first-of-its-kind synthetic dataset platform for fish that generates all the 3D scenes purely through simulation. One of the major challenges in deep learning based computer vision is the preparation of the annotated dataset. It is already hard to collect a good-quality video dataset with enough variation; moreover, it is a painful process to annotate a sufficiently large video dataset frame by frame. This is especially true for a fish dataset, because it is difficult to set up a camera underwater and the number of fish (target objects) in the scene can range up to 30,000 in a cage on a fish farm. All of these fish need to be annotated with labels such as a bounding box or silhouette, which can take hours to complete manually, even for only a few minutes of video. We solve this challenge by introducing a realistic synthetic dataset generation platform that incorporates details of biology and ecology studied in the aquaculture field. Because the scene is simulated, it is easy to generate the scene data with annotation labels from the 3D mesh geometry and transformation matrices. To this end, we develop an automated fish counting system utilizing part of the synthetic dataset; it shows counting accuracy comparable to human eyes, reduces the time required compared to the manual process, and reduces physical injuries sustained by the fish.
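A generic sketch of how annotations come "for free" in a simulated scene, as described above: project a mesh's 3D vertices through model, view, and intrinsic matrices and take the 2D bounding box of the result. All matrices here are placeholders, not values from the Foids platform.

```python
# Sketch: derive a 2D bounding-box label from 3D mesh geometry and transforms.
import numpy as np

def bbox_from_mesh(vertices: np.ndarray,
                   model_matrix: np.ndarray,
                   view_matrix: np.ndarray,
                   K: np.ndarray):
    """vertices: (N, 3); model/view: (4, 4) transforms; K: (3, 3) camera intrinsics."""
    v = np.hstack([vertices, np.ones((len(vertices), 1))])   # homogeneous coordinates
    cam = (view_matrix @ model_matrix @ v.T)[:3]              # points in camera space
    px = K @ cam
    px = px[:2] / px[2]                                       # perspective divide -> pixels
    xmin, ymin = px.min(axis=1)
    xmax, ymax = px.max(axis=1)
    return xmin, ymin, xmax, ymax
```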
Los estilos APA, Harvard, Vancouver, ISO, etc.
43

Krapf, Sebastian, Lukas Bogenrieder, Fabian Netzler, Georg Balke y Markus Lienkamp. "RID—Roof Information Dataset for Computer Vision-Based Photovoltaic Potential Assessment". Remote Sensing 14, n.º 10 (10 de mayo de 2022): 2299. http://dx.doi.org/10.3390/rs14102299.

Texto completo
Resumen
Computer vision has great potential to accelerate global-scale photovoltaic potential analysis by extracting detailed roof information from high-resolution aerial images, but the lack of existing deep learning datasets is a major barrier. Therefore, we present the Roof Information Dataset for semantic segmentation of roof segments and roof superstructures. We assessed the label quality of the initial roof superstructure annotations by conducting an annotation experiment and identified annotator agreements of 0.15–0.70 mean intersection over union, depending on the class. We discuss the associated implications for the training and evaluation of two convolutional neural networks and found that prediction quality behaved similarly to annotator agreement for most classes. The class photovoltaic module was predicted best, with a class-specific mean intersection over union of 0.69. By providing the datasets in initial and reviewed versions, we promote a data-centric approach to the semantic segmentation of roof information. Finally, we conducted a photovoltaic potential analysis case study and demonstrated the high impact of roof superstructures as well as the viability of the computer vision approach for increasing accuracy. While this paper’s primary use case was roof information extraction for photovoltaic potential analysis, its implications can be transferred to other computer vision applications in remote sensing and beyond.
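A generic implementation of the agreement measure mentioned above, not the authors' code: per-class intersection over union between two segmentation masks, averaged into a mean IoU.

```python
# Sketch: mean IoU between two annotators' segmentation masks.
import numpy as np

def mean_iou(mask_a: np.ndarray, mask_b: np.ndarray, num_classes: int) -> float:
    """mask_a, mask_b: 2D integer class maps of the same shape."""
    ious = []
    for c in range(num_classes):
        a, b = mask_a == c, mask_b == c
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue  # class absent in both masks; skip rather than count as 0
        ious.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```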
Los estilos APA, Harvard, Vancouver, ISO, etc.
44

Iskandaryan, Ditsuhi, Francisco Ramos y Sergio Trilles. "Features Exploration from Datasets Vision in Air Quality Prediction Domain". Atmosphere 12, n.º 3 (28 de febrero de 2021): 312. http://dx.doi.org/10.3390/atmos12030312.

Texto completo
Resumen
Air pollution and its consequences are negatively impacting the world population and the environment, which makes air quality monitoring and forecasting techniques essential tools to combat this problem. To predict air quality with maximum accuracy, it is crucial to consider not only the implemented models and the quantity of data but also the types of datasets used. This study selected a set of research works in the field of air quality prediction and concentrates on the exploration of the datasets utilised in them. The most significant findings of this work are: (1) meteorological datasets were used in 94.6% of the papers, far more often than any other type, and were typically complemented with others, such as temporal and spatial data; (2) the use of various dataset combinations began in 2009; and (3) the use of open data started in 2012; 32.3% of the studies used open data, and 63.4% of the studies did not provide their data.
Los estilos APA, Harvard, Vancouver, ISO, etc.
45

Kitzler, Florian, Norbert Barta, Reinhard W. Neugschwandtner, Andreas Gronauer y Viktoria Motsch. "WE3DS: An RGB-D Image Dataset for Semantic Segmentation in Agriculture". Sensors 23, n.º 5 (1 de marzo de 2023): 2713. http://dx.doi.org/10.3390/s23052713.

Texto completo
Resumen
Smart farming (SF) applications rely on robust and accurate computer vision systems. An important computer vision task in agriculture is semantic segmentation, which aims to classify each pixel of an image and can be used for selective weed removal. State-of-the-art implementations use convolutional neural networks (CNNs) that are trained on large image datasets. In agriculture, publicly available RGB image datasets are scarce and often lack detailed ground-truth information. In contrast to agriculture, other research areas feature RGB-D datasets that combine color (RGB) with additional distance (D) information; results there show that including distance as an additional modality can further improve model performance. Therefore, we introduce WE3DS as the first RGB-D image dataset for multi-class plant species semantic segmentation in crop farming. It contains 2568 RGB-D images (color image and distance map) and corresponding hand-annotated ground-truth masks. Images were taken under natural light conditions using an RGB-D sensor consisting of two RGB cameras in a stereo setup. Further, we provide a benchmark for RGB-D semantic segmentation on the WE3DS dataset and compare it with a solely RGB-based model. Our trained models achieve up to 70.7% mean intersection over union (mIoU) for discriminating between soil, seven crop species, and ten weed species. Finally, our work confirms the finding that additional distance information improves segmentation quality.
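A common recipe for feeding RGB-D data to a network designed for 3-channel input, offered here as an illustrative sketch rather than the WE3DS reference model: widen the first convolution of a pretrained backbone to 4 channels and initialize the extra depth channel with the mean of the RGB weights (assumes torchvision >= 0.13).

```python
# Sketch: adapt a pretrained ResNet backbone to accept RGB + distance input.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
old_conv = backbone.conv1                      # Conv2d(3, 64, kernel_size=7, ...)
new_conv = nn.Conv2d(4, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight                       # copy RGB filters
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)  # init depth channel
backbone.conv1 = new_conv

x = torch.randn(1, 4, 224, 224)                # RGB image stacked with a distance map
logits = backbone(x)                           # the backbone now accepts RGB-D input
```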
Los estilos APA, Harvard, Vancouver, ISO, etc.
46

Yar, Hikmat, Tanveer Hussain, Zulfiqar Ahmad Khan, Deepika Koundal, Mi Young Lee y Sung Wook Baik. "Vision Sensor-Based Real-Time Fire Detection in Resource-Constrained IoT Environments". Computational Intelligence and Neuroscience 2021 (21 de diciembre de 2021): 1–15. http://dx.doi.org/10.1155/2021/5195508.

Texto completo
Resumen
Fire detection and management is very important to prevent social, ecological, and economic damage. However, achieving real-time fire detection with high accuracy in an IoT environment is a challenging task due to limited storage, transmission, and computation resources. To overcome these challenges, early fire detection and automatic response are very significant. Therefore, we develop a novel framework based on a lightweight convolutional neural network (CNN) that requires less training time and is applicable to resource-constrained devices. The internal architecture of the proposed model is inspired by the block-wise VGG16 architecture, with a significantly reduced number of parameters, input size, and inference time, and comparatively higher accuracy for early fire detection. In the proposed model, small uniform convolutional filters are employed, specifically designed to capture fine details of input fire images, with a sequentially increasing number of channels to aid effective feature extraction. The proposed model is evaluated on two datasets: the benchmark Foggia dataset and our newly created small-scale fire detection dataset with extremely challenging real-world images containing a high level of diversity. Experimental results on both datasets reveal the better performance of the proposed model compared to the state of the art in terms of accuracy, false-positive rate, model size, and running time, which indicates its robustness and feasibility for installation in real-world scenarios.
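An illustrative sketch in the spirit of the model described above: a lightweight, VGG-style block-wise CNN with small uniform 3×3 filters and a growing channel count. Layer sizes and class count are assumptions, not the published architecture.

```python
# Sketch: block-wise lightweight CNN for fire / no-fire classification.
import torch
import torch.nn as nn

def block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class FireNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(block(3, 16), block(16, 32), block(32, 64))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, num_classes))

    def forward(self, x):
        return self.head(self.features(x))

# model = FireNet(); logits = model(torch.randn(1, 3, 224, 224))
```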
Los estilos APA, Harvard, Vancouver, ISO, etc.
47

Paul, Sayak y Pin-Yu Chen. "Vision Transformers Are Robust Learners". Proceedings of the AAAI Conference on Artificial Intelligence 36, n.º 2 (28 de junio de 2022): 2071–81. http://dx.doi.org/10.1609/aaai.v36i2.20103.

Texto completo
Resumen
Transformers, composed of multiple self-attention layers, hold strong promise as a generic learning primitive applicable to different data modalities, including the recent breakthroughs in computer vision achieving state-of-the-art (SOTA) standard accuracy. What remains largely unexplored is their robustness evaluation and attribution. In this work, we study the robustness of the Vision Transformer (ViT) (Dosovitskiy et al. 2021) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT models and SOTA convolutional neural networks (CNNs), namely Big Transfer (BiT) (Kolesnikov et al. 2020). Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. For example, with fewer parameters and similar dataset and pre-training combinations, ViT gives a top-1 accuracy of 28.10% on ImageNet-A, which is 4.3x higher than a comparable variant of BiT. Our analyses on image masking, Fourier spectrum sensitivity, and spread on the discrete cosine energy spectrum reveal intriguing properties of ViT that contribute to its improved robustness. Code for reproducing our experiments is available at https://git.io/J3VO0.
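A generic sketch of the kind of image-masking probe mentioned above, not the paper's exact protocol: randomly zero out a fraction of non-overlapping patches and compare the classifier's accuracy on masked versus unmasked inputs. Patch size and drop ratio are assumptions.

```python
# Sketch: random patch masking as a simple robustness probe.
import torch

def mask_patches(images: torch.Tensor, patch: int = 16, drop_ratio: float = 0.5) -> torch.Tensor:
    """images: (B, C, H, W) with H and W divisible by `patch`."""
    b, c, h, w = images.shape
    ph, pw = h // patch, w // patch
    keep = (torch.rand(b, ph, pw, device=images.device) > drop_ratio).float()
    mask = keep.repeat_interleave(patch, 1).repeat_interleave(patch, 2)  # upsample to pixels
    return images * mask.unsqueeze(1)

# Usage (hypothetical): compare accuracy(model(batch)) with accuracy(model(mask_patches(batch)))
```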
Los estilos APA, Harvard, Vancouver, ISO, etc.
48

Puertas, Enrique, Gonzalo De-Las-Heras, Javier Sánchez-Soriano y Javier Fernández-Andrés. "Dataset: Variable Message Signal Annotated Images for Object Detection". Data 7, n.º 4 (1 de abril de 2022): 41. http://dx.doi.org/10.3390/data7040041.

Texto completo
Resumen
This publication presents a dataset consisting of Spanish road images taken from inside a vehicle, together with annotations in XML files in PASCAL VOC format that indicate the location of Variable Message Signals within them. Additionally, a CSV file is attached with information regarding the geographic position, the folder where each image is located, and the text in Spanish. The dataset can be used to train supervised learning computer vision algorithms such as convolutional neural networks. Throughout this work, the process followed to obtain the dataset, the image acquisition and labeling, and its specifications are detailed. The dataset comprises 1216 instances, 888 positive and 328 negative, in 1152 jpg images with a resolution of 1280 × 720 pixels. These are divided into 756 real images and 756 images created using data augmentation. The purpose of this dataset is to support road computer vision research, since no dataset exists specifically for VMSs.
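A small sketch of reading one of the dataset's PASCAL VOC XML files to recover the annotated boxes; the tag names follow the standard VOC schema, and the file path is a placeholder.

```python
# Sketch: parse a PASCAL VOC annotation file into (label, xmin, ymin, xmax, ymax) tuples.
import xml.etree.ElementTree as ET

def load_voc_boxes(xml_path: str):
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(float(bb.findtext("xmin"))), int(float(bb.findtext("ymin"))),
                      int(float(bb.findtext("xmax"))), int(float(bb.findtext("ymax")))))
    return boxes

# for label, xmin, ymin, xmax, ymax in load_voc_boxes("annotations/image_0001.xml"):
#     print(label, xmin, ymin, xmax, ymax)
```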
Los estilos APA, Harvard, Vancouver, ISO, etc.
49

Kim, Taehoon y Wooguil Pak. "Deep Learning-Based Network Intrusion Detection Using Multiple Image Transformers". Applied Sciences 13, n.º 5 (21 de febrero de 2023): 2754. http://dx.doi.org/10.3390/app13052754.

Texto completo
Resumen
The development of computer vision-based deep learning models for accurate two-dimensional (2D) image classification has enabled us to surpass existing machine learning-based classifiers and human classification capabilities. Recently, steady efforts have been made to apply these sophisticated vision-based deep learning models to the network intrusion detection domain, and various experimental results have confirmed their applicability and limitations. In this paper, we present an optimized method for processing network intrusion detection system (NIDS) datasets with vision-based deep learning models by further expanding existing studies to overcome these limitations. In the proposed method, unlike the existing approach, the NIDS dataset is converted into 2D images through various image transformers and these are then integrated into three-channel RGB color images, which further enhances the performance of deep learning-based intrusion detection. Various performance evaluations confirm that the proposed method can significantly improve intrusion detection performance over a recent method using grayscale images and over existing NIDSs that do not use images. As network intrusions evolve in complexity and variety, we anticipate that the intrusion detection algorithm outlined in this study will facilitate network security.
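An illustrative sketch of the general idea described above: map one flow's feature vector to a 2D image under three different transforms and stack the results as RGB channels. The specific transforms and image size here are placeholders, not the paper's image transformers.

```python
# Sketch: turn a tabular NIDS feature vector into a three-channel RGB image.
import numpy as np

def to_rgb_image(features: np.ndarray, side: int = 8) -> np.ndarray:
    """features: 1D feature vector; returns a (side, side, 3) uint8 image."""
    v = np.resize(features.astype(np.float64), side * side)   # pad/trim to side*side values

    def norm(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    r = norm(v)                                          # channel 1: min-max scaled raw values
    g = norm(np.log1p(np.abs(v)))                        # channel 2: log-compressed magnitudes
    b = norm(np.argsort(np.argsort(v)).astype(float))    # channel 3: rank transform
    img = np.stack([r, g, b], axis=-1).reshape(side, side, 3)
    return (img * 255).astype(np.uint8)

# img = to_rgb_image(flow_record)   # flow_record: e.g. a 64-feature NIDS row
```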
Los estilos APA, Harvard, Vancouver, ISO, etc.
50

Rahman, Ehab Ur, Muhammad Asghar Khan, Fahad Algarni, Yihong Zhang, M. Irfan Uddin, Insaf Ullah y Hafiz Ishfaq Ahmad. "Computer Vision-Based Wildfire Smoke Detection Using UAVs". Mathematical Problems in Engineering 2021 (27 de abril de 2021): 1–9. http://dx.doi.org/10.1155/2021/9977939.

Texto completo
Resumen
This paper presents a new methodology based on texture and color for the detection and monitoring of different sources of forest fire smoke using unmanned aerial vehicles (UAVs). A novel dataset has been gathered comprising thin smoke and dense smoke generated from dry leaves on the forest floor, which are a source of igniting forest fires. A classification task was performed by training a feature extractor to check the feasibility of the proposed dataset. A meta-architecture was trained on top of the feature extractor to check the dataset's viability for smoke detection and tracking. Results were obtained by applying the proposed methodology to forest fire smoke images, smoke videos taken with a stationary camera, and real-time UAV footage. A micro-average F1-score of 0.865 was achieved with different test videos, and an F1-score of 0.870 was achieved on real UAV footage of wildfire smoke. The structural similarity index was used to illustrate some of the difficulties encountered in smoke detection, along with examples.
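A minimal sketch of the structural-similarity check mentioned above, using scikit-image: compare consecutive grayscale frames with SSIM to gauge how little thin smoke changes the scene. The threshold in the usage note is an assumption, not a value from the paper.

```python
# Sketch: SSIM between consecutive grayscale video frames.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def frame_similarity(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """prev_frame, curr_frame: 2D uint8 grayscale images of equal size."""
    score, _ = ssim(prev_frame, curr_frame, full=True)
    return score

# if frame_similarity(f0, f1) > 0.95:
#     print("frames nearly identical; thin smoke may be hard to detect")
```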
Los estilos APA, Harvard, Vancouver, ISO, etc.