
Journal articles on the topic "LIMITED DATASET"


Consult the top 50 journal articles on the topic "LIMITED DATASET".

An "Add to bibliography" button is available next to each work in the bibliography. Use it, and we will automatically create a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever these details are available in the work's metadata.

Browse journal articles from many disciplines and compile your bibliography accordingly.

1

Gusarova, Nataliya, Artem Lobantsev, Aleksandra Vatian, Anton Klochrov, Maxim Kabyshev, Anatoly Shalyto, Anna Tatarinova, Tatiana Treshkur, and Min Li. "Generative augmentation to improve lung nodules detection in resource-limited settings". Information and Control Systems, no. 6 (15.12.2020): 60–69. http://dx.doi.org/10.31799/1684-8853-2020-6-60-69.

Abstract:
Introduction: Lung cancer is one of the most formidable cancers. The use of neural network technologies in its diagnostics is promising, but the datasets collected from real clinical practice cannot cover the variety of lung cancer manifestations. Purpose: Assessment of the possibility of improving the classification of pulmonary nodules by means of generative augmentation of available datasets under resource constraints. Methods: We used part of the LIDC-IDRI dataset, the StyleGAN architecture for generating artificial lung nodules, and the VGG11 model as a classifier. We generated pulmonary nodules using the proposed pipeline and invited four experts to visually evaluate them. We formed four experimental datasets with different types of augmentation, including the use of synthesized data, and compared the effectiveness of the classification performed by the VGG11 network when trained on each dataset. Results: Ten generated nodules in each group of characteristics were presented for assessment. In all cases, positive expert assessments were obtained, with a Fleiss's kappa coefficient k = 0.6–0.9. We obtained the best values of ROC AUC = 0.9604 and PR AUC = 0.9625 with the proposed generative augmentation approach. Discussion: The obtained efficiency metrics are superior to the baseline results obtained using comparably small training datasets, and slightly below the best results achieved using much more powerful computational resources. We have thus shown that a combination of StyleGAN and VGG11, which requires neither large computing resources nor a large initial training dataset, can be used effectively to augment an unbalanced dataset.
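
To make the pipeline concrete, here is a minimal sketch of the generative augmentation idea, assuming a pretrained `generator` callable standing in for StyleGAN (hypothetical) and using torchvision's VGG11 as the classifier; it is an illustration under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg11

def augment_minority(generator: nn.Module, n_samples: int, z_dim: int = 512) -> torch.Tensor:
    """Sample latent vectors and synthesize nodule images for the minority class."""
    z = torch.randn(n_samples, z_dim)
    with torch.no_grad():
        return generator(z)  # expected shape: (n_samples, 3, H, W)

def build_classifier(num_classes: int = 2) -> nn.Module:
    """VGG11 with its final layer resized to the nodule classes."""
    model = vgg11(weights=None)
    model.classifier[6] = nn.Linear(4096, num_classes)
    return model

# Training then proceeds on real plus synthetic images, e.g. with
# nn.CrossEntropyLoss() and torch.optim.Adam(model.parameters()).
```
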
2

Sarwati Rahayu, Sulis Sandiwarno, Erwin Dwika Putra, Marissa Utami, and Hadiguna Setiawan. "Model Sequential Resnet50 Untuk Pengenalan Tulisan Tangan Aksara Arab". JSAI (Journal Scientific and Applied Informatics) 6, no. 2 (30.06.2023): 234–41. http://dx.doi.org/10.36085/jsai.v6i2.5379.

Abstract:
Research on Arabic handwriting recognition is still limited, and public datasets of Arabic script remain scarce; consequently, most studies have used their own datasets. Recently, however, public datasets have become available, creating an opportunity to compare methods on the same data. This study aimed to determine which transfer learning model gives the best accuracy for handwriting recognition of Arabic script. The results of the experiment using ResNet50 are as follows: a training accuracy of 91.63%, a validation accuracy of 91.82%, and a testing accuracy of 95.03%.
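
As a rough illustration of this kind of transfer learning setup (not the paper's exact configuration), the sketch below loads an ImageNet-pretrained ResNet50 from torchvision, freezes the backbone, and attaches a new classification head; the 28-class output is an assumption standing in for the Arabic character set.

```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)  # ImageNet-pretrained backbone
for param in model.parameters():
    param.requires_grad = False                 # freeze the pretrained features
model.fc = nn.Linear(model.fc.in_features, 28)  # new trainable head (28 classes assumed)
```
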
3

Mohammad Alfadli, Khadijah, and Alaa Omran Almagrabi. "Feature-Limited Prediction on the UCI Heart Disease Dataset". Computers, Materials & Continua 74, no. 3 (2023): 5871–83. http://dx.doi.org/10.32604/cmc.2023.033603.

4

Ko, Yu-Chieh, Wei-Shiang Chen, Hung-Hsun Chen, Tsui-Kang Hsu, Ying-Chi Chen, Catherine Jui-Ling Liu, and Henry Horng-Shing Lu. "Widen the Applicability of a Convolutional Neural-Network-Assisted Glaucoma Detection Algorithm of Limited Training Images across Different Datasets". Biomedicines 10, no. 6 (3.06.2022): 1314. http://dx.doi.org/10.3390/biomedicines10061314.

Abstract:
Automated glaucoma detection using deep learning may increase the diagnostic rate of glaucoma to prevent blindness, but generalizable models are currently unavailable despite the use of huge training datasets. This study aims to evaluate the performance of a convolutional neural network (CNN) classifier trained with a limited number of high-quality fundus images in detecting glaucoma and methods to improve its performance across different datasets. A CNN classifier was constructed using EfficientNet B3 and 944 images collected from one medical center (core model) and externally validated using three datasets. The performance of the core model was compared with (1) the integrated model constructed by using all training images from the four datasets and (2) the dataset-specific model built by fine-tuning the core model with training images from the external datasets. The diagnostic accuracy of the core model was 95.62% but dropped to ranges of 52.5–80.0% on the external datasets. Dataset-specific models exhibited superior diagnostic performance on the external datasets compared to other models, with a diagnostic accuracy of 87.50–92.5%. The findings suggest that dataset-specific tuning of the core CNN classifier effectively improves its applicability across different datasets when increasing training images fails to achieve generalization.
5

Guo, Runze, Bei Sun, Xiaotian Qiu, Shaojing Su, Zhen Zuo, and Peng Wu. "Fine-Grained Recognition of Surface Targets with Limited Data". Electronics 9, no. 12 (2.12.2020): 2044. http://dx.doi.org/10.3390/electronics9122044.

Abstract:
Recognition of surface targets has a vital influence on the development of military and civilian applications such as maritime rescue patrols, illegal-vessel screening, and maritime operation monitoring. However, owing to the interference of visual similarity and environmental variations, and to the lack of high-quality datasets, accurate recognition of surface targets has always been a challenging task. In this paper, we introduce a multi-attention residual model based on deep learning methods, in which channel and spatial attention modules are applied for feature fusion. In addition, we use transfer learning to improve the feature expression capabilities of the model under conditions of limited data. A function based on metric learning is adopted to increase the distance between different classes. Finally, a dataset with eight types of surface targets is established. Comparative experiments on our self-built dataset show that the proposed method focuses more on discriminative regions, avoids problems such as vanishing gradients, and achieves better classification results than B-CNN, RA-CNN, MAMC, MA-CNN, and DFL-CNN.
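
The metric-learning component can be sketched as a standard triplet margin term added to the classification loss; the margin and weight below are illustrative assumptions, and PyTorch's built-in `TripletMarginLoss` stands in for the paper's distance function.

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=1.0)  # anchor-positive must beat anchor-negative by a margin

def combined_loss(logits, labels, anchor, positive, negative, alpha: float = 0.5):
    """Cross-entropy for classification plus a metric-learning term that
    pushes embeddings of different classes apart."""
    return ce(logits, labels) + alpha * triplet(anchor, positive, negative)
```
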
6

Gaikwad, Mayur, Swati Ahirrao, Shraddha Phansalkar, Ketan Kotecha, and Shalli Rani. "Multi-Ideology, Multiclass Online Extremism Dataset, and Its Evaluation Using Machine Learning". Computational Intelligence and Neuroscience 2023 (1.03.2023): 1–33. http://dx.doi.org/10.1155/2023/4563145.

Abstract:
Social media platforms play a key role in fostering the outreach of extremism by influencing the views, opinions, and perceptions of people. These platforms are increasingly exploited by extremist elements for spreading propaganda, radicalizing, and recruiting youth. Hence, research on extremism detection on social media platforms is essential to curb its influence and ill effects. A study of the existing literature on extremism detection reveals that it is restricted to a specific ideology, to binary classification with limited insight into extremism text, and to manual data validation methods for checking data quality. In existing research studies, researchers have used datasets limited to a single ideology. As a result, they face serious issues such as class imbalance, limited insight from class labels, and a lack of automated data validation methods. A major contribution of this work is a balanced extremism text dataset, versatile across multiple ideologies and verified by robust data validation methods, for classifying extremism text into popular extremism types such as propaganda, radicalization, and recruitment. The presented extremism text dataset is a generalization over multiple ideologies, drawing on the standard ISIS dataset, the GAB White Supremacist dataset, and recent Twitter tweets on ISIS and white supremacist ideology. The dataset is analyzed to extract features for the three focused extremism classes with TF-IDF unigram, bigram, and trigram features. Additionally, pretrained word2vec features are used for semantic analysis. The extracted features in the proposed dataset are evaluated using machine learning classification algorithms such as multinomial naïve Bayes, support vector machine, random forest, and XGBoost. The best results were achieved by the support vector machine using the TF-IDF unigram model, with an F1 score of 0.67. The proposed multi-ideology and multiclass dataset shows performance comparable to existing datasets limited to a single ideology and binary labels.
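
The best-performing configuration reported above (TF-IDF unigrams feeding a support vector machine) corresponds to a few lines of scikit-learn; the pipeline below is a sketch with placeholder variable names, not the authors' code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# TF-IDF unigrams -> linear SVM, as in the best-performing setup above
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), LinearSVC())
# texts_train / y_train would hold the propaganda, radicalization,
# and recruitment samples (assumed to exist):
# clf.fit(texts_train, y_train)
# print(f1_score(y_test, clf.predict(texts_test), average="macro"))
```
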
7

Huč, Aleks, Jakob Šalej, and Mira Trebar. "Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices". Sensors 21, no. 14 (20.07.2021): 4946. http://dx.doi.org/10.3390/s21144946.

Abstract:
The Internet of Things (IoT) consists of small devices or networks of sensors, which permanently generate huge amounts of data. Usually, they have limited resources, either computing power or memory, which means that raw data are transferred to central systems or the cloud for analysis. Lately, the idea of moving intelligence to the IoT is becoming feasible, with machine learning (ML) moved to edge devices. The aim of this study is to provide an experimental analysis of processing a large imbalanced dataset (DS2OS), split into a training dataset (80%) and a test dataset (20%). The training dataset was reduced by randomly selecting a smaller number of samples to create new datasets Di (i = 1, 2, 5, 10, 15, 20, 40, 60, 80%). These were then used with several machine learning algorithms to identify the size at which the performance metrics saturate and classification results stop improving, with an F1 score of 0.95 or higher; this happened at 20% of the training dataset. Two further solutions for reducing the number of samples to produce a balanced dataset are then given. In the first, datasets DRi consist of all anomalous samples in seven classes and a reduced majority class ('NL') with i = 0.1, 0.2, 0.5, 1, 2, 5, 10, 15, 20 percent of randomly selected samples. In the second, datasets DCi are generated from representative samples determined with clustering from the training dataset. All three dataset reduction methods showed comparable performance results. Further evaluation of training times and memory usage on a Raspberry Pi 4 shows that it is possible to run ML algorithms with limited-size datasets on edge devices.
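
The subset-size experiment can be sketched as a simple learning-curve loop over the same fractions used above; the random forest and the weighted F1 threshold are stand-ins for the paper's set of ML algorithms, and `X_train`, `y_train`, `X_test`, `y_test` are assumed to exist as NumPy arrays.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
for frac in (0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.40, 0.60, 0.80):
    idx = rng.choice(len(X_train), size=int(frac * len(X_train)), replace=False)
    model = RandomForestClassifier().fit(X_train[idx], y_train[idx])
    f1 = f1_score(y_test, model.predict(X_test), average="weighted")
    print(f"{frac:.0%} of training data: F1 = {f1:.3f}")
    if f1 >= 0.95:  # saturation criterion used in the study
        break
```
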
8

Muniraj, Inbarasan, Changliang Guo, Ra'ed Malallah, Harsha Vardhan R. Maraka, James P. Ryle, and John T. Sheridan. "Subpixel based defocused points removal in photon-limited volumetric dataset". Optics Communications 387 (March 2017): 196–201. http://dx.doi.org/10.1016/j.optcom.2016.11.047.

9

Shin, Changho, Seungeun Rho, Hyoseop Lee, and Wonjong Rhee. "Data Requirements for Applying Machine Learning to Energy Disaggregation". Energies 12, no. 9 (5.05.2019): 1696. http://dx.doi.org/10.3390/en12091696.

Abstract:
Energy disaggregation, or nonintrusive load monitoring (NILM), is a technology for separating a household's aggregate electricity consumption into appliance-level information. Although this technology was developed in 1992, its practical usage and mass deployment have been rather limited, possibly because the commonly used datasets are not adequate for NILM research. In this study, we report the findings from a newly collected dataset that contains 10 Hz sampling data for 58 houses. The dataset contains not only the aggregate measurements but also individual appliance measurements for three types of appliances. By applying three classification algorithms (vanilla DNN (Deep Neural Network), ML (Machine Learning) with feature engineering, and CNN (Convolutional Neural Network) with hyper-parameter tuning) and a recent regression algorithm (Subtask Gated Network) to the new dataset, we show that NILM performance can be significantly limited when the data sampling rate is too low or when the number of distinct houses in the dataset is too small. The well-known NILM datasets that are popular in the research community do not meet these requirements. Our results indicate that higher-quality datasets should be used to expedite the progress of NILM research.
10

Althnian, Alhanoof, Duaa AlSaeed, Heyam Al-Baity, Amani Samha, Alanoud Bin Dris, Najla Alzakari, Afnan Abou Elwafa, and Heba Kurdi. "Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain". Applied Sciences 11, no. 2 (15.01.2021): 796. http://dx.doi.org/10.3390/app11020796.

Abstract:
Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six widely used models in the medical field, including support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a machine learning model that is robust to a limited dataset does not necessarily provide the best performance compared to other models.
11

Meier, Deborah, and Wolfgang Tschacher. "Beyond Dyadic Coupling: The Method of Multivariate Surrogate Synchrony (mv-SUSY)". Entropy 23, no. 11 (22.10.2021): 1385. http://dx.doi.org/10.3390/e23111385.

Abstract:
Measuring interpersonal synchrony is a promising approach to assess the complexity of social interaction, which, however, has mostly been limited to dyads. In this study, we introduce multivariate Surrogate Synchrony (mv-SUSY) to extend the current set of computational methods. Methods: mv-SUSY was applied to eight datasets consisting of 10 time series each, all with n = 9600 observations. Datasets 1 to 5 consist of simulated time series with the following characteristics: white noise (dataset 1), non-stationarity with linear time trends (dataset 2), autocorrelation (dataset 3), oscillation (dataset 4), and multivariate correlation (dataset 5). Datasets 6 to 8 comprise empirical multivariate movement data of two individuals (datasets 6 and 7) and between members of a group discussion (dataset 8). Results: As hypothesized, findings of mv-SUSY revealed absence of synchrony in datasets 1 to 4 and presence of synchrony in dataset 5. In the empirical datasets, mv-SUSY indicated significant movement synchrony. These results were predominantly replicated by two well-established dyadic synchrony approaches, Surrogate Synchrony (SUSY) and Surrogate Concordance (SUCO). Conclusions: The study applied and evaluated a novel synchrony approach, mv-SUSY. We demonstrated the feasibility and validity of estimating multivariate nonverbal synchrony within and between individuals by mv-SUSY.
12

Dinh, Thi Lan Anh, and Filipe Aires. "Nested leave-two-out cross-validation for the optimal crop yield model selection". Geoscientific Model Development 15, no. 9 (5.05.2022): 3519–35. http://dx.doi.org/10.5194/gmd-15-3519-2022.

Abstract:
The use of statistical models to study the impact of weather on crop yield has steadily increased. Unfortunately, this type of application is characterized by datasets with a very limited number of samples (typically one sample per year). In general, statistical inference uses three datasets: the training dataset to optimize the model parameters, the validation dataset to select the best model, and the testing dataset to evaluate the model's generalization ability. Splitting the overall database into three datasets is often impossible in crop yield modelling due to the limited number of samples. The leave-one-out cross-validation method, or simply leave one out (LOO), is often used to assess model performance or to select among competing models when the sample size is small. However, the model choice is typically made using only the testing dataset, which can be misleading by favouring unnecessarily complex models. The nested cross-validation approach was introduced in machine learning to avoid this problem by truly utilizing three datasets even with limited databases. In this study, we propose one particular implementation of nested cross-validation, called the nested leave-two-out cross-validation method or simply leave two out (LTO), to choose the best model with an optimal model selection (using the validation dataset) and estimate the true model quality (using the testing dataset). Two applications are considered: robusta coffee in Cu M'gar (Dak Lak, Vietnam) and grain maize over 96 French departments. In both cases, LOO is misleading by choosing models that are too complex; LTO indicates that simpler models actually perform better when a reliable generalization test is considered. The simple models obtained using the LTO approach have improved yield anomaly forecasting skills in both study crops. This LTO approach can also be used in seasonal forecasting applications. We suggest that the LTO method should become a standard procedure for statistical crop modelling.
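
A minimal sketch of the leave-two-out idea: every ordered pair of samples supplies one validation point (for model selection) and one test point (for the generalization estimate), with training on the remaining n - 2 samples. `fit` and `predict` are hypothetical callables wrapping whatever model family is being compared; this is an illustration, not the paper's code.

```python
import itertools
import numpy as np

def leave_two_out(X, y, candidates, fit, predict):
    """candidates: model configurations; fit(m, X, y) -> model; predict(model, x) -> float.
    Complexity is O(n^2 * len(candidates)), acceptable for small crop-yield samples."""
    n, test_errors = len(y), []
    for i, j in itertools.permutations(range(n), 2):  # i: test sample, j: validation sample
        train = [k for k in range(n) if k not in (i, j)]
        best = min(candidates,                        # model selection on the validation point
                   key=lambda m: abs(predict(fit(m, X[train], y[train]), X[j]) - y[j]))
        model = fit(best, X[train], y[train])
        test_errors.append(abs(predict(model, X[i]) - y[i]))
    return np.mean(test_errors)                       # honest generalization estimate
```
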
13

de Rouw, Nikki, Sabine Visser, Stijn L. W. Koolen, Joachim G. J. V. Aerts, Michel M. van den Heuvel, Hieronymus J. Derijks, David M. Burger, and Rob ter Heine. "A limited sampling schedule to estimate individual pharmacokinetics of pemetrexed in patients with varying renal functions". Cancer Chemotherapy and Pharmacology 85, no. 1 (18.12.2019): 231–35. http://dx.doi.org/10.1007/s00280-019-04006-x.

Abstract:
Purpose: Pemetrexed is a widely used cytostatic agent with an established exposure–response relationship. Although dosing is based on body surface area (BSA), large interindividual variability in pemetrexed plasma concentrations is observed. Therapeutic drug monitoring (TDM) can be a feasible strategy to reduce variability in specific cases, leading to potentially optimized pemetrexed treatment. The aim of this study was to develop a limited sampling schedule (LSS) for the assessment of pemetrexed pharmacokinetics. Methods: Based on two real-life datasets, several limited sampling designs were evaluated for predicting clearance, using NONMEM, based on the mean prediction error (MPE %) and normalized root mean squared error (NRMSE %). The predefined criteria for an acceptable LSS were a maximum of four sampling time points within 8 h, with an MPE and NRMSE ≤ 20%. Results: Only four samples within a convenient 8 h window were required for accurate and precise prediction of clearance (MPE and NRMSE of 3.6% and 5.7% for dataset 1 and of 15.5% and 16.5% for dataset 2). A single sample at t = 24 h also met the criteria, with an MPE and NRMSE of 5.8% and 8.7% for dataset 1 and of 11.5% and 16.4% for dataset 2. Bias increased when patients had lower creatinine clearance. Conclusions: We presented two limited sampling designs for the estimation of pemetrexed pharmacokinetics. Either one can be used based on preference and feasibility.
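
The two error metrics used to judge the sampling designs can be written down directly; the sketch below uses a generic mean-normalized NRMSE, and the paper's exact normalization may differ.

```python
import numpy as np

def mpe(pred: np.ndarray, ref: np.ndarray) -> float:
    """Mean prediction error, in percent."""
    return 100.0 * np.mean((pred - ref) / ref)

def nrmse(pred: np.ndarray, ref: np.ndarray) -> float:
    """Root mean squared error normalized to the mean reference value, in percent."""
    return 100.0 * np.sqrt(np.mean((pred - ref) ** 2)) / np.mean(ref)
```
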
14

Abdulraheem, Abdulkabir, Jamiu T. Suleiman, and Im Y. Jung. "Generative Adversarial Network Models for Augmenting Digit and Character Datasets Embedded in Standard Markings on Ship Bodies". Electronics 12, no. 17 (30.08.2023): 3668. http://dx.doi.org/10.3390/electronics12173668.

Abstract:
Accurate recognition of characters imprinted on ship bodies is essential for ensuring operational efficiency, safety, and security in the maritime industry. However, the limited availability of datasets of specialized digits and characters poses a challenge. To overcome this challenge, we propose a generative adversarial network (GAN) model for augmenting the limited dataset of special digits and characters in ship markings. We evaluated the performance of various GAN models, and the Wasserstein GAN with Gradient Penalty (WGAN-GP) and Wasserstein GAN with divergence (WGANDIV) models demonstrated exceptional performance in generating high-quality synthetic images that closely resemble the original imprinted characters required for augmenting the limited datasets. The Fréchet inception distance evaluation metric further validated the outstanding performance of the WGAN-GP and WGANDIV models, establishing them as optimal choices for dataset augmentation to enhance the accuracy and reliability of recognition systems.
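
For reference, the defining ingredient of WGAN-GP is its gradient penalty on interpolates between real and fake images. The sketch below is the standard formulation (Gulrajani et al.), not code from this paper; `critic` is any image-to-scalar network.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp: float = 10.0) -> torch.Tensor:
    """WGAN-GP term: penalize critic gradient norms away from 1 on interpolates."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores, interp,
                                 grad_outputs=torch.ones_like(scores),
                                 create_graph=True)
    return lambda_gp * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```
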
15

Xu, Xinkai, Hailan Zhang, Yan Ma, Kang Liu, Hong Bao, and Xu Qian. "TranSDet: Toward Effective Transfer Learning for Small-Object Detection". Remote Sensing 15, no. 14 (12.07.2023): 3525. http://dx.doi.org/10.3390/rs15143525.

Abstract:
Small-object detection is a challenging task in computer vision due to the limited training samples and low-quality images. Transfer learning, which transfers the knowledge learned from a large dataset to a small dataset, is a popular method for improving performance on limited data. However, we empirically find that due to the dataset discrepancy, directly transferring the model trained on a general object dataset to small-object datasets obtains inferior performance. In this paper, we propose TranSDet, a novel approach for effective transfer learning for small-object detection. Our method adapts a model trained on a general dataset to a small-object-friendly model by augmenting the training images with diverse smaller resolutions. A dynamic resolution adaptation scheme is employed to ensure consistent performance on various sizes of objects using meta-learning. Additionally, the proposed method introduces two network components, an FPN with shifted feature aggregation and an anchor relation module, which are compatible with transfer learning and effectively improve small-object detection performance. Extensive experiments on the TT100K, BUUISE-MO-Lite, and COCO datasets demonstrate that TranSDet achieves significant improvements compared to existing methods. For example, on the TT100K dataset, TranSDet outperforms the state-of-the-art method by 8.0% in terms of the mean average precision (mAP) for small-object detection. On the BUUISE-MO-Lite dataset, TranSDet improves the detection accuracy of RetinaNet and YOLOv3 by 32.2% and 12.8%, respectively.
16

Tran, Thi-Dung, Junghee Kim, Ngoc-Huynh Ho, Hyung-Jeong Yang, Sudarshan Pant, Soo-Hyung Kim, and Guee-Sang Lee. "Stress Analysis with Dimensions of Valence and Arousal in the Wild". Applied Sciences 11, no. 11 (3.06.2021): 5194. http://dx.doi.org/10.3390/app11115194.

Abstract:
In the field of stress recognition, the majority of research has conducted experiments on datasets collected from controlled environments with limited stressors. As these datasets cannot represent real-world scenarios, stress identification and analysis are difficult. There is a dire need for reliable, large datasets that are specifically acquired for stress emotion with varying degrees of expression for this task. In this paper, we introduced a dataset for Stress Analysis with Dimensions of Valence and Arousal of Korean Movie in Wild (SADVAW), which includes video clips with diversity in facial expressions from different Korean movies. The SADVAW dataset contains continuous dimensions of valence and arousal. We presented a detailed statistical analysis of the dataset. We also analyzed the correlation between stress and continuous dimensions. Moreover, using the SADVAW dataset, we trained a deep learning-based model for stress recognition.
17

Kadam, Kalyani Dhananjay, Swati Ahirrao, and Ketan Kotecha. "Multiple Image Splicing Dataset (MISD): A Dataset for Multiple Splicing". Data 6, no. 10 (28.09.2021): 102. http://dx.doi.org/10.3390/data6100102.

Abstract:
Image forgery has grown in popularity due to easy access to abundant image editing software. These forged images are so convincing that they are impossible to detect with the naked eye. Such images are used to spread misleading information in society with the help of various social media platforms such as Facebook, Twitter, etc. Hence, there is an urgent need for effective forgery detection techniques. In order to validate the credibility of these techniques, publicly available and more credible standard datasets are required. A few datasets are available for image splicing, such as Columbia, Carvalho, and CASIA V1.0, as well as a few custom datasets such as Modified CASIA and AbhAS, all of which are employed for the detection of image splicing forgeries. A study of the existing datasets reveals that they are limited to single image splicing and do not contain multiple spliced images. This research work presents a Multiple Image Splicing Dataset, which consists of a total of 300 multiple spliced images. We are the pioneers in developing the first publicly available Multiple Image Splicing Dataset containing high-quality, annotated, realistic multiple spliced images. In addition, we provide ground truth masks for these images. This dataset will open up opportunities for researchers working in this significant area.
18

Shi, Zhengxiang, Qiang Zhang, and Aldo Lipani. "StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (28.06.2022): 11321–29. http://dx.doi.org/10.1609/aaai.v36i10.21383.

Abstract:
Inferring spatial relations in natural language is a crucial ability an intelligent system should possess. The bAbI dataset tries to capture tasks relevant to this domain (task 17 and 19). However, these tasks have several limitations. Most importantly, they are limited to fixed expressions, they are limited in the number of reasoning steps required to solve them, and they fail to test the robustness of models to input that contains irrelevant or redundant information. In this paper, we present a new Question-Answering dataset called StepGame for robust multi-step spatial reasoning in texts. Our experiments demonstrate that state-of-the-art models on the bAbI dataset struggle on the StepGame dataset. Moreover, we propose a Tensor-Product based Memory-Augmented Neural Network (TP-MANN) specialized for spatial reasoning tasks. Experimental results on both datasets show that our model outperforms all the baselines with superior generalization and robustness performance.
19

Wang, Hao, Suxing Lyu, and Yaxin Ren. "Paddy Rice Imagery Dataset for Panicle Segmentation". Agronomy 11, no. 8 (31.07.2021): 1542. http://dx.doi.org/10.3390/agronomy11081542.

Abstract:
Accurate panicle identification is a key step in rice-field phenotyping. Deep learning methods based on high-spatial-resolution images provide a high-throughput and accurate solution of panicle segmentation. Panicle segmentation tasks require costly annotations to train an accurate and robust deep learning model. However, few public datasets are available for rice-panicle phenotyping. We present a semi-supervised deep learning model training process, which greatly assists the annotation and refinement of training datasets. The model learns the panicle features with limited annotations and localizes more positive samples in the datasets, without further interaction. After the dataset refinement, the number of annotations increased by 40.6%. In addition, we trained and tested modern deep learning models to show how the dataset is beneficial to both detection and segmentation tasks. Results of our comparison experiments can inspire others in dataset preparation and model selection.
20

Chang, Yingxiu, Yongqiang Cheng, John Murray, Shi Huang, and Guangyi Shi. "The HDIN Dataset: A Real-World Indoor UAV Dataset with Multi-Task Labels for Visual-Based Navigation". Drones 6, no. 8 (11.08.2022): 202. http://dx.doi.org/10.3390/drones6080202.

Abstract:
Supervised learning for Unmanned Aerial Vehicle (UAV) visual-based navigation raises the need for reliable datasets with multi-task labels (e.g., classification and regression labels). However, current public datasets have limitations: (a) outdoor datasets have limited generalization capability when used to train indoor navigation models; (b) the ranges of multi-task labels, especially for regression tasks, are in different units, which requires additional transformation. In this paper, we present the Hull Drone Indoor Navigation (HDIN) dataset to improve the generalization capability for indoor visual-based navigation. Data were collected from the onboard sensors of a UAV. A scaling-factor labeling method with three label types has been proposed to simultaneously overcome data jitter during collection and the non-identical units of regression labels. An open-source Convolutional Neural Network (i.e., DroNet) was employed as a baseline algorithm and retrained on the proposed HDIN dataset, then compared with DroNet's pretrained results on its original dataset, since our data format and structure are similar to those of the DroNet dataset. The results show that the labels in our dataset are reliable and consistent with the image samples.
21

Sahiner, Berkman, Heang-Ping Chan, and Lubomir Hadjiiski. "Classifier performance prediction for computer-aided diagnosis using a limited dataset". Medical Physics 35, no. 4 (24.03.2008): 1559–70. http://dx.doi.org/10.1118/1.2868757.

22

Yu, Zhou, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, and Dacheng Tao. "ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering". Proceedings of the AAAI Conference on Artificial Intelligence 33 (17.07.2019): 9127–34. http://dx.doi.org/10.1609/aaai.v33i01.33019127.

Abstract:
Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain, where large-scale and fully annotated benchmark datasets exist, VideoQA datasets are small in scale or automatically generated, among other limitations, which restricts their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated and large-scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos.
23

AL-Banna, Alaa Ahmed, and Abeer K. AL-Mashhadany. "Natural Language Processing For Automatic text summarization [Datasets] - Survey". Wasit Journal of Computer and Mathematics Science 1, no. 4 (31.12.2022): 156–70. http://dx.doi.org/10.31185/wjcm.72.

Abstract:
Natural language processing has developed significantly in recent years, which has advanced the text summarization task. It is no longer limited to reducing text size or extracting helpful information from a long document. It has begun to be used in obtaining answers from summaries, measuring the quality of sentiment analysis systems, research and mining techniques, document categorization, and natural language inference, which has increased the importance of scientific research into producing good summaries. This paper reviews the datasets most used in text summarization across different languages and types, together with the most effective methods for each dataset. The results are reported using text summarization metrics. The review indicates that pre-trained models achieved the highest results on the summarization measures in most of the reviewed works. English datasets make up about 75% of those available to researchers, owing to the extensive use of the English language. Other languages, such as Arabic and Hindi, suffer from a shortage of dataset sources, which has limited progress in their academic study.
24

Pasunuru, Ramakanth, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, and Jianfeng Gao. "Data Augmentation for Abstractive Query-Focused Multi-Document Summarization". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 15 (18.05.2021): 13666–74. http://dx.doi.org/10.1609/aaai.v35i15.17611.

Abstract:
The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but simulated queries, while QMDSIR has real queries but simulated summaries. To cover both the real-summary and real-query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as on human evaluations along multiple attributes.
25

Duong, Huu-Thanh, Tram-Anh Nguyen-Thi, and Vinh Truong Hoang. "Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks". Complexity 2022 (30.06.2022): 1–14. http://dx.doi.org/10.1155/2022/3188449.

Abstract:
An annotated dataset is an essential requirement for developing an artificial intelligence (AI) system effectively, ensuring the generalization of predictive models and avoiding overfitting. Lack of training data is a major barrier that prevents AI systems from expanding into domains with little or no training data. Building these datasets is a tedious and expensive task that depends on the domain and language, which is an especially big challenge for low-resource languages. In this paper, we experiment with and evaluate various approaches to sentiment analysis that can still achieve high performance under limited training data. We use preprocessing techniques to clean and normalize the data, and we generate new samples from the limited training dataset with text augmentation techniques such as lexicon substitution, sentence shuffling, back translation, syntax-tree transformation, and embedding mixup. Several experiments have been performed for both well-known machine-learning-based classifiers and deep learning models. We compare, analyze, and evaluate the results to indicate the advantages and disadvantages of the techniques for each approach. The experimental results show that the data augmentation techniques enhance the accuracy of the predictive models, which promises that smart systems can be applied widely in several domains with limited training data.
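
Two of the listed augmentation techniques are easy to sketch. The `synonyms` dictionary below is a hypothetical stand-in for a real Vietnamese lexicon; a production system would also guard sentiment-bearing words.

```python
import random

def shuffle_sentences(text: str) -> str:
    """Sentence shuffling: reorder sentences while keeping their content."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(sentences) + "."

def substitute_lexicon(text: str, synonyms: dict) -> str:
    """Lexicon substitution: swap words for random synonyms where available."""
    return " ".join(random.choice(synonyms.get(w, [w])) for w in text.split())
```
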
26

Sun, Cunwei, Yuxin Yang, Chang Wen, Kai Xie, and Fangqing Wen. "Voiceprint Identification for Limited Dataset Using the Deep Migration Hybrid Model Based on Transfer Learning". Sensors 18, no. 7 (23.07.2018): 2399. http://dx.doi.org/10.3390/s18072399.

Abstract:
The convolutional neural network (CNN) has made great strides in the area of voiceprint recognition, but it needs a huge number of data samples to train a deep neural network. In practice, it is difficult to obtain a large number of training samples, and a limited dataset prevents the network from reaching a good convergence state. To solve this problem, a new method using a deep migration hybrid model is put forward, which makes voiceprint recognition easier to realize for small samples. First, transfer learning is used to transfer a network trained on a large voiceprint dataset to our limited voiceprint dataset for further training, and the fully connected layers of the pre-trained model are replaced by restricted Boltzmann machine layers. Second, data augmentation is adopted to increase the number of voiceprint samples. Finally, we introduce a fast batch normalization algorithm to improve the speed of network convergence and shorten the training time. Our new voiceprint recognition approach uses the TLCNN-RBM (convolutional neural network mixed with a restricted Boltzmann machine, based on transfer learning) model, which achieves an average accuracy of over 97%, higher than that of either CNN or the TL-CNN network (convolutional neural network based on transfer learning). Thus, an effective method for voiceprint recognition from small samples has been provided.
27

Hussain, Altaf, and Muhammad Aleem. "GoCJ: Google Cloud Jobs Dataset for Distributed and Cloud Computing Infrastructures". Data 3, no. 4 (28.09.2018): 38. http://dx.doi.org/10.3390/data3040038.

Abstract:
Developers of resource-allocation and scheduling algorithms share test datasets (i.e., benchmarks) to enable others to compare the performance of newly developed algorithms. However, real cloud datasets are mostly hard to acquire due to user-data confidentiality issues and the policies maintained by Cloud Service Providers (CSPs). The availability of large-scale test datasets depicting the realistic high-performance computing requirements of cloud users is very limited. A publicly available, realistic cloud dataset would therefore significantly encourage other researchers to compare and benchmark their applications using an open-source benchmark. To meet these objectives, the contemporary state of the art has been scrutinized to explore real workload behavior in Google cluster traces. Covering smaller- to moderate-size cloud computing infrastructures, the dataset generation process is demonstrated using the Monte Carlo simulation method to produce the Google Cloud Jobs (GoCJ) dataset based on an analysis of Google cluster traces. With this article, the dataset is made publicly available to enable other researchers in the field to investigate and benchmark their scheduling and resource-allocation schemes for the cloud. The GoCJ dataset is archived and available on the Mendeley Data repository.
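
The Monte Carlo generation step can be sketched as sampling job sizes from an empirical distribution. The size bins, probabilities, and MI ranges below are placeholders for illustration only, not the actual GoCJ parameters derived from the Google traces.

```python
import numpy as np

rng = np.random.default_rng(42)
size_bins = ["small", "medium", "large", "extra-large", "huge"]
probs = [0.20, 0.40, 0.30, 0.06, 0.04]              # placeholder empirical frequencies
mi_ranges = {"small": (15_000, 55_000), "medium": (59_000, 99_000),
             "large": (101_000, 135_000), "extra-large": (150_000, 337_500),
             "huge": (525_000, 900_000)}            # job lengths in MI (assumed bins)

def generate_jobs(n: int) -> list:
    """Draw n job sizes following the assumed trace-derived distribution."""
    kinds = rng.choice(size_bins, size=n, p=probs)
    return [int(rng.uniform(*mi_ranges[k])) for k in kinds]
```
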
28

Lee, JoonHo, Joonseok Lee, Sooah Cho, JiEun Song, Minyoung Lee, Sung Ho Kim, Jin Young Lee, et al. "Development of Decision Support Software for Deep Learning-Based Automated Retinal Disease Screening Using Relatively Limited Fundus Photograph Data". Electronics 10, no. 2 (13.01.2021): 163. http://dx.doi.org/10.3390/electronics10020163.

Abstract:
Purpose—This study was conducted to develop an automated detection algorithm for screening fundus abnormalities, including age-related macular degeneration (AMD), diabetic retinopathy (DR), epiretinal membrane (ERM), retinal vascular occlusion (RVO), and suspected glaucoma, among health screening program participants. Methods—The development dataset consisted of 43,221 retinal fundus photographs (from 25,564 participants, mean age 53.38 ± 10.97 years, female 39.0%) from a health screening program and patients of the Kangbuk Samsung Hospital Ophthalmology Department from 2006 to 2017. We evaluated our screening algorithm on independent validation datasets. Five separate one-versus-rest (OVR) classification algorithms based on deep convolutional neural networks (CNNs) were trained to detect AMD, ERM, DR, RVO, and suspected glaucoma. The ground truth for both development and validation datasets was graded at least two times by three ophthalmologists. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were calculated for each disease, as well as their macro-averages. Results—For the internal validation dataset, the average sensitivity was 0.9098 (95% confidence interval (CI), 0.8660–0.9536), the average specificity was 0.9079 (95% CI, 0.8576–0.9582), and the overall accuracy was 0.9092 (95% CI, 0.8769–0.9415). For the external validation dataset consisting of 1698 images, the average of the AUCs was 0.9025 (95% CI, 0.8671–0.9379). Conclusions—Our algorithm had high sensitivity and specificity for detecting major fundus abnormalities. Our study will facilitate expansion of the applications of deep learning-based computer-aided diagnostic decision support tools in actual clinical settings. Further research is needed to improve the generalization of this algorithm.
29

Bhattacharya, Ayan. "Posteriors in Limited Time". AppliedMath 2, no. 4 (12.12.2022): 700–710. http://dx.doi.org/10.3390/appliedmath2040041.

Abstract:
This paper obtains a measure-theoretic restriction that must be satisfied by a prior probability measure for posteriors to be computed in limited time. Specifically, it is shown that the prior must be factorizable. Factorizability is a set of independence conditions for events in the sample space that allows agents to calculate posteriors using only a subset of the dataset. The result has important implications for models in mathematical economics and finance that rely on a common prior. If one introduces the limited time restriction to Aumann’s famous Agreeing to Disagree setup, one sees that checking for factorizability requires agents to have access to every event in the measure space, thus severely limiting the scope of the agreement result.
30

Chen, Tingkai, Ning Wang, Rongfeng Wang, Hong Zhao, and Guichen Zhang. "One-stage CNN detector-based benthonic organisms detection with limited training dataset". Neural Networks 144 (December 2021): 247–59. http://dx.doi.org/10.1016/j.neunet.2021.08.014.

31

Ahmed, Nehal K., Elsayed E. Hemayed, and Magda B. Fayek. "Hybrid Siamese Network for Unconstrained Face Verification and Clustering under Limited Resources". Big Data and Cognitive Computing 4, no. 3 (6.08.2020): 19. http://dx.doi.org/10.3390/bdcc4030019.

Abstract:
In this paper, we propose an unconstrained face verification approach based on a Hybrid Siamese architecture under limited resources. The general face verification trend suggests that larger training datasets and/or more complex architectures lead to higher accuracy. The proposed approach achieves high accuracy while using a small dataset and a simple architecture by directly learning face similarity/dissimilarity from raw face pixels, which is critical for various applications. The proposed architecture has two branches; the first parts of these branches are trained independently, while the remaining parts share their parameters. A multi-batch algorithm is utilized for training, and the training process takes a few hours on a single GPU. The proposed approach achieves near-human accuracy (98.9%) on the Labeled Faces in the Wild (LFW) benchmark, which is competitive with other techniques presented in the literature, and reaches 99.1% on the Arabian faces dataset. Moreover, features learned by the proposed architecture are used to build a face clustering system based on an updated version of Density-Based Spatial Clustering of Applications with Noise (DBSCAN). To handle the cluster-quality challenge, a novel post-clustering optimization procedure is proposed. It outperforms popular clustering approaches, such as K-Means and spectral clustering, by 0.098 and up to 0.344 in F1-measure.
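
A sketch of the clustering stage, assuming `embeddings` holds the face feature vectors produced by the Siamese network; `eps` and `min_samples` are illustrative, and the paper's post-clustering optimization is not reproduced here.

```python
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

emb = normalize(embeddings)  # L2-normalize so Euclidean distance tracks cosine similarity
cluster_ids = DBSCAN(eps=0.5, min_samples=3).fit_predict(emb)  # -1 marks noise points
```
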
32

Zuo, Mei, and Yang Zhang. "Dataset-aware multi-task learning approaches for biomedical named entity recognition". Bioinformatics 36, no. 15 (16.05.2020): 4331–38. http://dx.doi.org/10.1093/bioinformatics/btaa515.

Abstract:
Motivation: Named entity recognition is a critical and fundamental task for biomedical text mining. Recently, researchers have focused on exploiting deep neural networks for biomedical named entity recognition (Bio-NER). The performance of deep neural networks on a single dataset mostly depends on data quality and quantity, while high-quality data tends to be limited in size. To alleviate task-specific data limitations, some studies have explored multi-task learning (MTL) for Bio-NER and achieved state-of-the-art performance. However, these MTL methods did not make full use of information from the various Bio-NER datasets, and the performance of the state-of-the-art MTL method was significantly limited by the number of training datasets. Results: We propose two dataset-aware MTL approaches for Bio-NER which jointly train models for numerous Bio-NER datasets, so that each model can discriminatively exploit information from all related training datasets. Both of our approaches achieve substantially better performance than the state-of-the-art MTL method on 14 out of 15 Bio-NER datasets. Furthermore, we implemented our approaches by incorporating Bio-NER and biomedical part-of-speech (POS) tagging datasets. The results verify that Bio-NER and POS can significantly enhance one another. Availability and implementation: Our source code is available at https://github.com/zmmzGitHub/MTL-BC-LBC-BioNER and all datasets are publicly available at https://github.com/cambridgeltl/MTL-Bioinformatics-2016. Supplementary information: Supplementary data are available at Bioinformatics online.
33

Kim, Minjeong, Yujung Gil, Yuyeon Kim, and Jihie Kim. "Deep-Learning-Based Scalp Image Analysis Using Limited Data". Electronics 12, no. 6 (14.03.2023): 1380. http://dx.doi.org/10.3390/electronics12061380.

Abstract:
The World Health Organization and the Korea National Health Insurance assert that the number of alopecia patients is increasing every year, and approximately 70 percent of adults suffer from scalp problems. Although alopecia is a genetic problem, it is difficult to diagnose at an early stage. Although deep-learning-based approaches have been effective for medical image analyses, it is challenging to generate deep learning models for alopecia detection and analysis because creating an alopecia image dataset is difficult. In this paper, we present an approach for generating a model specialized for alopecia analysis that achieves high accuracy by applying data preprocessing, data augmentation, and an ensemble of deep learning models that have been effective for medical image analyses. We use an alopecia image dataset containing 526 good, 13,156 mild, 3742 moderate, and 825 severe alopecia images. The dataset was further augmented by applying normalization, geometry-based augmentation (rotation, vertical flip, horizontal flip, crop, and affine transformation), and PCA augmentation. We compare the performance of single deep learning models using ResNet, ResNeXt, DenseNet, and XceptionNet, and of ensembles of these models. The best result was achieved when DenseNet, XceptionNet, and ResNet were combined, with an accuracy of 95.75% and an F1 score of 87.05%.
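
The ensemble step amounts to soft voting over the member networks' class probabilities; the sketch below assumes `models` is a list of trained classifiers (e.g., the DenseNet, XceptionNet, and ResNet variants) already in evaluation mode.

```python
import torch

def ensemble_predict(models, images):
    """Soft voting: average the member networks' softmax outputs, then take argmax."""
    with torch.no_grad():
        probs = [torch.softmax(m(images), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)

# models = [densenet, xception, resnet]  # trained classifiers (assumed to exist)
```
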
34

Demuynck, Thomas, and Christian Seel. "Revealed Preference with Limited Consideration". American Economic Journal: Microeconomics 10, no. 1 (1.02.2018): 102–31. http://dx.doi.org/10.1257/mic.20150343.

Abstract:
We derive revealed preference tests for models where individuals use consideration sets to simplify their consumption problem. Our basic test provides necessary and sufficient conditions for consistency of observed choices with the existence of consideration set restrictions. The same conditions can also be derived from a model in which the consideration set formation is endogenous and based on subjective prices. By imposing restrictions on these subjective prices, we obtain additional refined revealed preference tests. We illustrate and compare the performance of our tests by means of a dataset on household consumption choices. (JEL D11, D12, M31)
35

Perera, Asanka G., Yee Wei Law, and Javaan Chahl. "Drone-Action: An Outdoor Recorded Drone Video Dataset for Action Recognition". Drones 3, no. 4 (28.11.2019): 82. http://dx.doi.org/10.3390/drones3040082.

Abstract:
Aerial human action recognition is an emerging topic in drone applications. Commercial drone platforms capable of detecting basic human actions such as hand gestures have been developed. However, a limited number of aerial video datasets are available to support increased research into aerial human action analysis. Most of the datasets are confined to indoor scenes or object tracking and many outdoor datasets do not have sufficient human body details to apply state-of-the-art machine learning techniques. To fill this gap and enable research in wider application areas, we present an action recognition dataset recorded in an outdoor setting. A free flying drone was used to record 13 dynamic human actions. The dataset contains 240 high-definition video clips consisting of 66,919 frames. All of the videos were recorded from low-altitude and at low speed to capture the maximum human pose details with relatively high resolution. This dataset should be useful to many research areas, including action recognition, surveillance, situational awareness, and gait analysis. To test the dataset, we evaluated the dataset with a pose-based convolutional neural network (P-CNN) and high-level pose feature (HLPF) descriptors. The overall baseline action recognition accuracy calculated using P-CNN was 75.92%.
36

Moon, Myungjin, and Kenta Nakai. "Integrative analysis of gene expression and DNA methylation using unsupervised feature extraction for detecting candidate cancer biomarkers". Journal of Bioinformatics and Computational Biology 16, no. 02 (April 2018): 1850006. http://dx.doi.org/10.1142/s0219720018500063.

Abstract:
Currently, cancer biomarker discovery is one of the important research topics worldwide. In particular, detecting significant genes related to cancer is an important task for early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extractions to identify candidate biomarkers of cancer using renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box–Cox transformation and integrated into a one-dimensional dataset that retains the major characteristics of the original datasets by unsupervised feature extraction methods, and differentially expressed genes are selected from the integrated dataset. Use of the integrated dataset demonstrated improved performance as compared with conventional approaches that utilize gene expression or DNA methylation datasets alone. Validation based on the literature showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be acquired from the proposed analysis method. Furthermore, we expect that the proposed method can be expanded for applications involving various types of multi-omics datasets.
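
The normalization-and-integration step might look like the following sketch, with scikit-learn's PCA standing in for the paper's unsupervised feature extraction and a small shift assumed to keep the Box–Cox inputs positive; it is an illustration under those assumptions.

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.decomposition import PCA

def integrate(expr: np.ndarray, meth: np.ndarray) -> np.ndarray:
    """expr, meth: (genes, samples) matrices; returns one integrated value per gene."""
    bc = lambda row: boxcox(row - row.min() + 1.0)[0]  # shift keeps Box-Cox inputs positive
    expr_n = np.apply_along_axis(bc, 1, expr)
    meth_n = np.apply_along_axis(bc, 1, meth)
    combined = np.hstack([expr_n, meth_n])             # genes x (2 * samples)
    return PCA(n_components=1).fit_transform(combined).ravel()
```
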
37

Batanović, Vuk, Miloš Cvetanović, and Boško Nikolić. "A versatile framework for resource-limited sentiment articulation, annotation, and analysis of short texts". PLOS ONE 15, no. 11 (12.11.2020): e0242050. http://dx.doi.org/10.1371/journal.pone.0242050.

Abstract:
Choosing a comprehensive and cost-effective way of articulating and annotating the sentiment of a text is not a trivial task, particularly when dealing with short texts, in which sentiment can be expressed through a wide variety of linguistic and rhetorical phenomena. This problem is especially conspicuous in resource-limited settings and languages, where design options are restricted either in terms of manpower and financial means required to produce appropriate sentiment analysis resources, or in terms of available language tools, or both. In this paper, we present a versatile approach to addressing this issue, based on multiple interpretations of sentiment labels that encode information regarding the polarity, subjectivity, and ambiguity of a text, as well as the presence of sarcasm or a mixture of sentiments. We demonstrate its use on Serbian, a resource-limited language, via the creation of a main sentiment analysis dataset focused on movie comments, and two smaller datasets belonging to the movie and book domains. In addition to measuring the quality of the annotation process, we propose a novel metric to validate its cost-effectiveness. Finally, the practicality of our approach is further validated by training, evaluating, and determining the optimal configurations of several different kinds of machine-learning models on a range of sentiment classification tasks using the produced dataset.
38

Wang, Xiao, Zheng Wang, Toshihiko Yamasaki, and Wenjun Zeng. "Very Important Person Localization in Unconstrained Conditions: A New Benchmark". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (18.05.2021): 2809–16. http://dx.doi.org/10.1609/aaai.v35i4.16386.

Abstract:
This paper presents a new high-quality dataset for Very Important Person Localization (VIPLoc), named Unconstrained-7k. Generally, current datasets (1) are limited in scale and (2) are built under simple and constrained conditions, where the number of disturbing non-VIPs is not large, the scene is relatively simple, and the face of the VIP is always frontal and salient. To tackle these problems, the proposed Unconstrained-7k dataset is distinctive in two respects. First, it contains over 7,000 annotated images, making it the largest VIPLoc dataset under unconstrained conditions to date. Second, the dataset is collected freely from the Internet and covers multiple scenes, with images in unconstrained conditions. VIPs in the new dataset appear in varied settings, e.g., large view variation, varying sizes, occlusion, and complex scenes. Meanwhile, each image contains more persons (> 20), making the dataset more challenging. As a minor contribution, motivated by the observation that VIPs are highly related not only to their neighbors but also to iconic objects, this paper proposes Joint Social Relation and Individual Interaction Graph Neural Networks (JSRII-GNN) for VIPLoc. Experiments show that the JSRII-GNN yields competitive accuracy on the NCAA (National Collegiate Athletic Association), MS (Multi-scene), and Unconstrained-7k datasets. https://github.com/xiaowang1516/VIPLoc.
39

Mostofi, Fatemeh, Vedat Toğan and Hasan Basri Başağa. "Real-estate price prediction with deep neural network and principal component analysis". Organization, Technology and Management in Construction: an International Journal 14, no. 1 (1.01.2022): 2741–59. http://dx.doi.org/10.2478/otmcj-2022-0016.

Abstract:
Despite the wide application of deep neural network (DNN) models, their use for price prediction on small real-estate datasets is limited due to reduced prediction accuracy and the high dimensionality of the data. This study motivates small real-estate agencies to take DNN-driven decisions using their available local datasets. To cope with the high dimensionality of real-estate price datasets and thus enhance the price-prediction accuracy of a DNN model, this paper adopts principal component analysis (PCA). The benefits of PCA in improving the prediction accuracy of a DNN model are threefold: dimensionality reduction, dataset transformation, and localisation of influential price features. The results indicate that, through the PCA-DNN model, the transformed dataset achieves higher accuracy (90%–95%) and better generalisation ability compared with other benchmark price predictors. Spatial attributes and building age proved to have the most impact in determining the overall real-estate price. The application of PCA not only reduces the high dimensionality of the dataset but also enhances the quality of the encoded feature attributes. The model is beneficial in real-estate and construction applications, where the absence of medium and big datasets decreases price-prediction accuracy.
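
A minimal sketch of the PCA-then-DNN pattern the abstract describes, on synthetic data; the component count and network shape are illustrative choices, not the study's configuration.

```python
# Minimal sketch of the PCA-DNN idea on synthetic data: reduce the feature
# space with PCA, then regress prices with a small neural network.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=400, n_features=60, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),                      # dimensionality reduction step
    MLPRegressor(hidden_layer_sizes=(64, 32),  # small DNN regressor
                 max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```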
40

Abo Zidan, Rawan, and George Karraz. "Gaussian Pyramid for Nonlinear Support Vector Machine". Applied Computational Intelligence and Soft Computing 2022 (31.05.2022): 1–9. http://dx.doi.org/10.1155/2022/5255346.

Abstract:
Support vector machine (SVM) is one of the most efficient machine learning tools: it is fast, simple to use, reliable, and provides accurate classification results. Despite its generalization capability, SVM is usually posed as a quadratic programming (QP) problem to find a separating hyperplane in nonlinear cases. This requires a large amount of computation time and memory even for moderately sized datasets. SVM works well for classification tasks with a limited number of samples but does not scale well to large datasets. The idea is to solve this problem with a smoothing technique that produces a new, smaller dataset representing the original one. This paper proposes a fast algorithm, less demanding in time and memory, for the problems posed by nonlinear support vector machines, based on generating a Gaussian pyramid to minimize the size of the dataset. The reduce operation between dataset points and the Gaussian pyramid is reformulated to obtain a smoothed copy of the original dataset. After passing through the Gaussian pyramid, the dataset points lie closer to each other, which reduces the degree of nonlinearity in the dataset, and the result is 1/4 of the size of the original large dataset. The experiments demonstrate that the proposed technique can reduce the complexity of the classical SVM tool while remaining accurate and applicable in real time.
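
One loose reading of the reduction idea, not the paper's exact algorithm, is sketched below: within each class, samples are ordered, blurred with a Gaussian filter so that neighbouring points are averaged, and subsampled to a quarter of the original size before the SVM is trained.

```python
# Loose, illustrative reading of the idea (not the paper's exact algorithm):
# within each class, order the samples, blur them with a Gaussian filter so
# neighbouring points are averaged, and keep every fourth point, yielding a
# smoothed dataset 1/4 the size on which a nonlinear SVM is trained.
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def pyramid_reduce(X, y, sigma=1.0, factor=4):
    Xr, yr = [], []
    for cls in np.unique(y):
        Xc = X[y == cls]
        order = np.argsort(Xc[:, 0])            # crude 1-D ordering of samples
        smoothed = gaussian_filter1d(Xc[order], sigma=sigma, axis=0)
        kept = smoothed[::factor]               # subsample: keep every 4th point
        Xr.append(kept)
        yr.append(np.full(len(kept), cls))
    return np.vstack(Xr), np.concatenate(yr)

X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
X_small, y_small = pyramid_reduce(X, y)
clf = SVC(kernel="rbf").fit(X_small, y_small)   # QP now solved on 1/4 of the data
print(X_small.shape, clf.score(X, y))
```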
41

Vobecký, Antonín, David Hurych, Michal Uřičář, Patrick Pérez and Josef Sivic. "Artificial Dummies for Urban Dataset Augmentation". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (18.05.2021): 2692–700. http://dx.doi.org/10.1609/aaai.v35i3.16373.

Abstract:
Existing datasets for training pedestrian detectors in images suffer from limited appearance and pose variation. The most challenging scenarios are rarely included because they are too difficult to capture for safety reasons, or because they are very unlikely to happen. The strict safety requirements in assisted and autonomous driving applications call for extra-high detection accuracy in these rare situations as well. The ability to generate images of people in arbitrary poses, with arbitrary appearances, embedded in different background scenes with varying illumination and weather conditions, is a crucial component for the development and testing of such applications. The contributions of this paper are threefold. First, we describe an augmentation method for the controlled synthesis of urban scenes containing people, thus producing rare or never-seen situations. This is achieved with a data generator (called DummyNet) with disentangled control of the pose, the appearance, and the target background scene. Second, the proposed generator relies on a novel network architecture and an associated loss that take into account the segmentation of the foreground person and its composition into the background scene. Finally, we demonstrate that the data generated by our DummyNet improve the performance of several existing person detectors across various datasets as well as in challenging situations, such as night-time conditions, where only a limited amount of training data is available. In the setup with only day-time data available, we improve the night-time detector by 17% in log-average miss rate over the detector trained with day-time data only.
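
The composition step the abstract mentions, blending a foreground person into a background scene using its segmentation mask, reduces to simple alpha blending; the sketch below uses random placeholder arrays rather than DummyNet's learned outputs.

```python
# Sketch of the composition step described above: a (here randomly generated)
# person crop is blended into a background scene using its soft segmentation
# mask. DummyNet learns this end to end; the arrays below are placeholders,
# not the authors' generator.
import numpy as np

rng = np.random.default_rng(0)
background = rng.uniform(0, 1, size=(256, 256, 3))   # stand-in urban scene
person = rng.uniform(0, 1, size=(64, 32, 3))         # stand-in generated person
mask = np.zeros((64, 32, 1))                         # stand-in soft mask
mask[4:-4, 4:-4] = 1.0

def compose(scene, crop, alpha, top, left):
    """Alpha-blend a foreground crop into the scene at (top, left)."""
    out = scene.copy()
    h, w = crop.shape[:2]
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * crop + (1 - alpha) * region
    return out

augmented = compose(background, person, mask, top=150, left=100)
print(augmented.shape)
```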
42

Deng, Fei, Shengliang Pu, Xuehong Chen, Yusheng Shi, Ting Yuan and Shengyan Pu. "Hyperspectral Image Classification with Capsule Network Using Limited Training Samples". Sensors 18, no. 9 (18.09.2018): 3153. http://dx.doi.org/10.3390/s18093153.

Abstract:
Deep learning techniques have boosted the performance of hyperspectral image (HSI) classification. In particular, convolutional neural networks (CNNs) have shown performance superior to that of conventional machine learning algorithms. Recently, a novel type of neural network called the capsule network (CapsNet) was presented to improve on the most advanced CNNs. In this paper, we present a modified two-layer CapsNet with limited training samples for HSI classification, inspired by the comparability and simplicity of shallower deep learning models. The presented CapsNet is trained using two real HSI datasets, the PaviaU (PU) and SalinasA datasets, representing complex and simple datasets, respectively, which are used to investigate the robustness and representation of each model or classifier. In addition, a comparable paradigm of network architecture design is proposed for the comparison of CNN and CapsNet. Experiments demonstrate that CapsNet shows better accuracy and convergence behavior on the complex data than the state-of-the-art CNN. For CapsNet using the PU dataset, the Kappa coefficient, overall accuracy, and average accuracy are 0.9456, 95.90%, and 96.27%, respectively, compared with the corresponding values yielded by the CNN of 0.9345, 95.11%, and 95.63%. Moreover, we observed that CapsNet assigns much higher confidence to its predicted probabilities. This finding was subsequently analyzed and discussed with probability maps and uncertainty analysis. In relation to the existing literature, CapsNet provides promising results and explicit merits in comparison with CNN and two baseline classifiers, i.e., random forests (RFs) and support vector machines (SVMs).
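
The three metrics quoted above (Kappa coefficient, overall accuracy, and average accuracy) can be reproduced from any prediction vector as follows; the labels here are toy values, not the PaviaU results.

```python
# Computing the quoted evaluation metrics from toy label vectors:
# Kappa, overall accuracy (OA), and average (per-class) accuracy (AA).
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, recall_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 0, 1, 2, 2, 2, 2, 1])

kappa = cohen_kappa_score(y_true, y_pred)
oa = accuracy_score(y_true, y_pred)                 # overall accuracy
aa = recall_score(y_true, y_pred, average="macro")  # mean of per-class accuracies
print(f"Kappa={kappa:.4f}, OA={oa:.2%}, AA={aa:.2%}")
```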
43

Tachibana, Rie, Janne J. Näppi, Toru Hironaka and Hiroyuki Yoshida. "Self-Supervised Adversarial Learning with a Limited Dataset for Electronic Cleansing in Computed Tomographic Colonography: A Preliminary Feasibility Study". Cancers 14, no. 17 (26.08.2022): 4125. http://dx.doi.org/10.3390/cancers14174125.

Abstract:
Existing electronic cleansing (EC) methods for computed tomographic colonography (CTC) are generally based on image segmentation, which limits their accuracy to that of the underlying voxels. Because of the limitations of the available CTC datasets for training, traditional deep learning is of limited use in EC. The purpose of this study was to evaluate the technical feasibility of using a novel self-supervised adversarial learning scheme to perform EC with subvoxel accuracy from a limited training dataset. A three-dimensional (3D) generative adversarial network (3D GAN) was pre-trained to perform EC on CTC datasets of an anthropomorphic phantom. The 3D GAN was then fine-tuned to each input case by use of the self-supervised scheme. The architecture of the 3D GAN was optimized by use of a phantom study. The visually perceived quality of the virtual cleansing by the resulting 3D GAN compared favorably with that of commercial EC software on virtual 3D fly-through examinations of 18 clinical CTC cases. Thus, the proposed self-supervised 3D GAN, which can be trained to perform EC with subvoxel accuracy on a small dataset without image annotations, is a potentially effective approach for addressing the remaining technical problems of EC in CTC.
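
The per-case adaptation pattern described here, pre-train once and then briefly fine-tune the generator on each new unlabeled volume with a self-supervised loss, is sketched below. The tiny network and the L1 reconstruction loss are placeholders, not the study's 3D GAN or its actual objective.

```python
# Sketch of the per-case adaptation pattern: a pre-trained generator is
# briefly fine-tuned on each new (unlabeled) volume with a self-supervised
# loss. The tiny network and L1 loss are placeholders, not the study's model.
import copy
import torch
import torch.nn as nn

pretrained = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                           nn.Conv3d(8, 1, 3, padding=1))

def adapt_to_case(generator, volume, steps=20, lr=1e-4):
    g = copy.deepcopy(generator)                  # keep pre-trained weights intact
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        cleansed = g(volume)
        loss = (cleansed - volume).abs().mean()   # placeholder self-supervised loss
        loss.backward()
        opt.step()
    return g

case = torch.rand(1, 1, 32, 32, 32)               # one toy CT sub-volume
g_case = adapt_to_case(pretrained, case)
print(g_case(case).shape)
```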
44

Jin, Ye-Ji, Erkinov Habibilloh, Ye-Seul Jang, Taejun An, Donghyun Jo, Saron Park and Won-Du Chang. "A Photoplethysmogram Dataset for Emotional Analysis". Applied Sciences 12, no. 13 (28.06.2022): 6544. http://dx.doi.org/10.3390/app12136544.

Abstract:
In recent years, research on emotion classification based on physiological signals has attracted scholars' attention worldwide. Several studies and experiments have analyzed human emotions based on physiological signals, including electrocardiograms (ECGs), electroencephalograms (EEGs), and photoplethysmograms (PPGs). Although the achievements with ECGs and EEGs are progressive, reaching accuracies over 90%, the number of studies utilizing PPGs is limited and their accuracies are relatively lower than those of other signals. One of the difficulties in studying PPGs for emotional analysis is the lack of open datasets (to the best of the authors' knowledge, only a single dataset exists). This study introduces a new PPG dataset for emotional analysis. A total of 72 PPGs were recorded from 18 participants while watching short video clips and were analyzed in the time and frequency domains. Moreover, emotion classification accuracies on the presented dataset are reported for various neural network structures. The results prove that this dataset can be used for further emotional analysis with PPGs.
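
A simple example of the frequency-domain side of such an analysis: estimating a PPG power spectrum with Welch's method and reading off the dominant heart-rate frequency. The synthetic signal and sampling rate below are assumptions standing in for a recorded PPG.

```python
# Illustrative frequency-domain analysis of a PPG-like signal: estimate the
# power spectrum with Welch's method and find the dominant (heart-rate)
# frequency. The synthetic signal stands in for a recorded PPG.
import numpy as np
from scipy.signal import welch

fs = 64.0                                   # assumed sampling rate in Hz
t = np.arange(0, 60, 1 / fs)                # one minute of signal
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)

freqs, psd = welch(ppg, fs=fs, nperseg=1024)
band = (freqs >= 0.5) & (freqs <= 4.0)      # plausible heart-rate band
dominant = freqs[band][np.argmax(psd[band])]
print(f"dominant frequency: {dominant:.2f} Hz (~{dominant * 60:.0f} bpm)")
```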
45

Tomaszewski, Michał, Paweł Michalski and Jakub Osuchowski. "Evaluation of Power Insulator Detection Efficiency with the Use of Limited Training Dataset". Applied Sciences 10, no. 6 (20.03.2020): 2104. http://dx.doi.org/10.3390/app10062104.

Abstract:
This article presents an analysis of the effectiveness of object detection in digital images with a limited quantity of input data. The possibility of using a limited set of learning data was achieved by developing a detailed scenario of the task, which strictly defined the operating conditions of the detector in the considered case of a convolutional neural network. The described solution utilizes known deep neural network architectures in the process of learning and object detection. The article compares detection results for the most popular deep neural networks while maintaining a limited training set composed of a specific number of images selected from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines, and the object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model attained a worse AP result; however, the number of input samples had a significantly lower influence on its results than in the case of the other CNN models, which, in the authors' assessment, is a desirable feature for a limited training set.
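
For readers who want to reproduce this kind of limited-data experiment, a common starting point is fine-tuning a pre-trained Faster R-CNN from torchvision; the sketch below replaces the insulator dataset with one toy frame and is not the authors' training setup.

```python
# Sketch of fine-tuning a pre-trained Faster R-CNN for a single object class
# (e.g., a power insulator) with torchvision; dataset loading is elided and
# the training loop is reduced to one illustrative step.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)  # background + insulator

model.train()
images = [torch.rand(3, 480, 640)]                       # one toy frame
targets = [{"boxes": torch.tensor([[100.0, 120.0, 220.0, 300.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)                       # returns a dict of losses
loss = sum(loss_dict.values())
loss.backward()
print({k: float(v) for k, v in loss_dict.items()})
```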
46

Majid, Haneen, and Khawla Ali. "Expanding New Covid-19 Data with Conditional Generative Adversarial Networks". Iraqi Journal for Electrical and Electronic Engineering 18, no. 1 (4.04.2022): 103–10. http://dx.doi.org/10.37917/ijeee.18.1.12.

Abstract:
COVID-19 is an infectious viral disease that mostly affects the lungs and spreads quickly across the world. Early detection of the virus boosts the chances of patients recovering quickly. Many radiographic techniques, such as X-rays, are used to diagnose infected persons, and deep learning technology based on large numbers of chest X-ray images is used to diagnose COVID-19. Because of the scarcity of available COVID-19 X-ray images, the limited COVID-19 datasets are insufficient for efficient deep learning detection models; a further problem with a limited dataset is that trained models suffer from over-fitting and their predictions do not generalize. To address these problems, in this paper we developed conditional generative adversarial networks (CGAN) to produce synthetic images close to real images for the COVID-19 case, together with traditional augmentation, to expand the limited dataset, which was then used to train a customized deep detection model. The customized deep learning model was able to obtain an excellent detection accuracy of 97% with only ten epochs. The proposed augmentation outperforms other augmentation techniques. The augmented dataset includes 6,988 high-quality, high-resolution COVID-19 X-ray images, while the original COVID-19 X-ray images number only 587.
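
A minimal sketch of a conditional generator of the kind the abstract describes: the class label is embedded and concatenated with the noise vector. All sizes are illustrative, and this is not the authors' network.

```python
# Minimal sketch of a conditional GAN generator: a class label (e.g.,
# COVID-19 vs. normal) is embedded and concatenated with the noise vector.
# Sizes are illustrative; this is not the authors' architecture.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=2, img_pixels=64 * 64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, 16)      # label conditioning
        self.net = nn.Sequential(
            nn.Linear(z_dim + 16, 256), nn.ReLU(),
            nn.Linear(256, img_pixels), nn.Tanh(),    # grayscale image in [-1, 1]
        )

    def forward(self, z, labels):
        h = torch.cat([z, self.embed(labels)], dim=1)
        return self.net(h).view(-1, 1, 64, 64)

g = ConditionalGenerator()
z = torch.randn(8, 100)
labels = torch.randint(0, 2, (8,))                    # hypothetical class ids
fake_xrays = g(z, labels)
print(fake_xrays.shape)                               # torch.Size([8, 1, 64, 64])
```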
47

Pushpanathan, Kalananthni, Marsyita Hanafi, Syamsiah Masohor and Wan Fazilah Fazlil Ilahi. "MYLPHerb-1: A Dataset of Malaysian Local Perennial Herbs for the Study of Plant Images Classification under Uncontrolled Environment". Pertanika Journal of Science and Technology 30, no. 1 (4.01.2022): 413–31. http://dx.doi.org/10.47836/pjst.30.1.23.

Abstract:
Research in the field of medicinal plant recognition has received great attention due to the need for a reliable and accurate system that can recognise medicinal plants under various imaging conditions. Nevertheless, the standard medicinal plant datasets publicly available for research are very limited. This paper proposes a dataset consisting of 34,200 images of twelve local perennial herbs of high medicinal value in Malaysia. The images were captured under various imaging conditions, such as different scales, illuminations, and angles. This enables larger interclass and intraclass variability, creating abundant opportunities for new findings in leaf classification. The complexity of the dataset is investigated through automatic classification using several high-performance deep learning algorithms. The experimental results showed that the dataset creates further opportunities for advanced classification research due to the complexity of its images. The dataset can be accessed through https://www.mylpherbs.com/.
48

YU, HUI, KANG TU, LU XIE and YUAN-YUAN LI. "DIGOUT: VIEWING DIFFERENTIAL EXPRESSION GENES AS OUTLIERS". Journal of Bioinformatics and Computational Biology 08, supp01 (December 2010): 161–75. http://dx.doi.org/10.1142/s0219720010005208.

Abstract:
For well-replicated two-condition microarray datasets, the selection of differentially expressed (DE) genes is a well-studied computational topic, but for multi-condition microarray datasets with limited or no replication, the same task was not properly addressed by previous studies. This paper adopts multivariate outlier analysis to analyze replication-lacking multi-condition microarray datasets, finding that it performs significantly better than the widely used limit fold change (LFC) model in a simulated comparative experiment. Compared with the LFC model, the multivariate outlier analysis also demonstrates improved stability against sample variations in a series of manipulated real expression datasets. The reanalysis of a real, non-replicated multi-condition expression dataset series leads to satisfactory results. In conclusion, a multivariate outlier analysis algorithm such as DigOut is particularly useful for selecting DE genes from non-replicated multi-condition gene expression datasets.
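
The contrast the abstract draws, a per-gene fold-change score versus a multivariate outlier score, can be illustrated with Mahalanobis distances on a toy genes-by-conditions matrix; this is an editor's illustration, not the DigOut algorithm itself.

```python
# Illustrative contrast between a limit-fold-change-style score and a
# multivariate outlier score (Mahalanobis distance) on a toy genes x
# conditions matrix; the data and scores are assumptions, not DigOut itself.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0, 1, size=(500, 6))        # 500 genes, 6 conditions, log scale
data[:5] += rng.normal(3, 0.5, size=(5, 6))   # plant a few differentially expressed genes

# Fold-change-style score: largest absolute difference across conditions.
lfc = data.max(axis=1) - data.min(axis=1)

# Multivariate outlier score: Mahalanobis distance of each gene profile.
mean = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - mean
maha = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

print("top genes by fold change:   ", np.argsort(lfc)[::-1][:5])
print("top genes by outlier score: ", np.argsort(maha)[::-1][:5])
```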
49

Hemedan, Ahmed Abdelmonem, Aboul Ella Hassanien, Mohamed Hamed N. Taha and Nour Eldeen Mahmoud Khalifa. "Deep bacteria: robust deep learning data augmentation design for limited bacterial colony dataset". International Journal of Reasoning-based Intelligent Systems 11, no. 3 (2019): 256. http://dx.doi.org/10.1504/ijris.2019.10023444.

50

Khalifa, Nour Eldeen Mahmoud, Mohamed Hamed N. Taha, Aboul Ella Hassanien and Ahmed Abdelmonem Hemedan. "Deep bacteria: robust deep learning data augmentation design for limited bacterial colony dataset". International Journal of Reasoning-based Intelligent Systems 11, no. 3 (2019): 256. http://dx.doi.org/10.1504/ijris.2019.102610.
