Journal articles on the topic 'Small datasets'

Consult the top 50 journal articles for your research on the topic 'Small datasets.'

1. Agliari, Elena, Francesco Alemanno, Miriam Aquaro, Adriano Barra, Fabrizio Durante, and Ido Kanter. "Hebbian dreaming for small datasets." Neural Networks 173 (May 2024): 106174. http://dx.doi.org/10.1016/j.neunet.2024.106174.

2. Ingrassia, Salvatore, and Isabella Morlini. "Neural Network Modeling for Small Datasets." Technometrics 47, no. 3 (August 2005): 297–311. http://dx.doi.org/10.1198/004017005000000058.

3. Ricchiuto, Piero, Judy C. G. Sng, and Wilson Wen Bin Goh. "Analysing extremely small sized ratio datasets." International Journal of Bioinformatics Research and Applications 11, no. 3 (2015): 268. http://dx.doi.org/10.1504/ijbra.2015.069225.

4. Alasalmi, Tuomo, Jaakko Suutala, Juha Röning, and Heli Koskimäki. "Better Classifier Calibration for Small Datasets." ACM Transactions on Knowledge Discovery from Data 14, no. 3 (May 14, 2020): 1–19. http://dx.doi.org/10.1145/3385656.

5. Montalvão, J., R. Attux, and D. G. Silva. "Simple entropy estimator for small datasets." Electronics Letters 48, no. 17 (August 16, 2012): 1059–61. http://dx.doi.org/10.1049/el.2012.2002.

6. Khobragade, Vandana, M. S. Pradeep Kumar Patnaik, and Srinivasa Rao Sura. "Revaluating Pretraining in Small Size Training Sample Regime." International Journal of Electrical and Electronics Research 10, no. 3 (September 30, 2022): 694–704. http://dx.doi.org/10.37391/ijeer.100346.

Abstract:
Deep neural network (DNN) based models are highly acclaimed in medical image classification. The existing DNN architectures are claimed to be at the forefront of image classification. These models require very large datasets to classify images with a high level of accuracy, but they fail to perform when trained on datasets of small size. Low accuracy and overfitting are the problems observed when small medical datasets are used to train a classifier with deep learning models such as Convolutional Neural Networks (CNNs). These existing methods and models either overfit when trained on such small datasets or yield classification accuracy that tends towards randomness. This issue persists even when using Transfer Learning (TL), the current standard for such scenarios. In this paper, we have tested several models, including ResNets and VGGs along with more modern models like MobileNets, on different medical datasets with and without transfer learning. We propose solid theories as to why a more novel approach to this issue is needed and how the current methodologies fail when applied to the aforementioned datasets. Larger, more complex models are not able to converge on smaller datasets; smaller, less complex models perform better on the same dataset than their larger counterparts.

7. Burmakova, Anastasiya, and Diana Kalibatienė. "Applying Fuzzy Inference and Machine Learning Methods for Prediction with a Small Dataset: A Case Study for Predicting the Consequences of Oil Spills on a Ground Environment." Applied Sciences 12, no. 16 (August 18, 2022): 8252. http://dx.doi.org/10.3390/app12168252.

Abstract:
Applying machine learning (ML) and fuzzy inference systems (FIS) requires large datasets to obtain more accurate predictions. However, in the cases of oil spills on ground environments, only small datasets are available. Therefore, this research aims to assess the suitability of ML techniques and FIS for the prediction of the consequences of oil spills on ground environments using small datasets. Consequently, we present a hybrid approach for assessing the suitability of ML (Linear Regression, Decision Trees, Support Vector Regression, Ensembles, and Gaussian Process Regression) and the adaptive neural fuzzy inference system (ANFIS) for predicting the consequences of oil spills with a small dataset. This paper proposes enlarging the initial small dataset of an oil spill on a ground environment by using the synthetic data generated by applying a mathematical model. ML techniques and ANFIS were tested with the same generated synthetic datasets to assess the proposed approach. The proposed ANFIS-based approach shows significant performance and sufficient efficiency for predicting the consequences of oil spills on ground environments with a smaller dataset than the applied ML techniques. The main finding of this paper indicates that FIS is suitable for prediction with a small dataset and provides sufficiently accurate prediction results.
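
The paper enlarges its small dataset with synthetic samples generated by a physical mathematical model of oil spills. As a generic, hedged stand-in for that idea (Gaussian jitter of existing rows rather than the paper's model; all names are illustrative), dataset enlargement can be sketched as:

```python
import random

def augment(rows, n_new, scale=0.05, seed=0):
    """Create n_new synthetic rows by jittering randomly chosen originals.

    A generic stand-in for model-based synthetic data: each new sample is an
    existing sample plus small Gaussian noise proportional to the feature's
    magnitude (unit scale for zero-valued features).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        base = rng.choice(rows)
        out.append([v + rng.gauss(0, scale * (abs(v) or 1.0)) for v in base])
    return out

# Three original samples with two features each, enlarged to five synthetic ones.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
synthetic = augment(data, 5)
print(len(synthetic), len(synthetic[0]))  # 5 2
```

In the paper's setting the generator is a validated physical model, which keeps the synthetic points consistent with the domain; noise-based jitter only preserves local structure around observed samples.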

8. Jamjoom, Mona. "The pertinent single-attribute-based classifier for small datasets classification." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 3 (June 1, 2020): 3227. http://dx.doi.org/10.11591/ijece.v10i3.pp3227-3234.

Abstract:
Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used in such cases due to its simplicity and efficiency. In this paper, we revealed the power of a single attribute by introducing the pertinent single-attribute-based-heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. The SAB-HR’s feature selection method uses the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute among the other attributes in the set. Our empirical results on 12 benchmark datasets from the UCI Machine Learning Repository showed that the SAB-HR classifier significantly outperformed the classical OneR classifier for small datasets. In addition, using the H-Ratio as the feature selection criterion for selecting the single attribute was more effectual than other traditional criteria, such as Information Gain (IG) and Gain Ratio (GR).
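
The SAB-HR classifier builds on OneR, which classifies with a single rule derived from one attribute. A minimal sketch of classical OneR rule selection (majority-class rule per attribute value, lowest training error wins) — not the paper's H-Ratio criterion:

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """OneR: pick the single attribute whose one-rule has the lowest training error.

    rows   -- list of attribute tuples (nominal values)
    labels -- list of class labels, parallel to rows
    Returns (best_attribute_index, rule) where rule maps value -> predicted class.
    """
    n_attrs = len(rows[0])
    best = None  # (error_count, attr_index, rule)
    for a in range(n_attrs):
        # Count class frequencies for each value of attribute a.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        # Rule: each attribute value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best[1], best[2]

# Toy data: the second attribute perfectly separates the classes.
rows = [("red", "s"), ("red", "l"), ("blue", "s"), ("blue", "l")]
labels = ["A", "B", "A", "B"]
attr, rule = one_r(rows, labels)
print(attr, rule)  # 1 {'s': 'A', 'l': 'B'}
```

On this toy data the colour attribute misclassifies two samples while the size attribute misclassifies none, so OneR selects the size attribute with zero training error.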

9. Petráš, Jaroslav, Marek Pavlík, Ján Zbojovský, Ardian Hyseni, and Jozef Dudiak. "Benford’s Law in Electric Distribution Network." Mathematics 11, no. 18 (September 10, 2023): 3863. http://dx.doi.org/10.3390/math11183863.

Abstract:
Benford’s law can be used as a method to detect non-natural changes in datasets with certain properties; in our case, the dataset was collected from electricity metering devices. In this paper, we present the theoretical background behind this law. We applied Benford’s law first-digit probability distribution test to electricity metering datasets acquired from smart electricity meters, i.e., natural data of electricity consumption acquired during a specific time interval. We present the results of Benford’s law distribution for an original measured dataset with no artificial intervention, and a set of results for different kinds of affected datasets created by simulated artificial intervention. Comparing these two dataset types with each other and with the theoretical probability distribution provided proof that Benford’s law can be applied to this kind of data and that it can extract markers of artificial manipulation of a dataset. As presented in the results part of the article, non-affected datasets mostly have a deviation from the theoretical Benford's law probability values below 10%, rarely between 10% and 20%. On the other hand, simulated affected datasets show deviations mostly above 20%, often approximately 70%, and rarely lower than 20%, the latter only when a small part (10%) of the original dataset was affected, representing only a small magnitude of intervention.
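
The first-digit test used here follows directly from Benford's distribution, P(d) = log10(1 + 1/d) for d = 1..9. A hedged sketch of the expected profile and a percent-deviation measure of the kind the 10%/20% bands above refer to (function and variable names are illustrative):

```python
import math
from collections import Counter

def benford_expected():
    # Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    # Leading non-zero digit of a positive number (plain decimal notation).
    s = str(abs(x)).lstrip("0.")
    return int(s[0])

def deviations(values):
    """Percent deviation of observed first-digit frequencies from Benford's law."""
    counts = Counter(first_digit(v) for v in values if v)
    n = sum(counts.values())
    exp = benford_expected()
    return {d: 100 * abs(counts.get(d, 0) / n - exp[d]) / exp[d]
            for d in range(1, 10)}

expected = benford_expected()
print(round(expected[1], 3))  # 0.301 -- digit 1 leads ~30.1% of natural data
```

Flagging a dataset as affected then reduces to checking whether the per-digit deviations exceed a chosen band (e.g., the 20% threshold observed in the article).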

10. Andonie, Răzvan. "Extreme Data Mining: Inference from Small Datasets." International Journal of Computers Communications & Control 5, no. 3 (September 1, 2010): 280. http://dx.doi.org/10.15837/ijccc.2010.3.2481.

Abstract:
Neural networks have been applied successfully in many fields. However, satisfactory results can only be obtained under large-sample conditions. When it comes to small training sets, the performance may not be so good, or the learning task may not even be accomplished. This deficiency severely limits the applications of neural networks. The main reason why small datasets cannot provide enough information is that there exist gaps between samples; even the domain of the samples cannot be ensured. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets. We have the following goals: i. to discuss the meaning of "small" in the context of inferring from small datasets; ii. to overview computational intelligence solutions for this problem; iii. to illustrate the introduced concepts with a real-life application.

11. Ku, C. J., and T. L. Fine. "A Bayesian Independence Test for Small Datasets." IEEE Transactions on Signal Processing 54, no. 10 (October 2006): 4026–31. http://dx.doi.org/10.1109/tsp.2006.880243.

12. Li, Der-Chiang, Hung-Yu Chen, and Qi-Shi Shi. "Learning from small datasets containing nominal attributes." Neurocomputing 291 (May 2018): 226–36. http://dx.doi.org/10.1016/j.neucom.2018.02.069.

13. Xu, Weihuang, Guohao Yu, Alina Zare, Brendan Zurweller, Diane L. Rowland, Joel Reyes-Cabrera, Felix B. Fritschi, Roser Matamala, and Thomas E. Juenger. "Overcoming small minirhizotron datasets using transfer learning." Computers and Electronics in Agriculture 175 (August 2020): 105466. http://dx.doi.org/10.1016/j.compag.2020.105466.

14. Xu, Zi’an, Yin Dai, Fayu Liu, Weibing Chen, Yue Liu, Lifu Shi, Sheng Liu, and Yuhang Zhou. "Swin MAE: Masked autoencoders for small datasets." Computers in Biology and Medicine 161 (July 2023): 107037. http://dx.doi.org/10.1016/j.compbiomed.2023.107037.

15. Bhalla, Vandna. "INNOVATIVE MODEL TO AUGMENT SMALL DATASETS FOR CLASSIFICATION." International Journal of Advanced Research 11, no. 04 (April 30, 2023): 313–19. http://dx.doi.org/10.21474/ijar01/16658.

Abstract:
A vast number of applications do not have ample training data, and consequently their accuracies suffer. There is a need for a technique that optimally and intelligently augments such datasets, and for an automated classifier that performs well despite the small dataset size. Meticulously stored data will ease retrieval. The Artificial Immune System (AIS) is one of many computational algorithms in the literature inspired by the dynamic learning mechanism of the human immune system. An AIS-based classification algorithm was initially proposed as a machine learning technique suited for supervised learning problems. This powerful algorithm has various application areas. We present a novel technique inspired by clonal selection to augment small datasets and improve classification.

16. Keum, Bitna, Juoh Sun, Woojin Lee, Seongheum Park, and Harksoo Kim. "Persona-Identified Chatbot through Small-Scale Modeling and Data Transformation." Electronics 13, no. 8 (April 9, 2024): 1409. http://dx.doi.org/10.3390/electronics13081409.

Abstract:
Research on chatbots aimed at facilitating more natural and engaging conversations is actively underway. With the growing recognition of the significance of personas in this context, persona-based conversational research is gaining prominence. Despite the abundance of publicly available chit-chat datasets, persona-based chat datasets remain scarce, primarily due to the higher associated costs. Consequently, we propose a methodology for transforming extensive chit-chat datasets into persona-based chat datasets. Simultaneously, we propose a model adept at effectively incorporating personas into responses, even with a constrained number of parameters. This model can discern the most relevant information from persona memory without resorting to a retrieval model. Furthermore, it decides whether to reference the memory, thereby enhancing the interpretability of the model’s judgments. Our CC2PC framework demonstrates superior performance in both automatic and LLM evaluations when compared to a high-cost persona-based chat dataset. Additionally, experimental results on the proposed model indicate improved persona-based response capabilities.

17. Bao, Yan, Frank Heilig, Chuo-Hsuan Lee, and Edward J. Lusk. "Full Range Testing of the Small Size Effect Bias for Benford Screening: A Note." International Journal of Economics and Finance 10, no. 6 (April 30, 2018): 47. http://dx.doi.org/10.5539/ijef.v10n6p47.

Abstract:
Bao, Lee, Heilig, and Lusk (2018) have documented and illustrated the Small Sample Size bias in Benford Screening of datasets for Non-Conformity. However, their sampling plan tested only a few random sample-bundles from a core set of data that were clearly Conforming to the Benford first digit profile. We extended their study using the same core datasets and DSS, called the Newcomb Benford Decision Support Systems Profiler [NBDSSP], to create an expanded set of random samples from their core sample. Specifically, we took repeated random samples in blocks of 10 down to 5% from their core-set of data in increments of 5% and finished with a random sample of 1%, 0.5% & 20 thus creating 221 sample-bundles. This arm focuses on the False Positive Signaling Error [FPSE]—i.e., believing that the sampled dataset is Non-Conforming when it, in fact, comes from a Conforming set of data. The second arm used the Hill Lottery dataset, argued and tested as Non-Conforming; we will use the same iteration model noted above to create a test of the False Negative Signaling Error [FNSE]—i.e., if for the sampled datasets the NBDSSP fails to detect Non-Conformity—to wit believing incorrectly that the dataset is Conforming. We find that there is a dramatic point in the sliding sampling scale at about 120 sampled points where the FPSE first appears—i.e., where the state of nature: Conforming incorrectly is flagged as Non-Conforming. Further, we find it is very unlikely that the FNSE manifests itself for the Hill dataset. This demonstrated clearly that small datasets are indeed likely to create the FPSE, and there should be little concern that Hill-type of datasets will not be indicated as Non-Conforming. We offer a discussion of these results with implications for audits in the Big-Data context where the audit In-charge may find it necessary to partition the datasets of the client.

18. Sumalatha, M., and Latha Parthiban. "Augmentation of Predictive Competence of Non-Small Cell Lung Cancer Datasets through Feature Pre-Processing Techniques." EAI Endorsed Transactions on Pervasive Health and Technology 8, no. 5 (November 2, 2022): e1. http://dx.doi.org/10.4108/eetpht.v8i5.3169.

Abstract:
The major objective of the study is to augment the predictive analytics of Non-Small Cell Lung Cancer (NSCLC) datasets with a Feature Pre-Processing (FPP) technique in three stages: removing basic errors through common analytics on empty, non-numerical, or missing values in the dataset; removing repeated features through regression analysis; and eliminating irrelevant features through clustering methods. The FPP model is validated using classifiers such as simple and complex Trees, Linear and Gaussian SVMs, Weighted KNN, and Boosted Trees in terms of accuracy, sensitivity, specificity, kappa, and positive and negative likelihood. The results showed that the NSCLC dataset formed after FPP outperformed the raw NSCLC dataset at all performance levels and showed good augmentation of the predictive analytics of NSCLC datasets. The research proved that preprocessing is essential for better prediction on complex medical datasets.

19. Bai, Long, Liangyu Wang, Tong Chen, Yuanhao Zhao, and Hongliang Ren. "Transformer-Based Disease Identification for Small-Scale Imbalanced Capsule Endoscopy Dataset." Electronics 11, no. 17 (August 31, 2022): 2747. http://dx.doi.org/10.3390/electronics11172747.

Abstract:
Vision Transformer (ViT) is emerging as a new leader in computer vision with its outstanding performance in many tasks (e.g., ImageNet-22k, JFT-300M). However, the success of ViT relies on pretraining on large datasets, and it is difficult to train ViT from scratch on a small-scale imbalanced capsule endoscopic image dataset. This paper adopts a Transformer neural network with a spatial pooling configuration. The Transformer’s self-attention mechanism enables it to capture long-range information effectively, and exploring the spatial structure of ViT through pooling can further improve its performance on our small-scale capsule endoscopy dataset. We trained from scratch on two publicly available datasets for capsule endoscopy disease classification, obtaining 79.15% accuracy on the multi-classification task of the Kvasir-Capsule dataset and 98.63% accuracy on the binary classification task of the Red Lesion Endoscopy dataset.

20. Bao, Yan, Chuo-Hsuan Lee, Frank Heilig, and Edward J. Lusk. "Empirical Information on the Small Size Effect Bias Relative to the False Positive Rejection Error for Benford Test-Screening." International Journal of Economics and Finance 10, no. 2 (January 3, 2018): 1. http://dx.doi.org/10.5539/ijef.v10n2p1.

Abstract:
Due to the theoretical work of Hill, Benford digital profile testing is now a staple in screening data for forensic investigations and audit examinations. Prior empirical literature indicates that Benford testing, when applied to a large Benford Conforming dataset, often produces a bias called the FPE Screening Signal [FPESS] that misleads investigators into believing that the dataset is Non-Conforming in nature. Interestingly, the same FPESS can also be observed when investigators partition large datasets into smaller datasets to address a variety of auditing questions. In this study, we fill the empirical gap in the literature by investigating the sensitivity of the FPESS to partitioned datasets. We randomly selected 16 balance-sheet datasets from the China Stock Market Financial Statements Database™ that tested as Benford Conforming, denoted RBCD. We then explore how partitioning these datasets affects the FPESS through repeated random sampling: first 10% of the RBCD, and then 250 observations from the RBCD. This created two partitioned groups of 160 datasets each. The statistical profile observed was as follows: for the RBCD, there were no indications of Non-Conformity; for the 10%-Sample, there were no overall indications that Extended Procedures would be warranted; and for the 250-Sample, there were a number of indications that the dataset was Non-Conforming. This demonstrated clearly that small datasets are indeed likely to create the FPESS. We offer a discussion of these results with implications for audits in the Big-Data context, where the audit In-charge may find it necessary to partition the datasets of the client.

21. Mabuni, D., and S. Aquter Babu. "High Accurate and a Variant of k-fold Cross Validation Technique for Predicting the Decision Tree Classifier Accuracy." International Journal of Innovative Technology and Exploring Engineering 10, no. 2 (January 10, 2021): 105–10. http://dx.doi.org/10.35940/ijitee.c8403.0110321.

Abstract:
In machine learning, data usage is a more important criterion than the logic of the program. With very big and moderately sized datasets it is possible to obtain robust and high classification accuracies, but not with small and very small datasets. In particular, only large training datasets are capable of producing robust decision tree classification results. Classification results obtained using only one training and one testing dataset pair are not reliable. Cross validation uses many random folds of the same dataset for training and validation. In order to obtain reliable and statistically correct classification results, the same algorithm must be applied to different pairs of training and validation datasets. To overcome the problem of using only a single training dataset and a single testing dataset, the existing k-fold cross validation technique uses a cross validation plan to obtain improved decision tree classification accuracy. In this paper a new cross validation technique called prime fold is proposed; it is thoroughly tested experimentally and verified using many benchmark UCI machine learning datasets. It is observed that the prime fold based decision tree classification accuracies are far better than those of existing techniques.
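
For context, the proposed prime fold is a variant of standard k-fold cross validation. The classical procedure it builds on (not the paper's prime-fold variant; the scorer callback is illustrative) can be sketched as:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(rows, labels, k, train_and_score):
    """Average score over k train/validation splits.

    train_and_score(train_rows, train_labels, val_rows, val_labels) -> float
    """
    folds = k_fold_indices(len(rows), k)
    scores = []
    for fold in folds:
        val = set(fold)
        train = [j for j in range(len(rows)) if j not in val]
        scores.append(train_and_score(
            [rows[j] for j in train], [labels[j] for j in train],
            [rows[j] for j in fold], [labels[j] for j in fold]))
    return sum(scores) / k

# Ten samples split into five folds of two; every index appears exactly once.
folds = k_fold_indices(10, 5)
print(sorted(len(f) for f in folds))  # [2, 2, 2, 2, 2]
```

Each sample is used for validation exactly once and for training k−1 times, which is what makes the averaged accuracy estimate more reliable than a single train/test split.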

22. Jaryani, Farhang, and Maryam Amiri. "A Pre-Trained Ensemble Model for Breast Cancer Grade Detection Based on Small Datasets." Iranian Journal of Health Sciences 11, no. 1 (January 1, 2023): 47–58. http://dx.doi.org/10.32598/ijhs.11.1.883.1.

Abstract:
Background and Purpose: Nowadays, breast cancer is reported as one of the most common cancers among women. Early detection of the cancer type is essential to help inform subsequent treatments. The newest proposed breast cancer detectors are based on deep learning. Most of these works focus on large datasets and are not developed for small datasets. Although large datasets may lead to more reliable results, collecting and processing them is challenging. Materials and Methods: This paper proposes a new ensemble deep learning model for breast cancer grade detection based on small datasets. Our model uses several basic deep-learning classifiers to grade breast tumors into grades I, II, and III. Since none of the previous works focus on datasets that include breast cancer grades, we have used a new dataset called Databiox to grade breast cancers into the three grades. Databiox includes histopathological microscopy images from patients with invasive ductal carcinoma (IDC). Results: The performance of the model is evaluated on the small dataset. We compare the proposed three-layer ensemble classifier with the most common single deep learning classifiers in terms of accuracy and loss. The experimental results show that the proposed model can improve the classification accuracy of the breast cancer grade compared to other state-of-the-art single classifiers. Conclusion: The ensemble model can also be used for small datasets and can improve accuracy compared to the other models. This achievement is fundamental for the design of classification-based systems in computer-aided diagnosis.

23. Kim, Dongseob, Seungho Lee, Junsuk Choe, and Hyunjung Shim. "Weakly Supervised Semantic Segmentation for Driving Scenes." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2741–49. http://dx.doi.org/10.1609/aaai.v38i3.28053.

Abstract:
State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) using image-level labels exhibit severe performance degradation on driving scene datasets such as Cityscapes. To address this challenge, we develop a new WSSS framework tailored to driving scene datasets. Based on extensive analysis of dataset characteristics, we employ Contrastive Language-Image Pre-training (CLIP) as our baseline to obtain pseudo-masks. However, CLIP introduces two key challenges: (1) pseudo-masks from CLIP lack in representing small object classes, and (2) these masks contain notable noise. We propose solutions for each issue as follows. (1) We devise Global-Local View Training that seamlessly incorporates small-scale patches during model training, thereby enhancing the model's capability to handle small-sized yet critical objects in driving scenes (e.g., traffic light). (2) We introduce Consistency-Aware Region Balancing (CARB), a novel technique that discerns reliable and noisy regions through evaluating the consistency between CLIP masks and segmentation predictions. It prioritizes reliable pixels over noisy pixels via adaptive loss weighting. Notably, the proposed method achieves 51.8% mIoU on the Cityscapes test dataset, showcasing its potential as a strong WSSS baseline on driving scene datasets. Experimental results on CamVid and WildDash2 demonstrate the effectiveness of our method across diverse datasets, even with small-scale datasets or visually challenging conditions. The code is available at https://github.com/k0u-id/CARB.

24. Xu, Xinkai, Hailan Zhang, Yan Ma, Kang Liu, Hong Bao, and Xu Qian. "TranSDet: Toward Effective Transfer Learning for Small-Object Detection." Remote Sensing 15, no. 14 (July 12, 2023): 3525. http://dx.doi.org/10.3390/rs15143525.

Abstract:
Small-object detection is a challenging task in computer vision due to the limited training samples and low-quality images. Transfer learning, which transfers the knowledge learned from a large dataset to a small dataset, is a popular method for improving performance on limited data. However, we empirically find that due to the dataset discrepancy, directly transferring the model trained on a general object dataset to small-object datasets obtains inferior performance. In this paper, we propose TranSDet, a novel approach for effective transfer learning for small-object detection. Our method adapts a model trained on a general dataset to a small-object-friendly model by augmenting the training images with diverse smaller resolutions. A dynamic resolution adaptation scheme is employed to ensure consistent performance on various sizes of objects using meta-learning. Additionally, the proposed method introduces two network components, an FPN with shifted feature aggregation and an anchor relation module, which are compatible with transfer learning and effectively improve small-object detection performance. Extensive experiments on the TT100K, BUUISE-MO-Lite, and COCO datasets demonstrate that TranSDet achieves significant improvements compared to existing methods. For example, on the TT100K dataset, TranSDet outperforms the state-of-the-art method by 8.0% in terms of the mean average precision (mAP) for small-object detection. On the BUUISE-MO-Lite dataset, TranSDet improves the detection accuracy of RetinaNet and YOLOv3 by 32.2% and 12.8%, respectively.

25. Davila Delgado, Juan Manuel, and Lukumon Oyedele. "Deep learning with small datasets: using autoencoders to address limited datasets in construction management." Applied Soft Computing 112 (November 2021): 107836. http://dx.doi.org/10.1016/j.asoc.2021.107836.

26. Marston, Louise, Janet L. Peacock, Keming Yu, Peter Brocklehurst, Sandra A. Calvert, Anne Greenough, and Neil Marlow. "Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets." Paediatric and Perinatal Epidemiology 23, no. 4 (July 2009): 380–92. http://dx.doi.org/10.1111/j.1365-3016.2009.01046.x.

27. Karunanithi, Sivarajan, Martin Simon, and Marcel H. Schulz. "Automated analysis of small RNA datasets with RAPID." PeerJ 7 (April 10, 2019): e6710. http://dx.doi.org/10.7717/peerj.6710.

Abstract:
Understanding the role of short-interfering RNA (siRNA) in diverse biological processes is of current interest and often approached through small RNA sequencing. However, analysis of these datasets is difficult due to the complexity of biological RNA processing pathways, which differ between species. Several properties, such as strand specificity, length distribution, and the distribution of soft-clipped bases, are parameters known to guide researchers in understanding the role of siRNAs. We present RAPID, a generic eukaryotic siRNA analysis pipeline, which captures information inherent in the datasets and automatically produces numerous visualizations as user-friendly HTML reports, covering multiple categories required for siRNA analysis. RAPID also facilitates an automated comparison of multiple datasets, with one of the normalization techniques dedicated to siRNA knockdown analysis, and integrates differential expression analysis using DESeq2. Availability and Implementation: RAPID is available under the MIT license at https://github.com/SchulzLab/RAPID. We recommend using it as a conda environment, available from https://anaconda.org/bioconda/rapid.

28. Goyal, Gaurvi, Nicoletta Noceti, and Francesca Odone. "Cross-view action recognition with small-scale datasets." Image and Vision Computing 120 (April 2022): 104403. http://dx.doi.org/10.1016/j.imavis.2022.104403.

29. Singh, Gurpartap, Sunil Agrawal, and Balwinder Singh Sohi. "Handwritten Gurmukhi Digit Recognition System for Small Datasets." Traitement du Signal 37, no. 4 (October 10, 2020): 661–69. http://dx.doi.org/10.18280/ts.370416.

Abstract:
In the present study, a method to increase the recognition accuracy of Gurmukhi (an Indian regional script) handwritten digits has been proposed. The proposed methodology uses a DCNN (Deep Convolutional Neural Network) with a cascaded XGBoost (Extreme Gradient Boosting) algorithm. A comprehensive analysis has also been done to apprehend the impact of the kernel size of the DCNN on recognition accuracy. The reason for using a DCNN is its impressive recognition accuracy on handwritten digits, but in order to achieve good recognition accuracy, a DCNN requires a huge amount of data and significant training/testing time. In order to increase the accuracy of the DCNN on a small dataset, more images have been generated by applying a shear transformation (a transformation that preserves parallelism but not lengths and angles) to the original images. To address the issue of long training time, only two hidden layers along with selective cascading of XGBoost among the misclassified digits have been used. The issue of overfitting is also discussed in detail and has been reduced to a great extent. Finally, the results are compared with the performance of some recent techniques, such as SVM (Support Vector Machine), Random Forest, and XGBoost classifiers on DCT (Discrete Cosine Transform) and DWT (Discrete Wavelet Transform) features obtained from the same dataset. It is found that the proposed methodology can outperform the other techniques in terms of overall recognition rate.
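
The shear-based augmentation described above is easy to reproduce. A minimal nearest-neighbour sketch on a toy "image" stored as a list of rows (row y is shifted by kx·y pixels), not the authors' exact implementation:

```python
def shear_image(img, kx):
    """Horizontal shear of a 2D grayscale image, nearest-neighbour sampling.

    Each row y is shifted right by round(kx * y) pixels; pixels shifted out of
    the frame are dropped and vacated positions are filled with 0 (background).
    Parallel lines remain parallel, which is the defining property of a shear.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        shift = round(kx * y)
        for x in range(w):
            nx = x + shift
            if 0 <= nx < w:
                out[y][nx] = img[y][x]
    return out

# A vertical stroke leans to the right after shearing, giving a new sample.
img = [[1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]]
print(shear_image(img, 1.0))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Applying a few different shear factors kx to each training digit multiplies the dataset size while keeping labels unchanged, which is the augmentation effect the study relies on.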
30

Mauldin, Taylor, Anne H. Ngu, Vangelis Metsis, and Marc E. Canby. "Ensemble Deep Learning on Wearables Using Small Datasets." ACM Transactions on Computing for Healthcare 2, no. 1 (December 30, 2020): 1–30. http://dx.doi.org/10.1145/3428666.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Li, Jingmei, Di Xue, Weifei Wu, and Jiaxiang Wang. "Incremental Learning for Malware Classification in Small Datasets." Security and Communication Networks 2020 (February 20, 2020): 1–12. http://dx.doi.org/10.1155/2020/6309243.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Information security is an important research area. As a special yet important case, malware classification plays a key role in information security. In the real world, malware datasets are open-ended and dynamic: new malware samples belonging to both old and new classes arrive continuously. This requires the malware classification method to support incremental learning, so that new knowledge can be learned efficiently. However, existing works mainly focus on feature engineering with machine learning as a tool. To solve this problem, we present an incremental malware classification framework, named "IMC," which consists of opcode sequence extraction, feature selection, and an incremental learning method. We develop an incremental learning method based on a multiclass support vector machine (SVM) as the core component of IMC, named "IMCSVM," which can incrementally improve its classification ability by learning new malware samples. In IMC, IMCSVM adds new classification planes (if new samples belong to a new class) and updates all old classification planes for new malware samples. As a result, IMC can improve the classification quality of known malware classes by minimizing the prediction error and can transfer the old model's knowledge to classify unknown malware classes. We apply the incremental learning method to malware classification, and the experimental results demonstrate the advantages and effectiveness of IMC.
32

Baroni, Michel, Fabrice Barthélémy, and Mahdi Mokrane. "A repeat sales index robust to small datasets." Journal of Property Investment & Finance 29, no. 1 (February 8, 2011): 35–48. http://dx.doi.org/10.1108/14635781111100182.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

von Ungern-Sternberg, Britta S., and Adrian Regli. "Big problem, small incidence, and large registry datasets." Lancet Respiratory Medicine 4, no. 1 (January 2016): 5–6. http://dx.doi.org/10.1016/s2213-2600(15)00519-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Vatian, A. S., A. A. Golubev, N. F. Gusarova, N. V. Dobrenko, A. A. Zubanenko, E. S. Kustova, A. A. Tatarinova, I. V. Tomilov, and G. F. Shovkoplyas. "Intelligent clinical decision support for small patient datasets." Scientific and Technical Journal of Information Technologies, Mechanics and Optics 23, no. 3 (June 1, 2023): 595–607. http://dx.doi.org/10.17586/2226-1494-2023-23-3-595-607.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Tanov, Vladislav. "Data-Centric Optimization Approach for Small, Imbalanced Datasets." Journal of information and organizational sciences 47, no. 1 (June 30, 2023): 167–77. http://dx.doi.org/10.31341/jios.47.1.9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data-centric AI is a newly explored concept in which attention is given to data optimization methodologies and techniques to improve model performance, rather than focusing on machine learning models and hyperparameter tuning. This paper suggests an effective data optimization methodology for imbalanced small datasets that improves machine learning model performance. The focus is on providing an effective solution when the number of observations is not sufficient to construct a machine learning model with high values of the estimated performance measures: for example, when the majority of observations are labeled as one class (the majority class) and the rest as the other, commonly considered the class of interest (the minority class). The proposed methodology does not depend on the applied classification models; rather, it is based on the properties of the data resampling approach to systematically enhance and optimize the training dataset. The paper presents numerical experiments applying the data-centric optimization methodology and compares them with results previously obtained by other authors.
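A minimal version of the resampling idea (balancing a small, imbalanced training set before fitting any model) is random oversampling of the minority class. The paper's exact optimization procedure is not specified in the abstract; this sketch only illustrates the general approach:

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Duplicate minority-class rows (sampling with replacement) until all
    classes reach the majority-class count. A simple, model-agnostic stand-in
    for the paper's resampling step."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        if n < n_max:
            idx = rng.choice(np.flatnonzero(y == c), size=n_max - n, replace=True)
            Xs.append(X[idx])
            ys.append(y[idx])
    return np.vstack(Xs), np.concatenate(ys)

X = np.arange(20, dtype=float).reshape(10, 2)   # hypothetical features
y = np.array([0] * 8 + [1] * 2)                 # 8:2 imbalance
X_bal, y_bal = random_oversample(X, y, rng=0)
print(np.bincount(y_bal))                       # → [8 8]
```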
36

Wu, Yumei, Jingxiu Yao, Shuo Chang, and Bin Liu. "LIMCR: Less-Informative Majorities Cleaning Rule Based on Naïve Bayes for Imbalance Learning in Software Defect Prediction." Applied Sciences 10, no. 23 (November 24, 2020): 8324. http://dx.doi.org/10.3390/app10238324.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Software defect prediction (SDP) is an effective technique to lower software module testing costs. However, imbalanced distributions exist in almost all SDP datasets and restrict the accuracy of defect prediction. In order to balance the data distribution reasonably, we propose a novel resampling method, LIMCR, on the basis of Naïve Bayes, to optimize and improve SDP performance. The main idea of LIMCR is to remove less-informative majorities for rebalancing the data distribution after evaluating the degree of being informative for every sample from the majority class. We employ 29 SDP datasets from the PROMISE and NASA repositories and divide them into two groups: small sample size (fewer than 1,100 samples) and large sample size (more than 1,100). We then conduct experiments comparing the matching of classifiers and imbalance learning methods on small and large datasets, respectively. The results show the effectiveness of LIMCR: LIMCR+GNB performs better than other methods on small datasets, though it is less impressive on large datasets.
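The cleaning rule can be sketched loosely: fit a Naïve Bayes model, score each majority-class sample, and drop the samples the model is already most certain about (the "less-informative" ones). The synthetic data, the retained count, and the confidence criterion below are assumptions for illustration; the actual LIMCR rule is defined in the paper:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Synthetic defect-prediction data: many clean modules (0), few defective (1)
X = np.vstack([rng.normal(0, 1, (180, 4)), rng.normal(2, 1, (20, 4))])
y = np.array([0] * 180 + [1] * 20)

# Samples the Naive Bayes model classifies with near-certainty contribute
# little information, so the most confident majority rows are removed.
nb = GaussianNB().fit(X, y)
maj = np.flatnonzero(y == 0)
conf = nb.predict_proba(X[maj])[:, 0]          # P(class 0) for majority rows
keep = maj[np.argsort(conf)[:100]]             # keep the 100 least-certain rows
idx = np.sort(np.concatenate([keep, np.flatnonzero(y == 1)]))
X_res, y_res = X[idx], y[idx]
print(np.bincount(y_res))                      # class counts after cleaning: 100 vs 20
```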
37

Perin, Vinicius, Samapriya Roy, Joe Kington, Thomas Harris, Mirela G. Tulbure, Noah Stone, Torben Barsballe, Michele Reba, and Mary A. Yaeger. "Monitoring Small Water Bodies Using High Spatial and Temporal Resolution Analysis Ready Datasets." Remote Sensing 13, no. 24 (December 20, 2021): 5176. http://dx.doi.org/10.3390/rs13245176.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Basemap and Planet Fusion—derived from PlanetScope imagery—represent the next generation of analysis ready datasets that minimize the effects of the presence of clouds. These datasets have high spatial (3 m) and temporal (daily) resolution, which provides an unprecedented opportunity to improve the monitoring of on-farm reservoirs (OFRs)—small water bodies that store freshwater and play an important role in surface hydrology and global irrigation activities. In this study, we assessed the usefulness of both datasets to monitor sub-weekly surface area changes of 340 OFRs in eastern Arkansas, USA, and we evaluated the datasets' main differences when used to monitor OFRs. When comparing the OFR surface areas derived from Basemap and Planet Fusion to an independent validation dataset, both datasets had high agreement (r2 ≥ 0.87) and small uncertainties, with a mean absolute percent error (MAPE) between 7.05% and 10.08%. Pairwise surface area comparisons between the two datasets and the PlanetScope imagery showed that 61% of the OFRs had r2 ≥ 0.55, and 70% of the OFRs had MAPE <5%. In general, both datasets can be employed to monitor sub-weekly OFR surface area changes. Basemap had higher surface area variability and was more susceptible to the presence of cloud shadows and haze than Planet Fusion, which had a smoother time series with less variability and fewer abrupt changes throughout the year. The uncertainties in surface area classification decreased as the OFRs increased in size. In addition, the surface area time series can have high variability depending on the OFR environmental conditions (e.g., presence of vegetation inside the OFR). Our findings suggest that both datasets can be used to monitor OFR sub-weekly, seasonal, and inter-annual surface area changes; therefore, these datasets can help improve freshwater management by allowing better assessment and management of the OFRs.
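The MAPE statistic used to score the surface-area estimates is straightforward to compute; the reservoir areas below are made-up numbers for illustration:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percent error between observed and estimated values."""
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((predicted - actual) / actual))

# Hypothetical reservoir surface areas (hectares): validation vs. satellite-derived
obs = [12.0, 30.0, 5.0, 44.0]
est = [11.0, 33.0, 5.5, 40.0]
print(round(mape(obs, est), 2))   # → 9.36
```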
38

Sheeny, Marcel, Andrew Wallace, and Sen Wang. "RADIO: Parameterized Generative Radar Data Augmentation for Small Datasets." Applied Sciences 10, no. 11 (June 2, 2020): 3861. http://dx.doi.org/10.3390/app10113861.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
We present a novel, parameterised radar data augmentation (RADIO) technique to generate realistic radar samples from small datasets for the development of radar-related deep learning models. RADIO leverages the physical properties of radar signals, such as attenuation, azimuthal beam divergence and speckle noise, for data generation and augmentation. Exemplary applications on radar-based classification and detection demonstrate that RADIO can generate meaningful radar samples that effectively boost the accuracy of classification and generalisability of deep models trained with a small dataset.
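Of the physical effects RADIO parameterises, speckle is the simplest to sketch: radar speckle is commonly modelled as multiplicative Gamma-distributed noise with unit mean. The `looks` parameter and the uniform test image are assumptions, and attenuation and beam divergence are omitted here:

```python
import numpy as np

def radio_speckle(power_map, looks=4, rng=None):
    """Multiplicative speckle: each pixel is scaled by a Gamma(looks, 1/looks)
    draw (mean 1), a standard multi-look radar speckle model."""
    rng = np.random.default_rng(rng)
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=power_map.shape)
    return power_map * speckle

clean = np.ones((64, 64))                      # hypothetical radar power image
noisy = radio_speckle(clean, looks=4, rng=0)
print(noisy.shape, round(noisy.mean(), 1))     # → (64, 64) 1.0
```

Because the noise is multiplicative with unit mean, the augmented image preserves the average power of the original, which is what makes such samples plausible training data.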
39

Li, Jindi, Kefeng Li, Guangyuan Zhang, Jiaqi Wang, Keming Li, and Yumin Yang. "Recognition of Dorsal Hand Vein in Small-Scale Sample Database Based on Fusion of ResNet and HOG Feature." Electronics 11, no. 17 (August 28, 2022): 2698. http://dx.doi.org/10.3390/electronics11172698.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
As artificial intelligence develops, deep learning algorithms are increasingly being used in the field of dorsal hand vein (DHV) recognition. However, deep learning requires large numbers of samples, and current DHV datasets contain few images. To solve this problem, we propose a method based on the fusion of ResNet and Histograms of Oriented Gradients (HOG) features, in which the shallow semantic information extracted by the primary convolution and the HOG features are fed into the residual structure of ResNet for full fusion and, finally, classification. By adding Gaussian noise, the North China University of Technology dataset, the Shandong University of Science and Technology dataset, and the Eastern Mediterranean University dataset are extended and combined to form a fused dataset. Our proposed method is applied to these datasets, and the experimental results show that it achieves good recognition rates on each of them. Importantly, we achieved a 93.47% recognition rate on the fused dataset, which was 2.31% and 26.08% higher than using ResNet or HOG alone.
40

Panda, Rameswar, Michele Merler, Mayoore S. Jaiswal, Hui Wu, Kandan Ramakrishnan, Ulrich Finkler, Chun-Fu Richard Chen, et al. "NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 9294–302. http://dx.doi.org/10.1609/aaai.v35i10.17121.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Neural Architecture Search (NAS) is an open and challenging problem in machine learning. While NAS offers great promise, the prohibitive computational demand of most of the existing NAS methods makes it difficult to directly search the architectures on large-scale tasks. The typical way of conducting large scale NAS is to search for an architectural building block on a small dataset (either using a proxy set from the large dataset or a completely different small scale dataset) and then transfer the block to a larger dataset. Despite a number of recent results that show the promise of transfer from proxy datasets, a comprehensive evaluation of different NAS methods studying the impact of different source datasets has not yet been addressed. In this work, we propose to analyze the architecture transferability of different NAS methods by performing a series of experiments on large scale benchmarks such as ImageNet1K and ImageNet22K. We find that: (i) The size and domain of the proxy set does not seem to influence architecture performance on the target dataset. On average, transfer performance of architectures searched using completely different small datasets (e.g., CIFAR10) perform similarly to the architectures searched directly on proxy target datasets. However, design of proxy sets has considerable impact on rankings of different NAS methods. (ii) While different NAS methods show similar performance on a source dataset (e.g., CIFAR10), they significantly differ on the transfer performance to a large dataset (e.g., ImageNet1K). (iii) Even on large datasets, random sampling baseline is very competitive, but the choice of the appropriate combination of proxy set and search strategy can provide significant improvement over it. We believe that our extensive empirical analysis will prove useful for future design of NAS algorithms.
41

Maack, Lennart, Lennart Holstein, and Alexander Schlaefer. "GANs for generation of synthetic ultrasound images from small datasets." Current Directions in Biomedical Engineering 8, no. 1 (July 1, 2022): 17–20. http://dx.doi.org/10.1515/cdbme-2022-0005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The task of medical image classification is increasingly supported by algorithms. Deep learning methods such as convolutional neural networks (CNNs) show superior performance in medical image analysis but need a high-quality training dataset with a large number of annotated samples. Particularly in the medical domain, such datasets are rare due to data privacy concerns or the lack of data sharing practices among institutes. Generative adversarial networks (GANs) are able to generate high-quality synthetic images. This work investigates the capabilities of different state-of-the-art GAN architectures in generating realistic breast ultrasound images when only a small amount of training data is available. In a second step, these synthetic images are used to augment the real ultrasound image dataset used for training CNNs. The training of both GANs and CNNs is conducted with systematically reduced dataset sizes. The GAN architectures are capable of generating realistic ultrasound images. GANs using data augmentation techniques outperform the baseline StyleGAN2 with respect to the Fréchet Inception Distance by up to 64.2%. CNN models trained with additional synthetic data outperform the baseline CNN model trained only on real data by up to 15.3% with respect to the F1 score, especially for datasets containing fewer than 100 images. In conclusion, GANs can successfully be used to generate synthetic ultrasound images of high quality and diversity, improve the classification performance of CNNs, and thus provide a benefit to computer-aided diagnostics.
42

Ahmed, Shouket Abdulrahman, Hazry Desa, and Abadal-Salam T. Hussain. "Aerial image semantic segmentation based on 3D fits a small dataset of 1D." IAES International Journal of Artificial Intelligence (IJ-AI) 12, no. 4 (December 1, 2023): 2048. http://dx.doi.org/10.11591/ijai.v12.i4.pp2048-2054.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Time restrictions and lack of precision demanded that the initial technique be abandoned. Even though the remaining datasets had fewer identified classes than initially planned for the study, the labels were more accurate. Because of the need for additional data, a single network cannot categorize all the essential elements in a picture, including bodies of water, roads, trees, buildings, and crops. However, the final network gains some invariance in detecting these classes under environmental changes due to the different geographic positions of the roads and buildings found in the final datasets, which could be valuable in future navigation research. At the moment, binary classifications of a single class are the only datasets that can be used for the semantic segmentation of aerial images. Even though some pictures have more than one classification, only images of roads and buildings were found in a significant number of samples. The building datasets were then pooled to produce a larger dataset so that the constructed models could gain some invariance to image location. Because of the massive disparity in sample size, road datasets needed to be integrated.
43

Ng, Wartini, Budiman Minasny, Brendan Malone, and Patrick Filippi. "In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra." PeerJ 6 (October 3, 2018): e5722. http://dx.doi.org/10.7717/peerj.5722.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Background: The use of visible-near infrared (vis-NIR) spectroscopy for rapid soil characterisation has gained a lot of interest in recent times. Soil spectral absorbance from the visible-infrared range can be calibrated using regression models to predict a set of soil properties. The accuracy of these regression models relies heavily on the calibration set. An optimum sample size and overall sample representativeness could further improve model performance. However, there is no guideline on which sampling method should be used for different dataset sizes. Methods: Here, we show that different sampling algorithms perform differently for different data sizes and different regression models (Cubist regression tree and Partial Least Squares Regression (PLSR)). We analysed the effect of three sampling algorithms: Kennard-Stone (KS), conditioned Latin Hypercube Sampling (cLHS) and k-means clustering (KM), against random sampling, on the prediction of up to five different soil properties (sand, clay, carbon content, cation exchange capacity and pH) on three datasets. These datasets have different coverages: a European continental dataset (LUCAS, n = 5,639), a regional dataset from Australia (Geeves, n = 379), and a local dataset from New South Wales, Australia (Hillston, n = 384). Calibration sample sizes ranging from 50 to 3,000 were derived and tested for the continental dataset, and from 50 to 200 samples for the regional and local datasets. Results: Overall, PLSR gives better predictions than the Cubist model for the various soil properties and is less sensitive to the choice of sampling algorithm. The KM algorithm is more representative in the larger dataset up to a certain calibration sample size. The KS algorithm appears to be more efficient (compared to random sampling) in small datasets; however, its prediction performance varied considerably between soil properties. The cLHS sampling algorithm is the most robust sampling method for multiple soil properties regardless of sample size. Discussion: Our results suggest that the optimum calibration sample size depends on how much generalization the model has to achieve. The use of a sampling algorithm is more beneficial for larger datasets than for smaller datasets, where only small improvements can be made. KM is suitable for large datasets, KS is efficient in small datasets but results can be variable, while cLHS is less affected by sample size.
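Of the sampling algorithms compared, KM selection is the easiest to sketch with scikit-learn: cluster the candidate spectra and keep the sample nearest each centroid. The random stand-in for spectra and the calibration size of 20 are illustrative assumptions; KS and cLHS are not shown:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_select(X, n_samples, seed=0):
    """KM sampling: cluster the data into n_samples clusters and take the
    point nearest each centroid, so the calibration set spans the data's
    structure rather than being drawn at random."""
    km = KMeans(n_clusters=n_samples, n_init=10, random_state=seed).fit(X)
    picked = []
    for k in range(n_samples):
        members = np.flatnonzero(km.labels_ == k)
        d = np.linalg.norm(X[members] - km.cluster_centers_[k], axis=1)
        picked.append(members[np.argmin(d)])
    return np.array(sorted(picked))

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                 # hypothetical stand-in for spectra
cal = kmeans_select(X, n_samples=20)
print(cal.size, np.unique(cal).size)           # → 20 20
```

The returned indices identify the calibration rows; the remaining rows would serve as the validation set when fitting PLSR or Cubist models.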
44

Zhang, Ruofan, Yi Wang, Ping Jiang, Jialiang Peng, and Hailin Chen. "IBSA_Net: A Network for Tomato Leaf Disease Identification Based on Transfer Learning with Small Samples." Applied Sciences 13, no. 7 (March 29, 2023): 4348. http://dx.doi.org/10.3390/app13074348.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Tomatoes are a crop of significant economic importance, and disease during growth poses a substantial threat to yield and quality. In this paper, we propose IBSA_Net, a tomato leaf disease recognition network that employs transfer learning and small sample data, while introducing the Shuffle Attention mechanism to enhance feature representation. The model is optimized by employing the IBMax module to increase the receptive field and adding the HardSwish function to the ConvBN layer to improve stability and speed. To address the challenge of poor generalization of models trained on public datasets to real environment datasets, we developed an improved PlantDoc++ dataset and utilized transfer learning to pre-train the model on PDDA and PlantVillage datasets. The results indicate that after pre-training on the PDDA dataset, IBSA_Net achieved a test accuracy of 0.946 on a real environment dataset, with an average precision, recall, and F1-score of 0.942, 0.944, and 0.943, respectively. Additionally, the effectiveness of IBSA_Net in other crops is verified. This study provides a dependable and effective method for recognizing tomato leaf diseases in real agricultural production environments, with the potential for application in other crops.
45

Mu, Lingli, Lina Xian, Lihong Li, Gang Liu, Mi Chen, and Wei Zhang. "YOLO-Crater Model for Small Crater Detection." Remote Sensing 15, no. 20 (October 20, 2023): 5040. http://dx.doi.org/10.3390/rs15205040.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Craters are the most prominent geomorphological features on the surfaces of celestial bodies and play a crucial role in studying the formation and evolution of those bodies, as well as in landing and planning for surface exploration. Currently, the main automatic crater detection models and datasets focus on the detection of large and medium craters. In this paper, we created 23 small lunar crater datasets for model training based on the Chang'E-2 (CE-2) DOM, DEM, Slope, and integrated data with 7 kinds of visualization stretching methods. We then propose the YOLO-Crater model for lunar and Martian small crater detection, adopting EIoU and VariFocal loss to address the crater sample imbalance problem and introducing a CBAM attention mechanism to mitigate interference from the complex extraterrestrial environment. The results show that the accuracy (P = 87.86%, R = 66.04%, and F1 = 75.41%) of the lunar YOLO-Crater model based on the DOM-MMS (Maximum-Minimum Stretching) dataset is the highest, outperforming the YOLOX model. The Martian YOLO-Crater, trained on the Martian dataset from the 2022 GeoAI Martian Challenge, achieves good performance with P = 88.37%, R = 69.25%, and F1 = 77.65%. This indicates that the YOLO-Crater model has strong transferability and generalization capability and can be applied to detect small craters on the Moon and other celestial bodies.
46

Shao, Ran, Xiao-Jun Bi, and Zheng Chen. "A novel hybrid transformer-CNN architecture for environmental microorganism classification." PLOS ONE 17, no. 11 (November 11, 2022): e0277557. http://dx.doi.org/10.1371/journal.pone.0277557.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The success of vision transformers (ViTs) has given rise to their application in classification tasks on small environmental microorganism (EM) datasets. However, due to the lack of multi-scale feature maps and local feature extraction capabilities, the pure transformer architecture cannot achieve good results on small EM datasets. In this work, a novel hybrid model is proposed by combining the transformer with a convolutional neural network (CNN). Compared to traditional ViTs and CNNs, the proposed model achieves state-of-the-art performance when trained on small EM datasets. This is accomplished in two ways. 1) Instead of the fixed-size feature maps of transformer-based designs, a hierarchical structure is adopted to obtain multi-scale feature maps. 2) Two new blocks are introduced into the transformer's two core sections, namely the convolutional parameter-sharing multi-head attention block and the local feed-forward network block. These designs allow the model to extract more local features than traditional transformers. In particular, for classification on the sixth version of the EM dataset (EMDS-6), the proposed model outperforms the baseline Xception by 6.7 percentage points, while being 60 times smaller in parameter size. In addition, the proposed model also generalizes well on the WHOI dataset (accuracy of 99%) and constitutes a fresh approach to the use of transformers for visual classification tasks based on small EM datasets.
47

Nguyen, Nhat-Duy, Tien Do, Thanh Duc Ngo, and Duy-Dinh Le. "An Evaluation of Deep Learning Methods for Small Object Detection." Journal of Electrical and Computer Engineering 2020 (April 27, 2020): 1–18. http://dx.doi.org/10.1155/2020/3189691.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Small object detection is an interesting topic in computer vision. With the rapid development of deep learning, it has drawn the attention of many researchers, who have proposed innovative approaches. These innovations comprise region proposals, divided grid cells, multiscale feature maps, and new loss functions. As a result, the performance of object detection has recently seen significant improvements. However, most state-of-the-art detectors, in both one-stage and two-stage approaches, have struggled with detecting small objects. In this study, we evaluate current state-of-the-art deep learning models from both approaches, including Fast RCNN, Faster RCNN, RetinaNet, and YOLOv3. We provide a thorough assessment of the advantages and limitations of each model. Specifically, we run models with different backbones on different datasets with multiscale objects to find out which types of objects are suitable for each model and backbone. Extensive empirical evaluation was conducted on two standard datasets, namely a small object dataset and a filtered dataset from PASCAL VOC 2007. Finally, comparative results and analyses are presented.
48

Liu, Tengjun, Ying Chen, and Wanxuan Gu. "Copyright-Certified Distillation Dataset: Distilling One Million Coins into One Bitcoin with Your Private Key." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 5 (June 26, 2023): 6458–66. http://dx.doi.org/10.1609/aaai.v37i5.25794.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The rapid development of neural network dataset distillation in recent years has provided new ideas in many areas such as continuous learning, neural network architecture search and privacy preservation. Dataset distillation is a very effective method to distill large training datasets into small data, thus ensuring that the test accuracy of models trained on their synthesized small datasets matches that of models trained on the full dataset. Thus, dataset distillation itself is commercially valuable, not only for reducing training costs, but also for compressing storage costs and significantly reducing the training costs of deep learning. However, copyright protection for dataset distillation has not been proposed yet, so we propose the first method to protect intellectual property by embedding watermarks in the dataset distillation process. Our approach not only popularizes the dataset distillation technique, but also authenticates the ownership of the distilled dataset by the models trained on that distilled dataset.
49

Finn, Michael P., Daniel R. Steinwand, Jason R. Trent, Robert A. Buehler, David M. Mattli, and Kristina Haruka Yamamoto. "A Program for Handling Map Projections of Small Scale Geospatial Raster Data." Cartographic Perspectives, no. 71 (September 24, 2012): 53–67. http://dx.doi.org/10.14714/cp71.61.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Scientists routinely accomplish small-scale geospatial modeling using raster datasets of global extent. Such use often requires the projection of global raster datasets onto a map or the reprojection from a given map projection associated with a dataset. The distortion characteristics of these projection transformations can have significant effects on modeling results. Distortions associated with the reprojection of global data are generally greater than distortions associated with reprojections of larger-scale, localized areas. The accuracy of areas in projected raster datasets of global extent is dependent on resolution. To address these problems of projection and the associated resampling that accompanies it, methods for framing the transformation space, direct point-to-point transformations rather than gridded transformation spaces, a solution to the wrap-around problem, and an approach to alternative resampling methods are presented. The implementations of these methods are provided in an open source software package called MapImage (or mapIMG, for short), which is designed to function on a variety of computer architectures.
50

MacKinnon, James G. "Inference with Large Clustered Datasets." Articles 92, no. 4 (July 12, 2017): 649–65. http://dx.doi.org/10.7202/1040501ar.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Inference using large datasets is not nearly as straightforward as conventional econometric theory suggests when the disturbances are clustered, even with very small intra-cluster correlations. The information contained in such a dataset grows much more slowly with the sample size than it would if the observations were independent. Moreover, inferences become increasingly unreliable as the dataset gets larger. These assertions are based on an extensive series of estimations undertaken using a large dataset taken from the U.S. Current Population Survey.
