Journal articles on the topic "Artificial datasets"

Follow this link to see other types of publications on the topic: Artificial datasets.

Cite a source in APA, MLA, Chicago, Harvard, and many other styles


Consult the top 50 journal articles for your research on the topic "Artificial datasets".

Next to each source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication as a .pdf and read the abstract (summary) of the work online, if it is present in the metadata.

Browse journal articles from many scientific fields and compile an accurate bibliography.

1

Serrano-Pérez, Jonathan, and L. Enrique Sucar. "Artificial datasets for hierarchical classification". Expert Systems with Applications 182 (November 2021): 115218. http://dx.doi.org/10.1016/j.eswa.2021.115218.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Lychev, Andrey V. "Synthetic Data Generation for Data Envelopment Analysis". Data 8, no. 10 (September 27, 2023): 146. http://dx.doi.org/10.3390/data8100146.

Abstract:
The paper is devoted to the problem of generating artificial datasets for data envelopment analysis (DEA), which can be used for testing DEA models and methods. In particular, papers that applied DEA to big data often used synthetic data generation to obtain large-scale datasets, because real datasets of large size available in the public domain are extremely rare. This paper proposes an algorithm which takes a real dataset as input and complements it with artificial efficient and inefficient units. The generation process extends the efficient part of the frontier by inserting artificial efficient units, keeping the original efficient frontier unchanged. For this purpose, the algorithm uses the assurance region method and consistently relaxes weight restrictions during the iterations. This approach produces synthetic datasets that are closer to real ones than those of other algorithms that generate data from scratch. The proposed algorithm is applied to a pair of small real-life datasets, which were expanded to 50K units as a result. Computational experiments show that the artificially generated DMUs preserve isotonicity and do not increase the collinearity of the original data as a whole.
3

Petráš, Jaroslav, Marek Pavlík, Ján Zbojovský, Ardian Hyseni, and Jozef Dudiak. "Benford’s Law in Electric Distribution Network". Mathematics 11, no. 18 (September 10, 2023): 3863. http://dx.doi.org/10.3390/math11183863.

Abstract:
Benford’s law can be used as a method to detect non-natural changes in datasets with certain properties; in our case, the dataset was collected from electricity metering devices. In this paper, we present the theoretical background behind this law. We applied Benford’s law first-digit probability distribution test to electricity metering datasets acquired from smart electricity meters, i.e., natural data of electricity consumption acquired during a specific time interval. We present the results of Benford’s law distribution for an original measured dataset with no artificial intervention, and a set of results for different kinds of affected datasets created by simulated artificial intervention. Comparing these two dataset types with each other and with the theoretical probability distribution provided proof that Benford’s law can be applied to this kind of data and that it can extract markers of artificial manipulation of a dataset. As presented in the results part of the article, non-affected datasets mostly deviate from the theoretical Benford’s law probabilities by less than 10%, rarely between 10% and 20%. On the other hand, simulated affected datasets show deviations mostly above 20%, often around 70%, and rarely below 20%, the latter only when a small part of the original dataset (10%) was affected, representing only a small magnitude of intervention.
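As a minimal sketch of the first-digit test described above (the function names and the percentage deviation measure are illustrative assumptions, not the authors' code), one can compare an empirical first-digit distribution against Benford's theoretical probabilities P(d) = log10(1 + 1/d):

```python
import math
from collections import Counter

def first_digit(x):
    """Most significant decimal digit of a nonzero number."""
    x = abs(x)
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

def benford_deviation(values):
    """Mean absolute deviation, in percent of the expected value, of the
    empirical first-digit distribution from Benford's P(d) = log10(1 + 1/d)."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    deviations = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts.get(d, 0) / n
        deviations.append(abs(observed - expected) / expected * 100)
    return sum(deviations) / len(deviations)
```

Log-uniformly spread data (a crude stand-in for "natural" consumption readings) yields a small deviation, while data whose first digits were manipulated to a single value yields a large one, mirroring the below-10% versus above-20% pattern reported in the abstract.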
4

Dasari, Kishore Babu, and Nagaraju Devarakonda. "TCP/UDP-Based Exploitation DDoS Attacks Detection Using AI Classification Algorithms with Common Uncorrelated Feature Subset Selected by Pearson, Spearman and Kendall Correlation Methods". Revue d'Intelligence Artificielle 36, no. 1 (February 28, 2022): 61–71. http://dx.doi.org/10.18280/ria.360107.

Abstract:
A Distributed Denial of Service (DDoS) attack is a serious cyber security attack that attempts to disrupt the availability principle of computer networks and information systems. It is critical to detect DDoS attacks quickly and accurately while using as little computing power as possible, in order to minimize damage and remain cost-efficient. This research proposes a fast, high-accuracy detection approach for Exploitation-based DDoS attacks that uses features selected by the proposed method. Experiments are carried out on the CICDDoS2019 Syn flood, UDP flood, and UDP-Lag datasets, as well as on a customized dataset constructed by combining the three. The Pearson, Spearman, and Kendall correlation techniques are used to find un-correlated feature subsets in each dataset, and the common un-correlated features among the three subsets are then selected. Classification techniques are applied to these common un-correlated features: the conventional classifiers Logistic regression, Decision tree, KNN, and Naive Bayes; the bagging classifier Random forest; the boosting classifiers AdaBoost and Gradient boost; and the neural-network-based classifier Multilayer perceptron. The performance of these classification algorithms is evaluated in terms of accuracy, precision, recall, F1-score, specificity, log loss, execution time, and K-fold cross-validation. Finally, the classification techniques are tested on the customized dataset restricted to the features common to all of the datasets' un-correlated feature subsets.
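The "common un-correlated feature subset" idea can be sketched roughly as follows; the greedy selection rule, the 0.8 threshold, and the function names are my own simplifications, not the paper's procedure (and the Spearman rank step below ignores ties for brevity):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation via numpy."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def rank(x):
    """0-based ranks (ties ignored for this sketch)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman = Pearson on ranks."""
    return pearson(rank(np.asarray(x)), rank(np.asarray(y)))

def kendall(x, y):
    """Naive O(n^2) Kendall tau (no tie correction)."""
    n = len(x)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            conc += s > 0
            disc += s < 0
    return (conc - disc) / (n * (n - 1) / 2)

def uncorrelated_subset(features, threshold=0.8, method=pearson):
    """Greedily keep features whose |correlation| with every
    already-kept feature stays below the threshold."""
    kept = []
    for name in features:
        if all(abs(method(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def common_uncorrelated(features, threshold=0.8):
    """Features kept by all three correlation methods, as in the paper's idea."""
    subsets = [set(uncorrelated_subset(features, threshold, m))
               for m in (pearson, spearman, kendall)]
    return sorted(set.intersection(*subsets))
```

For example, with features `a`, `b = 2a`, and an unrelated `c`, all three methods discard `b` and the common subset is `{a, c}`.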
5

Kusetogullari, Huseyin, Amir Yavariabdi, Abbas Cheddad, Håkan Grahn, and Johan Hall. "ARDIS: a Swedish historical handwritten digit dataset". Neural Computing and Applications 32, no. 21 (March 29, 2019): 16505–18. http://dx.doi.org/10.1007/s00521-019-04163-3.

Abstract:
This paper introduces a new image-based handwritten historical digit dataset named Arkiv Digital Sweden (ARDIS). The images in the ARDIS dataset are extracted from 15,000 Swedish church records which were written by different priests with various handwriting styles in the nineteenth and twentieth centuries. The constructed dataset consists of three single-digit datasets and one digit-string dataset. The digit-string dataset includes 10,000 samples in red-green-blue color space, whereas the other datasets contain 7600 single-digit images in different color spaces. An extensive analysis of machine learning methods on several digit datasets is carried out. Additionally, the correlation between ARDIS and the existing digit datasets Modified National Institute of Standards and Technology (MNIST) and US Postal Service (USPS) is investigated. Experimental results show that machine learning algorithms, including deep learning methods, provide low recognition accuracy, as they face difficulties when trained on existing datasets and tested on ARDIS. Accordingly, a convolutional neural network trained on MNIST and USPS and tested on ARDIS provides the highest accuracies, 58.80% and 35.44%, respectively. Consequently, the results reveal that machine learning methods trained on existing datasets have difficulty recognizing digits effectively on our dataset, which proves that the ARDIS dataset has unique characteristics. This dataset is publicly available for the research community to further advance handwritten digit recognition algorithms.
6

Morgan, Maria, Carla Blank, and Raed Seetan. "Plant disease prediction using classification algorithms". IAES International Journal of Artificial Intelligence (IJ-AI) 10, no. 1 (March 1, 2021): 257. http://dx.doi.org/10.11591/ijai.v10.i1.pp257-264.

Abstract:
This paper investigates the capability of six existing classification algorithms (Artificial Neural Network, Naïve Bayes, k-Nearest Neighbor, Support Vector Machine, Decision Tree and Random Forest) in classifying and predicting diseases in soybean and mushroom datasets using datasets with numerical or categorical attributes. While many similar studies have been conducted on datasets of images to predict plant diseases, the main objective of this study is to suggest classification methods that can be used for disease classification and prediction in datasets that contain raw measurements instead of images. A fungus and a plant dataset, which had many differences, were chosen so that the findings in this paper could be applied to future research for disease prediction and classification in a variety of datasets which contain raw measurements. A key difference between the two datasets, other than one being a fungus and one being a plant, is that the mushroom dataset is balanced and only contained two classes while the soybean dataset is imbalanced and contained eighteen classes. All six algorithms performed well on the mushroom dataset, while the Artificial Neural Network and k-Nearest Neighbor algorithms performed best on the soybean dataset. The findings of this paper can be applied to future research on disease classification and prediction in a variety of dataset types such as fungi, plants, humans, and animals.
7

Saul, Marcia, and Shahin Rostami. "Assessing performance of artificial neural networks and re-sampling techniques for healthcare datasets". Health Informatics Journal 28, no. 1 (January 2022): 146045822210871. http://dx.doi.org/10.1177/14604582221087109.

Abstract:
Re-sampling methods to solve class imbalance problems have been shown to improve classification accuracy by mitigating the bias introduced by differences in class size. However, it is possible that a model which uses a specific re-sampling technique prior to artificial neural network (ANN) training may not be suitable for aiding the classification of varied datasets from the healthcare industry. Five healthcare-related datasets were used across three re-sampling conditions: under-sampling, over-sampling and combi-sampling. Within each condition, different algorithmic approaches were applied to the dataset and the results were statistically analysed for a significant difference in ANN performance. The combi-sampling condition showed that four out of the five datasets did not show significant consistency for the optimal re-sampling technique between the f1-score and Area Under the Receiver Operating Characteristic Curve performance evaluation methods. In contrast, the over-sampling and under-sampling conditions showed that all five datasets put forward the same optimal algorithmic approach across performance evaluation methods. Furthermore, the optimal combi-sampling techniques (under-, over-sampling and convergence point) were found to be consistent across evaluation measures in only two of the five datasets. This study exemplifies how discrete ANN performances on datasets from the same industry can occur in two ways: the same re-sampling technique can generate varying ANN performance on different datasets, and different re-sampling techniques can generate varying ANN performance on the same dataset.
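The under- and over-sampling conditions can be illustrated with the simplest random re-samplers. This is a hedged sketch with illustrative function names; real studies would typically use a library such as imbalanced-learn, and combi-sampling methods (e.g. SMOTE plus Tomek links) are considerably more involved:

```python
import random

def over_sample(X, y, seed=0):
    """Random over-sampling: duplicate minority-class rows at random
    until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    Xr, yr = [], []
    for label, rows in by_class.items():
        rows = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        Xr.extend(rows)
        yr.extend([label] * target)
    return Xr, yr

def under_sample(X, y, seed=0):
    """Random under-sampling: drop majority-class rows at random
    down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = min(len(rows) for rows in by_class.values())
    Xr, yr = [], []
    for label, rows in by_class.items():
        Xr.extend(rng.sample(rows, target))
        yr.extend([label] * target)
    return Xr, yr
```

Either function returns a class-balanced training set; which condition helps ANN performance more is exactly the dataset-dependent question the study investigates.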
8

Gau, Michael-Lian, Huong-Yong Ting, Teck-Hock Toh, Pui-Ying Wong, Pei-Jun Woo, Su-Woan Wo, and Gek-Ling Tan. "Effectiveness of Using Artificial Intelligence for Early Child Development Screening". Green Intelligent Systems and Applications 3, no. 1 (May 9, 2023): 1–13. http://dx.doi.org/10.53623/gisa.v3i1.229.

Abstract:
This study presents a novel approach to recognizing emotions in infants using machine learning models. To address the lack of infant-specific datasets, a custom dataset of infants' faces was created by extracting images from the AffectNet dataset. The dataset was then used to train various machine learning models with different parameters. The best-performing model was evaluated on the City Infant Faces dataset. The proposed deep learning model achieved an accuracy of 94.63% in recognizing positive, negative, and neutral facial expressions. These results provide a benchmark for the performance of machine learning models in infant emotion recognition and suggest potential applications in developing emotion-sensitive technologies for infants. This study fills a gap in the literature on emotion recognition, which has largely focused on adults or children and highlights the importance of developing infant-specific datasets and evaluating different parameters to achieve accurate results.
9

GHAFFARI, REZA, IOAN GROSU, DACIANA ILIESCU, EVOR HINES, and MARK LEESON. "DIMENSIONALITY REDUCTION FOR SENSORY DATASETS BASED ON MASTER–SLAVE SYNCHRONIZATION OF LORENZ SYSTEM". International Journal of Bifurcation and Chaos 23, no. 05 (May 2013): 1330013. http://dx.doi.org/10.1142/s0218127413300139.

Abstract:
In this study, we propose a novel method for reducing the attributes of sensory datasets using Master–Slave Synchronization of chaotic Lorenz Systems (DPSMS). As part of the performance testing, three benchmark datasets and one Electronic Nose (EN) sensory dataset with 3 to 13 attributes were presented to our algorithm to be projected into two attributes. The DPSMS-processed datasets were then used as input vectors to four artificial intelligence classifiers, namely Feed-Forward Artificial Neural Networks (FFANN), Multilayer Perceptron (MLP), Decision Tree (DT), and K-Nearest Neighbor (KNN). The performance of the classifiers was then evaluated using the original and reduced datasets. Classification rates of 94.5%, 89%, 94.5%, and 82% were achieved when the reduced Fisher's iris, crab gender, breast cancer, and electronic nose test datasets were presented to the above classifiers.
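The master-slave synchronization underlying DPSMS can be illustrated with a Pecora-Carroll x-driven pair of Lorenz systems. The forward-Euler scheme, parameters, and error measure below are conventional textbook choices, not the paper's implementation:

```python
def lorenz_step(x, y, z, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the classic Lorenz system."""
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def synchronize(steps=4000, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Run a master Lorenz system and a slave (y, z) subsystem driven by
    the master's x signal; return the final synchronization error."""
    xm, ym, zm = 1.0, 1.0, 1.0      # master state
    ys, zs = -5.0, 20.0             # slave state, started far from the master
    for _ in range(steps):
        xm, ym, zm = lorenz_step(xm, ym, zm, dt, sigma, rho, beta)
        # The slave replaces its own x with the master's x (Pecora-Carroll drive).
        ys_new = ys + dt * (xm * (rho - zs) - ys)
        zs_new = zs + dt * (xm * ys - beta * zs)
        ys, zs = ys_new, zs_new
    return abs(ym - ys) + abs(zm - zs)
```

Because the conditional Lyapunov exponents of the x-driven subsystem are negative, the slave's (y, z) trajectory collapses onto the master's despite the very different initial conditions, which is the dynamical property the projection method exploits.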
10

Pavlov, Nikolay A., Anna E. Andreychenko, Anton V. Vladzymyrskyy, Anush A. Revazyan, Yury S. Kirpichev, and Sergey P. Morozov. "Reference medical datasets (MosMedData) for independent external evaluation of algorithms based on artificial intelligence in diagnostics". Digital Diagnostics 2, no. 1 (April 30, 2021): 49–66. http://dx.doi.org/10.17816/dd60635.

Abstract:
The article describes a novel approach to creating annotated medical datasets for testing artificial intelligence-based diagnostic solutions. Four stages of dataset formation are described: planning, selection of initial data, marking and verification, and documentation. Examples of datasets created using the described methods are also given. The technique is scalable and versatile, and it can be applied to other areas of medicine and healthcare that are being automated and developed using artificial intelligence and big data technologies.
11

Akgül, İsmail, Volkan Kaya, and Özge Zencir Tanır. "A novel hybrid system for automatic detection of fish quality from eye and gill color characteristics using transfer learning technique". PLOS ONE 18, no. 4 (April 25, 2023): e0284804. http://dx.doi.org/10.1371/journal.pone.0284804.

Abstract:
Fish remains one of the most essential sources of nutrients for the body, as it contains protein and polyunsaturated fatty acids. It is extremely important to choose fish according to the season and the freshness of the fish to be purchased, yet it is very difficult to distinguish non-fresh fish from fresh fish mixed together on fish stalls. In addition to the traditional methods used to determine meat freshness, significant success has been achieved in studies on fresh fish detection with artificial intelligence techniques. In this study, two different types of fish (anchovy and horse mackerel) were used to determine fish freshness with convolutional neural networks, one of the artificial intelligence techniques. Images of fresh and non-fresh fish were taken, and two new datasets (Dataset1: Anchovy, Dataset2: Horse mackerel) were created. A novel hybrid model structure is proposed to determine fish freshness using the fish eye and gill regions on these two datasets. The proposed model uses the Yolo-v5, Inception-ResNet-v2, and Xception model structures through transfer learning. Both the Yolo-v5 + Inception-ResNet-v2 (Dataset1: 97.67%, Dataset2: 96.0%) and Yolo-v5 + Xception (Dataset1: 88.00%, Dataset2: 94.67%) hybrid models successfully detected whether the fish were fresh. The proposed model will make an important contribution to future studies on fish freshness across different storage days and on the estimation of fish size.
12

Vasilev, Y. A., T. M. Bobrovskaya, K. M. Arzamasov, S. F. Chetverikov, A. V. Vladzymyrskyy, O. V. Omelyanskaya, A. E. Andreychenko, N. A. Pavlov, and L. N. Anishchenko. "Medical datasets for machine learning: fundamental principles of standartization and systematization". Manager Zdravookhranenia, no. 4 (June 7, 2023): 28–41. http://dx.doi.org/10.21045/1811-0185-2023-4-28-41.

Abstract:
Background: The active implementation of artificial intelligence technologies in healthcare in recent years has increased the amount of medical data available for the development of machine learning models, including radiology and instrumental diagnostics data. To solve various problems of digital medical technologies, new datasets are being created through machine learning algorithms; therefore, the problems of their systematization and standardization, storage, access, and rational and safe use become pressing. Aim: to develop an approach to the systematization and standardization of information about datasets in order to represent, store, apply, and optimize the use of datasets, and to ensure the safety and transparency of the development and testing of medical devices using artificial intelligence. Materials and methods: analysis of our own and international experience in the creation and use of medical datasets; search and analysis of medical reference books; development and justification of the registry structure; search of scientific publications with the keywords "datasets" and "registry of medical data" in the RSCI, Scopus, and Web of Science databases. Results: The structure of the register of medical instrumental-diagnostics datasets has been developed in accordance with the stages of the dataset lifecycle: 7 parameters at the initiation stage, 8 at the planning stage, 70 in the dataset card, 1 for version changes, and 14 at the use stage, for a total of 100 parameters. We propose a classification of datasets according to the purpose of their creation, a classification of data verification methods, and principles for forming names to standardize and clarify dataset presentation. In addition, the main features of maintaining this registry are highlighted: management, data quality, confidentiality, and security. Conclusions: For the first time, an original technology for structuring and systematizing medical instrumental-diagnostics datasets is proposed. It is based on the developed terminology and principles of information classification. This makes it possible to standardize the structure of information about datasets for machine learning and ensures centralized storage. It also allows quick access to all information about a dataset, and ensures the transparency, reliability, and reproducibility of artificial intelligence developments. Creating a registry makes it possible to quickly form visual data libraries, allowing a wide range of researchers, developers, and companies to choose datasets for their tasks. This approach ensures their widespread use, optimizes resources, and contributes to the rapid development and implementation of artificial intelligence.
13

Xie, Ning-Ning, Fang-Fang Wang, Jue Zhou, Chang Liu, and Fan Qu. "Establishment and Analysis of a Combined Diagnostic Model of Polycystic Ovary Syndrome with Random Forest and Artificial Neural Network". BioMed Research International 2020 (August 20, 2020): 1–13. http://dx.doi.org/10.1155/2020/2613091.

Abstract:
Polycystic ovary syndrome (PCOS) is one of the most common metabolic and reproductive endocrinopathies. However, few studies have tried to develop a diagnostic model based on gene biomarkers. In this study, we applied a computational method combining two machine learning algorithms, random forest (RF) and artificial neural network (ANN), to identify gene biomarkers and construct a diagnostic model. We collected gene expression data from the Gene Expression Omnibus (GEO) database containing 76 PCOS samples and 57 normal samples; five datasets were utilized, including one dataset for screening differentially expressed genes (DEGs), two training datasets, and two validation datasets. Firstly, based on RF, 12 key genes among 264 DEGs were identified as vital for the classification of PCOS and normal samples. Moreover, the weights of these key genes were calculated using ANN with the microarray and RNA-seq training datasets, respectively. Furthermore, diagnostic models for the two types of datasets were developed and named neuralPCOS. Finally, two validation datasets were used to test and compare the performance of neuralPCOS with two other sets of marker genes by area under the curve (AUC). Our model achieved an AUC of 0.7273 on the microarray dataset, and 0.6488 on the RNA-seq dataset. To conclude, we uncovered gene biomarkers and developed a novel diagnostic model of PCOS, which would be helpful for diagnosis.
14

Antczak, Karol. "On regularization properties of artificial datasets for deep learning". Computer Science and Mathematical Modelling, no. 9/2019 (November 30, 2019): 13–18. http://dx.doi.org/10.5604/01.3001.0013.6599.

Abstract:
The paper discusses the regularization properties of artificial data for deep learning. Artificial datasets make it possible to train neural networks in the case of a real data shortage. It is demonstrated that the artificial data generation process, described as injecting noise into high-level features, bears several similarities to existing regularization methods for deep neural networks. One can treat this property of artificial data as a kind of "deep" regularization. It is thus possible to regularize hidden layers of the network by generating the training data in a certain way.
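A hypothetical sketch of the "inject noise into features" view of artificial data generation (the function and its parameters are illustrative, not the paper's method): noisy copies of each feature vector are added to the training set while labels are reused, which acts like a data-space regularizer when real data is scarce.

```python
import random

def augment_with_noise(features, labels, copies=3, sigma=0.1, seed=0):
    """Create an artificial dataset by injecting Gaussian noise into each
    feature vector `copies` times; labels are reused unchanged. The
    original rows are kept at the front of the returned lists."""
    rng = random.Random(seed)
    aug_X, aug_y = list(features), list(labels)
    for _ in range(copies):
        for x, y in zip(features, labels):
            aug_X.append([v + rng.gauss(0.0, sigma) for v in x])
            aug_y.append(y)
    return aug_X, aug_y
```

Training on the augmented set penalizes decision boundaries that are sensitive to small feature perturbations, which is the similarity to conventional regularizers that the paper develops for noise injected at the level of high-level (hidden-layer) features.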
15

Mathur, Varoon, Caitlin Lustig, and Elizabeth Kaziunas. "Disordering Datasets". Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (November 7, 2022): 1–33. http://dx.doi.org/10.1145/3555141.

Abstract:
The application of artificial intelligence (AI) to the behavioral health domain has led to a growing interest in the use of machine learning (ML) techniques to identify patterns in people's personal data with the goal of detecting, and even predicting, conditions such as depression, bipolar disorder, and schizophrenia. This paper investigates the data science practices and design narratives that underlie AI-mediated behavioral health through a situational analysis of three natural language processing (NLP) training datasets. Examining datasets as sociotechnical systems inextricably connected to particular social worlds, discourses, and infrastructural arrangements, we identify several misalignments between the technical project of dataset construction and benchmarking (a current focus of AI research in the behavioral health domain) and the social complexity of behavioral health. Our study contributes to a growing critical CSCW literature on AI systems by articulating the sensitizing concept of disordering datasets, which aims to productively trouble dominant logics of AI/ML applications in behavioral health and to support researchers and designers in reflecting on their roles and responsibilities when working within this emerging and sensitive design space.
16

Alshayeb, Mohammad, and Mashaan A. Alshammari. "The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study". Inteligencia Artificial 24, no. 68 (October 26, 2021): 72–88. http://dx.doi.org/10.4114/intartif.vol24iss68pp72-88.

Abstract:
The ongoing development of computer systems requires massive software projects. Running the components of these huge projects for testing purposes might be a costly process; therefore, parameter estimation can be used instead. Software defect prediction models are crucial for software quality assurance. This study investigates the impact of dataset size and feature selection algorithms on software defect prediction models. We use two approaches to build software defect prediction models: a statistical approach and a machine learning approach with support vector machines (SVMs). The fault prediction model was built based on four datasets of different sizes. Additionally, four feature selection algorithms were used. We found that applying the SVM defect prediction model on datasets with a reduced number of measures as features may enhance the accuracy of the fault prediction model. Also, it directs the test effort to maintain the most influential set of metrics. We also found that the running time of the SVM fault prediction model is not consistent with dataset size. Therefore, having fewer metrics does not guarantee a shorter execution time. From the experiments, we found that dataset size has a direct influence on the SVM fault prediction model. However, reduced datasets performed the same or slightly lower than the original datasets.
17

Orelaja, Adeyinka, Chidubem Ejiofor, Samuel Sarpong, Success Imakuh, Christian Bassey, Iheanyichukwu Opara, Josiah Nii Armah Tettey, and Omolola Akinola. "Attribute-specific Cyberbullying Detection Using Artificial Intelligence". Journal of Electronic & Information Systems 6, no. 1 (February 28, 2024): 10–21. http://dx.doi.org/10.30564/jeis.v6i1.6206.

Abstract:
Cyberbullying, a pervasive issue in the digital age, poses threats to individuals’ well-being across various attributes such as religion, age, ethnicity, and gender. This research employs artificial intelligence to detect cyberbullying instances in Twitter data, utilizing both traditional and deep learning models. The study repurposes the Sentiment140 dataset, originally intended for sentiment analysis, for the nuanced task of cyberbullying detection. Ethical considerations guide the dataset transformation process, ensuring responsible AI development. The Naive Bayes algorithm demonstrates commendable precision, recall, and accuracy, showcasing its efficacy. The Bi-LSTM model, leveraging deep learning capabilities, exhibits nuanced cyberbullying detection. The study also underscores limitations, emphasizing the need for refined models and diverse datasets.
18

Wilde, Henry, Vincent Knight, and Jonathan Gillard. "Evolutionary dataset optimisation: learning algorithm quality through evolution". Applied Intelligence 50, no. 4 (December 27, 2019): 1172–91. http://dx.doi.org/10.1007/s10489-019-01592-4.

Abstract:
In this paper we propose a novel method for learning how algorithms perform. Classically, algorithms are compared on a finite number of existing (or newly simulated) benchmark datasets based on some fixed metrics. The algorithm(s) with the smallest value of this metric are chosen to be the ‘best performing’. We offer a new approach to flip this paradigm. We instead aim to gain a richer picture of the performance of an algorithm by generating artificial data through genetic evolution, the purpose of which is to create populations of datasets for which a particular algorithm performs well on a given metric. These datasets can be studied so as to learn what attributes lead to a particular progression of a given algorithm. Following a detailed description of the algorithm as well as a brief description of an open source implementation, a case study in clustering is presented. This case study demonstrates the performance and nuances of the method which we call Evolutionary Dataset Optimisation. In this study, a number of known properties about preferable datasets for the clustering algorithms known as k-means and DBSCAN are realised in the generated datasets.
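A toy version of the evolutionary idea, truncation selection plus Gaussian mutation over candidate 2-D datasets, might look like the sketch below; the real Evolutionary Dataset Optimisation method is considerably richer (crossover, shape mutation, diversity handling), and all names and parameters here are my own:

```python
import random

def evolve_datasets(fitness, n_points=30, pop_size=20, generations=40, seed=0):
    """Evolve small 2-D datasets so that `fitness` (smaller is better,
    e.g. a clustering algorithm's score on the dataset) improves.
    Returns (best dataset, best-fitness history per generation)."""
    rng = random.Random(seed)

    def random_dataset():
        return [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n_points)]

    def mutate(ds):
        # Jitter every point slightly to produce a child dataset.
        return [(x + rng.gauss(0, 0.05), y + rng.gauss(0, 0.05)) for x, y in ds]

    pop = [random_dataset() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]          # truncation selection
        pop = parents + [mutate(rng.choice(parents)) for _ in parents]
    return min(pop, key=fitness), history
```

Because the parents survive into every next generation, the best fitness in the history is non-increasing; studying the surviving datasets is what reveals which attributes an algorithm "prefers".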
19

Harper, F. Maxwell, and Joseph A. Konstan. "The MovieLens Datasets". ACM Transactions on Interactive Intelligent Systems 5, no. 4 (January 7, 2016): 1–19. http://dx.doi.org/10.1145/2827872.

20

Sgantzos, Konstantinos, and Ian Grigg. "Artificial Intelligence Implementations on the Blockchain. Use Cases and Future Applications". Future Internet 11, no. 8 (August 2, 2019): 170. http://dx.doi.org/10.3390/fi11080170.

Abstract:
An exemplary paradigm of how an AI can be a disruptive technological paragon via the utilization of blockchain comes straight from the world of deep learning. Data scientists have long struggled to maintain the quality of a dataset for machine learning by an AI entity. Datasets can be very expensive to purchase, as, depending on both the proper selection of the elements and the homogeneity of the data contained within, constructing and maintaining the integrity of a dataset is difficult. Blockchain as a highly secure storage medium presents a technological quantum leap in maintaining data integrity. Furthermore, blockchain’s immutability constructs a fruitful environment for creating high quality, permanent and growing datasets for deep learning. The combination of AI and blockchain could impact fields like Internet of things (IoT), identity, financial markets, civil governance, smart cities, small communities, supply chains, personalized medicine and other fields, and thereby deliver benefits to many people.
21

Ansari, Shaheer, Afida Ayob, Molla Shahadat Hossain Lipu, Aini Hussain, and Mohamad Hanif Md Saad. "Multi-Channel Profile Based Artificial Neural Network Approach for Remaining Useful Life Prediction of Electric Vehicle Lithium-Ion Batteries". Energies 14, no. 22 (November 11, 2021): 7521. http://dx.doi.org/10.3390/en14227521.

Abstract:
Remaining useful life (RUL) is a crucial assessment indicator to evaluate battery efficiency, robustness, and accuracy by determining battery failure occurrence in electric vehicle (EV) applications. RUL prediction is necessary for timely maintenance and replacement of the battery in EVs. This paper proposes an artificial neural network (ANN) technique to predict the RUL of lithium-ion batteries under various training datasets. A multi-channel input (MCI) profile is implemented and compared with single-channel input (SCI) or single input (SI) with diverse datasets. A NASA battery dataset is utilized and systematic sampling is implemented to extract 10 sample values of voltage, current, and temperature at equal intervals from each charging cycle to reconstitute the input training profile. The experimental results demonstrate that MCI profile-based RUL prediction is highly accurate compared to SCI profile under diverse datasets. It is reported that RMSE for the proposed MCI profile-based ANN technique is 0.0819 compared to 0.5130 with SCI profile for the B0005 battery dataset. Moreover, RMSE is higher when the proposed model is trained with two datasets and one dataset, respectively. Additionally, the importance of capacity regeneration phenomena in batteries B0006 and B0018 to predict battery RUL is investigated. The results demonstrate that RMSE for the testing battery dataset B0005 is 3.7092, 3.9373 when trained with B0006, B0018, respectively, while it is 3.3678 when trained with B0007 due to the effect of capacity regeneration in B0006 and B0018 battery datasets.
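The systematic sampling step described above (10 values at equal intervals from each charging cycle) can be sketched as follows; the function name and indexing scheme are illustrative assumptions, not the authors' code:

```python
def systematic_sample(signal, k=10):
    """Pick k values at (approximately) equal intervals from a
    charging-cycle signal such as voltage, current, or temperature,
    to reconstitute a fixed-length input profile for the ANN."""
    n = len(signal)
    if n < k:
        raise ValueError("signal shorter than the requested sample count")
    step = n / k
    return [signal[int(i * step)] for i in range(k)]
```

Applying this to the voltage, current, and temperature traces of one cycle and concatenating the three 10-value vectors would give one multi-channel input row, in the spirit of the MCI profile.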
22

Knoblock, Craig A., and Pedro Szekely. "Exploiting Semantics for Big Data Integration". AI Magazine 36, no. 1 (March 25, 2015): 25–38. http://dx.doi.org/10.1609/aimag.v36i1.2565.

Abstract:
There is a great deal of interest in big data, focusing mostly on dataset size. An equally important dimension of big data is variety, where the focus is to process highly heterogeneous datasets. We describe how we use semantics to address the problem of big data variety. We also describe Karma, a system that implements our approach and show how Karma can be applied to integrate data in the cultural heritage domain. In this use case, Karma integrates data across many museums even though the datasets from different museums are highly heterogeneous.
23

Chiang, Cheng-Han, and Hung-yi Lee. "On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10518–25. http://dx.doi.org/10.1609/aaai.v36i10.21295.

Abstract:
Pre-training language models (LMs) on large-scale unlabeled text data enables them to achieve exceptional downstream performance far more easily than counterparts trained directly on the downstream tasks. In this work, we study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks. We propose to use artificially constructed datasets as the pre-training data to exclude the effect of semantics, and further control what characteristics the pre-training corpora have. By fine-tuning the pre-trained models on the GLUE benchmark, we can learn how beneficial it is to transfer the knowledge from a model trained on a dataset possessing a specific trait. We define and discuss three different characteristics of the artificial datasets: 1) matching the token uni-gram or bi-gram distribution between pre-training and downstream fine-tuning, 2) the presence of explicit dependencies among the tokens in a sequence, and 3) the length of the implicit dependencies among the tokens in a sequence. Our experiments show that the explicit dependencies in the sequences of the pre-training data are critical to the downstream performance. Our results also reveal that models achieve better downstream performance when pre-trained on a dataset with a longer range of implicit dependencies. Based on our analysis, we find that models pre-trained with artificial datasets are prone to learning spurious correlations in downstream tasks. Our work reveals that even if LMs are not pre-trained on natural language, they still gain transferability on certain human-language downstream tasks once they learn to model the token dependencies in the sequences. This result helps us understand the exceptional transferability of pre-trained LMs.
24

Chen, M., V. Rotemberg, J. Lester, R. Novoa, A. Chiou, and R. Daneshjou. "662 Evaluation of diagnosis diversity in artificial intelligence datasets". Journal of Investigative Dermatology 142, no. 8 (August 2022): S114. http://dx.doi.org/10.1016/j.jid.2022.05.673.
25

Wahid, Kareem A., Enrico Glerean, Jaakko Sahlsten, Joel Jaskari, Kimmo Kaski, Mohamed A. Naser, Renjie He, Abdallah S. R. Mohamed, and Clifton D. Fuller. "Artificial Intelligence for Radiation Oncology Applications Using Public Datasets". Seminars in Radiation Oncology 32, no. 4 (October 2022): 400–414. http://dx.doi.org/10.1016/j.semradonc.2022.06.009.
26

Mesquita, Diego P. P., João Paulo P. Gomes, and Leonardo R. Rodrigues. "Artificial Neural Networks with Random Weights for Incomplete Datasets". Neural Processing Letters 50, no. 3 (March 6, 2019): 2345–72. http://dx.doi.org/10.1007/s11063-019-10012-0.
27

Altman, RB. "Artificial intelligence (AI) systems for interpreting complex medical datasets". Clinical Pharmacology & Therapeutics 101, no. 5 (March 17, 2017): 585–86. http://dx.doi.org/10.1002/cpt.650.
28

Uma, Alexandra N., Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, and Massimo Poesio. "Learning from Disagreement: A Survey". Journal of Artificial Intelligence Research 72 (December 27, 2021): 1385–470. http://dx.doi.org/10.1613/jair.1.12752.

Abstract:
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation. 
All datasets and models employed in this paper are freely available as supplementary materials.
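The survey's comparison of training from aggregated gold labels versus soft labels can be illustrated with a small sketch (the helper names and data are hypothetical; the loss computation is ordinary cross-entropy, not code from the survey):

```python
import math

def cross_entropy(pred, target):
    """Cross-entropy between a predicted distribution and a target
    distribution over the same label set."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

def soft_labels(judgments, n_labels):
    """Turn per-item annotator judgments into a soft label distribution."""
    counts = [0] * n_labels
    for j in judgments:
        counts[j] += 1
    return [c / len(judgments) for c in counts]

# Six annotators disagree 4-2 on a binary item
target = soft_labels([0, 0, 0, 0, 1, 1], n_labels=2)     # [2/3, 1/3]
loss_soft = cross_entropy([0.7, 0.3], target)
# Majority-vote aggregation would instead train against [1.0, 0.0]
loss_hard = cross_entropy([0.7, 0.3], [1.0, 0.0])
```

With a 4-2 annotator split, the soft target retains the minority judgment that a majority-vote hard label discards, which is the information the surveyed methods exploit.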
29

Mehta, Harshkumar, and Kalpdrum Passi. "Social Media Hate Speech Detection Using Explainable Artificial Intelligence (XAI)". Algorithms 15, no. 8 (August 17, 2022): 291. http://dx.doi.org/10.3390/a15080291.

Abstract:
Explainable artificial intelligence (XAI) characteristics have flexible and multifaceted potential in hate speech detection by deep learning models. The aim of this research was to interpret and explain the decisions made by complex artificial intelligence (AI) models, in order to understand their decision-making process. As part of this research study, two datasets were used to demonstrate hate speech detection using XAI. Data preprocessing was performed to remove inconsistencies, clean the text of the tweets, and tokenize and lemmatize the text. Categorical variables were also simplified in order to generate a clean dataset for training purposes. Exploratory data analysis was performed on the datasets to uncover various patterns and insights. Various pre-existing models were applied to the Google Jigsaw dataset, such as decision trees, k-nearest neighbors, multinomial naïve Bayes, random forest, logistic regression, and long short-term memory (LSTM), among which LSTM achieved an accuracy of 97.6%. Explainable methods such as LIME (local interpretable model-agnostic explanations) were applied to the HateXplain dataset. Variants of the BERT (bidirectional encoder representations from transformers) model, such as BERT + ANN (artificial neural network) with an accuracy of 93.55% and BERT + MLP (multilayer perceptron) with an accuracy of 93.67%, were created to achieve good performance in terms of explainability under the ERASER (evaluating rationales and simple English reasoning) benchmark.
30

Aggarwal, Mukul, Amod Kumar Tiwari, and M. Partha Sarathi. "Comparative Analysis of Deep Learning Models on Brain Tumor Segmentation Datasets: BraTS 2015-2020 Datasets". Revue d'Intelligence Artificielle 36, no. 6 (December 31, 2022): 863–71. http://dx.doi.org/10.18280/ria.360606.

Abstract:
Deep learning neural networks have shown applicability to the segmentation of brain tumor images. This research is a comprehensive review of several deep learning neural networks, carried out on the standard Multimodal Brain Tumor Segmentation (BraTS) datasets. The paper summarizes the performance of various deep learning neural network algorithms on the BraTS datasets, comparing them against baseline models on attributes such as Dice score, PPV, and sensitivity. It was found that, of the different models applied to the BraTS 2015 dataset, a GAN algorithm from 2020 showed the best results: a GAN architecture termed RescueNet gave the best segmentation results, with a 0.94 Dice score and 0.88 sensitivity. It was also observed that cascaded deep learning models use independent models at each stage, with no correlation among the stages, which can cause class imbalance; attention models try to solve this class imbalance problem in the brain tumor segmentation task. This work also found that existing CNNs have overfitting issues; to address this, ResNet models can add skip connections parallel to the CNN layers to accomplish better outcomes for the brain tumor segmentation task.
31

Dewangan, Neha, Kavita Thakur, Sunandan Mandal, and Bikesh Kumar Singh. "Time-Frequency Image-based Speech Emotion Recognition using Artificial Neural Network". Journal of Ravishankar University (PART-B) 36, no. 2 (December 31, 2023): 144–57. http://dx.doi.org/10.52228/jrub.2023-36-2-10.

Abstract:
Automatic Speech Emotion Recognition (ASER) is a state-of-the-art application in artificial intelligence. Speech recognition intelligence is employed in various applications such as digital assistance, security, and other human-machine interactive products. In the present work, three open-source acoustic datasets, namely SAVEE, RAVDESS, and EmoDB, have been utilized (Haq et al., 2008, Livingstone et al., 2005, Burkhardt et al., 2005). From these datasets, six emotions namely anger, disgust, fear, happy, neutral, and sad, are selected for automatic speech emotion recognition. Various types of algorithms are already reported for extracting emotional content from acoustic signals. This work proposes a time-frequency (t-f) image-based multiclass speech emotion classification model for the six emotions mentioned above. The proposed model extracts 472 grayscale image features from the t-f images of speech signals. The t-f image is a visual representation of the time component and frequency component at that time in the two-dimensional space, and differing colors show its amplitude. An artificial neural network-based multiclass machine learning approach is used to classify selected emotions. The experimental results show that the above-mentioned emotions' average classification accuracy (CA) of 88.6%, 85.5%, and 93.56% is achieved using SAVEE, RAVDESS, and EmoDB datasets, respectively. Also, an average CA of 83.44% has been achieved for the combination of all three datasets. The maximum reported average classification accuracy (CA) using spectrogram for SAVEE, RAVDESS, and EmoDB dataset is 87.8%, 79.5 %, and 83.4%, respectively (Wani et al., 2020, Mustaqeem and Kwon, 2019, Badshah et al., 2017). The proposed t-f image-based classification model shows improvement in average CA by 0.91%, 7.54%, and 12.18 % for SAVEE, RAVDESS, and EmoDB datasets, respectively. 
This study can be helpful in human-computer interface applications to detect emotions precisely from acoustic signals.
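As a toy illustration of extracting grayscale features from a time-frequency image, the sketch below computes a handful of simple intensity statistics (this is not the paper's 472-feature set; the `grayscale_features` helper and the synthetic image are hypothetical):

```python
import math

def grayscale_features(image):
    """Compute a few simple statistics from a grayscale time-frequency
    image, given as a 2-D list of intensities in [0, 255]: mean,
    standard deviation, and a coarse 4-bin intensity histogram."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    hist = [0] * 4
    for p in pixels:
        hist[min(int(p) * 4 // 256, 3)] += 1   # bin by intensity quartile
    return [mean, math.sqrt(var)] + [h / n for h in hist]

# Synthetic 32x32 "spectrogram" stand-in
tf_image = [[(i * j) % 256 for j in range(32)] for i in range(32)]
feats = grayscale_features(tf_image)   # [mean, std, bin0..bin3]
```

A real feature extractor would add texture and shape descriptors on top of such first-order statistics, but the pipeline shape is the same: t-f image in, fixed-length feature vector out, ANN classifier on top.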
32

Dognin, Pierre, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, and Brian Belgodere. "Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge". Journal of Artificial Intelligence Research 73 (January 31, 2022): 437–59. http://dx.doi.org/10.1613/jair.1.13113.

Abstract:
Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on these datasets limited as an assistive technology in real-world settings, such as helping visually impaired people navigate and accomplish everyday tasks. This gap motivated the introduction of the novel VizWiz dataset, which consists of images taken by the visually impaired and captions that have useful, task-oriented information. In an attempt to help the machine learning computer vision field realize its promise of producing technologies that have positive social impact, the curators of the VizWiz dataset host several competitions, including one for image captioning. This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems. This article appears in the special track on AI & Society.
33

Polymenis, Ioannis, Maryam Haroutunian, Rose Norman, and David Trodden. "Virtual Underwater Datasets for Autonomous Inspections". Journal of Marine Science and Engineering 10, no. 9 (September 13, 2022): 1289. http://dx.doi.org/10.3390/jmse10091289.

Abstract:
Underwater Vehicles have become more sophisticated, driven by the off-shore sector and the scientific community’s rapid advancements in underwater operations. Notably, many underwater tasks, including the assessment of subsea infrastructure, are performed with the assistance of Autonomous Underwater Vehicles (AUVs). There have been recent breakthroughs in Artificial Intelligence (AI) and, notably, Deep Learning (DL) models and applications, which have widespread usage in a variety of fields, including aerial unmanned vehicles, autonomous car navigation, and other applications. However, they are not as prevalent in underwater applications due to the difficulty of obtaining underwater datasets for a specific application. In this sense, the current study utilises recent advancements in the area of DL to construct a bespoke dataset generated from photographs of items captured in a laboratory environment. Generative Adversarial Networks (GANs) were utilised to translate the laboratory object dataset into the underwater domain by combining the collected images with photographs containing the underwater environment. The findings demonstrated the feasibility of creating such a dataset, since the resulting images closely resembled the real underwater environment when compared with real-world underwater ship hull images. Therefore, the artificial datasets of the underwater environment can overcome the difficulties arising from the limited access to real-world underwater images and are used to enhance underwater operations through underwater object image classification and detection.
34

Landes, Juergen, and Jon Williamson. "Objective Bayesian Nets for Integrating Consistent Datasets". Journal of Artificial Intelligence Research 74 (May 27, 2022): 393–458. http://dx.doi.org/10.1613/jair.1.13363.

Abstract:
This paper addresses a data integration problem: given several mutually consistent datasets each of which measures a subset of the variables of interest, how can one construct a probabilistic model that fits the data and gives reasonable answers to questions which are under-determined by the data? Here we show how to obtain a Bayesian network model which represents the unique probability function that agrees with the probability distributions measured by the datasets and otherwise has maximum entropy. We provide a general algorithm, OBN-cDS, which offers substantial efficiency savings over the standard brute-force approach to determining the maximum entropy probability function. Furthermore, we develop modifications to the general algorithm which enable further efficiency savings but which are only applicable in particular situations. We show that there are circumstances in which one can obtain the model (i) directly from the data; (ii) by solving algebraic problems; and (iii) by solving relatively simple independent optimisation problems.
35

Rodriguez-Baena, Domingo S. "Extracting and validating biclusters from binary datasets". AI Communications 26, no. 4 (2013): 417–18. http://dx.doi.org/10.3233/aic-130570.
36

Alfonso Perez, Gerardo, and Javier Caballero Villarraso. "Alzheimer Identification through DNA Methylation and Artificial Intelligence Techniques". Mathematics 9, no. 19 (October 4, 2021): 2482. http://dx.doi.org/10.3390/math9192482.

Abstract:
A nonlinear approach to identifying combinations of CpGs DNA methylation data, as biomarkers for Alzheimer (AD) disease, is presented in this paper. It will be shown that the presented algorithm can substantially reduce the amount of CpGs used while generating forecasts that are more accurate than using all the CpGs available. It is assumed that the process, in principle, can be non-linear; hence, a non-linear approach might be more appropriate. The proposed algorithm selects which CpGs to use as input data in a classification problem that tries to distinguish between patients suffering from AD and healthy control individuals. This type of classification problem is suitable for techniques, such as support vector machines. The algorithm was used both at a single dataset level, as well as using multiple datasets. Developing robust algorithms for multi-datasets is challenging, due to the impact that small differences in laboratory procedures have in the obtained data. The approach that was followed in the paper can be expanded to multiple datasets, allowing for a gradual more granular understanding of the underlying process. A 92% successful classification rate was obtained, using the proposed method, which is a higher value than the result obtained using all the CpGs available. This is likely due to the reduction in the dimensionality of the data obtained by the algorithm that, in turn, helps to reduce the risk of reaching a local minima.
37

Vobecký, Antonín, David Hurych, Michal Uřičář, Patrick Pérez, and Josef Sivic. "Artificial Dummies for Urban Dataset Augmentation". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2692–700. http://dx.doi.org/10.1609/aaai.v35i3.16373.

Abstract:
Existing datasets for training pedestrian detectors in images suffer from limited appearance and pose variation. The most challenging scenarios are rarely included because they are too difficult to capture due to safety reasons, or they are very unlikely to happen. The strict safety requirements in assisted and autonomous driving applications call for an extra high detection accuracy also in these rare situations. Having the ability to generate people images in arbitrary poses, with arbitrary appearances and embedded in different background scenes with varying illumination and weather conditions, is a crucial component for the development and testing of such applications. The contributions of this paper are three-fold. First, we describe an augmentation method for the controlled synthesis of urban scenes containing people, thus producing rare or never-seen situations. This is achieved with a data generator (called DummyNet) with disentangled control of the pose, the appearance, and the target background scene. Second, the proposed generator relies on novel network architecture and associated loss that takes into account the segmentation of the foreground person and its composition into the background scene. Finally, we demonstrate that the data generated by our DummyNet improve the performance of several existing person detectors across various datasets as well as in challenging situations, such as night-time conditions, where only a limited amount of training data is available. In the setup with only day-time data available, we improve the night-time detector by 17% log-average miss rate over the detector trained with the day-time data only.
38

Merdas, Hussam, and Ayad Mousa. "Forecasting Sales of Iraqi Dates Using Artificial Intelligence". Iraqi Journal of Intelligent Computing and Informatics (IJICI) 2, no. 2 (November 17, 2023): 130–45. http://dx.doi.org/10.52940/ijici.v2i2.47.

Abstract:
Iraq is considered one of the foremost countries in the world in exporting dates of all kinds. At present, this sector needs support and serious work to improve sales and provide the country's economy with more revenue. This study proposes building an integrated artificial intelligence model that predicts the quantities of dates that Iraq will produce in the coming years based on historical data, built around two main points. The first point is a comparison between three food datasets with different degrees of correlation among their features: the first dataset has high correlation, the second medium correlation, and the third weak correlation. The second point is the application of twelve machine learning algorithms, whose results are evaluated to obtain the best three. The model was applied to predict the quantities of dates that Iraq will produce over the next five years. The three selected algorithms gave the following results (Gradient Boosting: 99.51, Random Forest: 97.05, and Bagging Regressor: 98.54). This study constitutes a starting point for future studies regarding the process of choosing datasets, as well as the machine learning technique.
39

Kang, Myounghee, Takeshi Nakamura, and Akira Hamano. "A methodology for acoustic and geospatial analysis of diverse artificial-reef datasets". ICES Journal of Marine Science 68, no. 10 (September 2, 2011): 2210–21. http://dx.doi.org/10.1093/icesjms/fsr141.

Abstract:
A methodology is introduced for understanding fish-school characteristics around artificial reefs and for obtaining the quantitative relationship between geospatial datasets related to artificial-reef environments using a new geographic information system application. To describe the characteristics of fish schools (energetic, positional, morphological characteristics and dB difference range), acoustic data from two artificial reefs located off the coast of Shimonoseki, Yamaguchi prefecture, Japan, were used. To demonstrate the methodology of the geospatial analysis, diverse datasets on artificial reefs, such as fish-school characteristics, marine-environmental information from a conductivity, temperature, and depth sensor, information on artificial reefs, seabed geographic information, and sediment information around the reefs, were utilized. The habitat preference of fish schools was demonstrated quantitatively. The acoustic density of fish schools is described with respect to the closest distance from reefs and the preferred reef depths, the relationship between fish schools and environmental information was visualized in three dimensions, and the current condition of the reefs and their connection to seabed type is represented. This geospatial method of analysis can provide a better way of comprehensively understanding the circumstances around artificial-reef environments.
40

Agliari, Elena, Francesco Alemanno, Miriam Aquaro, Adriano Barra, Fabrizio Durante, and Ido Kanter. "Hebbian dreaming for small datasets". Neural Networks 173 (May 2024): 106174. http://dx.doi.org/10.1016/j.neunet.2024.106174.
41

Chiaia, Bernardino, and Valerio De Biagi. "Archetypal Use of Artificial Intelligence for Bridge Structural Monitoring". Applied Sciences 10, no. 20 (October 14, 2020): 7157. http://dx.doi.org/10.3390/app10207157.

Abstract:
Structural monitoring is a research topic that is receiving more and more attention, especially in light of the fact that a large part our infrastructural heritage was built in the Sixties and is aging and approaching the end of its design working life. The detection of damage is usually performed through artificial intelligence techniques. In contrast, tools for the localization and the estimation of the extent of the damage are limited, mainly due to the complete datasets of damages needed for training the system. The proposed approach consists in numerically generating datasets of damaged structures on the basis of random variables representing the actions and the possible damages. Neural networks were trained to perform the main structural monitoring tasks: damage detection, localization, and estimation. The artificial intelligence tool interpreted the measurements on a real structure. To simulate real measurements more accurately, noise was added to the synthetic dataset. The results indicate that the accuracy of the measurement devices plays a relevant role in the quality of the monitoring.
42

Kamp, R. G., and H. H. G. Savenije. "Optimising training data for ANNs with Genetic Algorithms". Hydrology and Earth System Sciences 10, no. 4 (September 7, 2006): 603–8. http://dx.doi.org/10.5194/hess-10-603-2006.

Abstract:
Artificial Neural Networks (ANNs) have proved to be good modelling tools in hydrology for rainfall-runoff and hydraulic flow modelling. Representative datasets are necessary for the training phase, in which the ANN learns the model's input-output relations, but good and representative training data are not always available. In this publication, Genetic Algorithms (GAs) are used to optimise training datasets. The approach is tested with an existing hydraulic model in The Netherlands: an initial training dataset is used to train the ANN, and after the training dataset is optimised with a GA, the ANN produces more accurate model results.
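The idea of using a GA to select which samples form the training dataset can be sketched with a toy regression problem. All names and data below are hypothetical, and the fitted model is a one-parameter linear fit rather than the study's ANN of a hydraulic model; only the GA-over-training-data pattern is the same:

```python
import random

random.seed(0)

# Toy data: y = 2x, with five biased "noisy" samples appended
data = [(float(x), 2.0 * x) for x in range(20)]
data += [(float(x), 2.0 * x + 8.0) for x in range(20, 25)]
val = [(float(x), 2.0 * x) for x in range(30, 40)]

def fit_slope(subset):
    """Least-squares slope through the origin for the selected samples."""
    num = sum(x * y for x, y in subset)
    den = sum(x * x for x, _ in subset) or 1.0
    return num / den

def fitness(mask):
    """Validation error of a model trained only on the masked-in samples."""
    subset = [d for d, keep in zip(data, mask) if keep]
    if len(subset) < 2:
        return float("inf")
    a = fit_slope(subset)
    return sum((a * x - y) ** 2 for x, y in val)

def evolve(pop_size=20, gens=30):
    """GA over binary masks selecting which samples form the training set."""
    pop = [[random.random() < 0.5 for _ in data] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(len(data))
            child = p1[:cut] + p2[cut:]         # one-point crossover
            i = random.randrange(len(data))
            child[i] = not child[i]             # point mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
```

The GA tends to exclude the biased samples because masks without them produce a model that generalises better to the validation data, which is the same criterion an optimised ANN training dataset is selected by.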
43

Perafan-Lopez, Juan Carlos, Valeria Lucía Ferrer-Gregory, César Nieto-Londoño, and Julián Sierra-Pérez. "Performance Analysis and Architecture of a Clustering Hybrid Algorithm Called FA+GA-DBSCAN Using Artificial Datasets". Entropy 24, no. 7 (June 25, 2022): 875. http://dx.doi.org/10.3390/e24070875.

Abstract:
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a widely used algorithm for exploratory clustering applications. Although the DBSCAN algorithm is considered an unsupervised pattern recognition method, it has two parameters that must be tuned prior to the clustering process in order to reduce uncertainties: the minimum number of points in a clustering segmentation, MinPts, and the radius around selected points from a specific dataset, Eps. This article presents the performance of a clustering hybrid algorithm for automatically grouping datasets in a two-dimensional space using the well-known DBSCAN algorithm. Here, the nearest-neighbor function and a genetic algorithm were used to automate the parameters MinPts and Eps. Furthermore, the Factor Analysis (FA) method was used for pre-processing through a dimensionality reduction of high-dimensional datasets with more than two dimensions. Finally, the performance of the clustering algorithm, called FA+GA-DBSCAN, was evaluated using artificial datasets. In addition, the precision and entropy of the clustering hybrid algorithm were measured, which showed a lower probability of error when clustering the most condensed datasets.
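A common nearest-neighbour heuristic for choosing Eps, in the spirit of automating DBSCAN's parameters, can be sketched as follows. The mean k-NN-distance rule here is a simplified stand-in for the paper's GA-driven search, and all helper names and data are hypothetical:

```python
import math

def kth_nn_distance(points, k):
    """Distance from each point to its k-th nearest neighbour."""
    dists = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(d[k - 1])
    return dists

def estimate_eps(points, min_pts):
    """Heuristic Eps estimate: the mean (min_pts - 1)-NN distance.
    A GA could instead search Eps/MinPts jointly against a clustering
    quality measure, as in the FA+GA-DBSCAN approach."""
    kd = kth_nn_distance(points, min_pts - 1)
    return sum(kd) / len(kd)

# Two compact artificial clusters, well separated
cluster_a = [(0.1 * i, 0.0) for i in range(10)]
cluster_b = [(5.0 + 0.1 * i, 5.0) for i in range(10)]
eps = estimate_eps(cluster_a + cluster_b, min_pts=4)
```

For points spaced 0.1 apart, the estimate lands near 0.2, i.e. small enough to keep the two clusters separate but large enough to connect neighbours within each cluster.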
44

Adolfo, Cid Mathew Santiago, Hassan Chizari, Thu Yein Win, and Salah Al-Majeed. "Sample Reduction for Physiological Data Analysis Using Principal Component Analysis in Artificial Neural Network". Applied Sciences 11, no. 17 (September 6, 2021): 8240. http://dx.doi.org/10.3390/app11178240.

Abstract:
With its potential, extensive data analysis is a vital part of biomedical applications and of medical practitioner interpretations, as data analysis ensures the integrity of multidimensional datasets and improves classification accuracy; however, with machine learning, the integrity of the sources is compromised when the acquired data pose a significant threat in diagnosing and analysing such information, such as by including noisy and biased samples in the multidimensional datasets. Removing noisy samples in dirty datasets is integral to and crucial in biomedical applications, such as the classification and prediction problems using artificial neural networks (ANNs) in the body’s physiological signal analysis. In this study, we developed a methodology to identify and remove noisy data from a dataset before addressing the classification problem of an artificial neural network (ANN) by proposing the use of the principal component analysis–sample reduction process (PCA–SRP) to improve its performance as a data-cleaning agent. We first discuss the theoretical background to this data-cleansing methodology in the classification problem of an artificial neural network (ANN). Then, we discuss how the PCA is used in data-cleansing techniques through a sample reduction process (SRP) using various publicly available biomedical datasets with different samples and feature sizes. Lastly, the cleaned datasets were tested through the following: PCA–SRP in ANN accuracy comparison testing, sensitivity vs. specificity testing, receiver operating characteristic (ROC) curve testing, and accuracy vs. additional random sample testing. The results show a significant improvement in the classification of ANNs using the developed methodology and suggested a recommended range of selectivity (Sc) factors for typical cleaning and ANN applications. 
Our approach successfully cleaned the noisy biomedical multidimensional datasets and yielded up to an 8% increase in accuracy with the aid of the Python language.
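The PCA-based sample reduction idea can be sketched in two dimensions, where the leading principal component has a closed form. This is an illustrative toy in the spirit of PCA-SRP, not the paper's procedure, and the `keep` fraction is a hypothetical stand-in for its selectivity (Sc) factor:

```python
import math

def principal_axis(points):
    """Leading principal component of 2-D data via the closed-form
    eigen-decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    theta = 0.5 * math.atan2(2 * b, a - c)   # orientation of the major axis
    return (mx, my), (math.cos(theta), math.sin(theta))

def sample_reduction(points, keep=0.9):
    """Drop the samples with the largest reconstruction error when
    projected onto the first principal component."""
    (mx, my), (ux, uy) = principal_axis(points)

    def recon_error(p):
        dx, dy = p[0] - mx, p[1] - my
        t = dx * ux + dy * uy            # coordinate along the axis
        return (dx - t * ux) ** 2 + (dy - t * uy) ** 2

    ranked = sorted(points, key=recon_error)
    return ranked[: max(1, int(keep * len(points)))]

# Line-like data with two off-line "noisy" samples
clean = [(float(i), 2.0 * i) for i in range(20)]
noisy = clean + [(5.0, 30.0), (15.0, -10.0)]
kept = sample_reduction(noisy, keep=0.9)
```

The two off-line points have by far the largest perpendicular distance to the principal axis, so they are the first samples discarded, which is the cleaning effect the PCA-SRP step aims for.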
45

Guha, Ritam, Manosij Ghosh, Pawan Kumar Singh, Ram Sarkar e Mita Nasipuri. "M-HMOGA: A New Multi-Objective Feature Selection Algorithm for Handwritten Numeral Classification". Journal of Intelligent Systems 29, n. 1 (14 giugno 2019): 1453–67. http://dx.doi.org/10.1515/jisys-2019-0064.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
The feature selection process is very important in the field of pattern recognition: it selects the informative features so as to reduce the curse of dimensionality, thus improving the overall classification accuracy. In this paper, a new feature selection approach named Memory-Based Histogram-Oriented Multi-objective Genetic Algorithm (M-HMOGA) is introduced to identify the informative feature subset to be used for a pattern classification problem. The proposed M-HMOGA approach is applied to two recently used feature sets, namely Mojette transform and Regional Weighted Run Length features. The experiments are carried out on Bangla, Devanagari, and Roman numeral datasets, which are the three most popular scripts used in the Indian subcontinent. In-house Bangla and Devanagari script datasets and the Competition on Handwritten Digit Recognition (HDRC) 2013 Roman numeral dataset are used for evaluating our model. Moreover, as proof of robustness, we have applied an innovative approach of using different datasets for training and testing. We have used in-house Bangla and Devanagari script datasets for training the model, and the trained model is then tested on Indian Statistical Institute numeral datasets. For Roman numerals, we have used the HDRC 2013 dataset for training and the Modified National Institute of Standards and Technology dataset for testing. Comparison of the results obtained by the proposed model with existing HMOGA and MOGA techniques clearly indicates the superiority of M-HMOGA over both of its ancestors. Moreover, the use of K-nearest neighbor as well as multi-layer perceptron classifiers speaks for the classifier-independent nature of M-HMOGA.
The proposed M-HMOGA model uses only about 45–50% of the total feature set to achieve around a 1% increase in classification ability when the same datasets are partitioned for training and testing, and a 2–3% increase while using only 35–45% of the features when different datasets are used for training and testing, compared with using all the features for classification.
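The multi-objective trade-off described above (higher accuracy versus fewer selected features) rests on Pareto dominance between candidate feature subsets. A minimal sketch of that core follows, assuming the two objectives are classification accuracy (maximised) and feature count (minimised); this is an illustration of the dominance relation, not the paper's M-HMOGA implementation.

```python
def dominates(a, b):
    """Pareto dominance for (accuracy, n_features) pairs:
    accuracy is maximised, feature count is minimised."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(solutions):
    """Return the non-dominated subset of (accuracy, n_features) pairs."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]
```

A genetic algorithm such as M-HMOGA evolves binary feature masks and keeps the non-dominated solutions; for example, a subset with equal accuracy but fewer features always dominates a larger one.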
46

Aliyari, Mostafa, and Yonas Zewdu Ayele. "Application of Artificial Neural Networks for Power Load Prediction in Critical Infrastructure: A Comparative Case Study". Applied System Innovation 6, no. 6 (30 November 2023): 115. http://dx.doi.org/10.3390/asi6060115.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
This article aims to assess the effectiveness of state-of-the-art artificial neural network (ANN) models in time series analysis, specifically focusing on their application in prediction tasks of critical infrastructures (CIs). To accomplish this, shallow models with nearly identical numbers of trainable parameters are constructed and examined. The dataset, which includes 120,884 hourly electricity consumption records, is divided into three subsets (25%, 50%, and the entire dataset) to examine the effect of increasing training data. Additionally, the same models are trained and evaluated for univariable and multivariable data to evaluate the impact of including more features. The case study specifically focuses on predicting electricity consumption using load information from Norway. The results of this study confirm that LSTM models emerge as the best-performing models, surpassing the others as data volume and the number of features increase. Notably, for training datasets ranging from 2,000 to 22,000 instances, GRU exhibits superior accuracy, while in the 22,000 to 42,000 range, LSTM and BiLSTM are the best. When the training dataset is within 42,000 to 360,000, LSTM and ConvLSTM prove to be good choices in terms of accuracy. Convolutional-based models exhibit superior performance in terms of computational efficiency. The convolutional 1D univariable model emerges as a standout choice for scenarios where training time is critical, sacrificing only 0.000105 in accuracy while gaining a threefold improvement in training time. For training datasets smaller than 22,000 instances, feature inclusion does not enhance any of the ANN models' performance. In datasets exceeding 22,000 instances, ANN models display no consistent pattern regarding feature inclusion, though LSTM, Conv1D, Conv2D, ConvLSTM, and FCN tend to benefit. BiLSTM, GRU, and Transformer do not benefit from feature inclusion, regardless of the training dataset size.
Moreover, Transformers exhibit inefficiency in time series forecasting due to their permutation-invariant self-attention mechanism, neglecting the crucial role of sequence order, as evidenced by their poor performance across all three datasets in this study. These results provide valuable insights into the capabilities of ANN models and their effective usage in the context of CI prediction tasks.
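The univariable setup described above feeds past hourly load records into each model, and turning the raw series into supervised samples is the common first step for all of them. A minimal sketch of that windowing, assuming hypothetical `lookback` and `horizon` parameters (not taken from the paper):

```python
import numpy as np

def make_windows(series, lookback, horizon=1):
    """Turn an hourly load series into supervised (X, y) pairs:
    each sample is `lookback` consecutive values, and the target is
    the value `horizon` steps after the window ends."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback + horizon - 1])
    return np.array(X), np.array(y)
```

The resulting `X` can be reshaped to `(samples, timesteps, features)` for recurrent models or fed as-is to a 1D convolutional model.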
47

Khan, Somaiya, and Ali Khan. "SkinViT: A transformer based method for Melanoma and Nonmelanoma classification". PLOS ONE 18, no. 12 (27 December 2023): e0295151. http://dx.doi.org/10.1371/journal.pone.0295151.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
Over the past few decades, skin cancer has emerged as a major global health concern. The efficacy of skin cancer treatment greatly depends upon early diagnosis and effective treatment. The automated classification of Melanoma and Nonmelanoma is a challenging task due to the presence of high visual similarities across different classes and variabilities within each class. To the best of our knowledge, this study represents the first classification of Melanoma and Nonmelanoma that places Basal Cell Carcinoma (BCC) and Squamous Cell Carcinoma (SCC) under the Nonmelanoma class. This research therefore focuses on the automated detection of different skin cancer types to assist dermatologists in the timely diagnosis and treatment of Melanoma and Nonmelanoma patients. Recently, artificial intelligence (AI) methods have gained popularity, with Convolutional Neural Networks (CNNs) employed to accurately classify various skin diseases. However, CNNs are limited in their ability to capture global contextual information, which may lead to missing important information. To address this issue, this research explores the outlook attention mechanism inspired by the vision outlooker, which enhances important features while suppressing noisy ones. The proposed SkinViT architecture integrates an outlooker block, a transformer block and an MLP head block to efficiently capture both fine-level and global features and thereby enhance the accuracy of Melanoma and Nonmelanoma classification. The proposed SkinViT method is assessed by different performance metrics such as recall, precision, classification accuracy, and F1 score. We performed extensive experiments on three datasets: Dataset1, extracted from ISIC2019; Dataset2, collected from various online dermatological databases; and Dataset3, which combines both. The proposed SkinViT achieved 0.9109 accuracy on Dataset1, 0.8911 accuracy on Dataset3 and 0.8611 accuracy on Dataset2.
Moreover, the proposed SkinViT method outperformed other SOTA models and displayed higher accuracy compared to the previous work in the literature. The proposed method demonstrated higher performance efficiency in classification of Melanoma and Nonmelanoma dermoscopic images. This work is expected to inspire further research in implementing a system for detecting skin cancer that can assist dermatologists in timely diagnosing Melanoma and Nonmelanoma patients.
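The performance metrics named above (recall, precision, accuracy and F1 score) follow directly from binary predictions; a minimal sketch, independent of the SkinViT code, treating Melanoma as the positive class (1) and Nonmelanoma as the negative class (0):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for binary predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": float(np.mean(y_true == y_pred))}
```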
48

Park, Min-Ho, Chang-Min Lee, Antony John Nyongesa, Hee-Joo Jang, Jae-Hyuk Choi, Jae-Jung Hur and Won-Ju Lee. "Prediction of Emission Characteristics of Generator Engine with Selective Catalytic Reduction Using Artificial Intelligence". Journal of Marine Science and Engineering 10, no. 8 (13 August 2022): 1118. http://dx.doi.org/10.3390/jmse10081118.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
Eco-friendliness is an important global issue, and the maritime field is no exception. Predicting the composition of exhaust gases emitted by ship engines will be of consequence in this respect. Therefore, in this study, exhaust gas data were collected from the generator engine of a real ship along with engine-related data to predict emission characteristics. This matters because installing an emission gas analyzer on a ship imposes a substantial economic burden, and even where one is installed, its accuracy can be augmented by a virtual sensor. Furthermore, data were obtained both with and without operating the SCR (often mounted on ships to reduce NOx), which is a crucial facility for satisfying environmental regulations. In this study, four types of datasets were created by adding cooling and electrical-related variables to the basic engine dataset to check whether they improve model performance; each of these datasets consisted of 15 to 26 input variables. CO2 (%), NOx (ppm), and tEx (°C) were predicted from each dataset using an artificial neural network (ANN) model and a support vector machine (SVM) model with optimal hyperparameters selected by trial and error. The results confirmed that the SVM model performed better than the ANN model on small datasets such as the one used in this study. Moreover, the dataset type DaCE, which added both cooling and electrical-related variables to the basic engine dataset, yielded the best overall prediction performance. When the performance of the SVM model was measured using the test data of DaCE in both no-SCR mode and SCR mode, the RMSE (R2) of CO2 was between 0.1137% (0.8119) and 0.0912% (0.8975), the RMSE (R2) of NOx was between 17.1088 ppm (0.9643) and 13.6775 ppm (0.9776), and the RMSE (R2) of tEx was between 4.5839 °C (0.8754) and 1.5688 °C (0.9392).
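The RMSE and R2 figures reported above are standard regression scores; a minimal sketch of how they are computed, not tied to the paper's models:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A perfect prediction gives RMSE 0 and R2 1; a constant offset of one unit gives RMSE 1 while R2 drops with the residual variance.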
49

Nwokoma, Faith, Justin Foreman and Cajetan M. Akujuobi. "Effective Data Reduction Using Discriminative Feature Selection Based on Principal Component Analysis". Machine Learning and Knowledge Extraction 6, no. 2 (3 April 2024): 789–99. http://dx.doi.org/10.3390/make6020037.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
Effective data reduction must retain the greatest possible amount of informative content of the data under examination. Feature selection is the default for dimensionality reduction, as the relevant features of a dataset are usually retained through this method. In this study, we used unsupervised learning to discover the top-k discriminative features present in the large multivariate IoT dataset used. We used the statistics of principal component analysis to filter the relevant features based on the ranks of the features along the principal directions while also considering the coefficients of the components. The selected number of principal components was used to decide the number of features to be selected in the SVD process. A number of experiments were conducted using different benchmark datasets, and the effectiveness of the proposed method was evaluated based on the reconstruction error. The potency of the results was verified by subjecting the algorithm to a large IoT dataset, and we compared the performance based on accuracy and reconstruction error to the results of the benchmark datasets. The performance evaluation showed consistency with the results obtained with the benchmark datasets, which were of high accuracy and low reconstruction error.
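Ranking features along the principal directions while considering the component coefficients, as described above, can be sketched as follows. The variance weighting and the `top_k_features` helper are assumptions made here for illustration; the paper's exact filter may differ.

```python
import numpy as np

def top_k_features(X, k, n_components=None):
    """Rank features by the magnitude of their coefficients in the
    leading principal components, weighted by explained variance,
    and return the indices of the top-k features."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    if n_components is None:
        n_components = k
    weights = s[:n_components] ** 2              # variance along each PC
    # Score each feature by its variance-weighted loading magnitudes.
    scores = np.abs(Vt[:n_components]).T @ weights
    return np.argsort(scores)[::-1][:k]
```

On data where one feature carries almost all the variance, that feature is ranked first.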
50

Douzas, Georgios, Maria Lechleitner and Fernando Bacao. "Improving the quality of predictive models in small data GSDOT: A new algorithm for generating synthetic data". PLOS ONE 17, no. 4 (7 April 2022): e0265626. http://dx.doi.org/10.1371/journal.pone.0265626.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
In the age of the data deluge, many domains and applications are still restricted to the use of small datasets. The ability to harness these small datasets to solve problems through supervised learning methods can have a significant impact in many important areas. The insufficient size of training data usually results in unsatisfactory performance of machine learning algorithms. The current research work aims to help mitigate the small data problem through the creation of artificial instances, which are added to the training process. The proposed algorithm, the Geometric Small Data Oversampling Technique, uses geometric regions around existing samples to generate new high-quality instances. Experimental results show a significant improvement in accuracy when compared with the use of the initial small dataset as well as with other popular artificial data generation techniques.
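Generating artificial instances in regions around existing samples, as described above, can be sketched with classic segment interpolation between a sample and its nearest neighbour. This is a SMOTE-like illustration under that assumption, not the paper's GSDOT geometry.

```python
import numpy as np

def interpolate_oversample(X, n_new, rng=None):
    """Generate `n_new` artificial instances by sampling points on the
    segments between existing samples and their nearest neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X)
    # Pairwise distances, used to find each sample's nearest neighbour.
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    new = []
    for _ in range(n_new):
        i = rng.integers(n)
        gap = rng.uniform()                      # position on the segment
        new.append(X[i] + gap * (X[nn[i]] - X[i]))
    return np.array(new)
```

Because each new point lies on a segment between two existing samples, every generated coordinate stays within the per-feature range of the original data.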
