
Academic literature on the topic 'PREDICTION DATASET'

Author: Grafiati

Published: 11 September 2023

Journal articles on the topic "PREDICTION DATASET"

Burmakova, Anastasiya, and Diana Kalibatienė. "Applying Fuzzy Inference and Machine Learning Methods for Prediction with a Small Dataset: A Case Study for Predicting the Consequences of Oil Spills on a Ground Environment." Applied Sciences 12, no. 16 (August 18, 2022): 8252. http://dx.doi.org/10.3390/app12168252.


Abstract:

Applying machine learning (ML) and fuzzy inference systems (FIS) requires large datasets to obtain more accurate predictions. However, in the cases of oil spills on ground environments, only small datasets are available. Therefore, this research aims to assess the suitability of ML techniques and FIS for the prediction of the consequences of oil spills on ground environments using small datasets. Consequently, we present a hybrid approach for assessing the suitability of ML (Linear Regression, Decision Trees, Support Vector Regression, Ensembles, and Gaussian Process Regression) and the adaptive neural fuzzy inference system (ANFIS) for predicting the consequences of oil spills with a small dataset. This paper proposes enlarging the initial small dataset of an oil spill on a ground environment by using the synthetic data generated by applying a mathematical model. ML techniques and ANFIS were tested with the same generated synthetic datasets to assess the proposed approach. The proposed ANFIS-based approach shows significant performance and sufficient efficiency for predicting the consequences of oil spills on ground environments with a smaller dataset than the applied ML techniques. The main finding of this paper indicates that FIS is suitable for prediction with a small dataset and provides sufficiently accurate prediction results.


Abdullahi, Dauda Sani, Dr Muhammad Sirajo Aliyu, and Usman Musa Abdullahi. "Comparative analysis of resampling algorithms in the prediction of stroke diseases." UMYU Scientifica 2, no. 1 (March 30, 2023): 88–94. http://dx.doi.org/10.56919/usci.2123.011.


Abstract:

Stroke is a serious cause of death globally. Early prediction of the disease could save many lives, but most clinical datasets, including the stroke dataset, are imbalanced, which biases predictive algorithms towards the majority class. The objective of this research is to compare different data resampling algorithms on the stroke dataset to improve the prediction performance of machine learning models. This paper considered five (5) resampling algorithms, namely Random Over-Sampling (ROS), Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic (ADASYN), and the hybrid techniques SMOTE with Edited Nearest Neighbor (SMOTE-ENN) and SMOTE with Tomek Links (SMOTE-TOMEK), and trained six (6) machine learning classifiers, namely Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and XGBoost (XGB). The hybrid technique SMOTE-ENN improved the machine learning classifiers the most, followed by SMOTE, while the combination of SMOTE and XGB performed best, with an accuracy of 97.99%, a G-mean score of 0.99, and an auc_roc score of 0.99. Resampling algorithms balanced the dataset and enhanced the predictive power of the machine learning algorithms. Therefore, we recommend resampling the stroke dataset when predicting stroke disease rather than modeling on the imbalanced dataset.
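As a rough illustration of two ideas named in this abstract, the sketch below shows random over-sampling (ROS) and the G-mean score in plain Python. It is not the authors' code; the function names `random_oversample` and `g_mean` and the toy data are assumptions made for this example, and the other resamplers listed (SMOTE, ADASYN, SMOTE-ENN, SMOTE-TOMEK) are not reproduced here.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Toy, made-up data: 8 majority rows (label 0) and 2 minority rows (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Resampling of this kind would normally be applied to the training split only, so that duplicated minority rows do not leak into the evaluation data.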


Gangil, Tarun, Krishna Sharan, B. Dinesh Rao, Krishnamoorthy Palanisamy, Biswaroop Chakrabarti, and Rajagopal Kadavigere. "Utility of adding Radiomics to clinical features in predicting the outcomes of radiotherapy for head and neck cancer using machine learning." PLOS ONE 17, no. 12 (December 15, 2022): e0277168. http://dx.doi.org/10.1371/journal.pone.0277168.


Abstract:

Background: Radiomics involves the extraction of quantitative information from annotated Computed-Tomography (CT) images, and has been used to predict outcomes in Head and Neck Squamous Cell Carcinoma (HNSCC). Subjecting combined Radiomics and Clinical features to Machine Learning (ML) could offer better predictions of clinical outcomes. This study is a comparative performance analysis of ML models with Clinical, Radiomics, and Clinico-Radiomic datasets for predicting four outcomes of HNSCC treated with Curative Radiation Therapy (RT): Distant Metastases, Locoregional Recurrence, New Primary, and Residual Disease. Methodology: The study used retrospective data of 311 HNSCC patients treated with radiotherapy between 2013 and 2018 at our centre. Binary prediction models were developed for the four outcomes with Clinical-only, Clinico-Radiomic, and Radiomics-only datasets, using three different ML classification algorithms, namely Random Forest (RF), Kernel Support Vector Machine (KSVM), and XGBoost. The best-performing ML algorithms of the three dataset groups were then compared. Results: The Clinico-Radiomic dataset using the KSVM classifier provided the best prediction. Predicted mean testing accuracy for Distant Metastases, Locoregional Recurrence, New Primary, and Residual Disease was 97%, 72%, 99%, and 96%, respectively. The mean area under the receiver operating curve (AUC) was calculated and displayed for all the models using the three dataset groups. Conclusion: The Clinico-Radiomic dataset improved the predictive ability of ML models over clinical features alone, while models built using Radiomics alone performed poorly. Radiomics data could therefore effectively supplement clinical data in predicting outcomes.


Rau, Cheng-Shyuan, Shao-Chun Wu, Jung-Fang Chuang, Chun-Ying Huang, Hang-Tsung Liu, Peng-Chen Chien, and Ching-Hua Hsieh. "Machine Learning Models of Survival Prediction in Trauma Patients." Journal of Clinical Medicine 8, no. 6 (June 5, 2019): 799. http://dx.doi.org/10.3390/jcm8060799.


Abstract:

Background: We aimed to build a model using machine learning for the prediction of survival in trauma patients and compared these model predictions to those predicted by the most commonly used algorithm, the Trauma and Injury Severity Score (TRISS). Methods: Enrolled hospitalized trauma patients from 2009 to 2016 were divided into a training dataset (70% of the original data set) for generation of a plausible model under supervised classification, and a test dataset (30% of the original data set) to test the performance of the model. The training and test datasets comprised 13,208 (12,871 survival and 337 mortality) and 5603 (5473 survival and 130 mortality) patients, respectively. With the provision of additional information such as pre-existing comorbidity status or laboratory data, logistic regression (LR), support vector machine (SVM), and neural network (NN) (with the Stuttgart Neural Network Simulator (RSNNS)) were used to build models of survival prediction and compared to the predictive performance of TRISS. Predictive performance was evaluated by accuracy, sensitivity, and specificity, as well as by area under the curve (AUC) measures of receiver operating characteristic curves. Results: In the validation dataset, NN and the TRISS presented the highest score (82.0%) for balanced accuracy, followed by SVM (75.2%) and LR (71.8%) models. In the test dataset, NN had the highest balanced accuracy (75.1%), followed by the TRISS (70.2%), SVM (70.6%), and LR (68.9%) models. All four models (LR, SVM, NN, and TRISS) exhibited a high accuracy of more than 97.5% and a sensitivity of more than 98.6%. However, NN exhibited the highest specificity (51.5%), followed by the TRISS (41.5%), SVM (40.8%), and LR (38.5%) models. Conclusions: These four models (LR, SVM, NN, and TRISS) exhibited a similar high accuracy and sensitivity in predicting the survival of the trauma patients. In the test dataset, the NN model had the highest balanced accuracy and predictive specificity.
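The gap between the very high accuracy and the much lower balanced accuracy reported above follows from class imbalance: survivors vastly outnumber deaths, so plain accuracy is dominated by sensitivity. A minimal sketch of the arithmetic, treating survival as the positive class; the confusion-matrix counts are hypothetical values back-calculated from the reported percentages and test-set sizes (5473 survival, 130 mortality), not figures taken from the paper, and the function name `rates` is an assumption.

```python
def rates(tp, fn, tn, fp):
    """Compute standard rates from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts chosen to be consistent with the NN test-set figures:
# 5473 survivors (tp + fn) and 130 deaths (tn + fp).
example = rates(tp=5402, fn=71, tn=67, fp=63)
```

With these illustrative counts, accuracy stays above 97.5% while balanced accuracy is roughly 0.75, because only about half of the 130 deaths are identified correctly.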


Sinaga, Benyamin Langgu, Sabrina Ahmad, Zuraida Abal Abas, and Intan Ermahani A. Jalil. "A recommendation system of training data selection method for cross-project defect prediction." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 2 (August 1, 2022): 990. http://dx.doi.org/10.11591/ijeecs.v27.i2.pp990-1006.


Abstract:

Cross-project defect prediction (CPDP) has been a popular approach to address the limited historical dataset when building a defect prediction model. Directly applying cross-project datasets to learn the prediction model produces an unsatisfactory predictive model. Therefore, the selection of training data is essential. Many studies have examined the effectiveness of training data selection methods, and the best-performing method varied across datasets. While no method consistently outperformed the others across all datasets, predicting the best method for a specific dataset is essential. This study proposed a recommendation system to select the most suitable training data selection method in the CPDP setting. We evaluated the proposed system using 44 datasets, 13 training data selection methods, and six classification algorithms. The findings concluded that the recommendation system effectively recommends the best method to select training data.


Morgan, Maria, Carla Blank, and Raed Seetan. "Plant disease prediction using classification algorithms." IAES International Journal of Artificial Intelligence (IJ-AI) 10, no. 1 (March 1, 2021): 257. http://dx.doi.org/10.11591/ijai.v10.i1.pp257-264.


Abstract:

This paper investigates the capability of six existing classification algorithms (Artificial Neural Network, Naïve Bayes, k-Nearest Neighbor, Support Vector Machine, Decision Tree and Random Forest) in classifying and predicting diseases in soybean and mushroom datasets using datasets with numerical or categorical attributes. While many similar studies have been conducted on datasets of images to predict plant diseases, the main objective of this study is to suggest classification methods that can be used for disease classification and prediction in datasets that contain raw measurements instead of images. A fungus and a plant dataset, which had many differences, were chosen so that the findings in this paper could be applied to future research for disease prediction and classification in a variety of datasets which contain raw measurements. A key difference between the two datasets, other than one being a fungus and one being a plant, is that the mushroom dataset is balanced and only contained two classes while the soybean dataset is imbalanced and contained eighteen classes. All six algorithms performed well on the mushroom dataset, while the Artificial Neural Network and k-Nearest Neighbor algorithms performed best on the soybean dataset. The findings of this paper can be applied to future research on disease classification and prediction in a variety of dataset types such as fungi, plants, humans, and animals.


Nunez, John-Jose, Teyden T. Nguyen, Yihan Zhou, Bo Cao, Raymond T. Ng, Jun Chen, Benicio N. Frey, et al. "Replication of machine learning methods to predict treatment outcome with antidepressant medications in patients with major depressive disorder from STAR*D and CAN-BIND-1." PLOS ONE 16, no. 6 (June 28, 2021): e0253023. http://dx.doi.org/10.1371/journal.pone.0253023.


Abstract:

Objectives: Antidepressants are first-line treatments for major depressive disorder (MDD), but 40–60% of patients will not respond; hence, predicting response would be a major clinical advance. Machine learning algorithms hold promise to predict treatment outcomes based on clinical symptoms and episode features. We sought to independently replicate recent machine learning methodology predicting antidepressant outcomes using the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) dataset, and then externally validate these methods to train models using data from the Canadian Biomarker Integration Network in Depression (CAN-BIND-1) dataset. Methods: We replicated methodology from Nie et al. (2018) using common algorithms based on linear regressions and decision trees to predict treatment-resistant depression (TRD, defined as failing to respond to 2 or more antidepressants) in the STAR*D dataset. We then trained and externally validated models using the clinical features found in both datasets to predict response (≥50% reduction on the Quick Inventory for Depressive Symptomatology, Self-Rated [QIDS-SR]) and remission (endpoint QIDS-SR score ≤5) in the CAN-BIND-1 dataset. We evaluated additional models to investigate how different outcomes and features may affect prediction performance. Results: Our replicated models predicted TRD in the STAR*D dataset with slightly better balanced accuracy than Nie et al. (70%–73% versus 64%–71%, respectively). Prediction performance on our external methodology validation on the CAN-BIND-1 dataset varied depending on outcome; performance was worse for response (best balanced accuracy 65%) compared to remission (77%). Using the smaller set of features found in both datasets generally improved prediction performance when evaluated on the STAR*D dataset. Conclusion: We successfully replicated prior work predicting antidepressant treatment outcomes using machine learning methods and clinical data. We found similar prediction performance using these methods on an external database, although prediction of remission was better than prediction of response. Future work is needed to improve prediction performance to be clinically useful.


Ahamed, B. Shamreen, Meenakshi S. Arya, and Auxilia Osvin V. Nancy. "Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation." Advances in Human-Computer Interaction 2022 (September 19, 2022): 1–14. http://dx.doi.org/10.1155/2022/9220560.


Abstract:

Technical improvements in the healthcare sector have given rise to many new inventions in the field of artificial intelligence. Patterns are used for disease identification, and the onset of many diseases can be predicted. These diseases include diabetes mellitus, fatal heart diseases, and symptomatic cancer. Many algorithms have played a critical role in the prediction of diseases. This paper proposes an ML-based approach for diabetes mellitus prediction. Many ML algorithms are compared in the proposed work, and the three ML classifiers providing the highest accuracy are determined: RF, GBM, and LGBM. Prediction accuracy is obtained on two datasets: the Pima Indians dataset and a curated dataset. The ML classifiers LGBM, GB, and RF are used to build a predictive model, and the accuracy of each classifier is noted and compared. In addition to the generalized prediction mechanism, a data augmentation technique is used, and the final prediction accuracy is obtained for the classifiers LGBM, GB, and RF. A comparative study between augmentation and non-augmentation is also discussed for the two datasets in order to further improve the accuracy of diabetes prediction.


Partin, Alexander, Thomas S. Brettin, Yitan Zhu, Jamie Overbeek, Oleksandr Narykov, Priyanka Vasanthakumari, Austin Clyde, et al. "Abstract 5380: Systematic evaluation and comparison of drug response prediction models: a case study of prediction generalization across cell lines datasets." Cancer Research 83, no. 7_Supplement (April 4, 2023): 5380. http://dx.doi.org/10.1158/1538-7445.am2023-5380.


Abstract:

Predictive modeling holds great promise for improving personalized cancer treatment and efficiency of drug development. In recent years, deep learning (DL) has been extensively explored for drug response prediction (DRP), outperforming classical machine learning in prediction generalization to new data. Despite the considerable interest in DRP, no agreed-upon methodology for evaluating and comparing the diverse DL models yet exists. Existing papers generally demonstrate the performance of proposed models using cross-validation within a single cell line dataset and compare with baseline models of their choice, substantially limiting the scope and validity of model evaluation and comparison. In this work, we investigate the ability of DRP models for generalizing predictions across datasets of multiple drug screening studies, a more challenging scenario mimicking practical applications of DRP models. Five cell line datasets and six community DRP models with advanced DL architectures have been explored. Public cell line drug screening datasets have been curated and processed for this analysis, including CCLE, CTRP, GDSC1, GDSC2, and GCSI. For each dataset, the same preprocessing pipeline was used to generate cell line gene expressions, drug representations, and drug response values. The six DRP models include advanced architectures and feature engineering methods such as transformer, graph neural network, and image representation of tabular data. Systematic model curation and training have been applied, including consistent training and testing data splits across models and hyperparameter optimization (HPO). To cope with the large-scale model training and HPO, automatic workflows have been implemented and executed on high-performance computing systems. A 5-by-5 matrix of prediction scores, corresponding to the five datasets in both row and column dimensions, has been generated for each model, with off-diagonal values representing the cross-dataset generalization. Despite the advanced DL techniques, all models exhibit substantially inferior performance in cross-dataset analysis as compared with cross-validation within a single dataset. This result demonstrates the challenge of cross-dataset generalization for DRP and motivates the need for rigorous and systematic evaluation of DRP models, which simulates real-world applications. Citation Format: Alexander Partin, Thomas S. Brettin, Yitan Zhu, Jamie Overbeek, Oleksandr Narykov, Priyanka Vasanthakumari, Austin Clyde, Sara E. Jones, Satishkumar Ranganathan Ganakammal, Justin M. Wozniak, Andreas Wilke, Jamaludin Mohd-Yusof, Michael R. Weil, Alexander T. Pearson, Rick L. Stevens. Systematic evaluation and comparison of drug response prediction models: a case study of prediction generalization across cell lines datasets. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5380.
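The score matrix described in this abstract can be pictured with a small sketch: train on each source dataset, score on every target dataset, and compare the diagonal with the off-diagonal entries. This is only an illustration under stated assumptions, not the authors' workflow. The functions `cross_dataset_scores`, `diagonal_vs_off_diagonal`, `fit_mean`, and `neg_mae` and the two toy datasets are hypothetical, and the diagonal here uses a single held-out split rather than the cross-validation the abstract mentions.

```python
def cross_dataset_scores(datasets, fit, score):
    """Train one model per source dataset and score it on every target.

    datasets maps a name to {"train": ..., "test": ...}. The diagonal
    (source == target) is the within-dataset result on held-out data;
    off-diagonal entries measure cross-dataset generalization.
    """
    names = list(datasets)
    matrix = {}
    for source in names:
        model = fit(datasets[source]["train"])
        for target in names:
            matrix[(source, target)] = score(model, datasets[target]["test"])
    return names, matrix


def diagonal_vs_off_diagonal(names, matrix):
    """Return (mean within-dataset score, mean cross-dataset score)."""
    within = [matrix[(n, n)] for n in names]
    cross = [matrix[(s, t)] for s in names for t in names if s != t]
    return sum(within) / len(within), sum(cross) / len(cross)


# Toy stand-ins (hypothetical): the "model" is just the training mean,
# and the score is the negative mean absolute error (higher is better).
def fit_mean(values):
    return sum(values) / len(values)


def neg_mae(model, values):
    return -sum(abs(v - model) for v in values) / len(values)


toy = {
    "A": {"train": [1.0, 1.0, 1.0], "test": [1.0, 1.0]},
    "B": {"train": [3.0, 3.0, 3.0], "test": [3.0, 3.0]},
}
names, matrix = cross_dataset_scores(toy, fit_mean, neg_mae)
within, cross = diagonal_vs_off_diagonal(names, matrix)
```

In the toy case the within-dataset mean is higher than the cross-dataset mean, which mirrors in miniature the drop the abstract reports for off-diagonal entries.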


Preethi, B. Meena, R. Gowtham, S. Aishvarya, S. Karthick, and D. G. Sabareesh. "Rainfall Prediction using Machine Learning and Deep Learning Algorithms." International Journal of Recent Technology and Engineering (IJRTE) 10, no. 4 (November 30, 2021): 251–54. http://dx.doi.org/10.35940/ijrte.d6611.1110421.


Abstract:

The project “Rainfall Prediction using Machine Learning & Deep Learning Algorithms” is a research project developed in Python, with the dataset stored in Microsoft Excel. It applies various machine learning and deep learning algorithms to find which algorithm predicts most accurately. Rainfall prediction can be framed as binary classification under data mining. Predicting rainfall is important in several respects for a country and can help prevent serious natural disasters. For this prediction, an Artificial Neural Network using forward and backward propagation, AdaBoost, Gradient Boosting, and XGBoost are used. The project has five modules. The Data Analysis Module analyses the dataset and finds its missing values. The Data Pre-processing Module includes data cleaning, which is the process of filling the missing values in the dataset. The Feature Transformation Module modifies the features of the dataset. The Data Mining Module trains models on the dataset, using any algorithm, to learn the pattern. The Model Evaluation Module measures the performance of the models and determines the best overall accuracy for the prediction. The dataset used covers Australia. The main aim of the project is to compare the various boosting algorithms with the neural network and find the best algorithm among them. This prediction can be a major advantage to farmers, who can choose which crops to plant according to their water needs. Overall, we analyse which algorithm is feasible for qualitatively predicting rainfall.


Dissertations / Theses on the topic "PREDICTION DATASET"

Klus, Petr 1985. "'The Clever machine' - a computational tool for dataset exploration and prediction." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/482051.


Abstract:

The purpose of my doctoral studies was to develop an algorithm for large-scale analysis of protein sets. This thesis outlines the methodology and technical work performed as well as relevant biological cases involved in creation of the core algorithm, the cleverMachine (CM), and its extensions multiCleverMachine (mCM) and cleverGO. The CM and mCM provide characterisation and classification of protein groups based on physico-chemical features, along with protein abundance and Gene Ontology annotation information, to perform an accurate data exploration. My method provides both computational and experimental scientists with a comprehensive, easy to use interface for high-throughput protein sequence screening and classification.


Clayberg, Lauren W. "Web element role prediction from visual information using a novel dataset." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/132734.


Abstract:

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020
Machine learning has enhanced many existing tech industries, including end-to-end test automation for web applications. One of the many goals that mabl and other companies have in this new tech initiative is to automatically gain insight into how web applications work. The task of web element role prediction is vital for the advancement of this newly emerging product category. I applied supervised visual machine learning techniques to the task. In addition, I created a novel dataset and present detailed attribute distribution and bias information. The dataset is used to provide updated baselines for performance using current day web applications, and a novel metric is provided to better quantify the performance of these models. The top performing model achieves an F1-score of 0.45 on ten web element classes. Additional findings include color distributions for different web element roles, and how some color spaces are more intuitive to humans than others.


Oppon, Ekow CruickShank. "Synergistic use of promoter prediction algorithms: a choice of small training dataset?" Thesis, University of the Western Cape, 2000. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_8222_1185436339.


Abstract:

Promoter detection, especially in prokaryotes, has always been an uphill task and may remain so, because of the many varieties of sigma factors employed by various organisms in transcription. The situation is made more complex by the fact that any seemingly unimportant sequence segment may be turned into a promoter sequence by an activator or repressor (if the actual promoter sequence is made unavailable). Nevertheless, a computational approach to promoter detection has to be performed for a number of reasons. The obvious one that comes to mind is the long and tedious process involved in elucidating promoters in the 'wet' laboratories, not to mention the financial aspect of such endeavors. Promoter detection/prediction of an organism with few characterized promoters (M. tuberculosis), as envisaged at the beginning of this work, was never going to be easy. Even for the few known Mycobacterial promoters, most of the respective sigma factors associated with their transcription were not known. If the information (promoter-sigma) were available, the research would have been focused on categorizing the promoters according to sigma factors and training the methods on the respective categories. That is assuming that there would be enough training data for the respective categories. Most promoter detection/prediction studies have been carried out on E. coli because of the availability of a number of experimentally characterized promoters (±310). Even then, no researcher to date has extended the research to the entire E. coli genome.


Vandehei, Bailey R. "Leveraging Defects Life-Cycle for Labeling Defective Classes." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2111.


Abstract:

Data from software repositories are a very useful asset to building different kinds of models and recommender systems aimed to support software developers. Specifically, the identification of likely defect-prone files (i.e., classes in Object-Oriented systems) helps in prioritizing, testing, and analysis activities. This work focuses on automated methods for labeling a class in a version as defective or not. The most used methods for automated class labeling belong to the SZZ family and fail in various circumstances. Thus, recent studies suggest the use of affect version (AV) as provided by developers and available in the issue tracker such as JIRA. However, in many circumstances, the AV might not be used because it is unavailable or inconsistent. The aim of this study is twofold: 1) to measure the AV availability and consistency in open-source projects, 2) to propose, evaluate, and compare to SZZ, a new method for labeling defective classes which is based on the idea that defects have a stable life-cycle in terms of proportion of versions needed to discover the defect and to fix the defect. Results related to 212 open-source projects from the Apache ecosystem, featuring a total of about 125,000 defects, show that the AV cannot be used in the majority (51%) of defects. Therefore, it is important to investigate automated methods for labeling defective classes. Results related to 76 open-source projects from the Apache ecosystem, featuring a total of about 6,250,000 classes that are affected by 60,000 defects and spread over 4,000 versions and 760,000 commits, show that the proposed method for labeling defective classes is, in average among projects and defects, more accurate, in terms of Precision, Kappa, F1 and MCC than all previously proposed SZZ methods. Moreover, the improvement in accuracy from combining SZZ with defects life-cycle information is statistically significant but practically irrelevant ( overall and in average, more accurate via defects' life-cycle than any SZZ method.


Sousa, Massáine Bandeira e. "Improving accuracy of genomic prediction in maize single-crosses through different kernels and reducing the marker dataset." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/11/11137/tde-07032018-163203/.


Abstract:

In plant breeding, genomic prediction (GP) may be an efficient tool to increase the accuracy of selecting genotypes, mainly under multi-environment trials. This approach has the advantage of increasing genetic gains for complex traits and reducing costs. However, strategies are needed to increase the accuracy and reduce the bias of genomic estimated breeding values. In this context, the objectives were: i) to compare two strategies to obtain marker subsets based on marker effect regarding their impact on the prediction accuracy of genomic selection; and ii) to compare the accuracy of four GP methods including genotype × environment interaction and two kernels (GBLUP and Gaussian). We used a rice diversity panel (RICE) and two maize datasets (HEL and USP). These were evaluated for grain yield and plant height. Overall, the prediction accuracy and relative efficiency of genomic selection were increased using marker subsets, which have the potential to be used to build fixed arrays and reduce genotyping costs. Furthermore, using the Gaussian kernel and including the G×E effect increases the accuracy of the genomic prediction models.

Johansson, David. "Price Prediction of Vinyl Records Using Machine Learning Algorithms." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-96464.

Abstract:

Machine learning algorithms have been used for price prediction within several application areas. Examples include real estate, the stock market, tourist accommodation, electricity, art, cryptocurrencies, and fine wine. Common approaches in studies are to evaluate the accuracy of predictions and compare different algorithms, such as Linear Regression or Neural Networks. There is a thriving global second-hand market for vinyl records, but the research of price prediction within the area is very limited. The purpose of this project was to expand on existing knowledge within price prediction in general to evaluate some aspects of price prediction of vinyl records. That included investigating the possible level of accuracy and comparing the efficiency of algorithms. A dataset of 37000 samples of vinyl records was created with data from the Discogs website, and multiple machine learning algorithms were utilized in a controlled experiment. Among the conclusions drawn from the results was that the Random Forest algorithm generally generated the strongest results, that results can vary substantially between different artists or genres, and that a large part of the predictions had a good accuracy level, but that a relatively small amount of large errors had a considerable effect on the general results.

Baveye, Yoann. "Automatic prediction of emotions induced by movies." Thesis, Ecully, Ecole centrale de Lyon, 2015. http://www.theses.fr/2015ECDL0035/document.

Abstract:

Never before have movies been as easily accessible to viewers, who can enjoy anywhere the almost unlimited potential of movies for inducing emotions. Thus, knowing in advance the emotions that a movie is likely to elicit in its viewers could help to improve the accuracy of content delivery, video indexing, or even summarization. However, transferring this expertise to computers is a complex task, due in part to the subjective nature of emotions. The present thesis is dedicated to the automatic prediction of emotions induced by movies based on the intrinsic properties of the audiovisual signal. To deal with this problem computationally, a video dataset annotated with the emotions induced in viewers is needed. However, existing datasets are either not public due to copyright issues or are of very limited size and content diversity. To answer this specific need, this thesis addresses the development of the LIRIS-ACCEDE dataset. The advantages of this dataset are threefold: (1) it is based on movies under Creative Commons licenses and thus can be shared without infringing copyright, (2) it is composed of 9,800 good-quality video excerpts with a large content diversity extracted from 160 feature films and short films, and (3) the 9,800 excerpts have been ranked through a pair-wise video comparison protocol along the induced valence and arousal axes using crowdsourcing. The high inter-annotator agreement reflects that annotations are fully consistent, despite the large diversity of raters’ cultural backgrounds. Three other experiments are also introduced in this thesis. First, affective ratings were collected for a subset of the LIRIS-ACCEDE dataset in order to cross-validate the crowdsourced annotations. The affective ratings also made it possible to learn Gaussian Processes for Regression, modeling the noisiness of the measurements, to map the whole ranked LIRIS-ACCEDE dataset into the 2D valence-arousal affective space. Second, continuous ratings for 30 movies were collected in order to develop temporally relevant computational models. Finally, a last experiment was performed in order to collect continuous physiological measurements for the 30 movies used in the second experiment. The correlation between both modalities strengthens the validity of the results of the experiments. Armed with a dataset, this thesis presents a computational model to infer the emotions induced by movies. The framework builds on recent advances in deep learning and takes into account the relationship between consecutive scenes. It is composed of two fine-tuned Convolutional Neural Networks. One is dedicated to the visual modality and uses as input crops of key frames extracted from video segments, while the second one is dedicated to the audio modality through the use of audio spectrograms. The activations of the last fully connected layer of both networks are concatenated to feed a Long Short-Term Memory Recurrent Neural Network that learns the dependencies between consecutive video segments. The performance obtained by the model is compared to that of a baseline similar to previous work and shows very promising results but reflects the complexity of such tasks. Indeed, the automatic prediction of emotions induced by movies is still a very challenging task which is far from being solved.

Lamichhane, Niraj. "Prediction of Travel Time and Development of Flood Inundation Maps for Flood Warning System Including Ice Jam Scenario. A Case Study of the Grand River, Ohio." Youngstown State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1463789508.

Rai, Manisha. "Topographic Effects in Strong Ground Motion." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/56593.

Abstract:

Ground motions from earthquakes are known to be affected by the earth's surface topography. Topographic effects are a result of several physical phenomena, such as the focusing or defocusing of seismic waves reflected from a topographic feature and the interference between direct and diffracted seismic waves. This typically causes an amplification of ground motion on convex features such as hills and ridges and a de-amplification on concave features such as valleys and canyons. Topographic effects are known to be frequency dependent, and the spectral accelerations can sometimes reach high values, causing significant damage to structures located on the feature. Topographically correlated damage patterns have been observed in several earthquakes, and topographic amplifications have also been observed in several recorded ground motions. This phenomenon has also been extensively studied through numerical analyses. Even though different studies agree on the nature of topographic effects, quantifying these effects has been challenging. The current literature has no consensus on how to predict topographic effects at a site. With population centers growing around regions of high seismicity and prominent topographic relief, such as California and Japan, the quantitative estimation of these effects has become very important. In this dissertation, we address this shortcoming by developing empirical models that predict topographic effects at a site. We develop these models through an extensive empirical study of recorded ground motions from two large strong-motion datasets, namely the California small-to-medium-magnitude earthquake dataset and the global NGA-West2 dataset, and propose topographic modification factors that quantify the expected amplification or deamplification at a site. To develop these models, we required a parameterization of topography. We developed two types of topographic parameters at each recording station. The first type is developed using the elevation data around the stations and comprises parameters such as smoothed slope, smoothed curvature, and relative elevation. The second type is developed using a series of simplistic 2D numerical analyses. These numerical analyses compute an estimate of the expected 2D topographic amplification of a simple wave at a site in several different directions. These 2D amplifications are used to develop a family of parameters at each site. We study the trends in the ground motion model residuals with respect to these topographic parameters to determine whether the parameters can capture topographic effects in the recorded data. We use statistical tests to determine whether the trends are significant, and perform mixed-effects regression on the residuals to develop functional forms that can be used to predict topographic effects at a site. Finally, we compare the two types of parameters and their topographic predictive power.
Ph. D.

Cooper, Heather. "Comparison of Classification Algorithms and Undersampling Methods on Employee Churn Prediction: A Case Study of a Tech Company." DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2260.

Abstract:

Churn prediction is a common data mining problem that many companies face across industries. More commonly, customer churn has been studied extensively within the telecommunications industry where there is low customer retention due to high market competition. Similar to customer churn, employee churn is very costly to a company and by not deploying proper risk mitigation strategies, profits cannot be maximized, and valuable employees may leave the company. The cost to replace an employee is exponentially higher than finding a replacement, so it is in any company’s best interest to prioritize employee retention. This research combines machine learning techniques with undersampling in hopes of identifying employees at risk of churn so retention strategies can be implemented before it is too late. Four different classification algorithms are tested on a variety of undersampled datasets in order to find the most effective undersampling and classification method for predicting employee churn. Statistical analysis is conducted on the appropriate evaluation metrics to find the most significant methods. The results of this study can be used by the company to target individuals at risk of churn so that risk mitigation strategies can be effective in retaining the valuable employees. Methods and results can be tested and applied across different industries and companies.
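
The abstract above does not say which undersampling methods were compared, so the following is only a minimal sketch of one common variant, random undersampling, under the assumption of a two-class stay/churn label. The function name `random_undersample` and the toy data are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Balance classes by randomly dropping rows from the larger classes.

    Hypothetical helper for illustration only: every class is reduced to
    the size of the smallest class.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # The smallest class determines how many rows each class keeps.
    n_min = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (assumed): 90 employees who stayed and 10 who left.
rows = list(range(100))
labels = ["stay"] * 90 + ["churn"] * 10

bal_rows, bal_labels = random_undersample(rows, labels)
counts = Counter(bal_labels)
print(counts)
```

After balancing, both classes have as many rows as the minority class, which is the kind of dataset a classifier would then be trained on in a comparison of this sort.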

Books on the topic "PREDICTION DATASET"

Arseneault, René. Using Linear Modelling and Predictive Analytics Make Future Decisions Based on Large Employee HR Datasets. 1 Oliver’s Yard, 55 City Road, London EC1Y 1SP United Kingdom: SAGE Publications Inc., 2023. http://dx.doi.org/10.4135/9781529629491.

Sentā, Kaiyō Kagaku Gijutsu. Sentanteki yojigen taiki kaiyō rikuiki ketsugō dēta dōka shisutemu no kaihatsu to kōseido kikō hendō yosoku ni hitsuyō na shokichika saikaiseki tōgō dētasetto no kōchiku: Heisei 14-nendo kenkyū seika hōkokusho = Research development of advanced four-dimensional data assimilation system using a climate model toward construction of high-quality reanalysis datasets for climate prediction. [Tokyo]: Monbu Kagakushō Kenkyū Kaihatsukyoku, 2003.

Kaiyō Kenkyū Kaihatsu Kikō (Japan). Sentanteki yojigen taiki kaiyō rikuiki ketsugō dēta dōka shisutemu no kaihatsu to kōseido kikō hendō yosoku ni hitsuyō na shokichika saikaiseki tōgō dētasetto no kōchiku: Heisei 17-nendo kenkyū seika hōkokusho = Research development of advanced four-dimensional data assimilation system using a climate model toward construction of high-quality reanalysis datasets for climate prediction. [Tokyo]: Monbu Kagakushō Kenkyū Kaihatsukyoku, 2006.

Kaiyō Kenkyū Kaihatsu Kikō (Japan), Hokkaidō Daigaku, and Japan. Monbu Kagakushō. Kenkyū Kaihatsukyoku., eds. Sentanteki yojigen taiki kaiyō rikuiki ketsugō dēta dōka shisutemu no kaihatsu to kōseido kikō hendō yosoku ni hitsuyō na shokichika saikaiseki tōgō dētasetto no kōchiku: Heisei 18-nendo kenkyū seika hōkokusho = Research development of advanced four-dimensional data assimilation system using a climate model toward construction of high-quality reanalysis datasets for climate prediction. [Tokyo]: Monbu Kagakushō Kenkyū Kaihatsukyoku, 2007.

Delsol, Laurent. Nonparametric Methods for α-Mixing Functional Random Variables. Edited by Frédéric Ferraty and Yves Romain. Oxford University Press, 2018. http://dx.doi.org/10.1093/oxfordhb/9780199568444.013.5.

Abstract:

This article considers how functional kernel methods can be used to study α-mixing datasets. It first provides an overview of how prediction problems involving dependent functional datasets may arise from the study of time series, focusing on the standard discretized model and modelization that takes into account the functional nature of the evolution of the quantity to be studied over time. It then considers strong mixing conditions, with emphasis on the notion of α-mixing coefficients and α-mixing variables introduced by Rosenblatt (1956). It also describes some conditions for a Markov chain to be α-mixing; some useful tools that provide covariance inequalities, exponential inequalities, and Central Limit Theorem (CLT) for α-mixing sequences; the asymptotic properties of functional kernel estimators; the use of kernel smoothing methods with α-mixing datasets; and various functional kernel estimators corresponding to different prediction methods. Finally, the article highlights some interesting prospects for further research.

Kumar, Ashish. Learning Predictive Analytics with Python: Gain Practical Insights into Predictive Modelling by Implementing Predictive Analytics Algorithms on Public Datasets with Python. Packt Publishing, Limited, 2016.

Learning Predictive Analytics with Python: Gain practical insights into predictive modelling by implementing Predictive Analytics algorithms on public datasets with Python. Packt Publishing, 2016.

Peng, Handie. Economic Theories and Empirics on the Sex Market. Edited by Scott Cunningham and Manisha Shah. Oxford University Press, 2016. http://dx.doi.org/10.1093/oxfordhb/9780199915248.013.2.

Abstract:

This article presents a number of testable predictions from Edlund and Korn’s (2002) theoretical model. In their seminal study, Edlund and Korn propose a model that sees prostitution as an alternative to marriage. According to them, women can only choose between marriage and prostitution, and “prostitution is low-skill, labor intensive, female, and well paid.” Because prostitution has such an unusual combination of attributes, traditional labor theories might not be able to explain the wage differential of this profession. The Edlund and Korn (EK) model offers “a marriage market explanation to this puzzle.” The critical assumption is that prostitutes need to be compensated for the forgone marriage market opportunities. This chapter tests three unique predictions from the EK model: (1) that there exists a wage differential for the sex worker, (2) that prostitution falls with female wage and male income, and (3) that foreign prostitutes should have a lower wage, ceteris paribus. These predictions are examined using two new datasets of Internet-mediated prostitution. The chapter finds evidence for the first two predictions but not for the third.

Johnston, Benjamin, and Ishita Mathur. Applied Supervised Learning with Python: Use Scikit-Learn to Build Predictive Models from Real-world Datasets and Prepare Yourself for the Future of Machine Learning. Packt Publishing, Limited, 2019.

Schadt, Eric E. Network Methods for Elucidating the Complexity of Common Human Diseases. Edited by Dennis S. Charney, Eric J. Nestler, Pamela Sklar, and Joseph D. Buxbaum. Oxford University Press, 2017. http://dx.doi.org/10.1093/med/9780190681425.003.0002.

Abstract:

The life sciences are now a significant contributor to the ever expanding digital universe of data, and stand poised to lead in both the generation of big data and the realization of dramatic benefit from it. We can now score variations in DNA across whole genomes; RNA levels and alternative isoforms, metabolite levels, protein levels, and protein state information across the transcriptome, metabolome and proteome; methylation status across the methylome; and construct extensive protein–protein and protein–DNA interaction maps, all in a comprehensive fashion and at the scale of populations of individuals. This chapter describes a number of analytical approaches aimed at inferring causal relationships among variables in very large-scale datasets by leveraging DNA variation as a systematic perturbation source. The causal inference procedures are also demonstrated to enhance the ability to reconstruct truly predictive, probabilistic causal gene networks that reflect the biological processes underlying complex phenotypes like disease.

Book chapters on the topic "PREDICTION DATASET"

Kumar, Sandeep, and Santosh Singh Rathore. "Software Fault Dataset." In Software Fault Prediction, 31–38. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8715-8_4.

Spenrath, Yorick, Marwan Hassani, and Boudewijn F. van Dongen. "Online Prediction of Aggregated Retailer Consumer Behaviour." In Lecture Notes in Business Information Processing, 211–23. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98581-3_16.

Abstract:

Predicting the behaviour of consumers provides valuable information for retailers, such as the expected spend of a consumer or the total turnover of the retailer. The ability to make predictions on an individual level is useful, as it allows retailers to accurately perform targeted marketing. However, with the expected large number of consumers and their diverse behaviour, making accurate predictions on an individual consumer level is difficult. In this paper we present a framework that focuses on this trade-off in an online setting. By making predictions on a larger number of consumers at a time, we improve the predictive accuracy but at the cost of usefulness, as we can say less about the individual consumers. The framework is developed in an online setting, where we update the prediction model and make new predictions over time. We show the existence of the trade-off in an experimental evaluation on a real-world dataset consisting of 39 weeks of transaction data.

Syah, Rahmad, Marischa Elveny, and Mahyuddin K. M. Nasution. "Clustering Large DataSet’ to Prediction Business Metrics." In Software Engineering Perspectives in Intelligent Systems, 1117–27. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63322-6_95.

Aljaaf, Ahmed J., Dhiya Al-Jumeily, Abir J. Hussain, Paul Fergus, Mohammed Al-Jumaily, and Hani Hamdan. "Partially Synthesised Dataset to Improve Prediction Accuracy." In Intelligent Computing Theories and Application, 855–66. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-42291-6_84.

Chiou, Andrew, and Xinghuo Yu. "Thematic Fuzzy Prediction of Weed Dispersal Using Spatial Dataset." In Computational Intelligence for Modelling and Prediction, 147–62. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/10966518_11.

Bhattacharya, Hindol, Arnab Bhattacharya, Samiran Chattopadhyay, and Matangini Chattopadhyay. "LDA Topic Modeling Based Dataset Dependency Matrix Prediction." In Communications in Computer and Information Science, 54–69. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-8581-0_5.

Zhang, Wei, Xiaofei Xing, Saqib Ali, and Guojun Wang. "Internet Performance Prediction Framework Based on PingER Dataset." In Algorithms and Architectures for Parallel Processing, 118–31. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-05057-3_9.

Lin, Ronghua, Yong Tang, Chengzhe Yuan, Chaobo He, and Weisheng Li. "SCHOLAT Link Prediction: A Link Prediction Dataset Fusing Topology and Attribute Information." In Computer Supported Cooperative Work and Social Computing, 340–51. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-4549-6_26.

Tsatsoulis, P. Daphne, Paige Kordas, Michael Marshall, David Forsyth, and Agata Rozga. "The Static Multimodal Dyadic Behavior Dataset for Engagement Prediction." In Lecture Notes in Computer Science, 386–99. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-49409-8_31.

Tan, Chuheng, and Ximing Zhong. "A Rapid Wind Velocity Prediction Method in Built Environment Based on CycleGAN Model." In Computational Design and Robotic Fabrication, 253–62. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-19-8637-6_22.

Abstract:

Although the wind microclimate and wind environment play important roles in urban prediction, the time-consuming and complicated setup and process of wind simulation are widely regarded as challenges. There are several methods that use deep learning (DL) models for wind speed prediction by labeling pairs of wind simulation dataset samples. However, many wind simulation experiments are needed to obtain paired datasets, which is still time-consuming and cumbersome. In contrast to previous studies, we propose a method to train a DL model without labelling paired data, based on the Cycle Generative Adversarial Network (cycleGAN). To verify our hypothesis, we evaluate the results and process of the pix2pix model (which requires paired datasets) and cycleGAN (which does not require paired datasets), and explore the difference in results between these two DL models and professional CFD software. The results show that cycleGAN can perform as well as pix2pix in accuracy, indicating that random city plan image samples and random wind simulation samples can train surrogate models as accurate as labelled DL methods. Although the DL method produces results similar to those of the professional CFD method, the details of the wind flow results still need improvement. This study can help designers and policymakers make informed decisions when choosing DL methods for real-time wind speed prediction in early-stage design exploration.

Conference papers on the topic "PREDICTION DATASET"

Maggio, Simona, Victor Bouvier, and Leo Dreyfus-Schmidt. "Performance Prediction Under Dataset Shift." In 2022 26th International Conference on Pattern Recognition (ICPR). IEEE, 2022. http://dx.doi.org/10.1109/icpr56361.2022.9956676.

Chen, Yifan, and Fanzeng Xia. "Restaurants’ Rating Prediction Using Yelp Dataset." In 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA). IEEE, 2020. http://dx.doi.org/10.1109/aeeca49918.2020.9213704.

Verma, Chaman, Veronika Stoffova, Zoltan Illes, and Ahmad S. Tarawneh. "RESIDENCE STATE AND COUNTRY PREDICTION OF STUDENT TOWARDS ICT FOR THE REAL-TIME." In eLSE 2020. University Publishing House, 2020. http://dx.doi.org/10.12753/2066-026x-20-120.

Abstract:

An experimental study was conducted to predict the residence state and country of students based on their responses to two different ICT surveys held during the academic years 2015-2016 and 2017-2018. The first dataset consisted of 560 instances and 59 features, and the second dataset comprised 331 instances and 46 features. We considered the state in the first dataset and the country in the second dataset as the response variable and treated all remaining features as predictors after the self-reduction of a few features. The datasets were trained and tested with splitting and k-fold cross-validation (CV) using three popular supervised machine learning classifiers, Artificial Neural Network (ANN), Sequential Minimal Optimization (SMO), and Random Forest (RF), in the Weka 3.8.1 workbench. In the state prediction, the RF classifier outperformed the ANN and SMO, with the highest prediction accuracy of 83.39% at 6-fold CV. The number of correct predictions was 239 out of 282 for Punjab students and 228 out of 278 for Haryana students with k=6. In the country prediction, the best-fitting model is presented with the highest prediction accuracy. The state and country predictions are compared using measures such as accuracy, error, F-score, confusion matrix, true positive rate (TPR), false positive rate (FPR), and ROC curves. Further, these state and country predictive models may support a real-time module for predicting students' demography with respect to technological awareness.
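
As a quick consistency check on the figures quoted in this abstract, the short sketch below recomputes the overall accuracy from the two per-state counts. It uses only the numbers stated above. The variable and function names are illustrative, and the per-state rates it derives are not figures reported by the authors.

```python
# Correct predictions and class sizes as quoted in the abstract (k = 6).
correct = {"Punjab": 239, "Haryana": 228}
total = {"Punjab": 282, "Haryana": 278}


def overall_accuracy(correct, total):
    """Fraction of all instances that were predicted correctly."""
    return sum(correct.values()) / sum(total.values())


def per_class_rate(correct, total):
    """Fraction of each class that was predicted correctly."""
    return {name: correct[name] / total[name] for name in total}


accuracy = overall_accuracy(correct, total)
rates = per_class_rate(correct, total)

print(f"instances: {sum(total.values())}")
print(f"overall accuracy: {accuracy:.2%}")
for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
```

The two class sizes sum to the 560 instances of the first dataset, and the combined counts reproduce the reported 83.39% accuracy.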

Shaukat, Zain Shaukat, Rashid Naseem, and Muhammad Zubair. "A Dataset for Software Requirements Risk Prediction." In 2018 IEEE International Conference on Computational Science and Engineering (CSE). IEEE, 2018. http://dx.doi.org/10.1109/cse.2018.00022.

Sherk, Thomas, Minh-Triet Tran, and Tam V. Nguyen. "SharkTank Deal Prediction: Dataset and Computational Model." In 2019 11th International Conference on Knowledge and Systems Engineering (KSE). IEEE, 2019. http://dx.doi.org/10.1109/kse.2019.8919477.

Alsaraireh, Jameel, and Mary Agoyi. "New Dataset for Software Defect Prediction Model." In 2022 10th International Conference on Smart Grid (icSmartGrid). IEEE, 2022. http://dx.doi.org/10.1109/icsmartgrid55722.2022.9848620.

Munoz-Gonzalez, Angel, and Ryota Horie. "EEG Signal Power Prediction Using DEAP Dataset." In 2022 7th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS). IEEE, 2022. http://dx.doi.org/10.1109/iciibms55689.2022.9971594.

Salman, Nuha Ahmed, and Saad Talib Hasson. "A Prediction Approach for Small Healthcare Dataset." In 2023 8th International Conference on Smart and Sustainable Technologies (SpliTech). IEEE, 2023. http://dx.doi.org/10.23919/splitech58164.2023.10193552.

Sohn, Samuel S., Seonghyeon Moon, Honglu Zhou, Mihee Lee, Sejong Yoon, Vladimir Pavlovic, and Mubbasir Kapadia. "Harnessing Fourier Isovists and Geodesic Interaction for Long-Term Crowd Flow Prediction." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/185.

Abstract:

With the rise in popularity of short-term Human Trajectory Prediction (HTP), Long-Term Crowd Flow Prediction (LTCFP) has been proposed to forecast crowd movement in large and complex environments. However, the input representations, models, and datasets for LTCFP are currently limited. To this end, we propose Fourier Isovists, a novel input representation based on egocentric visibility, which consistently improves all existing models. We also propose GeoInteractNet (GINet), which couples the layers between a multi-scale attention network (M-SCAN) and a convolutional encoder-decoder network (CED). M-SCAN approximates a super-resolution map of where humans are likely to interact on the way to their goals and produces multi-scale attention maps. The CED then uses these maps in either its encoder's inputs or its decoder's attention gates, which allows GINet to produce super-resolution predictions with substantially higher accuracy than existing models even with Fourier Isovists. In order to evaluate the scalability of models to large and complex environments, which the only existing LTCFP dataset is unsuitable for, a new synthetic crowd dataset with both real and synthetic environments has been generated. In its nascent state, LTCFP has much to gain from our key contributions. The Supplementary Materials, dataset, and code are available at sssohn.github.io/GeoInteractNet.

Wu, Shuhui, Yongliang Shen, Zeqi Tan, and Weiming Lu. "Propose-and-Refine: A Two-Stage Set Prediction Network for Nested Named Entity Recognition." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/613.

Abstract:

Nested named entity recognition (nested NER) is a fundamental task in natural language processing. Various span-based methods have been proposed to detect nested entities with span representations. However, span-based methods do not consider the relationship between a span and other entities or phrases, which is helpful in the NER task. Besides, span-based methods have trouble predicting long entities due to limited span enumeration length. To mitigate these issues, we present the Propose-and-Refine Network (PnRNet), a two-stage set prediction network for nested NER. In the propose stage, we use a span-based predictor to generate some coarse entity predictions as entity proposals. In the refine stage, proposals interact with each other, and richer contextual information is incorporated into the proposal representations. The refined proposal representations are used to re-predict entity boundaries and classes. In this way, errors in coarse proposals can be eliminated, and the boundary prediction is no longer constrained by the span enumeration length limitation. Additionally, we build multi-scale sentence representations, which better model the hierarchical structure of sentences and provide richer contextual information than token-level representations. Experiments show that PnRNet achieves state-of-the-art performance on four nested NER datasets and one flat NER dataset.

Reports on the topic "PREDICTION DATASET"

Roberson, Madeleine, Kathleen Inman, Ashley Carey, Isaac Howard, and Jameson Shannon. Probabilistic neural networks that predict compressive strength of high strength concrete in mass placements using thermal history. Engineer Research and Development Center (U.S.), June 2022. http://dx.doi.org/10.21079/11681/44483.

Abstract:

This study explored the use of artificial neural networks to predict UHPC compressive strengths given thermal history and key mix components. The model developed herein employs Bayesian variational inference using Monte Carlo dropout to convey prediction uncertainty using 735 datapoints on seven UHPC mixtures collected using a variety of techniques. Datapoints contained a measured compressive strength along with three curing inputs (specimen maturity, maximum temperature experienced during curing, time of maximum temperature) and five mixture inputs to distinguish each UHPC mixture (cement type, silicon dioxide content, mix type, water to cementitious material ratio, and admixture dosage rate). Input analysis concluded that predictions were more sensitive to curing inputs than mixture inputs. On average, 8.2% of experimental results in the final model fell outside of the predicted range with 67.9% of these cases conservatively underpredicting. The results support that this model methodology is able to make sufficient probabilistic predictions within the scope of the provided dataset but is not for extrapolating beyond the training data. In addition, the model was vetted using various datasets obtained from literature to assess its versatility. Overall this model is a promising advancement towards predicting mechanical properties of high strength concrete with known uncertainties.

Letcher, Theodore, Sandra LeGrand, and Christopher Polashenski. The Blowing Snow Hazard Assessment and Risk Prediction model: a Python based downscaling and risk prediction for snow surface erodibility and probability of blowing snow. Engineer Research and Development Center (U.S.), March 2022. http://dx.doi.org/10.21079/11681/43582.

Abstract:

Blowing snow is an extreme terrain hazard causing intermittent severe reductions in ground visibility and snow drifting. These hazards pose significant risk to operations in snow-covered regions. While many ingredients-based forecasting methods can be employed to predict where blowing snow is likely to occur, there are currently no physically based tools to predict blowing snow from a weather forecast. However, there are several different process models that simulate the transport of snow over short distances that can be adapted into a terrain forecasting tool. This report documents a downscaling and blowing-snow prediction tool that leverages existing frameworks for snow erodibility, lateral snow transport, and visibility, and applies these frameworks for terrain prediction. This tool is designed to work with standard numerical weather model output and user-specified geographic models to generate spatially variable forecasts of snow erodibility, blowing snow probability, and deterministic blowing-snow visibility near the ground. Critically, this tool aims to account for the history of the snow surface as it relates to erodibility, which further refines the blowing-snow risk output. Qualitative evaluations of this tool suggest that it can provide more precise forecasts of blowing snow. Critically, this tool can aid in mission planning by downscaling high-resolution gridded weather forecast data using an even higher-resolution terrain dataset to make physically based predictions of blowing snow.

Paparazzoa, Ersilia, Vincenzo Lagani, Silvana Geracitano, Luigi Citrigno, Mirella Aurora Aceto, Antoinio Malvaso, Francesco Bruno, Giuseppe Passarino, and Alberto Montesanto. An ELOVL2 based epigenetic clock for forensic age prediction: a systematic review. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, December 2022. http://dx.doi.org/10.37766/inplasy2022.12.0006.

Abstract:

Review question / Objective: To develop an easy, robust and improved blood-based age prediction model using ELOVL2 promoter methylation data. Eligibility criteria: All studies with the aim of understanding the relationship between the ELOVL2 methylation levels and age written in the English language, carried out in humans and providing a publicly available dataset will be included in the systematic review. Articles that did not include original research (e.g., review, opinion article or conference abstract) and for which methylation analysis will be carried out using a technology different from pyrosequencing in tissues different from blood will be excluded from further analyses.

Zhu, Xian-Kui, Brian Leis, and Tom McGaughy. PR-185-173600-R01 Reference Stress for Metal-loss Assessment of Pipelines. Chantilly, Virginia: Pipeline Research Council International, Inc. (PRCI), August 2018. http://dx.doi.org/10.55274/r0011516.

Abstract:

This project focused on quantifying the reference stress to be used in predictive models for assessing the effects of metal loss on pipeline integrity. The results of this project will work in concert with the outcomes of project EC-2-7 that examined sources of scatter in metal-loss predictions with respect to the metal-loss defect geometry. The methodology for developing a new reference stress included empirical and finite element analyses along with comparison of full-scale experimental results that indicate the failure behavior of defect-free pipe has dependence on the strain hardening rate, n, of the pipe steel. Since the strain hardening rate is often unreported in qualification test records and mill certification reports, the development of a new reference stress will seek to include the utilization of the ratio of yield-to-tensile strength (Y/T) as a surrogate for n. This approach ideally would be insensitive to pipe grade, and thus, allow broad application of the reference stress without increasing scatter or bias across grade levels. This work also compared the resulting metal-loss criterion with the new reference stress relative to the B31G and Modified B31G models using a dataset of approximately 75 full-scale burst test results for test vessels containing isolated defects. This comparison was performed by C-FER Technologies under sub-contract to EWI and quantified the prediction bias and prediction variability of the new criterion relative to those widely in use.

Koduru, Smitha, and Jason Skow. PR-244-153719-R01 Quantification of ILI Sizing Uncertainties and Improving Correction Factors. Chantilly, Virginia: Pipeline Research Council International, Inc. (PRCI), August 2018. http://dx.doi.org/10.55274/r0011518.

Abstract:

Operators routinely perform verification digs to assess whether an inline inspection (ILI) tool meets the performance specified by the ILI vendors. Characterizing the actual ILI tool performance using available field and ILI data is a difficult problem due to uncertainties associated with measurements and geometric classification of features. The focus of this project is to use existing ILI and excavation data to develop better approaches for assessing ILI tool performance. For corrosion features, operators are primarily interested in quantifying magnetic flux leakage (MFL) ILI tool sizing error and its relationship to burst pressure estimates. In previously completed PRCI research, a limited MFL ILI dataset was used to determine the corrosion feature depth sizing bias and random error using principles published in API 1163 (2013). The research demonstrated the tendency for ILI predictions to be slightly lower than field measurements (i.e., under-call) for the dataset studied, and it provided a framework for characterizing this bias. The goal of this project was to expand on previous work by increasing the number and type of feature morphologies available for analysis, and by estimating the sizing error of ILI measured external corrosion features. New geometric classification criteria, complementing the current criteria suggested by the Pipeline Operator Forum (POF 2009), were also investigated. Lastly, correction factors based on burst pressure prediction accuracy were developed to account for the effect of adopting various feature interaction rules. This report has a related webinar (member login required).

Hart, Carl R., D. Keith Wilson, Chris L. Pettit, and Edward T. Nykaza. Machine-Learning of Long-Range Sound Propagation Through Simulated Atmospheric Turbulence. U.S. Army Engineer Research and Development Center, July 2021. http://dx.doi.org/10.21079/11681/41182.

Abstract:

Conventional numerical methods can capture the inherent variability of long-range outdoor sound propagation. However, computational memory and time requirements are high. In contrast, machine-learning models provide very fast predictions. This comes by learning from experimental observations or surrogate data. Yet, it is unknown what type of surrogate data is most suitable for machine-learning. This study used a Crank-Nicholson parabolic equation (CNPE) for generating the surrogate data. The CNPE input data were sampled by the Latin hypercube technique. Two separate datasets comprised 5000 samples of model input. The first dataset consisted of transmission loss (TL) fields for single realizations of turbulence. The second dataset consisted of average TL fields for 64 realizations of turbulence. Three machine-learning algorithms were applied to each dataset, namely, ensemble decision trees, neural networks, and cluster-weighted models. Observational data come from a long-range (out to 8 km) sound propagation experiment. In comparison to the experimental observations, regression predictions have 5–7 dB in median absolute error. Surrogate data quality depends on an accurate characterization of refractive and scattering conditions. Predictions obtained through a single realization of turbulence agree better with the experimental observations.

Puttanapong, Nattapong, Arturo M. Martinez Jr, Mildred Addawe, Joseph Bulan, Ron Lester Durante, and Marymell Martillan. Predicting Poverty Using Geospatial Data in Thailand. Asian Development Bank, December 2020. http://dx.doi.org/10.22617/wps200434-2.

Abstract:

This study examines an alternative approach in estimating poverty by investigating whether readily available geospatial data can accurately predict the spatial distribution of poverty in Thailand. It also compares the predictive performance of various econometric and machine learning methods such as generalized least squares, neural network, random forest, and support vector regression. Results suggest that intensity of night lights and other variables that approximate population density are highly associated with the proportion of population living in poverty. The random forest technique yielded the highest level of prediction accuracy among the methods considered, perhaps due to its capability to fit complex association structures even with small and medium-sized datasets.

Alviarez, Vanessa, Michele Fioretti, Ken Kikkawa, and Monica Morlacco. Two-Sided Market Power in Firm-to-Firm Trade. Inter-American Development Bank, August 2021. http://dx.doi.org/10.18235/0003493.

Abstract:

Firms in global value chains (GVCs) are granular and exert bargaining power over the terms of trade. We show that these features are crucial to understanding the well-established variation in prices and pass-through across importers and exporters. We develop a novel theory of prices in GVCs, which tractably nests a wide range of bilateral concentration and bargaining power configurations. We test and evaluate the model's predictions using a novel dataset merging transaction-level U.S. import data with balance sheet data for both U.S. importers and foreign exporters. Our pricing framework enhances traditional frameworks in the literature in accurately predicting price changes following a tariff shock. The results shed light on the role of firms in determining the tariff pass-through onto import prices.

Idakwo, Gabriel, Sundar Thangapandian, Joseph Luttrell, Zhaoxian Zhou, Chaoyang Zhang, and Ping Gong. Deep learning-based structure-activity relationship modeling for multi-category toxicity classification: a case study of 10K Tox21 chemicals with high-throughput cell-based androgen receptor bioassay data. Engineer Research and Development Center (U.S.), July 2021. http://dx.doi.org/10.21079/11681/41302.

Abstract:

Deep learning (DL) has attracted the attention of computational toxicologists as it offers a potentially greater power for in silico predictive toxicology than existing shallow learning algorithms. However, contradicting reports have been documented. To further explore the advantages of DL over shallow learning, we conducted this case study using two cell-based androgen receptor (AR) activity datasets with 10K chemicals generated from the Tox21 program. A nested double-loop cross-validation approach was adopted along with a stratified sampling strategy for partitioning chemicals of multiple AR activity classes (i.e., agonist, antagonist, inactive, and inconclusive) at the same distribution rates amongst the training, validation and test subsets. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22–27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Further in-depth analyses of chemical scaffolding shed insights on structural alerts for AR agonists/antagonists and inactive/inconclusive compounds, which may aid in future drug discovery and improvement of toxicity prediction modeling.

Koutsourelakis, P. Unsupervised Group Discovery and Link Prediction in Relational Datasets: a nonparametric Bayesian approach. Office of Scientific and Technical Information (OSTI), May 2007. http://dx.doi.org/10.2172/908093.
