Theses: "Data / features engineering"

1

Mohammed, Hussein Syed. "Random feature subspace ensemble based approaches for the analysis of data with missing features". Full text available online, 2006. http://www.lib.rowan.edu/find/theses.


2

Baik, Edward H. (Edward Hyeen). "Surface-based segmentation of volume data using texture features". Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/43516.

Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.
Includes bibliographical references (p. 117-123).
by Edward H. Baik.
M.Eng.


3

Campbell, Richard John. "Recognition of free-form 3D objects in range data using global and local features". The Ohio State University, 2001. http://rave.ohiolink.edu/etdc/view?acc_num=osu1486397841221694.


4

Oldfield, Robin B. "Lithological mapping of Northwest Argentina with remote sensing data using tonal, textural and contextual features". Thesis, Aston University, 1988. http://publications.aston.ac.uk/14287/.

Abstract:

Tonal, textural and contextual properties are used in manual photointerpretation of remotely sensed data. This study used these three attributes to produce a lithological map of semi-arid northwest Argentina by semi-automatic computer classification of remotely sensed data. Three types of satellite data were investigated: LANDSAT MSS, TM and SIR-A imagery. Supervised classification using tonal features alone produced poor results: LANDSAT MSS gave classification accuracies of 40 to 60%, while LANDSAT TM data gave 50 to 70%. Adding SIR-A data increased classification accuracy. The higher accuracy of TM over MSS is due to the better discrimination of geological materials afforded by the middle infrared bands of the TM sensor. The maximum likelihood classifier consistently produced classification accuracies 10 to 15% higher than either the minimum distance to means or the decision tree classifier, but this improvement came at the cost of greatly increased processing time. A new type of classifier, the spectral shape classifier, which is computationally as fast as a minimum distance to means classifier, is described. However, its results were disappointing, being lower in most cases than those of the minimum distance or decision tree procedures. Because the classification results using only tonal features were felt to be unacceptably poor, textural attributes were investigated. Texture is an important attribute used by photogeologists to discriminate lithology. For TM data, texture measures increased classification accuracy by up to 15%. For LANDSAT MSS data, however, texture measures did not provide any significant increase in classification accuracy.
For TM data, second-order texture measures, especially those based on the SGLDM, produced the highest classification accuracy. Contextual post-processing was found to increase classification accuracy and improve the visual appearance of classified output by removing isolated misclassified pixels, which tend to clutter classified images. Simple contextual features, such as mode filters, were found to outperform more complex features such as the gravitational filter or minimal area replacement methods. Generally, the larger the filter, the greater the increase in accuracy. Production rules were used to build a knowledge-based system that used tonal and textural features to identify sedimentary lithologies in each of the two test sites. The knowledge-based system identified six out of ten lithologies correctly.
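
The mode filter mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name, the 3 x 3 window, the tie-breaking rule and the toy label grid are all assumptions chosen for illustration.

```python
from collections import Counter


def mode_filter(labels, size=3):
    """Replace each pixel's class label with the most common label in its window.

    `labels` is a list of rows of class labels; `size` is the (odd) window width.
    Windows are clipped at the image border.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            # Assumed tie rule: keep the current label unless another is strictly more common.
            if counts[labels[r][c]] < best:
                out[r][c] = counts.most_common(1)[0][0]
    return out


# Hypothetical classified image with one isolated misclassified pixel (class 2).
classified = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 1, 1],
]
smoothed = mode_filter(classified)
```

In this toy grid the isolated class-2 pixel is relabelled as class 1, which is the kind of clutter removal the abstract describes.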


5

Mora, Omar Ernesto. "Morphology-Based Identification of Surface Features to Support Landslide Hazard Detection Using Airborne LiDAR Data". The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1429861576.


6

Fridley, Lila (Lila J.). "Improving online demand forecast using novel features in website data : a case study at Zara". Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117976.

Abstract:

Thesis: M.B.A., Massachusetts Institute of Technology, Sloan School of Management, in conjunction with the Leaders for Global Operations Program at MIT, 2018.
Thesis: S.M., Massachusetts Institute of Technology, Department of Civil and Environmental Engineering, in conjunction with the Leaders for Global Operations Program at MIT, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 77).
The challenge of improving retail inventory customer service level while reducing costs is common across many retailers. This problem is typically addressed through efficient supply chain operations. This thesis discusses the development of new methodologies to predict e-commerce consumer demand for seasonal, short life-cycle articles. The new methodology incorporates novel data to predict demand of existing products through a bottom-up point forecast at the color and location level. It addresses the widely observed challenge of forecasting censored demand during a stock out. Zara introduces thousands of new items each season across over 2100 stores in 93 markets worldwide [1]. The Zara Distribution team is responsible for allocating inventory to each physical and e-commerce store. In line with Zara's quick to retail strategy, Distribution is flexible and responsive in forecasting store demand, with new styles arriving in stores twice per week [1]. The company is interested in improving the demand forecast by leveraging the novel e-commerce data that has become available since the launch of Zara.com in 2010 [2]. The results of this thesis demonstrate that the addition of new data to a linear regression model reduces prediction error by an average of 16% for e-commerce articles experiencing censored demand during a stock out, in comparison to traditional methods. Expanding the scope to all e-commerce articles, this thesis demonstrates that incorporating easily accessible web data yields an additional 2% error reduction on average for all articles on a color and location basis. Traditional methods to improve demand prediction have not before leveraged the expansive availability of e-commerce data, and this research presents a novel solution to the fashion forecasting challenge. 
This thesis project may additionally be used as a case-study for companies using subscriptions or an analogous tracking tool, as well as novel data features, in a user-friendly and implementable demand forecast model.
by Lila Fridley.
M.B.A.
S.M.


7

Wang, Ziang. "People Matching for Transportation Planning Using Optimized Features and Texel Camera Data for Sequential Estimation". DigitalCommons@USU, 2012. https://digitalcommons.usu.edu/etd/1298.

Abstract:

This thesis explores pattern recognition in the dynamic setting of public transportation, such as a bus, as people enter and later exit from a doorway. Matching the entrance and exit of each individual provides accurate information about individual riders such as how long a person is on a bus and which stops the person uses. At a higher level, matching exits to entries provides information about the distribution of traffic flow across the whole transportation system. A texel camera is implemented and multiple measures of people are made where the depth and color data are generated. A large number of features are generated and the sequential floating forward selection (SFFS) algorithm is used for selecting the optimized features. Criterion functions using marginal accuracy and maximization of minimum normalized Mahalanobis distance are designed and compared. Because of the particular case of the bus environment, which is a sequential estimation problem, a trellis optimization algorithm is designed based on a sequence of measurements from the texel camera. Since the number of states in the trellis grows exponentially with the number of people currently on the bus, a beam search pruning technique is employed to manage the computational and memory load. Experimental results using real texel camera measurements show good results for 68 people exiting from an initially full bus in a randomized order. In a bus route simulation where a true traffic flow distribution is used to randomly draw entry and exit events for simulated riders, the proposed sequential estimation algorithm produces an estimated traffic flow distribution which provides an excellent match to the true distribution.
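
As a rough illustration of the sequential floating forward selection (SFFS) step named in this abstract, the sketch below alternates a forward inclusion with conditional backward exclusions. It is an assumption-laden toy: the criterion function, the relevance scores and the redundancy penalty are hypothetical and stand in for the marginal-accuracy and Mahalanobis-distance criteria, which are not reproduced here.

```python
def sffs(n_features, criterion, target_size):
    """Sequential floating forward selection over feature indices 0..n_features-1.

    `criterion(subset)` returns a score to maximise for a list of feature indices.
    """
    if not 0 < target_size <= n_features:
        raise ValueError("target_size must be between 1 and n_features")
    selected = []
    best = {}  # subset size -> (score, subset): best subset seen so far at that size
    while len(selected) < target_size:
        # Inclusion: add the single feature that maximises the criterion.
        candidates = [f for f in range(n_features) if f not in selected]
        add = max(candidates, key=lambda f: criterion(selected + [f]))
        selected = selected + [add]
        score = criterion(selected)
        if len(selected) not in best or score > best[len(selected)][0]:
            best[len(selected)] = (score, selected)
        # Conditional exclusion: drop features while that beats the best smaller subset.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, reduced)
            else:
                break
    return best[target_size][1]


# Hypothetical per-feature relevance and a pairwise redundancy penalty.
RELEVANCE = [0.9, 0.8, 0.3, 0.1]
REDUNDANCY = {(0, 1): 0.7}


def toy_criterion(subset):
    score = sum(RELEVANCE[f] for f in subset)
    for (a, b), penalty in REDUNDANCY.items():
        if a in subset and b in subset:
            score -= penalty
    return score


chosen = sffs(len(RELEVANCE), toy_criterion, target_size=2)
```

With these toy numbers the two individually strongest features are redundant, so the search pairs feature 0 with feature 2 instead of feature 1.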


8

Katzwinkel, Tim, Bhavinbhai Patel, Alexander Schmid, Walter Schmidt, Justus Siebrecht, Manuel Löwer and Jörg Feldhusen. "Kosteneffiziente Technologien zur geometrischen Datenaufnahme im digitalen Reverse Engineering". Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-215118.

Abstract:

This paper proposes a selection method that identifies suitable techniques for the cost-efficient reconstruction of geometric data of assemblies and components. It takes into account various object-related influencing factors, such as component complexity, existing standard features (e.g. standardized threaded holes) or special surface geometries. In addition, different techniques are compared quantitatively against the criteria of time required, technological effort and achievable dimensional accuracy. This allows the user to estimate the necessary trade-off between cost and achievable dimensional accuracy.


9

Fabijan, Aleksander. "Developing the right features : the role and impact of customer and product data in software product development". Licentiate thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-7794.

Abstract:

Software product development companies are increasingly striving to become data-driven. As products become increasingly connected to the Internet, access to customer feedback and product data has been demonetized. Systematically collecting this feedback and using it efficiently in product development, however, are challenges that large-scale software development companies face today when confronted with large amounts of available data. In this thesis, we explore the collection, use and impact of customer feedback on software product development. We base our work on a 2-year longitudinal multiple-case study with case companies in the software-intensive domain, and complement it with a systematic review of the literature. We identify and confirm that large software companies today collect vast amounts of feedback data but struggle to use it effectively. As a result, there is a risk of prioritizing the development of features that may not deliver value to customers. Our contribution to this problem is threefold. First, we present a comprehensive and systematic review of activities and techniques used to collect customer feedback and product data in software product development. Next, we show that the impact of customer feedback evolves over time, but due to the lack of sharing of the collected data, companies do not fully benefit from this feedback. Finally, we provide an improvement framework for practitioners and researchers to use the collected feedback data in order to differentiate between different feature types and to model feature value during the lifecycle. With our contributions, we aim to bring software companies one step closer to data-driven decision making in software product development.


10

Erdogan, Ozgur. "Main Seismological Features Of Recently Compiled Turkish Strong Motion Database". Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/3/12609679/index.pdf.

Abstract:

This thesis aims to compile the Turkish strong-motion database for efficient use in earthquake engineering and strong-motion seismology studies. Within this context, the Turkish strong-motion database is homogenized in terms of basic earthquake source parameters (e.g. magnitude, style-of-faulting) as well as site classes and different source-to-site distance metrics. As part of this objective, empirical relationships for different magnitude scales are presented for further harmonization of the database. The selected raw (unprocessed) strong-motion accelerograms that do not suffer from non-standard problems are processed. A comparative study is also conducted between the peak ground-motion values of the Turkish strong-motion database and the estimates computed from different ground-motion prediction models. In this way, the regional differences of the Turkish database are evaluated by making use of global prediction models. It is believed that the main products of this thesis will be of great use for reliable national seismic risk and hazard studies.


11

Jin, Chao. "Methodology on Exact Extraction of Time Series Features for Robust Prognostics and Health Monitoring". University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1504795992214385.


12

Mehta, Alok. "Evolving legacy system's features into fine-grained components using regression test-cases". Link to electronic thesis, 2002. http://www.wpi.edu/Pubs/ETD/Available/etd-1211102-163800.

Abstract:

Dissertation (Ph. D.)--Worcester Polytechnic Institute.
Keywords: software maintenance; software evolution; regression test-cases; components; legacy system; incremental software evolution methodology; fine-grained components. Includes bibliographical references (p. 283-294).


13

Hounsell, Marcelo da Silva. "Feature-based validation reasoning for intent-driven engineering design". Thesis, Loughborough University, 1998. https://dspace.lboro.ac.uk/2134/33152.

Abstract:

Feature based modelling represents the future of CAD systems. However, operations such as modelling and editing can corrupt the validity of a feature-based model representation. Feature interactions are a consequence of feature operations and the existence of a number of features in the same model. Feature interaction affects not only the solid representation of the part, but also the functional intentions embedded within features. A technique is thus required to assess the integrity of a feature-based model from various perspectives, including the functional intentional one, and this technique must take into account the problems brought about by feature interactions and operations. The understanding, reasoning and resolution of invalid feature-based models requires an understanding of the feature interaction phenomena, as well as the characterisation of these functional intentions. A system capable of such assessment is called a feature-based representation validation system. This research studies feature interaction phenomena and feature-based designer's intents as a medium to achieve a feature-based representation validation system.


14

Lee, Nien-Lung. "Feature Recognition From Scanned Data Points". The Ohio State University, 1995. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487868114111376.


15

Davis, Jonathan J. "Machine learning and feature engineering for computer network security". Thesis, Queensland University of Technology, 2017. https://eprints.qut.edu.au/106914/1/Jonathan_Davis_Thesis.pdf.

Abstract:

This thesis studies the application of machine learning to the field of Cyber security. Machine learning algorithms promise to enhance Cyber security by identifying malicious activity based only on provided examples. However, a major difficulty is the unsuitability of raw Cyber security data as input. In an attempt to address this problem, this thesis presents a framework for automatically constructing relevant features suitable for machine learning directly from network traffic. We then test the effectiveness of the framework by applying it to three Cyber security problems: HTTP tunnel detection, DNS tunnel detection, and traffic classification.


16

Ramanayaka, Mudiyanselage Asanga. "Data Engineering and Failure Prediction for Hard Drive S.M.A.R.T. Data". Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1594957948648404.


17

Sarkar, Saurabh. "Feature Selection with Missing Data". University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378194989.


18

Al-Sit, Waleed. "Automatic feature detection and interpretation in borehole data". Thesis, University of Liverpool, 2015. http://livrepository.liverpool.ac.uk/2014181/.

Abstract:

Detailed characterisation of the structure of subsurface fractures is greatly facilitated by digital borehole logging instruments; however, interpreting their output is typically time-consuming and labour-intensive. Despite recent advances towards autonomy and automation, the final interpretation remains heavily dependent on the skill, experience, alertness and consistency of a human operator. Existing computational tools fail to detect layers between rocks that do not exhibit distinct fracture boundaries, and often struggle to characterise cross-cutting layers and partial fractures. This research proposes a novel approach to the characterisation of planar rock discontinuities from digital images of borehole logs by using visual texture segmentation and pattern recognition techniques with an iterative adaptation of the Hough transform. This approach has successfully detected non-distinct, partial, distorted and steep fractures and layers in a fully automated fashion and at a relatively low computational cost. Borehole geometry or breakouts (e.g. borehole wall elongation or compression) and imaging tool decentralisation problems affect fracture characterisation and the quality of extracted geological parameters. This research presents a novel approach to the characterisation of distorted fractures in deformed borehole geometry by using least-squares ellipse fitting and a modified Hough transform. This approach has successfully detected distorted fractures in deformed borehole geometry using simulated data. To increase fracture detection accuracy, this research combines multi-sensor data by merging edges extracted from different borehole data. This approach has successfully increased the true positive detection rate. The performance of the developed algorithms and the results of their application have been promising in terms of speed, accuracy and consistency when compared to manual interpretation by an expert operator.
It is anticipated that the findings of this research will significantly increase reliance on automatic interpretation.


19

Abdalla, Hassan Shafik. "Development of a design for manufacture concurrent engineering system". Thesis, De Montfort University, 1995. http://hdl.handle.net/2086/4253.


20

Ni, Weizeng. "Ontology-based Feature Construction on Non-structured Data". University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1439309340.


21

Sarkar, Biplab. "Modeling and manufacturing of multiple featured objects based on measurement data". The Ohio State University, 1991. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487757723996478.


22

Muteba, Ben Ilunga. "Data Science techniques for predicting plant genes involved in secondary metabolites production". University of the Western Cape, 2018. http://hdl.handle.net/11394/7039.

Abstract:

Masters of Science
Plant genome analysis is currently experiencing a boost due to reduced costs associated with the development of next generation sequencing technologies. Knowledge on genetic background can be applied to guide targeted plant selection and breeding, and to facilitate natural product discovery and biological engineering. In medicinal plants, secondary metabolites are of particular interest because they often represent the main active ingredients associated with health-promoting qualities. Plant polyphenols are a highly diverse family of aromatic secondary metabolites that act as antimicrobial agents, UV protectants, and insect or herbivore repellents. Most of the genome mining tools developed to understand genetic materials have very seldom addressed secondary metabolite genes and biosynthesis pathways. Little significant research has been conducted to study key enzyme factors that can predict a class of secondary metabolite genes from polyketide synthases. The objectives of this study were twofold: Primarily, it aimed to identify the biological properties of secondary metabolite genes and the selection of a specific gene, naringenin-chalcone synthase or chalcone synthase (CHS). The study hypothesized that data science approaches in mining biological data, particularly secondary metabolite genes, would enable the compulsory disclosure of some aspects of secondary metabolite (SM). Secondarily, the aim was to propose a proof of concept for classifying or predicting plant genes involved in polyphenol biosynthesis from data science techniques and convey these techniques in computational analysis through machine learning algorithms and mathematical and statistical approaches. 
Three specific challenges were encountered while analysing secondary metabolite datasets: 1) class imbalance, which refers to a lack of proportionality among protein sequence classes; 2) high dimensionality, which refers to the very large feature space that arises when analysing bioinformatics datasets; and 3) differences in protein sequence length. Given these inherent issues, developing precise classification and statistical models is a challenge. Therefore, the prerequisite for effective SM plant gene mining is dedicated data science techniques that can collect, prepare and analyse SM genes.


23

Khazem, Salim. "Apprentissage profond et traitement d'images pour la détection et la prédiction des nœuds au cœur des rondins". Electronic Thesis or Diss., CentraleSupélec, 2024. http://www.theses.fr/2024CSUP0016.

Abstract:

In the wood industry, the quality of logs is heavily influenced by their internal structure, particularly the distribution of defects, especially knots within the trees. Accurately detecting these knots, which result from branch growth, can significantly enhance the industry's efficiency by reducing waste and optimizing the quality of wood products. Traditionally, identifying knots and other internal characteristics of logs, such as centers and contours, requires specialized equipment like CT scanners, often combined with conventional computer vision approaches to obtain detailed images of the trees' internal structure. The main challenge is that such equipment is costly and not accessible to all companies, limiting its adoption in the industry. This thesis focuses on addressing this issue, particularly on detecting internal defects based on the external surface of logs. The initial goal is to automate the detection of various log characteristics. These characteristics will then be used to perform the main task, which involves utilizing contour variations to detect the distribution of internal defects. One of the contributions of this work is the automation of detecting the semantic characteristics of trees using X-ray images. We establish that deep learning-based methods can perform well in detection and generalize effectively to other species without requiring human expertise. We introduce three end-to-end pipelines for detecting different characteristics, namely tree biological centers, contours, and knots. The second significant contribution of this work is the development of a model for detecting internal defects based on the external surface. The model exclusively uses the fine contours of the log to predict the presence and distribution of internal knots, leveraging deep learning techniques. Initially, a recurrent convolutional model was employed to efficiently capture contour variations for inferring internal defects. 
Subsequently, exploratory work was conducted, beginning with the development of a lightweight model for shape classification. This approach helped validate the underlying principles before extending it to the detection of internal defects, aiming to reduce model complexity without compromising result accuracy.


24

Null, Thomas Calvin. "Use of Self Organized Maps for Feature Extraction of Hyperspectral Data". MSSTATE, 2001. http://sun.library.msstate.edu/ETD-db/theses/available/etd-11082001-145530/.

Abstract:

In this paper, the problem of analyzing hyperspectral data is presented. The complexity of multi-dimensional data leads to the need for computer assisted data compression and labeling of important features. A brief overview of Self-Organizing Maps and their variants is given and then two possible methods of data analysis are examined. These methods are incorporated into a program derived from som_toolbox2. In this program, ASD data (data collected by an Analytical Spectral Device sensor) is read into a variable, relevant bands for discrimination between classes are extracted, and several different methods of analyzing the results are employed. A GUI was developed for easy implementation of these three stages.


25

Yeu, Yeon. "FEATURE EXTRACTION FROM HYPERSPECTRAL IMAGERY FOR OBJECT RECOGNITION". The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1306848130.


26

Cassabaum, Mary Lou. "Exploiting high dimensional data for signal characterization and classification in feature space". Diss., The University of Arizona, 2004. http://hdl.handle.net/10150/280592.

Abstract:

The challenge of target classification is addressed in this work with both feature extraction and classifier hyperparameter optimization investigations. Simulated and measured high-range resolution radar data are processed, features are selected, and the resulting features are given to a classifier. For feature extraction, we examine two techniques. The first is a supervised method requiring an "expert" to identify and construct features. The performance of this approach served as motivation for the second technique, an automated wavelet packet basis approach. For this approach, we develop the Kolmogorov-Smirnov best-basis technique, which utilizes empirical cumulative distribution functions and results in improved classification performance at low dimensionality. To measure classification efficacy, we use two classifiers: a quadratic Bayesian classifier, which assumes a Gaussian distribution, and a support vector machine. The support vector machine has generated excitement and interest in the pattern recognition community due to its generalization, performance, and ability to operate in high-dimensional feature spaces. Although support vector machines are generated without the use of user-specified models, required hyperparameters, such as kernel width, are usually user-specified or experimentally derived. We develop techniques to optimize selection of these hyperparameters. These approaches allow us to characterize the problem, ultimately resulting in an automated approach for optimization, semi-alignment.
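
The abstract's best-basis technique is not reproduced here, but the statistic it builds on can be sketched. The snippet below computes the two-sample Kolmogorov-Smirnov statistic from empirical cumulative distribution functions and uses it to rank features by class separation; the feature names and sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs are step functions, so the largest gap occurs at a sample point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical coefficient values for two target classes.
class_a = {"coef_0": [0.1, 0.2, 0.3, 0.4], "coef_1": [1.0, 1.1, 1.2, 1.3]}
class_b = {"coef_0": [0.15, 0.25, 0.35, 0.45], "coef_1": [2.0, 2.1, 2.2, 2.3]}

# Rank features from most to least class-separating.
ranking = sorted(
    class_a,
    key=lambda name: ks_statistic(class_a[name], class_b[name]),
    reverse=True,
)
```

A feature whose class-conditional distributions barely overlap scores close to 1, so it ranks ahead of a feature whose distributions interleave.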

27

Li, Hua. "Feature Selection for High-risk Pattern Discovery in Medical Data". University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1353154433.

28

Zhang, Yi. "Application of Hyper-geometric Hypothesis-based Quantification and Markov Blanket Feature Selection Methods to Generate Signals for Adverse Drug Reaction Detection". University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1353343669.

29

Sharma, Jason P. (Jason Poonam) 1979. "Classification performance of support vector machines on genomic data utilizing feature space selection techniques". Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87830.

30

Wu, You. "Feature Selection on High Dimensional Histogram Data to Improve Vehicle Components´ Life Length Prediction". Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-428615.

Abstract:

Feature selection plays an important role in life length prediction. A well-selected feature subset can reduce the complexity of predictive models and help explain the mechanism of the ageing process. This thesis investigates the potential of applying feature selection and machine learning to vehicles' operational data to predict the life length of diesel particulate filters. Filter-based feature selection methods using the Pearson correlation coefficient, mutual information and analysis of variance are tested and compared with a wrapper-based method, recursive feature elimination. The selected subsets are evaluated by linear regression, support vector machine, and multilayer perceptron. The results show that filters and wrappers are both able to significantly reduce the input feature sizes while keeping the model performance. In particular, by recursive feature elimination, 5 variables are selected from 130 with classification accuracy over 90%.

31

He, Yi. "An Analysis of Airborne Data Collection Methods for Updating Highway Feature Inventory". DigitalCommons@USU, 2016. https://digitalcommons.usu.edu/etd/5016.

Abstract:

Highway assets, including traffic signs, traffic signals, light poles, and guardrails, are important components of transportation networks. They guide, warn and protect drivers, and regulate traffic. To manage and maintain the regular operation of the highway system, state departments of transportation (DOTs) need reliable and up-to-date information about the location and condition of highway assets. Different methodologies have been employed to collect road inventory data. Currently, ground-based technologies are widely used to help DOTs continually update their road databases, while air-based methods are not commonly used. One possible reason is that the initial investment for air-based methods is relatively high; another is the lack of a systematic and effective approach to extract road features from raw airborne light detection and ranging (LiDAR) data and aerial image data. However, for large-area inventories (e.g., a whole state highway inventory), the total cost of using aerial mapping is actually much lower than that of other methods, considering the time and personnel needed. Moreover, unmanned aerial vehicles (UAVs) are easily accessible and inexpensive, which makes it possible to reduce costs for aerial mapping. The focus of this project is to analyze the capability and strengths of airborne data collection systems for highway inventory data collection. In this research, a field experiment was conducted by the Remote Sensing Service Laboratory (RSSL), Utah State University (USU), to collect airborne data. Two methodologies were proposed for data processing: an ArcGIS-based algorithm for airborne LiDAR data and a MATLAB-based procedure for aerial photography. The results demonstrated the feasibility and high efficiency of the airborne data collection method for updating the highway inventory database.

32

Allen, Andrew J. "Combining Machine Learning and Empirical Engineering Methods Towards Improving Oil Production Forecasting". DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2223.

Abstract:

Current methods of production forecasting such as decline curve analysis (DCA) or numerical simulation require years of historical production data, and their accuracy is limited by the choice of model parameters. Traditional forecasting methods have proven difficult to apply to unconventional resources because these resources lack long production histories and have extremely variable model parameters. This research proposes a data-driven alternative to reservoir simulation and production forecasting techniques. We create a proxy-well model for predicting cumulative oil production by selecting statistically significant well completion parameters and reservoir information as independent predictor variables in regression-based models. Then, principal component analysis (PCA) is applied to extract key features of a well’s time-rate production profile, and these features are used to estimate cumulative oil production. The efficacy of the models is examined on field data from over 400 wells in the Eagle Ford Shale in South Texas, supplied from an industry database. The results of this study can be used to help oil and gas companies determine the estimated ultimate recovery (EUR) of a well and in turn inform financial and operational decisions based on available production and well completion data.

33

Tennety, Chandu. "Machining Feature Recognition Using 2D Data of Extruded Operations in Solid Models". Ohio University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1181406949.

34

Sivakumar, Krish. "CAD feature development and abstraction for process planning". Ohio : Ohio University, 1994. http://www.ohiolink.edu/etd/view.cgi?ohiou1180038784.

35

Song, Wen. "Planetary navigation activity recognition using wearable accelerometer data". Thesis, Kansas State University, 2013. http://hdl.handle.net/2097/15813.

Abstract:

Master of Science
Department of Electrical & Computer Engineering
Steve Warren
Activity recognition can be an important part of human health awareness. Many benefits can be generated from the recognition results, including knowledge of activity intensity as it relates to wellness over time. Various activity-recognition techniques have been presented in the literature, though most address simple activity-data collection and off-line analysis. More sophisticated real-time identification is less often addressed. Therefore, it is promising to consider the combination of current off-line, activity-detection methods with wearable, embedded tools in order to create a real-time wireless human activity recognition system with improved accuracy. Different from previous work on activity recognition, the goal of this effort is to focus on specific activities that an astronaut may encounter during a mission. Planetary navigation field test (PNFT) tasks are designed to meet this need. The approach used by the KSU team is to pre-record data on the ground in normal earth gravity and seek signal features that can be used to identify, and even predict, fatigue associated with these activities. The eventual goal is to then assess/predict the condition of an astronaut in a reduced-gravity environment using these predetermined rules. Several classic machine learning algorithms, including the k-Nearest Neighbor, Naïve Bayes, C4.5 Decision Tree, and Support Vector Machine approaches, were applied to these data to identify recognition algorithms suitable for real-time application. Graphical user interfaces (GUIs) were designed for both MATLAB and LabVIEW environments to facilitate recording and data analysis. Training data for the machine learning algorithms were recorded while subjects performed each activity, and then these identification approaches were applied to new data sets with an identification accuracy of around 86%. Early results indicate that a single three-axis accelerometer is sufficient to identify the occurrence of a given PNFT activity. 
A custom, embedded acceleration monitoring system employing ZigBee transmission is under development for future real-time activity recognition studies. A different GUI has been implemented for this system, which uses an on-line algorithm that will seek to identify activity at a refresh rate of 1 Hz.

36

Mortensen, Clifton H. "A Computational Fluid Dynamics Feature Extraction Method Using Subjective Logic". BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2208.

Abstract:

Computational fluid dynamics simulations are advancing to correctly simulate highly complex fluid flow problems that can require weeks of computation on expensive high performance clusters. These simulations can generate terabytes of data and pose a severe challenge to a researcher analyzing the data. Presented in this document is a general method to extract computational fluid dynamics flow features concurrent with a simulation and as a post-processing step to drastically reduce researcher post-processing time. This general method uses software agents governed by subjective logic to make decisions about extracted features in converging and converged data sets. The software agents are designed to work inside the Concurrent Agent-enabled Feature Extraction concept and operate efficiently on massively parallel high performance computing clusters. Also presented is a specific application of the general feature extraction method to vortex core lines. Each agent's belief tuple is quantified using a pre-defined set of information. The information and functions necessary to set each component in each agent's belief tuple are given along with an explanation of the methods for setting the components. A simulation of a blunt fin is run showing convergence of the horseshoe vortex core to its final spatial location at 60% of the converged solution. Agents correctly select between two vortex core extraction algorithms and correctly identify the expected probabilities of vortex cores as the solution converges. A simulation of a delta wing is run showing coherently extracted primary vortex cores as early as 16% of the converged solution. Agents select primary vortex cores extracted by the Sujudi-Haimes algorithm as the most probable primary cores.
These simulations show concurrent feature extraction is possible and that intelligent agents following the general feature extraction method are able to make appropriate decisions about converging and converged features based on pre-defined information.

37

Chen, Yan. "Data Quality Assessment Methodology for Improved Prognostics Modeling". University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1330024393.

38

Yang, Yimin. "Exploring Hidden Coherent Feature Groups and Temporal Semantics for Multimedia Big Data Analysis". FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2254.

Abstract:

Thanks to the advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications in various areas so that people can easily access and manage multimedia data. To meet such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, which ranges from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis.
In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.

39

Abid, Saad Bin, and Xian Wei. "Development of Software for Feature Model Rendering". Thesis, Jönköping University, JTH, Computer and Electrical Engineering, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-621.

Abstract:

This Master’s thesis aims to improve the management of artifacts in the context of a joint project between Jönköping University, through the SEMCO project, and an industrial partner, a company that develops software for safety components. The two parties have slightly different interests, but this project can serve both.

Feature modelling is now an efficient approach to domain analysis. The purpose of this thesis is to analyse four popular existing feature diagram notations, identify what they have in common, and suggest how to use the existing notation systems efficiently in different situations.

The developed software is based on the knowledge established in the research analysis. Two notation systems suggested in the research part of the thesis report are implemented in the developed software, “NotationManager”. The development procedures are also described, and the developers' choices are discussed along with comparisons for the different situations.

The scope of both the research part and the development is discussed, and future work on the developed solution is suggested.

40

Hanley, John P. "A New Evolutionary Algorithm For Mining Noisy, Epistatic, Geospatial Survey Data Associated With Chagas Disease". ScholarWorks @ UVM, 2017. http://scholarworks.uvm.edu/graddis/727.

Abstract:

The scientific community is just beginning to understand some of the profound effects that feature interactions and heterogeneity have on natural systems. Despite the belief that these nonlinear and heterogeneous interactions exist across numerous real-world systems (e.g., from the development of personalized drug therapies to market predictions of consumer behaviors), the tools for analysis have not kept pace. This research was motivated by the desire to mine data from large socioeconomic surveys aimed at identifying the drivers of household infestation by a Triatomine insect that transmits the life-threatening Chagas disease. To decrease the risk of transmission, our colleagues at the laboratory of applied entomology and parasitology have implemented mitigation strategies (known as Ecohealth interventions); however, limited resources necessitate the search for better risk models. Mining these complex Chagas survey data for potential predictive features is challenging due to imbalanced class outcomes, missing data, heterogeneity, and the non-independence of some features. We develop an evolutionary algorithm (EA) to identify feature interactions in "Big Datasets" with desired categorical outcomes (e.g., disease or infestation). The method is non-parametric and uses the hypergeometric PMF as a fitness function to tackle challenges associated with using p-values in Big Data (e.g., p-values decrease inversely with the size of the dataset). To demonstrate the EA's effectiveness, we first test the algorithm on three benchmark datasets. These include two classic Boolean classifier problems: (1) the "majority-on" problem and (2) the multiplexer problem, as well as (3) a simulated single nucleotide polymorphism (SNP) disease dataset. Next, we apply the EA to real-world Chagas Disease survey data and successfully archived numerous high-order feature interactions associated with infestation that would not have been discovered using traditional statistics.
These feature interactions are also explored using network analysis. The spatial autocorrelation of the genetic data (SNPs of Triatoma dimidiata) was captured using geostatistics. Specifically, a modified semivariogram analysis was performed to characterize the SNP data and help elucidate the movement of the vector within two villages. For both villages, the SNP information showed strong spatial autocorrelation albeit with different geostatistical characteristics (sills, ranges, and nuggets). These metrics were leveraged to create risk maps that suggest the more forested village had a sylvatic source of infestation, while the other village had a domestic/peridomestic source. This initial exploration into using Big Data to analyze disease risk shows that novel and modified existing statistical tools can improve the assessment of risk on a fine-scale.
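
The abstract above names the hypergeometric PMF as the fitness function of the evolutionary algorithm. As a minimal sketch of that PMF alone, assuming nothing about how the thesis maps survey features onto the counts, the probability of drawing exactly k "positive" items in a sample of n from a population of N that contains K positives can be computed as follows. The function name, parameter names and example numbers are hypothetical and are not taken from the thesis.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k positives in a sample of n drawn without
    replacement from a population of N items, K of which are positive.

    All names are illustrative; the thesis does not specify them.
    """
    # Outcomes outside the feasible range have probability zero.
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Hypothetical example: 4 households, 2 infested; 2 households share a
# feature combination, and exactly 1 of those 2 is infested.
p = hypergeom_pmf(k=1, N=4, K=2, n=2)
```

A small PMF value means the observed overlap between a feature combination and the outcome would be unlikely under random sampling, which is one plausible reason to use it as a fitness signal; whether the thesis minimises or otherwise transforms the value is not stated in the abstract.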

41

Zhou, Mu. "Knowledge Discovery and Predictive Modeling from Brain Tumor MRIs". Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/5809.

Abstract:

Quantitative cancer imaging is an emerging field that develops computational techniques to acquire a deep understanding of cancer characteristics for cancer diagnosis and clinical decision making. The recent emergence of growing clinical imaging data provides a wealth of opportunity to systematically explore quantitative information to advance cancer diagnosis. Crucial questions arise as to how we can develop specific computational models that are capable of mining meaningful knowledge from a vast quantity of imaging data and how to transform such findings into improved personalized health care. This dissertation presents a set of computational models in the context of malignant brain tumors, specifically Glioblastoma Multiforme (GBM), which is notoriously aggressive with a poor survival rate. In particular, this dissertation developed quantitative feature extraction approaches for tumor diagnosis from magnetic resonance imaging (MRI), including a multi-scale local computational feature and a novel regional habitat quantification analysis of tumors. In addition, we proposed a histogram-based representation to investigate biological features to characterize ecological dynamics, which is of great clinical interest in evaluating tumor cellular distributions. Furthermore, with regard to clinical systems, generic machine learning techniques are typically incapable of generalizing well to specific diagnostic problems. Therefore, quantitative analysis from a data-driven perspective is becoming critical. In this dissertation, we propose two specific data-driven models to tackle different types of clinical MRI data. First, we inspected cancer systems from a time-domain perspective. We propose a quantitative histogram-based approach that builds a prediction model, measuring the differences between pre- and post-treatment diagnostic MRI data. Second, we investigated the problem of mining knowledge from a skewed distribution, in which data samples of each survival group are unequally distributed.
We proposed an algorithmic framework to effectively predict survival groups by jointly considering imbalanced distributions and classifier design. Our approach achieved an accuracy of 95.24%, suggesting it captures class-specific information in a challenging clinical setting.

42

Pookhao, Naruekamol. "Statistical Methods for Functional Metagenomic Analysis Based on Next-Generation Sequencing Data". Diss., The University of Arizona, 2014. http://hdl.handle.net/10150/320986.

Abstract:

Metagenomics is the study of the collective microbial genetic content recovered directly from natural (e.g., soil, ocean, and freshwater) or host-associated (e.g., human gut, skin, and oral) environmental communities that contain microorganisms, i.e., microbiomes. Rapid developments in next generation sequencing (NGS) technologies, which make it possible to sequence tens or hundreds of millions of short DNA fragments (or reads) in a single run, facilitate the study of multiple microorganisms living in environmental communities. Metagenomics, a relatively new but fast growing field, allows us to understand the diversity of microbes, their functions, cooperation, and evolution in a particular ecosystem. It also helps us identify significantly different metabolic potentials in different environments. In particular, metagenomic analysis on the basis of functional features (e.g., pathways, subsystems, functional roles) makes it possible to relate the genomic contents of microbes to human health and to understand how the microbes affect human health by analyzing metagenomic data corresponding to two or more populations with different clinical phenotypes (e.g., diseased and healthy, or different treatments). Currently, metagenomic analysis has a substantial impact not only on genetic and environmental areas, but also on clinical applications. In our study, we focus on the development of computational and statistical methods for functional metagenomic analysis of sequencing data obtained from various environmental microbial samples/communities.

43

Dill, Evan T. "Integration of 3D and 2D Imaging Data for Assured Navigation in Unknown Environments". Ohio University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1299616166.

44

Mizaku, Alda. "Biomolecular feature selection of colorectal cancer microarray data using GA-SVM hybrid and noise perturbation to address overfitting". Diss., Online access via UMI:, 2009.

Abstract:

Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Bioengineering, Biomedical Engineering, 2009.
Includes bibliographical references.

45

Bard, Ari. "Modeling and Predicting Heat Transfer Coefficients for Flow Boiling in Microchannels". Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1619091352188123.

46

Regnier, Lise. "Localization, Characterization and Recognition of Singing Voices". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00687475.

Abstract:

This dissertation is concerned with the problem of describing the singing voice within the audio signal of a song. This work is motivated by the fact that the lead vocal is the element that attracts the attention of most listeners. For this reason it is common for music listeners to organize and browse music collections using information related to the singing voice such as the singer name. Our research concentrates on the three major problems of music information retrieval: the localization of the source to be described (i.e. the recognition of the elements corresponding to the singing voice in the signal of a mixture of instruments), the search of pertinent features to describe the singing voice, and finally the development of pattern recognition methods based on these features to identify the singer. For this purpose we propose a set of novel features computed on the temporal variations of the fundamental frequency of the sung melody. These features, which aim to describe the vibrato and the portamento, are obtained with the aid of a dedicated model. In practice, these features are computed on the time-varying frequency of partials obtained using the sinusoidal model. In the first experiment we show that partials corresponding to the singing voice can be accurately differentiated from the partials produced by other instruments using decisions based on the parameters of the vibrato and the portamento. Once the partials emitted by the singer are identified, the segments of the song containing singing can be directly localized. To improve the recognition of the partials emitted by the singer we propose to group partials that are related harmonically. Partials are clustered according to their degree of similarity. This similarity is computed using a set of CASA cues including their temporal frequency variations (i.e. the vibrato and the portamento). 
The clusters of harmonically related partials corresponding to the singing voice are identified using the vocal vibrato and the portamento parameters. Groups of vocal partials can then be re-synthesized to isolate the voice. The result of the partial grouping can also be used to transcribe the sung melody. We then propose to go further with these features and study if the vibrato and portamento characteristics can be considered as a part of the singers' signature. Previous works on singer identification describe audio signals using features extracted on the short-term amplitude spectrum. The latter features aim to characterize the timbre of the sound, which, in the case of singing, is related to the vocal tract of the singer. The features we develop in this document capture long-term information related to the intonation of the singer, which is relevant to the style and the technique of the singer. We propose a method to combine these two complementary descriptions of the singing voice to increase the recognition rate of singer identification. In addition we evaluate the robustness of each type of feature against a set of variations. We show the singing voice is a highly variable instrument. To obtain a representative model of a singer's voice it is thus necessary to build models using a large set of examples covering the full tessitura of a singer. In addition, we show that features extracted directly from the partials are more robust to the presence of an instrumental accompaniment than features derived from the amplitude spectrum.

47

Henriksson, Erik, and Kristopher Werlinder. "Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302535.

Abstract:

The aim of this research project is to investigate how an XGBoost regressor compares to a Random Forest regressor in terms of predictive performance on housing prices, using two data sets. The comparison considers training time, inference time and the three evaluation metrics R2, RMSE and MAPE. The data sets are described in detail together with background about the regressor models that are used. The method involves substantial cleaning of the two data sets, hyperparameter tuning to find optimal parameters, and 5-fold cross-validation in order to achieve good performance estimates. The finding of this research project is that XGBoost performs better on both small and large data sets. While the Random Forest model can achieve results similar to those of the XGBoost model, it needs a much longer training time, between 2 and 50 times as long, and has a longer inference time, around 40 times as long. This makes XGBoost especially superior when used on larger sets of data.
The aim of this study is to compare and investigate how an XGBoost regressor and a Random Forest regressor perform in predicting house prices. This is done with the help of two data sets. The comparison takes into account the models' training time, inference time and the three evaluation metrics R2, RMSE and MAPE. The data sets are described in detail together with background on the regression models. The method includes cleaning of the data sets, a search for optimal hyperparameters for the models, and 5-fold cross-validation to achieve good predictions. The result of the study is that the XGBoost regressor performs better on both small and large data sets, and that it is superior on large data sets. While the Random Forest model can achieve results similar to those of the XGBoost model, its training time is between 2 and 50 times as long and its inference time is about 40 times longer. This makes XGBoost particularly superior when used on large data sets.
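
The abstract above compares models on R2, RMSE and MAPE. As a minimal sketch of how these three metrics are conventionally defined, assuming plain Python lists of true and predicted prices (the function names and example values are illustrative and do not come from the thesis), one could write:

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Made-up prices, purely for illustration.
prices = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 400.0]

scores = {
    "R2": r2_score(prices, predicted),
    "RMSE": rmse(prices, predicted),
    "MAPE": mape(prices, predicted),
}
```

RMSE penalises large absolute errors, while MAPE is relative to each price, so the two can rank models differently when the data set mixes cheap and expensive properties.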

48

Hu, Renjie. "Random neural networks for dimensionality reduction and regularized supervised learning". Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6960.

Abstract:

This dissertation explores several aspects of Random Neural Networks (RNNs) and their applications. First, novel RNNs are proposed for dimensionality reduction and visualization. Based on Extreme Learning Machines (ELMs) and Self-Organizing Maps (SOMs), a new method is created to identify the important variables and visualize the data. This technique reduces the curse of dimensionality, further improves the interpretability of the visualization, and is tested on real nursing survey datasets. ELM-SOM+ is an autoencoder created to preserve the intrinsic quality of SOM while bringing continuity to the projection using two ELMs. This new methodology shows considerable improvement over SOM on real datasets. Second, as a supervised learning method, ELMs have been applied to the hierarchical multiscale method to bridge molecular dynamics to continua. The method is tested on simulation data and shown to be efficient for passing information from one scale to another. Lastly, the regularization of ELMs has been studied, and a new regularization algorithm for ELMs is created using a modified Lanczos algorithm. On average, the Lanczos ELM divides computational time by 20 and reduces the Normalized MSE by 14% compared with regular ELMs.

49

Ge, Esther. "The query based learning system for lifetime prediction of metallic components". Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/18345/4/Esther_Ting_Ge_Thesis.pdf.

Abstract:

This research project was a step forward in developing an efficient data mining method for estimating the service life of metallic components in Queensland school buildings. The developed method links together the different data sources of service life information and builds the model for a real situation when the users have information on limited inputs only. A practical lifetime prediction system was developed for the industry partners of this project including Queensland Department of Public Works and Queensland Department of Main Roads. The system provides high accuracy in practice where not all inputs are available for querying to the system.

50

Ge, Esther. "The query based learning system for lifetime prediction of metallic components". Queensland University of Technology, 2008. http://eprints.qut.edu.au/18345/.

Abstract:

This research project was a step forward in developing an efficient data mining method for estimating the service life of metallic components in Queensland school buildings. The developed method links together the different data sources of service life information and builds the model for a real situation when the users have information on limited inputs only. A practical lifetime prediction system was developed for the industry partners of this project including Queensland Department of Public Works and Queensland Department of Main Roads. The system provides high accuracy in practice where not all inputs are available for querying to the system.
