Dissertations / Theses on the topic 'FEATURE SELECTION TECHNIQUE'

Consult the top 50 dissertations / theses for your research on the topic 'FEATURE SELECTION TECHNIQUE.'

You can also download the full text of each academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Tan, Feng. "Improving Feature Selection Techniques for Machine Learning." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_diss/27.

Full text
Abstract:
As a commonly used technique in data preprocessing for machine learning, feature selection identifies important features and removes irrelevant, redundant or noisy features to reduce the dimensionality of the feature space. It improves the efficiency, accuracy and comprehensibility of the models built by learning algorithms. Feature selection techniques have been widely employed in a variety of applications, such as genomic analysis, information retrieval, and text categorization. Researchers have introduced many feature selection algorithms with different selection criteria. However, it has been discovered that no single criterion is best for all applications. We propose a hybrid feature selection framework based on genetic algorithms (GAs) that employs a target learning algorithm to evaluate features, making it a wrapper method. We call it the hybrid genetic feature selection (HGFS) framework. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for the target algorithm. Experiments on genomic data demonstrate that it is a robust and effective approach that can find subsets of features with higher classification accuracy and/or smaller size compared to each individual feature selection algorithm. A common characteristic of text categorization tasks is multi-label classification with a great number of features, which makes wrapper methods time-consuming and impractical. We therefore propose a simple filter (non-wrapper) approach called the Relation Strength and Frequency Variance (RSFV) measure. The basic idea is that informative features are those that are highly correlated with the class and distributed most differently among all classes. The approach is compared with two well-known feature selection methods in experiments on two standard text corpora. The experiments show that RSFV generates equal or better performance than the others in many cases.
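
The GA wrapper idea summarised above can be sketched in a few lines: a genetic algorithm evolves binary feature masks, and each mask's fitness is the cross-validated accuracy of the target learning algorithm on the selected features. This is a minimal illustration of a generic GA wrapper, not the HGFS implementation; the population size, mutation rate, size penalty and the k-nearest-neighbours target learner are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n = X.shape[1]

def fitness(mask):
    # CV accuracy of the target learner on the selected features,
    # with a small penalty on subset size to favour compact subsets.
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return acc - 0.001 * mask.sum()

pop = rng.random((20, n)) < 0.5                   # random initial masks
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, n)                  # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n) < 0.02               # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected", best.sum(), "of", n, "features")
```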
2

Loscalzo, Steven. "Group based techniques for stable feature selection." Diss., Online access via UMI, 2009.

Find full text
3

Vege, Sri Harsha. "Ensemble of Feature Selection Techniques for High Dimensional Data." TopSCHOLAR®, 2012. http://digitalcommons.wku.edu/theses/1164.

Full text
Abstract:
Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships from large amounts of data stored in databases, data warehouses, or other information repositories. Feature selection is an important preprocessing step of data mining that helps increase the predictive performance of a model. The main aim of feature selection is to choose a subset of features with high predictive information and eliminate irrelevant features with little or no predictive information. Using a single feature selection technique may generate local optima. In this thesis we propose an ensemble approach for feature selection, where multiple feature selection techniques are combined to yield more robust and stable results. The ensemble of multiple feature ranking techniques is performed in two steps. The first step involves creating a set of different feature selectors, each providing its sorted order of features, while the second step aggregates the results of all feature ranking techniques. The ensemble method used in our study is frequency count, with the mean rank used to resolve any frequency count collisions. Experiments conducted in this work are performed on datasets collected from the Kent Ridge bio-medical data repository. The Lung Cancer and Lymphoma datasets are selected from the repository: the Lung Cancer dataset consists of 57 attributes and 32 instances, and the Lymphoma dataset consists of 4027 attributes and 96 instances. Experiments are performed on the reduced datasets obtained from feature ranking, and these datasets are used to build the classification models. Model performance is evaluated in terms of the AUC (Area under the Receiver Operating Characteristic Curve) performance metric, and ANOVA tests are also performed on the AUC metric. Experimental results suggest that an ensemble of multiple feature selection techniques is more effective than an individual feature selection technique.
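
The two-step ensemble described above can be sketched directly: several univariate scorers each rank the features, a feature's frequency count is how often it lands in a ranker's top k, and ties are broken by mean rank. The choice of scorers, k and the stand-in dataset are assumptions for illustration, not the thesis configuration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
k = 10  # size of each ranker's "top" list

def rank(scores):
    # Rank features from best (0) to worst given univariate scores.
    return np.argsort(np.argsort(-scores))

rankings = [
    rank(f_classif(X, y)[0]),
    rank(chi2(np.abs(X), y)[0]),           # chi2 requires non-negative inputs
    rank(mutual_info_classif(X, y, random_state=0)),
]

freq = sum((r < k).astype(int) for r in rankings)   # times in a top-k list
mean_rank = np.mean(rankings, axis=0)               # tie-breaker

# Sort by descending frequency, then ascending mean rank.
order = np.lexsort((mean_rank, -freq))
print("ensemble top-10 features:", order[:k])
```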
4

Gustafsson, Robin. "Ordering Classifier Chains using filter model feature selection techniques." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14817.

Full text
Abstract:
Context: Multi-label classification concerns classification with multi-dimensional output. The Classifier Chain breaks the multi-label problem into multiple binary classification problems, chaining the classifiers to exploit dependencies between labels. Consequently, its performance is influenced by the chain's order. Approaches to finding advantageous chain orders have been proposed, though they are typically costly. Objectives: This study explored the use of filter model feature selection techniques to order Classifier Chains. It examined how feature selection techniques can be adapted to evaluate label dependence, how such information can be used to select a chain order and how this affects the classifier's performance and execution time. Methods: An experiment was performed to evaluate the proposed approach. The two proposed algorithms, Forward-Oriented Chain Selection (FOCS) and Backward-Oriented Chain Selection (BOCS), were tested with three different feature evaluators. 10-fold cross-validation was performed on ten benchmark datasets. Performance was measured in accuracy, 0/1 subset accuracy and Hamming loss. Execution time was measured during chain selection, classifier training and testing. Results: Both proposed algorithms led to improved accuracy and 0/1 subset accuracy (Friedman & Hochberg, p < 0.05). FOCS also improved the Hamming loss while BOCS did not. Measured effect sizes ranged from 0.20 to 1.85 percentage points. Execution time was increased by less than 3 % in most cases. Conclusions: The results showed that the proposed approach can improve the Classifier Chain's performance at a low cost. The improvements appear similar to comparable techniques in magnitude but at a lower cost. It shows that feature selection techniques can be applied to chain ordering, demonstrates the viability of the approach and establishes FOCS and BOCS as alternatives worthy of further consideration.
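
As a rough illustration of the idea, a filter measure such as mutual information can score pairwise label dependence, and a greedy pass can then build a chain order that places labels other labels depend on early in the chain. This is a simplified, forward-oriented sketch in the spirit of FOCS, not the algorithm from the thesis; the synthetic data and the greedy criterion are assumptions.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mutual_info_score
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=500, n_labels=3,
                                      n_classes=6, random_state=0)
L = Y.shape[1]

# Pairwise label dependence estimated with a filter measure.
dep = np.array([[mutual_info_score(Y[:, i], Y[:, j]) for j in range(L)]
                for i in range(L)])
np.fill_diagonal(dep, 0.0)

# Greedy forward ordering: repeatedly pick the unplaced label whose total
# dependence with the remaining labels is largest, so labels that others
# depend on appear early in the chain.
remaining = list(range(L))
order = []
while remaining:
    nxt = max(remaining, key=lambda i: dep[i, remaining].sum())
    order.append(nxt)
    remaining.remove(nxt)

chain = ClassifierChain(LogisticRegression(max_iter=1000), order=order)
chain.fit(X, Y)
print("chain order:", order, "subset accuracy:",
      (chain.predict(X) == Y).all(axis=1).mean())
```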
5

Zhang, Fu. "Intelligent feature selection for neural regression : techniques and applications." Thesis, University of Warwick, 2012. http://wrap.warwick.ac.uk/49639/.

Full text
Abstract:
Feature Selection (FS) and regression are two important technique categories in Data Mining (DM). In general, DM refers to the analysis of observational datasets to extract useful information and to summarise the data so that it can be more understandable and be used more efficiently in terms of storage and processing. FS is the technique of selecting a subset of features that are relevant to the development of learning models. Regression is the process of modelling and identifying the possible relationships between groups of features (variables). Compared with conventional techniques, Intelligent System Techniques (ISTs) are usually favourable due to their flexible capabilities for handling real-life problems and their tolerance of data imprecision, uncertainty, partial truth, etc. This thesis introduces a novel hybrid intelligent technique, namely Sensitive Genetic Neural Optimisation (SGNO), which is capable of reducing the dimensionality of a dataset by identifying the most important group of features. The capability of SGNO is evaluated with four practical applications in three research areas: plant science, civil engineering and economics. SGNO is constructed using three key techniques, known as the core modules: Genetic Algorithm (GA), Neural Network (NN) and Sensitivity Analysis (SA). The GA module controls the progress of the algorithm and employs the NN module as its fitness function. The SA module quantifies the importance of each available variable using the results generated by the GA module. The global sensitivity scores of the variables are used to determine the importance of the variables: variables with higher sensitivity scores are considered more important than those with lower scores. After determining the variables' importance, the performance of SGNO is evaluated using the NN module, which takes various numbers of variables with the highest global sensitivity scores as inputs. In addition, the symbolic relationship between a group of variables with the highest global sensitivity scores and the model output is discovered using Multiple-Branch Encoded Genetic Programming (MBE-GP). A total of four datasets have been used to evaluate the performance of SGNO, involving the prediction of short-term greenhouse tomato yield, the prediction of longitudinal dispersion coefficients in natural rivers, the prediction of wave overtopping at coastal structures, and the modelling of the relationship between the growth of industrial inputs and the growth of gross industrial output. SGNO was applied to all these datasets to explore its effectiveness in reducing their dimensionality. The performance of SGNO is benchmarked against four dimensionality reduction techniques: Backward Feature Selection (BFS), Forward Feature Selection (FFS), Principal Component Analysis (PCA) and Genetic Neural Mathematical Method (GNMM). The applications of SGNO to these datasets showed that it is capable of identifying the most important feature groups in the datasets effectively, and that its general performance is better than the benchmark techniques. Furthermore, the symbolic relationships discovered using MBE-GP are competitive with NN models in terms of regression accuracy.
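
The sensitivity-analysis step can be illustrated with a simple perturbation scheme: train a neural network, perturb one standardized input at a time, and score each variable by how much the prediction changes. This is a generic sketch of perturbation-based sensitivity analysis, not SGNO itself; the network size, perturbation step and dataset are assumptions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                   random_state=0).fit(X, y)

def sensitivity(model, X, delta=0.1):
    # Perturb each standardized input by +/- delta and average the
    # absolute change in the model's output over all samples.
    scores = []
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += delta
        Xm[:, j] -= delta
        scores.append(np.abs(model.predict(Xp) - model.predict(Xm)).mean())
    return np.array(scores)

scores = sensitivity(net, X)
print("variables ranked by sensitivity:", np.argsort(-scores))
```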
6

Muteba, Ben Ilunga. "Data Science techniques for predicting plant genes involved in secondary metabolites production." University of the Western Cape, 2018. http://hdl.handle.net/11394/7039.

Full text
Abstract:
Masters of Science
Plant genome analysis is currently experiencing a boost due to reduced costs associated with the development of next generation sequencing technologies. Knowledge of genetic background can be applied to guide targeted plant selection and breeding, and to facilitate natural product discovery and biological engineering. In medicinal plants, secondary metabolites are of particular interest because they often represent the main active ingredients associated with health-promoting qualities. Plant polyphenols are a highly diverse family of aromatic secondary metabolites that act as antimicrobial agents, UV protectants, and insect or herbivore repellents. Most of the genome mining tools developed to understand genetic materials have very seldom addressed secondary metabolite genes and biosynthesis pathways. Little significant research has been conducted to study key enzyme factors that can predict a class of secondary metabolite genes from polyketide synthases. The objectives of this study were twofold. Primarily, it aimed to identify the biological properties of secondary metabolite genes and to select a specific gene, naringenin-chalcone synthase or chalcone synthase (CHS). The study hypothesized that data science approaches to mining biological data, particularly secondary metabolite genes, would disclose some aspects of secondary metabolites (SM). Secondarily, the aim was to propose a proof of concept for classifying or predicting plant genes involved in polyphenol biosynthesis using data science techniques, conveyed through computational analysis with machine learning algorithms and mathematical and statistical approaches. Three specific challenges experienced while analysing secondary metabolite datasets were: 1) class imbalance, which refers to the lack of proportionality among protein sequence classes; 2) high dimensionality, which refers to the very large feature space that arises when analysing bioinformatics datasets; and 3) the variation in protein sequence lengths. Given these inherent issues, developing precise classification and statistical models proves a challenge. Therefore, the prerequisite for effective SM plant gene mining is dedicated data science techniques that can collect, prepare and analyse SM genes.
7

Strand, Lars Helge. "Feature selection in Medline using text and data mining techniques." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9249.

Full text
Abstract:

In this thesis we propose a new method for searching for gene products and giving annotations that associate genes with Gene Ontology (GO) codes. Many solutions already exist, using different techniques; however, few are capable of addressing the whole GO hierarchy. We propose a method for exploring this hierarchy by dividing it into subtrees and trying to find terms that are characteristic of the subtrees involved, using feature selection based on chi-square analysis and naive Bayes classification to find the correct GO nodes.
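
The chi-square-plus-naive-Bayes combination is a standard text classification pipeline and can be sketched directly; the 20-newsgroups data and the number of selected terms stand in for the Medline abstracts and GO subtrees, and are assumptions made for the example.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Text stands in for Medline abstracts; categories stand in for GO subtrees.
train = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
test = fetch_20newsgroups(subset="test", categories=["sci.med", "sci.space"])

model = make_pipeline(
    CountVectorizer(stop_words="english"),  # bag-of-words term counts
    SelectKBest(chi2, k=1000),              # keep terms with highest chi-square
    MultinomialNB(),                        # naive Bayes on the selected terms
)
model.fit(train.data, train.target)
print("accuracy:", model.score(test.data, test.target))
```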

8

Ni, Weizeng. "A Review and Comparative Study on Univariate Feature Selection Techniques." University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1353156184.

Full text
9

Dang, Vinh Q. "Evolutionary approaches for feature selection in biological data." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2014. https://ro.ecu.edu.au/theses/1276.

Full text
Abstract:
Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. The techniques allow a vast amount of data to be explored in order to extract useful information from the data. One of the foci in the health area is finding interesting biomarkers from biomedical data. High-throughput data generated from microarrays and mass spectrometry of biological samples are high dimensional and small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid in the development of techniques/drugs to improve diagnosis and treatment of diseases, a major challenge involves their analysis to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) to use the developed algorithms to find the “most relevant” biomarkers contained in biological datasets, and 3) to evaluate the goodness of extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and classification accuracy obtained using different classifiers). The project aims to generate good predictive models for classifying diseased samples from controls.
10

Miller, Corey Alexander. "Intelligent Feature Selection Techniques for Pattern Classification of Time-Domain Signals." W&M ScholarWorks, 2013. https://scholarworks.wm.edu/etd/1539623620.

Full text
Abstract:
Time-domain signals form the basis of analysis for a variety of applications, including those involving variable conditions or physical changes that result in degraded signal quality. Typical approaches to signal analysis fail under these conditions, as these types of changes often lie outside the scope of the domain's basic analytic theory and are too complex for modeling. Sophisticated signal processing techniques are required as a result. In this work, we develop a robust signal analysis technique that is suitable for a wide variety of time-domain signal analysis applications. Statistical pattern classification routines are applied to problems of interest involving a physical change in the domain of the problem that translate into changes in the signal characteristics. The basis of this technique involves a signal transformation known as the Dynamic Wavelet Fingerprint, used to generate a feature space in addition to features related to the physical domain of the individual application. Feature selection techniques are explored that incorporate the context of the problem into the feature space reduction in an attempt to identify optimal representations of these data sets.
11

Floyd, Stuart. "Data Mining Techniques for Prognosis in Pancreatic Cancer." Digital WPI, 2007. https://digitalcommons.wpi.edu/etd-theses/671.

Full text
Abstract:
This thesis focuses on the use of data mining techniques to investigate the expected survival time of patients with pancreatic cancer. Clinical patient data have been useful in showing overall population trends in patient treatment and outcomes. Models built on patient level data also have the potential to yield insights into the best course of treatment and the long-term outlook for individual patients. Within the medical community, logistic regression has traditionally been chosen for building predictive models in terms of explanatory variables or features. Our research demonstrates that the use of machine learning algorithms for both feature selection and prediction can significantly increase the accuracy of models of patient survival. We have evaluated the use of Artificial Neural Networks, Bayesian Networks, and Support Vector Machines. We have demonstrated (p<0.05) that data mining techniques are capable of improved prognostic predictions of pancreatic cancer patient survival as compared with logistic regression alone.
12

Snorrason, Ögmundur. "Development and evaluation of adaptive feature selection techniques for sequential decision procedures /." The Ohio State University, 1990. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487683401444194.

Full text
13

Mugtussids, Iossif B. "Flight Data Processing Techniques to Identify Unusual Events." Diss., Virginia Tech, 2000. http://hdl.handle.net/10919/28095.

Full text
Abstract:
Modern aircraft are capable of recording hundreds of parameters during flight. This fact not only facilitates the investigation of an accident or a serious incident, but also provides the opportunity to use the recorded data to predict future aircraft behavior. It is believed that, by analyzing the recorded data, one can identify precursors to hazardous behavior and develop procedures to mitigate the problems before they actually occur. Because of the enormous amount of data collected during each flight, it becomes necessary to identify the segments of data that contain useful information. The objective is to distinguish between typical data points, which are present in the majority of flights, and unusual data points that can only be found in a few flights. The distinction between typical and unusual data points is achieved by using classification procedures. In this dissertation, the application of classification procedures to flight data is investigated. It is proposed to use a Bayesian classifier that tries to identify the flight from which a particular data point came. If the flight from which the data point came is identified with a high level of confidence, then the conclusion that the data point is unusual within the investigated flights can be made. The Bayesian classifier uses the overall and conditional probability density functions together with a priori probabilities to make a decision. Estimating probability density functions is a difficult task in multiple dimensions. Because many of the recorded signals (features) are redundant, highly correlated, or very similar in every flight, feature selection techniques are applied to identify those signals that contain the most discriminatory power. In the limited amount of data available to this research, twenty-five features were identified as the set exhibiting the best discriminatory power. Additionally, the number of signals is reduced by applying feature generation techniques to similar signals. To make the approach applicable in practice, when many flights are considered, a very efficient and fast sequential data clustering algorithm is proposed. The order in which the samples are presented to the algorithm is fixed according to the probability density function value. Accuracy and reduction level are controlled using two scalar parameters: a distance threshold value and a maximum compactness factor.
Ph. D.
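
The flight-identification idea can be sketched with kernel density estimates: fit one conditional density per flight, compute posteriors for each data point, and flag points whose source flight is identified with high confidence as unusual. The Gaussian KDEs, the two synthetic "flights" and the 0.95 confidence threshold are assumptions for illustration, not the dissertation's estimators.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Two synthetic "flights": mostly shared behaviour, one with a rare regime.
flight_a = rng.normal(0.0, 1.0, size=(500, 2))
flight_b = np.vstack([rng.normal(0.0, 1.0, size=(480, 2)),
                      rng.normal(5.0, 0.3, size=(20, 2))])  # unusual points
flights = [flight_a, flight_b]

# One conditional density p(x | flight) per flight, equal priors.
kdes = [gaussian_kde(f.T) for f in flights]

def posterior(x):
    likes = np.array([kde(x.T) for kde in kdes])  # p(x | flight_i)
    return likes / likes.sum(axis=0)              # Bayes rule, equal priors

post_b = posterior(flight_b)[1]            # P(flight B | x) for B's points
unusual = np.where(post_b > 0.95)[0]       # confidently attributed => unusual
print(f"{len(unusual)} points flagged as unusual in flight B")
```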
14

Sharma, Jason P. (Jason Poonam) 1979. "Classification performance of support vector machines on genomic data utilizing feature space selection techniques." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87830.

Full text
15

Boilot, Pascal. "Novel intelligent data processing techniques for electronic noses : feature selection and neuro-fuzzy knowledge base." Thesis, University of Warwick, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.399470.

Full text
16

Jarvis, Paul S. "Determining geographical causal relationships through the development of spatial cluster detection and feature selection techniques." Thesis, University of South Wales, 2006. https://pure.southwales.ac.uk/en/studentthesis/determining-geographical-casual-relationships-through-the-development-of-spatial-cluster-detection-and-feature-selection-techniques(7a882804-5565-44d7-8635-e59c66e2e9bc).html.

Full text
Abstract:
Spatial datasets contain information relating to the locations of incidents of a disease or other phenomena. Appropriate analysis of such datasets can reveal information about the distribution of cases of the phenomena. Areas that contain higher than expected incidence of the phenomena, given the background population, are of particular interest. Such clusters of cases may be affected by external factors. By analysing the locations of potential influences, it may be possible to establish whether a cause and effect relationship is present within the dataset. This thesis describes research that has led to the development and application of cluster detection and feature selection techniques in order to determine whether causal relationships are present within generic spatial datasets. The techniques are described and demonstrated, and their effectiveness established by testing them using synthetic datasets. The techniques are then applied to a dataset supplied by the Welsh Leukaemia Registry that details all cases of leukaemia diagnosed in Wales between 1990 and 2000. Cluster detection techniques can be used to provide information about case distribution. A novel technique, CLAP, has been developed that scans the study region and identifies the statistical significance of the levels of incidence in specific areas. Feature selection techniques can be used to identify the extent to which a selection of inputs impact upon a given output. Results from CLAP are combined with details of the locations of potential causal factors, in the form of a numerical dataset that can be analysed using feature selection techniques. Established techniques and a newly developed technique are used for the analysis. Results from such analysis allow conclusions to be drawn as to whether geographical causal relationships are apparent.
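
The core of a scan-style significance test like the one CLAP performs can be illustrated with a Poisson test: for each scanned area, compare the observed case count with the count expected from the background population. This grid-based sketch is a generic spatial scan, not CLAP itself; the grid, incidence rate and synthetic data are assumptions.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
# Synthetic study region: background population and cases per grid cell.
population = rng.integers(500, 5000, size=(10, 10))
rate = 0.002                                  # overall incidence rate
cases = rng.poisson(population * rate)
cases[4, 4] += 15                             # plant a cluster

expected = population * rate
# P(observing at least this many cases | expected), per cell.
p_values = poisson.sf(cases - 1, expected)

for i, j in np.argwhere(p_values < 0.001):
    print(f"cell ({i},{j}): {cases[i, j]} cases, "
          f"{expected[i, j]:.1f} expected, p={p_values[i, j]:.2g}")
```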
17

Al-Ani, Ahmed Karim. "An improved pattern classification system using optimal feature selection, classifier combination, and subspace mapping techniques." Thesis, Queensland University of Technology, 2002.

Find full text
18

Ditzenberger, David A. "Selection and extraction of local geometric features for two dimensional model-based object recognition." Virtual Press, 1992. http://liblink.bsu.edu/uhtbin/catkey/834526.

Full text
Abstract:
A topic of computer vision that has recently been studied by a substantial number of scientists is the recognition of objects in digitized gray scale images. The primary goal of model-based object recognition research is the efficient and precise matching of features extracted from sensory data with the corresponding features in an object model database. A source of difficulty during feature extraction is the determination and representation of pertinent attributes from the sensory data of the objects in the image. In addition, features which are visible from a single vantage point are not usually adequate for the unique identification of an object and its orientation. This paper describes a regimen that can be used to address these problems. Image preprocessing such as edge detection, image thinning, thresholding, etc., is first addressed. This is followed by an in-depth discussion centered on the extraction of local geometric feature vectors and the hypothesis-verification model used for two dimensional object recognition.
Department of Computer Science
19

Pacheco, Do Espirito Silva Caroline. "Feature extraction and selection for background modeling and foreground detection." Thesis, La Rochelle, 2017. http://www.theses.fr/2017LAROS005/document.

Full text
Abstract:
In this thesis, we present a robust descriptor for background subtraction which is able to describe texture from an image sequence. The descriptor is less sensitive to noisy pixels and produces a short histogram, while preserving robustness to illumination changes. Moreover, a descriptor for dynamic texture recognition is also proposed. This descriptor extracts not only color information, but also more detailed information from video sequences. Finally, we present an ensemble-based feature selection approach that is able to select suitable features for each pixel to distinguish the foreground objects from the background. Our proposal uses a mechanism to update the relative importance of each feature over time, and a heuristic approach is used to reduce the complexity of background model maintenance while preserving its robustness. However, this method only reaches its highest accuracy when the number of features is huge, and each base classifier learns a feature set instead of individual features. To overcome these limitations, we extended our previous approach by proposing a new methodology for selecting features based on wagging. We also adopted a superpixel-based approach instead of a pixel-level approach. This not only increases efficiency in terms of time and memory consumption, but also improves the segmentation performance for moving objects.
20

Truong, Hoang Vinh. "Multi color space LBP-based feature selection for texture classification." Thesis, Littoral, 2018. http://www.theses.fr/2018DUNK0468/document.

Full text
Abstract:
Texture analysis has been extensively studied and a wide variety of description approaches have been proposed. Among them, the Local Binary Pattern (LBP) plays an essential part in most color image analysis and pattern recognition applications. Usually, devices acquire images and code them in the RGB color space. However, there are many color spaces for texture classification, each one having specific properties. In order to avoid the difficulty of choosing a relevant space, the multi color space strategy allows the properties of several spaces to be used simultaneously. However, this strategy increases the number of features extracted when LBP operators are applied to color images. This work is therefore focused on reducing the dimensionality of the LBP-based feature space with feature selection methods. In this framework, we consider LBP histogram and bin selection approaches for supervised texture classification. Extensive experiments are conducted on several benchmark color texture databases. They demonstrate that the proposed approaches can improve on state-of-the-art results.
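
The descriptor side of this approach is straightforward to sketch: compute an LBP histogram per channel in more than one color space, concatenate them, and rank histogram bins with a univariate score so that only the most discriminant bins are kept. The color spaces, LBP parameters and F-score ranking below are assumptions for the example, not the thesis configuration.

```python
import numpy as np
from skimage import color, data
from skimage.feature import local_binary_pattern

def lbp_hist(channel, P=8, R=1):
    # Uniform LBP histogram of one channel (P neighbours, radius R).
    channel = np.round(channel * 255).astype(np.uint8)   # expects [0, 1]
    codes = local_binary_pattern(channel, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def multi_space_descriptor(rgb):
    # Concatenate per-channel LBP histograms from two color spaces.
    spaces = [rgb, color.rgb2hsv(rgb)]
    return np.concatenate([lbp_hist(img[:, :, c])
                           for img in spaces for c in range(3)])

desc = multi_space_descriptor(data.astronaut() / 255.0)
print("descriptor length:", desc.shape[0])  # 2 spaces x 3 channels x 10 bins

# Given a labelled set of such descriptors, bins could then be ranked, e.g.:
#   scores, _ = sklearn.feature_selection.f_classif(descriptors, labels)
#   keep = np.argsort(-scores)[:n_bins]
```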
21

Jang, Justin. "Subset selection in hierarchical recursive pattern assemblies and relief feature instancing for modeling geometric patterns." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33821.

Full text
Abstract:
This thesis is concerned with modeling geometric patterns. Specifically, a clear and practical definition for regular patterns is proposed. Based on this definition, this thesis proposes the following modeling setting to describe the semantic transfer of a model between various forms of pattern regularity: (1) recognition or identification of patterns in digital models of 3D assemblies and scenes, (2) pattern regularization, (3) pattern modification and editing by varying the repetition parameters, and (4) establishing exceptions (designed irregularities) in regular patterns. In line with this setting, this thesis describes a representation and approach for designing and editing hierarchical assemblies based on grouped, nested, and recursively nested patterns. Based on this representation, this thesis presents the OCTOR approach for specifying, recording, and producing exceptions in regular patterns. To support editing of free-form shape patterns on surfaces, this thesis also presents the imprint-mapping approach which can be used to identify, extract, process, and apply relief features on surfaces. Pattern regularization, modification, and exceptions are addressed for the case of relief features on surfaces.
22

Sigweni, Boyce B. "An investigation of feature weighting algorithms and validation techniques using blind analysis for analogy-based estimation." Thesis, Brunel University, 2016. http://bura.brunel.ac.uk/handle/2438/12797.

Full text
Abstract:
Context: Software effort estimation is a very important component of the software development life cycle. It underpins activities such as planning, maintenance and bidding. Therefore, it has triggered much research over the past four decades, including many machine learning approaches. One popular approach, which has the benefit of accessible reasoning, is analogy-based estimation. Machine learning, including analogy, is known to benefit significantly from feature selection/weighting. Unfortunately, feature weighting search is an NP-hard problem and therefore computationally very demanding, if not intractable. Objective: Therefore, one objective of this research is to develop an efficient and effective feature weighting algorithm for estimation by analogy. However, a major challenge for the effort estimation research community is that experimental results tend to be contradictory and also lack reliability. This has been paralleled by a recent awareness of how bias can impact research results. This is a contributory reason why software effort estimation is still an open problem. Consequently, the second objective is to investigate research methods that might lead to more reliable results, focusing on blinding methods to reduce researcher bias. Method: In order to build on the most promising feature weighting algorithms, I conduct a systematic literature review. From this I develop a novel and efficient feature weighting algorithm. This is experimentally evaluated, comparing three feature weighting approaches with a naive benchmark using two industrial data sets. Using these experiments, I explore blind analysis as a technique to reduce bias. Results: The systematic literature review identified 19 relevant primary studies. Results from the meta-analysis of selected studies using a one-sample sign test (p = 0.0003) show a positive effect of feature weighting in general compared with ordinary analogy-based estimation (ABE); that is, feature weighting is a worthwhile technique to improve ABE. Nevertheless the results remain imperfect, so there is still much scope for improvement. My experience shows that blinding can be a relatively straightforward procedure. I also highlight various statistical analysis decisions which ought not to be guided by the hunt for statistical significance, and show that results can be inverted merely through a seemingly inconsequential statistical nicety. After analysing results from 483 software projects from two separate industrial data sets, I conclude that the proposed technique improves accuracy over standard feature subset selection (FSS) and traditional case-based reasoning (CBR) when using pseudo time-series validation. Interestingly, there is no strong evidence for superior performance of the new technique when traditional validation techniques (jackknifing) are used, but it is more efficient. Conclusion: There are two main findings: (i) Feature weighting techniques are promising for software effort estimation, but they need to be tailored to the target case for their potential to be adequately exploited. Although research findings show that allowing weights to differ in different parts of the instance space ('local' regions) may improve effort estimation results, the majority of studies in software effort estimation (SEE) do not take this into consideration; the proposed technique represents an improvement over methods that do not. (ii) Whilst there are minor challenges and some limits to the degree of blinding possible, blind analysis is a very practical and easy-to-implement method that supports more objective analysis of experimental results. Therefore I argue that blind analysis should be the norm for analysing software engineering experiments.
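
Analogy-based estimation with feature weighting reduces to a weighted nearest-neighbour lookup: the effort of a new project is predicted from the mean effort of its k most similar past projects, with per-feature weights inside the distance. The weights, k and the synthetic project table below are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic past projects: [team_size, kloc, complexity] -> effort.
X = rng.uniform(0, 1, size=(50, 3))
effort = 100 * X[:, 1] + 20 * X[:, 0] + rng.normal(0, 5, 50)

def abe_predict(x_new, X, effort, weights, k=3):
    # Weighted Euclidean distance to every past project (the case base).
    d = np.sqrt(((X - x_new) ** 2 * weights).sum(axis=1))
    analogies = np.argsort(d)[:k]          # the k closest analogies
    return effort[analogies].mean()

# Feature weighting: size (kloc) matters more than the other features.
weights = np.array([0.2, 1.0, 0.1])
x_new = np.array([0.5, 0.7, 0.3])
print("predicted effort:", round(abe_predict(x_new, X, effort, weights), 1))
```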
23

Liu, Xiaofeng. "Machinery fault diagnostics based on fuzzy measure and fuzzy integral data fusion techniques." Thesis, Queensland University of Technology, 2007. https://eprints.qut.edu.au/16456/1/Xiaofeng_Liu_Thesis.pdf.

Full text
Abstract:
With growing demands for reliability, availability, safety and cost efficiency in modern machinery, accurate fault diagnosis is becoming of paramount importance so that potential failures can be better managed. Although various methods have been applied to machinery condition monitoring and fault diagnosis, the diagnostic accuracy that can be attained is far from satisfactory. As most machinery faults lead to increases in vibration levels, vibration monitoring has become one of the most basic and widely used methods to detect machinery faults. However, current vibration monitoring methods largely depend on signal processing techniques. This study is based on the recognition that a multi-parameter data fusion approach to diagnostics can produce more accurate results. Fuzzy measures and fuzzy integral data fusion theory can represent the importance of each criterion and express certain interactions among them. This research developed a novel, systematic and effective fuzzy measure and fuzzy integral data fusion approach for machinery fault diagnosis, comprising a feature set selection schema, a feature-level data fusion schema and a decision-level data fusion schema. Different feature selection and fault diagnostic models were derived from these schemas. Two fuzzy measures and two fuzzy integrals were employed: the 2-additive fuzzy measure and the fuzzy measure, and the Choquet and Sugeno fuzzy integrals, respectively. The models were validated using rolling element bearing and electrical motor experiments. Different features extracted from vibration signals were used to validate the rolling element bearing feature set selection and fault diagnostic models, while features obtained from both vibration and current signals were employed to assess the electrical motor fault diagnostic models. The results show that the proposed schemas and models perform very well in selecting feature sets and can improve accuracy in diagnosing both rolling element bearing and electrical motor faults.
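
The Choquet integral at the heart of this kind of fusion is compact enough to state directly: sort the criteria scores in increasing order and accumulate the score increments, each weighted by the fuzzy measure of the set of criteria still remaining. The three-criteria measure below is an invented toy example that only illustrates the computation, not a measure from the thesis.

```python
def choquet(scores, mu):
    # scores: {criterion: value}; mu: fuzzy measure over frozensets of criteria.
    items = sorted(scores, key=scores.get)        # ascending by score
    total, prev = 0.0, 0.0
    for i, c in enumerate(items):
        remaining = frozenset(items[i:])          # criteria still "active"
        total += (scores[c] - prev) * mu[remaining]
        prev = scores[c]
    return total

criteria = ["vibration", "current", "temperature"]
# Toy fuzzy measure: monotone, mu(empty set)=0, mu(all)=1, with a positive
# interaction (vibration and current together are worth more than the sum
# of their individual weights would suggest).
mu = {frozenset(): 0.0,
      frozenset({"vibration"}): 0.4,
      frozenset({"current"}): 0.3,
      frozenset({"temperature"}): 0.1,
      frozenset({"vibration", "current"}): 0.9,
      frozenset({"vibration", "temperature"}): 0.5,
      frozenset({"current", "temperature"}): 0.4,
      frozenset(criteria): 1.0}

scores = {"vibration": 0.8, "current": 0.6, "temperature": 0.2}
print("fused fault evidence:", round(choquet(scores, mu), 3))  # 0.64
```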
24

Nakisa, Bahareh. "Emotion classification using advanced machine learning techniques applied to wearable physiological signals data." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/129875/9/Bahareh%20Nakisa%20Thesis.pdf.

Full text
Abstract:
This research contributed to the development of an advanced feature selection model, hyperparameter optimization and a temporal multimodal deep learning model to improve the performance of dimensional emotion recognition. The study adopts different approaches based on portable wearable physiological sensors. It identified the best models for feature selection, the best hyperparameter values for the Long Short-Term Memory network, and how to fuse multi-modal sensors efficiently for emotion recognition. Collectively, the methods of this thesis deliver better algorithms and maximize the use of miniaturized sensors to provide an accurate measurement of emotion recognition.
25

Arnroth, Lukas, and Dennis Jonni Fiddler. "Supervised Learning Techniques : A comparison of the Random Forest and the Support Vector Machine." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-274768.

Full text
Abstract:
This thesis examines the performance of the support vector machine and the random forest models in the context of binary classification. The two techniques are compared and the outstanding one is used to construct a final parsimonious model. The data set consists of 33 observations and 89 biomarkers as features with no known dependent variable. The dependent variable is generated through k-means clustering, with a predefined final solution of two clusters. The training of the algorithms is performed using five-fold cross-validation repeated twenty times. The outcome of the training process reveals that the best performing versions of the models are a linear support vector machine and a random forest with six randomly selected features at each split. The final results of the comparison on the test set of these optimally tuned algorithms show that the random forest outperforms the linear kernel support vector machine. The former classifies all observations in the test set correctly whilst the latter classifies all but one correctly. Hence, a parsimonious random forest model using the top five features is constructed, which, to conclude, performs equally well on the test set compared to the original random forest model using all features.
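
The pipeline described, generating labels by k-means clustering and then comparing a linear SVM with a random forest under five-fold cross-validation repeated twenty times, can be sketched directly; the synthetic biomarker matrix stands in for the real data set, and the tuning grids are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in biomarker matrix: 33 observations x 89 features with two loose
# groups, mirroring the dimensions described in the abstract.
X = np.vstack([rng.normal(0.0, 1.0, (17, 89)),
               rng.normal(0.8, 1.0, (16, 89))])

# No known dependent variable: generate one with k-means (two clusters).
y = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
models = {
    "linear SVM": SVC(kernel="linear"),
    "random forest": RandomForestClassifier(max_features=6, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=cv).mean()
    print(f"{name}: mean CV accuracy {acc:.3f}")
```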
26

Chida, Anjum A. "Protein Tertiary Model Assessment Using Granular Machine Learning Techniques." Digital Archive @ GSU, 2012. http://digitalarchive.gsu.edu/cs_diss/65.

Full text
Abstract:
The automatic prediction of protein three-dimensional structures from amino acid sequences has become one of the most important and researched fields in bioinformatics. As models are not experimental structures determined with known accuracy but rather predictions, it is vital to estimate their quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and structure of the protein. The goal is to generate a machine that understands structures from the PDB (Protein Data Bank) and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVM (support vector machines) and another using fuzzy decision trees (FDT). First, using a preliminary encoding style, SVM reached around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm. Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure information and the relative distance between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the use of fuzzy decision trees, we obtained a training accuracy around 90%. There is significant improvement over the previous encoding technique in prediction accuracy and execution time. This outcome motivates continued exploration of effective machine learning algorithms for accurate protein model quality assessment. Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other information from proteins that could be considered for the same purpose.
27

Garg, Anushka. "Comparing Machine Learning Algorithms and Feature Selection Techniques to Predict Undesired Behavior in Business Processes and Study of Auto ML Frameworks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285559.

Full text
Abstract:
In recent years, the scope of Machine Learning algorithms and techniques has expanded into every industry (for example, recommendation systems, user behavior analytics, financial applications and many more). In practice, they play an important role in utilizing the power of the vast data we currently generate on a daily basis in our digital world. In this study, we present a comprehensive comparison of different supervised Machine Learning algorithms and feature selection techniques to build the best predictive model as an output. This predictive model helps companies predict unwanted behavior in their business processes. In addition, we have researched the automation of all the steps involved (from understanding data to implementing models) in the complete Machine Learning pipeline, also known as AutoML, and provide a comprehensive survey of the various frameworks introduced in this domain. These frameworks were introduced to solve the problem of CASH (combined algorithm selection and hyperparameter optimization), which is essentially the automation of the various pipelines involved in the process of building a Machine Learning predictive model.
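
The CASH problem mentioned above, choosing the algorithm and its hyperparameters in a single search, can be illustrated with an ordinary grid search over a pipeline whose final step is itself a search dimension. This is a minimal stand-in for what AutoML frameworks automate, with an assumed toy grid.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Combined algorithm selection and hyperparameter optimization (CASH):
# the classifier itself is a searched parameter, with its own sub-grid.
grid = [
    {"clf": [LogisticRegression(max_iter=5000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [100, 300], "clf__max_depth": [None, 5]},
]

search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
print("best model:", search.best_params_["clf"])
print("best CV accuracy:", round(search.best_score_, 3))
```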
28

Ahmed, Omar Wahab. "Enhanced flare prediction by advanced feature extraction from solar images : developing automated imaging and machine learning techniques for processing solar images and extracting features from active regions to enable the efficient prediction of solar flares." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5407.

Full text
Abstract:
Space weather has become an international issue due to the catastrophic impact it can have on modern societies. Solar flares are one of the major solar activities that drive space weather, and yet their occurrence is not fully understood. Research is required to yield a better understanding of flare occurrence and enable the development of an accurate flare prediction system, which can warn the industries most at risk to take preventative measures to mitigate or avoid the effects of space weather. This thesis introduces novel technologies developed by combining advances in statistical physics, image processing, machine learning and feature selection algorithms with advances in solar physics, in order to extract valuable knowledge from historical solar data related to active regions and flares. The aims of this thesis are as follows: i) the design of a new measurement, inspired by the physical Ising model, to estimate the magnetic complexity in active regions using solar images, and an investigation of this measurement in relation to flare occurrence; the proposed name of the measurement is the Ising Magnetic Complexity (IMC); ii) determination of the flare prediction capability of active region properties generated by the new active region detection system SMART (Solar Monitor Active Region Tracking), to enable the design of a new flare prediction system; iii) determination of the active region properties that are most related to flare occurrence, in order to enhance understanding of the underlying physics behind flare occurrence. The achieved results can be summarised as follows: i) the new active region measurement (IMC) appears to be related to flare occurrence and has potential use in predicting flare occurrence and location; ii) combining machine learning with SMART's active region properties has the potential to provide more accurate flare predictions than current flare prediction systems, i.e. ASAP (Automated Solar Activity Prediction); iii) a reduced set of 6 active region properties appears to be the most significant set of properties related to flare occurrence, achieving a similar degree of flare prediction accuracy to the full 21 SMART active region properties. The developed technologies and the findings of this thesis will serve as a cornerstone to enhance the accuracy of flare prediction, develop efficient flare prediction systems, and enhance our understanding of flare occurrence. The algorithms, implementation, results and future work are explained in this thesis.
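
The Ising-inspired measurement can be illustrated on a binarized magnetogram: treat each pixel's magnetic polarity as a spin and sum the nearest-neighbour interaction energies, so that regions with heavily mixed polarities score higher. This is a generic Ising-energy sketch motivated by the abstract, not the IMC definition from the thesis; the synthetic magnetograms are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic magnetograms: +1 / -1 polarities (spins) on a grid.
simple = np.ones((32, 32), dtype=int)
simple[:, 16:] = -1                          # one clean polarity boundary
mixed = rng.choice([-1, 1], size=(32, 32))   # heavily mixed polarities

def ising_energy(spins):
    # Sum of nearest-neighbour interactions -s_i * s_j (right and down
    # neighbours counted once each). Mixed regions give higher energy.
    right = -(spins[:, :-1] * spins[:, 1:]).sum()
    down = -(spins[:-1, :] * spins[1:, :]).sum()
    return right + down

print("simple region energy:", ising_energy(simple))
print("mixed region energy: ", ising_energy(mixed))
```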
APA, Harvard, Vancouver, ISO, and other styles
31

Marin, Rodenas Alfonso. "Comparison of Automatic Classifiers’ Performances using Word-based Feature Extraction Techniques in an E-government setting." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-32363.

Full text
Abstract:
Nowadays email is commonly used by citizens to communicate with their government. Among the received emails, governments deal with common queries and subjects that handling officers have to answer manually. Automatic classification of incoming emails increases communication efficiency by decreasing the delay between a query and its response. This thesis is part of the IMAIL project, which aims to provide an automatic answering solution for the Swedish Social Insurance Agency (SSIA) ("Försäkringskassan" in Swedish). The goal of this thesis is to analyze and compare the classification performance of different sets of features extracted from SSIA emails on different automatic classifiers. The features extracted from the emails also depend on the preprocessing that is carried out beforehand. Compound splitting, lemmatization, stop word removal, Part-of-Speech tagging and N-grams are the processes applied to the data set. Classification is performed using Support Vector Machines, k-Nearest Neighbors and Naive Bayes, and precision, recall and F-measure are used for the analysis and comparison of results. SVM provides the best overall classification, with an F-measure of 0.787; however, Naive Bayes classifies most of the email categories better than SVM, so it cannot be concluded that SVM classifies better than Naive Bayes. Furthermore, a comparison to Dalianis et al. (2011) is made: the results obtained in this approach outperformed the earlier ones. SVM provided an F-measure of 0.858 when using PoS-tagging on the original emails, improving by almost 3% on the 0.83 obtained in Dalianis et al. (2011). In this case, SVM was clearly better than Naive Bayes.
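The comparison described above can be reproduced in miniature with standard tools. The sketch below is a hypothetical setup, not the IMAIL pipeline: a toy corpus stands in for the SSIA emails, and TF-IDF with word N-grams replaces the Swedish-specific preprocessing.

```python
# Sketch: bag-of-words features feeding a linear SVM and Naive Bayes,
# scored by macro F-measure, on a toy stand-in corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

docs = ["how do I claim parental benefit", "my payment is late",
        "update my address", "parental leave question",
        "where is my payment", "change of address form"]
labels = [0, 1, 2, 0, 1, 2]  # toy email categories

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(docs)
for clf in (LinearSVC(), MultinomialNB()):
    f1 = cross_val_score(clf, X, labels, cv=2, scoring="f1_macro").mean()
    print(type(clf).__name__, f"macro F-measure: {f1:.3f}")
```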
APA, Harvard, Vancouver, ISO, and other styles
32

Di, Bono Maria Grazia. "Beyond mind reading: advanced machine learning techniques for FMRI data analysis." Doctoral thesis, Università degli studi di Padova, 2009. http://hdl.handle.net/11577/3426149.

Full text
Abstract:
The advent of functional Magnetic Resonance Imaging (fMRI) has significantly improved knowledge about the neural correlates of perceptual and cognitive processes. The aim of this thesis is to discuss the characteristics of different approaches for fMRI data analysis, from conventional mass-univariate analysis (General Linear Model - GLM) to multivariate analysis (i.e., data-driven and pattern-based methods), and to propose a novel, advanced method (Functional ANOVA Models of Gaussian Kernels - FAM-GK) for the analysis of fMRI data acquired in fast event-related experiments. FAM-GK is an embedded method for voxel selection and is able to capture the nonlinear spatio-temporal dynamics of BOLD signals by performing nonlinear estimation of the experimental conditions. The impact of crucial aspects of using pattern recognition methods for fMRI data analysis, such as voxel selection, the choice of classifier and tuning parameters, and the cross-validation techniques, is investigated and discussed by analysing the results obtained in four neuroimaging case studies. In a first study, we explore the robustness of nonlinear Support Vector Regression (SVR), combined with a filter approach for voxel selection, in the case of an extremely complex regression problem in which we had to predict the subjective experience of participants immersed in a virtual reality environment. In a second study, we face the problem of voxel selection combined with the choice of the best classifier, and we propose a methodology based on genetic algorithms and nonlinear support vector machines (GA-SVM) efficiently combined in a wrapper approach. In a third study, we compare three pattern recognition techniques (i.e., linear SVM, nonlinear SVM, and FAM-GK) for investigating the neural correlates of the representation of numerical and non-numerical ordered sequences (i.e., numbers and letters) in the horizontal segment of the Intraparietal Sulcus (hIPS). The FAM-GK method significantly outperformed the other two classifiers. The results show a partial overlap of the two representation systems, suggesting the existence of neural substrates in hIPS codifying the cardinal and ordinal dimensions of numbers and letters in a partially independent way. Finally, in a last preliminary study, we tested the same three pattern recognition methods on fMRI data acquired in a fast event-related experiment. The FAM-GK method shows very high performance, whereas the other classifiers fail to achieve acceptable classification performance.
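A minimal sketch of the GA-SVM wrapper idea from the second study is given below. It is a generic illustration rather than the authors' implementation: synthetic features stand in for voxels, and the GA operators (truncation selection, one-point crossover, bit-flip mutation) are common defaults assumed here.

```python
# Sketch: GA wrapper where the fitness of a binary feature mask is the
# cross-validated accuracy of an SVM trained on the selected features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=120, n_features=40, n_informative=5,
                           random_state=0)  # stand-in for voxel features

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

pop = rng.random((20, X.shape[1])) < 0.2           # initial random masks
for gen in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(0, 10, 2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        flip = rng.random(X.shape[1]) < 0.02        # bit-flip mutation
        children.append(np.logical_xor(child, flip))
    pop = np.vstack([parents] + children)

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best), "fitness:", fitness(best))
```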
APA, Harvard, Vancouver, ISO, and other styles
33

Karlsson, Henrik. "Monitoring Vehicle Suspension Elements Using Machine Learning Techniques." Thesis, KTH, Spårfordon, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-262916.

Full text
Abstract:
Condition monitoring (CM) is widely used in industry, and there is a growing interest in applying CM to rail vehicle systems. Condition based maintenance has the potential to increase system safety and availability while at the same time reducing total maintenance costs. This thesis investigates the feasibility of using condition monitoring of suspension element components, in this case dampers, in rail vehicles. Different methods are used to detect degradations, ranging from mathematical modelling of the system to purely "knowledge-based" methods that use only large amounts of data to detect patterns on a larger scale. In this thesis the latter approach is explored, where acceleration signals are evaluated at several places on the axleboxes, bogie frames and the carbody of a rail vehicle simulation model. These signals are picked close to the dampers that are monitored in this study, and frequency response functions (FRF) are computed between axleboxes and bogie frames as well as between bogie frames and carbody. The idea is that the FRF will change as the condition of the dampers changes, and thus act as indicators of faults. The FRF are then fed to different classification algorithms that are trained and tested to distinguish between the different damper faults. This thesis further investigates which classification algorithms show promising results for the problem, and which algorithm performs best in terms of classification accuracy as well as two other measures. Another aspect explored is the possibility of applying dimensionality reduction to the extracted indicators (features). This thesis also looks into how the three performance measures are affected by typical varying operational conditions for a rail vehicle, such as varying excitation and carbody mass. The Linear Support Vector Machine classifier using the whole feature space, and the Linear Discriminant Analysis classifier combined with Principal Component Analysis dimensionality reduction of the feature space, both show promising results for the task of correctly classifying upcoming damper degradations.
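The indicator pipeline described above can be sketched as follows: estimate an FRF between an excitation and a response signal with Welch cross-spectra, then classify the FRF magnitudes with PCA followed by LDA. The signals below are synthetic stand-ins for the simulation model's outputs, and the toy "damper" is just a decaying impulse response whose rate is assumed to reflect damper health.

```python
# Sketch: FRF magnitude features (H1 estimate) classified with PCA + LDA.
import numpy as np
from scipy.signal import csd, welch
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
fs = 200.0

def frf_features(damping):
    """One observation: |H1| FRF estimate between excitation and response."""
    x = rng.standard_normal(4096)                       # axlebox signal
    y = np.convolve(x, np.exp(-damping * np.arange(50)), mode="same")
    y += 0.1 * rng.standard_normal(x.size)              # bogie frame signal
    _, Pxy = csd(x, y, fs=fs, nperseg=256)
    _, Pxx = welch(x, fs=fs, nperseg=256)
    return np.abs(Pxy / Pxx)

# class 0 = healthy damper, class 1 = degraded damper (toy labels)
X = np.array([frf_features(d) for d in [0.2] * 30 + [0.05] * 30])
y = np.array([0] * 30 + [1] * 30)

clf = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```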
APA, Harvard, Vancouver, ISO, and other styles
34

Talha, Sid Ahmed Walid. "Apport des techniques d'analyse et de traitement de données pour la reconnaissance des actions en vue d'un suivi du comportement humain." Thesis, Ecole nationale supérieure Mines-Télécom Lille Douai, 2020. http://www.theses.fr/2020MTLD0006.

Full text
Abstract:
To prevent the loss of autonomy linked to aging due to physical and/or psychological alterations, new technologies aim to delay its occurrence, detect it, and assess it by offering modern and innovative solutions. In this context, our thesis project aims to exploit the contribution of data analysis and processing techniques for monitoring human behavior. This thesis targets two important and complementary parts: the first carries out daily recognition of the actions performed by a person, to inform us about his or her degree of autonomy. The second part offers a modern solution to maintain autonomy, based on the execution of physical exercises. From a dataset of signals collected by an accelerometer and a gyroscope embedded in a smartphone, we have developed and implemented an intelligent system for action recognition. We were first interested in the construction of a relevant and optimal feature vector according to the classification problem encountered. Our feature selection algorithm is executed at each internal node of the classification approach, thus allowing us to outperform various state-of-the-art methods. Our approach carries out the classification of three categories of actions highly correlated with autonomy and well-being: sedentary actions, periodic or pseudo-periodic actions, and non-periodic actions. Our system also recognizes six postural transitions important for autonomy and well-being. The proposed approach guarantees robustness to sensor placement and considerably reduces the computation time necessary to recognize the action. Based on the actions carried out by the person during the day, an autonomy indicator can be established. To maintain this autonomy and decrease the risk of losing it, it is important to practice physical exercises. In this context, we propose a second intelligent system to recognize human actions based on skeleton data collected from a Kinect camera. A new algorithm for real-time feature extraction called BDV (Body-part Directional Velocity) has been proposed. The classification of actions is based on hidden Markov models (HMMs) with state output distributions represented by Gaussian mixture models (GMMs). Experimental results on public datasets have demonstrated the effectiveness of our approach and its superiority over state-of-the-art methods. Invariance and robustness to camera orientation were also addressed, positioning our technique among the best approaches on two datasets presenting this challenge. Early recognition of the action by our system was also considered, showing that half of the actions were predictable almost at the middle of the entire skeleton data sequence and that some classes were recognized with only 4% of the sequence.
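The HMM/GMM classification stage described above can be sketched with one GMM-HMM per action class and classification by maximum log-likelihood. This is a generic illustration, not the thesis's system: it assumes the third-party hmmlearn package and its GMMHMM class, and random sequences stand in for the BDV descriptors.

```python
# Sketch: per-class GMM-HMMs over feature sequences; predict by likelihood.
import numpy as np
from hmmlearn.hmm import GMMHMM  # third-party package, assumed available

rng = np.random.default_rng(0)

def toy_sequences(offset, n_seq=10, length=40, dim=6):
    """Random feature sequences standing in for BDV descriptors."""
    return [offset + rng.standard_normal((length, dim)) for _ in range(n_seq)]

train = {"walk": toy_sequences(0.0), "sit_down": toy_sequences(2.0)}

models = {}
for action, seqs in train.items():
    X = np.vstack(seqs)
    lengths = [len(s) for s in seqs]
    m = GMMHMM(n_components=4, n_mix=2, covariance_type="diag", n_iter=20)
    models[action] = m.fit(X, lengths)

test = toy_sequences(2.0, n_seq=1)[0]          # an unseen "sit_down" clip
scores = {a: m.score(test) for a, m in models.items()}
print("predicted action:", max(scores, key=scores.get))
```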
APA, Harvard, Vancouver, ISO, and other styles
35

Auffarth, Benjamin. "Machine Learning Techniques with Specific Application to the Early Olfactory System." Doctoral thesis, KTH, Beräkningsbiologi, CB, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-90474.

Full text
Abstract:
This thesis deals with machine learning techniques for the extraction of structure and the analysis of the vertebrate olfactory pathway based on related methods. Some of its main contributions are summarized below. We have performed a systematic investigation for classification in biomedical images with the goal of recognizing a material in these images by its texture. This investigation included (i) different measures for evaluating the importance of image descriptors (features), (ii) methods to select a feature set based on these evaluations, and (iii) classification algorithms. Image features were evaluated according to their estimated relevance for the classification task and their redundancy with other features. For this purpose, we proposed a framework for relevance and redundancy measures and, within this framework, we proposed two new measures. These were the value difference metric and the fit criterion. Both measures performed well in comparison with other previously used ones for evaluating features. We also proposed a Hopfield network as a method for feature selection, which in experiments gave one of the best results relative to other previously used approaches. We proposed a genetic algorithm for clustering and tested it on several real-world datasets. This genetic algorithm was novel in several ways, including (i) the use of intra-cluster distance as an additional optimization criterion, (ii) an annealing procedure, and (iii) adaptation of mutation rates. As opposed to many conventional clustering algorithms, our optimization framework allowed us to use different cluster validation measures, including those which do not rely on cluster centroids. We demonstrated the use of the clustering algorithm experimentally with several cluster validity measures as optimization criteria. We compared the performance of our clustering algorithm to that of the often-used fuzzy c-means algorithm on several standard machine learning datasets from the University of California, Irvine (UCI) and obtained good results. The organization of representations in the brain has been observed at several stages of processing to spatially decompose input from the environment into features that are somehow relevant from a behavioral or perceptual standpoint. For the perception of smells, the analysis of such an organization, however, is not as straightforward because of the missing metric. Some studies report spatial clusters for several combinations of physico-chemical properties in the olfactory bulb at the level of the glomeruli. We performed a systematic study of representations based on a dataset of activity-related images comprising more than 350 odorants and covering the whole spatial array of the first synaptic level in the olfactory system. We found clustered representations for several physico-chemical properties. We compared the relevance of these properties to activations and estimated the size of the coding zones. The results confirmed and extended previous studies on olfactory coding for physico-chemical properties. Particularly of interest was the spatial progression by carbon chain that we found. We discussed our estimates of relevance and coding size in the context of processing strategies. We think that the results obtained in this study could guide the search into olfactory coding primitives and the understanding of the stimulus space. In a second study on representations in the olfactory bulb, we grouped odorants together by perceptual categories, such as floral and fruity.
By the application of the same statistical methods as in the previous study, we found clustered zones for these categories. Furthermore, we found that distances between spatial representations were related to perceptual differences in humans as reported in the literature. This was possibly the first time that such an analysis had been done. Apart from pointing towards a spatial decomposition by perceptual dimensions, results indicate that distance relationships between representations could be perceptually meaningful. In a third study, we modeled axon convergence from olfactory receptor neurons to the olfactory bulb. Sensory neurons were stimulated by a set of biologically-relevant odors, which were described by a set of physico-chemical properties that covaried with the neural and glomerular population activity in the olfactory bulb. Convergence was mediated by the covariance between olfactory neurons. In our model, we could replicate the formation of glomeruli and concentration coding as reported in the literature, and further, we found that the spatial relationships between representational zones resulting from our model correlated with reported perceptual differences between odor categories. This shows that natural statistics, including similarity of physico-chemical structure of odorants, can give rise to an ordered arrangement of representations at the olfactory bulb level where the distances between representations are perceptually relevant.
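The relevance/redundancy idea behind the first contribution above can be sketched as a greedy ranking that rewards correlation with the class and penalizes correlation among selected features. The thesis's own measures (the value difference metric and the fit criterion) are not reproduced here; mutual information is used as a generic stand-in on synthetic data.

```python
# Sketch: mRMR-style greedy ranking, relevance minus mean redundancy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)

relevance = mutual_info_classif(X, y, random_state=0)
selected = [int(np.argmax(relevance))]
remaining = set(range(X.shape[1])) - set(selected)

while len(selected) < 5:
    def score(j):  # relevance minus mean redundancy with chosen features
        red = np.mean([mutual_info_regression(X[:, [j]], X[:, k],
                                              random_state=0)[0]
                       for k in selected])
        return relevance[j] - red
    best = max(remaining, key=score)
    selected.append(best)
    remaining.remove(best)

print("ranked feature subset:", selected)
```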


APA, Harvard, Vancouver, ISO, and other styles
36

Mi, Hongmei. "PDE modeling and feature selection : prediction of tumor evolution and patient outcome in therapeutic follow-up with FDG-PET images." Rouen, 2015. http://www.theses.fr/2015ROUES005.

Full text
Abstract:
Adaptive radiotherapy has the potential to improve a patient's outcome through a treatment plan re-optimized early in or during the course of treatment, taking individual specificities into account. Predictive studies of a patient's therapeutic follow-up could inform how to adapt treatment to each individual patient. In this thesis, we conduct two predictive studies using the patient's positron emission tomography (PET) imaging. The first study aims to predict tumor evolution during radiotherapy. We propose a patient-specific tumor growth model derived from the advection-reaction equation, composed of three terms representing three biological processes respectively, where the tumor growth model parameters are estimated from the patient's preceding sequential PET images. The second part of the thesis focuses on the case where frequent imaging of the tumor is not available. We therefore conduct another study whose objective is to select predictive factors, among PET-based and clinical characteristics, for the patient's outcome after treatment. Our second contribution is thus a wrapper feature selection method which searches forward in a hierarchical feature subset space and evaluates feature subsets by their prediction performance, using a support vector machine (SVM) as the classifier. For the two predictive studies, promising results are obtained on real-world cancer-patient datasets.
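The wrapper idea in the second contribution can be sketched as a plain forward search where each candidate subset is scored by cross-validated SVM accuracy. This is a minimal illustration, not the thesis's hierarchical search; synthetic features stand in for the PET-based and clinical characteristics.

```python
# Sketch: forward wrapper feature selection with an SVM evaluator.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=12, n_informative=3,
                           random_state=0)

def cv_score(cols):
    return cross_val_score(SVC(), X[:, cols], y, cv=5).mean()

selected, best_score = [], 0.0
while True:
    candidates = [c for c in range(X.shape[1]) if c not in selected]
    score, col = max((cv_score(selected + [c]), c) for c in candidates)
    if score <= best_score:     # stop when no feature improves the CV score
        break
    selected, best_score = selected + [col], score

print("selected features:", selected, "CV accuracy:", round(best_score, 3))
```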
APA, Harvard, Vancouver, ISO, and other styles
37

Pitt, Ellen Alexandra. "Application of data mining techniques in the prediction of coronary artery disease : use of anaesthesia time-series and patient risk factor data." Thesis, Queensland University of Technology, 2009. https://eprints.qut.edu.au/34427/1/Ellen_Pitt_Thesis.pdf.

Full text
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains, and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia-related complications. Stress electrocardiography/exercise testing is predictive of 10-year risk of CVD events, and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages, as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population, and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be the most consistently effective, although Consistency-derived subsets tended to slightly increase accuracy but markedly increase complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic, as well as by evaluation of subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in a reduction in MR to 9.8 to 10.16, with time-segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series-only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series-alone datasets, but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross-validation method.
For models based on Cfs-selected time-series-derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36, with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time-segmented time-series variables and RF) being 9.09. The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E), and on derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G), perform the least well, with MRs of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine-based method, SMO, have the highest MRs of 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MRs of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable-based models is significant. The addition of time-series-derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to the use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological variable values being outside the accepted normal range, is associated with some improvement in model performance.
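The evaluation choices discussed in this abstract (majority-class under-sampling, and Kappa/AUC alongside misclassification rate) can be sketched as follows. This is a generic illustration on synthetic imbalanced data, not the thesis's Weka-based experiments.

```python
# Sketch: under-sample the majority class, then report MR, Kappa, and AUC,
# since MR alone is misleading on imbalanced data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)  # imbalanced, like the CVD data

rng = np.random.default_rng(0)
maj, mino = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
keep = np.concatenate([rng.choice(maj, size=mino.size, replace=False), mino])
Xb, yb = X[keep], y[keep]                   # balanced via under-sampling

Xtr, Xte, ytr, yte = train_test_split(Xb, yb, stratify=yb, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xtr, ytr)

pred = clf.predict(Xte)
print("MR:   ", 1 - accuracy_score(yte, pred))
print("Kappa:", cohen_kappa_score(yte, pred))
print("AUC:  ", roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
```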
APA, Harvard, Vancouver, ISO, and other styles
38

Kratsch, Christina [Verfasser], Alice [Akademischer Betreuer] McHardy, Martin [Akademischer Betreuer] Lercher, and Martin [Akademischer Betreuer] Beer. "Computational methods to study phenotype evolution and feature selection techniques for biological data under evolutionary constraints / Christina Kratsch. Gutachter: Martin Lercher ; Martin Beer. Betreuer: Alice McHardy." Düsseldorf : Universitäts- und Landesbibliothek der Heinrich-Heine-Universität Düsseldorf, 2014. http://d-nb.info/1063085128/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Kratsch, Christina [Verfasser], Alice [Akademischer Betreuer] McHardy, Martin [Akademischer Betreuer] Lercher, and Martin [Akademischer Betreuer] Beer. "Computational methods to study phenotype evolution and feature selection techniques for biological data under evolutionary constraints / Christina Kratsch. Gutachter: Martin Lercher ; Martin Beer. Betreuer: Alice McHardy." Düsseldorf : Universitäts- und Landesbibliothek der Heinrich-Heine-Universität Düsseldorf, 2014. http://d-nb.info/1063085128/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Ramanayaka, Mudiyanselage Asanga. "Data Engineering and Failure Prediction for Hard Drive S.M.A.R.T. Data." Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1594957948648404.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Ramos, Caio César Oba. "Caracterização de perdas comerciais em sistemas de energia através de técnicas inteligentes." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/3/3143/tde-20052015-161147/.

Full text
Abstract:
The detection of thefts and frauds in power systems caused by irregular consumers is the most actively pursued analysis of non-technical losses by electric power companies. Although the automatic identification of non-technical losses has been massively studied, the task of selecting the most representative features in a large dataset in order to boost identification accuracy, as well as characterizing possible irregular consumers as an optimization problem, has not been widely explored in this context. This work aims at developing hybrid algorithms based on evolutionary techniques in order to perform feature selection in the context of non-technical loss characterization. Although several classifiers have been compared, we have highlighted the Optimum-Path Forest (OPF) technique, mainly because of its robustness. Thus, the OPF classifier was chosen to compute the objective function of the evolutionary techniques, analyzing their performances. The procedure with feature selection is compared with the procedure without feature selection on datasets composed of industrial and commercial consumer profiles. The results demonstrated that selecting the most representative features can improve the classification accuracy of possible non-technical losses. This means that there are irrelevant features, and they can reduce the classification accuracy for consumers. Considering the proposed methodology with the feature selection procedure, it is possible to characterize and identify consumer profiles more accurately, in order to minimize the costs of such losses, contributing to the recovery of revenue for electric power companies.
APA, Harvard, Vancouver, ISO, and other styles
42

Akkouche, Nourredine. "Optimisation du test de production de circuits analogiques et RF par des techniques de modélisation statistique." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00625469.

Full text
Abstract:
The share of testing in the design and manufacturing cost of integrated circuits keeps growing, hence the need to optimize this now unavoidable step. In this thesis, new methods for ordering tests and reducing the number of tests to perform are proposed. The solution is a test ordering that detects defective circuits as early as possible, which can also be used to eliminate redundant tests. These test methods are based on statistical modeling of the circuit under test. This modeling includes several parametric and non-parametric models, allowing it to adapt to all types of circuit. Once the model is validated, the proposed test methods generate a large sample containing defective circuits. These allow a better estimation of the test metrics, in particular the defect level. Based on this error, a test ordering is constructed by maximizing the early detection of defective circuits. With few tests, the selection-and-evaluation method is used to obtain the optimal test order. However, for circuits with a large number of tests, heuristics such as the decomposition method, genetic algorithms, or floating search methods are used to approach the optimal solution.
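The ordering idea in this abstract can be sketched as a greedy pass over a detection matrix: pick next whichever test catches the most not-yet-detected defective circuits, and flag tests that add nothing as redundant. This is an illustration under assumed toy data, not the thesis's statistical algorithm.

```python
# Sketch: greedy test ordering for earliest detection of defective circuits.
import numpy as np

rng = np.random.default_rng(0)
# detects[t, c] = True if test t catches defective circuit c (toy data)
detects = rng.random((8, 200)) < 0.15

order, covered = [], np.zeros(detects.shape[1], dtype=bool)
for _ in range(detects.shape[0]):
    gains = (detects & ~covered).sum(axis=1)   # new detections per test
    gains[order] = -1                          # don't pick a test twice
    t = int(np.argmax(gains))
    if gains[t] <= 0:
        break                                  # remaining tests are redundant
    order.append(t)
    covered |= detects[t]

print("test order:", order)
print("redundant tests:", sorted(set(range(detects.shape[0])) - set(order)))
print("defective circuits caught:", int(covered.sum()), "of", covered.size)
```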
APA, Harvard, Vancouver, ISO, and other styles
43

Lozano, Vega Gildardo. "Image-based detection and classification of allergenic pollen." Thesis, Dijon, 2015. http://www.theses.fr/2015DIJOS031/document.

Full text
Abstract:
The correct classification of airborne pollen is relevant for the medical treatment of allergies, and the usual manual process is costly and time consuming. Automatic processing would increase considerably the potential of pollen counting. Modern computer vision techniques enable the detection of discriminant pollen characteristics. In this thesis, a set of relevant image-based features for the recognition of top allergenic pollen taxa is proposed and analyzed. The foundation of our proposal is the evaluation of groups of features that can properly describe pollen in terms of shape, texture, size and apertures. The features are extracted from typical brightfield microscope images, which enables easy reproducibility of the method. A process of feature selection is applied to each group for the determination of relevance. Regarding apertures, a flexible method for detection, localization and counting of apertures of different pollen taxa with varying appearances is proposed. Aperture description is based on image primitives following the Bag-of-Words strategy. A confidence map is built from the classification confidence of sampled regions. From this map, aperture features are extracted, which include the count of apertures. The method is designed to be extended modularly to new aperture types, employing the same algorithm to build individual classifiers. The feature groups are tested individually and jointly on the most allergenic pollen taxa in Germany. They proved able to overcome intra-class variance and inter-class similarity in an SVM classification scheme. The global joint test led to an accuracy of 98.2%, comparable to state-of-the-art procedures.
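The Bag-of-Words stage mentioned above can be sketched as follows: local descriptors are quantized against a KMeans codebook, each image becomes a word histogram, and the histograms feed an SVM. Random patches stand in for the real image primitives, so this is a generic illustration, not the thesis's descriptor pipeline.

```python
# Sketch: Bag-of-Words image representation with a KMeans codebook + SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def local_descriptors(shift, n=50, dim=8):
    """Stand-in for the descriptors sampled from one pollen image."""
    return shift + rng.standard_normal((n, dim))

train_imgs = [local_descriptors(0.0) for _ in range(20)] + \
             [local_descriptors(1.5) for _ in range(20)]
labels = [0] * 20 + [1] * 20                 # e.g., aperture vs. no aperture

codebook = KMeans(n_clusters=16, n_init=10, random_state=0)
codebook.fit(np.vstack(train_imgs))

def bow_histogram(desc):
    words = codebook.predict(desc)
    return np.bincount(words, minlength=16) / len(words)

X = np.array([bow_histogram(d) for d in train_imgs])
clf = SVC(kernel="rbf").fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```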
APA, Harvard, Vancouver, ISO, and other styles
44

Chiu, Leung Kin. "Efficient audio signal processing for embedded systems." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44775.

Full text
Abstract:
We investigated two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management by suppressing signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. A machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.
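The AdaBoost-based feature selection mentioned above can be sketched by boosting decision stumps and reading off which features the ensemble relies on, so that only those features would need to be realised in the analog front-end. This is a generic illustration on synthetic data, not the thesis's implementation.

```python
# Sketch: AdaBoost over decision stumps as a feature relevance ranker.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=4,
                           random_state=0)  # stand-in for acoustic features

# scikit-learn's default weak learner is a depth-1 tree (a decision stump)
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

ranking = np.argsort(ada.feature_importances_)[::-1]
print("most relevant features, best first:", ranking[:5])
```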
APA, Harvard, Vancouver, ISO, and other styles
45

Apatean, Anca Ioana. "Contributions à la fusion des informations : application à la reconnaissance des obstacles dans les images visible et infrarouge." Phd thesis, INSA de Rouen, 2010. http://tel.archives-ouvertes.fr/tel-00621202.

Full text
Abstract:
In order to continue and improve the detection task under way at INSA, we focused on the fusion of visible and infrared information for obstacle recognition, distinguishing between vehicles, pedestrians, cyclists, and background obstacles. Bimodal systems were proposed to fuse the information at different levels: features, SVM kernels, or SVM scores. They were weighted according to the relative importance of the modality sensors, to ensure (fixed or dynamic) adaptation of the system to environmental conditions. To assess the relevance of the features, different selection methods were tested with a nearest-neighbor classifier, which was later replaced by an SVM. A model search, performed by 10-fold cross-validation, provides the optimized kernel for the SVM. The results showed that all VIS-IR bimodal systems are better than their monomodal counterparts.
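Score-level fusion, one of the levels named above, can be sketched as one SVM per modality with posterior scores combined by a modality weight that could be fixed or adapted to conditions. The sketch is an illustration rather than the thesis's system: two column blocks of one synthetic dataset stand in for the visible and infrared feature vectors of the same objects.

```python
# Sketch: weighted score-level fusion of a "visible" and an "infrared" SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)
Xv, Xi = X[:, :10], X[:, 10:]          # "visible" and "infrared" views

Xv_tr, Xv_te, Xi_tr, Xi_te, y_tr, y_te = train_test_split(
    Xv, Xi, y, random_state=0)

vis = SVC(probability=True).fit(Xv_tr, y_tr)
ir = SVC(probability=True).fit(Xi_tr, y_tr)

w_vis = 0.7                            # e.g., favour visible in daylight
fused = (w_vis * vis.predict_proba(Xv_te)
         + (1 - w_vis) * ir.predict_proba(Xi_te))
pred = fused.argmax(axis=1)
print("fused accuracy:", (pred == y_te).mean())
```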
APA, Harvard, Vancouver, ISO, and other styles
46

Pontabry, Julien. "Construction d'atlas en IRM de diffusion : application à l'étude de la maturation cérébrale." Thesis, Strasbourg, 2013. http://www.theses.fr/2013STRAD039/document.

Full text
Abstract:
Diffusion weighted MRI (dMRI) is an in vivo imaging modality that is attracting growing interest in the neuroimaging community. It provides intra-structural information about cerebral tissues in addition to the morphological information from structural MRI (sMRI). These imaging modalities open a new path for population studies, especially for the in utero study of normal human brain maturation. Modeling and characterizing the rapid changes occurring during brain maturation is a current challenge. For these purposes, this thesis presents a complete processing pipeline, from the spatio-temporal modeling of the population to the analysis of shape changes over time. The contributions concern three points. First, the use of high-order diffusion models within a particle filtering framework makes it possible to extract more relevant descriptors of the fetal brain, which are then used for image registration. Then, a non-parametric regression technique is used to model the temporal mean evolution of the fetal brain without enforcing prior knowledge. Finally, shape changes are highlighted using feature extraction and selection methods.
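The non-parametric temporal modelling mentioned above can be sketched as a Nadaraya-Watson kernel regression that estimates the mean of some scalar brain descriptor as a smooth function of gestational age, without a parametric growth model. The data below are synthetic stand-ins, not fetal measurements.

```python
# Sketch: Nadaraya-Watson kernel regression over gestational age.
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(22, 34, 60)                           # gestational weeks
volume = 0.5 * (age - 20) ** 2 + rng.normal(0, 3, 60)   # toy descriptor

def nadaraya_watson(t, x, y, bandwidth=1.5):
    """Kernel-weighted mean of y at query times t (Gaussian kernel)."""
    w = np.exp(-0.5 * ((t[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

query = np.linspace(23, 33, 5)
fit = nadaraya_watson(query, age, volume)
print(dict(zip(np.round(query, 1), np.round(fit, 1))))
```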
APA, Harvard, Vancouver, ISO, and other styles
47

Lian, Chunfeng. "Information fusion and decision-making using belief functions : application to therapeutic monitoring of cancer." Thesis, Compiègne, 2017. http://www.theses.fr/2017COMP2333/document.

Full text
Abstract:
Radiation therapy is one of the principal options used in the treatment of malignant tumors. To enhance its effectiveness, two critical issues should be carefully dealt with: reliably predicting therapy outcomes, to adapt the treatment planning of individual patients, and accurately segmenting tumor volumes, to maximize radiation delivery in tumor tissues while minimizing side effects in adjacent organs at risk. Positron emission tomography with the radioactive tracer fluorine-18 fluorodeoxyglucose (FDG-PET) can noninvasively provide significant information about the functional activities of tumor cells. In this thesis, the goal of our study consists of two parts: 1) to propose a reliable therapy outcome prediction system using primarily features extracted from FDG-PET images; 2) to propose automatic and accurate algorithms for tumor segmentation in PET and PET-CT images. The theory of belief functions is adopted in our study to model and reason with uncertain and imprecise knowledge quantified from noisy and blurred PET images. In the framework of belief functions, a sparse feature selection method and a low-rank metric learning method are proposed to improve the classification accuracy of the evidential K-nearest neighbor classifier learnt from high-dimensional data that contain unreliable features. Based on the above two theoretical studies, a robust prediction system is then proposed, in which the small-sized and imbalanced nature of clinical data is effectively tackled. To automatically delineate tumors in PET images, an unsupervised 3-D segmentation method based on evidential clustering, using the theory of belief functions and spatial information, is proposed. This mono-modality segmentation method is then extended to co-segment tumors in PET-CT images, considering that these two distinct modalities contain complementary information to further improve the accuracy. All proposed methods have been evaluated on clinical data, giving better results compared to state-of-the-art ones.
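The evidential K-nearest neighbor classifier named above can be sketched in simplified form, in the spirit of Denoeux's EK-NN rather than the thesis's learnt version: each neighbor supplies a mass on its own class plus residual ignorance, and the masses are combined with Dempster's rule. The parameters alpha and gamma below are assumed constants, not learnt.

```python
# Sketch: simplified evidential K-NN with Dempster combination of
# singleton-plus-ignorance mass functions.
import numpy as np

def eknn_predict(X_train, y_train, x, k=5, alpha=0.95, gamma=0.5):
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    n_cls = int(y_train.max()) + 1
    m, m_theta = np.zeros(n_cls), 1.0           # vacuous initial belief
    for i in idx:
        s = alpha * np.exp(-gamma * d[i] ** 2)  # support for the neighbor's class
        q = y_train[i]
        conflict = s * (m.sum() - m[q])
        new = m * (1 - s)
        new[q] = m[q] + m_theta * s             # unnormalized Dempster rule
        m = new / (1 - conflict)
        m_theta = m_theta * (1 - s) / (1 - conflict)
    return int(np.argmax(m)), m

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
label, masses = eknn_predict(X, y, np.array([2.5, 2.5]))
print("predicted class:", label, "belief masses:", np.round(masses, 3))
```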
APA, Harvard, Vancouver, ISO, and other styles
48

Chen, Chun-Kang, and 陳俊綱. "A Feature Selection Technique for Semantic Video Indexing System." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/94702733954387515241.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
96
To cope with the growing volume of easily accessible videos, users desire an automatic video search system driven by semantic queries, such as objects, scenes, and events from daily life. To this end, TRECVID annually supplies sufficient video data and a fair evaluation method to advance video search techniques. Many participants build their classification by fusing results from modeling low-level features (LLFs), such as color, edge, and so on. With the development of computer vision, more and more useful LLFs have been designed. However, modeling all acquirable LLFs requires a tremendous amount of time, so how to use these LLFs efficiently has become an important issue. In this thesis, we propose an evaluation technique for LLFs, so that the most appropriate concept-dependent LLF combinations can be chosen to reduce modeling time while keeping reasonable video search precision. In our experiments, modeling only 5 chosen LLFs out of the total 16 can reduce modeling time by 3.51% with only a 6.78% performance drop. Moreover, if half of the LLFs are used, we can keep 98.88% of the precision with 36.07% time savings.
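One way to read the trade-off posed above is as a cost-aware selection problem: given per-LLF modeling times and contributions to precision, pick the features with the best gain per unit of time until a budget is hit. The numbers below are toy values, not the thesis's measurements, and the greedy rule is an assumed stand-in for its evaluation technique.

```python
# Sketch: cost-aware greedy selection of low-level features under a budget.
import numpy as np

rng = np.random.default_rng(0)
n_llf = 16
model_time = rng.uniform(1, 10, n_llf)         # hours to model each LLF
precision_gain = rng.uniform(0.1, 1.0, n_llf)  # marginal contribution

budget = 0.5 * model_time.sum()                # e.g., spend half the full time
chosen, spent = [], 0.0
for f in np.argsort(precision_gain / model_time)[::-1]:
    if spent + model_time[f] <= budget:
        chosen.append(int(f))
        spent += model_time[f]

print("chosen LLFs:", sorted(chosen))
print(f"time used: {spent:.1f} of {model_time.sum():.1f} hours,"
      f" gain kept: {precision_gain[chosen].sum() / precision_gain.sum():.2%}")
```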
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, Chun-Kang. "A Feature Selection Technique for Semantic Video Indexing System." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2307200815300900.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

(9795329), Xiaolong Fan. "A feature selection and classification technique for face recognition." Thesis, 2005. https://figshare.com/articles/thesis/A_feature_selection_and_classification_technique_for_face_recognition/13457450.

Full text
Abstract:
This project examines face recognition research and presents a novel feature selection and classification technique: Genetic Algorithms (GA) for selection and an Artificial Neural Network (ANN) for classification.

APA, Harvard, Vancouver, ISO, and other styles
