Theses on the topic "Extraction de motifs fréquents" (frequent pattern extraction)
Julea, Andreea Maria. "Extraction de motifs spatio-temporels dans des séries d'images de télédétection : application à des données optiques et radar". PhD thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00652810.
Pennerath, Frédéric. "Méthodes d'extraction de connaissances à partir de données modélisables par des graphes : Application à des problèmes de synthèse organique". PhD thesis, Université Henri Poincaré - Nancy I, 2009. http://tel.archives-ouvertes.fr/tel-00436568.
Papon, Pierre-Antoine. "Extraction optimisée de règles d'association positives et négatives intéressantes". Thesis, Clermont-Ferrand 2, 2016. http://www.theses.fr/2016CLF22702/document.
The purpose of data mining is to extract knowledge from large amounts of data. The extracted knowledge can take different forms. In this work, we seek to extract knowledge only in the form of positive and negative association rules. A negative association rule is a rule in which both the presence and the absence of a variable can be used. By considering the absence of variables, we expand the semantics of the knowledge and extract information that positive association rule mining methods cannot detect. This will, for example, allow doctors to find characteristics that prevent a disease instead of searching for characteristics that cause it. Nevertheless, adding negation raises several challenges. Indeed, since the absence of a variable is usually far more frequent than its presence, the computational cost increases exponentially, and so does the risk of extracting a prohibitive number of rules, most of which are redundant and uninteresting. To address these problems, our proposal, based on the well-known Apriori algorithm, does not rely on frequent itemsets as other methods do. We define a new type of itemset, the reasonably frequent itemset, which improves the quality of the rules. We also rely on the MG measure both to determine which forms of rules should be mined and to remove uninteresting rules. We also use meta-rules that allow us to infer the interest of a negative rule from a positive one. Moreover, our algorithm extracts a new type of negative rule that seems interesting: rules whose antecedent and consequent are both conjunctions of negative itemsets. Our study ends with a quantitative and qualitative comparison with other positive and negative association rule mining algorithms on various databases from the literature.
Our software ARA (Association Rules Analyzer) facilitates the qualitative analysis of the algorithms by allowing users to compare them intuitively and to apply various quality measures as post-processing. Finally, our proposal improves the extraction in terms of the number and quality of the extracted rules, as well as the rule search path.
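The abstract builds on Apriori and on negative rules such as A ⇒ ¬B. As a generic illustration only (the author's reasonably frequent itemsets and MG measure are not specified here and are not reproduced), the sketch below runs plain Apriori and derives the confidence of a negative rule from positive supports using supp(A ∧ ¬B) = supp(A) − supp(A ∪ B). The function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of items) that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def apriori(transactions, min_supp):
    """Classic level-wise Apriori; returns a dict {frozenset: support}."""
    level = [frozenset([i]) for i in set().union(*transactions)]
    frequent = {}
    k = 1
    while level:
        # Count the size-k candidates and keep the frequent ones
        current = {c: support(c, transactions) for c in level}
        current = {c: s for c, s in current.items() if s >= min_supp}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-candidates...
        keys = list(current)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # ...and prune any candidate that has an infrequent (k-1)-subset
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def negative_rule_confidence(antecedent, consequent, transactions):
    """Confidence of A => not B, using supp(A and not B) = supp(A) - supp(A u B)."""
    s_a = support(antecedent, transactions)
    s_ab = support(set(antecedent) | set(consequent), transactions)
    return (s_a - s_ab) / s_a


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]
freq = apriori(transactions, min_supp=0.5)
print(sorted("".join(sorted(s)) for s in freq))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
# One of the three transactions containing "a" lacks "b"
print(negative_rule_confidence({"a"}, {"b"}, transactions))
```

The negative rule needs no extra database scan here: its support follows from two positive supports already counted by Apriori.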
Fiot, Céline. "Extraction de séquences fréquentes : des données numériques aux valeurs manquantes". PhD thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2007. http://tel.archives-ouvertes.fr/tel-00179506.
Fiot, Céline. "Extraction de séquences fréquentes : des données numériques aux valeurs manquantes". PhD thesis, Montpellier 2, 2007. http://www.theses.fr/2007MON20056.
Raïssi, Chedy. "Extraction de Séquences Fréquentes : Des Bases de Données Statiques aux Flots de Données". PhD thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2008. http://tel.archives-ouvertes.fr/tel-00351626.
Raissi, Chedy. "Extraction de séquences fréquentes : des bases de données statiques aux flots de données". Montpellier 2, 2008. http://www.theses.fr/2008MON20063.
Faci, Adam. "Représentation, simulation et exploitation de connaissances dans le formalisme des graphes conceptuels". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS419.
This thesis addresses knowledge representation with conceptual graphs, a structured knowledge representation formalism that provides efficient manipulation tools. Symbolic artificial intelligence faces many challenges concerning knowledge representation in general, and in particular the concise representation of large amounts of information. Through their ability to visually represent different kinds of knowledge, mainly ontological and factual, and through the efficient manipulation tools they offer, conceptual graphs provide an ideal framework for addressing these problems. We conduct a comparative study of fuzzy extensions of conceptual graphs, then propose an algorithm for conceptual graph simulation as well as an efficient algorithm for extracting frequent patterns that are not redundant with ontological knowledge.
Gosselin, Stéphane. "Recherche de motifs fréquents dans une base de cartes combinatoires". PhD thesis, Université Claude Bernard - Lyon I, 2011. http://tel.archives-ouvertes.fr/tel-00838571.
Salleb, Ansaf. "Recherche de motifs fréquents pour l'extraction de règles d'association et de caractérisation". Orléans, 2003. http://www.theses.fr/2003ORLE2064.
Ugarte, Rojas Willy. "Extraction de motifs sous contraintes souples". Caen, 2014. http://www.theses.fr/2014CAEN2040.
The objective of this thesis is to introduce softness into the pattern mining process in data mining. Using constraint programming, we make four main contributions:
- A general framework for implementing soft threshold constraints in a pattern mining prototype.
- The introduction of softness in skypatterns (Pareto-optimal patterns with respect to a set of measures) and a generic method for mining (hard) skypatterns as well as soft-skypatterns.
- The introduction of the skypattern cube and two methods for building it: one bottom-up, mainly based on derivation rules; the other using an approximation of all the skypatterns of the cube, which is feasible thanks to soft-skypatterns.
- The introduction of the notion of optimal pattern for modeling many pattern extraction problems: skypatterns, top-k, closed patterns, etc.
The declarative and generic nature of our approach opens the way to defining and discovering new sets of patterns. These contributions have been experimentally validated on real application domains, such as the discovery of toxicophores for the first two contributions and the discovery of mutagenic components for the third.
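Skypatterns are the Pareto-optimal patterns with respect to a set of measures. A minimal sketch of the hard (non-soft) notion follows, assuming the measures to be maximised are already computed for each pattern; it does not reproduce the constraint-programming approach or the soft-skypatterns of the thesis, and the pattern scores are made up for illustration.

```python
def dominates(m1, m2):
    """m1 Pareto-dominates m2: >= on every measure and > on at least one."""
    return (all(x >= y for x, y in zip(m1, m2))
            and any(x > y for x, y in zip(m1, m2)))


def skypatterns(measures):
    """Patterns not dominated by any other pattern.

    `measures` maps each pattern to a tuple of measure values to maximise.
    """
    return {p for p, m in measures.items()
            if not any(dominates(o, m) for q, o in measures.items() if q != p)}


# Hypothetical patterns scored by (frequency, length)
scores = {"a": (0.6, 1), "ab": (0.4, 2), "abc": (0.2, 3), "ac": (0.3, 2)}
print(sorted(skypatterns(scores)))  # ['a', 'ab', 'abc']
```

Here "ac" is discarded because "ab" is at least as long and strictly more frequent; each remaining pattern is best on some trade-off between the two measures.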
Hamrouni, Tarek. "Fouille de représentations concises des motifs fréquents à travers les espaces de recherche conjonctif et disjonctif". PhD thesis, Université d'Artois, 2009. http://tel.archives-ouvertes.fr/tel-00465733.
Voravuthikunchai, Winn. "Représentation des images au moyen de motifs fréquents et émergents pour la classification et la recherche d'images". Caen, 2013. http://www.theses.fr/2013CAEN2084.
In this thesis, our aim is to achieve better results on several computer vision tasks by focusing on image representation. Our idea is to integrate feature dependencies into the original feature representation. Although feature dependencies can provide additional useful information for discriminating images, selecting a subset of feature combinations from the power set of the features, whose cardinality is excessively large, is a nontrivial task. We employ pattern mining techniques to efficiently produce a tractable set of effective combinations. Pattern mining is a process that can analyze large quantities of data and extract interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). The first problem we encountered is how to encode image features, which are typically real-valued, as binary transaction items suitable for pattern mining algorithms. We propose some solutions based on local thresholding. The number of extracted patterns is still very high, and using them directly as new features to train supervised classification models leads to overfitting. We present a solution that aggregates the patterns into a compact representation that does not overfit the training data. We have achieved state-of-the-art results on several image classification benchmarks. Along the way, we realized that pattern mining algorithms are especially suitable for large-scale tasks, as they are very efficient and scale gracefully with the number of images. We have found two suitable applications. The first is detecting groups of duplicates in very large datasets. For our experiment, we created a database of one million images randomly downloaded from Google, and we discovered the duplicate groups in less than three minutes.
Another application well suited to pattern mining techniques is image re-ranking. Our method improves the original ranking score by a large margin and compares favorably to existing approaches.
Plantevit, Marc. "Extraction De Motifs Séquentiels Dans Des Données Multidimensionelles". PhD thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2008. http://tel.archives-ouvertes.fr/tel-00319242.
Mouhoubi, Karima. "Extraction des motifs contraints dans des données bruitées". Paris 13, 2013. http://www.theses.fr/2013PA132060.
Plantevit, Marc. "Extraction de motifs séquentiels dans des données multidimensionnelles". Montpellier 2, 2008. http://www.theses.fr/2008MON20066.
Sequential pattern mining is a key data mining technique with broad applications (user behavior analysis, bioinformatics, security, music, etc.). Sequential pattern mining aims at discovering correlations among events over time. Many algorithms exist to discover such patterns. However, these approaches take only one dimension into account (e.g., the product dimension in customer market basket analysis), whereas data are multidimensional in nature. In this thesis, we define multidimensional sequential patterns that take into account the specificities of multidimensional databases (several dimensions, hierarchies, aggregated values). We define algorithms that discover such patterns by handling these specificities. Experiments on both synthetic and real data are reported and show the interest of our proposals. We also focus on the discovery of atypical behavior. We show that there are several interpretations of atypical behavior (fact or knowledge). For each interpretation, we propose an approach to discover such behaviors. These approaches are also validated with experiments on real data.
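As a baseline for the single-dimension case that the thesis extends, the support of a sequential pattern is usually the fraction of data sequences that contain it as a subsequence. A minimal sketch, assuming sequences are lists of itemsets represented as Python sets; the multidimensional and hierarchical extensions of the thesis are not shown, and the data are hypothetical.

```python
def contains(sequence, pattern):
    """True if `pattern` (a list of itemsets) is a subsequence of `sequence`:
    each pattern itemset is included in a later itemset of the sequence."""
    pos = 0
    for element in pattern:
        # Greedily match the earliest sequence itemset that includes `element`
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def sequence_support(pattern, database):
    """Fraction of data sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in database) / len(database)


db = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "b"}, {"d"}],
    [{"b"}, {"a"}],
]
print(sequence_support([{"a"}, {"d"}], db))  # 2 of 3 sequences
```

Greedy earliest matching is sufficient here: matching each itemset as early as possible never prevents a later itemset from being matched.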
Termier, Alexandre. "Extraction d'arbres fréquents dans un corpus hétérogène de données semi-structurées : application à la fouille de document XML". Paris 11, 2004. http://www.theses.fr/2004PA11A002.
Marascu, Alice. "Extraction de motifs séquentiels dans les flux de données". PhD thesis, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00445894.
Pitarch, Yoann. "Résumé de Flots de Données : motifs, Cubes et Hiérarchies". Thesis, Montpellier 2, 2011. http://www.theses.fr/2011MON20051/document.
Due to the rapid growth of information and communication technologies, the amount of generated and available data has exploded, and a new kind of data, stream data, has appeared. One common definition of a data stream is an unbounded sequence of very precise data arriving at a high rate. It is therefore impossible to store such a stream to perform a posteriori analysis. Moreover, more and more data streams concern multidimensional and multilevel data, and very few approaches tackle these specificities. In this work, we propose practical and efficient solutions for handling such data in a dynamic context. More specifically, we adapt OLAP (On-Line Analytical Processing) and hierarchy techniques to build relevant summaries of the data. First, after describing and discussing existing similar approaches, we propose two solutions for building data cubes more efficiently over stream data. Second, we combine frequent patterns with hierarchies to build a summary based on the main trends of the stream. Third, although many types of hierarchies exist in the literature, none of them integrates expert knowledge during the generalization phase, even though such integration could be very relevant for building semantically richer summaries. We tackle this issue and propose a new type of hierarchy, contextual hierarchies. Along with this new type of hierarchy, we provide a new conceptual, graphical and logical data warehouse model, the contextual data warehouse. Finally, since this work was funded by the ANR through the MIDAS project, we evaluated our approaches on real datasets provided by the industrial partners of this project (e.g., Orange Labs or EDF R&D).
Masseglia, Florent. "Extraction de connaissances : réunir volumes de données et motifs significatifs". Habilitation à diriger des recherches, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00788309.
Li, Haoyuan. "Extraction de séquences inattendues : des motifs séquentiels aux règles d'implication". PhD thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2009. http://tel.archives-ouvertes.fr/tel-00431117.
Li, Dong Haoyuan. "Extraction de séquences inattendues : des motifs séquentiels aux règles d’implication". Montpellier 2, 2009. http://www.theses.fr/2009MON20253.
Sequential patterns can be viewed as an extension of association rules that integrates temporal constraints; they are effective for representing frequency-based behaviors among the elements of sequence data, that is, the discovered patterns are interesting because they are frequent. However, when prior domain knowledge of the data is considered, another reason why discovered patterns can be interesting is that they are unexpected. In this thesis, we investigate the discovery of unexpected sequences in large databases with respect to prior domain expertise. We first methodically develop the Muse framework, which integrates approaches for discovering three forms of unexpected sequences. We then extend Muse by adopting fuzzy set theory to describe sequence occurrence. We also propose a generalized framework, SoftMuse, that takes into account concept hierarchies over the data taxonomy. We further propose the notions of unexpected sequential patterns and unexpected implication rules, in order to evaluate the discovered unexpected sequences through a self-validation process. Finally, we propose the discovery and validation of unexpected sentences in free-format text documents. The usefulness and effectiveness of our approaches are shown through experiments on synthetic data, real Web server access log data, and text document classification.
Ståhl, Martin. "Extraction of recurring behavioral motifs from video recordings of natural behavior". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230894.
Complex neural activity is expressed in many different forms, one of which is behavior. A natural way to study neural activity is therefore to analyze behavior. In this thesis, behavior has been studied using a hidden Markov model. Data were taken from videos of freely running mice in a box. The model was successfully trained on, and classified, mouse behavior. Classification with 4 and 6 states was tested; with 6 states, the model appears to distinguish between two different stationary states, which is biologically interesting. In summary, a Gaussian hidden Markov model is a reasonable way to classify mouse behavior, but it does not solve any fundamental problems. Some data collection techniques also introduced inaccuracies that need to be improved.
Kane, Mouhamadou bamba. "Extraction et sélection de motifs émergents minimaux : application à la chémoinformatique". Thesis, Normandie, 2017. http://www.theses.fr/2017NORMC223/document.
Pattern discovery is an important field of Knowledge Discovery in Databases. This work deals with the extraction of minimal emerging patterns. We propose a new efficient method for extracting minimal emerging patterns with or without a support constraint, unlike existing methods, which typically extract the most supported minimal emerging patterns at the risk of missing interesting but less supported patterns. Moreover, our method takes into account the absence of attributes, which brings new and interesting knowledge. Considering the rules associated with highly supported emerging patterns as prototype rules, we show experimentally that this set of rules has good confidence on the objects it covers but unfortunately does not cover a significant part of the objects, which is a disadvantage for their use in classification. We propose a prototype-based selection method that improves the coverage of the set of prototype rules without a significant loss of confidence. We apply our prototype-based selection method to a chemical dataset related to the aquatic environment, Aquatox. In a classification context, it allows chemists to better explain the classification of molecules that, without this selection method, would be predicted by a default rule.
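An emerging pattern is commonly defined as an itemset whose growth rate, the ratio of its support in a target class to its support in the other class, reaches a threshold ρ; it is minimal when no proper subset is itself emerging. The exhaustive sketch below only illustrates that definition on toy data (its cost is exponential in the number of items). It is not Kane's efficient method, it ignores the absence of attributes that the thesis also handles, and its names and data are hypothetical.

```python
import math
from itertools import combinations


def supp(itemset, data):
    """Relative support of `itemset` in `data` (a list of sets)."""
    return sum(1 for t in data if itemset <= t) / len(data)


def growth_rate(itemset, target, other):
    """supp_target / supp_other; infinite for a jumping emerging pattern."""
    s1, s2 = supp(itemset, target), supp(itemset, other)
    if s2 == 0:
        return math.inf if s1 > 0 else 0.0
    return s1 / s2


def minimal_emerging_patterns(target, other, rho):
    """Search itemsets by increasing size; keep those with growth rate >= rho
    that contain no smaller emerging pattern."""
    items = sorted(set().union(*target, *other))
    found = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if any(m <= x for m in found):
                continue  # a subset is already emerging, so x is not minimal
            if growth_rate(x, target, other) >= rho:
                found.append(x)
    return found


target = [{"a", "b"}, {"a", "c"}]
other = [{"b"}, {"c"}, {"b", "c"}]
print(minimal_emerging_patterns(target, other, rho=2))  # [frozenset({'a'})]
```

Checking only against already-found minimal patterns is enough: if any subset of x were emerging, it would itself contain a minimal emerging pattern found at a smaller size.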
Oudni, Amal. "Fouille de données par extraction de motifs graduels : contextualisation et enrichissement". Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066437/document.
This thesis belongs to the framework of knowledge extraction and data mining applied to numerical or fuzzy data, with the aim of extracting linguistic summaries in the form of gradual itemsets. Gradual itemsets express correlations between attribute values of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets by proposing different types of additional information so as to increase their quality and provide a better interpretation. We propose four types of new itemsets. First, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes linguistically introduced by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more its density increases ». Reinforcement is interpreted as an increased validity of the gradual itemset. In addition, we study the extension of the concept of reinforcement to association rules, discussing their possible interpretations and showing their limited contribution. We then propose to process the contradictory itemsets that arise, for example, when both « the more the temperature increases, the more the humidity increases » and « the more the temperature increases, the less the humidity decreases » are extracted simultaneously. To manage these contradictions, we define a constrained variant of the gradual itemset support that depends not only on the considered itemset but also on its potential contradictors. We also propose two extraction methods: the first filters the itemsets after all of them have been generated, and the second integrates the filtering process into the generation step.
We introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression « especially if », which can be illustrated by a sentence such as « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause specifies value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely high validity and large size, as well as an extension that takes data density into account. We propose a method to automatically extract characterized gradual itemsets, based on appropriate mathematical morphology tools and the definition of an appropriate filter and transcription.
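Several support definitions exist for gradual itemsets; one common choice counts the pairs of objects that are ordered consistently with every gradual item. A minimal sketch of that pair-based support on crisp numerical data follows. This is an assumption: the thesis also handles fuzzy data and defines reinforced, characterized and constrained variants, none of which is shown, and the sample rows are hypothetical.

```python
from itertools import permutations


def gradual_support(rows, itemset):
    """Pair-based support of a gradual itemset.

    `itemset` is a list of (attribute, direction) with direction "+" (the
    more) or "-" (the less). Counts ordered object pairs (o, p) in which every
    attribute varies strictly in the stated direction, divided by n(n-1)/2.
    """
    def respects(o, p):
        for attr, direction in itemset:
            delta = p[attr] - o[attr]
            if (direction == "+" and delta <= 0) or (direction == "-" and delta >= 0):
                return False
        return True

    n = len(rows)
    count = sum(1 for o, p in permutations(rows, 2) if respects(o, p))
    return count / (n * (n - 1) / 2)


rows = [
    {"temperature": 10, "pressure": 1.0},
    {"temperature": 20, "pressure": 2.0},
    {"temperature": 30, "pressure": 3.0},
    {"temperature": 40, "pressure": 2.5},
]
# "The more the temperature increases, the more the pressure increases"
print(gradual_support(rows, [("temperature", "+"), ("pressure", "+")]))
```

Five of the six object pairs follow the pattern; only the last two rows break it, because pressure drops while temperature rises.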
Oudni, Amal. "Fouille de données par extraction de motifs graduels : contextualisation et enrichissement". Electronic Thesis or Diss., Paris 6, 2014. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2014PA066437.pdf.
Millioz, Fabien. "Deux approches de segmentation temps-fréquence : détection par modèle statistique et extraction de contour par le champ de vecteurs de réallocation". PhD thesis, Grenoble INPG, 2009. http://tel.archives-ouvertes.fr/tel-00421599.
The first method relies on a statistical approach, modeling the analyzed signal as a signal of interest to be segmented, corrupted by additive white Gaussian noise of unknown variance. The goal is to determine the time-frequency support, i.e., the set of points over which the energy of the signal to be segmented is distributed. A Neyman-Pearson detection with a fixed probability of false alarm detects the time-frequency points containing signal, for a known noise level. The proposed algorithm is iterative: it estimates the noise level from the non-segmented points, and this noise level is then used to detect new points containing signal. A criterion based on the spectral kurtosis of the non-segmented points determines when to stop the iterations.
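The iterative detection loop can be sketched as follows, under stated assumptions that are not taken from the thesis: noise-only spectrogram values are modeled as exponentially distributed with mean σ² (the usual model for the squared modulus of a complex Gaussian STFT coefficient), σ² is estimated by the plain mean of the non-segmented points, and iterations stop when no new point is detected instead of using the spectral-kurtosis criterion. The function name and the synthetic data are hypothetical.

```python
import math
import random


def segment_tf(spec, pfa=1e-3, max_iter=50):
    """Iterative Neyman-Pearson segmentation of a spectrogram `spec`
    (list of rows of non-negative energies). Returns detected (row, col) points."""
    points = [(i, j) for i in range(len(spec)) for j in range(len(spec[0]))]
    detected = set()
    for _ in range(max_iter):
        noise = [spec[i][j] for i, j in points if (i, j) not in detected]
        if not noise:
            break
        sigma2 = sum(noise) / len(noise)  # noise level from non-segmented points
        # If noise-only energy is sigma2 * Exp(1), then
        # P(energy > t) = exp(-t / sigma2) = pfa  gives  t = -sigma2 * ln(pfa)
        threshold = -sigma2 * math.log(pfa)
        new = {(i, j) for i, j in points
               if (i, j) not in detected and spec[i][j] > threshold}
        if not new:
            break  # simplified stopping rule (the thesis uses spectral kurtosis)
        detected |= new
    return detected


random.seed(0)
spec = [[random.expovariate(1.0) for _ in range(32)] for _ in range(32)]
for i in range(10, 15):
    for j in range(10, 15):
        spec[i][j] = 100.0  # hypothetical signal block
found = segment_tf(spec)
print(len(found))
```

The first pass overestimates the noise level because the signal block is still included in the mean; once the block is segmented, the estimate drops and the threshold tightens, which is why the estimation is repeated.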
The second method is based on the principle of spectrogram reassignment, used as a source of information about the spectrogram. Reassignment moves the spectrogram energy to the local center of gravity of the energy. At the boundaries of a time-frequency pattern, energy is moved toward the interior of the pattern. Thus, the reassignment vectors, which describe how reassignment displaces the spectrogram energy, are locally parallel along the boundary of a pattern. We then define a "degree of parallelism" for each vector, equal to the number of its neighboring vectors that are parallel to it. A ridge-following algorithm, traversing the time-frequency plane along the maxima of this degree of parallelism, then builds a contour enclosing the time-frequency pattern.
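The degree of parallelism can be sketched on a grid of reassignment vectors. The following choices are assumptions rather than details from the thesis: the 8-neighbourhood, an angular tolerance `tol_deg`, and treating opposite directions as parallel. The ridge-following contour step is not shown.

```python
import math


def parallelism_degree(vx, vy, tol_deg=10.0):
    """For each grid point, count the 8-neighbours whose reassignment vector
    (vx, vy) is parallel to its own, i.e. the angle between their directions
    (ignoring orientation) is at most `tol_deg` degrees. Zero vectors are skipped."""
    n_f, n_t = len(vx), len(vx[0])
    cos_tol = math.cos(math.radians(tol_deg))
    degree = [[0] * n_t for _ in range(n_f)]
    for i in range(n_f):
        for j in range(n_t):
            ax, ay = vx[i][j], vy[i][j]
            na = math.hypot(ax, ay)
            if na == 0:
                continue
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, m = i + di, j + dj
                    if (di, dj) == (0, 0) or not (0 <= k < n_f and 0 <= m < n_t):
                        continue
                    nb = math.hypot(vx[k][m], vy[k][m])
                    if nb == 0:
                        continue
                    # |cos| close to 1 means same or opposite direction
                    cos_ab = abs(ax * vx[k][m] + ay * vy[k][m]) / (na * nb)
                    if cos_ab >= cos_tol:
                        degree[i][j] += 1
    return degree


# 3x3 field of horizontal vectors, except a vertical one in the centre
vx = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
vy = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
for row in parallelism_degree(vx, vy):
    print(row)
```

In this toy field, the vertical centre vector has degree 0, while border vectors count only their horizontal neighbours.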
Millioz, Fabien. "Deux approches de segmentation temps-fréquence : détection par modèle statistique et extraction de contour par le champ de vecteurs de réallocation". PhD thesis, Grenoble INPG, 2009. http://www.theses.fr/2009INPG0040.
Time-frequency representations show the spectral evolution of a signal over time. The goal of this work is to propose two principles for segmenting the time-frequency plane, seeking to determine the time-frequency areas that are of interest with respect to the analyzed signal. The first method is statistical, modeling the analyzed signal as the sum of a signal of interest to be segmented and white Gaussian noise of unknown variance. The aim is to determine the time-frequency support, that is, all the points over which the energy of the signal to be segmented is distributed. A Neyman-Pearson detector with a given probability of false alarm detects the time-frequency points containing signal for a known noise level. The proposed algorithm is iterative: it estimates the noise level from the non-segmented points, and this noise level is used to detect new points containing signal. A criterion based on the spectral kurtosis of the non-segmented points determines when to stop the iterations. Applications of this method are illustrated on synthetic and real signals, and for different time-frequency representations. The second method is based on the principle of spectrogram reassignment, used not as a reassigned time-frequency representation but only as a source of information about the spectrogram. Reassignment shifts the spectrogram energy to the local center of gravity of the energy. On the boundary of a time-frequency pattern, energy is moved inside the pattern. Thus, the reassignment vectors, which describe the displacement of the spectrogram energy by reassignment, are locally parallel on the boundary of a pattern. We then define a "parallelism degree" for each vector, equal to the number of its neighboring vectors that are parallel to it. A tracking algorithm that follows the maxima of the parallelism degree across the time-frequency plane finally builds a closed contour encircling the time-frequency pattern.
Au, Émilie. "Intégration de la sémantique dans la représentation de documents par les arbres de dépendances syntaxiques". Mémoire, Université de Sherbrooke, 2011. http://savoirs.usherbrooke.ca/handle/11143/4938.
Lhote, Loïck. "l'algorithmique: la fouille de données et l'arithmétique". PhD thesis, Université de Caen, 2006. http://tel.archives-ouvertes.fr/tel-00092862.
Tofan, Radu-Ionel. "Bordures : de la sélection de vues dans un cube de données au calcul parallèle de fréquents maximaux". Thesis, Bordeaux 1, 2010. http://www.theses.fr/2010BOR14073/document.
View materialization is an effective technique for optimizing queries. In this thesis, we propose a new vision, which we call "user oriented", of solutions to the problem of selecting views to materialize in data warehouses: the user fixes the maximum response time. Within this vision, we propose algorithms that are competitive with "system oriented" algorithms, in which resources such as memory are considered the major constraint. The "user oriented" approach is studied in a dynamic context. We analyze the stability of this system with respect to query workload dynamics as well as data dynamics (insertions and deletions). The key concept of our algorithms for selecting views to materialize is the border. This concept has been widely studied in the data mining community in the setting of maximal frequent itemset extraction, and many sequential algorithms have been proposed. We propose a new sequential algorithm, MineWithRounds, which is easily parallelizable and differs from the others in that it guarantees a theoretical speedup on shared-memory multiprocessors.
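In the frequent itemset setting that the thesis draws on, the (positive) border is the set of maximal frequent itemsets. A brute-force sketch for toy data follows, assuming transactions are Python sets and using a hypothetical `min_count` threshold; it is neither the view-selection algorithm nor MineWithRounds, only the notion of border they rely on.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force level-wise enumeration; stopping at the first empty level
    is safe because support is anti-monotone."""
    items = sorted(set().union(*transactions))
    result = []
    for k in range(1, len(items) + 1):
        level = [frozenset(c) for c in combinations(items, k)
                 if sum(1 for t in transactions if set(c) <= t) >= min_count]
        if not level:
            break
        result.extend(level)
    return result


def positive_border(frequent):
    """Maximal frequent itemsets: those with no frequent proper superset."""
    return {x for x in frequent if not any(x < y for y in frequent)}


transactions = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}]
border = positive_border(frequent_itemsets(transactions, min_count=2))
print(sorted(sorted(x) for x in border))  # [['a', 'b'], ['c']]
```

The border is a compact summary: every frequent itemset is a subset of some border element, so {a, b} and {c} implicitly describe {a} and {b} as well.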
Leleu, Marion. "Extraction de motifs séquentiels sous contraintes dans des données contenant des répétitions consécutives". Lyon, INSA, 2004. http://theses.insa-lyon.fr/publication/2004ISAL0001/these.pdf.
This PhD thesis concerns a particular data mining field: sequential pattern extraction from event sequence databases (e.g., customer transaction sequences, web logs, DNA). Among existing algorithms, those based on an in-memory representation of pattern locations (called occurrence lists) lose efficiency when the sequences contain consecutive repetitions. This thesis proposes efficient solutions for sequential pattern extraction in such a context (constraints and repetitions), based on condensing the information contained in the occurrence lists without any loss for the extraction process. This new representation leads to new sequential pattern extraction algorithms (GoSpade and GoSpec) that are particularly well adapted to the presence of consecutive repetitions in the datasets. These algorithms have been proved sound and complete, and experiments on both real and synthetic datasets show that the gains in memory space and execution time are significant and increase with the number of consecutive repetitions in the datasets. Finally, a financial application has been carried out to build a condensed representation of market trends by means of frequent sequential patterns.
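The benefit of consecutive repetitions can be illustrated by condensing an occurrence list into runs of consecutive positions. This is a hypothetical sketch of the general idea of lossless condensation (the helper names are illustrative); it is not the actual data structure of GoSpade or GoSpec.

```python
def condense(positions):
    """Compress a sorted list of positions into maximal runs of consecutive
    positions, stored as (start, end) pairs."""
    runs = []
    for p in positions:
        if runs and p == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], p)  # extend the current run
        else:
            runs.append((p, p))  # start a new run
    return runs


def expand(runs):
    """Inverse of `condense`: list every position covered by the runs."""
    return [p for start, end in runs for p in range(start, end + 1)]


sequence = "aaaabaa"
occ_a = [i for i, e in enumerate(sequence) if e == "a"]  # [0, 1, 2, 3, 5, 6]
print(condense(occ_a))  # [(0, 3), (5, 6)]
```

The longer the repetitions, the fewer runs are stored, while `expand` shows that no occurrence is lost.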
Leleu, Marion Boulicaut Jean-François. "Extraction de motifs séquentiels sous contraintes dans des données contenant des répétitions consécutives". Villeurbanne : Doc'INSA, 2005. http://docinsa.insa-lyon.fr/these/pont.php?id=leleu.
Shah, Faaiz Hussain. "Gradual Pattern Extraction from Property Graphs". Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS025/document.
Texto completoGraph databases (NoSQL oriented graph databases) provide the ability to manage highly connected data and complex database queries along with the native graph-storage and processing. A property graph in a NoSQL graph engine is a labeled directed graph composed of nodes connected through relationships with a set of attributes or properties in the form of (key:value) pairs. It facilitates to represent the data and knowledge that are in form of graphs. Practical applications of graph database systems have been seen in social networks, recommendation systems, fraud detection, and data journalism, as in the case for panama papers. Often, we face the issue of missing data in such kind of systems. In particular, these semi-structured NoSQL databases lead to a situation where some attributes (properties) are filled-in while other ones are not available, either because they exist but are missing (for instance the age of a person that is unknown) or because they are not applicable for a particular case (for instance the year of military service for a girl in countries where it is mandatory only for boys). Therefore, some keys can be provided for some nodes and not for other ones. In such a scenario, when we want to extract knowledge from these new generation database systems, we face the problem of missing data that arise need for analyzing them. Some approaches have been proposed to replace missing values so as to be able to apply data mining techniques. However, we argue that it is not relevant to consider such approaches so as not to introduce biases or errors. In our work, we focus on the extraction of gradual patterns from property graphs that provide end-users with tools for mining correlations in the data when there exist missing values. 
Our approach first requires defining gradual patterns in the context of NoSQL property graphs and then extending existing algorithms to handle missing values, because the anti-monotonicity of the support can no longer be assumed in a simple manner. Thus, we introduce a novel approach for mining gradual patterns in the presence of missing values and test it on real and synthetic data. Building on this work, we present our approach for mining such graphs in order to extract frequent gradual patterns of the form "the more/less $A_1$, ..., the more/less $A_n$", where the $A_i$ are information from the graph, whether from the nodes or from the relationships. In order to retrieve more valuable patterns, we consider fuzzy gradual patterns of the form "the more/less the $A_1$ is $F_1$, ..., the more/less the $A_n$ is $F_n$", where the $A_i$ are attributes retrieved from the graph nodes or relationships and the $F_i$ are fuzzy descriptions. For this purpose, we introduce the definitions of these concepts, the corresponding method for extracting the patterns, and the experiments we conducted on synthetic graphs produced by a graph generator. We report results in terms of execution time, memory consumption, and number of generated patterns.
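The abstract does not define the support of a gradual pattern. A minimal sketch, assuming one common pair-based definition (the fraction of object pairs whose attribute values all vary in the stated directions) and a hypothetical missing-value policy that ignores nodes lacking any attribute of the pattern instead of imputing values. The name `gradual_support` and the encoding of node properties as Python dicts are illustrative, not the thesis's method.

```python
from itertools import combinations


def gradual_support(nodes, pattern):
    """Pair-based support of a gradual pattern over property-graph nodes.

    nodes: list of dicts mapping property keys to numeric values
           (a key is simply absent when the value is missing).
    pattern: list of (key, direction) pairs, where direction is '+'
             ("the more") or '-' ("the less").
    """
    keys = [key for key, _ in pattern]
    # Hypothetical missing-value policy: only nodes carrying every
    # attribute of the pattern are compared; nothing is imputed.
    complete = [n for n in nodes if all(key in n for key in keys)]
    total_pairs = len(complete) * (len(complete) - 1) // 2
    if total_pairs == 0:
        return 0.0

    def respects(u, v):
        # True if going from u to v follows every variation strictly.
        for key, direction in pattern:
            if direction == '+' and not u[key] < v[key]:
                return False
            if direction == '-' and not u[key] > v[key]:
                return False
        return True

    # With strict comparisons, at most one orientation of a pair can match.
    concordant = sum(
        1 for u, v in combinations(complete, 2)
        if respects(u, v) or respects(v, u)
    )
    return concordant / total_pairs
```

For example, with nodes `{"age": 20, "salary": 1000}`, `{"age": 30, "salary": 2000}`, `{"age": 40, "salary": 1500}` and `{"age": 50}` (no salary), the pattern "the more age, the more salary" compares only the first three nodes and gets support 2/3. Because the denominator depends on which nodes carry the pattern's attributes, adding an attribute can shrink it, so with this normalization the support is not guaranteed to be anti-monotone.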
Hébert, Céline. "Extraction et usages de motifs minimaux en fouille de données, contribution au domaine des hypergraphes". Phd thesis, Université de Caen, 2007. http://tel.archives-ouvertes.fr/tel-00253794.
Teisseire, Maguelonne. "Autour et alentours des motifs séquentiels". Habilitation à diriger des recherches, Université Montpellier II - Sciences et Techniques du Languedoc, 2007. http://tel.archives-ouvertes.fr/tel-00203628.
When the data being handled become more complex, we show that patterns also prove to be a suitable representation. We describe some of our proposals for two types of complex data: (1) for text documents, we propose a supervised classification approach, SPAC; and (2) for multidimensional data, we present two new techniques that take into account different analysis dimensions (M2SP) and the hierarchy available on the dimensions (HYPE).
Al-Najdi, Atheer. "Une approche basée sur les motifs fermés pour résoudre le problème de clustering par consensus". Thesis, Université Côte d'Azur (ComUE), 2016. http://www.theses.fr/2016AZUR4111/document.
Clustering is the process of partitioning a dataset into groups so that instances in the same group are more similar to each other than to instances in any other group. Many clustering algorithms have been proposed, but none has been shown to provide a good-quality partition in all situations. Consensus clustering aims to improve the clustering process by combining different partitions obtained from different algorithms to yield a better-quality consensus solution. In this work, a new consensus clustering method, called MultiCons, is proposed. It uses frequent closed itemset mining to discover the similarities between the different base clustering solutions. The identified similarities are presented in the form of clustering patterns, each of which defines the agreement between a set of base clusters in grouping a set of instances. By dividing these patterns into groups based on the number of base clusters that define each pattern, MultiCons generates a consensus solution from each group, yielding multiple consensus candidates. These different solutions are presented in a tree-like structure, called ConsTree, that helps in understanding the process of building the multiple consensuses, as well as the relationships between the data instances and their structure in the data space. Five consensus functions are proposed in this work to build a consensus solution from the clustering patterns. Approach 1 simply merges any intersecting clustering patterns. Approach 2 can either merge or split intersecting patterns based on a proposed measure called the intersection ratio.
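MultiCons relies on frequent closed itemset mining over the base clusterings. A minimal sketch of that first step, assuming each instance is encoded as a transaction of (clustering index, cluster label) items and closed itemsets are obtained by brute-force closure under intersection. The encoding, the name `closed_clustering_patterns` and the `min_support` parameter are hypothetical; a real implementation would use a dedicated closed-itemset miner.

```python
def closed_clustering_patterns(clusterings, min_support=1):
    """Map each closed itemset of base clusters to the instances it covers.

    clusterings: list of base clusterings, each a list of cluster labels
                 (one label per instance, same instance order everywhere).
    """
    n_instances = len(clusterings[0])
    # One transaction per instance: the set of base clusters it belongs to.
    transactions = [
        frozenset((c, labels[i]) for c, labels in enumerate(clusterings))
        for i in range(n_instances)
    ]
    # Closed itemsets are the non-empty intersections of transaction subsets;
    # build them incrementally by closing the family under intersection.
    closed = set()
    for t in transactions:
        new = {t}
        for itemset in closed:
            common = itemset & t
            if common:
                new.add(common)
        closed |= new
    patterns = {}
    for itemset in closed:
        # Tidset: instances whose transaction contains every item of the pattern.
        tids = frozenset(i for i, t in enumerate(transactions) if itemset <= t)
        if len(tids) >= min_support:
            patterns[itemset] = tids
    return patterns
```

The length of each itemset is the number of base clusters that agree on the covered instances, which is the quantity the abstract says MultiCons uses to group patterns before building one consensus per group.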
Laurent, Anne. "Fouille de données complexes et logique floue : extraction de motifs à partir de bases de données multidimensionnelles". Habilitation à diriger des recherches, Université Montpellier II - Sciences et Techniques du Languedoc, 2009. http://tel.archives-ouvertes.fr/tel-00413140.
Fabregue, Mickael. "Extraction d'informations synthétiques à partir de données séquentielles : application à l'évaluation de la qualité des rivières". Thesis, Strasbourg, 2014. http://www.theses.fr/2014STRAD016/document.
Exploring temporal databases with suitable data mining methods has been the subject of several studies. However, it often leads to an excessive volume of extracted information, which is difficult for the user to analyze. We addressed this issue and specifically focused on methods that synthesize and filter extracted information, with the objective of providing results that humans can interpret. To this end, we relied on the notion of partially ordered sequence and proposed (1) an algorithm that extracts the set of closed partially ordered patterns; (2) a post-processing step that filters patterns of interest to the user; and (3) an approach that extracts a partially ordered consensus as an alternative to pattern extraction. The proposed methods were validated on hydrobiological data from the Fresqueau ANR project. In addition, they have been implemented in a visualization tool designed for hydrobiologists to analyze watercourse quality.
Egho, Elias. "Extraction de motifs séquentiels dans des données séquentielles multidimensionnelles et hétérogènes : une application à l'analyse de trajectoires de patients". Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0066/document.
All domains of science and technology produce large and heterogeneous data. Although a lot of work has been done in this area, mining such data is still a challenge. No previous research work has targeted the mining of heterogeneous multidimensional sequential data. This thesis contributes to knowledge discovery in heterogeneous sequential data. We study three research directions: (i) extraction of sequential patterns, (ii) classification, and (iii) clustering of sequential data. First, we generalize the notion of a multidimensional sequence by considering complex and heterogeneous sequential structures. We present a new approach called MMISP to extract sequential patterns from heterogeneous sequential data. As is usually the case for pattern enumeration algorithms, MMISP generates a large number of sequential patterns. To overcome this problem, we propose a novel way of considering heterogeneous multidimensional sequences by mapping them into pattern structures. We develop a framework for enumerating only the patterns that satisfy given constraints. The second research direction concerns the classification of heterogeneous multidimensional sequences. We use Formal Concept Analysis (FCA) as a classification method. We show interesting properties of concept lattices and of the stability index for classifying sequences into a concept lattice and selecting interesting groups of sequences. The third research direction concerns the clustering of heterogeneous multidimensional sequential data. We rely on the notion of common subsequences to define the similarity between a pair of sequences composed of a list of itemsets. We use this similarity measure to build a similarity matrix between sequences and to separate them into different groups.
In this work, we present theoretical results and an efficient dynamic programming algorithm to count the number of common subsequences between two sequences without enumerating all subsequences. The system resulting from this research was applied to analyze and mine patient healthcare trajectories in oncology. The data come from a medico-administrative database containing all information about patient hospitalizations in the Lorraine region (France). The system makes it possible to identify and characterize episodes of care for specific sets of patients. The results were discussed and validated with domain experts.
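The abstract mentions a dynamic programming algorithm that counts common subsequences of two itemset sequences without enumerating them. A minimal sketch, assuming that occurrences are counted by index alignment (the same subsequence embedded at different positions counts more than once) and that two aligned itemsets can contribute any non-empty subset of their intersection; this is an illustrative recurrence, not necessarily the thesis's exact formulation, and `count_common_subsequences` is a hypothetical name.

```python
def count_common_subsequences(a, b):
    """Count common subsequences of two sequences of itemsets by DP.

    Each element of a and b is a set of items. The count includes the empty
    subsequence and is taken over index-aligned embeddings.
    """
    n, m = len(a), len(b)
    # M[i][j] = number of common subsequences of a[:i] and b[:j], empty included.
    M = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Inclusion-exclusion over subsequences that avoid a[i-1] or b[j-1].
            M[i][j] = M[i - 1][j] + M[i][j - 1] - M[i - 1][j - 1]
            shared = len(a[i - 1] & b[j - 1])
            if shared:
                # Align a[i-1] with b[j-1]: any non-empty subset of their
                # intersection can extend a common subsequence of the prefixes.
                M[i][j] += M[i - 1][j - 1] * (2 ** shared - 1)
    return M[n][m]
```

For `[{1, 2}]` and `[{1, 2}]` the count is 4: the empty subsequence, `<{1}>`, `<{2}>` and `<{1, 2}>`. The table has (|a|+1)(|b|+1) cells, each filled with one set intersection, instead of an exponential enumeration.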
Termier, Alexandre. "Pattern mining rock: more, faster, better". Habilitation à diriger des recherches, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-01006195.
Di Jorio, Lisa. "Recherche de motifs graduels et application aux données médicales". Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2010. http://tel.archives-ouvertes.fr/tel-00577212.
Salah, Saber. "Parallel itemset mining in massively distributed environments". Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT297/document.
The volume of data keeps growing, to the point that we now speak of "Big Data". The main reason lies in the progress of computing tools, which have offered great flexibility to produce, but also to store, ever larger quantities. […] to pattern extraction: frequent patterns and informative patterns (i.e., patterns with high entropy).
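The abstract refers to informative patterns, i.e., itemsets of high entropy. A minimal sketch, assuming the joint entropy of the presence/absence of the itemset's items over all transactions, computed on a single machine; the massively distributed aspect of the thesis is not shown, and `itemset_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def itemset_entropy(transactions, itemset):
    """Joint entropy (in bits) of the presence/absence of the items of itemset."""
    items = sorted(itemset)
    # Project each transaction onto the itemset: one presence bit per item.
    projections = Counter(tuple(item in t for item in items) for t in transactions)
    total = len(transactions)
    # Only observed combinations appear in the Counter, so no log(0) terms.
    return -sum((c / total) * math.log2(c / total) for c in projections.values())
```

An itemset of k items reaches the maximum of k bits when all 2^k presence/absence combinations are equally frequent; for example, `{"a", "b"}` over the transactions `{"a"}`, `{"b"}`, `{"a", "b"}` and the empty set has entropy 2.0.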
Joliveau, Marc. "Réduction de séries chronologiques de trafic routier urbain issues d'un réseau de capteurs géoréférencés et extraction de motifs spatio-temporels". Châtenay-Malabry, Ecole centrale de Paris, 2008. http://www.theses.fr/2008ECAP1087.
Gay, Dominique. "Calcul de motifs sous contraintes pour la classification supervisée". Phd thesis, Université de Nouvelle Calédonie, 2009. http://tel.archives-ouvertes.fr/tel-00516706.
Gay, Dominique. "Calcul de motifs sous contraintes pour la classification supervisée". Phd thesis, Nouvelle Calédonie, 2009. http://portail-documentaire.univ-nc.nc/files/public/bu/theses_unc/TheseDominiqueGay2009.pdf.
Maletzke, André Gustavo. "Uma metodologia para extração de conhecimento em séries temporais por meio da identificação de motifs e da extração de características". Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04062009-201445/.
Data mining has been applied to several areas with the objective of extracting interesting and relevant knowledge from large databases. In this scenario, machine learning provides some of the main methods employed in data mining. Symbolic learning methods are among the most widely used machine learning methods because they can provide models that domain experts can interpret. However, traditional machine learning methods, such as decision trees and decision rules, do not take into account the temporal information present in the data. This work proposes a methodology to extract knowledge from time series data using feature extraction and motif identification. Features and motifs are used as attributes for knowledge extraction performed by machine learning methods. The methodology was evaluated on several well-known data sets. In addition, we compared it to the approach that feeds machine learning algorithms with raw time series data. Results show statistically significant differences for most of the data sets employed in the study. Finally, a preliminary study is presented with environmental monitoring data from the Itaipu reservoir, made available by Itaipu Binacional. This study is restricted to the application of motif identification. We used water temperature time series collected from several regions of the reservoir. In this study, a pattern in motif distribution was observed for each region of the reservoir, in agreement with some well-known results in the literature.
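The abstract does not say which motif-discovery algorithm is used. A minimal sketch of the textbook definition, assuming a motif is the pair of non-overlapping length-`m` subsequences with the smallest z-normalized Euclidean distance, found by brute force in O(n²m); `find_motif` and `znorm` are hypothetical names, and practical systems use faster exact or approximate algorithms.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def find_motif(series, m):
    """Return (i, j, distance) for the closest pair of non-overlapping
    length-m subsequences under z-normalized Euclidean distance."""
    windows = [znorm(series[k:k + m]) for k in range(len(series) - m + 1)]
    best = (None, None, math.inf)
    for i in range(len(windows)):
        # j starts at i + m so the two occurrences cannot overlap (trivial matches).
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[2]:
                best = (i, j, d)
    return best
```

On the series `[0, 0, 1, 3, 1, 0, 0, 2, 0, 1, 3, 1, 0, 0]` with `m = 5`, the function returns positions 1 and 8, where the shape `0, 1, 3, 1, 0` repeats exactly. Z-normalization makes the comparison insensitive to offset and amplitude, so only the shape of the subsequences matters.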
Mancheron, Alban. "Extraction de Motifs Communs dans un Ensemble de Séquences. Application à l'identification de sites de liaison aux protéines dans les séquences primaires d'ADN". Phd thesis, Université de Nantes, 2006. http://tel.archives-ouvertes.fr/tel-00257587.
The difficulties posed by this problem are the lack of information about the patterns to be extracted and the large volume of data to be processed. Two polynomial algorithms, one deterministic and the other probabilistic, were designed to address it. In this context, we introduced a new family of score functions and studied their statistical properties. We also characterized the language recognized by the index structure called "Oracle" and proposed an improvement that makes it more efficient.
Ozturk, Ozgur. "Feature extraction and similarity-based analysis for proteome and genome databases". The Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805.
Mancheron, Alban. "Extraction de motifs communs dans un ensemble de séquences : application à l'identification de sites de liaison aux protéines dans les séquences primaires d'ADN". Nantes, 2006. http://archive.bu.univ-nantes.fr/pollux/show.action?id=ec42cb78-8fc6-4c4d-a3a3-42735a44dafb.
The extraction of significant biological patterns, and in particular the identification of regulation sites of protein synthesis in DNA primary sequences, is one of the major issues in bioinformatics today. Indeed, any anomaly in the regulation of protein synthesis can be seriously detrimental to certain organisms. Extracting these sites makes it possible to better understand cellular function, or even to eliminate or cure a pathology. The difficulty lies in the lack of information on the patterns to be extracted, as well as the large volume of data to mine. In this dissertation, we introduce two polynomial algorithms, one deterministic and the other probabilistic, to address the issue of pattern extraction. We introduce a new family of score functions and study their statistical properties. We characterize the language recognized by the index structure named “Oracle”, and we modify this structure to make it more efficient.
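The abstract mentions the language recognized by the "Oracle" index structure. A minimal sketch, assuming this refers to the factor oracle of Allauzen, Crochemore and Raffinot, built with its classic online construction; `build_factor_oracle` and `oracle_accepts` are hypothetical names, and the improved structure proposed in the thesis is not shown.

```python
def build_factor_oracle(word):
    """Build the factor oracle of word with the online construction.

    Returns the transitions (one dict per state, states 0..len(word))
    and the supply function (supply[0] = -1 by convention).
    """
    transitions = [{} for _ in range(len(word) + 1)]
    supply = [-1] * (len(word) + 1)
    for i, letter in enumerate(word):
        new_state = i + 1
        transitions[i][letter] = new_state
        k = supply[i]
        # Follow supply links, adding external transitions to new_state.
        while k > -1 and letter not in transitions[k]:
            transitions[k][letter] = new_state
            k = supply[k]
        supply[new_state] = 0 if k == -1 else transitions[k][letter]
    return transitions, supply


def oracle_accepts(transitions, pattern):
    """All states are terminal: accept iff pattern spells a path from state 0."""
    state = 0
    for letter in pattern:
        if letter not in transitions[state]:
            return False
        state = transitions[state][letter]
    return True
```

Built on `"abbbaab"`, the oracle accepts every factor of the word, but it also accepts `"aba"`, which is not a factor. This over-acceptance is why characterizing the language the oracle recognizes is a non-trivial question.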