Dissertations on the topic "Exploration des motifs"
Browse the top 50 dissertations for research on the topic "Exploration des motifs".
Ndiaye, Marie. "Exploration de grands ensembles de motifs." Thesis, Tours, 2010. http://www.theses.fr/2010TOUR4029/document.
The abundance of patterns generated by knowledge extraction algorithms is a major problem in data mining. To facilitate the exploration of these patterns, two approaches are often used: the first is to summarize the sets of extracted patterns and the second relies on the construction of visual representations of the patterns. However, the summaries are not structured and they are proposed without an exploration method. Furthermore, visualizations do not provide an overview of the pattern sets. We define a generic framework that combines the advantages of both approaches. It allows building summaries of pattern sets at different levels of detail. These summaries provide an overview of the pattern sets and they are structured in the form of cubes on which OLAP navigational operators can be applied in order to explore the pattern sets. Moreover, we propose an algorithm which provides a summary of good quality whose size is below a given threshold. Finally, we instantiate our framework with association rules.
Bahauddin, Azizi Bin. "Contemporary Malaysian art : an exploration of the Songket motifs." Thesis, Sheffield Hallam University, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287587.
Méger, Nicolas Boulicaut Jean-François Rigotti Christophe. "Recherche automatique des fenêtres temporelles optimales des motifs séquentiels." Villeurbanne : Doc'INSA, 2005. http://docinsa.insa-lyon.fr/these/pont.php?id=meger.
Méger, Nicolas. "Recherche automatique des fenêtres temporelles optimales des motifs séquentiels." Lyon, INSA, 2004. http://theses.insa-lyon.fr/publication/2004ISAL0095/these.pdf.
This work addresses the problem of mining patterns under constraints in event sequences. Extracted patterns are episode rules. Our main contribution is an automatic search for the optimal time window of each episode rule. We propose to extract only rules having such an optimal time window. These rules are termed FLM-rules. We present an algorithm, WinMiner, that aims to extract FLM-rules, given a minimum support threshold, a minimum confidence threshold and a maximum gap constraint. Proofs of the correctness of this algorithm are supplied. We also propose a dedicated interest measure that aims to select FLM-rules such that their heads and bodies can be considered as dependent. Two applications are described. The first one is about mining medical datasets while the other one deals with seismic datasets.
Fauré, Clément. "Découvertes de motifs pertinents par l'implémentation d'un réseau bayésien : application à l'industrie aéronautique." Lyon, INSA, 2007. http://theses.insa-lyon.fr/publication/2007ISAL0077/these.pdf.
The study of an operational process often runs up against the analysis of heterogeneous and large data. While the environment associated with this process evolves constantly, one inevitably notices the appearance of differences between what was expected and what is really observed. By using the collected data and available expertise, it is then necessary to detect these differences, and thus to update the model being used. Accordingly, we propose a knowledge discovery process that integrates the definition and the exploitation of a Bayesian network to facilitate the analysis of a concise set of association rules. The evolution of this model is controlled by the discovery of relevant rules, themselves made more accessible by the exploitation of the properties of this model. Finally, we show a practical application of our proposals to the field of operational interruptions in the aircraft industry.
Fauré, Clément Boulicaut Jean-François Mille Alain. "Découvertes de motifs pertinents par l'implémentation d'un réseau bayésien application à l'industrie aéronautique /." Villeurbanne : Doc'INSA, 2008. http://docinsa.insa-lyon.fr/these/pont.php?id=faure.
Salleb, Ansaf. "Recherche de motifs fréquents pour l'extraction de règles d'association et de caractérisation." Orléans, 2003. http://www.theses.fr/2003ORLE2064.
Leleu, Marion. "Extraction de motifs séquentiels sous contraintes dans des données contenant des répétitions consécutives." Lyon, INSA, 2004. http://theses.insa-lyon.fr/publication/2004ISAL0001/these.pdf.
This PhD thesis concerns a particular field of data mining: the extraction of sequential patterns from event sequence databases (e.g., customer transaction sequences, web logs, DNA). Among existing algorithms, those based on an in-memory representation of the pattern locations (called occurrence lists) suffer a loss of efficiency when the sequences contain consecutive repetitions. This thesis proposes efficient solutions for sequential pattern extraction in such a context (constraints and repetitions), based on a condensation of the information contained in the occurrence lists, without loss for the extraction process. This new representation leads to new sequential pattern extraction algorithms (GoSpade and GoSpec) that are particularly well adapted to the presence of consecutive repetitions in the datasets. These algorithms have been proved to be sound and complete, and experiments on both real and synthetic datasets showed that the gains in terms of memory space and execution time are significant and that they increase with the number of consecutive repetitions contained in the datasets. Finally, a financial application has been carried out in order to build a condensed representation of market trends by means of frequent sequential patterns.
Leleu, Marion Boulicaut Jean-François. "Extraction de motifs séquentiels sous contraintes dans des données contenant des répétitions consécutives." Villeurbanne : Doc'INSA, 2005. http://docinsa.insa-lyon.fr/these/pont.php?id=leleu.
Khiari, Mehdi. "Découverte de motifs n-aires utilisant la programmation par contraintes." Caen, 2012. http://www.theses.fr/2012CAEN2015.
Until recently, data mining and constraint programming have been developed separately from each other. This thesis is one of the first to address the relationships between these two areas of computer science, in particular using constraint programming techniques for constraint-based mining. The data mining community has proposed generic approaches to discover local patterns under constraints, and this issue is rather well mastered. However, these approaches do not take into consideration that the interest of a pattern often depends on the other patterns. Such a pattern is called an n-ary pattern or pattern set. Few works on mining n-ary patterns have been conducted and the proposed approaches are ad hoc. This thesis proposes a unified framework for modeling and solving n-ary constraints in data mining. First, the n-ary pattern extraction problem is modeled as a Constraint Satisfaction Problem (CSP). Then, a high-level declarative language for mining n-ary patterns is proposed. This language allows expressing a wide range of n-ary constraints. Several solving methods are developed and compared. The main advantages of this framework are its declarative and generic sides. To the best of our knowledge, it is the first generic and flexible framework for modeling and mining n-ary patterns.
Haas, Ghislaine. "Exploration sémiotique de l'écriture mériméenne." Besançon, 1988. http://www.theses.fr/1988BESA1005.
Daurel, Thomas. "Représentations condensées d'ensembles de règles d'association." Lyon, INSA, 2003. http://www.theses.fr/2003ISAL0059.
Recently, the increasingly intensive use of information systems has led to growth in the number and size of the databases involved. Their owners have become increasingly aware of the potential value of those databases. They started trying to turn these databases to advantage, not restricting themselves to classical querying processes but attempting to extract information with high added value that could improve the users' knowledge. This issue led to the creation of a new discipline: frequent pattern extraction. Since 1994, many increasingly efficient algorithms have been developed to address this kind of extraction. In most cases, it is now possible to extract exhaustively certain types of frequent patterns contained in a database. The major drawback encountered is the following: the discovered patterns are often too numerous. It is therefore difficult to rank them by interest in order to derive interesting information. In this context, it appeared particularly worthwhile to find more condensed representations of the extracted patterns in order to make the results easier to read. More precisely, we have worked on the patterns called association rules, and we have proposed two global representations of sets of association rules. We have designed and implemented two algorithms for computing each of these representations, and we have shown their efficiency and effectiveness in practice. Finally, we have conducted tests on real-life datasets.
Holat, Pierre. "Fouille de motifs et modélisation statistique pour l'extraction de connaissances textuelles." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD045.
In natural language processing, two main approaches are used: machine learning and data mining. In this context, cross-referencing data mining methods based on patterns and statistical machine learning methods is a promising but hardly explored avenue. In this thesis, we present three major contributions: the introduction of delta-free patterns, used as statistical model features; the introduction of a semantic similarity constraint for the mining, calculated using a statistical model; and the introduction of sequential labeling rules, created from the patterns and selected by a statistical model.
Baker, Nicholas Jackson. "A quantitative exploration of the meso-scale structure of ecological networks." Thesis, University of Canterbury. Biological Sciences, 2015. http://hdl.handle.net/10092/10667.
Li, Dong Haoyuan. "Extraction de séquences inattendues : des motifs séquentiels aux règles d’implication." Montpellier 2, 2009. http://www.theses.fr/2009MON20253.
Sequential patterns can be viewed as an extension of the notion of association rules that integrates temporal constraints. They are effective for representing behaviors based on statistical frequency between the elements contained in sequence data; that is, the discovered patterns are interesting because they are frequent. However, when prior domain knowledge of the data is taken into account, another reason why discovered patterns are interesting is that they are unexpected. In this thesis, we investigate the problems in the discovery of unexpected sequences in large databases with respect to prior domain expertise. We first methodically develop the framework Muse, which integrates the approaches to discover the three forms of unexpected sequences. We then extend the framework Muse by adopting fuzzy set theory for describing sequence occurrence. We also propose a generalized framework, SoftMuse, with respect to the concept hierarchies on the taxonomy of data. We further propose the notions of unexpected sequential patterns and unexpected implication rules, in order to evaluate the discovered unexpected sequences by using a self-validation process. We finally propose the discovery and validation of unexpected sentences in free-format text documents. The usefulness and effectiveness of our proposed approaches are shown by experiments on synthetic data, real Web server access log data, and text document classification.
Fancett, Anna. "The exploration of familial myths and motifs in selected novels by Jane Austen and Walter Scott." Thesis, University of Aberdeen, 2014. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=225725.
Kafaru, Abiodun Babatunde. "An exploration of painting aesthetics, signs, symbols, motifs and patterns of coastal Yoruba land of Nigeria." Thesis, University of Northampton, 2014. http://nectar.northampton.ac.uk/8864/.
Voravuthikunchai, Winn. "Représentation des images au moyen de motifs fréquents et émergents pour la classification et la recherche d'images." Caen, 2013. http://www.theses.fr/2013CAEN2084.
In this thesis, our aim is to achieve better results in several tasks in computer vision by focusing on the image representation part. Our idea is to integrate feature dependencies into the original feature representation. Although feature dependencies can give additional useful information to discriminate images, it is a nontrivial task to select a subset of feature combinations from the power set of the features, which has an excessively large cardinality. We employ pattern mining techniques to efficiently produce a tractable set of effective combinations. Pattern mining is a process that can analyze large quantities of data and extract interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). The first problem encountered is how to encode image features, which are typically real-valued, as binary transaction items suitable for pattern mining algorithms. We propose some solutions based on local thresholding. The number of extracted patterns is still very high, and using them directly as new features for inferring supervised classification models leads to overfitting. We present a solution that aggregates the patterns into a compact representation which does not overfit the training data. We have achieved state-of-the-art results on several image classification benchmarks. Along the path of exploration, we realized that pattern mining algorithms are especially suitable for large-scale tasks, as they are very efficient and scale gracefully with the number of images. We have found two suitable applications. The first one is to detect groups of duplicates in very large datasets. In order to run our experiment, we created a database of one million images randomly downloaded from Google. We discovered the duplicate groups in less than three minutes. Another application that we found suitable for pattern mining techniques is image re-ranking. Our method improves the original ranking score by a large margin and compares favorably to existing approaches.
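The encoding step mentioned in this abstract (real-valued features turned into binary transaction items) can be illustrated with a minimal sketch. It is not the thesis's method: the thresholding rule used here (an item is present when the feature exceeds the vector's own mean) and the names `to_transaction` and `support` are assumptions made for illustration only.

```python
def to_transaction(features: list[float]) -> frozenset[int]:
    """Encode one real-valued feature vector as a set of item indices.

    Assumption (hypothetical rule): item i is present when features[i]
    is strictly above the mean of this vector.
    """
    if not features:
        return frozenset()
    threshold = sum(features) / len(features)
    return frozenset(i for i, v in enumerate(features) if v > threshold)


def support(itemset: frozenset[int], transactions: list[frozenset[int]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


# Three toy "image" feature vectors (made-up values).
histograms = [[0.1, 0.7, 0.2], [0.5, 0.4, 0.1], [0.0, 0.9, 0.6]]
transactions = [to_transaction(h) for h in histograms]
```

Once the vectors are sets of items, any frequent itemset miner can be run on `transactions`; `support` only shows how the frequency of one candidate combination would be measured.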
Khanjari, Miyaneh Eynollah. "Un cadre générique pour les modèles globaux fondés sur les motifs locaux." Thesis, Tours, 2009. http://www.theses.fr/2009TOUR4020/document.
The construction of global models is a significant field of Knowledge Discovery in Databases. In particular, global models based on local patterns such as association rules provide a succinct and understandable description of data. The numerous viewpoints, aims and domain-specific data require a wide range of global models and associated construction methods. This thesis proposes a generic framework for formalizing and manipulating global models based on local patterns. In this framework, many of the existing construction methods dedicated to classification, clustering and summarization are easily formulated in a declarative way. We provide a generic algorithm that makes it possible to leave aside technical aspects, for instance the kind of patterns used and the associated mining approach. Moreover, we also optimize this algorithm according to the specified parameters. Finally, our framework facilitates the comparison of existing construction methods by highlighting their main features.
Pinçonnat, Crystel. "New York dans le roman français : appropriation, exploration et manipulations d'un mythe moderne (1945-1992)." Paris 3, 1995. http://www.theses.fr/1996PA030010.
This dissertation studies New York as a literary myth in the French novel from 1945 to the present. To analyse the evolution of this myth, three notions have been defined: appropriation, integration and hybridization. Being a new and foreign element, this myth had to be assimilated into national literature by French novelists through different strategies. In the appropriation period, New York was represented in two major ways: the transposition of forms inherited from tradition (the character of the naive stranger, the Odyssean journey, ancient tragedy and the allegorical novel), and the imitation of American paraliterary sub-genres such as the detective novel and science fiction, in which New York appeared as a generic component. Whereas transposition makes New York the modern capital of dispossession, in paraliterature, on the contrary, it is the territory of the hero, his winning battlefield. Once these first strategies were gradually exhausted, integration allowed a new look at New York, a look from "the inside": that of the marginal, which transforms the urban universe into an insular world. New York becomes a modern form of the anti-ark, a hyperbole for interurban exile. Eventually, hybridization defines the main trend of the French novel with a New York setting: the New York repertory, an iconographic and literary stock enriched with the contributions of movies, comics and art, prevails over the urban referent. This repertory offers a rich source for manipulation. The use of New York in fiction definitively breaks with the exotic tradition and engages in poetic experiments. The metropolis becomes the mythical chaos of modernity, the fabulous horizon of the French novel, its "image palace".
Salle, Paola. "Les motifs séquentiels pour les données issues des puces ADN." Thesis, Montpellier 2, 2010. http://www.theses.fr/2010MON20239/document.
The emergence of biotechnologies such as DNA chips has made it possible to acquire huge amounts of data on a cell at a given moment and under certain conditions. They are used in order to understand a disease whose origin is a genomic abnormality disrupting the natural balance between growth, division and cell death. With this biotechnology, the aim is to identify the genes involved in the disease studied. But each chip gives information on more than 19,000 genes, so the results are difficult to use and to analyse. Data mining methods are used in order to find interesting correlations in large databases. Initially proposed to address questions about the behavior of supermarket customers, these methods are now used and adapted in various fields of application, ranging from marketing to health. In this study, we propose new methods to help biologists deduce new knowledge from data obtained by DNA microarray analysis. Specifically, we propose to identify genes frequently ordered by their expressions and we study the contribution of such information as new study material for biologists.
Albert-Lorincz, Hunor. "Contributions aux techniques de prise de décision et de valorisation financière." Lyon, INSA, 2007. http://theses.insa-lyon.fr/publication/2007ISAL0039/these.pdf.
This thesis investigates and develops tools for financial decision making. Our first contribution is aimed at the extraction of frequent sequential patterns from, for example, discretized financial time series. We introduce well-partitioned constraints that allow a hierarchical structuring of the search space for increased efficiency. In particular, we look at the conjunction of a minimal frequency constraint and a regular expression constraint. It becomes possible to build adaptive strategies that find a good balance between the pruning based on the anti-monotonic frequency and the pruning based on the regular expression constraint, which is generally neither monotonic nor anti-monotonic. Then, we develop two financial applications. At first, we use frequent patterns to characterise market configurations by means of signatures in order to improve some technical indicator functions for automated trading strategies. Then, we look at the pricing of Bermudan options, i.e., a financial derivative product which allows terminating an agreement between two parties at a set of pre-defined dates. This requires computing double conditional expectations at a high computational cost. Our new method, neighbourhood Monte Carlo, can be up to 20 times faster than the traditional methods.
Albert-Lorincz, Hunor Boulicaut Jean-François. "Contributions aux techniques de prise de décision et de valorisation financière." Villeurbanne : Doc'INSA, 2007. http://docinsa.insa-lyon.fr/these/pont.php?id=albert-lorincz.
Di, Jorio Lisa. "Recherche de motifs graduels et application aux données médicales." Thesis, Montpellier 2, 2010. http://www.theses.fr/2010MON20112.
With the rise of new biological technologies, such as DNA chips, and of IT technologies (e.g. storage capacities), the health care domain has evolved over the last years. Indeed, new high technologies allow for the analysis of thousands of genomic parameters related to various diseases (such as cancer or Alzheimer's), and for linking them to clinical parameters. In parallel, advances in storage nowadays enable researchers to gather a huge amount of data generated by biological experiments. This Ph.D. thesis is strongly related to medical data mining. We tackle the problem of extracting gradual patterns of the form « the older a patient, the less his memories are accurate ». To handle different types of information, we propose to extract gradualness for an extensive range of patterns: gradual itemsets, gradual multidimensional itemsets, gradual sequential patterns. Every contribution is evaluated on synthetic or real datasets.
Khiari, Mehdi. "Découverte de motifs n-aires utilisant la programmation par contraintes." Phd thesis, Université de Caen, 2012. http://tel.archives-ouvertes.fr/tel-01023102.
Soulet, Arnaud. "Un cadre générique de découverte de motifs sous contraintes fondées sur des primitives." Phd thesis, Université de Caen, 2006. http://tel.archives-ouvertes.fr/tel-00123185.
[…] knowledge discovery in databases. This thesis addresses the extraction of local patterns under constraints. We shed new light on the problem with a framework that combines monotone primitives to define arbitrary constraints. The variety of these constraints expresses precisely the archetype of the patterns the user is looking for in a database. We then propose two kinds of automatic and generic extraction approaches, despite the algorithmic difficulties inherent in this task. Their efficiency rests mainly on the use of necessary conditions to approximate the variations of the constraint. On the one hand, relaxation methods make it possible to reuse the many standard algorithms of the field. On the other hand, we devise direct extraction methods dedicated to set patterns for wide or correlated data by exploiting equivalence classes. Finally, our methods have enabled the discovery of local phenomena in industrial and medical applications.
Vigneron, Vincent. "Programmation par contraintes et découverte de motifs sur données séquentielles." Thesis, Angers, 2017. http://www.theses.fr/2017ANGE0028/document.
Recent works have shown the relevance of constraint programming to tackle data mining tasks. This thesis follows this approach and addresses motif discovery in sequential data. We focus in particular, in the case of classified sequences, on the search for motifs that best fit each individual class. We propose a language of constraints over matrix domains to model such problems. The language assumes a preprocessing of the data set (e.g., by pre-computing the locations of each character in each sequence) and views a motif as the choice of a sub-matrix (i.e., characters, sequences, and locations). We introduce different matrix constraints (compatibility of locations with the database, class covering, location-based character ordering common to sequences, etc.) and address two NP-complete problems: the search for class-specific totally ordered motifs (e.g., exclusive subsequences) or partially ordered motifs. We provide two CSP models that rely on global constraints to prove exclusivity. We then present a memetic algorithm that uses this CSP model during initialisation and intensification. This hybrid approach proves competitive compared to the pure CSP approach, as shown by experiments carried out on protein sequences. Lastly, we investigate data set preprocessing based on patterns rather than characters, in order to reduce the size of the resulting matrix domain. To this end, we present and compare two alternative methods, one based on lattice search, the other on dynamic programming.
Joliveau, Marc. "Réduction de séries chronologiques de trafic routier urbain issues d'un réseau de capteurs géoréférencés et extraction de motifs spatio-temporels." Châtenay-Malabry, Ecole centrale de Paris, 2008. http://www.theses.fr/2008ECAP1087.
Hébert, Céline. "Extraction et usages de motifs minimaux en fouille de données, contribution au domaine des hypergraphes." Phd thesis, Université de Caen, 2007. http://tel.archives-ouvertes.fr/tel-00253794.
Abboud, Yacine. "Fouille de motifs : entre accessibilité et robustesse." Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0176/document.
Information now occupies a central place in our daily lives; it is both ubiquitous and easy to access. Yet extracting information from data is often an inaccessible process. Indeed, even though data mining methods are now accessible to all, the results of such mining are often complex for the user to obtain and exploit. Pattern mining combined with the use of constraints is a very promising direction in the literature, both to improve the efficiency of the mining and to make its results more apprehensible to the user. However, the combination of constraints desired by the user is often problematic because it does not always fit the characteristics of the data being searched, such as noise. In this thesis, we propose two new constraints and an algorithm to overcome this issue. The robustness constraint allows mining noisy data while preserving the added value of the contiguity constraint. The extended closedness constraint improves the apprehensibility of the set of extracted patterns while being more noise-resistant than the conventional closedness constraint. The C3Ro algorithm is a generic sequential pattern mining algorithm that integrates many constraints, including the two new constraints that we have introduced, to provide the user with the most efficient mining possible while reducing the size of the set of extracted patterns. C3Ro competes with the best pattern mining algorithms in the literature in terms of execution time while consuming significantly less memory. C3Ro has been applied to extracting competencies from web-based job postings.
Makhalova, Tatiana. "Contributions à la fouille d'ensembles de motifs : des données complexes à des ensembles de motifs signifiants et réutilisables." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0124.
In this thesis, we study different aspects of pattern mining in binary and numerical tabular datasets. The objective of pattern mining is to discover a small set of non-redundant patterns that may cover entirely a given dataset and be interpreted as useful and significant knowledge units. We focus on some key issues such as (i) the formal definition of pattern interestingness, (ii) the minimization of pattern explosion, (iii) measures for evaluating the performance of pattern mining, and (iv) the discrepancy between interestingness and quality of the discovered pattern sets. Moreover, we go beyond the typical perspectives of pattern mining and investigate the intrinsic structure underlying a tabular dataset. The main contributions of this research work are theoretical, conceptual, and practical. Regarding the theoretical novelty, we propose a so-called closure structure and the GDPM algorithm for computing it. The closure structure allows us to estimate both the data and pattern complexity. Furthermore, in practice the closure structure may be used to represent the data topology w.r.t. an interestingness measure. Conceptually, the closure structure allows an analyst to understand the intrinsic data configuration before selecting any interestingness measure, rather than to understand the data by means of an arbitrarily selected interestingness measure. In this research work, we also discuss the difference between interestingness and quality of pattern sets. We propose to adopt the best practices of supervised learning in pattern mining. Based on that, we developed an algorithm for itemset mining, called KeepItSimple, which relates interestingness and the quality of pattern sets. In practice, KeepItSimple allows us to efficiently mine a set of interesting and good-quality patterns without any pattern explosion. In addition, we propose an algorithm for a greedy enumeration of likely-occurring itemsets that can be used when frequent closed itemset miners return too many itemsets. The last practical contribution consists in developing an MDL-based algorithm called Mint for mining pattern sets in numerical data. The Mint algorithm relies on a strong theoretical foundation and at the same time has a practical objective in returning a small set of numerical, non-redundant, and informative patterns. The experiments show that Mint has very good behavior in practice and usually outperforms its competitors.
Vroland, Christophe. "Algorithmique pour la recherche de motifs approchée et application à la recherche de cibles de microARN." Thesis, Lille 1, 2016. http://www.theses.fr/2016LIL10110/document.
Approximate string matching consists in identifying the occurrences of a motif within a text, modulo a given distance. This problem has many applications in bioinformatics for the analysis of biological sequences. For instance, microRNAs are short RNA molecules regulating the expression of genes by specific recognition of their sequence motif on the target gene. Understanding the mode of action of microRNAs requires the ability to identify short motifs, around 21 nucleotides in size, comprising up to 3-4 errors in a text whose size is in the order of 10^8 to 10^9, representing a genome. In this thesis, I have proposed an efficient algorithm for the approximate search of short motifs. This algorithm is based on a new type of seeds containing errors, the 01*0 seeds, and uses a compressed index structure, the FM-index. I have implemented this algorithm in a freely available software, Bwolo. I demonstrate experimentally the advantage of this approach and compare it to the state of the art of existing tools. I also show how Bwolo can be used and have set up an original study on the distribution of potential miRNA target sites in two plant genomes, Arabidopsis thaliana and Arabidopsis lyrata.
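To make the problem statement in this abstract concrete, here is a minimal brute-force sketch of approximate matching. It is not the Bwolo algorithm (which, per the abstract, relies on 01*0 seeds and an FM-index); it assumes the distance is the Hamming distance (substitutions only), and the function name `hamming_matches` is hypothetical.

```python
def hamming_matches(text: str, motif: str, max_errors: int) -> list[int]:
    """Return the start positions where motif occurs in text with at most
    max_errors substitutions (Hamming distance; an assumption of this sketch)."""
    m = len(motif)
    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], motif):
            if a != b:
                mismatches += 1
                if mismatches > max_errors:
                    break  # too many errors at this position, stop early
        else:
            # The inner loop finished without exceeding the error budget.
            hits.append(start)
    return hits


exact = hamming_matches("ACGTACGA", "ACGT", 0)
one_error = hamming_matches("ACGTACGA", "ACGT", 1)
```

This scan costs time proportional to the text length times the motif length, which is exactly what becomes impractical at genome scale and what indexed, seed-based methods are designed to avoid.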
Patoyt, Claire. "La poésie d'Emily Dickinson (1830-1886) à la lumière des traductions : une étude des rapports entre énonciation métaphorique et exploration métaphysique." Paris 7, 2013. http://www.theses.fr/2013PA070056.
This dissertation aims at revealing the intimate links between metaphorical enunciation and metaphysical exploration in the poetry of Emily Dickinson (1830-1886). Based on close readings and comparative analyses of several poems alongside their French translations, it highlights the inextricable nature of thought and movement. It also examines the essential role such movement plays in the poet's relationship to metaphysical concepts and to the spiritual realm. What makes the « living metaphor » an enunciative dynamics fit for conveying the conjunctive aspirations or critical impulses of the poet's thought? What is at stake in the metaphysical exploration it carries out? To what extent can it be said that a « metaphysics of movement » shapes and animates Dickinson's poetics? This study shows that Emily Dickinson's metaphorical writing moves from a conception of metaphor inherited from mystical poetry and theology, in which metaphors are seen as bridges between the earthly and the celestial, to an approach in which metaphor serves as a force for redescribing reality and bringing about new referential relationships that give priority to creative imagination and the vitality of poetic language. It also points out the main formal legacy of Dickinson's metaphors. In parallel, it offers an original approach to the poems by engaging in a dynamic dialogue with the translations, which here act as a « hermeneutic counterpoint » to the originals, a means of exploring in greater depth the layers of meaning contained in Dickinson's metaphorical utterances. A new analytical light is thus shed on the philosophical depths of the poet's thought.
Gosselin, Stéphane. "Recherche de motifs fréquents dans une base de cartes combinatoires." Phd thesis, Université Claude Bernard - Lyon I, 2011. http://tel.archives-ouvertes.fr/tel-00838571.
Salah, Saber. "Parallel itemset mining in massively distributed environments." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT297/document.
The volume of data keeps growing, to the point that we now speak of "Big Data". The main reason lies in the progress of computing tools, which have offered great flexibility to produce, but also to store, ever larger quantities. [...] to pattern extraction: frequent patterns and informative patterns (i.e., patterns with high entropy).
Bu, Daher Julie. "Sequential Pattern Generalization for Mining Multi-source Data." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0204.
Huge amounts of digital data have been created over the years due to the increasing digitization of everyday life. As a consequence, fast data collection and storage tools have been developed, and data can be collected in huge volumes for various research and business purposes. The collected data can come from multiple data sources and can be of heterogeneous kinds, thus forming heterogeneous multi-source datasets, and they can be analyzed to extract valuable information. Data mining is an important task in discovering interesting information from datasets. Different approaches in this domain have been proposed, among which pattern mining is the most important one. Pattern mining, including sequential pattern mining, discovers statistically relevant patterns (or sequential patterns) among data. The challenges of this domain include discovering important patterns with limited complexity while avoiding redundancy among the resulting patterns. Multi-source data can combine descriptive and sequential data, making the mining process complex. There can be problems of data similarity at the level of a single source, which leads to a limited number of extracted patterns. The aim of the thesis is to mine multi-source data to obtain valuable information and to compensate for the loss of patterns due to the problem of similarity, with limited complexity and while avoiding pattern redundancy. Many approaches have been proposed to mine multi-source data. These approaches either integrate multi-source data and perform a single mining process, which increases the complexity and generates a redundant set of sequential patterns, or they mine sources separately and integrate the results, which could lead to a loss of patterns. We propose G_SPM, a general sequential pattern mining algorithm that takes advantage of multi-source data to mine general patterns, which compensates for the loss of patterns caused by the problem of data similarity.
These rich patterns contain various kinds of information and have higher data coverage than traditional patterns. G_SPM adopts a selective mining strategy of data sources where a main source is first mined, and other sources are mined only when similarity among patterns is detected, which limits the complexity and avoids pattern redundancy. The experimental results confirm that G_SPM succeeds in mining general patterns with a limited complexity. In addition, it outperforms traditional approaches in terms of runtime and pattern redundancy
Simard, Mélissa. "Théâtre, culture et société haïtienne : une exploration interartistique et interculturelle de "La mort de soi dans sa longue robe de Mariée" de Guy Régis Jr." Thesis, Université Laval, 2012. http://www.theses.ulaval.ca/2012/29411/29411.pdf.
Kane, Mouhamadou bamba. "Extraction et sélection de motifs émergents minimaux : application à la chémoinformatique." Thesis, Normandie, 2017. http://www.theses.fr/2017NORMC223/document.
Pattern discovery is an important field of Knowledge Discovery in Databases. This work deals with the extraction of minimal emerging patterns. We propose a new efficient method that extracts the minimal emerging patterns with or without a support constraint, unlike existing methods that typically extract the most supported minimal emerging patterns, at the risk of missing interesting but less supported patterns. Moreover, our method takes into account the absence of attributes, which brings new and interesting knowledge. Considering the rules associated with highly supported emerging patterns as prototype rules, we have experimentally shown that this set of rules has good confidence on the covered objects but unfortunately does not cover a significant part of the objects, which is a disadvantage for their use in classification. We propose a prototype-based selection method that improves the coverage of the set of prototype rules without a significant loss in their confidence. We apply our prototype-based selection method to a chemical dataset relating to the aquatic environment: Aquatox. In a classification context, it allows chemists to better explain the classification of molecules which, without this selection method, would be predicted by the use of a default rule.
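For readers unfamiliar with emerging patterns, the sketch below illustrates the usual definition: an itemset is emerging for a target class when its growth rate (support in the target class divided by support in the other class) reaches a chosen threshold. This is only the definition, not the extraction or prototype-based selection method of the thesis; the function names and the toy molecule descriptors are hypothetical.

```python
import math


def support(itemset: frozenset, transactions: list[set]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def growth_rate(itemset: frozenset, target: list[set], background: list[set]) -> float:
    """Support in the target class divided by support in the background class.

    Returns infinity when the itemset occurs in the target class only
    (a so-called jumping emerging pattern), and 0.0 when it occurs in neither.
    """
    s_target = support(itemset, target)
    s_background = support(itemset, background)
    if s_background == 0:
        return math.inf if s_target > 0 else 0.0
    return s_target / s_background


if __name__ == "__main__":
    # Hypothetical toy data: each molecule is a set of structural features.
    toxic = [{"Cl", "ring"}, {"Cl"}, {"ring"}]
    safe = [{"ring"}, {"OH"}, {"Cl", "OH"}, {"OH", "ring"}]
    print(growth_rate(frozenset({"Cl"}), toxic, safe))
    print(growth_rate(frozenset({"Cl", "ring"}), toxic, safe))
```

An emerging pattern is minimal when none of its proper subsets is itself emerging; checking that property would require one extra loop over subsets, omitted here to keep the sketch short.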
Boukhetta, Salah Eddine. "Analyse de séquences avec GALACTIC – Approche générique combinant analyse formelle des concepts et fouille de motifs." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS035.
A sequence is an ordered list of elements, such as a travel trajectory or a series of product purchases in a supermarket. Sequence mining is a domain of data mining that aims at extracting frequent sequential patterns from a set of sequences, where these patterns are most often common subsequences. Support is a monotonic measure that defines the proportion of data sharing a sequential pattern. Several algorithms have been proposed for frequent sequential pattern extraction. With the evolution of computing capabilities, the task of frequent sequential pattern extraction has become faster. The difficulty then lies in the large number of extracted sequential patterns, which makes them difficult to read and therefore to interpret. This is referred to as a "deluge of patterns". Formal Concept Analysis (FCA) is a field of data analysis for identifying relationships in a set of binary data. Pattern structures extend FCA to handle complex data such as sequences. The GALACTIC platform implements the Next Priority Concept algorithm, which proposes a pattern extraction approach for heterogeneous and complex data. It allows a generic pattern computation through specific descriptions of objects by monadic predicates. It also proposes to refine a set of objects through specific exploration strategies, which makes it possible to reduce the number of patterns. In this work, we are interested in the analysis of sequential data using GALACTIC. We propose several descriptions and strategies adapted to sequences. We also propose unsupervised quality measures to compare the obtained patterns. A qualitative and quantitative analysis is conducted on real and synthetic datasets to show the efficiency of our approach.
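The notion of support used in this abstract can be sketched in a few lines, assuming the simplest setting where a sequence is a list of single items and a pattern occurs in a sequence when it is a (not necessarily contiguous) subsequence of it. The function names are illustrative and unrelated to the GALACTIC API.

```python
def is_subsequence(pattern: list, sequence: list) -> bool:
    """True if the items of `pattern` appear in `sequence` in the same order,
    possibly with other items in between."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until the item is found,
    # so later items are searched only in the rest of the sequence.
    return all(item in remaining for item in pattern)


def support(pattern: list, sequences: list[list]) -> float:
    """Proportion of sequences that contain `pattern` as a subsequence."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


if __name__ == "__main__":
    purchases = [["bread", "milk", "eggs"], ["bread", "eggs"], ["milk", "bread"]]
    print(support(["bread", "eggs"], purchases))
```

A frequent sequential pattern is then any pattern whose support reaches a user-chosen minimum threshold; the number of such patterns is what grows into the "deluge" the abstract describes.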
Cadot, Martine. "Extraire et valider les relations complexes en sciences humaines : statistiques, motifs et règles d'association." Phd thesis, Université de Franche-Comté, 2006. http://tel.archives-ouvertes.fr/tel-00594174.
Cavadenti, Olivier. "Contribution de la découverte de motifs à l’analyse de collections de traces unitaires." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI084/document.
In a manufacturing context, a product is moved through different placements or sites before it reaches the final customer. Each of these sites has a different function, e.g. creation, storage, retailing, etc. In this scenario, traceability data richly describes the events a product undergoes across the whole supply chain (from factory to consumer) by recording temporal and spatial information as well as other important descriptive elements. Thus, traceability is an important mechanism for discovering anomalies in a supply chain, such as diversion of computer equipment or counterfeits of luxury items. In this thesis, we propose a methodological framework for mining unitary traces using knowledge discovery methods. We show how the process of data mining applied to unitary traces encoded in specific data structures allows extracting interesting patterns that characterize frequent behaviors. We demonstrate that domain knowledge, that is, the flow of products provided by experts and compiled in the industry model, is useful and efficient for classifying unitary traces as deviant or not. Moreover, we show how data mining techniques can be used to characterize abnormal behaviours (when and how did they occur?). We also propose an original method for detecting identity usurpations in the supply chain based on behavioral data, e.g. distributors using fake identities or concealing them. We highlight how knowledge discovery in databases, applied to unitary traces encoded in specific data structures (with the help of expert knowledge), allows extracting interesting patterns that characterize frequent behaviors. Finally, we detail the achievements made within this thesis with the development of a trace analysis platform in the form of a prototype.
Adda, Mehdi. "Intégration des connaissances ontologiques dans la fouille de motifs séquentiels avec application à la personnalisation web." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2008. http://tel.archives-ouvertes.fr/tel-00842475.
Egho, Elias. "Extraction de motifs séquentiels dans des données séquentielles multidimensionnelles et hétérogènes : une application à l'analyse de trajectoires de patients." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0066/document.
All domains of science and technology produce large and heterogeneous data. Although a lot of work has been done in this area, mining such data is still a challenge. No previous research work targets the mining of heterogeneous multidimensional sequential data. This thesis proposes a contribution to knowledge discovery in heterogeneous sequential data. We study three different research directions: (i) extraction of sequential patterns, (ii) classification and (iii) clustering of sequential data. First, we generalize the notion of a multidimensional sequence by considering complex and heterogeneous sequential structure. We present a new approach called MMISP to extract sequential patterns from heterogeneous sequential data. MMISP generates a large number of sequential patterns, as is usually the case for pattern enumeration algorithms. To overcome this problem, we propose a novel way of considering heterogeneous multidimensional sequences by mapping them into pattern structures. We develop a framework for enumerating only patterns satisfying given constraints. The second research direction concerns the classification of heterogeneous multidimensional sequences. We use Formal Concept Analysis (FCA) as a classification method. We show interesting properties of concept lattices and of the stability index to classify sequences into a concept lattice and to select some interesting groups of sequences. The third research direction in this thesis concerns the clustering of heterogeneous multidimensional sequential data. We focus on the notion of common subsequences to define similarity between a pair of sequences composed of a list of itemsets. We use this similarity measure to build a similarity matrix between sequences and to separate them into different groups.
In this work, we present theoretical results and an efficient dynamic programming algorithm to count the number of common subsequences between two sequences without enumerating all subsequences. The system resulting from this research work was applied to analyze and mine patient healthcare trajectories in oncology. Data are taken from a medico-administrative database including all information about the hospitalizations of patients in the Lorraine region (France). The system makes it possible to identify and characterize episodes of care for specific sets of patients. Results were discussed and validated with domain experts.
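The idea of counting common subsequences without enumerating them can be illustrated with a small dynamic program. The sketch below is a simplification and not the thesis's algorithm: it assumes sequences of single items rather than itemsets, and it counts matching pairs of embeddings (one choice of positions in each sequence spelling the same subsequence, the empty one included) rather than distinct subsequences. The function name is hypothetical.

```python
def count_common_embeddings(x: list, y: list) -> int:
    """Count pairs (I, J) of position subsets, I in x and J in y, such that
    the items of x at I equal the items of y at J in order.

    The empty pair is counted, so the result is at least 1.
    Runs in O(len(x) * len(y)) time without listing any subsequence.
    """
    n, m = len(x), len(y)
    # counts[i][j] = number of matching pairs using only x[:i] and y[:j].
    # With an empty prefix on either side, only the empty pair exists.
    counts = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pairs that do not use both last positions (inclusion-exclusion).
            counts[i][j] = counts[i - 1][j] + counts[i][j - 1] - counts[i - 1][j - 1]
            # Pairs that end on both last positions exist only if the items match.
            if x[i - 1] == y[j - 1]:
                counts[i][j] += counts[i - 1][j - 1]
    return counts[n][m]


if __name__ == "__main__":
    print(count_common_embeddings(list("ab"), list("ab")))
```

A count of this kind can be normalized into a similarity score between two sequences, which is the role the abstract gives to common subsequences when building the similarity matrix for clustering.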
Pham, Quang-Khai. "Time Sequence Summarization: Theory and Applications." Phd thesis, Université de Nantes, 2010. http://tel.archives-ouvertes.fr/tel-00538512.
Hamrouni, Tarek. "Fouille de représentations concises des motifs fréquents à travers les espaces de recherche conjonctif et disjonctif." Phd thesis, Université d'Artois, 2009. http://tel.archives-ouvertes.fr/tel-00465733.
Pennerath, Frédéric. "Méthodes d'extraction de connaissances à partir de données modélisables par des graphes : Application à des problèmes de synthèse organique." Phd thesis, Université Henri Poincaré - Nancy I, 2009. http://tel.archives-ouvertes.fr/tel-00436568.
Szathmary, Laszlo. "Méthodes symboliques de fouille de données avec la plate-forme Coron." Phd thesis, Université Henri Poincaré - Nancy I, 2006. http://tel.archives-ouvertes.fr/tel-00336374.
The main contributions of this thesis are: (1) we developed and adapted algorithms for finding minimal non-redundant association rules; (2) we defined a new basis for association rules called "closed rules"; (3) we studied an important but relatively under-explored area of knowledge discovery in databases (KDD), namely the extraction of rare itemsets and rare association rules; (4) we gathered our algorithms, a collection of other algorithms, and other auxiliary KDD operations in a software toolkit called Coron.
Fabregue, Mickael. "Extraction d'informations synthétiques à partir de données séquentielles : application à l'évaluation de la qualité des rivières." Thesis, Strasbourg, 2014. http://www.theses.fr/2014STRAD016/document.
Exploring temporal databases with suitable data mining methods has been the subject of several studies. However, it often leads to an excessive volume of extracted information, which makes the analysis difficult for the user. We addressed this issue and specifically focused on methods that synthesize and filter extracted information. The objective is to provide results that humans can interpret. Thus, we relied on the notion of partially ordered sequence and proposed (1) an algorithm that extracts the set of closed partially ordered patterns; (2) a post-processing step to filter patterns of interest to the user; and (3) an approach that extracts a partially ordered consensus as an alternative to pattern extraction. The proposed methods were validated on hydrobiological data from the Fresqueau ANR project. In addition, they have been implemented in a visualization tool designed for hydrobiologists for watercourse quality analysis.
El, Ouassouli Amine. "Discovering complex quantitative dependencies between interval-based state streams." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI061.
The increasing use of sensor devices, in addition to human-given data, makes it possible to capture the complexity of real-world systems through rich temporal descriptions. More precisely, the use of a multitude of data source types makes it possible to monitor an environment by describing the evolution of several of its dimensions through data streams. One core characteristic of such configurations is heterogeneity, which appears at different levels of the data generation process: data sources, time models and data models. In such a context, one challenging task for monitoring systems is to discover non-trivial temporal knowledge that is directly actionable and suitable for human interpretation. In this thesis, we first propose to use a Temporal Abstraction (TA) approach to express information given by heterogeneous raw data streams with a unified interval-based representation, called state streams. A state reports on a high-level environment configuration that is of interest for an application domain. Such an approach solves problems introduced by heterogeneity, provides a high-level pattern vocabulary and also makes it possible to integrate expert knowledge into the discovery process. Second, we introduce Complex Temporal Dependencies (CTD), a quantitative interval-based pattern model. It is defined similarly to a conjunctive normal form and makes it possible to express complex temporal relations between states. Contrary to the majority of existing pattern models, a CTD is evaluated with an automatic statistical assessment of stream intersections, avoiding the use of any user-given significance parameter. Third, we propose CTD-Miner, a first efficient CTD mining framework. CTD-Miner performs an incremental dependency construction. CTD-Miner benefits from pruning techniques based on a statistical correspondence relationship that aim to accelerate the exploration of the search space by reducing redundant information and to provide a more usable result set.
Finally, we propose the Interval Time Lag Discovery (ITLD) algorithm. ITLD is based on a confidence variation heuristic that reduces the complexity of the pairwise dependency discovery process from quadratic to linear with respect to a temporal constraint Δ on time lags. Experiments on simulated and real-world data showed that ITLD efficiently provides more accurate results than existing approaches. Hence, ITLD significantly enhances the accuracy, performance and scalability of CTD-Miner. The encouraging results given by CTD-Miner on our real-world motion data set suggest that it is possible to integrate insights given by real-time video processing approaches into a knowledge discovery process, opening interesting perspectives for monitoring smart environments.
Shah, Faaiz Hussain. "Gradual Pattern Extraction from Property Graphs." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS025/document.
Graph databases (NoSQL graph-oriented databases) provide the ability to manage highly connected data and complex database queries along with native graph storage and processing. A property graph in a NoSQL graph engine is a labeled directed graph composed of nodes connected through relationships, with a set of attributes or properties in the form of (key:value) pairs. It makes it easier to represent data and knowledge that take the form of graphs. Practical applications of graph database systems have been seen in social networks, recommendation systems, fraud detection, and data journalism, as in the case of the Panama Papers. Such systems often face the issue of missing data. In particular, these semi-structured NoSQL databases lead to a situation where some attributes (properties) are filled in while others are not available, either because they exist but are missing (for instance, the age of a person that is unknown) or because they are not applicable in a particular case (for instance, the year of military service for a girl in countries where it is mandatory only for boys). Therefore, some keys can be provided for some nodes and not for others. In such a scenario, when we want to extract knowledge from these new-generation database systems, we face the problem of missing data, which must be taken into account in the analysis. Some approaches have been proposed to replace missing values so as to be able to apply data mining techniques. However, we argue that such approaches should be avoided so as not to introduce biases or errors. In our work, we focus on the extraction of gradual patterns from property graphs, which provides end-users with tools for mining correlations in the data when there are missing values.
Our approach requires first defining gradual patterns in the context of NoSQL property graphs and then extending existing algorithms so as to handle missing values, because the anti-monotonicity of the support can no longer be used in a simple manner. Thus, we introduce a novel approach for mining gradual patterns in the presence of missing values and we test it on real and synthetic data. Further to this work, we present our approach for mining such graphs in order to extract frequent gradual patterns of the form "the more/less $A_1$, ..., the more/less $A_n$", where the $A_i$ are pieces of information from the graph, whether from the nodes or from the relationships. In order to retrieve more valuable patterns, we consider fuzzy gradual patterns of the form "the more/less $A_1$ is $F_1$, ..., the more/less $A_n$ is $F_n$", where the $A_i$ are attributes retrieved from the graph nodes or relationships and the $F_i$ are fuzzy descriptions. For this purpose, we introduce the definitions of such concepts, the corresponding method for extracting the patterns, and the experiments that we have conducted on synthetic graphs using a graph generator. We report the results in terms of runtime, memory consumption and the number of patterns generated.
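As an illustration of what a gradual pattern such as "the more $A_1$, the more $A_2$" measures, the sketch below computes a support value as the fraction of concordant pairs of objects, ignoring objects where one of the attributes is missing. This is one common way to define gradual support and is an assumption here; the abstract does not state which definition or missing-value treatment the thesis uses, and the function names and the toy node properties are hypothetical.

```python
from itertools import combinations


def respects(a: dict, b: dict, pattern: list[tuple[str, str]]) -> bool:
    """True if going from object `a` to object `b` follows every variation:
    '+' means the attribute strictly increases, '-' means it strictly decreases."""
    return all(
        (b[attr] > a[attr]) if direction == "+" else (b[attr] < a[attr])
        for attr, direction in pattern
    )


def gradual_support(rows: list[dict], pattern: list[tuple[str, str]]) -> float:
    """Fraction of comparable pairs of objects that are concordant with `pattern`.

    Assumption: objects missing any attribute of the pattern (absent key or
    None) are left out rather than imputed.
    """
    attrs = [attr for attr, _ in pattern]
    complete = [r for r in rows if all(r.get(a) is not None for a in attrs)]
    pairs = list(combinations(complete, 2))
    if not pairs:
        return 0.0
    concordant = sum(
        respects(a, b, pattern) or respects(b, a, pattern) for a, b in pairs
    )
    return concordant / len(pairs)


if __name__ == "__main__":
    # Hypothetical node properties; one node has no salary recorded.
    people = [
        {"age": 20, "salary": 1000},
        {"age": 30, "salary": 2000},
        {"age": 40, "salary": None},
        {"age": 50, "salary": 1500},
    ]
    print(gradual_support(people, [("age", "+"), ("salary", "+")]))
```

Because the set of complete objects changes with the attributes involved, the support of a longer pattern is no longer guaranteed to be below that of its sub-patterns, which gives an intuition for the abstract's remark that anti-monotonicity cannot be relied on in a simple manner.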