Dissertations on the topic "Structuration automatique de données"
Browse the top 50 dissertations for research on the topic "Structuration automatique de données".
Bouchekif, Abdesselam. „Structuration automatique de documents audio“. Thesis, Le Mans, 2016. http://www.theses.fr/2016LEMA1038/document.
Topic structuring is an area that has attracted much attention in the Natural Language Processing community. Indeed, topic structuring is considered the starting point of several applications such as information retrieval, summarization and topic modeling. In this thesis, we proposed a generic topic structuring system, i.e. one able to deal with any TV Broadcast News. Our system comprises two steps: topic segmentation and title assignment. Topic segmentation consists in splitting the document into thematically homogeneous fragments. These fragments are generally identified by anonymous labels, and the last step has to assign a title to each segment. Several original contributions are proposed, such as the joint exploitation of the distribution of speakers and words (speech cohesion) and the use of diachronic semantic relations. After the topic segmentation step, each generated segment is assigned a title corresponding to an article collected from Google News on the same day. Finally, we proposed two new evaluation metrics, the first dedicated to topic segmentation and the second to title assignment. The experiments are carried out on three corpora consisting of 168 TV Broadcast News from 10 French channels, automatically transcribed. Our corpus is characterized by its richness and diversity
Ribert, Arnaud. „Structuration évolutive de données : application à la construction de classifieurs distribués“. Rouen, 1998. http://www.theses.fr/1998ROUES073.
Kempf, Emmanuelle. „Structuration, standardisation et enrichissement par traitement automatique du langage des données relatives au cancer au sein de l’entrepôt de données de santé de l’Assistance Publique – Hôpitaux de Paris“. Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS694.
Cancer is a public health issue for which the improvement of care relies, among other levers, on the use of clinical data warehouses (CDWs). Their use involves overcoming obstacles such as the quality, standardization and structuring of the care data stored there. The objective of this thesis was to demonstrate that it is possible to address the challenges of secondary use of data from the Assistance Publique - Hôpitaux de Paris (AP-HP) CDW regarding cancer patients, for various purposes such as monitoring the safety and quality of care and performing observational and experimental clinical research. First, the identification of a minimal data set made it possible to concentrate the effort of formalizing the items of interest specific to the discipline. From 15 identified items, 4 use cases from distinct medical perspectives were successfully developed: automation of the safety and quality-of-care calculations required for the international certification of health establishments, clinical epidemiology regarding the impact of public health measures during a pandemic on the delay in cancer diagnosis, decision support regarding the optimization of patient recruitment in clinical trials, and development of neural networks for prognostication by computer vision. A second condition necessary for CDW use in oncology is the optimal and interoperable formalization of this minimal data set across several CDWs. As part of the French PENELOPE initiative aiming at improving patient recruitment in clinical trials, the thesis assessed the added value of the oncology extension of the OMOP common data model. This version 5.4 of OMOP made it possible to double the rate of formalization of prescreening criteria for phase I to IV clinical trials. Only 23% of these criteria could be automatically queried on the AP-HP CDW, and with a positive predictive value of less than 30%.
This work suggested a novel methodology for evaluating the performance of a recruitment support system, based on the usual metrics (sensitivity, specificity, positive predictive value, negative predictive value) but also on additional indicators characterizing the adequacy of the chosen model to the related CDW (rate of translation and execution of queries). Finally, the work showed how natural language processing applied to CDW data structuring could enrich the minimal data set, based on the baseline tumor dissemination assessment of a cancer diagnosis and on the histoprognostic characteristics of tumors. The comparison of textual extraction performance metrics and of the human and technical resources necessary for the development of rules and machine learning systems made it possible to promote, for a certain number of situations, the first approach. The thesis identified that automatic rule-based preannotation before a manual annotation phase for training a machine learning model was an optimizable approach. The rules seemed to be sufficient for textual extraction tasks on a certain typology of entities that are well characterized on a lexical and semantic level. Anticipating and modeling this typology could be possible upstream of the textual extraction phase, in order to determine, for each type of entity, to what extent machine learning should replace the rules. The thesis demonstrated that close attention to a certain number of data science challenges allowed the efficient use of a CDW for various purposes in oncology
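The four usual metrics named in the abstract above can be illustrated with a minimal Python sketch. The function name `screening_metrics` and the confusion-matrix counts in the usage example are hypothetical and are not taken from the thesis; the additional indicators it mentions (rate of translation and execution of queries) are not modeled here.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics derived from a 2x2 confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (hypothetical inputs).
    """
    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of eligible patients found
        "specificity": ratio(tn, tn + fp),  # share of ineligible patients rejected
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the thesis.
    for name, value in screening_metrics(tp=30, fp=70, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")
```

With many false positives relative to true positives, the positive predictive value stays low even when specificity is high, which is why the two are reported separately.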
Serrano, Laurie. „Vers une capitalisation des connaissances orientée utilisateur : extraction et structuration automatiques de l'information issue de sources ouvertes“. Caen, 2014. http://www.theses.fr/2014CAEN2011.
Due to the considerable increase of freely available data (especially on the Web), the discovery of relevant information from textual content is a critical challenge. Open Source Intelligence (OSINT) specialists are particularly concerned by this phenomenon as they try to mine large amounts of heterogeneous information to acquire actionable intelligence. This collection process is still largely done by hand in order to build knowledge sheets summarizing all the knowledge acquired about a specific entity. Given this context, the main goal of this thesis work is to reduce and facilitate the daily work of intelligence analysts. To this end, our research revolves around three main axes: knowledge modeling, text mining and knowledge gathering. We explored the literature related to these different domains to develop a global knowledge gathering system. Our first contribution is the building of a domain ontology dedicated to knowledge representation for OSINT purposes, which comprises a specific definition and modeling of the event concept for this domain. Secondly, we have developed and evaluated an event recognition system based on two different extraction approaches: the first relies on hand-crafted rules and the second on a frequent pattern learning technique. As our third contribution, we proposed a semantic aggregation process as a necessary post-processing step to enhance the quality of the extracted events and to convert extraction results into actionable knowledge. This is achieved by means of multiple similarity measures between events, expressed according to a qualitative scale designed to meet our end users' needs
Hiot, Nicolas. „Construction automatique de bases de données pour le domaine médical : Intégration de texte et maintien de la cohérence“. Electronic Thesis or Diss., Orléans, 2024. http://www.theses.fr/2024ORLE1026.
The automatic construction of databases in the medical field represents a major challenge for guaranteeing efficient information management and facilitating decision-making. This research project focuses on the use of graph databases, an approach that offers dynamic representation and efficient querying of data and its topology. Our project explores the convergence between databases and automatic language processing, with two central objectives. On the one hand, our focus is on maintaining consistency within graph databases during updates, particularly with incomplete data and specific business rules. Maintaining consistency during updates ensures a uniform level of data quality for all users and facilitates analysis. In a world of constant change, we give priority to updates, which may involve modifying the instance to accommodate new information. But how can we effectively manage these successive updates within a graph database management system? On the other hand, we focus on the integration of information extracted from text documents, a major source of data in the medical field. In particular, we are looking at clinical cases and pharmacovigilance, a crucial area for identifying the risks and adverse effects associated with the use of drugs. But how can we detect information in texts? How can this unstructured data be efficiently integrated into a graph database? How can it be structured automatically? And finally, what is a valid structure in this context? We are particularly interested in encouraging reproducible research by adopting a transparent and documented approach to enable independent verification and validation of our results
Nouvel, Damien. „Reconnaissance des entités nommées par exploration de règles d'annotation - Interpréter les marqueurs d'annotation comme instructions de structuration locale“. Phd thesis, Université François Rabelais - Tours, 2012. http://tel.archives-ouvertes.fr/tel-00788630.
Sèdes, Florence. „Contribution au developpement des systemes bureautiques integres : gestion de donnees, repertoires, formulaires, documents“. Toulouse 3, 1987. http://www.theses.fr/1987TOU30134.
Lai, Hien Phuong. „Vers un système interactif de structuration des index pour une recherche par le contenu dans des grandes bases d'images“. Phd thesis, Université de La Rochelle, 2013. http://tel.archives-ouvertes.fr/tel-00934842.
Guinaudeau, Camille. „Structuration automatique de flux télévisuels“. Phd thesis, INSA de Rennes, 2011. http://tel.archives-ouvertes.fr/tel-00646522.
Poli, Jean-Philippe. „Structuration automatique de flux télévisuels“. Phd thesis, Université Paul Cézanne - Aix-Marseille III, 2007. http://tel.archives-ouvertes.fr/tel-00207960.
The stability of program schedules allows us to propose a statistical model of them based on a contextual Markov model and a regression tree. Trained on the program schedules of previous years, this model compensates for the imprecision of program guides (EPG, magazines). By combining these two sources of information, we are able to predict the most probable sequences of programs for a given day of the year and to bound the duration of the programs.
From these predicted program schedules and a set of rules indicating the characteristic elements of a transition between two program genres (monochrome images, silences or logos), we are able to locate these breaks using detections performed locally in the stream.
Vallet, Félicien. „Structuration automatique de talk shows télévisés“. Phd thesis, Télécom ParisTech, 2011. http://pastel.archives-ouvertes.fr/pastel-00635495.
Vallet, Félicien. „Structuration automatique de talk shows télévisés“. Paris, Télécom ParisTech, 2011. http://pastel.archives-ouvertes.fr/pastel-00635495.
Archives professionals have high expectations for efficient indexing tools. In particular, the archiving of TV broadcasts has created an expanding need for automatic content structuring methods. This thesis addresses the task of structuring a particular type of TV content that has scarcely been studied in previous work, namely talk show programs. The object of this work is examined in the light of a number of sociological studies, with the aim of identifying relevant prior knowledge on which to base the structuring approach. Then, having highlighted that a structuring scheme should be assessed according to specific use cases, a user-based evaluation is undertaken. The latter stresses the relevance of considering the speakers’ interventions as elementary structural units instead of the video shots usually employed in similar studies. Having emphasised the importance of speaker-oriented detectors, the second part of this thesis focuses on speaker diarization methods. We first present a state of the art of the techniques used in this research domain, particularly unsupervised ones. Then, results for a first speaker diarization system are presented. Finally, a more original system that efficiently exploits audiovisual information is proposed. Its validity is tested on two talk show collections: Le Grand Échiquier and On n’a pas tout dit. The results show that this new system outperforms state-of-the-art methods. Moreover, it strengthens the case for using visual cues (even for tasks considered to be exclusively audio, such as speaker diarization) and kernel methods in a multimodal context
Caillaut, Gaëtan. „Apprentissage d'espaces prétopologiques pour l'extraction de connaissances structurées“. Electronic Thesis or Diss., Orléans, 2019. http://www.theses.fr/2019ORLE3208.
Pretopology is a mathematical theory whose goal is to relax the set of axioms governing the well-known theory of topology. Weakening the set of axioms mainly consists in redefining the pseudo-closure operator, which is idempotent in topology. The non-idempotence of the pretopological pseudo-closure operator offers an appropriate framework for modeling various phenomena, such as iterative processes evolving over time. Pretopology is the outcome of the generalisation of several concepts, among them topology but also graph theory. This thesis is divided into four main parts. The first is an introduction to the theoretical framework of pretopology, as well as an overview of several applications in domains where pretopology theory shines, such as machine learning, image processing or complex systems analysis. The second part establishes the logical modeling of pretopological spaces, which makes it possible to define pretopological spaces by a logical and multi-criteria combination. This modeling enables learning algorithms to define pretopological spaces by learning a logical formula. This part also presents an algorithm for learning unrestricted pretopological spaces. Unrestricted pretopological spaces can be quite hard to manipulate, especially when the studied population has structural properties that can be described in a more restricted space. This is why the third part is dedicated to the automatic learning of pretopological spaces of type V. These spaces are defined by a set of prefilters which impose a particular structure. The LPSMI algorithm, which is the main contribution of this work, is presented in this part. This algorithm relies on multi-instance learning principles to accurately capture the structural properties of pretopological spaces of type V. Finally, the last part consists of multiple applications of the theoretical framework presented in this thesis.
Applications to lexical taxonomy extraction, community detection and extraction of temporal relations, as part of an NLP process, are presented in order to show the usefulness, relevance and flexibility of pretopological space learning
Zhu, Xuan. „Structuration automatique en locuteurs par approche acoustique“. Phd thesis, Université Paris Sud - Paris XI, 2007. http://tel.archives-ouvertes.fr/tel-00624061.
Naturel, Xavier. „Structuration automatique de flux vidéos de télévision“. Phd thesis, Université Rennes 1, 2007. http://tel.archives-ouvertes.fr/tel-00524584.
Naturel, Xavier Gros Patrick. „Structuration automatique de flux vidéos de télévision“. [S.l.] : [s.n.], 2007. ftp://ftp.irisa.fr/techreports/theses/2007/naturel.pdf.
Pigeau, Antoine. „Structuration géo-temporelle de données multimédia personnelles“. Phd thesis, Nantes, 2005. http://www.theses.fr/2005NANT2131.
The use of mobile devices raises the need to organize large personal multimedia collections. The present work focuses on personal image collections acquired from mobile phones equipped with a camera. We treat the structuring of an image collection as a clustering problem. Our solution consists in building two distinct temporal and spatial partitions, based on the temporal and spatial metadata of each image. The main ingredients of our approach are Gaussian mixture models and the ICL criterion, used to determine model complexity. First, we propose an incremental optimization algorithm to build non-hierarchical partitions automatically. It is then combined with an agglomerative algorithm to provide an incremental hierarchical algorithm. Finally, two techniques are proposed to build hybrid spatio-temporal classifications that take into account human-machine interaction constraints
Nadif, Mohamed. „Classification automatique et données manquantes“. Metz, 1991. http://docnum.univ-lorraine.fr/public/UPV-M/Theses/1991/Nadif.Mohamed.SMZ912.pdf.
Falip, Joris. „Structuration de données multidimensionnelles : une approche basée instance pour l'exploration de données médicales“. Thesis, Reims, 2019. http://www.theses.fr/2019REIMS014/document.
A posteriori use of medical data accumulated by practitioners represents a major challenge for clinical research as well as for personalized patient follow-up. However, health professionals lack the appropriate tools to easily explore, understand and manipulate their data. To solve this, we propose an algorithm to structure elements by similarity and representativeness. This method allows individuals in a dataset to be grouped around representative and generic members who are able to subsume the elements and summarize the data. This approach processes each dimension individually before aggregating the results; it is suited to high-dimensional data and offers transparent, interpretable and explainable results. The results we obtain are suitable for exploratory analysis and reasoning by analogy: the structure is similar to the organization of knowledge and the decision-making process used by experts. We then propose an anomaly detection algorithm that allows complex and high-dimensional anomalies to be detected by analyzing two-dimensional projections. This approach also provides interpretable results. We evaluate these two algorithms on real and simulated high-dimensional data with up to thousands of dimensions. We analyze the properties of the graphs resulting from the structuring of elements. We then describe a medical data pre-processing tool and a web application for physicians. Through this intuitive tool, we propose a visual structure of the elements to ease exploration. This decision support prototype assists medical diagnosis by allowing the physician to navigate through the data and explore similar patients. It can also be used to test clinical hypotheses on a cohort of patients
Rouvier, Mickael. „Structuration de contenus audio-visuel pour le résumé automatique“. Phd thesis, Université d'Avignon, 2011. http://tel.archives-ouvertes.fr/tel-00954238.
Rouvier, Mickaël. „Structuration de contenus audio-visuel pour le résumé automatique“. Thesis, Avignon, 2011. http://www.theses.fr/2011AVIG0192/document.
In recent years, with the advent of sites such as Youtube, Dailymotion or Blip TV, the number of videos available on the Internet has increased considerably. The size of these collections and their lack of structure limit access to their contents. Summarization is one way to produce snippets that extract the essential content and present it as concisely as possible. In this work, we focus on extraction methods for video summarization, based on audio analysis. We treat various scientific problems related to this objective: content extraction, document structuring, definition and estimation of the objective function, and the extraction algorithm. On each of these aspects, we make concrete proposals that are evaluated. On content extraction, we present a fast spoken-term detection method. The main novelty of this approach is that it relies on the construction of a detector based on the search terms. We show that this strategy of self-organization of the detector improves system robustness, which significantly exceeds that of the classical approach based on automatic speech recognition. We then present an acoustic filtering method for automatic speech recognition based on Gaussian mixture models and factor analysis, as recently used in speaker identification. The originality of our contribution is the use of decomposition by factor analysis for estimating supervised filters in the cepstral domain. We then discuss the issues of structuring video collections. We show that the use of different levels of representation and different sources of information to characterize the editorial style of a video is principally based on audio analysis, whereas most previous works suggested that the bulk of the information on genre was contained in the image.
Another contribution concerns the identification of the type of discourse; we propose low-level models for detecting spontaneous speech that significantly improve the state of the art for this kind of approach. The third focus of this work concerns the summary itself. As part of video summarization, we first try to define what a synthetic view is. Is it what characterizes the whole document, or what a user would remember (for example, an emotional or funny moment)? This issue is discussed and we make some concrete proposals for the definition of objective functions corresponding to three different criteria: salience, expressiveness and significance. We then propose an algorithm for finding the summary of maximum interest, derived from the one introduced in previous works and based on integer linear programming
Gelgon, Marc. „Structuration statistique de données multimédia pour la recherche d'information“. Habilitation à diriger des recherches, Université de Nantes, 2007. http://tel.archives-ouvertes.fr/tel-00450297.
Almeida, Barbosa Plínio. „Caractérisation et génération automatique de la structuration rythmique du français“. Grenoble INPG, 1994. http://www.theses.fr/1994INPG0119.
Ben, Meftah Salma. „Structuration sématique de documents XML centres-documents“. Thesis, Toulouse 1, 2017. http://www.theses.fr/2017TOU10061/document.
The English abstract was not provided by the author
Benadi, Sofiane Abdelkader. „Structuration des données et des services pour le télé-enseignement“. Lyon, INSA, 2004. http://theses.insa-lyon.fr/publication/2004ISAL0058/these.pdf.
The evolution of ICT fundamentally affects the educational field. In line with this trend, this thesis addresses the design of environments for e-learning. More particularly, our work concerns the implementation of a model for structuring data and services, intended to serve as a basis for the design of hypermedia environments allowing the dynamic generation of pedagogical activities adapted to the profiles and preferences of the learners. This adaptation is carried out through the use of the various languages revolving around XML technology and through a horizontal division of the system into three interdependent levels (support, structure and semantics). Their respective roles are explained in turn, together with the benefits of this modelling. Finally, we describe a platform following this model, which was implemented in order to validate all our proposals
Njike, Fotzo Hermine. „Structuration Automatique de Corpus Textuels par Apprentissage Automatique : Automatically structuring textual corpora with machine learning methods“. Paris 6, 2004. http://www.theses.fr/2004PA066567.
Scheffer, Nicolas. „Structuration de l'espace acoustique par le modèle générique pour la vérification du locuteur“. Avignon, 2006. http://www.theses.fr/2006AVIG0146.
Daniel-Vatonne, Marie-Christine. „Les termes : un modèle de représentation et structuration de données symboliques“. Montpellier 2, 1993. http://www.theses.fr/1993MON20031.
Dupont, Yoann. „La structuration dans les entités nommées“. Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCA100/document.
Named entity recognition is a crucial discipline of NLP. It is used to extract relations between named entities, which allows the construction of knowledge bases (Surdeanu and Ji, 2014), automatic summarization (Nobata et al., 2002) and so on. Our interest in this thesis revolves around the structuration phenomena that surround named entities. We distinguish here two kinds of structural elements in named entities. The first kind is recurrent substrings, which we call the characteristic affixes of a named entity. The second kind is tokens with good discriminative power, which we call trigger tokens of named entities. We explain the algorithm we provided to extract such affixes, which we compare to Morfessor (Creutz and Lagus, 2005b). We then apply the same algorithm to extract trigger tokens, which we use for French named entity recognition and postal address extraction. Another form of structuration for named entities is of a syntactic nature. It follows an overlapping or tree structure. We propose a novel kind of linear tagger cascade which has not been used before for structured named entity recognition, generalising previous methods that are only able to recognise named entities of a fixed depth or are unable to model certain characteristics of the structure. Ours, however, can do both. Throughout this thesis, we compare two machine learning methods, CRFs and neural networks, and discuss their respective advantages and drawbacks
Mühlhoff, Philippe. „HBDS structuration d'un système de CAO intergraph“. Paris 6, 1990. http://www.theses.fr/1990PA066634.
Sautot, Lucile. „Conception et implémentation semi-automatique des entrepôts de données : application aux données écologiques“. Thesis, Dijon, 2015. http://www.theses.fr/2015DIJOS055/document.
This thesis concerns the semi-automatic design of data warehouses and the associated OLAP cubes for analyzing ecological data. The biological sciences, including ecology and agronomy, generate data that require a significant collection effort: several years are often required to obtain a complete data set. Moreover, the objects and phenomena studied by these sciences are complex, and many parameters must be recorded to understand them. Finally, collecting complex data over a long period increases the risk of inconsistency. Thus, these sciences generate numerous and heterogeneous data, which can be inconsistent. It is therefore worthwhile to offer scientists working in the life sciences information systems able to store and restore their data, particularly when those data are voluminous. Among the existing tools, business intelligence tools, including online analytical processing (OLAP) systems, particularly caught our attention because they are data analysis processes that work on large historical collections (i.e. a data warehouse) to support decision making. Business intelligence offers tools that allow users to explore large volumes of data in order to discover patterns and knowledge within the data, and possibly confirm their hypotheses. However, OLAP systems are complex information systems whose implementation requires advanced skills in business intelligence. Thus, although they have interesting features for managing and analyzing multidimensional data, their complexity makes them difficult to handle for potential users who are not computer scientists. In the literature, several studies have examined automatic multidimensional design, but the examples provided by these works involved traditional data. Moreover, other articles address multidimensional modeling adapted to complex data (inconsistency, heterogeneous data, spatial objects, texts, images within a warehouse ...)
but the proposed methods are rarely automatic. The aim of this thesis is to provide an automatic design method for data warehouses and OLAP cubes. This method must be able to take into account the inherent complexity of biological data. To test the prototypes proposed in this thesis, we prepared a data set concerning bird abundance along the Loire. This data set is structured as follows: (1) we have the census of 213 bird species (described with a set of qualitative factors, such as diet) at 198 points along the river for 4 census campaigns; (2) each of the 198 points is described by a set of environmental variables from different sources (land surveys, satellite images, GIS). These environmental variables raise the most important issue in terms of multidimensional modeling. These data come from different sources, sometimes independent of the bird census campaigns, and are inconsistent in time and space. Moreover, these data are heterogeneous: they can be qualitative factors, quantitative variables or spatial objects. Finally, these environmental data include a large number of attributes (158 selected variables) (...)
Megdiche, Bousarsar Imen. „Intégration holistique et entreposage automatique des données ouvertes“. Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30214/document.
Statistical Open Data provide useful information to feed a decision-making system. Their integration and storage within these systems is achieved through ETL processes. It is necessary to automate these processes in order to make them accessible to non-experts. These processes must also cope with the lack of schemas and the structural and semantic heterogeneity that characterize Open Data. To address these issues, we propose a new ETL approach based on graphs. For the extraction, we propose automatic activities performing detection and annotation based on a table model. For the transformation, we propose a linear program that performs the holistic integration of several graphs. This model supplies an optimal and unique solution. For the loading, we propose a progressive process for the definition of the multidimensional schema and the augmentation of the integrated graph. Finally, we present a prototype and the experimental evaluations
Tlili, Assed. „Structuration des données de la conception d'un bâtiment pour une utilisation informatique“. Phd thesis, Ecole Nationale des Ponts et Chaussées, 1986. http://tel.archives-ouvertes.fr/tel-00529509.
Aguila, Orieta Del. „Analyse et structuration des données dans les logiciels de CAO en électromagnétisme“. Grenoble INPG, 1988. http://www.theses.fr/1988INPG0077.
Cho, Choong-Ho. „Structuration des données et caractérisation des ordonnancements admissibles des systèmes de production“. Lyon, INSA, 1989. http://www.theses.fr/1989ISAL0053.
This work deals, on the one hand, with the specification and modeling of databases for scheduling problems in a hierarchical architecture of manufacturing systems and, on the other hand, with the analytical specification of the set of feasible solutions for decision-support scheduling problems in three different types of workshops: first, a workshop made up of several machines (flowshop: the sequence of operations is the same for all jobs), taking set-up times as the important criterion, under set constraints (task groups) and potential constraints; second, a workshop with only one machine, under given due-date constraints on jobs; finally, a workshop organised as a jobshop, under the three previous constraints: set, potential and due dates. One of the original contributions concerns a new structure, PQR trees, used to characterise the set of feasible sequences of tasks.
Méger, Nicolas. „Recherche automatique des fenêtres temporelles optimales des motifs séquentiels“. Lyon, INSA, 2004. http://theses.insa-lyon.fr/publication/2004ISAL0095/these.pdf.
This work addresses the problem of mining patterns under constraints in event sequences. The extracted patterns are episode rules. Our main contribution is an automatic search for the optimal time window of each episode rule. We propose to extract only the rules that have such an optimal time window. These rules are termed FLM-rules. We present an algorithm, WinMiner, that extracts FLM-rules given a minimum support threshold, a minimum confidence threshold and a maximum gap constraint. Proofs of the correctness of this algorithm are supplied. We also propose a dedicated interest measure for selecting FLM-rules whose heads and bodies can be considered dependent. Two applications are described: the first concerns mining medical datasets, while the second deals with seismic datasets.
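The role of support, confidence and window width in such rules can be illustrated with a minimal sketch. It is not WinMiner: it handles a single rule of the form A => B over a timestamped event sequence, ignores the maximum gap constraint, and uses "smallest width that reaches both thresholds" as a stand-in for the optimality criterion, which the abstract does not define. The function names `rule_stats` and `first_window_reaching` and the sample events are hypothetical.

```python
from bisect import bisect_right


def rule_stats(events, body, head, width):
    """Return (body_support, rule_support, confidence) for the rule body => head.

    events: list of (timestamp, event_type) pairs sorted by timestamp.
    An occurrence of `body` at time t supports the rule when `head`
    occurs at some time in the interval (t, t + width].
    """
    head_times = [t for t, e in events if e == head]
    body_times = [t for t, e in events if e == body]
    rule_support = 0
    for t in body_times:
        # Index of the first head occurrence strictly after t.
        i = bisect_right(head_times, t)
        if i < len(head_times) and head_times[i] <= t + width:
            rule_support += 1
    body_support = len(body_times)
    confidence = rule_support / body_support if body_support else 0.0
    return body_support, rule_support, confidence


def first_window_reaching(events, body, head, min_support, min_confidence, max_width):
    """Smallest integer width meeting both thresholds, or None if there is none."""
    for width in range(1, max_width + 1):
        _, rule_support, confidence = rule_stats(events, body, head, width)
        if rule_support >= min_support and confidence >= min_confidence:
            return width
    return None


# Made-up event sequence used only for illustration.
events = [(1, "A"), (2, "B"), (5, "A"), (8, "B"), (10, "A"), (11, "B"), (15, "A")]
print(first_window_reaching(events, "A", "B", min_support=2, min_confidence=0.75, max_width=5))
```

Widening the window can only add supporting occurrences, so confidence never decreases with width in this sketch; a real miner needs a stricter criterion to single out one window per rule.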
Aouiche, Kamel. „Techniques de fouille de données pour l'optimisation automatique des performances des entrepôts de données“. Lyon 2, 2005. http://theses.univ-lyon2.fr/documents/lyon2/2005/aouiche_k.
With the development of databases in general and data warehouses in particular, it has become very important to reduce the administration burden. The aim of auto-administrative systems is to administer and adapt themselves automatically, without loss of performance or even with a gain. The idea of using data mining techniques to extract knowledge useful for administration from the data themselves has been in the air for some years. However, as far as we know, no research has actually been carried out. It nevertheless remains a very promising approach, notably in the field of data warehousing, where queries are very heterogeneous and cannot be interpreted easily. The aim of this thesis is to study auto-administration techniques in databases and data warehouses, mainly performance optimization techniques such as indexing and view materialization, and to look for a way of extracting from the stored data themselves knowledge useful for applying these techniques. We have designed a tool that finds an index and view configuration that optimizes data access time. Our tool searches for frequent itemsets in a given workload and clusters the query workload to compute this index and view configuration. Finally, we have extended performance optimization to XML data warehouses. In this area, we propose an indexing technique that precomputes joins between XML facts and dimensions, and we adapt our materialized view selection strategy to XML materialized views.
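The frequent-itemset step can be sketched as follows. This is only an illustration under stated assumptions: each query is reduced to the set of attributes it references, itemsets are enumerated by brute force rather than with a dedicated mining algorithm, and the clustering step mentioned in the abstract is left out. The function name `frequent_attribute_sets`, the attribute names and the workload are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_attribute_sets(workload, min_support, max_size=2):
    """Count attribute combinations that appear in at least `min_support` queries.

    workload: list of sets, one per query, holding the attribute names
    the query references (an assumed, simplified view of a workload).
    Returns a dict mapping sorted attribute tuples to their query counts.
    """
    counts = Counter()
    for attrs in workload:
        for size in range(1, max_size + 1):
            # Sorting makes each combination a canonical, hashable key.
            for combo in combinations(sorted(attrs), size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Made-up workload: attributes referenced by four queries.
workload = [
    {"customer_id", "date"},
    {"customer_id", "date", "region"},
    {"region"},
    {"customer_id", "date"},
]
candidates = frequent_attribute_sets(workload, min_support=3)
print(candidates)
```

Attribute sets that recur across many queries, such as `("customer_id", "date")` here, are natural candidates for an index; whether they are worth building still depends on a cost model that this sketch does not include.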
Guo, Li. „Classifieurs multiples intégrant la marge d'ensemble. Application aux données de télédétection“. Bordeaux 3, 2011. http://www.theses.fr/2011BOR30022.
This dissertation focuses on exploiting the ensemble margin concept to design better ensemble classifiers. Several training data set issues, such as redundancy, imbalanced classes and noise, are investigated in an ensemble margin framework. An alternative definition of the ensemble margin is at the core of this work. An innovative approach to measuring the importance of each instance in the learning process is introduced. We show that there is less redundancy among smaller-margin instances than among higher-margin ones. In addition, these smaller-margin instances carry more significant information than higher-margin instances. Therefore, these low-margin instances have a major influence in forming an appropriate training set for building a reliable classifier. Based on these observations, we propose a new boundary bagging method. Another major issue investigated in this thesis is the complexity induced by an ensemble approach, which usually involves a significant number of base classifiers. A new efficient ensemble pruning method is proposed. It consists in ordering all the base classifiers with respect to an entropy-inspired criterion that also exploits our new version of the margin of ensemble methods. Finally, the proposed ensemble methods are applied to remote sensing data analysis at three learning levels: data level, feature level and classifier level.
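The abstract does not spell out its alternative margin definition. As a hedged illustration, the sketch below uses one common label-free form, the difference between the votes for the two most voted classes divided by the number of base classifiers, which is an assumption here and not necessarily the thesis's definition. The function names and the vote lists are hypothetical.

```python
from collections import Counter


def ensemble_margin(votes):
    """Margin of one instance from the class labels voted by the base classifiers.

    Computed as (votes for the top class - votes for the runner-up) / number
    of votes, so it lies in [0, 1]; values near 0 mean the ensemble is split.
    """
    ranked = Counter(votes).most_common(2)
    top = ranked[0][1]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return (top - second) / len(votes)


def lowest_margin_instances(votes_per_instance, k):
    """Indices of the k instances with the smallest margins."""
    order = sorted(range(len(votes_per_instance)),
                   key=lambda i: ensemble_margin(votes_per_instance[i]))
    return order[:k]


# Made-up votes from four base classifiers on three instances.
votes_per_instance = [
    ["a", "a", "a", "a"],  # unanimous
    ["a", "b", "a", "b"],  # evenly split
    ["a", "a", "b", "a"],  # mild disagreement
]
print(lowest_margin_instances(votes_per_instance, 2))
```

Ranking instances this way is one simple route to the low-margin instances that the abstract identifies as the most informative for building a training set.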
Girard, Régis. „Classification conceptuelle sur des données arborescentes et imprécises“. La Réunion, 1997. http://elgebar.univ-reunion.fr/login?url=http://thesesenligne.univ.run/97_08_Girard.pdf.
Farenc, Christelle. „Ergoval : une méthode de structuration des règles ergonomiques permettant l'évaluation automatique d'interfaces graphiques“. Toulouse 1, 1997. http://www.theses.fr/1997TOU10013.
The thesis introduces a new method for structuring ergonomic rules in order to evaluate graphical user interfaces. This method, developed in collaboration with the SRTP (post office technical research unit), is intended to be used by computer experts and to be integrated into an automatic user interface evaluation tool, ERGOVAL. In order to give developers information in a form they can use to modify the interface, the ergonomic rules were reformulated so that they apply directly to the graphical objects of the user interface. The knowledge involved in the evaluation was structured as follows: (1) a representation of the UI in terms of the interaction objects of the CUA norm was built: this is the decomposition of graphical objects; (2) all graphical objects concerned by the same set of ergonomic rules are grouped together into classes of objects: this is the typology of graphical objects. The resulting typology consists of several levels of abstraction, the graphical objects being its leaves. The links of this typology are hierarchical, i.e. each type inherits the attributes and associated rules of its parent type. A mock-up of the ERGOVAL tool was built to validate the knowledge structuring and to define the specifications of the final tool. In order to determine the scope of application, the automatic and qualitative dimensions were studied, in particular the automatic retrieval of the interface description and the number and level of the ergonomic rules integrated in the mock-up. As a result, the quality of an automatic evaluation and of an evaluation of high-level ergonomic rules was determined.
Bossut, Philippe. „Analyse des données : application à l'analyse automatique d'images multispectrales“. École nationale supérieure des mines de Paris, 1986. http://www.theses.fr/1986ENMP0010.
Tisserant, Guillaume. „Généralisation de données textuelles adaptée à la classification automatique“. Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS231/document.
People have worked on the classification of text for a long time. Early on, many documents of different types were grouped in order to centralize knowledge. Classification and indexing systems were then created; they make it easy to find documents based on readers' needs. With the increasing number of documents and the advent of computers and the internet, the implementation of text classification systems has become a critical issue. However, textual data, complex and rich by nature, are difficult to process automatically. In this context, this thesis proposes an original methodology to organize and facilitate access to textual information. Our automatic classification approach and our semantic information extraction make it possible to find relevant information quickly. Specifically, this manuscript presents new forms of text representation that facilitate processing for automatic classification. A partial generalization of textual data (the GenDesc approach) based on statistical and morphosyntactic criteria is proposed. Moreover, this thesis focuses on the construction of phrases and on the use of semantic information to improve the representation of documents. We demonstrate through numerous experiments the relevance and genericity of our proposals, which improve classification results. Finally, as social networks are developing rapidly, a method for the automatic generation of semantic hashtags is proposed. Our approach is based on statistical measures, semantic resources and the use of syntactic information. The generated hashtags can then be exploited for information retrieval tasks on large volumes of data.
Jeannin, Akodjénou Marc-Ismaël. „Clustering et volume des données“. Paris 6, 2008. http://www.theses.fr/2009PA066270.
Rodriguez-Rojas, Oldemar. „Classification et modèles linéaires en analyse des données symboliques“. Paris 9, 2000. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2000PA090064.
Gomes, da Silva Alzennyr. „Analyse des données évolutives : Application aux données d'usage du Web“. Paris 9, 2009. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2009PA090047.
Nowadays, more and more organizations are becoming reliant on the Internet. The Web has become one of the most widespread platforms for information exchange and retrieval. The growing number of traces left behind by user transactions (e.g. customer purchases, user sessions, etc.) automatically increases the importance of usage data analysis. Indeed, the way in which a web site is visited can change over time. These changes can be related to temporal factors (day of the week, seasonality, periods of special offers, etc.). Consequently, usage models must be continuously updated in order to reflect the current behaviour of visitors. Such a task remains difficult when the temporal dimension is ignored or simply introduced into the data description as a numeric attribute. It is precisely on this challenge that the present thesis focuses. In order to deal with the problem of acquiring real usage data, we propose a methodology for the automatic generation of artificial usage data in which the occurrence of changes can be controlled, so that the efficiency of a change detection system can be analysed. Guided by leads arising from exploratory analyses, we propose a tilted window approach for detecting and following up changes in evolving usage data. In order to measure the level of change, this approach applies two external evaluation indices based on the clustering extension. The proposed approach also characterizes the changes undergone by the usage groups (e.g. appearance, disappearance, fusion and split) at each timestamp. Moreover, the approach is totally independent of the clustering method used and is able to handle kinds of data other than usage data. The effectiveness of this approach is evaluated on artificial data sets of different degrees of complexity and also on real data sets from different domains (academic, tourism, e-business and marketing).
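The abstract does not name the two external evaluation indices. As an assumed stand-in, the sketch below uses the Rand index, a standard external index that compares two partitions of the same objects, to quantify how much the grouping of a fixed set of users differs between two time windows. The function name `rand_index` and the label vectors are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index between two clusterings of the same objects.

    labels_a[i] and labels_b[i] are the cluster labels of object i in the
    two clusterings. The index is the fraction of object pairs on which the
    clusterings agree (both group the pair together, or both separate it).
    Requires at least two objects.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Made-up cluster labels for the same four users in two time windows.
previous_window = [0, 0, 1, 1]
relabelled_only = [1, 1, 0, 0]  # same groups, different label names
regrouped = [0, 1, 0, 1]        # users split differently

print(rand_index(previous_window, relabelled_only))
print(rand_index(previous_window, regrouped))
```

A value close to 1 suggests the usage groups are stable between the two windows, while a low value signals a change worth characterizing further.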
Thion, Romuald. „Structuration relationnelle des politiques de contrôle d'accès : représentation, raisonnement et vérification logiques“. Lyon, INSA, 2008. http://theses.insa-lyon.fr/publication/2008ISAL0028/these.pdf.
Access control is a mechanism that defines and controls the privileges of users in a system. Nowadays, it is one of the most common and pervasive mechanisms used for security enforcement in information systems. Access control policies are sets of facts and rules organized by means of access control models. Since the role-based access control initiative, several access control models have been proposed in the literature. Policies and models have become larger and more complex, and several issues of formalization, verification and administration have appeared. This PhD thesis shows that access control models share common characteristics. Based on an analysis and synthesis of these traits, we propose a relational structuring for the design, organization and formalization of privileges. The framework is built upon data dependencies: fragments of first-order logic dedicated to expressing constraints between relational data. Results from the database community benefit the approach by helping to address current issues in the expression and verification of, and reasoning on, access control policies. We focus particularly on the integrity property of policies: guaranteeing that the policies enforce the properties defined in the model. The thesis also builds on bridges between data dependencies, conceptual graphs and formal concept analysis. Thus, we propose a graphical representation of the models and a semi-automated method for reengineering the policies. Finally, we present perspectives for access control models based on recent applications of data dependencies from the database community.
Delakis, Emmanouil Gros Patrick Gravier Guillaume. „Structuration multimodale des vidéos de tennis en utilisant des modèles segmentaux“. [S.l.] : [s.n.], 2006. ftp://ftp.irisa.fr/techreports/theses/2006/delakis.pdf.
Gorin, Arseniy. „Structuration du modèle acoustique pour améliorer les performance de reconnaissance automatique de la parole“. Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0161/document.
This thesis focuses on acoustic model structuring for improving HMM-based automatic speech recognition. The structuring relies on unsupervised clustering of the speech utterances of the training data in order to handle speaker and channel variability. The idea is to split the data into acoustically similar classes. In the conventional multi-modeling (or class-based) approach, separate class-dependent models are built by adapting a speaker-independent model. When the number of classes increases, less data is available for estimating the class-based models, and the parameters become less reliable. One way to handle this problem is to modify the classification criterion applied to the training data, allowing a given utterance to belong to more than one class. This is obtained by relaxing the classification decision through a soft margin, which is investigated in the first part of the thesis. In the main part of the thesis, a novel approach is proposed that uses the clustered data more efficiently in a class-structured GMM. Instead of adapting all HMM-GMM parameters separately for each class of data, the class information is explicitly introduced into the GMM structure by associating a given density component with a given class. To exploit such a structured HMM-GMM efficiently, two different approaches are proposed. The first approach combines the class-structured GMM with class-dependent mixture weights. In this model the Gaussian components are shared across speaker classes, but they are class-structured, and the mixture weights are class-dependent. To decode an utterance, the set of mixture weights is selected according to the estimated class. In the second approach, the mixture weights are replaced by density component transition probabilities. The approaches proposed in the thesis are analyzed and evaluated on various speech data covering different sources of variability (age, gender, accent and noise).
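The idea of shared Gaussian components with class-dependent mixture weights can be illustrated with a toy sketch. It is a simplification under stated assumptions: one-dimensional features, a single mixture rather than one per HMM state, and made-up parameter values. The names `gaussian_pdf`, `mixture_likelihood`, `components` and `class_weights` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_likelihood(x, components, weights):
    """Likelihood of x under a mixture of the shared components with given weights."""
    return sum(w * gaussian_pdf(x, mean, var)
               for w, (mean, var) in zip(weights, components))


# Shared (mean, variance) pairs: the same components serve every class.
components = [(-2.0, 1.0), (2.0, 1.0)]

# Class-dependent weights: each class emphasises "its" component.
class_weights = {
    "class_0": [0.9, 0.1],
    "class_1": [0.1, 0.9],
}

# At decoding time the weight set is chosen from the estimated class.
estimated_class = "class_0"
print(mixture_likelihood(-2.0, components, class_weights[estimated_class]))
```

Because only the weight vectors differ per class, adding a class costs one extra weight per component instead of a full copy of every mean and variance, which is the data-efficiency argument the abstract makes.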
Gorin, Arseniy. „Structuration du modèle acoustique pour améliorer les performance de reconnaissance automatique de la parole“. Electronic Thesis or Diss., Université de Lorraine, 2014. http://www.theses.fr/2014LORR0161.
Ben-Henia, Iteb. „Degré de figement et double structuration des séquences verbales figées“. Paris 13, 2007. http://www.theses.fr/2007PA131007.
Lexical frozenness is one of the main obstacles to the automatic processing of natural language. The present work is intended as a contribution to improving the automatic processing of fossilized verbal sequences (SVF, séquences verbales figées) such as casser sa pipe. The notions of degree of lexical frozenness and of double structuration guided our study. In order to determine the degree of lexical frozenness of SVF, we analysed their internal structure and their external structuration. Consequently, they can be included in the predicate classes developed at the L.L.I., based on G. Gross's theory of object classes. After a survey of publications on (verbal) lexical frozenness, we identify and collect criteria for measuring the degree of lexical frozenness of SVF through a general structural analysis of [V SN SP] sequences. Then, we propose formal tools for the automatic recognition of metaphor through the analysis of SVF drawn from sports. Lastly, we describe two syntactico-semantic classes of predicates: and <états humains>