Dissertations / Theses on the topic 'Base de données graphes'
Consult the top 50 dissertations / theses for your research on the topic 'Base de données graphes.'
Castelltort, Arnaud. "Historisation de données dans les bases de données NoSQL orientées graphes." Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20076.
This thesis deals with data historization in the context of graphs. Graph data have been dealt with for many years, but their exploitation in information systems, especially in NoSQL engines, is recent. The emerging Big Data and 3V contexts (Variety, Volume, Velocity) have revealed the limits of classical relational databases. Historization, for its part, has long been considered only a matter of technical and backup issues, and more recently of decision support (Business Intelligence). However, historization is now taking on more and more importance in management applications. In this framework, graph databases, although often used, have received little attention regarding historization. Our first contribution consists in studying the impact of historized data in management information systems. This analysis relies on the hypothesis that historization is taking on more and more importance. Our second contribution proposes an original model for managing historization in NoSQL graph databases. This proposition consists, on the one hand, in elaborating a unique and generic system for representing the history and, on the other hand, in proposing query features. We show that the system can support both simple and complex queries. Our contributions have been implemented and tested over synthetic and real databases.
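To make the idea concrete, here is a minimal sketch (assuming networkx, and not reproducing the thesis's actual model) of one common way to historize a property graph: each entity keeps a chain of immutable state nodes, so past values stay queryable.

```python
# Historizing a property graph: states are never overwritten, only chained.
import networkx as nx
from itertools import count

_ids = count()

def set_property(g: nx.DiGraph, entity, key, value, timestamp):
    """Record a new state for `entity` instead of overwriting the old one."""
    state = f"state_{next(_ids)}"
    g.add_node(state, key=key, value=value, ts=timestamp)
    prev = g.nodes[entity].get("head")
    if prev is not None:
        g.add_edge(state, prev, label="PREVIOUS")  # chain of past states
        g.remove_edge(entity, prev)                # demote the old CURRENT edge
    g.add_edge(entity, state, label="CURRENT")
    g.nodes[entity]["head"] = state

def history(g: nx.DiGraph, entity):
    """Walk the PREVIOUS chain, newest state first."""
    node = g.nodes[entity].get("head")
    while node is not None:
        d = g.nodes[node]
        yield d["key"], d["value"], d["ts"]
        node = next((v for _, v, lbl in g.out_edges(node, data="label")
                     if lbl == "PREVIOUS"), None)

g = nx.DiGraph()
g.add_node("alice")
set_property(g, "alice", "city", "Paris", 1)
set_property(g, "alice", "city", "Lyon", 2)
print(list(history(g, "alice")))  # [('city', 'Lyon', 2), ('city', 'Paris', 1)]
```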
Ingalalli, Vijay. "Querying and Mining Multigraphs." Thesis, Montpellier, 2017. http://www.theses.fr/2017MONTS080/document.
With the ever-increasing growth of data and information, extracting the right knowledge has become a real challenge. Further, advanced applications demand the analysis of complex, interrelated data which cannot be adequately described using a propositional representation. The graph representation is of great interest for the knowledge extraction community, since graphs are versatile data structures and are one of the most general forms of data representation. Among several classes of graphs, multigraphs have been attracting attention in recent times, thanks to their inherent ability to succinctly represent entities while allowing rich and complex relations among them. The focus of this thesis is streamlined into two themes of knowledge extraction: knowledge retrieval, where we focus on subgraph query matching in multigraphs, and knowledge discovery, where we focus on the problem of frequent pattern mining in multigraphs. This thesis makes three main contributions to the fields of query matching and data mining. The first contribution, which is very generic, addresses querying subgraphs in multigraphs to yield isomorphic matches, a problem with potential applications in remote sensing, social networks, bioinformatics and chemical informatics. The second contribution, focussed on knowledge graphs, addresses querying subgraphs in RDF multigraphs to yield homomorphic matches. In both contributions, we introduce efficient indexing structures that capture the multiedge information. The query matching processes introduced have been carefully optimized with respect to time performance, and the heuristics employed ensure robust performance. The third contribution is in the field of data mining, where we propose an efficient frequent pattern mining algorithm for multigraphs. We observe that multigraphs pose challenges while exploring the search space, and hence we introduce novel optimization techniques and heuristic search methods to traverse it swiftly. For each proposed approach, we perform extensive experimental analysis against the existing state-of-the-art approaches in order to validate their performance and correctness. In the end, we present a case study on a remote sensing dataset: the dataset is modelled as a multigraph, and the mining and query matching processes are employed to discover useful knowledge.
Tung, Tony. "Indexation 3D de bases de données d'objets par graphes de Reeb améliorés." Paris, ENST, 2005. http://www.theses.fr/2005ENST0013.
The strong development of digital technologies has led to efficient methods for the 3D acquisition of real objects and for 3D rendering. Nowadays, 3D object databases appear in various areas, for leisure (games, multimedia) as well as for scientific applications (medicine, industrial part catalogues, cultural heritage, etc.). Large databases can nowadays be quickly populated using 3D mesh acquisition and reconstruction tools, which have become easy to use, and with new ergonomic 3D design tools, which have become very popular. As database sizes grow, tools to retrieve information become more and more important. 3D object indexing appears to be a useful and very promising way to manage this new kind of data. As our study took place in the framework of the European project SCULPTEUR IST-2001-35372, which involved museums, we worked with museological 3D model databases. Database indexing consists in defining a method able to perform comparisons between the database elements. Similarity retrieval is one of the main applications: using a research "key", a subset of elements with the most similar keys is extracted from the database. This manuscript presents a 3D shape matching method for 3D mesh models applied to content-based search in databases of 3D objects. The approach is based on the multiresolution Reeb graph (MRG) proposed by [Hilaga et al., 01]. MRG provides a rich representation of shapes able in particular to embed the object topology. In our framework, we consider 3D mesh models of various geometrical complexity, of different resolutions, and, when available, with colour texture maps. The original approach, mainly based on the 3D object topology, is not accurate enough to obtain
Ghazal, Moultazem. "Contribution à la gestion des données géographiques : Modélisation et interrogation par croquis." Phd thesis, Université Paul Sabatier - Toulouse III, 2010. http://tel.archives-ouvertes.fr/tel-00504944.
Douar, Brahim. "Fouille de sous-graphes fréquents à base d'arc consistance." Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20108/document.
With the important growth of requirements to analyze large amounts of structured data such as chemical compounds, protein structures and social networks, to cite but a few, graph mining has become an attractive track and a real challenge in the data mining field. Because of the NP-completeness of the subgraph isomorphism test as well as the huge search space, frequent subgraph miners are exponential in runtime and/or memory use. In order to alleviate the complexity issue, existing subgraph miners have explored techniques based on the minimal support threshold, on the description language of the examples (only supporting paths, trees, etc.) or on hypotheses (search for shared trees or common paths, etc.). In this thesis, we use a new projection operator, named AC-projection, which exhibits nice complexity properties as opposed to the graph isomorphism operator. This operator comes from the constraint programming field and has the advantage of a polynomial complexity. We propose two frequent subgraph mining algorithms based on this operator. The first one, named FGMAC, follows a breadth-first order to find frequent subgraphs and takes advantage of the well-known Apriori levelwise strategy. The second, AC-miner, is a pattern-growth approach that follows a depth-first search space exploration strategy and uses powerful pruning techniques in order to considerably reduce this search space. These two approaches extract a set of particular subgraphs named AC-reduced frequent subgraphs. As a first step, we studied the search space for discovering such frequent subgraphs and proved that it is smaller than the search space of frequent isomorphic subgraphs. Then, we carried out experiments showing that FGMAC and AC-miner are more efficient than the state-of-the-art algorithms. At the same time, we studied the relevance of AC-reduced frequent subgraphs, which are much fewer than isomorphic ones, for classification, and we conclude that an important performance gain can be achieved with little or no loss in the quality of the discovered patterns.
Ben, Dhia Imen. "Gestion des grandes masses de données dans les graphes réels." Electronic Thesis or Diss., Paris, ENST, 2013. http://www.theses.fr/2013ENST0087.
In the last few years, we have been witnessing a rapid growth of networks in a wide range of applications such as social networking, bioinformatics, the semantic web and road maps. Most of these networks can be naturally modeled as large graphs. Managing, analyzing and querying such data has become a very important issue and has inspired extensive interest within the database community. In this thesis, we address the problem of efficiently answering distance queries in very large graphs. We propose EUQLID, an efficient algorithm to answer distance queries on very large directed graphs. This algorithm exploits some interesting properties that real-world graphs exhibit. It is based on an efficient variant of the seminal 2-hop algorithm. We conducted an extensive set of experiments against state-of-the-art algorithms, which show that our approach outperforms existing ones and that distance queries can be processed within hundreds of milliseconds on very large real-world directed graphs. We also propose an access control model for social networks which can make use of EUQLID to scale on very large graphs. This model allows users to specify fine-grained privacy policies based on their relations with other users in the network. We describe and demonstrate Primates, a prototype which enforces the proposed access control model and allows users to specify their privacy preferences via a user-friendly graphical interface.
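As background, the 2-hop scheme that EUQLID builds on answers a distance query from precomputed vertex labels alone; the following is a sketch of that textbook idea (not of the thesis's optimized variant).

```python
# Classical 2-hop cover: each vertex stores out-labels {(hub, d(v, hub))}
# and in-labels {(hub, d(hub, v))}; distances are resolved from labels only.
def two_hop_distance(out_labels, in_labels, u, v):
    """dist(u, v) = min over shared hubs h of d(u, h) + d(h, v)."""
    du = dict(out_labels[u])            # hub -> d(u, hub)
    best = float("inf")
    for hub, d_hub_v in in_labels[v]:   # hub -> d(hub, v)
        if hub in du:
            best = min(best, du[hub] + d_hub_v)
    return best

# Toy labeling for the directed path a -> h -> b, with h as the only hub.
out_labels = {"a": [("h", 1)], "h": [("h", 0)], "b": []}
in_labels = {"a": [], "h": [("h", 0)], "b": [("h", 1)]}
print(two_hop_distance(out_labels, in_labels, "a", "b"))  # 2
```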
Buron, Maxime. "Raisonnement efficace sur des grands graphes hétérogènes." Thesis, Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAX061.
The Semantic Web offers knowledge representations which allow heterogeneous data from several sources to be integrated into a unified knowledge base. In this thesis, we investigate techniques for querying such knowledge bases. The first part is devoted to query answering techniques on a knowledge base represented by an RDF graph subject to ontological constraints. Implicit information entailed by reasoning, enabled by the set of RDFS entailment rules, has to be taken into account to correctly answer such queries. First, we present a sound and complete query reformulation algorithm for Basic Graph Pattern queries, which exploits a partition of the RDFS entailment rules into assertion and constraint rules. Second, we introduce a novel RDF storage layout, which combines two well-known layouts. For both contributions, our experiments assess our theoretical and algorithmic results. The second part considers the issue of querying, with BGP queries, heterogeneous data sources integrated into an RDF graph. Following the Ontology-Based Data Access paradigm, we introduce a framework for data integration under an RDFS ontology, using Global-Local-As-View mappings, rarely considered in the literature. We present several query answering strategies, which may materialize the integrated RDF graph or leave it virtual, and which differ in how and when RDFS reasoning is handled. We implement these strategies in a platform in order to conduct experiments, which demonstrate the particular interest of one of the strategies based on mapping saturation. Finally, we show that mapping saturation can be extended to reasoning defined by a subset of existential rules.
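For intuition, RDFS entailment can be illustrated with two of the standard rules (subclass transitivity and type propagation); the sketch below is a naive forward-chaining saturation of a tiny triple set, purely illustrative and unrelated to the thesis's reformulation algorithm.

```python
# Naive RDFS saturation with rules rdfs9 and rdfs11 over (s, p, o) triples.
def saturate(triples):
    """Fixpoint: repeatedly apply the two rules until nothing new is derived."""
    triples = set(triples)
    while True:
        new = set()
        for s, p, o in triples:
            for s2, p2, o2 in triples:
                # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p == p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # rdfs9: (x type A), (A subClassOf B) => (x type B)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
        if new <= triples:
            return triples
        triples |= new

kb = {(":alice", "rdf:type", ":Student"),
      (":Student", "rdfs:subClassOf", ":Person")}
assert (":alice", "rdf:type", ":Person") in saturate(kb)
```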
Dalleau, Kevin. "Une approche stochastique à base d’arbres aléatoires pour le calcul de dissimilarités : application au clustering pour diverses structures de données." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0181.
The notion of distance, and more generally of dissimilarity, is an important one in data mining, especially in unsupervised approaches. The algorithms belonging to this class of methods aim at grouping objects in a homogeneous way, and many of them rely on a notion of dissimilarity in order to quantify the proximity between objects. The choice of algorithms, as well as that of dissimilarities, is not trivial. Several elements can motivate these choices, such as the type of data (homogeneous or not), its representation (feature vectors, graphs), or some of its characteristics (highly correlated, noisy, etc.). Although many measures exist, choosing one can become complex in some specific settings, which adds complexity to data mining tasks. In this thesis, we present a new approach for computing dissimilarities based on random trees. It is an original approach with several advantages, such as great versatility: by plugging different dissimilarity computation modules into the method, it becomes possible to apply it in various settings. In particular, we present in this document two modules, enabling the computation of dissimilarities, and in fine clustering, on data structured as feature vectors and on data in the form of graphs. We discuss the very promising results obtained by this approach, as well as the numerous perspectives it opens, such as the computation of dissimilarities in the framework of attributed graphs, through a unified approach.
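The thesis's exact modules are not reproduced here, but the flavour of tree-based dissimilarities can be conveyed by a related, well-known idea: two points are far apart if random axis-aligned splits separate them quickly. A minimal sketch, with made-up data:

```python
# Random-tree dissimilarity sketch: shallow separation = high dissimilarity.
import random

def separation_depth(a, b, lo, hi, max_depth=20):
    """Split the bounding box at random until a and b fall on different sides."""
    lo, hi = list(lo), list(hi)
    for depth in range(max_depth):
        f = random.randrange(len(a))        # pick a random feature...
        cut = random.uniform(lo[f], hi[f])  # ...and a random threshold
        if (a[f] < cut) != (b[f] < cut):
            return depth                    # separated early: far apart
        if a[f] < cut:                      # both on the same side:
            hi[f] = cut                     # narrow the box and go deeper
        else:
            lo[f] = cut
    return max_depth

def dissimilarity(a, b, lo, hi, n_trees=200):
    """Average over many random trees."""
    mean = sum(separation_depth(a, b, lo, hi) for _ in range(n_trees)) / n_trees
    return 1.0 / (1.0 + mean)

print(dissimilarity([0.1, 0.1], [0.9, 0.9], [0, 0], [1, 1]))   # high
print(dissimilarity([0.50, 0.5], [0.51, 0.5], [0, 0], [1, 1])) # low
```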
Fallouh, Fouad. "Données complexes et relation universelle avec inclusions : une aide à la conception et à l'interrogation des bases de données." Lyon 1, 1994. http://www.theses.fr/1994LYO10217.
Zneika, Mussab. "Interrogation du web sémantique à l'aide de résumés de graphes de données." Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1010.
The amount of available RDF data increases fast, both in size and complexity, making RDF knowledge bases (KBs) with millions or even billions of triples commonplace: more than 1000 datasets are now published as part of the Linked Open Data (LOD) cloud, which contains more than 62 billion RDF triples, forming big and complex RDF data graphs. This explosion of the size, complexity and number of available RDF KBs and the emergence of linked datasets have made querying, exploring, visualizing and understanding the data in these KBs difficult, both from a human perspective (when trying to visualize) and from a machine perspective (when trying to query or compute). To tackle this problem, we propose a method for summarizing large RDF KBs, based on representing the RDF graph using the (best) top-k approximate RDF graph patterns. The method, named SemSum+, extracts the meaningful/descriptive information from an RDF KB and produces a succinct overview of it. It extracts from the RDF graph an RDF schema that describes the actual contents of the KB, which has various advantages even compared to an existing schema, which might be only partially used by the data in the KB. While computing the approximate RDF graph patterns, we also add information on the number of instances each pattern represents. So, when we query the RDF summary graph, we can easily identify whether the necessary information is present, and whether it is present in significant enough numbers to be included in a federated query result. The proposed method does not require the presence of the initial schema of the KB and works equally well when there is no schema information at all (a realistic situation with modern KBs, which are constructed either ad hoc or by merging fragments of other existing KBs). Additionally, it works equally well with homogeneous RDF graphs (having the same structure) and heterogeneous ones (having different structures, possibly resulting from data described under different schemas/ontologies). Given that RDF graphs can be large and complex, methods that compute the summary by fitting the whole graph in the memory of a (however large) machine will not scale. In order to overcome this problem, we propose, as part of this thesis, a parallel framework that provides a scalable parallel version of our method, allowing us to compute the summaries of any RDF graph regardless of its size. We actually generalized this framework so that it can be used by any approximate pattern mining algorithm that needs parallelization. Working on this problem introduced us to the issue of measuring the quality of the produced summaries. Given that various algorithms in the literature can be used to summarize RDF graphs, we need to understand which one is better suited for a specific task or a specific RDF KB. The literature lacks widely accepted evaluation criteria and extensive empirical evaluations, hence the need for a method to compare and evaluate the quality of the produced summaries. So, in this thesis, we provide a comprehensive quality framework for RDF graph summarization to cover this gap. The framework allows a better, deeper and more complete understanding of the quality of the different summaries and facilitates their comparison. It is independent of the way RDF summarization algorithms work and makes no assumptions on the type or structure of either the input or the final results. We provide a set of metrics that help us understand not only whether a summary is valid but also how it compares to another in terms of the specified quality characteristics. The framework has the ability, which was experimentally validated, to capture subtle differences among summaries and to produce metrics that reflect them; it was used to provide an extensive experimental evaluation and comparison of our method.
Francis, Nadime. "Vues et requêtes sur les graphes de données : déterminabilité et réécritures." Thesis, Université Paris-Saclay (ComUE), 2015. http://www.theses.fr/2015SACLN015/document.
Graph databases appear naturally in various scenarios, such as social networks and the semantic web. In these cases, the information contained in the database lies as much in the data itself as in the topology of the graph, that is, in how the data points are linked together. This leads to considering traditional database theory questions for query languages that return data nodes based on the paths of the graph connecting them. We focus our attention on the view-based query determinacy and rewriting problems. They ask whether a view of the database contains enough information to fully answer a query without accessing the database directly. If so, we then want to express the answer to the query directly in terms of the view. This setting occurs in many applications, such as data integration and query optimization. We start by comparing these two tasks to other common tasks in this setting: computing certain answers, and checking consistency of a view instance and updating it. We then build on these results in two specific cases. First, we show that for regular path queries, the existence of a monotone rewriting coincides with the existence of a rewriting expressible in Datalog. Then, we show that for views that only consider the lengths of the paths in the graph, we can decide a weaker form of determinacy, called asymptotic determinacy, and produce first-order rewritings for the queries that are asymptotically determined.
Lutz, Quentin. "Graph-based contributions to machine-learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT010.
A graph is a mathematical object that makes it possible to represent relationships (called edges) between entities (called nodes). Graphs have long been a focal point in a number of problems ranging from work by Euler to PageRank and shortest-path problems. In more recent times, graphs have been used for machine learning. With the advent of social networks and the world-wide web, more and more datasets can be represented using graphs. Those graphs are ever bigger, sometimes with billions of edges and billions of nodes. Designing efficient algorithms for analyzing those datasets has thus proven necessary. This thesis reviews the state of the art and introduces new algorithms for the clustering and the embedding of the nodes of massive graphs. Furthermore, in order to facilitate the handling of large graphs and to apply the techniques under study, we introduce Scikit-network, a free and open-source Python library which was developed during the thesis. Many tasks, such as the classification or the ranking of nodes using centrality measures, can be carried out thanks to Scikit-network. We also tackle the problem of labeling data. Supervised machine learning techniques require labeled data to be trained, and the quality of this labeled data heavily influences the quality of the predictions of those techniques once trained. However, building this data cannot be achieved through the sole use of machines and requires human intervention. We study the data labeling problem in a graph-based setting, and we aim at describing the solutions that require as little human intervention as possible. We characterize those solutions and illustrate how they can be applied in real use cases.
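Scikit-network is available on PyPI (pip install scikit-network); a small usage example on a built-in toy dataset follows. The calls shown match the library's documented fit/attribute pattern, though method and attribute names may vary slightly across versions.

```python
# Clustering and ranking with Scikit-network on the karate club graph.
from sknetwork.data import karate_club
from sknetwork.clustering import Louvain
from sknetwork.ranking import PageRank

adjacency = karate_club()      # sparse adjacency matrix (scipy CSR)

louvain = Louvain()
louvain.fit(adjacency)
print(louvain.labels_)         # cluster label of each node

pagerank = PageRank()
pagerank.fit(adjacency)
print(pagerank.scores_)        # PageRank score of each node
```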
Ettaleb, Mohamed. "Approche de recommandation à base de fouille de données et de graphes étiquetés multi-couches : contributions à la RI sociale." Electronic Thesis or Diss., Aix-Marseille, 2020. http://www.theses.fr/2020AIXM0588.
In general, the purpose of a recommendation system is to assist users in selecting relevant elements from a wide range of elements. In the context of the explosion in the number of academic publications available online (books, articles, etc.), providing a personalized recommendation service is becoming a necessity. In addition, automatic book recommendation based on a query is an emerging theme with many scientific challenges. It combines several issues related to information retrieval and data mining for assessing how appropriate it is to recommend a given book. This assessment must take into account the query but also the user profile (reading history, interests, notes and comments associated with previous readings) and the entire collection to which the document belongs. Two main avenues are addressed in this thesis to deal with the problem of automatic book recommendation: identifying the user's intentions from a query, and recommending relevant books according to the user's needs.
Cuenca, Pauta Erick. "Visualisation de données dynamiques et complexes : des séries temporelles hiérarchiques aux graphes multicouches." Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS054/document.
Analyzing data that is increasingly complex, large and drawn from different sources (e.g. the internet, social media, etc.) is a difficult task. It nevertheless remains crucial for many fields of application, since extracting knowledge requires a better understanding of the nature of the data, its evolution, and the many complex relationships it may contain. Information visualization is concerned with visual and interactive representation methods that help a user extract knowledge, and the work presented in this document takes place in that context. First, we are interested in the visualization of large hierarchical time series. After analyzing the different existing approaches, we present MultiStream, a system for visualizing, exploring and comparing the evolution of series organized into a hierarchical structure. We illustrate its use with two examples: emotions expressed in social media, and the evolution of musical genres. Second, we tackle the problem of complex data modeled in the form of multilayer graphs (where different types of edges can connect the nodes). More specifically, we are interested in the visual querying of large graphs, and we present VERTIGo, a system which makes it possible to build queries, to launch them on a specific engine, to visualize and explore the results at different levels of detail, and to suggest new query extensions. We illustrate its use with a graph of co-authors from different communities.
Pech, Palacio Manuel Alfredo. "Spatial data modeling and mining using a graph-based representation." Lyon, INSA, 2005. http://theses.insa-lyon.fr/publication/2005ISAL0118/these.pdf.
We propose a unique graph-based model to represent spatial data, non-spatial data and the spatial relations among spatial objects, and we generate datasets composed of graphs holding these three elements. We consider that, by mining a dataset with these characteristics, a graph-based mining tool can search for patterns involving all these elements at the same time, improving the results of the spatial analysis task. A significant characteristic of spatial data is that the attributes of the neighbors of an object may have an influence on the object itself. We therefore propose to include in the model three relationship types (topological, orientation, and distance relations). In the model, the spatial data (i.e. spatial objects), non-spatial data (i.e. non-spatial attributes) and spatial relations are represented as a collection of one or more directed graphs. A directed graph contains a collection of vertices and edges representing all these elements. Vertices represent either spatial objects, spatial relations between two spatial objects (binary relations), or non-spatial attributes describing the spatial objects. Edges represent a link between two vertices of any type; according to the types of vertices that an edge joins, it can represent either an attribute name or a spatial relation name. The attribute name can refer to a spatial object or a non-spatial entity. We use directed edges to represent directional information in relations among elements (i.e. object x touches object y) and to describe attributes of objects (i.e. object x has attribute z). We propose to adopt the Subdue system, a general graph-based data mining system developed at the University of Texas at Arlington, as our mining tool. A special feature named overlap plays a primary role in the substructure discovery process and consequently has a direct impact on the generated results. However, it is currently implemented in an all-or-nothing way. We therefore propose a third approach, limited overlap, which gives the user the ability to choose the vertices over which overlap will be allowed. Three motivations underlie the implementation of the new algorithm: search space reduction, processing time reduction, and specialized overlapping-pattern-oriented search.
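The vertex/edge conventions described above can be illustrated directly; the following sketch (using networkx, with made-up objects and relations) encodes a spatial object, a non-spatial attribute and a topological relation in one directed graph.

```python
# Encoding spatial objects, attributes and relations as labeled vertices.
import networkx as nx

g = nx.MultiDiGraph()
# Spatial object vertices
g.add_node("park", kind="spatial")
g.add_node("lake", kind="spatial")
# Non-spatial attribute vertex, reached through an attribute-name edge
g.add_node("public", kind="attribute")
g.add_edge("park", "public", label="access")
# Spatial relation vertex (binary relation) linking the two objects
g.add_node("touches_1", kind="relation", relation="touches")  # topological
g.add_edge("park", "touches_1", label="touches")
g.add_edge("touches_1", "lake", label="touches")

for u, v, d in g.edges(data=True):
    print(u, f"--{d['label']}-->", v)
```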
Castets, Mathieu. "Pavages réguliers et modélisation des dynamiques spatiales à base de graphes d'interaction : conception, implémentation, application." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS241/document.
The modelling and simulation of spatial dynamics, particularly for studying landscape changes or environmental issues, raises the question of integrating different forms of spatial representation within the same model. Ocelet is an approach for modelling spatial dynamics based on the original concept of the interaction graph. Such a graph holds both the structure of a relation between entities of a model and the semantics describing its evolution. The relationships between spatial entities are translated into interaction graphs, and these graphs are made to evolve during a simulation. The concepts on which Ocelet is based can potentially handle the two main forms of spatial representation: shapes with contours (vector format) and regular grid cells (raster). The vector format was already integrated in the first version of Ocelet; the integration of raster, and the combination of the two, remained to be studied and carried out. The aim of the thesis is first to study the issues related to the integration of continuous fields and their representation by regular tiling, both in the Ocelet language and in the concepts on which it is based. The dynamic aspects of this integration had to be taken into account, and transitions between different forms of geographic data and interaction graphs had to be studied in the light of the formalized concepts. The concepts were then implemented in the Ocelet modelling platform, with the adaptation of both its compiler and its runtime. Finally, these new concepts and tools were tested on three very different cases: two models on Reunion Island, the first simulating runoff in the Ravine Saint Gilles watershed on the west coast of the island, the other simulating the spread of invasive plants in the high plains inside the Reunion National Park. The last case concerns the spatialization of a crop model, applied here to simulate cereal crop yields in West Africa in the context of an early warning system for regional crop monitoring.
Hiot, Nicolas. "Construction automatique de bases de données pour le domaine médical : Intégration de texte et maintien de la cohérence." Electronic Thesis or Diss., Orléans, 2024. http://www.theses.fr/2024ORLE1026.
The automatic construction of databases in the medical field represents a major challenge for guaranteeing efficient information management and facilitating decision-making. This research project focuses on the use of graph databases, an approach that offers a dynamic representation and efficient querying of data and its topology. Our project explores the convergence between databases and natural language processing, with two central objectives. On the one hand, we focus on maintaining consistency within graph databases during updates, particularly with incomplete data and specific business rules. Maintaining consistency during updates ensures a uniform level of data quality for all users and facilitates analysis. In a world of constant change, we give priority to updates, which may involve modifying the instance to accommodate new information. But how can we effectively manage these successive updates within a graph database management system? On the other hand, we focus on the integration of information extracted from text documents, a major source of data in the medical field. In particular, we look at clinical cases and pharmacovigilance, a crucial area for identifying the risks and adverse effects associated with the use of drugs. How can we detect information in texts? How can this unstructured data be efficiently integrated into a graph database? How can it be structured automatically? And finally, what is a valid structure in this context? We are particularly interested in encouraging reproducible research by adopting a transparent and documented approach that enables independent verification and validation of our results.
Acosta, Francisco. "Les arbres balancés : spécification, performances et contrôle de concurrence." Montpellier 2, 1991. http://www.theses.fr/1991MON20201.
David, Romain. "De la conception d'un système d'observation à large échelle au déploiement et à l'exploitation de son système d'information : application à l'observation des habitats coralligènes et à la colonisation de récifs artificiels (ARMS)." Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0215/document.
In the marine domain, observation protocols developed in many settings produce a large volume of heterogeneous data that is difficult to aggregate and use. This work develops (i) methods, protocols and recommendations to build and/or support the establishment of multi-user monitoring networks, and (ii) innovative uses of data. Two case studies were chosen: coralligenous habitats at the Mediterranean scale, and the colonization of artificial reefs in different regional seas. Large-scale experimentation is based on the simplest possible measurement methods, described very explicitly in standardized terms, on intercalibrated operators, and on a common data-processing method. A mechanism for coupling data from different origins, based on the requalification of heterogeneous descriptive factors, and a method for analysis and data mining based on graph theory are also proposed.
Kamal-Idrissi, Assia. "Optimisation des réseaux aériens : analyse et sélection de nouveaux marchés." Thesis, Université Côte d'Azur, 2020. https://tel.archives-ouvertes.fr/tel-03177526.
In the airline industry, problems are various and complicated. Solving them aims at reducing costs and maximizing revenues. Revenues can be increased while improving the quality of service; for example, one way is to attract new passengers on existing flight connections or on new markets. The selection of new markets consists in determining the network structure to operate and in estimating passenger flows, their choice of itineraries, and the incomes and costs incurred by these decisions. Our research is about improving a market planner engine. Milanamos develops an application for the analysis and simulation of markets intended for airports and airlines; it offers its customers a decision-making tool to analyze historical data and to simulate markets in order to find economic opportunities. This project takes place earlier in the decision process. Thanks to a thorough data analysis, the air transport network could be modeled as a time-independent graph and stored in the Neo4j graph database. We then defined the Flight Radius problem, whose resolution determines a sub-network centered around a flight for which the market shares of the flight are meaningful. Several methods have been proposed, based on queries or on shortest-path algorithms combined with acceleration and parallelism techniques. Our algorithms identify new markets for a flight. Combining graph theory with databases offers new opportunities for analyzing and studying large networks.
Delanaux, Rémy. "Intégration de données liées respectueuse de la confidentialité." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1303.
Individual privacy is a major and largely unexplored concern when publishing new datasets in the context of Linked Open Data (LOD). The LOD cloud forms a network of interconnected and publicly accessible datasets in the form of graph databases modeled using the RDF format and queried using the SPARQL language. This heavily standardized context is nowadays extensively used by academics, public institutions and some private organizations to make their data available. Yet, some industrial and private actors may be discouraged by potential privacy issues. To this end, we introduce and develop a declarative framework for privacy-preserving Linked Data publishing, in which privacy and utility constraints are specified as policies, that is, sets of SPARQL queries. Our approach is data-independent and only inspects the privacy and utility policies in order to determine the sequence of anonymization operations applicable to any graph instance for satisfying the policies. We prove the soundness of our algorithms and gauge their performance through experimental analysis. Another aspect to take into account is that a new dataset published to the LOD cloud is exposed to privacy breaches due to possible linkage with objects already existing in other LOD datasets. In the second part of this thesis, we thus focus on the problem of building safe anonymizations of an RDF graph, guaranteeing that linking the anonymized graph with any external RDF graph will not cause privacy breaches. Given a set of privacy queries as input, we study the data-independent safety problem and the sequence of anonymization operations necessary to enforce it. We provide sufficient conditions under which an anonymization instance is safe given a set of privacy queries. Additionally, we show that our algorithms are robust in the presence of sameAs links, whether explicit or inferred from additional knowledge. To conclude, we evaluate the impact of this safety-preserving solution on given input graphs through experiments, focusing on the performance and the utility loss of this anonymization framework on both real-world and artificial data. We first discuss and select utility measures to compare the original graph to its anonymized counterpart, then define a method to generate new privacy policies from a reference one by inserting incremental modifications. We study the behavior of the framework on four carefully selected RDF graphs. We show that our anonymization technique is effective, with reasonable runtime on quite large graphs (several million triples), and gradual: the more specific the privacy policy is, the smaller its impact is. Finally, using structural graph-based metrics, we show that our algorithms are not very destructive, even when privacy policies cover a large part of the graph. By designing a simple and efficient way to ensure privacy and utility in plausible usages of RDF graphs, this new approach suggests many extensions and, in the long run, more work on privacy-preserving data publishing in the context of Linked Open Data.
Cori, Marcel. "Modèles pour la représentation et l'interrogation de données textuelles et de connaissances." Paris 7, 1987. http://www.theses.fr/1987PA077047.
Lebrun, Justine. "Appariement inexact de graphes appliqué à la recherche d'image et d'objet 3D." Phd thesis, Université de Cergy Pontoise, 2011. http://tel.archives-ouvertes.fr/tel-00643534.
Pennerath, Frédéric. "Méthodes d'extraction de connaissances à partir de données modélisables par des graphes : Application à des problèmes de synthèse organique." Phd thesis, Université Henri Poincaré - Nancy I, 2009. http://tel.archives-ouvertes.fr/tel-00436568.
Ngo, Duy Hoa. "Amélioration de l'alignement d'ontologies par les techniques d'apprentissage automatique, d'appariement de graphes et de recherche d'information." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2012. http://tel.archives-ouvertes.fr/tel-00767318.
Full textOshurko, Ievgeniia. "Knowledge representation and curation in hierarchies of graphs." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEN024.
The task of automatically extracting insights or building computational models from knowledge on complex systems greatly relies on the choice of an appropriate representation. This work makes an effort towards building a framework suitable for the representation of fragmented knowledge on complex systems and its semi-automated curation: continuous collation, integration, annotation and revision. We propose a knowledge representation system based on hierarchies of graphs related by graph homomorphisms. Individual graphs situated in such hierarchies represent distinct fragments of knowledge, and the homomorphisms allow relating these fragments. Their graphical structure can be used efficiently to express entities and their relations. We focus on the design of mathematical mechanisms, based on algebraic approaches to graph rewriting, for the transformation of individual graphs in hierarchies that maintains consistent relations between them. Such mechanisms provide a transparent audit trail, as well as an infrastructure for maintaining multiple versions of knowledge. We describe how the developed theory can be used for building schema-aware graph databases that provide schema-data co-evolution capabilities. The proposed knowledge representation framework is used to build the KAMI (Knowledge Aggregation and Model Instantiation) framework for the curation of cellular signalling knowledge. The framework allows for semi-automated aggregation of individual facts on protein-protein interactions into knowledge corpora, reuse of this knowledge for the instantiation of signalling models in different cellular contexts, and generation of executable rule-based models.
Abbaci, Katia. "Contribution à l'interrogation flexible et personnalisée d'objets complexes modélisés par des graphes." Thesis, Rennes 1, 2013. http://www.theses.fr/2013REN1S105/document.
Several application domains deal with complex objects whose structure and the semantics of whose components are crucial for their handling. For this reason, the graph structure has been adopted in these areas as a representation model, to capture a maximum of the information related to the structure, semantics and behavior of such objects that is necessary for effective representation and processing. Thus, when comparing two complex objects, a matching technique is applied between their graph structures. In this thesis, we are interested in approximate matching techniques, which constitute suitable tools to automatically find and select the graphs most similar to a user's graph query. The aim of our work is to develop methods for the personalized and flexible querying of repositories of complex objects modeled as graphs, and to return the result graphs that best fit the users' needs, often expressed partially and in an imprecise way. First, we propose a flexible approach for web service retrieval that relies both on preference satisfiability and on structural similarity between process model graphs. This approach allows (i) the matching process to be improved by integrating user preferences and the structural aspect of graphs, and (ii) the most relevant services to be returned. A second method for evaluating graph similarity queries is also presented. It retrieves the graph similarity skyline of a user query by considering a vector of several graph distance measures instead of a single measure; the graphs maximally similar to the query graph are thus returned in an ordered way. Finally, refinement methods have been developed to reduce the size of the skyline when it is significantly large. They aim to identify and order the skyline points that best match the user query.
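The skyline over several distance measures is the set of candidates not Pareto-dominated by any other; a minimal sketch (with hypothetical distance values) follows.

```python
# Graph similarity skyline: keep candidates whose distance vectors to the
# query graph are not Pareto-dominated by any other candidate.
def dominates(x, y):
    """x dominates y if it is no worse on every measure and better on one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(candidates):
    """candidates: {graph_id: (dist_1, ..., dist_m)} w.r.t. the query graph."""
    return {g: v for g, v in candidates.items()
            if not any(dominates(w, v) for w in candidates.values())}

# Hypothetical (edit distance, structural distance) pairs:
print(skyline({"g1": (2, 5), "g2": (3, 3), "g3": (4, 6)}))  # g1 and g2 only
```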
Groz, Benoît. "XML security views : queries, updates and schemas." Thesis, Lille 1, 2012. http://www.theses.fr/2012LIL10143/document.
The evolution of web technologies and social trends fostered a shift from traditional enterprise databases to web services and online data. While making data more readily available to users, this evolution also raises additional security concerns regarding the privacy of users, and more generally the disclosure of sensitive information. The implementation of appropriate access control models is one of the approaches to mitigating the threat. We investigate an access control model based on (non-materialized) XML views, as presented among others by Fan et al. The simplicity of such views, and in particular the absence of arithmetic features and restructuring, facilitates their modeling with tree alignments. Our objective is therefore to investigate how to manipulate such views efficiently, using formal methods, especially query rewriting and tree automata. Our research follows essentially three directions: we first develop new algorithms to assess the expressivity of views, in terms of determinacy, query rewriting and certain answers. We show that those problems, although undecidable in our most general setting, can be decided under reasonable restrictions. Then we address the problem of handling updates in the security view framework. And last, we investigate the classical issues raised by schemata, focusing on the specific "determinism" requirements of DTDs and XML Schemas. In particular, we survey some techniques to approximate the set of all possible view documents with a DTD, and we provide new algorithms to check whether the content models of a DTD are deterministic.
Vigny, Alexandre. "Query enumeration and nowhere dense graphs." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCC211.
The topic of my thesis lies at the intersection of complexity, algorithmics and logic. In particular, we are interested in the complexity of query evaluation. More precisely, given a finite graph G, a query q defines a subset of k-tuples of vertices of G that we denote q(G). We call k the arity of q, and we then try to efficiently perform the following tasks: 1) decide whether the set q(G) is empty; 2) decide whether a given k-tuple belongs to the set of solutions q(G); 3) count the number of solutions; 4) enumerate the elements of q(G). Regarding the fourth task, an algorithm that enumerates the solutions can be decomposed into two steps. The first, called preprocessing, prepares the enumeration; ideally this step requires only time linear in the size of the graph. The second step is the enumeration proper. The time needed to produce a new solution is called the delay; ideally we want the delay to depend not on the size of the graph but only on the size of the query. We then speak of constant delay enumeration after linear preprocessing. At the beginning of this thesis, a large part of the open questions about the classes of graphs for which constant delay enumeration is possible centered on the classes of nowhere dense graphs.
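As a toy illustration of the two-step scheme (not one of the thesis's algorithms), consider the query "pairs (u, v) with an edge from u to v, where u is labeled a and v is labeled b": a linear pass builds an index from which each answer is then produced with delay independent of the graph size.

```python
# Constant delay enumeration after linear preprocessing, on a toy query.
def preprocess(edges, label):
    """Linear pass: for each 'a'-labeled vertex, keep its 'b'-labeled successors."""
    index = []
    for u, succs in edges.items():
        if label[u] != "a":
            continue
        bs = [v for v in succs if label[v] == "b"]
        if bs:                      # drop empty lists so the delay stays O(1)
            index.append((u, bs))
    return index

def enumerate_answers(index):
    """Each next answer comes with delay independent of the graph size."""
    for u, bs in index:
        for v in bs:
            yield (u, v)

label = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
edges = {1: [3, 5], 2: [4], 5: [3]}
print(list(enumerate_answers(preprocess(edges, label))))  # [(1, 3), (2, 4)]
```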
Pradel, Camille. "D'un langage de haut niveau à des requêtes graphes permettant d'interroger le web sémantique." Toulouse 3, 2013. http://thesesups.ups-tlse.fr/2237/.
Graph models are suitable candidates for knowledge representation on the Web, where everything is a graph: from the graph of machines connected to the Internet, the "Giant Global Graph" as described by Tim Berners-Lee, to RDF graphs and ontologies. In that context, the ontological query answering problem is the following: given a knowledge base, composed of a terminological component and an assertional component, and a query, does the knowledge base imply the query, i.e. is there an answer to the query in the knowledge base? Recently, new description logic languages have been proposed in which the ontological expressivity is restricted so that query answering becomes tractable; the most prominent members are the DL-Lite and EL families. In the same way, the OWL-DL language has been restricted, and this has led to OWL 2, based on the DL-Lite and EL families. We work in the framework of graph formalisms for knowledge representation (RDF, RDF-S and OWL) and interrogation (SPARQL). Even if graph-based query languages have long been presented as a natural and intuitive way of expressing information needs, end-users do not think of their queries in terms of graphs. They need simple languages that are as close as possible to natural language, or at least mainly limited to keywords. We propose a generic way of translating a query expressed in a high-level language into the SPARQL query language, by means of query patterns. The beginning of this work coincides with the W3C's initiative to prepare a possible new version of RDF and its ongoing standardization of SPARQL 1.1 with entailment regimes.
Conde, Cespedes Patricia. "Modélisations et extensions du formalisme de l'analyse relationnelle mathématique à la modularisation des grands graphes." Paris 6, 2013. http://www.theses.fr/2013PA066654.
Graphs are the mathematical representation of networks. Since a graph is a special type of binary relation, graph clustering (or modularization) can be mathematically modelled using Mathematical Relational Analysis. This modelling allows numerous graph clustering criteria to be compared on the same type of formal representation. Through a relational coding, we show how to compare different modularization criteria such as Newman-Girvan, Zahn-Condorcet, Owsinski-Zadrozny, Demaine-Immorlica, Wei-Cheng, Profile Difference and Michalski-Goldberg. We introduce three modularization criteria: the Balanced Modularity, the deviation to Indetermination and the deviation to Uniformity. We identify the properties verified by those criteria and, for some of them, especially linear criteria, we characterize the partitions obtained by their optimization. The final goal is to facilitate their understanding and their usefulness in practical contexts, where their purposes become easily interpretable and understandable. Our results are tested by modularizing real networks of different sizes with the generalized Louvain algorithm.
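For reference, the first of the compared criteria, Newman-Girvan modularity, is commonly written as follows (standard formulation, not quoted from the manuscript), where m is the total number of edges, m_c the number of edges inside cluster c, and d_c the sum of the degrees of the vertices of c:

```latex
Q \;=\; \sum_{c=1}^{k}\left[\frac{m_c}{m}-\left(\frac{d_c}{2m}\right)^{2}\right]
```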
Kooli, Nihel. "Rapprochement de données pour la reconnaissance d'entités dans les documents océrisés." Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0108/document.
This thesis focuses on entity recognition in documents produced by OCR, driven by a database. An entity is a homogeneous group of attributes, such as an enterprise in a business form described by its name, address, contact numbers, etc., or the metadata of a scientific paper, representing the title, the authors and their affiliations, etc. Given a database which describes entities through its records, and a document which contains one or more entities from this database, we seek to identify the entities in the document using the database. This work is motivated by an industrial application which aims to automate the processing of document images arriving in a continuous stream. We addressed this problem as a matching issue between the document and the database contents. The difficulties of this task are due to the variability of the representation of entity attributes in the database and in the document, and to the presence of similar attributes in different entities. Added to this are record redundancy and typing errors in the database, and the alteration of the structure and content of the document caused by OCR. To deal with these problems, we opted for a two-step approach: entity resolution and entity recognition. The first step links the records referring to the same entity and synthesizes them into an entity model. For this purpose, we proposed a supervised approach based on a combination of several similarity measures between attributes; these measures tolerate character mistakes and take word permutations into account. The second step matches the entities mentioned in documents with the resulting entity model. We proceeded in two different ways, one using content matching and the other integrating structure matching. For content matching, we proposed two methods: M-EROCS and ERBL. M-EROCS, an improvement/adaptation of a state-of-the-art method, matches OCR blocks with the entity model based on a score that tolerates OCR errors and attribute variability. ERBL labels the document with the entity attributes and groups these labels into entities. Structure matching exploits the structural relationships between the entity labels to correct mislabeling; the proposed method, called G-ELSE, is based on local structure graph matching with a structural model learned for this purpose. This thesis was carried out in collaboration with the ITESOFT-Yooz company, and we experimented with all the proposed steps on two administrative corpora and a third one extracted from the web.
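As an illustration of an attribute similarity that tolerates typos and word permutations (one plausible choice, not necessarily among the thesis's measures), a token-sort ratio can be built on the standard library alone:

```python
# Token-sort similarity: sort words first, then compare character sequences.
from difflib import SequenceMatcher

def token_sort_similarity(a: str, b: str) -> float:
    """'Dupont Jean' ~ 'Jean Dupond' despite the swap and the typo."""
    norm = lambda s: " ".join(sorted(s.lower().split()))
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

print(token_sort_similarity("Dupont Jean", "Jean Dupond"))  # ~0.91
```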
Mougel, Pierre-Nicolas. "Finding homogeneous collections of dense subgraphs using constraint-based data mining approaches." Thesis, Lyon, INSA, 2012. http://www.theses.fr/2012ISAL0073.
The work presented in this thesis deals with data mining approaches for the analysis of attributed graphs. An attributed graph is a graph where properties, encoded by means of attributes, are associated with each vertex. In such data, our objective is the discovery of subgraphs formed by several dense groups of vertices that are homogeneous with respect to the attributes. More precisely, we define the constraint-based extraction of collections of densely connected subgraphs whose vertices share enough attributes. To this aim, we propose two new classes of patterns, along with sound and complete algorithms to compute them efficiently using constraint-based approaches. The first family of patterns, named Maximal Homogeneous Clique Sets (MHCS), contains patterns satisfying constraints on the number of dense subgraphs, on the size of these subgraphs, and on the number of shared attributes. The second class of patterns, named Collections of Homogeneous k-clique Percolated components (CoHoP), is based on a relaxed notion of density in order to handle missing values. Both approaches are used for the analysis of scientific collaboration networks and protein-protein interaction networks. The extracted patterns exhibit structures useful in a decision support process: in a scientific collaboration network, the analysis of such structures might give hints for proposing new collaborations between researchers working on the same subjects, while in a protein-protein interaction network, the analysis of the extracted patterns can be used to study the relationships between modules of proteins involved in similar biological situations. The analysis of the performance, on real and synthetic data, with respect to different attributed graph characteristics, shows that the proposed approaches scale well for large datasets.
Taraviras, Stavros. "Évaluation de la diversité moléculaire des bases de données de molécules à intérêt pharmaceutique, en utilisant la théorie des graphes chimiques." Nice, 2000. http://www.theses.fr/2000NICE5472.
Full textDkhil, Abdellatif. "Identification systématique de structures visuelles de flux physique de production." Strasbourg, 2011. http://www.theses.fr/2011STRA6012.
This research is motivated by the competitive environment of manufacturing companies. It mainly concerns the design of physical production systems and, more specifically, the preliminary design phase. This phase is particularly sensitive and plays a major role, since different points of view can be considered when producing the conceptual design. Only one point of view, concerning the static production flow, is considered in this work. To generate a conceptual design from this point of view, a usual method of conceptual design elaboration, introduced in much of the literature, is used. It takes the form of a data-processing chain built from three main activities. The first activity extracts flow data from product routing data. During the second activity, analysis properties are used to analyze the flow data; the single or combined analysis results are called visual structures. The third activity draws the production flow graph using the visual structures. After a literature review, 44 analysis properties are obtained, from which 1.75 x 10^13 possible visual structures, and as many production flow graphs, can be deduced. Recognizing this, a scientific problem of model reduction based on expert knowledge is defined. Here, model reduction is a restriction process based on expert rules and validated with industrial data. Through this restriction process, three contributions are proposed. The first concerns the identification of the referential analysis properties, considered the most useful and relevant in the preliminary design phase. The second allows the identification of referential visual structures. The third contribution is a method to automatically identify particular visual structures. In order to evaluate these contributions, an industrial case study is proposed.
Rahman, Md Rashedur. "Knowledge Base Population based on Entity Graph Analysis." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS092/document.
Full textKnowledge Base Population (KBP) is an important and challenging task, especially when it has to be done automatically. The objective of the KBP task is to build a collection of facts about the world. A Knowledge Base (KB) contains different entities, relationships among them and various properties of the entities. Relation extraction (RE) between a pair of entity mentions in text plays a vital role in the KBP task. RE is also a challenging task, especially for open-domain relations. Generally, relations are extracted based on lexical and syntactic information at the sentence level. However, global information about known entities had not yet been explored for the RE task. We propose to extract a graph of entities from the overall corpus and to compute features on this graph that can capture evidence of relationships holding between a pair of entities. In order to evaluate the relevance of the proposed features, we tested them on a relation validation task, which examines the correctness of relations extracted by different RE systems. Experimental results show that the proposed features lead to outperforming the state-of-the-art system.
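To make the notion of graph features concrete, here is a hedged sketch of the kind of evidence an entity graph can provide for a candidate relation between two entities; the specific feature set is an illustrative assumption, not the exact features proposed in the thesis.

```python
# Illustrative graph features over an entity co-occurrence graph that
# could feed a relation-validation classifier.
import networkx as nx

def entity_pair_features(G, e1, e2):
    feats = {"common_neighbors": len(list(nx.common_neighbors(G, e1, e2)))}
    try:
        feats["shortest_path"] = nx.shortest_path_length(G, e1, e2)
    except nx.NetworkXNoPath:
        feats["shortest_path"] = -1  # disconnected pair
    feats["jaccard"] = next(nx.jaccard_coefficient(G, [(e1, e2)]))[2]
    return feats

G = nx.Graph([("Paris", "France"), ("France", "EU"), ("Paris", "EU")])
print(entity_pair_features(G, "Paris", "France"))
```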
Ayed, Rihab. "Recherche d’information agrégative dans des bases de graphes distribuées." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1305.
Full textIn this research, we investigate issues related to query evaluation and optimization in the framework of aggregated search. Aggregated search is a new paradigm for accessing massively distributed information. It aims to produce answers to queries by combining fragments of information from different sources. The queries search for objects (documents) that do not exist as such in the targeted sources, but are built from fragments extracted from the different sources. The sources might not be specified in the query expression; they are dynamically discovered at runtime. In our work, we consider data dependencies to propose a framework for optimizing query evaluation over distributed graph-oriented data sources. For this purpose, we propose an approach for the document indexing/organizing process of aggregated search systems. We consider information retrieval systems that are graph-oriented (RDF graphs). Since it uses graph relationships, our work falls within relational aggregated search, where relationships are used to aggregate fragments of information. Our goal is to optimize access to the sources of information in an aggregated search system. These sources contain fragments of information that are only partially relevant to the query. We aim at minimizing the number of sources to query and at maximizing the aggregation operations within a same source. To this end, we propose to reorganize the graph database(s) into partitions dedicated to aggregated queries, using a semantic or structural clustering of RDF predicates. For structural clustering, we propose to use frequent subgraph mining algorithms, for which we performed a comparative study of their performance. For semantic clustering, we use the descriptive metadata of RDF predicates and apply semantic textual similarity methods to calculate their relatedness. Following the clustering, we define query decomposition rules based on the semantic/structural aspects of RDF predicates.
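The sketch below gives a minimal, hedged rendition of the semantic clustering step: predicate labels are embedded with TF-IDF and grouped by agglomerative clustering. TF-IDF is a crude stand-in for the semantic textual similarity methods of the thesis, and the predicate labels are invented for the example.

```python
# Crude stand-in for semantic clustering of RDF predicates: TF-IDF
# vectors of predicate labels, grouped by agglomerative clustering.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

predicates = ["place of birth", "birth place",
              "date of publication", "publication date"]
X = TfidfVectorizer().fit_transform(predicates).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(dict(zip(predicates, labels)))
```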
Jakawat, Wararat. "Graphs enriched by Cubes (GreC) : a new approach for OLAP on information networks." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2087/document.
Full textOnline Analytical Processing (OLAP) is one of the most important technologies in data warehouse systems, enabling multidimensional analysis of data. It is a very powerful and flexible analysis tool for exploring data in depth through computation. OLAP has been the subject of improvements and extensions to address each new kind of domain and data, for instance multimedia, spatial data and sequence data. Basically, OLAP was introduced to analyze classical structured data, but information networks are yet another interesting domain. Extracting knowledge from large networks is a complex task, and the networks are too big to be grasped as a whole; OLAP analysis can therefore provide a more condensed view. Many kinds of information networks can help users with various activities in different domains. In this scenario, we consider bibliographic networks built from bibliographic databases. These data allow analyzing not only publications but also collaborations between authors. Several research works and proposals apply OLAP technologies to information networks, an approach called Graph OLAP. Many Graph OLAP techniques are based on a cube of graphs. In this thesis, we propose a new approach for Graph OLAP, namely graphs enriched by cubes (GreC). In a different and complementary way, our proposal consists in enriching graphs with cubes: the nodes and/or edges of the considered network are described by a cube. This allows interesting analyses for the user, who can navigate within a graph enriched by cubes at different granularity levels, with dedicated operators. In addition, there are four main aspects in GreC. First, GreC takes into account the structure of the network in order to perform topological OLAP operations, and not only classical or informational OLAP operations. Second, GreC offers a global view of the considered network with multidimensional information. Third, the slowly changing dimension problem is taken into account when exploring a network. Lastly, GreC supports analyzing the evolution of a network, since our approach allows observing evolution through the time dimensions in the cubes. To evaluate GreC, we implemented our approach and performed an experimental study on a real bibliographic dataset to show the interest of our proposal. The GreC approach includes several algorithms, whose relevance and performance we also validated experimentally.
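As a loose illustration of the "graph enriched by cubes" idea, the hedged sketch below attaches a small publication cube to a node of a co-authorship graph and performs a roll-up on it; the dataset, the `cube` node property and the operations are assumptions made for the example, not GreC's actual operators.

```python
# Minimal sketch: a co-authorship node enriched by a cube of publication
# counts (year x venue), with a roll-up over venues.
import networkx as nx
import pandas as pd

pubs = pd.DataFrame({"year": [2014, 2014, 2015],
                     "venue": ["VLDB", "EDBT", "VLDB"],
                     "count": [2, 1, 3]})
cube = pubs.pivot_table(index="year", columns="venue",
                        values="count", aggfunc="sum", fill_value=0)

G = nx.Graph()
G.add_node("author_A", cube=cube)                 # node enriched by a cube
print(G.nodes["author_A"]["cube"].sum(axis=1))    # roll-up over the venue dimension
```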
Echbarthi, Ghizlane. "Big Graph Processing : Partitioning and Aggregated Querying." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE1225/document.
Full textWith the advent of big data, many repercussions have been felt in all fields of information technology, calling for innovative solutions offering the best compromise between cost and accuracy. In graph theory, where graphs provide a powerful modeling support for formalizing problems ranging from the simplest to the most complex, the resolution of NP-complete or NP-hard problems is directed towards approximate solutions, favoring approximation algorithms and heuristics, since exact solutions become extremely expensive and impossible to use in practice. In this thesis we address two main problems. First, the problem of graph partitioning is approached from a big data perspective, where massive graphs are partitioned in a streaming fashion. We study and propose several streaming partitioning models and evaluate their performance both theoretically and empirically. Second, we are interested in querying distributed/partitioned graphs. In this context, we study the problem of aggregated search in graphs, which aims to answer queries that interrogate several graph fragments and to reconstruct the final answer so that it approximately matches the initial query.
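For the streaming-partitioning theme, here is a hedged sketch of one well-known heuristic of that family, Linear Deterministic Greedy (LDG), which assigns each arriving vertex to the partition holding most of its already-placed neighbors, penalized by partition fullness; it is one representative model, not necessarily among those studied in the thesis.

```python
# Streaming partitioning in the spirit of Linear Deterministic Greedy.
def ldg_partition(neighbors, vertex_stream, k, capacity):
    parts = [set() for _ in range(k)]
    assign = {}
    for v in vertex_stream:
        def gain(i):
            placed = sum(1 for u in neighbors[v] if assign.get(u) == i)
            return placed * (1 - len(parts[i]) / capacity)
        # ties broken in favor of the least loaded partition
        best = max(range(k), key=lambda i: (gain(i), -len(parts[i])))
        parts[best].add(v)
        assign[v] = best
    return assign

neighbors = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(ldg_partition(neighbors, [1, 2, 3, 4], k=2, capacity=3))
```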
Kooli, Nihel. "Rapprochement de données pour la reconnaissance d'entités dans les documents océrisés." Electronic Thesis or Diss., Université de Lorraine, 2016. http://www.theses.fr/2016LORR0108.
Full textThis thesis focuses on entity recognition in OCRed documents, driven by a database. An entity is a homogeneous group of attributes, such as an enterprise in a business form described by its name, address, contact numbers, etc., or the metadata of a scientific paper comprising the title, the authors and their affiliations, etc. Given a database which describes entities through its records and a document which contains one or more entities from this database, we seek to identify the entities in the document using the database. This work is motivated by an industrial application which aims to automate the processing of document images arriving in a continuous stream. We addressed this problem as a matching issue between the document and the database contents. The difficulties of this task are due to the variability of the representation of entity attributes in the database and in the document, and to the presence of similar attributes in different entities. Added to this are record redundancy and typing errors in the database, and the alteration of the structure and content of the document caused by OCR. To deal with these problems, we opted for a two-step approach: entity resolution and entity recognition. The first step links the records referring to the same entity and synthesizes them into an entity model. For this purpose, we proposed a supervised approach based on a combination of several similarity measures between attributes. These measures tolerate character errors and take word permutations into account. The second step matches the entities mentioned in documents against the resulting entity model. We proceeded in two different ways: one uses content matching, the other integrates structure matching. For content matching, we proposed two methods: M-EROCS and ERBL. M-EROCS, an improvement/adaptation of a state-of-the-art method, matches OCR blocks with the entity model based on a score that tolerates OCR errors and attribute variability. ERBL labels the document with the entity attributes and groups these labels into entities. Structure matching exploits the structural relationships between entity labels to correct mislabeling. The proposed method, called G-ELSE, is based on local structure graph matching with a structural model learned for this purpose. This thesis was carried out in collaboration with the ITESOFT-Yooz company, and all the proposed steps were experimented on two administrative corpora and a third one extracted from the web.
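The entity-resolution step combines several attribute similarity measures; the hedged sketch below shows two such measures, one tolerant to character errors and one to word permutations, combined with an arbitrary weight. The combination scheme and the weight are illustrative assumptions, not the thesis's learned combination.

```python
# Two attribute-similarity measures and a weighted combination.
from difflib import SequenceMatcher

def char_sim(a, b):                      # tolerant to character errors
    return SequenceMatcher(None, a, b).ratio()

def token_set_sim(a, b):                 # tolerant to word permutations
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def attribute_sim(a, b, w=0.5):
    return w * char_sim(a, b) + (1 - w) * token_set_sim(a, b)

print(attribute_sim("Dupont Jean", "Jean Dupond"))
```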
Galicia Auyón, Jorge Armando. "Revisiting Data Partitioning for Scalable RDF Graph Processing. Combining Graph Exploration and Fragmentation for RDF Processing. Query Optimization for Large Scale Clustered RDF Data. RDFPart-Suite: Bridging Physical and Logical RDF Partitioning. Reverse Partitioning for SPARQL Queries: Principles and Performance Analysis. Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System? EXGRAF: Exploration et Fragmentation de Graphes au Service du Traitement Scalable de Requêtes RDF." Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2021. http://www.theses.fr/2021ESMA0001.
Full textThe Resource Description Framework (RDF) and SPARQL are very popular graph-based standards initially designed to represent and query information on the Web. The flexibility offered by RDF motivated its use in other domains, and today RDF datasets are great information sources. They gather billions of triples in Knowledge Graphs that must be stored and efficiently exploited. The first generation of RDF systems was built on top of traditional relational databases. Unfortunately, performance in these systems degrades rapidly, as the relational model is not suitable for handling RDF data, which is inherently represented as a graph. Native and distributed RDF systems seek to overcome this limitation. The former mainly use indexing as an optimization strategy to speed up queries; distributed and parallel RDF systems resort to data partitioning. In the relational model, the logical representation of the database is crucial to the design of data partitions. The logical layer, which defines the explicit schema of the database, provides a degree of comfort to database designers: it lets them choose, manually or automatically (through advisors), the tables and attributes to be partitioned, and it allows the core partitioning concepts to remain constant regardless of the database management system. This design scheme is no longer valid for RDF databases, essentially because the RDF model does not enforce an explicit schema, since RDF data is mostly implicitly structured. Thus, the logical layer is inexistent and data partitioning depends strongly on the physical implementation of the triples on disk. This situation leads to different partitioning logics depending on the target system, which is quite different from the relational model's perspective. In this thesis, we promote the novel idea of performing data partitioning at the logical level in RDF databases. To this end, we first process the RDF data graph to support logical entity-based partitioning. After this preparation, we present a partitioning framework built upon these logical structures, accompanied by data fragmentation, allocation, and distribution procedures. This framework was incorporated into a centralized (RDF_QDAG) and a distributed (gStoreD) triple store. We conducted several experiments that confirmed the feasibility of integrating our framework into existing systems, improving their performance for certain queries. Finally, we designed a set of RDF data partitioning management tools, including a data definition language (DDL) and an automatic partitioning wizard.
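To picture what logical entity-based partitioning means, the hedged sketch below groups triples into fragments by the type of their subject; the triple set and the fragmentation criterion are simplified assumptions, far from the full framework of the thesis.

```python
# Toy entity-based fragmentation of an RDF triple set by subject type.
from collections import defaultdict

triples = [("alice", "rdf:type", "Person"),
           ("alice", "knows", "bob"),
           ("paper1", "rdf:type", "Article"),
           ("paper1", "author", "alice")]

subject_type = {s: o for s, p, o in triples if p == "rdf:type"}
fragments = defaultdict(list)
for s, p, o in triples:
    fragments[subject_type.get(s, "Unknown")].append((s, p, o))
print(dict(fragments))
```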
Simonne, Lucas. "Mining differential causal rules in knowledge graphs." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG008.
Full textThe mining of association rules within knowledge graphs is an important area of research. Indeed, this type of rule makes it possible to represent knowledge, and applying such rules makes it possible to complete a knowledge graph by adding missing triples or to remove erroneous triples. However, these rules express associations and do not allow the expression of causal relations, whose semantics differ from those of an association or a correlation. In a system, a causal link between variable A and variable B is a relationship oriented from A to B; it indicates that a change in A causes a change in B, with the other variables in the system maintaining the same values. Several frameworks exist for determining causal relationships, including the potential outcome framework, which involves matching similar instances with different values on a variable named the treatment to study the effect of that treatment on another variable named the outcome. In this thesis, we propose several approaches to define rules representing a causal effect of a treatment on an outcome. This effect can be local, i.e., valid for a subset of instances of a knowledge graph defined by a graph pattern, or average, i.e., valid on average for the whole set of graph instances. The discovery of these rules is based on the potential outcome framework, matching similar instances and comparing their RDF descriptions or their vector representations learned through graph embedding models.
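The hedged sketch below illustrates the matching flavor of the potential outcome framework: each treated instance is paired with its nearest untreated neighbor in an embedding space and the outcome differences are averaged. It is a generic rendition, not the rule-discovery algorithms of the thesis.

```python
# Average treatment effect by nearest-neighbor matching in embedding space.
import numpy as np

def matched_ate(embeddings, treated, outcome):
    treated_idx = [i for i, t in enumerate(treated) if t]
    control_idx = [i for i, t in enumerate(treated) if not t]
    effects = []
    for i in treated_idx:
        dists = [np.linalg.norm(embeddings[i] - embeddings[j]) for j in control_idx]
        j = control_idx[int(np.argmin(dists))]  # closest untreated instance
        effects.append(outcome[i] - outcome[j])
    return float(np.mean(effects))

emb = np.array([[0.0, 0.1], [0.0, 0.2], [1.0, 1.0]])
print(matched_ate(emb, treated=[True, False, False], outcome=[3.0, 1.0, 5.0]))
```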
François, Hélène. "Synthèse de la parole par concaténation d'unités acoustiques : construction et exploitation d'une base de parole continue." Rennes 1, 2002. http://www.theses.fr/2002REN10127.
Full text
Alchicha, Élie. "Confidentialité Différentielle et Blowfish appliquées sur des bases de données graphiques, transactionnelles et images." Thesis, Pau, 2021. http://www.theses.fr/2021PAUU3067.
Full textDigital data plays a crucial role in our daily life: communicating, saving information, expressing our thoughts and opinions, and capturing our precious moments as digital pictures and videos. Digital data has enormous benefits in all aspects of modern life, but it also poses a threat to our privacy. In this thesis, we consider three types of online digital data generated by users of social media and e-commerce customers: graphs, transactional data, and images. The graphs are records of the interactions between users that help companies understand who the influential users in their surroundings are. The photos posted on social networks are an important source of data that takes effort to exploit. The transactional datasets represent the operations that occurred on e-commerce services. We rely on a privacy-preserving technique called Differential Privacy (DP) and its generalization Blowfish Privacy (BP) to propose several solutions for data owners to benefit from their datasets without the risk of a privacy breach that could lead to legal issues. These techniques are based on the idea of hiding the existence or non-existence of any element in the dataset (tuple, row, edge, node, image, vector, ...) by adding small noise to the output, so as to provide a good balance between privacy and utility. In the first use case, we focus on graphs by proposing three different mechanisms to protect users' personal data before analyzing the datasets. The first mechanism addresses a scenario for protecting the connections between users (the edges in the graph) with a new approach where users have different privileges: VIP users need a higher level of privacy than standard users. The scenario for the second mechanism is centered on protecting a group of people (subgraphs) instead of nodes or edges, in a more advanced type of graph called dynamic graphs, where nodes and edges may change in each time interval. In the third scenario, we keep focusing on dynamic graphs, but this time the adversaries are more aggressive than in the previous two scenarios, as they plant fake accounts in the dynamic graphs to connect to honest users and try to reveal their representative nodes in the graph. In the second use case, we contribute to the domain of transactional data by examining an existing mechanism called Safe Grouping. It relies on grouping the tuples in such a way as to hide the correlations between them that an adversary could use to breach users' privacy. On the other hand, these correlations are important for data owners in analyzing the data, for instance to understand who might be interested in similar products, goods or services. For this reason, we propose a new mechanism that exposes these correlations in such datasets, and we prove that the level of privacy it provides is similar to that of Safe Grouping. The third use case concerns the images posted by users on social networks. We propose a privacy-preserving mechanism that allows data owners to classify the elements in the photos without revealing sensitive information. We present a scenario of extracting the sentiments on faces while forbidding adversaries from recognizing the identity of the persons. For each use case, we present the results of experiments showing that our algorithms provide a good balance between privacy and utility and that they outperform existing solutions in at least one of these two respects.
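As background for the noise-addition idea mentioned above, here is a minimal sketch of the Laplace mechanism that underlies many DP solutions: noise scaled to sensitivity/epsilon is added to a query answer, here a node degree, hiding any single edge's presence or absence. The parameter values are illustrative, and this is not the specific mechanisms proposed in the thesis.

```python
# Laplace mechanism: a standard DP building block (illustrative values).
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(scale=sensitivity / epsilon)

degree = 42  # true degree of a node in the graph
print(round(laplace_mechanism(degree, sensitivity=1, epsilon=0.5), 2))
```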
Alborzi, Seyed Ziaeddin. "Automatic Discovery of Hidden Associations Using Vector Similarity : Application to Biological Annotation Prediction." Electronic Thesis or Diss., Université de Lorraine, 2018. http://www.theses.fr/2018LORR0035.
Full textThis thesis presents: 1) the development of a novel approach to find direct associations between pairs of elements linked indirectly through various common features, 2) the use of this approach to directly associate biological functions with protein domains (ECDomainMiner and GODomainMiner) and to discover domain-domain interactions, and finally 3) the extension of this approach to comprehensively annotate protein structures and sequences. ECDomainMiner and GODomainMiner are two applications for discovering new associations of EC numbers and GO terms, respectively, with protein domains. They find a total of 20,728 and 20,318 non-redundant EC-Pfam and GO-Pfam associations, respectively, with F-measures of more than 0.95 with respect to a “Gold Standard” test set extracted from InterPro. Compared to around 1,500 manually curated associations in InterPro, ECDomainMiner and GODomainMiner yield a 13-fold increase in the number of available EC-Pfam and GO-Pfam associations. These function-domain associations are then used to annotate thousands of protein structures and millions of protein sequences whose domain composition is known but which currently lack experimental functional annotations. Using the inferred function-domain associations and considering taxonomy information, thousands of annotation rules have been automatically generated and then utilized to annotate millions of protein sequences in the TrEMBL database.
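The core vector-similarity idea can be pictured with the hedged sketch below: an EC number and a Pfam domain are each represented as a vector over shared features (here, the proteins they occur in), and a high cosine similarity suggests a direct association. The vectors and the bare scoring are illustrative assumptions, not the scoring scheme of ECDomainMiner or GODomainMiner.

```python
# Cosine similarity between feature vectors of an EC number and a Pfam domain.
import numpy as np

proteins = ["P1", "P2", "P3", "P4"]            # shared feature space
ec_vec   = np.array([1, 1, 0, 1])              # proteins carrying the EC number
pfam_vec = np.array([1, 1, 0, 0])              # proteins containing the Pfam domain

cosine = ec_vec @ pfam_vec / (np.linalg.norm(ec_vec) * np.linalg.norm(pfam_vec))
print(f"association score: {cosine:.2f}")
```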
Ounis, Iadh. "Un modèle d'indexation relationnel pour les graphes conceptuels fondé sur une interprétation logique." Phd thesis, Université Joseph Fourier (Grenoble), 1998. http://tel.archives-ouvertes.fr/tel-00004902.
Full text
Ahmadi, Naser. "A framework for the continuous curation of a knowledge base system." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS320.
Full textEntity-centric knowledge graphs (KGs) are becoming increasingly popular for gathering information about entities. The schemas of KGs are semantically rich, with many different types and predicates to define the entities and their relationships. These KGs contain knowledge whose exploitation requires an understanding of the KG's structure and patterns. Their rich data structure can express entities with semantic types and relationships, oftentimes domain-specific, that must be made explicit and understood to get the most out of the data. Although different applications can benefit from such rich structure, this comes at a price. A significant challenge with KGs is the quality of their data; without high-quality data, applications cannot use the KG. However, as a result of the automatic creation and updating of KGs, they contain a lot of noisy and inconsistent data, and because of the large number of triples in a KG, manual validation is impossible. In this thesis, we present different tools that can be utilized in the process of continuous creation and curation of KGs. We first present an approach designed to create a KG in the accounting field by matching entities. We then introduce methods for the continuous curation of KGs. We present an algorithm for conditional rule mining and apply it to large graphs. Next, we describe RuleHub, an extensible corpus of rules for public KGs which provides functionalities for the archival and retrieval of rules. We also report methods for using logical rules in two different applications: teaching soft rules to pre-trained language models (RuleBert) and explainable fact checking (ExpClaim).
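To give a flavor of rule mining over a KG, the hedged sketch below measures the confidence of a simple Horn rule on a toy triple set; conditional rules as mined in the thesis carry extra conditions, so this shows only the basic quantity such miners build on, not RuleHub's actual API.

```python
# Confidence of livesIn(x, y) <- worksAt(x, z), locatedIn(z, y).
triples = {("anna", "worksAt", "acme"), ("acme", "locatedIn", "paris"),
           ("anna", "livesIn", "paris"), ("bob", "worksAt", "acme")}

support = matches = 0
for x, p, z in triples:
    if p != "worksAt":
        continue
    for z2, p2, y in triples:
        if p2 == "locatedIn" and z2 == z:
            matches += 1
            support += (x, "livesIn", y) in triples  # body instance with true head
print(f"confidence: {support}/{matches}")
```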
Hoonakker, Frank. "Graphes condensés de réactions, applications à la recherche par similarité, la classification et la modélisation." Université Louis Pasteur (Strasbourg) (1971-2008), 2008. https://publication-theses.unistra.fr/restreint/theses_doctorat/2008/HOONAKKER_Frank_2008.pdf.
Full textThis work is devoted to the development of new methods for mining chemical reactions based on the Condensed Graph of Reaction (CGR) approach. A CGR integrates information about all reactants and products of a given chemical reaction into one 2D molecular graph. Through the application of both conventional (single, double, etc.) and dynamical (single to double, broken single, etc.) bond types, a CGR "condenses" a reaction (involving many molecules) into one pseudo-molecule. This formally allows one to apply to CGRs the chemoinformatics approaches earlier developed for individual compounds. Three possible applications of CGRs were considered: unsupervised classification of reactions based on clustering algorithms; reaction similarity search; and Quantitative Structure-Reactivity Relationships (QSRR). Model calculations performed on four databases containing from 1,000 to 200,000 reactions demonstrated the high efficiency of the developed approaches and software tools. A system for optimizing reaction conditions has been designed and patented in the USA.
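The hedged sketch below mimics the CGR construction on two toy bond dictionaries: bonds occurring in reactants and products are merged into one graph whose edges carry either a conventional bond order or a dynamic label such as "1>2" (single to double) or "1>0" (broken single). The atom names and the label encoding are illustrative assumptions.

```python
# Toy condensed graph of reaction: merge reactant and product bonds.
def condensed_graph(reactant_bonds, product_bonds):
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        before = reactant_bonds.get(pair, 0)
        after = product_bonds.get(pair, 0)
        cgr[pair] = str(before) if before == after else f"{before}>{after}"
    return cgr

reactants = {("C1", "C2"): 1, ("C2", "O3"): 1}
products  = {("C1", "C2"): 2}   # C1-C2 becomes double, C2-O3 is broken
print(condensed_graph(reactants, products))
```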
Slama, Olfa. "Flexible querying of RDF databases : a contribution based on fuzzy logic." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S089/document.
Full textThis thesis concerns the definition of a flexible approach for querying both crisp and fuzzy RDF graphs. This approach, based on the theory of fuzzy sets, extends SPARQL, the W3C-standardised query language for RDF, so as to be able to express i) fuzzy user preferences on data (e.g., the release year of an album is recent) and on the structure of the data graph (e.g., the path between two friends is required to be short), and ii) more complex user preferences, namely fuzzy quantified statements (e.g., most of the albums recommended by an artist are highly rated and have been created by a young friend of this artist). We performed experiments in order to study the performance of this approach. The main objective of these experiments was to show that the extra cost due to the introduction of fuzziness remains limited/acceptable. We also investigated, in the more general framework of graph databases, the issue of integrating the same type of fuzzy quantified statements into a fuzzy extension of Cypher, a declarative language for querying (crisp) graph databases. Experimental results are reported and show that the extra cost induced by the fuzzy quantified nature of the queries also remains very limited.
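A fuzzy preference such as "the release year is recent" boils down to a membership function grading satisfaction between 0 and 1 instead of a crisp filter; the hedged sketch below shows one such function, with thresholds invented for the example rather than taken from the thesis.

```python
# Membership function for the fuzzy predicate "recent release year".
def recent(year, full=2015, zero=2005):
    """1.0 for years >= full, 0.0 for years <= zero, linear in between."""
    if year >= full:
        return 1.0
    if year <= zero:
        return 0.0
    return (year - zero) / (full - zero)

for y in (2017, 2010, 2000):
    print(y, recent(y))
```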