Theses on the topic "Recherche de similarité"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Recherche de similarité".
Next to each source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Chilowicz, Michel. "Recherche de similarité dans du code source". PhD thesis, Université Paris-Est, 2010. http://tel.archives-ouvertes.fr/tel-00587628.
Omhover, Jean-François. "Recherche d'images par similarité de contenus régionaux". Paris 6, 2004. http://www.theses.fr/2004PA066254.
Michaud, Dorian. "Indexation bio-inspirée pour la recherche d'images par similarité". Thesis, Poitiers, 2018. http://www.theses.fr/2018POIT2288/document.
Image retrieval is still a very active field of image processing, as the number of available image datasets continuously increases. One of the principal objectives of Content-Based Image Retrieval (CBIR) is to return the images most similar to a given query with respect to their visual content. Our work fits in a very specific application context: indexing small expert image datasets, with no prior knowledge of the images. Because of the image complexity, one of our contributions is the choice of effective descriptors from the literature, placed in direct competition. Two strategies are used to combine features: a psycho-visual one and a statistical one. In this context, we propose an unsupervised and adaptive framework based on the well-known bag of visual words and phrases models, which selects relevant visual descriptors for each keypoint to construct a more discriminative image representation. Experiments show the interest of using this type of methodology at a time when convolutional neural networks are ubiquitous. We also propose a study of semi-interactive retrieval to improve the accuracy of CBIR systems by using the knowledge of expert users.
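For readers unfamiliar with the bag-of-visual-words model on which this work builds, here is a minimal illustrative Python sketch; the random descriptors, codebook size and distance are invented stand-ins, not the thesis's actual pipeline:

    # Bag-of-visual-words (BoVW) sketch: cluster local descriptors into a
    # codebook, describe each image as a histogram of visual words, and rank
    # database images by histogram distance to a query.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # One array of local descriptors per image; 128-D like SIFT features.
    images = [rng.normal(size=(rng.integers(50, 200), 128)) for _ in range(10)]

    k = 32  # codebook size (illustrative)
    codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(images))

    def bovw_signature(descriptors):
        """Quantize descriptors to visual words; return an L1-normalized histogram."""
        words = codebook.predict(descriptors)
        hist = np.bincount(words, minlength=k).astype(float)
        return hist / hist.sum()

    signatures = np.array([bovw_signature(d) for d in images])
    query = signatures[0]
    dists = np.linalg.norm(signatures - query, axis=1)
    print("ranking (most similar first):", np.argsort(dists))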
Risser-Maroix, Olivier. "Similarité visuelle et apprentissage de représentations". Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7327.
The objective of this CIFRE thesis is to develop an image search engine, based on computer vision, to assist customs officers. Indeed, we observe, paradoxically, an increase in security threats (terrorism, trafficking, etc.) coupled with a decrease in the number of customs officers. The images of cargoes acquired by X-ray scanners already allow the inspection of a load without requiring the opening and complete search of a controlled load. By automatically proposing similar images, such a search engine would help the customs officer in his decision making when faced with infrequent or suspicious visual signatures of products. Thanks to the development of modern artificial intelligence (AI) techniques, our era is undergoing great changes: AI is transforming all sectors of the economy. Some see this advent of "robotization" as the dehumanization of the workforce, or even its replacement. However, reducing the use of AI to the simple search for productivity gains would be reductive. In reality, AI could increase the work capacity of humans rather than compete with them in order to replace them. It is in this context, the birth of augmented intelligence, that this thesis takes place. This manuscript, devoted to the question of visual similarity, is divided into two parts. Two practical cases where the collaboration between humans and AI is beneficial are proposed. In the first part, the problem of learning representations for the retrieval of similar images is investigated. After implementing a first system similar to those proposed by the state of the art, one of the main limitations is pointed out: the semantic bias. Indeed, the main contemporary methods use image datasets coupled with semantic labels only. The literature considers that two images are similar if they share the same label. This vision of the notion of similarity, however fundamental in AI, is reductive. It is therefore questioned in the light of work in cognitive psychology in order to propose an improvement: taking visual similarity into account. This new definition allows a better synergy between the customs officer and the machine. This work is the subject of scientific publications and a patent. In the second part, after having identified the key components that improve the performance of the previously proposed system, an approach mixing empirical and theoretical research is proposed. This second case, augmented intelligence, is inspired by recent developments in mathematics and physics. First applied to the understanding of an important hyperparameter (temperature), then to a larger task (classification), the proposed method provides an intuition on the importance and role of factors correlated with the studied variable (e.g. hyperparameter, score, etc.). The processing chain thus set up has demonstrated its efficiency by providing a highly explainable solution in line with decades of research in machine learning. These findings will allow the improvement of previously developed solutions.
Damak, Leïla. "Corps du consommateur et design du produit : recherche d'une similarité ou d'une complémentarité ?" Paris 9, 1996. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1996PA090029.
The purpose of this research is to propose and illustrate the self-congruity theory by studying the relationship between body aspects of the consumer and "body" aspects of a product design, where "body" equals the physical shape of any selected consumer product. Several studies have shown that the physical features of a selected product design (or product form) congruent with the consumer's body characteristics would be influenced by body image and its correlates.
Daoudi, Imane. "Recherche par similarité dans les grandes bases de données multimédia : application à la recherche par le contenu dans les bases d'images". Lyon, INSA, 2009. http://theses.insa-lyon.fr/publication/2009ISAL0057/these.pdf.
The volume of digital multimedia data keeps growing, and access to, sharing of, and retrieval from these data have become real needs. This spectacular growth of digital technologies requires powerful tools and search engines for fast and efficient access to data. My thesis work is in the field of multimedia data, especially images. The main objective is to develop a fast and efficient indexing and search method for the k nearest neighbours which is adapted to Content-Based Image Retrieval (CBIR) applications and to the properties of image descriptors (high volume, large dimension, etc.). The main idea is, on the one hand, to provide answers to the problems of scalability and the curse of dimensionality and, on the other, to deal with the similarity problems that arise in indexing and CBIR. We propose in this thesis two different approaches. The first uses a multidimensional indexing structure based on an approximation (filtering) approach, which is an improvement of the RA-Blocks method. It relies on a proposed algorithm for subdividing the data space which improves the storage capacity of the index and the CPU time. In the second approach, we propose a multidimensional indexing method suitable for heterogeneous data (colour, texture, shape), which combines a non-linear dimensionality reduction technique with an approximation-based multidimensional indexing approach. This combination makes it possible, on the one hand, to deal with the curse of dimensionality and scalability problems and, on the other, to exploit the properties of the non-linear space to find similarity measures suited to the nature of the manipulated data.
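To make the approximation/filtering idea concrete, here is a hedged filter-and-refine k-nearest-neighbour sketch; it uses a linear PCA projection for brevity where the thesis combines a non-linear reduction with a dedicated index structure, and the data are random stand-ins:

    # Filter-and-refine k-NN: shortlist candidates with cheap low-dimensional
    # distances, then re-rank the shortlist with exact distances.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    db = rng.normal(size=(5000, 64))   # high-dimensional image descriptors
    query = rng.normal(size=64)
    k = 10

    pca = PCA(n_components=8).fit(db)  # cheap approximation of the space
    db_low = pca.transform(db)
    q_low = pca.transform(query[None, :])[0]

    # Filter: candidate set much larger than k, much smaller than the base.
    cand = np.argsort(np.linalg.norm(db_low - q_low, axis=1))[:50 * k]

    # Refine: exact distances only on the candidates.
    exact = np.linalg.norm(db[cand] - query, axis=1)
    print("approximate k-NN ids:", cand[np.argsort(exact)[:k]])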
Zahid, Youssef. "Recherche de similarité d'images à la base du modèle 2D string, application aux radiographies pulmonaires". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0005/MQ44991.pdf.
Texto completoHoonakker, Frank. "Graphes condensés de réactions, applications à la recherche par similarité, la classification et la modélisation". Université Louis Pasteur (Strasbourg) (1971-2008), 2008. https://publication-theses.unistra.fr/restreint/theses_doctorat/2008/HOONAKKER_Frank_2008.pdf.
This work is devoted to the development of new methods for mining chemical reactions based on the Condensed Graph of Reaction (CGR) approach. A CGR integrates information about all reactants and products of a given chemical reaction into one 2D molecular graph. Due to the application of both conventional (simple, double, etc.) and dynamical (single to double, broken single, etc.) bond types, a CGR "condenses" a reaction (involving many molecules) into one pseudo-molecule. This formally allows one to apply to CGRs the chemoinformatics approaches earlier developed for individual compounds. Three possible applications of CGRs were considered: unsupervised classification of reactions based on clustering algorithms; reaction similarity search; and Quantitative Structure-Reactivity Relationships (QSRR). Model calculations performed on four databases containing from 1,000 to 200,000 reactions demonstrated the high efficiency of the developed approaches and software tools. A system for optimizing reaction conditions has been designed and patented in the USA.
Negrel, Romain. "Représentations optimales pour la recherche dans les bases d'images patrimoniales". Thesis, Cergy-Pontoise, 2014. http://www.theses.fr/2014CERG0703/document.
In the last decades, the development of scanning and storage technologies resulted in many cultural heritage digitization projects. The massive and continuous flow of numerical data into cultural heritage databases causes many problems for indexing: it is no longer possible to index all data manually. To index data and ease access to it, many automatic and semi-automatic indexing methods have been proposed in recent years. The methods currently available for automatic indexing of non-textual documents (images, video, sound, 3D models, ...) are still too complex to implement for large volumes of data. In this thesis, we focus on the automatic indexing of images. To perform automatic or semi-automatic indexing, it is necessary to build an automatic method for evaluating the similarity between two images. Our work is based on image signature methods; these methods involve summarizing the visual content of each image in a signature (a single vector), and then using these signatures to compute the similarity between two images. To extract the signatures, we use the following pipeline: first, we extract a large number of local descriptors of the image; then we summarize all these descriptors in a large signature; finally, we strongly reduce the dimensionality of the resulting signature. The state-of-the-art signatures based on this pipeline provide very good performance in automatic indexing. However, these methods generally incur high storage and computational costs that make their implementation impossible on large volumes of data. In this thesis, our goal is twofold: first, we wish to improve image signatures to achieve very good performance in automatic indexing problems; second, we want to reduce the cost of the processing chain to enable scalability. We propose to improve a state-of-the-art image signature named VLAT (Vectors of Locally Aggregated Tensors). Our improvements increase the discriminative power of the signature. To reduce the size of the signatures, we perform linear projections of the signatures into a lower-dimensional space. We propose two methods to compute the projectors while maintaining the performance of the original signatures. Our first approach is to compute the projectors that best approximate the similarities between the original signatures. The second method is based on the retrieval of quasi-copies; we compute the projectors that meet constraints on the rank of retrieved images with respect to the query image. The most expensive step of the extraction pipeline is the dimensionality reduction step; its cost is due to the large dimensionality of the projectors. To reduce this cost, we propose to use sparse projectors by introducing a sparsity constraint in our methods. Since it is generally complex to solve an optimization problem with a strict sparsity constraint, we propose for each problem a method for approximating sparse projectors. This thesis work is the subject of experiments showing the practical value of the proposed methods in comparison with existing methods.
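As a rough illustration of the first projection method described here (projectors that best approximate the similarities between the original signatures), the following sketch uses the classical eigendecomposition solution; the signature dimensions and data are arbitrary stand-ins, not VLAT signatures:

    # Compress signatures with a linear projection chosen to preserve pairwise
    # dot-product similarities: by Eckart-Young, the optimum is spanned by the
    # top eigenvectors of X^T X.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 512))   # original image signatures
    d = 32                             # target dimensionality

    evals, evecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
    P = evecs[:, -d:]                        # projectors: top-d eigenvectors
    Z = X @ P                                # compressed signatures

    err = np.linalg.norm(X @ X.T - Z @ Z.T) / np.linalg.norm(X @ X.T)
    print(f"relative similarity-approximation error: {err:.3f}")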
Fotsoh, Tawaofaing Armel. "Recherche d’entités nommées complexes sur le web : propositions pour l’extraction et pour le calcul de similarité". Thesis, Pau, 2018. http://www.theses.fr/2018PAUU3003/document.
Recent developments in information technologies have made the web an important data source. However, web content is very unstructured, so automatically processing it in order to extract relevant information is a difficult task. This is why research related to Information Extraction (IE) on the web is growing very quickly. Similarly, another heavily explored research area is the querying of information extracted from the web to answer an information need; this area is known as Information Retrieval (IR). Our research work is at the crossroads of both areas. The main goal of our work is to develop strategies and techniques for crawling the web in order to extract complex Named Entities (NEs) (NEs with several properties that may be text or other NEs). We then propose to index them and to query them in order to answer information needs. This work was carried out within the T2I team of the LIUPPA laboratory, in collaboration with Cogniteev, a company whose core business is focused on the analysis of web content. The issues we had to deal with were the extraction of complex NEs on the web and the development of IR services supplied by the extracted data. Our first contribution is related to complex NE extraction from text content. For this contribution, we take into consideration several problems, in particular the noisy context characterizing some properties (the web page describing an event, for example, may contain more than one date: the event's date and the date when ticket sales open). For this particular problem, we introduce a block detection module that focuses property extraction on relevant text blocks. Our experiments show an improvement of the system's performance. We also focused on address extraction, where the main issue arises from the fact that there is no standard way of writing addresses in general and on the web in particular. We therefore propose a pattern-based approach which uses lexicons for extracting addresses from text, without depending on proprietary resources. Our second contribution deals with similarity computation between complex NEs. In the state of the art, this similarity computation is generally performed in two steps: (i) first, similarities between properties are calculated; (ii) then the obtained similarities are aggregated to compute the overall similarity. Our main proposals focus on the second step. We propose three techniques for aggregating property similarities. The first two are based on the weighted sum of these property similarities (simple linear combination and logistic regression). The third technique uses decision trees for the aggregation. Finally, we also propose a last approach based on clustering and the Salton vector model; this last approach evaluates the similarity at the complex NE level without computing property similarities. We also propose a similarity function between spatial NEs, one represented by a point and the other by a polygon. This complements those of the state of the art.
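To illustrate the aggregation step discussed in this abstract, here is a small hedged sketch of combining per-property similarities into one overall score, first with a hand-set weighted sum and then with a learned logistic-regression combiner; the properties, weights and labels are invented, not the thesis's actual values:

    # Aggregate per-property similarities (e.g. name, date, location) between
    # pairs of complex named entities into an overall matching score.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    pair_sims = rng.uniform(size=(200, 3))               # [name, date, location]
    labels = (pair_sims.mean(axis=1) > 0.5).astype(int)  # toy "same entity" labels

    # (a) Simple linear combination with hand-set weights.
    w = np.array([0.5, 0.2, 0.3])
    overall = pair_sims @ w

    # (b) Learn the combination from labelled pairs instead.
    clf = LogisticRegression().fit(pair_sims, labels)
    proba_same = clf.predict_proba(pair_sims)[:, 1]
    print("weighted-sum score:", overall[0], "learned score:", proba_same[0])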
Joly, Alexis. "Recherche par similarité statistique dans une grande base de signatures locales pour l'identification rapide d'extraits vidéo". La Rochelle, 2005. http://www.theses.fr/2005LAROS144.
Content-based video indexing deals with techniques used to analyse and exploit video databases without the need for any additional textual description. The work presented in this report focuses more precisely on content-based video copy detection, one of the emerging multimedia applications for which there is a need for a concerted effort from the database community and the computer vision community. To overcome the difficulties due to the use of very large databases, both in terms of robustness and speed, we propose a complete, original and efficient strategy. The first part of this report presents the particular context of copy detection and the signatures used to describe the content of the videos. The originality of our method is that it is based both on local signatures and on a global similarity measure computed after the search in the signature database. This similarity measure is not only a vote, as in other classical local approaches, but includes a registration step between candidate objects and objects retrieved by the search. The second part presents the main contribution of the thesis: a new indexing and retrieval technique belonging to the family of approximate similarity search techniques. Recent work shows that trading quality for time can be widely profitable to speed up descriptor similarity search. Whereas all other approximate techniques deal with K-nearest-neighbour search, the principle of our method is to extend the approximate paradigm to range queries. The main originality consists in determining relevant regions of the space according to a theoretical model of the distortions undergone by the signatures. The method makes it possible to determine the optimal region of the space with a high, controlled probability of containing the good answer. This search paradigm is called a statistical query. In practice, to simplify access to signatures, the relevant regions are determined by using a Hilbert space-filling curve and the space partition it induces. The experiments show that the technique is sublinear in database size, with an asymptotically linear behaviour (but only for huge databases), and that the quality performance is stable. Furthermore, they highlight that statistical queries provide a very high speed-up compared to classical exact range queries. The third part is focused on the global system assessment and the description of three applications. The experiments show that the simple theoretical distortion model is efficient enough to control the effective probability of retrieving a descriptor. They also point out that approximate similarity search is particularly profitable when using local signatures, since the loss of some search results does not affect the global robustness of the detection. Furthermore, the detection results are almost invariant to strong growth in database size (three orders of magnitude). The proposed approach was integrated in a deferred real-time TV monitoring system which is able to control 40,000 hours of video. The high quantity and variability of the results of this system open new data mining perspectives.
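The Hilbert space-filling curve mentioned here can be illustrated with a toy 2D example: nearby points tend to receive nearby curve indices, so an approximate range query becomes a scan over a contiguous window of sorted keys. The sketch below uses the classic Hilbert index algorithm; the grid size, window and radius are arbitrary, and the thesis of course works in higher dimensions with a statistical distortion model:

    import numpy as np

    def xy2d(n, x, y):
        """Hilbert-curve index of cell (x, y) on an n x n grid (n a power of 2)."""
        d, s = 0, n // 2
        while s > 0:
            rx = 1 if (x & s) > 0 else 0
            ry = 1 if (y & s) > 0 else 0
            d += s * s * ((3 * rx) ^ ry)
            if ry == 0:                     # rotate/flip the quadrant
                if rx == 1:
                    x, y = n - 1 - x, n - 1 - y
                x, y = y, x
            s //= 2
        return d

    rng = np.random.default_rng(4)
    grid = 64
    pts = rng.integers(0, grid, size=(2000, 2))
    keys = np.array([xy2d(grid, int(x), int(y)) for x, y in pts])
    order = np.argsort(keys)                # descriptors sorted by curve key

    q = np.array([20, 40])                  # range query centre
    pos = np.searchsorted(keys[order], xy2d(grid, 20, 40))
    window = order[max(0, pos - 100):pos + 100]   # scan a contiguous key window
    hits = window[np.linalg.norm(pts[window] - q, axis=1) <= 3.0]
    print(len(hits), "points within radius 3 found in the scanned window")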
Casagrande, Annette. "Proposition d'une mesure de voisinage entre textes : Application à la veille stratégique". PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00773087.
Texto completoLannes, Romain. "Recherche de séquences environnementales inconnues d’intérêt médical/biologique par l’utilisation de grands réseaux de similarité de séquences". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS232.
The objective of this thesis was to identify as yet unknown microorganisms present in various environments and to characterize some of their metabolisms. This unidentified diversity, both taxonomic and functional, is commonly referred to as microbial dark matter. I have used and developed new network methods, including sequence similarity networks, to exploit very large sequence datasets from metagenomic projects. In particular, my work has highlighted the ecological role of ultra-small micro-organisms in some autotrophic metabolic pathways in the oceans. It also shows that CPR and DPANN, recently discovered ultra-small bacteria and archaea, participate in the dynamics of microbial communities through quorum sensing systems similar to those of better characterized organisms. An application of sequence similarity networks to meta-barcoding data also revealed a previously unknown diversity of Holozoans, which could allow us to better understand the transition to multicellularity of Metazoans. Finally, I have developed a method and software for searching for remote homologs of proteins of interest in very large datasets, such as those from metagenomics. This method, now validated, should make it possible to search for sequences belonging to still unknown and very divergent organisms, in the hope of discovering new deep branching phyla, or even new domains of life.
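As a hedged illustration of the sequence similarity networks used in this work, the sketch below links sequences whose similarity exceeds a threshold and reads off connected components as putative families; a k-mer Jaccard score stands in for the BLAST-style scores normally used, and the sequences and threshold are invented:

    import itertools
    import networkx as nx

    def kmers(seq, k=3):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def jaccard(a, b, k=3):
        ka, kb = kmers(a, k), kmers(b, k)
        return len(ka & kb) / len(ka | kb)

    seqs = {
        "s1": "ATGGCGTACGTTAGC",
        "s2": "ATGGCGTACGTTAGG",   # close to s1
        "s3": "TTTTCCCCGGGGAAAA",  # unrelated
    }

    # Nodes are sequences; edges link pairs above the similarity threshold.
    G = nx.Graph()
    G.add_nodes_from(seqs)
    for (n1, a), (n2, b) in itertools.combinations(seqs.items(), 2):
        sim = jaccard(a, b)
        if sim >= 0.5:
            G.add_edge(n1, n2, weight=sim)

    print("putative families:", list(nx.connected_components(G)))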
Hoàng, Nguyen Vu. "Prise en compte des relations spatiales contextuelles dans la recherche d'images par contenu visuel". Paris 9, 2011. http://basepub.dauphine.fr/xmlui/handle/123456789/8202.
This thesis is focused on the study of methods for image retrieval by visual content in collections of heterogeneous content. We are interested in the description of spatial relationships between the entities present in images, which can be symbolic objects or visual primitives such as interest points. The first part of this thesis is dedicated to a state of the art of spatial relationship description techniques. As a result of this study, we propose the Δ-TSR approach, our first contribution, which allows similarity search based on visual content by using the triangular relationships between entities in images. In our experiments, the entities are local visual features based on salient points represented in a bag-of-features model. This approach improves not only the quality of image retrieval but also the execution time in comparison with other approaches in the literature. The second part is dedicated to the study of the image context. The spatial relationships between entities in an image allow the creation of a global description of the image that we call the image context. Taking contextual spatial relationships into account in the similarity search of images can improve retrieval quality by limiting false alarms. We defined the context of an image as the presence of entity categories and their spatial relationships in the image. We studied the relationships between different entity categories on LabelMe, a state-of-the-art symbolic image database of heterogeneous content. This statistical study, our second contribution, allows the creation of a cartography of their spatial relationships. It can be integrated in a graph-based model of the contextual relationships, our third contribution. This graph describes general knowledge of every entity category. Spatial reasoning on this knowledge graph can help improve image processing tasks such as detection and localization of an entity category by using the presence of another as a reference. Further, this model can be applied to represent the context of an image. Similarity search based on context can then be achieved by comparing the graphs: the contextual similarity between two images is evaluated by the similarity between their graphs. This work was evaluated on the LabelMe symbolic image database. The experiments showed its relevance for image retrieval by context.
Iltache, Samia. "Modélisation ontologique pour la recherche d'information : évaluation de la similarité sémantique de textes et application à la détection de plagiats". Thesis, Toulouse 2, 2018. http://www.theses.fr/2018TOU20121.
The expansion of the web and the development of information technologies have contributed to the proliferation of digital documents online. This availability of information has the advantage of making knowledge accessible to all, but many problems have emerged regarding access to relevant information that meets a user's need. The first problem is related to the extraction of the useful information available. A second problem concerns the use of this knowledge, which sometimes results in plagiarism. The aim of this thesis is the development of a model that better characterizes documents, both to facilitate access to them and to detect those at risk of plagiarism. This model is based on domain ontologies, used both for the classification of documents and for calculating the similarity of documents belonging to the same domain. We are particularly interested in scientific papers, specifically their abstracts: short texts that are relatively well structured. The problem is, therefore, to determine how to assess the semantic proximity/similarity of two papers by examining their respective abstracts. Inasmuch as the domain ontology provides a useful way to represent knowledge relative to a given domain, our process is based on two actions: (i) an automatic classification of documents in a domain selected from several candidate domains; this classification determines the meaning of a document from the global context in which its content is used; (ii) a comparison of the texts performed on the basis of the construction of the semantic perimeter of each abstract and on a mutual enrichment performed when comparing the graphs of the abstracts. The semantic comparison of the abstracts is based on a segmentation of their respective content into zones (documentary units) reflecting their logical structure. The computation of abstract similarity relies on the comparison of the conceptual graphs of the zones playing the same role.
Kouomou-Choupo, Anicet. "Améliorer la recherche par similarité dans une grande base d'images fixes par des techniques de fouille de données". PhD thesis, Université Rennes 1, 2006. https://tel.archives-ouvertes.fr/tel-00524418.
Zhou, Zhyiong. "Recherche d'images par le contenu : application à la proposition de mots clés". Thesis, Poitiers, 2018. http://www.theses.fr/2018POIT2254.
The search for information in masses of multimedia data and the content-based indexing of these large databases are highly topical problems. They are part of a type of data management called Digital Asset Management (DAM); DAM uses image segmentation and data classification techniques. Our main contributions in this thesis can be summarized in three points: an analysis of the possible uses of different local feature extraction methods using the VLAD technique; a new method for extracting dominant colour information from an image; and a comparison of Support Vector Machines (SVM) with other classifiers for the proposed keyword suggestion. These contributions have been tested and validated on synthetic and real data. Our methods were then widely used in the DAM ePhoto system developed by the company EINDEN, which financed the CIFRE thesis during which this work was carried out. The results are encouraging and open new perspectives for research.
Abbadeni, Noureddine. "Recherche d'images basée sur le contenu visuel : représentations multiples, similarité et fusion de résultats : cas des images de texture". Thesis, Université de Sherbrooke, 2005. http://savoirs.usherbrooke.ca/handle/11143/5045.
Texto completoZargayouna, Haïfa. "Indexation sémantique de documents XML". Paris 11, 2005. http://www.theses.fr/2005PA112365.
XML documents address new challenges and impose new methods for accessing information. They present the advantage of an explicit structure which facilitates their presentation and their exploitation in various contexts. The aim of Semi-structured Information Retrieval (SIR) is to take this structure into account and to integrate it into the representation of the content of semi-structured documents. The Semantic Web (SW) relies on the capacity of XML to define "personalised" tags, and on standards to describe the meaning of the terminology used by means of formal ontologies. The use of ontologies in Information Retrieval has gained interest with the SW. We aim to show that it is useful to have an intermediate representation of documents, as a fully formal description of the textual content is expensive. In this work we propose new methods that take advantage of the structure and semantics of the documents. The proposed model relies on: 1. a generic model which allows documents with heterogeneous structures to be indexed and provides a matching of these structures; 2. a query language which, unlike existing query languages, is intuitive and has an XML syntax; the proposed language makes it possible to query semi-structured documents with keywords and vague conditions on structure. The semantics are handled in a way that is completely transparent to the user.
Peterlongo, Pierre. "Filtrage de séquences d'ADN pour la recherche de longues répétitions multiples". PhD thesis, Université de Marne la Vallée, 2006. http://tel.archives-ouvertes.fr/tel-00132300.
On the one hand, the volume of available DNA sequence data is growing exponentially. On the other hand, research in the field raises new questions whose in silico formulations generate algorithmically hard problems. Among these problems, some concern in particular the study of genomic rearrangements, including duplications and transposable elements. They require the ability to detect long approximate multiple repeats in genomes precisely and efficiently. By multiple repeats, we mean repeats having at least two copies in a DNA sequence, or having copies in at least two distinct DNA sequences. Moreover, these repeats are approximate in the sense that errors exist between the copies of the same repeat. The search for approximate multiple repeats can be solved by local multiple alignment algorithms, but these have a complexity that is exponential in the size of the input and are therefore not applicable to data as large as genomes. This is why new techniques must be created to answer these new needs. In this thesis, an approach for filtering DNA sequences is proposed. The goal of such an approach is to quickly and efficiently remove, from texts representing DNA sequences, large portions that cannot be part of repeats. The filtered data, mostly limited to the relevant portions, can then be given as input to a local multiple alignment algorithm. The proposed filters apply a necessary condition to the sequences so as to keep only the portions that satisfy it. The work we present focused on the creation of filtering conditions that are both effective and algorithmically simple to apply. From these filtering conditions, two filters, Nimbus and Ed'Nimbus, were created. These filters are called exact because they never remove data that actually contains occurrences of repeats satisfying the characteristics set by a user. Their simplicity of application and the precision of the resulting filtering lead to very good results in practice. For example, the time used by repeat-search or multiple alignment algorithms can be reduced by several orders of magnitude by using the proposed filters. It is worth noting that the work presented in this thesis is inspired by a biological problem, but is also general and can therefore be applied to filter any type of text in order to detect large repeated portions within it.
Dorval, Thierry. "Approches saillantes et psycho-visuelles pour l'indexation d'images couleurs". Paris 6, 2004. http://www.theses.fr/2004PA066096.
Luca, Aurélie de. "Espaces chimiques optimaux pour la recherche par similarité, la classification et la modélisation de réactions chimiques représentées par des graphes condensés de réactions". Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAF027.
This thesis aims to develop an approach based on the Condensed Graph of Reaction (CGR) method able to (i) select an optimal descriptor space that best separates different reaction classes, and (ii) prepare special descriptors to be used in obtaining predictive structure-reactivity models. This methodology has been applied to similarity search studies in a database containing 8 different reaction classes, and to the visualization of its chemical space using Kohonen maps and Generative Topographic Mapping. Another part of the thesis concerns the development of predictive models for pKa and for optimal conditions for different types of Michael reaction, involving both CGR-based and Electronic Effect Descriptors.
Chaouch, Mohamed. "Recherche par le contenu d'objets 3D". PhD thesis, Télécom ParisTech (ENST), 2009. https://pastel.hal.science/pastel-00005168.
This thesis deals with 3D shape similarity search. We focus on the main steps of the 3D shape matching process: normalization of 3D models, signature extraction from models, and similarity measurement. The first part of the thesis concerns the normalization of 3D models, in particular the search for the optimal pose. We propose a new alignment method for 3D models based on reflective symmetry and local translational symmetry. We use the properties of principal component analysis with respect to planar reflective symmetry in order to select the candidate optimal alignment axes. The second part of the thesis is dedicated to shape descriptors and the associated similarity measures. First, we propose a new 3D descriptor, called the 3D Gaussian descriptor, which is derived from the Gauss transform. Based on a partition of the space enclosing the 3D model, this descriptor provides a local characterization of the boundary of the shape. Second, we study multi-view approaches that characterize the 3D model using its projection images. We introduce an augmented approach, named the Enhanced Multi-views Approach, which can be applied to most multi-view descriptors. Relevance indices are defined and used in the similarity computation in order to normalize the contributions of the projections in the 3D shape description. Finally, we propose a robust 3D shape indexing approach, called the Depth Line Approach, which is based on the appearance of a set of depth-buffer images. To extract a compact signature, we introduce a sequencing method that transforms the depth lines into sequences. Retrieval is improved by using dynamic programming to compare sequences.
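The dynamic-programming comparison of depth-line sequences mentioned at the end of this abstract is in the spirit of dynamic time warping (DTW); here is a generic, hedged DTW sketch on toy 1-D profiles rather than the thesis's actual depth lines:

    import numpy as np

    def dtw(a, b):
        """Classic O(len(a) * len(b)) DTW distance between two 1-D sequences."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    depth_line_a = [0.1, 0.4, 0.8, 0.8, 0.3]
    depth_line_b = [0.1, 0.5, 0.8, 0.2]   # similar profile, different length
    print("DTW distance:", dtw(depth_line_a, depth_line_b))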
Trouvilliez, Benoît. "Similarités de données textuelles pour l'apprentissage de textes courts d'opinions et la recherche de produits". Thesis, Artois, 2013. http://www.theses.fr/2013ARTO0403/document.
This Ph.D. thesis is about establishing textual data similarities in the customer relationship domain. Two subjects are mainly considered: the automatic analysis of short messages in response to satisfaction surveys, and the search for products given criteria expressed in natural language by a human through a conversation with a program. The first subject concerns statistical information derived from survey answers. The ideas recognized in the answers are identified, organized according to a taxonomy and quantified. The second subject concerns the transcription of criteria over products into queries to be interpreted by a database management system. The range of criteria under consideration is wide, from the simplest, such as material or brand, to the most complex, such as colour or price. The two subjects meet on the problem of establishing textual data similarities with NLP techniques. The main difficulties come from the fact that the texts to be processed, written in natural language, are short and contain many spelling errors and negations. Establishing semantic similarities between words (synonymy, antonymy, ...) and syntactic relations between syntagms (conjunction, opposition, ...) are other issues considered in our work. We also study automatic clustering and classification methods in order to analyse answers to satisfaction surveys.
Aimé, Xavier. "Gradients de prototypicalité, mesures de similarité et de proximité sémantique : une contribution à l'Ingénierie des Ontologies". PhD thesis, Université de Nantes, 2011. http://tel.archives-ouvertes.fr/tel-00660916.
Benmokhtar, Rachid. "Fusion multi-niveaux pour l'indexation et la recherche multimédia par le contenu sémantique". PhD thesis, Télécom ParisTech, 2009. http://pastel.archives-ouvertes.fr/pastel-00005321.
Ralalason, Bachelin. "Représentation multi-facette des documents pour leur accès sémantique". PhD thesis, Université Paul Sabatier - Toulouse III, 2010. http://tel.archives-ouvertes.fr/tel-00550650.
Texto completoNgo, Duy Hoa. "Enhancing Ontology Matching by Using Machine Learning, Graph Matching and Information Retrieval Techniques". Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20096/document.
In recent years, ontologies have attracted a lot of attention in the Computer Science community, especially in the Semantic Web field. They serve as explicit conceptual knowledge models and provide the semantic vocabularies that make domain knowledge available for exchange and interpretation among information systems. However, due to the decentralized nature of the semantic web, ontologies are highly heterogeneous. This heterogeneity mainly causes the problem of variation in meaning or ambiguity in entity interpretation and, consequently, it prevents domain knowledge sharing. Therefore, ontology matching, which discovers correspondences between semantically related entities of ontologies, becomes a crucial task in semantic web applications. Several challenges to the field of ontology matching have been outlined in recent research. Among them, the selection of appropriate similarity measures as well as the configuration tuning of their combination are known as fundamental issues that the community should deal with. In addition, verifying the semantic coherence of the discovered alignment is also known as a crucial task. Furthermore, the difficulty of the problem grows with the size of the ontologies. To deal with these challenges, in this thesis, we propose a novel matching approach which combines different techniques coming from the fields of machine learning, graph matching and information retrieval in order to enhance ontology matching quality. Indeed, we make use of information retrieval techniques to design new effective similarity measures for comparing labels and context profiles of entities at the element level. We also apply a graph matching method named similarity propagation at the structure level, which effectively discovers mappings by exploring structural information of entities in the input ontologies. In terms of combining similarity measures at the element level, we transform the ontology matching task into a classification task in machine learning. Besides, we propose a dynamic weighted sum method to automatically combine the matching results obtained from the element- and structure-level matchers. In order to remove inconsistent mappings, we design a new fast semantic filtering method. Finally, to deal with large-scale ontology matching tasks, we propose two candidate selection methods to reduce the computational space. All these contributions have been implemented in a prototype named YAM++. To evaluate our approach, we adopt various tracks, namely Benchmark, Conference, Multifarm, Anatomy, Library and Large Biomedical Ontologies, from the OAEI campaign. The experimental results show that the proposed matching methods work effectively. Moreover, in comparison to other participants in OAEI campaigns, YAM++ proved to be highly competitive and gained a high ranking position.
Linardi, Michele. "Variable-length similarity search for very large data series : subsequence matching, motif and discord detection". Electronic Thesis or Diss., Sorbonne Paris Cité, 2019. http://www.theses.fr/2019USPCB056.
Data series (ordered sequences of real-valued points, a.k.a. time series) have become one of the most important and popular data types, present in almost all scientific fields. Interest in this data type has been growing at a fast pace for the last two decades, and even more evidently in recent years. This is mainly due to recent advances in sensing, networking, data processing and storage technologies, which have significantly assisted the process of generating and collecting large amounts of data series. Data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to different data mining problems, such as clustering, subsequence matching, imputation of missing values, motif discovery and anomaly detection, work by means of similarity search. Data series indexes have been proposed for fast similarity search. Nevertheless, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this regard, all solutions to the aforementioned problems require prior knowledge of the series length on which similarity search is performed. Consequently, the user must know the length of the expected results, which is often an unrealistic assumption. This aspect is thus of paramount importance. In several cases, the length is a critical parameter that heavily influences the quality of the final outcome. In this thesis, we propose scalable solutions that enable variable-length analysis of very large data series collections. We propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length. Our contribution is twofold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different lengths. Based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports non-Z-normalized and Z-normalized sequences, and can be used without changes with both Euclidean Distance and Dynamic Time Warping, for answering both κ-NN and ε-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost when compared to competing approaches. Subsequently, we introduce a new framework, which provides an exact and scalable motif and discord discovery algorithm that efficiently finds all motifs and discords in a given range of lengths. The experimental evaluation we conducted over several diverse real datasets shows that our approaches are up to orders of magnitude faster than the alternatives. We moreover demonstrate that we can remove the unrealistic constraint of performing analytics using a predefined length, leading to more intuitive and actionable results which would otherwise have been missed.
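For context, the core primitive that such variable-length indexes accelerate can be sketched as a brute-force baseline: slide a window over the series, Z-normalize each subsequence, and keep the one closest in Euclidean distance to the Z-normalized query. ULISSE's contribution is precisely avoiding this linear scan; the data below are synthetic:

    import numpy as np

    def znorm(x, eps=1e-8):
        return (x - x.mean()) / (x.std() + eps)

    def best_match(series, query):
        """Brute-force Z-normalized Euclidean subsequence search."""
        q, m = znorm(query), len(query)
        best_pos, best_dist = -1, np.inf
        for i in range(len(series) - m + 1):
            d = np.linalg.norm(znorm(series[i:i + m]) - q)
            if d < best_dist:
                best_pos, best_dist = i, d
        return best_pos, best_dist

    rng = np.random.default_rng(5)
    series = np.cumsum(rng.normal(size=1000))   # random-walk data series
    query = series[400:450] * 2.0 + 5.0         # scaled/shifted hidden pattern
    pos, dist = best_match(series, query)
    print(f"best match at offset {pos} (distance {dist:.4f})")  # expect 400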
Navarro, Emmanuel. "Métrologie des graphes de terrain, application à la construction de ressources lexicales et à la recherche d'information". PhD thesis, Institut National Polytechnique de Toulouse - INPT, 2013. http://tel.archives-ouvertes.fr/tel-01020232.
Texto completoNgo, Duy Hoa. "Amélioration de l'alignement d'ontologies par les techniques d'apprentissage automatique, d'appariement de graphes et de recherche d'information". Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2012. http://tel.archives-ouvertes.fr/tel-00767318.
Texto completoKessler, Rémy. "Traitement automatique d'informations appliqué aux ressources humaines". Phd thesis, Université d'Avignon, 2009. http://tel.archives-ouvertes.fr/tel-00453642.
Texto completoPoulard, Fabien B. "Détection de dérivation de texte". Nantes, 2011. http://www.theses.fr/2011NANT2023.
Thanks to the Internet, the production and publication of content is possible with ease and speed. This possibility raises the issue of controlling the origins of this content. This work focuses on detecting derivation links between texts. A derivation link associates a derivative text and the pre-existing texts from which it was written. We focused on the task of identifying derivative texts given a source text, for various forms of derivation. Our first contribution is the definition of a theoretical framework that defines the concept of derivation, as well as a model framing the different forms of derivation. Then, we set up an experimental framework consisting of free software tools, evaluation corpora and evaluation metrics based on IR. The Piithie and Wikinews corpora we have developed are, to our knowledge, the only ones in French for the evaluation of the detection of derivation links. Finally, we explored different detection methods based on the signature approach. In particular, we introduced the notions of specificity and invariance to guide the choice of descriptors used to model the texts in anticipation of their comparison. Our results show that the choice of motivated descriptors, including linguistically motivated ones, can reduce the size of the modelization of texts, and therefore the cost of the method, while offering performance comparable to the much more voluminous state-of-the-art approach.
Zaharia, Alexandra. "Identification des motifs de voisinage conservés dans des contextes métaboliques et génomiques". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS275/document.
This thesis fits within the field of systems biology and addresses a problem related to heterogeneous biological networks. It focuses on the relationship between metabolism and genomic context through a graph mining approach. It is well known that successive enzymatic steps involving products of genes in close proximity on the chromosome reflect an evolutionary advantage in maintaining this neighborhood relationship at both the metabolic and genomic levels. We therefore choose to focus on the detection of neighboring reactions catalyzed by products of neighboring genes, where the notion of neighborhood may be modulated by allowing the omission of several reactions and/or genes. More specifically, the sought motifs are trails of reactions (meaning reaction sequences in which reactions may be repeated, but not the links between them). Such neighborhood motifs are referred to as metabolic and genomic patterns. In addition, we are also interested in detecting conserved metabolic and genomic patterns, meaning similar patterns across multiple species. Among the possible variations for a conserved pattern, the presence/absence of reactions and/or genes may be considered, or a different order of reactions and/or genes. A first development proposes algorithms and methods for the identification of conserved metabolic and genomic patterns. These methods are implemented in an open-source pipeline called CoMetGeNe (COnserved METabolic and GEnomic NEighborhoods). By means of this pipeline, we analyze a data set of 50 bacterial species, using data extracted from the KEGG knowledge base. A second development explores the detection of conserved patterns by taking into account the chemical similarity between reactions. This allows for the detection of a class of conserved metabolic modules in which neighboring genes are involved.
Lully, Vincent. "Vers un meilleur accès aux informations pertinentes à l’aide du Web sémantique : application au domaine du e-tourisme". Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUL196.
This thesis starts with the observation that there is increasing infobesity on the Web. The two main types of tools designed to help us explore Web data, namely the search engine and the recommender system, have several problems: (1) in helping users express their explicit information needs, (2) in selecting relevant documents, and (3) in valuing the selected documents. We propose several approaches using Semantic Web technologies to remedy these problems and to improve access to relevant information. We propose in particular: (1) a semantic auto-completion approach which helps users formulate longer and richer search queries; (2) several recommendation approaches using the hierarchical and transversal links in knowledge graphs to improve the relevance of the recommendations; (3) a semantic affinity framework to integrate semantic and social data to yield recommendations that are qualitatively balanced in terms of relevance, diversity and novelty; (4) several recommendation explanation approaches aiming at improving relevance, intelligibility and user-friendliness; (5) two image-based user profiling approaches; and (6) an approach which selects the best images to accompany the recommended documents in recommendation banners. We implemented and applied our approaches in the e-tourism domain. They have been properly evaluated quantitatively with ground-truth datasets and qualitatively through user studies.
Peng, Botao. "Parallel data series indexing and similarity search on modern hardware". Electronic Thesis or Diss., Université Paris Cité, 2020. http://www.theses.fr/2020UNIP5193.
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration or analysis of large data series collections. In this Ph.D. work, we present the first data series indexing solutions that are designed to inherently take advantage of modern hardware in order to accelerate similarity search processing times for both on-disk and in-memory data. In particular, we develop novel algorithms for multi-core, multi-socket, and Single Instruction Multiple Data (SIMD) architectures, as well as algorithms for Graphics Processing Units (GPUs). Our experiments on a variety of synthetic and real data demonstrate that our approaches are up to orders of magnitude faster than the state-of-the-art solutions for both disk-resident and in-memory data. More specifically, our on-disk solution can answer exact similarity search queries on 100GB datasets in ∼15 seconds, and our in-memory solution in as little as 36 milliseconds, which enables for the first time real-time, interactive data exploration on very large data series collections.
Muhammad, Fuad Muhammad Marwan. "Similarity Search in High-dimensional Spaces with Applications to Time Series Data Mining and Information Retrieval". PhD thesis, Université de Bretagne Sud, 2011. http://tel.archives-ouvertes.fr/tel-00619953.
Texto completoMorère, Olivier André Luc. "Deep learning compact and invariant image representations for instance retrieval". Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066406.
Texto completoImage instance retrieval is the problem of finding an object instance present in a query image from a database of images. Also referred to as particular object retrieval, this problem typically entails determining with high precision whether the retrieved image contains the same object as the query image. Scale, rotation and orientation changes between query and database objects and background clutter pose significant challenges for this problem. State-of-the-art image instance retrieval pipelines consist of two major steps: first, a subset of images similar to the query are retrieved from the database, and second, Geometric Consistency Checks (GCC) are applied to select the relevant images from the subset with high precision. The first step is based on comparison of global image descriptors: high-dimensional vectors with up to tens of thousands of dimensions rep- resenting the image data. The second step is computationally highly complex and can only be applied to hundreds or thousands of images in practical applications. More discriminative global descriptors result in relevant images being more highly ranked, resulting in fewer images that need to be compared pairwise with GCC. As a result, better global descriptors are key to improving retrieval performance and have been the object of much recent interest. Furthermore, fast searches in large databases of millions or even billions of images requires the global descriptors to be compressed into compact representations. This thesis will focus on how to achieve extremely compact global descriptor representations for large-scale image instance retrieval. After introducing background concepts about supervised neural networks, Restricted Boltzmann Machine (RBM) and deep learning in Chapter 2, Chapter 3 will present the design principles and recent work for the Convolutional Neural Networks (CNN), which recently became the method of choice for large-scale image classification tasks. Next, an original multistage approach for the fusion of the output of multiple CNN is proposed. Submitted as part of the ILSVRC 2014 challenge, results show that this approach can significantly improve classification results. The promising perfor- mance of CNN is largely due to their capability to learn appropriate high-level visual representations from the data. Inspired by a stream of recent works showing that the representations learnt on one particular classification task can transfer well to other classification tasks, subsequent chapters will focus on the transferability of representa- tions learnt by CNN to image instance retrieval…
Zapletal, Eric. "Un environnement collaboratif sur Internet pour l'aide au consensus en anatomie pathologie : la plateforme IDEM". Paris 6, 2006. http://www.theses.fr/2006PA066590.
Texto completoWang, Peng. "Historical handwriting representation model dedicated to word spotting application". Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4019/document.
As more and more documents, especially historical handwritten documents, are converted into digitized versions for long-term preservation, the demand for efficient information retrieval techniques in such document images is increasing. The objective of this research is to establish an effective representation model for handwriting, especially historical manuscripts. The proposed model is intended to help navigation in historical document collections. Specifically, we developed our handwriting representation model with regard to the word spotting application. As a specific pattern recognition task, handwritten word spotting faces many challenges, such as high intra-writer and inter-writer variability. Nowadays, it is acknowledged that OCR techniques are unsuccessful on offline handwritten documents, especially historical ones. Therefore, characterization and comparison methods dedicated to handwritten word spotting are strongly required. In this work, we explore several techniques that allow the retrieval of single-style handwritten document images with a query image. The proposed representation model captures two facets of handwriting: morphology and topology. Based on the skeleton of the handwriting, graphs are constructed with the structural points as vertices and the strokes as edges. By assigning the Shape Context descriptor as the vertex label, the contextual information of the handwriting is also integrated. Moreover, we develop a coarse-to-fine system for large-scale handwritten word spotting using our representation model. In the coarse selection, graph embedding is adopted with simple and fast computation in mind. With the selected regions of interest, in the fine selection, a specific similarity measure based on graph edit distance is designed. Given the importance of the order of handwriting, a dynamic time warping assignment with block merging is added. The experimental results using benchmark handwriting datasets demonstrate the power of the proposed representation model and the efficiency of the developed word spotting approach. The main contribution of this work is the proposed graph-based representation model, which realizes a comprehensive description of handwriting, especially historical script. Our structure-based model captures the essential characteristics of handwriting without redundancy, and is meanwhile robust to the intra-variation of handwriting and specific noises. With additional experiments, we have also shown the potential of the proposed representation model in other symbol recognition applications, such as handwritten musical and architectural symbol classification.
Philippeau, Jérémy. "Apprentissage de similarités pour l'aide à l'organisation de contenus audiovisuels". Toulouse 3, 2009. http://thesesups.ups-tlse.fr/564/.
Texto completoWith a view to new uses in accessing audiovisual archives, we have created a semi-automatic system that helps a user organize audiovisual content while performing tasks of classification, characterization, identification and ranking. To do so, we propose to use a new vocabulary, different from the one already available in the INA documentary records, to answer needs that cannot easily be defined in words. We have designed a graphical interface, based on a graph formalism, for expressing an organizational task. Numerical similarity is a well-suited tool for the elements handled, namely informational objects shown on the computer screen and automatically extracted low-level audio and video features. We chose to estimate the similarity between these elements with a predictive process based on a statistical model. Among the many existing models, statistical prediction based on univariate regression and on support vectors was chosen.
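To make the idea concrete, here is a minimal, hypothetical sketch of similarity prediction with support vector regression: pairs of items are represented by the absolute difference of their low-level feature vectors, and a model is trained to predict a similarity score. The feature dimensions and the synthetic target are assumptions, not the thesis data.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 16))            # low-level A/V features per item
pairs = rng.integers(0, 60, size=(300, 2))   # item pairs with judgements
# Synthetic target: similarity decays with feature distance (a stand-in
# for user-provided similarity judgements).
y = np.exp(-np.linalg.norm(feats[pairs[:, 0]] - feats[pairs[:, 1]], axis=1))

X = np.abs(feats[pairs[:, 0]] - feats[pairs[:, 1]])  # pair representation
model = SVR(kernel="rbf").fit(X, y)

def predicted_similarity(i, j):
    """Predicted similarity between items i and j."""
    return float(model.predict(np.abs(feats[i] - feats[j])[None, :])[0])

print(predicted_similarity(3, 7))
```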
Morère, Olivier André Luc. "Deep learning compact and invariant image representations for instance retrieval". Electronic Thesis or Diss., Paris 6, 2016. http://www.theses.fr/2016PA066406.
Texto completoImage instance retrieval is the problem of finding an object instance present in a query image from a database of images. Also referred to as particular object retrieval, this problem typically entails determining with high precision whether the retrieved image contains the same object as the query image. Scale, rotation and orientation changes between query and database objects, as well as background clutter, pose significant challenges for this problem. State-of-the-art image instance retrieval pipelines consist of two major steps: first, a subset of images similar to the query is retrieved from the database, and second, Geometric Consistency Checks (GCC) are applied to select the relevant images from the subset with high precision. The first step is based on the comparison of global image descriptors: high-dimensional vectors with up to tens of thousands of dimensions representing the image data. The second step is computationally highly complex and can only be applied to hundreds or thousands of images in practical applications. More discriminative global descriptors result in relevant images being ranked higher, so fewer images need to be compared pairwise with GCC. As a result, better global descriptors are key to improving retrieval performance and have been the object of much recent interest. Furthermore, fast searches in large databases of millions or even billions of images require the global descriptors to be compressed into compact representations. This thesis focuses on how to achieve extremely compact global descriptor representations for large-scale image instance retrieval. After introducing background concepts about supervised neural networks, the Restricted Boltzmann Machine (RBM) and deep learning in Chapter 2, Chapter 3 presents the design principles of, and recent work on, Convolutional Neural Networks (CNN), which have recently become the method of choice for large-scale image classification tasks. Next, an original multistage approach for fusing the outputs of multiple CNNs is proposed. Submitted as part of the ILSVRC 2014 challenge, the results show that this approach can significantly improve classification results. The promising performance of CNNs is largely due to their capability to learn appropriate high-level visual representations from the data. Inspired by a stream of recent works showing that the representations learnt on one particular classification task can transfer well to other classification tasks, subsequent chapters focus on the transferability of representations learnt by CNNs to image instance retrieval…
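The two-step pipeline described here can be summarized with a minimal sketch, assuming L2-normalized global descriptors and a stubbed geometric check; the shortlist size and the GCC placeholder are illustrative choices, not the thesis's implementation.

```python
import numpy as np

def shortlist(query_desc, db_descs, k=100):
    """Step 1: rank the database by cosine similarity of global descriptors."""
    q = query_desc / np.linalg.norm(query_desc)
    d = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

def passes_gcc(matches, min_inliers=12):
    """Step 2 (placeholder): a real Geometric Consistency Check would fit a
    spatial model (e.g. affine via RANSAC) over matched local keypoints and
    count inliers; only the thresholding skeleton is kept here."""
    inliers = len(matches)  # stand-in for a RANSAC inlier count
    return inliers >= min_inliers

# Toy usage: 1000 random 4096-d global descriptors, one noisy query.
rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 4096))
ids, scores = shortlist(db[42] + 0.01 * rng.normal(size=4096), db, k=5)
print(ids)  # index 42 should appear first
```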
Kessler, Rémy. "Traitement automatique d’informations appliqué aux ressources humaines". Thesis, Avignon, 2009. http://www.theses.fr/2009AVIG0167/document.
Texto completoSince the 1990s, the Internet has been at the heart of the labor market. First used for specialist profiles, its use has spread as the number of Internet users in the population has grown. Looking for a job through electronic job boards has become commonplace, and e-recruitment is now routine. This information explosion poses various processing problems: the sheer amount of information is difficult for companies to manage quickly and effectively. In this PhD thesis, we present the work developed within the E-Gen project, which aims to create tools to automate the flow of information during a recruitment process. We first addressed the problems posed by the routing of emails. The ability of a company to manage this information flow efficiently and at low cost has become a major issue for customer satisfaction. We propose applying machine learning methods to classify emails automatically for routing, combining probabilistic techniques and support vector machines. We then present work conducted on the analysis and integration of job advertisements published on the Internet. We present a solution capable of integrating a job advertisement, either automatically or in an assisted mode, so that it can be broadcast quickly. Based on a combination of classifiers driven by a Markov automaton, the system obtains very good results. Thereafter, we present several strategies based on vector-space and probabilistic models to solve the problem of profiling candidates against a specific job offer, in order to assist recruiters. We evaluated a range of similarity measures for ranking applications using ROC curves. A relevance feedback approach allowed us to surpass our previous results on this difficult, diverse and highly subjective task.
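A minimal sketch of the email-routing step, assuming a bag-of-words pipeline with TF-IDF features and a linear SVM; the routing targets and example texts are invented for illustration, not the E-Gen data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: hypothetical routing targets and email texts.
emails = [
    "Please send me the status of my application",
    "Invoice attached for last month's services",
    "I would like to apply for the developer position",
    "Your payment is overdue, see attached reminder",
]
routes = ["hr", "accounting", "hr", "accounting"]

router = make_pipeline(TfidfVectorizer(), LinearSVC())
router.fit(emails, routes)
print(router.predict(["Where should I send my resume?"]))
```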
Al-Natsheh, Hussein. "Text Mining Approaches for Semantic Similarity Exploration and Metadata Enrichment of Scientific Digital Libraries". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE2062.
Texto completoFor scientists and researchers, it is critical to ensure that knowledge is accessible for re-use and further development. Moreover, the way we store and manage scientific articles and their metadata in digital libraries determines how many relevant articles we can discover and access, depending on what is actually meant in a search query. Yet are we able to explore all semantically relevant scientific documents with existing keyword-based information retrieval systems? This is the primary question addressed in this thesis. Hence, the main purpose of our work is to broaden the knowledge spectrum of researchers working in an interdisciplinary domain when they use the information retrieval systems of multidisciplinary digital libraries. The problem arises when such researchers use community-dependent search keywords while the relevant concepts go by other scientific names in a different research community. Towards a solution to this semantic exploration task in multidisciplinary digital libraries, we applied several text mining approaches. First, we studied the semantic representation of words, sentences, paragraphs and documents for better semantic similarity estimation. In addition, we utilized the semantic information about words found in lexical databases and knowledge graphs in order to enhance our semantic approach. Furthermore, the thesis presents two use-case implementations of the proposed model.
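As a minimal stand-in for the document-level semantic representations discussed here, the following sketch computes semantic similarity with latent semantic analysis (TF-IDF followed by truncated SVD); the thesis explores richer representations, so this is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "gene expression regulation in eukaryotic cells",
    "transcriptional control of eukaryote gene activity",
    "stochastic gradient descent for deep neural networks",
]
tfidf = TfidfVectorizer().fit_transform(docs)
embeddings = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print(np.round(cosine_similarity(embeddings), 2))  # docs 0 and 1 share a topic
```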
Francois, Nicolas. "Alignement, séquence consensus, recherche de similarités : complexité et approximabilité". Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2005. http://tel.archives-ouvertes.fr/tel-00108020.
Texto completoconcernant la comparaison de séquences biologiques. Nous nous pla¸cons successivement du point de vue de
chacune des trois principales théories de la complexité algorithmique : la NP-complétude, l'approximabilité
et la complexité paramétrique.
Dans un premier temps, nous considérons plusieurs formes du problème de l'extraction des motifs communs
à un ensemble de séquences donné. Les motifs communs permettent, en pratique, de classifier les protéines
grâce à leur structure primaire, par exemple en fabriquant des séquences consensus.
En particulier, le problème de la médiane (resp. du centre) pour la distance d'édition consiste à rechercher
une séquence consensus minimisant la somme (resp. le maximum) des distances d'édition la séparant de
chacune des séquences prises en entrée. Nous affinons les résultats connus sur la difficulté de chacun de ces
deux problèmes : nous montrons, par exemple, qu'ils sont tous les deux W[1]-difficiles lorsqu'on les
paramétrise par le nombre des séquences étudiées et ce, même dans le cas d'un alphabet binaire. Nous
considérons également le problème de la plus longue sous-séquence commune. Ce problème a été
exhaustivement étudié dans sa forme usuelle. Or, on trouve dans la nature des séquences d'ADN et d'ARN
circulaires qu'il est utile de comparer. Dans ce mémoire, nous menons à bien la première étude du problème
de la plus longue sous-séquence commune à plusieurs séquences circulaires et/ou non orientées.
Dans un second temps, nous considérons plusieurs problèmes liés à la recherche de similarités approchées
entre séquences biologiques. C'est dans ce domaine que l'application de l'informatique à la biologie
moléculaire a été la plus fructueuse. En pratique les similarités permettent de déterminer les propriétés des
molécules nouvellement séquencées à l'aide de celles des séquences déjà annotées. En effet, une similarité en
séquence entraîne généralement une similarité en structure ou en fonction.
La plupart des nombreux logiciels dédiés à la détection de similarités locales, mettent en oeuvre des filtres
heuristiques : deux portions de séquences ne possédant pas certains motifs spécifiques en commun sont
considérées d'emblée comme dissimilaires. Le choix des motifs conditionne la sensibilité et la sélectivité du
filtre associé. Dans ce mémoire nous considérons un certain type de motifs appelé graine. Il s'agit en fait de
sous-chaînes à trous.
Nous étudions plusieurs problèmes algorithmiques liés à la conception de bonnes graines. En particulier,
nous montrons que le problème suivant est NP-difficile : étant donnés deux entiers naturels k, m et une
graine, décider si le filtre associé est sans perte lorsque l'on restreint la notion de similarité aux paires de
mots de même longueur m, séparés par une distance de Hamming au plus k. Notons que plusieurs
algorithmes exponentiels ont été proposés pour des généralisations de ce problème.
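The lossless-filter question above can be made concrete with a brute-force check, feasible only for tiny m and k since the decision problem is NP-hard in general. The seed encoding ('#' = must match, '-' = don't care) and the function name are illustrative conventions, not the thesis's notation.

```python
from itertools import combinations

def seed_is_lossless(seed, m, k):
    """Does the seed detect every pair of length-m words at Hamming distance
    <= k? It suffices to test exactly k mismatch positions, since removing a
    mismatch can only make detection easier."""
    match_offsets = [i for i, c in enumerate(seed) if c == "#"]
    span = len(seed)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        # The pair is detected if some alignment of the seed avoids all
        # mismatch positions at its must-match offsets.
        detected = any(
            all(pos + off not in bad for off in match_offsets)
            for pos in range(m - span + 1)
        )
        if not detected:
            return False
    return True

print(seed_is_lossless("###-#", m=12, k=2))
```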
Nicolas, François. "Alignement, séquence, consensus, recherche de similarités : complexité et approximabilité". Montpellier 2, 2005. http://www.theses.fr/2005MON20179.
Texto completoAlbitar, Shereen. "De l'usage de la sémantique dans la classification supervisée de textes : application au domaine médical". Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4343/document.
Texto completoThe main interest of this research is the effect of using semantics in the process of supervised text classification. This effect is evaluated through an experimental study on documents related to the medical domain, using the UMLS (Unified Medical Language System) as a semantic resource. The evaluation follows four scenarios involving semantics at different steps of the classification process: the first scenario incorporates a conceptualization step, where the text is enriched with the corresponding concepts from UMLS; the second and third scenarios concern enriching the vectors that represent text as a Bag of Concepts (BOC) with similar concepts; the last scenario considers using semantics during class prediction, where concepts, as well as the relations between them, are involved in decision making. We test the first scenario using three popular classification techniques: Rocchio, Naive Bayes (NB) and SVM. We choose Rocchio for the other scenarios because of its extensibility with semantics. According to the experiments, the results demonstrate a significant improvement in classification performance when conceptualization is applied before indexing. Moderate improvements are reported when using the conceptualized text representation with semantic enrichment after indexing, or with text-to-text semantic similarity measures for prediction.
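A hedged sketch of the conceptualized-representation scenario: documents are represented as bags of concepts (stand-in identifiers rather than real UMLS CUIs) and classified with a nearest-centroid model, used here as a Rocchio-style approximation rather than the thesis's exact classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

# Each "token" stands in for a concept identifier produced by a
# conceptualization step; identifiers and labels are illustrative only.
docs = [
    "C0011849 C0020538",
    "C0011849 C0027051",
    "C0006826 C0030705",
    "C0006826 C0027651",
]
labels = ["metabolic", "metabolic", "oncology", "oncology"]

clf = make_pipeline(TfidfVectorizer(), NearestCentroid())
clf.fit(docs, labels)
print(clf.predict(["C0020538 C0011849"]))
```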