Theses / dissertations on the topic "Qualité des données et des informations"
Consult the top 50 theses / dissertations for studies on the topic "Qualité des données et des informations".
Le Conte des Floris, Robin. "Effet des biais cognitifs et de l'environnement sur la qualité des données et des informations". Electronic Thesis or Diss., Université Paris sciences et lettres, 2024. http://www.theses.fr/2024UPSLM004.
From the perspective of philosopher Friedrich Nietzsche, there is no reality that exists in itself, no raw fact, no absolute reality: everything that we define as reality is, in fact, only the result of interpretation processes that are unique to us. Moreover, the data stored in information systems is often nothing more than the coded representation of statements made by human beings, thereby inherently involving human interpretation and consequently being affected by the same biases and limitations that characterize the human psyche. This thesis introduces a new conceptual framework, the "Data binding and reification" (DBR) model, that describes the process of data interpretation, and then the reification of information, using a new approach that places human-perception mechanisms at the heart of this process. By mobilizing cognitive and behavioral sciences, this approach allows us to identify to what extent human intervention and the structure of the environment to which one is subjected condition the emergence of cognitive biases affecting these processes. Experimental results partially validate this model by identifying the characteristics of the environment that affect, in an organizational context, the data-collection process and the quality of the information produced. This work opens up numerous perspectives, such as the development of a choice architecture in the sense of the economist Richard Thaler, which could improve the very process of data collection by modifying the experience of users of the information system.
Ravi, Mondi. "Confiance et incertitude dans les environnements distribués : application à la gestion des données et de la qualité des sources de données dans les systèmes M2M (Machine to Machine)". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM090/document.
Trust and uncertainty are two important aspects of many distributed systems. For example, multiple sources of information can be available for the same type of information. This poses the problem of selecting the best source, the one that can produce the most certain information, and of resolving incoherence amongst the available information. Managing trust and uncertainty together forms a complex problem, and through this thesis we develop a solution to it. Trust and uncertainty have an intrinsic relationship. Trust is primarily related to sources of information while uncertainty is a characteristic of the information itself. In the absence of trust and uncertainty measures, a system generally suffers from problems like incoherence and uncertainty. To improve on this, we hypothesize that sources with higher trust levels will produce more certain information than those with lower trust values. We then use the trust measures of the information sources to quantify uncertainty in the information and thereby infer high-level conclusions with greater certainty. A general trend in modern distributed systems is to embed reasoning capabilities in the end devices to make them smart and autonomous. We model these end devices as agents of a Multi-Agent System. Major sources of beliefs for such agents are external information sources that can possess varying trust levels. Moreover, the incoming information and beliefs are associated with a degree of uncertainty. Hence, the agents face the twofold problem of managing trust in sources and handling uncertainty in the information. We illustrate this with three application domains: (i) the intelligent community, (ii) smart city garbage collection, and (iii) FIWARE, a European project about the Future Internet that motivated the research on this topic. Our solution to the problem involves modelling the devices (or entities) of these domains as intelligent agents that comprise a trust management module, an inference engine and a belief revision system. We show that this set of components can help agents to manage trust in other sources, quantify uncertainty in the information, and then use this to infer more certain high-level conclusions. We finally assess our approach using simulated and real data pertaining to the different application domains.
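A minimal sketch (not taken from the thesis) of the central hypothesis above, that higher-trust sources yield more certain information: each reported value is discounted by the trust of its source, and conflicting reports are resolved by keeping the most certain one. The combination rule, names and numbers are illustrative assumptions.

```python
# Hypothetical sketch: each source has a trust level in [0, 1]; each reported
# value carries its own uncertainty in [0, 1]. Conflicting reports about the
# same fact are resolved by keeping the value with the highest combined certainty.

def combined_certainty(source_trust: float, value_uncertainty: float) -> float:
    """Certainty of a reported value, discounted by the trust in its source."""
    return source_trust * (1.0 - value_uncertainty)

def resolve(reports):
    """reports: list of (source, trust, value, uncertainty) describing one fact."""
    best = max(reports, key=lambda r: combined_certainty(r[1], r[3]))
    return best[2], combined_certainty(best[1], best[3])

if __name__ == "__main__":
    # Three sources report the temperature of the same room (illustrative data).
    reports = [
        ("sensor_A", 0.9, 21.5, 0.10),  # trusted source, fairly certain value
        ("sensor_B", 0.4, 25.0, 0.05),  # low-trust source, very certain value
        ("sensor_C", 0.8, 21.8, 0.30),  # trusted source, noisier value
    ]
    value, certainty = resolve(reports)
    print(f"retained value: {value} (certainty {certainty:.2f})")
```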
Boydens, Isabelle. "Evaluer et améliorer la qualité de l'information: herméneutique des bases de données administratives". Doctoral thesis, Universite Libre de Bruxelles, 1998. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/212039.
Mantilla Morales, Gabriela. "Modélisation des transferts de nitrates, confrontation des concepts, des données et des informations : application au bassin de la Charente". Phd thesis, Ecole Nationale des Ponts et Chaussées, 1995. http://pastel.archives-ouvertes.fr/pastel-00569426.
Merino Laso, Pedro. "Détection de dysfonctionements et d'actes malveillants basée sur des modèles de qualité de données multi-capteurs". Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0056/document.
Naval systems represent a strategic infrastructure for international commerce and military activity. Their protection is thus an issue of major importance. Naval systems are increasingly computerized in order to perform optimal and secure navigation. To attain this objective, on-board sensor systems provide navigation information to be monitored and controlled from distant computers. Because of their importance and computerization, naval systems have become a target for hackers. Maritime vessels also operate in harsh and uncertain environments that produce failures. Navigation decision-making based on wrongly understood anomalies can be potentially catastrophic. Due to the particular characteristics of naval systems, existing detection methodologies cannot be applied. We propose quality evaluation and analysis as an alternative. The novelty of quality applications on cyber-physical systems shows the need for a general methodology, which is conceived and examined in this dissertation, to evaluate the quality of generated data streams. The identified quality elements allow us to introduce an original approach to detect malicious acts and failures. It consists of two processing stages: first, an evaluation of quality, followed by the determination of agreement limits, compliant with normal states, to identify and categorize anomalies. The case studies of 13 scenarios for a fuel-tank simulator training platform and 11 scenarios for two aerial drones illustrate the interest and relevance of the obtained results.
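The two-stage idea above (quality evaluation, then agreement limits used to categorize anomalies) can be illustrated with a small hedged sketch; the quality score, the 3-sigma limits and the readings below are assumptions for illustration, not the models used in the dissertation.

```python
from statistics import mean, stdev

# Hypothetical sketch: (1) score the plausibility of each reading of a sensor
# stream, (2) derive agreement limits from readings known to be normal, then
# flag anything whose quality falls outside those limits as an anomaly.

def quality_score(value, previous, max_step=2.0):
    """Plausibility of a reading: 1.0 if close to the previous one, lower otherwise."""
    if previous is None:
        return 1.0
    return max(0.0, 1.0 - abs(value - previous) / max_step)

def agreement_limits(scores, k=3.0):
    m, s = mean(scores), stdev(scores)
    return m - k * s, m + k * s

if __name__ == "__main__":
    normal_run = [20.0, 20.1, 20.3, 20.2, 20.4, 20.3, 20.5]   # known-normal stream
    monitored  = [20.4, 20.6, 27.9, 20.5, 20.7]               # one injected fault

    normal_scores = [quality_score(v, p) for p, v in zip([None] + normal_run, normal_run)]
    low, high = agreement_limits(normal_scores)

    prev = normal_run[-1]
    for v in monitored:
        s = quality_score(v, prev)
        status = "OK" if low <= s <= high else "ANOMALY"
        print(f"value={v:5.1f} quality={s:.2f} -> {status}")
        prev = v   # note: the reading right after a fault may also be flagged
```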
Kerrouche, Abdelali. "Routage des données dans les réseaux centrés sur les contenus". Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC1119/document.
Information Centric Networking (ICN) represents a new paradigm that is increasingly developed within the Internet world. It brings forward new content-centric approaches in order to design a new architecture for the future Internet, whose usage today shifts from machine-oriented communication (hosts) to large-scale content distribution and retrieval. In this context, several ICN architectures have been proposed by the scientific community within several international projects: DONA, PURSUIT, SAIL, COMET, CONVERGENCE, Named Data Networking (NDN), etc. Our thesis work has focused on the problems of routing in such networks, through the NDN architecture, which represents one of the most advanced ICN architectures nowadays. In particular, we were interested in designing and implementing routing solutions that integrate quality-of-service (QoS) metrics in the NDN architecture in terms of current Internet usage. The latter is indeed characterized by a heterogeneity of connections and highly dynamic traffic conditions. In this type of architecture, data packet forwarding is organized in two levels: the routing plane and the forwarding plane. The latter is responsible for routing packets on all available paths through an identified upstream strategy. The routing plane is meanwhile used only to support the forwarding plane. Our solutions consist of new QoS routing strategies which we describe as adaptive. These strategies can transmit packets over multiple paths while taking into account the QoS parameters related to the state of the network and collected in real time. The first proposed approach is designed on the basis of an online Q-learning-type inductive learning method, and is used to estimate the information collected on the dynamic state of the network. The second contribution is an adaptive routing strategy designed for NDN architectures which considers the metrics related to QoS. It is based on the similarities between the packet forwarding process in the NDN architecture and the behaviour of ants when finding the shortest path between their nest and food sources. The techniques used to design this strategy are based on the optimization approaches used in "ant colony" algorithms. Finally, in the last part of the thesis, we generalize the approach described above to extend it to the simultaneous consideration of several QoS parameters. Based on these principles, this approach was later extended to solving problems related to congestion. The results show the effectiveness of the proposed solutions in an NDN architecture and thus make it possible to consider QoS parameters in packet delivery mechanisms, paving the way for various content-oriented applications on this architecture.
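As a rough illustration of the ant-colony-inspired forwarding idea described above (a simplified assumption, not the thesis' actual strategy), the sketch below keeps a pheromone level per outgoing face, picks a face with probability proportional to its pheromone and the inverse of its delay, and reinforces the chosen face when the measured delay is low. Parameter names and values are illustrative.

```python
import random

ALPHA, BETA, EVAPORATION = 1.0, 2.0, 0.1   # illustrative parameters

def choose_face(faces):
    """Pick a face with probability proportional to pheromone^ALPHA * (1/delay)^BETA."""
    weights = [f["pheromone"] ** ALPHA * (1.0 / f["delay"]) ** BETA for f in faces]
    r, acc = random.uniform(0, sum(weights)), 0.0
    for f, w in zip(faces, weights):
        acc += w
        if r <= acc:
            return f
    return faces[-1]

def reinforce(faces, chosen, measured_delay):
    for f in faces:
        f["pheromone"] *= (1.0 - EVAPORATION)       # evaporation on every face
    chosen["pheromone"] += 1.0 / measured_delay     # reward inversely to the delay

if __name__ == "__main__":
    random.seed(0)
    faces = [{"name": "face0", "pheromone": 1.0, "delay": 30.0},
             {"name": "face1", "pheromone": 1.0, "delay": 10.0},
             {"name": "face2", "pheromone": 1.0, "delay": 50.0}]
    for _ in range(200):
        f = choose_face(faces)
        reinforce(faces, f, f["delay"])             # measured delay fed back here
    for f in faces:
        print(f["name"], round(f["pheromone"], 2))  # the fastest face dominates
```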
Glele Ahanhanzo, Yolaine. "Qualité des données dans le système d'information sanitaire de routine et facteurs associés au Bénin: place de l'engagement au travail". Doctoral thesis, Université Libre de Bruxelles, 2014. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209216.
In first-level health centres of the Atlantique and Littoral departments, in southern Benin, we carried out six studies to meet the research objectives. Studies 1 and 2, based respectively on the lot quality assurance sampling and capture-recapture methodologies, measure data quality. Studies 3 and 4, both cross-sectional, analyse the work engagement of the health workers responsible for the routine health information system at the operational level. Studies 5 and 6, cross-sectional and prospective respectively, identify the factors associated with data quality.
These analyses show that:
•
Doctorate in Public Health Sciences
info:eu-repo/semantics/nonPublished
Ben Khedher, Anis. "Amélioration de la qualité des données produits échangées entre l'ingénierie et la production à travers l'intégration de systèmes d'information dédiés". Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO20012.
This research work contributes to improving the quality of the data exchanged between production and the engineering units dedicated to product design and production system design. This improvement is addressed by studying the interactions between product lifecycle management and production management. These two concepts are supported, wholly or partly, by industrial information systems; the study of their interactions then leads to the integration of these information systems (PLM, ERP and MES). In a highly competitive and globalized environment, companies are forced to innovate and reduce costs, especially production costs. Faced with these challenges, the volume and frequency of change of production data are increasing, due to the steady reduction of product lifetimes and time-to-market, the increase in product customization and the generalization of continuous improvement in production. Consequently, the need to formalize and manage all production data arises. These data should be provided to production operators and machines. After analysing data quality for each existing architecture and demonstrating their inability to address this problem, an architecture based on the integration of the three information systems involved in production (PLM, ERP and MES) is proposed. This architecture leads to two complementary sub-problems. The first is the development of an architecture based on Web services to improve the accessibility, safety and completeness of the exchanged data. The second is an integration architecture based on ontologies, offering integration mechanisms based on semantics in order to ensure the correct interpretation of the exchanged data. Finally, the model of a software tool supports the proposed solution and ensures that the integration of data exchanged between engineering and production is carried out.
Michel, Pierre. "Sélection d'items en classification non supervisée et questionnaires informatisés adaptatifs : applications à des données de qualité de vie liée à la santé". Thesis, Aix-Marseille, 2016. http://www.theses.fr/2016AIXM4097/document.
An adaptive test provides a valid measure of patients' quality of life while reducing the number of items to be filled in. This approach is dependent on the models used, which are sometimes based on unverifiable assumptions. We propose an alternative approach based on decision trees. This approach is not based on any assumptions and requires less calculation time for item administration. We present different simulations that demonstrate the relevance of our approach. We present an unsupervised classification method called CUBT. CUBT includes three steps to obtain an optimal partition of a data set. The first step grows a tree by recursively dividing the data set. The second step groups together pairs of terminal nodes of the tree. The third step aggregates terminal nodes that do not come from the same split. Different simulations are presented to compare CUBT with other approaches. We also define heuristics for the choice of CUBT parameters. CUBT identifies the variables that are active in the construction of the tree. However, although some variables may be irrelevant, they may be competitive with the active variables. It is essential to rank the variables according to an importance score to determine their relevance in a given model. We present a method based on CUBT and competitive binary splits to define a variable importance score. We analyze the efficiency and stability of this new index, comparing it with other methods.
Guemeida, Abdelbasset. "Contributions à une nouvelle approche de Recherche d'Information basée sur la métaphore de l'impédance et illustrée sur le domaine de la santé". Phd thesis, Université Paris-Est, 2009. http://tel.archives-ouvertes.fr/tel-00581322.
Ostermann, Pascal. "Logiques modales et informations incomplètes". Toulouse, ENSAE, 1988. http://www.theses.fr/1988ESAE0013.
Dzogang, Fabon. "Représentation et apprentissage à partir de textes pour des informations émotionnelles et pour des informations dynamiques". Paris 6, 2013. http://www.theses.fr/2013PA066253.
Automatic knowledge extraction from texts consists in mapping low-level information, as carried by the words and phrases extracted from documents, to higher-level information. The choice of data representation for describing documents is thus essential, and the definition of a learning algorithm is subject to their specifics. This thesis addresses these two issues in the context of emotional information on the one hand and dynamic information on the other. In the first part, we consider the task of emotion extraction, for which the semantic gap is wider than it is with more traditional thematic information. Therefore, we propose to study representations aimed at modeling the many nuances of natural language used for describing emotional, hence subjective, information. Furthermore, we propose to study the integration of semantic knowledge which provides, from a characterization perspective, support for extracting the emotional content of documents and, from a prediction perspective, assistance to the learning algorithm. In the second part, we study information dynamics: any corpus of documents published over the Internet can be associated with sources in perpetual activity which exchange information in a continuous movement. We explore three main lines of work: automatically identified sources; the communities they form in a dynamic and very sparse description space; and the noteworthy themes they develop. For each, we propose original extraction methods which we apply to a corpus of real data collected from information streams over the Internet.
Lamer, Antoine. "Contribution à la prévention des risques liés à l’anesthésie par la valorisation des informations hospitalières au sein d’un entrepôt de données". Thesis, Lille 2, 2015. http://www.theses.fr/2015LIL2S021/document.
Introduction: Hospital Information Systems (HIS) manage and register, every day, millions of data items related to patient care: biological results, vital signs, drug administrations, care processes... These data are stored in operational applications that provide remote access and a comprehensive picture of the Electronic Health Record. These data may also be used for other purposes, such as clinical research or public health, particularly when integrated in a data warehouse. Some studies have highlighted a statistical link between compliance with quality indicators related to the anesthesia procedure and patient outcome during the hospital stay. In the University Hospital of Lille, these quality indicators, as well as patient comorbidities during the post-operative period, can be assessed with data collected by applications of the HIS. The main objective of this work is to integrate data collected by operational applications in order to carry out clinical research studies. Methods: First, the quality of the data registered by the operational applications is evaluated with methods … by the literature or developed in this work. Then, the data quality problems highlighted by the evaluation are handled during the integration step of the ETL process. New data are computed and aggregated in order to provide indicators of quality of care. Finally, two studies demonstrate the usability of the system. Results: Pertinent data from the HIS have been integrated in an anesthesia data warehouse. This system has stored data about hospital stays and interventions (drug administrations, vital signs…) since 2010. Aggregated data have been developed and used in two clinical research studies. The first study highlighted a statistical link between induction and patient outcome. The second study evaluated compliance with quality indicators of ventilation and the impact on comorbidity. Discussion: The data warehouse and the cleaning and integration methods developed as part of this work allow statistical analysis to be performed on more than 200,000 interventions. This system can be implemented with other applications used in the CHRU of Lille, but also with Anesthesia Information Management Systems used by other hospitals.
Choquet, Rémy. "Partage de données biomédicales : modèles, sémantique et qualité". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2011. http://tel.archives-ouvertes.fr/tel-00824931.
Texto completo da fonteDe, Saint Denis Delphine. "Informations et données personnelles dans le cadre de l'exécution des titres exécutoires". Electronic Thesis or Diss., Toulon, 2020. http://www.theses.fr/2020TOUL0134.
The effectiveness of enforceable titles requires transparency of personal and patrimonial information, allowing enforced execution to proceed. This information is multiple but subject to the general regulation on personal data protection. Therefore, not all information about people and their assets can be obtained or used in every circumstance. Information transparency must consequently be proportionate to the title being enforced, both when the information is obtained and when it is subsequently used. Once acquired, the information must be protected from any harm. This protection extends from the moment the information is obtained to the end of its use and to its effective destruction after its legal archiving period. Between transparency and opacity, personal and patrimonial information must be easily accessible to the enforcement officer while remaining beyond the reach of predatory third parties. The judicial officer must both guarantee the competing interests of the parties and keep the information translucent, at the service of the effectiveness of enforceable titles and therefore of good justice.
Ben Salem, Aïcha. "Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données". Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD054/document.
Nowadays, complex applications such as knowledge extraction, data mining, e-learning or web applications use heterogeneous and distributed data. The quality of any decision depends on the quality of the data used. The absence of rich, accurate and reliable data can potentially lead an organization to make bad decisions. The subject covered in this thesis aims at assisting the user in his or her quality approach. The goal is to better extract, mix, interpret and reuse data. For this, the data must be related to its semantic meaning, data types, constraints and comments. The first part deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, including the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. The links detected between columns offer a better understanding of the source and alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies. The second part is the data cleansing, using the reports on anomalies returned by the first part. It allows corrections to be made within a column itself (data homogenization), between columns (semantic dependencies), and between lines (eliminating duplicates and similar data). Throughout this process, recommendations and analyses are provided to the user.
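A minimal sketch of the first step described above (assigning a semantic category to a column and reporting values that deviate from it); the regular expressions, category names and example column are illustrative assumptions, not the dictionaries used in the thesis.

```python
import re

# Hypothetical sketch: the dominant syntactic pattern of a column's values
# determines its category; values that do not match the detected category are
# reported as candidate anomalies.

PATTERNS = {
    "email":    re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "date_iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "phone_fr": re.compile(r"^0\d{9}$"),
}

def categorize(values):
    counts = {name: sum(bool(p.match(v)) for v in values) for name, p in PATTERNS.items()}
    category = max(counts, key=counts.get)
    if counts[category] == 0:
        return "unknown", []
    anomalies = [v for v in values if not PATTERNS[category].match(v)]
    return category, anomalies

if __name__ == "__main__":
    column = ["alice@example.org", "bob@example.org", "not-an-address", "carol@example.org"]
    category, anomalies = categorize(column)
    print(f"detected category: {category}, anomalies: {anomalies}")
```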
Marot, Pierre-Yves. "Les données et informations à caractère personnel : essai sur la notion et ses fonctions". Nantes, 2007. http://www.theses.fr/2007NANT4012.
Whereas the primacy of the person is strongly stated by law, the splitting of the legal sources devoted to the data and information pertaining to the person (personal data, nominative information, privacy...) is likely to dismantle the person into as many specific legal statuses as there are data and information. The notion of privacy contributes greatly to this danger because, if its protection means the protection of a large amount of data and information, their nature does not indicate what legal status is to be applied in each case. In this context, it is not surprising to see courts allowing the modification of civil status (names, surnames, sex...) on the paradoxical rationale of the right to privacy, even though civil status largely depends on state decisions. Facing these conceptual contradictions, we note the emergence of a category of personal data and information whose common criterion is the identification of the person they allow. Starting from this functional category, it becomes possible to explore its practical implications and to give an account of them. As it appears, while the use of personal data and information should remain exceptional, it becomes massive as soon as public interests are concerned (e.g. the penal system, public health and public information). It is therefore advisable to restore, in all its fullness, the principle of protection of personal data and information, by strictly construing its exemptions and by relying on the necessary safeguard that unavailability provides.
Weber-Baghdiguian, Lexane. "Santé, genre et qualité de l'emploi : une analyse sur données microéconomiques". Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLED014/document.
This thesis studies the influence of work on job and life quality, the latter being considered through the perception that individuals have of their own health. The first chapter focuses on the long-term effects of job losses due to plant closure on job quality. We show that job loss negatively affects wages, perceived job insecurity, the quality of the working environment and job satisfaction, including in the long run. The last two chapters investigate gender differences in self-reported health. The second chapter provides descriptive evidence on the relationships between self-assessed health, gender and mental health problems, i.e. depression and/or affective pains. Finally, in the last chapter, we study the influence of social norms, as proxied by the gender structure of the workplace environment, on gender differences in self-reported health. We show that both women and men working in female-dominated environments report more specific health problems than those who work in male-dominated environments. The overall findings of this thesis are twofold. First, losing a job has a negative impact on several dimensions of job quality and satisfaction in the long run. Secondly, mental diseases and social norms at work are important for understanding gender-related differences in health perceptions.
Puricelli, Alain. "Réingénierie et Contrôle Qualité des Données en vue d'une Migration Technologique". Lyon, INSA, 2000. http://theses.insa-lyon.fr/publication/2000ISAL0092/these.pdf.
The purpose of this thesis is to develop a methodology for logical consistency checking in a Geographical Information System (GIS), in order to ensure the migration of the data in the case of a technological change of system and restructuring. This methodology is then applied to a real GIS installed in the Urban Community of Lyon (the SUR). Logical consistency is one of the quality criteria commonly accepted within the community of producers and users of geographical data, as are precision or exhaustiveness, for instance. After a presentation of the elements of quality and metadata in GIS, a state of the art is given concerning various standardization efforts in these fields. The different standards under development (those of the CEN, the ISO and the FGDC, among others) are analyzed and commented on. A methodology for the detection and correction of geometrical and topological errors is then detailed, within the framework of existing geographical vector databases. Three types of errors are identified, namely structural, geometrical and semantic errors. For each of these families of errors, methods of detection based on established theories (integrity constraints, topology and computational geometry) are proposed, and ideas for the corrections are detailed. This approach is then implemented in the context of the SUR databases. To complete this application, a specific mechanism was developed to also deal with errors in tessellations, which were not taken into account by the methodology (which uses binary topological relations). Finally, to ensure the consistency of the corrections, a method was set up to propagate the corrections to the neighbourhood of the objects under correction. Those objects can be located inside a single layer of data as well as between different layers or different databases of the system.
Feno, Daniel Rajaonasy. "Mesures de qualité des règles d'association : normalisation et caractérisation des bases". Phd thesis, Université de la Réunion, 2007. http://tel.archives-ouvertes.fr/tel-00462506.
Texto completo da fonteBazin, Cyril. "Tatouage de données géographiques et généralisation aux données devant préserver des contraintes". Caen, 2010. http://www.theses.fr/2010CAEN2006.
Digital watermarking is a fundamental process for intellectual property protection. It consists in inserting a mark into a digital document by slight modifications. The presence of this mark allows the owner of a document to prove the priority of his rights. The originality of our work is twofold. On the one hand, we use a local approach to ensure a priori that the quality of constrained documents is preserved during the watermark insertion. On the other hand, we propose a generic watermarking scheme. The manuscript is divided into three parts. Firstly, we introduce the basic concepts of digital watermarking for constrained data and the state of the art of geographical data watermarking. Secondly, we present our watermarking scheme for digital vector maps, often used in geographic information systems. This scheme preserves some topological and metric qualities of the document. The watermark is robust; it is resilient against geometric transformations and cropping. We give an efficient implementation that is validated by many experiments. Finally, we propose a generalization of the scheme for constrained data. This generic scheme will facilitate the design of watermarking schemes for new data types. We give a particular example of the application of the generic scheme to relational databases. In order to prove that it is possible to work directly on the generic scheme, we propose two detection protocols directly applicable to any implementation of the generic scheme.
Tabet, Antoine. "Gestion des capteurs et des informations pour un système de détection à multifonction". Perpignan, 2006. http://www.theses.fr/2006PERP0764.
Indoor air quality has become a major concern in France, in Europe and worldwide. Indoor environments concentrate most of the population and most sources of pollution. They are a privileged object of study for the evaluation of pollution, of its influence on health, and of the solutions proposed for the reduction of pollutants. In France, at the present time, there are no obligations concerning the monitoring of indoor air quality. Nevertheless, regulatory texts are under development. Based on a multi-source approach, this thesis presents a complete methodology for building a device for monitoring and measuring chemical, physical and microbiological pollution. It also presents the study, validation and implementation of an original unit for the detection and control of pollutants. This unit is based on automation; it makes it possible to carry out a complete assessment of air quality while transmitting the data received in real time over the Internet. A prototype making it possible to manage data, control interfaces and transmit the received values over the Internet was developed.
Jarwah, Sahar. "Un modèle générique pour la gestion des informations complexes et dynamiques :". Phd thesis, Grenoble 1, 1992. http://tel.archives-ouvertes.fr/tel-00341088.
Texto completo da fonteBen, Saad Myriam. "Qualité des archives web : modélisation et optimisation". Paris 6, 2011. http://www.theses.fr/2011PA066446.
Texto completo da fonteMaddi, Abdelghani. "La quantification de la recherche scientifique et ses enjeux : bases de données, indicateurs et cartographie des données bibliométriques". Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD020/document.
The issue of the productivity and "quality" of scientific research is one of the central issues of the 21st century in the economic and social world. Scientific research, a source of innovation in all fields, is considered the key to economic development and competitiveness. Science must also contribute to the societal challenges defined in the Framework Programmes for Research and Technological Development (H2020), for example, such as health, demography and well-being. In order to rationalize public spending on research and innovation, or to guide the investment strategies of funders, several indicators have been developed to measure the performance of research entities. Now, no one can escape evaluation, starting with research articles, researchers, institutions and countries (Pansu, 2013; Gingras, 2016). For lack of methodological comprehension, quantitative indicators are sometimes misused, neglecting the aspects related to their method of calculation/normalization, what they represent, or the inadequacies of the databases from which they are calculated. This situation may have disastrous scientific and social consequences. Our work examines the tools of evaluative bibliometrics (indicators and databases) in order to assess the issues related to the quantitative evaluation of scientific performance. We show through this research that quantitative indicators can never be used alone to measure the quality of research entities, given the disparities in results according to the analysis perimeters, the ex-ante problems related to the individual characteristics of researchers which directly affect the quantitative indicators, or the shortcomings of the databases from which they are calculated. For a responsible evaluation, it is imperative to accompany the quantitative measures with a qualitative assessment by peers. In addition, we also examined the effectiveness of quantitative measures for the purpose of understanding the evolution of science and the formation of scientific communities. Our analysis, applied to a corpus of publications dealing with the economic crisis, allowed us to show the dominant authors and currents of thought, as well as the temporal evolution of the terms used in this research theme.
Legros, Diégo. "Innovation, formation, qualité et performances des entreprises : Une étude économétrique sur données d'entreprises". Paris 2, 2005. http://www.theses.fr/2005PA020106.
Texto completo da fonteCaron, Clément. "Provenance et Qualité dans les Workflows Orientés Données : application à la plateforme WebLab". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066568/document.
The WebLab platform is an application used to define and execute media-mining workflows. It is an open-source platform, developed by the IPCC section of Airbus Defence and Space, for the integration of external components. A designer can create complex media-mining workflows using components whose operation is not always known (black-box services). These complex workflows can lead to data quality problems, however, and before this work, no tool existed to analyse and improve the quality of WebLab workflows. To deal with black-box services, we chose to tackle this quality problem with a non-intrusive approach: we enhance the definition of the WebLab workflow with provenance and quality propagation rules. Provenance rules generate fine-grained data dependency links between data and services after the execution of a WebLab workflow. The quality propagation rules then use these links to reason on the influence that the quality of the data used by a component has on the quality of the output data…
Azé, Jérôme. "Extraction de Connaissances à partir de Données Numériques et Textuelles". Phd thesis, Université Paris Sud - Paris XI, 2003. http://tel.archives-ouvertes.fr/tel-00011196.
Texto completo da fonteL'analyse de telles données est souvent contrainte par la définition d'un support minimal utilisé pour filtrer les connaissances non intéressantes.
Les experts des données ont souvent des difficultés pour déterminer ce support.
Nous avons proposé une méthode permettant de ne pas fixer un support minimal et fondée sur l'utilisation de mesures de qualité.
Nous nous sommes focalisés sur l'extraction de connaissances de la forme "règles d'association".
Ces règles doivent vérifier un ou plusieurs critères de qualité pour être considérées comme intéressantes et proposées à l'expert.
Nous avons proposé deux mesures de qualité combinant différents critères et permettant d'extraire des règles intéressantes.
Nous avons ainsi pu proposer un algorithme permettant d'extraire ces règles sans utiliser la contrainte du support minimal.
Le comportement de notre algorithme a été étudié en présence de données bruitées et nous avons pu mettre en évidence la difficulté d'extraire automatiquement des connaissances fiables à partir de données bruitées.
Une des solutions que nous avons proposée consiste à évaluer la résistance au bruit de chaque règle et d'en informer l'expert lors de l'analyse et de la validation des connaissances obtenues.
Enfin, une étude sur des données réelles a été effectuée dans le cadre d'un processus de fouille de textes.
Les connaissances recherchées dans ces textes sont des règles d'association entre des concepts définis par l'expert et propres au domaine étudié.
Nous avons proposé un outil permettant d'extraire les connaissances et d'assister l'expert lors de la validation de celles-ci.
Les différents résultats obtenus montrent qu'il est possible d'obtenir des connaissances intéressantes à partir de données textuelles en minimisant la sollicitation de l'expert dans la phase d'extraction des règles d'association.
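To make the idea concrete, here is a hedged sketch of rule filtering driven by quality measures (confidence and lift) instead of a minimum support: rare rules are kept as long as their quality is high. The measures, thresholds and toy transactions are illustrative; they are not the two measures proposed in the thesis.

```python
from itertools import permutations

# Hypothetical sketch: every candidate rule A -> B between single items is kept
# as soon as its confidence and lift pass the chosen thresholds, however rare it is.

def rule_measures(transactions, a, b):
    n = len(transactions)
    n_a  = sum(a in t for t in transactions)
    n_b  = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    support    = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift       = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift

def interesting_rules(transactions, min_confidence=0.8, min_lift=1.2):
    items = {i for t in transactions for i in t}
    rules = []
    for a, b in permutations(items, 2):
        support, confidence, lift = rule_measures(transactions, a, b)
        if confidence >= min_confidence and lift >= min_lift:
            rules.append((a, b, support, confidence, lift))
    return rules

if __name__ == "__main__":
    transactions = [{"bread", "butter"}, {"bread", "butter", "jam"},
                    {"bread", "milk"}, {"caviar", "champagne"}]
    for a, b, s, c, l in interesting_rules(transactions):
        # the rare rule caviar -> champagne survives despite its low support
        print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```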
Rezki-Hanchour, Lahouaria. "Contribution à l'amélioration de processus industriels : contrôle, assurance et maitrise de la qualité des produits". Angers, 1995. http://www.theses.fr/1995ANGE0018.
Texto completo da fonteBerti-Équille, Laure. "La qualité des données et leur recommandation : modèle conceptuel, formalisation et application a la veille technologique". Toulon, 1999. http://www.theses.fr/1999TOUL0008.
Technological watch activities are focused on information qualification and validation by human expertise. As a matter of fact, none of these systems can provide (nor assist) a critical and qualitative analysis of the data they store and manage. Most information systems store data (1) whose source is usually unique, not known or not identified/authenticated, and (2) whose quality is unequal and/or ignored. In practice, several data items may describe the same entity in the real world with contradictory values, and their relative quality may be comparatively evaluated. Many techniques for data cleansing and editing exist for detecting some errors in databases, but it is crucial to know which data have bad quality and to benefit from a qualitative expert judgment on data, which is complementary to quantitative and statistical data analysis. My contribution is to provide a multi-source perspective on data quality, and to introduce and define the concepts of multi-source database (MSDB) and multi-source data quality (MSDQ). My approach was to analyze the wide panorama of research in the literature whose problems have some analogies with the technological watch problem. The main objective of my work was to design and provide a storage environment for managing textual information sources, the (more or less contradictory) data extracted from the textual content, and their quality metadata. My work was centered on proposing: the methodology to guide, step by step, a data quality project in a multi-source information context; the conceptual modeling of a multi-source database (MSDB) for managing data sources, multi-source data and their quality metadata, together with mechanisms for multi-criteria data recommendation; the formalization of the QMSD data model (Quality of Multi-Source Data), which describes multi-source data, their quality metadata and the set of operations for manipulating them; and the development of the sQuaL prototype for implementing and validating my propositions. In the long term, the perspectives are to develop a specific decisional information system extending classical functionalities for (1) managing multi-source data, (2) taking into account their quality metadata and (3) proposing data-quality-based recommendation as query results. The ambition is to develop the concept of an "introspective information system", that is to say, an information system that is active and reactive concerning the quality of its own data.
Troya-Galvis, Andrès. "Approche collaborative et qualité des données et des connaissances en analyse multi-paradigme d'images de télédétection". Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD040/document.
Automatic interpretation of very high spatial resolution remotely sensed images is a complex but necessary task. Object-based image analysis approaches are commonly used to deal with this kind of image. They consist in applying an image segmentation algorithm in order to construct the objects of interest, and then classifying them using data-mining methods. Most of the existing work in this domain considers the segmentation and the classification independently. However, these two crucial steps are closely related. In this thesis, we propose two different approaches which are based on data and knowledge quality in order to initialize, guide, and evaluate a collaborative segmentation and classification process. 1. The first approach is based on a mono-class extraction strategy allowing us to focus on the particular properties of a given thematic class in order to accurately label the objects of this class. 2. The second approach deals with multi-class extraction and offers two strategies to aggregate several mono-class extractors to get a final and completely labelled image.
Da Silva Carvalho, Paulo. "Plateforme visuelle pour l'intégration de données faiblement structurées et incertaines". Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4020/document.
We hear a lot about Big Data, Open Data, Social Data, Scientific Data, etc. The importance currently given to data is, in general, very high. We are living in the era of massive data. The analysis of these data is important if the objective is to successfully extract value from them so that they can be used. The work presented in this thesis is related to the understanding, assessment, correction/modification, management and, finally, the integration of data, in order to allow their exploitation and reuse. Our research focuses exclusively on Open Data and, more precisely, Open Data organized in tabular form (CSV being one of the most widely used formats in the Open Data domain). The term Open Data first appeared in 1995, when the GCDIS group (Global Change Data and Information System, from the United States) used this expression to encourage entities having the same interests and concerns to share their data [Data et System, 1995]. However, the Open Data movement has only recently undergone a sharp increase. It has become a popular phenomenon all over the world. As the Open Data movement is recent, it is a field that is still growing, and its importance is very strong. The encouragement given by governments and public institutions to publishing their data openly plays an important role at this level.
Ben Othmane, Zied. "Analyse et visualisation pour l'étude de la qualité des séries temporelles de données imparfaites". Thesis, Reims, 2020. http://www.theses.fr/2020REIMS002.
This thesis focuses on the quality of the information collected by sensors on the web. These data form time series that are incomplete, imprecise, and expressed on quantitative scales that are not very comparable. In this context, we are particularly interested in the variability and stability of these time series. We propose two approaches to quantify them. The first is based on a representation using quantiles; the second is a fuzzy approach. Using these indicators, we propose an interactive visualization tool dedicated to the analysis of the quality of the data collected by the sensors. This work is part of a CIFRE collaboration with Kantar.
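A small sketch of what a quantile-based variability indicator for an incomplete time series could look like (an illustrative assumption, not the indicator defined in the thesis): the spread between the first and ninth deciles, normalised by the median, over sliding windows that tolerate missing points.

```python
from statistics import quantiles, median

# Hypothetical sketch: per-window variability of an incomplete series,
# measured as the relative spread between the 1st and 9th deciles.

def variability(window):
    values = [v for v in window if v is not None]   # tolerate missing points
    if len(values) < 4:
        return None
    deciles = quantiles(values, n=10)
    spread = deciles[-1] - deciles[0]               # 9th decile minus 1st decile
    m = median(values)
    return spread / abs(m) if m else spread

def sliding_variability(series, size=8):
    return [variability(series[i:i + size]) for i in range(0, len(series) - size + 1)]

if __name__ == "__main__":
    series = [10.0, 10.2, None, 10.1, 10.3, 10.2, 10.4, 10.1,   # stable sensor
              10.0, 14.8, 6.3, None, 12.9, 7.7, 11.5, 9.2]      # unstable stretch
    for i, v in enumerate(sliding_variability(series)):
        print(f"window {i:2d}: variability = {v:.3f}" if v is not None else f"window {i:2d}: n/a")
```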
Vaillant, Benoît. "Mesurer la qualité des règles d'association : études formelles et expérimentales". Télécom Bretagne, 2006. http://www.theses.fr/2006TELB0026.
Knowledge discovery in databases aims at extracting information contained in data warehouses. It is a complex process, in which several experts (those acquainted with the data, analysts, processing specialists, etc.) must act together in order to reveal patterns, which will be evaluated according to several criteria: validity, novelty, understandability, exploitability, etc. Depending on the application field, these criteria may be related to differing concepts. In addition, constant improvements made in the methodological and technical aspects of data mining allow one to deal with ever-increasing databases. The number of extracted patterns follows the same increasing trend, without them all being valid, however. It is commonly assumed that the validation of the mined knowledge cannot be performed by a decision maker, usually in charge of this step of the process, without some automated help. In order to carry out this final validation task, a typical approach relies on the use of functions which numerically quantify the pertinence of the patterns. Since such functions, called interestingness measures, imply an order on the patterns, they highlight some specific kind of information. Many measures have been proposed, each of them being related to a particular category of situations. We here address the issue of evaluating the objective interestingness of the particular type of patterns that are association rules, through the use of such measures. Considering that the selection of "good" rules implies the use of appropriate measures, we propose a systematic study of the latter, based on formal properties expressed in the most straightforward terms. From this study, we obtain a clustering of many commonly used measures, which we confront with an experimental approach obtained by comparing the rankings induced by these measures on classical datasets. Analysing these properties enabled us to highlight some particularities of the measures. We deduce a generalised framework that includes a large majority of them. We also apply two multicriteria decision aiding methods in order to solve the issue of retaining pertinent rules. The first approach takes into account a modelling of the preferences expressed by an expert in the field being mined about the previously defined properties. From this modelling, we establish which measures are the most adapted to the specific context. The second approach addresses the problem of taking into account the potentially differing values that the measures take, and builds an aggregated view of the ordering of the rules by taking into account the differences in evaluations. These methods are applied to practical situations. This work also led us to develop powerful dedicated software, Herbs. We present the processing it allows for rule selection purposes, as well as for the analysis of the behaviour of measures and visualisation aspects. Without any claim to exhaustiveness, the methodology we propose can be extended to new measures or properties, and is applicable to other data mining contexts.
El, Ouadghiri Imane. "Analyse du processus de diffusion des informations sur les marchés financiers : anticipation, publication et impact". Thesis, Paris 10, 2015. http://www.theses.fr/2015PA100096.
Financial markets are subjected daily to the diffusion of economic indicators and their forecasts by public and even private institutions. These announcements can be scheduled or unscheduled. The scheduled announcements are organized according to a specific calendar known in advance by all operators. This news, such as activity indicators, credit, exports or sentiment surveys, is published monthly or quarterly by specialized agencies to all operators in real time. Our thesis contributes to different strands of the literature and aims to thoroughly analyze the three phases of the diffusion process of new information on financial markets: anticipation of the announcement before its publication, the interest that its publication arouses, and the impact of its publication on market dynamics. The aim of the first chapter is to investigate heterogeneity in macroeconomic news forecasts using disaggregated data from monthly expectation surveys conducted by Bloomberg on macroeconomic indicators from January 1999 to February 2013. The second chapter examines the impact of surprises associated with monthly macroeconomic news releases on Treasury-bond returns, paying particular attention to the moment at which the information is published in the month. In the third chapter, we examine the intraday effects of surprises from scheduled and unscheduled announcements on six major exchange rate returns (jumps) using an extension of the standard Tobit model with heteroskedastic and asymmetric errors.
Guillet, Fabrice. "Qualité, Fouille et Gestion des Connaissances". Habilitation à diriger des recherches, Université de Nantes, 2006. http://tel.archives-ouvertes.fr/tel-00481938.
Texto completo da fonteBen, Hassine Soumaya. "Évaluation et requêtage de données multisources : une approche guidée par la préférence et la qualité des données : application aux campagnes marketing B2B dans les bases de données de prospection". Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO22012/document.
In Business-to-Business (B-to-B) marketing campaigns, generating "the highest volume of sales at the lowest cost" and achieving the best return on investment (ROI) score is a significant challenge. ROI performance depends on a set of subjective and objective factors such as dialogue strategy, invested budget, marketing technology and organisation, and above all data and, particularly, data quality. However, data issues in marketing databases are overwhelming, leading to insufficient target knowledge that handicaps B-to-B salespersons when interacting with prospects. B-to-B prospection data is indeed mainly structured through a set of independent, heterogeneous, separate and sometimes overlapping files that form a messy multisource prospect selection environment. Data quality thus appears as a crucial issue when dealing with prospection databases. Moreover, beyond data quality, the ROI metric mainly depends on campaign costs. Given the vagueness of (direct and indirect) cost definitions, we limit our focus to price considerations. Price and quality thus define the fundamental constraints data marketers consider when designing a marketing campaign file, as they typically look for the "best-qualified selection at the lowest price". However, this goal is not always reachable and compromises often have to be defined. Compromise must first be modelled and formalized, and then deployed for multisource selection issues. In this thesis, we propose a preference-driven selection approach for multisource environments that aims at: 1) modelling and quantifying decision makers' preferences, and 2) defining and optimizing a selection routine based on these preferences. Concretely, we first deal with the data marketer's quality preference modelling by appraising multisource data using robust evaluation criteria (quality dimensions) that are rigorously summarized into a global quality score. Based on this global quality score and data price, we exploit in a second step a preference-based selection algorithm to return "the best-qualified records bearing the lowest possible price". An optimisation algorithm, BrokerACO, is finally run to generate the best selection result.
Mittal, Nupur. "Data, learning and privacy in recommendation systems". Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S084/document.
Recommendation systems have gained tremendous popularity, both in academia and industry. They have evolved into many different varieties depending mostly on the techniques and ideas used in their implementation. This categorization also marks the boundary of their application domain. Regardless of the type of recommendation system, they are complex and multi-disciplinary in nature, involving subjects like information retrieval, data cleansing and preprocessing, data mining, etc. In our work, we identify three different challenges (among many possible) involved in the process of making recommendations and provide their solutions. We elaborate on the challenges involved in obtaining user-demographic data and processing it, to render it useful for making recommendations. The focus here is to make use of Online Social Networks to access publicly available user data to help the recommendation systems. Using user-demographic data for the purpose of improving personalized recommendations has many other advantages, like dealing with the famous cold-start problem. It is also one of the founding pillars of hybrid recommendation systems. With this work, we underline the importance of a user's publicly available information, like tweets, posts, votes, etc., for inferring more private details about her. As the second challenge, we aim at improving the learning process of recommendation systems. Our goal is to provide a k-nearest-neighbor method that deals with very large datasets, surpassing billions of users. We propose a generic, fast and scalable k-NN graph construction algorithm that significantly improves performance as compared to the state-of-the-art approaches. Our idea is based on leveraging the bipartite nature of the underlying dataset, and using a preprocessing phase to reduce the number of similarity computations in later iterations. As a result, we gain a speed-up of 14 compared to other significant approaches from the literature. Finally, we also consider the issue of privacy. Instead of directly viewing it under trivial recommendation systems, we analyze it on Online Social Networks. First, we reason how OSNs can be seen as a form of recommendation system and how information dissemination is similar to broadcasting opinions/reviews in trivial recommendation systems. Following this parallelism, we identify the privacy threat in information diffusion in OSNs and provide a privacy-preserving algorithm for it. Our algorithm Riposte quantifies privacy in terms of differential privacy, and with the help of experimental datasets we demonstrate how Riposte maintains the desirable information diffusion properties of a network.
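The bipartite preprocessing idea mentioned above can be sketched as follows (a simplified illustration, not the thesis' algorithm): an inverted item-to-users index restricts similarity computations to users who share at least one item, before the k nearest neighbours of each user are kept.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sketch: instead of comparing every pair of users, only users
# sharing at least one item are considered as candidate neighbours.

def knn_graph(ratings, k=2):
    """ratings: dict user -> dict item -> rating."""
    by_item = defaultdict(set)
    for user, items in ratings.items():
        for item in items:
            by_item[item].add(user)

    def cosine(u, v):
        common = set(ratings[u]) & set(ratings[v])
        if not common:
            return 0.0
        dot = sum(ratings[u][i] * ratings[v][i] for i in common)
        nu = sqrt(sum(r * r for r in ratings[u].values()))
        nv = sqrt(sum(r * r for r in ratings[v].values()))
        return dot / (nu * nv)

    graph = {}
    for user, items in ratings.items():
        candidates = {other for item in items for other in by_item[item]} - {user}
        scored = sorted(((cosine(user, o), o) for o in candidates), reverse=True)
        graph[user] = [o for _, o in scored[:k]]
    return graph

if __name__ == "__main__":
    ratings = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4, "c": 2},
               "u3": {"b": 5, "c": 4}, "u4": {"d": 1}}
    print(knn_graph(ratings))   # u4 shares no item, so it gets no neighbours
```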
Lévesque, Johann. "Évaluation de la qualité des données géospatiales : approche top-down et gestion de la métaqualité". Thesis, Université Laval, 2007. http://www.theses.ulaval.ca/2007/24759/24759.pdf.
Texto completo da fonte
Ubéda, Thierry. "Contrôle de la qualité spatiale des bases de données géographiques : cohérence topologique et corrections d'erreurs". Lyon, INSA, 1997. http://theses.insa-lyon.fr/publication/1997ISAL0116/these.pdf.
Texto completo da fonte
This work concerns spatial data quality checking in geographical data sets, and especially in existing geographical vector databases. The methods developed in this work are not dedicated to a particular data model, but can be adapted to any database fulfilling the two criteria previously given. Concerning data quality enrichment, the study addresses two complementary levels, namely the conceptual level and the semantic level, and processes are developed for each. At the conceptual level, geometric properties applicable to geographical data types are defined, depending on the dimension of the shape that represents them (0, 1 or 2). This approach is based only on the objects that compose the database, not on the data model itself; it can therefore be adapted to any vector geographical data set. At the semantic level, spatial relations among the objects of the database are taken into account by means of topological integrity constraints, which make it possible to define topological situations that should or should not occur.
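As a minimal illustration of such topological integrity constraints (the constraints, object names and coordinates below are hypothetical, and the thesis's own rules and error-correction procedures are not reproduced here), one can check, for instance, that parcels do not overlap and that a road does not cross a building using the shapely library:

```python
from itertools import combinations
from shapely.geometry import Polygon, LineString

# Toy vector objects (coordinates are illustrative).
parcels = {
    "P1": Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]),
    "P2": Polygon([(3, 3), (7, 3), (7, 7), (3, 7)]),   # overlaps P1 -> violation
}
buildings = {"B1": Polygon([(1, 1), (2, 1), (2, 2), (1, 2)])}
roads = {"R1": LineString([(0, 1.5), (5, 1.5)])}       # crosses B1 -> violation

errors = []

# Constraint 1: parcels must not overlap each other.
for (n1, g1), (n2, g2) in combinations(parcels.items(), 2):
    if g1.overlaps(g2):
        errors.append(f"parcels {n1} and {n2} overlap")

# Constraint 2: a road must not cross a building.
for rn, road in roads.items():
    for bn, bld in buildings.items():
        if road.crosses(bld):
            errors.append(f"road {rn} crosses building {bn}")

print(errors)
```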
Heguy, Xabier. "Extensions de BPMN 2.0 et méthode de gestion de la qualité pour l'interopérabilité des données". Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0375/document.
Texto completo da fonte
Business Process Model and Notation (BPMN) is becoming the most widely used standard for business process modelling. One of the important upgrades of BPMN 2.0 with respect to BPMN 1.2 is that Data Objects now carry semantic elements. Nevertheless, BPMN does not enable the representation of performance measurements when interoperability problems arise in the exchanged data objects, which remains a limitation when using BPMN to express interoperability issues in enterprise processes. We propose to extend the Meta-Object Facility meta-model and the XML Schema Definition of BPMN, as well as the notation, in order to fill this gap. The extension, named performanceMeasurement, is defined using the BPMN Extension Mechanism. This new element makes it possible to represent performance measurements both for interoperability problems and for interoperability concerns that have been solved. We illustrate the use of this extension with an example from a real industrial case.
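Purely as an illustration of how such an extension element could be attached to a BPMN element through the standard extensionElements container (the namespace URI, prefix and attribute names below are hypothetical and do not reproduce the exact schema defined in the thesis), a Python sketch might serialize it as follows:

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
PM_NS = "http://example.org/bpmn/performanceMeasurement"   # hypothetical extension namespace
ET.register_namespace("bpmn", BPMN_NS)
ET.register_namespace("pm", PM_NS)

task = ET.Element(f"{{{BPMN_NS}}}task", {"id": "Task_SendOrder", "name": "Send order"})
ext = ET.SubElement(task, f"{{{BPMN_NS}}}extensionElements")

# Hypothetical performanceMeasurement element: interoperability state of the
# exchanged data object plus a measured rework cost.
ET.SubElement(ext, f"{{{PM_NS}}}performanceMeasurement", {
    "dataObjectRef": "DataObject_Order",
    "interoperabilityProblem": "unresolved",
    "measure": "reworkTimeMinutes",
    "value": "12",
})

print(ET.tostring(task, encoding="unicode"))
```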
Mouaddib, Noureddine. "Gestion des informations nuancées : une proposition de modèle et de méthode pour l'identification nuancée d'un phénomène". Nancy 1, 1989. http://www.theses.fr/1989NAN10475.
Texto completo da fonte
Barland, Rémi. "Évaluation objective sans référence de la qualité perçue : applications aux images et vidéos compressées". Nantes, 2007. http://www.theses.fr/2007NANT2028.
Texto completo da fonte
The shift to all-digital technology and the development of multimedia communications produce an ever-increasing flow of information. This massive increase in the quantity of data exchanged generates a progressive saturation of the transmission networks. To deal with this situation, compression standards seek to exploit spatial and/or temporal correlation ever more aggressively in order to reduce the bit rate. The resulting loss of information creates visual artefacts that can degrade the visual content of the scene and thus disturb the end-user. In order to offer the best broadcasting service, the assessment of perceived quality is therefore necessary. Subjective tests, which are the reference method for quantifying the perception of distortions, are expensive, difficult to implement and unsuitable for online quality assessment. In this thesis, we consider the most widely used compression standards (image and video) and design no-reference quality metrics based on the most annoying visual artefacts, such as blocking, blurring and ringing effects. The proposed approach is modular and adapts to the considered coder and to the required trade-off between computational cost and performance. For low complexity, the metric quantifies the distortions specific to the considered coder, exploiting only the properties of the image signal. To improve performance, at the cost of additional complexity, it also integrates cognitive models simulating the mechanisms of visual attention. The generated saliency maps are then used to refine the distortion measures based purely on the image signal.
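For the blocking artefact mentioned above, a very rough no-reference indicator can be computed from the image signal alone, for example by comparing luminance jumps across 8-pixel block boundaries with jumps inside blocks. The following toy Python sketch is far simpler than the metrics designed in the thesis and is given for illustration only:

```python
import numpy as np

def blockiness(img: np.ndarray, block: int = 8) -> float:
    """Ratio of mean luminance jumps at vertical block boundaries vs. elsewhere.
    Values well above 1 suggest visible blocking artefacts."""
    img = img.astype(np.float64)
    diffs = np.abs(np.diff(img, axis=1))            # horizontal neighbour differences
    cols = np.arange(diffs.shape[1])
    at_boundary = (cols % block) == (block - 1)     # columns straddling a block boundary
    boundary_mean = diffs[:, at_boundary].mean()
    inside_mean = diffs[:, ~at_boundary].mean()
    return boundary_mean / (inside_mean + 1e-9)

# Synthetic example: a smooth gradient vs. the same image with a jump every 8 columns.
smooth = np.tile(np.linspace(0, 255, 64), (64, 1))
blocky = smooth + 20.0 * (np.arange(64) // 8)
print(round(blockiness(smooth), 2), round(blockiness(blocky), 2))
```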
Yildiz, Ustun. "Decentralisation des procédés métiers : qualité de services et confidentialité". Phd thesis, Université Henri Poincaré - Nancy I, 2008. http://tel.archives-ouvertes.fr/tel-00437469.
Texto completo da fonte
Devillers, Rodolphe. "Conception d'un système multidimensionnel d'information sur la qualité des données géospatiales". Phd thesis, Université de Marne la Vallée, 2004. http://tel.archives-ouvertes.fr/tel-00008930.
Texto completo da fonte
Isambert, Aurélie. "Contrôle de qualité et optimisation de l'acquisition des données en imagerie multimodale pour la radiothérapie externe". Paris 11, 2009. http://www.theses.fr/2009PA11T006.
Texto completo da fonte
Claeyman, Marine. "Etude par modélisation et assimilation de données d'un capteur infrarouge géostationnaire pour la qualité de l'air". Toulouse 3, 2010. http://thesesups.ups-tlse.fr/1216/.
Texto completo da fonte
The objective of this thesis is to define a geostationary infrared sensor for observing the atmospheric composition of the lowermost troposphere. We evaluate the potential added value of such an instrument in characterizing the variability of the main pollutants and in improving air quality observations and forecasts. We focus on two key air quality pollutants: tropospheric ozone, because of its impact on human health, ecosystems and climate; and carbon monoxide (CO), which is a tracer of pollutant emissions. Firstly, a linear scheme for CO chemistry has been evaluated over one and a half years against a detailed chemical scheme (RACMOBUS) and against different tropospheric and stratospheric observations (satellite and aircraft data). The advantage of such a scheme is its low computational cost, which allows data assimilation of CO over long periods. Assimilation of CO data from the Measurements Of Pollution In The Troposphere (MOPITT) instrument allows us to evaluate the information brought by such infrared observations at the global scale. Secondly, the optimal configuration of a new infrared geostationary sensor has been defined using retrieval studies of atmospheric spectra, with the objective of contributing to the monitoring of ozone and CO for air quality purposes; our constraints also ensure a sensor with technically feasible and affordable characteristics. For reference, the information content of this instrument has been compared, during summer, to that of another infrared geostationary instrument similar to MTG-IRS (Meteosat Third Generation - Infrared Sounder), optimized to monitor water vapour and temperature, with atmospheric composition monitoring as a secondary objective. Lastly, the potential added value of both instruments for air quality forecasts has been compared using observing system simulation experiments (OSSEs) over two summer months (July and August 2009). The skill of the two instruments in correcting different error sources (atmospheric forcing, emissions, initial state, and the three conditions together) affecting air quality simulations and forecasts has been characterised. In the end, it is concluded that the proposed instrument configuration is indeed able to constrain ozone and CO fields in the mid-to-low troposphere.
Pellay, François-Xavier. "Méthodes d'estimation statistique de la qualité et méta-analyse de données transcriptomiques pour la recherche biomédicale". Thesis, Lille 1, 2008. http://www.theses.fr/2008LIL10058/document.
Texto completo da fonte
To understand the biological phenomena taking place in a cell under physiological or pathological conditions, it is essential to know which genes it expresses. Gene expression can be measured with DNA chip technology, in which thousands of probes laid out on a microarray measure the relative abundance of the genes expressed in the cell. So-called pangenomic microarrays are supposed to cover all existing protein-coding genes, that is to say currently around thirty thousand for human beings. The measurement, analysis and interpretation of such data pose a number of problems, and the analytical methods used determine the reliability and accuracy of the information obtained with microarray technology. The aim of this thesis is to define methods to control measurements, improve the analysis and deepen the interpretation of microarrays in order to optimize their use, and to apply these methods to the transcriptome analysis of juvenile myelomonocytic leukemia patients, so as to improve diagnosis and understand the biological mechanisms behind this rare disease. We thereby developed and validated, through several independent studies, a quality-control program for microarrays (ace.map QC), a software tool that improves the biological interpretation of microarray data based on gene ontologies, and a visualization tool for the global analysis of signaling pathways. Finally, by combining the different approaches described, we have developed a method to obtain reliable biological signatures for diagnostic purposes.
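As a toy illustration of array-level quality control of the kind such a program performs (this is not ace.map QC; the statistic and the threshold are arbitrary assumptions), a common heuristic is to flag arrays whose overall intensity distribution deviates strongly from the rest of the batch:

```python
import numpy as np

def flag_outlier_arrays(expr: np.ndarray, k: float = 3.0) -> np.ndarray:
    """expr: probes x arrays matrix of log-intensities.
    Flags arrays whose median intensity lies > k robust deviations from the batch median."""
    med = np.median(expr, axis=0)                     # one summary value per array
    center = np.median(med)
    mad = np.median(np.abs(med - center)) + 1e-9      # robust spread (MAD)
    return np.abs(med - center) > k * 1.4826 * mad    # boolean flag per array

rng = np.random.default_rng(0)
expr = rng.normal(8.0, 1.0, size=(1000, 6))
expr[:, 4] += 2.5                                     # simulate one degraded / mis-scanned array
print(flag_outlier_arrays(expr))
```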
Andrieu, Pierre. "Passage à l'échelle, propriétés et qualité des algorithmes de classements consensuels pour les données biologiques massives". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG041.
Texto completo da fonte
Biologists and physicians regularly query public biological databases, for example when they are looking for the genes most associated with a given disease. The chosen keywords are particularly important: synonymous reformulations of the same disease (for example "breast cancer" and "breast carcinoma") may lead to very different rankings of (thousands of) genes. The genes, sorted by relevance, can be tied (equal importance towards the disease). Additionally, some genes returned when using a first synonym may be absent when using another synonym. Such rankings are called "incomplete rankings with ties". The challenge is to combine the information provided by these different rankings of genes. The problem of taking as input a list of rankings and returning as output a so-called consensus ranking, as close as possible to the input rankings, is called the "rank aggregation problem". This problem is known to be NP-hard. Whereas most works focus on complete rankings without ties, we considered incomplete rankings with ties. Our contributions are divided into three parts. First, we designed a graph-based heuristic able to divide the initial problem into independent sub-problems in the context of incomplete rankings with ties. Second, we designed an algorithm able to identify the points common to all optimal consensus rankings, which provides information about the robustness of the returned consensus ranking. An experimental study on a large number of massive biological datasets has highlighted the biological relevance of these approaches. Our last contribution is the following: we designed a parameterized model able to take into account various interpretations of missing data, designed several algorithms for this model, and carried out an axiomatic study of it based on social choice theory.
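To make the notion of a consensus ranking concrete, the following Python sketch (an illustration only; it reproduces neither the thesis's heuristics nor its parameterized treatment of missing data) computes a generalized Kendall-tau distance between rankings with ties, where a ranking is a list of buckets of equally ranked genes, pairs missing from a ranking are simply ignored, and a candidate consensus is scored by its total distance to the inputs; gene names and the tie penalty are illustrative:

```python
from itertools import combinations

def positions(ranking):
    """Map each element to its bucket index; a ranking is a list of sets (ties)."""
    return {x: i for i, bucket in enumerate(ranking) for x in bucket}

def kendall_with_ties(r1, r2, tie_penalty=0.5):
    """Count disagreements on pairs present in both rankings:
    1 for an inversion, tie_penalty when tied in one ranking but ordered in the other."""
    p1, p2 = positions(r1), positions(r2)
    common = set(p1) & set(p2)
    dist = 0.0
    for x, y in combinations(sorted(common), 2):
        d1, d2 = p1[x] - p1[y], p2[x] - p2[y]
        if d1 * d2 < 0:                      # strictly opposite orders
            dist += 1.0
        elif (d1 == 0) != (d2 == 0):         # tied in exactly one ranking
            dist += tie_penalty
    return dist

def consensus_cost(candidate, rankings):
    """Score a candidate consensus: sum of its distances to all input rankings."""
    return sum(kendall_with_ties(candidate, r) for r in rankings)

r1 = [{"BRCA1"}, {"TP53", "ATM"}, {"CHEK2"}]          # incomplete ranking with ties
r2 = [{"TP53"}, {"BRCA1"}, {"CHEK2", "PTEN"}]
candidate = [{"BRCA1", "TP53"}, {"ATM"}, {"CHEK2"}, {"PTEN"}]
print(consensus_cost(candidate, [r1, r2]))
```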