Theses on the topic "Profilage des données"
Consult the top 34 theses for your research on the topic "Profilage des données".
Chevallier, Marc. "L’Apprentissage artificiel au service du profilage des données". Electronic Thesis or Diss., Paris 13, 2022. http://www.theses.fr/2022PA131060.
The digital transformation that has been rapidly happening within companies over the last few decades has led to a massive production of data. Once the problems related to storing these data had been solved, their use within Business Intelligence (BI) or Machine Learning (ML) became a major objective for companies seeking to make their data profitable. But exploiting the data is complex because they are not well documented and often contain many errors. It is in this context that the fields of data profiling and data quality (DQ) have become increasingly important. Profiling aims at extracting informative metadata from the data, and data quality aims at quantifying the errors in the data. Profiling being a prerequisite to data quality, we have focused our work on this subject through the use of metadata vectors resulting from simple profiling actions. These simple information vectors have allowed us to perform advanced profiling tasks, in particular the prediction of complex semantic types using machine learning. The metadata vectors we used are large and are therefore affected by the curse of dimensionality. This term refers to a set of performance problems that occur in machine learning when the number of dimensions of the problem increases. One method to solve these problems is to use genetic algorithms to select a subset of dimensions with good properties. In this framework we have proposed two improvements: on the one hand, a non-random initialization of the individuals composing the initial population of the genetic algorithm; on the other hand, a modification to the genetic algorithm with aggressive mutations in order to improve its performance (GAAM).
Ben, Ellefi Mohamed. "La recommandation des jeux de données basée sur le profilage pour le liage des données RDF". Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT276/document.
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, the schemas and their data dynamicity (respectively schemas and metadata) over time. To this extent, identifying suitable datasets, which meet specific criteria, has become an increasingly important, yet challenging task to support issues such as entity retrieval or semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high amount of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the semantic web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identify target datasets for interlinking. While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile", that is, a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles and traditional dataset connectivity measures, in order to link LOD datasets into a global dataset-topic-graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets.
However, experiments have shown that the current topology of the LOD cloud group is far from being complete enough to be considered as a ground truth and consequently as learning data. Facing the limits of the current topology of LOD (as learning data), our research has led us to break away from the topic profiles representation of the "learn to rank" approach and to adopt a new approach for candidate dataset identification where the recommendation is based on the intensional profiles overlap between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and can be potentially enriched by retrieving the corresponding textual descriptions. This representation provides richer contextual and semantic information and makes it possible to compute similarities between profiles efficiently and inexpensively. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on the tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the LOD cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. Furthermore, our method returns the mappings between the schema concepts across datasets, a particularly useful input for the data linking step. In order to ensure high-quality, representative dataset schema profiles, we introduce Datavore, a tool oriented towards metadata designers that provides ranked lists of vocabulary terms to reuse in the data modeling process, together with additional metadata and cross-term relations. The tool relies on the Linked Open Vocabulary (LOV) ecosystem for acquiring vocabularies and metadata and is made available for the community.
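The ranking criterion mentioned in this abstract, tf*idf cosine similarity between profiles, can be illustrated with a minimal sketch. This is not the thesis's implementation: the dataset names, the concept labels, and the function names (`tfidf_vectors`, `cosine`, `rank_candidates`) are hypothetical, labels are matched by exact string equality, and the semantico-frequential concept similarity measure is left out entirely.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Map each dataset name to a sparse tf*idf vector over its concept labels."""
    n = len(profiles)
    # Document frequency: number of profiles in which each label occurs.
    df = Counter(label for labels in profiles.values() for label in set(labels))
    vectors = {}
    for name, labels in profiles.items():
        tf = Counter(labels)
        vectors[name] = {
            label: (count / len(labels)) * math.log(n / df[label])
            for label, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(label, 0.0) for label, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source, profiles):
    """Rank every other dataset by the similarity of its profile to the source's."""
    vectors = tfidf_vectors(profiles)
    scores = [
        (name, cosine(vectors[source], vector))
        for name, vector in vectors.items()
        if name != source
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical intensional profiles: schema concept labels per dataset.
profiles = {
    "music_a": ["Artist", "Album", "Track", "Person"],
    "music_b": ["Artist", "Track", "Label"],
    "geo": ["City", "Country", "Person"],
}
ranking = rank_candidates("music_a", profiles)
print(ranking)
```

In this toy input, `music_b` shares two labels with `music_a` while `geo` shares only one, so `music_b` is ranked first as a linking candidate.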
Ammous, Karim. "Compression par profilage du code Java compilé pour les systèmes embarqués". Valenciennes, 2007. http://ged.univ-valenciennes.fr/nuxeo/site/esupversions/a56319aa-b36f-46ed-b617-a1464a995056.
Embedded systems are characterized by reduced hardware resources. Although these resources are constantly increasing, they remain insufficient. Memory space is one of the most critical resources. Compressing the code designed for embedded systems is an interesting solution to reduce the memory footprint. Our study focuses on the compression of Java code represented by Java Class format files. Our contribution consists in designing and implementing a profiler-based system in order to guide the compression of Java class files. Our profiler enables us to set up, on the basis of elementary compression techniques, an efficient compression strategy which delivers the best compression rate. This strategy takes into consideration the features of the input code and the dependencies between compression techniques. Our approach is based on four points: (1) the study of the input files in order to extract the information needed to guide the compression process; (2) the analysis of dependencies between compression techniques in terms of the effects each technique has on the others, for which we developed two methods, one numerical, based on performance estimation, the other analytical, to determine whether the different compression methods have points in common; (3) the statistical performance assessment which makes it possible to choose a compression strategy, for which we have identified the parameters, related to each method, that enable this assessment; (4) the definition of heuristics in order to identify the most efficient compression path in a search space characterized by a directed graph.
Ben, salem Aïcha. "Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données". Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD054/document.
Nowadays, complex applications such as knowledge extraction, data mining, e-learning or web applications use heterogeneous and distributed data. The quality of any decision depends on the quality of the data used. The absence of rich, accurate and reliable data can potentially lead an organization to make bad decisions. The subject covered in this thesis aims at assisting users in their quality approach. The goal is to better extract, mix, interpret and reuse data. For this, the data must be related to their semantic meaning, data types, constraints and comments. The first part deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, including the data and the metadata. Firstly, it consists of categorizing the data by assigning them to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. The links detected between columns offer a better understanding of the source and of the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies. The second part is data cleansing using the anomaly reports returned by the first part. It allows corrections to be made within a column itself (data homogenization), between columns (semantic dependencies), and between lines (eliminating duplicates and similar data). Throughout this process, recommendations and analyses are provided to the user.
Bakiri, Ali. "Développements informatiques de déréplication et de classification de données spectroscopiques pour le profilage métabolique d’extraits d'algues". Thesis, Reims, 2018. http://www.theses.fr/2018REIMS013.
The emergence of dereplication strategies as a new tool for the rapid identification of natural products from complex natural extracts has unveiled a great need for cheminformatic tools for the treatment and analysis of spectral data. The present thesis deals with the development of in silico dereplication methods based on Nuclear Magnetic Resonance (NMR). The first method, DerepCrud, is based on 13C NMR spectroscopy. It identifies the major compounds contained in a crude natural extract without any need for fractionation. The principle of the method is to compare the 13C NMR spectrum of the analyzed mixture to a series of 13C NMR chemical shifts of natural compounds stored in a local database. The second method, BCNet, is designed to exploit the richness of 2D NMR data (HMBC and HSQC) for the dereplication of natural products. BCNet traces back the network formed by the HMBC correlations of the molecules present in a natural extract, then isolates the groups of correlations belonging to the individual molecules using a community detection algorithm. The molecules are identified by searching these correlations within a locally constructed database that associates natural product structures and 2D NMR peak positions. Finally, the HSQC correlations of the molecules identified during the previous step are compared to the experimental HSQC correlations of the studied extract in order to increase identification accuracy.
Lagraa, Sofiane. "New MP-SoC profiling tools based on data mining techniques". Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM026/document.
Miniaturization of electronic components has led to the introduction of complex electronic systems which are integrated onto a single chip with multiprocessors, so-called Multi-Processor System-on-Chip (MPSoC). The majority of recent embedded systems are based on massively parallel MPSoC architectures, hence the necessity of developing embedded parallel applications. Embedded parallel application design becomes more challenging: it becomes parallel programming for non-trivial heterogeneous multiprocessors with diverse communication architectures and design constraints such as hardware cost, power, and timeliness. A challenge faced by many developers is the profiling of embedded parallel applications so that they can scale over more and more cores. This is especially critical for embedded systems powered by MPSoC, where ever more demanding applications have to run smoothly on numerous cores, each with a modest power budget. Moreover, application performance does not necessarily improve as more cores are added. Application performance can be limited by multiple bottlenecks, including contention for shared resources such as caches and memory. It becomes time-consuming for a developer to pinpoint in the source code the bottlenecks that decrease performance. To overcome these issues, in this thesis, we propose three fully automatic methods which detect the instructions of the code that lead to a lack of performance due to contention and to poor scalability across the processors on a chip. The methods are based on data mining techniques exploiting gigabytes of low-level execution traces produced by MPSoC platforms. Our profiling approaches make it possible to quantify and pinpoint the bottlenecks in source code automatically, in order to help developers optimize their embedded parallel applications. We performed several experiments on several parallel application benchmarks.
Our experiments show the accuracy of the proposed techniques, by quantifying and pinpointing the hotspots in the source code.
Brunie, Hugo. "Optimisation des allocations de données pour des applications du Calcul Haute Performance sur une architecture à mémoires hétérogènes". Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0014/document.
High Performance Computing, which brings together all the players responsible for improving the computing performance of scientific applications on supercomputers, aims to achieve exascale performance. This race for performance is today characterized by the manufacture of heterogeneous machines in which each component is specialized. Among these components, system memories specialize too, and the trend is towards an architecture composed of several memories with complementary characteristics. The question then arises of how to use these new machines, whose practical performance depends on the placement of application data across the different memories. Trading off code changes against performance is challenging. In this thesis, we have developed a formulation of the data allocation problem on heterogeneous memory architectures. In this formulation, we have shown the benefit of a temporal analysis of the problem; many studies were based solely on a spatial approach, and this result highlights their weakness. From this formulation, we developed an offline profiling tool to approximate the coefficients of the objective function in order to solve the allocation problem and optimize the allocation of data on a composite architecture composed of two main memories with complementary characteristics. In order to reduce the amount of code changes needed to execute an application according to the allocation strategy recommended by our toolbox, we have developed a tool that can automatically redirect data allocations from minimal source code instrumentation. The performance gains obtained on mini-applications representative of the scientific applications coded by the community make it possible to assert that intelligent data allocation is necessary to fully benefit from heterogeneous memory resources. On some problem sizes, the gain of an educated data allocation strategy over a naive data placement strategy can reach a 3.75× speedup.
Haine, Christopher. "Kernel optimization by layout restructuring". Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0639/document.
Careful data layout design is crucial for achieving high performance, as today's processors waste a considerable amount of time being stalled by memory transactions; in particular, spatial and temporal locality have to be optimized. However, data layout transformation is an area left largely unexplored by state-of-the-art compilers, due to the difficulty of evaluating the possible performance gains of transformations. Moreover, optimizing data layout is time-consuming and error-prone, and layout transformations are too numerous to be tried by hand in the hope of discovering a high-performance version. We propose to guide application programmers through data layout restructuring with extensive feedback, firstly by providing a comprehensive multidimensional description of the initial layout, built via analysis of memory traces collected from the application binary, in fine aiming at pinpointing problematic strides at the instruction level, independently of the input language. We choose to focus on layout transformations, translatable to C-formalism to aid user understanding, that we apply and assess on a case study composed of two representative multithreaded real-life applications, a cardiac wave simulation and a lattice QCD simulation, with different inputs and parameters. The performance prediction of different transformations matches (within 5%) that of hand-optimized layout code.
Jouravel, Glorianne. "Stratégies innovantes pour une valorisation d’extraits de plantes en cosmétique : Mise en oeuvre d’un outil de profilage métabolique et recherche de nouvelles activités biologiques". Thesis, Orléans, 2018. http://www.theses.fr/2018ORLE2017.
The cosmetic field valorizes plant extracts by integrating them in care products. These extracts constitute the active ingredients of the cosmetic formulation. Plants are diverse, rich and contain numerous compounds of biological interest. Phytochemistry is a way to describe the metabolic content of plant extracts. But molecular characterization of these complex matrices remains a major challenge nowadays. Indeed, the data treatment steps are time-consuming and laborious. For this reason, a metabolic profiling tool, GAINS, has been developed in order to process automatically the data from analyses performed by liquid chromatography coupled with high-resolution mass spectrometry. It provides real support for phytochemists because automated data treatment saves time compared to manual treatment. This tool, associated with a large database of natural compounds, makes it possible to assign potential candidates to detected peaks. GAINS calls an in silico fragmentation module to support candidate assignments. This makes it possible to compare the modeled fragmentation spectrum of candidates with the experimental fragmentation spectrum. The whole set of phytochemical studies carried out to identify or isolate compounds goes hand in hand with the study of the potential biological effects of extracts on the skin, the organ targeted by skin-care products. This allows the discovery of beneficial actions that the extract could have. By knowing the phytochemical content, it is possible to explain and rationalize assays of biological activities. The development of an anti-aging ingredient from purple loosestrife, a plant occurring in the Centre-Val de Loire region, is an example of this.
Awwad, Tarek. "Context-aware worker selection for efficient quality control in crowdsourcing". Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEI099/document.
Crowdsourcing has proved its ability to address large-scale data collection tasks at a low cost and in a short time. However, due to the dependence on unknown workers, the quality of the crowdsourcing process is questionable and must be controlled. Indeed, maintaining the efficiency of crowdsourcing requires the time and cost overhead related to this quality control to stay low. Current quality control techniques suffer from high time and budget overheads and from their dependency on prior knowledge about individual workers. In this thesis, we address these limitations by proposing the CAWS (Context-Aware Worker Selection) method, which operates in two phases: in an offline phase, the correlations between the worker declarative profiles and the task types are learned. Then, in an online phase, the learned profile models are used to select the most reliable online workers for the incoming tasks depending on their types. Using declarative profiles helps eliminate any probing process, which reduces the time and the budget while maintaining the crowdsourcing quality. In order to evaluate CAWS, we introduce an information-rich dataset called CrowdED (Crowdsourcing Evaluation Dataset). The generation of CrowdED relies on a constrained sampling approach that makes it possible to produce a dataset which respects the requester budget and type constraints. Through its generality and richness, CrowdED also helps in plugging the benchmarking gap present in the crowdsourcing community. Using CrowdED, we evaluate the performance of CAWS in terms of quality, time and budget gain. Results show that automatic grouping is able to achieve a learning quality similar to job-based grouping, and that CAWS is able to outperform the state-of-the-art profile-based worker selection when it comes to quality, especially when strong budget and time constraints exist.
Finally, we propose CREX (CReate Enrich eXtend), which provides the tools to select and sample input tasks and to automatically generate custom crowdsourcing campaign sites in order to extend and enrich CrowdED.
Diallo, Mouhamadou Saliou. "Découverte de règles de préférences contextuelles : application à la construction de profils utilisateurs". Thesis, Tours, 2015. http://www.theses.fr/2015TOUR4052/document.
The use of preferences arouses a growing interest for personalizing responses to requests and making targeted recommendations. Nevertheless, manual construction of preference profiles remains complex and time-consuming. In this context, we present in this thesis a new automatic method for preference elicitation based on data mining techniques. Our proposal is a two-phase algorithm: (1) extracting all contextual preference rules from a set of user preferences and (2) building the user profile. At the end of the first phase, we notice that there are too many preference rules which satisfy the fixed constraints, so in the second phase we eliminate the superfluous preference rules. In our approach a user profile is constituted by the set of contextual preference rules resulting from the second phase. A user profile must satisfy conciseness and soundness properties. The soundness property guarantees that the preference rules specifying the profiles are in agreement with a large set of the user preferences, and contradict a small number of them. On the other hand, conciseness implies that profiles are small sets of preference rules. We also proposed four prediction methods which use the extracted profiles. We validated our approach on a set of real-world movie rating datasets built from MovieLens and IMDB. The whole movie rating database consists of 800,156 votes from 6,040 users about 3,881 movies. The results of these experiments demonstrate that the conciseness of user profiles is controlled by the minimal agreement threshold and that even with strong reduction, the soundness of the profile remains at an acceptable level. These experiments also show that the predictive qualities of some of our ranking strategies outperform SVMRank in several situations.
Caigny, Arno de. "Innovation in customer scoring for the financial services industry". Thesis, Lille, 2019. http://www.theses.fr/2019LIL1A011.
This dissertation improves customer scoring. Customer scoring is important for companies in their decision-making processes because it helps to solve key managerial issues such as the decision of which customers to target for a marketing campaign or the assessment of customers that are likely to leave the company. The research in this dissertation makes several contributions in three areas of the customer scoring literature. First, new sources of data are used to score customers. Second, the methodology to go from data to decisions is improved. Third, customer life event prediction is proposed as a new application of customer scoring.
Chamsi, Abu Quba Rana. "On enhancing recommender systems by utilizing general social networks combined with users goals and contextual awareness". Thesis, Lyon 1, 2015. http://www.theses.fr/2015LYO10061/document.
We are surrounded by decisions to make: what book to read next? What film to watch tonight or at the weekend? As the number of items has become tremendous, the use of recommendation systems has become essential in daily life. At the same time, social networks have become indispensable in people's daily lives; people from different countries and age groups use them on a daily basis. While people are spending time on social networks, they leave valuable information about themselves, attracting researchers' attention. Recommendation is one domain that has been affected by the widespread use of social networks; the result is the study of social recommenders. However, in the literature we have found that most social recommenders were evaluated over Epinions, Flixster and other domain-based recommender social networks, which are composed of users, items, ratings and relations. The proposed solutions cannot be extended directly to General Purpose Social Networks (GPSN) like Facebook and Twitter, which are open social networks where users can perform a variety of actions that can be useful for recommendation; but as users cannot rate items, this information cannot be used in recommender systems. Moreover, evaluations are based on known metrics like MAE and RMSE. These can guarantee neither the satisfaction of users nor the good quality of recommendation.
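For readers unfamiliar with the two metrics named in this abstract, MAE (mean absolute error) and RMSE (root mean squared error) compare predicted ratings against the ratings users actually gave. The sketch below uses their standard definitions; the rating values are invented for illustration and do not come from the thesis.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large errors more than MAE does."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical ratings on a 1-5 scale.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

print(mae(predicted, actual))   # 0.625
print(rmse(predicted, actual))  # 0.75
```

Both metrics only measure how close predicted ratings are to recorded ones, which is why the abstract argues that they say little about whether users are satisfied with the recommendations.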
Ben, Ticha Sonia. "Recommandation personnalisée hybride". Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0168/document.
Faced with the ongoing rapid expansion of the Internet, users require help to access items that may interest them. A personalized recommender system filters relevant items from a huge catalogue for a particular user by observing his or her behavior. The approach based on observing user behavior from his or her interactions with the website is called usage analysis. Collaborative filtering and content-based filtering are the most widely used techniques in personalized recommender systems. Collaborative filtering uses only data from usage analysis to build the user profile, while content-based filtering relies in addition on semantic information about items. The hybrid approach is another important technique, which combines collaborative and content-based methods to provide recommendations. The aim of this thesis is to present a new hybridization approach that takes into account the semantic information of items to enhance collaborative recommendations. Several approaches have been proposed for learning a new user profile inferring preferences for the semantic information describing items. For each proposed approach, we address the sparsity and scalability problems. We also show, empirically, an improvement in recommendation accuracy over collaborative filtering and content-based filtering.
Chouiref, Zahira. "Contribution à l'étude de l'optimisation de requêtes de services Web : une approche centrée utilisateur". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2017. http://www.theses.fr/2017ESMA0016.
The internet has completely transformed the way we communicate (access to information). Its evolution was marked by strong growth in published services, which has been accompanied by a large explosion in the number of users and a diversity of their profiles and contexts. The work presented in this thesis deals with the adaptive optimization of Web service queries to user needs. The problem is to select a service or a combination of relevant services from a collection of candidates able to perform a required task. These candidate services must meet the requirements requested by the user, who then makes his or her choice based on non-functional criteria. In our approach, the non-functional criteria considered are all associated with the preferences of the service requester. Significant attention is therefore paid to the user, who is at the core of the selection system. This selection is generally considered a complex task because of the diversity of profiles and contexts of the services in which it is performed. Our study focuses mainly on the analysis of different service selection approaches. We especially highlight their contribution to solving the problems inherent in selecting the best services in order to meet the non-functional parameters of the request. Second, our interest has focused on modeling the specification of supply and demand for services, their context and profile, as well as the two families of preferences: explicit and implicit. Finally, we propose a novel optimization approach that integrates a query reformulation strategy by introducing implicit preferences based on the fuzzy inference process. The idea is to combine the two families of preferences required by the user while considering the profiles and contexts of the services and of the user simultaneously.
Applying fuzzy set theory to the optimization of customers' preference queries, by integrating a reasoning module on user-related information, is of great interest for improving the quality of results. Finally, we present a set of experiments to demonstrate the validity and relevance of the proposed approach.
Servajean, Maximilien. "Recommandation diversifiée et distribuée pour les données scientifiques". Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20216/document.
In many fields, novel technologies employed in information acquisition and measurement (e.g. phenotyping automated greenhouses) are at the basis of a phenomenal creation of data. In particular, we focus on two real use cases: plant observations in botany and phenotyping data in biology. Our contributions can, however, be generalized to Web data. In addition to their huge volume, data are also distributed. Indeed, each user stores their data in many heterogeneous sites (e.g. personal computers, servers, cloud), yet wants to be able to share them. In both use cases, collaborative solutions, including distributed search and recommendation techniques, could benefit the user. Thus, the global objective of this work is to define a set of techniques enabling sharing and discovery of data in heterogeneous distributed environments, through the use of search and recommendation approaches. For this purpose, search and recommendation allow users to be presented with sets of results, or recommendations, that are relevant both to the queries submitted by the users and to their profiles. Diversification techniques allow users to receive results with better novelty while avoiding redundant and repetitive content. By introducing a distance between each result presented to the user, diversity makes it possible to return a broader set of relevant items. However, few works exploit profile diversity, which takes into account the users that share each item. In this work, we show that in some scenarios, considering profile diversity enables a substantial increase in result quality: surveys show that in more than 75% of the cases, users would prefer profile diversity to content diversity. Additionally, in order to address the problems related to data distribution among heterogeneous sites, two approaches are possible.
First, P2P networks aim at establishing links between peers (nodes of the network), creating in this way an overlay network, where peers directly connected to a given peer p are known as its neighbors. This overlay is used to process queries submitted by each peer. However, in state-of-the-art solutions, the redundancy of the peers in the various neighborhoods limits the capacity of the system to retrieve relevant items on the network, given the queries submitted by the users. In this work, we show that introducing diversity in the computation of the neighborhood, by increasing the coverage, enables a huge gain in terms of quality. By taking diversity into account, each peer in a given neighborhood has a higher probability of returning different results for a given keyword query than the other peers in the neighborhood. Whenever a query is submitted by a peer, our approach can retrieve up to three times more relevant items than state-of-the-art solutions. The second category of approaches is called multi-site. Generally, in state-of-the-art multi-site solutions, the sites are homogeneous and consist of big data centers. In our context, we propose an approach enabling sharing among heterogeneous sites, such as small research teams' servers, personal computers or big sites in the cloud. A prototype bringing together all contributions has been developed, with two versions addressing each of the use cases considered in this thesis.
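The abstract above describes diversification as introducing a distance between the results returned to the user. A minimal sketch of that general idea is a greedy re-ranking in the style of maximal marginal relevance, shown here as an assumption for illustration and not as the thesis's own algorithm. The names `diversify`, `jaccard_distance`, `trade_off` and the toy plant-observation data are all hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def diversify(
    candidates: Sequence[Hashable],
    relevance: Dict[Hashable, float],
    distance: Callable[[Hashable, Hashable], float],
    k: int,
    trade_off: float = 0.5,
) -> List[Hashable]:
    """Greedily pick k items, balancing relevance against distance to items already picked.

    trade_off = 0 ranks by relevance only; trade_off = 1 ranks by diversity only
    (after the first pick, which is always the most relevant item).
    """
    selected: List[Hashable] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(item: Hashable) -> float:
            if not selected:
                return relevance[item]
            # Distance to the closest already-selected item.
            min_dist = min(distance(item, s) for s in selected)
            return (1.0 - trade_off) * relevance[item] + trade_off * min_dist

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: three observations described by tag sets.
tags = {"a": {"rose"}, "b": {"rose"}, "c": {"oak"}}
relevance = {"a": 0.9, "b": 0.85, "c": 0.5}


def jaccard_distance(x: str, y: str) -> float:
    """1 - |intersection| / |union| of the two tag sets."""
    return 1.0 - len(tags[x] & tags[y]) / len(tags[x] | tags[y])


diverse = diversify(["a", "b", "c"], relevance, jaccard_distance, k=2)
plain = diversify(["a", "b", "c"], relevance, jaccard_distance, k=2, trade_off=0.0)
```

With these toy values, the diversified list keeps the most relevant item and then prefers the dissimilar item `c` over the near-duplicate `b`, whereas `trade_off=0.0` falls back to a pure relevance ranking. Profile diversity, as discussed in the abstract, would swap the tag sets for the sets of users sharing each item.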
El Sarraj, Lama. "Exploitation d'un entrepôt de données guidée par des ontologies : application au management hospitalier". Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM4331.
This research is situated in the domain of Data Warehouse (DW) personalization and concerns DW assistance. Specifically, we are interested in assisting a user during an online analysis process in using existing operational resources. The application of this research concerns hospital management, for hospital governance, and is limited to the scope of the Program of Medicalization of Information Systems (PMSI). This research was supported by the Public Hospitals of Marseille (APHM). Our proposal is a semantic approach based on ontologies. The support system implementing this approach, called Ontology-based Personalization System (OPS), is based on a knowledge base operated by a personalization engine. The knowledge base is composed of three ontologies: a domain ontology, an ontology of the DW structure, and an ontology of resources. The personalization engine allows, first, a personalized search of DW resources based on the user's profile and, second, for a particular resource, an expansion of the search by recommending new resources based on the context of that resource. To recommend new resources, we have proposed three possible strategies. To validate our proposal, a prototype of the OPS system was developed, with a personalization engine implemented in Java. This engine exploits an OWL knowledge base composed of three interconnected OWL ontologies. We illustrate three experimental scenarios related to PMSI and defined with APHM domain experts.
Dhomps, Anne-Lise. "Améliorations des méthodes de combinaison des données Argo et altimétrie pour le suivi des variations à long terme de l'océan". Toulouse 3, 2010. http://thesesups.ups-tlse.fr/1299/.
The objective of this thesis is to analyze oceanic variability in temperature and salinity over the period 1993-2008. For that purpose, it is necessary to use as many of the available observations as possible, namely in situ temperature and salinity data, but also satellite data of sea surface temperature and sea height. To reach our objective, several steps are necessary: create a solid and coherent database, compare datasets to gain a better physical understanding of the contents of each type of data, then develop combination methods to assemble the datasets. The cross-comparison of the altimetric and Argo datasets first allows the quality of the Argo dataset to be verified. In 2006, Guinehut et al. published a paper on the comparison of SLA (Sea Level Anomaly) and DHA (Dynamic Height Anomaly). Today, the Argo dataset allows these comparisons to be improved. We explain why and we detail the differences between the two studies. We also study the impact of the removal of the seasonal cycle and the influence of the vertical structure of the ocean on the barotropic/baroclinic distribution of the oceanic circulation. We end with the analysis of the SLA-DHA signal in terms of seasonal and inter-annual circulation at a depth of 1000 meters. Armor3d fields, a combination of satellite fields and in situ profiles, have existed for several years. The recent dataset supplied by Argo profilers makes it possible to considerably improve the parameters of the combination, to achieve better geographical coverage and to obtain deeper fields. We show that both types of measurements are needed, even to study the large-scale variability of the ocean. Finally, we use our Armor3d fields to study the oceanic variability of the last 16 years.
Khemiri, Rym. "Vers l'OLAP collaboratif pour la recommandation des analyses en ligne personnalisées". Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO22015/document.
The objective of this thesis is to provide a collaborative approach to OLAP involving several users, driven by a personalization process integrated into decision-making systems in order to help end users in their analysis process. Whether personalizing the warehouse model, recommending decision queries or recommending navigation paths within the data cubes, the user needs an efficient decision-making system that assists them. We were interested in three issues falling within data warehouse and OLAP personalization, offering three major contributions. Our contributions are based on a combination of data mining techniques with data warehouse and OLAP technology. Our first contribution is an approach for personalizing dimension hierarchies to obtain new, semantically richer analytical axes that can help the user carry out new analyses not provided by the original data warehouse model. Indeed, we relax the constraint of the fixed data warehouse model, which allows the user to create new relevant analysis axes taking into account both their constraints and their requirements. Our approach is based on an unsupervised learning method, constrained k-means. Our goal is then to recommend these new hierarchy levels to other users of the same user community, in the spirit of a collaborative system to which each individual contributes. The second contribution is an interactive approach that helps the user formulate new decision queries to build relevant OLAP cubes based on their past decision queries, allowing them to anticipate their future analysis needs. This approach is based on the extraction of frequent itemsets from a query workload associated with one user or a set of users belonging to the same community of actors in an organization. Our intuition is that the relevance of a decision query is strongly correlated with the usage frequency of the corresponding attributes within a given workload of a user (or group of users).
Indeed, our approach to decision query formulation is collaborative because it allows the user to formulate relevant queries, step by step, from the attributes most commonly used by all actors of the user community. Our third contribution is an approach for recommending navigation paths within OLAP cubes. Users are often left to themselves and are not guided in their navigation process. To overcome this problem, we develop a user-centered approach that offers the user navigation guidance. Indeed, we guide the user toward the most interesting facts in OLAP cubes by indicating the navigation paths most relevant to them. This approach is based on Markov chains that predict the next analysis query from the current query alone. This work is part of a collaborative approach because the transition probabilities from one query to another in the cuboid lattice (OLAP cube) are calculated by taking into account all analysis queries of all users belonging to the same community. To validate our proposals, we present a user-centered decision support system that comes in two subsystems: (1) content personalization and (2) recommendation of decision queries and navigation paths. We also conducted experiments that showed the effectiveness of our user-centered online analysis approaches using quality measures such as recall and precision.
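The navigation-path contribution described above predicts the next analysis query from the current query alone, with transition probabilities estimated from the queries of a whole user community. A minimal sketch of such a first-order Markov chain follows. It assumes that queries can be reduced to hashable labels and that each user's log is an ordered session; the names `fit_transitions`, `recommend_next` and the example query labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def fit_transitions(sessions: Sequence[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next query | current query) from the query sessions of a community."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive (current, next) pair of queries.
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return {
        current: {q: c / sum(followers.values()) for q, c in followers.items()}
        for current, followers in counts.items()
    }


def recommend_next(
    transitions: Dict[str, Dict[str, float]], current: str, top_n: int = 1
) -> List[str]:
    """Return the top_n most probable next queries (ties broken alphabetically)."""
    followers = transitions.get(current, {})
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [query for query, _ in ranked[:top_n]]


# Hypothetical query logs from three users of the same community.
sessions = [
    ["sales_by_year", "sales_by_quarter", "sales_by_month"],
    ["sales_by_year", "sales_by_quarter"],
    ["sales_by_year", "sales_by_region"],
]
transitions = fit_transitions(sessions)
suggestion = recommend_next(transitions, "sales_by_year")
```

In this toy log, two of the three sessions move from `sales_by_year` to `sales_by_quarter`, so that query is suggested first. A query never seen before yields an empty suggestion list, which a real system would need to handle with some fallback.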
Arnaud, Quentin. "Détection directe de matière noire avec l’expérience EDELWEISS-III : étude des signaux induits par le piégeage de charges, analyse de données et caractérisation de la sensibilité des détecteurs cryogéniques aux WIMPs de basse masse". Thesis, Lyon 1, 2015. http://www.theses.fr/2015LYO10199/document.
The EDELWEISS-III experiment is dedicated to direct dark matter searches aiming at detecting WIMPs. These massive particles should account for more than 80% of the mass of the Universe and be detectable through their elastic scattering on nuclei constituting the absorber of a detector. As the expected WIMP event rate is extremely low (<1/kg/year), a double heat/ionization measurement is performed to discriminate electronic recoils originating from β and γ backgrounds from nuclear recoils induced by neutrons and WIMPs. The first part of the thesis work consisted in studying the signals induced by charge carrier trapping. An analytical model of its impact on both ionization and heat signals is presented. The model predictions, through their agreement with both data and a numerical simulation, lead to various applications: improvement of the resolutions, statistical sensitivity to energy deposit depths, and characterization of trapping within the crystals. The analysis of the Run308 data is detailed and its results are interpreted in terms of an exclusion limit on the WIMP-nucleon cross section (SI). This study brings to light the presence of a limiting neutron background for high-mass WIMP searches (>20 GeV). Finally, a study dedicated to the optimization of solid cryogenic detectors for low-mass WIMP searches is presented. This study is performed on simulated data using a statistical test based on a profiled likelihood ratio that allows for statistical background subtraction and spectral shape discrimination. This study, combined with results from Run308, has led the EDELWEISS experiment to favor low-mass WIMP searches (<20 GeV).
Boulil, Kamal. "Une approche automatisée basée sur des contraintes d’intégrité définies en UML et OCL pour la vérification de la cohérence logique dans les systèmes SOLAP : applications dans le domaine agri-environnemental". Thesis, Clermont-Ferrand 2, 2012. http://www.theses.fr/2012CLF22285/document.
Spatial Data Warehouse (SDW) and Spatial OLAP (SOLAP) systems are Business Intelligence (BI) tools allowing for interactive multidimensional analysis of huge volumes of spatial data. In such systems the quality of analysis mainly depends on three components: the quality of warehoused data, the quality of data aggregation, and the quality of data exploration. The warehoused data quality depends on elements such as accuracy, completeness and logical consistency. The data aggregation quality is affected by structural problems (e.g., non-strict dimension hierarchies that may cause double-counting of measure values) and semantic problems (e.g., summing temperature values does not make sense in many applications). The data exploration quality is mainly affected by inconsistent user queries (e.g., what are temperature values in USSR in 2010?) leading to possibly meaningless interpretations of query results. This thesis addresses the problems of logical inconsistency that may affect the data, aggregation and exploration qualities in SOLAP. Logical inconsistency is usually defined as the presence of incoherencies (contradictions) in data; it is typically controlled by means of Integrity Constraints (IC). In this thesis, we extend the notion of IC (in the SOLAP domain) in order to take into account aggregation and query incoherencies. To overcome the limitations of existing approaches concerning the definition of SOLAP IC, we propose a framework that is based on the standard languages UML and OCL. Our framework permits a platform-independent conceptual design and an automatic implementation of SOLAP IC; it consists of three parts: (1) a SOLAP IC classification, (2) a UML profile implemented in the CASE tool MagicDraw, allowing for a conceptual design of SOLAP models and their IC, (3) an automatic implementation based on the code generators Spatial OCLSQL and UML2MDX, which allows transforming the conceptual specifications into code.
Finally, the contributions of this thesis have been tested and validated in the context of French national projects aiming at developing (S)OLAP applications for agriculture and environment.
Saaidi, Afaf. "Multi-dimensional probing for RNA secondary structure(s) prediction". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLX067/document.
In structural bioinformatics, predicting the secondary structure(s) of ribonucleic acids (RNAs) represents a major direction of research to understand cellular mechanisms. A classic approach for structure postulates that, at the thermodynamic equilibrium, RNA adopts its various conformations according to a Boltzmann distribution based on its free energy. Modern approaches, therefore, favor the consideration of the dominant conformations. Such approaches are limited in accuracy due to the imprecision of the energy model and the structure topology restrictions. Experimental data can be used to circumvent the shortcomings of predictive computational methods. RNA probing encompasses a wide array of experimental protocols dedicated to revealing partial structural information through exposure to a chemical or enzymatic reagent, whose effect depends on, and thus reveals, features of its adopted structure(s). Accordingly, single-reagent probing data is used to supplement free-energy models within computational methods, leading to significant gains in prediction accuracy. In practice, however, structural biologists integrate probing data produced in various experimental conditions, using different reagents or over a collection of mutated sequences, to model RNA structure(s). This integrative approach remains manual, time-consuming and arguably subjective in its modeling principles. In this Ph.D., we contributed in silico methods for an automated modeling of RNA structure(s) from multiple sources of probing data. We have first established automated pipelines for the acquisition of reactivity profiles from primary data produced through a variety of protocols (SHAPE, DMS using Capillary Electrophoresis, SHAPE-Map/Ion Torrent). We have designed and implemented a new, versatile method that simultaneously integrates multiple probing profiles.
Based on a combination of Boltzmann sampling and structural clustering, it produces alternative stable conformations jointly supported by a set of probing experiments. As it favors recurrent structures, our method allows exploiting the complementarity of several probing assays. The quality of predictions produced using our method compared favorably against state-of-the-art computational predictive methods on single-probing assays. Our method was used to identify models for structured regions in RNA viruses. In collaboration with experimental partners, we suggested a refined structure of the HIV-1 Gag IRES, showing a good compatibility with chemical and enzymatic probing data. The predicted structure allowed us to build hypotheses on binding sites that are functionally relevant to the translation. We also proposed conserved structures in Ebola untranslated regions, showing a high consistency with both SHAPE probing and evolutionary data. Our modeling allows us to detect a conserved and stable stem-loop at the 5' end of each UTR, a typical structure found in viral genomes to protect the RNA from being degraded by nucleases. Our method was extended to the analysis of sequence variants. We analyzed a collection of DMS-probed mutants, produced by the Mutate-and-Map protocol, leading to better structural models for the GIR1 lariat-capping ribozyme than from the sole wild-type sequence. To avoid systematic production of point-wise mutants, and exploit the recent SHAPEMap protocol, we designed an experimental protocol based on undirected mutagenesis and sequencing, where several mutated RNAs are produced and simultaneously probed. Produced reads must then be re-assigned to mutants to establish their reactivity profiles used later for structure modeling. The assignment problem was modeled as a likelihood maximization joint inference of mutational profiles and assignments, and solved using an instance of the "Expectation-Maximization" algorithm.
Preliminary results on a reduced/simulated sample of reads showed a remarkable decrease in read assignment errors compared to a classic algorithm.
Rosenmann, Laurence. "Etudes théorique et expérimentale de l'élargissement par collisions des raies de CO2 perturbé par CO2, H2O, N2 et O2 : constitution d'une base de données infrarouge et Raman appliquée aux transferts thermiques et à la combustion". Châtenay-Malabry, Ecole centrale de Paris, 1988. http://www.theses.fr/1988ECAP0071.
Tifafi, Marwa. "Different soil study tools to better understand the dynamics of carbon in soils at different spatial scales, from a single soil profile to the global scale". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLV021/document.
Soils are the major components of the terrestrial ecosystems and the largest organic carbon reservoir on Earth, being very reactive to human disturbance and climate change. Despite its importance within the carbon reservoirs, soil carbon dynamics is an important source of uncertainties for future climate predictions. The aim of the thesis was to explore different aspects of soil carbon studies (experimental measurements, modeling, and database evaluation) at different spatial scales (from the scale of a profile to the global scale). We highlighted that the estimation of the global soil carbon stocks is still quite uncertain. Consequently, the role of soil carbon in the climate dynamics becomes one of the major uncertainties in the Earth system models (ESMs) used to predict future climate change. The second part of the thesis deals with the presentation of a new version of the IPSL-Land Surface Model called ORCHIDEE-SOM, incorporating the 14C dynamics in the soil. Several tests suggest that model improvements should focus more on a depth-dependent parameterization, mainly for the diffusion, in order to improve the representation of the global carbon cycle in Land Surface Models, thus helping to constrain the predictions of the future soil organic carbon response to global warming.
Roberts, Mark Alvin. "Full waveform inversion of walk-away VSP data". Paris, Institut de physique du globe, 2007. http://www.theses.fr/2007GLOB0020.
Depletion of the earth's hydrocarbon reserves has led to exploration and production in increasingly complex environments. Imaging beneath allochthonous salt (e.g. salt domes) remains a challenging task for seismic techniques due to the large velocity contrast of the salt with neighbouring sediments and the very complex structures generated by salt movement. Extensive allochthonous salt sheets cover many potentially productive regions in the deep-water Gulf of Mexico. Drilling through the base of salt is an extremely challenging task due to the widely varying pore pressure found in the sediments beneath. Seismic methods to estimate the seismic velocity can be used in conjunction with empirical formulae to predict the pore pressure. However, accurate measurements are often not possible from surface reflection seismic data, so walk-away Vertical Seismic Profile (VSP) data has been used. This involves repeatedly firing a seismic source (usually an airgun array) at various distances from the borehole while recording the velocities measured by geophones in the borehole placed at appropriate depths near the base of the salt. Before this thesis, the data had been processed using the amplitude versus angle information in a simple one-dimensional approximation or using travel time information (also using a 1D assumption). In this thesis, I have used 2D full waveform inversion to tackle the problem of velocity estimation. This has the advantage of simultaneously inverting the whole dataset (including transmitted waves, reflected waves and converted waves), and the method includes travel time and amplitude information. The inversion was performed using local inversion methods due to the size of the inverse problem and the cost of the forward problem. Concerns over large sensitivity variations, which are inherent in the data acquisition, have led to an examination of the Gauss-Newton method and possible preconditioning matrices for the conjugate gradient method.
Due to the poorly constrained nature of the inverse problem, a smoothness constraint has been applied with an innovative preconditioning method. The methodology has been applied to real data and the pore pressure has been predicted using the well-established Eaton equation. In addition, the sub-salt structure was recovered, further demonstrating the value of this technique.
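The abstract above names the Eaton equation as the empirical link between seismic velocity and pore pressure but does not state it. The sketch below assumes the commonly cited velocity form, P = S - (S - P_hyd) * (V_obs / V_normal)^n with n = 3; the exact form, exponent and normal-compaction trend used in the thesis are not given in the abstract, so every parameter name and value here is an illustrative assumption.

```python
def eaton_pore_pressure(
    overburden: float,
    hydrostatic: float,
    v_observed: float,
    v_normal: float,
    exponent: float = 3.0,
) -> float:
    """Pore pressure from the velocity form of Eaton's empirical relation.

    overburden  -- overburden stress S at the depth of interest
    hydrostatic -- normal (hydrostatic) pore pressure at that depth, same unit as S
    v_observed  -- interval velocity estimated from the seismic data
    v_normal    -- velocity expected under normal compaction at that depth
    exponent    -- Eaton exponent; 3.0 is the value usually quoted for velocity data
    """
    if v_normal <= 0 or v_observed <= 0:
        raise ValueError("velocities must be positive")
    return overburden - (overburden - hydrostatic) * (v_observed / v_normal) ** exponent


# Hypothetical values (pressures in MPa, velocities in m/s).
normal_case = eaton_pore_pressure(100.0, 45.0, v_observed=3000.0, v_normal=3000.0)
slow_case = eaton_pore_pressure(100.0, 45.0, v_observed=2700.0, v_normal=3000.0)
```

When the observed velocity matches the normal trend, the relation returns the hydrostatic pressure. A velocity slower than the trend returns a higher value, which is the overpressure signal that makes sub-salt velocity estimates useful for drilling.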
Durand, Marie. "La découverte et la compréhension des profils d’apprenants : classification semi-supervisée et acquisition d’une langue seconde". Thesis, Paris 8, 2019. http://www.theses.fr/2019PA080029.
This thesis aims to develop an effective methodology for the discovery and description of the profiles of L2 learners based on acquisition data (perception, understanding and production). We want to detect patterns in the acquisition behaviours of subgroups of learners, taking into account the multidimensional aspect of the L2 learning process. The proposed methodology belongs to the field of artificial intelligence, more specifically to semi-supervised clustering techniques. Our algorithm has been applied to the database of the VILLA project, which includes the performance of learners from 5 different source languages (French, Italian, Dutch, German and English) with Polish as the target language. 156 adult learners were each tested with a variety of tasks in Polish during a 14-hour teaching session, starting from the initial exposure. These tests made it possible to evaluate their performance at the levels of linguistic analysis, namely phonology, morphology, morphosyntax and lexicon. The database also includes their sensitivity to input characteristics, such as the frequency and transparency of lexical elements used in linguistic tasks. The similarity measure used in traditional clustering techniques is revisited in this work in order to evaluate the distance between two learners from an acquisitionist point of view. It is based on the identification of the learner's response strategy to a specific language test structure. We show that this measure makes it possible to detect the presence or absence in the learner's responses of a strategy similar to the LC inflectional system, and so enables our algorithm to provide a resulting classification consistent with second language acquisition research. As a result, we claim that our algorithm might be relevant in the empirical establishment of learners' profiles and the discovery of new opportunities for reflection or analysis.
Keshri, Vivek. "Evolutionary analysis of the β-lactamase families". Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0250.
The β-lactam antibiotics are among the oldest and most widely used antimicrobial drugs. The bacterial enzyme β-lactamase hydrolyzes β-lactam antibiotics by breaking their core structure, the "β-lactam ring". To identify novel β-lactamases, a comprehensive investigation was performed in different biological databases such as the Human Microbiome Project, env_nr, and NCBI nr. The analysis revealed that putative ancestral sequences and HMM profile searches played a significant role in the identification of remote homologs and uncovered β-lactamase enzymes existing in the metagenomic database as dark matter. The comprehensive phylogenetic analyses of extant and newly identified β-lactamases show novel clades in the trees. Further, the β-lactam antibiotic hydrolysis activity of newly identified sequences (from archaea and human) was investigated in the laboratory, and these sequences showed β-lactamase activity. The second phase of the investigation was undertaken to examine the functional evolution of β-lactamases. First, 1155 β-lactamase protein sequences were retrieved from the ARG-ANNOT database, and MIC values from the corresponding literature. The results revealed that the functional activity of β-lactamases evolved convergently within the molecular class. The third phase of this thesis presents the development of an integrative β-lactamase database. The existing public β-lactamase database has limited information; therefore, an integrative database was developed.
Harrak, Fatima. "Analyse de questions d’apprenants et de profils associés dans des environnements en ligne". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS115.
Students' questions are useful for their learning and for teachers' pedagogical adaptation. However, the volume of questions asked online by students may prevent teachers from dealing with each question (e.g. in a MOOC or a large university cohort). We address this issue mainly in the context of a hybrid training program in which students ask questions online each week, using a flipped classroom approach, to help teachers prepare their on-site Q&A session. Our objective is to help the teacher determine the types of questions asked by different groups of learners. To conduct this work, we developed a question coding scheme guided by the student's intention and the teacher's pedagogical reaction. Several automatic classification tools have been designed, evaluated and combined to categorize the questions. We have shown how a clustering-based model built on data from previous sessions can be used to predict students' online profiles using exclusively the nature of the questions they ask. These results allowed us to propose to teachers three alternative ways of organizing questions (based on question categories and learner profiles), opening up perspectives for different pedagogical approaches during Q&A sessions. We have tested and demonstrated the possibility of adapting our coding scheme and associated tools to the very different context of a MOOC, which suggests a form of genericity in our approach.
Potie, Gilbert. "Contribution à l'étude géologique de la frontière SE de la plaque caraibe : la serrania del interior oriental sur le transect Cumana-Urica et le bassin de Maturin (Vénézuela) : application de données géophysiques et géologiques à une interpretation structurale". Brest, 1989. http://www.theses.fr/1989BRES2005.
Guénec, Nadège. "Méthodologies pour la création de connaissances relatives au marché chinois dans une démarche d'Intelligence Économique : application dans le domaine des biotechnologies agricoles". Phd thesis, Université Paris-Est, 2009. http://tel.archives-ouvertes.fr/tel-00554743.
Somé, Sobom Matthieu. "Estimations non paramétriques par noyaux associés multivariés et applications". Thesis, Besançon, 2015. http://www.theses.fr/2015BESA2030/document.
This work is about a nonparametric approach using multivariate mixed associated kernels for the estimation of densities, probability mass functions and regressions whose supports are partially or totally discrete and continuous. Some key aspects of kernel estimation using multivariate continuous (classical) kernels and univariate (discrete and continuous) associated kernels are recalled. The problem of supports is also revisited, as well as a resolution of boundary effects for univariate associated kernels. The multivariate associated kernel is then defined and a construction by the multivariate mode-dispersion method is provided. This leads to an illustration on the bivariate beta kernel with Sarmanov's correlation structure in the continuous case. Properties of these estimators are studied, such as the bias, variances and mean squared errors. An algorithm for reducing the bias is proposed and illustrated on this bivariate beta kernel. Simulation studies and applications are then performed with the bivariate beta kernel. Three types of bandwidth matrices, namely full, Scott and diagonal, are used. Furthermore, appropriate multiple associated kernels are used in a practical discriminant analysis task. These are the binomial, categorical, discrete triangular, gamma and beta kernels. Thereafter, associated kernels with or without correlation structure are used in multiple regression. In addition to the previous univariate associated kernels, bivariate beta kernels with or without correlation structure are taken into account. Simulation studies show the performance of the choice of associated kernels with full or diagonal bandwidth matrices. Then, (discrete and continuous) associated kernels are combined to define mixed univariate associated kernels. Using the tools of unification of discrete and continuous analysis, the properties of the mixed associated kernel estimators are shown.
This is followed by an R package, created for the univariate case, for the estimation of densities, probability mass functions and regressions. Several smoothing parameter selections are implemented via an easy-to-use interface. Throughout the work, bandwidth matrix selections are generally obtained using cross-validation and sometimes Bayesian methods. Finally, some additional information on normalizing constants of associated kernel estimators is presented for densities or probability mass functions.
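To make the notion of an associated kernel more concrete, the sketch below implements a univariate beta-kernel density estimator on [0, 1], in which the kernel evaluated at a target point x is a Beta(x/h + 1, (1 - x)/h + 1) density, so its shape adapts to the position of x and no mass is placed outside the support. This is only the simplest univariate case, written in Python for illustration; it does not reproduce the multivariate construction, the Sarmanov correlation structure or the R package mentioned in the abstract, and the function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def beta_pdf(u: float, a: float, b: float) -> float:
    """Density of the Beta(a, b) distribution at u, computed through log-gamma for stability."""
    if u <= 0.0 or u >= 1.0:
        # Simplification: points on or outside the boundary contribute nothing.
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(u) + (b - 1.0) * math.log(1.0 - u))


def beta_kernel_density(x: float, sample: Sequence[float], h: float) -> float:
    """Beta-kernel estimate of the density at x in [0, 1] with bandwidth h > 0.

    The kernel shape depends on the target point x: Beta(x/h + 1, (1 - x)/h + 1).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    a = x / h + 1.0
    b = (1.0 - x) / h + 1.0
    return sum(beta_pdf(xi, a, b) for xi in sample) / len(sample)


# Hypothetical sample of proportions concentrated around 0.25.
sample = [0.20, 0.25, 0.30, 0.80]
dense = beta_kernel_density(0.25, sample, h=0.05)
sparse = beta_kernel_density(0.60, sample, h=0.05)
```

With this toy sample, the estimate at 0.25 is larger than the estimate at 0.60, as expected from where the observations sit. The bandwidth `h` is fixed by hand here, whereas the abstract reports selecting it by cross-validation or Bayesian methods.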
Gratton, Eloïse. "Wireless privacy and personalized location-based services: the challenge of translating the legal framework into business practices". Thèse, 2002. http://hdl.handle.net/1866/2763.
The proliferation of mobile communications is leading to new services based on the ability of service providers to determine, with increasing precision and through the use of location determination technologies, the geographic location of wireless devices and allow their users to receive services based on such location. The development of location-based services introduces new privacy risks for consumers that must be addressed. The portability of wireless devices coupled with their ability to pinpoint the location of wireless users and reveal it to others could produce a system where the everyday activities and movements of these users are tracked and recorded, and where wireless users receive unanticipated messages on their device. For this reason and in order to preserve the privacy of wireless users, a company looking to deploy a technology related to the provision of personalized location-based services ("LBS Provider") will have to analyze the privacy legal framework, coming either from legal sources (which may in some cases be vague and not specific to this new context) or from the industry, and translate such framework into business practices. Such analysis may help in establishing what kind of business model and technology should be adopted and developed by LBS Providers in order to ensure the privacy of wireless users while providing this new type of service.
Jetté, Virginie. "Traque-moi si je le veux : à la recherche d'un cadre juridique entourant la publicité comportementale". Thèse, 2017. http://hdl.handle.net/1866/20384.
Mouine, Mohamed. "Présentation personnalisée des informations environnementales". Thèse, 2014. http://hdl.handle.net/1866/11198.
We present in this thesis our work in the field of information visualization. We dealt with the problem of generating weather forecast reports. Given the huge amount of information produced by Environment Canada (EC) and the wide variety of users, customized visualizations that meet their needs and preferences must be generated. We developed MeteoVis, a weather report generator. Given that we have little information on the user profile, we relied on the choices made by similar users to calculate the needs and preferences of a user. We use unsupervised machine learning techniques to group similar users. We compute a degree of similarity between user profiles in the same cluster to determine the needs and preferences. With the help of external users, we conducted experiments to evaluate our tool and compare it with the current Environment Canada site. The evaluation results show that the visualizations generated by MeteoVis are significantly better than the current bulletins prepared by EC.