Dissertations / Theses on the topic 'Apprentissage avec peu de données' (learning with little data)
Consult the top 24 dissertations / theses for your research on the topic 'Apprentissage avec peu de données.'
Gautheron, Léo. "Construction de Représentation de Données Adaptées dans le Cadre de Peu d'Exemples Étiquetés." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSES044.
Machine learning consists in the study and design of algorithms that build models able to handle non-trivial tasks as well as or better than humans, and hopefully at a lower cost. These models are typically trained from a dataset where each example describes an instance of the same task and is represented by a set of characteristics and an expected outcome or label, which we usually want to predict. An element required for the success of any machine learning algorithm is the quality of the set of characteristics describing the data, also referred to as the data representation or features. In supervised learning, the more the features describing the examples are correlated with the label, the more effective the model will be. There exist three main families of features: the "observable", the "handcrafted" and the "latent" features, the last of which are usually learned automatically from the training data. The contributions of this thesis fall into the scope of this last category. More precisely, we are interested in the specific setting of learning a discriminative representation when the number of data of interest is limited. A lack of data of interest can be found in different scenarios. First, we tackle the problem of imbalanced learning with a class of interest composed of a few examples by learning a metric that induces a new representation space where the learned models do not favor the majority examples. Second, we propose to handle a scenario with few available examples by learning at the same time a relevant data representation and a model that generalizes well, through boosting models using kernels as base learners approximated by random Fourier features. Finally, to address the domain adaptation scenario where the target set contains no label while the source examples are acquired in different conditions, we propose to reduce the discrepancy between the two domains by keeping only the most similar features optimizing the solution of an optimal transport problem between the two domains.
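As an illustration of the random Fourier features mentioned in the abstract above, the following minimal sketch approximates a Gaussian (RBF) kernel with an explicit random feature map. It is not the thesis's boosting method. The function names (make_rff_map, rbf_kernel), the kernel form exp(-gamma * ||x - y||^2) and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def make_rff_map(dim, n_features, gamma, seed=0):
    """Return a feature map z such that z(x) . z(y) approximates exp(-gamma * ||x - y||^2).

    Hypothetical helper for illustration only (not taken from the thesis).
    """
    rng = random.Random(seed)
    # For this kernel, frequencies are drawn from N(0, 2 * gamma * I).
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    # Random phases drawn uniformly from [0, 2 * pi].
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact Gaussian kernel, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    z = make_rff_map(dim=3, n_features=4000, gamma=0.5, seed=0)
    x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
    print("approximate:", dot(z(x), z(y)))
    print("exact:      ", rbf_kernel(x, y, gamma=0.5))
```

The inner product of the two feature vectors converges to the exact kernel value as n_features grows, so a linear model trained on z(x) behaves approximately like a kernel model at a much lower cost.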
Barrère, Killian. "Architectures de Transformer légères pour la reconnaissance de textes manuscrits anciens." Electronic Thesis or Diss., Rennes, INSA, 2023. http://www.theses.fr/2023ISAR0017.
Transformer architectures deliver low error rates but are challenging to train due to limited annotated data in handwritten text recognition. We propose lightweight Transformer architectures to adapt to the limited amounts of annotated handwritten text available. We introduce a fast Transformer architecture with an encoder, processing up to 60 pages per second. We also present architectures using a Transformer decoder to incorporate language modeling into character recognition. To effectively train our architectures, we offer algorithms for generating synthetic data adapted to the visual style of modern and historical documents. Finally, we propose strategies for learning with limited data and reducing prediction errors. Our architectures, combined with synthetic data and these strategies, achieve competitive error rates on lines of text from modern documents. For historical documents, they train effectively with minimal annotated data, surpassing state-of-the-art approaches. Remarkably, just 500 annotated lines are sufficient for character error rates close to 5%.
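For readers unfamiliar with the character error rate quoted in the abstract above, the sketch below shows one common way to compute it: the total Levenshtein edit distance between recognized and reference lines, divided by the total number of reference characters. The abstract does not state how the thesis computes the metric, so this definition and the function names are assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn ref into hyp (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    refs = ["the quick brown fox", "jumps over"]
    hyps = ["the quick brown fax", "jumps ovr"]
    print(f"CER = {character_error_rate(refs, hyps):.3f}")
```

Under this definition, a CER of 5% means roughly one character edit for every twenty reference characters.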
Kasper, Kévin. "Apprentissage d'estimateurs sans modèle avec peu de mesures - Application à la mécanique des fluides." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLN029/document.
This thesis deals with sparsity-promoting techniques for producing efficient estimators that rely only on a small number of measurements given by sensors. The sensor locations are crucial to the estimators and have to be chosen meticulously. The proposed methods do not require dynamical models and are instead based on a collection of snapshots of the field of interest. This learning sequence can be acquired through measurements on the real system or through numerical simulation. By relying only on a learning sequence, and not on dynamical models, the proposed methods become general and applicable to a variety of systems. These techniques are illustrated on the 2-D fluid flow around a cylindrical body. The pressure field in the neighbourhood of the cylinder has to be estimated from a limited number of surface pressure measurements. For a given arrangement of the sensors, efficient estimators suited to these locations are proposed. These estimators fully harness the information given by the limited number of sensors by manipulating sparse representations and classes. Cases where the measurements are no longer made on the field to be estimated can also be considered. A sensor placement algorithm is proposed in order to improve the performance of the estimators. Multiple extensions are discussed: incorporating past measurements, past control inputs, recovering a field non-linearly related to the measurements, estimating a vector field, etc.
Dabuleanu, Simona. "Problèmes aux limites pour les équations de Hamilton-Jacobi avec viscosité et données initiales peu régulières." Nancy 1, 2003. http://www.theses.fr/2003NAN10058.
This thesis deals with the viscous Hamilton-Jacobi (VHJ) equations on bounded domains with smooth boundary. This equation is a nonlinear parabolic problem for which the second term is a power of the gradient of the solution. We study the existence, uniqueness and regularity of weak solutions of the VHJ equation with homogeneous Dirichlet or Neumann boundary conditions and irregular initial data. The cases where the initial data is a bounded Radon measure or a measurable function in a Lebesgue space are investigated. Next, using the Bernstein technique, we prove some qualitative properties of these solutions. Particular attention is given to the long-time behaviour, which depends on the sign and the exponent of the nonlinear term.
Tremblay, Maxime. "Vision numérique avec peu d'étiquettes : segmentation d'objets et analyse de l'impact de la pluie." Doctoral thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69039.
Phan, Thi Hai Hong. "Reconnaissance d'actions humaines dans des vidéos avec l'apprentissage automatique." Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1038.
In recent years, human action recognition (HAR) has attracted research attention thanks to its various applications, such as intelligent surveillance systems, video indexing, human activity analysis and human-computer interaction. The typical issues that researchers face include the complexity of human motions, spatial and temporal variations, clutter, occlusion and changes in lighting conditions. This thesis focuses on automatically recognizing the ongoing human actions in a given video. We address this research problem by using both shallow learning and deep learning approaches. First, we began the research work with traditional shallow learning approaches based on handcrafted features by introducing a novel feature named the Motion of Oriented Magnitudes Patterns (MOMP) descriptor. We then incorporated this discriminative descriptor into simple yet powerful representation techniques such as Bag of Visual Words, Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vector to better represent actions. Also, PCA (Principal Component Analysis) and feature selection (statistical dependency, mutual information) are applied to find the best subset of features in order to improve the performance and decrease the computational expense. The proposed method obtained state-of-the-art results on several common benchmarks. Recent deep learning approaches require intensive computation and large memory usage. They are therefore difficult to use and deploy on systems with limited resources. In the second part of this thesis, we present a novel efficient algorithm to compress Convolutional Neural Network models in order to decrease both the computational cost and the run-time memory footprint. We measure the redundancy of parameters based on their relationship using information-theoretic criteria, and we then prune the less important ones. The proposed method significantly reduces the model sizes of different networks such as AlexNet and ResNet, by up to 70%, without performance loss on the large-scale image classification task. The traditional approach with the proposed descriptor achieved strong performance for human action recognition, but only on small datasets. In order to improve the performance on large-scale datasets, in the last part of this thesis we therefore exploit deep learning techniques to classify actions. We introduce the concept of the MOMP image as an input layer of CNNs and incorporate the MOMP image into deep neural networks. We then apply our network compression algorithm to accelerate the system and improve its performance. The proposed method reduces the model size, decreases over-fitting, and thus increases the overall performance of CNNs on large-scale action datasets. Throughout the thesis, we have shown that our algorithms obtain good performance in comparison to the state of the art on challenging action datasets (Weizmann, KTH, UCF Sports, UCF-101 and HMDB51) with low resource requirements.
Raja, Suleiman Raja Fazliza. "Méthodes de detection robustes avec apprentissage de dictionnaires. Applications à des données hyperspectrales." Thesis, Nice, 2014. http://www.theses.fr/2014NICE4121/document.
This Ph.D. dissertation deals with a "one among many" detection problem, where one has to discriminate between pure noise under H0 and one among L known alternatives under H1. This work focuses on the study and implementation of robust reduced-dimension detection tests using optimized dictionaries. These detection methods are associated with the Generalized Likelihood Ratio test. The proposed approaches are principally assessed on hyperspectral data. In the first part, several technical topics associated with the framework of this dissertation are presented. The second part highlights the theoretical and algorithmic aspects of the proposed methods. Two issues linked to the large number of alternatives arise in this framework. In this context, we propose dictionary learning techniques based on a robust criterion that seeks to minimize the maximum power loss (minimax type). In the case where the learned dictionary has K = 1 column, we show that the exact solution can be obtained. Then, for the case K > 1, we propose three minimax learning algorithms. Finally, the third part of this manuscript presents several applications. The principal application concerns astrophysical hyperspectral data from the Multi Unit Spectroscopic Explorer instrument. Numerical results show that the proposed algorithms are robust and that, in the case K > 1, they increase the minimax detection performance over the K = 1 case. Other possible applications, such as worst-case recognition of faces and handwritten digits, are presented.
Truong, Nguyen Tuong Vinh. "Apprentissage de fonctions d'ordonnancement avec peu d'exemples étiquetés : une application au routage d'information, au résumé de textes et au filtrage collaboratif." Paris 6, 2009. http://www.theses.fr/2009PA066568.
Belilovsky, Eugene. "Apprentissage de graphes structuré et parcimonieux dans des données de haute dimension avec applications à l’imagerie cérébrale." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLC027.
This dissertation presents novel structured sparse learning methods on graphs that address commonly found problems in the analysis of neuroimaging data as well as other high-dimensional data with few samples. The first part of the thesis proposes convex relaxations of discrete and combinatorial penalties involving sparsity and bounded total variation on a graph, as well as a bounded ℓ2 norm. These are developed with the aim of learning an interpretable predictive linear model, and we demonstrate their effectiveness on neuroimaging data as well as on a sparse image recovery problem. The subsequent parts of the thesis consider structure discovery of undirected graphical models from few observational data. In particular, we focus on invoking sparsity and other structured assumptions in Gaussian Graphical Models (GGMs). To this end we make two contributions. We show an approach to identify differences in GGMs known to have similar structure. We derive the distribution of parameter differences under a joint penalty when parameters are known to be sparse in the difference. We then show how this approach can be used to obtain confidence intervals on edge differences in GGMs. We then introduce a novel learning-based approach to the problem of structure discovery of undirected graphical models from observational data. We demonstrate how neural networks can be used to learn effective estimators for this problem. These are empirically shown to be flexible and efficient alternatives to existing techniques.
Vo, Xuan Thanh. "Apprentissage avec la parcimonie et sur des données incertaines par la programmation DC et DCA." Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0193/document.
In this thesis, we focus on developing optimization approaches for solving some classes of optimization problems in sparsity and in robust optimization under data uncertainty. Our methods are based on DC (Difference of Convex functions) programming and DCA (DC Algorithms), which are well known as powerful tools in optimization. This thesis is composed of two parts: the first part is concerned with sparsity, while the second part deals with uncertainty. In the first part, a unified DC approximation approach to optimization problems involving the zero-norm in the objective is thoroughly studied in both its theoretical and computational aspects. We consider a common DC approximation of the zero-norm that includes all standard sparsity-inducing penalty functions, and develop general DCA schemes that cover all standard algorithms in the field. Next, the thesis turns to the nonnegative matrix factorization (NMF) problem. We investigate the structure of the considered problem and provide appropriate DCA-based algorithms. To enhance the performance of NMF, sparse NMF formulations are proposed. Continuing this topic, we study the dictionary learning problem, where sparse representation plays a crucial role. In the second part, we exploit robust optimization techniques to deal with data uncertainty for two important problems in machine learning: feature selection in linear Support Vector Machines and clustering. In this context, each individual data point is uncertain but varies within a bounded uncertainty set. Different models (box/spherical/ellipsoidal) of uncertain data are studied. DCA-based algorithms are developed to solve the robust problems.
Verny, Louis. "Apprentissage de réseaux causaux avec variables latentes et applications à des contextes génomiques et cliniques." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066545/document.
During my PhD, I worked on the development of an information-theory-based algorithm allowing the reconstruction of a wide variety of graphical model classes from observational data. This method also makes it possible to handle the effect of latent (unobserved) variables, which is essential given the difficulty of observing a biological or clinical system as a whole. Our method, called Miic (for Multivariate Information-based Inductive Causation), starts from a complete network (all nodes are connected to each other) and iteratively removes non-essential edges from it. The second part of my thesis was to analyze and interpret the networks reconstructed from two kinds of biological datasets. On the one hand, genomic datasets: Miic was used to learn networks of transcriptomic interactions driving the differentiation of the first hematopoietic cells of the embryo. On the other hand, clinical datasets: Miic was also used on two datasets extracted from two distinct cohorts, obtained thanks to two collaborations, with la Pitié-Salpêtrière (neurology dataset) and with Institut Curie Hospital (breast cancer dataset). The testing during Miic development, along with the results obtained when we analyzed the different applications presented in this manuscript, shows Miic's efficiency both at confirming already known interactions and at uncovering previously unknown associations.
Chen, Dexiong. "Modélisation de données structurées avec des machines profondes à noyaux et des applications en biologie computationnelle." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM070.
Developing efficient algorithms to learn appropriate representations of structured data, including sequences or graphs, is a major and central challenge in machine learning. To this end, deep learning has become popular in structured data modeling. Deep neural networks have drawn particular attention in various scientific fields such as computer vision, natural language understanding or biology. For instance, they provide computational tools for biologists to possibly understand and uncover biological properties or relationships among macromolecules within living organisms. However, most of the success of deep learning methods in these fields essentially relies on the guidance of empirical insights as well as huge amounts of annotated data. Exploiting more data-efficient models is necessary, as labeled data is often scarce. Another line of research is kernel methods, which provide a systematic and principled approach for learning non-linear models from data of arbitrary structure. In addition to their simplicity, they offer a natural way to control regularization and thus to avoid overfitting. However, the data representations provided by traditional kernel methods are only defined by simply designed handcrafted features, which makes them perform worse than neural networks when enough labeled data are available. More complex kernels, inspired by prior knowledge used in neural networks, have thus been developed to build richer representations and bridge this gap. Yet, they are less scalable. By contrast, neural networks are able to learn a compact representation for a specific learning task, which allows them to retain the expressivity of the representation while scaling to large sample sizes. Incorporating complementary views of kernel methods and deep neural networks to build new frameworks is therefore useful to benefit from both worlds. In this thesis, we build a general kernel-based framework for modeling structured data by leveraging prior knowledge from classical kernel methods and deep networks. Our framework provides efficient algorithmic tools for learning representations without annotations as well as for learning more compact representations in a task-driven way. Our framework can be used to efficiently model sequences and graphs with simple interpretation of predictions. It also offers new insights about designing more expressive kernels and neural networks for sequences and graphs.
Loeffel, Pierre-Xavier. "Algorithmes de machine learning adaptatifs pour flux de données sujets à des changements de concept." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066496/document.
In this thesis, we investigate the problem of supervised classification on a data stream subject to concept drifts. In order to learn in this environment, we claim that a successful learning algorithm must combine several characteristics. It must be able to learn and adapt continuously, it should not make any assumption about the nature of the concept or the expected type of drifts, and it should be allowed to abstain from prediction when necessary. On-line learning algorithms are the obvious choice to handle data streams. Indeed, their update mechanism allows them to continuously update their learned model by always making use of the latest data. The instance-based (IB) structure also has some properties which make it extremely well suited to handling data streams with drifting concepts. Indeed, IB algorithms make very few assumptions about the nature of the concept they are trying to learn. This grants them great flexibility, which makes them likely to be able to learn from a wide range of concepts. Another strength is that storing some of the past observations in memory can provide valuable meta-information which can be used by an algorithm. Furthermore, the IB structure allows the adaptation process to rely on hard evidence of obsolescence and, by doing so, adaptation to concept changes can happen without the need to explicitly detect the drifts. Finally, in this thesis we stress the importance of allowing the learning algorithm to abstain from prediction in this framework. This is because the drifts can generate a lot of uncertainty and, at times, an algorithm might lack the necessary information to predict accurately.
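To make the combination of ideas in the abstract above more concrete (on-line updates, instance-based memory, and abstention), here is a minimal sketch of a sliding-window k-nearest-neighbour classifier that returns None when its neighbours disagree too much. It is not the algorithm developed in the thesis: the class name WindowedKNN, the sliding-window forgetting rule and the agreement threshold are all assumptions made for illustration.

```python
import math
from collections import Counter, deque


class WindowedKNN:
    """Hypothetical instance-based stream classifier with abstention."""

    def __init__(self, k=3, window=50, min_agreement=0.8):
        self.k = k
        self.min_agreement = min_agreement
        # Bounded memory: the oldest instance is dropped automatically,
        # which lets the model follow a drifting concept without
        # detecting the drift explicitly.
        self.memory = deque(maxlen=window)

    def learn(self, x, label):
        """On-line update: store the latest labelled observation."""
        self.memory.append((tuple(x), label))

    def predict(self, x):
        """Return a label, or None to abstain."""
        if len(self.memory) < self.k:
            return None  # not enough evidence yet
        nearest = sorted(self.memory, key=lambda item: math.dist(item[0], x))[: self.k]
        label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
        # Abstain when the neighbourhood is too mixed to be trusted.
        return label if votes / self.k >= self.min_agreement else None


if __name__ == "__main__":
    model = WindowedKNN(k=3, window=6, min_agreement=1.0)
    for point in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]:
        model.learn(point, "a")
    print(model.predict((0.2, 0.2)))  # unanimous neighbourhood
    model.learn((0.2, 0.2), "b")      # the concept starts to drift
    print(model.predict((0.2, 0.2)))  # mixed neighbourhood
```

In this sketch, abstention is triggered purely by local label disagreement. Other criteria, such as distance to the nearest stored instance, would be equally plausible design choices.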
Traoré, Abraham. "Contribution à la décomposition de données multimodales avec des applications en apprentisage de dictionnaires et la décomposition de tenseurs de grande taille." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMR068/document.
In this work, we are interested in special mathematical tools called tensors, which are multidimensional arrays defined on the tensor product of some vector spaces, each of which has its own coordinate system; the number of spaces involved in this product is generally referred to as the order. Interest in these tools stems from empirical works (for a range of applications encompassing both classification and regression) that show the superiority of tensor processing over matrix decomposition techniques. Within the framework of this thesis, we focused on a specific tensor model named Tucker and established new approaches for miscellaneous tasks such as dictionary learning, online dictionary learning, large-scale processing, and the decomposition of a tensor evolving with respect to each of its modes. New theoretical results are established, and the efficiency of the different algorithms, which are based either on alternating minimization or on coordinate gradient descent, is demonstrated on real-world problems.
Perrine, Cribier-Delande. "Faciliter la mise en place d'études d'utilisabilité par des outils de stockage des données et d'analyse automatique des traces d'utilisation : un cas d'étude avec une application mobile." Mémoire, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/9538.
Canellas, Camila. "Métamodèle d'analytique des apprentissages avec le numérique." Electronic Thesis or Diss., Sorbonne université, 2021. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2021SORUS538.pdf.
This work is part of an effort to implement a learning analytics process in a context where documentary production is carried out via a model-driven engineering approach. We are mainly interested in the possibilities that could emerge if the same approach is used to achieve such an implementation. Our research question concerns the identification of these possibilities, in particular by ensuring that the proposed metamodel allows learning indicators to be enriched with the semantics and structure of the educational documents consulted by the learners, as well as an a priori definition of relevant indicators. In order to design the metamodel in question, we first carried out an exploratory study with learners, aimed at understanding their needs and their reception of enriched indicators. We also carried out a systematic review of the literature on existing interaction indicators in the field of learning analytics in order to gather the potential elements to be abstracted for the construction of the corresponding metamodel. The challenge was to design a metamodel in which the elements necessary for the abstraction of this domain are present without being unnecessarily complex, making it possible to model both learning indicators based on a descriptive analysis and those used for a prediction or a diagnosis. We then proceeded to a proof of concept and an evaluation of this metamodel.
Suzano, Massa Francisco Vitor. "Mise en relation d'images et de modèles 3D avec des réseaux de neurones convolutifs." Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC1198/document.
The recent availability of large catalogs of 3D models enables new possibilities for 3D reasoning on photographs. This thesis investigates the use of convolutional neural networks (CNNs) for relating 3D objects to 2D images. We first introduce two contributions that are used throughout this thesis: an automatic memory reduction library for deep CNNs, and a study of CNN features for cross-domain matching. For the first, we develop a library built on top of Torch7 which automatically reduces the memory requirements for deploying a deep CNN by up to 91%. For the second, we study the effectiveness of various CNN features extracted from a pre-trained network in the case of images from different modalities (real or synthetic images). We show that despite the large cross-domain difference between rendered views and photographs, it is possible to use some of these features for instance retrieval, with possible applications to image-based rendering. CNNs have recently been used for the task of object viewpoint estimation, sometimes with very different design choices. We present these approaches in a unified framework and analyse the key factors that affect performance. We propose a joint training method that combines both detection and viewpoint estimation, which performs better than considering viewpoint estimation separately. We also study the impact of formulating viewpoint estimation as either a discrete or a continuous task, we quantify the benefits of deeper architectures, and we demonstrate that using synthetic data is beneficial. With all these elements combined, we improve over previous state-of-the-art results on the Pascal3D+ dataset by approximately 5% of mean average viewpoint precision. In the instance retrieval study, the image of the object is given and the goal is to identify which object it is among a number of 3D models. We extend this work to object detection, where instead we are given a 3D model (or a set of 3D models) and are asked to locate and align the model in the image. We show that simply using CNN features is not enough for this task, and we propose to learn a transformation that brings the features from the real images close to the features from the rendered views. We evaluate our approach both qualitatively and quantitatively on two standard datasets, the IKEA object dataset and a subset of the Pascal VOC 2012 dataset restricted to the chair category, and we show state-of-the-art results on both of them.
Braud, Chloé. "Identification automatique des relations discursives implicites à partir de corpus annotés et de données brutes." Sorbonne Paris Cité, 2015. https://hal.inria.fr/tel-01256884.
Building discourse parsers is currently a major challenge in Natural Language Processing. The identification of the relations (such as Explanation, Contrast, ...) linking spans of text in the document is the main difficulty. In particular, identifying the so-called implicit relations, that is, the relations that lack a discourse connective (such as but, because, ...), is known to be a hard task, since it requires taking into account various factors and because it leads to specific difficulties in a classification system. In this thesis, we use raw data to improve the automatic identification of implicit relations. First, we propose to use discourse markers in order to automatically annotate new data. We use domain adaptation methods to deal with the distributional differences between automatically and manually annotated data: we report improvements for systems built on the French corpus ANNODIS and on the English corpus Penn Discourse Treebank. Then, we propose to use word representations built from raw data, which may be automatically annotated with discourse markers, in order to feed a representation of the data based on the words found in the spans of text to be linked. We report improvements on the English corpus Penn Discourse Treebank, and in particular we show that this method alleviates the need for rich resources, which are available for only a few languages.
Mutuvi, Stephen. "Epidemic Event Extraction in Multilingual and Low-resource Settings." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS044.
Epidemic event extraction aims to extract incidents of public health importance from text, such as disease outbreaks. While event extraction has been extensively researched for high-resource languages such as English, existing systems for epidemic event extraction are sub-optimal for low-resource, multilingual settings due to training data scarcity. First, we tackle the data scarcity challenge by transforming and annotating an existing document-level multilingual dataset into a token-level annotated dataset suitable for supervised sequence learning. Second, we formulate the event extraction task as a sequence labeling task and utilize the token-level annotated dataset to train supervised machine and deep learning models for epidemic event extraction. The results show that pre-trained language models produced the best overall performance across all the evaluated languages. Third, we propose a domain adaptation technique by including epidemiological entities (disease names and locations) in the vocabulary of pre-trained models. Incorporating the entities positively impacted the tokenization quality, contributing to model performance improvement. Finally, we evaluate self-training and observe that the approach performs marginally better than models trained using supervised learning.
Ben, Abdallah Emna. "Étude de la dynamique des réseaux biologiques : apprentissage des modèles, intégration des données temporelles et analyse formelle des propriétés dynamiques." Thesis, Ecole centrale de Nantes, 2017. http://www.theses.fr/2017ECDN0041.
Over the last few decades, the emergence of a wide range of new technologies has produced a massive amount of biological data (genomics, proteomics, ...). As a result, a very large amount of time series data is now produced every day. The newly produced data can give us new insights into the behavior of biological systems. This has led to considerable developments in the field of bioinformatics, which could benefit from these enormous datasets. This motivates the development of efficient methods for learning Biological Regulatory Networks (BRNs) that model a biological system from its time series data. Then, in order to understand the nature of system functions, we study, in this thesis, the dynamics of their BRN models. Indeed, we focus on developing original and scalable logical methods (implemented in Answer Set Programming) to decipher the emerging complexity of the dynamics of biological systems. The main contributions of this thesis are the following. (i) Refining the dynamics of BRNs modeled with the Automata Network (AN) formalism by integrating a temporal parameter (delay) in the local transitions of the automata. We call the extended formalism a Timed Automata Network (T-AN). This integration allows the parametrization of the transitions between the local states of each automaton as well as between the global states of the network. (ii) Learning BRNs that model biological systems from their time series data. (iii) Model checking of discrete dynamical properties of BRNs (modeled with AN and T-AN) by dynamical formal analysis: attractor identification (minimal trap domains from which the network cannot escape) and reachability verification of an objective from a global initial state of the network.
Trouvilliez, Benoît. "Similarités de données textuelles pour l'apprentissage de textes courts d'opinions et la recherche de produits." Thesis, Artois, 2013. http://www.theses.fr/2013ARTO0403/document.
This Ph.D. thesis is about establishing textual data similarities in the customer relations domain. Two subjects are mainly considered: the automatic analysis of short messages written in response to satisfaction surveys, and the search for products given some criteria expressed in natural language by a human through a conversation with a program. The first subject concerns the statistical information drawn from survey answers. The ideas recognized in the answers are identified, organized according to a taxonomy and quantified. The second subject concerns the transcription of criteria over products into queries to be interpreted by a database management system. The range of criteria under consideration is wide, from the simplest, such as material or brand, to the most complex, such as color or price. The two subjects meet on the problem of establishing textual data similarities using NLP techniques. The main difficulties come from the fact that the texts to be processed, written in natural language, are short and contain many spelling errors and negations. Establishing semantic similarities between words (synonymy, antonymy, ...) and syntactic relations between syntagms (conjunction, opposition, ...) are other issues considered in our work. We also study automatic clustering and classification methods in order to analyse answers to satisfaction surveys.
Caigny, Arno de. "Innovation in customer scoring for the financial services industry." Thesis, Lille, 2019. http://www.theses.fr/2019LIL1A011.
This dissertation improves customer scoring. Customer scoring is important for companies in their decision-making processes because it helps to solve key managerial issues, such as deciding which customers to target for a marketing campaign or assessing which customers are likely to leave the company. The research in this dissertation makes several contributions in three areas of the customer scoring literature. First, new sources of data are used to score customers. Second, the methodology for going from data to decisions is improved. Third, customer life event prediction is proposed as a new application of customer scoring.
Simeoni, Chiara. "Méthodes numériques pour des équations hyperboliques de type Saint-Venant." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2002. http://tel.archives-ouvertes.fr/tel-00922706.
Vicente, Sergio. "Apprentissage statistique avec le processus ponctuel déterminantal." Thesis, 2021. http://hdl.handle.net/1866/25249.
This thesis presents the determinantal point process, a probabilistic model that captures repulsion between points of a certain space. This repulsion is encoded by a similarity matrix, the kernel matrix, which determines which points are more similar and therefore less likely to appear in the same subset. This point process gives more weight to subsets whose elements are more diverse, which is not the case with traditional uniform random sampling. Diversity has become a key concept in domains such as medicine, sociology, forensic sciences and behavioral sciences. The determinantal point process is considered a promising alternative to traditional sampling methods, since it takes into account the diversity of selected elements. It is already actively used in machine learning as a subset selection method. Its application in statistics is illustrated with three papers. The first paper presents consensus clustering, which consists in running a clustering algorithm on the same data a large number of times. To sample the initial points of the algorithm, we propose the determinantal point process as a sampling method instead of uniform random sampling and show that the former option produces better clustering results. The second paper extends the methodology developed in the first paper to large datasets. Such datasets impose a computational burden, since sampling with the determinantal point process is based on the spectral decomposition of the large kernel matrix. We introduce two methods to deal with this issue. These methods also produce better clustering results than consensus clustering based on a uniform sampling of initial points. The third paper addresses the problem of variable selection for the linear model and logistic regression when the number of predictors is large. A Bayesian approach is adopted, using Markov Chain Monte Carlo methods with the Metropolis-Hastings algorithm. We show that setting the determinantal point process as the prior distribution for the model space selects a better final model than the model selected by a uniform prior on the model space.
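The repulsion property can be checked numerically on a tiny example. The sketch below assumes the standard L-ensemble formulation, in which a subset S receives a weight proportional to the determinant of the kernel matrix restricted to S, and it normalizes over subsets of one fixed size for simplicity. The Gaussian kernel, the three one-dimensional points, and all function names are assumptions chosen for illustration, not details taken from the thesis.

```python
import math
from itertools import combinations


def gaussian_kernel(points, bandwidth=1.0):
    """Similarity matrix: entries near 1 for close points, near 0 for distant ones."""
    return [[math.exp(-((p - q) ** 2) / (2 * bandwidth ** 2)) for q in points]
            for p in points]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def subset_probabilities(kernel, size):
    """Probability of each subset of a fixed size, proportional to det(L_S)."""
    n = len(kernel)
    weights = {s: det([[kernel[i][j] for j in s] for i in s])
               for s in combinations(range(n), size)}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Hypothetical data: points 0 and 1 are nearly identical, point 2 is far away.
points = [0.0, 0.1, 3.0]
probs = subset_probabilities(gaussian_kernel(points), 2)
for subset, p in sorted(probs.items()):
    print(subset, round(p, 4))
```

The pair of nearly identical points receives a far smaller probability than either pair containing the distant point, whereas uniform sampling would give all three pairs the same weight. Enumerating subsets is only feasible for toy sizes; practical samplers rely on the spectral decomposition mentioned in the abstract.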