Dissertations on the topic "Données de taille variable"
Cite a source in APA, MLA, Chicago, Harvard and other citation styles
Consult the top 50 dissertations for research on the topic "Données de taille variable".
Next to every entry in the bibliography, an "Add to bibliography" option is available. Use it and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scientific publication as a PDF and read its online annotation, provided the relevant parameters are available in the metadata.
Browse dissertations from a wide range of disciplines and compile your bibliography correctly.
Bijou, Mohammed. „Qualité de l'éducation, taille des classes et mixité sociale : Un réexamen à partir des méthodes à variables instrumentales et semi-paramétriques sur données multiniveaux - Cas du Maroc -“. Electronic Thesis or Diss., Toulon, 2021. http://www.theses.fr/2021TOUL2004.
The objective of this thesis is to examine the quality of the Moroccan education system using data from the TIMSS and PIRLS 2011 programs. The thesis is structured around three chapters. The first chapter examines the influence of individual student and school characteristics on school performance, as well as the important role of the school environment (effect of class size and social composition). In the second chapter, we seek to estimate the optimal class size that ensures widespread success for all students at two levels, namely the fourth year of primary school and the second year of lower secondary school (collège). The third chapter studies the relationship between the social and economic composition of the school and academic performance, while demonstrating the role of social mix in student success. In order to study this relationship, we mobilize different econometric approaches, applying a multilevel model with correction for the endogeneity problem (chapter 1), a hierarchical semi-parametric model (chapter 2) and a contextual hierarchical semi-parametric model (chapter 3). The results show that academic performance is determined by several factors that are both intrinsic to the student and contextual. Indeed, a smaller class size and a school with a mixed social composition are the two essential elements of a favourable environment and assured learning for all students. According to our results, governments should give priority to reducing class size by limiting it to a maximum of 27 students. In addition, it is necessary to consider making the school map more flexible in order to promote social mixing at school. The results obtained allow a better understanding of the Moroccan school system in its qualitative aspect and justify relevant educational policies to improve the quality of the Moroccan education system.
Sanchez, Théophile. „Reconstructing our past ˸ deep learning for population genetics“. Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG032.
The constant improvement of DNA sequencing technologies, which produce large quantities of genetic data, should greatly enhance our knowledge of evolution, particularly of demographic history. However, the best way to extract information from this large-scale data is still an open problem. Neural networks are a strong candidate to attain this goal, considering their recent success in machine learning. These methods have the advantages of handling high-dimensional data, adapting to most applications and scaling efficiently to the available computing resources. However, their performance depends on their architecture, which should match the data properties in order to extract the maximum amount of information. In this context, this thesis presents new approaches based on deep learning, as well as principles for designing architectures adapted to the characteristics of genomic data. The use of convolution layers and attention mechanisms allows the presented networks to be invariant to permutations of the sampled haplotypes and to adapt to data of different dimensions (number of haplotypes and of polymorphic sites). Experiments conducted on simulated data demonstrate the efficiency of these approaches by comparing them to more classical network architectures, as well as to state-of-the-art methods. Moreover, coupling neural networks with methods already proven in population genetics, such as approximate Bayesian computation, improves the results and combines their advantages. The practicality of neural networks for demographic inference is tested on whole-genome sequence data from real populations of Bos taurus and Homo sapiens. Finally, the scenarios obtained are compared with current knowledge of the demographic history of these populations.
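The architectural idea highlighted in this abstract, invariance to the order of the sampled haplotypes, can be illustrated with a minimal sketch (not taken from the thesis; the array shapes, the tanh embedding and the mean pooling are illustrative assumptions): each haplotype row is embedded independently and a symmetric pooling over the haplotype axis makes the summary identical under any permutation of rows.

```python
import numpy as np

def permutation_invariant_summary(genotypes, w, pool="mean"):
    """Toy exchangeable encoder for a haplotype matrix.

    genotypes : (n_haplotypes, n_sites) 0/1 matrix (allele presence/absence).
    w         : (n_sites, n_features) shared embedding applied to every haplotype.
    Returns a (n_features,) summary that is invariant to row permutations.
    """
    per_haplotype = np.tanh(genotypes @ w)      # embed each haplotype independently
    if pool == "mean":                          # symmetric pooling gives permutation invariance
        return per_haplotype.mean(axis=0)
    return per_haplotype.max(axis=0)

rng = np.random.default_rng(0)
haplos = rng.integers(0, 2, size=(20, 100))     # 20 haplotypes, 100 polymorphic sites
w = rng.normal(size=(100, 8))

s1 = permutation_invariant_summary(haplos, w)
s2 = permutation_invariant_summary(haplos[rng.permutation(20)], w)
assert np.allclose(s1, s2)                      # shuffling haplotypes leaves the summary unchanged
```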
Caron, Eddy. „Calcul numérique sur données de grande taille“. Amiens, 2000. https://tel.archives-ouvertes.fr/tel-01444591.
Manouvrier, Maude. „Objets similaires de grande taille dans les bases de données“. Paris 9, 2000. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2000PA090033.
Goddard, Jean-Philippe. „Synthèse de molécules de taille variable par polyhomologation de composés borés“. Paris 11, 2002. http://www.theses.fr/2002PA112064.
During this work, we developed an original synthesis method for producing mixtures of molecules of variable size, with the aim of discovering new cesium-chelating molecules. The method relies on the polyhomologation of boron compounds with nucleophiles bearing a leaving group alpha to the negative charge. We tested several families of nucleophiles, including sulfone anions, sulfonium ylides, hydrazone anions, trimethylsilyldiazomethane and arsonium ylides. The first three families did not allow polyhomologation reactions to be carried out. Trimethylsilyldiazomethane does not have the capacity to undergo successive insertion reactions either, but this property was exploited to propose a chemical conversion of olefinic hydrocarbons into the corresponding alkylmethanols. The arsonium ylides made it possible to carry out polyhomologation reactions with boronates and boranes. Alkylarsonium ylides were used to form polymers of controlled size bearing a branch on each carbon atom of the main chain; this type of polymer is not accessible by current polymerization methods. Allylarsonium ylides show a particular reactivity, since the allyl boranes formed during the insertion reactions undergo a [1,3] sigmatropic rearrangement before reacting again with an ylide. It is thus possible to obtain large polymers whose structure is close to that of natural rubber. By this method it is possible to obtain linear or cyclic polymers. This method is currently being developed in the laboratory to build cesium-chelating structures.
Uribe, Lobello Ricardo. „Génération de maillages adaptatifs à partir de données volumiques de grande taille“. Thesis, Lyon 2, 2013. http://www.theses.fr/2013LYO22024.
In this document, we are interested in surface extraction from the volumetric representation of an object. With this objective in mind, we have studied spatial-subdivision surface extraction algorithms. These approaches divide the volume in order to build a piecewise approximation of the surface. The general idea is to combine local and simple approximations to extract a complete representation of the object's surface. Methods based on the Marching Cubes (MC) algorithm have problems producing good-quality, adaptive surfaces. Even if many improvements to MC have been proposed, these approaches solve one or two problems but do not offer a complete solution to all the MC drawbacks. Dual methods are better suited to adaptive sampling over volumes. These methods generate surfaces that are dual to those generated by the Marching Cubes algorithm, or use dual grids in order to apply MC methods. These solutions build adaptive meshes that represent the features of the object well. In addition, recent improvements guarantee that the produced meshes have good geometrical and topological properties. In this dissertation, we have studied the main topological and geometrical properties of volumetric objects. In a first stage, we explored the state of the art on spatial-subdivision surface extraction methods in order to identify their advantages, their drawbacks and the implications of their application to volumetric objects. We concluded that a dual approach is the best option to obtain a good compromise between mesh quality and geometrical approximation. In a second stage, we developed a general pipeline for surface extraction based on a combination of dual methods and connected-component extraction to better capture the topology and geometry of the original object. In a third stage, we presented an out-of-core extension of our surface extraction pipeline in order to extract adaptive meshes from huge volumes. Volumes are divided into smaller sub-volumes that are processed independently to produce surface patches that are later combined into a unique and topologically correct surface. This approach can be implemented in parallel to speed up its performance. Tests performed on a vast set of volumes have confirmed our results and the features of our solution.
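As a rough illustration of the out-of-core idea described above (splitting a volume into sub-volumes, extracting a surface patch per block, then combining the patches), here is a minimal sketch that uses scikit-image's marching cubes as the per-block extractor; the thesis relies on its own dual method, and the block size, overlap and naive concatenation below are simplifications.

```python
import numpy as np
from skimage import measure

def extract_surface_by_blocks(volume, level=0.5, block=64, overlap=1):
    """Extract an isosurface block by block, so only one sub-volume is processed at a time."""
    all_verts, all_faces, offset = [], [], 0
    nz, ny, nx = volume.shape
    for z0 in range(0, nz, block):
        for y0 in range(0, ny, block):
            for x0 in range(0, nx, block):
                sub = volume[z0:z0 + block + overlap,
                             y0:y0 + block + overlap,
                             x0:x0 + block + overlap]
                if min(sub.shape) < 2 or sub.min() >= level or sub.max() <= level:
                    continue  # block too small or no surface crossing: nothing to extract
                verts, faces, _, _ = measure.marching_cubes(sub, level=level)
                verts += np.array([z0, y0, x0], dtype=float)  # move the patch back to global coordinates
                all_verts.append(verts)
                all_faces.append(faces + offset)
                offset += len(verts)
    return np.vstack(all_verts), np.vstack(all_faces)

# Toy example: a binary sphere in a 128^3 volume
z, y, x = np.mgrid[:128, :128, :128]
vol = ((x - 64.0) ** 2 + (y - 64.0) ** 2 + (z - 64.0) ** 2 < 40.0 ** 2).astype(float)
verts, faces = extract_surface_by_blocks(vol, level=0.5, block=64)
print(len(verts), "vertices,", len(faces), "triangles")
```

Welding the duplicated vertices along block borders, and guaranteeing a topologically correct junction as the thesis does, is deliberately left out of this sketch.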
Lê, Thanh Vu. „Visualisation interactive 3D pour un ensemble de données géographiques de très grande taille“. Pau, 2011. http://www.theses.fr/2011PAUU3005.
Real-time terrain rendering remains an active area of research for many modern computer-based applications such as geographic information systems (GIS), interactive 3D games, flight simulators or virtual reality. Technological breakthroughs in data acquisition, coupled with recent advances in display technology, have simultaneously led to substantial increases in the resolution of both the Digital Elevation Models (DEM) and the various displays used to present this information. In this PhD thesis, we present a new out-of-core terrain visualization algorithm that achieves per-pixel accurate shading of large textured elevation maps in real time. Our first contribution is the LOD scheme, which is based on a small precomputed quadtree of geometric errors, whose nodes are selected for asynchronous loading and rendering depending on a screen-space projection of those errors. The terrain data and its color texture are manipulated by the CPU in a unified manner as a collection of raster image patches, whose dimensions depend on their screen-space occupancy. Our second contribution is a novel method to remove the artifacts that appear on the border between quadtree blocks: we generate a continuous surface without needing an additional mesh. Our last contribution is an effective geomorphing method adapted to our data structure, which can be implemented entirely on the GPU. The presented framework exhibits several interesting features over other existing techniques: there is no mesh manipulation or mesh data structure required; terrain geometric complexity only depends on the projected elevation error (views from above result in very coarse meshes); lower geometric complexity degrades terrain silhouettes but not details brought in through normal-map shading; real-time rendering is supported with progressive data loading; and geometric information and color textures are similarly and efficiently handled as raster data by the CPU. Due to its simplified data structures, the system is compact, CPU- and GPU-efficient and simple to implement.
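The LOD selection rule sketched in this abstract (refine a quadtree node only while its precomputed geometric error, projected to screen space, exceeds a pixel tolerance) can be written compactly. The projection formula below is a standard screen-space error approximation, not necessarily the exact one used in the thesis, and the node layout is a toy example.

```python
import math

def projected_error(geom_error, distance, screen_height_px, fov_y):
    """Approximate on-screen size (in pixels) of a world-space geometric error at a given distance."""
    return geom_error * screen_height_px / (2.0 * distance * math.tan(fov_y / 2.0))

def select_patches(node, camera_pos, tolerance_px, screen_height_px, fov_y, out):
    """Depth-first LOD selection: keep a node whose projected error fits in the pixel budget, else recurse."""
    distance = max(1e-6, math.dist(node["center"], camera_pos))
    err_px = projected_error(node["error"], distance, screen_height_px, fov_y)
    if not node.get("children") or err_px <= tolerance_px:
        out.append(node)  # patch is fine for this viewpoint: schedule it for asynchronous loading and rendering
        return
    for child in node["children"]:
        select_patches(child, camera_pos, tolerance_px, screen_height_px, fov_y, out)

# Toy two-level quadtree: a coarse root with four finer children
root = {"center": (0.0, 0.0, 0.0), "error": 50.0, "children": [
    {"center": (-250.0, 0.0, -250.0), "error": 5.0, "children": []},
    {"center": (250.0, 0.0, -250.0), "error": 5.0, "children": []},
    {"center": (-250.0, 0.0, 250.0), "error": 5.0, "children": []},
    {"center": (250.0, 0.0, 250.0), "error": 5.0, "children": []},
]}
selected = []
select_patches(root, camera_pos=(0.0, 500.0, 0.0), tolerance_px=2.0,
               screen_height_px=1080, fov_y=math.radians(60), out=selected)
print(len(selected), "patches selected")
```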
Allart, Thibault. „Apprentissage statistique sur données longitudinales de grande taille et applications au design des jeux vidéo“. Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1136/document.
This thesis focuses on longitudinal time-to-event data that may be large along the following three axes: number of individuals, observation frequency and number of covariates. We introduce a penalised estimator based on the Cox complete likelihood with data-driven weights, and proximal optimization algorithms to efficiently fit the model coefficients. We have implemented those methods in C++ and in the R package coxtv to allow anyone to analyse data sets bigger than RAM, using data streaming and online learning algorithms such as proximal stochastic gradient descent with adaptive learning rates. We illustrate the performance on simulations and benchmark against existing models. Finally, we investigate the issue of video game design. We show that using our model on the large datasets available in the video game industry allows us to bring to light ways of improving the design of the studied games. First we look at low-level covariates, such as equipment choices through time, and show that the model allows us to quantify the effect of each game element, giving designers ways to improve the game design. Finally, we show that the model can be used to extract more general design recommendations, such as the influence of difficulty on player motivation.
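The penalized estimation machinery mentioned here rests on the soft-thresholding proximal operator. The sketch below shows a generic proximal gradient (ISTA) loop on a squared-error surrogate; it is not the coxtv implementation and does not use the Cox likelihood, it only illustrates how an l1 penalty is handled by proximal steps.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(X, y, lam, step=None, n_iter=500):
    """ISTA on 0.5 * ||y - X b||^2 + lam * ||b||_1, a stand-in for the penalised likelihood."""
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part's gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(0)
X, beta = rng.normal(size=(200, 50)), np.zeros(50)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.normal(size=200)
b_hat = proximal_gradient_lasso(X, y, lam=10.0, n_iter=2000)
print(np.round(b_hat[:5], 2))  # the first three coefficients should dominate, the rest shrink toward zero
```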
Allart, Thibault. „Apprentissage statistique sur données longitudinales de grande taille et applications au design des jeux vidéo“. Electronic Thesis or Diss., Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1136.
This thesis focuses on longitudinal time-to-event data that may be large along the following three axes: number of individuals, observation frequency and number of covariates. We introduce a penalised estimator based on the Cox complete likelihood with data-driven weights, and proximal optimization algorithms to efficiently fit the model coefficients. We have implemented those methods in C++ and in the R package coxtv to allow anyone to analyse data sets bigger than RAM, using data streaming and online learning algorithms such as proximal stochastic gradient descent with adaptive learning rates. We illustrate the performance on simulations and benchmark against existing models. Finally, we investigate the issue of video game design. We show that using our model on the large datasets available in the video game industry allows us to bring to light ways of improving the design of the studied games. First we look at low-level covariates, such as equipment choices through time, and show that the model allows us to quantify the effect of each game element, giving designers ways to improve the game design. Finally, we show that the model can be used to extract more general design recommendations, such as the influence of difficulty on player motivation.
Pham, Thi Thuy Ngoc. „Estimation de mouvement avec bloc de taille variable et application dans un réducteur de bruit“. Mémoire, Université de Sherbrooke, 2005. http://savoirs.usherbrooke.ca/handle/11143/1320.
Pham, Thi Thuy Ngoc. „Estimation de mouvement avec bloc de taille variable et application dans un réducteur de bruit“. [S.l. : s.n.], 2005.
Shi, Li. „Structures de complexes électrostatiques entre un polyélectrolytes de rigidité variable et des nanoparticules de taille contrôlée“. Paris 7, 2013. http://www.theses.fr/2013PA077079.
Electrostatic complexation processes involving polyelectrolytes and nanoparticles of opposite charge are receiving increasing interest in view of their implications in numerous domains. In this thesis, we are particularly interested in the role of the ratio Lp/R in the formation of complexes. To vary this parameter, we chose five model systems by combining four polyelectrolytes of different rigidity and three oppositely charged nanoparticles of different sizes, including positively charged AuNPs synthesized by ourselves. For each system, we first studied the macroscopic behaviour of the complexes formed at different concentration ratios of PEL and NPs, which was recorded in phase diagrams. Then, the structures of the complexes so formed were studied by a combination of cryo-TEM and small-angle neutron, X-ray and light scattering (size, fractal dimension Df). In particular, we revealed for Lp/R ~ 1 the formation of well-defined single-strand nanorods and of randomly branched complexes (Df between 1.5 and 3), respectively, in the two monophasic domains (excess of nanoparticles or of PEL chains). Besides the ratio Lp/R, the salt effect was also studied by comparing a salt-free system with one in the presence of additional salt, and we showed that the addition of salt can screen the repulsive charges of the complexes, which results in rapid phase separation and a more compact complex structure. Moreover, we unexpectedly observed the formation of metacrystals of AuNP nanoparticles and hyaluronan chains.
Benali, Khairidine. „Commande d'un système robotisé de type torse humanoïde pour le transport de colis de taille variable“. Thesis, Normandie, 2019. http://www.theses.fr/2019NORMLH22.
In logistics warehouses, automation, in the sense of robotization, is frequently employed to cut down production times by efficiently managing the processes of picking heavy loads, placing, packing and palletizing, while reducing risks and errors and improving the working conditions of human operators along the way. Human flexibility is fundamental for order preparation owing to adaptive skills for task variation, but at the same time increased productivity comes with fatigue (musculoskeletal disorders). In this context, the research presented in this thesis is a contribution to the robotization of palletization operations requiring exceptional versatility of manipulation and gripping. We have proposed an innovative solution using a humanoid torso equipped with two manipulator arms with adaptive grippers to grasp and hold objects of variable size and mass. The main contribution of this research is the development of a hybrid force/position-position control law with commutation and estimation of slip on the object surface, while taking into account compliance and correction of the clamping force during handling. The execution of the control involves the collaboration of the two arms for coordinated manipulation and adaptation to the material and human environment (cobotics).
Padellini, Marc. „Optimisation d'un schéma de codage de la parole à très bas débit, par indexation d'unités de taille variable“. Marne-la-Vallée, 2006. http://www.theses.fr/2006MARN0293.
This thesis studies a speech coding scheme operating at a very low bit rate, around 500 bits/s, relying on speech recognition and speech synthesis techniques. It follows the work carried out in the RNRT project SYMPATEX and in Cernocky's thesis [1]. On one hand, elementary speech units are recognized by the coder using Hidden Markov Models. On the other hand, concatenative speech synthesis is used in the decoder. This system takes advantage of a large speech corpus stored in the system and organized in a synthesis database. The encoder looks up in the corpus the units that best fit the speech to be encoded, then unit indexes and prosodic parameters are transmitted. The decoder retrieves from the database the units to be concatenated. This thesis deals with the overall speech quality of the encoding scheme. A dynamic unit selection is proposed for this purpose. Furthermore, the scheme has been extended to operate under realistic conditions. Noisy environments have been studied, and a noise adaptation module was created. Extension to speaker-independent mode is achieved by training the system on a large number of speakers and by using a hierarchical classification of speakers to create a set of synthesis databases close to the test speaker. Finally, the complexity of the whole scheme is analyzed, and a method to compress the database is proposed.
Veganzones, David. „Corporate failure prediction models : contributions from a novel explanatory variable and imbalanced datasets approach“. Thesis, Lille, 2018. http://www.theses.fr/2018LIL1A004.
This dissertation explores novel approaches to developing corporate failure prediction models. The thesis covers three areas of intervention. The first is a novel explanatory variable based on earnings management. For this purpose, we use two measures (accruals and real activities) that assess potential earnings manipulation. We show that models which include this novel variable in combination with financial information are more accurate than those relying only on financial data. The second analyzes the capacity of corporate failure models on imbalanced datasets. We relate the different degrees of imbalance, the loss in performance and the capacity to recover performance, which had never been studied in corporate failure prediction. The third unifies the previous areas by evaluating the capacity of our proposed earnings management model on imbalanced datasets. The research covered in this thesis provides unique and relevant contributions to the corporate finance literature, especially to the corporate failure domain.
Chen, Fengwei. „Contributions à l'identification de modèles à temps continu à partir de données échantillonnées à pas variable“. Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0149/document.
The output of a system is always corrupted by additive noise; therefore it is more practical to develop estimation algorithms that are capable of handling noisy data. The effect of white additive noise has been widely studied, while colored additive noise has attracted less attention, especially in continuous time (CT). Sampling issues of CT stochastic processes are reviewed in this thesis and several sampling schemes are presented. Estimation of a CT stochastic process is studied. An expectation-maximization-based (EM) method for CT autoregressive / autoregressive moving average models is developed, which gives accurate estimates over a large range of sampling intervals. Estimation of CT Box-Jenkins models is also considered, in which the noise part is modeled to improve the performance of plant model estimation. The proposed method for CT Box-Jenkins model identification follows a two-step, iterative framework. Two-step means that the plant and noise models are estimated in a separate and alternating way, where in estimating each of them the other is assumed to be fixed. More specifically, the plant is estimated by the refined instrumental variable (RIV) method while the noise is estimated by the EM algorithm. Iterative means that the proposed method repeats the estimation procedure several times until an optimal estimate is found. Many practical systems have an inherent time delay. The problem of identifying delayed systems is of great importance for analysis, prediction or control design. The presence of an unknown time delay greatly complicates the parameter estimation problem, essentially because the model is not linear with respect to the time delay. An approach to continuous-time model identification of time-delay systems, combining a numerical search algorithm for the delay with the RIV method for the dynamics, has been developed in this thesis. In the proposed algorithm, the system parameters and time delay are estimated reciprocally in a bootstrap manner. The time delay is estimated by an adaptive gradient-based method, whereas the system parameters are estimated by the RIV method. Since a numerical method is used in this algorithm, the bootstrap method is likely to converge to local optima; therefore a low-pass filter has been used to enlarge the convergence region for the time delay. The performance of the proposed algorithms is evaluated on numerical examples.
Traoré, Abraham. „Contribution à la décomposition de données multimodales avec des applications en apprentisage de dictionnaires et la décomposition de tenseurs de grande taille“. Thesis, Normandie, 2019. http://www.theses.fr/2019NORMR068/document.
In this work, we are interested in special mathematical tools called tensors, which are multidimensional arrays defined on the tensor product of several vector spaces, each of which has its own coordinate system; the number of spaces involved in this product is generally referred to as the order. The interest in these tools stems from empirical work (for a range of applications encompassing both classification and regression) that proves the superiority of tensor processing with respect to matrix decomposition techniques. Within this thesis framework, we focused on a specific tensor model named Tucker and established new approaches for miscellaneous tasks such as dictionary learning, online dictionary learning, large-scale processing, as well as the decomposition of a tensor evolving with respect to each of its modes. New theoretical results are established, and the efficiency of the different algorithms, which are based either on alternating minimization or on coordinate gradient descent, is demonstrated on real-world problems.
Pastorelli, Mario. „Disciplines basées sur la taille pour la planification des jobs dans data-intensif scalable computing systems“. Electronic Thesis or Diss., Paris, ENST, 2014. http://www.theses.fr/2014ENST0048.
The past decade has seen the rise of data-intensive scalable computing (DISC) systems, such as Hadoop, and the consequent demand for scheduling policies to manage their resources, so that they can provide quick response times as well as fairness. Schedulers for DISC systems are usually focused on fairness, without optimizing response times. The best practices to overcome this problem include manual and ad-hoc control of the scheduling policy, which is error-prone and difficult to adapt to changes. In this thesis we focus on size-based scheduling for DISC systems. The main contribution of this work is the Hadoop Fair Sojourn Protocol (HFSP) scheduler, a size-based preemptive scheduler with aging; it provides fairness and achieves reduced response times thanks to its size-based nature. In DISC systems, job sizes are not known a priori: therefore, HFSP includes a job size estimation module, which computes approximate job sizes and refines these estimations as jobs progress. We show that the impact of estimation errors on size-based policies is not significant, under conditions that are verified in a system such as Hadoop. Because of this, and by virtue of being designed around the idea of working with estimated sizes, HFSP is largely tolerant to job size estimation errors. Our experimental results show that, in a real Hadoop deployment and with realistic workloads, HFSP performs better than the built-in scheduling policies, achieving both fairness and small mean response times. Moreover, HFSP maintains its good performance even when the cluster is heavily loaded, by focusing the resources on a few selected jobs with the smallest sizes. HFSP is a preemptive policy: preemption in a DISC system can be implemented with different techniques. Approaches currently available in Hadoop have shortcomings that impact system performance. Therefore, we have implemented a new preemption technique, called suspension, that exploits operating system primitives to implement preemption in a way that guarantees low latency without penalizing low-priority jobs.
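The core of a size-based discipline like the one described above, namely giving the resources to the job with the smallest estimated remaining size and refining that estimate as the job progresses, can be sketched in a few lines. The aging and virtual-time mechanisms of HFSP, as well as all Hadoop specifics, are omitted; class and job names are illustrative.

```python
import heapq

class SizeBasedScheduler:
    """Toy preemptive shortest-remaining-(estimated)-size-first scheduler."""

    def __init__(self):
        self._queue = []  # heap of (estimated_remaining_size, job_id)

    def submit(self, job_id, estimated_size):
        heapq.heappush(self._queue, (estimated_size, job_id))

    def update_estimate(self, job_id, new_remaining):
        """Refine a job's size estimate as its tasks complete (re-inserts the job)."""
        self._queue = [(s, j) for s, j in self._queue if j != job_id]
        heapq.heapify(self._queue)
        heapq.heappush(self._queue, (new_remaining, job_id))

    def next_job(self):
        """Job that should hold the cluster now; a newly submitted smaller job preempts it."""
        return self._queue[0][1] if self._queue else None

sched = SizeBasedScheduler()
sched.submit("etl-nightly", estimated_size=500)
sched.submit("ad-hoc-query", estimated_size=20)
assert sched.next_job() == "ad-hoc-query"  # small jobs get the resources first, keeping response times low
```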
Pannetier, Benjamin. „Fusion de données pour la surveillance du champ de bataille“. Phd thesis, Université Joseph Fourier (Grenoble), 2006. http://tel.archives-ouvertes.fr/tel-00377247.
Ette, Théodore-Emien. „Modèle général de classes et modèle de partitions pour le découpage d'une variable unique : Deuxième partie : application des méthodes de l'analyse des données aux statistiques du marché mondial du cacao et du café“. Paris 6, 1992. http://www.theses.fr/1992PA066657.
Lange, Benoît. „Visualisation interactive de données hétérogènes pour l'amélioration des dépenses énergétiques du bâtiment“. Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20172/document.
Energy efficiency has become a major issue. Buildings in every country have been identified as a major source of energy waste: they are insufficiently insulated, and energy losses through the building structure represent a major part of energy expenditure. The RIDER project (Research for IT Driven EneRgy efficiency) emerged from this viewpoint. Its goal is to develop a new kind of IT system to optimize the energy consumption of buildings. This system is based on a component paradigm and is composed of a pivot model, a data warehouse with a data mining approach, and a visualization tool; the last two components are developed to improve the content of the pivot model. In this manuscript, our focus is on the visualization part of the project. The manuscript is composed of two parts: state of the art and contributions. Basic notions, a visualization chapter and a visual analytics chapter compose the state of the art. In the contributions part, we present the data model used in this project and the proposed visualizations, and we conclude with two experiments on real data.
Tandeo, Pierre. „MODÉLISATION SPATIO-TEMPORELLE D'UNE VARIABLE QUANTITATIVE À PARTIR DE DONNÉES MULTI-SOURCES APPLICATION À LA TEMPÉRATURE DE SURFACE DES OCÉANS“. Phd thesis, Agrocampus - Ecole nationale supérieure d'agronomie de rennes, 2010. http://tel.archives-ouvertes.fr/tel-00582679.
Tandeo, Pierre. „Modélisation spatio-temporelle d’une variable quantitative à partir de données multi-sources : Application à la température de surface des océans“. Rennes, Agrocampus Ouest, 2010. https://tel.archives-ouvertes.fr/tel-00582679.
In this thesis, an important oceanographic variable for monitoring the climate is studied: the sea surface temperature. At the global level, this variable is observed over the ocean by several remote sensing sources. In order to treat all this information, statistical methods are used to summarize our variable of interest in a global daily map. For that purpose, a linear state-space model with Gaussian errors is proposed. We first introduce this model on data with irregular sampling. Then, we work on the estimation of the parameters, based on the combination of the method of moments and maximum likelihood estimates, with the study of the EM algorithm and the Kalman recursions. Finally, this methodology is applied to estimate the error variances and the temporal correlation parameter over the Atlantic Ocean. We then add the spatial component and propose a separable second-order structure, based on the product of a temporal covariance and an anisotropic spatial covariance. Following usual geostatistical methods, the parameters of this covariance are estimated over the Atlantic Ocean and form a relevant atlas for oceanographers. Finally, we show that the contribution of the spatial information improves the predictive performance of the model.
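The filtering step behind the estimation procedure described above reduces, for a scalar state, to a few lines. The sketch below shows only the Kalman filtering pass on an AR(1)-plus-noise state-space model with arbitrarily chosen parameter values; the EM loop that re-estimates those parameters, and the spatial component, are not shown.

```python
import numpy as np

def kalman_filter_ar1(y, phi, q, r, x0=0.0, p0=1.0):
    """Kalman filter for x_t = phi * x_{t-1} + w_t, y_t = x_t + v_t, with w ~ N(0, q) and v ~ N(0, r).

    Missing observations (np.nan) are simply skipped in the update step,
    which is one way irregular sampling can be accommodated.
    """
    x, p = x0, p0
    states = []
    for obs in y:
        x, p = phi * x, phi * p * phi + q          # predict
        if not np.isnan(obs):                      # update only where a measurement exists
            k = p / (p + r)                        # Kalman gain
            x, p = x + k * (obs - x), (1 - k) * p
        states.append(x)
    return np.array(states)

y = np.array([20.1, np.nan, 20.4, 20.3, np.nan, 19.8])  # daily SST-like series with gaps
print(kalman_filter_ar1(y, phi=0.95, q=0.01, r=0.05))
```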
El, Assaad Hani. „Modélisation et classification dynamique de données temporelles non stationnaires“. Thesis, Paris Est, 2014. http://www.theses.fr/2014PEST1162/document.
Nowadays, diagnosis and monitoring for predictive maintenance of railway components are key subjects for both operators and manufacturers. They seek to anticipate upcoming maintenance actions, reduce maintenance costs and increase the availability of the rail network. In order to maintain the components at a satisfactory level of operation, the implementation of a reliable diagnostic strategy is required. In this thesis, we are interested in a main component of the railway infrastructure, the railway switch: an important safety device whose failure could heavily impact the availability of the transportation system. The diagnosis of this system is therefore essential and can be done by exploiting sequential measurements acquired successively while the state of the system evolves over time. These measurements consist of power consumption curves acquired during several switch operations. The shape of these curves is indicative of the operating state of the system. The aim is to track the temporal evolution of the railway component's state under different operating contexts by analyzing the specific data, in order to detect and diagnose problems that may lead to operational failure. This thesis tackles the problem of temporal data clustering within the broader context of developing innovative tools and decision-aid methods. We propose a new dynamic probabilistic approach within a temporal data clustering framework, based on both Gaussian mixture models and state-space models. The main challenge facing this work is the estimation of the model parameters associated with this approach because of its complex structure. In order to meet this challenge, a variational approach has been developed. The results obtained on both synthetic and real data highlight the advantage of the proposed algorithms compared to other state-of-the-art methods in terms of clustering and estimation accuracy.
Pastorelli, Mario. „Disciplines basées sur la taille pour la planification des jobs dans data-intensif scalable computing systems“. Thesis, Paris, ENST, 2014. http://www.theses.fr/2014ENST0048/document.
The past decade has seen the rise of data-intensive scalable computing (DISC) systems, such as Hadoop, and the consequent demand for scheduling policies to manage their resources, so that they can provide quick response times as well as fairness. Schedulers for DISC systems are usually focused on fairness, without optimizing response times. The best practices to overcome this problem include manual and ad-hoc control of the scheduling policy, which is error-prone and difficult to adapt to changes. In this thesis we focus on size-based scheduling for DISC systems. The main contribution of this work is the Hadoop Fair Sojourn Protocol (HFSP) scheduler, a size-based preemptive scheduler with aging; it provides fairness and achieves reduced response times thanks to its size-based nature. In DISC systems, job sizes are not known a priori: therefore, HFSP includes a job size estimation module, which computes approximate job sizes and refines these estimations as jobs progress. We show that the impact of estimation errors on size-based policies is not significant, under conditions that are verified in a system such as Hadoop. Because of this, and by virtue of being designed around the idea of working with estimated sizes, HFSP is largely tolerant to job size estimation errors. Our experimental results show that, in a real Hadoop deployment and with realistic workloads, HFSP performs better than the built-in scheduling policies, achieving both fairness and small mean response times. Moreover, HFSP maintains its good performance even when the cluster is heavily loaded, by focusing the resources on a few selected jobs with the smallest sizes. HFSP is a preemptive policy: preemption in a DISC system can be implemented with different techniques. Approaches currently available in Hadoop have shortcomings that impact system performance. Therefore, we have implemented a new preemption technique, called suspension, that exploits operating system primitives to implement preemption in a way that guarantees low latency without penalizing low-priority jobs.
Schramm, Catherine. „Intégration des facteurs prédictifs de l'effet d'un traitement dans la conception et l'analyse des essais cliniques de petite taille : application à la maladie de Huntington“. Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066610/document.
Huntington's disease is a neurodegenerative, genetic, rare and multifaceted disease with a long evolution, inducing heterogeneity in patients' conditions and in the progression of the disease. Current biotherapy trials are performed on small samples of patients, with a treatment effect that is measurable in the long term and heterogeneous. Identifying markers of disease progression and of treatment response may help to better understand and improve the results of biotherapy studies in Huntington's disease. We have developed a clustering method for treatment efficacy in the case of longitudinal data in order to identify treatment responders and non-responders. Our method combines a linear mixed model with two slopes and a classical clustering algorithm. The mixed model generates random effects associated with treatment response, specific to each patient. The clustering algorithm is used to define subgroups according to the value of the random effects. Our method is robust with small samples. Finding subgroups of responders may help to define predictive markers of treatment response, which would be used to give the most appropriate treatment to each patient. We discuss the integration of (i) predictive markers in the design of future clinical trials, assessing their impact on the power of the study, and (ii) prognostic markers of disease progression, by studying the COMT polymorphism as a prognostic marker of cognitive decline in Huntington's disease. Finally, we evaluate the learning effect of neuropsychological tasks measuring cognitive abilities, and show how a double baseline in a clinical trial could take it into account when the primary outcome is cognitive decline.
Linardi, Michele. „Variable-length similarity search for very large data series : subsequence matching, motif and discord detection“. Electronic Thesis or Diss., Sorbonne Paris Cité, 2019. http://www.theses.fr/2019USPCB056.
Data series (ordered sequences of real-valued points, a.k.a. time series) have become one of the most important and popular data types, present in almost all scientific fields. Over the last two decades, and even more so recently, interest in this data type has been growing at a fast pace, mainly because of recent advances in sensing, networking, data processing and storage technologies, which have significantly assisted the process of generating and collecting large amounts of data series. Data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to different data mining problems, such as clustering, subsequence matching, imputation of missing values, motif discovery and anomaly detection, work by means of similarity search. Data series indexes have been proposed for fast similarity search. Nevertheless, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this regard, all solutions for the aforementioned problems require prior knowledge of the series length on which similarity search is performed. Consequently, the user must know the length of the expected results, which is often an unrealistic assumption. This aspect is thus of paramount importance; in several cases, the length is a critical parameter that heavily influences the quality of the final outcome. In this thesis, we propose scalable solutions that enable variable-length analysis of very large data series collections. We propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length. Our contribution is two-fold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different lengths. Based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk-based index visits and in-memory sequential scans. Our approach supports non-Z-normalized and Z-normalized sequences, and can be used without changes with both Euclidean Distance and Dynamic Time Warping, for answering both κ-NN and ε-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost, when compared to competing approaches. Subsequently, we introduce a new framework, which provides an exact and scalable motif and discord discovery algorithm that efficiently finds all motifs and discords in a given range of lengths. The experimental evaluation we conducted over several diverse real datasets shows that our approaches are up to orders of magnitude faster than the alternatives. We moreover demonstrate that we can remove the unrealistic constraint of performing analytics using a predefined length, leading to more intuitive and actionable results, which would have otherwise been missed.
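To make concrete what a similarity query is, here is the index-free sequential-scan baseline that an index such as ULISSE is designed to avoid: for a query of any length, every same-length subsequence is z-normalized and compared under Euclidean distance, and the whole scan has to be repeated for each new query length. Function names and the random-walk series are illustrative.

```python
import numpy as np

def znorm(x, eps=1e-8):
    return (x - x.mean()) / (x.std() + eps)

def sequential_scan_best_match(series, query):
    """O(n * m) baseline: best z-normalized Euclidean match of a query of arbitrary length m."""
    m = len(query)
    q = znorm(query)
    best_dist, best_start = np.inf, None
    for start in range(len(series) - m + 1):
        d = np.linalg.norm(znorm(series[start:start + m]) - q)
        if d < best_dist:
            best_dist, best_start = d, start
    return best_dist, best_start

rng = np.random.default_rng(1)
data = rng.normal(size=5000).cumsum()          # a random-walk "data series"
for m in (64, 128, 256):                        # every new query length pays the full scan again
    print(m, sequential_scan_best_match(data, data[1000:1000 + m]))
```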
Peyhardi, Jean. „Une nouvelle famille de modèles linéaires généralisés (GLMs) pour l'analyse de données catégorielles ; application à la structure et au développement des plantes“. Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00936845.
Senga, Kiessé Tristan. „Approche non-paramétrique par noyaux associés discrets des données de dénombrement“. Phd thesis, Université de Pau et des Pays de l'Adour, 2008. http://tel.archives-ouvertes.fr/tel-00372180.
Der volle Inhalt der QuelleBrunet, Anne-Claire. „Développement d'outils statistiques pour l'analyse de données transcriptomiques par les réseaux de co-expression de gènes“. Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30373/document.
Today, new biotechnologies offer the opportunity to collect a large variety and volume of biological data (genomic, proteomic, metagenomic...), thus opening up new avenues for research into biological processes. In this thesis, we are specifically interested in transcriptomic data, indicative of the activity or expression level of several thousand genes in a given cell. The aim of this thesis was to propose proper statistical tools to analyse these high dimensional data (n<
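The object at the heart of this work, a gene co-expression network, is commonly built by thresholding a gene-by-gene correlation matrix; the sketch below shows that generic construction (the threshold and matrix sizes are arbitrary), not the specific pipeline developed in the thesis.

```python
import numpy as np

def coexpression_adjacency(expr, threshold=0.7):
    """Build a co-expression graph from an (n_samples, n_genes) expression matrix.

    Two genes are connected when the absolute Pearson correlation of their
    expression profiles across samples exceeds `threshold`.
    """
    corr = np.corrcoef(expr, rowvar=False)           # (n_genes, n_genes) correlation matrix
    adjacency = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)                   # no self-loops
    return adjacency

rng = np.random.default_rng(2)
expression = rng.normal(size=(30, 200))              # 30 samples, 200 genes
A = coexpression_adjacency(expression, threshold=0.7)
print("edges:", A.sum() // 2)
```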
Silio, Calzada Ana. „Estimation de la production primaire nouvelle dans les zones d'upwelling à partir de données satellitaires multi-capteurs : application au système du Benguela, et étude de sa variabilité saisonnière et interannuelle“. Paris 6, 2008. http://www.theses.fr/2008PA066367.
Breton, Jean. „Modélisation thermique et simulation numérique en régime variable de parois à lame d'air insole et/ou ventilée : intégration dans un code de calcul de charges thermiques de bâtiments“. Lyon, INSA, 1986. http://www.theses.fr/1986ISAL0014.
In our present work we develop detailed numerical software for the thermal behaviour of walls containing a vertical air slab and used as solar or internal gain collectors (Trombe wall, greenhouse-effect wall...). The first part is devoted to a bibliographic analysis of the convective behaviour of large-aspect-ratio cavities. In the second part we define the hypotheses and describe the numerical models used for each wall and their coupling with a detailed thermal simulation code for buildings. The third part presents parametric studies of the physical or technological characteristics of the walls. We use an original criterion for the energetic performance (the Solar Gain Ratio) which enables us to show the major influence of the aeraulic and thermal couplings between the wall and the dwelling cell. Finally we propose simplified models which respect this last point and allow a more general description of the physical phenomena inside the walls.
Hébert, Benoît-Paul. „Régression avec une variable dépendante ordinale, comparaison de la performance de deux modèles logistiques ordinaux et du modèle linéaire classique à l'aide de données simulées“. Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape17/PQDD_0016/NQ36277.pdf.
Banciu, Andrei. „A stochastic approach for the range evaluation“. Rennes 1, 2012. http://www.theses.fr/2012REN1E002.
Perthame, Emeline. „Stabilité de la sélection de variables pour la régression et la classification de données corrélées en grande dimension“. Thesis, Rennes 1, 2015. http://www.theses.fr/2015REN1S122/document.
The analysis of high-throughput data has renewed the statistical methodology for feature selection. Such data are characterized by both their high dimension and their heterogeneity, as the true signal and several confounding factors are often observed at the same time. In such a framework, the usual statistical approaches are questioned and can lead to misleading decisions, as they are initially designed under an assumption of independence among variables. The goal of this thesis is to contribute to the improvement of variable selection methods in regression and supervised classification, by accounting for the dependence between selection statistics. All the methods proposed in this thesis are based on a factor model of the covariates, which assumes that the variables are conditionally independent given a vector of latent variables. Part of this thesis focuses on the analysis of event-related potentials (ERP) data. ERPs are now widely collected in psychological research to determine the time course of mental events. In the significance analysis of the relationships between event-related potentials and experimental covariates, the psychological signal is often both rare, since it only occurs on short intervals, and weak, given the huge between-subject variability of ERP curves. Indeed, these data are characterized by a temporal dependence pattern that is both strong and complex. Moreover, studying the effect of the experimental condition on brain activity at each instant is a multiple testing issue. We propose to decorrelate the test statistics by jointly modeling the signal and the time dependence among test statistics, using prior knowledge of time points during which the signal is null. Second, an extension of decorrelation methods is proposed in order to handle variable selection in the framework of linear supervised classification models. The contribution of the factor model assumption in the general framework of Linear Discriminant Analysis is studied. It is shown that the optimal linear classification rule conditional on these factors is more efficient than the unconditional rule. Next, an Expectation-Maximization algorithm for the estimation of the model parameters is proposed. This data decorrelation method is compatible with a prediction purpose. Finally, the issues of detection and identification of a signal when features are dependent are addressed more analytically. We focus on the Higher Criticism (HC) procedure, defined under the assumptions of a sparse signal of low amplitude and independence among tests. It is shown in the literature that this method reaches theoretical detection boundaries. Properties of HC under dependence are studied, and the detectability and estimability boundaries are extended to arbitrarily complex situations of dependence. Finally, in the context of signal identification, an extension of Higher Criticism Thresholding based on innovations is proposed.
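Since the last part of this abstract centers on the Higher Criticism procedure, here is the classical statistic computed from a vector of p-values, in its standard independent-test form rather than the dependence-adjusted variants studied in the thesis.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Classical Higher Criticism statistic: maximal standardized gap between the empirical
    p-value distribution and the uniform, taken over the smallest alpha0 * n p-values."""
    p = np.sort(np.asarray(pvalues))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p) + 1e-12)
    keep = i <= max(1, int(alpha0 * n))
    return hc[keep].max()

rng = np.random.default_rng(3)
null_p = rng.uniform(size=1000)                                                   # pure noise
mixed_p = np.concatenate([rng.uniform(size=990), rng.uniform(0, 1e-3, size=10)])  # sparse weak signal
print(higher_criticism(null_p), higher_criticism(mixed_p))  # the second value should be markedly larger
```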
Devijver, Emilie. „Modèles de mélange pour la régression en grande dimension, application aux données fonctionnelles“. Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112130/document.
Finite mixture regression models are useful for modeling the relationship between a response and predictors arising from different subpopulations. In this thesis, we focus on high-dimensional predictors and a high-dimensional response. First of all, we provide an ℓ1-oracle inequality satisfied by the Lasso estimator. We focus on this estimator for its ℓ1-regularization properties rather than for the variable selection procedure. We also propose two procedures to deal with this issue. The first procedure estimates the unknown conditional mixture density by a maximum likelihood estimator, restricted to the relevant variables selected by an ℓ1-penalized maximum likelihood estimator. The second procedure considers joint predictor selection and rank reduction for obtaining lower-dimensional approximations of the parameter matrices. For each procedure, we obtain an oracle inequality, which determines the penalty shape of the criterion, depending on the complexity of the random model collection. We extend these procedures to the functional case, where predictors and responses are functions. For this purpose, we use a wavelet-based approach. For each situation, we provide algorithms and apply and evaluate our methods both on simulations and on real datasets. In particular, we illustrate the first procedure on an electricity load consumption dataset.
Sehi-Bi, Ballo Blizand. „Impact de la mondialisation sur la taille de l’État : analyse théorique et empirique sur un panel de pays à revenu élevé, intermédiaire et faible“. Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCG008.
The impact of globalization on the size of the state: theoretical and empirical analysis on high-income, middle-income and low-income countries. Many years after Keynes's General Theory (1936), which promotes the role of the public sector in economic stabilization, the debate between the supporters of Keynesianism and the partisans of laissez-faire remains controversial, despite the Great Recession of 2008-2009, which could have marked a definitive, or at least long-lasting, return to state intervention in the economy. The thesis aims to analyze the effects of globalization on the size of the state through the measurement of the impact of economic growth and economic openness. We also measure the impact of the budget balance on the trade balance. To do this, we use, on the one hand, a panel vector autoregressive model (VAR), which we estimate by the GMM method; on the other hand, we also implement methods applicable to dynamic heterogeneous panels (PMG, MG and DFE). Our results suggest that the link between economic growth and public spending is a function of the nature of spending and of changing inequality (in high-income countries). They also show that in high- and middle-income countries, the relationship between fiscal and current balances depends on changes in output; the current account also influences the budget balance in middle-income countries. Finally, the work reveals that trade openness can lead to some inefficiency of public action through lower tax revenues.
Sainct, Benoît. „Contributions statistiques à l'analyse de mégadonnées publiques“. Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30053.
The aim of this thesis is to provide a set of methodological tools to address two problems: the prediction of the payroll of local authorities, and the analysis of their tax data. For the first, the work revolves around two statistical themes: the selection of time series models, and the analysis of functional data. Because of the complexity of the data and heavy computation time constraints, a clustering approach has been favored. In particular, we used Functional Principal Component Analysis and a Gaussian mixture model to achieve unsupervised classification. These methods have been applied in two prototype tools that represent one of the achievements of this thesis. For the second problem, the work was done in three stages: first, innovative methods for classifying an ordinal target variable were compared on public data, notably exploiting random forests, SVM and gradient boosting. Then, these methods were adapted to outlier detection in a targeted, ordinal, unsupervised and non-parametric context, and their efficiency was compared mainly on synthetic datasets. Our ordinal random forest by class separation seems to achieve the best results. Finally, this method has been applied to real tax base data, where the concerns of size and complexity are greater. Aimed at local authority directorates, this new approach to examining their databases is the second outcome of this work.
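The "ordinal random forest by class separation" mentioned above can be related to the classical reduction of an ordinal target with K levels into K-1 cumulative binary problems. The sketch below implements that generic reduction with scikit-learn random forests; it is an illustration of the idea, not the exact estimator of the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class CumulativeOrdinalForest:
    """Ordinal classifier built from K-1 binary forests, one per threshold 'y > k'."""

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        self.models_ = []
        for k in self.classes_[:-1]:                       # one class separation per ordered threshold
            clf = RandomForestClassifier(n_estimators=200, random_state=0)
            clf.fit(X, (y > k).astype(int))
            self.models_.append(clf)
        return self

    def predict(self, X):
        # P(y > k) for each threshold, then P(y = class) by successive differences
        p_gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models_])
        cum = np.hstack([np.ones((len(X), 1)), p_gt, np.zeros((len(X), 1))])
        probs = cum[:, :-1] - cum[:, 1:]
        return self.classes_[np.argmax(probs, axis=1)]

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))
y = np.digitize(X[:, 0] + 0.5 * rng.normal(size=300), bins=[-1.0, 0.0, 1.0])  # ordinal target with 4 levels
model = CumulativeOrdinalForest().fit(X, y)
print((model.predict(X) == y).mean())
```

Note that the cumulative probabilities are not forced to be monotone across thresholds, a known caveat of this naive reduction.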
Soret, Perrine. „Régression pénalisée de type Lasso pour l’analyse de données biologiques de grande dimension : application à la charge virale du VIH censurée par une limite de quantification et aux données compositionnelles du microbiote“. Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0254.
In clinical studies, and thanks to technological progress, the amount of information collected for the same patient continues to grow, leading to situations where the number of explanatory variables is greater than the number of individuals. The Lasso method has proved appropriate to circumvent overfitting problems in high-dimensional settings. This thesis is devoted to the application and development of Lasso-penalized regression for clinical data presenting particular structures. First, in patients with the human immunodeficiency virus, mutations in the virus's genetic structure may be related to the development of drug resistance. The prediction of the viral load from (potentially numerous) mutations helps guide treatment choice. Below a threshold, the viral load is undetectable and the data are left-censored. We propose two new Lasso approaches based on the Buckley-James algorithm, which imputes censored values by a conditional expectation. By reversing the response, we obtain a right-censored problem, for which non-parametric estimates of the conditional expectation have been proposed in survival analysis. Finally, we propose a parametric estimation based on a Gaussian hypothesis. Secondly, we are interested in the role of the microbiota in the deterioration of respiratory health. The microbiota data are presented as relative abundances (the proportion of each species per individual, called compositional data) and they have a phylogenetic structure. We have established a state of the art of statistical methods for the analysis of microbiota data. Due to their novelty, few recommendations exist on the applicability and effectiveness of the proposed methods. A simulation study allowed us to compare the selection capacity of penalization methods proposed specifically for this type of data. We then apply this research to the analysis of the association between bacteria/fungi and the decline of pulmonary function in patients with cystic fibrosis from the MucoFong project.
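The parametric variant mentioned at the end of the abstract, imputing a left-censored value by its conditional expectation under a Gaussian assumption, uses the standard truncated-normal mean formula. The sketch below shows that formula only; in a Buckley-James-type procedure the parameters mu and sigma would themselves be re-estimated iteratively, and the numbers used here are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def impute_left_censored(y, detection_limit, mu, sigma):
    """Replace values at or below the quantification limit by E[Y | Y < limit] under Y ~ N(mu, sigma^2)."""
    alpha = (detection_limit - mu) / sigma
    conditional_mean = mu - sigma * norm.pdf(alpha) / norm.cdf(alpha)  # truncated-normal mean
    y = np.asarray(y, dtype=float).copy()
    y[y <= detection_limit] = conditional_mean
    return y

# Viral loads on the log10 scale, with a quantification limit of 1.7 (assumed values for illustration)
viral_load = np.array([3.2, 1.7, 4.1, 1.7, 2.5])
print(impute_left_censored(viral_load, detection_limit=1.7, mu=2.8, sigma=1.0))
```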
Belghoul, Abdeslem. „Optimizing Communication Cost in Distributed Query Processing“. Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC025/document.
In this thesis, we take a complementary look at the problem of optimizing the time for communicating query results in distributed query processing, by investigating the relationship between the communication time and the middleware configuration. Indeed, the middleware determines, among other things, how data is divided into batches and messages before being communicated over the network. Concretely, we focus on the research question: given a query Q and a network environment, what is the best middleware configuration that minimizes the time for transferring the query result over the network? To the best of our knowledge, the database research community does not have well-established strategies for middleware tuning. We first present an intensive experimental study that emphasizes the crucial impact of the middleware configuration on the time for communicating query results. We focus on two middleware parameters that we empirically identified as having an important influence on the communication time: (i) the fetch size F (i.e., the number of tuples in a batch that is communicated at once to an application consuming the data) and (ii) the message size M (i.e., the size in bytes of the middleware buffer, which corresponds to the amount of data that can be communicated at once from the middleware to the network layer; a batch of F tuples can be communicated via one or several messages of M bytes). Then, we describe a cost model for estimating the communication time, which is based on how data is communicated between computation nodes. Precisely, our cost model is based on two crucial observations: (i) batches and messages are communicated differently over the network: batches are communicated synchronously, whereas messages in a batch are communicated in pipeline (asynchronously), and (ii) due to network latency, it is more expensive to communicate the first message in a batch compared to any other message that is not the first in its batch. We propose an effective strategy for calibrating the network-dependent parameters of the communication time estimation function, i.e., the costs of the first message and of a non-first message in its batch. Finally, we develop an optimization algorithm to effectively compute the values of the middleware parameters F and M that minimize the communication time. The proposed algorithm quickly finds (in a small fraction of a second) the values of the middleware parameters F and M that achieve a good trade-off between low resource consumption and low communication time. The proposed approach has been evaluated using a dataset from an astronomy application.
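The two observations behind the cost model (batches are fetched synchronously, and within a batch only the first message pays the full latency while the others are pipelined) translate directly into a small estimation function that can be minimized over candidate (F, M) values. The constants, tuple sizes and candidate grids below are placeholders, not the calibrated values of the thesis.

```python
import math

def communication_time(n_tuples, tuple_bytes, fetch_size, message_bytes,
                       first_msg_cost, other_msg_cost):
    """Estimated time to ship a query result, following the batch/message structure described above."""
    n_batches = math.ceil(n_tuples / fetch_size)
    batch_bytes = fetch_size * tuple_bytes
    msgs_per_batch = max(1, math.ceil(batch_bytes / message_bytes))
    # one expensive first message per (synchronous) batch, the remaining messages are pipelined
    return n_batches * (first_msg_cost + (msgs_per_batch - 1) * other_msg_cost)

def best_configuration(n_tuples, tuple_bytes, fetch_sizes, message_sizes,
                       first_msg_cost=2e-3, other_msg_cost=2e-4):
    """Exhaustive search over candidate middleware settings for the smallest estimated time."""
    return min(((communication_time(n_tuples, tuple_bytes, f, m, first_msg_cost, other_msg_cost), f, m)
                for f in fetch_sizes for m in message_sizes))

print(best_configuration(n_tuples=1_000_000, tuple_bytes=200,
                         fetch_sizes=[100, 1_000, 10_000, 100_000],
                         message_sizes=[4_096, 32_768, 262_144]))
```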
Smagghue, Gabriel. „Essays on the impact of international trade and labor regulation on firms“. Thesis, Paris, Institut d'études politiques, 2014. http://www.theses.fr/2014IEPP0022/document.
Der volle Inhalt der Quelle
Recent literature in international economics and macroeconomics has pointed to the major role played by large firms in shaping aggregate economic outcomes. Large firms influence, inter alia, economic fluctuations, performance on export markets, and inequalities between workers and between consumers. It is therefore crucial to understand how large firms emerge and behave. In the present thesis, I look at three independent aspects of this question. First, I study how exporting firms adjust the quality of the products they export in response to an intensification of "low-cost" competition in foreign markets. To this end, I develop a new method to estimate product quality at the firm level, and I find evidence that firms upgrade quality in response to "low-cost" competition. Second, I investigate the way exporting firms adjust their sales when a demand shock (e.g. an economic recession, a war) occurs in one of their destinations. In the context of the Champagne wine industry during the 2000-2001 economic recession, I show that firms reallocate their sales toward markets where demand conditions are relatively more favorable. Lastly, I look at the way firms adjust their size and their mix of capital and labor in response to labor regulations that are more binding for large firms. I find that firms shrink and substitute capital for labor to mitigate the labor cost of the regulation. At the aggregate level, preliminary results suggest that workers gain from the regulation while capital owners lose.
Wang, Chu. „Deep learning-based prognostics for fuel cells under variable load operating conditions“. Electronic Thesis or Diss., Aix-Marseille, 2022. http://www.theses.fr/2022AIXM0530.
Der volle Inhalt der Quelle
Proton exchange membrane fuel cell (PEMFC) systems are suitable for various transportation applications thanks to their compact structure, high power density, low start-up/operating temperature, and zero carbon emissions. The high cost and limited durability of PEMFCs are still the core factors limiting their large-scale commercialization. In transportation applications, the deterioration of PEMFCs is aggravated by variable load conditions, resulting in a decrease in their Remaining Useful Life (RUL). Prognostics and health management (PHM) is an effective tool to forecast potential system risks, manage system control and maintenance schedules, improve system safety and reliability, extend system life, and reduce operation and maintenance costs. Prognostics is an important foundation and key support for PHM, and its core tasks include health indicator extraction, degradation trend prediction, and RUL estimation. The long-term degradation characteristics of a PEMFC are concealed by variable load conditions, which increases the difficulty of health indicator extraction, reduces the accuracy of degradation prediction, and undermines the reliability of lifetime estimation. In view of this, the thesis starts by modeling the degradation behavior of PEMFCs under variable load conditions and then carries out research on health indicator extraction, short- and long-term degradation trend prediction, RUL estimation, and reliability evaluation.
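As a toy illustration of the prognostics chain (health indicator, degradation trend, RUL), the sketch below fits a linear trend to a synthetic voltage-like health indicator and extrapolates it to an assumed end-of-life threshold. The data, the threshold, and the linear model are illustrative assumptions, not the deep-learning models developed in the thesis.

```python
# Sketch: RUL estimation by extrapolating a fitted degradation trend to an
# end-of-life threshold (synthetic health indicator, linear trend assumed).
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0, 500, dtype=float)                     # operating hours
hi = 0.68 - 2e-4 * t + rng.normal(0, 2e-3, t.size)     # health indicator, e.g. stack voltage
eol = 0.55                                             # assumed end-of-life threshold

slope, intercept = np.polyfit(t, hi, 1)                # degradation trend (order-1 fit)
t_eol = (eol - intercept) / slope                      # time at which the trend crosses EOL
rul = t_eol - t[-1]                                    # remaining useful life from "now"
print(f"estimated RUL: {rul:.0f} hours")
```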
Kammoun, Radhouane. „Etude de l'évaluation des titres intercotés dans un contexte d'asymétrie d'information : cas des entreprises européennes intercotées au Nasdaq“. Thesis, Aix-Marseille 3, 2011. http://www.theses.fr/2011AIX32029.
Der volle Inhalt der Quelle
Cross-listing is a good opportunity for European firms to grow and to gain access to a liquid market. Widening the investor base, raising funds at a lower cost to finance new projects, and reducing geographical and legal barriers are among the main advantages of cross-listing. Through a theoretical and an empirical study on a sample of firms from European countries cross-listed on Nasdaq, we study the impact of cross-listing on corporate performance. For investors, cross-listed securities represent a diversification opportunity and help mitigate the home-bias effect. Cross-listing gives the company greater visibility and allows it to attract new investors. Firm-specific characteristics such as size or industry influence the benefits of cross-listing.
Michel, Pierre. „Sélection d'items en classification non supervisée et questionnaires informatisés adaptatifs : applications à des données de qualité de vie liée à la santé“. Thesis, Aix-Marseille, 2016. http://www.theses.fr/2016AIXM4097/document.
Der volle Inhalt der Quelle
An adaptive test provides a valid measure of patients' quality of life while reducing the number of items to be completed. This approach depends on the models used, which sometimes rest on unverifiable assumptions. We propose an alternative approach based on decision trees; it makes no such assumptions and requires less computation time for item administration. We present different simulations that demonstrate the relevance of our approach. We then present an unsupervised classification method called CUBT. CUBT comprises three steps to obtain an optimal partition of a data set. The first step grows a tree by recursively dividing the data set. The second step joins pairs of terminal nodes of the tree. The third step aggregates terminal nodes that do not come from the same split. Different simulations are presented to compare CUBT with other approaches, and we also define heuristics for the choice of CUBT's parameters. CUBT identifies the variables that are active in the construction of the tree. However, variables that are not used in the splits may still be competitive with the active ones. It is therefore essential to rank the variables according to an importance score to determine their relevance in a given model. We present a method, based on CUBT and competitive binary splits, to define a variable importance score, and we analyze the efficiency and stability of this new index by comparing it with other methods.
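The sketch below illustrates only the growing step of a CUBT-like tree: each node is split on the variable/threshold pair that most reduces the total within-node inertia. The stopping rules and the synthetic data are illustrative assumptions, and the joining and aggregation steps of the method are omitted.

```python
# Sketch: growing step of a CUBT-like clustering tree (joining/aggregation omitted).
import numpy as np

def inertia(node):
    # within-node sum of squared deviations from the node centroid
    return ((node - node.mean(axis=0)) ** 2).sum()

def grow(X, min_size=20, min_gain=1e-3):
    """Recursively split X; return a list of leaves (arrays of row indices)."""
    def split(idx):
        node = X[idx]
        if len(idx) < 2 * min_size:
            return [idx]
        best = None
        for j in range(X.shape[1]):                  # candidate splitting variable
            for thr in np.unique(node[:, j])[:-1]:   # candidate threshold
                left, right = idx[node[:, j] <= thr], idx[node[:, j] > thr]
                if min(len(left), len(right)) < min_size:
                    continue
                gain = inertia(node) - inertia(X[left]) - inertia(X[right])
                if best is None or gain > best[0]:
                    best = (gain, left, right)
        if best is None or best[0] < min_gain * inertia(X):
            return [idx]                             # stop: no worthwhile split
        return split(best[1]) + split(best[2])
    return split(np.arange(len(X)))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (100, 3))])
print("number of leaves:", len(grow(X)))
```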
Geronimi, Julia. „Contribution à la sélection de variables en présence de données longitudinales : application à des biomarqueurs issus d'imagerie médicale“. Thesis, Paris, CNAM, 2016. http://www.theses.fr/2016CNAM1114/document.
Der volle Inhalt der Quelle
Clinical studies enable us to measure many longitudinal variables. When the goal is to find a link between a response and some covariates, one can use regularization methods such as the LASSO, which has been extended to Generalized Estimating Equations (GEE). These methods allow us to select a subgroup of variables of interest while taking intra-patient correlations into account. Databases often contain unfilled fields and measurement problems, resulting in inevitable missing data. The objective of this thesis is to integrate missing data into variable selection in the presence of longitudinal data. We use multiple imputation and introduce a new imputation function for the specific case of variables under a detection limit. We provide a new variable selection method for correlated data that integrates missing data: the Multiple Imputation Penalized Generalized Estimating Equations (MI-PGEE). Our operator applies the group-LASSO penalty to the group of estimated regression coefficients of the same variable across multiply-imputed datasets. Our method provides a consistent selection across multiply-imputed datasets, where the optimal shrinkage parameter is chosen by minimizing a BIC-like criterion. We then present an application to knee osteoarthritis aiming to select the subset of biomarkers that best explain the differences in joint space width over time.
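The sketch below illustrates the group-penalty idea behind MI-PGEE in a deliberately simplified linear (non-GEE) setting: the coefficients of one variable across the M imputed datasets form a group, and a proximal gradient step with group soft-thresholding keeps or drops the variable in all imputations jointly. The solver, the stand-in "imputed" datasets, and the fixed shrinkage parameter are illustrative assumptions (the thesis chooses it with a BIC-like criterion).

```python
# Sketch: group-lasso across multiply-imputed datasets (simplified linear working
# model instead of GEE), solved by proximal gradient descent.
import numpy as np

def mi_group_lasso(Xs, ys, lam, n_iter=500):
    M, p = len(Xs), Xs[0].shape[1]
    B = np.zeros((p, M))                                  # column m = coefficients for imputation m
    L = 2 * max(np.linalg.norm(X, 2) ** 2 for X in Xs)    # Lipschitz constant of the loss gradient
    for _ in range(n_iter):
        # gradient step on the summed squared-error loss over imputations
        G = np.column_stack([-2 * X.T @ (y - X @ B[:, m])
                             for m, (X, y) in enumerate(zip(Xs, ys))])
        B = B - G / L
        # proximal step: group soft-thresholding, one group per variable
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        B = np.maximum(0, 1 - lam / (L * norms + 1e-12)) * B
    return B

rng = np.random.default_rng(3)
n, p, M = 80, 30, 5
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
Xs = [rng.normal(size=(n, p)) for _ in range(M)]          # stand-ins for imputed datasets
ys = [X @ beta + rng.normal(size=n) for X in Xs]
B = mi_group_lasso(Xs, ys, lam=60.0)
print("selected variables:", np.flatnonzero(np.linalg.norm(B, axis=1) > 1e-6))
```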
Sagara, Issaka. „Méthodes d'analyse statistique pour données répétées dans les essais cliniques : intérêts et applications au paludisme“. Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM5081/document.
Der volle Inhalt der Quelle
Numerous clinical studies and control interventions have been conducted or are ongoing in Africa for malaria control. For an efficient control of this disease, the strategies should be closer to the reality of the field and the data should be analyzed appropriately. In endemic areas, malaria is a recurrent disease and repeated malaria episodes are common. However, the literature review indicates a limited use of appropriate statistical tools for the analysis of recurrent malaria data. We implemented appropriate statistical methods for the analysis of these data. We also studied the repeated measurements of hemoglobin during the follow-up of malaria treatments in order to assess the safety of the study drugs, pooling data from 13 clinical trials. For the analysis of the number of malaria episodes, negative binomial regression was implemented. To model the recurrence of malaria episodes, four models were used: (i) generalized estimating equations (GEE) with a Poisson distribution, and three extensions of the Cox model, namely (ii) the Andersen-Gill counting process (AG-CP), (iii) the Prentice-Williams-Peterson counting process (PWP-CP), and (iv) the shared gamma frailty model. For the safety analysis, i.e. the assessment of the impact of malaria treatment on hemoglobin levels or the onset of anemia, generalized linear latent and mixed models (GLLAMM) were implemented. We have shown how to properly apply existing statistical tools to the analysis of these data. The prospects of this work lie in the development of good-practice guides on the methodology of preparation and analysis, and of a storage network for malaria data.
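To show how two of these models can be fitted in practice, the sketch below uses statsmodels on a hypothetical dataset. The column names, the synthetic counts, the artificial duplication into visits, and the choice of library are assumptions for illustration, not the analyses actually performed in the thesis.

```python
# Sketch: negative binomial regression for the number of malaria episodes, and
# GEE with a Poisson family for repeated counts per subject (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "subject": np.arange(n),
    "episodes": rng.negative_binomial(2, 0.4, n),   # malaria episodes per subject
    "arm": rng.integers(0, 2, n),                   # treatment arm
    "age": rng.uniform(1, 10, n),
})

# (a) Negative binomial regression on episode counts
nb = smf.glm("episodes ~ arm + age", data=df,
             family=sm.families.NegativeBinomial()).fit()
print(nb.summary().tables[1])

# (b) GEE with Poisson family and exchangeable working correlation; each subject
# is duplicated over two visits here only to illustrate the clustered structure.
long = pd.concat([df.assign(visit=v) for v in (1, 2)], ignore_index=True)
gee = smf.gee("episodes ~ arm + age", groups="subject", data=long,
              family=sm.families.Poisson(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.summary().tables[1])
```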
Chahdoura, Sami. „Etude sur un cas modèle de questionnaire du double recadrage des notes suivant l'équation personnelle : modèle de codage d'une variable unique : application de l'analyse des correspondances aux comptes du bilan“. Paris 6, 1995. http://www.theses.fr/1995PA066044.
Der volle Inhalt der Quelle
Kamnang, Wanko Patrick. „Optimisation des requêtes skyline multidimensionnelles“. Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0010/document.
Der volle Inhalt der Quelle
As part of selecting the best items in a multidimensional database, several kinds of queries have been defined. The skyline operator has the advantage of not requiring the definition of a scoring function in order to rank tuples. However, because this operator does not satisfy the monotonicity property, (i) it is difficult to optimize skyline queries in a multidimensional context and (ii) it is hard to estimate the size of a query result. This work first addresses the question of estimating the size of the result of a given skyline query, formulating estimators with good statistical properties (unbiasedness or convergence). It then provides two different approaches to optimize multidimensional skyline queries: the first relies on a well-known database concept, functional dependencies, while the second resembles a data compression method. Both algorithms are very interesting, as the experimental results confirm. Finally, we address the issue of skyline queries over dynamic data by adapting one of our previous solutions to this end.
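For readers unfamiliar with the operator, a minimal block-nested-loop skyline computation is sketched below; minimization on every dimension is assumed, and the thesis's optimizations via functional dependencies and compression are not shown.

```python
# Sketch: naive skyline computation (keep the tuples not dominated by any other).
# A tuple t dominates s if t is <= s on every dimension and < on at least one
# (smaller is better on all dimensions here, by assumption).
def dominates(t, s):
    return all(a <= b for a, b in zip(t, s)) and any(a < b for a, b in zip(t, s))

def skyline(tuples):
    return [t for t in tuples
            if not any(dominates(other, t) for other in tuples if other is not t)]

# Example rows (price, distance); both dimensions are to be minimized.
hotels = [(50, 8.0), (80, 2.0), (60, 3.5), (90, 9.0), (55, 7.5)]
print(skyline(hotels))   # (90, 9.0) is dominated; the remaining tuples form the skyline
```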
Spence, Stephen. „Une étude du lien entre la productivité et la bienfaisance des entreprises : une présentation des données provenant d'une expérience sur terrain de l'industrie sylvicole en Colombie-Britannique“. Master's thesis, Université Laval, 2016. http://hdl.handle.net/20.500.11794/26592.
Der volle Inhalt der Quelle