Theses on the topic "Data Subgroup"
Browse the top 23 theses for your research on the topic "Data Subgroup".
Atzmüller, Martin. "Knowledge-intensive subgroup mining : techniques for automatic and interactive discovery /". Berlin : Aka, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=2928288&prov=M&dok_var=1&dok_ext=htm.
Atzmüller, Martin. "Knowledge-intensive subgroup mining techniques for automatic and interactive discovery". Berlin Aka, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2928288&prov=M&dok_var=1&dok_ext=htm.
Belfodil, Aimene. "An order theoretic point-of-view on subgroup discovery". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI078.
As its title suggests, this thesis provides an order-theoretic point of view on the task of subgroup discovery. Subgroup discovery is the automatic task of discovering interesting hypotheses in databases. That is, given a database, the hypothesis space the analyst wants to explore, and a formal way of gauging the quality of the hypotheses (e.g. a quality measure), the automated task of subgroup discovery aims to extract the interesting hypotheses w.r.t. these parameters. To design fast and efficient algorithms for subgroup discovery, one should understand the underlying properties of the hypothesis space on the one hand and the properties of its quality measure on the other. In this thesis, we extend the state of the art by: (i) providing a unified view of the hypothesis spaces behind subgroup discovery using the well-founded mathematical tool of order theory, (ii) proposing the new hypothesis space of conjunctions of linear inequalities in numerical databases and the algorithms enumerating its elements, and (iii) proposing an anytime algorithm for discriminative subgroup discovery on numerical datasets that provides guarantees upon interruption.
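As a rough illustration of the task described in the abstract above (a dataset, a hypothesis space, and a quality measure), the following minimal sketch enumerates conjunctions of attribute-value conditions and ranks them. The toy dataset, the function names, and the choice of weighted relative accuracy (WRAcc) as the quality measure are assumptions made for illustration; none of them is taken from the thesis, which works with richer hypothesis spaces and far more efficient enumeration.

```python
from itertools import combinations

# Toy dataset (hypothetical): descriptive attributes plus a binary target.
ROWS = [
    {"age": "young", "smoker": "yes", "target": 1},
    {"age": "young", "smoker": "no", "target": 0},
    {"age": "old", "smoker": "yes", "target": 1},
    {"age": "old", "smoker": "yes", "target": 1},
    {"age": "old", "smoker": "no", "target": 0},
    {"age": "young", "smoker": "no", "target": 0},
]


def covers(description, row):
    """A description is a tuple of (attribute, value) conditions joined by AND."""
    return all(row[attr] == val for attr, val in description)


def wracc(description, rows):
    """Weighted relative accuracy: coverage * (target share in subgroup - overall share)."""
    subgroup = [r for r in rows if covers(description, r)]
    if not subgroup:
        return 0.0
    coverage = len(subgroup) / len(rows)
    p_sub = sum(r["target"] for r in subgroup) / len(subgroup)
    p_all = sum(r["target"] for r in rows) / len(rows)
    return coverage * (p_sub - p_all)


def discover(rows, max_depth=2, top_k=3):
    """Exhaustively enumerate conjunctions of up to max_depth conditions; keep the best top_k."""
    selectors = sorted({(a, r[a]) for r in rows for a in r if a != "target"})
    candidates = []
    for depth in range(1, max_depth + 1):
        for desc in combinations(selectors, depth):
            # Skip descriptions that test the same attribute twice.
            if len({a for a, _ in desc}) == depth:
                candidates.append((wracc(desc, rows), desc))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    for quality, desc in discover(ROWS):
        print(round(quality, 3), desc)
```

Exhaustive enumeration like this grows exponentially with the number of conditions, which is why the structure of the hypothesis space and of the quality measure matters for practical algorithms.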
Mistry, Dipesh. "Recursive partitioning based approaches for low back pain subgroup identification in individual patient data meta-analyses". Thesis, University of Warwick, 2014. http://wrap.warwick.ac.uk/64032/.
Doubleday, Kevin. "Generation of Individualized Treatment Decision Tree Algorithm with Application to Randomized Control Trials and Electronic Medical Record Data". Thesis, The University of Arizona, 2016. http://hdl.handle.net/10150/613559.
Mueller, Marianne Larissa [author], Stefan [academic supervisor] Kramer and Frank [academic supervisor] Puppe. "Data Mining Methods for Medical Diagnosis : Test Selection, Subgroup Discovery, and Constrained Clustering / Marianne Larissa Mueller. Reviewers: Stefan Kramer ; Frank Puppe. Supervisor: Stefan Kramer". München : Universitätsbibliothek der TU München, 2012. http://d-nb.info/1024964264/34.
Li, Rui [author], Burkhard [academic supervisor] [reviewer] Rost and Stefan [reviewer] Kramer. "Data Mining and Machine Learning Methods for High-dimensional Patient Data in Dementia Research: Voxel Features Mining, Subgroup Discovery and Multi-view Learning / Rui Li ; Reviewers: Burkhard Rost, Stefan Kramer ; Supervisor: Burkhard Rost". München : Universitätsbibliothek der TU München, 2017. http://d-nb.info/1125018224/34.
Domingue, Jean-Laurent. "Nurses’ Knowledge, Attitudes and Documentation Practices in a Context of HIV Criminalization: A Secondary Subgroup Analysis of Data from California, Florida, New York, and Texas Nurses". Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/35570.
Belfodil, Adnene. "Exceptional model mining for behavioral data analysis". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI086.
With the rapid proliferation of data platforms collecting and curating data related to various domains, such as government data, education data, environment data or product ratings, more and more data are available online. This offers an unparalleled opportunity to study the behavior of individuals and the interactions between them. In the political sphere, being able to query datasets of voting records provides interesting insights for data journalists and political analysts. In particular, such data can be leveraged for the investigation of exceptionally consensual or controversial topics. Consider data describing voting behavior in the European Parliament (EP). Such a dataset records the votes of each member (MEP) in voting sessions held in the parliament, as well as information on the parliamentarians (e.g., gender, national party, European party alliance) and the sessions (e.g., topic, date). This dataset offers opportunities to study the agreement or disagreement of coherent subgroups, especially to highlight unexpected behavior. It is to be expected that in the majority of voting sessions, MEPs will vote along the lines of their European party alliance. However, when matters are of interest to a specific nation within Europe, alignments may change and agreements can be formed or dissolved. For instance, when a legislative procedure on fishing rights is put before the MEPs, the MEPs of the island nation of the UK can be expected to agree on a specific course of action regardless of their party alliance, fostering an exceptional agreement where strong polarization exists otherwise. In this thesis, we aim to discover such exceptional (dis)agreement patterns not only in voting data but also in more generic data, called behavioral data, which involves individuals performing observable actions on entities. We devise two novel methods which offer complementary angles on exceptional (dis)agreement in behavioral data: within and between groups. These two approaches, called Debunk and Deviant, ideally enable the implementation of a sufficiently comprehensive tool to highlight, summarize and analyze exceptional behaviors in behavioral data. We thoroughly investigate the qualitative and quantitative performance of the devised methods. Furthermore, we motivate their usage in the context of computational journalism.
Wesley, S. Scott. "Background data subgroups and career outcomes : some developmental influences on person job-matching". Diss., Georgia Institute of Technology, 1989. http://hdl.handle.net/1853/31065.
Lütz, Elin. "Unsupervised machine learning to detect patient subgroups in electronic health records". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-251669.
The use of electronic health records (EHR) to report patient data has increased in step with the digitalization of healthcare. These data can contain many types of medical information, such as disease symptoms, lab results, ICD-10 diagnosis codes, and other patient information. EHR data are typically high-dimensional and contain missing values, which can lead to computational difficulties in a digital format. Discovering groupings in such patient data can provide valuable insights for diagnosis prediction and for the development of medical decision support. In this work, we examine a subset of digital patient data containing patient answers to disease-related questions. This dataset is examined by applying two popular clustering algorithms: k-means and agglomerative hierarchical clustering. The algorithms are compared against each other and on different types of datasets, primarily the raw data and two datasets in which missing values have been replaced through imputation techniques. The primary evaluation measure for the clustering algorithms was the silhouette value, computed with a Euclidean distance measure and a cosine measure. The results show that natural groupings most likely exist in the dataset. Hierarchical clustering showed higher cluster quality than k-means, and the cosine measure was preferable for this dataset. Imputation of missing data led to large changes in the data structure and hence in the results of the clustering experiments, which suggests that other, more advanced data-specific imputation techniques are preferable.
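The evaluation measure mentioned in the abstract above, the silhouette value under Euclidean and cosine distance, can be sketched in plain Python as follows. The points, the cluster labels, and the function names are hypothetical; the toy data are deliberately built so that direction matters more than magnitude, and they say nothing about the thesis's actual data or results.

```python
import math

# Hypothetical toy data: two "directions", each with points of varying magnitude.
POINTS = [(1, 0.1), (5, 0.5), (10, 1), (0.1, 1), (0.5, 5), (1, 10)]
LABELS = [0, 0, 0, 1, 1, 1]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # Assumes neither vector is all zeros (the norm would be 0).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def silhouette(points, labels, dist):
    """Mean silhouette value s = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster: use 0
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print("euclidean:", silhouette(POINTS, LABELS, euclidean))
    print("cosine:   ", silhouette(POINTS, LABELS, cosine_distance))
```

On data like these, where cluster membership is determined by direction, the cosine-based silhouette is higher than the Euclidean one for the same labels, which shows how the choice of distance changes the apparent cluster quality.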
Hawken, Steven. "Methodological Approaches to Studying Risk Factors for Adverse Events Following Routine Vaccinations in the General Population and Vulnerable Subgroups of Individuals Using Health Administrative Data". Thesis, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31774.
Hammal, Mohamed Ali. "Contribution à la découverte de sous-groupes corrélés : Application à l’analyse des systèmes territoriaux et des réseaux alimentaires". Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI024.
Feeding cities better, in both quantity and quality, especially large cities, is a major challenge whose resolution requires a better understanding of the relationships between urban populations and their food. On the scale of urban food systems, we need to understand the availability of food resources in relation to the socio-economic profiles of the territories. But we lack tools and methods to systematically understand the relationships between consumption basins, supply and eating habits. The objective of this thesis is to contribute to the development of new IT tools to process temporal, heterogeneous and multi-source data in order to identify and characterize behaviors specific to a geographic area. For this, we rely on the joint exploration of gradual patterns, to discover rank correlations, and subgroups, in order to find contexts in which the correlations described by the gradual patterns are exceptionally strong compared to the rest of the data. We propose an enumeration algorithm based on pruning properties with upper bounds, as well as another algorithm which samples the patterns according to the quality measure. These approaches are validated not only on benchmark datasets, but also through an empirical study of the formation of food deserts in the Lyon urban area.
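The idea of a context in which a rank correlation is exceptionally strong compared to the rest of the data can be sketched as follows. This is only an illustration under stated assumptions: Kendall's tau-a stands in for the rank correlation, the difference of taus stands in for the quality measure, and the records (area type, income rank, fresh-food purchases) are invented; the thesis's own measure and data may differ.

```python
from itertools import combinations

# Hypothetical records: (area type, income rank, fresh-food purchases).
RECORDS = [
    ("urban", 1, 10), ("urban", 2, 12), ("urban", 3, 15), ("urban", 4, 19),
    ("rural", 1, 8), ("rural", 2, 5), ("rural", 3, 9), ("rural", 4, 4),
]


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def correlation_gap(records, area):
    """Tau inside the subgroup minus tau in the rest of the data (assumed quality measure)."""
    inside = [(x, y) for a, x, y in records if a == area]
    outside = [(x, y) for a, x, y in records if a != area]
    tau_in = kendall_tau([x for x, _ in inside], [y for _, y in inside])
    tau_out = kendall_tau([x for x, _ in outside], [y for _, y in outside])
    return tau_in - tau_out


if __name__ == "__main__":
    print(correlation_gap(RECORDS, "urban"))
```

A subgroup search would evaluate such a gap for many candidate contexts, which is where pruning with upper bounds or sampling becomes necessary.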
Tillberg, Anders. "A multidisciplinary risk assessment of dental restorative materials". Doctoral thesis, Umeå : Univ, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1860.
Bosc, Guillaume. "Anytime discovery of a diverse set of patterns with Monte Carlo tree search". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI074/document.
The discovery of patterns that strongly distinguish one class label from another is still a challenging data-mining task. Subgroup Discovery (SD) is a formal pattern mining framework that enables the construction of intelligible classifiers and, most importantly, the elicitation of interesting hypotheses from the data. However, SD still faces two major issues: (i) how to define appropriate quality measures to characterize the interestingness of a pattern; (ii) how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible. The first issue has been tackled by Exceptional Model Mining (EMM) for discovering patterns that cover tuples that locally induce a model substantially different from the model of the whole dataset. The second issue has been studied in SD and EMM mainly with the use of beam-search strategies and genetic algorithms for discovering a pattern set that is non-redundant, diverse and of high quality. In this thesis, we argue that the greedy nature of most such previous approaches produces pattern sets that lack diversity. Consequently, we formally define pattern mining as a game and solve it with Monte Carlo Tree Search (MCTS), a recent technique mainly used for games and planning problems in artificial intelligence. Contrary to traditional sampling methods, MCTS leads to an any-time pattern mining approach without assumptions on either the quality measure or the data. It converges to an exhaustive search if given enough time and memory. The exploration/exploitation trade-off allows the diversity of the result set to be improved considerably compared to existing heuristics. We show that MCTS quickly finds a diverse pattern set of high quality in our application in neurosciences. We also propose and validate a new quality measure especially tuned for imbalanced multi-label data.
Underwood, Marilyn. "The Relationship of 10th-Grade District Progress Monitoring Assessment Scores to Florida Comprehensive Assessment Test Scores in Reading and Mathematics for 2008-2009". Doctoral diss., University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3845.
Ed.D.
Department of Educational Research, Technology and Leadership
Education
Education EdD
Tseng, Jen Yu and 曾仁佑. "Subgroup Data Analysis Using Survival Tree". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/24649969791940075527.
National Tsing Hua University
Institute of Statistics
104
In this thesis, we apply subgroup analysis to right-censored data, following the method of Su et al. (2008). We consider two methods: the Interaction Tree, and a random forest used to estimate the importance of each covariate for the subgroup analysis. We assess their performance through simulation and real data analysis. In the real data analysis, we analyze data from patients with lung cancer and use their gene expression as covariates. However, with a large number of covariates, the computation speed of the Interaction Tree becomes a clear problem. We therefore rank the covariates in advance and retain only the top-ranked ones, those with the largest marginal effects, for analysis. As a result, subgroups with heterogeneous treatment effects can be identified precisely by this method.
Lemmerich, Florian. "Novel Techniques for Efficient and Effective Subgroup Discovery". Doctoral thesis, 2014. https://nbn-resolving.org/urn:nbn:de:bvb:20-opus-97812.
New Techniques for Efficient and Effective Subgroup Discovery
Atzmüller, Martin. "Knowledge-Intensive Subgroup Mining - Techniques for Automatic and Interactive Discovery". Doctoral thesis, 2006. https://nbn-resolving.org/urn:nbn:de:bvb:20-opus-21004.
Data mining has been applied with great success in many domains. Subgroup discovery, an important subfield of data mining, is well suited, for example, to marketing or to quality control and analysis in medical domains. The general goal is to discover potentially useful and ultimately interesting knowledge. In practice, however, these requirements often cannot be met, for instance when the methods used scale poorly to larger datasets, when too many results are presented to the user, or when the user already knows many of the discovered subgroup patterns. This thesis presents a combination of automatic and interactive techniques for coping better with these problems. Automatic heuristic and exhaustive subgroup discovery methods are discussed, and in particular the novel SD-Map algorithm for exhaustive subgroup discovery is presented, which is both fast and effective. Regarding interactive techniques, methods for subgroup introspection and analysis are presented along with advanced visualization techniques, for example the zoomtable, which directly visualizes the parameters most important for subgroup discovery and can be used for optimization and exploration. In addition, several visualizations for comparing and evaluating subgroups are described in order to support the user in these essential steps. Furthermore, easily formalized background knowledge is presented that can be used in many ways in the subgroup discovery process: to focus the discovery process, to restrict the search space, and ultimately to increase the efficiency of the discovery method. In particular, background knowledge is introduced to filter the elements of the application domain, to define suitable abstractions, to aggregate values, and to post-process the discovered subgroup patterns. Finally, these techniques are integrated into a knowledge-intensive process that includes both automatic and interactive methods for subgroup discovery. The practical significance of the presented approach depends strongly on the available tools. To this end, the VIKAMINE system is presented as a highly integrated environment for knowledge-intensive active subgroup discovery. The evaluation of the approach consists of two parts. To evaluate the efficiency and effectiveness of the methods, an experimental evaluation with synthetic data is presented. For this purpose, a novel data generator developed in this work is applied, which allows a simple and intuitive specification of the data characteristics. For the evaluation of the approach, data were generated that have characteristics similar to those of the data in the intended application area. The results of the evaluation show that the novel SD-Map algorithm is superior to the other standard algorithms described in this work: SD-Map offers clear advantages both in efficiency and, relative to the heuristic algorithms, in precision/recall. Subjective evaluation criteria are given by user acceptance, the benefit of the approach, and the interestingness of the results. Five case studies applying the presented techniques are described: the approach was used in medical and technical applications with real data. It was very well received by the users, and in practical use novel, useful, and interesting knowledge could be discovered.
Costa, Afonso José Ourives Marques da. "Handling Data Difficulty Factors via a Meta-Learning Approach". Master's thesis, 2020. http://hdl.handle.net/10316/92560.
Machine learning applications are challenged by data difficulty factors, which degrade data quality; dealing with them is a demanding task. Among the difficulty factors, class imbalance, which is noticeable in many biomedical databases, is often tackled with preprocessing algorithms that effectively improve classification performance. Since the selection of an imbalance strategy for a problem often relies on "brute-force" approaches, recommendation systems have been developed to provide optimal imbalance strategies for the problem at hand, based on the meta-characteristics of the dataset. However, despite the success of such systems, they arguably do not provide interpretable knowledge, since only the inputs (datasets) and outputs (recommended imbalance strategies) of these systems are known. To address this issue, this dissertation studies the relations between data meta-characteristics and imbalance strategies in the performance of classifiers. To this end, a meta-learning framework was developed, based on Exceptional Preferences Mining, which proved suitable for delivering interpretable conditions concerning the relations between data meta-characteristics and the ranking of preprocessing algorithms. Additionally, a novel metric was proposed to highlight the subgroups where steep variations are observed in the performance of imbalance strategies. The experiments considered 163 datasets, preprocessed with 9 data-level imbalance strategies, from which meta-features from 8 groups were extracted. The main findings are that employing an imbalance strategy may not always be required, and that there is no evident relation with the imbalance ratio but rather with the association of imbalance with other difficulty factors. Moreover, the domains of application of individual imbalance strategies are described, among other findings useful for the design of novel recommendation systems.
Wang, Xiaojing. "Bayesian Modeling Using Latent Structures". Diss., 2012. http://hdl.handle.net/10161/5848.
This dissertation is devoted to modeling complex data from the
Bayesian perspective via constructing priors with latent structures.
There are three major contexts in which this is done -- strategies for
the analysis of dynamic longitudinal data, estimating
shape-constrained functions, and identifying subgroups. The
methodology is illustrated in three different
interdisciplinary contexts: (1) adaptive measurement testing in
education; (2) emulation of computer models for vehicle crashworthiness; and (3) subgroup analyses based on biomarkers.
Chapter 1 presents an overview of the utilized latent structured
priors and an overview of the remainder of the thesis. Chapter 2 is
motivated by the problem of analyzing dichotomous longitudinal data
observed at variable and irregular time points for adaptive
measurement testing in education. One of its main contributions lies
in developing a new class of Dynamic Item Response (DIR) models via
specifying a novel dynamic structure on the prior of the latent
trait. The Bayesian inference for DIR models is undertaken, which
permits borrowing strength from different individuals, allows the
retrospective analysis of an individual's changing ability, and
allows for online prediction of one's ability changes. Proof of
posterior propriety is presented, ensuring that the objective
Bayesian analysis is rigorous.
Chapter 3 deals with nonparametric function estimation under
shape constraints, such as monotonicity, convexity or concavity. A
motivating illustration is to generate an emulator to approximate a computer
model for vehicle crashworthiness. Although Gaussian processes are
very flexible and widely used in function estimation, they are not
naturally amenable to incorporation of such constraints. Gaussian
processes with the squared exponential correlation function have the
interesting property that their derivative processes are also
Gaussian processes and are jointly Gaussian processes with the
original Gaussian process. This allows one to impose shape constraints
through the derivative process. Two alternative ways of incorporating derivative
information into Gaussian process priors are proposed, with one
focusing on scenarios (important in emulation of computer
models) in which the function may have flat regions.
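
The property used in Chapter 3 can be made concrete with a short worked derivation. The parameterization below (variance \sigma^2, length-scale \ell, one-dimensional input) is an assumption chosen for illustration and may differ from the notation used in the dissertation. For a zero-mean Gaussian process f with squared exponential covariance k, differentiating the kernel gives the covariances that involve the derivative process f':

```latex
\begin{align}
k(x, x') &= \sigma^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right), \\
\operatorname{Cov}\bigl(f'(x), f(x')\bigr) &= \frac{\partial k(x, x')}{\partial x}
  = -\frac{x - x'}{\ell^2}\, k(x, x'), \\
\operatorname{Cov}\bigl(f'(x), f'(x')\bigr) &= \frac{\partial^2 k(x, x')}{\partial x\, \partial x'}
  = \frac{1}{\ell^2}\left(1 - \frac{(x - x')^2}{\ell^2}\right) k(x, x').
\end{align}
```

Because (f, f') is jointly Gaussian with these covariances, a constraint such as monotonicity can be expressed as a condition on f' (for example, f'(x) \ge 0 at selected inputs).
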
Chapter 4 introduces a Bayesian method to control for multiplicity
in subgroup analyses through tree-based models that limit the
subgroups under consideration to those that are a priori plausible.
Once the prior modeling of the tree is accomplished, each tree will
yield a statistical model; Bayesian model selection analyses then
complete the statistical computation for any quantity of interest,
resulting in multiplicity-controlled inferences. This research is
motivated by a problem of biomarker and subgroup identification to
develop tailored therapeutics. Chapter 5 presents conclusions and
some directions for future research.
Dissertation
Shen, Hua. "Statistical Methods for Life History Analysis Involving Latent Processes". Thesis, 2014. http://hdl.handle.net/10012/8496.
Lee, Hsi-Yen and 李錫諺. "Iterative clustering of gene expression data in search of subgroups of general population". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/4sadzd.