Dissertations / Theses on the topic 'Inferenza causale'

Consult the top 50 dissertations / theses for your research on the topic 'Inferenza causale.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

HAMMAD, AHMED TAREK. "Tecniche di valutazione degli effetti dei Programmi e delle Politiche Pubbliche. L' approccio di apprendimento automatico causale." Doctoral thesis, Università Cattolica del Sacro Cuore, 2022. http://hdl.handle.net/10280/110705.

Full text
Abstract:
The analysis of causal mechanisms has been considered in various disciplines such as sociology, epidemiology, political science, psychology and economics. These approaches allow uncovering causal relations and mechanisms by studying the role of a treatment variable (such as a policy or a program) on a set of outcomes of interest, or on intermediate variables on the causal path between the treatment and the outcome variables. This thesis first focuses on reviewing and exploring alternative strategies to investigate causal effects and multiple mediation effects using Machine Learning algorithms, which have been shown to be particularly suited to research questions in complex settings with non-linear relations. Second, the thesis provides two empirical examples in which two Machine Learning algorithms, namely the Generalized Random Forest and Multiple Additive Regression Trees, are used to account for important control variables in causal inference in a data-driven way. By bridging a fundamental gap between causality and advanced data modelling, this work combines state-of-the-art theories and modelling techniques.
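
As a minimal, hedged sketch of the kind of causal Machine Learning estimator this abstract describes, the snippet below uses a simple two-model (T-learner) stand-in for effect heterogeneity; it is not the Generalized Random Forest algorithm itself, and all data and names are invented for illustration.

    # Two-model (T-learner) sketch of heterogeneous treatment effect estimation.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 3))                      # covariates / controls
    T = rng.binomial(1, 0.5, size=n)                 # randomized treatment
    tau = 0.5 + X[:, 0]                              # true heterogeneous effect
    Y = X @ np.array([1.0, -0.5, 0.2]) + tau * T + rng.normal(size=n)

    # Fit separate outcome models on treated and control units.
    m1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])
    m0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])

    # Unit-level effects: difference of the two predicted potential outcomes.
    tau_hat = m1.predict(X) - m0.predict(X)
    print("estimated mean effect:", tau_hat.mean(), "truth:", tau.mean())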
2

ROMIO, SILVANA ANTONIETTA. "Modelli marginali strutturali per lo studio dell'effetto causale di fattori di rischio in presenza di confondenti tempo dipendenti." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2010. http://hdl.handle.net/10281/8048.

Full text
Abstract:
One of the most important goals of epidemiological research is to analyse the relationship between one or more risk factors and an event. These relationships are often complicated by the presence of confounders, a concept that is extremely difficult to formalise. From the standpoint of causal analysis, confounding is said to exist when a measure of association does not coincide with the corresponding measure of effect, for example when the relative risk does not coincide with the causal relative risk. The problem is therefore to identify the designs and the assumptions under which the causal effect of interest can be computed. Randomised controlled clinical trials, for instance, were conceived to minimise the influence of systematic errors in measuring the effect of a risk factor on an outcome; moreover, in these studies the measures of association equal the (causal) measures of effect. In observational studies the scenario becomes more complex because of the presence of one or more variables that may alter, or 'confound', the relationship of interest, since the investigator cannot intervene in any way on the observed covariates or on the outcome. Identifying methods that solve the confounding problem is therefore of particular interest. The problem is especially complex when studying the causal effect of a risk factor in the presence of time-dependent confounders, that is, variables that, conditionally on past exposure history, predict both the outcome and subsequent exposure. This work studies an important public health question, namely whether a causal relationship exists between smoking habit and a decrease in body mass index (BMI), treating BMI measured at the previous time point as a time-dependent confounder and using a marginal structural model for repeated measures fitted to data from a cohort of Swedish students (the BROMS cohort). The large size of this cohort and the accuracy and type of the data collected make it particularly suitable for studying the dynamic behavioural phenomena characteristic of adolescence. The study shows that the cumulative causal effect of cigarette smoking on BMI reduction is significant only in women, with an estimated parameter for the interaction between smoking exposure and gender of 0.322 (p-value < 0.001), whereas the estimated parameter for cumulative cigarette consumption in males is non-significant, at 0.053 (p-value 0.464). These results are consistent with previous studies.
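
For readers who want the mechanics, here is a deliberately simplified one-period sketch of the marginal-structural-model idea: stabilized inverse-probability-of-treatment weighting followed by a weighted outcome regression. The real analysis is longitudinal with a time-dependent confounder; the variables below are synthetic stand-ins.

    # One-period sketch: stabilized IPTW + weighted regression (MSM idea).
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(1)
    n = 5000
    L = rng.normal(size=n)                        # confounder (e.g. prior BMI)
    A = rng.binomial(1, 1 / (1 + np.exp(-0.8 * L)))   # exposure depends on L
    Y = -0.3 * A + 0.5 * L + rng.normal(size=n)   # true causal effect is -0.3

    ps = LogisticRegression().fit(L.reshape(-1, 1), A).predict_proba(L.reshape(-1, 1))[:, 1]
    num = A.mean() * A + (1 - A.mean()) * (1 - A)     # stabilizing numerator
    den = ps * A + (1 - ps) * (1 - A)
    w = num / den                                     # stabilized weights

    msm = LinearRegression().fit(A.reshape(-1, 1), Y, sample_weight=w)
    print("weighted (causal) effect estimate:", msm.coef_[0])   # ~ -0.3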
3

Nguyên, Tri Long. "Inférence causale, modélisation prédictive et décision médicale." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT028.

Full text
Abstract:
Medical decision-making is defined by the choice of treatment of an illness, which attempts to maximize the healthcare benefit given a probable outcome. The choice of a treatment must therefore be based on scientific evidence of its efficacy, which raises the problem of estimating the treatment effect. In a first part, we present, discuss and propose causal inference methods for estimating the treatment effect using experimental or observational designs. However, the evidence provided by these approaches is established at the population level, not at the individual level. Foreknowing the patient's probable outcome is essential for adapting a clinical decision. In a second part, we therefore present the approach of predictive modeling, which has provided a leap forward in personalized medicine. Predictive models give the patient's prognosis at baseline and then let the clinician decide on treatment. This approach is limited, as the choice of treatment is still based on evidence established at the overall population level. In a third part, we propose an original method for estimating the individual treatment effect by combining causal inference and predictive modeling. When a treatment is being considered, our approach allows the clinician to foreknow and compare both the patient's prognosis without treatment and the patient's prognosis with treatment. Within this thesis, we present a series of eight articles.
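
A hedged sketch of the third-part idea, comparing a single patient's predicted prognosis without and with treatment using one prognostic model: the data and variable names are invented, and the model is fit on simulated randomized data so that its treatment coefficient has a causal reading. This illustrates the combination of prediction and causal inference, not the thesis's exact method.

    # Predict one patient's prognosis under "no treatment" vs "treatment".
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 3000
    age = rng.normal(60, 10, n)
    severity = rng.normal(size=n)
    treat = rng.binomial(1, 0.5, n)                   # randomized treatment
    logit = -2 + 0.03 * age + 0.8 * severity - 0.7 * treat
    death = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    model = LogisticRegression(max_iter=1000).fit(
        np.column_stack([age, severity, treat]), death)

    patient = np.array([[72, 1.5, 0], [72, 1.5, 1]])  # same patient, A=0 vs A=1
    p0, p1 = model.predict_proba(patient)[:, 1]
    print(f"prognosis untreated: {p0:.2f}, treated: {p1:.2f}, ITE: {p1 - p0:+.2f}")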
4

Sun, Xiaohai. "Causal inference from statistical data /." Berlin : Logos-Verl, 2008. http://d-nb.info/988947331/04.

Full text
5

LIU, DAYANG. "A Review of Causal Inference." Digital WPI, 2009. https://digitalcommons.wpi.edu/etd-theses/44.

Full text
Abstract:
In this report, I first review the evolution of ideas of causation as they relate to causal inference. Then I introduce two currently competing perspectives on this issue: the counterfactual perspective and the noncounterfactual perspective. The ideas of two statisticians, Donald B. Rubin, representing the counterfactual perspective, and A. P. Dawid, representing the noncounterfactual perspective, are examined in detail and compared against the evolution of ideas of causality. The main difference between these two perspectives is that the counterfactual perspective is based on counterfactuals, which cannot be observed even in principle, whereas the noncounterfactual perspective relies only on observables. I describe the definition of causes and causal inference methods under both perspectives, and I illustrate the application of the two types of methods with specific examples. Finally, I explore various controversies surrounding these two perspectives.
6

Sauley, Beau. "Three Essays in Causal Inference." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627659095905957.

Full text
7

Liu, Dayang. "A review of causal inference." Worcester, Mass. : Worcester Polytechnic Institute, 2009. http://www.wpi.edu/Pubs/ETD/Available/etd-010909-121301/.

Full text
8

Mahmood, Sharif. "Finding common support and assessing matching methods for causal inference." Diss., Kansas State University, 2017. http://hdl.handle.net/2097/36190.

Full text
Abstract:
This dissertation presents an approach to assess and validate causal inference tools for estimating the causal effect of a treatment. Finding treatment effects in observational studies is complicated by the need to control for confounders. Common approaches to controlling include using prognostically important covariates to form groups of similar units containing both treatment and control units, or modeling responses through interpolation. This dissertation proposes a series of new, computationally efficient methods to improve the analysis of observational studies. Treatment effects are only reliably estimated for a subpopulation under which a common support assumption holds—one in which treatment and control covariate spaces overlap. Given a distance metric measuring dissimilarity between units, graph theory is used to find common support. An adjacency graph is constructed in which edges are drawn between similar treated and control units, and regions of common support are determined by finding the largest connected components (LCC) of this graph. The results show that LCC improves on existing methods by efficiently constructing regions that preserve clustering in the data while ensuring interpretability of the region through the distance metric. This approach is extended to propose a new matching method called largest caliper matching (LCM). LCM is a version of cardinality matching—a type of matching used to maximize the number of units in an observational study under a covariate balance constraint between treatment groups. While traditional cardinality matching is NP-hard, LCM can be completed in polynomial time. The performance of LCM and five other popular matching methods is shown through a series of Monte Carlo simulations, measured by the bias, empirical standard deviation and mean squared error of the estimates under different treatment prevalences and different distributions of covariates. The formed matched samples improve estimation of the population treatment effect in a wide range of settings, and suggest cases in which certain matching algorithms perform better than others. Finally, this dissertation presents an application of LCC and matching methods to a study of the effectiveness of right heart catheterization (RHC) and finds that clinical outcomes are significantly worse for patients who undergo RHC.
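
A small illustration of the LCC idea on invented data, assuming a Euclidean distance and a hand-picked caliper: connect treated and control units whose covariate distance falls under the caliper, then keep the largest connected component of the resulting graph as the region of common support. This is a sketch of the construction, not the dissertation's algorithm.

    # Largest connected component (LCC) as a region of common support.
    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(3)
    Xt = rng.normal(0.5, 1.0, size=(50, 2))      # treated covariates
    Xc = rng.normal(0.0, 1.0, size=(80, 2))      # control covariates
    caliper = 0.4                                # assumed threshold

    G = nx.Graph()
    G.add_nodes_from(("t", i) for i in range(len(Xt)))
    G.add_nodes_from(("c", j) for j in range(len(Xc)))
    for i, xt in enumerate(Xt):
        for j, xc in enumerate(Xc):
            if np.linalg.norm(xt - xc) < caliper:    # similar enough -> edge
                G.add_edge(("t", i), ("c", j))

    lcc = max(nx.connected_components(G), key=len)   # common-support region
    n_t = sum(1 for kind, _ in lcc if kind == "t")
    print(f"LCC keeps {n_t} treated and {len(lcc) - n_t} control units")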
9

Guo, H. "Statistical causal inference and propensity analysis." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.599787.

Full text
Abstract:
Statistical causal inference from an observational study often requires adjustment for a possibly multi-dimensional covariate, where there is a need for dimension reduction. Propensity score analysis (Rosenbaum and Rubin 1983) is a popular approach to such reduction. This thesis addresses causal inference within Dawid's decision-theoretic framework, where studies of the "sufficient covariate" and its properties are essential. The role of a propensity variable, obtained from a "treatment-sufficient reduction", is illustrated and examined by a simple normal linear model. As propensity analysis is believed to reduce bias and improve precision, both population-based and sample-based linear regressions have been implemented, with adjustments for the multivariate covariate and for a scalar propensity variable. Theoretical illustrations are then verified by simulation results. In addition, propensity analysis in a non-linear model (logistic regression) is also discussed, followed by an investigation of the augmented inverse probability weighted (AIPW) estimator, which is a combination of a response model and a propensity model. It is found that, in linear regression with homoscedasticity, propensity-variable analysis results in exactly the same estimated causal effect as multivariate linear regression, for both population and sample. The claim that adjusting for an estimated propensity variable yields better precision than the true propensity variable is proved not to be universally valid. The AIPW estimator has the property of "double robustness", and it is possible to improve precision given that the propensity model is correctly specified.
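
The AIPW estimator the abstract investigates has a standard closed form: outcome-model predictions augmented by propensity-weighted residuals. A minimal sketch on simulated data (not the thesis's implementation):

    # Augmented inverse probability weighted (AIPW) estimate of the ATE.
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(4)
    n = 5000
    X = rng.normal(size=(n, 2))
    T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # confounded treatment
    Y = 1.0 * T + X @ np.array([0.5, -0.5]) + rng.normal(size=n)

    ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]   # propensity
    m1 = LinearRegression().fit(X[T == 1], Y[T == 1]).predict(X) # response models
    m0 = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)

    aipw = (m1 - m0
            + T * (Y - m1) / ps
            - (1 - T) * (Y - m0) / (1 - ps))
    print("AIPW ATE estimate:", aipw.mean())          # true effect is 1.0

The "double robustness" mentioned above refers to the fact that this estimator remains consistent if either the response models or the propensity model is correctly specified.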
10

Fancsali, Stephen E. "Constructing Variables That Support Causal Inference." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/398.

Full text
11

Morrissey, Edward R. "Bayesian inference of causal gene networks." Thesis, University of Warwick, 2012. http://wrap.warwick.ac.uk/45732/.

Full text
Abstract:
Genes do not act alone; rather, they form part of large interacting networks in which certain genes regulate the activity of others. The structure of these networks is of great importance as it can produce emergent behaviour, for instance, oscillations in the expression of network genes or robustness to fluctuations. While some networks have been studied in detail, most networks underpinning biological processes have not been fully characterised. Elucidating the structure of these networks is of paramount importance to understand these biological processes. With the advent of whole-genome gene expression measurement technology, a number of statistical methods have been put forward to predict the structure of gene networks from the individual gene measurements. This thesis focuses on the development of Bayesian statistical models for the inference of gene regulatory networks using time-series data. Most models used for network inference rely on the assumption that regulation is linear. This assumption is known to be incorrect, and when interactions are highly non-linear it can affect the accuracy of the retrieved network. In order to address this problem we developed an inference model that allows for non-linear interactions and benchmarked the model against a linear interaction model. Next we addressed the problem of how to infer a network when replicate measurements are available. To analyse data with replicates we proposed two models that account for measurement error. The models were compared to the standard way of analysing replicate data, that is, calculating the mean/median of the data and treating it as a noise-free time-series. Following the development of the models we implemented GRENITS, an R/Bioconductor package that integrates the models into a single free package. The package is faster than the previous implementations and is also easier to use. Finally GRENITS was used to fit a network to a whole-genome time-series for the bacterium Streptomyces coelicolor. The accuracy of a sub-network of the inferred network was assessed by comparing gene expression dynamics across datasets collected under different experimental conditions.
12

Lu, Jiannan. "On Causal Inference for Ordinal Outcomes." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:23845443.

Full text
Abstract:
This dissertation studies the problem of causal inference for ordinal outcomes. Chapter 1 focuses on the sharp null hypothesis of no treatment effect on any experimental unit, and develops a systematic procedure for closed-form construction of sequences of alternative hypotheses in increasing order of their departure from the sharp null hypothesis. The resulting construction procedure helps assess the power of randomization tests with ordinal outcomes. Chapter 2 proposes two new causal parameters, i.e., the probabilities that the treatment is beneficial and strictly beneficial for the experimental units, and derives their sharp bounds using only the marginal distributions, without imposing any assumptions on the joint distribution of the potential outcomes. Chapter 3 generalizes the framework of Chapter 2 to address noncompliance.
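
The Chapter 2 bounds can be illustrated numerically: treating the joint distribution of the two potential outcomes as unknown, sharp bounds on P(Y(1) > Y(0)), the probability that treatment is strictly beneficial when higher categories are better, follow from optimizing over all joints consistent with the marginals. The sketch below does this by linear programming with made-up three-category marginals; it reproduces the logic, not the thesis's closed-form derivation.

    # Sharp bounds on P(Y(1) > Y(0)) from the marginals alone, via an LP.
    import numpy as np
    from scipy.optimize import linprog

    K = 3
    p1 = np.array([0.2, 0.3, 0.5])        # assumed marginal of Y(1)
    p0 = np.array([0.4, 0.4, 0.2])        # assumed marginal of Y(0)

    # Decision variables: joint probabilities q[i, j] = P(Y(1)=i, Y(0)=j).
    c = np.array([1.0 if i > j else 0.0 for i in range(K) for j in range(K)])
    A_eq, b_eq = [], []
    for i in range(K):                    # row sums match the Y(1) marginal
        row = np.zeros(K * K)
        row[i * K:(i + 1) * K] = 1
        A_eq.append(row)
        b_eq.append(p1[i])
    for j in range(K):                    # column sums match the Y(0) marginal
        col = np.zeros(K * K)
        col[j::K] = 1
        A_eq.append(col)
        b_eq.append(p0[j])

    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    print(f"sharp bounds on P(Y(1) > Y(0)): [{lo.fun:.3f}, {-hi.fun:.3f}]")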
13

Murray, Eleanor Jane. "Agent-Based Models for Causal Inference." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:27201721.

Full text
Abstract:
Sound clinical decision making requires evidence-based estimates of the impact of different treatment strategies. In the absence of randomized trials, two potential approaches are agent-based models (ABMs) and the parametric g-formula. Although these methods are mathematically similar, they have generally been considered in isolation. In this dissertation, we bridge the gap between ABMs and the parametric g-formula, in order to improve the use of ABMs for causal inference. In Chapter 1, we describe bias that can occur when ABM inputs or estimates are extrapolated to new populations, and demonstrate the impact of this bias by comparison with the parametric g-formula. We describe the assumptions that are required for extrapolation of an ABM and show that violations of these assumptions produce biased estimates of the risk and causal effect. In Chapter 2, we describe an approach to provide calibration targets for ABMs, and to identify the set of parameters of the ABM that interfere with transportability of the model results to a particular population. We illustrate this approach by comparing the estimates from an existing ABM, the Cost-Effectiveness of Preventing AIDS Complications (CEPAC) model, to estimates from the parametric g-formula applied to prospective clinical data on HIV-positive individuals under different treatment initiation strategies. In Chapter 3, we focus on the core problem of causal inference from ABMs: how to define and estimate the parameters described in Chapter 2 in light of the bias described in Chapter 1. To illustrate this problem, we consider CEPAC input parameters for opportunistic diseases. We formally define the effect of interest, describe the conditions under which this effect is or is not identifiable, and describe the assumptions required for transportability of this effect. Finally, we show that estimation of these parameters via a naïve regression analysis approach provides implausible estimates.
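
For orientation, a minimal point-treatment version of the parametric g-formula (standardization) looks as follows: model the outcome given treatment and confounders, then average the model's predictions over the observed confounder distribution under "everyone treated" versus "no one treated". The longitudinal version used for the ABM comparisons iterates this model-then-simulate step over time. Data here are simulated.

    # Point-treatment parametric g-formula (standardization) sketch.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n = 4000
    L = rng.normal(size=(n, 2))                       # confounders
    A = rng.binomial(1, 1 / (1 + np.exp(-L[:, 0])))   # treatment depends on L
    pY = 1 / (1 + np.exp(-(-1 + 0.7 * A + L[:, 1])))
    Y = rng.binomial(1, pY)

    model = LogisticRegression().fit(np.column_stack([A, L]), Y)
    X1 = np.column_stack([np.ones(n), L])             # set A = 1 for everyone
    X0 = np.column_stack([np.zeros(n), L])            # set A = 0 for everyone
    risk1 = model.predict_proba(X1)[:, 1].mean()
    risk0 = model.predict_proba(X0)[:, 1].mean()
    print(f"g-formula risks: treated {risk1:.3f}, untreated {risk0:.3f}")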
14

Shpitser, Ilya. "Complete identification methods for causal inference." Diss., Restricted to subscribing institutions, 2008. http://proquest.umi.com/pqdweb?did=1708387761&sid=1&Fmt=2&clientId=1564&RQT=309&VName=PQD.

Full text
15

Lam, Patrick Kenneth. "Estimating Individual Causal Effects." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11150.

Full text
Abstract:
Most empirical work focuses on the estimation of average treatment effects (ATE). In this dissertation, I argue for a different way of thinking about causal inference by estimating individual causal effects (ICEs). I argue that focusing on estimating ICEs allows for a more precise and clear understanding of causal inference, reconciles the difference between what the researcher is interested in and what the researcher estimates, allows the researcher to explore and discover treatment effect heterogeneity, bridges the quantitative-qualitative divide, and allows for easy estimation of any other causal estimand.
16

Amjad, Muhammad Jehangir. "Sequential data inference via matrix estimation : causal inference, cricket and retail." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120190.

Full text
Abstract:
This thesis proposes a unified framework to capture the temporal and longitudinal variation across multiple instances of sequential data. Examples of such data include sales of a product over a period of time across several retail locations; trajectories of scores across cricket games; and annual tobacco consumption across the United States over a period of decades. A key component of our work is the latent variable model (LVM) which views the sequential data as a matrix where the rows correspond to multiple sequences while the columns represent the sequential aspect. The goal is to utilize information in the data within the sequence and across different sequences to address two inferential questions: (a) imputation or "filling missing values" and "de-noising" observed values, and (b) forecasting or predicting "future" values, for a given sequence of data. Using this framework, we build upon the recent developments in "matrix estimation" to address the inferential goals in three different applications. First, a robust variant of the popular "synthetic control" method used in observational studies to draw causal statistical inferences. Second, a score trajectory forecasting algorithm for the game of cricket using historical data. This leads to an unbiased target resetting algorithm for shortened cricket games which is an improvement upon the biased incumbent approach (Duckworth-Lewis-Stern). Third, an algorithm which leads to a consistent estimator for the time- and location-varying demand of products using censored observations in the context of retail. As a final contribution, the algorithms presented are implemented and packaged as a scalable open-source library for the imputation and forecasting of sequential data with applications beyond those presented in this work.
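
The first application mentioned is a robust variant of synthetic control. As a rough, hedged sketch of the underlying matrix-estimation idea (not the thesis's exact algorithm): de-noise the donor-pool matrix with a truncated SVD, learn a linear combination of donors that tracks the treated unit before the intervention, and project the counterfactual afterwards. Data and the rank choice below are invented.

    # SVD-denoised ("robust") synthetic control sketch.
    import numpy as np

    rng = np.random.default_rng(6)
    T0, T1, n_donors = 40, 10, 8
    t = np.arange(T0 + T1)
    donors = np.array([np.sin(t / 6 + k) + 0.1 * rng.normal(size=t.size)
                       for k in range(n_donors)])        # donors x time
    treated = 0.6 * donors[0] + 0.4 * donors[3] + 0.1 * rng.normal(size=t.size)
    treated[T0:] += 2.0                                  # effect after time T0

    # De-noise the donor matrix by keeping the top r singular values (r assumed).
    U, s, Vt = np.linalg.svd(donors, full_matrices=False)
    r = 3
    denoised = (U[:, :r] * s[:r]) @ Vt[:r]

    # Learn weights on the pre-period, project the post-period counterfactual.
    w, *_ = np.linalg.lstsq(denoised[:, :T0].T, treated[:T0], rcond=None)
    counterfactual = denoised[:, T0:].T @ w
    print("estimated post-treatment effect:", (treated[T0:] - counterfactual).mean())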
17

Lin, Winston. "Essays on Causal Inference in Randomized Experiments." Thesis, University of California, Berkeley, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3593906.

Full text
Abstract:

This dissertation explores methodological topics in the analysis of randomized experiments, with a focus on weakening the assumptions of conventional models.

Chapter 1 gives an overview of the dissertation, emphasizing connections with other areas of statistics (such as survey sampling) and other fields (such as econometrics and psychometrics).

Chapter 2 reexamines Freedman's critique of ordinary least squares regression adjustment in randomized experiments. Using Neyman's model for randomization inference, Freedman argued that adjustment can lead to worsened asymptotic precision, invalid measures of precision, and small-sample bias. This chapter shows that in sufficiently large samples, those problems are minor or easily fixed. OLS adjustment cannot hurt asymptotic precision when a full set of treatment-covariate interactions is included. Asymptotically valid confidence intervals can be constructed with the Huber-White sandwich standard error estimator. Checks on the asymptotic approximations are illustrated with data from a randomized evaluation of strategies to improve college students' achievement. The strongest reasons to support Freedman's preference for unadjusted estimates are transparency and the dangers of specification search.
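
The Chapter 2 prescription translates into a few lines of model fitting: OLS with a full set of treatment-by-(centered)-covariate interactions and a Huber-White sandwich variance estimator. A sketch on simulated data, with HC2 chosen here as one common sandwich variant:

    # OLS adjustment with treatment-covariate interactions + sandwich SEs.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n = 1000
    df = pd.DataFrame({"x": rng.normal(size=n), "t": rng.binomial(1, 0.5, n)})
    df["y"] = 1.0 * df.t + 0.8 * df.x + 0.5 * df.t * df.x + rng.normal(size=n)
    df["xc"] = df.x - df.x.mean()          # center the covariate before interacting

    fit = smf.ols("y ~ t * xc", data=df).fit(cov_type="HC2")
    print(fit.params["t"], fit.bse["t"])   # ATE estimate with sandwich SE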

Chapter 3 extends the discussion and analysis of the small-sample bias of OLS adjustment. The leading term in the bias of adjustment for multiple covariates is derived and can be estimated empirically, as was done in Chapter 2 for the single-covariate case. Possible implications for choosing a regression specification are discussed.

Chapter 4 explores and modifies an approach suggested by Rosenbaum for analysis of treatment effects when the outcome is censored by death. The chapter is motivated by a randomized trial that studied the effects of an intensive care unit staffing intervention on length of stay in the ICU. The proposed approach estimates effects on the distribution of a composite outcome measure based on ICU mortality and survivors' length of stay, addressing concerns about selection bias by comparing the entire treatment group with the entire control group. Strengths and weaknesses of possible primary significance tests (including the Wilcoxon-Mann-Whitney rank sum test and a heteroskedasticity-robust variant due to Brunner and Munzel) are discussed and illustrated.

18

Zajonc, Tristan. "Essays on Causal Inference for Public Policy." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10163.

Full text
Abstract:
Effective policymaking requires understanding the causal effects of competing proposals. Relevant causal quantities include proposals' expected effect on different groups of recipients, the impact of policies over time, the potential trade-offs between competing objectives, and, ultimately, the optimal policy. This dissertation studies causal inference for public policy, with an emphasis on applications in economic development and education. The first chapter introduces Bayesian methods for time-varying treatments that commonly arise in economics, health, and education. I present methods that account for dynamic selection on intermediate outcomes and can estimate the causal effect of arbitrary dynamic treatment regimes, recover the optimal regime, and characterize the set of feasible outcomes under different regimes. I demonstrate these methods through an application to optimal student tracking in ninth and tenth grade mathematics. The proposed estimands characterize outcomes, mobility, equity, and efficiency under different tracking regimes. The second chapter studies regression discontinuity designs with multiple forcing variables. Leading examples include education policies where treatment depends on multiple test scores and spatial treatment discontinuities arising from geographic borders. I give local linear estimators for both the conditional effect along the boundary and the average effect over the boundary. For two-dimensional RD designs, I derive an optimal, data-dependent, bandwidth selection rule for the conditional effect. I demonstrate these methods using a summer school and grade retention example. The third chapter illustrates the central role of persistence in estimating and interpreting value-added models of learning. Using data from Pakistani public and private schools, I apply dynamic panel methods that address three key empirical challenges: imperfect persistence, unobserved student heterogeneity, and measurement error. After correcting for these difficulties, the estimates suggest that only a fifth to a half of learning persists between grades and that private schools increase average achievement by 0.25 standard deviations each year. In contrast, value-added models that assume perfect persistence yield severely downwardly biased and occasionally wrong-signed estimates of the private school effect.
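
As a one-dimensional orientation for the RD designs in the second chapter, here is a local linear estimate at a single cutoff with a triangular kernel. The bandwidth is fixed by hand in this sketch, whereas the chapter's contribution includes a data-dependent bandwidth rule; data are simulated.

    # Local linear regression discontinuity estimate (triangular kernel).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    n = 2000
    x = rng.uniform(-1, 1, n)                     # forcing variable, cutoff at 0
    d = (x >= 0).astype(float)
    y = 0.5 * x + 1.0 * d + rng.normal(scale=0.3, size=n)   # true jump of 1.0

    h = 0.3                                       # bandwidth (assumed)
    w = np.clip(1 - np.abs(x) / h, 0, None)       # triangular kernel weights
    X = sm.add_constant(np.column_stack([d, x, d * x]))     # separate slopes
    fit = sm.WLS(y, X, weights=w).fit()
    print("RD estimate of the jump at the cutoff:", fit.params[1])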
19

Brendel, Markus. "Essays on causal inference in corporate finance." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-180823.

Full text
Abstract:
This dissertation provides a kaleidoscope of alternative empirical estimation techniques while illuminating and challenging conventional approaches and established findings in the Corporate Finance literature. In particular, the observed "conglomerate discount" and the effects of diversification and concentrated ownership on firm value are revisited in the course of this cumulative doctoral thesis. The main emphasis lies on the inference of causation in the presence of endogeneity concerns, namely potential distortions caused by unobserved heterogeneity, reverse causality or non-random self-selection.
20

Budhathoki, Kailash [Verfasser]. "Causal inference on discrete data / Kailash Budhathoki." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2020. http://d-nb.info/1226153801/34.

Full text
21

Feller, Avi Isaac. "Essays in Causal Inference and Public Policy." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467344.

Full text
Abstract:
This dissertation addresses statistical methods for understanding treatment effect variation in randomized experiments, both in terms of variation across pre-treatment covariates and variation across post-randomization intermediate outcomes. These methods are then applied to data from the National Head Start Impact Study (HSIS), a large-scale randomized evaluation of the Federally funded preschool program, which has become an important part of the policy debate in early childhood education. Chapter 2 proposes a randomization-based approach for testing for the presence of treatment effect variation not explained by observed covariates. The key challenge in using this approach is the fact that the average treatment effect, generally the object of interest in randomized experiments, actually acts as a nuisance parameter in this setting. We explore potential solutions and advocate for a method that guarantees valid tests in finite samples despite this nuisance. We also show how this method readily extends to testing for heterogeneity beyond a given model, which can be useful for assessing the sufficiency of a given scientific theory. We finally apply this method to the HSIS and find that there is indeed significant unexplained treatment effect variation. Chapter 3 leverages model-based principal stratification to assess treatment effect variation across an intermediate outcome in the HSIS. In particular, we estimate differential impacts of Head Start by alternative care setting, the care that children would receive in the absence of the offer to enroll in Head Start. We find strong, positive short-term effects of Head Start on receptive vocabulary for those Compliers who would otherwise be in home-based care. By contrast, we find no meaningful impact of Head Start on vocabulary for those Compliers who would otherwise be in other center-based care. Our findings suggest that alternative care type is a potentially important source of variation in Head Start. Chapter 4 reviews the literature on the use of principal score methods, which rely on predictive covariates rather than outcomes for estimating principal causal effects. We clarify the role of the Principal Ignorability assumption in this approach and show that there are in fact two versions: Strong and Weak Principal Ignorability. We then explore several methods proposed in the literature and assess their finite-sample properties via simulation. Finally, we propose some extensions to the case of two-sided noncompliance and apply these ideas to the HSIS, finding mixed results.
22

Garcia, Horton Viviana. "Topics in Bayesian Inference for Causal Effects." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:23845483.

Full text
Abstract:
This manuscript addresses two topics in Bayesian inference for causal effects. 1) Treatment noncompliance is frequent in clinical trials, and because the treatment actually received may be different from that assigned, comparisons between groups as randomized will no longer assess the effect of the treatment received. To address this complication, we create latent subgroups based on the potential outcomes of treatment received and focus on the subgroup of compliers, where under certain assumptions the estimands of causal effects of assignment can be interpreted as causal effects of receipt of treatment. We propose estimands of causal effects for right-censored time-to-event endpoints, and discuss a framework to estimate those causal effects that relies on modeling survival times as parametric functions of pre-treatment variables. We demonstrate a Bayesian estimation strategy that multiply imputes the missing data using posterior predictive distributions, using a randomized clinical trial involving breast cancer patients. Finally, we establish a connection with the commonly used parametric proportional hazards and accelerated failure time models, and briefly discuss the consequences of relaxing the assumption of independent censoring. 2) Bayesian inference for causal effects based on data obtained from ignorable assignment mechanisms can be sensitive to the model specified for the data. Ignorability is defined with respect to specific models for an assignment mechanism and data, which we call the "true" generating data models, generally unknown to the statistician; these, in turn, determine a true posterior distribution for a causal estimand of interest. On the other hand, the statistician poses a set of models to conduct the analysis, which we call the "statistician's" models; a posterior distribution for the causal estimand can be obtained assuming these models. Let Δ_M denote the difference between the true models and the statistician's models, and let Δ_D denote the difference between the true posterior distribution and the statistician's posterior distribution (for a specific estimand). For fixed Δ_M and fixed sample size, Δ_D varies more with data-dependent assignment mechanisms than with data-free assignment mechanisms. We illustrate this through a sequence of examples of Δ_M, and under various ignorable assignment mechanisms, namely, the complete randomization design, the rerandomization design, and the finite selection model design. In each case, we create the 95% posterior interval for an estimand under a statistician's model, and then compute its coverage probability for the correct posterior distribution; this Bayesian coverage probability is our choice of measure Δ_D. The objective of these examples is to provide insights into the ranges of data models for which Bayesian inference for causal effects from datasets obtained through ignorable assignment mechanisms is approximately valid from the Bayesian perspective, and how these validities are influenced by data-dependent assignment mechanisms.
23

Bailey, Delia Ruth Grigg. "Essays on causal inference and political representation." Diss., Pasadena, Calif.: California Institute of Technology, 2007. http://resolver.caltech.edu/CaltechETD:etd-05242007-154102.

Full text
24

García, Núñez Luis. "Econometría de evaluación de impacto." Economía, 2012. http://repositorio.pucp.edu.pe/index/handle/123456789/117180.

Full text
Abstract:
In recent years, program evaluation methods have become very popular in applied microeconomics. However, the variety of these methods responds to specific problems, which are normally determined by the data available and the impact the researcher tries to measure. This paper summarizes the main methods in the current literature, emphasizing the assumptions under which the average treatment effect and the average treatment effect on the treated are identified. Additionally, after each section I briefly present some applications of these methods. This document is a didactic presentation for advanced students in economics and applied researchers who wish to learn the basics of these techniques.
25

Andric, Nikola. "Exploring Objective Causal Inference in Case-Noncase Studies under the Rubin Causal Model." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467481.

Full text
Abstract:
Case-noncase studies, also known as case-control studies, are ubiquitous in epidemiology, where a common goal is to estimate the effect of an exposure on an outcome of interest. In many areas of application, such as policy-informing drug utilization research, this effect is inherently causal. Although logistic regression, the predominant method for analysis of case-noncase data, and other traditional methodologies, may provide associative insights, they are generally inappropriate for causal conclusions. As such, they fail to address the very essence of many epidemiological investigations that employ them. In addition, these methodologies do not allow for outcome-free design (Rubin, 2007) of case-noncase data, which compromises the objectivity of resulting inferences. This thesis is directed at exploring what can be done to preserve objectivity in the causal analysis of case-noncase study data. It is structured as follows. In Chapter 1 we introduce a formal framework for studying causal effects from case-noncase data, which builds upon the well-established Rubin Causal Model for prospective studies. In Chapter 2 we propose a two-party, three-step methodology — PrepDA — for objective causal inference with case-noncase data. We illustrate the application of our methodology in a simple non-trivial setting. Its operating characteristics are investigated via simulation, and compared to those of logistic and probit regression. Chapter 3 focuses on the re-analysis of a subset of data from a published article, Karkouti et al. (2006). We investigate whether PrepDA and logistic regression, when applied to case-noncase data, can generate estimates that are concordant with those from the causal analysis of prospectively collected data. We introduce tools for covariate balance assessment across multiple imputed datasets. We explore the potential for analyst bias with logistic regression, when said method is used to analyze case-noncase data. In Chapter 4 we discuss our technology’s advantages over, and drawbacks as compared to, traditional approaches.
Statistics
26

Echtermeyer, Christoph. "Causal pattern inference from neural spike train data." Thesis, St Andrews, 2009. http://hdl.handle.net/10023/843.

Full text
27

Lundin, Mathias. "Sensitivity Analysis of Untestable Assumptions in Causal Inference." Doctoral thesis, Umeå universitet, Statistiska institutionen, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-43239.

Full text
Abstract:
This thesis contributes to the research field of causal inference, which is concerned with the effect of a treatment on an outcome. Many such effects cannot be estimated through randomised experiments. For example, the effect of higher education on future income needs to be estimated using observational data. In the estimation, assumptions are made to make individuals who get higher education comparable with those not getting higher education, so that the effect becomes estimable. Another assumption often made in causal inference (in both randomised and nonrandomised studies) is that the treatment received by one individual has no effect on the outcomes of others. If this assumption is not met, the meaning of the causal effect of the treatment may be unclear. In the first paper, the effect of college choice on income is investigated using Swedish register data, by comparing graduates from old and new Swedish universities. A semiparametric method of estimation is used, thereby relaxing functional assumptions on the data. One assumption often made in causal inference in observational studies is that individuals in different treatment groups are comparable, given that a set of pretreatment variables has been adjusted for in the analysis. This so-called unconfoundedness assumption is in principle not possible to test, and therefore, in the second paper, we propose a Bayesian sensitivity analysis of the unconfoundedness assumption. This analysis is then performed on the results from the first paper. In the third paper of the thesis, we study profile likelihood as a tool for semiparametric estimation of the causal effect of a treatment. A semiparametric version of the Bayesian sensitivity analysis of the unconfoundedness assumption proposed in Paper II is also performed using profile likelihood. The last paper of the thesis is concerned with the estimation of direct and indirect causal effects of a treatment where interference between units is present, i.e., where the treatment of one individual affects the outcomes of other individuals. We give unbiased estimators of these direct and indirect effects for situations where treatment probabilities vary between individuals. We also illustrate in a simulation study how direct and indirect causal effects can be estimated when treatment probabilities need to be estimated using background information on individuals.
28

Ramsahai, Roland Ryan. "Causal inference with instruments and other supplementary variables." Thesis, University of Oxford, 2008. http://ora.ox.ac.uk/objects/uuid:df2961da-0843-421f-8be4-66a92e6b0d13.

Full text
Abstract:
Instrumental variables have been used for a long time in the econometrics literature for the identification of the causal effect of one random variable, B, on another, C, in the presence of unobserved confounders. In the classical continuous linear model, the causal effect can be point identified by studying the regression of C on A and B on A, where A is the instrument. An instrument is an instance of a supplementary variable which is not of interest in itself but aids identification of causal effects. The method of instrumental variables is extended here to generalised linear models, for which only bounds on the causal effect can be computed. For the discrete instrumental variable model, bounds have been derived in the literature for the causal effect of B on C in terms of the joint distribution of (A,B,C). Using an approach based on convex polytopes, bounds are computed here in terms of the pairwise (A,B) and (A,C) distributions, in direct analogy to the classic use but without the linearity assumption. The bounding technique is also adapted to instrumental models with stronger and weaker assumptions. The computation produces constraints which can be used to invalidate the model. In the literature, constraints of this type are usually tested by checking whether the relative frequencies satisfy them. This is unsatisfactory from a statistical point of view as it ignores the sampling uncertainty of the data. Given the constraints for a model, a proper likelihood analysis is conducted to develop a significance test for the validity of the instrumental model and a bootstrap algorithm for computing confidence intervals for the causal effect. Applications are presented to illustrate the methods and the advantage of a rigorous statistical approach. The use of covariates and intermediate variables for improving the efficiency of causal estimators is also discussed.
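
The classical linear point identification that the abstract takes as its starting point, recovering the effect of B on C from the regressions of C on A and of B on A, is the Wald ratio. A simulated check with an unobserved confounder U (data and coefficients invented):

    # Wald / instrumental-variable ratio estimate on simulated data.
    import numpy as np

    rng = np.random.default_rng(9)
    n = 20000
    A = rng.binomial(1, 0.5, n)                   # instrument
    U = rng.normal(size=n)                        # unobserved confounder
    B = 0.7 * A + U + rng.normal(size=n)          # exposure
    C = 1.5 * B + 2.0 * U + rng.normal(size=n)    # outcome; true effect 1.5

    varA = np.var(A, ddof=1)
    first_stage = np.cov(A, B)[0, 1] / varA       # regression of B on A
    reduced_form = np.cov(A, C)[0, 1] / varA      # regression of C on A
    print("naive OLS:", np.cov(B, C)[0, 1] / np.var(B, ddof=1))  # confounded
    print("IV (Wald) estimate:", reduced_form / first_stage)     # ~ 1.5

Without the linearity assumption, as the abstract notes, only bounds on the causal effect remain available, which is what the convex-polytope computations in the thesis deliver.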
29

Gong, Zhaojing. "Parametric Potential-Outcome Survival Models for Causal Inference." Thesis, University of Canterbury. Mathematics and Statistics, 2008. http://hdl.handle.net/10092/1803.

Full text
Abstract:
Estimating causal effects in clinical trials is often complicated by treatment noncompliance and missing outcomes. In time-to-event studies, estimation is further complicated by censoring. Censoring is a type of missing outcome, the mechanism of which may be non-ignorable. While new estimates have recently been proposed to account for noncompliance and missing outcomes, few studies have specifically considered time-to-event outcomes, where even the intention-to-treat (ITT) estimator is potentially biased for estimating causal effects of assigned treatment. In this thesis, we develop a series of parametric potential-outcome (PPO) survival models, for the analysis of randomised controlled trials (RCT) with time-to-event outcomes and noncompliance. Both ignorable and non-ignorable censoring mechanisms are considered. We approach model-fitting from a likelihood-based perspective, using the EM algorithm to locate maximum likelihood estimators. We are not aware of any previous work that addresses these complications jointly. In addition, we give new formulations for the average causal effect (ACE) and the complier average causal effect (CACE) to suit survival analysis. To illustrate the likelihood-based method proposed in this thesis, the HIP breast cancer trial data (Baker, 1998; Shapiro, 1988) were re-analysed using specific PPO-survival models, the Weibull and log-normal based PPO-survival models, which assume that the failure time and censored time distributions both follow Weibull or log-normal distributions. Furthermore, an extended PPO-survival model is also derived in this thesis, which permits investigation into the impact of causal effect after accommodating certain pre-treatment covariates. This is an important contribution to the potential outcomes, survival and RCT literature. For comparison, the Frangakis-Rubin (F-R) model (Frangakis and Rubin, 1999) is also applied to the HIP breast cancer trial data. To date, the F-R model has not yet been applied to any time-to-event data in the literature.
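
As one minimal ingredient of the Weibull-based PPO-survival models, a plain Weibull accelerated failure time fit with a treatment covariate can look as follows; the potential-outcome, noncompliance and censoring-mechanism machinery of the thesis is omitted, the data are simulated, and the sketch assumes the lifelines package is installed.

    # Weibull AFT fit with a treatment covariate (ignorable censoring assumed).
    import numpy as np
    import pandas as pd
    from lifelines import WeibullAFTFitter

    rng = np.random.default_rng(13)
    n = 1000
    treat = rng.binomial(1, 0.5, n)
    t_event = rng.weibull(1.5, n) * np.exp(0.5 * treat)   # treatment prolongs time
    t_cens = rng.uniform(0, 3, n)                         # independent censoring
    df = pd.DataFrame({
        "time": np.minimum(t_event, t_cens),
        "event": (t_event <= t_cens).astype(int),
        "treat": treat,
    })

    aft = WeibullAFTFitter().fit(df, duration_col="time", event_col="event")
    aft.print_summary()                   # treat coefficient ~ 0.5 on the log scale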
30

Arcangeloni, Luca. "Causal Inference for Jamming Detection in Adverse Scenarios." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
The goal of this thesis is the development of an anti-jamming defense mechanism based on causal inference. The current state-of-the-art methods for computing causality, i.e., Granger Causality (GC), Transfer Entropy (TE) and Convergent Cross Mapping (CCM), are presented and used to detect a smart jammer in an appropriate simulation environment. The performance of the causality tools is evaluated, showing that TE obtains the best results while GC fails to detect the intruder. The innovative CCM algorithm, by contrast, requires a deterministic communication structure in order to function. In the first part of the work, before the simulation environment is implemented, the three methods are compared to highlight their theoretical advantages and disadvantages.
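
A sketch of one of the three detectors compared in the thesis, a Granger causality test between two observed signals, using the statsmodels implementation. The signal names (a channel-quality series and a jammer-activity proxy) and the data-generating process are invented for illustration; in the thesis this is the tool that fails, while transfer entropy performs best.

    # Granger causality test: does the second series help predict the first?
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(10)
    n = 500
    jammer = rng.normal(size=n)                  # hypothetical driver signal
    channel = np.zeros(n)
    for t in range(2, n):                        # channel degrades after jamming
        channel[t] = 0.5 * channel[t - 1] - 0.8 * jammer[t - 1] + 0.3 * rng.normal()

    # Column order matters: the test asks whether column 2 Granger-causes column 1.
    data = np.column_stack([channel, jammer])
    res = grangercausalitytests(data, maxlag=2)
    f_stat, p_value, _, _ = res[1][0]["ssr_ftest"]
    print(f"lag-1 F = {f_stat:.1f}, p = {p_value:.2g}")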
31

Lu, Danni. "Representation Learning Based Causal Inference in Observational Studies." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/102426.

Full text
Abstract:
This dissertation investigates novel statistical approaches for causal effect estimation in observational settings, where controlled experimentation is infeasible and confounding is the main hurdle in estimating causal effects. As such, deconfounding is the main subject of this dissertation: (i) restoring the covariate balance between treatment groups, and (ii) attenuating spurious correlations in training data to derive valid causal conclusions that generalize. By incorporating ideas from representation learning, adversarial matching, generative causal estimation, and invariant risk modeling, this dissertation establishes a causal framework that balances the covariate distribution in latent representation space to yield individualized estimations, and further contributes novel perspectives on causal effect estimation based on invariance principles. The dissertation begins with a systematic review and examination of classical propensity score based balancing schemes for population-level causal effect estimation, presented in Chapter 2. Three causal estimands that target different foci in the population are considered: the average treatment effect on the whole population (ATE), the average treatment effect on the treated population (ATT), and the average treatment effect on the overlap population (ATO). The procedure is demonstrated in a naturalistic driving study (NDS) to evaluate the causal effect of cellphone distraction on crash risk. While highlighting the importance of adopting causal perspectives in analyzing risk factors, discussions on the limitations in balance efficiency, robustness against high-dimensional data and complex interactions, and the need for individualization are provided to motivate subsequent developments. Chapter 3 presents a novel generative Bayesian causal estimation framework named Balancing Variational Neural Inference of Causal Effects (BV-NICE). Via appealing to the Robinson factorization and a latent Bayesian model, a novel variational bound on likelihood is derived, explicitly characterized by the causal effect and propensity score. Notably, by treating observed variables as noisy proxies of unmeasurable latent confounders, the variational posterior approximation is re-purposed as a stochastic feature encoder that fully acknowledges representation uncertainties. To resolve the imbalance in representations, BV-NICE enforces KL-regularization on the respective representation marginals using Fenchel mini-max learning, justified by a new generalization bound on the counterfactual prediction accuracy. The robustness and effectiveness of this framework are demonstrated through an extensive set of tests against competing solutions on semi-synthetic and real-world datasets. In recognition of the reliability issues that arise when extending causal conclusions beyond training distributions, Chapter 4 argues that ascertaining causal stability is the key and introduces a novel procedure called Risk Invariant Causal Estimation (RICE). By carefully re-examining the relationship between statistical invariance and causality, RICE leverages the observed data disparities to enable the identification of stable causal effects. Concretely, the causal inference objective is reformulated under the framework of invariant risk modeling (IRM), where a population-optimality penalty is enforced to filter out un-generalizable effects across heterogeneous populations. Importantly, RICE allows settings where counterfactual reasoning with unobserved confounding or biased sampling designs becomes feasible. The effectiveness of this new proposal is verified with respect to a variety of study designs on real and synthetic data. In summary, this dissertation presents a flexible causal inference framework that acknowledges representation uncertainties and data heterogeneities. It enjoys three merits: improved balance in the face of complex covariate interactions, enhanced robustness to unobservable latent confounders, and better generalizability to novel populations.
Reasoning about cause and effect is an innate human ability. While the drive to understand cause and effect is instinctive, the rigorous reasoning process is usually trained through the observation of countless trials and failures. In this dissertation, we embark on a journey to explore various principles and novel statistical approaches for causal inference in observational studies. Throughout the dissertation, we focus on causal effect estimation, which answers questions like "what if" and "what could have happened". The causal effect of a treatment is measured by comparing the outcomes corresponding to different treatment levels of the same unit, e.g., "what if the unit is treated instead of not treated?". The challenge lies in the fact that (i) a unit only receives one treatment at a time, and therefore it is impossible to directly compare outcomes of different treatment levels; and (ii) comparing the outcomes across different units may involve bias due to confounding, as the treatment assignment potentially follows a systematic mechanism. Deconfounding therefore constitutes the main hurdle in estimating causal effects. This dissertation presents two parallel principles of deconfounding: (i) balancing, i.e., comparing differences under similar conditions; and (ii) contrasting, i.e., extracting invariance under heterogeneous conditions. Chapters 2 and 3 explore causal effects through balancing: the former systematically reviews a classical propensity score weighting approach in a conventional data setting, while the latter presents a novel generative Bayesian framework named Balancing Variational Neural Inference of Causal Effects (BV-NICE) for high-dimensional, complex, and noisy observational data, incorporating advanced deep learning techniques for representation learning, adversarial learning, and variational inference. The robustness and effectiveness of the proposed framework are demonstrated through an extensive set of experiments. Chapter 4 extracts causal effects through contrasting, emphasizing that ascertaining stability is the key to causality. A novel causal effect estimation procedure called Risk Invariant Causal Estimation (RICE) is proposed, which leverages observed data disparities to enable the identification of stable causal effects. The improved generalizability of RICE is demonstrated on synthetic data with different structures, compared with state-of-the-art models. In summary, this dissertation presents a flexible causal inference framework that acknowledges data uncertainties and heterogeneities. By promoting two different aspects of causal principles and integrating advanced deep learning techniques, the proposed framework shows improved balance for complex covariate interactions, enhanced robustness to unobservable latent confounders, and better generalizability to novel populations.
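
The three weighting schemes reviewed in Chapter 2 differ only in the weight assigned to each unit as a function of the estimated propensity score e(x). A compact simulated comparison, using the standard textbook formulas rather than anything specific to the dissertation:

    # ATE, ATT and ATO (overlap) propensity-score weighting estimates.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(11)
    n = 5000
    X = rng.normal(size=(n, 2))
    T = rng.binomial(1, 1 / (1 + np.exp(-1.2 * X[:, 0])))
    Y = 1.0 * T + X[:, 0] + rng.normal(size=n)        # true effect is 1.0

    e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
    schemes = {
        "ATE": np.where(T == 1, 1 / e, 1 / (1 - e)),
        "ATT": np.where(T == 1, 1.0, e / (1 - e)),
        "ATO": np.where(T == 1, 1 - e, e),            # overlap weights
    }
    for name, w in schemes.items():
        est = (np.average(Y[T == 1], weights=w[T == 1])
               - np.average(Y[T == 0], weights=w[T == 0]))
        print(name, round(est, 3))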
APA, Harvard, Vancouver, ISO, and other styles
32

Kovach, Matthew. "Causal Inference of Human Resources Key Performance Indicators." Bowling Green State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1542361652897175.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Lee, Joseph Jiazong. "Extensions of Randomization-Based Methods for Causal Inference." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17463974.

Full text
Abstract:
In randomized experiments, the random assignment of units to treatment groups justifies many of the traditional analysis methods for evaluating causal effects. Specifying subgroups of units for further examination after observing outcomes, however, may partially nullify any advantages of randomized assignment when data are analyzed naively. Some previous statistical literature has treated all post-hoc analyses homogeneously as entirely invalid and thus uninterpretable. Alternative analysis methods and the extent of the validity of such analyses remain largely unstudied. Here Chapter 1 proposes a novel, randomization-based method that generates valid post-hoc subgroup p-values, provided we know exactly how the subgroups were constructed. If we do not know the exact subgrouping procedure, our method may still place helpful bounds on the significance level of estimated effects. Chapter 2 extends the proposed methodology to generate valid posterior predictive p-values for partially post-hoc subgroup analyses, i.e., analyses that compare existing experimental data, from which a subgroup specification is derived, to new, subgroup-only data. Both chapters are motivated by pharmaceutical examples in which subgroup analyses played pivotal and controversial roles. Chapter 3 extends our randomization-based methodology to more general randomized experiments with multiple testing and nuisance unknowns. The results are valid familywise tests that are doubly advantageous, in terms of statistical power, over traditional methods. We apply our methods to data from the United States Job Training Partnership Act (JTPA) Study, where our analyses lead to different conclusions regarding the significance of estimated JTPA effects. In all chapters, we investigate the operating characteristics and demonstrate the advantages of our methods through a series of simulations.
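In its simplest form, the randomization-based p-value underlying all three chapters is obtained by re-computing a test statistic over re-randomizations of the treatment labels. A minimal sketch (our illustration; the thesis's subgroup-aware corrections are not shown):

```python
import numpy as np

def randomization_p_value(y, z, n_draws=10_000, seed=0):
    """Two-sided randomization p-value for the sharp null of no effect for any unit."""
    rng = np.random.default_rng(seed)
    y, z = np.asarray(y, float), np.asarray(z, int)
    observed = y[z == 1].mean() - y[z == 0].mean()
    draws = np.empty(n_draws)
    for b in range(n_draws):
        zb = rng.permutation(z)  # mimic the completely randomized assignment
        draws[b] = y[zb == 1].mean() - y[zb == 0].mean()
    return float(np.mean(np.abs(draws) >= abs(observed)))
```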
Statistics
APA, Harvard, Vancouver, ISO, and other styles
34

Ding, Peng. "Exploring the Role of Randomization in Causal Inference." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467349.

Full text
Abstract:
This manuscript includes three topics in causal inference, all of which are under the randomization inference framework (Neyman, 1923; Fisher, 1935a; Rubin, 1978). This manuscript contains three self-contained chapters. Chapter 1. Under the potential outcomes framework, causal effects are defined as comparisons between potential outcomes under treatment and control. To infer causal effects from randomized experiments, Neyman proposed to test the null hypothesis of zero average causal effect (Neyman’s null), and Fisher proposed to test the null hypothesis of zero individual causal effect (Fisher’s null). Although the subtle difference between Neyman’s null and Fisher’s null has caused much controversy and confusion for both theoretical and practical statisticians, a careful comparison between the two approaches has been lacking in the literature for more than eighty years. I fill in this historical gap by making a theoretical comparison between them and highlighting an intriguing paradox that has not been recognized by previous researchers. Logically, Fisher’s null implies Neyman’s null. It is therefore surprising that, in actual completely randomized experiments, rejection of Neyman’s null does not imply rejection of Fisher’s null in many realistic situations, including the case with constant causal effect. Furthermore, I show that this paradox also exists in other commonly-used experiments, such as stratified experiments, matched-pair experiments, and factorial experiments. Asymptotic analyses, numerical examples, and real data examples all support this surprising phenomenon. Besides its historical and theoretical importance, this paradox also leads to useful practical implications for modern researchers. Chapter 2. Causal inference in completely randomized treatment-control studies with binary outcomes is discussed from Fisherian, Neymanian and Bayesian perspectives, using the potential outcomes framework. A randomization-based justification of Fisher’s exact test is provided. Arguing that the crucial assumption of constant causal effect is often unrealistic, and holds only for extreme cases, some new asymptotic and Bayesian inferential procedures are proposed. The proposed procedures exploit the intrinsic non-additivity of unit-level causal effects, can be applied to linear and non-linear estimands, and dominate the existing methods, as verified theoretically and also through simulation studies. Chapter 3. Recent literature has underscored the critical role of treatment effect variation in estimating and understanding causal effects. This approach, however, is in contrast to much of the foundational research on causal inference; Neyman, for example, avoided such variation through his focus on the average treatment effect and his definition of the confidence interval. In this chapter, I extend the Neymanian framework to explicitly allow both for treatment effect variation explained by covariates, known as the systematic component, and for unexplained treatment effect variation, known as the idiosyncratic component. This perspective enables estimation and testing of impact variation without imposing a model on the marginal distributions of potential outcomes, with the workhorse approach of regression with interaction terms being a special case. My approach leads to two practical results.
First, I combine estimates of systematic impact variation with sharp bounds on overall treatment variation to obtain bounds on the proportion of total impact variation explained by a given model; this is essentially an R² for treatment effect variation. Second, by using covariates to partially account for the correlation-of-potential-outcomes problem, I exploit this perspective to sharpen the bounds on the variance of the average treatment effect estimate itself. As long as the treatment effect varies across observed covariates, the resulting bounds are sharper than the current sharp bounds in the literature. I apply these ideas to a large randomized evaluation in educational research, showing that these results are meaningful in practice.
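For reference, the Neymanian quantities the manuscript starts from, the difference-in-means estimate and Neyman's conservative variance bound (which the chapter sharpens using covariates), can be sketched as follows; this is our illustration, not code from the manuscript:

```python
import numpy as np

def neyman_ate(y, z):
    """Difference in means with Neyman's conservative variance bound and a 95% CI."""
    y, z = np.asarray(y, float), np.asarray(z, int)
    y1, y0 = y[z == 1], y[z == 0]
    tau_hat = y1.mean() - y0.mean()
    # The cross-world covariance of potential outcomes is unidentifiable; Neyman's
    # bound drops it, which is conservative for the true sampling variance.
    var_bound = y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size
    half_width = 1.96 * np.sqrt(var_bound)
    return tau_hat, (tau_hat - half_width, tau_hat + half_width)
```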
Statistics
APA, Harvard, Vancouver, ISO, and other styles
35

Burauel, Patrick [Verfasser]. "Essays on Methods for Causal Inference / Patrick Burauel." Berlin : Freie Universität Berlin, 2020. http://d-nb.info/1218077816/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

MOSCELLI, GIUSEPPE. "Essays on causal inference and applied health economics." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2013. http://hdl.handle.net/2108/207907.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Hajage, David. "Utilisation du score de propension et du score pronostique en pharmacoépidémiologie." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC175/document.

Full text
Abstract:
Pharmacoepidemiologic observational studies are often conducted to evaluate newly marketed drugs or drugs in competition with many therapeutic alternatives. In such cohort studies, the exposure of interest is rare. To account for confounding factors in this setting, some authors advise against the use of the propensity score in favor of the prognostic score, but this recommendation is not supported by any study specifically focused on infrequent exposures, and it ignores whether each prognostic score-based method yields a conditional or a marginal estimate. The first part of this work evaluates propensity score-based methods for estimating the marginal effect of a rare exposure. The second part evaluates the performance of the prognostic score-based methods already reported in the literature, compares them with the propensity score-based methods, and introduces new prognostic score-based methods intended to estimate conditional or marginal effects. The last part deals with variance estimators of the treatment effect. We present the consequences of ignoring the estimation step of the propensity score and of the prognostic score when computing the variance, and we propose and evaluate new variance estimators that account for this step.
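The two scores at the heart of the thesis can be sketched as follows (a simplified Python illustration under logistic/linear working models, with 1-nearest-neighbour matching on the prognostic score; function names are ours, not the thesis's):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def fit_scores(X, z, y):
    """Propensity score e(x) = P(Z=1|X) and prognostic score g(x) = E[Y|X, Z=0]."""
    e = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
    g = LinearRegression().fit(X[z == 0], y[z == 0]).predict(X)  # fitted on controls
    return e, g

def att_by_prognostic_matching(y, z, g):
    """1-NN matching of treated units to controls on the prognostic score."""
    treated = np.flatnonzero(z == 1)
    controls = np.flatnonzero(z == 0)
    nearest = controls[np.abs(g[treated][:, None] - g[controls][None, :]).argmin(axis=1)]
    return y[treated].mean() - y[nearest].mean()
```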
APA, Harvard, Vancouver, ISO, and other styles
38

Hamada, Sophie Rym. "Analyse de la prise en charge des patients traumatisés sévères dans le contexte français : processus de triage et processus de soin." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS572.

Full text
Abstract:
In France, trauma ranks third among causes of disability-adjusted life years lost, which makes it a public health challenge. However, investment in trauma care and dedicated research falls short of this challenge and of the associated societal and economic impact. The purpose of this research was to explore the core of the pathway of the severely injured patient, bring to light three key issues, and attempt to answer the questions they raise. The data used in this research were mainly extracted from a regional and national trauma registry, the Traumabase®, which collects epidemiological, clinical, paraclinical and therapeutic variables for patients with severe trauma admitted to participating trauma centres. The first project focused on the initial orientation (triage) of patients with severe trauma following a road traffic accident in the Île-de-France region and its effect on mortality. Patients who were initially undertriaged and then transferred to regional trauma centres did not have a worse prognosis than patients who were transported directly: the emergency medical system as a whole ensured an equivalent outcome for them. A population-level analysis, carried out through probabilistic linkage of the data with the accident records of the national road safety observatory, made it possible to estimate the undertriage rate leading to death in the region (0.15%) and to reveal that 60% of deaths occurred before any hospital admission. The second project developed a pragmatic pre-alert tool, the Red Flag, based on simple clinical prehospital criteria to predict acute hemorrhage in trauma patients, so that the receiving hospital trauma team can anticipate the admission of the most severe patients and activate a specific hemorrhage pathway. The study identified five criteria (shock index > 1, mean blood pressure < 70 mmHg, capillary hemoglobin < 13 g/dL, unstable pelvis and prehospital intubation); when two or more were present, the alert for the receiving hospital should be activated. This tool requires prospective validation to confirm its performance and to assess its impact on care organisation and patient outcome. The third research project focused on a therapeutic component of trauma-induced coagulopathy. The study attempted to quantify the effect of fibrinogen concentrate administration in the early phase of traumatic hemorrhagic shock (first 6 hours) on 24-hour all-cause mortality using a causal inference approach (propensity score and doubly robust estimation). The research did not demonstrate a significant effect on mortality (observed risk difference: -0.031, 95% confidence interval [-0.084; 0.021]); a lack of power might be responsible for this result. Together, these three projects answered targeted questions along the severe trauma patient's pathway and opened new analytical perspectives to better frame responses in the field.
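The Red Flag rule itself is simple enough to transcribe directly from the abstract (thresholds as reported above; argument names are our own):

```python
def red_flag_alert(shock_index, mean_bp, capillary_hb, unstable_pelvis, intubated):
    """True when at least two of the five pre-alert criteria are met."""
    criteria = (
        shock_index > 1.0,      # heart rate / systolic blood pressure
        mean_bp < 70.0,         # mean arterial pressure, mmHg
        capillary_hb < 13.0,    # capillary haemoglobin, g/dL
        bool(unstable_pelvis),  # clinically unstable pelvis
        bool(intubated),        # prehospital intubation
    )
    return sum(criteria) >= 2
```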
APA, Harvard, Vancouver, ISO, and other styles
39

Häggström, Jenny. "Selection of smoothing parameters with application in causal inference." Doctoral thesis, Umeå universitet, Statistiska institutionen, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-39614.

Full text
Abstract:
This thesis is a contribution to the research area concerned with the selection of smoothing parameters in the framework of nonparametric and semiparametric regression. Selection of smoothing parameters is one of the most important issues in this framework, and the choice can heavily influence subsequent results. A nonparametric or semiparametric approach is often desirable when large datasets are available, since this allows us to make fewer and weaker assumptions than a parametric approach requires. In the first paper we consider smoothing parameter selection in nonparametric regression when the purpose is to accurately predict future or unobserved data. We study the use of accumulated prediction errors and make comparisons to leave-one-out cross-validation, which is widely used by practitioners. In the second paper a general semiparametric additive model is considered and the focus is on selection of smoothing parameters when optimal estimation of some specific parameter is of interest. We introduce a double smoothing estimator of a mean squared error and propose to select smoothing parameters by minimizing this estimator. Our approach is compared with existing methods. The third paper is concerned with the selection of smoothing parameters optimal for estimating average treatment effects defined within the potential outcome framework. For this estimation problem we propose double smoothing methods similar to the method proposed in the second paper. Theoretical properties of the proposed methods are derived and comparisons with existing methods are made by simulations. In the last paper we apply our results from the third paper by using a double smoothing method for selecting smoothing parameters when estimating average treatment effects on the treated. We estimate the effect on BMI of divorcing in middle age. Rich data on socioeconomic conditions, health and lifestyle from Swedish longitudinal registers are used.
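As a point of reference for the first paper, leave-one-out cross-validation for a kernel smoother, the practitioner's default against which accumulated prediction errors are compared, can be sketched as follows (an illustrative Nadaraya-Watson implementation, not the thesis's code):

```python
import numpy as np

def nw_predict(x_train, y_train, x_eval, h):
    """Nadaraya-Watson regression with a Gaussian kernel and bandwidth h."""
    k = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return k @ y_train / k.sum(axis=1)

def loo_cv_bandwidth(x, y, grid):
    """Pick the bandwidth minimising the leave-one-out squared prediction error."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    scores = []
    for h in grid:
        errs = [(y[i] - nw_predict(np.delete(x, i), np.delete(y, i),
                                   x[i:i + 1], h)[0]) ** 2
                for i in range(len(x))]
        scores.append(np.mean(errs))
    return grid[int(np.argmin(scores))]
```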
APA, Harvard, Vancouver, ISO, and other styles
40

Waernbaum, Ingeborg. "Covariate selection and propensity score specification in causal inference." Doctoral thesis, Umeå : Umeå universitet, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1688.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Oelrich, Oscar. "Causal Inference Using Propensity Score Matching in Clustered Data." Thesis, Uppsala universitet, Statistiska institutionen, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-225990.

Full text
Abstract:
Propensity score matching is commonly used to estimate causal effects of treatments. However, when using data with a hierarchical structure, we need to take the multilevel nature of the data into account. In this thesis the estimation of propensity scores with multilevel models is presented to extend propensity score matching for use with multilevel data. A Monte Carlo simulation study is performed to evaluate several different estimators. It is shown that propensity score estimators ignoring the multilevel structure of the data are biased, while fixed effects models produce unbiased results. An empirical study of the causal effect of truancy on mathematical ability for Swedish 9th graders is also performed, where it is shown that truancy has a negative effect on mathematical ability.
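The fixed-effects specification that the simulation study finds unbiased amounts to giving every cluster its own intercept in the propensity model; a minimal sketch (column names are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fixed_effects_propensity(df, covariates, cluster_col, treat_col):
    """Logistic propensity model with one dummy per cluster (fixed effects)."""
    X = pd.get_dummies(df[covariates + [cluster_col]],
                       columns=[cluster_col], drop_first=True)
    model = LogisticRegression(max_iter=5000, C=1e6)  # large C ~ unpenalised fit
    model.fit(X, df[treat_col])
    return model.predict_proba(X)[:, 1]

# Hypothetical usage, e.g. for pupils nested in schools:
# ps = fixed_effects_propensity(pupils, ["ses", "grade8_score"], "school", "truant")
```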
APA, Harvard, Vancouver, ISO, and other styles
42

Geneletti, Sara Gisella. "Aspects of causal inference in a non-counterfactual framework." Thesis, University College London (University of London), 2005. http://discovery.ucl.ac.uk/1445505/.

Full text
Abstract:
Since the mid-1970s and increasingly over the last decade, causal inference has generated interest and controversy in statistics. Mathematical frameworks have been developed to make causal inference in fields ranging from epidemiology to social science. However, most frameworks rely on the existence of counterfactuals, and the assumptions that underpin them are not always made explicit. This thesis analyses such assumptions and proposes an alternative model. This is then used to tackle problems that have been formulated in counterfactual terms. The proposed framework is based on decision theory. Causes are seen in terms of interventions, which in turn are seen as decisions. Decisions are thus explicitly included as intervention variables, both in algebraic expressions for causal effects and in the DAGs which represent the probabilistic structure between the variables. The non-counterfactual framework introduces a novel way of determining whether causal quantities are identifiable. Two such quantities are considered and conditions for their identification are presented. These are the direct effect of treatment on response in the presence of a mediating variable, and the effect of treatment on the treated. To determine whether these are identifiable, intervention nodes are introduced on the variables that are thought to be causal in the problem. By manipulating the conditional independences between the observed variables and the intervention nodes it is possible to determine whether the quantities of interest can be expressed in terms of a) specific settings and/or b) the idle setting of the intervention nodes, corresponding to the experimental and observational regimes of the causal variables. This method can be easily tailored to any specific context, as it relies only on the understanding of conditional independences.
APA, Harvard, Vancouver, ISO, and other styles
43

Ding, Jiacheng. "Causal Inference based Fault Localization for Python Numerical Programs." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1530904294580033.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Sun, BaoLuo. "Semi-Parametric Methods for Missing Data and Causal Inference." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493594.

Full text
Abstract:
In this dissertation, we propose methodology to account for missing data as well as a strategy to account for outcome heterogeneity. Missing data occur frequently in empirical studies in the health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables, the missing data mechanism still depends on the unobserved outcome. In such settings, identification is generally not possible without imposing additional assumptions. Identification is sometimes possible, however, if an exogenous instrumental variable (IV) is observed for all subjects such that it satisfies the exclusion restriction that the IV affects the missingness process without directly influencing the outcome. In chapter 1, we provide necessary and sufficient conditions for nonparametric identification of the full data distribution under MNAR with the aid of an IV. In addition, we give sufficient identification conditions that are more straightforward to verify in practice. For inference, we focus on estimation of a population outcome mean, for which we develop a suite of semiparametric estimators that extend methods previously developed for data missing at random. Specifically, we propose inverse probability weighted estimation, outcome regression based estimation and doubly robust estimation of the mean of an outcome subject to MNAR. For illustration, the methods are used to account for selection bias induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer characteristics such as gender, age and years of experience as IVs. The development of coherent missing data models to account for nonmonotone missing at random (MAR) data by inverse probability weighting (IPW) remains to date largely unresolved. As a consequence, IPW has essentially been restricted for use only in monotone MAR settings. In chapter 2, we propose a class of models for nonmonotone missing data mechanisms that spans the MAR model, while allowing the underlying full data law to remain unrestricted. For parametric specifications within the proposed class, we introduce an unconstrained maximum likelihood estimator for estimating the missing data probabilities which is easily implemented using existing software. To circumvent potential convergence issues with this procedure, we also introduce a constrained Bayesian approach to estimate the missing data process which is guaranteed to yield inferences that respect all model restrictions. The efficiency of standard IPW estimation is improved by incorporating information from incomplete cases through an augmented estimating equation which is optimal within a large class of estimating equations. We investigate the finite-sample properties of the proposed estimators in extensive simulations and illustrate the new methodology in an application evaluating key correlates of preterm delivery for infants born to HIV-infected mothers in Botswana, Africa. When a risk factor affects certain categories of a multinomial outcome but not others, outcome heterogeneity is said to be present. A standard epidemiologic approach for modeling risk factors of a categorical outcome typically entails fitting a polytomous logistic regression via maximum likelihood estimation. In chapter 3, we show that standard polytomous regression is ill-equipped to detect outcome heterogeneity, and will generally understate the degree to which such heterogeneity may be present.
Specifically, nonsaturated polytomous regression will often a priori rule out the possibility of outcome heterogeneity from its parameter space. As a remedy, we propose to model each category of the outcome as a separate binary regression. For full efficiency, we propose to estimate the collection of regression parameters jointly by a constrained Bayesian approach which ensures that one remains within the multinomial model. The approach is straightforward to implement in standard software for Bayesian estimation.
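The inverse probability weighting idea that runs through the first two chapters can be sketched, in its simplest missing-at-random form, as follows (our simplified illustration; the chapters' nonmonotone and MNAR/IV machinery is substantially more involved):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_outcome_mean(y, r, X):
    """Estimate E[Y] when y is observed only where r == 1, given covariates X."""
    pi = LogisticRegression(max_iter=1000).fit(X, r).predict_proba(X)[:, 1]
    obs = r.astype(bool)
    # Hajek-normalised weights: complete cases re-weighted by 1 / P(observed | X).
    return float(np.sum(y[obs] / pi[obs]) / np.sum(1.0 / pi[obs]))
```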
Biostatistics
APA, Harvard, Vancouver, ISO, and other styles
45

Liu, Jinzhong. "Bayesian Inference for Treatment Effect." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1504803668961964.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Majid, Asifa. "Language and causal understanding : there's something about Mary." Thesis, University of Glasgow, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.366213.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Béal, Jonas. "De la modélisation mécanistique des voies de signalisation dans le cancer à l’interprétation des modèles et de leurs apports : applications cliniques et évaluation statistique." Electronic Thesis or Diss., Université Paris sciences et lettres, 2020. https://theses.hal.science/tel-03188676.

Full text
Abstract:
Beyond its genetic mechanisms, cancer can be understood as a network disease that often results from interactions between different perturbations in a cellular regulatory network. The dynamics of these networks and of the associated signaling pathways are complex and require integrated approaches. One such approach is the design of mechanistic models that translate the biological knowledge of networks into mathematical terms in order to simulate the molecular features of cancers computationally. However, such models only reflect the general mechanisms at work in cancers. This thesis proposes to define personalized mechanistic models of cancer. A generic model is first defined in a logical (or Boolean) formalism, before omics data (mutations, RNA, proteins) from patients or cell lines are used to make the model specific to each profile. These personalized models can then be confronted with patients' clinical data in order to validate them. The response to treatment is investigated in particular in this thesis. The explicit representation of molecular mechanisms by these models makes it possible to simulate the effect of different treatments according to their targets, and to verify whether a patient's sensitivity to a drug is well predicted by the corresponding personalized model. An example concerning the response to BRAF inhibitors in melanomas and colorectal cancers is presented. The comparison of mechanistic models of cancer, those presented in this thesis and others, with clinical data also calls for a rigorous evaluation of their possible benefits in the context of medical use. The quantification and interpretation of the prognostic value of the outputs of some mechanistic models is briefly presented, before focusing on the particular case of models able to recommend the best treatment for each patient according to their molecular profile. A theoretical framework is defined to extend causal inference methods to the evaluation of such precision medicine algorithms, and an illustration is provided using simulated data and patient-derived xenografts. Taken together, the methods and applications trace a possible path from the design of mechanistic models of cancer to their evaluation using statistical models that emulate clinical trials, thereby providing a framework for the implementation of precision medicine in oncology.
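The logical formalism is easy to illustrate: each node carries a Boolean update rule, and personalization pins nodes according to a patient's profile before simulation. A toy sketch with a hypothetical three-node cascade (not the thesis's network):

```python
# Hypothetical toy wiring: BRAF -> MEK -> Growth.
RULES = {
    "BRAF":   lambda s: s["BRAF"],   # input node keeps its value
    "MEK":    lambda s: s["BRAF"],
    "Growth": lambda s: s["MEK"],
}

def simulate(initial, pinned, steps=10):
    """Synchronous Boolean updates; `pinned` fixes nodes (mutations or drug effects)."""
    state = {**initial, **pinned}
    for _ in range(steps):
        state = {node: (pinned[node] if node in pinned else int(rule(state)))
                 for node, rule in RULES.items()}
    return state

# A BRAF-mutant profile (BRAF pinned active) versus adding a BRAF inhibitor (pinned off):
print(simulate({"BRAF": 1, "MEK": 0, "Growth": 0}, pinned={"BRAF": 1}))  # Growth -> 1
print(simulate({"BRAF": 1, "MEK": 0, "Growth": 0}, pinned={"BRAF": 0}))  # Growth -> 0
```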
APA, Harvard, Vancouver, ISO, and other styles
48

Elling, Eva. "Effects of MIFID II on Stock Trade Volumes of Nasdaq Stockholm." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-257510.

Full text
Abstract:
Introducing new financial legislation to financial markets requires caution to achieve the intended outcome. This thesis investigates whether the newly introduced revised Markets in Financial Instruments Directive, the MIFID II regulation, temporarily influenced trading volumes of stocks on Nasdaq Stockholm during its introduction to the Swedish market. A generalized negative binomial model is first applied to aggregated data, followed by an individual fixed effects model that attempts to eliminate omitted-variable bias caused by unobserved variables for the individual stocks. The aggregated data are obtained by taking the equally weighted average of the trading volumes and adjusting for seasonality through Seasonal and Trend decomposition using Loess (STL), in combination with a regression model with ARIMA errors to mitigate calendar effects. Owing to the robustness of the aggregated data, the negative binomial model manages to capture significant effects of the regulation on the Small Cap. segment, even though clusters of the data show signs of divergent reactions to MIFID II. Because the fixed effects model operates on non-aggregated TSCS data, and because the effects vary across individual stocks, it fails in the same attempt.
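The modelling route described above can be sketched compactly (hypothetical column names and event date; the thesis's calendar-effect ARIMA adjustment is omitted):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.seasonal import STL

def mifid_level_shift(volumes: pd.Series, event_date: str):
    """Deseasonalise daily volumes with STL, then fit a negative binomial GLM
    of the adjusted counts on a post-regulation indicator."""
    seasonal = STL(volumes, period=5).fit().seasonal      # 5 trading days per week
    adjusted = (volumes - seasonal).clip(lower=0).round() # keep count-like values
    post = (volumes.index >= pd.Timestamp(event_date)).astype(int)
    X = sm.add_constant(pd.Series(post, index=volumes.index, name="post"))
    return sm.GLM(adjusted, X, family=sm.families.NegativeBinomial()).fit()

# Hypothetical usage around the MiFID II application date:
# res = mifid_level_shift(daily_volumes, "2018-01-03"); print(res.summary())
```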
APA, Harvard, Vancouver, ISO, and other styles
49

Olsen, Catharina. "Causal inference and prior integration in bioinformatics using information theory." Doctoral thesis, Universite Libre de Bruxelles, 2013. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209401.

Full text
Abstract:
An important problem in bioinformatics is the reconstruction of gene regulatory networks from expression data. The analysis of genomic data stemming from high-throughput technologies such as microarray experiments or RNA-sequencing faces several difficulties. The first major issue is the high variable-to-sample ratio, which is due to a number of factors: a single experiment captures all genes, while the number of experiments is restricted by the experiment's cost, time and patient cohort size. The second problem is that these data sets typically exhibit high amounts of noise.

Another important problem in bioinformatics is the question of how the quality of inferred networks can be evaluated. The current best practice is a two-step procedure. In the first step, the highest scoring interactions are compared to known interactions stored in biological databases. The inferred network passes this quality assessment if there is a large overlap with the known interactions. In this case, a second step is carried out in which unknown but high-scoring, and thus promising, new interactions are validated 'by hand' via laboratory experiments. Unfortunately, when prior knowledge is integrated into the inference procedure, this validation would be biased because the same information is used in both the inference and the validation; it would therefore no longer allow an independent validation of the resulting network.

The main contribution of this thesis is a complete computational framework that uses experimental knock-down data in a cross-validation scheme to both infer and validate directed networks. Its components are i) a method that integrates genomic data and prior knowledge to infer directed networks, ii) its implementation in an R/Bioconductor package and iii) a web application to retrieve prior knowledge from PubMed abstracts and biological databases. To infer directed networks from genomic data and prior knowledge, we propose a two-step procedure: first, we adapt the pairwise feature selection strategy mRMR to integrate prior knowledge in order to obtain the network's skeleton; then, for the subsequent orientation phase of the algorithm, we extend a criterion based on interaction information to include prior knowledge. The implementation of this method is available both as part of the prior retrieval tool Predictive Networks and as a stand-alone R/Bioconductor package named predictionet.

Furthermore, we propose a fully data-driven quantitative validation of such directed networks using experimental knock-down data: we start by identifying the set of genes that was truly affected by the perturbation experiment. The rationale of our validation procedure is that these truly affected genes should also be part of the perturbed gene's childhood in the inferred network. Consequently, we can compute a performance score that measures this overlap.
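This validation idea can be summarized in a few lines: the genes truly affected by knocking down a gene should be found among that gene's children in the inferred network. A sketch (the abstract breaks off before defining the score, so precision/recall here is our illustrative choice):

```python
def knockdown_scores(inferred_children, truly_affected):
    """Overlap between a gene's inferred children and the truly affected genes."""
    hits = inferred_children & truly_affected
    precision = len(hits) / len(inferred_children) if inferred_children else 0.0
    recall = len(hits) / len(truly_affected) if truly_affected else 0.0
    return precision, recall

# e.g. knockdown_scores({"g2", "g5", "g9"}, {"g2", "g7", "g9"})  # -> (0.666..., 0.666...)
```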
Doctorat en Sciences
APA, Harvard, Vancouver, ISO, and other styles
50

Pingel, Ronnie. "Some Aspects of Propensity Score-based Estimators for Causal Inference." Doctoral thesis, Uppsala universitet, Statistiska institutionen, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-229341.

Full text
Abstract:
This thesis consists of four papers that are related to commonly used propensity score-based estimators for average causal effects. The first paper starts with the observation that researchers often have access to data containing many correlated covariates. We therefore study the effect of correlation on the asymptotic variance of an inverse probability weighting estimator and a matching estimator. Under the assumptions of normally distributed covariates, a constant causal effect, and potential outcomes and a logit that are linear in the parameters, we show that the correlation influences the asymptotic efficiency of the estimators differently, both with regard to direction and magnitude. Further, the strength of the confounding towards the outcome and the treatment plays an important role. The second paper extends the first in that the estimators are studied under the more realistic setting of using the estimated propensity score. We also relax several assumptions made in the first paper, and include the doubly robust estimator. Again, the results show that the correlation may increase or decrease the variances of the estimators, but we also observe that several aspects influence how correlation affects the variance of the estimators, such as the choice of estimator, the strength of the confounding towards the outcome and the treatment, and whether a constant or non-constant causal effect is present. The third paper concerns estimation of the asymptotic variance of a propensity score matching estimator. Simulations show that large gains can be made in mean squared error by properly selecting the smoothing parameters of the variance estimator, and that a residual-based local linear estimator may be a more efficient estimator of the asymptotic variance. The specification of the variance estimator is shown to be crucial when evaluating the effect of right heart catheterisation: we show either a negative effect on survival or no significant effect, depending on the choice of smoothing parameters. In the fourth paper, we provide an analytic expression for the covariance matrix of logistic regression with normally distributed regressors. This paper is related to the others in that logistic regression is commonly used to estimate the propensity score.
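The question posed by the first two papers, how covariate correlation moves the variance of an IPW estimator, is easy to probe by simulation; a small Monte Carlo sketch with illustrative (not the papers') data-generating values:

```python
import numpy as np

def ipw_mc_variance(rho, n=2000, reps=500, seed=0):
    """Monte Carlo variance of the IPW estimator with two correlated normal covariates."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    beta = np.array([0.5, 0.5])    # logit coefficients (known, true propensity)
    gamma = np.array([1.0, 1.0])   # outcome coefficients; constant causal effect of 1
    estimates = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        e = 1.0 / (1.0 + np.exp(-X @ beta))
        z = rng.binomial(1, e)
        y = X @ gamma + z + rng.standard_normal(n)
        w = z / e + (1 - z) / (1 - e)
        estimates[r] = (np.sum(w * z * y) / np.sum(w * z)
                        - np.sum(w * (1 - z) * y) / np.sum(w * (1 - z)))
    return estimates.var()

# e.g. compare ipw_mc_variance(0.0) with ipw_mc_variance(0.8)
```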
APA, Harvard, Vancouver, ISO, and other styles