Journal articles on the topic "Data missingness"

Author: Grafiati

Published: June 24, 2021

Last updated: February 15, 2022

Explore the top 50 journal articles for research on the topic "Data missingness".

1

Ghazali, Shamihah Muhammad, Norshahida Shaadan, and Zainura Idrus. "Missing data exploration in air quality data set using R-package data visualisation tools." Bulletin of Electrical Engineering and Informatics 9, no. 2 (April 1, 2020): 755–63. http://dx.doi.org/10.11591/eei.v9i2.2088.

Annotation:

Missing values often occur in data sets from many research areas. This has been recognized as a data quality problem because missing values can affect the performance of analysis results. To overcome the problem, the incomplete data set needs to be treated or replaced using an imputation method. Thus, the pattern of missing values must be explored beforehand to determine a suitable method. This paper discusses the application of data visualisation as a smart technique for missing data exploration, aiming to increase understanding of missing data behaviour, which includes the missing data mechanism (MCAR, MAR and MNAR) and the distribution pattern of missingness in terms of percentage as well as gap size. This paper presents the application of several data visualisation tools from five R packages (visdat, VIM, ggplot2, Amelia and UpSetR) for data missingness exploration. As an illustration, based on an air quality data set in Malaysia, several graphics were produced and discussed to illustrate the contribution of the visualisation tools in providing input and insight on the pattern of data missingness. The results show that missing values in the air quality data set of the chosen sites in Malaysia behave as missing at random (MAR), with a small percentage of missingness, and do contain long gaps of missingness.
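The two summary quantities this abstract names, the percentage of missingness and the gap size, can be sketched in a few lines. This is a minimal illustration in Python rather than the R packages the paper uses. The function name `missing_summary` and the sample `pm10` readings are hypothetical, and "gap size" is assumed here to mean the longest run of consecutive missing records.

```python
import math


def missing_summary(values):
    """Return (percent_missing, longest_gap) for one series.

    A value counts as missing when it is None or a float NaN.
    longest_gap is the longest run of consecutive missing values.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    n_missing = 0
    longest_gap = 0
    current_gap = 0
    for v in values:
        if is_missing(v):
            n_missing += 1
            current_gap += 1
            longest_gap = max(longest_gap, current_gap)
        else:
            current_gap = 0  # an observed value ends the current gap
    percent = 100.0 * n_missing / len(values) if values else 0.0
    return percent, longest_gap


# Hypothetical hourly readings; None marks a missing record.
pm10 = [41.0, None, None, 38.5, 40.2, None, 44.1, None, None, None]
percent, gap = missing_summary(pm10)
print(f"{percent:.1f}% missing, longest gap = {gap}")
```

A series with the same percentage of missingness can have very different gap sizes, which is why the two numbers are reported separately.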

2

Zhang, Wen, Ye Yang, and Qing Wang. "A Comparative Study of Absent Features and Unobserved Values in Software Effort Data." International Journal of Software Engineering and Knowledge Engineering 22, no. 02 (March 2012): 185–202. http://dx.doi.org/10.1142/s0218194012400025.

Annotation:

Software effort data contains a large amount of missing values of project attributes. The problem of absent features, which emerged recently in machine learning, is often neglected by software engineering researchers when handling the missingness in software effort data. In essence, absent features (structural missingness) and unobserved values (unstructured missingness) are different cases of missingness, although their appearance in the data set is the same. This paper attempts to clarify the root cause of missingness in software effort data. When regarding missingness as absent features, we develop Max-margin regression to predict the real effort of software projects. When regarding missingness as unobserved values, we use existing imputation techniques to impute missing values. Then, ε-SVR is used to predict the real effort of software projects with the input data sets. Experiments on the ISBSG (International Software Benchmarking Standard Group) and CSBSG (Chinese Software Benchmarking Standard Group) data sets demonstrate that, for effort prediction, treating missingness in software effort data as unobserved values can produce better performance than treating it as absent features. This paper is the first to introduce the concept of absent features to deal with missingness in software effort data.

3

De Raadt, Alexandra, Matthijs J. Warrens, Roel J. Bosker, and Henk A. L. Kiers. "Kappa Coefficients for Missing Data." Educational and Psychological Measurement 79, no. 3 (January 16, 2019): 558–76. http://dx.doi.org/10.1177/0013164418823249.

Annotation:

Cohen’s kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen’s kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data under two missing data mechanisms, namely missingness completely at random and a form of missingness not at random. The kappa coefficient considered in Gwet (Handbook of Inter-rater Reliability, 4th ed.) and the kappa coefficient based on listwise deletion of units with missing ratings were found to have virtually no bias and mean squared error if missingness is completely at random, and small bias and mean squared error if missingness is not at random. Furthermore, the kappa coefficient that treats missing ratings as a regular category appears to be rather heavily biased and has a substantial mean squared error in many of the simulations. Because it performs well and is easy to compute, we recommend using the kappa coefficient that is based on listwise deletion of missing ratings if it can be assumed that missingness is completely at random or not at random.
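The listwise-deletion variant recommended in this abstract can be sketched as follows. This is an illustrative Python implementation of standard Cohen's kappa applied after dropping incomplete units, not the authors' code. The function name `kappa_listwise`, the use of `None` for a missing rating, and the sample ratings are assumptions made for the example.

```python
from collections import Counter


def kappa_listwise(ratings_a, ratings_b):
    """Cohen's kappa after dropping units where either rating is None."""
    # Listwise deletion: keep only units rated by both raters.
    pairs = [(a, b) for a, b in zip(ratings_a, ratings_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no units with both ratings present")

    # Observed agreement: share of units with identical ratings.
    p_observed = sum(a == b for a, b in pairs) / n

    # Expected agreement under independence, from the marginal counts.
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings; the last two units are dropped (one rating missing).
rater_a = ["x", "x", "y", "y", None, "x"]
rater_b = ["x", "y", "y", "y", "x", None]
kappa = kappa_listwise(rater_a, rater_b)
print(kappa)
```

Here four complete units remain, with observed agreement 0.75 and expected agreement 0.5.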

4

Arioli, Angelica, Arianna Dagliati, Bethany Geary, Niels Peek, Philip A. Kalra, Anthony D. Whetton, and Nophar Geifman. "OptiMissP: A dashboard to assess missingness in proteomic data-independent acquisition mass spectrometry." PLOS ONE 16, no. 4 (April 15, 2021): e0249771. http://dx.doi.org/10.1371/journal.pone.0249771.

Annotation:

Background: Missing values are a key issue in the statistical analysis of proteomic data. Defining the strategy to address missing values is a complex task in each study, potentially affecting the quality of statistical analyses. Results: We have developed OptiMissP, a dashboard to visually and qualitatively evaluate missingness and guide decision making in the handling of missing values in proteomics studies that use data-independent acquisition mass spectrometry. It provides a set of visual tools to retrieve information about missingness through protein densities and topology-based approaches, and facilitates exploration of different imputation methods and missingness thresholds. Conclusions: OptiMissP provides support for researchers’ and clinicians’ qualitative assessment of missingness in proteomic datasets in order to define study-specific strategies for the handling of missing values. OptiMissP considers biases in protein distributions related to the choice of imputation method and helps analysts to balance the information loss caused by low missingness thresholds and the noise introduced by selecting high missingness thresholds. This is complemented by topological data analysis which provides additional insight to the structure of the data and their missingness. We use an example in Chronic Kidney Disease to illustrate the main functionalities of OptiMissP.

5

Xie, Hui. "Analyzing longitudinal clinical trial data with nonignorable missingness and unknown missingness reasons." Computational Statistics & Data Analysis 56, no. 5 (May 2012): 1287–300. http://dx.doi.org/10.1016/j.csda.2010.11.021.

6

Babcock, Ben, Peter E. L. Marks, Yvonne H. M. van den Berg, and Antonius H. N. Cillessen. "Implications of systematic nominator missingness for peer nomination data." International Journal of Behavioral Development 42, no. 1 (August 19, 2016): 148–54. http://dx.doi.org/10.1177/0165025416664431.

Annotation:

Missing data are a persistent problem in psychological research. Peer nomination data present a unique missing data problem, because a nominator’s nonparticipation results in missing data for other individuals in the study. This study examined the range of effects of systematic nonparticipation on the correlations between peer nomination data when nominators with various levels of popularity and social preference are missing. Results showed that, compared to completely random nominator missingness, systematic missingness of raters based on popularity had a significant impact on the correlations between various peer nomination variables. Systematic missingness based on social preference had a smaller impact. These results demonstrate varying (and potentially large) effects of systematically missing nominators on studies using nomination data. It is important that researchers using peer nomination data explore whether nominators are missing in any sort of systematic way and include these results as part of each study. Future research into the nature of systematic nominator missingness could make it possible to use advanced methodologies, such as multiple imputation, in an attempt to minimize the issues associated with systematic missingness.

7

Spineli, Loukia M., Chrysostomos Kalyvas, and Katerina Papadimitropoulou. "Continuous(ly) missing outcome data in network meta-analysis: A one-stage pattern-mixture model approach." Statistical Methods in Medical Research 30, no. 4 (January 6, 2021): 958–75. http://dx.doi.org/10.1177/0962280220983544.

Annotation:

Appropriate handling of aggregate missing outcome data is necessary to minimise bias in the conclusions of systematic reviews. The two-stage pattern-mixture model has been already proposed to address aggregate missing continuous outcome data. While this approach is more proper compared with the exclusion of missing continuous outcome data and simple imputation methods, it does not offer flexible modelling of missing continuous outcome data to investigate their implications on the conclusions thoroughly. Therefore, we propose a one-stage pattern-mixture model approach under the Bayesian framework to address missing continuous outcome data in a network of interventions and gain knowledge about the missingness process in different trials and interventions. We extend the hierarchical network meta-analysis model for one aggregate continuous outcome to incorporate a missingness parameter that measures the departure from the missing at random assumption. We consider various effect size estimates for continuous data, and two informative missingness parameters, the informative missingness difference of means and the informative missingness ratio of means. We incorporate our prior belief about the missingness parameters while allowing for several possibilities of prior structures to account for the fact that the missingness process may differ in the network. The method is exemplified in two networks from published reviews comprising a different amount of missing continuous outcome data.
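The role of the two informative missingness parameters named in this abstract can be illustrated with a short worked identity. This is a sketch under assumed notation, not the authors' model: within one trial arm, the symbols \mu_{obs} and \mu_{mis} (mean outcome among participants with observed and with missing outcomes) and p (proportion of participants with missing outcomes) are introduced here for illustration only.

```latex
\begin{align*}
\delta &= \mu_{\mathrm{mis}} - \mu_{\mathrm{obs}}
  && \text{(informative missingness difference of means)} \\
\rho &= \mu_{\mathrm{mis}} / \mu_{\mathrm{obs}}
  && \text{(informative missingness ratio of means)} \\
\mu_{\mathrm{all}} &= (1 - p)\,\mu_{\mathrm{obs}} + p\,\mu_{\mathrm{mis}}
  = \mu_{\mathrm{obs}} + p\,\delta
  && \text{(mean over all randomised participants)}
\end{align*}
```

Under this notation, \delta = 0 (equivalently \rho = 1) means missing participants have the same mean outcome as observed ones, which corresponds to the missing at random assumption. A nonzero \delta shifts the arm mean by p\,\delta, so the effect grows with the proportion of missing data.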

8

McGurk, Kathryn A., Arianna Dagliati, Davide Chiasserini, Dave Lee, Darren Plant, Ivona Baricevic-Jones, Janet Kelsall, et al. "The use of missing values in proteomic data-independent acquisition mass spectrometry to enable disease activity discrimination." Bioinformatics 36, no. 7 (December 2, 2019): 2217–23. http://dx.doi.org/10.1093/bioinformatics/btz898.

Annotation:

Motivation: Data-independent acquisition mass spectrometry allows for more comprehensive peptide detection and relative quantification than standard data-dependent approaches. While less prone to missing values, these still exist. Current approaches for handling the so-called missingness have challenges. We hypothesized that non-random missingness is a useful biological measure and demonstrate the importance of analysing missingness for proteomic discovery within a longitudinal study of disease activity. Results: The magnitude of missingness did not correlate with mean peptide concentration. The magnitude of missingness for each protein strongly correlated between collection time points (baseline, 3 months, 6 months; R = 0.95–0.97, confidence interval = 0.94–0.97), indicating little time-dependent effect. This allowed for the identification of proteins with outlier levels of missingness that differentiate between the patient groups characterized by different patterns of disease activity. The association of these proteins with disease activity was confirmed by machine learning techniques. Our novel approach complements analyses on complete observations and other missing value strategies in biomarker prediction of disease activity. Supplementary information: Supplementary data are available at Bioinformatics online.

9

Elleman, Lorien G., Sarah K. McDougald, David M. Condon, and William Revelle. "That Takes the BISCUIT." European Journal of Psychological Assessment 36, no. 6 (November 2020): 948–58. http://dx.doi.org/10.1027/1015-5759/a000590.

Annotation:

The predictive accuracy of personality-criterion regression models may be improved with statistical learning (SL) techniques. This study introduced a novel SL technique, BISCUIT (Best Items Scale that is Cross-validated, Unit-weighted, Informative, and Transparent). The predictive accuracy and parsimony of BISCUIT were compared with three established SL techniques (the lasso, elastic net, and random forest) and regression using two sets of scales, for five criteria, across five levels of data missingness. BISCUIT’s predictive accuracy was competitive with other SL techniques at higher levels of data missingness. BISCUIT most frequently produced the most parsimonious SL model. In terms of predictive accuracy, the elastic net and lasso dominated other techniques in the complete data condition and in conditions with up to 50% data missingness. Regression using 27 narrow traits was an intermediate choice for predictive accuracy. For most criteria and levels of data missingness, regression using the Big Five had the worst predictive accuracy. Overall, loss in predictive accuracy due to data missingness was modest, even at 90% data missingness. Findings suggest that personality researchers should consider incorporating planned data missingness and SL techniques into their designs and analyses.

10

Rhemtulla, Mijke, Fan Jia, Wei Wu, and Todd D. Little. "Planned missing designs to optimize the efficiency of latent growth parameter estimates." International Journal of Behavioral Development 38, no. 5 (January 23, 2014): 423–34. http://dx.doi.org/10.1177/0165025413514324.

Annotation:

We examine the performance of planned missing (PM) designs for correlated latent growth curve models. Using simulated data from a model where latent growth curves are fitted to two constructs over five time points, we apply three kinds of planned missingness. The first is item-level planned missingness using a three-form design at each wave such that 25% of data are missing. The second is wave-level planned missingness such that each participant is missing up to two waves of data. The third combines both forms of missingness. We find that three-form missingness results in high convergence rates, little parameter estimate or standard error bias, and high efficiency relative to the complete data design for almost all parameter types. In contrast, wave missingness and the combined design result in dramatically lowered efficiency for parameters measuring individual variability in rates of change (e.g., latent slope variances and covariances), and bias in both estimates and standard errors for these same parameters. We conclude that wave missingness should not be used except with large effect sizes and very large samples.
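The item-level design described in this abstract can be sketched as a three-form layout. The block names X, A, B and C, the equal block sizes, and the rotation of forms across participants are all assumptions made for this illustration and are not taken from the paper. With four equally sized blocks and one block omitted per form, the layout reproduces the 25% missingness the abstract mentions.

```python
BLOCKS = ("X", "A", "B", "C")  # X is the common block every participant answers

# Each form contains the common block plus two of the three rotating blocks.
FORMS = {
    1: ("X", "A", "B"),  # omits C
    2: ("X", "A", "C"),  # omits B
    3: ("X", "B", "C"),  # omits A
}


def assign_forms(n_participants):
    """Rotate the three forms across participants (random assignment also works)."""
    return [FORMS[i % 3 + 1] for i in range(n_participants)]


def missing_fraction(assignments, block_sizes):
    """Share of all item responses that are missing by design."""
    total_items = sum(block_sizes.values())
    missing_items = sum(
        total_items - sum(block_sizes[b] for b in form) for form in assignments
    )
    return missing_items / (total_items * len(assignments))


# Hypothetical questionnaire: four blocks of five items each, six participants.
sizes = {block: 5 for block in BLOCKS}
assignments = assign_forms(6)
print(missing_fraction(assignments, sizes))
```

Because the omitted block differs across forms, every pair of blocks is still observed together for some participants, which is what keeps covariances between blocks estimable.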

11

Fernstad, Sara Johansson. "To identify what is not there: A definition of missingness patterns and evaluation of missing value visualization." Information Visualization 18, no. 2 (July 25, 2018): 230–50. http://dx.doi.org/10.1177/1473871618785387.

Annotation:

While missing data is a commonly occurring issue in many domains, it is a topic that has been greatly overlooked by visualization scientists. Missing data values reduce the reliability of analysis results. A range of methods exist to replace the missing values with estimated values, but their appropriateness often depends on the patterns of missingness. Increased understanding of the missingness patterns and the distribution of missing values in data may greatly improve reliability, as well as provide valuable insight into potential problems in data gathering and analysis processes, and better understanding of the data as a whole. Visualization methods have a unique possibility to support investigation and understanding of missingness patterns by making the missing values and their relationship to recorded values visible. This article provides an overview of visualization of missing data values and defines a set of three missingness patterns of relevance for understanding missingness in data. It also contributes a usability evaluation which compares visualization methods representing missing values and how well they help users identify missingness patterns. The results indicate differences in performance depending on the visualization method as well as missingness pattern. Recommendations for future design of missing data visualization are provided based on the outcome of the study.

12

Forna, Alpha, Ilaria Dorigatti, Pierre Nouvellet, and Christl A. Donnelly. "Comparison of machine learning methods for estimating case fatality ratios: An Ebola outbreak simulation study." PLOS ONE 16, no. 9 (September 15, 2021): e0257005. http://dx.doi.org/10.1371/journal.pone.0257005.

Annotation:

Background: Machine learning (ML) algorithms are now increasingly used in infectious disease epidemiology. Epidemiologists should understand how ML algorithms behave within the context of outbreak data where missingness of data is almost ubiquitous. Methods: Using simulated data, we use a ML algorithmic framework to evaluate data imputation performance and the resulting case fatality ratio (CFR) estimates, focusing on the scale and type of data missingness: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Results: Across ML methods, dataset sizes and proportions of training data used, the area under the receiver operating characteristic curve decreased by 7% (median, range: 1%–16%) when missingness was increased from 10% to 40%. Overall reduction in CFR bias for MAR across methods, proportion of missingness, outbreak size and proportion of training data was 0.5% (median, range: 0%–11%). Conclusion: ML methods could reduce bias and increase the precision in CFR estimates at low levels of missingness. However, no method is robust to high percentages of missingness. Thus, a datacentric approach is recommended in outbreak settings: patient survival outcome data should be prioritised for collection and random-sample follow-ups should be implemented to ascertain missing outcomes.
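The three missingness types this abstract compares can be made concrete with a small simulation. This is a minimal sketch and not the study's simulation framework. The function name `make_missing`, the median-split rule, and the sample data are assumptions chosen to keep the example short, and `rate` should not exceed 0.5 so that the doubled probability stays valid.

```python
import random


def make_missing(x, y, mechanism, rate=0.3, seed=0):
    """Return a copy of y with some values set to None.

    MCAR: every value has the same chance of being dropped.
    MAR:  the chance depends only on the fully observed covariate x.
    MNAR: the chance depends on the (possibly unobserved) value of y itself.
    """
    rng = random.Random(seed)
    x_median = sorted(x)[len(x) // 2]
    y_median = sorted(y)[len(y) // 2]
    out = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            # Only units with a high covariate value can lose their outcome.
            p = min(1.0, 2 * rate) if xi >= x_median else 0.0
        elif mechanism == "MNAR":
            # Only units with a high outcome value can lose their outcome.
            p = min(1.0, 2 * rate) if yi >= y_median else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append(None if rng.random() < p else yi)
    return out


# Hypothetical data: x is always observed, y is the variable subject to missingness.
x = list(range(100))
y = [float((v * v) % 37) for v in x]
for mechanism in ("MCAR", "MAR", "MNAR"):
    observed = make_missing(x, y, mechanism)
    n_missing = sum(value is None for value in observed)
    print(mechanism, n_missing)
```

Under MNAR the observed values of `y` are skewed toward the low end, which is why estimates computed from the remaining data can be biased even when the overall missing rate matches the MCAR case.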

13

St-Louis, Etienne, Daniel Roizblatt, Dan L. Deckelbaum, Robert Baird, César V. Millán, and Alicia Ebensperger. "Identifying Pediatric Trauma Data Gaps at a Large Urban Trauma Referral Center in Santiago, Chile." Panamerican Journal of Trauma, Critical Care & Emergency Surgery 6, no. 3 (2017): 169–76. http://dx.doi.org/10.5005/jp-journals-10030-1188.

Annotation:

Background: Trauma registries contribute to improving trauma care, but their impact is highly dependent on the quality of the data. A simplified point-of-care pediatric trauma registry (PTR) was developed at the Centre for Global Surgery of the McGill University Health Centre (MUHC) for implementation in low- and middle-income countries (LMICs). Pilot deployment was launched at a large urban trauma center in Santiago, Chile, in May 2016. Prior to deployment, we sought to identify missing data in existing trauma records in order to optimize PTR practicality and user benefit. Materials and methods: The project was approved by the local Institutional Review Board. Retrospective chart review was conducted on trauma patients below the age of 15 who were evaluated at the emergency room (ER) of Hospital Dr. Sotero del Rio (HSR) between January 1st and June 30th 2015. Data missingness was evaluated for each component of the PTR (demographics, mechanism, injury and outcomes). Potential independent predictors of data missingness were evaluated using multiple linear regression. Results: A total of 351 patients were included. Demographic data missingness ranged from 0% (age) to 95% (mode of arrival). Mechanism data missingness ranged from 6% (cause of injury) to 42% (site of injury). Injury physiology data missingness ranged from 37% (oxygen saturation) to 99% (respiratory rate). Interestingly, mean injury anatomy data missingness was significantly lower than that of physiology data (0.6% vs. 78.6%, p < 0.05). Outcome data missingness reached 54% at 2 weeks. Conclusion: In resource-limited settings, high quality data is essential to guide responsible resource allocation. We believe implementation of a simplified trauma registry has the potential to reduce data gaps for pediatric trauma patients by streamlining trauma data collection at point of care. This should include streamlined data collection with a short per-patient completion time, and should forego attempts to collect data at 2 weeks, which has proven unsuccessful.

How to cite this article: St-Louis E, Roizblatt D, Deckelbaum DL, Baird R, Millán CV, Ebensperger A, Razek T. Identifying Pediatric Trauma Data Gaps at a Large Urban Trauma Referral Center in Santiago, Chile. Panam J Trauma Crit Care Emerg Surg 2017;6(3):169-176.

14

Sadinle, Mauricio, and Jerome P. Reiter. "Sequentially additive nonignorable missing data modelling using auxiliary marginal information." Biometrika 106, no. 4 (October 26, 2019): 889–911. http://dx.doi.org/10.1093/biomet/asz054.

Annotation:

We study a class of missingness mechanisms, referred to as sequentially additive nonignorable, for modelling multivariate data with item nonresponse. These mechanisms explicitly allow the probability of nonresponse for each variable to depend on the value of that variable, thereby representing nonignorable missingness mechanisms. These missing data models are identified by making use of auxiliary information on marginal distributions, such as marginal probabilities for multivariate categorical variables or moments for numeric variables. We prove identification results and illustrate the use of these mechanisms in an application.

15

Imai, Takumi. "Methodology of Semiparametric Estimation for Data with Missingness." Japanese Journal of Applied Statistics 46, no. 2 (2017): 87–106. http://dx.doi.org/10.5023/jappstat.46.87.

16

Molenberghs, Geert, Els J. T. Goetghebeur, Stuart R. Lipsitz, and Michael G. Kenward. "Nonrandom Missingness in Categorical Data: Strengths and Limitations." American Statistician 53, no. 2 (May 1999): 110. http://dx.doi.org/10.2307/2685728.

17

Cho Paik, Myunghee. "Nonignorable Missingness in Matched Case-Control Data Analyses." Biometrics 60, no. 2 (June 2004): 306–14. http://dx.doi.org/10.1111/j.0006-341x.2004.00174.x.

18

Molenberghs, Geert, Els J. T. Goetghebeur, Stuart R. Lipsitz, and Michael G. Kenward. "Nonrandom Missingness in Categorical Data: Strengths and Limitations." American Statistician 53, no. 2 (May 1999): 110–18. http://dx.doi.org/10.1080/00031305.1999.10474442.

19

Lu, Zhenqiu, and Zhiyong Zhang. "Bayesian Approach to Non-ignorable Missingness in Latent Growth Models." Journal of Behavioral Data Science 1, no. 2 (May 2021): 1–30. http://dx.doi.org/10.35566/jbds/v1n2/p1.

Annotation:

Latent growth curve models (LGCMs) are becoming increasingly important among growth models because they can effectively capture individuals' latent growth trajectories and also explain the factors that influence such growth by analyzing the repeatedly measured manifest variables. However, with the increase in complexity of LGCMs, there is an increase in issues with model estimation. This research proposes a Bayesian approach to LGCMs to address the perennial problem of almost all longitudinal research, namely, missing data. First, different missingness models are formulated. We focus on non-ignorable missingness in this article. Specifically, these models include the latent intercept dependent missingness, the latent slope dependent missingness, and the potential outcome dependent missingness. To implement the model estimation, this study proposes a full Bayesian approach through a data augmentation algorithm and a Gibbs sampling procedure. Simulation studies are conducted, and the results show that the proposed method accurately recovers model parameters and that mis-specified missingness may result in severely misleading conclusions. Finally, the implications of the approach and future research directions are discussed.

20

Derks, Eske M., Conor V. Dolan, and Dorret I. Boomsma. "Statistical Power to Detect Genetic and Environmental Influences in the Presence of Data Missing at Random." Twin Research and Human Genetics 10, no. 1 (February 1, 2007): 159–67. http://dx.doi.org/10.1375/twin.10.1.159.

Annotation:

We study the situation in which a cheap measure (X) is observed in a large, representative twin sample, and a more expensive measure (Y) is observed in a selected subsample. The aim of this study is to investigate the optimal selection design in terms of the statistical power to detect genetic and environmental influences on the variance of Y and on the covariance of X and Y. Data were simulated for 4000 dizygotic and 2000 monozygotic twins. Missingness (87% vs. 97%) was then introduced in accordance with 7 selection designs: (i) concordant low + individual high design; (ii) extreme concordant design; (iii) extreme concordant and discordant design (EDAC); (iv) extreme discordant design; (v) individual score selection design; (vi) selection of an optimal number of MZ and DZ twins; and (vii) missing completely at random. The statistical power to detect the influence of additive and dominant genetic and shared environmental effects on the variance of Y and on the covariance between X and Y was investigated. The best selection design is the individual score selection design. The power to detect additive genetic effects is high irrespective of the percentage of missingness or selection design. The power to detect shared environmental effects is acceptable when the percentage of missingness is 87%, but is low when the percentage of missingness is 97%, except for the individual score selection design, in which the power remains acceptable. The power to detect D is low, irrespective of selection design or percentage of missingness. The individual score selection design is therefore the best design for detecting genetic and environmental influences on the variance of Y and on the covariance of X and Y. However, the EDAC design may be preferred when an additional purpose of a study is to detect quantitative trait loci effects.

21

Yu, Yue, Emily J. Smith, and Carter T. Butts. "Retrospective Network Imputation from Life History Data: The Impact of Designs." Sociological Methodology 50, no. 1 (February 26, 2020): 131–67. http://dx.doi.org/10.1177/0081175020905624.

Annotation:

Retrospective life history designs are among the few practical approaches for collecting longitudinal network information from large populations, particularly in the context of relationships like sexual partnerships that cannot be measured via digital traces or documentary evidence. While all such designs afford the ability to “peer into the past” vis-à-vis the point of data collection, little is known about the impact of the specific design parameters on the time horizon over which such information is useful. In this article, we investigate the effect of two different survey designs on retrospective network imputation: (1) intervalN, where subjects are asked to provide information on all partners within the past N time units; and (2) lastK, where subjects are asked to provide information about their K most recent partners. We simulate a “ground truth” sexual partnership network using a published model of Krivitsky (2012), and we then sample this data using the two retrospective designs under various choices of N and K. We examine the accumulation of missingness as a function of time prior to interview, and we investigate the impact of this missingness on model-based imputation of the state of the network at prior time points via conditional ERGM prediction. We quantitatively show that, even setting aside problems of alter identification and informant accuracy, the choice of survey design and parameters used can drastically change the amount of missingness in the dataset. These differences in missingness have a large impact on the quality of retrospective parameter estimation and network imputation, including important effects on properties related to disease transmission.

22

Chaimani, Anna, Dimitris Mavridis, Georgia Salanti, Julian P. T. Higgins, and Ian R. White. "Allowing for Informative Missingness in Aggregate Data Meta-Analysis with Continuous or Binary Outcomes: Extensions to Metamiss." Stata Journal: Promoting communications on statistics and Stata 18, no. 3 (September 2018): 716–40. http://dx.doi.org/10.1177/1536867x1801800310.

Annotation:

Missing outcome data can invalidate the results of randomized trials and their meta-analysis. However, addressing missing data is often a challenging issue because it requires untestable assumptions. The impact of missing outcome data on the meta-analysis summary effect can be explored by assuming a relationship between the outcome in the observed and the missing participants via an informative missingness parameter. The informative missingness parameters cannot be estimated from the observed data, but they can be specified, with associated uncertainty, using evidence external to the meta-analysis, such as expert opinion. The use of informative missingness parameters in pairwise meta-analysis of aggregate data with binary outcomes has been previously implemented in Stata by the metamiss command. In this article, we present the new command metamiss2, which is an extension of metamiss for binary or continuous data in pairwise or network meta-analysis. The command can be used to explore the robustness of results to different assumptions about the missing data via sensitivity analysis.

23

Fang, Zhou, Tianzhou Ma, Gong Tang, Li Zhu, Qi Yan, Ting Wang, Juan C. Celedón, Wei Chen, and George C. Tseng. "Bayesian integrative model for multi-omics data with missingness." Bioinformatics 34, no. 22 (September 1, 2018): 3801–8. http://dx.doi.org/10.1093/bioinformatics/bty775.

24

McNeish, Daniel. "Missing data methods for arbitrary missingness with small samples." Journal of Applied Statistics 44, no. 1 (March 22, 2016): 24–39. http://dx.doi.org/10.1080/02664763.2016.1158246.

25

Khoshgoftaar, Taghi M., and Jason Van Hulse. "Imputation techniques for multivariate missingness in software measurement data." Software Quality Journal 16, no. 4 (June 11, 2008): 563–600. http://dx.doi.org/10.1007/s11219-008-9054-7.

26

Park, Soomin, Mari Palta, Jun Shao, and Lei Shen. "Bias adjustment in analysing longitudinal data with informative missingness." Statistics in Medicine 21, no. 2 (2001): 277–91. http://dx.doi.org/10.1002/sim.992.

27

Zhou, Sherry, and Anne Corinne Huggins-Manley. "The Performance of the Semigeneralized Partial Credit Model for Handling Item-Level Missingness." Educational and Psychological Measurement 80, no. 6 (May 15, 2020): 1196–215. http://dx.doi.org/10.1177/0013164420918392.

Annotation:

The semi-generalized partial credit model (Semi-GPCM) has been proposed as a unidimensional modeling method for handling not applicable scale responses and neutral scale responses, and it has been suggested that the model may be of use in handling missing data in scale items. The purpose of this study is to evaluate the ability of the unidimensional Semi-GPCM to aid in the recovery of person parameters from item response data in the presence of item-level missingness, and to compare the performance of the model with two other proposed methods for handling such missingness: a multidimensional modeling approach for missingness and full information maximum likelihood estimation. The results indicate that the Semi-GPCM performs acceptably in an absolute sense when less than 30% of the item data is missing but does not outperform the other two methods under any particular conditions. We conclude with a discussion about when practitioners may or may not want to use the Semi-GPCM to recover person parameters from item response data with missingness.

28

Alade, Oyekale Abel, Ali Selamat, and Roselina Sallehuddin. "The Effects of Missing Data Characteristics on the Choice of Imputation Techniques." Vietnam Journal of Computer Science 07, no. 02 (March 20, 2020): 161–77. http://dx.doi.org/10.1142/s2196888820500098.

Annotation:

One major characteristic of data is completeness. Missing data is a significant problem in medical datasets: it leads to incorrect classification of patients and endangers their health management. Many factors cause values to be missing in medical datasets. In this paper, we argue that the causes of missing data in a medical dataset should be examined to ensure that the right imputation method is used. The missingness mechanism was studied to identify the missing-data pattern and determine a suitable imputation technique for generating complete datasets. The pattern shows that the missingness in the dataset used in this study is not monotone. Because single imputation techniques underestimate variance and ignore relationships among the variables, we used a multiple imputation technique that runs five iterations for each missing value. All missing values in the dataset were regenerated. The imputed datasets were validated using an extreme learning machine (ELM) classifier. The results show improvement in the accuracy of the imputed datasets. The work can, however, be extended to compare the accuracy of the imputed datasets with that of the original dataset using different classifiers such as support vector machine (SVM), radial basis function (RBF), and ELMs.

29

Franks, Alexander M., Edoardo M. Airoldi, and Donald B. Rubin. "Nonstandard conditionally specified models for nonignorable missing data." Proceedings of the National Academy of Sciences 117, no. 32 (July 28, 2020): 19045–53. http://dx.doi.org/10.1073/pnas.1815563117.

Annotation:

Data analyses typically rely upon assumptions about the missingness mechanisms that lead to observed versus missing data, assumptions that are typically unassessable. We explore an approach where the joint distribution of observed data and missing data is specified in a nonstandard way. In this formulation, which traces back to a representation of the joint distribution of the data and missingness mechanism, apparently first proposed by J. W. Tukey, the modeling assumptions about the distributions are either assessable or are designed to allow relatively easy incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both observed and missing. We develop Tukey’s representation for exponential-family models, propose a computationally tractable approach to inference in this class of models, and offer some general theoretical comments. We then illustrate the utility of this approach with an example in systems biology.

30

Plichta, Jennifer Kay, Christel N. Rushing, Holly C. Lewis, Dan G. Blazer, Terry Hyslop, and Rachel Adams Greenup. "Missing data in breast cancer: Relationship with survival in national databases." Journal of Clinical Oncology 38, no. 15_suppl (May 20, 2020): e19114-e19114. http://dx.doi.org/10.1200/jco.2020.38.15_suppl.e19114.

Annotation:

Background: National cancer registries are valuable tools used to analyze patterns of care and clinical oncology outcomes; yet, patients with missing data may impact the accuracy and generalizability of these data. We sought to evaluate the association between missing data and overall survival (OS). Methods: Using the NCDB and SEER, we compared data missingness among patients diagnosed with invasive breast cancer from 2010-2014. Key variables included: demographic variables (age, race, ethnicity, insurance, education, income), tumor variables (grade, ER, PR, HER2, TNM stage), and treatment variables (surgery in both databases; chemotherapy and radiation in NCDB). OS was compared between those with and without missing data via Cox proportional hazards models. Results: Overall, 775,996 patients in the NCDB and 263,016 in SEER were identified; missingness of at least 1 key variable was 29% and 13%, respectively. Of those, the majority were missing a tumor variable (NCDB 80%; SEER 88%), while demographic and treatment variables were missing less often. When compared to patients with complete data, missingness was associated with a greater risk of death; NCDB 17% vs. 14% (HR 1.23, 99% CI 1.21-1.25) and SEER 27% vs. 14% (HR 2.11, 99% CI 2.05-2.18). Rate of death was similar whether the patient was missing 1 or ≥2 variables. When stratified by the type of missing variable, differences in OS between those with and without missing data in the NCDB were small. In SEER, reductions in OS were largest for those missing tumor variables (HR 2.26, 99% CI 2.19-2.33) or surgery data (HR 3.84, 99% CI 3.32-4.45). Among the tumor variables specifically, few clinically meaningful differences in OS were noted in the NCDB, while the most significant differences in SEER were noted in T and N stage (table). Conclusions: Missingness of select variables is associated with a worse OS and is not uncommon within large national cancer registries. Therefore, researchers must use caution when choosing inclusion/exclusion criteria for outcomes studies. Future research is needed to elucidate which patients are most often missing data and why OS differences are observed. [Table: see text]

31

Reich, Brian J., and Dipankar Bandyopadhyay. "A latent factor model for spatial data with informative missingness." Annals of Applied Statistics 4, no. 1 (March 2010): 439–59. http://dx.doi.org/10.1214/09-aoas278.

32

Qu, A. "Testing ignorable missingness in estimating equation approaches for longitudinal data." Biometrika 89, no. 4 (December 1, 2002): 841–50. http://dx.doi.org/10.1093/biomet/89.4.841.

33

Reich, Brian J., Dipankar Bandyopadhyay, and Howard D. Bondell. "A Nonparametric Spatial Model for Periodontal Data With Nonrandom Missingness." Journal of the American Statistical Association 108, no. 503 (September 2013): 820–31. http://dx.doi.org/10.1080/01621459.2013.795487.

34

Wright, Joseph, and Erica Frantz. "How oil income and missing hydrocarbon rents data influence autocratic survival: A response to Lucas and Richter (2016)." Research & Politics 4, no. 3 (July 2017): 205316801771979. http://dx.doi.org/10.1177/2053168017719794.

Annotation:

This paper re-examines the findings from a recently published study on hydrocarbon rents and autocratic survival by Lucas and Richter (LR hereafter). LR introduce a new data set on hydrocarbon rents and use it to examine the link between oil income and autocratic survival. Employing a placebo test, we show that the authors’ strategy for dealing with missingness in the new hydrocarbon rents data set – filling in missing data with zeros – creates bias in the reported estimates of interest. Addressing missingness with multiple imputation shows that the LR findings linking oil rents to democratization do not hold. Instead, we find that hydrocarbon rents reduce the chances of transition to a new dictatorship, consistent with the conclusions of Wright et al.
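
The general mechanism behind this critique, that replacing missing covariate values with zeros can distort a regression estimate, can be sketched with simulated data. The example below is not the authors' placebo test and uses no hydrocarbon rents data; the variable names, the true slope of 2.0, and the 30% missingness rate are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: outcome y depends linearly on covariate x (true slope 2.0).
x = rng.normal(loc=5.0, scale=1.0, size=n)
y = 2.0 * x + rng.normal(scale=1.0, size=n)

# Delete 30% of x completely at random (an assumption made for this sketch).
missing = rng.random(n) < 0.3


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Strategy 1: drop rows with missing x. Strategy 2: fill missing x with zeros.
slope_complete_cases = ols_slope(x[~missing], y[~missing])
slope_zero_filled = ols_slope(np.where(missing, 0.0, x), y)

print(f"complete cases: {slope_complete_cases:.2f}")
print(f"zero-filled:    {slope_zero_filled:.2f}")
```

In this setup the complete-case slope stays close to the true value, while the zero-filled slope is pulled far from it, because the zeros lie well outside the range of the real covariate values.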

35

Bartlett, Jonathan W., James R. Carpenter, Kate Tilling, and Stijn Vansteelandt. "Improving upon the efficiency of complete case analysis when covariates are MNAR." Biostatistics 15, no. 4 (June 6, 2014): 719–30. http://dx.doi.org/10.1093/biostatistics/kxu023.

Annotation:

Missing values in covariates of regression models are a pervasive problem in empirical research. Popular approaches for analyzing partially observed datasets include complete case analysis (CCA), multiple imputation (MI), and inverse probability weighting (IPW). In the case of missing covariate values, these methods (as typically implemented) are valid under different missingness assumptions. In particular, CCA is valid under missing not at random (MNAR) mechanisms in which missingness in a covariate depends on the value of that covariate, but is conditionally independent of outcome. In this paper, we argue that in some settings such an assumption is more plausible than the missing at random assumption underpinning most implementations of MI and IPW. When the former assumption holds, although CCA gives consistent estimates, it does not make use of all observed information. We therefore propose an augmented CCA approach which makes the same conditional independence assumption for missingness as CCA, but which improves efficiency through specification of an additional model for the probability of missingness, given the fully observed variables. The new method is evaluated using simulations and illustrated through application to data on reported alcohol consumption and blood pressure from the US National Health and Nutrition Examination Survey, in which data are likely MNAR independent of outcome.
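
The claim that CCA remains valid when covariate missingness depends only on the covariate itself can be checked with a small simulation. The sketch below is not the augmented CCA estimator proposed in the paper; the data-generating model, the logistic missingness functions, and all names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical regression model: y = 1 + 2x + noise (true slope 2.0).
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def complete_case_slope(x, y, observed):
    """Least-squares slope of y on x using only rows where x is observed."""
    xo, yo = x[observed], y[observed]
    return np.cov(xo, yo)[0, 1] / np.var(xo, ddof=1)


# Scenario 1: missingness in x depends on x only (MNAR, independent of y given x).
observed_dep_x = rng.random(n) > sigmoid(2.0 * x)
# Scenario 2: missingness in x depends on the outcome y.
observed_dep_y = rng.random(n) > sigmoid(2.0 * (y - 1.0))

slope_dep_x = complete_case_slope(x, y, observed_dep_x)
slope_dep_y = complete_case_slope(x, y, observed_dep_y)

print(f"missingness depends on x: {slope_dep_x:.2f}")
print(f"missingness depends on y: {slope_dep_y:.2f}")
```

In the first scenario the complete-case slope stays near the true value even though the data are MNAR. In the second it is attenuated, because selecting rows on the outcome distorts the conditional distribution of y given x.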

36

Lawson, Andrew, Anna Schritz, Luis Villarroel, and Gloria A. Aguayo. "Multi-Scale Multivariate Models for Small Area Health Survey Data: A Chilean Example." International Journal of Environmental Research and Public Health 17, no. 5 (March 5, 2020): 1682. http://dx.doi.org/10.3390/ijerph17051682.

Annotation:

Background: We propose a general approach to the analysis of multivariate health outcome data where geo-coding at different spatial scales is available. We propose multiscale joint models which address the links between individual outcomes and also allow for correlation between areas. The models are highly novel in that they exploit survey data to provide multiscale estimates of the prevalences in small areas for a range of disease outcomes. Results: The models incorporate both disease-specific and common-disease spatially structured components. The multiple scales envisaged are those where individual survey data are used to model regional prevalences or risks at an aggregate scale. This approach involves the use of survey weights as predictors within our Bayesian multivariate models. Missingness has to be addressed within these models, and we use predictive inference which exploits the correlation between diseases to provide estimates of missing prevalences. The case study we examine is from the National Health Survey of Chile, where geocoding to Province level is available. In that survey, diabetes, hypertension, obesity and elevated low-density cholesterol (LDL) are available, but differential missingness requires aggregation of estimates and also the assumption of smoothed sampling weights at the aggregate level. Conclusions: The methodology proposed is highly novel and flexibly handles multiple disease outcomes at individual and aggregated levels (i.e., multiscale joint models). The missingness mechanism adopted provides realistic estimates for inclusion in the aggregate model at Provincia level. The spatial structure of four diseases within Provincias has marked spatial differentiation, with diabetes and hypertension strongly clustered in central Provincias and obesity and LDL more clustered in the southern areas.

37

Wysham, Nicholas G., Steven P. Wolf, Gregory Samsa, Amy P. Abernethy, and Thomas W. LeBlanc. "Integration of Electronic Patient-Reported Outcomes Into Routine Cancer Care: An Analysis of Factors Affecting Data Completeness." JCO Clinical Cancer Informatics, no. 1 (November 2017): 1–10. http://dx.doi.org/10.1200/cci.16.00043.

Annotation:

Purpose: Routinely collected patient-reported outcomes (PROs) could provide invaluable data to a patient-centered learning health system but are often highly missing in clinical trials. We analyzed our experience with PROs to understand patterns of missing data using electronic collection as part of routine clinical care. Methods: This is an analysis of a prospectively collected observational database of electronic PROs captured as part of routine clinical care in four different outpatient oncology clinics at an academic referral center. Results: More than 24,000 clinical encounters from 7,655 unique patients are included. Data were collected via an electronic tablet–based survey instrument (Patient Care Monitor, version 2.0), at the time of clinical care, as part of routine care processes. Missing instruments (i.e., no items completed) were submitted for 6.8% of clinical encounters, and 15.8% of encounters had missing items. Nearly 90% of all encounters involved < 10% missing items. In multivariable analyses, younger age, private health insurance, being seen in the breast oncology clinic, less time spent on the instrument, and longitudinal care were significantly associated with less missingness. Conclusion: Embedding collection of electronic PRO data into routine clinical care yielded low rates of missing data in this real-world, prospectively collected database. In contrast to clinical trial experience, missingness improved with longitudinal care. This approach may be a solution to minimizing missingness of PROs in research or clinical care settings in support of learning health care systems.

38

Aguilera, Héctor, Carolina Guardiola-Albert, and Carmen Serrano-Hidalgo. "Estimating extremely large amounts of missing precipitation data." Journal of Hydroinformatics 22, no. 3 (February 28, 2020): 578–92. http://dx.doi.org/10.2166/hydro.2020.127.

Annotation:

Accurate estimation of missing daily precipitation data remains a difficult task. A wide variety of methods exists for infilling missing values, but the percentage of gaps is one of the main factors limiting their applicability. The present study compares three techniques for filling in large amounts of missing daily precipitation data: spatio-temporal kriging (STK), multiple imputation by chained equations through predictive mean matching (PMM), and the random forest (RF) machine learning algorithm. To our knowledge, this is the first time that extreme missingness (>90%) has been considered. Different percentages of missing data and missing patterns are tested in a large dataset drawn from 112 rain gauges in the period 1975–2017. The results show that both STK and RF can handle extreme missingness, while PMM requires larger observed sample sizes. STK is the most robust method, suitable for chronological missing patterns. RF is efficient under random missing patterns. Model evaluation is usually based on performance and error measures. However, this study outlines the risk of just relying on these measures without checking for consistency. The RF algorithm overestimated daily precipitation outside the validation period in some cases due to the overdetection of rainy days under time-dependent missing patterns.

39

A. Mahmood, Wisam, Mohammed S. Rashid, and Teaba Wala Aldeen. "CHOOSING APPROPRIATE IMPUTATION METHODS FOR MISSING DATA: A DECISION ALGORITHM ON METHODS FOR MISSING DATA." Journal of Al-Qadisiyah for computer science and mathematics 11, no. 2 (September 5, 2019): 65–73. http://dx.doi.org/10.29304/jqcm.2019.11.2.588.

Annotation:

Missing values are common in medical research and can introduce substantial bias if they are ignored or handled poorly. Standard statistical methods for dealing with them have been developed and are available, yet no method is so far available that reliably yields credible estimates. Missing values reduce the effective size of a dataset and lower efficiency. A number of imputation methods have been proposed in earlier work to address these challenges. Commonly used methods include the complete case method, mean imputation, Last Observation Carried Forward (LOCF), the Expectation-Maximization (EM) algorithm, Markov Chain Monte Carlo (MCMC), Hot Deck (HOT), Regression Imputation (Regress), K-nearest neighbor (KNN), K-Mean Clustering, Fuzzy K-Mean Clustering, Support Vector Machine, and Multiple Imputation (MI). In the present paper, a simulation study investigates the efficacy of these imputation methods in a longitudinal data setting under missing completely at random (MCAR). We considered three levels of missingness: a low level of 5% and higher levels of 30% and 50%. Comparing the methods in this simulation study, we concluded that LOCF has more bias than the other methods in most situations.
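
To make the LOCF method concrete, the sketch below carries the last observed value forward through gaps marked as NaN and shows, on a hypothetical rising series, why the method can be biased when the underlying values change over time. It is not the simulation design of the paper; the function name `locf` and the example series are assumptions.

```python
import numpy as np


def locf(values):
    """Last Observation Carried Forward: replace each NaN with the most
    recent non-missing value (leading NaNs are left as NaN)."""
    values = np.asarray(values, dtype=float)
    # Index of each observed entry, 0 where the entry is missing.
    idx = np.where(np.isnan(values), 0, np.arange(len(values)))
    # Running maximum gives, at each position, the index of the last observation.
    idx = np.maximum.accumulate(idx)
    return values[idx]


# Hypothetical longitudinal series with a steady upward trend.
truth = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_gaps = np.array([1.0, 2.0, np.nan, np.nan, 5.0])

imputed = locf(with_gaps)
print(imputed)                        # [1. 2. 2. 2. 5.]
print(truth.mean(), imputed.mean())   # 3.0 2.4
```

Because the carried-forward value ignores the trend, the imputed mean (2.4) falls below the true mean (3.0) in this example.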

40

Ejima, Keisuke, Roger Zoh, Carmen Tekwe, David Allison, and Andrew Brown. "What Proportion of Planned Missing Data Is Allowed for Unbiased Estimates of the Association Between Energy Intake and Body Weight Using Multiple Imputation?" Current Developments in Nutrition 4, Supplement_2 (May 29, 2020): 1167. http://dx.doi.org/10.1093/cdn/nzaa056_014.

Annotation:

Objectives: A gold standard method to measure energy intake (EI) is doubly labeled water (DLW), but it is expensive and not feasible for large studies. EI from self-report (EISR) is prone to bias, but is still widely used due to convenience; however, estimated associations between EISR and outcomes are biased in many cases. Double sampling with multiple imputation (MI) involves obtaining gold standard (e.g., EIDLW) measurements on a random subsample, and proxy data (e.g., EISR) on the whole sample, and recovering missing gold standard information using MI. However, it is not known what proportion of missingness in EIDLW is acceptable to obtain unbiased estimates of associations between EI and outcomes. Methods: We used body weight as an example outcome from the CALERIE Study (N = 218). We performed two regressions on the complete dataset: EIDLW as a predictor and body weight (kg) as an outcome to estimate the ‘true’ coefficient (denoted βDLW), or using EISR as the predictor (βSR). Random subsets of EIDLW were deleted (10% to 90% of full data in 10% increments) to simulate obtaining EIDLW data on only a subset of participants. Regressions were performed using the subset EIDLW data using two different approaches: complete case analysis of only the subset (βDLWsub) and MI informed by EISR on the full data set (βMI). Bias was estimated as the difference between βDLW and βSR, between βDLW and βDLWsub for each EIDLW subset, and between βDLW and βMI for each subset. Resampling was repeated 100 times to assess the uncertainty of the bias. Results: Bias of EISR was substantial (∼50%). Bias of βDLWsub was not significantly different from zero for all proportions of missing EIDLW; 95% CIs increased as proportion of missingness increased (as expected). Bias for βMI was not significantly different from zero for missingness of EIDLW up to 80%. βMI was significantly negatively biased toward βSR when the proportion of missingness was 90%. 95% CIs of βMI estimates were narrower than those of βDLWsub for all amounts of missingness. Conclusions: Unbiased, more precise estimates of the association between EI and body weight using MI were obtained with missing EIDLW as high as 80%. Obtaining gold standard data collection on subsets may allow for unbiased estimates using self-report data feasible in larger samples. Funding Sources: NIH R25HL124208. JSPS KAKENHI 18K18146. Meiji Yasuda Foundation of Health and Welfare 2019.

41

Mebane, Walter R., and Paul Poast. "Causal Inference without Ignorability: Identification with Nonrandom Assignment and Missing Treatment Data." Political Analysis 21, no. 2 (2013): 233–51. http://dx.doi.org/10.1093/pan/mps043.

Annotation:

How a treatment causes a particular outcome is a focus of inquiry in political science. When treatment data are either nonrandomly assigned or missing, the analyst will often invoke ignorability assumptions: that is, both the treatment and missingness are assumed to be as if randomly assigned, perhaps conditional on a set of observed covariates. But what if these assumptions are wrong? What if the analyst does not know why—or even if—a particular subject received a treatment? Building on Manski, Molinari offers an approach for calculating nonparametric identification bounds for the average treatment effect of a binary treatment under general missingness or nonrandom assignment. To make these bounds substantively more informative, Molinari's technique permits adding monotonicity assumptions (e.g., assuming that treatment effects are weakly positive). Given the potential importance of these assumptions, we develop a new Bayesian method for performing sensitivity analysis regarding them. This sensitivity analysis allows analysts to interpret the assumptions' consequences quantitatively and visually. We apply this method to two problems in political science, highlighting the method's utility for applied research.

42

Meisner, Jonas, Siyang Liu, Mingxi Huang, and Anders Albrechtsen. "Large-scale inference of population structure in presence of missingness using PCA." Bioinformatics 37, no. 13 (January 18, 2021): 1868–75. http://dx.doi.org/10.1093/bioinformatics/btab027.

Annotation:

Motivation: Principal component analysis (PCA) is a commonly used tool in genetics to capture and visualize population structure. Due to technological advances in sequencing, such as the widely used non-invasive prenatal test, massive datasets of ultra-low coverage sequencing are being generated. These datasets are characterized by having a large amount of missing genotype information. Results: We present EMU, a method for inferring population structure in the presence of rampant non-random missingness. We show through simulations that several commonly used PCA methods cannot handle missing data arising from various sources, which leads to biased results as individuals are projected into the PC space based on their amount of missingness. In terms of accuracy, EMU outperforms an existing method that also accommodates missingness while being competitively fast. We further tested EMU on around 100K individuals of the Phase 1 dataset of the Chinese Millionome Project that were shallowly sequenced to around 0.08×. From these data we are able to capture the population structure of the Han Chinese and to reproduce previous analysis in a matter of CPU hours instead of CPU years. EMU’s capability to accurately infer population structure in the presence of missingness will be of increasing importance with the rising number of large-scale genetic datasets. Availability and implementation: EMU is written in Python and is freely available at https://github.com/rosemeis/emu. Supplementary information: Supplementary data are available at Bioinformatics online.

43

Lee, Daniel Y., Jeffrey R. Harring, and Laura M. Stapleton. "Comparing Methods for Addressing Missingness in Longitudinal Modeling of Panel Data." Journal of Experimental Education 87, no. 4 (June 21, 2019): 596–615. http://dx.doi.org/10.1080/00220973.2018.1520683.

44

Kyoung, Yujung, and Keunbaik Lee. "Bayesian Pattern Mixture Model for Longitudinal Binary Data with Nonignorable Missingness." Communications for Statistical Applications and Methods 22, no. 6 (November 30, 2015): 589–98. http://dx.doi.org/10.5351/csam.2015.22.6.589.

45

Allen, Andrew S., Julianne S. Collins, Paul J. Rathouz, Craig L. Selander, and Glen A. Satten. "Bootstrap calibration of TRANSMIT for informative missingness of parental genotype data." BMC Genetics 4, Suppl 1 (2003): S39. http://dx.doi.org/10.1186/1471-2156-4-s1-s39.

46

Hu, Zonghui, Dean A. Follmann, and Jing Qin. "Semiparametric Double Balancing Score Estimation for Incomplete Data With Ignorable Missingness." Journal of the American Statistical Association 107, no. 497 (March 2012): 247–57. http://dx.doi.org/10.1080/01621459.2012.656009.

47

Lu, Tao. "Jointly modeling skew longitudinal survival data with missingness and mismeasured covariates." Journal of Applied Statistics 44, no. 13 (November 10, 2016): 2354–67. http://dx.doi.org/10.1080/02664763.2016.1254728.

48

Sotto, Cristina, Caroline Beunckens, Geert Molenberghs, Ivy Jansen, and Geert Verbeke. "Marginalizing pattern-mixture models for categorical data subject to monotone missingness." Metrika 69, no. 2-3 (December 5, 2008): 305–36. http://dx.doi.org/10.1007/s00184-008-0219-y.

49

Neuilly, Melanie-Angela, Ming-Li Hsieh, Alex Kigerl, and Zachary K. Hamilton. "Data Missingness Patterns in Homicide Datasets: An Applied Test on a Primary Data Set." Violence and Victims 35, no. 4 (August 1, 2020): 589–614. http://dx.doi.org/10.1891/vv-d-17-00189.

Annotation:

Research on homicide missing data conventionally posits a Missing At Random pattern despite the relationship between missing data and clearance. The latter, however, cannot be satisfactorily modeled using variables traditionally available in homicide datasets. For this reason, it has been argued that missingness in homicide data follows a Nonignorable pattern instead. Hence, the use of multiple imputation strategies as recommended in the field for ignorable patterns would thus pose a threat to the validity of results obtained in such a way. This study examines missing data mechanisms by using a set of primary data collected in New Jersey. After comparing Listwise Deletion, Multiple Imputation, Propensity Score Matching, and Log-Multiplicative Association Models, our findings underscore that data in homicide datasets are indeed Missing Not At Random.

50

Turner, Elizabeth L., Lanqiu Yao, Fan Li, and Melanie Prague. "Properties and pitfalls of weighting as an alternative to multilevel multiple imputation in cluster randomized trials with missing binary outcomes under covariate-dependent missingness." Statistical Methods in Medical Research 29, no. 5 (July 11, 2019): 1338–53. http://dx.doi.org/10.1177/0962280219859915.

Annotation:

The generalized estimating equation (GEE) approach can be used to analyze cluster randomized trial data to obtain population-averaged intervention effects. However, most cluster randomized trials have some missing outcome data and a GEE analysis of available data may be biased when outcome data are not missing completely at random. Although multilevel multiple imputation for GEE (MMI-GEE) has been widely used, alternative approaches such as weighted GEE are less common in practice. Using both simulations and a real data example, we evaluate the performance of inverse probability weighted GEE vs. MMI-GEE for binary outcomes. Simulated data are generated assuming a covariate-dependent missing data pattern across a range of missingness clustering (from none to high), where all covariates are measured at baseline and are fully observed (i.e. a type of missing-at-random mechanism). Two types of weights are estimated and used in the weighted GEE: (1) assuming no clustering of missingness (W-GEE) and (2) accounting for such clustering (CW-GEE). Results show that, even in settings with high missingness clustering, CW-GEE can lead to more bias and lower coverage than W-GEE, whereas W-GEE and MMI-GEE provide comparable results. W-GEE should be considered a viable strategy to account for missing outcomes in cluster randomized trials.
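
The basic idea of inverse probability weighting under covariate-dependent missingness can be sketched for a single proportion. The example below deliberately ignores clustering and GEE, so it is not the W-GEE or CW-GEE estimator evaluated in the paper; the binary covariate `z`, the outcome and observation probabilities, and all names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40_000

# Hypothetical fully observed binary baseline covariate.
z = rng.integers(0, 2, size=n)
# Binary outcome whose probability depends on z (overall mean about 0.5).
y = rng.random(n) < np.where(z == 1, 0.7, 0.3)
# Outcome is observed less often when z == 1 (covariate-dependent missingness).
observed = rng.random(n) < np.where(z == 1, 0.5, 0.9)

# Naive estimate: mean outcome among observed participants only.
naive = y[observed].mean()

# Estimate the probability of being observed within each covariate stratum,
# then weight each observed outcome by the inverse of that probability.
p_hat = np.array([observed[z == k].mean() for k in (0, 1)])[z]
weights = 1.0 / p_hat[observed]
ipw = np.sum(weights * y[observed]) / np.sum(weights)

print(f"full-data mean: {y.mean():.3f}")
print(f"naive mean:     {naive:.3f}")
print(f"IPW mean:       {ipw:.3f}")
```

In this setup the naive mean is pulled toward the stratum that is observed more often, while the weighted mean recovers the full-data proportion.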
