
Journal articles on the topic "Uncertain imputation"

Listed below are the top 50 journal articles for research on the topic "Uncertain imputation", with abstracts included where available in the metadata.

1

G.V., Suresh, and Srinivasa Reddy E.V. "Uncertain Data Analysis with Regularized XGBoost". Webology 19, no. 1 (January 20, 2022): 3722–40. http://dx.doi.org/10.14704/web/v19i1/web19245.

Uncertainty is a ubiquitous element in available knowledge about the real world. Data sampling error, obsolete sources, network latency, and transmission error all contribute to this uncertainty, and it has to be handled cautiously, or else the classification results could be unreliable or even erroneous. Numerous methodologies have been developed to comprehend and control uncertainty in data, which has many faces: inconsistency, imprecision, ambiguity, incompleteness, vagueness, unpredictability, noise, and unreliability. Missing information is inevitable in real-world data sets. While some conventional multiple imputation approaches are well studied and have shown empirical validity, they entail limitations in processing large datasets with complex data structures, and these standard approaches tend to be computationally inefficient for medium and large datasets. In this paper, we propose a scalable multiple imputation framework based on XGBoost, bootstrapping and regularization. XGBoost, one of the fastest implementations of gradient boosted trees, automatically captures interactions and non-linear relations in a dataset while achieving high computational efficiency with the aid of bootstrapping and regularized methods. In the context of high-dimensional data, this methodology provides less biased estimates and reflects imputation variability more faithfully than previous regression approaches. We validate our adaptive imputation approaches against standard methods on numerical and real data sets and show promising results.
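A minimal sketch of the bootstrapped, regularized gradient-boosting idea described above, assuming a numeric feature matrix with gaps in one column (illustrative only, not the authors' implementation; all parameter values are assumptions):

```python
# Multiple imputation of one column via bootstrapped, regularized XGBoost.
import numpy as np
from xgboost import XGBRegressor  # any regularized gradient-boosted trees work similarly

rng = np.random.default_rng(0)

def impute_column(X, col, m=5):
    """Return m imputed versions of X[:, col], one per bootstrap replicate."""
    miss = np.isnan(X[:, col])
    Xo, yo = np.delete(X[~miss], col, axis=1), X[~miss, col]
    Xm = np.delete(X[miss], col, axis=1)   # XGBoost tolerates NaNs in the features
    draws = []
    for _ in range(m):
        idx = rng.integers(0, len(yo), len(yo))             # bootstrap the rows
        model = XGBRegressor(n_estimators=200, reg_lambda=1.0, max_depth=4)
        model.fit(Xo[idx], yo[idx])
        draws.append(model.predict(Xm))                     # one plausible completion
    return draws  # spread across draws reflects imputation variability
```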
2

Wang, Jianwei, Ying Zhang, Kai Wang, Xuemin Lin, and Wenjie Zhang. "Missing Data Imputation with Uncertainty-Driven Network". Proceedings of the ACM on Management of Data 2, no. 3 (May 29, 2024): 1–25. http://dx.doi.org/10.1145/3654920.

We study the problem of missing data imputation, a fundamental task in the area of data quality that aims to impute missing data to achieve the completeness of datasets. Though recent distribution-modeling-based techniques (e.g., distribution generation and distribution matching) can achieve state-of-the-art imputation accuracy, we notice that (1) they deploy sophisticated deep learning models that tend to overfit for missing data imputation; and (2) they rely directly on a global data distribution while overlooking local information. Driven by the inherent variability in both missing data and missing mechanisms, in this paper we explore the uncertain nature of this task and aim to address the limitations of existing works by proposing an uNcertainty-driven netwOrk for Missing data Imputation, termed NOMI. NOMI has three key components: the retrieval module, the neural network Gaussian process imputator (NNGPI) and the uncertainty-based calibration module. NOMI runs these components sequentially and in an iterative manner to achieve better imputation performance. Specifically, in the retrieval module, NOMI retrieves local neighbors of the incomplete data samples based on a pre-defined similarity metric. Subsequently, we design NNGPI, which merges the advantages of the Gaussian process with the universal approximation capacity of neural networks. NNGPI models uncertainty by learning the posterior distribution over the data to impute missing values while alleviating the overfitting issue. Moreover, we propose an uncertainty-based calibration module that uses the imputator's uncertainty about its predictions to help the retrieval module obtain more reliable local information, further enhancing imputation performance. We also demonstrate that NOMI can be reformulated as an instance of the well-known Expectation Maximization (EM) algorithm, highlighting the strong theoretical foundation of the proposed methods. Extensive experiments are conducted over 12 real-world datasets. The results demonstrate the excellent performance of NOMI in terms of both accuracy and efficiency.
3

Elimam, Rayane, Nicolas Sutton-Charani, Stéphane Perrey, and Jacky Montmain. "Uncertain imputation for time-series forecasting: Application to COVID-19 daily mortality prediction". PLOS Digital Health 1, no. 10 (October 25, 2022): e0000115. http://dx.doi.org/10.1371/journal.pdig.0000115.

The object of this study is to put forward uncertainty modeling associated with missing time-series data imputation in a predictive context. We propose three imputation methods associated with uncertainty modeling. These methods are evaluated on a COVID-19 dataset from which some values have been randomly removed. The dataset contains the numbers of daily COVID-19 confirmed diagnoses ("new cases") and daily deaths ("new deaths") recorded from the start of the pandemic up to July 2021. The considered task is to predict the number of new deaths 7 days in advance. The more values are missing, the greater the impact of imputation on predictive performance. The Evidential K-Nearest Neighbors (EKNN) algorithm is used for its ability to take label uncertainty into account. Experiments measure the benefits of the label uncertainty models. Results show the positive impact of uncertainty models on imputation performance, especially in a noisy context where the number of missing values is high.
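In the same spirit, a toy sketch of attaching an uncertainty to each imputed point in a daily series (this is not the authors' EKNN pipeline; the rolling-window width is an assumption):

```python
# Fill gaps in a daily series and tag each imputed value with a local spread.
import numpy as np
import pandas as pd

def impute_with_uncertainty(s: pd.Series, window: int = 7):
    filled = s.interpolate(method="linear", limit_direction="both")
    local_std = s.rolling(window, min_periods=2, center=True).std()
    local_std = local_std.interpolate(limit_direction="both")
    sigma = pd.Series(0.0, index=s.index)
    sigma[s.isna()] = local_std[s.isna()]   # nonzero uncertainty only where imputed
    return filled, sigma  # a downstream learner can weight labels by 1 / sigma**2
```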
4

Liang, Pei, Junhua Hu, Yongmei Liu, and Xiaohong Chen. "Public resources allocation using an uncertain cooperative game among vulnerable groups". Kybernetes 48, no. 8 (September 2, 2019): 1606–25. http://dx.doi.org/10.1108/k-03-2018-0146.

Purpose: This paper aims to solve the problem of public resource allocation among vulnerable groups by proposing a new method called the uncertain α-coordination value, based on an uncertain cooperative game. Design/methodology/approach: First, explicit forms of the uncertain Shapley value with Choquet integral form and the uncertain centre-of-gravity of imputation-set (CIS) value are defined separately on the basis of uncertainty theory and cooperative game theory. Then, a convex combination of the two values, called the uncertain α-coordination value, is used as the best solution. The study proves that the proposed methods satisfy the basic properties of cooperative games. Findings: The uncertain α-coordination value is used to solve a public medical resource allocation problem with fuzzy coalitions and uncertain payoffs. Compared with other methods, the α-coordination value solves such problems effectively because it balances concerns about vulnerable groups' further development against group fairness. Originality/value: The paper proposes an extension of the classical cooperative game, the uncertain cooperative game, in which players may choose any level of participation in a game and uncertainty is attached to the value of the game. A new function, the uncertain α-coordination value, is proposed to allocate public resources amongst vulnerable groups in an uncertain environment, a topic that has not been explored yet. The definitions of the uncertain Shapley value with Choquet integral form and the uncertain CIS value are proposed separately to establish the uncertain α-coordination value.
5

Bleidorn, Michel Trarbach, Wanderson de Paula Pinto, Isamara Maria Schmidt, Antonio Sergio Ferreira Mendonça, and José Antonio Tosta dos Reis. "Methodological approaches for imputing missing data into monthly flows series". Ambiente e Agua - An Interdisciplinary Journal of Applied Science 17, no. 2 (April 5, 2022): 1–27. http://dx.doi.org/10.4136/ambi-agua.2795.

Missing data is one of the main difficulties in working with fluviometric records. Database gaps may result from problems with fluviometric station components, monitoring interruptions and lack of observers. Analysis of incomplete series generates uncertain results, negatively impacting water resources management, so proper handling of missing data is very important to ensure better information quality. This work comparatively analyzes missing data imputation methodologies for monthly river-flow time series, considering, as a case study, the Doce River, located in Southeast Brazil. Missing data were simulated in proportions of 5%, 10%, 15%, 25% and 40% following a random distribution pattern, ignoring the missing data generation mechanisms. Ten missing data imputation methodologies were used: arithmetic mean, median, simple and multiple linear regression, regional weighting, spline and Stineman interpolation, Kalman smoothing, multiple imputation and maximum likelihood. Their performances were compared through bias, root mean square error, mean absolute percentage error, coefficient of determination and concordance index. Results indicate that for 5% missing data, any of the imputation methodologies can be considered, though caution is recommended with the arithmetic mean method. As the proportion of missing data increases, however, the multiple imputation and maximum likelihood methodologies are recommended when support stations are available for imputation, and the Stineman interpolation and Kalman smoothing methods when only the studied series is available. Keywords: Doce River, imputation, missing data.
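The evaluation protocol above is straightforward to reproduce in outline; a hedged sketch with a hypothetical file and column name, covering three of the ten methods:

```python
# Mask a known fraction of a complete series, impute, and score the imputations.
import numpy as np
import pandas as pd

def score(y_true, y_imp):
    e = y_imp - y_true
    return {"bias": e.mean(), "rmse": float(np.sqrt((e ** 2).mean()))}

flows = pd.read_csv("monthly_flows.csv")["flow"]       # hypothetical complete series
for frac in (0.05, 0.10, 0.15, 0.25, 0.40):
    mask = np.random.default_rng(42).random(len(flows)) < frac
    s = flows.where(~mask)                             # inject random gaps
    candidates = {
        "mean": s.fillna(s.mean()),
        "median": s.fillna(s.median()),
        "linear": s.interpolate(limit_direction="both"),
    }
    for name, imp in candidates.items():
        print(frac, name, score(flows[mask], imp[mask]))
```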
6

Gromova, Ekaterina, Anastasiya Malakhova, and Arsen Palestini. "Payoff Distribution in a Multi-Company Extraction Game with Uncertain Duration". Mathematics 6, no. 9 (September 11, 2018): 165. http://dx.doi.org/10.3390/math6090165.

A nonrenewable resource extraction game model is analyzed in a differential game theory framework with random duration. If the cumulative distribution function (c.d.f.) of the final time is discontinuous, the related subgames are differentiated based on the position of the initial instant with respect to the jump. We investigate properties of optimal trajectories and of imputation distribution procedures if the game is played cooperatively.
7

Lee, Jung Yeon, Myeong-Kyu Kim, and Wonkuk Kim. "Robust Linear Trend Test for Low-Coverage Next-Generation Sequence Data Controlling for Covariates". Mathematics 8, no. 2 (February 8, 2020): 217. http://dx.doi.org/10.3390/math8020217.

Low-coverage next-generation sequencing experiments assisted by statistical methods are popular in genetic association studies. Next-generation sequencing experiments produce genotype data that include allele read counts and read depths. At low sequencing depths, genotypes tend to be highly uncertain, so uncertain genotypes are usually removed or imputed before a statistical analysis is performed; this may inflate the type I error rate and reduce statistical power. In this paper, we propose a mixture-based penalized score association test that adjusts for non-genetic covariates. The proposed score test statistic is based on a sandwich variance estimator, so it is robust under misspecification of the model between the covariates and the latent genotypes. The proposed method has the advantage of requiring neither external imputation nor elimination of uncertain genotypes. The results of our simulation study show that the type I error rates are well controlled and that the proposed association test has reasonable statistical power. As an illustration, we apply our statistic to pharmacogenomics data on drug responsiveness among 400 epilepsy patients.
8

Griffin, James M., Jino Mathew, Antal Gasparics, Gábor Vértesy, Inge Uytdenhouwen, Rachid Chaouadi, and Michael E. Fitzpatrick. "Machine-Learning Approach to Determine Surface Quality on a Reactor Pressure Vessel (RPV) Steel". Applied Sciences 12, no. 8 (April 7, 2022): 3721. http://dx.doi.org/10.3390/app12083721.

Surface quality measures such as roughness, and especially their uncertain character, affect most magnetic non-destructive testing methods and limit their performance in terms of achievable signal-to-noise ratio and reliability. This paper is primarily focused on an experimental study of nuclear reactor materials machined by milling with various parameters to produce varying surface quality conditions that mimic the material surfaces found in the field. A local area is energised electromagnetically, and a receiver coil captures the emitted Barkhausen noise, from which the condition of the material surface can be inspected. Investigations were carried out with the support of machine-learning algorithms, such as Neural Networks (NN) and Classification and Regression Trees (CART), to identify the differences in surface quality. Another challenge often faced is undertaking an analysis with limited experimental data, so other non-destructive methods, such as Magnetic Adaptive Testing (MAT), were used together with intelligent algorithms to impute missing data, and data augmentation was used for data reinforcement. With more data, the problem of 'the curse of data dimensionality' is addressed. The study demonstrates how both data imputation and augmentation can improve measurement datasets.
9

FLÅM, S. D., and Y. M. ERMOLIEV. "Investment, uncertainty, and production games". Environment and Development Economics 14, no. 1 (February 2009): 51–66. http://dx.doi.org/10.1017/s1355770x08004579.

This paper explores a few cooperative aspects of investments in uncertain, real options. By hypothesis some production commitments, factors, or quotas are transferable. Cases in point include energy supply, emission of pollutants, and harvest of renewable resources. Of particular interest are technologies or projects that provide anti-correlated returns. Any such project stabilizes the aggregate proceeds. Therefore, given widespread risk aversion, a project of this sort merits a bonus. The setting is formalized as a two-stage, stochastic, production game. Absent economies of scale, such games are quite tractable in analysis, computation, and realization. A core imputation comes in terms of shadow prices that equilibrate competitive, endogenous markets. Such prices emerge as optimal dual solutions to coordinated production programs, featuring pooled commitments, or resources. Alternatively, the prices could result from repeated exchange.
10

Le, H., S. Batterman, K. Dombrowski, R. Wahl, J. Wirth, E. Wasilevich, and M. Depa. "A Comparison of Multiple Imputation and Optimal Estimation for Missing and Uncertain Urban Air Toxics Data". Epidemiology 17, Suppl (November 2006): S242. http://dx.doi.org/10.1097/00001648-200611001-00624.

11

Jin, Weijian, and Yajing Zhang. "Identifying Critical Success Factors of an Emergency Information Response System Based on the Similar-DEMATEL Method". Sustainability 15, no. 20 (October 12, 2023): 14823. http://dx.doi.org/10.3390/su152014823.

An emergency information response system (EIRS) is a system that utilizes various intelligence technologies to effectively handle emergencies and provide decision support for decision-makers. As critical success factors (CSFs) of an EIRS play a vital role in emergency management, it is necessary to study them. Most previous studies applied the Decision-Making Trial and Evaluation Laboratory (DEMATEL) method with complete evaluation information to identify CSFs. Because of the complexity of the decision-making environment when identifying CSFs of an EIRS, decision-makers sometimes cannot provide complete evaluation information during the decision-making process. To fill this gap, this paper provides a Similar-DEMATEL method to impute the missing values and identify the CSFs of an EIRS, which may avoid decision distortion and make decision-making results more accurate. It is found that Information mining capability, Equipment support capability, Monitoring and early warning capability, and Organization participation capability are the CSFs of an EIRS. Unlike previous research, which imputed missing values with approaches such as mean imputation, we compared the proposed method with the mean imputation method and demonstrated its advantages. Our method focuses on uncertain decision-making environments, which improves the efficiency of an EIRS in emergency management and makes the method more widely applicable.
12

Shen, Xiaotao, and Zheng-Jiang Zhu. "MetFlow: an interactive and integrated workflow for metabolomics data cleaning and differential metabolite discovery". Bioinformatics 35, no. 16 (January 2, 2019): 2870–72. http://dx.doi.org/10.1093/bioinformatics/bty1066.

Summary: Mass spectrometry-based metabolomics aims to profile the metabolic changes in biological systems and identify differential metabolites related to physiological phenotypes and aberrant activities. However, many confounding factors during data acquisition complicate metabolomics data, which are characterized by high dimensionality, uncertain degrees of missing and zero values, nonlinearity, unwanted variations and non-normality. Therefore, prior to differential metabolite discovery analysis, various types of data cleaning, such as batch alignment, missing value imputation, data normalization and scaling, are essential for data post-processing. Here, we developed an interactive web server, MetFlow, to provide an integrated and comprehensive workflow for metabolomics data cleaning and differential metabolite discovery. Availability and implementation: MetFlow is freely available at http://metflow.zhulab.cn/.
13

Poyatos, Rafael, Oliver Sus, Llorenç Badiella, Maurizio Mencuccini, and Jordi Martínez-Vilalta. "Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information". Biogeosciences 15, no. 9 (May 4, 2018): 2601–17. http://dx.doi.org/10.5194/bg-15-2601-2018.

The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they pose specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31,900 km²). We simulated gaps at different missingness levels (10–80%) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species-mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (>30%), species-mean imputations and regression kriging tended to outperform MICE for some traits. MICE informed by relevant ecological variables allowed us to fill the gaps in the IEFC incomplete dataset (5495 plots) and quantify imputation uncertainty. Resulting spatial patterns of the studied traits in Catalan forests were broadly similar when using species means, regression kriging or the best-performing MICE application, but some important discrepancies were observed at the local level. Our results highlight the need to assess imputation quality beyond just imputation accuracy and show that including environmental information in statistical imputation approaches yields more plausible imputations in spatially explicit plant trait datasets.
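The warning that accuracy can mask a collapsed distribution is easy to demonstrate on synthetic data (illustrative numbers only):

```python
# Overall-mean imputation deflates the variance of the imputed trait.
import numpy as np

rng = np.random.default_rng(3)
trait = rng.lognormal(0.0, 0.5, 1000)          # synthetic "trait" values
mask = rng.random(1000) < 0.4                  # 40% missingness
imputed = trait.copy()
imputed[mask] = trait[~mask].mean()            # replace every gap by the mean
print(trait.std(), imputed.std())              # imputed std is badly deflated
```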
14

THANGARAJ, V., A. SUGUMARAN, and AMIT K. BISWAS. "ON BARGAINING BASED POINT SOLUTION TO COOPERATIVE TU GAMES". International Game Theory Review 09, no. 02 (June 2007): 361–74. http://dx.doi.org/10.1142/s0219198907001461.

Consider cooperative coalition games with side payments. Bargaining sets are calculated for all possible coalition structures to obtain a collection of imputations rather than a single imputation. Our aim is to obtain a single payoff vector that is acceptable to all players of the game under the grand coalition. Though the Shapley value is a single imputation, it is based on fair division rather than bargaining considerations. We therefore present a method to obtain a single imputation based on bargaining considerations.
15

Vidotto, Davide, Jeroen K. Vermunt, and Katrijn Van Deun. "Bayesian Latent Class Models for the Multiple Imputation of Categorical Data". Methodology 14, no. 2 (April 1, 2018): 56–68. http://dx.doi.org/10.1027/1614-2241/a000146.

Latent class analysis has recently been proposed for the multiple imputation (MI) of missing categorical data, using either a standard frequentist approach or a nonparametric Bayesian model called the Dirichlet process mixture of multinomial distributions (DPMM). The main advantage of using a latent class model for multiple imputation is its flexibility: it can capture complex relationships in the data provided the number of latent classes is large enough. However, the two existing approaches also have certain disadvantages. The frequentist approach is computationally demanding because it requires estimating many LC models: first, models with different numbers of classes must be estimated to determine the required number of classes, and then the selected model is re-estimated for multiple bootstrap samples to take parameter uncertainty into account during the imputation stage. Whereas the Bayesian Dirichlet process model performs the model selection and the handling of parameter uncertainty automatically, its disadvantage is that it tends to use too few clusters during Gibbs sampling, leading to an underfitting model that yields invalid imputations. In this paper, we propose an alternative approach that combines the strengths of the two existing approaches: we use the Bayesian standard latent class model as an imputation model. We show how model selection can be performed prior to the imputation step using a single run of the Gibbs sampler and, moreover, how underfitting is prevented by using large values for the hyperparameters of the mixture weights. The results of two simulation studies and one real-data study indicate that, with a proper setting of the prior distributions, the Bayesian latent class model yields valid imputations and outperforms competing methods.
16

Corder, Nathan, and Shu Yang. "Estimating Average Treatment Effects Utilizing Fractional Imputation when Confounders are Subject to Missingness". Journal of Causal Inference 8, no. 1 (January 1, 2020): 249–71. http://dx.doi.org/10.1515/jci-2019-0024.

The problem of missingness in observational data is ubiquitous. When the confounders are missing at random, multiple imputation is commonly used; however, the method requires congeniality conditions for valid inferences, which may not be satisfied when estimating average causal treatment effects. Alternatively, fractional imputation, proposed by Kim (2011), has been used to handle missing values in regression contexts. In this article, we develop fractional imputation methods for estimating average treatment effects with confounders missing at random. We show that the fractional imputation estimator of the average treatment effect is asymptotically normal, which permits a consistent variance estimate. Via a simulation study, we compare fractional imputation's accuracy and precision with that of multiple imputation.
17

Pettersson, Nicklas. "Bias reduction of finite population imputation by kernel methods". Statistics in Transition new series 14, no. 1 (March 4, 2013): 139–60. http://dx.doi.org/10.59170/stattrans-2013-009.

Missing data is a nuisance in statistics. Real donor imputation can be used with item nonresponse. A pool of donor units with similar values on auxiliary variables is matched to each unit with missing values. The missing value is then replaced by a copy of the corresponding observed value from a randomly drawn donor. Such methods can to some extent protect against nonresponse bias. But bias also depends on the estimator and the nature of the data. We adopt techniques from kernel estimation to combat this bias. Motivated by Pólya urn sampling, we sequentially update the set of potential donors with units already imputed, and use multiple imputations via Bayesian bootstrap to account for imputation uncertainty. Simulations with a single auxiliary variable show that our imputation method performs almost as well as competing methods with linear data, but better when data is nonlinear, especially with large samples.
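A minimal sketch of donor imputation with Bayesian-bootstrap draws over a donor pool, in the spirit of the method above (the sequential Pólya-urn update of the donor set is omitted; pool size and number of draws are assumptions):

```python
# Draw multiple imputations from a k-donor pool with Dirichlet(1,...,1) weights.
import numpy as np

rng = np.random.default_rng(1)

def donor_impute(x_obs, y_obs, x_miss, k=10, m=20):
    """Return an (m, len(x_miss)) array of imputed y draws."""
    draws = np.empty((m, len(x_miss)))
    for j, x in enumerate(x_miss):
        pool = np.argsort(np.abs(x_obs - x))[:k]   # k closest donors on auxiliary x
        for i in range(m):
            w = rng.dirichlet(np.ones(k))          # Bayesian bootstrap weights
            draws[i, j] = y_obs[rng.choice(pool, p=w)]
    return draws  # between-draw variation carries the imputation uncertainty
```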
18

Seu, Kimseth, Mi-Sun Kang, and HwaMin Lee. "An Intelligent Missing Data Imputation Techniques: A Review". JOIV : International Journal on Informatics Visualization 6, no. 1-2 (May 31, 2022): 278. http://dx.doi.org/10.30630/joiv.6.1-2.935.

Incomplete datasets are an inescapable problem in data preprocessing, since most machine learning algorithms cannot train a model on them. Various data imputation approaches have been proposed to resolve this problem, each predicting the most appropriate replacement value using a different machine learning algorithm and concept. Accurate imputation is exceptionally critical for some datasets, especially in medical data. The purpose of this paper is to assess the power of distinguished state-of-the-art benchmarks: the K-Nearest Neighbors imputation (KNNImputer) method, the Bayesian Principal Component Analysis (BPCA) imputation method, the Multiple Imputation by Chained Equations (MICE) method, and the Multiple Imputation with Denoising Autoencoder neural networks (MIDAS) method. We evaluate all these imputation techniques on the same four datasets collected from a hospital. Both Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used to measure and compare their performance. In the experiments, KNNImputer and MICE performed better than BPCA and MIDAS, and BPCA performed better than MIDAS.
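Two of the benchmarked methods, KNN imputation and MICE-style chained equations, are available directly in scikit-learn, so the comparison can be sketched on toy data (MAE and RMSE computed on the masked cells only):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

X_true = np.random.default_rng(0).normal(size=(200, 5))
X = X_true.copy()
X[np.random.default_rng(1).random(X.shape) < 0.2] = np.nan   # 20% random gaps

mask = np.isnan(X)
for name, imp in [("kNN", KNNImputer(n_neighbors=5)),
                  ("MICE-style", IterativeImputer(max_iter=10, random_state=0))]:
    X_hat = imp.fit_transform(X)
    err = X_hat[mask] - X_true[mask]
    print(name, "MAE", np.abs(err).mean(), "RMSE", np.sqrt((err ** 2).mean()))
```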
19

Templ, Matthias. "Enhancing Precision in Large-Scale Data Analysis: An Innovative Robust Imputation Algorithm for Managing Outliers and Missing Values". Mathematics 11, no. 12 (June 16, 2023): 2729. http://dx.doi.org/10.3390/math11122729.

Navigating the intricate world of data analytics, one method has emerged as a key tool in confronting missing data: multiple imputation. Its strength is further fortified by its powerful variant, robust imputation, which enhances the precision and reliability of the results. In the challenging landscape of data analysis, non-robust methods can be swayed by a few extreme outliers, leading to skewed imputations and biased estimates. This applies both to representative outliers, those true yet unusual values of a population, and to non-representative outliers, which are mere measurement errors. Detecting these outliers in large or high-dimensional data sets often becomes as complex as unravelling a Gordian knot. The solution is to turn to robust imputation methods, which effectively manage outliers and exhibit remarkable resistance to their influence, providing a more reliable approach to dealing with missing data. Moreover, these robust methods offer flexibility, remaining effective even if the imputation model used is not a perfect fit; they are akin to a well-designed buffer system, absorbing slight deviations without compromising overall stability. In the latest advancement of statistical methodology, a new robust imputation algorithm has been introduced. This innovative solution addresses three significant challenges with robustness: it utilizes robust bootstrapping to manage model uncertainty during the imputation of a random sample; it incorporates robust fitting to reinforce accuracy; and it takes imputation uncertainty into account in a resilient manner. Furthermore, any complex regression or classification model for any variable with missing data can be run through the algorithm. With this new algorithm, we move one step closer to optimizing the accuracy and reliability of handling missing data. Using a realistic data set and a simulation study including a sensitivity analysis, the new algorithm, imputeRobust, shows excellent performance compared with other common methods. Effectiveness was demonstrated by measures of precision for the prediction error, coverage rates, and mean square errors of the estimators, as well as by visual comparisons.
20

van Amsterdam, Wouter A. C., Netanja I. Harlianto, Joost J. C. Verhoeff, Pim Moeskops, Pim A. de Jong, and Tim Leiner. "The Association between Muscle Quantity and Overall Survival Depends on Muscle Radiodensity: A Cohort Study in Non-Small-Cell Lung Cancer Patients". Journal of Personalized Medicine 12, no. 7 (July 21, 2022): 1191. http://dx.doi.org/10.3390/jpm12071191.

The prognostic value of CT-derived muscle quantity for overall survival (OS) in patients with non-small-cell lung cancer (NSCLC) is uncertain due to conflicting evidence. We hypothesize that increased muscle quantity is associated with better OS in patients with normal muscle radiodensity but not in patients with fatty degeneration of muscle tissue and low muscle radiodensity. We performed an observational cohort study in NSCLC patients treated with radiotherapy. A deep learning algorithm was used to measure muscle quantity as psoas muscle index (PMI) and psoas muscle radiodensity (PMD) on computed tomography. The potential interaction between PMI and PMD for OS was investigated using Cox proportional-hazards regression. Baseline adjustment variables were age, sex, histology, performance score and body mass index. We investigated non-linear effects of continuous variables and imputed missing values using multiple imputation. We included 2840 patients and observed 1975 deaths in 5903 patient years. The average age was 68.9 years (standard deviation 10.4, range 32 to 96) and 1692 patients (59.6%) were male. PMI was more positively associated with OS for higher values of PMD (hazard ratio for interaction 0.915; 95% confidence interval 0.861–0.972; p-value 0.004). We found evidence that high muscle quantity is associated with better OS when muscle radiodensity is higher, in a large cohort of NSCLC patients treated with radiotherapy. Future studies on the association between muscle status and OS should accommodate this interaction in their analysis for more accurate and more generalizable results.
21

Osintsev, D. V. "Administrative Liability for Discrediting the Activities of Public Authorities". Siberian Law Review 20, no. 4 (September 6, 2023): 355–66. http://dx.doi.org/10.19073/2658-7602-2023-20-4-355-366.

The subject of the analysis is the relations and activities of law enforcement agencies related to the qualification of administrative offences aimed at discrediting the Armed Forces of the Russian Federation and state authorities amid unfriendly relations and armed confrontation between individual countries. The purpose of the study is to clarify the phenomenon of verbal and intensional offences of a psycho-emotional type and the peculiarities of the construction, design and deconstruction of the main elements of this type of administrative offence. A further purpose is to clarify the criteria for separating a conditionally lawful act from a categorically unlawful one expressed as a negative statement or as an aggressive, unstable psychological state of the person who committed an administrative offence. The research methods are based on applying the legal text and correlating the context of circumstances from the standpoint of propositional logic (derivation of categorical judgements about the meaning of certain terms with uncertain semantics: discrediting, publicity, targeting, etc.). The doctrine of multiple predicates is also used, which allows building matrix structures of the admissibility of imputing an unlawful act only to a subject with a certain legal status, as well as situational negation of the admissibility of imputation where a full map of the subject's legal status is not available. In addition, the design of modal constructions (rules for qualifying administrative offences of a psycho-emotional type) and the logic of evaluations (weak and strong statements about prohibited or permissible statements about the activity of representatives of the authorities in certain conditions or in certain ways) are used as research methods. The main conclusions of the study assert the priority of establishing amenability and punishability for offences related to the formation of destructive anti-cultural stereotypes, false ideological attitudes, belittling of the traditional way of life and denial of the vital foundations of Russian society, as well as the need to eliminate legal constructions with vague and unclear content. Increased attention should be paid to semantic analysis of whether the offender's phrases aim precisely at replacing the upheld public position with egoistic aspirations and at planting the priority of personal misconceptions. A separate category of rules of information-semantic analysis should be introduced when imputing a given administrative offence.
22

Ma, Zong-fang, Hui-xuan Zhao, Lei-hua Li, and Lin Song. "A Belief Two-Level Weighted Clustering Method for Incomplete Pattern Based on Multiview Fusion". Computational Intelligence and Neuroscience 2022 (November 30, 2022): 1–11. http://dx.doi.org/10.1155/2022/2895338.

Incomplete pattern clustering is a challenging task because the unknown attributes of the missing data introduce uncertain information that affects the accuracy of the results. In addition, clustering methods based on a single view ignore the complementary information available from multiple views. Therefore, a new belief two-level weighted clustering method based on multiview fusion (BTC-MV) is proposed to deal with incomplete patterns. Initially, the BTC-MV method estimates the missing data by an attribute-level weighted imputation method with a k-nearest neighbor (KNN) strategy based on multiple views; the unknown attributes are replaced by the average of the KNN. Then, a clustering method based on multiple views is proposed for the completed data set, with view weights representing the reliability of the evidence from different source spaces. The membership values from multiple views, which indicate the probability of a pattern belonging to different categories, reduce the risk of misclustering. Finally, a view-level weighted fusion strategy based on belief function theory is proposed to integrate the membership values from different source spaces, which improves the accuracy of the clustering task. To validate the performance of the BTC-MV method, extensive experiments are conducted comparing it with classical methods, such as MI-KM, MI-KMVC, KNNI-FCM, and KNNI-MFCM. Results on six UCI data sets show that the error rate of the BTC-MV method is lower than that of the other methods. It can therefore be concluded that the BTC-MV method has superior performance in dealing with incomplete patterns.
23

Early, Kirstin, Jennifer Mankoff, and Stephen E. Fienberg. "Dynamic Question Ordering in Online Surveys". Journal of Official Statistics 33, no. 3 (September 1, 2017): 625–57. http://dx.doi.org/10.1515/jos-2017-0030.

Online surveys have the potential to support adaptive questions, where later questions depend on earlier responses. Past work has taken a rule-based approach, applied uniformly across all respondents. We envision a richer interpretation of adaptive questions, which we call Dynamic Question Ordering (DQO), where question order is personalized. Such an approach could increase engagement, and therefore response rate, as well as imputation quality. We present a DQO framework to improve survey completion and imputation. In the general survey-taking setting, we want to maximize survey completion, and so we focus on ordering questions to engage the respondent and collect, ideally, all information, or at least the information that most characterizes the respondent, for accurate imputations. In another scenario, our goal is to provide a personalized prediction. Since it is possible to give reasonable predictions with only a subset of questions, we are not concerned with motivating users to answer all questions; instead, we want to order questions to obtain the information that most reduces prediction uncertainty, while not being too burdensome. We illustrate this framework with two case studies, one for the prediction setting and one for the survey-taking setting. We also discuss DQO for national surveys and consider connections between our statistics-based question-ordering approach and cognitive survey methodology.
24

Alahamade, Wedad, Iain Lake, Claire E. Reeves, and Beatriz De La Iglesia. "Evaluation of multivariate time series clustering for imputation of air pollution data". Geoscientific Instrumentation, Methods and Data Systems 10, no. 2 (November 3, 2021): 265–85. http://dx.doi.org/10.5194/gi-10-265-2021.

Air pollution is one of the world's leading risk factors for death, with 6.5 million deaths per year worldwide attributed to air-pollution-related diseases. Understanding the behaviour of certain pollutants through air quality assessment can produce improvements in air quality management that will translate into health and economic benefits. However, problems with missing data and uncertainty hinder that assessment. We are motivated by the need to enhance the air pollution data available. We focus on the problem of missing air pollutant concentration data, either because a limited set of pollutants is measured at a monitoring site or because an instrument is not operating, so a particular pollutant is not measured for a period of time. In our previous work, we proposed models which can impute a whole missing time series to enhance air quality monitoring. Some of these models are based on a multivariate time series (MVTS) clustering method. Here, we apply our method to real data and show how different graphical and statistical model evaluation functions enable us to select the imputation model that produces the most plausible imputations. We then compare the Daily Air Quality Index (DAQI) values obtained after imputation with observed values incorporating missing data. Our results show that using an ensemble model that aggregates the spatial similarity obtained from the geographical correlation between monitoring stations and the fused temporal similarity between pollutant concentrations produces very good imputation results. Furthermore, the analysis enhances understanding of the different pollutant behaviours and of the characteristics of different stations according to their environmental type.
25

Kim, Mimi, Joan T. Merrill, Cuiling Wang, Shankar Viswanathan, Ken Kalunian, Leslie Hanrahan, and Peter Izmirly. "SLE clinical trials: impact of missing data on estimating treatment effects". Lupus Science & Medicine 6, no. 1 (October 2019): e000348. http://dx.doi.org/10.1136/lupus-2019-000348.

Objective: A common problem in clinical trials is missing data due to participant dropout and loss to follow-up, an issue which continues to receive considerable attention in the clinical research community. Our objective was to examine and compare current and alternative methods for handling missing data in SLE trials, with a particular focus on multiple imputation, a flexible technique that has been applied in different disease settings but not to address missing data in the primary outcome of an SLE trial. Methods: Data on 279 patients with SLE randomised to standard of care (SoC) and also receiving mycophenolate mofetil (MMF), azathioprine or methotrexate were obtained from the Lupus Foundation of America-Collective Data Analysis Initiative Database. Complete case analysis (CC), last observation carried forward (LOCF), non-responder imputation (NRI) and multiple imputation (MI) were applied to handle missing data in an analysis assessing differences in SLE Responder Index-5 (SRI-5) response rates at 52 weeks between patients on SoC treated with MMF versus other immunosuppressants (non-MMF). Results: The rates of missing data were 32% in the MMF and 23% in the non-MMF groups. As expected, the NRI approach yielded the lowest estimated response rates. The smallest and least significant estimates of between-group differences were observed with LOCF, and precision was lowest with the CC method. Estimated between-group differences were magnified with the MI approach, and imputing SRI-5 directly versus deriving SRI-5 after separately imputing its individual components yielded similar results. Conclusion: The potential advantages of applying MI to address missing data in an SLE trial include reduced bias when estimating treatment effects and measures of precision that properly reflect uncertainty in the imputations. However, results can vary depending on the imputation model used, and the underlying assumptions should be plausible. Sensitivity analysis should be conducted to demonstrate robustness of results, especially when missing data proportions are high.
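Three of the compared strategies reduce to one-liners on a wide responder table; a hedged sketch with a hypothetical data frame (MI would instead generate several completed datasets and pool them via Rubin's rules, as sketched under entry 37):

```python
# Complete case (CC), LOCF, and non-responder imputation (NRI) on binary outcomes.
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3],
                   "week": [26, 52, 26, 52, 26, 52],
                   "sri5": [1, None, 0, 0, 1, 1]})   # NaN = dropped out
wide = df.pivot(index="id", columns="week", values="sri5")

cc = wide.dropna()[52]           # drop patients with any missing visit
locf = wide.ffill(axis=1)[52]    # carry the last observation forward
nri = wide[52].fillna(0)         # treat dropouts as non-responders
print(cc.mean(), locf.mean(), nri.mean())
```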
26

He, Deniu. "Active learning for ordinal classification on incomplete data". Intelligent Data Analysis 27, no. 3 (May 18, 2023): 613–34. http://dx.doi.org/10.3233/ida-226664.

Existing active learning algorithms typically assume that the data provided are complete. Nonetheless, data with missing values are common in real-world applications, and active learning on incomplete data is less studied. This paper studies the problem of active learning for ordinal classification on incomplete data. Although cutting-edge imputation methods can be used to impute the missing values before commencing active learning, inaccurately imputed instances are unavoidable and may degrade the ordinal classifier’s performance once labeled. Therefore, the crucial question in this work is how to reduce the negative impact of imprecisely filled instances on active learning. First, to avoid selecting filled instances with high imputation imprecision, we propose penalizing the query selection with a novel imputation uncertainty measure that combines a feature-level imputation uncertainty and a knowledge-level imputation uncertainty. Second, to mitigate the adverse influence of potentially labeled imprecisely imputed instances, we suggest using a diversity-based uncertainty sampling strategy to select query instances in specified candidate instance regions. Extensive experiments on nine public ordinal classification datasets with varying value missing rates show that the proposed approach outperforms several baseline methods.
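A minimal sketch of the core idea, penalizing query selection by imputation uncertainty (illustrative scoring only; the paper's two-level uncertainty measure and diversity-based region sampling are not reproduced):

```python
# Pick the next instance to label: informative for the classifier, yet not badly imputed.
import numpy as np

def select_query(proba, imput_sigma, lam=1.0):
    """proba: (n, c) class probabilities; imput_sigma: (n,) imputation uncertainty."""
    top2 = np.sort(proba, axis=1)[:, -2:]
    informativeness = 1.0 - (top2[:, 1] - top2[:, 0])   # small margin = informative
    score = informativeness - lam * imput_sigma         # penalize uncertain imputations
    return int(np.argmax(score))
```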
27

Zhang, Li-Chun. "Likelihood Imputation". Scandinavian Journal of Statistics 25, no. 2 (June 1998): 401–14. http://dx.doi.org/10.1111/1467-9469.00112.

28

Maldonado-Cruz, Eduardo, John T. Foster, and Michael J. Pyrcz. "Sonic Well-Log Imputation Through Machine-Learning-Based Uncertainty Models". Petrophysics – The SPWLA Journal of Formation Evaluation and Reservoir Description 64, no. 2 (April 1, 2023): 253–70. http://dx.doi.org/10.30632/pjv64n2-2023a7.

Sonic well logs provide critical information to calibrate seismic data and support geomechanical characterization. Advanced subsurface data analytics and machine learning enable new methods and workflows for property estimation, regression, and classification in geoscience and subsurface engineering applications. However, current applications for imputation of well-logging values rely only on model accuracy and low prediction errors. Traditional model validation techniques are not enough to validate models and account for the substantial uncertainty in the subsurface. Well-logging imputation estimates and their associated uncertainty models are essential to field development planning and decision-making workflows, such as reservoir modeling, volumetric resource assessment, predrill prediction with uncertainty, remaining resource mapping, and production allocation. When performing subsurface feature imputation with machine learning, we must expand our model training and complexity-tuning workflows to check the entire uncertainty model and ensure that uncertainty distributions are precise and accurate. We propose a workflow that integrates the goodness metric to calculate accurate and precise uncertainty models of sonic well-log predictions based on ensembles of machine-learning estimates. Our workflow combines model evaluation with visualization of the estimates and the uncertainty model with respect to measured depth. The proposed method provides intuitive diagnostics and metrics to evaluate estimation accuracy and uncertainty model goodness.
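One way to check such an uncertainty model's goodness is to compare nominal central-interval probabilities with the empirical coverage of the measured log across the ensemble (a generic sketch; array shapes are assumptions):

```python
# Empirical coverage of ensemble prediction intervals versus nominal probability.
import numpy as np

def coverage_curve(ensemble, y, ps=np.linspace(0.1, 0.9, 9)):
    """ensemble: (n_models, n_depths) predictions; y: (n_depths,) measured log."""
    cov = []
    for p in ps:
        lo = np.quantile(ensemble, 0.5 - p / 2, axis=0)
        hi = np.quantile(ensemble, 0.5 + p / 2, axis=0)
        cov.append(float(np.mean((y >= lo) & (y <= hi))))
    return ps, np.array(cov)   # points near the 1:1 line indicate accurate uncertainty
```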
29

Scheuren, Fritz. "Multiple Imputation". American Statistician 59, no. 4 (November 2005): 315–19. http://dx.doi.org/10.1198/000313005x74016.

30

Han, Jongmin, and Seokho Kang. "Active learning with missing values considering imputation uncertainty". Knowledge-Based Systems 224 (July 2021): 107079. http://dx.doi.org/10.1016/j.knosys.2021.107079.

31

Mathiowetz, Nancy A. "Respondent Expressions of Uncertainty: Data Source for Imputation". Public Opinion Quarterly 62, no. 1 (1998): 47. http://dx.doi.org/10.1086/297830.

32

Heydarbeygie, Akbar, and Nima Ahmadi. "Nonparametric methods for the estimation of imputation uncertainty". Journal of Applied Statistics 40, no. 3 (March 2013): 693–98. http://dx.doi.org/10.1080/02664763.2012.750649.

33

Chan, Kelvin K. W., Feng Xie, Andrew R. Willan, and Eleanor M. Pullenayegum. "Underestimation of Variance of Predicted Health Utilities Derived from Multiattribute Utility Instruments". Medical Decision Making 37, no. 3 (July 10, 2016): 262–72. http://dx.doi.org/10.1177/0272989x16650181.

Background. Parameter uncertainty in value sets of multiattribute utility-based instruments (MAUIs) has received little attention previously. This false precision leads to underestimation of the uncertainty of the results of cost-effectiveness analyses. The aim of this study is to examine the use of multiple imputation as a method to account for this uncertainty of MAUI scoring algorithms. Method. We fitted a Bayesian model with random effects for respondents and health states to the data from the original US EQ-5D-3L valuation study, thereby estimating the uncertainty in the EQ-5D-3L scoring algorithm. We applied these results to EQ-5D-3L data from the Commonwealth Fund (CWF) Survey for Sick Adults (n = 3958), comparing the standard error of the estimated mean utility in the CWF population using the predictive distribution from the Bayesian mixed-effect model (i.e., incorporating parameter uncertainty in the value set) with the standard error of the estimated mean utilities based on multiple imputation and the standard error using the conventional approach of using MAUI (i.e., ignoring uncertainty in the value set). Result. The mean utility in the CWF population based on the predictive distribution of the Bayesian model was 0.827 with a standard error (SE) of 0.011. When utilities were derived using the conventional approach, the estimated mean utility was 0.827 with an SE of 0.003, which is only 25% of the SE based on the full predictive distribution of the mixed-effect model. Using multiple imputation with 20 imputed sets, the mean utility was 0.828 with an SE of 0.011, which is similar to the SE based on the full predictive distribution. Conclusion. Ignoring uncertainty of the predicted health utilities derived from MAUIs could lead to substantial underestimation of the variance of mean utilities. Multiple imputation corrects for this underestimation so that the results of cost-effectiveness analyses using MAUIs can report the correct degree of uncertainty.
34

Norris, David C., and Andrew Wilson. "Early-childhood housing mobility and subsequent PTSD in adolescence: a Moving to Opportunity reanalysis". F1000Research 5 (May 27, 2016): 1014. http://dx.doi.org/10.12688/f1000research.8753.1.

In a 2014 report on adolescent mental health outcomes in the Moving to Opportunity for Fair Housing Demonstration (MTO), Kessler et al. reported that, at 10- to 15-year follow-up, boys from households randomized to an experimental housing voucher intervention experienced 12-month prevalence of post-traumatic stress disorder (PTSD) at several times the rate of boys from control households. We reanalyze this finding here, bringing to light a PTSD outcome imputation procedure used in the original analysis, but not described in the study report. By bootstrapping with repeated draws from the frequentist sampling distribution of the imputation model used by Kessler et al., and by varying two pseudorandom number generator seeds that fed their analysis, we account for several purely statistical components of the uncertainty inherent in their imputation procedure. We also discuss other sources of uncertainty in this procedure that were not accessible to a formal reanalysis.
35

Lewis, Taylor, Elizabeth Goldberg, Nathaniel Schenker, Vladislav Beresovsky, Susan Schappert, Sandra Decker, Nancy Sonnenfeld, and Iris Shimizu. "The Relative Impacts of Design Effects and Multiple Imputation on Variance Estimates: A Case Study with the 2008 National Ambulatory Medical Care Survey". Journal of Official Statistics 30, no. 1 (March 1, 2014): 147–61. http://dx.doi.org/10.2478/jos-2014-0008.

The National Ambulatory Medical Care Survey collects data on office-based physician care from a nationally representative, multistage sampling scheme where the ultimate unit of analysis is a patient-doctor encounter. Patient race, a commonly analyzed demographic, has been subject to a steadily increasing item nonresponse rate. In 1999, race was missing for 17 percent of cases; by 2008, that figure had risen to 33 percent. Over this entire period, single imputation has been the compensation method employed. Recent research at the National Center for Health Statistics evaluated multiply imputing race to better represent the missing-data uncertainty. Given item nonresponse rates of 30 percent or greater, we were surprised to find many estimates' ratios of multiple-imputation to single-imputation estimated standard errors close to 1. A likely explanation is that the design effects attributable to the complex sample design largely outweigh any increase in variance attributable to missing-data uncertainty.
36

Yang, Yingjie, Sifeng Liu, and Naiming Xie. "Uncertainty and grey data analytics". Marine Economics and Management 2, no. 2 (July 1, 2019): 73–86. http://dx.doi.org/10.1108/maem-08-2019-0006.

Purpose: The purpose of this paper is to propose a framework for data analytics where everything is grey in nature and the associated uncertainty is considered an essential part of data collection, profiling, imputation, analysis and decision making. Design/methodology/approach: A comparative study is conducted between the available uncertainty models, and the feasibility of grey systems is highlighted. Furthermore, a general framework for the integration of grey systems and grey sets into data analytics is proposed. Findings: Grey systems and grey sets are useful not only for small data but also for big data. They are complementary to other models and can play a significant role in data analytics. Research limitations/implications: The proposed framework brings a radical change in data analytics and may bring a fundamental change in our way of dealing with uncertainties. Practical implications: The proposed model has the potential to avoid the mistakes arising from misleading data imputation. Social implications: The proposed model takes the philosophy of grey systems in recognising the limitations of our knowledge, which has significant implications for how we deal with our social life and relations. Originality/value: This is the first time that the whole of data analytics has been considered from the point of view of grey systems.
Los estilos APA, Harvard, Vancouver, ISO, etc.
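
The paper's basic object, a grey number, is a value known only to lie within an interval. A toy illustration of how interval-valued (grey) imputation can refuse to commit to a single possibly misleading fill-in, written for this summary rather than taken from the paper:

    # Toy grey imputation: a missing value becomes an interval bounded by
    # the observed data, and the interval propagates through the mean.
    # Illustrative sketch only, not code from the paper.
    data = [3.1, 2.8, None, 3.4]
    observed = [v for v in data if v is not None]
    lo, hi = min(observed), max(observed)   # grey number for the missing entry

    mean_lo = (sum(observed) + lo) / len(data)
    mean_hi = (sum(observed) + hi) / len(data)
    print((mean_lo, mean_hi))               # the mean is itself grey: an interval

Any single-point imputation inside [lo, hi] would hide exactly the uncertainty that the grey interval keeps visible.
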
37

Chion, Marie, Christine Carapito and Frédéric Bertrand. "Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics". PLOS Computational Biology 18, no. 8 (August 29, 2022): e1010420. http://dx.doi.org/10.1371/journal.pcbi.1010420.

Full text
Abstract
Imputing missing values is common practice in label-free quantitative proteomics. Imputation aims at replacing a missing value with a user-defined one. However, the imputation is often not properly accounted for downstream, as imputed datasets are treated as if they had always been complete; the uncertainty due to the imputation is therefore not adequately taken into account. We provide a rigorous multiple imputation strategy, leading to a less biased estimation of the parameters' variability thanks to Rubin's rules. The imputation-based variance estimator for peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. The workflow can be used at both the peptide and the protein level in quantification datasets; an aggregation step produces protein-level results from peptide-level quantification data. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, with mi4p outperforming DAPAR in terms of overall F-score.
Citation styles: APA, Harvard, Vancouver, ISO, etc.
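
The pooling step at the heart of this workflow, combining an estimate and its variance across m imputed datasets with Rubin's rules before forming a test statistic, can be sketched generically. mi4p and DAPAR are R packages, and the Bayesian variance moderation step is omitted here, so this is only the unmoderated skeleton.

    # Generic Rubin's-rules pooling across m imputed datasets, as applied
    # before forming a (possibly moderated) t-statistic. Numbers are made up.
    import numpy as np

    def pool_rubin(estimates, variances):
        """Pool per-imputation estimates and variances via Rubin's rules."""
        estimates = np.asarray(estimates, dtype=float)
        m = len(estimates)
        q_bar = estimates.mean()                  # pooled point estimate
        u_bar = np.mean(variances)                # within-imputation variance
        b = estimates.var(ddof=1)                 # between-imputation variance
        return q_bar, u_bar + (1 + 1 / m) * b     # total variance

    # e.g., a peptide's log-fold-change estimated in m = 5 imputed datasets
    q, t_var = pool_rubin([0.80, 0.75, 0.90, 0.82, 0.78],
                          [0.010, 0.012, 0.011, 0.009, 0.010])
    print(q / np.sqrt(t_var))                     # unmoderated t-like statistic
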
38

Rubin, Donald B. "Discussion on Multiple Imputation". International Statistical Review 71, no. 3 (January 15, 2007): 619–25. http://dx.doi.org/10.1111/j.1751-5823.2003.tb00216.x.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
39

Kim, Jae Kwang and Hyeonah Park. "Imputation using response probability". Canadian Journal of Statistics 34, no. 1 (March 2006): 171–82. http://dx.doi.org/10.1002/cjs.5550340112.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
40

Schafer, Joseph L. "Multiple Imputation in Multivariate Problems When the Imputation and Analysis Models Differ". Statistica Neerlandica 57, no. 1 (February 2003): 19–35. http://dx.doi.org/10.1111/1467-9574.00218.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
41

Carruthers, Thomas, Laurence Kell and Carlos Palma. "Accounting for uncertainty due to data processing in virtual population analysis using Bayesian multiple imputation". Canadian Journal of Fisheries and Aquatic Sciences 75, no. 6 (June 2018): 883–96. http://dx.doi.org/10.1139/cjfas-2017-0165.

Full text
Abstract
Virtual population analysis (VPA) is used in many stock assessment settings and requires a total catch-at-age data set in which an age is assigned to each fish that has been caught. These data sets are typically constructed using ad hoc methods that rely on numerous assumptions. Although approaches are available to account for observation error in these data, no statistically rigorous methods have been developed to account for uncertainty arising from data processing. To address this, we investigated a Bayesian multiple imputation approach to filling in missing size data. Using Atlantic yellowfin tuna (Thunnus albacares) and bigeye tuna (Thunnus obesus) as case studies, we evaluated the hypothesis that data processing is as important in determining management reference points in stock assessments as conventional sources of uncertainty. Size imputation models accounting for location, season and year provided good predictive capacity. Uncertainty from data processing could be large; however, the circumstances under which this occurred were unpredictable and varied by stock. These results indicate that VPA assessments should attempt to account for uncertainty in data processing to avoid potentially large compression of uncertainty in assessment results.
Citation styles: APA, Harvard, Vancouver, ISO, etc.
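
The general pattern the abstract describes, multiply imputing missing size records and pushing each completed dataset through the downstream assessment to measure the induced spread, reduces to a simple loop. The sketch below uses hypothetical length data, a toy strata-conditional imputation model, and a stand-in assess() function; it is not the authors' VPA code.

    # Sketch: multiple imputation of missing fish lengths from a simple
    # strata-conditional model, with the spread of a downstream quantity
    # as the measure of data-processing uncertainty. All values hypothetical.
    import numpy as np

    rng = np.random.default_rng(1)
    strata_mean = {"Q1": 92.0, "Q2": 105.0}   # e.g., season-specific mean length
    strata_sd = {"Q1": 11.0, "Q2": 13.0}

    records = [("Q1", 95.0), ("Q1", None), ("Q2", 110.0), ("Q2", None)]

    def assess(lengths):
        # Stand-in for the real catch-at-age / VPA pipeline.
        return float(np.mean(lengths))

    results = []
    for _ in range(100):                      # m = 100 imputations
        lengths = [x if x is not None else rng.normal(strata_mean[s], strata_sd[s])
                   for s, x in records]
        results.append(assess(lengths))

    print(np.std(results))                    # uncertainty from data processing
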
42

Horton, Nicholas J. and Stuart R. Lipsitz. "Multiple Imputation in Practice". American Statistician 55, no. 3 (August 2001): 244–54. http://dx.doi.org/10.1198/000313001317098266.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
43

Weedop, K. B., A. Ø. Mooers, C. M. Tucker and W. D. Pearse. "The effect of phylogenetic uncertainty and imputation on EDGE Scores". Animal Conservation 22, no. 6 (March 25, 2019): 527–36. http://dx.doi.org/10.1111/acv.12495.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
44

Schenker, Nathaniel and A. H. Welsh. "Asymptotic Results for Multiple Imputation". Annals of Statistics 16, no. 4 (December 1988): 1550–66. http://dx.doi.org/10.1214/aos/1176351053.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
45

Nielsen, Søren Feodor. "Proper and Improper Multiple Imputation". International Statistical Review 71, no. 3 (January 15, 2007): 593–607. http://dx.doi.org/10.1111/j.1751-5823.2003.tb00214.x.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
46

Rubin, Donald B. "Multiple Imputation after 18+ Years". Journal of the American Statistical Association 91, no. 434 (June 1996): 473–89. http://dx.doi.org/10.1080/01621459.1996.10476908.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
47

Mozharovskyi, Pavlo, Julie Josse and François Husson. "Nonparametric Imputation by Data Depth". Journal of the American Statistical Association 115, no. 529 (April 11, 2019): 241–53. http://dx.doi.org/10.1080/01621459.2018.1543123.

Full text
Citation styles: APA, Harvard, Vancouver, ISO, etc.
48

Endres, Eva, Paul Fink and Thomas Augustin. "Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data". Journal of Official Statistics 35, no. 3 (September 1, 2019): 599–624. http://dx.doi.org/10.2478/jos-2019-0025.

Full text
Abstract
Statistical matching is the term for the integration of two or more data files that share a partially overlapping set of variables. Its aim is to obtain joint information on variables collected in different surveys based on different observation units. This naturally leads to an identification problem, since no observation contains information on all variables of interest. We develop the first statistical matching micro approach that reflects the natural uncertainty of statistical matching arising from this identification problem in the context of categorical data. A complete synthetic file is obtained by imprecise imputation, replacing missing entries by sets of suitable values. Altogether, we discuss three imprecise imputation strategies and propose ideas for potential refinements. Additionally, we show how the results of imprecise imputation can be embedded into the theory of finite random sets, providing tight lower and upper bounds for probability statements. The results, based on a newly developed simulation design customised to the specific requirements for assessing the quality of a statistical matching procedure for categorical data, corroborate that the narrowness of these bounds is practically relevant and that the bounds almost always cover the true parameters.
Citation styles: APA, Harvard, Vancouver, ISO, etc.
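
The central device here, replacing a missing categorical entry with the set of admissible categories and then bounding probability statements over every possible completion, can be shown in miniature (a toy version written for this summary, not the authors' procedure):

    # Toy imprecise imputation: a missing entry becomes a set of candidate
    # categories, and P(a) gets lower/upper bounds over all completions.
    from itertools import product

    data = ["a", "b", {"a", "b"}, "a"]        # third entry imputed imprecisely

    choices = [sorted(v) if isinstance(v, set) else [v] for v in data]
    completions = list(product(*choices))     # every dataset consistent with data

    props = [c.count("a") / len(c) for c in completions]
    print(min(props), max(props))             # lower and upper bound for P(a)

A brute-force enumeration like this scales exponentially in the number of set-valued entries; the paper instead embeds imprecise imputation into the theory of finite random sets to obtain tight bounds.
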
49

Elasra, Amira. "Multiple Imputation of Missing Data in Educational Production Functions". Computation 10, no. 4 (March 24, 2022): 49. http://dx.doi.org/10.3390/computation10040049.

Full text
Abstract
Educational production functions rely mostly on longitudinal data that almost always exhibit missing values. This paper contributes to several strands of the literature on the economics of education and applied statistics by reviewing the theoretical foundation of missing data analysis, with a special focus on the application of multiple imputation to educational longitudinal studies. Multiple imputation is one of the most prominent methods for surmounting this problem: it not only uses all available information in the predictors but also accounts for the uncertainty generated by the missing data themselves. The paper applies a multiple imputation technique based on fully conditional specification, implemented as an iterative Markov chain Monte Carlo (MCMC) simulation with a Gibbs sampler. Previous applications of MCMC simulation used relatively small datasets with few variables; a further contribution of this paper is therefore the application and comparison of the technique on a large longitudinal English educational study under three iteration specifications. The simulation results demonstrated the convergence of the algorithm.
Citation styles: APA, Harvard, Vancouver, ISO, etc.
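
Fully conditional specification imputes each incomplete variable from a conditional model given all the others and cycles until the imputations stabilise. For readers who want to experiment with the same idea, scikit-learn's IterativeImputer implements a chained-equations variant of it; note this is a stand-in, not the Gibbs-sampler MCMC implementation used in the paper.

    # Chained-equations (FCS-style) imputation in scikit-learn. A stand-in
    # illustration of the idea, not the paper's Gibbs-sampler implementation.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    X = np.array([[1.0, 2.0],
                  [2.0, np.nan],
                  [3.0, 6.0],
                  [np.nan, 8.0]])

    # sample_posterior=True draws imputations from a posterior predictive
    # distribution rather than filling in conditional means.
    imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
    print(imputer.fit_transform(X))

Running it with several random_state values yields multiple imputed datasets, which is exactly what the multiple imputation machinery above requires.
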
50

Zhang, Zhihui, Xiangjun Xiao, Wen Zhou, Dakai Zhu and Christopher I. Amos. "False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy". Human Molecular Genetics, August 9, 2021. http://dx.doi.org/10.1093/hmg/ddab203.

Full text
Abstract
Genotype imputation is widely used in genetic studies to boost the power of GWAS, to combine multiple studies for meta-analysis and to perform fine mapping. With advances in imputation tools and large reference panels, genotype imputation has become mature and accurate. However, the uncertain nature of imputed genotypes can bias downstream analyses. Many studies have compared the performance of popular imputation approaches, but few have investigated the bias characteristics of downstream association analyses. Here, we show that imputation accuracy is diminished when the real genotypes contain minor alleles. Although such genotypes are less common, particularly at loci with low minor allele frequency, a large discordance between imputed and observed genotypes significantly inflated the association results, especially in data with a large proportion of uncertain SNPs. The discordance of P-values was significant when the P-value approached 0 or the imputation quality was poor. Although eliminating poorly imputed SNPs can remove false positive (FP) SNPs, it sometimes sacrificed more than 80% of true positive (TP) SNPs. For top-ranked SNPs, removing variants with moderate imputation quality did not reduce the proportion of FP SNPs, and increasing the sample size of the reference panel did not greatly improve the results either. Additionally, samples with a balanced ratio of cases to controls dramatically improved the number of TP SNPs observed in imputation-based GWAS. These results raise concerns about association studies of rare variants, particularly when case-control designs are unbalanced.
Citation styles: APA, Harvard, Vancouver, ISO, etc.
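
The filtering trade-off quantified above, where raising the imputation-quality threshold removes false positives but can also discard most true positives, comes down to a simple computation once every SNP carries a quality score and a truth label. The numbers below are entirely synthetic and only illustrate the shape of the trade-off.

    # Synthetic illustration of quality-threshold filtering: stricter
    # thresholds drop false positive SNPs but sacrifice true positives too.
    snps = [
        # (imputation quality score, truly associated?)
        (0.95, True), (0.40, True), (0.35, True), (0.90, False),
        (0.30, False), (0.25, False), (0.85, True), (0.45, False),
    ]

    for threshold in (0.3, 0.5, 0.8):
        kept = [s for s in snps if s[0] >= threshold]
        tp = sum(1 for q, true in kept if true)
        print(f"threshold {threshold}: kept TP = {tp}, kept FP = {len(kept) - tp}")
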