Journal articles on the topic 'Mixed categorical variables'

Consult the top 50 journal articles for your research on the topic 'Mixed categorical variables.'


1

McCane, Brendan, and Michael Albert. "Distance functions for categorical and mixed variables." Pattern Recognition Letters 29, no. 7 (May 2008): 986–93. http://dx.doi.org/10.1016/j.patrec.2008.01.021.

2

Horníček, Jaroslav, and Hana Řezanková. "Missing Data Imputation for Categorical Variables." Statistika: Statistics and Economy Journal 102, no. 3 (September 2022): 249–60. http://dx.doi.org/10.54694/stat.2022.3.

Abstract:
Dealing with missing data is a crucial part of everyday data analysis. The IMIC algorithm is a missing data imputation method that can handle mixed numerical and categorical datasets. This work, however, focuses on the categorical data. The paper proposes a new improvement of the IMIC algorithm. The two proposed modifications consider the number of categories in each categorical variable. Based on this information, a factor that modifies the original measure is computed. The factor equation is inspired by the Eskin similarity measure, which is known from the hierarchical clustering of categorical data. The results show that as the missing value ratio in the dataset grows, better results are achieved using the second modification. The paper also briefly analyzes the advantages and disadvantages of using the IMIC algorithm.
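For orientation, the Eskin measure named in the abstract is usually stated per variable: two equal categories have similarity 1, and two different categories of a variable with n categories have similarity n²/(n² + 2). The sketch below implements only that standard measure, averaged over variables. The paper's own modification factor is not given in the abstract and is not reproduced here; the function names and the toy records are hypothetical.

```python
def eskin_similarity(a, b, n_categories):
    """Eskin similarity between two values of one categorical variable."""
    if a == b:
        return 1.0
    n2 = n_categories ** 2
    # Mismatches count as "more similar" when the variable has many categories.
    return n2 / (n2 + 2)


def eskin_record_similarity(x, y, category_counts):
    """Average the per-variable Eskin similarities over all variables."""
    sims = [eskin_similarity(a, b, n) for a, b, n in zip(x, y, category_counts)]
    return sum(sims) / len(sims)


# Toy records: the first variable has 3 categories, the second has 2.
x = ("red", "small")
y = ("blue", "small")
sim = eskin_record_similarity(x, y, category_counts=[3, 2])
```

Because the mismatch value depends on the number of categories, a disagreement on a variable with many categories is penalized less than one on a binary variable, which is the property the abstract's factor appears to build on.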
3

Zuo, Yan, Vu Nguyen, Amir Dezfouli, David Alexander, Benjamin Ward Muir, and Iadine Chades. "Mixed-Variable Black-Box Optimisation Using Value Proposal Trees." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 11506–14. http://dx.doi.org/10.1609/aaai.v37i9.26360.

Abstract:
Many real-world optimisation problems are defined over both categorical and continuous variables, yet efficient optimisation methods such as Bayesian Optimisation (BO) are ill-equipped to handle such mixed-variable search spaces. The optimisation breadth introduced by categorical variables in the mixed-input setting has seen recent approaches operating on local trust regions, but these methods can be greedy in suboptimal regions of the search space. In this paper, we adopt a holistic view and aim to consolidate optimisation of the categorical and continuous sub-spaces under a single acquisition metric. We develop a tree-based method which retains a global view of the optimisation spaces by identifying regions in the search space with high potential candidates which we call value proposals. Our method uses these proposals to make selections on both the categorical and continuous components of the input. We show that this approach significantly outperforms existing mixed-variable optimisation approaches across several mixed-variable black-box optimisation tasks.
4

Lee, Sik-Yum, Xin-Yuan Song, and Bin Lu. "Discriminant Analysis Using Mixed Continuous, Dichotomous, and Ordered Categorical Variables." Multivariate Behavioral Research 42, no. 4 (December 28, 2007): 631–45. http://dx.doi.org/10.1080/00273170701710114.

5

Di Nuzzo, Cinzia. "Advancing Spectral Clustering for Categorical and Mixed-Type Data: Insights and Applications." Mathematics 12, no. 4 (February 6, 2024): 508. http://dx.doi.org/10.3390/math12040508.

Abstract:
This study focuses on adapting spectral clustering, a numeric data-clustering technique, for categorical and mixed-type data. The method enhances spectral clustering for categorical and mixed-type data with novel kernel functions, showing improved accuracy in real-world applications. Despite achieving better clustering for datasets with mixed variables, challenges remain in identifying suitable kernel functions for categorical relationships.
6

Morales, D., L. Pardo, and K. Zografos. "Informational distances and related statistics in mixed continuous and categorical variables." Journal of Statistical Planning and Inference 75, no. 1 (November 1998): 47–63. http://dx.doi.org/10.1016/s0378-3758(98)00120-7.

7

Ng, Michael K., Elaine Y. Chan, Meko M. C. So, and Wai-Ki Ching. "A semi-supervised regression model for mixed numerical and categorical variables." Pattern Recognition 40, no. 6 (June 2007): 1745–52. http://dx.doi.org/10.1016/j.patcog.2006.06.018.

8

Leung, Chi-Ying. "Regularized classification for mixed continuous and categorical variables under across-location heteroscedasticity." Journal of Multivariate Analysis 93, no. 2 (April 2005): 358–74. http://dx.doi.org/10.1016/j.jmva.2004.03.001.

9

Munoz Zuniga, Miguel, and Delphine Sinoquet. "Global optimization for mixed categorical-continuous variables based on Gaussian process models with a randomized categorical space exploration step." INFOR: Information Systems and Operational Research 58, no. 2 (March 19, 2020): 310–41. http://dx.doi.org/10.1080/03155986.2020.1730677.

10

Han, Jisoo, and HyungJun Cho. "A Study on Cluster Analysis of Mixed Data with Continuous and Categorical Variables." Korean Data Analysis Society 20, no. 4 (August 31, 2018): 1769–80. http://dx.doi.org/10.37727/jkdas.2018.20.4.1769.

11

Hamid, Hashibah, Nor Idayu Mahat, and Safwati Ibrahim. "ADAPTIVE VARIABLE EXTRACTIONS WITH LDA FOR CLASSIFICATION OF MIXED VARIABLES, AND APPLICATIONS TO MEDICAL DATA." Journal of Information and Communication Technology 20, no. 3 (June 11, 2021): 305–27. http://dx.doi.org/10.32890/jict2021.20.3.2.

Abstract:
This paper examines a strategy for extracting a number of mixed variables when building a Linear Discriminant Analysis (LDA) model. Two methods for extracting crucial variables from a dataset with categorical and continuous variables were employed, namely multiple correspondence analysis (MCA) and principal component analysis (PCA). However, direct use of either MCA or PCA on mixed variables is impossible because of restrictions on the structure of data that each method can handle. Therefore, this paper makes some adjustments, including a strategy for managing mixed variables so that they are equivalent in values. With this, both MCA and PCA can be performed on mixed variables simultaneously. The variables extracted with this strategy were then used to construct the LDA model, which was applied to classify future objects. The suggested models were tested on three real sets of medical data. The results indicated that combining MCA and PCA for extraction with LDA could reduce the model’s size and improve classification performance, since it leads towards minimising the leave-one-out error rate. Accordingly, the models proposed in this paper, including the adapted strategy, presented good results compared with the full LDA model. The indicators used to extract and retain the variables in the model, namely cumulative variance explained (CVE), eigenvalue, and a non-significant shift in the CVE (constant change), could be considered a useful reference or guideline for practitioners experiencing similar issues in future.
12

Gómez-Guerrero, Santiago, Inocencio Ortiz, Gustavo Sosa-Cabrera, Miguel García-Torres, and Christian E. Schaerer. "Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty." Entropy 24, no. 1 (December 30, 2021): 64. http://dx.doi.org/10.3390/e24010064.

Abstract:
Interaction between variables is often found in statistical models, and it is usually expressed in the model as an additional term when the variables are numeric. However, when the variables are categorical (also known as nominal or qualitative) or mixed numerical-categorical, defining, detecting, and measuring interactions is not a simple task. In this work, based on an entropy-based correlation measure for n nominal variables, named Multivariate Symmetrical Uncertainty (MSU), we propose a formal and broader definition for the interaction of the variables. Two series of experiments are presented. In the first series, we observe that datasets in which some record types or combinations of categories are absent, forming patterns of records, often display interactions among their attributes. In the second series, the interaction/non-interaction behavior of a regression model (entirely built on continuous variables) is successfully replicated under a discretized version of the dataset. It is shown that there is an interaction-wise correspondence between the continuous and the discretized versions of the dataset. Hence, we demonstrate that the proposed definition of interaction enabled by the MSU is a valuable tool for detecting and measuring interactions within linear and non-linear models.
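To make the entropy-based measure concrete, here is a minimal sketch. It assumes the MSU of n variables takes the form n/(n − 1) · (1 − H(X1, …, Xn) / Σ H(Xi)), which reduces to the familiar two-variable symmetrical uncertainty 2·I(X;Y)/(H(X) + H(Y)) when n = 2. That formula is an assumption here, since the abstract does not state it, and the function names and XOR toy data are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def msu(columns):
    """Multivariate symmetrical uncertainty of n >= 2 categorical columns."""
    n = len(columns)
    marginal_sum = sum(entropy(col) for col in columns)
    if marginal_sum == 0:
        return 0.0  # every column is constant: nothing to share
    joint = entropy(list(zip(*columns)))  # entropy of the joint records
    return n / (n - 1) * (1 - joint / marginal_sum)


# XOR toy data: x3 is fully determined by x1 and x2 together,
# yet no single pair of columns carries any shared information.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
x3 = [a ^ b for a, b in zip(x1, x2)]
pairwise = msu([x1, x2])
three_way = msu([x1, x2, x3])
```

On this toy data the pairwise value is 0 while the three-variable value is 0.5, the kind of pattern (dependence visible only jointly) that an interaction measure is meant to expose.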
13

Song, Xin-Yuan, Ye-Mao Xia, and Sik-Yum Lee. "Bayesian semiparametric analysis of structural equation models with mixed continuous and unordered categorical variables." Statistics in Medicine 28, no. 17 (May 26, 2009): 2253–76. http://dx.doi.org/10.1002/sim.3612.

14

Mami, Ahmed M., and Ayman Ali Elberjo. "ON USING NONPARAMETRIC REGRESSION METHODS TO ESTIMATE CATEGORICAL OUTCOMES MODELS WITH MIXED DATA TYPES." EPH - International Journal of Applied Science 1, no. 3 (September 27, 2015): 15–22. http://dx.doi.org/10.53555/eijas.v1i3.15.

Abstract:
Many data analysis methods are sensitive to the type of data under study. When beginning any statistical data analysis, it is very important to recognize the different types of data. Data can take a variety of values or belong to various categories, whether numerical or nominal. Broadly, there are two types of data: quantitative and qualitative (categorical). General and powerful methodological approaches for the analysis of quantitative data have been widely taught for several decades, while methods for the analysis of qualitative data have blossomed only in the past 25 years. The need for categorical data analysis techniques has increased steadily in recent years in economics, health, and social science. However, the analysis of categorical data models in which the dependent variable is a binary or multinomial outcome with mixed explanatory variables is complex. The main goal of this paper is to estimate a nonparametric regression model for binary and multinomial outcomes with mixed explanatory variables, based on the nonparametric conditional CDF method and the PDF method of bandwidth selection presented by Li and Racine (2008). We then compare it with one of the most common parametric regression methods, the logistic regression model. The comparisons are based on two criteria: classification ability, measured by the Correct Classification Ratio (CCR), and the log-likelihood value (LLK). We conducted several simulation studies using randomly generated data (categorical, discrete, and continuous) to investigate the performance of both the parametric and the nonparametric model for binary and multinomial outcomes. Interesting results have been achieved in this work. The methods were also applied to real data containing mixed variables, using a dataset from the Household Expenditure Survey (HES).
15

Botana, Iñigo López-Riobóo, Carlos Eiras-Franco, and Amparo Alonso-Betanzos. "Regression Tree Based Explanation for Anomaly Detection Algorithm." Proceedings 54, no. 1 (August 18, 2020): 7. http://dx.doi.org/10.3390/proceedings2020054007.

Abstract:
This work presents EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), a novel approach to address explanation using an anomaly detection algorithm, ADMNC, which provides accurate detections on mixed numerical and categorical input spaces. Our improved algorithm leverages the formulation of the ADMNC model to offer pre-hoc explainability based on CART (Classification and Regression Trees). The explanation is presented as a segmentation of the input data into homogeneous groups that can be described with a few variables, offering supervisors novel information for justifications. To demonstrate scalability and interpretability, we report experimental results on large real-world datasets, focusing on the network intrusion detection domain.
16

Mbuga, Felix, and Cristina Tortora. "Spectral Clustering of Mixed-Type Data." Stats 5, no. 1 (December 23, 2021): 1–11. http://dx.doi.org/10.3390/stats5010001.

Abstract:
Cluster analysis seeks to assign objects with similar characteristics into groups called clusters so that objects within a group are similar to each other and dissimilar to objects in other groups. Spectral clustering has been shown to perform well in different scenarios on continuous data: it can detect convex and non-convex clusters, and can detect overlapping clusters. However, the restriction to continuous data can be limiting in real applications where data are often of mixed type, i.e., data that contain both continuous and categorical features. This paper looks at extending spectral clustering to mixed-type data. The new method replaces the Euclidean-based similarity distance used in conventional spectral clustering with different dissimilarity measures for continuous and categorical variables. A global dissimilarity measure is then computed using a weighted sum, and a Gaussian kernel is used to convert the dissimilarity matrix into a similarity matrix. The new method includes automatic tuning of the variable weight and kernel parameter. The performance of spectral clustering in different scenarios is compared with that of two state-of-the-art mixed-type data clustering methods, k-prototypes and KAMILA, using several simulated and real data sets.
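The two steps described in this abstract (a weighted sum of per-type dissimilarities, then a Gaussian kernel) can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance and simple matching stand in for the paper's dissimilarity measures, the weight and `sigma` are fixed by hand rather than tuned automatically, and the eigen-decomposition and clustering steps of spectral clustering are omitted. All names and the toy records are hypothetical.

```python
from math import exp


def mixed_dissimilarity(a, b, weight):
    """Weighted sum of a continuous and a categorical dissimilarity.

    Each record is (numeric_values, categorical_values). Euclidean distance
    and simple matching are illustrative choices, not the paper's measures.
    """
    num_a, cat_a = a
    num_b, cat_b = b
    d_num = sum((p - q) ** 2 for p, q in zip(num_a, num_b)) ** 0.5
    d_cat = sum(p != q for p, q in zip(cat_a, cat_b)) / len(cat_a)
    return weight * d_num + (1 - weight) * d_cat


def gaussian_similarity_matrix(records, weight=0.5, sigma=1.0):
    """Turn pairwise dissimilarities into similarities with a Gaussian kernel."""
    n = len(records)
    return [
        [
            exp(-mixed_dissimilarity(records[i], records[j], weight) ** 2
                / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy records: two numeric features and two categorical features each.
records = [
    ((0.0, 0.0), ("a", "x")),
    ((0.1, 0.0), ("a", "x")),
    ((3.0, 4.0), ("b", "y")),
]
S = gaussian_similarity_matrix(records)
```

The resulting matrix `S` is symmetric with ones on the diagonal, so it can serve as the precomputed affinity matrix that a spectral clustering routine expects.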
17

Bishnoi, Sudha, Nadhir Al-Ansari, Mujahid Khan, Salim Heddam, and Anurag Malik. "Classification of Cotton Genotypes with Mixed Continuous and Categorical Variables: Application of Machine Learning Models." Sustainability 14, no. 20 (October 21, 2022): 13685. http://dx.doi.org/10.3390/su142013685.

Abstract:
Mixed data is a combination of continuous and categorical variables and occurs frequently in fields such as agriculture, remote sensing, biology, medical science, and marketing, but only limited work has been done with this type of data. In this study, data on continuous and categorical characters of 452 genotypes of cotton (Gossypium hirsutum) were obtained from an experiment conducted by the Central Institute of Cotton Research (CICR), Sirsa, Haryana (India) during the Kharif season of the year 2018–2019. The machine learning (ML) classifiers/models, namely k-nearest neighbor (KNN), Classification and Regression Tree (CART), C4.5, Naïve Bayes, random forest (RF), bagging, and boosting, were considered for cotton genotype classification. The performance of these ML classifiers was compared with each other and with linear discriminant analysis (LDA) and logistic regression. The hold-out method was used for validation with an 80:20 ratio of training and testing data. The results based on hold-out validation showed that RF and AdaBoost performed very well, each having only two misclassifications, an accuracy of 97.26%, and an error rate of 2.74%. The LDA classifier performed the worst in terms of accuracy, with nine misclassifications. The other performance measures, namely sensitivity, specificity, precision, F1 score, and G-mean, were used together to find the best ML classifier among all those considered. Moreover, the RF and AdaBoost algorithms had the highest value of all the performance measures, with 96.97% sensitivity and 97.50% specificity. Thus, these models were found to be the best in classifying the low- and high-yielding cotton genotypes.
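As a quick arithmetic check on how these measures relate, the snippet below computes accuracy, error rate, sensitivity, and specificity from a 2×2 confusion matrix. The abstract gives neither the test-set size nor the confusion matrix, so the counts used here are purely hypothetical: they were chosen because two misclassifications among 73 test cases happen to reproduce the reported percentages. Which class is treated as "positive" is also an assumption.

```python
def rates(tp, fn, tn, fp):
    """Percentage performance measures from binary confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100 * (tp + tn) / total,
        "error": 100 * (fp + fn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


# Hypothetical counts (not from the paper): 33 positives, 40 negatives,
# with one false negative and one false positive.
r = rates(tp=32, fn=1, tn=39, fp=1)
rounded = {name: round(value, 2) for name, value in r.items()}
```

With these assumed counts the rounded values are 97.26 (accuracy), 2.74 (error), 96.97 (sensitivity), and 97.5 (specificity), which shows that the four reported figures are mutually consistent.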
18

Brevault, Loïc, and Mathieu Balesdent. "Bayesian Quality-Diversity approaches for constrained optimization problems with mixed continuous, discrete and categorical variables." Engineering Applications of Artificial Intelligence 133 (July 2024): 108118. http://dx.doi.org/10.1016/j.engappai.2024.108118.

19

Abhishek, Kumar, Sven Leyffer, and Jeffrey T. Linderoth. "Modeling without categorical variables: a mixed-integer nonlinear program for the optimization of thermal insulation systems." Optimization and Engineering 11, no. 2 (March 23, 2010): 185–212. http://dx.doi.org/10.1007/s11081-010-9109-z.

20

Irsyifa Mayzela Afnan, Siti Hasanah, Anwar Fitrianto, Erfiani, and Alfa Nugraha. "Klasterisasi Desa di Provinsi Jawa Barat Berdasarkan Indeks Pembangunan Desa (IPD) Tahun 2021 Menggunakan Algoritma K-Prototypes." Jurnal Statistika dan Aplikasinya 7, no. 2 (January 8, 2024): 174–83. http://dx.doi.org/10.21009/jsa.07206.

Abstract:
Cluster analysis is a method used to group data with similar characteristics. There are various clustering methods adapted to different types of data. K-Prototypes is a clustering method that can be applied to mixed numerical and categorical data. The data used in this study are mixed numerical and categorical data derived from the Village Potential data in 2021. The aim of this research is to group villages in West Java based on variables from the Indeks Pembangunan Desa (IPD). Clustering into three clusters, matching the village statuses defined by the IPD, resulted in 931 villages in cluster-1, 1880 villages in cluster-2, and 2104 villages in cluster-3. Cluster-1 villages have adequate health and education facilities and good infrastructure conditions. Cluster-2 has numeric variable averages lower than cluster-1 but higher than cluster-3.
21

Şahin, Melek Gülşah, and Yıldız Yıldırım. "The general attitudes towards artificial intelligence (GAAIS): A meta-analytic reliability generalization study." International Journal of Assessment Tools in Education 11, no. 2 (March 22, 2024): 303–19. http://dx.doi.org/10.21449/ijate.1369023.

Abstract:
This study aims to generalize the reliability of the GAAIS, a scale that is known to yield valid and reliable measurements, is frequently used in the literature, measures one of today's popular topics, and is one of the first instruments developed in the field. Within the meta-analytic reliability generalization study, moderator analyses were also conducted on some categorical and continuous variables. Cronbach's α values for the overall scale and the positive and negative subscales, and McDonald's ω coefficients for the positive and negative subscales, were generalized. The Google Scholar, WOS, Taylor & Francis, Science Direct, and EBSCO databases were searched to obtain primary studies. The screening found 132 studies, which were reviewed according to the inclusion criteria. Reliability coefficients obtained from the 19 studies that met the criteria were included in the meta-analysis. Meta-analytic reliability generalization was performed according to the random effects model, while moderator analyses were performed according to the mixed effects model for both categorical and continuous variables. The pooled Cronbach's α was 0.881, 0.828, and 0.863 for the total scale and the negative and positive subscales, respectively. The pooled McDonald's ω was 0.873 and 0.923 for the negative and positive subscales, respectively. There were no significant differences between the reliability coefficients for any of the categorical variables. On the other hand, all continuous moderator variables (mean age, standard deviation of age, and proportion of females) had a significant effect.
22

Tarsitano, Agostino, and Marianna Falcone. "Missing-Values Adjustment for Mixed-Type Data." Journal of Probability and Statistics 2011 (2011): 1–20. http://dx.doi.org/10.1155/2011/290380.

Abstract:
We propose a new method of single imputation, reconstruction, and estimation of nonreported, incorrect, implausible, or excluded values in more than one field of the record. In particular, we will be concerned with data sets involving a mixture of numeric, ordinal, binary, and categorical variables. Our technique is a variation of the popular nearest neighbor hot deck imputation (NNHDI) where “nearest” is defined in terms of a global distance obtained as a convex combination of the distance matrices computed for the various types of variables. We address the problem of proper weighting of the partial distance matrices in order to reflect their significance, reliability, and statistical adequacy. Performance of several weighting schemes is compared under a variety of settings in coordination with imputation of the least power mean of the Box-Cox transformation applied to the values of the donors. Through analysis of simulated and actual data sets, we will show that this approach is appropriate. Our main contribution has been to demonstrate that mixed data may optimally be combined to allow the accurate reconstruction of missing values in the target variable even when some data are absent from the other fields of the record.
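The central idea in this abstract, a global distance formed as a convex combination of per-type distances and then used to pick the nearest donor, can be sketched as below. This is a simplified illustration, not the authors' procedure: the per-type distances, the equal weights, the field names, and the donor records are all made up for the example, and the paper's weighting schemes and Box-Cox step are not reproduced.

```python
def global_distance(distances, weights):
    """Convex combination of per-type distances between two records."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * d for w, d in zip(weights, distances))


def partial_distances(a, b):
    """Illustrative distances: scaled absolute difference and category mismatch."""
    return [abs(a["age"] - b["age"]) / 100.0, float(a["region"] != b["region"])]


def impute_from_nearest_donor(recipient, donors, target, weights):
    """Copy the target value from the donor closest in global distance."""
    nearest = min(
        donors,
        key=lambda d: global_distance(partial_distances(recipient, d), weights),
    )
    return nearest[target]


# Made-up records: the recipient is missing "income".
donors = [
    {"age": 30, "region": "north", "income": 41000},
    {"age": 62, "region": "south", "income": 28000},
]
recipient = {"age": 33, "region": "north", "income": None}
imputed = impute_from_nearest_donor(recipient, donors, "income", weights=[0.5, 0.5])
```

Requiring the weights to be non-negative and to sum to 1 is what makes the combination convex; changing the weights shifts how much each variable type influences which donor counts as nearest.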
23

Izzati, Hanifa, Indahwati Indahwati, and Anik Djuraidah. "BCBimax Biclustering Algorithm with Mixed-Type Data." JUITA : Jurnal Informatika 12, no. 1 (May 20, 2024): 131. http://dx.doi.org/10.30595/juita.v12i1.21519.

Abstract:
The application of biclustering analysis to mixed data is still relatively new. Initially, biclustering analysis was primarily used on gene expression data that has an interval scale. In this research, we will transform ordinal categorical variables into interval scales using the Method of Successive Interval (MSI). The BCBimax algorithm will be applied in this study with several binarization experiments that produce the smallest Mean Square Residual (MSR) at the predetermined column and row thresholds. Next, a row and column threshold test will be carried out to find the optimal bicluster threshold. The existence of different interests in the variables for international market potential and the number of Indonesian export destination countries is the reason for the need for identification regarding the mapping of destination countries based on international trade potential. The study's results with the median threshold of all data found that the optimal MSR is at the threshold of row 7 and column 2. The number of biclusters formed is 9 which covers 74.7% of countries. Most countries in the bicluster come from the European Continent and a few countries from the African Continent are included in the bicluster.
24

Chrisinta, Debora, I. Made Sumertajaya, and Indahwati Indahwati. "EVALUASI KINERJA METODE CLUSTER ENSEMBLE DAN LATENT CLASS CLUSTERING PADA PEUBAH CAMPURAN." Indonesian Journal of Statistics and Its Applications 4, no. 3 (November 30, 2020): 448–61. http://dx.doi.org/10.29244/ijsa.v4i3.630.

Abstract:
Most traditional clustering algorithms are designed to focus either on numeric data or on categorical data. Data collected in the real world often contain both numeric and categorical attributes, and it is difficult to apply traditional clustering algorithms directly to these kinds of data. This paper therefore aims to identify the best method for mixed data among the cluster ensemble and latent class clustering approaches. Cluster ensemble is a method that combines different clustering results from two sub-datasets: the categorical and the numerical variables. Clustering algorithms designed for numerical and categorical datasets are employed to produce the corresponding clusters. Latent class clustering (LCC), on the other hand, is a model-based clustering method that can be used for any type of data. The number of clusters is based on the estimation of the probability model used. LCC is recommended as the best clustering method, since it provides higher accuracy and the smallest standard deviation ratio. However, both LCC and the cluster ensemble method produce evaluation values that are not much different when applied to clustering village potential data in Bengkulu Province.
25

Gukasyan, M., J. Moses, and K. Greenman. "A-44 Categorical Errors on the Benton Visual Retention Test are Systemically Related to Specific Factorial Components of Intelligence." Archives of Clinical Neuropsychology 34, no. 6 (July 25, 2019): 903. http://dx.doi.org/10.1093/arclin/acz034.44.

Abstract:
Objective: We investigated the factorial relationship of the six categories of memory errors of the BVRT to the four factorial variables of the WAIS to determine the relationship between cognitive and nonverbal memory variables. Methods: A sample of 134 diagnostically mixed ambulatory American Veteran patients with a wide variety of neuropsychiatric diagnoses, with or without general medical problems, who had completed the WAIS-3 and the BVRT was examined. There were no demographic or diagnostic exclusion criteria. Results: The 6 types of BVRT memory errors (omissions, distortions, perseverations, rotations, misplacements, and size errors) were factored using principal component analysis. The four WAIS-3 and six BVRT components were jointly factored to examine for systematic relationships between memory and cognitive domains. The analysis identified specific factorial relationships of BVRT error type to each of the four factorial components of the WAIS. POI was related to rotation errors, VCI was related to size errors, PSI was specifically related to omissions, and WMI to distortions. Misplacement and perseveration errors were related to each other but not to factorial constructs of the WAIS. Conclusions: There are specific and robust relationships among BVRT errors and dimensional cognitive variables.
26

Wang, Li Min, Xiong Fei Li, and Xue Cheng Wang. "Towards Efficient Dimensionality Reduction for Evolving Bayesian Network Classifier." Advanced Materials Research 108-111 (May 2010): 240–43. http://dx.doi.org/10.4028/www.scientific.net/amr.108-111.240.

Abstract:
Dimensionality reduction is useful for improving the performance of Bayesian networks. In this paper we suggest an effective method of modeling categorical and numerical variables of the mixed data with different Bayesian classifiers. Such an approach reduces output sensitivity to input changes by applying feature extraction and selection, and empirical studies on UCI benchmarking data show that our approach has clear advantages with respect to the classification accuracy.
27

Isebor, Obiajulu J., David Echeverría Ciaurri, and Louis J. Durlofsky. "Generalized Field-Development Optimization With Derivative-Free Procedures." SPE Journal 19, no. 05 (March 10, 2014): 891–908. http://dx.doi.org/10.2118/163631-pa.

Abstract:
Summary: The optimization of general oilfield development problems is considered. Techniques are presented to simultaneously determine the optimal number and type of new wells, the sequence in which they should be drilled, and their corresponding locations and (time-varying) controls. The optimization is posed as a mixed-integer nonlinear programming (MINLP) problem and involves categorical, integer-valued, and real-valued variables. The formulation handles bound, linear, and nonlinear constraints, with the latter treated with filter-based techniques. Noninvasive derivative-free approaches are applied for the optimizations. Methods considered include branch and bound (B&B), a rigorous global-search procedure that requires the relaxation of the categorical variables; mesh adaptive direct search (MADS), a local pattern-search method; particle swarm optimization (PSO), a heuristic global-search method; and a PSO-MADS hybrid. Four example cases involving channelized-reservoir models are presented. The recently developed PSO-MADS hybrid is shown to consistently outperform the standalone MADS and PSO procedures. In the two cases in which B&B is applied, the heuristic PSO-MADS approach is shown to give comparable solutions but at a much lower computational cost. This is significant because B&B provides a systematic search in the categorical variables. We conclude that, although it is demanding in terms of computation, the methodology presented here, with PSO-MADS as the core optimization method, appears to be applicable for realistic reservoir development and management.
28

Lopez-Arevalo, Ivan, Edwin Aldana-Bobadilla, Alejandro Molina-Villegas, Hiram Galeana-Zapién, Victor Muñiz-Sanchez, and Saul Gausin-Valle. "A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning." Entropy 22, no. 12 (December 9, 2020): 1391. http://dx.doi.org/10.3390/e22121391.

Abstract:
The most common machine-learning methods solve supervised and unsupervised problems based on datasets where the problem’s features belong to a numerical space. However, many problems include data where numerical and categorical values coexist, which makes them challenging to manage. To transform categorical data into a numeric form, preprocessing tasks are compulsory. Methods such as one-hot and feature-hashing have been the most widely used encoding approaches, at the expense of a significant increase in the dimensionality of the dataset. This effect introduces unexpected challenges in dealing with the overabundance of variables and/or noisy data. In this paper, we propose a novel encoding approach that maps mixed-type data into an information space, using Shannon’s Theory to model the amount of information contained in the original data. We evaluated our proposal with ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate the performance of our proposal, we applied it to prepare these datasets for classification, regression, and clustering tasks. We demonstrate that our encoding proposal is remarkably superior to one-hot and feature-hashing encoding in terms of memory efficiency. Our proposal can preserve the information conveyed by the original data.
29

Haikal, Husnul Aris, Aji Hamim Wigena, Kusman Sadik, and Efriwati Efriwati. "Comparison of Discriminant Analysis and Support Vector Machine on Mixed Categorical and Continuous Independent Variables for COVID-19 Patients Data." Scientific Journal of Informatics 11, no. 1 (February 29, 2024): 165–76. http://dx.doi.org/10.15294/sji.v11i1.48565.

Abstract:
Purpose: Numerous factors can affect the duration of COVID-19 recovery. One method involves utilizing natural herbal medication. This study seeks to determine the variables influencing the duration of COVID-19 recovery and to compare discriminant analysis and support vector machine models using COVID-19 patient data from West Sumatra. Methods: Two data mining methods, Discriminant Analysis and Support Vector Machine with different types of kernels (linear, polynomial, and radial basis function), were employed to categorize the time of COVID-19 recovery in this work. The study utilized 428 data points, with 75% allocated for training data and 25% for testing data. The independent factors were evaluated by determining the selection variables' information value (IV) to gauge their influence on the dependent variable. Data resampling techniques, including undersampling, oversampling, and SMOTE, were employed to tackle the problem of data imbalance. The balanced accuracy of Discriminant Analysis and Support Vector Machine was examined. Result: The Discriminant Analysis with SMOTE achieved a balanced accuracy of 66.50%, outperforming the linear kernel Support Vector Machine with SMOTE, which had a balanced accuracy of 63.20% in this dataset. Novelty: This study assessed the novelty, originality, and value by comparing Discriminant Analysis and SVM algorithms with categorical and continuous independent variables. This research explores techniques for managing imbalanced data using undersampling, oversampling, and SMOTE, with variable selection based on information value assessment.
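Balanced accuracy, the metric reported in this abstract, is commonly defined as the mean of the per-class recalls. The sketch below assumes that standard definition; the labels are made up and are not the study's data. It shows why the metric is preferred over plain accuracy when classes are imbalanced.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (the usual definition of balanced accuracy)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: a classifier that always predicts the majority class.
y_true = ["long"] * 8 + ["short"] * 2
y_pred = ["long"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # prints: 0.8
print(balanced_accuracy(y_true, y_pred))  # prints: 0.5
```

The majority-class predictor looks acceptable under plain accuracy but scores no better than chance under balanced accuracy, because the recall on the minority class is zero.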
30

Sambasivan, Rajiv, and Sourish Das. "Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data." Calcutta Statistical Association Bulletin 70, no. 2 (November 2018): 155–80. http://dx.doi.org/10.1177/0008068318814630.

Abstract:
Datasets with a mixture of numerical and categorical attributes are routinely encountered in many application domains. Such datasets do not have a direct representation in Euclidean space. As a consequence, dissimilarity measures such as the Gower distance are used when partitioning clustering approaches are used with such datasets. Homogeneity analysis (HA) can be used to determine a Euclidean representation of mixed datasets. Such a representation can be analysed by leveraging the large body of tools and techniques for data with a Euclidean representation. The utility of the representation obtained from HA is not limited to clustering. This representation can be used to visualize mixed datasets and generate succinct numerical summaries. Such summaries can yield clues about associations between variables which may be difficult to discover otherwise. AMS Classification Code: 62-07
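The Gower distance mentioned in this abstract can be sketched as follows. This is a minimal, unweighted version that assumes no missing values: numeric features contribute a range-normalised absolute difference, categorical features contribute 0 on a match and 1 otherwise, and the result is the average over features. The function name and the records are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Unweighted Gower dissimilarity between two records.

    numeric_ranges maps a column index to (max - min) of that column over the
    dataset; columns not listed there are treated as categorical.
    """
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_ranges:
            r = numeric_ranges[j]
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Made-up records: age, income, tenure type, region
records = [
    (30, 40000.0, "owner", "north"),
    (40, 60000.0, "renter", "north"),
    (50, 50000.0, "owner", "south"),
]
numeric_cols = (0, 1)
ranges = {j: max(r[j] for r in records) - min(r[j] for r in records)
          for j in numeric_cols}

print(gower_distance(records[0], records[1], ranges))  # prints: 0.625
print(gower_distance(records[1], records[2], ranges))  # prints: 0.75
```

A matrix of such pairwise values can be passed to any clustering method that accepts precomputed dissimilarities, which is how dissimilarity-based partitioning is typically applied to mixed data.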
31

Moch. Abdillah Nafis and Dedy Dwi Prastyo. "Analysis of Freight Mode Characteristics on The Northern Coastline Route using Mixed Data based Cluster Analysis." Proceedings of International Conference on Economics Business and Government Challenges 1, no. 1 (September 14, 2022): 141–51. http://dx.doi.org/10.33005/ic-ebgc.v1i1.39.

Abstract:
Due to the increase in the number of vehicles, maintenance costs on the north coast roads (Pantura route) are rising because many sections are damaged, potholed, or otherwise degraded. Mode switching is expected to make freight transport more efficient, and grouping modes by their characteristics has been shown to identify which products need to undergo a modal shift. In this study, goods truck modes are grouped based on the characteristics of the trip, the traveler, and the transportation system on the Northern coastline route, in order to provide policy recommendations for switching the truck freight with the most suitable characteristics to another mode. The method used for the grouping in this research is cluster analysis, particularly k-prototype clustering and partitioning around medoids, because the data contain mixed variables, i.e., both categorical and numerical variables. Keywords: Cluster Analysis; Freight Transport; Mixed Data; Transportation.
32

Nielsen, Kuniko. "Categories in speech perception and production: How closely are they related?" Journal of the Acoustical Society of America 151, no. 4 (April 2022): A265. http://dx.doi.org/10.1121/10.0011286.

Abstract:
A large body of research shows that fine phonetic details are not only used by listeners in processing speech (e.g., McMurray et al., 2009), but these details also affect listeners’ subsequent speech productions (e.g., Goldinger, 1998), suggesting a robust link between speech perception and production. However, evidence for a direct perception-production link is mixed, and the relationship between categories in perception and production is largely unknown. Newman (2003) found correlation between a perceptual prototype and mean VOT production for /pa/, while there were no production-perception correlations for other stop consonants. Nielsen (2021) found that perceptual boundaries in VOT vary widely across speakers, but that there is no apparent link between categorical boundary in perception of voicing contrast ([p]-[b]) and production variables of isolated speech (e.g., mean VOT for /p/ or /b/, the center of gap between two categories, the distance between the two category means), confirming Bailey and Haggard (1973). The current study further explores the relationship between perception and production categories by examining categorical boundaries of English stops using minimal pairs (e.g., pear-bear), their perceptual prototypes through a goodness rating task, and production variables in connected and isolated speech.
33

Grané, Aurea, and Rosario Romera. "On Visualizing Mixed-Type Data." Sociological Methods & Research 47, no. 2 (January 5, 2016): 207–39. http://dx.doi.org/10.1177/0049124115621334.

Abstract:
Survey data are usually of mixed type (quantitative, multistate categorical, and/or binary variables). Multidimensional scaling (MDS) is one of the most widespread methodologies to visualize the profile structure of the data. Since the 1960s, MDS methods have been introduced in the literature, initially in publications in the psychometrics area. Nevertheless, sensitivity and robustness of MDS configurations have been topics scarcely addressed in the specialized literature. In this work, we are interested in the construction of robust profiles for mixed-type data using a proper MDS configuration. To this end, we propose to compare different MDS configurations (coming from different metrics) through a combination of sensitivity and robust analysis. In particular, as an alternative to classical Gower’s metric, we propose a robust joint metric combining different distance matrices, avoiding redundant information, via related metric scaling. The search for robustness and identification of outliers is done through a distance-based procedure related to geometric variability notions. In this sense, we propose a statistic for detecting multivariate outliers in the context of mixed-type data and evaluate its performance through a simulation study. Finally, we apply these techniques to a real data set provided by the largest humanitarian organization involved in social programs in Spain, where we are able to find in a robust way the most relevant factors defining the profiles of people who were at risk of being socially excluded at the beginning of the 2008 economic crisis.
34

Costa, Emanuel Arnoni, André Felipe Hess, César Augusto Guimarães Finger, Cristine Tagliapietra Schons, Danieli Regina Klein, Lorena Oliveira Barbosa, Geedre Adriano Borsoi, Veraldo Liesenberg, and Polyanna da Conceição Bispo. "Enhancing Height Predictions of Brazilian Pine for Mixed, Uneven-Aged Forests Using Artificial Neural Networks." Forests 13, no. 8 (August 13, 2022): 1284. http://dx.doi.org/10.3390/f13081284.

Abstract:
Artificial intelligence (AI) seeks to simulate the human ability to reason, make decisions, and solve problems. Several AI methodologies have been introduced in forestry to reduce costs and increase accuracy in estimates. We evaluate the performance of Artificial Neural Networks (ANN) in estimating the heights of Araucaria angustifolia (Bertol.) Kuntze (Brazilian pine) trees. The trees are growing in Uneven-aged Mixed Forests (UMF) in southern Brazil and are under different levels of competition. The dataset was divided into training and validation sets. Multi-layer Perceptron (MLP) networks were trained under different Data Normalization (DN) procedures, Neurons in the Hidden Layer (NHL), and Activation Functions (AF). The continuous input variables were diameter at breast height (DBH) and height at the base of the crown (HCB). As a categorical input variable, we consider the sociological position of the trees (dominant–SP1 = 1; codominant–SP2 = 2; and dominated–SP3 = 3), and the continuous output variable was the height (h). In the hidden layer, the number of neurons varied from 3 to 9. Results show that there is no influence of DN on the ANN accuracy. However, the increase in NHL above a certain level caused the model to overfit. In this regard, around 6 neurons stood out, combined with logistic sigmoid AF in the intermediate layer and identity AF in the output layer. Considering the best selected network, the following values of statistical criteria were obtained for the training dataset (R² = 0.84; RMSE = 1.36 m, and MAPE = 6.29) and for the validation dataset (R² = 0.80; RMSE = 1.49 m, and MAPE = 6.53). The possibility of using categorical and numerical variables in the same modeling has been motivating the use of AI techniques in different forestry applications. The ANN presented generalization and consistency regarding biological realism. Therefore, we recommend caution when determining DN, amount of NHL, and using AF during modeling. We argue that such techniques show great potential for forest management procedures and are suggested in other similar environments.
35

Zhang, Zhenyu, Akihiko Nishimura, Nídia S. Trovão, Joshua L. Cherry, Andrew J. Holbrook, Xiang Ji, Philippe Lemey, and Marc A. Suchard. "Accelerating Bayesian inference of dependency between mixed-type biological traits." PLOS Computational Biology 19, no. 8 (August 28, 2023): e1011419. http://dx.doi.org/10.1371/journal.pcbi.1011419.

Abstract:
Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck—integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.
36

Gironell, Alexandre, Berta Pascual-Sedano, Ignacio Aracil, Juan Marín-Lahoz, Javier Pagonabarraga, and Jaime Kulisevsky. "Tremor Types in Parkinson Disease: A Descriptive Study Using a New Classification." Parkinson's Disease 2018 (September 30, 2018): 1–5. http://dx.doi.org/10.1155/2018/4327597.

Abstract:
Background. The current classification of tremor types in Parkinson disease (PD) is potentially confusing, particularly for mixed tremor, and there is no label for pure resting tremor. With a view to better defining the clinical phenomenological classification of these tremors, our group relabeled the different types as follows: pure resting tremor (type I); mixed resting and action tremor with similar frequencies (type II) divided, according to action tremor presentation, into II-R when there is a time lag and II-C otherwise; pure action tremor (type III); and mixed resting and action tremor with differing frequencies (type IV). We performed a descriptive study to determine prevalence and clinical correlates for this new tremor classification. Patient/Methods. A total of 315 consecutively recruited patients with PD and tremor were clinically evaluated. χ² tests were used to assess tremor type associations with categorical variables, namely, sex, family history of PD, motor fluctuations, and anticholinergic and beta-blocker use. With tremor type as the independent variable, ANOVA was performed to study the relationship between dependent quantitative variables, namely, age, age at PD diagnosis, disease duration, and UPDRS scores for rigidity. Results. The studied patients had tremor types as follows: type I, 30%; type II, 50% (II-R, 25% and II-C, 25%); type III, 19%; and type IV, 1%. No significant association was found between the studied clinical variables and tremor types. Conclusions. Mixed tremor was the most common tremor type in our series of patients with PD according to our proposed classification, which we hope will enhance understanding of the broad clinical phenomenology of PD.
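For readers unfamiliar with the chi-square test of association between two categorical variables, the statistic can be sketched as below. The contingency counts are invented for illustration and are unrelated to the study; the function name is hypothetical, and only the test statistic and degrees of freedom are computed (no p-value).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented 2 x 2 counts: rows are two groups, columns are two categories.
table = [[20, 30],
         [30, 20]]
stat, dof = chi_square_statistic(table)
print(stat, dof)  # prints: 4.0 1
```

With one degree of freedom, the 5% critical value of the chi-square distribution is about 3.841, so a statistic of 4.0 would be judged significant at that level for this invented table.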
37

Duarte, Belmiro P. M. "Exact Optimal Designs of Experiments for Factorial Models via Mixed-Integer Semidefinite Programming." Mathematics 11, no. 4 (February 7, 2023): 854. http://dx.doi.org/10.3390/math11040854.

Abstract:
The systematic design of exact optimal designs of experiments is typically challenging, as it results in nonconvex optimization problems. The literature on the computation of model-based exact optimal designs of experiments via mathematical programming, when the covariates are categorical variables, is still scarce. We propose mixed-integer semidefinite programming formulations, to find exact D-, A- and I-optimal designs for linear models, and locally optimal designs for nonlinear models when the design domain is a finite set of points. The strategy requires: (i) the generation of a set of candidate treatments; (ii) the formulation of the optimal design problem as a mixed-integer semidefinite program; and (iii) its solution, employing appropriate solvers. For comparison, we use semidefinite programming-based formulations to find equivalent approximate optimal designs. We demonstrate the application of the algorithm with various models, considering both unconstrained and constrained setups. Equivalent approximate optimal designs are used for comparison.
38

Rachwał, Alicja, Emilia Popławska, Izolda Gorgol, Tomasz Cieplak, Damian Pliszczuk, Łukasz Skowron, and Tomasz Rymarczyk. "Determining the Quality of a Dataset in Clustering Terms." Applied Sciences 13, no. 5 (February 24, 2023): 2942. http://dx.doi.org/10.3390/app13052942.

Abstract:
The purpose of the theoretical considerations and research conducted was to indicate the instruments with which the quality of a dataset can be verified for the segmentation of observations occurring in the dataset. The paper proposes a novel way to deal with mixed datasets containing categorical and continuous attributes in a customer segmentation task. The categorical variables were embedded using an innovative unsupervised model based on an autoencoder. The customers were then divided into groups using different clustering algorithms, based on similarity matrices. In addition to the classic k-means method and the more modern DBSCAN, three graph algorithms were used: the Louvain algorithm, the greedy algorithm and the label propagation algorithm. The research was conducted on two datasets: one containing retail customers and the other containing wholesale customers. The Calinski–Harabasz index, Davies–Bouldin index, NMI index, Fowlkes–Mallows index and silhouette score were used to assess the quality of the clustering. It was noted that the modularity parameter for graph methods was a good indicator of whether a given set could be meaningfully divided into groups.
39

Apostolou, Konstantinos, Alexandra Staikou, Smaragda Sotiraki, and Marianthi Hatziioannou. "An Assessment of Snail-Farm Systems Based on Land Use and Farm Components." Animals 11, no. 2 (January 21, 2021): 272. http://dx.doi.org/10.3390/ani11020272.

Abstract:
In this study, the structural and management characteristics of snail farms in Greece were analyzed to maximize sustainable food production. Objectives, such as the classification of farming systems and assessing the effects of various annual production parameters, were investigated. Data were collected (2017) via a questionnaire, and sampling was conducted in 29 snail farms dispersed in six different regions (Thrace, Central Macedonia, West Macedonia, Thessaly, Western Greece, and the Attica Islands). Descriptive statistics for continuous variables and frequencies for categorical variables were calculated. The similarity between farms was analyzed using nonmetric multidimensional scaling (nMDS). The average farm operation duration exceeded eight months and the mean annual production was 1597 kg of fresh, live snails. Results recorded five farming systems: elevated sections (7%), net-covered greenhouse (38%), a mixed system with a net-covered greenhouse (10%), open field (38%), and mixed system with an open field (7%). Snail farms differ in the type of substrate, available facilities, and equipment (60% similarity between most of the open field farms). The geographical location of a farm affects productivity but also influences the duration of operation, especially in open field farms, due to their operation under a wide assortment of climatic types.
40

Osterrieder, Anne, Giulia Cuman, Wirichada Pan-Ngum, Phaik Kin Cheah, Phee-Kheng Cheah, Pimnara Peerawaranun, Margherita Silan, et al. "Economic and social impacts of COVID-19 and public health measures: results from an anonymous online survey in Thailand, Malaysia, the UK, Italy and Slovenia." BMJ Open 11, no. 7 (July 2021): e046863. http://dx.doi.org/10.1136/bmjopen-2020-046863.

Abstract:
Objectives: To understand the impact of COVID-19 and public health measures on different social groups, we conducted a mixed-methods study in five countries (‘SEBCOV: social, ethical and behavioural aspects of COVID-19’). Here, we report the results of the online survey. Study design and statistical analysis: Overall, 5058 respondents from Thailand, Malaysia, the UK, Italy and Slovenia completed the self-administered survey between May and June 2020. Poststratification weighting was applied. Frequency counts and percentages were used to summarise categorical data. Associations between categorical variables were assessed using Pearson’s χ² test. Data were analysed in Stata 15.0. Results: Among the five countries, Thai respondents reported having been most, and Slovenian respondents least, affected economically. The following factors were associated with greater negative economic impacts: being 18–24 years or 65 years or older; lower education levels; larger households; having children under 18 in the household and having flexible/no income. Regarding social impact, respondents expressed most concern about their social life, physical health, mental health and well-being. There were large differences between countries in terms of voluntary behavioural change, and in compliance and agreement with COVID-19 restrictions. Overall, self-reported compliance was higher among respondents who self-reported a high understanding of COVID-19. UK respondents felt able to cope the longest and Thai respondents the shortest with only going out for essential needs or work. Many respondents reported seeing news perceived to be fake, the proportion varying between countries, with education level and self-reported levels of understanding of COVID-19. Conclusions: Our data showed that COVID-19 and public health measures have uneven economic and social impacts on people from different countries and social groups. Understanding the factors associated with these impacts can help to inform future public health interventions and mitigate their negative consequences. Trial registration number: TCTR20200401002.
41

Nix, John-Michael L. "Cluster and Time-Series Analyses of Computer-Assisted Pronunciation Training Users." International Journal of Computer-Assisted Language Learning and Teaching 4, no. 1 (January 2014): 1–20. http://dx.doi.org/10.4018/ijcallt.2014010101.

Abstract:
The present study utilized hierarchical agglomerative cluster (HAC) analysis to categorize users of a popular, web-based computer-assisted pronunciation training (CAPT) program into user types using activity log data. Results indicate an optimal grouping of four types: Reluctant, Point-focused, Optimal, and Engaged. Clustering was determined by aggregate data on seven indicator variables of mixed types (e.g., ratio, continuous, and categorical). It was found that two measurements of effort, lines recorded and episodic effort, served best to distinguish the user types. Subsequent time-series analysis of cluster members showed that groupings exhibited distinct trends in learning behavior which explain performance outcomes. Four waves of data were collected during one semester of EFL instruction wherein CAPT usage partially fulfilled course requirements. This study follows an exploratory, data-driven approach. In addition to the findings above, suggestions for future research into interactions between individual differences variables and CALL platforms are made.
42

Maneiro, Rubén, José Luís Losada, Mariona Portell, and Antonio Ardá. "Observational Analysis of Corner Kicks in High-Level Football: A Mixed Methods Study." Sustainability 13, no. 14 (July 6, 2021): 7562. http://dx.doi.org/10.3390/su13147562.

Abstract:
Corner kicks are one of the most important set pieces in high-level football. The present study aimed to analyze the evolution of the tactical approach to corner kicks in high-performance football. For this, a total of 1704 corner kicks executed in the 192 matches corresponding to the 2010, 2014 and 2018 FIFA World Cups were analyzed. To achieve the proposed objectives, the observational methodology was used. The results show an evolution in the mode of execution of these actions, but the success rate remains low. The log-linear test made it possible to find significant relationships between some of the most important categorical variables in these actions: match status, number of intervening attackers and time. The decision tree models show that the number of players involved in these actions is the criterion that presents the greatest information gain. These results corroborate previous multivariate studies, although more research is still needed. Finally, the results of the present study can be used by coaches to create different training situations where success in this type of action can be enhanced.
43

Clark, Lisa B., Eduardo González, Annie L. Henry, and Anna A. Sher. "A Solution to Treat Mixed-Type Human Datasets from Socio-Ecological Systems." Journal of Environmental Geography 13, no. 3-4 (November 1, 2020): 51–60. http://dx.doi.org/10.2478/jengeo-2020-0012.

Abstract:
Coupled human and natural systems (CHANS) are frequently represented by large datasets with varied data including continuous, ordinal, and categorical variables. Conventional multivariate analyses cannot handle these mixed data types. In this paper, our goal was to show how a clustering method that has not previously been applied to understanding the human dimension of CHANS, a Gower dissimilarity matrix with partitioning around medoids (PAM), can be used to treat mixed-type human datasets. A case study of land managers responsible for invasive plant control projects across rivers of the southwestern U.S. was used to characterize managers’ backgrounds and decisions, and project properties through clustering. Results showed that managers could be classified as “federal multitaskers” or as “educated specialists”. Decisions were characterized by being either “quick and active” or “thorough and careful”. Project goals were either comprehensive with ecological goals or more limited in scope. This study shows that clustering with Gower and PAM can simplify the complex human dimension of this system, demonstrating the utility of this approach for systems frequently composed of mixed-type data such as CHANS. This clustering approach can be used to direct scientific recommendations towards homogeneous groups of managers and project types.
44

Lateef, Manal A., and M. I. Lone. "Histomorphology of germ cell tumors at various anatomic sites: a 5 years study at a tertiary care centre." International Journal of Research in Medical Sciences 9, no. 10 (September 28, 2021): 3079. http://dx.doi.org/10.18203/2320-6012.ijrms20213936.

Abstract:
Background: Germ cell tumors (GCTs) are a heterogeneous group of neoplasms, which occur in the gonads, and at extra gonadal sites of the body. The aim of the study was to observe the different histopathological patterns of various GCTs in the body at all possible sites and to know their IHC staining patterns. Methods: The study was conducted for a period of 5 years from 2015 to 2019 and was an observational study. The recorded data was compiled and entered in a spreadsheet and then exported to data editor of SPSS Version 20.0. Continuous variables were expressed as mean ± SD and categorical variables were summarized as frequencies and percentages. Graphically the data was presented by bar and pie diagrams. Chi-square test or Fisher’s exact test was applied for comparing categorical values. P<0.05 was considered statistically significant. All p values were 2 tailed. Results: A total of 93 cases were analyzed and the mean age of the patients was 27.8 years. Mature cystic teratoma was the most common histopathological variant and was mostly seen in the ovaries. There was a difference in age predilection of benign and malignant tumors. Most of the malignant GCTs were gonadal while EGCTs were likely to be benign. MGCTs (mixed GCTs) were mostly testicular in origin with only one MGCT being extragonadal. Conclusions: Mature cystic teratomas were the most frequent GCTs with frequent site being in ovaries. Out of 18 EGCTs only 2 were malignant, rest all were mature cystic teratomas.
45

Nguyen, Tho, Ladda Thiamwong, Qian Lou, and Rui Xie. "Unveiling Fall Triggers in Older Adults: A Machine Learning Graphical Model Analysis." Mathematics 12, no. 9 (April 23, 2024): 1271. http://dx.doi.org/10.3390/math12091271.

Abstract:
While existing research has identified diverse fall risk factors in adults aged 60 and older across various areas, comprehensively examining the interrelationships between all factors can enhance our knowledge of complex mechanisms and ultimately prevent falls. This study employs a novel approach, a mixed undirected graphical model (MUGM), to unravel the interplay between sociodemographics, mental well-being, body composition, self-assessed and performance-based fall risk assessments, and physical activity patterns. Using a parameterized joint probability density, MUGMs specify the higher-order dependence structure and reveal the underlying graphical structure of heterogeneous variables. The MUGM consisting of mixed types of variables (continuous and categorical) has versatile applications that provide innovative and practical insights, as it is equipped to transcend the limitations of traditional correlation analysis and uncover sophisticated interactions within a high-dimensional data set. Our study included 120 elders from central Florida whose 37 fall risk factors were analyzed using an MUGM. Among the identified features, 34 exhibited pairwise relationships, while COVID-19-related factors and housing composition remained conditionally independent from all others. The results from our study serve as a foundational exploration, and future research investigating the longitudinal aspects of these features plays a pivotal role in enhancing our knowledge of the dynamics contributing to fall prevention in this population.
46

Luo, Han. "Foreign Language Anxiety: Past and Future." Chinese Journal of Applied Linguistics 36, no. 4 (October 22, 2013): 442–64. http://dx.doi.org/10.1515/cjal-2013-0030.

Abstract:
This paper gives a comprehensive review of studies on foreign language anxiety. Foreign language anxiety has been recognized in the past few decades as a situation-specific emotional reaction that potentially impedes foreign language learning. Research has shown that foreign language anxiety is not only prevalent among foreign language learners, but also has various negative effects on foreign language learning. In order to help learners cope with this problem, researchers have identified a large number of sources of foreign language anxiety, which generally fall into four major categories, namely, the classroom environment, learner characteristics, the target language, and the foreign language learning process itself. Researchers have also investigated quite a number of factors associated with foreign language anxiety (including categorical background variables and quantitative learner variables) and have produced mixed results. Based on a thorough review of foreign language anxiety, the paper concludes with recommendations for future studies on foreign language anxiety.
47

Zheng, Xiaohui, and Sophia Rabe-Hesketh. "Estimating Parameters of Dichotomous and Ordinal Item Response Models with Gllamm." Stata Journal: Promoting communications on statistics and Stata 7, no. 3 (September 2007): 313–33. http://dx.doi.org/10.1177/1536867x0700700302.

Abstract:
Item response theory models are measurement models for categorical responses. Traditionally, the models are used in educational testing, where responses to test items can be viewed as indirect measures of latent ability. The test items are scored either dichotomously (correct–incorrect) or by using an ordinal scale (a grade from poor to excellent). Item response models apply equally to the measurement of other latent traits. Here we describe the one- and two-parameter logit models for dichotomous items, the partial-credit and rating scale models for ordinal items, and an extension of these models where the latent variable is regressed on explanatory variables. We show how these models can be expressed as generalized linear latent and mixed models and fitted by using the user-written command gllamm.
48

Finch, W. Holmes. "Using Fit Statistic Differences to Determine the Optimal Number of Factors to Retain in an Exploratory Factor Analysis." Educational and Psychological Measurement 80, no. 2 (July 31, 2019): 217–41. http://dx.doi.org/10.1177/0013164419865769.

Abstract:
Exploratory factor analysis (EFA) is widely used by researchers in the social sciences to characterize the latent structure underlying a set of observed indicator variables. One of the primary issues that must be resolved when conducting an EFA is determining the number of factors to retain. A large number of statistical tools are designed to address this question, with none being universally optimal across applications. Recently, researchers have investigated the use of model fit indices that are commonly used in confirmatory factor analysis to determine the number of factors to retain in EFA. These studies have yielded mixed results: the indices appear to be effective when used with normally distributed indicators, but less effective for categorical indicators. The purpose of this simulation study was to compare the performance of difference values for several fit indices, as a method for identifying the optimal number of factors to retain in an EFA, with parallel analysis, which is one of the most reliable extant methods. Results of the simulation demonstrated that the use of fit index difference values outperformed parallel analysis for categorical indicators, and for normally distributed indicators when factor loadings were small. Implications of these findings are discussed.
49

Crayen, Claudia, Michael Eid, Tanja Lischetzke, and Jeroen K. Vermunt. "A Continuous-Time Mixture Latent-State-Trait Markov Model for Experience Sampling Data." European Journal of Psychological Assessment 33, no. 4 (July 2017): 296–311. http://dx.doi.org/10.1027/1015-5759/a000418.

Abstract:
In psychological research, statistical models of latent state-trait (LST) theory are popular for the analysis of longitudinal data. We identify several limitations of available models when applied to intensive longitudinal data with categorical observed and latent variables and inter- and intraindividually varying time intervals. As an extension of available LST models for categorical data, we describe a general mixed continuous-time LST model that is suitable for intensive longitudinal data with unobserved heterogeneity and individually varying time intervals. This model is illustrated by an application to momentary mood data that were collected in an experience sampling study (N = 164). In addition, we report the results of a simulation study conducted to determine (a) the minimal data requirements with respect to sample size and number of occasions, and (b) how strong the bias is if the continuous-time structure is ignored. The empirical application revealed two classes for which the transition pattern and effects of time-varying covariates differ. In the simulation study, only small differences between the continuous-time model and its discrete-time counterpart emerged. Sample sizes of N = 100 and larger in combination with six or more occasions of measurement tended to produce reliable estimation results. Implications of the models for future research are discussed.
50

Albuquerque, Pedro, Gisela Demo, Solange Alfinito, and Kesia Rozzett. "Bayesian factor analysis for mixed data on management studies." RAUSP Management Journal 54, no. 4 (October 14, 2019): 430–45. http://dx.doi.org/10.1108/rausp-05-2019-0108.

Abstract:
Purpose: Factor analysis is the most used tool in organizational research, and its widespread use in scale validation contributes to decision-making in management. However, standard factor analysis is not always applied correctly, mainly due to the misuse of ordinal data as interval data and the inadequacy of the former for classical factor analysis. The purpose of this paper is to present and apply Bayesian factor analysis for mixed data (BFAMD) in an empirical context, using the Bayesian paradigm for the construction of scales.
Design/methodology/approach: Ignoring the categorical nature of some variables often used in management studies, such as the popular Likert scale, may result in a model with false accuracy and possibly biased estimates. To address this issue, Quinn (2004) proposed a Bayesian factor analysis model for mixed data, which is capable of modeling ordinal (qualitative measure) and continuous data (quantitative measure) jointly and allows the inclusion of qualitative information through prior distributions for the model's parameters. This model, adopted here, presents considerable advantages and allows the estimation of the posterior distribution for the estimated latent variables, making the process of inference easier.
Findings: The results show that BFAMD is an effective approach for scale validation in management studies, making both exploratory and confirmatory analyses possible for the estimated factors and also allowing analysts to insert a priori information regardless of the sample size, either by using the credible intervals for factor loadings or by conducting specific hypothesis tests. The flexibility of the Bayesian approach presented is counterbalanced by the fact that the main estimates used in factor analysis, such as uniqueness and communalities, commonly lose their usual interpretation due to the use of prior distributions.
Originality/value: Given that scale development through factor analysis aims to support appropriate decision-making in management, and given the increasing misuse of ordinal scales as interval scales in organizational studies, this proposal seems to be effective for mixed data analyses. The findings presented here are not intended to be conclusive or limiting but offer a useful starting point on which further theoretical and empirical research on Bayesian factor analysis can build.
