Journal articles on the topic 'Categorical method'

Consult the top 50 journal articles for your research on the topic 'Categorical method.'

1

Oh, Seung-Joon, and Jae-Yearn Kim. "A Scalable Clustering Method for Categorical Sequences." Journal of Korean Institute of Intelligent Systems 14, no. 2 (April 1, 2004): 136–41. http://dx.doi.org/10.5391/jkiis.2004.14.2.136.

2

Giordan, Marco, and Giancarlo Diana. "A Clustering Method for Categorical Ordinal Data." Communications in Statistics - Theory and Methods 40, no. 7 (March 8, 2011): 1315–34. http://dx.doi.org/10.1080/03610920903581010.

3

Baba, Yasumasa. "Graphical prediction method based on categorical data." Computational Statistics & Data Analysis 5, no. 2 (May 1987): 85–101. http://dx.doi.org/10.1016/0167-9473(87)90034-x.

4

Seman, Ali, Zainab Abu Bakar, Azizian Mohd. Sapa, and Ida Rosmini Othman. "A Medoid-based Method for Clustering Categorical Data." Journal of Artificial Intelligence 6, no. 4 (September 15, 2013): 257–65. http://dx.doi.org/10.3923/jai.2013.257.265.

5

Oh, Seung-Joon, and Jae-Yearn Kim. "A Scalable Clustering Method for Categorical Sequence Data." International Journal of Computational Methods 02, no. 02 (June 2005): 167–80. http://dx.doi.org/10.1142/s0219876205000417.

Abstract:
Clustering of sequences is relatively less explored but it is becoming increasingly important in data mining applications such as web usage mining and bioinformatics. The web user segmentation problem uses web access log files to partition a set of users into clusters such that users within one cluster are more similar to one another than to the users in other clusters. Similarly, grouping protein sequences that share a similar structure can help to identify sequences with similar functions. However, few clustering algorithms consider sequentiality. In this paper, we study how to cluster sequence datasets. Due to the high computational complexity of hierarchical clustering algorithms for clustering large datasets, a new clustering method is required. Therefore, we propose a new scalable clustering method using sampling and a k-nearest-neighbor method. Using a splice dataset and a synthetic dataset, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional algorithms.
6

He, Zengyou, Xiaofei Xu, and Shengchun Deng. "A cluster ensemble method for clustering categorical data." Information Fusion 6, no. 2 (June 2005): 143–51. http://dx.doi.org/10.1016/j.inffus.2004.03.001.

7

Cao, Fuyuan, Jiye Liang, and Liang Bai. "A new initialization method for categorical data clustering." Expert Systems with Applications 36, no. 7 (September 2009): 10223–28. http://dx.doi.org/10.1016/j.eswa.2009.01.060.

8

Cao, Fuyuan, and Jiye Liang. "A data labeling method for clustering categorical data." Expert Systems with Applications 38, no. 3 (March 2011): 2381–85. http://dx.doi.org/10.1016/j.eswa.2010.08.026.

9

Hargrove, William W., Forrest M. Hoffman, and Paul F. Hessburg. "Mapcurves: a quantitative method for comparing categorical maps." Journal of Geographical Systems 8, no. 2 (May 12, 2006): 187–208. http://dx.doi.org/10.1007/s10109-006-0025-x.

10

Moskaliuk, S. S. "Method of categorical extension of Cayley-Klein groups." Czechoslovak Journal of Physics 55, no. 11 (November 2005): 1495–501. http://dx.doi.org/10.1007/s10582-006-0031-8.

11

Naouali, Sami, Semeh Ben Salem, and Zied Chtourou. "Clustering Categorical Data: A Survey." International Journal of Information Technology & Decision Making 19, no. 01 (January 2020): 49–96. http://dx.doi.org/10.1142/s0219622019300064.

Abstract:
Clustering is a complex unsupervised method that groups the most similar observations of a dataset into the same cluster. To be efficient, a clustering process should combine high accuracy with low complexity. Many clustering methods have been developed in various fields, depending on the type of application and the data type considered. Categorical clustering segments datasets whose data are categorical and has been widely used in real-world applications. Several families of methods have been developed for it, including hard, fuzzy and rough set-based methods. In this survey, more than 30 categorical clustering algorithms are investigated. The methods are classified into hierarchical and partitional clustering methods and compared in terms of their accuracy, precision and recall to identify the most prominent ones. Experimental results show that rough set-based clustering methods provided better efficiency than hard and fuzzy methods. Methods based on the initialization of the centroids also provided good results.
12

Dirisinapu, Lakshmi Sreenivasareddy, Krishna Murthy Mudumbi, and Govardhan Aliseri. "Outlier Analysis of Categorical Data Using Infrequency." International Journal of Computers & Technology 8, no. 3 (June 30, 2013): 868–73. http://dx.doi.org/10.24297/ijct.v8i3.3397.

Abstract:
Anomalies are objects that behave differently from, and do not conform to, the remaining records in a database. Detecting anomalies is an important issue in many fields. Although many methods are available for detecting anomalies in numerical datasets, only a few are available for categorical datasets. This work proposes a new method that finds anomalies based on the infrequent itemsets in each record. These outliers are identified by applying the Apriori property to the values of each record in the dataset. Previous methods may fail to distinguish different records with the same frequency and give such records the same score. Here, a score is generated for each record based on its infrequent itemsets; the paper calls it the MAD score. The algorithm utilizes the frequency of each value in the dataset. The FPOF method uses the concept of frequent itemsets and Otey's method uses infrequent itemsets, but neither distinguishes records perfectly. The proposed algorithm is applied to the Nursery and Bank datasets taken from the UCI Machine Learning Repository; numerical attributes are excluded from the datasets for this analysis. The experimental results show that it is efficient for outlier detection in categorical datasets.
13

Onoghojobi, B. "Hill Climbing Method Using Claus Model for Categorical Data." Journal of Mathematics and Statistics 5, no. 4 (April 1, 2009): 375–78. http://dx.doi.org/10.3844/jmssp.2009.375.378.

14

Gubich, Olga V., and Konstantin V. Zashcholkin. "Modification of Categorical-Zone Method for Digital Watermarks Embedding." Electrical and Computer Systems 19, no. 95 (July 2, 2015): 262–65. http://dx.doi.org/10.15276/eltecs.19.95.2015.58.

15

Yoon, Yong-Hwa, and Bo-Seung Choi. "Model selection method for categorical data with non-response." Journal of the Korean Data and Information Science Society 23, no. 4 (July 31, 2012): 627–41. http://dx.doi.org/10.7465/jkdi.2012.23.4.627.

16

Timofeyeva, Galina Adolfovna, and Dmitry Vladimirovich Bondarchuk. "Mathematical Foundations of categorical vector method in data mining." Herald of the Ural State University of Railway Transport, no. 4 (2015): 4–8. http://dx.doi.org/10.20291/2079-0392-2015-4-4-8.

17

Попей-олл, С., and S. Popey-oll. "The Theory of Self-Identification and Categorical Research Method." Scientific Research and Development. Socio-Humanitarian Research and Technology 8, no. 1 (March 27, 2019): 17–25. http://dx.doi.org/10.12737/article_5c8f49fbadbbc6.99681771.

Abstract:
This article presents a categorical method for analyzing the complex processes of personal identity. Human experiences result from conscious generalizations that dominate a culture and are fixed in semantic categories. The rapid transformation of society fragments a life into many identifying parameters, so a person's "self-concept" and the semantic category of being may not be consistent with each other. A harmonious level of self-organization is manifested in the sensory coherence of a person's intentions and expectations, whereas fragmentation is a chaos of self-awareness and a loss of emotional stability. In an increasingly complex society, a person's identity becomes multiple and ambiguous. These studies will not only determine the social level of human self-organization but also begin the search for a method to maintain it. The article attempts to consider a categorical method for analyzing people's self-identification properties.
18

Cheng, Li, Yijie Wang, and Xingkong Ma. "A Neural Probabilistic outlier detection method for categorical data." Neurocomputing 365 (November 2019): 325–35. http://dx.doi.org/10.1016/j.neucom.2019.07.069.

19

Rezaee, Hassan, and Denis Marcotte. "Calibration of categorical simulations by evolutionary gradual deformation method." Computational Geosciences 22, no. 2 (January 16, 2018): 587–605. http://dx.doi.org/10.1007/s10596-017-9711-7.

20

te Grotenhuis, Manfred, Ben Pelzer, Rob Eisinga, Rense Nieuwenhuis, Alexander Schmidt-Catran, and Ruben Konig. "A novel method for modelling interaction between categorical variables." International Journal of Public Health 62, no. 3 (October 28, 2016): 427–31. http://dx.doi.org/10.1007/s00038-016-0902-0.

21

Bai, Liang, Jiye Liang, Chuangyin Dang, and Fuyuan Cao. "A cluster centers initialization method for clustering categorical data." Expert Systems with Applications 39, no. 9 (July 2012): 8022–29. http://dx.doi.org/10.1016/j.eswa.2012.01.131.

22

Johnson, J. Michael, and Keith C. Clarke. "An area preserving method for improved categorical raster resampling." Cartography and Geographic Information Science 48, no. 4 (April 15, 2021): 292–304. http://dx.doi.org/10.1080/15230406.2021.1892531.

23

Kostek, Bożena, Piotr Odya, and Piotr Suchomski. "Loudness Scaling Test Based on Categorical Perception." Archives of Acoustics 41, no. 4 (December 1, 2016): 637–48. http://dx.doi.org/10.1515/aoa-2016-0061.

Abstract:
The main goal of this research study is to create a method for loudness scaling based on categorical perception. Its main features, such as the way of testing, the calibration procedure for securing reliable results, and the use of natural test stimuli, are described in the paper and assessed against a procedure that uses 1/2-octave bands of noise (LGOB) for the loudness growth estimation. The Mann-Whitney U-test is employed to check whether the proposed method is statistically equivalent to LGOB. It is shown that loudness functions obtained in both methods are similar in the statistical context. Moreover, the band-filtered musical instrument signals are experienced as more pleasant than the narrow-band noise stimuli and the proposed test is performed in a shorter time. The method proposed may be incorporated into fitting hearing strategies or used for checking individual loudness growth functions and adapting them to the comfort level settings while listening to music.
24

von Eye, Alexander, Christof Schuster, and William M. Rogers. "Modelling Synergy using Manifest Categorical Variables." International Journal of Behavioral Development 22, no. 3 (September 1998): 537–57. http://dx.doi.org/10.1080/016502598384261.

Abstract:
This paper discusses methods to model the concept of synergy at the level of manifest categorical variables. First, a classification of concepts of synergy is presented. Additive and nonadditive concepts of synergy are distinguished. Most prominent among the nonadditive concepts is superadditive synergy. Examples are given from the natural sciences and the social sciences. Modelling focuses on the relationship between the agents involved in a synergetic process. These relationships are expressed in form of contrasts, expressed in effect coding vectors in design matrices for nonstandard log-linear models. A method by Schuster is used to transform design matrices such that parameters reflect the proposed relationships. An example reanalyses data presented by Bishop, Fienberg, and Holland (1975) that describe the development of thromboembolisms in women who differ in their patterns of contraceptive use and smoking. Alternative methods of analysis are compared. Implications for developmental research are discussed.
25

Yang, Yanyun, and Yan Xia. "Categorical Omega With Small Sample Sizes via Bayesian Estimation: An Alternative to Frequentist Estimators." Educational and Psychological Measurement 79, no. 1 (January 18, 2018): 19–39. http://dx.doi.org/10.1177/0013164417752008.

Abstract:
When item scores are ordered categorical, categorical omega can be computed based on the parameter estimates from a factor analysis model using frequentist estimators such as diagonally weighted least squares. When the sample size is relatively small and thresholds are different across items, using diagonally weighted least squares can yield a substantially biased estimate of categorical omega. In this study, we applied Bayesian estimation methods for computing categorical omega. The simulation study investigated the performance of categorical omega under a variety of conditions through manipulating the scale length, number of response categories, distributions of the categorical variable, heterogeneities of thresholds across items, and prior distributions for model parameters. The Bayes estimator appears to be a promising method for estimating categorical omega. Mplus and SAS codes for computing categorical omega were provided.
26

Wu, Chengyuan, and Carol Anne Hargreaves. "Topological Machine Learning for Mixed Numeric and Categorical Data." International Journal on Artificial Intelligence Tools 30, no. 05 (August 2021): 2150025. http://dx.doi.org/10.1142/s0218213021500251.

Abstract:
Topological data analysis is a relatively new branch of machine learning that excels in studying high-dimensional data, and is theoretically known to be robust against noise. Meanwhile, data objects with mixed numeric and categorical attributes are ubiquitous in real-world applications. However, topological methods are usually applied to point cloud data, and to the best of our knowledge there is no available framework for the classification of mixed data using topological methods. In this paper, we propose a novel topological machine learning method for mixed data classification. In the proposed method, we use theory from topological data analysis such as persistent homology, persistence diagrams and Wasserstein distance to study mixed data. The performance of the proposed method is demonstrated by experiments on a real-world heart disease dataset. Experimental results show that our topological method outperforms several state-of-the-art algorithms in the prediction of heart disease.
27

Nguyen, Dang, Sunil Gupta, Santu Rana, Alistair Shilton, and Svetha Venkatesh. "Bayesian Optimization for Categorical and Category-Specific Continuous Inputs." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 5256–63. http://dx.doi.org/10.1609/aaai.v34i04.5971.

Abstract:
Many real-world functions are defined over both categorical and category-specific continuous variables and thus cannot be optimized by traditional Bayesian optimization (BO) methods. To optimize such functions, we propose a new method that formulates the problem as a multi-armed bandit problem, wherein each category corresponds to an arm with its reward distribution centered around the optimum of the objective function in continuous variables. Our goal is to identify the best arm and the maximizer of the corresponding continuous function simultaneously. Our algorithm uses a Thompson sampling scheme that helps connecting both multi-arm bandit and BO in a unified framework. We extend our method to batch BO to allow parallel optimization when multiple resources are available. We theoretically analyze our method for convergence and prove sub-linear regret bounds. We perform a variety of experiments: optimization of several benchmark functions, hyper-parameter tuning of a neural network, and automatic selection of the best machine learning model along with its optimal hyper-parameters (a.k.a automated machine learning). Comparisons with other methods demonstrate the effectiveness of our proposed method.
28

Blackman, Sherry. "Comparisons among Methods of Scoring Androgyny Continuously Using Computer-Simulated Data." Psychological Reports 57, no. 1 (August 1985): 151–54. http://dx.doi.org/10.2466/pr0.1985.57.1.151.

Abstract:
To study the relationship between the Bem-Spence categorical method of scoring androgyny and a number of continuous methods, Bem's Masculinity and Femininity scores were simulated for 50 studies of 200 subjects each. The continuous methods relate closely to one another and leave between 64% and 71% of the variance in the categorical method unaccounted for. There is little evidence for choosing one method over another. Research is needed to compare the predictions of the median-split method with those of the continuous methods.
29

Sanchez, Jeniffer Duarte, Leandro C. Rêgo, and Raydonal Ospina. "Prediction by Empirical Similarity via Categorical Regressors." Machine Learning and Knowledge Extraction 1, no. 2 (May 15, 2019): 641–52. http://dx.doi.org/10.3390/make1020038.

Abstract:
A quantifier of similarity is generally a type of score that assigns a numerical value to a pair of sequences based on their proximity. Similarity measures play an important role in prediction problems with many applications, such as statistical learning, data mining, biostatistics, finance and others. Based on observed data, where a response variable of interest is assumed to be associated with some regressors, it is possible to make response predictions using a weighted average of observed response variables, where the weights depend on the similarity of the regressors. In this work, we propose a parametric regression model for continuous response based on empirical similarities for the case where the regressors are represented by categories. We apply the proposed method to predict tooth length growth in guinea pigs based on Vitamin C supplements considering three different dosage levels and two delivery methods. The inferential procedure is performed through maximum likelihood and least squares estimation under two types of similarity functions and two distance metrics. The empirical results show that the method yields accurate models with low dimension facilitating the parameters’ interpretation.
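Illustrative note: the idea of predicting a response as a similarity-weighted average over categorical regressors can be sketched as follows. This is a minimal sketch, not the authors' model: the exponential similarity on a weighted mismatch count, the fixed weights (the paper estimates its parameters by maximum likelihood and least squares), and the toy data are all assumptions made for illustration.

```python
import math

def similarity(x, z, weights):
    """Exponential similarity: each attribute that differs adds its weight to the distance."""
    distance = sum(w for w, a, b in zip(weights, x, z) if a != b)
    return math.exp(-distance)

def predict(x_new, X, y, weights):
    """Similarity-weighted average of the observed responses."""
    sims = [similarity(x_new, x, weights) for x in X]
    return sum(s * yi for s, yi in zip(sims, y)) / sum(sims)

# Hypothetical toy data: (dose level, delivery method) -> response.
X = [("low", "A"), ("low", "B"), ("high", "A")]
y = [10.0, 12.0, 20.0]

# All weights zero: every observation is equally similar, so the prediction is the mean of y.
uniform = predict(("low", "A"), X, y, weights=[0.0, 0.0])
# Large weights: only exact matches on the categories contribute noticeably.
sharp = predict(("low", "A"), X, y, weights=[50.0, 50.0])
```

With zero weights the prediction reduces to the sample mean; as the weights grow, it approaches the response of the observations that share the same categories.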
30

Kondo, Tadafumi, and Yuchi Kanzawa. "Fuzzy Clustering Methods for Categorical Multivariate Data Based on q-Divergence." Journal of Advanced Computational Intelligence and Intelligent Informatics 22, no. 4 (July 20, 2018): 524–36. http://dx.doi.org/10.20965/jaciii.2018.p0524.

Abstract:
This paper presents two fuzzy clustering algorithms for categorical multivariate data based on q-divergence. First, this study shows that a conventional method for vectorial data can be explained as regularizing another conventional method using q-divergence. Second, based on the known results that Kullback-Leibler (KL)-divergence is generalized into the q-divergence, and two conventional fuzzy clustering methods for categorical multivariate data adopt KL-divergence, two fuzzy clustering algorithms for categorical multivariate data that are based on q-divergence are derived from two optimization problems built by extending the KL-divergence in these conventional methods to the q-divergence. Through numerical experiments using real datasets, the proposed methods outperform the conventional methods in terms of clustering accuracy.
31

Adachi, Kohei, and Hiroshi Tanaka. "A Method for Scaling Categorical Attributes with Inter-Object Dissimilarity." Kodo Keiryogaku (The Japanese Journal of Behaviormetrics) 22, no. 2 (1995): 110–25. http://dx.doi.org/10.2333/jbhmk.22.110.

32

Thomas, Roy. "A Novel Ensemble Method for Detecting Outliers in Categorical Data." International Journal of Advanced Trends in Computer Science and Engineering 9, no. 4 (August 25, 2020): 4947–53. http://dx.doi.org/10.30534/ijatcse/2020/108942020.

33

Geng, Zhi, Yang-Bo He, Xue-Li Wang, and Qiang Zhao. "Bayesian method for learning graphical models with incompletely categorical data." Computational Statistics & Data Analysis 44, no. 1-2 (October 2003): 175–92. http://dx.doi.org/10.1016/s0167-9473(03)00066-5.

34

Choi yun-hi. "Writing Method and Categorical Variation for Hangul Titles with GeanMoonLok." Korean Classical Woman Literature Studies ll, no. 17 (December 2008): 413–38. http://dx.doi.org/10.17090/kcwls.2008..17.413.

35

De Angelis, Luca, and José G. Dias. "Mining categorical sequences from data using a hybrid clustering method." European Journal of Operational Research 234, no. 3 (May 2014): 720–30. http://dx.doi.org/10.1016/j.ejor.2013.11.002.

36

Dovbysh, A. S., V. V. Moskalenko, and A. S. Rizhova. "Information-Extreme Method for Classification of Observations with Categorical Attributes." Cybernetics and Systems Analysis 52, no. 2 (March 2016): 224–31. http://dx.doi.org/10.1007/s10559-016-9818-1.

37

Yuan, Liang, Wenjian Wang, and Lifei Chen. "Two-stage pruning method for gram-based categorical sequence clustering." International Journal of Machine Learning and Cybernetics 10, no. 4 (November 15, 2017): 631–40. http://dx.doi.org/10.1007/s13042-017-0744-y.

38

Bondar, Yulia. "Categorical Distinction of the Concepts “Interactive Technology” and “Interactive Method”." Knowledge, Education, Law, Management 1, no. 4 (2020): 8–13. http://dx.doi.org/10.51647/kelm.2020.4.1.2.

39

Ji, Jinchao, Wei Pang, Yanlin Zheng, Zhe Wang, and Zhiqiang Ma. "An Initialization Method for Clustering Mixed Numeric and Categorical Data Based on the Density and Distance." International Journal of Pattern Recognition and Artificial Intelligence 29, no. 07 (September 28, 2015): 1550024. http://dx.doi.org/10.1142/s021800141550024x.

Abstract:
Most of the initialization approaches are dedicated to the partitional clustering algorithms which process categorical or numerical data only. However, in real-world applications, data objects with both numeric and categorical features are ubiquitous. The coexistence of both categorical and numerical attributes makes the initialization methods designed for single-type data inapplicable to mixed-type data. Furthermore, to the best of our knowledge, in the existing partitional clustering algorithms designed for mixed-type data, the initial cluster centers are determined randomly. In this paper, we propose a novel initialization method for mixed data clustering. In the proposed method, both the distance and density are exploited together to determine initial cluster centers. The performance of the proposed method is demonstrated by a series of experiments on three real-world datasets in comparison with that of traditional initialization methods.
40

Kumar, Ajay, and Shishir Kumar. "A Support Based Initialization Algorithm for Categorical Data Clustering." Journal of Information Technology Research 11, no. 2 (April 2018): 53–67. http://dx.doi.org/10.4018/jitr.2018040104.

Abstract:
Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical data are unordered, these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to get the support of every row. A data object having the largest support is then chosen as an initial center, followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
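Illustrative note: the steps described in this abstract (value support, row support, first center, then distant centers) can be sketched roughly as below. This is a hedged reading of the abstract, not the published algorithm: taking support as the relative frequency of a value within its attribute, using Hamming distance, and picking each further center by farthest-first selection against all centers chosen so far are assumptions, and the function name and toy data are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def support_based_centers(rows, k):
    """Pick k initial centers (k <= len(rows)) from categorical rows."""
    n = len(rows)
    # Support of a value: its relative frequency within its attribute (column).
    col_counts = [Counter(col) for col in zip(*rows)]
    # Row support: sum of the supports of the row's values.
    row_support = [sum(col_counts[j][v] / n for j, v in enumerate(row)) for row in rows]
    # First center: the row with the largest support.
    chosen = [max(range(n), key=lambda i: row_support[i])]
    # Further centers: the row farthest from the centers already chosen (assumption).
    while len(chosen) < k:
        candidates = (i for i in range(n) if i not in chosen)
        chosen.append(max(candidates,
                          key=lambda i: min(hamming(rows[i], rows[c]) for c in chosen)))
    return [rows[i] for i in chosen]

# Hypothetical toy data set with three categorical attributes.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("green", "large", "square"),
]
centers = support_based_centers(rows, k=2)
```

On this toy data the third row has the highest row support, and the second center is the first row found at the largest Hamming distance from it.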
41

Alkharusi, Hussain. "Categorical Variables in Regression Analysis: A Comparison of Dummy and Effect Coding." International Journal of Education 4, no. 2 (June 17, 2012): 202. http://dx.doi.org/10.5296/ije.v4i2.1962.

Abstract:
The use of categorical variables in regression involves the application of coding methods. The purpose of this paper is to describe how categorical independent variables can be incorporated into regression by virtue of two coding methods: dummy and effect coding. The paper discusses the uses, interpretations, and underlying assumptions of each method. In general, overall results of the regression are unaffected by the methods used for coding the categorical independent variables. In any of the methods, the analysis tests whether group membership is related to the dependent variables. Both methods yield identical R² and F. However, the interpretations of the intercept and regression coefficients depend on what coding method has been applied and whether the groups have equal sample sizes.
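Illustrative note: the claim that dummy and effect coding give the same overall fit but different intercepts can be checked with a small pure-Python sketch. The three-group toy data, the helper names, and the normal-equations solver are assumptions for illustration only; the group sizes are deliberately unequal.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations; returns (coefficients, R^2)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot

# Group "a" is the reference group: all zeros under dummy coding, all -1 under effect coding.
DUMMY = {"a": [0, 0], "b": [1, 0], "c": [0, 1]}
EFFECT = {"a": [-1, -1], "b": [1, 0], "c": [0, 1]}

# Hypothetical data with group means 2, 5 and 9.
groups = ["a", "a", "b", "b", "b", "c", "c"]
y = [1.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]

def design(coding):
    """Design matrix: intercept column followed by the coded group columns."""
    return [[1] + coding[g] for g in groups]

beta_dummy, r2_dummy = ols(design(DUMMY), y)
beta_effect, r2_effect = ols(design(EFFECT), y)
```

Both fits reproduce the group means, so R² is the same. The dummy-coded intercept equals the mean of the reference group (2), whereas the effect-coded intercept equals the unweighted mean of the three group means (16/3), which differs from the grand mean of y because the groups are unequal in size.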
42

Hao, Zengchao, Fanghua Hao, Youlong Xia, Vijay P. Singh, Yang Hong, Xinyi Shen, and Wei Ouyang. "A Statistical Method for Categorical Drought Prediction Based on NLDAS-2." Journal of Applied Meteorology and Climatology 55, no. 4 (April 2016): 1049–61. http://dx.doi.org/10.1175/jamc-d-15-0200.1.

Abstract:
Drought is a slowly varying natural phenomenon and may have wide impacts on a range of sectors. Tremendous efforts have therefore been devoted to drought monitoring and prediction to reduce potential impacts of drought. Reliable drought prediction is critically important to provide information ahead of time for early warning to facilitate drought-preparedness plans. The U.S. Drought Monitor (USDM) is a composite drought product that depicts drought conditions in categorical forms, and it has been widely used to track drought and its impacts for operational and research purposes. The USDM is an assessment of drought condition but does not provide drought prediction information. Given the wide application of USDM, drought prediction in a categorical form similar to that of USDM would be of considerable importance, but it has not been explored thus far. This study proposes a statistical method for categorical drought prediction by integrating the USDM drought category as an initial condition with drought information from other sources such as drought indices from land surface simulation or statistical prediction. Incorporating USDM drought categories and drought indices from phase 2 of the North American Land Data Assimilation System (NLDAS-2), the proposed method is tested in Texas for 2001–14. Results show satisfactory performance of the proposed method for categorical drought prediction, which provides useful information to aid early warning for drought-preparedness plans.
43

Nguyen, Huu Hiep. "Clustering Categorical Data Using Community Detection Techniques." Computational Intelligence and Neuroscience 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/8986360.

Abstract:
With the advent of the k-modes algorithm, the toolbox for clustering categorical data has an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably. A variety of initialization methods differ in how the heuristics chooses the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The top-k detected communities by size will define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most of the cases.
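Illustrative note: the baseline that this abstract refers to, initializing k modes and then iterating, can be sketched as a plain k-modes loop. This sketch is not the CD-Clustering scheme of the paper; it only shows the standard assign-and-update iteration with Hamming distance, and the initial modes and toy data are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, initial_modes, max_iter=20):
    """Alternate between assigning rows to the nearest mode and recomputing the modes."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest mode by Hamming distance (ties go to the lower index).
        labels = [min(range(len(modes)), key=lambda c: hamming(row, modes[c]))
                  for row in rows]
        # Update step: per-attribute most frequent value among each cluster's members.
        new_modes = []
        for c, old in enumerate(modes):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if not members:
                new_modes.append(old)  # keep the old mode if a cluster becomes empty
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break  # converged: the modes no longer change
        modes = new_modes
    return labels, modes

# Hypothetical toy data and hand-picked initial modes.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("green", "large", "square"),
]
labels, modes = k_modes(rows, initial_modes=[rows[2], rows[3]])
```

The result depends on the initial modes, which is the sensitivity that the initialization methods discussed in the abstract aim to reduce.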
44

Ahmad, Amir, and Lipika Dey. "A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set." Pattern Recognition Letters 28, no. 1 (January 2007): 110–18. http://dx.doi.org/10.1016/j.patrec.2006.06.006.

45

Kim, Jihyeok, Reinald Kim Amplayo, Kyungjae Lee, Sua Sung, Minji Seo, and Seung-won Hwang. "Categorical Metadata Representation for Customized Text Classification." Transactions of the Association for Computational Linguistics 7 (November 2019): 201–15. http://dx.doi.org/10.1162/tacl_a_00263.

Abstract:
The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. This information has been used to modify parts of the model (e.g., word embeddings, attention mechanisms) such that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category, and even such context is often scarce (for tail category). To this end, we propose using basis vectors to effectively incorporate categorical metadata on various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various data sets with different properties are performed and show that through our method, we can represent categorical metadata more effectively to customize parts of the model, including unexplored ones, and increase the performance of the model greatly.
46

Chrisinta, Debora, I. Made Sumertajaya, and Indahwati Indahwati. "EVALUASI KINERJA METODE CLUSTER ENSEMBLE DAN LATENT CLASS CLUSTERING PADA PEUBAH CAMPURAN." Indonesian Journal of Statistics and Its Applications 4, no. 3 (November 30, 2020): 448–61. http://dx.doi.org/10.29244/ijsa.v4i3.630.

Abstract:
Most traditional clustering algorithms are designed for either numeric data or categorical data. Real-world data often contain both numeric and categorical attributes, and it is difficult to apply traditional clustering algorithms directly to such data. This paper therefore aims to identify the better method for mixed data between the cluster ensemble and latent class clustering approaches. Cluster ensemble is a method that combines the clustering results from two sub-datasets: the categorical variables and the numerical variables. Clustering algorithms designed for numerical and for categorical data are applied to the respective sub-datasets to produce the corresponding clusters. Latent class clustering (LCC), on the other hand, is a model-based clustering method that can be used for any type of data. The number of clusters is based on the estimated probability model. The results recommend LCC as the best clustering method, since it provides higher accuracy and the smallest standard deviation ratio. However, when applied to village potential data from Bengkulu Province, LCC and the cluster ensemble method produce evaluation values that are not much different.
47

Chen, Han-Ching, and Nae-Sheng Wang. "The Assignment of Scores Procedure for Ordinal Categorical Data." Scientific World Journal 2014 (2014): 1–7. http://dx.doi.org/10.1155/2014/304213.

Abstract:
Ordinal data are the most frequently encountered type of data in the social sciences. Many statistical methods can be used to process such data. One common method is to assign scores to the data, convert them into interval data, and then perform further statistical analysis. Several authors have recently developed methods for assigning scores to ordered categorical data. This paper proposes an approach that defines a score assignment system for an ordinal categorical variable based on an underlying continuous latent distribution, and interprets it using three case studies. The results show that the proposed score system works well for skewed ordinal categorical data.
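To make the idea of scoring from an underlying continuous latent distribution concrete, here is one common textbook variant, sketched under stated assumptions: the latent variable is standard normal, and each category receives the conditional mean of the latent variable over the interval implied by the cumulative proportions. This is not necessarily the score system the paper proposes, and the function name is hypothetical.

```python
from statistics import NormalDist


def latent_normal_scores(counts):
    """Score ordered categories by the mean of a standard normal latent
    variable within each category's interval (assumed model, see lead-in).

    For cumulative proportions p_lo < p_hi with thresholds a and b, the
    conditional mean is (pdf(a) - pdf(b)) / (p_hi - p_lo).
    """
    if any(c <= 0 for c in counts):
        raise ValueError("every category needs a positive count")
    std = NormalDist()
    total = sum(counts)
    scores = []
    cumulative = 0
    for count in counts:
        p_lo = cumulative / total
        cumulative += count
        p_hi = cumulative / total
        # The density is zero at the infinite thresholds (p = 0 or p = 1).
        d_lo = std.pdf(std.inv_cdf(p_lo)) if 0 < p_lo < 1 else 0.0
        d_hi = std.pdf(std.inv_cdf(p_hi)) if 0 < p_hi < 1 else 0.0
        scores.append((d_lo - d_hi) / (p_hi - p_lo))
    return scores


if __name__ == "__main__":
    # A skewed five-point item: most responses fall in the lowest categories.
    print(latent_normal_scores([50, 25, 15, 7, 3]))
```

Unlike equally spaced scores (1, 2, 3, ...), scores of this kind stretch the distance between sparsely populated categories, which is why such approaches are often considered for skewed ordinal data.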
48

Lee, Changki, and Uk Jung. "Context-Based Geodesic Dissimilarity Measure for Clustering Categorical Data." Applied Sciences 11, no. 18 (September 10, 2021): 8416. http://dx.doi.org/10.3390/app11188416.

Abstract:
Measuring the dissimilarity between two observations is the basis of many data mining and machine learning algorithms, and its effectiveness has a significant impact on learning outcomes. The dissimilarity or distance computation has been a manageable problem for continuous data because many numerical operations can be successfully applied. However, unlike continuous data, defining a dissimilarity between pairs of observations with categorical variables is not straightforward. This study proposes a new method to measure the dissimilarity between two categorical observations, called a context-based geodesic dissimilarity measure, for the categorical data clustering problem. The proposed method considers the relationships between categorical variables and discovers the implicit topological structures in categorical data. In other words, it can effectively reflect the nonlinear patterns of arbitrarily shaped categorical data clusters. Our experimental results confirm that the proposed measure that considers both nonlinear data patterns and relationships among the categorical variables yields better clustering performance than other distance measures.
49

Dong, Bin, Songlei Jian, and Ke Zuo. "CDE++: Learning Categorical Data Embedding by Enhancing Heterogeneous Feature Value Coupling Relationships." Entropy 22, no. 4 (March 29, 2020): 391. http://dx.doi.org/10.3390/e22040391.

Abstract:
Categorical data are ubiquitous in machine learning tasks, and the representation of categorical data plays an important role in the learning performance. The heterogeneous coupling relationships between features and feature values reflect the characteristics of real-world categorical data and need to be captured in the representations. The paper proposes an enhanced categorical data embedding method, CDE++, which captures the heterogeneous feature value coupling relationships in the representations. Based on information theory and the hierarchical couplings defined in our previous work CDE (Categorical Data Embedding by learning hierarchical value coupling), CDE++ adopts mutual information and margin entropy to capture feature couplings and designs a hybrid clustering strategy to capture multiple types of feature value clusters. Moreover, an autoencoder is used to learn non-linear couplings between features and value clusters. The categorical data embeddings generated by CDE++ are low-dimensional numerical vectors that are directly applied to clustering and classification and achieve the best performance compared with other categorical representation learning methods. Parameter sensitivity and scalability tests are also conducted to demonstrate the superiority of CDE++.
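The information-theoretic quantities this abstract mentions can be illustrated with a short sketch. It computes the empirical entropy of one categorical feature and the empirical mutual information between two, using the standard definitions with natural logarithms. How CDE++ actually combines these quantities is not described in the abstract, so this is only the generic calculation, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(xs):
    """Empirical entropy H(X) = -sum p(x) log p(x), in nats."""
    n = len(xs)
    return -sum((c / n) * log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) between two categorical features.

    I(X;Y) = sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) p(y))).
    """
    if len(xs) != len(ys):
        raise ValueError("features must have the same number of observations")
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


if __name__ == "__main__":
    colour = ["red", "red", "blue", "blue"]
    shape = ["disc", "cube", "disc", "cube"]
    print(mutual_information(colour, colour))  # a feature fully determines itself
    print(mutual_information(colour, shape))   # no dependence in this toy sample
```

A feature paired with itself gives I(X;X) = H(X), while two features that are independent in the sample give zero, so the value can be read as a strength of coupling between features.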
50

Arengas, Gustavo. "Categorical definitions and properties via generators." Revista Colombiana de Matemáticas 53, no. 2 (July 1, 2019): 165–84. http://dx.doi.org/10.15446/recolma.v53n2.85525.

Abstract:
In the present work, we show that the study of categorical constructions need not involve all the objects of the category; instead, we can restrict ourselves to families of generators. Thus, universal properties can be characterized through iterated families of generators, which leads us in particular to an alternative version of the adjoint functor theorem. Similarly, the properties of relations or subobjects algebra can be investigated by this method. We end with a result that relates various forms of compactness through representable functors of generators.