Journal articles: "Variables clustering"

1

Perricone, Chiara. "Clustering macroeconomic variables." Structural Change and Economic Dynamics 44 (March 2018): 23–33. http://dx.doi.org/10.1016/j.strueco.2018.02.001.

2

Hathaway, Richard J. "Clustering Random Variables." IETE Journal of Research 44, no. 4-5 (July 1998): 199–205. http://dx.doi.org/10.1080/03772063.1998.11416046.

3

Chen, Mingkun, and Evelyne Vigneau. "Supervised clustering of variables." Advances in Data Analysis and Classification 10, no. 1 (November 15, 2014): 85–101. http://dx.doi.org/10.1007/s11634-014-0191-5.

4

Zhang, Hongmei, Yubo Zou, Will Terry, Wilfried Karmaus, and Hasan Arshad. "Joint Clustering With Correlated Variables." American Statistician 73, no. 3 (July 9, 2018): 296–306. http://dx.doi.org/10.1080/00031305.2018.1424033.

5

Rubiano Moreno, Jesica, Carlos Alonso Malaver, Samuel Nucamendi Guillén, and Carlos López Hernández. "A clustering algorithm for ipsative variables." DYNA 86, no. 211 (October 1, 2019): 94–101. http://dx.doi.org/10.15446/dyna.v86n211.77835.

Annotation:

The aim of this study is to introduce a new clustering method for ipsative variables. The method can be used for nominal or ordinal variables whose responses must be mutually exclusive, and it is independent of the data distribution. The proposed method is applied to outline motivational profiles for individuals based on a set of declared preferences. A case study is used to analyze the performance of the proposed algorithm by comparing its results with those of the PAM method. The results show that the proposed method generates a better segmentation and more differentiated groups. An extensive study was conducted to validate the performance of the clustering method against a set of random groups using clustering measures.

6

Forina, M., C. Armanino, and V. Raggio. "Clustering with dendrograms on interpretation variables." Analytica Chimica Acta 454, no. 1 (March 2002): 13–19. http://dx.doi.org/10.1016/s0003-2670(01)01517-3.

7

Saracco, J., and M. Chavent. "Clustering of Variables for Mixed Data." EAS Publications Series 77 (2016): 121–69. http://dx.doi.org/10.1051/eas/1677007.

8

Huh, Myung-Hoe, and Yong B. Lim. "Weighting variables in K-means clustering." Journal of Applied Statistics 36, no. 1 (October 31, 2008): 67–78. http://dx.doi.org/10.1080/02664760802382533.

9

Vigneau, E., and E. M. Qannari. "Clustering of Variables Around Latent Components." Communications in Statistics - Simulation and Computation 32, no. 4 (January 11, 2003): 1131–50. http://dx.doi.org/10.1081/sac-120023882.

10

Ghizlane, Ez-Zarrad, Sabbar Wafae, and Bekkhoucha Abdelkrim. "Features Clustering Around Latent Variables for High Dimensional Data." E3S Web of Conferences 297 (2021): 01070. http://dx.doi.org/10.1051/e3sconf/202129701070.

Annotation:

Clustering of variables is the task of grouping similar variables into different groups. It may be useful in several situations, such as dimensionality reduction, feature selection, and redundancy detection. In the present study, we combine two methods of feature clustering: the clustering of variables around latent variables (CLV) algorithm and the k-means based co-clustering algorithm (kCC). Indeed, classical CLV cannot be applied to high-dimensional data because this approach becomes tedious when the number of features increases.
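
To make the idea of grouping similar variables concrete, here is a minimal sketch in Python. It is not the CLV or kCC algorithm from the paper: it assumes a simpler stand-in, average-linkage agglomeration of variables with the dissimilarity 1 - r², where r is the Pearson correlation. The function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cluster_variables(columns, k):
    """Group variables into k clusters by average linkage on 1 - r^2.

    columns: dict mapping a variable name to its list of observations.
    Returns a list of k sets of variable names.
    """
    names = list(columns)
    # Squared correlation ignores the sign, so x and -x count as similar.
    dist = {(a, b): 1.0 - pearson(columns[a], columns[b]) ** 2
            for a in names for b in names if a != b}
    clusters = [{name} for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]   # merge the two closest groups
        del clusters[j]
    return clusters

# Hypothetical toy data: x2 tracks x1, y2 tracks -y1.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "y1": [5.0, 1.0, 4.0, 2.0, 3.0],
    "y2": [-5.1, -0.9, -4.2, -2.1, -2.9],
}
groups = cluster_variables(data, 2)
```

The pairwise search above costs a full pass over all cluster pairs at every merge, which hints at why correlation-based procedures of this kind become slow as the number of features grows.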

11

Keller, Annette, and Frank Klawonn. "Fuzzy Clustering with Weighting of Data Variables." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 08, no. 06 (December 2000): 735–46. http://dx.doi.org/10.1142/s0218488500000538.

Annotation:

We introduce an objective function-based fuzzy clustering technique that assigns one influence parameter to each data variable for each cluster. Our method is not only suited to detecting structures or groups of data that are not uniformly distributed over the structure's single domains, but also gives information about the influence of individual variables on the detected groups. In addition, our approach can be seen as a generalization of the well-known fuzzy c-means clustering algorithm.
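
For orientation, the baseline that this abstract generalizes can be sketched as follows. This is plain fuzzy c-means on one-dimensional data, without the per-variable influence parameters the paper introduces. The function name, the fuzzifier default m = 2, the fixed iteration count, and the toy data are assumptions made for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Plain fuzzy c-means on 1-D data (no per-variable weights).

    points: list of floats; centers: initial cluster centers.
    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i; each row sums to 1.
    """
    centers = list(centers)
    eps = 1e-12  # avoids division by zero when a point sits on a center
    c = len(centers)
    for _ in range(iterations):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1))
        memberships = []
        for x in points:
            d = [abs(x - v) + eps for v in centers]
            row = [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                   for i in range(c)]
            memberships.append(row)
        # Center update: mean of the points weighted by u_ki^m
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, points))
            / sum(u[i] ** m for u in memberships)
            for i in range(c)
        ]
    return centers, memberships

# Hypothetical toy data with two well-separated groups.
pts = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
final_centers, u = fuzzy_c_means(pts, centers=[0.0, 6.0])
```

In this baseline every variable contributes equally to the distance. The approach described in the abstract instead attaches an influence parameter to each variable in each cluster.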

12

Yang, Miin-Shen, Pei-Yuan Hwang, and De-Hua Chen. "Fuzzy clustering algorithms for mixed feature variables." Fuzzy Sets and Systems 141, no. 2 (January 2004): 301–17. http://dx.doi.org/10.1016/s0165-0114(03)00072-1.

13

Vichi, Maurizio, Donatella Vicari, and Henk A. L. Kiers. "Clustering and dimension reduction for mixed variables." Behaviormetrika 46, no. 2 (March 11, 2019): 243–69. http://dx.doi.org/10.1007/s41237-018-0068-6.

14

Vigneau, E., K. Sahmer, E. M. Qannari, and D. Bertrand. "Clustering of variables to analyze spectral data." Journal of Chemometrics 19, no. 3 (2005): 122–28. http://dx.doi.org/10.1002/cem.909.

15

Raghuveer, Boddupally. "Clustering Methods of Acculturation." Journal of Clinical and Medical Case Reports and Reviews 2, no. 4 (October 24, 2022): 1–2. http://dx.doi.org/10.59468/2837-469x/022.

Annotation:

The purpose of our study was to determine whether acculturation variables from different acculturation domains form empirically extracted acculturation clusters. The findings of the present study lend additional support to the use of clustering methods as a way of including multiple domains of acculturation, thereby gaining a more comprehensive understanding of acculturation and its connection with psychosocial adjustment. The results also reinforce prior research findings that integration, or biculturalism, is an adaptive acculturation strategy.

16

Vigneau, Evelyne, Mingkun Chen, and El Mostafa Qannari. "ClustVarLV: An R Package for the Clustering of Variables Around Latent Variables." R Journal 7, no. 2 (2015): 134. http://dx.doi.org/10.32614/rj-2015-026.

17

Karimi, Sadegh, and Bahram Hemmateenejad. "Identification of discriminatory variables in proteomics data analysis by clustering of variables." Analytica Chimica Acta 767 (March 2013): 35–43. http://dx.doi.org/10.1016/j.aca.2012.12.050.

18

Singh, Nripendra. "Clustering Consumers on the Basis of Attitudinal Variables." Asia Pacific Business Review 5, no. 4 (October 2009): 146–55. http://dx.doi.org/10.1177/097324700900500413.

19

Bühlmann, Peter, Philipp Rütimann, Sara van de Geer, and Cun-Hui Zhang. "Correlated variables in regression: Clustering and sparse estimation." Journal of Statistical Planning and Inference 143, no. 11 (November 2013): 1835–58. http://dx.doi.org/10.1016/j.jspi.2013.05.019.

20

Pulido-Valdeolivas, I., D. Gómez-Andrés, J. A. Martin, J. López, E. Gómez-Barrena, and E. Rausell. "P6.14 Hierarchical clustering of Gillette Gait Index variables." Clinical Neurophysiology 122 (June 2011): S87. http://dx.doi.org/10.1016/s1388-2457(11)60303-9.

21

Csörgő, Sándor, and Wei Biao Wu. "On the clustering of independent uniform random variables." Random Structures & Algorithms 25, no. 4 (June 28, 2004): 396–420. http://dx.doi.org/10.1002/rsa.20030.

22

Šulc, Zdeněk, and Hana Řezanková. "Evaluation of selected approaches to clustering categorical variables." Statistics in Transition new series 15, no. 4 (December 1, 2014): 591–610. http://dx.doi.org/10.59170/stattrans-2014-039.

Annotation:

This paper focuses on recently proposed similarity measures and their performance in categorical variable clustering. It compares clustering results using three recently developed similarity measures (IOF, OF and Lin measures) with results obtained using two association measures for nominal variables (Cramér’s V and the uncertainty coefficient) and with the simple matching coefficient (the overlap measure). To eliminate the influence of a particular linkage method on the structure of final clusters, three linkage methods are examined (complete, single, average). The created groups (clusters) of variables can be considered as the basis for dimensionality reduction, e.g. by choosing one of the variables from a given group as a representative for the whole group. The quality of resulting clusters is evaluated by the within-cluster variability, expressed by the WCM coefficient, and by dendrogram analysis. The examined similarity measures are compared and evaluated using two real data sets from a social survey.
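
One of the association measures named above, Cramér's V, is easy to compute directly from two categorical variables. The sketch below uses the standard definition V = sqrt(χ² / (n · min(r − 1, c − 1))) without any bias correction. The function name and the toy data are hypothetical, and whether the paper applies a correction is not stated in the abstract.

```python
from collections import Counter
from math import sqrt

def cramers_v(x, y):
    """Cramér's V between two categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # marginal counts of x
    col = Counter(y)             # marginal counts of y
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # A variable with a single category carries no association information.
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical toy variables.
q1 = ["a", "a", "b", "b"]
q2 = ["u", "u", "v", "v"]   # determined by q1
q3 = ["u", "v", "u", "v"]   # unrelated to q1
```

A dissimilarity such as 1 − V between every pair of variables can then be passed to a hierarchical procedure with complete, single, or average linkage, which is the setting the abstract describes.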

23

Yan, Jingdong, and Wuwei Liu. "An Ensemble Clustering Approach (Consensus Clustering) for High-Dimensional Data." Security and Communication Networks 2022 (May 16, 2022): 1–9. http://dx.doi.org/10.1155/2022/5629710.

Annotation:

Due to the plurality of irrelevant attributes, sparse distribution, and complicated calculations in high-dimensional data, traditional clustering algorithms, such as K-means, do not perform well on high-dimensional data. To address the clustering problem of high-dimensional data, this paper studies an integrated clustering method for high-dimensional data. A method of subspace division based on minimum redundancy is proposed to solve the problem of subspace division of high-dimensional data; subspace division is improved by using the K-means algorithm. Additionally, this method uses mutual information between the characteristic variables of the data to replace the calculation in the K-means algorithm. The distance between the characteristic variables of the data is used to divide the data into subspaces according to the mutual information values between the characteristic variables of the data. To achieve high clustering accuracy and diversity based on clustering requirements, this paper uses a genetic algorithm as the consistency integration function. The fitness function is designed according to the clustering fusion target, and the selection operator is designed according to the maximum number of overlapping elements in the base clustering. The experimental results show that the clustering algorithm proposed in this paper outperforms other methods on most datasets and is an effective clustering integration algorithm. The proposed clustering algorithm is compared with other commonly used clustering fusion algorithms on datasets to prove the advantages of the proposed algorithm.
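
The mutual information between two feature variables, which the abstract uses in place of the usual K-means distance, can be estimated from counts when the variables are discrete. The following is a minimal sketch of that plug-in estimate in nats. The function name and the toy data are hypothetical, and how the paper discretizes continuous features is not stated in the abstract.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    # Sum over observed cells only; empty cells contribute zero.
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )

# Hypothetical toy features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]   # independent of f1 in this sample
```

Features with high mutual information are redundant with each other, so a minimum-redundancy subspace division would tend to place them in different subspaces.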

24

Vandewalle, Vincent. "Multi-Partitions Subspace Clustering." Mathematics 8, no. 4 (April 15, 2020): 597. http://dx.doi.org/10.3390/math8040597.

Annotation:

In model-based clustering, it is often supposed that only one clustering latent variable explains the heterogeneity of the whole dataset. However, in many cases several latent variables could explain the heterogeneity of the data at hand. Finding such class variables could result in a richer interpretation of the data. In the continuous data setting, a multi-partition model-based clustering is proposed. It assumes the existence of several latent clustering variables, each one explaining the heterogeneity of the data with respect to some clustering subspace. It allows the multi-partitions and the related subspaces to be found simultaneously. Parameters of the model are estimated through an EM algorithm relying on a probabilistic reinterpretation of the factorial discriminant analysis. A model choice strategy relying on the BIC criterion is proposed to select the number of subspaces and the number of clusters by subspace. The obtained results are thus several projections of the data, each one conveying its own clustering of the data. The model's behavior is illustrated on simulated and real data.

25

Legland, D., and J. Beaugrand. "Automated clustering of lignocellulosic fibres based on morphometric features and using clustering of variables." Industrial Crops and Products 45 (February 2013): 253–61. http://dx.doi.org/10.1016/j.indcrop.2012.12.021.

26

Chang, Xiaopeng, Minghua Zhang, Xiang Zhang, and Sheng Zhang. "Two-Step Clustering for Mineral Prospectivity Mapping: A Case Study from the Northeastern Edge of the Jiaolai Basin, China." Minerals 14, no. 11 (October 28, 2024): 1089. http://dx.doi.org/10.3390/min14111089.

Annotation:

The advancement of geological big data has rendered data-driven methodologies increasingly vital in Mineral Prospectivity Mapping. The effective integration of quantitative and qualitative data, including experiential and knowledge-based insights, is crucial in geological data fusion. Specifically, the conversion of raw data into samples and the selection of predictive methods are two core issues that constitute the focus of this study. Traditional clustering methods require the user to specify the number of clusters in advance. The two-step clustering can automatically determine the clustering result ‘k’ while analyzing both continuous and categorical variables, by building a Cluster Feature (CF) and using information criteria to merge nodes. In this study, we conducted an analysis utilizing stream sediment element data, residual gravity anomalies, and fault distribution through the two-step clustering method. Factor analysis (FA) was employed to reduce 16 elemental variables from stream sediments into five uncorrelated continuous variables; additionally, residual gravity anomalies were transformed from continuous to categorical variables via an interval-based method before being combined with fault distribution, resulting in seven variables for clustering. The research findings indicate that categorical variables significantly influence clustering results; concurrently, as the importance of continuous variables within the cluster increases, so does k. When only one categorical variable is present, residual gravity anomalies show significantly better clustering than fault distribution; however, when two categorical variables are involved, it is essential to consider the quantity of categories: more categories lead to poorer quality. The results from the Jiaolai Basin’s northeastern margin indicate a significant correlation with known gold deposits; two-step clustering is a promising and effective method for improving mineral prospecting efforts.

27

Raymaekers, Jakob, and Ruben H. Zamar. "Pooled variable scaling for cluster analysis." Bioinformatics 36, no. 12 (April 13, 2020): 3849–55. http://dx.doi.org/10.1093/bioinformatics/btaa243.

Annotation:

Motivation: Many popular clustering methods are not scale-invariant because they are based on Euclidean distances. Even methods using scale-invariant distances, such as the Mahalanobis distance, lose their scale invariance when combined with regularization and/or variable selection. Therefore, the results from these methods are very sensitive to the measurement units of the clustering variables. A simple way to achieve scale invariance is to scale the variables before clustering. However, scaling variables is a very delicate issue in cluster analysis: A bad choice of scaling can adversely affect the clustering results. On the other hand, reporting clustering results that depend on measurement units is not satisfactory. Hence, a safe and efficient scaling procedure is needed for applications in bioinformatics and medical sciences research. Results: We propose a new approach for scaling prior to cluster analysis based on the concept of pooled variance. Unlike available scaling procedures, such as the SD and the range, our proposed scale avoids dampening the beneficial effect of informative clustering variables. We confirm through an extensive simulation study and applications to well-known real-data examples that the proposed scaling method is safe and generally useful. Finally, we use our approach to cluster a high-dimensional genomic dataset consisting of gene expression data for several specimens of breast cancer cells tissue obtained from human patients. Availability and implementation: An R-implementation of the algorithms presented is available at https://wis.kuleuven.be/statdatascience/robust/software. Supplementary information: Supplementary data are available at Bioinformatics online.

28

Başaran, Bülent. "Examining Preservice Teachers’ TPACK-21 Efficacies with Clustering Analysis in Terms of Certain Variables." Malaysian Online Journal of Educational Technology 8, no. 3 (July 1, 2020): 84–99. http://dx.doi.org/10.17220/mojet.2020.03.005.

29

Hummel, Manuela, Dominic Edelmann, and Annette Kopp-Schneider. "Clustering of samples and variables with mixed-type data." PLOS ONE 12, no. 11 (November 28, 2017): e0188274. http://dx.doi.org/10.1371/journal.pone.0188274.

30

Hajnal, Istvan, and Geert Loosveldt. "The Sensitivity of Hierarchical Clustering Solutions to Irrelevant Variables." Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 50, no. 1 (March 1996): 56–70. http://dx.doi.org/10.1177/075910639605000105.

31

Brusco, Michael J. "Clustering Binary Data in the Presence of Masking Variables." Psychological Methods 9, no. 4 (2004): 510–23. http://dx.doi.org/10.1037/1082-989x.9.4.510.

32

Lim, Yaeji, Hee‐Seok Oh, and Ying Kuen Cheung. "Functional clustering of accelerometer data via transformed input variables." Journal of the Royal Statistical Society: Series C (Applied Statistics) 68, no. 3 (September 16, 2018): 495–520. http://dx.doi.org/10.1111/rssc.12310.

33

Qannari, E. M., E. Vigneau, P. Luscan, A. C. Lefebvre, and F. Vey. "Clustering of variables, application in consumer and sensory studies." Food Quality and Preference 8, no. 5-6 (September 1997): 423–28. http://dx.doi.org/10.1016/s0950-3293(97)00008-6.

34

Yu, Chang, and Daniel Zelterman. "Sums of dependent Bernoulli random variables and disease clustering." Statistics & Probability Letters 57, no. 4 (May 2002): 363–73. http://dx.doi.org/10.1016/s0167-7152(02)00091-3.

35

Li, Ye, Yiyan Chen, and Qun Li. "A Clustering Algorithm for Triangular Fuzzy Normal Random Variables." International Journal of Fuzzy Systems 22, no. 7 (September 15, 2020): 2083–100. http://dx.doi.org/10.1007/s40815-020-00933-7.

36

González-Rodríguez, Gil, Ana Colubi, Pierpaolo D’Urso, and Manuel Montenegro. "Multi-sample test-based clustering for fuzzy random variables." International Journal of Approximate Reasoning 50, no. 5 (May 2009): 721–31. http://dx.doi.org/10.1016/j.ijar.2009.01.003.

37

Fernández, Antonio, José A. Gámez, Rafael Rumí, and Antonio Salmerón. "Data clustering using hidden variables in hybrid Bayesian networks." Progress in Artificial Intelligence 2, no. 2-3 (April 9, 2014): 141–52. http://dx.doi.org/10.1007/s13748-014-0048-3.

38

Lee, Yunjung, and Seyoung Park. "Spectral clustering of weighted variables on multi-omics data." Korean Journal of Applied Statistics 36, no. 3 (June 30, 2023): 175–96. http://dx.doi.org/10.5351/kjas.2023.36.3.175.

39

Hagoel, Lea, Liora Ore, Efrat Neter, Zmira Silman, and Gad Rennert. "Clustering Women’s Health Behaviors." Health Education & Behavior 29, no. 2 (April 2002): 170–82. http://dx.doi.org/10.1177/109019810202900203.

Annotation:

This study attempts to characterize health lifestyles by subgrouping women with similar behavior patterns. Data on background, health behaviors, and perceptions were collected via phone interview from 1,075 Israeli women aged 50 to 74. From a cluster analysis conducted on health behaviors, three clusters emerged: a “health promoting” cluster (44.1%), women adhering to recommended behaviors; an “inactive” cluster (40.3%), women engaging in neither health-promoting nor compromising behaviors; and an “ambivalent” cluster (15.4%), women engaging somewhat in both health-promoting and compromising behaviors. Clustering was cross-tabulated by demographic and perceptual variables, further validating the subgrouping. The cluster solution was also validated by predicting another health behavior (mammography screening) for which there was an external validating source. Findings are discussed in comparison to published cluster solutions, culminating in suggestions for intervention alternatives. The concept of lifestyle was deemed appropriate to summarize the clustering of these behavioral, perceptual, and structural variables.

40

Rosing, K. E., and C. S. ReVelle. "Optimal Clustering." Environment and Planning A: Economy and Space 18, no. 11 (November 1986): 1463–76. http://dx.doi.org/10.1068/a181463.

Annotation:

Cluster analysis can be performed with several models. One method is to seek those clusters for which the total flow between all within-cluster members is a maximum. This model has, until now, been viewed as mathematically difficult because of the presence of products of integer variables in the objective function. In another optimization model of cluster analysis, the p-median, a central member is found for each cluster, so that relationships of cluster members with the various central members are maximized (or minimized). This problem, although mathematically tractable, is a less realistic formulation of the general clustering problem. The formulation of the maximum interflow problem is here transformed in stages into a linear analogue which is economically solvable. Computation experience with the several transformed stages is reported and a practical example of the analysis demonstrated.

41

Ma, Xinwei, Ruiming Cao, and Yuchuan Jin. "Spatiotemporal Clustering Analysis of Bicycle Sharing System with Data Mining Approach." Information 10, no. 5 (May 2, 2019): 163. http://dx.doi.org/10.3390/info10050163.

Annotation:

The main objective of this study is to explore the spatiotemporal activity patterns of a bicycle sharing system by combining temporal and spatial attribute variables through cluster analysis. Specifically, three clustering algorithms (hierarchical clustering, K-means clustering, and expectation-maximization clustering) are used to group the bicycle sharing stations. The temporal attribute variables are obtained through statistical analysis of bicycle sharing smart card data, and the spatial attribute variables are quantified by point of interest (POI) data around bicycle sharing docking stations, which reflect the influence of land use on the bicycle sharing system. Judged by six cluster validation measures, K-means clustering performed best of the three algorithms for the case of Ningbo, China. The 477 bicycle sharing docking stations were then grouped into seven clusters. The results show that the stations of each cluster have their own spatiotemporal activity pattern, influenced by people’s travel habits and land use characteristics around the stations. This analysis will help bicycle sharing operators better understand the system usage and learn how to improve the service quality of the existing system.

42

Jamotton, Charlotte, Donatien Hainaut, and Thomas Hames. "Insurance Analytics with Clustering Techniques." Risks 12, no. 9 (September 5, 2024): 141. http://dx.doi.org/10.3390/risks12090141.

Annotation:

The K-means algorithm and its variants are well-known clustering techniques. In actuarial applications, these partitioning methods can identify clusters of policies with similar attributes. The resulting partitions provide an actuarial framework for creating maps of dominant risks and unsupervised pricing grids. This research article aims to adapt well-established clustering methods to complex insurance datasets containing both categorical and numerical variables. To achieve this, we propose a novel approach based on Burt distance. We begin by reviewing the K-means algorithm to establish the foundation for our Burt distance-based framework. Next, we extend the scope of application of the mini-batch and fuzzy K-means variants to heterogeneous insurance data. Additionally, we adapt spectral clustering, a technique based on graph theory that accommodates non-convex cluster shapes. To mitigate the computational complexity associated with spectral clustering’s O(n³) runtime, we introduce a data reduction method for large-scale datasets using our Burt distance-based approach.

43

Bello, Thaísa B., Anderson G. Costa, Thainara R. da Silva, Juliana L. Paes, and Marcus V. M. de Oliveira. "Tomato quality based on colorimetric characteristics of digital images." Revista Brasileira de Engenharia Agrícola e Ambiental 24, no. 8 (August 2020): 567–72. http://dx.doi.org/10.1590/1807-1929/agriambi.v24n8p567-572.

Annotation:

Results obtained with optical evaluation methods may be correlated with tomato quality and maturation. In this context, the objective of this study was to evaluate the correlation between tomato colorimetric and physico-chemical variables, clustering them as a function of maturation stages, using multivariate analysis. The experiment was conducted using 150 fruits and three maturation stages (immature, light red and mature). The physico-chemical variables were evaluated through traditional methods. The colorimetric variables were assessed on images in the RGB color model taken with a digital camera. The correlation between colorimetric and physico-chemical variables was analyzed using Pearson’s coefficient. Principal components analysis and the k-means clustering method were applied to three data sets: isolated RGB variables; colorimetric variables calculated from relations between the RGB bands (colorimetric indexes); and physico-chemical variables. The colorimetric variables explained more of the variation in maturation than the physico-chemical variables. The colorimetric indexes performed best in clustering tomatoes as a function of maturation (accuracy of 0.98).

44

Rosyada, Istina Alya, and Dina Tri Utari. "Penerapan Principal Component Analysis untuk Reduksi Variabel pada Algoritma K-Means Clustering." Jambura Journal of Probability and Statistics 5, no. 1 (June 4, 2024): 6–13. http://dx.doi.org/10.37905/jjps.v5i1.18733.

Annotation:

K-Means clustering is a widely used clustering algorithm. However, it has the disadvantage that clustering performance decreases when the data contain a very large number of variables. This problem can be overcome by combining K-Means with the Principal Component Analysis (PCA) variable reduction method. This study uses seven indicator variables for the welfare of the people of West Java Province in 2021 to measure the welfare level of districts/cities. The analysis yielded two principal components based on eigenvalues. K-Means clustering on the PCA-reduced variables formed three best clusters, with 12, 8, and 7 districts/cities respectively.

45

Zhu, Chuanze, Xu Zhong, Zhenjie Lin, Liming Wang, Wenzhong Li, and Sanglu Lu. "Multivariate Time Series Clustering based on Graph Convolutional Network." Journal of Physics: Conference Series 2522, no. 1 (June 1, 2023): 012021. http://dx.doi.org/10.1088/1742-6596/2522/1/012021.

Annotation:

Multivariable time series (MTS) clustering is an important topic in time series data mining. The major challenge of MTS clustering is to capture the temporal correlations and the dependencies between multiple variables. In this paper, we propose a novel MTS clustering approach based on the graph convolutional network (GCN), which is a powerful feature extractor for graph-structured data. We regard each variable in an MTS as a node in the graph and construct edges through the correlation between variables. Furthermore, GCN and deep learning backpropagation are used to continuously learn the relationship between multiple variables. By combining the learned variables with the characteristics of the time dimension, the comprehensive features can be fused to form an effective representation for the MTS clustering task. We carry out extensive experimental analysis on four open time series data sets and six benchmark algorithms, which shows the superiority of the proposed method.

46

Hendricks, Renee, and Mohammad Khasawneh. "Cluster Analysis of Categorical Variables of Parkinson’s Disease Patients." Brain Sciences 11, no. 10 (September 29, 2021): 1290. http://dx.doi.org/10.3390/brainsci11101290.

Annotation:

Parkinson’s disease (PD) is a chronic disease. No treatment stops its progression, and it presents symptoms in multiple areas. One way to understand the PD population is to investigate the clustering of patients by demographic and clinical similarities. Previous PD cluster studies included scores from clinical surveys, which provide a numerical but ordinal, non-linear value. In addition, these studies did not include categorical variables, as the clustering method utilized was not applicable to categorical variables. It was discovered that the numerical values of patient age and disease duration were similar among past cluster results, pointing to the need to exclude these values. This paper proposes a novel and automatic discovery method to cluster PD patients by incorporating categorical variables. No estimate of the number of clusters is required as input, whereas the previous cluster methods require a guess from the end user in order for the method to be initiated. Using a patient dataset from the Parkinson’s Progression Markers Initiative (PPMI) website to demonstrate the new clustering technique, our results showed that this method provided an accurate separation of the patients. In addition, this method provides an explainable process and an easy way to interpret clusters and describe patient subtypes.

47

Normoyle, Aline, and Shane Jensen. "Bayesian Clustering of Player Styles for Multiplayer Games". Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 11, no. 1 (24 June 2021): 163–69. http://dx.doi.org/10.1609/aiide.v11i1.12805.

Annotation:

With game play data, empirical approaches to clustering are typically based solely on game outcomes, e.g., kills, deaths, and score for each player. In this paper, we investigate a method for clustering players based on how a player’s choices relate to outcomes, or equivalently the latent player styles exhibited by players. Our approach is based on a Bayesian semi-parametric clustering method which has several advantages: the number of clusters does not need to be specified a priori; the technique can work with a very compact representation of each match (e.g., consisting primarily of indicator variables for player choices); a player can belong to multiple clusters and hence can have a hybrid style; and the resulting clusterings often have a straightforward interpretation. To demonstrate the approach, we apply our method to multiplayer match logs from Battlefield 3 consisting of over 1200 players and 500,000 matches.

48

Mogaraju, Jagadish Kumar. "Agglomerative and Divisive hierarchical cluster analysis of groundwater quality variables using opensource tools over YSR district, AP, India". Journal of Scientific Research 66, no. 04 (2022): 15–20. http://dx.doi.org/10.37398/jsr.2022.660403.

Annotation:

Groundwater quality variables such as F, Total Hardness (TH), Total Alkalinity (TA), Total Dissolved Solids (TDS), SO4, SAR, Na, EC, Cl, Ca, Mg, and pH were analysed with hierarchical clustering analysis (HCA) to identify the groupings or clusters that exist in the dataset. The dataset was subjected to agglomerative and divisive hierarchical clustering. The observations were scaled so that the variables could be compared systematically. The clustering structure was assessed using the agglomerative coefficient, and the complete, average, single, and Ward linkage approaches were compared on that basis. Ward's approach suits the dataset best and indicates a strong clustering structure. The agglomerative coefficient obtained is 0.8666752, and the divisive coefficient is 0.8371531. The entanglement score attained was 0.26, demonstrating a good alignment with nominal entanglement. The principal component analysis resulted in two main components explaining 54.8% and 18.2% of the variance. The variables that are prominent in each PC are investigated and reported. The gap statistic and the average silhouette method are used to determine the optimal number of clusters. Open-source software (R and RStudio) is used for this analysis. This work concludes that clustering analysis is essential to understand the groundwater quality variables better.
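As a rough illustration of the agglomerative coefficient mentioned in this abstract, the sketch below follows the usual definition: for each observation, take the height of its first merge divided by the height of the final merge, and average one minus that ratio. It is a naive pure-Python version under stated assumptions, not the R workflow used in the paper: the function names and toy points are hypothetical, only single, complete, and average linkage are covered (Ward's method is omitted), and distances are Euclidean.

```python
import math
from itertools import combinations

def zscore_columns(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
            for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# How the pairwise distances between two clusters are combined per linkage.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}

def agglomerative_coefficient(points, linkage="average"):
    """Average of 1 - (first-merge height / final-merge height) over all points."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    first_merge = {}  # observation index -> height at which it is first merged
    height = 0.0
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ds = [euclidean(points[i], points[j])
                  for i in clusters[a] for j in clusters[b]]
            d = combine(ds)
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        for members in (clusters[a], clusters[b]):
            if len(members) == 1:  # a singleton is being merged for the first time
                first_merge[members[0]] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    if height == 0.0:
        raise ValueError("all points are identical; coefficient is undefined")
    return sum(1.0 - first_merge[i] / height for i in range(len(points))) / len(points)

# Toy one-dimensional data with two obvious groups (made-up values).
points = [[0.0], [1.0], [10.0], [11.0]]
ac_single = agglomerative_coefficient(points, "single")
ac_complete = agglomerative_coefficient(points, "complete")
ac_average = agglomerative_coefficient(points, "average")
```

Values close to 1 indicate that observations merge early relative to the final merge, which is how a coefficient such as the 0.87 reported above is usually read. Calling `zscore_columns` first mirrors the scaling step the abstract describes.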

49

Li, Wei Ping, De Qing Quan and Jun Cai. "Application of Data Mining in Sports in the Consumer Market Segmentation". Applied Mechanics and Materials 631-632 (September 2014): 280–83. http://dx.doi.org/10.4028/www.scientific.net/amm.631-632.280.

Annotation:

This paper combines data mining technology with the rich sports consumption data of a city household survey. Using the K-Means fast cluster method, sports consumer market models were constructed based on different sets of variables. The research shows that choosing sports consumption content as the clustering variables gives a better model than choosing demographics, sports consumption content, consumer psychology, and way of life. According to the clustering results, the city residents are divided into four consumer groups that differ in their sports consumption features.

50

Mukherjee, Sudipto, Himanshu Asnani, Eugene Lin and Sreeram Kannan. "ClusterGAN: Latent Space Clustering in Generative Adversarial Networks". Proceedings of the AAAI Conference on Artificial Intelligence 33 (17 July 2019): 4610–17. http://dx.doi.org/10.1609/aaai.v33i01.33014610.

Annotation:

Generative Adversarial Networks (GANs) have obtained remarkable success in many unsupervised learning tasks, and clustering is unarguably an important unsupervised learning problem. While one can potentially exploit the latent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the GAN latent space. In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an inverse network (which projects the data to the latent space) trained jointly with a clustering-specific loss, we are able to achieve clustering in the latent space. Our results show a remarkable phenomenon: GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors. We compare our results with various clustering baselines and demonstrate superior performance on both synthetic and real datasets.
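The latent sampling scheme described in this abstract, a continuous part concatenated with a one-hot part, can be sketched as follows. This is only an illustration under assumptions: the abstract does not state the distributions, so a zero-mean Gaussian with a small standard deviation (`sigma=0.1`) and a uniform choice of cluster are assumed here, and `sample_latent` is a hypothetical helper name. The generator, inverse network, and loss are not shown.

```python
import random

def sample_latent(batch_size, n_continuous, n_clusters, sigma=0.1, rng=None):
    """Draw latent vectors z = (z_n, z_c) plus the cluster index behind each z_c.

    z_n: n_continuous values from N(0, sigma^2)  (assumed distribution)
    z_c: one-hot vector of length n_clusters, cluster chosen uniformly (assumed)
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        z_n = [rng.gauss(0.0, sigma) for _ in range(n_continuous)]
        k = rng.randrange(n_clusters)
        z_c = [1.0 if j == k else 0.0 for j in range(n_clusters)]
        batch.append((z_n + z_c, k))
    return batch

# Example: 8 latent vectors with 5 continuous dimensions and 3 clusters.
latent_batch = sample_latent(8, n_continuous=5, n_clusters=3, rng=random.Random(0))
```

Keeping the continuous part small relative to the one-hot part is one way to keep the clusters separated in the latent space, which is why a small `sigma` is assumed here.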
