
Journal articles on the topic 'Cluster analysis – Data processing'

Author: Grafiati

Published: 4 June 2021

Last updated: 30 January 2023


Consult the top 50 journal articles for your research on the topic 'Cluster analysis – Data processing.'


1

Zanev, Vladimir, Stanislav Topalov, and Veselin Christov. "Analysis and Data Mining of Lead-Zinc Ore Data." Serdica Journal of Computing 7, no. 3 (2014): 271–80. http://dx.doi.org/10.55630/sjc.2013.7.271-280.


Abstract:

This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of the DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. A testing and validation of the Pb-Zn cluster data mining model was developed in order to show its reasonable accuracy before being used in a production environment. The Pb-Zn cluster data mining model can be used for changes of the mine grinding and flotation processing parameters in almost real-time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.


2

Karlashevych, Ivan, and Volodymyr Pravda. "Use of Cluster Analysis Method to Increase the Efficiency and Accuracy of Radar Data Processing." Computational Problems of Electrical Engineering 7, no. 1 (2017): 33–36. http://dx.doi.org/10.23939/jcpee2017.01.033.


3

Ocampo, Daniel Morin, and Luiz Caldeira Brant de Tolentino-Neto. "Cluster Analysis for Data Processing in Educational Research." Acta Scientiae 21, no. 4 (2019): 34–48. http://dx.doi.org/10.17648/acta.scientiae.v21iss4id5119.


Abstract:

Quantitative approaches to educational research have been undervalued and consequently less widely used. In this sense, this paper aims to present and analyze the techniques of Cluster Analysis as a possibility for research in the sciences. Therefore, the main hierarchical and non-hierarchical techniques of Cluster Analysis are presented, as well as some of their applications in educational research found in the literature. Cluster Analysis is adequate to simplify or elaborate hypotheses on massive data, such as large-scale educational research. The studies in the area of education that used Cluster Analysis methods proved to be fruitful in eliciting results that contribute to the area.


4

Tkachev, Ivan, Roman Vasilyev, and Elena Belousova. "Cluster analysis of lightning discharges: based on Vereya-MR network data." Solar-Terrestrial Physics 7, no. 4 (2021): 85–92. http://dx.doi.org/10.12737/stp-74202109.


Abstract:

Monitoring thunderstorm activity can help solve many problems such as infrastructure facility protection, warning of hazardous phenomena associated with intense precipitation, study of conditions for the occurrence of thunderstorms and the degree of their influence on human activity, as well as the influence of thunderstorm activity on the formation of near-Earth space. We investigate the characteristics of thunderstorm cells by the method of cluster analysis. We take the Vereya-MR network data accumulated over a period from 2012 to 2018 as a basis. The Vereya-MR network considered in this paper is included in networks operating in the VLF-LF range (long and super-long radio waves). Reception points equipped with recording equipment, primary information processing systems, communication systems, precision time and positioning devices based on global satellite navigation systems are located throughout Russia. In the longitudinal-latitudinal thunderstorm distributions of interest, the dependence on the location of recording devices might be manifested. We compare the behavior of thunderstorms on the entire territory of the Russian Federation with those in the Baikal natural territory. We have established that the power of thunderstorms over the Baikal region is lower. The daily variation in thunderstorm cells we obtained is consistent with the data from other works. There are no differences in other thunderstorm characteristics between the regions under study. This might be due to peculiarities of the analysis method. On the basis of the work performed, we propose sites for new points of our own lightning location network, as well as additional methods of cluster analysis.


5

Melnikov, B. F., P. I. Averin, and E. A. Melnikova. "Intelligent processing of acoustic emission data based on cluster analysis." Journal of Physics: Conference Series 1236 (June 2019): 012044. http://dx.doi.org/10.1088/1742-6596/1236/1/012044.


6

Rose, Rodrigo L., Tejas G. Puranik, and Dimitri N. Mavris. "Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives." Aerospace 7, no. 10 (2020): 143. http://dx.doi.org/10.3390/aerospace7100143.


Abstract:

The complexity of commercial aviation operations has grown substantially in recent years, together with a diversification of techniques for collecting and analyzing flight data. As a result, data-driven frameworks for enhancing flight safety have grown in popularity. Data-driven techniques offer efficient and repeatable exploration of patterns and anomalies in large datasets. Text-based flight safety data presents a unique challenge in its subjectivity, and relies on natural language processing tools to extract underlying trends from narratives. In this paper, a methodology is presented for the analysis of aviation safety narratives based on text-based accounts of in-flight events and categorical metadata parameters which accompany them. An extensive pre-processing routine is presented, including a comparison between numeric models of textual representation for the purposes of document classification. A framework for categorizing and visualizing narratives is presented through a combination of k-means clustering and 2-D mapping with t-Distributed Stochastic Neighbor Embedding (t-SNE). A cluster post-processing routine is developed for identifying driving factors in each cluster and building a hierarchical structure of cluster and sub-cluster labels. The Aviation Safety Reporting System (ASRS), which includes over a million de-identified voluntarily submitted reports describing aviation safety incidents for commercial flights, is analyzed as a case study for the methodology. The method results in the identification of 10 major clusters and a total of 31 sub-clusters. The identified groupings are post-processed through metadata-based statistical analysis of the learned clusters. The developed method shows promise in uncovering trends from clusters that are not evident in existing anomaly labels in the data and offers a new tool for obtaining insights from text-based safety data that complement existing approaches.


7

Jung, Se-Hoon, Jong-Chan Kim, and Chun-Bo Sim. "Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 1 (2016): 330. http://dx.doi.org/10.11591/ijece.v6i1.9334.


Abstract:

Various types of derivative information have been increasing exponentially, based on mobile devices and social networking sites (SNSs), and the information technologies utilizing them have also been developing rapidly. Technologies to classify and analyze such information are as important as data generation. This study concentrates on data clustering through principal component analysis and K-means algorithms to analyze and classify user data efficiently. We propose a technique of changing the cluster choice before cluster processing in the existing K-means practice into a variable cluster choice through principal component analysis, and expanding the scope of data clustering. The technique also applies an artificial neural network learning model for user recommendation and prediction from the clustered data. The proposed processing model for predicted data generated results that improved the existing artificial neural network–based data clustering and learning model by approximately 9.25%.


8

Jung, Se-Hoon, Jong-Chan Kim, and Chun-Bo Sim. "Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 1 (2016): 330. http://dx.doi.org/10.11591/ijece.v6i1.pp330-336.


9

Susanty, Aries, Bambang Purwanggono, Nia Budi Puspitasari, and Chellsy Allison. "Conjoint Analysis for Evaluation of Customer’s Preference of Analgesic Generic Medicines under Non-proprietary Names." E3S Web of Conferences 202 (2020): 12022. http://dx.doi.org/10.1051/e3sconf/202020212022.


Abstract:

The main objective of this research is to gain greater insight into customer preferences in purchasing analgesic generic medicines under the non-proprietary name and to identify clusters with different preference structures. This research uses conjoint analysis (CA) and cluster analysis for data processing. The data were collected through a questionnaire from 200 respondents, chosen by convenience sampling from sixteen districts in Semarang. The result of data processing with conjoint analysis indicated that customers prefer the analgesic generic medicine under the non-proprietary name with the following conditions: a price of 20% of analgesic generic-branded medicines, a 15-minute onset time of effect, availability at a minimarket, the form of syrup, and family and friends as the source of information. Moreover, the result of data processing also indicated that the most important attribute is the place of purchase, followed by price, onset time of drugs, the form of drugs, and the source of information. Then, the result of data processing with clustering analysis indicated that the respondents can be grouped into four clusters. The attributes with the highest importance level in cluster 1 through cluster 4 are ‘form of drugs’, ‘the place of purchase’, ‘source of information’, and ‘price’, respectively.


10

Haryono Setiadi, Safira Nuri Safitri, and Esti Suryani. "Educational Data Mining Menggunakan Metode Analysis Cluster dan Decision Tree berdasarkan Log Mining." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 3 (2022): 448–56. http://dx.doi.org/10.29207/resti.v6i3.3935.


Abstract:

Educational Data Mining (EDM) is often applied in big data processing in the education sector. One kind of educational data that can be further processed with EDM is activity log data from an e-learning system used in teaching and learning activities. The log activity can be processed more specifically by using log mining. The purpose of this study was to process log data from the Sebelas Maret University Online Learning System (SPADA UNS) to determine student learning behavior patterns and their relationship to the final results obtained. The data mining method applied in this research is cluster analysis with the K-means Clustering and Decision Tree algorithms. The clustering process is used to find groups of students who have similar learning patterns, while the decision tree is used to model the results of the clustering in order to enable the analysis and decision-making processes. Processing of 11,139 SPADA UNS log data resulted in 3 clusters with a Davies Bouldin Index (DBI) value of 0.229. The results of these three clusters are modeled by using a Decision Tree. The decision tree model for cluster 0, a group of students with a low tendency of learning behavior patterns whose most frequent activity is course viewing, obtained an accuracy of 74.42%. The model for cluster 1, which contains students with high learning behavior patterns and a high frequency of access to viewing discussion activities, obtained an accuracy of 76.47%. The model for cluster 2, a group of students whose learning behavior pattern is a high frequency of access to the activity of sending assignments, obtained an accuracy of 90.00%.
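The Davies Bouldin Index quoted in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (mean distance of points to their own centroid as the scatter, Euclidean distance between centroids as the separation); the function names and the toy points are hypothetical and are not taken from the study.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (lower means better separated)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity ratio of cluster i against every other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical toy data: two tight groups placed far apart.
toy_clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
toy_dbi = davies_bouldin(toy_clusters)
```

For the toy data each scatter is 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2; a value such as the reported 0.229 would likewise indicate compact, well-separated clusters.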


11

Caspart, René, Max Fischer, Manuel Giffels, et al. "Setup and commissioning of a high-throughput analysis cluster." EPJ Web of Conferences 245 (2020): 07007. http://dx.doi.org/10.1051/epjconf/202024507007.


Abstract:

Current and future end-user analyses and workflows in High Energy Physics demand the processing of growing amounts of data. This plays a major role when looking at the demands in the context of the High-Luminosity-LHC. In order to keep the processing time and turn-around cycles as low as possible analysis clusters optimized with respect to these demands can be used. Since hyper converged servers offer a good combination of compute power and local storage, they form the ideal basis for these clusters. In this contribution we report on the setup and commissioning of a dedicated analysis cluster setup at Karlsruhe Institute of Technology. This cluster was designed for use cases demanding high data-throughput. Based on hyper converged servers this cluster offers 500 job slots and 1 PB of local storage. Combined with the 100 Gb network connection between the servers and a 200 Gb uplink to the Tier-1 storage, the cluster can sustain a data-throughput of 1 PB per day. In addition, the local storage provided by the hyper converged worker nodes can be used as cache space. This allows employing of caching approaches on the cluster, thereby enabling a more efficient usage of the disk space. In previous contributions this concept has been shown to lead to an expected speedup of 2 to 4 compared to conventional setups.
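As a rough plausibility check on the figures in the abstract above, the sustained rate implied by 1 PB per day can be worked out directly. The sketch assumes decimal units (1 PB = 10^15 bytes) and that "Gb" in the abstract means gigabits per second; neither is stated in the source.

```python
# Assumption: decimal petabyte, 8 bits per byte.
BITS_PER_PETABYTE = 8 * 10**15
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_gbit_per_s(petabytes_per_day):
    """Average rate in Gbit/s needed to move the given volume in one day."""
    return petabytes_per_day * BITS_PER_PETABYTE / SECONDS_PER_DAY / 1e9

rate = sustained_gbit_per_s(1)  # about 92.6 Gbit/s
```

Under these assumptions, 1 PB per day corresponds to roughly 92.6 Gbit/s on average, which sits just below a 100 Gbit/s link.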


12

Getmanets, O., A. Nekos, and M. Pelikhatyi. "CLUSTER ANALYSIS AND RADIATION MONITORING OF ENVIRONMENT." Visnyk of Taras Shevchenko National University of Kyiv. Geology, no. 3 (86) (2019): 75–79. http://dx.doi.org/10.17721/1728-2713.86.11.


Abstract:

Building a background radiation field on the ground on the basis of measurement data taken at a finite number of points is one of the most important tasks of radiation monitoring. The aim of the work: to study the possibility of applying cluster analysis for the tasks of radiation monitoring of the environment. Cluster analysis is a multidimensional statistical analysis. Its main purpose is to split the set of objects under study (observation points) into homogeneous groups or clusters, that is, the task of classifying data and identifying the corresponding structure in them is solved. Methods of research: the measurements of the power of the ambient dose of continuous X-ray and gamma radiation on the terrain by using the MKS-05 dosimeter "TERRA-0"; processing of the obtained data by cluster analysis methods using the computer program "Statistics-10", wherein each cluster point is characterized by three coordinates: two coordinates on the ground and the power of the ambient dose of radiation at a given point; Euclidean distance was chosen as the distance between two points. Results: after processing data using various clustering methods (the method of Complete Linkage, the method of Weighted pair-group average and Ward's method), it was found that the results of the analysis practically coincide with each other, which proves the reliability of the application of cluster analysis for the tasks of radiation monitoring of the environment and mapping of radiation pollution. Conclusions: the concept of a "radiation cluster" was first formulated in this work, combining coordinates on a plane with an ambient dose rate; the possibility of using cluster analysis to construct a map of radiation pollution of the environment has been proved by sequential projection from more connected to less connected radiation clusters onto the plane of the controlled zone. In this sense, cluster analysis is similar to the operator approach to the construction of the radiation field. For further research, it is of some interest to study the issues of integration of cluster analysis with geographic information systems.
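The abstract above clusters points described by two ground coordinates plus a dose rate, using Euclidean distance and, among others, complete linkage. A minimal sketch of complete-linkage agglomeration follows. It is a naive illustration rather than the statistics package the authors used; the sample points are hypothetical, and the coordinates are left unscaled, which is an assumption.

```python
import math

def complete_linkage(points, n_clusters):
    """Merge clusters until n_clusters remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the farthest pair of members.
                d = max(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical observation points: (x, y, ambient dose rate).
samples = [(0, 0, 0.10), (1, 0, 0.11), (0, 1, 0.12), (10, 10, 0.30), (11, 10, 0.31)]
groups = complete_linkage(samples, 2)
```

With these toy values the three points near the origin end up in one cluster and the two distant, higher-dose points in the other. In practice the dose axis would probably need rescaling so that it is not swamped by the spatial coordinates.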


13

Arifiyanti, Amalia Anjani, Farhan Setiyo Darusman, and Brahmantio Widyo Trenggono. "Population Density Cluster Analysis in DKI Jakarta Province Using K-Means Algorithm." Journal of Information Systems and Informatics 4, no. 3 (2022): 772–83. http://dx.doi.org/10.51519/journalisi.v4i3.315.


Abstract:

This study aims to analyze clusters based on the area and population density of areas in DKI Jakarta Province in 2015, using clustering as a data mining method and as the first step in planning for population equality. The subject of analysis in this study is the villages located in the province of DKI Jakarta, recorded by area and population density in each sub-district until 2015. The analysis proceeds in several stages, namely data understanding, data processing or cleansing, cluster tendency assessment, clustering, and cluster review. The results showed that the data tended to be clustered because the Hopkins statistic was close to 0 and the VAT showed a vague picture of clusters that might be formed. Based on this, cluster creation is carried out using the K-Means Algorithm. Based on the results, there are 3 clusters formed, namely cluster 0 (not densely populated), cluster 1 (medium population density), and cluster 2 (densely populated). These results can be used as a basis for policy making in population management.


14

Nikitina, M. A., I. M. Chernukha, Ya M. Uzakov, and D. E. Nurmukhanbetova. "CLUSTER ANALYSIS FOR DATABASES TYPOLOGIZATION CHARACTERISTICS." Series of Geology and Technical Sciences 2, no. 446 (2021): 114–21. http://dx.doi.org/10.32014/2021.2518-170x.42.


Abstract:

The article deals with basic concepts of cluster analysis and data clustering. The authors give brief information on the history of cluster analysis and its first applications. The article gives the classification of methods by the way of data processing and analysis in cluster analysis. A detailed description of the popular, non-hierarchical K-means algorithm is given. When developing databases, their structure should provide for the division of products into clusters based on various characteristics. It is necessary to consider the division into clusters based on other characteristics, such as allergenicity (whether the product contains an allergic component or not) or carbohydrate content (important for diabetics). The content of protein, potassium and phosphates should be taken into account when developing diets for those suffering from kidney diseases, and the presence of specific amino acids for metabolic diseases, etc. In this way, food composition data and product clustering across different categories allow nutritionists to create interchangeable lists of meals with portion sizes, or lists of permitted and prohibited food products in terms of various diseases. The authors give the clustering of the database fragment of chemical composition of food products on the example of cottage cheese products and confectionery by one of the signs, the content of carbohydrates, in the R software environment by k-means. Food clusters based on carbohydrate content are very important in shaping the diet for diabetics. A visual gradation of products into clusters is demonstrated in the form of a dendrogram showing the degree of proximity of individual clusters. The resulting dendrogram contains 5 clusters. Cluster 4 includes the largest number of products (170 items) with an average carbohydrate content of 1.8 g with a variation range from 0 to 7.1 g. Food products and dishes that fall into this cluster are the least dangerous for people with diabetes. Cluster 5 includes only 8 products with a distribution of carbohydrates within the cluster from 62.60 to 80.40 g. This category of food should be excluded when preparing a diet for people with diabetes.
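Clustering products on a single attribute such as carbohydrate content, as described in the abstract above, amounts to one-dimensional k-means. The following is a minimal sketch of Lloyd's iteration in Python rather than R. The carbohydrate values and the starting centers are hypothetical and do not come from the article's database.

```python
def assign(values, centers):
    """Place each value in the group of its nearest center."""
    groups = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        groups[nearest].append(v)
    return groups

def kmeans_1d(values, initial_centers, max_iter=100):
    """Alternate assignment and mean update until the centers stop moving."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        groups = assign(values, centers)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)

# Hypothetical carbohydrate contents in grams per 100 g of product.
carbs = [0.5, 2.0, 3.5, 30.0, 32.0, 70.0, 74.0]
centers, groups = kmeans_1d(carbs, [0.0, 30.0, 70.0])
```

The result depends on the starting centers, so a real analysis would typically try several initialisations or a k-means++ style seeding.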


15

Chen, Dan Dan, and Zhi Gang Yao. "Analysis on Ship Equipment Consumption Data Based on Data Mining." Advanced Materials Research 846-847 (November 2013): 1141–44. http://dx.doi.org/10.4028/www.scientific.net/amr.846-847.1141.


Abstract:

A comprehensive analysis on a large amount of ship equipment consumption data accumulated over the years is achieved through the establishment of data warehouse, online analytical processing, regression analysis, cluster analysis, etc. by means of data mining. The analysis results present important references for equipment guarantee department in terms of equipment preparation and carrying, etc. and provide the comprehensive analysis and utilization on massive ship maintenance support data with technical means.


16

Zakharov, V. I., and P. A. Budnikov. "The application of cluster analysis to the processing of GPS-interferometry data." Moscow University Physics Bulletin 67, no. 1 (2012): 25–32. http://dx.doi.org/10.3103/s0027134912010262.


17

Wismüller, Axel, Oliver Lange, Johannes Behrends, et al. "Visualization of supervised functional MRI data processing methods by unsupervised cluster analysis." NeuroImage 13, no. 6 (2001): 285. http://dx.doi.org/10.1016/s1053-8119(01)91628-3.


18

Meng, Hai-Dong, Yu-Chen Song, Fei-Yan Song, and Hai-Tao Shen. "Research and application of cluster and association analysis in geochemical data processing." Computational Geosciences 15, no. 1 (2010): 87–98. http://dx.doi.org/10.1007/s10596-010-9199-x.


19

Bondarev, A. E. "VISUAL ANALYSIS AND PROCESSING OF CLUSTERS STRUCTURES IN MULTIDIMENSIONAL DATASETS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W4 (May 10, 2017): 151–54. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w4-151-2017.


Abstract:

The article is devoted to problems of visual analysis of cluster structures for multidimensional datasets. For visual analysis, an approach of elastic maps design [1,2] is applied. This approach is quite suitable for processing and visualizing multidimensional datasets. To analyze clusters in the original data volume, the elastic maps are used as methods of mapping the original data points to enclosed manifolds of lower dimensionality. By diminishing the elasticity parameters, one can design a map surface that approximates the multidimensional dataset in question much better. Then the points of the dataset in question are projected onto the map. The extension of the designed map to a flat plane allows one to gain insight into the cluster structure of the multidimensional dataset. The approach of elastic maps does not require any a priori information about the data in question and does not depend on data nature, data origin, etc. Elastic maps are usually combined with the PCA approach. When presented in the space based on the first three principal components, the elastic maps provide quite good results. The article describes the results of applying the elastic maps approach to visual analysis of clusters for different multidimensional datasets including medical data.


20

Vijay Bhaskhar Reddy, Y., L. S. S. Reddy, and S. S. N. Reddy. "An Efficient Density Based Clustering approach for High Dimensional Data." International Journal of Engineering & Technology 7, no. 2.32 (2018): 111. http://dx.doi.org/10.14419/ijet.v7i2.32.15381.


Abstract:

Data extraction, data processing, pattern mining and clustering are important features of data mining. The extraction of data and formation of interesting patterns from huge datasets can be used in prediction and decision making for further analysis. This increases the need for efficient and effective analysis methods to make use of this data. Clustering is one important technique in data mining. In clustering, a set of items is divided into several clusters where inter-cluster similarity is minimized and intra-cluster similarity is maximized. Clustering techniques make it easy to identify classes in large databases. However, the application to large databases raises the following requirements for clustering techniques: minimal requirements of domain knowledge to determine the input specifications, and invention of clusters with absolute shape and certainty of large databases. The existing clustering techniques offer no solution to the combination of requirements. The proposed clustering technique, DBSCAN using KNN, relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape.
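The density-based notion of clusters mentioned in the abstract above can be sketched with plain DBSCAN. This is the textbook algorithm with a brute-force neighbour search, not the KNN-based variant the article proposes. The parameter names eps and min_pts and the sample points are illustrative assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it is in no dense region."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force eps-neighbourhood; includes the point itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
        cluster += 1
    return labels

# Hypothetical 2-D points: two dense groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Because clusters grow by chaining dense neighbourhoods, they can take arbitrary shapes, and the isolated point is reported as noise instead of being forced into a cluster.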


21

Huang, Weihua. "Research on the Revolution of Multidimensional Learning Space in the Big Data Environment." Complexity 2021 (May 18, 2021): 1–12. http://dx.doi.org/10.1155/2021/6583491.


Abstract:

Multiuser fair sharing of clusters is a classic problem in cluster construction. However, the cluster computing system for hybrid big data applications has the characteristics of heterogeneous requirements, which makes more and more cluster resource managers support fine-grained multidimensional learning resource management. In this context, it is oriented to multiusers of multidimensional learning resources. Shared clusters have become a new topic. A single consideration of a fair-shared cluster will result in a huge waste of resources in the context of discrete and dynamic resource allocation. Fairness and efficiency of cluster resource sharing for multidimensional learning resources are equally important. This paper studies big data processing technology and representative systems and analyzes multidimensional analysis and performance optimization technology. This article discusses the importance of discrete multidimensional learning resource allocation optimization in dynamic scenarios. At the same time, in view of the fact that most of the resources of the big data application cluster system are supplied to large jobs that account for a small proportion of job submissions, while the small jobs that account for a large proportion only use the characteristics of a small part of the system’s resources, the expected residual multidimensionality of large-scale work is proposed. The server with the least learning resources is allocated first, and only fair strategies are considered for small assignments. The topic index is distributed and stored on the system to realize the parallel processing of search to improve the efficiency of search processing. The effectiveness of RDIBT is verified through experimental simulation. The results show that RDIBT has higher performance than LSII index technology in index creation speed and search response speed. In addition, RDIBT can also ensure the scalability of the index system.


22

Ma, Xiaoya, Zhaoqian Gong, Feng Zhang, Shun Wang, Xiaojun Liu, and Guangyou Fang. "An Automatic Drift-Measurement-Data-Processing Method with Digital Ionosondes." Remote Sensing 14, no. 19 (2022): 4710. http://dx.doi.org/10.3390/rs14194710.


Abstract:

Drift detection is one of the important detection modes in a digital ionosonde system. In this paper, a new data processing method is presented for automatic, high-quality drift measurement, which is helpful for long-term ionospheric observation and has been successfully applied to the Chinese Academy of Sciences Digital Ionosonde (CAS-DIS). Based on the Doppler interferometry principle, this method is divided into four successive constraint steps: extracting the stable echo data; restricting the ionospheric detection region; extracting the reliable reflection cluster, including Doppler filtering and coarse clustering analysis; and calculating the drift velocity. Ordinary wave (O-wave) data extraction, complementary code pulse compression and other data preprocessing techniques are used to improve the signal-to-noise ratio (SNR) of echo data. To eliminate multiple echoes, the ionospheric region is determined by combining the optimal height range and detection frequencies obtained from the ionogram. Next, Doppler filtering and coarse clustering analysis extract reliable reflection clusters. Finally, a weighting factor is introduced, and weighted least squares (WLS) is used to fit the drift velocity. The entire procedure runs automatically, without parameter settings having to be changed constantly as external conditions change. This is the first time coarse clustering analysis has been used to extract the paracentral reflection cluster to eliminate scattered reflection points and outer reflection clusters, which further reduces the impact of external conditions on parameter settings and improves the ability of automatic drift measurement. Compared with the previous method used by the Digisonde Portable Sounder 4D (DPS4D), the new method can achieve comparable drift detection precision and results even with fewer reflection points. In 2021–2022, several experiments on F region drift detection were carried out in Hainan, China. Results indicate that drift velocities fitted by the new method show diurnal variation and change more gently; the trends of drift velocities fitted by the new method and the previous method are similar; and the new method can be widely applied to digital ionosondes.
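
The last step of the method, fitting a drift velocity by weighted least squares, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified two-dimensional linear model in which each reflection point i has a known direction vector k_i and a measured value d_i ≈ k_i · v, and the function name `wls_velocity`, its arguments, and the weights are hypothetical.

```python
def wls_velocity(directions, dopplers, weights):
    """Weighted least-squares fit of a 2-D velocity v.

    Minimizes sum_i w_i * (d_i - k_i . v)**2 by solving the 2x2
    normal equations (K^T W K) v = K^T W d in closed form.
    The linear model d_i = k_i . v is an assumption of this sketch.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (kx, ky), d, w in zip(directions, dopplers, weights):
        a11 += w * kx * kx
        a12 += w * kx * ky
        a22 += w * ky * ky
        b1 += w * kx * d
        b2 += w * ky * d
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("direction vectors do not constrain both components")
    vx = (a22 * b1 - a12 * b2) / det
    vy = (a11 * b2 - a12 * b1) / det
    return vx, vy
```

In a sketch like this, the weights would typically be larger for reflection points considered more reliable, so that scattered points influence the fitted velocity less.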

23

Allam, Tahani M. "Estimate the Performance of Cloudera Decision Support Queries." International Journal of Online and Biomedical Engineering (iJOE) 18, no. 01 (2022): 127–38. http://dx.doi.org/10.3991/ijoe.v18i01.27877.

Abstract:

Hive and Impala queries are used to process large amounts of data. The overwhelming amount of information requires an efficient data processing system. For long-running batch queries and analysis, Hive is the more suitable choice. Impala is the most powerful system for real-time interactive Structured Query Language (SQL) queries, adding massively parallel processing to a Hadoop distributed cluster. Data growth is a problem for SQL Cluster because execution time increases. In this paper, the execution times of Hive, Impala and SQL are compared on two different data models, with different queries chosen to test performance. The results demonstrate that Impala outperforms Hive and SQL Cluster on data analysis and processing tasks. The comparison of Hive, Impala, and SQL clusters uses two benchmark datasets: TPC-H and the Statistical Computing and Statistical Graphics 2009 Data Expo.

24

Turgaeva, A. A. "Cluster analysis in the control over the activity of insurance companies." Economic Analysis: Theory and Practice 19, no. 3 (2020): 541–63. http://dx.doi.org/10.24891/ea.19.3.541.

Abstract:

Subject. The article considers clustering of insurance companies as a type of informatization of economy for practical application by the internal control system. Objectives. The purpose is to present clusters and give their interpretation for insurance companies in relation to internal control; identify the possibility of clustering, using the Deductor Studio platform developed by Base Group for internal control systems. Methods. The study employs techniques of statistical research and data processing, mathematical methods, methods of grouping, and cluster analysis. Results. Clusters are presented by several indicators of insurance companies. The study reveals heterogeneity in the results of distribution according to the rating of companies in terms of various indicators. It confirms the need to use the cluster analysis in the internal control system. Conclusions. Cluster analysis enables the internal control system to take into account all data regardless of their amount, and avoid data sampling. It reduces the level of errors in the results of analysis and control.

25

Nayak, Janmenjoy, Bighnaraj Naik, Pandit Byomakesha Dash, and Danilo Pelusi. "Optimal Fuzzy Cluster Partitioning by Crow Search Meta-Heuristic for Biomedical Data Analysis." International Journal of Applied Metaheuristic Computing 12, no. 2 (2021): 49–66. http://dx.doi.org/10.4018/ijamc.2021040104.

Abstract:

Biomedical data are often unstructured, and biomedical data processing is becoming more complex day by day. Biomedical informatics therefore requires competent data analysis and data mining techniques for designing decision support system frameworks that address clinical and healthcare-related issues. Because of increasingly large and complex data sets and the demands of biomedical informatics research, researchers are drawn to automated machine learning models. This paper proposes an efficient machine learning model based on fuzzy c-means with meta-heuristic optimization for biomedical data analysis and clustering. The main contributions of this paper are 1) proposing an efficient machine learning model based on fuzzy c-means and meta-heuristic optimization for biomedical data classification, 2) employing benchmark validation techniques and critical hypothesis testing, and 3) providing a background for biomedical data processing from the viewpoint of data processing and mining.

26

Maeda, Takahiro, and Hiroyuki Fujiwara. "Seismic Hazard Visualization from Big Simulation Data: Cluster Analysis of Long-Period Ground-Motion Simulation Data." Journal of Disaster Research 12, no. 2 (2017): 233–40. http://dx.doi.org/10.20965/jdr.2017.p0233.

Abstract:

This paper describes a method of extracting the relation between the ground-motion characteristics of each area and a seismic source model, based on ground-motion simulation data output in planar form for many earthquake scenarios, and the construction of a parallel distributed processing system where this method is implemented. The extraction is realized using two-stage clustering. In the first stage, the ground-motion indices and scenario parameters are used as input data to cluster the earthquake scenarios within each evaluation mesh. In the second stage, the meshes are clustered based on the similarity of earthquake-scenario clustering. Because the mesh clusters can be correlated to the geographical space, it is possible to extract the relation between the ground-motion characteristics of each area and the scenario parameters by examining the relation between the mesh clusters and scenario clusters obtained by the two-stage clustering. The results are displayed visually; they are saved as GeoTIFF image files. The system was applied to the long-period ground-motion simulation data for hypothetical megathrust earthquakes in the Nankai Trough. This confirmed that the relation between the extracted ground-motion characteristics of each area and scenario parameters is in agreement with the results of ground-motion simulations.

27

Lin, Qiang, and Xilin Zhang. "Key Technologies of Media Big Data in-Depth Analysis System Based on 5G Platform." Journal of Physics: Conference Series 2294, no. 1 (2022): 012007. http://dx.doi.org/10.1088/1742-6596/2294/1/012007.

Abstract:

To meet the needs of large-scale users for personalized streaming media services with high speed, low delay, and high quality in a 5G mobile network environment, this paper studies the resource allocation mechanism of streaming media based on a 5G network from the perspective of user demand prediction, which can alleviate the pressure of mobile network, improve the utilization rate of streaming media resources and the quality of user service experience. The augmented reality visualization of large-scale social media data must rely on the computing power of distributed clusters. This paper constructs a distributed parallel processing framework in a high-performance cluster environment, which adopts a loosely coupled organizational structure. Each module can be combined, called, and expanded arbitrarily under the condition of following a unified interface. In this paper, the algebraic method of parallel computing algorithm is innovatively proposed to describe parallel processing tasks and organize and call large-scale data-parallel processing operators, which effectively supports the business requirements of large-scale parallel processing of large-scale spatial social media data and solves the bottleneck of large-scale spatial social media data-parallel processing.

28

Mutasher, Watheq Ghanim, and Abbas Fadhil Aljuboori. "Real Time Big Data Sentiment Analysis and Classification of Facebook." Webology 19, no. 1 (2022): 1112–27. http://dx.doi.org/10.14704/web/v19i1/web19076.

Abstract:

Many people use Facebook to connect and share their views on various issues, with the majority of user-generated content consisting of textual information. Because so many people post messages in real time about their situation and their thoughts on a range of everyday subjects, collecting and analyzing these data, which may be helpful for political decision-making or public opinion monitoring, is a worthwhile research project. This paper therefore analyzes public text posts from the Facebook stream in real time in a Hadoop ecosystem, using Apache Spark with the NLTK Python library. Posts and feeds are gathered from the Facebook API in real time and stored in a database, and Apache Spark is used for fast query processing of the text partitions on each data node (machine). An Amazon cloud-based Hadoop cluster is also used to process the large volume of data and to eliminate on-site hardware, IT support, and other operational difficulties, including the installation and configuration of Hadoop components such as the Hadoop Distributed File System and Apache Spark. Using a decision dictionary, sentiment is classified as positive, negative, or neutral, and two machine learning algorithms (naive Bayes and support vector machine) are run to build a predictive model; the outcome demonstrates a high level of precision in sentiment analysis.

29

Lim, Jong Beom, Jong-Suk Ahn, and Kang-Woo Lee. "Performance Modeling and Analysis of a Hadoop Cluster for Efficient Big Data Processing." Advanced Science Letters 22, no. 9 (2016): 2314–19. http://dx.doi.org/10.1166/asl.2016.7813.

30

Botvin, M., and A. Gertsiy. "COMPARISON OF CLUSTER ANALYSIS ALGORITHMS IN OBJECT RECOGNITION." Collection of scientific works of the State University of Infrastructure and Technologies series "Transport Systems and Technologies", no. 36 (December 30, 2020): 112–20. http://dx.doi.org/10.32703/2617-9040-2020-36-12.

Abstract:

The article is an overview of the direction of graphic image processing based on clustering algorithms. The analysis of prospects of application of algorithms of cluster analysis in digital image processing, in particular, at segmentation and compression of graphic images, and also at recognition of images in transport sphere of activity is carried out. Comparative modeling of such algorithms of cluster analysis as K-means, Mean-Shift (clustering of average shift) and DBSCAN (based on density of spatial clustering for applications with noise) on various types of data is carried out. The simulation was performed on synthetic datasets in a Jupyter Notebook environment using the Scikit-learn library. In particular, four data sets were generated in this environment, to which these clustering algorithms were applied. The simulation results showed that the K-means algorithm can effectively describe relatively simple shapes. In contrast, the mean shift does not require assumptions about the number of clusters and the shape of the distribution, but its performance depends on the choice of scale parameters. The DBSCAN algorithm can successfully detect more complex shapes, which emphasizes one of the strengths of this algorithm - the clustering of arbitrary data. The disadvantages of the selected algorithms are also given and it is indicated on which types of images they effectively work with the estimation of computational speed.

31

Schreck, Tobias, Jürgen Bernard, Tatiana von Landesberger, and Jörn Kohlhammer. "Visual Cluster Analysis of Trajectory Data with Interactive Kohonen Maps." Information Visualization 8, no. 1 (2009): 14–29. http://dx.doi.org/10.1057/ivs.2008.29.

Abstract:

Visual-interactive cluster analysis provides valuable tools for effectively analyzing large and complex data sets. Owing to desirable properties and an inherent predisposition for visualization, the Kohonen Feature Map (or Self-Organizing Map or SOM) algorithm is among the most popular and widely used visual clustering techniques. However, the unsupervised nature of the algorithm may be disadvantageous in certain applications. Depending on initialization and data characteristics, cluster maps (cluster layouts) may emerge that do not comply with user preferences, expectations or the application context. Considering SOM-based analysis of trajectory data, we propose a comprehensive visual-interactive monitoring and control framework extending the basic SOM algorithm. The framework implements the general Visual Analytics idea to effectively combine automatic data analysis with human expert supervision. It provides simple, yet effective facilities for visually monitoring and interactively controlling the trajectory clustering process at arbitrary levels of detail. The approach allows the user to leverage existing domain knowledge and user preferences, arriving at improved cluster maps. We apply the framework on several trajectory clustering problems, demonstrating its potential in combining both unsupervised (machine) and supervised (human expert) processing, in producing appropriate cluster results.

32

Yu, Zhanqiu. "Big Data Clustering Analysis Algorithm for Internet of Things Based on K-Means." International Journal of Distributed Systems and Technologies 10, no. 1 (2019): 1–12. http://dx.doi.org/10.4018/ijdst.2019010101.

Abstract:

To explore the Internet of things logistics system application, an Internet of things big data clustering analysis algorithm based on K-means was discussed. First of all, according to the complex event relation and processing technology, the big data processing of Internet of things was transformed into the extraction and analysis of complex relational schema, so as to provide support for simplifying the processing complexity of big data in Internet of things (IoT). The traditional K-means algorithm was optimized and improved to fit the demands of big data RFID data networks. Based on a Hadoop cloud cluster platform, a K-means cluster analysis was achieved. In addition, based on the traditional clustering algorithm, a center point selection technique suitable for RFID IoT data clustering was selected. The results showed that the clustering efficiency was improved to some extent. As a result, an RFID Internet of things clustering analysis prototype system was designed and realized, which further tested the feasibility.

33

Beidler, Peter, Mark Nguyen, and John Kang. "Extracting knowledge of NCI research directions from funding data using language processing." Journal of Clinical Oncology 39, no. 15_suppl (2021): e13547-e13547. http://dx.doi.org/10.1200/jco.2021.39.15_suppl.e13547.

Abstract:

Background: In fiscal year (FY) 2019, 42% of the $6 billion NCI budget went towards nearly 5,000 research project grants, of which about 60% are R01 type. Given the enormity of allocated resources, there is a need for the scientific community to have a more rigorous understanding of the cancer research landscape. While the NCI Budget Fact Book publishes statistics based on pre-designated codings, it is unclear if this method yields the best representation of fields within oncology. Open questions include: how many distinguishable areas of cancer research are being funded? Are there differences in growth rate, publication rate and geographic distribution among those areas? Addressing these questions in a systematic manner is a well-suited problem for unsupervised machine learning. Methods: We analyzed 55,362 R-type grants from FY 2010-2021 (up to 1/31/21) from NIH ExPORTER. Preprocessing was done on ‘Project Terms’ to weight their importance using TF*IDF vectorization and principal component analysis. We used minibatch K-means clustering repeated 100 times, with the best iteration chosen by Calinski-Harabasz clustering quality score. Over 100 repetitions, the Adjusted Rand Index was 0.9907±0.0037 (mean ±standard deviation), indicating robustness to initial conditions of K-means. For publication rate analysis, FY 2020 and 2021 grants were excluded, and 2021 grants were excluded for trajectory analysis. Optimal cluster number was determined based on a combination of inertia, Calinski-Harabasz, and silhouette scores. Results: We found the optimal number of 24 clusters to best represent separation of the R-type grant research directions. These 24 clusters clearly represent topics such as immunotherapy, cohort risk-factor studies, and imaging.
Notable trends include increased funding of immunotherapy and targeted inhibitor clusters averaging +$9.9M and +$9.2M growth per year respectively over 10 years, and decreased funding of pathway regulation and genetics clusters averaging -$7.8M and -$5.7M per year. These examples suggest a broader trend that funding is shifting from basic to translational science. Further analysis shows that the targeted inhibitor cluster is most geographically skewed, with 30% of grants going to institutions in just three cities. The number of journal articles (per grant) also shows a bias, with development/training and genetics clusters having publication rates of 32.6 and 20.8 and randomized control trials having a publication rate of 6.3. The average publication rate of NCI was 14.8. Conclusions: Using a novel framework for unsupervised clustering of NCI grant key-phrases, we can organize research projects more holistically than keyword searching, and more efficiently than manual categorization. Our model identifies growing and shrinking areas of research and points out biases in location and publication rate across these areas.
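
The abstract selects the best of many K-means runs by the Calinski-Harabasz score. A minimal pure-Python sketch of that score and of the selection step is given below. It is an illustration under stated assumptions, not the authors' pipeline: the function names `calinski_harabasz` and `pick_best` are hypothetical, and candidate labelings are assumed to be supplied by some clustering routine run several times.

```python
def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion.

    CH = (B / (k - 1)) / (W / (n - k)), where B sums the squared
    distances of cluster centers to the overall mean (weighted by
    cluster size) and W sums squared distances of points to their
    own cluster center. Higher values indicate better separation.
    """
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((center[d] - overall[d]) ** 2 for d in range(dim))
        within += sum(sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def pick_best(points, candidate_labelings):
    """Return the candidate labeling with the highest score."""
    return max(candidate_labelings, key=lambda labels: calinski_harabasz(points, labels))
```

With repeated runs from different random initializations, `pick_best` would simply receive one label list per run.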

34

BELOKI, ZUHAITZ, XABIER ARTOLA, and AITOR SOROA. "A scalable architecture for data-intensive natural language processing." Natural Language Engineering 23, no. 5 (2017): 709–31. http://dx.doi.org/10.1017/s1351324917000092.

Abstract:

Computational power needs have greatly increased in recent years, and this is also the case in the Natural Language Processing (NLP) area, where thousands of documents must be processed, i.e., linguistically analyzed, in a reasonable time frame. These computing needs have implied a radical change in the computing architectures and big-scale text processing techniques used in NLP. In this paper, we present a scalable architecture for distributed language processing. The architecture uses Storm to combine diverse NLP modules into a processing chain, which carries out the linguistic analysis of documents. Scalability requires designing solutions that are able to run distributed programs in parallel and across large machine clusters. Using the architecture presented here, it is possible to integrate a set of third-party NLP modules into a single processing chain which can be deployed onto a distributed environment, i.e., a cluster of machines, allowing the language-processing modules to run in parallel. No restrictions are placed a priori on the NLP modules apart from being able to consume and produce linguistic annotations following a given format. We show the feasibility of our approach by integrating two linguistic processing chains for English and Spanish. Moreover, we provide several scripts that allow building from scratch a whole distributed architecture that can then be easily installed and deployed onto a cluster of machines. The scripts and the NLP modules used in the paper are publicly available and distributed under free licenses. In the paper, we also describe a series of experiments carried out in the context of the NewsReader project with the goal of testing how the system behaves in different scenarios.

35

Fakherldin, Mohammed, Ibrahim Aaker Targio Hashem, Abdullah Alzuabi, and Faiz Alotaibi. "Performance Evaluation of Hadoop in Cloud for Big Data." International Journal of Engineering & Technology 7, no. 4.15 (2018): 16. http://dx.doi.org/10.14419/ijet.v7i4.15.21363.

Abstract:

Recent trends in big data have shown that the amount of data continues to increase at an exponential rate. This trend has inspired many researchers over the past few years to explore new research directions in multiple areas of big data. Hadoop is one of the most popular platforms for big data; Hadoop MapReduce is used to process data stored in the Hadoop Distributed File System. Meanwhile, cloud computing is considered an excellent candidate for storing and processing big data. However, processing big data across multiple nodes is a challenging task. The problem is even more complex when virtualized clusters in a cloud computing environment execute a large number of tasks. This paper provides a review and analysis of the impact of using a physical versus a cloud cluster to process a large amount of data. The choice affects processing in terms of execution time and cost. The results indicate that the use of cloud virtual machines helped better utilize the resources of the host computer.

36

Mendizabal-Ruiz, Gerardo, Israel Román-Godínez, Sulema Torres-Ramos, Ricardo A. Salido-Ruiz, Hugo Vélez-Pérez, and J. Alejandro Morales. "Genomic signal processing for DNA sequence clustering." PeerJ 6 (January 24, 2018): e4264. http://dx.doi.org/10.7717/peerj.4264.

Abstract:

Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.

37

Hamad, Sumaya, Khattab Alheeti, Yossra Ali, and Shaimaa Shaker. "Clustering and Analysis of Dynamic Ad Hoc Network Nodes Movement Based on FCM Algorithm." International Journal of Online and Biomedical Engineering (iJOE) 16, no. 12 (2020): 47. http://dx.doi.org/10.3991/ijoe.v16i12.16067.

Abstract:

Clustering is a major exploratory data mining activity and a popular statistical data analysis technique used in many fields. Cluster analysis, generally speaking, is not an automated function but an iterative process of information exploration, or multipurpose dynamic optimisation, involving trial and error. Parameters for pre-processing and modeling data frequently need to be modified until the output has the desired properties. In fuzzy clustering, data points may belong to several clusters. Each data point is assigned membership grades, which reflect the degree to which the data point belongs to each cluster. The fuzzy C-means (FCM) clustering algorithm is among the most widely used fuzzy clustering algorithms. In this paper, we use this method to perform a typological analysis of dynamic ad hoc network node movement. We demonstrate that good fuzzy-clustering performance can be achieved on a simulated data set of dynamic ad hoc network (DANET) nodes, and show how to use this principle to formulate node clustering as a partitioning problem. Cluster analysis aims at grouping a collection of nodes into clusters in such a way that nodes within the same cluster are highly correlated, whereas nodes in different clusters are extremely dissimilar. The FCM algorithm is implemented and evaluated on the simulated data set using the NS2 simulator with an optimized AODV protocol. The results show that the technique achieved maximum stability values for cluster centers and nodes of 98.41% and 99.99%, respectively.
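
The membership grades described in this abstract can be illustrated with a small pure-Python sketch of the standard fuzzy c-means updates. It is a generic textbook version under stated assumptions, not the paper's NS2-based implementation: the function name `fcm`, the fuzzifier default `m=2.0`, the random initialization, and the fixed iteration count are all choices made for this example.

```python
import random


def fcm(points, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means on a list of equal-length tuples.

    Returns (centers, u), where u[i][k] is the membership grade of
    point i in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)**(2 / (m - 1)).
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            for k in range(c):
                u[i][k] = 1.0 / sum(
                    (dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                )
    return centers, u
```

A hard assignment, if one is needed, can be read off by taking the cluster with the largest membership grade for each point.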

38

Prasad, Jagdish, and Rahul Rajawat. "A Note on Comparison between Statistical Cluster and Neural Network Cluster." Recent Patents on Engineering 13, no. 2 (2019): 166–73. http://dx.doi.org/10.2174/1872212112666180216161153.

Abstract:

Background: Cluster analysis is a data reduction technique in rows of the data matrix. This technique is widely used in engineering, biology, society, pattern recognition, and image processing. Objective: In this paper, self organized map (SOM) using the artificial neural network and different statistical techniques of cluster analysis are used on Population data of 33 districts of Rajasthan with 9 variables for comparison purpose. Methods: The goal of this work is to identify the most suitable technique for clustering the data by using the artificial neural network and different statistical clustering techniques. We received all patents regarding artificial neural network and k-means cluster method. Conclusion: The k-means cluster analysis is found as good as Neural Network cluster analysis, whereas Hierarchical cluster analysis and two steps cluster analysis provide some variation from the neural network cluster analysis.

39

Eberle, Detlef G., and Hendrik Paasche. "Integrated data analysis for mineral exploration: A case study of clustering satellite imagery, airborne gamma-ray, and regional geochemical data suites." GEOPHYSICS 77, no. 4 (2012): B167–B176. http://dx.doi.org/10.1190/geo2011-0063.1.

Abstract:

Partitioning cluster algorithms have proven to be powerful tools for data-driven integration of large geoscientific databases. We used fuzzy Gustafson-Kessel cluster analysis to integrate Landsat imagery, airborne radiometric, and regional geochemical data to aid in the interpretation of a multimethod database. The survey area extends over [Formula: see text] and is located in the Northern Cape Province, South Africa. We carefully selected five variables for cluster analysis to avoid the clustering results being dominated by spatially high-correlated data sets that were present in our database. Unlike other, more popular cluster algorithms, such as k-means or fuzzy c-means, the Gustafson-Kessel algorithm requires no preclustering data processing, such as scaling or adjustment of histographic data distributions. The outcome of cluster analysis was a classified map that delineates prominent near-to-surface structures. To add value to the classified map, we compared the detected structures to mapped geology and additional geophysical ground-truthing data. We were able to associate the structures detected by cluster analysis to geophysical and geological information thus obtaining a pseudolithology map. The latter outlined an area with increased mineral potential where manganese mineralization, i.e., psilomelane, had been located.

40

Qiao, Mu, and Zixuan Cheng. "A Novel Long- and Short-Term Memory Network with Time Series Data Analysis Capabilities." Mathematical Problems in Engineering 2020 (October 13, 2020): 1–9. http://dx.doi.org/10.1155/2020/8885625.

Abstract:

Time series data are an extremely important type of data in the real world. Time series data gradually accumulate over time. Due to the dynamic growth in time series data, they tend to have higher dimensions and large data scales. When performing cluster analysis on this type of data, there are shortcomings in using traditional feature extraction methods for processing. To improve the clustering performance on time series data, this study uses a recurrent neural network (RNN) to train the input data. First, an RNN called the long short-term memory (LSTM) network is used to extract the features of time series data. Second, pooling technology is used to reduce the dimensionality of the output features in the last layer of the LSTM network. Due to the long time series, the hidden layer in the LSTM network cannot remember the information at all times. As a result, it is difficult to obtain a compressed representation of the global information in the last layer. Therefore, it is necessary to combine the information from the previous hidden unit to supplement all of the data. By stacking all the hidden unit information and performing a pooling operation, a dimensionality reduction effect of the hidden unit information is achieved. In this way, the memory loss caused by an excessively long sequence is compensated. Finally, considering that many time series data are unbalanced data, the unbalanced K-means (UK-means) algorithm is used to cluster the features after dimensionality reduction. The experiments were conducted on multiple publicly available time series datasets. The experimental results show that LSTM-based feature extraction combined with the dimensionality reduction processing of the pooling technology and cluster processing for imbalanced data used in this study has a good effect on the processing of time series data.

41

Yang, Yoonhee, Dongsun Yim, Wonjeong Park, Soo Jung Baek, and Min Ji Kang. "Exploring Real-Time Word Learning Skills and Its Related Factors in Preschool Children: An Eye-Tracking Study." Communication Sciences & Disorders 27, no. 3 (2022): 468–82. http://dx.doi.org/10.12963/csd.22894.

Abstract:

Objectives: This study aimed to identify the real-time word learning processing aspects of children in each group by classifying groups according to their actual vocabulary acquisition performance (offline processing data) in QUIL (Quick incidental learning). We compared whether there was a significant difference between the QUIL offline and online processing data of the two groups, and finally we attempted to explore whether QUIL offline and online processing data had a significant correlation with children’s working memory. Methods: Thirty-three children [21 with TD (Typically developing children); and 12 with SLI (Children with specific language impairment)] aged 3 to 6 years participated in this study. K-means cluster analysis was conducted to create new groups based on QUIL offline scores, and to examine children’s online word learning processing. To analyze the children’s word learning process with an eye-tracker, the animations recorded with narration were shown to the children through a computer with an eye-tracker attached. Results: There was a significant difference between the two clusters at the third exposure condition in online data (fixation duration). In addition, there was a significant correlation between QUIL online processing and linguistic WM (Working memory) in cluster 1, and between QUIL offline scores and nonlinguistic WM in cluster 2. Conclusion: When new vocabulary is exposed for cluster 2, it can be inferred that the efficiency of vocabulary acquisition will be improved if visual information is intensively combined with language information.

42

Azizah, Anestasya Nur, Tatik Widiharih, and Arief Rachman Hakim. "Kernel K-Means Clustering untuk Pengelompokan Sungai di Kota Semarang Berdasarkan Faktor Pencemaran Air." Jurnal Gaussian 11, no. 2 (2022): 228–36. http://dx.doi.org/10.14710/j.gauss.v11i2.35470.

Abstract:

K-Means Clustering is a frequently used type of non-hierarchical cluster analysis, but it has a weakness when processing data that are not linearly separable (have no clear boundaries) or that contain overlapping clusters, that is, when one cluster visually lies between other clusters. The Gaussian kernel function in Kernel K-Means Clustering can be used to handle non-linearly separable data and overlapping clusters. The difference between Kernel K-Means Clustering and K-Means lies in the input data, which must be mapped into a new dimension using a kernel function. The real data used are 47 rivers and 18 indicators of river water pollution from the Dinas Lingkungan Hidup (DLH) of Semarang City in the first semester of 2019. The clustering results are evaluated using the Calinski-Harabasz, Silhouette, and Xie-Beni indexes. The goals of this study are to describe the steps of Kernel K-Means Clustering and to report its results for grouping rivers in Semarang City based on water pollution factors. The evaluation of the clustering results shows that the best number of clusters is K=4.
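
The idea of running K-means in a kernel-induced space can be sketched in a few lines of pure Python. This is a generic illustration, not the study's code: the names `gaussian_kernel` and `kernel_kmeans`, the bandwidth `sigma`, and the requirement that the caller supplies an initial labeling are assumptions of this example. The squared distance from point i to the mean of cluster C in feature space is computed from kernel values only, as K_ii - (2/|C|) Σ_{j∈C} K_ij + (1/|C|²) Σ_{j,l∈C} K_jl.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_kmeans(points, init_labels, k, sigma=1.0, iters=50):
    """Kernel K-means starting from a given labeling (values 0..k-1)."""
    n = len(points)
    # Precompute the full kernel (Gram) matrix.
    K = [[gaussian_kernel(p, q, sigma) for q in points] for p in points]
    labels = list(init_labels)
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Third term of the distance: mean pairwise kernel within each cluster.
        self_sim = [
            sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + self_sim[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

Because cluster means are never formed explicitly, the same loop works with any positive-definite kernel; only the Gram matrix changes.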

43

Ghoneimy, Samy, and Samir Abou El-Seoud. "A MapReduce Framework for DNA Sequencing Data Processing." International Journal of Recent Contributions from Engineering, Science & IT (iJES) 4, no. 4 (2016): 11. http://dx.doi.org/10.3991/ijes.v4i4.6537.

Abstract:

Genomics and Next Generation Sequencers (NGS) such as the Illumina HiSeq produce data on the order of 200 billion base pairs in a single one-week run for 60x human genome coverage. Processing this volume requires modern high-throughput technologies and can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on implementing DNA sequencing data processing as a set of MapReduce programs that accept a DNA data set as a FASTQ file and generate a VCF (variant call format) file containing the variants for that data set. MapReduce/Hadoop, together with the Burrows-Wheeler Aligner (BWA) and Sequence Alignment/Map (SAM) tools, is used to provide utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed for a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster, so that all Hadoop features such as HDFS, program management, and fault tolerance can be used. The Map step runs multiple instances of the short read alignment algorithm (Bowtie) in parallel in Hadoop. The input tuples are the ordered list of sequence reads, and the output tuples are the alignments of the short reads. In the Reduce step, many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. The input tuples are the sorted alignments for a partition, and the output tuples are SNP calls. Results are stored via HDFS and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests performed directly on sequencing data, without explicit genotyping or linkage-based imputation. The paper also demonstrates that this method achieves accuracy comparable to alternative methods for sequencing data processing.
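
The Map-Sort-Reduce flow described in this abstract can be sketched in a few lines of single-process Python. This is a toy under stated assumptions, not the paper's pipeline: `map_align` is a hypothetical brute-force stand-in for a real short read aligner, `reduce_call_variants` is a hypothetical majority-vote stand-in for an SNP caller, and nothing here uses Hadoop, HDFS, FASTQ, or VCF.

```python
from collections import Counter, defaultdict


def map_align(read, reference):
    """Map step (toy aligner): return the reference offset with the
    fewest mismatches, together with the read itself."""
    best_pos = min(
        range(len(reference) - len(read) + 1),
        key=lambda p: sum(
            a != b for a, b in zip(read, reference[p:p + len(read)])
        ),
    )
    return best_pos, read


def reduce_call_variants(alignments, reference):
    """Reduce step (toy SNP caller): pile up aligned bases per reference
    position and report positions where the most common base differs
    from the reference base."""
    pileup = defaultdict(Counter)
    for pos, read in alignments:
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    calls = []
    for pos in sorted(pileup):
        base, _count = pileup[pos].most_common(1)[0]
        if base != reference[pos]:
            calls.append((pos, reference[pos], base))
    return calls


def run_pipeline(reads, reference):
    """Chain the three stages: map each read, sort by position, reduce."""
    mapped = [map_align(read, reference) for read in reads]  # Map
    mapped.sort(key=lambda pair: pair[0])                    # Sort
    return reduce_call_variants(mapped, reference)           # Reduce
```

In a real deployment each `map_align` call would run as an independent map task and the sorted alignments would be partitioned across reducers; the sketch only shows how the inputs and outputs of the stages fit together.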

44

Xiang, Hong, Anrong Wang, Guoqun Fu, Xue Luo, and Xudong Pan. "Fuzzy Cluster Analysis and Prediction of Psychiatric Health Data Based on BPNN." International Journal of Circuits, Systems and Signal Processing 16 (January 13, 2022): 497–503. http://dx.doi.org/10.46300/9106.2022.16.61.

Abstract:

PMH (psychiatry/mental health) is affected by many interconnected factors, so the prediction of PMH is a nonlinear problem. In this paper, a BPNN (Back Propagation Neural Network) is applied to the fuzzy clustering analysis and prediction of PMH data. The rules and characteristics of PMH and the behavioral characteristics of people with mental disorders are analyzed, and various internal relations among psychological test data are mined, providing a scientific basis for establishing and improving early prevention and intervention for mental disorders in colleges and universities. An artificial neural network is a mathematical model of information processing whose structure resembles the synaptic connections of brain neurons. After optimization, the fuzzy clustering analysis and data prediction ability for PMH data improve markedly. Applying BPNN to the fuzzy clustering analysis and prediction of PMH data, and analyzing the rules and characteristics of PMH and the behavioral characteristics of patients with mental disorders, makes it possible to explore various internal relations among psychological test data and provides a scientific basis for establishing early prevention and intervention for mental disorders.

45

Soucek, J., T. Dudok de Wit, M. Dunlop, and P. Décréau. "Local wavelet correlation: application to timing analysis of multi-satellite CLUSTER data." Annales Geophysicae 22, no. 12 (2004): 4185–96. http://dx.doi.org/10.5194/angeo-22-4185-2004.

Abstract:

Multi-spacecraft space observations, such as those of CLUSTER, can be used to infer information about local plasma structures by exploiting the timing differences between subsequent encounters of these structures by individual satellites. We introduce a novel wavelet-based technique, the Local Wavelet Correlation (LWC), which allows one to match the corresponding signatures of large-scale structures in the data from multiple spacecraft and determine the relative time shifts between the crossings. The LWC is especially suitable for analysis of strongly non-stationary time series, where it enables one to estimate the time lags in a more robust and systematic way than ordinary cross-correlation techniques. The technique is presented together with its properties and some examples of its application to timing analysis of bow shock and magnetopause crossings observed by CLUSTER. We also compare the performance and reliability of the technique with classical discontinuity analysis methods. Key words. Radio science (signal processing) – Space plasma physics (discontinuities; instruments and techniques)
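
For orientation, the sketch below shows the ordinary cross-correlation baseline that this abstract contrasts with the LWC. It is not the Local Wavelet Correlation itself. The function name, the `max_lag` search window, and the assumption of two equally sampled series are all choices made for the example.

```python
def cross_correlation_lag(a, b, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    of two equally sampled series after removing their means.

    A positive result means a feature appears in b later than in a.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a0 = [x - mean_a for x in a]
    b0 = [x - mean_b for x in b]

    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum products over the samples where the shifted series overlap.
        score = sum(
            a0[i] * b0[i + lag]
            for i in range(len(a0))
            if 0 <= i + lag < len(b0)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Multiplying the returned lag by the sampling interval gives the time shift between the two crossings. Because the whole series enters a single sum, this estimate can degrade on strongly non-stationary data, which is the situation the abstract says the LWC is designed for.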

46

Ahamad, M. K., and A. K. Bharti. "ANALYSIS THE CLUSTER PERFORMANCE OF REAL DATASET USING SPSS TOOL WITH K-MEANS APPROACH VIA PCA." Advances in Mathematics: Scientific Journal 10, no. 1 (2021): 535–42. http://dx.doi.org/10.37418/amsj.10.1.53.

Abstract:

Partitioning problems are handled through clustering, a technique that plays an essential role in mining data from a given dataset. K-Means clustering is a well-accepted method for large datasets, but it has some drawbacks. The real datasets used for clustering are taken from a data repository. To address these limitations and improve cluster quality, Principal Component Analysis (PCA) is applied to the datasets. In this paper we demonstrate experimental results for real datasets with dissimilarities. We validate the experimental significance of the cluster metrics and of the reduced component size for the different datasets, determined on the basis of eigenvalues during processing in the SPSS tool. We also present a comparative analysis of the distances between initial centroids for the wine and heart disease datasets at k=2 and k=3 clusters.

47

Wu, Zhong, and Chuan Zhou. "Construction of an Intelligent Processing Platform for Equestrian Event Information Based on Data Fusion and Data Mining." Journal of Sensors 2021 (July 23, 2021): 1–9. http://dx.doi.org/10.1155/2021/1869281.

Abstract:

In the past two years, equestrian sports have become increasingly popular with the public. Owing to the comprehensive development of equestrian preparations in China for the 2020 Olympic Games, the equestrian sports industry in China enjoys an unprecedentedly favorable development environment. This article studies the construction of an intelligent information processing platform for equestrian events based on data fusion and data mining. It introduces the relevant theory of data mining and data fusion, including the concept of data mining, its common analysis methods and algorithms, the basic concepts of data fusion, and the functional structure of data fusion. It discusses various cluster analysis algorithms and focuses on distance measurement and similarity coefficients in cluster analysis. In the experimental part, an intelligent information processing platform is constructed using data fusion and data mining technology in order to process and acquire information intelligently. The experimental results show that the precision rate, recall rate, and F-score of the platform under the closed test are much higher than those under the open test, and the precision rate is increased by about 7.26%.
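
The three metrics reported in this abstract can be computed as in the following sketch. The abstract does not say which F-score variant was used, so the balanced F1 (the harmonic mean of precision and recall) is an assumption here, as are the function name and the set-based inputs.

```python
def precision_recall_f1(predicted, relevant):
    """Compute precision, recall, and balanced F1 for two collections of
    item identifiers: what the system returned and what is correct."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```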

48

Ma, Youwen, and Yi Wan. "Data Analysis Method of Intelligent Analysis Platform for Big Data of Film and Television." Complexity 2021 (April 16, 2021): 1–10. http://dx.doi.org/10.1155/2021/9947832.

Abstract:

Based on cloud computing and statistics theory, this paper proposes an analysis method for big data in film and television. The method takes the Hadoop open source cloud platform as its basis and combines it with the MapReduce distributed programming model, the HDFS distributed file storage system, and other key cloud computing technologies. To cope with the different data processing needs of the film and television industry, association analysis, cluster analysis, factor analysis, and a K-means + association analysis training model were applied to model, process, and analyze the full data of films and TV series. Film and television data from recent years are analyzed according to film type, producer, production region, investment, box office, audience rating, network score, audience group, and other factors. Building on a study of how each attribute of a film or television drama affects film box office and TV audience rating, the work aims to make predictions for the film and television industry and to continually verify and improve the algorithm model.

49

Zhou, Xuejun. "The Construction of Economic Data Processing System Based on the Net Cluster Technology." Journal of Computational and Theoretical Nanoscience 14, no. 1 (2017): 263–68. http://dx.doi.org/10.1166/jctn.2017.6159.

Abstract:

In order to improve the efficiency in economic data processing, the construction of the data processing system based on the net cluster technology is proposed in this paper. As an important part of Statistic Data Warehouse, the Macro-economy Emulation System, which is based on statistic data warehouse, is now being used on Government Decision Support System. Currently, the system is processing the emulation over the abundant national macro-economy data. It presents the building of Visual Macroeconomic Emulating System on Statistic Data Warehouse (SDWES), including data extraction and check-up, data normalization, and a mechanism to support market analysis and forecast. The experiment shows this paper has a reference value for the application of net cluster technology in the construction process of economic data processing system which can also promote the overall performance substantially.

50

Jalilova, Samira. "Designing an effective calculation of a cluster analysis task." Scientific Bulletin 2 (2019): 7–12. http://dx.doi.org/10.54414/rdlb1970.

Abstract:

This paper investigates the algorithmic complexity of a cluster analysis task. It outlines issues such as the further development of efficient computational methods, their inclusion in a data processing system, and the implementation of the proposed algorithm, and it indicates directions for further development of the algorithms.
