Journal articles on the topic 'Cluster analysis – Data processing'

Consult the top 50 journal articles for your research on the topic 'Cluster analysis – Data processing.'

1

Zanev, Vladimir, Stanislav Topalov, and Veselin Christov. "Analysis and Data Mining of Lead-Zinc Ore Data." Serdica Journal of Computing 7, no. 3 (2014): 271–80. http://dx.doi.org/10.55630/sjc.2013.7.271-280.

Abstract:
This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of the DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. A tes
2

Karlashevych, Ivan, and Volodymyr Pravda. "Use of Cluster Analysis Method to Increase the Efficiency and Accuracy of Radar Data Processing." Computational Problems of Electrical Engineering 7, no. 1 (2017): 33–36. http://dx.doi.org/10.23939/jcpee2017.01.033.

3

Ocampo, Daniel Morin, and Luiz Caldeira Brant de Tolentino-Neto. "Cluster Analysis for Data Processing in Educational Research." Acta Scientiae 21, no. 4 (2019): 34–48. http://dx.doi.org/10.17648/acta.scientiae.v21iss4id5119.

Abstract:
Quantitative approaches to educational research have been undervalued and consequently less widely used. In this sense, this paper aims to present and analyze the techniques of Cluster Analysis as a possibility for research in sciences area. Therefore, the main hierarchical and non-hierarchical techniques of Cluster Analysis are presented, as well as some of their applications in educational research found in the literature. Cluster Analysis is adequate to simplify or elaborate hypotheses on massive data, such as large-scale educational research. The studies in the area of education that used
4

Tkachev, Ivan, Roman Vasilyev, and Elena Belousova. "Cluster analysis of lightning discharges: based on Vereya-MR network data." Solar-Terrestrial Physics 7, no. 4 (2021): 85–92. http://dx.doi.org/10.12737/stp-74202109.

Abstract:
Monitoring thunderstorm activity can help you solve many problems such as infrastructure facility protection, warning of hazardous phenomena associated with intense precipitation, study of conditions for the occurrence of thunderstorms and the degree of their influence on human activity, as well as the influence of thunderstorm activity on the formation of near-Earth space. We investigate the characteristics of thunderstorm cells by the method of cluster analysis. We take the Vereya-MR network data accumulated over a period from 2012 to 2018 as a basis. The Vereya-MR network considered in this
5

Melnikov, B. F., P. I. Averin, and E. A. Melnikova. "Intelligent processing of acoustic emission data based on cluster analysis." Journal of Physics: Conference Series 1236 (June 2019): 012044. http://dx.doi.org/10.1088/1742-6596/1236/1/012044.

6

Rose, Rodrigo L., Tejas G. Puranik, and Dimitri N. Mavris. "Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives." Aerospace 7, no. 10 (2020): 143. http://dx.doi.org/10.3390/aerospace7100143.

Abstract:
The complexity of commercial aviation operations has grown substantially in recent years, together with a diversification of techniques for collecting and analyzing flight data. As a result, data-driven frameworks for enhancing flight safety have grown in popularity. Data-driven techniques offer efficient and repeatable exploration of patterns and anomalies in large datasets. Text-based flight safety data presents a unique challenge in its subjectivity, and relies on natural language processing tools to extract underlying trends from narratives. In this paper, a methodology is presented for th
7

Jung, Se-Hoon, Jong-Chan Kim, and Chun-Bo Sim. "Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 1 (2016): 330. http://dx.doi.org/10.11591/ijece.v6i1.9334.

Abstract:
Various types of derivative information have been increasing exponentially, based on mobile devices and social networking sites (SNSs), and the information technologies utilizing them have also been developing rapidly. Technologies to classify and analyze such information are as important as data generation. This study concentrates on data clustering through principal component analysis and K-means algorithms to analyze and classify user data efficiently. We propose a technique of changing the cluster choice before cluster processing in the existing K-means practice into a variable cluster cho
8

Jung, Se-Hoon, Jong-Chan Kim, and Chun-Bo Sim. "Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 1 (2016): 330. http://dx.doi.org/10.11591/ijece.v6i1.pp330-336.

9

Susanty, Aries, Bambang Purwanggono, Nia Budi Puspitasari, and Chellsy Allison. "Conjoint Analysis for Evaluation of Customer’s Preference of Analgesic Generic Medicines under Non-proprietary Names." E3S Web of Conferences 202 (2020): 12022. http://dx.doi.org/10.1051/e3sconf/202020212022.

Abstract:
The main objective of this research is to get greater insight into the customer preferences in purchasing analgesic generic medicines under the non-proprietary name and to identify clusters with different preference structures. This research uses conjoint analysis (CA) and cluster analysis as data processing. This research collects the data through questionnaire from 200 respondents and uses the convenience sampling method to choose 200 respondents from sixteen districts in Semarang. The result of data processing with conjoint analysis indicated that customer prefers the analgesic generic medi
10

Haryono Setiadi, Safira Nuri Safitri, and Esti Suryani. "Educational Data Mining Menggunakan Metode Analysis Cluster dan Decision Tree berdasarkan Log Mining." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 3 (2022): 448–56. http://dx.doi.org/10.29207/resti.v6i3.3935.

Abstract:
Educational Data Mining (EDM) often appears to be applied in big data processing in the education sector. One of the educational data that can be further processed with EDM is activity log data from an e-learning system used in teaching and learning activities. The log activity can be further processed more specifically by using log mining. The purpose of this study was to process log data from the Sebelas Maret University Online Learning System (SPADA UNS) to determine student learning behavior patterns and their relationship to the final results obtained. The data mining method applied in th
11

Caspart, René, Max Fischer, Manuel Giffels, et al. "Setup and commissioning of a high-throughput analysis cluster." EPJ Web of Conferences 245 (2020): 07007. http://dx.doi.org/10.1051/epjconf/202024507007.

Abstract:
Current and future end-user analyses and workflows in High Energy Physics demand the processing of growing amounts of data. This plays a major role when looking at the demands in the context of the High-Luminosity-LHC. In order to keep the processing time and turn-around cycles as low as possible analysis clusters optimized with respect to these demands can be used. Since hyper converged servers offer a good combination of compute power and local storage, they form the ideal basis for these clusters. In this contribution we report on the setup and commissioning of a dedicated analysis cluster
12

Getmanets, O., A. Nekos, and M. Pelikhatyi. "CLUSTER ANALYSIS AND RADIATION MONITORING OF ENVIRONMENT." Visnyk of Taras Shevchenko National University of Kyiv. Geology, no. 3 (86) (2019): 75–79. http://dx.doi.org/10.17721/1728-2713.86.11.

Abstract:
Building a background radiation field on the ground on the basis of measurement data taken at a finite number of points is one of the most important tasks of radiation monitoring. The aim of the work: to study the possibility of applying cluster analysis for the tasks of radiation monitoring of the environment. Cluster analysis is a multidimensional statistical analysis. Its main purpose is to split the set of objects under study (observation points) into homogeneous groups or clusters, that is, the task of classifying data and identifying the corresponding structure in them is solved. Methods
13

Arifiyanti, Amalia Anjani, Farhan Setiyo Darusman, and Brahmantio Widyo Trenggono. "Population Density Cluster Analysis in DKI Jakarta Province Using K-Means Algorithm." Journal of Information Systems and Informatics 4, no. 3 (2022): 772–83. http://dx.doi.org/10.51519/journalisi.v4i3.315.

Abstract:
This study aims to analyze clusters based on the area and population density of DKI Jakarta Province in 2015 using the data mining method by clustering as the first step in planning for population equality. The subject of analysis in this study is a village located in the province of DKI Jakarta which is recorded based on the area and population density in each sub-district until 2015 with several stages, namely data understanding, data processing or cleansing, cluster tendency assessment, clustering, and cluster review. From this study, the results w
14

Nikitina, M. A., I. M. Chernukha, Ya M. Uzakov, and D. E. Nurmukhanbetova. "CLUSTER ANALYSIS FOR DATABASES TYPOLOGIZATION CHARACTERISTICS." Series of Geology and Technical Sciences 2, no. 446 (2021): 114–21. http://dx.doi.org/10.32014/2021.2518-170x.42.

Abstract:
The article deals with basic concepts of cluster analysis and data clustering. The authors give brief information on the history of cluster analysis and its first applications. The article gives the classification of methods by the way of data processing and analysis in cluster analysis. The detailed description of the popular non-hierarchical K-means algorithm is given. When developing databases, their structure should provide for the division of products into clusters based on various characteristics. It is necessary to consider the division into clusters based on other characteristics, su
15

Chen, Dan Dan, and Zhi Gang Yao. "Analysis on Ship Equipment Consumption Data Based on Data Mining." Advanced Materials Research 846-847 (November 2013): 1141–44. http://dx.doi.org/10.4028/www.scientific.net/amr.846-847.1141.

Abstract:
A comprehensive analysis on a large amount of ship equipment consumption data accumulated over the years is achieved through the establishment of data warehouse, online analytical processing, regression analysis, cluster analysis, etc. by means of data mining. The analysis results present important references for equipment guarantee department in terms of equipment preparation and carrying, etc. and provide the comprehensive analysis and utilization on massive ship maintenance support data with technical means.
16

Zakharov, V. I., and P. A. Budnikov. "The application of cluster analysis to the processing of GPS-interferometry data." Moscow University Physics Bulletin 67, no. 1 (2012): 25–32. http://dx.doi.org/10.3103/s0027134912010262.

17

Wismüller, Axel, Oliver Lange, Johannes Behrends, et al. "Visualization of supervised functional MRI data processing methods by unsupervised cluster analysis." NeuroImage 13, no. 6 (2001): 285. http://dx.doi.org/10.1016/s1053-8119(01)91628-3.

18

Meng, Hai-Dong, Yu-Chen Song, Fei-Yan Song, and Hai-Tao Shen. "Research and application of cluster and association analysis in geochemical data processing." Computational Geosciences 15, no. 1 (2010): 87–98. http://dx.doi.org/10.1007/s10596-010-9199-x.

19

Bondarev, A. E. "VISUAL ANALYSIS AND PROCESSING OF CLUSTERS STRUCTURES IN MULTIDIMENSIONAL DATASETS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W4 (May 10, 2017): 151–54. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w4-151-2017.

Abstract:
The article is devoted to problems of visual analysis of cluster structures in multidimensional datasets. For visual analysis, an elastic map design approach [1,2] is applied. This approach is well suited to processing and visualizing multidimensional datasets. To analyze clusters in the original data volume, elastic maps are used to map the original data points onto enclosed manifolds of lower dimensionality. By diminishing the elasticity parameters, one can design a map surface that approximates the multidimensional dataset in question much better. Then the points of
20

Vijay Bhaskhar Reddy, Y., L.S.S Reddy, and S.S.N. Reddy. "An Efficient Density Based Clustering approach for High Dimensional Data." International Journal of Engineering & Technology 7, no. 2.32 (2018): 111. http://dx.doi.org/10.14419/ijet.v7i2.32.15381.

Abstract:
Data extraction, data processing, pattern mining and clustering are the important features in data mining. The extraction of data and formation of interesting patterns from huge datasets can be used in prediction and decision making for further analysis. This increases the need for efficient and effective analysis methods to make use of this data. Clustering is one important technique in data mining. In clustering, a set of items is divided into several clusters where inter-cluster similarity is minimized and intra-cluster similarity is maximized. Clustering techniques are easy to identify of
21

Huang, Weihua. "Research on the Revolution of Multidimensional Learning Space in the Big Data Environment." Complexity 2021 (May 18, 2021): 1–12. http://dx.doi.org/10.1155/2021/6583491.

Abstract:
Multiuser fair sharing of clusters is a classic problem in cluster construction. However, the cluster computing system for hybrid big data applications has the characteristics of heterogeneous requirements, which makes more and more cluster resource managers support fine-grained multidimensional learning resource management. In this context, it is oriented to multiusers of multidimensional learning resources. Shared clusters have become a new topic. A single consideration of a fair-shared cluster will result in a huge waste of resources in the context of discrete and dynamic resource allocatio
22

Ma, Xiaoya, Zhaoqian Gong, Feng Zhang, Shun Wang, Xiaojun Liu, and Guangyou Fang. "An Automatic Drift-Measurement-Data-Processing Method with Digital Ionosondes." Remote Sensing 14, no. 19 (2022): 4710. http://dx.doi.org/10.3390/rs14194710.

Abstract:
Drift detection is one of the important detection modes in a digital ionosonde system. In this paper, a new data processing method is presented for boosting the automatic and high-quality drift measurement, which is helpful for long-term ionospheric observation, and has been successfully applied to the Chinese Academy of Sciences, Digital Ionosonde (CAS-DIS). Based on Doppler interferometry principle, this method can be successively divided into four constraint steps: extracting the stable echo data; restricting the ionospheric detection region; extracting the reliable reflection cluster, incl
23

Allam, Tahani M. "Estimate the Performance of Cloudera Decision Support Queries." International Journal of Online and Biomedical Engineering (iJOE) 18, no. 01 (2022): 127–38. http://dx.doi.org/10.3991/ijoe.v18i01.27877.

Abstract:
Hive and Impala queries are used to process large amounts of data. The overwhelming amount of information requires an efficient data processing system. For long-running batch queries and analysis, Hive is the more suitable choice. Impala is the most powerful system suitable for real-time interactive Structured Query Language (SQL) queries, adding massively parallel processing to a Hadoop distributed cluster. Data growth creates a problem for the SQL cluster because the execution time increases. In this paper, a comparison is demonstrated between the performanc
24

Turgaeva, A. A. "Cluster analysis in the control over the activity of insurance companies." Economic Analysis: Theory and Practice 19, no. 3 (2020): 541–63. http://dx.doi.org/10.24891/ea.19.3.541.

Abstract:
Subject. The article considers clustering of insurance companies as a type of informatization of economy for practical application by the internal control system. Objectives. The purpose is to present clusters and give their interpretation for insurance companies in relation to internal control; identify the possibility of clustering, using the Deductor Studio platform developed by Base Group for internal control systems. Methods. The study employs techniques of statistical research and data processing, mathematical methods, methods of grouping, and cluster analysis. Results. Clusters are pres
25

Nayak, Janmenjoy, Bighnaraj Naik, Pandit Byomakesha Dash, and Danilo Pelusi. "Optimal Fuzzy Cluster Partitioning by Crow Search Meta-Heuristic for Biomedical Data Analysis." International Journal of Applied Metaheuristic Computing 12, no. 2 (2021): 49–66. http://dx.doi.org/10.4018/ijamc.2021040104.

Abstract:
Biomedical data is often more unstructured in nature, and the biomedical data processing task is becoming more complex day by day. Thus, biomedical informatics requires competent data analysis and data mining techniques for designing a decision support system framework to solve clinical and healthcare-related issues. Due to increasingly large and complex data sets and the demands of biomedical informatics research, researchers are attracted towards automated machine learning models. This paper is proposed to design an efficient machine learning model based on fuzzy c-means with meta-heuristic optimizati
26

Maeda, Takahiro, and Hiroyuki Fujiwara. "Seismic Hazard Visualization from Big Simulation Data: Cluster Analysis of Long-Period Ground-Motion Simulation Data." Journal of Disaster Research 12, no. 2 (2017): 233–40. http://dx.doi.org/10.20965/jdr.2017.p0233.

Abstract:
This paper describes a method of extracting the relation between the ground-motion characteristics of each area and a seismic source model, based on ground-motion simulation data output in planar form for many earthquake scenarios, and the construction of a parallel distributed processing system where this method is implemented. The extraction is realized using two-stage clustering. In the first stage, the ground-motion indices and scenario parameters are used as input data to cluster the earthquake scenarios within each evaluation mesh. In the second stage, the meshes are clustered based on t
27

Lin, Qiang, and Xilin Zhang. "Key Technologies of Media Big Data in-Depth Analysis System Based on 5G Platform." Journal of Physics: Conference Series 2294, no. 1 (2022): 012007. http://dx.doi.org/10.1088/1742-6596/2294/1/012007.

Abstract:
To meet the needs of large-scale users for personalized streaming media services with high speed, low delay, and high quality in a 5G mobile network environment, this paper studies the resource allocation mechanism of streaming media based on a 5G network from the perspective of user demand prediction, which can alleviate the pressure of mobile network, improve the utilization rate of streaming media resources and the quality of user service experience. The augmented reality visualization of large-scale social media data must rely on the computing power of distributed clusters. This p
28

Mutasher, Watheq Ghanim, and Abbas Fadhil Aljuboori. "Real Time Big Data Sentiment Analysis and Classification of Facebook." Webology 19, no. 1 (2022): 1112–27. http://dx.doi.org/10.14704/web/v19i1/web19076.

Abstract:
Many people use Facebook to connect and share their views on various issues, with the majority of user-generated content consisting of textual information. Since so much data comes from people posting messages in real time about their situation and their thoughts on a range of subjects in everyday life, the collection and analysis of these data, which may well be helpful for political decisions or public opinion monitoring, is a worthwhile research project. Therefore, this paper analyzes public text posts on the Facebook stream in real time through the Hadoop ecosystem b
29

Lim, Jong Beom, Jong-Suk Ahn, and Kang-Woo Lee. "Performance Modeling and Analysis of a Hadoop Cluster for Efficient Big Data Processing." Advanced Science Letters 22, no. 9 (2016): 2314–19. http://dx.doi.org/10.1166/asl.2016.7813.

30

Botvin, M., and A. Gertsiy. "COMPARISON OF CLUSTER ANALYSIS ALGORITHMS IN OBJECT RECOGNITION." Collection of scientific works of the State University of Infrastructure and Technologies series "Transport Systems and Technologies", no. 36 (December 30, 2020): 112–20. http://dx.doi.org/10.32703/2617-9040-2020-36-12.

Abstract:
The article is an overview of the direction of graphic image processing based on clustering algorithms. The analysis of prospects of application of algorithms of cluster analysis in digital image processing, in particular, at segmentation and compression of graphic images, and also at recognition of images in transport sphere of activity is carried out. Comparative modeling of such algorithms of cluster analysis as K-means, Mean-Shift (clustering of average shift) and DBSCAN (based on density of spatial clustering for applications with noise) on various types of data is carried out. The simula
31

Schreck, Tobias, Jürgen Bernard, Tatiana von Landesberger, and Jörn Kohlhammer. "Visual Cluster Analysis of Trajectory Data with Interactive Kohonen Maps." Information Visualization 8, no. 1 (2009): 14–29. http://dx.doi.org/10.1057/ivs.2008.29.

Abstract:
Visual-interactive cluster analysis provides valuable tools for effectively analyzing large and complex data sets. Owing to desirable properties and an inherent predisposition for visualization, the Kohonen Feature Map (or Self-Organizing Map or SOM) algorithm is among the most popular and widely used visual clustering techniques. However, the unsupervised nature of the algorithm may be disadvantageous in certain applications. Depending on initialization and data characteristics, cluster maps (cluster layouts) may emerge that do not comply with user preferences, expectations or the application
32

Yu, Zhanqiu. "Big Data Clustering Analysis Algorithm for Internet of Things Based on K-Means." International Journal of Distributed Systems and Technologies 10, no. 1 (2019): 1–12. http://dx.doi.org/10.4018/ijdst.2019010101.

Abstract:
To explore the Internet of things logistics system application, an Internet of things big data clustering analysis algorithm based on K-means was discussed. First of all, according to the complex event relation and processing technology, the big data processing of Internet of things was transformed into the extraction and analysis of complex relational schema, so as to provide support for simplifying the processing complexity of big data in Internet of things (IoT). The traditional K-means algorithm was optimized and improved to make it fit the demand of big data RFID data network. Based on Had
33

Beidler, Peter, Mark Nguyen, and John Kang. "Extracting knowledge of NCI research directions from funding data using language processing." Journal of Clinical Oncology 39, no. 15_suppl (2021): e13547-e13547. http://dx.doi.org/10.1200/jco.2021.39.15_suppl.e13547.

Abstract:
e13547 Background: In fiscal year (FY) 2019, 42% of the $6 billion NCI budget went towards nearly 5,000 research project grants, of which about 60% are R01 type. Given the enormity of allocated resources, there is a need for the scientific community to have a more rigorous understanding of the cancer research landscape. While the NCI Budget Fact Book publishes statistics based on pre-designated codings, it is unclear if this method yields the best representation of fields within oncology. Open questions include: how many distinguishable areas of cancer research are being funded? Are there diff
34

Beloki, Zuhaitz, Xabier Artola, and Aitor Soroa. "A scalable architecture for data-intensive natural language processing." Natural Language Engineering 23, no. 5 (2017): 709–31. http://dx.doi.org/10.1017/s1351324917000092.

Abstract:
Computational power needs have greatly increased during the last years, and this is also the case in the Natural Language Processing (NLP) area, where thousands of documents must be processed, i.e., linguistically analyzed, in a reasonable time frame. These computing needs have implied a radical change in the computing architectures and big-scale text processing techniques used in NLP. In this paper, we present a scalable architecture for distributed language processing. The architecture uses Storm to combine diverse NLP modules into a processing chain, which carries out the linguistic
35

Fakherldin, Mohammed, Ibrahim Aaker Targio Hashem, Abdullah Alzuabi, and Faiz Alotaibi. "Performance Evaluation of Hadoop in Cloud for Big Data." International Journal of Engineering & Technology 7, no. 4.15 (2018): 16. http://dx.doi.org/10.14419/ijet.v7i4.15.21363.

Abstract:
Recent trends in big data have shown that the amount of data continues to increase at an exponential rate. This trend has inspired many researchers over the past few years to explore new research directions related to multiple areas in big data. Hadoop is one of the most popular platforms for big data; thus, Hadoop MapReduce is used to store data in Hadoop distributed file systems. Meanwhile, cloud computing is considered an excellent candidate for storing and processing big data. However, processing big data across multiple nodes is a challenging task. The problem is even more compl
36

Mendizabal-Ruiz, Gerardo, Israel Román-Godínez, Sulema Torres-Ramos, Ricardo A. Salido-Ruiz, Hugo Vélez-Pérez, and J. Alejandro Morales. "Genomic signal processing for DNA sequence clustering." PeerJ 6 (January 24, 2018): e4264. http://dx.doi.org/10.7717/peerj.4264.

Abstract:
Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and anal
37

Hamad, Sumaya, Khattab Alheeti, Yossra Ali, and Shaimaa Shaker. "Clustering and Analysis of Dynamic Ad Hoc Network Nodes Movement Based on FCM Algorithm." International Journal of Online and Biomedical Engineering (iJOE) 16, no. 12 (2020): 47. http://dx.doi.org/10.3991/ijoe.v16i12.16067.

Abstract:
Clustering is a major exploratory data mining activity, and a popular statistical data analysis technique used in many fields. Cluster analysis, generally speaking, is not just an automated function but rather an iterative information exploration procedure or multipurpose dynamic optimisation comprising trial and error. Parameters for pre-processing and modeling data frequently need to be modified until the output has the desired properties. Data points in fuzzy clustering may belong to several clusters. Each data point is assigned members
38

Prasad, Jagdish, and Rahul Rajawat. "A Note on Comparison between Statistical Cluster and Neural Network Cluster." Recent Patents on Engineering 13, no. 2 (2019): 166–73. http://dx.doi.org/10.2174/1872212112666180216161153.

Abstract:
Background: Cluster analysis is a data reduction technique in rows of the data matrix. This technique is widely used in engineering, biology, society, pattern recognition, and image processing. Objective: In this paper, self organized map (SOM) using the artificial neural network and different statistical techniques of cluster analysis are used on Population data of 33 districts of Rajasthan with 9 variables for comparison purpose. Methods: The goal of this work is to identify the most suitable technique for clustering the data by using the artificial neural network and different statistical c
39

Eberle, Detlef G., and Hendrik Paasche. "Integrated data analysis for mineral exploration: A case study of clustering satellite imagery, airborne gamma-ray, and regional geochemical data suites." GEOPHYSICS 77, no. 4 (2012): B167–B176. http://dx.doi.org/10.1190/geo2011-0063.1.

Abstract:
Partitioning cluster algorithms have proven to be powerful tools for data-driven integration of large geoscientific databases. We used fuzzy Gustafson-Kessel cluster analysis to integrate Landsat imagery, airborne radiometric, and regional geochemical data to aid in the interpretation of a multimethod database. The survey area extends over [Formula: see text] and is located in the Northern Cape Province, South Africa. We carefully selected five variables for cluster analysis to avoid the clustering results being dominated by spatially highly correlated data sets that were present in our database
40

Qiao, Mu, and Zixuan Cheng. "A Novel Long- and Short-Term Memory Network with Time Series Data Analysis Capabilities." Mathematical Problems in Engineering 2020 (October 13, 2020): 1–9. http://dx.doi.org/10.1155/2020/8885625.

Abstract:
Time series data are an extremely important type of data in the real world. Time series data gradually accumulate over time. Due to the dynamic growth in time series data, they tend to have higher dimensions and large data scales. When performing cluster analysis on this type of data, there are shortcomings in using traditional feature extraction methods for processing. To improve the clustering performance on time series data, this study uses a recurrent neural network (RNN) to train the input data. First, an RNN called the long short-term memory (LSTM) network is used to extract the features
41

Yang, Yoonhee, Dongsun Yim, Wonjeong Park, Soo Jung Baek, and Min Ji Kang. "Exploring Real-Time Word Learning Skills and Its Related Factors in Preschool Children: An Eye-Tracking Study." Communication Sciences & Disorders 27, no. 3 (2022): 468–82. http://dx.doi.org/10.12963/csd.22894.

Abstract:
Objectives: This study aimed to identify the real-time word learning processing aspects of children in each group by classifying groups according to their actual vocabulary acquisition performance (offline processing data) in QUIL (Quick incidental learning). We compared whether there was a significant difference between the QUIL offline and online processing data of the two groups, and finally we attempted to explore whether QUIL offline and online processing data had a significant correlation with children’s working memory. Methods: Thirty-three children [21 with TD (Typically developing chi
42

Azizah, Anestasya Nur, Tatik Widiharih, and Arief Rachman Hakim. "Kernel K-Means Clustering untuk Pengelompokan Sungai di Kota Semarang Berdasarkan Faktor Pencemaran Air." Jurnal Gaussian 11, no. 2 (2022): 228–36. http://dx.doi.org/10.14710/j.gauss.v11i2.35470.

Abstract:
K-Means clustering is a frequently used type of non-hierarchical cluster analysis, but it has a weakness in processing data that are not linearly separable (that is, that lack clear boundaries) or that contain overlapping clusters, where the results of one cluster visually fall between other clusters. The Gaussian kernel function in Kernel K-Means clustering can be used to handle non-linearly separable data and overlapping clusters. The difference between Kernel K-Means clustering and K-Means lies in the input data, which have to be mapped into a new dimension using k
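The mechanism described in entry 42, where K-Means is run on data implicitly mapped into a new space by a Gaussian kernel, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method used in the paper: the function names, `gamma = 0.5`, the starting labels, and the six toy points are hypothetical. The key step is that the squared distance from a point to a cluster mean in the mapped space can be computed from kernel values alone.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points, labels, n_clusters, gamma=0.5, max_iter=50):
    """Reassign points until labels stop changing, using only kernel values."""
    n = len(points)
    K = [[gaussian_kernel(p, q, gamma) for q in points] for p in points]
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # Mean pairwise kernel value inside each cluster (shared by all points).
        within = [sum(K[i][j] for i in idx for j in idx) / len(idx) ** 2
                  if idx else 0.0 for idx in members]
        new_labels = []
        for i in range(n):
            dists = []
            for c, idx in enumerate(members):
                if not idx:
                    dists.append(math.inf)  # never assign to an empty cluster
                    continue
                cross = sum(K[i][j] for j in idx) / len(idx)
                # Squared feature-space distance from point i to the cluster mean.
                dists.append(K[i][i] - 2.0 * cross + within[c])
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical toy data: two small groups, with one point mislabeled at the start.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
initial = [0, 0, 0, 0, 1, 1]
result = kernel_kmeans(points, initial, n_clusters=2)
print(result)
```

On this toy input the mislabeled fourth point is moved to the second group after one pass. Like ordinary K-Means, the procedure depends on the starting labels and on the kernel width, so real data would need several restarts and a tuned `gamma`.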
43

Ghoneimy, Samy, and Samir Abou El-Seoud. "A MapReduce Framework for DNA Sequencing Data Processing." International Journal of Recent Contributions from Engineering, Science & IT (iJES) 4, no. 4 (2016): 11. http://dx.doi.org/10.3991/ijes.v4i4.6537.

Abstract:
Genomics and Next Generation Sequencers (NGS) like Illumina Hiseq produce data in the order of 200 billion base pairs in a single one-week run for a 60x human genome coverage, which requires modern high-throughput experimental technologies that can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on the implementation of the DNA sequencing as a set of MapReduce programs that will accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format)
44

Xiang, Hong, Anrong Wang, Guoqun Fu, Xue Luo, and Xudong Pan. "Fuzzy Cluster Analysis and Prediction of Psychiatric Health Data Based on BPNN." International Journal of Circuits, Systems and Signal Processing 16 (January 13, 2022): 497–503. http://dx.doi.org/10.46300/9106.2022.16.61.

Abstract:
PMH (psychiatry/mental health) is affected by many factors, among which there are numerous connections, so the prediction of PMH is a nonlinear problem. In this paper, BPNN (Back Propagation Neural Network) is applied to fuzzy clustering analysis and prediction of PMH data, and the rules and characteristics of PMH and behavioral characteristics of people with mental disorders are analyzed, and various internal relations among psychological test data are mined, thus providing scientific basis for establishing and perfecting early prevention and intervention of mental disorders in colleges and u
45

Soucek, J., T. Dudok de Wit, M. Dunlop, and P. Décréau. "Local wavelet correlation: application to timing analysis of multi-satellite CLUSTER data." Annales Geophysicae 22, no. 12 (2004): 4185–96. http://dx.doi.org/10.5194/angeo-22-4185-2004.

Abstract:
Multi-spacecraft space observations, such as those of CLUSTER, can be used to infer information about local plasma structures by exploiting the timing differences between subsequent encounters of these structures by individual satellites. We introduce a novel wavelet-based technique, the Local Wavelet Correlation (LWC), which allows one to match the corresponding signatures of large-scale structures in the data from multiple spacecraft and determine the relative time shifts between the crossings. The LWC is especially suitable for analysis of strongly non-stationary time series, wher
46

Ahamad, M. K., and A. K. Bharti. "ANALYSIS THE CLUSTER PERFORMANCE OF REAL DATASET USING SPSS TOOL WITH K-MEANS APPROACH VIA PCA." Advances in Mathematics: Scientific Journal 10, no. 1 (2021): 535–42. http://dx.doi.org/10.37418/amsj.10.1.53.

Abstract:
Partitioning problems are handled through clustering, a technique that plays an essential role in mining data from a given dataset. K-Means clustering is a well-accepted approach for huge datasets, but it has some drawbacks. The real dataset is taken from a data repository used for clustering. Furthermore, to obtain the outcome of this procedure, it is essential to resolve the limitations and enhance cluster quality by applying Principal Component Analysis (PCA) to the dataset. In this paper we demonstrate experimental results for real datasets with dissi
47

Wu, Zhong, and Chuan Zhou. "Construction of an Intelligent Processing Platform for Equestrian Event Information Based on Data Fusion and Data Mining." Journal of Sensors 2021 (July 23, 2021): 1–9. http://dx.doi.org/10.1155/2021/1869281.

Abstract:
In the past two years, equestrian sports have become more and more popular with the public. Due to the comprehensive development of equestrian preparations for the 2020 Olympic Games in China, the equestrian sports industry presents an unprecedented favorable development environment in China. This article is aimed at studying the construction of an equestrian event information intelligent processing platform based on data fusion and data mining. This article introduces the relevant theoretical knowledge of data mining and data fusion, including the description of the concept of data mining, th
48

Ma, Youwen, and Yi Wan. "Data Analysis Method of Intelligent Analysis Platform for Big Data of Film and Television." Complexity 2021 (April 16, 2021): 1–10. http://dx.doi.org/10.1155/2021/9947832.

Abstract:
Based on cloud computing and statistics theory, this paper proposes a reasonable analysis method for big data of film and television. The method selects Hadoop open source cloud platform as the basis, combines the MapReduce distributed programming model and HDFS distributed file storage system and other key cloud computing technologies. In order to cope with different data processing needs of film and television industry, association analysis, cluster analysis, factor analysis, and K-mean + association analysis algorithm training model were applied to model, process, and analyze the full data
49

Zhou, Xuejun. "The Construction of Economic Data Processing System Based on the Net Cluster Technology." Journal of Computational and Theoretical Nanoscience 14, no. 1 (2017): 263–68. http://dx.doi.org/10.1166/jctn.2017.6159.

Abstract:
In order to improve the efficiency in economic data processing, the construction of the data processing system based on the net cluster technology is proposed in this paper. As an important part of Statistic Data Warehouse, the Macro-economy Emulation System, which is based on statistic data warehouse, is now being used on Government Decision Support System. Currently, the system is processing the emulation over the abundant national macro-economy data. It presents the building of Visual Macroeconomic Emulating System on Statistic Data Warehouse (SDWES), including data extraction and check-up,
50

Jalilova, Samira. "Designing an effective calculation of a cluster analysis task." Scientific Bulletin 2 (2019): 7–12. http://dx.doi.org/10.54414/rdlb1970.

Abstract:
This paper investigates the problems of the algorithmic complexity of a cluster analysis task. The issues like further development of effective calculating methods and their inclusion in the system of data processing and realization of the proposed algorithm have been outlined, and the ways of further development of the algorithms have been reflected.