
Journal articles on the topic 'Text Stream Clustering'

Consult the top 50 journal articles for your research on the topic 'Text Stream Clustering.'


1

Vo, Tham, and Phuc Do. "GOW-Stream: A novel approach of graph-of-words based mixture model for semantic-enhanced text stream clustering." Intelligent Data Analysis 25, no. 5 (September 15, 2021): 1211–31. http://dx.doi.org/10.3233/ida-205443.

Abstract:
Recently, the rapid growth of social networks and online news resources on the Internet has made text stream clustering an important application in multiple domains (e.g., text retrieval diversification, social event detection, and text summarization). Unlike traditional static text clustering, text stream clustering faces specific challenges related to the rapid change of topics/clusters and the high velocity of incoming document batches. Recent well-known model-based text stream clustering models, such as DTM, DCT, and MStream, are word-independent evaluation approaches, meaning that they largely ignore the relations between words while sampling clusters/topics. This reduces overall model accuracy, especially for short text documents such as comments and microblogs in social networks. To tackle these problems, in this paper we propose a novel graph-of-words (GOW) based text stream clustering approach, called GOW-Stream. Applying the common GOWs generated from each document batch while sampling clusters/topics helps to overcome the word-independent evaluation challenge. Our proposed GOW-Stream promises significantly better text stream clustering performance than recent state-of-the-art baselines. Extensive experiments on multiple real-world benchmark datasets demonstrate the effectiveness of our proposed model in terms of both accuracy and running time.
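As an illustration of the graph-of-words idea mentioned in this abstract, the following is a minimal sketch rather than the GOW-Stream model itself. It assumes that a graph-of-words links tokens that co-occur within a sliding window, and that a "common" edge is one appearing in several documents of a batch. The function names, the window size, and the `min_docs` threshold are hypothetical choices made for this example.

```python
from collections import Counter


def graph_of_words(tokens, window=3):
    """Return undirected co-occurrence edge weights for one document.

    Assumption: two distinct tokens are linked when they appear
    within `window` consecutive positions of each other.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                edges[tuple(sorted((left, right)))] += 1
    return edges


def common_edges(batch, window=3, min_docs=2):
    """Keep edges that occur in at least `min_docs` documents of a batch."""
    doc_counts = Counter()
    for tokens in batch:
        # Count each edge once per document, regardless of its weight.
        doc_counts.update(graph_of_words(tokens, window).keys())
    return {edge for edge, n in doc_counts.items() if n >= min_docs}


batch = [
    "stream clustering of short text".split(),
    "short text stream clustering model".split(),
]
print(sorted(common_edges(batch)))
```

Edges shared across documents capture word relations that a bag-of-words representation discards, which is the limitation of word-independent models that the abstract points to.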
2

Qiang, Jipeng, Wanyin Xu, Yun Li, Yunhao Yuan, and Yi Zhu. "Lifelong Learning Augmented Short Text Stream Clustering Method." IEEE Access 9 (2021): 70493–501. http://dx.doi.org/10.1109/access.2021.3078096.

3

Gong, Linghui, Jianping Zeng, and Shiyong Zhang. "Text stream clustering algorithm based on adaptive feature selection." Expert Systems with Applications 38, no. 3 (March 2011): 1393–99. http://dx.doi.org/10.1016/j.eswa.2010.07.041.

4

Ma, Hui Fang, and Hui Li Ma. "Combining Burst Detection for Hot Topic Extraction." Advanced Materials Research 268-270 (July 2011): 1283–88. http://dx.doi.org/10.4028/www.scientific.net/amr.268-270.1283.

Abstract:
As traditional text representations are not suitable for online dynamic streams, this paper presents a hot topic extraction technique that can be used for tracking news topics over time. The model incorporates individual word bursts into the document-word vector representation, which emphasizes the temporal features of text streams. A burst detection approach based on an energy ratio threshold is proposed, and TF-PDF is then combined with it to weight the terms. Experimental results demonstrate that this model is effective for topic extraction from news streams and that it improves clustering performance.
5

Taninpong, Phimphaka, and Sudsanguan Ngamsuriyaroj. "Tree-based text stream clustering with application to spam mail classification." International Journal of Data Mining, Modelling and Management 10, no. 4 (2018): 353. http://dx.doi.org/10.1504/ijdmmm.2018.095354.

6

Ngamsuriyaroj, Sudsanguan, and Phimphaka Taninpong. "Tree-based text stream clustering with application to spam mail classification." International Journal of Data Mining, Modelling and Management 10, no. 4 (2018): 353. http://dx.doi.org/10.1504/ijdmmm.2018.10015879.

7

Li, Pei, and Ze Deng. "Use of Distributed Semi-Supervised Clustering for Text Classification." Journal of Circuits, Systems and Computers 28, no. 08 (July 2019): 1950127. http://dx.doi.org/10.1142/s0218126619501275.

Abstract:
Text classification is an important way to handle and organize textual data. Among existing methods of text classification, semi-supervised clustering is a mainstream technique. In the era of ‘Big data’, current semi-supervised clustering approaches for text classification generally do not scale to massive text data because of their excessive computing costs. To address this problem, this study proposes a scalable text classification algorithm for large-scale text collections, namely D-TESC, by modifying a state-of-the-art centralized semi-supervised clustering approach for text classification (TESC). D-TESC processes the textual data in a distributed manner to achieve high scalability. The experimental results indicate that the D-TESC algorithm (1) has a classification quality comparable to that of TESC, and (2) outperforms TESC by an average factor of 7.2 in terms of scalability when using eight CPU threads.
8

Chen, Junyang, Zhiguo Gong, and Weiwen Liu. "A Dirichlet process biterm-based mixture model for short text stream clustering." Applied Intelligence 50, no. 5 (February 1, 2020): 1609–19. http://dx.doi.org/10.1007/s10489-019-01606-1.

9

Kumar, Sushil, and Komal Kumar Bhatia. "Clustering Based Approach for Novelty Detection in Text Documents." Asian Journal of Computer Science and Technology 8, no. 2 (May 5, 2019): 116–21. http://dx.doi.org/10.51983/ajcst-2019.8.2.2130.

Abstract:
Because of information overload on the internet, retrieving information for a given query returns redundant and irrelevant results. It is necessary to retrieve information that is both relevant and novel for the user's query, so that the user needs minimal effort to satisfy the information need. In this work we propose a clustering-based approach for novelty detection that provides relevant and novel documents for the information need. Based on the user query, the incoming stream of documents is clustered using the k-means algorithm. Cluster heads are then selected from the various clusters by minimum distance. These cluster heads, drawn from different clusters that are far apart from one another, are taken as the novel documents of the collection. The proposed technique can be further used in the field of information retrieval.
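The clustering step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be already encoded as numeric vectors, the "cluster head" is assumed to be the document closest to its cluster centroid, and the k-means initialisation simply takes the first k points so that the run is deterministic. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iterations=20):
    """Plain k-means; returns centroids and member indices per cluster."""
    # Assumption: deterministic initialisation with the first k points.
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(idx)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            mean([points[i] for i in members]) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids, clusters


def cluster_heads(points, k):
    """Index of the document nearest to the centroid of each cluster."""
    centroids, clusters = kmeans(points, k)
    return [
        min(members, key=lambda i: dist(points[i], centroids[c]))
        for c, members in enumerate(clusters)
        if members
    ]


docs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.0], [0.0, 1.0], [0.1, 0.9], [0.0, 0.8]]
print(cluster_heads(docs, k=2))
```

Returning one representative per cluster gives a small set of mutually distant documents, which is one plausible reading of how the abstract uses cluster heads as novel documents.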
10

Hamou, Reda Mohamed, Abdelmalek Amine, and Ahmed Chaouki Lokbani. "The Social Spiders in the Clustering of Texts." International Journal of Artificial Life Research 3, no. 3 (July 2012): 1–14. http://dx.doi.org/10.4018/jalr.2012070101.

Abstract:
In this paper the authors experiment with and test a new biomimetic approach based on social spiders to solve a combinatorial problem, i.e., the automatic classification of texts, motivated by the very large data streams that flow on the web in particular. Textual data were represented with a language-independent method, i.e., character and word n-grams, because there is currently no learning method that can directly represent unstructured data (text). To validate the classification, the authors used an evaluation measure based on recall and precision (F-measure). During the experiment, the authors found social spiders to be a powerful visualization tool, which they exploit to perform visual classification.
11

Park, Jun Pyo, Chang-Sup Park, and Yon Dohn Chung. "Energy and Latency Efficient Access of Wireless XML Stream." Journal of Database Management 21, no. 1 (January 2010): 58–79. http://dx.doi.org/10.4018/jdm.2010112303.

Abstract:
In this article, we address the problem of delayed query processing raised by tree-based index structures in wireless broadcast environments, which increases the access time of mobile clients. We propose a novel distributed index structure and a clustering strategy for streaming XML data that enable energy- and latency-efficient broadcasting of XML data. We first define the DIX node structure to implement a fully distributed index structure which contains the tag name, attributes, and text content of an element, as well as its corresponding indices. By exploiting the index information in the DIX node stream, a mobile client can access the stream with shorter latency. We also suggest a method of clustering DIX nodes in the stream, which can further enhance the performance of query processing in the mobile clients. Through extensive experiments, we demonstrate that our approach is effective for wireless broadcasting of XML data and outperforms the previous methods.
12

Vo, Tham. "GOWSeqStream: an integrated sequential embedding and graph-of-words for short text stream clustering." Neural Computing and Applications 34, no. 6 (October 28, 2021): 4321–41. http://dx.doi.org/10.1007/s00521-021-06563-w.

13

Krstajić, Miloš, Mohammad Najm-Araghi, Florian Mansmann, and Daniel A. Keim. "Story Tracker: Incremental visual text analytics of news story development." Information Visualization 12, no. 3-4 (July 2013): 308–23. http://dx.doi.org/10.1177/1473871613493996.

Abstract:
Online news sources produce thousands of news articles every day, reporting on local and global real-world events. New information quickly replaces the old, making it difficult for readers to put current events in the context of the past. The stories about these events have complex relationships and characteristics that are difficult to model: they can be weakly or strongly related or they can merge or split over time. In this article, we present a visual analytics system for temporal analysis of news stories in dynamic information streams, which combines interactive visualization and text mining techniques to facilitate the analysis of similar topics that split and merge over time. Text clustering algorithms extract stories from online news streams in consecutive time windows and identify similar stories from the past. The stories are displayed in a visualization, which (1) sorts the stories by minimizing clutter and overlap from edge crossings, (2) shows their temporal characteristics in different time frames with different levels of detail, and (3) allows incremental updates of the display without recalculating the past data. Stories can be interactively filtered by their duration and connectivity in order to be explored in full detail. To demonstrate the system’s capabilities for detailed dynamic text stream exploration, we present a use case with real news data about the Arabic Uprising in 2011.
14

Busch, Lukas, Ruben van Heusden, and Maarten Marx. "Using Deep-Learned Vector Representations for Page Stream Segmentation by Agglomerative Clustering." Algorithms 16, no. 5 (May 18, 2023): 259. http://dx.doi.org/10.3390/a16050259.

Abstract:
Page stream segmentation (PSS) is the task of retrieving the boundaries that separate source documents given a consecutive stream of documents (for example, sequentially scanned PDF files). The task has recently gained more interest as a result of the digitization efforts of various companies and organizations, as they move towards having all their documents available online for improved searchability and accessibility for users. The current state-of-the-art approach is neural start of document page classification on representations of the text and/or images of pages using models such as Visual Geometry Group-16 (VGG-16) and BERT to classify individual pages. We view the task of PSS as a clustering task instead, hypothesizing that pages from one document are similar to each other and different to pages in other documents, something that is difficult to incorporate in the current approaches. We compare the segmentation performance of an agglomerative clustering method with a binary classification model based on images on a new publicly available dataset and experiment with using either pretrained or finetuned image vectors as inputs to the model. To adapt the clustering method to PSS, we propose the switch method to alleviate the effects of pages of the same class having a high similarity, and report an improvement in the scores using this method. Unfortunately, neither clustering with pretrained embeddings nor clustering with finetuned embeddings outperformed start of document page classification for PSS. However, clustering with either pretrained or finetuned representations is substantially more effective than the baseline, with finetuned embeddings outperforming pretrained embeddings. Finally, having the number of documents K as part of the input, in our use case a realistic assumption, has a surprisingly significant positive effect. 
In contrast to earlier papers, we evaluate PSS with the overlap weighted partial match F1 score, developed as a Panoptic Quality in the computer vision domain, a metric that is particularly well-suited to PSS as it can be used to measure document segmentation.
15

Fazelabdolabadi, Babak, and Mohammad Hossein Golestan. "Towards Bayesian Quantification of Permeability in Micro-scale Porous Structures – The Database of Micro Networks." HighTech and Innovation Journal 1, no. 4 (December 1, 2020): 148–60. http://dx.doi.org/10.28991/hij-2020-01-04-02.

Abstract:
This article develops a Bayesian framework to quantify the absolute permeability of water in a porous structure from the geometry and clustering parameters of its underlying pore-throat network. These parameters include the network's diameter, transitivity, degree, centrality, assortativity, edge density, K-core decomposition, Kleinberg's hub centrality scores, Kleinberg's authority centrality scores, length, and porosity. In addition, the clustering aspects of the networks have been determined with respect to several clustering criteria: edge betweenness, greedy optimization of modularity, multi-level optimization of modularity, and short random walks. As such, the article takes the first steps towards creating a Database of Micro Networks for micro-scale porous structures, to be used as the main input stream for the proposed Bayesian scheme.
16

Weng, Jianshu, and Bu-Sung Lee. "Event Detection in Twitter." Proceedings of the International AAAI Conference on Web and Social Media 5, no. 1 (August 3, 2021): 401–8. http://dx.doi.org/10.1609/icwsm.v5i1.14102.

Abstract:
Twitter, as a form of social media, has been emerging fast in recent years. Users are using Twitter to report real-life events. This paper focuses on detecting those events by analyzing the text stream in Twitter. Although event detection has long been a research topic, the characteristics of Twitter make it a non-trivial task. Tweets reporting such events are usually overwhelmed by a flood of meaningless “babbles”. Moreover, an event detection algorithm needs to be scalable given the sheer number of tweets. This paper attempts to tackle these challenges with EDCoW (Event Detection with Clustering of Wavelet-based Signals). EDCoW builds signals for individual words by applying wavelet analysis to the frequency-based raw signals of the words. It then filters away the trivial words by looking at their corresponding signal autocorrelations. The remaining words are then clustered to form events with a modularity-based graph partitioning technique. Experimental results show promising performance of EDCoW.
17

Kotseva, Bonka, Irene Vianini, Nikolaos Nikolaidis, Nicolò Faggiani, Kristina Potapova, Caroline Gasparro, Yaniv Steiner, et al. "Trend analysis of COVID-19 mis/disinformation narratives–A 3-year study." PLOS ONE 18, no. 11 (November 17, 2023): e0291423. http://dx.doi.org/10.1371/journal.pone.0291423.

Abstract:
To tackle the COVID-19 infodemic, we analysed 58,625 articles from 460 unverified sources, that is, sources that were indicated by fact checkers and other mis/disinformation experts as frequently spreading mis/disinformation, covering the period from 1 January 2020 to 31 December 2022. Our aim was to identify the main narratives of COVID-19 mis/disinformation, develop a codebook, automate the process of narrative classification by training an automatic classifier, and analyse the spread of narratives over time and across countries. Articles were retrieved with a customised version of the Europe Media Monitor (EMM) processing chain providing a stream of text items. Machine translation was employed to automatically translate non-English text to English and clustering was carried out to group similar articles. A multi-level codebook of COVID-19 mis/disinformation narratives was developed following an inductive approach; a transformer-based model was developed to classify all text items according to the codebook. Using the transformer-based model, we identified 12 supernarratives that evolved over the three years studied. The analysis shows that there are often real events behind mis/disinformation trends, which unverified sources misrepresent or take out of context. We established a process that allows for near real-time monitoring of COVID-19 mis/disinformation. This experience will be useful to analyse mis/disinformation about other topics, such as climate change, migration, and geopolitical developments.
18

Kauffmann, Peral, Gil, Ferrández, Sellers, and Mora. "Managing Marketing Decision-Making with Sentiment Analysis: An Evaluation of the Main Product Features Using Text Data Mining." Sustainability 11, no. 15 (August 5, 2019): 4235. http://dx.doi.org/10.3390/su11154235.

Abstract:
Companies have realized the importance of “big data” in creating a sustainable competitive advantage, and user-generated content (UGC) represents one of big data’s most important sources. From blogs to social media and online reviews, consumers generate a huge amount of brand-related information that has a decisive potential business value for marketing purposes. Particularly, we focus on online reviews that could have an influence on brand image and positioning. Within this context, and using the usual quantitative star score ratings, a recent stream of research has employed sentiment analysis (SA) tools to examine the textual content of reviews and categorize buyer opinions. Although many SA tools split comments into negative or positive, a review can contain phrases with different polarities because the user can have different sentiments about each feature of the product. Finding the polarity of each feature can be interesting for product managers and brand management. In this paper, we present a general framework that uses natural language processing (NLP) techniques, including sentiment analysis, text data mining, and clustering techniques, to obtain new scores based on consumer sentiments for different product features. The main contribution of our proposal is the combination of price and the aforementioned scores to define a new global score for the product, which allows us to obtain a ranking according to product features. Furthermore, the products can be classified according to their positive, neutral, or negative features (visualized on dashboards), helping consumers with their sustainable purchasing behavior. We proved the validity of our approach in a case study using big data extracted from Amazon online reviews (specifically cell phones), obtaining satisfactory and promising results. 
After the experimentation, we could conclude that our work is able to improve recommender systems by using positive, neutral, and negative customer opinions and by classifying customers based on their comments.
19

Liu, Yu-Bao, Jia-Rong Cai, Jian Yin, and Ada Wai-Chee Fu. "Clustering Text Data Streams." Journal of Computer Science and Technology 23, no. 1 (January 2008): 112–28. http://dx.doi.org/10.1007/s11390-008-9115-1.

20

Altun, Oğuz, and Orhan Nooruldeen. "SKETRACK: Stroke-Based Recognition of Online Hand-Drawn Sketches of Arrow-Connected Diagrams and Digital Logic Circuit Diagrams." Scientific Programming 2019 (November 26, 2019): 1–17. http://dx.doi.org/10.1155/2019/6501264.

Abstract:
Digitalization of handwritten documents has created a greater need for accurate online recognition of hand-drawn sketches. However, the online recognition of hand-drawn diagrams is an enduring challenge in human-computer interaction due to the complexity in extracting and recognizing the visual objects reliably from a continuous stroke stream. This paper focuses on the design and development of a new, efficient stroke-based online hand-drawn sketch recognition scheme named SKETRACK for hand-drawn arrow diagrams and digital logic circuit diagrams. The fundamental parts of this model are text separation, symbol segmentation, feature extraction, classification, and structural analysis. The proposed scheme utilizes the concepts of normalization and segmentation to isolate the text from the sketches. Then, the features are extracted to model different structural variations of the strokes that are categorized into the arrows/lines and the symbols for effective processing. The strokes are clustered using the spectral clustering algorithm based on p-distance and Euclidean distance to compute the similarity between the features and minimize the feature dimensionality by grouping similar features. Then, the symbol recognition is performed using modified support vector machine (MSVM) classifier in which a hybrid kernel function with a lion optimized tuning parameter of SVM is utilized. Structural analysis is performed with lion-based task optimization for recognizing the symbol candidates to form the final diagram representations. This proposed recognition model is suitable for simpler structures such as flowcharts, finite automata, and the logic circuit diagrams. Through the experiments, the performance of the proposed SKETRACK scheme is evaluated on three domains of databases and the results are compared with the state-of-the-art methods to validate its superior efficiency.
21

Khin Sandar Kyaw, Praman Tepsongkroh, Chanwut Thongkamkaew, and Farida Sasha. "Business Intelligent Framework Using Sentiment Analysis for Smart Digital Marketing in the E-Commerce Era." Asia Social Issues 16, no. 3 (January 10, 2023): e252965. http://dx.doi.org/10.48048/asi.2023.252965.

Abstract:
Since trading has moved to online platforms, marketing strategies have adapted to digital systems in order to enhance Customer Relationship Management (CRM) in the E-commerce era. E-commerce systems are the most widely used digital platforms, where customer information, including personal and behavioral information, flows as a big data stream. Conducting business intelligence observation on digital big data helps to improve digital marketing policy through customer intention prediction, advertising decisions based on target group clustering, and customer assistance recommendation. To discover business intelligence, sentiment analysis can help to understand customer behavior through opinion mining, in which natural language processing, text analysis, computational linguistics, and biometrics are used to analyse customer information and feedback for smart digital marketing applications. This research surveys the applications of sentiment analysis in E-commerce systems as a comprehensive study and points out, from a technical perspective, the critical role of discovering business intelligence for smart digital marketing in E-commerce platforms. Furthermore, the concept of a business intelligent framework that integrates the modelling of decision-making, prediction, and recommendation systems, using hybrid feature selection based on rule-based and machine learning-based sentiment analysis, is proposed for future innovative smart digital marketing trends.
22

Reino, Stella, Elena M. Rossi, Robyn E. Sanderson, Elena Sellentin, Amina Helmi, Helmer H. Koppelman, and Sanjib Sharma. "Galactic potential constraints from clustering in action space of combined stellar stream data." Monthly Notices of the Royal Astronomical Society 502, no. 3 (February 4, 2021): 4170–93. http://dx.doi.org/10.1093/mnras/stab304.

Abstract:
Stream stars removed by tides from their progenitor satellite galaxy or globular cluster act as a group of test particles on neighbouring orbits, probing the gravitational field of the Milky Way. While constraints from individual streams have been shown to be susceptible to biases, combining several streams from orbits with various distances reduces these biases. We fit a common gravitational potential to multiple stellar streams simultaneously by maximizing the clustering of the stream stars in action space. We apply this technique to members of the GD-1, Palomar 5 (Pal 5), Orphan, and Helmi streams, exploiting both the individual and combined data sets. We describe the Galactic potential with a Stäckel model, and vary up to five parameters simultaneously. We find that we can only constrain the enclosed mass, and that the strongest constraints come from the GD-1, Pal 5, and Orphan streams whose combined data set yields $M(<20\,\mathrm{kpc}) = 2.96^{+0.25}_{-0.26} \times 10^{11}\,\mathrm{M}_{\odot}$. When including the Helmi stream in the data set, the mass uncertainty increases to $M(<20\,\mathrm{kpc}) = 3.12^{+3.21}_{-0.46} \times 10^{11}\,\mathrm{M}_{\odot}$.
23

Nagasuresh, Mr M., and Ms R. Roopa. "Big Data Stream Mining Using Integrated Framework with Classification and Clustering Methods." International Journal for Research in Applied Science and Engineering Technology 11, no. 4 (April 30, 2023): 2503–9. http://dx.doi.org/10.22214/ijraset.2023.50695.

Abstract:
The rapid development of industrial firms, the vast amount of data generated by their innovations, and the exponential growth of their websites give rise to many kinds of big data and data stream problems. There are numerous stream data mining algorithms for classification and clustering, each with its own attributes and important features. Ensemble classifiers help to improve the prediction performance of these techniques. Ensemble approaches train multiple classifiers and clusterers rather than a single classifier, and their predictions are merged through a voting scheme. This research proposes a framework for stream data mining based on the misclassification of stream data, utilizing the advantages of ensemble techniques. Real-world data streams are used in the experiments. The experimental results are compared with popular modern ensemble techniques such as Boosting and Bagging. The test results show an increase in accuracy and a decrease in classification time.
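The merging of several classifiers' predictions by voting, as described in this abstract, can be illustrated with a minimal majority-vote sketch. It is not the framework proposed in the article: the three rule-based classifiers below are hypothetical stand-ins for trained stream classifiers, and the tie-breaking rule (the label seen first wins) is an assumption of this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, item):
    """Ask every classifier for a label and merge the answers by voting."""
    return majority_vote([clf(item) for clf in classifiers])


# Hypothetical toy classifiers standing in for trained stream models.
classifiers = [
    lambda text: "spam" if "free" in text else "ham",
    lambda text: "spam" if "winner" in text else "ham",
    lambda text: "spam" if len(text) > 40 else "ham",
]

print(ensemble_predict(classifiers, "free prize for the winner"))
```

Because each vote is independent of the others, a single weak classifier that errs on an item can be outvoted by the remaining members of the ensemble.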
24

Jiang, Xiaobo, Yunchuan Jiang, Leping Liu, Meng Xia, and Yunlu Jiang. "Time dimension feature extraction and classification of high-dimensional large data streams based on unsupervised learning." Journal of Computational Methods in Sciences and Engineering 24, no. 2 (May 10, 2024): 835–48. http://dx.doi.org/10.3233/jcm-237085.

Abstract:
To address the low accuracy of time dimension feature extraction and classification for high-dimensional large data streams, this paper proposes a time dimension feature extraction and classification algorithm for such streams based on unsupervised learning. The method analyses the trend of high-dimensional data flow changes under machine learning and reduces the dimensionality of high-dimensional, high-volume time dimension data through locality preserving projection. It analyses the spatial relationship between feature attributes and the feature space, segments and fits the high-dimensional big data streams and the time dimension feature data streams, further segments the time dimension sequences using sliding windows, and completes feature extraction through the discrete dyadic wavelet transform. It then clusters the time dimension feature data stream with a clustering algorithm, calculates the cosine similarity of the feature data, models the time dimension feature stream of the training samples, uses a feature classification function to minimize the classification loss, and uses unsupervised learning to perform the final classification. The test results show that this method improves the accuracy of time dimension feature extraction and classification for data streams.
25

Lee, Chung-Hong, Hsin-Chang Yang, Yenming J. Chen, and Yung-Lin Chuang. "Event Monitoring and Intelligence Gathering Using Twitter Based Real-Time Event Summarization and Pre-Trained Model Techniques." Applied Sciences 11, no. 22 (November 11, 2021): 10596. http://dx.doi.org/10.3390/app112210596.

Abstract:
Recently, detecting real-world events in real time from Twitter messages through algorithmic computation has become a new paradigm in data science applications. During a high-impact event, people want the latest information about its development so that they can better understand the situation and its possible trends when making decisions. In emergencies, however, governments and enterprises are often unable to notify people in time for early warning and risk avoidance. A sensible solution is to integrate real-time event monitoring and intelligence gathering functions into their decision support systems. Such a system can provide real-time event summaries, which are updated whenever important new events are detected. Therefore, in this work, we combine a Twitter-based real-time event detection algorithm with pre-trained language models for summarizing emergent events. We used an online text-stream clustering algorithm and a self-adaptive method to gather the Twitter data for the detection of emerging events. We then used the XSum data set with a pre-trained language model, namely the T5 model, to train the summarization model. The ROUGE metrics were used to compare the summarization performance of various models. We then used the trained model to summarize the incoming Twitter data for experimentation. In particular, we provide a real-world case study, the COVID-19 pandemic, to verify the applicability of the proposed method. Finally, we conducted a survey in which human judges assessed the quality of the generated summaries. The case study and experimental results demonstrate that our summarization method gives users a feasible way to quickly understand updates on a specific event based on the real-time summary of the event story.
26

Aggarwal, Charu C., and Philip S. Yu. "On clustering massive text and categorical data streams." Knowledge and Information Systems 24, no. 2 (August 6, 2009): 171–96. http://dx.doi.org/10.1007/s10115-009-0241-z.

27

Rekik, Amal, and Salma Jamoussi. "Incremental autoencoders for text streams clustering in social networks." JUCS - Journal of Universal Computer Science 27, no. 11 (November 28, 2021): 1203–21. http://dx.doi.org/10.3897/jucs.76770.

Abstract:
Clustering data streams in order to detect trending topics on social networks is a challenging task that interests researchers in the big data field. In fact, analyzing such data requires addressing several requirements due to their large volume and evolving nature. For this purpose, we propose, in this paper, a new evolving clustering method which can take into account the incremental nature of the data and meet its principal requirements. Our method explores a deep learning technique to learn incrementally from unlabelled examples generated at high speed which need to be clustered instantly. To evaluate the performance of our method, we have conducted several experiments using the Sanders, HCR and Terr-Attacks datasets.
28

Bryant, Avory C., and Krzysztof J. Cios. "SOTXTSTREAM: Density-based self-organizing clustering of text streams." PLOS ONE 12, no. 7 (July 7, 2017): e0180543. http://dx.doi.org/10.1371/journal.pone.0180543.

29

V Prasad, Gollanapalli, Kapil Sharma, Rama Krishna B, S. Krishna Mohan Rao, and Venkatadri M. "Labelled Classifier with Weighted Drift Trigger Model using Machine Learning for Streaming Data Analysis." International journal of electrical and computer engineering systems 13, no. 5 (July 15, 2022): 349–56. http://dx.doi.org/10.32985/ijeces.13.5.3.

Abstract:
The term “data-drift” refers to a difference between the data used to test and validate a model and the data used to deploy it in production. It is possible for data to drift for a variety of reasons. The passage of time is an important consideration. Data mining procedures such as classification, clustering, and data stream mining are critical to information extraction and knowledge discovery because of the possibility for significant data type and dimensionality changes over time. The amount of research on mining and analyzing real-time streaming data has risen dramatically in the recent decade. As the name suggests, it’s a stream of data that originates from a number of sources. Analyzing information assets has taken on increased significance in the quest for real-time analytics fulfilment. Traditional mining methods are no longer effective since data is acting in a different way. Aside from storage and temporal constraints, data streams provide additional challenges because just a single pass of the data is required. The dynamic nature of data streams makes it difficult to run any mining method, such as classification, clustering, or indexing, in a single iteration of data. This research identifies concept drift in streaming data classification. For data classification techniques, a Labelled Classifier with Weighted Drift Trigger Model (LCWDTM) is proposed that provides categorization and the capacity to tackle concept drift difficulties. The efficiency of the proposed classifier is compared with that of existing classifiers, and the results show that the proposed model is accurate and efficient in data drift detection.
30

Costa, António C., J. A. Tenreiro Machado, and Maria Dulce Quelhas. "Multidimensional Scaling Applied to Histogram-Based DNA Analysis." Comparative and Functional Genomics 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/289694.

Abstract:
This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four test correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data and not artifacts of the method.
31

Qiu, Zhangcheng, and Hong Shen. "User clustering in a dynamic social network topic model for short text streams." Information Sciences 414 (November 2017): 102–16. http://dx.doi.org/10.1016/j.ins.2017.05.018.

32

PhridviRaj, Chintakindi Srinivas, and C. V. GuruRao. "Clustering Text Data Streams – A Tree based Approach with Ternary Function and Ternary Feature Vector." Procedia Computer Science 31 (2014): 976–84. http://dx.doi.org/10.1016/j.procs.2014.05.350.

33

Hart, Neil C. G., Suzanne L. Gray, and Peter A. Clark. "Detection of Coherent Airstreams Using Cluster Analysis: Application to an Extratropical Cyclone." Monthly Weather Review 143, no. 9 (August 31, 2015): 3518–31. http://dx.doi.org/10.1175/mwr-d-14-00382.1.

Abstract:
Flow in geophysical fluids is commonly summarized by coherent streams (e.g., conveyor belt flows in extratropical cyclones or jet streaks in the upper troposphere). Typically, parcel trajectories are calculated from the flow field and subjective thresholds are used to distinguish coherent streams of interest. This methodology contribution develops a more objective approach to distinguish coherent airstreams within extratropical cyclones. Agglomerative clustering is applied to trajectories along with a method to identify the optimal number of cluster classes. The methodology is applied to trajectories associated with the low-level jets of a well-studied extratropical cyclone. For computational efficiency, a constraint that trajectories must pass through these jet regions is applied prior to clustering; the partitioning into different airstreams is then performed by the agglomerative clustering. It is demonstrated that the methodology can identify the salient flow structures of cyclones: the warm and cold conveyor belts. A test focusing on the airstreams terminating at the tip of the bent-back front further demonstrates the success of the method in that it can distinguish finescale flow structure such as descending sting-jet airstreams.
34

Guo, Haoxu, Xiaodong Qiu, and Lixiang Chen. "Orbital-angular-momentum-based optical clustering via nonlinear optics." Applied Physics Letters 122, no. 6 (February 6, 2023): 061103. http://dx.doi.org/10.1063/5.0135728.

Abstract:
Machine learning offers a convenient and intelligent tool for a variety of applications in fields ranging from fundamental research to financial analysis. With the explosive growth of data streams, i.e., “big data,” optical machine learning with the inherent capacity for massive parallel processing is gradually attracting attention. Despite significant experimental and theoretical progress in this area, high dimensional optical vector or matrix operation is still challenging because it is limited by the coherent manipulation of multibeams. Here, by using the second harmonic generation of high dimensional orbital angular momentum superposition states, we present a compact and robust optical clustering machine, which is the crucial component in machine learning. In experiments, we conduct supervised clustering for classification of three- and eight-dimensional vectors and unsupervised clustering for text mining of 14-dimensional texts, both with high accuracies. The presented optical clustering scheme could offer a pathway for constructing high speed and low energy consumption machine learning architectures.
35

Jensen, Jaclyn, Guillaume Thomas, Alan W. McConnachie, Else Starkenburg, Khyati Malhan, Julio Navarro, Nicolas Martin, et al. "Uncovering fossils of the distant Milky Way with UNIONS: NGC 5466 and its stellar stream." Monthly Notices of the Royal Astronomical Society 507, no. 2 (August 12, 2021): 1923–36. http://dx.doi.org/10.1093/mnras/stab2325.

Abstract:
We examine the spatial clustering of blue horizontal branch (BHB) stars from the u-band of the Canada–France Imaging Survey (CFIS, a component of the Ultraviolet Near-Infrared Optical Northern Survey, or UNIONS). All major groupings of stars are associated with previously known satellites, and among these is NGC 5466, a distant (16 kpc) globular cluster. NGC 5466 reportedly possesses a long stellar stream, although no individual members of the stream have previously been identified. Using both BHBs and more numerous red giant branch stars cross-matched to Gaia Data Release 2, we identify extended tidal tails from NGC 5466 that are both spatially and kinematically coherent. Interestingly, we find that this stream does not follow the same path as the previous detection at large distances from the cluster. We trace the stream across 31° of sky and show that it exhibits a very strong distance gradient in the range 10 < Rhelio < 30 kpc. We compare our observations to simple dynamical models of the stream and find that they are able to broadly reproduce the overall path and kinematics. The fact that NGC 5466 is so distant, traces a wide range of Galactic distances, has an identified progenitor, and appears to have recently had an interaction with the Galaxy’s disc makes it a unique test-case for dynamical modelling of the Milky Way.
36

Sapozhnikov, Sergey, and Dana Kovaleva. "Clustering stellar pairs to detect extended stellar structures." Proceedings of the International Astronomical Union 16, S362 (June 2020): 150–51. http://dx.doi.org/10.1017/s1743921322001697.

Abstract:
Gaia data allow a search for extended stellar structures in phase (coordinates plus velocities) space. We describe a method of applying the DBSCAN clustering algorithm, which groups closely packed data points, to a list of preliminarily selected pairs of stars with parameters expected to be found within stellar streams and comoving groups: loose structures in which stars are not gravitationally bound, but do share motion and evolutionary properties. To test our approach, we construct a model population of background stars and apply the pair-constructing and clustering algorithms to it. Results show that transitioning to a list of pairs sharply reveals structures not present in the background model, which then become more apparent targets for the DBSCAN algorithm in coordinate-velocity phase space thanks to the increased relative density of the extended stellar structure.
37

Ismalina, Poppy. "What Factors Constitute Structures of Clustering Creative Industries? Incorporating New Institutional Economics and New Economic Sociology into A Conceptual Framework." Gadjah Mada International Journal of Business 14, no. 3 (November 27, 2012): 213. http://dx.doi.org/10.22146/gamaijb.5454.

Abstract:
Creative industries tend to cluster in specific places and the reasons for this phenomenon can be a multiplicity of elements linked mainly to culture, creativity, innovation and local development. In the international literature, it is pretty well recognized that creativity is frequently characterized by the agglomeration of firms so that creative industries are not homogeneously distributed across the territory but they are concentrated in the space. Three theories are becoming the dominant theoretical perspectives in agglomeration economies theory and they are increasingly being applied in industrial clusters analysis to study the effect of clustering industries. The theories are Marshall’s theoretical principles of localization economies, Schmitz’s collective efficiency and Porter’s five-diamond approach. However, those have adequately theorized neither the institutionalization process through which change takes place nor the socio-economic context of the institutional formations of clustering creative industries. This text begins by reviewing three main theories to more fully articulate institutionalization processes of an economic institution. Specifically, this paper incorporates new institutional economics (NIE) and new economic sociology (NES) to explain the processes associated with creating institutional practices within clustering creative industries. Both streams of institutional theory constitute that economic organizations are socially constructed. Next, this text proposes the framework that depicts the socio-economic context better and more directly addresses the dynamics of enacting, embedding and changing organizational features and processes within clustering creative industries. Some pertinent definitions are offered to be used in a conceptual framework of research about how economic institutions like clustering creative industries constitute their structures.
38

Goodsitt, Jan V., James L. Morgan, and Patricia K. Kuhl. "Perceptual strategies in prelingual speech segmentation." Journal of Child Language 20, no. 2 (June 1993): 229–52. http://dx.doi.org/10.1017/s0305000900008266.

Abstract:
Previous work has suggested that infants may segment continuous speech by a BRACKETING STRATEGY that segregates portions of the speech stream based on prosodic cues to their endpoints. The two present studies were designed to assess whether infants also can deploy a CLUSTERING STRATEGY that exploits asymmetries in transitional probabilities between successive elements, aggregating elements with high transitional probabilities and identifying points of low transitional probabilities as boundaries between units. These studies examined effects of the structure and redundancy of speech context on infants' discrimination of two target syllables using an operant head-turning procedure. After discrimination training on the target syllables in isolation, discrimination maintenance was tested when the target syllables were embedded in one of three contexts. Invariant Order contexts were structured to promote clustering, whereas the Redundant and Variable Order contexts were not. Thirty-six seven-month-olds were tested in Experiment 1, in which stimuli were produced with varying intonation contours; 36 eight-month-olds were tested in Experiment 2, in which stimuli were produced with comparable flat pitch contours. In both experiments, performance of the three groups was equivalent in an initial 20-trial test. However, in a second 20-trial test, significant improvements in performance were shown by infants in the Invariant Order condition. No such gains were shown by infants in the other two conditions. These studies suggest that clustering may complement bracketing in infants' discovery of units of language.
39

Tidke, Bharat, Rupa Mehta, Dipti Rana, and Hullash Jangir. "Topic Sensitive User Clustering Using Sentiment Score and Similarity Measures." International Journal of Web-Based Learning and Teaching Technologies 15, no. 2 (April 2020): 34–45. http://dx.doi.org/10.4018/ijwltt.2020040103.

Abstract:
Social media data (SMD) is driven by statistical and analytical technologies to obtain information for various decisions. SMD is vast and evolutionary in nature, which makes traditional data warehouses ill suited. The research aims to propose and implement a novel framework that analyzes tweet data from an online social networking site (OSN; i.e., Twitter). The authors fetch streaming tweets from the Twitter API using Apache Flume to detect clusters of users having similar sentiment. The proposed approach utilizes a scalable and fault-tolerant system (i.e., Hadoop) that typically harnesses HDFS for data storage and the map-reduce paradigm for data processing. Apache Hive is used to work on top of Hadoop for querying data. The experiments are performed to test the scalability of the proposed framework by examining various sizes of data. The authors' goal is to handle big social data effectively using cost-effective tools for fetching as well as querying unstructured data and algorithms for analysing scalable, uninterrupted data streams with finite memory and resources.
40

Yang, Tianyi, Supranta S. Boruah, and Niayesh Afshordi. "Gravitational potential from small-scale clustering in action space: application to Gaia Data Release 2." Monthly Notices of the Royal Astronomical Society 493, no. 3 (March 7, 2020): 3061–80. http://dx.doi.org/10.1093/mnras/staa441.

Abstract:
Most measurements of mass in astronomy that use kinematics of stars or gas rely on assumptions of equilibrium that are often hard to verify. Instead, we develop a novel idea that uses the clustering in action space as a probe of the underlying gravitational potential: the correct potential should maximize small-scale clustering in the action space. We provide a first-principle derivation of likelihood using the two-point correlation function in action space, and we test it against simulations of stellar streams. We then apply this method to the second data release of Gaia, and we use it to measure the radial force fraction fh and logarithmic slope α of the dark matter halo profile. We investigate stars within 9–11 kpc and 11.5–15 kpc from the Galactic Centre, and we find (fh, α) = (0.391 ± 0.009, 1.835 ± 0.092) and (0.351 ± 0.012, 1.687 ± 0.079), respectively. We also confirm that the set of parameters that maximize the likelihood function does correspond to the most clustering in the action space. The best-fitting circular velocity curve for the Milky Way potential is consistent with past measurements (although it is ∼5–10 per cent lower than previous methods that use masers or globular clusters). Our work provides a clear demonstration of the full statistical power that lies in the full phase space information, relieving the need for ad hoc assumptions such as virial equilibrium, circular motion or stream-finding algorithms.
41

Genc, Onur, Ozgur Kisi, and Mehmet Ardiclioglu. "Modeling velocity distributions in small streams using different neuro-fuzzy and neural computing techniques." Journal of Water and Climate Change 11, no. 2 (January 29, 2019): 390–401. http://dx.doi.org/10.2166/wcc.2019.103.

Abstract:
Accurate estimation of velocity distribution in open channels or streams (especially in turbulent flow conditions) is very important and its measurement is very difficult because of spatio-temporal variation in velocity vectors. In the present study, velocity distribution in streams was estimated by two different artificial neural networks (ANN), ANN with conjugate gradient (ANN-CG) and ANN with Levenberg–Marquardt (ANN-LM), and two different adaptive neuro-fuzzy inference systems (ANFIS), ANFIS with grid partition (ANFIS-GP) and ANFIS with subtractive clustering (ANFIS-SC). The performance of the proposed models was compared with the multiple-linear regression (MLR) model. The comparison results revealed that the ANN-CG, ANN-LM, ANFIS-GP, and ANFIS-SC models performed better than the MLR model in estimating velocity distribution. Among the soft computing methods, the ANFIS-GP was observed to be better than the ANN-CG, ANN-LM, and ANFIS-SC models. The root mean square errors (RMSE) and mean absolute errors (MAE) of the MLR model were reduced by 69% and 72%, respectively, using the ANFIS-GP model to estimate velocity distribution in the test period.
42

Ammirato, Salvatore, Alberto Michele Felicetti, Cinzia Raso, Bruno Antonio Pansera, and Antonio Violi. "Agritourism and Sustainability: What We Can Learn from a Systematic Literature Review." Sustainability 12, no. 22 (November 17, 2020): 9575. http://dx.doi.org/10.3390/su12229575.

Abstract:
Scholars from different perspectives agree that agritourism can be the right tool to balance the needs of tourists with those of rural communities, offering real opportunities for economic and social development, while mitigating undesirable impacts on the environment. This paper aims to provide a holistic outlook of the different perspectives under which scientific literature deals with the topic of agritourism as a means to support the sustainable development of rural areas. To reach this aim, we performed a systematic review of the scientific literature in order to point out the linkages between agritourism and sustainability. We analyzed papers through a text mining solution based on the Latent Dirichlet Allocation (LDA) technique to point out the main topics around which the scientific literature on agritourism and sustainability has grown. Topics are further categorized in themes by means of an agglomerative hierarchical clustering procedure. Results are further analyzed to highlight the strengths and weaknesses of the current streams of the literature.
43

Wang, Guan-Yu, Hai-Feng Wang, Yang-Ping Luo, Yuan-Sen Ting, Thor Tepper-García, Joss Bland-Hawthorn, and Jeffrey Carlin. "Galactic-Seismology Substructures and Streams Hunter with LAMOST and Gaia. I. Methodology and Local Halo Results." Astrophysical Journal 974, no. 2 (October 1, 2024): 219. http://dx.doi.org/10.3847/1538-4357/ad6d59.

Abstract:
We present a novel, deep-learning-based method, dubbed Galactic-Seismology Substructures and Streams Hunter, or GS3 Hunter for short, to search for substructures and streams in stellar kinematics data. GS3 Hunter relies on a combined application of Siamese neural networks to transform the phase space information and the K-means algorithm for the clustering. As a validation test, we apply GS3 Hunter to a subset of the Feedback in Realistic Environments (FIRE) cosmological simulations. The stellar streams and substructures thus identified are in good agreement with corresponding results reported earlier by the FIRE team. In the same vein, we apply our method to a subset of local halo stars from the Gaia Early Data Release 3 and GALAH DR3 data sets and recover several previously known dynamical groups, such as Thamnos 1+2, the hot thick disk, ED-1, L-RL3, Helmi 1+2, Gaia-Sausage-Enceladus, Sequoia, Virgo Radial Merger, Cronus, and Nereus. Finally, we apply our method without fine-tuning to a subset of K giant stars located in the inner halo region, obtained from the LAMOST Data Release 5 data set. We recover three previously known structures (Sagittarius, Hercules-Aquila Cloud, and the Virgo Overdensity), but we also discover a number of new substructures. We anticipate that GS3 Hunter will become a useful tool for the community dedicated to the search for stellar streams and structures in the Milky Way (MW) and the Local Group, thus helping advance our understanding of the stellar inner and outer halos and the assembly and tidal stripping history in and around the MW.
44

Baloian, Nelson, and José Pino. "Editorial introduction to J.UCS special issue Challenges for Smart Environments – Human-Centered Computing, Data Science, and Ambient Intelligence I." JUCS - Journal of Universal Computer Science 27, no. 11 (November 28, 2021): 1149–51. http://dx.doi.org/10.3897/jucs.76554.

Abstract:
Modern technologies and various domains of human activities increasingly rely on data science to develop smarter and autonomous systems. This trend has already changed the whole landscape of the global economy becoming more AI-driven. Massive production of data by humans and machines, its availability for feasible processing with advent of deep learning infrastructures, combined with advancements in reliable information transfer capacities, open unbounded horizons for societal progress in close future. Quite naturally, this brings also new challenges for science and industry. In that context, Internet of things (IoT) is an enormously huge factory of monitoring and data generation. It enables countless devices to act as sensors which record and manipulate data, while requiring efficient algorithms to derive actionable knowledge. Billions of end-users equipped with smart mobile phones are also producing immensely large volumes of data, being it about user interaction or indirect telemetry such as location coordinates. Social networks represent another kind of data-intensive sources, with both structured and unstructured components, containing valuable information about world’s connectivity, dynamism, and more. Last but not least, to help businesses run smoothly, today’s cloud computing infrastructures and applications are also serviced and managed through measuring huge amounts of data to leverage in various predictive and automation tasks for healthy performance and permanent availability. Therefore, all these technology areas, experts and practitioners, are facing innovation challenges on building novel methodologies, accurate models, and systems for respective data-driven solutions which are effective and efficient. In view of the complexity of contemporary neural network architectures and models with millions of parameters they derive, one of such challenges is related to the concept of explainability of the machine learning models. 
It refers to the ability of the model to give information which can be interpreted by humans about the reasons for the decision made or recommendation released. These challenges can only be met with a mix of basic research, process modeling and simulation under uncertainty using qualitative and quantitative methods from the involved sciences, and taking into account international standards and adequate evaluation methods. Based on a successful funded collaboration between the American University of Armenia, the University of Duisburg-Essen and the University of Chile, in previous years a network was built, and in September 2020 a group of researchers gathered (although virtually) for the 2nd CODASSCA workshop on “Collaborative Technologies and Data Science in Smart City Applications”. This event has attracted 25 paper submissions which deal with the problems and challenges mentioned above. The studies are in specialized areas and disclose novel solutions and approaches based on existing theories suitably applied. The authors of the best papers published in the conference proceedings on Collaborative Technologies and Data Science in Artificial Intelligence Applications by Logos edition Berlin were invited to submit significantly extended and improved versions of their contributions to be considered for a journal special issue of J.UCS. There was also a J.UCS open call so that any author could submit papers on the highlighted subject. For this volume, we selected those dealing with more theoretical issues which were rigorously reviewed in three rounds and 6 papers nominated to be published. 
The editors would like to express their gratitude to J.UCS foundation for accepting the special issues in their journal, to the German Research Foundation (DFG), the German Academic Exchange Service (DAAD) and the universities and sponsors involved for funding the common activities and thank the editors of the CODASSCA2020 proceedings for their ongoing encouragement and support, the authors for their contributions, and the anonymous reviewers for their invaluable support. The paper “Incident Management for Explainable and Automated Root Cause Analysis in Cloud Data Centers” by Arnak Poghosyan, Ashot Harutyunyan, Naira Grigoryan, and Nicholas Kushmerick addresses an increasingly important problem towards autonomous or self-X systems, intelligent management of modern cloud environments with an emphasis on explainable AI. It demonstrates techniques and methods that greatly help in automated discovery of explicit conditions leading to data center incidents. The paper “Temporal Accelerators: Unleashing the Potential of Embedded FPGAs” by Christopher Cichiwskyj and Gregor Schiele presents an approach for executing computational tasks that can be split into sequential sub-tasks. It divides accelerators into multiple, smaller parts and uses the reconfiguration capabilities of the FPGA to execute the parts according to a task graph. That improves the energy consumption and the cost of using FPGAs in IoT devices. The paper “On Recurrent Neural Network based Theorem Prover for First Order Minimal Logic” by Ashot Baghdasaryan and Hovhannes Bolibekyan investigates using recurrent neural networks to determine the order of proof search in a sequent calculus for first-order minimal logic with a history mechanism. It demonstrates reduced durations in automated theorem proving systems.  The paper “Incremental Autoencoders for Text Streams Clustering in Social Networks” by Amal Rekik and Salma Jamoussi proposes a deep learning method to identify trending topics in a social network. 
It is built on detecting changes in streams of tweets. The method is experimentally validated to outperform relevant data stream algorithms in identifying “hot” topics. The paper “E-Capacity–Equivocation Region of Wiretap Channel” by Mariam Haroutunian studies a secure communication problem over the wiretap channel, where information transfer from the source to a legitimate receiver needs to be realized maximally secretly for an eavesdropper. This is an information-theoretic research which generalizes the capacity-equivocation region and secrecy-capacity function of the wiretap channel subject to error exponent criterion, thus deriving new and extended fundamental limits in reliable and secure communication in presence of a wiretapper. The paper “Leveraging Multifaceted Proximity Measures among Developers in Predicting Future Collaborations to Improve the Social Capital of Software Projects” by Amit Kumar and Sonali Agarwal targets improving the social capital of individual software developers and projects using machine learning. Authors’ approach applies network proximity and developer activity features to build a classifier for predicting the future collaborations among developers and generating relevant recommendations.
45

M, Sriram, Susmithaa Raam A, Vignesh B, and Dr Balasubramanian V. "End-to-End Machine Learning Pipeline for Real-Time Network Traffic Classification and Monitoring in Android Automotive." International Journal of Innovative Technology and Exploring Engineering 11, no. 7 (June 30, 2022): 32–38. http://dx.doi.org/10.35940/ijitee.g9982.0611722.

Abstract:
The aim of this work is to build a network traffic monitoring application that is capable of categorizing network data traffic based on application usage into 7 types: Browsing, Chat, Email, File Transfer, Streaming, VoIP and P2P. Flow-wise data is analyzed after the traffic stream is fed into the CICFlowmeter. Live traffic flow is fed to various ML models and algorithms such as the K-Means Clustering algorithm, Agglomerative Clustering, the Mean-shift algorithm, the Random Forest Classifier, the Adaptive Boosting algorithm, the Gradient Boosting algorithm, Linear Discriminant Analysis, the Naive Bayes classifier, Classification and Regression Trees and the Support Vector Machine model. A K-fold cross-validation test is conducted, whose results show the Random Forest Classifier to be the best of the models. We used 23 features for model training based on their importance. Model evaluation is done using the confusion matrix. Class imbalances are handled effectively with a comparative study of both under-sampling and oversampling of the dataset. Oversampling using SMOTE produces better results. The important time-based features in classification are recorded for further studies. The model used was fast enough to classify the flows in real time and display the analytics in the dashboard. The Flask framework is used to build a live dashboard to display the classified network traffic along with several important features. We were able to prove that network traffic classification can be done using time-based features, which does not violate data protection laws. Network traffic classification using the Random Forest algorithm on the oversampled dataset gave an overall accuracy of 0.92.
46

Li, Jin, and Ruibo Zhao. "Integrated Classification Algorithm for Unbalanced Data Streams Based on Joint Nonnegative Matrix Factorization." Wireless Communications and Mobile Computing 2022 (June 14, 2022): 1–12. http://dx.doi.org/10.1155/2022/5659979.

Abstract:
The purpose of this paper is to study the unbalanced data flow integration classification algorithm based on joint nonnegative matrix factorization, in order to solve the problem that the basic clustering results obtained from the original data set have some information loss, thereby reducing the effective information in the integration stage. In this paper, the accuracy of the unbalanced data and the detection time consumption are selected as the research object. Six data sets with imbalanced proportions of minority and majority samples are selected for experiments. Mathematical statistical analysis is first used to observe text classification, disease diagnosis, and network intrusion detection and the classification accuracy of majority class and minority class; the commonly used algorithm for unbalanced data is statistical analysis method. Comparing the univariate method for comprehensive classification of unbalanced data flow based on nonnegative matrix factorization with the unbalanced data algorithm, the observation has accurate rate and detects time-consuming changes. Among them, the comprehensive classification algorithm of unbalanced data flow is based on the classification of data, classifying the data, judging whether two data points belong to the same category, and determining their degree of balance. The research data shows that the unbalanced data flow integrated classification algorithm based on joint nonnegative matrix decomposition can reasonably evaluate the classification performance of the classifier for a few classes, and the detection speed is faster and saves more time. The experimental research shows that the algorithm combines the relationship matrix and information matrix from the original data set into a consensus function, uses NMF technology to obtain the membership matrix, effectively uses potential information, improves the accuracy rate of 69.73%, and shortens 71.65% of the time consumed.
47

Indumathy, D., K. Ramesh, G. Senthilkumar, S. Sudha, and T. A. Mohanaprakash. "Investigations on coronary artery plaque detection and subclassification using machine learning classifier." Journal of X-Ray Science and Technology 30, no. 3 (April 15, 2022): 513–29. http://dx.doi.org/10.3233/xst-211077.

Abstract:
Coronary artery diseases are high-risk diseases that occur due to insufficient blood supply to the heart. The different types of plaques formed inside the artery lead to the blockage of the blood stream. Understanding the type of plaques, along with the detection and classification of plaques, helps reduce patient mortality. The objective of this study is to present a novel clustering method of plaque segmentation followed by wavelet transform based feature extraction. The extracted features of all different kinds of calcified and sub calcified plaques are first used to train and test three machine learning classifiers: support vector machine, random forest and decision tree classifiers. The bootstrap ensemble classifier then decides the best classification result through a voting method of the three classifiers. A training dataset including 64 normal CTA images and 73 abnormal CTA images is used, while a testing dataset consists of 111 normal CTA images and 103 abnormal CTA images. The evaluation metrics show a better classification rate and an accuracy of 97.7%. The sensitivity and specificity rates are 97.8% and 97.5%, respectively. Our results demonstrate the feasibility and advantages of developing and applying this new image processing and machine learning scheme to assist coronary artery plaque detection and classification.
48

Cavallaro, Matteo, David Flacher, and Massimo Angelo Zanetti. "Radical right parties and European economic integration: Evidence from the seventh European Parliament." European Union Politics 19, no. 2 (March 11, 2018): 321–43. http://dx.doi.org/10.1177/1465116518760241.

Abstract:
This article explores the differences in radical right parties' voting behaviour on economic matters at the European Parliament. As the literature highlights the heterogeneity of these parties in relation to their economic programmes, we test whether divergences survive the elections and translate into dissimilar voting patterns. Using voting records from the seventh term of the European Parliament, we show that radical right parties do not act as a consolidated party family. We then analyse the differences between radical right parties by means of different statistical methods (NOMINATE, Ward's clustering criterion, and additive trees) and find that these are described along two dimensions: the degree of opposition to the European Union and the classical left–right economic cleavage. We provide a classification of these parties comprising four groups: pro-welfare conditional, pro-market conditional, and rejecting. Our results indicate that radical right parties do not act as a party family at the European Parliament. This remains true regardless of the salience of the policy issues in their agendas. The article also derives streams for future research on the heterogeneity of radical right parties.
49

Gagne, David John, Amy McGovern, and Jerry Brotzge. "Classification of Convective Areas Using Decision Trees." Journal of Atmospheric and Oceanic Technology 26, no. 7 (July 1, 2009): 1341–53. http://dx.doi.org/10.1175/2008jtecha1205.1.

Abstract:
This paper presents an automated approach for classifying storm type from weather radar reflectivity using decision trees. Recent research indicates a strong relationship between storm type (morphology) and severe weather, and such information can aid in the warning process. Furthermore, new adaptive sensing tools, such as the Center for Collaborative Adaptive Sensing of the Atmosphere’s (CASA’s) weather radar, can make use of storm-type information in real time. Given the volume of weather radar data from those tools, manual classification of storms is not possible when dealing with real-time data streams. An automated system can more quickly and efficiently sort through real-time data streams and return value-added output in a form that can be more easily manipulated and understood. The method of storm classification in this paper combines two machine learning techniques: K-means clustering and decision trees. K-means segments the reflectivity data into clusters, and decision trees classify each cluster. K-means was used to separate isolated cells from linear systems. Each cell received labels such as “isolated pulse,” “isolated strong,” or “multicellular.” Linear systems were labeled as “trailing stratiform,” “leading stratiform,” and “parallel stratiform.” The classification scheme was tested using both simulated and observed storms. The simulated training and test datasets came from the Advanced Regional Prediction System (ARPS) simulated reflectivity data, and observed data were collected from composite reflectivity mosaics from the CASA Integrative Project One (IP1) network. The observations from the CASA network showed that the classification scheme is now ready for operational use.
50

Corr, Diarmuid, Amber Leeson, Malcolm McMillan, Ce Zhang, and Thomas Barnes. "An inventory of supraglacial lakes and channels across the West Antarctic Ice Sheet." Earth System Science Data 14, no. 1 (January 24, 2022): 209–28. http://dx.doi.org/10.5194/essd-14-209-2022.

Abstract:
Quantifying the extent and distribution of supraglacial hydrology, i.e. lakes and streams, is important for understanding the mass balance of the Antarctic ice sheet and its consequent contribution to global sea-level rise. The existence of meltwater on the ice surface has the potential to affect ice shelf stability and grounded ice flow through hydrofracturing and the associated delivery of meltwater to the bed. In this study, we systematically map all observable supraglacial lakes and streams in West Antarctica by applying a semi-automated Dual-NDWI (normalised difference water index) approach to >2000 images acquired by the Sentinel-2 and Landsat-8 satellites during January 2017. We use a K-means clustering method to partition water into lakes and streams, which is important for understanding the dynamics and inter-connectivity of the hydrological system. When compared to a manually delineated reference dataset on three Antarctic test sites, our approach achieves average values for sensitivity (85.3 % and 77.6 %), specificity (99.1 % and 99.7 %) and accuracy (98.7 % and 98.3 %) for Sentinel-2 and Landsat-8 acquisitions, respectively. We identified 10 478 supraglacial features (10 223 lakes and 255 channels) on the West Antarctic Ice Sheet (WAIS) and Antarctic Peninsula (AP), with a combined area of 119.4 km² (114.7 km² lakes, 4.7 km² channels). We found 27.3 % of feature area on grounded ice and 54.9 % on floating ice shelves. In total, 17.8 % of feature area crossed the grounding line. A recent expansion in satellite data provision made new continental-scale inventories such as these, the first produced for WAIS and AP, possible. The inventories provide a baseline for future studies and a benchmark to monitor the development of Antarctica's surface hydrology in a warming world and thus enhance our capability to predict the collapse of ice shelves in the future. The dataset is available at https://doi.org/10.5281/zenodo.5642755 (Corr et al., 2021).