Journal articles on the topic 'Text Data Streams'

1

Liu, Yu-Bao, Jia-Rong Cai, Jian Yin, and Ada Wai-Chee Fu. "Clustering Text Data Streams." Journal of Computer Science and Technology 23, no. 1 (January 2008): 112–28. http://dx.doi.org/10.1007/s11390-008-9115-1.

2

Aggarwal, Charu C., and Philip S. Yu. "On clustering massive text and categorical data streams." Knowledge and Information Systems 24, no. 2 (August 6, 2009): 171–96. http://dx.doi.org/10.1007/s10115-009-0241-z.

3

FRAHLING, GEREON, PIOTR INDYK, and CHRISTIAN SOHLER. "SAMPLING IN DYNAMIC DATA STREAMS AND APPLICATIONS." International Journal of Computational Geometry & Applications 18, no. 01n02 (April 2008): 3–28. http://dx.doi.org/10.1142/s0218195908002520.

Abstract:
A dynamic geometric data stream is a sequence of m ADD/REMOVE operations of points from a discrete geometric space {1,…,Δ}^d. ADD(p) inserts a point p from {1,…,Δ}^d into the current point set P; REMOVE(p) deletes p from P. We develop low-storage data structures to (i) maintain ε-nets and ε-approximations of range spaces of P with small VC-dimension and (ii) maintain a (1 + ε)-approximation of the weight of the Euclidean minimum spanning tree of P. Our data structure for ε-nets uses [Formula: see text] bits of memory and returns with probability 1 – δ a set of [Formula: see text] points that is an ε-net for an arbitrary fixed finite range space with VC-dimension [Formula: see text]. Our data structure for ε-approximations uses [Formula: see text] bits of memory and returns with probability 1 – δ a set of [Formula: see text] points that is an ε-approximation for an arbitrary fixed finite range space with VC-dimension [Formula: see text]. The data structure for the approximation of the weight of a Euclidean minimum spanning tree uses O(log(1/δ) · (log Δ/ε)^{O(d)}) space and is correct with probability at least 1 – δ. Our results are based on a new data structure that maintains a set of elements chosen (almost) uniformly at random from P.
4

Zhang, Yuhong, Guang Chu, Peipei Li, Xuegang Hu, and Xindong Wu. "Three-layer concept drifting detection in text data streams." Neurocomputing 260 (October 2017): 393–403. http://dx.doi.org/10.1016/j.neucom.2017.04.047.

5

Russo, Matthew, Tatsunori Hashimoto, Daniel Kang, Yi Sun, and Matei Zaharia. "Accelerating Aggregation Queries on Unstructured Streams of Data." Proceedings of the VLDB Endowment 16, no. 11 (July 2023): 2897–910. http://dx.doi.org/10.14778/3611479.3611496.

Abstract:
Analysts and scientists are interested in querying streams of video, audio, and text to extract quantitative insights. For example, an urban planner may wish to measure congestion by querying the live feed from a traffic camera. Prior work has used deep neural networks (DNNs) to answer such queries in the batch setting. However, much of this work is not suited for the streaming setting because it requires access to the entire dataset before a query can be submitted or is specific to video. Thus, to the best of our knowledge, no prior work addresses the problem of efficiently answering queries over multiple modalities of streams. In this work we propose InQuest, a system for accelerating aggregation queries on unstructured streams of data with statistical guarantees on query accuracy. InQuest leverages inexpensive approximation models ("proxies") and sampling techniques to limit the execution of an expensive high-precision model (an "oracle") to a subset of the stream. It then uses the oracle predictions to compute an approximate query answer in real-time. We theoretically analyzed InQuest and show that the expected error of its query estimates converges on stationary streams at a rate inversely proportional to the oracle budget. We evaluated our algorithm on six real-world video and text datasets and show that InQuest achieves the same root mean squared error (RMSE) as two streaming baselines with up to 5.0x fewer oracle invocations. We further show that InQuest can achieve up to 1.9x lower RMSE at a fixed number of oracle invocations than a state-of-the-art batch setting algorithm.
6

Petrasova, Svitlana, Nina Khairova, and Anastasiia Kolesnyk. "TECHNOLOGY FOR IDENTIFICATION OF INFORMATION AGENDA IN NEWS DATA STREAMS." Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies, no. 1 (5) (July 12, 2021): 86–90. http://dx.doi.org/10.20998/2079-0023.2021.01.14.

Abstract:
Currently, the volume of news data streams is growing, which increases interest in systems that automate the processing of big data streams. Based on intelligent data processing tools, identifying the semantic similarity of text information will make it possible to select common information spaces of news. The article analyzes up-to-date statistical metrics for identifying coherent fragments, in particular from news texts displaying the agenda, and identifies their main advantages and disadvantages. An information technology is proposed for identifying the common information space of relevant news in the data stream for a certain period of time. The technology includes logical-linguistic and distributive-statistical models for identifying collocations. The MI distributional semantic model is applied at the stage of potential collocation extraction. At the same time, regular expressions developed in accordance with the grammar of the English language make it possible to identify grammatically correct constructions. The advantage of the developed logical-linguistic model, which formalizes the semantic-grammatical characteristics of collocations using algebraic predicate operations and a semantic equivalence predicate, is that both the grammatical structure of the language and the meaning of words (collocates) are analyzed. The WordNet thesaurus is used to determine the synonymy relationship between the main and dependent collocation components. The effectiveness of the developed technology is assessed on a corpus of news texts from the CNN and BBC services. The analysis shows that the precision coefficient is 0.96. The use of the proposed technology could improve the quality of news stream processing. The solution to the problem of automatic identification of semantic similarity can be used to identify texts of the same domain, find relevant information, extract facts, eliminate semantic ambiguity, etc.
Keywords: data stream, agenda, logical-linguistic model, distribution-statistical model, collocation, semantic similarity, WordNet, news text corpus, precision.
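The abstract above mentions an MI (mutual information) measure for extracting candidate collocations. As a rough illustration only, and not the authors' model, the sketch below scores adjacent word pairs by pointwise mutual information. The function name `pmi_scores`, the `min_count` threshold, and the toy sentence are assumptions introduced here.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated from raw counts. Pairs seen fewer than `min_count` times are
    skipped because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input (hypothetical): only "prime minister" repeats as a pair.
tokens = "prime minister said prime minister met the press and the public".split()
scores = pmi_scores(tokens)
```

A positive score means the pair co-occurs more often than its individual word frequencies would suggest, which is the usual signal for a collocation candidate.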
7

AL-Dyani, Wafa Zubair, Farzana Kabir Ahmad, and Siti Sakira Kamaruddin. "A Survey on Event Detection Models for Text Data Streams." Journal of Computer Science 16, no. 7 (July 1, 2020): 916–35. http://dx.doi.org/10.3844/jcssp.2020.916.935.

8

Hasan, Maryam, Elke Rundensteiner, and Emmanuel Agu. "Automatic emotion detection in text streams by analyzing Twitter data." International Journal of Data Science and Analytics 7, no. 1 (February 9, 2018): 35–51. http://dx.doi.org/10.1007/s41060-018-0096-z.

9

Zhao, Xuezhuan, Ziheng Zhou, Lingling Li, Lishen Pei, and Zhaoyi Ye. "Scene Text Detection Based On Fusion Network." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 10 (May 29, 2021): 2153005. http://dx.doi.org/10.1142/s0218001421530050.

Abstract:
To address the robustness problems arising from scale transformation and the unbalanced distribution of training samples in the scene text detection task, a new fusion framework, TSFnet, is proposed in this paper. This framework is composed of a Detection Stream, a Judge Stream and a Fusion Stream. In the Detection Stream, a loss balance factor (LBF) is introduced to improve the region proposal network (RPN). To predict the global text segmentation map, the algorithm combines a regression strategy and a case segmentation method. In the Judge Stream, a classification of the samples is proposed based on the Judge Map and the corresponding tags to calculate the overlap rate. In support of the Detection Stream, a feature pyramid network is used to extract the Judge Map and calculate the LBF. In the Fusion Stream, a new fusion algorithm is introduced. By fusing the output of the two streams, we can locate the text area in the natural scene accurately. Finally, the algorithm is evaluated on the standard data sets ICDAR 2015 and ICDAR2017-MLT. The test results show that the [Formula: see text] values are 87.8% and 67.57%, respectively, superior to the state-of-the-art models. This shows that the algorithm can address the robustness issues caused by scale transformation and unbalanced training data.
10

Azkan, Can, Markus Spiekermann, and Henry Goecke. "Uncovering Research Streams in the Data Economy Using Text Mining Algorithms." Technology Innovation Management Review 9, no. 11 (January 1, 2019): 62–74. http://dx.doi.org/10.22215/timreview/1284.

11

Liu, Rui Fang, Bao Jin Yu, Jiang Xue, and Li Xin Xu. "The Design of kNN Text Categorization on Storm Cluster." Applied Mechanics and Materials 427-429 (September 2013): 2701–6. http://dx.doi.org/10.4028/www.scientific.net/amm.427-429.2701.

Abstract:
Attempts to handle big data streams for information retrieval are only just beginning, starting with TREC KBA2012. Storm is a free and open source distributed real-time computation system, which makes it easy to reliably process unbounded streams of data. For the KBA2012 task, the combination of the k-nearest neighbor (kNN) algorithm and a Storm cluster is an effective solution. The kNN classification technique stands out for its simplicity and its suitability for implementation on a distributed platform. In addition, the entities (categories) in the KBA2012 task are fixed, which corresponds to the situation in which kNN is used. The paper discusses the implementation of kNN on a Storm cluster and the results of experiments with the KBA2012 data set.
12

Rekik, Amal, and Salma Jamoussi. "Incremental autoencoders for text streams clustering in social networks." JUCS - Journal of Universal Computer Science 27, no. 11 (November 28, 2021): 1203–21. http://dx.doi.org/10.3897/jucs.76770.

Abstract:
Clustering data streams in order to detect trending topics on social networks is a challenging task that interests researchers in the big data field. In fact, analyzing such data requires several requirements to be addressed due to their large amount and evolving nature. For this purpose, we propose, in this paper, a new evolving clustering method which can take into account the incremental nature of the data and meet its principal requirements. Our method explores a deep learning technique to learn incrementally from unlabelled examples generated at high speed which need to be clustered instantly. To evaluate the performance of our method, we have conducted several experiments using the Sanders, HCR and Terr-Attacks datasets.
13

Ranganathan, Gunasundari, Clara Barathi Priyadharshini Ganesan, and Balakumar Chellamuthu. "Emotional Tendency Analysis of Twitter Data Streams." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 11s (October 7, 2023): 116–26. http://dx.doi.org/10.17762/ijritcc.v11i11s.8077.

Abstract:
The web now seems to be an alive and dynamic arena in which billions of people across the globe connect, share, publish, and engage in a broad range of everyday activities. Using social media, individuals may connect and communicate with each other at any time and from any location. More than 500 million individuals across the globe post their thoughts and opinions on the internet every day. There is a huge amount of information created from a variety of social media platforms in a variety of formats and languages throughout the globe. Individuals define emotions as powerful feelings directed toward something or someone as a result of internal or external events that have a personal meaning. Emotional recognition in text has several applications in human-computer interface and natural language processing (NLP). Emotion classification has previously been studied using bag-of words classifiers or deep learning methods on static Twitter data. For real-time textual emotion identification, the proposed model combines a mix of keyword-based and learning-based models, as well as a real-time Emotional Tendency Analysis
14

Krstajić, Miloš, Mohammad Najm-Araghi, Florian Mansmann, and Daniel A. Keim. "Story Tracker: Incremental visual text analytics of news story development." Information Visualization 12, no. 3-4 (July 2013): 308–23. http://dx.doi.org/10.1177/1473871613493996.

Abstract:
Online news sources produce thousands of news articles every day, reporting on local and global real-world events. New information quickly replaces the old, making it difficult for readers to put current events in the context of the past. The stories about these events have complex relationships and characteristics that are difficult to model: they can be weakly or strongly related or they can merge or split over time. In this article, we present a visual analytics system for temporal analysis of news stories in dynamic information streams, which combines interactive visualization and text mining techniques to facilitate the analysis of similar topics that split and merge over time. Text clustering algorithms extract stories from online news streams in consecutive time windows and identify similar stories from the past. The stories are displayed in a visualization, which (1) sorts the stories by minimizing clutter and overlap from edge crossings, (2) shows their temporal characteristics in different time frames with different levels of detail, and (3) allows incremental updates of the display without recalculating the past data. Stories can be interactively filtered by their duration and connectivity in order to be explored in full detail. To demonstrate the system’s capabilities for detailed dynamic text stream exploration, we present a use case with real news data about the Arabic Uprising in 2011.
15

Krzywicki, Alfred, Wayne Wobcke, Michael Bain, John Calvo Martinez, and Paul Compton. "Data mining for building knowledge bases: techniques, architectures and applications." Knowledge Engineering Review 31, no. 2 (March 2016): 97–123. http://dx.doi.org/10.1017/s0269888916000047.

Abstract:
Data mining techniques for extracting knowledge from text have been applied extensively to applications including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale customised data sets for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to automatically extract knowledge from such data sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we call knowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base), and at the same time, consistently incorporating this new information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.
16

Netolický, Pavel, Jonáš Petrovský, and František Dařena. "Text‑Mining in Streams of Textual Data Using Time Series Applied to Stock Market." Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis 66, no. 6 (2018): 1573–80. http://dx.doi.org/10.11118/actaun201866061573.

Abstract:
Each day, a lot of text data is generated. This data comes from various sources and may contain valuable information. In this article, we use text mining methods to discover if there is a connection between news articles and changes of the S&P 500 stock index. The index values and documents were divided into time windows according to the direction of the index value changes. We achieved a classification accuracy of 65–74 %.
17

Feng, Long, Haojie Ren, and Changliang Zou. "A setwise EWMA scheme for monitoring high-dimensional datastreams." Random Matrices: Theory and Applications 09, no. 02 (May 9, 2019): 2050004. http://dx.doi.org/10.1142/s2010326320500045.

Abstract:
The monitoring of high-dimensional data streams has become increasingly important for real-time detection of abnormal activities in many statistical process control (SPC) applications. Although the multivariate SPC has been extensively studied in the literature, the challenges associated with designing a practical monitoring scheme for high-dimensional processes when between-streams correlation exists are yet to be addressed well. Classical [Formula: see text]-test-based schemes do not work well because the contamination bias in estimating the covariance matrix grows rapidly with the increase of dimension. We propose a test statistic which is based on the “divide-and-conquer” strategy, and integrate this statistic into the multivariate exponentially weighted moving average charting scheme for Phase II process monitoring. The key idea is to calculate the [Formula: see text] statistics on low-dimensional sub-vectors and to combine them together. The proposed procedure is essentially distribution-free and computation efficient. The control limit is obtained through the asymptotic distribution of the test statistic under some mild conditions on the dependence structure of stream observations. Our asymptotic results also shed light on quantifying the size of a reference sample required. Both theoretical analysis and numerical results show that the proposed method is able to control the false alarm rate and deliver robust change detection.
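For orientation, the classical univariate EWMA recursion that charting schemes of this kind build on can be sketched as follows. This is not the setwise, high-dimensional scheme of the paper: it monitors a single stream with known in-control mean and standard deviation, and the function name `ewma_alarms`, the smoothing weight `lam`, and the limit `width` are assumptions chosen for illustration.

```python
def ewma_alarms(values, target, sigma, lam=0.2, width=3.0):
    """Return indices where a univariate EWMA statistic leaves its limits.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, starting from z_0 = target.
    For independent observations with standard deviation `sigma`,
    Var(z_t) = sigma^2 * lam / (2 - lam) * (1 - (1 - lam)^(2t)).
    """
    z = target
    alarms = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        var = sigma ** 2 * lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        if abs(z - target) > width * var ** 0.5:
            alarms.append(t - 1)  # report the zero-based position in `values`
    return alarms


# Hypothetical stream: in control for 20 points, then a mean shift of 3 sigma.
stream = [0.0] * 20 + [3.0] * 20
alarms = ewma_alarms(stream, target=0.0, sigma=1.0)
```

A smaller `lam` gives more weight to history and makes the chart more sensitive to small persistent shifts, at the cost of slower reaction to large ones.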
18

Chen, Lisi, and Shuo Shang. "Region-Based Message Exploration over Spatio-Temporal Data Streams." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 873–80. http://dx.doi.org/10.1609/aaai.v33i01.3301873.

Abstract:
Massive amounts of spatio-temporal data that contain location and text content are being generated by location-based social media. These spatio-temporal messages cover a wide range of topics. It is of great significance to discover local trending topics based on users’ location-based and topic-based requirements. We develop a region-based message exploration mechanism that retrieves spatio-temporal message clusters from a stream of spatio-temporal messages based on users’ preferences on message topic and message spatial distribution. Additionally, we propose a region summarization algorithm that finds a subset of representative messages in a cluster to summarize the topics and the spatial attributes of messages in the cluster. We evaluate the efficacy and efficiency of our proposal on two real-world datasets and the results demonstrate that our solution is capable of high efficiency and effectiveness compared with baselines.
19

Padovani, José. "Pandemics, Delays, and Pure Data: on ‘afterlives’ (2020), for Flute and Live Electronics and Visuals." Revista Vórtex 9, no. 2 (December 10, 2021): 1–14. http://dx.doi.org/10.33871/23179937.2021.9.2.17.

Abstract:
The essay addresses creative and technical aspects of the piece ‘afterlives’ (2020), for flute and live electronics and visuals. Composed and premiered in the context of the COVID-19 pandemic, the composition employs audiovisual processes based on different audiovisual techniques: phase-vocoders, buffer-based granulations, Ambisonics spatialization, and variable delay of video streams. The resulting sounds and images allude to typical situations of social interaction via video conferencing applications. ‘Afterlives’ relies on an interplay between current, almost-current, and past moments of the audiovisual streams, which dephase the performer’s images and sounds. I have avoided, in the text, delving deeper into the Pure Data abstractions and/or into the musical analysis of my composition. The main purpose of the text is rather to present compositional/technical elements of ‘afterlives’ and discuss how they enable new experiences of time.
20

Mishra, Devendra Kumar. "CHALLENGES IN TEXT MINING FOR BUSINESS INTELLIGENCE." International Journal of Engineering Technologies and Management Research 5, no. 2 (May 4, 2020): 301–4. http://dx.doi.org/10.29121/ijetmr.v5.i2.2018.660.

Abstract:
This is the era of the internet, a vast space to which large amounts of data are added every day. This huge amount of digital data and interconnection is growing explosively. Big data mining can retrieve useful information from large datasets or streams of data. Analysis can also be done in a distributed environment. The framework needed to analyze this large amount of data must support statistical analysis and data mining. The framework should be designed so that big data and traditional data can be combined, so that results come from analyzing new data together with the old data. Traditional tools are not sufficient to extract previously unseen information.
21

Perkins, Mark. "Aspects of Discourse Stream Analysis." Global Language Review IV, no. II (December 30, 2019): 1–6. http://dx.doi.org/10.31703/glr.2019(iv-ii).01.

Abstract:
The huge proliferation of textual (and other data) in digital and organisational sources has led to new techniques of text analysis. The potential thereby unleashed may be underpinned by further theoretical developments to the theory of Discourse Stream Analysis (DSA) as presented here. These include the notion of change in the discourse stream in terms of discourse stream fronts, linguistic elements evolving in real time, and notions of time itself in terms of relative speed, subject orientation and perception. Big data has also given rise to fake news, the manipulation of messages on a large scale. Fake news is conveyed in fake discourse streams and has led to a new field of description and analysis.
22

Kapenieks, Jānis. "A WEB-BASED FAST AND RELIABLE TEXT CLASSIFICATION TOOL." SOCIETY. TECHNOLOGY. SOLUTIONS. Proceedings of the International Scientific Conference 1 (April 17, 2019): 24. http://dx.doi.org/10.35363/via.sts.2019.21.

Abstract:
INTRODUCTION Opinion analysis in the context of big data has recently been a hot topic in science and the business world. Social media has become a key data source for opinions, generating a large amount of data every day and providing content for further analysis. In the big data age, unstructured data classification is one of the key tools for fast and reliable content analysis. I expect significant growth in the demand for content classification services in the near future. Many online text classification tools are available, but they provide limited functionality, such as automated text classification into predefined categories and sentiment analysis based on a pre-trained machine learning algorithm. This limited functionality does not include tools such as data mining support or a machine learning algorithm training interface. Only a limited number of tools provide the whole set of tools required for text classification, i.e. all the steps from data mining to building a machine learning algorithm and applying it to a data stream from a social network source. My goal is to create a tool able to generate a classified text stream directly from social media with a user-friendly set-up interface.
METHODS AND MATERIALS The text classification tool will have a core-based modular structure (each module providing certain functionality) so the system can be scaled in terms of technology and functionality. The tool will be built on open source libraries and programming languages running on a Linux-based server. The tool will be based on three key components: frontend, backend and data storage. The backend will use Python and Node.js with the machine learning and text filtering libraries TensorFlow and Keras; MySQL 5.7/8 will be used for data storage; the frontend will be built on web technologies using PHP and JavaScript.
EXPECTED RESULTS The expected result of my work is a web-based text classification tool for opinion analysis using data streams from social media. The tool will provide a user-friendly interface for data collection, algorithm selection, machine learning algorithm setup and training. Multiple text classification algorithms will be available: Linear SVM, Random Forest, Multinomial Naive Bayes, Bernoulli Naive Bayes, Ridge Regression, Perceptron, Passive Aggressive Classifier, and a deep machine learning algorithm. System users will be able to identify the most effective algorithm for their text classification task and compare algorithms based on their accuracy. The architecture of the text classification tool will be based on a frontend interface and backend services. The frontend interface will provide all the tools with which the system user interacts. This includes setting up data collection streams from multiple social networks and allocating them to pre-specified channels based on keywords. Data from each channel can be classified and assigned to a pre-defined cluster. The tool will provide a training interface for machine learning algorithms. This text classification tool is currently in active development for a client, with testing and implementation planned for April 2019.
23

Karim, Farah, Ioanna Lytra, Christian Mader, Sören Auer, and Maria-Esther Vidal. "DESERT: A Continuous SPARQL Query Engine for On-Demand Query Answering." International Journal of Semantic Computing 12, no. 03 (September 2018): 373–97. http://dx.doi.org/10.1142/s1793351x18400172.

Abstract:
The Internet of Things (IoT) has been rapidly adopted in many domains ranging from household appliances, e.g. ventilation, lighting, and heating, to industrial manufacturing and transport networks. Despite the enormous benefits of optimization, monitoring, and maintenance rendered by IoT devices, an ample amount of data is generated continuously. Semantically describing IoT generated data using ontologies enables a precise interpretation of this data. However, ontology-based descriptions tremendously increase the size of IoT data and, in the presence of repeated sensor measurements, a large amount of the data are duplicates that do not contribute to new insights during query processing or IoT data analytics. In order to ensure that only required ontology-based descriptions are generated, we devise a knowledge-driven approach named DESERT that is able to on-Demand factorizE and Semantically Enrich stReam daTa. DESERT resorts to a knowledge graph to describe IoT stream data; it utilizes only the data that is required to answer an input continuous SPARQL query and applies a novel method of data factorization to reduce duplicated measurements in the knowledge graph. The performance of DESERT is empirically studied on a collection of continuous SPARQL queries from SRBench, a benchmark of IoT stream data and continuous SPARQL queries. Furthermore, data streams with various combinations of uniform and varying data stream speeds and streaming window size dimensions are considered in the study. Experimental results suggest that DESERT is capable of speeding up continuous query processing while creating knowledge graphs that include no replications.
24

Kim, Hajin, Myeong-Seon Gil, Yang-Sae Moon, and Mi-Jung Choi. "Variable size sampling to support high uniformity confidence in sensor data streams." International Journal of Distributed Sensor Networks 14, no. 4 (April 2018): 155014771877399. http://dx.doi.org/10.1177/1550147718773999.

Abstract:
In order to rapidly process large amounts of sensor stream data, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in the stream environment. For this, we first analyze the uniformity confidence of KSample and then derive two uniformity confidence degradation problems: (1) initial degradation, which rapidly decreases the uniformity confidence in the initial stage, and (2) continuous degradation, which gradually decreases the uniformity confidence in the later stages. We note that the initial degradation is caused by the sample range limitation and the past sample invariance, and the continuous degradation by the sampling range increase. For each problem, we present a corresponding solution, that is, we provide the sample range extension for sample range limitation, the past sample change for past sample invariance, and the use of UC-window for sampling range increase. By reflecting these solutions, we then propose a novel sampling method, named UC-KSample, which largely improves the uniformity confidence. Experimental results show that UC-KSample improves the uniformity confidence over KSample by 2.2 times on average, and it always keeps the uniformity confidence higher than the user-specified threshold. We also note that the sampling accuracy of UC-KSample is higher than that of KSample in both numeric sensor data and text data. The uniformity confidence is an important sampling metric in sensor data streams, and this is the first attempt to apply uniformity confidence to KSample. We believe that the proposed UC-KSample is an excellent approach that adopts an advantage of KSample, dynamic sampling over a fixed sampling ratio, while improving the uniformity confidence.
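As background for the uniform stream sampling discussed above, the sketch below shows classical reservoir sampling (Algorithm R), which keeps a fixed-size uniform sample of a stream of unknown length. It is neither KSample nor UC-KSample, both of which sample at a fixed ratio rather than a fixed size; the function name `reservoir_sample` and its parameters are assumptions introduced for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from a stream.

    Classical reservoir sampling: the first k items fill the reservoir;
    item i (zero-based, i >= k) then replaces a random slot with
    probability k / (i + 1), so every item seen so far is equally
    likely to be in the sample.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical usage: sample 10 readings from a stream of 1000.
picked = reservoir_sample(range(1000), 10, random.Random(0))
```

Because the sample size stays fixed while the stream grows, the effective sampling ratio shrinks over time, which is one reason ratio-based methods are studied separately.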
25

PhridviRaj, Chintakindi Srinivas, and C. V. GuruRao. "Clustering Text Data Streams – A Tree based Approach with Ternary Function and Ternary Feature Vector." Procedia Computer Science 31 (2014): 976–84. http://dx.doi.org/10.1016/j.procs.2014.05.350.

26

Butakova, M. A., and G. S. Miziukov. "Measure and conditions for determining the information proximity of text information streams." Informatization and communication, no. 2 (April 30, 2020): 114–18. http://dx.doi.org/10.34219/2078-8320-2020-11-2-114-118.

Abstract:
The aim of the article is to present a new approach for determining the informational proximity between textual information flows. To achieve this goal, the article considers the existing approach for determining the similarity between poorly structured data. The basic conditions for the results are described, the stages of validation and the coefficients for determining information proximity are highlighted. Examples of calculations on test data are given. In conclusion, the results of the verification of the approach are described, a brief description of the results is given.
27

Krukovets, Dmytro. "Data Science Opportunities at Central Banks: Overview." Visnyk of the National Bank of Ukraine, no. 249 (June 30, 2020): 13–24. http://dx.doi.org/10.26531/vnbu2020.249.02.

Abstract:
This paper reviews the main streams of Data Science algorithm usage at central banks and shows their rising popularity over time. It contains an overview of use cases for macroeconomic and financial forecasting, text analysis (newspapers, social networks, and various types of reports), and other techniques based on or connected to large amounts of data. The author also pays attention to the recent achievements of the National Bank of Ukraine in this area. This study helps set the direction for research into the role of Data Science in central banking.
28

Masegosa, Andrés R., Darío Ramos-López, Antonio Salmerón, Helge Langseth, and Thomas D. Nielsen. "Variational Inference over Nonstationary Data Streams for Exponential Family Models." Mathematics 8, no. 11 (November 3, 2020): 1942. http://dx.doi.org/10.3390/math8111942.

Abstract:
In many modern data analysis problems, the available data is not static but, instead, comes in a streaming fashion. Performing Bayesian inference on a data stream is challenging for several reasons. First, it requires continuous model updating and the ability to handle a posterior distribution conditioned on an unbounded data set. Secondly, the underlying data distribution may drift from one time step to another, and the classic i.i.d. (independent and identically distributed), or data exchangeability assumption does not hold anymore. In this paper, we present an approximate Bayesian inference approach using variational methods that addresses these issues for conjugate exponential family models with latent variables. Our proposal makes use of a novel scheme based on hierarchical priors to explicitly model temporal changes of the model parameters. We show how this approach induces an exponential forgetting mechanism with adaptive forgetting rates. The method is able to capture the smoothness of the concept drift, ranging from no drift to abrupt drift. The proposed variational inference scheme maintains the computational efficiency of variational methods over conjugate models, which is critical in streaming settings. The approach is validated on four different domains (energy, finance, geolocation, and text) using four real-world data sets.
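The exponential forgetting idea mentioned in this abstract can be illustrated on the simplest conjugate model. The sketch below is not the authors' method: it assumes a Beta-Bernoulli model and a fixed, hypothetical forgetting factor `rho`, whereas the abstract describes adaptive forgetting rates obtained through hierarchical priors. Before each update the accumulated pseudo-counts are shrunk toward the prior, so older observations lose weight geometrically.

```python
def forgetful_beta_update(alpha, beta, x, rho=0.9, prior=(1.0, 1.0)):
    """One streaming update of a Beta(alpha, beta) posterior for a Bernoulli rate.

    rho in (0, 1] is a hypothetical fixed forgetting factor (an assumption,
    not taken from the paper): rho = 1 gives the ordinary conjugate update,
    smaller values discount past data faster.
    """
    a0, b0 = prior
    # Shrink the evidence accumulated so far toward the prior pseudo-counts.
    alpha = a0 + rho * (alpha - a0)
    beta = b0 + rho * (beta - b0)
    # Standard conjugate step for the new observation x in {0, 1}.
    return alpha + x, beta + (1 - x)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def run(stream, rho):
    """Process the whole stream and return the final posterior mean."""
    alpha, beta = 1.0, 1.0
    for x in stream:
        alpha, beta = forgetful_beta_update(alpha, beta, x, rho)
    return posterior_mean(alpha, beta)


# A stream with an abrupt drift: the rate jumps from 0 to 1 halfway through.
stream = [0] * 50 + [1] * 50
no_forgetting = run(stream, rho=1.0)
with_forgetting = run(stream, rho=0.8)
```

With `rho=1.0` the final posterior mean is 0.5, an average over both regimes; with `rho=0.8` it ends at about 0.86, much closer to the rate after the drift.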
29

VALENCIA, MARIA, CODRINA LAUTH, and ERNESTINA MENASALVAS. "EMERGING USER INTENTIONS: MATCHING USER QUERIES WITH TOPIC EVOLUTION IN NEWS TEXT STREAMS." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 17, supp01 (August 2009): 59–80. http://dx.doi.org/10.1142/s0218488509006030.

Abstract:
Trend detection analysis from unstructured data poses a huge challenge to current advanced, web-enabled knowledge-based systems (KBS). Consolidated studies in topic and trend detection from text streams have so far concentrated mainly on identifying and visualizing dynamically evolving text patterns. From the knowledge modeling perspective, there is a need to identify and define new, relevant features that can synchronize emergent user intentions with the dynamics of the system's structure. Additionally, advanced KBS have to remain highly sensitive to content change, marked by the evolution of trends in topics extracted from text streams. In this paper, we describe a three-layered approach called the "user-system-content method" that helps us identify the most relevant knowledge mapping features derived from the USER, SYSTEM and CONTENT perspectives and combine them into an overall "context model" that will enable advanced KBS to automatically streamline the query enrichment process in a much more user-centered, dynamic and flexible way. After a general introduction to our three-layered approach, we describe in detail the process steps necessary to implement our method and present a case study of its integration into a real multimedia web-content portal that uses news streams as its major source of unstructured information.
30

Prof. Dr. Husam Abulrazzak and Fatma Hassan Al-Rubbiay. "Analyzing Time series by using Data mining." Journal of Administration and Economics 48, no. 139 (June 9, 2024): 279–81. http://dx.doi.org/10.31272/jae.i139.1099.

Abstract:
Algorithms and complex data analysis techniques are used in a growing number of fields, and with that growth come the challenges of handling more numerous and more complex data types. Research directions vary with the diversity of these fields, and the use of such techniques is increasing in artificial intelligence, which aims to facilitate human life in various areas. Mining of complex data types includes mining of time series, symbolic sequences, and biological sequences, in addition to mining of graphs, computer networks, mobile data, text, and data streams.
31

Yadav, Piyush, Dhaval Salwala, Dibya Prakash Das, and Edward Curry. "Knowledge Graph Driven Approach to Represent Video Streams for Spatiotemporal Event Pattern Matching in Complex Event Processing." International Journal of Semantic Computing 14, no. 03 (September 2020): 423–55. http://dx.doi.org/10.1142/s1793351x20500051.

Abstract:
Complex Event Processing (CEP) is an event processing paradigm for performing real-time analytics over streaming data and matching high-level event patterns. Presently, CEP is limited to processing structured data streams. Video streams are complicated by their unstructured data model, which limits the ability of CEP systems to perform matching over them. This work introduces a graph-based structure for continuously evolving video streams, which enables a CEP system to query complex video event patterns. We propose the Video Event Knowledge Graph (VEKG), a graph-driven representation of video data. VEKG models video objects as nodes and their relationship interactions as edges over time and space. It creates a semantic knowledge representation of video data derived from the detection of high-level semantic concepts in the video using an ensemble of deep learning models. A CEP-based state optimization, the VEKG-Time Aggregated Graph (VEKG-TAG), is proposed over the VEKG representation for faster event detection. VEKG-TAG is a spatiotemporal graph aggregation method that provides a summarized view of the VEKG graph over a given time length. We defined a set of nine event pattern rules for two domains (Activity Recognition and Traffic Management), which act as queries and are applied over VEKG graphs to discover complex event patterns. To show the efficacy of our approach, we performed extensive experiments over 801 video clips across 10 datasets. The proposed VEKG approach was compared with other state-of-the-art methods and was able to detect complex event patterns over videos with F-Score ranging from 0.44 to 0.90. In the given experiments, the optimized VEKG-TAG was able to reduce 99% and 93% of VEKG nodes and edges, respectively, with 5.19× faster search time, achieving sub-second median latency of 4–20 ms.
32

Elkamchouchi, Hassan, Rosemarie Anton, and Yasmine Abouelseoud. "Multimedia Data Secure Transmission: A Review." International Journal of Scientific Research and Management 10, no. 12 (December 2, 2022): 949–71. http://dx.doi.org/10.18535/ijsrm/v10i12.ec01.

Abstract:
Encryption is a technique for encoding data so that they can be read only by authorized receivers. Because of rapid advances in multimedia transmission and networking technologies, more interactive media is communicated in the medical, business, and military fields, and it may contain sensitive information that must be kept hidden from public users. The Advanced Encryption Standard (AES) and the Data Encryption Standard (DES) are widely used encryption algorithms for text data. However, they are not appropriate for video data. To ensure that this information cannot be accessed by attackers, the demand for efficient video-protection techniques has risen. This article provides multimedia design requirements for maintaining a secure multimedia system, accompanied by a threat model for detecting and ranking the potential risks facing such a system. The risks to multimedia security and their impacts on users are described in textual form, an overview of current state-of-the-art video-encryption schemes is presented, and their performance parameters are examined. The relationship between encryption algorithms and compression techniques is also discussed, and various multimedia applications are presented. Additionally, because the synchronisation of real-time continuous streams is necessary for the interchange of these streams in multimedia conferencing services, multiple synchronisation strategies are given along with video synchronisation challenges.
33

Naldi, Giovanni, Andrea Mattana, Sandro Pastore, Monica Alderighi, Kristian Zarb Adami, Francesco Schillirò, Amin Aminaei, et al. "The Digital Signal Processing Platform for the Low Frequency Aperture Array: Preliminary Results on the Data Acquisition Unit." Journal of Astronomical Instrumentation 06, no. 01 (March 2017): 1641014. http://dx.doi.org/10.1142/s2251171716410142.

Abstract:
A signal processing hardware platform has been developed for the Low Frequency Aperture Array component of the Square Kilometre Array (SKA). The processing board, called an Analog Digital Unit (ADU), is able to acquire and digitize broadband (up to 500 MHz bandwidth) radio-frequency streams from 16 dual polarized antennas, channel the data streams and then combine them flexibly as part of a larger beamforming system. It is envisaged that there will be more than 8000 of these signal processing platforms in the first phase of the SKA, so particular attention has been devoted to ensure the design is low-cost and low-power. This paper describes the main features of the data acquisition unit of such a platform and presents preliminary results characterizing its performance.
34

Asha Priyadarshini.M. "Integrating Diverse Data Streams for Enhanced Emotional Intelligence in Mental Health Care." Communications on Applied Nonlinear Analysis 32, no. 1s (October 30, 2024): 182–95. http://dx.doi.org/10.52783/cana.v32.2147.

Abstract:
This research presents a comprehensive system for real-time emotion recognition and analysis using multimodal data, including image, video, audio, and text. The system employs deep learning models to extract features and classify emotions from each modality. By integrating these predictions, we aim to provide a holistic understanding of a user's emotional state and assess potential risks. The system further analyzes the collected emotion data to identify trends, patterns, and indicators of emotional well-being and suicide risk. Visualizations such as time-series plots, distribution charts, and stacked area charts are employed to represent the emotional dynamics over time. Based on the analysis, the system generates personalized recommendations and alerts for users exhibiting signs of distress or emotional instability, including potential suicide risk factors. The goal is to promote proactive well-being management and early intervention. It is crucial to emphasize that this system is a tool for early detection and should be used in conjunction with professional mental health care.
35

Zubair Al-Dyani, Wafa, Adnan Hussein Yahya, and Farzana Kabir Ahmad. "Challenges of event detection from social media streams." International Journal of Engineering & Technology 7, no. 2.15 (April 6, 2018): 72. http://dx.doi.org/10.14419/ijet.v7i2.15.11217.

Abstract:
The area of Event Detection (ED) has attracted researchers' attention over the last few years because of the wide use of social media. Many studies have examined the problem of ED in various social media platforms, like Twitter, Facebook, YouTube, etc. The ED task for social networks involves many issues, including the processing of huge volumes of data with a high level of noise, data collection and privacy issues, etc. Hence, this article presents and discusses the wide range of challenges encountered in the ED process from unstructured text data for the most popular Social Networks (SNs), such as Facebook and Twitter. The main goal is to help researchers understand the main challenges and to discuss future directions in the ED area.
36

Chudaev, D. A. "Materials to diatom flora of Moscow Region: naviculoid diatoms of Meleevsky Stream (Zvenigorod Biological Station)." Novosti sistematiki nizshikh rastenii 50 (2016): 142–59. http://dx.doi.org/10.31111/nsnr/2016.50.142.

Abstract:
The article contains a brief review of investigations of the diatom flora of the S. N. Skadovsky Zvenigorod Biological Station (Moscow Region). Until now, the shallow streams of Moscow Region and European Russia have not attracted proper attention from diatomologists, which makes this study important. Algae of Meleevsky Stream have not previously been studied. The stream flows in a spruce forest; its valley is swampy, and its water is characterized by circumneutral pH values and medium electrolyte content. A list of 98 species, varieties and morphotypes of naviculoid diatoms belonging to 18 genera found in the epipelon of the watercourse is provided. For all taxa, brief descriptions and light or electron micrographs are given. The highest taxonomic richness was observed in the genera Pinnularia (23 taxa), Sellaphora (17) and Stauroneis (13). Three genera and 47 species and varieties are new for the flora of Zvenigorod Biological Station (marked with * in the text), and 21 species are new records for Moscow Region (marked with **). On the one hand, the work broadens our knowledge of the diatom flora of Moscow Region; on the other hand, it contributes to the data on diatom biodiversity in small forest streams of the European part of Russia, which have been neglected by researchers.
37

Li, Shuqi, Weiheng Liao, Yuhan Chen, and Rui Yan. "PEN: Prediction-Explanation Network to Forecast Stock Price Movement with Better Explainability." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 4 (June 26, 2023): 5187–94. http://dx.doi.org/10.1609/aaai.v37i4.25648.

Abstract:
Nowadays, explainability in stock price movement prediction is attracting increasing attention from banks, hedge funds and asset managers, primarily for audit or regulatory reasons. Text data such as financial news and social media posts can be part of the reasons for stock price movement. To this end, we propose a novel framework, the Prediction-Explanation Network (PEN), which jointly models text streams and price streams with alignment. The key component of the PEN model is a shared representation learning module that learns which texts are possibly associated with the stock price movement by modeling the interaction between the text data and stock price data with a salient vector characterizing their correlation. In this way, the PEN model is able to predict the stock price movement by identifying and utilizing abundant messages, while the selected text messages also explain the stock price movement. Experiments on real-world datasets demonstrate that we are able to kill two birds with one stone: in terms of accuracy, the proposed PEN model outperforms the state-of-the-art baseline; in terms of explainability, the PEN model is demonstrated to be far superior to the attention mechanism, capable of picking out the crucial texts with very high confidence.
38

Wang, Y. D., T. Wang, X. Y. Ye, J. Q. Zhu, and J. Lee. "Using social media for disaster emergency management." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B2 (June 8, 2016): 579–81. http://dx.doi.org/10.5194/isprs-archives-xli-b2-579-2016.

Abstract:
Social media have become a universal phenomenon in our society (Wang et al., 2012). As a new data source, social media have been widely used in knowledge discovery in fields related to health (Jackson et al., 2014), human behaviour (Lee, 2014), social influence (Hong, 2013), and market analysis (Hanna et al., 2011). In this paper, we report a case study of the 2012 Beijing Rainstorm to investigate how emergency information was distributed in a timely manner using social media during emergency events. We present a classification and location model for social media text streams during emergency events. This model classifies social media text streams based on their topical contents. Integrated with a trend analysis, we show how Sina-Weibo fluctuated during emergency events. Using a spatial statistical analysis method, we found that the distribution patterns of Sina-Weibo were related to the emergency events but varied among different topics. This study helps us to better understand emergency events so that decision-makers can act on emergencies in a timely manner. In addition, this paper presents the tools, methods, and models developed in this study that can be used to work with text streams from social media in the context of disaster management.
39

Wang, Y. D., T. Wang, X. Y. Ye, J. Q. Zhu, and J. Lee. "Using social media for disaster emergency management." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B2 (June 8, 2016): 579–81. http://dx.doi.org/10.5194/isprsarchives-xli-b2-579-2016.

Abstract:
Social media have become a universal phenomenon in our society (Wang et al., 2012). As a new data source, social media have been widely used in knowledge discovery in fields related to health (Jackson et al., 2014), human behaviour (Lee, 2014), social influence (Hong, 2013), and market analysis (Hanna et al., 2011). In this paper, we report a case study of the 2012 Beijing Rainstorm to investigate how emergency information was distributed in a timely manner using social media during emergency events. We present a classification and location model for social media text streams during emergency events. This model classifies social media text streams based on their topical contents. Integrated with a trend analysis, we show how Sina-Weibo fluctuated during emergency events. Using a spatial statistical analysis method, we found that the distribution patterns of Sina-Weibo were related to the emergency events but varied among different topics. This study helps us to better understand emergency events so that decision-makers can act on emergencies in a timely manner. In addition, this paper presents the tools, methods, and models developed in this study that can be used to work with text streams from social media in the context of disaster management.
40

Гибадуллин, Р. Ф., Д. А. Гашигуллин, and И. С. Вершинин. "Development of StegoStream decorator for associative protection of byte stream." МОДЕЛИРОВАНИЕ, ОПТИМИЗАЦИЯ И ИНФОРМАЦИОННЫЕ ТЕХНОЛОГИИ 11, no. 2(41) (April 12, 2023): 23–24. http://dx.doi.org/10.26102/2310-6018/2023.41.2.023.

Abstract:
The .NET streaming architecture is based on three concepts: backing stores, decorators, and adapters. A backing store is an endpoint, such as a file on a storage device, an array in RAM, or a network connection. A backing store is of no use unless it is exposed to the programmer. The standard .NET class designed for this purpose is Stream; it provides a standard set of methods for byte-by-byte reading, writing, and positioning. Streams fall into two categories: backing store streams and decorator streams. Both deal exclusively with bytes. This is flexible and efficient, but applications often operate at higher levels, such as text or XML. Adapters bridge this gap by wrapping a stream in a class with specialized methods typed for a particular format. The paper presents the StegoStream decorator developed by the authors, which is based on an associative data protection mechanism. This decorator has the following advantages: it interacts with adapters; it relieves backing store streams of the need to implement features such as hiding and unhiding themselves; streams decorated with StegoStream do not suffer from interface changes; and StegoStream can be chained with other decorators (for example, a compression decorator can be combined with the hiding decorator). Practical use of the StegoStream decorator is demonstrated on an implemented multi-client secure chat with a centralized server. Keywords: associative steganography, cryptography, streaming architecture, decorator, information security.
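The three roles named in this abstract (the store that holds the bytes, decorators that transform them, and adapters that expose a higher-level format) can be sketched outside .NET as well. The Python sketch below is only an illustration under stated assumptions: `XorStream` and `TextAdapter` are hypothetical names, and the XOR transform is a placeholder for any reversible byte-level transform. It does not implement the associative protection mechanism of StegoStream, whose details the abstract does not give.

```python
import io


class XorStream:
    """Decorator: wraps any binary stream and XORs each byte with a key.

    Hypothetical stand-in only. XOR is a placeholder for a reversible
    byte-level transform; it is NOT the StegoStream hiding mechanism.
    """

    def __init__(self, inner, key=0x5A):
        self.inner = inner
        self.key = key

    def _transform(self, data):
        return bytes(b ^ self.key for b in data)

    def write(self, data):
        # Transform on the way in, then delegate to the wrapped stream.
        return self.inner.write(self._transform(data))

    def read(self, size=-1):
        # Delegate to the wrapped stream, then undo the transform.
        return self._transform(self.inner.read(size))

    def seek(self, offset, whence=io.SEEK_SET):
        return self.inner.seek(offset, whence)


class TextAdapter:
    """Adapter (hypothetical): text-level methods on top of a byte stream."""

    def __init__(self, stream, encoding="utf-8"):
        self.stream = stream
        self.encoding = encoding

    def write_text(self, text):
        self.stream.write(text.encode(self.encoding))

    def read_text(self):
        return self.stream.read().decode(self.encoding)


backing = io.BytesIO()  # the store: an in-memory byte array
# Decorators share the byte-level interface, so they can be chained.
chained = XorStream(XorStream(backing, key=0x5A), key=0x3C)
adapter = TextAdapter(chained)  # the adapter works with text, not bytes

adapter.write_text("hello stream")
stored = backing.getvalue()  # transformed bytes actually held by the store
chained.seek(0)
recovered = adapter.read_text()
```

Because each decorator exposes the same `read`/`write`/`seek` interface as the stream it wraps, neither the store nor the adapter needs to know how many decorators sit between them, which is the chaining property the abstract highlights.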
41

Tonkin, Emma, Alison Burrows, Przemysław Woznowski, Pawel Laskowski, Kristina Yordanova, Niall Twomey, and Ian Craddock. "Talk, Text, Tag? Understanding Self-Annotation of Smart Home Data from a User’s Perspective." Sensors 18, no. 7 (July 20, 2018): 2365. http://dx.doi.org/10.3390/s18072365.

Abstract:
Delivering effortless interactions and appropriate interventions through pervasive systems requires making sense of multiple streams of sensor data. This is particularly challenging when these concern people’s natural behaviours in the real world. This paper takes a multidisciplinary perspective of annotation and draws on an exploratory study of 12 people, who were encouraged to use a multi-modal annotation app while living in a prototype smart home. Analysis of the app usage data and of semi-structured interviews with the participants revealed strengths and limitations regarding self-annotation in a naturalistic context. Handing control of the annotation process to research participants enabled them to reason about their own data, while generating accounts that were appropriate and acceptable to them. Self-annotation provided participants an opportunity to reflect on themselves and their routines, but it was also a means to express themselves freely and sometimes even a backchannel to communicate playfully with the researchers. However, self-annotation may not be an effective way to capture accurate start and finish times for activities, or location associated with activity information. This paper offers new insights and recommendations for the design of self-annotation tools for deployment in the real world.
42

Zhang, Congle, Stephen Soderland, and Daniel S. Weld. "Exploiting Parallel News Streams for Unsupervised Event Extraction." Transactions of the Association for Computational Linguistics 3 (December 2015): 117–29. http://dx.doi.org/10.1162/tacl_a_00127.

Abstract:
Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as FreeBase rarely cover event relations (e.g. “person travels to location”). Thus, the problem of extracting a wide range of events (e.g., from news streams) is an important, open challenge. This paper introduces NewsSpike-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NewsSpike-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NewsSpike-RE generates high quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas.
43

MOLINA, MARTIN, and AMANDA STENT. "A KNOWLEDGE-BASED METHOD FOR GENERATING SUMMARIES OF SPATIAL MOVEMENT IN GEOGRAPHIC AREAS." International Journal on Artificial Intelligence Tools 19, no. 04 (August 2010): 393–415. http://dx.doi.org/10.1142/s021821301000025x.

Abstract:
In this article we describe a method for automatically generating text summaries of data corresponding to traces of spatial movement in geographical areas. The method can help humans to understand large data streams, such as the amounts of GPS data recorded by a variety of sensors in mobile phones, cars, etc. We describe the knowledge representations we designed for our method and the main components of our method for generating the summaries: a discourse planner, an abstraction module and a text generator. We also present evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.
44

Preotiuc-Pietro, Daniel, Sina Samangooei, Trevor Cohn, Nicholas Gibbins, and Mahesan Niranjan. "Trendminer: An Architecture for Real Time Analysis of Social Media Text." Proceedings of the International AAAI Conference on Web and Social Media 6, no. 3 (August 3, 2021): 38–42. http://dx.doi.org/10.1609/icwsm.v6i3.14348.

Abstract:
The emergence of online social networks (OSNs) and the accompanying availability of large amounts of data pose a number of new natural language processing (NLP) and computational challenges. Data from OSNs is different to data from traditional sources (e.g. newswire). The texts are short, noisy and conversational. Another important issue is that data arrives in real-time streams, needing immediate analysis that is grounded in time and context. In this paper we describe a new open-source framework for efficient text processing of streaming OSN data (available at www.trendminer-project.eu). Whilst researchers have made progress in adapting or creating text analysis tools for OSN data, a system to unify these tasks has yet to be built. Our system is focused on a real-world scenario where fast processing and accuracy are paramount. We use the MapReduce framework for distributed computing and present running times for our system in order to show that scaling to online scenarios is feasible. We describe the components of the system and evaluate their accuracy. Our system supports easy integration of future modules in order to extend its functionality.
45

Bou, Savong, Toshiyuki Amagasa, and Hiroyuki Kitagawa. "Path-based keyword search over XML streams." International Journal of Web Information Systems 11, no. 3 (August 17, 2015): 347–69. http://dx.doi.org/10.1108/ijwis-04-2015-0013.

Abstract:
Purpose – The purpose of this paper is to propose a novel scheme to process XPath-based keyword search over Extensible Markup Language (XML) streams, where one can specify query keywords and XPath-based filtering conditions at the same time. Experimental results show that the proposed scheme can efficiently and practically process XPath-based keyword search over XML streams. Design/methodology/approach – To allow XPath-based keyword search over XML streams, YFilter (Diao et al., 2003) was integrated with CKStream (Hummel et al., 2011). More precisely, the nondeterministic finite automaton (NFA) of YFilter is extended so that keyword matching at text nodes is supported. Next, the stack data structure is modified by integrating the set of NFA states in YFilter with bitmaps generated from the set of keyword queries in CKStream. Findings – Extensive experiments were conducted using both synthetic and real data sets to show the effectiveness of the proposed method. The experimental results showed that the accuracy of the proposed method was better than that of the baseline method (CKStream), while it consumed less memory. Moreover, the proposed scheme showed good scalability with respect to the number of queries. Originality/value – Due to the rapid diffusion of XML streams, the demand for querying such information is also growing. In such a situation, the ability to query by combining XPath and keyword search is important, because it is an easy-to-use but powerful means of querying XML streams. However, no existing work has addressed this issue. This work addresses the problem by combining the existing XPath-based YFilter and the keyword-search-based CKStream for XML streams to enable XPath-based keyword search.
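To make the query model concrete, the sketch below filters an XML stream by a path condition and a keyword condition at the same time. It is a deliberately simplified illustration, not YFilter or CKStream: there is no NFA and there are no bitmaps, it handles a single query, and the path is restricted to child steps only. The function name `stream_matches` and the sample document are assumptions made for the example. It uses the standard-library `xml.etree.ElementTree.iterparse`, keeping a stack of open tags as the stream is read.

```python
import io
import xml.etree.ElementTree as ET


def stream_matches(xml_source, path, keywords):
    """Yield the text of each element whose root-to-node path equals `path`
    and whose text contains every keyword (case-insensitive).

    `path` is a simplified, child-axis-only pattern such as "/feed/item/title"
    (hypothetical query format; real XPath is far richer).
    """
    wanted = [step for step in path.split("/") if step]
    keywords = [k.lower() for k in keywords]
    stack = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        # On "end" the element's text is complete, so it is safe to test it.
        text = (elem.text or "").lower()
        if stack == wanted and all(k in text for k in keywords):
            yield elem.text
        stack.pop()
        elem.clear()  # drop the element's content to limit memory use


# Hypothetical sample stream: only <title> elements under /feed/item qualify.
doc = io.BytesIO(
    b"<feed>"
    b"<item><title>Clustering text streams</title>"
    b"<body>text streams again</body></item>"
    b"<item><title>Image retrieval</title></item>"
    b"<item><title>Topic drift in news text streams</title></item>"
    b"</feed>"
)
hits = list(stream_matches(doc, "/feed/item/title", ["text", "streams"]))
```

The `<body>` element contains both keywords but fails the path condition, and the second title satisfies the path but not the keywords, so only the first and third titles are returned.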
46

Li, Jin, and Ruibo Zhao. "Integrated Classification Algorithm for Unbalanced Data Streams Based on Joint Nonnegative Matrix Factorization." Wireless Communications and Mobile Computing 2022 (June 14, 2022): 1–12. http://dx.doi.org/10.1155/2022/5659979.

Abstract:
The purpose of this paper is to study an integrated classification algorithm for unbalanced data streams based on joint nonnegative matrix factorization (NMF), in order to solve the problem that the basic clustering results obtained from the original data set suffer some information loss, which reduces the effective information in the integration stage. The accuracy on unbalanced data and the detection time consumption are selected as the research objects. Six data sets with imbalanced proportions of minority and majority samples are selected for experiments. Mathematical statistical analysis is first used to observe text classification, disease diagnosis, and network intrusion detection and the classification accuracy of the majority class and the minority class; the commonly used algorithm for unbalanced data is the statistical analysis method. The univariate method for comprehensive classification of unbalanced data streams based on nonnegative matrix factorization is compared with the unbalanced data algorithm, observing the changes in accuracy rate and detection time consumption. The comprehensive classification algorithm for unbalanced data streams classifies the data, judges whether two data points belong to the same category, and determines their degree of balance. The research data show that the integrated classification algorithm for unbalanced data streams based on joint nonnegative matrix factorization can reasonably evaluate the classification performance of the classifier for minority classes, and that detection is faster and saves more time. The experimental research shows that the algorithm combines the relationship matrix and information matrix from the original data set into a consensus function, uses NMF to obtain the membership matrix, effectively uses potential information, improves the accuracy rate of 69.73%, and shortens 71.65% of the time consumed.
47

Hernandez-Boussard, Tina. "Abstract IA14: Linking heterogeneous data to enable knowledge discovery in health care." Cancer Epidemiology, Biomarkers & Prevention 29, no. 9_Supplement (September 1, 2020): IA14. http://dx.doi.org/10.1158/1538-7755.modpop19-ia14.

Abstract:
The vision of precision medicine relies on the linkage of large-scale clinical, molecular, environmental, and patient-generated datasets. Traditionally, diverse data streams have been analyzed independently, including the wealth of information captured in electronic health records (EHRs). However, to successfully leverage the volumes of data that can be used in health care, cross-modality integration is necessary. We have developed a clinical data warehouse for prostate cancer that integrates multiple data streams, from structured EHR data to imaging, state registries to patient-generated data, as well as the rich granular information contained in unstructured clinical narrative text. This rich, longitudinal dataset facilitates secondary data use and enhances observational research in oncology. We have developed advanced machine learning approaches to analyze these data. Our methods can accurately classify patients into clinical and pathologic stage groups and prognostic risk groups. These classifications can be used at point of care to guide optimal treatment pathways based on evidence-based guidelines (e.g., identify high-risk patients for whom a radionuclide bone scan is recommended). Furthermore, linking routinely collected patient surveys to EHRs reveals important differences in global physical and mental health between demographic and clinical subgroups. Giving clinicians visibility into these patient-reported outcomes can help personalize treatment pathways and may inform population health initiatives to support vulnerable groups. The granular health data we collect and link also provide population-based views into changes in treatment patterns and effects from policy changes, e.g., changes to PSA screening guidelines. The integration of diverse data streams presents unique technical, semantic, and ethical challenges; however, our work suggests that multimodal clinical data can significantly improve the performance of prediction algorithms and guide treatment decisions, powering knowledge discovery at the patient and population level. Citation Format: Tina Hernandez-Boussard. Linking heterogeneous data to enable knowledge discovery in health care [abstract]. In: Proceedings of the AACR Special Conference on Modernizing Population Sciences in the Digital Age; 2019 Feb 19-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(9 Suppl):Abstract nr IA14.
48

Kim, Min-Seon, Bo-Young Lim, Kisung Lee, and Hyuk-Yoon Kwon. "Effective Model Update for Adaptive Classification of Text Streams in a Distributed Learning Environment." Sensors 22, no. 23 (November 29, 2022): 9298. http://dx.doi.org/10.3390/s22239298.

Abstract:
In this study, we propose dynamic model update methods for the adaptive classification model of text streams in a distributed learning environment. In particular, we present two model update strategies: (1) the entire model update and (2) the partial model update. The former aims to maximize the model accuracy by periodically rebuilding the model based on the accumulated datasets including recent datasets. Its learning time incrementally increases as the datasets increase, but we alleviate the learning overhead by the distributed learning of the model. The latter fine-tunes the model only with a limited number of recent datasets, noting that the data streams are dependent on a recent event. Therefore, it accelerates the learning speed while maintaining a certain level of accuracy. To verify the proposed update strategies, we extensively apply them to not only fully trainable language models based on CNN, RNN, and Bi-LSTM, but also a pre-trained embedding model based on BERT. Through extensive experiments using two real tweet streaming datasets, we show that the entire model update improves the classification accuracy of the pre-trained offline model; the partial model update also improves it, which shows comparable accuracy with the entire model update, while significantly increasing the learning speed. We also validate the scalability of the proposed distributed learning architecture by showing that the model learning and inference time decrease as the number of worker nodes increases.
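The difference between the two update strategies can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: a bag-of-words count model stands in for the CNN, RNN, Bi-LSTM, and BERT models of the study, there is no distributed learning, and the names `CountModel`, `entire_update`, and `partial_update` are hypothetical.

```python
from collections import Counter, defaultdict


class CountModel:
    """Toy text classifier that keeps per-label word counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit_more(self, batch):
        # Incrementally add the word counts of one batch of (text, label) pairs.
        for text, label in batch:
            self.counts[label].update(text.lower().split())

    def predict(self, text):
        words = text.lower().split()
        # Pick the label whose training words overlap most with the text.
        return max(self.counts,
                   key=lambda lab: sum(self.counts[lab][w] for w in words))


def entire_update(history):
    """Rebuild the model from scratch on all accumulated batches."""
    model = CountModel()
    for batch in history:
        model.fit_more(batch)
    return model


def partial_update(model, history, window=1):
    """Continue training the existing model on the most recent batches only."""
    for batch in history[-window:]:
        model.fit_more(batch)
    return model


history = [
    [("great game tonight", "sports"), ("new phone released", "tech")],
    [("storm warning issued", "weather")],  # most recent batch
]
offline = entire_update(history[:1])           # model trained before the new batch
rebuilt = entire_update(history)               # cost grows with the full history
tuned = partial_update(offline, history, 1)    # cost depends only on the window
```

In this sketch both updated models learn the new `weather` label, but the partial update touches only the last batch, which mirrors the speed versus accuracy trade-off the abstract describes.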
49

Ahmad, Ali M., Ghazali Sulong, Amjad Rehman, Mohammed Hazim Alkawaz, and Tanzila Saba. "Data Hiding Based on Improved Exploiting Modification Direction Method and Huffman Coding." Journal of Intelligent Systems 23, no. 4 (December 1, 2014): 451–59. http://dx.doi.org/10.1515/jisys-2014-0007.

Abstract:
The rapid growth of covert activities via communication networks has brought about an increasing need to provide an efficient method for data hiding to protect secret information from malicious attacks. One of the options is to combine two approaches, namely steganography and compression. However, its performance heavily relies on three major factors, payload, imperceptibility, and robustness, which are always in trade-offs. Thus, this study aims to hide a large secret message inside a grayscale host image without sacrificing its quality and robustness. To realize the goal, a new two-tier data hiding technique is proposed that integrates an improved exploiting modification direction (EMD) method and Huffman coding. First, a secret message of an arbitrary plain text of characters is compressed and transformed into streams of bits; each character is compressed into a maximum of 5 bits per stream. The stream is then divided into two parts of different sizes of 3 and 2 bits. Subsequently, each part is transformed into its decimal value, which serves as a secret code. Second, a cover image is partitioned into groups of 5 pixels based on the original EMD method. Then, an enhancement is introduced by dividing the group into two parts, namely k1 and k2, which consist of 3 and 2 pixels, respectively. Furthermore, several groups are randomly selected for embedding purposes to increase the security. Then, for each selected group, each part is embedded with its corresponding secret code by modifying one grayscale value at most to hide the code in a (2k_i + 1)-ary notational system. The process is repeated until a stego-image is eventually produced. Finally, the χ² test, which is considered one of the most severe attacks, is applied against the stego-image to evaluate the performance of the proposed method in terms of its robustness. The test revealed that the proposed method is more robust than both least significant bit embedding and the original EMD. Additionally, in terms of imperceptibility and capacity, the experimental results have also shown that the proposed method outperformed both the well-known methods, namely original EMD and optimized EMD, with a peak signal-to-noise ratio of 55.92 dB and payload of 52,428 bytes.
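The embedding step "by modifying one grayscale value at most" follows the rule of the original EMD method, which can be sketched as below. This is a minimal sketch of the original EMD rule for a single pixel group, not the improved two-part scheme of the paper: Huffman coding, random group selection, and handling of saturated pixels (values 0 or 255) are left out, and the function names are hypothetical.

```python
def emd_extract(pixels):
    """Recover the hidden digit: weighted pixel sum modulo (2n + 1)."""
    n = len(pixels)
    return sum(i * g for i, g in enumerate(pixels, start=1)) % (2 * n + 1)


def emd_embed(pixels, digit):
    """Hide one (2n + 1)-ary digit in n pixels, changing at most one by +/-1."""
    n = len(pixels)
    base = 2 * n + 1
    if not 0 <= digit < base:
        raise ValueError("digit must lie in range(2n + 1)")
    out = list(pixels)
    s = (digit - emd_extract(out)) % base
    if s == 0:
        return out                 # the group already encodes the digit
    if s <= n:
        out[s - 1] += 1            # raising pixel s adds s to the weighted sum
    else:
        out[base - s - 1] -= 1     # lowering pixel (base - s) also adds s mod base
    return out


group = [100, 101]                 # two pixels, so digits are 5-ary
stego = emd_embed(group, 3)
recovered = emd_extract(stego)
```

With a part of 2 pixels the digit is 5-ary and with a part of 3 pixels it is 7-ary, matching the (2k_i + 1)-ary notation in the abstract.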
50

Mejova, Yelena, and Padmini Srinivasan. "Crossing Media Streams with Sentiment: Domain Adaptation in Blogs, Reviews and Twitter." Proceedings of the International AAAI Conference on Web and Social Media 6, no. 1 (August 3, 2021): 234–41. http://dx.doi.org/10.1609/icwsm.v6i1.14242.

Abstract:
Most sentiment analysis studies address classification of a single source of data such as reviews or blog posts. However, the multitude of social media sources available for text analysis lends itself naturally to domain adaptation. In this study, we create a dataset spanning three social media sources -- blogs, reviews, and Twitter -- and a set of 37 common topics. We first examine sentiments expressed in these three sources while controlling for the change in topic. Then using this multi-dimensional data we show that when classifying documents in one source (a target source), models trained on other sources of data can be as good as or even better than those trained on the target data. That is, we show that models trained on some social media sources are generalizable to others. All source adaptation models we implement show reviews and Twitter to be the best sources of training data. It is especially useful to know that models trained on Twitter data are generalizable, since, unlike reviews, Twitter is more topically diverse.