Journal articles on the topic 'Distributed Stream Processing Systems'

Consult the top 50 journal articles for your research on the topic 'Distributed Stream Processing Systems.'

1

K, Sornalakshmi. "Dynamic Operator Scaling for Distributed Stream Processing Systems for Fluctuating Streams." Journal of Advanced Research in Dynamical and Control Systems 12, SP7 (July 25, 2020): 2815–21. http://dx.doi.org/10.5373/jardcs/v12sp7/20202422.

2

Wei, Xiaohui, Yuan Zhuang, Hongliang Li, and Zhiliang Liu. "Reliable stream data processing for elastic distributed stream processing systems." Cluster Computing 23, no. 2 (May 21, 2019): 555–74. http://dx.doi.org/10.1007/s10586-019-02939-9.

3

Yu, Shuiying, Yinting Zheng, Fan Zhang, Hanhua Chen, and Hai Jin. "TriJoin: A Time-Efficient and Scalable Three-Way Distributed Stream Join System." Journal of Internet Technology (網際網路技術學刊) 24, no. 2 (March 2023): 475–85. http://dx.doi.org/10.53106/160792642023032402024.

Abstract:
Stream join is one of the most fundamental operations in data stream processing applications. Existing distributed stream join systems can support efficient two-way join, which is a join operation between two streams. Based on the two-way join, a three-way join must be split into two two-way joins, where the second two-way join needs to wait for the join result transmitted from the first. We show through experiments that such a design incurs prohibitively high processing latency. To solve this problem, we propose TriJoin, a time-efficient three-way distributed stream join system. We design a symmetric wait-free structure based on symmetric tuple partitioning and reused join. TriJoin utilizes reused join to join each new tuple with the intermediate result of the other two streams and stores tuples locally. For a new tuple, TriJoin only joins it with the intermediate result to generate the final result without waiting, greatly reducing the processing latency. In TriJoin, we design two partitioning and storage schemes according to two different forms of three-way stream join. We implement TriJoin and conduct comprehensive experiments to evaluate the performance using real-world traces. Results show that TriJoin significantly reduces the processing latency by up to 68% compared to existing designs.
4

Shukla, Anshu, and Yogesh Simmhan. "Model-driven scheduling for distributed stream processing systems." Journal of Parallel and Distributed Computing 117 (July 2018): 98–114. http://dx.doi.org/10.1016/j.jpdc.2018.02.003.

5

Bernardelli de Moraes, Matheus, and André Leon Sampaio Gradvohl. "Evaluating the impact of a coordinated checkpointing in distributed data streams processing systems using discrete event simulation." Revista Brasileira de Computação Aplicada 12, no. 2 (May 19, 2020): 16–27. http://dx.doi.org/10.5335/rbca.v12i2.10295.

Abstract:
Data Streams Processing systems process continuous flows of data under Quality of Service requirements. Data streams often contain critical information which requires real-time processing. To guarantee systems' dependability and avoid information loss, one must use a fault-tolerance strategy. However, there are several strategies available, and the proper evaluation of which mechanism is better for each system architecture is challenging, especially in large-scale distributed systems. In this paper, we propose a discrete simulation model for investigating the impacts that the Coordinated Checkpoint fault-tolerance strategy imposes on Data Stream Processing Systems. Results show that this strategy critically affects stream processing in failure-prone situations due to an increase in latency of up to 120% and information loss reaching 95% of the processing window in the worst case.
6

Tran, Tri Minh, and Byung Suk Lee. "Distributed stream join query processing with semijoins." Distributed and Parallel Databases 27, no. 3 (March 6, 2010): 211–54. http://dx.doi.org/10.1007/s10619-010-7062-7.

7

Hildrum, Kirsten, Fred Douglis, Joel L. Wolf, Philip S. Yu, Lisa Fleischer, and Akshay Katta. "Storage optimization for large-scale distributed stream-processing systems." ACM Transactions on Storage 3, no. 4 (February 2008): 1–28. http://dx.doi.org/10.1145/1326542.1326547.

8

Eskandari, Leila, Jason Mair, Zhiyi Huang, and David Eyers. "I-Scheduler: Iterative scheduling for distributed stream processing systems." Future Generation Computer Systems 117 (April 2021): 219–33. http://dx.doi.org/10.1016/j.future.2020.11.011.

9

Liu, Xunyun, and Rajkumar Buyya. "Resource Management and Scheduling in Distributed Stream Processing Systems." ACM Computing Surveys 53, no. 3 (July 5, 2020): 1–41. http://dx.doi.org/10.1145/3355399.

10

Shukla, Anshu, Shilpa Chaturvedi, and Yogesh Simmhan. "RIoTBench: An IoT benchmark for distributed stream processing systems." Concurrency and Computation: Practice and Experience 29, no. 21 (October 4, 2017): e4257. http://dx.doi.org/10.1002/cpe.4257.

11

Valeev, S. S., N. V. Kondratyeva, A. S. Kovtunenko, M. A. Timirov, and R. R. Karimov. "Distributed stream data processing system in multi-agent safety system of infrastructure objects." Information Technology and Nanotechnology, no. 2416 (2019): 324–31. http://dx.doi.org/10.18287/1613-0073-2019-2416-324-331.

Abstract:
This paper considers the problem of resource management in distributed computing systems that process stream data within safety systems for distributed objects. It examines the tasks of streaming data processing in a multi-level multi-agent evacuation system in an infrastructure object and discusses the features of the mathematical model of a distributed stream data processing system.
12

Eiter, Thomas, Paul Ogris, and Konstantin Schekotihin. "A Distributed Approach to LARS Stream Reasoning (System paper)." Theory and Practice of Logic Programming 19, no. 5-6 (September 2019): 974–89. http://dx.doi.org/10.1017/s1471068419000309.

Abstract:
Stream reasoning systems are designed for complex decision-making from possibly infinite, dynamic streams of data. Modern approaches to stream reasoning usually perform their computations using stand-alone solvers, which incrementally update their internal state and return results as new portions of data streams are pushed. However, the performance of such approaches degrades quickly as the rates of the input data and the complexity of decision problems grow. This problem was already recognized in the area of stream processing, where systems became distributed in order to allocate vast computing resources provided by clouds. In this paper we propose a distributed approach to stream reasoning that can efficiently split computations among different solvers communicating their results over data streams. Moreover, in order to increase the throughput of the distributed system, we suggest an interval-based semantics for the LARS language, which enables significant reductions of network traffic. Performed evaluations indicate that distributed stream reasoning significantly outperforms existing stand-alone LARS solvers when the complexity of decision problems and the rate of incoming data increase.
13

Xiao, Fuyuan, Cheng Zhan, Hong Lai, Li Tao, and Zhiguo Qu. "New parallel processing strategies in complex event processing systems with data streams." International Journal of Distributed Sensor Networks 13, no. 8 (August 2017): 155014771772862. http://dx.doi.org/10.1177/1550147717728626.

Abstract:
Sensor network-based applications have gained increasing attention, where data streams gathered from distributed sensors need to be processed and analyzed with timely responses. Distributed complex event processing is an effective technology to handle these data streams by matching incoming events to persistent pattern queries. Therefore, a well-managed parallel processing scheme is required to improve both system performance and the quality-of-service guarantees of the system. However, the specific properties of pattern operators increase the difficulties of implementing parallel processing. To address this issue, a new parallelization model and three parallel processing strategies are proposed for distributed complex event processing systems. The effects of temporal constraints, for example, sliding windows, are included in the new parallelization model to enable the processing load for the overlap between windows of a batch induced by each input event to be shared by the downstream machines to avoid events that may result in wrong decisions. The proposed parallel strategies can keep the complex event processing system working stably and continuously during the elapsed time. Finally, the application of our work is demonstrated using experiments on the StreamBase system regardless of the increased input rate of the stream or the increased time window size of the operator.
14

Balazinska, Magdalena, Hari Balakrishnan, Samuel R. Madden, and Michael Stonebraker. "Fault-tolerance in the borealis distributed stream processing system." ACM Transactions on Database Systems 33, no. 1 (March 2008): 1–44. http://dx.doi.org/10.1145/1331904.1331907.

15

Cardellini, Valeria, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. "Optimal Operator Replication and Placement for Distributed Stream Processing Systems." ACM SIGMETRICS Performance Evaluation Review 44, no. 4 (May 10, 2017): 11–22. http://dx.doi.org/10.1145/3092819.3092823.

16

Repantis, T., Xiaohui Gu, and V. Kalogeraki. "QoS-Aware Shared Component Composition for Distributed Stream Processing Systems." IEEE Transactions on Parallel and Distributed Systems 20, no. 7 (July 2009): 968–82. http://dx.doi.org/10.1109/tpds.2008.165.

17

Rank, Johannes, Jonas Herget, Andreas Hein, and Helmut Krcmar. "Evaluating Task-Level CPU Efficiency for Distributed Stream Processing Systems." Big Data and Cognitive Computing 7, no. 1 (March 10, 2023): 49. http://dx.doi.org/10.3390/bdcc7010049.

Abstract:
Big Data and primarily distributed stream processing systems (DSPSs) are growing in complexity and scale. As a result, effective performance management to ensure that these systems meet the required service level objectives (SLOs) is becoming increasingly difficult. A key factor to consider when evaluating the performance of a DSPS is CPU efficiency, which is the ratio of the workload processed by the system to the CPU resources invested. In this paper, we argue that developing new performance tools for creating DSPSs that can fulfill SLOs while using minimal resources is crucial. This is especially significant in edge computing situations where resources are limited and in large cloud deployments where conserving power and reducing computing expenses are essential. To address this challenge, we present a novel task-level approach for measuring CPU efficiency in DSPSs. Our approach supports various streaming frameworks, is adaptable, and comes with minimal overheads. This enables developers to understand the efficiency of different DSPSs at a granular level and provides insights that were not previously possible.
18

Akanbi, Adeyinka, and Muthoni Masinde. "A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring." Sensors 20, no. 11 (June 3, 2020): 3166. http://dx.doi.org/10.3390/s20113166.

Abstract:
In recent years, the application and wide adoption of Internet of Things (IoT)-based technologies have increased the proliferation of monitoring systems, which has consequently exponentially increased the amounts of heterogeneous data generated. Processing and analysing the massive amount of data produced is cumbersome and is gradually moving from the classical 'batch' processing technique of extract, transform, load (ETL) to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the environmental monitoring domain still utilises legacy systems, which complicates real-time analysis of the essential data and integration with big data platforms, and leads to reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems irrespective of the data types to Apache Kafka topics using Kafka Connect APIs for processing by the Kafka streaming processing engine. The stream processing engine executes the predictive numerical models and algorithms represented in event processing (EP) languages for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that could be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. In conclusion, a performance evaluation of the distributed stream processing middleware infrastructure is carried out to determine the real-time effectiveness of the framework.
19

Xiao, Fuyuan, Teruaki Kitasuka, and Masayoshi Aritsugi. "Economical and Fault-Tolerant Load Balancing in Distributed Stream Processing Systems." IEICE Transactions on Information and Systems E95-D, no. 4 (2012): 1062–73. http://dx.doi.org/10.1587/transinf.e95.d.1062.

20

Li, Yue-jie. "Data Stream of Wireless Sensor Networks Based on Deep Learning." International Journal of Online Engineering (iJOE) 12, no. 11 (November 24, 2016): 22. http://dx.doi.org/10.3991/ijoe.v12i11.6232.

Abstract:
The sensor data in wireless sensor networks are continuously arriving in multiple, rapid, time varying, possibly unpredictable, unbounded streams, and no record of historical information is kept. These limitations make conventional Database Management Systems and their evolution unsuitable for streams. Thereby there is a need to build a complete Data Streaming Management System (DSMS), which could process streams and perform dynamic continuous query processing. In this paper, a framework for Adaptive Distributed Data Streaming Management System (ADDSMS) is presented, which operates as streams control interface between arrays of distributed data stream sources and end-user clients who access and analyze these streams. Simulation results show that the proposed method can thus improve overall system performance substantially.
21

Henning, Sören, and Wilhelm Hasselbring. "Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures." Big Data Research 25 (July 2021): 100209. http://dx.doi.org/10.1016/j.bdr.2021.100209.

22

Ivanov, Yurii, Borys Sharov, Nazar Zalevskyi, and Ostap Kernytskyi. "Software System for End-Products Accounting in Bakery Production Lines Based on Distributed Video Streams Analysis." Advances in Cyber-Physical Systems 7, no. 2 (December 16, 2022): 101–7. http://dx.doi.org/10.23939/acps2022.02.101.

Abstract:
Among the main requirements of modern surveillance systems are stability in the face of negative influences and intellectualization. The purpose of intellectualization is that the surveillance system should not only perform the main functions, such as monitoring and stream recording, but also provide effective stream processing. The requirement for this processing is that the system operation has to be automated, and the operator's influence should be minimal. Modern intelligent surveillance systems require the development of grouping methods. The context of the grouping method here is associated with a decomposition of the target problem. Depending on the purpose of the system, the target problem can comprise several subproblems, each of which is usually solved by artificial intelligence or data mining methods.
23

Bordin, Maycon Viana, Dalvan Griebler, Gabriele Mencagli, Claudio F. R. Geyer, and Luiz Gustavo L. Fernandes. "DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems." IEEE Access 8 (2020): 222900–222917. http://dx.doi.org/10.1109/access.2020.3043948.

24

Llaves, Alejandro, Oscar Corcho, Peter Taylor, and Kerry Taylor. "Enabling RDF Stream Processing for Sensor Data Management in the Environmental Domain." International Journal on Semantic Web and Information Systems 12, no. 4 (October 2016): 1–21. http://dx.doi.org/10.4018/ijswis.2016100101.

Abstract:
This paper presents a generic approach to integrate environmental sensor data efficiently, allowing the detection of relevant situations and events in near real-time through continuous querying. Data variety is addressed with the use of the Semantic Sensor Network ontology for observation data modelling, and semantic annotations for environmental phenomena. Data velocity is handled by distributing sensor data messaging and serving observations as RDF graphs on query demand. The stream processing engine presented in the paper, morph-streams++, provides adapters for different data formats and distributed processing of streams in a cluster. An evaluation of different configurations for parallelization and semantic annotation parameters proves that the described approach reduces the average latency of message processing in some cases.
25

Yang, Dingyu, Jianmei Guo, Zhi-Jie Wang, Yuan Wang, Jingsong Zhang, Liang Hu, Jian Yin, and Jian Cao. "FastPM: An approach to pattern matching via distributed stream processing." Information Sciences 453 (July 2018): 263–80. http://dx.doi.org/10.1016/j.ins.2018.04.031.

26

Pishgoo, Boshra, Ahmad Akbari Azirani, and Bijan Raahemi. "A hybrid distributed batch-stream processing approach for anomaly detection." Information Sciences 543 (January 2021): 309–27. http://dx.doi.org/10.1016/j.ins.2020.07.026.

27

Kim, Yoon-Ki, and Yongsung Kim. "DiPLIP: Distributed Parallel Processing Platform for Stream Image Processing Based on Deep Learning Model Inference." Electronics 9, no. 10 (October 13, 2020): 1664. http://dx.doi.org/10.3390/electronics9101664.

Abstract:
Recently, as the amount of real-time video streaming data has increased, distributed parallel processing systems have rapidly evolved to process large-scale data. In addition, with an increase in the scale of computing resources constituting the distributed parallel processing system, the orchestration of technology has become crucial for proper management of computing resources, in terms of allocating computing resources, setting up a programming environment, and deploying user applications. In this paper, we present a new distributed parallel processing platform for real-time large-scale image processing based on deep learning model inference, called DiPLIP. It provides a scheme for large-scale real-time image inference using a buffer layer and a parallel processing environment that scales according to the size of the stream image. It allows users to easily run trained deep learning models for processing real-time images in a distributed parallel processing environment at high speeds, through the distribution of the virtual machine container.
28

Xiao, Fuyuan, and Masayoshi Aritsugi. "An Adaptive Parallel Processing Strategy for Complex Event Processing Systems over Data Streams in Wireless Sensor Networks." Sensors 18, no. 11 (November 2, 2018): 3732. http://dx.doi.org/10.3390/s18113732.

Abstract:
Efficient matching of incoming events of data streams to persistent queries is fundamental to event stream processing systems in wireless sensor networks. These applications require dealing with high volume and continuous data streams with fast processing time on distributed complex event processing (CEP) systems. Therefore, a well-managed parallel processing technique is needed for improving the performance of the system. However, the specific properties of pattern operators in the CEP systems increase the difficulties of the parallel processing problem. To address these issues, a parallelization model and an adaptive parallel processing strategy are proposed for the complex event processing by introducing a histogram and utilizing the probability and queue theory. The proposed strategy can estimate the optimal event splitting policy, which can suit the most recent workload conditions such that the selected policy has the least expected waiting time for further processing of the arriving events. The proposed strategy can keep the CEP system running fast under the variation of the time window sizes of operators and the input rates of streams. Finally, the utility of our work is demonstrated through the experiments on the StreamBase system.
29

Alshamrani, Sultan, Hesham Alhumyani, Quadri Waseem, and Isbudeen Noor Mohamed. "High availability of data using Automatic Selection Algorithm (ASA) in distributed stream processing systems." Bulletin of Electrical Engineering and Informatics 8, no. 2 (June 1, 2019): 690–98. http://dx.doi.org/10.11591/eei.v8i2.1414.

Abstract:
High availability of data is one of the most critical requirements of a distributed stream processing system (DSPS). We can achieve high availability using available recovery techniques, which include active backup, passive backup, and upstream backup. Each recovery technique has its own advantages and disadvantages. They are used for different failures based on the type and the nature of the failures. This paper presents an Automatic Selection Algorithm (ASA) which helps in selecting the best recovery technique based on the type of failure. We intend to use all the available recovery approaches together (i.e., active standby, passive standby, and upstream standby) at nodes in a distributed stream processing system (DSPS), based upon the system requirements and the failure type. By doing this, we achieve the benefits of fastest recovery, precise recovery, and a lower runtime overhead in a single solution. We evaluate our automatic selection algorithm (ASA) approach as an algorithm selector during the runtime of stream processing. Moreover, we also evaluate its efficiency with respect to time. The experimental results show that our approach is 95% efficient and faster than other conventional manual failure recovery approaches, and is hence totally automatic in nature.
30

Jayasekara, Sachini, Aaron Harwood, and Shanika Karunasekera. "A utilization model for optimization of checkpoint intervals in distributed stream processing systems." Future Generation Computer Systems 110 (September 2020): 68–79. http://dx.doi.org/10.1016/j.future.2020.04.019.

31

Ye, Qian, and Minyan Lu. "s2p: Provenance Research for Stream Processing System." Applied Sciences 11, no. 12 (June 15, 2021): 5523. http://dx.doi.org/10.3390/app11125523.

Abstract:
The main purpose of our provenance research for DSP (distributed stream processing) systems is to analyze abnormal results. Provenance for these systems is nontrivial because of the ephemerality of stream data and the instant data processing mode in modern DSP systems. Challenges include, but are not limited to, finding an optimized solution that avoids excessive runtime overhead, reducing provenance-related data storage, and providing provenance in an easy-to-use fashion. Without any prior knowledge about which kinds of data may finally lead to abnormal results, we have to track all transformations in detail, which potentially places a heavy burden on the system. This paper proposes s2p (Stream Process Provenance), which mainly consists of online provenance and offline provenance, to provide fine- and coarse-grained provenance at different precision. We base our design of s2p on the fact that, for a mature online DSP system, abnormal results are rare, and results that require a detailed analysis are even rarer. We also consider state transition in our provenance explanation. We implement s2p on Apache Flink, named s2p-flink, and conduct three experiments to evaluate its scalability, efficiency, and overhead in terms of end-to-end cost, throughput, and space overhead. Our evaluation shows that s2p-flink incurs a 13% to 32% cost overhead, an 11% to 24% decline in throughput, and few additional space costs in the online provenance phase. Experiments also demonstrate that s2p-flink can scale well. A case study is presented to demonstrate the feasibility of the whole s2p solution.
32

Ni, Xiang, Jing Li, Mo Yu, Wang Zhou, and Kun-Lung Wu. "Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 857–64. http://dx.doi.org/10.1609/aaai.v34i01.5431.

Abstract:
This paper considers the problem of resource allocation in stream processing, where continuous data flows must be processed in real time in a large distributed system. To maximize system throughput, the resource allocation strategy that partitions the computation tasks of a stream processing graph onto computing devices must simultaneously balance workload distribution and minimize communication. Since this problem of graph partitioning is known to be NP-complete yet crucial to practical streaming systems, many heuristic-based algorithms have been developed to find reasonably good solutions. In this paper, we present a graph-aware encoder-decoder framework to learn a generalizable resource allocation strategy that can properly distribute computation tasks of stream processing graphs unobserved from training data. We, for the first time, propose to leverage graph embedding to learn the structural information of the stream processing graphs. Jointly trained with the graph-aware decoder using deep reinforcement learning, our approach can effectively find optimized solutions for unseen graphs. Our experiments show that the proposed model outperforms both METIS, a state-of-the-art graph partitioning algorithm, and an LSTM-based encoder-decoder model, in about 70% of the test cases.
33

Poźniak, Krzysztof. "Modeling of Synchronous Data Streams Processing in the RPC Muon Trigger System of the CMS Experiment." International Journal of Electronics and Telecommunications 56, no. 4 (November 1, 2010): 489–502. http://dx.doi.org/10.2478/v10177-010-0067-3.

Abstract:
This paper presents signal synchronization aspects in a large, distributed, multichannel RPC Muon Trigger system in the CMS experiment. The paper is an introduction to normalized structure analysis methods of such systems. The method introduces a general model of the system, presented in the form of a network of distributed, synchronous, pipeline processes. The model is based on a definition of a synchronous data stream and its formal, fundamental properties. Theoretical considerations are supported by a practical application of synchronous streams and processes management. The following processes were modeled and implemented in hardware: window synchronization, derandomization, data concentration and generation of test pulses. Selected results of the model application in the CMS experiment are presented.
34

Imran, Muhammad, Gábor E. Gévay, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl. "Fast datalog evaluation for batch and stream graph processing." World Wide Web 25, no. 2 (January 20, 2022): 971–1003. http://dx.doi.org/10.1007/s11280-021-00960-w.

Abstract:
Implementing complex algorithms for big data, artificial intelligence, and graph processing requires enormous effort. Succinct, declarative programs to solve complex problems that can be efficiently executed for batching and streaming data are in demand. This paper presents Nexus, a distributed Datalog evaluation system. It evaluates Datalog programs using the semi-naive algorithm for batch and streaming data using incremental and asynchronous iteration. Furthermore, we evaluate Datalog programs with aggregates to determine the advantages of implementing the semi-naive algorithm using incremental iteration on its performance. Our experimental results show that Nexus significantly outperforms acyclic dataflow-based systems.
35

Ye, Qian, and Minyan Lu. "SPOT: Testing Stream Processing Programs with Symbolic Execution and Stream Synthesizing." Applied Sciences 11, no. 17 (August 30, 2021): 8057. http://dx.doi.org/10.3390/app11178057.

Full text
Abstract:
Adoption of distributed stream processing (DSP) systems such as Apache Flink in real-time big data processing is increasing. However, DSP programs are prone to be buggy, especially when one programmer neglects some DSP features (e.g., source data reordering), which motivates development of approaches for testing and verification. In this paper, we focus on the test data generation problem for DSP programs. Currently, there is a lack of an approach that generates test data for DSP programs with both high path coverage and covering different stream reordering situations. We present a novel solution, SPOT (i.e., Stream Processing Program Test), to achieve these two goals simultaneously. At first, SPOT generates a set of individual test data representing each path of one DSP program through symbolic execution. Then, SPOT composes these independent data into various time series data (a.k.a, stream) in diverse reordering. Finally, we can perform a test by feeding the DSP program with these streams continuously. To automatically support symbolic analysis, we also developed JPF-Flink, a JPF (i.e., Java Pathfinder) extension to coordinate the execution of Flink programs. We present four case studies to illustrate that: (1) SPOT can support symbolic analysis for the commonly used DSP operators; (2) test data generated by SPOT can more efficiently achieve high JDU (i.e., Joint Dataflow and UDF) path coverage than two recent DSP testing approaches; (3) test data generated by SPOT can more easily trigger software failure when comparing with those two DSP testing approaches; and (4) the data randomly generated by those two test techniques are highly skewed in terms of stream reordering, which is measured by the entropy metric. In comparison, it is even for test data from SPOT.
36

Russo Russo, Gabriele, Matteo Nardelli, Valeria Cardellini, and Francesco Lo Presti. "Multi-Level Elasticity for Wide-Area Data Streaming Systems: A Reinforcement Learning Approach." Algorithms 11, no. 9 (September 7, 2018): 134. http://dx.doi.org/10.3390/a11090134.

Abstract:
The capability of efficiently processing the data streams emitted by today's ubiquitous sensing devices enables the development of new intelligent services. Data Stream Processing (DSP) applications allow for processing huge volumes of data in near real-time. To keep up with the high volume and velocity of data, these applications can elastically scale their execution on multiple computing resources to process the incoming data flow in parallel. Because data sources and consumers are usually located at the network edges, geo-distributed computing resources represent an attractive environment for DSP. However, controlling the applications and the processing infrastructure in such wide-area environments represents a significant challenge. In this paper, we present a two-layered hierarchical solution for the autonomous control of elastic DSP applications and infrastructures, in which centralized components coordinate subordinated distributed managers, which, in turn, locally control the elastic adaptation of the application components and deployment regions. Exploiting this framework, we design several self-adaptation policies, including reinforcement learning based solutions. We show the benefits of the presented self-adaptation policies with respect to static provisioning solutions, and discuss the strengths of reinforcement learning based approaches, which learn from experience how to optimize the application performance and resource allocation.
37

Park, Jun Pyo, Chang-Sup Park, and Yon Dohn Chung. "Energy and Latency Efficient Access of Wireless XML Stream." Journal of Database Management 21, no. 1 (January 2010): 58–79. http://dx.doi.org/10.4018/jdm.2010112303.

Abstract:
In this article, we address the problem of delayed query processing caused by tree-based index structures in wireless broadcast environments, which increases the access time of mobile clients. We propose a novel distributed index structure and a clustering strategy for streaming XML data that enable energy- and latency-efficient broadcasting of XML data. We first define the DIX node structure to implement a fully distributed index structure, which contains the tag name, attributes, and text content of an element, as well as its corresponding indices. By exploiting the index information in the DIX node stream, a mobile client can access the stream with shorter latency. We also suggest a method of clustering DIX nodes in the stream, which can further enhance the performance of query processing in the mobile clients. Through extensive experiments, we demonstrate that our approach is effective for wireless broadcasting of XML data and outperforms the previous methods.
38

Rajaguru D., Puviyarasi T., and Vengattaraman T. "Malicious Data Stream Identification to Improve the Resource Elasticity of Handheld Edge Computing System." International Journal of Handheld Computing Research 8, no. 4 (October 2017): 30–39. http://dx.doi.org/10.4018/ijhcr.2017100103.

Abstract:
This article highlights the need to identify resource elasticity in handheld edge computing systems and its related issues. Under several emerging application scenarios, for example, smart cities, operational monitoring of large infrastructures, wearable assistance, and the Internet of Things, continuous data streams must be processed under short delays. Several solutions, including various software engines, have been developed for processing unbounded data streams in a scalable and efficient way. More recently, architectures have been proposed that use edge computing for data stream processing. This article reviews state-of-the-art stream processing engines and mechanisms for exploiting the resource elasticity features of cloud computing in stream processing. Resource elasticity allows an application or service to scale out/in according to fluctuating demands. Elasticity becomes even more challenging in distributed environments comprising edge and cloud computing resources. Device security is one of the major challenges for the successful implementation of the Internet of Things and fog computing environments in the current IT space. Researchers and information technology (IT) organizations have explored many solutions to protect systems from unauthenticated device attacks. Fog computing uses network devices (e.g., routers, switches, and hubs) for latency-aware processing of data collected through the IoT. This article concludes with various processes for improving the resource elasticity of handheld devices to lead communication to the next stage of computing.
39

Cheng, Zhinan, Qun Huang, and Patrick P. C. Lee. "On the performance and convergence of distributed stream processing via approximate fault tolerance." VLDB Journal 28, no. 5 (September 3, 2019): 821–46. http://dx.doi.org/10.1007/s00778-019-00565-w.

40

Ageev, Aleksey Vladimirovich, Andrey Alexandrovich Boguslavskiy, and Sergey Mikhailovich Sokolov. "Task scheduling in the onboard computer system." Keldysh Institute Preprints, no. 43 (2023): 1–27. http://dx.doi.org/10.20948/prepr-2023-43.

Abstract:
The problem of rational resource allocation in the on-board computing system of a robotic complex is considered. As a first step, the possibility of using non-preemptive online scheduling algorithms for distributed systems, specifically the cyclic Round Robin algorithm, is analyzed. To demonstrate the basic capabilities of the developed scheduler, a video stream segmentation task is used. The peculiarities of task processing for real-time vision systems are demonstrated. The problem of inter-node synchronization of sensor data is solved. A feature of on-board robotics resources, namely the need for linking software in the form of the Robot Operating System, is taken into account. To develop the task scheduler, the C++ programming language and the ROS2 framework, which provides asynchronous networking, are used. A scheduling model and software implementing this model are being built to perform tasks in a distributed environment in order to control the processing of video streams in a vision system.
41

Qu, Zhijian, Hanxin Liu, Hanlin Wang, Xinqiang Chen, Rui Chi, and Zixiao Wang. "Cluster equilibrium scheduling method based on backpressure flow control in railway power supply systems." PLOS ONE 15, no. 12 (December 9, 2020): e0243543. http://dx.doi.org/10.1371/journal.pone.0243543.

Abstract:
The purpose of the study is to address two problems in the scheduling and monitoring center of a railway network: the increasingly significant processing delay of massive monitoring data and imbalanced tasks. To tackle these problems, a smooth weighted round-robin scheduling method based on backpressure flow control (BF-SWRR) is proposed. The method is developed based on a model for message queues and real-time streaming computing. Using telemetry data flow as the input data source, the fields of the data sources are segmented into different sets by a distributed model of parallel stream computing. Moreover, the round-robin (RR) scheduling method for the distributed server is improved. The parallelism, memory occupancy, and system delay are tested by taking a high-speed train section of a certain line as an example. The results show that the BF-SWRR method for clusters can keep the delay within 1 s. When the parallelism of distributed clusters is set to 8, occupancy rates of the CPU and memory can be decreased by about 15%. In this way, the overall load of the cluster during stream computing is more balanced.
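As a rough illustration of the smooth weighted round-robin idea named in this abstract, the sketch below shows the generic, widely used form of the algorithm. It is not the authors' BF-SWRR method and omits the backpressure flow control component entirely; the `Server` class, the `pick` and `schedule` functions, and the example weights are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int       # static weight (illustrative values, not from the paper)
    current: int = 0  # running "current weight", starts at zero


def pick(servers):
    """Select the next server using generic smooth weighted round-robin."""
    total = sum(s.weight for s in servers)
    # Every server gains its static weight on each round.
    for s in servers:
        s.current += s.weight
    # The server with the largest current weight is chosen
    # (ties go to the earliest server in the list).
    chosen = max(servers, key=lambda s: s.current)
    # The chosen server pays back the total weight, which spreads
    # its selections out instead of clustering them.
    chosen.current -= total
    return chosen.name


def schedule(servers, n):
    """Return the names of the servers chosen for n consecutive tasks."""
    return [pick(servers) for _ in range(n)]


if __name__ == "__main__":
    nodes = [Server("a", 5), Server("b", 1), Server("c", 1)]
    print(schedule(nodes, 7))
```

Compared with plain weighted round-robin, which would send five consecutive tasks to the heaviest node, this variant interleaves the lighter nodes between selections of the heavier one while still respecting the weight ratio over a full cycle.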
42

Wang, Yongheng, Xiaozan Zhang, and Zengwang Wang. "A Proactive Decision Support System for Online Event Streams." International Journal of Information Technology & Decision Making 17, no. 06 (November 2018): 1891–913. http://dx.doi.org/10.1142/s0219622018500463.

Abstract:
In-stream big data processing is an important part of big data processing. Proactive decision support systems can predict future system states and execute actions to avoid unwanted states. In this paper, we propose a proactive decision support system for online event streams. Based on Complex Event Processing (CEP) technology, this method uses a structure-varying dynamic Bayesian network to predict future events and system states. Different Bayesian network structures are learned and used according to different event contexts. A networked distributed Markov decision process model with predicted states is proposed as the sequential decision-making model. A Q-learning method is investigated for this model to find the optimal joint policy. The experimental evaluations show that this method works well for congestion control in transportation systems.
43

Афанасьев, В. В., О. А. Бебенина, and И. И. Ветров. "APPROACHES TO METHOD OF BINARY CLASSIFICATION SELECTING FOR TEXT MESSAGES IN STREAMING DISTRIBUTED SYSTEMS OF INFORMATION PROCESSING." СИСТЕМЫ УПРАВЛЕНИЯ И ИНФОРМАЦИОННЫЕ ТЕХНОЛОГИИ, no. 4(82) (December 1, 2020): 43–46. http://dx.doi.org/10.36622/vstu.2020.48.26.010.

Abstract:
The article presents an approach to selecting a binary (two-class) classification method for a stream of text messages (using news sources, namely the RSS service, as an example) arriving at the input of a distributed information processing system. It considers both the choice of a binary classification method that is rational from the point of view of stream processing, for a text message processing subsystem based on machine learning technology, and the formation of the training and test samples of text messages required at the stage of training a binary classifier. The results of experimental verification of the obtained values of these samples are presented in relation to the considered methods of binary classification. An approach to the process of training the classifier for the text message processing subsystem in a distributed streaming information processing system is presented.
44

Boyarshin, Igor, Anna Doroshenko, and Pavlo Rehida. "REQUEST BALANCING METHOD FOR INCREASING THEIR PROCESSING EFFICIENCY WITH INFORMATION REPLICATION IN A DISTRIBUTED DATA STORAGE SYSTEM." TECHNICAL SCIENCES AND TECHNOLOGIES, no. 2(24) (2021): 75–82. http://dx.doi.org/10.25140/2411-5363-2021-2(24)-75-82.

Abstract:
The article describes a new method that uses replication to improve the efficiency of systems that store data shared by many users and provide access to it. Existing methods of load balancing in data storage systems are described, namely RR and WRR. A new method of balancing requests among multiple data storage nodes is proposed that can adjust to the intensity of the input request stream in real time while utilizing disk space efficiently.
45

Chavarria, J. Andres, Todd Bown, Paul Clarkson, Simon Watson, and Chris Minto. "Digitalization of asset surveillance through distributed fiber-optic sensing: Geophysics and engineering diagnostics and streaming." Leading Edge 41, no. 9 (September 2022): 636–40. http://dx.doi.org/10.1190/tle41090636.1.

Abstract:
Fiber-optic distributed acoustic sensing (DAS) can listen to a wide range of signals. This listening takes place at high sampling rates with fine spatial resolution, resulting in large data volumes. Data streaming solutions are available but result in large transmission and storage costs. In this paper, we describe strategies to convert large data streams from DAS interrogator units to diagnostics or processed products. Optimizing DAS systems results in higher signal-to-noise ratio for signals while extracting diagnostic features out of the noise that could be related to production or well engineering. DAS has sensitivity to diverse signals, and the first goal of edge processing is to separate them for consumption by various disciplines. Focusing the processing on specific aspects in the DAS recordings provides data products that are streamed in efficient ways. We show how DAS processing can deploy fast algorithms so that data diagnostics are sent to remote locations. This enables real-time diagnostics and event-detection tools. By providing the bulk of computing in the field, data upload to remote servers is efficient and targeted. We show how this managed data stream enables digitalization of engineering and geoscience assets.
46

Theodorakis, Georgios, Fotios Kounelis, Peter Pietzuch, and Holger Pirk. "Scabbard." Proceedings of the VLDB Endowment 15, no. 2 (October 2021): 361–74. http://dx.doi.org/10.14778/3489496.3489515.

Abstract:
Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O bandwidth of a single-node, it becomes infeasible to persist all stream data and operator state during execution. Instead, single-node SPEs rely on upstream distributed systems, such as Apache Kafka, to recover stream data after failure, necessitating complex cluster-based deployments. This lack of built-in fault-tolerance features has hindered the adoption of single-node SPEs. We describe Scabbard, the first single-node SPE that supports exactly-once fault-tolerance semantics despite limited local I/O bandwidth. Scabbard achieves this by integrating persistence operations with the query workload. Within the operator graph, Scabbard determines when to persist streams based on the selectivity of operators: by persisting streams after operators that discard data, it can substantially reduce the required I/O bandwidth. As part of the operator graph, Scabbard supports parallel persistence operations and uses markers to decide when to discard persisted data. The persisted data volume is further reduced using workload-specific compression: Scabbard monitors stream statistics and dynamically generates computationally efficient compression operators. Our experiments show that Scabbard can execute stream queries that process over 200 million tuples per second while recovering from failures with sub-second latencies.
47

De Pauw, Wim, and Henrique Andrade. "Visualizing Large-Scale Streaming Applications." Information Visualization 8, no. 2 (January 22, 2009): 87–106. http://dx.doi.org/10.1057/ivs.2009.5.

Abstract:
Stream processing is a new and important computing paradigm. Innovative streaming applications are being developed in areas ranging from scientific applications (for example, environment monitoring), to business intelligence (for example, fraud detection and trend analysis), to financial markets (for example, algorithmic trading systems). In this paper we describe Streamsight, a new visualization tool built to examine, monitor and help understand the dynamic behavior of streaming applications. Streamsight can handle the complex, distributed and large-scale nature of stream processing applications by using hierarchical graphs, multi-perspective visualizations, and de-cluttering strategies. To address the dynamic and adaptive nature of these applications, Streamsight also provides real-time visualization as well as the capability to record and replay. All these features are used for debugging, for performance optimization, and for management of resources, including capacity planning. More than 100 developers, both inside and outside IBM, have been using Streamsight to help design and implement large-scale stream processing applications.
48

Kalyaev, A. I. "APPLICATION OF DISTRIBUTED COMPUTING SYSTEMS FOR IMAGE PROCESSING IN ORDER TO SEARCH FOR UNMANNED AERIAL VEHICLES." Vestnik komp'iuternykh i informatsionnykh tekhnologii, no. 208 (October 2021): 46–53. http://dx.doi.org/10.14489/vkit.2021.10.pp.046-053.

Abstract:
This article describes an approach to solving the problem of searching for, identifying and tracking UAVs (Unmanned Aerial Vehicles) using a distributed computing system for processing images from multiple surveillance cameras. Today, the problem of finding UAVs is becoming especially relevant due to their widespread availability and low cost, which gives wide scope for illegal use: carrying out terrorist attacks in crowded places and on critical infrastructure, as well as unauthorized surveillance of specially protected areas. At the same time, modern radars have low efficiency in searching for UAVs, so visual detection tools are used instead, and these require complex calculations to work effectively. In this article, it is proposed to use distributed computing systems to solve these complex problems of processing a video stream for the purpose of searching for, identifying and tracking objects (UAVs). For this, the author, proceeding from the potential areas of application of such systems, applies a multiagent approach, which makes it possible to create fault-tolerant and scalable systems. In the course of this work, software for a distributed computing system for image processing in order to search for unmanned aerial vehicles was created, and a hardware stand was assembled to test it. From the tests, it was concluded that the proposed method can be applied to process video with high resolution and frame rate in a distributed computing system.
49

Alaasam, Ameer Basim Abdulameer, Gleb Igorevich Radchenko, Andrei Nikolaevitch Tchernykh, and José Luis González-Compeán. "Stateful Stream Processing Containerized as Microservice to Support Digital Twins in Fog Computing." Proceedings of the Institute for System Programming of the RAS 33, no. 1 (2021): 65–80. http://dx.doi.org/10.15514/ispras-2021-33(1)-5.

Abstract:
Digital twins of processes and devices use information from sensors to synchronize their state with the entities of the physical world. The concept of stream computing enables effective processing of events generated by such sensors. However, the need to track the state of an instance of the object makes it impossible to organize instances of digital twins as stateless services. Another feature of digital twins is that several tasks implemented on their basis require the ability to respond to incoming events at near-real-time speed. In this case, the use of cloud computing becomes unacceptable due to high latency. Fog computing addresses this problem by moving some computational tasks closer to the data sources. One of the recent solutions for developing loosely coupled distributed systems is the microservice approach, which implies organizing the distributed system as a set of coherent and independent services interacting with each other using messages. Microservices are most often isolated using containers to avoid the high overheads of virtual machines. The main problem is that microservices and containers together are stateless by nature. Container technology still does not fully support live container migration between physical hosts without data loss, which causes challenges in ensuring the uninterrupted operation of services in fog computing environments. Thus, an essential challenge is to create a containerized, stateful, stream-processing-based microservice to support digital twins in the fog computing environment. Within the scope of this article, we study live stateful stream processing migration and how to redistribute computational activity across cloud and fog nodes using Kafka middleware and its Stream DSL API.
50

Riesco, A., and J. Rodríguez-Hortalá. "Property-Based Testing for Spark Streaming." Theory and Practice of Logic Programming 19, no. 04 (February 19, 2019): 574–602. http://dx.doi.org/10.1017/s1471068419000012.

Abstract:
Stream processing has reached the mainstream in recent years, as a new generation of open-source distributed stream processing systems, designed for scaling horizontally on commodity hardware, has brought the capability for processing high-volume and high-velocity data streams to companies of all sizes. In this work, we propose a combination of temporal logic and property-based testing (PBT) for dealing with the challenges of testing programs that employ this programming model. We formalize our approach in a discrete-time temporal logic for finite words, with some additions to improve the expressiveness of properties, including timeouts for temporal operators and a binding operator for letters. In particular, we focus on testing Spark Streaming programs written with the Spark API for the functional language Scala, using the PBT library ScalaCheck. To that end, we add temporal logic operators to a set of new ScalaCheck generators and properties, as part of our testing library sscheck.