Dissertations / Theses on the topic 'Stream Processing System'

Consult the top 50 dissertations / theses for your research on the topic 'Stream Processing System.'

1

Wladdimiro, Cottet Daniel. "Dynamic adaptation in Stream Processing Systems." Electronic Thesis or Diss., Sorbonne université, 2024. http://www.theses.fr/2024SORUS028.

Full text
Abstract:
The amount of data produced by today's web-based systems and applications increases rapidly, due to the many interactions with users (e.g. real-time stock market transactions, multiplayer games, streaming data produced by Twitter, etc.). As a result, there is a growing demand, particularly in the fields of commerce, security and research, for systems capable of processing this data in real time and providing useful information in a short space of time. Stream processing systems (SPS) meet these needs and have been widely used for this purpose. The aim of an SPS is to process large volumes of data in real time by hosting a set of operators in applications structured as directed acyclic graphs (DAGs). Most existing SPSs, such as Flink or Storm, are configured prior to deployment, usually defining the DAG and the number of operator replicas in advance. Overestimating the number of replicas then leads to a waste of allocated resources. On the other hand, depending on interaction with the environment, the input data rate can fluctuate dynamically and, as a result, operators can become overloaded, leading to a degradation in system performance. These SPSs are not capable of dynamically adapting to operator workload and input rate variations. One solution to this problem is to dynamically increase the number of resources, physical or logical, allocated to the SPS when the processing demand of one or more operators increases. This thesis presents two SPSs, RA-SPS and PA-SPS, implementing a reactive and a predictive approach respectively, for dynamically modifying the number of operator replicas. The reactive approach relies on the current state of the operators, computed from multiple metrics, while the predictive model is based on input rate variation, operator execution time, and queued events. Both SPSs extend Storm to dynamically reconfigure the number of replicas without application downtime. They also implement a load balancer that distributes incoming events fairly among an operator's replicas. Experiments on the Google Cloud Platform (GCP) were carried out with applications that process Twitter data, DNS traffic, or system log traces. Performance was evaluated with different configurations, and the results were compared with those of running the same applications on the original Storm as well as with state-of-the-art work such as DABS-Storm, which also adapts the number of replicas. The comparison shows that both RA-SPS and PA-SPS can significantly improve the number of events processed while reducing costs.
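
As a rough illustration of the reactive idea, the following sketch (in Java, with invented names; it is not RA-SPS code) derives an operator's replica count from its observed input rate and the measured per-replica service rate, clamped to configured bounds:

    // Minimal sketch of a reactive scaling rule: pick the number of replicas
    // of an operator from its observed input rate and the rate one replica
    // can sustain, clamped to configured bounds.
    public final class ReactiveScaler {
        private final int minReplicas;
        private final int maxReplicas;

        public ReactiveScaler(int minReplicas, int maxReplicas) {
            this.minReplicas = minReplicas;
            this.maxReplicas = maxReplicas;
        }

        // inputRate:   observed events/s arriving at the operator
        // serviceRate: measured events/s one replica can process
        // headroom:    safety factor, e.g. 1.2 keeps 20% spare capacity
        public int desiredReplicas(double inputRate, double serviceRate, double headroom) {
            int needed = (int) Math.ceil((inputRate * headroom) / serviceRate);
            return Math.max(minReplicas, Math.min(maxReplicas, needed));
        }

        public static void main(String[] args) {
            ReactiveScaler scaler = new ReactiveScaler(1, 16);
            // 5000 ev/s arriving, one replica handles 1200 ev/s, 20% headroom -> 5
            System.out.println(scaler.desiredReplicas(5000, 1200, 1.2));
        }
    }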
2

Hongslo, Anders. "Stream Processing in the Robot Operating System framework." Thesis, Linköpings universitet, Artificiell intelligens och integrerad datorsystem, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-79846.

Full text
Abstract:
Streams of information rather than static databases are becoming increasingly important with the rapid changes involved in a number of fields such as finance, social media and robotics. DyKnow is a stream-based knowledge processing middleware which has been used in autonomous Unmanned Aerial Vehicle (UAV) research. ROS (Robot Operating System) is an open-source robotics framework providing hardware abstraction, device drivers, communication infrastructure, tools, libraries and other functionality. This thesis describes a design and realization of stream processing in ROS based on the stream-based knowledge processing middleware DyKnow. It describes how relevant information in ROS can be selected, labeled, merged and synchronized to provide streams of states. Such stream processing has many applications, such as execution monitoring or the evaluation of metric temporal logic formulas through progression over state sequences containing the features of the formulas. Overviews are given of DyKnow and ROS before comparing the two and describing the design. The stream processing capabilities implemented in ROS are demonstrated through performance evaluations which show that such stream processing is fast and efficient. The resulting realization in ROS is also readily extensible to provide further stream processing functionality.
3

Kakkad, Vasvi. "Curracurrong: a stream processing system for distributed environments." Thesis, The University of Sydney, 2014. http://hdl.handle.net/2123/12861.

Full text
Abstract:
Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of Things. There are many emerging applications, some of which include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amounts of data as an unbounded sequence of events and process them to generate new sequences of events. Such applications need an adequate programming model that can process large amounts of data with minimal latency; for this purpose, stream programming, among other paradigms, is ideal. However, stream programming needs to be adapted to meet the challenges inherent in running it in distributed environments. These challenges include the need for a modern domain-specific language (DSL), the placement of computations in the network to minimise energy costs, and timeliness in real-time applications. To overcome these challenges we developed a stream programming model that provides an easy-to-use programming interface, energy-efficient actor placement, and timeliness. This thesis presents Curracurrong, a stream data processing system for distributed environments. In Curracurrong, a query is represented as a stream graph of stream operators and communication channels. Curracurrong provides an extensible stream operator library and adapts to a wide range of applications. It uses an energy-efficient placement algorithm that optimises communication and computation. We extend the placement problem to support dynamically changing networks, and develop a dynamic program with polynomially bounded runtime to solve the placement problem. In many stream-based applications, real-time data processing is essential. We propose an approach that measures time delays in stream query processing; this model measures the total computational time from input to output of a query, i.e., the end-to-end delay.
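
A toy computation (hypothetical Java, not Curracurrong code) illustrating the end-to-end delay measure described above: for a linear query, the delay from input to output is the sum of per-operator computation times and per-channel communication times.

    import java.util.List;

    // End-to-end delay of a linear stream query: the sum of each operator's
    // computation time plus the delay of each channel between operators.
    public final class EndToEndDelay {
        record Operator(String name, double computeMs) {}

        // channelMs[i] is the communication delay between operator i and i+1
        static double endToEndMs(List<Operator> pipeline, double[] channelMs) {
            double total = 0;
            for (int i = 0; i < pipeline.size(); i++) {
                total += pipeline.get(i).computeMs();
                if (i < channelMs.length) total += channelMs[i];
            }
            return total;
        }

        public static void main(String[] args) {
            List<Operator> query = List.of(new Operator("sample", 0.4),
                                           new Operator("filter", 0.2),
                                           new Operator("aggregate", 1.1));
            System.out.println(endToEndMs(query, new double[]{2.0, 5.0}) + " ms");
        }
    }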
4

Tokmouline, Timur. "A signal oriented stream processing system for pipeline monitoring." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/37106.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
Includes bibliographical references (p. 115-117).
In this thesis, we develop SignalDB, a framework for composing signal processing applications from primitive stream and signal processing operators. SignalDB allows the user to focus on the signal processing task and avoid needlessly spending time on learning a particular application programming interface (API). We use SignalDB to express acoustic and pressure transient methods for water pipeline monitoring as query plans consisting of signal processing operators.
by Timur Tokmouline.
M.Eng.
5

Robakowski, Mikolaj. "Comparison of State Backends for Modern Stream Processing System." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290597.

Full text
Abstract:
Distributed stream processing is a very popular computing paradigm used in various modern computer systems. An important aspect of distributed stream processing systems is how they deal with computation state bigger than the system memory. This is often solved by the usage of a state backend: a database, usually an embedded one, that manages the state on persistent storage. However, this makes the performance of the whole system dependent on the performance of the database under the given workload. Log-structured merge-tree-based solutions are commonly used in stream processing systems as one-size-fits-all state backends. We postulate that using different state backends for different workloads yields much better performance. In this work we implement several state backends for Arcon, a modern stream processing runtime written in Rust and developed at KTH. The thesis goes over the design choices and implementation process of a state backend interface along with several concrete implementations. We experimentally evaluate the implementations against each other and show that under certain workloads some perform better than others. In particular, we show that under read-heavy workloads sled, an embedded Bw-tree-based database written in Rust, outperforms the commonly used, LSM-based RocksDB.
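
A minimal Java sketch of the pluggable state-backend idea (Arcon itself is written in Rust, and its real interface differs): operators touch state only through a narrow interface, so an LSM-based or Bw-tree-based embedded store can be swapped in to match the workload.

    import java.util.Map;
    import java.util.TreeMap;
    import java.util.concurrent.ConcurrentHashMap;

    public final class StateBackendDemo {
        // Operators read and write state only through this interface.
        interface StateBackend {
            void put(String key, byte[] value);
            byte[] get(String key);
        }

        // In-memory stand-ins for embedded stores such as RocksDB (LSM-based)
        // or sled (Bw-tree-based); a real backend persists to disk.
        static final class HashBackend implements StateBackend {
            private final Map<String, byte[]> m = new ConcurrentHashMap<>();
            public void put(String k, byte[] v) { m.put(k, v); }
            public byte[] get(String k) { return m.get(k); }
        }

        static final class SortedBackend implements StateBackend {
            private final TreeMap<String, byte[]> m = new TreeMap<>();
            public synchronized void put(String k, byte[] v) { m.put(k, v); }
            public synchronized byte[] get(String k) { return m.get(k); }
        }

        public static void main(String[] args) {
            StateBackend state = new SortedBackend(); // chosen per workload
            state.put("count:user42", new byte[]{7});
            System.out.println(state.get("count:user42")[0]); // 7
        }
    }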
6

Mousavi, Bamdad. "Scalable Stream Processing and Management for Time Series Data." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42295.

Full text
Abstract:
There has been an enormous growth in the generation of time series data in the past decade. This trend is driven by the widespread adoption of IoT technologies, the data generated by monitoring cloud computing resources, and cyber-physical systems. Although time series data have been a topic of discussion in the domain of data management for several decades, this recent growth has brought the topic to the forefront. Many of the time series management systems available today lack the features necessary to successfully manage and process the sheer amount of time series now being generated. In this thesis we strive to examine the field and study the prior work in time series management. We then propose a large system capable of handling time series management end to end, from generation to consumption by the end user. Our system is composed of open-source data processing frameworks. It has the capability to collect time series data, perform stream processing over it, store it for immediate and future processing, and create the necessary visualizations. We present the implementation of the system and perform experiments to show its scalability in handling growing pipelines of incoming data from various sources.
7

Balazinska, Magdalena. "Fault-tolerance and load management in a distributed stream processing system." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/35287.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 187-199).
Advances in monitoring technology (e.g., sensors) and an increased demand for online information processing have given rise to a new class of applications that require continuous, low-latency processing of large-volume data streams. These "stream processing applications" arise in many areas such as sensor-based environment monitoring, financial services, network monitoring, and military applications. Because traditional database management systems are ill-suited for high-volume, low-latency stream processing, new systems, called stream processing engines (SPEs), have been developed. Furthermore, because stream processing applications are inherently distributed, and because distribution can improve performance and scalability, researchers have also proposed and developed distributed SPEs. In this dissertation, we address two challenges faced by a distributed SPE: (1) fault-tolerant operation in the face of node failures, network failures, and network partitions, and (2) federated load management. For fault-tolerance, we present a replication-based scheme, called Delay, Process, and Correct (DPC), that masks most node and network failures. When network partitions occur, DPC addresses the traditional availability-consistency trade-off by maintaining, when possible, a desired availability specified by the application or user, but eventually also delivering the correct results. While maintaining the desired availability bounds, DPC also strives to minimize the number of inaccurate results that must later be corrected. In contrast to previous proposals for fault tolerance in SPEs, DPC simultaneously supports a variety of applications that differ in their preferred trade-off between availability and consistency. For load management, we present a Bounded-Price Mechanism (BPM) that enables autonomous participants to collaboratively handle their load without individually owning the resources necessary for peak operation. BPM is based on contracts that participants negotiate offline. At runtime, participants move load only to partners with whom they have a contract and pay each other the contracted price. We show that BPM provides incentives that foster participation and leads to good system-wide load distribution. In contrast to earlier proposals based on computational economies, BPM is lightweight, enables participants to develop and exploit preferential relationships, and provides stability and predictability. Although motivated by stream processing, BPM is general and can be applied to any federated system. We have implemented both schemes in the Borealis distributed stream processing engine. They will be available with the next release of the system.
by Magdalena Balazinska.
Ph.D.
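
The contract-based load movement of BPM can be sketched as follows (hypothetical Java, not Borealis code): a participant offloads a unit of load only to a partner with a pre-negotiated contract, and only when the contracted price undercuts the local cost of processing that unit.

    import java.util.Map;

    public final class BoundedPriceLoadManager {
        // Contracted price per unit of load, negotiated offline per partner.
        private final Map<String, Double> contracts;

        public BoundedPriceLoadManager(Map<String, Double> contracts) {
            this.contracts = contracts;
        }

        // Returns the partner to move one unit of load to, or null to keep it:
        // load moves only under an existing contract, and only when the
        // contracted price is below the local processing cost.
        public String choosePartner(double localCostPerUnit) {
            String best = null;
            double bestPrice = localCostPerUnit;
            for (Map.Entry<String, Double> c : contracts.entrySet()) {
                if (c.getValue() < bestPrice) {
                    bestPrice = c.getValue();
                    best = c.getKey();
                }
            }
            return best;
        }

        public static void main(String[] args) {
            var mgr = new BoundedPriceLoadManager(Map.of("siteA", 3.0, "siteB", 5.5));
            System.out.println(mgr.choosePartner(4.0)); // siteA: 3.0 < 4.0
            System.out.println(mgr.choosePartner(2.0)); // null: keep the load
        }
    }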
8

Ahmed, Abdulbasit. "Online network intrusion detection system using temporal logic and stream data processing." Thesis, University of Liverpool, 2013. http://livrepository.liverpool.ac.uk/12153/.

Full text
Abstract:
These days, the world is becoming more interconnected, and the Internet has come to dominate the ways we communicate and do business. Network security measures must be taken to protect the organization's environment. Among these security measures are intrusion detection systems, which aim to detect actions that attempt to compromise the confidentiality, availability, or integrity of a resource by monitoring the events occurring in computer systems and/or networks. The increasing amounts of data transmitted over higher- and higher-speed networks have created a challenging problem for current intrusion detection systems: once the traffic exceeds their operational boundaries, packets are dropped, which means that some attacks will not be detected. In this thesis, we propose an online network-based intrusion detection system built on the combined use of temporal logic and stream data processing. Temporal logic formalisms allow us to represent attack patterns or normal behaviour, while stream data processing is a recent database technology applied to flows of data, designed with high-performance features for data-intensive processing. In this work we develop a system where temporal logic specifications are automatically translated into stream queries that run on a stream database server and are continuously evaluated against the traffic to detect intrusions. The experimental results show that this combination used the resources of the running machines efficiently and was able to detect all the attacks in the test data. Additionally, the proposed solution provides a concise and unambiguous way to formally represent attack signatures, and it is extensible, allowing new attacks to be added. It is also scalable, as the system can benefit from using more CPUs and additional memory on the same machine, or from using distributed servers.
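
As an illustration of turning a temporal specification into a continuously evaluated stream computation, the sketch below (hypothetical Java, not the thesis's generated stream queries) detects the pattern "event A followed by event B within T milliseconds" over an event stream:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public final class FollowedByWithin {
        private final long windowMs;
        private final Deque<Long> pendingA = new ArrayDeque<>(); // timestamps of A events

        public FollowedByWithin(long windowMs) { this.windowMs = windowMs; }

        // Feed one event; returns true whenever the pattern A-then-B fires.
        public boolean onEvent(String type, long tsMs) {
            // Expire A occurrences that can no longer be matched.
            while (!pendingA.isEmpty() && tsMs - pendingA.peekFirst() > windowMs) {
                pendingA.pollFirst();
            }
            if (type.equals("A")) { pendingA.addLast(tsMs); return false; }
            return type.equals("B") && !pendingA.isEmpty();
        }

        public static void main(String[] args) {
            FollowedByWithin q = new FollowedByWithin(1000);
            System.out.println(q.onEvent("A", 0));    // false
            System.out.println(q.onEvent("B", 400));  // true: A at t=0, B within 1 s
            System.out.println(q.onEvent("B", 2000)); // false: that A has expired
        }
    }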
9

Al-Sinayyid, Ali. "JOB SCHEDULING FOR STREAMING APPLICATIONS IN HETEROGENEOUS DISTRIBUTED PROCESSING SYSTEMS." OpenSIUC, 2020. https://opensiuc.lib.siu.edu/dissertations/1868.

Full text
Abstract:
The colossal amounts of data generated daily are increasing exponentially at a never-before-seen pace. A variety of applications—including stock trading, banking systems, health care, the Internet of Things (IoT), and social media networks, among others—have created an unprecedented volume of real-time stream data, estimated to reach billions of terabytes in the near future. As a result, we are currently living in the so-called Big Data era and witnessing a transition to the so-called IoT era. Enterprises and organizations are tackling the challenge of interpreting the enormous amount of raw data streams to achieve an improved understanding of the data, and thus make efficient and well-informed decisions (i.e., data-driven decisions). Researchers have designed distributed data stream processing systems that can directly process data in near real time. To extract valuable information from raw data streams, analysts need to create and implement data stream processing applications structured as directed acyclic graphs (DAGs). The infrastructure of distributed data stream processing systems, as well as the various requirements of stream applications, impose new challenges. Cluster heterogeneity in a distributed environment results in differing cluster resources for task execution and data transmission, which makes optimal scheduling an NP-complete problem. Scheduling streaming applications plays a key role in optimizing system performance, particularly in maximizing the frame rate, i.e., how many instances of data sets can be processed per unit of time. The scheduling algorithm must consider data locality, resource heterogeneity, and communication and computation latencies; the latency associated with the computation or transmission bottleneck must be minimized when the application is mapped onto the heterogeneous, distributed cluster resources. Recent work on task scheduling for distributed data stream processing systems has a number of limitations. Most current schedulers are not designed to manage heterogeneous clusters; they also lack the ability to consider both task and machine characteristics in scheduling decisions. Furthermore, current default schedulers do not allow the user to control data locality aspects in application deployment. In this thesis, we investigate the problem of scheduling streaming applications on a heterogeneous cluster environment and develop the maximum throughput scheduler algorithm (MT-Scheduler) for streaming applications. The proposed algorithm uses a dynamic programming technique to efficiently map the application topology onto a heterogeneous distributed system based on computing and data transfer requirements, while also taking into account the capacity of the underlying cluster resources. The proposed approach maximizes system throughput by identifying and minimizing the time incurred at the computing/transfer bottleneck. The MT-Scheduler supports scheduling applications structured as DAGs, such as Amazon Timestream, Google MillWheel, and Twitter Heron. We conducted experiments using three Storm microbenchmark topologies in both simulated and real Apache Storm environments. To evaluate performance, we compared the proposed MT-Scheduler with the simulated round-robin and the default Storm scheduler algorithms. The results indicated that the MT-Scheduler outperforms the default round-robin approach in terms of both average system latency and throughput.
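
The objective MT-Scheduler optimizes can be illustrated with a small computation (hypothetical Java, not the scheduler itself): in a pipelined streaming application, the sustainable frame rate is bounded by the slowest computation or transfer stage of a given mapping, so a candidate mapping is scored by its worst per-item stage time.

    public final class BottleneckScore {
        // computeMs[i]:  per-item compute time of task i on its assigned machine
        // transferMs[e]: per-item transfer time on edge e of the mapping
        // The sustainable throughput of the pipeline is 1 / (slowest stage).
        static double throughputPerSec(double[] computeMs, double[] transferMs) {
            double bottleneckMs = 0;
            for (double c : computeMs)  bottleneckMs = Math.max(bottleneckMs, c);
            for (double t : transferMs) bottleneckMs = Math.max(bottleneckMs, t);
            return 1000.0 / bottleneckMs;
        }

        public static void main(String[] args) {
            // The slowest stage is an 8 ms transfer, so throughput is 125 items/s.
            System.out.println(throughputPerSec(new double[]{2, 5, 3},
                                                new double[]{8, 1}));
        }
    }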
10

Addimando, Alessio. "Progettazione di un intrusion detection system su piattaforma big data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16755/.

Full text
Abstract:
In recent years, the digital landscape has seen a substantial increase in the number of devices and users with Internet access. In proportion to these factors, large amounts of hard-to-manage data are continuously generated every day, in every context. This has raised the need to reorganize corporate assets to cope with a larger calibre of information and to ensure that its management extracts concrete value for decision-making. Together, these motivations give rise to the Big Data phenomenon. Alongside this landscape, the large number of machines and users on the network has also exponentially increased the number of cyber attacks, which in the vast majority of cases aim at the unauthorized appropriation of sensitive data and/or at causing outages in private networks. One example is the Forlì-Cesena university campus, which estimates that about 3000 interconnected machines are constantly active, linked to one another and to the external network. This large number of networked resources matters given the sensitive data they manage and store, and although the monitoring architecture is continuously updated, it exhibits evident bottlenecks and limitations in processing the entire network traffic. To address this problem, the goal of this thesis was to bring these two fields together by integrating into the network security process an analysis and monitoring system for intrusion detection (an intrusion detection system) built on a Big Data platform. The prototype built (named Styx) exploits data stream processing (real-time data processing) and machine learning (learning techniques for extracting predictive models) to strengthen the current monitoring system of the university network.
11

Slater, Alicia Adell. "Recovery of community structure and leaf processing in a headwater stream following use of a wetland passive treatment system to abate copper pollution." Thesis, This resource online, 1996. http://scholar.lib.vt.edu/theses/available/etd-08222008-063653/.

Full text
12

Wigent, Mark A., Andrea M. Mazzario, and Scott M. Matsumura. "Use of Multi-Threading, Modern Programming Language, and Lossless Compression in a Dynamic Commutation/Decommutation System." International Foundation for Telemetering, 2011. http://hdl.handle.net/10150/595662.

Full text
Abstract:
ITC/USA 2011 Conference Proceedings / The Forty-Seventh Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2011 / Bally's Las Vegas, Las Vegas, Nevada
The Spectrum Efficient Technology Science and Technology (SET S&T) Program is sponsoring the development of the Dynamic Commutation and Decommutation System (DCDS), which optimizes telemetry data transmission in real time. The goal of DCDS is to improve spectrum efficiency - not through improving RF techniques but rather through changing and optimizing the contents of the telemetry stream during system test. By allowing the addition of new parameters to the telemetered stream at any point during system test, DCDS removes the need to transmit measured data unless it is actually needed on the ground. When compared to serial streaming telemetry, real-time re-formatting of the telemetry stream does require additional processing onboard the test article. DCDS leverages advances in microprocessor technology to perform this processing while meeting the size, weight, and power constraints of the test environment. Performance gains of the system have been achieved by significant multi-threading of the application, allowing it to run on modern multi-core processors. Two other enhancing technologies incorporated into DCDS are the Java programming language and lossless compression.
13

Carter, Bruce, and Troy Scoughton. "Low-cost, short-term development of high-data-rate, multi-stream, multi-data type telemetry acquisition/processing system using an off-the-shelf integrated Telemetry Front End." International Foundation for Telemetering, 1989. http://hdl.handle.net/10150/614541.

Full text
Abstract:
International Telemetering Conference Proceedings / October 30-November 02, 1989 / Town & Country Hotel & Convention Center, San Diego, California
This paper explores the effects the new breed of off-the-shelf integrated telemetry front end (TFE) packages have on the cost and schedule of the development cycle associated with real-time telemetry acquisition/processing systems. A case study of an actual project involving replacement of the Holloman AFB sled track telemetry processing system (TPS) with a system capable of simultaneously supporting up to twenty (20) asynchronous data streams is profiled. Notable among the capabilities of the system are: support for PCM, PAM, FM, IRIG and local time streams; incoming data rates up to 10 megabits/sec/stream; data logging rates over 16 megabytes/sec; and the use of local area networks for distribution of data to real-time displays. To achieve these requirements within a manageable cost/schedule framework, the system was designed around an integrated TFE subsystem. Comparisons are drawn between several aspects of this project's development and that of an earlier developmental system completed by PSL within the last 16 months.
14

Aved, Alexander. "Scene Understanding for Real Time Processing of Queries over Big Data Streaming Video." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5597.

Full text
Abstract:
With heightened security concerns across the globe and the increasing need to monitor, preserve and protect infrastructure and public spaces to ensure proper operation, quality assurance and safety, numerous video cameras have been deployed. Accordingly, they also need to be monitored effectively and efficiently. However, relying on human operators to constantly monitor all the video streams is not scalable or cost effective. Humans can become subjective, fatigued, even biased, and it is difficult to maintain high levels of vigilance when capturing, searching and recognizing events that occur infrequently or in isolation. These limitations are addressed in the Live Video Database Management System (LVDBMS), a framework for managing and processing live motion imagery data. It enables rapid development of video surveillance software much like traditional database applications are developed today. Video stream processing applications and ad hoc queries developed this way are able to reuse advanced image processing techniques that have already been developed. This results in lower software development and maintenance costs. Furthermore, the LVDBMS can be intensively tested to ensure consistent quality across all associated video database applications. Its intrinsic privacy framework facilitates a formalized approach to the specification and enforcement of verifiable privacy policies. This is an important step towards enabling a general privacy certification for video surveillance systems by leveraging a standardized privacy specification language. With the potential to impact many important fields ranging from security and assembly line monitoring to wildlife studies and the environment, the broader impact of this work is clear. The privacy framework protects the general public from abusive use of surveillance technology; success in addressing the "trust" issue will enable many new surveillance-related applications. Although this research focuses on video surveillance, the proposed framework has the potential to support many video-based analytical applications.
Ph.D.
Computer Science
Engineering and Computer Science
15

Braik, William. "Détection d'évènements complexes dans les flux d'évènements massifs." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0596/document.

Full text
Abstract:
Pattern detection over streams of events is gaining more and more attention, especially in the field of eCommerce. Our industrial partner Cdiscount, one of the largest eCommerce companies in France, aims to use pattern detection for real-time customer behavior analysis. The main challenges to consider are efficiency and scalability, as the detection of customer behaviors must be achieved within a few seconds, while millions of unique customers visit the website every day, thus producing a large event stream. In this thesis, we present Auros, a system for large-scale and efficient pattern detection for eCommerce. It relies on a domain-specific language to define behavior patterns. Patterns are then compiled into deterministic finite automata, which are run on a Big Data streaming platform. Our evaluation shows that our approach is efficient and scalable, and fits the requirements of Cdiscount.
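
A toy illustration of the compilation target (hypothetical Java with an invented scenario, not Auros code): the navigation scenario "search, then product view, then add-to-cart" encoded as a deterministic finite automaton driven by one click event at a time.

    import java.util.Map;

    public final class ScenarioDfa {
        // state -> (event -> next state); state 3 is accepting.
        private static final Map<Integer, Map<String, Integer>> DELTA = Map.of(
            0, Map.of("search", 1),
            1, Map.of("product", 2, "search", 1),
            2, Map.of("addToCart", 3, "search", 1)
        );

        private int state = 0;

        // Returns true when the scenario is detected for this customer.
        public boolean onEvent(String event) {
            state = DELTA.getOrDefault(state, Map.of()).getOrDefault(event, 0);
            return state == 3;
        }

        public static void main(String[] args) {
            ScenarioDfa dfa = new ScenarioDfa();
            for (String e : new String[]{"search", "product", "addToCart"}) {
                System.out.println(e + " -> detected=" + dfa.onEvent(e));
            }
        }
    }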
16

Kammoun, Abderrahmen. "Enhancing Stream Processing and Complex Event Processing Systems." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSES012.

Full text
Abstract:
As more and more connected objects and sensory devices become part of our daily lives, the sea of high-velocity information flow is growing. This massive amount of data, produced at high rates, requires rapid insight to be useful in various applications such as the Internet of Things, health care, energy management, etc. Traditional data storage and processing techniques have proven inefficient here, which gives rise to data stream management and complex event processing (CEP) systems. This thesis aims to provide optimal solutions for complex and proactive queries. Our proposed techniques, in addition to being CPU- and memory-efficient, enhance the capabilities of existing CEP systems by adding a predictive feature through real-time learning. The main contributions of this thesis are as follows. First, we propose various techniques to reduce the CPU and memory requirements of expensive queries, whose operators are otherwise exponentially costly in both CPU and memory; our recomputation model and heuristic-based algorithm reduce these costs, building on efficient multidimensional indexing using space-filling curves and on clustering events into batches to reduce the cost of pair-wise joins. Second, we design a novel predictive CEP system that employs historical information to predict future complex events, using a compressed index structure, range-query processing techniques and an approximate summarizing technique over the historical space. The applicability of our techniques to the real-world problems presented, notably in international challenges, has produced further customizable solutions that demonstrate the viability of our proposed methods.
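
The prediction-by-history idea can be sketched as follows (illustrative Java, not the thesis's compressed index): past length-N windows are stored together with the event that followed them, and a prediction is read off the closest stored window; the thesis retrieves such matches efficiently with range queries over an indexed historical sequence space.

    import java.util.ArrayList;
    import java.util.List;

    public final class HistoryPredictor {
        record Sample(double[] window, String nextEvent) {}
        private final List<Sample> history = new ArrayList<>();

        void record(double[] window, String nextEvent) {
            history.add(new Sample(window.clone(), nextEvent));
        }

        // Linear scan stand-in for an indexed nearest/range query.
        String predict(double[] current) {
            String best = null;
            double bestDist = Double.MAX_VALUE;
            for (Sample s : history) {
                double d = 0;
                for (int i = 0; i < current.length; i++) {
                    double diff = current[i] - s.window()[i];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = s.nextEvent(); }
            }
            return best;
        }

        public static void main(String[] args) {
            HistoryPredictor p = new HistoryPredictor();
            p.record(new double[]{1, 2, 4}, "overload");
            p.record(new double[]{1, 1, 1}, "normal");
            System.out.println(p.predict(new double[]{1, 2, 5})); // overload
        }
    }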
17

Maurer, Simon. "Analysis and coordination of mixed-criticality cyber-physical systems." Thesis, University of Hertfordshire, 2018. http://hdl.handle.net/2299/21094.

Full text
Abstract:
A Cyber-physical System (CPS) can be described as a network of interlinked, concurrent computational components that interact with the physical world. Such a system is usually of a reactive nature and must satisfy strict timing requirements to guarantee correct behaviour. The components can be of mixed criticality, which implies different progress models and communication models, depending on whether the focus of a component lies on predictability or resource efficiency. In this dissertation I present a novel approach that bridges the gap between stream processing models and Labelled Transition Systems (LTSs). The former offer powerful tools to describe concurrent systems of, usually simple, components while the latter allow to describe complex, reactive, components and their mutual interaction. In order to achieve the bridge between the two domains I introduce the novel LTS Synchronous Interface Automaton (SIA) that allows to model the interaction protocol of a process via its interface and to incrementally compose simple processes into more complex ones while preserving the system properties. Exploiting these properties I introduce an analysis to identify permanent blocking situations in a network of composed processes. SIAs are wrapped by the novel component-based coordination model Process Network with Synchronous Communication (PNSC) that allows to describe a network of concurrent processes where multiple communication models and the co-existence and interaction of heterogeneous processes is supported due to well-defined interfaces. The work presented in this dissertation follows a holistic approach which spans from the theory of the underlying model to an instantiation of the model as a novel coordination language, called Streamix. The language uses network operators to compose networks of concurrent processes in a structured and hierarchical way. The work is validated by a prototype implementation of a compiler and a Run-time System (RTS) that allows to compile a Streamix program and execute it on a platform with support for ISO C, POSIX threads, and a Linux operating system.
18

Alves, Francisco Marco Morais. "Framework for location based system sustained by mobile phone users." Master's thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/23817.

Full text
Abstract:
Master's degree in Computer and Telematics Engineering
We live in the era of information and the Internet of Things, and never before has information had so much value. At the same time, the volume of information exchanged grows day by day. With this amount of data, and with the computational power available nowadays, real-time data processing tools emerge every day. A new paradigm arises because much of this exchanged data carries meta-information from which, once enriched, additional knowledge can be extracted. From a telecommunication operator's point of view, there are many data flows exchanged between clients' devices and the Base Transceiver Stations (BTS), such as Radius packets, Call Detail Records (CDR) and Event Detail Records (EDR). These flows frequently serve control and configuration purposes, but in many cases they also contain geographical and time information. It soon becomes clear that this geographical information can be enriched to extract additional knowledge; in other words, additional value for the telecommunication company. Using properly anonymized data flows that contain BTS information (id, and therefore position and distance to the device), this dissertation presents a scalable and reliable solution that, in a streaming environment, determines the position of mobile-network users through triangulation and computes metrics related to geographical areas. Due to external difficulties, these data flows had to be simulated. The areas are defined and entered by application users in order to track entries and exits, as well as the time spent inside a given area. Since processing happens in a streaming environment, the solution must be fault-tolerant and able to recover from failures in a consistent and coherent manner.
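
A minimal Java sketch of the area metrics described above (hypothetical and simplified to one rectangular area): a stream of timestamped user positions yields enter and exit events plus the accumulated time each user spends inside the area.

    import java.util.HashMap;
    import java.util.Map;

    public final class AreaTracker {
        private final double minX, minY, maxX, maxY;
        private final Map<String, Long> enteredAt = new HashMap<>();
        private final Map<String, Long> dwellMs = new HashMap<>();

        public AreaTracker(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        public void onPosition(String user, double x, double y, long tsMs) {
            boolean inside = x >= minX && x <= maxX && y >= minY && y <= maxY;
            Long since = enteredAt.get(user);
            if (inside && since == null) {
                enteredAt.put(user, tsMs);                    // enter event
            } else if (!inside && since != null) {
                dwellMs.merge(user, tsMs - since, Long::sum); // exit event
                enteredAt.remove(user);
            }
        }

        public long totalDwellMs(String user) {
            return dwellMs.getOrDefault(user, 0L);
        }

        public static void main(String[] args) {
            AreaTracker area = new AreaTracker(0, 0, 10, 10);
            area.onPosition("u1", 5, 5, 1_000);   // enters the area
            area.onPosition("u1", 20, 5, 61_000); // exits after 60 s
            System.out.println(area.totalDwellMs("u1")); // 60000
        }
    }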
19

Vijayakumar, Nithya Nirmal. "Data management in distributed stream processing systems." [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278228.

Full text
Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Computer Science, 2007.
Source: Dissertation Abstracts International, Volume: 68-09, Section: B, page: 6093. Adviser: Beth Plale. Title from dissertation home page (viewed May 9, 2008).
20

Mei, Haitao. "Real-time stream processing in embedded systems." Thesis, University of York, 2017. http://etheses.whiterose.ac.uk/19750/.

Full text
Abstract:
Modern real-time embedded systems often involve computational-intensive data processing algorithms to meet their application requirements. As a result, there has been an increase in the use of multiprocessor platforms. The stream processing programming model aims to facilitate the construction of concurrent data processing programs to exploit the parallelism available on these architectures. However, most current stream processing frameworks or languages are not designed for use in real-time systems, let alone systems that might also have hard real-time control algorithms. This thesis contends that a generic architecture of a real-time stream processing infrastructure can be created to support predictable processing of both batched and live streaming data sources, and be integrated with hard real-time control algorithms. The thesis first reviews relevant stream processing techniques and identifies the open issues. Then a real-time stream processing task model, and an architecture for supporting that model, is proposed. An approach to the integration of stream processing tasks into a real-time environment that also has hard real-time components is presented. Data is processed in parallel using execution-time servers allocated to each core. An algorithm is presented for selecting the parameters of the servers that maximises their capacities (within an overall deadline) and ensures that hard real-time components remain schedulable. Response-time analysis is derived to guarantee that the real-time requirements (deadlines for batched data processing, and latency for each data item for live data) for the stream processing activity are met. A framework, called SPRY, is implemented to support the proposed real-time stream processing architecture. The framework supports fully-partitioned applications that are scheduled using fixed priority-based scheduling techniques. A case study based on a modified Generic Avionics Platform is given to demonstrate the overall approach. Finally, the evaluation shows that the presented approach provides better schedulability than alternative approaches.
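
The schedulability side of this rests on fixed-priority response-time analysis; a compact Java version of the textbook recurrence (not the thesis's server-specific derivation) iterates R = C + sum over higher-priority tasks j of ceil(R/Tj)*Cj to a fixed point and checks it against the deadline.

    public final class ResponseTimeAnalysis {
        // Response time of a task with execution time c under fixed-priority
        // scheduling, given higher-priority tasks with execution times hpC[]
        // and periods hpT[]. Returns -1 if the deadline cannot be met.
        static long responseTime(long c, long[] hpC, long[] hpT, long deadline) {
            long r = c;
            while (true) {
                long next = c;
                for (int j = 0; j < hpC.length; j++) {
                    next += ((r + hpT[j] - 1) / hpT[j]) * hpC[j]; // ceil(r/T)*C
                }
                if (next == r) return r <= deadline ? r : -1;
                if (next > deadline) return -1;
                r = next;
            }
        }

        public static void main(String[] args) {
            // C = 20, one higher-priority task with C = 10, T = 50, deadline 100.
            System.out.println(responseTime(20, new long[]{10}, new long[]{50}, 100)); // 30
        }
    }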
21

Drougas, Ioannis. "Rate allocation in distributed stream processing systems." Diss., [Riverside, Calif.] : University of California, Riverside, 2008. http://proquest.umi.com/pqdweb?index=0&did=1663077971&SrchMode=2&sid=1&Fmt=2&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1268240766&clientId=48051.

Full text
Abstract:
Thesis (Ph. D.)--University of California, Riverside, 2008.
Includes abstract. Title from first page of PDF file (viewed March 10, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 93-98). Also issued in print.
22

Idris, Muhammad. "Real-time Business Intelligence through Compact and Efficient Query Processing Under Updates." Doctoral thesis, Universite Libre de Bruxelles, 2019. https://dipot.ulb.ac.be/dspace/bitstream/2013/284705/5/contratMI.pdf.

Full text
Abstract:
Responsive analytics are rapidly taking over the traditional data analytics dominated by post-fact approaches in traditional data warehousing. Recent advances in analytics demand placing analytical engines at the forefront of the system, to react to updates occurring at high speed and to detect patterns, trends, and anomalies. Such solutions find applications in financial systems, industrial control systems, business intelligence and online machine learning, among others. These applications are usually associated with Big Data and require the ability to react to constantly changing data in order to obtain timely insights and take proactive measures. Generally, these systems specify the analytical results, or their basic elements, in a query language, where the main task then is to maintain query results efficiently under frequent updates. The task of reacting to updates and analyzing changing data has been addressed in two ways in the literature: traditional business intelligence (BI) solutions focus on historical data analysis, where the data is refreshed periodically and in batches, while stream processing solutions process streams of data from transient sources as flows of data items. Both kinds of systems share the niche of reacting to updates (known as dynamic evaluation); however, they differ in architecture, query languages, and processing mechanisms. In this thesis, we investigate the possibility of a reactive and unified framework to model queries that appear in both kinds of systems. In traditional BI solutions, evaluating queries under updates has been studied under the umbrella of incremental evaluation of queries, based on the relational incremental view maintenance model and mostly focused on queries that feature equi-joins. Streaming systems, in contrast, generally follow automaton-based models to evaluate queries under updates, and they generally process queries that mostly feature comparisons of temporal attributes (e.g. timestamp attributes) along with comparisons of non-temporal attributes over streams of bounded sizes. Temporal comparisons constitute inequality constraints, while non-temporal comparisons can be either equality or inequality constraints; hence these systems mostly process inequality joins. As a starting point for our research, we postulate that queries in streaming systems can also be evaluated efficiently based on the paradigm of incremental evaluation, just as in BI systems, in a main-memory model. The efficiency of such a model is measured in terms of runtime memory footprint and update processing cost. The existing approaches to dynamic evaluation in both kinds of systems present a trade-off between memory footprint and update processing cost: systems that avoid materialization of query (sub)results incur high update latency, and systems that materialize (sub)results incur a high memory footprint. We are interested in building a model that addresses this trade-off. In particular, we overcome it by devising a practical dynamic evaluation algorithm for queries that appear in both kinds of systems, and by presenting a main-memory data representation that allows query (sub)results to be enumerated without materialization and can be maintained efficiently under updates.
We call this representation the Dynamic Constant-Delay Linear Representation (DCLR). We devise DCLRs with the following properties: 1) they allow, without materialization, enumeration of query results with bounded delay (and with constant delay for a subclass of queries); 2) they allow tuple lookup in query results with logarithmic delay (and with constant delay for conjunctive queries with equi-joins only); 3) they take space linear in the size of the database; 4) they can be maintained efficiently under updates. We first study DCLRs with these properties for the class of acyclic conjunctive queries featuring equi-joins with projections, and present a dynamic evaluation algorithm called the Dynamic Yannakakis (DYN) algorithm. We then generalize the DYN algorithm to the class of acyclic queries featuring multi-way Theta-joins with projections, and call it Generalized DYN (GDYN). The workings of DYN and GDYN over DCLRs are based on a particular variant of join trees, called Generalized Join Trees (GJTs), which guarantee the above-described properties of DCLRs. We define GJTs and present algorithms to test a conjunctive query featuring Theta-joins for acyclicity and to generate GJTs for such queries. We extend the classical GYO algorithm from testing a conjunctive query with equalities for acyclicity to testing a conjunctive query featuring multi-way Theta-joins with projections for acyclicity, and further extend it to generate GJTs for queries that are acyclic. GDYN is hence a unified framework, based on DCLRs, that enables processing of queries that appear in streaming systems as well as in BI systems in a unified main-memory model, and that addresses the space-time trade-off. We instantiate GDYN to the particular case where all Theta-joins involve only equalities and inequalities, and call this instantiation IEDYN. We implement DYN and IEDYN as query compilers that generate executable programs in the Scala programming language and provide all the necessary data structures, together with their maintenance and enumeration methods, in a continuous stream processing model. We evaluate DYN and IEDYN against state-of-the-art BI and streaming systems on both industrial and synthetically generated benchmarks, and show that they outperform existing systems by over an order of magnitude in both memory footprint and update processing time.
Doctorate in Engineering Sciences and Technology
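
A greatly simplified Java sketch of the underlying idea of maintaining a query result under updates without recomputation (DYN, GDYN and DCLRs are far more general, and the thesis's compilers emit Scala): for a two-relation equi-join, per-key counts let a single-tuple insert update the join cardinality in time proportional to its matches.

    import java.util.HashMap;
    import java.util.Map;

    public final class IncrementalJoin {
        private final Map<String, Integer> rByKey = new HashMap<>();
        private final Map<String, Integer> sByKey = new HashMap<>();
        private long joinSize = 0; // |R join S|, maintained under updates

        void insertR(String key) {
            rByKey.merge(key, 1, Integer::sum);
            joinSize += sByKey.getOrDefault(key, 0); // only the new matches
        }

        void insertS(String key) {
            sByKey.merge(key, 1, Integer::sum);
            joinSize += rByKey.getOrDefault(key, 0);
        }

        public static void main(String[] args) {
            IncrementalJoin view = new IncrementalJoin();
            view.insertR("a"); view.insertS("a"); view.insertS("a"); view.insertR("b");
            System.out.println(view.joinSize); // 2, with no recomputation of the join
        }
    }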
23

Ren, Xiangnan. "Traitement et raisonnement distribués des flux RDF." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1139/document.

Full text
Abstract:
Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. In an Internet of Things (IoT) context, data are emitted from heterogeneous stream sources, i.e., coming from different domains and data models. This requires that IoT applications efficiently handle data integration mechanisms. The processing of RDF data streams hence became an important research field. This trend enables a wide range of innovative applications where the real-time and reasoning aspects are pervasive. The key implementation goal of such applications is to efficiently handle massive incoming data streams and support advanced data analytics services such as anomaly detection. However, a modern RSP engine has to address the volume and velocity characteristics encountered in the Big Data era. In an ongoing industrial project, we found out that a 24/7 available stream processing engine usually faces massive data volumes, dynamically changing data structures and workload characteristics. These facts impact the engine's performance and reliability. To address these issues, we propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes the logical query plan according to the state of the data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput and acceptable latency. These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka. Moreover, an increasing number of processing jobs executed over RSP engines require reasoning mechanisms. This usually comes at the cost of a trade-off between data throughput, latency and the computational cost of expressive inferences. Therefore, we extend Strider to support real-time RDFS+ (i.e., RDFS + owl:sameAs) reasoning capability. We combine Strider with a query rewriting approach for SPARQL that benefits from an intelligent encoding of the knowledge base. The system is evaluated along different dimensions and over multiple datasets to emphasize its performance. Finally, we step further into exploratory RDF stream reasoning with a fragment of Answer Set Programming. This part of our research work is mainly motivated by the fact that more and more streaming applications require more expressive and complex reasoning tasks. The main challenge is to cope with the large volume and high-velocity dimensions in a scalable and inference-enabled manner. Recent efforts in this area still miss the aspect of system scalability for stream reasoning. Thus, we aim to explore the ability of modern distributed computing frameworks to process highly expressive knowledge inference queries over Big Data streams. To do so, we consider queries expressed as a positive fragment of LARS (a temporal logic framework based on Answer Set Programming) and propose solutions to process such queries, based on the two main execution models adopted by major parallel and distributed execution frameworks: Bulk Synchronous Parallel (BSP) and Record-at-A-Time (RAT). We implement our solution, named BigSR, and conduct a series of evaluations. Our experiments show that BigSR achieves high throughput beyond a million triples per second using a rather small cluster of machines.
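As a rough illustration of the BSP execution model named in this abstract, here is a minimal Python sketch that evaluates one positive window rule over micro-batches of triples. The rule, predicate names and batch structure are hypothetical, not taken from BigSR:

def window_rule(window_triples):
    # Positive rule sketch: derive alert(X) when temp(X, high) appears
    # anywhere in the current window of (subject, predicate, object) triples.
    return {("alert", s) for (s, p, o) in window_triples
            if p == "temp" and o == "high"}

def bsp_eval(micro_batches):
    # BSP style (Spark-like): each micro-batch is one synchronous step;
    # the whole window is materialized before the rule fires.
    for batch in micro_batches:
        yield window_rule(batch)

batches = [[("s1", "temp", "high"), ("s2", "temp", "low")],
           [("s2", "temp", "high")]]
print(list(bsp_eval(batches)))  # [{("alert", "s1")}, {("alert", "s2")}]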
APA, Harvard, Vancouver, ISO, and other styles
24

Works, Karen E. "Targeted Prioritized Processing in Overloaded Data Stream Systems." Digital WPI, 2013. https://digitalcommons.wpi.edu/etd-dissertations/414.

Full text
Abstract:
"We are in an era of big data, sensors, and monitoring technology. One consequence of this technology is the continuous generation of massive volumes of streaming data. To support this, stream processing systems have emerged. These systems must produce results while meeting near-real time response obligations. However, computation intensive processing on high velocity streams is challenging. Stream arrival rates are often unpredictable and can fluctuate. This can cause systems to not always be able to process all incoming data within their required response time.Yet inherently some results may be much more significant than others. The delay or complete neglect of producing certain highly significant results could result in catastrophic consequences. Unfortunately, this critical problem of targeted prioritized processing in overloaded environments remains largely unaddressed to date. In this talk, I will describe four key challenges that my dissertation successfully tackled. First, I address the problem of optimally processing the most significant tuples identified by the user at compile-time before less critical ones. Second, I propose a new aggregate operator that increases the accuracy of aggregate results produced for TP systems. Third, I address the problem of identifying and pulling forward significant tuples at run-time via dynamic determinants. Fourth, I design multi-input operators, such as the join operator, which produce multi-stream results in significance order. My experimental studies explore a rich diversity of workloads, queries, and data sets, including real data streams. The results substantiate that my approaches are a significant improvement over the state-of-the-art approaches."
APA, Harvard, Vancouver, ISO, and other styles
25

Bordin, Maycon Viana. "A benchmark suite for distributed stream processing systems." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2017. http://hdl.handle.net/10183/163441.

Full text
Abstract:
A piece of data has no value on its own; it must be interpreted, contextualized and aggregated with other data before it becomes information. In some classes of applications the value lies not only in the information but also in the speed with which it is obtained. High-frequency trading is a good example, where profitability is directly proportional to latency (LOVELESS; STOIKOV; WAEBER, 2013). With the evolution of hardware and data processing tools, many applications that once took hours to produce results must now produce them in minutes or seconds (BARLOW, 2013). Besides requiring real-time or near real-time processing, this class of application continuously ingests large and unbounded amounts of data in the form of tuples or events. The growing demand for applications with these requirements led to the creation of systems that provide a programming model abstracting away details such as scheduling, fault tolerance, processing and query optimization. These systems are known as Stream Processing Systems (SPS), Data Stream Management Systems (DSMS) (CHAKRAVARTHY, 2009) or Stream Processing Engines (SPE) (ABADI et al., 2005). Lately these systems have adopted a distributed architecture as a way of coping with ever larger amounts of data (ZAHARIA et al., 2012). Among them are S4, Storm, Spark Streaming, Flink Streaming and, more recently, Samza and Apache Beam. These systems model data processing as a dataflow graph, with vertices representing operators and edges representing data streams. The similarities end there, however, as each system has its own particularities regarding fault tolerance and recovery mechanisms, operator scheduling and parallelism, and communication patterns. In this scenario it would be useful to have a tool for comparing these systems under different workloads, to help select the most suitable platform for a specific job. This work proposes a benchmark composed of applications from different areas, as well as a framework for the development and evaluation of distributed SPSs.
Recently a new application domain characterized by the continuous and low-latency processing of large volumes of data has been gaining attention. The growing number of applications of this genre has led to the creation of Stream Processing Systems (SPSs), systems that abstract the details of real-time applications from the developer. More recently, the ever increasing volumes of data to be processed gave rise to distributed SPSs. Currently there are several distributed SPSs on the market, but the existing benchmarks designed for the evaluation of this kind of system cover only a few applications and workloads, while these systems have a much wider set of applications. In this work a benchmark for stream processing systems is proposed. Based on a survey of several papers on real-time and stream applications, the most used applications and areas were outlined, as well as the metrics most used in the performance evaluation of such applications. With this information the metrics of the benchmark were selected, as well as a list of possible applications to be part of the benchmark. Those went through a workload characterization in order to select a diverse set of applications. To ease the evaluation of SPSs, a framework was created with an API to generalize application development and collect metrics, with the possibility of extending it to support other platforms in the future. To prove the usefulness of the benchmark, a subset of the applications was executed on Storm and Spark using the Azure Platform, and the results demonstrated the usefulness of the benchmark suite in comparing these systems.
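A minimal Python sketch of the kind of metric collection such a benchmark framework needs, throughput and tail latency per application; the API shown is illustrative, not the framework proposed in this work:

import time, statistics

class MetricsCollector:
    def __init__(self):
        self.latencies, self.count = [], 0
        self.start = time.monotonic()
    def record(self, emit_time):
        # emit_time: monotonic timestamp taken when the tuple entered the system
        self.latencies.append(time.monotonic() - emit_time)
        self.count += 1
    def report(self):
        elapsed = time.monotonic() - self.start
        return {"throughput_tps": self.count / elapsed,
                # 99th percentile latency; needs at least two samples
                "p99_latency_s": statistics.quantiles(self.latencies, n=100)[98]}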
APA, Harvard, Vancouver, ISO, and other styles
26

Martin, André. "Minimizing Overhead for Fault Tolerance in Event Stream Processing Systems." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-210251.

Full text
Abstract:
Event Stream Processing (ESP) is a well-established approach for low-latency data processing, enabling users to quickly react to relevant situations in soft real-time. In order to cope with the sheer amount of data generated each day and with fluctuating workloads originating from data sources such as Twitter and Facebook, such systems must be highly scalable and elastic. Hence, ESP systems are typically long-running applications deployed on several hundreds of nodes in either dedicated data-centers or cloud environments such as Amazon EC2. In such environments, nodes are likely to fail due to software aging, process or hardware errors, whereas the unbounded stream of data asks for continuous processing. In order to cope with node failures, several fault tolerance approaches have been proposed in the literature. Active replication and rollback recovery based on checkpointing and in-memory logging (upstream backup) are two commonly used approaches for coping with such failures in the context of ESP systems. However, these approaches suffer either from a high resource footprint, low throughput or unresponsiveness due to long recovery times. Moreover, in order to recover applications precisely using exactly-once semantics, deterministic execution is required, which adds another layer of complexity and overhead. The goal of this thesis is to lower the overhead of fault tolerance in ESP systems. We first present StreamMine3G, our ESP system built entirely from scratch in order to study and evaluate novel approaches for fault tolerance and elasticity. We then present an approach that reduces the overhead of deterministic execution by using a weak, epoch-based rather than strict ordering scheme for commutative and tumbling windowed operators, which allows applications to recover precisely using active or passive replication. Since most applications run in cloud environments nowadays, we furthermore propose an approach to increase system availability by efficiently utilizing spare but paid resources for fault tolerance. Finally, in order to free users from the burden of choosing the correct fault tolerance scheme for their applications that guarantees the desired recovery time while still saving resources, we present a controller-based approach that adapts fault tolerance at runtime. We furthermore showcase the applicability of our StreamMine3G approach using real-world applications and examples.
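A minimal Python sketch of the upstream-backup idea referenced in this abstract: the sender logs tuples in memory until the downstream operator acknowledges them, and replays the log after a failure. Identifiers and the downstream interface are hypothetical:

class UpstreamBackup:
    def __init__(self):
        self.in_flight = {}  # tuple id -> tuple, kept until acknowledged
    def send(self, tid, tup, downstream):
        self.in_flight[tid] = tup  # log first, then ship
        downstream.process(tid, tup)
    def ack(self, tid):
        self.in_flight.pop(tid, None)  # downstream persisted it; trim the log
    def recover(self, downstream):
        # replay unacknowledged tuples in id order after the downstream restarts
        for tid, tup in sorted(self.in_flight.items()):
            downstream.process(tid, tup)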
APA, Harvard, Vancouver, ISO, and other styles
27

Kautharapu, K. B. "Aqueous two phase systems for down stream processing of proteins." Thesis (Ph.D.), CSIR-National Chemical Laboratory, Pune, 2009. http://dspace.ncl.res.in:8080/xmlui/handle/20.500.12252/2750.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Chong, Fong Ho. "Frequency-stream-tying hidden Markov model /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20CHONG.

Full text
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 119-123). Also available in electronic version. Access restricted to campus users.
APA, Harvard, Vancouver, ISO, and other styles
29

Nehme, Rimma V. "Continuous query processing on spatio-temporal data streams." Link to electronic thesis, 2005. http://www.wpi.edu/Pubs/ETD/Available/etd-082305-154035/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

VASCONCELOS, RAFAEL OLIVEIRA. "A DYNAMIC LOAD BALANCING MECHANISM FOR DATA STREAM PROCESSING ON DDS SYSTEMS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23629@1.

Full text
Abstract:
This thesis presents the Data Processing Slice Load Balancing solution to enable dynamic load balancing of data stream processing on DDS-based (Data Distribution Service) systems. A large number of applications require continuous and timely processing of high volumes of data originating from many distributed sources, such as network monitoring, traffic engineering systems, intelligent routing of cars in metropolitan areas, sensor networks, telecommunication systems, financial applications and meteorology. The key concept of the proposed solution is the Data Processing Slice (DPS), which is the basic unit of data processing load of server nodes in a DDS domain. The solution consists of a load balancer, which is responsible for monitoring the current load of a set of homogeneous data processing nodes and, when a load imbalance is detected, coordinating the actions to redistribute some data processing slices among the processing nodes in a secure way. Experiments with large data streams have demonstrated the low overhead, good performance and reliability of the proposed solution.
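A minimal Python sketch of the Data Processing Slice idea: slices are the unit of load, and the balancer moves one slice from the most to the least loaded node when the imbalance crosses a threshold. The 0.2 threshold and data shapes are hypothetical:

def rebalance(assignment, load):
    # assignment: node -> set of slice ids; load: node -> utilization in [0, 1]
    nodes = sorted(load, key=load.get)
    light, heavy = nodes[0], nodes[-1]
    if load[heavy] - load[light] > 0.2 and assignment[heavy]:
        slice_id = assignment[heavy].pop()   # pick a slice to migrate
        assignment[light].add(slice_id)
        return (slice_id, heavy, light)      # the migration decision
    return None                              # load considered balanced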
APA, Harvard, Vancouver, ISO, and other styles
31

Bustamante, Fabián Ernesto. "The active streams approach to adaptive distributed applications and services." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15481.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Sansrimahachai, Watsawee. "Tracing fine-grained provenance in stream processing systems using a reverse mapping method." Thesis, University of Southampton, 2012. https://eprints.soton.ac.uk/337675/.

Full text
Abstract:
Applications that require continuous processing of high-volume data streams have grown in prevalence and importance. These kinds of system often process streaming data in real-time or near real-time and provide instantaneous responses in order to support a precise and on time decision. In such systems it is difficult to know exactly how a particular result is generated. However, such information is extremely important for the validation and verification of stream processing results. Therefore, it is crucial that stream processing systems have a mechanism for tracking provenance - the information pertaining to the process that produced result data - at the level of individual stream elements which we refer to as fine-grained provenance tracking for streams. The traceability of stream processing systems allows for users to validate individual stream elements, to verify the computation that took place and to understand the chain of reasoning that was used in the production of a stream processing result. Several recent solutions to provenance tracking in stream processing systems mainly focus on coarse-grained stream provenance in which the level of granularity for capturing provenance information is not detailed enough to address our problem. This thesis proposes a novel fine-grained provenance solution for streams that exploits a reverse mapping method to precisely capture dependency relationships for every individual stream element. It is also designed to support a stream-specific provenance query mechanism, which performs provenance queries dynamically over streams of provenance assertions without requiring the assertions to be stored persistently. The dissertation makes four major contributions to the state of the art. First is a provenance model for streams that allows for the provenance of individual stream elements to be obtained. Second is a provenance query method which utilizes a reverse mapping method - stream ancestor functions - in order to obtain the provenance of a particular stream processing result. The third contribution is a stream-specific provenance query mechanism that enables provenance queries to be computed on-the-fly without requiring provenance assertions to be stored persistently. The fourth contribution is the performance characteristics of our stream provenance solution. It is shown that the storage overhead for provenance collection can be reduced significantly by using our storage reduction technique and the marginal cost of storage consumption is constant based on the number of input stream events. A 4% overhead for the persistent provenance approach and a 7% overhead for the stream-specific query approach are observed as the impact of provenance recording on system performance. In addition, our stream-specific query approach offers low latency processing (0.3 ms per additional component) with reasonable memory consumption.
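A minimal Python sketch of a reverse mapping in the spirit of the stream ancestor functions described here, for a tumbling-window aggregate; the operator shapes are illustrative, not the thesis's model:

def window_ancestors(window_size):
    # Reverse mapping for a tumbling-window aggregate: output element i
    # depends on exactly the window_size inputs that fell into window i.
    def ancestors(output_index):
        start = output_index * window_size
        return range(start, start + window_size)
    return ancestors

def trace(output_index, stages):
    # Compose per-operator ancestor functions, downstream first, to get the
    # fine-grained provenance of one result without storing assertions.
    frontier = {output_index}
    for ancestors in stages:
        frontier = {i for o in frontier for i in ancestors(o)}
    return frontier

print(sorted(trace(2, [window_ancestors(5)])))  # inputs 10..14 produced output 2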
APA, Harvard, Vancouver, ISO, and other styles
33

Weisenseel, Chuck, and David Lane. "SIMULTANEOUS DATA PROCESSING OF MULTIPLE PCM STREAMS ON A PC BASED SYSTEM." International Foundation for Telemetering, 1999. http://hdl.handle.net/10150/608317.

Full text
Abstract:
International Telemetering Conference Proceedings / October 25-28, 1999 / Riviera Hotel and Convention Center, Las Vegas, Nevada
The trend in current data acquisition and recording systems is to capture multiple streams of Pulse Code Modulation (PCM) data on a single medium. The MARS II data recording system manufactured by Datatape, the Asynchronous Realtime Multiplexer and Output Reconstructor (ARMOR) systems manufactured by Calculex, Inc., and other systems on the market today are examples of this technology. The quantity of data recorded by these systems can be impressive, and can cause difficulties in post-test data processing in terms of data storage and turn-around time to the analyst. This paper describes the system currently in use at the Strategic Systems Combined Test Force B-1B division to post-flight process up to twelve independent PCM streams simultaneously at twice real-time speeds. The system is entirely personal computer (PC) based, running the Windows NT 4.0 operating system with an internal ISA-bus PCM decommutation card. Each PC is capable of receiving and processing one stream at a time; therefore, the core of the system is twelve PCs, each with decommutation capability. All PCs are connected via a fast Ethernet network hub. The data processed by this system is IRIG 106 Chapter 8 converted MIL-STD-1553B bus data and Chapter 4 Class I and II PCM data. All system operator inputs are via Distributed Component Object Modeling (DCOM) provided by Microsoft Developer Studio, Versions 5.0 and 6.0, which allows control and status of multiple data processing PCs from one workstation. All data processing software is written in-house using Visual C++ and Visual Basic.
APA, Harvard, Vancouver, ISO, and other styles
34

Fernández, Moctezuma Rafael J. "A Data-Descriptive Feedback Framework for Data Stream Management Systems." PDXScholar, 2012. https://pdxscholar.library.pdx.edu/open_access_etds/116.

Full text
Abstract:
Data Stream Management Systems (DSMSs) provide support for continuous query evaluation over data streams. Data streams pose processing challenges due to their unbounded nature and varying characteristics, such as rate and density fluctuations. DSMSs need to adapt stream processing to these changes within certain constraints, such as available computational resources and minimum latency requirements for producing results. The proposed research develops an inter-operator feedback framework, where opportunities for run-time adaptation of stream processing are expressed in terms of descriptions of substreams and actions applicable to those substreams, called feedback punctuations. Both the discovery of adaptation opportunities and the exploitation of these opportunities are performed in the query operators. DSMSs are also concerned with state management, in particular state derived from tuple processing. The proposed research also introduces the Contracts Framework, which provides execution guarantees about state purging in continuous query evaluation for systems with and without inter-operator feedback. This research provides both theoretical and design contributions. It also includes an implementation and evaluation of the feedback techniques in the NiagaraST DSMS, and a reference implementation of the Contracts Framework.
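A minimal Python sketch of the inter-operator feedback idea: a downstream operator emits a feedback punctuation describing a substream it no longer needs, and an upstream operator starts dropping matching tuples. The predicate form is hypothetical:

class FeedbackFilter:
    def __init__(self):
        self.discard = []  # predicates carried by feedback punctuations
    def on_feedback(self, predicate):
        # a feedback punctuation: a substream description plus a "drop" action
        self.discard.append(predicate)
    def process(self, tup):
        if any(p(tup) for p in self.discard):
            return None  # tuple belongs to a substream nobody downstream needs
        return tup

f = FeedbackFilter()
f.on_feedback(lambda t: t["region"] == "offline-sensor-7")
print(f.process({"region": "offline-sensor-7", "v": 3}))  # None: dropped early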
APA, Harvard, Vancouver, ISO, and other styles
35

Chen, Liang. "A grid-based middleware for processing distributed data streams." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1157990530.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Penczek, Frank. "Static guarantees for coordinated components : a statically typed composition model for stream-processing networks." Thesis, University of Hertfordshire, 2012. http://hdl.handle.net/2299/9046.

Full text
Abstract:
Does your program do what it is supposed to be doing? Without running the program providing an answer to this question is much harder if the language does not support static type checking. Of course, even if compile-time checks are in place only certain errors will be detected: compilers can only second-guess the programmer’s intention. But, type based techniques go a long way in assisting programmers to detect errors in their computations earlier on. The question if a program behaves correctly is even harder to answer if the program consists of several parts that execute concurrently and need to communicate with each other. Compilers of standard programming languages are typically unable to infer information about how the parts of a concurrent program interact with each other, especially where explicit threading or message passing techniques are used. Hence, correctness guarantees are often conspicuously absent. Concurrency management in an application is a complex problem. However, it is largely orthogonal to the actual computational functionality that a program realises. Because of this orthogonality, the problem can be considered in isolation. The largest possible separation between concurrency and functionality is achieved if a dedicated language is used for concurrency management, i.e. an additional program manages the concurrent execution and interaction of the computational tasks of the original program. Such an approach does not only help programmers to focus on the core functionality and on the exploitation of concurrency independently, it also allows for a specialised analysis mechanism geared towards concurrency-related properties. This dissertation shows how an approach that completely decouples coordination from computation is a very supportive substrate for inferring static guarantees of the correctness of concurrent programs. Programs are described as streaming networks connecting independent components that implement the computations of the program, where the network describes the dependencies and interactions between components. A coordination program only requires an abstract notion of computation inside the components and may therefore be used as a generic and reusable design pattern for coordination. A type-based inference and checking mechanism analyses such streaming networks and provides comprehensive guarantees of the consistency and behaviour of coordination programs. Concrete implementations of components are deliberately left out of the scope of coordination programs: Components may be implemented in an external language, for example C, to provide the desired computational functionality. Based on this separation, a concise semantic framework allows for step-wise interpretation of coordination programs without requiring concrete implementations of their components. The framework also provides clear guidance for the implementation of the language. One such implementation is presented and hands-on examples demonstrate how the language is used in practice.
APA, Harvard, Vancouver, ISO, and other styles
37

Martin, André [Verfasser], Christof [Akademischer Betreuer] Fetzer, and Peter [Gutachter] Pietzuch. "Minimizing Overhead for Fault Tolerance in Event Stream Processing Systems / André Martin ; Gutachter: Peter Pietzuch ; Betreuer: Christof Fetzer." Dresden : Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://d-nb.info/1119362482/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Büchner, Steffen [Verfasser], Jörg [Gutachter] Nolte, Rolf [Gutachter] Kraemer, and Wolfgang [Gutachter] Schröder-Preikschat. "Applying the stream-processing paradigm to ultra high-speed communication systems / Steffen Büchner ; Gutachter: Jörg Nolte, Rolf Kraemer, Wolfgang Schröder-Preikschat." Cottbus : BTU Cottbus - Senftenberg, 2020. http://d-nb.info/1218080191/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Le, Quoc Do. "Approximate Data Analytics Systems." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-234219.

Full text
Abstract:
Today, most modern online services make use of big data analytics systems to extract useful information from the raw digital data. The data normally arrives as a continuous data stream at a high speed and in huge volumes. The cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical due to the fact that the data is growing exponentially and even faster than Moore’s law predictions. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than the exact output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, the advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems: • StreamApprox—a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and has an ability to adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error. • IncApprox—a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high-throughput and low-latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error. • PrivApprox—a data stream analytics system for privacy-preserving and approximate computing. This system supports high utility and low-latency data analytics and preserves user’s privacy at the same time. The system is based on the combination of privacy-preserving data analytics and approximate computing. • ApproxJoin—an approximate distributed joins system. This system improves the performance of joins — critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filter) to avoid shuffling non-joinable data items through the network as well as proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output. Our evaluation based on micro-benchmarks and real world case studies shows that these systems can achieve significant performance speedup compared to state-of-the-art systems by tolerating negligible accuracy loss of the analytics output. In addition, our systems allow users to systematically make a trade-off between accuracy and throughput/latency and require no/minor modifications to the existing applications.
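A minimal Python sketch of stratified reservoir sampling, the core technique named for StreamApprox above: one fixed-size uniform reservoir per stratum, maintained online. The adaptive error-bounding machinery of the actual system is omitted:

import random

def stratified_reservoir(stream, key, k):
    # key(item) names the stratum; each stratum keeps a uniform sample of size k
    reservoirs, seen = {}, {}
    for item in stream:
        s = key(item)
        r = reservoirs.setdefault(s, [])
        seen[s] = seen.get(s, 0) + 1
        if len(r) < k:
            r.append(item)                  # reservoir not yet full
        else:
            j = random.randrange(seen[s])   # classic reservoir replacement
            if j < k:
                r[j] = item
    return reservoirs

sample = stratified_reservoir(range(10000), key=lambda x: x % 3, k=10)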
APA, Harvard, Vancouver, ISO, and other styles
40

Sree, Kumar Sruthi. "External Streaming State Abstractions and Benchmarking." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291338.

Full text
Abstract:
Distributed data stream processing is a popular research area and one of the promising paradigms for faster and more efficient data management. Application state is a first-class citizen in nearly every stream processing system, and stream processing nowadays is, by definition, stateful. For a stream processing application, state backs operations such as aggregations, joins, and windows. Apache Flink is one of the most widely accepted and used stream processing systems in industry. One of the main reasons engineers choose Apache Flink to write and deploy continuous applications is its unique combination of flexibility and scalability for stateful programmability, together with the strong guarantees the system provides. Apache Flink's guarantees keep its state correct and consistent even when nodes fail or when the number of tasks changes. Flink state can scale up to the boundaries of its compute node's hard disk by using embedded databases to store and retrieve data. Nevertheless, in all existing state backends officially supported by Flink, state is always local to the compute tasks. Even though this makes deployment more convenient, it creates other challenges, such as non-trivial state reconfiguration and failure recovery, and it tightly couples compute and state. This strategy also leads to over-provisioning and is counterintuitive for state-intensive-only or compute-intensive-only workloads. This thesis investigates an alternative state backend architecture, FlinkNDB, which can tackle these challenges. FlinkNDB decouples state and compute by using a distributed database to store the state. The thesis covers the challenges of existing state backends, the design choices, and the new state backend implementation. We have evaluated the implementation of FlinkNDB against the existing state backends offered by Apache Flink.
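A minimal Python sketch of the decoupling FlinkNDB argues for: operator state lives behind a get/update interface backed by an external store, so a restarted or rescaled task can read it from anywhere. The dict stand-in and the names are hypothetical:

class ExternalValueState:
    # State keyed by (operator, key) in an external store: recovery and
    # reconfiguration no longer require rebuilding a local database.
    def __init__(self, store, operator_id):
        self.store, self.op = store, operator_id
    def get(self, key):
        return self.store.get((self.op, key))
    def update(self, key, value):
        self.store[(self.op, key)] = value

store = {}  # stand-in for the distributed database
counts = ExternalValueState(store, "word-count")
counts.update("flink", (counts.get("flink") or 0) + 1)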
APA, Harvard, Vancouver, ISO, and other styles
41

Henriksson, Jonas. "Implementation of a real-time Fast Fourier Transform on a Graphics Processing Unit with data streamed from a high-performance digitizer." Thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-113389.

Full text
Abstract:
In this thesis we evaluate the prospects of performing real-time digital signal processing on a graphics processing unit (GPU) linked to a high-performance digitizer. A graphics card is acquired and an implementation developed that addresses issues such as transportation of data and the capability of coping with the throughput of the data stream. Furthermore, it consists of an algorithm for executing consecutive fast Fourier transforms on the digitized signal together with averaging and visualization of the output spectrum. An empirical approach has been used when researching the different available options for streaming data. For better performance, an analysis of the noise introduced by using single precision instead of double precision has been performed to decide on the required precision in the context of this thesis. The choice of graphics card is based on an empirical investigation coupled with a measurement-based approach. An implementation in single precision with streaming from the digitizer, by means of double buffering in CPU RAM, capable of speeds up to 3.0 GB/s is presented. Measurements indicate that even higher bandwidths are possible without overflowing the GPU. Tests show that the implementation is capable of computing the spectrum for transform sizes of , although measurements indicate that both higher and lower transform sizes are possible. The results of the computations are visualized in real time.
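A minimal NumPy sketch of the consecutive-FFT-plus-averaging step described above, on the CPU as a stand-in for the GPU implementation; the block sizes are illustrative:

import numpy as np

def averaged_spectrum(samples, n_fft, n_avg):
    # Split the stream into n_avg consecutive blocks of n_fft samples, FFT each
    # block, and average the power spectra (Bartlett's method).
    blocks = samples[:n_fft * n_avg].reshape(n_avg, n_fft)
    spectra = np.abs(np.fft.rfft(blocks, axis=1)) ** 2
    return spectra.mean(axis=0)

x = np.random.randn(1 << 20).astype(np.float32)  # stand-in for digitized samples
psd = averaged_spectrum(x, n_fft=4096, n_avg=256)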
APA, Harvard, Vancouver, ISO, and other styles
42

Klein, Anja. "Datenqualität in Sensordatenströmen." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2010. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-27581.

Full text
Abstract:
The steady development of intelligent sensor systems enables the automation and improvement of complex process and business decisions in a wide range of application scenarios. Sensors can be used, for example, to determine optimal maintenance dates or to control production lines. A fundamental problem here is sensor data quality, which is limited by environmental influences and sensor failures. The goal of this thesis is the development of a data quality model that provides applications and data consumers with quality information for a comprehensive assessment of uncertain sensor data. In addition to data structures for efficient data quality management in data streams and databases, a comprehensive data quality algebra for computing the quality of data processing results is presented. Furthermore, methods for data quality improvement are developed that are specifically adapted to the requirements of sensor data processing. The work is rounded off by approaches for user-friendly data quality querying and visualization.
APA, Harvard, Vancouver, ISO, and other styles
43

Johnson, Robert A. "A Comparison Between Two-Dimensional and Three-Dimensional Analysis, A Review of Horizontal Wood Diaphragms and a Case Study of the Structure Located at 89 Shrewsbury Street, Worcester, MA." Digital WPI, 2008. https://digitalcommons.wpi.edu/etd-theses/524.

Full text
Abstract:
A two-dimensional structural analysis and design approach has been the universally accepted method for a small structural engineering design firm. The tools to perform the analysis have been paper and pencil, calculators and, more recently, personal computers with two-dimensional software. With the introduction of three-dimensional software, a major shift is occurring in how small structural engineering firms approach analysis and design. This thesis research reviews the analysis of an existing building utilizing the standard two-dimensional approach, including horizontal diaphragm action within wood floors. This study also reviews the research performed on horizontal diaphragms and investigates the use of three-dimensional finite element modeling (RISA-3D) for the analysis of horizontal diaphragms. It is shown that the three-dimensional model can provide results similar to the two-dimensional hand calculations. However, the thickness of the diaphragm elements has to be significantly modified for flexible diaphragm action. The experience described herein is useful for structural engineers interfacing with three-dimensional CAD systems. The thesis concludes with a discussion of the challenges facing small structural engineering firms, including computer-based technologies, the engineering expertise needed to develop contract documents and review shop drawings, and the outsourcing of design services.
APA, Harvard, Vancouver, ISO, and other styles
44

SOUSA, Rodrigo Duarte. "Escalonamento adaptativo para sistemas de processamento contínuo de eventos." Universidade Federal de Campina Grande, 2014. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/381.

Full text
Abstract:
The usage of event stream processing systems has been growing lately, mainly in applications that have near real-time processing as a requirement. That need, combined with the high amount of data processed by these applications, increases the dependency on the performance and fault tolerance of such systems. Therefore, to handle these requirements, schedulers usually make use of resource utilization metrics (like CPU, RAM, disk and network bandwidth) in an attempt to react to potential overloads that may further increase utilization, causing the application's performance to deteriorate. However, due to different application profiles and components, the complexity of deciding, in a flexible and generic way, which resources should be monitored, and the difficulty of judging what makes one resource's utilization more important than another's at a given time, can lead the scheduler to perform wrong actions. In this work, we propose a scheduling algorithm that, via a reactive approach, adapts to different application profiles and loads, taking decisions based on the latency variation of its operators. Periodically, the system scheduler evaluates which operators are giving evidence of being in an overloaded state; the scheduler then tries to migrate those operators to a machine with lower utilization. The experiments showed an improvement in system performance: in scenarios with a bursty workload, the operators' average processing latency was reduced by more than 84%, while the number of processed events decreased by only 1.18%.
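A minimal Python sketch of the reactive decision this scheduler takes: periodically flag operators whose processing latency has degraded and propose migrating them to the least utilized node. The 1.5x degradation threshold is hypothetical:

def evaluate(operators, node_load):
    # operators: op id -> (current node, recent mean latency, previous mean latency)
    # node_load: node -> utilization; the least loaded node receives migrants
    target = min(node_load, key=node_load.get)
    migrations = []
    for op, (node, recent, previous) in operators.items():
        if previous > 0 and recent / previous > 1.5 and node != target:
            migrations.append((op, node, target))  # operator got worse: move it
    return migrations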
APA, Harvard, Vancouver, ISO, and other styles
45

Seshadri, Sangeetha. "Enhancing availability in large scale." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29715.

Full text
Abstract:
Thesis (Ph.D)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Ling Liu; Committee Member: Brian Cooper; Committee Member: Calton Pu; Committee Member: Douglas Blough; Committee Member: Karsten Schwan. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
46

Kamaleswaran, Rishikesan. "CBPsp: complex business processes for stream processing." Thesis, 2011. http://hdl.handle.net/10155/151.

Full text
Abstract:
This thesis presents the framework of a complex-business-process-driven event stream processing system that produces meaningful output with direct implications for the business objectives of an organization. The framework is demonstrated using a case study instantiating the management of a newborn infant with hypoglycaemia. Business processes defined within guidelines are specified at build time, while critical knowledge found in their definitions is used to support their enactment for stream analysis. Four major research contributions are delivered. The first contribution enables the definition and enactment of complex business processes in real time. The second supports the extraction of business processes using knowledge found within the initial expression of the business process. The third allows for the explicit use of temporal abstraction and stream analysis knowledge to support enactment in real time. Finally, the last contribution is the real-time integration of heterogeneous streams based on Service-Oriented Architecture principles.
UOIT
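A minimal Python sketch of the temporal-abstraction step such guideline-driven stream analysis relies on, turning raw glucose samples into qualitative intervals a business-process rule can match. The 2.6 mmol/L cut-off is purely illustrative and not taken from the thesis:

def temporal_abstraction(readings, low=2.6):
    # readings: list of (timestamp, glucose in mmol/L), time-ordered
    intervals, state, start = [], None, None
    for t, g in readings:
        s = "LOW" if g < low else "NORMAL"
        if s != state:              # qualitative state change: close the interval
            if state is not None:
                intervals.append((state, start, t))
            state, start = s, t
    if state is not None:
        intervals.append((state, start, readings[-1][0]))
    return intervals

print(temporal_abstraction([(0, 3.1), (5, 2.4), (10, 2.2), (15, 3.0)]))
# [('NORMAL', 0, 5), ('LOW', 5, 15), ('NORMAL', 15, 15)]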
APA, Harvard, Vancouver, ISO, and other styles
47

Liu, Rong-Tai, and 劉榮太. "Stream Processing Engine in the Network Intrusion Detection System." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/84357251864564162409.

Full text
Abstract:
Doctoral dissertation
National Tsing Hua University
Department of Computer Science
Academic year 93 (ROC calendar)
With growing Internet connectivity come evolving opportunities for attackers to unlawfully access computers over the network. Network Intrusion Detection Systems (NIDSes) are designed to identify attacks against networks or hosts that are invisible to firewalls, thus providing an additional layer of security. An NIDS aims to detect a wide range of security violations, ranging from attempted break-ins by outsiders to system penetrations and abuses by insiders. Generally two main methods are used for intrusion detection, namely pattern matching and statistical analysis. The former applies a static set of patterns and alerts on traffic sequences with known signatures, while the latter detects anomalous events statistically by gathering protocol header information, comparing this traffic to known attacks, and sensing anomalies. Pattern matching tools are excellent at detecting known attacks, but perform poorly when facing a fresh assault or a modification of an old one. NIDSes that use statistical analysis perform worse at sensing known problems, but much better at reporting unknown assaults. An improved NIDS implementation should combine these two methods for better network protection. Either way, NIDSes rely on exact string matching of network packet payloads against thousands of intrusion signatures. This dissertation first discusses an efficient and practical mechanism named the FSS (First-Seen SYN) filter, which can mitigate and block SYN Flood attacks. It then presents a TCP processing engine which tracks the behavior of each TCP connection, including state transitions, sequence and acknowledgement numbers, and integrity checking. Most importantly, it eliminates the ambiguities in network protocol specifications that attackers use to deceive network security systems. Finally, we introduce several fast pattern-matching algorithms, since pattern matching is the most computation-intensive task in an NIDS and dominates its performance. Two software-based algorithms and one hardware-based architecture are proposed and shown to be more efficient and higher-performing than existing methodologies.
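A minimal Python sketch of multi-pattern exact matching in the Aho-Corasick style, the classic construction behind many NIDS signature engines; this generic version stands in for, and is not, the algorithms proposed in the dissertation:

from collections import deque

def build(patterns):
    # Trie plus failure links: goto[s] maps a char to the next state, fail[s]
    # is the longest proper suffix state, out[s] the patterns ending at s.
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto[s][c] = len(goto)
                goto.append({}); fail.append(0); out.append(set())
            s = goto[s][c]
        out[s].add(p)
    q = deque(goto[0].values())
    while q:                                  # BFS to fill in failure links
        s = q.popleft()
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0) if goto[f].get(c) != t else 0
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(text, automaton):
    goto, fail, out = automaton
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        hits += [(i - len(p) + 1, p) for p in out[s]]
    return hits

print(search("GET /etc/passwd", build(["/etc/passwd", "cmd.exe"])))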
APA, Harvard, Vancouver, ISO, and other styles
48

Balazinska, Magdalena, Hari Balakrishnan, Samuel Madden, and Mike Stonebraker. "Availability-Consistency Trade-Offs in a Fault-Tolerant Stream Processing System." 2004. http://hdl.handle.net/1721.1/30506.

Full text
Abstract:
processing. In contrast to previous techniques that handle node failures, our approach also tolerates network failures and network partitions. The approach is based on a principled trade-off between consistency and availability in the face of failure, that (1) ensures that all data on an input stream is processed within a specified time threshold, but (2) reduces the impact of failures by limiting if possible the number of results produced based on partially available input data, and (3) corrects these results when failures heal. Our approach is well-suited for applications such as environment monitoring, where high availability and "real-time" response is preferable to perfect answers. Our approach uses replication and guarantees that all processing replicas achieve state consistency, both in the absence of failures and after a failure heals. We achieve consistency in the former case by defining a data-serializing operator that ensures that the order of tuples to a downstream operator is the same at all the replicas. To achieve consistency after a failure heals, we develop approaches based on checkpoint/redo and undo/redo techniques. We have implemented these schemes in a prototype distributed stream processing system, and present experimental results that show that the system meets the desired availability-consistency trade-offs.
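A minimal Python sketch of the data-serializing idea in this abstract: replicas merge their input streams by the same explicit key, so every replica feeds downstream operators an identical tuple order. The key fields are hypothetical, and each input must already be sorted by that key:

import heapq

def serialize_streams(*streams):
    # Deterministic merge: the same (ts, src, seq) total order at every replica
    yield from heapq.merge(*streams, key=lambda t: (t["ts"], t["src"], t["seq"]))

a = [{"ts": 1, "src": "A", "seq": 0}, {"ts": 3, "src": "A", "seq": 1}]
b = [{"ts": 2, "src": "B", "seq": 0}]
print([t["ts"] for t in serialize_streams(a, b)])  # [1, 2, 3] on every replica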
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, Mei-Hsuan, and 陳美璇. "Data Flow Graph Partitioning for Stream Processing in Multi-FPGA Reconfigurable System." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/05549707611165024588.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Computer Science and Information Engineering
Academic year 91 (ROC calendar)
Reconfigurable computing offers the performance of computation in hardware while keeping the flexibility of a software solution. A multi-FPGA reconfigurable system provides the means for dealing with applications that are too large to fit within a single FPGA but may be partitioned over the multiple FPGAs available. These systems have a limited number of I/O pins connecting the FPGAs together, and therefore the pins must be used carefully. The objective of this thesis is to exploit the potential throughput of stream processing in multi-FPGA reconfigurable systems. We propose two approaches that schedule a data flow graph onto a multi-FPGA system. The first method makes use of the data flow graph to find the ideal FPGA size and connectivity for a multi-FPGA reconfigurable system. The second approach increases throughput by decreasing the communication overhead in current multi-FPGA reconfigurable systems. In our simulation, we use DSP kernel algorithms as benchmarks. The results are promising.
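A minimal Python sketch of the partitioning problem this thesis tackles: place data flow graph operators onto two FPGAs so that cut edges (which consume scarce I/O pins) stay low and neither FPGA exceeds its capacity. This greedy placement is purely illustrative of the problem, not the proposed scheduling approaches:

def greedy_bipartition(graph, capacity):
    # graph: operator -> set of successors in the data flow graph
    part, sizes = {}, [0, 0]
    for node in graph:  # illustrative order; real schedulers traverse topologically
        def cost(p):
            over = sizes[p] >= capacity            # respect FPGA capacity first
            # count already-placed successors that would end up on the other chip
            cut = sum(part.get(n, p) != p for n in graph[node])
            return (over, cut)                     # then minimize new cut edges
        best = min((0, 1), key=cost)
        part[node] = best
        sizes[best] += 1
    return part

dfg = {"src": {"fir"}, "fir": {"fft"}, "fft": {"sink"}, "sink": set()}
print(greedy_bipartition(dfg, capacity=2))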
APA, Harvard, Vancouver, ISO, and other styles
50

Carvalho, José Miguel Saramago de. "PhisioStream : a physiology monitoring system using off-the-shelf stream processing frameworks." Master's thesis, 2018. http://hdl.handle.net/10773/25888.

Full text
Abstract:
The on-going VR2Market research project emerged from a consortium of partners ranging from technology to psychology, including Carnegie Mellon University, United States, under the CMU-Portugal program funded by FCT. The VR2Market main objective is to provide a team-wide monitoring solution covering context, environmental aspects, and the physiology of those operating in hazardous professions, First Responders. However, the current solution is not cloud-enabled and relies on custom-made components within a centralized design, which hinders future evolution towards more distributed settings. The objective of this work is to refactor VR2Market in order to provide cloud support with a more extensible architecture, allowing flexible data handling and visualization without compromising the existing functionality. The key architectural option relies on the adoption of a stream processing approach, applying off-the-shelf log monitoring and management solutions. Apache Kafka was used to handle and process data flows, both for integrating legacy data sources and for deploying simple trigger alarms. The latter can easily be extended to more complex analytics, namely by using Apache Spark or Storm, without any refactoring of the data flow pipeline. The proposed solution simultaneously handles the processing of data and flexible visualization over both historical and live data. Services are modeled under a container-oriented approach, using Docker, to fully harness cloud-enabled deployments. Using the VR2Market context as the starting point, we defined and implemented a new architecture that leverages off-the-shelf tools to address the system's needs. However, due to their general-purpose nature, these tools can easily be adapted to other scenarios. In addition, the system supports the integration of new types of sensors, which can now be done with little effort.
Master's in Computer and Telematics Engineering
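A minimal sketch of the Kafka-based flow described above using the kafka-python client: consume raw physiology readings, apply a simple trigger, and publish alarms to another topic. Topic names, the message schema and the heart-rate threshold are hypothetical, not the VR2Market configuration:

import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

consumer = KafkaConsumer("physiology-raw",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b.decode("utf-8")))
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))

for record in consumer:                    # the stream of readings
    reading = record.value
    if reading.get("heart_rate", 0) > 180:  # illustrative trigger rule
        producer.send("alarms", {"responder": reading.get("id"),
                                 "type": "HR_HIGH"})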
APA, Harvard, Vancouver, ISO, and other styles