Rozprawy doktorskie: „Distributed Stream Processing Systems”

1

Vijayakumar, Nithya Nirmal. "Data management in distributed stream processing systems". [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278228.

Pełny tekst źródła

Streszczenie:

Thesis (Ph.D.)--Indiana University, Dept. of Computer Science, 2007.
Source: Dissertation Abstracts International, Volume: 68-09, Section: B, page: 6093. Adviser: Beth Plale. Title from dissertation home page (viewed May 9, 2008).

Style APA, Harvard, Vancouver, ISO itp.

2

Drougas, Ioannis. "Rate allocation in distributed stream processing systems". Diss., [Riverside, Calif.] : University of California, Riverside, 2008. http://proquest.umi.com/pqdweb?index=0&did=1663077971&SrchMode=2&sid=1&Fmt=2&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1268240766&clientId=48051.

Pełny tekst źródła

Streszczenie:

Thesis (Ph. D.)--University of California, Riverside, 2008.
Includes abstract. Title from first page of PDF file (viewed March 10, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 93-98). Also issued in print.

Style APA, Harvard, Vancouver, ISO itp.

3

Bordin, Maycon Viana. "A benchmark suite for distributed stream processing systems". reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2017. http://hdl.handle.net/10183/163441.

Pełny tekst źródła

Streszczenie:

Um dado por si só não possui valor algum, a menos que ele seja interpretado, contextualizado e agregado com outros dados, para então possuir valor, tornando-o uma informação. Em algumas classes de aplicações o valor não está apenas na informação, mas também na velocidade com que essa informação é obtida. As negociações de alta frequência (NAF) são um bom exemplo onde a lucratividade é diretamente proporcional a latência (LOVELESS; STOIKOV; WAEBER, 2013). Com a evolução do hardware e de ferramentas de processamento de dados diversas aplicações que antes levavam horas para produzir resultados, hoje precisam produzir resultados em questão de minutos ou segundos (BARLOW, 2013). Este tipo de aplicação tem como característica, além da necessidade de processamento em tempo-real ou quase real, a ingestão contínua de grandes e ilimitadas quantidades de dados na forma de tuplas ou eventos. A crescente demanda por aplicações com esses requisitos levou a criação de sistemas que disponibilizam um modelo de programação que abstrai detalhes como escalonamento, tolerância a falhas, processamento e otimização de consultas. Estes sistemas são conhecidos como Stream Processing Systems (SPS), Data Stream Management Systems (DSMS) (CHAKRAVARTHY, 2009) ou Stream Processing Engines (SPE) (ABADI et al., 2005). Ultimamente estes sistemas adotaram uma arquitetura distribuída como forma de lidar com as quantidades cada vez maiores de dados (ZAHARIA et al., 2012). Entre estes sistemas estão S4, Storm, Spark Streaming, Flink Streaming e mais recentemente Samza e Apache Beam. Estes sistemas modelam o processamento de dados através de um grafo de fluxo com vértices representando os operadores e as arestas representando os data streams. Mas as similaridades não vão muito além disso, pois cada sistema possui suas particularidades com relação aos mecanismos de tolerância e recuperação a falhas, escalonamento e paralelismo de operadores, e padrões de comunicação. Neste senário seria útil possuir uma ferramenta para a comparação destes sistemas em diferentes workloads, para auxiliar na seleção da plataforma mais adequada para um trabalho específico. Este trabalho propõe um benchmark composto por aplicações de diferentes áreas, bem como um framework para o desenvolvimento e avaliação de SPSs distribuídos.
Recently a new application domain characterized by the continuous and low-latency processing of large volumes of data has been gaining attention. The growing number of applications of such genre has led to the creation of Stream Processing Systems (SPSs), systems that abstract the details of real-time applications from the developer. More recently, the ever increasing volumes of data to be processed gave rise to distributed SPSs. Currently there are in the market several distributed SPSs, however the existing benchmarks designed for the evaluation this kind of system covers only a few applications and workloads, while these systems have a much wider set of applications. In this work a benchmark for stream processing systems is proposed. Based on a survey of several papers with real-time and stream applications, the most used applications and areas were outlined, as well as the most used metrics in the performance evaluation of such applications. With these information the metrics of the benchmark were selected as well as a list of possible application to be part of the benchmark. Those passed through a workload characterization in order to select a diverse set of applications. To ease the evaluation of SPSs a framework was created with an API to generalize the application development and collect metrics, with the possibility of extending it to support other platforms in the future. To prove the usefulness of the benchmark, a subset of the applications were executed on Storm and Spark using the Azure Platform and the results have demonstrated the usefulness of the benchmark suite in comparing these systems.

Style APA, Harvard, Vancouver, ISO itp.

4

Kakkad, Vasvi. "Curracurrong: a stream processing system for distributed environments". Thesis, The University of Sydney, 2014. http://hdl.handle.net/2123/12861.

Pełny tekst źródła

Streszczenie:

Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of things. There are many emerging applications, some of which include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amount of data as an unbounded sequence of events and process them to generate a new sequences of events. Such applications need an adequate programming model that can process large amount of data with minimal latency; for this purpose, stream programming, among other paradigms, is ideal. However, stream programming needs to be adapted to meet the challenges inherent in running it in distributed environments. These challenges include the need for modern domain specific language (DSL), the placement of computations in the network to minimise energy costs, and timeliness in real-time applications. To overcome these challenges we developed a stream programming model that achieves easy-to-use programming interface, energy-efficient actor placement, and timeliness. This thesis presents Curracurrong, a stream data processing system for distributed environments. In Curracurrong, a query is represented as a stream graph of stream operators and communication channels. Curracurrong provides an extensible stream operator library and adapts to a wide range of applications. It uses an energy-efficient placement algorithm that optimises communication and computation. We extend the placement problem to support dynamically changing networks, and develop a dynamic program with polynomially bounded runtime to solve the placement problem. In many stream-based applications, real-time data processing is essential. We propose an approach that measures time delays in stream query processing; this model measures the total computational time from input to output of a query, i.e., end-to-end delay.

Style APA, Harvard, Vancouver, ISO itp.

5

Al-Sinayyid, Ali. "JOB SCHEDULING FOR STREAMING APPLICATIONS IN HETEROGENEOUS DISTRIBUTED PROCESSING SYSTEMS". OpenSIUC, 2020. https://opensiuc.lib.siu.edu/dissertations/1868.

Pełny tekst źródła

Streszczenie:

The colossal amounts of data generated daily are increasing exponentially at a never-before-seen pace. A variety of applications—including stock trading, banking systems, health-care, Internet of Things (IoT), and social media networks, among others—have created an unprecedented volume of real-time stream data estimated to reach billions of terabytes in the near future. As a result, we are currently living in the so-called Big Data era and witnessing a transition to the so-called IoT era. Enterprises and organizations are tackling the challenge of interpreting the enormous amount of raw data streams to achieve an improved understanding of data, and thus make efficient and well-informed decisions (i.e., data-driven decisions). Researchers have designed distributed data stream processing systems that can directly process data in near real-time. To extract valuable information from raw data streams, analysts need to create and implement data stream processing applications structured as a directed acyclic graphs (DAG). The infrastructure of distributed data stream processing systems, as well as the various requirements of stream applications, impose new challenges. Cluster heterogeneity in a distributed environment results in different cluster resources for task execution and data transmission, which make the optimal scheduling algorithms an NP-complete problem. Scheduling streaming applications plays a key role in optimizing system performance, particularly in maximizing the frame-rate, or how many instances of data sets can be processed per unit of time. The scheduling algorithm must consider data locality, resource heterogeneity, and communicational and computational latencies. The latencies associated with the bottleneck from computation or transmission need to be minimized when mapped to the heterogeneous and distributed cluster resources. Recent work on task scheduling for distributed data stream processing systems has a number of limitations. Most of the current schedulers are not designed to manage heterogeneous clusters. They also lack the ability to consider both task and machine characteristics in scheduling decisions. Furthermore, current default schedulers do not allow the user to control data locality aspects in application deployment.In this thesis, we investigate the problem of scheduling streaming applications on a heterogeneous cluster environment and develop the maximum throughput scheduler algorithm (MT-Scheduler) for streaming applications. The proposed algorithm uses a dynamic programming technique to efficiently map the application topology onto a heterogeneous distributed system based on computing and data transfer requirements, while also taking into account the capacity of underlying cluster resources. The proposed approach maximizes the system throughput by identifying and minimizing the time incurred at the computing/transfer bottleneck. The MT-Scheduler supports scheduling applications that are structured as a DAG, such as Amazon Timestream, Google Millwheel, and Twitter Heron. We conducted experiments using three Storm microbenchmark topologies in both simulated and real Apache Storm environments. To evaluate performance, we compared the proposed MT-Scheduler with the simulated round-robin and the default Storm scheduler algorithms. The results indicated that the MT-Scheduler outperforms the default round-robin approach in terms of both average system latency and throughput.

Style APA, Harvard, Vancouver, ISO itp.

6

Balazinska, Magdalena. "Fault-tolerance and load management in a distributed stream processing system". Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/35287.

Pełny tekst źródła

Streszczenie:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 187-199).
Advances in monitoring technology (e.g., sensors) and an increased demand for online information processing have given rise to a new class of applications that require continuous, low-latency processing of large-volume data streams. These "stream processing applications" arise in many areas such as sensor-based environment monitoring, financial services, network monitoring, and military applications. Because traditional database management systems are ill-suited for high-volume, low-latency stream processing, new systems, called stream processing engines (SPEs), have been developed. Furthermore, because stream processing applications are inherently distributed, and because distribution can improve performance and scalability, researchers have also proposed and developed distributed SPEs. In this dissertation, we address two challenges faced by a distributed SPE: (1) faulttolerant operation in the face of node failures, network failures, and network partitions, and (2) federated load management. For fault-tolerance, we present a replication-based scheme, called Delay, Process, and Correct (DPC), that masks most node and network failures.
(cont.) When network partitions occur, DPC addresses the traditional availability-consistency trade-off by maintaining, when possible, a desired availability specified by the application or user, but eventually also delivering the correct results. While maintaining the desired availability bounds, DPC also strives to minimize the number of inaccurate results that must later be corrected. In contrast to previous proposals for fault tolerance in SPEs, DPC simultaneously supports a variety of applications that differ in their preferred trade-off between availability and consistency. For load management, we present a Bounded-Price Mechanism (BPM) that enables autonomous participants to collaboratively handle their load without individually owning the resources necessary for peak operation. BPM is based on contracts that participants negotiate offline. At runtime, participants move load only to partners with whom they have a contract and pay each other the contracted price. We show that BPM provides incentives that foster participation and leads to good system-wide load distribution. In contrast to earlier proposals based on computational economies, BPM is lightweight, enables participants to develop and exploit preferential relationships, and provides stability and predictability.
(cont.) Although motivated by stream processing, BPM is general and can be applied to any federated system. We have implemented both schemes in the Borealis distributed stream processing engine. They will be available with the next release of the system.
by Magdalena Balazinska.
Ph.D.

Style APA, Harvard, Vancouver, ISO itp.

7

Bustamante, Fabián Ernesto. "The active streams approach to adaptive distributed applications and services". Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15481.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

8

Penczek, Frank. "Static guarantees for coordinated components : a statically typed composition model for stream-processing networks". Thesis, University of Hertfordshire, 2012. http://hdl.handle.net/2299/9046.

Pełny tekst źródła

Streszczenie:

Does your program do what it is supposed to be doing? Without running the program providing an answer to this question is much harder if the language does not support static type checking. Of course, even if compile-time checks are in place only certain errors will be detected: compilers can only second-guess the programmer’s intention. But, type based techniques go a long way in assisting programmers to detect errors in their computations earlier on. The question if a program behaves correctly is even harder to answer if the program consists of several parts that execute concurrently and need to communicate with each other. Compilers of standard programming languages are typically unable to infer information about how the parts of a concurrent program interact with each other, especially where explicit threading or message passing techniques are used. Hence, correctness guarantees are often conspicuously absent. Concurrency management in an application is a complex problem. However, it is largely orthogonal to the actual computational functionality that a program realises. Because of this orthogonality, the problem can be considered in isolation. The largest possible separation between concurrency and functionality is achieved if a dedicated language is used for concurrency management, i.e. an additional program manages the concurrent execution and interaction of the computational tasks of the original program. Such an approach does not only help programmers to focus on the core functionality and on the exploitation of concurrency independently, it also allows for a specialised analysis mechanism geared towards concurrency-related properties. This dissertation shows how an approach that completely decouples coordination from computation is a very supportive substrate for inferring static guarantees of the correctness of concurrent programs. Programs are described as streaming networks connecting independent components that implement the computations of the program, where the network describes the dependencies and interactions between components. A coordination program only requires an abstract notion of computation inside the components and may therefore be used as a generic and reusable design pattern for coordination. A type-based inference and checking mechanism analyses such streaming networks and provides comprehensive guarantees of the consistency and behaviour of coordination programs. Concrete implementations of components are deliberately left out of the scope of coordination programs: Components may be implemented in an external language, for example C, to provide the desired computational functionality. Based on this separation, a concise semantic framework allows for step-wise interpretation of coordination programs without requiring concrete implementations of their components. The framework also provides clear guidance for the implementation of the language. One such implementation is presented and hands-on examples demonstrate how the language is used in practice.

Style APA, Harvard, Vancouver, ISO itp.

9

Chen, Liang. "A grid-based middleware for processing distributed data streams". Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1157990530.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

10

Sree, Kumar Sruthi. "External Streaming State Abstractions and Benchmarking". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291338.

Pełny tekst źródła

Streszczenie:

Distributed data stream processing is a popular research area and is one of the promising paradigms for faster and efficient data management. Application state is a first-class citizen in nearly every stream processing system. Nowadays, stream processing is, by definition, stateful. For a stream processing application, the state is backing operations such as aggregations, joins, and windows. Apache Flink is one of the most accepted and widely used stream processing systems in the industry. One of the main reasons engineers choose Apache Flink to write and deploy continuous applications is its unique combination of flexibility and scalability for stateful programmability, and the firm guarantee that the system ensures. Apache Flink’s guarantees always make its states correct and consistent even when nodes fail or when the number of tasks changes. Flink state can scale up to its compute node’s hard disk boundaries using embedded databases to store and retrieve data. Nevertheless, in all existing state backends officially supported by Flink, the state is always available locally to compute tasks. Even though this makes deployment more convenient, it creates other challenges such as non-trivial state reconfiguration and failure recovery. At the same time, compute, and state are bound to be tightly coupled. This strategy also leads to over-provisioning and is counterintuitive on state intensive only workloads or compute-intensive only workloads. This thesis investigates an alternative state backend architecture, FlinkNDB, which can tackle these challenges. FlinkNDB decouples state and computes by using a distributed database to store the state. The thesis covers the challenges of existing state backends and design choices and the new state backend implementation. We have evaluated the implementation of FlinkNDB against existing state backends offered by Apache Flink.
Distribuerad dataströmsbehandling är ett populärt forskningsområde och är ett av de lovande paradigmen för snabbare och effektivare datahantering. Applicationstate är en förstklassig medborgare i nästan alla strömbehandlingssystem. Numera är strömbearbetning per definition statlig. För en strömbehandlingsapplikation backar staten operationer som aggregeringar, sammanfogningar och windows. Apache Flink är ett av de mest accepterade och mest använda strömbehandlingssystemen i branschen. En av de främsta anledningarna till att ingenjörer väljer ApacheFlink för att skriva och distribuera kontinuerliga applikationer är dess unika kombination av flexibilitet och skalbarhet för statlig programmerbarhet, och företaget garanterar att systemet säkerställer. Apache Flinks garantier gör alltid dess tillstånd korrekt och konsekvent även när noder misslyckas eller när antalet uppgifter ändras. Flink-tillstånd kan skala upp till dess beräkningsnods hårddiskgränser genom att använda inbäddade databaser för att lagra och hämta data. I allmänna tillståndsstöd som officiellt stöds av Flink är staten dock alltid tillgänglig lokalt för att beräkna uppgifter. Även om detta gör installationen bekvämare, skapar det andra utmaningar som icke-trivial tillståndskonfiguration och felåterställning. Samtidigt måste beräkning och tillstånd vara tätt kopplade. Den här strategin leder också till överanvändning och är kontraintuitiv för statligt intensiva endast arbetsbelastningar eller beräkningsintensiva endast arbetsbelastningar. Denna avhandling undersöker en alternativ statsbackendarkitektur, FlinkNDB, som kan hantera dessa utmaningar. FlinkNDB frikopplar tillstånd och beräknar med hjälp av en distribuerad databas för att lagra tillståndet. Avhandlingen täcker utmaningarna med befintliga statliga backends och designval och den nya implementeringen av statebackend. Vi har utvärderat genomförandet av FlinkNDBagainst befintliga statliga backends som erbjuds av Apache Flink.

Style APA, Harvard, Vancouver, ISO itp.

11

Braik, William. "Détection d'évènements complexes dans les flux d'évènements massifs". Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0596/document.

Pełny tekst źródła

Streszczenie:

La détection d’évènements complexes dans les flux d’évènements est un domaine qui a récemment fait surface dans le ecommerce. Notre partenaire industriel Cdiscount, parmi les sites ecommerce les plus importants en France, vise à identifier en temps réel des scénarios de navigation afin d’analyser le comportement des clients. Les objectifs principaux sont la performance et la mise à l’échelle : les scénarios de navigation doivent être détectés en moins de quelques secondes, alorsque des millions de clients visitent le site chaque jour, générant ainsi un flux d’évènements massif.Dans cette thèse, nous présentons Auros, un système permettant l’identification efficace et à grande échelle de scénarios de navigation conçu pour le eCommerce. Ce système s’appuie sur un langage dédié pour l’expression des scénarios à identifier. Les règles de détection définies sont ensuite compilées en automates déterministes, qui sont exécutés au sein d’une plateforme Big Data adaptée au traitement de flux. Notre évaluation montre qu’Auros répond aux exigences formulées par Cdiscount, en étant capable de traiter plus de 10,000 évènements par seconde, avec une latence de détection inférieure à une seconde
Pattern detection over streams of events is gaining more and more attention, especially in the field of eCommerce. Our industrial partner Cdiscount, which is one of the largest eCommerce companies in France, aims to use pattern detection for real-time customer behavior analysis. The main challenges to consider are efficiency and scalability, as the detection of customer behaviors must be achieved within a few seconds, while millions of unique customers visit the website every day,thus producing a large event stream. In this thesis, we present Auros, a system for large-scale an defficient pattern detection for eCommerce. It relies on a domain-specific language to define behavior patterns. Patterns are then compiled into deterministic finite automata, which are run on a BigData streaming platform. Our evaluation shows that our approach is efficient and scalable, and fits the requirements of Cdiscount

Style APA, Harvard, Vancouver, ISO itp.

12

Liu, Ying. "Query optimization for distributed stream processing". [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3274258.

Pełny tekst źródła

Streszczenie:

Thesis (Ph.D.)--Indiana University, Dept. of Computer Science, 2007.
Source: Dissertation Abstracts International, Volume: 68-07, Section: B, page: 4597. Adviser: Beth Plale. Title from dissertation home page (viewed Apr. 21, 2008).

Style APA, Harvard, Vancouver, ISO itp.

13

Newton, Ryan Rhodes 1980. "Language design for distributed stream processing". Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/46795.

Pełny tekst źródła

Streszczenie:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 149-152).
Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collected programming to these domains. This is made possible by three primary implementation techniques, each of which leverages characteristics of the streaming domain. First, WaveScript employs an evaluation strategy that uses a combination of interpretation and reification to partially evaluate programs into stream dataflow graphs. Second, we use profile-driven compilation to enable many optimizations that are normally only available in the synchronous (rather than asynchronous) dataflow domain. Finally, an empirical, profile-driven approach also allows us to compute practical partitions of dataflow graphs, spreading them across embedded nodes and more powerful servers. We have used our language to build and deploy applications, including a sensor-network for the acoustic localization of wild animals such as the Yellow-Bellied marmot. We evaluate WaveScript's performance on this application, showing that it yields good performance on both embedded and desktop-class machines. Our language allowed us to implement the application rapidly, while outperforming a previous C implementation by over 35%, using fewer than half the lines of code. We evaluate the contribution of our optimizations to this success. We also evaluate WaveScript's ability to extract parallelism from this and other applications.
by Ryan Rhodes Newton.
Ph.D.

Style APA, Harvard, Vancouver, ISO itp.

14

Ren, Xiangnan. "Traitement et raisonnement distribués des flux RDF". Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1139/document.

Pełny tekst źródła

Streszczenie:

Le traitement en temps réel des flux de données émanant des capteurs est devenu une tâche courante dans de nombreux scénarios industriels. Dans le contexte de l'Internet des objets (IoT), les données sont émises par des sources de flux hétérogènes, c'est-à-dire provenant de domaines et de modèles de données différents. Cela impose aux applications de l'IoT de gérer efficacement l'intégration de données à partir de ressources diverses. Le traitement des flux RDF est dès lors devenu un domaine de recherche important. Cette démarche basée sur des technologies du Web Sémantique supporte actuellement de nombreuses applications innovantes où les notions de temps réel et de raisonnement sont prépondérantes. La recherche présentée dans ce manuscrit s'attaque à ce type d'application. En particulier, elle a pour objectif de gérer efficacement les flux de données massifs entrants et à avoir des services avancés d’analyse de données, e.g., la détection d’anomalie. Cependant, un moteur de RDF Stream Processing (RSP) moderne doit prendre en compte les caractéristiques de volume et de vitesse rencontrées à l'ère du Big Data. Dans un projet industriel d'envergure, nous avons découvert qu'un moteur de traitement de flux disponible 24/7 est généralement confronté à un volume de données massives, avec des changements dynamiques de la structure des données et les caractéristiques de la charge du système. Pour résoudre ces problèmes, nous proposons Strider, un moteur de traitement de flux RDF distribué, hybride et adaptatif qui optimise le plan de requête logique selon l’état des flux de données. Strider a été conçu pour garantir d'importantes propriétés industrielles telles que l'évolutivité, la haute disponibilité, la tolérance aux pannes, le haut débit et une latence acceptable. Ces garanties sont obtenues en concevant l'architecture du moteur avec des composants actuellement incontournables du Big Data: Apache Spark et Apache Kafka. De plus, un nombre croissant de traitements exécutés sur des moteurs RSP nécessitent des mécanismes de raisonnement. Ils se traduisent généralement par un compromis entre le débit de données, la latence et le coût computationnel des inférences. Par conséquent, nous avons étendu Strider pour prendre en charge la capacité de raisonnement en temps réel avec un support d'expressivité d'ontologies en RDFS + (i.e., RDFS + owl:sameAs). Nous combinons Strider avec une approche de réécriture de requêtes pour SPARQL qui bénéficie d'un encodage intelligent pour les bases de connaissances. Le système est évalué selon différentes dimensions et sur plusieurs jeux de données, pour mettre en évidence ses performances. Enfin, nous avons exploré le raisonnement du flux RDF dans un contexte d'ontologies exprimés avec un fragment d'ASP (Answer Set Programming). La considération de cette problématique de recherche est principalement motivée par le fait que de plus en plus d'applications de streaming nécessitent des tâches de raisonnement plus expressives et complexes. Le défi principal consiste à gérer les dimensions de débit et de latence avec des méthologies efficaces. Les efforts récents dans ce domaine ne considèrent pas l'aspect de passage à l'échelle du système pour le raisonnement des flux. Ainsi, nous visons à explorer la capacité des systèmes distribuées modernes à traiter des requêtes d'inférence hautement expressive sur des flux de données volumineux. Nous considérons les requêtes exprimées dans un fragment positif de LARS (un cadre logique temporel basé sur Answer Set Programming) et proposons des solutions pour traiter ces requêtes, basées sur les deux principaux modèles d’exécution adoptés par les principaux systèmes distribuées: Bulk Synchronous Parallel (BSP) et Record-at-A-Time (RAT). Nous mettons en œuvre notre solution nommée BigSR et effectuons une série d’évaluations. Nos expériences montrent que BigSR atteint un débit élevé au-delà du million de triplets par seconde en utilisant un petit groupe de machines
Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. In an Internet of Things (IoT) context, data are emitted from heterogeneous stream sources, i.e., coming from different domains and data models. This requires that IoT applications efficiently handle data integration mechanisms. The processing of RDF data streams hence became an important research field. This trend enables a wide range of innovative applications where the real-time and reasoning aspects are pervasive. The key implementation goal of such application consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. However, a modern RSP engine has to address volume and velocity characteristics encountered in the Big Data era. In an on-going industrial project, we found out that a 24/7 available stream processing engine usually faces massive data volume, dynamically changing data structure and workload characteristics. These facts impact the engine's performance and reliability. To address these issues, we propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes logical query plan according to the state of data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault-tolerant, high throughput and acceptable latency. These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka. Moreover, an increasing number of processing jobs executed over RSP engines are requiring reasoning mechanisms. It usually comes at the cost of finding a trade-off between data throughput, latency and the computational cost of expressive inferences. Therefore, we extend Strider to support real-time RDFS+ (i.e., RDFS + owl:sameAs) reasoning capability. We combine Strider with a query rewriting approach for SPARQL that benefits from an intelligent encoding of knowledge base. The system is evaluated along different dimensions and over multiple datasets to emphasize its performance. Finally, we have stepped further to exploratory RDF stream reasoning with a fragment of Answer Set Programming. This part of our research work is mainly motivated by the fact that more and more streaming applications require more expressive and complex reasoning tasks. The main challenge is to cope with the large volume and high-velocity dimensions in a scalable and inference-enabled manner. Recent efforts in this area still missing the aspect of system scalability for stream reasoning. Thus, we aim to explore the ability of modern distributed computing frameworks to process highly expressive knowledge inference queries over Big Data streams. To do so, we consider queries expressed as a positive fragment of LARS (a temporal logic framework based on Answer Set Programming) and propose solutions to process such queries, based on the two main execution models adopted by major parallel and distributed execution frameworks: Bulk Synchronous Parallel (BSP) and Record-at-A-Time (RAT). We implement our solution named BigSR and conduct a series of evaluations. Our experiments show that BigSR achieves high throughput beyond million-triples per second using a rather small cluster of machines

Style APA, Harvard, Vancouver, ISO itp.

15

Kammoun, Abderrahmen. "Enhancing Stream Processing and Complex Event Processing Systems". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSES012.

Pełny tekst źródła

Streszczenie:

Alors que de plus en plus d'objets et d'appareils sensoriels connectés font partie de notre vie quotidienne, la masse d'informations circulant à grande vitesse ne cesse d'augmenter. Cette énorme quantité de données produites à des débits élevés exige une compréhension rapide pour être utile dans divers domaines d'activité telles que l'internet des objets, la santé, la gestion de l'énergie, etc. Les techniques traditionnelles de stockage et de traitement de données se sont révélées inefficaces ou inadaptables pour gérer ce flux de données. Cette thèse a pour objectif de proposer des solutions optimales à deux problèmes de recherche sur la gestion de flux de données. La première concerne l’optimisation de la résolution de requêtes continues complexes par les systèmes de détection d'événements complexes (CEP). La seconde aux problèmes liées à la prédiction des événement complexes fondée sur l’apprentissage de l’historique du système. Premièrement, nous avons proposé un modèle de recalcul pour le traitement de requêtes complexes, basé sur une indexation multidimensionnelle et des algorithmes de jointures optimisés. Deuxièmement, nous avons conçu un CEP prédictif qui utilise des informations historiques pour prédire des événements complexes futurs. Pour utiliser efficacement l'information historique, nous utilisons un espace de séquences historiques à N dimensions. Par conséquent, la prédiction peut être effectuée en répondant aux requêtes d’intervalles sur cet espace de séquences historiques. La pertinence des résultats obtenus, notamment par l'application de nos algorithmes et approches lors de challenges internationaux démontre la viabilité des méthodes que nous proposons
As more and more connected objects and sensory devices are becoming part of our daily lives, the sea of high-velocity information flow is growing. This massive amount of data produced at high rates requires rapid insight to be useful in various applications such as the Internet of Things, health care, energy management, etc. Traditional data storage and processing techniques are proven inefficient. This gives rise to Data Stream Management and Complex Event Processing (CEP) systems.This thesis aims to provide optimal solutions for complex and proactive queries. Our proposed techniques, in addition to CPU and memory efficiency, enhance the capabilities of existing CEP systems by adding predictive feature through real-time learning. The main contributions of this thesis are as follows:We proposed various techniques to reduce the CPU and memory requirements of expensive queries. These operators result in exponential complexity both in terms of CPU and memory. Our proposed recomputation and heuristic-based algorithm reduce the costs of these operators. These optimizations are based on enabling efficient multidimensional indexing using space-filling curves and by clustering events into batches to reduce the cost of pair-wise joins.We designed a novel predictive CEP system that employs historical information to predict future complex events. We proposed a compressed index structure, range query processing techniques and an approximate summarizing technique over the historical space.The applicability of our techniques over the real-world problems presented has produced further customize-able solutions that demonstrate the viability of our proposed methods

Style APA, Harvard, Vancouver, ISO itp.

16

Mei, Haitao. "Real-time stream processing in embedded systems". Thesis, University of York, 2017. http://etheses.whiterose.ac.uk/19750/.

Pełny tekst źródła

Streszczenie:

Modern real-time embedded systems often involve computational-intensive data processing algorithms to meet their application requirements. As a result, there has been an increase in the use of multiprocessor platforms. The stream processing programming model aims to facilitate the construction of concurrent data processing programs to exploit the parallelism available on these architectures. However, most current stream processing frameworks or languages are not designed for use in real-time systems, let alone systems that might also have hard real-time control algorithms. This thesis contends that a generic architecture of a real-time stream processing infrastructure can be created to support predictable processing of both batched and live streaming data sources, and integrated with hard real-time control algorithms. The thesis first reviews relevant stream processing techniques, and identifies the open issues. Then a real-time stream processing task model, and an architecture for supporting that model is proposed. An approach to the integration of stream processing tasks into a real-time environment that also has hard real-time components is presented. Data is processed in parallel using execution-time servers allocated to each core. An algorithm is presented for selecting the parameters of the servers that maximises their capacities (within an overall deadline) and ensures that hard real-time components remain schedulable. Response-time analysis is derived to guarantee that the real-time requirements (deadlines for batched data processing, and latency for each data item for live data) for the stream processing activity are met. A framework, called SPRY, is implemented to support the proposed real-time stream processing architecture. The framework supports fully-partitioned applications that are scheduled using fixed priority-based scheduling techniques. A case study based on a modified Generic Avionics Platform is given to demonstrate the overall approach. Finally, the evaluation shows that the presented approach provides a better schedulability than alternative approaches.

Style APA, Harvard, Vancouver, ISO itp.

17

Argile, Andrew Duncan Stuart. "Distributed processing in decision support systems". Thesis, Nottingham Trent University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.259647.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

18

Unnava, Vasundhara. "Query processing in distributed database systems". Connect to resource, 1992. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1261314105.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

19

Harel, Nissim. "Memory Optimizations for Distributed Stream-based Applications". Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/13988.

Pełny tekst źródła

Streszczenie:

Distributed stream-based applications manage large quantities of data and exhibit unique production and consumption patterns that set them apart from general-purpose applications. This dissertation examines possible ways of creating more efficient memory management schemes. Specifically, it looks at the memory reclamation problem. It takes advantage of special traits of streaming applications to extend the definition of the garbage collection problem for those applications and include not only data items that are not reachable but also items that have no effect on the final outcome of the application. Streaming applications typically fully process only a portion of the data, and resources directed towards the remaining data items (i.e., those that dont affect the final outcome) can be viewed as wasted resources that should be minimized. Two complementary approaches are suggested: 1. Garbage Identification 2. Adaptive Resource Utilization Garbage Identification is concerned with an analysis of dynamic data dependencies to infer those items that the application is no longer going to access. Several garbage identification algorithms are examined. Each one of the algorithms uses a set of application properties (possibly distinct from one another) to reduce the memory consumption of the application. The performance of these garbage identification algorithms is compared to the performance of an ideal garbage collector, using a novel logging/post-mortem analyzer. The results indicate that the algorithms that achieve a low memory footprint (close to that of an ideal garbage collector) perform their garbage identification decisions locally; however, they base these decisions on best-effort global information obtained from other components of the distributed application. The Adaptive Resource Utilization (ARU) algorithm analyzes the dynamic relationships between the production and consumption of data items. It uses this information to infer the capacity of the system to process data items and adjusts data generation accordingly. The ARU algorithm makes local capacity decisions based on best-effort global information. This algorithm is found to be as effective as the most successful garbage identification algorithm in reducing the memory footprint of stream-based applications, thus confirming the observation that using best-effort global information to perform local decisions is fundamental in reducing memory consumption for stream-based applications.

Style APA, Harvard, Vancouver, ISO itp.

20

Zhou, Wanlei, i mikewood@deakin edu au. "Building reliable distributed systems". Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20051017.160921.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

21

Gunaseelan, L. "Debugging of Distributed object systems". Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/9219.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

22

Works, Karen E. "Targeted Prioritized Processing in Overloaded Data Stream Systems". Digital WPI, 2013. https://digitalcommons.wpi.edu/etd-dissertations/414.

Pełny tekst źródła

Streszczenie:

"We are in an era of big data, sensors, and monitoring technology. One consequence of this technology is the continuous generation of massive volumes of streaming data. To support this, stream processing systems have emerged. These systems must produce results while meeting near-real time response obligations. However, computation intensive processing on high velocity streams is challenging. Stream arrival rates are often unpredictable and can fluctuate. This can cause systems to not always be able to process all incoming data within their required response time.Yet inherently some results may be much more significant than others. The delay or complete neglect of producing certain highly significant results could result in catastrophic consequences. Unfortunately, this critical problem of targeted prioritized processing in overloaded environments remains largely unaddressed to date. In this talk, I will describe four key challenges that my dissertation successfully tackled. First, I address the problem of optimally processing the most significant tuples identified by the user at compile-time before less critical ones. Second, I propose a new aggregate operator that increases the accuracy of aggregate results produced for TP systems. Third, I address the problem of identifying and pulling forward significant tuples at run-time via dynamic determinants. Fourth, I design multi-input operators, such as the join operator, which produce multi-stream results in significance order. My experimental studies explore a rich diversity of workloads, queries, and data sets, including real data streams. The results substantiate that my approaches are a significant improvement over the state-of-the-art approaches."

Style APA, Harvard, Vancouver, ISO itp.

23

Navaratnam, Srivallipuranandan. "Reliable group communication in distributed systems". Thesis, University of British Columbia, 1987. http://hdl.handle.net/2429/26505.

Pełny tekst źródła

Streszczenie:

This work describes the design and implementation details of a reliable group communication mechanism. The mechanism guarantees that messages will be received by all the operational members of the group or by none of them (atomicity). In addition, the sequence of messages will be the same at each of the recipients (order). The message ordering property can be used to simplify distributed database systems and distributed processing algorithms. The proposed mechanism continues to operate despite process, host and communication link failures (survivability). Survivability is essential in fault-tolerant applications.
Science, Faculty of
Computer Science, Department of
Graduate

Style APA, Harvard, Vancouver, ISO itp.

24

孫昱東 i Yudong Sun. "A distributed object model for solving irregularly structured problemson distributed systems". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31243630.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

25

Fukuzono, Hayato. "Spatial Signal Processing on Distributed MIMO Systems". 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/217206.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

26

Al-Bassiouni, Abdel-Aziz Mahmoud. "Optimum signal processing in distributed sensor systems". Thesis, Monterey, California: U.S. Naval Postgraduate School, 1987. http://hdl.handle.net/10945/22401.

Pełny tekst źródła

Streszczenie:

Approved for public release; distribution is unlimited
We consider the problem of detection of known signals in noise using quantized, discrete sensor observations. Optimal design of the quantizers at the sensor sites as well as the global fusion of the quantized observations is presented. Also the equivalence between a team of two sensors and their fusion centre and another team of a primary decision maker and a second opinion is shown. Since the fusion of information is a main pillar of the thesis, an early chapter is devoted to the optimum fusion policy. Extension of the results to the case of vector sensor observations is also considered

Style APA, Harvard, Vancouver, ISO itp.

27

Wong, Kar Leong. "A message controller for distributed processing systems". Thesis, Nottingham Trent University, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.312309.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

28

Millar, Dean Lee. "Parallel distributed processing in rock engineering systems". Thesis, Imperial College London, 2008. http://hdl.handle.net/10044/1/37116.

Pełny tekst źródła

Streszczenie:

Rock Engineering Systems are a collection of ideas, mathematical tools and computer technology all of which are designed to solve problems in rock engineering with interacting components. The interactions between components can be complex and the rock engineering problems themselves contain a high degree of uncertainty. The research described in this thesis investigates the incorporation of computational techniques known as parallel distributed processing methods into the disciplines of rock mechanics and rock engineering. Two main applications of parallel distributed processing methods in rock engineering are investigated in this thesis. 1) Multilayered perceptron artificial neural networks are used successfully to encapsulate the laboratory behaviour of rocks under triaxial compression. Trained artificial neural networks are then used to replace conventional constitutive models within finite difference geomechanical numerical modelling codes. 2) Two multilayered perceptron artificial neural networks are developed to assist in the task of discrimination of rock fracture presence within digital imagery of rock exposures. The first is trained using samples of the image that contain fracture image content and samples that do not, and provides a probability-like measure of fracture presence. It was sufficiently successful to permit estimation of fracture intensity parameter , . The second was developed specifically to identify fracture termination condition by matching samples to a set of fracture termination condition templates. Seven original contributions to the rock mechanics and rock engineering disciplines have resulted across the three application areas. These contributions are itemised, with details, at the beginning of the final Chapter of the thesis.

Style APA, Harvard, Vancouver, ISO itp.

29

Andersson, Sara. "Data Processing and Collection in Distributed Systems". Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85313.

Pełny tekst źródła

Streszczenie:

Distributed systems can be seen in a variety of applications that is in use today. Tritech provides several systems that to some extent consist of distributed systems of nodes. These nodes collect data and the data have to be processed. A problem that often appears when designing these systems, is deciding where the data should be processed, i.e., which architecture is the most suitable one for the system. Decide the architecture for these systems are not simple, especially since it changes rather quickly due to the development in these areas. The thesis aims to perform a study regarding which factors affect the choice of architecture in a distributed system and how these factors relate to each other. To be able to analyze which factors do affect the choice of architecture and to what extent, a simulator was implemented. The simulator received information about the factors as input, and return one or several architecture configurations as output. By performing qualitative interviews, the input factors to the simulator were chosen. The factors that were analyzed in the thesis was: security, storage, working memory, size of data, number of nodes, data processing per data set, robust communication, battery consumption, and cost. From the qualitative interviews as well as from the prestudy five architecture configuration was chosen. The chosen architectures were: thin-client server, thick-client server, three-tier client-server, peer-to-peer, and cloud computing. The simulator was validated regarding the three given use cases: agriculture, the train industry, and industrial Internet of Things. The validation consisted of five existing projects from Tritech. From the results of the validation, the simulator produced correct results for three of the five projects. By using the simulator results, it could be seen which factors affect the choice of architecture more than others and are hard to provide in the same architecture since they are conflicting factors. The conflicting factors were security together with working memory and robust communication. The factor working memory together with battery consumption also showed to be conflicting factors and is hard to provide within the same architecture. Therefore, according to the simulator, it can be seen that the factors that affect the choice of architecture were working memory, battery consumption, security, and robust communication. By using the results of the simulator, a decision matrix was designed whose purpose was to facilitate the choice of architecture. The evaluation of the decision matrix consisted of four projects from Tritech including the three given use cases: agriculture, the train industry, and industrial Internet of Things. The evaluation of the decision matrix showed that the two architectures that received the most points, one of the architectures were used in the validated project.
Distribuerade system kan ses i en mängd olika applikationer som används idag. Tritech jobbar med flera produkter som till viss del består av distribuerade system av noder. Det dessa system har gemensamt är att noderna samlar in data och denna data kommer på ett eller ett annat sätt behöva bearbetas. En fråga som ofta behövs besvaras vid uppsättning av arkitekturen för sådana projekt är huruvida datan ska bearbetas, d.v.s. vilken arkitektkonfiguration som är mest lämplig för systemet. Att ta dessa beslut har visat sig inte alltid vara helt simpelt, och det ändrar sig relativt snabbt med den utvecklingen som sker på dessa områden. Denna uppsats syftar till att utföra en studie om vilka faktorer som påverkar valet av arkitektur för ett distribuerat system samt hur dessa faktorer förhåller sig mot varandra. För att kunna analysera vilka faktorer som påverkar valet av arkitektur och i vilken utsträckning, implementerades en simulator. Simulatorn tog faktorerna som input och returnerade en eller flera arkitekturkonfigurationer som output. Genom att utföra kvalitativa intervjuer valdes faktorerna till simulatorn. Faktorerna som analyserades i denna uppsats var: säkerhet, lagring, arbetsminne, storlek på data, antal noder, databearbetning per datamängd, robust kommunikation, batteriförbrukning och kostnad. Från de kvalitativa intervjuerna och från förstudien valdes även fem stycken arkitekturkonfigurationer. De valda arkitekturerna var: thin-client server, thick-client server, three-tier client-server, peer-to-peer, och cloud computing. Simulatorn validerades inom de tre givna användarfallen: lantbruk, tågindustri och industriell IoT. Valideringen bestod av fem befintliga projekt från Tritech. Från resultatet av valideringen producerade simulatorn korrekta resultat för tre av de fem projekten. Utifrån simulatorns resultat, kunde det ses vilka faktorer som påverkade mer vid valet av arkitektur och är svåra att kombinera i en och samma arkitekturkonfiguration. Dessa faktorer var säkerhet tillsammans med arbetsminne och robust kommunikation. Samt arbetsminne tillsammans med batteriförbrukning visade sig också vara faktorer som var svåra att kombinera i samma arkitektkonfiguration. Därför, enligt simulatorn, kan det ses att de faktorer som påverkar valet av arkitektur var arbetsminne, batteriförbrukning, säkerhet och robust kommunikation. Genom att använda simulatorns resultat utformades en beslutsmatris vars syfte var att underlätta valet av arkitektur. Utvärderingen av beslutsmatrisen bestod av fyra projekt från Tritech som inkluderade de tre givna användarfallen: lantbruk, tågindustrin och industriell IoT. Resultatet från utvärderingen av beslutsmatrisen visade att de två arkitekturerna som fick flest poäng, var en av arkitekturerna den som användes i det validerade projektet

Style APA, Harvard, Vancouver, ISO itp.

30

Xia, Yu S. M. Massachusetts Institute of Technology. "Logical timestamps in distributed transaction processing systems". Thesis, Massachusetts Institute of Technology, 2018. https://hdl.handle.net/1721.1/122877.

Pełny tekst źródła

Streszczenie:

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 73-79).
Distributed transactions are such transactions with remote data access. They usually suffer from high network latency (compared to the internal overhead) during data operations on remote data servers, and therefore lengthen the entire transaction executiont time. This increases the probability of conflicting with other transactions, causing high abort rates. This, in turn, causes poor performance. In this work, we constructed Sundial, a distributed concurrency control algorithm that applies logical timestamps seaminglessly with a cache protocol, and works in a hybrid fashion where an optimistic approach is combined with lock-based schemes. Sundial tackles the inefficiency problem in two ways. Firstly, Sundial decides the order of transactions on the fly. Transactions get their commit timestamp according to their data access traces. Each data item in the database has logical leases maintained by the system. A lease corresponds to a version of the item. At any logical time point, only a single transaction holds the 'lease' for any particular data item. Therefore, lease holders do not have to worry about someone else writing to the item because in the logical timeline, the data writer needs to acquire a new lease which is disjoint from the holder's. This lease information is used to calculate the logical commit time for transactions. Secondly, Sundial has a novel caching scheme that works together with logical leases. The scheme allows the local data server to automatically cache data from the remote server while preserving data coherence. We benchmarked Sundial along with state-of-the-art distributed transactional concurrency control protocols. On YCSB, Sundial outperforms the second best protocol by 57% under high data access contention. On TPC-C, Sundial has a 34% improvement over the state-of-the-art candidate. Our caching scheme has performance gain comparable with hand-optimized data replication. With high access skew, it speeds the workload by up to 4.6 x.
"This work was supported (in part) by the U.S. National Science Foundation (CCF-1438955)"
by Yu Xia.
S.M.
S.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science

Style APA, Harvard, Vancouver, ISO itp.

31

Bernabéu-Aubán, José Manuel. "Location finding algorithms for distributed systems". Diss., Georgia Institute of Technology, 1988. http://hdl.handle.net/1853/32951.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

32

Bennett, John K. "Distributed Smalltalk : inheritance and reactiveness in distributed systems /". Thesis, Connect to this title online; UW restricted, 1988. http://hdl.handle.net/1773/6923.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

33

Cannalire, Pietro. "Geo-distributed multi-layer stream aggregation". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230217.

Pełny tekst źródła

Streszczenie:

The standard processing architectures are enough to satisfy a lot of applications by employing already existing stream processing frameworks which are able to manage distributed data processing. In some specific cases, having geographically distributed data sources requires to distribute even more the processing over a large area by employing a geographically distributed architecture.‌ The issue addressed in this work is the reduction of data movement across the network which is continuously flowing in a geo-distributed architecture from streaming sources to the processing location and among processing entities within the same distributed cluster. Reduction of data movement can be critical for decreasing bandwidth costs since accessing links placed in the middle of the network can be costly and can increase as the amount of data exchanges increase. In this work we want to create a different concept to deploy geographically distributed architectures by relying on Apache Spark Structured Streaming and Apache Kafka. The features needed for an algorithm to run on a geo-distributed architecture are provided. The algorithms to be executed on this architecture apply the windowing and the data synopses techniques to produce a summaries of the input data and to address issues of the geographically distributed architecture. The computation of the average and the Misra-Gries algorithm are then implemented to test the designed architecture. This thesis work contributes in providing a new model of building geographically distributed architecture. The experimental results show that, for the algorithms running on top of the geo distributed architecture, the computation time is reduced on average by 70% compared to the distributed setup. Similarly, and the amount of data exchanged across the network is reduced on average by 99%, compared to the distributed setup.
Standardbehandlingsarkitekturer är tillräckligt för uppfylla behoven av många tillämpningar genom användning av befintliga ramverk för flödesbehandling med stöd för distribuerad databehandling. I specifika fall kan geografiskt fördelade datakällor kräva att databehandlingen fördelas över ett stort område med hjälp av en geografiskt distribuerad arkitektur. Problemet som behandlas i detta arbete är minskningen av kontinuerlig dataöverföring i ett nätverk med geo-distribuerad arkitektur. Minskad dataöverföring kan vara avgörande för minskade bandbreddskonstnader då åtkomst av länkar placerade i mitten av ett nätverk kan vara dyrt och öka ytterligare med tilltagande dataöverföring. I det här arbetet vill vi skapa ett nytt koncept för att upprätta geografiskt distribuerade arkitekturer med hjälp av Apache Spark Structured Streaming och Apache Kafka. Funktioner och förutsättningar som behövs för att en algoritm ska kunna köras på en geografisk distribuerad arkitektur tillhandahålls. Algoritmerna som ska köras på denna arkitektur tillämpar “windowing synopsing” och “data synopses”-tekniker för att framställa en sammanfattning av ingående data samt behandla problem beträffande den geografiskt fördelade arkitekturen. Beräkning av medelvärdet och Misra-Gries-algoritmen implementeras för att testa den konstruerade arkitekturen. Denna avhandling bidrar till att förse ny modell för att bygga geografiskt distribuerad arkitektur. Experimentella resultat visar att beräkningstiden reduceras i genomsnitt 70% för de algoritmer som körs ovanför den geo-distribuerade arkitekturen jämfört med den distribuerade konfigurationen. På liknande sätt reduceras mängden data som utväxlas över nätverket med 99% i snitt jämfört med den distribuerade inställningen.

Style APA, Harvard, Vancouver, ISO itp.

34

Gater, Christian. "Fault-tolerant distributed measurement systems". Thesis, University of Edinburgh, 1987. http://hdl.handle.net/1842/16990.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

35

Laitala, J. (Joni). "Metadata management in distributed file systems". Bachelor's thesis, University of Oulu, 2017. http://urn.fi/URN:NBN:fi:oulu-201709092881.

Pełny tekst źródła

Streszczenie:

The purpose of this research has been to study the architectures of popular distributed file systems used in cloud computing, with a focus on their metadata management, in order to identify differences between and issues within varying designs from the metadata perspective. File system and metadata concepts are briefly introduced before the comparisons are made.

Style APA, Harvard, Vancouver, ISO itp.

36

Khalidi, M. Yousef Amin. "Hardware support for distributed object-based systems". Diss., Georgia Institute of Technology, 1989. http://hdl.handle.net/1853/8192.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

37

張立新 i Lap-sun Cheung. "Load balancing in distributed object computing systems". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31224179.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

38

VASCONCELOS, RAFAEL OLIVEIRA. "AN EFFICIENT APPROACH TO COORDINATED RECONFIGURATION IN DISTRIBUTED DATA STREAM SYSTEMS". PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2017. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=30660@1.

Pełny tekst źródła

Streszczenie:

PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
Ao mesmo tempo em que sistemas de processamento de fluxo de dados devem prover serviços de análise e manipulação de dados ininterruptamente (disponibilidade 24x7), eles comumente também precisam lidar com mudanças em seus ambientes de execução (e.g., alterar a topologia da rede) e nos requisitos que eles devem cumprir (e.g., adição de novas funções de processamento dos fluxos de dados). Por um lado, reconfiguração dinâmica de software (i.e., a capacidade de substituir parte do software em tempo de execução) é uma característica desejável. Por outro lado, sistemas de fluxo de dados podem sofrer com a interrupção e sobrecarga causada pela reconfiguração. Por conta da necessidade de reconfigurar (i.e., evoluir) o sistema ao mesmo tempo em que o sistema não pode ser interrompido (i.e., bloqueado), reconfiguração consistente e não bloqueante é ainda considerada um problema em aberto na literatura. Esta tese apresenta e valida uma abordagem não quiescente para reconfiguração dinâmica de software que preserva a consistência de sistemas de fluxo de dados distribuídos. A abordagem proposta permite que o sistema seja reconfigurado gradual e suavemente, sem precisar interromper o processamento do fluxo de dados ou atingir a quiescência. A avaliação indica que a abordagem proposta realiza reconfiguração distribuída consistentemente e tem um impacto desprezível sobre a diminuição na disponibilidade e no desempenho do sistema. Além disto, a implementação da abordagem proposta teve um desempenho melhor em todos os testes comparativos.
While many data stream systems have to provide continuous (24x7) services with no acceptable downtime, they also have to cope with changes in their execution environments and in the requirements that they must comply (e.g., moving from on-premises architecture to a cloud system, changing the network technology, adding new functionality or modifying existing parts). On one hand, dynamic software reconfiguration (i.e., the capability of evolving on the fly) is a desirable feature. On the other hand, stream systems may suffer from the disruption and overhead caused by the reconfiguration. Due to the necessity of reconfiguring (i.e., evolving) the system whilst the system must not be disrupted (i.e., blocked), consistent and non-disruptive reconfiguration is still considered an open problem. This thesis presents and validates a non-quiescent approach for dynamic software reconfiguration that preserves the consistency of distributed data stream processing systems. Unlike many works that require the system to reach a safe state (e.g., quiescence) before performing a reconfiguration, the proposed approach enables the system to smoothly evolve (i.e., be reconfigured) in a non-disruptive way without reaching quiescence. The evaluation indicates that the proposed approach supports consistent distributed reconfiguration and has negligible impact on availability and performance. Furthermore, the implementation of the proposed approach showed better performance results in all experiments than the quiescent approach and Upstart.

Style APA, Harvard, Vancouver, ISO itp.

39

Reale, Andrea <1986&gt. "Quality of Service in Distributed Stream Processing for large scale Smart Pervasive Environments". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amsdottorato.unibo.it/6390/1/main.pdf.

Pełny tekst źródła

Streszczenie:

The wide diffusion of cheap, small, and portable sensors integrated in an unprecedented large variety of devices and the availability of almost ubiquitous Internet connectivity make it possible to collect an unprecedented amount of real time information about the environment we live in. These data streams, if properly and timely analyzed, can be exploited to build new intelligent and pervasive services that have the potential of improving people's quality of life in a variety of cross concerning domains such as entertainment, health-care, or energy management. The large heterogeneity of application domains, however, calls for a middleware-level infrastructure that can effectively support their different quality requirements. In this thesis we study the challenges related to the provisioning of differentiated quality-of-service (QoS) during the processing of data streams produced in pervasive environments. We analyze the trade-offs between guaranteed quality, cost, and scalability in streams distribution and processing by surveying existing state-of-the-art solutions and identifying and exploring their weaknesses. We propose an original model for QoS-centric distributed stream processing in data centers and we present Quasit, its prototype implementation offering a scalable and extensible platform that can be used by researchers to implement and validate novel QoS-enforcement mechanisms. To support our study, we also explore an original class of weaker quality guarantees that can reduce costs when application semantics do not require strict quality enforcement. We validate the effectiveness of this idea in a practical use-case scenario that investigates partial fault-tolerance policies in stream processing by performing a large experimental study on the prototype of our novel LAAR dynamic replication technique. Our modeling, prototyping, and experimental work demonstrates that, by providing data distribution and processing middleware with application-level knowledge of the different quality requirements associated to different pervasive data flows, it is possible to improve system scalability while reducing costs.

Style APA, Harvard, Vancouver, ISO itp.

40

Reale, Andrea <1986&gt. "Quality of Service in Distributed Stream Processing for large scale Smart Pervasive Environments". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amsdottorato.unibo.it/6390/.

Pełny tekst źródła

Streszczenie:

The wide diffusion of cheap, small, and portable sensors integrated in an unprecedented large variety of devices and the availability of almost ubiquitous Internet connectivity make it possible to collect an unprecedented amount of real time information about the environment we live in. These data streams, if properly and timely analyzed, can be exploited to build new intelligent and pervasive services that have the potential of improving people's quality of life in a variety of cross concerning domains such as entertainment, health-care, or energy management. The large heterogeneity of application domains, however, calls for a middleware-level infrastructure that can effectively support their different quality requirements. In this thesis we study the challenges related to the provisioning of differentiated quality-of-service (QoS) during the processing of data streams produced in pervasive environments. We analyze the trade-offs between guaranteed quality, cost, and scalability in streams distribution and processing by surveying existing state-of-the-art solutions and identifying and exploring their weaknesses. We propose an original model for QoS-centric distributed stream processing in data centers and we present Quasit, its prototype implementation offering a scalable and extensible platform that can be used by researchers to implement and validate novel QoS-enforcement mechanisms. To support our study, we also explore an original class of weaker quality guarantees that can reduce costs when application semantics do not require strict quality enforcement. We validate the effectiveness of this idea in a practical use-case scenario that investigates partial fault-tolerance policies in stream processing by performing a large experimental study on the prototype of our novel LAAR dynamic replication technique. Our modeling, prototyping, and experimental work demonstrates that, by providing data distribution and processing middleware with application-level knowledge of the different quality requirements associated to different pervasive data flows, it is possible to improve system scalability while reducing costs.

Style APA, Harvard, Vancouver, ISO itp.

41

Kotto, Kombi Roland. "Distributed query processing over fluctuating streams". Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEI050/document.

Pełny tekst źródła

Streszczenie:

Le traitement de flux de données est au cœur des problématiques actuelles liées au Big Data. Face à de grandes quantités de données (Volume) accessibles de manière éphémère (Vélocité), des solutions spécifiques tels que les systèmes de gestion de flux de données (SGFD) ont été développés. Ces SGFD reçoivent des flux et des requêtes continues pour générer de nouveaux résultats aussi longtemps que des données arrivent en entrée. Dans le contexte de cette thèse, qui s’est réalisée dans le cadre du projet ANR Socioplug (ANR-13-INFR-0003), nous considérons une plateforme collaborative de traitement de flux de données à débit variant en termes de volume et de distribution des valeurs. Chaque utilisateur peut soumettre des requêtes continues et contribue aux ressources de traitement de la plateforme. Cependant, chaque unité de traitement traitant les requêtes dispose de ressources limitées ce qui peut engendrer la congestion du système en fonction des variations des flux en entrée. Le problème est alors de savoir comment adapter dynamiquement les ressources utilisées par chaque requête continue par rapport aux besoins de traitement. Cela soulève plusieurs défis : i) comment détecter un besoin de reconfiguration ? ii) quand reconfigurer le système pour éviter sa congestion ? Durant ces travaux de thèse, nous nous sommes intéressés à la gestion automatique de la parallélisation des opérateurs composant une requête continue. Nous proposons une approche originale basée sur une estimation des besoins de traitement dans un futur proche. Ainsi, nous pouvons adapter le niveau de parallélisme des opérateurs de manière proactive afin d’ajuster les ressources utilisées aux besoins des traitements. Nous montrons qu’il est possible d’éviter la congestion du système mais également de réduire significativement la consommation de ressources à performance équivalente. Ces différents travaux ont été implémentés et validés dans un SGFD largement utilisé avec différents jeux de tests reproductibles
In a Big Data context, stream processing has become a very active research domain. In order to manage ephemeral data (Velocity) arriving at important rates (Volume), some specific solutions, denoted data stream management systems (DSMSs),have been developed. DSMSs take as inputs some queries, called continuous queries,defined on a set of data streams. Acontinuous query generates new results as long as new data arrive in input. In many application domains, data streams haveinput rates and distribution of values which change over time. These variations may impact significantly processingrequirements for each continuous query.This thesis takes place in the ANR project Socioplug (ANR-13-INFR-0003). In this context, we consider a collaborative platformfor stream processing. Each user can submit multiple continuous queries and contributes to the execution support of theplatform. However, as each processing unit supporting treatments has limited resources in terms of CPU and memory, asignificant increase in input rate may cause the congestion of the system. The problem is then how to adjust dynamicallyresource usage to processing requirements for each continuous query ? It raises several challenges : i) how to detect a need ofreconfiguration ? ii) when reconfiguring the system to avoid its congestion at runtime ?In this work, we are interested by the different processing steps involved in the treatment of a continuous query over adistributed infrastructure. From this global analysis, we extract mechanisms enabling dynamic adaptation of resource usage foreach continuous query. We focus on automatic parallelization, or auto-parallelization, of operators composing the executionplan of a continuous query. We suggest an original approach based on the monitoring of operators and an estimation ofprocessing requirements in near future. Thus, we can increase (scale-out), or decrease (scale-in) the parallelism degree ofoperators in a proactive many such as resource usage fits to processing requirements dynamically. Compared to a staticconfiguration defined by an expert, we show that it is possible to avoid the congestion of the system in many cases or to delay itin most critical cases. Moreover, we show that resource usage can be reduced significantly while delivering equivalentthroughput and result quality. We suggest also to combine this approach with complementary mechanisms for dynamic adaptation of continuous queries at runtime. These differents approaches have been implemented within a widely used DSMS and have been tested over multiple and reproductible micro-benchmarks

Style APA, Harvard, Vancouver, ISO itp.

42

Kohli, Prince. "User-level state sharing in distributed systems". Diss., Georgia Institute of Technology, 1996. http://hdl.handle.net/1853/9170.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

43

Brito, Andrey. "Speculation in Parallel and Distributed Event Processing Systems". Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2010. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-38911.

Pełny tekst źródła

Streszczenie:

Event stream processing (ESP) applications enable the real-time processing of continuous flows of data. Algorithmic trading, network monitoring, and processing data from sensor networks are good examples of applications that traditionally rely upon ESP systems. In addition, technological advances are resulting in an increasing number of devices that are network enabled, producing information that can be automatically collected and processed. This increasing availability of on-line data motivates the development of new and more sophisticated applications that require low-latency processing of large volumes of data. ESP applications are composed of an acyclic graph of operators that is traversed by the data. Inside each operator, the events can be transformed, aggregated, enriched, or filtered out. Some of these operations depend only on the current input events, such operations are called stateless. Other operations, however, depend not only on the current event, but also on a state built during the processing of previous events. Such operations are, therefore, named stateful. As the number of ESP applications grows, there are increasingly strong requirements, which are often difficult to satisfy. In this dissertation, we address two challenges created by the use of stateful operations in a ESP application: (i) stateful operators can be bottlenecks because they are sensitive to the order of events and cannot be trivially parallelized by replication; and (ii), if failures are to be tolerated, the accumulated state of an stateful operator needs to be saved, saving this state traditionally imposes considerable performance costs. Our approach is to evaluate the use of speculation to address these two issues. For handling ordering and parallelization issues in a stateful operator, we propose a speculative approach that both reduces latency when the operator must wait for the correct ordering of the events and improves throughput when the operation in hand is parallelizable. In addition, our approach does not require that user understand concurrent programming or that he or she needs to consider out-of-order execution when writing the operations. For fault-tolerant applications, traditional approaches have imposed prohibitive performance costs due to pessimistic schemes. We extend such approaches, using speculation to mask the cost of fault tolerance.

Style APA, Harvard, Vancouver, ISO itp.

44

Bhasker, Bharat. "Query processing in heterogeneous distributed database management systems". Diss., Virginia Tech, 1992. http://hdl.handle.net/10919/39437.

Pełny tekst źródła

Streszczenie:

The goal of this work is to present an advanced query processing algorithm formulated and developed in support of heterogeneous distributed database management systems. Heterogeneous distributed database management systems view the integrated data through an uniform global schema. The query processing algorithm described here produces an inexpensive strategy for a query expressed over the global schema. The research addresses the following aspects of query processing: (1) Formulation of a low level query language to express the fundamental heterogeneous database operations; (2) Translation of the query expressed over the global schema to an equivalent query expressed over a conceptual schema; (3) An estimation methodology to derive the intermediate result sizes of the database operations; (4) A query decomposition algorithm to generate an efficient sequence of the basic database operations to answer the query. This research addressed the first issue by developing an algebraic query language called cluster algebra. The cluster algebra consists of the following operations: (a) Selection, union, intersection and difference, which are extensions of their relational algebraic counterparts to heterogeneous databases; (b) Normal-join and normal-projection which replace their counterparts, join and projection, in the relational algebra; (c) Two new operators embed and unembed to restructure the database schema. The second issue of the query translation was addressed by development of an algorithm that translates a cluster algebra query expressed over the virtual views to an equivalent cluster algebra query expressed over the conceptual databases. A non-parametric estimation methodology to estimate the result size of a cluster algebra operation was developed to address the third issue described above. Finally, this research developed a query decomposition algorithm, applicable to the relational and non-relational databases, that decomposes a query by computing all profitable semi-join operations, followed by the determination of the best sequence of join operations per processing site. The join optimization is performed by formulating a zero-one integer linear program that uses the non-parametric estimation technique to compute the sizes of intermediate results. The query processing algorithm was implemented in the context of DAVID, a heterogeneous distributed database management system.
Ph. D.

Style APA, Harvard, Vancouver, ISO itp.

45

Elmagarmid, Ahmed Khalifa. "Deadlock detection and resolution in distributed processing systems /". The Ohio State University, 1985. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487261919110166.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

46

DI, SALVO ANDREA. "CMOS distributed signal processing systems for radiation sensors". Doctoral thesis, Politecnico di Torino, 2022. http://hdl.handle.net/11583/2957742.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

47

Ebrahimian, Mohammad Reza. "Power system operations : state estimation distributed processing /". Digital version accessible at:, 1999. http://wwwlib.umi.com/cr/utexas/main.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

48

Peiro, Sajjad Hooman. "Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers". Licentiate thesis, KTH, Programvaruteknik och Datorsystem, SCS, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-193582.

Pełny tekst źródła

Streszczenie:

In this thesis, our goal is to enable and achieve effective and efficient real-time stream processing in a geo-distributed infrastructure, by combining the power of central data centers and micro data centers. Our research focus is to address the challenges of distributing the stream processing applications and placing them closer to data sources and sinks. We enable applications to run in a geo-distributed setting and provide solutions for the network-aware placement of distributed stream processing applications across geo-distributed infrastructures. First, we evaluate Apache Storm, a widely used open-source distributed stream processing system, in the community network Cloud, as an example of a geo-distributed infrastructure. Our evaluation exposes new requirements for stream processing systems to function in a geo-distributed infrastructure. Second, we propose a solution to facilitate the optimal placement of the stream processing components on geo-distributed infrastructures. We present a novel method for partitioning a geo-distributed infrastructure into a set of computing clusters, each called a micro data center. According to our results, we can increase the minimum available bandwidth in the network and likewise, reduce the average latency to less than 50%. Next, we propose a parallel and distributed graph partitioner, called HoVerCut, for fast partitioning of streaming graphs. Since a lot of data can be presented in the form of graph, graph partitioning can be used to assign the graph elements to different data centers to provide data locality for efficient processing. Last, we provide an approach, called SpanEdge that enables stream processing systems to work on a geo-distributed infrastructure. SpenEdge unifies stream processing over the central and near-the-edge data centers (micro data centers). As a proof of concept, we implement SpanEdge by extending Apache Storm that enables it to run across multiple data centers.

QC 20161005

Style APA, Harvard, Vancouver, ISO itp.

49

Sun, Yudong. "A distributed object model for solving irregularly structured problems on distributed systems /". Hong Kong : University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record.jsp?B23501662.

Pełny tekst źródła

Style APA, Harvard, Vancouver, ISO itp.

50

Martin, André. "Minimizing Overhead for Fault Tolerance in Event Stream Processing Systems". Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-210251.

Pełny tekst źródła

Streszczenie:

Event Stream Processing (ESP) is a well-established approach for low-latency data processing enabling users to quickly react to relevant situations in soft real-time. In order to cope with the sheer amount of data being generated each day and to cope with fluctuating workloads originating from data sources such as Twitter and Facebook, such systems must be highly scalable and elastic. Hence, ESP systems are typically long running applications deployed on several hundreds of nodes in either dedicated data-centers or cloud environments such as Amazon EC2. In such environments, nodes are likely to fail due to software aging, process or hardware errors whereas the unbounded stream of data asks for continuous processing. In order to cope with node failures, several fault tolerance approaches have been proposed in literature. Active replication and rollback recovery-based on checkpointing and in-memory logging (upstream backup) are two commonly used approaches in order to cope with such failures in the context of ESP systems. However, these approaches suffer either from a high resource footprint, low throughput or unresponsiveness due to long recovery times. Moreover, in order to recover applications in a precise manner using exactly once semantics, the use of deterministic execution is required which adds another layer of complexity and overhead. The goal of this thesis is to lower the overhead for fault tolerance in ESP systems. We first present StreamMine3G, our ESP system we built entirely from scratch in order to study and evaluate novel approaches for fault tolerance and elasticity. We then present an approach to reduce the overhead of deterministic execution by using a weak, epoch-based rather than strict ordering scheme for commutative and tumbling windowed operators that allows applications to recover precisely using active or passive replication. Since most applications are running in cloud environments nowadays, we furthermore propose an approach to increase the system availability by efficiently utilizing spare but paid resources for fault tolerance. Finally, in order to free users from the burden of choosing the correct fault tolerance scheme for their applications that guarantees the desired recovery time while still saving resources, we present a controller-based approach that adapts fault tolerance at runtime. We furthermore showcase the applicability of our StreamMine3G approach using real world applications and examples.

Style APA, Harvard, Vancouver, ISO itp.

Rozprawy doktorskie na temat „Distributed Stream Processing Systems”

Utwórz poprawne odniesienie w stylach APA, MLA, Chicago, Harvard i wielu innych