Dissertations / Theses: 'Large Scale Processing'

1

Kutlu, Mucahid. "Parallel Processing of Large Scale Genomic Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436355132.

2

Caneill, Matthieu. "Contributions to large-scale data processing systems." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM006/document.

Abstract:

This thesis covers the topic of large-scale data processing systems, and more precisely three complementary approaches: the design of a system that predicts server failures through the analysis of monitoring data; the routing of data in a real-time system, using correlations between message fields to favor locality; and finally a novel framework to design data transformations using directed graphs of blocks.

Through the Smart Support Center project, we design a scalable architecture to store time series reported by monitoring engines, which constantly check the health of computer systems. We use this data to perform predictions and detect potential problems before they arise.

We then dive into routing algorithms for stream processing systems, and develop a layer to route messages more efficiently by avoiding hops between machines. For that purpose, we identify in real time the correlations that appear between the fields of these messages, such as hashtags and their geolocation, for example in the case of tweets. We use these correlations to create routing tables that favor the co-location of actors handling these messages.

Finally, we present λ-blocks, a novel programming framework to compute data processing jobs without writing code, but rather by creating graphs of blocks of code. The framework is fast, and comes with batteries included: block libraries, plugins, and APIs to extend it. It is also able to manipulate computation graphs, for optimization, analysis, verification, or any other purpose.

3

Wang, Liqiang. "An Efficient Platform for Large-Scale MapReduce Processing." ScholarWorks@UNO, 2009. http://scholarworks.uno.edu/td/963.

Abstract:

In this thesis we proposed and implemented MMR, a new open-source MapReduce model with MPI for parallel and distributed programming. MMR combines Pthreads, MPI, and Google's MapReduce processing model to support multi-threaded as well as distributed parallelism. Experiments show that our model significantly outperforms the leading open-source solution, Hadoop. It demonstrates linear scaling for CPU-intensive processing and even super-linear scaling for indexing-related workloads. In addition, we designed an MMR live DVD, which facilitates the automatic installation and configuration of a Linux cluster with an integrated MMR library that enables the development and execution of MMR applications.

4

Larsson, Carl-Johan. "Movie Recommendation System Using Large Scale Graph-Processing." Thesis, KTH, Skolan för elektro- och systemteknik (EES), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-200601.

5

Gardner, Tara Conti. "Delipidation Treatments for Large-Scale Protein Purification Processing." Thesis, Virginia Tech, 1998. http://hdl.handle.net/10919/36512.

Abstract:

Triglycerides are the majority lipid component of most biochemical mixtures and are virtually water insoluble. Lipid removal is desired prior to protein purification processing to decrease nonspecific fouling of downstream chromatographic matrices. Transgenic pig milk was used as a model system to study delipidation from therapeutic protein sources. The majority of triglycerides were extracted from stable lipid micelles and removed with a method that can be incorporated in downstream protein purification processing without denaturing the target protein. An efficient delipidation treatment used TNBP, a non-polar solvent, to extract lipid micelles and then phase transfer milk lipids into a TNBP-swelled dextran particulate. A batch incubation of a whey/TNBP mixture with pre-swollen Sephadex LH-20 or hydroxyalkoxypropyl dextran (HAPD) beads at 4 °C for 24 hours removed 67 ± 2% (0.645 mg triglycerides/ml Sephadex LH-20) and 71 ± 1% (0.628 mg triglycerides/ml HAPD) of the triglycerides present in the skimmed transgenic whey, respectively. Fully swollen beads removed 20% more triglycerides than beads which were wetted but not swollen in TNBP, indicating that a larger phase volume and internal adsorption of the lipids onto the Sephadex matrix dominate over surface adsorption. Polyclonal ELISAs indicated that 89 ± 6% of the recombinant human Protein C was still present in the transgenic whey after this delipidation treatment, indicating that this treatment did not denature or harm the target protein.
Master of Science

6

Wang, Jiayin. "Building Efficient Large-Scale Big Data Processing Platforms." Thesis, University of Massachusetts Boston, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10262281.

Abstract:

In the era of big data, many cluster platforms and resource management schemes have been created to satisfy the increasing demand for processing large volumes of data. A typical big data processing job consists of multiple stages, and each stage represents a generally defined data operation such as filtering or sorting. To parallelize job execution in a cluster, each stage includes a number of identical tasks that can be launched concurrently on multiple servers. Practical clusters often involve hundreds or thousands of servers processing a large batch of jobs. Resource management, which governs cluster resource allocation and job execution, is critical for system performance.

Generally speaking, there are three main challenges in resource management for the new big data processing systems. First, when there are various pending tasks from different jobs and stages, it is difficult to determine which ones deserve priority in obtaining resources for execution, considering the tasks' different characteristics such as resource demand and execution time. Second, there are dependencies among the tasks that can run concurrently. For any two consecutive stages of a job, the output data of the former stage is the input data of the latter one. Resource management has to comply with such dependencies. The third challenge is the inconsistent performance of the cluster nodes. In practice, the run-time performance of every server varies. Resource management needs to dynamically adjust the resource allocation according to the performance changes of each server.

Resource management in existing platforms and prior work often relies on fixed user-specific configurations and assumes consistent performance on each node. The performance, however, is not satisfactory under various workloads. This dissertation aims to explore new approaches to improving the efficiency of large-scale big data processing platforms. In particular, run-time dynamic factors are carefully considered when the system allocates resources. New algorithms are developed to collect run-time data and predict the characteristics of jobs and the cluster. We further develop resource management schemes that dynamically tune the resource allocation for each stage of every running job in the cluster. New findings and techniques in this dissertation will certainly provide valuable and inspiring insights to other similar problems in the research community.

7

Clifford, Raphael. "Indexed strings for large scale genomic analysis." Thesis, Imperial College London, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.268368.

8

Schaeppi, Reto. "Large scale processing of microarray data: a diploma thesis." Zurich: Information and Communication Systems Research Group, Institute of Information Systems, Swiss Federal Institute of Technology, 2002. http://e-collection.ethbib.ethz.ch/show?type=dipl&nr=48.

9

Mesmoudi, Amin. "Declarative parallel query processing on large scale astronomical databases." Thesis, Lyon 1, 2015. http://www.theses.fr/2015LYO10326.

Abstract:

This work was carried out within the framework of the PetaSky project, whose objective is to provide a set of tools for managing tens of petabytes of data from astronomical observations. Our work is mainly concerned with the design of new systems that guarantee scalability. Our contributions cover three aspects: benchmarking existing systems, designing a new system, and optimizing that system. We first analyzed the ability of MapReduce-based systems that support SQL to manage the LSST data, and their capacity to optimize certain types of queries. We analyzed the impact of data partitioning, indexing, and compression on query performance. Our experiments show that there is no “magic” technique to partition, store, and index data; rather, the efficiency of dedicated techniques depends mainly on the type of queries and the typology of the data considered. Based on our benchmarking work, we identified some techniques to be integrated into large-scale data management systems. We designed a new system that supports multiple partitioning mechanisms and several evaluation operators. We used the BSP (Bulk Synchronous Parallel) model as a parallel computation paradigm. Unlike the MapReduce model, we send intermediate results to workers that can continue their processing. Data is logically represented as a graph. The evaluation of queries is performed by exploring the data graph using forward and backward edges. We also offer a semi-automatic partitioning approach, i.e., we provide the system administrator with a set of tools for choosing how to partition the data using the database schema and domain knowledge. The first experiments show that our approach provides a significant performance improvement over Map/Reduce systems.

10

Dreibelbis, Harold N., Dennis Kelsch, and Larry James. "REAL-TIME TELEMETRY DATA PROCESSING and LARGE SCALE PROCESSORS." International Foundation for Telemetering, 1991. http://hdl.handle.net/10150/612912.

Abstract:

International Telemetering Conference Proceedings / November 04-07, 1991 / Riviera Hotel and Convention Center, Las Vegas, Nevada
Real-time processing of telemetry data has evolved from a highly centralized, single large-scale computer system to multiple mini-computers or super mini-computers tied together in a loosely coupled distributed network, with each mini-computer or super mini-computer essentially performing a single function in the real-time processing sequence of events. The reasons for this evolution are many and varied. This paper will review some of the more significant factors in that evolution and will present some alternatives to a fully distributed mini-computer network that appear to offer significant real-time data processing advantages.

11

Garcia, Bernal Daniel. "Decentralizing Large-Scale Natural Language Processing with Federated Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278822.

Abstract:

Natural Language Processing (NLP) has been one of the most popular and visible forms of Artificial Intelligence in recent years. This is partly because it has to do with a common characteristic of human beings: language. NLP applications make it possible to create new services in the industrial sector in order to offer new solutions and provide significant productivity gains. All of this has happened thanks to the rapid progress of Deep Learning models. Large-scale contextual representation models, such as Word2Vec, ELMo and BERT, have significantly advanced NLP in recent years. With these latest NLP models, it is possible to understand the semantics of text to a degree never seen before. However, they require large amounts of text data to achieve high-quality results. This data can be gathered from different sources, but one of the main collection points is devices such as smartphones, smart appliances and smart sensors. Unfortunately, joining and accessing all this data from multiple sources is extremely challenging for privacy and regulatory reasons. New protocols and techniques have been developed to overcome this limitation by training models in a massively distributed manner, taking advantage of the computing power of the devices that generate the data. In particular, this research aims to test the viability of training NLP models, specifically Word2Vec, with a massively distributed protocol like Federated Learning. The results show that Federated Word2Vec works as well as Word2Vec in most of the scenarios, even surpassing it in some semantic benchmark tasks. It is a novel area of research, where few studies have been conducted, with a large knowledge gap to fill in future research.

12

Alkathiri, Abdul Aziz. "Decentralized Large-Scale Natural Language Processing Using Gossip Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281277.

Abstract:

The field of Natural Language Processing (NLP) in machine learning has seen rising popularity and use in recent years. The nature of NLP, which deals with natural human language and computers, has led to the research and development of many algorithms that produce word embeddings. One of the most widely used of these algorithms is Word2Vec. With the abundance of data generated by users and organizations and the complexity of machine learning and deep learning models, performing training on a single machine becomes infeasible. Advances in distributed machine learning offer a solution to this problem. Unfortunately, for reasons concerning data privacy and regulations, in some real-life scenarios the data must not leave its local machine. This limitation has led to the development of techniques and protocols that are massively parallel and data-private. The most popular of these protocols is federated learning. However, due to its centralized nature, it still poses some security and robustness risks. Consequently, this led to the development of massively parallel, data-private, decentralized approaches, such as gossip learning. In the gossip learning protocol, every once in a while each node in the network randomly chooses a peer for information exchange, which eliminates the need for a central node. This research intends to test the viability of gossip learning for large-scale, real-world applications. In particular, it focuses on the implementation and evaluation of an NLP application using gossip learning. The results show that applying Word2Vec in a gossip learning framework is viable and yields results comparable to its non-distributed, centralized counterpart for various scenarios, with an average quality loss of 6.904%.
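
The peer-exchange step described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the model is reduced to a plain list of floats, the merge rule (both peers keep the element-wise average) is an assumption, and the local training step that a real gossip learning node would run between exchanges is omitted. The names `gossip_round` and `run_gossip` are hypothetical.

```python
import random


def gossip_round(models, rng):
    """One gossip round: every node exchanges with one randomly chosen peer.

    `models` is a list of parameter vectors, one per node. There is no
    central node; each exchange involves exactly two peers.
    """
    n = len(models)
    for node in range(n):
        # Pick a peer uniformly at random among the other nodes.
        peer = rng.choice([p for p in range(n) if p != node])
        # Merge step (assumption): both peers keep the element-wise average.
        merged = [(a + b) / 2.0 for a, b in zip(models[node], models[peer])]
        models[node] = merged
        models[peer] = list(merged)


def run_gossip(models, rounds, seed=0):
    """Run several gossip rounds on a copy of `models` and return the result."""
    rng = random.Random(seed)
    models = [list(m) for m in models]
    for _ in range(rounds):
        gossip_round(models, rng)
    return models


if __name__ == "__main__":
    start = [[0.0, 4.0], [2.0, 0.0], [4.0, 8.0], [6.0, 4.0]]
    for model in run_gossip(start, rounds=60):
        print(model)
```

Because every exchange is a symmetric average, the mean of the vectors is preserved, so repeated rounds pull all nodes toward the average of the starting models without any coordinator.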

13

Andersson-Sunna, Josefin. "Large Scale Privacy-Centric Data Collection, Processing, and Presentation." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-84930.

Abstract:

Collecting statistical data from online sources has become an important part of business development. Information about users and how they interact with an online source can help improve the user experience and increase product sales. Collecting data about users has many benefits for the business owner, but it also raises privacy issues, since more and more information about users is spread over the internet. Tools that collect statistical data from online sources exist, but using such tools gives away control over the collected data. If a business implements its own analytics system, it is easier to make it more privacy-centric, and control over the collected data is kept. This thesis examines which techniques are most suitable for a system whose purpose is to collect, store, process, and present large-scale privacy-centric data. Research was conducted on which technique to use for collecting data and how to keep track of unique users in a privacy-centric way, as well as on which database can handle many write requests and store large-scale data. A prototype was implemented based on this research, in which JavaScript tagging is used to collect data from several online sources and cookies are used to keep track of unique users. Cassandra was chosen as the database for the prototype because of its high scalability and its speed on write requests. Two versions of the processing of raw data into statistical reports were implemented in order to evaluate whether the data should be preprocessed or whether the reports could be created when the user asks for them. To evaluate the techniques used in the prototype, load tests were run; the results showed that a bottleneck was reached after 45 seconds at a workload of 600 write requests per second. The tests also showed that the prototype maintained its performance at a workload of 500 write requests per second for one hour, during which it completed 1 799 953 requests. Latency tests of processing raw data into statistical reports were also run to evaluate whether the data should be preprocessed or processed when the user asks for the report. The results showed that it took around 30 seconds to process 1 200 000 rows of data from the database, which is too long for a user to wait for a report. An investigation into which part of the processing contributed most to the latency showed that it was the retrieval of data from the database: it took around 25 seconds to retrieve the data and only around 5 seconds to process it into statistical reports. The tests showed that Cassandra is slow when retrieving many rows of data but fast when writing data, which is more important in this prototype.

14

Ali, Nasar A. "Thermomechanical processing of 34CrNiMo6 steel for large scale forging." Thesis, University of Sheffield, 2014. http://etheses.whiterose.ac.uk/7102/.

Abstract:

This work simulated the thermo-mechanical processing of a large-scale forged product made of 34CrNiMo6 steel to evaluate the effect of different processing parameters and cooling rates on the variation of the microstructure and the final mechanical properties. Through this investigation we tried to achieve the mechanical properties required for deep-sea applications, which were a minimum Charpy impact value of 38 J at a temperature of -20 °C according to ABS specifications and a minimum surface hardness of 302 HB according to the First Subsea design specification. Initially, a series of single and multi-hit plane strain compression tests were performed to evaluate the hot-deformed microstructure in thermo-mechanical processing, with particular attention paid to the effect of the austenitising temperature and the deformation conditions of temperature, strain and strain rate. The exponential law, power law and hyperbolic sine law types of Zener–Hollomon equations were utilised to calculate the hot activation energy of deformation (Qdef). In addition, the constitutive equations were used for modelling and generalising the DRV and DRX flow curves of 34CrNiMo6 steel, using the method proposed by Avrami. Secondly, a heat treatment process using different austenitising temperatures and different cooling rates was also investigated to achieve the required aims, in which many tests were performed by controlling the temperatures, soaking times, and cooling rates to study the effect of the heat treatment parameters on the grain size and transformation behaviour of austenite. Additionally, in an attempt to refine the austenite grain size and to increase the austenite phase percentage within the microstructure, multiple heat treatment paths were also used. Double normalizing, double quenching, and single tempering processes were used in all possible combinations to investigate their influence on the final microstructure, in an attempt to identify the most effective heat treatment cycle and sequence of heat treatment operations.
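
For orientation, the three Zener–Hollomon forms named in this abstract are commonly written as follows in the hot-deformation literature. This is the textbook formulation, not an equation taken from the thesis; the material constants $A$, $A_1$, $A_2$, $n$, $n_1$, $\alpha$ and $\beta$ are not given in the abstract and are shown only as symbols, with $\dot{\varepsilon}$ the strain rate, $\sigma$ the flow stress, $T$ the absolute temperature and $R$ the gas constant.

```latex
Z = \dot{\varepsilon}\,\exp\!\left(\frac{Q_{\mathrm{def}}}{RT}\right)
  = \begin{cases}
      A_1\,\sigma^{n_1} & \text{(power law)}\\[2pt]
      A_2\,\exp(\beta\sigma) & \text{(exponential law)}\\[2pt]
      A\,\bigl[\sinh(\alpha\sigma)\bigr]^{n} & \text{(hyperbolic sine law)}
    \end{cases}
```

Fitting any one of these forms to flow-stress data measured at several temperatures and strain rates yields an estimate of $Q_{\mathrm{def}}$.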

15

Stein, Oliver. "Intelligent Resource Management for Large-scale Data Stream Processing." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-391927.

Abstract:

With the increasing trend of using cloud computing resources, the efficient utilization of these resources becomes more and more important. Working with data stream processing is a paradigm gaining in popularity, with tools such as Apache Spark Streaming or Kafka widely available, and companies are shifting towards real-time monitoring of data such as sensor networks, financial data or anomaly detection. However, it is difficult for users to efficiently make use of cloud computing resources and studies show that a lot of energy and compute hardware is wasted. We propose an approach to optimizing resource usage in cloud computing environments designed for data stream processing frameworks, based on bin packing algorithms. Test results show that the resource usage is substantially improved as a result, with future improvements suggested to further increase this. The solution was implemented as an extension of the HarmonicIO data stream processing framework and evaluated through simulated workloads.
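
The abstract says the approach is based on bin packing algorithms but does not name one. As an illustration only, the sketch below uses first-fit decreasing, a common heuristic, to place container resource demands on as few workers as it can. The function name `first_fit_decreasing`, the integer "percent of one worker" units and the sample demands are assumptions, not details from the thesis or from HarmonicIO.

```python
def first_fit_decreasing(demands, capacity):
    """Pack resource demands onto workers using first-fit decreasing.

    `demands` are per-container resource needs and `capacity` is the
    size of one worker, both in the same integer units (for example,
    percent of a worker's CPU). Returns one list of demands per worker.
    """
    if any(d > capacity for d in demands):
        raise ValueError("a demand exceeds the capacity of a single worker")
    bins = []  # demands placed on each worker
    free = []  # remaining capacity of each worker
    # Place the largest demands first; each goes on the first worker with room.
    for d in sorted(demands, reverse=True):
        for i, room in enumerate(free):
            if d <= room:
                bins[i].append(d)
                free[i] = room - d
                break
        else:
            # No existing worker has room, so open a new one.
            bins.append([d])
            free.append(capacity - d)
    return bins


if __name__ == "__main__":
    placement = first_fit_decreasing([50, 70, 50, 20, 40, 20, 50, 10], 100)
    for worker, load in enumerate(placement):
        print(worker, load, sum(load))
```

For the sample demands, which total 310 units, the heuristic uses four workers of capacity 100, which matches the lower bound for that load.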

16

Panneer, Selvan Vaina Malar. "Energy efficiency maximisation in large scale MIMO systems." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/16052.

Abstract:

The power usage of the communication technology industry and the associated energy-related pollution are becoming major societal and economic concerns. These concerns have stimulated intense activity in academia and industry in the new research area of green cellular networks. Bandwidth Efficiency (BE) is one of the most important metrics for selecting candidate technologies for next-generation wireless communication systems. Nevertheless, the key goal is to design the new network architectures and technologies needed to meet the explosive growth in cellular data demand without increasing power consumption. As a result, Energy Efficiency (EE) has become another significant metric for evaluating the performance of wireless communication systems. MIMO technology has drawn much attention in wireless communication, as it gives substantial increases in link range and throughput without additional bandwidth or transmit power. Multi-user MIMO (MU-MIMO) refers to the case where an evolved base station (BS) equipped with multiple antennas communicates with several user terminals (UEs) at the same time. MU-MIMO can improve either the reliability or the BE by improving the diversity gains or the multiplexing gains, respectively. A recently proposed idea in MU-MIMO is a system that uses hundreds of antennas to serve dozens of UEs simultaneously. This so-called Large Scale MIMO (LS MIMO) is regarded as a candidate technique for future wireless communication systems. An analysis is conducted to investigate the performance of the proposed uplink and downlink of LS MIMO systems with different linear processing techniques at the base station. The most common precoding and receive combining schemes are considered: minimum mean squared error (MMSE), maximum ratio transmission/combining (MRT/MRC), and zero-forcing (ZF) processing. 
The fundamental problem addressed is how to select the number of BS antennas M, the number of active UEs K, and the transmit power to cover a given area with maximal EE. The EE is defined as the number of bits transferred per Joule of energy. A new power consumption model is proposed to emphasise that the real power scales faster than linearly with M and K. The new power consumption model is used to derive closed-form EE-maximising values of the number of BS antennas, the number of active UEs and the transmit power, under the assumption that ZF processing is deployed in the uplink and downlink transmissions for analytic convenience. This analysis is then extended to the imperfect CSI case and to symmetric multi-cell scenarios. These expressions provide valuable design insights into the interaction between system parameters, propagation environment, and the different components of the power consumption model. Analytical results are obtained only for ZF with perfect channel state information (CSI), for which closed-form expressions are computed for the optimal number of UEs, number of BS antennas, and transmit power. Numerical results are provided (a) for all the investigated schemes with perfect CSI in a single-cell scenario; (b) for ZF with imperfect CSI in a multi-cell scenario. The simulation results show that (a) an LS MIMO system with 100-200 BS antennas maximises energy efficiency; (b) this number of BS antennas should serve a number of active UEs of the same order; (c) since the circuit power increases, the transmit power should increase with the number of BS antennas; (d) the radiated power per antenna is in the range of 10-100 mW and decreases with the number of BS antennas; (e) ZF processing provides the highest EE in all the scenarios owing to active interference suppression at affordable complexity. These results indicate that LS MIMO is a suitable technique for achieving high EE in future cellular networks.
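
The EE definition in this abstract, bits transferred per Joule, can be written as a ratio of throughput to consumed power. The sketch below uses assumed notation that does not come from the thesis: $R_{\Sigma}$ for the average sum throughput and $P_{\mathrm{tot}}$ for the total consumed power, both depending on the number of BS antennas $M$, the number of active UEs $K$, and the transmit power $p$.

```latex
\[
\mathrm{EE}(M, K, p)
  = \frac{R_{\Sigma}(M, K, p)\ \ [\text{bit/s}]}
         {P_{\mathrm{tot}}(M, K, p)\ \ [\text{W} = \text{J/s}]}
  \qquad [\text{bit/Joule}]
\]

\[
(M^{\star}, K^{\star}, p^{\star}) = \arg\max_{M,\,K,\,p}\ \mathrm{EE}(M, K, p)
\]
```

Because the abstract states that consumed power grows faster than linearly in $M$ and $K$, the denominator eventually outpaces the throughput gain from adding antennas, which is consistent with the finite optimum it reports.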

17

Riley, George F. "Techniques for large scale distributed simulations of computer networks." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/10010.

18

Le, Riguer E. M. J. "Generic VLSI architectures : chip designs for image processing applications." Thesis, Queen's University Belfast, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368593.

19

Singh, Babita 1986. "Large-scale study of RNA processing alterations in multiple cancers." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/572859.

Abstract:

RNA processing and its alterations are key to understanding normal and disease cell phenotypes. In particular, specific alterations in the RNA processing of genes have been linked to widely accepted cancer hallmarks. With the availability of large-scale genomic and transcriptomic data for multiple cancer types, it is now possible to address ambitious questions such as obtaining a global view of alterations in RNA processing that are specific to each cancer type as well as those common to all types. The first objective of this thesis is to obtain a global view of RNA processing alterations across different tumor types, along with alterations in RNA binding proteins (trans-component): their tumor-type specificity, differential expression, mutations, copy number variation, and whether these alterations result in differential splicing. Using data for more than 4000 patients from 11 tumor types, we provide the link between alterations of RNA binding proteins and splicing changes across multiple tumor types. The second objective moves one step further and explores in detail the RNA-processing alterations with respect to mutations in RNA regulatory sequences (cis-components). Using whole genome sequencing data for more than 1000 cancer patients, we thoroughly study the sequence of entire genes and report significantly mutated short regions in coding and non-coding parts of genes that are moreover enriched in putative RNA regulatory sites, including regions deep in the introns. The recurrence of some of the mutations in non-coding regions is comparable to that of some already known driver genes in coding regions. We further analyze the impact of these mutations at the RNA level by using RNA sequencing from the same samples. This work proposes a novel and powerful strategy to study mutations in cancer to identify novel oncogenic mechanisms. 
In addition, we share the immense amount of data generated in these analyses so that other researchers can study them in detail and validate them experimentally.
RNA processing and its alterations are key to understanding the phenotype of cells under normal and disease conditions. In particular, alterations in the RNA processing of specific genes have been linked to widely accepted hallmarks of cancer. With the availability of large-scale genomic and transcriptomic data for multiple cancer types, it is possible to address ambitious questions such as obtaining a global view of the alterations in RNA processing that are specific to each cancer type, as well as those common to several types. The first objective of this thesis is to obtain a global view of RNA processing alterations in different tumor types, as well as of alterations in RNA-binding proteins (trans component), and of whether these alterations result in differential RNA processing. Using data from more than 4000 patients for 11 tumor types, we establish the relationship between alterations in RNA-binding proteins and splicing changes in multiple tumor types. The second objective goes one step further and explores in detail the alterations in RNA processing with respect to mutations in RNA regulatory sequences (cis component). Using whole-genome data for more than 1000 patients, we thoroughly study the sequence of genes to identify significantly mutated short regions in protein-coding and non-coding parts that are moreover enriched in putative RNA regulatory sites, including deep intronic regions. The recurrence of mutations in some non-coding regions is comparable to that of some known cancer driver genes. In addition, we analyze the impact of these mutations at the RNA level using RNA sequencing data from the same samples. 
This work proposes a novel and powerful strategy for studying mutations in cancer in order to identify new oncogenic mechanisms. In addition, we share the immense amount of data generated in these analyses so that other researchers can study them in detail and validate them experimentally.

20

March, Laurent Moulinier. "Acousto-optic processing for training of large scale neural networks." Thesis, University of Kent, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.411557.

21

AZAMBUJA, MARCELLO DE LIMA. "A CLOUD COMPUTING ARCHITECTURE FOR LARGE SCALE VIDEO DATA PROCESSING." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2011. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28923@1.

Abstract:

The advent of the Internet poses great challenges for the design of open submission systems, since it eliminates physical and geographical barriers. The reduction in costs, together with the shift from analog to digital media, has made it difficult to forecast the capacity and resources needed to build such systems. In this work we propose a software architecture, based on cloud computing, that provides the scalability needed to build open submission systems. These systems are characterized by the generation of large volumes of data. A real use case is analyzed using video processing.
The advent of the Internet poses great challenges to the design of public submission systems as it eliminates traditional barriers, such as geographical location and cost. With open global access, it is very hard to estimate the storage space and processing power required by this class of applications. In this thesis we explore cloud computing technology as an alternative solution. The main contribution of this work is a general architecture in which to build open-access, data-intensive, public submission systems. A real-world scenario is analyzed using this architecture for video processing.

22

McCusker, Sean. "A digital image processing approach to large-scale turbulence studies." Thesis, University of Surrey, 1999. http://epubs.surrey.ac.uk/843989/.

Abstract:

An image processing approach to turbulence studies has been developed. The approach employs a structure tracking technique to quantify the movement of coherent, large-scale turbulent structures. The structure tracking technique has been applied to the shear layer of a low speed jet issuing into a low speed crossflow. A study of the characteristics of the turbulent flow within this region involved comparison with hot-wire anemometry measurements within the same flow regime and fractal analysis of the flow visualisation images used by the tracking routine. Fractal analysis was applied to flow visualisation images to educe a range of length scales made apparent by the flow visualisation equipment. The results obtained with the structure tracking technique included the instantaneous velocity of the structures and a measure of their length scales. The instantaneous velocity measurements were used to calculate a turbulence characteristic associated with the structures. Further analysis revealed subsets of this turbulence characteristic involving the variation in average velocity of individual structures as well as variations in the instantaneous velocity of individual structures. Where possible, the results of the structure tracking technique were compared to those achieved by hot-wire anemometry, and good correspondence was found between the mean flow characteristics measured by both techniques. The results of the two techniques began to diverge in the regions of the flow where conventional hot-wire anemometry was unable to discriminate between the flow associated with the jet and that associated with the crossflow. In such regions, time-averaged hot-wire anemometry produced results which combined the measurements in both flow regimes and therefore attenuated any characteristics of the jet which were significantly different from those of the crossflow. 
In the same flow regions the structure tracking technique was able to measure those characteristics specifically associated with the jet, producing results which reflected the behaviour of the jet more accurately.

23

Shi, Rong Shi. "Efficient data and metadata processing in large-scale distributed systems." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1534414418404428.

24

Judge, Paul Q. "Security and protection architectures for large-scale content distribution." Diss., Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/9217.

25

Pham, David. "Processing High Purity Zirconium Diboride Ultra-High Temperature Ceramics: Small-to-Large Scale Processing." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/621315.

Abstract:

Next generation aerospace vehicles require thermal protection system (TPS) materials that are capable of withstanding the extreme aerothermal environment during hypersonic flight (>Mach 5 [>1700 m/s]). Ultra-high temperature ceramics (UHTC) such as zirconium diboride (ZrB₂) are candidate TPS materials due to their high-temperature thermal and mechanical properties and are often the basis for advanced composites with enhanced oxidation resistance. However, ZrB₂ matrix impurities in the form of boron trioxide (B₂O₃) and zirconium dioxide (ZrO₂) limit the high-temperature capabilities. Electric-based sintering techniques that use joule heating, such as spark plasma sintering (SPS), have become the preferred densification method for processing advanced ceramics due to their ability to produce high density parts with reduced densification times and limited grain growth. This study focuses on a combined experimental and thermodynamic assisted processing approach to enhance powder purity through a carbo- and borocarbo-thermal reduction of oxides using carbon (C) and boron carbide (B₄C). The amount of oxides on the powder surface is measured, the amount of additive required to remove oxides is calculated, and processing conditions (temperature, pressure, environment) are controlled to promote favorable thermodynamic reactions both during thermal processing in a tube furnace and during SPS. Untreated ZrB₂ contains 0.18 wt%O after SPS. Additions of 0.75 wt%C are found to reduce powder surface oxides to 0.12 wt%O. A preliminary Zr-C-O computational thermodynamic model shows that carbon additions have limited efficiency in completely removing oxygen, due to the solubility of oxygen in zirconium carbide (ZrC), which forms a zirconium oxycarbide (ZrCₓOᵧ). Scanning electron microscopy (SEM) and scanning transmission electron microscopy (STEM) with atomic scale elemental spectroscopy show reduced oxygen content with amorphous Zr-B oxides and discrete ZrO₂ particle impurities in the microstructure. 
Processing ZrB₂ with minimal additions of B₄C (0.25 wt%) produces high purity parts after SPS with only 0.06 wt%O. STEM identifies unique “trash collector” oxides composed of manufacturer powder impurities of calcium, silver, and yttrium. A preliminary Zr-B-C-O thermodynamic model is used to show the potential reaction paths using B₄C that promote oxide removal to produce high-purity ZrB₂ with fine grains (3.3 μm) and mechanical properties (flexural strength of 660 MPa) superior to the current state-of-the-art ZrB₂ ceramics. Due to the desirable properties produced using SPS, there is growing interest in advancing processing techniques from lab scale (20 mm discs) to large scale (>100 mm). The advancement of SPS technologies has been held back by the limited power and load delivery of lab-scale furnaces. We use a large scale direct current sintering furnace (DCS) to address the challenges of producing industrially relevant sized parts. However, current-assisted sintering techniques, like SPS and DCS, are highly dependent on tooling resistances and the electrical conductivity of the sample, which influence the part uniformity through localized heating spots that are strongly dependent on the current flow path. We develop a coupled thermal-electrical finite element analysis model to investigate the development and effects of tooling and current density manipulation on an electrical conductor (ZrB₂) and an electrical insulator, silicon nitride (Si₃N₄), at the steady state where material properties, temperature gradients and current/voltage input are constant. The model is built based on experimentally measured temperature gradients in the tooling for 20 mm discs and validated by producing 30 mm discs with similar temperature gradients and grain size uniformity across the part. The model aids in developing tooling to manipulate localized current density in specific regions to produce uniform 100 mm discs of ZrB₂ and Si₃N₄.

26

Kulkarni, Gaurav Ramesh. "Scalable parallel algorithms and software for large scale proteomics." Pullman, Wash. : Washington State University, 2010. http://www.dissertations.wsu.edu/Thesis/Fall2009/g_kulkarni_121609.pdf.

Abstract:

Thesis (M.S. in computer science)--Washington State University, May 2010.
Title from PDF title page (viewed on June 2, 2010). "School of Electrical Engineering and Computer Science." Includes bibliographical references (p. 64-66).

27

Sathisan, Shashi Kumar. "Encapsulation of large scale policy assisting computer models." Thesis, Virginia Polytechnic Institute and State University, 1985. http://hdl.handle.net/10919/101261.

Abstract:

In the past two decades policy assisting computer models have made a tremendous impact in the analysis of national security issues and the analysis of problems in various government affairs. SURMAN (Survivability Management) is a policy assisting model that has been developed for use in national security planning. It is a large scale model formulated using the system dynamics approach of treating a problem in its entirety rather than in parts. In this thesis, an encapsulation of SURMAN is attempted so as to sharpen and focus its ability to perform policy/design evaluation. It is also aimed to make SURMAN more accessible to potential users and to provide a simple tool to the decision makers without having to resort to the mainframe computers. To achieve these objectives a personal/microcomputer version of SURMAN (PC SURMAN) and a series of curves relating inputs to outputs are developed. PC SURMAN reduces the complexity of SURMAN by dealing with generic aircraft. It details the essential survivability management parameters and their causal relationships through the life-cycle of aircraft systems. The model strives to link the decision parameters (inputs) to the measures of effectiveness (outputs). The principal decision variables identified are survivability, availability, and inventory of the aircraft system. The measures of effectiveness identified are the Increase Payload Delivered to Target Per Loss (ITDPL), Cost Elasticity of Targets Destroyed Per Loss (CETDPL), Combat Value Ratio (COMVR), Kill to Loss Ratio (KLR), and Decreased Program Life-Cycle Cost (DPLCC). The model provides an opportunity for trading off decision parameters. The trading off of survivability enhancement techniques and the defense budget allocation parameters for selecting those techniques/parameters with higher benefits and lower penalties are discussed. 
The information relating inputs to outputs for the tradeoff analysis is presented graphically using curves derived from experimentally designed computer runs.
M.S.

28

Gu, Weiming. "On-line monitoring and interactive steering of large-scale parallel and distributed applications." Diss., Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/9220.

29

Devaney, Mark David. "Plan recognition in a large-scale multi-agent tactical domain." Diss., Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/9195.

30

Trezzo, Christopher J. "Continuous MapReduce: an architecture for large-scale in-situ data processing." Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/fullcit?p1477939.

Abstract:

Thesis (M.S.)--University of California, San Diego, 2010.
Title from first page of PDF file (viewed July 16, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (leaves 48-51).

31

Lu, Jiamin. "Parallel SECONDO: processing moving objects data at large scale." Hagen : Fernuniversität Hagen, 2014. http://d-nb.info/1057895512/34.

32

Hirai, Tsuguhito. "Performance Modeling of Large-Scale Parallel-Distributed Processing for Cloud Environment." Kyoto University, 2018. http://hdl.handle.net/2433/232493.

33

Nairouz, Bassem R. "Conceptual design methodology of distributed intelligence large scale systems." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49077.

Abstract:

Distributed intelligence systems are starting to gain dominance in the field of large-scale complex systems. These systems are characterized by nonlinear behavior patterns that can only be predicted through simulation-based engineering. In addition, the autonomy, intelligence, and reconfiguration capabilities required by certain systems introduce obstacles that add another layer of complexity. However, there exists no standard process for the design of such systems. This research presents a design methodology focusing on distributed control architectures while concurrently considering the systems design process. The methodology has two major components. The first is a hybrid design process, based on the infusion of the control architecture and conceptual system design processes. The second is the development of a control architecture metamodel, which distinguishes between control configuration and control methods. This enables a standard representation of a wide spectrum of control architecture frameworks.

34

Roy, Amber Joyce. "Dynamic Grid-Based Data Distribution Management in Large Scale Distributed Simulations." Thesis, University of North Texas, 2000. https://digital.library.unt.edu/ark:/67531/metadc2699/.

Abstract:

Distributed simulation is an enabling concept to support the networked interaction of models and real world elements that are geographically distributed. This technology has brought a new set of challenging problems to solve, such as Data Distribution Management (DDM). The aim of DDM is to limit and control the volume of the data exchanged during a distributed simulation, and reduce the processing requirements of the simulation hosts by relaying events and state information only to those applications that require them. In this thesis, we propose a new DDM scheme, which we refer to as dynamic grid-based DDM. A lightweight UNT-RTI has been developed and implemented to investigate the performance of our DDM scheme. Our results clearly indicate that our scheme is scalable and it significantly reduces both the number of multicast groups used, and the message overhead, when compared to previous grid-based allocation schemes using large-scale and real-world scenarios.
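
To illustrate the basic grid-based idea that this abstract builds on, the sketch below maps rectangular publication and subscription regions onto grid cells and routes a publisher's updates only to subscribers that share at least one cell with it. It uses a fixed grid; the dynamic scheme proposed in the thesis is not described in the abstract and is not reproduced here. All names (`cells_for`, `match_regions`, the region layout) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Region = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Cell = Tuple[int, int]


def cells_for(region: Region, cell_size: float) -> Set[Cell]:
    """Return every grid cell that the rectangular region touches."""
    x0, y0, x1, y1 = region
    return {
        (cx, cy)
        for cx in range(int(x0 // cell_size), int(x1 // cell_size) + 1)
        for cy in range(int(y0 // cell_size), int(y1 // cell_size) + 1)
    }


def match_regions(
    publishers: Dict[str, Region],
    subscribers: Dict[str, Region],
    cell_size: float,
) -> Dict[str, Set[str]]:
    """Map each publisher to the subscribers sharing at least one grid cell."""
    subs_by_cell: Dict[Cell, Set[str]] = defaultdict(set)
    for name, region in subscribers.items():
        for cell in cells_for(region, cell_size):
            subs_by_cell[cell].add(name)

    routes: Dict[str, Set[str]] = {}
    for name, region in publishers.items():
        receivers: Set[str] = set()
        for cell in cells_for(region, cell_size):
            receivers |= subs_by_cell.get(cell, set())
        routes[name] = receivers
    return routes


if __name__ == "__main__":
    pubs = {"tank": (12, 12, 18, 18)}
    subs = {
        "radar": (15, 15, 25, 25),  # truly overlaps the tank's region
        "near": (0, 0, 11, 11),     # shares a cell but does not overlap
        "sonar": (40, 40, 45, 45),  # far away, filtered out
    }
    print(match_regions(pubs, subs, cell_size=10))
```

In the usage example, "near" receives the tank's updates even though the two regions do not actually overlap, because they fall in the same cell. Grid-based filtering trades this kind of irrelevant traffic for cheap matching, and the cell size controls the balance.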

35

Seshadri, Sangeetha. "Enhancing availability in large scale." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29715.

Abstract:

Thesis (Ph.D)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Ling Liu; Committee Member: Brian Cooper; Committee Member: Calton Pu; Committee Member: Douglas Blough; Committee Member: Karsten Schwan. Part of the SMARTech Electronic Thesis and Dissertation Collection.

36

Kilinc-Karzan, Fatma. "Tractable relaxations and efficient algorithmic techniques for large-scale optimization." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41141.

Abstract:

In this thesis, we develop tractable relaxations and efficient algorithms for large-scale optimization. Our developments are motivated by a recent paradigm, Compressed Sensing (CS), which consists of directly acquiring low-dimensional linear projections of signals, possibly corrupted with noise, and then using sophisticated recovery procedures for signal reconstruction. We start by analyzing how to utilize a priori information given in the form of sign restrictions on part of the entries. We propose necessary and sufficient conditions on the sensing matrix for exact recovery of sparse signals, utilize them in deriving error bounds under imperfect conditions, suggest verifiable sufficient conditions and establish their limits of performance. In the second part of this thesis, we study the CS synthesis problem: selecting the minimum number of rows from a given matrix so that the resulting submatrix possesses certifiably good recovery properties. We express the synthesis problem as the problem of approximating a given matrix by a matrix of specified low rank in the uniform norm and develop a randomized algorithm for this problem. The third part is dedicated to efficient First-Order Methods (FOMs) for large-scale, well-structured convex optimization problems. We propose FOMs with stochastic oracles that come with exact guarantees on solution quality, achieve sublinear time behavior, and, in extensive simulations, show considerable improvement over the state-of-the-art deterministic FOMs. In the last part, we examine a general sparse estimation problem: estimating a block sparse linear transform of a signal from undersampled observations of the signal corrupted with nuisance and stochastic noise. We show that an extension of the earlier results to this more general framework is possible. In particular, we suggest estimators that have efficiently verifiable guarantees of performance and provide connections to well-known results in CS theory.

37

Yang, Su. "PC-grade parallel processing and hardware acceleration for large-scale data analysis." Thesis, University of Huddersfield, 2009. http://eprints.hud.ac.uk/id/eprint/8754/.

Abstract:

Arguably, modern graphics processing units (GPUs) are the first commodity desktop parallel processors. Although GPU programming originated from interactive rendering in graphical applications such as computer games, researchers in the field of general purpose computation on GPU (GPGPU) are showing that the power, ubiquity and low cost of GPUs make them an ideal alternative platform for high-performance computing. This has resulted in extensive exploration of using the GPU to accelerate general-purpose computations in many engineering and mathematical domains outside of graphics. However, owing to the development complexity caused by the graphics-oriented concepts and development tools for GPU programming, GPGPU has so far mainly been discussed in the academic domain and has not yet fulfilled its promise in the real world. This thesis aims at exploiting GPGPU in the practical engineering domain and presents a novel contribution to GPGPU-driven linear time invariant (LTI) systems that are employed by the signal processing techniques in stylus-based or optical-based surface metrology and data processing. The core contributions achieved in this project can be summarized as follows. Firstly, a thorough survey of the state of the art of GPGPU applications and their development approaches has been carried out. In addition, the category of parallel architecture pattern to which GPGPU belongs has been specified, which forms the foundation of the GPGPU programming framework design in the thesis. Following this specification, a GPGPU programming framework is derived as a general guideline for the various GPGPU programming models that are applied to a wide range of algorithms in scientific computing and engineering applications. 
Considering the evolution of GPU hardware architecture, the proposed framework covers the transition from graphics-originated concepts for GPGPU programming on legacy GPUs to the abstraction of the stream processing pattern represented by the compute unified device architecture (CUDA), in which the GPU is considered not only a graphics device but also a streaming coprocessor of the CPU. Secondly, the proposed GPGPU programming framework is applied to practical engineering applications, namely surface metrological data processing and image processing, to generate programming models that carry out parallel computing for the corresponding algorithms. The acceleration performance of these models is evaluated in terms of the speed-up factor and the data accuracy, which enables the generation of quantifiable benchmarks for evaluating consumer-grade parallel processors. The results show that the GPGPU applications outperform the CPU solutions by up to 20 times without significant loss of data accuracy or any noticeable increase in source code complexity, which further validates the effectiveness of the proposed general GPGPU programming framework. Thirdly, this thesis devises methods for carrying out result visualization directly on the GPU by storing processed data in local GPU memory and making use of the GPU's rendering features to achieve real-time interaction. The algorithms employed in this thesis include various filtering techniques, the discrete wavelet transform, and the fast Fourier transform, which cover the common operations implemented in most LTI systems in the spatial and frequency domains. Considering the hardware designs of the employed GPUs, especially the structure of the rendering pipelines, and the characteristics of the algorithms, the series of proposed GPGPU programming models have proven their feasibility, practicality, and robustness in real engineering applications. 
The developed GPGPU programming framework and programming models are anticipated to be adaptable to future consumer-level computing devices and other computationally demanding applications. In addition, it is envisaged that the principles and methods devised in the framework design are likely to have significant benefits outside the sphere of surface metrology.

38

Lampi, J. (Jaakko). "Large-scale distributed data management and processing using R, Hadoop and MapReduce." Master's thesis, University of Oulu, 2014. http://urn.fi/URN:NBN:fi:oulu-201406191771.

Abstract:

The exponential growth of raw, i.e. unstructured, data collected by various methods has forced companies to change their business strategies and operational approaches. The revenue strategies of a growing number of companies are based solely on the information gained from data and its utilization. Managing and processing large-scale data sets, also known as Big Data, requires new methods and techniques, but storing and transporting the ever-growing amount of data also creates new technological challenges. Wireless sensor networks monitor their clients and track their behavior. A client on a wireless sensor network can be anything from a random object to a living being. The Internet of Things binds these clients together, forming a single, massive network. Data is increasingly produced and collected by, for example, research projects, commercial products, and governments for different purposes. This thesis covers the theory of managing large-scale data sets, introduces existing techniques and technologies, and analyzes the situation vis-a-vis the growing amount of data. As an implementation, a Hadoop cluster running R and Matlab is built, and sample data sets collected from different sources are stored and analyzed using the cluster. The datasets include the cellular band of the long-term spectral occupancy findings from the observatory of IIT (Illinois Institute of Technology) and open weather data from weatherunderground.com. An R software environment running on the master node is used as the main tool for calculations and for controlling the data flow between the different software components. These include Hadoop's HDFS and MapReduce for storage and analysis, as well as a Matlab server for processing sample data and pipelining it to R. The hypothesis is that a cold weather front and snowfall in the Chicago (IL, US) area should be visible in the cellular band occupancy.
As a result of the implementation, thorough, step-by-step guides for setting up and managing a Hadoop cluster and using it via an R environment are produced, along with worked examples and calculations. An analysis of the datasets and a performance comparison between R and MapReduce are presented and discussed. The results of the analysis correlate somewhat with the weather, but the dataset used for the performance comparison should clearly have been larger in order to produce viable results through distributed computing.
The tremendous growth in recent years in the amount of raw data, i.e. unstructured data collected by various methods, has driven companies to change their strategies and operating models. The revenue strategies of many new companies are based purely on the information obtained from data and its exploitation. Large data volumes and so-called Big Data require new methods and applications for both processing and analyzing data, but the physical storage of large data volumes and the transfer of data from databases to users have also created new technological challenges. Wireless sensor networks follow their users, which can in principle be any physical objects or living beings, and monitor and record their behavior. The so-called Internet of Things connects these objects, or things, into a single massive network. Data and information are collected at an ever-increasing pace, for example in research projects, for commercial purposes, and to ensure national security. The thesis discusses the theory of managing large data volumes, presents the use of new and existing techniques and technologies, and analyzes the situation from the perspective of data and information. The practical part goes step by step through building a Hadoop cluster and using the most common analysis tools. The available test data is analyzed using the cluster, the analysis results and the cluster's computational efficiency are recorded, and the results obtained are analyzed from the perspective of existing solutions and needs. The datasets used in the work are the mobile band occupancy collected at the IIT (Illinois Institute of Technology) observatory and open weather data from weatherunderground.com. As an analysis result, mobile band occupancy is assumed to correlate with a cold and snowy period in the Chicago area in the United States.
The results of the work are detailed installation and usage instructions for the Hadoop cluster and the software used, the analysis results for the datasets, and a performance comparison of the analysis using the R software environment and MapReduce. In conclusion, mobile band occupancy can be said to correlate to some extent with weather conditions. The dataset used in the performance measurement was clearly too small for distributed computing to be of benefit.

39

Colmenares, Hugo Armando Gualdron. "Block-based and structure-based techniques for large-scale graph processing and visualization." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-23032016-145752/.

Abstract:

Data analysis techniques can be useful in decision-making processes, when patterns of interest can indicate trends in specific domains. Such trends might support evaluation, definition of alternatives, or prediction of events. Currently, datasets have increased in size and complexity, posing challenges to modern hardware resources. In the case of large datasets that can be represented as graphs, issues of visualization and scalable processing are of current concern. Distributed frameworks are commonly used to deal with this data, but the deployment and management of computational clusters can be complex, demanding technical and financial resources that can be prohibitive in several scenarios. Therefore, it is desirable to design efficient techniques for processing and visualization of large-scale graphs that optimize hardware resources in a single computational node. To this end, we developed a visualization technique named StructMatrix to find interesting insights in real-life graphs. In addition, we proposed a graph processing framework, M-Flash, that uses a novel bimodal block processing strategy (BBP) to boost computation speed by minimizing I/O cost. Our results show that our visualization technique allows an efficient and interactive exploration of big graphs and that our framework M-Flash significantly outperformed all state-of-the-art approaches based on secondary memory. Our contributions have been validated in peer-reviewed venues, demonstrating the potential of our findings in fostering the analytical possibilities related to large-graph data domains.
Data analysis techniques can be useful in decision-making processes, when patterns of interest indicate trends in specific domains. Such trends can support evaluation, the definition of alternatives, or the prediction of events. Currently, datasets have grown in size and complexity, posing challenges for modern hardware resources. In the case of large datasets that can be represented as graphs, aspects of visualization and scalable processing have attracted interest. Distributed frameworks are commonly used to deal with this data, but the deployment and management of computational clusters can be complex, requiring technical and financial resources that can be prohibitive in many scenarios. It is therefore desirable to design effective techniques for processing and visualizing large-scale graphs that optimize hardware resources on a single computational node. Accordingly, this work presents a visualization technique called StructMatrix for identifying structural relationships in real graphs. In addition, a bimodal block processing strategy, called Bimodal Block Processing (BBP), was proposed that minimizes I/O cost to improve processing performance. This strategy was incorporated into a graph processing framework called M-Flash, developed during this work. Experiments were conducted to evaluate the proposed techniques. The results showed that the StructMatrix visualization technique enabled an efficient and interactive exploration of large graphs. Furthermore, the evaluation of the M-Flash framework showed significant gains over all state-of-the-art approaches based on secondary memory. Both contributions were validated in peer-reviewed venues, demonstrating the analytical potential of this work in domains associated with large-scale graphs.

40

Dzermajko, Caron. "Performance comparison of data distribution management strategies in large-scale distributed simulation." Thesis, University of North Texas, 2004. https://digital.library.unt.edu/ark:/67531/metadc4524/.

Abstract:

Data distribution management (DDM) is a High Level Architecture/Run-time Infrastructure (HLA/RTI) service that manages the distribution of state updates and interaction information in large-scale distributed simulations. The key to efficient DDM is to limit and control the volume of data exchanged during the simulation, relaying data only to those hosts that require it. This thesis focuses on different DDM implementations and strategies. It analyzes three DDM methods: the fixed grid-based, dynamic grid-based, and region-based methods. Also included is the use of multi-resolution modeling with various DDM strategies and an analysis of the performance effects of aggregation/disaggregation with these strategies. Running numerous federation executions, I simulate four different scenarios on a cluster of workstations with a mini-RTI Kit framework and propose a set of benchmarks for comparing the DDM schemes. The goals of this work are to determine the most efficient model for applying each DDM scheme, discover the scalability limits of the various DDM methods, evaluate the effects of aggregation/disaggregation on performance and resource usage, and present accepted benchmarks for use in future research.
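
As a rough illustration of the fixed grid-based method named in this abstract, the sketch below maps rectangular subscription and update regions onto a uniform grid and relays an update only to hosts whose subscription touches one of the same cells. The cell size, the region format, and the host names are hypothetical, and nothing here is taken from the thesis or from the mini-RTI Kit.

```python
from collections import defaultdict

CELL = 10  # hypothetical cell edge length, in simulation units


def cells(region, cell=CELL):
    """Return the grid cells overlapped by region = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    return {
        (cx, cy)
        for cx in range(int(xmin // cell), int(xmax // cell) + 1)
        for cy in range(int(ymin // cell), int(ymax // cell) + 1)
    }


def build_index(subscriptions):
    """Map each grid cell to the set of hosts subscribed to it."""
    index = defaultdict(set)
    for host, region in subscriptions.items():
        for c in cells(region):
            index[c].add(host)
    return index


def receivers(index, update_region):
    """Hosts that share at least one grid cell with the update region."""
    hosts = set()
    for c in cells(update_region):
        hosts |= index.get(c, set())
    return hosts


# Hypothetical subscriptions for two hosts.
subscriptions = {
    "host_a": (0, 0, 15, 15),
    "host_b": (40, 40, 55, 55),
}
index = build_index(subscriptions)
near = receivers(index, (12, 12, 18, 18))  # shares cell (1, 1) with host_a
far = receivers(index, (80, 80, 85, 85))   # no subscriber in cell (8, 8)
```

In this sketch an update at (16, 16, 18, 18) would still be relayed to host_a, because both regions fall in cell (1, 1) even though the rectangles do not overlap. Matching by cell rather than by exact region trades some unnecessary traffic for a cheaper lookup.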

41

Deri, Joya A. "Graph Signal Processing: Structure and Scalability to Massive Data Sets." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/725.

Abstract:

Large-scale networks are becoming more prevalent, with applications in healthcare systems, financial networks, social networks, and traffic systems. The detection of normal and abnormal behaviors (signals) in these systems presents a challenging problem. State-of-the-art approaches such as principal component analysis and graph signal processing address this problem using signal projections onto a space determined by an eigendecomposition or singular value decomposition. When a graph is directed, however, applying methods based on the graph Laplacian or singular value decomposition causes information from unidirectional edges to be lost. Here we present a novel formulation and graph signal processing framework that addresses this issue and that is well suited for application to extremely large, directed, sparse networks. In this thesis, we develop and demonstrate a graph Fourier transform for which the spectral components are the Jordan subspaces of the adjacency matrix. In addition to admitting a generalized Parseval’s identity, this transform yields graph equivalence classes that can simplify the computation of the graph Fourier transform over certain networks. Exploration of these equivalence classes provides the intuition for an inexact graph Fourier transform method that dramatically reduces computation time over real-world networks with nontrivial Jordan subspaces. We apply our inexact method to four years of New York City taxi trajectories (61 GB after preprocessing) over the NYC road network (6,400 nodes, 14,000 directed edges). We discuss optimization strategies that reduce the computation time of taxi trajectories from raw data by orders of magnitude: from 3,000 days to less than one day. Our method yields a fine-grained analysis that pinpoints the same locations as the original method while reducing computation time and decreasing energy dispersal among spectral components. 
This capability to rapidly reduce raw traffic data to meaningful features has important ramifications for city planning and emergency vehicle routing.

42

Giese, Holger, and Stephan Hildebrandt. "Efficient model synchronization of large-scale models." Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2009/2928/.

Abstract:

Model-driven software development requires techniques to consistently propagate modifications between different related models to realize its full potential. For large-scale models, efficiency is essential in this respect. In this paper, we present an improved model synchronization algorithm based on triple graph grammars that is highly efficient and, therefore, can also synchronize large-scale models sufficiently fast. We can show that the overall algorithm has optimal complexity if it is dominating the rule matching, and we further present extensive measurements that show the efficiency of the presented model transformation and synchronization technique.
Model-driven software development needs techniques for propagating changes between different related models in order to be fully usable. For large models, efficiency plays a decisive role here. In this report, we present an improved model synchronization algorithm based on triple graph grammars. It works very efficiently and can synchronize even very large models quickly. We can show that the overall algorithm has optimal complexity provided that it dominates the execution. The efficiency of the algorithm is supported by several benchmark results.

43

Larrabee, Alan Roger. "Adaptation of a large-scale computational chemistry program to the iPSC concurrent computer." Full text open access at:, 1986. http://content.ohsu.edu/u?/etd,116.

44

Twigg, Christopher M. "Floating Gate Based Large-Scale Field-Programmable Analog Arrays for Analog Signal Processing." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11601.

Abstract:

Large-scale reconfigurable and programmable analog devices provide a new option for prototyping and synthesizing analog circuits for analog signal processing and beyond. Field-programmable analog arrays (FPAAs) built upon floating gate transistor technologies provide the analog reconfigurability and programmability density required for large-scale devices on a single integrated circuit (IC). A wide variety of synthesized circuits, such as OTA followers, band-pass filters, and capacitively coupled summation/difference circuits, were measured to demonstrate the flexibility of FPAAs. Three generations of devices were designed and tested to verify the viability of such floating gate based large-scale FPAAs. Various architectures and circuit topologies were also designed and tested to explore the trade-offs present in reconfigurable analog systems. In addition, large-scale FPAAs have been incorporated into class laboratory exercises, which provide students with a much broader range of circuit and IC design experiences than have been previously possible. By combining reconfigurable analog technologies with an equivalent large-scale digital device, such as a field-programmable gate array (FPGA), an extremely powerful and flexible mixed signal development system can be produced that will enable all of the benefits possible through cooperative analog/digital signal processing (CADSP).

45

Kumar, Rajiv. "Enabling large-scale seismic data acquisition, processing and waveform-inversion via rank-minimization." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/62932.

Abstract:

In this thesis, I adapt ideas from the field of compressed sensing to mitigate the computational and memory bottlenecks of seismic processing workflows such as missing-trace interpolation, source separation, and wave-equation based inversion for large-scale 3- and 5-D seismic data. For interpolation and source separation using rank-minimization, I propose three main ingredients, namely a rank-revealing transform domain, a subsampling scheme that increases the rank in the transform domain, and a practical large-scale data-consistent rank-minimization framework, which avoids the need for expensive computation of singular value decompositions. We also devise a wave-equation based factorization approach that removes computational bottlenecks and provides access to the kinematics and amplitudes of full-subsurface offset extended images via actions of full extended image volumes on probing vectors, which I use to perform amplitude-versus-angle analyses and automatic wave-equation migration velocity analyses in complex geological environments. After a brief overview of matrix completion techniques in Chapter 1, we propose a singular value decomposition (SVD)-free factorization based rank-minimization approach for large-scale matrix completion problems. Then, I extend this framework to deal with large-scale seismic data interpolation problems, where I show that the standard approach of partitioning the seismic data into windows, which relies on the fact that events tend to become linear in these windows, is not required when exploiting the low-rank structure of seismic data. Carefully selected synthetic and realistic seismic data examples validate the efficacy of the interpolation framework. Next, I extend the SVD-free rank-minimization approach to remove the seismic cross-talk in simultaneous source acquisition.
Experimental results verify that source separation using the SVD-free rank-minimization approach is comparable to the sparsity-promotion based techniques; however, separation via rank-minimization is significantly faster and more memory efficient. We further introduce a matrix-vector formulation to form full-subsurface extended image volumes, which removes the storage and computational bottlenecks found in conventional methods. I demonstrate that the proposed matrix-vector formulation can be used to form different image gathers with which amplitude-versus-angle and wave-equation migration velocity analyses are performed, without requiring prior information on the geologic dips. Finally, I conclude the thesis by outlining potential future research directions and extensions of the thesis work.
Science, Faculty of
Earth, Ocean and Atmospheric Sciences, Department of
Graduate

46

Xiao, Shucai. "Generalizing the Utility of Graphics Processing Units in Large-Scale Heterogeneous Computing Systems." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/51845.

Abstract:

Today, heterogeneous computing systems are widely used to meet the increasing demand for high-performance computing. These systems commonly use powerful and energy-efficient accelerators to augment general-purpose processors (i.e., CPUs). The graphics processing unit (GPU) is one such accelerator. Originally designed solely for graphics processing, GPUs have evolved into programmable processors that can deliver massive parallel processing power for general-purpose applications. Using SIMD (Single Instruction Multiple Data) based components as building units, the current GPU architecture is well suited for data-parallel applications where the execution of each task is independent. With the delivery of programming models such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), programming GPUs has become much easier than before. However, developing and optimizing an application on a GPU is still a challenging task, even for well-trained computing experts. Such programming tasks will be even more challenging in large-scale heterogeneous systems, particularly in the context of utility computing, where GPU resources are used as a service. These challenges are largely due to the limitations of the current programming models: (1) there are no natively supported intra- and inter-GPU cooperative mechanisms; (2) current programming models only support the utilization of GPUs installed locally; and (3) to use GPUs on another node, application programs need to explicitly call application programming interface (API) functions for data communication. To reduce the mapping effort and to better utilize GPU resources, we investigate generalizing the utility of GPUs in large-scale heterogeneous systems with GPUs as accelerators. We generalize the utility of GPUs through the transparent virtualization of GPUs, which can enable applications to view all GPUs in the system as if they were installed locally.
As a result, all GPUs in the system can be used as local GPUs. Moreover, GPU virtualization is a key capability to support the notion of "GPU as a service." Specifically, we propose the virtual OpenCL (or VOCL) framework for the transparent virtualization of GPUs. To achieve good performance, we optimize and extend the framework in three aspects: (1) optimize VOCL by reducing the data transfer overhead between the local node and remote node; (2) propose GPU synchronization to reduce the overhead of switching back and forth if multiple kernel launches are needed for data communication across different compute units on a GPU; and (3) extend VOCL to support live virtual GPU migration for quick system maintenance and load rebalancing across GPUs. With the above optimizations and extensions, we thoroughly evaluate VOCL along three dimensions: (1) show the performance improvement for each of our optimization strategies; (2) evaluate the overhead of using remote GPUs via several microbenchmark suites as well as a few real-world applications; and (3) demonstrate the overhead as well as the benefit of live virtual GPU migration. Our experimental results indicate that VOCL can generalize the utility of GPUs in large-scale systems at a reasonable virtualization and migration cost.
Ph. D.

47

Wu, Yuanyuan. "HADOOP-EDF: LARGE-SCALE DISTRIBUTED PROCESSING OF ELECTROPHYSIOLOGICAL SIGNAL DATA IN HADOOP MAPREDUCE." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/88.

Abstract:

A rapidly growing volume of electrophysiological signals is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing electrophysiological signals. However, the bottleneck of existing signal analysis tools when handling large-scale datasets is that large EDF files are loaded sequentially before an analysis is performed. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF's scalability and performance by leveraging two datasets from the National Sleep Research Resource and running experiments on Amazon Web Service clusters. On a 20-node cluster, Hadoop-EDF is 27 times and 47 times faster than sequential processing of 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective in processing large EDF files.
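
One plausible way to make an EDF file splittable is to cut it only at data-record boundaries, since every data record after the header has the same size. The sketch below computes such byte ranges. It is an illustrative assumption, not the partition algorithm used by Hadoop-EDF; the function name and the example sizes are hypothetical.

```python
def record_aligned_splits(header_bytes, record_bytes, n_records, n_splits):
    """Divide n_records fixed-size records into contiguous (start, end) byte ranges.

    Ranges are half-open, never cut through a record, and differ in length
    by at most one record. Fewer than n_splits ranges are returned when
    there are not enough records to go around.
    """
    if n_splits < 1 or record_bytes < 1 or n_records < 0:
        raise ValueError("sizes must be positive and n_records non-negative")
    base, extra = divmod(n_records, n_splits)
    splits = []
    first = 0  # index of the first record in the current split
    for i in range(n_splits):
        count = base + (1 if i < extra else 0)
        if count == 0:
            break
        start = header_bytes + first * record_bytes
        splits.append((start, start + count * record_bytes))
        first += count
    return splits


# Hypothetical file with 2 signals: the EDF header occupies 256 bytes plus
# 256 bytes per signal. The record size and record count are made up.
header = 256 + 2 * 256
splits = record_aligned_splits(header, record_bytes=400, n_records=10, n_splits=3)
```

Each range could then be handed to a separate map task, which would also need the parsed header to interpret the samples in its records.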

48

Reale, Andrea <1986>. "Quality of Service in Distributed Stream Processing for large scale Smart Pervasive Environments." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amsdottorato.unibo.it/6390/1/main.pdf.

Abstract:

The wide diffusion of cheap, small, and portable sensors integrated in an unprecedentedly large variety of devices and the availability of almost ubiquitous Internet connectivity make it possible to collect an unprecedented amount of real-time information about the environment we live in. These data streams, if analyzed properly and in a timely manner, can be exploited to build new intelligent and pervasive services that have the potential to improve people's quality of life in a variety of domains such as entertainment, health care, or energy management. The large heterogeneity of application domains, however, calls for a middleware-level infrastructure that can effectively support their different quality requirements. In this thesis we study the challenges related to the provisioning of differentiated quality-of-service (QoS) during the processing of data streams produced in pervasive environments. We analyze the trade-offs between guaranteed quality, cost, and scalability in stream distribution and processing by surveying existing state-of-the-art solutions and identifying and exploring their weaknesses. We propose an original model for QoS-centric distributed stream processing in data centers and we present Quasit, its prototype implementation, which offers a scalable and extensible platform that researchers can use to implement and validate novel QoS-enforcement mechanisms. To support our study, we also explore an original class of weaker quality guarantees that can reduce costs when application semantics do not require strict quality enforcement. We validate the effectiveness of this idea in a practical use-case scenario that investigates partial fault-tolerance policies in stream processing by performing a large experimental study on the prototype of our novel LAAR dynamic replication technique.
Our modeling, prototyping, and experimental work demonstrates that, by providing data distribution and processing middleware with application-level knowledge of the different quality requirements associated with different pervasive data flows, it is possible to improve system scalability while reducing costs.

49

Reale, Andrea <1986>. "Quality of Service in Distributed Stream Processing for large scale Smart Pervasive Environments." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amsdottorato.unibo.it/6390/.

50

Kruzick, Stephen M. "Optimal Graph Filter Design for Large-Scale Random Networks." Research Showcase @ CMU, 2018. http://repository.cmu.edu/dissertations/1165.

Abstract:

Graph signal processing analyzes signals supported on the nodes of a network with respect to a shift operator matrix that conforms to the graph structure. For shift-invariant graph filters, which are polynomial functions of the shift matrix, the filter response is defined by the value of the filter polynomial at the shift matrix eigenvalues. Thus, information regarding the spectral decomposition of the shift matrix plays an important role in filter design. However, under stochastic conditions leading to uncertain network structure, the eigenvalues of the shift matrix become random, complicating the filter design task. In such cases, empirical distribution functions built from the random matrix eigenvalues may exhibit deterministic limiting behavior that can be exploited for problems on large-scale random networks. Acceleration filters for distributed average consensus dynamics on random networks provide the application covered in this thesis work. The thesis discusses methods from random matrix theory appropriate for analyzing adjacency matrix spectral asymptotics for both directed and undirected random networks, introducing relevant theorems. Network distribution properties that allow computational simplification of these methods are developed, and the methods are applied to important classes of random network distributions. Subsequently, the thesis presents the main contributions, which consist of optimization problems for consensus acceleration filters based on the obtained asymptotic spectral density information. The presented methods cover several cases for the random network distribution, including both undirected and directed networks as well as both constant and switching random networks. These methods also cover two related optimization objectives, asymptotic convergence rate and graph total variation.
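
The statement that a polynomial graph filter's response is the filter polynomial evaluated at the shift-matrix eigenvalues can be checked on a toy case. The sketch below applies h(A) = h0·I + h1·A + h2·A² to the eigenvectors of a two-node undirected graph; the shift matrix, the coefficients, and the function names are made up for illustration and do not come from the thesis.

```python
def matvec(a, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def apply_filter(coeffs, shift, x):
    """Compute h(A) x = sum_k coeffs[k] * A^k x without forming A^k."""
    out = [0.0] * len(x)
    power = list(x)  # holds A^k x, starting with k = 0
    for h in coeffs:
        out = [o + h * p for o, p in zip(out, power)]
        power = matvec(shift, power)
    return out


def poly(coeffs, lam):
    """Evaluate the filter polynomial at a scalar eigenvalue."""
    return sum(h * lam ** k for k, h in enumerate(coeffs))


# Two nodes joined by one edge: eigenvalues +1 and -1,
# with eigenvectors [1, 1] and [1, -1].
A = [[0.0, 1.0], [1.0, 0.0]]
coeffs = [0.5, 0.25, 0.25]  # hypothetical filter taps h0, h1, h2

v = [1.0, -1.0]                       # eigenvector for eigenvalue -1
filtered = apply_filter(coeffs, A, v)
gain = poly(coeffs, -1.0)             # 0.5 - 0.25 + 0.25 = 0.5
# filtered equals gain * v, i.e. [0.5, -0.5]
```

The eigenvector for eigenvalue +1 passes through unchanged here because the taps sum to one. Designing a filter then amounts to choosing the polynomial's values at the eigenvalues, which is why their distribution matters when the network is random.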
