Dissertations / Theses on the topic 'MAPREDUCE ARCHITECTURE'
Consult the top 16 dissertations / theses for your research on the topic 'MAPREDUCE ARCHITECTURE.'
Trezzo, Christopher J. "Continuous MapReduce: An architecture for large-scale in-situ data processing." Diss., [La Jolla]: University of California, San Diego, 2010. http://wwwlib.umi.com/cr/fullcit?p1477939.
Title from first page of PDF file (viewed July 16, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (leaves 48-51).
Venumuddala, Ramu Reddy. "Distributed Frameworks Towards Building an Open Data Architecture." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc801911/.
Kang, Seunghwa. "On the design of architecture-aware algorithms for emerging applications." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/39503.
Full textFerreira, Leite Alessandro. "A user-centered and autonomic multi-cloud architecture for high performance computing applications." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112355/document.
Cloud computing has been seen as an option for executing high performance computing (HPC) applications. While traditional HPC platforms such as grids and supercomputers offer a stable environment in terms of failures, performance, and number of resources, cloud computing offers on-demand resources, generally with unpredictable performance, at low financial cost. Furthermore, in a cloud environment, failures are part of normal operation. To overcome the limits of a single cloud, clouds can be combined to form a cloud federation, often at minimal additional cost for the users. A cloud federation can help both cloud providers and cloud users achieve their goals, such as reducing execution time, achieving minimum cost, increasing availability, and reducing power consumption. Hence, cloud federation can be an elegant solution to avoid over-provisioning, reducing operational costs in an average-load situation and removing resources that would otherwise remain idle and waste power. However, cloud federation increases the range of resources available to users. As a result, cloud or system administration skills may be demanded of users, as well as considerable time to learn about the available options. In this context, some questions arise, such as: (a) which cloud resource is appropriate for a given application? (b) how can users execute their HPC applications with acceptable performance and financial cost, without needing to re-engineer the applications to fit the clouds' constraints? (c) how can non-cloud specialists maximize the features of the clouds, without being tied to a cloud provider? and (d) how can cloud providers use the federation to reduce the power consumption of the clouds, while still being able to give service-level agreement (SLA) guarantees to the users? Motivated by these questions, this thesis presents an SLA-aware application consolidation solution for cloud federation.
Using a multi-agent system (MAS) to negotiate virtual machine (VM) migrations between clouds, simulation results show that our approach could reduce power consumption by up to 46% while trying to meet performance requirements. Using the federation, we developed and evaluated an approach to execute a huge bioinformatics application at zero cost. Moreover, we could decrease the execution time by 22.55% over the best single-cloud execution. In addition, this thesis presents a cloud architecture called Excalibur to auto-scale cloud-unaware applications. Executing a genomics workflow, Excalibur could seamlessly scale the applications up to 11 virtual machines, reducing the execution time by 63% and the cost by 84% when compared to a user's configuration. Finally, this thesis presents a product line engineering (PLE) process to handle the variabilities of Infrastructure-as-a-Service (IaaS) clouds, and an autonomic multi-cloud architecture that uses this process to configure and to deal with failures autonomously. The PLE process uses an extended feature model (EFM) with attributes to describe the resources and to select them based on users' objectives. Experiments performed with two different cloud providers show that, using the proposed model, users could execute their applications in a cloud federation environment without needing to know the variabilities and constraints of the clouds.
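The objective-driven resource selection described above can be sketched in miniature: given cloud offerings annotated with attributes (as in an attributed feature model), pick the cheapest one that satisfies the user's constraints. The offering names, attributes, and prices below are invented for illustration and are not from the thesis.

```python
# Hypothetical cloud offerings described by attributes, loosely in the
# spirit of an attributed extended feature model. All values are invented.
offerings = [
    {"name": "small",  "vcpus": 2,  "ram_gb": 4,   "cost_per_hour": 0.05},
    {"name": "medium", "vcpus": 8,  "ram_gb": 32,  "cost_per_hour": 0.40},
    {"name": "large",  "vcpus": 32, "ram_gb": 128, "cost_per_hour": 1.60},
]

def select(offerings, min_ram_gb, min_vcpus):
    """Pick the cheapest offering that satisfies the user's constraints."""
    feasible = [o for o in offerings
                if o["ram_gb"] >= min_ram_gb and o["vcpus"] >= min_vcpus]
    return min(feasible, key=lambda o: o["cost_per_hour"]) if feasible else None

choice = select(offerings, min_ram_gb=16, min_vcpus=4)
print(choice["name"])  # "medium"
```

The real EFM approach also encodes feature constraints (valid combinations of features), which this sketch omits; here the objective is fixed to cost minimization.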
Elteir, Marwa Khamis. "A MapReduce Framework for Heterogeneous Computing Architectures." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28786.
Ph. D.
Yang, Zhao. "Spatial Data Mining Analytical Environment for Large Scale Geospatial Data." ScholarWorks@UNO, 2016. http://scholarworks.uno.edu/td/2284.
Full textde, Souza Ferreira Tharso. "Improving Memory Hierarchy Performance on MapReduce Frameworks for Multi-Core Architectures." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/129468.
The need to analyze large data sets from many different application fields has fostered the use of simplified programming models like MapReduce. Its current popularity is justified by being a useful abstraction to express data-parallel processing and by effectively hiding synchronization, fault tolerance, and load balancing management details from the application developer. MapReduce frameworks have also been ported to multi-core and shared-memory computer systems. These frameworks dedicate a different CPU core to each map or reduce task to execute them concurrently. Also, the Map and Reduce phases share a common data structure where the main computations are applied. In this work we describe some limitations of current multi-core MapReduce frameworks. First, we describe the relevance of the data structure used to keep all input and intermediate data in memory. Current multi-core MapReduce frameworks are designed to keep all intermediate data in memory. When executing applications with large data inputs, the available memory becomes too small to store all the framework's intermediate data, and there is a severe performance loss. We propose a memory management subsystem that allows the intermediate data structures to process an unlimited amount of data by means of a disk-spilling mechanism. We have also implemented a way to manage concurrent disk access by all the threads participating in the computation. Finally, we have studied the effective use of the memory hierarchy by the data structures of MapReduce frameworks and proposed a new implementation of partial MapReduce tasks over the input data set. The objective is to make better use of the cache and to eliminate references to data blocks that are no longer in use. Our proposal was able to significantly reduce main memory usage and improve overall performance through increased cache usage.
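The disk-spilling idea described above can be illustrated with a minimal sketch (not the thesis implementation): an intermediate key/value store keeps pairs in memory until a budget is exceeded, spills them to a temporary file, and merges the spilled partitions with the in-memory remainder before the reduce phase. The class name and item-count budget are invented for illustration.

```python
import os
import pickle
import tempfile
from collections import defaultdict

class SpillingStore:
    """Toy intermediate store that spills to disk past a memory budget."""

    def __init__(self, max_items=1000):
        self.max_items = max_items
        self.buffer = defaultdict(list)
        self.count = 0
        self.spill_files = []

    def emit(self, key, value):
        self.buffer[key].append(value)
        self.count += 1
        if self.count >= self.max_items:
            self._spill()

    def _spill(self):
        # Serialize the in-memory partition to a temp file and reset.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(dict(self.buffer), f)
        self.spill_files.append(path)
        self.buffer.clear()
        self.count = 0

    def merged(self):
        """Merge spilled partitions with in-memory data for the reduce phase."""
        result = defaultdict(list)
        for path in self.spill_files:
            with open(path, "rb") as f:
                for k, vs in pickle.load(f).items():
                    result[k].extend(vs)
            os.remove(path)
        self.spill_files.clear()
        for k, vs in self.buffer.items():
            result[k].extend(vs)
        return result

store = SpillingStore(max_items=3)
for word in ["a", "b", "a", "c", "a"]:
    store.emit(word, 1)
print(sorted((k, sum(v)) for k, v in store.merged().items()))
# [('a', 3), ('b', 1), ('c', 1)]
```

A real framework would spill per-partition sorted runs and merge them with a streaming multi-way merge rather than loading whole spill files; the sketch only shows the memory-budget trigger and the merge step.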
Adornes, Daniel Couto. "A unified mapreduce programming interface for multi-core and distributed architectures." Pontifícia Universidade Católica do Rio Grande do Sul, 2015. http://tede2.pucrs.br/tede2/handle/tede/6782.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
In order to improve performance, simplicity and scalability of large datasets processing, Google proposed the MapReduce parallel pattern. This pattern has been implemented in several ways for different architectural levels, achieving significant results for high performance computing. However, developing optimized code with those solutions requires specialized knowledge in each framework's interface and programming language. Recently, the DSL-POPP was proposed as a framework with a high-level language for patterns-oriented parallel programming, aimed at abstracting complexities of parallel and distributed code. Inspired on DSL-POPP, this work proposes the implementation of a unified MapReduce programming interface with rules for code transformation to optimized solutions for shared-memory multi-core and distributed architectures. The evaluation demonstrates that the proposed interface is able to avoid performance losses, while also achieving a code and a development cost reduction from 41.84% to 96.48%. Moreover, the construction of the code generator, the compatibility with other MapReduce solutions and the extension of DSL-POPP with the MapReduce pattern are proposed as future work.
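The unified-interface idea above can be sketched as follows: user code supplies only a map function and a reduce function, and the runtime decides how to execute them, here on a local thread pool, but the same signature could be retargeted to a distributed backend. The function names are illustrative and are not the thesis's actual API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

def run_mapreduce(data, map_fn, reduce_fn, workers=4):
    """Run map_fn over inputs in parallel, group by key, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(chain.from_iterable(pool.map(map_fn, data)))
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

# Word count, the canonical MapReduce example.
lines = ["map reduce", "map map"]
counts = run_mapreduce(
    lines,
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda k, vs: sum(vs),
)
print(counts)  # {'map': 3, 'reduce': 1}
```

The point of a unified interface is that swapping `ThreadPoolExecutor` for a cluster backend would not change the user's map and reduce code at all.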
Pan, Jie. "Modélisation et exécution des applications d'analyse de données multi-dimentionnelles sur architectures distribuées." Phd thesis, Ecole Centrale Paris, 2010. http://tel.archives-ouvertes.fr/tel-00579125.
Palanisamy, Balaji. "Cost-effective and privacy-conscious cloud service provisioning: architectures and algorithms." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/52157.
Full textLiu, Yu-Yang, and 劉育瑒. "Parallel Genetic-Fuzzy Mining with MapReduce Architecture." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/eq783m.
National Sun Yat-sen University
Department of Computer Science and Engineering
103
Fuzzy data mining can successfully find hidden linguistic association rules by transforming quantitative information into fuzzy membership values. In the derivation process, good membership functions play a key role in the quality of the final results. In the past, several approaches were proposed to train membership functions with genetic algorithms and could indeed improve the quality of the found rules. Such methods, however, suffered from long execution times in the training phase. Besides, after appropriate fuzzy membership functions are found, mining frequent itemsets from them is also a very time-consuming process, just as in traditional data mining. In this thesis, we thus propose a series of approaches based on the MapReduce architecture to speed up the GA-fuzzy mining process. The contributions can be divided into three parts: data preprocessing, membership-function training by GA, and fuzzy association-rule derivation. All are performed by MapReduce. For data preprocessing, the proposed approach can not only transform the original data into key-value format to fit the requirements of MapReduce, but also efficiently reduce redundant database scans by joining the quantities into lists. For membership-function training by GA, the fitness evaluation, the most time-consuming step, is distributed to shorten the execution time. Finally, a distributed fuzzy rule mining approach based on FP-growth is designed to improve the time efficiency of finding fuzzy association rules. Experiments compare performance between a single processor and MapReduce, and the results show that our approaches can efficiently reduce the execution time of the whole process.
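The distributed fitness evaluation described above works because each chromosome's fitness is independent of the others, so evaluations can be farmed out as map tasks. The sketch below uses a thread pool as a stand-in for MapReduce workers; the triangular membership function is standard, but the toy data and the fitness definition (total membership mass assigned to the data) are invented for illustration.

```python
from multiprocessing.dummy import Pool  # thread-backed stand-in for map tasks

quantities = [2, 3, 5, 7, 8]  # toy transaction quantities

def triangular(x, a, b, c):
    """Membership degree of x under a triangular function (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fitness(chromosome):
    """Toy fitness: total membership mass the function assigns to the data."""
    a, b, c = chromosome
    return sum(triangular(q, a, b, c) for q in quantities)

# Each chromosome encodes one triangular membership function.
population = [(0, 5, 10), (0, 2, 4), (4, 6, 12)]
with Pool(3) as pool:  # evaluations distributed like map tasks
    scores = pool.map(fitness, population)
best = population[max(range(len(scores)), key=scores.__getitem__)]
print(best)  # (0, 5, 10)
```

In the actual GA, selection, crossover, and mutation would follow each round of distributed evaluation; only the evaluation step is shown here because it dominates the running time.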
VARSHNEY, PRATEEK KUMAR. "IMPLEMENTING PARALLEL PSO ALGORITHM USING MAPREDUCE ARCHITECTURE." Thesis, 2016. http://dspace.dtu.ac.in:8080/jspui/handle/repository/14678.
Full textLO, HSIANG-FU, and 羅祥福. "Study of Performance Optimization Scheme for Hadoop MapReduce Architecture." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/55720702234492522318.
Chung Cheng Institute of Technology, National Defense University
Graduate School of Defense Science
104
As the use of cloud computing increases rapidly, big data also continues to grow quickly, and the performance of data processing for big data has become an important research issue. This thesis discusses performance measurement methods together with performance tuning schemes for Hadoop MapReduce and then proposes corresponding performance improvement methods. To design a performance measurement scheme for Hadoop information hiding applications, a Performance AnalysiS Scheme for MapReduce Information Hiding (PASS-MIH) model is proposed to analyze and measure the performance impact factors of Hadoop information hiding applications. Experimental results show that the PASS-MIH model can estimate four levels of performance impact factors for an MR-based LSB test case and gain a 53.8% performance improvement rate when integrated with an existing Hadoop parameter tuning method. In addition, a Comprehensive Performance Rating (CPR) model is used to identify nine principal components from workload history and Hadoop configuration that strongly impact Hadoop performance. Experimental results indicate that tuning the principal components of Hadoop configurations can produce non-linear performance results. Furthermore, an ACO-based Hadoop Configuration Optimization (ACO-HCO) scheme is proposed to optimize the performance of Hadoop by automatically tuning its configuration parameter settings. ACO-HCO first employs gene expression programming to build an objective function from historical job running records, representing the correlation among Hadoop configuration parameters. It then employs ant colony optimization, which uses the objective function to search for optimal or near-optimal parameter settings. Experimental results verify that the ACO-HCO scheme enhances the performance of Hadoop significantly compared with the default settings. Moreover, it outperforms both rule-of-thumb settings and the Starfish model in Hadoop performance optimization.
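The pheromone-guided configuration search described above can be sketched in heavily simplified form: ants sample discrete values for each parameter with probability proportional to pheromone, the best settings are reinforced, and pheromone evaporates each round. The parameter names are real Hadoop-style knobs, but the synthetic linear objective below merely stands in for the learned objective that ACO-HCO fits from historical job records.

```python
import random
random.seed(42)  # deterministic for the sketch

# Candidate values for two illustrative Hadoop-style parameters.
params = {
    "io.sort.mb": [100, 200, 400],
    "mapreduce.task.io.sort.factor": [10, 50, 100],
}

def job_time(cfg):
    """Synthetic stand-in for the objective learned from job history."""
    return (600
            - 0.5 * cfg["io.sort.mb"]
            - 1.0 * cfg["mapreduce.task.io.sort.factor"])

# One pheromone weight per candidate value of each parameter.
pheromone = {p: [1.0] * len(vs) for p, vs in params.items()}

def build_config():
    """Each 'ant' samples a value per parameter, biased by pheromone."""
    return {p: random.choices(vs, weights=pheromone[p])[0]
            for p, vs in params.items()}

best_cfg, best_time = None, float("inf")
for _ in range(30):  # 30 ants
    cfg = build_config()
    t = job_time(cfg)
    if t < best_time:
        best_cfg, best_time = cfg, t
    for p, values in params.items():
        # Evaporate all trails, then reinforce the chosen value.
        pheromone[p] = [0.9 * w for w in pheromone[p]]
        pheromone[p][values.index(cfg[p])] += 100.0 / t

print(best_cfg, best_time)
```

Real ACO variants differ in how they deposit pheromone (best-ant-only, rank-based, bounded trails), and ACO-HCO additionally builds its objective with gene expression programming; this sketch shows only the sample-evaluate-reinforce loop.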
Hsu, Jun-Yi, and 徐君毅. "A Job Scheduling of Fair Resource Allocation with Energy-Saving for MapReduce Architecture in Cloud Computing." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/44086073397709531365.
National Cheng Kung University
Department of Computer Science and Information Engineering
100
Cloud computing is one of the most important technologies today and has a broad range of applications; Hadoop is the most commonly used cloud computing platform. Hadoop's task scheduler uses fair scheduling, which is easy to implement and distributes tasks fairly across the work environment, but has a multiplicative gap from the optimal assignment. It also does not consider energy saving, and adding energy awareness to cloud computing yields additional benefits. In this thesis we map the scheduling model of cloud computing to a mathematical model and find an assignment based on linear programming. However, such an assignment process has exponential time complexity, so we propose a polynomial-time algorithm for obtaining the assignment. On the other hand, if the resources in the environment are heterogeneous, the execution time on each resource differs while the completion time for the same task is identical, so the better resource idles. We can reduce its clock rate to extend its execution time without affecting the overall completion time, thereby saving energy. Since we do not know when tasks arrive in on-line scheduling, we can only reduce the clock rate to save energy; if we can also control the state of the I/O device, even more energy can be saved, so we present an I/O device scheduling algorithm that reduces energy consumption further when the arrival times of tasks are known. The experimental results show that our proposed strategy outperforms Hadoop in both completion time and energy consumption.
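The slack-based clock-rate reduction described above can be sketched numerically: a node that would finish early runs at the lowest frequency that still completes by the overall makespan. The cubic power model P ~ f^3 used below is a common simplifying assumption for dynamic frequency scaling, not a result from the thesis, and the two node timings are invented.

```python
def scaled_frequency(exec_time, makespan, f_max=1.0):
    """Lowest frequency (as a fraction of f_max) that meets the deadline.

    A task taking exec_time at full clock takes exec_time / (f / f_max)
    at frequency f, so f = f_max * exec_time / makespan finishes exactly
    at the makespan.
    """
    return f_max * exec_time / makespan

nodes = {"fast": 60.0, "slow": 100.0}   # seconds at full clock (invented)
makespan = max(nodes.values())           # 100 s, set by the slow node

for name, t in nodes.items():
    f = scaled_frequency(t, makespan)
    energy_full = t                      # P(f_max) = 1 unit, run for t seconds
    energy_scaled = f ** 3 * makespan    # lower power, longer run
    saving = 1 - energy_scaled / energy_full
    print(name, round(f, 2), round(saving, 2))
# fast 0.6 0.64
# slow 1.0 0.0
```

The fast node drops to 60% clock and, under the cubic model, saves about 64% of its energy while the overall completion time is unchanged, which is exactly the idle-resource slack the abstract exploits.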
Kendall, Wesley James. "A Scalable Architecture for Simplifying Full-Range Scientific Data Analysis." 2011. http://trace.tennessee.edu/utk_graddiss/1198.
Full text(9529172), Ejebagom J. Ojogbo. "ZipThru: A software architecture that exploits Zipfian skew in datasets for accelerating Big Data analysis." Thesis, 2020.