Academic literature on the topic 'MAPREDUCE ARCHITECTURE'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'MAPREDUCE ARCHITECTURE.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "MAPREDUCE ARCHITECTURE"

1

Jiang, Tao, Huaxi Gu, Kun Wang, Xiaoshan Yu, and Yunfeng Lu. "BHyberCube: A MapReduce aware heterogeneous architecture for data center." Computer Science and Information Systems 14, no. 3 (2017): 611–27. http://dx.doi.org/10.2298/csis170202019t.

Full text
Abstract:
Some applications, like MapReduce, call for a heterogeneous network in the data center network, whereas traditional network topologies, like fat tree and BCube, are homogeneous. MapReduce is a distributed data processing application. In this paper, we propose the BHyberCube network (BHC), a new heterogeneous network for MapReduce. Heterogeneous nodes and scalability issues are addressed considering the implementation of MapReduce in the existing topologies. A mathematical model is established to demonstrate the procedure of building a BHC. Comparisons of BHC with other topologies show the good properties BHC possesses for MapReduce. We also run simulations of BHC under multi-job injection and under different probabilities of worker-server communication. The results and analysis show that BHC could be a viable interconnection topology for MapReduce in today's data centers.
APA, Harvard, Vancouver, ISO, and other styles
2

Park, Jong-Hyuk, Hwa-Young Jeong, Young-Sik Jeong, and Min Choi. "REST-MapReduce: An Integrated Interface but Differentiated Service." Journal of Applied Mathematics 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/170723.

Full text
Abstract:
With the fast deployment of cloud computing, MapReduce architectures are becoming the major technologies for mobile cloud computing. The concept of MapReduce was first introduced as a novel programming model and implementation for a large set of computing devices. In this research, we propose a novel concept of REST-MapReduce, enabling users to use only the REST interface without using the MapReduce architecture. This approach provides a higher level of abstraction by integrating the two types of access interface, REST API and MapReduce. The motivation of this research stems from the slower response time for accessing a simple RDBMS on Hadoop than for direct access to the RDBMS, caused by the overhead of job scheduling, initiation, start-up, tracking, and management during MapReduce-based parallel execution. Therefore, we provide good performance for the REST Open API service and for MapReduce, respectively. This is very useful for constructing REST Open API services on Hadoop hosting services, for example, Amazon AWS (Macdonald, 2005) or IBM Smart Cloud. For evaluating the performance of our REST-MapReduce framework, we conducted experiments with the Jersey REST web server and Hadoop. Experimental results show that our approach outperforms conventional approaches.
APA, Harvard, Vancouver, ISO, and other styles
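Several entries in this list treat MapReduce primarily as a programming model layered on Hadoop. For orientation, the sketch below is the canonical Hadoop word-count job in Java; it is not drawn from any of the works cited here, and the input and output paths passed on the command line are placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts received for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);   // local pre-aggregation before the shuffle
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. an HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The cited papers study scheduling, topology, and interface concerns that sit underneath this simple programming surface.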
3

Zhang, Bin, Jia Jin Le, and Mei Wang. "Effective ACPS-Based Rescheduling of Parallel Batch Processing Machines with MapReduce." Applied Mechanics and Materials 575 (June 2014): 820–24. http://dx.doi.org/10.4028/www.scientific.net/amm.575.820.

Full text
Abstract:
MapReduce is a highly efficient distributed and parallel computing framework that allows users to readily manage large clusters for parallel computing. For the Big Data search problem in distributed computing environments based on the MapReduce architecture, in this paper we propose an ant colony parallel search algorithm (ACPSMR) for Big Data. It takes advantage of the swarm intelligence of the ant colony algorithm and its heuristic scheduling capabilities for global parallel search to address the low efficiency of multi-task parallel batch scheduling in MapReduce. We also extend the HDFS design in the MapReduce architecture to achieve effective integration with MapReduce, so that the algorithm can make the best of the scalability and high parallelism of MapReduce. The simulation results show that the new algorithm can take advantage of cloud computing to achieve good efficiency when mining Big Data.
APA, Harvard, Vancouver, ISO, and other styles
4

Mitra, Arnab, Anirban Kundu, Matangini Chattopadhyay, and Samiran Chattopadhyay. "On the Exploration of Equal Length Cellular Automata Rules Targeting a MapReduce Design in Cloud." International Journal of Cloud Applications and Computing 8, no. 2 (April 2018): 1–26. http://dx.doi.org/10.4018/ijcac.2018040101.

Full text
Abstract:
A MapReduce design with Cellular Automata (CA) is presented in this research article to facilitate load-reduced independent data processing and cost-efficient physical implementation in heterogeneous Cloud architecture. Equal Length Cellular Automata (ELCA) are considered for the design. This article explores ELCA rules and presents an ELCA-based MapReduce design in the Cloud. New algorithms are presented for (i) synthesis, (ii) classification of ELCA rules, and (iii) ELCA-based MapReduce design in the Cloud. Shuffling and efficient reduction of data volume are ensured in the proposed MapReduce design.
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Rong, and Haibo Chen. "Tiled-MapReduce." ACM Transactions on Architecture and Code Optimization 10, no. 1 (April 2013): 1–30. http://dx.doi.org/10.1145/2445572.2445575.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Loughran, S., Jose M. Alcaraz Calero, A. Farrell, J. Kirschnick, and J. Guijarro. "Dynamic Cloud Deployment of a MapReduce Architecture." IEEE Internet Computing 16, no. 6 (November 2012): 40–50. http://dx.doi.org/10.1109/mic.2011.163.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

de Kruijf, M., and K. Sankaralingam. "MapReduce for the Cell Broadband Engine Architecture." IBM Journal of Research and Development 53, no. 5 (September 2009): 10:1–10:12. http://dx.doi.org/10.1147/jrd.2009.5429076.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Liu, Hanpeng, Wuqi Gao, and Junmin Luo. "Research on Intelligentization of Cloud Computing Programs Based on Self-awareness." International Journal of Advanced Network, Monitoring and Controls 8, no. 2 (June 1, 2023): 89–98. http://dx.doi.org/10.2478/ijanmc-2023-0060.

Full text
Abstract:
Through research on the MapReduce programming framework of cloud computing, we find that current MapReduce programs only solve specific problems: there is no summary of MapReduce design experience or design features, let alone a formal description or the inheritance and application of such experience through a knowledge base. To make cloud computing programs more intelligent, a general MapReduce program generation method is designed. This paper proposes an architecture for intelligent cloud computing by studying the AORBCO model and combining it with cloud computing technology. Based on the behavior control mechanism of the AORBCO model, a MapReduce program generation method for intelligent cloud computing is proposed. The method extracts entity information from the input data set and from the knowledge base of the intelligent cloud for similarity calculation, and the top-ranked entities are taken as the key of the key-value pairs used to judge the data set. Data processing types are then divided and aligned with specific MapReduce capabilities, and MapReduce program generation is verified experimentally on the AORBCO model development platform. The experiments show that the complexity of big data MapReduce program code is reduced and that the generated code executes efficiently.
APA, Harvard, Vancouver, ISO, and other styles
9

Khudhair, Muslim Mohsin, Adil AL-Rammahi, and Furkan Rabee. "An innovative fractal architecture model for implementing MapReduce in an open multiprocessing parallel environment." Indonesian Journal of Electrical Engineering and Computer Science 30, no. 2 (May 1, 2023): 1059. http://dx.doi.org/10.11591/ijeecs.v30.i2.pp1059-1067.

Full text
Abstract:
One of the infrastructure applications that cloud computing offers as a service is parallel data processing. MapReduce is a type of parallel processing used more and more by data-intensive applications in cloud computing environments. MapReduce is based on a strategy called "divide and conquer," which uses regular computers, also called "nodes," to do processing in parallel. This paper looks at how open multiprocessing (OpenMP), the best shared-memory parallel programming model for high-performance computing, can be used with the proposed fractal network model in a MapReduce application. A well-known model, the cube, is used for comparison with the fractal network model. Experiments demonstrated that the fractal model is preferable to the cube model: the fractal model achieved an average speedup of 2.7 and an efficiency rate of 67.7%, whereas the cube model could only reach an average speedup of 2.5 and an efficiency rate of 60.4%.
APA, Harvard, Vancouver, ISO, and other styles
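The Khudhair, AL-Rammahi, and Rabee abstract above describes MapReduce as a "divide and conquer" strategy run in parallel on shared-memory hardware via OpenMP. As a purely conceptual illustration of that idea (not the paper's OpenMP code or its fractal topology), here is a minimal shared-memory map/reduce written with Java parallel streams; the input lines are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SharedMemoryMapReduce {
    public static void main(String[] args) {
        // Illustrative input; a real workload would read files or a dataset.
        List<String> lines = Arrays.asList(
                "map tasks split the work",
                "reduce tasks merge the partial results",
                "map and reduce run in parallel on the cores");

        Map<String, Long> counts = lines.parallelStream()
                // "map" phase: each line is split into word records in parallel
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // "reduce" phase: identical words are grouped and counted concurrently
                .collect(Collectors.groupingByConcurrent(word -> word, Collectors.counting()));

        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```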
10

Marzuni, Saeed Mirpour, Abdorreza Savadi, Adel N. Toosi, and Mahmoud Naghibzadeh. "Cross-MapReduce: Data transfer reduction in geo-distributed MapReduce." Future Generation Computer Systems 115 (February 2021): 188–200. http://dx.doi.org/10.1016/j.future.2020.09.009.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "MAPREDUCE ARCHITECTURE"

1

Trezzo, Christopher J. "Continuous MapReduce: An architecture for large-scale in-situ data processing." Diss., [La Jolla]: University of California, San Diego, 2010. http://wwwlib.umi.com/cr/fullcit?p1477939.

Full text
Abstract:
Thesis (M.S.)--University of California, San Diego, 2010.
Title from first page of PDF file (viewed July 16, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (leaves 48-51).
APA, Harvard, Vancouver, ISO, and other styles
2

Venumuddala, Ramu Reddy. "Distributed Frameworks Towards Building an Open Data Architecture." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc801911/.

Full text
Abstract:
Data is everywhere. Current technological advancements in digital and social media, and the ease with which different application services can interact with a variety of systems, generate tremendous volumes of data. Because of such varied services, data formats are no longer restricted to structured types like text; unstructured content such as social media data, videos, and images is also produced. Generated data is of no use unless it is stored and analyzed to derive some value. Traditional database systems come with limitations on data format schema, access rates, storage sizes, etc. Hadoop is an Apache open-source distributed framework that supports reliably storing huge datasets of differently formatted data on its file system, the Hadoop Distributed File System (HDFS), and processing the data stored on HDFS using the MapReduce programming model. This thesis study is about building a data architecture using Hadoop and its related open-source distributed frameworks to support a data-flow pipeline on low-cost commodity hardware. The data-flow components are data sourcing, storage management on HDFS, and a data access layer. The study also discusses a use case that exercises the architecture components: Sqoop, a framework to ingest structured data from a database into Hadoop, and Flume, used to ingest semi-structured Twitter streaming JSON data onto HDFS for analysis. The data sourced using Sqoop and Flume has been analyzed using Hive for SQL-like analytics, and at a higher level of the data access layer, Hadoop has been compared with Spark, an in-memory computing system. Significant differences in query execution performance have been analyzed when working with the Hadoop and Spark frameworks. This integration helps ingest huge volumes of streaming JSON variety data to derive better value-based analytics using Hive and Spark.
APA, Harvard, Vancouver, ISO, and other styles
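The Venumuddala thesis above builds a pipeline in which Twitter JSON ingested by Flume is queried with Hive and compared against in-memory processing in Spark. The following Java sketch only illustrates that SQL-like data access layer with Spark; the HDFS path, the `lang` field, and the particular aggregation are assumptions for the example, not details taken from the thesis.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TweetLanguageCounts {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("TweetLanguageCounts")
                .getOrCreate();

        // Read semi-structured JSON previously landed on HDFS (path is illustrative).
        Dataset<Row> tweets = spark.read().json("hdfs:///data/twitter/tweets");
        tweets.createOrReplaceTempView("tweets");

        // SQL-like analytics comparable to a Hive query, executed in memory by Spark.
        Dataset<Row> counts = spark.sql(
                "SELECT lang, COUNT(*) AS n FROM tweets GROUP BY lang ORDER BY n DESC");
        counts.show(20);

        spark.stop();
    }
}
```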
3

Kang, Seunghwa. "On the design of architecture-aware algorithms for emerging applications." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/39503.

Full text
Abstract:
This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also find several limitations of current system software and architectures and directions to improve those. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation participates in the efforts by providing benchmarks and suggestions to improve system software and architectures.
APA, Harvard, Vancouver, ISO, and other styles
4

Ferreira, Leite Alessandro. "A user-centered and autonomic multi-cloud architecture for high performance computing applications." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112355/document.

Full text
Abstract:
Cloud computing has been seen as an option to execute high performance computing (HPC) applications. While traditional HPC platforms such as grids and supercomputers offer a stable environment in terms of failures, performance, and number of resources, cloud computing offers on-demand resources, generally with unpredictable performance at low financial cost. Furthermore, in a cloud environment, failures are part of its normal operation. To overcome the limits of a single cloud, clouds can be combined, forming a cloud federation, often with minimal additional costs for the users. A cloud federation can help both cloud providers and cloud users to achieve their goals, such as reducing the execution time, achieving minimum cost, increasing availability, and reducing power consumption, among others. Hence, cloud federation can be an elegant solution to avoid over-provisioning, thus reducing the operational costs in an average load situation and removing resources that would otherwise remain idle and waste power. However, cloud federation increases the range of resources available to the users. As a result, cloud or system administration skills may be demanded from the users, as well as considerable time to learn about the available options. In this context, some questions arise, such as: (a) which cloud resource is appropriate for a given application? (b) how can users execute their HPC applications with acceptable performance and financial costs, without needing to re-engineer the applications to fit the clouds' constraints? (c) how can non-cloud specialists maximize the features of the clouds, without being tied to a cloud provider? and (d) how can cloud providers use the federation to reduce power consumption of the clouds, while still being able to give service-level agreement (SLA) guarantees to the users? Motivated by these questions, this thesis presents an SLA-aware application consolidation solution for cloud federation. Using a multi-agent system (MAS) to negotiate virtual machine (VM) migrations between the clouds, simulation results show that our approach could reduce the power consumption by up to 46% while trying to meet performance requirements. Using the federation, we developed and evaluated an approach to execute a huge bioinformatics application at zero cost. Moreover, we could decrease the execution time by 22.55% over the best single-cloud execution. In addition, this thesis presents a cloud architecture called Excalibur to auto-scale cloud-unaware applications. Executing a genomics workflow, Excalibur could seamlessly scale the applications up to 11 virtual machines, reducing the execution time by 63% and the cost by 84% when compared to a user's configuration. Finally, this thesis presents a product line engineering (PLE) process to handle the variabilities of infrastructure-as-a-service (IaaS) clouds, and an autonomic multi-cloud architecture that uses this process to configure and to deal with failures autonomously. The PLE process uses an extended feature model (EFM) with attributes to describe the resources and to select them based on users' objectives. Experiments realized with two different cloud providers show that, using the proposed model, users could execute their applications in a cloud federation environment without needing to know the variabilities and constraints of the clouds.
APA, Harvard, Vancouver, ISO, and other styles
5

Elteir, Marwa Khamis. "A MapReduce Framework for Heterogeneous Computing Architectures." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28786.

Full text
Abstract:
Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources following different architectures. This applies at the level of a single chip, a single node, and even supercomputers and large-scale clusters. With their impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, graphics processing units (GPUs) have become an integral part of these systems. GPUs deliver high peak performance; however, efficiently exploiting their computational power requires exploring a multi-dimensional space of optimization methodologies, which is challenging even for the well-trained expert. The complexity of this multi-dimensional space arises not only from the traditionally well-known but arduous task of architecture-aware GPU optimization at design and compile time, but also from the partitioning and scheduling of the computation across these heterogeneous resources. Even with programming models like the Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), the developer still needs to manage the data transfer between host and device and vice versa, orchestrate the execution of several kernels, and, more arduously, optimize the kernel code. In this dissertation, we aim to deliver a transparent parallel programming environment for heterogeneous resources by leveraging the power of the MapReduce programming model and the OpenCL programming language. We propose a portable architecture-aware framework that efficiently runs an application across heterogeneous resources, specifically AMD GPUs and NVIDIA GPUs, while hiding complex architectural details from the developer. To further enhance performance portability, we explore approaches for asynchronously and efficiently distributing the computations across heterogeneous resources. When applied to benchmarks and representative applications, our proposed framework significantly enhances performance, including up to 58% improvement over traditional approaches to task assignment and up to a 45-fold improvement over state-of-the-art MapReduce implementations.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
6

Yang, Zhao. "Spatial Data Mining Analytical Environment for Large Scale Geospatial Data." ScholarWorks@UNO, 2016. http://scholarworks.uno.edu/td/2284.

Full text
Abstract:
Nowadays, many applications are continuously generating large-scale geospatial data. Vehicle GPS tracking data, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high-resolution optical or Synthetic Aperture Radar imagery all generate huge amounts of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion is still limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a "Big Data" infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-tree) and HDFS to support geospatial data and demonstrate its analytical use with some common geospatial data types and data mining technology provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and makes its outputs available to end users.
APA, Harvard, Vancouver, ISO, and other styles
7

de Souza Ferreira, Tharso. "Improving Memory Hierarchy Performance on MapReduce Frameworks for Multi-Core Architectures." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/129468.

Full text
Abstract:
The need to analyze large data sets from many different application fields has fostered the use of simplified programming models like MapReduce. Its current popularity is justified by being a useful abstraction to express data-parallel processing and also by effectively hiding synchronization, fault tolerance, and load balancing management details from the application developer. MapReduce frameworks have also been ported to multi-core and shared-memory computer systems. These frameworks propose to dedicate a different CPU core to each map or reduce task so that they execute concurrently. Also, the Map and Reduce phases share a common data structure where the main computations are applied. In this work we describe some limitations of current multi-core MapReduce frameworks. First, we describe the relevance of the data structure used to keep all input and intermediate data in memory. Current multi-core MapReduce frameworks are designed to keep all intermediate data in memory, so when executing applications with large data inputs, the available memory becomes too small to store all of the framework's intermediate data and there is a severe performance loss. We propose a memory management subsystem that allows the intermediate data structures to process an unlimited amount of data through a disk-spilling mechanism, and we have implemented a way to manage concurrent disk access by all the threads participating in the computation. Finally, we have studied the effective use of the memory hierarchy by the data structures of MapReduce frameworks and propose a new implementation of partial MapReduce tasks over the input data set. The objective is to make better use of the cache and to eliminate references to data blocks that are no longer in use. Our proposal was able to significantly reduce main memory usage and improve the overall performance by increasing cache memory usage.
APA, Harvard, Vancouver, ISO, and other styles
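A central point of the de Souza Ferreira thesis above is spilling intermediate MapReduce data to disk once it no longer fits in memory. The class below is a minimal, hypothetical Java sketch of such a spilling combiner (the threshold, file format, and the missing merge step are simplifications; it is not the thesis's implementation).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Buffers intermediate (word, count) pairs in memory and spills them to a
 *  temporary file once the buffer exceeds a fixed threshold. */
public class SpillingCombiner {
    private final Map<String, Long> buffer = new HashMap<>();
    private final int maxEntries;   // spill threshold (illustrative value chosen by the caller)
    private final Path spillDir;
    private int spillCount = 0;

    public SpillingCombiner(int maxEntries, Path spillDir) {
        this.maxEntries = maxEntries;
        this.spillDir = spillDir;
    }

    /** Called by a map task for every intermediate key it produces. */
    public void emit(String key) throws IOException {
        buffer.merge(key, 1L, Long::sum);   // in-memory partial reduce
        if (buffer.size() >= maxEntries) {
            spill();
        }
    }

    /** Writes the buffered pairs to disk and frees the memory they used.
     *  A real framework would later merge the spill files in the reduce phase. */
    private void spill() throws IOException {
        Path file = spillDir.resolve("spill-" + (spillCount++) + ".tsv");
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (Map.Entry<String, Long> e : buffer.entrySet()) {
                out.write(e.getKey() + "\t" + e.getValue());
                out.newLine();
            }
        }
        buffer.clear();
    }
}
```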
8

Adornes, Daniel Couto. "A unified mapreduce programming interface for multi-core and distributed architectures." Pontifícia Universidade Católica do Rio Grande do Sul, 2015. http://tede2.pucrs.br/tede2/handle/tede/6782.

Full text
Abstract:
In order to improve the performance, simplicity, and scalability of large dataset processing, Google proposed the MapReduce parallel pattern. This pattern has been implemented in several ways for different architectural levels, achieving significant results for high performance computing. However, developing optimized code with those solutions requires specialized knowledge of each framework's interface and programming language. Recently, DSL-POPP was proposed as a framework with a high-level language for patterns-oriented parallel programming, aimed at abstracting the complexities of parallel and distributed code. Inspired by DSL-POPP, this work proposes the implementation of a unified MapReduce programming interface with rules for code transformation to optimized solutions for shared-memory multi-core and distributed architectures. The evaluation demonstrates that the proposed interface is able to avoid performance losses while also achieving a code and development cost reduction of 41.84% to 96.48%. Moreover, the construction of the code generator, compatibility with other MapReduce solutions, and the extension of DSL-POPP with the MapReduce pattern are proposed as future work.
APA, Harvard, Vancouver, ISO, and other styles
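The Adornes dissertation above proposes a single MapReduce programming interface whose code can be transformed into either a shared-memory or a distributed implementation. As a hypothetical Java sketch of what a backend-neutral contract of that kind might look like (not the DSL-POPP-based interface the author actually defines), consider:

```java
import java.util.List;
import java.util.function.BiConsumer;

/** A backend-neutral MapReduce contract: user code implements map() and reduce(),
 *  while a runtime (a thread pool or a distributed engine) decides how to run them. */
public interface MapReduceJob<IN, K, V, OUT> {

    /** Map phase: turn one input record into zero or more (key, value) pairs,
     *  handed to the runtime through the emit callback. */
    void map(IN record, BiConsumer<K, V> emit);

    /** Reduce phase: fold all values observed for one key into a single result. */
    OUT reduce(K key, List<V> values);
}

/** Example binding: word count expressed once, runnable by any backend. */
class WordCountJob implements MapReduceJob<String, String, Integer, Integer> {
    @Override
    public void map(String line, BiConsumer<String, Integer> emit) {
        for (String word : line.split("\\s+")) {
            emit.accept(word, 1);
        }
    }

    @Override
    public Integer reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }
}
```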
9

Pan, Jie. "Modélisation et exécution des applications d'analyse de données multi-dimentionnelles sur architectures distribuées." Phd thesis, Ecole Centrale Paris, 2010. http://tel.archives-ouvertes.fr/tel-00579125.

Full text
Abstract:
Colossal quantities of data are generated every day, and processing large volumes of data has become a real challenge for multidimensional data analysis software. Moreover, the response time demanded by the users of this software keeps getting shorter, to the point of being interactive. One way to meet this demand is an approach based on parallel computing. Traditional approaches rely on powerful but expensive architectures such as supercomputers; other, low-cost architectures are also available, but the methods developed on them are often much less efficient. In this thesis, we use a parallel programming model that emerged from cloud computing, called MapReduce, to parallelize the processing of multidimensional data analysis queries and thereby benefit from its good scalability and fault-tolerance mechanisms. In this work, we rethink existing techniques for optimizing multidimensional data analysis query processing, including the pre-computation, indexing, and data partitioning steps, and we summarize the parallelism of query processing. We then study the MapReduce model in detail, starting with the principles of MapReduce and of its extended form, MapCombineReduce, and in particular we analyze the communication cost of the MapReduce procedure. After presenting the data storage that works with MapReduce, we describe the characteristics of data management applications suited to cloud computing and how existing work uses MapReduce for data analysis applications. We then focus on parallelizing Multiple Group-by queries, a typical query used in multidimensional data exploration. We present an initial MapReduce-based implementation and an optimization based on MapCombineReduce. According to the experimental results, our optimized version shows better speed-up and better scalability than the initial version, and we give a formal estimate of the execution time for both implementations. To further optimize the processing of Multiple Group-by queries, a data restructuring phase is proposed to optimize the individual jobs. We redefine the organization of data storage and apply the following techniques during the data restructuring phase: data partitioning, inverted indexing, and data compression. We redefine the computations performed in MapReduce and the task scheduling to use this new data structure. Based on measured execution times, we can give a formal estimate and thus identify the factors that impact performance, such as query selectivity, the number of mappers launched on a node, the distribution of "hitting" data, the size of intermediate results, the serialization algorithms adopted, the state of the network, whether or not the combiner is used, and the methods adopted for data partitioning.
We give a model for estimating execution times, and in particular for estimating the values of the various parameters for executions that use horizontal partitioning. To support single-value-wise scheduling, which is more flexible, we design a new compressed data structure that works with vertical partitioning; this approach allows aggregation over a given value in a continuous process.
APA, Harvard, Vancouver, ISO, and other styles
10

Palanisamy, Balaji. "Cost-effective and privacy-conscious cloud service provisioning: architectures and algorithms." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/52157.

Full text
Abstract:
Cloud Computing represents a recent paradigm shift that enables users to share and remotely access high-powered computing resources (both infrastructure and software/services) contained in off-site data centers thereby allowing a more efficient use of hardware and software infrastructures. This growing trend in cloud computing, combined with the demands for Big Data and Big Data analytics, is driving the rapid evolution of datacenter technologies towards more cost-effective, consumer-driven, more privacy conscious and technology agnostic solutions. This dissertation is dedicated to taking a systematic approach to develop system-level techniques and algorithms to tackle the challenges of large-scale data processing in the Cloud and scaling and delivering privacy-aware services with anytime-anywhere availability. We analyze the key challenges in effective provisioning of Cloud services in the context of MapReduce-based parallel data processing considering the concerns of cost-effectiveness, performance guarantees and user-privacy and we develop a suite of solution techniques, architectures and models to support cost-optimized and privacy-preserving service provisioning in the Cloud. At the cloud resource provisioning tier, we develop a utility-driven MapReduce Cloud resource planning and management system called Cura for cost-optimally allocating resources to jobs. While existing services require users to select a number of complex cluster and job parameters and use those potentially sub-optimal per-job configurations, the Cura resource management achieves global resource optimization in the cloud by minimizing cost and maximizing resource utilization. We also address the challenges of resource management and job scheduling for large-scale parallel data processing in the Cloud in the presence of networking and storage bottlenecks commonly experienced in Cloud data centers. We develop Purlieus, a self-configurable locality-based data and virtual machine management framework that enables MapReduce jobs to access their data either locally or from close-by nodes including all input, output and intermediate data achieving significant improvements in job response time. We then extend our cloud resource management framework to support privacy-preserving data access and efficient privacy-conscious query processing. Concretely, we propose and implement VNCache: an efficient solution for MapReduce analysis of cloud-archived log data for privacy-conscious enterprises. Through a seamless data streaming and prefetching model in VNCache, Hadoop jobs begin execution as soon as they are launched without requiring any apriori downloading. At the cloud consumer tier, we develop mix-zone based techniques for delivering anonymous cloud services to mobile users on the move through Mobimix, a novel road-network mix-zone based framework that enables real time, location based service delivery without disclosing content or location privacy of the consumers.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "MAPREDUCE ARCHITECTURE"

1

Herodotou, Herodotos, and Shivnath Babu. Massively Parallel Databases and MapReduce Systems. Now Publishers, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "MAPREDUCE ARCHITECTURE"

1

Zhou, Lijun, and Zhiyi Yu. "Acceleration of MapReduce Framework on a Multicore Processor." In Emerging Technology and Architecture for Big-data Analytics, 175–90. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-54840-1_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Xu, Hongsheng, Ganglong Fan, and Ke Li. "Analysis and Application of Mapreduce Architecture and Working Principle." In Advances in Intelligent Systems and Computing, 955–61. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-15235-2_127.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Laclavík, Michal, Martin Šeleng, and Ladislav Hluchý. "Towards Large Scale Semantic Annotation Built on MapReduce Architecture." In Computational Science – ICCS 2008, 331–38. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-69389-5_38.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Eken, Süleyman, Umut Kizgindere, and Ahmet Sayar. "MapReduce Based Scalable Range Query Architecture for Big Spatial Data." In Lecture Notes in Geoinformation and Cartography, 263–72. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-45123-7_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Talan, Pooja P., Kartik U. Sharma, Pratiksha P. Nawade, and Karishma P. Talan. "An Overview of Hadoop MapReduce, Spark, and Scalable Graph Processing Architecture." In Advances in Intelligent Systems and Computing, 35–42. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-1280-9_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Chen, Quan, and Minyi Guo. "MapReduce for Cloud Computing." In Task Scheduling for Multi-core and Parallel Architectures, 173–98. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-6238-4_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Xu, Yujie, Wenyu Qu, Zhiyang Li, Changqing Ji, Yuanyuan Li, and Yinan Wu. "Fast Scalable k-means++ Algorithm with MapReduce." In Algorithms and Architectures for Parallel Processing, 15–28. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-11194-0_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Hu, Minghao, Changjian Wang, Pengfei You, Zhen Huang, and Yuxing Peng. "Deadline-Oriented Task Scheduling for MapReduce Environments." In Algorithms and Architectures for Parallel Processing, 359–72. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-27122-4_25.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Chen, Yi, Zhaobin Liu, Tingting Wang, and Lu Wang. "Load Balancing in MapReduce Based on Data Locality." In Algorithms and Architectures for Parallel Processing, 229–41. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-11197-1_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Yu, Xiao, Jin Liu, Xiao Liu, Chuanxiang Ma, and Bin Li. "A MapReduce Reinforced Distributed Sequential Pattern Mining Algorithm." In Algorithms and Architectures for Parallel Processing, 183–97. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-27122-4_13.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "MAPREDUCE ARCHITECTURE"

1

Ammal, R. Ananthalakshmi, and Kumar K. B. Aneesh. "MapReduce framework based distributed NMS architecture." In 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM 2011). IEEE, 2011. http://dx.doi.org/10.1109/inm.2011.5990662.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Changjian, Yuxing Peng, Junyi Liu, Mingxing Tang, Guangming Liu, Jinghua Feng, and Pengfei You. "Optimal Task Scheduling in MapReduce." In 2014 9th IEEE International Conference on Networking, Architecture, and Storage (NAS). IEEE, 2014. http://dx.doi.org/10.1109/nas.2014.26.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhou, Xiaobo, and Bin Zhang. "Research of MapReduce architecture on busbar protection." In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS). IEEE, 2018. http://dx.doi.org/10.1109/ccis.2018.8691190.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Caruana, Godwin, Maozhen Li, and Hao Qi. "SpamCloud: A MapReduce based anti-spam architecture." In 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2010. http://dx.doi.org/10.1109/fskd.2010.5569282.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Liu, Lifeng, Yue Zhang, Meilin Liu, Chongjun Wang, and Jun Wang. "A-MapCG: An Adaptive MapReduce Framework for GPUs." In 2017 International Conference on Networking, Architecture, and Storage (NAS). IEEE, 2017. http://dx.doi.org/10.1109/nas.2017.8026842.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Woldemariam, Yonas, Stefan Pletschacher, Christian Clausner, and Julian Bass. "A Cloud-Hosted MapReduce Architecture for Syntactic Parsing." In 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 2019. http://dx.doi.org/10.1109/seaa.2019.00024.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chen, Linchuan, Xin Huo, and Gagan Agrawal. "Accelerating MapReduce on a coupled CPU-GPU architecture." In 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2012. http://dx.doi.org/10.1109/sc.2012.16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Ranger, Colby, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis. "Evaluating MapReduce for Multi-core and Multiprocessor Systems." In 2007 IEEE 13th International Symposium on High Performance Computer Architecture. IEEE, 2007. http://dx.doi.org/10.1109/hpca.2007.346181.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Zhou, Fang, Hai Pham, Jianhui Yue, Hao Zou, and Weikuan Yu. "SFMapReduce: An optimized MapReduce framework for Small Files." In 2015 IEEE International Conference on Networking, Architecture and Storage (NAS). IEEE, 2015. http://dx.doi.org/10.1109/nas.2015.7255218.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Ma, Nam, Yinglong Xia, and Viktor K. Prasanna. "Parallel Exact Inference on Multicore Using MapReduce." In 2012 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 2012. http://dx.doi.org/10.1109/sbac-pad.2012.43.

Full text
APA, Harvard, Vancouver, ISO, and other styles