Academic literature on the topic 'MAPREDUCE FRAMEWORKS'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'MAPREDUCE FRAMEWORKS.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "MAPREDUCE FRAMEWORKS"

1. Ajibade Lukuman Saheed, Abu Bakar Kamalrulnizam, Ahmed Aliyu, and Tasneem Darwish. "Latency-aware Straggler Mitigation Strategy in Hadoop MapReduce Framework: A Review." Systematic Literature Review and Meta-Analysis Journal 2, no. 2 (October 19, 2021): 53–60. http://dx.doi.org/10.54480/slrm.v2i2.19.

Abstract:
Processing huge and complex data to obtain useful information is challenging, even though several big data processing frameworks have been proposed and further enhanced. One of the prominent big data processing frameworks is MapReduce, whose main concept relies on distributed and parallel processing. However, the MapReduce framework faces serious performance degradation due to the slow execution of a certain type of task called stragglers. Failing to handle stragglers causes delay and affects the overall job execution time. Several straggler mitigation techniques have been proposed to improve MapReduce performance. This study provides a comprehensive and qualitative review of the existing straggler mitigation solutions. In addition, a taxonomy of the available straggler mitigation solutions is presented. Critical research issues and future research directions are identified and discussed to guide researchers and scholars.
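
A common straggler mitigation strategy in Hadoop is speculative execution: when a task is flagged as straggling, the scheduler launches a backup copy and takes whichever copy finishes first. A rough illustrative sketch of that idea in Python (ours, not the review's; task names and delays are invented):

    import time
    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

    def run_task(task_id, delay):
        # Stand-in for a map task; 'delay' models how slow this worker is.
        time.sleep(delay)
        return f"{task_id} finished by the worker with delay {delay}s"

    with ThreadPoolExecutor(max_workers=2) as pool:
        original = pool.submit(run_task, "task-7", 5.0)  # the straggler
        # A timeout heuristic has flagged the task, so a speculative
        # duplicate is scheduled on another worker.
        backup = pool.submit(run_task, "task-7", 1.0)
        done, _ = wait([original, backup], return_when=FIRST_COMPLETED)
        print(next(iter(done)).result())  # the first copy to finish wins
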
2. Darapaneni, Chandra Sekhar, Bobba Basaveswara Rao, Boggavarapu Bhanu Venkata Satya Vara Prasad, and Suneetha Bulla. "An Analytical Performance Evaluation of MapReduce Model Using Transient Queuing Model." Advances in Modelling and Analysis B 64, no. 1-4 (December 31, 2021): 46–53. http://dx.doi.org/10.18280/ama_b.641-407.

Abstract:
Today, MapReduce frameworks have become the standard distributed computing mechanism to store, process, analyze, query, and transform big data. When processing big data, evaluating the performance of the MapReduce framework is essential in order to understand process dependencies and to tune hyper-parameters. Unfortunately, the framework's built-in functions can evaluate performance only to a limited extent, so a reliable analytical performance model is needed in this area. The main objective of this paper is to investigate the performance of MapReduce computing models under various configurations. To accomplish this, we propose an analytical transient queuing model that evaluates MapReduce performance for different job arrival rates at the mappers and various job completion times of the mappers as well as the reducers. In our transient queuing model, we adopt an efficient multi-server queuing model, M/M/c, for optimal waiting-queue management. To conduct experiments on the proposed analytical model, we selected big data applications with three mappers and two reducers under various configurations. As part of the experiments, the transient differential equations, average queue lengths, mapper blocking probability, shuffle waiting probabilities, and transient states are evaluated. MATLAB-based numerical simulations present the analytical results for various combinations of the input parameters λ, µ1, and µ2 and their effect on queue length.
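
For context, the transient behaviour of an M/M/c queue of the kind adopted here is governed by the standard birth-death differential equations; in generic single-stage notation (ours; the paper splits service between mappers and reducers with rates µ1 and µ2):

    \frac{dP_0(t)}{dt} = -\lambda P_0(t) + \mu P_1(t)

    \frac{dP_n(t)}{dt} = \lambda P_{n-1}(t) - \bigl(\lambda + \min(n,c)\,\mu\bigr) P_n(t) + \min(n+1,c)\,\mu\, P_{n+1}(t), \qquad n \ge 1

where P_n(t) is the probability of n jobs in the system at time t and c is the number of servers. The average queue length then follows as L(t) = \sum_n n\,P_n(t), which is the quantity the MATLAB simulations report.
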
3. Kang, Sol Ji, Sang Yeon Lee, and Keon Myung Lee. "Performance Comparison of OpenMP, MPI, and MapReduce in Practical Problems." Advances in Multimedia 2015 (2015): 1–9. http://dx.doi.org/10.1155/2015/575687.

Abstract:
With problem size and complexity increasing, several parallel and distributed programming models and frameworks have been developed to efficiently handle such problems. This paper briefly reviews parallel computing models and describes three widely recognized parallel programming frameworks: OpenMP, MPI, and MapReduce. OpenMP is the de facto standard for parallel programming on shared memory systems; MPI is the de facto industry standard for distributed memory systems; and the MapReduce framework has become the de facto standard for large-scale data-intensive applications. Qualitative pros and cons of each framework are known, but quantitative performance indexes give a clearer picture of which framework to use for which applications. As benchmark problems to compare the frameworks, two problems are chosen: the all-pairs-shortest-path problem and a data join problem. This paper presents parallel programs for these problems implemented on the three frameworks, shows experimental results on a cluster of computers, and discusses which is the right tool for the job by analyzing the characteristics and performance of the paradigms.
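
To make the data join benchmark concrete, here is a minimal framework-free sketch of how a join is expressed in the MapReduce style (our Python illustration with invented records; the paper's implementations target OpenMP, MPI, and Hadoop):

    from collections import defaultdict
    from itertools import product

    users = [(1, "alice"), (2, "bob")]              # (join key, name)
    orders = [(1, "book"), (1, "pen"), (2, "mug")]  # (join key, item)

    # Map phase: tag each record with the table it came from.
    mapped = [(k, ("U", v)) for k, v in users]
    mapped += [(k, ("O", v)) for k, v in orders]

    # Shuffle phase: group all tagged records by join key.
    groups = defaultdict(list)
    for key, tagged in mapped:
        groups[key].append(tagged)

    # Reduce phase: join the two sides within each key group.
    for key, records in groups.items():
        names = [v for tag, v in records if tag == "U"]
        items = [v for tag, v in records if tag == "O"]
        for name, item in product(names, items):
            print(key, name, item)
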
4. Srirama, Satish Narayana, Oleg Batrashev, Pelle Jakovits, and Eero Vainikko. "Scalability of Parallel Scientific Applications on the Cloud." Scientific Programming 19, no. 2-3 (2011): 91–105. http://dx.doi.org/10.1155/2011/361854.

Abstract:
Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations and the NAS parallel benchmarks, as well as DOUG (Domain decomposition On Unstructured Grids), on the cloud. DOUG is an open source software package for the parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit considerably and scale reasonably well on the cloud. We could also observe the limitations of the cloud and compare it with clusters in terms of performance. However, to run scientific applications efficiently on cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit cloud resources, such as the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits embarrassingly parallel algorithms well. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.
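
The difficulty with iterative methods stems from the driver pattern Hadoop imposes: one complete MapReduce job per iteration, with the state written to storage and read back every time. A schematic sketch of that loop (the update function is a hypothetical stand-in, not a real Hadoop call):

    def run_mapreduce_job(state):
        # Stand-in for launching a full job: on a real cluster each call
        # pays job start-up, scheduling, and HDFS read/write costs.
        return [x / 2.0 for x in state]  # some per-iteration update

    def converged(old, new, eps=1e-6):
        return all(abs(a - b) < eps for a, b in zip(old, new))

    state = [1.0, 2.0, 3.0]
    for iteration in range(100):              # driver loop, outside the framework
        new_state = run_mapreduce_job(state)  # a full job launch per iteration
        if converged(state, new_state):
            break
        state = new_state                     # state round-trips through storage
    print(iteration, state)
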
5. Senthilkumar, M., and P. Ilango. "A Survey on Job Scheduling in Big Data." Cybernetics and Information Technologies 16, no. 3 (September 1, 2016): 35–51. http://dx.doi.org/10.1515/cait-2016-0033.

Abstract:
Big data applications with scheduling have become an active research area in the last three years. The Hadoop framework has become one of the most popular and widely used frameworks for distributed data processing. Hadoop is also open source software that allows the user to utilize hardware effectively. The various scheduling algorithms of the MapReduce model used with Hadoop vary in design and behavior and handle many issues, such as data locality, resource awareness, energy, and time. This paper gives an outline of job scheduling, a classification of schedulers, and a comparison of different existing algorithms with their advantages, drawbacks, and limitations. We also discuss various tools and frameworks used for monitoring and ways to improve MapReduce performance. This paper helps beginners and researchers understand the scheduling mechanisms used in big data.

6. Adornes, Daniel, Dalvan Griebler, Cleverson Ledur, and Luiz Gustavo Fernandes. "Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures." International Journal of Software Engineering and Knowledge Engineering 25, no. 09n10 (November 2015): 1739–41. http://dx.doi.org/10.1142/s0218194015710096.

Abstract:
MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many research efforts have contributed MapReduce implementations for distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies in order to achieve high-performance computing, and such strategies have in turn led to very different MapReduce programming interfaces across these efforts. This paper presents research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which has achieved coding productivity increases from 41.84% up to 94.71% without significant performance losses (below 3%) compared to those frameworks.

7. Song, Minjae, Hyunsuk Oh, Seungmin Seo, and Kyong-Ho Lee. "Map-Side Join Processing of SPARQL Queries Based on Abstract RDF Data Filtering." Journal of Database Management 30, no. 1 (January 2019): 22–40. http://dx.doi.org/10.4018/jdm.2019010102.

Abstract:
The amount of RDF data being published on the Web is increasing at a massive rate. MapReduce-based distributed frameworks have become the general trend in processing SPARQL queries against RDF data. Currently, query processing systems that use MapReduce have not been able to keep up with the increase in semantically annotated data, resulting in non-interactive SPARQL query processing. The principal reason is that intermediate query results from join operations in a MapReduce framework are so massive that they consume all available network bandwidth. In this article, the authors present an efficient SPARQL processing system that uses MapReduce and HBase. The system runs a job-optimized query plan using their proposed abstract RDF data to decrease both the number of jobs and the amount of input data. The authors also present an efficient algorithm that uses Map-side joins together with the abstract RDF data to filter out unneeded RDF data. Experimental results show that the proposed approach performs better on queries with a large amount of input data than previous works.
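
The Map-side join at the core of this system can be sketched in a few lines (an illustration of the general technique, not the authors' implementation): when one relation is small enough to replicate to every mapper, the join and the filtering both happen inside map(), and no intermediate results are shuffled across the network:

    # Small relation, replicated (broadcast) to every mapper as a dict.
    labels = {1: "Person", 2: "Place"}

    # Large relation, streamed through the mappers partition by partition.
    triples = [(1, "alice"), (2, "paris"), (1, "bob")]

    def map_side_join(partition, small_table):
        # Join inside the map phase: one hash lookup per record, no shuffle.
        for key, value in partition:
            if key in small_table:  # unneeded records are filtered map-side
                yield key, small_table[key], value

    for row in map_side_join(triples, labels):
        print(row)
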
8. Thabtah, Fadi, Suhel Hammoud, and Hussein Abdel-Jaber. "Parallel Associative Classification Data Mining Frameworks Based MapReduce." Parallel Processing Letters 25, no. 02 (June 2015): 1550002. http://dx.doi.org/10.1142/s0129626415500024.

Abstract:
Associative classification (AC) is a research topic that integrates association rules with classification in data mining to build classifiers. After the dissemination of the Classification-based Association Rule algorithm (CBA), the majority of its successors were developed to improve either CBA's prediction accuracy or the search for frequent ruleitems in the rule discovery step. Both of these steps place high demands on processing time and memory, especially for large training data sets or a low minimum support threshold. In this paper, we overcome the problem of mining large training data sets by proposing a new learning method that repeatedly transforms data between line and item spaces to quickly discover frequent ruleitems, generate rules, and subsequently rank and prune them. This learning method has been implemented in a parallel MapReduce (MR) algorithm called MRMCAR, which can be considered the first parallel AC algorithm in the literature. The method can be utilized in the corresponding steps of any AC or association rule mining algorithm and scales well compared with current horizontal or vertical methods. Two versions of the learning method (Weka, Hadoop) have been implemented and a number of experiments on different data sets have been conducted. The comparisons are based on classification accuracy and the time required for data initialization, frequent ruleitem discovery, rule generation, and rule pruning. The results reveal that MRMCAR is superior to both current AC mining algorithms and rule-based classification algorithms in classification accuracy.

9. Goncalves, Carlos, Luis Assuncao, and Jose C. Cunha. "Flexible MapReduce Workflows for Cloud Data Analytics." International Journal of Grid and High Performance Computing 5, no. 4 (October 2013): 48–64. http://dx.doi.org/10.4018/ijghpc.2013100104.

Abstract:
Data analytics applications handle large data sets subject to multiple processing phases, some of which can execute in parallel on clusters, grids, or clouds. Such applications can benefit from the MapReduce model, which only requires the end-user to define the application algorithms for input data processing and the map and reduce functions, but this creates a need to install and configure specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon cloud. To provide more flexibility in defining and adjusting the application configurations, as well as in specifying the composition of the application phases and their orchestration, the authors describe an approach for supporting MapReduce stages as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). The authors discuss how a text mining application is represented as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. Access to intermediate data produced during the MapReduce computations is supported by a data-sharing abstraction. The authors describe two implementations of this abstraction, one based on a shared tuple space and another based on an in-memory distributed key/value store. They describe the implementation of the framework, a set of developed tools, and their experimentation with the execution of the text mining algorithm over multiple Amazon EC2 (Elastic Compute Cloud) instances, and report speed-up and size-up results obtained for up to 20 EC2 instances and for corpus sizes of up to 97 million words.

10. Esposito, Christian, and Massimo Ficco. "Recent Developments on Security and Reliability in Large-Scale Data Processing with MapReduce." International Journal of Data Warehousing and Mining 12, no. 1 (January 2016): 49–68. http://dx.doi.org/10.4018/ijdwm.2016010104.

Abstract:
The demand for access to large volumes of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in a parallel fashion using non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for big data analytics. However, since such analytics is increasingly demanded in the context of mission-critical applications, security and reliability in MapReduce frameworks are strongly required in order to manage sensitive information and to obtain the right answer at the right time. In this paper, the authors present the main implementation of the MapReduce programming paradigm, provided by Apache under the name Hadoop. They illustrate the security and reliability concerns in the context of a large-scale data processing infrastructure, and they review the available solutions and their limitations in supporting security and reliability within MapReduce frameworks. The authors conclude by describing the ongoing evolution of these solutions and possible issues for improvement, which could be challenging research opportunities for academic researchers.

Dissertations / Theses on the topic "MAPREDUCE FRAMEWORKS"

1. de Souza Ferreira, Tharso. "Improving Memory Hierarchy Performance on MapReduce Frameworks for Multi-Core Architectures." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/129468.

Abstract:
The need to analyze large data sets from many different application fields has fostered the use of simplified programming models like MapReduce. Its current popularity is justified by being a useful abstraction to express data-parallel processing and by effectively hiding synchronization, fault tolerance, and load balancing management details from the application developer. MapReduce frameworks have also been ported to multi-core and shared memory computer systems. These frameworks propose to dedicate a different CPU core to each map or reduce task so that they execute concurrently. Also, the Map and Reduce phases share a common data structure where the main computations are applied. In this work we describe some limitations of current multi-core MapReduce frameworks. First, we describe the relevance of the data structure used to keep all input and intermediate data in memory. Current multi-core MapReduce frameworks are designed to keep all intermediate data in memory, and when executing applications with large data inputs, the available memory becomes too small to store all the framework's intermediate data, causing a severe performance loss. We propose a memory management subsystem that allows the intermediate data structures to process an unlimited amount of data through a disk-spilling mechanism, and we implement a way to manage concurrent disk access by all the threads participating in the computation. Finally, we study the effective use of the memory hierarchy by the data structures of MapReduce frameworks and propose a new implementation of partial MapReduce tasks over the input data set. The objective is to make better use of the cache and to eliminate references to data blocks that are no longer in use. Our proposal significantly reduces main memory usage and improves overall performance through increased cache usage.
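
The disk-spilling mechanism described above can be illustrated with a small sketch (ours, not the thesis code): an intermediate buffer that flushes to a spill file whenever it crosses a memory budget, so the volume of intermediate data is no longer bounded by RAM:

    import os
    import pickle
    import tempfile

    class SpillingBuffer:
        """Keeps intermediate pairs in memory, spilling to disk past a limit."""

        def __init__(self, max_in_memory=1000):
            self.max_in_memory = max_in_memory
            self.buffer = []
            self.spill_files = []

        def add(self, pair):
            self.buffer.append(pair)
            if len(self.buffer) >= self.max_in_memory:
                self._spill()

        def _spill(self):
            # Write the in-memory buffer to a temporary spill file.
            fd, path = tempfile.mkstemp(suffix=".spill")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(self.buffer, f)
            self.spill_files.append(path)
            self.buffer = []

        def __iter__(self):
            # Replay spilled chunks first, then what is still in memory.
            for path in self.spill_files:
                with open(path, "rb") as f:
                    yield from pickle.load(f)
            yield from self.buffer

    buf = SpillingBuffer(max_in_memory=2)
    for pair in [("a", 1), ("b", 2), ("c", 3)]:
        buf.add(pair)
    print(list(buf))  # [('a', 1), ('b', 2), ('c', 3)]
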
2. Kumaraswamy, Ravindranathan Krishnaraj. "Exploiting Heterogeneity in Distributed Software Frameworks." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/64423.

Abstract:
The objective of this thesis is to address the challenges faced in sustaining efficient, high-performance, and scalable Distributed Software Frameworks (DSFs), such as MapReduce, Hadoop, Dryad, and Pregel, for supporting data-intensive scientific and enterprise applications on emerging heterogeneous compute, storage, and network infrastructure. Large DSF deployments in the cloud continue to grow both in size and number, given that DSFs are cost-effective and easy to deploy. DSFs are becoming heterogeneous with the use of advanced hardware technologies and due to regular upgrades to the system. For instance, low-cost, power-efficient clusters that employ traditional servers along with specialized resources such as FPGAs, GPUs, PowerPC-, MIPS-, and ARM-based embedded devices, and high-end server-on-chip solutions will drive future DSF infrastructure. Similarly, high-throughput DSF storage is trending towards hybrid and tiered approaches that use large in-memory buffers, SSDs, etc., in addition to disks. However, the schedulers and resource managers of these DSFs assume the underlying hardware to be similar or homogeneous. Another problem faced by evolving applications is that they are typically complex workflows comprising different kernels. The kernels can be diverse, e.g., compute-intensive processing followed by data-intensive visualization, and each kernel has a different affinity for different hardware. Because DSFs do not understand the heterogeneity of the underlying hardware architecture and applications, existing resource managers cannot ensure an appropriate resource-application match for better performance and resource usage. In this dissertation, we design, implement, and evaluate DerbyhatS, an application-characteristics-aware resource manager for DSFs, which predicts the performance of the application under different hardware configurations and dynamically manages compute and storage resources according to application needs. We adopt a quantitative approach: we first study the detailed behavior of various Hadoop applications running on different hardware configurations and propose application-attuned dynamic system management to improve the resource-application match. We re-design the Hadoop Distributed File System (HDFS) into a multi-tiered storage system that seamlessly integrates heterogeneous storage technologies into HDFS. We also propose data placement and retrieval policies to improve the utilization of the storage devices based on their characteristics, such as I/O throughput and capacity. The DerbyhatS workflow scheduler is application-attuned and consists of two components: phi-Sched coupled with epsilon-Sched manages compute heterogeneity, while DUX coupled with AptStore manages the storage substrate to exploit heterogeneity. DerbyhatS will help realize the full potential of the emerging infrastructure for DSFs, e.g., cloud data centers, by ensuring application-attuned, dynamic, heterogeneous resource management.

3. Venumuddala, Ramu Reddy. "Distributed Frameworks Towards Building an Open Data Architecture." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc801911/.

Abstract:
Data is everywhere. Current technological advancements in digital and social media, and the ease with which different application services interact with a variety of systems, are generating tremendous volumes of data. Because of such varied services, data formats are no longer restricted to structured types like text; services also generate unstructured content such as social media data, videos, and images. Generated data is of no use unless it is stored and analyzed to derive value. Traditional database systems come with limitations on data format schemas, access rates, storage sizes, and so on. Hadoop is an Apache open source distributed framework that supports reliably storing huge datasets of differently formatted data on its file system, the Hadoop Distributed File System (HDFS), and processing the data stored on HDFS using the MapReduce programming model. This thesis is about building a data architecture using Hadoop and its related open source distributed frameworks to support a data flow pipeline on low-cost commodity hardware. The data flow components are data sourcing, storage management on HDFS, and a data access layer. The study also discusses a use case that utilizes the architecture components: Sqoop, a framework to ingest structured data from databases into Hadoop, and Flume, used to ingest semi-structured streaming Twitter JSON data onto HDFS, for analysis. The data sourced using Sqoop and Flume is analyzed using Hive for SQL-like analytics, and at a higher level of the data access layer, Hadoop is compared with an in-memory computing system, Spark. Significant differences in query execution performance are analyzed when working with the Hadoop and Spark frameworks. This integration helps ingest huge volumes of varied streaming JSON data to derive better value-based analytics using Hive and Spark.

4. Peddi, Sri Vijay Bharat. "Cloud Computing Frameworks for Food Recognition from Images." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32450.

Abstract:
Distributed cloud computing, when integrated with smartphone capabilities, contributes to building efficient multimedia e-health applications for mobile devices. Unfortunately, mobile devices alone do not possess the ability to run complex machine learning algorithms, which require large amounts of graphics processing and computational power. Therefore, offloading the computationally intensive part to the cloud reduces the overhead on the mobile device. In this thesis, we introduce two such distributed cloud computing models, which implement machine learning algorithms in the cloud in parallel, thereby achieving higher accuracy. The first model is based on MapReduce SVM, wherein, through the use of Hadoop, the system distributes the data and processes it across resizable Amazon EC2 instances. Hadoop uses a distributed processing architecture called MapReduce, in which a task is mapped to a set of servers for processing and the results are then reduced back to a single set. In the second method, we implement cloud virtualization, wherein we run our mobile application in the cloud using an Android x86 image. We describe a cloud-based virtualization mechanism for multimedia-assisted mobile food recognition, which allows users to control their virtual smartphone operations through a dedicated client application installed on their smartphone. The application continues to be processed on the virtual mobile image even if the user is disconnected for some reason. Using these two distributed cloud computing models, we achieved higher accuracy and reduced times for the overall execution of machine learning algorithms and calorie measurement methodologies, compared to implementing them on the mobile device alone.

5. Elteir, Marwa Khamis. "A MapReduce Framework for Heterogeneous Computing Architectures." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28786.

Abstract:
Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources following different architectures. This applies at the level of a single chip, a single node, and even supercomputers and large-scale clusters. With their impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, graphics processing units (GPUs) have become an integral part of these systems. GPUs deliver high peak performance; however, efficiently exploiting their computational power requires exploring a multi-dimensional space of optimization methodologies, which is challenging even for the well-trained expert. The complexity of this multi-dimensional space arises not only from the traditionally well-known but arduous task of architecture-aware GPU optimization at design and compile time, but also from the partitioning and scheduling of the computation across these heterogeneous resources. Even with programming models like the Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), the developer still needs to manage the data transfer between host and device and vice versa, orchestrate the execution of several kernels, and, more arduously, optimize the kernel code. In this dissertation, we aim to deliver a transparent parallel programming environment for heterogeneous resources by leveraging the power of the MapReduce programming model and the OpenCL programming language. We propose a portable architecture-aware framework that efficiently runs an application across heterogeneous resources, specifically AMD and NVIDIA GPUs, while hiding complex architectural details from the developer. To further enhance performance portability, we explore approaches for asynchronously and efficiently distributing the computation across heterogeneous resources. When applied to benchmarks and representative applications, our proposed framework significantly enhances performance, including up to 58% improvement over traditional approaches to task assignment and up to a 45-fold improvement over state-of-the-art MapReduce implementations.

6. Alkan, Sertan. "A Distributed Graph Mining Framework Based On Mapreduce." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12611588/index.pdf.

Abstract:
The frequent patterns hidden in a graph can reveal crucial information about the network the graph represents. Existing techniques to mine the frequent subgraphs in a graph database generally rely on the premise that the data can fit into the main memory of the device on which the computation takes place. Even though some algorithms are designed using highly optimized methods, many lack a solution to the problem of scalability. In this thesis, our aim is to find and enumerate the subgraphs that are at least as frequent as the designated threshold in a given graph. We propose a new distributed algorithm for the frequent subgraph mining problem that scales horizontally as the computing cluster size increases. The method uses a partitioning method and the MapReduce programming model to distribute the computation of frequent subgraphs. At the core of this algorithm, we use an existing graph partitioning method to split the given data in the distributed file system and to merge and join the computed subgraphs without losing information. The frequent subgraph computation in each split is done using another known method that can enumerate frequent patterns. Although current algorithms can efficiently find frequent patterns, they are not parallel or distributed algorithms: even when they partition the data, they are designed to work on a single machine. Furthermore, they are computationally expensive, not fault tolerant, and not designed to work on a distributed file system. Using the MapReduce paradigm, we distribute the computation of frequent patterns to every machine in a cluster. Our algorithm first bi-partitions the data via successive MapReduce jobs, then invokes another MapReduce job to compute the subgraphs in the partitions using CloseGraph, and finally recovers the whole set by invoking a series of MapReduce jobs that merge-join the previously found patterns. The implementation uses an open source MapReduce environment, Hadoop. In our experiments, our method scales to large graphs; as the graph data size grows, it performs better than the existing algorithms.

7. Wang, Yongzhi. "Constructing Secure MapReduce Framework in Cloud-based Environment." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2238.

Abstract:
MapReduce, a parallel computing paradigm, has been gaining popularity in recent years as cloud vendors offer MapReduce computation services on their public clouds. However, companies are still reluctant to move their computations to the public cloud for the following reason: in the current business model, the entire MapReduce cluster is deployed on the public cloud, and if the public cloud is not properly protected, the integrity and confidentiality of MapReduce applications can be compromised by attacks inside or outside of it. From the perspective of result integrity, if any computation nodes on the public cloud are compromised, those nodes can return incorrect task results and therefore render the final job result inaccurate. From the perspective of algorithmic confidentiality, as more and more companies devise innovative algorithms and deploy them to the public cloud, malicious attackers can reverse engineer those programs to uncover the algorithmic details and, therefore, compromise the intellectual property of those companies. In this dissertation, we propose to use a hybrid cloud architecture to defeat these two threats, with separate solutions for the result integrity and algorithmic confidentiality problems. To address the result integrity problem, we propose the Integrity Assurance MapReduce (IAMR) framework. IAMR performs result checking to guarantee high result accuracy of MapReduce jobs, even if the computation is executed on an untrusted public cloud. We implemented a prototype system for a real hybrid cloud environment and performed a series of experiments. Our theoretical simulations and experimental results show that IAMR can guarantee a very low job error rate while maintaining a moderate performance overhead. To address the algorithmic confidentiality problem, we focus on the program control flow and propose the Confidentiality Assurance MapReduce (CAMR) framework. CAMR performs Runtime Control Flow Obfuscation (RCFO) to protect the predicates of MapReduce jobs. We implemented a prototype system for a real hybrid cloud environment. The security analysis and experimental results show that CAMR defeats static analysis-based reverse engineering attacks, raises the bar for dynamic analysis-based attacks, and incurs a modest performance overhead.
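
The result-checking idea behind IAMR reduces to a toy sketch (our illustration; the framework's actual protocol and error model are more involved): run the same task on independently chosen workers and accept the result only when the copies agree:

    import random

    def worker(task_input, compromised=False):
        # Stand-in for a map task on a public-cloud node; a compromised
        # node may return a wrong result.
        correct = sum(task_input)
        return correct + 1 if compromised else correct

    def replicated_execution(task_input, nodes):
        # Run the task on two randomly chosen nodes and compare results.
        a, b = random.sample(nodes, 2)
        r1, r2 = worker(task_input, a), worker(task_input, b)
        if r1 != r2:
            raise RuntimeError("mismatch: re-execute on a trusted node")
        return r1

    nodes = [False, False, False, True]  # one hypothetical compromised node
    try:
        print(replicated_execution([1, 2, 3], nodes))
    except RuntimeError as err:
        print("integrity check failed:", err)
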
8. Zhang, Yue. "A Workload Balanced MapReduce Framework on GPU Platforms." Wright State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=wright1450180042.

9. Raja, Anitha. "A Coordination Framework for Deploying Hadoop MapReduce Jobs on Hadoop Cluster." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-196951.

Abstract:
Apache Hadoop is an open source framework that delivers reliable, scalable, and distributed computing. Hadoop services provide distributed data storage, data processing, data access, and security. MapReduce is the heart of the Hadoop framework and was designed to process vast amounts of data distributed over a large number of nodes. MapReduce has been used extensively to process structured and unstructured data in diverse fields such as e-commerce, web search, social networks, and scientific computation. Understanding the characteristics of Hadoop MapReduce workloads is the key to achieving improved configurations and refining system throughput. Thus far, MapReduce workload characterization in a large-scale production environment has not been well studied. This thesis project focuses on composing a Hadoop cluster (as an execution environment for data processing) to analyze two types of Hadoop MapReduce (MR) jobs via a proposed coordination framework, referred to as a workload translator. The outcome of this work includes: (1) a parametric workload model for the target MR jobs, (2) a cluster specification to develop an improved cluster deployment strategy using the model and coordination framework, and (3) better scheduling and hence better performance of jobs (i.e., shorter job completion time). We implemented a prototype of our solution using Apache Tomcat on (OpenStack) Ubuntu Trusty Tahr, which uses RESTful APIs to (1) create a Hadoop cluster version 2.7.2 and (2) scale the number of workers in the cluster up and down. The experimental results showed that with well-tuned parameters, MR jobs can achieve shorter job completion times and improved utilization of hardware resources. The target audience for this thesis is developers. As future work, we suggest adding parameters to develop a more refined workload model for MR and similar jobs.

10. Lakkimsetti, Praveen Kumar. "A framework for automatic optimization of MapReduce programs based on job parameter configurations." Kansas State University, 2011. http://hdl.handle.net/2097/12011.

Abstract:
Recently, cost-effective and timely processing of large datasets has been playing an important role in the success of many enterprises and of the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of processing, storage, and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. MapReduce has been widely used for large-scale data analysis in the cloud [5]. The system is well recognized for its elastic scalability and fine-grained fault tolerance. However, even to run a single program in a MapReduce framework, a number of tuning parameters have to be set by users or system administrators to increase the efficiency of the program. Users often run into performance problems because they are unaware of how to set these parameters, or because they don't even know that these parameters exist. With MapReduce being a relatively new technology, it is not easy to find qualified administrators [4]. The major objective of this project is to provide a framework that optimizes MapReduce programs that run on large datasets. This is done by executing the MapReduce program on a part of the dataset using stored parameter combinations, configuring the program with the most efficient combination found, and then executing the tuned program over the different datasets. Many MapReduce programs are used over and over again in applications like daily weather analysis, log analysis, and daily report generation, so once the parameter combination is set, it can be used efficiently on a number of datasets. This feature can go a long way towards improving the productivity of users who lack the skills to optimize programs themselves due to a lack of familiarity with MapReduce or with the data being processed.
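
The optimization loop described here boils down to evaluating stored parameter combinations on a slice of the dataset and keeping the fastest one. A hedged sketch (the tuning knobs and the job runner are hypothetical stand-ins):

    import itertools
    import time

    def run_job_on_sample(sample, params):
        # Stand-in for executing the MapReduce program on a dataset slice
        # under one configuration; returns elapsed wall-clock time.
        start = time.perf_counter()
        chunk = max(1, len(sample) // params["num_reducers"])
        _ = [sum(sample[i:i + chunk]) for i in range(0, len(sample), chunk)]
        return time.perf_counter() - start

    # Stored parameter combinations (invented knob names and values).
    grid = {"num_reducers": [2, 4, 8], "sort_buffer_mb": [100, 200]}
    combos = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]

    sample = list(range(100_000))  # small part of the full dataset
    best = min(combos, key=lambda p: run_job_on_sample(sample, p))
    print("best configuration:", best)  # reuse on the recurring datasets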

Book chapters on the topic "MAPREDUCE FRAMEWORKS"

1. Singh, Jaspreet, S. N. Panda, and Rajesh Kaushal. "Performance Evaluation of Big Data Frameworks: MapReduce and Spark." In Advances in Intelligent Systems and Computing, 1611–19. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-5903-2_167.

2. Dhanani, Jenish, Rupa Mehta, Dipti Rana, and Bharat Tidke. "Back-Propagated Neural Network on MapReduce Frameworks: A Survey." In Smart Innovations in Communication and Computational Sciences, 381–91. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-2414-7_35.

3. Noh, Hyunho, and Jun-Ki Min. "An Efficient Data Access Method Exploiting Quadtrees on MapReduce Frameworks." In Database Systems for Advanced Applications, 86–100. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40270-8_8.

4. Salto, Carolina, Gabriela Minetti, Enrique Alba, and Gabriel Luque. "Developing Genetic Algorithms Using Different MapReduce Frameworks: MPI vs. Hadoop." In Advances in Artificial Intelligence, 262–72. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00374-6_25.

5. Xu, Huanle, Ronghai Yang, Zhibo Yang, and Wing Cheong Lau. "Solving Large Graph Problems in MapReduce-Like Frameworks via Optimized Parameter Configuration." In Algorithms and Architectures for Parallel Processing, 525–39. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-27122-4_36.

6. Reinders, James, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, and Xinmin Tian. "Common Parallel Patterns." In Data Parallel C++, 323–52. Berkeley, CA: Apress, 2020. http://dx.doi.org/10.1007/978-1-4842-5574-2_14.

Abstract:
When we are at our best as programmers, we recognize patterns in our work and apply techniques that are time-proven to be the best solution. Parallel programming is no different, and it would be a serious mistake not to study the patterns that have proven to be useful in this space. Consider the MapReduce frameworks adopted for big data applications; their success stems largely from being based on two simple yet effective parallel patterns: map and reduce.
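
The two patterns compose directly; in plain Python rather than the book's Data Parallel C++ (a minimal illustration):

    from functools import reduce

    data = [1, 2, 3, 4]
    squared = map(lambda x: x * x, data)         # map: independent per-element work
    total = reduce(lambda a, b: a + b, squared)  # reduce: combine partial results
    print(total)  # 30
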
7. Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Hadoop Framework." In Big Data with Hadoop MapReduce, 47–111. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-2.

8. Suryawanshi, Sahiba, and Praveen Kaushik. "Efficient MapReduce Framework Using Summation." In Data, Engineering and Applications, 3–11. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6351-1_1.

9. Cho, Kyung Soo, Ji Yeon Lim, Jae Yeol Yoon, Young Hee Kim, Seung Kwan Kim, and Ung Mo Kim. "Opinion Mining in MapReduce Framework." In Communications in Computer and Information Science, 50–55. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-22365-5_7.

10. Liu, Xiufeng, Christian Thomsen, and Torben Bach Pedersen. "The ETLMR MapReduce-Based ETL Framework." In Lecture Notes in Computer Science, 586–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-22351-8_48.


Conference papers on the topic "MAPREDUCE FRAMEWORKS"

1. Lee, Haejoon, Minseo Kang, Sun-Bum Youn, Jae-Gil Lee, and YongChul Kwon. "An Experimental Comparison of Iterative MapReduce Frameworks." In CIKM'16: ACM Conference on Information and Knowledge Management. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2983323.2983647.

2. Wang, Haoyu, Haiying Shen, Charles Reiss, Arnim Jain, and Yunqiao Zhang. "Improved Intermediate Data Management for MapReduce Frameworks." In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2020. http://dx.doi.org/10.1109/ipdps47924.2020.00062.

3. Hsaini, Sara, Salma Azzouzi, and My El Hassan Charaf. "A Secure Testing Based Approach for Mapreduce Frameworks." In 2018 International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS). IEEE, 2018. http://dx.doi.org/10.1109/icecocs.2018.8610596.

4. Jakovits, Pelle, and Satish Narayana Srirama. "Evaluating MapReduce frameworks for iterative Scientific Computing applications." In 2014 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2014. http://dx.doi.org/10.1109/hpcsim.2014.6903690.

5. Haque, Ahsanul, and Latifur Khan. "MapReduce Based Frameworks for Classifying Evolving Data Stream." In 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW). IEEE, 2013. http://dx.doi.org/10.1109/icdmw.2013.145.

6. Ahmad, Maaz Bin Safeer, and Alvin Cheung. "Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications." In SIGMOD/PODS '18: International Conference on Management of Data. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3183713.3196891.

7. Vu, Trong-Tuan, and Fabrice Huet. "A Lightweight Continuous Jobs Mechanism for MapReduce Frameworks." In 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). IEEE, 2013. http://dx.doi.org/10.1109/ccgrid.2013.36.

8. Ghit, Bogdan, and Dick Epema. "Tyrex: Size-Based Resource Allocation in MapReduce Frameworks." In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). IEEE, 2016. http://dx.doi.org/10.1109/ccgrid.2016.82.

9. Rivas-Gomez, Sergio, Stefano Markidis, Erwin Laure, Keeran Brabazon, Oliver Perks, and Sai Narasimhamurthy. "Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks." In 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 2018. http://dx.doi.org/10.1109/hpcc/smartcity/dss.2018.00153.

10. Guo, Jia, and Gagan Agrawal. "Achieving Performance and Programmability for MapReduce(-Like) Frameworks." In 2018 IEEE 25th International Conference on High Performance Computing (HiPC). IEEE, 2018. http://dx.doi.org/10.1109/hipc.2018.00043.

