
Journal articles on the topic 'MAPREDUCE ARCHITECTURE'


1

Jiang, Tao, Huaxi Gu, Kun Wang, Xiaoshan Yu, and Yunfeng Lu. "BHyberCube: A MapReduce aware heterogeneous architecture for data center." Computer Science and Information Systems 14, no. 3 (2017): 611–27. http://dx.doi.org/10.2298/csis170202019t.

Abstract:
Some applications, like MapReduce, ask for heterogeneous network in data center network. However, the traditional network topologies, like fat tree and BCube, are homogeneous. MapReduce is a distributed data processing application. In this paper, we propose a BHyberCube network (BHC), which is a new heterogeneous network for MapReduce. Heterogeneous nodes and scalability issues are addressed considering the implementation of MapReduce in the existing topologies. Mathematical model is established to demonstrate the procedure of building a BHC. Comparisons of BHC and other topologies show the good properties BHC possesses for MapReduce. We also do simulations of BHC in multi-job injection and different probability of worker servers' communications scenarios respectively. The result and analysis show that the BHC could be a viable interconnection topology in today's data center for MapReduce.
2

Park, Jong-Hyuk, Hwa-Young Jeong, Young-Sik Jeong, and Min Choi. "REST-MapReduce: An Integrated Interface but Differentiated Service." Journal of Applied Mathematics 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/170723.

Abstract:
With the fast deployment of cloud computing, MapReduce architectures are becoming the major technologies for mobile cloud computing. The concept of MapReduce was first introduced as a novel programming model and implementation for a large set of computing devices. In this research, we propose a novel concept of REST-MapReduce, enabling users to use only the REST interface without using the MapReduce architecture. This approach provides a higher level of abstraction by integration of the two types of access interface, REST API and MapReduce. The motivation of this research stems from the slower response time for accessing simple RDBMS on Hadoop than direct access to RDBMS. This is because there is overhead to job scheduling, initiating, starting, tracking, and management during MapReduce-based parallel execution. Therefore, we provide a good performance for REST Open API service and for MapReduce, respectively. This is very useful for constructing REST Open API services on Hadoop hosting services, for example, Amazon AWS (Macdonald, 2005) or IBM Smart Cloud. For evaluating performance of our REST-MapReduce framework, we conducted experiments with Jersey REST web server and Hadoop. Experimental result shows that our approach outperforms conventional approaches.
3

Zhang, Bin, Jia Jin Le, and Mei Wang. "Effective ACPS-Based Rescheduling of Parallel Batch Processing Machines with MapReduce." Applied Mechanics and Materials 575 (June 2014): 820–24. http://dx.doi.org/10.4028/www.scientific.net/amm.575.820.

Abstract:
MapReduce is a highly efficient distributed and parallel computing framework, allowing users to readily manage large clusters in parallel computing. For the Big data search problem in a distributed computing environment based on the MapReduce architecture, in this paper we propose an Ant colony parallel search algorithm (ACPSMR) for Big data. It takes advantage of the group intelligence of the ant colony algorithm and its global parallel search and heuristic scheduling capabilities to solve the problem of low-efficiency multi-task parallel batch scheduling in MapReduce. We also extended the HDFS design in the MapReduce architecture so that it integrates effectively with MapReduce. The algorithm can then make the best of the scalability and high parallelism of MapReduce. The simulation experiment result shows that the new algorithm can take advantage of cloud computing to get good efficiency when mining Big data.
4

Mitra, Arnab, Anirban Kundu, Matangini Chattopadhyay, and Samiran Chattopadhyay. "On the Exploration of Equal Length Cellular Automata Rules Targeting a MapReduce Design in Cloud." International Journal of Cloud Applications and Computing 8, no. 2 (April 2018): 1–26. http://dx.doi.org/10.4018/ijcac.2018040101.

Abstract:
A MapReduce design with Cellular Automata (CA) is presented in this research article to facilitate load-reduced independent data processing and cost-efficient physical implementation in heterogeneous Cloud architecture. Equal Length Cellular Automata (ELCA) are considered for the design. This article explores ELCA rules and presents an ELCA based MapReduce design in cloud. New algorithms are presented for i) synthesis, ii) classification of ELCA rules, and iii) ELCA based MapReduce design in Cloud. Shuffling and efficient reduction of data volume are ensured in proposed MapReduce design.
5

Chen, Rong, and Haibo Chen. "Tiled-MapReduce." ACM Transactions on Architecture and Code Optimization 10, no. 1 (April 2013): 1–30. http://dx.doi.org/10.1145/2445572.2445575.

6

Loughran, S., Jose M. Alcaraz Calero, A. Farrell, J. Kirschnick, and J. Guijarro. "Dynamic Cloud Deployment of a MapReduce Architecture." IEEE Internet Computing 16, no. 6 (November 2012): 40–50. http://dx.doi.org/10.1109/mic.2011.163.

7

de Kruijf, M., and K. Sankaralingam. "MapReduce for the Cell Broadband Engine Architecture." IBM Journal of Research and Development 53, no. 5 (September 2009): 10:1–10:12. http://dx.doi.org/10.1147/jrd.2009.5429076.

8

Liu, Hanpeng, Wuqi Gao, and Junmin Luo. "Research on Intelligentization of Cloud Computing Programs Based on Self-awareness." International Journal of Advanced Network, Monitoring and Controls 8, no. 2 (June 1, 2023): 89–98. http://dx.doi.org/10.2478/ijanmc-2023-0060.

Abstract:
Through the research of MapReduce programming framework of cloud computing, the current MapReduce program only solves specific problems, and there is no design experience or design feature summary of MapReduce program, let alone formal description and experience inheritance and application of knowledge base. In order to solve the problem of intelligent cloud computing program, a general MapReduce program generation method is designed. This paper proposes the architecture of intelligent cloud computing by studying AORBCO model and combining cloud computing technology. According to the behavior control mechanism in AORBCO model, a program generation method of MapReduce in intelligent cloud computing is proposed. This method will extract entity information in input data set and entity information in knowledge base in intelligent cloud computing for similarity calculation, and extract the entity in the top order as key key-value pair information in intelligent cloud computing judgment data set. The data processing types are divided, and then aligned with each specific MapReduce capability, and the MapReduce program generation experiment is verified in the AORBCO model development platform. The experiment shows that the complexity of big data MapReduce program code is simplified, and the generated code execution efficiency is good.
9

Khudhair, Muslim Mohsin, Adil AL-Rammahi, and Furkan Rabee. "An innovative fractal architecture model for implementing MapReduce in an open multiprocessing parallel environment." Indonesian Journal of Electrical Engineering and Computer Science 30, no. 2 (May 1, 2023): 1059. http://dx.doi.org/10.11591/ijeecs.v30.i2.pp1059-1067.

Abstract:
One of the infrastructure applications that cloud computing offers as a service is parallel data processing. MapReduce is a type of parallel processing used more and more by data-intensive applications in cloud computing environments. MapReduce is based on a strategy called "divide and conquer," which uses regular computers, also called "nodes," to do processing in parallel. This paper looks at how open multiprocessing (OpenMP), the best shared-memory parallel programming model for high-performance computing, can be used with the proposed fractal network model in the MapReduce application. A well-known model, the cube, is used as a baseline for comparison with the fractal network model. Experiments demonstrated that the fractal model is preferable to the cube model. The fractal model achieved an average speedup of 2.7 and an efficiency rate of 67.7%. In contrast, the cube model could only reach an average speedup of 2.5 and an efficiency rate of 60.4%.
10

Marzuni, Saeed Mirpour, Abdorreza Savadi, Adel N. Toosi, and Mahmoud Naghibzadeh. "Cross-MapReduce: Data transfer reduction in geo-distributed MapReduce." Future Generation Computer Systems 115 (February 2021): 188–200. http://dx.doi.org/10.1016/j.future.2020.09.009.

11

Sharma, Yashvardhan, Saurabh Verma, Sumit Kumar, and Shivam U. "A Context-Based Performance Enhancement Algorithm for Columnar Storage in MapReduce with Hive." International Journal of Cloud Applications and Computing 3, no. 4 (October 2013): 38–50. http://dx.doi.org/10.4018/ijcac.2013100104.

Abstract:
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted the cluster-based architecture. In this context, MapReduce has emerged as a promising architecture for large scale data warehousing and data analytics on commodity clusters. The MapReduce framework offers several lucrative features such as high fault-tolerance, scalability and use of a variety of hardware from low to high range. But these benefits have resulted in substantial performance compromise. In this paper, we propose the design of a novel cluster-based data warehouse system, Daenyrys for data processing on Hadoop – an open source implementation of the MapReduce framework under the umbrella of Apache. Daenyrys is a data management system which has the capability to take decision about the optimum partitioning scheme for the Hadoop's distributed file system (DFS). The optimum partitioning scheme improves the performance of the complete framework. The choice of the optimum partitioning is query-context dependent. In Daenyrys, the columns are formed into optimized groups to provide the basis for the partitioning of tables vertically. Daenyrys has an algorithm that monitors the context of current queries and based on the observations, it re-partitions the DFS for better performance and resource utilization. In the proposed system, Hive, a MapReduce-based SQL-like query engine is supported above the DFS.
12

Esposito, Christian, and Massimo Ficco. "Recent Developments on Security and Reliability in Large-Scale Data Processing with MapReduce." International Journal of Data Warehousing and Mining 12, no. 1 (January 2016): 49–68. http://dx.doi.org/10.4018/ijdwm.2016010104.

Abstract:
The demand for access to a large volume of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in a parallel fashion, by using non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for Big Data Analytics. However, since such analytics is increasingly demanded within the context of mission-critical applications, security and reliability in MapReduce frameworks are strongly required in order to manage sensitive information, and to obtain the right answer at the right time. In this paper, the authors present the main implementation of the MapReduce programming paradigm, provided by Apache with the name of Hadoop. They illustrate the security and reliability concerns in the context of a large-scale data processing infrastructure. They review the available solutions and their limitations in supporting security and reliability within the context of MapReduce frameworks. The authors conclude by describing the ongoing evolution of such solutions, and the possible issues for improvements, which could be challenging research opportunities for academic researchers.
13

Xiao, Hao, Huajuan Zhang, Fen Ge, and Ning Wu. "A MapReduce architecture for embedded multiprocessor system-on-chips." IEICE Electronics Express 13, no. 2 (2016): 20151025. http://dx.doi.org/10.1587/elex.13.20151025.

14

Yin, ShouYi, ShengJia Shao, LeiBo Liu, and ShaoJun Wei. "MapReduce inspired loop mapping for coarse-grained reconfigurable architecture." Science China Information Sciences 57, no. 12 (December 2014): 1–14. http://dx.doi.org/10.1007/s11432-014-5198-1.

15

Veiga, Jorge, Roberto R. Expósito, Guillermo L. Taboada, and Juan Touriño. "Flame-MR: An event-driven architecture for MapReduce applications." Future Generation Computer Systems 65 (December 2016): 46–56. http://dx.doi.org/10.1016/j.future.2016.06.006.

16

Song, Minjae, Hyunsuk Oh, Seungmin Seo, and Kyong-Ho Lee. "Map-Side Join Processing of SPARQL Queries Based on Abstract RDF Data Filtering." Journal of Database Management 30, no. 1 (January 2019): 22–40. http://dx.doi.org/10.4018/jdm.2019010102.

Abstract:
The amount of RDF data being published on the Web is increasing at a massive rate. MapReduce-based distributed frameworks have become the general trend in processing SPARQL queries against RDF data. Currently, query processing systems that use MapReduce have not been able to keep up with the increase of semantic annotated data, resulting in non-interactive SPARQL query processing. The principal reason is that intermediate query results from join operations in a MapReduce framework are so massive that they consume all available network bandwidth. In this article, the authors present an efficient SPARQL processing system that uses MapReduce and HBase. The system runs a job optimized query plan using their proposed abstract RDF data to decrease the number of jobs and also decrease the amount of input data. The authors also present an efficient algorithm of using Map-side joins while also using the abstract RDF data to filter out unneeded RDF data. Experimental results show that the proposed approach demonstrates better performance when processing queries with a large amount of input data than those found in previous works.
17

Zhou, Jingren, Nicolas Bruno, Ming-Chuan Wu, Per-Ake Larson, Ronnie Chaiken, and Darren Shakib. "SCOPE: parallel databases meet MapReduce." VLDB Journal 21, no. 5 (June 28, 2012): 611–36. http://dx.doi.org/10.1007/s00778-012-0280-z.

18

Li, Yuan, Ahmed Eldawy, Jie Xue, Nadezda Knorozova, Mohamed F. Mokbel, and Ravi Janardan. "Scalable computational geometry in MapReduce." VLDB Journal 28, no. 4 (January 16, 2019): 523–48. http://dx.doi.org/10.1007/s00778-018-0534-5.

19

Weipeng, Jing, Tian Dongxue, Chen Guangsheng, and Li Yiyuan. "Research on Improved Method of Storage and Query of Large-Scale Remote Sensing Images." Journal of Database Management 29, no. 3 (July 2018): 1–16. http://dx.doi.org/10.4018/jdm.2018070101.

Abstract:
Traditional methods for handling massive remote sensing data suffer from low storage efficiency and poor scalability. This article presents a parallel processing method based on MapReduce and HBase. The filling of remote sensing images by the Hilbert curve makes the MapReduce method construct pyramids in parallel to reduce network communication between nodes. Then, the authors design a massive remote sensing data storage model composed of metadata storage model, index structure and filter column family. Finally, this article uses MapReduce frameworks to realize pyramid construction, storage and query of remote sensing data. The experimental results show that this method can effectively improve the speed of data writing and querying, and has good scalability.
20

Li, Hao, Lei Xue, Yan Zhu, Chun Li Yang, Li Li Chi, Zheng Wang, and Zhao Lu Zhang. "Medical Information Sharing Architecture Based on Cloud Computing." Applied Mechanics and Materials 130-134 (October 2011): 3095–101. http://dx.doi.org/10.4028/www.scientific.net/amm.130-134.3095.

Abstract:
The existing independent hospital information system hinders the cooperation and integration of healthcare service processes. In order to solve this problem, hospital information architecture based on cloud computing was put forward through analyzing the characteristics of medical information. The platform, system architecture and key technologies of the hospital information architecture were discussed. Software prototype was developed by using Hadoop, Tashi and MapReduce and implementation of this model proved to be feasible and effective.
21

Liang, Shuang, Shouyi Yin, Leibo Liu, Yike Guo, and Shaojun Wei. "A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration." IEEE Computer Architecture Letters 15, no. 2 (July 1, 2016): 69–72. http://dx.doi.org/10.1109/lca.2015.2458318.

22

Lu, Weiming, Yaoguang Wang, Jingyuan Jiang, Jian Liu, Yapeng Shen, and Baogang Wei. "Hybrid storage architecture and efficient MapReduce processing for unstructured data." Parallel Computing 69 (November 2017): 63–77. http://dx.doi.org/10.1016/j.parco.2017.08.008.

23

Wang, Wenzhu, Yusong Tan, Qingbo Wu, and Yaoxue Zhang. "micMR: An efficient MapReduce framework for CPU–MIC heterogeneous architecture." Journal of Parallel and Distributed Computing 93-94 (July 2016): 120–31. http://dx.doi.org/10.1016/j.jpdc.2016.04.007.

24

Jeong, Won Seob, and Won Woo Ro. "A Design of SIMT-based MapReduce Accelerator Architecture for Solid-state Drives." Journal of the Institute of Electronics and Information Engineers 56, no. 10 (October 31, 2019): 25–31. http://dx.doi.org/10.5573/ieie.2019.56.10.25.

25

Tan, Jian, Xiaoqiao Meng, and Li Zhang. "Delay tails in MapReduce scheduling." ACM SIGMETRICS Performance Evaluation Review 40, no. 1 (June 7, 2012): 5–16. http://dx.doi.org/10.1145/2318857.2254761.

26

Benelallam, Amine, Abel Gómez, Massimo Tisi, and Jordi Cabot. "Distributing relational model transformation on MapReduce." Journal of Systems and Software 142 (August 2018): 1–20. http://dx.doi.org/10.1016/j.jss.2018.04.014.

27

Fehér, Péter, Márk Asztalos, Tamás Vajk, Tamás Mészáros, and László Lengyel. "Detecting subgraph isomorphism with MapReduce." Journal of Supercomputing 73, no. 5 (October 6, 2016): 1810–51. http://dx.doi.org/10.1007/s11227-016-1885-6.

28

Hashem, Ibrahim Abaker Targio, Nor Badrul Anuar, Mohsen Marjani, Ejaz Ahmed, Haruna Chiroma, Ahmad Firdaus, Muhamad Taufik Abdullah, et al. "MapReduce scheduling algorithms: a review." Journal of Supercomputing 76, no. 7 (December 10, 2018): 4915–45. http://dx.doi.org/10.1007/s11227-018-2719-5.

29

Krishna.R, Ragav, and Sushma R. "Dedicated Client Architecture in MapReduce and its Implications on Performance Considerations." International Journal of Computer Applications 104, no. 8 (October 18, 2014): 1–3. http://dx.doi.org/10.5120/18219-9275.

30

Xu, Jian, and Bin Ma. "Study of Network Public Opinion Classification Method Based on Naive Bayesian Algorithm in Hadoop Environment." Applied Mechanics and Materials 519-520 (February 2014): 58–61. http://dx.doi.org/10.4028/www.scientific.net/amm.519-520.58.

Abstract:
In light of the excellent distributed storage and parallel processing features of a Hadoop cluster, a new kind of network public opinion classification method based on the Naive Bayes algorithm in a Hadoop environment is studied. The collected public opinion documents are stored locally according to the HDFS architecture, and their feature words are extracted in parallel in the MapReduce process. Thus the naive Bayesian classification algorithm is encapsulated for parallel execution on a cloud computing platform. The performance of the MapReduce-packaged Naive Bayesian classification algorithm is verified, and the results show that the algorithm execution speed is significantly improved compared to a single server. Its public opinion classification accuracy rate is more than 85%, which can effectively improve the classification performance of network public opinion and classification efficiency.
31

Peng, Zhihao, Poria Pirozmand, Masoumeh Motevalli, and Ali Esmaeili. "Genetic Algorithm-Based Task Scheduling in Cloud Computing Using MapReduce Framework." Mathematical Problems in Engineering 2022 (September 30, 2022): 1–11. http://dx.doi.org/10.1155/2022/4290382.

Abstract:
Task scheduling is an essential component of any distributed system because it routes tasks to appropriate resources for execution, such as grids, clouds, and peer-to-peer networks. Common scheduling algorithms include downsides, such as high temporal complexity, non-simultaneous processing of input tasks, and longer program execution times. Exploration-based scheduling algorithms prioritize tasks using a variety of methods, resulting in long execution times on heterogeneous distributed computing systems. As a result, task prioritization becomes a bottleneck in such systems. It is appropriate to prioritize tasks with the shortest execution time using faster algorithms. The genetic algorithm (GA) is one of the evolutionary approaches used to solve complex problems quickly. This paper proposes a parallel GA with a MapReduce architecture for scheduling jobs on cloud computing with various priority queues. The fundamental aim of this study is to employ a MapReduce architecture to minimize the total execution time of the task scheduling process in the cloud computing environment. The proposed method accomplishes task scheduling in two stages: first, the GA was used in conjunction with heuristic techniques to assign tasks to processors, and then the GA was used in conjunction with the MapReduce framework to assign jobs to processors. In our experiments, we consider heterogeneous resources that differ in their ability to execute various tasks, as well as running a job on different resources with varying execution durations. The results show that the proposed method outperforms other algorithms such as particle swarm optimization, whale optimization algorithm, moth-flame optimization, and intelligent water drops.
32

Diarra, Mamadou, and Telesphore B. Tiendrebeogo. "Performance Evaluation of Big Data Processing of Cloak-Reduce." International Journal of Distributed and Parallel systems 13, no. 1 (January 31, 2022): 13–22. http://dx.doi.org/10.5121/ijdps.2022.13102.

Abstract:
Big Data has introduced the challenge of storing and processing large volumes of data (text, images, and videos). The success of centralised exploitation of massive data on a node is outdated, leading to the emergence of distributed storage, parallel processing and hybrid distributed storage and parallel processing frameworks. The main objective of this paper is to evaluate the load balancing and task allocation strategy of our hybrid distributed storage and parallel processing framework CLOAK-Reduce. To achieve this goal, we first performed a theoretical approach of the architecture and operation of some DHT-MapReduce. Then, we compared the data collected from their load balancing and task allocation strategy by simulation. Finally, the simulation results show that CLOAK-Reduce C5R5 replication provides better load balancing efficiency, MapReduce job submission with 10% churn or no churn.
33

Rafique, M. Mustafa, Benjamin Rose, Ali R. Butt, and Dimitrios S. Nikolopoulos. "Supporting MapReduce on large-scale asymmetric multi-core clusters." ACM SIGOPS Operating Systems Review 43, no. 2 (April 21, 2009): 25–34. http://dx.doi.org/10.1145/1531793.1531800.

34

Tan, Jian, Yandong Wang, Weikuan Yu, and Li Zhang. "Non-work-conserving effects in MapReduce." ACM SIGMETRICS Performance Evaluation Review 42, no. 1 (June 20, 2014): 181–92. http://dx.doi.org/10.1145/2637364.2592007.

35

Sandholm, Thomas, and Kevin Lai. "MapReduce optimization using regulated dynamic prioritization." ACM SIGMETRICS Performance Evaluation Review 37, no. 1 (June 15, 2009): 299–310. http://dx.doi.org/10.1145/2492101.1555384.

36

Lee, Daewoo, Jin-Soo Kim, and Seungryoul Maeng. "Large-scale incremental processing with MapReduce." Future Generation Computer Systems 36 (July 2014): 66–79. http://dx.doi.org/10.1016/j.future.2013.09.010.

37

Kavitha, C., S. R. Srividhya, Wen-Cheng Lai, and Vinodhini Mani. "IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop." Electronics 11, no. 10 (May 17, 2022): 1599. http://dx.doi.org/10.3390/electronics11101599.

Abstract:
Hadoop is a framework for storing and processing huge amounts of data. With HDFS, large data sets can be managed on commodity hardware. MapReduce is a programming model for processing vast amounts of data in parallel. Mapping and reducing can be performed by using the MapReduce programming framework. A very large amount of data is transferred from Mapper to Reducer without any filtering or recursion, resulting in overdrawn bandwidth. In this paper, we introduce an algorithm called Inner MAPping Combiner (IMapC) for the map phase. This algorithm in the Mapper combines the values of recurring keys. In order to test the efficiency of the algorithm, different approaches were tested. According to the test, MapReduce programs that are implemented with the Default Combiner (DC) of IMapC will be 70% more efficient than those that are implemented without one. To make computations significantly faster, this work can be combined with MapReduce.
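To illustrate the general idea of combining values of recurring keys inside the mapper, here is a minimal single-process Python sketch. It is not the authors' IMapC algorithm or Hadoop code: the word-count workload, the function names, and the two toy input splits are all assumptions chosen only to show why local combining reduces the number of pairs sent to the reducer.

```python
from collections import defaultdict

def map_plain(lines):
    """Baseline mapper: emit one (word, 1) pair per occurrence."""
    return [(word, 1) for line in lines for word in line.split()]

def map_with_in_mapper_combining(lines):
    """Mapper that sums values of recurring keys locally before emitting."""
    local = defaultdict(int)
    for line in lines:
        for word in line.split():
            local[word] += 1
    # One (key, partial_sum) pair per distinct key in this split.
    return list(local.items())

def reduce_counts(pairs):
    """Reducer: sum all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, each handled by its own mapper.
splits = [["a b a", "b a"], ["c a", "c c"]]
plain = [p for s in splits for p in map_plain(s)]
combined = [p for s in splits for p in map_with_in_mapper_combining(s)]

print(len(plain), len(combined))
print(reduce_counts(plain) == reduce_counts(combined))
```

Both variants produce the same final counts; the combining mapper simply emits fewer intermediate pairs, which is the kind of transfer reduction the abstract describes.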
38

Liu, Xi Zi, and Ya Bin Xu. "Design of Peer-to-Peer Traffic Classification System Model Based on Cloud Computing." Applied Mechanics and Materials 182-183 (June 2012): 1347–51. http://dx.doi.org/10.4028/www.scientific.net/amm.182-183.1347.

Abstract:
The advantages and disadvantages of mainstream peer-to-peer (P2P) traffic classification technology in current applications are analyzed. As existing traffic classification tools fail to cope with very large traffic volumes and the continuous increase of network bandwidth, a cloud-based P2P traffic classification system model is proposed, which uses the distributed parallel computing architecture named MapReduce based on Hadoop.
39

Özgüven, Yavuz, Utku Gönener, and Süleyman Eken. "A Dockerized big data architecture for sports analytics." Computer Science and Information Systems, no. 00 (2022): 10. http://dx.doi.org/10.2298/csis220118010o.

Abstract:
The big data revolution has had an impact on sports analytics as well. Many large corporations have begun to see the financial benefits of integrating sports analytics with big data. When we rely on central processing systems to aggregate and analyze large amounts of sport data from many sources, we compromise the accuracy and timeliness of the data. As a response to these issues, distributed systems come to the rescue, and the MapReduce paradigm holds promise for large scale data analytics. We describe a big data architecture based on Docker containers with Apache Spark in this paper. We evaluate the architecture on four data-intensive case studies in sport analytics including structured analysis, streaming, machine learning approaches, and graph-based analysis.
40

Urazmatov, T. Q., and X. Sh Kuzibayev. "MapReduce and Apache spark: technology analysis, advantages and disadvantages." Journal of Physics: Conference Series 2373, no. 5 (December 1, 2022): 052008. http://dx.doi.org/10.1088/1742-6596/2373/5/052008.

Abstract:
Nowadays, it is absolutely illogical and impossible to process big data using traditional software methods and hardware, because the sheer volume of available data does not allow this. However, there are some effective ways to perform such operations. This article discusses the main problems and solutions for processing big data. Today, there are a number of technologies and algorithms that process and analyze big data. This article mainly discusses, analyzes, and summarizes the advantages and disadvantages of the MapReduce architecture and Apache Spark technology, and the results are presented in tabular form.
41

Feng, Yijia, and Lei Wang. "Distributed ItemCF Recommendation Algorithm Based on the Combination of MapReduce and Hive." Electronics 12, no. 16 (August 10, 2023): 3398. http://dx.doi.org/10.3390/electronics12163398.

Abstract:
The ItemCF algorithm is currently the most widely used recommendation algorithm in commercial applications. In the early days of recommender systems, most recommendation algorithms were run on a single machine rather than in parallel. This approach, coupled with the rapid growth of massive user behavior data in the current big data era, has led to a bottleneck in improving the execution efficiency of recommender systems. With the vigorous development of distributed technology, distributed ItemCF algorithms have become a research hotspot. Hadoop is a very popular distributed system infrastructure. MapReduce, which provides massive data computing, and Hive, a data warehousing tool, are the two core components of Hadoop, each with its own advantages and applicable scenarios. Scholars have already utilized MapReduce and Hive for the parallelization of the ItemCF algorithm. However, these pieces of literature make use of either MapReduce or Hive alone without fully leveraging the strengths of both. As a result, it has been difficult for parallel ItemCF recommendation algorithms to feature both simple and efficient implementation and high running efficiency. To address this issue, we proposed a distributed ItemCF recommendation algorithm based on the combination of MapReduce and Hive and named it HiMRItemCF. This algorithm divided ItemCF into six steps: deduplication, obtaining the preference matrixes of all users, obtaining the co-occurrence matrixes of all items, multiplying the two matrices to generate a three-dimensional matrix, aggregating the data of the three-dimensional matrix to obtain the recommendation scores of all users for all items, and sorting the scores in descending order, with Hive being used to carry out steps 1 and 6, and MapReduce for the other four steps involving more complex calculations and operations. The Hive jobs and MapReduce jobs are linked through Hive’s external tables. 
After implementing the proposed algorithm using Java and running the program on three publicly available user shopping behavior datasets, we found that compared to algorithms that only use MapReduce jobs, the program implementing the proposed algorithm has fewer lines of source code, lower cyclomatic complexity and Halstead complexity, and can achieve a higher speedup ratio and parallel computing efficiency when processing all datasets. These experimental results indicate that the parallel and distributed ItemCF algorithm proposed in this paper, which combines MapReduce and Hive, has both the advantages of concise and easy-to-understand code as well as high time efficiency.
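The six steps listed in this abstract can be illustrated on a single machine. The following is only a minimal in-memory sketch, not the paper's HiMRItemCF implementation: it uses neither Hive nor MapReduce, and the function name `item_cf`, the binary preference values, and the choice to skip items a user has already interacted with are all assumptions made for illustration.

```python
from collections import defaultdict


def item_cf(events, top_n=3):
    """Toy single-machine ItemCF over (user, item) events (hypothetical helper)."""
    # Step 1: deduplicate the raw (user, item) events.
    pairs = set(events)

    # Step 2: build each user's preference row (assumed binary: 1 if seen).
    prefs = defaultdict(set)
    for user, item in pairs:
        prefs[user].add(item)

    # Step 3: count how often two distinct items co-occur for the same user.
    cooc = defaultdict(int)
    for items in prefs.values():
        for a in items:
            for b in items:
                if a != b:
                    cooc[(a, b)] += 1

    # Steps 4 and 5: multiply preferences by co-occurrences and aggregate,
    # i.e. score(u, j) = sum over items i liked by u of cooc[i, j].
    # Assumption: items the user already has are not scored.
    scores = defaultdict(lambda: defaultdict(int))
    for user, items in prefs.items():
        for (a, b), count in cooc.items():
            if a in items and b not in items:
                scores[user][b] += count

    # Step 6: sort each user's scores in descending order (ties by item name).
    return {
        user: sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for user, s in scores.items()
    }


events = [
    ("u1", "a"), ("u1", "b"), ("u1", "a"),  # duplicate event on purpose
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"),
]
recommendations = item_cf(events)
```

In a distributed setting, each of these loops would correspond to a grouped aggregation over keys, which is why the steps map naturally onto Hive queries or MapReduce jobs.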
42

Xu, Yujie, Wenyu Qu, Zhiyang Li, Geyong Min, Keqiu Li, and Zhaobin Liu. "Efficient k-Means++ Approximation with MapReduce." IEEE Transactions on Parallel and Distributed Systems 25, no. 12 (December 1, 2014): 3135–44. http://dx.doi.org/10.1109/tpds.2014.2306193.

43

Fang, Wenbin, Bingsheng He, Qiong Luo, and Naga K. Govindaraju. "Mars: Accelerating MapReduce with Graphics Processors." IEEE Transactions on Parallel and Distributed Systems 22, no. 4 (April 2011): 608–20. http://dx.doi.org/10.1109/tpds.2010.158.

44

Lu, Yue, Yuguan Li, and Mohamed Y. Eltabakh. "Decorating the cloud: enabling annotation management in MapReduce." VLDB Journal 25, no. 3 (January 30, 2016): 399–424. http://dx.doi.org/10.1007/s00778-016-0422-9.

45

Tapiador, D., W. O’Mullane, A. G. A. Brown, X. Luri, E. Huedo, and P. Osuna. "A framework for building hypercubes using MapReduce." Computer Physics Communications 185, no. 5 (May 2014): 1429–38. http://dx.doi.org/10.1016/j.cpc.2014.02.010.

46

P, Naresh, Rajyalakshmi P, Krishna Vempati, and Saidulu D. "IMPROVING THE DATA TRANSMISSION SPEED IN CLOUD MIGRATION BY USING MAPREDUCE FOR BIGDATA." International Journal of Engineering Technology and Management Sciences 4, no. 5 (September 28, 2020): 73–75. http://dx.doi.org/10.46647/ijetms.2020.v04i05.013.

Abstract:
The cloud serves as data storage and is also used to transfer data from one cloud to another. Here, data exchange takes place among the cloud centers of organizations. Each cloud center stores a huge amount of data, which in turn makes it hard to store and retrieve information. Migrating the data raises issues such as low data transfer rates, end-to-end latency, and data storage problems. Because data from a single source is distributed among many cloud centers, the speed of migration is reduced. In distributed cloud computing, it is very difficult to transfer data quickly and securely. This paper explores MapReduce within the distributed cloud architecture, where MapReduce assists at each cloud. It strengthens the data migration process with the help of HDFS. Compared to the existing cloud migration approach, the proposed approach gives accurate results in terms of speed, time, and efficiency.
47

Dicky, Timothy, Alva Erwin, and Heru Purnomo Ipung. "Developing a Scalable and Accurate Job Recommendation System with Distributed Cluster System using Machine Learning Algorithm." Journal of Applied Information, Communication and Technology 7, no. 2 (March 17, 2021): 71–78. http://dx.doi.org/10.33555/jaict.v7i2.108.

Abstract:
The purpose of this research is to develop a job recommender system based on the Hadoop MapReduce framework so that the system scales when it processes big data. In addition, a machine learning algorithm is implemented inside the job recommender to produce accurate job recommendations. The project begins by collecting sample data to build an accurate job recommender system with a centralized program architecture. A job recommender with a distributed program architecture is then implemented using Hadoop MapReduce and deployed to a Hadoop cluster. After implementation, both systems are tested using a large number of applicants and job data, and the time each program requires to compute the data is recorded for analysis. Based on the experiments, we conclude that the recommender produces the most accurate results when the cosine similarity measure is used inside the algorithm. The centralized job recommender system processes the data faster than the distributed cluster job recommender system. However, as the size of the data grows, the centralized system will eventually lack the capacity to process it, while the distributed cluster job recommender is able to scale with the size of the data.
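The cosine similarity measure named in this abstract can be shown with a short example. This is a generic sketch of the measure, not the authors' recommender: the feature vectors, the job names, and the helper `rank_jobs` are hypothetical and used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Convention assumed here: similarity with a zero vector is 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(applicant, jobs):
    """Hypothetical helper: order job names by similarity to an applicant vector."""
    scored = [(name, cosine_similarity(applicant, vec)) for name, vec in jobs.items()]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


# Made-up skill vectors, e.g. (python, sql, design).
applicant = [1, 1, 0]
jobs = {"data_engineer": [1, 1, 0], "designer": [0, 0, 1]}
ranking = rank_jobs(applicant, jobs)
```

Because each applicant-job pair is scored independently, this computation can be partitioned across mappers, which is consistent with the scalability argument the abstract makes.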
48

Lin, Minghong, Li Zhang, Adam Wierman, and Jian Tan. "Joint optimization of overlapping phases in MapReduce." ACM SIGMETRICS Performance Evaluation Review 41, no. 3 (January 10, 2014): 16–18. http://dx.doi.org/10.1145/2567529.2567534.

49

Xin, Junchang, Zhiqiong Wang, Chen Chen, Linlin Ding, Guoren Wang, and Yuhai Zhao. "ELM∗: distributed extreme learning machine with MapReduce." World Wide Web 17, no. 5 (June 29, 2013): 1189–204. http://dx.doi.org/10.1007/s11280-013-0236-2.

50

Wang, Xiao Feng. "The Application of Hadoop in the Campus Cloud Computing System." Applied Mechanics and Materials 543-547 (March 2014): 3092–95. http://dx.doi.org/10.4028/www.scientific.net/amm.543-547.3092.

Abstract:
Based on the theory of cloud computing, this paper uses the Hadoop distributed computing framework and the MapReduce programming model to design and implement a campus cloud computing system for processing huge amounts of data. The system uses a three-layer architecture, scales flexibly, has a low development cost, and is easy to operate. It reduces the difficulty of parallel programming and can efficiently handle the analysis and processing of massive data.