Journal articles on the topic "Apache Hive"

To see the other types of publications on this topic, follow the link: Apache Hive.

Cite your sources in APA, MLA, Chicago, Harvard, and other styles.

Consult the top 26 journal articles for your research on the topic "Apache Hive".

Next to every source in the list of references there is an "Add to bibliography" button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Martinez-Mosquera, Diana, Rosa Navarrete, and Sergio Luján-Mora. "Efficient processing of complex XSD using Hive and Spark." PeerJ Computer Science 7 (August 17, 2021): e652. http://dx.doi.org/10.7717/peerj-cs.652.

Abstract:
eXtensible Markup Language (XML) files are widely used in industry due to their flexibility in representing numerous kinds of data. Multiple applications such as financial records, social networks, and mobile networks use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements or large real-world files. A great number of these files are generated each day, and this has driven the development of Big Data tools for their parsing and reporting, such as Apache Hive and Apache Spark. For these reasons, multiple studies have proposed new techniques and evaluated the processing of XML files with Big Data systems. However, such works usually address only the simplest XML schemas, even though real data sets are composed of complex schemas. Therefore, to shed light on complex XML schema processing for real-life applications with Big Data tools, we present an approach that combines three techniques. This comprises three main methods for parsing XML files: cataloging, deserialization, and positional explode. For cataloging, the elements of the XML schema are mapped into root, arrays, structures, values, and attributes. Based on these elements, the deserialization and positional explode are straightforwardly implemented. To demonstrate the validity of our proposal, we develop a case study by implementing a test environment to illustrate the methods, using real data sets provided by the performance management systems of two mobile network vendors. Our main results confirm the validity of the proposed method for different versions of Apache Hive and Apache Spark, report the query execution times for Apache Hive internal and external tables and for Apache Spark data frames, and compare the query performance of Apache Hive with that of Apache Spark. A further contribution is a case study in which a novel solution is proposed for data analysis in the performance management systems of mobile networks.
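The positional-explode step described above lends itself to a short illustration. The sketch below is only a hypothetical example with invented field names, not the authors' code: it uses PySpark's posexplode to flatten a multi-valued counter field while keeping element positions.

    # Illustrative sketch only: flattening a nested, multi-valued field with a
    # positional explode in PySpark, in the spirit of the paper's third step.
    # The schema and field names below are invented for the example.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import posexplode

    spark = SparkSession.builder.appName("xml-explode-sketch").getOrCreate()

    # Stand-in for records deserialized from a complex XSD: each row holds a
    # measurement object with an array of counter values.
    rows = [("cell-001", [10, 42, 7]), ("cell-002", [3, 8])]
    df = spark.createDataFrame(rows, ["object_id", "counters"])

    # posexplode() emits one output row per array element, keeping its position,
    # which is what a "positional explode" needs in order to preserve order.
    flat = df.select("object_id", posexplode("counters").alias("pos", "value"))
    flat.show()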
2

Jankatti, Santosh, Raghavendra B. K., Raghavendra S., and Meenakshi Meenakshi. "Performance evaluation of Map-reduce jar pig hive and spark with machine learning using big data." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 4 (August 1, 2020): 3811. http://dx.doi.org/10.11591/ijece.v10i4.pp3811-3818.

Abstract:
Big data is one of the biggest challenges today, as it requires huge processing power and good algorithms to support decision making. A Hadoop environment with Pig, Hive, machine learning, and other Hadoop ecosystem components is needed. The data comes from industries, from the many devices and sensors around us, and from social media sites. According to McKinsey, there will be a shortage of 15,000,000 big data professionals by the end of 2020. There are many technologies for solving the problems of big data storage and processing, such as Apache Hadoop, Apache Spark, and Apache Kafka. Here we analyse the processing speed for 4 GB of data on CloudxLab using Hadoop MapReduce with varying numbers of mappers and reducers, Pig scripts, Hive queries, and the Spark environment together with machine learning technology. From the results we can say that machine learning with Hadoop enhances processing performance along with Spark; that Spark is better than Hadoop MapReduce, Pig, and Hive; and that Spark with Hive and machine learning gives the best performance compared with Pig, Hive, and a Hadoop MapReduce jar.
3

Wankhede, Manish, Vijay Trivedi, and Vineet Richhariya. "Location based Analysis of Twitter Data using Apache Hive." International Journal of Computer Applications 153, no. 10 (November 17, 2016): 21–26. http://dx.doi.org/10.5120/ijca2016912170.

4

Ravindra, Padmashree, and Kemafor Anyanwu. "Nesting Strategies for Enabling Nimble MapReduce Dataflows for Large RDF Data." International Journal on Semantic Web and Information Systems 10, no. 1 (January 2014): 1–26. http://dx.doi.org/10.4018/ijswis.2014010101.

Abstract:
Graph and semi-structured data are usually modeled in relational processing frameworks as "thin" relations (node, edge, node), and processing such data involves many join operations. Intermediate results of joins with multi-valued attributes or relationships contain redundant subtuples due to repetition of single-valued attributes. The amount of redundant content is high for real-world multi-valued relationships in social network (millions of Twitter followers of popular celebrities) or biological (multiple references to related proteins) datasets. In MapReduce-based platforms such as Apache Hive and Pig, redundancy in intermediate results contributes avoidable costs to the overall I/O, sorting, and network transfer overhead of join-intensive workloads due to longer workflows. Consequently, providing techniques for dealing with such redundancy will enable more nimble execution of such workflows. This paper argues for the use of a nested data model for representing intermediate data concisely, using nesting-aware dataflow operators that allow for lazy and partial unnesting strategies. This approach reduces the overall I/O and network footprint of a workflow by concisely representing intermediate results during most of a workflow's execution, until complete unnesting is absolutely necessary. The proposed strategies are integrated into Apache Pig, and experimental evaluation over real-world and synthetic benchmark datasets confirms their superiority over relational-style MapReduce systems such as Apache Pig and Hive.
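The core idea, representing a multi-valued relationship as a nested collection so intermediate join results do not repeat single-valued attributes, can be sketched in a few lines. The example below is an illustrative PySpark analogue with invented data, not the authors' Pig-based nesting operators.

    # Illustrative sketch (not the authors' Pig operators): keeping a multi-valued
    # relationship nested with collect_list() so that single-valued attributes are
    # not repeated in the intermediate join result. All names are invented.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("nested-join-sketch").getOrCreate()

    users = spark.createDataFrame(
        [("u1", "Alice"), ("u2", "Bob")], ["user_id", "name"])
    follows = spark.createDataFrame(
        [("u1", "u2"), ("u1", "u3"), ("u1", "u4")], ["user_id", "follows_id"])

    # Flat join: Alice's single-valued attributes are repeated once per follower.
    flat = users.join(follows, "user_id")
    flat.show()

    # Nested alternative: fold the multi-valued side into one array per user first,
    # so the join produces exactly one row per user and unnesting can be deferred.
    nested = users.join(
        follows.groupBy("user_id").agg(F.collect_list("follows_id").alias("follows")),
        "user_id")
    nested.show(truncate=False)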
5

Hacimahmud, Abdullayev Vugar, Ragimova Nazila Ali, and Khalilov Matlab Etibar. "The research of social processes at the university using big data." MATEC Web of Conferences 348 (2021): 01003. http://dx.doi.org/10.1051/matecconf/202134801003.

Abstract:
The volume of information in the 21st century is growing at a rapid pace, and big data technologies are used to process modern information. This article discusses the use of big data technologies to implement monitoring of social processes. The characteristics and principles of big data are reviewed, together with big data applications in several areas. Particular attention is paid to the interactions of big data and sociology, considering digital sociology and the computational social sciences. One of the main objects of study in sociology is social processes; the article describes the types of social processes and their monitoring. As an example, monitoring of social processes at a university is implemented. The following technologies are used for the realization of social process monitoring: 1010data products (1010edge, 1010connect, 1010reveal, 1010equities), Apache Software Foundation products (Apache Hive, Apache Chukwa, Apache Hadoop, Apache Pig), the MapReduce framework, the R language, the Pandas library, NoSQL, etc. In particular, the article examines the use of the MapReduce model for monitoring social processes at the university.
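As a rough illustration of the MapReduce model the article applies to monitoring, the following library-free Python sketch counts invented monitoring events through explicit map, shuffle and reduce phases; it is not taken from the article itself.

    # Minimal, library-free illustration of the MapReduce pattern mentioned in the
    # abstract: counting monitored events per category. Input records are invented.
    from collections import defaultdict

    records = ["login", "forum_post", "login", "assignment", "forum_post", "login"]

    # Map phase: emit (key, 1) pairs.
    mapped = [(event, 1) for event in records]

    # Shuffle phase: group values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: aggregate each group.
    counts = {key: sum(values) for key, values in groups.items()}
    print(counts)  # e.g. {'login': 3, 'forum_post': 2, 'assignment': 1}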
6

Ritika Siril Paul, Yazala, and Dilipkumar A. Borikar. "An Approach To Twitter Sentiment Analysis Over Hadoop." International Journal of Engineering & Technology 7, no. 4.5 (September 22, 2018): 374. http://dx.doi.org/10.14419/ijet.v7i4.5.20110.

Abstract:
Sentiment analysis is the process of identifying people's attitudes and emotional states from the language they use on social websites or other sources. The main aim is to identify a set of potential features in a review and extract the opinion expressions for those features by making full use of their associations. Twitter has now become a routine for people around the world to post thousands of reactions and opinions on every topic, every second of every single day. It is like one big psychological database that is constantly being updated and that can be used to analyze the sentiments of people. Hadoop is one of the best options available for Twitter data sentiment analysis, and it also works for distributed big data, streaming data, text data, etc. This paper provides an efficient mechanism to perform sentiment analysis/opinion mining on Twitter data over the Hortonworks Data Platform, which provides Hadoop on Windows, with the assistance of Apache Flume, Apache HDFS and Apache Hive.
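A pipeline of this kind typically ends in a HiveQL aggregation over the ingested tweets. The sketch below, with an assumed table layout and the third-party PyHive client standing in for the Hive access layer, is only a hypothetical illustration and not the mechanism described in the paper.

    # Sketch only: querying tweet sentiment counts from a Hive table. The table
    # name, columns and connection details are assumptions for illustration; the
    # paper describes Flume -> HDFS -> Hive, not this specific client code.
    from pyhive import hive  # third-party client for HiveServer2

    conn = hive.Connection(host="localhost", port=10000, database="default")
    cursor = conn.cursor()

    # Aggregate tweets by the sentiment label computed upstream in the pipeline.
    cursor.execute("""
        SELECT sentiment, COUNT(*) AS tweet_count
        FROM tweets
        GROUP BY sentiment
    """)
    for sentiment, tweet_count in cursor.fetchall():
        print(sentiment, tweet_count)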
7

Chang, Bao-Rong, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo, and Chi-Chung Chen. "Integration and optimization of multiple big data processing platforms." Engineering Computations 33, no. 6 (August 1, 2016): 1680–704. http://dx.doi.org/10.1108/ec-08-2015-0247.

Abstract:
Purpose – The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in a big data environment.
Design/methodology/approach – First, the integration of Apache Hive, Cloudera Impala and BDAS Shark makes the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed for fast caching of query results. Therefore, if users issue the same SQL command, the same result is returned rapidly from the cache system instead of suffering repeated searches in a big data warehouse and taking a longer time to retrieve.
Findings – As a result, the proposed approach significantly improves the overall performance and dramatically reduces the search time when querying a database, especially for highly repeatable SQL commands under multi-user mode.
Research limitations/implications – Currently, Shark's latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7. Using Oracle JDK8 or OpenJDK will cause serious errors, and some software will be unable to run.
Practical implications – One problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that the sequential writing into the in-memory cache wastes time.
Originality/value – When the remaining memory capacity is 2 GB or less on each server, Impala and Shark will do a lot of page swapping, causing extremely low performance. When the data scale is larger, it may cause a JVM I/O exception and make the program crash. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala's consumption of memory resources is between those of Shark and Hive, and this amount of remaining memory is sufficient for Impala's maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory at Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when remaining memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.
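The selection rule quoted in the Originality/value section maps directly onto a small decision function. The following sketch restates those thresholds in Python for clarity; it is not the authors' optimizer code.

    # Sketch of the selection rule described in the abstract (not the authors'
    # optimizer code): pick the query platform from the fraction of memory left.
    def choose_platform(remaining_memory_fraction: float) -> str:
        """Return the platform name for a given fraction of free cluster memory."""
        if remaining_memory_fraction < 0.15:      # below Level 2 -> disk-oriented Hive
            return "Hive"
        elif remaining_memory_fraction < 0.75:    # between Level 2 and Level 3 -> Impala
            return "Impala"
        else:                                     # ample memory -> in-memory Shark
            return "Shark"

    # With 20 GB allocated per server, 3 GB free is the 15 percent boundary.
    print(choose_platform(3 / 20))    # Impala
    print(choose_platform(0.05))      # Hive
    print(choose_platform(0.80))      # Shark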
8

Yu, Dongjin, Wensheng Dou, Zhixiang Zhu, and Jiaojiao Wang. "Materialized View Selection Based on Adaptive Genetic Algorithm and Its Implementation with Apache Hive." International Journal of Computational Intelligence Systems 8, no. 6 (2015): 1091. http://dx.doi.org/10.1080/18756891.2015.1113744.

9

Tidke, Bharat, Rupa Mehta, Dipti Rana, and Hullash Jangir. "Topic Sensitive User Clustering Using Sentiment Score and Similarity Measures." International Journal of Web-Based Learning and Teaching Technologies 15, no. 2 (April 2020): 34–45. http://dx.doi.org/10.4018/ijwltt.2020040103.

Abstract:
Social media data (SMD) is driven by statistical and analytical technologies to obtain information for various decisions. SMD is vast and evolutionary in nature, which makes traditional data warehouses ill suited. This research aims to propose and implement a novel framework that analyzes tweet data from an online social networking site (OSN; i.e., Twitter). The authors fetch streaming tweets from the Twitter API using Apache Flume to detect clusters of users having similar sentiment. The proposed approach utilizes a scalable and fault-tolerant system (i.e., Hadoop) that typically harnesses HDFS for data storage and the map-reduce paradigm for data processing. Apache Hive is used on top of Hadoop for querying the data. Experiments are performed to test the scalability of the proposed framework by examining various sizes of data. The authors' goal is to handle big social data effectively, using cost-effective tools for fetching as well as querying unstructured data and algorithms for analysing scalable, uninterrupted data streams with finite memory and resources.
10

Gomathy, Dr C. K. "Efficient Transfer of data from RDBMS to HDFS and conversion to JSON format." International Journal for Research in Applied Science and Engineering Technology 9, no. 10 (October 31, 2021): 1869–71. http://dx.doi.org/10.22214/ijraset.2021.38710.

Abstract:
Apache Sqoop is mainly used to efficiently transfer large volumes of data between Apache Hadoop and relational databases. It offloads certain tasks, such as ETL (extract, transform, load) processing, from an enterprise data warehouse to Hadoop for efficient execution at a much lower cost. Here we first import a table residing in a MySQL database with the help of the command-line application Sqoop; because new rows may be added or existing rows updated, the query would otherwise have to be executed again. With the help of our project there is no need to re-execute queries: we use a Sqoop job, which bundles the complete import commands. After the import, we retrieve the data from Hive using Java JDBC and convert it to JSON format, which organizes the data in an easy-to-access manner, using the GSON library.
Keywords: Sqoop, JSON, GSON, Maven, JDBC
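The retrieval-and-conversion step is described with Java JDBC and the GSON library; a hedged Python analogue of the same idea, using the PyHive client and the standard json module with an assumed table layout, might look like this.

    # Python analogue of the step described in the abstract (the paper uses Java
    # JDBC with the GSON library): read rows from a Hive table that Sqoop imported
    # and serialize them as JSON. Table/column names and the connection are assumed.
    import json
    from pyhive import hive

    conn = hive.Connection(host="localhost", port=10000, database="default")
    cursor = conn.cursor()
    cursor.execute("SELECT id, name, salary FROM employees")

    columns = [desc[0] for desc in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]

    with open("employees.json", "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2)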
11

Saha, Sujan, and Sukumar Mandal. "Application of tools to support Linked Open Data." Library Hi Tech News 38, no. 6 (October 18, 2021): 21–24. http://dx.doi.org/10.1108/lhtn-09-2021-0060.

Abstract:
Purpose – These projects aim to improve library services for users in the future by combining Linked Open Data (LOD) technology with data visualization, displaying and analysing search results in an intuitive manner. These services are enhanced by integrating various LOD technologies into the authority control system.
Design/methodology/approach – The technology known as LOD is used to access, recycle, share, exchange and disseminate information, among other things. The applicability of Linked Data technologies for the development of library information services is evaluated in this study.
Findings – Apache Hadoop is used for rapidly storing and processing massive Linked Data data sets. Apache Spark is a free and open-source data processing tool. Hive is a SQL-based data warehouse that enables data scientists to write, read and manage petabytes of data.
Originality/value – The distributed large data storage system Apache HBase does not use SQL. This study's goal is to search the geographic, authority and bibliographic databases for relevant links found on various websites. When data items are linked together, all of the data bits are linked together as well. The study observed and evaluated the tools and processes and recorded each data item's URL. As a result, data can be combined across silos, enhanced by third-party data sources and contextualized.
12

Singh, Harbhajan, and Vijay Dhir. "Comparative Study of Sentiment Analysis of Type 1 and Type 2 Diabetic Patients Using Apache Flume and Hive." Webology 19, no. 1 (January 20, 2022): 4035–54. http://dx.doi.org/10.14704/web/v19i1/web19266.

Abstract:
The Hadoop ecosystem platform has been used to perform sentiment analysis of opinions regarding type 1 and type 2 diabetes. This research work concentrates on exploring public opinion about assumptions, thinking and behavior towards the condition of type 1 and type 2 diabetes. Tweets from Twitter have been taken as the dataset for the sentiment analysis. After conducting a set of repeated experiments, it has been concluded that people feel more negative about type 2 diabetes: they are more pessimistic about the diet and treatment of type 2 diabetes. The present paper is divided into four sections. The first section gives an overview of sentiment analysis and diabetes. The proposed work and methodology are included in the second section. The third section contains the whole experimental work supporting the results. In the fourth section, results regarding opinions about type 1 and type 2 diabetes are illustrated using bar charts obtained from the experiments, followed by concluding remarks, the future scope and the significance of the research.
13

Agarwal, Dr Ujjwal. "MAPREDUCE: INSIGHT ANALYSIS OF BIG DATA VIA PARALLEL DATA PROCESSING USING JAVA PROGRAMMING, HIVE AND APACHE PIG." International Journal of Advanced Research in Computer Science 9, no. 1 (February 20, 2018): 536–40. http://dx.doi.org/10.26483/ijarcs.v9i1.5414.

14

Yue, Hang. "Unstructured Healthcare Data Archiving and Retrieval Using Hadoop and Drill." International Journal of Big Data and Analytics in Healthcare 3, no. 2 (July 2018): 28–44. http://dx.doi.org/10.4018/ijbdah.2018070103.

Abstract:
A healthcare hybrid Hadoop ecosystem is analyzed for unstructured healthcare data archives. This healthcare hybrid Hadoop ecosystem is composed of components such as Pig, Hive, Sqoop, ZooKeeper, the Hadoop Distributed File System (HDFS), MapReduce and HBase. In addition, Apache Drill is applied for unstructured healthcare data retrieval. This article discusses the combination of Hadoop and Drill for data analysis applications. Based on the analysis of the Hadoop components (including the HBase design) and the case studies of Drill query design for different unstructured healthcare data, the Hadoop ecosystem and Drill are valid tools to integrate and access voluminous, complex healthcare data. They can improve healthcare systems, achieve savings on patient care costs, optimize the healthcare supply chain and infer useful knowledge from noisy and heterogeneous healthcare data sources.
15

Sharma, Yashvardhan, Saurabh Verma, Sumit Kumar, and Shivam U. "A Context-Based Performance Enhancement Algorithm for Columnar Storage in MapReduce with Hive." International Journal of Cloud Applications and Computing 3, no. 4 (October 2013): 38–50. http://dx.doi.org/10.4018/ijcac.2013100104.

Abstract:
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted the cluster-based architecture. In this context, MapReduce has emerged as a promising architecture for large scale data warehousing and data analytics on commodity clusters. The MapReduce framework offers several lucrative features such as high fault-tolerance, scalability and use of a variety of hardware from low to high range. But these benefits have resulted in substantial performance compromise. In this paper, we propose the design of a novel cluster-based data warehouse system, Daenyrys for data processing on Hadoop – an open source implementation of the MapReduce framework under the umbrella of Apache. Daenyrys is a data management system which has the capability to take decision about the optimum partitioning scheme for the Hadoop's distributed file system (DFS). The optimum partitioning scheme improves the performance of the complete framework. The choice of the optimum partitioning is query-context dependent. In Daenyrys, the columns are formed into optimized groups to provide the basis for the partitioning of tables vertically. Daenyrys has an algorithm that monitors the context of current queries and based on the observations, it re-partitions the DFS for better performance and resource utilization. In the proposed system, Hive, a MapReduce-based SQL-like query engine is supported above the DFS.
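The context-based column grouping that Daenyrys performs can be illustrated, very loosely, by counting which columns recent queries access together and merging frequently co-accessed columns. The toy sketch below uses invented queries and is not the algorithm proposed in the paper.

    # Toy sketch of the idea behind context-based vertical partitioning (not the
    # Daenyrys algorithm itself): count how often pairs of columns are accessed by
    # the same query, then group columns that frequently co-occur.
    from collections import Counter
    from itertools import combinations

    # Invented query workload: the set of columns each recent query touched.
    query_columns = [
        {"user_id", "name"},
        {"user_id", "name", "city"},
        {"salary", "dept"},
        {"salary", "dept", "hired"},
    ]

    co_access = Counter()
    for cols in query_columns:
        for pair in combinations(sorted(cols), 2):
            co_access[pair] += 1

    # Greedily merge columns whose co-access count meets a threshold.
    threshold = 2
    groups = []
    for (a, b), count in co_access.most_common():
        if count < threshold:
            continue
        for group in groups:
            if a in group or b in group:
                group.update({a, b})
                break
        else:
            groups.append({a, b})

    print(groups)  # e.g. [{'name', 'user_id'}, {'dept', 'salary'}]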
16

Deepika, Deepika, Ompal Singh, Adarsh Anand, and Jagvinder Singh. "SDE based Unified Scheme for Developing Entropy Prediction Models for OSS." International Journal of Mathematical, Engineering and Management Sciences 6, no. 1 (October 29, 2020): 207–22. http://dx.doi.org/10.33889/ijmems.2021.6.1.013.

Abstract:
Today, in order to meet users' requirements, software must be modified continually. To incorporate these modifications and requirements, numerous changes are made to the code of the software, and over time these changes make the software complex. Broadly, three types of code changes occur in the source code, namely bug repair, feature enhancement and addition of new features, and these changes introduce uncertainty into the bug removal rate. In this paper, these uncertainties have been explicitly modeled, and using three-dimensional Wiener processes that describe the three types of fluctuation, we have developed an entropy prediction modeling framework with a unified approach. The analytical solution of the equation is obtained by interpreting it as an Itô process. The models are fitted on three real-life projects of Apache open source software (OSS), namely Avro, Hive and Pig. The experimental findings show that the present models produce accurate estimation results and have strong predictive skills.
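To make the modelling idea concrete, the sketch below simulates an entropy path driven by a drift term plus three independent Wiener noise components, one per change type. The parameters and the additive form are illustrative assumptions, not the authors' fitted model.

    # Illustrative simulation only (not the authors' exact SDE or parameters):
    # an entropy path driven by drift plus three independent Wiener noise terms,
    # one per change type (bug repair, enhancement, new features).
    import numpy as np

    rng = np.random.default_rng(42)
    steps, dt = 200, 0.05
    mu = 0.8                             # deterministic drift of entropy growth
    sigmas = np.array([0.3, 0.2, 0.4])   # noise scale per change type

    # dW ~ Normal(0, sqrt(dt)) increments for each of the three Wiener processes.
    dW = rng.normal(0.0, np.sqrt(dt), size=(steps, 3))
    increments = mu * dt + dW @ sigmas   # combine the three fluctuation sources
    entropy = np.concatenate([[0.0], np.cumsum(increments)])

    print(entropy[-1])  # simulated entropy level at the final time step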
17

Mazzilli Ciraulo, Barbara, Anne-Laure Melchior, Daniel Maschmann, Ivan Yu Katkov, Anaëlle Halle, Françoise Combes, Joseph D. Gelfand, and Aisha Al Yazeedi. "Two interacting galaxies hiding as one, revealed by MaNGA." Astronomy & Astrophysics 653 (September 2021): A47. http://dx.doi.org/10.1051/0004-6361/202141319.

Abstract:
Given their prominent role in galaxy evolution, it is of paramount importance to unveil galaxy interactions and merger events and to investigate the underlying mechanisms. The use of high-resolution data makes it easier to identify merging systems, but it can still be challenging when the morphology does not show any clear galaxy pair or gas bridge. Characterising the origin of puzzling kinematic features can help reveal complicated systems. Here, we present a merging galaxy, MaNGA 1-114955, in which we highlighted the superimposition of two distinct rotating discs along the line of sight. These counter-rotating objects both lie on the star-forming main sequence but display perturbed stellar velocity dispersions. The main galaxy presents off-centred star formation as well as off-centred high-metallicity regions, supporting the scenario of recent starbursts, while the secondary galaxy hosts a central starburst that coincides with an extended radio emission, in excess with respect to star formation expectations. Stellar mass as well as dynamical mass estimates agree towards a mass ratio within the visible radius of 9:1 for these interacting galaxies. We suggest that we are observing a pre-coalescence stage of a merger. The primary galaxy accreted gas through a past first pericentre passage about 1 Gyr ago and more recently from the secondary gas-rich galaxy, which exhibits an underlying active galactic nucleus. Our results demonstrate how a galaxy can hide another one and the relevance of a multi-component approach for studying ambiguous systems. We anticipate that our method will be efficient at unveiling the mechanisms taking place in a sub-sample of galaxies observed by the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, all of which exhibit kinematic features of a puzzling origin in their gas emission lines.
18

Gutsche, Oliver, and Igor Mandrichenko. "Striped Data Analysis Framework." EPJ Web of Conferences 245 (2020): 06042. http://dx.doi.org/10.1051/epjconf/202024506042.

Abstract:
A columnar data representation is known to be an efficient way for data storage, specifically in cases when the analysis is often done based only on a small fragment of the available data structures. A data representation like Apache Parquet is a step forward from a columnar representation, which splits data horizontally to allow for easy parallelization of data analysis. Based on the general idea of columnar data storage, working on the [LDRD Project], we have developed a striped data representation, which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approach allows for efficient data analysis of complex structures. While keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We will present an implementation and some performance characteristics of such a data representation mechanism using a distributed no-SQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple so that it allows for a common data model and APIs for wide range of underlying storage mechanisms such as distributed no-SQL databases and local file systems. Striped storage adopts Numpy arrays as its basic data representation format, which makes it easy and efficient to use in Python applications. The Striped Data Server is a web service, which allows to hide the server implementation details from the end user, easily exposes data to WAN users, and allows to utilize well known and developed data caching solutions to further increase data access efficiency. We are considering the Striped Data Server as the core of an enterprise scale data analysis platform for High Energy Physics and similar areas of data processing. We have been testing this architecture with a 2TB dataset from a CMS dark matter search and plan to expand it to multiple 100 TB or even PB scale. We will present the striped format, Striped Data Server architecture and performance test results.
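The general idea of a striped, column-wise layout over NumPy arrays can be illustrated with a small toy: each column is an array that is sliced horizontally into fixed-size stripes, so a job reads only the column and stripes it needs. The names and stripe size below are invented, and this is not the Striped Data Server implementation.

    # Toy illustration of the general idea (not the Striped Data Server itself):
    # store each column as a NumPy array and split it horizontally into fixed-size
    # stripes, so an analysis job can read one column of one stripe at a time.
    import numpy as np

    n_events, stripe_size = 10, 4
    table = {
        "event_id": np.arange(n_events),
        "missing_et": np.random.default_rng(0).exponential(50.0, n_events),
    }

    def stripes(column: np.ndarray, size: int):
        """Yield consecutive horizontal slices (stripes) of one column."""
        for start in range(0, len(column), size):
            yield column[start:start + size]

    # A consumer that only needs one column touches nothing else.
    for i, stripe in enumerate(stripes(table["missing_et"], stripe_size)):
        print(f"stripe {i}: mean missing_et = {stripe.mean():.1f}")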
19

Arora, Saiyam, Abinesh Verma, Richa Vasuja, and Richa Vasuja. "An Overview of Apache Pig and Apache Hive." International Journal of Scientific Research in Computer Science, Engineering and Information Technology, March 5, 2019, 432–36. http://dx.doi.org/10.32628/cseit195250.

Abstract:
Ever since technology began advancing, data has been growing at an alarming rate. The most prominent driver of data growth is social media, which has led to the origination of a tremendous amount of data called Big Data. Big Data is a term used for data sets that are extremely large in size as well as complicated to store and process using traditional database processing applications. A saviour for dealing with Big Data is Hadoop, whose two major components are HDFS (distributed storage) and MapReduce (parallel processing). Apache Pig and Hive are essential parts of the Hadoop ecosystem. This paper covers an overview of both Apache Pig and Hive with their architectures. Although Hadoop undoubtedly does tremendous work by storing and processing huge volumes of data, there are nowadays more frameworks that increase the efficiency of the Hadoop framework; these are basically seen as layers of Hadoop or parts of the Apache Hadoop project. That is why this paper includes the two most important such layers, namely Apache Pig and Apache Hive.
20

Abu-Alsaad, Hiba A. "Retailing Analysis Using Hadoop and Apache Hive." International journal of simulation: systems, science & technology, March 29, 2020. http://dx.doi.org/10.5013/ijssst.a.20.01.08.

21

Ojerinde, Afolabi, and Philip Adewole. "MANAGING HUMAN INDUCED CRISIS WITH BIG DATA INFRASTRUCTURE." JOURNAL OF RESEARCH AND REVIEW IN SCIENCE 5, no. 1 (December 1, 2018). http://dx.doi.org/10.36108/jrrslasu/8102/50(0131).

Abstract:
This paper presents a human-induced crisis management system using big data infrastructure. The approach was motivated by the established fact that human-induced crises are characterized by velocity, variety and volume. The paper therefore employed the Hadoop big data stack and web technologies to design and implement a crisis management model. The resulting system comprises an analytical engine, a custom website and a desktop application called "Channel". The Hadoop Distributed File System was used for data storage in the analytical engine; crisis data were collected via the Twitter API and a web service generated by the project website, using Apache Flume and Channel respectively. Apache Hive was used to analyse the collected data, and the analysed results were posted back to the custom website using Channel. The system was evaluated using the Mean Opinion Score (MOS) to test its applicability, usability and reliability. A perceived applicability rating of 74%, a usability rating of 73% and a reliability rating of 57% were obtained. The resulting system provides insight into crisis situations, promotes rapid situational awareness, and aids policy formulation and monitoring.
22

Tariq, Muhammad Usman, Muhammad Babar, Marc Poulin, Akmal Saeed Khattak, Mohammad Dahman Alshehri, and Sarah Kaleem. "Human Behavior Analysis Using Intelligent Big Data Analytics." Frontiers in Psychology 12 (July 6, 2021). http://dx.doi.org/10.3389/fpsyg.2021.686610.

Abstract:
Intelligent big data analysis is an evolving pattern in the age of big data science and artificial intelligence (AI). Analysis of organized data has been very successful, but analyzing human behavior using social media data is challenging. Social media data comprises vast and unstructured data sources that can include likes, comments, tweets, shares and views. Data analytics on social media data has become a challenging task for companies, such as Dailymotion, that have billions of daily users and vast numbers of comments, likes and views. Social media data is created in significant amounts and at a tremendous pace, and there is a very high volume of data to store, sort, process and carefully study in order to make possible decisions. This article proposes an architecture using a big data analytics mechanism to efficiently and logically process huge social media datasets. The proposed architecture is composed of three layers. The main objective of the project is to demonstrate Apache Spark's parallel processing and distributed framework technologies with other storage and processing mechanisms. The social media data generated from Dailymotion is used in this article to demonstrate the benefits of this architecture. The project utilized the application programming interface (API) of Dailymotion, allowing it to incorporate functions suitable for fetching and viewing information. An API key is generated to fetch public channel data in the form of text files. The Hive storage mechanism is utilized with Apache Spark for efficient data processing. The effectiveness of the proposed architecture is also highlighted.
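The storage and processing wiring described here, Spark as the parallel processing engine over Hive storage, can be sketched as follows; the table and path names are assumptions made for illustration, not details from the article.

    # Sketch of the storage/processing wiring described in the abstract: Spark as
    # the parallel processing engine with Hive as the storage mechanism. The table
    # and file names are assumptions made for the example.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("social-media-analytics-sketch")
             .enableHiveSupport()          # lets Spark read and write Hive tables
             .getOrCreate())

    # Ingest the text files fetched from the Dailymotion API into a DataFrame.
    comments = spark.read.text("hdfs:///data/dailymotion/comments/*.txt")

    # Persist into Hive-managed storage, then query it back with Spark SQL.
    comments.write.mode("overwrite").saveAsTable("dailymotion_comments")
    spark.sql("SELECT COUNT(*) AS n_comments FROM dailymotion_comments").show()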
23

Ramos, M. P., P. M. Tasinaffo, A. M. Cunha, D. A. Silva, G. S. Gonçalves, and L. A. V. Dias. "A canonical model for seasonal climate prediction using Big Data." Journal of Big Data 9, no. 1 (March 3, 2022). http://dx.doi.org/10.1186/s40537-022-00580-9.

Abstract:
This article addresses the elaboration of a canonical model, involving methods, techniques, metrics, tools and Big Data, applied to seasonal climate prediction, aiming at greater dynamics, speed, conciseness and scalability. The proposed model was hosted in an environment capable of integrating different types of meteorological data and centralizing data stores. The seasonal climate prediction method, called M-PRECLIS, was designed and developed for practical application. The usability and efficiency of the proposed model were tested through a case study that made use of operational data generated by an atmospheric numerical climate model in the supercomputing environment of the Center for Weather Forecasting and Climate Studies, linked to the Brazilian Institute for Space Research. The seasonal climate prediction works with the ensemble members method, and the main Big Data technologies used for data processing were the Python language, Apache Hadoop, Apache Hive and the Optimized Row Columnar (ORC) file format. The main contributions of this research are the canonical model, its modules and internal components, the proposed method M-PRECLIS, and its use in a case study. After applying the model to a practical and real experiment, it was possible to analyze the results obtained and verify the consistency of the model through the output images, the code complexity and the performance, and also to compare it with related works. Thus, it was found that the proposed canonical model, based on the best practices of Big Data, is a viable alternative that can guide new paths to be followed.
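The Hive-plus-ORC combination named in the abstract corresponds to a table declared with ORC storage. The following hedged example issues such a declaration through Spark SQL; the schema is an invented stand-in, not the paper's actual layout.

    # Hedged example of the Hive + ORC combination mentioned in the abstract: a
    # Hive table declared with ORC storage via Spark SQL. The table layout is an
    # invented stand-in for the model's forecast output, not the paper's schema.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("seasonal-forecast-orc-sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS forecast_members (
            member      INT,
            lat         DOUBLE,
            lon         DOUBLE,
            precip_mm   DOUBLE
        )
        PARTITIONED BY (target_month STRING)
        STORED AS ORC
    """)

    # ORC keeps the data columnar and compressed, which suits ensemble aggregation.
    spark.sql("SHOW CREATE TABLE forecast_members").show(truncate=False)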
24

"Challenges and Research Disputes and Tools in Big Data Analytics." International Journal of Engineering and Advanced Technology 8, no. 6S3 (November 22, 2019): 1949–52. http://dx.doi.org/10.35940/ijeat.f1376.0986s319.

Abstract:
Big Data is the era of data processing. Big Data refers to collections of data sets so large and complex that they exceed traditional data-processing capabilities. The various challenges include data analysis, capture, curation, search, sharing, storage, transmission, visualization and privacy violations. Petabytes of data are generated day by day from modern information systems and digital technologies such as the Internet of Things and cloud computing. A big data environment is used to acquire, organize and analyse numerous types of data, and it requires a large-scale distributed file system that is fault tolerant, flexible and scalable. The term big data brings new challenges for inputting, processing and outputting data. The technologies used by big data applications to handle massive data are Hadoop, MapReduce, Pig, Apache Hive, NoSQL and Spark. Initially, we present the definition of big data and discuss big data challenges. Next, we present the decomposition of big data systems into four sequential modules, namely data generation, data acquisition, data storage and data analytics. These four modules form a big data value chain. In addition, we present the prevalent Hadoop framework for addressing big data.
25

Hussein, Abou_el_ela Abdou. "Using Hadoop Technology to Overcome Big Data Problems by Choosing Proposed Cost-efficient Scheduler Algorithm for Heterogeneous Hadoop System (BD3)." Journal of Scientific Research and Reports, November 28, 2020, 58–84. http://dx.doi.org/10.9734/jsrr/2020/v26i930310.

Abstract:
Day by day, advanced web technologies have led to tremendous growth in the volumes of data generated daily. This mountain of huge and dispersed data sets leads to the phenomenon called big data: a collection of massive, heterogeneous, unstructured, enormous and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMS) cannot handle big data because of their own limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. There are many technologies for manipulating big data; one of them is Hadoop. Hadoop can be understood as an open-source distributed data processing framework that is one of the prominent and well-known solutions for handling the big data problem. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the first three V's, which have been extended over time through research to more than fifty-six V's, and compare researchers' formulations to reach the best representation and a precise clarification of all the big data V characteristics. We highlight the challenges that face big data processing and how to overcome them using Hadoop, and its use in processing big data sets as a solution for resolving various problems in a distributed cloud-based environment. This paper mainly focuses on different components of Hadoop such as Hive, Pig and HBase. We also give a thorough description of Hadoop's pros and cons and of improvements for facing Hadoop's problems by choosing a proposed cost-efficient scheduler algorithm for a heterogeneous Hadoop system.
26

Velázquez Herrera, Jacinto. "Desconcentración administrativa y de rentas." Iuris Dictio 1, no. 2 (July 1, 2000). http://dx.doi.org/10.18272/iu.v1i2.525.

Abstract:
The topic is fascinating. Apart from its scientific character, which it possesses to a high degree, its political significance is undeniable, understanding the term, of course, in its broad sense. I want to be very clear about my assertion that one should not give opinions, and obviously I never have nor ever will, driven by convenience in order to cultivate momentary popularity, nor, much less, with the aim of achieving leadership by defending positions that may or may not seem appealing to the different provinces of the country, which should not be encouraged to hope that this or that stance will solve their problems. Let us not overlook that every Ecuadorian is obliged to condemn, in the most energetic and irreversible way, anyone who promotes sentiments of a negative regionalist kind. It is not wrong to defend what is one's own; what is detestable is to do so at the cost of sacrificing others.