Dissertations / Theses on the topic 'Big Data analytics applications'


Consult the top 50 dissertations / theses for your research on the topic 'Big Data analytics applications.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Al-Shiakhli, Sarah. "Big Data Analytics: A Literature Review Perspective." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-74173.

Full text
Abstract:
Big data is currently a buzzword in both academia and industry, with the term being used to describe a broad domain of concepts, ranging from extracting data from outside sources, storing and managing it, to processing such data with analytical techniques and tools. This thesis work thus aims to provide a review of current big data analytics concepts in an attempt to highlight big data analytics' importance to decision making. Due to the rapid increase in interest in big data and its importance to academia, industry, and society, solutions to handling data and extracting knowledge from datasets need to be developed and provided with some urgency to allow decision makers to gain valuable insights from the varied and rapidly changing data they now have access to. Many companies are using big data analytics to analyse the massive quantities of data they have, with the results influencing their decision making. Many studies have shown the benefits of using big data in various sectors, and in this thesis work, various big data analytical techniques and tools are discussed to allow analysis of the application of big data analytics in several different domains.
2

Talevi, Iacopo. "Big Data Analytics and Application Deployment on Cloud Infrastructure." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14408/.

Full text
Abstract:
This dissertation describes a project that began in October 2016. It was born from the collaboration between Mr. Alessandro Bandini and me, and was developed under the supervision of Professor Gianluigi Zavattaro. The main objective was to study, and in particular to experiment with, cloud computing in general and its potential in the field of data processing. Cloud computing is a utility-oriented and Internet-centric way of delivering IT services on demand. The first chapter is a theoretical introduction to cloud computing, analyzing the main aspects, the keywords, and the technologies behind clouds, as well as the reasons for the success of this technology and its problems. After the introduction section, I will briefly describe the three main cloud platforms on the market. During this project we developed a simple social network. Consequently, in the third chapter I will analyze the social network development, with the initial solution realized through Amazon Web Services and the steps we took to obtain the final version using Google Cloud Platform with its characteristics. To conclude, the last section is specific to data processing and contains an initial theoretical part that describes MapReduce and Hadoop, followed by a description of our analysis. We used Google App Engine to execute these computations on a large dataset. I will explain the basic idea, the code and the problems encountered.
3

Abounia, Omran Behzad. "Application of Data Mining and Big Data Analytics in the Construction Industry." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu148069742849934.

Full text
4

Zhang, Liangwei. "Big Data Analytics for Fault Detection and its Application in Maintenance." Doctoral thesis, Luleå tekniska universitet, Drift, underhåll och akustik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-60423.

Full text
Abstract:
Big Data analytics has attracted intense interest recently for its attempt to extract information, knowledge and wisdom from Big Data. In industry, with the development of sensor technology and Information & Communication Technologies (ICT), reams of high-dimensional, streaming, and nonlinear data are being collected and curated to support decision-making. The detection of faults in these data is an important application in eMaintenance solutions, as it can facilitate maintenance decision-making. Early discovery of system faults may ensure the reliability and safety of industrial systems and reduce the risk of unplanned breakdowns. Complexities in the data, including high dimensionality, fast-flowing data streams, and high nonlinearity, impose stringent challenges on fault detection applications. From the data modelling perspective, high dimensionality may cause the notorious “curse of dimensionality” and lead to deterioration in the accuracy of fault detection algorithms. Fast-flowing data streams require algorithms to give real-time or near real-time responses upon the arrival of new samples. High nonlinearity requires fault detection approaches to have sufficiently expressive power and to avoid overfitting or underfitting problems. Most existing fault detection approaches work in relatively low-dimensional spaces. Theoretical studies on high-dimensional fault detection mainly focus on detecting anomalies on subspace projections. However, these models are either arbitrary in selecting subspaces or computationally intensive. To meet the requirements of fast-flowing data streams, several strategies have been proposed to adapt existing models to an online mode to make them applicable in stream data mining. But few studies have simultaneously tackled the challenges associated with high dimensionality and data streams. Existing nonlinear fault detection approaches cannot provide satisfactory performance in terms of smoothness, effectiveness, robustness and interpretability. New approaches are needed to address this issue. This research develops an Angle-based Subspace Anomaly Detection (ABSAD) approach to fault detection in high-dimensional data. The efficacy of the approach is demonstrated in analytical studies and numerical illustrations. Based on the sliding window strategy, the approach is extended to an online mode to detect faults in high-dimensional data streams. Experiments on synthetic datasets show the online extension can adapt to the time-varying behaviour of the monitored system and, hence, is applicable to dynamic fault detection. To deal with highly nonlinear data, the research proposes an Adaptive Kernel Density-based (Adaptive-KD) anomaly detection approach. Numerical illustrations show the approach’s superiority in terms of smoothness, effectiveness and robustness.
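As the abstract notes, the online extension of the fault detection approach rests on a sliding-window strategy: only the most recent samples define "normal" behaviour, so the detector can track a time-varying system. The snippet below is a minimal, hypothetical Python sketch of that general pattern (a per-dimension z-score over a moving window); it is not the published ABSAD algorithm, whose angle-based subspace selection is considerably more involved.

```python
from collections import deque
import numpy as np

class SlidingWindowDetector:
    """Illustrative online detector: flags a sample whose per-dimension deviation
    from the statistics of the last `window` samples exceeds `threshold`."""
    def __init__(self, window=200, threshold=4.0):
        self.buffer = deque(maxlen=window)
        self.threshold = threshold

    def score(self, x):
        x = np.asarray(x, dtype=float)
        if len(self.buffer) < 30:          # not enough history yet
            self.buffer.append(x)
            return 0.0
        data = np.vstack(self.buffer)
        mu, sigma = data.mean(axis=0), data.std(axis=0) + 1e-9
        z = np.abs((x - mu) / sigma)       # standardized deviation per dimension
        self.buffer.append(x)              # the window slides forward
        return float(z.max())              # worst-dimension deviation as the score

    def is_fault(self, x):
        return self.score(x) > self.threshold
```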
5

Green, Oded. "High performance computing for irregular algorithms and applications with an emphasis on big data analytics." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51860.

Full text
Abstract:
Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication present numerous programming challenges, including scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges, since the data is often streaming at a high velocity and we wish to make near real-time decisions for real-world events. For instance, we may wish to track Twitter for the pandemic spread of a virus. Analyzing such data sets requires combining algorithmic optimizations and utilization of massively multithreaded architectures, accelerators such as GPUs, and distributed systems. My research focuses upon designing new analytics and algorithms for the continuous monitoring of dynamic social networks. Achieving high performance computing for irregular algorithms such as Social Network Analysis (SNA) is challenging, as the instruction flow is highly data dependent and requires domain expertise. The rapid changes in the underlying network necessitate understanding real-world graph properties such as the small world property, shrinking network diameter, power law distribution of edges, and the rate at which updates occur. These properties, with respect to a given analytic, can help design load-balancing techniques, avoid wasteful (redundant) computations, and create streaming algorithms. In the course of my research I have considered several parallel programming paradigms for a wide range of multithreaded platforms: x86, NVIDIA's CUDA, Cray XMT2, SSE-SIMD, and Plurality's HyperCore. These unique programming models require examination of parallel programming at multiple levels: algorithmic design, cache efficiency, fine-grain parallelism, memory bandwidth, data management, load balancing, scheduling, control flow models and more. This thesis deals with these issues and more.
6

Svenningsson, Philip, and Maximilian Drubba. "How to capture that business value everyone talks about? : An exploratory case study on business value in agile big data analytics organizations." Thesis, Internationella Handelshögskolan, Jönköping University, IHH, Företagsekonomi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-48882.

Full text
Abstract:
Background: Big data analytics has been referred to as a hype over the past decade, leading many organizations to adopt data-driven processes to stay competitive in their industries. Many of the organizations adopting big data analytics use agile methodologies, where the most important outcome is to maximize business value. Multiple scholars argue that big data analytics leads to increased business value; however, there is a theoretical gap within the literature about how agile organizations can capture this business value in a practically relevant way. Purpose: Building on a combined definition that capturing business value means being able to define, communicate and measure it, the purpose of this thesis is to explore how agile organizations capture business value from big data analytics, as well as to find out what aspects of value are relevant when defining it. Method: This study follows an abductive research approach, grounded in theory, through the use of a qualitative research design. A single case study of Nike Inc. was conducted to generate the primary data for this thesis, in which nine participants from different domains within the organization were interviewed, and the results were analysed with a thematic content analysis. Findings: The findings indicate that, in order for agile organizations to capture business value generated from big data analytics, they need to (1) define the value through a synthesized value map, (2) establish a common language with the help of a business translator and agile methods, and (3) measure the business value before, during and after development by using individually identified KPIs derived from the business value definition.
7

Zubar, Tymofiy, Тимофій Андрійович Зубар, Olena Volovyk, and Олена Іванівна Воловик. "Big data in logistics: last mile application." Thesis, National Aviation University, 2021. https://er.nau.edu.ua/handle/NAU/50494.

Full text
Abstract:
Big data is revolutionizing many business areas, including logistics and the business processes within it. The complexity and dynamics of logistics, coupled with the reliance on many moving parts, can cause bottlenecks at any point in the supply chain, making the application of big data a vital element of effectiveness in the design and management of logistical processes. For example, big data in logistics can be used to optimize routing, simplify factory functions and give transparency to the entire supply chain, from which both logistics companies and shipping companies may benefit; a third-party logistics company and a transportation company may cooperate on this issue. Big data does, however, require a large number of high-quality information sources to work effectively.
8

Cui, Henggang. "Exploiting Application Characteristics for Efficient System Support of Data-Parallel Machine Learning." Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/908.

Full text
Abstract:
Large scale machine learning has many characteristics that can be exploited in the system designs to improve its efficiency. This dissertation demonstrates that the characteristics of the ML computations can be exploited in the design and implementation of parameter server systems, to greatly improve the efficiency by an order of magnitude or more. We support this thesis statement with three case study systems, IterStore, GeePS, and MLtuner. IterStore is an optimized parameter server system design that exploits the repeated data access pattern characteristic of ML computations. The designed optimizations allow IterStore to reduce the total run time of our ML benchmarks by up to 50×. GeePS is a parameter server that is specialized for deep learning on distributed GPUs. By exploiting the layer-by-layer data access and computation pattern of deep learning, GeePS provides almost linear scalability from single-machine baselines (13× more training throughput with 16 machines), and also supports neural networks that do not fit in GPU memory. MLtuner is a system for automatically tuning the training tunables of ML tasks. It exploits the characteristic that the best tunable settings can often be decided quickly with just a short trial time. By making use of optimization-guided online trial-and-error, MLtuner can robustly find and re-tune tunable settings for a variety of machine learning applications, including image classification, video classification, and matrix factorization, and is over an order of magnitude faster than traditional hyperparameter tuning approaches.
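The systems described here are built around the parameter-server abstraction, in which workers repeatedly read and update shared model parameters. As a rough illustration of that interface only, not the actual IterStore or GeePS implementations, a toy single-process parameter server in Python might look like the following; the class and method names are invented for this sketch.

```python
import numpy as np
from collections import defaultdict

class ToyParameterServer:
    """Toy in-process parameter server: workers read rows of the model table and
    push additive updates. Real systems shard the table across machines and
    cache frequently accessed rows on each worker."""
    def __init__(self, dim):
        self.table = defaultdict(lambda: np.zeros(dim))

    def get(self, key):
        return self.table[key].copy()          # read a parameter row

    def update(self, key, delta):
        self.table[key] += delta               # additive update, order-insensitive

# One SGD-style step for a worker holding a mini-batch of (key, gradient) pairs.
def worker_step(ps, grads, lr=0.1):
    for key, g in grads.items():
        ps.update(key, -lr * g)

ps = ToyParameterServer(dim=4)
worker_step(ps, {"w_layer1": np.ones(4)})
print(ps.get("w_layer1"))                      # -> [-0.1 -0.1 -0.1 -0.1]
```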
9

Sharma, Rahil. "Shared and distributed memory parallel algorithms to solve big data problems in biological, social network and spatial domain applications." Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2277.

Full text
Abstract:
Big data refers to information which cannot be processed and analyzed using traditional approaches and tools, due to the four V's: sheer Volume, the Velocity at which data is received and processed, and data Variety and Veracity. Today massive volumes of data originate in domains such as geospatial analysis, biological and social networks, etc. Hence, scalable algorithms for efficient processing of this massive data are a significant challenge in the field of computer science. One way to achieve such efficient and scalable algorithms is by using shared- and distributed-memory parallel programming models. In this thesis, we present a variety of such algorithms to solve problems in the various domains mentioned above. We solve five problems that fall into two categories. The first group of problems deals with the issue of community detection. Detecting communities in real-world networks is of great importance because they consist of patterns that can be viewed as independent components, each of which has distinct features and can be detected based upon network structure. For example, communities in social networks can help target users for marketing purposes, provide user recommendations to connect with and join communities or forums, etc. We develop a novel sequential algorithm to accurately detect community structures in biological protein-protein interaction networks, where a community corresponds to a functional module of proteins. Generally, such sequential algorithms are computationally expensive, which makes them impractical to use for large real-world networks. To address this limitation, we develop a new highly scalable Symmetric Multiprocessing (SMP) based parallel algorithm to detect high-quality communities in large subsections of social networks like Facebook and Amazon. Due to the SMP architecture, however, our algorithm cannot process networks whose size is greater than the size of the RAM of a single machine. With the increasing size of social networks, community detection has become even more difficult, since network size can reach up to hundreds of millions of vertices and edges. Processing such massive networks requires several hundred gigabytes of RAM, which is only possible by adopting distributed infrastructure. To address this, we develop a novel hybrid (shared + distributed memory) parallel algorithm to efficiently detect high-quality communities in massive Twitter and .uk domain networks. The second group of problems deals with the issue of efficiently processing spatial Light Detection and Ranging (LiDAR) data. LiDAR data is widely used in forest and agricultural crop studies, landscape classification, 3D urban modeling, etc. Technological advancements in building LiDAR sensors have enabled highly accurate and dense LiDAR point clouds, resulting in massive data volumes which pose computing issues with processing and storage. We develop the first published landscape-driven data reduction algorithm, which uses the slope-map of the terrain as a filter to reduce the data without sacrificing its accuracy. Our algorithm is highly scalable and adopts a shared-memory parallel architecture. We also develop a parallel interpolation technique that is used to generate highly accurate continuous terrains, i.e. Digital Elevation Models (DEMs), from discrete LiDAR point clouds.
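The landscape-driven reduction described above uses the terrain's slope map as a filter: dense sampling is kept where the surface changes quickly and thinned on flat ground. The snippet below is a simplified, hypothetical sketch of that idea on a gridded elevation model (the thesis works on raw point clouds and in parallel); numpy's gradient is used here as a stand-in slope estimate, and the thresholds are illustrative.

```python
import numpy as np

def reduce_by_slope(dem, cell_size=1.0, slope_threshold_deg=5.0, flat_keep_every=4):
    """Keep all cells on steep terrain; on flat terrain keep only every n-th cell.
    `dem` is a 2-D array of elevations; returns a boolean mask of retained cells."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))   # slope map
    keep = slope_deg >= slope_threshold_deg                      # steep: keep everything
    flat = ~keep
    thin = np.zeros_like(keep)
    thin[::flat_keep_every, ::flat_keep_every] = True            # sparse grid on flats
    return keep | (flat & thin)

dem = np.random.rand(100, 100).cumsum(axis=0)   # synthetic tilted terrain
mask = reduce_by_slope(dem)
print(mask.mean())                               # fraction of points retained
```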
10

Matteuzzi, Tommaso. "Network diffusion methods for omics big bio data analytics and interpretation with application to cancer datasets." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13660/.

Full text
Abstract:
In current biomedical research, a fundamental step towards understanding the mechanisms at the root of a disease is the identification of disease modules, i.e. those subnetworks of the interactome, the network of protein-protein interactions, with a high number of genetic alterations. However, the incompleteness of the network and the high variability of the altered genes make the solution of this problem non-trivial. Physical methods that exploit the properties of diffusive processes on networks, which are the focus of this thesis work, are those that achieve the best performance. In the first part of my work, I investigated the theory of diffusion and random walks on networks, finding interesting relations with clustering techniques and with other physical models whose dynamics are described by the Laplacian matrix. I then implemented a network-diffusion technique and applied it to gene expression and somatic mutation data from three different types of cancer. The method is organised in two parts. After selecting a subset of the nodes of the interactome, we associate with each of them an initial quantity of information that reflects the degree of alteration of the gene. The diffusion algorithm propagates the initial information through the network, reaching a stationary state after a transient. At this point, the amount of fluid in each node is used to build a ranking of the genes. In the second part, disease modules are identified through a network-resampling procedure. The analysis allowed us to identify a considerable number of genes already known in the literature on the cancer types studied, as well as a set of related genes that could be interesting candidates for further investigation. Finally, through a Gene Set Enrichment procedure, we tested the correlation of the identified modules with known biological pathways.
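The core of the method is a diffusion process on the interactome: an initial score vector encoding gene alteration is propagated along edges until a stationary state is reached, and the stationary scores rank the genes. Below is a minimal, hedged sketch of that kind of propagation in Python; the column-normalised adjacency matrix and the restart parameter alpha are generic modelling choices, not necessarily the exact formulation used in the thesis.

```python
import numpy as np

def network_diffusion(adj, f0, alpha=0.7, tol=1e-8, max_iter=1000):
    """Iterate f <- alpha * W f + (1 - alpha) * f0 to its fixed point,
    where W is the column-normalised adjacency matrix and f0 encodes the
    initial per-gene alteration scores."""
    W = adj / np.maximum(adj.sum(axis=0, keepdims=True), 1e-12)
    f = f0.astype(float).copy()
    for _ in range(max_iter):
        f_new = alpha * W @ f + (1 - alpha) * f0
        if np.abs(f_new - f).max() < tol:
            break
        f = f_new
    return f            # stationary scores, used to rank genes

# Tiny example: a 4-node network with node 0 initially "altered".
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
scores = network_diffusion(adj, np.array([1.0, 0, 0, 0]))
print(np.argsort(-scores))   # ranking of nodes by diffused score
```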
11

Bohle, Alexander, and Liam Johnson. "Supply Chain Analytics implications for designing Supply Chain Networks : Linking Descriptive Analytics to operational Supply Chain Analytics applications to derive strategic Supply Chain Network Decisions." Thesis, Internationella Handelshögskolan, Högskolan i Jönköping, IHH, Centre of Logistics and Supply Chain Management (CeLS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-44120.

Full text
Abstract:
Today’s dynamic and increasingly competitive market has expanded complexities for global businesses, pressuring companies to start leveraging Big Data solutions in order to sustain global competition by becoming more data-driven in managing their supply chains. The main purpose of this study is twofold: 1) to explore the implications of applying analytics when designing supply chain networks, and 2) to investigate the link between operational and strategic management levels when making strategic decisions using Analytics. Qualitative methods have been applied in this study to gain a greater understanding of the Supply Chain Analytics phenomenon. An inductive approach, in the form of interviews, was used in order to gain new empirical data. Fifteen semi-structured interviews were conducted with professionals who hold managerial roles such as project managers, consultants, and end-users within the fields of Supply Chain Management and Big Data Analytics. The empirical information gathered was then analyzed using the thematic analysis method. The main findings in this thesis partly contradict previous studies and existing literature in terms of the connotations, definitions and applications of the three main types of Analytics. Furthermore, the findings present new approaches and perspectives on how advanced analytics apply at both strategic and operational management levels and shape supply chain network designs.
12

Tosson, Amir. "The way to a smarter community: exploring and exploiting data modeling, big data analytics, high-performance computing and artificial intelligence techniques for applications of 2D energy-dispersive detectors in the crystallography community." Siegen: Universitätsbibliothek der Universität Siegen, 2020. http://d-nb.info/1216332282/34.

Full text
13

Oskar, Marko. "Application of innovative methods of machine learning in Biosystems." Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=108729&source=NDLTD&language=en.

Full text
Abstract:
The topic of the research in this dissertation is the application of machine learning in solving problems characteristic of biosystems, with special emphasis on agriculture. Firstly, an innovative regression algorithm based on big data is presented and used for yield prediction. The predictions were then used as input for an improved portfolio optimisation algorithm, so that appropriate soybean varieties could be selected for fields with distinctive parameters. Lastly, a multi-objective optimisation problem of determining the sowing structure was set up and solved using a novel method, a categorical evolutionary algorithm based on NSGA-III.
14

Erlandsson, Niklas. "Game Analytics och Big Data." Thesis, Mittuniversitetet, Avdelningen för arkiv- och datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-29185.

Full text
Abstract:
Game Analytics is a research field that has appeared in recent years. Game developers have the ability to analyze how customers use their products, down to every button pressed. This can result in large amounts of data, and the challenge is to make sense of it all. The challenges with game data are often described with the same characteristics used to define Big Data: volume, velocity and variability. This should mean that there is potential for a fruitful collaboration. The purpose of this study is to analyze and evaluate what possibilities Big Data offers to develop the Game Analytics field. To fulfill this purpose, a literature review and semi-structured interviews with people active in the gaming industry were conducted. The results show that the sources agree that valuable information can be found within the data that can be stored, especially in the monetary, general and core values of the specific game. With more advanced analysis other interesting patterns may be found as well, but the predominant approach nonetheless seems to be sticking to the simple variables and staying away from digging deeper. This is not because data handling or storage would be tedious or too difficult, but simply because the analysis would be too risky an investment. Even with someone ready to take on all the challenges game data poses, there is not enough trust in the answers or in how useful they might be. Visions of the future within the field are modest, and the near future seems to hold mostly efficiency improvements and a widening of the field, making it reach more people, which does not really pose any new demands or requirements on the data handling.
15

Doucet, Rachel A., Deyan M. Dontchev, Javon S. Burden, and Thomas L. Skoff. "Big data analytics test bed." Thesis, Monterey, California: Naval Postgraduate School, 2013. http://hdl.handle.net/10945/37615.

Full text
Abstract:
Approved for public release; distribution is unlimited
The proliferation of big data has significantly expanded the quantity and breadth of information throughout the DoD. The task of processing and analyzing this data has become difficult, if not infeasible, using traditional relational databases. The Navy has a growing priority for information processing, exploitation, and dissemination, which makes use of the vast network of sensors that produce a large amount of big data. This capstone report explores the feasibility of a scalable Tactical Cloud architecture that will harness and utilize the underlying open-source tools for big data analytics. A virtualized cloud environment was built and analyzed at the Naval Postgraduate School, which offers a test bed, suitable for studying novel variations of these architectures. Further, the technologies directly used to implement the test bed seek to demonstrate a sustainable methodology for rapidly configuring and deploying virtualized machines and provides an environment for performance benchmark and testing. The capstone findings indicate the strategies and best practices to automate the deployment, provisioning and management of big data clusters. The functionality we seek to support is a far more general goal: finding open-source tools that help to deploy and configure large clusters for on-demand big data analytics.
16

Miloš, Marek. "Nástroje pro Big Data Analytics." Master's thesis, Vysoká škola ekonomická v Praze, 2013. http://www.nusl.cz/ntk/nusl-199274.

Full text
Abstract:
The thesis covers Big Data, a term for a specific kind of data analysis. It firstly defines the term Big Data and traces its emergence to the rising need for deeper data processing and analysis tools and methods. The thesis also covers some of the technical aspects of Big Data tools, focusing on Apache Hadoop in detail. The later chapters contain a Big Data market analysis and describe the biggest Big Data competitors and tools. The practical part of the thesis presents a way of using Apache Hadoop to perform data analysis on data from Twitter; the results are then visualized in Tableau.
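The practical part runs Hadoop jobs over Twitter data. As a hedged illustration of the MapReduce style such an analysis relies on, and not the thesis's actual code, a hashtag count can be written as a Hadoop Streaming mapper/reducer pair in Python and fed tweets as plain text; the file names are invented for this sketch.

```python
#!/usr/bin/env python3
# mapper.py -- emits "hashtag<TAB>1" for every hashtag seen on stdin
import sys

for line in sys.stdin:
    for token in line.split():
        if token.startswith("#"):
            print(f"{token.lower()}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums counts per hashtag; Hadoop Streaming delivers keys sorted
import sys

current, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = key, 0
    total += int(value)
if current is not None:
    print(f"{current}\t{total}")
```

Such a pair would be submitted roughly via the Hadoop Streaming jar with -mapper, -reducer, -input and -output arguments, or tested locally with `cat tweets.txt | ./mapper.py | sort | ./reducer.py`.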
17

Katzenbach, Alfred, and Holger Frielingsdorf. "Big Data Analytics für die Produktentwicklung." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-214517.

Full text
Abstract:
From the introduction: "At the Hannover Messe 2011, the term 'Industrie 4.0' was presented to the public for the first time. The Academy of Engineering Sciences took this basic idea of the fourth revolution of industrial production further in a working group and published it in 2013 in a final report entitled 'Umsetzungsempfehlungen für das Zukunftsprojekt Industrie 4.0' (BmBF, 2013). The basic idea is to develop adaptable and efficient factories using modern information technology. The base technologies for implementing smart factories are: Cyber-Physical Systems (CPS), Internet of Things (IoT) and Internet of Services (IoS), Big Data Analytics and Prediction, Social Media, and Mobile Computing. The final report focuses on the production step of the value chain, while questions of product development have remained largely unconsidered. However, the smart factory for manufacturing smart products also requires the further development of product development methods. Here, too, there is a great need for action, which goes hand in hand with the methods of 'model-based systems engineering'. ..."
18

Sun, Mingyang. "Big data analytics in power systems." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/45061.

Full text
Abstract:
With the increasing penetration of advanced sensor systems in power systems, an influx of extremely large datasets presents a valuable opportunity to gain insight for improving system operation and planning in the context of the large-scale integration of intermittent energy sources. To this end, it becomes imperative to implement big data methodologies to handle such complex datasets with the challenges of volume, velocity, variety, and veracity (4Vs) for further dealing with the problems in power systems. The large-scale integration of intermittent energy sources, the introduction of shiftable load elements and the growing interconnection that characterizes electricity systems worldwide have led to a significant increase of operational uncertainty. The construction of suitable statistical models is a fundamental step towards building Monte Carlo analysis frameworks to be used for exploring the uncertainty state-space and supporting real-time decision-making. This thesis firstly proposes the novel composite modelling approaches that employ dimensionality reduction, clustering and parametric modelling techniques with a particular focus on the use of pair copula construction schemes. Large power system datasets are modelled using different combinations of the aforementioned techniques, and detailed comparisons are drawn on the basis of Kolmogorov-Smirnov tests, multivariate two-sample energy tests and visual data comparisons. The proposed methods are shown to be superior to alternative high-dimensional modelling approaches. In addition, the benefits of the proposed model are demonstrated through the applications on the calculation of a system’s PNS profile and the security assessment based on decision trees. Furthermore, this thesis presents a novel finite mixture modelling framework based on C-vine copulas (CVMM) for carrying out consumer categorization. The superiority of the proposed framework lies in the great flexibility of pair copulas towards identifying multi-dimensional dependency structures present in load profiling data. CVMM is compared to other classical methods by using real demand measurements recorded across 2,613 households in a London smart-metering trial. The superior performance of the proposed approach is demonstrated by analyzing four validity indicators. In addition, a decision tree classification module for partitioning new consumers is developed and the improved predictive performance of CVMM compared to existing methods is highlighted. Further case studies are carried out based on different loading conditions and different sets of large numbers of households to demonstrate the advantages and to test the scalability of the proposed method. In addition, in the interest of economic efficiency, design of distribution networks should be tailored to the demonstrated needs of its consumers. However, in the absence of detailed knowledge related to the characteristics of electricity consumption, planning has traditionally been carried out on the basis of empirical metrics; conservative estimates of individual peak consumption levels and of demand diversification across multiple consumers. Although such practices have served the industry well, the advent of smart metering opens up the possibility for gaining valuable insights on demand patterns, resulting in enhanced planning capabilities. This thesis is motivated by the collection of demand measurements across 2,613 households in London, as part of Low Carbon London project’s smart-metering trial. 
Demand diversity and other metrics of interest are quantified for the entire dataset as well as across different customer classes, investigating the degree to which occupancy level and wealth can be used to infer peak demand behavior. This thesis also presents a novel TNEP framework that accommodates multiple sources of operational stochasticity. Inter-spatial dependencies between loads in various locations and intermittent generation units’ output are captured by using a multivariate Gaussian copula. This statistical model forms the basis of a Monte Carlo analysis framework for exploring the uncertainty state-space. Benders decomposition is applied to efficiently split the investment and operation problems. The advantages of the proposed model are demonstrated through a case study on the IEEE 118-bus system. By evaluating the confidence interval of the optimality gap, the advantages of the proposed approach over conventional techniques are clearly demonstrated. Finally, this thesis proposes a novel scenarios selection framework for the transmission expansion problem to obtain an accurate solution in terms of operating costs and investment decisions with a significantly reduced number of operating states. Different classification variables and clustering techniques are considered and compared to determine the most appropriate combination for this specific problem. Benders decomposition is applied to solve the TNEP problem by splitting the investment and operation problems. The superior performance of the proposed scenarios selection framework is demonstrated through a numerical case study on the modified IEEE 118-bus system.
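Several of the contributions summarized above capture inter-spatial dependency between loads and intermittent generation with copulas, most simply the multivariate Gaussian copula. As a hedged, generic sketch of how such a model can generate correlated uncertainty scenarios for Monte Carlo analysis, and not the thesis's pair-copula constructions, one can sample correlated normals, map them to uniforms, and push them through each variable's marginal inverse CDF; the marginals and correlation below are illustrative.

```python
import numpy as np
from scipy import stats

def gaussian_copula_samples(corr, marginals, n=10_000, seed=0):
    """Draw joint samples whose dependence structure is a Gaussian copula with
    correlation `corr` and whose marginals are the given frozen scipy.stats
    distributions (one per dimension)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(marginals)), corr, size=n)
    u = stats.norm.cdf(z)                       # uniforms carrying the dependence
    return np.column_stack([m.ppf(u[:, i]) for i, m in enumerate(marginals)])

# Example: a load (normal) and a wind in-feed (beta-shaped), negatively correlated.
corr = np.array([[1.0, -0.4], [-0.4, 1.0]])
samples = gaussian_copula_samples(corr, [stats.norm(400, 50), stats.beta(2, 5)])
print(np.corrcoef(samples.T)[0, 1])             # roughly negative, as specified
```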
19

Bitto, Nicholas. "Adding big data analytics to GCSS-MC." Thesis, Monterey, California: Naval Postgraduate School, 2014. http://hdl.handle.net/10945/43879.

Full text
Abstract:
Approved for public release; distribution is unlimited
Global Combat Support System - Marine Corps (GCSS-MC) is a large logistics system designed to replace numerous legacy systems used by the Marine Corps. While it has been in existence for a while, its intended potential has not been fully realized. Therefore, various teams are working hard to develop the analytics that will benefit the community. With the growth of data, the only way these analytics (in Structured Query Language [SQL]) will run efficiently is on proprietary hardware from Oracle. This research looks at running the same analytics on commodity hardware using the Hadoop Distributed File System and Java MapReduce. The results show that while it takes longer to program in Java than in SQL, the analytics are just as powerful as, or even more powerful than, SQL, and the potential to save on hardware cost is significant.
20

Tanneedi, Naren Naga Pavan Prithvi. "Customer Churn Prediction Using Big Data Analytics." Thesis, Blekinge Tekniska Högskola, Institutionen för kommunikationssystem, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13518.

Full text
Abstract:
Customer churn is always a grievous issue for the telecom industry, as customers do not hesitate to leave if they don't find what they are looking for. They certainly want competitive pricing, value for money and, above all, high-quality service. Customer churning is directly related to customer satisfaction. It is a known fact that the cost of customer acquisition is far greater than the cost of customer retention, which makes retention a crucial business priority. There is no standard model which accurately addresses the churning issues of global telecom service providers. Big Data analytics with machine learning was found to be an efficient way of identifying churn. This thesis aims to predict customer churn using Big Data analytics, namely a J48 decision tree on a Java-based benchmark tool, WEKA. Three different datasets from various sources were considered: the first includes a telecom operator's six-month aggregate data usage volumes for active and churned users; the second includes globally surveyed data; and the third comprises individual weekly data usage analysis of 22 Android customers along with their average quality, annoyance and churn scores from accompanying theses. Statistical analyses and J48 decision trees were produced for the three datasets. From the statistics of normalized volumes, autocorrelations were small owing to reliable confidence intervals, but the confidence intervals were overlapping and close together, so not much significance could be noticed and hence no strong trends could be observed. From the decision tree analytics, decision trees with 52%, 70% and 95% accuracy were achieved for the three data sources respectively. Data preprocessing, data normalization and feature selection have been shown to be prominently influential. Monthly data volumes have not shown much decision power. Average quality, churn risk and, to some extent, annoyance scores may point out a probable churner. Weekly data volumes with the customer's recent history and necessary attributes like age, gender, tenure, bill, contract, data plan, etc., are pivotal for churn prediction.
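The churn models here are J48 decision trees built in WEKA (J48 is WEKA's implementation of C4.5). As a rough, hypothetical stand-in in Python, the same kind of model can be sketched with scikit-learn's entropy-based decision tree on a toy churn table; the feature names and data below are invented for illustration and only loosely mirror the attributes mentioned in the abstract.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 1000
# Invented features: tenure (months), weekly data volume (GB), bill, quality score (1-5).
X = np.column_stack([
    rng.integers(1, 72, n),
    rng.gamma(2.0, 2.0, n),
    rng.normal(40, 15, n),
    rng.integers(1, 6, n),
])
# Toy label: short tenure combined with low quality makes churn more likely.
churn = ((X[:, 0] < 12) & (X[:, 3] <= 2)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, churn, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4)  # C4.5-like split criterion
tree.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, tree.predict(X_te)))
```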
21

Le, Quoc Do. "Approximate Data Analytics Systems." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-234219.

Full text
Abstract:
Today, most modern online services make use of big data analytics systems to extract useful information from the raw digital data. The data normally arrives as a continuous data stream at a high speed and in huge volumes. The cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical due to the fact that the data is growing exponentially and even faster than Moore’s law predictions. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than the exact output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, the advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems:
• StreamApprox—a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and has an ability to adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error.
• IncApprox—a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high-throughput and low-latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
• PrivApprox—a data stream analytics system for privacy-preserving and approximate computing. This system supports high utility and low-latency data analytics and preserves user’s privacy at the same time. The system is based on the combination of privacy-preserving data analytics and approximate computing.
• ApproxJoin—an approximate distributed joins system. This system improves the performance of joins — critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filter) to avoid shuffling non-joinable data items through the network as well as proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output.
Our evaluation based on micro-benchmarks and real world case studies shows that these systems can achieve significant performance speedup compared to state-of-the-art systems by tolerating negligible accuracy loss of the analytics output. In addition, our systems allow users to systematically make a trade-off between accuracy and throughput/latency and require no/minor modifications to the existing applications.
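A key ingredient named above is online stratified reservoir sampling, which keeps a bounded, representative sample of an unbounded stream. Below is a minimal Python sketch of plain reservoir sampling applied independently per stratum, offered only as a hedged illustration of the idea; the actual algorithms in these systems are adaptive and come with error bounds.

```python
import random
from collections import defaultdict

class StratifiedReservoir:
    """Keep up to k items per stratum (sub-stream), each retained with equal
    probability: classic Algorithm R applied independently to every stratum."""
    def __init__(self, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.reservoirs = defaultdict(list)
        self.seen = defaultdict(int)

    def add(self, stratum, item):
        self.seen[stratum] += 1
        res = self.reservoirs[stratum]
        if len(res) < self.k:
            res.append(item)
        else:
            j = self.rng.randrange(self.seen[stratum])
            if j < self.k:
                res[j] = item              # replace a random slot with decreasing probability

    def estimate_mean(self, stratum):
        res = self.reservoirs[stratum]
        return sum(res) / len(res) if res else float("nan")

sampler = StratifiedReservoir(k=100)
for i in range(1_000_000):
    sampler.add(stratum=i % 3, item=i)     # three interleaved sub-streams
print(sampler.estimate_mean(0))            # approximate mean of stratum 0
```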
22

Leis Machín, Angela. "Studying depression through big data analytics on Twitter." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/671365.

Full text
Abstract:
Mental disorders have become a major concern in public health, since they are one of the main causes of the overall disease burden worldwide. Depressive disorders are the most common mental illnesses, and they constitute the leading cause of disability worldwide. Language is one of the main tools on which mental health professionals base their understanding of human beings and their feelings, as it provides essential information for diagnosing and monitoring patients suffering from mental disorders. In parallel, social media platforms such as Twitter, allow us to observe the activity, thoughts and feelings of people’s daily lives, including those of patients suffering from mental disorders such as depression. Based on the characteristics and linguistic features of the tweets, it is possible to identify signs of depression among Twitter users. Moreover, the effect of antidepressant treatments can be linked to changes in the features of the tweets posted by depressive users. The analysis of this huge volume and diversity of data, the so-called “Big Data”, can provide relevant information about the course of mental disorders and the treatments these patients are receiving, which allows us to detect, monitor and predict depressive disorders. This thesis presents different studies carried out on Twitter data in the Spanish language, with the aim of detecting behavioral and linguistic patterns associated to depression, which can constitute the basis of new and complementary tools for the diagnose and follow-up of patients suffering from this disease
23

Brydon, Humphrey Charles. "Missing imputation methods explored in big data analytics." University of the Western Cape, 2018. http://hdl.handle.net/11394/6605.

Full text
Abstract:
Philosophiae Doctor - PhD (Statistics and Population Studies)
The aim of this study is to look at the methods and processes involved in imputing missing data and, more specifically, complete missing blocks of data. A further aim is to look at the effect that the imputed data has on the accuracy of various predictive models constructed on the imputed data, and hence determine whether the imputation method involved is suitable. The identification of the missingness mechanism present in the data should be the first process to follow in order to identify a possible imputation method. The identification of a suitable imputation method is easier if the mechanism can be identified as one of the following: missing completely at random (MCAR), missing at random (MAR) or not missing at random (NMAR). Predictive models constructed on the completed (imputed) data sets are shown to be less accurate when the data sets employed a hot-deck imputation method. The data sets which employed either a single or multiple Markov Chain Monte Carlo (MCMC) or the Fully Conditional Specification (FCS) imputation methods are shown to result in predictive models that are more accurate. The addition of an iterative bagging technique in the modelling procedure is shown to produce highly accurate prediction estimates. The bagging technique is applied to variants of the neural network, a decision tree and a multiple linear regression (MLR) modelling procedure. A stochastic gradient boosted decision tree (SGBT) is also constructed as a comparison to the bagged decision tree. Final models are constructed from 200 iterations of the various modelling procedures using a 60% sampling ratio in the bagging procedure. It is further shown that the addition of the bagging technique in the MLR modelling procedure can produce an MLR model that is more accurate than the other, more advanced modelling procedures under certain conditions. The evaluation of the predictive models constructed on imputed data is shown to vary based on the type of fit statistic used. It is shown that the average squared error reports little difference in accuracy levels when compared to the results of the Mean Absolute Prediction Error (MAPE). The MAPE fit statistic is able to magnify the difference in the prediction errors reported. The Normalized Mean Bias Error (NMBE) results show that all predictive models constructed produced estimates that were an over-prediction, although these did vary depending on the data set and modelling procedure used. The Nash-Sutcliffe efficiency (NSE) was used as a comparison statistic to assess the accuracy of the predictive models in the context of imputed data. The NSE statistic showed that the estimates of the models constructed on the imputed data sets employing a multiple imputation method were highly accurate. The NSE results also reported that the estimates from the predictive models constructed on the hot-deck imputed data were inaccurate, and that a mean substitution of the fully observed data would have been a better method of imputation. The conclusion reached in this study shows that the choice of imputation method, as well as that of the predictive model, is dependent on the data used. Four unique combinations of imputation methods and modelling procedures were concluded for the data considered in this study.
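The modelling procedure described above adds an iterative bagging step: 200 models are fitted on repeated 60% samples of the (imputed) training data and their predictions are aggregated. The following is a hedged Python sketch of such a bagging loop around a linear regression; the base learner, sample ratio and iteration count mirror the abstract, while everything else (data, function names) is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bagged_predict(X_train, y_train, X_test, n_models=200, ratio=0.6, seed=0):
    """Fit `n_models` regressions on random 60% subsamples and average
    their predictions: a simple bagging ensemble."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros((n_models, len(X_test)))
    for m in range(n_models):
        idx = rng.choice(n, size=int(ratio * n), replace=True)
        model = LinearRegression().fit(X_train[idx], y_train[idx])
        preds[m] = model.predict(X_test)
    return preds.mean(axis=0)

# Toy data: y depends linearly on two features plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)
print(bagged_predict(X[:400], y[:400], X[400:])[:5])
```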
24

Oikonomidi, Sofia. "Impact of Big Data Analytics in Industry 4.0." Thesis, Linnéuniversitetet, Institutionen för informatik (IK), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-99443.

Full text
Abstract:
Big data in Industry 4.0 is a major subject both for current research and for the organizations that are motivated to invest in these kinds of projects. Big data refers to the large quantities of data collected from various resources that can potentially be analyzed to provide valuable insights and patterns. In Industry 4.0 the production of data is massive and thus provides the basis for analysis and the extraction of important information. This study aims to describe the impact of big data analytics in Industry 4.0 environments by utilizing the SWOT framework, with the intention of providing both a positive and a negative perspective on the subject. Considering that these implementations are an innovative trend and that awareness of the subject is limited, it is valuable to summarize and explore the findings from the published literature, which are then reviewed through interviews with data scientists. The intention is to increase knowledge of the subject and inform organizations about their potential expectations and challenges. The effects are represented in a SWOT analysis based on findings collected from 22 selected articles, which were afterwards discussed with professionals. The systematic literature review started with the creation of a plan and specifically defined steps based on previously existing scientific papers. The relevant literature was selected using specified inclusion and exclusion criteria and its relevance to the research questions. Following this, the interview questionnaire was built based on the findings in order to gather empirical data on the subject. The results revealed that the insights developed through big data support management towards effective decision-making, since they reduce the ambiguity of actions. The optimization of production, expenditure reduction, and customer satisfaction follow as the top categories mentioned in the selected articles for the strengths dimension. Among the opportunities, the interoperability of equipment, real-time information acquisition and exchange, and the self-awareness of systems are reflected in the majority of the papers. In contrast, the threats and weaknesses are addressed in fewer studies: infrastructure limitations and security and privacy issues are discussed substantially, while organizational changes and human resources matters are mentioned only infrequently. The data scientists agreed with the findings and stated that decision-making, process effectiveness and customer relationships are their major expectations and objectives, while the limited experience and knowledge of personnel is their main concern. In general, gaps in the existing literature can be identified in the challenges that arise for big data projects in Industry 4.0. Consequently, further research in the field is recommended in order to raise awareness among interested parties and ensure project success.
25

Rystadius, Gustaf, David Monell, and Linus Mautner. "The dynamic management revolution of Big Data : A case study of Åhlen’s Big Data Analytics operation." Thesis, Jönköping University, Internationella Handelshögskolan, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-48959.

Full text
Abstract:
Background: The implementation of Big Data Analytics (BDA) has increased drastically within several sectors, such as retailing. Due to its rapidly changing environment, companies have to adapt and modify their business strategies and models accordingly. The concepts of ambidexterity and agility are said to act as mediators of these changes in relation to a company’s capabilities within BDA. Problem: Research within the respective fields of dynamic mediators and BDAC has been conducted, but the investigation of specific traits of these mediators, their interconnection and their impact on BDAC is scant. Scholars have remarked on this gap and called for further empirical investigation. Purpose: This paper sought to empirically investigate which specific traits of ambidexterity and agility emerged within the case company Åhlens’ BDA operation, and how these traits are interconnected. It further studied how these traits and their interplay impact the firm’s talent and managerial BDAC. Method: A qualitative case study of the retail firm Åhlens was conducted with three participants central to the firm’s BDA operation. Semi-structured interviews were conducted with questions derived from a conceptual framework based upon reviewed literature and pilot interviews. The data was then analyzed and matched to the literature using a thematic analysis approach. Results: Five ambidextrous traits and three agile traits were found within Åhlens’ BDA operation. Analysis of these traits showed a clear positive impact on Åhlens’ BDAC when they were properly interconnected. Further, it was found that in the absence of such interplay, the dynamic mediators did not have as positive an impact and occasionally even had disruptive effects on the firm’s BDAC. Hence it was concluded that a proper connection between the mediators has to be present in order to successfully impact and enhance the capabilities.
APA, Harvard, Vancouver, ISO, and other styles
26

Hellström, Elin, and My Hemlin. "Det binära guldet : en uppsats om big data och analytics." Thesis, Uppsala universitet, Informationssystem, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-205900.

Full text
Abstract:
The purpose of this study is to investigate the concepts of big data and analytics. The concepts are explored on the basis of scientific theories and interviews with consulting firms, examining how these firms perceive and use big data and analytics. A healthcare organisation has also been interviewed to gain a richer understanding of how big data and analytics can be used to generate insights and how an organisation can benefit from them. A number of important difficulties and success factors connected to both concepts are presented. Each difficulty is then linked to a success factor that is considered able to help solve that problem. The most relevant success factors identified are the availability of high-quality data together with knowledge and expertise in how to handle the data. Finally, the concepts are clarified: big data is usually described in terms of the dimensions volume, variety and velocity, while analytics in most cases refers to descriptive and preventive analysis being carried out.
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Liangwei. "Big Data Analytics for eMaintenance : Modeling of high-dimensional data streams." Licentiate thesis, Luleå tekniska universitet, Drift, underhåll och akustik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-17012.

Full text
Abstract:
Big Data analytics has attracted intense interest from both academia and industry recently for its attempt to extract information, knowledge and wisdom from Big Data. In industry, with the development of sensor technology and Information & Communication Technologies (ICT), reams of high-dimensional data streams are being collected and curated by enterprises to support their decision-making. Fault detection from these data is one of the important applications in eMaintenance solutions with the aim of supporting maintenance decision-making. Early discovery of system faults may ensure the reliability and safety of industrial systems and reduce the risk of unplanned breakdowns. Both high dimensionality and the properties of data streams impose stringent challenges on fault detection applications. From the data modeling point of view, high dimensionality may cause the notorious “curse of dimensionality” and lead to the accuracy deterioration of fault detection algorithms. On the other hand, fast-flowing data streams require fault detection algorithms to have low computing complexity and give real-time or near real-time responses upon the arrival of new samples. Most existing fault detection models work on relatively low-dimensional spaces. Theoretical studies on high-dimensional fault detection mainly focus on detecting anomalies on subspace projections of the original space. However, these models are either arbitrary in selecting subspaces or computationally intensive. In considering the requirements of fast-flowing data streams, several strategies have been proposed to adapt existing fault detection models to online mode for them to be applicable in stream data mining. Nevertheless, few studies have simultaneously tackled the challenges associated with high dimensionality and data streams. In this research, an Angle-based Subspace Anomaly Detection (ABSAD) approach to fault detection from high-dimensional data is developed. Both analytical study and numerical illustration demonstrated the efficacy of the proposed ABSAD approach. Based on the sliding window strategy, the approach is further extended to an online mode with the aim of detecting faults from high-dimensional data streams. Experiments on synthetic datasets proved that the online ABSAD algorithm can be adaptive to the time-varying behavior of the monitored system, and hence applicable to dynamic fault detection.
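To make the angle-based intuition behind subspace anomaly detection more concrete, the following Python sketch scores each new sample in a sliding window by the variance of the cosines of the angles it forms with the other points; points with unusually low angle variance tend to lie outside the data cloud. This is only an illustration of the general idea, not the thesis' exact ABSAD algorithm, and the window size and threshold are illustrative assumptions.

import numpy as np

def angle_variance_score(window, i):
    # Vectors from point i to every other point in the window
    diffs = np.delete(window, i, axis=0) - window[i]
    unit = diffs / (np.linalg.norm(diffs, axis=1, keepdims=True) + 1e-12)
    cos = unit @ unit.T                       # pairwise cosines of angles at point i
    iu = np.triu_indices(len(unit), k=1)
    return np.var(cos[iu])                    # low variance suggests an outlying point

def detect_faults(stream, window_size=200, threshold=0.01):
    """Sliding-window, angle-based fault detection (illustrative sketch)."""
    buffer = []
    for t, x in enumerate(stream):
        buffer.append(np.asarray(x, dtype=float))
        if len(buffer) > window_size:
            buffer.pop(0)
        if len(buffer) == window_size:
            window = np.vstack(buffer)
            if angle_variance_score(window, len(window) - 1) < threshold:
                yield t                       # index of a suspected fault sample

In a real deployment the threshold would be calibrated on fault-free data, and the relevant subspace would be selected per sample as the thesis describes.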
Approved; 2015; 20150512 (liazha); The following person will hold a licentiate seminar for the degree of Licentiate of Engineering. Name: Liangwei Zhang. Subject: Operation and Maintenance Engineering. Thesis: Big Data Analytics for eMaintenance. Examiner: Professor Uday Kumar, Department of Civil, Environmental and Natural Resources Engineering, Division of Operation, Maintenance and Acoustics, Luleå University of Technology. Discussant: Professor Wolfgang Birk, Department of Computer Science, Electrical and Space Engineering, Division of Signals and Systems, Luleå University of Technology. Time: Wednesday 10 June 2015 at 10.00. Place: E243, Luleå University of Technology.
APA, Harvard, Vancouver, ISO, and other styles
28

Palummo, Alexandra Lina. "Supporto SQL al sistema Hadoop per big data analytics." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016.

Find full text
Abstract:
In recent years, Big Data has been discussed more and more often, referring not only to the large volumes of data generated by phenomena such as the explosion of social networks and the unprecedented acceleration of technological development; the expression also covers a set of new requirements and the resulting challenges, known as the three Vs: Volume, Velocity and Variety. In order to analyse and extract information from these large volumes of data, resources and technologies have been developed that differ from conventional data storage and management systems. One of the most successful of these technologies is Apache Hadoop, an open source framework from Apache. This work presents an overview of Hadoop, which was conceived to support distributed applications and to simplify the storage and management of large datasets, providing an alternative to relational DBMSs, which are poorly suited to Big Data transformations. Hadoop also provides tools capable of analysing and processing large amounts of information, among them Hive, Impala and BigSQL 3.0, which are described in the second part of the work. Comparing the performance of these three systems in an experiment based on the TPC-DS benchmark on a Hadoop platform showed that BigSQL 3.0 achieves the best performance.
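To illustrate the kind of declarative access that such SQL-on-Hadoop engines provide, the sketch below submits a simple aggregate query to a HiveServer2 endpoint from Python. The host, table and column names and the use of the pyhive package are assumptions for illustration only, not details taken from the thesis or its benchmark setup.

from pyhive import hive   # assumes the pyhive package and a running HiveServer2

def top_selling_items(host="localhost", port=10000, limit=10):
    # Declarative aggregation pushed down to the Hadoop cluster
    conn = hive.Connection(host=host, port=port)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT item_id, SUM(quantity) AS total_sold "
        "FROM sales GROUP BY item_id "
        "ORDER BY total_sold DESC LIMIT {}".format(limit)
    )
    rows = cursor.fetchall()   # list of (item_id, total_sold) tuples
    conn.close()
    return rows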
APA, Harvard, Vancouver, ISO, and other styles
29

MA, YIXIAO. "Big Data Analytics of City Wide Building Energy Declarations." Thesis, KTH, Industriell ekologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-165080.

Full text
Abstract:
This thesis explores the building energy performance of the domestic sector in the city of Stockholm based on the building energy declaration database. The aims of this master thesis are to analyze the big data sets of around 20,000 buildings in the Stockholm region and to explore the correlation between building energy performance and different internal and external factors affecting building energy consumption, such as building energy systems, building vintages, and so on. By using a clustering method, buildings with different energy consumptions can be easily identified. Thereafter, the energy saving potential is estimated by setting step-by-step targets, while feasible energy saving solutions can also be proposed in order to drive building energy performance at the city level. A brief introduction to several key concepts (energy consumption in buildings, building energy declarations and big data) serves as the background information, which helps to clarify the necessity of conducting this master thesis. The methods used in this thesis include data processing, descriptive analysis, regression analysis, clustering analysis and energy saving potential analysis. The provided building energy declaration data is first processed in MS Excel and then reorganized in MS Access. For the data analysis process, IBM SPSS is further introduced for the descriptive analysis and graphical representation. By defining different energy performance indicators, the descriptive analysis presents the energy consumption and its composition for different building classifications. The results also give the application details of different ventilation systems in different building types. Thereafter, the correlation between building energy performance and five different independent variables is analyzed using a linear regression model. Clustering analysis is further performed on the studied buildings for the purpose of targeting low energy efficiency groups, and buildings with various energy consumptions are well identified and grouped based on their energy performance. This shows that the clustering method is quite useful in big data analysis; however, some parameters in the clustering process need to be further adjusted in order to achieve more satisfactory results. The energy saving potential for the studied buildings is calculated as well. The conclusion shows that the maximal potential for energy savings in the studied buildings is estimated at 43% (2.35 TWh) for residential buildings and 54% (1.68 TWh) for non-residential premises, and the saving potential is calculated for different building categories and different clusters as well.
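A minimal sketch of the clustering step described above, assuming the declaration data have been exported to a CSV file; the file name, column names and the choice of five clusters are illustrative assumptions, not details from the thesis.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical export of the building energy declaration database
df = pd.read_csv("energy_declarations.csv")
features = df[["energy_use_kwh_per_m2", "heated_area_m2", "construction_year"]].dropna()

# Standardise so that no single indicator dominates the distance metric
X = StandardScaler().fit_transform(features)

# Group buildings into performance clusters; low-efficiency groups can then be targeted
kmeans = KMeans(n_clusters=5, random_state=0, n_init=10).fit(X)
features = features.assign(cluster=kmeans.labels_)
print(features.groupby("cluster")["energy_use_kwh_per_m2"].describe())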
APA, Harvard, Vancouver, ISO, and other styles
30

Moran, Andrew M. Eng Massachusetts Institute of Technology. "Improving big data visual analytics with interactive virtual reality." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105972.

Full text
Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 80-84).
For decades, the growth and volume of digital data collection have made it challenging to digest large volumes of information and extract underlying structure. Coined 'Big Data', these massive amounts of information have quite often been gathered inconsistently (e.g. from many sources, in various forms, at different rates, etc.). These factors impede the practice of not only processing data, but also analyzing and displaying it in an efficient manner to the user. Many efforts have been made in the data mining and visual analytics community to create effective ways to further improve analysis and achieve the knowledge desired for better understanding. Our approach to improved big data visual analytics is two-fold, focusing on both visualization and interaction. Given geo-tagged information, we are exploring the benefits of visualizing datasets in the original geospatial domain by utilizing a virtual reality platform. After running proven analytics on the data, we intend to represent the information in a more realistic 3D setting, where analysts can achieve an enhanced situational awareness and rely on familiar perceptions to draw in-depth conclusions about the dataset. In addition, developing a human-computer interface that responds to natural user actions and inputs creates a more intuitive environment. Tasks can be performed to manipulate the dataset and allow users to dive deeper upon request, adhering to desired demands and intentions. Due to the volume and popularity of social media, we developed a 3D tool visualizing Twitter activity on MIT's campus for analysis. Utilizing the emerging technologies of today to create a fully immersive tool that promotes visualization and interaction can help ease the process of understanding and representing big data.
by Andrew Moran.
M. Eng.
APA, Harvard, Vancouver, ISO, and other styles
31

Jun, Sang-Woo. "Scalable multi-access flash store for Big Data analytics." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/87947.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 47-49).
For many "Big Data" applications, the limiting factor in performance is often the transportation of large amounts of data from hard disks to where it can be processed, i.e. DRAM. In this work we examine an architecture for a scalable distributed flash store which aims to overcome this limitation in two ways. First, the architecture provides a high-performance, high-capacity, scalable random-access storage. It achieves high throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. The additional latency for remote data access via this network is negligible compared to flash access time. Second, it permits some computation near the data via an FPGA-based programmable flash controller. The controller is located in the datapath between the storage and the host, and provides hardware acceleration for applications without any additional latency. We have constructed a small-scale prototype whose network bandwidth scales directly with the number of nodes, and where the average latency for user software to access the flash store is less than 70 µs, including 3.5 µs of network overhead.
by Sang-Woo Jun.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
32

Alaka, H. A. "'Big data analytics' for construction firms insolvency prediction models." Thesis, University of the West of England, Bristol, 2017. http://eprints.uwe.ac.uk/30714/.

Full text
Abstract:
In a pioneering effort, this study is the first to develop a construction firms insolvency prediction model (CF-IPM) with Big Data Analytics (BDA); to combine qualitative and quantitative variables; to use advanced artificial intelligence tools such as Random Forest and Bart Machine; and to use data from construction firms (CFs) of all sizes, ensuring wide applicability. The pragmatism paradigm was employed to allow the use of mixed methods. This was necessary to allow the views of the top management teams (TMTs) of failed and existing construction firms to be captured using a qualitative approach. TMT members of 13 existing and 14 failed CFs were interviewed. The interview results were used to create a questionnaire with over a hundred qualitative variables. A total of 272 and 259 (531) usable questionnaires were returned for existing and failed CFs respectively. The data from the 531 questionnaires were oversampled to obtain a total questionnaire sample of 1052 CFs. The original and matched-sample financial data of the firms were downloaded. Using Cronbach's alpha and factor analysis, the qualitative variables were reduced to 13 (Q1 to Q13), while 11 financial ratios (i.e. quantitative variables) (R1 to R11) reported by large and MSM CFs were identified for the sample CFs. The BDA system was set up with the Amazon Web Services Elastic Compute Cloud using five 'Instances' as Hadoop DataNodes and one as the NameNode. The NameNode was configured as the Spark Master. Eleven variable selection methods and three voting systems were used to select the final seven qualitative and seven quantitative variables, which were used to develop 13 BDA-CF-IPMs. The Decision Tree BDA-CF-IPM was the model of choice in this study because it had high accuracy, a low Type I error and transparency. The most important variables (factors) affecting the insolvency of construction firms according to the best model are return on total assets; liquidity; solvency ratio; top management characteristics; strategic issues and external relations; finance and conflict related issues; and industry contract/project knowledge.
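As a hedged illustration of the kind of classifier the study develops (not its actual pipeline, data or variable definitions), the sketch below trains a Random Forest on a table mixing financial ratios (R1 to R11) and questionnaire scores (Q1 to Q13) with a binary insolvency label; the file and column names are hypothetical.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("cf_sample.csv")              # hypothetical table of sampled firms
X = df[[f"R{i}" for i in range(1, 12)] + [f"Q{i}" for i in range(1, 14)]]
y = df["insolvent"]                            # 1 = failed firm, 0 = existing firm

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
# Rank variables by importance (e.g. return on total assets, liquidity, solvency ratio)
print(sorted(zip(model.feature_importances_, X.columns), reverse=True)[:7])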
APA, Harvard, Vancouver, ISO, and other styles
33

Olsén, Cleas, and Gustav Lindskog. "Big Data Analytics : A potential way to Competitive Performance." Thesis, Linnéuniversitetet, Institutionen för informatik (IK), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-104372.

Full text
Abstract:
Big data analytics (BDA) has become an increasingly popular topic over the years amongst academics and practitioners alike. Big data, which is an important part of BDA, was originally defined by three Vs: volume, velocity and variety. In later years more Vs have surfaced to better accommodate current needs. The analytics part of BDA consists of different methods of analysing the gathered data. Analysing data can provide insights to organisations, which in turn can give organisations competitive advantage and enhance their businesses. Looking into the resources needed to build big data analytic capabilities (BDAC), this thesis set out to find how Swedish organisations enable and use BDA in their businesses. The thesis also investigated whether BDA could lead to performance enhancement and competitive advantage for organisations. A theoretical framework based on previous studies was adapted and used in order to help answer the thesis purpose. A qualitative study using semi-structured interviews was deemed the most suitable approach. Previous studies in this field pointed to the fact that organisations may not be aware of how or why to enable or use BDA. According to the current literature, different resources need to work in conjunction with each other in order to create BDAC and enable BDA to be utilised. Several studies discuss challenges such as organisational culture, human skills, and the need for top management to support BDA initiatives for them to succeed. The findings from the interviews in this study indicated that, in a Swedish context, different resources, such as data, technical skills and a data-driven culture, among others, are being used to enable BDA. Furthermore, the results showed that business process improvement is a first step in organisations' efforts to benefit from BDA, because the profits and effects of such an investment are easier and safer to calculate. Depending on how far organisations have come in their transformation process, they may also innovate and/or create products or services from the insights made possible by BDA.
APA, Harvard, Vancouver, ISO, and other styles
34

Mathias, Henry. "Analyzing Small Businesses' Adoption of Big Data Security Analytics." ScholarWorks, 2019. https://scholarworks.waldenu.edu/dissertations/6614.

Full text
Abstract:
Despite the increased cost of data breaches due to advanced, persistent threats from malicious sources, the adoption of big data security analytics among U.S. small businesses has been slow. Anchored in diffusion of innovation theory, the purpose of this correlational study was to examine ways to increase the adoption of big data security analytics among small businesses in the United States by examining the relationship between small business leaders' perceptions of big data security analytics and their adoption. The research questions were developed to determine how to increase the adoption of big data security analytics, which can be measured as a function of the user's perceived attributes of innovation represented by the independent variables: relative advantage, compatibility, complexity, observability, and trialability. The study included a cross-sectional survey distributed online to a convenience sample of 165 small businesses. Pearson correlations and multiple linear regression were used to statistically understand relationships between the variables. There were no significant positive correlations between relative advantage, compatibility, and the dependent variable, adoption; however, there were significant negative correlations between complexity, trialability, and adoption. There was also a significant positive correlation between observability and adoption. The implications for positive social change include an increase in knowledge, skill sets, and jobs for employees and increased confidentiality, integrity, and availability of systems and data for small businesses. Social benefits include improved decision making for small businesses and increased secure transactions between systems by detecting and eliminating advanced, persistent threats.
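The statistical procedure described (Pearson correlations for each perceived attribute of innovation, followed by multiple linear regression on adoption) can be sketched as follows; the survey file and column names are assumptions for illustration only.

import pandas as pd
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

survey = pd.read_csv("doi_survey.csv")   # hypothetical responses from small businesses
attributes = ["relative_advantage", "compatibility", "complexity",
              "observability", "trialability"]

# Bivariate Pearson correlations with the dependent variable (adoption)
for a in attributes:
    r, p = pearsonr(survey[a], survey["adoption"])
    print(f"{a}: r={r:.2f}, p={p:.3f}")

# Multiple linear regression with all five perceived attributes as predictors
reg = LinearRegression().fit(survey[attributes], survey["adoption"])
print(dict(zip(attributes, reg.coef_)))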
APA, Harvard, Vancouver, ISO, and other styles
35

Vahedian, Khezerlou Amin. "Mining big mobility data for large urban event analytics." Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/7039.

Full text
Abstract:
This thesis seeks to formulate concepts and develop methods that facilitate the mining of urban big mobility data. Specifically, the aim of the formulations and developed methods is to identify and predict certain events that occur as a result of urban mobility. This thesis studies unexpected gathering and dispersal events. A gathering event is the process of an unusually large number of moving objects (e.g. taxis) arriving at the same area within a short period of time. It is important for city management to identify emerging gathering events which might cause public safety or sustainability concerns. Similarly, a dispersal event is the process of an unusually large number of moving objects leaving the same area within a short period of time. Early prediction of dispersal events is important for mitigating congestion and safety risks and making better dispatching decisions for taxi and ride-sharing fleets. This thesis solves the problems of early detection and forecasting of gathering events and of predicting dispersal events. Prior work on detecting gathering events uses undirected patterns, which lack the ability to specify the dynamic flow of the traffic and the destination of the gathering. Forecasting gathering events is a predictive approach, as opposed to the descriptive approach of detection. This thesis is the first to use destination prediction to forecast gathering events. Moreover, the presented destination prediction technique relaxes the independence assumptions of related work and addresses the resulting challenges to achieve superior performance. The literature on dispersal event prediction treats this problem as a taxi demand prediction problem. Those methods aim at predicting the regular pattern and are unable to predict rare events. This thesis presents the SmartEdge algorithm for early detection of gathering events. SmartEdge outputs a gathering footprint that specifies gathering paths and the gathering destination. To forecast gathering events, this thesis presents DH-VIGO, which uses a dynamic hybrid model to forecast rare gathering events ahead of time. Comprehensive evaluations using real-world datasets demonstrate meaningful results and superior performance compared to baseline methods. To predict dispersal events, this thesis uses DILSA+, a two-stage framework based on survival analysis, to predict the start time of the event, together with an event demand predictor to predict the volume of the demand in case of a dispersal event. Extensive evaluations on real-world data demonstrate that DILSA+ outperforms baselines and can effectively predict dispersal events.
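A much simplified sketch of the gathering-detection idea (flagging a region when arrivals within a short window greatly exceed their historical level) is shown below; it is not the SmartEdge or DH-VIGO algorithm, and the grid cells, window length and threshold factor are illustrative assumptions.

import pandas as pd

def gathering_alerts(trips, freq="15min", factor=3.0):
    """trips: DataFrame with columns ['dropoff_time', 'cell'], where 'dropoff_time'
    is a datetime and 'cell' is the destination grid cell. Flags (window, cell)
    pairs whose arrival count exceeds `factor` times that cell's median arrivals
    for the same time of day."""
    trips = trips.copy()
    trips["window"] = trips["dropoff_time"].dt.floor(freq)
    counts = trips.groupby(["window", "cell"]).size().rename("arrivals").reset_index()
    counts["tod"] = counts["window"].dt.time          # time of day, used for the baseline
    baseline = counts.groupby(["cell", "tod"])["arrivals"].transform("median")
    return counts[counts["arrivals"] > factor * baseline]

# Example: alerts = gathering_alerts(taxi_trips_df)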
APA, Harvard, Vancouver, ISO, and other styles
36

Hahmann, Martin, Claudio Hartmann, Lars Kegel, Dirk Habich, and Wolfgang Lehner. "Big by blocks: Modular Analytics." De Gruyter, 2016. https://tud.qucosa.de/id/qucosa%3A72848.

Full text
Abstract:
Big Data and Big Data analytics have attracted major interest in research and industry and continue to do so. The high demand for capable and scalable analytics, in combination with the ever increasing number and volume of application scenarios and data, has led to a large and opaque landscape full of versions, variants and individual algorithms. As this zoo of methods lacks a systematic way of being described, understanding it is almost impossible, which severely hinders the effective application and efficient development of analytic algorithms. To solve this issue we propose our concept of modular analytics, which abstracts the essentials of an analytic domain and turns them into a set of universal building blocks. As arbitrary algorithms can be created from the same set of blocks, understanding is eased and development benefits from reusability.
APA, Harvard, Vancouver, ISO, and other styles
37

Cao, Lei. "Outlier Detection In Big Data." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/82.

Full text
Abstract:
The dissertation focuses on scaling outlier detection to work both on huge static and on dynamic streaming datasets. Outliers are patterns in the data that do not conform to the expected behavior. Outlier detection techniques are broadly applied in applications ranging from credit fraud prevention and network intrusion detection to stock investment tactical planning. For such mission critical applications, a timely response is often of paramount importance. Yet the processing of outlier detection requests is algorithmically complex and resource-consuming. In this dissertation we investigate the challenges of detecting outliers in big data, in particular those caused by the high velocity of streaming data, the big volume of static data and the large cardinality of the input parameter space for tuning outlier mining algorithms. Effective optimization techniques are proposed to assure the responsiveness of outlier detection in big data. In this dissertation we first propose a novel optimization framework called LEAP to continuously detect outliers over data streams. The continuous discovery of outliers is critical for a large range of online applications that monitor high volume, continuously evolving streaming data. LEAP encompasses two general optimization principles that utilize the rarity of the outliers and the temporal priority relationships among stream data points. Leveraging these two principles, LEAP not only is able to continuously deliver outliers with respect to a set of popular outlier models, but also provides near real-time support for processing powerful outlier analytics workloads composed of large numbers of outlier mining requests with various parameter settings. Second, we develop a distributed approach to efficiently detect outliers over massive-scale static data sets. In this big data era, as the volume of the data advances to new levels, the power of distributed compute clusters must be employed to detect outliers in a short turnaround time. In this research, our approach optimizes the key factors determining the efficiency of distributed data analytics, namely communication costs and load balancing. In particular, we prove that the traditional frequency-based load balancing assumption is not effective. We thus design a novel cost-driven data partitioning strategy that achieves load balancing. Furthermore, we abandon the traditional approach of using one detection algorithm for all compute nodes and instead propose a novel multi-tactic methodology which adaptively selects the most appropriate algorithm for each node based on the characteristics of the data partition assigned to it. Third, traditional outlier detection systems process each individual outlier detection request, instantiated with a particular parameter setting, one at a time. This is not only prohibitively time-consuming for large datasets, but also tedious for analysts as they explore the data to home in on the most appropriate parameter setting or on the desired results. We thus design an interactive outlier exploration paradigm that is not only able to answer traditional outlier detection requests in near real-time, but also offers innovative outlier analytics tools to assist analysts to quickly extract, interpret and understand the outliers of interest. Our experimental studies, including performance evaluations and user studies conducted on real-world datasets including stock, sensor, moving object, and geolocation datasets, confirm both the effectiveness and efficiency of the proposed approaches.
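For readers unfamiliar with the distance-threshold outlier model that frameworks such as LEAP optimise, the sketch below flags a point as an outlier if fewer than k neighbours lie within radius r inside the current sliding window. It deliberately uses a naive re-evaluation on every arrival, which is exactly the cost that such optimisation frameworks avoid; the parameter values are illustrative assumptions.

from collections import deque
import numpy as np

def stream_outliers(stream, window_size=1000, r=1.0, k=5):
    """Naive distance-threshold outlier detection over a sliding window."""
    window = deque(maxlen=window_size)
    for t, x in enumerate(stream):
        x = np.asarray(x, dtype=float)
        if len(window) >= k:
            dists = np.linalg.norm(np.vstack(list(window)) - x, axis=1)
            if np.count_nonzero(dists <= r) < k:   # too few neighbours within radius r
                yield t, x
        window.append(x)

# Example: list(stream_outliers(iter(points), window_size=500, r=0.8, k=10))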
APA, Harvard, Vancouver, ISO, and other styles
38

Singh, Shailendra. "Smart Meters Big Data : Behavioral Analytics via Incremental Data Mining and Visualization." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/35244.

Full text
Abstract:
The big data framework applied to smart meters offers an exceptional platform for data-driven forecasting and decision making to achieve sustainable energy efficiency. Securing consumer confidence by respecting occupants' energy consumption behavior and preferences, so as to improve participation in various energy programs, is imperative but difficult to achieve. The key elements for understanding and predicting household energy consumption are the activities occupants perform, the appliances and the times at which they are used, and inter-appliance dependencies. This information can be extracted from the context-rich big data from smart meters, although this is challenging because: (1) it is not trivial to mine complex interdependencies between appliances from multiple concurrent data streams; (2) it is difficult to derive accurate relationships between interval-based events, where usage of multiple appliances persists; (3) continuous generation of the energy consumption data can trigger changes in appliance associations with time and appliances. To overcome these challenges, we propose an unsupervised progressive incremental data mining technique using frequent pattern mining (appliance-appliance associations) and cluster analysis (appliance-time associations) coupled with a Bayesian network based prediction model. The proposed technique addresses the need to analyze temporal energy consumption patterns at the appliance level, which directly reflect consumers' behaviors and provide a basis for generalizing household energy models. Extensive experiments were performed on the model with real-world datasets and strong associations were discovered. The accuracy of the proposed model for predicting the usage of multiple appliances outperformed a support vector machine at every stage, attaining accuracies of 81.65%, 85.90% and 89.58% for 25%, 50% and 75% of the training dataset size respectively. Moreover, accuracy results of 81.89%, 75.88%, 79.23%, 74.74%, and 72.81% were obtained for short-term (hours) and long-term (day, week, month, and season) energy consumption forecasts, respectively.
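A toy sketch of the appliance-appliance association step (counting how often pairs of appliances are active in the same time slot and keeping the frequent pairs) is given below; the real technique is incremental and is coupled with clustering and a Bayesian network, and the data format assumed here is an illustration only.

from collections import Counter
from itertools import combinations

def frequent_appliance_pairs(slots, min_support=0.05):
    """slots: iterable of sets of appliance names active in each time slot.
    Returns pairs whose co-occurrence frequency is at least min_support."""
    slots = list(slots)
    pair_counts = Counter()
    for active in slots:
        for pair in combinations(sorted(active), 2):
            pair_counts[pair] += 1
    n = len(slots)
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}

# Example:
# slots = [{"kettle", "toaster"}, {"kettle", "toaster", "tv"}, {"tv"}]
# frequent_appliance_pairs(slots, min_support=0.5)  ->  {('kettle', 'toaster'): 0.667}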
APA, Harvard, Vancouver, ISO, and other styles
39

Plevoets, Christina, and Rodrigo Fernandes. "Exploring the role of Big Data and Analytics : Creating Data-Driven Innovation." Thesis, Blekinge Tekniska Högskola, Institutionen för industriell ekonomi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13463.

Full text
Abstract:
Purpose: The purpose of this thesis is to understand the role of big data in the innovation process and to specify the conditions or scenarios in which big data and innovation can join forces. Research design, approach and method: The theoretical framework used is based on the innovation framework with big data from Kusiak (2015) and is examined through three propositions: (1) big data analytics (BDA) makes it possible to discover new insights, such as increased profitability, expansion of the customer base and market growth, and is hence a contributor to innovation; (2) firms are investing, or plan to invest, massively in big data analytics, in terms of money and time, in order to gain a competitive advantage; (3) big data analytics is a general purpose technology (GPT) and can thus bring value across different industries. The study is explorative, using a qualitative approach primarily based on interviews and supported by one exploratory survey. A total of four individuals were interviewed through semi-structured interviews and two through unstructured interviews. Regarding the cross-sectional online survey, the response rate was 29%, with 15 questionnaires filled out. Findings: The outcome of the study is that BDA makes it possible to discover new insights and is hence a contributor to innovation, provided that an evolving process is in place and that humans interact with the results; the proposition that firms are investing or plan to invest massively in BDA in order to gain a competitive advantage is only partially supported, as financial figures and orders of magnitude are lacking; and the proposition that BDA is a general purpose technology and can thus bring value across different industries (e.g. insurance and metallurgy) is supported by the empirical findings. In conclusion, big data analytics can play a role in the innovation process in three different phases: data storage, data analysis and innovation knowledge. This is reflected in the fact that BDA can trigger innovation, BDA can be innovation, and different sources of data can contribute to insights in a BDA system.
APA, Harvard, Vancouver, ISO, and other styles
40

Stouten, Floris. "Big data analytics attack detection for Critical Information Infrastructure Protection." Thesis, Luleå tekniska universitet, Datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-59562.

Full text
Abstract:
Attacks on critical information infrastructure are increasing in volume and sophistication, with destructive consequences, according to the 2015 Cyber Supply Chain Security Revisited report from ESG (ESG, 2015). In a world of connectivity and data dependency, cyber-crime is on the rise, causing many disruptions in our way of living. Our society relies on these critical information infrastructures for its social and economic well-being, and they become more complex due to many integrated systems. Over the past years, various research contributions have been made to provide intrusion detection solutions that address these complex attack problems. Even though various research attempts have been made, shortcomings still exist in these solutions for attack detection. False positive and false negative outcomes for attack detection are still known shortcomings that must be addressed. This study contributes research by finding a solution to the identified shortcomings through the design of an IT artifact framework based on the Design Science Research Methodology (DSRM). The framework consists of big data analytics technology that provides attack detection. The research outcomes of this study show a possible solution to the shortcomings through the designed IT artifact framework using big data analytics technology. The framework, built on open source technology, can provide attack detection and can possibly provide a solution that improves the false positive and false negative outcomes of attack detection. Three main modules have been designed and demonstrated, whereby a hybrid detection approach is used to address the shortcomings. Therefore, this research can benefit Critical Information Infrastructure Protection (CIIP) in Sweden by detecting attacks and can possibly be utilized in various network infrastructures.
APA, Harvard, Vancouver, ISO, and other styles
41

Jun, Sang-Woo. "Big data analytics made affordable using hardware-accelerated flash storage." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118088.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 175-192).
Vast amounts of data are continuously being collected from sources including social networks, web pages, and sensor networks, and their economic value depends on our ability to analyze them in a timely and affordable manner. High performance analytics have traditionally required a machine or a cluster of machines with enough DRAM to accommodate the entire working set, due to their need for random accesses. However, datasets of interest now regularly exceed terabytes in size, and the cost of purchasing and operating a cluster with hundreds of machines is becoming a significant overhead. Furthermore, the performance of many random-access-intensive applications plummets even when a fraction of the data does not fit in memory. On the other hand, such datasets could easily be stored in the flash-based secondary storage of a rack-scale cluster, or even a single machine, for a fraction of the capital and operating costs. While flash storage has much better performance than hard disks, there are many hurdles to overcome in order to reach the performance of DRAM-based clusters. This thesis presents a new system architecture as well as operational methods that enable flash-based systems to achieve performance comparable to much costlier DRAM-based clusters for many important applications. We describe a highly customizable architecture called BlueDBM, which includes flash storage devices augmented with in-storage hardware accelerators, networked using a separate storage-area network. Using a prototype BlueDBM cluster with custom-designed accelerated storage devices, as well as novel accelerator designs and storage management algorithms, we have demonstrated high performance at low cost for applications including graph analytics, sorting, and database operations. We believe this approach to handling Big Data analytics is an attractive solution to the cost-performance issue of Big Data analytics.
by Sang-Woo Jun.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
42

Bin, Saip Mohamed A. "Big Social Data Analytics: A Model for the Public Sector." Thesis, University of Bradford, 2019. http://hdl.handle.net/10454/18352.

Full text
Abstract:
The influence of Information and Communication Technologies (ICTs), particularly internet technology, has had a fundamental impact on the way government is administered, provides services and interacts with citizens. Currently, the use of social media is no longer limited to informal environments; it is an increasingly important medium of communication between citizens and governments. The extensive and increasing use of social media will continue to generate huge amounts of user-generated content known as Big Social Data (BSD). The growing body of BSD presents innumerable opportunities as well as challenges for local government planning, management and delivery of public services to citizens. However, governments have not yet utilised the potential of BSD for better understanding the public and gaining new insights from this new way of interacting. Among the reasons are the lack of mechanisms and guidance for analysing this new format of data. Thus, the aim of this study is to evaluate how the body of BSD can be mined, analysed and applied in the context of local government in the UK. The objective is to develop a Big Social Data Analytics (BSDA) model that can be applied in the case of local government. Data generated from social media over a year were collected, collated and analysed using a range of social media analytics and network analysis tools and techniques. The final BSDA model was applied to a local council case to evaluate its impact in real practice. This study allows the methods for analysing BSD in the public sector to be better understood and extends the literature related to e-government, social media, and social network theory.
Universiti Utara Malaysia
APA, Harvard, Vancouver, ISO, and other styles
43

Stevens, Melissa Anine. "Creating value from big data and analytics : a leader's perspective." Diss., University of Pretoria, 2017. http://hdl.handle.net/2263/64819.

Full text
Abstract:
The world is increasingly using technologies that generate and consume unimaginably large quantities of data, called big data. The power of big data does not lie only in its quantum, but in what organisations do with it. Within a business context, big data and analytic methodologies offer the potential to generate unique insights. However, the reality is that many organisations have not yet mastered the art of using big data and analytics to create value. The objective of this research was to assist organisations that are on the journey to becoming data-led by exploring a leader's perspective on the required building blocks of the process through which big data and analytics create value. This topic was explored through nine qualitative, semi-structured interviews with leaders of financial service organisations operating in South Africa. The study found that organisations that created value from big data and analytics needed leadership support to be able to successfully create a data-led decision-making environment. Furthermore, organisations needed diverse skills embodied in staff who were willing to learn continuously and had strong quantitative abilities and business acumen. Different physical infrastructure is also needed, and this creates a need for financing. Importantly, organisations also needed to have an understanding of what value they were pursuing through a big data initiative.
Mini Dissertation (MBA)--University of Pretoria, 2017.
lt2018
Gordon Institute of Business Science (GIBS)
MBA
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
44

Niland, Michael John. "Toward the influence of the organisation on big data analytics." Diss., University of Pretoria, 2017. http://hdl.handle.net/2263/64902.

Full text
Abstract:
Big Data Analytics (BDA) capabilities are of significant interest to organisations as they are reportedly able to enhance Firm Performance (FPer). This is of particular interest given the ever increasing demands on businesses to remain competitive within a dynamic market. Though BDA has been shown to be effective across diverse industries and applications, it has also been shown to be ineffective on many occasions. This study draws on theories concerning organisational capabilities, including dynamic capability theory, decision making theory and organisational culture theory, to assess the influence of the organisation on the effectiveness of a BDA capability. In so doing, this study extends the above research streams by assessing the direct and moderated influences of Firm Dynamic Capability (FDC), Organisational Culture to BDA (OCDA) and Data Driven Decision Making (DDDM) on the BDA capability, and the resulting impact on FPer. The hierarchical research model was assessed with 80 online survey responses from respondents, primarily based in South Africa, who were directly associated with a BDA capability in their organisation. The results illustrate the significant direct and moderating impacts of these organisational factors on BDA and FPer. The implications of these findings are discussed relative to both theoretical and practical applications in business settings, followed by considerations for further research required in the field.
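A hierarchical model with direct and moderated (interaction) effects of the kind described above can be illustrated with an ordinary least squares regression containing interaction terms; the variable names and data file below are hypothetical, and this is not the study's actual model specification.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("bda_survey.csv")   # hypothetical survey responses

# Direct effects of FDC, OCDA and DDDM on the BDA capability,
# then interaction (moderation) terms with BDA when explaining firm performance
bda_model = smf.ols("bda ~ fdc + ocda + dddm", data=df).fit()
fper_model = smf.ols("fper ~ bda + bda:fdc + bda:ocda + bda:dddm", data=df).fit()

print(bda_model.summary())
print(fper_model.summary())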
Mini Dissertation (MBA)--University of Pretoria, 2017.
km2018
Gordon Institute of Business Science (GIBS)
MBA
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
45

Khan, Mukhtaj. "Hadoop performance modeling and job optimization for big data analytics." Thesis, Brunel University, 2015. http://bura.brunel.ac.uk/handle/2438/11078.

Full text
Abstract:
Big data has gained momentum in both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics. Hadoop, which is an open source implementation of the MapReduce model, has been widely taken up by the community. Cloud service providers such as the Amazon EC2 cloud now support Hadoop user applications. However, a key challenge is that the cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for a job running in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for a job to be completed within a deadline. The proposed model employs a Locally Weighted Linear Regression (LWLR) model to estimate the execution time of a job and the Lagrange multiplier technique for resource provisioning to satisfy user jobs with a given deadline. The performance of the proposed model is extensively evaluated both on an in-house Hadoop cluster and on the Amazon EC2 cloud. Experimental results show that the proposed model is highly accurate in job execution estimation and that jobs are completed within the required deadlines when following the resource provisioning scheme of the proposed model. In addition, the Hadoop framework has over 190 configuration parameters and some of them have significant effects on the performance of a Hadoop job. Manually setting the optimum values for these parameters is a challenging and time-consuming task. This thesis presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values. It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlation among the configuration parameters. For the purpose of optimization, Particle Swarm Optimization (PSO) is employed to find automatically an optimal or a near-optimal configuration setting. The performance of the proposed work is intensively evaluated on a Hadoop cluster and the experimental results show that the proposed work enhances the performance of Hadoop significantly compared with the default settings.
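Locally weighted linear regression, the estimator named above, fits a separate weighted least-squares model around each query point, with weights that decay with distance from the query. A minimal numpy sketch follows; the bandwidth value and the single-feature setting are illustrative, and this is not the thesis' full performance model.

import numpy as np

def lwlr_predict(x_query, X, y, tau=1.0):
    """Locally weighted linear regression prediction at x_query.
    X: (n, d) training inputs (e.g. input data sizes), y: (n,) job execution times."""
    Xb = np.hstack([np.ones((len(X), 1)), X])          # add intercept column
    xq = np.hstack([1.0, np.atleast_1d(x_query)])
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))   # Gaussian weights
    W = np.diag(w)
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

# Example: predict the execution time of a job processing 80 GB of input
# X = np.array([[10.], [20.], [40.], [60.]]); y = np.array([120., 230., 470., 700.])
# lwlr_predict(np.array([80.0]), X, y, tau=20.0)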
APA, Harvard, Vancouver, ISO, and other styles
46

Rashid, A. N. M. Bazlur. "Cooperative co-evolution-based feature selection for big data analytics." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2021. https://ro.ecu.edu.au/theses/2428.

Full text
Abstract:
The rapid progress of modern technologies generates a massive amount of high-throughput data, called Big Data, which provides opportunities to find new insights using machine learning (ML) algorithms. Big Data consist of many features (attributes). However, irrelevant features may degrade the classification performance of ML algorithms. Feature selection (FS) is a combinatorial optimisation technique used to select a subset of relevant features that represent the dataset. For example, FS is an effective preprocessing step for anomaly detection techniques in Big Cybersecurity Datasets. Evolutionary algorithms (EAs) are widely used search strategies for feature selection. A variant of EAs, called a cooperative co-evolutionary algorithm (CCEA) or simply cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for large-scale optimisation problems. The goal of this thesis is to investigate and develop three key research issues related to feature selection in Big Data and anomaly detection using feature selection in Big Cybersecurity Data. The first research problem of this thesis is to investigate and develop a feature selection framework using CCEA. The objective of feature selection is twofold: selecting a suitable subset of features, in other words reducing the number of features to decrease computation, and improving classification accuracy; these objectives are contradictory, but can be balanced using a single objective function. Using only classification accuracy as the objective function for FS, EAs such as CCEA achieve higher accuracy, albeit with a higher number of features. Hence, this thesis proposes a penalty-based wrapper single objective function. This function has been used to evaluate the FS process using CCEA, henceforth called Cooperative Co-Evolutionary Algorithm-Based Feature Selection (CCEAFS). Experimental analysis was performed using six widely used classifiers on six different datasets, with and without FS. The experimental results indicate that the proposed objective function is efficient at reducing the number of features in the final feature subset without significantly reducing classification accuracy. Furthermore, the performance results have been compared with four other state-of-the-art techniques. CC decomposes a large and complex problem into several subproblems, optimises each subproblem independently, and makes the subproblems collaborate only to build a complete solution to the problem. The existing decomposition solutions have poor performance because of some limitations, such as not considering feature interactions, dealing only with an even number of features, and decomposing the dataset statically. However, for real-world problems without any prior information about how the features in a dataset interact, it is difficult to find a suitable problem decomposition technique for feature selection. Hence, the second research problem of this thesis is to investigate and develop a decomposition method that can decompose Big Datasets dynamically and can increase the probability of grouping interacting features into the same subcomponent. Accordingly, this thesis proposes random feature grouping (RFG), with three variants. RFG has been used in the CC-based FS process, hence called Cooperative Co-Evolution-Based Feature Selection with Random Feature Grouping (CCFSRFG).
Experimental analysis performed using six widely used ML classifiers on seven different datasets, with and without FS, indicates that, in most cases, the proposed CCFSRFG-1 outperforms CCEAFS and CCFSRFG-2, and also does so when using all features. Furthermore, the performance results have been compared with five other state-of-the-art techniques. Anomaly detection in Big Cybersecurity Datasets is very important; however, it is a very challenging and computationally expensive task. Feature selection in cybersecurity datasets may improve and quantify the accuracy and scalability of both supervised and unsupervised anomaly detection techniques. The third research problem of this thesis is to investigate and develop an anomaly detection approach using feature selection that can improve anomaly detection performance and also reduce execution time. Accordingly, this thesis proposes Anomaly Detection Using Feature Selection (ADUFS) to deal with this research problem. Experiments were performed on five different benchmark cybersecurity datasets, with and without feature selection, and the performance of both supervised and unsupervised anomaly detection techniques was investigated with ADUFS. The experimental results indicate that, instead of using the original dataset, a dataset with a reduced number of features yields better performance in terms of true positive rate (TPR) and false positive rate (FPR) than the existing techniques for anomaly detection. In addition, all anomaly detection techniques require less computational time when using datasets with a suitable subset of features rather than entire datasets. Furthermore, the performance results have been compared with six other state-of-the-art techniques.
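The penalty-based wrapper objective described above (rewarding classification accuracy while penalising the size of the selected feature subset) can be sketched as a single fitness function that any evolutionary search, including a cooperative co-evolutionary one, could optimise. The classifier, penalty weight and cross-validation setup below are illustrative assumptions, not the thesis' exact formulation.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def wrapper_fitness(mask, X, y, penalty=0.1):
    """mask: boolean vector selecting a feature subset of the numpy array X.
    Fitness = mean cross-validated accuracy minus a penalty proportional to subset size."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0                                    # empty subsets are worthless
    acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                          X[:, mask], y, cv=5).mean()
    return acc - penalty * mask.sum() / mask.size     # fewer features, smaller penalty

# An evolutionary (or cooperative co-evolutionary) search would evolve `mask`
# to maximise wrapper_fitness, e.g. wrapper_fitness(np.random.rand(X.shape[1]) > 0.5, X, y)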
APA, Harvard, Vancouver, ISO, and other styles
47

Kämpe, Gabriella. "How Big Data Affects User Experience : Reducing cognitive load in big data applications." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163995.

Full text
Abstract:
We have entered the age of big data. Massive data sets are common in enterprises, government, and academia. Interpreting data at such scales is still hard for the human mind. This thesis investigates how proper design can decrease the cognitive load in data-heavy applications. It focuses on numeric data describing economic growth in retail organizations. It aims to answer the questions: What is important to keep in mind when designing an interface that holds large amounts of data? and How can the cognitive load in complex user interfaces be decreased without reducing functionality? These questions are addressed by comparing two user interfaces in terms of efficiency, structure, ease of use and navigation. Each interface holds the same functionality and amount of data, but one is designed to increase user experience by reducing cognitive load. The design choices in the second application are based on the theory found in the literature study in the thesis.
APA, Harvard, Vancouver, ISO, and other styles
48

Boquet, Pujadas Guillem. "Contributions to Intelligent Transportation Systems. Big data analytics for reliable and valuable data." Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/673761.

Full text
Abstract:
The transportation industry has entered the era of big data. Part of the data disseminated by connected vehicles and infrastructure is being exploited by Intelligent Transport Systems (ITS), advanced applications in which information and communication technologies are applied to road traffic management. In the near future, all road vehicles are likely to communicate with one another and with the surrounding infrastructure, for example to warn others about traffic incidents or poor road conditions. However, the connectivity and data-analytics requirements of the envisaged use cases are far from being met. Dedicated Short Range Communication (DSRC) is a higher-layer standard built on the evolution of IEEE 802.11p Wi-Fi, one of the main technologies supporting the first generation of vehicle-to-everything (V2X) communication. The first part of this dissertation addresses the improvement of IEEE 802.11p direct vehicle-to-infrastructure communication in the ITS data acquisition layer, which suffers from a well-known scalability problem. The analysis carried out concludes that the data dissemination of standardized protocols is not reliable enough to support safety applications that depend on ITS roadside units located in intersection areas. To solve this, novel infrastructure-oriented criteria are proposed to adapt the communication parameters, and a standards-compliant intersection assistance protocol is designed that increases the reliability of the data acquisition layer to the point where safety applications can be implemented. Because the ITS data acquisition layer produces massive amounts of data, aggregation and processing in the data analytics and application layer are required to enable more advanced use cases: mission-critical applications with the potential to improve road safety and reduce pollution, traffic congestion and transportation costs. The second part of the dissertation proposes a generative deep learning model that can be used in an unsupervised manner to solve multiple ITS challenges. Big data collected by ITS is exploited and transformed into an asset for safety applications and decision-making, without the need for additional domain knowledge or labeled data. The model can simultaneously compress traffic data efficiently, forecast traffic, impute missing values, select the best data and models for a specific problem, and detect anomalous traffic data. The last part of the dissertation is motivated by growing concerns about the efficiency of ITS solutions and the large amount of data expected to be processed. The presented algorithm automatically and efficiently derives the minimal model architecture that yields maximally compressed representations while retaining the most useful information about the original traffic data. In this way, the performance of the subsequent ITS traffic forecasting system is not adversely affected, but instead benefits from data being represented with fewer dimensions, which is vitally important in the age of big data. The algorithm is grounded in concepts from Information Theory applied to neural networks, going a step beyond currently available methods based on trial and error.
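To make the idea of an unsupervised compression model for traffic data more concrete, the following is a minimal sketch, not the thesis's actual model: a small autoencoder over fixed-size windows of traffic measurements. The names (TrafficAutoencoder, window, latent_dim) and the toy data are assumptions for illustration; the compressed code could feed a downstream forecaster, and high reconstruction error could flag anomalous windows, in the spirit of the multi-task use described in the abstract.

```python
# Hypothetical sketch (not the dissertation's code): an autoencoder that
# compresses windows of traffic measurements and scores anomalies by
# reconstruction error.
import torch
import torch.nn as nn

class TrafficAutoencoder(nn.Module):
    def __init__(self, window: int = 24, latent_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(window, 16), nn.ReLU(),
            nn.Linear(16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16), nn.ReLU(),
            nn.Linear(16, window),
        )

    def forward(self, x):
        z = self.encoder(x)        # compressed representation
        return self.decoder(z), z  # reconstruction + latent code

model = TrafficAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy training loop on random stand-in data (real inputs would be windows
# of flow or speed measurements from roadside detectors).
x = torch.rand(256, 24)
for _ in range(200):
    recon, _ = model(x)
    loss = loss_fn(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Anomaly scoring: windows reconstructed poorly are candidate anomalies.
with torch.no_grad():
    recon, z = model(x)
    scores = ((recon - x) ** 2).mean(dim=1)
```

In such a setup, shrinking latent_dim trades reconstruction quality for compression; the dissertation's information-theoretic contribution is precisely about choosing that minimal architecture without trial and error.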
Universitat Autònoma de Barcelona. Programa de Doctorat en Enginyeria Electrònica i de Telecomunicació
APA, Harvard, Vancouver, ISO, and other styles
49

Gardoni, Pietro. "Big Data Analytics: il valore delle informazioni nella strategia e nell'organizzazione aziendale." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text
Abstract:
This work illustrates the strategic and organizational impacts of Big Data Analytics on modern organizations in today's competitive business environment. The goal is to provide general guidelines suited to supporting a company moving towards a data-driven business. To this end, the text begins by illustrating the link between information and technology, also examining its effects on gaining competitive advantage. It then presents an overview of data governance in organizations and of the implications of big data for business models. In addition, some key organizational aspects to consider in a data-driven business are illustrated, and two real innovative cases in the field of Big Data Analytics are examined. The final chapter provides a summary of the topics covered, together with further concluding remarks.
APA, Harvard, Vancouver, ISO, and other styles
50

Hauptli, Erich Jurg. "ProGENitor : an application to guide your career." Thesis, 2014. http://hdl.handle.net/2152/28120.

Full text
Abstract:
This report introduces ProGENitor, a system to empower individuals with career advice based on vast amounts of data. Specifically, it develops a machine learning algorithm that shows users how to efficiently reach specific career goals based upon the histories of other users. A reference implementation of this algorithm is presented, along with experimental results showing that it provides quality, actionable intelligence to users.
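As a purely illustrative sketch of the general idea (not the report's actual algorithm), career histories can be treated as sequences of roles, from which a transition graph is built and the shortest observed chain of transitions to a goal role is recommended. All role names and data below are hypothetical.

```python
# Hypothetical illustration: recommend a career path as the shortest chain of
# role transitions observed in other users' histories.
from collections import defaultdict, deque

histories = [
    ["intern", "junior dev", "senior dev", "tech lead"],
    ["intern", "analyst", "product manager"],
    ["junior dev", "product manager", "director"],
]

graph = defaultdict(set)
for history in histories:
    for a, b in zip(history, history[1:]):
        graph[a].add(b)  # observed transition a -> b

def recommend_path(start: str, goal: str):
    """Breadth-first search for the shortest chain of observed transitions."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(recommend_path("intern", "director"))
# -> ['intern', 'junior dev', 'product manager', 'director']
```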
APA, Harvard, Vancouver, ISO, and other styles
