Dissertations / Theses on the topic 'Streaming Data Analysis'

To see the other types of publications on this topic, follow the link: Streaming Data Analysis.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 33 dissertations / theses for your research on the topic 'Streaming Data Analysis.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Anagnostopoulos, Christoforos. "A Statistical Framework for Streaming Data Analysis." Thesis, Imperial College London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.520838.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Patni, Harshal Kamlesh. "Real Time Semantic Analysis of Streaming Sensor Data." Wright State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=wright1324181415.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Fairbanks, James Paul. "Graph analysis combining numerical, statistical, and streaming techniques." Diss., Georgia Institute of Technology, 2016. http://hdl.handle.net/1853/54972.

Full text
Abstract:
Graph analysis uses graph data collected on physical, biological, or social phenomena to shed light on the underlying dynamics and behavior of the agents in those systems. Many fields contribute to this topic, including graph theory, algorithms, statistics, machine learning, and linear algebra. This dissertation advances a novel framework for dynamic graph analysis that combines numerical, statistical, and streaming algorithms to provide deep insight into evolving networks. For example, one may be interested in how the influence structure changes over time. These disparate techniques each contribute a fragment to understanding the graph; their combination, however, allows us to understand dynamic behavior and graph structure. Spectral partitioning methods rely on eigenvectors for solving data analysis problems such as clustering. Eigenvectors of large sparse systems must be approximated with iterative methods. This dissertation analyzes how data analysis accuracy depends on the numerical accuracy of the eigensolver. This leads to new bounds on the residual tolerance necessary to guarantee correct partitioning. We present a novel stopping criterion for spectral partitioning guaranteed to satisfy the Cheeger inequality, along with an empirical study of its performance on real-world networks such as web, social, and e-commerce networks. This work bridges the gap between numerical analysis and computational data analysis.
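To make the spectral partitioning step concrete, here is a hedged Python sketch of the textbook approach the abstract builds on: approximate the Fiedler vector of the graph Laplacian with an iterative eigensolver, then take the sweep cut with the lowest conductance (the quantity the Cheeger inequality bounds). The graph, the eigensolver tolerance, and all names are illustrative assumptions, not Fairbanks's stopping criterion itself.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def sweep_cut_conductance(A):
    """Approximate spectral bisection: sort vertices by the Fiedler
    vector and return the sweep cut with the lowest conductance."""
    deg = np.asarray(A.sum(axis=1)).ravel()
    L = sp.diags(deg) - A                       # combinatorial Laplacian
    # Iterative solve for the two smallest eigenpairs; `tol` plays the
    # role of the residual tolerance whose effect on partition quality
    # the dissertation analyzes.
    vals, vecs = eigsh(L.asfptype(), k=2, which='SM', tol=1e-6)
    fiedler = vecs[:, np.argsort(vals)[1]]
    order = np.argsort(fiedler)

    vol_total = deg.sum()
    best_phi, best_k = np.inf, 1
    cut, vol = 0.0, 0.0
    in_S = np.zeros(A.shape[0], dtype=bool)
    for k, v in enumerate(order[:-1], start=1):
        in_S[v] = True
        vol += deg[v]
        # Incremental cut update: edges from v across the cut are added,
        # edges from v back into S are no longer cut.
        row = A.getrow(v).tocoo()
        cut += sum(w if not in_S[j] else -w for j, w in zip(row.col, row.data))
        phi = cut / min(vol, vol_total - vol)   # conductance of the sweep set
        if phi < best_phi:
            best_phi, best_k = phi, k
    return order[:best_k], best_phi

# Tiny example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = sp.lil_matrix((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
S, phi = sweep_cut_conductance(A.tocsr())
print(sorted(S), phi)   # one triangle recovered, low conductance
```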
APA, Harvard, Vancouver, ISO, and other styles
4

Menglei, Min. "Anomaly detection based on multiple streaming sensor data." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-36275.

Full text
Abstract:
Today, the Internet of Things (IoT) is widely used in various fields, such as factories, public facilities, and even homes. IoT deployments involve a large number of sensor devices that collect various types of data in real time, such as machine voltage, current, and temperature, and these devices generate a large amount of streaming sensor data. Analysing these data can uncover hidden relations, for example for monitoring the operating status of a machine, detecting anomalies, and alerting the company in time to avoid significant losses. Anomaly detection is therefore applied very widely in the field of data mining. This paper proposes an anomaly detection method based on multiple streaming sensor data and performs anomaly detection on three data sets from a real company. First, the project proposes a state transition detection algorithm, a state classification algorithm, and a frequency-based correlation analysis method. The two algorithms were then implemented in Python, and correlation analysis was performed on the system's results to find potentially meaningful relations that can be used in anomaly detection. Finally, the accuracy and time complexity of the system were calculated, and its feasibility and scalability were evaluated. From the evaluation results, it is concluded that the method ...
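The abstract does not specify the algorithms, so purely as a hypothetical sketch of what threshold-based state transition detection over a single sensor stream can look like (thresholds, state names, and data are invented for illustration, not taken from the thesis):

```python
from dataclasses import dataclass

@dataclass
class Transition:
    index: int      # position in the stream where the state changed
    old: str
    new: str

def classify(value, low=10.0, high=50.0):
    """Map a raw sensor reading to a coarse state (thresholds are
    hypothetical; a real system would learn them from labelled data)."""
    if value < low:
        return "idle"
    return "running" if value < high else "overload"

def detect_transitions(stream):
    """Scan a stream of readings and emit every state transition."""
    transitions, state = [], None
    for i, value in enumerate(stream):
        new_state = classify(value)
        if state is not None and new_state != state:
            transitions.append(Transition(i, state, new_state))
        state = new_state
    return transitions

# Example: a machine ramping up, spiking into overload, then shutting down.
print(detect_transitions([3, 4, 22, 25, 71, 69, 20, 5]))
```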
APA, Harvard, Vancouver, ISO, and other styles
5

Giannini, Andrea. "Social Network Analysis: Architettura Streaming Big Data di Raccolta e Analisi Dati da Twitter [Social Network Analysis: A Streaming Big Data Architecture for Collecting and Analysing Twitter Data]." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25378/.

Full text
Abstract:
In recent years, social media such as Facebook, Twitter, WhatsApp, and YouTube have spread like wildfire. By now almost everyone accesses at least one of them daily to get information, express opinions, and interact with other users. For this reason they have become essential to companies' marketing departments, being not only an excellent communication channel but also a source of information about current and potential customers. This thesis focuses on the latter aspect. The Social Network Analysis (SNA) project is intended as a tool for viewing and analysing entire networks of interactions between users. The objective was to build SNA so that it collects data and updates itself in real time, keeping pace with the latest developments given how dynamic information on social media is. A project like SNA entails several obstacles. Besides building an architecture that can accommodate a continuous flow of information, one of the most significant challenges is handling the large volume of data. To do so, the project relies on a distributed, easily scalable architecture comprising cluster processing, serverless functions, and NoSQL databases provisioned through Microsoft's cloud service, Azure. In this thesis SNA was designed and implemented for Twitter, but the same idea can be applied to many other social media platforms.
APA, Harvard, Vancouver, ISO, and other styles
6

Kühn, Eileen [Verfasser], and A. [Akademischer Betreuer] Streit. "Online Analysis of Dynamic Streaming Data / Eileen Kühn ; Betreuer: A. Streit." Karlsruhe : KIT-Bibliothek, 2018. http://d-nb.info/1161008721/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Moitra, Anindya. "Computation and Application of Persistent Homology on Streaming Data." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613686214764863.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Zubeir, Abdulghani Ismail. "OAP: An efficient online principal component analysis algorithm for streaming EEG data." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-392403.

Full text
Abstract:
Data processing on streaming data poses computational as well as statistical challenges. Streaming data requires that processing algorithms can handle a new data point within microseconds. This is especially challenging for dimension reduction, where traditional methods such as Principal Component Analysis (PCA) require an eigenvector decomposition of a matrix computed from the complete dataset. A proper online version of PCA should therefore avoid this computationally involved step in favour of a more efficient update rule. This is implemented by an algorithm named Online Angle Preservation (OAP), which can handle large dimensions within the required time limits. This project presents an application of OAP to electroencephalography (EEG). For this, an interface was coded from an OpenBCI EEG device, through a Java API, to a streaming environment called Stream Analyzer (sa.engine). The performance of this solution was compared to a standard windowed PCA solution, indicating its competitive performance. This report details the setup and the results.
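OAP itself is not specified in the abstract. As a hedged illustration of the general idea of replacing a full eigendecomposition with a cheap per-sample update, the sketch below uses Oja's rule, a classic online PCA update, on a simulated stream; the dimensions, learning rate, and data are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np

def oja_update(w, x, lr=0.01):
    """One Oja's-rule step: nudge w toward the leading principal
    component using a single streaming sample x, with no matrix
    decomposition at any point."""
    y = w @ x                       # projection of the sample onto w
    w = w + lr * y * (x - y * w)    # Hebbian term with decay
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Simulated 8-channel "EEG" stream whose dominant variance lies along true_pc.
true_pc = rng.normal(size=8); true_pc /= np.linalg.norm(true_pc)
w = rng.normal(size=8); w /= np.linalg.norm(w)
for _ in range(5000):
    x = rng.normal() * true_pc * 3 + rng.normal(size=8) * 0.5
    w = oja_update(w, x)
print("alignment with true component:", abs(w @ true_pc))  # close to 1
```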
APA, Harvard, Vancouver, ISO, and other styles
9

Vigraham, Sushrutha. "Design and Analysis of a Real-time Data Monitoring Prototype for the LWA Radio Telescope." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/31306.

Full text
Abstract:
Increasing computing power has been helping researchers understand many complex scientific problems. Scientific computing helps to model and visualize complex processes such as molecular modelling, medical imaging, astrophysics, and space exploration by processing large sets of data streams collected through sensors or cameras. This produces a massive amount of data, which consumes a large amount of processing and storage resources. Monitoring the data streams and filtering unwanted information will enable efficient use of the available resources. This thesis proposes a data-centric system that can monitor high-speed data streams in real time. The proposed system provides a flexible environment where users can plug in application-specific data monitoring algorithms. The Long Wavelength Array telescope (LWA) is an astronomical apparatus that works with high-speed data streams, and the proposed data-centric platform is developed to evaluate FPGAs for implementing data monitoring algorithms in the LWA. The throughput of the data-centric system has been modeled, and it is observed that the developed system can deliver a maximum throughput of 164 MB/s.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
10

Landford, Jordan. "Event Detection Using Correlation within Arrays of Streaming PMU Data." PDXScholar, 2016. http://pdxscholar.library.pdx.edu/open_access_etds/3031.

Full text
Abstract:
This thesis provides a synchrophasor data analysis methodology that leverages both statistical correlation techniques and a statistical distribution in order to identify data inconsistencies as well as power system contingencies. This research utilizes archived Phasor Measurement Unit (PMU) data obtained from the Bonneville Power Administration to show that this methodology is not only feasible but extremely useful for power systems monitoring, decision support, and planning purposes. By analyzing positive sequence voltage angles between a pair of PMUs at two different substation locations, a historic record of correlation is established. From this record, a Rayleigh distribution of correlation coefficients is calculated. The statistical parameters of this Rayleigh distribution are used to infer occurrences of power system and data events. To monitor an entire system, a simple solution would be observing each of these parameters for every PMU combination. One issue with this approach is that correlation of some PMU pairs may be redundant or yield little value to monitoring capabilities. Additionally, this approach quickly encounters scalability issues, as each additional PMU adds considerably to computation: if the system contains n PMUs, the number of pairwise computations is n(n-1)/2. System-wide monitoring of these parameters in this fashion is cumbersome and inefficient. To address these issues, an alternative scheme is proposed which involves monitoring only a subset of PMUs characterized by electrically coupled zones, or clusters, of PMUs. These clusters include both electrically-distant and electrically-near PMU sites. When monitored over an event, these yield statistical parameters sufficient for detecting event occurrences. This clustering scheme can be utilized to significantly decrease computation time and allocation of resources while maintaining optimal system observability. Results from the statistical methods are presented for a select few case studies for both data and power system event detection. In addition, determination of cluster size and content is discussed in detail. Lastly, the viability of monitoring pertinent statistical parameters over various clustering schemes is demonstrated.
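As a hedged sketch of the pairwise-correlation monitoring described above (the windowing, the transform fitted, and all parameters are assumptions, not the thesis's), the following Python computes windowed correlation coefficients between two simulated PMU angle streams, fits a Rayleigh distribution, and flags windows that fall in its far tail:

```python
import numpy as np
from scipy.stats import rayleigh, pearsonr

def windowed_correlations(a, b, window=60, step=10):
    """Correlate two PMU voltage-angle streams over sliding windows."""
    return np.array([pearsonr(a[i:i + window], b[i:i + window])[0]
                     for i in range(0, len(a) - window, step)])

rng = np.random.default_rng(1)
t = np.linspace(0, 100, 6000)
base = np.sin(0.5 * t)                      # common system-wide signal
a = base + rng.normal(0, 0.05, t.size)
b = base + rng.normal(0, 0.05, t.size)
b[4000:4100] += rng.normal(0, 0.5, 100)     # injected disturbance at one site

corr = windowed_correlations(a, b)
# Fit a Rayleigh distribution to 1 - corr (distance from perfect
# correlation); this transform is an illustrative choice.
loc, scale = rayleigh.fit(1 - corr, floc=0)
threshold = rayleigh.ppf(0.999, loc=loc, scale=scale)
events = np.where(1 - corr > threshold)[0]
print("windows flagged as events:", events)
```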
APA, Harvard, Vancouver, ISO, and other styles
11

Akhmedov, Iliiazbek. "Parallelization of Push-based System for Molecular Simulation Data Analysis with GPU." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6448.

Full text
Abstract:
Modern simulation systems generate large amounts of data, which consequently have to be analyzed in a timely fashion. Traditional database management systems follow the principle of pulling the needed data, processing it, and then returning the results. This approach is then optimized by means of caching, storing data in different structures, or sacrificing some precision in the results to gain speed. For queries that require analysis of the whole data set, this design has the following disadvantages: considerable overhead from the traditional random-I/O disk framework while reading the simulation output files, and low data throughput that results in long latency; and if indexing is used to optimize selections, the overhead of storing the indexes becomes too large as well. Indexing also delays write operations, and since most of the queries work with the entire data set, indexing loses its point. A previous paper proposed a different approach to this problem, a push-based system for molecular simulation data analysis that processes a network of queries in two primary steps: i) a traditional scan-based I/O framework loads the data from files into main memory, and then ii) the data is pushed through a network of queries that filter it and collect the needed information, which increases efficiency and data throughput. This has a considerable advantage for the analysis of molecular simulation data, because such analysis normally involves processing entire data sets. In this paper, we propose an improved version of the push-based system for molecular simulation data analysis. Its major difference from the previous design is the use of a GPU for the processing part of the data flow. Using the same scan-based I/O framework, the data is pushed through the network of queries, which are processed by the GPU; due to the nature of scientific simulation data, this gives a big advantage in processing it faster and more easily, as explained in later sections. The old approach used custom data structures, such as a quad-tree for histogram calculation, to speed up processing; these involved loss of precision and assumptions about the nature of the data. In the new approach, thanks to the performance of GPU processing, such custom data structures are largely unnecessary, with no loss in precision or performance.
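As a hedged, CPU-only sketch of the push-based idea itself (scan the data once and push each record through every registered query, instead of each query pulling its own I/O), independent of the actual system described in the abstract:

```python
class PushEngine:
    """Minimal push-based query network: one sequential scan feeds
    every registered query simultaneously."""
    def __init__(self):
        self.queries = []

    def register(self, predicate, reducer, init):
        self.queries.append({"pred": predicate, "reduce": reducer, "acc": init})

    def run(self, records):
        for rec in records:              # single scan over the data
            for q in self.queries:       # push the record to all queries
                if q["pred"](rec):
                    q["acc"] = q["reduce"](q["acc"], rec)
        return [q["acc"] for q in self.queries]

engine = PushEngine()
engine.register(lambda r: r["type"] == "H", lambda acc, r: acc + 1, 0)  # count H atoms
engine.register(lambda r: True, lambda acc, r: max(acc, r["v"]), 0.0)   # max velocity
frames = [{"type": "H", "v": 1.2}, {"type": "O", "v": 3.4}, {"type": "H", "v": 0.7}]
print(engine.run(frames))  # [2, 3.4]
```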
APA, Harvard, Vancouver, ISO, and other styles
12

Zhou, Pu. "A dynamic approximate representation scheme for streaming time series." Connect to thesis, 2009. http://repository.unimelb.edu.au/10187/6766.

Full text
Abstract:
The huge volume of time series data generated in many applications poses new challenges for data storage, transmission, and computation. Furthermore, when the time series arrive as streaming data, new problems emerge and new techniques are required because of the streaming characteristics, e.g. high volume, high speed, and continuous flow. Approximate representation is one of the most efficient and effective solutions to the large-volume, high-speed problem. In this thesis, we propose a dynamic representation scheme for streaming time series. Existing methods use a single function form for the entire approximation task. In contrast, our method adopts a set of candidate functions such as linear functions, polynomial functions (degree ≥ 2), and exponential functions. We provide a novel segmenting strategy to generate subsequences and dynamically choose candidate functions to approximate the subsequences.
Since we are dealing with streaming time series, the segmenting points and the corresponding approximating functions are produced incrementally. For a given function form, we use a buffer window to find the farthest possible local segmenting point under a user-specified error tolerance threshold. To achieve this goal, we define a feasible space for the coefficients of the function and show that we can indirectly find the locally best segmenting point by calculation in the coefficient space. Given the error tolerance threshold, the candidate function representing more information per unit parameter is chosen as the approximating function. Our representation scheme is therefore more flexible and compact. We provide two dynamic algorithms, PLQS and PLQES, which involve two and three candidate functions, respectively. We also present a general strategy for function selection when more candidate functions are considered. In the experimental evaluation, we examine the effectiveness of our algorithms on synthetic and real time series data sets. We compare our method with the piecewise linear approximation method, and the experimental results demonstrate the evident superiority of our dynamic approach under the same error tolerance threshold.
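As a hedged, much-simplified sketch of the segmenting idea (the thesis's feasible-space computation in coefficient space is not reproduced here, and all names are illustrative), the following Python greedily extends each segment as far as a least-squares fit from a small set of candidate function forms stays within the error tolerance, preferring the candidate with fewer parameters:

```python
import numpy as np

# Candidate function families, as (name, design-matrix builder) pairs,
# ordered from fewest to most parameters.
CANDIDATES = {
    "linear":    lambda t: np.column_stack([t, np.ones_like(t)]),
    "quadratic": lambda t: np.column_stack([t**2, t, np.ones_like(t)]),
}

def fits_within(t, y, basis, eps):
    """Least-squares fit; accept if the max pointwise error is <= eps."""
    X = basis(t)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.max(np.abs(X @ coef - y)) <= eps

def segment_stream(y, eps=0.1):
    """Greedy segmentation: grow the segment while any candidate fits,
    keeping the most compact candidate that still fits."""
    t_all = np.arange(len(y), dtype=float)
    segments, start = [], 0
    while start < len(y) - 1:
        end, chosen = start + 2, "linear"
        while end <= len(y):
            t, seg = t_all[start:end], y[start:end]
            ok = [n for n, b in CANDIDATES.items() if fits_within(t, seg, b, eps)]
            if not ok:
                break
            chosen, end = ok[0], end + 1   # ok is ordered: compact first
        segments.append((start, end - 1, chosen))   # [start, end-1) covered
        start = end - 1
    return segments

y = np.concatenate([np.linspace(0, 1, 50), 1 + 0.002 * np.arange(50)**2])
print(segment_stream(y))
```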
APA, Harvard, Vancouver, ISO, and other styles
13

Rivetti, di Val Cervo Nicolo. "Efficient Stream Analysis and its Application to Big Data Processing." Thesis, Nantes, 2016. http://www.theses.fr/2016NANT4046/document.

Full text
Abstract:
Nowadays stream analysis is used in many contexts where the amount of data and/or the rate at which it is generated rules out other approaches (e.g., batch processing). The data streaming model provides randomized and/or approximated solutions to compute specific functions over (distributed) streams of data items in worst-case scenarios, while striving for small resource usage. In particular, we look into two classical and related data streaming problems: frequency estimation and (distributed) heavy hitters. A less common field of application is stream processing, which is in some sense complementary and more practical, providing efficient and highly scalable frameworks to perform soft real-time generic computation on streams, relying on cloud computing. This duality allows us to apply data streaming solutions to optimize stream processing systems. In this thesis, we provide a novel algorithm to track heavy hitters in distributed streams and two extensions of a well-known algorithm to estimate the frequencies of data items. We also tackle two related problems and their solution: providing an even partitioning of the item universe based on item weights, and estimating the values carried by the items of the stream. We then apply these results to both network monitoring and stream processing. In particular, we leverage these solutions to perform load shedding as well as to load-balance parallelized operators in stream processing systems.
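As a hedged illustration of the frequency-estimation primitive mentioned above (a textbook Count-Min sketch, not the thesis's extended algorithms), the following Python estimates per-item frequencies in bounded memory; with width e/eps and ln(1/delta) rows, estimates never undercount and overcount by at most eps·N with probability at least 1-delta:

```python
import hashlib
import math

class CountMinSketch:
    """Textbook Count-Min sketch: d rows of w counters, one hash per row."""
    def __init__(self, eps=0.001, delta=0.01):
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.w for _ in range(self.d)]

    def _buckets(self, item):
        for row in range(self.d):
            # Salted hash gives an independent-looking function per row.
            h = hashlib.blake2b(item.encode(), salt=row.to_bytes(8, "big"))
            yield row, int.from_bytes(h.digest()[:8], "big") % self.w

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # True frequency <= estimate; collisions only inflate counters,
        # so the minimum over rows is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for tuple_id in ["a", "b", "a", "c", "a", "b"]:
    cms.add(tuple_id)
print(cms.estimate("a"), cms.estimate("b"), cms.estimate("z"))  # 3 2 0
```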
APA, Harvard, Vancouver, ISO, and other styles
14

Grupchev, Vladimir. "Improvements on Scientific System Analysis." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/5851.

Full text
Abstract:
Thanks to the advancement of modern computer simulation systems, many scientific applications generate, and require manipulation of, large volumes of data. Scientific exploration relies substantially on effective and accurate data analysis. The sheer size of the generated data, however, imposes big challenges on the process of analyzing the system. In this dissertation we propose novel techniques, as well as novel uses of known designs, to improve scientific data analysis. We develop an efficient method to compute an analytical query called the spatial distance histogram (SDH). Special heuristics are exploited to process SDH efficiently and accurately. We further develop a mathematical model to analyze the mechanism leading to errors. This gives rise to a new approximate algorithm with an improved time/accuracy tradeoff. Known molecular simulation (MS) analysis systems follow a pull-based design, where the executed queries mandate the data needed on their part. Such a design introduces redundant and high I/O traffic as well as CPU/data latency. To remedy these issues, we design and implement a push-based system, which uses a sequential scan-based I/O framework that pushes the loaded data to a number of pre-programmed queries. The efficiency of the proposed system, as well as of the approximate SDH algorithms, is backed by the results of extensive experiments on MS-generated data.
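For context, a hedged baseline sketch (the dissertation's heuristics and approximate algorithm are not reproduced here): a spatial distance histogram simply counts all pairwise particle distances into fixed-width buckets, and the quadratic cost of this naive version is what the proposed methods improve on.

```python
import numpy as np

def sdh_bruteforce(points, bucket_width, num_buckets):
    """Naive O(n^2) spatial distance histogram over 3-D particle positions."""
    hist = np.zeros(num_buckets, dtype=np.int64)
    n = len(points)
    for i in range(n):
        # Distances from particle i to all later particles (each pair once).
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        idx = np.minimum((d // bucket_width).astype(int), num_buckets - 1)
        np.add.at(hist, idx, 1)
    return hist

rng = np.random.default_rng(2)
atoms = rng.uniform(0, 10, size=(1000, 3))   # synthetic simulation frame
print(sdh_bruteforce(atoms, bucket_width=1.0, num_buckets=18))
```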
APA, Harvard, Vancouver, ISO, and other styles
15

Ahmed, Kachkach. "Analyzing user behavior and sentiment in music streaming services." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186527.

Full text
Abstract:
In recent years, streaming services (for music, podcasts, TV shows, and movies) have been in the spotlight, disrupting traditional media consumption platforms. While the technical implications of streaming huge amounts of data are well researched, much remains to be done to analyze the wealth of data collected by these services and exploit it to its full potential in order to improve them. Using raw data about users' interactions with the music streaming service Spotify, this thesis focuses on three main concepts: streaming context, user attention, and the sequential analysis of user actions. We discuss the importance of each of these aspects and propose different statistical and machine learning techniques to model them. We show how these models can be used to improve streaming services by inferring user sentiment and improving recommender systems, characterizing user sessions, extracting behavioral patterns, and providing useful business metrics.
APA, Harvard, Vancouver, ISO, and other styles
16

Hilley, David B. "Temporal streams programming abstractions for distributed live stream analysis applications /." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/31695.

Full text
Abstract:
Thesis (Ph.D)--Computing, Georgia Institute of Technology, 2010.
Committee Chair: Ramachandran, Umakishore; Committee Member: Clark, Nathan; Committee Member: Haskin, Roger; Committee Member: Pu, Calton; Committee Member: Rehg, James. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
17

Gratorp, Christina. "Bitrate smoothing: a study on traffic shaping and analysis in data networks." Thesis, Linköping University, Department of Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10136.

Full text
Abstract:

The thesis work behind this report is an exploratory study of how the transmission of media data in networks can be made more efficient. This can be achieved by adding, to the Real Time Protocol used for streaming media, extra information intended to even out the data rate. By attempting to send equal amounts of data during all consecutive time intervals of a session, the data rate at an arbitrary point in time is more likely to be the same as at earlier points. A streaming server can interpret, manage, and forward data according to the instructions in the protocol header. The data rate is smoothed by sending later parts of the stream in advance, during time intervals that carry less data. The result is a smoothed data-rate curve, which in turn leads to more even utilisation of network capacity.

The work includes an overview analysis of the behaviour of streaming media, background theory on file structure and network technologies, and a proposal for how media files can be modified to fulfil the purpose of the thesis. The results and discussion can hopefully serve as a basis for a future implementation of an application intended to improve traffic flows across networks.

APA, Harvard, Vancouver, ISO, and other styles
18

Aussel, Nicolas. "Real-time anomaly detection with in-flight data : streaming anomaly detection with heterogeneous communicating agents." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLL007/document.

Full text
Abstract:
With the rise in the number of sensors and actuators in an aircraft and the development of reliable data links from the aircraft to the ground, it becomes possible to improve aircraft security and maintainability by applying real-time analysis techniques. However, given the limited availability of on-board computing and the high cost of the data links, current architectural solutions cannot fully leverage all the available resources, limiting their accuracy. Our goal is to provide a distributed algorithm for failure prediction that could be executed both on board the aircraft and on a ground station, and that would produce on-board failure predictions in near real time under a communication budget. In this approach, the ground station holds fast computation resources and historical data, and the aircraft holds limited computational resources and the current flight's data. In this thesis, we study the specificities of aeronautical data and the methods that already exist to produce failure predictions from them, and we propose a solution to the stated problem. Our contribution is detailed in three main parts. First, we study the problem of rare event prediction created by the high reliability of aeronautical systems. Many learning methods for classifiers rely on balanced datasets. Several approaches exist to correct a dataset imbalance, and we study their efficiency on extremely imbalanced datasets. Second, we study the problem of log parsing, as many aeronautical systems do not produce easy-to-classify labels or numerical values but log messages in full text. We study existing methods, based on statistical approaches and on Deep Learning, to convert full-text log messages into a form usable as input by learning algorithms for classifiers. We then propose our own method based on Natural Language Processing and show how it outperforms the other approaches on a public benchmark. Last, we offer a solution to the stated problem by proposing a new distributed learning algorithm that relies on two existing learning paradigms, Active Learning and Federated Learning. We detail our algorithm and its implementation, and provide a comparison of its performance with existing methods.
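As a small illustration of one of the imbalance-correction approaches such work evaluates (plain random oversampling of the minority class; the thesis's datasets and chosen methods are not reproduced here):

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes match
    the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, cnt in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - cnt, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Extremely imbalanced toy data: 990 normal samples, 10 failures.
X = np.random.randn(1000, 5)
y = np.array([0] * 990 + [1] * 10)
Xb, yb = random_oversample(X, y)
print(np.bincount(yb))  # [990 990]
```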
APA, Harvard, Vancouver, ISO, and other styles
19

Agarwal, Virat. "Algorithm design on multicore processors for massive-data analysis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34839.

Full text
Abstract:
Analyzing massive data sets and streams is computationally very challenging. Data sets in systems biology, network analysis, and security use network abstraction to construct large-scale graphs. Graph algorithms such as traversal and search are memory-intensive and typically require very little computation, with access patterns that are irregular and fine-grained. The increasing streaming data rates in various domains such as security, mining, and finance leave algorithm designers with only a handful of clock cycles (with current general-purpose computing technology) to process every incoming byte of data in-core in real time. This, along with the increasing complexity of mining patterns and other analytics, puts further pressure on already high computational requirements. Processing streaming data in finance comes with an additional low-latency constraint, which restricts the algorithm from using common techniques, such as batching, to obtain high throughput. The primary contributions of this dissertation are the design of novel parallel data analysis algorithms for graph traversal on large-scale graphs, pattern recognition and keyword scanning on massive streaming data, financial market data feed processing and analytics, and data transformation. These designs capture the machine-independent aspects, to guarantee portability with performance to future processors, with high-performance implementations on multicore processors that embed processor-specific optimizations. Our breadth-first search graph traversal algorithm demonstrates a capability to process massive graphs with billions of vertices and edges on commodity multicore processors at rates that are competitive with supercomputing results in the recent literature. We also present high-performance scalable keyword scanning on streaming data using a novel automata compression algorithm, a model of computation based on small software content-addressable memories (CAMs), and a unique data layout that forces data reuse and minimizes memory traffic. Using a high-level algorithmic approach to process financial feeds, we present a solution that decodes and normalizes option market data at rates an order of magnitude beyond the current needs of the market, yet is portable and flexible enough for other feeds in this domain. In this dissertation we discuss in detail the algorithm design challenges of processing massive data and present solutions and techniques that we believe can be used and extended to solve future research problems in this domain.
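As a hedged illustration of the memory-bound, fine-grained access pattern of graph traversal described above (a plain level-synchronous BFS, not the dissertation's multicore-optimized algorithm):

```python
from collections import deque

def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency-list graph.
    Work per vertex is tiny; performance is dominated by the
    irregular, fine-grained memory accesses into `adj`."""
    level = {source: 0}
    frontier = deque([source])
    while frontier:
        next_frontier = deque()
        for u in frontier:
            for v in adj[u]:            # irregular access pattern
                if v not in level:
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return level

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```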
APA, Harvard, Vancouver, ISO, and other styles
20

Markou, Ioannis. "Analysing User Viewing Behaviour in Video Streaming Services." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292095.

Full text
Abstract:
The user experience offered by a video streaming service plays a fundamental role in customer satisfaction. This experience can be degraded by poor playback quality and buffering issues. These problems can be caused by user demand that exceeds the video streaming service's capacity. Resource scaling methods can increase the available resources to cover the need. However, most resource scaling systems are reactive and scale up in an automated fashion when a certain demand threshold is exceeded. During popular live streaming content, the demand can be so high that even by scaling up at the last minute, the system might still be momentarily under-provisioned, resulting in a bad user experience. The solution to this problem is proactive scaling, which is event-based and uses content-related information to scale up or down according to knowledge from past events. As a result, proactive resource scaling is a key factor in providing reliable video streaming services. Users' viewing habits heavily affect demand. To provide an accurate model for proactive resource scaling tools, these habits need to be modelled. This thesis provides such a forecasting model for user views that can be used by a proactive resource scaling mechanism. The model is created by applying machine learning algorithms to data from both live TV and over-the-top streaming services. To produce a model with satisfactory accuracy, numerous data attributes were considered, relating to users, content, and content providers. The findings of this thesis show that user viewing demand can be modelled with high accuracy without relying heavily on user-related attributes, instead analysing past event logs together with knowledge of the content provider's schedule, whether for live TV or a video streaming service.
APA, Harvard, Vancouver, ISO, and other styles
21

Zhu, Jun. "Energy and Design Cost Efficiency for Streaming Applications on Systems-on-Chip." Licentiate thesis, Stockholm : Skolan för informations- och kommunikationsteknik, Kungliga Tekniska högskolan, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-10591.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Iegorov, Oleg. "Une approche de fouille de données pour le débogage temporel des applications embarquées de streaming [A Data Mining Approach to the Temporal Debugging of Embedded Streaming Applications]." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM032/document.

Full text
Abstract:
Debugging streaming applications that run on the multimedia embedded systems found in modern consumer electronics (e.g. in set-top boxes, smartphones, etc.) is one of the most challenging areas of embedded software development. With each generation of hardware, more powerful and complex Systems-on-Chip (SoC) are released, and developers constantly strive to adapt their applications to these new platforms. Embedded software must not only return correct results but also deliver these results on time in order to respect the Quality-of-Service (QoS) properties of the entire system. The non-respect of QoS properties leads to the appearance of temporal bugs, which manifest themselves in multimedia embedded systems as, for example, glitches in the video or cracks in the sound. Temporal debugging proves to be tricky, as temporal bugs are not related to the functional correctness of the code, making traditional GDB-like debuggers essentially useless. Violations of QoS properties can stem from complex interactions between a particular application and the system or other applications; the complete execution context must therefore be taken into account in order to perform temporal debugging. Recent advances in tracing technology allow software developers to capture a trace of the system's execution and to analyze it afterwards to understand which particular system activity is responsible for the violations of QoS properties. However, such traces have a large volume, and understanding them requires data analysis skills that are currently outside the scope of developers' education. In this thesis, we propose SATM (Streaming Application Trace Miner), a novel temporal debugging approach for embedded streaming applications. SATM is based on the premise that such applications are designed under the dataflow model of computation, i.e. as a directed graph where data flows between computational units called actors. In such a setting, actors must be scheduled in a periodic way in order to meet QoS properties expressed as real-time constraints, e.g. displaying 30 video frames per second. We show that an actor which does not eventually respect its period at runtime causes the violation of the application's real-time constraints. In practice, SATM is a data analysis workflow combining statistical measures and data mining algorithms. It provides an automatic solution to the problem of temporal debugging of streaming applications. Given an execution trace of a streaming application exhibiting low QoS as well as a list of its actors, SATM first determines the exact actor invocations found in the trace. It then discovers the actors' periods, as well as parts of the trace in which the periods are not respected. Those parts are further analyzed to extract patterns of system activity that differentiate them from other parts of the trace. Such patterns can give strong hints on the origin of the problem and are returned to the developer. More specifically, we represent those patterns as minimal contrast sequences and investigate various solutions to mine such sequences from execution trace data. Finally, we demonstrate SATM's ability to detect both an artificial perturbation injected into an open-source multimedia framework and temporal bugs from two industrial use cases coming from STMicroelectronics. We also provide an extensive analysis of sequential pattern mining algorithms applied to execution trace data and explain why state-of-the-art algorithms fail to mine sequential patterns efficiently from real-world traces.
APA, Harvard, Vancouver, ISO, and other styles
23

Ediger, David. "Analyzing hybrid architectures for massively parallel graph analysis." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/47659.

Full text
Abstract:
The quantity of rich, semi-structured data generated by sensor networks, scientific simulation, business activity, and the Internet grows daily. The objective of this research is to investigate architectural requirements for emerging applications in massive graph analysis. Using emerging hybrid systems, we will map applications to architectures and close the loop between software and hardware design in this application space. Parallel algorithms and specialized machine architectures are necessary to handle the immense size and rate of change of today's graph data. To highlight the impact of this work, we describe a number of relevant application areas ranging from biology to business and cybersecurity. With several proposed architectures for massively parallel graph analysis, we investigate the interplay of hardware, algorithm, data, and programming model through real-world experiments and simulations. We demonstrate techniques for obtaining parallel scaling on multithreaded systems using graph algorithms that are orders of magnitude faster and larger than the state of the art. The outcome of this work is a proposed hybrid architecture for massive-scale analytics that leverages key aspects of data-parallel and highly multithreaded systems. In simulations, the hybrid systems incorporating a mix of multithreaded, shared memory systems and solid state disks performed up to twice as fast as either homogeneous system alone on graphs with as many as 18 trillion edges.
APA, Harvard, Vancouver, ISO, and other styles
24

Zhao, Qi. "Towards Ideal Network Traffic Measurement: A Statistical Algorithmic Approach." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/19821.

Full text
Abstract:
Thesis (Ph.D)--Computing, Georgia Institute of Technology, 2008.
Committee Chair: Xu, Jun; Committee Member: Ammar, Mostafa; Committee Member: Feamster, Nick; Committee Member: Ma, Xiaoli; Committee Member: Zegura, Ellen.
APA, Harvard, Vancouver, ISO, and other styles
25

Awodokun, Olugbenga. "Classification of Patterns in Streaming Data Using Clustering Signatures." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1504880155623189.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Zamam, Mohamad. "A unified framework for real-time streaming and processing of IoT data." Thesis, Linnéuniversitetet, Institutionen för medieteknik (ME), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-66057.

Full text
Abstract:
The emergence of the Internet of Things (IoT) is introducing a new era in computing and technology. The proliferation of sensors and actuators embedded in things enables these devices to understand their environments and respond accordingly more than ever before. Additionally, it opens the space to unlimited possibilities for building applications that turn this sensing into big benefits, across various domains: from smart cities to smart transportation and smart environments, and the list is quite long. However, this revolutionary spread of IoT devices and technologies raises big challenges. One major challenge is the diversity of IoT vendors, which results in data heterogeneity. This research tackles this problem by developing a data management tool that normalizes IoT data. Another important challenge is the lack of practical IoT technology with low cost and low maintenance, which has often limited large-scale deployments and mainstream adoption. This work utilizes open-source data analytics in one unified IoT framework in order to address this challenge. What is more, billions of connected things are generating unprecedented amounts of data from which intelligence must be derived in real time. The unified framework processes real-time streams of data from the IoT. A questionnaire involving participants with background knowledge of the IoT was conducted in order to collect feedback about the proposed framework. The aspects of the framework were presented to the participants in the form of a demonstration video describing the work that had been done. Finally, using the participants' feedback, the contribution of the developed framework to the IoT was discussed and presented.
APA, Harvard, Vancouver, ISO, and other styles
27

Mogis, Jay D. "Transparency, technology and trust: Music metrics and cultural distortion." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/199497/1/Jay_Mogis_Thesis.pdf.

Full text
Abstract:
Transparency issues influence many aspects of modern society and are critical in balancing the dichotomy between personalisation and privacy. Transparency behaviours have previously been refined into the disclosure, clarity and accuracy (DCA) model (Schnackenberg & Tomlinson, 2014). This thesis has two projects. Project 1 makes a significant contribution to the research literature on transparency and music industry practices by testing the DCA model. Project 2 presents an investigation into music licensing, missing revenue and technology transfer in Australia. The findings suggest an accreditation system for copyright throughput accuracy could function as a potential roadmap for transparency innovation more broadly.
APA, Harvard, Vancouver, ISO, and other styles
28

Biswas, Ayan. "Uncertainty and Error Analysis in the Visualization of Multidimensional and Ensemble Data Sets." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1480605991395144.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Barros, Victor Perazzolo. "Big data analytics em cloud gaming: um estudo sobre o reconhecimento de padrões de jogadores." Universidade Presbiteriana Mackenzie, 2017. http://tede.mackenzie.br/jspui/handle/tede/3405.

Full text
Abstract:
The advances in cloud computing and communication technologies have enabled the concept of cloud gaming to become a reality. Through PCs, consoles, smartphones, tablets, smart TVs, and other devices, people can access and use games via data streaming, regardless of the computing power of these devices. The Internet is the fundamental means of communication between the device and the game, which is hosted and processed in an environment known as the cloud. In the cloud gaming model, games are available on demand and offered at large scale to users. The players' actions and commands are sent to servers that process the information and send the result (reaction) back to the players. The volume of data processed and stored in these cloud environments exceeds the limits of analysis and manipulation of conventional tools, but the data contains information about player profiles, their singularities, actions, behavior, and patterns that can be valuable when analyzed. For a proper comprehension and understanding of this raw data, and to make it interpretable, it is necessary to use appropriate techniques and platforms to manipulate this amount of data. These platforms belong to an ecosystem built around the concepts of Big Data. The model known as Big Data Analytics is an effective way not only to work with these data but to understand their meaning, providing inputs for assertive analysis and predictive actions. This study seeks to understand how these technologies work and proposes a method capable of analyzing and identifying patterns in players' behavior and characteristics in a virtual environment. By knowing the patterns of different players, it is possible to group and compare information in order to optimize the user experience, increase revenue for developers, and raise the level of control over the environment to the point where players' actions can be predicted. The results presented are based on different analysis models using Hadoop combined with data visualization tools and information from open data sources, applied to a dataset from the game World of Warcraft. Fraud detection, users' play patterns, inputs for churn prevention, and relations with game attractiveness elements are examples of the models used. In this research, it was possible to map and identify players' behavior patterns and to predict their frequency of play and their tendency to leave or stay in the game.
APA, Harvard, Vancouver, ISO, and other styles
30

OUYANG, SHAO-YU, and 歐陽少佑. "Research on Live Streaming Data Analysis of University Basketball Association." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/jm4sd2.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Master's Program, Department of Physical Education
107 (academic year)
The research focused on the current live streaming situation of the college basketball league in Taiwan and explored the relevant factors in the data. It covered a total of 64 games, from the rematch stage to the final, of the men's first division of the University Basketball Association in the 106th and 107th academic years. The secondary data analysis method was used, with the relevant data collected from the YouTube channel management system. The results were processed with descriptive statistics and correlation analysis. The conclusions are as follows: 1. The channel-related data is growing positively; the viewing demographic is predominantly male (men > women), and students form the majority of the audience. Chien Hsin University of Science and Technology performs best among all teams. 2. The score gap is not significantly related to the live streaming data, and the content of the game affects the audience's viewing figures.
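As a minimal sketch of the correlation step described above, assuming per-game records that pair a final score gap with a YouTube view count (both the column names and the figures below are hypothetical):

```python
"""Correlation between score gap and viewership, as a toy example.
All column names and figures are hypothetical."""
import pandas as pd
from scipy.stats import pearsonr

games = pd.DataFrame({
    "score_gap": [3, 15, 7, 22, 1, 9],                  # final score gaps
    "views":     [5400, 3100, 4800, 2500, 7200, 4100],  # views per game video
})

r, p = pearsonr(games["score_gap"], games["views"])
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
# A p-value above 0.05 would match the finding that the score gap is not
# significantly related to the live streaming data.
```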
APA, Harvard, Vancouver, ISO, and other styles
31

Hsing, Hong-Yu, and 邢弘宇. "A Performance Analysis and Estimation of the Data Stream of Spark Streaming." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/7uem9b.

Full text
Abstract:
Master's thesis
National Taichung University of Science and Technology
Master's Program, Department of Computer Science and Information Engineering
104 (academic year)
Since Spark Streaming handles data in a coarse-grained model (processing a micro-batch of data at a time), delays are inevitable. In the Spark Streaming framework, data is processed only after a certain amount has been collected, which aggravates processing delays; such delays stem from the design of the framework itself. In view of this, it is worth considering how to calibrate the operational parameters of Spark Streaming: put simply, we must optimize both processing time and memory use. However, running the program each time optimization is required would be very time-consuming. Consequently, this study focuses on analyzing and estimating the DStreamGraph within the Spark Streaming framework. With a view to increasing the level of parallelism, decreasing the workload of serialization and deserialization, and securing a reasonable batch-processing time, the appropriate parameter configuration for an operation is determined without repeated calibration runs. The study presents a formula-based estimation model for transformation parameters that is effective in analyzing and estimating the duration of a batch-processing cycle. With this model, developers can accurately and swiftly determine the most appropriate batch-processing time, avoiding redundant restarting and testing of the program during calibration. The model also serves as guidance for setting the batch interval in Spark Streaming so as to keep the delay within the one-second range.
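For reference, the batch interval and parallelism settings the study reasons about are exposed in the classic Spark Streaming (DStream) API roughly as follows; the socket source, host, port, and the one-second interval here are illustrative assumptions, not values from the thesis:

```python
"""Minimal sketch of the knobs the thesis models, using the classic
Spark Streaming (DStream) API. Source, host, port, and the 1-second
batch interval are illustrative assumptions."""
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = (SparkConf()
        .setAppName("batch-interval-sketch")
        # Fewer partitions per batch reduce serialization/shuffle overhead;
        # more raise parallelism. This is the trade-off the thesis estimates.
        .set("spark.default.parallelism", "4"))
sc = SparkContext(conf=conf)

# The batch interval (here 1 s) bounds the scheduling delay the thesis
# targets: each micro-batch must finish within roughly one interval.
ssc = StreamingContext(sc, batchDuration=1)

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```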
APA, Harvard, Vancouver, ISO, and other styles
32

Кузьо, Олег Олегович, and Oleg Kuzo. "Development of an Information System for Monitoring the Musical Preferences of Streaming Service Users." Bachelor's thesis, 2021. http://elartu.tntu.edu.ua/handle/lib/35714.

Full text
Abstract:
The qualification work is devoted to solving a streaming-service user's problem of analyzing their musical preferences. Purpose: to create an information system for analyzing the musical preferences of users of streaming services. The first chapter of the qualification work describes the subject of research, identifies the main components for assessing users' musical taste, determines the required functionality of the final product based on existing analogues, and selects optimal solutions for developing the final product. The second chapter describes in detail the stages of designing the final product; the operation of the system's database and the construction of the user-facing web interface are considered. A user manual was also created, minimum system requirements for reliable operation of the final product were derived, and functional and structural testing was performed. The third chapter highlights the importance of adaptation in the work process and describes general occupational safety requirements for PC users.
Contents: Introduction; 1 Subject research and identification of the main aspects of the work (1.1 Description of the subject of research; 1.2 Comparative analysis of existing tools for monitoring streaming services; 1.3 Selection of optimal solutions for developing the final product; 1.4 Conceptual model; 1.5 Conclusions to the chapter); 2 Design and creation of the final product (2.1 Design and visualization of the algorithms of the system components; 2.2 Creation of the software and hardware environment; 2.3 Experimental part; 2.4 Functional and structural testing; 2.5 Conclusions to the chapter); 3 Life safety and fundamentals of occupational health (3.1 The role of adaptation in the work process; 3.2 General occupational safety requirements for PC users); Conclusions; List of references.
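The thesis does not publish its scoring method, but one plausible ingredient of such a taste profile, genre shares computed from a listening history, can be sketched as follows; the record format and the top-N rule are purely illustrative assumptions:

```python
"""Genre shares from a listening history, as one toy taste-profile component.
The record format and the top-N rule are illustrative assumptions."""
from collections import Counter

# Hypothetical listening history: (track_title, genre) pairs.
history = [
    ("Track A", "rock"), ("Track B", "rock"), ("Track C", "jazz"),
    ("Track D", "electronic"), ("Track E", "rock"), ("Track F", "jazz"),
]

def taste_profile(plays, top_n=3):
    # Count plays per genre and normalise to shares of total listening.
    counts = Counter(genre for _, genre in plays)
    total = sum(counts.values())
    return [(genre, n / total) for genre, n in counts.most_common(top_n)]

for genre, share in taste_profile(history):
    print(f"{genre}: {share:.0%}")   # rock: 50%, jazz: 33%, electronic: 17%
```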
APA, Harvard, Vancouver, ISO, and other styles
33

Ferreira, Ernesto Carlos Casanova. "Big Data Streaming Analytics." Master's thesis, 2019. http://hdl.handle.net/11110/1927.

Full text
Abstract:
Big Data is the science and technology topic of our time. It represents the accumulation of large volumes of heterogeneous data, arriving at high velocity, observed over the years. One of the current challenges lies in developing systems and platforms capable of exploiting these data in diverse areas, such as the environment (pollution reduction, smart renewable energy, reduced energy consumption, etc.), health (improvement of human well-being, quality of life, etc.), the economy (business forecasting), and industry (Industry 4.0, smart industry, etc.), among many others. These goals can be achieved through: i) collection; ii) processing; iii) analysis with consequent action. Developing a platform that can analyze streaming data from devices, wired and wireless sensor networks, the web, social media feeds, applications, and more represents a great challenge and an excellent opportunity; extracting knowledge from data streams in real time, identifying patterns and relations, supporting decisions, and predicting behavior are among the objectives considered. Energy management represents the scenario where these realities coexist: much data is already stored and much more is generated continuously, and the current state and its forecast (monitoring and control) provide gains and a basis for better decisions regarding consumption, profitability, energy efficiency, security, etc. The substantial yearly growth of these data calls for technology able to keep pace with it. In this context, this work set out to develop a streaming architecture for Big Data collection and analysis to optimize business processes, with a case study based on energy management that allows sensor data to be analyzed in real time or near real time. In developing this platform, the goal was to combine several technologies that allow, among other things: i) analyzing high-value data in real time; ii) optimizing hardware resource costs in on-premises installations; iii) flexibility, making it applicable to several use cases; iv) contributing to the development of a product.
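As a minimal, self-contained sketch of the near-real-time sensor analysis described in this case study, assuming simulated power readings, an arbitrary window size, and an arbitrary alert threshold (the actual platform combines several Big Data technologies not reproduced here):

```python
"""Rolling-average anomaly alerting over a simulated sensor stream.
The readings, window size, and threshold are illustrative assumptions."""
from collections import deque
import random

WINDOW_SIZE = 10          # hypothetical: rolling window of 10 readings
ALERT_THRESHOLD = 1.3     # hypothetical: alert at 30% above rolling average

def sensor_stream(n_readings=200):
    # Simulated power readings in kW; a real deployment would consume a
    # message bus or sensor gateway instead of this generator.
    for _ in range(n_readings):
        yield random.gauss(5.0, 0.8)

window = deque(maxlen=WINDOW_SIZE)
for reading in sensor_stream():
    if len(window) == WINDOW_SIZE:
        # Compare the new reading against the average of the prior window.
        avg = sum(window) / len(window)
        if reading > ALERT_THRESHOLD * avg:
            print(f"ALERT: {reading:.2f} kW vs rolling average {avg:.2f} kW")
    window.append(reading)
```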
APA, Harvard, Vancouver, ISO, and other styles