Dissertations / Theses on the topic 'Noisy Time Series Clustering'

Consult the top 50 dissertations / theses for your research on the topic 'Noisy Time Series Clustering.'

1

Kim, Doo Young. "Statistical Modeling of Carbon Dioxide and Cluster Analysis of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, and Multi-Level Time Series Clustering." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6277.

Full text
Abstract:
The current study consists of three major parts: statistical modeling, the connection between statistical modeling and cluster analysis, and new methods for clustering time-dependent information. First, we perform statistical modeling of carbon dioxide (CO2) emissions in South Korea in order to identify the attributable variables, including interaction effects. One of the pressing issues of the 21st century is global warming, driven by the interplay between atmospheric temperature and atmospheric CO2. To confront this global problem, we first need to identify its causes before we can determine how to solve it. We therefore identify and rank the attributable variables and their interactions based on their semipartial correlations, and compare our findings with results from the United States and the European Union. This comparison shows that the top contributing variable in both South Korea and the United States is liquid fuels, whereas it ranks eighth in the EU, providing evidence in support of regional, rather than global, policies for keeping atmospheric CO2 at an optimal level. Second, we study the regional behavior of atmospheric CO2 in the United States. Using a longitudinal transitional modeling scheme, we calculate transition probabilities based on effects from the five end-use sectors that produce most of the CO2 in our atmosphere: the commercial, electric power, industrial, residential, and transportation sectors. Using those transition probabilities, we then perform hierarchical clustering to group the nine US climate regions by similar characteristics. This study suggests that elected officials can legislate regional policies by end-use sector in order to maintain the level of atmospheric CO2 required by global consensus. Third, we propose new methods to cluster time-dependent information. Among the flood of information available today it is almost impossible to find data that are not time dependent, so the importance of mining time-dependent information hardly needs emphasis. The first method we propose, “Lag Target Time Series Clustering (LTTC)”, identifies the actual level of time dependence among the objects being clustered. The second, “Multi-Factor Time Series Clustering (MFTC)”, measures distance in a multi-dimensional space by incorporating multiple sources of information at once. The last, “Multi-Level Time Series Clustering (MLTC)”, is especially important when the time series responses to be clustered vary over the short term; it extracts only the pure lag effect from LTTC. The proposed methods give excellent results when applied to time-dependent clustering. Finally, we develop an algorithm driven by the analytical structure of the proposed methods to cluster financial information across the ten business sectors of the New York Stock Exchange, using 497 of the stocks that constitute the S&P 500. We illustrate the usefulness of the study by structuring a diversified financial portfolio.
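The second part describes a concrete pipeline: discretize each region's CO2 behavior, estimate transition probabilities, and cluster regions hierarchically on those probabilities. A minimal sketch of that idea, assuming hypothetical discretized series and illustrative names throughout:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def transition_matrix(states, n_states):
    """Estimate a first-order transition probability matrix from a
    sequence of discrete states in {0, ..., n_states-1}."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0                      # avoid division by zero
    return counts / rows

# Hypothetical data: one discretized CO2 series per climate region.
rng = np.random.default_rng(0)
regions = {f"region_{i}": rng.integers(0, 3, size=200) for i in range(9)}

# Represent each region by its flattened transition matrix.
features = np.array([transition_matrix(s, 3).ravel() for s in regions.values()])

# Ward hierarchical clustering on the transition-probability features.
Z = linkage(features, method="ward")
print(dict(zip(regions, fcluster(Z, t=3, criterion="maxclust"))))
```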
APA, Harvard, Vancouver, ISO, and other styles
2

Xiong, Yimin. "Time series clustering using ARMA models." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?COMP%202004%20XIONG.

Full text
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 49-55). Also available in electronic version. Access restricted to campus users.
APA, Harvard, Vancouver, ISO, and other styles
3

Jarjour, Riad. "Clustering financial time series for volatility modeling." Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6439.

Full text
Abstract:
The dynamic conditional correlation (DCC) model and its variants have been widely used in modeling the volatility of multivariate time series, with applications in portfolio construction and risk management. While popular for its simplicity, the DCC uses only two parameters to model the correlation dynamics, regardless of the number of assets. The flexible dynamic conditional correlation (FDCC) model attempts to remedy this by grouping the stocks into various clusters, each with its own set of parameters; however, it assumes the grouping is known a priori. In this thesis we develop a systematic method to determine the number of groups to use as well as how to allocate the assets to groups. We show through simulation that the method does well in identifying the groups, and apply it to real data to demonstrate its performance. We also develop and apply a Bayesian approach to the same problem. Furthermore, we propose an instantaneous measure of correlation that can be used in many volatility models, and show that it outperforms the popular sample Pearson's correlation coefficient for small sample sizes, thus opening the door to applications in fields other than finance.
APA, Harvard, Vancouver, ISO, and other styles
4

Torku, Thomas K. "Takens Theorem with Singular Spectrum Analysis Applied to Noisy Time Series." Digital Commons @ East Tennessee State University, 2016. https://dc.etsu.edu/etd/3013.

Full text
Abstract:
The evolution of big data has led to financial time series becoming increasingly complex, noisy, non-stationary and nonlinear. Takens theorem can be used to analyze and forecast nonlinear time series, but even small amounts of noise can hopelessly corrupt a Takens approach. In contrast, Singular Spectrum Analysis (SSA) is an excellent tool for both forecasting and noise reduction. Fortunately, it is possible to combine the Takens approach with SSA, and in fact the estimation of key parameters in Takens theorem can be performed with SSA. In this thesis, we combine the denoising abilities of SSA with the Takens theorem approach to make the manifold reconstruction outcomes of Takens theorem less sensitive to noise. In particular, in the course of performing SSA on a noisy time series, we branch off into a Takens theorem approach. We apply this approach to a variety of noisy time series.
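For readers unfamiliar with the two building blocks, the following is a minimal numpy sketch of basic SSA denoising (embed, SVD, low-rank reconstruction, diagonal averaging) followed by a Takens delay embedding of the denoised series. The window, rank, dimension and lag values are illustrative choices, not the thesis's parameter-estimation procedure:

```python
import numpy as np

def ssa_denoise(x, window, rank):
    """Basic SSA: embed, SVD, keep the leading components, diagonal-average."""
    n = len(x)
    k = n - window + 1
    X = np.column_stack([x[i:i + window] for i in range(k)])  # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank]                 # low-rank approx
    out, cnt = np.zeros(n), np.zeros(n)
    for i in range(window):                                   # Hankelization
        for j in range(k):
            out[i + j] += Xr[i, j]
            cnt[i + j] += 1
    return out / cnt

def delay_embed(x, dim, lag):
    """Takens delay embedding of a scalar series into dim-dimensional points."""
    m = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag: i * lag + m] for i in range(dim)])

t = np.linspace(0, 20 * np.pi, 2000)
noisy = np.sin(t) + 0.3 * np.random.randn(len(t))
clean = ssa_denoise(noisy, window=50, rank=2)
points = delay_embed(clean, dim=3, lag=10)   # reconstructed manifold points
```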
APA, Harvard, Vancouver, ISO, and other styles
5

Li, Jing. "Clustering and forecasting for rain attenuation time series data." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219615.

Full text
Abstract:
Clustering is an unsupervised learning technique that groups similar objects into the same cluster, so that objects within a cluster are more similar to each other than to those in other clusters. Forecasting makes predictions based on past data and efficient artificial-intelligence models of how the data will develop, which can help in making appropriate decisions ahead of time. The datasets used in this thesis are signal attenuation time series from microwave networks. Microwave networks are communication systems that transmit information between two fixed locations on the earth. They can support the increasing capacity demands of mobile networks and play an important role in next-generation wireless communication technology, but their inherent vulnerability to random fluctuations such as rainfall causes significant network performance degradation. In this thesis, K-means, fuzzy c-means, and a 2-state Hidden Markov Model are used to develop one-step and two-step rain attenuation clustering models. The forecasting models are based on the k-nearest-neighbor method, implemented with linear regression, to predict rain attenuation in real time, helping microwave transport networks mitigate rain impact, make proper decisions ahead of time, and improve overall performance.
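The forecasting side pairs a k-nearest-neighbor search over past windows with a linear-regression component. A rough sketch of such a combination, with a hypothetical attenuation series and an arbitrary 50/50 blend of the two predictors (the thesis's exact design may differ):

```python
import numpy as np

def knn_forecast(history, window, horizon, k=5):
    """Predict the next `horizon` values from the k past windows most
    similar to the latest one, blended with a local linear trend."""
    query = history[-window:]
    candidates, futures = [], []
    for i in range(len(history) - window - horizon + 1):
        candidates.append(history[i:i + window])
        futures.append(history[i + window:i + window + horizon])
    dists = np.linalg.norm(np.array(candidates) - query, axis=1)
    nearest = np.argsort(dists)[:k]
    neighbor_mean = np.array(futures)[nearest].mean(axis=0)
    slope, intercept = np.polyfit(np.arange(window), query, 1)
    trend = intercept + slope * np.arange(window, window + horizon)
    return 0.5 * neighbor_mean + 0.5 * trend

attenuation = np.abs(np.random.randn(500)).cumsum() * 0.01  # hypothetical series
print(knn_forecast(attenuation, window=24, horizon=6))
```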
APA, Harvard, Vancouver, ISO, and other styles
6

Nunes, Neuza Filipa Martins. "Algorithms for time series clustering applied to biomedical signals." Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/5666.

Full text
Abstract:
Thesis submitted in the fulfillment of the requirements for the Degree of Master in Biomedical Engineering
The increasing number of biomedical systems and applications for understanding the human body creates a need for information-extraction tools for biosignals. It is important to comprehend the changes in a biosignal's morphology over time, as they often contain critical information on the condition of the subject or the status of the experiment. Tools that automatically analyze and extract relevant attributes from biosignals, providing important information to the user, are of significant value in the biosignal processing field. The present dissertation introduces new algorithms for time series clustering, which separate and organize unlabeled data into groups of mutually similar signals. Signal processing algorithms were developed for the detection of a meanwave, which represents the signal's morphology and behavior. The algorithm computes the meanwave by separating and averaging all cycles of a cyclic continuous signal. To increase the quality of the information given by the meanwave, a set of wave-alignment techniques was also developed and its relevance was evaluated on a real database. To evaluate the algorithm's applicability to time series clustering, a distance metric based on the automatic meanwave was designed, and its measurements were given as input to a K-means clustering algorithm. For that purpose, we collected data containing two different modes. The algorithm successfully separates the two modes in the collected data with 99.3% efficiency. The results of this clustering procedure were compared to a mechanism widely used in this area, which models the data and uses the distance between its cepstral coefficients to measure the similarity between the time series. The algorithms were also validated in different study projects, which show the variety of contexts in which they are highly applicable and offer suitable answers to the problems of exhaustive signal analysis and expert intervention. The algorithms produced are signal-independent and can therefore be applied to any type of signal, provided it is cyclic. Because this approach requires no prior information, and given its good preliminary performance, these algorithms are powerful tools for biosignal analysis and classification.
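A minimal sketch of the meanwave idea, assuming the cycle boundaries are already known: segment the cyclic signal into cycles, resample them to a common length, average, and compare signals by the distance between their meanwaves (which could then feed a K-means-style clusterer). All names are illustrative:

```python
import numpy as np

def meanwave(signal, cycle_starts, length=100):
    """Average all cycles of a cyclic signal into a single 'meanwave';
    cycles are resampled to a common length before averaging."""
    cycles = [signal[a:b] for a, b in zip(cycle_starts[:-1], cycle_starts[1:])]
    grid = np.linspace(0, 1, length)
    resampled = [np.interp(grid, np.linspace(0, 1, len(c)), c) for c in cycles]
    return np.mean(resampled, axis=0)

def meanwave_distance(sig_a, starts_a, sig_b, starts_b, length=100):
    """Euclidean distance between two signals' meanwaves."""
    return np.linalg.norm(meanwave(sig_a, starts_a, length)
                          - meanwave(sig_b, starts_b, length))
```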
APA, Harvard, Vancouver, ISO, and other styles
7

Correia, Maria Inês Costa. "Cluster analysis of financial time series." Master's thesis, Instituto Superior de Economia e Gestão, 2020. http://hdl.handle.net/10400.5/21016.

Full text
Abstract:
Master's degree in Mathematical Finance
This thesis applies the Signature method as a measure of similarity between two time series objects, using the Signature properties of order 2, and applies it to Asymmetric Spectral Clustering. The method is compared with a more traditional clustering approach in which similarities are measured using Dynamic Time Warping, developed to work with time series data. The intention is to treat the traditional approach as a benchmark and compare it to the Signature method through computation times, performance, and applications. These methods are applied to a financial time series dataset of Mutual Exchange Funds from Luxembourg. After the literature review, we introduce the Dynamic Time Warping method and the Signature method. We continue with the explanation of traditional clustering approaches, namely k-Means, and asymmetric clustering techniques, namely the k-Axes algorithm developed by Atev (2011). The last chapter is dedicated to practical research, where the previous methods are applied to the dataset. Results confirm that the Signature method has real potential for machine learning and prediction, as suggested by Levin, Lyons, and Ni (2013).
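Dynamic Time Warping, the benchmark similarity measure here, is a dynamic program over all monotone alignments of two series. A textbook implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x = np.sin(np.linspace(0, 2 * np.pi, 60))
y = np.sin(np.linspace(0, 2 * np.pi, 80))   # same shape, different length
print(dtw_distance(x, y))                    # small despite the length mismatch
```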
APA, Harvard, Vancouver, ISO, and other styles
8

Nelson, Alex Tremain. "Nonlinear estimation and modeling of noisy time-series by dual Kalman filtering methods." Oregon Health & Science University, 2000. http://content.ohsu.edu/u?/etd,211.

Full text
Abstract:
Ph.D.
Electrical and Computer Engineering
Numerous applications require either the estimation or prediction of a noisy time-series. Examples include speech enhancement, economic forecasting, and geophysical modeling. A noisy time-series can be described in terms of a probabilistic model, which accounts for both the deterministic and stochastic components of the dynamics. Such a model can be used with a Kalman filter (or extended Kalman filter) to estimate and predict the time-series from noisy measurements. When the model is unknown, it must be estimated as well; dual estimation refers to the problem of estimating both the time-series, and its underlying probabilistic model, from noisy data. The majority of dual estimation techniques in the literature are for signals described by linear models, and many are restricted to off-line application domains. Using a probabilistic approach to dual estimation, this work unifies many of the approaches in the literature within a common theoretical and algorithmic framework, and extends their capabilities to include sequential dual estimation of both linear and nonlinear signals. The dual Kalman filtering method is developed as a method for minimizing a variety of dual estimation cost functions, and is shown to be an effective general method for estimating the signal, model parameters, and noise variances in both on-line and off-line environments.
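As background, the sketch below shows a standard (single) Kalman filter for a known linear state-space model; the dual approach interleaves a second filter of the same form that treats the model parameters themselves as states. The AR(2) example is hypothetical:

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, x0, P0):
    """Kalman filter for x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t."""
    x, P = x0, P0
    estimates = []
    for yt in y:
        x = A @ x                                   # predict
        P = A @ P @ A.T + Q
        S = C @ P @ C.T + R                         # update
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (yt - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# Hypothetical AR(2) signal in state-space form, observed in noise.
A = np.array([[1.6, -0.8], [1.0, 0.0]])
C = np.array([[1.0, 0.0]])
Q = np.diag([0.09, 0.0]); R = np.array([[0.49]])
rng = np.random.default_rng(1)
x, ys = np.zeros(2), []
for _ in range(200):
    x = A @ x + np.array([rng.normal(0, 0.3), 0.0])
    ys.append(C @ x + rng.normal(0, 0.7))
states = kalman_filter(np.array(ys), A, C, Q, R, np.zeros(2), np.eye(2))
```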
APA, Harvard, Vancouver, ISO, and other styles
9

Wang, Chiying. "Contributions to Collective Dynamical Clustering-Modeling of Discrete Time Series." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/198.

Full text
Abstract:
The analysis of sequential data is important in business, science, and engineering, for tasks such as signal processing, user behavior mining, and commercial transactions analysis. In this dissertation, we build upon the Collective Dynamical Modeling and Clustering (CDMC) framework for discrete time series modeling, by making contributions to clustering initialization, dynamical modeling, and scaling. We first propose a modified Dynamic Time Warping (DTW) approach for clustering initialization within CDMC. The proposed approach provides DTW metrics that penalize deviations of the warping path from the path of constant slope. This reduces over-warping, while retaining the efficiency advantages of global constraint approaches, and without relying on domain dependent constraints. Second, we investigate the use of semi-Markov chains as dynamical models of temporal sequences in which state changes occur infrequently. Semi-Markov chains allow explicitly specifying the distribution of state visit durations. This makes them superior to traditional Markov chains, which implicitly assume an exponential state duration distribution. Third, we consider convergence properties of the CDMC framework. We establish convergence by viewing CDMC from an Expectation Maximization (EM) perspective. We investigate the effect on the time to convergence of our efficient DTW-based initialization technique and selected dynamical models. We also explore the convergence implications of various stopping criteria. Fourth, we consider scaling up CDMC to process big data, using Storm, an open source distributed real-time computation system that supports batch and distributed data processing. We performed experimental evaluation on human sleep data and on user web navigation data. Our results demonstrate the superiority of the strategies introduced in this dissertation over state-of-the-art techniques in terms of modeling quality and efficiency.
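The contrast drawn between Markov and semi-Markov dynamics is easy to see in simulation: a semi-Markov chain draws an explicit dwell time in each state, rather than the geometric durations an ordinary Markov chain implies. A small sketch with illustrative Poisson dwell times:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_semi_markov(P, duration_sampler, n_steps, start=0):
    """Simulate a semi-Markov chain: each visit's duration comes from an
    explicit per-state distribution instead of being implicitly geometric."""
    seq, state = [], start
    while len(seq) < n_steps:
        seq.extend([state] * duration_sampler(state))
        state = rng.choice(len(P), p=P[state])  # P's diagonal is zero
    return np.array(seq[:n_steps])

P = np.array([[0.0, 0.7, 0.3],
              [0.4, 0.0, 0.6],
              [0.5, 0.5, 0.0]])
dwell = lambda s: rng.poisson(lam=[20, 5, 10][s]) + 1   # per-state dwell times
print(simulate_semi_markov(P, dwell, 200))
```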
APA, Harvard, Vancouver, ISO, and other styles
10

Nordlinder, Magnus. "Clustering of Financial Account Time Series Using Self Organizing Maps." Thesis, KTH, Matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291612.

Full text
Abstract:
This thesis clusters financial account time series by extracting global features from the series and applying two different dimensionality reduction methods, Kohonen Self-Organizing Maps and principal component analysis, before clustering with K-means. The results are then used to further cluster a set of financial services provided by a financial institution, to determine whether there are sets of services that coincide with the time series clusters. Several such sets of services, each prevalent in a different time series cluster, are found. The resulting method can be used to understand the dynamics between deposit variability and customers' usage of different services, and to analyse whether a service is used more in particular clusters.
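A compressed sketch of the PCA branch of this pipeline (global features per account series, dimensionality reduction, then K-means) using scikit-learn. The feature set and data are hypothetical, and the Self-Organizing Map branch, not available in scikit-learn, is omitted:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def global_features(series):
    """A few simple global features per account series (hypothetical choice)."""
    diffs = np.diff(series)
    return [series.mean(), series.std(), diffs.std(),
            np.abs(np.fft.rfft(series)[1:6]).mean()]

rng = np.random.default_rng(3)
accounts = rng.normal(size=(300, 365)).cumsum(axis=1)   # hypothetical balances

X = StandardScaler().fit_transform([global_features(s) for s in accounts])
Z = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
```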
APA, Harvard, Vancouver, ISO, and other styles
11

Zhang, Guilin. "Clustering Algorithms for Time Series Gene Expression in Microarray Data." Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc177269/.

Full text
Abstract:
Clustering techniques are important for gene expression data analysis, but efficient computational algorithms for clustering time-series data are still lacking. This work documents two improvements to an existing profile-based greedy algorithm for short time-series data: the first is a scaling method applied in pre-processing of the raw data to handle some extreme cases; the second modifies the strategy used to generate better clusters. Simulation data and real microarray data were used to evaluate these improvements; the approach efficiently generates more accurate clusters. A new feature-based algorithm was also developed, in which the steady-state value, overshoot, rise time, settling time, and peak time produced by a second-order control system are used for clustering. This feature-based approach is much faster and more accurate than the existing profile-based algorithm for long time-series data.
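A rough numpy sketch of how such step-response features might be computed from a sampled profile, assuming the series rises toward a positive steady state; the definitions of rise and settling time below are illustrative:

```python
import numpy as np

def step_response_features(y, t, tol=0.05):
    """Steady state, overshoot, peak time, 10-90% rise time, and
    settling time (last exit from a +/- tol band around steady state)."""
    steady = np.mean(y[-max(3, len(y) // 10):])          # tail average
    peak_idx = int(np.argmax(y))
    overshoot = (y[peak_idx] - steady) / abs(steady)
    r10 = t[np.argmax(y >= 0.1 * steady)]                # first crossing
    r90 = t[np.argmax(y >= 0.9 * steady)]
    inside = np.abs(y - steady) <= tol * abs(steady)
    settle = t[0]
    for i in range(len(y) - 1, -1, -1):                  # last time band is left
        if not inside[i]:
            settle = t[min(i + 1, len(t) - 1)]
            break
    return {"steady_state": steady, "overshoot": overshoot,
            "peak_time": t[peak_idx], "rise_time": r90 - r10,
            "settling_time": settle}

t = np.linspace(0, 10, 200)
y = 1 - np.exp(-t) * np.cos(3 * t)      # hypothetical expression profile
print(step_response_features(y, t))
```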
APA, Harvard, Vancouver, ISO, and other styles
12

Caiado, Aníbal Jorge da Costa Cristóvão. "Distance-based methods for classification and clustering of time series." Doctoral thesis, Instituto Superior de Economia e Gestão, 2006. http://hdl.handle.net/10400.5/3531.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Arora, Rahul. "Operational Modal Parameter Estimation from Short Time-Data Series." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1397467142.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Lei, Jiahuan. "An extended BIRCH-based clustering algorithm for large time-series datasets." Thesis, Mittuniversitetet, Avdelningen för informations- och kommunikationssystem, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-29858.

Full text
Abstract:
Temporal data analysis and mining have attracted substantial interest due to the proliferation and ubiquity of time series in many fields. Time series clustering is one of the most popular mining methods, yet many time series clustering algorithms primarily detect clusters in a batch fashion, which uses a great deal of memory and thus limits scalability for large time series. The BIRCH algorithm, characterized by incrementally clustering data objects in a single scan, has been proven to scale well to large datasets. However, the Euclidean distance metric employed in BIRCH is known to be inaccurate for time series and degrades clustering accuracy. To overcome this drawback, this work proposes an extended BIRCH algorithm for large time series. The BIRCH clustering algorithm is extended by changing the cluster feature vector to a proposed modified cluster feature, replacing the original Euclidean distance measure with dynamic time warping (DTW), and employing DTW barycenter averaging as the centroid computation approach, which is more suitable for time-series clustering than other averaging methods. To demonstrate the effectiveness of the proposed algorithm, we conducted an extensive evaluation against BIRCH, k-means, and their variants with combinations of competitive distance measures. Experimental results show that the extended BIRCH algorithm improves accuracy significantly compared to the BIRCH algorithm and its variants, and achieves accuracy competitive with k-means and its variant, k-DBA. Unlike k-means and k-DBA, however, the extended BIRCH algorithm retains the ability to incrementally handle continuously arriving data objects, which is the key to clustering large time-series datasets. Finally, the extended BIRCH-based algorithm is applied to a subsequence time-series clustering task on a simulated multivariate time-series dataset with the help of a sliding window.
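scikit-learn ships a standard Euclidean BIRCH whose partial_fit method illustrates the incremental, single-scan property the thesis builds on; the thesis's actual extension (modified cluster features, DTW distance, DBA centroids) is not available off the shelf. A usage sketch with hypothetical batch data:

```python
import numpy as np
from sklearn.cluster import Birch

# Incremental clustering with standard (Euclidean) BIRCH. The thesis
# replaces this metric with DTW and uses DBA centroids, which
# scikit-learn's implementation does not support.
model = Birch(threshold=0.5, n_clusters=3)

rng = np.random.default_rng(5)
for _ in range(10):                       # data arriving in batches
    batch = rng.normal(size=(100, 24))    # e.g. 24-point daily profiles
    model.partial_fit(batch)

labels = model.predict(rng.normal(size=(5, 24)))
```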
APA, Harvard, Vancouver, ISO, and other styles
15

Leverger, Colin. "Investigation of a framework for seasonal time series forecasting." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S033.

Full text
Abstract:
To deploy web applications, web servers are paramount. If there are too few of them, application performance can quickly deteriorate; if they are too numerous, resources are wasted and costs increase. In this context, engineers use capacity planning tools to monitor server performance, collect the time series data generated by the infrastructure, and anticipate future needs. The necessity of reliable forecasts is clear. Data generated by the infrastructure often exhibit seasonality: the activity cycle followed by the infrastructure is determined by seasonal cycles (for example, users' daily rhythms). This thesis introduces a framework for seasonal time series forecasting. The framework is composed of two machine learning models (clustering and classification) and aims at producing reliable mid-term forecasts with a limited number of parameters. Three instantiations of the framework are presented: a baseline, a deterministic version, and a probabilistic version. The baseline is composed of the K-means clustering algorithm and Markov models. The deterministic version combines several clustering algorithms (K-means, K-shape, GAK and MODL) with several classifiers (naive Bayes, decision trees, random forests and logistic regression). The probabilistic version relies on coclustering to create probabilistic time series grids, which describe the data in an unsupervised way. The performance of the various implementations is compared with several state-of-the-art models, including autoregressive models, ARIMA and SARIMA, Holt-Winters, and Prophet for the probabilistic paradigm. The results of the baseline are encouraging and confirm the interest of the proposed framework; good results are observed for the deterministic implementation, and correct results for the probabilistic version. An Orange use case is studied, and the interest and limits of the methodology are discussed.
APA, Harvard, Vancouver, ISO, and other styles
16

Zeng, Zhanggui. "Financial Time Series Analysis using Pattern Recognition Methods." University of Sydney, 2008. http://hdl.handle.net/2123/3558.

Full text
Abstract:
Doctor of Philosophy
This thesis is based on research on financial time series analysis using pattern recognition methods. The first part of the research focuses on univariate time series analysis using different pattern recognition methods. First, probabilities of basic patterns are used to represent the features of a section of a time series. This feature can remove noise from the time series through statistical probability, and it is experimentally shown to be successful for time series with repeated patterns. Second, multiscale Gaussian gravity, a pattern-relationship measure that can describe the direction of the relationship between patterns, is introduced for pattern clustering. By searching for the Gaussian-gravity-guided nearest neighbour of each pattern, this clustering method can easily determine the boundaries of the clusters. Third, a method is presented that transforms unsupervised pattern classification into multiscale supervised pattern classification by means of multiscale supervisory time series or multiscale filtered time series. The second part of the research focuses on multivariate time series analysis using pattern recognition. A systematic method is proposed to find the independent variables of a group of share prices by time series clustering, principal component analysis, independent component analysis, and object recognition. The number of dependent variables is reduced, and the multivariate analysis simplified, by time series clustering and principal component analysis. Independent component analysis aims to find the ideal independent variables of the group of shares, and object recognition is expected to recognize those independent variables that are similar to the independent components. This method provides a new clue to understanding the stock market and to modelling a large time series database.
APA, Harvard, Vancouver, ISO, and other styles
17

Tang, Fan. "Structural time series clustering, modeling, and forecasting in the state-space framework." Diss., University of Iowa, 2015. https://ir.uiowa.edu/etd/6002.

Full text
Abstract:
This manuscript consists of two papers that formulate novel methodologies pertaining to time series analysis in the state-space framework. In Chapter 1, we introduce an innovative time series forecasting procedure that relies on model-based clustering and model averaging. The clustering algorithm employs a state-space model comprised of three latent structures: a long-term trend component; a seasonal component, to capture recurring global patterns; and an anomaly component, to reflect local perturbations. A two-step clustering algorithm is applied to identify series that are both globally and locally correlated, based on the corresponding smoothed latent structures. For each series in a particular cluster, a set of forecasting models is fit, using covariate series from the same cluster. To fully utilize the cluster information and to improve forecasting for a series of interest, multi-model averaging is employed. We illustrate the proposed technique in an application that involves a collection of monthly disease incidence series. In Chapter 2, to effectively characterize a count time series that arises from a zero-inflated binomial (ZIB) distribution, we propose two classes of statistical models: a class of observation-driven ZIB (ODZIB) models, and a class of parameter-driven ZIB (PDZIB) models. The ODZIB model is formulated in the partial likelihood framework. Common iterative algorithms (Newton-Raphson, Fisher Scoring, and Expectation Maximization) can be used to obtain the maximum partial likelihood estimators (MPLEs). The PDZIB model is formulated in the state-space framework. For parameter estimation, we devise a Monte Carlo Expectation Maximization (MCEM) algorithm, using particle methods to approximate the intractable conditional expectations in the E-step of the algorithm. We investigate the efficacy of the proposed methodology in a simulation study, and illustrate its utility in a practical application pertaining to disease coding.
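The zero-inflated binomial (ZIB) observation model in the second paper has a simple probability mass function: a point mass at zero with probability pi mixed with a binomial component. A small scipy sketch of the pmf and log-likelihood (parameter names are generic):

```python
import numpy as np
from scipy.stats import binom

def zib_pmf(k, n, p, pi):
    """Zero-inflated binomial pmf:
    P(Y=0) = pi + (1-pi)(1-p)^n;  P(Y=k) = (1-pi) Binom(k; n, p) for k > 0."""
    base = (1 - pi) * binom.pmf(k, n, p)
    return np.where(np.asarray(k) == 0, pi + base, base)

def zib_loglik(data, n, p, pi):
    return np.sum(np.log(zib_pmf(np.asarray(data), n, p, pi)))

y = np.array([0, 0, 2, 0, 1, 0, 3, 0])       # hypothetical counts
print(zib_loglik(y, n=10, p=0.2, pi=0.4))
```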
APA, Harvard, Vancouver, ISO, and other styles
18

Blakely, Logan. "Spectral Clustering for Electrical Phase Identification Using Advanced Metering Infrastructure Voltage Time Series." Thesis, Portland State University, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10980011.

Full text
Abstract:

The increasing demand for and prevalence of distributed energy resources (DER) such as solar power, electric vehicles, and energy storage present a unique set of challenges for integration into a legacy power grid, and accurate models of the low-voltage distribution system are critical for accurate simulations of DER. The labeling of each customer's phase connections is one area of grid topology in utility models that is known to contain errors, with implications for the safety, efficiency, and hosting capacity of a distribution system. This research presents a methodology for phase identification of customers using only the advanced metering infrastructure (AMI) voltage time series. The thesis proposes Spectral Clustering, combined with a sliding-window ensemble method for handling a long-term time series dataset that includes missing data, to group customers within a lateral by phase. The clustering phase predictions validate over 90% of the existing phase labels in the model and identify customers whose current phase labels are incorrect. Within this dataset, the methodology produces consistent, high-quality results, verified against the underlying topology of the system as well as selected examples checked using satellite and street-view images publicly available in Google Earth. Further analysis of the Spectral Clustering predictions shows that they not only validate and improve the phase labels in the utility model but also show potential for detecting other types of topology errors, such as mislabeled connections between customers and transformers, unlabeled residential solar power, unlabeled transformers, and customers with incomplete information in the model. These results indicate excellent potential for further development of this methodology as a tool for validating and improving existing utility models of the low-voltage side of the distribution system.
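A condensed sketch of the windowed idea: each sliding window yields a correlation-based affinity among customers and one set of cluster labels, and a majority vote across windows gives the final phase prediction. The window size and affinity choice are illustrative, and a real ensemble must first align labels across windows (they are only consistent up to permutation), which this sketch omits:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def phase_votes(voltages, window=96, step=48, n_phases=3):
    """Sliding-window spectral clustering of customer voltage series;
    each window casts one cluster-label vote per customer."""
    votes = []
    for start in range(0, voltages.shape[1] - window + 1, step):
        seg = voltages[:, start:start + window]
        aff = np.clip(np.corrcoef(seg), 0, None)   # non-negative affinity
        sc = SpectralClustering(n_clusters=n_phases,
                                affinity="precomputed", random_state=0)
        votes.append(sc.fit_predict(aff))
    return np.array(votes)   # one row of labels per window

rng = np.random.default_rng(4)
voltages = rng.normal(240, 2, size=(30, 960))   # 30 hypothetical customers
print(phase_votes(voltages).shape)
```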

APA, Harvard, Vancouver, ISO, and other styles
19

Díaz, González Fernando. "Federated Learning for Time Series Forecasting Using LSTM Networks: Exploiting Similarities Through Clustering." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254665.

Full text
Abstract:
Federated learning poses a statistical challenge when training on highly heterogeneous sequence data. For example, time series telecom data collected over long intervals regularly show mixed fluctuations and patterns. These distinct distributions are an inconvenience when a node not only plans to contribute to the creation of the global model but also plans to apply it to its local dataset. In this scenario, adopting a one-size-fits-all approach might be inadequate, even when using state-of-the-art machine learning techniques for time series forecasting such as Long Short-Term Memory (LSTM) networks, which have proven able to capture many idiosyncrasies and generalise to new patterns. In this work, we show that clustering the clients by these patterns and selectively aggregating their updates into different global models can improve local performance with minimal overhead, as we demonstrate through experiments using real-world time series datasets and a basic LSTM model.
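The aggregation rule being described is cluster-wise federated averaging: weight-average model updates only within each cluster of clients. A minimal numpy sketch with hypothetical flattened weight vectors:

```python
import numpy as np

def cluster_fedavg(client_weights, client_sizes, cluster_of):
    """One communication round of cluster-wise FedAvg: each cluster
    aggregates only its own clients' model weights, size-weighted."""
    global_models = {}
    for c in set(cluster_of):
        members = [i for i, k in enumerate(cluster_of) if k == c]
        total = sum(client_sizes[i] for i in members)
        global_models[c] = sum(
            (client_sizes[i] / total) * client_weights[i] for i in members)
    return global_models

# Hypothetical: six clients with flattened LSTM weights, two clusters.
weights = [np.random.randn(1000) for _ in range(6)]
sizes = [120, 80, 200, 150, 90, 60]
models = cluster_fedavg(weights, sizes, cluster_of=[0, 0, 0, 1, 1, 1])
```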
APA, Harvard, Vancouver, ISO, and other styles
20

Damle, Chaitanya. "Flood forecasting using time series data mining." [Tampa, Fla.] : University of South Florida, 2005. http://purl.fcla.edu/fcla/etd/SFE0001038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Wang, Xing. "Time Dependent Kernel Density Estimation: A New Parameter Estimation Algorithm, Applications in Time Series Classification and Clustering." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6425.

Full text
Abstract:
The Time Dependent Kernel Density Estimation (TDKDE) developed by Harvey & Oryshchenko (2012) is a kernel density estimation adjusted by the Exponentially Weighted Moving Average (EWMA) weighting scheme. The Maximum Likelihood Estimation (MLE) procedure for estimating the parameters proposed by Harvey & Oryshchenko (2012) is easy to apply but has two inherent problems. In this study, we evaluate the performance of the probability density estimation in terms of the uniformity of Probability Integral Transforms (PITs) on various kernel functions combined with different preset numbers. Furthermore, we develop a new estimation algorithm, which can be conducted using Artificial Neural Networks, to eliminate the inherent problems of the MLE method and to improve the estimation performance as well. Based on the new estimation algorithm, we develop the TDKDE-based Random Forests time series classification algorithm, which is significantly superior to the commonly used statistical feature-based Random Forests method as well as the Kernel Density Estimation (KDE)-based Random Forests approach. Furthermore, the proposed TDKDE-based Self-Organizing Map (SOM) clustering algorithm is demonstrated to be superior to the widely used Discrete-Wavelet-Transform (DWT)-based SOM method in terms of the Adjusted Rand Index (ARI).
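The core of TDKDE is a kernel density estimate whose observation weights decay exponentially with age. A small numpy sketch with Gaussian kernels (the bandwidth and decay values are arbitrary):

```python
import numpy as np

def tdkde(x_grid, history, bandwidth, lam):
    """Time-dependent KDE: Gaussian kernels with EWMA weights, so the
    most recent observations dominate the density estimate."""
    t = np.arange(len(history))
    w = (1 - lam) * lam ** (len(history) - 1 - t)   # newest observation heaviest
    w /= w.sum()
    z = (x_grid[:, None] - history[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels @ w

history = np.random.randn(500).cumsum() * 0.1
grid = np.linspace(history.min(), history.max(), 200)
density = tdkde(grid, history, bandwidth=0.3, lam=0.98)   # integrates to ~1
```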
APA, Harvard, Vancouver, ISO, and other styles
22

Xu, Tianbing. "Nonparametric evolutionary clustering." Diss., online access via UMI, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
23

Zlicar, Blaz. "Algorithms for noisy and nonstationary data : advances in financial time series forecasting and pattern detection with machine learning." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10043123/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Abualhamayl, Abdullah Jameel Mr. "APPLY DATA CLUSTERING TO GENE EXPRESSION DATA." CSUSB ScholarWorks, 2015. https://scholarworks.lib.csusb.edu/etd/259.

Full text
Abstract:
Data clustering plays an important role in the effective analysis of gene expression. Although DNA microarray technology facilitates expression monitoring, several challenges arise when dealing with gene expression datasets, among them the enormous number of genes, the dimensionality of the data, and the change of the data over time. Genetic groups that are biologically interlinked can be identified through clustering. This project aims to clarify the steps for applying clustering analysis to the genes in a published dataset. The methodology includes selecting the dataset representation, the gene datasets, the similarity matrix, the clustering algorithm, and the analysis tool. R is used as the analysis tool, with a focus on Kmeans, fpc, hclust, and heatmap3. Different clustering algorithms are applied to the Spellman dataset to illustrate how genes are grouped into clusters, which helps in understanding genetic behavior.
APA, Harvard, Vancouver, ISO, and other styles
25

Li, Lei. "Fast Algorithms for Mining Co-evolving Time Series." Research Showcase @ CMU, 2011. http://repository.cmu.edu/dissertations/112.

Full text
Abstract:
Time series data arise in many applications, from motion capture, environmental monitoring, and temperatures in data centers to physiological signals in health care. In this thesis, I focus on the theme of learning and mining large collections of co-evolving sequences, with the goal of developing fast algorithms for finding patterns, summarization, and anomalies. In particular, the thesis answers the following recurring challenges for time series: 1. Forecasting and imputation: How to forecast and recover missing values in time series data? 2. Pattern discovery and summarization: How to identify patterns in the time sequences that facilitate further mining tasks such as compression, segmentation and anomaly detection? 3. Similarity and feature extraction: How to extract compact and meaningful features from multiple co-evolving sequences that enable better clustering and similarity queries of time series? 4. Scale-up: How to handle large datasets on modern computing hardware? We develop models to mine time series with missing values, to extract compact representations from time sequences, to segment the sequences, and to forecast. For large-scale data, we propose algorithms for learning time series models, in particular Linear Dynamical Systems (LDS) and Hidden Markov Models (HMM). We also develop a distributed algorithm for finding patterns in large web-click streams. The thesis also presents special models and algorithms that incorporate domain knowledge. For motion capture, we describe natural motion stitching and occlusion filling for human motion; in particular, we provide a metric for evaluating the naturalness of motion stitching, based on which we choose the best stitching. Thanks to domain knowledge (body structure and bone lengths), our algorithm is capable of recovering occlusions in mocap sequences with better accuracy and over longer missing periods. We also develop an algorithm for forecasting thermal conditions in a warehouse-sized data center. The forecast helps control and manage the data center in an energy-efficient way, which can save a significant percentage of the electric power consumed in data centers.
APA, Harvard, Vancouver, ISO, and other styles
26

YILDIRIM, NURSEDA. "TIME SERIES MODELLING FOR WIND POWER PREDICTION AND CONTROL : CLUSTERING AND ASSOCIATION RULES OF DATA MINING FOR CFD AND TIME SERIES DATA OF POWER RAMPS." Thesis, Uppsala universitet, Institutionen för geovetenskaper, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-245304.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Lu, Zhengdong. "Constrained clustering and cognitive decline detection." Full text open access, 2008. http://content.ohsu.edu/u?/etd,650.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Khy, Sophoin, Yoshiharu Ishikawa, and Hiroyuki Kitagawa. "A Query Language and Its Processing for Time-Series Document Clusters." Springer-Verlag, 2008. http://hdl.handle.net/2237/10689.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

CASSIANO, KEILA MARA. "TIME SERIES ANALYSIS USING SINGULAR SPECTRUM ANALYSIS (SSA) AND BASED DENSITY CLUSTERING OF THE COMPONENTS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24787@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
This thesis proposes using DBSCAN (Density Based Spatial Clustering of Applications with Noise) to separate the noise components of eigentriples in the grouping stage of the Singular Spectrum Analysis (SSA) of time series. DBSCAN is a modern method (revised in 2013), expert at identifying noise through regions of lower density. Hierarchical clustering was previously the latest innovation for noise separation in the SSA approach, implemented in the R-SSA package. However, it is repeated in the literature that hierarchical clustering is very sensitive to noise, unable to separate it correctly, should not be used on clusters with varying densities, and does not work well for clustering time series with different trends. In contrast, density-based clustering methods are effective at separating noise from the data and are designed to work well on data of varying densities. This work shows that DBSCAN is more efficient than the other methods already used at this stage of SSA, allowing considerable noise reduction and providing better forecasting. The result is supported by experimental evaluations performed on simulated stationary and non-stationary series. The proposed combination of methodologies was also successfully applied to forecasting a real wind speed series.
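A rough sketch of the grouping idea: summarize each SSA eigentriple with a couple of features and let DBSCAN's noise label (-1) mark the components to discard. The feature choice below (log singular value and dominant eigenvector frequency) is illustrative, not necessarily the thesis's:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def eigentriple_features(x, window):
    """SVD of the SSA trajectory matrix; each eigentriple is summarized
    by its log singular value and its left vector's dominant frequency."""
    X = np.column_stack([x[i:i + window] for i in range(len(x) - window + 1)])
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    freqs = [np.argmax(np.abs(np.fft.rfft(U[:, k]))[1:]) + 1
             for k in range(len(s))]
    return np.column_stack([np.log(s), np.array(freqs, float)])

t = np.arange(1000)
series = np.sin(2 * np.pi * t / 50) + 0.5 * np.random.randn(1000)
F = eigentriple_features(series, window=100)
labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(F)
noise_components = np.where(labels == -1)[0]   # eigentriples flagged as noise
```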
APA, Harvard, Vancouver, ISO, and other styles
30

Makhlouk, Oumaïma. "Time series data analytics : clustering-based anomaly detection techniques for quality control in semiconductor manufacturing." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120248.

Full text
Abstract:
Thesis: M. Eng. in Advanced Manufacturing and Design, Massachusetts Institute of Technology, Department of Mechanical Engineering, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 109-110).
Optimizing manufacturing systems and processes while ensuring low production cost is important for Analog Devices, Inc. (ADI); detecting anomalies on production lines and alerting on out-of-control processes is therefore crucial. Although Statistical Process Control (SPC) methods have been implemented in the past and have proven efficient, the company seeks improvements using machine learning, and the Machine Health Project is one of the data-analytics projects under way at ADI to implement them. Anomaly detection techniques can be effective in improving quality control on semiconductor production lines. Data collected from semiconductor manufacturing machines, such as a plasma etcher, can be analyzed to control the fabrication process and to test the efficiency of machine learning algorithms. This thesis focuses on cluster analysis for outlier detection and provides a univariate strategy for finding potentially anomalous behavior in the data when a given parameter is known to be relevant; if a more thorough analysis is needed, a multivariate clustering analysis can also be computed. In addition, decomposition-based algorithms are presented. These rely on techniques such as the STL and SAX representations of time series and provide a visual computation of time series discords. In this thesis, these methods are implemented and their results compared, and recommendations are provided on how best to utilize the outputs of these outlier detection algorithms.
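Of the techniques named, SAX is the most self-contained: z-normalize the series, reduce it by piecewise aggregate approximation, then discretize with Gaussian breakpoints into a short symbolic word. A compact sketch:

```python
import numpy as np
from scipy.stats import norm

def sax(series, n_segments, alphabet_size):
    """SAX word for a series: z-normalize, piecewise-aggregate, then map
    segment means to letters via equiprobable N(0,1) breakpoints."""
    z = (series - series.mean()) / series.std()
    paa = np.array([seg.mean() for seg in np.array_split(z, n_segments)])
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    return "".join(chr(ord("a") + s) for s in np.searchsorted(breakpoints, paa))

x = np.sin(np.linspace(0, 4 * np.pi, 256)) + 0.1 * np.random.randn(256)
print(sax(x, n_segments=8, alphabet_size=4))   # e.g. a word like 'cdbaacdb'
```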
by Oumaïma Makhlouk.
M. Eng. in Advanced Manufacturing and Design
APA, Harvard, Vancouver, ISO, and other styles
31

Marti, Gautier. "Some contributions to the clustering of financial time series and applications to credit default swaps." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLX097/document.

Full text
Abstract:
In this thesis we first review the scattered literature on clustering financial time series. We then try to give as much color as possible on the credit default swap market, a market relatively unknown to the general public except for its role in the contagion of bank failures during the global financial crisis of 2007-2008, while introducing the datasets used in the empirical studies. Unlike the existing body of literature, which mostly offers descriptive studies, we aim at building models and large information systems based on clusters seen as basic building blocks: these foundations must be stable. That is why the work undertaken and described in the following intends to further ground the clustering methodologies. For that purpose, we discuss their consistency and propose alternative measures of similarity that can be plugged into the clustering methodologies, and we study empirically their impact on the clusters. Results of the empirical studies can be explored at www.datagrapple.com
APA, Harvard, Vancouver, ISO, and other styles
32

Hanna, Peter, and Erik Swartling. "Anomaly Detection in Time Series Data using Unsupervised Machine Learning Methods: A Clustering-Based Approach." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273630.

Full text
Abstract:
For many companies in the manufacturing industry, finding damage in their products is a vital process, especially during the production phase. Since applying machine learning techniques can further aid damage identification, making use of these methods has become a popular choice among companies seeking to enhance the production process even further. For some industries, damage identification is heavily linked with anomaly detection in different measurements. In this thesis, the aim is to construct unsupervised machine learning models to identify anomalies in unlabeled measurements of pumps, using high-frequency sampled current and voltage time series data. Each measurement can be split into five phases: the startup phase, three duty-point phases, and lastly the shutdown phase. The approach is based on clustering methods, the main algorithms being the density-based DBSCAN and LOF. Dimensionality reduction techniques, such as feature extraction and feature selection, are applied to the data, and after constructing one model per phase, the models are shown to identify anomalies in the given dataset.
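Both algorithms named are available in scikit-learn and mark outliers with the label -1. A minimal sketch on hypothetical per-phase feature vectors with a few planted anomalies:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
# Hypothetical feature vectors extracted from one measurement phase,
# e.g. a few statistics per current/voltage window.
X = rng.normal(size=(500, 6))
X[:5] += 6.0                                                   # planted anomalies

lof_flags = LocalOutlierFactor(n_neighbors=20).fit_predict(X)  # -1 = outlier
db_flags = DBSCAN(eps=1.5, min_samples=10).fit_predict(X)      # -1 = noise

print("LOF outliers:", np.where(lof_flags == -1)[0])
print("DBSCAN noise:", np.where(db_flags == -1)[0])
```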
APA, Harvard, Vancouver, ISO, and other styles
33

Ferreira, Leonardo Nascimento. "Time series data mining using complex networks." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-01022018-144118/.

Full text
Abstract:
A time series is a time-ordered dataset. Due to its ubiquity, time series analysis is of interest to many scientific fields. Time series data mining is a research area intended to extract information from these time-related data; to achieve this, different models are used to describe series and search for patterns. One approach to modeling temporal data is to use complex networks, in which case temporal data are mapped to a topological space that allows data exploration using network techniques. In this thesis, we present solutions for time series data mining tasks using complex networks. The primary goal was to evaluate the benefits of using network theory to extract information from temporal data. We focused on three mining tasks. (1) In the clustering task, we represented every time series by a vertex and connected vertices that represent similar time series, then used community detection algorithms to cluster similar series; this approach performs better than traditional clustering. (2) In the classification task, we mapped every labeled time series in a database to a visibility graph. We classified an unlabeled time series by transforming it into a visibility graph, comparing it to the labeled graphs using a distance function, and assigning the most frequent label among the k nearest graphs. (3) In the periodicity detection task, we first transform a time series into a visibility graph. Local maxima in a time series are usually mapped to highly connected vertices that link two communities, and we used this community structure to propose a periodicity detection algorithm that is robust to noisy data and requires no parameters. With the methods and results presented in this thesis, we conclude that network science is beneficial to time series data mining and can provide better results than traditional methods. It is a new way of extracting information from time series and can easily be extended to other tasks.
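The natural visibility graph underlying the latter two tasks has a simple geometric definition: two samples are linked if the straight line between them passes above every intermediate sample. A direct, unoptimized sketch:

```python
import numpy as np

def visibility_edges(y):
    """Natural visibility graph: points (i, y[i]) and (j, y[j]) are linked
    if every point between them lies strictly below the connecting line."""
    n = len(y)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(y[k] < y[i] + (y[j] - y[i]) * (k - i) / (j - i)
                   for k in range(i + 1, j)):
                edges.append((i, j))
    return edges

y = np.array([1.0, 3.0, 2.0, 4.0, 1.5, 3.5])
print(visibility_edges(y))   # adjacent samples are always mutually visible
```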
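The visibility graph mapping used in tasks (2) and (3) can be sketched in a few lines; below is a naive O(n^2) natural visibility graph construction in Python (assuming networkx), a generic illustration rather than the author's optimised implementation.

```python
# Naive natural visibility graph: nodes are time points; i and j are linked
# when no intermediate point rises above the straight line between them.
import networkx as nx
import numpy as np

def visibility_graph(series):
    g = nx.Graph()
    n = len(series)
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            blocked = any(
                series[k] >= series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if not blocked:
                g.add_edge(i, j)
    return g

series = np.sin(np.linspace(0, 6 * np.pi, 60)) \
    + 0.1 * np.random.default_rng(1).normal(size=60)
g = visibility_graph(series)
# Highly connected vertices tend to correspond to local maxima of the series
hubs = sorted(g.degree, key=lambda kv: kv[1], reverse=True)[:5]
print(hubs)
```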
APA, Harvard, Vancouver, ISO, and other styles
34

Li, Chuhe. "A sliding window BIRCH algorithm with performance evaluations." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-32397.

Full text
Abstract:
An increasing number of applications across various fields generate transactional or other time-stamped data, all of which belong to the class of time series data. Time series data mining is a popular topic in the data mining field, and it introduces challenges for improving the accuracy and efficiency of algorithms. Time series data are dynamic, large-scale and highly complex, which makes it difficult to discover patterns with methods suited to static data. BIRCH, a hierarchical clustering method, was proposed and employed to address the problems of large datasets; it minimizes I/O and time costs. A CF tree is generated during its working process, and clusters are produced after the four phases of the BIRCH procedure. A drawback of BIRCH is that it is not very scalable. This thesis is devoted to improving the accuracy and efficiency of the BIRCH algorithm. A sliding window BIRCH algorithm is implemented on the basis of the original algorithm, and its accuracy and efficiency are evaluated at the end of the thesis. A performance comparison among SW BIRCH, BIRCH and K-means is also presented, using the Silhouette Coefficient and the Calinski-Harabasz index. The preliminary results indicate that SW BIRCH may achieve better performance than BIRCH in some cases.
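A minimal sketch of the sliding-window idea follows, assuming scikit-learn's Birch implementation and its quality indices; the window size and the refit-per-window scheme are illustrative, and the thesis's SW BIRCH internals are not reproduced.

```python
# Refit BIRCH on each window of a stream and score the clustering quality.
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score, calinski_harabasz_score

rng = np.random.default_rng(0)
blobs = [rng.normal(loc, 0.3, size=(500, 2)) for loc in ((0, 0), (3, 3), (6, 0))]
stream = np.concatenate(blobs)
rng.shuffle(stream)  # interleave so each window contains all clusters

window_size = 300
for start in range(0, len(stream) - window_size + 1, window_size):
    window = stream[start:start + window_size]
    model = Birch(threshold=0.5, n_clusters=3).fit(window)  # CF tree per window
    labels = model.predict(window)
    print(
        f"window {start:4d}:",
        f"silhouette={silhouette_score(window, labels):.3f}",
        f"calinski-harabasz={calinski_harabasz_score(window, labels):.1f}",
    )
```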
APA, Harvard, Vancouver, ISO, and other styles
35

Thielo, Marcelo Resende. "Análise e classificação de séries temporais não estacionárias utilizando métodos não-lineares." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2000. http://hdl.handle.net/10183/12661.

Full text
Abstract:
In this work we review some of the main methods available for nonlinear time series analysis of low-dimensional, predominantly deterministic systems, with emphasis on the problem of unsupervised classification/clustering of such data. Various dissimilarity measures are used together with heuristic search methods based on stochastic algorithms to organize segments of one (long) nonstationary time series into groups with common characteristics, in an attempt to relate these groups to some previously known clinical property. The method is implemented with different dissimilarity measures, validated in an experiment with synthetic time series (generated by numerical simulation), and later applied to a real problem, the segmentation of sleep stages. The results look promising with respect to the applicability of the method to classifying sleep stages in electroencephalographic recordings.
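As a generic illustration of the segment-clustering idea (not the thesis's specific dissimilarity measures), the Python sketch below splits a series into segments, compares their autocorrelation functions, and clusters them hierarchically with SciPy.

```python
# Cluster segments of a nonstationary series by an ACF-based dissimilarity.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def acf(x, nlags=30):
    x = x - x.mean()
    c = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + nlags]
    return c / c[0]

rng = np.random.default_rng(2)
# Two alternating regimes: smooth oscillation vs. noise-dominated dynamics
segments = [np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.normal(size=200)
            if i % 2 == 0 else 0.8 * rng.normal(size=200) for i in range(10)]

feats = np.array([acf(s) for s in segments])
n = len(segments)
d = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d[i, j] = d[j, i] = np.linalg.norm(feats[i] - feats[j])

labels = fcluster(linkage(squareform(d), method="average"), t=2, criterion="maxclust")
print(labels)  # alternating segments should receive alternating labels
```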
APA, Harvard, Vancouver, ISO, and other styles
36

Jaunzems, Davis. "Time-series long-term forcasting for A/B tests." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-205344.

Full text
Abstract:
The technological development of computing devices and communication tools has made it possible to store and process more information than ever before. For researchers it is a means of making more accurate scientific discoveries; for companies it is a way of better understanding their clients and products and gaining an edge over competitors. In industry, A/B testing is becoming an important and common way of obtaining insights that help make data-driven decisions. An A/B test is a comparison of two or more versions to determine which performs better according to predetermined measurements. In combination with data mining and statistical analysis, these tests make it possible to answer important questions and help the transition from "we think" to "we know". Nevertheless, running bad test cases can have a negative impact on businesses and can result in bad user experience. That is why it is important to be able to forecast the long-term effects of an A/B test from short-term data. In this report, A/B tests and their forecasting are examined using univariate time-series analysis. However, the short duration and high diversity of the tests pose a great challenge to providing accurate long-term forecasts. This is a quantitative and empirical study that uses a real-world data set from the social game development company King Digital Entertainment PLC (King.com). First, the data are analysed and pre-processed through a series of steps. Time-series forecasting has been around for generations, so an analysis and accuracy comparison of existing forecasting models, such as the mean forecast, ARIMA and artificial neural networks, is carried out. The results on the real data set mirror what other researchers have found for long-term forecasts from short-term data. To improve forecasting accuracy, a time-series clustering method is proposed. The method exploits similarity between time series through Dynamic Time Warping and trains separate forecasting models per cluster. Clusters are assigned with high accuracy using a Random Forest classifier, and certainty about a time series' long-term range is obtained from historical tests and a Markov Chain. The proposed method shows superior results compared with the existing models and can be used to obtain long-term forecasts for A/B tests.
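The clustering-then-forecasting idea can be sketched with the tslearn library, assuming its TimeSeriesKMeans with a DTW metric; the per-cluster mean forecast below is a simple stand-in for the per-cluster models trained in the thesis.

```python
# Cluster short-term prefixes with DTW k-means, then forecast the long-term
# tail of each cluster by averaging its members' tails (illustrative only).
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

rng = np.random.default_rng(3)
n = 60
base = [np.linspace(0, 1, 21), np.sin(np.linspace(0, 6, 21)), np.ones(21)]
series = np.array([base[i % 3] + 0.1 * rng.normal(size=21) for i in range(n)])

km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0)
labels = km.fit_predict(series[:, :14])  # cluster on the short-term prefix only

for c in range(3):
    tail = series[labels == c, 14:].mean(axis=0)  # average long-term behaviour
    print(f"cluster {c}: long-term forecast {np.round(tail, 2)}")
```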
APA, Harvard, Vancouver, ISO, and other styles
37

Soheily-Khah, Saeid. "Generalized k-means-based clustering for temporal data under time warp." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM064/document.

Full text
Abstract:
Temporal alignment of multiple time series is an important unresolved problem in many scientific disciplines. Major challenges for accurate temporal alignment include determining and modeling the common and differential characteristics of classes of time series. This thesis is motivated by recent work on extending Dynamic Time Warping (DTW) to align multiple time series in applications including speech recognition, curve matching, micro-array data analysis, temporal segmentation and human motion analysis. These DTW-based works, however, suffer from several limitations: 1) they address the problem of aligning two time series regardless of the remaining ones, 2) they involve the features of the multiple time series uniformly, and 3) the time series are aligned globally, over all observations. The aim of this thesis is to explore a generalized dynamic time warping for time series clustering. The work first addresses the problem of prototype extraction, then the alignment of multiple, multidimensional time series.
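At the heart of any k-means under time warp is the DTW recurrence; a minimal dynamic-programming implementation in plain numpy follows (no window constraint, squared-distance local cost), purely for illustration. Computing the cluster centroid under time warp then requires a barycenter method such as DTW barycenter averaging, which is not shown here.

```python
# Minimal DTW distance between two series of possibly different lengths.
import numpy as np

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # extend the cheapest of match, insertion, deletion
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(np.sqrt(cost[n, m]))

a = np.sin(np.linspace(0, 2 * np.pi, 50))
b = np.sin(np.linspace(0, 2 * np.pi, 70) + 0.4)  # warped, phase-shifted copy
print(dtw_distance(a, b))  # small despite different lengths and phase
```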
APA, Harvard, Vancouver, ISO, and other styles
38

Costa, Fausto Guzzo da. "Employing nonlinear time series analysis tools with stable clustering algorithms for detecting concept drift on data streams." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-13112017-105506/.

Full text
Abstract:
Several industrial, scientific and commercial processes produce open-ended sequences of observations referred to as data streams. We can understand the phenomena responsible for such streams by analyzing the data in terms of their inherent recurrences and behavior changes. Recurrences support the inference of more stable models, although such models are invalidated by behavior changes. External influences, such as new investments and market policies impacting stocks or human intervention in the climate, are regarded as the main agents acting on the underlying phenomena to produce such modifications over time. In the context of Machine Learning, a vast research branch investigates the detection of such behavior changes, also referred to as concept drifts. By detecting drifts, one can indicate the best moments to update models, thereby improving prediction results as well as the understanding and, eventually, the control of other influences governing the data stream. There are two main concept drift detection paradigms: the first based on supervised, and the second on unsupervised, learning algorithms. The former faces great difficulty because labeling is infeasible when streams are produced at high frequency and in large volumes; the latter lacks the theoretical foundations to provide detection guarantees. In addition, both paradigms fail to adequately represent temporal dependencies among data observations. In this context, we introduce a novel approach to detect concept drifts by tackling two deficiencies of both paradigms: i) the instability involved in data modeling, and ii) the lack of time dependency representation. Our unsupervised approach is motivated by Carlsson and Memoli's theoretical framework, which ensures a stability property for hierarchical clustering algorithms with respect to data permutation. To take full advantage of this framework, we employed Takens' embedding theorem to make data statistically independent after mapping them to phase spaces. Independent data were then grouped using the Permutation-Invariant Single-Linkage clustering algorithm (PISL), an adapted version of the agglomerative Single-Linkage algorithm that respects the stability property proposed by Carlsson and Memoli. Our algorithm outputs dendrograms (seen as data models), which are proven to be equivalent to ultrametric spaces, so concept drifts can be detected by comparing consecutive ultrametric spaces using the Gromov-Hausdorff (GH) distance. As a result, model divergences are indeed associated with data changes. We performed two main experiments to compare our approach to others from the literature, one considering abrupt and the other gradual changes. The results confirm that our approach is capable of detecting concept drifts, both abrupt and gradual, and that it is particularly suited to complicated scenarios. The main contributions of this thesis are: i) the use of Takens' embedding theorem as a tool to provide statistical independence to data streams; ii) the implementation of PISL in conjunction with the GH distance (called PISLGH); iii) a comparison of detection algorithms in different scenarios; and, finally, iv) an R package (called streamChaos) that provides tools for processing nonlinear data streams as well as algorithms to detect concept drifts.
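The first step of the proposed pipeline, the phase-space mapping, can be illustrated with a small delay-embedding function in numpy; the embedding dimension and delay below are illustrative choices, not the thesis's settings.

```python
# Takens-style delay embedding: map a scalar stream into delay vectors.
import numpy as np

def delay_embedding(x: np.ndarray, dim: int = 3, tau: int = 5) -> np.ndarray:
    """Return the (len(x) - (dim-1)*tau) x dim matrix of delay vectors."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

t = np.linspace(0, 50, 2_000)
x = np.sin(t) + 0.05 * np.random.default_rng(4).normal(size=t.size)
points = delay_embedding(x, dim=3, tau=20)
print(points.shape)  # each row is one point in the reconstructed phase space
# The embedded points can then be clustered (e.g. by single linkage) and the
# resulting dendrograms compared across windows via the Gromov-Hausdorff distance.
```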
APA, Harvard, Vancouver, ISO, and other styles
39

Foster, Eric D. "State space time series clustering using discrepancies based on the Kullback-Leibler information and the Mahalanobis distance." Diss., University of Iowa, 2012. https://ir.uiowa.edu/etd/3455.

Full text
Abstract:
In this thesis, we consider the clustering of time series data; specifically, time series that can be modeled in the state space framework. Of primary focus is the pairwise discrepancy between two state space time series. The state space model can be formulated in terms of two equations: the state equation, based on a latent process, and the observation equation. Because the unobserved state process is often of interest, we develop discrepancy measures based on the estimated version of the state process. We compare these measures to discrepancies based on the observed data. In all, seven novel discrepancies are formulated. First, discrepancies derived from Kullback-Leibler (KL) information and Mahalanobis distance (MD) measures are proposed based on the observed data. Next, KL information and MD discrepancies are formulated based on the composite marginal contributions of the smoothed estimates of the unobserved state process. Furthermore, an MD is created based on the joint contributions of the collection of smoothed estimates of the unobserved state process. The cross trajectory distance, a discrepancy heavily influenced by both observed and smoothed data, is proposed as well as a Euclidean distance based on the smoothed state estimates. The performance of these seven novel discrepancies is compared to the often used Euclidean distance based on the observed data, as well as a KL information discrepancy based on the joint contributions of the collection of smoothed state estimates (Bengtsson and Cavanaugh, 2008). We find that those discrepancy measures based on the smoothed estimates of the unobserved state process outperform those discrepancy measures based on the observed data. The best performance was achieved by the discrepancies founded upon the joint contributions of the collection of unobserved states, followed by the discrepancies derived from the marginal contributions. We observed a non-trivial degradation in clustering performance when estimating the parameters of the state space model. To improve estimation, we propose an iterative estimation and clustering routine based on the notion of finding a series' most similar counterparts, pooling them, and estimating a new set of parameters. Under ideal circumstances, we show that the iterative estimation and clustering algorithm can potentially achieve results that approach those obtained in settings where parameters are known. In practice, the algorithm often improves the performance of the model-based clustering measures. We apply our methods to two examples. The first application pertains to the clustering of time course genetic data. We use data from Cho et al. (1998) where a time course experiment of yeast gene expression was performed in order to study the yeast mitotic cell cycle. We attempt to discover the phase to which 219 genes belong. The second application seeks to answer whether or not influenza and pneumonia mortality can be explained geographically. Data from a collection of cities across the U.S. are acquired from the Morbidity and Mortality Weekly Report (MMWR). We cluster the MMWR data without geographic constraints, and compare the results to clusters defined by MMWR geographic regions. We find that influenza and pneumonia mortality cannot be explained by geography.
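The two building blocks behind these discrepancies can be written down directly; the numpy sketch below computes the KL divergence between two multivariate Gaussians and a Mahalanobis distance. How the thesis assembles these from smoothed state estimates is not reproduced here.

```python
# KL divergence between Gaussians and Mahalanobis distance as pairwise
# discrepancy building blocks; the example parameters are illustrative.
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """KL( N(mu0,S0) || N(mu1,S1) ) for d-dimensional Gaussians."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def mahalanobis(x, mu, S):
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))

mu_a, S_a = np.zeros(2), np.eye(2)
mu_b, S_b = np.array([1.0, 0.5]), np.array([[1.0, 0.3], [0.3, 2.0]])
# A symmetrised KL is a natural pairwise discrepancy between two series' models
print(kl_gaussian(mu_a, S_a, mu_b, S_b) + kl_gaussian(mu_b, S_b, mu_a, S_a))
print(mahalanobis(np.array([2.0, -1.0]), mu_a, S_a))
```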
APA, Harvard, Vancouver, ISO, and other styles
40

Tino, Peter, Christian Schittenkopf, and Georg Dorffner. "Temporal pattern recognition in noisy non-stationary time series based on quantization into symbolic streams. Lessons learned from financial volatility trading." SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 2000. http://epub.wu.ac.at/1680/1/document.pdf.

Full text
Abstract:
In this paper we investigate the potential of analysing noisy non-stationary time series by quantizing them into streams of discrete symbols and applying finite-memory symbolic predictors. The main argument is that careful quantization can reduce the noise in the time series, making model estimation more amenable given the limited number of samples that can be drawn due to the non-stationarity of the series. As the main application area we study the use of such analysis in a realistic setting involving financial forecasting and trading. In particular, using historical data, we simulate the daily trading of straddles on the financial indexes DAX and FTSE 100, based on predictions of the daily volatility differences in the underlying indexes. We propose a parametric, data-driven quantization scheme which transforms temporal patterns in the series of daily volatility changes into grammatical and statistical patterns in the corresponding symbolic streams. As symbolic predictors operating on the quantized streams we use classical fixed-order Markov models, variable memory length Markov models and a novel variation of fractal-based predictors introduced in its original form in (Tino, 2000b); the fractal-based predictors are designed to use deep memory efficiently. We compare the symbolic models with continuous techniques such as time-delay neural networks with continuous and categorical outputs, and GARCH models. Our experiments strongly suggest that the robust information reduction achieved by quantizing the real-valued time series is highly beneficial. To deal with non-stationarity in financial daily time series, we propose two techniques that combine "sophisticated" models fitted on the training data with a fixed set of simple-minded symbolic predictors that do not use older (and potentially misleading) data in the training set. Experimental results show that by quantizing the volatility differences and then using symbolic predictive models, market makers can generate a statistically significant excess profit. With respect to our prediction and trading techniques, however, the option market on the DAX does seem to be efficient for traders and non-members of the stock exchange, while there is potential for traders to make an excess profit on the FTSE 100. We also mention some interesting observations regarding the memory structure in the studied series of daily volatility differences. (author's abstract)
Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
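A minimal sketch of the quantize-then-predict idea in Python: quantile cut points map real-valued volatility differences to a small alphabet, and a fixed-order Markov model predicts the next symbol. The alphabet size and model order below are illustrative, not the paper's tuned choices.

```python
# Quantize a real-valued series into symbols, then fit a fixed-order Markov
# predictor on the symbol stream. The Gaussian input is a stand-in for the
# series of daily volatility differences used in the paper.
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(5)
diffs = rng.normal(size=1_000)

cuts = np.quantile(diffs, [1 / 3, 2 / 3])  # data-driven cut points
symbols = np.digitize(diffs, cuts)         # 0 = down, 1 = flat, 2 = up

order = 2
counts = defaultdict(Counter)
for i in range(order, len(symbols)):
    counts[tuple(symbols[i - order:i])][symbols[i]] += 1

context = tuple(symbols[-order:])
prediction = counts[context].most_common(1)[0][0]  # most likely next symbol
print(f"context {context} -> predicted symbol {prediction}")
```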
APA, Harvard, Vancouver, ISO, and other styles
41

Darwish, Amena. "Optimized material flow using unsupervised time series clustering : An experimental study on the just in time supermarket for Volvo powertrain production Skövde." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17530.

Full text
Abstract:
Machine learning has achieved remarkable performance in many domains and now promises to solve manufacturing problems as well, as part of an ongoing trend of applying machine learning in industrial settings. Treating material order demand in manufacturing as time-series sequences makes unsupervised time-series clustering applicable. This study aims to evaluate different time-series clustering approaches, algorithms, and distance measures on material flow data. Three approaches are evaluated: statistical clustering, raw-based and shape-based clustering, and finally a feature-based approach. The objective is to categorize the materials in the supermarket (the intermediate storage area where materials are held before product assembly) into three different flows according to their time-series properties. The experiments show that the feature-based approach performs best on this data. A feature filter is applied to keep the relevant features, those that capture the characteristics of the data relevant for predicting the output. In conclusion, the data type and structure, the goal of the clustering task, and the application domain all have to be considered when choosing a suitable clustering approach.
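A hedged sketch of the feature-based approach follows (assuming scikit-learn): each demand series is summarised by a few statistical features, which are scaled and clustered into three flows with k-means; the feature list and the synthetic demand data are illustrative.

```python
# Feature-based time-series clustering: summarise each series, then cluster.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def features(ts: np.ndarray) -> np.ndarray:
    trend = np.polyfit(np.arange(len(ts)), ts, 1)[0]   # linear trend slope
    acf1 = np.corrcoef(ts[:-1], ts[1:])[0, 1]          # lag-1 autocorrelation
    return np.array([ts.mean(), ts.std(), trend, acf1])

rng = np.random.default_rng(6)
demand = [rng.poisson(lam, size=52).astype(float) + np.arange(52) * slope
          for lam, slope in [(5, 0.0), (20, 0.3), (50, -0.2)] for _ in range(10)]

X = StandardScaler().fit_transform(np.array([features(d) for d in demand]))
flows = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(flows)  # three material flows recovered from the feature space
```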
APA, Harvard, Vancouver, ISO, and other styles
42

Salmon, Brian Paxton. "Improved hyper-temporal feature extraction methods for land cover change detection in satellite time series." Thesis, University of Pretoria, 2012. http://hdl.handle.net/2263/28199.

Full text
Abstract:
The growth in global population inevitably increases the consumption of natural resources. The need to provide basic services to these growing communities leads to an increase in anthropogenic changes to the natural environment. The resulting transformation of vegetation cover (e.g. deforestation, agricultural expansion, urbanisation) has significant impacts on hydrology, biodiversity, ecosystems and climate. Human settlement expansion is the most common driver of land cover change in South Africa, and is currently mapped on an irregular, ad hoc basis using visual interpretation of aerial photographs or satellite images. This thesis proposes several methods for detecting newly formed human settlements using hyper-temporal, multi-spectral, medium spatial resolution MODIS land surface reflectance satellite imagery. The hyper-temporal images are used to extract time series, which are analysed in an automated fashion using machine learning methods. A post-classification change detection framework was developed to analyse the time series using several feature extraction methods and classifiers. Two novel hyper-temporal feature extraction methods are proposed to characterise the seasonal pattern in the time series. The first extracts seasonal Fourier features that exploit the differences in temporal spectra inherent to land cover classes. The second extracts state-space vectors derived using an extended Kalman filter, which is optimised with a novel criterion that exploits the information inherent in the spatio-temporal domain. The post-classification change detection framework was evaluated with different classifiers; both supervised and unsupervised methods were explored. A change detection accuracy above 85% with a false alarm rate below 10% was attained. The best performing methods were then applied at provincial scale in the Gauteng and Limpopo provinces to produce regional change maps indicating settlement expansion.
Thesis (PhD(Eng))--University of Pretoria, 2012.
Electrical, Electronic and Computer Engineering
unrestricted
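The seasonal Fourier feature idea can be illustrated by fitting a few annual harmonics to a pixel's reflectance time series by least squares and using the coefficients as features; the sampling rate and number of harmonics below are assumptions, not the thesis's configuration.

```python
# Seasonal Fourier features: least-squares fit of annual harmonics; the
# coefficients summarise the seasonal pattern of a land cover class.
import numpy as np

def fourier_features(ts: np.ndarray, period: float, n_harmonics: int = 2) -> np.ndarray:
    t = np.arange(len(ts))
    cols = [np.ones(len(ts))]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * k * t / period), np.sin(2 * np.pi * k * t / period)]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return coef  # mean level plus amplitude/phase information per harmonic

rng = np.random.default_rng(7)
t = np.arange(45 * 8) / 45.0  # 8 years of composites, assumed 45 samples/year
vegetation = 0.3 + 0.2 * np.sin(2 * np.pi * t) + 0.02 * rng.normal(size=t.size)
print(np.round(fourier_features(vegetation, period=45), 3))
# A change to settlement flattens the seasonal harmonics, separating the classes.
```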
APA, Harvard, Vancouver, ISO, and other styles
43

Wolstenholme, Robert. "Clustering time series data by analysing graphical models of connectivity and the application to diagnosis of brain disorders." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/55873.

Full text
Abstract:
In this thesis we investigate clustering and classification techniques applied to time series data from multivariate stochastic processes. In particular, we focus on extracting features in the form of graphical models of conditional dependence between the process components. The motivation is to apply these techniques to brain EEG data measured from multiple patients and to investigate whether they can be used in areas such as medical diagnosis. We consider both the case where the graphical model is estimated from time series recorded on the scalp and the case where it is estimated from source signals within the brain. In the first case, we use a multiple hypothesis testing approach to build the graphical models and a learning algorithm based on random forests to find patterns within multiple graphical models. In the second case, we use independent component analysis (ICA) to extract the source time series and estimate the conditional dependence graphs using partial mutual information. Notably, due to the indeterminacy issues associated with ICA, the conditional dependence graphs are in this case only known up to some unknown permutation of the nodes. To solve this, we use novel methods based on an extension of graph matching to multiple inputs, leading to a new clustering algorithm. Finally, we show how this algorithm can be combined with further information obtained during the ICA phase, contained in the columns of the unmixing matrix, to create a more powerful method.
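As a generic sketch of estimating a conditional dependence graph from multivariate data (the thesis itself uses multiple hypothesis testing and partial mutual information), the numpy example below derives partial correlations from the precision matrix and thresholds them into an adjacency matrix.

```python
# Conditional dependence graph via partial correlations: a chain 0 -> 1 -> 2
# should show no direct edge between 0 and 2 once 1 is conditioned on.
import numpy as np

rng = np.random.default_rng(8)
n, p = 2_000, 4
z = rng.normal(size=(n, p))
z[:, 1] += 0.8 * z[:, 0]   # 0 -> 1
z[:, 2] += 0.8 * z[:, 1]   # 1 -> 2, so 0 and 2 are linked only through 1

prec = np.linalg.inv(np.cov(z, rowvar=False))
d = np.sqrt(np.diag(prec))
partial_corr = -prec / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

adjacency = (np.abs(partial_corr) > 0.1) & ~np.eye(p, dtype=bool)
print(np.round(partial_corr, 2))
print(adjacency.astype(int))  # edge 0-2 absent: conditionally independent
```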
APA, Harvard, Vancouver, ISO, and other styles
44

Groth, Gerson Eduardo. "Attribute field K-means : clustering trajectories with attribute by fitting multiple fields." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2016. http://hdl.handle.net/10183/150038.

Full text
Abstract:
The amount of high-dimensional trajectory data and its increasing complexity impose a challenge for visualizing and analysing this information. Trajectory visualization must deal with changes in both the space and time dimensions, but the attributes of each trajectory may provide insights about its behavior and important aspects, so they should not be neglected. In this work, we tackle this problem by interpreting multivariate time series as attribute-rich trajectories in a configuration space that encodes an explicit relationship among the time series variables. We propose a novel trajectory-clustering technique called Attribute Field k-means (AFKM). It uses a dynamic configuration space to generate clusters based on attributes and parameters set by the user. Furthermore, by incorporating a sketching-based interface, our approach is capable of finding clusters that approximate the input sketches. In addition, we developed a prototype for exploring the trajectories and clusters generated by AFKM in an interactive manner. Our results on synthetic and real time series datasets demonstrate the efficiency and visualization power of our approach.
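AFKM itself is not reproduced here; the Python sketch below only illustrates the general idea of clustering trajectories in a configuration space that mixes spatial coordinates with attribute values, with a hypothetical user weight alpha controlling the attributes' influence.

```python
# Generic attribute-aware trajectory clustering (not AFKM): concatenate the
# spatial path with weighted attribute values and run plain k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
n_traj, length = 30, 20
paths = rng.normal(size=(n_traj, length, 2)).cumsum(axis=1)  # 2-D random walks
attr = np.concatenate([np.full((15, length), 0.0), np.full((15, length), 5.0)])
attr += 0.2 * rng.normal(size=attr.shape)  # one attribute value per time step

alpha = 2.0  # hypothetical user-set weight for the attribute dimension
config = np.concatenate([paths.reshape(n_traj, -1),
                         alpha * attr.reshape(n_traj, -1)], axis=1)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(config)
print(labels)  # trajectories split according to their attribute profile
```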
APA, Harvard, Vancouver, ISO, and other styles
45

Huo, Shiyin. "Detecting Self-Correlation of Nonlinear, Lognormal, Time-Series Data via DBSCAN Clustering Method, Using Stock Price Data as Example." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1321989426.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Moser, Uwe Dominik [Verfasser], and Dieter [Akademischer Betreuer] Schramm. "Multivariate Time Series Clustering and Classification for Objective Assessment of Automated Driving Functions / Uwe Dominik Moser ; Betreuer: Dieter Schramm." Duisburg, 2020. http://d-nb.info/1216038880/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Zhakiya, Elezhan. "Unsupervised machine learning and k-Means clustering as a way of discovering anomalous events In continuous seismic time series." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117323.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Earth, Atmospheric, and Planetary Sciences, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 51-52).
Unsupervised k-Means clustering was implemented as a method for identifying anomalies in seismic time series. A sliding window approach was used to generate subsequences from the overall waveform. Dynamic Time Warping (DTW) was used to compare seismic subsequences, and DTW barycenter averaging (DBA) was used to average multiple subsequences within a group of similar shapes. Clustering is able to discover anomalously shaped parts of a seismic time series in a completely unsupervised fashion, without requiring anyone to input the actual times of events, predetermined examples of events, or any other parameters about the signal.
by Elezhan Zhakiya.
S.M.
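A hedged sketch of the pipeline using the tslearn library: sliding-window subsequences, DTW k-means (whose barycenters are computed by DBA), and anomaly flagging by DTW distance to the assigned barycenter. The window length and threshold below are illustrative choices, not the thesis's settings.

```python
# Sliding windows -> DTW k-means -> flag windows far from their barycenter.
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.metrics import dtw

rng = np.random.default_rng(10)
wave = np.sin(np.linspace(0, 120 * np.pi, 12_000)) + 0.05 * rng.normal(size=12_000)
wave[6_000:6_100] += np.hanning(100) * 3.0  # injected anomalous "event"

win = 200
subs = np.stack([wave[i:i + win] for i in range(0, len(wave) - win + 1, win // 2)])

km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0).fit(subs)
labels = km.predict(subs)
dists = np.array([dtw(s, km.cluster_centers_[l].ravel())
                  for s, l in zip(subs, labels)])
threshold = dists.mean() + 3 * dists.std()  # illustrative anomaly threshold
print("anomalous windows:", np.where(dists > threshold)[0])
```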
APA, Harvard, Vancouver, ISO, and other styles
48

Sävhammar, Simon. "Uniform interval normalization : Data representation of sparse and noisy data sets for machine learning." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19194.

Full text
Abstract:
The uniform interval normalization technique is proposed as an approach to handle sparse and noisy data. The technique is evaluated by transforming and normalizing the MoodMapper and Safebase data sets, and the predictive capabilities are compared by forecasting the data sets with an LSTM model. The results are compared to both the commonly used MinMax normalization technique and MinMax normalization with a time2vec layer. Uniform interval normalization was found to perform better on both the sparse MoodMapper data set and the denser Safebase data set. Future work consists of studying the performance of uniform interval normalization on other data sets and with other machine learning models.
APA, Harvard, Vancouver, ISO, and other styles
49

Eriksson, Therése, and Abdelnaeim Mohamed Mahmoud. "Waveform clustering - Grouping similar power system events." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44147.

Full text
Abstract:
Over the last decade, data has become a highly valuable resource. Electrical power grids deal with large quantities of data, continuously collected for analytical purposes. Anomalies that occur within this data are important to identify, since they can cause non-optimal performance within substations or, in worse cases, damage to the substations themselves. However, with datasets on the order of millions of records, it is hard or even impossible to gain a reasonable overview of the data manually. When collecting data from electrical power grids, predefined triggering criteria are often used to indicate that an event has occurred within the specific system, which makes it difficult to search for events that are unknown to the operator of the deployed acquisition system. Clustering, an unsupervised machine learning method, can be used for fault prediction in systems generating large amounts of unlabeled multivariate time-series data, and can group data more efficiently and without the bias of a human operator. A large number of clustering techniques exist, as do methods for extracting information from the data itself, and identifying suitable ones was of utmost importance. This thesis presents a study of the methods involved in creating such a clustering system for this specific type of data. The objective was to identify methods that can find the underlying structures of the data and cluster the data based on them. The signals were split into multiple frequency sub-bands, from which features could be extracted and evaluated. Using suitable combinations of features, the data was clustered with two different clustering algorithms, CLARA and CLARANS, and evaluated with established quality analysis methods. The results indicate that CLARA performed best overall on all the tested feature sets. The formed clusters hold valuable information, such as indications of unknown events within the system, and if similar events are clustered together this can further assist a human operator in investigating the importance of the clusters themselves. A further conclusion is that research into more optimised clustering algorithms is necessary before expansion to larger datasets can be considered.
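The sub-band feature extraction step can be illustrated in a few lines of numpy: split each waveform's spectrum into frequency bands and use per-band energies as clustering features. The band edges and sampling rate below are assumptions, not values from the thesis.

```python
# Per-band spectral energies as waveform features for clustering.
import numpy as np

def band_energies(signal: np.ndarray, fs: float, edges) -> np.ndarray:
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

fs = 1_000.0
t = np.arange(0, 1, 1 / fs)
wave = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 180 * t)
feats = band_energies(wave, fs, edges=[0, 100, 200, 300, 500])
print(np.round(feats, 1))  # energy concentrates in the 0-100 and 100-200 Hz bands
# Vectors like these can then be fed to CLARA / CLARANS k-medoids clustering.
```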
APA, Harvard, Vancouver, ISO, and other styles
50

Arzoky, Mahir. "Munch : an efficient modularisation strategy on sequential source code check-ins." Thesis, Brunel University, 2015. http://bura.brunel.ac.uk/handle/2438/13808.

Full text
Abstract:
As developers create increasingly sophisticated applications, software systems grow in both complexity and size. When source code is easy to understand, the system is more maintainable, which leads to reduced costs; better structured code also allows new requirements to be introduced more efficiently and with fewer issues. However, the maintenance and evolution of systems can be frustrating: it is difficult for developers to keep a fixed understanding of the system's structure, as that structure changes during maintenance. Software module clustering is the process of automatically partitioning the structure of a system using low-level dependencies in the source code in order to improve its structure. There have been a large number of studies using the Search Based Software Engineering approach to solve the software module clustering problem. A software clustering tool, Munch, was developed and employed in this study to modularise a unique dataset of sequential source code versions. The tool is based on Search Based Software Engineering techniques and consists of a number of components, including the clustering algorithm and a number of fitness functions and metrics used for measuring and assessing the quality of the clustering decompositions. The tool provides a framework for evaluating a number of clustering techniques and strategies. The dataset used in this study was provided by Quantel Limited and comes from processed source code of a product line architecture library that has delivered numerous products. The dataset analysed is the persistence engine used by all products, comprising over 0.5 million lines of C++ across 503 software versions. This study investigates whether search-based software clustering approaches can help stakeholders understand how the inter-class dependencies of a software system change over time. It performs efficient modularisation on a time series of source code relationships, taking advantage of the fact that the nearer two versions of the source code are in time, the more similar their modularisations are expected to be. The study introduces a seeding concept and highlights how it can significantly reduce the runtime of the modularisation: the dataset is not treated as a sequence of separate modularisation problems; instead, the result of modularising the previous graph is used to give the next graph a head start. Code structure and sequence are thus used to obtain more effective modularisation and to reduce the runtime of the process. Numerous experiments were conducted on the dataset to evaluate the efficiency of the modularisation, and the results present strong evidence in support of the seeding strategy. To reduce the runtime further, statistical techniques for controlling the number of iterations of the modularisation, based on the similarities between time-adjacent graphs, are introduced. The convergence of the heuristic search technique is examined, and a number of stopping criteria are estimated and evaluated. Extensive experiments were conducted on the time-series dataset, and evidence is presented to support the proposed techniques. In addition, this thesis investigated and evaluated the starting clustering arrangement of Munch's clustering algorithm, and introduced and experimented with a number of starting clustering arrangements, including a uniformly random clustering arrangement strategy.
Moreover, this study investigates whether the dataset used for the modularisation resembles a random graph, by computing the probabilities of observing certain connectivity. The thesis demonstrates that modularisation is not possible with data that resembles random graphs, and that the dataset in use does not resemble a random graph except in small sections where there were large maintenance activities. Furthermore, it shows how the random graph metric can be used as a tool to indicate areas of interest in the dataset without the need to run the modularisation. Last but not least, a huge amount of software code has been and will be developed, yet very little has been learnt from how that code evolves over time. The intention of this study is also to help developers and stakeholders model the internal software structure, to aid in modelling development trends and biases, and to try to predict the occurrence of large changes and potential refactorings. Industrial feedback on the research was therefore obtained. This thesis presents work on the detection of refactoring activities and discusses possible applications of the findings in industrial settings.
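The seeding strategy can be illustrated with a toy hill-climbing modulariser in Python; the cohesion-minus-coupling score below is a simplified stand-in for Munch's fitness functions, and the graphs are synthetic.

```python
# Seeded modularisation: version t+1 is hill-climbed starting from version t's
# partition instead of a random one, so far fewer iterations are needed.
import random

def score(partition, edges):
    intra = sum(1 for a, b in edges if partition[a] == partition[b])
    return 2 * intra - len(edges)  # reward intra-module edges, punish coupling

def hill_climb(nodes, edges, n_modules, seed_partition=None, iters=20_000):
    part = dict(seed_partition) if seed_partition else \
        {v: random.randrange(n_modules) for v in nodes}
    best = score(part, edges)
    for _ in range(iters):
        v = random.choice(nodes)
        old, part[v] = part[v], random.randrange(n_modules)
        new = score(part, edges)
        if new >= best:
            best = new        # keep the move
        else:
            part[v] = old     # revert
    return part, best

random.seed(0)
nodes = list(range(12))
edges_v1 = [(i, j) for i in range(6) for j in range(i + 1, 6)] + \
           [(i, j) for i in range(6, 12) for j in range(i + 1, 12)] + [(0, 7)]
p1, s1 = hill_climb(nodes, edges_v1, n_modules=2)
edges_v2 = edges_v1 + [(1, 8)]  # the next check-in changes the graph only slightly
p2, s2 = hill_climb(nodes, edges_v2, n_modules=2, seed_partition=p1, iters=2_000)
print(s1, s2)  # the seeded run reaches a comparable score with a fraction of the work
```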
APA, Harvard, Vancouver, ISO, and other styles