Dissertations / Theses on the topic 'Noisy Time Series Clustering'
Consult the top 50 dissertations / theses for your research on the topic 'Noisy Time Series Clustering.'
Kim, Doo Young. "Statistical Modeling of Carbon Dioxide and Cluster Analysis of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, and Multi-Level Time Series Clustering." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6277.
Xiong, Yimin. "Time series clustering using ARMA models /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?COMP%202004%20XIONG.
Includes bibliographical references (leaves 49-55). Also available in electronic version. Access restricted to campus users.
Jarjour, Riad. "Clustering financial time series for volatility modeling." Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6439.
Torku, Thomas K. "Takens Theorem with Singular Spectrum Analysis Applied to Noisy Time Series." Digital Commons @ East Tennessee State University, 2016. https://dc.etsu.edu/etd/3013.
Li, Jing. "Clustering and forecasting for rain attenuation time series data." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219615.
Clustering is an unsupervised learning technique that groups similar objects into the same cluster, so that objects within a cluster are more similar to each other than to those in other clusters. Forecasting makes predictions based on past data, and effective artificial intelligence models for predicting data trends can help in making appropriate decisions. The data sets used in this thesis are signal attenuation time series from a microwave network. Microwave networks are communication systems for transmitting information between two fixed locations on Earth. They can support growing capacity demands in mobile networks and play an important role in next-generation wireless communication technology. However, their inherent vulnerability to random fluctuations such as rainfall causes significant network degradation. In this thesis, K-means, Fuzzy c-means, and a 2-state Hidden Markov Model are used to develop one-step and two-step rain attenuation data clustering models. The forecasting models are designed on the basis of the k-nearest neighbour method and implemented with linear regression to predict real-time attenuation, helping microwave transport networks mitigate the impact of rain, make the right decisions ahead of time, and improve overall performance.
Nunes, Neuza Filipa Martins. "Algorithms for time series clustering applied to biomedical signals." Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/5666.
The increasing number of biomedical systems and applications for understanding the human body creates a need for information extraction tools for biosignals. It is important to understand the changes in a biosignal’s morphology over time, as they often contain critical information on the condition of the subject or the status of the experiment. Tools that automatically analyze biosignals and extract relevant attributes from them, providing important information to the user, are of significant value in the biosignal processing field. The present dissertation introduces new algorithms for time series clustering, which separate and organize unlabeled data into groups of mutually similar signals. Signal processing algorithms were developed for the detection of a meanwave, which represents the signal’s morphology and behavior. The algorithm computes the meanwave by separating and averaging all cycles of a cyclic continuous signal. To increase the quality of the information given by the meanwave, a set of wave-alignment techniques was also developed and its relevance was evaluated on a real database. To evaluate the algorithm’s applicability to time series clustering, a distance metric based on the automatically computed meanwave was designed, and its measurements were given as input to a K-Means clustering algorithm. For that purpose, we collected a data series containing two different modes. The algorithm successfully separates the two modes in the collected data with 99.3% efficiency. The results of this clustering procedure were compared to a mechanism widely used in this area, which models the data and uses the distance between its cepstral coefficients to measure the similarity between time series. The algorithms were also validated in different study projects.
These projects show the variety of contexts in which our algorithms are highly applicable and offer suitable answers to the problems of exhaustive signal analysis and expert intervention. The algorithms are signal-independent and can therefore be applied to any type of signal, provided it is cyclic. The fact that this approach requires no prior information, together with its good preliminary performance, makes these algorithms powerful tools for biosignal analysis and classification.
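The meanwave idea described in the abstract above (separate the cycles of a cyclic signal, then average them) can be illustrated with a minimal sketch. It assumes a fixed, known cycle length, whereas the dissertation detects cycles automatically and applies wave alignment; the function names `meanwave` and `distance_to_meanwave` are hypothetical and not taken from the thesis.

```python
def meanwave(signal, cycle_len):
    """Average all complete cycles of a cyclic signal, sample by sample.

    Assumption: every cycle has the same, known length `cycle_len`.
    Trailing samples that do not fill a whole cycle are ignored.
    """
    n_cycles = len(signal) // cycle_len
    if n_cycles == 0:
        raise ValueError("signal is shorter than one cycle")
    cycles = [signal[i * cycle_len:(i + 1) * cycle_len] for i in range(n_cycles)]
    return [sum(c[k] for c in cycles) / n_cycles for k in range(cycle_len)]


def distance_to_meanwave(cycle, wave):
    """Euclidean distance between one cycle and the meanwave."""
    return sum((a - b) ** 2 for a, b in zip(cycle, wave)) ** 0.5


if __name__ == "__main__":
    wave = meanwave([0, 1, 0, 0, 3, 0], cycle_len=3)
    print(wave)
    print(distance_to_meanwave([0, 1, 0], wave))
```

Distances of this kind, one per cycle, are the sort of feature that could then be passed to a K-Means routine.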
Correia, Maria Inês Costa. "Cluster analysis of financial time series." Master's thesis, Instituto Superior de Economia e Gestão, 2020. http://hdl.handle.net/10400.5/21016.
This thesis applies the Signature method as a measure of similarity between two time series objects, using the order-2 properties of the Signature, and applies it to Asymmetric Spectral Clustering. The method is compared with a more traditional clustering approach in which similarity is measured using Dynamic Time Warping, which was developed to work with time series data. The intention is to treat the traditional approach as a benchmark and compare it to the Signature method in terms of computation time, performance, and applications. These methods are applied to a financial time series data set of Mutual Exchange Funds from Luxembourg. After the literature review, we introduce the Dynamic Time Warping method and the Signature method. We continue with an explanation of traditional clustering approaches, namely k-Means, and asymmetric clustering techniques, namely the k-Axes algorithm developed by Atev (2011). The last chapter is dedicated to practical research, in which the previous methods are applied to the data set. The results confirm that the Signature method does indeed have potential for machine learning and prediction, as suggested by Levin, Lyons, and Ni (2013).
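For readers unfamiliar with the Dynamic Time Warping benchmark mentioned above, the following is a minimal sketch of the textbook dynamic-programming formulation, using the absolute difference as the local cost. It is not the thesis's implementation, and `dtw_distance` is an illustrative name.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i samples of `a` with the first j samples of `b`.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


if __name__ == "__main__":
    # The second series is a time-stretched copy of the first.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the warping path may repeat samples, a stretched copy of a series is at distance zero from the original, which is the property that makes DTW attractive for series that are similar but misaligned in time.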
Nelson, Alex Tremain. "Nonlinear estimation and modeling of noisy time-series by dual Kalman filtering methods." Oregon Health & Science University, 2000. http://content.ohsu.edu/u?/etd,211.
Electrical and Computer Engineering
Numerous applications require either the estimation or prediction of a noisy time-series. Examples include speech enhancement, economic forecasting, and geophysical modeling. A noisy time-series can be described in terms of a probabilistic model, which accounts for both the deterministic and stochastic components of the dynamics. Such a model can be used with a Kalman filter (or extended Kalman filter) to estimate and predict the time-series from noisy measurements. When the model is unknown, it must be estimated as well; dual estimation refers to the problem of estimating both the time-series, and its underlying probabilistic model, from noisy data. The majority of dual estimation techniques in the literature are for signals described by linear models, and many are restricted to off-line application domains. Using a probabilistic approach to dual estimation, this work unifies many of the approaches in the literature within a common theoretical and algorithmic framework, and extends their capabilities to include sequential dual estimation of both linear and nonlinear signals. The dual Kalman filtering method is developed as a method for minimizing a variety of dual estimation cost functions, and is shown to be an effective general method for estimating the signal, model parameters, and noise variances in both on-line and off-line environments.
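As a point of reference for the state-estimation half of the problem described above, here is a minimal sketch of a scalar linear Kalman filter. It assumes a known random-walk model with known noise variances, which is precisely the situation dual estimation does not assume, so it illustrates only the building block and not the dual Kalman filtering method itself. The function name and default values are illustrative.

```python
def kalman_filter_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for the assumed model
        x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
        y_t = x_t + v_t,      v_t ~ N(0, meas_var)
    Returns the filtered state estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for y in measurements:
        # Predict: the state estimate is unchanged, its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    est = kalman_filter_1d([1.0] * 50, process_var=1e-4, meas_var=0.1)
    print(est[0], est[-1])
```

In a dual estimation setting, quantities treated as known here (the model and the two variances) would themselves be estimated from the noisy data.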
Wang, Chiying. "Contributions to Collective Dynamical Clustering-Modeling of Discrete Time Series." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/198.
Nordlinder, Magnus. "Clustering of Financial Account Time Series Using Self Organizing Maps." Thesis, KTH, Matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291612.
The aim of this thesis is to cluster time series of financial accounts by extracting the characteristics of the time series. For this, two methods are used to reduce the dimensionality of the time series: Kohonen Self-Organizing Maps and principal component analysis. The result is then used to cluster the financial services a customer uses, in order to analyse whether there is a selection of services that is more or less common among different time series clusters. The result can be used to analyse the dynamics between account balance and the customer's financial services, and whether a service is more common in a particular time series cluster.
Zhang, Guilin. "Clustering Algorithms for Time Series Gene Expression in Microarray Data." Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc177269/.
Caiado, Aníbal Jorge da Costa Cristóvão. "Distance-based methods for classification and clustering of time series." Doctoral thesis, Instituto Superior de Economia e Gestão, 2006. http://hdl.handle.net/10400.5/3531.
Arora, Rahul. "Operational Modal Parameter Estimation from Short Time-Data Series." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1397467142.
Lei, Jiahuan. "An extended BIRCH-based clustering algorithm for large time-series datasets." Thesis, Mittuniversitetet, Avdelningen för informations- och kommunikationssystem, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-29858.
Leverger, Colin. "Investigation of a framework for seasonal time series forecasting." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S033.
To deploy web applications, using web servers is paramount. If there are too few of them, application performance can quickly deteriorate. However, if there are too many, resources are wasted and costs increase. In this context, engineers use capacity planning tools to monitor the performance of the servers, to collect time series data, and to anticipate future needs. The need for reliable forecasts is clear. Data generated by the infrastructure often exhibit seasonality. The activity cycle followed by the infrastructure is determined by seasonal cycles (for example, the users’ daily rhythms). This thesis introduces a framework for seasonal time series forecasting. This framework is composed of two machine learning models (e.g. clustering and classification) and aims at producing reliable mid-term forecasts with a limited number of parameters. Three instantiations of the framework are presented: one baseline, one deterministic and one probabilistic. The baseline is composed of K-means clustering algorithms and Markov Models. The deterministic version is composed of several clustering algorithms (K-means, K-shape, GAK and MODL) and of several classifiers (naive Bayes, decision trees, random forests and logistic regression). The probabilistic version relies on coclustering to create probabilistic time series grids, which are used to describe the data in an unsupervised way. The performance of the various implementations is compared with several state-of-the-art models, including the autoregressive models ARIMA and SARIMA, Holt-Winters, and Prophet for the probabilistic paradigm. The results of the baseline are encouraging and confirm the interest of the proposed framework. Good results are observed for the deterministic implementation, and acceptable results for the probabilistic version. One Orange use case is studied, and the interest and limits of the methodology are discussed.
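The abstract above does not spell out how the baseline combines K-means with Markov models. One plausible reading, sketched here purely as an assumption, is that each season (for example each day) is first assigned a cluster label, and a first-order Markov chain over those labels then predicts the cluster of the next season. The functions `fit_transitions` and `predict_next` are hypothetical names, and the labels are taken as given rather than computed.

```python
from collections import Counter, defaultdict


def fit_transitions(labels):
    """Estimate first-order Markov transition probabilities from a
    sequence of cluster labels (one label per season, e.g. per day)."""
    counts = defaultdict(Counter)
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def predict_next(transitions, state):
    """Return the most probable next cluster label after `state`.

    Raises KeyError if `state` was never followed by anything in training.
    """
    row = transitions[state]
    return max(row, key=row.get)


if __name__ == "__main__":
    # Hypothetical cluster labels for six consecutive days.
    transitions = fit_transitions([0, 1, 0, 1, 0, 2])
    print(transitions)
    print(predict_next(transitions, 0))
```

Under this reading, the forecast for the next season would be the centroid (typical profile) of the predicted cluster.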
Zeng, Zhanggui. "Financial Time Series Analysis using Pattern Recognition Methods." University of Sydney, 2008. http://hdl.handle.net/2123/3558.
This thesis is based on research on financial time series analysis using pattern recognition methods. The first part of this research focuses on univariate time series analysis using different pattern recognition methods. First, probabilities of basic patterns are used to represent the features of a section of a time series. This feature can remove noise from the time series by statistical probability. It is shown experimentally that this feature is successful for time series with repeated patterns. Second, a multiscale Gaussian gravity is introduced to pattern clustering as a pattern relationship measurement that can describe the direction of the pattern relationship. By searching for the Gaussian-gravity-guided nearest neighbour of each pattern, this clustering method can easily determine the boundaries of the clusters. Third, a method is presented by which unsupervised pattern classification can be transformed into multiscale supervised pattern classification using multiscale supervisory time series or multiscale filtered time series. The second part of this research focuses on multivariate time series analysis using pattern recognition. A systematic method is proposed to find the independent variables of a group of share prices by time series clustering, principal component analysis, independent component analysis, and object recognition. The number of dependent variables is reduced and the multivariate time series analysis is simplified by time series clustering and principal component analysis. Independent component analysis aims to find the ideal independent variables of the group of shares. Object recognition is expected to recognize those independent variables which are similar to the independent components. This method provides a new clue to understanding the stock market and to modelling a large time series database.
Tang, Fan. "Structural time series clustering, modeling, and forecasting in the state-space framework." Diss., University of Iowa, 2015. https://ir.uiowa.edu/etd/6002.
Blakely, Logan. "Spectral Clustering for Electrical Phase Identification Using Advanced Metering Infrastructure Voltage Time Series." Thesis, Portland State University, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10980011.
The increasing demand for and prevalence of distributed energy resources (DER) such as solar power, electric vehicles, and energy storage present a unique set of challenges for integration into a legacy power grid, and accurate models of the low-voltage distribution systems are critical for accurate simulations of DER. Accurate labeling of the phase connections for each customer in a utility model is one area of grid topology that is known to have errors and has implications for the safety, efficiency, and hosting capacity of a distribution system. This research presents a methodology for the phase identification of customers solely using the advanced metering infrastructure (AMI) voltage time series. This thesis proposes to use Spectral Clustering, combined with a sliding window ensemble method for utilizing a long-term time series dataset that includes missing data, to group customers within a lateral by phase. These clustering phase predictions validate over 90% of the existing phase labels in the model and identify customers whose current phase labels are incorrect in this model. Within this dataset, this methodology produces consistent, high-quality results, verified by validating the clustering phase predictions against the underlying topology of the system, as well as selected examples verified using satellite and street view images publicly available in Google Earth. Further analysis of the Spectral Clustering predictions shows that they not only validate and improve the phase labels in the utility model, but also have potential for detecting other types of errors in the topology of the model, such as errors in the labeling of connections between customers and transformers, unlabeled residential solar power, unlabeled transformers, and customers with incomplete information in the model.
These results indicate excellent potential for further development of this methodology as a tool for validating and improving existing utility models of the low-voltage side of the distribution system.
Díaz, González Fernando. "Federated Learning for Time Series Forecasting Using LSTM Networks: Exploiting Similarities Through Clustering." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254665.
Federated Learning poses a statistical challenge when training on highly heterogeneous sequence data. For example, time series data in the telecom domain exhibits mixed variations and patterns over longer time intervals. These distinct distributions pose a challenge when a node is not only to contribute to the creation of a global model but also intends to apply this model to its local dataset. In this scenario, imposing a single global model meant to fit everyone may prove insufficient, even when using the most successful machine learning models for time series forecasting, Long Short-Term Memory (LSTM) networks, which have been shown to capture complex patterns and generalise well to new ones. In this work we show that by clustering the clients according to these patterns and selectively aggregating their updates into different global models, we can improve local performance at minimal cost, which we demonstrate through experiments with real time series data and a basic LSTM model.
Damle, Chaitanya. "Flood forecasting using time series data mining." [Tampa, Fla.] : University of South Florida, 2005. http://purl.fcla.edu/fcla/etd/SFE0001038.
Wang, Xing. "Time Dependent Kernel Density Estimation: A New Parameter Estimation Algorithm, Applications in Time Series Classification and Clustering." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6425.
Xu, Tianbing. "Nonparametric evolutionary clustering." Diss., Online access via UMI:, 2009.
Zlicar, Blaz. "Algorithms for noisy and nonstationary data : advances in financial time series forecasting and pattern detection with machine learning." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10043123/.
Abualhamayl, Abdullah Jameel Mr. "APPLY DATA CLUSTERING TO GENE EXPRESSION DATA." CSUSB ScholarWorks, 2015. https://scholarworks.lib.csusb.edu/etd/259.
Li, Lei. "Fast Algorithms for Mining Co-evolving Time Series." Research Showcase @ CMU, 2011. http://repository.cmu.edu/dissertations/112.
YILDIRIM, NURSEDA. "TIME SERIES MODELLING FOR WIND POWER PREDICTION AND CONTROL : CLUSTERING AND ASSOCIATION RULES OF DATA MINING FOR CFD AND TIME SERIES DATA OF POWER RAMPS." Thesis, Uppsala universitet, Institutionen för geovetenskaper, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-245304.
Lu, Zhengdong. "Constrained clustering and cognitive decline detection /." Full text open access at:, 2008. http://content.ohsu.edu/u?/etd,650.
Khy, Sophoin, Yoshiharu Ishikawa, and Hiroyuki Kitagawa. "A Query Language and Its Processing for Time-Series Document Clusters." Springer-Verlag, 2008. http://hdl.handle.net/2237/10689.
CASSIANO, KEILA MARA. "TIME SERIES ANALYSIS USING SINGULAR SPECTRUM ANALYSIS (SSA) AND BASED DENSITY CLUSTERING OF THE COMPONENTS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24787@1.
This thesis proposes using DBSCAN (Density Based Spatial Clustering of Applications with Noise) to separate the noise components of the eigentriples in the grouping stage of Singular Spectrum Analysis (SSA) of time series. DBSCAN is a modern clustering method (revised in 2013) that specializes in identifying noise through regions of lower density. Hierarchical clustering had been the latest innovation for noise separation in the SSA approach, as implemented in the R-SSA package. However, the literature repeatedly notes that hierarchical clustering is very sensitive to noise, is unable to separate it correctly, should not be used on clusters with varying densities, and does not work well when clustering time series with different trends. Density-based clustering methods, by contrast, are effective at separating noise from the data and are designed to work well on data of varying densities. This work shows that DBSCAN is more efficient than the other methods already used in this stage of SSA, since it allows a considerable reduction of noise and provides better forecasts. The result is supported by experimental evaluations on simulated stationary and non-stationary series. The proposed combination of methodologies was also applied successfully to forecasting a real wind speed series.
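To make the density-based idea above concrete, here is a minimal pure-Python sketch of the standard DBSCAN procedure, in which points that lie in low-density regions receive the label -1 (noise). The input points are hypothetical feature vectors; how the thesis derives features from SSA eigentriples is not described in the abstract and is not reproduced here.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: 0, 1, ... for
    clusters and -1 for noise. `points` is a list of equal-length tuples."""

    def neighbors(i):
        # Indices of all points within Euclidean distance eps of point i
        # (the point itself is included).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) ** 0.5 <= eps]

    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster    # former noise becomes a border point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels


if __name__ == "__main__":
    pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (20.0,)]
    print(dbscan(pts, eps=0.15, min_pts=2))
```

The isolated point at 20.0 has no neighbour within `eps`, so it is flagged as noise while the two dense groups form separate clusters.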
Makhlouk, Oumaïma. "Time series data analytics : clustering-based anomaly detection techniques for quality control in semiconductor manufacturing." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120248.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 109-110).
Optimizing their manufacturing systems and processes whilst ensuring a low production cost is important for Analog Devices, Inc. (ADI). Therefore, detecting anomalies on production lines and alerting on out-of-control processes is crucial. Although Statistical Process Control (SPC) methods have been implemented in the past and have proven to be efficient, the company seeks improvements using machine learning. The Machine Health Project is one of the data analytics-based projects under way at ADI to implement such improvements. Anomaly detection techniques can be effective in improving the quality control on semiconductor production lines. Sets of data collected from semiconductor manufacturing machines, such as a plasma etcher, can be analyzed to control the fabrication process and test the efficiency of machine learning algorithms. This thesis focuses on cluster analysis for outlier detection, and provides a univariate strategy to find potential anomalous behaviors in the data when a given parameter is known as relevant. If a more thorough analysis of the data is needed, a multivariate clustering analysis can also be computed. In addition, decomposition-based algorithms are presented. These rely on techniques such as the STL and SAX representations of time series, and provide a visual computation of time series discords. In this thesis, these methods are implemented, and their results are compared. Recommendations are provided as to how to best utilize the outputs of these outlier detection algorithms.
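The SAX representation mentioned above can be sketched in a few lines: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. This is the generic textbook transform, not the thesis's code; the function name `sax`, the default alphabet, and the choice to drop trailing samples that do not fill a segment are assumptions.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Symbolic Aggregate approXimation of a numeric series.

    Assumption: len(series) >= n_segments; samples beyond the last
    complete segment are ignored.
    """
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("constant series cannot be z-normalized")
    z = [(x - mu) / sigma for x in series]

    # PAA: mean of each equal-length segment.
    seg = len(z) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]

    # Breakpoints dividing N(0, 1) into len(alphabet) equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # The symbol index is the number of breakpoints below the segment mean.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))
```

Once a series is reduced to a short word, unusual subsequences (discords) can be searched for by comparing words rather than raw samples.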
by Oumaïma Makhlouk.
M. Eng. in Advanced Manufacturing and Design
Marti, Gautier. "Some contributions to the clustering of financial time series and applications to credit default swaps." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLX097/document.
In this thesis we first review the scattered literature about clustering financial time series. We then try to shed as much light as possible on the credit default swap market, a market relatively unknown to the general public except for its role in the contagion of bank failures during the global financial crisis of 2007-2008, while introducing the datasets that have been used in the empirical studies. Unlike the existing body of literature, which mostly offers descriptive studies, we aim at building models and large information systems based on clusters, which are seen as basic building blocks: these foundations must be stable. That is why the work undertaken and described in the following intends to ground the clustering methodologies more firmly. For that purpose, we discuss their consistency and propose alternative measures of similarity that can be plugged into the clustering methodologies. We study their impact on the clusters empirically. Results of the empirical studies can be explored at www.datagrapple.com
Hanna, Peter, and Erik Swartling. "Anomaly Detection in Time Series Data using Unsupervised Machine Learning Methods: A Clustering-Based Approach." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273630.
For many companies in the manufacturing industry, finding faults in products is a fundamental task in the production process. Since various machine learning methods have proven to offer useful techniques for finding faults in products, they are a popular choice among companies that want to further improve the production process. For some industries, fault detection is closely tied to anomaly detection in various measurements. The purpose of this thesis is to construct unsupervised machine learning models to identify anomalies in time series data. More specifically, the data consists of high-frequency measurements of pumps obtained via current and voltage measurements. The measurements consist of five different phases, namely the start-up phase, three load phases, and the shut-down phase. The machine learning methods are based on different clustering techniques, and the methods used are the DBSCAN and LOF algorithms. In addition, different dimensionality reduction techniques were applied, and after constructing five models, one for each phase, it can be concluded that the models succeeded in identifying anomalies in the given dataset.
Ferreira, Leonardo Nascimento. "Time series data mining using complex networks." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-01022018-144118/.
Time series are sets of data ordered in time. Because such data are ubiquitous, their study is of interest to many fields of science. Temporal data mining is a research area that aims to extract information from these time-related data. To do so, models are used to describe the series and search for patterns. One way to model time series is through complex networks. In this modelling, a mapping is made from the time domain to the topological domain, which makes it possible to evaluate temporal data using network techniques. In this thesis, we present solutions for time series data mining tasks using complex networks. The main goal was to evaluate the benefits of using network theory to extract information from temporal data. We focus on three mining tasks. (1) In the clustering task, each time series is represented by a vertex, and edges are created between series according to their similarity. Community detection algorithms can then be used to group similar series. The results show that this approach performs better than traditional clustering. (2) In the classification task, each labelled time series in a database is mapped to a visibility graph. Classification is performed by transforming an unlabelled time series into a visibility graph and comparing it with the labelled graphs using a distance function. The new label is the most frequent label among the k nearest graphs. (3) In the periodicity detection task, a time series is first transformed into a visibility graph. Local maxima in a time series are usually mapped to highly connected vertices that link two communities. The proposed method uses the community structure to detect periods in time series.
This method is robust to noisy data and requires no parameters. With the methods and results presented in this thesis, we conclude that complex network theory is beneficial for time series data mining. Moreover, this approach can provide better results than traditional methods and is a new way of extracting information from time series that can easily be extended to other tasks.
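The mapping from a time series to a visibility graph, used in tasks (2) and (3) above, can be sketched as follows. This assumes the natural visibility criterion, in which two samples are linked when the straight line between them passes strictly above every sample in between; the abstract does not state which visibility variant the thesis uses, and `visibility_graph` is an illustrative name.

```python
def visibility_graph(series):
    """Natural visibility graph of a time series.

    Vertices are sample indices; (i, j) with i < j is an edge when every
    intermediate sample lies strictly below the line joining
    (i, series[i]) and (j, series[j]).
    """
    edges = set()
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


if __name__ == "__main__":
    # The peak at index 1 blocks the view between indices 0 and 2.
    print(sorted(visibility_graph([1, 3, 1])))
    # A valley blocks nothing, so all three samples see each other.
    print(sorted(visibility_graph([3, 1, 3])))
```

Peaks see many other samples and therefore tend to become high-degree vertices, which is consistent with the observation in the abstract that local maxima map to highly connected vertices.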
Li, Chuhe. "A sliding window BIRCH algorithm with performance evaluations." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-32397.
Thielo, Marcelo Resende. "Análise e classificação de séries temporais não estacionárias utilizando métodos não-lineares." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2000. http://hdl.handle.net/10183/12661.
In this work we review some of the main methods available for nonlinear time series analysis of low-dimensional deterministic systems, with emphasis on the problem of unsupervised classification/clustering of this kind of data. Various dissimilarity measures are used together with heuristic search methods based on stochastic algorithms to organize segments of one (long) nonstationary time series into groups with common characteristics, trying to relate these groups to some known clinical property. The method is implemented with different dissimilarity measures, validated in an experiment with synthetic time series (generated by numerical simulations), and later applied to a real problem, the segmentation of sleep stages. The results look promising with respect to the applicability of the method to classifying sleep stages in electroencephalographic recordings.
Jaunzems, Davis. "Time-series long-term forcasting for A/B tests." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-205344.
The technological development of computing devices and communication tools has made it possible to store and process more information than ever before. For researchers it is a means of making more accurate scientific discoveries; for companies it is a way of better understanding their clients and products and gaining an edge over competitors. In industry, A/B testing is becoming an important and common way of obtaining insights that help to make data-driven decisions. An A/B test is a comparison of two or more versions to determine which performs better according to predetermined measurements. In combination with data mining and statistical analysis, these tests make it possible to answer important questions and help to move from the state of “we think” to “we know”. Nevertheless, running bad test cases can have a negative impact on businesses and can result in a bad user experience. That is why it is important to be able to forecast the long-term effects of A/B tests from short-term data. In this report, A/B tests and their forecasting are examined using univariate time-series analysis. However, because of the short duration and high diversity of the tests, providing accurate long-term forecasts is a great challenge. This is a quantitative and empirical study that uses a real-world data set from a social game development company, King Digital Entertainment PLC (King.com). First, the data are analysed and pre-processed through a series of steps. Time-series forecasting has been around for generations, so an analysis and accuracy comparison of existing forecasting models, such as mean forecast, ARIMA and Artificial Neural Networks, is carried out. The results on the real data set are similar to those other researchers have found for long-term forecasts with short-term data. To improve the forecasting accuracy, a time-series clustering method is proposed. The method exploits the similarity between time series through Dynamic Time Warping and trains separate forecasting models for each cluster. The clusters are chosen with high accuracy using a Random Forest classifier, and certainty about the long-term range of a time series is obtained by using historical tests and a Markov Chain. The proposed method shows superior results against existing models and can be used to obtain long-term forecasts for A/B tests.
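As a rough illustration of the Dynamic Time Warping similarity this abstract relies on, the following minimal sketch computes the classic DTW cost between two sequences by dynamic programming. It assumes an absolute-difference local cost and no warping-window constraint; the abstract does not specify either choice, and the function name is hypothetical.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the unconstrained DTW cost between two non-empty sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


same_shape = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
shifted_level = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Two series that trace the same shape at different speeds get a cost of zero, which is why DTW is a natural fit for grouping short test series whose dynamics are similar but not aligned in time. A pairwise matrix of such costs could then feed any distance-based clustering routine.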
Soheily-Khah, Saeid. "Generalized k-means-based clustering for temporal data under time warp." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM064/document.
Temporal alignment of multiple time series is an important unresolved problem in many scientific disciplines. Major challenges for an accurate temporal alignment include determining and modeling the common and differential characteristics of classes of time series. This thesis is motivated by recent work on extending Dynamic Time Warping (DTW) to align multiple time series in several applications, including speech recognition, curve matching, micro-array data analysis, temporal segmentation and human motion. However, these DTW-based works suffer from several limitations: 1) they address the problem of aligning two time series regardless of the remaining time series; 2) they weight the features of the multiple time series uniformly; 3) the time series are aligned globally, using all of the observations. The aim of this thesis is to explore a generalized dynamic time warping for time series clustering. This work first addresses the problem of prototype extraction, then the alignment of multiple and multidimensional time series
Costa, Fausto Guzzo da. "Employing nonlinear time series analysis tools with stable clustering algorithms for detecting concept drift on data streams." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-13112017-105506/.
Many industrial, scientific, and commercial processes continuously produce sequences of observations, theoretically infinite, called data streams. By analyzing the recurrences and behavior changes of these streams, it is possible to obtain information about the phenomenon that produced them. The inference of stable models for such streams is supported by the study of data recurrences, whereas it is hindered by behavior changes. These changes are produced mainly by external influences still unknown to the current models, as happens when new investment strategies emerge in the stock market, or when there are human interventions in the climate, etc. In the context of Machine Learning (ML), much research has been carried out to investigate these variations in data streams, referred to as concept drift. Detecting them allows models to be updated in order to improve prediction and understanding and, eventually, to control the influences that govern the data stream under study. In this scenario, supervised algorithms suffer from the difficulty of labeling data when they are generated at high frequency and in large volumes, and unsupervised algorithms lack the theoretical foundation to provide guarantees in detecting changes. In addition, algorithms of both paradigms do not adequately represent the temporal dependencies among stream observations. In this context, this doctoral thesis introduces a new methodology for detecting concept drift, in which two deficiencies of both ML paradigms are addressed: i) the instability involved in modeling the data, and ii) the representation of temporal dependencies. This methodology is motivated by the theoretical framework of Carlsson and Memoli, which provides a stability property for hierarchical clustering algorithms with respect to data permutation. To take advantage of this framework, the observations are embedded using Takens' embedding theorem, transforming them into independent ones. These data are then clustered by the Permutation-Invariant Single-Linkage (PISL) algorithm, which respects the stability property of Carlsson and Memoli. From the input data, this algorithm generates dendrograms (or models), which are equivalent to ultrametric spaces. Successive models are compared using the Gromov-Hausdorff distance in order to detect concept drift in the stream. As a result, divergences between models are in fact associated with changes in the data. Experiments were performed, one considering abrupt changes and the other gradual changes. The results confirm that the proposed methodology is able to detect concept drift, both abrupt and gradual; however, it is better suited to more complicated scenarios. The main contributions of this thesis are: i) the use of Takens' embedding theorem to transform the input data into independent ones; ii) the implementation of the PISL algorithm in combination with the Gromov-Hausdorff distance (called PISLGH); iii) the comparison of the proposed methodology with others from the literature in different scenarios; and, finally, iv) the release of an R package (called streamChaos) that provides both tools for processing nonlinear data streams and several algorithms for detecting concept drift.
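The embedding step that this abstract attributes to Takens' theorem is usually realised as a time-delay embedding, which turns a scalar stream into vectors of lagged samples. The sketch below is a generic illustration in Python, not code from the streamChaos R package; the function name and the parameter names `dim` and `tau` are assumptions, and choosing their values is a separate problem not covered here.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, tau: int) -> List[Tuple[float, ...]]:
    """Return the delay vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for the requested dim and tau")
    # Each vector collects dim samples spaced tau steps apart.
    return [
        tuple(series[i + k * tau] for k in range(dim))
        for i in range(n_vectors)
    ]


points = delay_embed([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], dim=3, tau=2)
```

The resulting points in `dim`-dimensional space are what a clustering algorithm would then operate on, in place of the raw, temporally dependent observations.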
Foster, Eric D. "State space time series clustering using discrepancies based on the Kullback-Leibler information and the Mahalanobis distance." Diss., University of Iowa, 2012. https://ir.uiowa.edu/etd/3455.
Tino, Peter, Christian Schittenkopf, and Georg Dorffner. "Temporal pattern recognition in noisy non-stationary time series based on quantization into symbolic streams. Lessons learned from financial volatility trading." SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 2000. http://epub.wu.ac.at/1680/1/document.pdf.
Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
Darwish, Amena. "Optimized material flow using unsupervised time series clustering : An experimental study on the just in time supermarket for Volvo powertrain production Skövde." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17530.
Salmon, Brian Paxton. "Improved hyper-temporal feature extraction methods for land cover change detection in satellite time series." Thesis, University of Pretoria, 2012. http://hdl.handle.net/2263/28199.
Thesis (PhD(Eng))--University of Pretoria, 2012.
Electrical, Electronic and Computer Engineering
unrestricted
Wolstenholme, Robert. "Clustering time series data by analysing graphical models of connectivity and the application to diagnosis of brain disorders." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/55873.
Groth, Gerson Eduardo. "Attribute field K-means : clustering trajectories with attribute by fitting multiple fields." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2016. http://hdl.handle.net/10183/150038.
The amount of high-dimensional trajectory data and its increasing complexity pose a challenge when visualizing and analysing this information. Trajectory visualization must deal with changes in both the space and time dimensions, but the attributes of each trajectory may provide insights into its behavior and important aspects, so they should not be neglected. In this work, we tackle this problem by interpreting multivariate time series as attribute-rich trajectories in a configuration space that encodes an explicit relationship among the time series variables. We propose a novel trajectory-clustering technique called Attribute Field k-means (AFKM). It uses a dynamic configuration space to generate clusters based on attributes and parameters set by the user. Furthermore, by incorporating a sketching-based interface, our approach is capable of finding clusters that approximate the input sketches. In addition, we developed a prototype to explore the trajectories and clusters generated by AFKM in an interactive manner. Our results on synthetic and real time series datasets demonstrate the efficiency and visualization power of our approach.
Huo, Shiyin. "Detecting Self-Correlation of Nonlinear, Lognormal, Time-Series Data via DBSCAN Clustering Method, Using Stock Price Data as Example." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1321989426.
Moser, Uwe Dominik [Verfasser], and Dieter [Akademischer Betreuer] Schramm. "Multivariate Time Series Clustering and Classification for Objective Assessment of Automated Driving Functions / Uwe Dominik Moser ; Betreuer: Dieter Schramm." Duisburg, 2020. http://d-nb.info/1216038880/34.
Zhakiya, Elezhan. "Unsupervised machine learning and k-Means clustering as a way of discovering anomalous events In continuous seismic time series." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117323.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 51-52).
Unsupervised k-Means clustering was implemented as a method for identifying anomalies in seismic time series. A sliding window approach was used to generate subsequences from the overall waveform. Dynamic Time Warping (DTW) was used to compare seismic subsequences. DTW barycenter averaging (DBA) was used to average multiple subsequences within a group of similar shapes. Clustering is able to discover anomalously shaped parts of a seismic time series in a completely unsupervised fashion, without requiring anyone to input actual times of the events, any predetermined examples of events, or any other parameters about the signal.
by Elezhan Zhakiya.
S.M.
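The sliding window step described in the abstract of this thesis can be sketched as follows. This is a minimal illustration under stated assumptions: the `width` and `step` parameters and both function names are hypothetical, and the z-normalisation of each window is a common preprocessing choice before shape comparison that the abstract does not mention.

```python
import math
from typing import List, Sequence


def sliding_windows(series: Sequence[float], width: int, step: int = 1) -> List[List[float]]:
    """Return every full-length window of `width` samples, advancing by `step`."""
    if width < 1 or step < 1:
        raise ValueError("width and step must be positive integers")
    return [
        list(series[start:start + width])
        for start in range(0, len(series) - width + 1, step)
    ]


def z_normalize(window: Sequence[float]) -> List[float]:
    """Rescale a window to zero mean and unit (population) standard deviation."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        # A flat window has no shape to compare, so map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in window]


waveform = [1.0, 2.0, 3.0, 4.0, 5.0]
windows = sliding_windows(waveform, width=3, step=2)
normalized = [z_normalize(w) for w in windows]
```

The normalised windows are the subsequences that a shape-based clustering stage would compare; windows whose nearest cluster centre is unusually far away would be candidates for anomalous events.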
Sävhammar, Simon. "Uniform interval normalization : Data representation of sparse and noisy data sets for machine learning." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19194.
Eriksson, Therése, and Abdelnaeim Mohamed Mahmoud. "Waveform clustering - Grouping similar power system events." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44147.
Arzoky, Mahir. "Munch : an efficient modularisation strategy on sequential source code check-ins." Thesis, Brunel University, 2015. http://bura.brunel.ac.uk/handle/2438/13808.