Theses on the topic "Spatial data mining"

Below are the top 50 dissertations and theses for research on the topic "Spatial data mining".


1

Zhang, Xin Iris, e 張欣. "Fast mining of spatial co-location patterns". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30462708.

2

Yang, Zhao. "Spatial Data Mining Analytical Environment for Large Scale Geospatial Data". ScholarWorks@UNO, 2016. http://scholarworks.uno.edu/td/2284.

Abstract:
Nowadays, many applications continuously generate large-scale geospatial data. Vehicle GPS tracking, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high-resolution optical or Synthetic Aperture Radar imagery all produce huge amounts of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion is still limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism for analyzing large-scale geospatial data. In this work, we extend HBase with a spatial index (an R-tree) on top of HDFS to support geospatial data, and demonstrate its analytical use with some common geospatial data types and the data mining technology provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and to make its outputs available to end users.
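The spatial primitive such a framework accelerates is the rectangular range query. As a hedged, stand-alone illustration (invented point data, not the thesis's HBase/R-tree code), a brute-force version looks like this; an R-tree answers the same query while pruning whole subtrees of bounding boxes:

```python
def in_box(point, box):
    """Check whether (x, y) falls inside box = (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

def range_query(points, box):
    """Brute-force range query; an R-tree prunes whole subtrees instead."""
    return [p for p in points if in_box(p, box)]

# Hypothetical GPS fixes for illustration only.
gps_fixes = [(0.5, 0.5), (2.0, 3.0), (1.5, 1.0), (4.0, 4.0)]
print(range_query(gps_fixes, (0.0, 0.0, 2.0, 2.0)))  # [(0.5, 0.5), (1.5, 1.0)]
```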
3

Al-Naymat, Ghazi. "NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA". Thesis, The University of Sydney, 2009. http://hdl.handle.net/2123/5295.

Abstract:
Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain. These techniques include association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets that are embedded in continuous geographic space. Therefore, we have proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns. These relationships can be obtained only from the maximal clique patterns, which have never been used until now. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mine and query large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a predefined time.
One approach to processing a “flock query” is to map ST data into high-dimensional space and to reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures rapidly deteriorates in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, which always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying the SparseDTW algorithm we discover an interesting pattern, “pairs trading”, in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002.
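The DTW recurrence that SparseDTW economizes on can be sketched in a few lines. This is the standard full-matrix formulation as an illustrative Python reference, not the thesis's implementation; SparseDTW computes the same optimal value while storing only the promising cells of this matrix:

```python
def dtw(a, b):
    """Optimal DTW distance between two sequences, O(len(a)*len(b)) space."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed by warping
```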
5

Koperski, Krzysztof. "A progressive refinement approach to spatial data mining". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0024/NQ51882.pdf.

6

Yang, Hui. "A general framework for mining spatial and spatio-temporal object association patterns in scientific data". Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1155319799.

7

Yu, Ping. "FP-tree Based Spatial Co-location Pattern Mining". Thesis, University of North Texas, 2005. https://digital.library.unt.edu/ark:/67531/metadc4724/.

Abstract:
A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, the paradigm of frequent pattern mining has undergone a shift from candidate generation-and-test approaches to projection-based approaches. Co-location patterns resemble frequent patterns in many aspects. However, the lack of a transaction concept, which is crucial in frequent pattern mining, makes a similar shift of paradigm in co-location pattern mining very difficult. This thesis investigates a projection-based co-location pattern mining paradigm. In particular, an FP-tree-based co-location mining framework and an algorithm called FP-CM (FP-tree-based Co-location Miner) are proposed. It is proved that FP-CM is complete and correct and requires only a small constant number of database scans. The experimental results show that FP-CM outperforms a candidate generation-and-test based co-location miner by an order of magnitude.
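Both mining styles share a preliminary step: materializing neighbor relations between instances of different spatial features, the stand-in for the missing transactions. A hedged sketch (feature names, coordinates, and the distance threshold are invented for illustration):

```python
from math import dist
from itertools import combinations

def colocated_pairs(instances, d):
    """Pairs of feature labels whose instances (of different features) lie within distance d."""
    pairs = []
    for (f1, p1), (f2, p2) in combinations(instances, 2):
        if f1 != f2 and dist(p1, p2) <= d:
            pairs.append(tuple(sorted((f1, f2))))
    return sorted(set(pairs))

# Hypothetical feature instances: (feature label, (x, y)).
data = [("school", (0, 0)), ("park", (0, 1)), ("school", (5, 5)), ("cafe", (0.5, 0.5))]
print(colocated_pairs(data, 1.5))  # [('cafe', 'park'), ('cafe', 'school'), ('park', 'school')]
```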
8

SHENCOTTAH, K. N. KALYANKUMAR. "FINDING CLUSTERS IN SPATIAL DATA". University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1179521337.

9

Lin, Zhungshan. "Optimal Candidate Generation in Spatial Co-Location Mining". DigitalCommons@USU, 2009. https://digitalcommons.usu.edu/etd/377.

Abstract:
Existing level-wise spatial co-location algorithms suffer from generating extra, non-clique candidate instances; thus, they require cliqueness checking at every level. In this thesis, a novel spatial co-location mining algorithm is proposed that automatically generates co-located spatial features without generating any non-clique candidates at any level. Consequently, this algorithm generates fewer candidates than other existing level-wise co-location algorithms without losing any pertinent information. The benefits of this algorithm are clearly observed at early stages of the mining process.
10

Pech, Palacio Manuel Alfredo. "Spatial data modeling and mining using a graph-based representation". Lyon, INSA, 2005. http://theses.insa-lyon.fr/publication/2005ISAL0118/these.pdf.

Abstract:
We propose a unique graph-based model to represent spatial data, non-spatial data and the spatial relations among spatial objects. We generate datasets composed of graphs over these three elements. We consider that, by mining a dataset with these characteristics, a graph-based mining tool can search for patterns involving all these elements at the same time, improving the results of the spatial analysis task. A significant characteristic of spatial data is that the attributes of the neighbors of an object may have an influence on the object itself. We therefore propose to include three relationship types in the model (topological, orientation, and distance relations). In the model, the spatial data (i.e., spatial objects), non-spatial data (i.e., non-spatial attributes), and spatial relations are represented as a collection of one or more directed graphs. A directed graph contains a collection of vertices and edges representing all these elements. Vertices represent either spatial objects, spatial relations between two spatial objects (binary relations), or non-spatial attributes describing the spatial objects. Edges represent a link between two vertices of any type. According to the type of vertices that an edge joins, it can represent either an attribute name or a spatial relation name. The attribute name can refer to a spatial object or a non-spatial entity. We use directed edges to represent directional information of relations among elements (e.g., object x touches object y) and to describe attributes of objects (e.g., object x has attribute z). We propose to adopt the Subdue system, a general graph-based data mining system developed at the University of Texas at Arlington, as our mining tool. A special feature named overlap has a primary role in the substructure discovery process and consequently a direct impact on the generated results. However, it is currently implemented in an orthodox way: all or nothing.
Therefore, we propose a third approach, limited overlap, which gives the user the capability to choose the vertices over which overlap will be allowed. Three motivations drive the implementation of the new algorithm: search space reduction, processing time reduction, and specialized overlapping-pattern-oriented search.
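The graph model described above can be sketched with plain data structures. This is an illustrative stand-alone sketch (labels and helper names are invented, and this is not SUBDUE's API): vertices for spatial objects, spatial relations and non-spatial attributes, joined by labelled directed edges.

```python
class SpatialGraph:
    def __init__(self):
        self.vertices = {}   # vertex id -> label
        self.edges = []      # (source id, edge label, target id)

    def add_vertex(self, vid, label):
        self.vertices[vid] = label

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

g = SpatialGraph()
g.add_vertex("o1", "road")            # spatial object
g.add_vertex("o2", "river")           # spatial object
g.add_vertex("r1", "crosses")         # spatial-relation vertex
g.add_vertex("a1", "paved")           # non-spatial attribute vertex
g.add_edge("o1", "relation", "r1")    # road --relation--> crosses
g.add_edge("r1", "relation", "o2")    # crosses --relation--> river
g.add_edge("o1", "attribute", "a1")   # road --attribute--> paved
print(len(g.vertices), len(g.edges))  # 4 3
```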
11

Pech Palacio, Manuel Alfredo, Robert Laurini, Anne Tchounikine e David Sol Martínez. "Spatial data modeling and mining using a graph-based representation". Villeurbanne : Doc'INSA, 2006. http://docinsa.insa-lyon.fr/these/pont.php?id=pech_palacio.

Abstract:
Doctoral thesis: Computer Science: Villeurbanne, INSA: 2005. Doctoral thesis: Computer Science: Universidad de las Américas - Puebla: 2005.
Thesis prepared under joint supervision (cotutelle). Written in French, English and Spanish. Title from title screen. Bibliography p. 174-182.
12

Kou, Yufeng. "Abnormal Pattern Recognition in Spatial Data". Diss., Virginia Tech, 2006. http://hdl.handle.net/10919/30145.

Abstract:
In recent years, abnormal spatial pattern recognition has received a great deal of attention from both industry and academia and has become an important branch of data mining. Abnormal spatial patterns, or spatial outliers, are observations whose characteristics are markedly different from those of their spatial neighbors. The identification of spatial outliers can reveal hidden but valuable knowledge in many applications. For example, it can help locate extreme meteorological events such as tornadoes and hurricanes, identify aberrant genes or tumor cells, discover highway traffic congestion points, pinpoint military targets in satellite images, determine possible locations of oil reservoirs, and detect water pollution incidents. Numerous traditional outlier detection methods have been developed, but they cannot be directly applied to spatial data to extract abnormal patterns. Traditional outlier detection mainly focuses on "global comparison" and identifies deviations from the remainder of the entire data set. In contrast, spatial outlier detection concentrates on discovering neighborhood instabilities that break spatial continuity. In recent years, a number of techniques have been proposed for spatial outlier detection. However, they have the following limitations. First, most of them focus primarily on single-attribute outlier detection. Second, they may not accurately locate outliers when multiple outliers exist in a cluster and correlate with each other. Third, the existing algorithms tend to abstract spatial objects as isolated points and do not consider their geometrical and topological properties, which may lead to inexact results. This dissertation reports a study of the problem of abnormal spatial pattern recognition and proposes a suite of novel algorithms.
Contributions include: (1) formal definitions of various spatial outliers, including single-attribute outliers, multi-attribute outliers, and region outliers; (2) a set of algorithms for the accurate detection of single-attribute spatial outliers; (3) a systematic approach to identifying and tracking region outliers in continuous meteorological data sequences; (4) a novel Mahalanobis-distance-based algorithm to detect outliers with multiple attributes; (5) a set of graph-based algorithms to identify point outliers and region outliers; and (6) extensive analysis of experiments on several spatial data sets (e.g., West Nile virus data and NOAA meteorological data) to evaluate the effectiveness and efficiency of the proposed algorithms.
Ph.D.
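The neighborhood-comparison idea behind single-attribute spatial outlier detection can be sketched simply: compare each site's value with the mean of its spatial neighbors and flag large standardized deviations. This is an illustrative sketch, not one of the dissertation's algorithms; the neighbor lists, values, and threshold are invented.

```python
from statistics import mean, pstdev

def spatial_outliers(values, neighbors, threshold):
    """Return indices whose deviation from their neighborhood mean is extreme."""
    diffs = [values[i] - mean(values[j] for j in neighbors[i])
             for i in range(len(values))]
    mu, sigma = mean(diffs), pstdev(diffs)
    return [i for i, d in enumerate(diffs) if abs(d - mu) > threshold * sigma]

vals = [10.0, 11.0, 10.5, 30.0, 10.2]            # site 3 breaks spatial continuity
nbrs = [[1, 2], [0, 2], [1, 3], [2, 4], [3, 0]]  # a small neighbor graph
print(spatial_outliers(vals, nbrs, 1.5))  # [3]: an illustrative 1.5-sigma cutoff
```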
13

Sips, Mike. "Pixel-based visual data mining in large geo-spatial point sets /". Konstanz : Hartung-Gorre, 2006. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=014881714&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

14

Ruß, Georg [Verfasser], e Rudolf [Akademischer Betreuer] Kruse. "Spatial data mining in precision agriculture / Georg Ruß. Betreuer: Rudolf Kruse". Magdeburg : Universitätsbibliothek, 2012. http://d-nb.info/1047596296/34.

15

Ruß, Georg [Verfasser], e Rudolf [Akademischer Betreuer] Kruse. "Spatial data mining in precision agriculture / Georg Ruß. Betreuer: Rudolf Kruse". Magdeburg : Universitätsbibliothek, 2012. http://nbn-resolving.de/urn:nbn:de:gbv:ma9:1-820.

16

Lan, Liang. "Data Mining Algorithms for Classification of Complex Biomedical Data". Diss., Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/214773.

Abstract:
Computer and Information Science
Ph.D.
In my dissertation, I will present my research, which contributes to solving the following three open problems in biomedical informatics: (1) multi-task approaches for microarray classification; (2) multi-label classification of gene and protein function prediction from multi-source biological data; (3) spatial scan for movement data. In microarray classification, samples belong to several predefined categories (e.g., cancer vs. control tissues) and the goal is to build a predictor that classifies a new tissue sample based on its microarray measurements. When faced with small-sample, high-dimensional microarray data, most machine learning algorithms would produce an overly complicated model that performs well on training data but poorly on new data. To reduce the risk of over-fitting, feature selection becomes an essential technique in microarray classification. However, standard feature selection algorithms are bound to underperform when the size of the microarray data is particularly small. The best remedy is to borrow strength from external microarray datasets. In this dissertation, I will present two new multi-task feature filter methods which can improve classification performance by utilizing external microarray data. The first method aggregates the feature selection results from multiple microarray classification tasks. The resulting multi-task feature selection can be shown to improve the quality of the selected features and lead to higher classification accuracy. The second method jointly selects a small gene set with maximal discriminative power and minimal redundancy across multiple classification tasks by solving an objective function with integer constraints. In the protein function prediction problem, gene functions are predicted from a predefined set of possible functions (e.g., the functions defined in the Gene Ontology).
Gene function prediction is a complex classification problem characterized by the following aspects: (1) a single gene may have multiple functions; (2) the functions are organized in a hierarchy; (3) unbalanced training data for each function (far fewer positive than negative examples); (4) missing class labels; (5) availability of multiple biological data sources, such as microarray data, genome sequence and protein-protein interactions. As participants in the 2011 Critical Assessment of Function Annotation (CAFA) challenge, our team achieved the highest AUC accuracy among 45 groups. In the competition, we gained by focusing on the fifth aspect of the problem. Thus, in this dissertation, I will discuss several schemes to integrate the prediction scores from multiple data sources and show their results. Interestingly, the experimental results show that a simple averaging integration method is competitive with other state-of-the-art data integration methods. The original spatial scan algorithm is used for the detection of spatial overdensities: the discovery of spatial subregions with significantly higher scores according to some density measure. This algorithm is widely used in identifying clusters of disease cases (e.g., identifying environmental risk factors for child leukemia). However, the original spatial scan algorithm only works on static spatial data. In this dissertation, I will propose one possible solution for spatial scan on movement data.
Temple University--Theses
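The spatial-scan idea above can be illustrated naively: slide a window over a grid of case counts and report the densest window. This is a hedged sketch with invented counts; raw window sums stand in for the likelihood-ratio score a full scan statistic would use.

```python
def best_window(grid, w):
    """Return (row, col, total) of the w-by-w window with the largest sum."""
    n, m = len(grid), len(grid[0])
    best = (0, 0, -1)
    for i in range(n - w + 1):
        for j in range(m - w + 1):
            total = sum(grid[i + di][j + dj] for di in range(w) for dj in range(w))
            if total > best[2]:
                best = (i, j, total)
    return best

# Hypothetical grid of disease-case counts per cell.
cases = [[0, 1, 0, 0],
         [1, 5, 4, 0],
         [0, 6, 3, 0],
         [0, 0, 0, 1]]
print(best_window(cases, 2))  # (1, 1, 18): the dense 2x2 cluster of cases
```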
17

Sandell, Anna. "GIS, data mining and wild land fire data within Räddningstjänsten". Thesis, University of Skövde, Department of Computer Science, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-543.

Abstract:

Geographical information systems (GIS), data mining and wild land fire data would theoretically be suitable to use together. However, would data mining in reality bring out any useful information from wild land fire data stored within a GIS? This report investigates whether GIS and data mining are used today within Räddningstjänsten in some municipalities of the former Skaraborg. The investigation shows that neither data mining nor GIS is used within the investigated municipalities. However, there is an interest within the organisations in using GIS in the future, and also in some kind of analysis tool, for example data mining. To show how GIS and data mining could be used within Räddningstjänsten in the future, some examples were constructed.

18

Isik, Narin. "Fuzzy Spatial Data Cube Construction And Its Use In Association Rule Mining". Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606056/index.pdf.

Abstract:
The popularity of spatial databases has increased as the amount of spatial data that needs to be handled has grown with the use of digital maps, satellite images, video cameras, medical equipment, sensor networks, etc. Spatial data are difficult to examine and to extract interesting knowledge from; hence, applications that assist decision-making about spatial data, such as weather forecasting, traffic supervision and mobile communication, have been introduced. In this thesis, more natural and precise knowledge is generated from spatial data by constructing a fuzzy spatial data cube and extracting fuzzy association rules from it, in order to improve decision-making about spatial data. This involves extensive research into spatial knowledge discovery and how fuzzy logic can be used to develop it. Incorporating fuzzy logic into spatial data cube construction necessitates a new method for the aggregation of fuzzy spatial data. We illustrate how this method also enhances the meaning of fuzzy spatial generalization rules and fuzzy association rules with a case study on weather pattern searching. This study contributes to spatial knowledge discovery by generating more understandable and interesting knowledge from spatial data: it extends spatial generalization with fuzzy memberships, extends the spatial aggregation in spatial data cube construction by utilizing weighted measures, and generates fuzzy association rules from the constructed fuzzy spatial data cube.
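The fuzzy ingredients combined above, membership degrees and weighted aggregation, can be illustrated with a small sketch. This is a generic example of a triangular membership function and a membership-weighted aggregate, under invented "hot" breakpoints; it is not the thesis's aggregation method.

```python
def tri_membership(x, lo, peak, hi):
    """Triangular fuzzy membership: 0 at lo and hi, 1 at peak."""
    if x <= lo or x >= hi:
        return 0.0
    if x <= peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)

def weighted_aggregate(values, memberships):
    """Aggregate cell values weighted by their fuzzy memberships."""
    total = sum(memberships)
    return sum(v * m for v, m in zip(values, memberships)) / total if total else 0.0

temps = [18.0, 24.0, 30.0, 36.0]                       # hypothetical cell temperatures
hot = [tri_membership(t, 20.0, 30.0, 40.0) for t in temps]
print(hot)                        # [0.0, 0.4, 1.0, 0.4]: degree of "hot" per cell
print(weighted_aggregate(temps, hot))
```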
19

Demšar, Urška. "Data mining of geospatial data: combining visual and automatic methods". Doctoral thesis, KTH, School of Architecture and the Built Environment (ABE), 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3892.

Abstract:

Most of the largest databases currently available have a strong geospatial component and contain potentially valuable information. The discipline concerned with extracting this information and knowledge is data mining. Knowledge discovery is performed by applying automatic algorithms which recognise patterns in the data.

Classical data mining algorithms assume that data are independently generated and identically distributed. Geospatial data are multidimensional, spatially autocorrelated and heterogeneous. These properties make classical data mining algorithms inappropriate for geospatial data, as their basic assumptions cease to be valid. Extracting knowledge from geospatial data therefore requires special approaches. One way to do that is to use visual data mining, where the data is presented in visual form for a human to perform the pattern recognition. When visual mining is applied to geospatial data, it is part of the discipline called exploratory geovisualisation.

Both automatic and visual data mining have their respective advantages. Computers can treat large amounts of data much faster than humans, while humans are able to recognise objects and visually explore data much more effectively than computers. A combination of visual and automatic data mining draws together human cognitive skills and computer efficiency and permits faster and more efficient knowledge discovery.

This thesis investigates if a combination of visual and automatic data mining is useful for exploration of geospatial data. Three case studies illustrate three different combinations of methods. Hierarchical clustering is combined with visual data mining for exploration of geographical metadata in the first case study. The second case study presents an attempt to explore an environmental dataset by a combination of visual mining and a Self-Organising Map. Spatial pre-processing and visual data mining methods were used in the third case study for emergency response data.

Contemporary system design methods involve user participation at all stages. These methods originated in the field of Human-Computer Interaction, but have been adapted for the geovisualisation issues related to spatial problem solving. Attention to user-centred design was present in all three case studies, but the principles were fully followed only for the third case study, where a usability assessment was performed using a combination of a formal evaluation and exploratory usability.

20

Bogorny, Vania. "Enhancing spatial association rule mining in geographic databases". reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2006. http://hdl.handle.net/10183/7841.

Abstract:
The association rule mining technique emerged with the objective of finding novel, useful, and previously unknown associations in transactional databases, and a large number of association rule mining algorithms have been proposed in the last decade. Their main and best-known drawback is the generation of large amounts of frequent patterns and association rules. In geographic databases the problem of mining spatial association rules increases significantly. Besides the large number of generated patterns and rules, many patterns are well-known geographic domain associations, normally explicitly represented in geographic database schemas. The majority of existing algorithms do not guarantee the elimination of all well-known geographic dependences. The result is that the same associations represented in geographic database schemas are extracted by spatial association rule mining algorithms and presented to the user. The problem of mining spatial association rules from geographic databases requires at least three main steps: computing spatial relationships, generating frequent patterns, and extracting association rules. The first step is the most effort-demanding and time-consuming task in the rule mining process, but has received little attention in the literature. The second and third steps have been considered the main problem in transactional association rule mining and have been addressed as two different problems: frequent pattern mining and association rule mining. Well-known geographic dependences, which generate well-known patterns, may appear in all three main steps of the spatial association rule mining process. Aiming to eliminate well-known dependences and generate more interesting patterns, this thesis presents a framework with three main methods for mining frequent geographic patterns using knowledge constraints. Semantic knowledge is used to avoid the generation of patterns that are known in advance to be non-interesting.
The first method reduces the input problem, and all well-known dependences that can be eliminated without losing information are removed in data preprocessing. The second method eliminates combinations of pairs of geographic objects with dependences during frequent set generation. The third method is a new approach to generate non-redundant frequent sets without dependences: the maximal generalized frequent sets. This method reduces the number of frequent patterns very significantly and, as a consequence, the number of association rules.
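To make the dependence-elimination idea concrete, here is a minimal, invented sketch of the second method's core step: an Apriori-style pair count that prunes candidate pairs embedding a dependence known from the database schema. The predicates and the dependence list are hypothetical illustrations, not taken from the thesis:

```python
from itertools import combinations

# Hypothetical transactions: each row lists the spatial predicates of one object.
transactions = [
    {"close_to(water)", "contains(island)", "close_to(road)"},
    {"close_to(water)", "contains(island)"},
    {"close_to(water)", "close_to(road)"},
    {"close_to(road)", "close_to(factory)"},
]
# A dependence known a priori from the schema: islands are always in water,
# so this pair would only yield a trivially true, non-interesting pattern.
known_dependences = {frozenset({"contains(island)", "close_to(water)"})}

def frequent_pairs(transactions, min_support, known):
    """Count frequent item pairs, skipping pairs that embed a known dependence."""
    items = set().union(*transactions)
    result = {}
    for pair in combinations(sorted(items), 2):
        if frozenset(pair) in known:
            continue  # prune the well-known geographic association
        support = sum(set(pair) <= t for t in transactions)
        if support >= min_support:
            result[pair] = support
    return result

print(frequent_pairs(transactions, min_support=2, known=known_dependences))
```

Without the pruning step, the (island, water) pair would dominate the output despite carrying no new knowledge.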
Gli stili APA, Harvard, Vancouver, ISO e altri
21

Weitl, Harms Sherri K. "Temporal association rule methodologies for geo-spatial decision support /". free to MU campus, to others for purchase, 2002. http://wwwlib.umi.com/cr/mo/fullcit?p3091989.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
22

Li, Xiaohui. "A Language and Visual Interface to Specify Complex Spatial Pattern Mining". Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5408/.

Testo completo
Abstract (sommario):
The emerging interest in spatial pattern mining leads to a demand for a flexible spatial pattern mining language, on which an easy-to-use and understandable visual pattern language can be built. It is worthwhile to define a pattern mining language, called LCSPM, that allows users to specify complex spatial patterns. I describe the proposed pattern mining language in this paper. A visual interface that allows users to specify patterns visually is developed. Visual pattern queries are translated into the LCSPM language by a parser, and the data mining process can be triggered afterwards. The visual language is based on, and goes beyond, a visual language proposed in the literature. I implemented a prototype system based on the open-source JUMP framework.
Gli stili APA, Harvard, Vancouver, ISO e altri
23

Lee, Ho Young. "Diagnosing spatial variation patterns in manufacturing processes". Diss., Texas A&M University, 2003. http://hdl.handle.net/1969/122.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
24

Goler, Isil. "Pattern Extraction By Using Both Spatial And Temporal Features On Turkish Meteorological Data". Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612877/index.pdf.

Testo completo
Abstract (sommario):
With the growth in the size of datasets, data mining has become an important research topic and has received substantial interest from both academia and industry for many years. Spatio-temporal data mining, the mining of knowledge from large amounts of spatio-temporal data, is an especially demanding field, because huge amounts of spatio-temporal data are collected in various applications. Spatio-temporal data mining therefore requires the development of novel data mining algorithms and computational techniques for the successful analysis of large spatio-temporal databases. In this thesis, a spatio-temporal mining technique is proposed and applied to Turkish meteorological data collected from various weather stations in Turkey. This study also includes an analysis and interpretation of the spatio-temporal rules generated for the Turkish meteorological data set. We introduce a second-level mining technique which is used to define general trends of the patterns according to spatial changes. Generated patterns are investigated under different temporal sets in order to monitor how events change over time.
Gli stili APA, Harvard, Vancouver, ISO e altri
25

Wrede, Fredrik. "An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations". Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280287.

Testo completo
Abstract (sommario):
Stochastic reaction-diffusion simulation has become an efficient approach for modelling spatial aspects of intracellular biochemical reaction networks. By accounting for the intrinsic noise due to low copy numbers of chemical species, stochastic reaction-diffusion simulations can more accurately predict and model biological systems. As with much simulation software, exploration of the parameters associated with a model can be needed to yield new knowledge about the underlying system. The exploration can be conducted by executing parameter sweeps for a model. However, with little or no prior knowledge about the modelled system, the effort required of practitioners to explore the parameter space can become overwhelming. To address this problem we perform a feasibility study of an explorative behavioural analysis of stochastic reaction-diffusion simulations, applying spatial-temporal data mining to large parameter sweeps. By reducing individual simulation outputs to a feature space built from simple time series and distribution analytics, we were able to find similarly behaving simulations after performing agglomerative hierarchical clustering.
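The reduce-then-cluster workflow described in the abstract can be sketched as follows. This is an illustrative toy pipeline, not the thesis's actual feature set or clustering implementation; the feature choices (mean, standard deviation, final value), the single-linkage merging, and the threshold are assumptions:

```python
import math

def features(series):
    """Reduce a simulated time series to a small feature vector:
    mean, standard deviation, and final copy number."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return (mean, std, series[-1])

def agglomerate(vectors, threshold):
    """Naive single-linkage agglomerative clustering: repeatedly merge the
    two closest clusters until the closest pair exceeds the threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Two similar "low" trajectories and one clearly different "high" trajectory.
sweeps = [[1, 2, 2, 3], [1, 2, 3, 3], [50, 60, 70, 80]]
vecs = [features(s) for s in sweeps]
print(agglomerate(vecs, threshold=10.0))
```

On this toy data the first two sweeps end up in one cluster and the third stays separate; in practice one would use a library implementation (e.g. SciPy's hierarchical clustering) instead of this quadratic loop.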
Gli stili APA, Harvard, Vancouver, ISO e altri
26

BATRA, SHALINI. "DISCOVERY OF CLUSTERS IN SPATIAL DATABASES". University of Cincinnati / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1069701237.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
27

Schubert, Erich. "Generalized and efficient outlier detection for spatial, temporal, and high-dimensional data mining". Diss., Ludwig-Maximilians-Universität München, 2013. http://nbn-resolving.de/urn:nbn:de:bvb:19-166938.

Testo completo
Abstract (sommario):
Knowledge Discovery in Databases (KDD) is the process of extracting non-trivial patterns from large databases, with the focus of extracting novel, potentially useful, statistically valid and understandable patterns. The process involves multiple phases, including selection, preprocessing, evaluation and the analysis step known as data mining. One of the key techniques of data mining is outlier detection: the identification of observations that are unusual and seemingly inconsistent with the majority of the data set. Such rare observations can have various causes: they can be measurement errors, unusually extreme (but valid) measurements, data corruption or even manipulated data. Over the previous years, various outlier detection algorithms have been proposed that often appear to be only slightly different from previous ones, yet "clearly outperform" the others in the experiments. A key focus of this thesis is to unify and modularize the various approaches into a common formalism, to make the analysis of the actual differences easier while at the same time increasing the flexibility of the approaches by allowing the addition and replacement of modules to adapt the methods to different requirements and data types. To show the benefits of the modularized structure, (i) several existing algorithms are formalized within the new framework, (ii) new modules are added that improve the robustness, efficiency, statistical validity and score usability and that can be combined with existing methods, (iii) modules are modified to allow existing and new algorithms to run on other, often more complex data types, including spatial, temporal and high-dimensional data spaces, (iv) the combination of multiple algorithm instances into an ensemble method is discussed, and (v) the scalability to large data sets is improved using approximate as well as exact indexing.
The starting point is the Local Outlier Factor (LOF) algorithm, which is extended with slight modifications to increase robustness and the usability of the produced scores. In order to obtain the same benefits for other methods, these methods are abstracted into a general framework for local outlier detection. By abstracting from a single vector space, other data types that involve spatial and temporal relationships can be analyzed. The use of subspace and correlation neighborhoods then allows the algorithms to detect new kinds of outliers in arbitrarily oriented subspaces. Improvements in the score normalization bring a statistical intuition of probabilities back to the outlier scores, which previously were only useful for ranking objects, while improved models also offer explanations of why an object was considered to be an outlier. Subsequently, improved modules are presented for different parts of the framework that, for example, allow the same algorithms to run on significantly larger data sets -- in approximately linear instead of quadratic complexity -- by accepting approximated neighborhoods at little loss in precision and effectiveness. Additionally, multiple algorithms with different intuitions can be run at the same time and the results combined into an ensemble method that is able to detect outliers of different types. Finally, new outlier detection methods are constructed, customized for the specific problems of real data sets. The new methods allow insightful results to be obtained that could not be obtained with the existing methods. Since they are constructed from the same building blocks, there exists a strong and explicit connection to the previous approaches, and by using the indexing strategies introduced earlier, the algorithms can be executed efficiently even on large data sets.
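As background for the thesis's starting point, here is a textbook-style sketch of the plain Local Outlier Factor (naive O(n²) neighbor search on a made-up 1-D data set; the thesis's generalized, normalized and indexed versions go well beyond this):

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of point i (excluding i)."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: abs(points[j] - points[i]))
    return order[:k]

def k_distance(points, i, k):
    """Distance from point i to its k-th nearest neighbour."""
    return abs(points[knn(points, i, k)[-1]] - points[i])

def reach_dist(points, i, j, k):
    """Reachability distance of i from j: at least j's k-distance."""
    return max(k_distance(points, j, k), abs(points[i] - points[j]))

def lrd(points, i, k):
    """Local reachability density: inverse of the mean reachability
    distance from i to its k nearest neighbours."""
    neigh = knn(points, i, k)
    return len(neigh) / sum(reach_dist(points, i, j, k) for j in neigh)

def lof(points, i, k):
    """LOF score: mean ratio of the neighbours' densities to i's density.
    Scores near 1 indicate inliers; scores well above 1 indicate outliers."""
    neigh = knn(points, i, k)
    return sum(lrd(points, j, k) for j in neigh) / (len(neigh) * lrd(points, i, k))

data = [1.0, 1.1, 1.2, 1.3, 10.0]   # the last value is an obvious outlier
scores = [lof(data, i, k=2) for i in range(len(data))]
print(scores)
```

The four clustered points score close to 1, while the isolated point scores far above 1, which is the local-density intuition the thesis builds on.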
Gli stili APA, Harvard, Vancouver, ISO e altri
28

Dos, Santos Raimundo Fonseca Jr. "Effective Methods of Semantic Analysis in Spatial Contexts". Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/49697.

Testo completo
Abstract (sommario):
With the growing spread of spatial data, exploratory analysis has gained a considerable amount of attention. Particularly in the fields of Information Retrieval and Data Mining, the integration of data points helps uncover interesting patterns not always visible to the naked eye. Social networks often link entities that share places and activities; marketing tools target users based on behavior and preferences; and medical technology combines symptoms to categorize diseases. Many of the current approaches in this field of research depend on semantic analysis, which is useful for inference and decision making. From a functional point of view, objects can be investigated from spatial and temporal perspectives. The former attempts to verify how proximity makes objects related; the latter adds a measure of coherence by enforcing time ordering. This type of spatio-temporal reasoning examines several aspects of semantic analysis and their characteristics: shared relationships among objects, matches versus mismatches of values, distances among parents and children, and brute-force comparison of attributes. Most of these approaches suffer from the pitfalls of disparate data, often missing true relationships, failing to deal with inexact vocabularies, ignoring missing values, and poorly handling multiple attributes. In addition, the vast majority do not consider the spatio-temporal aspects of the data. This research studies semantic techniques for data analysis in spatial contexts. The proposed solutions represent different methods of relating spatial entities or sequences of entities. They are able to identify relationships that are not explicitly written down.
Major contributions of this research include (1) a framework that computes a numerical entity similarity, denoted a semantic footprint, composed of spatial, dimensional, and ontological facets; (2) a semantic approach that translates categorical data into a numerical score, which permits ranking and ordering; (3) an extensive study of GML, as a representative spatial structure, examining how semantic analysis methods are influenced by its approaches to storage, querying, and parsing; (4) a method to find spatial regions of high entity density based on a clustering coefficient; (5) a ranking strategy based on connectivity strength which differentiates important relationships from less relevant ones; (6) a distance measure between entity sequences that quantifies the most related streams of information; (7) three distance-based measures (one probabilistic, one based on spatial influence, and one spatiological) that quantify the interactions among entities and events; and (8) a spatio-temporal method to compute the coherence of a data sequence.
Ph. D.
Gli stili APA, Harvard, Vancouver, ISO e altri
29

Icev, Aleksandar. "DARM distance-based association rule mining". Link to electronic thesis, 2003. http://www.wpi.edu/Pubs/ETD/Available/etd-0506103-132405.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
30

Leighty, Brian David. "Data Mining for Induction of Adjacency Grammars and Application to Terrain Pattern Recognition". NSUWorks, 2009. http://nsuworks.nova.edu/gscis_etd/212.

Testo completo
Abstract (sommario):
The process of syntactic pattern recognition makes the analogy between the syntax of languages and the structure of spatial patterns. The recognition process is achieved by parsing a given pattern to determine if it is syntactically correct with respect to a defined grammar. The generation of pattern grammars can be a cumbersome process when many objects are involved. This has led to the problem of spatial grammar inference. Current approaches have used genetic algorithms and inductive techniques and have demonstrated limitations. Alternative approaches are needed that produce accurate grammars while remaining computationally efficient in light of the NP-hardness of the problem. Co-location rule mining techniques in the field of Knowledge Discovery and Data Mining address the complexity issue using neighborhood restrictions and pruning strategies based on monotonic Measures Of Interest. The goal of this research was to develop and evaluate an inductive method for inferring an adjacency grammar utilizing co-location rule mining techniques to gain efficiency while providing accurate and concise production sets. The method incrementally discovers, without supervision, adjacency patterns in spatial samples, relabels them via a production rule and repeats the procedure with the newly labeled regions. The resulting rules are used to form an adjacency grammar. Grammars were generated and evaluated within the context of a syntactic pattern recognition system that identifies landform patterns in terrain elevation datasets. The proposed method was tested using a k-fold cross-validation methodology. Two variations were also tested using unsupervised and supervised training, both with no rule pruning. Comparison of these variations with the proposed method demonstrated the effectiveness of rule pruning and rule discovery. 
Results showed that the proposed method of rule inference produced rulesets having recall, precision and accuracy values of 82.6%, 97.7% and 92.8%, respectively, which are similar to those using supervised training. These rulesets were also the smallest, had the lowest average number of rules fired in parsing, and had the shortest average parse time. The use of rule pruning substantially reduced rule inference time (104.4 s vs. 208.9 s). The neighborhood restriction used in adjacency calculations demonstrated linear complexity in the number of regions.
Gli stili APA, Harvard, Vancouver, ISO e altri
31

Schmid, Klaus Arthur [Verfasser], e Matthias [Akademischer Betreuer] Renz. "Searching and mining in enriched geo-spatial data / Klaus Arthur Schmid ; Betreuer: Matthias Renz". München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2016. http://d-nb.info/1122435746/34.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
32

Franzke, Maximilian [Verfasser], e Matthias [Akademischer Betreuer] Renz. "Querying and mining heterogeneous spatial, social, and temporal data / Maximilian Franzke ; Betreuer: Matthias Renz". München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2019. http://d-nb.info/1190563630/34.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
33

Yan, Ping. "SPATIAL-TEMPORAL DATA ANALYTICS AND CONSUMER SHOPPING BEHAVIOR MODELING". Diss., The University of Arizona, 2010. http://hdl.handle.net/10150/195232.

Testo completo
Abstract (sommario):
RFID technologies have recently been adopted in the retail space to track consumers' in-store movements. The RFID-collected data are location sensitive and constantly updated as a consumer moves inside a store. By capturing the entire shopping process, including the movement path, rather than analyzing merely the shopping basket at check-out, the RFID-collected data provide unique and exciting opportunities to study consumer purchase behavior and thus lead to actionable marketing applications. This dissertation research focuses on (a) advancing the representation and management of the RFID-collected shopping path data; and (b) analyzing, modeling and predicting customer shopping activities with a spatial pattern discovery approach and a dynamic probabilistic modeling based methodology to enable advanced spatial business intelligence. The spatial pattern discovery approach identifies similar consumers based on a similarity metric between consumer shopping paths. The direct applications of this approach include a novel consumer segmentation methodology and an in-store real-time product recommendation algorithm. A hierarchical decision-theoretic model based on dynamic Bayesian networks (DBN) is developed to model consumer in-store shopping activities. This model can be used to predict a shopper's purchase goal in real time, infer her shopping actions, and estimate the exact product she is viewing at a given time. We develop an approximate inference algorithm based on particle filters and a learning procedure based on the Expectation-Maximization (EM) algorithm to perform filtering and prediction for the network model. The developed models are tested on a real RFID-collected shopping trip dataset, with promising results in terms of the prediction accuracy of consumer purchase interests. This dissertation contributes to the marketing and information systems literature in several areas.
First, it provides empirical insights about the correlation between spatial movement patterns and consumer purchase interests. Such correlation is demonstrated with in-store shopping data, but can be generalized to other marketing contexts such as store visit decisions by consumers and location and category management decisions by a retailer. Second, our study shows the possibility of utilizing consumers' in-store movement to predict their purchases. The predictive models we developed have the potential to become the basis of an intelligent shopping environment in which store managers customize marketing efforts to provide location-aware recommendations to consumers as they travel through the store.
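The dissertation builds on a similarity metric between shopping paths; the abstract does not specify the metric, so the sketch below uses a generic stand-in, normalized longest-common-subsequence similarity over hypothetical aisle sequences, to illustrate the idea of comparing paths rather than baskets:

```python
from functools import lru_cache

def lcs_len(a, b):
    """Length of the longest common subsequence of two aisle sequences."""
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + rec(i + 1, j + 1)
        return max(rec(i + 1, j), rec(i, j + 1))
    return rec(0, 0)

def path_similarity(a, b):
    """Normalized similarity in [0, 1] between two shopping paths."""
    return lcs_len(tuple(a), tuple(b)) / max(len(a), len(b))

# Hypothetical in-store paths as sequences of visited store areas.
p1 = ["entrance", "produce", "dairy", "bakery", "checkout"]
p2 = ["entrance", "produce", "meat", "dairy", "checkout"]
p3 = ["entrance", "electronics", "checkout"]
print(path_similarity(p1, p2))  # largely overlapping routes
print(path_similarity(p1, p3))  # mostly different routes
```

A pairwise matrix of such scores could then feed any standard clustering routine to group similar shoppers, which is the role the similarity metric plays in the segmentation methodology described above.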
Gli stili APA, Harvard, Vancouver, ISO e altri
34

Du, Xiaoxi. "Migration Motif: A Spatial-Temporal Pattern Mining Approach for Financial Markets". [Kent, Ohio] : Kent State University, 2009. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=kent1239139458.

Testo completo
Abstract (sommario):
Thesis (M.S.)--Kent State University, 2009.
Title from PDF t.p. (viewed Nov. 13, 2009). Advisor: Ruoming Jin. Keywords: migration motif, trajectory mining, sequential pattern mining, time series clustering. Includes bibliographical references (p. 47-57).
Gli stili APA, Harvard, Vancouver, ISO e altri
35

Wang, Xiaofeng. "New Procedures for Data Mining and Measurement Error Models with Medical Imaging Applications". Case Western Reserve University School of Graduate Studies / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=case1121447716.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
36

Kucuktunc, Onur. "Result Diversification on Spatial, Multidimensional, Opinion, and Bibliographic Data". The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374148621.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
37

Khiali, Lynda. "Fouille de données à partir de séries temporelles d’images satellites". Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS046/document.

Testo completo
Abstract (sommario):
Nowadays, remotely sensed images constitute a rich source of information that can be leveraged to support several applications, including risk prevention, land use planning, land cover classification and many other tasks. In this thesis, Satellite Image Time Series (SITS) are analysed to depict the dynamics of natural and semi-natural habitats. The objective is to identify, organize and highlight the evolution patterns of these areas. We introduce an object-oriented method to analyse SITS that considers segmented satellite images. Firstly, we identify the evolution profiles of the objects in the time series. Then, we analyse these profiles using machine learning methods. To identify the evolution profiles, we explore all the objects to select a subset of objects (spatio-temporal entities/reference objects) to be tracked. The evolution of the selected spatio-temporal entities is described using evolution graphs. To analyse these evolution graphs, we introduce three contributions. The first contribution explores annual SITS. It analyses the evolution graphs using clustering algorithms, to identify similar evolutions among the spatio-temporal entities. In the second contribution, we perform a multi-annual cross-site analysis. We consider several study areas described by multi-annual SITS. We use clustering algorithms to identify intra- and inter-site similarities. In the third contribution, we introduce a semi-supervised method based on constrained clustering. We propose a method to select the constraints that will be used to guide the clustering and adapt the results to the user's needs. Our contributions were evaluated on several study areas. The experimental results allow relevant landscape evolutions to be pinpointed in each study site. We also identify the evolutions common to the different sites. In addition, the constraint selection method proposed for the constrained clustering allows relevant entities to be identified. Thus, the results obtained using unsupervised learning were improved and adapted to meet the user's needs.
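The constrained-clustering idea of the third contribution can be illustrated by the constraint check itself: must-link pairs have to share a cluster, cannot-link pairs must not. The entities, constraints and candidate partitions below are invented for illustration only:

```python
def satisfies(partition, must_link, cannot_link):
    """True if a partition (entity -> cluster id) respects every constraint."""
    return (all(partition[a] == partition[b] for a, b in must_link) and
            all(partition[a] != partition[b] for a, b in cannot_link))

# Hypothetical spatio-temporal entities with user-supplied constraints.
must_link = [("meadow_1", "meadow_2")]        # known to evolve similarly
cannot_link = [("meadow_1", "vineyard_1")]    # known to evolve differently

candidates = [
    {"meadow_1": 0, "meadow_2": 0, "vineyard_1": 1},   # respects both
    {"meadow_1": 0, "meadow_2": 1, "vineyard_1": 1},   # breaks the must-link
]
valid = [p for p in candidates if satisfies(p, must_link, cannot_link)]
print(valid)
```

A constrained clustering algorithm embeds this check inside cluster assignment rather than filtering finished partitions, but the acceptance criterion is the same.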
Gli stili APA, Harvard, Vancouver, ISO e altri
38

Schubert, Erich [Verfasser], e Hans-Peter [Akademischer Betreuer] Kriegel. "Generalized and efficient outlier detection for spatial, temporal, and high-dimensional data mining / Erich Schubert. Betreuer: Hans-Peter Kriegel". München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2013. http://d-nb.info/1048522377/34.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
39

Daniel, Guilherme Priólli [UNESP]. "Otimização de algoritmos de agrupamento espacial baseado em densidade aplicados em grandes conjuntos de dados". Universidade Estadual Paulista (UNESP), 2016. http://hdl.handle.net/11449/143832.

Testo completo
Abstract (sommario):
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
A quantidade de dados gerenciados por serviços Web de grande escala tem crescido significantemente e passaram a ser chamados de Big Data. Esses conjuntos de dados podem ser definidos como um grande volume de dados complexos provenientes de múltiplas fontes que ultrapassam a capacidade de armazenamento e processamento dos computadores atuais. Dentro desses conjuntos, estima-se que 80% dos dados possuem associação com alguma posição espacial. Os dados espaciais são mais complexos e demandam mais tempo de processamento que os dados alfanuméricos. Nesse sentido, as técnicas de MapReduce e sua implementação têm sido utilizadas a fim de retornar resultados em tempo hábil com a paralelização dos algoritmos de prospecção de dados. Portanto, o presente trabalho propõe dois algoritmos de agrupamento espacial baseado em densidade: o VDBSCAN-MR e o OVDBSCAN-MR. Ambos os algoritmos utilizam técnicas de processamento distribuído e escalável baseadas no modelo de programação MapReduce com intuito de otimizar o desempenho e permitir a análise em conjuntos Big Data. Por meio dos experimentos realizados foi possível verificar que os algoritmos desenvolvidos apresentaram melhor qualidade nos agrupamentos encontrados em comparação com os algoritmos tomados como base. Além disso, o VDBSCAN-MR obteve um melhor desempenho que o algoritmo sequencial e suportou a aplicação em grandes conjuntos de dados espaciais.
The amount of data managed by large-scale Web services has increased significantly, giving rise to what is now called Big Data. These data sets can be defined as a large volume of complex data from multiple data sources exceeding the storage and processing capacity of current computers. In such data sets, about 80% of the data is associated with some spatial position. Spatial data is even more complex and requires more processing time than alphanumeric data. In this context, MapReduce techniques and their implementations have been used to return results in a timely manner by parallelizing data mining algorithms, making them applicable to Big Data sets. Therefore, this work develops two density-based spatial clustering algorithms: VDBSCAN-MR and OVDBSCAN-MR. Both algorithms use distributed and scalable processing techniques based on the MapReduce programming model in order to optimize performance and enable Big Data analysis. Through experimentation, we observed that the developed algorithms produce better-quality clusters than the base algorithms. Furthermore, VDBSCAN-MR achieved better performance than the original sequential algorithm and supported application to large spatial data sets.
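The general pattern behind MapReduce density-based clustering can be sketched serially: a map step partitions points into grid cells (duplicating points near cell borders so each partition is self-sufficient), and a reduce step runs a local DBSCAN per cell. This is a minimal single-process illustration of the pattern, not the VDBSCAN-MR/OVDBSCAN-MR algorithms themselves; cell size, eps and min_pts below are arbitrary.

```python
import math
from collections import defaultdict

def partition(points, cell, eps):
    """Map step: assign each 2-D point to its grid cell, duplicating points
    that fall within eps of a neighbouring cell's border (halo), so each
    partition can later be clustered independently."""
    parts = defaultdict(list)
    for x, y in points:
        cx, cy = int(x // cell), int(y // cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = cx + dx, cy + dy
                lo_x, lo_y = nx * cell, ny * cell
                # Keep the copy only if the point lies within eps of cell (nx, ny).
                if (lo_x - eps <= x <= lo_x + cell + eps
                        and lo_y - eps <= y <= lo_y + cell + eps):
                    parts[(nx, ny)].append((x, y))
    return parts

def dbscan(points, eps, min_pts):
    """Reduce step: naive O(n^2) DBSCAN inside one partition.
    Returns {point: cluster_id}, with -1 for noise."""
    labels, cid = {}, 0
    for p in points:
        if p in labels:
            continue
        neigh = [q for q in points if math.dist(p, q) <= eps]
        if len(neigh) < min_pts:
            labels[p] = -1            # noise (may later become a border point)
            continue
        cid += 1
        labels[p] = cid
        stack = [q for q in neigh if q != p]
        while stack:
            q = stack.pop()
            if labels.get(q, -1) == -1:
                labels[q] = cid
                qn = [r for r in points if math.dist(q, r) <= eps]
                if len(qn) >= min_pts:          # q is a core point too
                    stack.extend(r for r in qn if r not in labels)
    return labels
```

In a real MapReduce job, the reduce outputs would still need a merge phase to reconcile cluster ids across the halo copies; that step is omitted here.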
Gli stili APA, Harvard, Vancouver, ISO e altri
40

Mendez, Chaves Diego. "A Framework for Participatory Sensing Systems". Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4135.

Testo completo
Abstract (sommario):
Participatory sensing (PS) systems are a new emerging sensing paradigm based on the participation of cellular users in a cooperative way. Due to the spatio-temporal granularity that a PS system can provide, it is now possible to detect and analyze events that occur at different scales, at a low cost. While PS systems present interesting characteristics, they also create new problems. Since the measuring devices are cheaper and they are in the hands of the users, PS systems face several design challenges related to the poor accuracy and high failure rate of the sensors, the possibility of malicious users tampering with the data, the violation of the privacy of the users, methods to encourage the participation of the users, and the effective visualization of the data. This dissertation presents four main contributions in order to solve some of these challenges. It presents a framework to guide the design and implementation of PS applications considering all these aspects. The framework consists of five modules: sample size determination, data collection, data verification, data visualization, and density maps generation modules. The remaining contributions map one-to-one to three of the modules of this framework: data verification, data visualization, and density maps. Data verification, in the context of PS, consists of the process of detecting and removing spatial outliers to properly reconstruct the variables of interest. A new algorithm for spatial outlier detection and removal is proposed, implemented, and tested. This hybrid neighborhood-aware algorithm considers the uneven spatial density of the users, the number of malicious users, the level of conspiracy, and the lack of accuracy and malfunctioning sensors. The experimental results show that the proposed algorithm performs as well as the best estimator while reducing the execution time considerably.
The problem of data visualization in the context of PS applications is also of special interest. The characteristics of a typical PS application imply the generation of multivariate time-space series with many gaps in time and space. Considering this, a new method is presented based on the kriging technique along with Principal Component Analysis and Independent Component Analysis. Additionally, a new technique to interpolate data in time and space is proposed, which is more appropriate for PS systems. The results indicate that the accuracy of the estimates improves with the amount of data, i.e., one variable, multiple variables, and space and time data. Also, the results clearly show the advantage of a PS system compared with a traditional measuring system in terms of the precision and spatial resolution of the information provided to the users. One key challenge in PS systems is determining the locations and number of users from which to obtain samples so that the variables of interest can be accurately represented with a low number of participants. To address this challenge, the use of density maps is proposed, a technique that is based on the current estimations of the variable. The density maps are then utilized by the incentive mechanism in order to encourage the participation of those users indicated in the map. The experimental results show how the density maps greatly improve the quality of the estimations while maintaining a stable and low total number of users in the system. P-Sense, a PS system to monitor pollution levels, has been implemented and tested, and is used as a validation example for all the contributions presented here. P-Sense integrates gas and environmental sensors with a cell phone in order to monitor air quality levels.
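The spatial-outlier verification step can be illustrated with a far simpler neighbourhood test than the dissertation's hybrid algorithm: compare each reading with the mean of its spatial neighbours and flag readings whose deviation is a statistical outlier. A minimal sketch, with hypothetical function name and data layout:

```python
import math

def spatial_outliers(readings, eps, threshold=2.0):
    """Flag readings whose value deviates strongly from the mean of their
    spatial neighbours (a basic neighbourhood-aware test, NOT the
    dissertation's algorithm). readings: list of (x, y, value) tuples.
    Returns the indices of flagged readings."""
    diffs = []
    for i, (x, y, v) in enumerate(readings):
        neigh = [w for j, (a, b, w) in enumerate(readings)
                 if j != i and math.dist((x, y), (a, b)) <= eps]
        # Deviation of this reading from its local neighbourhood mean.
        diffs.append((v - sum(neigh) / len(neigh)) if neigh else 0.0)
    mu = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - mu) ** 2 for d in diffs) / len(diffs)) or 1.0
    # Flag readings whose deviation is more than `threshold` std-devs away.
    return [i for i, d in enumerate(diffs) if abs(d - mu) / sd > threshold]
```

On a line of well-behaved sensors with one tampered value, only the tampered reading exceeds the two-sigma threshold.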
Gli stili APA, Harvard, Vancouver, ISO e altri
41

Fu, Kaiqun. "Spatiotemporal Event Forecasting and Analysis with Ubiquitous Urban Sensors". Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/104165.

Testo completo
Abstract (sommario):
The study of information extraction and knowledge exploration in the urban environment is gaining popularity. Ubiquitous sensors and a plethora of statistical reports provide an immense amount of heterogeneous urban data, such as traffic data, crime activity statistics, social media messages, and street imagery. The development of methods for heterogeneous urban data-based event identification and impact analysis, for a variety of event topics and assumptions, is the subject of this dissertation. A graph convolutional neural network for crime prediction, a multitask learning system for traffic incident prediction with spatiotemporal feature learning, social media-based transportation event detection, and a graph convolutional network-based cyberbullying detection algorithm are the four methods proposed. Additionally, based on the sensitivity of these urban sensor data, a comprehensive discussion of the ethical issues of urban computing is presented. This work makes the following contributions in urban perception prediction: 1) creating a preference learning system for inferring crime rankings from street view images using a bidirectional convolutional neural network (bCNN); 2) proposing a graph convolutional network-based solution to the current urban crime perception problem; and 3) developing street view image retrieval algorithms to demonstrate real city perception. This work also makes the following contributions in traffic incident effect analysis: 1) developing a novel machine learning system for predicting traffic incident duration using temporal features; 2) modeling traffic speed similarity among road segments using spatial connectivity in feature space; and 3) proposing a sparse feature learning method for identifying groups of temporal features at a higher level.
In transportation-related incident detection, this work makes the following contributions: 1) creating a real-time social media-based traffic incident detection platform; 2) proposing a query expansion algorithm for traffic-related tweets; and 3) developing a text summarization tool for redundant traffic-related tweets. Cyberbullying detection on social media platforms is one of the major focuses of this work: 1) developing an online Dynamic Query Expansion process using concatenated keyword search; 2) formulating a graph structure of tweet embeddings and implementing a Graph Convolutional Network for fine-grained cyberbullying classification; and 3) curating a balanced multiclass cyberbullying dataset from DQE and making it publicly available. Additionally, this work seeks to identify ethical vulnerabilities in three primary research directions of urban computing: urban safety analysis, urban transportation analysis, and social media analysis for urban events. Visions for future improvements from the perspective of ethics are addressed.
Doctor of Philosophy
The ubiquitously deployed urban sensors, such as traffic speed meters, street-view cameras, and even the smartphones in everybody's pockets, are generating terabytes of data every hour. How to refine valuable intelligence out of such an explosion of urban data has become one of the most profitable questions in the fields of data mining and urban computing. In this dissertation, four innovative applications are proposed to solve real-world problems with big data from urban sensors. In addition, the foreseeable ethical vulnerabilities in the research fields of urban computing and event prediction are addressed. The first work explores the connection between urban perception and crime inference. StreetNet is proposed to learn crime rankings from street view images. This work presents the design of a street view image retrieval algorithm to improve the representation of urban perception. A data-driven, spatiotemporal algorithm is proposed to find unbiased label mappings between the street view images and the crime ranking records. The second work proposes a traffic incident duration prediction model that simultaneously predicts the impact of the traffic incidents and identifies the critical groups of temporal features via a multi-task learning framework. Such functionality is helpful for transportation operators and first responders in judging the influence of traffic incidents. In the third work, a social media-based traffic status monitoring system is established. The system is initiated by a transportation-related keyword generation process. A state-of-the-art tweet summarization algorithm is designed to eliminate redundant tweet information. In addition, we show that the proposed tweet query expansion algorithm outperforms previous methods.
The fourth work aims to investigate the viability of an automatic multiclass cyberbullying detection model that is able to classify whether a cyberbully is targeting a victim's age, ethnicity, gender, religion, or other quality. This work represents a step forward for establishing an active anti-cyberbullying presence in social media and a step forward towards a future without cyberbullying. Finally, a discussion of the ethical issues in the urban computing community is addressed. This work seeks to identify ethical vulnerabilities from three primary research directions of urban computing: urban safety analysis, urban transportation analysis, and social media analysis for urban events. Visions for future improvements in the perspective of ethics are pointed out.
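The idea behind a query expansion step like the one mentioned above can be sketched very simply: score candidate terms by how often they co-occur with the current seed keywords, then append the strongest ones to the query. This is a generic one-round co-occurrence sketch, not the dissertation's Dynamic Query Expansion algorithm; the toy documents are hypothetical.

```python
from collections import Counter

def expand_query(seeds, documents, top_n=3):
    """One round of a simplified query expansion: count how often each
    non-seed term co-occurs with the seed keywords, keep the top_n terms."""
    scores = Counter()
    seed_set = set(seeds)
    for doc in documents:
        terms = set(doc.lower().split())
        if terms & seed_set:              # document matches the current query
            for t in terms - seed_set:
                scores[t] += 1
    return list(seeds) + [t for t, _ in scores.most_common(top_n)]
```

A real system would iterate this until the keyword set stabilizes and would weight terms (e.g. by TF-IDF) rather than raw counts.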
Gli stili APA, Harvard, Vancouver, ISO e altri
42

Zhou, Guoqing. "Co-Location Decision Tree for Enhancing Decision-Making of Pavement Maintenance and Rehabilitation". Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/26059.

Testo completo
Abstract (sommario):
A pavement management system (PMS) is a valuable tool and one of the critical elements of the highway transportation infrastructure. Since a vast amount of pavement data is frequently and continuously being collected, updated, and exchanged due to rapidly deteriorating road conditions, increased traffic loads, and shrinking funds, resulting in the rapid accumulation of a large pavement database, knowledge-based expert systems (KBESs) have been developed to solve various transportation problems. This dissertation presents the development of the theory and algorithm for a new decision tree induction method, called the co-location-based decision tree (CL-DT). This method will enhance the decision-making abilities of pavement maintenance personnel and their rehabilitation strategies. The idea stems from shortcomings in traditional decision tree induction algorithms when applied to pavement treatment strategies. The proposed algorithm utilizes the co-location (co-occurrence) characteristics of spatial attribute data in the pavement database. With the proposed algorithm, one distinct event occurrence can associate with two or more attribute values that occur simultaneously in the spatial and temporal domains. This dissertation describes the details of the proposed CL-DT algorithm and the steps for realizing it. First, it describes the detailed co-location mining algorithm, including spatial attribute data selection in pavement databases, the determination of candidate co-locations, the determination of table instances of candidate co-locations, pruning the non-prevalent co-locations, and induction of co-location rules. In this step, a hybrid constraint is developed, i.e., a spatial geometric distance constraint condition and a distinct event-type constraint condition.
The spatial geometric distance constraint condition is a neighborhood relationship-based spatial join of table instances relating many prevalent co-locations to one prevalent co-location; the distinct event-type constraint condition is a Euclidean distance between a set of attributes and the corresponding cluster center of attributes. The dissertation research also developed a spatial feature pruning method using the multi-resolution pruning criterion. The cross-correlation criterion of spatial features is used to remove the non-prevalent co-locations from the candidate prevalent co-location set under a given threshold. The dissertation research focused on the development of the co-location decision tree (CL-DT) algorithm, which includes non-spatial attribute data selection in the pavement management database, co-location algorithm modeling, node merging criteria, and co-location decision tree induction. In this step, co-location mining rules are used to guide the decision tree generation and induce decision rules. For each step, this dissertation gives detailed flowcharts, such as the flowchart of co-location decision tree induction, the co-location/co-occurrence decision tree (CL-DT) algorithm, and an outline of the steps of the SFS (Sequential Feature Selection) algorithm. Finally, this research used a pavement database covering four counties, provided by NCDOT (North Carolina Department of Transportation), to verify and test the proposed method. Comparison analyses of the different rehabilitation treatments proposed by NCDOT, by the traditional DT induction algorithm, and by the proposed new method are conducted.
Findings and conclusions include: (1) traditional DT technology can make a consistent decision for road maintenance and rehabilitation strategy under the same road conditions, i.e., with less interference from human factors; (2) traditional DT technology can increase the speed of decision-making because it automatically generates a decision tree and rules if the expert knowledge is given, which saves time and expense for the PMS; (3) integration of the DT and GIS can provide the PMS with the capabilities of graphically displaying treatment decisions, visualizing the attribute and non-attribute data, and linking data and information to geographical coordinates. However, traditional DT induction methods are not quite as intelligent as one might expect. Thus, post-processing and refinement is necessary. Moreover, traditional DT induction methods for pavement M&R strategies use only non-spatial attribute data. It has been demonstrated in this dissertation research that spatial data is very useful for improving decision-making processes for pavement treatment strategies. In addition, the decision trees are based on the knowledge acquired from pavement management engineers for strategy selection. Thus, different decision trees can be built if the requirements change.
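The candidate-generation and pruning steps of standard co-location mining (on which the dissertation builds) can be illustrated for the size-2 case: collect pairs of events of distinct feature types that lie within a distance threshold, then compute each pattern's participation index so non-prevalent candidates can be pruned. A hedged generic sketch, not the CL-DT algorithm itself:

```python
import math
from collections import Counter, defaultdict

def colocation_pairs(events, d):
    """Table instances of size-2 co-location candidates: pairs of events of
    distinct feature types within distance d. events: (feature, x, y)."""
    pairs = defaultdict(list)
    for i, (f1, x1, y1) in enumerate(events):
        for f2, x2, y2 in events[i + 1:]:
            if f1 != f2 and math.dist((x1, y1), (x2, y2)) <= d:
                key = tuple(sorted((f1, f2)))
                pairs[key].append(((f1, x1, y1), (f2, x2, y2)))
    return dict(pairs)

def participation_index(pairs, events, pattern):
    """Minimum, over the pattern's features, of the fraction of that
    feature's events appearing in at least one table instance; candidates
    below a prevalence threshold would be pruned."""
    totals = Counter(f for f, _, _ in events)
    member = defaultdict(set)
    for inst in pairs.get(pattern, []):
        for f, x, y in inst:
            member[f].add((x, y))
    return min(len(member[f]) / totals[f] for f in pattern)
```

In the pavement setting, a "feature" would be an attribute value such as a distress type, and the distance threshold plays the role of the spatial geometric distance constraint.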
Ph. D.
Gli stili APA, Harvard, Vancouver, ISO e altri
43

Ågren, Ola. "Finding, extracting and exploiting structure in text and hypertext". Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-22352.

Testo completo
Abstract (sommario):
Data mining is a fast-developing field of study, using computations to either predict or describe large amounts of data. The increase in data produced each year goes hand in hand with this, requiring algorithms that are more and more efficient in order to find interesting information within a given time. In this thesis, we study methods for extracting information from semi-structured data, for finding structure within large sets of discrete data, and for efficiently ranking web pages in a topic-sensitive way. The information extraction research focuses on support for keeping both documentation and source code up to date at the same time. Our approach to this problem is to embed parts of the documentation within strategic comments of the source code and then extract them by using a specific tool. The structures that our structure mining algorithms are able to find among crisp data (such as keywords) are in the form of subsumptions, i.e. one keyword is a more general form of the other. We can use these subsumptions to build larger structures in the form of hierarchies or lattices, since subsumptions are transitive. Our tool has been used mainly as input to data mining systems and for visualisation of data-sets. The main part of the research has been on ranking web pages in such a way that both the link structure between pages and the content of each page matters. We have created a number of algorithms and compared them to other algorithms in use today. Our focus in these comparisons has been on convergence rate, algorithm stability, and how relevant the answer sets from the algorithms are according to real-world users. The research has focused on the development of efficient algorithms for gathering and handling large data-sets of discrete and textual data. A proposed system of tools is described, all operating on a common database containing "fingerprints" and meta-data about items.
This data could be searched by various algorithms to increase its usefulness or to find the real data more efficiently. All of the methods described handle data in a crisp manner, i.e. a word or a hyper-link either is or is not a part of a record or web page. This means that we can model their existence in a very efficient way. The methods and algorithms that we describe all make use of this fact.
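The kind of topic-sensitive link-based ranking the thesis studies can be illustrated with the classic power-iteration formulation, in which the random-surfer teleport jump goes only to pages in a chosen topic set. This is a generic textbook sketch (not the thesis's own algorithms such as ProT); the link graph is hypothetical.

```python
def topic_pagerank(links, topic, d=0.85, iters=50):
    """Power iteration for topic-sensitive PageRank.
    links: {page: [outlinked pages]}; topic: set of pages the teleport
    jump is restricted to. Returns {page: rank}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    jump = {p: (1.0 / len(topic) if p in topic else 0.0) for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) * jump[p] for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs) if outs else 0.0
            for q in outs:
                new[q] += d * share
            if not outs:                 # dangling page: mass goes to topic
                for q in pages:
                    new[q] += d * rank[p] * jump[q]
        rank = new
    return rank
```

Convergence rate and stability of iterations like this one are exactly the properties the thesis compares across algorithms.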
Informationsutvinning (som ofta kallas data mining även på svenska) är ett forskningsområde som hela tiden utvecklas. Det handlar om att använda datorer för att hitta mönster i stora mängder data, alternativt förutsäga framtida data utifrån redan tillgänglig data. Eftersom det samtidigt produceras mer och mer data varje år ställer detta högre och högre krav på effektiviteten hos de algoritmer som används för att hitta eller använda informationen inom rimlig tid. Denna avhandling handlar om att extrahera information från semi-strukturerad data, att hitta strukturer i stora diskreta datamängder och att på ett effektivt sätt rangordna webbsidor utifrån ett ämnesbaserat perspektiv. Den informationsextraktion som beskrivs handlar om stöd för att hålla både dokumentationen och källkoden uppdaterad samtidigt. Vår lösning på detta problem är att låta delar av dokumentationen (främst algoritmbeskrivningen) ligga som blockkommentarer i källkoden och extrahera dessa automatiskt med ett verktyg. De strukturer som hittas av våra algoritmer för strukturextraktion är i form av underordnanden, exempelvis att ett visst nyckelord är mer generellt än ett annat. Dessa samband kan utnyttjas för att skapa större strukturer i form av hierarkier eller riktade grafer, eftersom underordnandena är transitiva. Det verktyg som vi har tagit fram har främst använts för att skapa indata till ett informationsutvinningssystem samt för att kunna visualisera indatan. Huvuddelen av den forskning som beskrivs i denna avhandling har dock handlat om att kunna rangordna webbsidor utifrån både deras innehåll och länkarna som finns mellan dem. Vi har skapat ett antal algoritmer och visat hur de beter sig i jämförelse med andra algoritmer som används idag. Dessa jämförelser har huvudsakligen handlat om konvergenshastighet, algoritmernas stabilitet givet osäker data och slutligen hur relevant algoritmernas svarsmängder har ansetts vara utifrån användarnas perspektiv. 
Forskningen har varit inriktad på effektiva algoritmer för att hämta in och hantera stora datamängder med diskreta eller textbaserade data. I avhandlingen presenterar vi även ett förslag till ett system av verktyg som arbetar tillsammans på en databas bestående av “fingeravtryck” och annan meta-data om de saker som indexerats i databasen. Denna data kan sedan användas av diverse algoritmer för att utöka värdet hos det som finns i databasen eller för att effektivt kunna hitta rätt information.
AlgExt, CHiC, ProT
Gli stili APA, Harvard, Vancouver, ISO e altri
44

Cavazzi, Stefano. "Spatial scale analysis of landscape processes for digital soil mapping in Ireland". Thesis, Cranfield University, 2013. http://dspace.lib.cranfield.ac.uk/handle/1826/8591.

Testo completo
Abstract (sommario):
Soil is one of the most precious resources on Earth because of its role in storing and recycling water and nutrients essential for life, providing a variety of ecosystem services. This vulnerable resource is at risk from degradation by erosion, salinity, contamination and other effects of mismanagement. Information from soil is therefore crucial for its sustainable management. While the demand for soil information is growing, the quantity of data collected in the field is decreasing due to financial constraints. Digital Soil Mapping (DSM) supports the creation of geographically referenced soil databases generated by using field observations or legacy data coupled, through quantitative relationships, with environmental covariates. This enables the creation of soil maps at unexplored locations at reduced costs. The selection of an optimal scale for environmental covariates is still an unsolved issue affecting the accuracy of DSM. The overall aim of this research was to explore the effect of spatial scale alterations of environmental covariates in DSM. Three main targets were identified: assessing the impact of spatial scale alterations on classifying soil taxonomic units; investigating existing approaches from related scientific fields for the detection of scale patterns; and finally enabling practitioners to find a suitable scale for environmental covariates by developing a new methodology for spatial scale analysis in DSM. Three study areas, covered by detailed reconnaissance soil survey, were identified in the Republic of Ireland. Their different pedological and geomorphological characteristics allowed us to test scale behaviours across the spectrum of conditions present in the Irish landscape. The investigation started by examining the effects of scale alteration of the finest resolution environmental covariate, the Digital Elevation Model (DEM), on the classification of soil taxonomic units.
Empirical approaches from related scientific fields were subsequently selected from the literature, applied to the study areas and compared with the experimental methodology. Wavelet analysis was also employed to decompose the DEMs into a series of independent components at varying scales, which were then used in DSM analysis of soil taxonomic units. Finally, a new multiscale methodology was developed and evaluated against the previously presented experimental results. The results obtained by the experimental methodology proved the significant role of scale alterations in the classification accuracy of soil taxonomic units, challenging the common practice of using the finest available resolution of DEM in DSM analysis. The set of eight empirical approaches selected from the literature was shown to have a detrimental effect on the selection of an optimal DEM scale for DSM applications. Wavelet analysis was shown to be effective in removing DEM sources of variation, increasing DSM model performance by spatially decomposing the DEM. Finally, my main contribution to knowledge has been the development of a new multiscale methodology for DSM applications, combining a DEM segmentation technique, performed by k-means clustering of local variogram parameters calculated in a moving window, with an experimental methodology altering DEM scales. The newly developed multiscale methodology offers a way to significantly improve the classification accuracy of soil taxonomic units in DSM. In conclusion, this research has shown that spatial scale analysis of environmental covariates significantly enhances the practice of DSM, improving the overall classification accuracy of soil taxonomic units. The newly developed multiscale methodology can be successfully integrated into current DSM analysis of soil taxonomic units performed with data mining techniques, so advancing the practice of soil mapping.
The future of DSM, as it successfully progresses from the early pioneering years into an established discipline, will have to include scale and in particular multiscale investigations in its methodology. DSM will have to move from a methodology of spatial data with scale to a spatial scale methodology. It is now time to consider scale as a key soil and modelling attribute in DSM.
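The moving-window idea underlying the thesis's DEM segmentation can be illustrated with a much cruder statistic: the mean squared difference between horizontally adjacent cells inside a window, a simple stand-in for the local variogram parameters that would feed the k-means segmentation. A hedged sketch with a hypothetical function name and toy DEM:

```python
def local_roughness(dem, w):
    """Moving-window mean squared difference between horizontally adjacent
    cells (a crude proxy for a lag-1 experimental variogram value).
    dem: 2-D list of elevations; w: half-window size in cells."""
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            diffs = []
            for i in range(max(0, r - w), min(rows, r + w + 1)):
                # j + 1 must stay inside the row, hence the cols - 1 bound.
                for j in range(max(0, c - w), min(cols - 1, c + w + 1)):
                    diffs.append((dem[i][j + 1] - dem[i][j]) ** 2)
            out[r][c] = sum(diffs) / len(diffs) if diffs else 0.0
    return out
```

Per-cell maps like this, computed at several window sizes, are the kind of input a k-means segmentation of the landscape into scale-homogeneous regions could cluster.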
Gli stili APA, Harvard, Vancouver, ISO e altri
45

Jguirim, Ines. "Modélisation et génération d'itinéraires contextuels d'activités urbaines dans la ville". Thesis, Brest, 2016. http://www.theses.fr/2016BRES0074/document.

Testo completo
Abstract (sommario):
La ville est une agrégation urbaine permettant d’offrir divers services à ses citadins. Elle constitue un système complexe qui dépend de plusieurs facteurs sociaux et économiques. La configuration de l’espace influence d’une manière importante l’accessibilité aux différentes fonctionnalités de la ville. L’analyse spatiale de la structure urbaine est réalisée sur les villes afin d’étudier les caractéristiques de l’espace et pouvoir évaluer son potentiel fonctionnel. L’enjeu de la thèse est de proposer une approche d’analyse spatiale qui prenne en compte les différents aspects structurels et sémantiques de la ville. Un modèle basé sur les graphes a été proposé pour représenter le réseau de transport multimodal de la ville qui garantit l’accessibilité aux différents points d’intérêt. Les super-réseaux ont été utilisés pour intégrer la possibilité d’un transfert intermodal dans le modèle de transport par des liens d’interdépendance entre les sous-graphes associés aux différents modes de transport. L’aspect temporel a été représenté dans le modèle par des attributs spécifiant les contraintes temporelles caractérisant le parcours de chaque noeud et chaque arc tels que le temps d’exploration, le temps d’attente et le temps requis pour les pénalités routières. L’aspect fonctionnel est introduit par le concept d’activité. Nous avons proposé un modèle conceptuel qui vise à modéliser les différents éléments contextuels qui peuvent affecter la planification et l’exécution des activités urbaines tels que le cadre spatio-temporel et le profil de l’utilisateur. Ce modèle conceptuel de données a été enrichi par un système de gestion de connaissances qui vise à représenter des informations sur les comportements des individus dans le cadre d’une activité selon les profils et le contexte spatio-temporel. 
Nous nous basons sur des données collectées dans le cadre d’une enquête de déplacement pour l’extraction de connaissances à l’aide d’algorithmes de classement et de recherche de motifs séquentiels. Les connaissances extraites sont représentées par un système de gestion de règles permettant la planification contextuelle de l’activité à partir d’un programme d’activité adapté à un profil donné, des itinéraires assurant la réalisation de l’activité sont générés en formant un réseau d’activité contextuel. L’algorithme de recherche d’itinéraires s’appuie sur l’algorithme A* qui permet, à travers une fonction heuristique, la réduction de la complexité de la recherche en prenant en compte l’aspect temporel de l’activité et la structure multimodale de réseau. L’expérimentation de l’approche a été réalisée sur quatre villes Françaises dans l’objectif de générer des réseaux thématiques associés aux différentes activités réalisées par des profils différents. L’aspect fonctionnel représenté dans ces réseaux fait l’objet d’une analyse spatiale qui consiste à étudier la configuration de l’espace tout en prenant en compte l’usage contextuel des utilisateurs. L’analyse est basée sur les opérateurs de centralité définis par la syntaxe spatiale ainsi que des opérateurs d’étude de couverture des réseaux thématiques originaux
The city is an urban aggregation that offers diverse services to its city-dwellers. It constitutes a complex system which depends on several social and economic factors. The configuration of space strongly influences the accessibility of the city's various functions. Spatial analysis of the urban structure is carried out on cities to study the characteristics of space and to estimate its functional potential. The aim of the thesis is to propose an approach to spatial analysis which takes into account the various structural and semantic aspects of the city. A model based on graphs is proposed to represent the multimodal transport network of the city, which guarantees accessibility to the various points of interest. Super-networks are used to integrate the possibility of intermodal transfer into the transport model through interdependence links between the sub-graphs associated with the various modes of transport. The temporal aspect is represented in the model by attributes specifying the temporal constraints characterizing the traversal of every node and every edge, such as the exploration time, the waiting time and the time required for road penalties. The functional aspect is introduced by the concept of activity. We propose a conceptual model which aims to capture the various contextual elements that can affect the planning and execution of urban activities, such as the spatio-temporal frame and the profile of the user. This model is enriched by a knowledge management component which aims to represent information about individual behaviours. The extracted knowledge is represented by a rule management system allowing the contextual planning of the activity.
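The itinerary search described in the French abstract relies on the A* algorithm with a heuristic to prune the search. The sketch below is a generic A* over a weighted graph in which transfer and waiting times are assumed to be folded into the edge weights; the node names and the zero heuristic are illustrative, not the thesis's actual network model.

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """A* shortest-itinerary search. graph: {node: [(neighbour, minutes)]};
    heuristic(n) must never overestimate the remaining travel time.
    Returns (total_minutes, path)."""
    open_set = [(heuristic(start), 0.0, start, [start])]
    best = {start: 0.0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return g, path
        if g > best.get(node, float("inf")):
            continue                      # stale heap entry, skip it
        for nxt, w in graph.get(node, []):
            ng = g + w
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(open_set,
                               (ng + heuristic(nxt), ng, nxt, path + [nxt]))
    return float("inf"), []
```

In a multimodal super-network, mode changes would appear as extra interdependence edges whose weights carry the transfer penalty, and a geographic lower bound (e.g. straight-line distance at the fastest speed) would replace the zero heuristic.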
APA, Harvard, Vancouver, ISO and other styles
46

Remes, J. (Jukka). "Method evaluations in spatial exploratory analyses of resting-state functional magnetic resonance imaging data". Doctoral thesis, Oulun yliopisto, 2013. http://urn.fi/urn:isbn:9789526202228.

Full text
Abstract (summary):
Abstract Resting-state (RS) measurements during functional magnetic resonance imaging (fMRI) have become an established approach for studying spontaneous brain activity. RS-fMRI results are often obtained using explorative approaches such as spatial independent component analysis (sICA). These approaches and their software implementations are rarely evaluated extensively or specifically for RS-fMRI. Trust is placed in the software to work according to the published method descriptions. Many methods and parameters are used despite the lack of test data, and the validity of the underlying models remains an open question. A substantially greater number of evaluations would be needed to ensure the quality of exploratory RS-fMRI analyses. This thesis investigates the applicability of sICA methodology and software in the RS-fMRI context. The experiences were used to formulate general guidelines to facilitate future method evaluations. Additionally, a novel multiple comparison correction (MCC) method, Maxmad, was devised for adjusting evaluation results statistically. On the software side, the source code of FSL Melodic, a popular sICA package, was analyzed against its published method descriptions. Unreported and unevaluated details were found, which implies that one should not automatically assume a correspondence between the literature and the software implementations. Method implementations should instead be subjected to independent reviews. An experimental contribution of this thesis is that the credibility of the emerging sliding-window sICA has been improved by validating the sICA-related preprocessing procedures. In addition, the estimation accuracy of the results in the existing RS-fMRI sICA literature was shown not to suffer even though repeatability tools such as Icasso have not been used in their computation.
Furthermore, the evidence against the conventional sICA model suggests considering different approaches to the analysis of RS-fMRI. The guidelines developed to facilitate evaluations include the adoption of 1) open software development (improved error detection), 2) modular software designs (easier evaluations), 3) data-specific evaluations (increased validity), and 4) extensive coverage of the parameter space (improved credibility). The proposed Maxmad MCC addresses a statistical problem arising from broad evaluations. Large-scale cooperation on evaluations is proposed in order to improve the credibility of exploratory RS-fMRI methods
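For readers unfamiliar with sICA, the core estimation step can be sketched with a minimal, NumPy-only FastICA (tanh contrast, symmetric decorrelation). This follows the standard published fixed-point algorithm, not FSL Melodic's actual implementation, and the two-source demo data are synthetic.

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Minimal symmetric FastICA with the tanh contrast.

    X: (n_mixtures, n_samples) array, rows are observed mixtures.
    Returns the estimated sources (recoverable up to order and sign)."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten via an eigendecomposition of the covariance matrix.
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X
    n = Z.shape[0]
    W = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w, with g = tanh.
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W_new)
        W = U @ Vt  # symmetric decorrelation: (W W^T)^(-1/2) W
    return W @ Z

# Demo: two independent non-Gaussian "spatial maps" mixed into two scans.
rng = np.random.default_rng(1)
S = np.vstack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S
S_hat = fastica(X)
# Match recovered components to true sources by absolute correlation.
C = np.abs(np.corrcoef(np.vstack([S_hat, S]))[:2, 2:])
```

On fMRI data the same decomposition is applied "spatially": the data matrix is time points by voxels, so the recovered independent components are spatial maps rather than time courses.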
Abstract Resting-state measurements of the brain with functional magnetic resonance imaging (fMRI) have gained an established position in the study of spontaneous brain activity. Resting-state fMRI results are often obtained using exploratory methods such as spatial independent component analysis (sICA). These methods and their software implementations are rarely evaluated comprehensively or specifically from the resting-state fMRI perspective. The software is trusted to work according to its method descriptions. Many methods and parameters are used despite the lack of test data, and the validity of the models underlying the methods also remains unclear. Ensuring the quality of exploratory resting-state fMRI analyses would require considerably more evaluations than are performed today. This thesis investigated the suitability of sICA methods and software for resting-state fMRI studies. Based on the experience gained, general guidelines were formulated to facilitate future method evaluations. In addition, a new multiple comparison correction method, Maxmad, was developed for the statistical correction of evaluation results. The source code of a well-known sICA package, FSL Melodic, was analyzed against the published method descriptions. The analysis revealed previously unreported and unevaluated method details, which means that a correspondence between the method descriptions in the literature and their software implementations should not be assumed automatically. Method implementations should be reviewed independently. As an experimental contribution, the thesis improved the credibility of sliding-window sICA by verifying the correctness of the sICA preprocessing steps. The thesis also showed that the accuracy of earlier sICA results has not suffered even though repeatability tools such as the Icasso software were not used in their estimation.
The results of the thesis also call the conventional sICA model into question, which is why alternative starting points for the analysis of resting-state fMRI data should be considered. The guidelines developed to facilitate evaluations include the following principles: 1) open software development (improved error detection), 2) modular software design (easier evaluations), 3) data-type-specific evaluations (improved validity) and 4) broad coverage of the parameter space in evaluations (improved credibility). The proposed Maxmad multiple comparison correction offers a solution to the statistical challenges of broad evaluations. To improve the credibility of the exploratory methods used in resting-state fMRI, broad cooperation on method evaluation is proposed
APA, Harvard, Vancouver, ISO and other styles
47

Zhang, Weimin. "Topics in living cell multiphoton laser scanning microscopy (MPLSM) image analysis". Texas A&M University, 2006. http://hdl.handle.net/1969.1/4412.

Full text
Abstract (summary):
Multiphoton laser scanning microscopy (MPLSM) is an advanced fluorescence imaging technology that produces less noisy microscope images and minimizes damage to living tissue. The MPLSM images in this research show dehydroergosterol (DHE, a fluorescent sterol that closely mimics the behavior of cholesterol in lipoproteins and membranes) on the plasma membrane of living cells. The objective is to use statistical image analysis to describe how cholesterol is distributed on a living cell's membrane. The statistical image analysis methods applied in this research include image segmentation/classification and spatial analysis. For image segmentation, we design a supervised learning method that combines a smoothing technique with rank statistics. This approach is especially useful when only very limited information is available about the classes we want to segment. We also apply unsupervised learning methods to the image data. In the spatial analysis of the image data, we explore the spatial correlation of the segmented data with a Monte Carlo test. Our research shows that the distributions of DHE exhibit a spatially aggregated pattern. We fit two aggregated point pattern models, an area-interaction process model and a Poisson cluster process model, to the data. For the area-interaction process model, we design algorithms for the maximum pseudo-likelihood estimator and the Monte Carlo maximum likelihood estimator in a lattice data setting. For the Poisson cluster process, parameters are estimated with a method for implicit statistical models. A group of simulation studies shows that the Monte Carlo maximum likelihood estimation method produces consistent parameter estimates. The goodness-of-fit tests show that neither model can be rejected. We propose to use the area-interaction process model in further research.
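A Monte Carlo test for spatial aggregation of the kind used above can be sketched as follows. The clustered toy pattern stands in for the segmented DHE data, and the summary statistic (mean nearest-neighbour distance) is one common choice for detecting aggregation, not necessarily the thesis's exact statistic.

```python
import numpy as np

def mean_nn_distance(pts):
    """Mean nearest-neighbour distance of a 2-D point pattern."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()

def mc_aggregation_test(pts, n_sim=199, seed=0):
    """One-sided Monte Carlo test: is the observed mean nearest-neighbour
    distance unusually small (i.e. aggregated) compared with complete
    spatial randomness (CSR) in the unit square?"""
    rng = np.random.default_rng(seed)
    obs = mean_nn_distance(pts)
    sims = [mean_nn_distance(rng.uniform(size=pts.shape)) for _ in range(n_sim)]
    rank = 1 + sum(s <= obs for s in sims)
    return obs, rank / (n_sim + 1)          # Monte Carlo p-value

# Clustered toy pattern: offspring scattered tightly around 5 parents,
# mimicking an aggregated (Poisson cluster) process.
rng = np.random.default_rng(42)
parents = rng.uniform(0.2, 0.8, size=(5, 2))
pts = np.vstack([p + 0.01 * rng.standard_normal((20, 2)) for p in parents])
obs, p_value = mc_aggregation_test(pts)
```

A small p-value means the observed statistic is smaller than almost all CSR simulations, i.e. the pattern is significantly aggregated.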
APA, Harvard, Vancouver, ISO and other styles
48

Evans, Ben Richard. "Data-driven prediction of saltmarsh morphodynamics". Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/276823.

Full text
Abstract (summary):
Saltmarshes provide a diverse range of ecosystem services and are protected under a number of international designations. Nevertheless, they are generally declining in extent in the United Kingdom and North West Europe. The drivers of this decline are complex and poorly understood. When considering mitigation and management for future ecosystem service provision, it will be important to understand why, where, and to what extent decline is likely to occur. Few studies have attempted to forecast saltmarsh morphodynamics at a system level over decadal time scales. There is no synthesis of existing knowledge available for specific site predictions, nor is there a formalised framework for individual site assessment and management. This project evaluates the extent to which machine learning approaches (boosted regression trees, neural networks and Bayesian networks) can facilitate the synthesis of information and the prediction of decadal-scale morphological tendencies of saltmarshes. Importantly, data-driven predictions are independent of the assumptions underlying physically-based models, and therefore offer an additional opportunity to cross-validate between two paradigms. Marsh margins and interiors are both considered but are treated separately, since they are regarded as being sensitive to different process suites. The study therefore identifies factors likely to control morphological trajectories and develops geospatial methodologies to derive proxy measures relating to controls or processes. These metrics are developed at a high spatial density, in the order of tens of metres, allowing the resolution of fine-scale behavioural differences. Conventional statistical approaches, as previously adopted, are applied to the dataset to assess consistency with earlier findings, with some agreement being found. The data are subsequently used to train and compare three types of machine learning model.
Boosted regression trees outperform the other two methods in this context. The resulting models are able to explain more than 95% of the variance in marginal changes and 91% for internal dynamics. Models are selected based on validation performance and are then queried with realistic future scenarios which represent altered input conditions that may arise as a consequence of future environmental change. Responses to these scenarios are evaluated, suggesting system sensitivity to all scenarios tested and offering a high degree of spatial detail in responses. While mechanistic interpretation of some responses is challenging, process-based justifications are offered for many of the observed behaviours, providing confidence that the results are realistic. The work demonstrates a potentially powerful alternative (and complement) to current morphodynamic models that can be applied over large areas with relative ease, compared to numerical implementations. Powerful analyses with broad scope are now available to the field of coastal geomorphology through the combination of spatial data streams and machine learning. Such methods are shown to be of great potential value in support of applied management and monitoring interventions.
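Boosted regression trees, the best-performing model family in this study, can be illustrated with a minimal gradient-boosting loop over depth-1 trees (stumps) for squared error. The one-dimensional toy data below are invented; the thesis works with multivariate geospatial metrics.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-D feature x for target r."""
    best_sse, best = np.inf, None
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best

def boost(x, y, n_rounds=300, lr=0.1):
    """Gradient boosting for squared error: each stump is fitted to the
    current residuals and added with shrinkage factor lr."""
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= t, lv, rv)
    return pred

# Toy target: a noiseless 3-level step function is recovered almost exactly.
x = np.linspace(0, 1, 200)
y = np.where(x < 0.3, 1.0, np.where(x < 0.7, 3.0, 2.0))
pred = boost(x, y)
mse = float(np.mean((pred - y) ** 2))
```

Fitting each stump to the residuals of the current ensemble and shrinking its contribution is the core of gradient boosting; production libraries add deeper trees, subsampling and regularisation on top of this loop.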
APA, Harvard, Vancouver, ISO and other styles
49

Prananto, Agnes Kristina. "The use of remotely sensed data to analyse spatial and temporal trends in vegetation patchiness within rehabilitated bauxite mines in the Darling Range, W.A. /". Connect to this title, 2005. http://theses.library.uwa.edu.au/adt-WU2006.0012.

Full text
APA, Harvard, Vancouver, ISO and other styles
50

Da, Silva Sébastien. "Fouille de données spatiales et modélisation de linéaires de paysages agricoles". Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0156/document.

Full text
Abstract (summary):
This thesis is part of a partnership between INRA and INRIA, in the field of knowledge extraction from spatial databases. The problem concerns the characterization and simulation of agricultural landscapes. More precisely, we focus on the lines that structure the agricultural landscape, such as roads, irrigation ditches and hedgerows. Our goal is to model hedgerows because of their role in many ecological and environmental processes. We study ways to characterize hedgerow structures in two contrasting agricultural landscapes, one located in south-eastern France (mainly composed of orchards) and the second in Brittany (western France, bocage-type). We also determine whether, and under what circumstances, the spatial distribution of hedgerows is structured by the position of the more permanent linear landscape elements, such as roads and ditches, and at what scale these structures appear. The knowledge discovery in databases (KDD) process that was implemented comprises several preprocessing and data mining steps, combining mathematical and computational methods. The first part of the thesis focuses on the creation of a statistical spatial index, based on a geometric notion of neighborhood, allowing the characterization of hedgerow structures. This index was used to describe the hedgerow structures in the landscape; the results show that they depend on the more permanent elements at short distances and that the neighborhood of hedgerows is uniform beyond 150 meters. In addition, different neighborhood structures were revealed according to the main hedgerow orientations in south-eastern France, but not in Brittany. The second part of the thesis explored the benefit of coupling linearization methods with Markov methods.
Linearization was introduced using a variant of Hilbert curves: adaptive Hilbert paths. The linear spatial data thus constructed were then processed with Markov methods. The latter have the advantage of serving both for learning from real data and for generating data, for example when simulating a landscape. The results show that these coupled methods allow learning and automatic generation that capture characteristics of the different landscapes. The first simulations are encouraging despite the need for post-processing. Finally, this thesis work led to the creation of a spatial data exploration method based on different tools and supporting all the steps of classic KDD, from data selection to the visualization of results. Moreover, this method is constructed in such a way that it can in turn be used for data generation, a component necessary for landscape simulation
This thesis is part of a partnership between INRA and INRIA in the field of knowledge extraction from spatial databases. The study focuses on the characterization and simulation of agricultural landscapes. More specifically, we focus on the linear features that structure the agricultural landscape, such as roads, irrigation ditches and hedgerows. Our goal is to model the spatial distribution of hedgerows because of their role in many ecological and environmental processes. We study in particular how to characterize the spatial structure of hedgerows in two contrasting agricultural landscapes, one located in south-eastern France (mainly composed of orchards) and the second in Brittany (western France, bocage-type). We determine whether the spatial distribution of hedgerows is structured by the position of the more permanent linear landscape features, such as roads and ditches, and if so, under which circumstances and at which scale. The implementation of the Knowledge Discovery in Databases (KDD) process comprises different preprocessing steps and data mining algorithms, combining mathematical and computational methods. The first part of the thesis focuses on the creation of a statistical spatial index, based on a geometric neighborhood concept, that allows the characterization of hedgerow structures. The spatial index describes the structures of hedgerows in the landscape. The results show that hedgerows depend on the more permanent linear elements at short distances, and that their neighborhood is uniform beyond 150 meters. In addition, different neighborhood structures were identified depending on the orientation of hedgerows in the south-east of France, but not in Brittany. The second part of the thesis explores the potential of coupling linearization methods with Markov methods.
The linearization methods are based on a variant of Hilbert curves: adaptive Hilbert paths. The linearized spatial data thus constructed were then processed with Markov methods. These methods have the advantage of serving both for learning from real data and for generating new data, for example in the context of simulating a landscape. The results show that combining these methods for learning and automatic generation of hedgerows captures some characteristics of the different study landscapes. The first simulations are encouraging despite the need for post-processing. Finally, this work enabled the creation of a spatial data mining method based on different tools that supports all stages of a classic KDD process, from data selection to the visualization of results. Furthermore, this method was constructed in such a way that it can also be used for data generation, a component necessary for the simulation of landscapes
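The coupling of linearization with Markov methods can be sketched as follows: the cells of a raster are ordered along a space-filling Hilbert curve, and a first-order Markov chain is then fitted to the linearized sequence. The sketch uses the classic fixed Hilbert curve (the standard iterative rotate-and-reflect algorithm), whereas the thesis uses adaptive Hilbert paths, and the 8x8 "hedgerow" raster is a toy example.

```python
import numpy as np

def xy2d(n, x, y):
    """Hilbert-curve index of cell (x, y) in an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Toy 8x8 binary raster (1 = "hedgerow" cell) with two aligned 2x2 patches.
grid = np.zeros((8, 8), dtype=int)
grid[0:2, 0:2] = 1
grid[4:6, 4:6] = 1

# Linearize: order the cells along the Hilbert curve, then read the raster.
order = sorted(((x, y) for x in range(8) for y in range(8)),
               key=lambda c: xy2d(8, *c))
seq = [int(grid[y, x]) for x, y in order]

# Fit a first-order Markov chain on the linearized sequence.
T = np.zeros((2, 2))
for a, b in zip(seq, seq[1:]):
    T[a, b] += 1
P = T / T.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
```

Because consecutive cells on a Hilbert curve are always spatially adjacent, the fitted transition matrix reflects spatial autocorrelation; sampling from it generates new linear sequences that can be mapped back onto the grid.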
APA, Harvard, Vancouver, ISO and other styles
