Dissertations / Theses on the topic 'Mining methods'

Listed below are the top 50 dissertations / theses for research on the topic 'Mining methods'.

1

Mwitondi, K. S. "Robust methods in data mining." Thesis, University of Leeds, 2003. http://etheses.whiterose.ac.uk/807/.

Abstract:
The thesis focuses on two problems in data mining: clustering, an exploratory technique for grouping similar observations, and classification, a technique for assigning new observations to one of the known groups. A thorough study of the two problems, known in the machine learning literature as unsupervised and supervised classification respectively, is central to decision making in many fields, and the thesis seeks to contribute towards that end. In the first part of the thesis we consider whether robust methods can be applied to clustering; in particular, we perform clustering on fuzzy data using two methods originally developed for outlier detection. The fuzzy data clusters are characterised by two intersecting lines such that points belonging to the same cluster lie close to the same line. This part of the thesis also investigates a new application of finite mixtures of normals to the fuzzy data problem. The second part of the thesis addresses issues relating to classification, in particular classification trees and boosting. The boosting algorithm is a relative newcomer to the classification portfolio that seeks to enhance the performance of classifiers by iteratively re-weighting the data according to their previous classification status. We explore the performance of "boosted" trees (mainly stumps) based on three different models, all characterised by a sine-wave boundary, and carry out a thorough study of the factors that affect the boosting algorithm. Other results include a new look at the concept of randomness in the classification context, particularly because the form of randomness in both training and testing data directly affects the accuracy and reliability of domain-partitioning rules. Further, we provide statistical interpretations of some classification-related concepts originally used in computer science, machine learning and artificial intelligence. This is important because a unified interpretation of some of the "landmark" concepts in various disciplines is a step towards principles that can guide and strengthen practical applications.
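For readers unfamiliar with the re-weighting scheme described above, the following minimal sketch shows a stump-based AdaBoost loop of the kind the thesis studies; the round count, learning details and data are illustrative assumptions, not Mwitondi's actual experimental setup.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_stumps(X, y, n_rounds=50):
    """Boosting with decision stumps: each round re-weights the data so that
    previously misclassified points gain influence. Labels must be -1/+1."""
    X, y = np.asarray(X), np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))          # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:                          # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Sign of the alpha-weighted vote of all stumps."""
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))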
2

Wirta, Valtteri. "Mining the transcriptome - methods and applications." Doctoral thesis, Stockholm : School of Biotechnology, Royal Institute of Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4115.

3

Siddiqui, Muazzam. "Data Mining Methods for Malware Detection." Doctoral diss., University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2783.

Abstract:
This research investigates the use of data mining methods for malware (malicious program) detection and proposes a framework as an alternative to traditional signature-based detection methods. Traditional approaches that use signatures to detect malicious programs fail for new and unknown malware for which signatures are not available. We present a data mining framework to detect malicious programs. We collected, analyzed and processed several thousand malicious and clean programs to find the best features and build models that can classify a given program as malware or clean. Our research is closely related to information retrieval and classification techniques and borrows a number of ideas from those fields. We used a vector space model to represent the programs in our collection. Our data mining framework includes two separate and distinct classes of experiments. The first are supervised learning experiments that used a dataset consisting of several thousand malicious and clean program samples to train, validate and test an array of classifiers. In the second class of experiments, we propose using sequential association analysis for feature selection and automatic signature extraction. With our experiments, we were able to achieve a detection rate as high as 98.4% and a false positive rate as low as 1.9% on novel malware.
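As a hedged illustration of the vector space idea mentioned above (not the author's actual feature set or classifiers), byte n-grams extracted from a program's hex dump can serve as the "terms" of the model; the hex strings and labels below are made up.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# Each program is represented by (a stand-in for) its hex dump; overlapping
# character 8-grams of the hex string correspond to byte 4-grams.
hexdumps = ["4d5a900003000000", "7f454c4601010100",
            "4d5a500002000000", "7f454c4602010100"]
labels = [1, 0, 1, 0]          # toy labels: 1 = malicious, 0 = clean

vectorizer = CountVectorizer(analyzer="char", ngram_range=(8, 8))
X = vectorizer.fit_transform(hexdumps)   # sparse term-frequency matrix
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
print(clf.predict(vectorizer.transform(["4d5a700001000000"])))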
4

Espinoza, Sofia Elizabeth. "Data mining methods applied to healthcare problems." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44903.

Abstract:
Growing adoption of health information technologies is allowing healthcare providers to capture and store enormous amounts of patient data. In order to effectively use this data to improve healthcare outcomes and processes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various data mining and statistical methods that can be applied to the types of datasets found in healthcare research. We discuss the process of identifying appropriate measures and statistical tools, the analysis and validation of mathematical models, and the interpretation of results to improve healthcare quality and safety. We illustrate the application of statistics and data mining techniques on three real-world healthcare datasets. In the first chapter, we develop a new method to assess hydration status using breath samples. Through analysis of the more than 300 volatile organic compounds contained in human breath, we aim to identify markers of hydration. In the second chapter, we evaluate the impact of the implementation of an electronic medical record system on the rate of inpatient medication errors and adverse drug events. The objective is to understand the impact on patient safety of different information technologies in a specific environment (inpatient pediatrics) and to provide recommendations on how to correctly analyze count data with a large amount of zeros. In the last chapter, we develop a mathematical model to predict the probability of developing post-operative nausea and vomiting based on patient demographics and clinical history, and to identify the group of patients at high risk.
5

Vu, Lan. "High performance methods for frequent pattern mining." Thesis, University of Colorado at Denver, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3667246.

Abstract:

The current Big Data era is generating tremendous amounts of data in most fields, such as business, social media, engineering, and medicine. The demand to process and handle the resulting "big data" has led to the need for fast data mining methods for developing powerful and versatile analysis tools that can turn data into useful knowledge. Frequent pattern mining (FPM) is an important task in data mining with numerous applications such as recommendation systems, consumer market analysis, web mining, and network intrusion detection. We develop efficient high-performance FPM methods for large-scale databases on different computing platforms, including personal computers (PCs), multi-core multi-socket servers, clusters and graphics processing units (GPUs). At the core of our research is a novel self-adaptive approach that performs efficiently and fast on both sparse and dense databases and outperforms its sequential counterparts. This approach applies multiple mining strategies and dynamically switches among them based on the data characteristics detected at runtime. The research results include two sequential FPM methods (FEM and DFEM) and three parallel ones (ShaFEM, SDFEM and CGMM). These methods are applicable to the development of powerful and scalable mining tools for big data analysis. We have tested, analysed and demonstrated their efficacy on representative real databases publicly available at the Frequent Itemset Mining Implementations Repository.
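FEM and DFEM themselves are not reproduced here, but the baseline level-wise (Apriori-style) itemset counting that such methods improve upon can be sketched as follows; the transactions and support threshold are illustrative.

from collections import Counter

def frequent_itemsets(transactions, min_support):
    """Level-wise search: frequent k-itemsets generate (k+1)-candidates."""
    item_counts = Counter(i for t in transactions for i in set(t))
    current = {frozenset([i]) for i, c in item_counts.items() if c >= min_support}
    frequent = {}
    k = 1
    while current:
        counts = Counter()
        for t in transactions:
            t = set(t)
            for cand in current:
                if cand <= t:              # candidate contained in transaction
                    counts[cand] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        current = {a | b for a in level for b in level if len(a | b) == k + 1}
        k += 1
    return frequent

# Example: itemsets appearing in at least 2 of 4 toy transactions.
baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c", "d"]]
print(frequent_itemsets(baskets, min_support=2))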

6

SOUZA, Ellen Polliana Ramos. "Swarm optimization clustering methods for opinion mining." Universidade Federal de Pernambuco, 2017. https://repositorio.ufpe.br/handle/123456789/25227.

Abstract:
Opinion Mining (OM), also known as sentiment analysis, is the field of study that analyzes people's sentiments, evaluations, attitudes, and emotions about different entities expressed in textual input. This is accomplished through the classification of an opinion into categories such as positive, negative, or neutral. Supervised machine learning (ML) and lexicon-based approaches are the most frequent for OM. However, these approaches require considerable effort to prepare training data and to build the opinion lexicon, respectively. In order to address these drawbacks, this thesis proposes an unsupervised clustering approach for the OM task which is able to produce accurate results for several domains without manually labeled training data or language-dependent tools. Three swarm algorithms based on Particle Swarm Optimization (PSO) and Cuckoo Search (CS) are proposed: DPSOMUT, based on a discrete binary PSO; IDPSOMUT, based on an improved self-adaptive PSO algorithm with a detection function; and IDPSOMUT/CS, a hybrid of IDPSOMUT and CS. Several experiments were conducted with different corpus types, domains, text languages, class balancing, fitness functions, and pre-processing techniques. The effectiveness of the clustering algorithms was evaluated with external measures such as accuracy, precision, recall, and F-score. The statistical analysis shows that the swarm-based algorithms, especially the PSO ones, find better solutions than conventional grouping techniques such as K-means and agglomerative clustering. The PSO-based algorithms achieved better accuracy using word-bigram pre-processing and the Global Silhouette as fitness function. The OBCC corpus is another contribution of this thesis: a gold collection of 2,940 tweets in Brazilian Portuguese with consumers' opinions about products and services.
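The thesis's DPSOMUT-family algorithms are discrete/binary PSO variants; the plain continuous particle swarm update they build on can be sketched as below. For clustering, a particle could encode candidate centroids (flattened) and f could return, for example, a negative silhouette score; the constants are common textbook choices, not the thesis's settings.

import numpy as np

def pso_minimise(f, dim, n_particles=30, iters=200, seed=0):
    """Plain continuous PSO: inertia plus pulls towards personal/global bests."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))    # particle positions
    v = np.zeros_like(x)                              # particle velocities
    pbest = x.copy()                                  # personal bests
    pbest_val = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()          # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, val = pso_minimise(lambda z: ((z - 0.3) ** 2).sum(), dim=4)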
7

Johnson, Eamon B. "Methods in Text Mining for Diagnostic Radiology." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1459514073.

8

Eales, James Matthew. "Text-mining of experimental methods in phylogenetics." Thesis, University of Manchester, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.529251.

9

Sundaravej, Dilokpol. "Predictive methods for subsidence due to longwall mining." Ohio : Ohio University, 1986. http://www.ohiolink.edu/etd/view.cgi?ohiou1183379335.

10

Bastos, Guilherme Sousa. "Methods for truck dispatching in open-pit mining." Instituto Tecnológico de Aeronáutica, 2010. http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=1098.

Abstract:
Material transportation is one of the most important aspects of open-pit mine operations. The problem usually involves a truck dispatching system in which decisions on truck assignments and destinations are taken in real time. Due to its significance, several decision systems for this problem have been developed in the last few years, improving productivity and reducing operating costs. As in many other real-world applications, the assessment and correct modeling of uncertainty is a crucial requirement, since the unpredictability originating from equipment faults, weather conditions, and human mistakes can often result in truck queues or idle shovels. However, uncertainty is not considered in most commercial dispatching systems. In this thesis, we introduce novel truck dispatching systems as a starting point for modifying current practices with a statistically principled decision-making methodology. First, we present a stochastic method using a Time-Dependent Markov Decision Process (TiMDP) applied to the truck dispatching problem. In the TiMDP model, travel times are represented as probability density functions (pdfs), time windows can be inserted for path availability, and time-dependent utility can be used as a priority parameter. In order to mitigate the well-known curse-of-dimensionality issue, to which multi-agent problems are subject when considering discrete state models, the system is modeled based on the introduced concept of single-dependent-agents. Based also on this concept, we introduce the Genetic TiMDP (G-TiMDP) method, a hybridization of the TiMDP model and a Genetic Algorithm (GA), which is also used to solve the truck dispatching problem. Finally, in order to evaluate and compare the results of the introduced methods, we execute Monte Carlo simulations in an example heterogeneous mine composed of 15 trucks, 3 shovels, and 1 crusher. The uncertain aspect of the problem is represented by the path selection through crusher and shovels, which is executed by the truck driver independently of the dispatching system. The results are compared to classical dispatching approaches (greedy heuristic and Minimization of Truck Cycle Times, MTCT) using Student's t-test, demonstrating the efficiency of the introduced truck dispatching methods.
11

Ashton, Triss A. "Accuracy and Interpretability Testing of Text Mining Methods." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc283791/.

Abstract:
Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining, first began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos, each based on unique mathematical algorithms. How good each of these methods is at analyzing text is unclear. Method development typically evolves from some silo-centric requirement, with the success of the method measured by a custom requirement-based metric; results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method for text mining that eliminates research silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a random block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. The contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not fully characterized, there are characteristics within the text collection that affect an algorithm's ability to extract meaningful results. Third, this research develops an experimental design process for testing the algorithms that is adaptable to other areas of software development and algorithm testing. This design eliminates the biased practices historically employed by algorithm developers.
12

Kragh, J. Edward. "Borehole seismic methods for opencast coal exploration." Thesis, Durham University, 1990. http://etheses.dur.ac.uk/6178/.

Abstract:
Surface seismic techniques lack the resolution to image the top 100 m or so of the earth's surface necessary for opencast coal exploration. The work reported in this thesis is the development of borehole seismic methods making use of the closely spaced boreholes that are routinely drilled by British Coal. The first method investigated was to use a tomographic technique to observe any reduction in seismic velocities above old workings, and hence infer the presence of old workings. In order to obtain clear images of the subsurface, it was necessary to interpret the field data for the presence of head waves, and to pick the later-arrival direct waves for the tomographic inversions. However, independent data obtained from uphole surveys showed that there was no measurable reduction in the seismic velocity above old workings for strata below the water table, and the tomographic method was abandoned in favour of borehole seismic reflection methods. Fifteen hole-to-surface seismic reflection surveys were acquired using downhole explosive charges as sources and a linear spread of surface geophones passing through the borehole position as receivers. A complete package of processing software was developed for processing the data, and eight of the surveys are presented in this thesis. The final migrated and stacked sections delineate a washout and faulting at both large and small scales. The vertical resolution of the data is high due to the wideband temporal frequencies in the data, typically up to 300 Hz. The hole-to-surface method is compared to the crosshole seismic reflection method, which was developed in parallel by M. J. Findlay. The relative merits of the two techniques are discussed, and suggestions are made to improve the acquisition of the data to make both methods applicable to a wider variety of problems. Although the vertical resolution of the hole-to-surface method is lower than that of the crosshole method, this could be more than compensated for by extending the hole-to-surface method to three dimensions, using areal arrays of surface geophones around the borehole.
13

Shen, Rujun (沈汝君). "Mining optimal technical trading rules with genetic algorithms." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B47870011.

Abstract:
In recent years technical trading rules have become widely known: not only academics but also many investors apply them in financial markets. One approach to constructing technical trading rules is to use technical indicators, such as moving averages (MA) and filter rules. These trading rules are widely used, possibly because the technical indicators are simple to compute and can be programmed easily. An alternative approach is to rely on chart patterns. However, the patterns and signals detected by these rules are often identified by visual inspection through human eyes, and as far as I know there is no universally accepted method of constructing chart patterns. In 2000, Prof. Andrew Lo and his colleagues were the first to define five pairs of chart patterns mathematically: Head-and-Shoulders (HS) & Inverted Head-and-Shoulders (IHS), Broadening tops (BTOP) & bottoms (BBOT), Triangle tops (TTOP) & bottoms (TBOT), Rectangle tops (RTOP) & bottoms (RBOT), and Double tops (DTOP) & bottoms (DBOT). The basic formulation of a chart pattern consists of two steps: detection of (i) extreme points of a price series and (ii) the shape of the pattern. In Lo et al. (2000), kernel smoothing was used to identify the extreme points. Lo et al. (2000) admitted that the optimal bandwidth used in the kernel method is not the best choice, and expert judgement is needed in selecting the bandwidth. In addition, their work considered chart pattern detection only, with no buy/sell signal detection; it should be noted that a chart pattern can form without a signal being detected, in which case no transaction is made. In this thesis, I propose a new class of technical trading rules which aims to resolve the above problems. More specifically, each chart pattern is parameterized by a set of parameters which governs the shape of the pattern and the entry and exit signals of trades. The optimal set of parameters can then be determined using genetic algorithms (GAs). The advantage of GAs is that they can deal with high-dimensional optimization problems whether the parameters to be optimized are continuous or discrete. In addition, GAs are convenient when the fitness function is not differentiable or has a multi-modal surface.
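A toy version of the optimisation step can look as follows; the chromosome here is just a real vector of rule parameters (for instance pattern-shape tolerances and entry/exit thresholds), and the fitness function is a placeholder standing in for a backtested trading return. Both are assumptions for illustration, not the thesis's actual encoding.

import random

def genetic_search(fitness, bounds, pop_size=50, generations=100,
                   crossover_rate=0.8, mutation_rate=0.1):
    """Minimal real-coded GA: elitism, tournament selection, blend
    crossover and Gaussian mutation clipped to the parameter bounds."""
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]                        # keep the two best unchanged
        while len(nxt) < pop_size:
            p1 = max(random.sample(pop, 3), key=fitness)   # tournament picks
            p2 = max(random.sample(pop, 3), key=fitness)
            child = [(a + b) / 2 if random.random() < crossover_rate else a
                     for a, b in zip(p1, p2)]
            child = [min(hi, max(lo, g + random.gauss(0, 0.1 * (hi - lo))))
                     if random.random() < mutation_rate else g
                     for g, (lo, hi) in zip(child, bounds)]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy run with two rule parameters and a placeholder fitness.
best = genetic_search(lambda p: -(p[0] - 0.1) ** 2 - (p[1] - 3.0) ** 2,
                      bounds=[(0.0, 1.0), (1.0, 10.0)])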
14

Del Villar, René. "Modelling and simulation of Brunswick mining grinding circuit." Thesis, McGill University, 1985. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=72758.

15

Andrieux, Patrick. "Methods and practice of blast-induced vibration monitoring." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=23860.

Abstract:
Regardless of the objective sought, the conclusions drawn from seismic monitoring can only be as good as the quality of the recorded data: properly capturing the relevant raw vibrational information in the first place is thus absolutely crucial. The difficulty is that blast-induced vibration monitoring is site-specific and general formulas do not apply: every situation corresponds to a unique combination of objectives, ground conditions, blast design and explosive types, and needs to be monitored accordingly. To adequately acquire all the pertinent seismic information, a number of points must be successfully addressed, such as the choice of sensors; their location, number, orientation and anchoring; the transmission of the captured signals from these gauges to the recording equipment; and the choice and set-up of the data acquisition system.
It is the purpose of this thesis to address these questions in some detail, in an attempt to provide the reader with an understanding of how all the components involved in blast-induced vibration monitoring interact, and on how the choices made at each step can significantly affect overall results. (Abstract shortened by UMI.)
16

Ould-Hamou, Malek. "Beneficiation of Algerian phosphate tailings by electrostatic methods." Thesis, University of Leeds, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.277859.

17

Khoshrou, Seyed Hassan. "Theoretical and experimental investigation of wall-control blasting methods." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=40161.

Abstract:
Overbreak and damage to rock walls are among the most serious problems encountered in blasting operations. Several techniques have been developed to control the undesirable effects of rock blasting; these techniques are collectively known as wall-control blasting methods.
The stress distribution around pressurized holes has been numerically evaluated in order to analyze the mechanism of wall-control blasting methods. The effect of blast geometry and the role of discontinuities in this stress field have also been studied in detail. The results obtained by numerical modelling have been verified by controlled blasting experiments, and further supported by large-scale analysis of existing roadcuts.
It was found that the mechanism of wall-control blasting can be explained by the collision and superposition of the stresses between the holes. A narrow fracture zone between the holes is produced by tensile stresses on the centreline. It is neither necessary nor realistic to assume onset of fractures at the midpoint between holes by reinforcement of the stresses from each hole.
The analysis shows that a burden can be defined as infinite when the ratio of burden to spacing is greater than unity. For pre-split blasting (infinite burden) in an isotropic and homogeneous material, the hole separation can range up to 15 borehole diameters. The decoupling ratio between the explosive charge and the borehole diameter should be smaller than 0.5; it would generally be between 0.2 and 0.3 for pre-splitting (infinite burden), and between 0.3 and 0.4 in the presence of a free face.
A discontinuity parallel to the free face and located at the back of the holes causes high stress levels between the discontinuity and the boreholes, resulting in a shattered zone in this region. The presence of a similar discontinuity at the front of the holes leads to considerable overbreak and the development of an undamaged "hump" of rock between holes. A discontinuity oriented normal to the centreline at the midpoint between holes has minimal effect on the blast results. As the angle between the discontinuity and the free face decreases from 90°, the damage zone between the holes and the discontinuity increases, and the shape of the final wall changes from a smooth face to a corrugated one. A closed discontinuity, or an open discontinuity cemented with strong filling materials, has little effect on the results of the blast. However, as the width of the discontinuity increases, the size of the damage zone also increases. An open discontinuity 50 mm wide or more plays a role similar to a free face.
In roadcut blast design, hole deviation is a key parameter in determining the quality of the face. However, consistent hole deviation in the same direction has minimal effect on the result of the blast. This type of deviation is usually associated with bedded rocks, with alternating bands of soft and hard rock on the face. The degree of deviation depends, amongst other factors, on the orientation, thickness, frequency and position of these bands.
18

McSherry, Frank. "Spectral methods for data analysis." Thesis, University of Washington (UW restricted), 2004. http://hdl.handle.net/1773/7004.

19

Chen, Keke. "Geometric Methods for Mining Large and Possibly Private Datasets." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11561.

Abstract:
With the wide deployment of data-intensive Internet applications and continued advances in sensing technology and biotechnology, large multidimensional datasets, possibly containing privacy-sensitive information, have been emerging. Mining such datasets has become increasingly common in business integration, large-scale scientific data analysis, and national security. The proposed research aims at exploring the geometric properties of the multidimensional datasets utilized in statistical learning and data mining, and at providing novel techniques and frameworks for mining very large datasets while protecting the desired data privacy. The first main contribution of this research is the development of the iVIBRATE interactive visualization-based approach for clustering very large datasets. The iVIBRATE framework uniquely addresses the challenges in handling irregularly shaped clusters, domain-specific cluster definition, and cluster-labeling of the data on disk. It consists of the VISTA visual cluster rendering subsystem and the Adaptive ClusterMap Labeling subsystem. The second main contribution is the development of the "Best K Plot" (BKPlot) method for determining the critical clustering structures in multidimensional categorical data. The BKPlot method uniquely addresses two challenges in clustering categorical data: how to determine the number of clusters (the best K) and how to identify the existence of significant clustering structures. The method consists of the basic theory, the sample BKPlot theory for large datasets, and the testing method for identifying no-cluster datasets. The third main contribution of this research is the development of the theory of geometric data perturbation and its application in privacy-preserving data classification involving a single party or multiparty collaboration. The key to geometric data perturbation is to find a well-chosen randomly generated rotation matrix and an appropriate noise component that provide a satisfactory balance between privacy guarantee and data quality, considering possible inference attacks. When geometric perturbation is applied to collaborative multiparty data classification, it is challenging to unify the different geometric perturbations used by different parties. We study three protocols under the data-mining-service-oriented framework for unifying the perturbations: 1) the threshold-satisfied voting protocol, 2) the space adaptation protocol, and 3) the space adaptation protocol with a trusted party. The tradeoffs between the privacy guarantee, the model accuracy and the cost are studied for the protocols.
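A stripped-down sketch of the geometric perturbation idea follows (random rotation plus additive noise); the thesis additionally optimises the choice of rotation against inference attacks, which is not attempted here.

import numpy as np

def geometric_perturbation(X, noise_scale=0.05, seed=0):
    """Rotate the data by a random orthogonal matrix and add small noise.
    Rotation preserves Euclidean distances, so distance-based classifiers
    trained on the released data behave much like on the original."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q *= np.sign(np.diag(R))       # fix signs so Q is uniformly distributed
    return X @ Q.T + noise_scale * rng.standard_normal(X.shape)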
20

Amankwah, Henry. "Mathematical Optimization Models and Methods for Open-Pit Mining." Doctoral thesis, Linköpings universitet, Optimeringslära, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70844.

Abstract:
Open-pit mining is an operation in which blocks from the ground are dug to extract the ore contained in them, and in this process a deeper and deeper pit is formed until the mining operation ends. Mining is often a highly complex industrial operation, with respect to both technological and planning aspects. The latter may involve decisions about which ore to mine and in which order. Furthermore, mining operations are typically capital intensive and long-term, and subject to uncertainties regarding ore grades, future mining costs, and the market prices of the precious metals contained in the ore. Today, most of the high-grade or low-cost ore deposits have already been depleted, and to obtain sufficient profitability in mining operations it is therefore often a necessity to achieve operational efficiency with respect to both technological and planning issues. In this thesis, we study the open-pit design problem, the open-pit mining scheduling problem, and the open-pit design problem with geological and price uncertainty. These problems give rise to (mixed) discrete optimization models that in real-life settings are large scale and computationally challenging. The open-pit design problem is to find an optimal ultimate contour of the pit, given estimates of ore grades (typically obtained from samples in drill holes), estimates of costs for mining and processing ore, and physical constraints on mining precedence and maximal pit slope. As is well known, this problem can be solved as a maximum flow problem in a special network. In a first paper, we show that two well-known parametric procedures for finding a sequence of intermediate contours leading to an ultimate one can be interpreted as Lagrangian dual approaches to certain side-constrained design models. In a second paper, we give an alternative derivation of the maximum flow formulation of the design problem. We also study the combined open-pit design and mining scheduling problem, which is the problem of simultaneously finding an ultimate pit contour and the sequence in which the parts of the orebody shall be removed, subject to mining capacity restrictions. The goal is to maximize the discounted net profit during the lifetime of the mine. We show in a third paper that the combined problem can also be formulated as a maximum flow problem if the mining capacity restrictions are relaxed; in this case, however, the network needs to be time-expanded. In a fourth paper, we provide suggestions for Lagrangian dual heuristic and time aggregation approaches for the open-pit scheduling problem. Finally, we study the open-pit design problem under uncertainty, which is taken into account by using the concept of conditional value-at-risk. This concept enables us to incorporate a variety of possible uncertainties, especially regarding grades, costs and prices, in the planning process. In real-life situations, the resulting models would, however, become very computationally challenging.
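The classical reduction mentioned above, from ultimate pit design to a maximum flow / minimum cut problem, can be sketched in a few lines. The block values and precedence below are toy inputs, and the construction is the standard source-to-profit, waste-to-sink network (not the thesis's time-expanded variant).

import networkx as nx

def ultimate_pit(block_values, precedence):
    """Ultimate pit limit as a minimum s-t cut: profitable blocks hang off
    the source, waste blocks drain to the sink, and uncapacitated arcs
    (treated as infinite by networkx) enforce that a block cannot be mined
    before the blocks above it. Returns the set of blocks in the pit."""
    G = nx.DiGraph()
    G.add_nodes_from(block_values)
    for b, v in block_values.items():
        if v > 0:
            G.add_edge("s", b, capacity=v)
        elif v < 0:
            G.add_edge(b, "t", capacity=-v)
    for b, above in precedence.items():
        for a in above:
            G.add_edge(b, a)          # no capacity attribute = infinite
    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return source_side - {"s"}

# Toy instance: ore block 3 (value 10) lies under waste blocks 1 and 2.
values = {1: -2.0, 2: -3.0, 3: 10.0}
print(ultimate_pit(values, {3: [1, 2]}))   # -> {1, 2, 3}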
21

Savas, Berkant. "Algorithms in data mining using matrix and tensor methods." Doctoral thesis, Linköpings universitet, Beräkningsvetenskap, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11597.

Abstract:
In many fields of science, engineering, and economics large amounts of data are stored, and there is a need to analyze these data in order to extract information for various purposes. Data mining is a general concept involving different tools for performing this kind of analysis, and the development of mathematical models and efficient algorithms is of key importance. In this thesis we discuss algorithms for the reduced rank regression problem and algorithms for the computation of the best multilinear rank approximation of tensors. The first two papers deal with the reduced rank regression problem, which is encountered in the field of state-space subspace system identification. More specifically, the problem is
\[ \min_{\operatorname{rank}(X) = k} \det\bigl( (B - XA)(B - XA)^{T} \bigr), \]
where $A$ and $B$ are given matrices and we want to find $X$, under the rank condition, that minimizes the determinant. This problem is not properly stated, since it involves implicit assumptions on $A$ and $B$ so that $(B - XA)(B - XA)^{T}$ is never singular. This deficiency of the determinant criterion is fixed by generalizing the minimization criterion to rank reduction and volume minimization of the objective matrix, where the volume of a matrix is defined as the product of its nonzero singular values. We give an algorithm that solves the generalized problem and identify properties of the input and output signals that cause a singular objective matrix. Classification problems occur in many applications; the task is to determine the label or class of an unknown object. The third paper concerns classification of handwritten digits in the context of tensors or multidimensional data arrays. Tensor and multilinear algebra is an area that attracts more and more attention because of the multidimensional structure of the collected data in various applications. Two classification algorithms are given based on the higher order singular value decomposition (HOSVD). The main algorithm makes a data reduction of 98-99% using HOSVD prior to the construction of the class models. The models are computed as a set of orthonormal bases spanning the dominant subspaces for the different classes, and an unknown digit is expressed as a linear combination of the basis vectors. The resulting algorithm achieves a 5% classification error with a fairly low amount of computation. The remaining two papers discuss computational methods for the best multilinear rank approximation problem
\[ \min_{\mathcal{B}} \| \mathcal{A} - \mathcal{B} \|, \]
where $\mathcal{A}$ is a given tensor and we seek the best low multilinear rank approximation tensor $\mathcal{B}$. This is a generalization of the best low rank matrix approximation problem. It is well known that for matrices the solution is given by truncating the singular values in the singular value decomposition (SVD) of the matrix, but for tensors in general the truncated HOSVD does not give an optimal approximation. For example, a third order tensor $\mathcal{B} \in \mathbb{R}^{I \times J \times K}$ with $\operatorname{rank}(\mathcal{B}) = (r_1, r_2, r_3)$ can be written as the product
\[ \mathcal{B} = (X, Y, Z) \cdot \mathcal{C}, \qquad b_{ijk} = \sum_{\lambda,\mu,\nu} x_{i\lambda}\, y_{j\mu}\, z_{k\nu}\, c_{\lambda\mu\nu}, \]
where $\mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and $X \in \mathbb{R}^{I \times r_1}$, $Y \in \mathbb{R}^{J \times r_2}$, and $Z \in \mathbb{R}^{K \times r_3}$ are matrices of full column rank. Since it is no restriction to assume that $X$, $Y$, and $Z$ have orthonormal columns, and due to these constraints, the approximation problem can be considered as a nonlinear optimization problem defined on a product of Grassmann manifolds.
We introduce novel techniques for multilinear algebraic manipulations, enabling means for theoretical analysis and algorithmic implementation. These techniques are used to solve the approximation problem using Newton and quasi-Newton methods specifically adapted to operate on products of Grassmann manifolds. The presented algorithms are suited for small, large and sparse problems and, when applied to difficult problems, they clearly outperform alternating least squares methods, which are standard in the field.
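As background to the approximation problem above, the truncated HOSVD (the standard, generally non-optimal starting point that the thesis's Grassmann-manifold Newton methods refine) can be written in a few lines of numpy; the tensor and ranks below are placeholders.

import numpy as np

def hosvd_rank_approx(A, ranks):
    """Truncated HOSVD of a 3-way tensor: dominant subspace per mode, then
    a core tensor; a good but generally non-optimal approximation."""
    factors = []
    for mode, r in enumerate(ranks):
        unfolding = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :r])
    def mode_mult(T, M, mode):      # mode-`mode` product T x_mode M
        return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)
    C = A
    for mode, U in enumerate(factors):
        C = mode_mult(C, U.T, mode)         # core tensor
    B = C
    for mode, U in enumerate(factors):
        B = mode_mult(B, U, mode)           # low multilinear rank reconstruction
    return B, C, factors

A = np.random.default_rng(0).standard_normal((10, 12, 14))
B, C, Us = hosvd_rank_approx(A, ranks=(3, 4, 5))
print(np.linalg.norm(A - B) / np.linalg.norm(A))   # relative error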
22

Eberhard, Michael. "Optimisation of filtration by application of data mining methods." [S.l.] : [s.n.], 2006. http://mediatum2.ub.tum.de/doc/603763/document.pdf.

23

Al-Naymat, Ghazi. "New Methods for Mining Sequential and Time Series Data." University of Sydney, 2009. http://hdl.handle.net/2123/5295.

Abstract:
Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain, including association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to excavate such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets that are embedded in continuous geographic space. We therefore propose an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns; these relationships can be obtained only from the maximal clique patterns, which had never been used until now. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mine and query large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern: a flock is a large subset of objects moving along paths close to each other for a predefined time. One approach to processing a "flock query" is to map ST data into high-dimensional space and to reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures rapidly deteriorates in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results that show the possibility of managing the curse of dimensionality in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, which always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the existence of similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data; our experiments demonstrate that SparseDTW outperforms these approaches. We also discover an interesting pattern by applying the SparseDTW algorithm: "pairs trading" in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002.
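For reference, the dense dynamic-programming recurrence that SparseDTW preserves (while filling far fewer cells) looks as follows; this is the textbook algorithm, not the thesis's sparse implementation.

import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x)*len(y)) DTW with squared pointwise cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # best of a match, an insertion, or a deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

print(dtw_distance([0.0, 1.0, 2.0, 1.0], [0.0, 0.5, 1.0, 2.0, 1.0]))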
24

Hamzaoui, Amel. "Shared-Neighbours methods for visual content structuring and mining." PhD thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00856582.

Abstract:
This thesis investigates new clustering paradigms and algorithms based on the principle of shared nearest neighbours (SNN). As with most other graph-based clustering approaches, SNN methods are well suited to overcoming data complexity, heterogeneity and high dimensionality. The first contribution of the thesis is to revisit existing shared-neighbour methods on two points. We first introduce a new SNN formalism based on the theory of a contrario decision, which allows us to derive more reliable connectivity scores for candidate clusters and a more intuitive interpretation of locally optimal neighbourhoods. We also propose a new factorization algorithm to speed up the intensive computation of the required shared-neighbour matrices. The second contribution of the thesis is a generalization of the SNN clustering approach to the multi-source case. Whereas SNN methods appear ideally suited to sets of heterogeneous information sources, this multi-source problem had surprisingly not been addressed in the literature before. The main originality of our approach is that we introduce an information-source selection step in the computation of candidate cluster scores. As shown in the experiments, this source selection step makes our approach widely robust to the presence of locally outlier sources. The new method is applied to a wide range of problems, including multimodal structuring of image collections and subspace-based clustering using random projections. The third contribution of the thesis is an extension of SNN methods to the context of bipartite k-NN graphs. We introduce new SNN relevance measures revisited for this asymmetric context and show that they can be used to select locally optimal bipartite clusters. Accordingly, we propose a new bipartite SNN clustering algorithm that is applied to visual object discovery based on a randomly precomputed matching graph. Experiments show that this new method outperforms state-of-the-art object mining results on the Oxford Buildings dataset. Based on the discovered objects, we also introduce a new visual search paradigm: object-based visual query suggestion.
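A plain shared-nearest-neighbour similarity, the building block that the thesis revisits with a contrario scores and a factorised computation, can be sketched as follows (brute-force and illustrative only).

import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_similarity(X, k=10):
    """SNN similarity of points i and j = size of the overlap of their
    k-nearest-neighbour lists (excluding the points themselves)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)               # first neighbour is the point itself
    neigh = [set(row[1:]) for row in idx]
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(neigh[i] & neigh[j])
    return S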
25

Fiala, Dalibor, François Rousselot, and Karel Ježek. "Web mining methods for the detection of authoritative sources." Strasbourg: Université Louis Pasteur, 2008. http://eprints-scd-ulp.u-strasbg.fr:8080/883/01/FIALA_Dalibor_2007.pdf.

Abstract:
Thesis in computer science and engineering, University of West Bohemia (Pilsen), 2007; doctoral thesis in computer science, Strasbourg 1, 2007. Thesis defended under joint supervision (cotutelle). Title from title screen. Bibliography pp. 100-107.
26

Giess, Matthew. "Extracting information from manufacturing data using data mining methods." Thesis, University of Bath, 2006. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.432831.

27

Fiala, Dalibor. "Web mining methods for the detection of authoritative sources." Université Louis Pasteur (Strasbourg) (1971-2008), 2007. https://publication-theses.unistra.fr/public/theses_doctorat/2007/FIALA_Dalibor_2007.pdf.

Abstract:
The innovative portion of this doctoral thesis deals with the definition, explanation and testing of modifications of the standard PageRank formula adapted for bibliographic networks. The new versions of PageRank take into account not only the citation graph but also the co-authorship graph. We verify the viability of the new algorithms by applying them to data from the DBLP digital library and by comparing the resulting ranks of the winners of the ACM SIGMOD E. F. Codd Innovations Award. The rankings based on both citation and co-authorship information turn out to be better than the standard PageRank ranking. In another part of the dissertation, we present a methodology and two case studies for finding authoritative researchers by analyzing academic Web sites: the first study examines the links between the web pages of Czech computer science departments and mines the publications found there to identify the most prominent Czech authors; the second applies the same procedure to French university web sites to find the most significant French computer science researchers.
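One plausible toy rendering of the idea of combining the citation graph with co-authorship information is to re-weight citation edges before running PageRank, for instance down-weighting citations between co-authors so that independent citations count more. The exact weighting in the thesis differs; everything below is illustrative.

import networkx as nx

# Toy citation graph among authors: an edge u -> v means u cites v.
citations = nx.DiGraph()
citations.add_edges_from([("A", "B"), ("C", "B"), ("B", "A"), ("C", "A")])
coauthors = {frozenset({"A", "B"})}          # A and B have co-authored

for u, v in citations.edges():
    # citations between co-authors count half as much (assumed factor)
    citations[u][v]["weight"] = 0.5 if frozenset({u, v}) in coauthors else 1.0

ranks = nx.pagerank(citations, alpha=0.85, weight="weight")
print(sorted(ranks, key=ranks.get, reverse=True))   # most authoritative first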
28

Bhattacharya, Sanmitra. "Computational methods for mining health communications in web 2.0." Diss., University of Iowa, 2014. https://ir.uiowa.edu/etd/4576.

Abstract:
Data from social media platforms are being actively mined for trends and patterns of interest. Problems such as sentiment analysis and prediction of election outcomes have become tremendously popular due to the unprecedented availability of social interactivity data of different types. In this thesis we address two problems that have been relatively unexplored. The first relates to mining beliefs, in particular health beliefs, and their surveillance using social media. The second relates to the investigation of factors associated with engagement with U.S. federal health agencies via Twitter and Facebook. For the first problem we propose a novel computational framework for belief surveillance, which can be used for 1) surveillance of any given belief in the form of a probe, and 2) automatically harvesting health-related probes. We present our estimates of support, opposition and doubt for these probes, some of which represent true information, in the sense that they are supported by scientific evidence, others false information, and the remainder debatable propositions. We show, for example, that the levels of support for false and debatable probes are surprisingly high. We also study the scientific novelty of these probes and find that some of the harvested probes with sparse scientific evidence may indicate novel hypotheses. We further show the suitability of off-the-shelf classifiers for belief surveillance; these classifiers are quite generalizable and can be used to classify newly harvested probes. Finally, we show the ability to harvest and track probes over time. Although our work is focused on health care, the approach is broadly applicable to other domains as well. For the second problem, our specific goals are to study factors associated with the amount and duration of engagement, using negative binomial hurdle regression models and Cox proportional hazards survival models. For Twitter, the hurdle analysis shows that the presence of a user-mention is positively associated with the amount of engagement, while negative sentiment has an inverse association. The content of tweets is equally important for engagement. The survival analyses indicate that engagement duration is positively associated with follower count. For Facebook, both hurdle and survival analyses show that the number of page likes and positive sentiment are correlated with higher and prolonged engagement, while a few content types are negatively correlated with engagement. We also find patterns of engagement that are consistent across Twitter and Facebook.
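As an illustration of the survival-analysis half of this modelling (not the author's actual data), a Cox proportional hazards fit with the Python lifelines package looks as follows; the bundled load_rossi table merely stands in for a table of engagement durations, end-of-engagement indicators, and covariates such as follower count and sentiment.

from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

# Stand-in data: in the thesis's setting, each row would be one post, with
# an engagement duration, an event indicator, and covariates.
df = load_rossi()
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()    # hazard ratios; values below 1 mean longer durations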
29

Gomaa, Ehab. "Environmental balance of mining from seafloor." Doctoral thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2014. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-137627.

Abstract:
Underwater mining has increased in recent years, and with it awareness of its potential environmental impacts, as encroachment on the marine environment grows. The debate has therefore intensified over how to protect this environment, both through scientific research into the various environmental effects and through the development of the equipment used in dredging. There is a wide diversity of underwater mining equipment, including continuous and non-continuous dredgers used for the production of sand, gravel, alluvial deposits and raw materials. The increase in dredging activity in recent years is linked to impacts on the aquatic environment: changes in the topography of the sea floor, turbidity, noise and other effects. Today an international framework of legislation has been developed for dredging projects; it contains rules and regulations that mining companies must follow and that national authorities must implement. European countries are also developing legislation to control dredged material deposited on land and at sea, and this legislation changes constantly as scientific knowledge grows and implementation frameworks expand. The public has also become more sensitive to emissions and tends to oppose dredging in its neighbourhood. Dredging techniques thus give rise to objections, which has encouraged thinking about more environmentally friendly production methods, even though a dredger remains the only alternative in some projects. The question is what the true benefits of these techniques are, and how possible improvements relate to technological potential and costs. This thesis therefore assesses mining techniques in the context of their environmental impact and their costs.

Several systematic approaches are used to evaluate the environmental performance of different dredging equipment, techniques and procedures. New developments and the latest proposals in the dredging industry are presented, along with a new proposal to reduce turbidity and suspended material, the most important environmental impacts of dredging operations. This research work describes underwater mining techniques and different ways of evaluating dredging equipment in terms of environmental, economic and social aspects, and presents two evaluation methods: a statistical analysis using the fuzzy evaluation concept, and a mathematical accounting method using information from an Egyptian case study, aimed at identifying the most environmentally friendly dredging techniques while taking economic and social considerations into account. The final evaluation, based on a comparison of parameters such as performance, characteristics, working depth, soil types and project area, showed that the suction dredger and the bucket ladder dredger are the best choices. The work also presents two new practices for the extraction of the underwater placer deposits that occur on the Egyptian Mediterranean coast. The first technique uses a floating processing unit at the in-situ area, which receives and treats the extracted material in order to reduce transportation costs. The second divides the working area into stages so that the suction dredger is also used to transport the material to the beach; a pipeline driven by the suction power is the best method of transportation, further reducing transportation costs and avoiding environmental effects.
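The first evaluation approach mentioned above, fuzzy comprehensive evaluation, can be illustrated with a minimal sketch. The criteria, weights and membership values below are invented for illustration and are not the thesis's data; the pattern is simply a weight vector applied to a membership matrix.

```python
import numpy as np

# Hypothetical criteria weights: performance, working depth, soil-type
# suitability, turbidity impact (invented for illustration).
weights = np.array([0.35, 0.20, 0.20, 0.25])

# Membership matrix: rows = dredger types, columns = criteria; each entry in
# [0, 1] expresses how well a dredger satisfies a criterion.
dredgers = ["suction", "bucket ladder", "grab", "backhoe"]
R = np.array([
    [0.9, 0.8, 0.7, 0.6],
    [0.8, 0.7, 0.9, 0.7],
    [0.6, 0.9, 0.6, 0.4],
    [0.5, 0.6, 0.8, 0.5],
])

# Aggregate score per dredger using a simple weighted-average operator.
scores = R @ weights
for name, score in sorted(zip(dredgers, scores), key=lambda t: -t[1]):
    print(f"{name:14s} {score:.3f}")
```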
APA, Harvard, Vancouver, ISO, and other styles
30

Huangfu, Dan. "Data Mining for Car Insurance Claims Prediction." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-theses/383.

Full text
Abstract:
A key challenge for the insurance industry is to charge each customer an appropriate price for the risk they represent. Risk varies widely from customer to customer, and a deep understanding of different risk factors helps predict the likelihood and cost of insurance claims. The goal of this project is to see how well various statistical methods perform in predicting bodily injury liability insurance claim payments, based on the characteristics of the insured customer's vehicles, for this particular dataset from Allstate Insurance Company. We tried several statistical methods, including logistic regression, Tweedie's compound gamma-Poisson model, principal component analysis (PCA), response averaging, and regression and decision trees. Of all the models we tried, PCA combined with a regression tree produced the best results. This is somewhat surprising given the widespread use of the Tweedie model for insurance claim prediction problems.
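The winning combination described above, PCA followed by a regression tree, can be sketched with scikit-learn. The data here are synthetic stand-ins; the actual Allstate features, preprocessing and tuning are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for vehicle-characteristic features and claim payments.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                          # hypothetical features
y = np.maximum(0.0, X[:, :3].sum(axis=1) + rng.normal(size=1000))  # claim cost

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scale, compress with PCA, then fit a regression tree on the component scores.
model = make_pipeline(StandardScaler(), PCA(n_components=5),
                      DecisionTreeRegressor(max_depth=4, random_state=0))
model.fit(X_tr, y_tr)
print("held-out R^2:", round(model.score(X_te, y_te), 3))
```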
APA, Harvard, Vancouver, ISO, and other styles
31

Demšar, Urška. "Data mining of geospatial data: combining visual and automatic methods." Doctoral thesis, KTH, School of Architecture and the Built Environment (ABE), 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3892.

Full text
Abstract:

Most of the largest databases currently available have a strong geospatial component and contain potentially valuable information. The discipline concerned with extracting this information and knowledge is data mining. Knowledge discovery is performed by applying automatic algorithms that recognise patterns in the data.

Classical data mining algorithms assume that data are independently generated and identically distributed. Geospatial data, however, are multidimensional, spatially autocorrelated and heterogeneous. These properties make classical data mining algorithms inappropriate for geospatial data, as their basic assumptions cease to be valid. Extracting knowledge from geospatial data therefore requires special approaches. One such approach is visual data mining, where the data are presented in visual form so that a human can perform the pattern recognition. When visual mining is applied to geospatial data, it is part of the discipline called exploratory geovisualisation.

Both automatic and visual data mining have their respective advantages. Computers can process large amounts of data much faster than humans, while humans can recognise objects and visually explore data far more effectively than computers. A combination of visual and automatic data mining draws together human cognitive skills and computer efficiency, and permits faster and more effective knowledge discovery.

This thesis investigates whether a combination of visual and automatic data mining is useful for the exploration of geospatial data. Three case studies illustrate three different combinations of methods. In the first case study, hierarchical clustering is combined with visual data mining for the exploration of geographical metadata. The second case study explores an environmental dataset with a combination of visual mining and a Self-Organising Map. In the third case study, spatial pre-processing and visual data mining methods were used on emergency-response data.

Contemporary system design methods involve user participation at all stages. These methods originated in the field of Human-Computer Interaction, but have been adapted to the geovisualisation issues that arise in spatial problem solving. Attention to user-centred design was present in all three case studies, but the principles were followed in full only in the third, where a usability assessment was performed using a combination of formal evaluation and exploratory usability testing.
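To make the combination of an automatic step with a visual step concrete, a minimal sketch (not from the thesis; data and parameters are invented) might pair hierarchical clustering with a plot for human inspection:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical two-dimensional geospatial attribute data in three groups.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(loc, 0.5, size=(50, 2))
                 for loc in ((0, 0), (3, 3), (0, 4))])

# Automatic step: hierarchical (Ward) clustering into three clusters.
labels = fcluster(linkage(pts, method="ward"), t=3, criterion="maxclust")

# Visual step: display the clusters so a human can judge whether the
# automatically detected pattern is meaningful.
plt.scatter(pts[:, 0], pts[:, 1], c=labels, cmap="viridis", s=15)
plt.title("Hierarchical clusters for visual inspection")
plt.show()
```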

APA, Harvard, Vancouver, ISO, and other styles
32

Demšar, Urška. "Data mining of geospatial data: combining visual and automatic methods /." Stockholm : Department of urban planning and environment, Royal Institute of Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3892.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Jakimauskas, Gintautas. "Analysis and application of empirical Bayes methods in data mining." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2014~D_20140423_090853-72998.

Full text
Abstract:
The research object is empirical Bayes data mining methods and algorithms applied to the analysis of large, high-dimensional populations. The aim of the research is to create methods and algorithms for testing nonparametric hypotheses on large populations and for estimating the parameters of data models. The following problems are solved to reach this aim: 1. To create an efficient partitioning algorithm for high-dimensional data. 2. To apply this partitioning algorithm to the testing of nonparametric hypotheses. 3. To apply the empirical Bayes method to testing the independence of components of high-dimensional data vectors. 4. To develop an algorithm for estimating the probabilities of rare events in large populations using the empirical Bayes method, comparing the Poisson-gamma and Poisson-Gaussian mathematical models and selecting the optimal model and the corresponding empirical Bayes estimator. 5. To create an algorithm for logistic regression of rare events using the empirical Bayes method. The results obtained enable very fast and efficient partitioning of high-dimensional data; testing of the independence of selected components of high-dimensional data; and selection of the optimal model for estimating the probabilities of rare events, using the Poisson-gamma and Poisson-Gaussian models and empirical Bayes estimators. A nonsingularity condition for the Poisson-gamma model is presented.
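The Poisson-gamma empirical Bayes estimator referred to in objective 4 can be sketched as follows. The moment-matching prior fit and the data are illustrative assumptions, not the thesis's exact algorithm: counts x_i ~ Poisson(lambda_i) with lambda_i ~ Gamma(alpha, beta), so the posterior mean shrinks each raw count towards the prior mean.

```python
import numpy as np

def eb_poisson_gamma(counts):
    """Empirical Bayes estimates of Poisson rates under a Gamma(alpha, beta)
    prior (shape/rate), fitted by the method of moments."""
    x = np.asarray(counts, dtype=float)
    m, v = x.mean(), x.var(ddof=1)
    if v <= m:                       # no overdispersion: fall back to the mean
        return np.full_like(x, m)
    beta = m / (v - m)               # prior rate
    alpha = m * beta                 # prior shape
    return (alpha + x) / (beta + 1.0)    # posterior means, shrunk towards m

# Simulated rare-event rates and observed counts.
rng = np.random.default_rng(7)
true_rates = rng.gamma(shape=2.0, scale=1.5, size=500)
counts = rng.poisson(true_rates)
est = eb_poisson_gamma(counts)
print("raw-count MSE:", round(np.mean((counts - true_rates) ** 2), 3))
print("EB-shrunk MSE:", round(np.mean((est - true_rates) ** 2), 3))
```

On simulated data of this kind the shrunken estimates typically have a lower mean squared error than the raw counts, which is the practical appeal of the empirical Bayes approach for rare events.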
APA, Harvard, Vancouver, ISO, and other styles
34

Alali, Abdulkareem. "Improved Methods for Mining Software Repositories to Detect Evolutionary Couplings." Kent State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=kent1406565384.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Lowe, Robert Alexander. "Investigating machine learning methods in chemistry." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610567.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Molavi, M. A. "A study of potash mining methods related to ground control criteria /." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=66262.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Scott-Russell, Hugh. "The application of mechanised loading and drilling methods in the gold mining industry." Thesis, University of Nottingham, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.352962.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Lloyd, P. W. "An investigation of the influence of mining upon rock mass behaviour and stratified deposits." Thesis, Cardiff University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.244117.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Naveed, Nasir [Verfasser]. "Mining social media: methods and approaches for content analysis / Nasir Naveed." Koblenz : Universitätsbibliothek Koblenz, 2014. http://d-nb.info/1051888239/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Shaker, Ammar [Verfasser]. "Novel methods for mining and learning from data streams / Ammar Shaker." Paderborn : Universitätsbibliothek, 2017. http://d-nb.info/1131162684/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Zhang, Qi Wang Wei. "Mining emerging massive scientific sequence data using block-wise decomposition methods." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2009. http://dc.lib.unc.edu/u?/etd,2530.

Full text
Abstract:
Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2009.
Title from electronic title page (viewed Oct. 5, 2009). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science." Discipline: Computer Science; Department/School: Computer Science.
APA, Harvard, Vancouver, ISO, and other styles
42

Newby, Danielle Anne. "Data mining methods for the prediction of intestinal absorption using QSAR." Thesis, University of Kent, 2014. https://kar.kent.ac.uk/47600/.

Full text
Abstract:
Oral administration is the most common route of drug administration. With the growing cost of drug discovery, Quantitative Structure-Activity Relationship (QSAR) models that predict oral absorption computationally are highly desirable. The aim of this research was to develop QSAR models that are both accurate and interpretable for the prediction of oral absorption. The problems addressed were datasets with unbalanced class distributions, feature selection, and the effects of solubility and permeability on oral absorption prediction. Firstly, oral absorption models were obtained that overcome the problem of unbalanced class distributions using two techniques: under-sampling of compounds belonging to the majority class, and the use of different misclassification costs for different types of misclassification. Using these methods, models with higher accuracy were produced with regression and with linear and non-linear classification techniques. Secondly, several pre-processing feature selection methods used in tandem with decision tree classification analysis, including misclassification costs, were found to produce models with better interpretability and higher predictive accuracy; these methods successfully selected the most important molecular descriptors and overcame the problem of unbalanced classes. Thirdly, the roles of solubility and permeability in oral absorption were investigated, which involved expanding the oral absorption datasets and collecting in vitro and aqueous solubility data. This work found that including predicted and experimental solubility in permeability models can improve model accuracy, although the impact of solubility on oral absorption prediction was not as strong as expected. Finally, predictive models of permeability and solubility were built to predict a provisional Biopharmaceutics Classification System (BCS) class using two multi-label classification techniques, binary relevance and classifier chain. The classifier chain method showed higher predictive accuracy, because using predicted solubility as a molecular descriptor in the permeability models improved the final provisional BCS prediction. Overall, this research has produced predictive and interpretable models that could be useful in a drug discovery context.
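One of the techniques named above, assigning different misclassification costs to counter unbalanced classes, can be sketched with the class_weight option of a scikit-learn decision tree. The data are synthetic stand-ins for descriptor/absorption classes, and the cost values are invented:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Unbalanced two-class data standing in for high/low-absorption compounds
# described by molecular descriptors (90% majority class).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Compare unit costs with a higher cost for misclassifying the minority class.
for weights in (None, {0: 1, 1: 9}):
    tree = DecisionTreeClassifier(max_depth=4, class_weight=weights,
                                  random_state=0)
    tree.fit(X_tr, y_tr)
    score = balanced_accuracy_score(y_te, tree.predict(X_te))
    print(f"class_weight={weights}: balanced accuracy = {score:.3f}")
```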
APA, Harvard, Vancouver, ISO, and other styles
43

Liu, Yang. "Data mining methods for single nucleotide polymorphisms analysis in computational biology." HKBU Institutional Repository, 2011. http://repository.hkbu.edu.hk/etd_ra/1287.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Jonsson, Hanna. "Safety Education for Future Mining." Thesis, Luleå tekniska universitet, Institutionen för ekonomi, teknik och samhälle, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-75563.

Full text
Abstract:
The work environment in mines has changed over the last decades. Compared with the days when birds were used to test the air quality in coal mines, today's mines strive for zero-entry production, in which the people who work for the mining company are stationed above ground instead of below it. Through digitalization and automation, companies like Boliden aim to create a safer work environment. Zero-entry mines, however, still lie in the future, and until then existing risks and hazards must be managed. This master thesis at Luleå University of Technology was carried out in collaboration with the Crusher and Ore Handling System (G55) department at Boliden Aitik. It aims to improve work conditions and contribute to a safer work environment by increasing awareness and knowledge of risks and routines at the G55 department. To accomplish this, an educational tool was developed as a supplement to the current safety educations provided by SSG. The thesis focuses on providing workers with safety information, motivated by lack of knowledge as a cause of accidents. During site visits, interviews and observations were conducted to map the current and future state of the G55 department and of Boliden as a company. In total, ten interviews were performed, along with several feedback sessions; the feedback led to adjustments, which the iterative working process made possible by allowing earlier steps to be revisited. The current-state mapping was compared with theory, and a theoretical framework covering "Health and Safety" and "Understanding and Developing Training Material" was used as the foundation for analysing the current state, discussing improvement areas and making decisions while developing the education material. Since the material supplements existing SSG safety educations, training methods were also investigated. The resulting education material delivered to the G55 department consists of a lecture-based presentation in PowerPoint and a pamphlet summarising the lecture content. The lecture format was chosen because it provides personal contact between new workers and existing staff, complementing the current computer-based SSG safety education, which lacks such contact. The delivered material contains information considered important for new workers to know before starting their employment. Recommendations for implementation are to translate the material into English, so as to reach non-Swedish-speaking persons entering the department, and to keep developing it. The discussion questions whether additional education is the most efficient way to manage and correct risks, given the classification of the existing risks, and concludes that it is an easy tool for short-term control; for a sustainable long-term solution, the mapping of the organisation should instead be used to eliminate or isolate current risks and hazards. Today an educational supplement is necessary, and the hope is that the G55 department will keep developing its organisation, eliminate current risks and, in the long term, achieve a zero-accident vision.
APA, Harvard, Vancouver, ISO, and other styles
45

Mosquera, Jenyfer. "Static and pseudo-static stability analysis of tailings storage facilities using deterministic and probabilistic methods." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=117155.

Full text
Abstract:
Tailings facilities are vast man-made structures designed and built for the storage and management of mill effluents throughout the life of a mining project. There are different types of tailings storage facilities (TSFs), classified according to the method of construction of the embankment and the mechanical properties of the tailings to be stored. The composition of the tailings is determined by the mineral-processing technique used to obtain the concentrate and by the physical and chemical properties of the ore body. As a common denominator, TSFs are vulnerable to failure caused by design or operational deficiencies, site-specific features, or random variables such as material properties, seismic events or unusual precipitation. Long-term, risk-based stability assessment of mine waste storage facilities is therefore necessary. The stability analyses of TSFs are traditionally conducted using the Limit Equilibrium Method (LEM). It has been demonstrated, however, that relying exclusively on this approach may not provide a full understanding of the behaviour of the TSF, because the LEM neglects the stress-deformation constitutive relationships that ensure displacement compatibility. Furthermore, since the LEM is essentially deterministic, the intrinsic variability of tailings properties is not taken into account either. To overcome these limitations, new methods and techniques have been proposed for slope stability assessment: the Strength Reduction Technique (SRT) based on the Finite Element Method (FEM), for instance, has been applied successfully for this purpose. Likewise, probabilistic stability assessment has gained increasing popularity in mining engineering because it offers a comprehensive and more realistic estimation of TSF performance. In the light of these advances in numerical modelling and geotechnical engineering applied to the mining industry, this thesis presents a stability-analysis comparison between an upstream tailings storage facility (UTSF) and a water retention tailings dam (WRTD). First, the effect of embankment/tailings height increase on overall stability is evaluated under static and pseudo-static conditions. Second, the effects of the phreatic surface location in the UTSF and of the embankment-to-core permeability ratio in the WRTD are investigated. The analyses are conducted using rigorous and simplified LEMs and the FEM-based SRT. To take the effect of the intrinsic variability of tailings properties on stability into consideration, parametric analyses are conducted to identify the critical random variables of each TSF. Finally, Monte Carlo Simulation (MCS) and the Point Estimate Method (PEM) are applied to recalculate the factors of safety (FOS) and to estimate the probability of failure and the reliability indices of each analysis. The results are compared against the minimum static and pseudo-static stability requirements and design guidelines applicable to mining operations in the Province of Quebec, Canada. Keywords: tailings storage facilities (TSF), Limit Equilibrium Method (LEM), Strength Reduction Technique (SRT), pseudo-static seismic coefficient, probability of failure, Point Estimate Method (PEM), reliability index.
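The Monte Carlo step described above can be illustrated with a toy limit-state function. The textbook infinite-slope factor-of-safety formula stands in for the full TSF analyses of the thesis, and all parameter distributions are invented:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
c = rng.normal(15.0, 3.0, n)                    # cohesion (kPa), assumed normal
phi = np.radians(rng.normal(30.0, 2.0, n))      # friction angle (degrees)
gamma, z, slope = 18.0, 10.0, np.radians(25.0)  # unit weight, depth, slope angle

# Infinite-slope factor of safety (dry case), evaluated per random sample.
fos = (c + gamma * z * np.cos(slope) ** 2 * np.tan(phi)) / (
    gamma * z * np.sin(slope) * np.cos(slope))

pf = np.mean(fos < 1.0)                         # probability of failure
beta = (fos.mean() - 1.0) / fos.std()           # crude reliability index
print(f"P(FOS < 1) = {pf:.5f}, reliability index = {beta:.2f}")
```

The Point Estimate Method replaces the random sampling with a small number of deterministic evaluations at points placed one standard deviation above and below each random variable's mean.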
APA, Harvard, Vancouver, ISO, and other styles
46

Davis, Aaron Samuel. "Bisecting Document Clustering Using Model-Based Methods." BYU ScholarsArchive, 2009. https://scholarsarchive.byu.edu/etd/1938.

Full text
Abstract:
We all have access to large collections of digital text documents, which are useful only if we can make sense of them and distill important information from them. Good document clustering algorithms that organize such information automatically in meaningful ways can make a difference in how effectively we use it. In this paper we use model-based document clustering algorithms as the base for bisecting methods, in order to identify increasingly cohesive clusters within larger, more diverse ones. Specifically, we use the EM algorithm and Gibbs sampling on a mixture of multinomials as the base clustering algorithms on three data sets. Additionally, we apply a refinement step, using EM, to the final output of each clustering technique. Our results show improved agreement with human-annotated document classes compared to the existing base clustering algorithms, with marked improvement on two of the three data sets.
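A compact sketch of the bisecting idea, in the spirit of the paper, is given below: a two-component multinomial-mixture EM splits the largest cluster until k clusters remain. The Gibbs-sampling variant and the EM refinement step are omitted, the data are synthetic, and there is no guard against degenerate (empty) splits:

```python
import numpy as np
from scipy.special import logsumexp

def em_multinomial_split(X, iters=50, seed=0):
    """EM for a two-component mixture of multinomials over count vectors X;
    returns a hard 0/1 label per document."""
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet((1.0, 1.0), size=X.shape[0])  # random soft assignments
    for _ in range(iters):
        pi = resp.mean(axis=0)                    # M-step: mixing weights
        theta = resp.T @ X + 1.0                  # smoothed word distributions
        theta /= theta.sum(axis=1, keepdims=True)
        log_p = np.log(pi) + X @ np.log(theta).T  # E-step, in log space
        resp = np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))
    return resp.argmax(axis=1)

def bisecting_cluster(X, k):
    """Repeatedly bisect the largest cluster until k clusters remain."""
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        j = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(j)
        labels = em_multinomial_split(X[idx], seed=len(clusters))
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

# Tiny synthetic term-count matrix: 3 topics x 30 documents, 4-word vocabulary.
rng = np.random.default_rng(3)
X = np.vstack([rng.multinomial(40, p, size=30)
               for p in ([.6, .2, .1, .1], [.1, .6, .2, .1], [.1, .1, .2, .6])])
for i, c in enumerate(bisecting_cluster(X, 3)):
    print(f"cluster {i}: {len(c)} documents")
```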
APA, Harvard, Vancouver, ISO, and other styles
47

Salvi, Giampiero. "Mining Speech Sounds : Machine Learning Methods for Automatic Speech Recognition and Analysis." Doctoral thesis, Stockholm : KTH School of Computer Science and Comunication, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4111.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

TICONA, WILFREDO MAMANI. "STUDY OF DATA MINING METHODS APPLIED TO THE FINANCIAL MANAGEMENT OF MUNICIPALITIES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=35344@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Taxes collected by city halls are reverted to the common good: investments (such as infrastructure) and the funding of public goods and services such as health, safety and education. Predicting future tax revenues is one of the challenges city halls face. This is an important task, because the information obtained from these predictions is valuable for supporting decisions about the city hall's strategic planning. The investigation of prediction models for municipal tax revenues using intelligent techniques is therefore of great importance for public administration. Accordingly, one of the goals of this dissertation was to develop two models for tax revenue prediction using neural networks: the first considering endogenous variables only, and the second considering both endogenous and exogenous variables. Another major challenge for city halls is irregularity in tax payments (error or fraud), which also harms strategic planning. Auditing all taxpayers every month is an impossible task, owing to the disproportion between the number of taxpayers and the small number of tax agents. Research on methods based on intelligent techniques that indicate possible irregularities is therefore of great importance for the work of tax agents, and another objective of this dissertation was to develop a model to identify possible suspects of irregularities in the payment of the ISSQN (a tax on services of any nature). The prediction models were evaluated in three case studies using data from the city hall of Araruama. The first two case studies, using endogenous variables, predicted active debt revenues and tax revenues; the third predicted ISSQN revenues using both endogenous and exogenous variables. Despite the limitations of the data used in the case studies, the results obtained from the modelling are considered promising. Regarding tax irregularities, although it was not possible to evaluate the results obtained, the developed tool may be used as an indicator for further audits.
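As an illustration of the endogenous-variables idea, the sketch below predicts a synthetic monthly revenue series from its own twelve previous values with a small neural network. The series, network size and train/test split are invented; the actual Araruama data and model design are not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic monthly revenue: trend + seasonality + noise, 20 years of data.
rng = np.random.default_rng(5)
t = np.arange(240)
revenue = 100 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 240)

# Supervised framing: predict month t from the twelve preceding months.
lags = 12
X = np.column_stack([revenue[i:i - lags] for i in range(lags)])  # t-12 .. t-1
y = revenue[lags:]

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                   random_state=0))
model.fit(X[:-24], y[:-24])                      # hold out the last 24 months
pred = model.predict(X[-24:])
print(f"MAE on held-out months: {np.mean(np.abs(pred - y[-24:])):.2f}")
```

An exogenous-variables variant would simply append columns such as economic indicators to X alongside the lagged revenues.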
APA, Harvard, Vancouver, ISO, and other styles
49

Tatiya, Ratan Raj. "Ore estimation and selection of underground mining methods for some copper deposits." Thesis, Imperial College London, 1987. http://hdl.handle.net/10044/1/46738.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Chang, Shi Fong, and 張希鳳. "Schizophrenia Screening using data mining methods." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/21363974455241615659.

Full text
Abstract:
Master's thesis
Shu-Te University
Graduate Institute of Information Management
ROC year 90 (2001–2002)
In this thesis we employ intelligent data analysis methods, the C4.5 decision tree and logistic regression, to explore the mental illness schizophrenia through neuropsychological test batteries. Using these batteries, we first build a neuropsychological impairment model and then screen brain-behavior relationships of schizophrenia to understand the various neurobehavioral systems that the illness may influence. With the neuropsychological impairment model, behavioral scientists and clinicians can differentiate patients with schizophrenia-like conditions from schizophrenic patients, and then decide how to rehabilitate the patients according to their impaired neuropsychological functions.
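A minimal sketch of the two classifiers compared in the thesis is shown below, using an entropy-based scikit-learn tree as a stand-in for C4.5 (scikit-learn implements CART rather than C4.5) on synthetic stand-in battery scores:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for neuropsychological battery scores (e.g. memory,
# attention, executive-function subtests) and a diagnostic label.
rng = np.random.default_rng(11)
n = 300
X = rng.normal(size=(n, 6))                      # hypothetical subtest scores
y = (X[:, 0] - 0.8 * X[:, 2] + rng.normal(0, 1, n) > 0).astype(int)

# Compare a C4.5-style (entropy-split) tree with logistic regression.
models = [("decision tree", DecisionTreeClassifier(criterion="entropy",
                                                   max_depth=3, random_state=0)),
          ("logistic regression", LogisticRegression())]
for name, clf in models:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold CV accuracy = {acc:.3f}")
```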
APA, Harvard, Vancouver, ISO, and other styles