Dissertations / Theses on the topic 'Mining engineering Blasting Data processing'


Consult the top 26 dissertations / theses for your research on the topic 'Mining engineering Blasting Data processing.'


1

Williamson, Lance K. "ROPES: an expert system for condition analysis of winder ropes." Master's thesis, University of Cape Town, 1990. http://hdl.handle.net/11427/15982.

Abstract:
Includes bibliographical references.
This project was commissioned to provide engineers with the knowledge of steel wire winder ropes needed to make accurate decisions about when a rope is nearing the end of its useful life. For this purpose, a knowledge base was compiled from the experience of experts in the field in order to create an expert system that aids the engineer in this task. The EXSYS expert system shell was used to construct a rule-based program that runs on a personal computer. The program developed in this thesis, named ROPES, provides information on the forms of damage that may be present in a rope and on the effect of any defects on rope strength and rope life. Advice is given on the procedures to follow when damage is detected, on the conditions that necessitate rope discard, and on the urgency with which replacement should take place. The expert system will provide engineers with the expertise and experience needed to assess the condition of a winder rope more accurately than at present. This should lead to longer rope life and improved safety, with the associated cost savings. Rope assessment will also become more uniform, and policy changes can be implemented quickly and on an ongoing basis as technology and experience improve. Although compiled from expert knowledge, ROPES still requires the further input of personal opinions and inferences to some extent; for this reason, the program cannot be assumed infallible and must be used as an aid only.
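A minimal sketch of the rule-based approach, assuming hypothetical damage categories and thresholds (the actual knowledge base was compiled from expert interviews in the EXSYS shell and is not reproduced here):

```python
# Hypothetical rope-condition rules in the spirit of ROPES; the
# categories, thresholds and advice below are illustrative only.
RULES = [
    (lambda o: o["broken_wires_per_lay"] >= 6,
     "Discard rope: broken-wire count exceeds limit.", "immediate"),
    (lambda o: o["diameter_loss_pct"] > 10,
     "Discard rope: excessive diameter reduction.", "immediate"),
    (lambda o: o["corrosion"] == "severe",
     "Schedule replacement and increase inspection frequency.", "high"),
    (lambda o: o["broken_wires_per_lay"] >= 3,
     "Monitor closely; re-inspect within one week.", "moderate"),
]

def assess(observations):
    """Return the advice of every rule that fires, most urgent first."""
    fired = [(advice, urgency) for cond, advice, urgency in RULES
             if cond(observations)]
    order = {"immediate": 0, "high": 1, "moderate": 2}
    return sorted(fired, key=lambda item: order[item[1]])

print(assess({"broken_wires_per_lay": 4, "diameter_loss_pct": 12,
              "corrosion": "light"}))
```

A real deployment would, as the abstract stresses, treat such output as an aid to the engineer rather than a verdict.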
2

Van Schaik, Sebastiaan Johannes. "A framework for processing correlated probabilistic data." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:91aa418d-536e-472d-9089-39bef5f62e62.

Abstract:
The amount of digitally-born data has surged in recent years. In many scenarios, this data is inherently uncertain (or: probabilistic), such as data originating from sensor networks, image and voice recognition, location detection, and automated web data extraction. Probabilistic data requires novel and different approaches to data mining and analysis, which explicitly account for the uncertainty and the correlations therein. This thesis introduces ENFrame: a framework for processing and mining correlated probabilistic data. Using this framework, it is possible to express both traditional and novel algorithms for data analysis in a special user language, without having to explicitly address the uncertainty of the data on which the algorithms operate. The framework will subsequently execute the algorithm on the probabilistic input, and perform exact or approximate parallel probability computation. During the probability computation, correlations and provenance are succinctly encoded using probabilistic events. This thesis contains novel contributions in several directions. An expressive user language – a subset of Python – is introduced, which allows a programmer to implement algorithms for probabilistic data without requiring knowledge of the underlying probabilistic model. Furthermore, an event language is presented, which is used for the probabilistic interpretation of the user program. The event language can succinctly encode arbitrary correlations using events, which are the probabilistic counterparts of deterministic user program variables. These highly interconnected events are stored in an event network, a probabilistic interpretation of the original user program. Multiple techniques for exact and approximate probability computation (with error guarantees) of such event networks are presented, as well as techniques for parallel computation. Adaptations of multiple existing data mining algorithms are shown to work in the framework, and are subsequently subjected to an extensive experimental evaluation. Additionally, a use-case is presented in which a probabilistic adaptation of a clustering algorithm is used to predict faults in energy distribution networks. Lastly, this thesis presents techniques for integrating a number of different probabilistic data formalisms for use in this framework and in other applications.
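As a toy illustration of the central idea, probability computation over correlated events, a sketch follows; it is not ENFrame's event language, and it enumerates all possible worlds, the exponential cost that the thesis's exact and approximate techniques are designed to avoid:

```python
from itertools import product

# Independent Bernoulli variables; correlation between events arises
# solely from shared variables, as in an event network.
priors = {"x1": 0.7, "x2": 0.4, "x3": 0.9}   # P(variable is True)

def prob(event):
    """Exact probability of `event` by summing over all worlds."""
    total = 0.0
    names = list(priors)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        p = 1.0
        for n in names:
            p *= priors[n] if world[n] else 1 - priors[n]
        if event(world):
            total += p
    return total

a = lambda w: w["x1"] and w["x2"]     # event A
b = lambda w: w["x1"] or w["x3"]      # event B, correlated with A via x1
print(prob(a), prob(b), prob(lambda w: a(w) and b(w)))  # P(A), P(B), P(A and B)
```

Note that P(A and B) differs from P(A) * P(B) here, which is exactly the correlation a naive independence assumption would miss.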
3

Pabarškaitė, Židrina. "Enhancements of pre-processing, analysis and presentation techniques in web log mining." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20090713_142203-05841.

Abstract:
As the Internet becomes an important part of our lives, more attention is paid to information quality and how it is presented to the user. The research area of this work is web data analysis and methods for processing this data. Such knowledge can be extracted from web servers' log files, in which users' navigational patterns are recorded. The research object of the dissertation is the web log data mining process, together with the related topics of web log data preparation methods, data mining algorithms for prediction and classification tasks, and web text mining. The key target of the thesis is to develop methods that improve the knowledge discovery steps in mining web log data and reveal new opportunities to the data analyst. While performing web log analysis, it was discovered that insufficient attention had been paid to the web log data cleaning process. By reducing the number of redundant records, the data mining process becomes much more effective and faster; a new, original cleaning framework was therefore introduced that retains only the records corresponding to real user clicks. People tend to understand technical information better when it resembles human language, so decision trees are advantageous for mining web log data, as they generate web usage patterns in the form of rules understandable to humans. However, users' browsing histories differ in length, so after specific data preparation, forming fixed-length vectors, it becomes worthwhile to apply decision tree algorithms not previously used in this setting.
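A minimal sketch of such a cleaning step, assuming a common Apache-style combined log format; the regular expressions are illustrative heuristics, not the dissertation's cleaning framework:

```python
import re

RESOURCE = re.compile(r"\.(gif|jpe?g|png|css|js|ico)(\?|$)", re.I)
ROBOT_UA = re.compile(r"(bot|crawler|spider|slurp)", re.I)
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<url>\S+) [^"]*" (?P<status>\d{3}) '
                      r'\S+ "[^"]*" "(?P<agent>[^"]*)"')

def real_clicks(lines):
    """Yield only log records that plausibly correspond to user clicks."""
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        if m.group("status") != "200":          # failed requests are not clicks
            continue
        if RESOURCE.search(m.group("url")):     # embedded images/styles/scripts
            continue
        if ROBOT_UA.search(m.group("agent")):   # crawler traffic
            continue
        yield line

sample = [
    '1.2.3.4 - - [01/Jan/2009:10:00:00] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [01/Jan/2009:10:00:01] "GET /logo.png HTTP/1.1" 200 99 "-" "Mozilla/5.0"',
]
print(list(real_clicks(sample)))   # only the /index.html click survives
```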
4

Yu, Zhiguo. "Cooperative Semantic Information Processing for Literature-Based Biomedical Knowledge Discovery." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/33.

Abstract:
Given that data is increasing exponentially every day, extracting and understanding the information, themes and relationships in large collections of documents is increasingly important to researchers in many areas. This thesis presents a cooperative semantic information processing system to help biomedical researchers understand and discover knowledge in large numbers of titles and abstracts from PubMed query results. The system is based on topic modeling, a prevalent unsupervised machine learning approach for discovering the set of semantic themes in a large set of documents. In addition, a natural language processing technique is applied to transform the "bag-of-words" assumption of topic models into a "bag-of-important-phrases" assumption, and an interactive visualization tool is built using a modified, open-source Topic Browser. Finally, two experiments evaluate the approach. The first evaluates whether the "bag-of-important-phrases" approach identifies semantic themes better than the standard "bag-of-words" approach; this is an empirical study in which human subjects judge the quality of the resulting topics using a standard "word intrusion test", determining whether subjects can identify a word (or phrase) that does not belong in the topic. The second is a qualitative empirical study of how well the system helps biomedical researchers explore a set of documents to discover previously hidden semantic themes and connections; its methodology has been used successfully to evaluate other knowledge-discovery tools in biomedicine.
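A hedged sketch of the shift from "bag-of-words" to phrase-aware topics: here bigram counts merely approximate "important phrases" (the thesis extracts them with a dedicated NLP step and explores the result in a modified Topic Browser), and the four documents are fabricated:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["protein kinase signaling in tumor cells",
        "gene expression profiling of tumor cells",
        "natural language processing of clinical notes",
        "information extraction from clinical notes"]

# ngram_range=(1, 2) adds bigrams such as "tumor cells" to the vocabulary,
# a crude stand-in for extracted important phrases
vec = CountVectorizer(ngram_range=(1, 2), stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-5:][::-1]
    print(f"topic {k}:", [terms[i] for i in top])
```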
5

Grenoble, B. Alex. "Microcomputer simulation of near seam interaction." Thesis, Virginia Polytechnic Institute and State University, 1985. http://hdl.handle.net/10919/90929.

Abstract:
The mining of coal within 110 feet below a previously mined seam creates interaction effects which can be detrimental to work in the lower seam. These interaction effects are characterized by zones of very high stress and result in floor and roof instability and pillar crushing. Recent developments in the field of ground control make it possible to determine with a certain degree of confidence the location of these zones and estimate the degree to which the interaction will affect the lower seam. This information has been incorporated into a software package for microcomputers which will predict lower seam problems and suggest design criteria for minimizing the difficulties which will be encountered.
M.S.
6

Gupta, Shweta. "Software Development Productivity Metrics, Measurements and Implications." Thesis, University of Oregon, 2018. http://hdl.handle.net/1794/23816.

Abstract:
The rapidly increasing capabilities and complexity of numerical software present a growing challenge to software development productivity. While many open source projects enable the community to share experiences, learn and collaborate, estimating individual developer productivity becomes more difficult as projects expand. In this work, we analyze several HPC software Git repositories with issue trackers and compute productivity metrics that can be used to better understand and potentially improve development processes. Evaluating productivity in these communities presents additional challenges because bug reports and feature requests are often made on mailing lists instead of issue trackers, resulting in difficult-to-analyze unstructured data. For such data, we investigate automatic tag generation using natural language processing techniques. We aim to produce metrics that help quantify productivity improvement or degradation over the projects' lifetimes. We also provide an objective measurement of productivity based on effort estimation for the developers' work.
7

Castro, Jose R. "MODIFICATIONS TO THE FUZZY-ARTMAP ALGORITHM FOR DISTRIBUTED LEARNING IN LARGE DATA SETS." Doctoral diss., University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4449.

Abstract:
The Fuzzy-ARTMAP (FAM) algorithm is one of the premier neural network architectures for classification problems. FAM can learn on line and is usually faster than other neural network approaches. Nevertheless, the learning time of FAM can slow down considerably when the size of the training set increases into the hundreds of thousands. In this dissertation we apply data partitioning and network partitioning to the FAM algorithm in sequential and parallel settings to achieve better convergence time and to train efficiently on large databases (hundreds of thousands of patterns). We implement our parallelization on a Beowulf cluster of workstations, a choice of platform that requires the parallelization to be coarse grained. Extensive testing of all the approaches is done on three large datasets (half a million data points each): the Forest Covertype database from Blackard, and two artificially generated Gaussian datasets with different percentages of overlap between classes. Speedups from the data partitioning approach reached the order of hundreds without any investment in parallel computation, and speedups from the network partitioning approach are close to linear on a cluster of workstations. Both methods allowed us to reduce the time to train the neural network on large databases from days to minutes. We prove formally that the workload balance of our network partitioning approaches will never be worse than an acceptable bound, and we also demonstrate the correctness of these parallel variants of FAM.
Ph.D.
School of Electrical and Computer Engineering
Engineering and Computer Science
Electrical and Computer Engineering
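The data-partitioning idea generalizes beyond Fuzzy-ARTMAP; a minimal sketch follows, with a decision tree standing in for FAM (which has no standard scikit-learn implementation) and synthetic data in place of the Covertype and Gaussian sets. Each partition can be trained independently, e.g. one per Beowulf node:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(30_000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

p = 8                                           # number of partitions
parts = np.array_split(rng.permutation(len(X)), p)
models = [DecisionTreeClassifier(max_depth=8).fit(X[idx], y[idx])
          for idx in parts]                     # embarrassingly parallel

def majority_vote(models, X):
    preds = np.stack([m.predict(X) for m in models])
    return (preds.mean(axis=0) > 0.5).astype(int)

print("ensemble accuracy:", (majority_vote(models, X) == y).mean())
```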
8

Kumar, Saurabh. "Real-Time Road Traffic Events Detection and Geo-Parsing." Thesis, Purdue University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10842958.

Abstract:

In the 21st century there are ever more vehicles on the road but only limited road infrastructure. Together these culminate in daily challenges for the average commuter due to congestion and slow-moving traffic; in the United States alone, this costs the average driver $1,200 every year in fuel and time. Some positive steps, including (a) the introduction of push notification systems and (b) the deployment of more law enforcement personnel, have been taken for better traffic management. However, these methods have limitations and require extensive planning. Another way to deal with traffic problems is to track congested areas in a city using social media, so that law enforcement resources can be re-routed to these areas on a real-time basis.

Given the ever-increasing number of smartphone devices, social media can be used as a source of information to track traffic-related incidents.

Social media sites allow users to share their opinions and information. Platforms like Twitter, Facebook, and Instagram are very popular among users, enabling them to share whatever they want in the form of text and images; Facebook users alone generate millions of posts per minute. On these platforms, abundant data, including news, trends, events, opinions and product reviews, is generated daily.

Worldwide, organizations are using social media for marketing purposes. This data can also be used to analyze traffic-related events like congestion, construction work and slow-moving traffic. The motivation behind this research is thus to use social media posts to extract information relevant to traffic, with effective and proactive traffic administration as the primary focus. I propose an intuitive two-step process for using Twitter users' posts to retrieve traffic-related information on a real-time basis: a text classifier first filters the data down to posts that contain traffic information, and a Part-Of-Speech (POS) tagger then finds the geolocation information. A prototype of the proposed system is implemented using a distributed microservices architecture.
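A hedged sketch of the two-step pipeline (not the thesis's microservices prototype): a TF-IDF text classifier filters traffic-related posts, and a crude capitalization heuristic stands in for the POS tagger that surfaces candidate place names. All tweets below are fabricated:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Step 1: keep only traffic-related posts (tiny fabricated training set).
train = ["heavy traffic on the interstate near downtown",
         "congestion at the Main Street bridge",
         "accident blocking two lanes on the bypass",
         "great coffee this morning",
         "watching the game tonight",
         "new album drops on friday"]
labels = [1, 1, 1, 0, 0, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train, labels)

# Step 2: stand-in for POS tagging -- capitalized mid-sentence tokens are
# treated as candidate proper nouns (place names) for geo-parsing.
def geo_candidates(tweet):
    words = tweet.split()
    return [w.strip(".,") for i, w in enumerate(words)
            if i > 0 and w[:1].isupper()]

tweet = "Accident blocking two lanes on Meridian Street near Broad Ripple"
if clf.predict([tweet])[0] == 1:
    print("geo candidates:", geo_candidates(tweet))
```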

9

Sheikha, Hassan. "Text mining Twitter social media for Covid-19: Comparing latent semantic analysis and latent Dirichlet allocation." Thesis, Högskolan i Gävle, Avdelningen för datavetenskap och samhällsbyggnad, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-32567.

Abstract:
In this thesis, Twitter social media is mined for information about the covid-19 outbreak during March 2020, from the 3rd to the 31st. 100,000 tweets were collected from Harvard's open-source data and recreated using Hydrate. This data is analyzed with different Natural Language Processing (NLP) methodologies, such as term frequency-inverse document frequency (TF-IDF), lemmatizing, tokenizing, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). The LSA and LDA algorithms produce dimensionally reduced data that is then clustered using the clustering algorithms HDBSCAN and K-Means for later comparison, with different methodologies used to determine the optimal parameters for the algorithms. All of this is done in the Python programming language, which has libraries supporting this research, the most important being scikit-learn. The frequent words of each cluster are then displayed and compared with factual data regarding the outbreak to discover any correlations. The factual data is collected by the World Health Organization (WHO) and visualized in graphs at ourworldindata.org. Correlations with the results are also sought in news articles, to find significant moments that might have affected the top words in the clustered data; the news timelines used for correlating incidents are those of NBC News and The New York Times. The results show no direct correlations with the data reported by WHO, but some correlation between the clustered data and the timelines reported by news sources can be seen. The combination of LDA and HDBSCAN yielded the most desirable results of all the combinations of dimension reduction and clustering, largely due to the use of GridSearchCV on LDA to determine the ideal parameters for the LDA models on each dataset, as well as how well HDBSCAN clusters its data in comparison to K-Means.
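A hedged sketch of the compared pipelines: LSA (TruncatedSVD over TF-IDF) versus LDA with its topic count tuned by GridSearchCV, each followed by K-Means and HDBSCAN on the reduced vectors. The six toy tweets stand in for the 100,000-tweet corpus, and scikit-learn's HDBSCAN (available since version 1.3) stands in for the standalone hdbscan package:

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.model_selection import GridSearchCV
from sklearn.cluster import KMeans, HDBSCAN

tweets = ["covid cases rising in march", "lockdown announced today",
          "stay home save lives", "new covid testing sites open",
          "march lockdown update", "testing capacity expanded"]

# LSA branch: TF-IDF followed by truncated SVD
lsa_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(
    TfidfVectorizer().fit_transform(tweets))

# LDA branch: tune the number of topics with a small grid search
counts = CountVectorizer().fit_transform(tweets)
search = GridSearchCV(LatentDirichletAllocation(random_state=0),
                      {"n_components": [2, 3]}, cv=2).fit(counts)
lda_vecs = search.best_estimator_.transform(counts)

print("k-means on LSA:", KMeans(n_clusters=2, n_init=10,
                                random_state=0).fit_predict(lsa_vecs))
print("hdbscan on LDA:", HDBSCAN(min_cluster_size=2).fit_predict(lda_vecs))
```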
10

Yang, Yimin. "Exploring Hidden Coherent Feature Groups and Temporal Semantics for Multimedia Big Data Analysis." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2254.

Abstract:
Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications that let people easily access and manage multimedia data in various areas. In response, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search and mining to security, privacy and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval, with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and to incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results, and the TMCA algorithm is then proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
11

Mortensen, Clifton H. "A Computational Fluid Dynamics Feature Extraction Method Using Subjective Logic." BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2208.

Abstract:
Computational fluid dynamics simulations are advancing to correctly simulate highly complex fluid flow problems that can require weeks of computation on expensive high performance clusters. These simulations can generate terabytes of data and pose a severe challenge to a researcher analyzing the data. Presented in this document is a general method to extract computational fluid dynamics flow features concurrently with a simulation and as a post-processing step, to drastically reduce researcher post-processing time. This general method uses software agents governed by subjective logic to make decisions about extracted features in converging and converged data sets. The software agents are designed to work inside the Concurrent Agent-enabled Feature Extraction concept and operate efficiently on massively parallel high performance computing clusters. Also presented is a specific application of the general feature extraction method to vortex core lines. Each agent's belief tuple is quantified using a pre-defined set of information; the information and functions necessary to set each component of each agent's belief tuple are given, along with an explanation of the methods for setting the components. A simulation of a blunt fin is run, showing convergence of the horseshoe vortex core to its final spatial location at 60% of the converged solution. Agents correctly select between two vortex core extraction algorithms and correctly identify the expected probabilities of vortex cores as the solution converges. A simulation of a delta wing is run, showing coherently extracted primary vortex cores as early as 16% of the converged solution. Agents select primary vortex cores extracted by the Sujudi-Haimes algorithm as the most probable primary cores. These simulations show that concurrent feature extraction is possible and that intelligent agents following the general feature extraction method are able to make appropriate decisions about converging and converged features based on pre-defined information.
12

Transell, Mark Marriott. "The Use of bioinformatics techniques to perform time-series trend matching and prediction." Diss., University of Pretoria, 2012. http://hdl.handle.net/2263/37061.

Abstract:
Process operators often face process faults and alarms due to recurring failures of process equipment. Some processes also lack the input information or process models needed to use conventional modelling or machine learning techniques for early fault detection. A proof of concept for online streaming prediction software, based on matching process behaviour to historical motifs, has been developed using the Basic Local Alignment Search Tool (BLAST) from the Bioinformatics field. Execution times as low as 1 second have been recorded, demonstrating that online matching is feasible. Three techniques have been tested and compared in terms of their computational efficiency, robustness and selectivity, with results shown in Table 1:
• Symbolic Aggregate Approximation (SAX) combined with PSI-BLAST
• Naive Triangular Representation (TER) with PSI-BLAST
• Dynamic Time Warping (DTW)

Table 1: Properties of different motif-matching methods

Property                            SAX-PSIBLAST  TER-PSIBLAST  DTW
Noise tolerance (selectivity)       Acceptable    Inconclusive  Good
Vertical shift tolerance            None          Perfect       Poor
Matching speed                      Acceptable    Acceptable    Fast
Match speed scaling                 O < O(mn)     O < O(mn)     O(mn)
Dimensionality reduction tolerance  Good          Inconclusive  Acceptable

It is recommended that a method using a weighted confidence measure for each technique be investigated for the purpose of online process event handling and operator alerts. Keywords: SAX, BLAST, motif-matching, Dynamic Time Warping
Dissertation (MEng)--University of Pretoria, 2012.
Chemical Engineering
unrestricted
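Of the three methods in Table 1, Dynamic Time Warping is the simplest to state; below is the textbook O(mn) dynamic-programming DTW distance (matching the scaling row of the table), not the dissertation's BLAST-based matchers, applied to two synthetic series. Note that the raw distance grows with any vertical offset between the series, which reflects the poor vertical-shift tolerance recorded above:

```python
import numpy as np

def dtw(a, b):
    """Classic O(mn) dynamic-programming DTW distance."""
    m, n = len(a), len(b)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[m, n]

motif = np.sin(np.linspace(0, 2 * np.pi, 50))
window = np.sin(np.linspace(0.2, 2 * np.pi + 0.2, 60)) + 0.05
print("DTW distance:", dtw(motif, window))
```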
13

Shankar, Arunprasath. "ONTOLOGY-DRIVEN SEMI-SUPERVISED MODEL FOR CONCEPTUAL ANALYSIS OF DESIGN SPECIFICATIONS." Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1401706747.

14

Bergfors, Anund. "Using machine learning to identify the occurrence of changing air masses." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-357939.

Abstract:
In the forecast data post-processing at the Swedish Meteorological and Hydrological Institute (SMHI), a regular Kalman filter is used to debias the two-meter air temperature forecast of the physical models by controlling towards air temperature observations. The Kalman filter, however, diverges when encountering greater nonlinearities in shifting weather patterns, and can only be reset manually once a new air mass has stabilized itself within its operating region. This project aimed to automate that reset by means of a machine learning approach. The methodology was at its base supervised learning: first algorithmically labelling the air-mass shift occurrences in the data, then training a logistic regression model. Observational data from the latest twenty years of the Uppsala automatic meteorological station was used for the analysis, and a simple pipeline for loading, labelling, training on and visualizing the data was built. As a work in progress, the operating regime was closer to a semi-supervised one, which in the long run could also be a necessary and fruitful strategy. In conclusion, the logistic regression model appeared quite able to handle and infer from the dynamics of air temperature, albeit without robust testing, correctly classifying 77% of the labelled data. This work was presented at Uppsala University on June 1st, 2018, and later at SMHI on June 20th.
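A minimal sketch of this supervised setup, under stated assumptions: the temperature series, the injected shift and the labelling window below are synthetic, and the rolling-window features are illustrative rather than those used at SMHI:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
temp = np.cumsum(rng.normal(0, 0.3, 2000)) + 10   # synthetic temperatures
temp[1000:] -= 6                                  # abrupt air-mass shift

def features(x, w=24):
    """Per time step: deviation from window mean, window std, last step."""
    rows = []
    for t in range(w, len(x)):
        win = x[t - w:t]
        rows.append([x[t] - win.mean(), win.std(), x[t] - x[t - 1]])
    return np.array(rows)

X = features(temp)
labels = np.zeros(len(X), dtype=int)
labels[1000 - 24 - 12:1000 - 24 + 12] = 1         # label around the shift

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, labels)
print("flagged steps:", np.where(clf.predict(X) == 1)[0][:10])
```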
15

Artchounin, Daniel. "Tuning of machine learning algorithms for automatic bug assignment." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139230.

Abstract:
In software development projects, bug triage consists mainly of assigning bug reports to software developers or teams (depending on the project). Partial or total automation of this task would have a positive economic impact on many software projects. This thesis introduces a systematic four-step method for finding some of the best configurations of several machine learning algorithms for the automatic bug assignment problem. The four steps are used, respectively, to select a combination of pre-processing techniques, a bug report representation, and a potential feature selection technique, and to tune several classifiers. The method has been applied to three software projects: 66 066 bug reports of a proprietary project, 24 450 bug reports of Eclipse JDT and 30 358 bug reports of Mozilla Firefox. 619 configurations have been applied and compared on each of these three projects. In production, using the approach introduced in this work on the bug reports of the proprietary project would have increased the accuracy by up to 16.64 percentage points.
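A hedged sketch of searching jointly over pre-processing, representation and classifier parameters with scikit-learn's GridSearchCV; the toy grid and fabricated bug reports below stand in for the thesis's 619 configurations on real datasets:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

reports = ["NullPointerException in editor on save",
           "editor autocomplete misses generics",
           "UI freezes when opening large project",
           "dialog layout broken on small screens",
           "crash in renderer on page load",
           "renderer artifacts with hardware acceleration"]
teams = ["team-editor", "team-editor", "team-ui", "team-ui",
         "team-renderer", "team-renderer"]

pipe = Pipeline([("rep", TfidfVectorizer()), ("clf", LinearSVC())])
grid = {"rep__lowercase": [True, False],        # pre-processing choice
        "rep__ngram_range": [(1, 1), (1, 2)],   # report representation
        "clf__C": [0.1, 1.0, 10.0]}             # classifier tuning

search = GridSearchCV(pipe, grid, cv=2).fit(reports, teams)
print(search.best_params_)
print(search.predict(["save crashes the editor"]))
```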
16

Gomes, Eduardo Luis. "Arquitetura RF-Miner: uma solução para localização em ambientes internos." Universidade Tecnológica Federal do Paraná, 2017. http://repositorio.utfpr.edu.br/jspui/handle/1/2898.

Abstract:
The use of passive UHF RFID tags for indoor localization has been widely studied due to their low cost. However, it is still difficult to obtain good results, mainly because of radio frequency variation in environments containing materials with reflective surfaces, such as metal and glass. This research proposes a localization architecture for indoor environments using passive UHF RFID tags and data mining techniques. When the architecture was applied in a real environment, it was possible to identify the exact position of objects in real time with a precision of approximately five centimeters. The architecture proved to be an efficient alternative for deploying indoor localization systems, and it also introduces a technique for deriving direct attributes that contributes effectively to the final results.
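As a hedged illustration of mapping RFID readings to positions, a fingerprint-style k-nearest-neighbours regression is sketched below; it is a stand-in for the RF-Miner architecture and its derived attributes, and every RSSI value and position is fabricated:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Calibration fingerprints: RSSI from three antennas -> known (x, y) in cm.
rssi = np.array([[-60, -72, -80], [-62, -70, -79], [-75, -61, -70],
                 [-74, -63, -69], [-80, -78, -58], [-79, -76, -60]])
pos = np.array([[10, 10], [12, 11], [50, 40], [52, 39], [90, 80], [88, 82]])

model = KNeighborsRegressor(n_neighbors=2).fit(rssi, pos)
print("estimated position (cm):", model.predict([[-61, -71, -80]]))
```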
17

Stolojescu, Cristina Laura. "A Wavelets Based Approach for Time Serie Mining." PhD thesis, 2012. http://tel.archives-ouvertes.fr/tel-00719668.

Abstract:
This thesis is based on research in time series analysis. Our work evaluates a set of time series obtained by monitoring the traffic of a WiMAX network. Given the high volume of information contained in this database, a data-mining approach was preferred. Assuming that the traffic associated with a badly positioned base station (BS) is heavier than the traffic associated with a well positioned BS, two approaches for assessing the heaviness of the traffic were developed. The first approach is based on the supposition that a BS with heavy traffic has a reduced risk of saturation; hence, it is necessary to appreciate the risk of saturation of each BS. The first objective of this thesis is therefore to propose an approach for predicting time series, based on a multiple-resolution decomposition of the signal using the Stationary Wavelet Transform (SWT) and ARIMA modeling. The second approach for assessing the heaviness of the traffic is based on Long Range Dependence (LRD) analysis, where the degree of LRD is estimated through the Hurst parameter of the time series under analysis. Our objective is to analyze the positioning of BSs in the architecture of a WiMAX network. The results show which BSs have a good localization in the topology of the network and which have a bad localization and must be repositioned at the next network maintenance session. Applying both data mining techniques, forecasting and LRD analysis, in the wavelet domain is decisive for their performance, improving the speed and the precision of the developed algorithms.
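A hedged sketch of the SWT-plus-ARIMA idea, assuming the PyWavelets and statsmodels packages: each subband is modelled and forecast independently, then the forecast coefficients are inverse-transformed. The series, wavelet and ARIMA orders are illustrative rather than the thesis's fitted models, and the per-band forecasting is a deliberately naive reconstruction:

```python
import numpy as np
import pywt                                    # PyWavelets
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n = 256                                        # SWT needs n divisible by 2**level
traffic = 10 + np.sin(np.arange(n) / 8.0) + rng.normal(0, 0.2, n)

level = 2
coeffs = pywt.swt(traffic, "db4", level=level)  # [(cA2, cD2), (cA1, cD1)]

horizon = 16                                    # also divisible by 2**level
forecasts = []
for cA, cD in coeffs:
    pair = []
    for band in (cA, cD):
        fit = ARIMA(band, order=(1, 0, 1)).fit()
        pair.append(fit.forecast(horizon))      # forecast each subband
    forecasts.append(tuple(pair))

prediction = pywt.iswt(forecasts, "db4")        # recombine the subbands
print(prediction[:5])
```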
18

Wang, Yingjian. "Application of Stochastic Processes in Nonparametric Bayes." Diss., 2014. http://hdl.handle.net/10161/9395.

Abstract:

This thesis presents theoretical studies of some stochastic processes and their applications in Bayesian nonparametric methods. The stochastic processes discussed in the thesis are mainly those with independent increments: the Levy processes. We develop new representations for the Levy measures of two representative examples of the Levy processes, the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions, with the truncation and posterior analyses provided. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two, as these are of increasing importance in machine learning.

Next a new Levy process is proposed for an uncountable collection of covariate-dependent feature-learning measures; the process is called the kernel beta process. Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample ("customer"), and latent covariates learned for each feature ("dish"). The dependencies among the data are represented with the covariate-parameterized kernel function. The beta process is recovered as a limiting case of the kernel beta process. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.

Last is a non-Levy-process example: the multiplicative gamma process applied to the low-rank representation of tensors. The multiplicative gamma process is applied along the super-diagonal of tensors in the rank decomposition, and its shrinkage property nonparametrically learns the rank from the multiway data. This model is constructed to be conjugate for the continuous multiway data case. For non-conjugate binary multiway data, the Polya-Gamma auxiliary variable is sampled to elicit closed-form Gibbs sampling updates. This rank decomposition of tensors driven by the multiplicative gamma process yields state-of-the-art performance on various synthetic and benchmark real-world datasets, with desirable model scalability.


Dissertation
19

Sun, Le. "Data stream mining in medical sensor-cloud." Thesis, 2016. https://vuir.vu.edu.au/31032/.

Abstract:
Data stream mining has been studied in diverse application domains. In recent years, population aging has been stressing national and international health care systems. With the advent of hundreds of thousands of health monitoring sensors, traditional wireless sensor networks and anomaly detection techniques cannot handle such huge amounts of information. Sensor-cloud makes the processing and storage of big sensor data much easier. Sensor-cloud is an extension of the Cloud that connects Wireless Sensor Networks (WSNs) and the cloud through sensor and cloud gateways, which consistently collect and process large amounts of data from various sensors located in different areas. In this thesis, I focus on analysing the large volume of medical sensor data streams collected from Sensor-cloud. To analyse these medical data streams, I propose a medical data stream mining framework targeted at tackling four main challenges ...
20

Shehata, Shady. "Concept Mining: A Conceptual Understanding based Approach." Thesis, 2009. http://hdl.handle.net/10012/4430.

Abstract:
Due to the rapid daily growth of information, there is a considerable need to extract and discover valuable knowledge from data sources such as the World Wide Web. Most common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. These techniques treat documents as bags of words and pay no attention to the meanings of the document content. In addition, statistical analysis of term frequency captures the importance of a term within a document only; two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. There is therefore an intensive need for a model that captures the meaning of linguistic utterances in a formal structure and indicates the terms that capture the semantics of text. Such a model can capture the terms that present the concepts of a sentence, which leads to discovering the topic of the document. A new concept-based model is introduced that analyzes terms on the sentence, document and corpus levels, rather than the traditional analysis of the document only. The concept-based model can effectively discriminate between terms that are unimportant to the sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, a concept extractor and a concept-based similarity measure. A term that contributes to the sentence semantics is assigned two different weights, by the concept-based statistical analyzer and by the conceptual ontological graph representation; these two weights are combined into a new weight, and the concepts with the maximum combined weights are selected by the concept extractor. The similarity between documents is calculated with a new concept-based similarity measure that takes full advantage of the concept analysis measures on the sentence, document and corpus levels. Large sets of experiments using the proposed concept-based model on different datasets in text clustering, categorization and retrieval are conducted, with extensive comparison between traditional weighting and the concept-based weighting obtained by the model. Experimental results in text clustering, categorization and retrieval demonstrate substantial quality improvements using: (1) concept-based term frequency (tf), (2) conceptual term frequency (ctf), (3) the concept-based statistical analyzer, (4) the conceptual ontological graph, and (5) the concept-based combined model. In text clustering, evaluation relies on two quality measures, the F-Measure and the Entropy. In text categorization, evaluation relies on three quality measures: the micro-averaged F1, the macro-averaged F1 and the error rate. In text retrieval, evaluation relies on three quality measures: precision at 10 documents retrieved P(10), the preference measure (bpref), and mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based model is used to enhance the quality of text clustering, categorization and retrieval.
21

Margono, Hendro. "Analysis of the Indonesian Cyberbullying through Data Mining: The Effective Identification of Cyberbullying through Characteristics of Messages." Thesis, 2019. https://vuir.vu.edu.au/39499/.

Abstract:
The use of social network sites such as Facebook, Twitter, YouTube, Instagram and LinkedIn has increased rapidly in the last decade. International data indicate that more than 83% of people between the ages of 18 and 29 have used social networking sites (Best et al., 2014). Social networks are a powerful medium that can be used for positive purposes, such as communication and information sharing, and can provide easy access to fresh news. On the other hand, social network sites can be used for negative purposes such as harassment and bullying. Bullying on social networks is usually called cyberbullying. Cyberbullying has emerged as a significant issue and an important topic in social network analysis: more than 10% of parents globally have stated that their child has been cyberbullied (Gottfried, 2012). Ipsos reported that in Indonesia 91% of parents stated their children were bullied on social media in 2012 (Gottfried, 2012), and 58% of Indonesian adolescents aged 12 to 21 reported that they often suffered online harassment and humiliation (Dipa, 2016). To understand this phenomenon, machine learning methods within data mining techniques can assist in analysing cyberbullying issues. Several points must be taken into consideration, however: the rapidly evolving vocabulary of cyberbullying, the patterns of harmful words used in cyberbullying messages, and the scale of the data. The purpose of this research is to identify the indicators of cyberbullying within written content, and to propose and develop effective analysis models for detecting cyberbullying activities on social networks. The research has therefore addressed concerns about the measurement of cyberbullying and aimed to develop a reliable and valid measurement tool. Through systematic measurement and techniques, it has refined an effective analysis model that discovers the patterns of insulting words, which assists in accurately detecting cyberbullying messages. The analysis model developed in this thesis uses association rules and classification techniques, which serve the effective identification of cyberbullying messages on social networks. The research has discovered interesting patterns of insulting words that help identify cyberbullying messages, and the experimental results indicate that the proposed method can precisely classify messages as cyberbullying or non-cyberbullying; 80.37% of the total data was detected as cyberbullying. Overall, this thesis makes a significant contribution by identifying new characteristics for cyberbullying recognition, by developing the analysis method for social issues, and by advancing the parameters that determine the strength of relationships in the data with respect to data mining techniques. The thesis presents the analysis results, contributes to our understanding of various cyberbullying patterns, and points to directions in which the results can be developed in future research.
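A hedged sketch of the association-rule half of such a model: mine word pairs that frequently co-occur in messages and keep rules with high confidence. Placeholder tokens replace actual insults, the thresholds are illustrative, and this plain-Python miner stands in for the thesis's implementation:

```python
from itertools import combinations

msgs = [{"insult_a", "insult_b", "you"}, {"insult_a", "you"},
        {"insult_a", "insult_b"}, {"weather", "nice"},
        {"insult_b", "you"}, {"nice", "you"}]

def support(itemset):
    """Fraction of messages containing every word in `itemset`."""
    return sum(itemset <= m for m in msgs) / len(msgs)

vocab = sorted(set().union(*msgs))
for a, b in combinations(vocab, 2):
    s = support({a, b})
    if s >= 0.3:                                # frequent pair
        for x, y in ((a, b), (b, a)):
            conf = s / support({x})
            if conf >= 0.6:                     # confident rule x -> y
                print(f"{x} -> {y}: support={s:.2f}, confidence={conf:.2f}")
```

Rules such as "insult_a -> insult_b" can then serve as features for the classification step that labels a message as cyberbullying or not.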
22

Cui, Xiao. "Social Network Analysis Based on a Hierarchy of Communities." Thesis, 2016. https://vuir.vu.edu.au/31048/.

Abstract:
With the rapid growth of users in Social Networking Services (SNSs), data is generated in thousands of terabytes every day. This data contains lots of hidden information and patterns. The analysis of such data is not a trivial task. A great deal of effort has been put into it. Analysing users' behaviour in social networks can help researchers to better understand what happens in the real world and create huge commercial value for social networks themselves.
23

Bharadwaj, Venkatesh. "Aural Mapping of STEM Concepts Using Literature Mining." 2013. http://hdl.handle.net/1805/3242.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Recent technological applications have made people's lives heavily dependent on Science, Technology, Engineering, and Mathematics (STEM) and its applications. Understanding basic science is a must in order to use and contribute to this technological revolution. Science education at the middle and high school levels, however, depends heavily on visual representations such as models, diagrams, figures, animations and presentations. This leaves visually impaired students with very few options to learn science and secure a career in STEM-related areas. Recent experiments have shown that small aural clues called audemes help visually impaired students understand and memorize science concepts. Audemes are non-verbal sound translations of a science concept. To make science concepts available as audemes for visually impaired students, this thesis presents an automatic system for audeme generation from STEM textbooks. It describes the systematic application of multiple Natural Language Processing tools and techniques, such as a dependency parser, POS tagger, information retrieval algorithms, semantic mapping of aural words and machine learning, to transform a science concept into a combination of atomic sounds, thus forming an audeme. We present a rule-based classification method for all STEM-related concepts. This work also presents a novel way of mapping and extracting the sounds most related to the words used in a textbook. Additionally, machine learning methods are used in the system to guarantee the customization of output according to a user's perception. The system presented is robust, scalable, fully automatic and dynamically adaptable for audeme generation.
24

Yellamraju, Tarun. "n-TARP: A Random Projection based Method for Supervised and Unsupervised Machine Learning in High-dimensions with Application to Educational Data Analysis." Thesis, 2019.

Abstract:
Analyzing the structure of a dataset is a challenging problem in high-dimensions as the volume of the space increases at an exponential rate and typically, data becomes sparse in this high-dimensional space. This poses a significant challenge to machine learning methods which rely on exploiting structures underlying data to make meaningful inferences. This dissertation proposes the n-TARP method as a building block for high-dimensional data analysis, in both supervised and unsupervised scenarios.

The basic element, n-TARP, consists of a random projection framework to transform high-dimensional data to one-dimensional data in a manner that yields point separations in the projected space. The point separation can be tuned to reflect classes in supervised scenarios and clusters in unsupervised scenarios. The n-TARP method finds linear separations in high-dimensional data. This basic unit can be used repeatedly to find a variety of structures. It can be arranged in a hierarchical structure like a tree, which increases the model complexity, flexibility and discriminating power. Feature space extensions combined with n-TARP can also be used to investigate non-linear separations in high-dimensional data.
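A rough sketch of the basic n-TARP unit follows, under stated assumptions: the two-means-style separation score is a stand-in for the dissertation's actual criterion, and the data is synthetic. Draw many random directions, project to one dimension, and keep the direction whose projection splits most cleanly into two groups:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 50)),     # two synthetic groups
               rng.normal(2, 1, (100, 50))])     # in 50 dimensions

def split_score(z):
    """Normalized within-group variance of the best threshold split."""
    z = np.sort(z)
    best = np.inf
    for t in range(1, len(z)):
        left, right = z[:t], z[t:]
        best = min(best, left.var() * len(left) + right.var() * len(right))
    return best / len(z)

best_w, best_s = None, np.inf
for _ in range(200):                             # n random trial projections
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    s = split_score(X @ w)
    if s < best_s:
        best_w, best_s = w, s

print("best separation score:", round(best_s, 4))
```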

The application of n-TARP to both supervised and unsupervised problems is investigated in this dissertation. In the supervised scenario, a sequence of n-TARP based classifiers with increasing complexity is considered. The point separations are measured by classification metrics like accuracy, Gini impurity or entropy. The performance of these classifiers on image classification tasks is studied. This study provides an interesting insight into the working of classification methods. The sequence of n-TARP classifiers yields benchmark curves that put in context the accuracy and complexity of other classification methods for a given dataset. The benchmark curves are parameterized by classification error and computational cost to define a benchmarking plane. This framework splits this plane into regions of "positive-gain" and "negative-gain" which provide context for the performance and effectiveness of other classification methods. The asymptotes of benchmark curves are shown to be optimal (i.e. at Bayes Error) in some cases (Theorem 2.5.2).

In the unsupervised scenario, the n-TARP method highlights the existence of many different clustering structures in a dataset. However, not all structures present are statistically meaningful. This issue is amplified when the dataset is small, as random events may yield sample sets that exhibit separations that are not present in the distribution of the data. Thus, statistical validation is an important step in data analysis, especially in high-dimensions. However, in order to statistically validate results, often an exponentially increasing number of data samples are required as the dimensions increase. The proposed n-TARP method circumvents this challenge by evaluating statistical significance in the one-dimensional space of data projections. The n-TARP framework also results in several different statistically valid instances of point separation into clusters, as opposed to a unique "best" separation, which leads to a distribution of clusters induced by the random projection process.

The distributions of clusters resulting from n-TARP are studied. This dissertation focuses on small sample high-dimensional problems. A large number of distinct clusters are found, which are statistically validated. The distribution of clusters is studied as the dimensionality of the problem evolves through the extension of the feature space using monomial terms of increasing degree in the original features, which corresponds to investigating non-linear point separations in the projection space.

A statistical framework is introduced to detect patterns of dependence between the clusters formed with the features (predictors) and a chosen outcome (response) in the data that is not used by the clustering method. This framework is designed to detect the existence of a relationship between the predictors and response. This framework can also serve as an alternative cluster validation tool.

The concepts and methods developed in this dissertation are applied to a real world data analysis problem in Engineering Education. Specifically, engineering students' Habits of Mind are analyzed. The data at hand is qualitative, in the form of text, equations and figures. To use the n-TARP based analysis method, the source data must be transformed into quantitative data (vectors). This is done by modeling it as a random process based on the theoretical framework defined by a rubric. Since the number of students is small, this problem falls into the small sample high-dimensions scenario. The n-TARP clustering method is used to find groups within this data in a statistically valid manner. The resulting clusters are analyzed in the context of education to determine what is represented by the identified clusters. The dependence of student performance indicators like the course grade on the clusters formed with n-TARP are studied in the pattern dependence framework, and the observed effect is statistically validated. The data obtained suggests the presence of a large variety of different patterns of Habits of Mind among students, many of which are associated with significant grade differences. In particular, the course grade is found to be dependent on at least two Habits of Mind: "computation and estimation" and "values and attitudes."
25

Dale, Ashley S. "3D Object Detection Using Virtual Environment Assisted Deep Network Training." Thesis, 2021.

Abstract:

An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1 score over all classes and epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1 score is σ*_F1 = 0.015 for the synthetically trained networks, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Varying backgrounds in the synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network; the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder, then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.

26

Ma, Dongdong. "Ameliorating Environmental Effects on Hyperspectral Images for Improved Phenotyping in Greenhouse and Field Conditions." Thesis, 2020.

Abstract:
Hyperspectral imaging has become one of the most popular technologies in plant phenotyping because it can efficiently and accurately predict numerous plant physiological features such as plant biomass, leaf moisture content, and chlorophyll content. Various hyperspectral imaging systems have been deployed in both greenhouse and field phenotyping activities. However, hyperspectral imaging quality is severely affected by continuously changing environmental conditions, such as cloud cover, temperature and wind speed, that induce noise in plant spectral data. Eliminating these environmental effects to improve imaging quality is critically important. In this thesis, two approaches were taken to address the imaging noise issue, in the greenhouse and in the field separately. First, a computational simulation model was built to simulate the greenhouse microclimate changes (such as the temperature and radiation distributions) through a 24-hour cycle in a research greenhouse. The simulated results were used to optimize the movement of an automated conveyor in the greenhouse: the plants were shuffled by the conveyor system with optimized frequency and distance to provide uniform growing conditions, such as temperature and lighting intensity, for each individual plant. The results showed that the variance of the plants' phenotyping feature measurements decreased significantly (by up to 83% in plant canopy size) in this conveyor greenhouse. Secondly, the environmental effects (such as sun radiation) on aerial hyperspectral images in field plant phenotyping were investigated and modeled. An artificial neural network (ANN) method was proposed to model the relationship between the image variation and environmental changes. Before the 2019 field test, a gantry system was designed and constructed to repeatedly collect time-series hyperspectral images of the corn plants at 2.5-minute intervals under varying environmental conditions, including sun radiation, solar zenith angle, diurnal time, humidity, temperature and wind speed. Over 8,000 hyperspectral images of corn (Zea mays L.) were collected with synchronized environmental data throughout the 2019 growing season. The models trained with the proposed ANN method were able to accurately predict the variations in imaging results (e.g., 82.3% for NDVI) caused by the changing environments. Thus, the ANN method can be used by remote sensing professionals to adjust or correct raw imaging data for changing environments to improve plant characterization.
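A hedged sketch of the correction idea, under stated assumptions: a small neural network learns how environmental covariates shift a spectral index, and its prediction is subtracted from the raw measurements. All data below are synthetic; the thesis trained on over 8,000 real images with synchronized weather logs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# columns: radiation, solar zenith angle, humidity, wind speed (scaled 0-1)
env = rng.uniform(size=(2000, 4))
# synthetic "observed" NDVI: stable plant signal plus environment-driven noise
ndvi = 0.8 + 0.10 * env[:, 0] - 0.05 * env[:, 1] + rng.normal(0, 0.01, 2000)

# learn the environment-driven deviation from the series mean
ann = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=3000,
                   random_state=0).fit(env, ndvi - ndvi.mean())

corrected = ndvi - ann.predict(env)  # subtract predicted environment effect
print("NDVI std before/after correction:",
      round(float(ndvi.std()), 4), round(float(corrected.std()), 4))
```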