Dissertations / Theses on the topic 'Data mining Case studies'



Consult the top 50 dissertations / theses for your research on the topic 'Data mining Case studies.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Xu, Jie. "MINING STATIC AND DYNAMIC STRUCTURAL PATTERNS IN NETWORKS FOR KNOWLEDGE MANAGEMENT: A COMPUTATIONAL FRAMEWORK AND CASE STUDIES." Diss., Tucson, Arizona : University of Arizona, 2005. http://etd.library.arizona.edu/etd/GetFileServlet?file=file:///data1/pdf/etd/azu%5Fetd%5F1151%5F1%5Fm.pdf&type=application/pdf.

2

Ben Nasr, Sana. "Mining and modeling variability from natural language documents : two case studies." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S013/document.

Abstract:
Domain analysis is the process of analyzing a family of products to identify their common and variable features. It is generally carried out by experts on the basis of existing informal documentation and, when performed manually, is both time-consuming and error-prone. In this thesis, our general contribution is to address mining and modeling variability from informal documentation. We adopt Natural Language Processing (NLP) and data mining techniques to identify features, commonalities, differences and feature dependencies among related products. We investigate the applicability of this idea by instantiating it in two different contexts: (1) reverse engineering Feature Models (FMs) from regulatory requirements in the nuclear domain and (2) synthesizing Product Comparison Matrices (PCMs) from informal product descriptions. In the first case study, we adopt NLP and data mining techniques based on semantic analysis, requirements clustering and association rules to assist experts in constructing feature models from these regulations. The evaluation shows that our approach retrieves 69% of correct clusters without any user intervention. Moreover, feature dependencies show a high predictive capacity: 95% of the mandatory relationships and 60% of the optional relationships are found, and all of the requires and excludes relationships are extracted. In the second case study, our approach relies on contrastive analysis to mine domain-specific terms from text, together with information extraction, term clustering and information clustering. Overall, our empirical study shows that the resulting PCMs are compact and contain a large amount of quantitative, comparable information. The user study shows that our automatic approach retrieves 43% of correct features and 68% of correct values in one step and without any user intervention. We show that there is potential to complement or even refine the technical information of products. The main lesson learnt from the two case studies is that the extraction and exploitability of variability knowledge depend on the context, the nature of the variability and the nature of the text.
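The first case study couples requirements clustering with association rules to recover feature groups. Below is a minimal sketch of the clustering half of that pipeline, assuming scikit-learn; the requirement sentences and the cluster count are illustrative, not the thesis's data.

```python
# Sketch: group requirement texts by semantic similarity, standing in for
# the semantic analysis and clustering step. Assumes scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

requirements = [
    "The system shall log all safety events",
    "Safety event logs shall be archived monthly",
    "Operators shall authenticate with a badge",
    "Badge authentication shall time out after 10 minutes",
]

# TF-IDF vectors plus hierarchical clustering over the requirements.
X = TfidfVectorizer(stop_words="english").fit_transform(requirements)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X.toarray())
for text, label in zip(requirements, labels):
    print(label, text)
```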
3

Madani, Farshad. "Opportunity Identification for New Product Planning: Ontological Semantic Patent Classification." PDXScholar, 2018. https://pdxscholar.library.pdx.edu/open_access_etds/4232.

Abstract:
Intelligence tools have been developed and applied widely in many different areas in engineering, business and management. Many commercialized tools for business intelligence are available in the market. However, no practically useful tools for technology intelligence are available at this time, and very little academic research in technology intelligence methods has been conducted to date. Patent databases are the most important data source for technology intelligence tools, but patents inherently contain unstructured data. Consequently, extracting text data from patent databases, converting that data to meaningful information and generating useful knowledge from this information become complex tasks. These tasks are currently being performed very ineffectively, inefficiently and unreliably by human experts. This deficiency is particularly vexing in product planning, where awareness of market needs and technological capabilities is critical for identifying opportunities for new products and services. Total nescience of the text of patents, as well as inadequate, unreliable and untimely knowledge derived from these patents, may consequently result in missed opportunities that could lead to severe competitive disadvantage and potentially catastrophic loss of revenue. The research performed in this dissertation tries to correct the abovementioned deficiency with an approach called patent mining. The research is conducted at Finex, an iron casting company that produces traditional kitchen skillets. To 'mine' pertinent patents, experts in new product development at Finex modeled one ontology for the required product features and another for the attributes of requisite metallurgical enabling technologies from which new product opportunities for skillets are identified by applying natural language processing, information retrieval, and machine learning (classification) to the text of patents in the USPTO database. Three main scenarios are examined in my research. Regular classification (RC) relies on keywords that are extracted directly from a group of USPTO patents. Ontological classification (OC) relies on keywords that result from an ontology developed by Finex experts, which is evaluated and improved by a panel of external experts. Ontological semantic classification (OSC) uses these ontological keywords and their synonyms, which are extracted from the WordNet database. For each scenario, I evaluate the performance of three classifiers: k-Nearest Neighbor (k-NN), random forest, and Support Vector Machine (SVM). My research shows that OSC is the best scenario and SVM is the best classifier for identifying product planning opportunities, because this combination yields the highest score in metrics that are generally used to measure classification performance in machine learning (e.g., ROC-AUC and F-score). My method also significantly outperforms current practice, because I demonstrate in an experiment that neither the experts at Finex nor the panel of external experts are able to search for and judge relevant patents with any degree of effectiveness, efficiency or reliability. This dissertation provides the rudiments of a theoretical foundation for patent mining, which has yielded a machine learning method that is deployed successfully in a new product planning setting (Finex). 
Further development of this method could make a significant contribution to management practice by identifying opportunities for new product development that have been missed by the approaches that have been deployed to date.
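The ontological semantic classification scenario pairs ontology keywords, their WordNet synonyms, and an SVM. A minimal sketch of that combination, assuming nltk (with the 'wordnet' corpus downloaded) and scikit-learn; the keywords and the two toy documents are illustrative, not the Finex ontology:

```python
# Sketch: expand ontology keywords with WordNet synonyms, then train an
# SVM text classifier on patent-like documents.
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def expand(keyword):
    """Collect WordNet synonyms for one ontology keyword."""
    synonyms = {keyword}
    for synset in wn.synsets(keyword):
        synonyms.update(l.name().replace("_", " ") for l in synset.lemmas())
    return synonyms

vocabulary = sorted(set().union(*(expand(k) for k in ["skillet", "casting", "alloy"])))

docs = ["cast iron skillet with enamel coating", "database indexing method"]
labels = [1, 0]  # 1 = relevant to the product ontology, 0 = not

X = TfidfVectorizer(vocabulary=vocabulary, ngram_range=(1, 2)).fit_transform(docs)
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```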
4

Šenovský, Jakub. "Dolování z dat v jazyce Python [Data Mining in Python]." Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363895.

Abstract:
The main goal of this thesis was to get acquainted with the phases of data mining and with the support for data mining in the programming languages Python and R, and to demonstrate their use in two case studies. A comparison of the two languages in the field of data mining is also included. The data preprocessing phase and the mining algorithms for classification, prediction and clustering are described, and the most significant libraries for Python and R are presented. In the first case study, work with time series was demonstrated using the ARIMA model and neural networks, with accuracy verified using the mean squared error. In the second case study, the results of football matches are classified using k-Nearest Neighbors, a Bayes classifier, Random Forest and logistic regression. The precision of the classification is reported using the accuracy score and a confusion matrix. The work concludes with an evaluation of the achieved results and suggestions for future improvement of the individual models.
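The second case study's classifier comparison maps directly onto scikit-learn, one of the Python libraries the thesis surveys. A minimal sketch, with synthetic data standing in for the football-match records:

```python
# Sketch: compare the four classifiers from the case study and report
# accuracy plus a confusion matrix for each. Assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "k-NN": KNeighborsClassifier(),
    "Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    print(name, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
```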
5

Pena, Isis. "Utility-based data mining: An anthropometric case study." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/27723.

Abstract:
One of the most important challenges for the apparel industry is to produce garments that fit the population properly. To achieve this objective, it is crucial to understand the typical profile of consumers' bodies. In this work, we aim to identify the typical consumer from the virtual tailor's perspective. To this end, we perform clustering analysis on anthropometric and 3-D data to group the population into clothing sizes. Next, we perform multi-view relational classification to analyze the interplay of different body measurements within each size. We analyze three different populations as contained in the CAESAR(TM) database, namely the American, the Italian and the Dutch populations. Throughout this study, we follow a utility-based data mining approach, whose goal is to consider all utility aspects of the mining process and thus maximize the utility of the entire process. To this end, we use dimension reduction techniques to find a smaller set of body measurements that reduces the cost and improves the performance of the mining process. We also apply objective interestingness measures in our analysis of demographic data, to improve the quality of the results and reduce the time and search space of the mining process. The analysis of demographic data allows us to better understand the demographic nature of potential customers, in order to target subgroups of them more effectively.
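A compact illustration of this reduce-then-cluster idea, assuming scikit-learn; the random matrix stands in for the CAESAR(TM) body measurements, and the component and cluster counts are illustrative:

```python
# Sketch: reduce the body-measurement space, then cluster subjects into
# candidate clothing sizes. Assumes scikit-learn and NumPy.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
measurements = rng.normal(size=(500, 40))  # 500 subjects, 40 measurements

scaled = StandardScaler().fit_transform(measurements)
reduced = PCA(n_components=5).fit_transform(scaled)  # cheaper measurement set
sizes = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(reduced)
print(np.bincount(sizes))  # subjects per candidate size
```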
6

Daley, Caitlin Marie. "Application of Data Mining Tools for Exploring Data: Yarn Quality Case Study." NCSU, 2008. http://www.lib.ncsu.edu/theses/available/etd-10292008-165755/.

Abstract:
Businesses are constantly striving for a competitive edge in the economy, and data-driven decision making is crucial to achieving this goal. Four data mining tools, principal component analysis, cluster analysis, recursive partitioning, and discriminant analysis, were used to explore the major factors that contribute to ends down in a rotor spinning manufacturing process. Principal component analysis was used to explore whether the large number of cotton properties used to classify cotton could be reduced to a significant few. Cluster analysis was used to gain insight into whether there were groups of gins, counties, or classing offices that produced better raw material than others and led to fewer ends down. The important research question of which raw material properties were affecting ends down was explored with both recursive partitioning and discriminant analysis. Additional research investigated the effect of cotton variety and atmospheric conditions on spinning productivity. Each of the four data mining tools was informative and offered a different perspective on the overall research question. Several significant factors emerged, including humidity, temperature, %DP 555, and uniformity, in addition to micronaire and the color properties (+b and Rd). With these results the researcher developed an improvement plan for better control and increased spinning productivity in future operations. A designed experiment is necessary to thoroughly investigate the impact of certain factors beyond the exploratory conclusions obtained from this study.
7

Ivanovskiy, Tim V. "Mining Medical Data in a Clinical Environment." Scholar Commons, 2006. http://scholarcommons.usf.edu/etd/3908.

Abstract:
The availability of new treatments for a disease depends on the success of clinical trials. For a clinical trial to be successful and approved, medical researchers must first recruit patients with a specific set of conditions in order to test the effectiveness of the proposed treatment. In the past, the accrual process was tedious and time-consuming. Since accruals rely heavily on the ability of physicians and their staff to be familiar with the protocol eligibility criteria, candidates tend to be missed. This can result, and has resulted, in unsuccessful trials. A recent project at the University of South Florida aimed to assist research physicians at the H. Lee Moffitt Cancer Center & Research Institute, Tampa, Florida, with the screening process by utilizing a web-based expert system, the Moffitt Expedited Accrual Network System (MEANS). This system allows physicians to determine the eligibility of a patient for several clinical trials simultaneously. We have implemented this web-based expert system at the H. Lee Moffitt Cancer Center & Research Institute Gastroenterology (GI) Clinic. Based on our findings and staff feedback, the system has undergone many optimizations. We used data mining techniques to analyze the medical data of current gastrointestinal patients. The use of the Apriori algorithm allowed us to discover new rules (implications) in the patient data. All of the discovered implications were checked for medical validity by a physician, and those determined to be valid were entered into the expert system. Additional analysis of the data allowed us to streamline the system and decrease the number of mouse clicks required for screening. We also used a probability-based method to reorder the questions, which decreased the amount of data entry required to determine a patient's ineligibility.
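To make the Apriori step concrete, here is a minimal sketch assuming the mlxtend library; the one-hot patient records and thresholds are illustrative, not the clinic's data:

```python
# Sketch: mine frequent condition sets and implication rules with Apriori.
# Assumes mlxtend and pandas; the records are invented for illustration.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

records = pd.DataFrame(
    [
        {"anemia": 1, "weight_loss": 1, "gi_bleeding": 1},
        {"anemia": 1, "weight_loss": 0, "gi_bleeding": 1},
        {"anemia": 0, "weight_loss": 1, "gi_bleeding": 0},
        {"anemia": 1, "weight_loss": 1, "gi_bleeding": 1},
    ]
).astype(bool)

frequent = apriori(records, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```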
8

Gui, Hongsheng (桂宏胜). "Data mining of post genome-wide association studies and next generation sequencing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/193431.

9

Damle, Chaitanya. "Flood forecasting using time series data mining." [Tampa, Fla.] : University of South Florida, 2005. http://purl.fcla.edu/fcla/etd/SFE0001038.

10

Haneuse, Sebastian J. P. A. "Ecological studies using supplemental case-control data /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/9595.

11

Abdull, Mohamed A. Salem. "Data mining techniques and breast cancer prediction : a case study of Libya." Thesis, Sheffield Hallam University, 2011. http://shura.shu.ac.uk/20611/.

Abstract:
Different forms of cancer have been widely studied and documented in various studies across the world. However, there have not been many similar studies in developing countries, particularly those on the African continent (Parkin, et al., 2005). This thesis seeks to uncover the geo-demographic occurrence patterns of the disease by applying three data mining techniques, namely Logistic Regression (LR), Neural Networks (NNs) and Decision Trees (DTs), to learn the underlying rules in the overall behaviour of breast cancer. The data, 3,057 observations on 29 variables obtained from four cancer treatment centres in Libya (2004-2008), were interrogated using multiple k-fold cross-validation. The predictive strategy yielded a list of breast cancer predictor factors ordered according to their importance in predicting the disease. Comparison between our results and those obtainable from conventional LR, NN and DT models shows that our strategy outperforms conventional variable selection. It is expected that the findings from this thesis will provide an input into comparative geo-ethnic studies of cancer and provide informed intervention guidelines in the prevention and cure of the disease, not only in Libya but also in other parts of the world.
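A minimal sketch of the cross-validated comparison of the three model families, assuming scikit-learn; the synthetic matrix merely mirrors the data's shape (3,057 observations, 29 variables) and is not the Libyan dataset:

```python
# Sketch: k-fold cross-validation of LR, NN and DT classifiers.
# Assumes scikit-learn; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3057, n_features=29, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "NN": MLPClassifier(max_iter=1000, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```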
12

Charest, Michel. "Intelligent data mining assistance via case-based reasoning and a formal ontology." Thèse, Trois-Rivières : Université du Québec à Trois-Rivières, 2007. http://www.uqtr.ca/biblio/notice/resume/30000316R.pdf.

13

Khan, Mohammed Saquib Akmal. "Efficient Spatio-Temporal Network Analytics in Epidemiological Studies using Distributed Databases." Thesis, Virginia Tech, 2015. http://hdl.handle.net/10919/51223.

Abstract:
Real-time spatio-temporal analytics has become an integral part of epidemiological studies. The size of spatio-temporal data has been increasing tremendously over the years, gradually evolving into Big Data. Processing in such domains is highly data- and compute-intensive, and high-performance computing resources are actively being used to handle such workloads over massive datasets. This confluence of high-performance computing and datasets with Big Data characteristics poses great challenges for data handling and processing. The resource management of supercomputers conflicts with the data-intensive nature of spatio-temporal analytics, which is further exacerbated by the fact that data management is decoupled from the computing resources. Problems of this nature have provided great opportunities in the growth and development of tools and concepts centered around MapReduce-based solutions. However, we believe that advanced relational concepts can still be employed to provide an effective solution to these issues and challenges. In this study, we explore distributed databases to efficiently handle spatio-temporal Big Data for epidemiological studies. We propose DiceX (Data Intensive Computational Epidemiology using supercomputers), which couples high-performance, Big Data and relational computing by embedding distributed data storage and processing engines within the supercomputer. It is characterized by scalable strategies for data ingestion, a unified framework to set up and configure various processing engines, and the ability to pause, materialize and restore images of a data session. In addition, we have successfully configured DiceX to support approximation algorithms from the MADlib Analytics Library [54], primarily the Count-Min Sketch or CM Sketch [33][34][35]. DiceX enables a new style of Big Data processing, centered around the use of clustered databases, that exploits supercomputing resources. It can effectively exploit the cores, memory and compute nodes of supercomputers to scale the processing of spatio-temporal queries on datasets of large volume. Thus, it provides a scalable and efficient tool for data management and processing of spatio-temporal data. Although DiceX has been designed for computational epidemiology, it can easily be extended to other data-intensive domains facing similar issues and challenges. We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA CNIMS Contract HDTRA1-11-D-0016-0001, DTRA Validation Grant HDTRA1-11-1-0016, NSF - Network Science and Engineering Grant CNS-1011769, NIH and NIGMS - Models of Infectious Disease Agent Study Grant 5U01GM070694-11. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.
Master of Science
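Count-Min Sketch, the approximation structure named above, keeps a small grid of hashed counters and answers frequency queries with a one-sided overestimate. A self-contained sketch of the data structure, illustrative only and not the MADlib/DiceX implementation:

```python
# Minimal Count-Min Sketch: depth hash rows, width counters per row.
# Estimates are >= the true count, with error bounded by width/depth.
import hashlib

class CountMinSketch:
    def __init__(self, width=1000, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Take the minimum across rows to reduce hash-collision inflation.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for region in ["tampa", "tampa", "miami"]:
    cms.add(region)
print(cms.estimate("tampa"))  # -> 2 (possibly overestimated)
```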
14

Stönner, Christof [Verfasser]. "Application of data mining techniques to indoor and outdoor air studies / Christof Stönner." Mainz : Universitätsbibliothek Mainz, 2019. http://d-nb.info/1177193620/34.

15

Bhansali, Neera. "Strategic Alignment in Data Warehouses: Two Case Studies." RMIT University. Business Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080108.150431.

Abstract:
This research investigates the role of strategic alignment in the success of data warehouse implementation. Data warehouse technology is inherently complex and requires significant capital investment and development time, yet many organizations fail to realize its full benefits. While failure to realize benefits has been attributed to numerous causes, ranging from technical to organizational reasons, the underlying strategic alignment issues have not been studied. This research confirms, through two case studies, that the successful adoption of a data warehouse depends on its alignment to the business plans and strategy. The research found that the factors critical to the alignment of data warehouses to business strategy and plans are (a) joint responsibility between data warehouse and business managers, (b) alignment between the data warehouse plan and the business plan, (c) business user satisfaction, (d) flexibility in data warehouse planning and (e) technical integration of the data warehouse. In the case studies, the impact of strategic alignment was visible both at implementation and use levels. The key findings from the case studies are that: (a) Senior management commitment and involvement are necessary for the initiation of the data warehouse project. The awareness and involvement of data warehouse managers in corporate strategies and a high level of joint responsibility between business and data warehouse managers are critical to strategic alignment and successful adoption of the data warehouse. (b) Communication of the strategic direction between the business and data warehouse managers is important for the strategic alignment of the data warehouse. Significant knowledge sharing among the stakeholders and frequent communication between the data warehouse managers and users facilitate better understanding of the data warehouse and its successful adoption. (c) User participation in the data warehouse project, perceived usefulness of the data warehouse, ease of use and data quality (accuracy, consistency, reliability and timeliness) were significant factors in strategic alignment of the data warehouse. (d) Technology selection based on its ability to address business and user requirements, and the skills and response of the data warehousing team, led to better alignment of the data warehouse to business plans and strategies. (e) The flexibility to respond to changes in business needs and flexibility in data warehouse planning are critical to strategic alignment and successful adoption of the data warehouse. Alignment is seen as a process requiring continuous adaptation and coordination of plans and goals. This research provides a pathway for facilitating successful adoption of data warehouses. The model developed in this research allows data warehouse professionals to ensure that their projects, when implemented, achieve the strategic goals and business objectives of the organization.
16

Godes, David Bradley. "Use of heterogeneous data sources : three case studies." Thesis, Massachusetts Institute of Technology, 1989. http://hdl.handle.net/1721.1/61057.

Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Sloan School of Management, 1989.
Title as it appears in the M.I.T. Graduate List, June 1989: Integration of heterogeneous data sources--three case studies.
Includes bibliographical references (leaf 159).
by David Bradley Godes.
M.S.
17

Hao, Dayang. "Content extraction, analysis, and retrieval for plant visual traits studies." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/5704.

Abstract:
Thesis (M.S.)--University of Missouri-Columbia, 2008.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on August 12, 2009). Includes bibliographical references.
18

Jiang, Shan. "Deciphering human activities in complex urban systems : mining big data for sustainable urban future." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101369.

Abstract:
Thesis: Ph. D. in Urban and Regional Planning, Massachusetts Institute of Technology, Department of Urban Studies and Planning, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 187-200).
"Big Data" is in vogue, and the explosion of urban sensors, mobile phone traces, and other windows onto urban activities has generated much hype about the advent of a new 'urban science.' However, translating such Big Data into a planning-relevant understanding of activity patterns and travel behavior presents a number of obstacles. This dissertation examines some of these obstacles and develops data processing pipelines and urban activity modeling techniques that can complement traditional travel surveys and facilitate the development of richer models of activity patterns and land use-transportation interactions. This study develops methods and tests their usefulness by using Singapore metropolitan area as an example, and employing data mining and statistical learning methods to distill useful spatiotemporal information on human activities by people and by place from traditional travel survey data, semantically enriched GIS data, massive and passive call detail records (CDR) data, and Wi-Fi augmented mobile positioning data. I illustrate that regularity and heterogeneity exist in individuals' daily activity patterns in the metropolitan area. I test the hypothesis that by characterizing and clustering individuals' activity profiles, and incorporating them into household decision choice models, we can characterize household lifestyles in ways that enhance our understanding and enable us to predict important decision-making processes within the urban system. I also demonstrate ways of integrating Big Data with traditional data sources in order to identify human mobility patterns, urban structures, and semantic themes of places reflected by human activities. Finally, I discuss how the enriched understanding about cities, human mobility, activity, and behavior choices derived from Big Data can make a difference in land use planning, urban growth management, and transportation policies.
by Shan Jiang.
Ph. D. in Urban and Regional Planning
19

Nekvapil, Viktor. "Data Mining in Customer Relationship Management: The Case of a Major Logistic Company." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-124538.

Abstract:
The thesis addresses the possibilities of deploying the open source data mining system LISp-Miner in customer relationship management (CRM), specifically in the area of lead management. Lead management is essentially the process of finding information about potential customers, qualifying those customers according to their potential (future value), and turning selected potential customers into real customers. The data used include records concerning the lead management of a major logistic company operating worldwide (the company wished to remain anonymous). The data are analysed using the LISp-Miner system, academic software developed at the Faculty of Informatics and Statistics at the University of Economics, Prague. The thesis also pays attention to the collaboration with the business experts of the company which provided the data. The principal aim of the thesis is to provide information contributing to a possible change of the company's internal processes. Further aims are to propose directions for the use of the LISp-Miner system when solving a similar data mining task, and to propose a simple and understandable way to present the results. The aims have been achieved by conducting the analysis in compliance with the CRISP-DM methodology. The contribution of the thesis is the description of the whole project, which includes the analysis of real data using the LISp-Miner system, a description of the lead management domain, and instructions and recommendations for future similar projects. Section I outlines the LISp-Miner system and its procedures. Section II, a case study, describes the process of analysing the data: two cycles ("iterations") of the analysis were performed, and the chapters devoted to both iterations are structured according to the phases of the CRISP-DM methodology. Section III summarises the observations gained during the entire project and gives recommendations and instructions for designing a similar data analysis project using the LISp-Miner system.
20

Hughes, David Bryn. "Geotechnical engineering applications in opencast coal mining : case studies from Northern England." Thesis, University of Newcastle Upon Tyne, 2003. http://hdl.handle.net/10443/858.

Abstract:
Opencast coal mining using mechanical excavators has taken place in Northern England for over sixty years. In the early years the excavations for coal were relatively shallow and of limited area, typically less than 20 m deep and 50 ha in plan. Nowadays with the deployment of very large draglines and hydraulic shovels, opencast mines can be over 200 m deep and up to 1,000 ha in area. The investigations, excavations and earthworks failures associated with this activity have provided a unique opportunity to study several geotechnical engineering aspects of the drift and solid geology of Northern England, and how they impact on the mine planning, design and operations processes.
21

Govinnage, Sunil Kantha. "Environmental Regulations of the Mining Industry: Two Case Studies from Western Australia." Thesis, Curtin University, 2018. http://hdl.handle.net/20.500.11937/75445.

Abstract:
The study analyses the Western Australian mining regulatory framework for environmental compliance. Through case studies of the Yeelirrie uranium mining approval and Collie coal mining, it identifies a dichotomy in mining legislation (Acts of Parliament versus State Agreements) and a multi-agency approach that challenge effective environmental protection. Grounded in sustainability and social science approaches, the thesis draws on expert interviews to identify weaknesses and best practices, and makes recommendations for strengthening the implementation of the mining regulatory framework.
22

Helmuth, Angelo. "Economic diversification of a mining town: a case study of Oranjemund." Thesis, Rhodes University, 2009. http://hdl.handle.net/10962/d1003843.

Abstract:
Can mining industries and mining-based localities promote Local Economic Development (LED)? This case study, on the mining town of Oranjemund, examines the economic diversification prospects of the town. Stakeholder views are considered and their aspirations determined through an interview process. Relevant theories on economic development, growth and sustainability are outlined, and lessons are drawn from local and international empirical studies on mining towns. The roles and contributions that stakeholders and institutions could make toward local economic diversification and LED are defined, and the opportunities and threats that could affect the town's LED process are identified. This paper concludes that it is imperative that sound relationships be developed amongst key stakeholders. It further recommends that a strategic LED plan be designed for Oranjemund and that national government, through the regional and local authority, lead the process.
23

Dong, Zheng. "Automated Extraction and Retrieval of Metadata by Data Mining : a Case Study of Mining Engine for National Land Survey Sweden." Thesis, University of Gävle, Department of Technology and Built Environment, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-6811.

Abstract:

Metadata is the important information describing geographical data resources and their key elements; it is used to guarantee the availability and accessibility of the data. ISO 19115 is a metadata standard for geographical information, making geographical metadata shareable, retrievable, and understandable at the global level. In order to cope with the massive, high-dimensional and high-diversity nature of geographical data, data mining is an applicable method for discovering metadata.

This thesis develops and evaluates an automated mining method for extracting metadata from the data environment on the Local Area Network at the National Land Survey of Sweden (NLS). These metadata are prepared and provided across Europe according to the metadata implementing rules for the Infrastructure for Spatial Information in Europe (INSPIRE). The metadata elements are defined according to the formats of four different data entities: document data, time-series data, webpage data, and spatial data. To evaluate the method for further improvement, a number of attributes and the corresponding metadata of geographical data files were automatically extracted as metadata records in testing and arranged in a database. Based on the extracted metadata schema, a retrieval function finds the files containing the keywords that the user enters. Overall, the average success rate of metadata extraction and retrieval is 90.0%.

The mining engine is developed in the C# programming language on top of a SQL Server 2005 database. Lucene.net is also integrated with Visual Studio 2005 to build an indexing framework for extracting and accessing the metadata in the database.
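The thesis implements extraction and indexing with C# and Lucene.net; the snippet below is only a language-neutral sketch of the same extract-then-retrieve idea, written in Python with hypothetical file names as the metadata records:

```python
# Sketch: derive metadata records from file names, build an inverted
# index over their tokens, and retrieve records by keyword.
records = [
    {"title": "parcel_boundaries.shp", "type": "spatial"},
    {"title": "land_survey_2009.pdf", "type": "document"},
    {"title": "water_levels_timeseries.csv", "type": "time-series"},
]

# Inverted index: keyword -> titles of the matching metadata records.
index = {}
for record in records:
    for token in record["title"].lower().replace(".", "_").split("_"):
        index.setdefault(token, set()).add(record["title"])

print(index.get("survey", set()))  # keyword retrieval over the metadata
```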

24

Perl, Henning [Verfasser]. "Security and Data Analysis - Three Case Studies / Henning Perl." Bonn : Universitäts- und Landesbibliothek Bonn, 2017. http://d-nb.info/1149154179/34.

25

Kadambi, Rupasri. "Analysis of data mining techniques for customer segmentation and predictive modeling: a case study." Diss., Online access via UMI, 2005.

Abstract:
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Dept. of Systems Science and Industrial Engineering, 2005.
Includes bibliographical references.
26

Gunturkun, Fatma. "A Comprehensive Review Of Data Mining Applications In Quality Improvement And A Case Study." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608751/index.pdf.

Abstract:
In today's world, knowledge is the most powerful factor for the success of organizations, and one of the most important resources for reaching this knowledge is the huge amount of data stored in their databases. DM techniques are essential in the analysis of this data. In this thesis, firstly, a comprehensive literature review on DM techniques for quality improvement in manufacturing is presented. Then one of these techniques is applied in a case study, in which customer quality perception data for driver seat quality are analyzed. A decision tree approach is implemented to identify the variables most influential on customer satisfaction with the comfort of the driver seat. The results obtained are compared to those of a logistic regression analysis implemented in another study.
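A minimal sketch of such a decision tree analysis, assuming scikit-learn; the synthetic survey matrix and the seat-related feature names are illustrative, not the case study's data:

```python
# Sketch: fit a decision tree to seat-comfort survey data and rank the
# variables by importance. Assumes scikit-learn; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
features = ["cushion_firmness", "lumbar_support", "seat_width",
            "bolster_height", "recline_range", "headrest_position"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for name, importance in sorted(zip(features, tree.feature_importances_),
                               key=lambda item: -item[1]):
    print(f"{name}: {importance:.2f}")
```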
27

Davoodi, Alireza. "User modeling and data mining in intelligent educational games : Prime Climb a case study." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/45274.

Abstract:
Educational games are designed to leverage students' motivation and engagement in playing games to deliver pedagogical concepts to the players during game play. Adaptive educational games, in addition, utilize models of student learning to personalize the learning experience according to students' educational needs. A student model needs to be capable of evaluating the student's mastery of the target skills and of providing a reliable base for generating tailored interventions to meet the user's needs. Prime Climb, an adaptive educational game in which students in grades 5 or 6 practice skills related to number factorization, provides a test-bed for research on user modeling and personalization in the domain of educational games. Prime Climb leverages a student model based on a Dynamic Bayesian Network to personalize its support while students practice number factorization during play. This thesis presents research conducted to improve the student model in Prime Climb by detecting and resolving the issue of degeneracy in the model. Degeneracy refers to a situation in which the model's accuracy is at its global maximum yet it violates conceptual assumptions about the process being modeled. Several criteria to evaluate the student model are introduced. Furthermore, using educational data mining techniques, different patterns of students' interactions with Prime Climb were investigated to understand how students with higher prior knowledge or higher learning gains behave differently from students with lower prior knowledge and lower learning gains.
28

Lien, Po-Chun (連柏鈞). "Big Data Mining Application with RapidMiner and Case Studies." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/vm7s9x.

Abstract:
Master's thesis, Chaoyang University of Technology, Department of Information Engineering, academic year 106 (2017/18).
In recent years, the rapid development of computer technology and the information industry has led to a significant increase in the amount of data. Faced with such large and messy multidimensional data sets, we cannot quickly and effectively find the information we need, so data mining techniques must be used to extract it from the data. This thesis introduces a relatively new data mining tool, RapidMiner, and compares it with other data mining software through a comparative analysis of their functions and operating procedures. Four case studies, covering linear regression, neural networks, decision trees, and support vector machines, illustrate the operation of RapidMiner. There are two reasons for using RapidMiner in this thesis. The first is that it has a very convenient graphical interface. The second is that the user does not need to learn another programming syntax, but only to select components and set parameters. The display of analysis results is also diversified, allowing users to choose functional plots with which to view the results.
29

Lin, Ying-Jiun (林映均). "Case studies of applying data mining clustering techniques to evaluate service quality." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/35752749859521560775.

Abstract:
Master's thesis, National Changhua University of Education, Graduate Institute of Marketing and Logistics Management, academic year 97 (2008/09).
The major objective of this study is to use self-organizing maps (SOM) and the K-means method to cluster the consumers of E-Life Mall Corporation and Kuo-Kuang Motor Transport Company into appropriate categories. The service items concerned in this study fall into the first and fourth quadrants of the IPA matrix when the Kano model is applied. This study uses data mining techniques to extract effective customer clusters, understand the differences among clusters through the Kano two-dimensional model, and analyze the importance and priority of clusters. The results can help the companies design different marketing strategies by understanding and then improving the competitive advantages and weaknesses of their service quality characteristics. Compared with the market segmentation results obtained by ANOVA, the service quality attributes of E-Life Mall consumers are classified into five clusters and the key success factors into three clusters by the K-means method. In addition, Kuo-Kuang Motor Transport consumers are classified into four clusters by the K-means method, and into two clusters by the complete linkage and Ward linkage methods. Finally, by integrating SOM and K-means, the consumers are classified into twelve clusters. According to the four consumer categories proposed by Reinartz and Kumar (2002), the twelve clusters are reduced to four categories. This study then analyzes the Kano quality attributes, customers' demographic information, and average satisfaction for these four categories. The different results generated by ANOVA, K-means, and the integrated SOM and K-means approach are compared and discussed. The results of this study can provide insights for E-Life Mall Corporation and Kuo-Kuang Motor Transport Company in finding an appropriate customer clustering method.
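A sketch of the integrated SOM + K-means idea: train a SOM, then run K-means on the SOM codebook to merge map units into customer segments. It assumes the minisom library and scikit-learn; the data, map size and segment count are illustrative, not the study's:

```python
# Sketch: SOM followed by K-means on the SOM prototype vectors.
# Assumes minisom and scikit-learn; data are synthetic.
import numpy as np
from minisom import MiniSom
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
responses = rng.random((200, 8))  # 200 customers, 8 service-quality items

som = MiniSom(4, 4, 8, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(responses, 1000)

# Cluster the 16 SOM prototype vectors into a small number of segments.
codebook = som.get_weights().reshape(16, 8)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(codebook)

# Assign each customer to the segment of its best-matching SOM unit.
customer_segment = [segments[np.ravel_multi_index(som.winner(x), (4, 4))]
                    for x in responses]
print(np.bincount(customer_segment))
```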
30

Shen, Yu-Ching (沈俞靜). "Applying data mining on outpatient medical record - Case studies of the pediatrics in a regional hospital." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/6g7c3j.

Abstract:
Master's thesis, National Formosa University, Graduate Institute of Information Management, academic year 97 (2008/09).
This research made use of outpatient pediatric data and addressed three main questions. (1) Because each doctor has a different training background, there are cognitive differences in the prescribed frequency of the same medicine; does the frequency of a patient's medicine usage therefore affect the rate of future revisits? (2) What is the relationship between patients' medical allowances and their care-seeking behaviour: do patients change their choice of medical treatment level when the medical copayment is raised? (3) How do the diagnoses of clinic pediatricians and emergency pediatricians differ, and what are their patterns of medical resource usage? The results show that: (1) data mining found no absolute relationship between the frequency of medicine usage and revisit rates, which may relate to personal constitution, external environmental conditions, or complications caused by other diseases; (2) when patients see a doctor, cost is not necessarily the most important consideration, as confidence in the doctor and travel distance, among other factors, also matter; (3) after July 2001, the expenses patients paid for diagnostic test items rose, showing that emergency pediatricians use diagnostic tests to assist diagnosis in a higher proportion than clinic pediatricians. To understand the usage of medical resources, this study analyzed the three questions with data mining and statistical methods and provides the results to the hospital and medical staff as a reference.
31

Li, Jie Ru (李杰儒). "A Data Mining Framework for Analyzing Key Factors of Unemployment Duration Using Bayesian Networks and Case Studies." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/01276367379244773198.

Abstract:
Master's thesis, National Tsing Hua University, Department of Industrial Engineering and Engineering Management, academic year 104 (2015/16).
This study develops a data mining framework to analyze the key factors of unemployment duration and the complex relationships among them, on the basis of real data collected from a representative human resource agency in Taiwan. To extract latent knowledge and patterns from a large volume of data about job seekers, this study formulates research hypotheses based on a literature review and domain expert knowledge and, supported by Bayesian networks, statistical tests and correlation coefficients, screens out 15 key factors from the job seekers' general information, job requirements, education, working experience and new work category that have a significant effect on unemployment duration. A Bayesian network is then used to clarify the relationships among the factors and unemployment duration. Finally, this study presents a case study process that efficiently extracts useful knowledge from the data mining results. Major findings include differences in unemployment duration across fields, the employment tendencies of different types of job seekers, and job transition patterns in the current domestic labor market: for example, in a particular industry, workers with 3 to 6 years of seniority may have high turnover intention, middle-aged workers face reemployment difficulty, and there are regional wage gaps and other social issues. The results help various types of job seekers obtain comprehensive information to find their own niche in the labor market, provide decision-making references for government and enterprises, and allow human resource agencies to improve their services and help job seekers find the most favorable direction.
32

Fong, Ruei-Shiang (馮瑞祥). "Studies on Predicting the Outcome of Professional Baseball Games with Data Mining Techniques: MLB as a Case." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/13879224135755534063.

Abstract:
Master's thesis, Chinese Culture University, Department of Information Management, academic year 101 (2012/13).
Professional baseball games emphasize data collection and analysis because each game produces plenty of data to be analyzed. Data mining methods involve computer analysis techniques with which crucial outcomes can be found in huge amounts of data; they can therefore be used to analyze professional baseball data efficiently while avoiding the mistakes often caused by manual analysis. This study aims to predict the outcome and scores of professional baseball games in MLB. The data cover all regular season games of the thirty MLB teams from 2000 to 2012, and the variables are the average statistics of both the fielders' and the pitchers' performances over the last ten games. First, we used the Pearson product-moment correlation coefficient to remove unrelated variables and variables exhibiting multicollinearity and to select suitable variables. We then applied the back-propagation network (BPN), an artificial neural network, to build a model on the selected variables; the first 100 games served as the training set and the remaining 62 games as the validation set. After using the model to predict the scores of the home and visiting teams, we compared the predictions with the real outcomes and with the run line and money line of sports gambling. The experimental results show that the model provides good prediction accuracy. Follow-up researchers may consider using different variables to improve the accuracy of the predictions.
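A sketch of the two-step procedure: screen variables by Pearson correlation, then fit a back-propagation network. It assumes scikit-learn and NumPy; the synthetic matrix, cutoff and network size are illustrative, standing in for the ten-game average statistics:

```python
# Sketch: Pearson screening followed by an MLP (back-propagation) regressor.
# Assumes scikit-learn and NumPy; data are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(162, 20))            # one season: 162 games, 20 stats
y = X[:, 0] * 2 + rng.normal(size=162)    # game score, driven by one stat

# Keep only variables whose |r| with the score exceeds a cutoff.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
selected = np.abs(r) > 0.2

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
model.fit(X[:100, selected], y[:100])     # first 100 games for training
print(model.score(X[100:, selected], y[100:]))  # validate on the remaining 62
```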
33

Chen, Li-Fei (陳麗妃). "A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/30869955008789719497.

Abstract:
Doctoral dissertation, National Tsing Hua University, Department of Industrial Engineering and Engineering Management, academic year 95 (2006/07).
Support vector machines (SVM), rough set theory (RST) and decision trees (DT) are methodologies applied to various data mining problems, especially classification tasks. Studies have shown the ability of RST for feature selection, while SVM and DT are notable for their predictive power. This research integrates the advantages of SVM, RST and DT to develop a hybrid framework that enhances the quality of class prediction as well as rule generation. Besides building a classification model with acceptable accuracy, the capability to explain and explore how decisions are made with simple, understandable and useful rules is a critical issue for human resource management; DT and RST can generate such rules, whereas SVM cannot. The framework consists of four main stages. The first stage selects the most important attributes: RST eliminates redundant and irrelevant attributes without losing any information about the classification. The second stage reduces noisy objects, accomplished by cross-validation using SVM. If the resulting data set suffers from class imbalance, the rules generated by RST are used to adjust the class distribution (stage 3). Through these stages, a data set with fewer dimensions and a higher degree of purity is screened out with a similar class distribution; it is then used to generate rules with DT, which completes the last stage. In addition, decisions concerning personnel selection always involve data of high dimensionality, uncertainty and complexity, which cause traditional statistical methods to suffer from low statistical power. For validation, real personnel selection cases from two high-tech companies in Hsinchu, Taiwan, covering direct and indirect labor, are studied using the proposed hybrid data mining framework. Implementation results show that the proposed approach is effective and performs better than traditional SVM, RST and DT.
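A sketch of this staged flow, assuming scikit-learn and with two plainly named substitutions: mutual-information ranking stands in for RST attribute reduction, and the imbalance-adjustment stage (stage 3) is omitted. Data are synthetic.

```python
# Sketch: attribute reduction, SVM-based noise filtering, then DT rules.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=15, random_state=0)

# Stage 1: keep the most informative attributes (stand-in for RST).
keep = np.argsort(mutual_info_classif(X, y, random_state=0))[-5:]
X = X[:, keep]

# Stage 2: drop objects the SVM cannot classify consistently (noise).
consistent = cross_val_predict(SVC(), X, y, cv=5) == y
X, y = X[consistent], y[consistent]

# Stage 4: generate readable rules from the cleaned, reduced data.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))
```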
34

Dlamini, Wisdom Mdumiseni Dabulizwe. "Spatial analysis of invasive alien plant distribution patterns and processes using Bayesian network-based data mining techniques." Thesis, 2016. http://hdl.handle.net/10500/20692.

Abstract:
Invasive alien plants have widespread ecological and socioeconomic impacts throughout many parts of the world, including Swaziland, where the government declared them a national disaster. Control of these species requires knowledge of the invasion ecology of each species, including how they interact with the invaded environment. Species distribution models are vital for providing solutions to such problems, including the prediction of their niche and distribution. Various modelling approaches are used for species distribution modelling, albeit with limitations resulting from statistical assumptions, implementation and interpretation of outputs. This study explores the usefulness of Bayesian networks (BNs) due to their ability to model stochastic, nonlinear inter-causal relationships and uncertainty. Data-driven BNs were used to explore the patterns and processes influencing the spatial distribution of 16 priority invasive alien plants in Swaziland. Various BN structure learning algorithms were applied within the Weka software to build models from a set of 170 variables incorporating climatic, anthropogenic, topo-edaphic and landscape factors. While all the BN models produced accurate predictions of alien plant invasion, the globally scored networks, particularly the hill climbing algorithms, performed relatively well. However, when considering the probabilistic outputs, the constraint-based Inferred Causation algorithm, which attempts to generate a causal BN structure, performed relatively better. The learned BNs reveal that the main pathways of alien plants into new areas are ruderal areas such as road verges and riverbanks, whilst humans and human activity are key driving factors and the main dispersal mechanism. However, the distribution of most of the species is constrained by climate, particularly tolerance to very low temperatures and precipitation seasonality. Biotic interactions and/or associations among the species are also prevalent. The findings suggest that most of the species will proliferate by extending their range, resulting in the whole country being at risk of further invasion. The ability of BNs to express uncertain, rather complex conditional and probabilistic dependencies and to combine multisource data makes them an attractive technique for species distribution modelling, especially as joint invasive species distribution models (JiSDM). Suggestions for further research are provided, including the need for rigorous invasive species monitoring, data stewardship and testing of more BN learning algorithms.
Environmental Sciences
D. Phil. (Environmental Science)
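A minimal sketch of score-based BN structure learning of the hill-climbing kind described above, assuming the pgmpy library rather than Weka (its API may vary by version); the three variables and the data are invented for illustration:

```python
# Sketch: learn a BN structure by hill climbing with a BIC score.
# Assumes pgmpy, pandas and NumPy; data are synthetic.
import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

rng = np.random.default_rng(0)
roads = rng.integers(0, 2, 500)
human_activity = (roads + rng.integers(0, 2, 500) > 1).astype(int)
invasion = (human_activity + rng.integers(0, 2, 500) > 1).astype(int)
data = pd.DataFrame({"roads": roads,
                     "human_activity": human_activity,
                     "invasion": invasion})

model = HillClimbSearch(data).estimate(scoring_method=BicScore(data))
print(model.edges())  # learned dependencies among the factors
```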
35

Bassett, Cameron. "Cloud computing and innovation: its viability, benefits, challenges and records management capabilities." Diss., 2015. http://hdl.handle.net/10500/20149.

Abstract:
This research investigated the potential benefits, risks and challenges, innovation properties and viability of cloud computing for records management in an Australian organisation in the mining software development sector. The research involved a case study results analysis as well as a literature analysis. The literature analysis identified ten potential benefits of cloud computing and ten risks and challenges associated with it, and further identified aspects which need to be addressed when adopting cloud computing in order to promote innovation within an organisation. The case study analysis was compared against these ten potential benefits and ten risks and challenges in order to determine cloud computing's viability for records management for Company X (the company in the case study). Cloud computing was found to be viable for Company X; however, certain aspects need to be discussed and clarified with the cloud service provider beforehand in order to mitigate possible risks and compliance issues. It is also recommended that a cloud service provider who complies with international standards, such as ISO 15489, be selected. The viability of cloud computing for organisations similar to Company X (mining software development) follows a related path. These organisations need to ensure that the service provider is compliant with laws in their local jurisdiction, such as the Electronic Transactions Act 1999 (Australia, 2011:14-15), as well as laws where their data (in the cloud) may be hosted. The benefits, risks and challenges of records management and cloud computing are applicable to these similar organisations; however, mitigation of these risks needs to be discussed with a cloud service provider beforehand. From an innovation perspective, cloud computing is able to promote innovation within an organisation if certain antecedents are dealt with; if cloud computing is successfully adopted, then it should promote innovation within organisations.
Information Science
M. Inf.
APA, Harvard, Vancouver, ISO, and other styles
36

Armstrong, Joshua J. "Rehabilitation Therapy Services For Older Long–Stay Clients in the Ontario Home Care System." Thesis, 2013. http://hdl.handle.net/10012/7342.

Full text
Abstract:
BACKGROUND: Rehabilitation therapies are effective for older persons in home-based settings and have the potential to save money for the health system while also improving the quality of life of older adults who might otherwise be hospitalized or institutionalized. Although there is evidence that home-based rehabilitation can improve functional outcomes in older adults, research has shown that many older home care clients do not receive the rehabilitation services they need. Despite the home care sector's increasing importance within Ontario's health care system, we have a limited understanding of the population that currently utilizes these services and how these services are allocated in the province. This dissertation aims to enhance the understanding of this domain using a large provincial data repository of home care client information (the RAI-HC information system).

METHODS: Using the Andersen-Newman framework as a conceptual guide, combined with the CRoss Industry Standard Process for Data Mining (CRISP-DM) as an organizational framework, this dissertation examines data collected on older long-stay home care clients. Prior to the data mining modeling procedures, knowledge of rehabilitation services in home care was developed through a series of semi-structured interviews with key informants. The results of this qualitative study then informed quantitative analyses that included creating rehabilitation service user profiles with the K-means clustering algorithm and developing predictive models of rehabilitation service provision using a random forest algorithm and multilevel models.

RESULTS: Older home care clients who receive occupational therapy and physiotherapy in the Ontario Home Care System form a complex and heterogeneous population. These services are often provided to clients following an acute event, yet many older adults who could benefit from therapy services for functional improvement and maintenance are not provided services due to limited resources. K-means clustering produced seven profiles of rehabilitation service users, illustrating the multidimensional diversity of the service user population. Predictive models identified client characteristics commonly associated with service provision, confirmed the large amount of regional variation across the province, and highlighted the differences between the factors that lead to occupational therapy and physiotherapy service provision.

CONCLUSIONS: By using multiple methods to systematically examine rehabilitation services for long-stay clients, new insights were obtained into the current user population and the client characteristics related to service provision. Future research should focus on ways to use regularly collected standardized data to identify older long-stay home care clients who would benefit most from the rehabilitation therapy services provided by the provincial home care system.
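As a rough illustration of the two quantitative steps described above (K-means profiles followed by a predictive model of service provision), the sketch below uses scikit-learn; the file name and column names are hypothetical stand-ins for the RAI-HC variables.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical extract of RAI-HC assessments: one row per long-stay client.
clients = pd.read_csv("rai_hc_clients.csv")
X = StandardScaler().fit_transform(clients.drop(columns=["received_rehab"]))

# Step 1: K-means service-user profiles (the dissertation reports seven).
clients["profile"] = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(X)

# Step 2: random forest predicting whether a client receives therapy services.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, clients["received_rehab"], stratify=clients["received_rehab"], random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", rf.score(X_te, y_te))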
APA, Harvard, Vancouver, ISO, and other styles
37

Wei, Tzu-Fa, and 魏子發. "Studies on Spatial Data mining Techniques on Construction Engineering." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/47062379617193689343.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Construction Engineering
Academic year 91 (2002–03)
With the rapid advance of information technology, storing data is no longer as difficult as it once was, and the amount of data in the world, and in our lives, keeps increasing. Data mining was developed to search for useful information within such data. Construction engineering likewise generates a great deal of data, and this study applies data mining techniques to extract useful information from it. During steel structure design, a huge amount of data is produced at the stress re-analysis stage. The main purpose of this study is to establish stress approximation models using approximation methodology, a commonly used data mining tool. These approximation models can replace full analysis during the time-consuming re-analysis process. Because the stress data of a steel structure are spatially correlated, we use a spatial data mining technique, the Kriging method, which is a spatially correlated approximation method. Several cases were used to test the Kriging-based models, and other approximation models, such as response surface methods (RSM), were used to evaluate their performance. The experiments show that the Kriging-based approximation models outperform the other approximation models.
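Kriging is mathematically equivalent to Gaussian process regression, so a surrogate of the kind the thesis describes can be sketched with scikit-learn as below. The design variables and stress responses here are synthetic placeholders, not the thesis's structural data.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Synthetic stand-in: design variables -> member stresses obtained
# from a handful of expensive full structural analyses.
rng = np.random.default_rng(0)
X_train = rng.random((50, 4))                 # e.g., member sizing variables
y_train = np.sin(2 * X_train).sum(axis=1)     # placeholder for computed stress

kernel = ConstantKernel() * RBF(length_scale=np.ones(4))
krig = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# Cheap surrogate predictions (with uncertainty) replace full re-analysis.
y_pred, y_std = krig.predict(rng.random((5, 4)), return_std=True)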
APA, Harvard, Vancouver, ISO, and other styles
38

Hsu, Puteng, and 許普騰. "The Studies Of Data Mining By Using Neural Network." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/wt4h5u.

Full text
Abstract:
Master's thesis
I-Shou University
Department of Electrical Engineering
Academic year 100 (2011–12)
In recent years, owing to the vigorous development of database technology and artificial intelligence, data mining has become a useful and important technique in information science. Data mining extracts hidden information from a large database, helping researchers find useful information that might otherwise be ignored or missed. Neural networks have been used in data mining research because of their powerful learning and adaptive capabilities: through the learning process, a neural network can capture unknown information hidden in the data. In this research, a new computation method based on the weights of a well-trained neural network is developed. With this method, the degree of influence of each input variable on the output can be found, and the most influential inputs can be identified. This makes the neural network technique very promising for data mining.
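The abstract does not give the thesis's exact weight-based formula, but Garson's algorithm is one common way to rank input influence from a trained network's weights. A minimal sketch of that idea, assuming a single hidden layer and a single output unit:

import numpy as np

def garson_importance(W_ih, W_ho):
    """Relative importance of each input from network weights (Garson's method).
    W_ih: (n_inputs, n_hidden) input-to-hidden weights.
    W_ho: (n_hidden,) hidden-to-output weights, single output unit."""
    contrib = np.abs(W_ih) * np.abs(W_ho)          # input share routed via each hidden node
    contrib /= contrib.sum(axis=0, keepdims=True)  # normalize within each hidden node
    importance = contrib.sum(axis=1)
    return importance / importance.sum()

# Toy example: 3 inputs, 4 hidden nodes, random weights.
rng = np.random.default_rng(0)
print(garson_importance(rng.normal(size=(3, 4)), rng.normal(size=4)))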
APA, Harvard, Vancouver, ISO, and other styles
39

Ferreira, Rita Gomes Salgado. "Data mining and cluster organisations : the case of PortugalFoods." Master's thesis, 2016. http://hdl.handle.net/10400.14/21838.

Full text
Abstract:
Even though the concept of clusters has received a considerable amount of attention, the literature dedicated to cluster organisations is still very scarce. On the other hand, the wide applicability of data mining to several industries, along with the benefits it might bring to any organisation, has been the subject of various articles over the years. This dissertation assesses how cluster organisations could benefit from applying data mining to the services they provide. Through an empirical study of a Portuguese cluster organisation, PortugalFoods, I analysed whether data mining represents an opportunity for these governance bodies, particularly as a new support tool for their market intelligence services. Following the CRISP-DM methodology, and based on data from Mintel's databases, a prototype data mining project was developed. The findings indicate that data mining could enhance PortugalFoods' market intelligence services, as well as its role as a producer and disseminator of knowledge. However, challenges were also detected, owing to several data quality problems that could jeopardize future replication of the process.
APA, Harvard, Vancouver, ISO, and other styles
40

Juang, Ming-Lun, and 莊明倫. "Data Mining Analysis on RMA Raw Data - A Case Study of G Company." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/563e8z.

Full text
Abstract:
Master's thesis
Shih Hsin University
Graduate Institute of Information Management (including the in-service Master's program)
Academic year 103 (2014–15)
Databases were originally built simply to store accumulated records in one place and to support query, insert, update and delete operations. As data accumulates, however, such simple operations no longer meet today's needs; organisations want to uncover the latent relationships hidden in their databases. In the past, analysing large amounts of data required domain experts and analysts; today, data mining techniques can quickly extract knowledge directly from large volumes of data to support managerial decisions. This thesis applies data mining to motherboard repair (RMA) data, using the analysis results to uncover the potential information hidden in a large repository and feed it back to the relevant departments, so that repair engineers' experience with problems caused by materials and parts can serve as a reference for product development and improvement. Using a case study approach, the study applies Clementine 12.0 to Company G's motherboard repair data to mine globally distributed repair information. The two-stage clustering analysis suggests that if chips misjudged as faulty during motherboard repair could be re-bumped and reused, maintenance costs would fall substantially. The decision tree model shows that processor sockets are often damaged when users install processors themselves, suggesting that the departments responsible for product communication should strengthen installation instructions and precautions to avoid wasting repair materials and parts. The spider diagram reveals that Internet cafés in China, where computers are left running for long periods, experience overheating-related crashes, so it is recommended that developers improve the thermal design and the associated wiring layout. Together these findings help improve product quality, reduce repair rates, and thereby achieve better customer satisfaction.
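Clementine's two-stage (TwoStep) clustering pre-clusters records into a tree before a global clustering pass; scikit-learn's BIRCH follows the same two-stage idea and can serve as a rough stand-in, as sketched below. The file name and feature names are hypothetical.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import Birch

# Hypothetical RMA extract: one numeric row per repaired board.
rma = pd.read_csv("rma_repairs.csv")
cols = ["repair_cost", "usage_hours", "ambient_temp"]   # invented feature names
X = StandardScaler().fit_transform(rma[cols])

# BIRCH first condenses records into a CF-tree, then clusters the
# leaf entries globally -- the same two-stage idea as TwoStep.
rma["cluster"] = Birch(n_clusters=4).fit_predict(X)
print(rma.groupby("cluster")[cols].mean())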
APA, Harvard, Vancouver, ISO, and other styles
41

Huang, Chin-tien, and 黃金田. "Data Mining Analysis on RMA Feedback Data- A Case Study on S Company." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/10608341995922209727.

Full text
Abstract:
Master's thesis
Shih Hsin University
Graduate Institute of Information Management (including the in-service Master's program)
Academic year 99 (2010–11)
For a leading electronic component provider, pursuing high product quality is always one of the most serious concerns: without maintaining high-quality goods, its market share could easily be grabbed by rivals through price cuts. This research adopts the RFM model proposed by Hughes (1994) to analyze RMA data. To adapt the model to RMA applications, a new factor, product unit price, is added to evaluate the importance of returned materials. Next, the AHP method (Analytic Hierarchy Process; Saaty, 1971) is used to rank the importance of the four factors and determine their weights. Finally, the K-means clustering algorithm divides the returned materials into groups. Based on this analysis process, the main results run in two directions. First, four main RMA clusters are proposed, characterized and analyzed. Second, adequate RMA handling strategies are proposed to help FAEs, product administrators and factory quality managers control, and further improve the handling of, the respective RMA goods.
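The pipeline described above (AHP weights over R, F, M and unit price, then K-means on the weighted scores) can be sketched as follows; the pairwise comparison matrix and the score table are hypothetical examples, not the study's data.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical AHP pairwise comparison matrix for (R, F, M, unit price).
A = np.array([[1.0, 2.0, 0.5, 1.0],
              [0.5, 1.0, 1/3, 0.5],
              [2.0, 3.0, 1.0, 2.0],
              [1.0, 2.0, 0.5, 1.0]])

# AHP weights: the normalized principal eigenvector of A.
vals, vecs = np.linalg.eig(A)
w = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
w /= w.sum()

# Hypothetical normalized R, F, M and unit-price scores per returned item,
# scaled by the AHP weights before Euclidean K-means clustering.
scores = np.random.rand(200, 4)
groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores * w)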
APA, Harvard, Vancouver, ISO, and other styles
42

Liu, Wei-Chihh, and 劉威志. "DEVELOP A WEB-BASED DATA ACQUISITION TOOL FOR DATA MINING STUDIES OF DRUG THERAPIES." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/91061939216319144127.

Full text
Abstract:
Master's thesis
Kaohsiung Medical University
Master's Program, Graduate Institute of Clinical Pharmacy
Academic year 95 (2006–07)
This study aimed to develop a web-based data acquisition tool for data mining studies of drug therapies, inspired by a 2004 study, by combining two predominant techniques: the World Wide Web and data mining. Programming languages and tools such as HTML, PHP, MySQL and JavaScript were used to develop the framework and the interaction between project managers and the tool. EasyPHP was installed on a workstation PC to set up a client/server environment with PHP, MySQL and Apache servers, while a Linux PC running Fedora Core 4 served as the remote web server, allowing experts to log in and complete a project. The final step of a project is to export a Weka ARFF file, which enables project managers to apply Weka for data mining analysis. We successfully reproduced the 2004 study with this tool, including the questionnaires (simulated and generated) and the online expert log-in website (which collects experts' decisions and opinions). Moreover, the tool may shorten the time needed to construct a project and allows project managers to define a project according to their own needs: rather than building a project manually, they can use this integrated, web-based tool to quickly complete the setup of a research project.
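The tool itself was written in PHP, but the final export step, producing a Weka ARFF file, is easy to illustrate, since ARFF is a simple text format. Here is a minimal sketch of an ARFF writer in Python; the relation, attribute names and rows are invented.

def write_arff(path, relation, attributes, rows):
    """Minimal ARFF writer: attributes is a list of (name, type) pairs,
    e.g. ("dose_mg", "NUMERIC") or ("appropriate", "{yes,no}")."""
    with open(path, "w") as f:
        f.write(f"@RELATION {relation}\n\n")
        for name, typ in attributes:
            f.write(f"@ATTRIBUTE {name} {typ}\n")
        f.write("\n@DATA\n")
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")

# Hypothetical usage: two attributes, two data rows, loadable in Weka.
write_arff("survey.arff", "drug_therapy",
           [("dose_mg", "NUMERIC"), ("appropriate", "{yes,no}")],
           [(50, "yes"), (200, "no")])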
APA, Harvard, Vancouver, ISO, and other styles
43

曾雨慈. "Application of data mining techniques in business tax case selection." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/65925828837004498587.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Zheng, You-Lun, and 鄭又綸. "Information Mining and Testing Methods for Association Studies with Pedigree Data." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/77407006241468422996.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Graduate Institute of Applied Statistics
Academic year 97 (2008–09)
Family-based association tests are widely used in gene mapping. The pedigree disequilibrium test (PDT), one such test for the analysis of pedigree data, is derived by combining transmission disequilibrium with allele differences. In this thesis, we combine information from four statistics: the odds ratio, similarity difference, transmission disequilibrium and allele difference. We use existing combination procedures, namely Tippett's method, Fisher's method and the inverse normal method, to combine the four statistics. Overall, we considered 48 statistics and investigated which of them have the highest power. The simulation results show that the statistic constructed with the similarity difference has better power performance than the other statistics.
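The three combination procedures named above have simple closed forms. A sketch, using hypothetical p-values for the four component statistics:

import numpy as np
from scipy import stats

p = np.array([0.04, 0.20, 0.12, 0.35])   # hypothetical per-statistic p-values

# Fisher: -2 * sum(ln p) follows chi-square with 2k degrees of freedom.
chi2 = -2 * np.log(p).sum()
p_fisher = stats.chi2.sf(chi2, df=2 * len(p))

# Inverse normal (Stouffer): sum of z-scores divided by sqrt(k).
z = stats.norm.isf(p).sum() / np.sqrt(len(p))
p_inverse_normal = stats.norm.sf(z)

# Tippett: based on the minimum p-value.
p_tippett = 1 - (1 - p.min()) ** len(p)

# scipy.stats.combine_pvalues also provides 'fisher' and 'stouffer' directly.
print(p_fisher, p_inverse_normal, p_tippett)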
APA, Harvard, Vancouver, ISO, and other styles
45

Yang, Ching-Hsiung, and 楊清雄. "Data Mining Apply in Business Administration Management - A Customer Relationship Case." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/73428181063669585131.

Full text
Abstract:
Master's thesis
Da-Yeh University
Master's Program, Department of Information Management
Academic year 92 (2003–04)
The Internet has changed traditional business strategies, and the ability to build a good relationship with customers and business partners now separates winners from losers. Through customer relationship management (CRM), companies can easily distinguish valuable customers from non-valuable ones. Since a company's goal is to make a profit, using customer profit margins and customer profitability for market segmentation and resource allocation is a vital operational strategy. The purpose of this study is to use the association rule technique of data mining to determine criteria for distinguishing valuable customers from non-valuable ones. According to those criteria, a company can set up appropriate sales strategies to meet the requirements of valuable customers and build loyalty, thereby increasing turnover and reducing sales costs.
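As an illustration of the association rule technique the study applies, the following sketch uses the mlxtend library on a few hypothetical transactions; the item names and the support and confidence thresholds are arbitrary.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical transactions: items purchased together per customer visit.
transactions = [["item_a", "item_b"], ["item_a", "item_c"],
                ["item_a", "item_b", "item_c"], ["item_b"]]

te = TransactionEncoder()
basket = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

frequent = apriori(basket, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])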
APA, Harvard, Vancouver, ISO, and other styles
46

Chen, Fu-Jung, and 陳富中. "Data Mining: A Case Study of Telephone Customers' Call Behavior Analysis." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/46594058995884131146.

Full text
Abstract:
Master's thesis
Tamkang University
Department of Computer Science and Information Engineering
Academic year 88 (1999–2000)
Data warehousing has become very popular among organizations seeking competitive advantage by getting strategic information fast and easily. Corporation-wide projects with high-sounding goals and grandiose schemes, however, often turn into efforts typified by massive cost overruns and mediocre results. Departmental decision support system (DSS) databases, called data marts, share the architectural foundation of a data warehouse while meeting departmental decision-making requirements and supporting multidimensional data analysis. In the telecommunications industry, call detail records (CDRs) contain a gold mine of information about customers and competitors. Traditional approaches were CDR-based only: CDRs show subscriber trends, but a customer may own more than one subscriber line, and a line may belong to different customers at different periods. In this thesis, we built a data mart system based on the CDRs and the business vocation of each customer. Using this data mart system, we can analyze customer call behavior and apply the behavior patterns to customer care, pricing, marketing and decision-making.
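The thesis's key point, rolling call detail records up to the customer rather than the subscriber line, can be sketched with pandas; the file names and column names below are hypothetical.

import pandas as pd

# Hypothetical extracts: one CDR row per call, plus a table mapping each
# subscriber line to the customer who owned it in the billing period.
cdr = pd.read_csv("cdr.csv")                 # line_id, duration_min, is_peak, ...
lines = pd.read_csv("line_ownership.csv")    # line_id, customer_id

# Roll call behaviour up to the customer, not the subscriber line.
calls = cdr.merge(lines, on="line_id")
behaviour = calls.groupby("customer_id").agg(
    total_calls=("line_id", "size"),
    total_minutes=("duration_min", "sum"),
    peak_share=("is_peak", "mean"))
print(behaviour.head())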
APA, Harvard, Vancouver, ISO, and other styles
47

Tsai, Jui-Mu, and 蔡瑞木. "Applications of Data Mining- A Case Study on a Department Store." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/69984226646483893862.

Full text
Abstract:
Master's thesis
National Taipei University
Department of Business Administration
Academic year 101 (2012–13)
The retailing industry has reached the maturity stage. Because shopping centers are densely spread and the merchandise offered by different stores is barely distinguishable, every department store faces harsh competition, and today's customers have more choices and thus less loyalty. To adapt to this irreversible trend, survive, and maximize profit across a vast variety of customers, a department store must recognize its strengths in its market segments and find the valuable 20% of customers who contribute 80% of corporate revenue; all marketing strategy is aimed at attracting new customers and retaining these valuable ones. Stores use the purchasing records from credit cards and discount membership cards to capture customers' purchasing behavior, and hidden within this massive data are rules that can be discovered through data mining, which store administrators can use to adjust sales promotion strategies and improve corporate performance. In this study, we use RFM analysis (Recency of last patronage, Frequency of patronage, and Monetary value spent) and data mining on three periods of purchasing records from an anonymous store to categorize customers' purchasing characteristics. We generate individual quantized R, F and M values as inputs, group them into clusters through K-means clustering, and then use an MLE-WMLE algorithm to obtain each customer's monetary value trend. All analyses are based on three consecutive years of data, and we check whether the results are stable from year to year. A CART decision tree is then used to classify customers as churned or retained in the following year, and the Apriori algorithm is applied to basket analysis: through association analysis between customers and merchandise, we propose a list of goods and services and an associated store layout. Finally, we compare the customers of this store with the rising average life expectancy of Taiwan residents to study why young customers are disappearing from this store. In this way, countermeasures are proposed to attract new customers and retain old ones so that the store can survive the tough competition. All data have been intentionally edited to protect the corporation; this study simply presents a method for using data mining to assist administration.
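One step of the pipeline above, the CART classification of customers as churned or retained, might look like this in scikit-learn (whose decision trees are CART); the input table is a hypothetical stand-in for the quantized R, F, M values.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-customer table: quantized R, F, M scores plus a churn flag.
data = pd.read_csv("customers.csv")
X, y = data[["recency", "frequency", "monetary"]], data["churned"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
cart = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", cart.score(X_te, y_te))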
APA, Harvard, Vancouver, ISO, and other styles
48

Lin, Fang-Ru, and 林芳如. "Use of Data Mining in Catering Situation Analysis- AKAONI Steak Case." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/69963691110253593078.

Full text
Abstract:
Master's thesis
Asia University
Department of Leisure and Recreation Management
Academic year 104 (2015–16)
For a business or organization, information is a very important asset. The data problem companies face is not a lack of information but too much of it: if the large and growing amount of data is not effectively processed, the result is so-called "data dumping". Once processed statistically or with data mining techniques, however, the data can be converted into usable information or knowledge that helps decision makers make the right moves. Enterprises face a huge market and competitive pressure, customers' consumption habits keep changing, and revenue comes from new customers and from repeat purchases by existing customers. Advances in technology and the increasingly common use of databases allow large amounts of data to be stored and managed. Through data mining, this study builds rules characterizing consumers' merchandise-mix purchases, captures the consumption situation, establishes market segments and target customer groups, and identifies potentially relevant consumers in order to predict their behavior. The association analysis indicates that high turnover across the chain's branches, including the Feng Chia store, is strongly associated with Saturdays and Sundays, with the weekend patterns differing from branch to branch.
APA, Harvard, Vancouver, ISO, and other styles
49

Kao, Shu-Chen, and 高淑珍. "A Data Mining Based Approach To Customer Response Model – A Case Study." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/19798201013038698354.

Full text
Abstract:
Doctoral dissertation
National Cheng Kung University
Department of Business Administration (Master's and Doctoral Programs)
Academic year 92 (2003–04)
The modern marketing paradigm has been shifting rapidly, and businesses now apply target marketing to capture the right customers in promotional activities. The customer response model, used for targeting and prediction, is the most important tool in marketing promotion. This research proposes a data mining (DM) based customer response model for the insurance industry to help find unobvious but valuable promotion knowledge that supports marketing-related decisions. First, we visited a leading insurance company (denoted Company A), one of the most popular insurance companies in Taiwan, to frame the research focus. Company A provided 188,464 transaction records; two-thirds of the collected data were used as the mining dataset and the remainder as a test dataset. The ID3 mining algorithm was used to derive decision rules, yielding 943 qualified rules in total, and the accuracy of the proposed model was 81%. To capture the important implications of this knowledge, the research analyzed the obtained rules along two directions: the level of support and the degree of conditions. The former focused mainly on the amount of support and the degree of conditions of the obtained rules, analyzing product categories with respect to customer characteristics; the latter examined the relationships between different degrees of conditions and product categories. We then conducted a second visit to Company A to validate the obtained knowledge in practice. The results indicated that the customer response model was able to find and diffuse insurance marketing knowledge, and, in the opinion of an executive manager, the proposed model with its DM mechanism was useful for decision support. Moreover, the proposed model could play a key role in changing the decision-making style from experience-oriented to information-oriented. Other research findings and managerial implications are also addressed.
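The rule-derivation step can be approximated in scikit-learn: its trees are CART rather than ID3, but with the entropy criterion they use ID3's information-gain idea, and export_text turns the fitted tree into readable decision rules. The file name and feature names below are hypothetical.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical policyholder attributes (numeric codes) and a response flag.
data = pd.read_csv("insurance_customers.csv")
X, y = data[["age_band", "income_band", "num_policies"]], data["responded"]

# Entropy criterion approximates ID3's information-gain splitting.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # one rule per leaf path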
APA, Harvard, Vancouver, ISO, and other styles
50

Chen, Chun-Jen, and 陳俊任. "Application of Data Mining to Customer Relationship Management —The Case of Cosmetics." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/16660581231956292070.

Full text
Abstract:
Master's thesis
Yuan Ze University
Graduate Institute of Industrial Engineering
Academic year 89 (2000–01)
Customer satisfaction is always the most important factor in the cosmetics business. To build and maintain a long-term relationship with customers, a cosmetics vendor should provide not only valuable products but also thoughtful service. Through customer relationship management (CRM), the right products and services can be offered at the right time, through the right delivery channel, to the right customers. In this research, we propose a method that uses an artificial neural network and a statistical clustering method to segment customers based on their purchasing behavior. With this method, different segmentation results can be generated, allowing cosmetics vendors to target potential customers with the right promotional activities. Our experiment shows that cosmetics vendors can expect to increase profit significantly.
APA, Harvard, Vancouver, ISO, and other styles