Journal articles on the topic 'Data mining Case studies'

Consult the top 50 journal articles for your research on the topic 'Data mining Case studies.'

1

MELLI, GABOR, XINDONG WU, PAUL BEINAT, FRANCESCO BONCHI, LONGBING CAO, RONG DUAN, CHRISTOS FALOUTSOS, et al. "TOP-10 DATA MINING CASE STUDIES." International Journal of Information Technology & Decision Making 11, no. 02 (March 2012): 389–400. http://dx.doi.org/10.1142/s021962201240007x.

Abstract:
We report on the panel discussion held at the ICDM'10 conference on the top 10 data mining case studies in order to provide a snapshot of where and how data mining techniques have made significant real-world impact. The tasks covered by 10 case studies range from the detection of anomalies such as cancer, fraud, and system failures to the optimization of organizational operations, and include the automated extraction of information from unstructured sources. From the 10 cases we find that supervised methods prevail while unsupervised techniques play a supporting role. Further, significant domain knowledge is generally required to achieve a completed solution. Finally, we find that successful applications are more commonly associated with continual improvement rather than by single "aha moments" of knowledge ("nugget") discovery.
2

Lomax, Susan, and Sunil Vadera. "Case Studies in Applying Data Mining for Churn Analysis." International Journal of Conceptual Structures and Smart Applications 5, no. 2 (July 2017): 22–33. http://dx.doi.org/10.4018/ijcssa.2017070102.

Abstract:
The advent of price and product comparison sites makes it even more important to retain customers and to identify those at risk of leaving. The use of data mining methods has been widely advocated for predicting customer churn. This paper presents two case studies that use decision tree learning to develop models for predicting churn for a software company. The first case study aims to predict churn for organizations that currently have an ongoing project, to determine whether they are likely to continue with other projects. The second case study presents a more traditional example, where the aim is to predict which organizations are likely to cease subscribing to a service. The case studies report the accuracy of the models using a standard methodology and compare the results with what happened in practice. Both case studies show the significant savings that can be made, plus the potential increase in revenue, by using decision tree learning for churn analysis.
3

Rauch, Jan, and Milan Šimůnek. "Data Mining with Histograms and Domain Knowledge – Case Studies and Considerations*." Fundamenta Informaticae 166, no. 4 (April 26, 2019): 349–78. http://dx.doi.org/10.3233/fi-2019-1805.

4

Tang, Yan. "Studies on Broad-Sense Sample Method in Data Mining." Advanced Materials Research 989-994 (July 2014): 1453–55. http://dx.doi.org/10.4028/www.scientific.net/amr.989-994.1453.

Abstract:
With the development of science and technology, people pay more and more attention to product reliability, especially in specialized fields such as aerospace and military products and for products requiring high reliability and long life. As an activity that runs through the whole product life cycle, reliability testing provides an important source of data for design, batch production, and residual life assessment. Some expensive new products are put into use only in small quantities, so their data have small-sample characteristics. In this case, how to use the existing data to predict product life and to calculate product reliability and other related parameters more accurately is particularly important.
5

Conrads, Paul, Ruby Daamen, and Edwin A. Roehl. "Maximizing Data-Collection Networks by Using Data-Mining Techniques – Case Studies in the Florida Everglades." Proceedings of the Water Environment Federation 2008, no. 12 (January 1, 2008): 4384–98. http://dx.doi.org/10.2175/193864708788752296.

6

Ding, Jianwei. "Case Investigation Technology Based on Artificial Intelligence Data Processing." Journal of Sensors 2021 (October 26, 2021): 1–9. http://dx.doi.org/10.1155/2021/4942657.

Abstract:
Through data mining technology, the hidden information behind a large amount of data is discovered, which can help various management services and provide scientific basis for leadership decision-making. It is an important subject of current police information research. This paper conducts in-depth research on the investigation analysis and decision-making of public security cases and proposes a case-based reasoning model based on two case databases. Moreover, this paper discusses in detail the use of data mining technology to automatically establish a case database, which is a useful exploration and practice for the public security department to establish a new and efficient case investigation auxiliary decision-making system. In addition, this paper studies the method of using data mining technology to assist in the establishment of a case database, analyzes the characteristics of traditional case storage methods, and constructs a case investigation model based on artificial intelligence data processing. The research results show that the model constructed in this paper has certain practical effects.
7

Altuntas, Serkan, Turkay Dereli, and Andrew Kusiak. "Assessment of corporate innovation capability with a data-mining approach: industrial case studies." Computers & Industrial Engineering 102 (December 2016): 58–68. http://dx.doi.org/10.1016/j.cie.2016.10.018.

8

Heripracoyo, Sulistyo. "Data Warehouse dan Data Mining Pendidikan Tinggi: Studi Kasus Kategori Undur Diri di Universitas Bina Nusantara." ComTech: Computer, Mathematics and Engineering Applications 3, no. 2 (December 1, 2012): 808. http://dx.doi.org/10.21512/comtech.v3i2.2309.

Abstract:
Data warehouses and data mining are used to extract useful, meaningful information and to establish real relationships between variables stored in the data warehouse. A data warehouse that is appropriately designed to meet requirements provides appropriate data and is useful in making better decisions. Hardware and software facilitate adequate access to such data and allow the results to be analyzed and displayed interactively. Data mining software is a highly effective tool that can be used to interrogate the data contained in the data warehouse in order to find relationships (Neary 1999). This study reviews the literature and applies several models and case studies in a higher education institution, in terms of benefits, functions, and development. The case study aims to identify the trend in, and predict, the number of students who drop out (DO).
9

Gozali, Elahe, Bahlol Rahimi, Malihe Sadeghi, and Reza Safdari. "Diagnosis of diseases using data mining." Medical Technologies Journal 1, no. 4 (November 29, 2017): 120–21. http://dx.doi.org/10.26415/2572-004x-vol1iss4p120-121.

Abstract:
Introduction: In the information age, data are the most important asset for health organizations. When used in a useful and optimal manner, data can become a financial resource for the organization. Data mining is an appropriate method to transform this potential value into strategic information. Data mining means the extraction of hidden information, the recognition of hidden relationships and patterns, and, in general, the discovery of useful knowledge from high volumes of data. The objective of this review paper was to evaluate the use of data mining in the diagnosis of diseases. Methods: This research is a review paper based on a structured review of papers published in Science Direct, PubMed, Google Scholar, SID, and Magiran (between 2005 and 2015) and of books related to the use of data mining in medical science and in the diagnosis of diseases, searched with related keywords. Results: Nowadays, data mining is used in many medical science studies, including the diagnosis of diseases, the discovery of hidden patterns in data, and so on. New ideas such as knowledge discovery in databases, which includes data mining techniques, have become more popular and have become a desired research tool for researchers. Researchers can use them to identify patterns and relationships among a great number of variables. Using them, researchers have been able to predict the results obtained from a disease by using the information stores available in databases. Several studies have indicated that data mining is used widely in the diagnosis of diseases based on various types of information (medical images, patient characteristics, and so on), such as tuberculosis, types of cancer, and infectious diseases, and in the diagnosis of anomalies rarely detected by humans (spots and particular points within the eye, which are a symptom of the onset of blindness resulting from diabetes), as well as in determining how to behave with patients, predicting the success rate of surgeries, determining the success rate of therapeutic methods for incurable diseases, and so on. Conclusion: One of the most important and challenging topics in healthcare is the transformation of raw clinical data into meaningful information as great quantities of data are continuously generated. In the current competitive environment, health organizations that use technologies such as data mining to improve healthcare quality will achieve success faster. Many research centers in Iran are faced with a large volume of information, which is either not analyzed at all or is time-consuming to analyze and convert to knowledge because traditional methods are used. By implementing data mining, health organizations can transform their data into a powerful and competitive tool and take new steps in preventing, diagnosing, treating, and providing high-quality services for clients.
10

Pelt, Maurice, Konstantinos Stamoulis, and Asteris Apostolidis. "Data analytics case studies in the maintenance, repair and overhaul (MRO) industry." MATEC Web of Conferences 304 (2019): 04005. http://dx.doi.org/10.1051/matecconf/201930404005.

Abstract:
Data analytics seems a promising approach to address the problem of unpredictability in MRO organizations. The Amsterdam University of Applied Sciences, in cooperation with the aviation industry, has initiated a two-year applied research project to explore the possibilities of data mining. More than 25 cases have been studied at eight different MRO enterprises. The CRISP-DM methodology is applied to provide a structural guideline throughout the project. The data within the MROs were explored and prepared. Individual case studies conducted with statistical and machine learning methods successfully predicted, among other things, the duration of planned maintenance tasks, the optimal maintenance intervals, and the probability of findings occurring during maintenance tasks.
11

Lima, E., C. Mues, and B. Baesens. "Domain knowledge integration in data mining using decision tables: case studies in churn prediction." Journal of the Operational Research Society 60, no. 8 (August 2009): 1096–106. http://dx.doi.org/10.1057/jors.2008.161.

12

Mohotti, W. A., and S. C. Premaratne. "Analysing Sri Lankan lifestyles with data mining: two case studies of education and health." Kelaniya Journal of Management 6, no. 1 (July 27, 2017): 1. http://dx.doi.org/10.4038/kjm.v6i1.7523.

13

Carlson, Roderick. "Understanding Geologic Uncertainty in Mining Studies." SEG Discovery, no. 117 (April 1, 2019): 21–29. http://dx.doi.org/10.5382/geo-and-mining-03.

Abstract:
Editor’s note: The Geology and Mining series, edited by Dan Wood and Jeffrey Hedenquist, is designed to introduce early-career professionals and students to a variety of topics in mineral exploration, development, and mining, in order to provide insight into the many ways in which geoscientists contribute to the mineral industry. Abstract: The role of geology in advanced mining studies, such as feasibility studies, is commonly dwarfed by the technical inputs from mining, metallurgical, and social license issues. Understanding and planning for geologic risk in the feasibility process is often overlooked for the higher-profile aspects required to establish an ore reserve. If the geologic model of a deposit cannot be reliably forecast, then there will be lower confidence in many of the modifying factors (which include mining, processing, environmental, social, governmental, and economic factors that influence the conversion of identified mineral resources into economic reserves). Understanding geologic risk requires characterization of all the chemical, physical, and spatial properties of mineralization and waste that form part of the mined material. It is essential to understand the scope of the professionals who use geoscientific data in order to assist the outcomes of the study, with the data types first identified, then collected in a comprehensive manner, and finally interpreted at the appropriate time to contribute to the outcomes of the study. If the study is not comprehensive, remedial collection of data is required at a cost to development timeline and budget; a worse scenario is that the development fails economically after it is built. Developing projects to a construction stage after a mining study typically involves international standards of assessment and verification, although the standards of geoscientific data collection differ between companies and countries.
For this reason, recent efforts by international bodies such as the Committee for Mineral Reserves International Reporting Standards (CRIRSCO) are assisting many countries to work toward a standardized terminology in a feasibility study. There are many examples where the mining outcomes have not met the feasibility study forecast, with variable causes for a failure to deliver to plan; geoscientific data shortfalls often contribute significantly to these negative outcomes. Examination of case histories, knowledge of international standards for risk reporting, advances in measurement technology, and an understanding of the end users of geoscientific data will help geologists to better prepare the scope of a feasibility study for a potential mine, in order to deliver a product with lower risk related to geologic uncertainty.
14

Gál, Tamás Zoltán, Gábor Kovács, and Zsolt T. Kardkovács. "Survey on privacy preserving data mining techniques in health care databases." Acta Universitatis Sapientiae, Informatica 6, no. 1 (June 1, 2014): 33–55. http://dx.doi.org/10.2478/ausi-2014-0017.

Abstract:
In health care databases, there are tireless and antagonistic interests between data mining research and privacy preservation: the more you try to hide sensitive private information, the less valuable it is for analysis. In this paper, we give an outlook on data anonymization problems through case studies. We summarize state-of-the-art health care data anonymization issues, including the legal environment and expectations, the most common attack strategies on privacy, and the proposed metrics for evaluating the usefulness and privacy preservation of anonymization. Finally, we summarize the strengths and shortcomings of different approaches and techniques from the literature based on these evaluations.
15

Watada, Junzo, Keisuke Aoki, Masahiro Kawano, and Muhammad Suzuri Hitam. "Dual Scaling in Data Mining from Text Databases." Journal of Advanced Computational Intelligence and Intelligent Informatics 10, no. 4 (July 20, 2006): 451–57. http://dx.doi.org/10.20965/jaciii.2006.p0451.

Abstract:
The availability of multimedia text document information has spread text mining among researchers. Text documents integrate numerical and linguistic data, making text mining interesting and challenging. We propose text mining based on a fuzzy quantification model and fuzzy thesaurus. In text mining, we focus on: 1) Sentences included in Japanese text that are broken down into words. 2) Fuzzy thesaurus for finding words matching keywords in text. 3) Fuzzy multivariate analysis to analyze semantic meaning in predefined case studies. We use a fuzzy thesaurus to translate words using Chinese and Japanese characters into keywords. This speeds up processing without requiring a dictionary to separate words. Fuzzy multivariate analysis is used to analyze such processed data and to extract latent mutually related structures in text data, i.e., to extract otherwise obscured knowledge. We apply dual scaling to mining library and Web page text information, and propose integrating the result in Kansei engineering for possible application in sales, marketing, and production.
16

Arwanto, Bambang. "Political Economy of Coal Mining Policy: A Case Study in Rent Seeking of Surveyor’s Data Manipulation in East Kalimantan (2009-2014)." Journal of Public Administration and Governance 8, no. 4 (October 23, 2018): 66. http://dx.doi.org/10.5296/jpag.v8i4.13819.

Abstract:
Rent seeking and mining policy are two interesting discourses that have enriched Indonesian policy studies within the last two decades. One of the prominent problems in this sector is the policy formulation process for mining permits (IUP). The mining sector attracts concern because of the huge economic incentive behind the mining business, including coal mining. Since the economic incentive is extremely high, rent seeking is becoming more intense and more competitive. The rent-seeking contest unfolds in different policy formulation stages through elite business people and bureaucrats. This affects the objectivity of issuing coal mining policy, including extra regulation about surveyors. The policy formulation in this case produces dynamic and complex rent-seeking activity among the main players. This study aims to reveal a case of rent seeking using surveyors in coal mining policy. Using a qualitative method and a non-positivist approach, this case study was one of five studies that tried to understand the social relationships among the policy actors during mining policy formulation. Findings in the study were: (i) the role of surveyors as a “third person” or mediator who played a prominent role in conveying interests and determined the data through the surveyor’s report, (ii) the bargaining power of businessmen to gain access to the bureaucracy through bribing and lobbying, and (iii) the role of bureaucrats in manipulating regulation to accommodate their interests through extra regulation making.
17

Maciejewska, Alina, Łukasz Kuzak, Janusz Sobieraj, and Dominik Metelski. "The Impact of Opencast Lignite Mining on Rural Development: A Literature Review and Selected Case Studies Using Desk Research, Panel Data and GIS-Based Analysis." Energies 15, no. 15 (July 26, 2022): 5402. http://dx.doi.org/10.3390/en15155402.

Abstract:
The future of opencast mining and energy production based on conventional resources is one of the most important issues being discussed in international forums. The whole discussion is becoming increasingly heated and takes on a special significance with the drastic increase in energy commodity prices that has occurred with the outbreak of war in Ukraine. Especially in a country like Poland, these issues are accompanied by heated discussions between miners, the government and citizens. It should be emphasised that Polish lignite mining currently produces about 35% of the cheapest electricity in Poland and also creates many jobs. The aim of this study is to assess the possibility of continuing opencast mining and its impact on rural development—both from an environmental and socio-economic point of view. The study was conducted for two municipalities in Poland where opencast lignite mining plays an important role, namely Kleszczów and Kleczew. As a result, it was found that in the case of the studied municipalities, the presence of opencast mining has contributed to their development, and the application of modern environmental protection technologies and recultivation have reduced the difficulties associated with mining. On the other hand, the decision to start mining should be the result of a comparison between the potential environmental and social benefits and damages. In some cases, mining is beneficial for community development and leads to new opportunities for agriculture and tourism after reclamation. The study is a combination of different methods, i.e., case studies, GIS remote sensing analysis (based on Landsat data) and econometric analysis for selected socio-economic data.
18

Peng, Sihong, and Nick Vayenas. "Maintainability Analysis of Underground Mining Equipment Using Genetic Algorithms: Case Studies with an LHD Vehicle." Journal of Mining 2014 (February 19, 2014): 1–10. http://dx.doi.org/10.1155/2014/528414.

Abstract:
While increased mine mechanization and automation make considerable contributions to mine productivity, unexpected equipment failures and planned or routine maintenance prohibit the maximum possible utilization of sophisticated mining equipment and require a significant amount of extra capital investment. This paper deals with aspects of maintainability prediction for mining machinery. A PC software called GenRel was developed for this purpose. In GenRel, it is assumed that failures of mining equipment caused by an array of factors follow the biological evolution theory. GenRel then simulates the failure occurrences during a time period of interest using genetic algorithms (GAs) coupled with a number of statistical techniques. A group of case studies focuses on maintainability analysis of a Load Haul Dump (LHD) vehicle with two different time intervals, three months and six months. The data was collected from an underground mine in the Sudbury area in Ontario, Canada. In each prediction case study, a statistical test is carried out to examine the similarity between the predicted data set with the real-life data set in the same time period. The objectives of case studies include an assessment of the applicability of GenRel using real-life data and an investigation of the impacts of data size and chronological sequence on prediction results.
19

Hambeishi, Nariman Khaled, Shaymaa Abed Hussein, and Waleed Khalid AlAzzawi. "The role of data mining in diagnosing the diseases: A case study of detecting the thyroid disease." International Journal Artificial Intelligent and Informatics 3, no. 2 (April 2, 2022): 92–103. http://dx.doi.org/10.33292/ijarlit.v3i2.51.

Abstract:
Data mining is one of the most important tools used in the field of medicine, especially for exploring and understanding prevailing health conditions, researching diseases and patient behavior in society, and helping to determine the effect of drugs on patients according to previous patient records. Data mining also helps to identify diseases spreading in a specific area, which helps in taking the necessary measures and raising awareness to control the disease, and it works to develop and update the field of medicine and medicines and to increase their reach, efficiency, and processing capacity. Data mining techniques are a recent method in the medical field and have become increasingly reliable in diagnosing diseases, especially in diagnosing and detecting thyroid diseases. Since such techniques help to properly analyze and accurately predict the disease, they help physicians, support appropriate medical care, and reduce the incidence or development of side effects of the disease. Today, data mining techniques are important in the medical field as they help detect diseases and epidemics that are spreading around the world. The purpose of this paper is to examine the role of data mining techniques in the diagnosis of thyroid disease through a survey of a number of relevant studies. A critical appraisal tool for previous studies in this area was used. The study also found that early diagnosis of thyroid diseases is important for providing proper treatment to patients. It also agreed on the importance of data mining tools such as neural networks, machine learning, and decision tree learning and on their potential for disease diagnosis. The paper recommends extensive systematic reviews of studies on the use of data mining techniques in the diagnosis of thyroid diseases. We also concluded that data mining tools in medical devices have had an enormous and important impact on the health care industry and the country. It should be remembered that medical data, which have begun to multiply in large amounts, must contain a lot of useful information that can greatly improve the level of medical services, help detect the characteristics of many diseases and epidemics, and help find solutions to many difficult diseases.
20

Romadloni, Muhammad Danial, and Indra Gita Anugrah. "Implementation of EM Algorithm in Opinion Mining Movies Review Case Studies." Journal of Development Research 5, no. 2 (November 29, 2021): 94–105. http://dx.doi.org/10.28926/jdr.v5i2.149.

Abstract:
Movies are familiar to everyone, from children and adolescents to adults, whether watched out of curiosity, as a hobby, or to fill spare time. Movies once could be watched only on television, after waiting months following release, or directly in the cinema; with the development of technology, it has become increasingly easy for everyone to enjoy movies, which can now be watched through paid television services and on smartphones. One of the websites that viewers often use to review movies they have watched is IMDb. The review data can be used for opinion mining, to determine whether the audience considers the reviewed movie good or not. One of the algorithms often used is Naïve Bayes; apart from being easy to implement, Naïve Bayes is also known to be very fast and easy to use for predicting classes on a test dataset. The purpose of this study is to see how much the Expectation-Maximization algorithm increases accuracy in opinion mining of movie reviews. The results show that using the Expectation-Maximization method increased accuracy by 4% compared to using only Naïve Bayes.
21

Gao, Tianyun, Bartosz Boguslawski, Sylvain Marié, Patrick Béguery, Simon Thebault, and Stéphane Lecoeuche. "Data mining and data-driven modelling for Air Handling Unit fault detection." E3S Web of Conferences 111 (2019): 05009. http://dx.doi.org/10.1051/e3sconf/201911105009.

Abstract:
Data-driven automatic fault detection and diagnostics (AFDD) have gained a lot of research attention in recent years. Many existing solutions need to learn from the fault operation data to be able to diagnose the faults. However, these data are usually not available in buildings. In this study we present a data-driven AFDD solution for Air Handling Units (AHUs). The solution consists of three levels of fault detection that require different levels of data availability: the first level is daily energy benchmarking; the second level is control performance evaluation; and the third level is data-driven modelling of mechanical systems. The method is applied to two case studies: experimental data from ASHRAE project 1312-RP, and real-life operation data of an office building in France. These tests show that the solution is able to isolate control faults and mechanical faults of individual components, by learning from normal operation data only.
22

Solmaz, Mustafa, Adam Lane, Bilal Gonen, Ogulsheker Akmamedova, Mehmet H. Gunes, and Kakajan Komurov. "Graphical data mining of cancer mechanisms with SEMA." Bioinformatics 35, no. 21 (May 9, 2019): 4413–18. http://dx.doi.org/10.1093/bioinformatics/btz303.

Abstract:
Motivation: An important goal of cancer genomics initiatives is to provide the research community with the resources for the unbiased query of cancer mechanisms. Several excellent web platforms have been developed to enable the visual analyses of molecular alterations in cancers from these datasets. However, there are few tools to allow the researchers to mine these resources for mechanisms of cancer processes and their functional interactions in an intuitive unbiased manner. Results: To address this need, we developed SEMA, a web platform for building and testing of models of cancer mechanisms from large multidimensional cancer genomics datasets. Unlike the existing tools for the analyses and query of these resources, SEMA is explicitly designed to enable exploratory and confirmatory analyses of complex cancer mechanisms through a suite of intuitive visual and statistical functionalities. Here, we present a case study of the functional mechanisms of TP53-mediated tumor suppression in various cancers, using SEMA, and identify its role in the regulation of cell cycle progression, DNA repair and signal transduction in different cancers. SEMA is a first-in-its-class web application designed to allow visual data mining and hypothesis testing from the multidimensional cancer datasets. The web application, an extensive tutorial and several video screencasts with case studies are freely available for academic use at https://sema.research.cchmc.org/. Availability and implementation: SEMA is freely available at https://sema.research.cchmc.org. The web site also contains a detailed Tutorial (also in Supplementary Information), and a link to the YouTube channel for video screencasts of analyses, including the analyses presented here. The Shiny and JavaScript source codes have been deposited to GitHub: https://github.com/msolmazm/sema. Supplementary information: Supplementary data are available at Bioinformatics online.
23

Das, Subasish, Anandi Dutta, and Marcus A. Brewer. "Case Study of Trend Mining in Transportation Research Record Articles." Transportation Research Record: Journal of the Transportation Research Board 2674, no. 10 (July 24, 2020): 1–14. http://dx.doi.org/10.1177/0361198120936254.

Abstract:
This study employs two topic models to perform trend mining on an abundance of textual data to determine trends in research topics from immense collections of unstructured documents over the years. This study collected data from the titles and abstracts of the papers published in Transportation Research Record: Journal of the Transportation Research Board, since 1974. The content of these papers was ideal for examining research trends in various fields of research because it contains large textual data. In previous studies, exploratory analysis tools such as text mining were used to provide descriptive information about the data. However, this method does not provide researchers with quantifications of the topics and their correlations. Furthermore, the contents examined in this study are largely unstructured, and therefore they require faster machine learning algorithms to decipher them. For these reasons, the research team chose to employ two topic modeling tools, latent Dirichlet allocation and structural topic model, to perform trend mining. This analysis succeeded in extracting 20 main topics, identified by keywords, from the data. The research team also developed two interactive topic model visualization tools that can be used to extract topics from journal titles and abstracts, respectively. The findings from this study provide researchers with a further understanding of research patterns within the ever-evolving area of transportation engineering studies.
24

Oskouei, Rozita Jamili, and Mohsen Askari. "Predicting Academic Performance with Applying Data Mining Techniques (Generalizing the results of two Different Case Studies)." Computer Engineering and Applications Journal 3, no. 2 (June 30, 2014): 79–88. http://dx.doi.org/10.18495/comengapp.v3i2.81.

Abstract:
Several studies have attempted to predict students’ academic performance, assess students’ knowledge, or detect students’ weaknesses and probability of failure in final semester examinations. However, several factors affect the performance of students in different countries, or even in different states of one country. Therefore, understanding these factors and analyzing the effect of each of them in each country is necessary for improving instructors’ decisions in selecting the best teaching method for helping weak students or increasing the performance of other students. This study examines students’ academic performance in high school and bachelor’s degree studies in Iran and compares the results with those of a similar study in India.
25

Monteiro, Diego Vilela, Rafael Duarte Coelho dos Santos, and Karine Reis Ferreira. "Mining Partners in Trajectories." International Journal of Data Warehousing and Mining 16, no. 1 (January 2020): 22–38. http://dx.doi.org/10.4018/ijdwm.2020010102.

Abstract:
Spatiotemporal data is everywhere, being gathered from different devices such as Earth Observation and GPS satellites, sensor networks and mobile gadgets. Spatiotemporal data collected from moving objects is of particular interest for a broad range of applications. In recent years, such applications have motivated much research on moving object trajectory data mining. This article proposes an efficient method to discover partners in moving object trajectories. The method identifies pairs of trajectories whose objects stay together during certain periods, based on distance time series analysis. The article presents two case studies using the proposed algorithm. It also describes an R package, called TrajDataMining, that contains algorithms for trajectory data preparation, such as filtering, compressing and clustering, as well as the proposed method Partner.
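The idea of finding partners from a distance time series can be illustrated with a minimal sketch. This is not the Partner method from the article or the TrajDataMining API; the function names, the dictionary-based trajectory format, and the `max_dist` and `min_duration` parameters are all assumptions made for illustration.

```python
from math import hypot


def distance_series(traj_a, traj_b):
    """Return (timestamp, distance) pairs for timestamps present in both trajectories.

    Each trajectory is a dict mapping a numeric timestamp to an (x, y) position
    (a hypothetical format chosen for this sketch).
    """
    shared = sorted(set(traj_a) & set(traj_b))
    return [
        (t, hypot(traj_a[t][0] - traj_b[t][0], traj_a[t][1] - traj_b[t][1]))
        for t in shared
    ]


def partner_periods(traj_a, traj_b, max_dist, min_duration):
    """Find periods during which two objects stay within max_dist of each other.

    A period is reported as (start, end) only if it lasts at least min_duration.
    """
    periods = []
    start = prev = None
    for t, d in distance_series(traj_a, traj_b):
        if d <= max_dist:
            if start is None:
                start = t  # a new "together" run begins
            prev = t
        else:
            if start is not None and prev - start >= min_duration:
                periods.append((start, prev))
            start = prev = None
    # Close a run that lasts until the final shared timestamp.
    if start is not None and prev - start >= min_duration:
        periods.append((start, prev))
    return periods


a = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (4, 0)}
b = {0: (0, 5), 1: (1, 1), 2: (2, 1), 3: (3, 1), 4: (4, 9)}
print(partner_periods(a, b, max_dist=2, min_duration=2))
```

Here the two objects are within 2 units of each other from timestamp 1 through 3, so a single period is reported. A real implementation would also need to handle trajectories sampled at different times, for example by interpolation.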
26

Jesus, M. A., and Vania Estrela. "An Introduction to Data Mining Applied to Health-Oriented Databases." Oriental journal of computer science and technology 9, no. 3 (December 20, 2016): 177–85. http://dx.doi.org/10.13005/ojcst/09.03.03.

Abstract:
The application of data mining (DM) in healthcare is increasing. Healthcare organizations generate and collect large volumes of heterogeneous information daily, and DM helps to uncover interesting patterns, which leads to the elimination of manual tasks, easy data extraction directly from records, saved lives, reduced cost of medical services, and early detection of diseases. These patterns can help healthcare specialists to make forecasts, reach diagnoses, and set treatments for patients in health facilities. This work overviews DM methods and main issues. Three case studies illustrate DM in healthcare applications: (i) In-Vitro Fertilization; (ii) Content-Based Image Retrieval (CBIR); and (iii) Organ transplantation.
27

Hendstein, Christophar Nicholas, and Hiroshi Akeera Katsu. "Decision-making in large corporations - role of big data analytics & data mining." Business & IT XII, no. 1 (2022): 144–51. http://dx.doi.org/10.14311/bit.2022.01.17.

Abstract:
Purpose - The goal of this paper is to present a novel framework for strategic decision making using Big Data Analytics (BDA) methodology. Design/methodology/approach - In this research, two distinct machine learning algorithms, Random Forest (RF) and Artificial Neural Networks (ANN), are used to forecast export volumes from a considerable amount of open industry data. The forecasted values are placed in the Boston Consulting Group Matrix to conduct strategic industry analysis. Results - The proposed technique is validated using a hypothetical case study of a Chinese business exporting freezers and refrigerators. The results indicate that the proposed methodology makes accurate trade forecasts and helps to conduct strategic industry evaluation properly. Furthermore, the RF performs much better than the ANN in terms of forecast accuracy. Research limitations/implications - This analysis provides just one case study to evaluate the proposed methodology. Future studies could further generalize the validity of the proposed technique across product groups and countries. Practical implications - In today's extremely competitive business environment, good strategic industry evaluation involves exporters or importers making better predictions and strategic choices. Using the proposed BDA-based strategy, businesses may efficiently identify business opportunities and adjust their strategic choices accordingly. Originality/value - This is the first study to provide a holistic methodology for strategic industry evaluation using BDA. The proposed methodology effectively forecasts global trade volumes and supports the strategic decision-making process by offering insights into worldwide markets.
28

Cardoso, Sara, Telma Afonso, Marcelo Maraschin, and Miguel Rocha. "WebSpecmine: A Website for Metabolomics Data Analysis and Mining." Metabolites 9, no. 10 (October 19, 2019): 237. http://dx.doi.org/10.3390/metabo9100237.

Abstract:
Metabolomics data analysis is an important task in biomedical research. The available tools do not provide a wide variety of methods and data types, nor ways to store and share data and results generated. Thus, we have developed WebSpecmine to overcome the aforementioned limitations. WebSpecmine is a web-based application designed to perform the analysis of metabolomics data based on spectroscopic and chromatographic techniques (NMR, Infrared, UV-visible, and Raman, and LC/GC-MS) and compound concentrations. Users, even those not possessing programming skills, can access several analysis methods including univariate, unsupervised and supervised multivariate statistical analysis, as well as metabolite identification and pathway analysis, also being able to create accounts to store their data and results, either privately or publicly. The tool’s implementation is based in the R project, including its shiny web-based framework. Webspecmine is freely available, supporting all major browsers. We provide abundant documentation, including tutorials and a user guide with case studies.
29

Noekhah, Shirin, Naomie Binti Salim, and Nor Hawaniah Zakaria. "Evaluation of Data Mining Features, Features Taxonomies and their Applications." Kurdistan Journal of Applied Research 2, no. 3 (August 27, 2017): 131–41. http://dx.doi.org/10.24017/science.2017.3.3.

Abstract:
The World Wide Web has brought an enormous improvement to people’s lives over the last couple of decades. E-commerce is a new area that arose during this period and has changed the traditional approaches to selling products and services. It uses different techniques to discover market trends and analyze competitors’ activities by exploiting information from reviews. Potential customers, in turn, use online opinions to make their purchase decisions. Opinion mining and sentiment analysis are the most critical and fundamental domains of data mining and are useful for a variety of sub-domains such as opinion summarization, recommendation systems and opinion spam detection. Opinion mining and all its sub-branches can be performed efficiently when there is a comprehensive understanding of the most effective features applied in those domains. To achieve the best results, the most appropriate set of features must be used for classification or clustering in each case study. To the best of our knowledge, there is no extensive study and taxonomy of the wide range of features and their applications in opinion mining. In this paper, we comprehensively investigate the various types of features exploited in the sub-branches of the opinion mining domain. We present the most frequent feature sets, including structural, linguistic and relation-based features, as a complete reference for further opinion mining research. The results show that using multiple types of features improves the accuracy of opinion mining applications.
30

Que, Sisi, Liang Wang, Kwame Awuah-Offei, Wei Yang, and Hui Jiang. "Corporate Social Responsibility: Understanding the Mining Stakeholder with a Case Study." Sustainability 11, no. 8 (April 23, 2019): 2407. http://dx.doi.org/10.3390/su11082407.

Abstract:
The social responsibility of corporate mining has been challenged by a significant socio-political risk from local communities. These issues reduce shareholder value by increasing costs and decreasing the market perception of corporate social responsibility. Community engagement is the process of understanding the behavior and interests of a group of targeted mining communities through surveys and data analysis, with the purpose of incorporating mining community acceptance into the mining sustainability. While mining organizations have discussed community engagement to varying degrees, there are three main shortcomings in current studies, as concluded in the authors’ previous research. This paper presents a framework to apply discrete choice theory to improve mining community engagement and corporate mining social responsibility. In addition, this paper establishes the main technical challenges to implement the developed framework, and presents methods to overcome the challenges for future research with a case study. The contribution of this research will transform mine sustainability in a fundamental way by facilitating the incorporation of effective community engagement. This will lead to more sustainable mines that local communities support.
31

Levkivskyi, Vitalii, Nadiia Lobanchykova, and Dmytro Marchuk. "Research of algorithms of Data Mining." E3S Web of Conferences 166 (2020): 05007. http://dx.doi.org/10.1051/e3sconf/202016605007.

Abstract:
The article explores data mining algorithms, which are based on rules and calculations and allow us to create a model that analyzes the provided data by searching for specific patterns and trends. The purpose of this work is to analyze correlation-regression algorithms on a statistical dataset of chronic diseases. Data mining allows many models to be built, and multiple algorithms can be used within a single solution. The article explores clustering, correlation analysis, and the Naive Bayes algorithm for obtaining different views of data. Diabetes is one of the most dangerous chronic diseases; its pathogenesis is a lack of insulin in the human body, which causes metabolic disorders and pathological changes in various organs and tissues and, as a result, leads to disability of all functional systems of the body. It was therefore decided to investigate the data related to this disease. The quality of the developed methods of information retrieval from the dataset was also evaluated, and the most informative features were identified. The developed methods were implemented in the system of intellectual data processing. Past studies show the promise of using data mining methods to improve the quality of patient care.
32

Hariyanto, Eko, Sri Wahyuni, and Supina Batubara. "Study of Potential Classification of Lost Students in College Based on Information Extraction on Text-Based Social Media; Case Study of Panca Budi Pembangunan University." International Journal of Research and Review 8, no. 11 (November 29, 2021): 325–31. http://dx.doi.org/10.52403/ijrr.20211140.

Abstract:
The main problem examined in this study is the large number of lost students, who harm universities because they are difficult to monitor as a preventive measure. This research is therefore important so that higher-education institutions can make efforts toward early detection (classification) of students who potentially cannot complete their studies on time or who will drop out (DO). Higher-education institutions, through related parties such as academic guidance lecturers, academic bureaus and others, can then take initial preventive action by providing the best solution to the problems faced by students. This research aims to determine the training data model consisting of academic and non-academic factors (including the results of extracting information from social media). This model is then used as a basis for classifying students who have the potential to "graduate on time", "graduate not on time", and "DO". The approach is quantitative, with text mining algorithms used to extract knowledge/information from social media, which is then used in the training data, and data mining algorithms used to classify the potential completion of student studies. The mandatory external target in the first year is publication in an international Scopus Q4 journal, and in the second year publication in an international Scopus Q3 journal. The additional external targets in the first and second years, respectively, are publication in international journals indexed by reputable indexers, and ISBN teaching books and copyrights. The level of technological readiness (TKT) in this study is up to level 2, the formulation of technological concepts and applications to classify the potential completion of student studies using data mining. Keywords: [student lost, knowledge/information extraction, data classification, text mining, data mining].
33

Andry, Johanes Fernandes, Henny Hartono, Honni, Aziza Chakir, and Rafael. "Data Set Analysis Using Rapid Miner to Predict Cost Insurance Forecast with Data Mining Methods." Journal of Hunan University Natural Sciences 49, no. 6 (June 30, 2022): 167–75. http://dx.doi.org/10.55463/issn.1674-2974.49.6.17.

Abstract:
The insurance protection program cannot be separated from everyday human life because there will always be risks in every human activity. Most people have entered into insurance agreements with state-owned and national private-owned insurance companies. The information system is one of the resources to increase competitive advantage. Information systems can be used to obtain, process, and disseminate information to support day-to-day operations and support strategic decision-making activities. The rapid growth of data accumulation has created data-rich but insufficient information conditions. Data mining is the mining or discovery of new information by looking for specific patterns or rules from large amounts of data expected to overcome these conditions. It is hoped that customer data can accurately produce information about insurance cost predictions. In this analysis, the authors use the RapidMiner Studio version 9.1 software. With the RapidMiner Studio app, authors can analyze the insurance data. A scientific novelty of this research is investigating data set cost insurance with data mining techniques consisting of classification, association, and clustering. Research goals for data mining techniques with classification, association, and clustering case studies implemented are to find all associative rules with high confidence, organize objects into groups whose members are similar, and collect objects between them. The following methods can be used: decision tree for data modeling, FP-Growth for determining which dataset occurs most frequently, and K-Means to classify the data attributes to facilitate the analysis.
34

Carbone, M., L. Berardi, D. Laucelli, and P. Piro. "Data-mining approach to investigate sedimentation features in combined sewer overflows." Journal of Hydroinformatics 14, no. 3 (October 24, 2011): 613–27. http://dx.doi.org/10.2166/hydro.2011.003.

Abstract:
Sedimentation is the most common and effectively practiced method of urban drainage control in terms of operating installations and duration of service. Assessing the percentage of suspended solids removed after a given detention time is essential for both design and management purposes. In previous experimental studies by some of the authors, the expression of iso-removal curves (i.e. representing the water depth where a given percentage of suspended solids is removed after a given detention time in a sedimentation column) has been demonstrated to depend on two parameters which describe particle settling velocity and flocculation factor. This study proposes an investigation of the influence of some hydrological and pollutant aggregate information of the sampled events on both parameters. The Multi-Objective (EPR-MOGA) and Multi-Case Strategy (MCS-EPR) variants of the Evolutionary Polynomial Regression (EPR) are originally used as data-mining strategies. Results are proved to be consistent with previous findings in the field and some indications are drawn for relevant practical applicability and future studies.
35

Jadrić, Mario, Ivana Ninčević Pašalić, and Maja Ćukušić. "Process Mining Contributions to Discrete-event Simulation Modelling." Business Systems Research Journal 11, no. 2 (October 1, 2020): 51–72. http://dx.doi.org/10.2478/bsrj-2020-0015.

Abstract:
Background: Over the last 20 years, process mining has become a vibrant research area due to the advances in data management technologies and techniques and the advent of new process mining tools. Recently, the links between process mining and simulation modelling have become an area of interest. Objectives: The objective of the paper was to demonstrate and assess the role of process mining results as an input for discrete-event simulation modelling, using two different datasets, one of which is considered data-poor while the other one data-rich. Methods/Approach: Statistical calculations and process maps were prepared and presented based on the event log data from two case studies (smart mobility and higher education) using a process mining tool. Then, the implications of the results across the building blocks (entities, activities, control-flows, and resources) of simulation modelling are discussed. Results: Apart from providing a rationale and the framework for more efficient simulation modelling based on process mining results, the paper provides contributions in the two case studies by deliberating and identifying potential research topics that could be tackled and supported by the new combined approach. Conclusions: Event logs and process mining provide valuable information and techniques that could be a useful input for simulation modelling, especially in the first steps of building discrete-event models, but also for validation purposes.
36

Helm, Emmanuel, Anna M. Lin, David Baumgartner, Alvin C. Lin, and Josef Küng. "Towards the Use of Standardized Terms in Clinical Case Studies for Process Mining in Healthcare." International Journal of Environmental Research and Public Health 17, no. 4 (February 19, 2020): 1348. http://dx.doi.org/10.3390/ijerph17041348.

Abstract:
Process mining can provide greater insight into medical treatment processes and organizational processes in healthcare. To enhance comparability between processes, the quality of the labelled-data is essential. A literature review of the clinical case studies by Rojas et al. in 2016 identified several common aspects for comparison, which include methodologies, algorithms or techniques, medical fields, and healthcare specialty. However, clinical aspects are not reported in a uniform way and do not follow a standard clinical coding scheme. Further, technical aspects such as details of the event log data are not always described. In this paper, we identified 38 clinically-relevant case studies of process mining in healthcare published from 2016 to 2018 that described the tools, algorithms and techniques utilized, and details on the event log data. We then correlated the clinical aspects of patient encounter environment, clinical specialty and medical diagnoses using the standard clinical coding schemes SNOMED CT and ICD-10. The potential outcomes of adopting a standard approach for describing event log data and classifying medical terminology using standard clinical coding schemes are further discussed. A checklist template for the reporting of case studies is provided in the Appendix A to the article.
37

Arafik, Arafik, Restu Juniah, and Mohammad Zulkarnain. "Safety And Health Implementation Study Work (K3) In Coal Mining Companies (Case Study: PT. XYZ)." Indonesian Journal of Environmental Management and Sustainability 3, no. 3 (September 30, 2019): 75–79. http://dx.doi.org/10.26554/ijems.2019.3.3.75-79.

Abstract:
This study aims to analyze the implementation of occupational safety and health (K3) in the coal mining company PT XYZ and to analyze and identify the factors that influence that implementation. The research takes a descriptive approach combining qualitative and quantitative methods. Primary data were obtained from respondents through field surveys involving direct observation and interviews in the company; secondary data were obtained from PT XYZ and from literature studies, then collected and compiled according to the problem of this study. Data were analyzed through text analysis and data interpretation, and factor analysis in SPSS was used to analyze the factors that influence the implementation of occupational safety and health in the coal mining company PT XYZ.
38

Gunawan, Gunawan. "DATA MINING USING CRISP-DM PROCESS FRAMEWORK ON OFFICIAL STATISTICS: A CASE STUDY OF EAST JAVA PROVINCE." Jurnal Ekonomi dan Pembangunan 29, no. 2 (December 31, 2021): 183–98. http://dx.doi.org/10.14203/jep.29.2.2021.183-198.

Abstract:
Data mining on official statistics becomes a study interest, as it offers an opportunity to reveal hidden patterns within the data. This study investigates the data mining process's appropriateness using the CRISP-DM method to a secondary-quantitative data analysis and to investigate hidden information revealed from data mining on official statistics. Data is collected from the East Java BPS website, and the unit of analysis is regency/municipality. Five macro development indicators (Human Development Index, Gross Regional Domestic Products, poverty rate, Gini Ratio, open unemployment rate) are selected as analysis variables. Workflows of data analysis are designed using Knime software. This study shows the usefulness of the CRISP-DM method for secondary research because it specifies standardized stages for analyzing secondary data and improves the secondary analysis rigor. Furthermore, the clustering technique classifies regencies/municipalities into three clusters. One of the clusters has desirable indicator levels: high Human Development Index - high Gross Regional Domestic Products - low poverty rate, together with undesirable ones: high Gini Ratio - high open unemployment rate. This result indicates that a regency/municipality might not achieve an ideal condition of the five macro development indicators. Some indicators such as the open unemployment rate might be an inevitable impact. This research adds to the literature on development economics studies, particularly on the application of data mining, the CRISP-DM method, and Knime software to official statistics.
39

Hojin Nam, Minseo Rhee, Jeung-Sun Lee, Yong-Gyu Jung, Bumsu Kim. "Effective Diagnosis of Coronary Artery Disease using Case-based Reasoning." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 5 (April 11, 2021): 449–57. http://dx.doi.org/10.17762/turcomat.v12i5.991.

Abstract:
With the advent of big data, data mining is increasingly used in various decision-making fields to extract hidden and meaningful information from large amounts of data. As demand for uncovering the hidden meaning behind data grows exponentially, it becomes more and more important to decide which data mining algorithm to select and how to use it. Several data mining algorithms are mainly used in biology and clinics: logistic regression, neural networks, support vector machines, and a variety of statistical techniques. Among them, case-based reasoning (CBR) seems relatively simplistic but is very powerful for disclosing unseen problems in complex environments, using only a single simple technique for the prediction of nonlinear models. Meanwhile, physical activity is diminishing while lifestyles involving drinking, smoking and Western eating habits are changing, and thus hidden risks such as heart attack or angina are growing. Therefore, as the number of patients suffering from heart disease increases, a number of data mining studies are under way to assist medical doctors by predicting whether to perform coronary angiography, which requires considerable resources in cost and procedures. Our study uses the same datasets on heart disease patients, which combine multiple datasets collected from Cleveland, Hungary, Long Beach and Switzerland. Unlike the approach of , we observed that the experimental dataset is composed of multiple populations, which are similar in covering patients with the same kinds of disease but differ in the time and area of investigation. In the experimental results, CBR performed better than the techniques proposed in the original study for disease prediction. Consequently, we conclude that effective diagnosis prediction must be accompanied by selection of a data mining technique that considers the characteristics of the samples and data collection.
40

Yan, Yueguan, Ming Li, Jibo Liu, Weitao Yan, Jinman Zhang, and Bang Zhou. "Ground Subsidence Evolution from 1000 m Deep Mining: A Case Study in Fengfeng Mining Area." Shock and Vibration 2021 (September 27, 2021): 1–9. http://dx.doi.org/10.1155/2021/9942968.

Abstract:
The mining of coal resources in eastern China has entered the stage of deep mining, and many mines have reached the depth of 1000 meters. Different from shallow and moderate depth mining, the temporal and spatial evolution regulation of surface movement and deformation under deep mining has its particularity. Combining with the geological and mining conditions of Fengfeng mining area, this paper systematically studies the characteristics of surface movement under the condition of shallow, moderate, and near kilometer mining depth. By means of field measurement, InSAR monitoring, we get the subsidence data under different mining depth and get the relevant subsidence parameters by inversion. Through comparative analysis, the special law of subsidence under the mining depth of 1000 meters is obtained. The results show that under the condition of nearly 1000 meters mining depth, the surface movement and deformation have the characteristics of large displacement angle, small displacement deformation value, and large main influence radius. The regulation of small proportion of active period of maximum subsidence point, gentle shape of surface movement basin, and low mining adequacy are obtained. The research results provide technical references for deep mining under buildings, railways, and water bodies and provide basis and reference for scientific mining and safe recovery of coal pillars in kilometer deep mine.
41

Conrads, Paul, and Edwin A. Roehl. "Transforming Large Databases Into Critical Knowledge Using Data Mining– Three Case Studies in South Carolina and Georgia." Proceedings of the Water Environment Federation 2006, no. 6 (January 1, 2006): 6248–67. http://dx.doi.org/10.2175/193864706783775775.

42

Alsukhni, Emad, Ahmad A. Saifan, and Hanadi Alawneh. "A New Data Mining-Based Framework to Test Case Prioritization Using Software Defect Prediction." International Journal of Open Source Software and Processes 8, no. 1 (January 2017): 21–41. http://dx.doi.org/10.4018/ijossp.2017010102.

Abstract:
Test cases do not have the same importance when used to detect faults in software; therefore, it is more efficient to test the system with the test cases that have the ability to detect the faults. This research proposes a new framework that combines data mining techniques to prioritize the test cases. It enhances fault prediction and detection using two different techniques: 1) the data mining regression classifier that depends on software metrics to predict defective modules, and 2) the k-means clustering technique that is used to select and prioritize test cases to identify the fault early. Our approach of test case prioritization yields good results in comparison with other studies. The authors used the Average Percentage of Faults Detection (APFD) metric to evaluate the proposed framework, which results in 19.9% for all system modules and 25.7% for defective ones. Our results give us an indication that it is effective to start the testing process with the most defective modules instead of testing all modules arbitrarily.
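For readers unfamiliar with the APFD metric named in this abstract, the sketch below computes it using the commonly cited definition APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the 1-based position of the first test that reveals fault i. The function name and input format are assumptions for illustration; the abstract does not say how the authors implemented the calculation.

```python
def apfd(first_detect_positions, num_tests):
    """Average Percentage of Faults Detected for one test-case ordering.

    first_detect_positions: for each fault, the 1-based position in the
        ordering of the first test case that detects it.
    num_tests: total number of test cases in the ordering.
    """
    num_faults = len(first_detect_positions)
    if num_faults == 0 or num_tests <= 0:
        raise ValueError("need at least one fault and one test case")
    return (
        1
        - sum(first_detect_positions) / (num_tests * num_faults)
        + 1 / (2 * num_tests)
    )


# Five tests, three faults. An ordering that finds faults early scores higher.
early = apfd([1, 1, 2], num_tests=5)
late = apfd([3, 4, 5], num_tests=5)
print(round(early, 3), round(late, 3))
```

A higher value means faults are revealed earlier in the ordering, which is what a prioritization technique tries to achieve.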
43

Bonidia, Robson P., Luiz A. L. Rodrigues, Anderson P. Avila-Santos, Danilo S. Sanches, and Jacques D. Brancher. "Computational Intelligence in Sports: A Systematic Literature Review." Advances in Human-Computer Interaction 2018 (October 30, 2018): 1–13. http://dx.doi.org/10.1155/2018/3426178.

Abstract:
Recently, data mining studies have been successfully conducted to estimate several parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and society as a whole, due to the large amount of data available and the imminent need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, which has shown slight growth in recent years; consequently, many sports organizations have begun to see that there is a wealth of unexplored knowledge in the data they collect. Therefore, this article presents a systematic review of sports data mining. For the years 2010 to 2018, 31 studies were found on this topic. Based on these studies, we present the current panorama, themes, the databases used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the potential of sports data mining, besides motivating the scientific community to explore this timely and interesting topic.
44

Gil, Jorge, José Nuno Beirão, Nuno Montenegro, and José Pinto Duarte. "On the discovery of urban typologies: data mining the many dimensions of urban form." Urban Morphology 16, no. 1 (June 21, 2011): 27–40. http://dx.doi.org/10.51347/jum.v16i1.3966.

Abstract:
The use of typomorphology as a means of understanding urban areas has a long tradition amongst academics but the reach of these methods into urban design practice has been limited. In this paper we present a method to support the description and prescription of urban form that is context-sensitive, multi-dimensional, systematic, exploratory, and quantitative, thus facilitating the application of urban typomorphology to planning practice. At the core of the proposed method is the k-means statistical clustering technique to produce objective classifications from the large complex data sets typical of urban environments. Block and street types were studied as a test case and a context-sensitive sample of types that correspond to two different neighbourhoods was identified. This method is suitable to support the identification, understanding and description of emerging urban forms that do not fall into standard classifications. The method can support larger urban form studies through consistent application of the procedures to different sites. The quantitative nature of its output lends itself to integration with other systematic procedures related to the research, analysis, planning and design of urban areas.
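The k-means technique at the core of this method alternates between assigning each item to its nearest centroid and moving each centroid to the mean of its members. The following is a minimal pure-Python sketch of that loop, not the authors' implementation: the two features per block (a density value and a street-width ratio) and the sample values are hypothetical, and taking the first k points as initial centroids is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means on a list of equal-length numeric tuples.

    Initial centroids are simply the first k points (a simplification;
    real studies typically use random or k-means++ initialization).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical block features: (built density, street-width ratio)
blocks = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80), (0.85, 0.75)]
labels, centroids = kmeans(blocks, k=2)
```

On this toy input the two low-density blocks end up in one cluster and the two high-density blocks in the other; with real urban data the features would need scaling before distances are comparable.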
45

Gusmadi, Setiawan, and Samsuri Samsuri. "Gerakan Kewarganegaraan Ekologis sebagai upaya Pembentukan Karakter Peduli Lingkungan." Jurnal Ilmiah Pendidikan Pancasila dan Kewarganegaraan 4, no. 2 (January 6, 2020): 381. http://dx.doi.org/10.17977/um019v4i2p381-391.

Abstract:
This article discusses the forms of activity of the ecological citizenship movement and the formation of environmentally caring character. The study uses a qualitative approach with a case study. Data were collected through observation, interviews, and document review. The ecological citizenship movement is carried out through the post-mining reclamation movement, the mangrove planting movement, the resistance movement, and the waste care action movement. Efforts to build environmentally caring character are made through environmental education for the community and for students in schools, clear law enforcement for mining companies, and social media campaigns about environmental conditions.
46

Kazemzadeh, Reza S., Kamran Sartipi, and Priya Jayaratna. "A Framework for Data and Mined Knowledge Interoperability in Clinical Decision Support Systems." International Journal of Healthcare Information Systems and Informatics 5, no. 1 (January 2010): 37–60. http://dx.doi.org/10.4018/jhisi.2010110303.

Abstract:
Due to reliance on human knowledge, the practice of medicine is subject to errors that endanger patients’ health and cause substantial financial loss to healthcare institutions. Computer-based decision support systems assist healthcare personnel to improve quality of clinical practice. Currently, most clinical guideline modeling languages represent decision-making knowledge in terms of basic logical expressions. In this paper, we focus on encoding, sharing, and using results of data mining analyses to influence decision making within Clinical Decision Support Systems. A knowledge management framework is proposed that addresses the issues of data and knowledge interoperability by adopting healthcare and data mining modeling standards. In a further step, data mining results are incorporated into a guideline-based decision support system. A prototype tool has been developed to provide an environment for clinical guideline authoring and execution. Also, three real world case studies have been presented, one of which is used as a running example throughout the paper.
47

Abasova, Jela, Pavol Tanuska, and Stefan Rydzi. "Big Data—Knowledge Discovery in Production Industry Data Storages—Implementation of Best Practices." Applied Sciences 11, no. 16 (August 20, 2021): 7648. http://dx.doi.org/10.3390/app11167648.

Abstract:
The CRISP-DM (cross-industry standard process for data mining) methodology was developed as an intuitive tool for data scientists, to help them apply Big Data methods in the complex technological environment of Industry 4.0. A review of numerous recent papers and studies showed that most papers focus on applying existing methods in case studies, summarizing existing knowledge, or developing new methods for a certain kind of problem. Although all of these types of research are productive and required, we identified a lack of complex best practices for a specific field. Therefore, our goal is to propose best practices for data analysis in the production industry. Our proposal rests on three main points: the CRISP-DM methodology as the theoretical framework, the literature overview as an expression of current needs and interests in the field of data analysis, and case studies of projects we were directly involved in as a source of real-world experience. The results are presented as lists of the most common problems for selected phases (‘Data Preparation’ and ‘Modelling’), proposals of possible solutions, and diagrams for these phases. These recommendations can help other data scientists avoid certain problems or choose the best way to approach them.
48

Habibi, Reza. "Application of Predictive Methods to Financial Data Sets." Financial Internet Quarterly 17, no. 1 (March 1, 2021): 50–61. http://dx.doi.org/10.2478/fiqf-2021-0006.

Abstract:
Financial data sets are growing rapidly and need to be analyzed. Data science offers many different techniques for storing, summarizing, mining, simulating, and finally analyzing them. Among data science methods, predictive methods play a critical role in analyzing financial data sets. In the current paper, applications of 22 methods, classified into four categories (data mining and machine learning, numerical analysis, operations research techniques, and meta-heuristic techniques), to financial data sets are studied. To this end, literature reviews on these methods are given first. For each method, a data analysis case (as an illustrative example) is presented and the problem is analyzed with that method. An actual case is given to apply those methods to solve the problem and to choose a better one. Finally, a conclusion section is proposed.
49

Nwobodo, Tonia Nkiru, and Bright Emeka Ogbuene. "Effects of sand mining on land use/land cover on river environment in developing countries: A case study of Ava River in Enugu State, Nigeria." IKENGA International Journal of Institute of African Studies 22, no. 3 (September 1, 2021): 1–25. http://dx.doi.org/10.53836/ijia/2021/22/3/003.

Abstract:
Sand mining contributes immensely to economic development. However, when carried out in a river environment, this activity can affect the land use and land cover of the area. The study objectives include mapping, quantifying, and assessing the land use/land cover (LULC) changes of the Ava River from 2007 to 2019 and projecting them from 2020 to 2025 and 2031. The paper discusses pre-existing LULC maps from the past (2007, 2013 and 2019), present (2020-2025) and near future (2026-2031). The study used Geographical Information System (GIS) and remote sensing data to estimate the changes in LULC of the study area in the various periods. The images were classified using a supervised classifier, yielding three LULC maps of the Ava River environment. The classification grouped the area into six main LULC types. The results showed no detectable change in the built-up area from 2007 to 2013, but an increase of 26.15% in 2019. The mining area increased by 8.19% from 2007 to 2019. The riverbank also increased by 12.81% from 2007 to 2019. The correlation analysis showed a positive relationship between the built-up area and sand mining, as well as the riverbank morphology. In 2019, sand mining activities in the Ava River site covered an area of approximately 389325.60 m², and it was predicted that in 2025 and 2031 the affected area would increase to 485397.12 m² and 611753.52 m² respectively. The study reveals that sand mining activities in the Ava River environment are causing the riverbank to widen. If not controlled, this may adversely affect buildings erected very close to the riverbank in the near future. The study showed significant detected change across the periods. These detected changes would serve as a scientific basis upon which decision-makers can design policy guidelines on sand mining, river environment protection, conservation and management in developing countries.
50

Kurniawan, Yogiek Indra. "Perbandingan Algoritma Naive Bayes dan C.45 dalam Klasifikasi Data Mining." Jurnal Teknologi Informasi dan Ilmu Komputer 5, no. 4 (October 1, 2018): 455. http://dx.doi.org/10.25126/jtiik.201854803.

Abstract:
In this paper, the Naive Bayes and C.45 methods are applied to four case studies: eligibility for the “Kartu Indonesia Sehat”, credit card application decisions at a bank, determination of birth age, and the eligibility of prospective credit members at a cooperative, in order to find the best algorithm in each case. The two algorithms are then compared in terms of precision, recall, and accuracy for each set of training and testing data provided. As part of the implementation, an application was built that applies the Naive Bayes and C.45 algorithms to the four cases. The application was verified with black-box and algorithm testing, with valid results, and implements both algorithms correctly. The test results show that precision, recall, and accuracy increase as more training data are used. The classification results also show that neither Naive Bayes nor C.45 is absolutely better in every case. For determining eligibility for the Kartu Indonesia Sehat, the two algorithms are equally effective. For credit card applications at a bank, C.45 is better than Naive Bayes. For determining birth age, Naive Bayes is better than C.45. For determining the eligibility of prospective credit members at the cooperative, Naive Bayes gives better precision, but C.45 gives better recall and accuracy. Thus, choosing the best algorithm for a given case requires looking at the criteria, the variables, and the amount of data in that case.
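The comparison in this abstract rests on precision, recall, and accuracy, which can all be read off the four cells of a binary confusion matrix. The sketch below shows the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and accuracy = (TP + TN) / total. The function name and the label vectors are hypothetical; they are not the paper's data, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for binary labels.

    y_true / y_pred are equal-length sequences of labels; `positive`
    marks the class of interest. A ratio with a zero denominator is
    reported as 0.0 (a convention chosen for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Illustrative labels only (1 = eligible, 0 = not eligible)
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, accuracy = binary_metrics(actual, predicted)
# precision = 0.75, recall = 0.75, accuracy = 0.8
```

Because the three numbers can rank two classifiers differently, as in the cooperative case reported above, it helps to report all of them rather than accuracy alone.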