Journal articles on the topic 'Transactional datasets'


Consult the top 50 journal articles for your research on the topic 'Transactional datasets.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1. AL Bouna, Bechara, Chris Clifton, and Qutaibah Malluhi. "Anonymizing transactional datasets." Journal of Computer Security 23, no. 1 (March 15, 2015): 89–106. http://dx.doi.org/10.3233/jcs-140517.

2. Puri, Vartika, Parmeet Kaur, and Shelly Sachdeva. "ADT." International Journal of Information Security and Privacy 15, no. 3 (July 2021): 83–105. http://dx.doi.org/10.4018/ijisp.2021070106.

Abstract:
Data anonymization is commonly utilized for the protection of an individual's identity when his personal or sensitive data is published. A well-known anonymization model to define the privacy of transactional data is the k^m-anonymity model. This model ensures that an adversary who knows up to m items of an individual cannot determine which record in the dataset corresponds to the individual with a probability greater than 1/k. However, the existing techniques generally rely on the presence of similarity between items in the dataset tuples to achieve k^m-anonymization and are not suitable when transactional data contains tuples without many common values. The authors refer to this type of transactional data as diverse transactional data and propose an algorithm, anonymization of diverse transactional data (ADT). ADT is based on slicing and generalization to achieve k^m-anonymity for diverse transactional data. ADT has been experimentally evaluated on two datasets, and it has been found that ADT yields higher privacy protection and causes a lower loss in data utility as compared to existing methods.
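
For small datasets, the 1/k guarantee described in this abstract can be checked by brute force. The sketch below is a minimal illustration of verifying k^m-anonymity, not the ADT algorithm itself (which relies on slicing and generalization); the function name and toy data are ours.

```python
from itertools import combinations
from collections import Counter

def is_km_anonymous(records, k, m):
    """Check the k^m-anonymity guarantee: every combination of up to m
    items must appear in either zero or at least k records, so an
    adversary knowing m items cannot pin down a record with
    probability greater than 1/k."""
    support = Counter()
    for record in records:
        items = sorted(set(record))
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return all(count >= k for count in support.values())

# Toy transactional dataset: each record is a set of purchased items.
data = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "beer"}]
print(is_km_anonymous(data, k=2, m=2))  # False: "beer" appears in only one record
```
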
3. Vu, Duc Thi, and Huy Duc Nguyen. "Mining High Utility Itemsets in Massive Transactional Datasets." Acta Cybernetica 20, no. 2 (2011): 331–46. http://dx.doi.org/10.14232/actacyb.20.2.2011.6.

4. Liu, Xiangwen, Xia Feng, and Yuquan Zhu. "Transactional Data Anonymization for Privacy and Information Preservation via Disassociation and Local Suppression." Symmetry 14, no. 3 (February 25, 2022): 472. http://dx.doi.org/10.3390/sym14030472.

Abstract:
Ubiquitous devices in IoT-based environments create a large amount of transactional data on daily personal behaviors. Releasing these data across various platforms and applications for data mining can create tremendous opportunities for knowledge-based decision making. However, solid guarantees on the risk of re-identification are required to make these data broadly available. Disassociation is a popular method for transactional data anonymization against re-identification attacks in privacy-preserving data publishing. The disassociation anonymization algorithm runs in parallel, making it suitable for the asymmetric parallel data processing common in IoT, where nodes have limited computation power and storage space. However, the algorithm relies on a global recoding mode to achieve k^m-anonymization of transactional data, which loses combinations of items in transactional datasets and thus decreases the data quality of the published transactions. To address this issue, we propose a novel vertical partition strategy in this paper. By employing local suppression and global partition, we first eliminate the itemsets that violate k^m-anonymity to construct the first k^m-anonymous record chunk. Then, through itemset creation and reduction, we recombine the globally partitioned items from the first record chunk to construct the remaining k^m-anonymous record chunks. The experiments illustrate that our scheme retains more associations between items in the dataset, which improves the utility of the published data.

5. Tang, Huijun, Le Wang, Yangguang Liu, and Jiangbo Qian. "Discovering Approximate and Significant High-Utility Patterns from Transactional Datasets." Journal of Mathematics 2022 (November 16, 2022): 1–17. http://dx.doi.org/10.1155/2022/6975130.

Abstract:
Mining high-utility patterns (HUPs) on transactional datasets has been widely discussed, and various algorithms have been introduced to settle this problem. However, the time-space efficiency of these algorithms is still limited, and the mining system cannot provide timely feedback on relevant information. In addition, when mining HUPs from taxonomy transactional datasets, a large portion of the quantitative results are just accidental responses to the user-defined utility constraints, and they may have no statistical significance. To address these two problems, we propose two corresponding approaches named Sampling HUP-Miner and Significant HUP-Miner. Sampling HUP-Miner derives a sample size for a transactional dataset based on a theoretical guarantee; the mining results based on such a sample size can be an effective approximation of the results on the whole dataset. Significant HUP-Miner proposes the concept of testable support, so that significant HUPs can be identified in a timely manner under the testable-support constraint. Experiments show that the two designed algorithms can discover approximate and significant HUPs smoothly and perform well in terms of runtime, pattern numbers, memory usage, and average utility.
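
The utility measure that HUP mining builds on is the sum, over the transactions containing an itemset, of each item's quantity times its unit profit. As a hedged illustration (names and data are hypothetical; this is not the Sampling or Significant HUP-Miner algorithm), a direct computation looks like this:

```python
def itemset_utility(itemset, transactions, unit_profit):
    """Utility of an itemset: sum, over every transaction containing all
    of its items, of quantity * unit profit for each item in the itemset."""
    total = 0
    for txn in transactions:  # txn maps item -> purchased quantity
        if all(item in txn for item in itemset):
            total += sum(txn[item] * unit_profit[item] for item in itemset)
    return total

transactions = [{"a": 2, "b": 1}, {"a": 1, "c": 4}, {"a": 3, "b": 2}]
unit_profit = {"a": 5, "b": 12, "c": 1}
# {a, b} occurs in two transactions: (2*5 + 1*12) + (3*5 + 2*12) = 61
print(itemset_utility({"a", "b"}, transactions, unit_profit))  # 61
```
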
6. Puri, Vartika, Parmeet Kaur, and Shelly Sachdeva. "Effective Removal of Privacy Breaches in Disassociated Transactional Datasets." Arabian Journal for Science and Engineering 45, no. 4 (January 28, 2020): 3257–72. http://dx.doi.org/10.1007/s13369-020-04353-5.

7. Al-Bana, Mohamed Reda, Marwa Salah Farhan, and Nermin Abdelhakim Othman. "An Efficient Spark-Based Hybrid Frequent Itemset Mining Algorithm for Big Data." Data 7, no. 1 (January 14, 2022): 11. http://dx.doi.org/10.3390/data7010011.

Abstract:
Frequent itemset mining (FIM) is a common approach for discovering hidden frequent patterns from transactional databases, used in prediction, association rules, classification, etc. Apriori is an elementary, iterative FIM algorithm that scans the dataset multiple times to generate frequent itemsets of different cardinalities, so its performance degrades as data grows because of the repeated scans. Eclat is a scalable variant of Apriori that utilizes a vertical layout. The vertical layout has many advantages: it avoids multiple dataset scans and carries the information needed to compute each itemset's support. In a vertical layout, itemset support can be obtained by intersecting transaction-id sets (tidsets) and pruning irrelevant itemsets. However, when tidsets become too big for memory, algorithm efficiency suffers. In this paper, we introduce SHFIM (Spark-based hybrid frequent itemset mining), a three-phase algorithm that utilizes both horizontal and vertical layouts and uses diffsets instead of tidsets to keep track of the differences between transaction ids rather than their intersections. Moreover, some improvements are developed to decrease the number of candidate itemsets. SHFIM is implemented and tested over the Spark framework, which exploits the RDD (resilient distributed datasets) concept and in-memory processing to tackle the shortcomings of the MapReduce framework. We compared SHFIM's performance with Spark-based Eclat and dEclat algorithms on four benchmark datasets. Experimental results show that SHFIM outperforms the Spark-based Eclat and dEclat algorithms on both dense and sparse datasets in terms of execution time.
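
The tidset-versus-diffset distinction the abstract describes can be shown in a few lines. The sketch below is our own minimal illustration of the vertical layout, support by tidset intersection, and the diffset identity support({a,b}) = support({a}) - |diffset|; it is not the SHFIM algorithm.

```python
def vertical_layout(transactions):
    """Map each item to its tidset: the set of transaction ids containing it."""
    tidsets = {}
    for tid, txn in enumerate(transactions):
        for item in txn:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
tids = vertical_layout(transactions)

# Support of {a, b} via tidset intersection:
support_ab = len(tids["a"] & tids["b"])  # 2

# Diffset of {a, b} w.r.t. {a}: tids containing a but not b.
# support({a, b}) = support({a}) - |diffset|, so only the (often much
# smaller) difference needs to be stored.
diffset_ab = tids["a"] - tids["b"]
print(support_ab, len(tids["a"]) - len(diffset_ab))  # 2 2
```
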
8. R., Sujatha, and Dr S. Ravichandran. "MAX-MiBit-An Algorithm To Discover Maximal Frequent Itemsets From Large Transactional Datasets." International Journal of Research in Advent Technology 7, no. 4 (April 10, 2019): 326–29. http://dx.doi.org/10.32622/ijrat.742019122.

9. Yan, Hua, Keke Chen, Ling Liu, and Joonsoo Bae. "Determining the best K for clustering transactional datasets: A coverage density-based approach." Data & Knowledge Engineering 68, no. 1 (January 2009): 28–48. http://dx.doi.org/10.1016/j.datak.2008.08.005.

10. Kahanda, Indika, and Jennifer Neville. "Using Transactional Information to Predict Link Strength in Online Social Networks." Proceedings of the International AAAI Conference on Web and Social Media 3, no. 1 (March 19, 2009): 74–81. http://dx.doi.org/10.1609/icwsm.v3i1.13957.

Abstract:
Many scientific fields analyzing and modeling social networks have focused on manually-collected datasets where the friendship links are sparse (due to the costs of collection) but relatively noise-free (i.e., they indicate strong relationships). In online social networks, where the notion of "friendship" is broader than what would generally be considered in sociological studies, the friendship links are denser but the links contain noisier information (i.e., some weaker relationships). However, the networks also contain additional transactional events among entities (e.g., communication, file transfers) that can be used to infer the true underlying social network. With this aim in mind, we develop a supervised learning approach to predict link strength from transactional information. We formulate this as a link prediction task and compare the utility of attribute-based, topological, and transactional features. We evaluate our approach on public data from the Purdue Facebook network and show that we can accurately predict strong relationships. Moreover, we show that transactional-network features are the most influential features for this task.

11. Alsukhni, Emad, Ahmed AlEroud, and Ahmad A. Saifan. "A Hybrid Pre-Post Constraint-Based Framework for Discovering Multi-Dimensional Association Rules Using Ontologies." International Journal of Information Technology and Web Engineering 14, no. 1 (January 2019): 112–31. http://dx.doi.org/10.4018/ijitwe.2019010106.

Abstract:
Association rule mining is a very useful knowledge discovery technique for identifying co-occurrence patterns in transactional data sets. In this article, the authors propose an ontology-based framework to discover multi-dimensional association rules at different levels of a given ontology, based on user-defined pre-processing constraints, which may be identified using (1) a hierarchy discovered in the datasets, (2) the dimensions of those datasets, or (3) the features of each dimension. The proposed framework has post-processing constraints to drill down or roll up based on the rule level, making it possible to check the validity of the discovered rules in terms of the support and confidence validity measures without re-applying association rule mining algorithms. The authors conducted several preliminary experiments to test the framework on the Titanic dataset by identifying the association rules after pre- and post-constraints are applied. The results show that the framework can be practically applied for rule pruning and for discovering novel association rules.
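
The support and confidence validity measures referred to above are the standard ones. A minimal sketch (with illustrative Titanic-style items, not the authors' framework):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for txn in transactions if itemset <= txn) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the consequent holds in transactions matching the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

txns = [{"class=1st", "survived"}, {"class=3rd"},
        {"class=1st", "survived"}, {"class=1st"}]
print(support({"class=1st"}, txns))                   # 0.75
print(confidence({"class=1st"}, {"survived"}, txns))  # 0.666...
```
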
12. Yamaguchi, Takehiro, and Ayahiko Niimi. "Extraction of Community Transition Rules from Data Streams as Large Graph Sequence." Journal of Advanced Computational Intelligence and Intelligent Informatics 15, no. 8 (October 20, 2011): 1073–81. http://dx.doi.org/10.20965/jaciii.2011.p1073.

Abstract:
In this study, we treat transactional sets of data streams as a graph sequence. This graph sequence represents both the relational structures of data for each period and changes in these structures. In addition, we analyze changes in a community in this graph sequence. Our proposed algorithm extracts community transition rules to detect communities that appear irregularly in a graph sequence using our proposed method combined with adaptive graph kernels and hierarchical clustering. In experiments using synthetic datasets and social bookmark datasets, we demonstrate that our proposed algorithm detects changes in a community appearing irregularly.
13. Chen, Liping, Weitao Ha, and Guojun Zhang. "Reliable Execution Based on CPN and Skyline Optimization for Web Service Composition." Scientific World Journal 2013 (2013): 1–10. http://dx.doi.org/10.1155/2013/729769.

Abstract:
With the development of SOA, complex problems can be solved by combining available individual services and ordering them to best suit the user's requirements. Web service composition is widely used in business environments. Because component web services are inherently autonomous and heterogeneous, it is difficult to predict the behavior of the overall composite service. Therefore, transactional properties and non-functional quality of service (QoS) properties are crucial for selecting the web services to take part in the composition. Transactional properties ensure the reliability of the composite web service, and QoS properties identify the best candidate web services from a set of functionally equivalent services. In this paper we define a Colored Petri Net (CPN) model which involves transactional properties of web services in the composition process. To ensure reliable and correct execution, unfolding processes of the CPN are followed. The execution of the transactional composite web service (TCWS) is formalized by CPN properties. To identify the services with the best QoS properties from the candidate service sets formed in the TCSW-CPN, we use skyline computation to retrieve dominant web services. This avoids the significant information loss that results from reducing individual scores to an overall similarity score. We evaluate our approach experimentally using both real and synthetically generated datasets.

14. Joseph, Jismy, and Kesavaraj G. "Evaluation of Frequent Itemset Mining Algorithms-Apriori and FP Growth." International Journal of Engineering Technology and Management Sciences 4, no. 6 (September 28, 2020): 1–4. http://dx.doi.org/10.46647/ijetms.2020.v04i06.001.

Abstract:
Frequent itemset mining (FIM) is an essential task for retrieving frequently occurring patterns, correlations, events, or associations in a transactional database. Understanding such frequent patterns helps in making substantial decisions in decisive situations. Multiple algorithms have been proposed for finding such patterns; however, their time and space complexity rapidly increases with the number of items in a dataset. It is therefore necessary to analyze the efficiency of these algorithms on different datasets. The aim of this paper is to evaluate the performance of the frequent itemset mining algorithms Apriori and Frequent Pattern (FP) growth by comparing their features. This study shows that the FP-growth algorithm is more efficient than the Apriori algorithm for generating rules and mining frequent patterns.
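
For a quick reproduction of this kind of comparison, the open-source mlxtend library ships both algorithms behind the same interface. The benchmark below is a hedged sketch: the toy dataset and support threshold are ours, and timings will vary.

```python
import time
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

dataset = [["milk", "bread"], ["bread", "butter"],
           ["milk", "bread", "butter"], ["bread"]] * 1000

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(dataset).transform(dataset), columns=te.columns_)

# Run both miners with identical thresholds and time them.
for algo in (apriori, fpgrowth):
    start = time.perf_counter()
    patterns = algo(df, min_support=0.25, use_colnames=True)
    print(algo.__name__, len(patterns), f"{time.perf_counter() - start:.3f}s")
```
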
15. Anantharaman, Padmanathan, and H. V. Ramakrishan. "Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools." APTIKOM Journal on Computer Science and Information Technologies 2, no. 2 (July 1, 2017): 57–62. http://dx.doi.org/10.11591/aptikom.j.csit.103.

Abstract:
As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, forcing IT organizations into costly upgrades of expensive databases and data warehouse hardware appliances. At the same time, an enormous amount of data is being generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; such data is termed Big Data, with its own characteristics and challenges. Frequent itemset mining algorithms aim to disclose frequent itemsets from transactional databases, but as dataset size increases, traditional frequent itemset mining cannot handle it. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost that reduces execution efficiency. This paper proposes a new k-means pre-processing technique applied to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: clustering with the k-means algorithm to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means clustering algorithm before the BigFIM algorithm as a pre-processing technique.

16. Srinivas, UMohan, Ch Anuradha, and Dr P. Sri Rama Chandra Murty. "Hash based Approach for Mining Frequent Item Sets from Transactional Databases." International Journal of Engineering & Technology 7, no. 3.34 (September 1, 2018): 309. http://dx.doi.org/10.14419/ijet.v7i3.34.19214.

Abstract:
Frequent itemset mining has become popular for extracting hidden patterns from transactional databases. Among the several approaches, the Apriori algorithm is a basic approach that follows a candidate generate-and-test strategy. Although it is an efficient level-wise approach, it has two limitations: (i) several passes are required to check the support of candidate itemsets, and (ii) it is sensitive to large numbers of candidate itemsets and to variations in the minimum threshold. A novel approach is proposed to tackle these limitations. The proposed approach is a one-pass hash-based frequent itemset mining (HFIM) method to derive frequent patterns. HFIM maintains candidate itemsets dynamically, independent of the minimum threshold, a feature that allows the number of scans over the database to be limited to one. In this paper, HFIM is compared with Apriori to show its performance on standard datasets. The results section shows that HFIM outperforms Apriori over large databases.
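
The abstract does not spell out HFIM's hashing scheme, so the sketch below shows the generic idea behind hash-based, single-scan candidate pruning (in the style of PCY-type methods, not HFIM itself): hash every pair into a bucket counter during one scan, then discard pairs whose bucket count cannot meet the support threshold. Names and parameters are illustrative.

```python
from itertools import combinations

def one_pass_pair_counts(transactions, num_buckets=1024):
    """Single scan: hash every item pair into a bucket counter, so pairs
    whose bucket count is below the threshold can be pruned without a
    second full scan."""
    buckets = [0] * num_buckets
    for txn in transactions:
        for pair in combinations(sorted(txn), 2):
            buckets[hash(pair) % num_buckets] += 1
    return buckets

txns = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
buckets = one_pass_pair_counts(txns)
candidate = ("a", "b")
# A pair can only be frequent if its bucket count reaches the threshold.
print(buckets[hash(candidate) % 1024] >= 2)
```
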
17. Jasek, Pavel, Lenka Vrana, Lucie Sperkova, Zdenek Smutny, and Marek Kobulsky. "Comparative Analysis of Selected Probabilistic Customer Lifetime Value Models in Online Shopping." Journal of Business Economics and Management 20, no. 3 (April 5, 2019): 398–423. http://dx.doi.org/10.3846/jbem.2019.9597.

Abstract:
The selection of a suitable customer lifetime value (CLV) model is a key issue for companies that are introducing a CLV managerial approach in their online B2C relationship stores. The online retail environment imposes several specific assumptions on CLV models, e.g., non-contractual relationships, continuous purchases at any time, and a variable-spending environment. The article focuses on empirical statistical analysis and the predictive abilities of selected probabilistic CLV models, which show very good results in an online retail environment compared to different model families. Eleven CLV models were selected for comparison. The comparison was made on datasets from online stores in Central and Eastern Europe with annual revenues of hundreds of millions of euros and almost 2.3 million customers. Probabilistic models achieved overall good and consistent results on the majority of the studied transactional datasets, with the BG/NBD and Pareto/NBD models being considered stable, with significant lifts over the baseline status quo model. Abe's variant of Pareto/NBD underperformed on multiple criteria and would not be fully useful for the studied datasets without further improvements. In the end, the authors discuss the deployment implications of selected CLV models and propose further issues for future research to address.
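
The BG/NBD model mentioned above is implemented in the open-source lifetimes package. A hedged sketch of fitting it on the package's bundled CDNOW summary data (the penalizer value and prediction horizon are illustrative, not the paper's configuration):

```python
from lifetimes import BetaGeoFitter
from lifetimes.datasets import load_cdnow_summary

# Per-customer summary: purchase frequency, recency, and observation age T.
data = load_cdnow_summary(index_col=[0])

bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(data["frequency"], data["recency"], data["T"])

# Expected number of purchases per customer over the next 30 periods.
data["pred_30"] = bgf.conditional_expected_number_of_purchases_up_to_time(
    30, data["frequency"], data["recency"], data["T"])
print(data.sort_values("pred_30", ascending=False).head())
```
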
18. Gupta, Priyanka, and Vinaya Sawant. "A Parallel Apriori Algorithm and FP-Growth Based on SPARK." ITM Web of Conferences 40 (2021): 03046. http://dx.doi.org/10.1051/itmconf/20214003046.

Abstract:
Frequent itemset mining is an important data mining task in real-world applications, and distributed parallel Apriori and FP-Growth are among the most important algorithms for finding frequent itemsets. Frequent itemset mining was originally parallelized with MapReduce on Hadoop. Hadoop was introduced to handle big data, but its implementation falls short of expectations for parallel distributed data mining algorithms because of the high I/O cost of its transactional disk operations. Research shows that Spark, with its in-memory computation technique, gives faster results than Hadoop and is well suited to parallel algorithms for handling data. In this paper, we propose parallel Apriori and FP-Growth algorithms for finding frequent itemsets on multiple datasets using the Apache Spark framework, and we measure the accuracy and computation time of the mining results for different support values.
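
Spark ships a parallel FP-Growth implementation in pyspark.ml.fpm, which is one way to run the kind of experiment described above. A minimal sketch (dataset and thresholds are illustrative, not the authors' setup):

```python
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("fpgrowth-demo").getOrCreate()
df = spark.createDataFrame(
    [(0, ["milk", "bread"]), (1, ["bread", "butter"]),
     (2, ["milk", "bread", "butter"]), (3, ["bread"])],
    ["id", "items"])

fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
model = fp.fit(df)
model.freqItemsets.show()       # itemsets meeting the support threshold
model.associationRules.show()   # rules meeting the confidence threshold
```
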
19. Prasad, K. Rajendra. "Optimized High-Utility Itemsets Mining for Effective Association Mining Paper." International Journal of Electrical and Computer Engineering (IJECE) 7, no. 5 (October 1, 2017): 2911. http://dx.doi.org/10.11591/ijece.v7i5.pp2911-2918.

Abstract:
Association rule mining is widely used for determining the frequent itemsets of a transactional database; however, the utility of itemsets also needs to be considered in market behavior applications. Apriori and FP-growth methods generate association rules without a utility factor for items. High-utility itemset mining (HUIM) is a well-known method that effectively determines itemsets based on a high utility value, and the resulting itemsets are known as high-utility itemsets. The fastest high-utility mining method (FHM) is an enhanced version of HUIM that reduces the number of join operations during itemset generation, so it is faster than HUIM. For large datasets, both methods are very expensive. The proposed method addresses this issue by building a pruning-based estimated utility co-occurrence structure (PEUCS) for the elimination of low-profit itemsets; because it thus processes only an optimal number of high-utility itemsets, it is called optimal FHM (OFHM). Experimental results show that OFHM requires less computational runtime and is therefore more efficient than other existing methods on large benchmark datasets.

20. Khattri, Vipin, and Sandeep Kumar Nayak. "Identification and Mitigation of Fraudulent Transaction using Deep Autoencoder." Journal of University of Shanghai for Science and Technology 23, no. 11 (November 28, 2021): 769–75. http://dx.doi.org/10.51201/jusst/21/11956.

Abstract:
Historically, the transaction of messages, treaty contents, monarchy schemes, policies, and the associated national or territorial currency relied on physical resources, which was highly time-consuming and offered negligible security. Over time, technological advancement has greatly improved conventional transactions, and the world now operates in a digital environment where security is a high priority. Researchers and the concerned authorities are responsible for protecting online digital transactions within a safe digital environment, so the security of transaction systems must be continuously enhanced to handle digital transaction fraud. This research study proposes a deep autoencoder approach for identifying fraudulent payment card transactions. To assess the outcome and validity of the proposed approach, testing was executed with the help of two datasets. The first is a publicly available real credit card fraud dataset; the second was generated by collecting payment card transaction data containing both genuine and fraudulent transactions. A comparative analysis against different methods was performed on the first dataset. The proposed approach performed exceptionally well, achieving the maximum performance with respect to the area under the receiver operating characteristic curve (AUC), at 95.66%.
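
The core mechanism of autoencoder-based fraud detection is reconstruction error: the network is trained to reproduce (mostly genuine) transactions, and inputs it reconstructs poorly are flagged. The sketch below illustrates that idea with an arbitrary architecture and threshold of our own choosing; it is not the paper's model.

```python
import numpy as np
import tensorflow as tf

n_features = 30
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(14, activation="relu"),   # encoder
    tf.keras.layers.Dense(7, activation="relu"),    # bottleneck
    tf.keras.layers.Dense(14, activation="relu"),   # decoder
    tf.keras.layers.Dense(n_features, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")

X_train = np.random.rand(1000, n_features)  # stand-in for genuine transactions
model.fit(X_train, X_train, epochs=5, batch_size=64, verbose=0)

X_new = np.random.rand(10, n_features)
errors = np.mean((X_new - model.predict(X_new, verbose=0)) ** 2, axis=1)
threshold = np.percentile(errors, 95)       # flag the worst reconstructions
print(np.where(errors > threshold)[0])      # indices of suspected frauds
```
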
21. Gayathiri, P., and B. Poorna. "Effective Gene Patterned Association Rule Hiding Algorithm for Privacy Preserving Data Mining on Transactional Database." Cybernetics and Information Technologies 17, no. 3 (September 1, 2017): 92–108. http://dx.doi.org/10.1515/cait-2017-0032.

Abstract:
Association rule hiding is a privacy-preserving data mining technique that sanitizes the original database by hiding sensitive association rules generated from the transactional database. The side effects of association rule hiding techniques are hiding certain rules that are not sensitive, failing to hide certain sensitive rules, and generating false rules in the resulting database. This affects the privacy of the data and the utility of the data mining results. In this paper, a method called Gene Patterned Association Rule Hiding (GPARH) is proposed for preserving the privacy of the data and maintaining data utility, based on a data perturbation technique. Using a gene selection operation, privacy-linked hidden and exposed data items are mapped to vector data items, thereby obtaining gene-based data items. The performance of the proposed GPARH is evaluated in terms of metrics such as the number of sensitive rules generated, the true positive privacy rate, and the execution time for selecting the sensitive rules, using the Abalone and Taxi Service Trajectory datasets.

22. Yousefinaghani, Samira, Rozita Dara, Zvonimir Poljak, Fei Song, and Shayan Sharif. "A framework for the risk prediction of avian influenza occurrence: An Indonesian case study." PLOS ONE 16, no. 1 (January 15, 2021): e0245116. http://dx.doi.org/10.1371/journal.pone.0245116.

Abstract:
Avian influenza viruses can cause economically devastating diseases in poultry and have the potential for zoonotic transmission. To mitigate the consequences of avian influenza, disease prediction systems have become increasingly important. In this study, we have proposed a framework for the prediction of the occurrence and spread of avian influenza events in a geographical area. The application of the proposed framework was examined in an Indonesian case study. An extensive list of historical data sources containing disease predictors and target variables was used to build spatiotemporal and transactional datasets. To combine disparate sources, data rows were scaled to a temporal scale of 1-week and a spatial scale of 1-degree × 1-degree cells. Given the constructed datasets, underlying patterns in the form of rules explaining the risk of occurrence and spread of avian influenza were discovered. The created rules were combined and ordered based on their importance and then stored in a knowledge base. The results suggested that the proposed framework could act as a tool to gain a broad understanding of the drivers of avian influenza epidemics and may facilitate the prediction of future disease events.
23. Rehman, Saif Ur, Noha Alnazzawi, Jawad Ashraf, Javed Iqbal, and Shafiullah Khan. "Efficient Top-K Identical Frequent Itemsets Mining without Support Threshold Parameter from Transactional Datasets Produced by IoT-Based Smart Shopping Carts." Sensors 22, no. 20 (October 21, 2022): 8063. http://dx.doi.org/10.3390/s22208063.

Abstract:
Internet of Things (IoT)-backed smart shopping carts are generating an extensive amount of data in shopping markets around the world. This data can be cleaned and utilized for setting business goals and strategies. Artificial intelligence (AI) methods are used to efficiently extract meaningful patterns or insights from such huge amounts of data, or big data. One such technique is Association Rule Mining (ARM), which is used to extract strategic information from the data. The crucial step in ARM is Frequent Itemsets Mining (FIM), followed by association rule generation. The FIM process starts with a user-tuned support threshold parameter that controls the number of frequent patterns produced, and the user applies trial and error, rerunning the routine until the required number of patterns is obtained. The research community has therefore shifted its focus towards mining the top-K most frequent patterns without a user-tuned support threshold parameter. Top-K most frequent pattern mining is considered a harder task than user-tuned, support-threshold-based FIM. One reason top-K most frequent pattern mining techniques are computationally intensive is that they produce a large number of candidate itemsets, and these methods use no explicit pruning mechanism apart from an internally auto-maintained support threshold parameter. Therefore, we propose an efficient TKIFIs Miner algorithm that uses a depth-first search strategy for top-K identical frequent pattern mining, with specialized one- and two-itemsets-based pruning techniques. Comparative analysis is performed on special benchmark datasets, for example, Retail with 16,469 items, and T40I10D100K and T10I4D100K with 1,000 items each. The evaluation results show that TKIFIs Miner outperforms recently published methods for topmost pattern mining that do not use a support threshold parameter.
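
A toy version of the top-K formulation clarifies what is being asked: return the K most frequent itemsets rather than all itemsets above a user-given threshold. The brute-force sketch below (our own, not TKIFIs Miner) counts every small itemset and then takes the K best; real top-K miners avoid counting everything by raising an internal support threshold as they go.

```python
from collections import Counter
from itertools import combinations
import heapq

def top_k_itemsets(transactions, k, max_size=2):
    """Toy top-K miner: count all itemsets up to max_size, then keep the
    K most frequent, with no user-supplied support threshold."""
    counts = Counter()
    for txn in transactions:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(txn), size))
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

txns = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
for itemset, support in top_k_itemsets(txns, k=3):
    print(itemset, support)
```
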
24. Bawiskar, Saurav. "Smart Profitable Solutions with Recommendation Framework." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (June 30, 2022): 4099–105. http://dx.doi.org/10.22214/ijraset.2022.44835.

Abstract:
Discovering frequent patterns in transactional databases is one of the crucial functions of the Apriori algorithm, which works on the principle of association rule mining. It is an efficient and important data mining algorithm for discovering frequent patterns in a database. Apriori finds associations between different sets of data; each set of data has a collective number of items and is called a transaction. The output of Apriori is a set of rules that reveal how often any particular item or set of items is contained in the data. In our proposed system, our basic aim is to implement the Apriori algorithm with a threshold value and a varying support count, which act as a filter for our recommendation data; the threshold value can be adjusted to increase or decrease the accuracy of the system. We use the Apriori algorithm with its application in the retail industry in mind, given its capability to compute and handle large datasets, especially for market basket analysis. The Apriori algorithm together with analytical tools can provide insights into data and help the user in management and decision making, provided that the user feeds the system correctly. Our aim is to provide users with recommendations that ultimately help them improve their business operations.

25. Sanober, Sumaya, Izhar Alam, Sagar Pande, Farrukh Arslan, Kantilal Pitambar Rane, Bhupesh Kumar Singh, Aditya Khamparia, and Mohammad Shabaz. "An Enhanced Secure Deep Learning Algorithm for Fraud Detection in Wireless Communication." Wireless Communications and Mobile Computing 2021 (August 7, 2021): 1–14. http://dx.doi.org/10.1155/2021/6079582.

Abstract:
In today's era of technology, especially in Internet commerce and banking, transactions made by payment cards have been increasing rapidly, and the card has become the most widely used instrument for Internet shopping. This demand and growth have also caused a considerable rise in fraud cases. It is necessary to stop fraudulent transactions because they affect financial conditions over time, and anomaly detection has important applications in fraud detection. A novel framework which integrates Spark with a deep learning approach is proposed in this work. This work also implements different machine learning techniques for fraud detection, such as random forest, SVM, logistic regression, decision tree, and KNN. A comparative analysis is done using various parameters, and more than 96% accuracy was obtained for both training and testing datasets. Existing systems, like Cardwatch and web service-based fraud detection, need labelled data for both genuine and fraudulent transactions and cannot find new frauds. The dataset used contains transactions made by European credit card holders in September 2013 over two days, with 492 fraudulent transactions out of 284,807, or 0.172% of all transactions.

26. Bansal, Neha, R. K. Singh, and Arun Sharma. "An Insight into State-of-the-Art Techniques for Big Data Classification." International Journal of Information System Modeling and Design 8, no. 3 (July 2017): 24–42. http://dx.doi.org/10.4018/ijismd.2017070102.

Abstract:
This article describes how classification algorithms have emerged as strong meta-learning techniques to accurately and efficiently analyze the masses of data generated by the widespread use of the internet and other sources. In particular, there is a need for a mechanism that classifies unstructured data into some organized form. Classification techniques over big transactional databases may provide the required data to users from large datasets in a more simplified way. With the intention of organizing and clearly representing the current state of classification algorithms for big data, the present paper discusses various concepts and algorithms, and gives an exhaustive review of existing classification algorithms over big data classification frameworks and other novel frameworks. The paper provides a comprehensive comparison, both from a theoretical and an empirical perspective. The effectiveness of the candidate classification algorithms is measured through a number of performance metrics, such as implementation technique, data source validation, and scalability.

27. Aljojo, Nahla. "Examining Heterogeneity Structured on a Large Data Volume with Minimal Incompleteness." ARO-The Scientific Journal of Koya University 9, no. 2 (November 2, 2021): 30–37. http://dx.doi.org/10.14500/aro.10857.

Abstract:
While Big Data analytics can provide a variety of benefits, processing heterogeneous data comes with its own set of limitations. When working with Bitcoin data, transaction patterns must be studied independently; this study examines Twitter data related to Bitcoin and investigates communication patterns in Bitcoin transaction tweets. Using the hashtags #Bitcoin and #BTC on Twitter, a vast amount of data was gathered and mined to uncover the patterns that everyone (speculators, teachers, or stakeholders) uses on Twitter to discuss Bitcoin transactions. The aim is to determine the direction of Bitcoin transaction tweets based on historical data. As a result, this research proposes using Big Data analytics to track Bitcoin transaction communications in tweets in order to discover a pattern, using the Hadoop MapReduce platform. The findings indicate that in the map step of the procedure, Hadoop tokenizes the dataset and passes the tokens to the mapper, where thirteen patterns were established and then reduced to three patterns using attributes previously stored in the Hadoop context. One of these is emoji data, which was left out of previous research discussions; however, the text is only one piece of the puzzle of Bitcoin transaction interaction, the key part of which is "no certainty, only possibilities" in Bitcoin transactions.

28. Oprea, Simona-Vasilica, Adela Bâra, Florina Camelia Puican, and Ioan Cosmin Radu. "Anomaly Detection with Machine Learning Algorithms and Big Data in Electricity Consumption." Sustainability 13, no. 19 (October 2, 2021): 10963. http://dx.doi.org/10.3390/su131910963.

Abstract:
When analyzing smart metering data, both reading errors and frauds can be identified. The purpose of this analysis is to alert utility companies to suspicious consumption behavior that can be further investigated with on-site inspections or other methods. The use of Machine Learning (ML) algorithms to analyze consumption readings can lead to the identification of malfunctions, cyberattacks interrupting measurements, or physical tampering with smart meters. Fraud detection is one of the classical anomaly detection examples, as it is not easy to label consumption or transactional data. Furthermore, frauds differ in nature, and learning is not always possible. In this paper, we analyze large datasets of readings provided by smart meters installed in a trial study in Ireland by applying a hybrid approach. More precisely, we propose an unsupervised ML technique to detect anomalous values in the time series, establish a threshold for the percentage of anomalous readings out of the total readings, and then label the time series as suspicious or not. Initially, we propose two algorithms for anomaly detection on unlabeled data: Spectral Residual-Convolutional Neural Network (SR-CNN) and an anomaly-trained model based on martingales for determining variations in time-series data streams. Then, Two-Class Boosted Decision Tree and Fisher Linear Discriminant analysis are applied to the previously processed dataset. By training the model, we obtain the required capability to detect suspicious consumers, demonstrated by an accuracy of 90%, a precision score of 0.875, and an F1 score of 0.894.

29. Mqadi, Nhlakanipho Michael, Nalindren Naicker, and Timothy Adeliyi. "Solving Misclassification of the Credit Card Imbalance Problem Using Near Miss." Mathematical Problems in Engineering 2021 (July 19, 2021): 1–16. http://dx.doi.org/10.1155/2021/7194728.

Abstract:
In ordinary credit card datasets, there are far fewer fraudulent transactions than ordinary transactions. In dealing with the credit card imbalance problem, the ideal solution must have low bias and low variance. The paper aims to provide an in-depth experimental investigation of the effect of using a hybrid data-point approach to resolve the class misclassification problem in imbalanced credit card datasets. The goal of the research was to use a novel technique to manage unbalanced datasets to improve the effectiveness of machine learning algorithms in detecting fraud or anomalous patterns in huge volumes of financial transaction records where the class distribution was imbalanced. The paper proposed using random forest and a hybrid data-point approach combining feature selection with Near Miss-based undersampling technique. We assessed the proposed method on two imbalanced credit card datasets, namely, the European Credit Card dataset and the UCI Credit Card dataset. The experimental results were reported using performance matrices. We compared the classification results of logistic regression, support vector machine, decision tree, and random forest before and after using our approach. The findings showed that the proposed approach improved the predictive accuracy of the logistic regression, support vector machine, decision tree, and random forest algorithms in credit card datasets. Furthermore, we found that, out of the four algorithms, the random forest produced the best results.
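
A hedged sketch of the pipeline shape described above, Near Miss undersampling followed by a random forest, using scikit-learn and imbalanced-learn on synthetic data (the paper used the European and UCI credit card datasets; all parameters here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.under_sampling import NearMiss

# Synthetic imbalanced data: ~0.5% positive (fraud-like) class.
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.995], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Undersample the majority class on the training split only.
X_res, y_res = NearMiss(version=1).fit_resample(X_tr, y_tr)
clf = RandomForestClassifier(random_state=42).fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```
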
30. Kim, Junghee, Haemin Jung, and Wooju Kim. "Sequential Pattern Mining Approach for Personalized Fraudulent Transaction Detection in Online Banking." Sustainability 14, no. 15 (August 8, 2022): 9791. http://dx.doi.org/10.3390/su14159791.

Abstract:
Financial institutions face challenges of fraud due to an increased number of online transactions and sophisticated fraud techniques. Although fraud detection systems have been implemented to detect fraudulent transactions in online banking, many systems just use conventional rule-based approaches. Rule-based detection systems have difficulty updating and managing their rules and conditions manually. Additionally, because they are generated from only a few fraud cases, the rules are general rather than specific to each user. In this paper, we propose a personalized alarm model to detect frauds in online banking transactions using sequence pattern mining on each user's normal transaction log. We assumed that a personalized fraud detection model is more effective in responding to the rapid increase in online banking users and diversified fraud patterns. Moreover, we focused on the fact that fraudulent transactions are very different from each user's usual transactions. Our proposed model divides each user's log into transactions, extracts a set of sequence patterns, and uses it to determine whether a new incoming transaction is fraudulent. The incoming transaction is divided into multiple windows, and if the normal patterns are not found in consecutive windows, an alarm is sounded. We applied the model to a real-world dataset and showed that our model outperforms the rule-based model and the Markov chain model. Although more experiments on additional datasets are needed, our personalized alarm model can be applied to real-world systems.

31. Lin, Tzu-Hsuan, and Jehn-Ruey Jiang. "Credit Card Fraud Detection with Autoencoder and Probabilistic Random Forest." Mathematics 9, no. 21 (October 22, 2021): 2683. http://dx.doi.org/10.3390/math9212683.

Abstract:
This paper proposes a method, called autoencoder with probabilistic random forest (AE-PRF), for detecting credit card frauds. The proposed AE-PRF method first utilizes the autoencoder to extract features of low-dimensionality from credit card transaction data features of high-dimensionality. It then relies on the random forest, an ensemble learning mechanism using the bootstrap aggregating (bagging) concept, with probabilistic classification to classify data as fraudulent or normal. The credit card fraud detection (CCFD) dataset is applied to AE-PRF for performance evaluation and comparison. The CCFD dataset contains large numbers of credit card transactions of European cardholders; it is highly imbalanced since its normal transactions far outnumber fraudulent transactions. Data resampling schemes like the synthetic minority oversampling technique (SMOTE), adaptive synthetic (ADASYN), and Tomek link (T-Link) are applied to the CCFD dataset to balance the numbers of normal and fraudulent transactions for improving AE-PRF performance. Experimental results show that the performance of AE-PRF does not vary much whether resampling schemes are applied to the dataset or not. This indicates that AE-PRF is naturally suitable for dealing with imbalanced datasets. When compared with related methods, AE-PRF has relatively excellent performance in terms of accuracy, the true positive rate, the true negative rate, the Matthews correlation coefficient, and the area under the receiver operating characteristic curve.
32. Dixit, Abhishek, Akhilesh Tiwari, and R. K. Gupta. "A Model for Trend Analysis in the Online Shopping Scenario Using Multilevel Hesitation Pattern Mining." Mathematical Problems in Engineering 2021 (July 30, 2021): 1–11. http://dx.doi.org/10.1155/2021/2828262.

Abstract:
The present paper proposes a new model for the exploration of hesitated patterns from multiple levels of a conceptual hierarchy in a transactional dataset. The usual practice of pattern mining has focused on identifying frequent patterns (i.e., items which occur together) in the transactional dataset, but it misses the vital information about patterns which are almost, but not exactly, frequent, called "hesitated patterns." The proposed model uses a reduced minimum support threshold (containing two values: attractiveness and hesitation) and a constant minimum confidence threshold with a top-down progressive deepening approach for generating patterns, utilizing the apriori property. To validate the model, an online purchasing scenario of books through e-commerce-based shopping platforms such as Amazon has been considered, showing how various factors contribute to building hesitation at the time of purchasing a book. The present work suggests a novel way of deriving hesitated patterns from multiple levels in the conceptual hierarchy with respect to the target dataset. Moreover, the concepts and theories in the existing related work, Lu and Ng (2007), focus only on introductory aspects of vague-set-theory-based hesitation association rule mining and cannot handle patterns at multiple levels of granularity, while the proposed model is complete in nature, addresses the significant and untouched problem of mining multilevel hesitated patterns, and is useful for exploring hesitated patterns at multiple levels of granularity based on the considered hesitation status in a transactional dataset. These hesitated patterns can be further utilized by decision makers and business analysts to build strategies for increasing the attraction level of hesitated items (appearing in a particular transaction or set of transactions in a given dataset) to convert their state from hesitated to preferred items.

33. Andi, Hari Krishnan. "Construction of Business Intelligence Model for Information Technology Sector with Decision Support System." December 2021 3, no. 4 (March 14, 2022): 259–68. http://dx.doi.org/10.36548/jitdw.2021.4.002.

Abstract:
Understanding and evaluating data is vital for making choices in a system, and as a transactional system expands, it becomes increasingly difficult to execute analytical operations directly; complex and huge datasets call for analytical methods and their extensions. Despite significant investment in intelligent business systems, the lack of a clear and rigorous technique for measuring their realized value remains a problem. An inventory management decision support system has been built as the goal of the software program; the method can be used by medium-sized industries or companies. The goals of this research work are finding a solution suitable for the firm's size and adapting to business and decision-maker needs. In addition, the candidate solutions are measured against reference data from Information Technology (IT) sectors through error computations, considering various types of errors. The proposed decision support system produces excellent results with the least amount of error in the final output, as shown by the graphical depiction in the results and discussion section.

34. Swami, Mrs M. M., Rushikesh Ghuge, Gurshan Singh, Harsh Tiwari, and Rohan Kalaskar. "Credit Card Fraud Detection." International Journal for Research in Applied Science and Engineering Technology 10, no. 12 (December 31, 2022): 2368–71. http://dx.doi.org/10.22214/ijraset.2022.47769.

Abstract:
Due to exponential growth in the field of online transactions, credit cards are widely used in most financial activities, and hence there are greater risks of fraudulent transactions. Fraudulent transactions can be identified by analysing several behaviours of credit card users from earlier transaction history datasets; if the behaviour deviates from the existing patterns, a fraudulent transaction is possible. This project proposes the use of an ensemble learning algorithm (XGBoost). Using this model, the proposed system predicts whether a transaction is fraudulent or genuine. By implementing this methodology in fraud detection systems, the monetary losses caused by fraudulent transactions can be decreased.

35. Thabtah, Fadi, and Suhel Hammoud. "MR-ARM: A Map-Reduce Association Rule Mining Framework." Parallel Processing Letters 23, no. 03 (September 2013): 1350012. http://dx.doi.org/10.1142/s0129626413500126.

Abstract:
Association rule mining is one of the primary tasks in data mining; it discovers correlations among items in a transactional database. The majority of vertical and horizontal association rule mining algorithms have been developed to improve the frequent-item discovery step, which places high demands on training time and memory usage, particularly when the input database is very large. In this paper, we address the problem of mining very large data by proposing a new parallel Map-Reduce (MR) association rule mining technique, called MR-ARM, that uses a hybrid data transformation format to quickly find frequent items and generate rules. The MR programming paradigm is becoming popular for large-scale, data-intensive distributed applications due to its efficiency, simplicity, and ease of use, and the proposed algorithm therefore develops a fast parallel distributed batch set intersection method for finding frequent items. Two implementations (Weka, Hadoop) of the proposed MR association rule algorithm have been developed, and a number of experiments on small, medium, and large data collections have been conducted. The bases of the comparisons are the time required by the algorithm for data initialisation, frequent item discovery, rule generation, etc. The results show that MR-ARM is a very useful tool for mining association rules from large datasets in a distributed environment.

36. Padhi, Bharat Kumar, Sujata Chakravarty, Bighnaraj Naik, Radha Mohan Pattanayak, and Himansu Das. "RHSOFS: Feature Selection Using the Rock Hyrax Swarm Optimization Algorithm for Credit Card Fraud Detection System." Sensors 22, no. 23 (November 30, 2022): 9321. http://dx.doi.org/10.3390/s22239321.

Abstract:
In recent years, detecting credit card fraud transactions has been a difficult task due to high-dimensional and imbalanced datasets. Selecting a subset of important features from a high-dimensional dataset has proven to be the most prominent approach for solving high-dimensional dataset issues, and the selection of features is critical for improving classification performance, such as in the fraud transaction identification process. To contribute to the field, this paper proposes a novel feature selection (FS) approach based on a metaheuristic algorithm called Rock Hyrax Swarm Optimization Feature Selection (RHSOFS), inspired by the actions of rock hyrax swarms in nature, and implements supervised machine learning techniques to improve credit card fraud transaction identification. This approach is used to select a subset of optimal, relevant features from a high-dimensional dataset. In a comparative efficiency analysis, RHSOFS is compared with Differential Evolutionary Feature Selection (DEFS), Genetic Algorithm Feature Selection (GAFS), Particle Swarm Optimization Feature Selection (PSOFS), and Ant Colony Optimization Feature Selection (ACOFS). According to the experimental results, the proposed RHSOFS outperforms these existing approaches, and various statistical tests have been used to validate the statistical significance of the proposed model.

37. Hewapathirana, Isuru. "Utilizing Prediction Intervals for Unsupervised Detection of Fraudulent Transactions: A Case Study." Asian Journal of Engineering and Applied Technology 11, no. 2 (October 28, 2022): 1–10. http://dx.doi.org/10.51983/ajeat-2022.11.2.3348.

Abstract:
Money laundering operations have a high negative impact on the growth of a country’s national economy. As all financial sectors are increasingly being integrated, it is vital to implement effective technological measures to address these fraudulent operations. Machine learning methods are widely used to classify an incoming transaction as fraudulent or non-fraudulent by analyzing the behaviour of past transactions. Unsupervised machine learning methods do not require label information on past transactions, and a classification is made solely based on the distribution of the transaction. This research presents three unsupervised classification methods: ordinary least squares regression-based (OLS) fraud detection, random forest-based (RF) fraud detection and dropout neural network-based (DNN) fraud detection. For each method, the goal is to classify an incoming transaction amount as fraudulent or non-fraudulent. The novelty in the proposed approach is the application of prediction interval calculation for automatically validating incoming transactions. The three methods are applied to a real-world dataset of credit card transactions. The fraud labels available for the dataset are removed during the model training phase but are later used to evaluate the performance of the final predictions. The performance of the proposed methods is further compared with two other unsupervised state-of-the-art methods. Based on the experimental results, the OLS and RF methods show the best performance in predicting the correct label of a transaction, while the DNN method is the most robust method for detecting fraudulent transactions. This novel concept of calculating prediction intervals for validating an incoming transaction introduces a new direction for unsupervised fraud detection. Since fraud labels on past transactions are not required for training, the proposed methods can be applied in an online setting to different areas, such as detecting money laundering activities, telecommunication fraud and intrusion detection.
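
The prediction-interval idea behind the OLS variant can be sketched directly with statsmodels: fit a regression on past transaction amounts, compute the 95% prediction interval for a new observation, and flag amounts that fall outside it. Everything below (features, data, interval level) is a synthetic stand-in, not the paper's setup.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 3)))  # past transaction features
y = X @ np.array([50.0, 10.0, -5.0, 2.0]) + rng.normal(scale=5.0, size=500)

model = sm.OLS(y, X).fit()

# 95% prediction interval for the expected amount of a new transaction.
x_new = sm.add_constant(rng.normal(size=(1, 3)), has_constant="add")
frame = model.get_prediction(x_new).summary_frame(alpha=0.05)
low, high = frame["obs_ci_lower"][0], frame["obs_ci_upper"][0]

amount = 120.0  # incoming transaction amount to validate
print("fraudulent" if not (low <= amount <= high) else "non-fraudulent")
```
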
38. Chen, Lei, and Yanbin Tu. "Enhancing Online Auction Transaction Likelihood." International Journal of E-Business Research 15, no. 2 (April 2019): 116–32. http://dx.doi.org/10.4018/ijebr.2019040107.

Abstract:
This article compares four data mining models (discriminant analysis, logistic regression, decision tree, and multilayer neural networks) for online auction transaction prediction. It aims to choose the best model in terms of prediction accuracy and to identify determinants significant for auction transactions. Using datasets from eBay, the authors find that the best data mining model for auction transactions is multilayer neural networks. Logistic regression and decision tree models can be used to identify determinants significant for auction transactions, such as the seller's feedback profile, listing picture, listing file size, return policies, and others. By adjusting these listing options, sellers could increase the likelihood of an auction transaction. This study will help sellers improve their auction listings by constructing effective selling strategies so that they can enhance the likelihood of online auction transactions. All these efforts will help improve their online auction performance and ultimately lead to a more efficient electronic marketplace.

39

Hussain, Ibrar, and Muhammad Asif. "Detection of Anomalous Transactions in Mobile Payment Systems." International Journal of Data Analytics 1, no. 2 (July 2020): 58–66. http://dx.doi.org/10.4018/ijda.2020070105.

Full text
Abstract:
Mobile payment systems give smartphone users an easy way to transfer money to each other. This ease of transfer has great potential for economic activity. However, fraudulent transactions may occur and can have a substantial impact on a country's economy: financial fraud and anomalous transactions can cause losses of billions of dollars annually. There is therefore a need to detect anomalous transactions in mobile payment systems to prevent financial fraud. For this research study, a synthetic dataset was generated using the PaySim simulator due to the lack of a realistic public dataset. The study performed experiments on this financial transaction dataset using eight data mining classification algorithms. The performance of the classification models was measured with the evaluation metrics accuracy, precision, F-score, recall, and specificity, and a comparative analysis of the models was performed based on their performance.
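As a quick illustration of the evaluation metrics listed above (a sketch, not the paper's code), scikit-learn supplies all of them except specificity, which can be derived from the confusion matrix:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score, accuracy_score)

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # 1 = anomalous transaction
y_pred = np.array([0, 0, 1, 0, 0, 1, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)        # not provided directly by scikit-learn
print("accuracy   :", accuracy_score(y_true, y_pred))
print("precision  :", precision_score(y_true, y_pred))
print("recall     :", recall_score(y_true, y_pred))
print("F-score    :", f1_score(y_true, y_pred))
print("specificity:", specificity)
```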
APA, Harvard, Vancouver, ISO, and other styles
40

Muhamed, Shatha Jassim. "Detection and Prevention WEB-Service for Fraudulent E-Transaction using APRIORI and SVM." Al-Mustansiriyah Journal of Science 33, no. 4 (December 30, 2022): 72–79. http://dx.doi.org/10.23851/mjs.v33i4.1242.

Full text
Abstract:
With the increased use of information technology, many financial services are available to users at their fingertips. However, this has also led to many fraudulent transactions. Automatic fraud identification and detection can improve the user experience and the security of online transactions. Machine learning algorithms can find hidden implicit patterns and data relationships in a large dataset, making it possible to detect outliers among all transactions and thereby identify fraudulent ones. In this paper, the APRIORI algorithm and a Support Vector Machine are used to detect fraudulent credit card transactions via a secure web application service that enforces security through standard measures. We compare the result with other existing machine learning algorithms and observe that the accuracy of fraud detection in the proposed approach is higher, exceeding 94.56%, while false fraud detections are fewer than with fraud detection based on the Hidden Markov Model.
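A rough sketch of how frequent itemsets from Apriori can feed an SVM follows; this is an illustrative pipeline, not the paper's implementation, assuming the mlxtend and scikit-learn packages, and the transaction attributes are hypothetical:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from sklearn.svm import SVC

# Hypothetical one-hot encoded transaction attributes (the paper's exact
# feature set is not described in the abstract).
df = pd.DataFrame({
    "high_amount": [1, 0, 1, 1, 0, 1, 0, 0],
    "foreign_ip":  [1, 0, 1, 0, 0, 1, 0, 0],
    "night_time":  [0, 0, 1, 1, 0, 1, 0, 1],
}, dtype=bool)
labels = [1, 0, 1, 1, 0, 1, 0, 0]           # 1 = fraud (for training the SVM)

# Step 1: Apriori finds attribute combinations frequent among transactions.
itemsets = apriori(df, min_support=0.3, use_colnames=True)

# Step 2: re-encode each transaction by which frequent itemsets it satisfies,
# then train an SVM on that representation.
features = pd.DataFrame({
    str(s): df[list(s)].all(axis=1) for s in itemsets["itemsets"]
})
clf = SVC(kernel="rbf").fit(features, labels)
print(clf.predict(features[:3]))
```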
APA, Harvard, Vancouver, ISO, and other styles
41

He, Yihong. "Machine Learning Methods for Credit Card Fraud Detection." Highlights in Science, Engineering and Technology 23 (December 3, 2022): 106–10. http://dx.doi.org/10.54097/hset.v23i.3204.

Full text
Abstract:
Machine learning is an innovative and efficient tool for preventing credit card fraud; however, given the variety of machine learning models, deciding which model is the most suitable for predicting fraudulent transactions is a tough question. In this research, a comprehensive evaluation method is adopted to compare the performance of different machine learning models. More precisely, this research uses the Area Under the ROC Curve (AUC) metric to evaluate and compare four different machine learning models on the same transaction dataset. The four models are K-Nearest Neighbors, Logistic Regression, Random Forest, and Support Vector Machine. A dataset containing over one million credit card transaction records is processed and divided into training and testing data. After preprocessing, the same training data are fitted to the four models, which are then tested against the same testing data. After a series of hyperparameter tuning, the AUC score of each model is obtained and compared. The comparison indicates that Random Forest makes the most accurate and consistent predictions of fraudulent transactions in this dataset, and can thus be recommended as the primary machine learning algorithm for preventing credit card fraud.
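An AUC comparison of these four model families can be reproduced in outline with scikit-learn; the sketch below runs on synthetic imbalanced data, not the paper's dataset:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Imbalanced classes, as in real fraud data (~2% positives here).
X, y = make_classification(n_samples=5000, weights=[0.98], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

models = {
    "KNN": KNeighborsClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=1),
    "SVM": SVC(probability=True, random_state=1),
}
for name, m in models.items():
    scores = m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_te, scores):.3f}")
```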
APA, Harvard, Vancouver, ISO, and other styles
42

Bandyopadhyay, Samir Kumar. "Detection of Fraud Transactions Using Recurrent Neural Network during COVID-19." Journal of Advanced Research in Medical Science & Technology 07, no. 03 (October 7, 2020): 16–21. http://dx.doi.org/10.24321/2394.6539.202012.

Full text
Abstract:
Online transactions are becoming more popular in the present situation, where the globe is facing an unknown disease, COVID-19. Authorities in several countries have requested that people use cashless transactions as far as possible, although this is not always practical. Since the number of cashless transactions has been increasing during the COVID-19 lockdown period, fraudulent transactions are also increasing rapidly. Fraud can be analysed by viewing the series of transactions a customer made previously: banks and other transaction authorities normally warn their customers when they notice any deviation from established patterns and treat it as a possibly fraudulent transaction. For fraud detection during COVID-19, banks and credit card companies apply various methods such as data mining, decision trees, rule-based mining, neural networks, fuzzy clustering, and machine learning, all of which try to establish customers' normal usage patterns from their former activities. The objective of this paper is to propose a method to detect fraudulent transactions during this unmanageable pandemic situation. Digital payment schemes are often threatened by fraudulent activities, and detecting fraud during money transfers may save customers from financial loss; this paper focuses on mobile-based money transactions. A Deep Learning (DL) framework is suggested that monitors and detects fraudulent activities: by implementing and applying a Recurrent Neural Network on the PaySim-generated synthetic financial dataset, deceptive transactions are identified. The proposed method detects deceptive transactions with an accuracy of 99.87%, an F1-score of 0.99 and an MSE of 0.01.
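A minimal recurrent classifier in this spirit can be written with Keras; this is a sketch under assumed input shapes, not the paper's network, and the PaySim feature count and window length are placeholders:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for PaySim-style sequences: each sample is a window of a
# customer's last 10 transactions with 5 features each (shapes hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10, 5)).astype("float32")
y = rng.integers(0, 2, size=1000)                    # 1 = fraudulent

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 5)),
    tf.keras.layers.LSTM(32),                        # recurrent layer over the window
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of fraud
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, verbose=0)
```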
APA, Harvard, Vancouver, ISO, and other styles
43

Leung, Charles, Tin Cheuk Leung, and Kwok Ping Tsang. "International Real Estate Review." International Real Estate Review 18, no. 4 (December 31, 2015): 473–501. http://dx.doi.org/10.53383/100210.

Full text
Abstract:
We study the implications of a property market transaction tax. As property buyers are obligated to pay a transaction tax ("stamp duty" or SD) where the rate increases with the value of the transaction, there are incentives to trade at the cutoff points of the tax schedule or just below them. Thus, both "bunching in transactions" and "underpricing" should be observed near those cutoffs. Furthermore, the bunching points should change with the tax schedule. We confirm these conjectures with a rich dataset from the Hong Kong housing market and provide a measure of tax avoidance.
APA, Harvard, Vancouver, ISO, and other styles
44

Shrivastava, Soumya, and Punit Kumar Johari. "Convolutional Neural Network Approach for Mobile Banking Fraudulent Transaction to Detect Financial Frauds." International Journal of Engineering Technology and Management Sciences 6, no. 1 (January 28, 2022): 30–37. http://dx.doi.org/10.46647/ijetms.2022.v06i01.005.

Full text
Abstract:
In the real world, identifying financial fraud under IoT conditions is highly valuable, since financial fraud causes financial damage. Several forms of financial fraud are possible, but unauthorized use of mobile payments through a stolen credit card or certificate number is the most common scenario. Financial crime detection is an evolving environment in which fraudsters try to keep ahead, yet intelligent facets of fraud detection remain scientifically under-explored. Deep learning (DL) arises from the idea of multi-level representation, inspired by the human brain, that builds high-level abstractions from basic low-level characteristics. Financial fraud has been a big issue as forgers discover new methods of stealing currency, so adaptive methods of fraud identification are required. Thanks to their flexibility in detecting emerging financial transaction fraud, deep learning approaches are enticing candidates. In this article, we suggest a deep learning approach to detecting financial fraud using convolutional neural networks (CNN). We tested our model experimentally with a fraudulent transactions dataset, and the results of the analysis show that our method detects transactional fraud appropriately.
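A small 1-D convolutional classifier of this kind might look as follows in Keras; the input dimensions are hypothetical and this is a sketch, not the authors' architecture:

```python
import numpy as np
import tensorflow as tf

# Hypothetical setup: each transaction is a vector of 30 features treated as
# a 1-D signal, so convolutions can pick up local feature interactions.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 30, 1)).astype("float32")
y = rng.integers(0, 2, size=2000)                    # 1 = fraudulent

model = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 1)),
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, verbose=0)
```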
APA, Harvard, Vancouver, ISO, and other styles
45

Khajuria, Rakshit, Anuj Sharma, Sunny Sharma, Ashok Sharma, Jyoti Narayan Baliya, and Parveen Singh. "Performance analysis of frequent pattern mining algorithm on different real-life dataset." Indonesian Journal of Electrical Engineering and Computer Science 29, no. 3 (March 1, 2023): 1355. http://dx.doi.org/10.11591/ijeecs.v29.i3.pp1355-1363.

Full text
Abstract:
The efficient discovery of frequent patterns (groups of items that appear frequently in a dataset) is a critical task in data mining, especially for transaction datasets. The goal of this paper is to investigate the efficiency of various frequent pattern mining algorithms in terms of computing time and memory consumption, as well as how the algorithms behave on different datasets. The algorithms investigated are Pre-post, Pre-post+, FIN, H-mine, R-Elim, and estDec+. They have been implemented and tested on four real-life datasets: the Retail, Accidents, Chess, and Mushrooms datasets. From the results, it has been observed that for the Retail dataset, the estDec+ algorithm is the fastest in terms of run time and consumes the least memory. The Pre-post+ algorithm performs better than all other algorithms in terms of run time and maximum memory for the Mushrooms dataset, while Pre-post outperforms the other algorithms on the Chess dataset. For the Accidents dataset, the FIN method outperforms the other algorithms in terms of execution time and memory consumption.
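The measurement approach generalizes to any frequent-itemset implementation. The paper's algorithms have no mainstream Python implementations, so the sketch below times apriori and fpgrowth from the mlxtend package as stand-ins on a toy transaction set:

```python
import time
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

transactions = [["milk", "bread"], ["bread", "eggs"],
                ["milk", "bread", "eggs"], ["milk", "eggs"],
                ["bread"], ["milk", "bread", "butter"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Time each algorithm on the same data and support threshold.
for name, algo in [("apriori", apriori), ("fpgrowth", fpgrowth)]:
    start = time.perf_counter()
    freq = algo(df, min_support=0.3, use_colnames=True)
    print(f"{name}: {len(freq)} itemsets in {time.perf_counter() - start:.4f}s")
```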
APA, Harvard, Vancouver, ISO, and other styles
46

Zhang, Zhaohui, Lijun Yang, Ligong Chen, Qiuwen Liu, Ying Meng, Pengwei Wang, and Maozhen Li. "A generative adversarial network–based method for generating negative financial samples." International Journal of Distributed Sensor Networks 16, no. 2 (February 2020): 155014772090705. http://dx.doi.org/10.1177/1550147720907053.

Full text
Abstract:
In the financial anti-fraud field, negative samples are few and sparse, creating a serious sample-imbalance problem, and generating negative samples consistent with the original data to solve this imbalance naturally is challenging. This article proposes a new method to do so. We introduce a new generation model for one-dimensional negative financial samples that combines a Generative Adversarial Network with a Long Short-Term Memory network. The characteristic associations within transaction sequences are learned by the long short-term memory layer, and the generator learns to cover the real data distribution through an adversarial, time-sequence-aware discriminator. Mapping data distributions into feature space is a common way to evaluate synthetic data; however, it ignores the relationships between data attributes in online transactions. We therefore define a comprehensive evaluation method that assesses the validity of generated samples in terms of both data distribution and attribute characteristics. Experimental results on real bank B2B transaction data show that the proposed model achieves higher overall ratings, 10% higher than traditional generation models. Finally, the well-trained model is used to generate negative samples and form a new dataset; classification results on the new dataset show that precision and recall are both higher than for the baseline models. Our work has practical value and provides a new idea for solving imbalance problems in other fields.
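The generator/discriminator arrangement described here can be sketched with Keras; this is an illustrative skeleton, not the authors' model, and the shapes, layer sizes and training schedule are all assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, N_FEAT, LATENT = 8, 4, 16   # hypothetical sequence shape and noise size

# Generator: latent noise sequence -> synthetic transaction sequence.
generator = models.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, LATENT)),
    layers.LSTM(32, return_sequences=True),   # captures temporal structure
    layers.Dense(N_FEAT, activation="tanh"),
])

# Discriminator: transaction sequence -> probability that it is real.
discriminator = models.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, N_FEAT)),
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Stacked model used to train the generator against a frozen discriminator.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real = np.random.normal(size=(256, SEQ_LEN, N_FEAT)).astype("float32")  # stand-in data
for step in range(100):
    noise = np.random.normal(size=(64, SEQ_LEN, LATENT)).astype("float32")
    fake = generator.predict(noise, verbose=0)
    idx = np.random.randint(0, len(real), 64)
    discriminator.trainable = True            # update D on real vs. generated
    discriminator.train_on_batch(
        np.concatenate([real[idx], fake]),
        np.concatenate([np.ones(64), np.zeros(64)]))
    discriminator.trainable = False           # freeze D while G learns to fool it
    gan.train_on_batch(noise, np.ones(64))
```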
APA, Harvard, Vancouver, ISO, and other styles
47

Liu, Xiaolong, and Weidong Qu. "International Real Estate Review." International Real Estate Review 18, no. 1 (March 31, 2015): 113–29. http://dx.doi.org/10.53383/100195.

Full text
Abstract:
Since its liberalization in 2003, the urban land lease market in China has experienced substantial growth in terms of both the volume and value of transactions. At the same time, significant transaction premiums are observed in these land transactions; these premiums make the general public skeptical about the emergence of a property market bubble that stems from aggressive bidding in the land market. In this paper, we seek to rationalize this phenomenon by means of the event study method. By using a land transaction dataset from Beijing for the period 2003 to 2013, we find that the capital market reacts significantly to land bidding events. In addition, the land transaction premium observed in the Chinese land market can be explained by the signaling effect, in that developers tend to use the bidding price as a signaling device to disseminate favorable private information to the marketplace.
APA, Harvard, Vancouver, ISO, and other styles
48

Ashfaq, Tehreem, Rabiya Khalid, Adamu Sani Yahaya, Sheraz Aslam, Ahmad Taher Azar, Safa Alsafari, and Ibrahim A. Hameed. "A Machine Learning and Blockchain Based Efficient Fraud Detection Mechanism." Sensors 22, no. 19 (September 21, 2022): 7162. http://dx.doi.org/10.3390/s22197162.

Full text
Abstract:
In this paper, we address the problems of fraud and anomalies in the Bitcoin network, which are common problems in e-banking and online transactions. As the financial sector evolves, so do the methods for fraud and anomalies. Blockchain technology is being introduced as the most secure method integrated into finance, yet alongside these advanced technologies, fraud continues to increase every year. We therefore propose a secure fraud detection model based on machine learning and blockchain. Two machine learning algorithms, XGBoost and random forest (RF), are used to classify transactions and predict transaction patterns: the models are trained on the fraudulent and legitimate transaction patterns in the dataset and predict the labels of new incoming transactions. Blockchain technology is integrated with the machine learning algorithms to detect fraudulent transactions in the Bitcoin network. We calculate the precision and AUC of the models to measure their accuracy. A security analysis of the proposed smart contract is also performed to show the robustness of our system, and an attacker model is proposed to protect the system from attacks and vulnerabilities.
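A minimal version of the classification side (leaving the blockchain integration aside) could look like this, assuming the xgboost and scikit-learn packages; the features are synthetic stand-ins for Bitcoin transaction attributes:

```python
from xgboost import XGBClassifier        # assumes the xgboost package is installed
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic, imbalanced stand-in for Bitcoin transaction features.
X, y = make_classification(n_samples=4000, weights=[0.95], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

for name, model in [("XGBoost", XGBClassifier(eval_metric="logloss")),
                    ("RandomForest", RandomForestClassifier(n_estimators=300))]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    pred = model.predict(X_te)
    print(f"{name}: precision = {precision_score(y_te, pred):.3f}, "
          f"AUC = {roc_auc_score(y_te, proba):.3f}")
```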
APA, Harvard, Vancouver, ISO, and other styles
49

Zhang, Zhong-jie, Jian Huang, and Ying Wei. "FI-FG: Frequent Item Sets Mining from Datasets with High Number of Transactions by Granular Computing and Fuzzy Set Theory." Mathematical Problems in Engineering 2015 (2015): 1–14. http://dx.doi.org/10.1155/2015/623240.

Full text
Abstract:
Mining frequent itemsets (FIs) is an important issue in data mining. Considering the limitations of exact algorithms and sampling methods, a novel FI mining algorithm based on granular computing and fuzzy set theory (FI-GF) is proposed, which mines datasets with a high number of transactions more efficiently. First, granulation is applied, compressing the transactions into granules to reduce the scanning cost; each granule is represented by a fuzzy set, and the number of transactions represented by a granule is optimized. Then, fuzzy set theory is used to compute the supports of itemsets from those granules, which handles the uncertainty introduced by granulation and preserves the accuracy of the final results. Finally, Apriori is applied to obtain the FIs from the granules using this new way of computing supports. On five datasets, FI-GF is compared with the original Apriori to demonstrate its reliability and efficiency, and with a representative progressive sampling method, RC-SS, to demonstrate the advantage of granulation over sampling. The results show that FI-GF not only reduces the time cost of scanning transactions but is also highly reliable, and that granulation has advantages over progressive sampling methods.
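The abstract does not give FI-GF's exact construction, but the core idea of estimating itemset support from granules rather than raw transactions can be illustrated with a toy sketch; the min t-norm and fixed granule size below are assumptions, so the estimate is only approximate:

```python
import numpy as np

# Toy binary transaction matrix: rows = transactions, columns = 4 items.
rng = np.random.default_rng(3)
T = rng.random((10000, 4)) < np.array([0.6, 0.5, 0.3, 0.2])

# Granulation: compress blocks of transactions into granules. Each granule
# is a fuzzy set whose membership for an item is the fraction of the
# granule's transactions containing that item.
GRANULE = 500
memberships = T.reshape(-1, GRANULE, 4).mean(axis=1)   # (n_granules, 4)

def fuzzy_support(item_idx):
    """Estimate an itemset's support from granule memberships using the
    min t-norm, instead of scanning every transaction."""
    return memberships[:, item_idx].min(axis=1).mean()

est = fuzzy_support([0, 1])               # estimated support of items {0, 1}
exact = T[:, [0, 1]].all(axis=1).mean()   # exact support, for comparison
print(f"estimated {est:.3f} vs exact {exact:.3f}")
```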
APA, Harvard, Vancouver, ISO, and other styles
50

Hastomo, Widi, Adhitio Satyo Bayangkari Karno, Sudjiran, Dodi Arif, and Eka Sally Moreta. "Exploratory Data Analysis Untuk Data Belanja Pelanggan dan Pendapatan Bisnis." Infotekmesin 13, no. 2 (July 30, 2022): 314–21. http://dx.doi.org/10.35970/infotekmesin.v13i2.1547.

Full text
Abstract:
A more quantifiable, data-driven perspective is taking over the role of mechanistic management in efforts to improve business, based on its capacity to transform data into knowledge and insight; however, industry has not yet fully supported its business strategy with data. Using a transaction dataset taken from one of the Kaggle.com challenges (H&M Personalized Fashion Recommendations), this experiment attempts to determine consumer spending patterns and retail fashion business revenues. The results of the experiment include the number of transactions by customer age, the best-selling products and items purchased only once, and the product types that generate the highest and lowest income. The approach employed is EDA using the Python language. The findings are intended to support simulation capabilities, so that businesses can generate analytical results that provide future perspectives and identify gaps, delivered in the form of suggestions that can be acted on. The challenge in this experiment is the size of the dataset, which necessitates a suitable operating environment.
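A pandas sketch of this kind of EDA might look as follows; the file and column names are modelled on the public H&M Kaggle data and should be checked against the actual files:

```python
import pandas as pd

# Assumed files from the H&M Kaggle challenge.
tx = pd.read_csv("transactions_train.csv")    # customer_id, article_id, price, t_dat
customers = pd.read_csv("customers.csv")      # customer_id, age
articles = pd.read_csv("articles.csv")        # article_id, product_type_name

df = tx.merge(customers, on="customer_id").merge(articles, on="article_id")

# Transactions by customer age band.
df["age_band"] = pd.cut(df["age"], bins=[0, 25, 35, 45, 55, 100])
print(df.groupby("age_band", observed=True).size())

# Best-selling products and items bought only once.
counts = df["article_id"].value_counts()
print("top sellers:\n", counts.head())
print("one-time purchases:", (counts == 1).sum())

# Revenue by product type, highest and lowest.
revenue = df.groupby("product_type_name")["price"].sum().sort_values()
print("lowest revenue :", revenue.index[0])
print("highest revenue:", revenue.index[-1])
```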
APA, Harvard, Vancouver, ISO, and other styles