Journal articles on the topic "RDF dataset metrics"

Browse the top 28 journal articles for research on the topic "RDF dataset metrics".

1

Mountantonakis, Michalis, and Yannis Tzitzikas. "Content-based Union and Complement Metrics for Dataset Search over RDF Knowledge Graphs". Journal of Data and Information Quality 12, no. 2 (14 May 2020): 1–31. http://dx.doi.org/10.1145/3372750.

2

Xia, Jianglin. "Credit Card Fraud Detection Based on Support Vector Machine". Highlights in Science, Engineering and Technology 23 (3 December 2022): 93–97. http://dx.doi.org/10.54097/hset.v23i.3202.

Abstract:
Due to the increasing popularity of cashless transactions, credit card fraud has become one of the most common frauds and has caused huge harm to financial institutions and individuals in real life. In this paper, the Support Vector Machine (SVM) algorithm is used to build models for the credit card fraud detection problem, with AUC and F1-score as performance metrics. The experiment dataset is the Credit Card Transactions Fraud Detection Dataset from the Kaggle website. After preprocessing, the dataset is split into training, testing and validation sets with 11 numerical features and a label feature called "is_fraud". The "class_weight" parameter of the SVM algorithm in Python is set to "balanced" to deal with the imbalanced dataset. The main method used to find the optimized models is the GridSearchCV function in the Python library sklearn. After tuning the hyperparameters and handling overfitting, the optimized models for the two metrics are found. The parameter values of the best model for AUC are C = 10, class_weight = "balanced", g = 0.01, kernel = "rbf". The training AUC is 0.87 and the testing AUC is 0.90. The parameter values of the final optimized model for F1-score are C = 0.8, class_weight = "balanced", g = 0.06, kernel = "rbf". The final training F-score is 0.305 and the testing F-score is 0.260.
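The two metrics and the "balanced" class weighting named in this abstract can be illustrated in plain Python. This is only a sketch: the function names and the toy labels are assumptions made for illustration, not the paper's code, which relies on sklearn. The weighting formula shown is the one sklearn documents for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of
    samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With a heavily imbalanced label list, `balanced_class_weights` gives the rare fraud class a proportionally larger weight, which is the effect the abstract relies on.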
3

Wang, Ke, Ligang Cheng and Bin Yong. "Spectral-Similarity-Based Kernel of SVM for Hyperspectral Image Classification". Remote Sensing 12, no. 13 (6 July 2020): 2154. http://dx.doi.org/10.3390/rs12132154.

Abstract:
Spectral similarity measures can be regarded as potential metrics for kernel functions, and can be used to generate spectral-similarity-based kernels. However, spectral-similarity-based kernels have not received significant attention from researchers. In this paper, we propose two novel spectral-similarity-based kernels based on spectral angle mapper (SAM) and spectral information divergence (SID) combined with the radial basis function (RBF) kernel: Power spectral angle mapper RBF (Power-SAM-RBF) and normalized spectral information divergence-based RBF (Normalized-SID-RBF) kernels. First, we prove these spectral-similarity-based kernels to be Mercer’s kernels. Second, we analyze their efficiency in terms of local and global kernels. Finally, we consider three hyperspectral datasets to analyze the effectiveness of the proposed spectral-similarity-based kernels. Experimental results demonstrate that the Power-SAM-RBF and SAM-RBF kernels can obtain an impressive performance, particularly the Power-SAM-RBF kernel. For example, when the ratio of the training set is 20%, the kappa coefficient of the Power-SAM-RBF kernel (0.8561) is 1.61%, 1.32%, and 1.23% higher than that of the RBF kernel on the Indian Pines, University of Pavia, and Salinas Valley datasets, respectively. We present three conclusions. First, the superiority of the Power-SAM-RBF kernel compared to other kernels is evident. Second, the Power-SAM-RBF kernel can provide an outstanding performance when the similarity between spectral signatures in the same hyperspectral dataset is either extremely high or extremely low. Third, the Power-SAM-RBF kernel provides even greater benefits compared to other commonly used kernels when the sizes of the training sets increase. In future work, multiple kernels combined with the spectral-similarity-based kernel are expected to provide better hyperspectral classification.
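The two similarity measures this abstract builds on have standard definitions, sketched below in plain Python. The `distance_rbf_kernel` function is an assumption made for illustration: it simply substitutes a spectral distance into the usual RBF form exp(-gamma * d^2) and is not the paper's Power-SAM-RBF or Normalized-SID-RBF definition, which the abstract does not give. The SID function assumes strictly positive spectra.

```python
import math


def spectral_angle(x, y):
    """Spectral angle mapper: the angle (radians) between two spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] so rounding error cannot break acos.
    cos = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos)


def spectral_information_divergence(x, y):
    """SID: symmetric KL divergence between band-normalised spectra."""
    sum_x, sum_y = sum(x), sum(y)
    p = [a / sum_x for a in x]
    q = [b / sum_y for b in y]
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return kl_pq + kl_qp


def distance_rbf_kernel(x, y, distance, gamma=1.0):
    """Hypothetical RBF-style kernel over an arbitrary spectral distance."""
    return math.exp(-gamma * distance(x, y) ** 2)
```

Both measures ignore overall brightness: scaling a spectrum by a constant leaves its angle and its normalised band distribution unchanged, so the kernel value stays at 1 for such pairs.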
4

Zhao, Qinghe, Zifang Zhang, Yuchen Huang and Junlong Fang. "TPE-RBF-SVM Model for Soybean Categories Recognition in Selected Hyperspectral Bands Based on Extreme Gradient Boosting Feature Importance Values". Agriculture 12, no. 9 (13 September 2022): 1452. http://dx.doi.org/10.3390/agriculture12091452.

Abstract:
Soybeans with insignificant differences in appearance have large differences in their internal physical and chemical components; therefore, follow-up storage, transportation and processing require targeted differential treatment. A fast and effective machine learning method based on hyperspectral data of soybeans for pattern recognition of categories is designed as a non-destructive testing method in this paper. A hyperspectral-image dataset with 2299 soybean seeds in four categories is collected. Ten features are selected using an extreme gradient boosting algorithm from 203 hyperspectral bands in a range of 400 to 1000 nm; a Gaussian radial basis kernel function support vector machine with optimization by the tree-structured Parzen estimator algorithm is built as the TPE-RBF-SVM model for pattern recognition of soybean categories. The metrics of TPE-RBF-SVM are significantly improved compared with other machine learning algorithms. The accuracy is 0.9165 in the independent test dataset, which is 9.786% higher than the vanilla RBF-SVM model and 10.02% higher than the extreme gradient boosting model.
5

Chen, Yanji, Mieczyslaw M. Kokar, Jakub Moskal and Kaushik R. Chowdhury. "Metrics-Based Comparison of OWL and XML for Representing and Querying Cognitive Radio Capabilities". Applied Sciences 12, no. 23 (23 November 2022): 11946. http://dx.doi.org/10.3390/app122311946.

Abstract:
Collaborative spectrum access requires wireless devices to perform spectrum-related tasks (such as sensing) on request from other nodes. Thus, while joining the network, they need to inform neighboring devices and/or the central coordinator of their capabilities. During the operational phase, nodes may request other permissions from the controller, like the opportunity to transmit according to the current policies and spectrum availability. To achieve such coordinated behavior, all associated devices within the network need a language for describing radio capabilities, requests, scenarios, policies, and spectrum availability. In this paper, we present a thorough comparison of the use of two candidate languages, Web Ontology Language (OWL) and eXtensible Markup Language (XML), for such purposes. Towards this goal, we propose an evaluation method for automating quantitative comparisons with metrics such as precision, recall, device registration, and the query response time. The requests are expressed in both SPARQL Protocol and RDF Query Language (SPARQL) and XML Query Language (XQuery), whereas the device capabilities are expressed in both OWL and XML. The evaluation results demonstrate the advantages of using OWL semantics to improve the quality of matching results over XML. We also discuss how the evaluation method can be applicable to other scenarios where knowledge, datasets, and queries require richer expressiveness and semantics.
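Precision and recall, two of the metrics named in this abstract, compare the set of devices a query returns against the set that truly satisfies the request. The sketch below is an illustration only: the function name and the device identifiers are hypothetical, and it does not reproduce the paper's evaluation method.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a query result against ground truth.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    An empty denominator yields 0.0 by convention in this sketch.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: devices returned by a capability query
# versus the devices that actually match the request.
returned = {"radio-a", "radio-b", "radio-c"}
matching = {"radio-a", "radio-c", "radio-d", "radio-e"}
precision, recall = precision_recall(returned, matching)
```

Running the same ground truth against the result sets of two query languages gives a like-for-like comparison of matching quality.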
6

Jerop, Brenda, and Davies Rene Segera. "An Efficient PCA-GA-HKSVM-Based Disease Diagnostic Assistant". BioMed Research International 2021 (20 October 2021): 1–10. http://dx.doi.org/10.1155/2021/4784057.

Abstract:
Disease diagnosis faces challenges such as misdiagnosis, lack of diagnosis, and slow diagnosis. There are several machine learning techniques that have been applied to address these challenges, where a set of symptoms is applied to a classification model that predicts the presence or absence of a disease. To improve on the performance of these techniques, this paper presents a technique which involves feature selection using principal component analysis (PCA), a hybrid kernel-based support vector machine (HKSVM) classification model and hyperparameter optimization using genetic algorithm (GA). The HKSVM in this paper introduces a new way of combining three kernels: Radial basis function (RBF), linear, and polynomial. Combining local (RBF) and global (linear and polynomial) kernels has the effect of improved model performance. This is because the local kernels are better able to distinguish points closer to each other while the global kernels are more suited to distinguish points that are far away from each other. The PCA-GA-HKSVM is used on 7 different medical datasets, with two datasets being multiclass datasets and 5 datasets being binary. Performance evaluation metrics used were accuracy, precision, and recall. It was observed that the PCA-GA-HKSVM offered better performance than the single kernel support vector machines (SVMs).
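One common way to combine a local kernel with global ones is a non-negative weighted sum, which is itself a valid kernel. The sketch below assumes that form; the abstract does not say how the paper's HKSVM combines its three kernels, so the weights, parameter defaults, and function names here are all hypothetical.

```python
import math


def linear_kernel(x, y):
    """Global kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Global kernel: (x . y + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Local kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def hybrid_kernel(x, y, weights=(0.5, 0.25, 0.25)):
    """Assumed combination: weighted sum of RBF, linear and polynomial.

    Weights must be non-negative for the sum to remain a valid kernel.
    """
    w_rbf, w_lin, w_poly = weights
    if min(weights) < 0:
        raise ValueError("kernel weights must be non-negative")
    return (w_rbf * rbf_kernel(x, y)
            + w_lin * linear_kernel(x, y)
            + w_poly * polynomial_kernel(x, y))
```

In a setup like the one described, the weights and kernel parameters would be the hyperparameters handed to the optimizer; the RBF term dominates for nearby points and fades with distance, while the other two terms keep contributing for distant points.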
7

Mohammed, Yosra Abdulaziz, and Eman Gadban Saleh. "Comparative study of logistic regression and artificial neural networks on predicting breast cancer cytology". Indonesian Journal of Electrical Engineering and Computer Science 21, no. 2 (1 February 2021): 1113. http://dx.doi.org/10.11591/ijeecs.v21.i2.pp1113-1120.

Abstract:
Currently, breast cancer is one of the most common cancers and a main cause of death among women worldwide, particularly in developing countries such as Iraq. Our work aims to predict whether a tumor is benign or malignant through models built using logistic regression and neural networks, and we hope it will help doctors in detecting the type of breast tumor. Four models were set up using binary logistic regression and two different types of artificial neural networks, namely multilayer perceptron (MLP) and radial basis function (RBF). Evaluation of the validated and trained models was done using several performance metrics such as accuracy, sensitivity, specificity, and AUC (area under the receiver operating characteristic, ROC, curve). The dataset was downloaded from the UCI ML repository; it is composed of 9 attributes and 699 samples. The findings clearly show that the RBF NN classifier is the best at predicting the type of breast tumor, since it recorded the highest performance in terms of correct classification rate (accuracy), sensitivity, specificity, and AUC among all models.
8

Panda, Mrutyunjaya. "Software Defect Prediction Using Hybrid Distribution Base Balance Instance Selection and Radial Basis Function Classifier". International Journal of System Dynamics Applications 8, no. 3 (July 2019): 53–75. http://dx.doi.org/10.4018/ijsda.2019070103.

Abstract:
Software is an important part of human life, and with the rapid development of software engineering the demand for reliable software with few defects is increasingly pressing. This article proposes building a software defect prediction model using various software metrics with publicly available historical software defect datasets collected from several projects. Such a prediction model can enable software engineers to take proactive action to enhance software quality from the early stages of the software development cycle. This article introduces a hybrid classification method (DBBRBF) that combines distribution base balance (DBB) based instance selection with a radial basis function (RBF) neural network classifier to obtain the best prediction compared to the existing research. The experimental results with post-hoc statistical significance tests show the effectiveness of the proposed approach.
9

Villa, Amalia, Abhijith Mundanad Narayanan, Sabine Van Huffel, Alexander Bertrand and Carolina Varon. "Utility metric for unsupervised feature selection". PeerJ Computer Science 7 (21 April 2021): e477. http://dx.doi.org/10.7717/peerj-cs.477.

Abstract:
Feature selection techniques are very useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the dimensions of the data to a subset of the original set of features. When the data lack annotations, unsupervised feature selectors are required for their analysis. Several algorithms for this aim exist in the literature, but despite their large applicability, they can be very inaccessible or cumbersome to use, mainly due to the need for tuning non-intuitive parameters and the high computational demands. In this work, a publicly available ready-to-use unsupervised feature selector is proposed, with comparable results to the state-of-the-art at a much lower computational cost. The suggested approach belongs to the methods known as spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted, while in the second stage a subset of the features is selected to replicate these structures. This paper suggests two contributions to this field, related to each of the stages involved. In the manifold learning stage, the effect of non-linearities in the data is explored, making use of a radial basis function (RBF) kernel, for which an alternative solution for the estimation of the kernel parameter is presented for cases with high-dimensional data. Additionally, the use of a backwards greedy approach based on the least-squares utility metric for the subset selection stage is proposed. The combination of these new ingredients results in the utility metric for unsupervised feature selection (U2FS) algorithm. The proposed U2FS algorithm succeeds in selecting the correct features in a simulation environment. In addition, the performance of the method on benchmark datasets is comparable to the state-of-the-art, while requiring less computational time. Moreover, unlike the state-of-the-art, U2FS does not require any tuning of parameters.
10

Bashir, Kamal, Tianrui Li and Mahama Yahaya. "A Novel Feature Selection Method Based on Maximum Likelihood Logistic Regression for Imbalanced Learning in Software Defect Prediction". International Arab Journal of Information Technology 17, no. 5 (1 September 2020): 721–30. http://dx.doi.org/10.34028/iajit/17/5/5.

Abstract:
The most frequently used machine learning feature ranking approaches fail to present an optimal feature subset for accurate prediction of defective software modules in out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), RelieF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after balancing class distribution in the training data. In this study, we propose a novel FS method based on the Maximum Likelihood Logistic Regression (MLLR). We apply this method on six software defect datasets in their sampled and unsampled forms to select useful features for classification in the context of Software Defect Prediction (SDP). The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied on the FS subsets that are based on sampled and unsampled datasets. The performance of the models, captured using the Area Under Receiver Operating Characteristics Curve (AUC) metric, is compared for all FS methods considered. The Analysis Of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the FS techniques, both in sampled and unsampled data. The results confirm that the MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
11

Mugo, Robinson, and Sei-Ichi Saitoh. "Ensemble Modelling of Skipjack Tuna (Katsuwonus pelamis) Habitats in the Western North Pacific Using Satellite Remotely Sensed Data; a Comparative Analysis Using Machine-Learning Models". Remote Sensing 12, no. 16 (12 August 2020): 2591. http://dx.doi.org/10.3390/rs12162591.

Abstract:
To examine skipjack tuna’s habitat utilization in the western North Pacific (WNP) we used an ensemble modelling approach, which applied a fisher-derived presence-only dataset and three satellite remote-sensing predictor variables. The skipjack tuna data were compiled from daily point fishing data into monthly composites and re-gridded into a quarter degree resolution to match the environmental predictor variables, the sea surface temperature (SST), sea surface chlorophyll-a (SSC) and sea surface height anomalies (SSHA), which were also processed at quarter degree spatial resolution. Using the sdm package operated in RStudio software, we constructed habitat models over a 9-month period, from March to November 2004, using 17 algorithms, with a 70:30 split of training and test data, with bootstrapping and 10 runs as parameter settings for our models. Model performance evaluation was conducted using the area under the curve (AUC) of the receiver operating characteristic (ROC), the point biserial correlation coefficient (COR), the true skill statistic (TSS) and Cohen’s kappa (k) metrics. We analyzed the response curves for each predictor variable per algorithm, the variable importance information and the ROC plots. Ensemble predictions of habitats were weighted with the TSS metric. Model performance varied across algorithms, with the Support Vector Machines (SVM), Boosted Regression Trees (BRT), Random Forests (RF), Multivariate Adaptive Regression Splines (MARS), Generalized Additive Models (GAM), Classification and Regression Trees (CART), Multi-Layer Perceptron (MLP), Recursive Partitioning and Regression Trees (RPART), and Maximum Entropy (MAXENT) showing consistently higher performance than other algorithms, while the Flexible Discriminant Analysis (FDA), Mixture Discriminant Analysis (MDA), Bioclim (BIOC), Domain (DOM), Maxlike (MAXL), Mahalanobis Distance (MAHA) and Radial Basis Function (RBF) had lower performance. We found inter-algorithm variations in predictor variable responses. We conclude that the multi-algorithm modelling approach enabled us to assess the variability in algorithm performance, and hence provided a data-driven basis for building the ensemble model. Given the inter-algorithm variations observed, the ensemble prediction maps indicated a better habitat utilization map of skipjack tuna than would have been achieved by a single algorithm.
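The true skill statistic and Cohen's kappa named in this abstract both derive from a binary confusion matrix. The sketch below uses the standard definitions in plain Python; the function names and the example counts are made up for illustration and are not taken from the study.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def cohens_kappa(tp, fn, fp, tn):
    """Kappa = (observed agreement - chance agreement) / (1 - chance)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals.
    chance = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - chance) / (1.0 - chance)


# Hypothetical counts: 40 presences predicted correctly, 10 missed,
# 5 false alarms, 45 absences predicted correctly.
tss = true_skill_statistic(tp=40, fn=10, fp=5, tn=45)
kappa = cohens_kappa(tp=40, fn=10, fp=5, tn=45)
```

Unlike kappa, TSS does not depend on prevalence, which is one reason it is a natural choice for weighting ensemble members.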
12

Parsaeian, Mahdieh, Mohammad Rahimi, Abbas Rohani and Shaneka S. Lawson. "Towards the Modeling and Prediction of the Yield of Oilseed Crops: A Multi-Machine Learning Approach". Agriculture 12, no. 10 (21 October 2022): 1739. http://dx.doi.org/10.3390/agriculture12101739.

Abstract:
Crop seed yield modeling and prediction can act as a key approach in the precision agriculture industry, enabling the reliable assessment of the effectiveness of agro-traits. Here, multiple machine learning (ML) techniques are employed to predict sesame (Sesamum indicum L.) seed yields (SSY) using agro-morphological features. Various ML models were applied, coupled with the PCA (principal component analysis) method to compare them with the original ML models, in order to evaluate the prediction efficiency. The Gaussian process regression (GPR) and radial basis function neural network (RBF-NN) models exhibited the most accurate SSY predictions, with determination coefficients, or R² values, of 0.99 and 0.91, respectively. The root-mean-square error (RMSE) obtained using the ML models ranged between 0 and 0.30 t/ha (metric tons/hectare) for the varied modeling process phases. The estimation of the sesame seed yield with the coupled PCA-ML models improved the performance accuracy. According to the k-fold process, we utilized the datasets with the lowest error rates to ensure the continued accuracy of the GPR and RBF models. The sensitivity analysis revealed that the capsule number per plant (CPP), seed number per capsule (SPC), and 1000-seed weight (TSW) were the most significant seed yield determinants.
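The two regression metrics reported in this abstract, the determination coefficient and the RMSE, follow directly from observed and predicted values. A minimal sketch with the standard formulas; the function names and the toy numbers are assumptions for illustration, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the target."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up yields in t/ha, purely illustrative.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE is expressed in the target's units (t/ha here), while R² is unitless, so the two are usually reported together.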
13

Mazimwe, Allan, Imed Hammouda and Anthony Gidudu. "An Empirical Evaluation of Disaster Data Interoperability—A Case of Uganda". ISPRS International Journal of Geo-Information 8, no. 11 (26 October 2019): 484. http://dx.doi.org/10.3390/ijgi8110484.

Abstract:
One of the grand challenges of disaster management is for stakeholders to be able to discover, access, integrate and analyze task-appropriate disaster data together with their associated algorithms and work-flows. Even with a growing number of initiatives to publish disaster data using open principles, integration and reuse are still difficult due to existing interoperability barriers within datasets. Several frameworks for assessing data interoperability exist but do not generate best practice solutions to existing barriers based on the assessment they use. In this study, we assess disaster data interoperability in Uganda and identify generic solutions to interoperability challenges in the context of disaster data. Semi-structured interviews and focus group discussions were used to collect qualitative data from stakeholders in the disaster sector in Uganda. Data interoperability was measured to provide an understanding of interoperability in the disaster sector. Interoperability maturity is measured using qualitative methods, while data compatibility metrics are computed from identifiers in the RDF-triple model. Results indicate high syntactic and technical interoperability maturity for disaster data. On the contrary, there exist considerable semantic and legal interoperability barriers that hinder disaster data integration and reuse. A mapping of the interoperability challenges in the disaster management sector to solutions reveals a potential to reuse established patterns for managing interoperability. These include the federated pattern, linked data patterns, broadcast pattern, rights and policy harmonization patterns, dissemination and awareness pattern, and ontology design patterns, among others. Thus a systematic approach to combining patterns is critical to managing data interoperability barriers among actors in the disaster management ecosystem.
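The abstract states that data compatibility metrics are computed from identifiers in the RDF-triple model but does not define them. As one possible illustration, the sketch below measures how many IRIs two triple sets share, using a Jaccard ratio. The metric, the function names, and the example.org IRIs are all assumptions and should not be read as the paper's method.

```python
def identifiers(triples):
    """Collect every IRI used as subject, predicate or object.

    Triples are (subject, predicate, object) tuples of strings; a term
    counts as an identifier if it looks like an http(s) IRI. Literals
    such as plain labels are ignored.
    """
    found = set()
    for triple in triples:
        for term in triple:
            if term.startswith(("http://", "https://")):
                found.add(term)
    return found


def identifier_overlap(triples_a, triples_b):
    """Jaccard overlap of the IRIs used by two RDF datasets (0.0 to 1.0)."""
    ids_a, ids_b = identifiers(triples_a), identifiers(triples_b)
    union = ids_a | ids_b
    return len(ids_a & ids_b) / len(union) if union else 0.0


# Hypothetical datasets from two disaster data publishers.
dataset_a = [
    ("http://example.org/flood1", "http://example.org/type", "http://example.org/Flood"),
    ("http://example.org/flood1", "http://example.org/label", "a flood event"),
]
dataset_b = [
    ("http://example.org/flood1", "http://example.org/type", "http://example.org/Disaster"),
]
overlap = identifier_overlap(dataset_a, dataset_b)
```

A low overlap suggests the two publishers name the same things differently, which is the kind of semantic barrier the abstract reports.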
14

Shakhovska, Natalya, Vitaliy Yakovyna and Valentyna Chopyak. "A new hybrid ensemble machine-learning model for severity risk assessment and post-COVID prediction system". Mathematical Biosciences and Engineering 19, no. 6 (2022): 6102–23. http://dx.doi.org/10.3934/mbe.2022285.

Abstract:
Starting from December 2019, the COVID-19 pandemic has globally strained medical resources and caused significant mortality. It is commonly recognized that the severity of SARS-CoV-2 disease depends on both the comorbidity and the state of the patient's immune system, which is reflected in several biomarkers. The development of early diagnosis and disease severity prediction methods can reduce the burden on the health care system and increase the effectiveness of treatment and rehabilitation of patients with severe cases. This study aims to develop and validate an ensemble machine-learning model based on clinical and immunological features for severity risk assessment and post-COVID rehabilitation duration for SARS-CoV-2 patients. The dataset consisting of 35 features and 122 instances was collected from Lviv regional rehabilitation center. The dataset contains age, gender, weight, height, BMI, CAT, 6-minute walking test, pulse, external respiration function, oxygen saturation, and 15 immunological markers used to predict the relationship between disease duration and biomarkers using the machine learning approach. The predictions are assessed through an area under the receiver-operating curve, classification accuracy, precision, recall, and F1 score performance metrics. A new hybrid ensemble feature selection model for a post-COVID prediction system is proposed as an automatic feature cut-off rank identifier. A three-layer high accuracy stacking ensemble classification model for intelligent analysis of short medical datasets is presented. Together with weak predictors, the associative rules allowed improving the classification quality. The proposed ensemble allows using a random forest model as an aggregator for weak repressors' results generalization. The performance of the three-layer stacking ensemble classification model (AUC 0.978; CA 0.920; F1 score 0.921; precision 0.924; recall 0.920) was higher than that of five machine learning models, viz. tree algorithm with forward pruning; Naïve Bayes classifier; support vector machine with RBF kernel; logistic regression, and a calibrated learner with sigmoid function and decision threshold optimization. Aging-related biomarkers, viz. CD3+, CD4+, CD8+, CD22+ were examined to predict post-COVID rehabilitation duration. The best accuracy was reached in the case of the support vector machine with the linear kernel (MAPE = 0.0787) and random forest classifier (RMSE = 1.822). The proposed three-layer stacking ensemble classification model predicted SARS-CoV-2 disease severity based on the cytokines and physiological biomarkers. The results point out that changes in studied biomarkers associated with the severity of the disease can be used to monitor the severity and forecast the rehabilitation duration.
15

Matczyszyn, Julianne N., Timothy Harris, Kirsten Powers, Sydney E. Everhart and Thomas O. Powers. "Ecological and morphological differentiation among COI haplotype groups in the plant parasitic nematode species Mesocriconema xenoplax". Journal of Nematology 54, no. 1 (1 February 2022): 1–24. http://dx.doi.org/10.2478/jofnem-2022-0009.

Abstract:
DNA barcoding with the mitochondrial COI gene reveals distinct haplotype subgroups within the monophyletic and parthenogenetic nematode species, Mesocriconema xenoplax. Biological attributes of these haplotype groups (HG) have not been explored. An analysis of M. xenoplax from 40 North American sites representing both native plant communities and agroecosystems was conducted to identify possible subgroup associations with ecological, physiological, or geographic factors. A dataset of 132 M. xenoplax specimens was used to generate sequences of a 712 bp region of the cytochrome oxidase subunit I gene. Maximum-likelihood and Bayesian phylogenies recognized seven COI HG (≥99/0.99 posterior probability/bootstrap value). Species delimitation metrics largely supported the genetic integrity of the HG. Discriminant function analysis of HG morphological traits identified stylet length, total body length, and stylet knob width as the strongest distinguishing features among the seven groups, with stylet length as the strongest single distinguishing morphological feature. Multivariate analysis identified land cover, ecoregion, and maximum temperature as predictors of 53.6% of the total variation (P = 0.001). Within land cover, HG categorized under “herbaceous,” “woody wetlands,” and “deciduous forest” were distinct in DAPC and RDA analyses and were significantly different (analysis of molecular variance P = 0.001). These results provide empirical evidence for molecular, morphological, and ecological differentiation associated with HG within the monophyletic clade that represents the species Mesocriconema xenoplax.
16

V, Sudha, and Girijamma H. A. "SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fuzzy Cluster based Nearest Neighbor Classifier". International Journal of Electrical and Computer Engineering (IJECE) 8, no. 6 (1 December 2018): 4505. http://dx.doi.org/10.11591/ijece.v8i6.pp4505-4518.

Abstract:
In the classification of many diseases, accurate gene analysis is needed, for which selection of the most informative genes is very important and requires a decision technique suited to a complex, ambiguous context. Traditional methods for selecting the most significant genes include statistical analyses such as the 2-Sample-T-test (2STT), Entropy, and Signal to Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using a structured complex decision technique (SCDT) and classifies using a fuzzy cluster based nearest neighbor classifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated using leave-one-out cross-validation (LOOCV) along with sensitivity, specificity, precision and F1-score, with four different classifiers, namely 1) Radial Basis Function (RBF), 2) Multi-layer perceptron (MLP), 3) Feed Forward (FF) and 4) Support vector machine (SVM), for three different datasets: DLBCL, Leukemia and Prostate tumor. The proposed SCDT & FC-NNC exhibits superior results and can be considered a more accurate decision mechanism.
17

Venkatesh, B., and J. Anuradha. "A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data". International Journal of Knowledge-based and Intelligent Engineering Systems 24, no. 4 (18 January 2021): 289–301. http://dx.doi.org/10.3233/kes-190134.

Abstract:
In microarray data, it is difficult to achieve high classification accuracy due to the presence of high dimensionality and irrelevant, noisy data. Such data also contain many gene expression values and few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted; this can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique is used for aggregating the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses the Fuzzy Gaussian membership function ordering for aggregating the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and the RBF Kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods using five benchmark datasets. For evaluation, various performance metrics such as Accuracy, Recall, Precision, and F1-Score are used. Furthermore, the experimental results show that the proposed method outperforms the other feature selection methods.
18

Ul Din, Shaker, and Hugo Wai Leung Mak. "Retrieval of Land-Use/Land Cover Change (LUCC) Maps and Urban Expansion Dynamics of Hyderabad, Pakistan via Landsat Datasets and Support Vector Machine Framework." Remote Sensing 13, no. 16 (August 23, 2021): 3337. http://dx.doi.org/10.3390/rs13163337.

Abstract:
Land-use/land cover change (LUCC) is an important problem in developing and underdeveloped countries with regard to global climatic changes and urban morphological distribution. Since the 1900s, urbanization has become an underlying cause of LUCC, and more than 55% of the world’s population resides in cities. Rapid growth, development and expansion of urban centers, rapid population growth, land scarcity, the need for more manufacturing, and technological advancement remain among the drivers of LUCC around the globe at present. In this study, the urban expansion or sprawl, together with the spatial dynamics of Hyderabad, Pakistan over the last four decades, was investigated and reviewed, based on remotely sensed Landsat images from 1979 to 2020. In particular, radiometric and atmospheric corrections were applied to the raw images, then the Gaussian-based Radial Basis Function (RBF) kernel was used for training, within the 10-fold support vector machine (SVM) supervised classification framework. After spatial LUCC maps were retrieved, metrics such as Producer’s Accuracy (PA), User’s Accuracy (UA) and the Kappa coefficient (KC) were adopted for spatial accuracy assessment to ensure the reliability of the proposed satellite-based retrieval mechanism. Landsat-derived results showed an increase in built-up area and a decrease in vegetation and agricultural land. Built-up area covered only 30.69% of the total area in 1979, and reached 65.04% after four decades. In contrast, continuous reduction of agricultural land, vegetation, waterbody, and barren land was observed. Overall, throughout the four-decade period, the portions of agricultural land, vegetation, waterbody, and barren land decreased by 13.74%, 46.41%, 49.64% and 85.27%, respectively. These remotely observed changes highlight the spatial characteristics of “rural to urban transition” and socioeconomic development within a modernized city, Hyderabad, and open new windows for detecting potential land-use changes and laying down feasible future urban development and planning strategies.
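The accuracy measures named in this abstract can all be derived from a classification confusion matrix. The following is a minimal sketch, not the authors' implementation: it assumes rows hold reference (ground-truth) classes and columns hold mapped classes, that every class has at least one reference and one mapped sample, and the function name `accuracy_report` and the example counts are hypothetical.

```python
def accuracy_report(cm):
    """Return (producer's accuracies, user's accuracies, kappa).

    Assumption: cm[i][j] counts samples of reference class i
    that the classifier mapped to class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                        # reference totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # mapped totals
    diag = [cm[i][i] for i in range(k)]

    producers = [diag[i] / row_tot[i] for i in range(k)]  # PA: correct / reference total
    users = [diag[j] / col_tot[j] for j in range(k)]      # UA: correct / mapped total

    p_observed = sum(diag) / total
    p_chance = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return producers, users, kappa


# Hypothetical two-class example (built-up vs. other); the counts are made up.
pa, ua, kc = accuracy_report([[45, 5], [10, 40]])
```

Producer's accuracy reflects omission errors per reference class, user's accuracy reflects commission errors per mapped class, and kappa discounts the agreement expected by chance.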
19

Haq, Ejaz Ul, Huang Jianjun, Xu Huarong, Kang Li, and Lifen Weng. "A Hybrid Approach Based on Deep CNN and Machine Learning Classifiers for the Tumor Segmentation and Classification in Brain MRI." Computational and Mathematical Methods in Medicine 2022 (August 8, 2022): 1–18. http://dx.doi.org/10.1155/2022/6446680.

Abstract:
Conventional medical imaging and machine learning techniques are not accurate enough to correctly segment brain tumors in MRI, as the proper identification and segmentation of tumor borders is one of the most important criteria of tumor extraction. The existing approaches are time-consuming, invasive, and susceptible to human error. These drawbacks highlight the importance of developing a completely automated deep learning-based approach for segmentation and classification of brain tumors. Prompt segmentation and classification of a brain tumor are critical for accurate clinical diagnosis and adequate treatment. As a result, deep learning-based brain tumor segmentation and classification algorithms are extensively employed, and among them the CNN model performs particularly well. In this work, an integrated hybrid approach based on a deep convolutional neural network and machine learning classifiers is proposed for the accurate segmentation and classification of brain MRI tumors. In the first stage, a CNN is proposed to learn the feature map from the image space of brain MRI into the tumor marker region. In the second stage, a faster region-based CNN is developed for the localization of the tumor region, followed by a region proposal network (RPN). In the last stage, a deep convolutional neural network and machine learning classifiers are incorporated in series to further refine the segmentation and classification process and obtain more accurate results. The proposed model’s performance is assessed with evaluation metrics extensively used in medical image processing. The experimental results show that the proposed deep CNN and SVM-RBF classifier achieved an accuracy of 98.3% and a dice similarity coefficient (DSC) of 97.8% on the task of classifying brain tumors as glioma, meningioma, or pituitary using brain dataset-1, while on the Figshare dataset it achieved an accuracy of 98.0% and a DSC of 97.1% on the same task. The segmentation and classification results demonstrate that the proposed model outperforms state-of-the-art techniques by a significant margin.
20

Ali, Muhammad, Dost Muhammad Khan, Muhammad Aamir, Amjad Ali, and Zubair Ahmad. "Predicting the Direction Movement of Financial Time Series Using Artificial Neural Network and Support Vector Machine." Complexity 2021 (December 2, 2021): 1–13. http://dx.doi.org/10.1155/2021/2906463.

Abstract:
Prediction of financial time series such as stocks and stock indexes has remained a main focus of researchers because of their composite nature and instability in almost all developing and advanced countries. The main objective of this research is to predict the direction of movement of daily stock price indexes using an artificial neural network (ANN) and a support vector machine (SVM). The datasets used in this study are the KSE-100 index of the Pakistan stock exchange, the Korea composite stock price index (KOSPI), the Nikkei 225 index of the Tokyo stock exchange, and the Shenzhen stock exchange (SZSE) composite index for the ten years from 2011 to 2020. To build the architecture of a single-layer ANN and of SVM models with linear, radial basis function (RBF), and polynomial kernels, different technical indicators derived from daily stock trading data, such as closing, opening, daily high, and daily low prices, are used as inputs. Since both the ANN and SVM models were used as classifiers, accuracy and F-score, calculated from the confusion matrix, were used as performance metrics. The results show that the ANN performs better than the SVM models in terms of accuracy and F-score in predicting the direction of the daily closing price movement of the KSE-100, KOSPI, Nikkei 225, and SZSE composite indexes.
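As a reminder of how accuracy and F-score follow from a binary confusion matrix (for example, "up" versus "down" movement), here is a minimal sketch. It assumes the F-score meant is the balanced F1; the function name `binary_scores` and the example counts are hypothetical and do not come from the paper.

```python
def binary_scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts.

    Assumption: the positive class has at least one predicted and one
    actual sample, so no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1


# Hypothetical counts for 100 trading days; the numbers are made up.
acc, prec, rec, f1 = binary_scores(tp=40, fp=10, fn=20, tn=30)
```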
21

Khan, Bilal, Rashid Naseem, Muhammad Arif Shah, Karzan Wakil, Atif Khan, M. Irfan Uddin, and Marwan Mahmoud. "Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques." Journal of Healthcare Engineering 2021 (March 15, 2021): 1–16. http://dx.doi.org/10.1155/2021/8899263.

Abstract:
Software defect prediction (SDP) in the early phase of the software development life cycle (SDLC) remains a critical and important task. SDP has been studied extensively over the last few decades because it helps assure the quality of software systems. Quickly forecasting defective artifacts in software development can help the development team use existing resources more efficiently and effectively to deliver high-quality software products within a given or tight time frame. Previously, several researchers have developed models for defect prediction using machine learning (ML) and statistical techniques. ML methods are considered an effective approach for pinpointing defective modules by mining hidden patterns among software metrics (attributes). ML techniques have also been used by several researchers on healthcare datasets. This study applies different ML techniques to software defect prediction using seven widely used datasets. The ML techniques include the multilayer perceptron (MLP), support vector machine (SVM), decision tree (J48), radial basis function (RBF), random forest (RF), hidden Markov model (HMM), credal decision tree (CDT), K-nearest neighbor (KNN), average one dependency estimator (A1DE), and Naïve Bayes (NB). The performance of each technique is evaluated using different measures, including relative absolute error (RAE), mean absolute error (MAE), root mean squared error (RMSE), root relative squared error (RRSE), recall, and accuracy. Overall, RF performs best, with 88.32% average accuracy and a rank value of 2.96; SVM achieves the second-best performance, with 87.99% average accuracy and a rank value of 3.83. CDT places third, with 87.88% average accuracy and a rank value of 3.62. The comprehensive results of this research can serve as a reference point for new research in the SDP domain, so that any claim of improved prediction by a new technique or model can be benchmarked and verified.
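The four error measures listed in this abstract have standard definitions, sketched below under stated assumptions. RAE and RRSE are taken relative to a baseline that always predicts the mean of the actual values, which is the usual convention; the paper does not spell out its formulas, and the function name `error_measures` and the example values are hypothetical.

```python
import math


def error_measures(actual, predicted):
    """MAE, RMSE, RAE and RRSE for paired numeric values.

    Assumption: the actual values are not all identical, so the
    mean-predictor baseline has a non-zero error.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_base = sum(abs(a - mean_a) for a in actual)    # baseline absolute error
    sq_base = sum((a - mean_a) ** 2 for a in actual)   # baseline squared error
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_base,
        "RRSE": math.sqrt(sq_err / sq_base),
    }


# Hypothetical defect labels (1 = defective) and predictions; values are made up.
scores = error_measures(actual=[0, 1, 1, 0], predicted=[0, 1, 0, 0])
```

RAE and RRSE are often reported as percentages; multiplying the returned ratios by 100 gives that form.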
22

Ben Mahria, Bilal, Ilham Chaker, and Azeddine Zahi. "An empirical study on the evaluation of the RDF storage systems." Journal of Big Data 8, no. 1 (July 10, 2021). http://dx.doi.org/10.1186/s40537-021-00486-y.

Abstract:
In this paper, we introduce three new implementations of non-native methods for storing RDF data. These methods, named RDFSPO, RDFPC and RDFVP, are based respectively on the statement table, property table and vertical partitioning approaches. Equally important, we consider how to select the most relevant strategy for storing RDF data depending on the dataset characteristics. For this, we investigate the balance between two performance metrics, load time and query response time. In this context, we provide an empirical comparative study, on the one hand among the three proposed methods, and on the other hand between the proposed methods and existing ones, using various publicly available datasets. Finally, to further assess where statistically significant differences appear between the studied methods, we performed a statistical analysis based on the non-parametric Friedman test followed by a Nemenyi post-hoc test. The results clearly show that the proposed RDFVP method achieves highly competitive computational performance against other state-of-the-art methods in terms of load time and query response time.
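The statistical procedure mentioned at the end of this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: rows are datasets, columns are storage methods, lower measurements (for example load time) are better, tied values share the average rank, and no tie correction is applied to the statistic. The function names are hypothetical, and the critical value `q_alpha` for the Nemenyi test must be looked up in a studentized-range table by the caller.

```python
import math


def average_ranks(scores):
    """Mean rank of each method over all datasets (rank 1 = lowest value)."""
    n, k = len(scores), len(scores[0])
    sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            # Extend the run while the next value ties with the current one.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # average of 1-based ranks in the run
            for j in order[pos:end + 1]:
                sums[j] += shared
            pos = end + 1
    return [s / n for s in sums]


def friedman_statistic(scores):
    """Friedman chi-square statistic (k - 1 degrees of freedom)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


def nemenyi_critical_difference(q_alpha, k, n):
    """Two methods differ significantly if their mean ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Hypothetical load times of 3 methods on 4 datasets; the numbers are made up.
times = [[1, 2, 3], [1, 2, 3], [1, 3, 2], [2, 1, 3]]
stat = friedman_statistic(times)
```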
23

Touma, Roudy, Hazem Hajj, Wassim El-Hajj, and Khaled Shaban. "Automated Generation of Human-readable Natural Arabic Text from RDF Data." ACM Transactions on Asian and Low-Resource Language Information Processing, January 25, 2023. http://dx.doi.org/10.1145/3582262.

Abstract:
With the advances in Natural Language Processing (NLP), the industry has been moving towards human-directed artificial intelligence (AI) solutions. Recently, chatbots and automated news generation have captured a lot of attention. The goal is to automatically generate readable text from tabular data or web data commonly represented in Resource Description Framework (RDF) format. The problem can then be formulated as data-to-text (D2T) generation from structured non-linguistic data into human-readable natural language. Despite the significant work done for the English language, no efforts are being directed towards low-resource languages like Arabic. This work promotes the development of the first RDF data-to-text (D2T) generation system for the Arabic language while trying to address the low-resource limitation. We develop several models for the Arabic D2T task using transfer learning from large language models (LLMs) such as AraBERT, AraGPT2 and mT5. These models include a baseline Bi-LSTM Sequence-to-Sequence (Seq2Seq) model, as well as encoder-decoder transformers like BERT2BERT, BERT2GPT, and T5. We then provide a detailed comparative study highlighting the strengths and limitations of these methods, setting the stage for further advancement in the field. We also introduce a new Arabic dataset (AraWebNLG) that can be used for new model development in the field. To ensure a comprehensive evaluation, general-purpose automated metrics (BLEU and perplexity scores) are used, as well as task-specific human evaluation metrics related to the accuracy of the content selection and the fluency of the generated text. The results highlight the importance of pre-training on a large corpus of Arabic data and show that transfer learning from AraBERT gives the best performance. Text-to-text pre-training using mT5 achieves the second-best performance even with multilingual weights.
24

Guo, Jimao, and Yi Wang. "RDF Graph Summarization Based on Node Characteristic and Centrality." Journal of Web Engineering, December 6, 2022. http://dx.doi.org/10.13052/jwe1540-9589.2174.

Abstract:
The explosive growth of RDF data makes it difficult to query, understand and use efficiently. RDF graph (RDFG) summarization aims to extract the most relevant and crucial data as summaries according to different criteria. Current summarization approaches mainly apply single strategies such as graph structure, pattern mining or relevance metrics to calculate RDFG summaries. Unlike existing approaches, this paper proposes a summarization approach for automatically generating RDFG summaries that capture both structure and centrality information. Specifically, we present three algorithms: SumW (merging nodes based on node characteristics or similar types), SumS (merging nodes based on typed node characteristics) and SummaryFL (retrieving central nodes by combining node frequency and bridging coefficient). The three algorithms can be used in two summarization strategies: SumS or SumW only, and SumS+SummaryFL or SumW+SummaryFL. We conducted experiments over large, real-world RDF datasets to verify the effectiveness of our method with respect to time complexity, compression capability and coverage of the summary. The experimental results demonstrate that our approach outperforms the compared algorithms.
25

Zloch, Matthäus, Maribel Acosta, Daniel Hienert, Stefan Conrad, and Stefan Dietze. "Characterizing RDF graphs through graph-based measures – framework and assessment." Semantic Web, October 20, 2020, 1–24. http://dx.doi.org/10.3233/sw-200409.

Abstract:
The topological structure of RDF graphs inherently differs from other types of graphs, like social graphs, due to the pervasive existence of hierarchical relations (TBox), which complement transversal relations (ABox). Graph measures capture such particularities through descriptive statistics. Besides the classical set of measures established in the field of network analysis, such as size and volume of the graph or the type of degree distribution of its vertices, there has been some effort to define measures that capture some of the aforementioned particularities RDF graphs adhere to. However, some of them are redundant, computationally expensive, and not meaningful enough to describe RDF graphs. In particular, it is not clear which of them are efficient metrics to capture specific distinguishing characteristics of datasets in different knowledge domains (e.g., Cross Domain vs. Linguistics). In this work, we address the problem of identifying a minimal set of measures that is efficient, essential (non-redundant), and meaningful. Based on 54 measures and a sample of 280 graphs of nine knowledge domains from the Linked Open Data Cloud, we identify an essential set of 13 measures, having the capacity to describe graphs concisely. These measures have the capacity to present the topological structures and differences of datasets in established knowledge domains.
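To make the idea of graph-based measures concrete, the sketch below computes a handful of classical ones (size, volume, degree statistics) from a list of triples. It is only an illustration and not the 54-measure framework of the paper: it assumes each triple is a `(subject, predicate, object)` tuple of strings, treats every object (including literals) as a vertex, and uses a hypothetical function name and made-up `ex:` identifiers.

```python
from collections import Counter


def basic_graph_measures(triples):
    """A few descriptive measures of an RDF graph given as (s, p, o) tuples."""
    edges = set(triples)  # an RDF graph is a set, so duplicate triples collapse
    out_deg = Counter(s for s, _, _ in edges)
    in_deg = Counter(o for _, _, o in edges)
    vertices = set(out_deg) | set(in_deg)
    n, m = len(vertices), len(edges)
    return {
        "vertices": n,
        "edges": m,
        "mean_degree": 2 * m / n,  # each edge adds one out-degree and one in-degree
        "max_in_degree": max(in_deg.values()),
        "max_out_degree": max(out_deg.values()),
        "distinct_predicates": len({p for _, p, _ in edges}),
    }


# Hypothetical toy graph mixing a hierarchical (rdf:type) and a transversal relation.
toy = [
    ("ex:a", "rdf:type", "ex:C"),
    ("ex:b", "rdf:type", "ex:C"),
    ("ex:a", "ex:knows", "ex:b"),
]
measures = basic_graph_measures(toy)
```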
26

Rajan, Rajeev, and B. S. Shajee Mohan. "Distance Metric Learnt Kernel-Based Music Classification Using Timbral Descriptors." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 13 (October 2021). http://dx.doi.org/10.1142/s0218001421510149.

Abstract:
Automatic music genre classification based on distance metric learning (DML) is proposed in this paper. Three types of timbral descriptors, namely mel-frequency cepstral coefficient (MFCC) features, modified group delay features (MODGDF) and low-level timbral feature sets, are combined at the feature level. We experimented with k-nearest neighbor (kNN) and support vector machine (SVM)-based classifiers for standard and DML kernels (DMLK) using the GTZAN and Folk music datasets. With the best-performing RBF kernel, standard kernel-based kNN and SVM classifiers report classification accuracies of 79.03% and 90.16%, respectively, on the GTZAN dataset and 86.60% and 92.26%, respectively, on the Folk music dataset. A further improvement was observed when DML kernels were used in place of standard kernels in the kernel kNN and SVM-based classifiers, with accuracies of 84.46% and 92.74% (GTZAN) and 90.00% and 96.23% (Folk music dataset) for DMLK-kNN and DMLK-SVM, respectively. The results demonstrate the potential of DML kernels in the music genre classification task.
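For readers unfamiliar with the standard kernel used as the baseline here, the following sketch shows the RBF kernel and the feature-space distance it induces, which is what a kernel kNN classifier compares. This covers only the standard kernel; the learned DML kernel of the paper is not specified in the abstract and is not reproduced. The function names and the `gamma` default are assumptions.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)


def kernel_distance(x, y, kernel):
    """Distance in the kernel's feature space: sqrt(k(x,x) + k(y,y) - 2*k(x,y))."""
    value = kernel(x, x) + kernel(y, y) - 2 * kernel(x, y)
    return math.sqrt(max(0.0, value))  # clamp tiny negative rounding errors


# Hypothetical 2-dimensional feature vectors; the values are made up.
similarity = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
distance = kernel_distance([0.0, 0.0], [1.0, 1.0], rbf_kernel)
```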
27

Mohammed Amin, Tahsin Ali, Sabah Robitan Mahmood, Rebar Dara Mohammed, and Pshtiwan Jabar Karim. "A Novel Classification of Uncertain Stream Data using Ant Colony Optimization Based on Radial Basis Function." Kurdistan Journal of Applied Research, November 27, 2022, 57–70. http://dx.doi.org/10.24017/science.2022.2.5.

Abstract:
There are many potential sources of data uncertainty, such as imperfect measurement or sampling, intrusive environmental monitoring, unreliable sensor networks, and inaccurate medical diagnoses. To avoid unintended results, data mining from new applications like sensors and location-based services needs to be done with care. When attempting to classify data with a high degree of uncertainty, many researchers have turned to heuristic approaches and machine learning (ML) methods. In this paper, we propose a new ML method that fuses a Radial Basis Function (RBF) network with ant colony optimization (ACO). After introducing a large amount of uncertainty into a dataset, we normalize the data and finish training on clean data. The ant colony optimization algorithm is then used to train a recurrent neural network. Finally, we evaluate our proposed method against some of the most popular ML methods, including k-nearest neighbor, support vector machine, random forest, decision tree, logistic regression, and extreme gradient boosting (XGBoost). Error metrics show that our model significantly outperforms the gold standard and other popular ML methods. Using industry-standard performance metrics, the results of our experiments show that our proposed method classifies uncertain data better than other methods.
28

ÇETİN, Umut Ahmet, and Fatih ABUT. "COVID-19 Enfeksiyonunun Nitelik Seçme ile Birleştirilmiş Makine Öğrenmesi Yöntemleriyle Tahmin Edilmesi." European Journal of Science and Technology, June 30, 2022. http://dx.doi.org/10.31590/ejosat.1132337.

Abstract:
COVID-19 is an infection that has affected the world since December 31, 2019, and was declared a pandemic by WHO in March 2020. In this study, Multi-Layer Perceptron (MLP), Tree Boost (TB), Radial Basis Function Network (RBF), Support Vector Machine (SVM), and K-Means Clustering (kMC), each combined with minimum redundancy maximum relevance (mRMR) and Relief-F, have been used to construct new feature selection-based COVID-19 prediction models and to identify the influential variables for predicting COVID-19 infection. The dataset contains information on 20,000 patients (i.e., 10,000 positive, 10,000 negative) and includes several personal, symptomatic, and non-symptomatic variables. The accuracy, recall, and F1-score metrics have been used to assess the models’ performance, whereas the generalization errors of the models were evaluated using 10-fold cross-validation. The results show that the average performance of mRMR is slightly better than that of Relief-F in predicting the COVID-19 infection of a patient. In addition, mRMR is more successful than the Relief-F algorithm in finding the relative relevance order of the COVID-19 predictors. The mRMR algorithm emphasizes symptomatic variables such as fever and cough, whereas the Relief-F algorithm highlights non-symptomatic variables such as age and race. It has also been observed that, in general, MLP outperforms all other classifiers in predicting COVID-19 infection.
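The 10-fold cross-validation used here to estimate generalization error can be sketched as an index-splitting routine. This is a minimal illustration that assumes no shuffling and no stratification by class, which a real study on a balanced dataset might well add; the generator name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: samples are assigned to folds round-robin, without shuffling.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Every fold except the i-th contributes to the training set.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Each sample is tested exactly once and trained on in the other k - 1 rounds.
splits = list(k_fold_indices(20, k=10))
```

The generalization error is then estimated by averaging the chosen metric over the k held-out folds.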