Journal articles on the topic 'Best Subset Selection'


Consult the top 50 journal articles for your research on the topic 'Best Subset Selection.'


The full text of each publication can be downloaded as a PDF, and its abstract can be read online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Groz, Benoît, and Silviu Maniu. "Hypervolume Subset Selection with Small Subsets." Evolutionary Computation 27, no. 4 (December 2019): 611–37. http://dx.doi.org/10.1162/evco_a_00235.

Abstract:
The hypervolume subset selection problem (HSSP) aims at approximating a set of n multidimensional points in R^d with an optimal subset of a given size. The size k of the subset is a parameter of the problem, and an approximation is considered best when it maximizes the hypervolume indicator. This problem has proved popular in recent years as a procedure for multiobjective evolutionary algorithms. Efficient algorithms are known for planar points (d = 2), but there are hardly any results on HSSP in larger dimensions (d ≥ 3). So far, most algorithms in higher dimensions essentially enumerate all possible subsets to determine the optimal one, and most of the effort has been directed toward improving the efficiency of hypervolume computation. We propose efficient algorithms for the selection problem in dimension 3 when either k or n − k is small, and extend our techniques to arbitrary dimensions for k ≤ 3.
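
The enumeration baseline the abstract refers to is easy to state in code. Below is a minimal, illustrative Python sketch (not the authors' algorithm) for the 2-D case, assuming minimization objectives and a reference point dominated by every point; it scans all C(n, k) subsets, which is exactly the exponential cost the paper's algorithms avoid.

```python
from itertools import combinations

def hypervolume_2d(points, ref):
    """Area dominated by a mutually nondominated 2-D point set
    (minimization), bounded by the reference point ref."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(points):          # x ascending => y descending
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

def hssp_brute_force(points, k, ref):
    """Optimal size-k subset found by checking every C(n, k) candidate."""
    return max(combinations(points, k), key=lambda s: hypervolume_2d(s, ref))

front = [(1.0, 4.0), (2.0, 3.0), (3.0, 2.0), (4.0, 1.0)]
print(hssp_brute_force(front, k=2, ref=(5.0, 5.0)))
```
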
2

Tamura, Ryuta, Ken Kobayashi, Yuichi Takano, Ryuhei Miyashiro, Kazuhide Nakata, and Tomomi Matsui. "Best Subset Selection for Eliminating Multicollinearity." Journal of the Operations Research Society of Japan 60, no. 3 (2017): 321–36. http://dx.doi.org/10.15807/jorsj.60.321.

3

Alrefaei, Mahmoud H., and Mohammad Almomani. "Subset selection of best simulated systems." Journal of the Franklin Institute 344, no. 5 (August 2007): 495–506. http://dx.doi.org/10.1016/j.jfranklin.2006.02.020.

4

Bofinger, Eve, and Kerrie Mengersen. "Subset selection of the t best populations." Communications in Statistics - Theory and Methods 15, no. 10 (January 1986): 3145–61. http://dx.doi.org/10.1080/03610928608829299.

5

Takano, Yuichi, and Ryuhei Miyashiro. "Best subset selection via cross-validation criterion." TOP 28, no. 2 (February 14, 2020): 475–88. http://dx.doi.org/10.1007/s11750-020-00538-1.

6

Gupta, Shanti S., and Hwa-Ming Yang. "Subset selection procedures for the best population." Journal of Statistical Planning and Inference 12 (January 1985): 213–33. http://dx.doi.org/10.1016/0378-3758(85)90071-0.

7

Van der Laan, Paul. "Subset Selection of an Almost Best Treatment." Biometrical Journal 34, no. 6 (1992): 647–56. http://dx.doi.org/10.1002/bimj.4710340602.

8

Zhang, Zhongheng. "Variable selection with stepwise and best subset approaches." Annals of Translational Medicine 4, no. 7 (April 2016): 136. http://dx.doi.org/10.21037/atm.2016.03.35.

9

Zhu, Junxian, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang. "A polynomial algorithm for best-subset selection problem." Proceedings of the National Academy of Sciences 117, no. 52 (December 16, 2020): 33117–23. http://dx.doi.org/10.1073/pnas.2014241117.

Abstract:
Best-subset selection aims to find a small subset of predictors, so that the resulting linear model is expected to have the most desirable prediction accuracy. It is not only important and imperative in regression analysis but also has far-reaching applications in every facet of research, including computer science and medicine. We introduce a polynomial algorithm, which, under mild conditions, solves the problem. This algorithm exploits the idea of sequencing and splicing to reach a stable solution in finite steps when the sparsity level of the model is fixed but unknown. We define an information criterion that helps the algorithm select the true sparsity level with a high probability. We show that when the algorithm produces a stable optimal solution, that solution is the oracle estimator of the true parameters with probability one. We also demonstrate the power of the algorithm in several numerical studies.
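
As a concrete, if naive, reference point for what "best subset" means here: the sketch below enumerates all supports up to a given sparsity level and scores each with a BIC-style criterion. It is illustrative only — a stand-in, not the paper's method, whose splicing algorithm reaches a solution without this exhaustive scan and whose information criterion is its own construction rather than BIC.

```python
import numpy as np
from itertools import combinations

def best_subset_ic(X, y, max_k):
    """Exhaustive best-subset search scored by a BIC-style criterion.
    Returns (criterion_value, support) of the best model found."""
    n, p = X.shape
    best = (np.inf, ())
    for k in range(1, max_k + 1):
        for S in combinations(range(p), k):
            beta, *_ = np.linalg.lstsq(X[:, list(S)], y, rcond=None)
            rss = float(np.sum((y - X[:, list(S)] @ beta) ** 2))
            ic = n * np.log(rss / n + 1e-12) + k * np.log(n)
            best = min(best, (ic, S))
    return best
```
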
10

Bertsimas, Dimitris, Angela King, and Rahul Mazumder. "Best subset selection via a modern optimization lens." Annals of Statistics 44, no. 2 (April 2016): 813–52. http://dx.doi.org/10.1214/15-aos1388.
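
For readers unfamiliar with the optimization view this paper popularized: best subset selection with at most k nonzero coefficients is naturally written as a mixed-integer quadratic program. A standard big-M formulation (with M a bound on coefficient magnitudes) is:

```latex
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \ \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
\qquad \text{s.t.} \qquad
-M z_j \le \beta_j \le M z_j \ \ (j = 1,\dots,p), \qquad \sum_{j=1}^{p} z_j \le k .
```
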

11

Wang, Huanjing, Taghi M. Khoshgoftaar, and Naeem Seliya. "On the Stability of Feature Selection Methods in Software Quality Prediction: An Empirical Investigation." International Journal of Software Engineering and Knowledge Engineering 25, no. 09n10 (November 2015): 1467–90. http://dx.doi.org/10.1142/s0218194015400288.

Abstract:
Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in current under-development code. This has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is the problem of high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction results, so selecting a subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one which maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study, we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of the feature selection methods. We evaluate the stability of feature selection methods on pairs of subsamples generated by our fixed-overlap partitions algorithm. Four different levels of overlap are considered in this study. Thirteen software metric datasets from two real-world software projects are used. Results demonstrate that ReliefF (RF) is the most stable feature selection method and wrapper-based feature subset selection shows the least stability. In addition, as the overlap of the partitions increased, the stability of the feature selection strategies increased.
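
Assuming the Tanimoto index of two feature subsets is the usual set similarity |A ∩ B| / |A ∪ B|, the stability score described above can be sketched in a few lines of Python (the metric names below are hypothetical):

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def average_pairwise_tanimoto(subsets):
    """Mean similarity over all pairs of selected subsets: higher = more stable."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

runs = [{"loc", "cbo", "wmc"}, {"loc", "cbo", "rfc"}, {"loc", "wmc", "rfc"}]
print(average_pairwise_tanimoto(runs))   # 0.5
```
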
12

Wang, Xi, Qiang Li, and Zhi Hong Xie. "New Feature Selection Method Based on SVM-RFE." Advanced Materials Research 926-930 (May 2014): 3100–3104. http://dx.doi.org/10.4028/www.scientific.net/amr.926-930.3100.

Abstract:
This article analyzes the defects of the SVM-RFE feature selection algorithm and puts forward a new feature selection method that combines SVM-RFE and PCA. First, the best feature subset is obtained through k-fold cross-validation based on SVM-RFE. Then, PCA reduces the dimension of that feature subset and yields an independent feature subset, which serves as the training and testing set of the SVM. Experiments on five UCI datasets indicate that the training and testing time was shortened and the recognition accuracy of the SVM was higher.
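
One plausible reading of this two-stage design as a scikit-learn pipeline (the subset size and variance threshold below are illustrative assumptions, not the paper's cross-validated choices):

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Stage 1: SVM-RFE keeps the top-ranked features.
# Stage 2: PCA decorrelates them into an "independent" subset.
# Stage 3: the final SVM is trained on the PCA scores.
pipe = Pipeline([
    ("svm_rfe", RFE(SVC(kernel="linear"), n_features_to_select=10)),
    ("pca", PCA(n_components=0.95)),   # keep 95% of the variance
    ("svm", SVC()),
])
# pipe.fit(X_train, y_train); pipe.score(X_test, y_test)
```
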
13

Horn, Manfred, and Rüdiger Vollandt. "Subset selection of the best treatment via many-one tests." Communications in Statistics - Theory and Methods 25, no. 6 (January 1996): 1335–49. http://dx.doi.org/10.1080/03610929608831768.

14

Ferré, Joan, and F. Xavier Rius. "Selection of the Best Calibration Sample Subset for Multivariate Regression." Analytical Chemistry 68, no. 9 (January 1996): 1565–71. http://dx.doi.org/10.1021/ac950482a.

15

Sarkar, A., and K. M. S. Sharma. "An approach to direct selection of best subset AR model." Journal of Statistical Computation and Simulation 56, no. 3 (February 1997): 273–91. http://dx.doi.org/10.1080/00949659708811793.

16

Liu, Keng-Hao, Yu-Kai Chen, and Tsun-Yang Chen. "A Band Subset Selection Approach Based on Sparse Self-Representation and Band Grouping for Hyperspectral Image Classification." Remote Sensing 14, no. 22 (November 10, 2022): 5686. http://dx.doi.org/10.3390/rs14225686.

Abstract:
Band subset selection (BSS) is one of the ways to implement band selection (BS) for a hyperspectral image (HSI). Different from conventional BS methods, which select bands one by one, BSS selects a band subset each time and preserves the best one from the collection of band subsets. This paper proposes a BSS method, called band grouping-based sparse self-representation BSS (BG-SSRBSS), for hyperspectral image classification. It formulates BS as a sparse self-representation (SSR) problem in which the entire band set can be represented by a set of informatively complementary bands. BG-SSRBSS consists of two steps. To tackle the issue of selecting redundant bands, it first applies band grouping (BG) techniques to pre-group the bands into multiple band groups, and then performs band group subset selection (BGSS) to find the optimal band group subset. The corresponding representative bands are taken as the BS result. To efficiently find the nearly global optimal subset among all possible band group subsets, sequential and successive iterative search algorithms are adopted. Land cover classification experiments conducted on three real HSI datasets show that BG-SSRBSS can improve classification accuracy by 4–20% compared to existing BSS methods and requires less computation time.
17

Miao, Maoxuan, Jinran Wu, Fengjing Cai, and You-Gan Wang. "A Modified Memetic Algorithm with an Application to Gene Selection in a Sheep Body Weight Study." Animals 12, no. 2 (January 15, 2022): 201. http://dx.doi.org/10.3390/ani12020201.

Abstract:
Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily leads to over-fitting, missing an important gene can have an even more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the weak exploitation capability of the traditional genetic algorithm and to reduce the dimension of the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement also rests on two further novel aspects: (a) updating subsets of genes iteratively until the loss function can no longer be reduced by splicing, which increases the probability of selecting the true subsets of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of gene subsets. In addition, the mutation operator is replaced by the improved splicing method to enhance exploitation capability, and initial individuals are improved by it to enhance search efficiency. A dataset of the body weight of Hu sheep was used to evaluate the superiority of the modified MA against the genetic algorithm. According to our experimental results, the proposed optimizer obtains a better minimal subset of genes within a few iterations, compared with all considered algorithms, including the most advanced adaptive best-subset selection algorithm.
18

Hazimeh, Hussein, and Rahul Mazumder. "Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms." Operations Research 68, no. 5 (September 2020): 1517–37. http://dx.doi.org/10.1287/opre.2019.1919.

Abstract:
In several scientific and industrial applications, it is desirable to build compact, interpretable learning models where the output depends on a small number of input features. Recent work has shown that such best-subset selection-type problems can be solved with modern mixed integer optimization solvers. Despite their promise, such solvers often come at a steep computational price when compared with open-source, efficient specialized solvers based on convex optimization and greedy heuristics. In “Fast Best-Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms,” Hussein Hazimeh and Rahul Mazumder push the frontiers of computation for best-subset-type problems. Their algorithms deliver near-optimal solutions for problems with up to a million features—in times comparable with the fast convex solvers. Their work suggests that principled optimization methods play a key role in devising tools central to interpretable machine learning, which can help in gaining a deeper understanding of their statistical properties.
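
The core coordinate-descent idea for the L0-penalized least-squares objective min ½‖y − Xβ‖² + λ‖β‖₀ can be sketched compactly, assuming unit-norm columns. This is a bare-bones illustration of the first ingredient only; the paper layers local combinatorial (swap) moves and substantial engineering on top of it.

```python
import numpy as np

def l0_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_0,
    assuming the columns of X have unit L2 norm."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                         # current residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            b_old = beta[j]
            rho = X[:, j] @ r + b_old    # best unpenalized value for beta_j
            # keeping coordinate j pays off only if 0.5*rho^2 > lam
            b_new = rho if 0.5 * rho * rho > lam else 0.0
            if b_new != b_old:
                r += X[:, j] * (b_old - b_new)
                beta[j] = b_new
    return beta
```
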
19

Zhang, John, Pinyuen Chen, and Yue Fang. "A two-stage procedure on comparing several experimental treatments and a control—the common and unknown variance case." Journal of Applied Mathematics and Decision Sciences 2005, no. 1 (January 1, 2005): 47–58. http://dx.doi.org/10.1155/jamds.2005.47.

Abstract:
This paper introduces a two-stage selection rule to compare several experimental treatments with a control when the variances are common and unknown. The selection rule integrates the indifference zone approach and the subset selection approach in multiple-decision theory. Two mutually exclusive subsets of the parameter space are defined, one is called the preference zone (PZ) and the other, the indifference zone (IZ). The best experimental treatment is defined to be the experimental treatment with the largest population mean. The selection procedure opts to select only the experimental treatment which corresponds to the largest sample mean when the parameters are in the PZ, and selects a subset of the experimental treatments and the control when the parameters fall in the IZ. The concept of a correct decision is defined differently in these two zones. A correct decision in the preference zone (CD1) is defined to be the event that the best experimental treatment is selected. In the indifference zone, a selection is called correct (CD2) if the selected subset contains the best experimental treatment. Theoretical results on the lower bounds for P(CD1) in PZ and P(CD2) in IZ are developed. A table is computed for the implementation of the selection procedure.
20

Ehrman, Chaim Meyer, Abba Krieger, and Klaus J. Miescke. "Subset Selection toward Optimizing the Best Performance at a Second Stage." Journal of Business & Economic Statistics 5, no. 2 (April 1987): 295. http://dx.doi.org/10.2307/1391911.

21

Ehrman, Chaim Meyer, Abba Krieger, and Klaus J. Miescke. "Subset Selection Toward Optimizing the Best Performance at a Second Stage." Journal of Business & Economic Statistics 5, no. 2 (April 1987): 295–303. http://dx.doi.org/10.1080/07350015.1987.10509589.

22

Li, Guorong, Qingming Huang, Junbiao Pang, Shuqiang Jiang, and Lei Qin. "Online selection of the best k-feature subset for object tracking." Journal of Visual Communication and Image Representation 23, no. 2 (February 2012): 254–63. http://dx.doi.org/10.1016/j.jvcir.2011.11.001.

23

Carter, Knute D., and Joseph E. Cavanaugh. "Best-subset model selection based on multitudinal assessments of likelihood improvements." Journal of Applied Statistics 47, no. 13-15 (July 25, 2019): 2384–420. http://dx.doi.org/10.1080/02664763.2019.1645097.

24

Guo, Hao Yan, and Da Zheng Wang. "A Multilevel Optimal Feature Selection and Ensemble Learning for a Specific CAD System-Pulmonary Nodule Detection." Applied Mechanics and Materials 380-384 (August 2013): 1593–99. http://dx.doi.org/10.4028/www.scientific.net/amm.380-384.1593.

Abstract:
The traditional motivation behind feature selection algorithms is to find the best subset of features for a task using one particular learning algorithm. However, it has often been found that no single classifier is entirely satisfactory for a particular task. Therefore, how to further improve the performance of these single systems on the basis of the previous optimal feature subset is a very important issue. We investigate the notion of optimal feature selection and present a practical feature selection approach that is based on an optimal feature subset of a single CAD system, referred to as a multilevel optimal feature selection method (MOFS) in this paper. Through MOFS, we select the different optimal feature subsets in order to eliminate features that are redundant or irrelevant and obtain optimal features.
25

Zuo, Guoyu, Zhaokun Xu, Jiahao Lu, and Daoxiong Gong. "Feature subset evaluation method for upper limb rehabilitation training based on joint feature discernibility." International Journal of Distributed Sensor Networks 15, no. 3 (March 2019): 1550147719838467. http://dx.doi.org/10.1177/1550147719838467.

Abstract:
A hybrid feature subset discernibility evaluation method using a Fisher score based on joint features and a support vector machine is proposed for the feature selection problem of upper limb rehabilitation training motions of Brunnstrom stage 4–5 patients. In this method, joint features are introduced to evaluate the discernibility between classes due to the joint effect of both candidate and selected features. A feature subset search strategy is used to search a set of candidate feature subsets. The Fisher score based on joint features is used to evaluate the candidate feature subsets, and the best subset is selected as the new selected feature subset. From the selected subsets obtained by this process, the subset with the best support vector machine classification performance is finally chosen as the optimal feature subset. Experiments were carried out on upper limb routine rehabilitation training samples of Brunnstrom stages 4–5. Compared with both the F-score and the discernibility of feature subset methods, the experimental results show the effectiveness and feasibility of the proposed method, which can obtain feature subsets with higher accuracy and smaller feature dimension.
26

Alrefaei, Mahmoud H., Mohammad H. Almomani, and Sarah N. Alabed Alhadi. "Selecting the best stochastic systems for large scale engineering problems." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 5 (October 1, 2021): 4289. http://dx.doi.org/10.11591/ijece.v11i5.pp4289-4299.

Abstract:
Selecting a subset of the best solutions among large-scale problems is an important area of research. When the alternative solutions are stochastic in nature, the problem becomes even more demanding. The objective of this paper is to select a set that is likely to contain the actual best solutions with high probability. If the selected set contains all the best solutions, the selection is denoted a correct selection. We are interested in maximizing the probability of this selection, P(CS). In many cases, the available computation budget for simulating the solution set in order to maximize P(CS) is limited. Therefore, instead of distributing these computational efforts equally among the alternatives, the optimal computing budget allocation (OCBA) procedure puts more effort on the solutions that have more impact on the selected set. In this paper, we derive formulas for how to distribute the available budget asymptotically to approximate P(CS). We then present a procedure that uses OCBA with ordinal optimization (OO) to select the set of best solutions. The properties and performance of the proposed procedure are illustrated through a numerical example. Overall results indicate that the procedure is able to select a subset of the best systems with a high probability of correct selection using a small number of simulation samples under different parameter settings.
27

Mbakop, Eric, and Max Tabord-Meehan. "Model Selection for Treatment Choice: Penalized Welfare Maximization." Econometrica 89, no. 2 (2021): 825–48. http://dx.doi.org/10.3982/ecta16437.

Abstract:
This paper studies a penalized statistical decision rule for the treatment assignment problem. Consider the setting of a utilitarian policy maker who must use sample data to allocate a binary treatment to members of a population, based on their observable characteristics. We model this problem as a statistical decision problem where the policy maker must choose a subset of the covariate space to assign to treatment, out of a class of potential subsets. We focus on settings in which the policy maker may want to select amongst a collection of constrained subset classes: examples include choosing the number of covariates over which to perform best‐subset selection, and model selection when approximating a complicated class via a sieve. We adapt and extend results from statistical learning to develop the Penalized Welfare Maximization (PWM) rule. We establish an oracle inequality for the regret of the PWM rule which shows that it is able to perform model selection over the collection of available classes. We then use this oracle inequality to derive relevant bounds on maximum regret for PWM. An important consequence of our results is that we are able to formalize model‐selection using a “holdout” procedure, where the policy maker would first estimate various policies using half of the data, and then select the policy which performs the best when evaluated on the other half of the data.
28

Wang, Lening, Pang Du, and Ran Jin. "MOSS—Multi-Modal Best Subset Modeling in Smart Manufacturing." Sensors 21, no. 1 (January 1, 2021): 243. http://dx.doi.org/10.3390/s21010243.

Abstract:
Smart manufacturing, which integrates a multi-sensing system with physical manufacturing processes, has been widely adopted in industry to support online and real-time decision making and improve manufacturing quality. A multi-sensing system for each specific manufacturing process can efficiently collect the in situ process variables from different sensor modalities to reflect process variations in real time. In practice, however, the budget rarely allows equipping each manufacturing process with many sensors. Moreover, it is also important to interpret the relationship between the sensing modalities and the quality variables based on the model. Therefore, it is necessary to model the quality-process relationship by selecting the sensor modalities most relevant to the specific quality measurement from the multi-modal sensing system in smart manufacturing. In this research, we adopted the concept of best subset variable selection and proposed a new model called Multi-mOdal beSt Subset modeling (MOSS). The proposed MOSS can effectively select the important sensor modalities and improve the modeling accuracy in quality-process modeling via functional norms that characterize the overall effects of individual modalities. The significance of sensor modalities can be used to determine the sensor placement strategy in smart manufacturing. Moreover, the selected modalities can better interpret the quality-process model by identifying the most correlated root cause of quality variations. The merits of the proposed model are illustrated by both simulations and a real case study in an additive manufacturing (i.e., fused deposition modeling) process.
29

Guerreiro, Andreia P., Carlos M. Fonseca, and Luís Paquete. "Greedy Hypervolume Subset Selection in Low Dimensions." Evolutionary Computation 24, no. 3 (September 2016): 521–44. http://dx.doi.org/10.1162/evco_a_00188.

Abstract:
Given a nondominated point set X ⊂ R^d of size n and a suitable reference point r ∈ R^d, the Hypervolume Subset Selection Problem (HSSP) consists of finding a subset of size k ≤ n that maximizes the hypervolume indicator. It arises in connection with multiobjective selection and archiving strategies, as well as Pareto-front approximation postprocessing for visualization and/or interaction with a decision maker. Efficient algorithms to solve the HSSP are available only for the 2-dimensional case, achieving a time complexity of O(n(k + log n)). In contrast, the best upper bound available for d ≥ 3 is O(n^(d/2) log n + n^(n−k)). Since the hypervolume indicator is a monotone submodular function, the HSSP can be approximated to a factor of 1 − 1/e using a greedy strategy. In this article, greedy O(n log n)-time algorithms for the HSSP in 2 and 3 dimensions are proposed, matching the complexity of current exact algorithms for the 2-dimensional case, and considerably improving upon recent complexity results for this approximation problem.
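
A direct, unoptimized rendering of the greedy strategy in Python for the 2-D case, assuming minimization objectives and mutually nondominated input points; each pass adds the point with the largest marginal hypervolume, which is what the submodularity-based 1 − 1/e guarantee applies to. This naive version costs far more than the paper's O(n log n) algorithms.

```python
def hv2d(points, ref):
    """2-D hypervolume of a mutually nondominated set (minimization)."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(points):
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

def greedy_hssp(points, k, ref):
    """Greedily add the point with the largest marginal hypervolume gain."""
    chosen = []
    for _ in range(k):
        rest = [p for p in points if p not in chosen]
        chosen.append(max(rest, key=lambda p: hv2d(chosen + [p], ref)))
    return chosen

front = [(1.0, 4.0), (2.0, 3.0), (3.0, 2.0), (4.0, 1.0)]
print(greedy_hssp(front, k=2, ref=(5.0, 5.0)))
```
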
30

Saha, Subrata, Sanguthevar Rajasekaran, and Rampi Ramprasad. "Novel Randomized Feature Selection Algorithms." International Journal of Foundations of Computer Science 26, no. 03 (April 2015): 321–41. http://dx.doi.org/10.1142/s0129054115500185.

Abstract:
Feature selection is the problem of identifying a subset of the most relevant features in the context of model construction. This problem has been well studied and plays a vital role in machine learning. In this paper we present three randomized algorithms for feature selection. They are generic in nature and can be applied with any learning algorithm. The proposed algorithms can be thought of as a random walk in the space of all possible subsets of the features. We demonstrate the generality of our approaches using three different applications. The simulation results show that our feature selection algorithms outperform some of the best-known algorithms in the current literature.
31

Bidi, Noria, and Zakaria Elberrichi. "Using Penguins Search Optimization Algorithm for Best Features Selection for Biomedical Data Classification." International Journal of Organizational and Collective Intelligence 7, no. 4 (October 2017): 51–62. http://dx.doi.org/10.4018/ijoci.2017100103.

Abstract:
Feature selection is essential to improve classification effectiveness. This paper presents a new adaptive algorithm called FS-PeSOA (feature selection penguins search optimization algorithm), a meta-heuristic feature selection method based on the Penguins Search Optimization Algorithm (PeSOA). It is combined with different classifiers to find the best feature subset, the one achieving the highest classification accuracy. In order to explore the feature subset candidates, the bio-inspired PeSOA generates a trial feature subset during the process and estimates its fitness value using three classifiers for each case: Naive Bayes (NB), k-Nearest Neighbors (KNN), and Support Vector Machines (SVMs). Our proposed approach has been tested on six well-known benchmark datasets (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, Dermatology, Colon Tumor, and Prostate Cancer). Experimental results show that the classification accuracy of FS-PeSOA is the highest and very powerful across different datasets.
32

Huang, Yanrong, Zhan Zheng, and Bo Wei. "“Dimension Reduction: Feature Subset” Method for Selecting the Best Index Combination in Reputation Evaluation of Crowdsourcing Participants." Mobile Information Systems 2022 (October 12, 2022): 1–16. http://dx.doi.org/10.1155/2022/5008465.

Abstract:
An effective reputation evaluation mechanism is an essential guarantee for the healthy, orderly, and rapid development of the crowdsourcing mode. Aiming at the problems of an unsound reputation evaluation mechanism, a single reputation evaluation index, and the poor discrimination ability of crowdsourcing platforms, a “dimension reduction: feature subset” method for selecting the best reputation evaluation index combination of crowdsourcing participants is proposed. This method first selects the best dimensionality reduction method empirically, then uses a classifier as the evaluation function of feature selection, and applies the sequential backward selection (SBS) strategy to select the feature subset and reputation evaluation algorithm with the best classification performance. The experimental results show that the reputation evaluation method for crowdsourcing participants based on ReliefF-SVM has the best performance in terms of accuracy, F1 measure, and stability, and can select a comprehensive, objective, and effective combination of evaluation indices to distinguish the reputation status of crowdsourcing participants.
33

Merino, Ibon, Jon Azpiazu, Anthony Remazeilles, and Basilio Sierra. "Histogram-Based Descriptor Subset Selection for Visual Recognition of Industrial Parts." Applied Sciences 10, no. 11 (May 27, 2020): 3701. http://dx.doi.org/10.3390/app10113701.

Abstract:
This article deals with the 2D image-based recognition of industrial parts. Methods based on histograms are well known and widely used, but it is hard to find the best combination of histograms (the most distinctive one, for instance) for each situation without a high level of user expertise. We propose a descriptor subset selection technique that automatically selects the most appropriate descriptor combination and outperforms approaches involving single descriptors. We have considered both backward and forward mechanisms. Furthermore, to recognize the industrial parts, a supervised classification is used with the global descriptors as predictors, and several classification approaches are compared. Given our application, the best results are obtained with the Support Vector Machine using a combination of descriptors, increasing the F1 score by 0.031 with respect to the best single descriptor.
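
Assuming the histogram descriptors are concatenated into a feature matrix, the forward and backward mechanisms described here correspond to scikit-learn's sequential feature selection (the subset size and classifier below are illustrative):

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

# Forward: start empty and add the descriptor that helps CV accuracy most.
# Backward: start from all descriptors and drop the least useful one.
forward = SequentialFeatureSelector(
    SVC(), n_features_to_select=5, direction="forward", cv=5)
backward = SequentialFeatureSelector(
    SVC(), n_features_to_select=5, direction="backward", cv=5)
# forward.fit(X, y); X_reduced = forward.transform(X)
```
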
34

Sultan, Kiran, Ijaz Mansoor Qureshi, Aqdas Naveed Malik, and Muhammad Zubair. "Performance Analysis of Relay Subset Selection for Amplify-and-Forward Cognitive Relay Networks." Scientific World Journal 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/548082.

Abstract:
Cooperative communication is regarded as a key technology in wireless networks, including cognitive radio networks (CRNs); it increases the diversity order of the signal to combat the unfavorable effects of fading channels by allowing distributed terminals to collaborate through sophisticated signal processing. In underlay CRNs, the secondary users (SUs) active in the frequency band of the primary users (PUs) face strict interference constraints, which limit their transmit power and coverage area. Relay selection offers a potential solution to the challenges faced by underlay networks by selecting either the single best relay or a subset of the potential relay set under different design requirements and assumptions. The best relay selection schemes proposed in the literature for amplify-and-forward (AF) based underlay cognitive relay networks have been well studied in terms of outage probability (OP) and bit error rate (BER), an analysis that is still lacking for multiple relay selection schemes. The novelty of this work is to study the outage behavior of multiple relay selection in the underlay CRN and derive closed-form expressions for the OP and BER through the cumulative distribution function (CDF) of the SNR received at the destination. The effectiveness of relay subset selection is shown through simulation results.
35

Uraibi, Hassan S., Habshah Midi, and Sohel Rana. "Robust Stability Best Subset Selection for Autocorrelated Data Based on Robust Location and Dispersion Estimator." Journal of Probability and Statistics 2015 (2015): 1–8. http://dx.doi.org/10.1155/2015/432986.

Abstract:
The stability selection (multisplit) approach is a variable selection procedure that relies on multiple data splits to overcome the shortcomings that may affect a single split of the data. Unfortunately, this procedure yields very poor results in the presence of outliers and other contamination in the original data. The problem becomes more complicated when the regression residuals are serially correlated. This paper presents a new robust stability selection procedure to remedy the combined problem of autocorrelation and outliers. We demonstrate the good performance of our proposed robust selection method using real air quality data and a simulation study.
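
For orientation, the baseline (non-robust) multisplit idea can be sketched as follows: fit a sparse model on many random half-samples and keep the variables selected with high frequency. The paper's contribution, not reproduced here, is to replace the plain estimators with robust location and dispersion estimators so the frequencies survive outliers and autocorrelation. The penalty value below is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

def selection_frequencies(X, y, alpha=0.1, n_splits=100, seed=0):
    """How often each variable enters a sparse fit across random half-samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_splits):
        idx = rng.choice(n, size=n // 2, replace=False)
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        counts += coef != 0
    return counts / n_splits   # keep variables above a frequency threshold
```
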
36

Wang, Yue, Wenqi Lu, and Heng Lian. "Best subset selection for high-dimensional non-smooth models using iterative hard thresholding." Information Sciences 625 (May 2023): 36–48. http://dx.doi.org/10.1016/j.ins.2023.01.021.
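
Iterative hard thresholding (IHT), named in the title, is simplest to see for the smooth squared-error case: take a gradient step, then keep only the k largest-magnitude coefficients. A minimal sketch of that classical case follows (the paper itself extends the idea to non-smooth losses):

```python
import numpy as np

def iht(X, y, k, n_iter=200):
    """IHT for min ||y - X b||^2 subject to ||b||_0 <= k (squared loss)."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1/L for the quadratic loss
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = beta - step * (X.T @ (X @ beta - y))   # gradient step
        small = np.argsort(np.abs(beta))[:-k]         # all but the k largest
        beta[small] = 0.0                             # hard threshold
    return beta
```
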

37

Panthong, Rattanawadee, and Anongnart Srivihok. "Liver Cancer Classification Model Using Hybrid Feature Selection Based on Class-Dependent Technique for the Central Region of Thailand." Information 10, no. 6 (May 31, 2019): 187. http://dx.doi.org/10.3390/info10060187.

Abstract:
Liver cancer data typically form large multidimensional datasets. A dataset with a huge number of features and multiple classes may contain many features irrelevant to pattern classification in machine learning. Hence, feature selection improves the performance of the classification model and helps achieve maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach combining information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for a liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (an ensemble of classifiers, bagging, and AdaBoost) were applied to improve the classification performance. The IGSFS-CD method provided a good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1. In addition, LC_dataset II delivered the best performance with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance than the class-independent method. Furthermore, best feature subset selection can help reduce the complexity of the predictive model.
38

Su, Ran, Xinyi Liu, and Leyi Wei. "MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy." Briefings in Bioinformatics 21, no. 2 (March 12, 2019): 687–98. http://dx.doi.org/10.1093/bib/bbz021.

Abstract:
Recursive feature elimination (RFE), one of the most popular feature selection algorithms, has been extensively applied in bioinformatics. During training, a group of candidate subsets is generated by iteratively eliminating the least important features from the original features. However, how to determine the optimal subset among them remains ambiguous. In most current studies, either overall accuracy or subset size (SS) is used to select the most predictive features. Which of the two to use, or both, and how they affect prediction performance are still open questions. In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach that sufficiently considers the effect of both factors. The subset decision problem is mapped into a subset-accuracy space and becomes an energy-minimization problem. We also provide a mathematical description of the relationship between overall accuracy and SS using Gaussian mixture models together with spline fitting. Besides, we comprehensively reviewed a variety of state-of-the-art applications of RFE in bioinformatics and compared their ways of deciding the final subset from all candidate subsets with MinE-RFE on diverse bioinformatics datasets. Additionally, we compared MinE-RFE with some widely used feature selection algorithms. The comparative results demonstrate that the proposed approach exhibits the best performance among all approaches. To facilitate the use of MinE-RFE, we further established a user-friendly web server implementing the proposed approach, accessible at http://qgking.wicp.net/MinE/. We expect this web server will be a useful tool for the research community.
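
The candidate-generation step MinE-RFE starts from is easy to reproduce. The sketch below scores each RFE subset size by cross-validated accuracy and naively picks the argmax; the paper's energy-minimization rule over the subset-accuracy space is deliberately not reproduced here, and the classifier and sizes are illustrative.

```python
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def rfe_candidates(X, y, sizes):
    """(size, CV accuracy, feature mask) for every RFE candidate subset."""
    out = []
    for k in sizes:
        mask = RFE(SVC(kernel="linear"), n_features_to_select=k).fit(X, y).support_
        acc = cross_val_score(SVC(), X[:, mask], y, cv=5).mean()
        out.append((k, acc, mask))
    return out

# naive decision rule:
# best = max(rfe_candidates(X, y, range(1, 20)), key=lambda t: t[1])
```
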
39

Liu, Si, Hairong Liu, Longin Jan Latecki, Shuicheng Yan, Changsheng Xu, and Hanqing Lu. "Size Adaptive Selection of Most Informative Features." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 392–97. http://dx.doi.org/10.1609/aaai.v25i1.7902.

Abstract:
In this paper, we propose a novel method to select the most informative subset of features, which has little redundancy and very strong discriminating power. Our proposed approach automatically determines the optimal number of features and selects the best subset accordingly by maximizing the average pairwise informativeness, and thus has an obvious advantage over traditional filter methods. By relaxing the essential combinatorial optimization problem into the standard quadratic programming problem, the most informative feature subset can be obtained efficiently, and a strategy to dynamically compute the redundancy between feature pairs further greatly accelerates our method by avoiding unnecessary computations of mutual information. As shown by the extensive experiments, the proposed method can successfully select the most informative subset of features, and the obtained classification results significantly outperform the state-of-the-art results on most test datasets.
40

Wang, G., Q. Song, H. Sun, X. Zhang, B. Xu, and Y. Zhou. "A Feature Subset Selection Algorithm Automatic Recommendation Method." Journal of Artificial Intelligence Research 47 (May 15, 2013): 1–34. http://dx.doi.org/10.1613/jair.3831.

Abstract:
Many feature subset selection (FSS) algorithms have been proposed, but not all of them are appropriate for a given feature selection problem. At the same time, there is so far rarely a good way to choose appropriate FSS algorithms for the problem at hand. Thus, automatic FSS algorithm recommendation is very important and practically useful. In this paper, a meta-learning-based FSS algorithm recommendation method is presented. The proposed method first identifies the datasets most similar to the one at hand using the k-nearest-neighbor classification algorithm, with the distances among datasets calculated from commonly used dataset characteristics. Then, it ranks all candidate FSS algorithms according to their performance on these similar datasets and chooses the algorithms with the best performance as the appropriate ones. The performance of the candidate FSS algorithms is evaluated by a multi-criteria metric that takes into account not only the classification accuracy over the selected features, but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real-world datasets with 22 well-known and frequently used FSS algorithms for five representative classifiers. The results show the effectiveness of our proposed FSS algorithm recommendation method.
41

Salem, Omar A. M., and Liwei Wang. "Fuzzy Mutual Information Feature Selection Based on Representative Samples." International Journal of Software Innovation 6, no. 1 (January 2018): 58–72. http://dx.doi.org/10.4018/ijsi.2018010105.

Abstract:
Building classification models from real-world datasets has become a difficult task, especially for datasets with high-dimensional features. Unfortunately, these datasets may include irrelevant or redundant features which have a negative effect on classification performance. Selecting the significant features and eliminating undesirable ones can improve the classification models. Fuzzy mutual information is a widely used feature selection measure for finding the best feature subset before the classification process. However, it requires considerable computation and storage space. To overcome these limitations, this paper proposes an improved fuzzy mutual information feature selection based on representative samples. Experiments on benchmark datasets show that the proposed method achieves better results in terms of classification accuracy, selected feature subset size, storage, and stability.
42

Li, Jialian, Chao Du, and Jun Zhu. "A Bayesian Approach for Subset Selection in Contextual Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 8384–91. http://dx.doi.org/10.1609/aaai.v35i9.17019.

Abstract:
Subset selection in contextual bandits (CB) is an important task in various applications such as advertisement recommendation. In CB, arms are associated with contexts and are thus correlated in the context space. Proper exploration for subset selection in CB should carefully consider the contexts. Previous works mainly concentrate on identifying the single best arm in linear bandit problems, where the expected rewards depend linearly on the contexts. However, these methods rely heavily on linearity and cannot easily be extended to more general cases. We propose a novel Bayesian approach for subset selection in general CB where the reward functions can be nonlinear. Our method provides a principled way to employ contextual information and efficiently explore the arms. For cases with relatively smooth posteriors, we give theoretical results that are comparable to previous works. For general cases, we provide a calculable approximate variant. Empirical results show the effectiveness of our method on both linear bandits and general CB.
43

Ahmed, Imtiaz, Amir Nasri, Diomidis S. Michalopoulos, Robert Schober, and Ranjan K. Mallik. "Relay Subset Selection and Fair Power Allocation for Best and Partial Relay Selection in Generic Noise and Interference." IEEE Transactions on Wireless Communications 11, no. 5 (May 2012): 1828–39. http://dx.doi.org/10.1109/twc.2012.031212.111113.

44

Ojha, Rohit P., Karan P. Singh, Martha J. Felini, Eva L. Evans, and Lori A. Fischbach. "Best Subset Selection and Trend Analysis for Optimizing the Discriminatory Accuracy of Diagnostic Models." Annals of Epidemiology 18, no. 9 (September 2008): 732. http://dx.doi.org/10.1016/j.annepidem.2008.08.075.

45

Zhang, Tao, and Joseph E. Cavanaugh. "A multistage algorithm for best-subset model selection based on the Kullback–Leibler discrepancy." Computational Statistics 31, no. 2 (April 24, 2015): 643–69. http://dx.doi.org/10.1007/s00180-015-0584-8.

46

Watagoda, Lasanthi C. R. Pelawa. "A Sub-Model Theorem for Ordinary Least Squares." International Journal of Statistics and Probability 8, no. 1 (November 19, 2018): 40. http://dx.doi.org/10.5539/ijsp.v8n1p40.

Abstract:
Variable selection or subset selection is an important step in the process of model fitting. There are many ways to select the best subset of variables, including forward selection and backward elimination. Ordinary least squares (OLS) is one of the most commonly used methods of fitting the final model. The final sub-model can perform poorly if the variable selection process fails to choose the right number of variables. This paper gives a new theorem and a mathematical proof that illustrate the reason for this poor performance when the least squares method is used after variable selection.
47

Barton, Franklin E. "Near Infrared Reflectance Spectroscopy. Part II. Effect of Calibration Set Selection on Accuracy of Method." Journal of AOAC INTERNATIONAL 74, no. 5 (September 1, 1991): 853–56. http://dx.doi.org/10.1093/jaoac/74.5.853.

Abstract:
Tall fescue samples (Festuca arundinacea Schreb.) were collected during 1983–1985. The 1983 and 1984 samples were used for calibration and the 1985 samples were used for validation. The combined 1983–1984 calibration set contained 382 samples. The program "SUBSET" was run and 74 samples were selected. The remaining samples from 1983–1984 were divided into 4 files of 77 samples each by taking every 4th sample. The "SUBSET" program was run on the 1985 set of 211 samples and 40 samples were selected that represented all the spectral diversity in the set. Separate sets of equations were developed with 2 regression programs, "BEST" and "CAL," and used to predict acid detergent fiber, neutral detergent fiber, permanganate lignin, and crude protein. The results show that while a random selection could sometimes produce a better set for calibration, the "SUBSET" program picks a set that will consistently produce a good calibration. In most instances, the "SUBSET" equations were the best or next to the best when measured by the standard error of performance corrected for bias.
48

Alzayed, Asaad, Waheeda Almayyan, and Ahmed Al-Hunaiyyan. "Diagnosis of Obesity Level based on Bagging Ensemble Classifier and Feature Selection Methods." International Journal of Artificial Intelligence & Applications 13, no. 02 (March 31, 2022): 37–54. http://dx.doi.org/10.5121/ijaia.2022.13203.

Abstract:
In the current era, the amount of data generated from various device sources and business transactions is rising exponentially, and current machine learning techniques are not feasible for handling such massive volumes of data. Two commonly adopted schemes exist to solve this issue: scaling up the data mining algorithms, or data reduction. Scaling up the data mining algorithms is not the best way forward, but data reduction is feasible. There are two approaches to reducing datasets: selecting an optimal subset of features from the initial dataset, or eliminating those that contribute less information. Overweight and obesity are increasing worldwide, and forecasting future overweight or obesity could help intervention. Our primary objective is to find the optimal subset of features to diagnose obesity. This article proposes adapting a bagging algorithm based on filter-based feature selection to improve the prediction accuracy of obesity with a minimal number of features. We utilized several machine learning algorithms to classify the obesity classes and several filter feature selection methods to maximize classifier accuracy. Based on the experimental results, the pairwise consistency and pairwise correlation techniques are shown to be promising tools for feature selection with respect to the quality of the obtained feature subset and computational efficiency. Analyzing the results obtained from the original and modified datasets improved the classification accuracy and established a relationship between obesity/overweight and common risk factors such as weight, age, and physical activity patterns.
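
A minimal sketch of the overall design — a filter in front of a bagging ensemble — assuming mutual information as the filter (the paper's pairwise consistency and pairwise correlation filters are not standard scikit-learn components) and illustrative values for k and the ensemble size:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

model = Pipeline([
    ("filter", SelectKBest(mutual_info_classif, k=8)),      # filter-based FS
    ("bagging", BaggingClassifier(DecisionTreeClassifier(),
                                  n_estimators=25)),         # bagging ensemble
])
# model.fit(X_train, y_train); model.score(X_test, y_test)
```
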
49

Usai, M. Graziano, Mike E. Goddard, and Ben J. Hayes. "LASSO with cross-validation for genomic selection." Genetics Research 91, no. 6 (December 2009): 427–36. http://dx.doi.org/10.1017/s0016672309990334.

Abstract:
We used a least absolute shrinkage and selection operator (LASSO) approach to estimate marker effects for genomic selection. The least angle regression (LARS) algorithm and cross-validation were used to define the best subset of markers to include in the model. The LASSO–LARS approach was tested on two data sets: a simulated data set with 5865 individuals and 6000 single nucleotide polymorphisms (SNPs); and a mouse data set with 1885 individuals genotyped for 10,656 SNPs and phenotyped for a number of quantitative traits. In the simulated data, three approaches were used to split the reference population into training and validation subsets for cross-validation: random splitting across the whole population, and random sampling of the validation set from the last generation only, either within or across families. The highest accuracy was obtained by random splitting across the whole population. The accuracy of genomic estimated breeding values (GEBVs) in the candidate population obtained by LASSO–LARS was 0.89 with 156 explanatory SNPs. This value was higher than those obtained by best linear unbiased prediction (BLUP) and a Bayesian method (BayesA), which were 0.75 and 0.84, respectively. In the mouse data, 1600 individuals were randomly allocated to the reference population. The GEBVs for the remaining 285 individuals estimated by LASSO–LARS were more accurate than those obtained by BLUP and BayesA for weight at six weeks, and slightly less accurate for growth rate and body length. It was concluded that the LASSO–LARS approach is a good alternative method to estimate marker effects for genomic selection, particularly when the cost of genotyping can be reduced by using a limited subset of markers.
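
The LARS-plus-cross-validation recipe maps directly onto scikit-learn's LassoLarsCV; below is a hedged sketch on synthetic stand-in data (real genotype and phenotype arrays would replace it), where the nonzero coefficients define the selected marker subset:

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

# Synthetic stand-in: rows = individuals, columns = SNP genotypes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))
y = X[:, :5] @ np.ones(5) + rng.normal(size=200)

model = LassoLarsCV(cv=10).fit(X, y)        # LARS path + CV over the penalty
selected = np.flatnonzero(model.coef_)      # nonzero coefficients = markers kept
print(f"{selected.size} markers selected at alpha={model.alpha_:.4f}")
```
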
50

Gonzalez-Sanchez, Alberto, Juan Frausto-Solis, and Waldo Ojeda-Bustamante. "Attribute Selection Impact on Linear and Nonlinear Regression Models for Crop Yield Prediction." Scientific World Journal 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/509429.

Abstract:
Efficient cropping requires yield estimation for each involved crop, and data-driven models are commonly applied for this. In recent years, several comparisons of data-driven modeling techniques have been made in search of the best model for yield prediction. However, attributes are usually selected based on expert assessment or on dimensionality reduction algorithms. A fairer comparison should include the best subset of features for each regression technique, and an evaluation including several crops is preferable. This paper evaluates the most common data-driven modeling techniques applied to yield prediction, using a complete method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5′ regression trees, and artificial neural networks (ANN) were ranked. The models were built using real data from eight crops sown in an irrigation module of Mexico. To validate the models, three accuracy metrics were used: the root relative square error (RRSE), relative mean absolute error (RMAE), and correlation factor (R). The results show that ANNs are more consistent in the best attribute subset composition between the learning and training stages, obtaining the lowest average RRSE (86.04%), the lowest average RMAE (8.75%), and the highest average correlation factor (0.63).