To see the other types of publications on this topic, follow the link: Data / features engineering.

Journal articles on the topic 'Data / features engineering'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Data / features engineering.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Jadhav, Shailaja B., and D. V. Kodavade. "Enhancing Flight Delay Prediction through Feature Engineering in Machine Learning Classifiers: A Real Time Data Streams Case Study." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 2s (January 31, 2023): 212–18. http://dx.doi.org/10.17762/ijritcc.v11i2s.6064.

Full text
Abstract:
The process of creating and selecting features from raw data to enhance the accuracy of machine learning models is referred to as feature engineering. In the context of real-time data streams, feature engineering becomes particularly important because the data is constantly changing and the model must adapt quickly. This paper describes a case study of feature engineering in a flight information system. We used feature engineering to improve the performance of machine learning classifiers for predicting flight delays, and we describe various techniques for extracting and constructing features from the raw data, including time-based, trend-based, and error-based features. Before applying these techniques, we pre-processed the features with the CTAO algorithm, followed by the SCSO (sand cat swarm optimization) algorithm for feature extraction and enhanced harmony search for feature optimization. The resulting feature set contained the nine features most relevant to deciding whether a flight would be delayed. Additionally, we evaluate the performance of various classifiers using these engineered features and contrast the results with those obtained using raw features. The results show that feature engineering significantly improves classifier performance and allows more accurate prediction of flight delays in real time.
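As a rough illustration of the time-based and error-based features the abstract mentions, the sketch below derives a few such features from scheduled and actual departure timestamps. The field names, sample records, and the 15-minute delay threshold are illustrative assumptions, not the paper's schema or CTAO/SCSO pipeline.

```python
from datetime import datetime

# Hypothetical raw records: scheduled and actual departure times.
flights = [
    {"sched_dep": "2023-01-15 08:30", "actual_dep": "2023-01-15 09:10"},
    {"sched_dep": "2023-01-15 22:05", "actual_dep": "2023-01-15 22:00"},
]

def time_features(rec, fmt="%Y-%m-%d %H:%M"):
    sched = datetime.strptime(rec["sched_dep"], fmt)
    actual = datetime.strptime(rec["actual_dep"], fmt)
    delay_min = (actual - sched).total_seconds() / 60.0  # error-based feature
    return {
        "dep_hour": sched.hour,             # time-of-day feature
        "dep_weekday": sched.weekday(),     # 0 = Monday
        "is_evening": int(sched.hour >= 18),
        "delay_minutes": delay_min,
        "is_delayed": int(delay_min > 15),  # assumed 15-minute threshold
    }

features = [time_features(f) for f in flights]
```

In a streaming setting, a function like this would be applied to each incoming record before the classifier sees it.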
APA, Harvard, Vancouver, ISO, and other styles
2

Dube, R. P., and H. R. Johnson. "Computer-Assisted Engineering Data Base." Journal of Engineering for Industry 107, no. 1 (February 1, 1985): 33–38. http://dx.doi.org/10.1115/1.3185961.

Full text
Abstract:
General capabilities of data base management technology are described. Information requirements posed by the space station life cycle are discussed, and it is asserted that data base management technology supporting engineering/manufacturing in a heterogeneous hardware/data base management system environment should be applied to meeting these requirements. Today’s commercial systems do not satisfy all of these requirements. The features of an R&D data base management system being developed to investigate data base management in the engineering/manufacturing environment are discussed. Features of this system represent only a partial solution to space station requirements. Areas where this system should be extended to meet full space station information management requirements are discussed.
3

Shrestha, Sushil, and Manish Pokharel. "Educational data mining in moodle data." International Journal of Informatics and Communication Technology (IJ-ICT) 10, no. 1 (April 1, 2021): 9. http://dx.doi.org/10.11591/ijict.v10i1.pp9-18.

Full text
Abstract:
<p>The main purpose of this research paper is to analyze Moodle data and identify the most influential features for building a predictive model. The research applies a wrapper-based feature selection method called Boruta to select the best predictive features. Data were collected from eighty-one students enrolled in the course Human Computer Interaction (COMP341), offered by the Department of Computer Science and Engineering at Kathmandu University, Nepal. Kathmandu University uses Moodle as its e-learning platform. The dataset contained eight features, where Assignment.Click, Chat.Click, File.Click, Forum.Click, System.Click, Url.Click, and Wiki.Click were used as independent features and Grade as the dependent feature. Five classification algorithms, namely k-nearest neighbour, naïve Bayes, support vector machine (SVM), random forest, and the CART decision tree, were applied to the Moodle data. The findings show that SVM has the highest accuracy in comparison to the other algorithms and suggest that File.Click and System.Click were the most significant features. This type of research helps in the early identification of students' performance. The growing popularity of the teaching-learning process through online learning systems has attracted researchers to work in the field of Educational Data Mining (EDM). Varieties of data are generated through several online activities and can be analyzed to understand student performance, which helps the overall teaching-learning process. Academicians, especially course instructors who deliver course content through e-learning platforms, and the learners who use these platforms benefit greatly from this research.</p>
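The shadow-feature idea behind wrapper methods like Boruta can be sketched in plain Python: a feature is kept only if its importance beats the best importance achieved by shuffled "shadow" copies of the features. The toy data, the column names, and the correlation-based importance score below are illustrative stand-ins (Boruta itself uses random-forest importances over many iterations).

```python
import random
import statistics

random.seed(0)

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

# Toy data: one informative column, one pure-noise column (names made up).
n = 200
grade = [random.random() for _ in range(n)]
data = {
    "File.Click": [g * 10 + random.gauss(0, 1) for g in grade],  # informative
    "Chat.Click": [random.gauss(0, 1) for _ in range(n)],        # noise
}

# Shadow test: shuffle each column to destroy its link to the target,
# then require real features to beat the best shadow score.
shadow_scores = []
for name, col in data.items():
    shadow = col[:]
    random.shuffle(shadow)
    shadow_scores.append(abs(pearson(shadow, grade)))
threshold = max(shadow_scores)

selected = [name for name, col in data.items()
            if abs(pearson(col, grade)) > threshold]
```

The informative column survives the shadow threshold; a noise column usually does not.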
4

Zhang, Song. "The Construction of Modern Administrative Law via Data Mining." Archives des Sciences 74, s1 (August 10, 2024): 32–39. http://dx.doi.org/10.62227/as/74s16.

Full text
Abstract:
Early administrative jurisprudence generally experienced a shift from administrative science to legal science, while modern administrative jurisprudence has shifted from judicial review to a focus on the administrative process. In this paper, we introduce feature engineering technology into a data-mining-based model for the construction of administrative law. Prior knowledge is introduced into the model through manually constructed, effective features, while a neural network automatically extracts abstract features from the original input text. We propose a deep feature engineering method that combines the advantages of both for relation extraction tasks. Experiments show that our method provides a powerful legal analysis tool, which helps to innovate the conceptual framework and theoretical system of traditional administrative law and to establish a modern administrative law system that can explain real-world administrative processes.
5

Huang, Eunchong, Sarah Kim, and TaeJin Ahn. "Deep Learning for Integrated Analysis of Insulin Resistance with Multi-Omics Data." Journal of Personalized Medicine 11, no. 2 (February 15, 2021): 128. http://dx.doi.org/10.3390/jpm11020128.

Full text
Abstract:
Technological advances in next-generation sequencing (NGS) have made it possible to uncover extensive and dynamic alterations in diverse molecular components and biological pathways across healthy and diseased conditions. The large amounts of multi-omics data originating from emerging NGS experiments require feature engineering, a crucial step in predictive modeling. The underlying relationship among multi-omics features in terms of insulin resistance is not well understood. In this study, using multi-omics data on type II diabetes from the Integrative Human Microbiome Project, comprising 10,783 features, we conducted a data-analytic approach to elucidate the relationship between insulin resistance and multi-omics features, including microbiome data. To better explain the impact of microbiome features on insulin classification, we used a deep neural network interpretation algorithm to quantify each microbiome feature's contribution to the discriminative model output across the samples.
6

Li, Songyuan, Yuyan Man, Chi Zhang, Qiong Fang, Suya Li, and Min Deng. "PRPD data analysis with Auto-Encoder Network." E3S Web of Conferences 81 (2019): 01019. http://dx.doi.org/10.1051/e3sconf/20198101019.

Full text
Abstract:
Gas-insulated switchgear (GIS) is critical to the stable operation of power equipment. Traditional partial discharge pattern recognition relies on expert experience to hand-design features, which is highly subjective and largely blind. To address this problem, we introduce an encoding-decoding network that reconstructs the input data and then treat the encoder's output as the partial discharge signal feature. The adaptive feature mining ability of the auto-encoder network is effectively utilized, and a traditional classifier is connected to it, combining the deep learning method with traditional machine learning. The results show that features extracted with this method are more discriminative than hand-crafted features and can effectively improve the recognition accuracy of partial discharge.
7

Li, Zongze. "Feature Engineering and Data Visualization Analysis in Artificial Intelligence in Big Data Era." International Journal of Computer Science and Information Technology 3, no. 3 (August 12, 2024): 390–95. http://dx.doi.org/10.62051/ijcsit.v3n3.41.

Full text
Abstract:
In the environment of massive data, the selection and construction of features plays a crucial role in the performance and accuracy of models. Classic hand-crafted feature construction can incorporate insights from the professional field, but it risks omitting information and does not necessarily reach the optimal solution. To address these problems, this paper proposes two strategies for feature extraction: ensemble learning and deep learning. Ensemble learning enhances generalization by combining the outputs of multiple models, while deep learning allows models to learn features automatically, reducing the need for human intervention. Both methods can overcome the limitations of manual feature design to varying degrees. In addition, the paper introduces the application of parallel coordinate plots in feature selection. By projecting high-dimensional data onto a system of parallel axes, researchers can visually analyze the data structure and thus advance feature selection and optimization. This method not only gives insight into subtle relationships in the data but also leverages human pattern recognition, further improving the comprehensive performance of the model.
8

Lu, Songyuanyi. "Technical Features and Trends of Data Science in Financial Engineering." Frontiers in Business, Economics and Management 4, no. 3 (July 31, 2022): 34–37. http://dx.doi.org/10.54097/fbem.v4i3.1068.

Full text
Abstract:
In the new financial era, huge amounts of data bring new challenges to traditional financial business and create unprecedented opportunities at the same time. In the financial industry, the use of data science by financial institutions has deepened significantly, from traditional "data visualization presentation" to "data-based decision analysis". This paper analyzes the technical characteristics and development trends of data science in financial engineering against the background of the rapid development of financial technology.
9

Chen, Jingcheng, Yining Sun, and Shaoming Sun. "Improving Human Activity Recognition Performance by Data Fusion and Feature Engineering." Sensors 21, no. 3 (January 20, 2021): 692. http://dx.doi.org/10.3390/s21030692.

Full text
Abstract:
Human activity recognition (HAR) is essential in many health-related fields. A variety of technologies based on different sensors have been developed for HAR. Among them, fusion of heterogeneous wearable sensors has been developed, as it is portable, non-intrusive, and accurate for HAR. To be applicable in real time with limited resources, the activity recognition system must be compact and reliable. This requirement can be achieved by feature selection (FS). By eliminating irrelevant and redundant features, the system burden is reduced while maintaining good classification performance (CP). This manuscript proposes a two-stage genetic-algorithm-based feature selection algorithm with a fixed activation number (GFSFAN), which is applied to datasets with a variety of time-, frequency-, and time-frequency-domain features extracted from raw time series collected for nine activities of daily living (ADL). Six classifiers are used to evaluate the effects of the feature subsets selected by different FS algorithms on HAR performance. The results indicate that GFSFAN can achieve good CP with a small feature subset. A sensor-to-segment coordinate calibration algorithm and a lower-limb joint angle estimation algorithm are also introduced. Experiments on the effects of the calibration and of introducing joint angles show that both can improve the CP.
10

Salii, Yevhenii, Alla Lavreniuk, and Nataliia Kussul. "Statistical methods of feature engineering for the problem of forest state classification using satellite data." System research and information technologies, no. 1 (March 29, 2024): 86–98. http://dx.doi.org/10.20535/srit.2308-8893.2024.1.07.

Full text
Abstract:
Timely detection of forest diseases is an important task for their prevention and for limiting their spread. Satellite imagery provides capabilities for large-scale forest monitoring. Machine learning models make it possible to automate the analysis of these data to detect anomalies indicating disease. However, selecting informative features is key to building an effective model. In this work, the application of the Bhattacharyya distance and Spearman's rank correlation coefficient for selecting features from satellite images was investigated. A greedy algorithm was applied to form a subset of weakly correlated features. The experiment showed that the selected features improve classification quality compared to using all spectral bands. The proposed approach is effective for selecting informative, weakly correlated features and can be utilized in other remote sensing tasks.
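The Spearman-plus-greedy step the abstract describes can be sketched in plain Python: compute rank correlations between candidate bands and greedily admit a band only if it stays weakly correlated with everything already selected. The band names, values, and 0.8 threshold below are illustrative assumptions, not the paper's satellite data.

```python
def rank(xs):
    # Simple ranking; assumes no ties for brevity in this sketch.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1.0
    return r

def spearman(xs, ys):
    # Spearman rho via the classic sum-of-squared-rank-differences formula.
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Toy spectral bands (values are illustrative, not real satellite data).
bands = {
    "B04": [0.1, 0.2, 0.3, 0.4, 0.5],
    "B08": [0.11, 0.22, 0.29, 0.41, 0.52],  # nearly duplicates B04
    "B11": [0.9, 0.1, 0.7, 0.3, 0.5],       # weakly related to B04
}

def greedy_select(features, threshold=0.8):
    # Admit a band only if |rho| with every selected band is below threshold.
    selected = []
    for name in features:
        if all(abs(spearman(features[name], features[s])) < threshold
               for s in selected):
            selected.append(name)
    return selected

selected = greedy_select(bands)
```

In the paper's setting, candidates would additionally be ordered by an informativeness score (e.g. Bhattacharyya distance between class distributions) before the greedy pass.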
11

Shao, Borong, Carlo Vittorio Cannistraci, and Tim OF Conrad. "Epithelial Mesenchymal Transition Network-Based Feature Engineering in Lung Adenocarcinoma Prognosis Prediction Using Multiple Omic Data." Genomics and Computational Biology 3, no. 3 (May 11, 2017): 57. http://dx.doi.org/10.18547/gcb.2017.vol3.iss3.e57.

Full text
Abstract:
The epithelial mesenchymal transition (EMT) process has been shown to be highly relevant to cancer prognosis. However, although different biological network-based biomarker identification methods have been proposed to predict cancer prognosis, the EMT network has not been directly used for this purpose. In this study, we constructed an EMT regulatory network consisting of 87 molecules and selected features useful for prognosis prediction in lung adenocarcinoma (LUAD). To incorporate multiple molecular profiles, we obtained four types of molecular data, including mRNA-Seq, copy number alteration (CNA), DNA methylation, and miRNA-Seq data, from The Cancer Genome Atlas. The data were mapped to the EMT network in three alternative ways: mRNA-Seq and miRNA-Seq; DNA methylation; and CNA and miRNA-Seq. Each mapping was employed to extract five different sets of features using discretization and network-based biomarker identification methods. Each feature set was then used to predict prognosis with SVM and logistic regression classifiers. We measured prediction accuracy with AUC and AUPR values using 10 times 10-fold cross-validation. For a more comprehensive evaluation, we also measured the prediction accuracies of clinical features, of EMT plus clinical features, of 87 randomly picked molecules from each data mapping, and of all molecules from each data type. Counter-intuitively, EMT features do not always outperform randomly selected features, and the prediction accuracies of the five feature sets are mostly not significantly different. Clinical features give the highest prediction accuracies. In addition, the prediction accuracies of both EMT features and random features are comparable to those obtained using all features (more than 17,000) from each data type.
12

Martanto, Martanto, Andri Dian Nugraha, David P. Sahara, Devy Kamil Syahbana, Puput P. Rahsetyo, Imam C. Priambodo, and Ardianto Ardianto. "Features Engineering and Features Extraction of Volcano-Tectonic (VT) Earthquake." Journal of Physics: Conference Series 2243, no. 1 (June 1, 2022): 012034. http://dx.doi.org/10.1088/1742-6596/2243/1/012034.

Full text
Abstract:
A volcano-tectonic earthquake, commonly referred to as VT, is an earthquake caused by magma intrusion that increases the pressure below the volcano's surface. The accumulation of stress continuously affects the elasticity of rocks and causes fractures when the elasticity limit is exceeded. VT is one of the earthquake types used as a parameter to decide the level of volcanic activity. To understand the characteristics of VT, it is necessary to perform feature engineering, the process of extracting features that characterize VT. The data used in this study were VT earthquakes recorded during the 2017 Mount Agung crisis. The extraction is conducted by computing statistics in the temporal and spectral domains. The VT waveform is univariate time series data, and to extract features this study uses the change in amplitude over time taken from the waveform. In total, 48 features were successfully extracted. These extracted features can be used as input parameters for the automatic classification of VT using machine learning.
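Temporal-domain statistics of the kind described can be computed directly from a waveform's samples. The sketch below extracts a small illustrative subset (mean, spread, peak, RMS, zero-crossings, crest factor), not the paper's full set of 48 features; the sample waveform is synthetic.

```python
import math
import statistics

def temporal_features(wave):
    """Compute a few temporal-domain statistics from a 1-D waveform."""
    n = len(wave)
    mean = statistics.fmean(wave)
    std = statistics.pstdev(wave)
    peak = max(abs(v) for v in wave)
    rms = math.sqrt(sum(v * v for v in wave) / n)
    # Zero-crossing count: strict sign changes between consecutive samples.
    zc = sum(1 for a, b in zip(wave, wave[1:]) if a * b < 0)
    return {"mean": mean, "std": std, "peak": peak, "rms": rms,
            "zero_crossings": zc,
            "crest_factor": peak / rms if rms else 0.0}

# Synthetic amplitude samples standing in for a VT waveform.
wave = [0.0, 1.0, -1.0, 2.0, -2.0, 1.0, 0.0, -1.0]
feats = temporal_features(wave)
```

Spectral-domain counterparts would apply the same idea to the waveform's Fourier amplitudes.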
13

Jiang, Ling Yun, and Zhi Biao Wang. "Data Collection and Model Construction Methods for Reverse Engineering." Advanced Materials Research 102-104 (March 2010): 189–93. http://dx.doi.org/10.4028/www.scientific.net/amr.102-104.189.

Full text
Abstract:
The process of creating a CAD model from a physical object consists mainly of two steps: data collection through digital measurement and construction of a parameterized, revisable model. This paper discusses the measuring process and technical problems of the coordinate measuring machine (CMM) and non-contact sensors. Through comparative analysis, we determine the application scope of these approaches for measuring different dimensions of the same objects, considering time efficiency and tolerance requirements. The paper divides objects into two categories: freeform-feature objects and regular-feature objects. For freeform-feature objects, wrap-around B-spline surfaces can be fitted to construct the model. Regular-feature objects for mass production contain machined surfaces that should be precisely measured and modeled. The model of a regular-feature object should be constructed in three-dimensional modeling software so that it is parametric and revisable, allowing the original design to be changed and improved. The sizes and positions of the model's important surfaces are acquired from the CMM, and those of non-important features are fitted through point cloud processing. Some profiles cannot be measured directly with the CMM yet must be precise, so the paper proposes two methods to construct the profile line and analyzes the error by comparing it with the point cloud.
14

Jemai, Jaber, and Anis Zarrad. "Feature Selection Engineering for Credit Risk Assessment in Retail Banking." Information 14, no. 3 (March 22, 2023): 200. http://dx.doi.org/10.3390/info14030200.

Full text
Abstract:
In classification, feature selection engineering helps in choosing the most relevant data attributes to learn from. It determines the set of features to be rejected on the assumption that they contribute little to discriminating the labels. The effectiveness of a classifier depends mainly on the set of selected features. In this paper, we identify the best features to learn from in the context of credit risk assessment in the financial industry. Financial institutions face the risk of approving the loan request of a customer who may later default, or of rejecting the request of a customer who would repay their debt without default. We propose a feature selection engineering approach to identify the main features for assessing the risk of a loan request. We use different feature selection methods, including univariate feature selection (UFS), recursive feature elimination (RFE), feature importance using decision trees (FIDT), and the information value (IV). We implement two variants of the XGBoost classifier on the open dataset provided by the Lending Club platform to evaluate and compare the performance of the different feature selection methods. The research shows that the most relevant features are found by all four feature selection techniques.
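Of the methods listed, the information value (IV) has a simple closed form over binned good/bad counts: IV = Σ (pct_good − pct_bad) · ln(pct_good / pct_bad). A minimal sketch follows; the bin counts are made up for illustration, not Lending Club data.

```python
import math

def information_value(bins):
    """bins: list of (n_good, n_bad) counts per bin of one feature.
    Returns IV = sum of (pct_good - pct_bad) * ln(pct_good / pct_bad)."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    iv = 0.0
    for good, bad in bins:
        pg = good / total_good
        pb = bad / total_bad
        if pg > 0 and pb > 0:  # skip empty cells to avoid log(0)
            iv += (pg - pb) * math.log(pg / pb)
    return iv

# Hypothetical bins of a credit feature: (repaid, defaulted) counts.
strong = [(400, 20), (300, 80), (100, 200)]  # separates classes well
weak = [(270, 100), (265, 100), (265, 100)]  # nearly uninformative
iv_strong = information_value(strong)
iv_weak = information_value(weak)
```

A common rule of thumb in credit scoring treats IV above roughly 0.3 as a strong predictor and IV below about 0.02 as useless, so the first feature would be kept and the second rejected.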
15

Shukla, Khyati, William Holderbaum, Theodoros Theodoridis, and Guowu Wei. "Enhancing Gearbox Fault Diagnosis through Advanced Feature Engineering and Data Segmentation Techniques." Machines 12, no. 4 (April 14, 2024): 261. http://dx.doi.org/10.3390/machines12040261.

Full text
Abstract:
Efficient gearbox fault diagnosis is crucial for the cost-effective maintenance and reliable operation of rotating machinery. Despite extensive research, effective fault diagnosis remains challenging due to the multitude of features available for classification. Traditional feature selection methods often fail to achieve optimal performance in fault classification tasks. This study introduces diverse ranking methods for selecting the relevant features and utilizes data segmentation techniques such as sliding, windowing, and bootstrapping to strengthen predictive model performance and scalability. A comparative analysis of these methods was conducted to identify the potential causes and future solutions. An evaluation of the impact of enhanced feature engineering and data segmentation on predictive maintenance in gearboxes revealed promising outcomes, with decision trees, SVM, and KNN models outperforming others. Additionally, within a fully connected network, windowing emerged as a more robust and efficient segmentation method compared to bootstrapping. Further research is necessary to assess the performance of these techniques across diverse datasets and applications, offering comprehensive insights for future studies in fault diagnosis and predictive maintenance.
16

Schreve, K., C. L. Goussard, A. H. Basson, and D. Dimitrov. "Interactive Feature Modeling for Reverse Engineering." Journal of Computing and Information Science in Engineering 6, no. 4 (August 8, 2006): 422–24. http://dx.doi.org/10.1115/1.2364205.

Full text
Abstract:
In feature-based reverse engineering, entities (features) with higher-level engineering meaning are used to approximate point data. This is in contrast to approximating the data with free-form NURBS surfaces. Currently, no such system is operationally available. Interactive feature-based modeling tools for feature extraction, edge detection, and draft angle approximation are presented here. Several case studies demonstrate the application of these algorithms.
17

Khabbazi, Mahmood Reza, Jan Wikander, Mauro Onori, and Antonio Maffei. "Object-oriented design of product assembly feature data requirements in advanced assembly planning." Assembly Automation 38, no. 1 (February 5, 2018): 97–112. http://dx.doi.org/10.1108/aa-07-2016-084.

Full text
Abstract:
Purpose: This paper introduces a schema for product assembly feature data in an object-oriented, module-based format using the Unified Modeling Language (UML). To link production with product design, it is essential to determine at an early stage which entities of product design and development are involved in automated assembly planning and operations. To this end, it is reasonable to assign meaningful attributes to the parts' design entities (assembly features) in a systematic and structured way. This approach empowers processes such as motion planning and sequence planning in assembly design.
Design/methodology/approach: The assembly feature data requirements are studied, and definitions are analyzed and redefined. Using object-oriented techniques, the assembly feature data structure and relationships are modeled, based on the identified requirements, as five UML packages (Part, three-dimensional (3D) models, Mating, Joint, and Handling). All geometric and non-geometric design data entities relevant to assembly design are extracted or assigned from 3D models and realized through the featured entity interface class. The featured entities are then used by the mating, handling, and joint features. The AssemblyFeature interface is realized through the mating, handling, and joint packages related to the assembly and part classes. Each package contains all relevant classes, which further classify the important attributes of the main class.
Findings: This paper provides an explanatory approach using object-oriented techniques to model the schema of assembly feature associations and artifacts at the product design level, all of which are essential in several subsequent and parallel steps of the assembly planning process, as well as in assembly feature entity assignment during the design improvement cycle.
Practical implications: The practical implications based on the identified advantages can be classified under three main features: module-based design, comprehensive classification, and integration. These features make automation and solution development based on the proposed models much easier and more systematic.
Originality/value: The proposed schema's comprehensiveness and reliability are verified through comparisons with other works, and its advantages are discussed in detail.
18

Zhang, Jingwen, Dingwen Li, Ruixuan Dai, Heidy Cos, Gregory A. Williams, Lacey Raper, Chet W. Hammill, and Chenyang Lu. "Predicting Post-Operative Complications with Wearables." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, no. 2 (July 4, 2022): 1–27. http://dx.doi.org/10.1145/3534578.

Full text
Abstract:
Post-operative complications and hospital readmission are of great concern to surgical patients and health care providers. Wearable devices such as Fitbit wristbands enable long-term and non-intrusive monitoring of patients outside clinical environments. To build accurate predictive models based on wearable data, however, requires effective feature engineering to extract high-level features from time series data collected by the wearable sensors. This paper presents a pipeline for developing clinical predictive models based on wearable sensors. The core of the pipeline is a multi-level feature engineering framework for extracting high-level features from fine-grained time series data. The framework integrates a set of techniques tailored for noisy and incomplete wearable data collected in real-world clinical studies: (1) singular spectrum analysis for extracting high-level features from daily features over the course of the study; (2) a set of daily features that are resilient to missing data in wearable time series data; (3) a K-Nearest Neighbors (KNN) method for imputing short missing heart rate segments; (4) the integration of patients' clinical characteristics and wearable features. We evaluated the feature engineering approach and machine learning models in a clinical study involving 61 patients undergoing pancreatic surgery. Linear support vector machine (SVM) with integrated feature engineering achieved an AUROC of 0.8802 for predicting post-operative readmission or severe complications, which significantly outperformed the existing rule-based model used in clinical practice and other state-of-the-art feature engineering approaches.
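The KNN-based gap filling described in technique (3) can be sketched as a generic nearest-neighbour imputer: find the k complete segments most similar to the gapped one on its observed positions, then fill the gap with their mean. The segment values below are synthetic, and this is a simplified sketch, not the authors' exact implementation.

```python
def knn_impute_segment(target, pool, k=2):
    """Fill None entries in `target` using the k most similar complete
    segments from `pool` (nearest by distance on observed positions)."""
    obs = [i for i, v in enumerate(target) if v is not None]

    def dist(seg):
        return sum((seg[i] - target[i]) ** 2 for i in obs) ** 0.5

    neighbours = sorted(pool, key=dist)[:k]
    filled = list(target)
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = sum(seg[i] for seg in neighbours) / k
    return filled

# Minute-level heart-rate segments (synthetic numbers, beats per minute).
pool = [
    [70, 72, 74, 73, 71],
    [68, 70, 72, 71, 69],
    [95, 97, 99, 98, 96],  # dissimilar (exercise) segment, ignored
]
gap = [69, 71, None, 72, 70]
filled = knn_impute_segment(gap, pool, k=2)
```

Restricting imputation to short gaps, as the paper does, keeps the observed context informative enough for the neighbour search to be meaningful.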
19

Snigdha Tadanki and Sai Kiran Reddy Malikireddy. "Context-aware chatbots with data engineering for multi-turn conversations." World Journal of Advanced Engineering Technology and Sciences 4, no. 1 (December 30, 2021): 063–78. https://doi.org/10.30574/wjaets.2021.4.1.0061.

Full text
Abstract:
Chatbots have developed from their traditional form, which only answered simple questions, into complex conversational models able to handle a sequence of turns within a conversation. This research analyzes context-aware chatbots built on data engineering approaches and state-of-the-art machine learning methods. The study focuses on critical aspects such as data preprocessing, feature engineering, and the creation of training pipelines, and it addresses the core challenge of maintaining conversational context across multiple exchanges. Keeping the context relevant is one of the defining activities of multi-turn communication. The research pays special attention to the preprocessing step, which removes noise from the data and improves the training dataset. Feature engineering is central to extracting linguistic and contextual features, a precondition for models to understand user input and continue the conversation selectively. Training pipelines designed for such flows, including iterative feedback loops, help the model learn to store and manipulate context as it adapts. The study also examines the adoption of front-end technologies to enrich the customer experience and gather customer feedback. The UI is built not only to expose the bot's abilities but also to adapt to the user, maintaining an interactive, realistic, and user-friendly dialog. Through user-friendly features, these interfaces act as the link between complicated back-end systems and consumers, making the systems more comfortable to use.
20

Katya, Ekaterina. "Exploring Feature Engineering Strategies for Improving Predictive Models in Data Science." Research Journal of Computer Systems and Engineering 4, no. 2 (December 31, 2023): 201–15. http://dx.doi.org/10.52710/rjcse.88.

Full text
Abstract:
A crucial step in the data science pipeline, feature engineering has a significant impact on how well predictive models perform. This study explores several feature engineering techniques and how they affect the robustness and accuracy of models. To extract useful information from unprocessed data and improve the predictive capability of machine learning models, we study a variety of techniques, from straightforward transformations to cutting-edge approaches. The study starts by investigating basic methods including data scaling, one-hot encoding, and handling missing values. We then move on to more complex techniques such as feature selection, dimensionality reduction, and interaction term creation. We also explore domain-specific feature engineering, which entails designing features specifically for the problem domain and utilising additional data sources to expand the feature space. We run extensive experiments on numerous datasets spanning sectors such as healthcare, finance, and natural language processing in order to evaluate the efficacy of these methodologies. We evaluate model performance using metrics like recall, accuracy, precision, and F1-score to get a comprehensive picture of how feature engineering affects various predictive tasks. This study also assesses the computational expense of each feature engineering technique, taking scalability and efficiency in practical applications into account. To assist practitioners in making sound choices during feature engineering, we address the trade-offs between model complexity and performance gains. Our results highlight the importance of feature engineering in data science and demonstrate how it may significantly improve predictive models in a variety of fields. This study is a useful resource for data scientists because it emphasises careful feature engineering as a foundation for creating reliable and accurate predictive models.
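Two of the basic methods the study names, one-hot encoding and scaling, have compact stdlib sketches. The column contents below are illustrative examples, not the study's datasets.

```python
def one_hot(values, categories=None):
    """Map each categorical value to a 0/1 indicator vector.
    Returns (encoded rows, category order)."""
    cats = categories or sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values], cats

def min_max_scale(xs):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in xs]

# Illustrative columns: one categorical, one numeric.
colors, cats = one_hot(["red", "green", "red", "blue"])
ages = min_max_scale([20, 30, 40, 60])
```

In practice, the category order and the min/max bounds must be learned on the training split and reused at prediction time, so that unseen data is encoded consistently.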
APA, Harvard, Vancouver, ISO, and other styles
21

Necula, Sabina-Cristiana, and Cătălin Strîmbei. "Top 10 Differences between Machine Learning Engineers and Data Scientists." Electronics 11, no. 19 (September 22, 2022): 3016. http://dx.doi.org/10.3390/electronics11193016.

Full text
Abstract:
Data science and machine learning are subjects largely debated in practice and in mainstream research. Very often, they overlap due to their common purpose: prediction. Therefore, data science techniques mix with machine learning techniques in their mutual attempt to gain insights from data. Data contains multiple possible predictors, not necessarily structured, and it becomes difficult to extract insights. Identifying important or relevant features that can help improve the prediction power or to better characterize clusters of data is still debated in the scientific literature. This article uses diverse data science and machine learning techniques to identify the most relevant aspects which differentiate data science and machine learning. We used a publicly available dataset that describes multiple users who work in the field of data engineering. Among them, we selected data scientists and machine learning engineers and analyzed the resulting dataset. We designed the feature engineering process and identified the specific differences in terms of features that best describe data scientists and machine learning engineers by using the SelectKBest algorithm, neural networks, random forest classifier, support vector classifier, cluster analysis, and self-organizing maps. We validated our model through different statistics. Better insights lead to better classification. Classifying between data scientists and machine learning engineers proved to be more accurate after feature engineering.
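The SelectKBest step named in the abstract can be sketched as follows; the survey features themselves are not available here, so synthetic classification data stands in for them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data: 10 candidate features, 3 of them informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Score each feature independently and keep the k best.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
top = np.argsort(selector.scores_)[::-1][:3]
print("top features by ANOVA F-score:", sorted(top))
X_reduced = selector.transform(X)   # keep only the k best columns
print(X_reduced.shape)              # (200, 3)
```

Univariate scoring like this is fast but ignores feature interactions, which is presumably why the paper complements it with neural networks, random forests, and clustering.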
APA, Harvard, Vancouver, ISO, and other styles
22

Korotynskyi, Anton, Liudmyla Zhuchenko, Vitalii Tsapar, and Andrii Savula. "Identification of the electric motor mathematical model based on a data sample with feature engineering." Eastern-European Journal of Enterprise Technologies 5, no. 1 (131) (October 25, 2024): 91–98. http://dx.doi.org/10.15587/1729-4061.2024.312610.

Full text
Abstract:
The object of this study is a mathematical model of a synchronous electric motor, obtained on the basis of experimental data, which takes into account the temperature mode and uses artificial features to increase the accuracy of its operation. A characteristic feature of this work is that the model takes into account the temperature mode as a component of the technical-operational state of the object. The resulting mathematical model could make it possible to synthesize an optimal automatic control system in terms of the operational state of the object. The problem addressed was to increase the accuracy of the identified mathematical models by applying the approach of feature engineering. The results showed that the identification of mathematical models by the initial data leads to a low level of accuracy of the obtained models, namely 65–70 % for the first output channel, 80–85 % for the second, and 75–80 % for the third, fourth, and fifth output channels. Accordingly, building models with a higher threshold of accuracy requires the use of other, more significant data for identification. This paper reports a method for reformatting the original data into artificial features and provides results of their effectiveness in relation to the original channels. The resulting artificial features and the original features were used for further identification; the resulting mathematical model has on average higher accuracy thresholds, namely 82 %, 93 %, 88 %, 85 % for the corresponding output channels. The results prove the effectiveness of applying the principle of feature engineering since the accuracy of the resulting model is 5–10 % higher compared to the baseline. The scope of practical application of the results includes the synthesis of automatic control systems based on mathematical models of control objects obtained as a result of identification.
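The core idea of improving an identified model through artificial features can be sketched with least squares on hypothetical data: augmenting raw inputs with constructed terms (products, squares) lets a linear fit capture nonlinear behavior. The variable names and the interaction below are assumptions for illustration, not the paper's actual channels.

```python
import numpy as np

# Hypothetical identification data for a motor-like system.
rng = np.random.default_rng(7)
n = 300
current = rng.uniform(0, 10, n)
temp = rng.uniform(20, 90, n)
# Assumed nonlinear current-temperature interaction in the output.
y = 1.2 * current + 0.03 * current * temp + rng.normal(0, 0.5, n)

def design(cols):
    return np.column_stack([np.ones(n)] + cols)

def fit_rmse(A):
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sqrt(np.mean((A @ w - y) ** 2))

# Identification on raw features vs. raw + artificial features.
raw = fit_rmse(design([current, temp]))
engineered = fit_rmse(design([current, temp, current * temp,
                              current ** 2]))
print("RMSE raw %.3f vs engineered %.3f" % (raw, engineered))
```

The engineered model recovers the interaction term and its residual shrinks toward the noise floor, mirroring the paper's accuracy gain from feature engineering.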
APA, Harvard, Vancouver, ISO, and other styles
23

An, Yi, Zhuohan Li, and Cheng Shao. "Feature Extraction from 3D Point Cloud Data Based on Discrete Curves." Mathematical Problems in Engineering 2013 (2013): 1–19. http://dx.doi.org/10.1155/2013/290740.

Full text
Abstract:
Reliable feature extraction from 3D point cloud data is an important problem in many application domains, such as reverse engineering, object recognition, industrial inspection, and autonomous navigation. In this paper, a novel method is proposed for extracting the geometric features from 3D point cloud data based on discrete curves. We extract the discrete curves from 3D point cloud data and research the behaviors of chord lengths, angle variations, and principal curvatures at the geometric features in the discrete curves. Then, the corresponding similarity indicators are defined. Based on the similarity indicators, the geometric features can be extracted from the discrete curves, which are also the geometric features of 3D point cloud data. The threshold values of the similarity indicators are taken from [0, 1], which characterize the relative relationship and make the threshold setting easier and more reasonable. The experimental results demonstrate that the proposed method is efficient and reliable.
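Two of the local quantities the abstract names, chord lengths and angle variations, can be computed on a discrete curve with a few lines of numpy; this is a hedged sketch of the idea on a toy 2D polyline, not the paper's similarity indicators.

```python
import numpy as np

def curve_features(points):
    """points: (n, 2) array of ordered samples along a discrete curve."""
    chords = np.diff(points, axis=0)              # successive chord vectors
    lengths = np.linalg.norm(chords, axis=1)      # chord lengths
    # Turning angle between consecutive chords (angle variation).
    dots = (chords[:-1] * chords[1:]).sum(axis=1)
    cos_a = dots / (lengths[:-1] * lengths[1:])
    angles = np.arccos(np.clip(cos_a, -1.0, 1.0))
    return lengths, angles

# An L-shaped polyline: one sharp 90-degree corner at the middle vertex.
pts = np.array([[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]], float)
lengths, angles = curve_features(pts)
corner = np.argmax(angles) + 1   # vertex index of the sharpest turn
print(corner, np.degrees(angles.max()))   # 2 90.0
```

Thresholding such per-vertex quantities on a normalized [0, 1] scale is what makes feature points (edges, corners) stand out from smooth regions of the curve.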
APA, Harvard, Vancouver, ISO, and other styles
24

Chen, Xiao Yu, Bo Liu, and Xin Xia. "Ensemble Learning in Data Mining of Fetal Cardiotocograms." Advanced Materials Research 945-949 (June 2014): 2505–8. http://dx.doi.org/10.4028/www.scientific.net/amr.945-949.2505.

Full text
Abstract:
ReliefF feature selection and the LogitBoost ensemble learning method are employed in the data mining procedure of 2126 fetal cardiotocograms (CTGs). Based on 10 critical features selected by ReliefF and the full 21 features, the LogitBoost algorithm almost always outperforms the other three ensemble learning methods of Stacking, Bagging and AdaBoostM1 in ACC (%) and AUC in classification, and the ACC (%) and AUC of the LogitBoost algorithm reach 94.45% and 0.977 based on the critical features from ReliefF.
APA, Harvard, Vancouver, ISO, and other styles
25

Merkelbach, Silke, Lameya Afroze, Nils Janssen, Sebastian von Enzberg, Arno Kühn, and Roman Dumitrescu. "Using vibration data to classify conditions in disk stack separators." Vibroengineering PROCEDIA 46 (November 18, 2022): 21–26. http://dx.doi.org/10.21595/vp.2022.23000.

Full text
Abstract:
Mounting sensors in disk stack separators is often a major challenge due to the operating conditions. However, a process cannot be optimally monitored without sensors. Virtual sensors can be a solution to calculate the sought parameters from measurable values. We measured the vibrations of disk stack separators and applied machine learning (ML) to detect whether the separator contains only water or whether particles are also present. We combined seven ML classification algorithms with three feature engineering strategies and evaluated our model successfully on vibration data of an experimental disk stack separator. Our experimental results demonstrate that random forest in combination with manual feature engineering, using domain-specific knowledge about suitable features, outperforms all other models with an accuracy of 91.27 %.
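The manual feature-engineering step can be sketched with common vibration statistics; the paper's actual feature list is not given in the abstract, so the features below (RMS, peak, kurtosis, crest factor) and the simulated signals are assumptions chosen because impacts from particles typically raise impulsiveness measures.

```python
import numpy as np

def vibration_features(signal):
    rms = np.sqrt(np.mean(signal ** 2))
    peak = np.max(np.abs(signal))
    centered = signal - signal.mean()
    std = centered.std()
    kurtosis = np.mean(centered ** 4) / std ** 4   # impulsiveness
    crest = peak / rms                              # spikiness vs energy
    return np.array([rms, peak, kurtosis, crest])

rng = np.random.default_rng(0)
smooth = np.sin(np.linspace(0, 20 * np.pi, 2000))          # "water only"
spiky = smooth + (rng.random(2000) < 0.01) * 5.0           # "particles"
f_smooth = vibration_features(smooth)
f_spiky = vibration_features(spiky)
# Impacts raise kurtosis and crest factor, separating the two conditions.
print(f_spiky[2] > f_smooth[2], f_spiky[3] > f_smooth[3])
```

A classifier such as a random forest would then be trained on one such feature vector per measurement window.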
APA, Harvard, Vancouver, ISO, and other styles
26

Fong, Simon, Yan Zhuang, Rui Tang, Xin-She Yang, and Suash Deb. "Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search." Journal of Applied Mathematics 2013 (2013): 1–18. http://dx.doi.org/10.1155/2013/590614.

Full text
Abstract:
Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost an NP-hard problem as the combinations of features escalate exponentially as the number of features increases. Unfortunately in data mining, as well as other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing the Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.
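The stochastic-search idea can be sketched in miniature: encode a feature subset as a bit mask, plug in any fitness function, and search by random bit flips. This stands in for the paper's swarm metaheuristics; the simple class-separation fitness and synthetic data below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 300, 12
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, :3] += y[:, None] * 2.0          # only the first 3 features matter

def fitness(mask):
    if not mask.any():
        return -np.inf
    Xs = X[:, mask]
    m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    # Separation of class centroids, penalized by subset size.
    return np.linalg.norm(m0 - m1) - 0.1 * mask.sum()

mask = rng.random(d) < 0.5
for _ in range(500):                  # flip one bit, keep improvements
    cand = mask.copy()
    cand[rng.integers(d)] ^= True
    if fitness(cand) >= fitness(mask):
        mask = cand
print("selected:", np.flatnonzero(mask))
```

Swapping the fitness for a cross-validated classifier score, and the bit-flip loop for a particle swarm or bat algorithm, gives the general scheme the abstract describes.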
APA, Harvard, Vancouver, ISO, and other styles
27

Brykov, Michail Nikolaevich, Ivan Petryshynets, Catalin Iulian Pruncu, Vasily Georgievich Efremenko, Danil Yurievich Pimenov, Khaled Giasin, Serhii Anatolievich Sylenko, and Szymon Wojciechowski. "Machine Learning Modelling and Feature Engineering in Seismology Experiment." Sensors 20, no. 15 (July 29, 2020): 4228. http://dx.doi.org/10.3390/s20154228.

Full text
Abstract:
This article discusses machine learning modelling using a dataset provided by the LANL (Los Alamos National Laboratory) earthquake prediction competition hosted by Kaggle. The data were obtained from a laboratory stick-slip friction experiment that mimics real earthquakes. Digitized acoustic signals were recorded against time to failure of a granular layer compressed between steel plates. In this work, machine learning was employed to develop models that could predict earthquakes. The aim is to highlight the importance and potential applicability of machine learning in seismology. The XGBoost algorithm was used for modelling combined with 6-fold cross-validation and the mean absolute error (MAE) metric for model quality estimation. The backward feature elimination technique was used followed by the forward feature construction approach to find the best combination of features. The advantage of this feature engineering method is that it enables the best subset to be found from a relatively large set of features in a relatively short time. It was confirmed that the proper combination of statistical characteristics describing acoustic data can be used for effective prediction of time to failure. Additionally, statistical features based on the autocorrelation of acoustic data can also be used for further improvement of model quality. A total of 48 statistical features were considered. The best subset was determined as having 10 features. Its corresponding MAE was 1.913 s, which was stable to the third decimal point. The presented results can be used to develop artificial intelligence algorithms devoted to earthquake prediction.
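Backward feature elimination can be sketched generically: at each round, drop the feature whose removal most improves held-out error, and stop when no drop helps. The sketch below uses plain least squares and synthetic data as stand-ins for the paper's XGBoost model and acoustic features.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 400, 8
X = rng.normal(size=(n, d))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=n)   # 2 real signals

def cv_mae(cols):
    half = n // 2                                   # simple 2-fold CV
    errs = []
    for tr, te in [(slice(0, half), slice(half, n)),
                   (slice(half, n), slice(0, half))]:
        w, *_ = np.linalg.lstsq(X[tr][:, cols], y[tr], rcond=None)
        errs.append(np.abs(X[te][:, cols] @ w - y[te]).mean())
    return np.mean(errs)

cols = list(range(d))
while len(cols) > 1:
    scores = [cv_mae([c for c in cols if c != drop]) for drop in cols]
    best = int(np.argmin(scores))
    if scores[best] > cv_mae(cols):                 # no drop helps: stop
        break
    cols.pop(best)
print("kept features:", cols)
```

The paper follows this pruning pass with forward feature construction, i.e. greedily re-adding engineered candidates that lower the cross-validated MAE.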
APA, Harvard, Vancouver, ISO, and other styles
28

Wang, Shuxia, Bin Fu, Hongzhi Liu, Zhengshen Jiang, Zhonghai Wu, and D. Frank Hsu. "Feature Engineering for Credit Risk Evaluation in Online P2P Lending." International Journal of Software Science and Computational Intelligence 9, no. 2 (April 2017): 1–13. http://dx.doi.org/10.4018/ijssci.2017040101.

Full text
Abstract:
The rise of online P2P lending, as a novel economic lending model, brings new opportunities and challenges for the research of credit risk evaluation. This paper aims to mine information from different data sources to improve the performance of credit risk evaluation models. Besides the personal financial and demographic data used in traditional models, the authors collect information from (1) text description, (2) social network and (3) macro-economic data. They design methods to extract features from unstructured data. To avoid the curse of dimensionality caused by too many features and identify the key factors in credit risk, the authors remove the irrelevant and redundant features by feature selection. Using the data provided by Prosper.com, one of the biggest P2P lending platforms in the world, they show that: (1) it can achieve better performance, measured by both AUC (area under the receiver operating characteristic curve) and classification accuracy, by fusion of information from different data sources; (2) it requires only ten features from different data sources to get better performance.
APA, Harvard, Vancouver, ISO, and other styles
29

M L, Ravikumar, and Nagashree J. "Enhancing Predictive Modeling in Kuala Lumpur Real Estate: A Comprehensive Data Preprocessing and Feature Engineering Approach." International Journal of All Research Education and Scientific Methods 12, no. 04 (2023): 833–38. http://dx.doi.org/10.56025/ijaresm.2023.120124833.

Full text
Abstract:
The real estate market in Kuala Lumpur exhibits complex dynamics influenced by various factors. This paper presents a comprehensive approach to enhance the prediction of real estate prices in Kuala Lumpur utilizing a dataset of 49,416 records with advanced data preprocessing techniques, exploratory data analysis (EDA), and predictive modeling. Initially, the dataset undergoes rigorous preprocessing including handling missing values through property type-specific mean imputation, label encoding categorical variables, and standardization to ensure uniformity and compatibility for analysis. Subsequently, EDA techniques are employed to gain insights into the dataset's characteristics and relationships among variables. To improve model performance and interpretability, feature selection is performed based on Mutual Information (MI) score and correlation metrics. This aids in identifying the most relevant features for predicting real estate prices in Kuala Lumpur. Additionally, feature engineering techniques are applied to create two new features that capture nuanced aspects of the real estate market. The predictive modeling phase employs a Linear Regression algorithm to forecast real estate prices. Leveraging the preprocessed data and optimized feature set, the model aims to accurately predict property prices based on available features. The linear regression model offers interpretability, enabling stakeholders to understand the driving factors behind price variations. Through a rigorous methodology encompassing data preprocessing, exploratory analysis, feature selection, engineering, and predictive modeling, this study contributes to enhancing the accuracy and interpretability of real estate price prediction in Kuala Lumpur using advanced machine learning methods. The findings offer valuable insights for real estate investors, policymakers, and stakeholders to make informed decisions in this dynamic market landscape.
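The mutual-information scoring step can be sketched with scikit-learn; the Kuala Lumpur dataset is not reproduced here, so the columns below (a hypothetical built-up area, room count, and an irrelevant noise column) are synthetic stand-ins.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(3)
n = 500
size = rng.uniform(500, 3000, n)          # hypothetical built-up area
rooms = rng.integers(1, 6, n).astype(float)
noise = rng.normal(size=n)                # irrelevant column
price = 200 * size + 50_000 * rooms + rng.normal(0, 10_000, n)

X = np.column_stack([size, rooms, noise])
mi = mutual_info_regression(X, price, random_state=0)
ranking = np.argsort(mi)[::-1]
print("features ranked by MI:", ranking)  # size and rooms outrank noise
```

Unlike Pearson correlation, MI also captures nonlinear dependence, which is why the study pairs the two metrics before fitting its linear regression model.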
APA, Harvard, Vancouver, ISO, and other styles
30

Chinthamu, Narender, Chandrasekar Venkatachalam, Muthuvairavan Pillai.N, Setti Vidya Sagar Appaji, and M. Murali. "Enhancing Feature Extraction through G-PLSGLR by Decreasing Dimensionality of Textual Data." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 4s (May 5, 2023): 288–95. http://dx.doi.org/10.17762/ijritcc.v11i4s.6540.

Full text
Abstract:
The technology of big data has become highly popular in numerous industries owing to its various characteristics such as high value, large volume, rapid velocity, wide variety, and significant variability. Nevertheless, big data presents several difficulties that must be addressed, including lengthy processing times, high computational complexity, imprecise features, significant sparsity, irrelevant terms, redundancy, and noise, all of which can have an adverse effect on the performance of feature extraction. The objective of this research is to tackle these issues by utilizing the Partial Least Square Generalized Linear Regression (G-PLSGLR) approach to decrease the high dimensionality of text data. The suggested algorithm is made up of four stages: first, gathering feature data in a vector space model (VSM) and training it with the bootstrap technique; second, grouping trained feature samples using the Pearson correlation coefficient and a graph-based technique; third, removing unimportant features by ranking significant group features using PLSGR; and finally, selecting or extracting significant features using the Bayesian information criterion (BIC). The G-PLSGLR algorithm surpasses current methods by achieving a high reduction rate and classification performance, while minimizing feature redundancy, time consumption, and complexity. Furthermore, it enhances the accuracy of features by 35%.
APA, Harvard, Vancouver, ISO, and other styles
31

Yang, Dazhi, Zibo Dong, Li Hong I. Lim, and Licheng Liu. "Analyzing big time series data in solar engineering using features and PCA." Solar Energy 153 (September 2017): 317–28. http://dx.doi.org/10.1016/j.solener.2017.05.072.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Antonini, Valerio, Alessandra Mileo, and Mark Roantree. "Engineering Features from Raw Sensor Data to Analyse Player Movements during Competition." Sensors 24, no. 4 (February 18, 2024): 1308. http://dx.doi.org/10.3390/s24041308.

Full text
Abstract:
Research in field sports often involves analysis of running performance profiles of players during competitive games with individual, per-position, and time-related descriptive statistics. Data are acquired through wearable technologies, which generally capture simple data points, which in the case of many team-based sports are times, latitudes, and longitudes. While the data capture is simple and in relatively high volumes, the raw data are unsuited to any form of analysis or machine learning functions. The main goal of this research is to develop a multistep feature engineering framework that delivers the transformation of sequential data into feature sets more suited to machine learning applications.
APA, Harvard, Vancouver, ISO, and other styles
33

P. Dinesh kumar, Dr. B. Subramani. "Stock Market Data Using Data Mining For Feature Extraction." Tuijin Jishu/Journal of Propulsion Technology 44, no. 4 (October 26, 2023): 2062–70. http://dx.doi.org/10.52783/tjjpt.v44.i4.1181.

Full text
Abstract:
This paper presents a robust approach for feature extraction from stock market data by combining Principal Component Analysis (IPCA) and Moving Averages (MA). IPCA reduces dimensionality, capturing underlying patterns, while MAs identify trends and cyclic behaviors. The synergistic integration of these techniques enhances the extraction of essential features for stock market analysis. The research method effectively uncovers relevant information, offering valuable insights for trading and investment decisions. It addresses dimensionality challenges and identifies meaningful patterns, promoting a deeper understanding of market dynamics.
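The two ingredients named in the abstract can be sketched together in numpy: a simple moving average for trend extraction and PCA (via SVD) for dimensionality reduction. The synthetic price-like series below are assumptions standing in for real stock data, and the plain SVD-based PCA stands in for the paper's IPCA variant.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(250)
# 5 correlated synthetic "stocks": shared trend plus idiosyncratic noise.
trend = 0.05 * t
prices = trend[:, None] + rng.normal(0, 1.0, (250, 5)).cumsum(axis=0) * 0.1

def moving_average(x, w):
    return np.convolve(x, np.ones(w) / w, mode="valid")

ma20 = moving_average(prices[:, 0], 20)       # 20-day trend of stock 0

# PCA: center, then SVD; the top component captures the common trend.
centered = prices - prices.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()
pc1 = centered @ Vt[0]
print("variance explained by PC1: %.2f" % explained[0])
```

Because the trend is shared across all series, the first principal component absorbs most of the variance, giving a compact feature for downstream analysis.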
APA, Harvard, Vancouver, ISO, and other styles
34

Salazar, Ricardo, Felix Neutatz, and Ziawasch Abedjan. "Automated feature engineering for algorithmic fairness." Proceedings of the VLDB Endowment 14, no. 9 (May 2021): 1694–702. http://dx.doi.org/10.14778/3461535.3463474.

Full text
Abstract:
One of the fundamental problems of machine ethics is to avoid the perpetuation and amplification of discrimination through machine learning applications. In particular, it is desired to exclude the influence of attributes with sensitive information, such as gender or race, and other causally related attributes on the machine learning task. The state-of-the-art bias reduction algorithm Capuchin breaks the causality chain of such attributes by adding and removing tuples. However, this horizontal approach can be considered invasive because it changes the data distribution. A vertical approach would be to prune sensitive features entirely. While this would ensure fairness without tampering with the data, it could also hurt the machine learning accuracy. Therefore, we propose a novel multi-objective feature selection strategy that leverages feature construction to generate more features that lead to both high accuracy and fairness. On three well-known datasets, our system achieves higher accuracy than other fairness-aware approaches while maintaining similar or higher fairness.
APA, Harvard, Vancouver, ISO, and other styles
35

Wang, Jinyu, Caiping Zhang, Xiangfeng Meng, Linjing Zhang, Xu Li, and Weige Zhang. "A Novel Feature Engineering-Based SOH Estimation Method for Lithium-Ion Battery with Downgraded Laboratory Data." Batteries 10, no. 4 (April 19, 2024): 139. http://dx.doi.org/10.3390/batteries10040139.

Full text
Abstract:
Accurate estimation of lithium-ion battery state of health (SOH) can effectively improve the operational safety of electric vehicles and optimize the battery operation strategy. However, previous SOH estimation algorithms developed based on high-precision laboratory data have ignored the discrepancies between field and laboratory data, leading to difficulties in field application. Therefore, aiming to bridge the gap between the lab-developed models and the field operational data, this paper presents a feature engineering-based SOH estimation method with downgraded laboratory battery data, applicable to real vehicles under different operating conditions. Firstly, a data processing pipeline is proposed to downgrade laboratory data to operational fleet-level data. Then, six key features are extracted on partial ranges to capture the battery’s aging state. Finally, three machine learning (ML) algorithms for easy online deployment are employed for SOH assessment. The results show that the hybrid feature set performs well and has high accuracy in SOH estimation for downgraded data, with a minimum root mean square error (RMSE) of 0.36%. Only three mechanism features derived from the incremental capacity curve can still provide a proper assessment, with a minimum RMSE of 0.44%. Voltage-based features can assist in evaluating battery state, improving accuracy by up to 20%.
APA, Harvard, Vancouver, ISO, and other styles
36

Kamila, Vina Zahrotun, Islamiyah, Ade Nugraha, Nazila Fairuz Assyifa, and Rara Puspa Aisyah. "Engineering Students’ Perspectives on Progress Tracking and Badge Features." Journal of Learning and Development Studies 1, no. 1 (November 14, 2021): 94–99. http://dx.doi.org/10.32996/jlds.2021.1.1.9.

Full text
Abstract:
The purpose of this study was to examine the opinions of engineering students from various departments regarding the progress tracking and digital badge features applied in courses in the Learning Management System (LMS). The phenomenological design, a qualitative research method, was used in this study. Online forms with closed and open questions were used to collect data. The study was conducted with a total of 226 students from 5 different departments. The data were subjected to content analysis. According to the research results, students stated that the tested features triggered motivation to complete assignments and develop other students. Some students were of the opinion that these features should be included in every course in the LMS. Overall, students' view of an LMS equipped with these features (Moodle), and especially of the progress tracking feature and digital badges, is quite good. What needs to be questioned is the willingness of lecturers and policymakers to apply it consistently in the campus environment.
APA, Harvard, Vancouver, ISO, and other styles
37

Park, S., and Y. Jun. "Automated segmentation of point data in a feature-based reverse engineering system." Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture 216, no. 3 (March 1, 2002): 445–51. http://dx.doi.org/10.1243/0954405021519951.

Full text
Abstract:
This paper proposes a novel methodology for robust segmentation of scanned point data that has been implemented in the feature-based reverse engineering system (FBRES). In the proposed method, firstly, triangle meshes are generated from the input point data. The normal and the area of generated meshes are checked to find boundary meshes using the angle deviation criterion and the area criterion based upon a region-growing technique. Boundary meshes of each segmented region are connected into loops. Then, the meshes closed by each boundary loop are segmented into a single distinctive region. Finally, each segmented region is mapped into a single feature using an artificial neural network (ANN) based feature recognizer. The FBRES is currently dedicated to reconstructing prismatic features such as a block, pocket, step, slot, hole and boss, which are very common and crucial in mechanical engineering products. The effectiveness of the proposed segmentation method is validated with experimental results.
APA, Harvard, Vancouver, ISO, and other styles
38

Majidiyan, Hamed, Hossein Enshaei, Damon Howe, and Eric Gubesch. "Part B: Innovative Data Augmentation Approach to Boost Machine Learning for Hydrodynamic Purposes—Computational Efficiency." Applied Sciences 15, no. 1 (January 1, 2025): 346. https://doi.org/10.3390/app15010346.

Full text
Abstract:
The increasing influence of AI across various scientific domains has prompted engineering to embark on new explorations. However, studies often overlook the foundational aspects of the maritime field, leading to over-optimistic or oversimplified outputs for real-world applications. We previously highlighted the sensitivity of trained models to noise, the importance of computational efficiency, and the need for feature engineering/compactness in hydrodynamic models due to the stochastic nature of waves. A novel data analysis framework was introduced to augment data for machine learning (ML) models, with two purposes: transferring features from high-fidelity to low-fidelity surrogates, and enhancing simulation data to increase computational efficiency. The current paper addresses the second objective. Wave-induced response time series data from experiments on a spherical model under various wave conditions were analyzed using continuous wavelet transform to extract spectral-temporal features. These features were then reorganized into a new feature map and augmented with additional endogenous features to enhance their uniqueness. Different ML models were trained; the new framework substantially reduced training costs while maintaining fair accuracy, with training times slashed from hours to seconds. The significance of the current study extends beyond the maritime context and can be utilized for ML applications in intrinsically stochastic data.
APA, Harvard, Vancouver, ISO, and other styles
39

E. Ramadevi, K. Brindha,. "Twitter Data Feature Selection Using Enhanced Genetic Algorithm." Tuijin Jishu/Journal of Propulsion Technology 44, no. 4 (October 16, 2023): 7738–46. http://dx.doi.org/10.52783/tjjpt.v44.i4.2670.

Full text
Abstract:
Feature selection is a critical task in sentiment analysis, especially when analyzing Twitter data for stock market sentiment. This paper proposes an enhanced genetic algorithm (GA) for feature selection using Yahoo Finance stock data and publicly available Twitter data. The objective is to identify the most relevant features that can effectively predict stock market sentiment. The proposed GA integrates methods to enhance its exploration and exploitation capabilities, enabling it to search a larger feature space and improve the quality of the selected features. The algorithm starts by initializing a population of random binary chromosomes, with each chromosome representing a feature subset. Fitness evaluation is performed using sentiment analysis techniques to assess the predictive power of each feature subset. Experimental evaluation using Yahoo Finance stock data and Twitter data shows that the enhanced GA outperforms conventional GA and PSO methods in terms of accuracy and prediction performance. The proposed approach provides valuable insights for sentiment analysis and feature selection in the context of stock market sentiment using Twitter data.
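GA-based feature selection as described (binary chromosomes, fitness evaluation, selection, crossover, mutation) can be sketched on synthetic stand-in data; the simple class-separation fitness below replaces the paper's sentiment-model scoring, and all data and parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, pop_size = 200, 10, 20
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, [1, 4]] += y[:, None] * 1.5      # two genuinely predictive features

def fitness(chrom):
    if not chrom.any():
        return -1.0
    Xs = X[:, chrom]
    gap = np.linalg.norm(Xs[y == 1].mean(0) - Xs[y == 0].mean(0))
    return gap - 0.05 * chrom.sum()   # reward separation, penalize size

pop = rng.random((pop_size, d)) < 0.5             # random binary chromosomes
for _ in range(40):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]   # truncation select
    cut = rng.integers(1, d, pop_size)                   # one-point crossover
    pairs = rng.integers(len(parents), size=(pop_size, 2))
    children = np.array([np.concatenate([parents[a][:c], parents[b][c:]])
                         for (a, b), c in zip(pairs, cut)])
    children ^= rng.random((pop_size, d)) < 0.02         # bit-flip mutation
    pop = children
best = pop[np.argmax([fitness(c) for c in pop])]
print("selected features:", np.flatnonzero(best))
```

The "enhanced" GA of the paper presumably tunes exactly these operators (selection pressure, crossover, mutation rate) to balance exploration against exploitation.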
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Qian, Kaihong Yang, Lihui Wang, and Siyang Zhou. "Geological Type Recognition by Machine Learning on In-Situ Data of EPB Tunnel Boring Machines." Mathematical Problems in Engineering 2020 (April 27, 2020): 1–10. http://dx.doi.org/10.1155/2020/3057893.

Full text
Abstract:
At present, much large-scale engineering equipment can acquire massive in-situ data at runtime. In-depth data mining is conducive to the real-time understanding of equipment operation status or recognition of the service environment. This paper proposes a geological type recognition system based on the analysis of in-situ data recorded during TBM tunneling, to address geological information acquisition during TBM construction. Owing to the high dimensionality of and nonlinear coupling between parameters of TBM in-situ data, dimensionality-reduction feature engineering and machine learning methods are introduced into TBM in-situ data analysis. The chi-square test is used to screen for sensitive features because TBM parameters do not follow common distributions. Considering the complex relationships, ANN, SVM, KNN, and CART algorithms are used to construct a geology recognition classifier. A case study of a subway tunnel project constructed using an earth pressure balance tunnel boring machine (EPB-TBM) in China is used to verify the effectiveness of the proposed geological recognition method. The result shows that the recognition accuracy gradually increases to a stable level with the increase of input features, and the accuracy of all algorithms is higher than 97%. Seven features are considered the best selection strategy among SVM, KNN, and ANN, while feature selection is an inherent part of the CART method, which shows a good recognition performance. This work provides an intelligent path for obtaining geological information for underground excavation TBM projects and a possibility for solving the problem of engineering recognition of more complex geological conditions.
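Chi-square screening of features against a class label can be sketched with scikit-learn; note that `chi2` expects non-negative inputs, so real TBM parameters would first be shifted or binned. The rock classes and feature names below are synthetic stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(6)
n = 400
geology = rng.integers(0, 3, n)                  # 3 synthetic rock classes
thrust = geology * 2.0 + rng.gamma(2.0, 1.0, n)  # class-dependent feature
torque = geology * 1.5 + rng.gamma(2.0, 1.0, n)  # class-dependent feature
noise = rng.gamma(2.0, 1.0, n)                   # uninformative feature

X = np.column_stack([thrust, torque, noise])
scores, pvalues = chi2(X, geology)               # per-feature chi2 scores
print("chi2 scores:", np.round(scores, 1))
keep = SelectKBest(chi2, k=2).fit_transform(X, geology)
print(keep.shape)                                # (400, 2)
```

The screened feature matrix would then feed the ANN/SVM/KNN/CART classifiers the abstract compares.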
APA, Harvard, Vancouver, ISO, and other styles
41

Jiang, Hong, Wen Lei Sun, Mamtimin Gheni, and Yong Fang Shi. "Extracting the Features Point Data Based on the FE and RE Softwares." Key Engineering Materials 462-463 (January 2011): 1062–67. http://dx.doi.org/10.4028/www.scientific.net/kem.462-463.1062.

Full text
Abstract:
In view of the bottleneck of the interfaces between Forward Engineering (FE) and Reverse Engineering (RE) softwares, the method of extracting feature for hybrid modeling of FE and RE is attempted to present based on secondary development for UG NX by using VC++. The method is proved through two simple surface modeling.
APA, Harvard, Vancouver, ISO, and other styles
42

Beyer, Christian, Maik Büttner, Vishnu Unnikrishnan, Miro Schleicher, Eirini Ntoutsi, and Myra Spiliopoulou. "Active feature acquisition on data streams under feature drift." Annals of Telecommunications 75, no. 9-10 (July 8, 2020): 597–611. http://dx.doi.org/10.1007/s12243-020-00775-2.

Full text
Abstract:
Traditional active learning tries to identify instances for which the acquisition of the label increases model performance under budget constraints. Less research has been devoted to the task of actively acquiring feature values, wherein both the instance and the feature must be selected intelligently, and even less to a scenario where the instances arrive in a stream with feature drift. We propose an active feature acquisition strategy for data streams with feature drift, as well as an active feature acquisition evaluation framework. We also implement a baseline that chooses features randomly and compare the random approach against eight different methods in a scenario where we can acquire at most one feature at a time per instance and where all features are considered to cost the same. Our initial experiments on 9 different data sets, with 7 different degrees of missing features and 8 different budgets, show that our developed methods outperform the random acquisition on 7 data sets and have a comparable performance on the remaining two.
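The random-acquisition baseline can be sketched in a few lines: instances stream in with some feature values missing, and under a budget the strategy buys at most one missing value per instance. Streaming setup, missingness rate, and budget below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, budget = 100, 5, 60
X = rng.normal(size=(n, d))
missing = rng.random((n, d)) < 0.4          # 40% of values start missing
acquired = np.zeros((n, d), bool)

spent = 0
for i in range(n):                          # instances arrive one by one
    gaps = np.flatnonzero(missing[i])
    if gaps.size and spent < budget:
        j = rng.choice(gaps)                # random strategy: any gap
        acquired[i, j] = True
        missing[i, j] = False
        spent += 1

print("values bought:", spent, "of budget", budget)
print("still missing:", missing.sum())
```

A smarter strategy would replace the `rng.choice(gaps)` line with a score, e.g. the expected model improvement per feature, which is what the paper's eight methods vary.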
APA, Harvard, Vancouver, ISO, and other styles
43

Khaliq, Ali Raza, Subhan Ullah, Tahir Ahmad, Ashish Yadav, and M. Imran Majid. "Behavioral Analysis of Backdoor Malware Exploiting Heap Overflow Vulnerabilities Using Data Mining and Machine Learning." Pakistan Journal of Engineering, Technology & Science 11, no. 1 (November 14, 2023): 1–14. http://dx.doi.org/10.22555/pjets.v11i1.984.

Full text
Abstract:
Backdoor malware remains a persistent and elusive threat that successfully evades conventional detection methods through intricate techniques, such as registry key concealment and API call manipulation. In this study, we introduce an approach to detect backdoor malware, drawing upon the diverse domains of cybersecurity. Our method combines static and dynamic analysis techniques with machine learning methodologies, particularly emphasizing classification and feature engineering. Through static analysis, we extract valuable raw features from malware binaries. Discerning the most significant attributes, we delve into the calling frequencies embedded within these raw features. Subsequently, these selected attributes undergo a meticulous refinement process facilitated by feature engineering techniques, culminating in a streamlined set of distinctive features. To accurately detect malware exploiting heap-based overflow vulnerabilities, we employ three distinct yet potent classifiers: J48, Naïve Bayes, and Simple Logistic. These classifiers are trained and tested using carefully curated feature sets. Our approach combines machine learning and data mining principles to develop a comprehensive malware detection methodology. We demonstrate the efficacy of our approach through rigorous validation using two distinct settings: a dedicated training/testing set and a comprehensive 10-fold validation. Our approach achieves 90.29% and 84.46% accuracy under the train/test split and cross-validation strategies, respectively.
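The "calling frequencies" the abstract mentions amount to counting how often each API name appears in the extracted call list and mapping the counts onto a fixed vocabulary. A minimal sketch (the API names are hypothetical examples, not the paper's actual feature set):

```python
from collections import Counter

def call_frequency_features(api_calls, vocabulary):
    # Map a list of API calls extracted from a binary to a fixed-length
    # frequency vector over a chosen vocabulary, ready for a classifier
    # such as J48, Naive Bayes, or Simple Logistic.
    counts = Counter(api_calls)
    return [counts.get(name, 0) for name in vocabulary]
```

Each binary then becomes one row of the training matrix, with the vocabulary fixed across all samples.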
APA, Harvard, Vancouver, ISO, and other styles
44

Yoon, Jaehan, and Sooyoung Cha. "FeatMaker: Automated Feature Engineering for Search Strategy of Symbolic Execution." Proceedings of the ACM on Software Engineering 1, FSE (July 12, 2024): 2447–68. http://dx.doi.org/10.1145/3660815.

Full text
Abstract:
We present FeatMaker, a novel technique that automatically generates state features to enhance the search strategy of symbolic execution. Search strategies, designed to address the well-known state-explosion problem, prioritize which program states to explore. These strategies typically depend on a "state feature" that describes a specific property of program states, using this feature to score and rank them. Recently, search strategies employing multiple state features have shown superior performance over traditional strategies that use a single, generic feature. However, the process of designing these features remains largely manual. Moreover, manually crafting state features is both time-consuming and prone to yielding unsatisfactory results. The goal of this paper is to fully automate the process of generating state features for search strategies from scratch. The key idea is to leverage path-conditions, which are basic but vital information maintained by symbolic execution, as state features. A challenge arises when employing all path-conditions as state features, as it results in an excessive number of state features. To address this, we present a specialized algorithm that iteratively generates and refines state features based on data accumulated during symbolic execution. Experimental results on 15 open-source C programs show that FeatMaker significantly outperforms existing search strategies that rely on manually-designed features, both in terms of branch coverage and bug detection. Notably, FeatMaker achieved an average of 35.3% higher branch coverage than state-of-the-art strategies and discovered 15 unique bugs. Of these, six were detected exclusively by FeatMaker.
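The score-and-rank step shared by such feature-based strategies can be sketched as a weighted sum over the features a state exhibits. The feature names, weights, and state representation below are illustrative assumptions, not FeatMaker's internals:

```python
def score_state(state_features, weights):
    # Score a program state as the weighted sum of the state features
    # it exhibits; higher-scoring states get explored first.
    return sum(weights.get(feature, 0.0) for feature in state_features)

def pick_next_state(states, weights):
    # One strategy step: choose the pending state with the highest score.
    return max(states, key=lambda s: score_state(s["features"], weights))
```

FeatMaker's contribution is generating and refining the feature set (and hence the weights) automatically from path-conditions instead of hand-crafting them.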
APA, Harvard, Vancouver, ISO, and other styles
45

Liu, Zhenyu, Tao Wen, Wei Sun, and Qilong Zhang. "A Novel Multiway Splits Decision Tree for Multiple Types of Data." Mathematical Problems in Engineering 2020 (November 12, 2020): 1–12. http://dx.doi.org/10.1155/2020/7870534.

Full text
Abstract:
Classical decision trees such as C4.5 and CART partition the feature space using axis-parallel splits. Oblique decision trees use the oblique splits based on linear combinations of features to potentially simplify the boundary structure. Although oblique decision trees have higher generalization accuracy, most oblique split methods are not directly conducive to the categorical data and are computationally expensive. In this paper, we propose a multiway splits decision tree (MSDT) algorithm, which adopts feature weighting and clustering. This method can combine multiple numerical features, multiple categorical features, or multiple mixed features. Experimental results show that MSDT has excellent performance for multiple types of data.
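A multiway split, in contrast to the binary axis-parallel splits of C4.5 and CART, sends each group of the splitting feature to its own child node. A minimal sketch for a single categorical feature follows; MSDT additionally weights and clusters multiple mixed features, which is omitted here:

```python
def multiway_split(rows, feature):
    # Partition the rows into one child node per distinct value of the
    # splitting feature (the simplest form of a multiway split).
    children = {}
    for row in rows:
        children.setdefault(row[feature], []).append(row)
    return children
```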
APA, Harvard, Vancouver, ISO, and other styles
46

Procházka, Aleš, Jiří Kuchyňka, Oldřich Vyšata, Martin Schätz, Mohammadreza Yadollahi, Saeid Sanei, and Martin Vališ. "Sleep scoring using polysomnography data features." Signal, Image and Video Processing 12, no. 6 (February 10, 2018): 1043–51. http://dx.doi.org/10.1007/s11760-018-1252-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Meethongjan, Kittikhun, Vinh Truong Hoang, and Thongchai Surinwarangkoon. "Data augmentation by combining feature selection and color features for image classification." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 6 (December 1, 2022): 6172. http://dx.doi.org/10.11591/ijece.v12i6.pp6172-6177.

Full text
Abstract:
Image classification is an essential task in computer vision with various applications such as biomedicine and industrial inspection. In some specific cases, a huge amount of training data is required to obtain a better model. However, fully labelled data is costly to obtain. Many basic pre-processing methods are applied for generating new images by translation, rotation, flipping, cropping, and adding noise. This can degrade the performance. In this paper, we propose a method for data augmentation based on color feature information combined with feature selection. This combination improves the classification accuracy. The proposed approach is evaluated on several texture datasets using local binary pattern features.
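The local binary patterns used in the evaluation threshold each pixel's 3x3 neighbourhood at the centre value and read the result as a byte. A basic single-pixel sketch (real LBP implementations add rotation-invariant and uniform variants, and histogram the codes over the image):

```python
def lbp_code(patch):
    # patch is a 3x3 neighbourhood; compare the 8 neighbours (clockwise
    # from the top-left) with the centre and pack the bits into a byte.
    centre = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= centre:
            code |= 1 << bit
    return code
```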
APA, Harvard, Vancouver, ISO, and other styles
48

Newman, Katelyn E., Chris K. Mechefske, Markus Timusk, and Dustin Helm. "Rotary-percussive drill bit condition prediction using traditional feature engineering and neural network-based feature extraction." Proceedings of the International Conference on Condition Monitoring and Asset Management 2023, no. 1 (January 1, 2023): 1–12. http://dx.doi.org/10.1784/cm2023.2e2.

Full text
Abstract:
Condition monitoring of replaceable components in underground drill rigs using machine learning is a difficult task, as the operating conditions may vary considerably between each hole. To model this nuanced data with acceptable performance, feature extraction must be performed, either by field experts or using automated machine learning architectures. This work compares the use of traditional feature extraction techniques to neural network-based automatic feature extraction for a rotary-percussive underground hydraulic drill rig. A dataset was created for the purpose of predicting the condition of drill bits with tungsten carbide button inserts, and consists of both operational pressures, as well as signals from an accelerometer and a microphone. Two feature extraction approaches are compared using data collected under controlled operating conditions. The first approach uses traditional features including kurtosis, FFT features, and wavelet features. Feature selection and bit condition prediction are performed using a Random Forest model. The second approach uses neural networks to automatically extract features from raw data. Convolutional neural networks and long short-term memory networks are used in the automatic feature extraction approach. The traditional feature extraction approach is sufficient for binary classification of bit condition, while the automatic neural network-based feature extraction approach is superior when prediction is scaled up to a more complex multi-class problem.
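Kurtosis, one of the traditional features named in the abstract, is quick to compute from a raw sensor signal. A sketch using the biased sample estimator (FFT and wavelet features would be computed alongside it and fed to the Random Forest):

```python
def kurtosis(signal):
    # Fourth standardized moment of the signal: high values flag the
    # impulsive content typical of percussive or impact-driven faults.
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    if variance == 0:
        return 0.0
    return sum((x - mean) ** 4 for x in signal) / n / variance ** 2
```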
APA, Harvard, Vancouver, ISO, and other styles
49

Atteia, Ghada, Rana Alnashwan, and Malak Hassan. "Hybrid Feature-Learning-Based PSO-PCA Feature Engineering Approach for Blood Cancer Classification." Diagnostics 13, no. 16 (August 14, 2023): 2672. http://dx.doi.org/10.3390/diagnostics13162672.

Full text
Abstract:
Acute lymphoblastic leukemia (ALL) is a lethal blood cancer that is characterized by an abnormally increased number of immature lymphocytes in the blood or bone marrow. For effective treatment of ALL, early assessment of the disease is essential. Manual examination of stained blood smear images is the current practice for initially screening ALL. This practice is time-consuming and error-prone. In order to effectively diagnose ALL, numerous deep-learning-based computer vision systems have been developed for detecting ALL in blood peripheral images (BPIs). Such systems extract a huge number of image features and use them to perform the classification task. The extracted features may contain irrelevant or redundant features that could reduce classification accuracy and increase the running time of the classifier. Feature selection is considered an effective tool to mitigate the curse-of-dimensionality problem and alleviate its corresponding shortcomings. One of the most effective dimensionality-reduction tools is principal component analysis (PCA), which maps input features into an orthogonal space and extracts the features that convey the highest variability from the data. Other feature selection approaches utilize evolutionary computation (EC) to search the feature space and localize optimal features. To profit from both feature selection approaches in improving the classification performance for ALL, in this study, a new hybrid deep-learning-based feature engineering approach is proposed. The introduced approach integrates the powerful capability of PCA and particle swarm optimization (PSO) in selecting informative features from BPI images with the feature-extraction power of pre-trained CNNs. Image features are first extracted through the feature-transfer capability of the GoogleNet convolutional neural network (CNN). PCA is utilized to generate a feature set of the principal components that covers 95% of the variability in the data. In parallel, bio-inspired particle swarm optimization is used to search for the optimal image features. The PCA- and PSO-derived feature sets are then integrated to develop a hybrid set of features that is used to train Bayesian-optimized support vector machine (SVM) and subspace discriminant ensemble-learning (SDEL) classifiers. The obtained results show improved classification performance for the ML classifiers trained on the proposed hybrid feature set over the original PCA, PSO, and all-extracted feature sets for ALL multi-class classification. The Bayesian-optimized SVM trained with the proposed hybrid PCA-PSO feature set achieves the highest classification accuracy of 97.4%. The classification performance of the proposed feature engineering approach competes with the state of the art.
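The 95%-variability rule applied in the abstract reduces to counting eigenvalues of the covariance matrix until their cumulative share reaches the threshold. A sketch of just that step (the eigenvalues below are made-up illustrative numbers):

```python
def components_for_variance(eigenvalues, threshold=0.95):
    # eigenvalues: variances of the principal components, sorted in
    # descending order; return the number of components whose
    # cumulative share of the total variance reaches the threshold.
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        cumulative += ev
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)
```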
APA, Harvard, Vancouver, ISO, and other styles
50

Cheng, Zhun, and Zhixiong Lu. "A Novel Efficient Feature Dimensionality Reduction Method and Its Application in Engineering." Complexity 2018 (October 8, 2018): 1–14. http://dx.doi.org/10.1155/2018/2879640.

Full text
Abstract:
In the engineering field, excessive data dimensions affect the efficiency of machine learning and the analysis of relationships between data or features. To make feature dimensionality reduction more effective and faster, this paper proposes a new feature dimensionality reduction approach combining a sampling survey method with a heuristic intelligent optimization algorithm. Drawing on feature selection, this method builds a feature-scoring system and a reduced-dimension length-scoring system based on the sampling survey method. According to feature scores and reduced-dimension lengths, the method selects a number of top-ranked, high-scoring features and reduced-dimension lengths. The method then performs an in-depth optimized selection among these high-scoring features and reduced-dimension lengths using an improved heuristic intelligent optimization algorithm. To verify the effectiveness of the dimensionality reduction method, this paper applies it to road roughness time-domain estimation based on vehicle dynamic response and to gene-selection research in bioengineering. Results in the first case show that the proposed method can improve the accuracy of road roughness time-domain estimation to above 0.99 and reduce the measured data of the vehicle dynamic response, reducing the experimental workload significantly. Results in the second case show that the method can select a set of genes quickly and effectively with high disease recognition accuracy.
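The sampling-survey scoring idea can be sketched as repeatedly drawing random feature subsets, evaluating each subset, and crediting its score to every member. The `evaluate` callback below is a hypothetical stand-in for the paper's scoring criterion, which is not specified in the abstract:

```python
import random

def sample_feature_scores(features, evaluate, rounds=100, subset_size=3):
    # Accumulate, per feature, the scores of the sampled subsets it
    # appears in; high-scoring features are kept for the subsequent
    # heuristic optimization stage.
    scores = {f: 0.0 for f in features}
    for _ in range(rounds):
        subset = random.sample(features, subset_size)
        subset_score = evaluate(subset)
        for f in subset:
            scores[f] += subset_score
    return scores
```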
APA, Harvard, Vancouver, ISO, and other styles

To the bibliography