Academic literature on the topic 'Selected subset of training data'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Selected subset of training data.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Selected subset of training data"

1

Liu, Xiao Fang, and Chun Yang. "Training Data Reduction and Classification Based on Greedy Kernel Principal Component Analysis and Fuzzy C-Means Algorithm." Applied Mechanics and Materials 347-350 (August 2013): 2390–94. http://dx.doi.org/10.4028/www.scientific.net/amm.347-350.2390.

Abstract:
Nonlinear feature extraction with the standard Kernel Principal Component Analysis (KPCA) method requires large amounts of memory and has high computational complexity on large datasets. A Greedy Kernel Principal Component Analysis (GKPCA) method is applied to reduce the training data and to handle the nonlinear feature extraction problem for large training sets in classification. First, a subset that approximates the original training data is selected from the full training data using the greedy technique of the GKPCA method. Then, the feature extraction model is trained on this subset instead of the full training data. Finally, the FCM algorithm classifies the feature-extraction outputs of the GKPCA, KPCA and PCA methods, respectively. The simulation results indicate that the feature extraction performance of both the GKPCA and KPCA methods outperforms the PCA method. In addition to retaining the performance of the KPCA method, the GKPCA method reduces computational complexity in classification because of the reduced training set.
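For readers who want a concrete picture of the greedy reduction step described above, the sketch below selects a small subset whose span approximates the full data in kernel feature space via pivoted incomplete Cholesky; the RBF kernel, subset size, and random data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def greedy_kernel_subset(X, m, gamma=0.5):
    """Greedily pick m points whose span in RBF feature space best approximates
    all points (pivoted incomplete Cholesky, a common greedy KPCA reduction)."""
    K = rbf_kernel(X, X, gamma=gamma)
    n = X.shape[0]
    G = np.zeros((n, m))                      # partial Cholesky factors
    residual = np.diag(K).copy()              # unexplained "energy" per point
    selected = []
    for j in range(m):
        i = int(np.argmax(residual))          # point worst explained so far
        selected.append(i)
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(residual[i])
        residual = np.maximum(np.diag(K) - np.sum(G[:, :j + 1] ** 2, axis=1), 0.0)
    return np.array(selected)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                 # stand-in for the full training data
subset_idx = greedy_kernel_subset(X, m=50)
X_reduced = X[subset_idx]                     # train the feature-extraction model on this
print("reduced training set shape:", X_reduced.shape)
```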
2

Yu, Siwei, Jianwei Ma, and Stanley Osher. "Monte Carlo data-driven tight frame for seismic data recovery." GEOPHYSICS 81, no. 4 (July 2016): V327–V340. http://dx.doi.org/10.1190/geo2015-0343.1.

Abstract:
Seismic data denoising and interpolation are essential preprocessing steps in any seismic data processing chain. Sparse transforms with a fixed basis are often used in these two steps. Recently, we have developed an adaptive learning method, the data-driven tight frame (DDTF) method, for seismic data denoising and interpolation. With its adaptability to seismic data, the DDTF method achieves high-quality recovery. For 2D seismic data, the DDTF method is much more efficient than traditional dictionary learning methods. But for 3D or 5D seismic data, the DDTF method results in a high computational expense. The motivation behind this work is to accelerate the filter bank training process in DDTF, while doing less damage to the recovery quality. The most frequently used method involves only a randomly selected subset of the training set. However, this random selection method uses no prior information of the data. We have designed a new patch selection method for DDTF seismic data recovery. We suppose that patches with higher variance contain more information related to complex structures, and should be selected into the training set with higher probability. First, we calculate the variance of all available patches. Then for each patch, a uniformly distributed random number is generated and the patch is preserved if its variance is greater than the random number. Finally, all selected patches are used for filter bank training. We call this procedure the Monte Carlo DDTF method. We have tested the trained filter bank on seismic data denoising and interpolation. Numerical results using this Monte Carlo DDTF method surpass random or regular patch selection DDTF when the sizes of the training sets are the same. We have also used state-of-the-art methods based on the curvelet transform, block matching 4D, and multichannel singular spectrum analysis as comparisons when dealing with field data.
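The variance-thresholded patch acceptance rule described in this abstract can be written down in a few lines; the sketch below is a hedged illustration in which the patch shapes, the normalization of variances to [0, 1], and the toy data are assumptions rather than the paper's code.

```python
import numpy as np

def monte_carlo_patch_selection(patches, rng=None):
    """Keep each patch if its (normalized) variance exceeds a fresh uniform
    random number, so high-variance patches enter the training set with
    higher probability, as described in the abstract."""
    rng = np.random.default_rng() if rng is None else rng
    variances = patches.reshape(len(patches), -1).var(axis=1)
    variances = variances / variances.max()   # assumption: rescale to [0, 1]
    thresholds = rng.uniform(size=len(patches))
    return patches[variances > thresholds]

rng = np.random.default_rng(1)
patches = rng.normal(size=(1000, 8, 8))       # toy stand-in for seismic patches
training_patches = monte_carlo_patch_selection(patches, rng)
print(len(training_patches), "patches kept for filter-bank training")
```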
3

Ukil, Arijit, Leandro Marin, and Antonio J. Jara. "When less is more powerful: Shapley value attributed ablation with augmented learning for practical time series sensor data classification." PLOS ONE 17, no. 11 (November 23, 2022): e0277975. http://dx.doi.org/10.1371/journal.pone.0277975.

Abstract:
Time series sensor data classification tasks often suffer from training data scarcity issue due to the expenses associated with the expert-intervened annotation efforts. For example, Electrocardiogram (ECG) data classification for cardio-vascular disease (CVD) detection requires expensive labeling procedures with the help of cardiologists. Current state-of-the-art algorithms like deep learning models have shown outstanding performance under the general requirement of availability of large set of training examples. In this paper, we propose Shapley Attributed Ablation with Augmented Learning: ShapAAL, which demonstrates that deep learning algorithm with suitably selected subset of the seen examples or ablating the unimportant ones from the given limited training dataset can ensure consistently better classification performance under augmented training. In ShapAAL, additive perturbed training augments the input space to compensate the scarcity in training examples using Residual Network (ResNet) architecture through perturbation-induced inputs, while Shapley attribution seeks the subset from the augmented training space for better learnability with the goal of better general predictive performance, thanks to the “efficiency” and “null player” axioms of transferable utility games upon which Shapley value game is formulated. In ShapAAL, the subset of training examples that contribute positively to a supervised learning setup is derived from the notion of coalition games using Shapley values associated with each of the given inputs’ contribution into the model prediction. ShapAAL is a novel push-pull deep architecture where the subset selection through Shapley value attribution pushes the model to lower dimension while augmented training augments the learning capability of the model over unseen data. We perform ablation study to provide the empirical evidence of our claim and we show that proposed ShapAAL method consistently outperforms the current baselines and state-of-the-art algorithms for time series sensor data classification tasks from publicly available UCR time series archive that includes different practical important problems like detection of CVDs from ECG data.
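As a rough illustration of valuing training examples with Shapley-style marginal contributions and keeping only the positive contributors, the sketch below uses a truncated Monte Carlo estimate with a small logistic-regression utility; the utility model, permutation budget, and zero threshold are assumptions and do not reproduce the ShapAAL architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def utility(idx, X_tr, y_tr, X_val, y_val):
    """Validation accuracy of a small model trained on the chosen indices."""
    if len(np.unique(y_tr[idx])) < 2:
        return 0.0
    clf = LogisticRegression(max_iter=200).fit(X_tr[idx], y_tr[idx])
    return clf.score(X_val, y_val)

def shapley_values(X_tr, y_tr, X_val, y_val, n_perm=10, rng=None):
    """Monte Carlo estimate of each training example's Shapley value."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.zeros(len(X_tr))
    for _ in range(n_perm):
        perm = rng.permutation(len(X_tr))
        prev = 0.0
        for k in range(1, len(perm) + 1):
            cur = utility(perm[:k], X_tr, y_tr, X_val, y_val)
            phi[perm[k - 1]] += cur - prev     # marginal contribution of example perm[k-1]
            prev = cur
    return phi / n_perm

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=60) > 0).astype(int)
phi = shapley_values(X[:40], y[:40], X[40:], y[40:], rng=rng)
selected = np.where(phi > 0)[0]                # keep positive contributors only
print(f"{len(selected)} of 40 training examples retained")
```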
4

Hampson, Daniel P., James S. Schuelke, and John A. Quirein. "Use of multiattribute transforms to predict log properties from seismic data." GEOPHYSICS 66, no. 1 (January 2001): 220–36. http://dx.doi.org/10.1190/1.1444899.

Abstract:
We describe a new method for predicting well‐log properties from seismic data. The analysis data consist of a series of target logs from wells which tie a 3-D seismic volume. The target logs theoretically may be of any type; however, the greatest success to date has been in predicting porosity logs. From the 3-D seismic volume a series of sample‐based attributes is calculated. The objective is to derive a multiattribute transform, which is a linear or nonlinear transform between a subset of the attributes and the target log values. The selected subset is determined by a process of forward stepwise regression, which derives increasingly larger subsets of attributes. An extension of conventional crossplotting involves the use of a convolutional operator to resolve frequency differences between the target logs and the seismic data. In the linear mode, the transform consists of a series of weights derived by least‐squares minimization. In the nonlinear mode, a neural network is trained, using the selected attributes as inputs. Two types of neural networks have been evaluated: the multilayer feedforward network (MLFN) and the probabilistic neural network (PNN). Because of its mathematical simplicity, the PNN appears to be the network of choice. To estimate the reliability of the derived multiattribute transform, crossvalidation is used. In this process, each well is systematically removed from the training set, and the transform is rederived from the remaining wells. The prediction error for the hidden well is then calculated. The validation error, which is the average error for all hidden wells, is used as a measure of the likely prediction error when the transform is applied to the seismic volume. The method is applied to two real data sets. In each case, we see a continuous improvement in predictive power as we progress from single‐attribute regression to linear multiattribute prediction to neural network prediction. This improvement is evident not only on the training data but, more importantly, on the validation data. In addition, the neural network shows a significant improvement in resolution over that from linear regression.
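A compact sketch of forward stepwise attribute selection validated in the leave-one-well-out fashion described above is given below; the attribute matrix, the use of plain linear regression, and the fixed number of steps are assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def forward_stepwise(X, y, groups, n_steps=4):
    """Greedily add the attribute that most reduces leave-one-well-out MSE."""
    remaining, chosen, errors = list(range(X.shape[1])), [], []
    logo = LeaveOneGroupOut()
    for _ in range(n_steps):
        scores = []
        for a in remaining:
            s = cross_val_score(LinearRegression(), X[:, chosen + [a]], y,
                                groups=groups, cv=logo,
                                scoring="neg_mean_squared_error").mean()
            scores.append((s, a))
        s_best, a_best = max(scores)
        chosen.append(a_best)
        remaining.remove(a_best)
        errors.append(-s_best)
    return chosen, errors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                           # 12 mock seismic attributes
y = 2 * X[:, 3] - X[:, 7] + 0.1 * rng.normal(size=200)   # mock porosity target
wells = rng.integers(0, 8, size=200)                     # well each sample ties to
attrs, val_mse = forward_stepwise(X, y, wells)
print("selected attributes:", attrs)
print("validation MSE per step:", np.round(val_mse, 3))
```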
5

Abuassba, Adnan O. M., Dezheng Zhang, Xiong Luo, Ahmad Shaheryar, and Hazrat Ali. "Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines." Computational Intelligence and Neuroscience 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/3405463.

Abstract:
Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function of increasing diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.
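To make the ensemble construction concrete, here is a hedged sketch in which small ELM-like base learners (random hidden layer, ridge-solved output weights) are each trained on a randomly resampled subset of the training data and combined by majority vote; the network sizes, subset fraction, and number of base learners are illustrative assumptions, not the AELME specification.

```python
import numpy as np

class TinyELM:
    """Single-hidden-layer ELM: random hidden weights, ridge-solved output layer."""
    def __init__(self, n_hidden=50, reg=1e-2, rng=None):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng() if rng is None else rng

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden), H.T @ T)
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]

def elm_ensemble_predict(X_tr, y_tr, X_te, n_models=7, subset_frac=0.6, seed=0):
    """Train each base ELM on a random resample of the data; majority vote at test time."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_models):
        idx = rng.choice(len(X_tr), size=int(subset_frac * len(X_tr)), replace=True)
        votes.append(TinyELM(rng=rng).fit(X_tr[idx], y_tr[idx]).predict(X_te))
    votes = np.stack(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
pred = elm_ensemble_predict(X[:200], y[:200], X[200:])
print("ensemble accuracy:", (pred == y[200:]).mean())
```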
6

Lai, Feilin, and Xiaojun Yang. "Improving Land Cover Classification Over a Large Coastal City Through Stacked Generalization with Filtered Training Samples." Photogrammetric Engineering & Remote Sensing 88, no. 7 (July 1, 2022): 451–59. http://dx.doi.org/10.14358/pers.21-00035r3.

Abstract:
To improve remote sensing-based land cover mapping over heterogenous landscapes, we developed an ensemble classifier based on stacked generalization with a new training sample refinement technique for the combiner. Specifically, a group of individual classifiers were identified and trained to derive land cover information from a satellite image covering a large complex coastal city. The mapping accuracy was quantitatively assessed with an independent reference data set, and several class probability measures were derived for each classifier. Meanwhile, various subsets were derived from the original training data set using the times of being correctly labeled by the individual classifiers as the thresholds, which were further used to train a random forest model as the combiner in generating the final class predictions. While outperforming each individual classifier, the combiner performed better when using the class probabilities rather than the class predictions as the meta-feature layers and performed significantly better when trained with a carefully selected subset rather than with the entire sample set. The novelties of this work are with the insight into the impact of different training sample subsets on the performance of stacked generalization and the filtering technique developed to prepare training samples for the combiner leading to a large accuracy improvement.
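A minimal sketch of stacked generalization with a filtered training subset for the combiner is shown below: base classifiers produce out-of-fold class probabilities as meta-features, and a random forest combiner is trained only on samples correctly labeled by at least a chosen number of base classifiers. The base learners, the threshold, and the synthetic data are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict

def stacked_with_filtered_combiner(X, y, min_correct=2):
    """Meta-features are base-classifier class probabilities; the combiner is
    trained only on samples that at least `min_correct` base classifiers
    labelled correctly (out-of-fold)."""
    bases = [LogisticRegression(max_iter=500), GaussianNB(),
             RandomForestClassifier(n_estimators=100, random_state=0)]
    probas, correct = [], np.zeros(len(y), dtype=int)
    for clf in bases:
        p = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
        probas.append(p)
        correct += (p.argmax(axis=1) == y).astype(int)
        clf.fit(X, y)                              # refit on all data for inference
    meta_X = np.hstack(probas)
    keep = correct >= min_correct                  # filtered training subset
    combiner = RandomForestClassifier(n_estimators=200, random_state=0)
    combiner.fit(meta_X[keep], y[keep])
    return bases, combiner

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
bases, combiner = stacked_with_filtered_combiner(X, y)
print("combiner trained on the filtered subset of", len(y), "samples")
```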
7

Hao, Ruqian, Lin Liu, Jing Zhang, Xiangzhou Wang, Juanxiu Liu, Xiaohui Du, Wen He, Jicheng Liao, Lu Liu, and Yuanying Mao. "A Data-Efficient Framework for the Identification of Vaginitis Based on Deep Learning." Journal of Healthcare Engineering 2022 (February 27, 2022): 1–11. http://dx.doi.org/10.1155/2022/1929371.

Abstract:
Vaginitis is a gynecological disease affecting the health of millions of women all over the world. The traditional diagnosis of vaginitis is based on manual microscopy, which is time-consuming and tedious. The deep learning method offers a fast and reliable solution for an automatic early diagnosis of vaginitis. However, deep neural networks require massive well-annotated data. Manual annotation of microscopic images is highly cost-intensive because it not only is a time-consuming process but also needs highly trained people (doctors, pathologists, or technicians). Most existing active learning approaches are not applicable in microscopic images due to the nature of complex backgrounds and numerous formed elements. To address the problem of the high cost of labeling microscopic images, we present a data-efficient framework for the identification of vaginitis based on transfer learning and active learning strategies. The proposed informative sample selection strategy selected the minimal training subset, and then the pretrained convolutional neural network (CNN) was fine-tuned on the selected subset. The experiment results show that the proposed pipeline can save 37.5% annotation cost while maintaining competitive performance. The proposed promising novel framework can significantly save the annotation cost and has the potential of extending widely to other microscopic imaging applications, such as blood microscopic image analysis.
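The abstract does not spell out the selection criterion, so the sketch below stands in a generic least-confidence active-learning loop with a simple classifier in place of the pretrained CNN; the model, batch size, and number of rounds are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def least_confidence_selection(model, X_pool, batch_size):
    """Pick the pool samples the current model is least confident about."""
    conf = model.predict_proba(X_pool).max(axis=1)
    return np.argsort(conf)[:batch_size]

def active_learning_loop(X_pool, y_pool, X_init, y_init, rounds=5, batch_size=20):
    X_lab, y_lab = X_init.copy(), y_init.copy()
    pool_idx = np.arange(len(X_pool))
    model = LogisticRegression(max_iter=500).fit(X_lab, y_lab)
    for _ in range(rounds):
        pick = least_confidence_selection(model, X_pool[pool_idx], batch_size)
        chosen = pool_idx[pick]
        # "annotate" the chosen samples and add them to the labelled subset
        X_lab = np.vstack([X_lab, X_pool[chosen]])
        y_lab = np.concatenate([y_lab, y_pool[chosen]])
        pool_idx = np.setdiff1d(pool_idx, chosen)
        # retrain (or fine-tune) on the enlarged labelled subset
        model = LogisticRegression(max_iter=500).fit(X_lab, y_lab)
    return model, X_lab, y_lab

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 16))
y = (X[:, :2].sum(axis=1) > 0).astype(int)
model, X_lab, y_lab = active_learning_loop(X[50:], y[50:], X[:50], y[:50])
print(f"labelled {len(y_lab)} of {len(y)} samples")
```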
8

Yao, Yu Kai, Yang Liu, Zhao Li, and Xiao Yun Chen. "An Effective K-Means Clustering Based SVM Algorithm." Applied Mechanics and Materials 333-335 (July 2013): 1344–48. http://dx.doi.org/10.4028/www.scientific.net/amm.333-335.1344.

Abstract:
Support Vector Machine (SVM) is one of the most popular and effective data mining algorithms. It can be used to solve classification or regression problems and has attracted much attention in recent years. SVM finds the optimal separating hyperplane between classes, which gives it outstanding generalization ability. Usually all the labeled records are used as the training set. However, the optimal separating hyperplane depends only on a few crucial samples (Support Vectors, SVs), so we need not train the SVM model on the whole training set. In this paper a novel SVM model based on K-means clustering is presented, in which only a small subset of the original training set is selected to constitute the final training set, and the SVM classifier is built by training on these selected samples. This greatly decreases the scale of the training set, effectively saves the training and prediction cost of SVM, and meanwhile preserves its generalization performance.
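One common way to realize the idea of training an SVM on a small informative subset, offered here only as an assumption rather than the paper's exact algorithm, is to cluster each class with K-means and keep the samples closest to the opposite class's centroids as likely support-vector candidates:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def kmeans_reduced_svm(X, y, clusters_per_class=5, keep_per_class=50):
    """Cluster each class with K-means, keep the samples nearest to the other
    class's centroids (rough SV candidates), and train the SVM only on those."""
    centroids = {}
    for c in np.unique(y):
        km = KMeans(n_clusters=clusters_per_class, n_init=10, random_state=0)
        km.fit(X[y == c])
        centroids[c] = km.cluster_centers_
    keep_idx = []
    for c in np.unique(y):
        other = np.vstack([centroids[o] for o in centroids if o != c])
        idx_c = np.where(y == c)[0]
        # distance of each class-c sample to the nearest opposite-class centroid
        d = np.min(np.linalg.norm(X[idx_c][:, None, :] - other[None, :, :], axis=2), axis=1)
        keep_idx.extend(idx_c[np.argsort(d)[:keep_per_class]])
    keep_idx = np.array(keep_idx)
    svm = SVC(kernel="rbf").fit(X[keep_idx], y[keep_idx])
    return svm, keep_idx

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
svm, keep_idx = kmeans_reduced_svm(X, y)
print(f"SVM trained on {len(keep_idx)} of {len(X)} samples,",
      "accuracy:", round(svm.score(X, y), 3))
```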
9

Nakoneczny, S. J., M. Bilicki, A. Pollo, M. Asgari, A. Dvornik, T. Erben, B. Giblin, et al. "Photometric selection and redshifts for quasars in the Kilo-Degree Survey Data Release 4." Astronomy & Astrophysics 649 (May 2021): A81. http://dx.doi.org/10.1051/0004-6361/202039684.

Abstract:
We present a catalog of quasars with their corresponding redshifts derived from the photometric Kilo-Degree Survey (KiDS) Data Release 4. We achieved it by training machine learning (ML) models, using optical ugri and near-infrared ZYJHKs bands, on objects known from Sloan Digital Sky Survey (SDSS) spectroscopy. We define inference subsets from the 45 million objects of the KiDS photometric data limited to 9-band detections, based on a feature space built from magnitudes and their combinations. We show that projections of the high-dimensional feature space on two dimensions can be successfully used, instead of the standard color-color plots, to investigate the photometric estimations, compare them with spectroscopic data, and efficiently support the process of building a catalog. The model selection and fine-tuning employs two subsets of objects: those randomly selected and the faintest ones, which allowed us to properly fit the bias versus variance trade-off. We tested three ML models: random forest (RF), XGBoost (XGB), and artificial neural network (ANN). We find that XGB is the most robust and straightforward model for classification, while ANN performs the best for combined classification and redshift. The ANN inference results are tested using number counts, Gaia parallaxes, and other quasar catalogs that are external to the training set. Based on these tests, we derived the minimum classification probability for quasar candidates which provides the best purity versus completeness trade-off: p(QSOcand) > 0.9 for r < 22 and p(QSOcand) > 0.98 for 22 < r < 23.5. We find 158 000 quasar candidates in the safe inference subset (r < 22) and an additional 185 000 candidates in the reliable extrapolation regime (22 < r < 23.5). Test-data purity equals 97% and completeness is 94%; the latter drops by 3% in the extrapolation to data fainter by one magnitude than the training set. The photometric redshifts were derived with ANN and modeled with Gaussian uncertainties. The test-data redshift error (mean and scatter) equals 0.009 ± 0.12 in the safe subset and −0.0004 ± 0.19 in the extrapolation, averaged over a redshift range of 0.14 < z < 3.63 (first and 99th percentiles). Our success of the extrapolation challenges the way that models are optimized and applied at the faint data end. The resulting catalog is ready for cosmology and active galactic nucleus (AGN) studies.
10

Zavala, Valentina A., Tatiana Vidaurre, Xiaosong Huang, Sandro Casavilca, Jeannie Navarro, Michelle A. Williams, Sixto Sanchez, et al. "Abstract 3683: Identification of optimal set of genetic variants from a previously reported polygenic risk score for breast cancer risk prediction in Latin American women." Cancer Research 82, no. 12_Supplement (June 15, 2022): 3683. http://dx.doi.org/10.1158/1538-7445.am2022-3683.

Abstract:
Around 10% of genetic predisposition for breast cancer is explained by mutations in high/moderate penetrance genes. The remaining proportion is explained by multiple common variants of relatively small effect. A subset of these variants has been identified mostly in Europeans and Asians; and combined into polygenic risk scores (PRS) to predict breast cancer risk. Our aim is to identify a subset of variants to improve breast cancer risk prediction in Hispanics/Latinas (H/Ls). Breast cancer patients were recruited at the Instituto Nacional de Enfermedades Neoplásicas in Peru, to be part of The Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN). Women without a diagnosis of breast cancer from a pregnancy outcomes study conducted in Peru were included as controls. After quality control filters, genome-wide genotypes were available for 1,809 cases and 3,334 controls. Missing genotypes were imputed using the Michigan Imputation Server using individuals from the 1000 Genomes Project as reference. Genotypes for 313 previously reported breast cancer associated variants and 2 Latin American specific single nucleotide polymorphisms (SNPs) were extracted from the data, using an imputation r2 filter of 30%. Feature selection techniques were used to identify the best subset of SNPs for breast cancer prediction in Peruvian women. We randomly split the PEGEN data by 4:1 ratio for training/validation and testing. Training/validation data were resampled and split in 3:1 ratio into training and validation sets. SNP ranking and selection were done by bootstrapping results from 100 resampled training and validation sets. PRS were built by adding counts of risk alleles weighted by previously reported beta coefficients. The Area Under the Curve (AUC) was used to estimate the prediction accuracy of subsets of SNPs selected with different techniques. Logistic regression was used to test the association between standardized PRS residuals (after adjustment for genetic ancestry) and breast cancer risk. Of the 315 reported variants, 274 were available from the imputed dataset. The full 274-SNP PRS was associated with an AUC of 0.63 (95%CI=0.59-0.66) in the PEGEN study. Using different feature selection methods, we found subsets of SNPs that were associated with AUC values between 0.65-0.69. The best method (AUC=0.69, 95%CI=0.66-0.72) included a subset of 98 SNPs. Sixty-eight SNPs were selected by all methods, including the protective SNP rs140068132 in the 6q25 region, which is associated with Indigenous American ancestry and the largest contribution to the AUC. We identified a subset of 98 SNPs from a previously identified breast cancer PRS that improves breast cancer risk prediction compared to the full set, in women of high Indigenous American ancestry from Peru. Replication in women from Mexico and Colombia, and H/Ls from the U.S. will allow us to confirm these results. Citation Format: Valentina A. Zavala, Tatiana Vidaurre, Xiaosong Huang, Sandro Casavilca, Jeannie Navarro, Michelle A. Williams, Sixto Sanchez, Elad Ziv, Luis Carvajal-Carmona, Susan L. Neuhausen, Bizu Gelaye, Laura Fejerman. Identification of optimal set of genetic variants from a previously reported polygenic risk score for breast cancer risk prediction in Latin American women [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 3683.
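To make the PRS construction concrete, the sketch below computes a weighted risk-allele-count score and its AUC on mock data; the genotype matrix, effect sizes, and case/control labels are invented stand-ins, not PEGEN data, and no ancestry adjustment is shown.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def polygenic_risk_score(genotypes, betas):
    """PRS = sum over SNPs of (risk-allele count, 0/1/2) x reported beta."""
    return genotypes @ betas

rng = np.random.default_rng(0)
n, p = 5000, 98                                   # individuals x selected SNPs
betas = rng.normal(0.05, 0.03, size=p)            # mock previously reported effects
genotypes = rng.integers(0, 3, size=(n, p)).astype(float)

# mock case/control status loosely driven by the score plus noise
risk = polygenic_risk_score(genotypes, betas)
status = (risk + rng.normal(scale=risk.std(), size=n) > np.median(risk)).astype(int)

prs = polygenic_risk_score(genotypes, betas)
prs_std = (prs - prs.mean()) / prs.std()          # standardized PRS
print("AUC of the standardized PRS:", round(roc_auc_score(status, prs_std), 3))
```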

Dissertations / Theses on the topic "Selected subset of training data"

1

Little, Ann M. "Perceived Training Needs of Principals in Northeast Tennessee: Analysis of Data in Two Selected Years." Digital Commons @ East Tennessee State University, 1988. https://dc.etsu.edu/etd/2719.

Abstract:
The problem of this study was to compare perceived training needs of public school administrators at two points in time and to analyze those needs as to age, sex, educational degree, and experience of respondents. The survey population consisted of public school principals in the 14 systems of the First Educational District in Northeast Tennessee. A descriptive research design was chosen for the study. A follow-up questionnaire was developed based on the 1986 Brown Survey which surveyed the same population for demographic and professional characteristics in addition to the perceived training needs of principals, superintendents, and school board members. Respondents prioritized training needs from most beneficial to least beneficial. Descriptive and inferential statistics were used in answering five research questions which directed the study. The statistical analyses revealed the following: perceived training needs remained stable during the 2 year period, clusters of training needs consistently appeared in the top five and bottom five interest areas, and various approaches were utilized by First Educational District principals to address their perceived training needs. Curriculum and Instruction was identified by all groups of respondents as their top priority for additional training, indicating a recognition of need for more training in the fundamentals of teaching and learning. Others included in the top five training needs were Staff Evaluation, Leadership, Staff Development, and Effective Schools. Those consistently reported in the bottom five training needs included Organizational Governance, Organizational Communication, Law/Policy, Budget, and Problem Solving. The results of this study should prove useful to institutions of higher education in planning programs and courses of study for school administrators. An abundance of opportunities exists to provide much needed advanced training for principals in Northeast Tennessee.
2

Omar, Ebrahim. "Educators' access, training and use of computer-based technology at selected primary schools in the Cape Town suburb of Athlone, Western Cape." Thesis, University of the Western Cape, 2003. http://etd.uwc.ac.za/index.php?module=etd&amp.

Abstract:
This research study determines designated primary school educators' use of computer technology for accomplishing teaching-related tasks such as using the computer to create instructional material; administrative record keeping; and accessing information via CD-ROM and the Internet for best-practice teaching, model lesson plans and e-mail communication. In addition, the research also investigates factors influencing designated primary schools' ability to become ICT ready and the purposes for which primary school educators use computer technology.
3

Mendenhall, Gordon L. "A model for the assessment of in-service education using data on the acquisition of human genetics concepts by secondary biology teachers and their students and implementation of selected teaching strategies." Virtual Press, 1995. http://liblink.bsu.edu/uhtbin/catkey/1019469.

Abstract:
This research extended and refined an in-service assessment model used in Project Genethics resulting in an evaluation of Project Genethics and a test of the model's utility. The model guided analyses of the correlational relationships between (a) teacher competency measured by a written 50-item validated posttest (Teacher 50), (b) the number of teaching strategies reported by the participant teachers (Strategy 20), and (c) student competency measured by a written 25-item validated posttest (Student 25) using a Pearson product-moment correlation coefficient (r). A multiple R statistic and stepwise linear regression with an F ratio were used to determine the association of Teacher 50 and Strategy 20 with the criterion, Student 25. The model is hierarchical. Subsets of test items and teaching strategies related to core genetics concepts (Mendelian genetics, mitosis and meiosis, pedigrees and probability, polygenic inheritance, and chromosome aberrations) were analyzed in teacher posttests, student posttests, and reported teaching strategies. Stepwise linear regression was used to determine the relative impact of the predictors on the criterion, Student 25. The research population consisted of 78 secondary biology teachers and 4,920 of their students. The teachers attended one of six Project Genethics workshops conducted in the summer of 1991, funded by the National Science Foundation, and implemented by staff of the Human Genetics and Bioethics Education Laboratory (HGABEL). The researcher employed an ex post facto design. A summative data form was designed and used with project data for testing eight null hypotheses. A significant positive linear correlation was found between teacher competency and student competency and the number of strategies used in both full and subset analyses. No significant correlation was found between the number of strategies used and student performance in both full and subset analyses. The number of strategies used did not add significantly to the predictability of student competency after teacher competency was considered. The conceptual understanding of secondary students should be the ultimate criterion by which the effectiveness of in-service programs is measured provided the assessment items are congruent with the student conceptual level of understanding. Teacher knowledge was the most highly associated predictor of student concept attainment.
Department of Biology
4

Nayak, Gaurav Kumar. "Data-efficient Deep Learning Algorithms for Computer Vision Applications." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/6094.

Abstract:
The performance of any deep learning model depends heavily on the quantity and quality of the available training data. The generalization of the trained deep models improves with the availability of a large number of training samples and hence these models are often referred to as ‘data-hungry’. However, large scale datasets may not always be available in practice due to proprietary/privacy reasons or because of the high cost of generation, annotation, transmission and storage of data. Hence, efficient utilization of available data is of utmost importance, and this gives rise to a class of ML problems, which is often referred to as “data-efficient deep learning”. In this thesis we study the various types of such problems for diverse applications in computer vision, where the aim is to design deep neural network-based solutions that do not rely on the availability of large quantities of training data to attain the desired performance goals. Under the aforementioned thematic area, this thesis focuses on three different scenarios, namely - (1) learning in the absence of training data, (2) learning with limited training data and (3) learning using selected subset of training data. Absence of training data: Pre-trained deep models hold their learnt knowledge in the form of model parameters that act as ‘memory’ for the trained models and help them generalize well on unseen data. In the first part of this thesis, we present solutions to a diverse set of ‘zero-shot’ tasks, where in absence of any training data (or even their statistics) the trained models are leveraged to synthesize data-representative samples. We dub them Data Impressions (DIs), which act as proxy to the training data. As the DIs are not tied to any specific application, we show their utility in solving several CV/ML tasks under the challenging data-free setup, such as unsupervised domain adaptation, continual learning as well as knowledge distillation (KD). We also study the adversarial robustness of lightweight models trained via knowledge distillation using DIs. Further, we demonstrate the efficacy of DIs in generating data-free Universal Adversarial Perturbations (UAPs) with better fooling rates. However, one limiting factor of this solution is the relatively high computation (i.e., several rounds of backpropagation) to synthesize each sample. In fact, the other natural alternatives such as GAN based solutions also suffer from similar computational overhead and complicated training procedures. This motivated us to explore the utility of target class-balanced ‘arbitrary’ data as transfer set, which achieves competitive distillation performance and can yield strong baselines for data-free KD. We have also proposed data-free solutions beyond classification by extending zero-shot distillation to the object detection task, where we compose the pseudo transfer set by synthesizing multi-object impressions from a pretrained faster RCNN model. Another concern with the deployment of given trained models is their vulnerability against adversarial attacks. The popular adversarial training strategies rely on availability of original training data or explicit regularization-based techniques. On the contrary, we propose test-time adversarial defense (detection and correction framework), which can provide robustness in absence of training data and their statistics. We observe significant improvements in adversarial accuracy with minimal drop in clean accuracy against state-of-the-art ‘Auto Attack’ without having to retrain the model. 
Further, we explore an even more challenging problem setup and make the first attempt to provide adversarial robustness to ‘black box’ models (i.e., model architecture, weights, training details are inaccessible) under a complete data-free set up. Our method minimizes adversarial contamination on perturbed samples via proposed ‘wavelet noise remover’ (WNR) that remove coefficients corresponding to high frequency components which are most likely to be corrupted by adversarial attack, and recovers the lost image content by training a ‘regenerator’ network. This results in a high boost in adversarial accuracy when WNR combined with the trained regenerator network is prepended to black box network. Limited training data: In the second part, we assume the availability of a few training samples, where access to trained models may or may not be provided. In the few-shot setup, existing works obtain robustness using sophisticated meta-learning techniques which rely on the generation of adversarial samples in every episode of training - thereby making it computationally expensive. We propose the first computationally cheaper non-meta learning approach for robust few-shot learning that does not require any adversarial sample. We perform pretraining using self-distillation to make the feature representation of low-frequency samples close to original samples of base classes. Similarly, we also improve the discriminability of low-frequency query set features that further boost the robustness. Our method obtains massive improvement in adversarial performance while being ≈5x faster compared to state-of-the-art adversarial meta-learning methods. However, empirical robustness methods do not guarantee robustness of the trained models against all the adversarial perturbations possible within a given threat model. Thus, we also propose a novel problem of certified robustness of pretrained models in limited data settings. Our method provides a novel sample-generation strategy that synthesize ‘boundary’ and ‘interpolated’ samples to augment the limited training data and uses them in training the denoiser (prepended to pretrained classifier) via aligning the feature representations at multiple granularities (both instance and distribution levels). We achieve significant improvements across diverse sample budgets and noise levels in the white-box and observe similar performance under challenging black-box setup. Selected subset of training data: In the third part, we enforce efficient utilization via intelligently doing selective sampling on existing training datasets to obtain representative samples for the target task such as distillation, incremental learning and person-reid. Adversarial attacks recently have shown robustness bias, where certain subgroups in a dataset (e.g. based on class, gender, etc.) are less robust than others. Existing works characterize a subgroup’s robustness bias by only checking individual sample’s proximity to the decision boundary. We propose a holistic approach for quantifying adversarial vulnerability of a sample by combining different perspectives and further develop a trustworthy system to alert the humans about the incoming samples that are highly likely to be misclassified. Moreover, we demonstrate the utility of the proposed metric for data (and time)-efficient knowledge distillation which achieves better performance compared to competing baselines. 
Other applications such as incremental learning and video based person-reid can also be framed as a subset selection problem where representative samples need to be selected. We leverage DPP (Determinantal Point Process) for choosing the relevant and diverse samples. In Incremental learning, we propose a new variant of k-DPP that uses the RBF kernel (termed as “RBF k-DPP”) for challenging task of animal pose estimation and further tackle class imbalance by using image warping as an augmentation technique to generate varied poses for a given image, leading to further gains in performance. In video based re-id, we propose SLGDPP method which exploits the sequential nature of the frames in video while avoiding noisy and redundant (correlated) frames, resulting in outperforming the baseline sampling methods.
5

Taati, Babak. "Generation and Optimization of Local Shape Descriptors for Point Matching in 3-D Surfaces." Thesis, 2009. http://hdl.handle.net/1974/5107.

Abstract:
We formulate Local Shape Descriptor selection for model-based object recognition in range data as an optimization problem and offer a platform that facilitates a solution. The goal of object recognition is to identify and localize objects of interest in an image. Recognition is often performed in three phases: point matching, where correspondences are established between points on the 3-D surfaces of the models and the range image; hypothesis generation, where rough alignments are found between the image and the visible models; and pose refinement, where the accuracy of the initial alignments is improved. The overall efficiency and reliability of a recognition system is highly influenced by the effectiveness of the point matching phase. Local Shape Descriptors are used for establishing point correspondences by way of encapsulating local shape, such that similarity between two descriptors indicates geometric similarity between their respective neighbourhoods. We present a generalized platform for constructing local shape descriptors that subsumes a large class of existing methods and allows for tuning descriptors to the geometry of specific models and to sensor characteristics. Our descriptors, termed as Variable-Dimensional Local Shape Descriptors, are constructed as multivariate observations of several local properties and are represented as histograms. The optimal set of properties, which maximizes the performance of a recognition system, depend on the geometry of the objects of interest and the noise characteristics of range image acquisition devices and is selected through pre-processing the models and sample training images. Experimental analysis confirms the superiority of optimized descriptors over generic ones in recognition tasks in LIDAR and dense stereo range images.
Thesis (Ph.D., Electrical & Computer Engineering), Queen's University, 2009.

Books on the topic "Selected subset of training data"

1

JCPDS--International Centre for Diffraction Data and American Ceramic Society, eds. Selected powder diffraction data for education & training: Search manual and data cards. Swarthmore, PA: The Centre, 1988.

2

Lovegrove, Gillian, Barbara Segal, WiC (Organization) Conference, and British Computer Society, eds. Women into computing: Selected papers, 1988-1990. London: Springer-Verlag, 1991.

3

United Nations Population Division, ed. World population monitoring 1996: Selected aspects of reproductive rights and reproductive health. New York: United Nations, 1998.

4

Office, General Accounting. Tax administration: IRS' abatement process in selected locations : report to the Joint Committee on Taxation. Washington, D.C: The Office, 1999.

5

Office, General Accounting. Tax administration: Selected IRS forms, publications, and notices could be improved : report to the Honorable J.J. Pickle, Chairman, Subcommittee on Oversight, Committee on Ways and Means, House of Representatives. Washington, D.C: The Office, 1993.

6

Lovegrove, Gillian, and Barbara Segal. Women into Computing: Selected Papers 1988-1990. Springer London, Limited, 2013.

7

Hankin, David, Michael S. Mohr, and Kenneth B. Newman. Sampling Theory. Oxford University Press, 2019. http://dx.doi.org/10.1093/oso/9780198815792.001.0001.

Abstract:
We present a rigorous but understandable introduction to the field of sampling theory for ecologists and natural resource scientists. Sampling theory concerns itself with development of procedures for random selection of a subset of units, a sample, from a larger finite population, and with how to best use sample data to make scientifically and statistically sound inferences about the population as a whole. The inferences fall into two broad categories: (a) estimation of simple descriptive population parameters, such as means, totals, or proportions, for variables of interest, and (b) estimation of uncertainty associated with estimated parameter values. Although the targets of estimation are few and simple, estimates of means, totals, or proportions see important and often controversial uses in management of natural resources and in fundamental ecological research, but few ecologists or natural resource scientists have formal training in sampling theory. We emphasize the classical design-based approach to sampling in which variable values associated with units are regarded as fixed and uncertainty of estimation arises via various randomization strategies that may be used to select samples. In addition to covering standard topics such as simple random, systematic, cluster, unequal probability (stressing the generality of Horvitz–Thompson estimation), multi-stage, and multi-phase sampling, we also consider adaptive sampling, spatially balanced sampling, and sampling through time, three areas of special importance for ecologists and natural resource scientists. The text is directed to undergraduate seniors, graduate students, and practicing professionals. Problems emphasize application of the theory and R programming in ecological and natural resource settings.
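As a small worked example of the design-based estimation the book emphasizes, the sketch below computes a Horvitz–Thompson estimate of a population total under an unequal-probability (Poisson) sampling design; the population values and inclusion probabilities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=50.0, size=1000)       # population variable of interest
true_total = y.sum()

# unequal inclusion probabilities, e.g. proportional to an auxiliary size measure
size = np.clip(y + rng.normal(scale=10, size=y.size), 1, None)
pi = 100 * size / size.sum()                          # expected sample size ~ 100
pi = np.clip(pi, None, 1.0)

sampled = rng.uniform(size=y.size) < pi               # Poisson sampling design
# Horvitz-Thompson estimator of the total: sum of y_i / pi_i over sampled units
ht_total = np.sum(y[sampled] / pi[sampled])
print(f"true total = {true_total:.0f}, HT estimate = {ht_total:.0f}")
```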
8

Cheng, Russell. Bootstrapping Linear Models. Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780198505044.003.0016.

Abstract:
Bootstrap model selection is proposed for the difficult problem of selecting important factors in non-orthogonal linear models when the number of factors, P, is large. In the method, the full model is first fitted to the original data. Then B parametric bootstrap samples are drawn from the fitted model, and the full model fitted to each. A submodel is obtained from each fitted full model by rejecting those factors found unimportant in the fit. Each distinct selected submodel is then fitted to the original data and its Mallows Cp statistic calculated. A subset of good submodels based on the Cp values is then obtained. A reliability check can be made by fitting this subset to the BS samples also, to see how often each submodel is found to be a good fit. Use of the method is illustrated using a real-data sample.
9

United Nations, Population Division. World Population Monitoring 1996: Selected Aspects of Reproductive Rights and Reproductive Health (Population Studies). United Nations, 1996.

10

Dodge, Ronald C. Information Assurance and Security Education and Training: 8th IFIP WG 11.8 World Conference on Information Security Education, WISE 8, Auckland, New Zealand, July 8-10, 2013, Proceedings, WISE 7, Lucerne Switzerland, June 9-10, 2011, and WISE 6, Bento Gonçalves, RS, Brazil, July 27-31, 2009, Revised Selected Papers. 2013.


Book chapters on the topic "Selected subset of training data"

1

Hojaji, Fazilat, Adam J. Toth, and Mark J. Campbell. "A Machine Learning Approach for Modeling and Analyzing of Driver Performance in Simulated Racing." In Communications in Computer and Information Science, 95–105. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_8.

Abstract:
The emerging progress of esports lacks the approaches for ensuring high-quality analytics and training in professional and amateur esports teams. In this paper, we demonstrated the application of Artificial Intelligence (AI) and Machine Learning (ML) approach in the esports domain, particularly in simulated racing. To achieve this, we gathered a variety of feature-rich telemetry data from several web sources that was captured through MoTec telemetry software and the ACC simulated racing game. We performed a number of analyses using ML algorithms to classify the laps into the performance levels, evaluating driving behaviors along these performance levels, and finally defined a prediction model highlighting the channels/features that have significant impact on the driver performance. To identify the optimal feature set, three feature selection algorithms, i.e., the Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost) and Random Forest (RF) have been applied where out of 84 features, a subset of 10 features has been selected as the best feature subset. For the classification, XGBoost outperformed RF and SVM with the highest accuracy score among the other evaluated models. The study highlights the promising use of AI to categorize sim racers according to their technical-tactical behaviour, enhancing sim racing knowledge and know how.
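A hedged sketch of the two steps described above, selecting a 10-feature subset and classifying laps with a boosted model, is given below; it substitutes scikit-learn's RFE and GradientBoostingClassifier for the chapter's SVM/XGBoost/RF pipeline and uses mock telemetry, so the details are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 84))                     # 84 mock telemetry channels
y = (X[:, 5] - X[:, 40] + 0.5 * rng.normal(size=800) > 0).astype(int)  # lap class

# select a 10-feature subset with recursive feature elimination on a linear SVM
selector = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=10).fit(X, y)
best_features = np.where(selector.support_)[0]

# classify laps with a gradient-boosted model on the selected subset
clf = GradientBoostingClassifier(random_state=0)
acc = cross_val_score(clf, X[:, best_features], y, cv=5).mean()
print("selected channels:", best_features)
print("cross-validated accuracy:", round(acc, 3))
```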
2

Das, Soumi, Arshdeep Singh, Saptarshi Chatterjee, Suparna Bhattacharya, and Sourangshu Bhattacharya. "Finding High-Value Training Data Subset Through Differentiable Convex Programming." In Machine Learning and Knowledge Discovery in Databases. Research Track, 666–81. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86520-7_41.

3

Gräßler, Iris, Michael Hieb, Daniel Roesmann, and Marc Unverzagt. "Creating Synthetic Training Data for Machine Vision Quality Gates." In Bildverarbeitung in der Automation, 95–106. Berlin, Heidelberg: Springer Berlin Heidelberg, 2023. http://dx.doi.org/10.1007/978-3-662-66769-9_7.

Abstract:
Manufacturing companies face the challenge of combining increasing productivity and quality standards with customer-oriented mass production. To achieve the required quality standards, quality controls are carried out after selected production steps. These are often visual inspections by trained personnel based on checklists. To automate visual inspection, industrial cameras and powerful machine vision algorithms are needed. Large amounts of visual training data are usually required in order to train these algorithms. However, collecting training data is time-consuming, especially in customer-oriented mass production. Synthetic training data generated by CAD tools and rendering software can alleviate the lack of available training data. Within the paper at hand, a novel approach is presented examining the use of synthetic training data in machine vision applications. The results show that synthetically generated training data are fundamentally suitable for training machine vision quality gates. This offers great potential to relieve process and production developers in the development of quality gates in the future.
4

Borst, Timo, Jonas Mielck, Matthias Nannt, and Wolfgang Riese. "Extracting Funder Information from Scientific Papers - Experiences with Question Answering." In Linking Theory and Practice of Digital Libraries, 289–96. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-16802-4_24.

Abstract:
This paper is about automatic recognition of entities that funded a research work in economics as expressed in a publication. While many works apply rules and/or regular expressions to candidate sections within the text, we follow a question answering (QA) based approach to identify those passages that are most likely to inform us about funding. With regard to a digital library scenario, we are dealing with three more challenges: confirming that our approach at least outperforms manual indexing, disambiguation of funding organizations by linking their names to authority data, and integrating the generated metadata into a digital library application. Our computational results by means of machine learning techniques show that our QA approach performs similarly to a previous work (AckNER), although we operated on rather small sets of training and test data. While manual indexing is still needed for a gold standard of reliable metadata, the identification of funding entities only worked for a subset of funder names.
5

Ouala, Said, Pierre Tandeo, Bertrand Chapron, Fabrice Collard, and Ronan Fablet. "End-to-End Kalman Filter in a High Dimensional Linear Embedding of the Observations." In Mathematics of Planet Earth, 211–21. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-18988-3_13.

Abstract:
Data assimilation techniques are the state-of-the-art approaches in the reconstruction of a spatio-temporal geophysical state such as the atmosphere or the ocean. These methods rely on a numerical model that fills the spatial and temporal gaps in the observational network. Unfortunately, limitations regarding the uncertainty of the state estimate may arise when considering the restriction of the data assimilation problems to a small subset of observations, as encountered for instance in ocean surface reconstruction. These limitations motivated the exploration of reconstruction techniques that do not rely on numerical models. In this context, the increasing availability of geophysical observations and model simulations motivates the exploitation of machine learning tools to tackle the reconstruction of ocean surface variables. In this work, we formulate sea surface spatio-temporal reconstruction problems as state space Bayesian smoothing problems with unknown augmented linear dynamics. The solution of the smoothing problem, given by the Kalman smoother, is written in a differentiable framework which allows, given some training data, to optimize the parameters of the state space model.
6

Merino, Mikel, Javier Ibarrola, Jokin Aginaga, and Mikel Hualde. "AI Training for Application to Industrial Robotics: Trajectory Generation for Neural Network Tuning." In Proceedings of the XV Ibero-American Congress of Mechanical Engineering, 405–11. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-38563-6_59.

Abstract:
In the present work robot trajectories are generated and kinematically simulated. Different data (joint coordinates, end effector position and orientation, images, etc.) are obtained in order to train a neural network suited for applications in robotics. The neural network has the goal of automatically generating trajectories based on a set of images and coordinates. For this purpose, trajectories are designed in two separate sections which are conveniently connected using Bezier curves, ensuring continuity up to accelerations. In addition, among the possible trajectories that can be carried out due to the different configurations of the robot, the most suitable ones have been selected avoiding collisions and singularities. The designed algorithm can be used in multiple applications by adapting its different parameters.
7

Maryl, Maciej. "Operationalising the Change. Dispersion of Polish literary life (1989–2002)." In Germanistische Symposien, 509–34. Stuttgart: J.B. Metzler, 2022. http://dx.doi.org/10.1007/978-3-476-05886-7_21.

Abstract:
This paper explores the possibility of quantitative research into transitions in literary history on the example of Polish literary life, 1989–2002. This exploration is carried out on the basis of data from the Polish Literary Bibliography (PBL), a comprehensive database of Polish literary and cultural life. The paper aims to meet two interconnected goals: (1) to test selected qualitative hypotheses concerning Polish literature of the transition period (1989–2002); (2) to propose a methodology for the data-driven study of literary history with the use of bibliographical datasets. Both explorations are supplemented with a reflection on how documentation methodologies and practices affect the quantitative analysis. The analyses focus on describing the changes of the literary system, precisely the movement from dispersion towards centralisation and back, on many levels. I commence with some statistics of the literary production of that period, then combine those findings with the reception data, and subsequently with a comparison of the reception of debutants and established writers. Next, I analyse the dispersion on the example of geography and finally use network analysis for centrality measures. Altogether the analysed dataset, a subset of the PBL database, consists of 33,142 literary books and 138,925 literary articles by 21,865 authors. These were published by 8233 publishers and 958 journals. There are 93,874 reception texts, including pieces about 9286 authors and 22,335 books.
8

Kozłowski, Marek, Przemyslaw Buczkowski, and Piotr Brzezinski. "A Novel Process of Shoe Pairing Using Computer Vision and Deep Learning Methods." In Digital Interaction and Machine Intelligence, 35–44. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37649-8_4.

Abstract:
The industrialisation of the footwear recycling processes is a major issue in the European Union, particularly in view of the fact that at least 90% of shoes consumed in western economies are ultimately sent to landfill. This requires new AI-empowered technologies that enable detection, classification, pairing, and quality assessment in a viable automatic process. This article discusses automatic shoe pairing, which comprises two sequential stages: a) deep multiview shoe embedding (compact representation of multiview data); and b) clustering of shoes' embeddings with a fixed similarity threshold to return sets of possible pairs. Each shoe in our pipeline is represented by multiple images that are collected in industrial darkrooms. We present various approaches to shoe pairing, from fully unsupervised ones based on image descriptors to supervised ones that rely on deep neural networks, to identify the most effective one for this highly specific industrial task. The article also explains how the selected method can be improved by hyperparameter tuning, massive increases in training data, and data augmentation.
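Stage (b) of the pipeline, clustering shoe embeddings with a fixed similarity threshold to propose candidate pairs, can be sketched as follows; the random embeddings, the agglomerative-clustering choice, and the distance threshold are assumptions rather than the article's implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import normalize

def pair_candidates(embeddings, distance_threshold=0.5):
    """Cluster L2-normalized shoe embeddings with a fixed distance threshold;
    each resulting multi-member cluster is a set of possible pair members."""
    emb = normalize(embeddings)                    # unit vectors: distance tracks cosine similarity
    clustering = AgglomerativeClustering(n_clusters=None, linkage="average",
                                         distance_threshold=distance_threshold)
    labels = clustering.fit_predict(emb)
    clusters = {}
    for shoe_id, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(shoe_id)
    return [ids for ids in clusters.values() if len(ids) > 1]

# mock embeddings: 50 shoes = 25 underlying pairs plus small perturbations
rng = np.random.default_rng(0)
base = rng.normal(size=(25, 128))
embeddings = np.vstack([base + 0.05 * rng.normal(size=base.shape) for _ in range(2)])
print("candidate pair groups:", len(pair_candidates(embeddings)))
```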
9

Culotta, Fabrizio. "Given N forecasting models, what to do?" In Proceedings e report, 317–22. Florence: Firenze University Press and Genova University Press, 2023. http://dx.doi.org/10.36253/979-12-215-0106-3.55.

Abstract:
This work evaluates the forecasting performances of different models using data on Italian unemployment and employment rates over the years 2004-2022 at the monthly frequency. The logic of this work is inspired by the series of M-Competitions, i.e. the tradition of competitions organized to test the forecasting performances of classical and innovative models. Given N competing models, only one winner is selected. The types of forecasting models range from the Exponential Smoothing family to ARIMA-like models, to their hybridization, to machine learning and neural network engines. Model combinations through various ensemble techniques are also considered. Once the observational period is split between the training and test set, the estimated forecasting models are ranked in terms of fitting on the training set and in terms of their forecast accuracy on the test set. Results confirm that there is not yet a single universally superior model. On the contrary, the ranking of different forecasting models is specific to the adopted training set. Secondly, results confirm that machine learning and neural network models offer satisfactory alternatives and complements to traditional models like ARIMA and Exponential Smoothing. Finally, the results stress the importance of model ensemble techniques as a solution to model uncertainty as well as a tool to improve forecast accuracy. The flexibility provided by a rich set of different forecasting models, and the possibility of combining them, together represent an advantage for decision-makers often constrained to adopt solely pure, not-combined, forecasting models. Overall, this work can represent a first step toward the construction of a semi-automatic forecasting algorithm, which has become an essential tool for both trained and untrained eyes in an era of data-driven decision-making.
APA, Harvard, Vancouver, ISO, and other styles
10

Yao, Wei, and Jianwei Wu. "Airborne LiDAR for Detection and Characterization of Urban Objects and Traffic Dynamics." In Urban Informatics, 367–400. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-15-8983-6_22.

Full text
Abstract:
In this chapter, we present an advanced machine learning strategy to detect objects and characterize traffic dynamics in complex urban areas by airborne LiDAR. Both static and dynamical properties of large-scale urban areas can be characterized in a highly automatic way. First, LiDAR point clouds are colorized by co-registration with images if available. After that, all data points are grid-fitted into the raster format in order to facilitate acquiring spatial context information per-pixel or per-point. Then, various spatial-statistical and spectral features can be extracted using a cuboid volumetric neighborhood. The most important features highlighted by the feature-relevance assessment, such as LiDAR intensity, NDVI, and planarity or covariance-based features, are selected to span the feature space for the AdaBoost classifier. Classification results as labeled points or pixels are acquired based on pre-selected training data for the objects of building, tree, vehicle, and natural ground. Based on the urban classification results, traffic-related vehicle motion can further be indicated and determined by analyzing and inverting the motion artifact model pertinent to airborne LiDAR. The performance of the developed strategy towards detecting various urban objects is extensively evaluated using both public ISPRS benchmarks and peculiar experimental datasets, which were acquired across European and Canadian downtown areas. Both semantic and geometric criteria are used to assess the experimental results at both per-pixel and per-object levels. In the datasets of typical city areas requiring co-registration of imagery and LiDAR point clouds a priori, the AdaBoost classifier achieves a detection accuracy of up to 90% for buildings, up to 72% for trees, and up to 80% for natural ground, while a low and robust false-positive rate is observed for all the test sites regardless of the object class to be evaluated. Both theoretical and simulated studies for performance analysis show that the velocity estimation of fast-moving vehicles is promising and accurate, whereas slow-moving ones are hard to distinguish and yet are estimated with acceptable velocity accuracy. Moreover, the point density of ALS data tends to be related to system performance. The velocity can be estimated with high accuracy for nearly all possible observation geometries except for those vehicles moving in or (quasi-)along the track. By comparative performance analysis of the test sites, the performance and consistent reliability of the developed strategy for the detection and characterization of urban objects and traffic dynamics from airborne LiDAR data based on selected features were validated.
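A minimal sketch of the classification step described above (an AdaBoost classifier trained on pre-selected per-point features) is shown below; the simulated feature values, the label rule, and the class set are assumptions made for illustration, not data from the chapter.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Stand-in per-point features in the spirit of the selected ones
# (LiDAR intensity, NDVI, planarity); real values would come from the point cloud.
n = 2000
X = np.column_stack([
    rng.normal(size=n),            # intensity
    rng.uniform(-1, 1, size=n),    # NDVI
    rng.uniform(0, 1, size=n),     # planarity
])
# Hypothetical labels loosely tied to the features so the classifier has signal to learn
# (0=building, 1=tree, 2=vehicle, 3=natural ground).
y = (X[:, 1] > 0.3).astype(int) + 2 * (X[:, 2] > 0.7).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("per-point accuracy:", round(clf.score(X_te, y_te), 3))
```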
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Selected subset of training data"

1

Lin, Hui, and Jeff Bilmes. "How to select a good training-data subset for transcription: submodular active selection for sequences." In Interspeech 2009. ISCA: ISCA, 2009. http://dx.doi.org/10.21437/interspeech.2009-730.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Yao, Yazhou, Jian Zhang, Fumin Shen, Wankou Yang, Xian-Sheng Hua, and Zhenmin Tang. "Extracting Privileged Information from Untagged Corpora for Classifier Learning." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/151.

Full text
Abstract:
The performance of data-driven learning approaches is often unsatisfactory when the training data is inadequate either in quantity or quality. Manually labeled privileged information (PI), e.g., attributes, tags or properties, is usually incorporated to improve classifier learning. However, the process of manually labeling is time-consuming and labor-intensive. To address this issue, we propose to enhance classifier learning by extracting PI from untagged corpora, which can effectively eliminate the dependency on manually labeled data. In detail, we treat each selected PI as a subcategory and learn one classifier per subcategory independently. The classifiers for all subcategories are then integrated together to form a more powerful category classifier. Particularly, we propose a new instance-level multi-instance learning (MIL) model to simultaneously select a subset of training images from each subcategory and learn the optimal classifiers based on the selected images. Extensive experiments demonstrate the superiority of our approach.
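The subcategory-and-integration idea described above can be sketched as follows; this simplified Python example trains one classifier per subcategory and averages their scores, and it deliberately omits the paper's instance-level MIL selection step. The feature dimension, class means, and sample counts are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Hypothetical setup: 3 subcategories (one per selected piece of privileged
# information), each with its own positive training images vs. shared negatives.
d = 20
subcategory_positives = [rng.normal(loc=mu, size=(50, d)) for mu in (0.8, 1.0, 1.2)]
negatives = rng.normal(loc=-0.5, size=(150, d))

# One classifier per subcategory, trained independently.
subcategory_models = []
for pos in subcategory_positives:
    X = np.vstack([pos, negatives])
    y = np.r_[np.ones(len(pos)), np.zeros(len(negatives))]
    subcategory_models.append(LogisticRegression(max_iter=1000).fit(X, y))

# Integrate the subcategory classifiers into a single category score
# by averaging their positive-class probabilities.
def category_score(x):
    probs = [m.predict_proba(x.reshape(1, -1))[0, 1] for m in subcategory_models]
    return float(np.mean(probs))

print("score on a positive-like sample:", round(category_score(rng.normal(loc=1.0, size=d)), 3))
print("score on a negative-like sample:", round(category_score(rng.normal(loc=-0.5, size=d)), 3))
```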
APA, Harvard, Vancouver, ISO, and other styles
3

WICKRAMARACHCHI,, CHANDULA T., XIAOFEI JIANG, ELIZABETH J. CROSS, and KEITH WORDEN. "ASSESSING THE INFORMATION CONTENT OF DATASETS FOR STRUCTURAL HEALTH MONITORING." In Structural Health Monitoring 2021. Destech Publications, Inc., 2022. http://dx.doi.org/10.12783/shm2021/36355.

Full text
Abstract:
Data-based SHM is highly dependent on the quality of the training data needed for machine learning algorithms. In many cases of engineering interest, data can be scarce, and this is a problem. However, in some cases, data are abundant and can create a computational burden. In data-rich situations, it is often desirable to select the subset(s) of the data which are of highest value (in some sense) for the problem of interest. In this paper, ‘value’ is interpreted in terms of information content, and entropy is used as a measure of that content in order to condense training data without compromising useful information. Using the minimum covariance determinant, the dataset is first separated using inclusive outlier analysis. The entropies of the separated datasets are then assessed using parametric and nonparametric density estimators to identify the subset of data carrying most information. The Z24-Bridge dataset is used here to illustrate the idea, where the entropy values indicate that the subset containing data from environmental variations and damage is richest in information. This subset was made up of half of the entire dataset, suggesting that it is possible to significantly reduce the amount of training data for an SHM algorithm whilst retaining the required information for analysis.
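A minimal sketch of this idea (split the dataset with a minimum-covariance-determinant outlier analysis, then compare the entropies of the resulting subsets) is given below; the synthetic features and the simple parametric Gaussian entropy estimator are assumptions for illustration, not the Z24-Bridge data or the paper's exact estimators.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def gaussian_entropy(X):
    """Differential entropy of a Gaussian fitted to X (a simple parametric estimator)."""
    d = X.shape[1]
    cov = np.cov(X, rowvar=False)
    sign, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(4)
# Stand-in monitoring features: a tight 'normal' cluster and a wider cluster
# representing environmental variation and damage states.
normal = rng.normal(scale=0.5, size=(500, 4))
varied = rng.normal(scale=2.0, size=(500, 4))
data = np.vstack([normal, varied])

# Separate the dataset with a robust (inclusive) outlier analysis:
# points outside the MCD support are treated as the second subset.
mcd = MinCovDet(random_state=0).fit(data)
inliers = data[mcd.support_]
outliers = data[~mcd.support_]

# Compare the information content of the two subsets via their entropies.
print("inlier subset entropy :", round(gaussian_entropy(inliers), 2))
print("outlier subset entropy:", round(gaussian_entropy(outliers), 2))
```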
APA, Harvard, Vancouver, ISO, and other styles
4

Norlund, Philip, and Fan Jiang. "Improving Machine Learning Approaches to Seismic Fault Imaging Through Training Augmentation." In International Petroleum Technology Conference. IPTC, 2022. http://dx.doi.org/10.2523/iptc-21940-ea.

Full text
Abstract:
Fault interpretation is critical for understanding subsurface challenges such as fluid migration and avoiding drilling hazards. Recently, Convolutional Neural Networks (CNN) have been shown to be effective tools for identifying faults in seismic data by utilizing image segmentation. However, selecting the optimal training data for building a model can be challenging. Fault shapes and relationships are highly variable due to the complexity of the regional geodynamic processes happening within the earth over time. How to properly select a training subset from a real (i.e., non-synthetic) dataset is crucial for the success of machine learning for seismic interpretation. In this paper, we attempt to quantify how models can be improved by augmenting the training data in various ways. These augmentations include varying the amount of real and synthetic data and including combinations of seismic attributes.
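As an illustration of training augmentation on a mixed real/synthetic patch set, here is a small sketch; the specific augmentations (horizontal flip, polarity reversal, Gaussian noise) and the real-to-synthetic ratio are generic assumptions, not the scheme used in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def augment(patch, mask):
    """Return a randomly augmented copy of a seismic patch and its fault mask."""
    if rng.random() < 0.5:                      # horizontal flip
        patch, mask = patch[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                      # seismic polarity reversal
        patch = -patch
    patch = patch + rng.normal(scale=0.05 * patch.std(), size=patch.shape)
    return patch, mask

# Mix real and synthetic patches in a chosen ratio before augmentation.
real = [(rng.normal(size=(64, 64)), rng.integers(0, 2, size=(64, 64))) for _ in range(8)]
synthetic = [(rng.normal(size=(64, 64)), rng.integers(0, 2, size=(64, 64))) for _ in range(24)]
training_set = [augment(p, m) for p, m in real + synthetic]
print(len(training_set), "augmented training patches")
```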
APA, Harvard, Vancouver, ISO, and other styles
5

Wang, Nan, Xibin Zhao, Yu Jiang, and Yue Gao. "Iterative Metric Learning for Imbalance Data Classification." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/389.

Full text
Abstract:
In many classification applications, such as software defect prediction and medical diagnosis, the amount of data from different categories usually varies significantly. Under such circumstances, it is essential to propose a proper method to solve the imbalance issue among the data. However, most of the existing methods mainly focus on improving the performance of classifiers rather than searching for an appropriate way to find an effective data space for classification. In this paper, we propose a method named Iterative Metric Learning (IML) to explore the correlations among imbalance data and construct an effective data space for classification. Given the imbalanced training data, it is important to select a subset of training samples for each testing sample. Thus, we aim to find a more stable neighborhood for testing data using the iterative metric learning strategy. To evaluate the effectiveness of the proposed method, we have conducted experiments on two groups of datasets, i.e., the NASA Metrics Data Program (NASA) dataset and the UCI Machine Learning Repository (UCI) dataset. Experimental results and comparisons with state-of-the-art methods have demonstrated the better performance of our proposed method.
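The iterative strategy described above (learn a metric on the current training subset, re-select the test sample's neighborhood under that metric, and repeat until it stabilises) can be sketched as follows; the use of Neighborhood Components Analysis as the metric learner, the neighborhood size, and the toy imbalanced data are assumptions standing in for the paper's own formulation.

```python
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis, NearestNeighbors

rng = np.random.default_rng(6)

# Imbalanced toy data: many majority samples, few minority samples.
X_maj = rng.normal(loc=0.0, size=(300, 5))
X_min = rng.normal(loc=1.5, size=(30, 5))
X = np.vstack([X_maj, X_min])
y = np.r_[np.zeros(300), np.ones(30)]
x_test = rng.normal(loc=1.2, size=(1, 5))

# Iteratively: learn a metric on the current subset, then re-select the
# test sample's neighbors under that metric, until the neighborhood stabilises.
idx = np.arange(len(X))            # start from the full training set
for _ in range(5):
    nca = NeighborhoodComponentsAnalysis(random_state=0).fit(X[idx], y[idx])
    nn = NearestNeighbors(n_neighbors=60).fit(nca.transform(X))
    new_idx = nn.kneighbors(nca.transform(x_test), return_distance=False)[0]
    if set(new_idx) == set(idx):
        break
    idx = new_idx

print("selected neighborhood size:", len(idx), "| minority fraction:", round(float(y[idx].mean()), 2))
```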
APA, Harvard, Vancouver, ISO, and other styles
6

Rios, Amanda, and Laurent Itti. "Closed-Loop Memory GAN for Continual Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/462.

Full text
Abstract:
Sequential learning of tasks using gradient descent leads to an unremitting decline in the accuracy of tasks for which training data is no longer available, termed catastrophic forgetting. Generative models have been explored as a means to approximate the distribution of old tasks and bypass storage of real data. Here we propose a cumulative closed-loop memory replay GAN (CloGAN) provided with external regularization by a small memory unit selected for maximum sample diversity. We evaluate incremental class learning using a notoriously hard paradigm, single-headed learning, in which each task is a disjoint subset of classes in the overall dataset, and performance is evaluated on all previous classes. First, we show that when constructing a dynamic memory unit to preserve sample heterogeneity, model performance asymptotically approaches that of training on the full dataset. We then show that using a stochastic generator to continuously output fresh new images during training increases performance significantly further while also generating good-quality images. We compare our approach to several baselines including fine-tuning by gradient descent (FGD), Elastic Weight Consolidation (EWC), Deep Generative Replay (DGR) and Memory Replay GAN (MeRGAN). Our method has a very low long-term memory cost (the memory unit) as well as negligible intermediate memory storage.
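One way to build a small memory unit "selected for maximum sample diversity" is a greedy farthest-point selection over sample embeddings; the sketch below is only a stand-in for CloGAN's actual memory policy, and the random feature vectors and budget are illustrative assumptions.

```python
import numpy as np

def diverse_memory(features, budget):
    """Greedy farthest-point selection: keep a small memory unit that maximises diversity."""
    selected = [0]                                   # seed with the first sample
    dists = np.linalg.norm(features - features[0], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dists))                  # farthest from the current memory
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return selected

rng = np.random.default_rng(7)
feats = rng.normal(size=(1000, 32))                  # e.g. embeddings of images from old tasks
memory_idx = diverse_memory(feats, budget=20)
print("memory unit indices:", memory_idx[:10], "...")
```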
APA, Harvard, Vancouver, ISO, and other styles
7

Liu, Bo, Ying Wei, Yu Zhang, and Qiang Yang. "Deep Neural Networks for High Dimension, Low Sample Size Data." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/318.

Full text
Abstract:
Deep neural networks (DNN) have achieved breakthroughs in applications with large sample size. However, when facing high dimension, low sample size (HDLSS) data, such as the phenotype prediction problem using genetic data in bioinformatics, DNN suffers from overfitting and high-variance gradients. In this paper, we propose a DNN model tailored for the HDLSS data, named Deep Neural Pursuit (DNP). DNP selects a subset of high dimensional features for the alleviation of overfitting and takes the average over multiple dropouts to calculate gradients with low variance. As the first DNN method applied on the HDLSS data, DNP enjoys the advantages of the high nonlinearity, the robustness to high dimensionality, the capability of learning from a small number of samples, the stability in feature selection, and the end-to-end training. We demonstrate these advantages of DNP via empirical results on both synthetic and real-world biological datasets.
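Two ingredients of DNP, selecting a small feature subset and averaging gradients over multiple dropout masks to reduce variance, can be illustrated with a deliberately simplified numpy sketch of a logistic model on HDLSS data; the data sizes, keep probability, and number of masks are assumptions, and the real method is a deep network with greedy feature pursuit rather than this single-layer stand-in.

```python
import numpy as np

rng = np.random.default_rng(8)

# HDLSS toy data: many features, few samples.
n, d = 30, 2000
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 2.0
y = (X @ w_true + rng.normal(scale=0.1, size=n) > 0).astype(float)

def averaged_dropout_gradient(w, n_masks=50, keep_prob=0.5):
    """Logistic-loss gradient averaged over several input dropout masks (variance reduction)."""
    grads = np.zeros_like(w)
    for _ in range(n_masks):
        mask = rng.random(d) < keep_prob
        Xm = X * mask / keep_prob                    # inverted dropout on the input features
        p = 1.0 / (1.0 + np.exp(-(Xm @ w)))
        grads += Xm.T @ (p - y) / n
    return grads / n_masks

w = np.zeros(d)
g = averaged_dropout_gradient(w)
# Features with the largest average gradient magnitude are candidates to add
# to the selected subset, in the spirit of a greedy feature pursuit.
print("top candidate features:", np.argsort(-np.abs(g))[:5])
```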
APA, Harvard, Vancouver, ISO, and other styles
8

Jiang, Botao, and Fuyu Zhao. "An Alternative Approach to Prediction of Critical Heat Flux: Projection Support Vector Regression." In 2014 22nd International Conference on Nuclear Engineering. American Society of Mechanical Engineers, 2014. http://dx.doi.org/10.1115/icone22-30747.

Full text
Abstract:
Critical heat flux (CHF) is one of the most crucial design criteria in boiling systems such as evaporators, steam generators, fuel cooling systems, and boilers. This paper presents an alternative CHF prediction method named projection support vector regression (PSVR), which is a combination of feature vector selection (FVS) method and support vector regression (SVR). In PSVR, the FVS method is first used to select a relevant subset (feature vectors, FVs) from the training data, and then both the training data and the test data are projected into the subspace constructed by FVs, and finally SVR is applied to estimate the projected data. An available CHF dataset taken from the literature is used in this paper. The CHF data are split into two subsets, the training set and the test set. The training set is used to train the PSVR model and the test set is then used to evaluate the trained model. The predicted results of PSVR are compared with those of artificial neural networks (ANNs). The parametric trends of CHF are also investigated using the PSVR model. It is found that the results of the proposed method not only fit the general understanding, but also agree well with the experimental data. Thus, PSVR can be used successfully for prediction of CHF in contrast to ANNs.
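The project-then-regress structure described above can be sketched in a few lines; here a Nyström approximation with a randomly chosen subset stands in for the FVS step, the synthetic inputs stand in for the CHF dataset, and the kernel and SVR parameters are assumptions rather than the paper's values.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)

# Stand-in CHF-style regression data: a few inputs (e.g. pressure, mass flux, quality)
# and a smooth nonlinear target; real values would come from the CHF dataset.
X = rng.uniform(size=(400, 3))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2] + rng.normal(scale=0.05, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Project both training and test data onto the subspace spanned by a selected subset
# in kernel feature space, then fit SVR on the projected data.
projector = Nystroem(kernel="rbf", gamma=1.0, n_components=50, random_state=0)
Z_tr = projector.fit_transform(X_tr)
Z_te = projector.transform(X_te)

model = SVR(C=10.0, epsilon=0.01).fit(Z_tr, y_tr)
print("test R^2 of the projected SVR:", round(model.score(Z_te, y_te), 3))
```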
APA, Harvard, Vancouver, ISO, and other styles
9

Kwegyir-Afful, Ebo, Janne Heilala, and Jussi Kantola. "A Scoping Review on Certified Immersive Virtual Training Applications." In 14th International Conference on Applied Human Factors and Ergonomics (AHFE 2023). AHFE International, 2023. http://dx.doi.org/10.54941/ahfe1003934.

Full text
Abstract:
The fourth industrial revolution has given rise to virtual applications for training to meet the rapidly changing work environment. However, reviews on the subject in the context of virtual training and certification are scarce. The current paper thus presents a scoping review of the literature regarding virtual applications for certification in training for specific competence building. The review essentially focuses on the required processes and outcomes of previous immersive virtual environments for certifications, and the effects of the training on competence measurement reliability. Additionally, the study investigates the learning outcomes, competencies gained, and related training reported in the selected publications. In addition, the current weaknesses and strengths of VR applications are presented with suggestions for further improvements. Reviewed articles were obtained by extracting the salient information from publications indexed in four scientific digital libraries utilizing exclusion and inclusion methods. Our research design comprised five steps, beginning with the research questions, followed by literature extraction, relevant publication selection, data extraction, evaluation, and a future research agenda. Selected publications also focused on fully immersive virtual reality utilizing HTC VIVE and Oculus Rift. Several advantages of the virtual certification training were discovered, including enthusiasm, learning outcomes, cost reduction, measurability, and the effects of the certifications. The majority of the publications focused extensively on the healthcare industry, especially medical/surgical. However, industrial virtual certifications such as hot work safety training, forklift safety operation, crane safety operation, and general work safety were found to be lacking. Despite these gaps, current interest and commitments are driving future alternatives for virtual training and certification for improving industrial training and competencies in other areas where utilization is conspicuously limited.
APA, Harvard, Vancouver, ISO, and other styles
10

Walters, Madeline A., Zhaoyan Fan, and Burak Sencer. "Data-Based Modeling for Reactive Ion Etching: Effectiveness of an Artificial Neural Network Model for Estimating Tungsten Silicon Nitride Etch Rate." In ASME 2020 International Mechanical Engineering Congress and Exposition. American Society of Mechanical Engineers, 2020. http://dx.doi.org/10.1115/imece2020-23992.

Full text
Abstract:
This paper presents a data-based approach for modeling a plasma etch process by estimating etch rate based on controlled input parameters. This work seeks to use an Artificial Neural Network (ANN) model to correlate controlled tool parameters with etch rate and uniformity for a blanket 1100 Å WSiN thin film using Cl2 and BCl3 chemistry. Experimental data was collected using a Lam 9600 PTX plasma metal etch chamber in an industrial cleanroom. The WSiN film was deposited over 3000 Å TEOS to ensure adhesion, with an 8-inch bare silicon wafer as the base layer. Controlled tool parameters were radio frequency (RF) upper electrode power, RF lower electrode power, Cl2 gas flowrate, BCl3 gas flowrate, and chamber pressure. The full factorial design of experiment method was used to select the combinations of experimental configurations. The ANN model was validated using a subset of the training data.
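A minimal sketch of this workflow (a full factorial design over the five controlled tool parameters, an ANN regressor for etch rate, and validation on a held-out subset of the runs) is shown below; the level values, the synthetic etch-rate response, and the network size are assumptions made purely for illustration.

```python
import itertools
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(10)

# Full factorial design over the five controlled tool parameters (two levels each).
levels = {
    "rf_upper_w": (300, 600),
    "rf_lower_w": (50, 150),
    "cl2_sccm": (40, 80),
    "bcl3_sccm": (20, 60),
    "pressure_mtorr": (5, 15),
}
X = np.array(list(itertools.product(*levels.values())), dtype=float)
# Hypothetical etch-rate response used only so the model has something to fit.
etch_rate = 2.0 * X[:, 0] + 1.5 * X[:, 1] + 5.0 * X[:, 2] - 0.5 * X[:, 4] + rng.normal(scale=20, size=len(X))

# Hold out a subset of the factorial runs to validate the trained network.
X_tr, X_val, y_tr, y_val = train_test_split(X, etch_rate, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_tr)

ann = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
ann.fit(scaler.transform(X_tr), y_tr)
print("validation R^2:", round(ann.score(scaler.transform(X_val), y_val), 3))
```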
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Selected subset of training data"

1

Olefirenko, Nadiia V., Ilona I. Kostikova, Nataliia O. Ponomarova, Liudmyla I. Bilousova, and Andrey V. Pikilnyak. E-learning resources for successful math teaching to pupils of primary school. [б. в.], September 2019. http://dx.doi.org/10.31812/123456789/3266.

Full text
Abstract:
Ukrainian primary schools are undergoing significant changes under the ‘New Ukrainian School’ reform, which reflects the rapid updating of information technology and the high level of children’s informational activity. Primary schools are basically focused on developing subject knowledge and general study skills. One of the ways of developing them is to use tools and apps. The article gives examples of how teachers-to-be use interactive tools and apps for teaching Math to young learners. The article also presents experimental data about training teachers-to-be to use tools and apps. Interactive tools and apps provide real task variability, uniqueness of exercises, operative assessment and correction, adjustment of task difficulty, and a shade of competitiveness and gaming to the exercises. To create their own apps, teachers-to-be use the tools that are part of the integrated Microsoft Office package, designing environments, and other simple and convenient programs. The article presents experimental data about the results of training teachers-to-be to create apps. A set of criteria for creating apps was developed and checked in the experimental research, such as the ability to develop apps, knowledge and understanding of the functional capabilities of apps, knowledge of tools for creating apps and their functional capabilities, the ability to select and formulate tasks for young learners, and the ability to assess adequately the quality of the developed apps.
APA, Harvard, Vancouver, ISO, and other styles
2

Arbeit, Caren A., Alexander Bentz, Emily Forrest Cataldi, and Herschel Sanders. Alternative and Independent: The universe of technology-related “bootcamps". RTI Press, February 2019. http://dx.doi.org/10.3768/rtipress.2019.rr.0033.1902.

Full text
Abstract:
In recent years, nontraditional workforce training programs have proliferated inside and outside of traditional postsecondary institutions. A subset of these programs, bootcamps, advertise high job placement rates and have been hailed by policymakers as key to training skilled workers. However, few formal data exist on the number, types, prices, location, or other descriptive details of program offerings. We fill this void by studying the universe of bootcamp programs offered as of June 30, 2017. In this report, we discuss the attributes of the 1,010 technology-related programs offered in the United States, Canada, and online. We find more diversity among bootcamp providers and programs than would be expected from public discourse. This primarily relates to the mode of delivery (online vs. in person), intensity (part time/full time), cost, and program types. Based on the data we collected, we present a classification structure for bootcamps focused on five distinct program types.
APA, Harvard, Vancouver, ISO, and other styles
3

ABB ENVIRONMENTAL PORTLAND ME. Fort Devens Sudbury Training Annex, Middlesex County, Massachusetts, Remedial (Data Gap) Investigations of Area of Contamination A4 and Areas of Contamination A7/A9 (Management-of-Migration Operable Unit) and Supplemental Site Investigations of Selected Study Areas, Final Task Order Work Plan. Fort Belvoir, VA: Defense Technical Information Center, May 1996. http://dx.doi.org/10.21236/ada467432.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Adebayo, Oliver, Joanna Aldoori, William Allum, Noel Aruparayil, Abdul Badran, Jasmine Winter Beatty, Sanchita Bhatia, et al. Future of Surgery: Technology Enhanced Surgical Training: Report of the FOS:TEST Commission. The Royal College of Surgeons of England, August 2022. http://dx.doi.org/10.1308/fos2.2022.

Full text
Abstract:
Over the past 50 years the capability of technology to improve surgical care has been realised, and while surgical trainees and trainers strive to deliver care and train, the technological ‘solutions’ market continues to expand. However, there remains no coordinated process to assess these technologies. The FOS:TEST Report aimed to (1) define the current, unmet needs in surgical training, (2) assess the current evidence-base of technologies that may be beneficial to training and map these onto both the patient and trainee pathway and (3) make recommendations on the development, assessment, and adoption of novel surgical technologies. The FOS:TEST Commission was formed by the Association of Surgeons in Training (ASiT), The Royal College of Surgeons of England (RCS England) Robotics and Digital Surgery Group and representatives from all trainee specialty associations. Two national datasets provided by Health Education England were used to identify unmet surgical training needs through qualitative analysis against pre-defined coding frameworks. These unmet needs were prioritised at two virtual consensus hackathons and mapped to the patient and trainee pathway and the capabilities in practice (CiPs) framework. The commission received more than 120 evidence submissions from surgeons in training, consultant surgeons and training leaders. Following peer review, 32 were selected that covered a range of innovations. Contributors also highlighted several key considerations, including the changing pedagogy of surgical training, the ethics and challenges of big data and machine learning, sustainability, and health economics. This summates to 7 Key Recommendations and 51 concluding statements. The FOS:TEST Commission was borne out of what is a pivotal point in the digital transformation of surgical training. Academic expertise and collaboration will be required to evaluate the efficacy of any novel training solution. However, this must be coupled with pragmatic assessments of feasibility and cost to ensure that any intervention is scalable for national implementation. Currently, there is no replacement for hands-on operating. However, for future UK and ROI surgeons to stay relevant in a global market, our training methods must adapt. The Future of Surgery: Technology Enhanced Surgical Training Report provides a blueprint for how this can be achieved.
APA, Harvard, Vancouver, ISO, and other styles
5

Gur, Amit, Edward Buckler, Joseph Burger, Yaakov Tadmor, and Iftach Klapp. Characterization of genetic variation and yield heterosis in Cucumis melo. United States Department of Agriculture, January 2016. http://dx.doi.org/10.32747/2016.7600047.bard.

Full text
Abstract:
Project objectives: 1) Characterization of variation for yield heterosis in melon using Half-Diallele (HDA) design. 2) Development and implementation of image-based yield phenotyping in melon. 3) Characterization of genetic, epigenetic and transcriptional variation across 25 founder lines and selected hybrids. The epigenetic part of this objective was modified during the course of the project: instead of characterization of chromatin structure in a single melon line through genome-wide mapping of nucleosomes using the MNase-seq approach, we took advantage of rapid advancements in single-molecule sequencing and shifted the focus to Nanopore long-read sequencing of all 25 founder lines. This analysis provides invaluable information on genome-wide structural variation across our diversity panel. 4) Integrated analyses and development of prediction models. Agricultural heterosis relates to hybrids that outperform their inbred parents for yield. First generation (F1) hybrids are produced in many crop species and it is estimated that heterosis increases yield by 15-30% globally. Melon (Cucumis melo) is an economically important species of the Cucurbitaceae family and is among the most important fleshy fruits for fresh consumption worldwide. The major goal of this project was to explore the patterns and magnitude of yield heterosis in melon and link it to whole genome sequence variation. A core subset of 25 diverse lines was selected from the Newe-Yaar melon diversity panel for whole-genome re-sequencing (WGS) and test-crosses, to produce a structured half-diallele design of 300 F1 hybrids (MelHDA25). Yield variation was measured in replicated yield trials at the whole-plant and at the rootstock levels (through common-scion grafting experiments), across the F1s and parental lines. As part of this project we also developed an algorithmic pipeline for detection and yield estimation of melons from aerial images, towards future implementation of such a high-throughput, cost-effective method for remote yield evaluation in open-field melons. We found extensive, highly heritable root-derived yield variation across the diallele population that was characterized by prominent best-parent heterosis (BPH), where hybrid rootstocks outperformed their parents by 38% and 56% under optimal irrigation and drought stress, respectively. Through integration of the genotypic data (~4,000,000 SNPs) and yield analyses we show that root-derived hybrid yield is independent of parental genetic distance. However, we mapped novel root-derived yield QTLs through genome-wide association (GWA) analysis and a multi-QTL model explained more than 45% of the hybrids' yield variation, providing a potential route for marker-assisted hybrid rootstock breeding. Four selected hybrid rootstocks are further studied under multiple scion varieties and their validated positive effect on yield performance is now leading to ongoing evaluation of their commercial potential. On the genomic level, this project resulted in 3 layers of data: 1) whole-genome short-read Illumina sequencing (30X) of the 25 founder lines provided us with 25 genome alignments and a high-density melon HapMap that has already been shown to be an effective resource for QTL annotation and candidate gene analysis in melon.
2) Fast advancements in long-read single-molecule sequencing allowed us to shift focus towards this technology and generate ~50X Nanopore sequencing of the 25 founders, which in combination with the short-read data now enables de novo assembly of the 25 genomes that will soon lead to construction of the first melon pan-genome. 3) Transcriptomic (3' RNA-Seq) analysis of several selected hybrids and their parents provides preliminary information on differentially expressed genes that can be further used to explain the root-derived yield variation. Taken together, this project expanded our view on yield heterosis in melon with novel specific insights on root-derived yield heterosis. To our knowledge, thus far this is the largest systematic genetic analysis of rootstock effects on yield heterosis in cucurbits or any other crop plant, and our results are now translated into potential breeding applications. The genomic resources that were developed as part of this project are putting melon at the forefront of genomic research and will continue to be a useful tool for the cucurbits community in years to come.
APA, Harvard, Vancouver, ISO, and other styles
6

O’Brien, Tom, Deanna Matsumoto, Diana Sanchez, Caitlin Mace, Elizabeth Warren, Eleni Hala, and Tyler Reeb. Southern California Regional Workforce Development Needs Assessment for the Transportation and Supply Chain Industry Sectors. Mineta Transportation Institute, October 2020. http://dx.doi.org/10.31979/mti.2020.1921.

Full text
Abstract:
COVID-19 brought the public’s attention to the critical value of transportation and supply chain workers as lifelines to access food and other supplies. This report examines essential job skills required of the middle-skill workforce (workers with more than a high school degree, but less than a four-year college degree). Many of these middle-skill transportation and supply chain jobs are what the Federal Reserve Bank defines as “opportunity occupations” -- jobs that pay above median wages and can be accessible to those without a four-year college degree. This report lays out the complex landscape of selected technological disruptions of the supply chain to understand the new workforce needs of these middle-skill workers, followed by competencies identified by industry. With workplace social distancing policies, logistics organizations now rely heavily on data management and analysis for their operations. All rungs of employees, including warehouse workers and truck drivers, require digital skills to use mobile devices, sensors, and dashboards, among other applications. Workforce training requires a focus on data, problem solving, connectivity, and collaboration. Industry partners identified key workforce competencies required in digital literacy, data management, front/back office jobs, and in operations and maintenance. Education and training providers identified strategies to effectively develop workforce development programs. This report concludes with an exploration of the role of Institutes of Higher Education in delivering effective workforce education and training programs that reimagine how to frame programs to be customizable, easily accessible, and relevant.
APA, Harvard, Vancouver, ISO, and other styles
7

Henderson, Tim, Mincent Santucci, Tim Connors, and Justin Tweet. National Park Service geologic type section inventory: Chihuahuan Desert Inventory & Monitoring Network. National Park Service, April 2021. http://dx.doi.org/10.36967/nrr-2285306.

Full text
Abstract:
A fundamental responsibility of the National Park Service is to ensure that park resources are preserved, protected, and managed in consideration of the resources themselves and for the benefit and enjoyment by the public. Through the inventory, monitoring, and study of park resources, we gain a greater understanding of the scope, significance, distribution, and management issues associated with these resources and their use. This baseline of natural resource information is available to inform park managers, scientists, stakeholders, and the public about the conditions of these resources and the factors or activities which may threaten or influence their stability. There are several different categories of geologic or stratigraphic units (supergroup, group, formation, member, bed) which represent a hierarchical system of classification. The mapping of stratigraphic units involves the evaluation of lithologies, bedding properties, thickness, geographic distribution, and other factors. If a new mappable geologic unit is identified, it may be described and named through a rigorously defined process that is standardized and codified by the professional geologic community (North American Commission on Stratigraphic Nomenclature 2005). In most instances when a new geologic unit such as a formation is described and named in the scientific literature, a specific and well-exposed section of the unit is designated as the type section or type locality (see Definitions). The type section is an important reference section for a named geologic unit which presents a relatively complete and representative profile for this unit. The type or reference section is important both historically and scientifically, and should be recorded such that other researchers may evaluate it in the future. Therefore, this inventory of geologic type sections in NPS areas is an important effort in documenting these locations in order that NPS staff recognize and protect these areas for future studies. The documentation of all geologic type sections throughout the 423 units of the NPS is an ambitious undertaking. The strategy for this project is to select a subset of parks to begin research for the occurrence of geologic type sections within particular parks. The focus adopted for completing the baseline inventories throughout the NPS was centered on the 32 inventory and monitoring networks (I&M) established during the late 1990s. The I&M networks are clusters of parks within a defined geographic area based on the ecoregions of North America (Fenneman 1946; Bailey 1976; Omernik 1987). These networks share similar physical resources (geology, hydrology, climate), biological resources (flora, fauna), and ecological characteristics. Specialists familiar with the resources and ecological parameters of the network, and associated parks, work with park staff to support network level activities (inventory, monitoring, research, data management). Adopting a network-based approach to inventories worked well when the NPS undertook paleontological resource inventories for the 32 I&M networks. The network approach is also being applied to the inventory for the geologic type sections in the NPS. The planning team from the NPS Geologic Resources Division who proposed and designed this inventory selected the Greater Yellowstone Inventory and Monitoring Network (GRYN) as the pilot network for initiating this project. 
Through the research undertaken to identify the geologic type sections within the parks of the GRYN, methodologies for data mining and reporting on these resources were established. Methodologies and reporting adopted for the GRYN have been used in the development of this type section inventory for the Chihuahuan Desert Inventory & Monitoring Network. The goal of this project is to consolidate information pertaining to geologic type sections which occur within NPS-administered areas, in order that this information is available throughout the NPS...
APA, Harvard, Vancouver, ISO, and other styles
8

Henderson, Tim, Vincent Santucci, Tim Connors, and Justin Tweet. National Park Service geologic type section inventory: Northern Colorado Plateau Inventory & Monitoring Network. National Park Service, April 2021. http://dx.doi.org/10.36967/nrr-2285337.

Full text
Abstract:
A fundamental responsibility of the National Park Service (NPS) is to ensure that park resources are preserved, protected, and managed in consideration of the resources themselves and for the benefit and enjoyment by the public. Through the inventory, monitoring, and study of park resources, we gain a greater understanding of the scope, significance, distribution, and management issues associated with these resources and their use. This baseline of natural resource information is available to inform park managers, scientists, stakeholders, and the public about the conditions of these resources and the factors or activities which may threaten or influence their stability. There are several different categories of geologic or stratigraphic units (supergroup, group, formation, member, bed) which represent a hierarchical system of classification. The mapping of stratigraphic units involves the evaluation of lithologies, bedding properties, thickness, geographic distribution, and other factors. If a new mappable geologic unit is identified, it may be described and named through a rigorously defined process that is standardized and codified by the professional geologic community (North American Commission on Stratigraphic Nomenclature 2005). In most instances when a new geologic unit such as a formation is described and named in the scientific literature, a specific and well-exposed section of the unit is designated as the type section or type locality (see Definitions). The type section is an important reference section for a named geologic unit which presents a relatively complete and representative profile. The type or reference section is important both historically and scientifically, and should be available for other researchers to evaluate in the future. Therefore, this inventory of geologic type sections in NPS areas is an important effort in documenting these locations in order that NPS staff recognize and protect these areas for future studies. The documentation of all geologic type sections throughout the 423 units of the NPS is an ambitious undertaking. The strategy for this project is to select a subset of parks to begin research for the occurrence of geologic type sections within particular parks. The focus adopted for completing the baseline inventories throughout the NPS was centered on the 32 inventory and monitoring networks (I&M) established during the late 1990s. The I&M networks are clusters of parks within a defined geographic area based on the ecoregions of North America (Fenneman 1946; Bailey 1976; Omernik 1987). These networks share similar physical resources (geology, hydrology, climate), biological resources (flora, fauna), and ecological characteristics. Specialists familiar with the resources and ecological parameters of the network, and associated parks, work with park staff to support network level activities (inventory, monitoring, research, data management). Adopting a network-based approach to inventories worked well when the NPS undertook paleontological resource inventories for the 32 I&M networks. The network approach is also being applied to the inventory for the geologic type sections in the NPS. The planning team from the NPS Geologic Resources Division who proposed and designed this inventory selected the Greater Yellowstone Inventory and Monitoring Network (GRYN) as the pilot network for initiating this project. 
Through the research undertaken to identify the geologic type sections within the parks of the GRYN, methodologies for data mining and reporting on these resources were established. Methodologies and reporting adopted for the GRYN have been used in the development of this type section inventory for the Northern Colorado Plateau Inventory & Monitoring Network. The goal of this project is to consolidate information pertaining to geologic type sections which occur within NPS-administered areas, in order that this information is available throughout the NPS...
APA, Harvard, Vancouver, ISO, and other styles
9

Henderson, Tim, Vincent Santucci, Tim Connors, and Justin Tweet. National Park Service geologic type section inventory: Klamath Inventory & Monitoring Network. National Park Service, July 2021. http://dx.doi.org/10.36967/nrr-2286915.

Full text
Abstract:
A fundamental responsibility of the National Park Service (NPS) is to ensure that park resources are preserved, protected, and managed in consideration of the resources themselves and for the benefit and enjoyment by the public. Through the inventory, monitoring, and study of park resources, we gain a greater understanding of the scope, significance, distribution, and management issues associated with these resources and their use. This baseline of natural resource information is available to inform park managers, scientists, stakeholders, and the public about the conditions of these resources and the factors or activities which may threaten or influence their stability. There are several different categories of geologic or stratigraphic units (supergroup, group, formation, member, bed) which represent a hierarchical system of classification. The mapping of stratigraphic units involves the evaluation of lithologies, bedding properties, thickness, geographic distribution, and other factors. If a new mappable geologic unit is identified, it may be described and named through a rigorously defined process that is standardized and codified by the professional geologic community (North American Commission on Stratigraphic Nomenclature 2005). In most instances when a new geologic unit such as a formation is described and named in the scientific literature, a specific and well-exposed section of the unit is designated as the type section or type locality (see Definitions). The type section is an important reference section for a named geologic unit which presents a relatively complete and representative profile. The type or reference section is important both historically and scientifically, and should be protected and conserved for researchers to study and evaluate in the future. Therefore, this inventory of geologic type sections in NPS areas is an important effort in documenting these locations in order that NPS staff recognize and protect these areas for future studies. The documentation of all geologic type sections throughout the 423 units of the NPS is an ambitious undertaking. The strategy for this project is to select a subset of parks to begin research for the occurrence of geologic type sections within particular parks. The focus adopted for completing the baseline inventories throughout the NPS was centered on the 32 inventory and monitoring networks (I&M) established during the late 1990s. The I&M networks are clusters of parks within a defined geographic area based on the ecoregions of North America (Fenneman 1946; Bailey 1976; Omernik 1987). These networks share similar physical resources (geology, hydrology, climate), biological resources (flora, fauna), and ecological characteristics. Specialists familiar with the resources and ecological parameters of the network, and associated parks, work with park staff to support network level activities (inventory, monitoring, research, data management). Adopting a network-based approach to inventories worked well when the NPS undertook paleontological resource inventories for the 32 I&M networks. The network approach is also being applied to the inventory for the geologic type sections in the NPS. The planning team from the NPS Geologic Resources Division who proposed and designed this inventory selected the Greater Yellowstone Inventory and Monitoring Network (GRYN) as the pilot network for initiating this project. 
Through the research undertaken to identify the geologic type sections within the parks of the GRYN methodologies for data mining and reporting on these resources were established. Methodologies and reporting adopted for the GRYN have been used in the development of this type section inventory for the Klamath Inventory & Monitoring Network. The goal of this project is to consolidate information pertaining to geologic type sections which occur within NPS-administered areas, in order that this information is available throughout the NPS to inform park managers...
APA, Harvard, Vancouver, ISO, and other styles
10

Balyk, Nadiia, Svitlana Leshchuk, and Dariia Yatsenyak. Developing a Mini Smart House model. [б. в.], February 2020. http://dx.doi.org/10.31812/123456789/3741.

Full text
Abstract:
The work is devoted to designing a smart home educational model. The authors analyzed the literature in the field of the Internet of Things and identified the basic requirements for the training model. It contains the following levels: command, communication, management. The authors identify the main subsystems of the training model: communication, signaling, control of lighting, temperature, filling of the garbage container, monitoring of sensor data. The proposed smart home educational model takes into account the economic indicators of resource utilization, which makes it possible to save on payment for their consumption. The hardware components for the implementation of the Mini Smart House were selected in the article. It uses a variety of technologies to conveniently manage it and use renewable energy to power it. The model was produced independently by students involved in the STEM project. The research includes sketching, making construction parts, assembling sensors and Arduino boards, programming in the Arduino IDE environment, and testing the functioning of the system. Approbation of the Mini Smart House was conducted within the activities of the STEM center of the Physics and Mathematics Faculty of Ternopil Volodymyr Hnatiuk National Pedagogical University, in particular during the educational process and during numerous trainings and seminars for pupils and teachers of computer science.
APA, Harvard, Vancouver, ISO, and other styles