Academic literature on the topic 'Dataset noise'

Author: Grafiati

Published: 7 July 2024

Last updated: 7 July 2024

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Dataset noise.'

Journal articles on the topic "Dataset noise":

Jia, Qingrui, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li, Haoyi Xiong, and Dejing Dou. "Learning from Training Dynamics: Identifying Mislabeled Data beyond Manually Designed Features." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8041–49. http://dx.doi.org/10.1609/aaai.v37i7.25972.

Abstract:

While mislabeled or ambiguously labeled samples in the training set can degrade the performance of deep models, diagnosing the dataset and identifying mislabeled samples helps improve generalization. Training dynamics, i.e., the traces left by the iterations of optimization algorithms, have recently been shown to be effective at localizing mislabeled samples through hand-crafted features. In this paper, going beyond manually designed features, we introduce a novel learning-based solution that leverages a noise detector, instantiated as an LSTM network, which learns to predict whether a sample was mislabeled using the raw training dynamics as input. Specifically, the proposed method trains the noise detector in a supervised manner on a dataset with synthesized label noise and can adapt to various datasets (with either natural or synthesized label noise) without retraining. We conduct extensive experiments to evaluate the proposed method. We train the noise detector on the CIFAR dataset with synthesized label noise and test it on Tiny ImageNet, CUB-200, Caltech-256, WebVision, and Clothing1M. Results show that the proposed method precisely detects mislabeled samples on various datasets without further adaptation and outperforms state-of-the-art methods. Further experiments demonstrate that mislabel identification can guide label correction, namely data debugging, providing improvements from the data side that are orthogonal to algorithm-centric state-of-the-art techniques.
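The abstract contrasts the learned LSTM detector with earlier hand-crafted features computed from training dynamics. As a rough illustration of what such hand-crafted features look like (not the authors' learned detector), the sketch below summarizes each sample's per-epoch predicted probabilities into mean confidence, variability, and a mean margin between the assigned label and the strongest competing class, then flags samples whose mean margin is negative. The function names, the input layout (one list of class probabilities per epoch), and the zero threshold are hypothetical choices for this example.

```python
from statistics import mean, pstdev


def dynamics_features(prob_trace, label):
    """Summarize one sample's training dynamics (hypothetical helper).

    prob_trace: list over epochs; each entry holds that epoch's predicted
    class probabilities for the sample. label: the (possibly wrong) label.
    """
    confidence = [probs[label] for probs in prob_trace]
    # Margin: probability of the assigned label minus the largest other class.
    margins = [
        probs[label] - max(p for k, p in enumerate(probs) if k != label)
        for probs in prob_trace
    ]
    return {
        "confidence": mean(confidence),     # high for easy, correctly labeled samples
        "variability": pstdev(confidence),  # fluctuation across epochs
        "margin": mean(margins),            # negative when another class keeps winning
    }


def flag_suspects(prob_traces, labels, margin_threshold=0.0):
    """Return indices whose mean margin falls below an assumed threshold."""
    return [
        i
        for i, (trace, label) in enumerate(zip(prob_traces, labels))
        if dynamics_features(trace, label)["margin"] < margin_threshold
    ]


# Two samples, both labeled class 0, observed over three epochs.
traces = [
    [[0.6, 0.4], [0.8, 0.2], [0.9, 0.1]],  # model agrees with the label
    [[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]],  # model keeps predicting class 1
]
print(flag_suspects(traces, [0, 0]))  # [1]
```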

Jiang, Gaoxia, Jia Zhang, Xuefei Bai, Wenjian Wang, and Deyu Meng. "Which Is More Effective in Label Noise Cleaning, Correction or Filtering?" Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 12866–73. http://dx.doi.org/10.1609/aaai.v38i11.29183.

Abstract:

Most noise cleaning methods adopt either a correction mode or a filtering mode to build robust models. However, the effectiveness, applicability, and hyper-parameter insensitivity of these modes have not been carefully studied. We compare the two cleaning modes via a rebuilt error bound in noisy environments. At the dataset level, Theorem 5 implies that correction is more effective than filtering when the cleaned datasets have similar noise rates. At the sample level, Theorem 6 indicates that confident label noise (large noise probabilities) is better suited to correction, whereas unconfident noise (medium noise probabilities) should be filtered. In addition, an imperfect hyper-parameter may have less negative impact on filtering than on correction. Unlike existing methods with a single cleaning mode, the proposed Fusion cleaning framework of Correction and Filtering (FCF) combines the advantages of both modes to handle diverse suspicious labels. Experimental results demonstrate that FCF achieves state-of-the-art performance on benchmark datasets.
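The sample-level rule attributed to Theorem 6 (correct confident noise, filter medium-probability noise) can be illustrated with a small routing function. This is a hedged sketch, not the FCF framework: it assumes per-sample noise probabilities and predicted labels are already available from some upstream model, and the function name and the thresholds `low` and `high` are hypothetical.

```python
def clean_labels(labels, predicted, noise_prob, low=0.3, high=0.7):
    """Route each sample to keep, correct, or filter (hypothetical thresholds).

    Returns the indices of retained samples and their (possibly corrected) labels.
    """
    kept_idx, new_labels = [], []
    for i, (y, y_hat, p) in enumerate(zip(labels, predicted, noise_prob)):
        if p >= high:
            # Confident noise: keep the sample but replace its label.
            kept_idx.append(i)
            new_labels.append(y_hat)
        elif p >= low:
            # Unconfident (medium-probability) noise: filter the sample out.
            continue
        else:
            # Likely clean: keep the original label.
            kept_idx.append(i)
            new_labels.append(y)
    return kept_idx, new_labels


print(clean_labels([0, 1, 2, 1], [0, 2, 2, 0], [0.1, 0.9, 0.5, 0.2]))
# ([0, 1, 3], [0, 2, 1])
```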

Fu, Bo, Xiangyi Zhang, Liyan Wang, Yonggong Ren, and Dang N. H. Thanh. "A blind medical image denoising method with noise generation network." Journal of X-Ray Science and Technology 30, no. 3 (April 15, 2022): 531–47. http://dx.doi.org/10.3233/xst-211098.

Abstract:

BACKGROUND: During medical image acquisition, unknown mixed noise degrades image quality. However, existing denoising methods usually assume a known noise distribution. OBJECTIVE: To remove unknown real noise in low-dose CT (LDCT) images, this study proposes a two-step deep learning framework called the Noisy Generation-Removal Network (NGRNet). METHODS: First, the outputs of L0 gradient minimization are used as labels for a dental CT image dataset, forming pseudo image pairs with the real dental CT images; these pairs are used to train the noise generation network to estimate the real noise distribution. Then, the real noise is migrated to noise-free lung CT images from the LIDC/IDRI database to construct a new, nearly real noisy image dataset. This migration is possible because both the dental and the lung images are CT images. The denoising network is trained on this dataset to denoise real low-dose dental CT images, but it can be extended to any low-dose CT images. RESULTS: To demonstrate the effectiveness of NGRNet, we conduct experiments on lung CT images with synthetic noise and dental CT images with real noise. On the synthetic-noise datasets, NGRNet outperforms existing denoising methods in visual quality and exceeds them by 0.13 dB in peak signal-to-noise ratio (PSNR). On the real-noise datasets, the proposed method achieves the best visual denoising quality. CONCLUSIONS: The proposed method retains more detail and achieves strong denoising performance.
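The 0.13 dB figure is a difference in peak signal-to-noise ratio, defined as PSNR = 10 log10(MAX^2 / MSE). A minimal sketch over flattened pixel lists, assuming 8-bit images (MAX = 255); the function name is illustrative and not taken from the paper.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))  # 48.13
```

Because PSNR is logarithmic in MSE, a 0.13 dB gain corresponds to an MSE ratio of 10^(-0.013), i.e., roughly a 3% reduction in mean squared error.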

Choi, Hwiyong, Haesang Yang, Seungjun Lee, and Woojae Seong. "Classification of Inter-Floor Noise Type/Position Via Convolutional Neural Network-Based Supervised Learning." Applied Sciences 9, no. 18 (September 7, 2019): 3735. http://dx.doi.org/10.3390/app9183735.

Abstract:

Inter-floor noise, i.e., noise transmitted from one floor to another floor through walls or ceilings in an apartment building or an office of a multi-layered structure, causes serious social problems in South Korea. Notably, inaccurate identification of the noise type and position by human hearing intensifies the conflicts between residents of apartment buildings. In this study, we propose a robust approach using deep convolutional neural networks (CNNs) to learn and identify the type and position of inter-floor noise. Using a single mobile device, we collected nearly 2000 inter-floor noise events that contain 5 types of inter-floor noises generated at 9 different positions on three floors in a Seoul National University campus building. Based on pre-trained CNN models designed and evaluated separately for type and position classification, we achieved type and position classification accuracy of 99.5% and 95.3%, respectively in validation datasets. In addition, the robustness of noise type classification with the model was checked against a new test dataset. This new dataset was generated in the building and contains 2 types of inter-floor noises at 10 new positions. The approximate positions of inter-floor noises in the new dataset with respect to the learned positions are presented.

Hossain, Sadat, and Bumshik Lee. "NG-GAN: A Robust Noise-Generation Generative Adversarial Network for Generating Old-Image Noise." Sensors 23, no. 1 (December 26, 2022): 251. http://dx.doi.org/10.3390/s23010251.

Abstract:

Numerous old images and videos were captured and stored under unfavorable conditions. Hence, their noise patterns are uncertain and differ from those of modern images and videos. Denoising old images is an effective technique for reconstructing a clean image containing crucial information. However, obtaining the noisy-clean image pairs that supervised learning requires is difficult for old images, and because existing denoising approaches need a considerable number of such pairs, preparing them is expensive and burdensome. To address this issue, we propose a robust noise-generation generative adversarial network (NG-GAN), inspired by the CycleGAN model, that uses unpaired datasets to replicate the noise distribution of degraded old images. In our method, the perception-based image quality evaluator metric is used to control noise generation effectively. An unpaired dataset for training the proposed model is built by selecting clean images whose features match those of the old images. Experimental results demonstrate that the dataset generated by NG-GAN trains state-of-the-art denoising models better, enabling them to denoise old videos effectively. Trained on the NG-GAN dataset, the denoising models show average improvements of 0.37 dB in peak signal-to-noise ratio and 0.06 in structural similarity index measure.

Zhang, Rui, Zhenghao Chen, Sanxing Zhang, Fei Song, Gang Zhang, Quancheng Zhou, and Tao Lei. "Remote Sensing Image Scene Classification with Noisy Label Distillation." Remote Sensing 12, no. 15 (July 24, 2020): 2376. http://dx.doi.org/10.3390/rs12152376.

Abstract:

The widespread application of CNN-based remote sensing image scene classification is severely limited by the lack of large-scale datasets with clean annotations. Data crawled from the Internet or other sources allows existing datasets to be expanded rapidly at low cost. However, training directly on such an expanded dataset can cause the network to overfit to noisy labels. Traditional methods typically divide the noisy dataset into multiple parts and fine-tune the network on each part separately to further improve performance. These approaches are inefficient and sometimes even hurt performance. To address these problems, this study proposes a novel noisy label distillation (NLD) method based on an end-to-end teacher-student framework. First, unlike general knowledge distillation methods, NLD does not require pre-training on clean or noisy data. Second, NLD effectively distills knowledge from labels across the full range of noise levels for better performance. In addition, when a fully clean dataset is available, NLD can act as a model distillation method to improve the student classifier’s performance. NLD is evaluated on three remote sensing image datasets, namely UC Merced Land-use, NWPU-RESISC45, and AID, into which a variety of noise patterns and noise amounts are injected. Experimental results show that NLD outperforms widely used direct fine-tuning methods and remote sensing pseudo-labeling methods.
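The abstract does not specify which noise patterns were injected. One common pattern in label-noise studies is symmetric (uniform) noise, where a chosen fraction of labels is flipped to a different class drawn uniformly at random. A minimal sketch of that pattern, with hypothetical function and parameter names:

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip round(noise_rate * n) labels to a uniformly chosen different class."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        # Exclude the true class so every selected label actually changes.
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy


clean = [i % 5 for i in range(100)]
noisy = inject_symmetric_noise(clean, num_classes=5, noise_rate=0.3)
print(sum(a != b for a, b in zip(clean, noisy)))  # 30
```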

Van Hulse, Jason, Taghi M. Khoshgoftaar, and Amri Napolitano. "Evaluating the Impact of Data Quality on Sampling." Journal of Information & Knowledge Management 10, no. 03 (September 2011): 225–45. http://dx.doi.org/10.1142/s021964921100295x.

Abstract:

Learning from imbalanced training data can be a difficult endeavour, and the task is made even more challenging if the data is of low quality or the training dataset is small. Data sampling is a commonly used method for improving learner performance when data is imbalanced. However, little effort has been put into investigating the performance of data sampling techniques when data is both noisy and imbalanced. In this work, we present a comprehensive empirical investigation of the impact of changes in four training dataset characteristics (dataset size, class distribution, noise level, and noise distribution) on data sampling techniques. We present the performance of four common data sampling techniques using 11 learning algorithms. The results, based on an extensive suite of experiments in which over 15 million models were trained and evaluated, show that: (1) even for relatively clean datasets, class imbalance can still hurt learner performance; (2) data sampling, however, may not improve performance for relatively clean but imbalanced datasets; (3) data sampling can be very effective at dealing with the combined problems of noise and imbalance; (4) both the level and the distribution of class noise among the classes are important, as neither factor alone causes a significant impact; (5) when sampling does improve the learners (i.e., for noisy and imbalanced datasets), RUS (random undersampling) and SMOTE (synthetic minority oversampling technique) are the most effective at improving the AUC, while SMOTE performed well with respect to the F-measure; (6) there are significant differences in the empirical results depending on the performance measure used, so it is important to consider multiple metrics in this type of analysis; and (7) data sampling rarely hurt the AUC but only significantly improved it when data was at least moderately skewed or noisy, whereas for the F-measure, data sampling often resulted in significantly worse performance on slightly skewed or noisy datasets but did improve performance when data was either severely noisy or skewed, or contained moderate levels of both noise and imbalance.
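RUS, one of the sampling techniques the study found most effective for the AUC, simply discards randomly chosen majority-class examples until the classes are balanced. A minimal sketch that undersamples every class to the size of the smallest one; the function name and the "balance to the smallest class" target are assumptions (the study may have used other target ratios).

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Randomly keep n_min examples per class, where n_min is the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n_min = min(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for cls, items in by_class.items():
        for xi in rng.sample(items, n_min):  # sample without replacement
            X_out.append(xi)
            y_out.append(cls)
    return X_out, y_out


X = list(range(10))
y = [0] * 8 + [1] * 2  # 8:2 imbalance
X_bal, y_bal = random_undersample(X, y)
print(sorted(y_bal))  # [0, 0, 1, 1]
```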

Nogales, Alberto, Javier Caracuel-Cayuela, and Álvaro J. García-Tejedor. "Analyzing the Influence of Diverse Background Noises on Voice Transmission: A Deep Learning Approach to Noise Suppression." Applied Sciences 14, no. 2 (January 15, 2024): 740. http://dx.doi.org/10.3390/app14020740.

Abstract:

This paper presents an approach to enhancing the clarity and intelligibility of speech in digital communications compromised by various background noises. Utilizing deep learning techniques, specifically a Variational Autoencoder (VAE) with 2D convolutional filters, we aim to suppress background noise in audio signals. Our method focuses on four simulated environmental noise scenarios: storms, wind, traffic, and aircraft. The training dataset has been obtained from public sources (TED-LIUM 3 dataset, which includes audio recordings from the popular TED-TALK series) combined with these background noises. The audio signals were transformed into 2D power spectrograms, upon which our VAE model was trained to filter out the noise and reconstruct clean audio. Our results demonstrate that the model outperforms existing state-of-the-art solutions in noise suppression. Although differences in noise types were observed, it was challenging to definitively conclude which background noise most adversely affects speech quality. The results have been assessed with objective (mathematical metrics) and subjective (listening to a set of audios by humans) methods. Notably, wind noise showed the smallest deviation between the noisy and cleaned audio, perceived subjectively as the most improved scenario. Future work should involve refining the phase calculation of the cleaned audio and creating a more balanced dataset to minimize differences in audio quality across scenarios. Additionally, practical applications of the model in real-time streaming audio are envisaged. This research contributes significantly to the field of audio signal processing by offering a deep learning solution tailored to various noise conditions, enhancing digital communication quality.
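The abstract describes converting audio into 2D power spectrograms before training the VAE. The sketch below computes one with a short-time Fourier transform and a Hann window; the frame length, hop size, window, and function name are hypothetical, and the direct O(N^2) DFT is for readability only (a real pipeline would use an FFT). Taking |X|^2 discards phase, which is why reconstructing audio from a cleaned spectrogram needs a separate phase estimate, the step the authors single out for future refinement.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=256, hop=128):
    """Return a list of frames, each holding |X[k]|^2 for k = 0..frame_len // 2."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc) ** 2)  # power: phase is discarded here
        frames.append(spectrum)
    return frames


# A sinusoid exactly at bin 8 of a 64-sample frame.
sig = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = power_spectrogram(sig, frame_len=64, hop=32)
print(len(spec), max(range(len(spec[0])), key=spec[0].__getitem__))  # 7 8
```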

Kramberger, Tin, and Božidar Potočnik. "LSUN-Stanford Car Dataset: Enhancing Large-Scale Car Image Datasets Using Deep Learning for Usage in GAN Training." Applied Sciences 10, no. 14 (July 17, 2020): 4913. http://dx.doi.org/10.3390/app10144913.

Abstract:

Currently, no adequate publicly available dataset exists for training Generative Adversarial Networks (GANs) on car images: all available car datasets differ in noise, pose, and zoom levels. The objective of this work was therefore to create an improved car image dataset better suited to GAN training. To improve GAN performance, we combined the LSUN and Stanford car datasets. The merged dataset was then pruned to adjust zoom levels and reduce image noise. This process yielded fewer training images, but of higher quality. The pruned dataset was evaluated by training StyleGAN with its original settings. Pruning the combined LSUN and Stanford datasets resulted in 2,067,710 car images with less noise and better-adjusted zoom levels. Training StyleGAN on the LSUN-Stanford car dataset outperformed training on the LSUN dataset alone by 3.7% in Fréchet Inception Distance (FID). These results indicate that the proposed LSUN-Stanford car dataset is more consistent and better suited to training GANs than other currently available large car datasets.
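FID compares Gaussian fits to Inception features of real and generated images: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)). Lower is better, so the reported 3.7% improvement presumably means a 3.7% lower score. The sketch below covers only the simplified case of diagonal covariances, where the trace term reduces to a per-dimension sum of (sqrt(v1) - sqrt(v2))^2; the real metric uses full covariance matrices and a matrix square root, and the function name is hypothetical.

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    mu*: per-dimension means; var*: per-dimension variances (the diagonals).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # For diagonal C1, C2: Tr(C1 + C2 - 2 sqrt(C1 C2)) = sum (sqrt(v1) - sqrt(v2))^2
    cov_term = sum((math.sqrt(v1) - math.sqrt(v2)) ** 2 for v1, v2 in zip(var1, var2))
    return mean_term + cov_term


print(frechet_distance_diag([0.0], [1.0], [3.0], [4.0]))  # 10.0
```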

Shi, Haoxiang, Jun Ai, Jingyu Liu, and Jiaxi Xu. "Improving Software Defect Prediction in Noisy Imbalanced Datasets." Applied Sciences 13, no. 18 (September 19, 2023): 10466. http://dx.doi.org/10.3390/app131810466.

Abstract:

Software defect prediction is a popular method for optimizing software testing and improving software quality and reliability. However, software defect datasets usually have quality problems, such as class imbalance and data noise. Oversampling, which generates minority-class samples, is one of the best-known methods for improving dataset quality; however, it often introduces overfitting noise. To better improve the quality of these datasets, this paper proposes a method called US-PONR, which uses undersampling to remove duplicate samples across version iterations and then uses oversampling through propensity score matching to reduce class imbalance and noisy samples. The effectiveness of this method was validated in a software defect prediction experiment involving 24 versions of software data from 11 PROMISE projects in noisy environments with noise levels ranging from 0% to 30%. Compared with 12 other advanced dataset processing methods, US-PONR significantly improved the quality of noisy imbalanced datasets, especially the noisiest ones. The experiments also demonstrated that US-PONR can effectively identify and remove label noise samples.

Dissertations / Theses on the topic "Dataset noise":

Kvist, Eric, and Rhodin Sandro Lockvall. "A comparative study between MLP and CNN for noise reduction on images : The impact of different input dataset sizes and the impact of different types of noise on performance." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259654.

Abstract:

Images damaged by noise present a problem that can be addressed by performing noise reduction with neural networks. This thesis analyses the performance of two different neural networks, a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN), when performing noise reduction on images. It focuses specifically on the impact that the size of the training dataset has on the performance of the two kinds of network, as well as on how well the two networks reduce different types of noise. The aim is to determine whether the more modern type of network, the CNN, performs better than the older type, the MLP, specifically for image noise reduction. As expected, the results show that the MLP performs worse than the CNN. They also show that dataset size and the type of noise to be reduced, although both have a large impact on performance, matter less than the choice of neural network.
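The abstract does not name the noise types compared. Additive Gaussian noise and salt-and-pepper (impulse) noise are two common choices in image denoising experiments. A minimal sketch of both, on grayscale images stored as nested lists of 0-255 values; the function names and parameters are hypothetical.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise with std `sigma`, clipping to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def add_salt_and_pepper(image, amount, seed=0):
    """Set a fraction `amount` of pixels to 0 (pepper) or 255 (salt), half each on average."""
    rng = random.Random(seed)
    out = [list(row) for row in image]  # copy so the input is not modified
    for row in out:
        for j in range(len(row)):
            r = rng.random()
            if r < amount / 2:
                row[j] = 0
            elif r < amount:
                row[j] = 255
    return out


gray = [[128] * 10 for _ in range(10)]
noisy_g = add_gaussian_noise(gray, sigma=25.0)
noisy_sp = add_salt_and_pepper(gray, amount=0.1)
```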

Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ" [Development of Algorithms for Gunshot Recognition]. Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.

Abstract:

This work deals with gunshot recognition and the problems associated with it. First, the topic as a whole is introduced and divided into smaller steps. Next, an overview of sound databases, significant publications, events, and the current state of the art is provided, together with an overview of possible applications of gunshot detection. The second part compares features using various metrics, together with a comparison of their recognition performance. A comparison of recognition algorithms follows, and new features usable for recognition are presented. The work culminates in the design of a two-stage gunshot recognition system that monitors its surroundings in real time. Finally, the results achieved are summarized and further work is outlined.

Al Jurdi, Wissam. "Towards next generation recommender systems through generic data quality." Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCD005.

Abstract:

Recommender systems are essential for filtering online information and delivering personalized content, thereby reducing the effort users need to find relevant information. They can be content-based, collaborative, or hybrid, each with a unique recommendation approach. These systems are crucial in various fields, including e-commerce, where they help customers find pertinent products, enhancing user experience and increasing sales. A significant aspect of these systems is the concept of unexpectedness, which involves discovering new and surprising items. This feature, while improving user engagement and experience, is complex and subjective, requiring a deep understanding of serendipitous recommendations for its measurement and optimization. Natural noise, an unpredictable variation in the data, can influence serendipity in recommender systems. It can introduce diversity and unexpectedness into recommendations, leading to pleasant surprises. However, it can also reduce recommendation relevance, causing user frustration. Therefore, it is crucial to design systems that balance natural noise and serendipity. Inconsistent user information due to natural noise can negatively impact recommender systems, leading to lower-quality recommendations. Current evaluation methods often overlook critical user-oriented factors, making noise detection a challenge. To provide powerful recommendations, it is important to consider diverse user profiles, eliminate noise in datasets, and effectively present users with relevant content from vast data catalogs. This thesis emphasizes the role of serendipity in enhancing recommender systems and preventing filter bubbles. It proposes serendipity-aware techniques to manage noise, identifies algorithm flaws, suggests a user-centric evaluation method, and proposes a community-based architecture for improved performance. It highlights the need for a system that balances serendipity and considers natural noise and other performance factors. The objectives, experiments, and tests aim to refine recommender systems and offer a versatile assessment approach.

Fonseca, Eduardo. "Training sound event classifiers using different types of supervision." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/673067.

Abstract:

The automatic recognition of sound events has gained attention in the past few years, motivated by emerging applications in fields such as healthcare, smart homes, or urban planning. When the work for this thesis started, research on sound event classification was mainly focused on supervised learning using small datasets, often carefully annotated with vocabularies limited to specific domains (e.g., urban or domestic). However, such small datasets do not support training classifiers able to recognize hundreds of sound events occurring in our everyday environment, such as kettle whistles, bird tweets, cars passing by, or different types of alarms. At the same time, large amounts of environmental sound data are hosted in websites such as Freesound or YouTube, which can be convenient for training large-vocabulary classifiers, particularly using data-hungry deep learning approaches. To advance the state-of-the-art in sound event classification, this thesis investigates several strands of dataset creation as well as supervised and unsupervised learning to train large-vocabulary sound event classifiers, using different types of supervision in novel and alternative ways. Specifically, we focus on supervised learning using clean and noisy labels, as well as self-supervised representation learning from unlabeled data. The first part of this thesis focuses on the creation of FSD50K, a large-vocabulary dataset with over 100h of audio manually labeled using 200 classes of sound events. We provide a detailed description of the creation process and a comprehensive characterization of the dataset. In addition, we explore architectural modifications to increase shift invariance in CNNs, improving robustness to time/frequency shifts in input spectrograms. In the second part, we focus on training sound event classifiers using noisy labels. First, we propose a dataset that supports the investigation of real label noise. Then, we explore network-agnostic approaches to mitigate the effect of label noise during training, including regularization techniques, noise-robust loss functions, and strategies to reject noisy labeled examples. Further, we develop a teacher-student framework to address the problem of missing labels in sound event datasets. In the third part, we propose algorithms to learn audio representations from unlabeled data. In particular, we develop self-supervised contrastive learning frameworks, where representations are learned by comparing pairs of examples computed via data augmentation and automatic sound separation methods. Finally, we report on the organization of two DCASE Challenge Tasks on automatic audio tagging with noisy labels. By providing data resources as well as state-of-the-art approaches and audio representations, this thesis contributes to the advancement of open sound event research, and to the transition from traditional supervised learning using clean labels to other learning strategies less dependent on costly annotation efforts.
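The abstract mentions noise-robust loss functions without naming them. One widely used example (not necessarily the one used in the thesis) is the generalized cross-entropy, or Lq, loss of Zhang and Sabuncu (2018): L_q = (1 - p_y^q) / q, which approaches cross-entropy as q -> 0 and equals 1 - p_y (an MAE-like loss) at q = 1. A per-example sketch, where `probs` is assumed to be a softmax output and the default q = 0.7 is a hypothetical choice:

```python
import math


def lq_loss(probs, label, q=0.7):
    """Generalized cross-entropy for one example.

    probs: predicted class probabilities; label: the (possibly noisy) label.
    q = 0 falls back to ordinary cross-entropy.
    """
    p = probs[label]
    if q == 0:
        return -math.log(p)
    return (1.0 - p ** q) / q


print(lq_loss([0.5, 0.5], 0, q=1.0))  # 0.5
```

Because the loss is bounded by 1/q, an example whose (possibly wrong) label receives near-zero probability contributes at most 1/q, whereas cross-entropy grows without bound and can let mislabeled clips dominate training.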

Cappozzo, Andrea. "Robust model-based classification and clustering: advances in learning from contaminated datasets." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2020. http://hdl.handle.net/10281/262919.

Abstract:

At the time of writing, an ever-increasing amount of data is collected every day, with its volume estimated to double every two years. Thanks to technological advances, datasets are becoming massive in size and substantially more complex in nature. Nevertheless, this abundance of "raw information" comes at a price: wrong measurements, data-entry errors, breakdowns of automatic collection systems and several other causes may ultimately undermine the overall data quality. In this context, robust methods have a central role in properly converting contaminated "raw information" into trustworthy knowledge: a primary goal of any statistical analysis. This manuscript presents novel methodologies for performing reliable inference, within the model-based classification and clustering framework, in the presence of contaminated data. First, we propose a robust modification to a family of semi-supervised patterned models, for accomplishing classification when dealing with both class and attribute noise. Second, we develop a discriminant analysis method for anomaly and novelty detection, with the final aim of discovering label noise, outliers and unobserved classes in an unlabelled dataset. Third, we introduce two robust variable selection methods that effectively perform high-dimensional discrimination within an adulterated scenario.

Rehn, Ruben, and Ricky Molén. "The ghost in the machine : Exploring the impact of noise in datasets used for graph-based action recognition." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302503.

Abstract:

Human action recognition is the task of classifying human movements and actions from video data. To benchmark different algorithms within the action recognition field, a common benchmark dataset called NTU-RGB+D is used. However, this dataset is not without its issues, as some samples contain data that are mistakenly captured as a human. In the context of this thesis, these are defined as ghost bodies. This thesis explores to what extent the accuracy of a state-of-the-art directed graph neural network, DGNN, is affected if trained without ghost bodies. The results suggest that the accuracy increases by 1.79 percentage points when ghost bodies are excluded during testing with an unofficial implementation of the DGNN. However, the results of the original DGNN could not be fully replicated, which undermines the strength of the results. Despite this, given the importance of the NTU dataset within action recognition, we suggest considering a new benchmark dataset that takes ghost bodies into account. While the results of the study are not generalizable, the measured difference in recognition accuracy still points to the necessity of looking deeper into the phenomenon of ghost bodies within action recognition.

Jia, Sen. "Data from the wild in computer vision : generating and exploiting large scale and noisy datasets." Thesis, University of Bristol, 2016. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.738203.

Osman, Ahmad. "Automated evaluation of three dimensional ultrasonic datasets." PhD thesis, INSA de Lyon, 2013. http://tel.archives-ouvertes.fr/tel-00995119.

Abstract:

Non-destructive testing has become necessary to ensure the quality of materials and components, either in service or at the production stage. This requires a rapid, robust and reliable testing technique. As a main testing technique, ultrasound has unique abilities to assess the location, size and shape of discontinuities. Such information plays a vital role in the acceptance criteria, which are based on the safety and quality requirements of manufactured components. Consequently, the ultrasound technique is used extensively, especially in the inspection of large-scale composites manufactured in the aerospace industry. Significant technical advances, such as the sampling phased array technique, have contributed to optimizing ultrasound acquisition. However, acquisition systems need to be complemented with an automated data analysis procedure to avoid the time-consuming manual interpretation of all the produced data. Such a complement would accelerate the inspection process and improve its reliability. The objective of this thesis is to propose an analysis chain dedicated to automatically processing the 3D ultrasound volumes obtained with the sampling phased array technique. First, a detailed study of the speckle noise affecting the ultrasound data was conducted, as speckle reduces the quality of ultrasound data. Afterward, an analysis chain was developed, composed of a segmentation procedure followed by a classification procedure. The proposed segmentation methodology is adapted to 3D ultrasound data and aims to detect all potential defects inside the input volume. While detecting defects is vital, one main difficulty is the large number of false alarms produced by the segmentation procedure. Correctly identifying false alarms is necessary to reduce the rejection rate of safe parts, and this has to be done without risking missing true defects.
Therefore, a powerful classifier is needed that can efficiently distinguish true defects from false alarms. This is achieved using a specific classification approach based on data fusion theory. The chain was tested on several volumetric ultrasound measurements of Carbon Fiber Reinforced Polymer components. Experimental results showed that the chain detects, characterizes and classifies defects with high accuracy and reliability.

Liu, Qian. "Deep spiking neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/deep-spiking-neural-networks(336e6a37-2a0b-41ff-9ffb-cca897220d6c).html.

Abstract:

Neuromorphic Engineering (NE) has led to the development of biologically inspired computer architectures whose long-term goal is to approach the performance of the human brain in terms of energy efficiency and cognitive capabilities. Although a number of neuromorphic platforms are available for large-scale Spiking Neural Network (SNN) simulations, the problem of programming these brain-like machines to be competent in cognitive applications remains unsolved. On the other hand, Deep Learning has emerged in Artificial Neural Network (ANN) research to dominate state-of-the-art solutions for cognitive tasks. The main research problem is thus to understand how to operate and train biologically plausible SNNs so as to close the gap in cognitive capabilities between SNNs and ANNs. SNNs can be trained by first training an equivalent ANN and then transferring the tuned weights to the SNN. This method is called "off-line" training, since it does not take place on an SNN directly, but rather on an ANN. However, previous work on such off-line training methods has struggled with poor modelling accuracy of the spiking neurons and high computational complexity. In this thesis we propose a simple and novel activation function, Noisy Softplus (NSP), to closely model the response firing activity of biologically plausible spiking neurons, and introduce a generalised off-line training method using the Parametric Activation Function (PAF) to map the abstract numerical values of the ANN to concrete physical units, such as current and firing rate in the SNN. Based on this generalised training method and its fine tuning, we achieve state-of-the-art accuracy on the MNIST classification task using spiking neurons, 99.07%, on a deep spiking convolutional neural network (ConvNet). We then take a step forward to "on-line" training methods, where Deep Learning modules are trained purely on SNNs in an event-driven manner.
Existing work has failed to provide SNNs with recognition accuracy equivalent to ANNs due to the lack of mathematical analysis. We therefore propose a formalised Spike-based Rate Multiplication (SRM) method, which transforms the product of firing rates into the number of coincident spikes of a pair of rate-coded spike trains. Moreover, these coincident spikes can be captured by the Spike-Time-Dependent Plasticity (STDP) rule to update the weights between the neurons in an on-line, event-based, and biologically plausible manner. Furthermore, we put forward solutions to reduce correlations between spike trains, thereby addressing the resulting performance drop in on-line SNN training. Spiking Autoencoders (AEs) and spiking Restricted Boltzmann Machines (SRBMs) show promising results, exhibiting equivalent, and sometimes even superior, classification and reconstruction capabilities compared to their non-spiking counterparts. To provide meaningful comparisons between these proposed SNN models and other existing methods within this rapidly advancing field of NE, we propose a large dataset of spike-based visual stimuli and a corresponding evaluation methodology to estimate the overall performance of SNN models and their hardware implementations.
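
The statistical idea behind turning a product of firing rates into a count of coincident spikes can be checked numerically. The sketch below assumes two independent Poisson spike trains, approximated by one Bernoulli trial per bin of width dt: if neuron A fires with probability r_A·dt per bin and neuron B with r_B·dt, both fire in the same bin with probability r_A·r_B·dt², so over a duration T the expected number of coincidences is T·r_A·r_B·dt, and dividing the observed count by T·dt estimates r_A·r_B. This illustrates the principle only, not the thesis's formal SRM derivation; all names and parameter values are hypothetical.

```python
import random


def poisson_spike_train(rate_hz, duration_s, dt_s, rng):
    """Bernoulli-per-bin approximation of a Poisson spike train (one bool per bin).

    Assumes rate_hz * dt_s << 1, so at most one spike per bin is a fair approximation.
    """
    p_spike = rate_hz * dt_s
    n_bins = int(round(duration_s / dt_s))
    return [rng.random() < p_spike for _ in range(n_bins)]


def rate_product_from_coincidences(train_a, train_b, duration_s, dt_s):
    """Estimate rate_a * rate_b (Hz^2) from the number of coincident spikes.

    For independent trains, E[coincidences] = duration * rate_a * rate_b * dt.
    """
    coincidences = sum(1 for a, b in zip(train_a, train_b) if a and b)
    return coincidences / (duration_s * dt_s)


rng = random.Random(0)
T, dt = 1000.0, 0.001  # 1000 s of simulated activity in 1 ms bins
a = poisson_spike_train(20.0, T, dt, rng)
b = poisson_spike_train(30.0, T, dt, rng)
estimate = rate_product_from_coincidences(a, b, T, dt)
print(estimate, "vs exact", 20.0 * 30.0)  # estimate fluctuates around 600
```

Longer simulated durations shrink the relative error of the estimate, because the expected coincidence count grows linearly with T while its standard deviation grows only with the square root.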

Chen, Jun-An (陳俊安). "Using Fuzzy Support Vector Machine to Solve Imbalanced Datasets and Noise Problems." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/12559097922249906761.

Abstract:

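Master's thesis, Chaoyang University of Technology, Department of Information Engineering, academic year 103 (ROC calendar, i.e. 2014/15).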
This work proposes a method that removes redundant training data in order to retrieve the support vectors, and introduces a fuzzy support vector machine to solve imbalanced-dataset problems. First, the training data of every category were clustered and the probability that each training sample is a support vector was computed; non-support vectors were then randomly removed until the categories contained balanced numbers of samples. Next, the membership degrees of the training data were calculated with the fuzzy k-nearest neighbor algorithm in order to identify and remove noise. Finally, the data obtained from these steps were recombined to construct a fuzzy support vector machine. The Wisconsin Breast Cancer Dataset (WBCD) from the UCI repository was selected for the experiments. The results achieved by the proposed method were compared with several well-known techniques, namely the classical SMOTE approach, the SBC approach, and the SUNDO approach. Experimental results show that the proposed approach outperforms the other approaches.
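
The abstract does not give the exact membership formula used for noise removal. As an illustration only, the sketch below uses the distance-weighted fuzzy k-nearest-neighbor membership of Keller et al. (1985) with crisp neighbor labels, computed leave-one-out, and flags a sample as noise when its membership in its own annotated class falls below 0.5. The threshold, k, the fuzzifier m, and the function names are assumptions, and the clustering-based undersampling step is not shown.

```python
import math


def fuzzy_knn_memberships(X, y, k=5, m=2.0):
    """Leave-one-out fuzzy k-NN class memberships for every training sample.

    Each of the k nearest neighbours votes for its own (crisp) label with
    weight 1 / d^(2 / (m - 1)); memberships are the normalised weights.
    """
    classes = sorted(set(y))
    result = []
    for i, xi in enumerate(X):
        neighbours = sorted(
            ((math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        weights = {c: 0.0 for c in classes}
        for d, label in neighbours:
            # guard against zero distance between duplicate points
            weights[label] += 1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0))
        total = sum(weights.values())
        result.append({c: w / total for c, w in weights.items()})
    return result


def flag_label_noise(X, y, k=5, threshold=0.5):
    """Indices of samples whose membership in their own label is below threshold."""
    memberships = fuzzy_knn_memberships(X, y, k)
    return [i for i, u in enumerate(memberships) if u[y[i]] < threshold]


# Two tight clusters plus one point labelled 1 that sits inside cluster 0.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(flag_label_noise(X, y))  # → [8]
```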

Books on the topic "Dataset noise":

Spencer, Roy W., Richard T. McNider, and United States. National Aeronautics and Space Administration, eds. Reducing noise in the MSU daily lower-tropospheric global temperature dataset. [Washington, D.C.]: National Aeronautics and Space Administration, 1995.

Machine Learning Methods with Noisy, Incomplete or Small Datasets. MDPI, 2021. http://dx.doi.org/10.3390/books978-3-0365-1288-4.

Book chapters on the topic "Dataset noise":

Alrashed, Tarfah, Dimitris Paparas, Omar Benjelloun, Ying Sheng, and Natasha Noy. "Dataset or Not? A Study on the Veracity of Semantic Markup for Dataset Pages." In The Semantic Web – ISWC 2021, 338–56. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-88361-4_20.

Abstract:

Semantic markup, such as schema.org, allows providers on the Web to describe content using a shared controlled vocabulary. This markup is invaluable in enabling a broad range of applications, from vertical search engines, to rich snippets in search results, to actions on emails, to many others. In this paper, we focus on semantic markup for datasets, specifically in the context of developing a vertical search engine for datasets on the Web, Google's Dataset Search. Dataset Search relies on schema.org to identify pages that describe datasets. While schema.org was the core enabling technology for this vertical search, we also discovered that we needed to address the following problem: pages from 61% of internet hosts that provide this markup do not actually describe datasets. We analyze the veracity of dataset markup for Dataset Search's Web-scale corpus and categorize pages where this markup is not reliable. We then propose a way to drastically increase the quality of the dataset metadata corpus by developing a deep neural-network classifier that identifies whether or not a page with this markup is a dataset page. Our classifier achieves 96.7% recall at the 95% precision point. This level of precision enables Dataset Search to circumvent the noise in semantic markup and to use the metadata to provide high-quality results to users.
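
A figure such as "96.7% recall at the 95% precision point" is an operating point on the precision-recall curve: among all score thresholds whose precision is at least 95%, the one with the highest recall is reported. The sketch below shows one common way to compute it from classifier scores and binary labels; the function name is hypothetical, the toy scores are made up for illustration, and the chapter's exact evaluation protocol is not described in the abstract.

```python
def recall_at_precision(scores, labels, min_precision=0.95):
    """Highest recall over all score thresholds whose precision >= min_precision.

    scores: classifier outputs, higher meaning "more likely a dataset page"
    labels: 1 for pages that really describe a dataset, 0 otherwise
    Returns 0.0 if no threshold reaches the requested precision.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = 0.0
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Only evaluate a threshold once all items tied at this score are included.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
print(recall_at_precision(scores, labels, 0.95))  # → 0.6666666666666666
```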

Hagn, Korbinian, and Oliver Grau. "Optimized Data Synthesis for DNN Training and Validation by Sensor Artifact Simulation." In Deep Neural Networks and Data for Automated Driving, 127–47. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-01233-4_4.

Abstract:

Synthetic, i.e., computer-generated imagery (CGI), data is a key component for training and validating deep-learning-based perceptive functions due to its ability to simulate rare cases, its avoidance of privacy issues, and its generation of pixel-accurate ground truth data. Today, physically based rendering (PBR) engines already simulate a wealth of realistic optical effects but are mainly focused on the human perception system. Perceptive functions, however, require realistic images that model, as closely as possible, the artifacts of the sensor with which the training data was recorded. This chapter proposes a way to improve the data synthesis process by applying realistic sensor artifacts. To do this, one has to overcome the domain distance between real-world imagery and synthetic imagery. We therefore propose a measure that captures the generalization distance between two distinct datasets on which the same model has been trained. With this measure, the data synthesis pipeline can be improved to produce realistic sensor-simulated images that are closer to the real-world domain. The proposed measure is based on the Wasserstein distance (earth mover's distance, EMD) over the performance metric mean intersection-over-union (mIoU) on a per-image basis, comparing synthetic and real datasets using deep neural networks (DNNs) for semantic segmentation. This measure is subsequently used to match the characteristics of a real-world camera in the image synthesis pipeline, which considers realistic sensor noise and lens artifacts. Comparing the measure with the well-established Fréchet inception distance (FID) on real and artificial datasets demonstrates its ability to interpret the generalization distance, which is inherently asymmetric and more informative than a simple distance measure.
Furthermore, we use the metric as an optimization criterion to adapt a synthetic dataset to a real dataset, decreasing the EMD between a synthetic dataset and the Cityscapes dataset from 32.67 to 27.48 and increasing the mIoU of our test algorithm from 40.36% to 47.63%.
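
For one-dimensional samples such as per-image mIoU values, the Wasserstein (earth mover's) distance reduces to the area between the two empirical cumulative distribution functions. The sketch below computes that quantity for two lists of per-image scores; it illustrates the base distance only, not the chapter's full measure, and the function name and score values are hypothetical. SciPy's `scipy.stats.wasserstein_distance` computes the same 1-D quantity. Note that this distance is symmetric in its arguments, so the asymmetry described in the abstract presumably comes from how the score sets are produced, which the abstract does not detail.

```python
def wasserstein_1d(u, v):
    """1-Wasserstein (earth mover's) distance between two 1-D empirical samples.

    Integrates |F_u(x) - F_v(x)| over x, where F_u and F_v are the empirical
    CDFs; works for samples of different sizes.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    i = j = 0
    distance = 0.0
    for left, right in zip(grid, grid[1:]):
        # count how many values in each sample are <= left (CDF at left)
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        distance += abs(i / len(u) - j / len(v)) * (right - left)
    return distance


# Hypothetical per-image mIoU scores (in percent) of one model on two test sets
miou_real = [38.0, 41.5, 44.0, 36.5]
miou_synthetic = [30.0, 47.0, 52.5]
print(wasserstein_1d(miou_real, miou_synthetic))
```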

Singstad, Bjørn Jostein, Bendik Steinsvåg Dalen, Sandhya Sihra, Nickolas Forsch, and Samuel Wall. "Identifying Ionic Channel Block in a Virtual Cardiomyocyte Population Using Machine Learning Classifiers." In Computational Physiology, 91–109. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05164-7_8.

Abstract:

Immature cardiomyocytes, such as those obtained by stem cell differentiation, have been shown to be useful alternatives to mature cardiomyocytes, which are limited in availability and difficult to obtain, for evaluating the behaviour of drugs for treating arrhythmia. In silico models of induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) can be used to simulate the behaviour of the transmembrane potential and cytosolic calcium under drug-treated conditions. Simulating the change in action potentials due to various ionic current blocks enables the approximation of drug behaviour. We used eight machine learning classification models to predict partial block of seven possible ion currents ($I_{\mathrm{CaL}}$, $I_{\mathrm{Kr}}$, $I_{\mathrm{to}}$, $I_{\mathrm{K1}}$, $I_{\mathrm{Na}}$, $I_{\mathrm{NaL}}$ and $I_{\mathrm{Ks}}$) in a simulated dataset containing nearly 4600 action potentials, each represented as a paired measure of transmembrane potential and cytosolic calcium. Each action potential was generated under 1 Hz pacing. The Convolutional Neural Network outperformed the other models, with an average accuracy of 93% in predicting partial ionic current block in noise-free data and 72% with 3% added random noise. Our results show that $I_{\mathrm{CaL}}$ and $I_{\mathrm{Kr}}$ current block were classified with high accuracy with and without noise. The classification of $I_{\mathrm{to}}$, $I_{\mathrm{K1}}$ and $I_{\mathrm{Na}}$ current block showed high accuracy at 0% noise but a significant decrease in accuracy when noise was added. Finally, the classification accuracy for $I_{\mathrm{NaL}}$ and $I_{\mathrm{Ks}}$ was lower than for the other current blocks at 0% noise and also dropped significantly when noise was added.
In conclusion, these machine learning methods may present a pathway for estimating drug response in adult phenotype cardiac systems, but the data must be sufficiently filtered to remove noise before being used with classifier algorithms.

Spreeuwers, Luuk, Maikel Schils, Raymond Veldhuis, and Una Kelly. "Practical Evaluation of Face Morphing Attack Detection Methods." In Handbook of Digital Face Manipulation and Detection, 351–65. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-87664-7_16.

Abstract:

Face morphing is a technique to combine facial images of two (or more) subjects such that the result resembles both subjects. In a morphing attack, this is exploited by, e.g., applying for a passport with the morphed image. Both subjects who contributed to the morphed image can then travel using this passport. Many state-of-the-art face recognition systems are vulnerable to morphing attacks. Morphing attack detection (MAD) methods are developed to mitigate this threat. MAD methods published in the literature are often trained on a limited number of datasets, or even a single one, in which all morphed faces are created using the same procedure. The resulting MAD methods work well on these specific datasets, with reported detection rates of over 99%, but their performance collapses for face morphs created using other procedures. Often even simple image manipulations, like adding noise or smoothing, cause a serious degradation in the performance of MAD methods. In addition, more advanced tools exist to manipulate face morphs, such as manual retouching, and morphing artifacts can be concealed by printing and scanning a photograph (as used in the passport application process in many countries). Furthermore, datasets for training and testing MAD methods are often created by morphing images of arbitrary subjects, even including male-female morphs and morphs between subjects with different skin colors. Although this may result in a large number of morphed faces, the resulting morphs are often not convincing and certainly do not represent a best-effort attack by a criminal. A far more realistic attack would involve the careful selection of look-alike subjects and the creation of high-quality morphs from their images using careful (manual) post-processing.
In this chapter we therefore argue that for robust evaluation of MAD methods, we require datasets with morphed images created using a large number of different morphing methods, including various ways to conceal the morphing artifacts by, e.g., adding noise, smoothing, printing and scanning, various ways of pre- and post-processing, careful selection of the subjects and multiple facial datasets. We also show the sensitivity of various MAD methods to the mentioned variations and the effect of training MAD methods on multiple datasets.

Frank, David, Keyan Fang, and Patrick Fonti. "Dendrochronology: Fundamentals and Innovations." In Stable Isotopes in Tree Rings, 21–59. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-92698-4_2.

Abstract:

This chapter gives an overview of the long-standing foundations, methods, and concepts of dendrochronology, while also paying attention to a few related paradigm shifts driven by isotope measurements in tree rings. The basics of annual ring formation are reviewed first, followed by structural descriptions of tree rings at the macroscopic-to-microscopic scale, including earlywood and latewood in conifers (gymnosperms) and hardwoods (angiosperms), as well as wood anatomical features. Numerous examples of interdisciplinary applications connected to various tree-ring parameters are provided. With the foundation of tree rings established, the chapter then describes the process and necessity of crossdating, the process by which each and every ring is assigned to a specific year. Methods and terminology related to field sampling are also briefly described. The long-standing paradigm of site selection criteria, well shown to maximize common signals in tree-ring width datasets, is challenged in a brief discussion of newer tree-ring isotope literature demonstrating that robust chronologies with high signal-to-noise ratios can be obtained at non-ecotonal locations. Opportunities for isotope measurements to enable crossdating in otherwise challenging contexts are likewise highlighted. The chapter reviews a conceptual framework for disaggregating tree-ring time series, with special attention to detrending and standardization methods used to mitigate tree-age/size-related noise common to many applications such as dendroclimatic reconstruction. Some of the drivers of long-term trends in tree-ring isotope data, such as the increase in the atmospheric concentration of CO2, age/size/height trends, and climate variation, are presented along with related debates and uncertainties evident in the literature, in order to establish priorities for future investigations.
The development of tree-ring chronologies and related quality control metrics used to assess the common signal and the variance of tree-ring data are described, along with the limitations of correlation-based statistics in determining the robustness of tree-ring datasets, particularly in the low-frequency domain. These statistical methods will gain relevance as tree-ring isotope datasets increasingly approach the sample replication and dataset structures typical of tree-ring width measurements.
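
Two quality-control statistics widely used in dendrochronology to quantify the common signal are the mean inter-series correlation r̄ and, derived from it, the signal-to-noise ratio SNR = N·r̄/(1 − r̄) and the expressed population signal EPS = N·r̄/(1 + (N − 1)·r̄) for a chronology built from N series. The abstract does not state which metrics the chapter uses, so the sketch below is an illustration under that assumption. It uses full-length correlations between equally long, already detrended series, whereas in practice r̄ is often computed over overlapping windows; the function names and data are hypothetical.

```python
from itertools import combinations
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def chronology_signal_stats(series):
    """Return (rbar, SNR, EPS) for N equally long, detrended ring-index series.

    rbar: mean correlation over all pairs of series (the common signal)
    SNR:  N * rbar / (1 - rbar)
    EPS:  N * rbar / (1 + (N - 1) * rbar)
    """
    n = len(series)
    pairs = list(combinations(series, 2))
    rbar = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    snr = n * rbar / (1.0 - rbar)
    eps = n * rbar / (1.0 + (n - 1) * rbar)
    return rbar, snr, eps


# Two hypothetical standardized ring-width index series from the same site
tree_a = [1, 2, 3, 4, 5]
tree_b = [1, 3, 2, 5, 4]
print(chronology_signal_stats([tree_a, tree_b]))  # rbar 0.8, SNR about 8, EPS about 0.89
```

Adding more series (larger N) raises both SNR and EPS for the same r̄, which is why sample replication matters for the robustness of a chronology.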

Mulder, Valentin, and Mathias Humbert. "Differential Privacy." In Trends in Data Protection and Encryption Technologies, 157–61. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-33386-6_27.

Abstract:

Differential privacy is a technology that allows sharing information about a dataset while protecting individual privacy by adding noise to the results. Its effect is as follows: if the effect of making an arbitrary single substitution in the database is small enough, the query result cannot be used to infer much about any single individual. For counting, summation, or average queries over a large, single table of data, differential privacy is ready to be used effectively. One key drawback of differential privacy is that it often trades data accuracy for privacy. Differential privacy could be a great tool to help governments and large companies better comply with the demand for data privacy.
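
The abstract describes differential privacy only at a high level. The textbook way to add noise to a counting query is the Laplace mechanism: because substituting a single record changes a count by at most 1, adding Laplace noise with scale 1/ε makes the released count ε-differentially private, and a smaller ε means more noise, which is the accuracy-for-privacy trade-off mentioned above. A minimal Python sketch, assuming this mechanism; the function names, data, and ε values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two Exponential(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Epsilon-differentially private count using the Laplace mechanism.

    Substituting one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1 / epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
ages = [23, 35, 47, 52, 61, 29, 44, 38, 70, 19]  # hypothetical table column
print(private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))  # 5 plus noise
```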

Passonneau, Rebecca J., Cynthia Rudin, Axinia Radeva, and Zhi An Liu. "Reducing Noise in Labels and Features for a Real World Dataset: Application of NLP Corpus Annotation Methods." In Computational Linguistics and Intelligent Text Processing, 86–97. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-00382-0_7.

Stojanovski, David, Uxio Hermida, Pablo Lamata, Arian Beqiri, and Alberto Gomez. "Echo from Noise: Synthetic Ultrasound Image Generation Using Diffusion Models for Real Image Segmentation." In Simplifying Medical Ultrasound, 34–43. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-44521-7_4.

Abstract:

We propose a novel pipeline for the generation of synthetic ultrasound images via Denoising Diffusion Probabilistic Models (DDPMs) guided by cardiac semantic label maps. We show that these synthetic images can serve as a viable substitute for real data in the training of deep-learning models for ultrasound image analysis tasks such as cardiac segmentation. To demonstrate the effectiveness of this approach, we generated synthetic 2D echocardiograms and trained a neural network for segmenting the left ventricle and left atrium. The performance of the network trained on exclusively synthetic images was evaluated on an unseen dataset of real images and yielded mean Dice scores of 88.6 ± 4.91%, 91.9 ± 4.22% and 85.2 ± 4.83% for left ventricular endocardium, epicardium and left atrial segmentation, respectively. This represents a relative increase of 9.2, 3.3 and 13.9% in Dice scores compared to the previous state-of-the-art. The proposed pipeline has potential for application to a wide range of other tasks across various medical imaging modalities.
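
Dice scores like those reported above measure the overlap between a predicted and a reference segmentation mask, 2|A ∩ B| / (|A| + |B|). A minimal sketch, assuming binary masks flattened to 0/1 sequences and assuming the ± value is a standard deviation over test images (the abstract does not say); the function names and toy masks are hypothetical.

```python
import statistics


def dice(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 sequences.

    Returns 1.0 when both masks are empty (a common, but not universal, convention).
    """
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


def summarize_dice(pairs):
    """Mean and sample standard deviation (in percent) of per-image Dice scores."""
    scores = [100.0 * dice(p, t) for p, t in pairs]
    return statistics.mean(scores), statistics.stdev(scores)


# Two hypothetical (prediction, ground truth) mask pairs
pairs = [([1, 1, 0, 0], [1, 0, 1, 0]),   # half overlap
         ([1, 1, 1, 0], [1, 1, 1, 0])]   # perfect overlap
print(summarize_dice(pairs))
```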

Wang, Zehui, Luca Koroll, Wolfram Höpken, and Matthias Fuchs. "Analysis of Instagram Users’ Movement Pattern by Cluster Analysis and Association Rule Mining." In Information and Communication Technologies in Tourism 2022, 97–109. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94751-4_10.

Abstract:

Understanding the characteristics of tourists' movements is essential for tourism destination management. With advances in information and communication technology, more and more people are willing to upload photos and videos to various social media platforms while traveling. These openly available media data are gaining increasing attention in the field of movement pattern mining as a new data source. In this study, uploaded images and their geographic information within the Lake Constance region, Germany, were collected, and, through clustering analysis, a state-of-the-art k-means algorithm with noise removal was compared with the commonly used DBSCAN on the Instagram dataset. Finally, association rules between popular attractions were mined at the region level and at the city level. Results show that social media data like Instagram constitute a valuable input for analysing tourists' movement patterns to support decision-making and destination management.
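
DBSCAN, the baseline in this comparison, groups points that have at least min_pts neighbours within radius eps and labels points in sparse regions as noise, which is what makes it attractive for noisy geotagged data. The sketch below is a minimal textbook DBSCAN in Python, not the study's implementation; eps, min_pts and the toy coordinates are hypothetical, and real latitude/longitude data would need a projected or great-circle (e.g. haversine) distance instead of plain Euclidean distance.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: return a cluster id (0, 1, ...) per point, or NOISE (-1).

    A point is a core point if at least min_pts points (itself included) lie
    within distance eps; clusters grow from core points, and points reachable
    from no core point stay labelled as noise.
    """
    labels = [None] * len(points)

    def region_query(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may still become a border point of a later cluster
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins cluster, not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```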

Cai, Hua, Qing Xu, and Weilin Shen. "Complex Relative Position Encoding for Improving Joint Extraction of Entities and Relations." In Proceeding of 2021 International Conference on Wireless Communications, Networking and Applications, 644–55. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-2456-9_66.

Abstract:

Relative position encoding (RPE) is important for transformer-based pretrained language models to capture the sequence order of input tokens. Transformer-based models can detect entity pairs along with their relations for the joint extraction of entities and relations. However, prior work suffers from redundant entity pairs or ignores the important inner structure in the process of extracting entities and relations. To address these limitations, in this paper we first use BERT with complex relative position encoding (cRPE) to encode the input text information, and then decompose the joint extraction task into two interrelated subtasks, namely head entity extraction and tail entity relation extraction. Owing to the excellent feature representation and reasonable decomposition strategy, our model can fully capture the semantic interdependence between different steps, as well as reduce noise from irrelevant entity pairs. Experimental results show that our method outperforms previous baselines, achieving an F1 score of 0.935 on the NYT-multi dataset.

Conference papers on the topic "Dataset noise":

Brummer, Benoit, and Christophe De Vleeschouwer. "Natural Image Noise Dataset." In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2019. http://dx.doi.org/10.1109/cvprw.2019.00228.

Hu, Xiaomin, Ying Tang, Xinmu Zhu, Qi Qiu, Tao Zhu, and Chao Liu. "WNRoom Dataset: White Noise Dataset for Detecting the Status of Room." In 2023 IEEE 11th International Conference on Information, Communication and Networks (ICICN). IEEE, 2023. http://dx.doi.org/10.1109/icicn59530.2023.10392954.

Pio, Pedro B., Adriano Rivolli, André C. P. L. F. de Carvalho, and Luís P. F. Garcia. "Noise filter with hyperparameter recommendation: a meta-learning approach." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2023. http://dx.doi.org/10.5753/eniac.2023.234295.

Abstract:

Applying Machine Learning (ML) algorithms to a dataset can be time-consuming. It usually involves not only selecting and fine-tuning the algorithm but also other steps, such as data preprocessing. To reduce this time, all or part of this process has been automated by Automated ML (AutoML) techniques, which can include Bayesian Optimization, Genetic Programming, and Meta-Learning. However, despite often being a necessary stage, preprocessing is commonly not well handled in AutoML tools. In this work, we propose and experimentally investigate the use of meta-learning to recommend noise detection algorithms and the values for their hyperparameters. The proposed approach produces a ranking of the best noise filters for a given dataset, reducing the development cost of ML-based solutions and improving their predictive performance. To validate the process, we generated 10,740 noisy datasets, which we describe using 97 meta-features. For each dataset, we applied 8 noise filters, a number that increased to 27 when we added variations of hyperparameter values. Next, we applied 4 ML algorithms to this data and created a performance ranking, which we used as a meta-target to induce 3 meta-regressors. We compared these 3 meta-regressors and the results with and without hyperparameters for the noise filters. According to the experimental results, the introduction of hyperparameter recommendation resulted in a higher gain in the F1-Score performance metric. However, it came at the cost of lower accuracy in the Top-K ranking evaluation.
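
The abstract does not name its 8 noise filters. As an example of the kind of filter such a meta-learner ranks, below is a minimal sketch of Wilson's Edited Nearest Neighbours (ENN). ENN flags a sample as noisy when its label disagrees with the majority label of its k nearest neighbours. The neighbourhood size `k` is exactly the sort of hyperparameter the paper recommends values for. The choice of ENN, the default k = 3 and the toy data are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def enn_filter(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3) -> List[int]:
    """Edited Nearest Neighbours: return indices of samples whose label differs
    from the majority label of their k nearest neighbours (Euclidean distance)."""
    noisy = []
    for i, xi in enumerate(X):
        # (distance, index) to every other sample, closest first
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority != y[i]:
            noisy.append(i)
    return noisy
```

Removing the flagged indices before training yields the "filtered" dataset whose downstream F1 score a meta-learner would use to rank this filter (with this `k`) against the alternatives.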

Xiang, Cheng, Li Ke, and Yan Jun. "Analyzing Dataset with Noise in Geometric Fashion." In Second International Conference on Information and Computing Science, ICIC 2009. IEEE, 2009. http://dx.doi.org/10.1109/icic.2009.137.

Price, G. "Susceptibility to intense impulse noise: evidence from the Albuquerque dataset." In 159th Meeting Acoustical Society of America/NOISE-CON 2010. ASA, 2010. http://dx.doi.org/10.1121/1.3478337.

Hsieh, Jiang. "Generation of training dataset for deep-learning noise reduction." In Physics of Medical Imaging, edited by Rebecca Fahrig, John M. Sabol, and Lifeng Yu. SPIE, 2023. http://dx.doi.org/10.1117/12.2647904.

Rücker, Susanna, and Alan Akbik. "CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.emnlp-main.533.

Lew, C. L., C. MacBeth, and A. Elsheikh. "Deep Learning Application for Inverting Petrophysical Properties Directly from Seismic." In ADIPEC. SPE, 2023. http://dx.doi.org/10.2118/216433-ms.

Abstract:

In this study, we introduce a method to invert directly and simultaneously for porosity, Vclay and hydrocarbon saturation (Shc) from pre-stack seismic data using a deep learning approach. We implemented an L1 norm in the loss function for Shc estimation, added noise to the synthetic seismic dataset used for training, and estimated uncertainties in the inversion results by training multiple network models. A UNet architecture (with ResNet-18 as the encoder) is used because of its ability to preserve spatial resolution. The inputs to the network are the angle stacks, whereas the outputs are the petrophysical properties. We implemented mean squared error and the L1 norm as the loss functions during training. The L1 norm is the mean absolute value of the predicted hydrocarbon saturation, which can help promote sparsity. The network learns on a synthetic dataset. We use facies-based geostatistical simulation to generate 1D synthetic petrophysical logs. The petrophysical properties are then linked to elastic properties through a rock physics model (RPM), followed by computation of reflectivities using the full Zoeppritz equations at five different groups of incidence angles (0° to 55°). The traces in each group are convolved with the source wavelet prior to stacking the synthetic seismograms. To increase the variability of possible scenarios, we vary the spherical variogram ranges (8, 10 and 12 ms), use four different types of suitable RPM, apply oil and gas cases for the hydrocarbon fluid types, and convolve with nine different sets of angle-dependent source wavelets. Two synthetic datasets are prepared, Dataset 1 (ideal noiseless case) and Dataset 2 (noise added to the angle stacks), together with a field dataset. The first (MLT1) and second (MLT2) machine learning models are trained on a sub-dataset of Dataset 1 and Dataset 2, respectively.
On the field dataset, MLT2 shows better prediction performance than MLT1, with average correlation coefficients of 0.68 (porosity), 0.74 (Vclay) and 0.67 (Shc). The better results from MLT2 can be attributed to the nature of measured seismic data, which contain noise that MLT2 has learnt. For uncertainty estimation, the network (ML3) is trained 20 times on randomly selected sub-datasets of Dataset 2 using the Monte Carlo dropout technique. The uncertainty is estimated by calculating the standard deviation of the solutions provided by ML3 when applied to the field data. Uncertainty estimation allows the stability of the solutions to be quantified when the training dataset is varied.
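
Three numerical ingredients of the abstract can be sketched in a few lines:

- adding noise to a synthetic trace;
- the L1 term, defined as the mean absolute value of predicted Shc;
- the uncertainty, defined as the standard deviation across repeated predictions.

The abstract does not state the noise type or level. White Gaussian noise at a chosen signal-to-noise ratio in dB is therefore an assumption, and all function names are illustrative.

```python
import math
import random
import statistics
from typing import List, Sequence


def add_noise(trace: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise so that signal power / noise power = 10**(snr_db / 10)."""
    signal_power = sum(s * s for s in trace) / len(trace)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in trace]


def l1_penalty(predicted_shc: Sequence[float]) -> float:
    """Mean absolute value of the predicted saturation; penalising it favours sparse Shc."""
    return sum(abs(v) for v in predicted_shc) / len(predicted_shc)


def prediction_uncertainty(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Per-sample sample standard deviation across repeated predictions
    (e.g. 20 Monte Carlo dropout runs); needs at least two runs."""
    return [statistics.stdev(values) for values in zip(*predictions)]
```

Each inner sequence passed to `prediction_uncertainty` is one run's prediction for the same field samples, so `zip(*predictions)` groups the runs sample by sample.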

Li, Ang, Qiuhong Ke, Xingjun Ma, Haiqin Weng, Zhiyuan Zong, Feng Xue, and Rui Zhang. "Noise Doesn't Lie: Towards Universal Detection of Deep Inpainting." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/109.

Abstract:

Deep image inpainting aims to restore damaged or missing regions in an image with realistic contents. While it has a wide range of applications such as object removal and image recovery, deep inpainting also carries the risk of being misused for image forgery. A promising countermeasure against such forgeries is deep inpainting detection, which aims to locate the inpainted regions in an image. In this paper, we make the first attempt towards universal detection of deep inpainting, where the detection network can generalize well when detecting different deep inpainting methods. To this end, we first propose a novel data generation approach that produces a universal training dataset for training universal detectors. The dataset imitates the noise discrepancies that exist between real and inpainted image contents. We then design a Noise-Image Cross-fusion Network (NIX-Net) to effectively exploit the discriminative information contained in both the images and their noise patterns. We empirically show, on multiple benchmark datasets, that our approach outperforms existing detection methods by a large margin and generalizes well to unseen deep inpainting techniques. Our universal training dataset can also significantly boost the generalizability of existing detection methods.
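
The "noise pattern" of an image is commonly approximated by a high-pass residual, meaning the image minus a smoothed copy of itself. The abstract does not describe how NIX-Net extracts its noise input. The 3×3 box-filter residual below is therefore only a stand-in that illustrates the idea on a grayscale image stored as nested lists.

```python
from typing import List

Image = List[List[float]]


def noise_residual(img: Image) -> Image:
    """High-pass residual: each pixel minus the mean of its 3x3 neighbourhood
    (the neighbourhood is truncated at the image border)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            out[r][c] = img[r][c] - sum(window) / len(window)
    return out
```

Smooth regions produce near-zero residuals, while fine-grained texture and sensor noise survive. This is why residuals from real and generated regions can differ even when the images look alike.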

Jayawardena, Lasal, and Prasan Yapa. "Parafusion: A Large-Scale LLM-Driven English Paraphrase Dataset Infused with High-Quality Lexical and Syntactic Diversity." In 5th International Conference on Artificial Intelligence and Big Data. Academy & Industry Research Collaboration Center, 2024. http://dx.doi.org/10.5121/csit.2024.140418.

Abstract:

Paraphrase generation is a pivotal task in natural language processing (NLP). Existing datasets in the domain lack syntactic and lexical diversity, resulting in paraphrases that closely resemble the source sentences. Moreover, these datasets often contain hate speech and noise, and may unintentionally include non-English language sentences. This research introduces ParaFusion, a large-scale, high-quality English paraphrase dataset developed using Large Language Models (LLM) to address these challenges. ParaFusion augments existing datasets with high-quality data, significantly enhancing both lexical and syntactic diversity while maintaining close semantic similarity. It also mitigates the presence of hate speech and reduces noise, ensuring a cleaner and more focused English dataset. Results show that ParaFusion offers at least a 25% improvement in both syntactic and lexical diversity, measured across several metrics for each data source. The paper also aims to set a gold standard for paraphrase evaluation as it contains one of the most comprehensive evaluation strategies to date. The results underscore the potential of ParaFusion as a valuable resource for improving NLP applications.
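
The abstract reports lexical diversity "across several metrics" without naming them. One simple, commonly used proxy is the word-set overlap between a source sentence and its paraphrase. The sketch below computes 1 minus the Jaccard similarity of the two word sets. Both the metric and the crude tokenizer are assumptions for illustration.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def lexical_diversity(source: str, paraphrase: str) -> float:
    """1 - Jaccard similarity of word sets: 0 = identical vocabulary, 1 = no shared words."""
    a, b = tokens(source), tokens(paraphrase)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

A paraphrase that "closely resembles the source", as the abstract puts it, scores near 0. Syntactic diversity would need a separate measure, for example one based on parse trees.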

Reports on the topic "Dataset noise":

Farahbod, A. M., and J. F. Cassidy. An overview of seismic attenuation in the Eastern Canadian Arctic and the Hudson Bay Complex, Manitoba, Newfoundland and Labrador, Nunavut, Ontario, and Quebec. Natural Resources Canada/CMSS/Information Management, 2022. http://dx.doi.org/10.4095/330396.

Abstract:

In this study we investigated coda-wave attenuation (QC) in the eastern Canadian Arctic in Nunavut and the Hudson Bay complex, including portions of northern Manitoba, Ontario, Quebec and Labrador. We used earthquake recordings from 15 broadband and 3 short-period seismograph stations of the Canadian National Seismic Network (CNSN) and 29 broadband stations of the POLARIS network across the region. Our dataset consists of 637 earthquakes recorded between 1985 and 2021 with magnitudes ranging from 1.3 to 6.1, depths from 0 to 20 km and epicentral distances of 5 to 100 km. This gives a total of 246 high signal-to-noise (S/N) traces (S/N ≥ 5.0) useful for QC calculation (with a maximum ellipse parameter, a2, of 100) across the region. Coda windows were selected to start at tc = 2tS (twice the travel time of the direct S wave) and were filtered at center frequencies of 2, 4, 8, 12 and 16 Hz. Our study reveals a consistent pattern. In the northern section of the study area, the highest Q0 values (e.g., Q0 of 110 and 112) are at stations POIN and RES, respectively, which are located in the older Archean province. The lowest Q0 values that we find (e.g., Q0 of 55 and 61) are at stations AKVQ and IVKQ, respectively, located in northern Quebec. Smaller Q0 values for stations in the south are explained by the younger age of the rocks and proximity to the main fault systems. An average over all the data gives the relationship QC = 82f^1.08 for the frequency band of 2 to 16 Hz for the entire region.
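
A relationship of the form QC = Q0·f^n is linear in log-log space: log10 QC = log10 Q0 + n·log10 f. Q0 and n can therefore be estimated by ordinary least squares on the logarithms of the per-frequency QC estimates. The reports do not describe their fitting procedure, so this is an assumed, commonly used approach rather than the authors' method.

```python
import math
from typing import Sequence, Tuple


def fit_coda_q(freqs: Sequence[float], qc: Sequence[float]) -> Tuple[float, float]:
    """Fit Qc = Q0 * f**n by least squares on log10(Qc) = log10(Q0) + n * log10(f).
    Returns (Q0, n)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(q) for q in qc]
    m = len(xs)
    x_mean, y_mean = sum(xs) / m, sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope = frequency exponent
    q0 = 10 ** (y_mean - n * x_mean)   # intercept = Q at 1 Hz
    return q0, n
```

Feeding in noise-free values generated from QC = 82f^1.08 at the five center frequencies (2, 4, 8, 12 and 16 Hz) recovers Q0 = 82 and n = 1.08. Real per-station estimates scatter around the fitted line.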

Farahbod, A. M., and J. F. Cassidy. An overview of seismic attenuation in the Northern Appalachians Seismic Zone, New Brunswick and Nova Scotia. Natural Resources Canada/CMSS/Information Management, 2022. http://dx.doi.org/10.4095/329702.

Abstract:

In this study we investigated coda-wave attenuation (QC) in the northern Appalachian region of eastern Canada, in the two provinces of New Brunswick and Nova Scotia. We used earthquake recordings from 8 broadband and 2 short-period seismograph stations of the Canadian National Seismograph Network (CNSN) across the region. Our dataset consists of 476 earthquakes recorded between 1983 and 2021 with magnitudes ranging from 1.5 to 4.1, depths from 0 to 20 km (with the vast majority being <10 km) and epicentral distances of 5 to 100 km. This gives a total of 261 high signal-to-noise (S/N) traces (S/N ≥ 5.0) useful for QC calculation (with a maximum ellipse parameter, a2, of 100) across the region. Coda windows were selected to start at tc = 2tS (twice the travel time of the direct S wave) and were filtered at center frequencies of 2, 4, 8, 12 and 16 Hz. Our study reveals a consistent pattern. In northern New Brunswick, the lowest Q0 values (e.g., Q0 of 61) are at station KLN, which is the closest station to the epicenter of the 1982 Miramichi earthquake (M 5.8). The highest Q0 values that we find (e.g., Q0 of 178) are at station GGN, located in southern New Brunswick. Smaller Q0 values for stations in the north (closer to the Charlevoix-Kamouraska seismic zone or the Miramichi source area) are explained by Jin and Aki's (1988) finding that Q0 is lower in the vicinity of large earthquakes. An average over all the data gives the relationship QC = 99f^0.96 for the frequency band of 2 to 16 Hz for the entire region.
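
The trace-selection rules quoted in these reports can be expressed as a small filter: S/N ≥ 5.0, epicentral distance 5 to 100 km, and a coda window starting at tc = 2tS. The `Trace` record and its field names are hypothetical. Treating the distance range as a filter is also an assumption, since the abstract gives it as a property of the dataset. The ellipse-parameter limit is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    station: str
    snr: float            # signal-to-noise ratio of the coda window
    epicentral_km: float  # epicentral distance in km
    ts_s: float           # travel time of the direct S wave, in seconds


def usable_for_qc(tr: Trace, min_snr: float = 5.0,
                  min_km: float = 5.0, max_km: float = 100.0) -> bool:
    """Keep only traces with S/N >= min_snr inside the epicentral distance range."""
    return tr.snr >= min_snr and min_km <= tr.epicentral_km <= max_km


def coda_start(tr: Trace) -> float:
    """Start time of the coda window, tc = 2 * tS (seconds after origin time)."""
    return 2.0 * tr.ts_s
```

Filtering each selected window at the 2, 4, 8, 12 and 16 Hz center frequencies would follow as a separate step.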

Farahbod, A. M., and J. F. Cassidy. An overview of seismic attenuation in the Charlevoix Seismic Zone, southern Quebec. Natural Resources Canada/CMSS/Information Management, 2023. http://dx.doi.org/10.4095/332158.

Abstract:

We investigate seismic attenuation characteristics of the Charlevoix Seismic Zone. This zone is located ~100 km downstream from Quebec City and is the most seismically active region of eastern Canada. We used earthquake recordings from 8 seismograph stations of the Canadian National Seismic Network (CNSN) across the region. Our dataset consists of 584 earthquakes recorded between 1992 and 2022 with magnitudes ranging from 2.0 to 5.4, depths from 0 to 30 km and epicentral distances of 5 to 100 km. This gives a total of 1490 high signal-to-noise (S/N) traces (S/N ≥ 5.0) useful for QC calculation (with a maximum ellipse parameter, a2, of 100) across the region. Coda windows were selected to start at tc = 2tS (twice the travel time of the direct S wave) and were filtered at center frequencies of 2, 4, 8, 12 and 16 Hz. Our study reveals a consistent pattern. We find that the highest Q0 (Q at 1 Hz) values are at station A11 (e.g., Q0 of 109), which, excluding the new station CACQ, is the farthest station from the 1663 M~7 earthquake (D = 40 km). The lowest Q0 values that we find are at station A16 (e.g., Q0 of 72), which is the second-closest station to the epicenter of the 1663 earthquake (D = 16 km) after station A61 (D = 10 km). Also, we find the lowest overall average Q0 values at station A16 (e.g., Q0 of 72). In general, Q0 is lower in the vicinity of large earthquakes (Jin & Aki, 1988). Therefore, given the high uncertainty associated with historic events, the low Q0 values at station A16 may suggest that the 1663 earthquake is located slightly southeast of the catalog epicenter. An average over all the data gives the relationship QC = 81f^1.06 for the frequency band of 2 to 16 Hz for the entire region.

Farahbod, A., and J. F. Cassidy. Spatial and temporal variations in seismic coda Q attenuation in the lower St. Lawrence region, southeastern Quebec. Natural Resources Canada/CMSS/Information Management, 2023. http://dx.doi.org/10.4095/332027.

Abstract:

We investigate seismic attenuation characteristics of the Lower St. Lawrence seismic zone in southeastern Quebec. This zone is located ~400 km downstream from Quebec City, between the Quebec North Shore and the Lower St. Lawrence. We used earthquake recordings from 5 broadband and 5 short-period seismograph stations of the Canadian National Seismic Network (CNSN) across the region. Our dataset consists of 847 earthquakes recorded between 1985 and 2022 with magnitudes ranging from 2.0 to 5.1, depths from 0 to 30 km and epicentral distances of 5 to 100 km. This gives a total of 446 high signal-to-noise (S/N) traces (S/N ≥ 5.0) useful for QC calculation (with a maximum ellipse parameter, a2, of 100) across the region. Coda windows were selected to start at tc = 2tS (twice the travel time of the direct S wave) and were filtered at center frequencies of 2, 4, 8, 12 and 16 Hz. Our study reveals a consistent pattern. We find that the lowest overall average Q0 (Q at 1 Hz) values are at the three stations (GSQ, ICQ and SMQ) within 100 km of a moderate mN 5.1 earthquake in 1999 (e.g., Q0 of 81, 88 and 80, respectively). We determined temporal variations in attenuation following the 1999 earthquake. The overall average of Q0 decreased from 87 (before the mainshock) to 77 (GSQ, D = 96 km), from 92 to 85 (ICQ, D = 69 km) and from 88 to 82 (SMQ, D = 73 km). These results are in agreement with global studies that show a decrease in Q0 following a significant earthquake (e.g., M > 5), likely the result of increased fracturing and fluids in the epicentral region. An average over all the data gives the relationship QC = 86f^1.07 for the frequency band of 2 to 16 Hz for the entire region.
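
To put the reported before/after Q0 averages on a common scale, the short computation below expresses each station's change as a percentage of its pre-mainshock value. The numbers are copied from the abstract, and the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# Station: (average Q0 before the 1999 mainshock, average Q0 after), from the abstract.
Q0_BEFORE_AFTER: Dict[str, Tuple[float, float]] = {
    "GSQ": (87.0, 77.0),
    "ICQ": (92.0, 85.0),
    "SMQ": (88.0, 82.0),
}


def percent_change(before: float, after: float) -> float:
    """Relative change in percent; negative values mean Q0 decreased."""
    return 100.0 * (after - before) / before


changes = {st: percent_change(b, a) for st, (b, a) in Q0_BEFORE_AFTER.items()}
```

Rounded to one decimal, this gives decreases of about 11.5% (GSQ), 7.6% (ICQ) and 6.8% (SMQ). The largest relative drop is at GSQ, even though GSQ is the farthest of the three stations (D = 96 km).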

Тарасова, Олена Юріївна, and Ірина Сергіївна Мінтій. Web application for facial wrinkle recognition. Кривий Ріг, КДПУ, 2022. http://dx.doi.org/10.31812/123456789/7012.

Abstract:

Facial recognition technology has been named one of the main trends of recent years. It has a wide range of applications, such as access control, biometrics, video surveillance and many other interactive human-machine systems. Facial landmarks can be described as key characteristics of the human face; commonly used landmarks are, for example, the eyes, the nose or the mouth corners. Analyzing these key points is useful for a variety of computer vision use cases, including biometrics, face tracking and emotion detection. Different methods produce different facial landmarks: some use only basic facial landmarks, while others capture more detail. We use the 68-point facial landmark markup, which is a common format for many datasets. Cloud computing creates all the necessary conditions for the successful implementation of even the most complex tasks. We created a web application using the Django framework, the Python language, and the OpenCV and Dlib libraries to recognize faces in an image. The purpose of our work is to create a software system that recognizes faces in a photo and identifies wrinkles on the face. We programmed an algorithm that determines the presence and location of various types of wrinkles and their geometric position on the face.
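
The 68-point markup mentioned in the abstract (the iBUG 300-W layout, which dlib's 68-point shape predictor also produces) has fixed 0-based index ranges per facial region. The sketch below selects a region's points and computes their bounding box. A wrinkle detector might use such a box to crop candidate areas, for example around the eyes. That use, the region names and the helper functions are assumptions rather than the authors' code.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]

# 0-based index ranges of the 68-point landmark layout.
LANDMARK_REGIONS: Dict[str, range] = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def region_points(landmarks: Sequence[Point], region: str) -> List[Point]:
    """Select the landmarks that belong to one facial region."""
    if len(landmarks) != 68:
        raise ValueError(f"expected 68 landmarks, got {len(landmarks)}")
    return [landmarks[i] for i in LANDMARK_REGIONS[region]]


def bounding_box(points: Sequence[Point]) -> Tuple[int, int, int, int]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a set of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)
```

The seven ranges partition all 68 indices, so every landmark belongs to exactly one region.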

Anderson, Gerald L., and Kalman Peleg. Precision Cropping by Remotely Sensed Prototype Plots and Calibration in the Complex Domain. United States Department of Agriculture, December 2002. http://dx.doi.org/10.32747/2002.7585193.bard.

Abstract:

This research report describes a methodology whereby multispectral and hyperspectral imagery from remote sensing is used to derive predicted field maps of selected plant growth attributes required for precision cropping. A major task in precision cropping is to establish areas of the field that differ from the rest of the field and share a common characteristic. Yield distribution maps can be prepared by yield monitors, which are available for some harvester types. Other field attributes of interest in precision cropping, e.g. soil properties, leaf nitrate and biomass, are obtained by manually sampling the field in a grid pattern. Maps of the various field attributes are then prepared from these samples by "Inverse Distance" interpolation or by kriging. An improved interpolation method was developed, based on minimizing the overall curvature of the resulting map. Such maps are the ground-truth reference used for training the algorithm that generates the predicted field maps from remote sensing imagery. Both the reference and the predicted maps are stratified into "Prototype Plots", e.g. 15×15 blocks of 2 m pixels, giving a block size of 30×30 m. This averaging reduces the datasets to a manageable size and significantly improves the typically poor repeatability of remote sensing imaging systems. In the first two years of the project we used the Normalized Difference Vegetation Index (NDVI) to generate predicted yield maps of sugar beets and corn. The NDVI was computed from image cubes of three spectral bands, generated by an optically filtered three-camera video imaging system. A two-dimensional FFT-based regression model Y = f(X) was used, wherein Y was the reference map and X = NDVI was the predictor. Before the Fast Fourier Transform (FFT) regression with the "Phase Lock" option is applied, the method applies the "Wavelet Based", "Pixel Block" and "Image Rotation" transforms to the reference and remote images.
A complex-domain map Yfft is derived by least-squares minimization between the amplitude matrices of X and Y, via the 2D FFT. For one-time predictions, the phase matrix of Y is combined with the amplitude matrix of Yfft to form an improved predicted map Yplock. Usually, the residuals of Yplock versus Y are about half of those of Yfft versus Y. For long-term predictions, the phase matrix of a "field mask" is combined with the amplitude matrices of the reference image Y and the predicted image Yfft. The field mask is a binary image of a pre-selected region of interest in X and Y. The resulting maps Ypref and Ypred are modified versions of Y and Yfft, respectively. The residuals of Ypred versus Ypref are even lower than the residuals of Yplock versus Y. The maps Ypref and Ypred represent a close consensus of two independent imaging methods which "view" the same target. In the last two years of the project our remote sensing capability was expanded by the addition of a CASI II airborne hyperspectral imaging system and an ASD hyperspectral radiometer. Unfortunately, the cross-noise and poor repeatability problems we had in multispectral imaging were exacerbated in hyperspectral imaging. We have been able to overcome this problem by over-flying each field twice in rapid succession and developing the Repeatability Index (RI). The RI quantifies the repeatability of each spectral band in the hyperspectral image cube. It is thus possible to include the bands of higher repeatability in the prediction model and exclude the bands of low repeatability. Further segregation of high- and low-repeatability bands takes place in the prediction model algorithm, which is based on a combination of a "Genetic Algorithm" and "Partial Least Squares" (PLS-GA).
In summary, a modus operandi was developed for deriving important plant growth attribute maps (yield, leaf nitrate, biomass and sugar percentage in beets) from remote sensing imagery, with sufficient accuracy for precision cropping applications. This achievement is remarkable, given the inherently high cross-noise between the reference and remote imagery as well as the highly non-repeatable nature of remote sensing systems. The above methodologies may be readily adopted by commercial companies that specialize in providing remotely sensed data to farmers.
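
The "Phase Lock" step combines the amplitude spectrum of the predicted map Yfft with the phase spectrum of the reference map Y and transforms the result back. The sketch below shows this idea in 1-D pure Python with a naive O(n²) DFT. The report itself works on 2-D images with a 2-D FFT, and the function names are illustrative assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2)), adequate for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_lock(amplitude_source: Sequence[float], phase_source: Sequence[float]) -> List[float]:
    """Keep |DFT| of amplitude_source (e.g. Yfft) and the phase of phase_source (e.g. Y)."""
    A = dft(amplitude_source)
    P = dft(phase_source)
    combined = [abs(a) * cmath.exp(1j * cmath.phase(p)) for a, p in zip(A, P)]
    # For real inputs the combined spectrum is conjugate-symmetric, so the result is real.
    return [v.real for v in idft(combined)]
```

When both inputs are the same signal, the output reproduces it. When the amplitude source is a scaled copy, the output is that scaled copy. Spatial structure is therefore carried by the phase, and magnitudes by the amplitude.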

Dia Internacional da Conscientização sobre o Ruído — INAD Brasil 2022 [International Noise Awareness Day: INAD Brasil 2022]. Sociedade Brasileira de Acústica (Sobrac), December 2022. http://dx.doi.org/10.55753/aev.v37e54.203.

Abstract:

INAD Brasil is the Brazilian branch of the International Noise Awareness Day (INAD) campaign, which aims to raise awareness of the impacts of noise on the health and daily life of the population. Every year, INAD Brasil adopts a theme and a slogan to highlight the importance of addressing the impacts of noise on everyday life and on the reality of our country. Noise pollution is a problem that affects the entire planet, causing harm to humanity and to the environment. This article describes the development of the 2022 Brazilian campaign. It begins with a brief presentation of INAD and its work in Brazil over the years. This is followed by a description of the theme, the slogan and the development of the materials, as well as the activities carried out for INAD, which was celebrated on 27 April 2022. There is also a historical survey of all the dates on which INAD has taken place. The text closes by announcing the organization of INAD 2023.
