Dissertations / Theses on the topic 'Data quality and noise'

Consult the top 50 dissertations / theses for your research on the topic 'Data quality and noise.'

1

Alkharboush, Nawaf Abdullah H. "A data mining approach to improve the automated quality of data." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/65641/1/Nawaf%20Abdullah%20H_Alkharboush_Thesis.pdf.

Abstract:
This thesis describes the development of a robust and novel prototype to address data quality problems that relate to the dimension of outlier data. It thoroughly investigates the associated problems with regard to detecting, assessing and determining the severity of outlier data, and proposes granule-mining-based alternative techniques to significantly improve the effectiveness of mining and assessing outlier data.
2

Lie, Chin Cheong Patrick. "Iterative algorithms for fast, signal-to-noise ratio insensitive image restoration." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63767.

3

Al, Jurdi Wissam. "Towards next generation recommender systems through generic data quality." Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCD005.

Abstract:
Recommender systems are essential for filtering online information and delivering personalized content, thereby reducing the effort users need to find relevant information. They can be content-based, collaborative, or hybrid, each with a unique recommendation approach. These systems are crucial in various fields, including e-commerce, where they help customers find pertinent products, enhancing user experience and increasing sales. A significant aspect of these systems is the concept of unexpectedness, which involves discovering new and surprising items. This feature, while improving user engagement and experience, is complex and subjective, requiring a deep understanding of serendipitous recommendations for its measurement and optimization. Natural noise, an unpredictable data variation, can influence serendipity in recommender systems. It can introduce diversity and unexpectedness in recommendations, leading to pleasant surprises. However, it can also reduce recommendation relevance, causing user frustration. Therefore, it is crucial to design systems that balance natural noise and serendipity. Inconsistent user information due to natural noise can negatively impact recommender systems, leading to lower-quality recommendations. Current evaluation methods often overlook critical user-oriented factors, making noise detection a challenge. To provide powerful recommendations, it’s important to consider diverse user profiles, eliminate noise in datasets, and effectively present users with relevant content from vast data catalogs. This thesis emphasizes the role of serendipity in enhancing recommender systems and preventing filter bubbles. It proposes serendipity-aware techniques to manage noise, identifies algorithm flaws, suggests a user-centric evaluation method, and proposes a community-based architecture for improved performance. It highlights the need for a system that balances serendipity and considers natural noise and other performance factors. The objectives, experiments, and tests aim to refine recommender systems and offer a versatile assessment approach
4

Sorensen, Thomas J. "Inverse Scattering Image Quality with Noisy Forward Data." Diss., Brigham Young University, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2541.pdf.

5

Demiroglu, Cenk. "Multisensor Segmentation-based Noise Suppression for Intelligibility Improvement in MELP Coders." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/10455.

Abstract:
This thesis investigates the use of an auxiliary sensor, the GEMS device, for improving the quality of noisy speech and designing noise preprocessors for MELP speech coders. Use of auxiliary sensors for noise-robust ASR applications is also investigated to develop speech enhancement algorithms that use acoustic-phonetic properties of the speech signal. A Bayesian risk minimization framework is developed that can incorporate the acoustic-phonetic properties of speech sounds and knowledge of human auditory perception into the speech enhancement framework. Two noise suppression systems are presented using the ideas developed in the mathematical framework. In the first system, an aharmonic comb filter is proposed for voiced speech where low-energy frequencies are severely suppressed while high-energy frequencies are suppressed mildly. The proposed system outperformed an MMSE estimator in subjective listening tests and a DRT intelligibility test for MELP-coded noisy speech. The effect of aharmonic comb filtering on the linear predictive coding (LPC) parameters is analyzed using a missing data approach. Suppressing the low-energy frequencies without any modification of the high-energy frequencies is shown to improve the LPC spectrum using the Itakura-Saito distance measure. The second system combines the aharmonic comb filter with the acoustic-phonetic properties of speech to improve the intelligibility of the MELP-coded noisy speech. The noisy speech signal is segmented into broad-level sound classes using a multi-sensor automatic segmentation/classification tool, and each sound class is enhanced differently based on its acoustic-phonetic properties. The proposed system is shown to outperform both the MELPe noise preprocessor and the aharmonic comb filter in intelligibility tests when used in concatenation with the MELP coder. Since the second noise suppression system uses an automatic segmentation/classification algorithm, exploiting the GEMS signal in an automatic segmentation/classification task is also addressed using an ASR approach. Current ASR engines can segment and classify speech utterances in a single pass; however, they are sensitive to ambient noise. Features that are extracted from the GEMS signal can be fused with the noisy MFCC features to improve the noise-robustness of the ASR system. In the first phase, a voicing feature is extracted from the clean speech signal and fused with the MFCC features. The actual GEMS signal could not be used in this phase because of insufficient sensor data to train the ASR system. Tests are done using the Aurora2 noisy digits database. The speech-based voicing feature is found to be effective at around 10 dB but, below 10 dB, the effectiveness rapidly drops with decreasing SNR because of the severe distortions in the speech-based features at these SNRs. Hence, a novel system is proposed that treats the MFCC features in a speech frame as missing data if the global SNR is below 10 dB and the speech frame is unvoiced. If the global SNR is above 10 dB or the speech frame is voiced, both the MFCC features and the voicing feature are used. The proposed system is shown to outperform some of the popular noise-robust techniques at all SNRs. In the second phase, a new isolated monosyllable database is prepared that contains both speech and GEMS data. ASR experiments conducted for clean speech showed that the GEMS-based feature, when fused with the MFCC features, decreases the performance.
The reason for this unexpected result is found to be partly related to some of the GEMS data that is severely noisy. The non-acoustic sensor noise exists in all GEMS data but the severe noise happens rarely. A missing data technique is proposed to alleviate the effects of severely noisy sensor data. The GEMS-based feature is treated as missing data when it is detected to be severely noisy. The combined features are shown to outperform the MFCC features for clean speech when the missing data technique is applied.
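For reference, the Itakura-Saito distance used above to compare LPC spectra can be written down compactly. The sketch below is a generic discrete Itakura-Saito divergence between two power spectra; the FFT-based usage example and all variable names are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def itakura_saito(p_ref, p_test, eps=1e-12):
    """Discrete Itakura-Saito divergence between two power spectra.

    p_ref, p_test: non-negative arrays on the same frequency grid.
    Averages P_ref/P_test - log(P_ref/P_test) - 1 over all bins;
    the result is zero only when the spectra are identical.
    """
    p_ref = np.asarray(p_ref, dtype=float) + eps
    p_test = np.asarray(p_test, dtype=float) + eps
    ratio = p_ref / p_test
    return float(np.mean(ratio - np.log(ratio) - 1.0))

# Illustrative usage: compare the power spectrum of a clean frame with a
# processed (e.g. comb-filtered) frame of the same length.
rng = np.random.default_rng(0)
clean = rng.standard_normal(256)
processed = clean + 0.1 * rng.standard_normal(256)
P_clean = np.abs(np.fft.rfft(clean)) ** 2
P_proc = np.abs(np.fft.rfft(processed)) ** 2
print(itakura_saito(P_clean, P_proc))
```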
6

Correia, Fábio Gonçalves. "Quality control of ultra high resolution seismic data acquisition in real-time." Master's thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/22007.

Abstract:
Master's degree in Geological Engineering
The acquisition of larger volumes of seismic data during a survey necessarily requires more time for quality control (QC). Despite this, QC cannot be extended due to operational time constraints and must be done faster, compromising its efficiency and consequently the data quality. The alternative, to allocate more people and resources for QC to improve efficiency, leads to prohibitively higher costs and larger vessel requirements. Therefore, traditional QC methods for large data volumes require extended standby times after data acquisition, before the vessel can be demobilized, increasing the cost of the survey. The solution tested here consisted of the development of an efficient Real-Time QC by testing Spectral Comparison and the Signal to Noise Ratio Attribute (tools developed for the SPW seismic processing software). The detection and identification of bad data by the automatic QC tools was carried out and the parameters were adapted to include at least all manual QC flags. The detection and identification of common problems during acquisition, such as strong wave motion and its direction, strong propeller's wash, trouser's effect and malfunction of sources or receivers, were also carried out. Early detection of these problems allows them to be solved soon enough not to compromise the data acquisition. Several problem reports from beta tests of SPW were transmitted to the Parallel Geoscience team, to be used as a reference to update the software and fulfil the Real-Time QC requirements. These updates brought the correct mapping of data headers in files; optimization of data analysis speed, along with multi-thread processing debugging, to assure it will run fast enough to avoid delays between acquisition and Real-Time QC; software support for reading a variable number of source signatures; optimization of graphic memory limits; and debugging of anomalous spectral semblance values. Some updates resulted from a data acquisition simulation that was set up in the office to make adjustments to be tested later on an upcoming survey. The parameterization of these tools was finally achieved, assuring the correct detection of all major issues found during the survey, which will eventually lead to a reduction of the time needed for the QC stage on board, as well as to an improvement in its efficiency.
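For orientation only, a per-trace signal-to-noise ratio attribute of the kind described above can be sketched as an RMS ratio between a signal window and a noise window. The window positions, the dB formulation, and the 6 dB threshold below are assumptions for illustration and are not taken from the SPW implementation.

```python
import numpy as np

def trace_snr_db(trace, signal_window, noise_window):
    """Crude per-trace SNR attribute in dB.

    trace: 1-D array of samples for one seismic trace.
    signal_window, noise_window: (start, stop) sample indices delimiting a
    window expected to contain reflections and a window expected to contain
    mostly ambient noise (e.g. before the first arrival).
    """
    sig = trace[signal_window[0]:signal_window[1]]
    noi = trace[noise_window[0]:noise_window[1]]
    rms_sig = np.sqrt(np.mean(sig ** 2))
    rms_noi = np.sqrt(np.mean(noi ** 2)) + 1e-12
    return 20.0 * np.log10(rms_sig / rms_noi)

# Flag traces whose SNR falls below a chosen threshold so they can be
# reviewed while acquisition is still running.
def flag_low_snr(gather, threshold_db=6.0,
                 signal_window=(200, 800), noise_window=(0, 150)):
    return [i for i, tr in enumerate(gather)
            if trace_snr_db(tr, signal_window, noise_window) < threshold_db]
```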
7

Hardwick, Jonathan Robert. "Synthesis of Noise from Flyover Data." Thesis, Virginia Tech, 2014. http://hdl.handle.net/10919/50531.

Abstract:
Flyover noise is a problem that affects citizens, primarily those who live near or around places with high air traffic such as airports or military bases. Such noise can be a source of great annoyance. The focus of this thesis is on determining a method to create a high fidelity sound source simulation of rotorcraft noise for the purpose of producing a complete flyover scenario to be used in psychoacoustic testing. The focus of the sound source simulation is simulating rotorcraft noise fluctuations during level flight to aid in psychoacoustic testing to determine human perception of such noise. Current methods only model the stationary or time-average components when synthesizing the sound source. The synthesis process described in this thesis determines the steady-state waveform of the noise as well as the time-varying fluctuations for each rotor individually. The process explored in this thesis uses an empirical approach to synthesize flyover noise by directly using physical flyover recordings. Four different methods of synthesis were created to determine the combination of components that produces a high fidelity sound source simulation. These four methods of synthesis are: a) unmodulated main rotor; b) modulated main rotor; c) unmodulated main rotor combined with the unmodulated tail rotor; and d) modulated main rotor combined with the modulated tail rotor. Since the time-varying components of the source sound are important to the creation of a high fidelity sound source simulation, five different types of time-varying fluctuations, or modulations, were implemented to determine the importance of the fluctuating components to the sound source simulation. The types of modulation investigated are a) no modulation, b) randomly applied generic modulation, c) coherently applied generic modulation, d) randomly applied specific modulation, and e) coherently applied specific modulation. Generic modulation is derived from a section of the source recording different from the one to which it is applied. For the purposes of this study, it is not clearly dominated by either thickness or loading noise characteristics, but still displays long-term modulation. Random application of the modulation implies that there is a loss of absolute modulation phase and amplitude information across the frequency spectrum. Coherent application of the modulation implies that an attempt is made to line up the absolute phase and amplitude of the modulation signal with that which is being replaced (i.e. that which was stripped from the original recording, expanded or contracted to fit the signal to which it is applied). Specific modulation is the modulation from the source recording which is being reconstructed. A psychoacoustic test was performed to rank the fidelity of each synthesis method and each type of modulation. Performing this comparison for two different emission angles provides insight as to whether the ranking differs between the emission angles. The modulated main rotor combined with the modulated tail rotor showed the highest fidelity, much higher than that of any of the other synthesis methods. The psychoacoustic test showed that modulation is necessary to produce a high fidelity sound source simulation. However, the use of a generic modulation or a randomly applied specific modulation proved to be an inadequate substitute for the coherently applied specific modulation. The results from this research show that more research is necessary to properly simulate a full flyover scenario.
Specifically, more data is needed in order to properly model the modulation for level flight.
Master of Science
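To make the synthesis idea above concrete, here is a minimal sketch that tiles a steady-state (period-averaged) rotor waveform and applies a slowly varying amplitude-modulation envelope to it. The multiplicative form, the envelope shape, and all numeric values are illustrative assumptions, not the author's actual procedure.

```python
import numpy as np

def synthesize_rotor(steady_period, modulation_envelope, n_periods):
    """Tile one steady-state blade-passage period and apply a slowly varying
    amplitude-modulation envelope.

    steady_period: 1-D array holding one period of the time-averaged waveform.
    modulation_envelope: 1-D array of relative amplitude fluctuations
        (e.g. extracted from a recording), resampled to the output length.
    """
    base = np.tile(steady_period, n_periods)
    env = np.interp(np.linspace(0, 1, base.size),
                    np.linspace(0, 1, modulation_envelope.size),
                    modulation_envelope)
    return base * (1.0 + env)

# Example: a synthetic main-rotor tone (20 Hz blade-passage frequency)
# with a slow, generic wobble in level.
fs, bpf = 44100, 20.0
period = np.sin(2 * np.pi * np.arange(int(fs / bpf)) * bpf / fs)
envelope = 0.2 * np.sin(2 * np.pi * np.linspace(0, 4, 200))  # generic slow fluctuation
signal = synthesize_rotor(period, envelope, n_periods=80)
```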
8

Durand, Philippe. "Traitement des donnees radar varan et estimation de qualites en geologie, geomorphologie et occupation des sols." Paris 7, 1988. http://www.theses.fr/1988PA077183.

Abstract:
Ce travail porte sur l'exploitation thematique des donnees radar varan en geologie et l'occupation des sols. Les deux premieres parties passent en revue les pretraitements subis par l'image: elimination du bruit et corrections geometriques. Ces chapitres suivants exploitent l'analyse multisources, ainsi que les methodes issus de la morphologie mathematique et de l'analyse de texture
9

Grillo, Aderibigbe. "Developing a data quality scorecard that measures data quality in a data warehouse." Thesis, Brunel University, 2018. http://bura.brunel.ac.uk/handle/2438/17137.

Abstract:
The main purpose of this thesis is to develop a data quality scorecard (DQS) that aligns the data quality needs of the data warehouse stakeholder group with selected data quality dimensions. To comprehend the research domain, a general and systematic literature review (SLR) was carried out, after which the research scope was established. Using Design Science Research (DSR) as the methodology to structure the research, three iterations were carried out to achieve the research aim highlighted in this thesis. In the first iteration, with DSR used as the paradigm, the artefact was built from the results of the general and systematic literature review conducted, and a data quality scorecard (DQS) was conceptualised. The results of the SLR and the recommendations for designing an effective scorecard provided the input for the development of the DQS. Using the System Usability Scale (SUS) to validate the usability of the DQS, the results of the first iteration suggest that the DW stakeholders found the DQS useful. The second iteration was conducted to further evaluate the DQS through a run-through in the FMCG domain followed by semi-structured interviews. The thematic analysis of the semi-structured interviews demonstrated that the stakeholder participants found the DQS to be transparent, to serve as an additional reporting tool, to integrate well, to be easy to use and consistent, and to increase confidence in the data. However, the timeliness data dimension was found to be redundant, necessitating a modification to the DQS. The third iteration was conducted with similar steps to the second iteration but with the modified DQS in the oil and gas domain. The results from the third iteration suggest that the DQS is a useful tool that is easy to use on a daily basis. The research contributes to theory by demonstrating a novel approach to DQS design. This was achieved by ensuring the design of the DQS aligns with the data quality concern areas of the DW stakeholders and the data quality dimensions. Further, this research lays a good foundation for the future by establishing a DQS model that can be used as a base for further development.
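Since the scorecard's usability was validated with the System Usability Scale, the generic SUS scoring rule is sketched below for reference; the respondent answers are made up, and nothing here is specific to the thesis.

```python
def sus_score(responses):
    """Standard System Usability Scale score for one respondent.

    responses: list of ten answers on a 1-5 Likert scale, in questionnaire
    order. Odd-numbered items contribute (answer - 1), even-numbered items
    contribute (5 - answer); the sum is scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10:
        raise ValueError("SUS uses exactly 10 items")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical respondent: a fairly positive assessment (77.5).
print(sus_score([4, 2, 4, 2, 5, 1, 4, 3, 4, 2]))
```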
10

Stone, Ian. "The effect of noise on image quality." Thesis, University of Westminster, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.283456.

11

Sýkorová, Veronika. "Data Quality Metrics." Master's thesis, Vysoká škola ekonomická v Praze, 2008. http://www.nusl.cz/ntk/nusl-2815.

Abstract:
The aim of the thesis is to prove the measurability of Data Quality, which is relatively subjective and thus difficult to measure. In doing this, various aspects of measuring the quality of data are analyzed and a Complex Data Quality Monitoring System is introduced with the aim of providing a concept for measuring and monitoring the overall Data Quality in an organization. The system is built on a metrics hierarchy decomposed into particular detailed metrics, dimensions enabling multidimensional analyses of the metrics, and processes being measured by the metrics. The first part of the thesis (Chapter 2 and Chapter 3) is focused on Data Quality itself, i.e. it provides various definitions of Data Quality, gives reasoning for the importance of Data Quality in a company, and presents some of the most common tools and solutions aimed at managing Data Quality in an organization. The second part of the thesis (Chapter 4 and Chapter 5) builds on the previous part and leads into measuring Data Quality using metrics, i.e. it contains the definition and purpose of Data Quality Metrics, places them into the multidimensional context (dimensions, hierarchies) and states five possible decompositions of Data Quality metrics into detailed metrics. The third part of the thesis (Chapter 6) contains the proposed Complex Data Quality Monitoring System, including a description of the dimensions and processes related to Data Quality Management and, most importantly, a detailed definition of the bottom-level metrics used for calculation of the overall Data Quality.
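To illustrate the kind of roll-up such a metrics hierarchy implies, here is a minimal weighted-average sketch; the metric names, weights, and the choice of a weighted mean are invented for the example and are not the aggregation defined in the thesis.

```python
# Each bottom-level metric is a score in [0, 1]; the hierarchy rolls up with weights.
bottom_level = {
    "completeness": 0.92,   # hypothetical measured values
    "validity": 0.88,
    "uniqueness": 0.97,
    "timeliness": 0.75,
}

weights = {
    "completeness": 0.3,
    "validity": 0.3,
    "uniqueness": 0.2,
    "timeliness": 0.2,
}

def overall_quality(scores, weights):
    """Weighted average of detailed metrics as a single data-quality index."""
    total_w = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_w

print(round(overall_quality(bottom_level, weights), 3))
```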
12

Crozier, Philip Mark. "Enhancement techniques for noise affected telephone quality speech." Thesis, University of Liverpool, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.321115.

13

Peralta, Veronika. "Data Quality Evaluation in Data Integration Systems." PhD thesis, Université de Versailles-Saint Quentin en Yvelines, 2006. http://tel.archives-ouvertes.fr/tel-00325139.

Abstract:
The need to access multiple data sources in a uniform way grows stronger every day, particularly in decision-support systems, which require a comprehensive analysis of data. With the development of Data Integration Systems (DIS), information quality has become a first-class property increasingly demanded by users. This thesis deals with data quality in DIS. More precisely, we address the problems of evaluating the quality of the data delivered to users in response to their queries and of satisfying users' quality requirements. We also analyze the use of quality measures for improving the design of the DIS and the quality of its data. Our approach consists in studying one quality factor at a time, analyzing its relationship with the DIS, proposing techniques for its evaluation and proposing actions for its improvement. Among the quality factors that have been proposed, this thesis analyzes two: data freshness and data accuracy. We review the various definitions and measures that have been proposed for data freshness and data accuracy, and we identify the properties of the DIS that have a significant impact on their evaluation. We summarize the analysis of each factor by means of a taxonomy, which serves to compare existing work and to highlight open problems. We propose a framework that models the different elements involved in quality evaluation, such as data sources, user queries, the DIS integration processes, the DIS properties, quality measures and quality evaluation algorithms. In particular, we model the DIS integration processes as workflow processes, in which activities perform the tasks that extract, integrate and deliver data to users. Our reasoning support for quality evaluation is a directed acyclic graph, called the quality graph, which has the same structure as the DIS and carries, as labels, the DIS properties that are relevant to quality evaluation. We develop evaluation algorithms that take as input the quality values of the source data and the DIS properties, and combine these values to qualify the data delivered by the DIS. They rely on the graph representation and combine the property values while traversing the graph. The evaluation algorithms can be specialized to take into account the properties that influence quality in a concrete application. The idea behind the framework is to define a flexible context that allows the evaluation algorithms to be specialized to specific application scenarios. The quality values obtained during evaluation are compared with those expected by the users. Improvement actions can be carried out if the quality requirements are not satisfied. We suggest elementary improvement actions that can be composed to improve quality in a concrete DIS. Our approach for improving data freshness consists in analyzing the DIS at different abstraction levels in order to identify its critical points and to target the application of improvement actions at those points.
Our approach for improving data accuracy consists in partitioning query results into portions (certain attributes, certain tuples) with homogeneous accuracy. This allows user applications to display only the most accurate data, to filter out data that do not satisfy the accuracy requirements, or to display the data ranked by accuracy. Compared with existing source-selection approaches, our proposal makes it possible to select the most accurate portions instead of filtering out entire sources. The main contributions of this thesis are: (1) a detailed analysis of the freshness and accuracy quality factors; (2) the proposal of techniques and algorithms for evaluating and improving data freshness and data accuracy; and (3) a quality evaluation prototype usable during DIS design.
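To give a concrete flavour of evaluation by traversing a quality graph, the sketch below propagates a data-freshness value through a small directed acyclic graph, taking at each activity the worst (oldest) input freshness plus that activity's processing delay. The example graph, delays, and the max-plus combination rule are simplifying assumptions for illustration, not the algorithms defined in the thesis.

```python
from graphlib import TopologicalSorter

# Quality graph: node -> list of predecessor nodes (sources feed activities,
# activities feed the user query node). Delays are per-node processing times.
graph = {
    "source_A": [],
    "source_B": [],
    "extract_A": ["source_A"],
    "extract_B": ["source_B"],
    "integrate": ["extract_A", "extract_B"],
    "query": ["integrate"],
}
delay = {"source_A": 0, "source_B": 0, "extract_A": 5,
         "extract_B": 2, "integrate": 1, "query": 0}
source_freshness = {"source_A": 30, "source_B": 120}   # minutes since last update

def propagate_freshness(graph, delay, source_freshness):
    freshness = {}
    for node in TopologicalSorter(graph).static_order():
        preds = graph[node]
        if not preds:                       # a data source
            freshness[node] = source_freshness[node]
        else:                               # an activity: worst input + own delay
            freshness[node] = max(freshness[p] for p in preds) + delay[node]
    return freshness

print(propagate_freshness(graph, delay, source_freshness)["query"])  # 123 in this toy case
```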
14

Peralta, Costabel Veronika del Carmen. "Data quality evaluation in data integration systems." Versailles-St Quentin en Yvelines, 2006. http://www.theses.fr/2006VERS0020.

Abstract:
This thesis deals with data quality evaluation in Data Integration Systems (DIS). Specifically, we address the problems of evaluating the quality of the data conveyed to users in response to their queries and verifying if users’ quality expectations can be achieved. We also analyze how quality measures can be used for improving the DIS and enforcing data quality. Our approach consists in studying one quality factor at a time, analyzing its impact within a DIS, proposing techniques for its evaluation and proposing improvement actions for its enforcement. Among the quality factors that have been proposed, this thesis analyzes two of the most used ones: data freshness and data accuracy
15

Deb, Rupam. "Data Quality Enhancement for Traffic Accident Data." Thesis, Griffith University, 2017. http://hdl.handle.net/10072/367725.

Abstract:
Death, injury, and disability resulting from road traffic crashes continue to be a major global public health problem. Recent data suggest that the number of fatalities from traffic crashes is in excess of 1.25 million people each year with non-fatal injuries affecting a further 20-50 million people. It is predicted that by 2030, road traffic accidents will have progressed to be the 5th leading cause of death and that the number of people who will die annually from traffic accidents will have doubled from current levels. Both developed and developing countries suffer from the consequences of the increase in human population, and consequently, vehicle numbers. Therefore, methods to reduce accident severity are of great interest to traffic agencies and the public at large. To analyze traffic accident factors effectively, a complete traffic accident historical database is needed. Road accident fatality rates depend on many factors, so it is a very challenging task to investigate the dependencies between the attributes because of the many environmental and road accident factors. Missing data and noisy data in the database obscure the discovery of important factors and lead to invalid conclusions.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
16

Powers, John W. "Neural networks : an application to electrochemical noise data." Virtual Press, 1997. http://liblink.bsu.edu/uhtbin/catkey/1045629.

Abstract:
Neural networks were applied to the analysis of electrochemical noise data. Electrochemical noise is defined as the fluctuations in either current or potential with time for a metal which is immersed in a conductive solution. This data is of interest because of its relationship to particular corrosion processes. Specifically, a system which is experiencing uniform corrosion will produce a different noise signal than one which is experiencing localized (perforation) corrosion. The economic effects of corrosion are significant and methods which improve the ability to detect, measure and predict corrosion would be extremely valuable. Two series of experiments were conducted. The data for both series were collected from aluminum samples immersed in various aqueous solutions. The series differed from each other in the configuration and programming of the potentiostat which collected the data. The first series only dealt with potential noise while the second series dealt with both potential and current noise. Auxiliary parameters, such as the pH and chloride concentration of the solutions, were used in the second series. The first series studied data from only two solutions, while the second series included six solutions. It was possible for neural networks to correctly categorize systems in Series 1 according to the class of corrosion being observed (uniform or perforating). Appropriate data transformation steps were required to effect these classifications and it was also observed that many of these data transformations would lead directly to categorization without the use of a neural network. The additional data collected in Series 2 allowed a more complex analysis. Neural networks were able to simultaneously predict both the propensity towards localized corrosion and the metal dissolution rate. This application demonstrated the power of neural networks. Several types of neural networks and learning algorithms were included in this study. The two systems used most were a backpropagation (multi-layer perceptron) system and a radial basis system. Comparisons of the various network systems with regard to speed and accuracy were made.
Department of Mathematical Sciences
17

Cousins, John David. "CEAREX ambient noise data measured northeast of Svalbard." Thesis, Monterey, California. Naval Postgraduate School, 1991. http://hdl.handle.net/10945/28023.

18

Sampson, Aaron (Aaron Lee Kasey). "An analysis of noise in the CoRoT data." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61265.

Abstract:
Thesis (S.B.)--Massachusetts Institute of Technology, Dept. of Physics, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 57).
In this thesis, publicly available data from the French/ESA satellite mission CoRoT, designed to seek out extrasolar planets, were analyzed using MATLAB. CoRoT attempts to observe the transits of these planets across their parent stars. CoRoT occupies an orbit which periodically carries it through the Van Allen Belts, resulting in a large number of high outliers in the flux data. Known systematics and outliers were removed from the data and the remaining scatter was evaluated using the median of absolute deviations from the median (MAD), a measure of scatter which is robust to outliers. The level of scatter (evaluated with MAD) present in these data is indicative of the lower limit on the size of planets detectable by CoRoT or a similar satellite. The MAD for CoRoT stars is correlated with stellar magnitude. The brightest stars observed by CoRoT display scatter of approximately 0.02 percent, while the median value for all stars is 0.16 percent.
by Aaron Sampson.
S.B.
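For reference, the robust scatter measure used above takes only a few lines to compute. The sketch below is a generic MAD implementation with made-up flux values; it is not the author's MATLAB pipeline.

```python
import numpy as np

def mad(x):
    """Median of absolute deviations from the median - robust to outliers."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

# Relative flux for a quiet star plus a few radiation hits (high outliers),
# as happens when the satellite crosses the Van Allen belts.
rng = np.random.default_rng(1)
flux = 1.0 + 0.0002 * rng.standard_normal(5000)
flux[rng.integers(0, flux.size, 50)] += 0.05   # spurious high points

print("std :", flux.std())    # inflated by the radiation hits
print("MAD :", mad(flux))     # barely affected by them
```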
19

Fisher, Robert W. H. "Exploring Weakly Labeled Data Across the Noise-Bias Spectrum." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/786.

Abstract:
As the availability of unstructured data on the web continues to increase, it is becoming increasingly necessary to develop machine learning methods that rely less on human annotated training data. In this thesis, we present methods for learning from weakly labeled data. We present a unifying framework to understand weakly labeled data in terms of bias and noise and identify methods that are well suited to learning from certain types of weak labels. To compensate for the tremendous sizes of weakly labeled datasets, we leverage computationally efficient and statistically consistent spectral methods. Using these methods, we present results from four diverse, real-world applications coupled with a unifying simulation environment. This allows us to make general observations that would not be apparent when examining any one application on its own. These contributions allow us to significantly improve prediction when labeled data is available, and they also make learning tractable when the cost of acquiring annotated data is prohibitively high.
20

Wang, Tianmiao. "Non-parametric regression for data with correlated noise." Thesis, University of Bristol, 2017. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.730888.

21

He, Ying (Surveying & Spatial Information Systems, Faculty of Engineering, UNSW). "Spatial data quality management." University of New South Wales, Surveying & Spatial Information Systems, 2008. http://handle.unsw.edu.au/1959.4/43323.

Abstract:
The applications of geographic information systems (GIS) in various areas have highlighted the importance of data quality. Data quality research has been given a priority by GIS academics for three decades. However, the outcomes of data quality research have not been sufficiently translated into practical applications. Users still need a GIS capable of storing, managing and manipulating data quality information. To fill this gap, this research aims to investigate how we can develop a tool that effectively and efficiently manages data quality information to aid data users to better understand and assess the quality of their GIS outputs. Specifically, this thesis aims: 1. To develop a framework for establishing a systematic linkage between data quality indicators and appropriate uncertainty models; 2. To propose an object-oriented data quality model for organising and documenting data quality information; 3. To create data quality schemas for defining and storing the contents of metadata databases; 4. To develop a new conceptual model of data quality management; 5. To develop and implement a prototype system for enhancing the capability of data quality management in commercial GIS. Based on reviews of error and uncertainty modelling in the literature, a conceptual framework has been developed to establish the systematic linkage between data quality elements and appropriate error and uncertainty models. To overcome the limitations identified in the review and satisfy a series of requirements for representing data quality, a new object-oriented data quality model has been proposed. It enables data quality information to be documented and stored in a multi-level structure and to be integrally linked with spatial data to allow access, processing and graphic visualisation. The conceptual model for data quality management is proposed where a data quality storage model, uncertainty models and visualisation methods are three basic components. This model establishes the processes involved when managing data quality, emphasising on the integration of uncertainty modelling and visualisation techniques. The above studies lay the theoretical foundations for the development of a prototype system with the ability to manage data quality. Object-oriented approach, database technology and programming technology have been integrated to design and implement the prototype system within the ESRI ArcGIS software. The object-oriented approach allows the prototype to be developed in a more flexible and easily maintained manner. The prototype allows users to browse and access data quality information at different levels. Moreover, a set of error and uncertainty models are embedded within the system. With the prototype, data quality elements can be extracted from the database and automatically linked with the appropriate error and uncertainty models, as well as with their implications in the form of simple maps. This function results in proposing a set of different uncertainty models for users to choose for assessing how uncertainty inherent in the data can affect their specific application. It will significantly increase the users' confidence in using data for a particular situation. To demonstrate the enhanced capability of the prototype, the system has been tested against the real data. The implementation has shown that the prototype can efficiently assist data users, especially non-expert users, to better understand data quality and utilise it in a more practical way. 
The methodologies and approaches for managing quality information presented in this thesis should serve as an impetus for supporting further research.
22

Yoo, Seungyup. "Field effect transistor noise model analysis and low noise amplifier design for wireless data communications." Diss., Georgia Institute of Technology, 2000. http://hdl.handle.net/1853/13024.

23

Bringle, Per. "Data Quality in Data Warehouses: a Case Study." Thesis, University of Skövde, Department of Computer Science, 1999. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-404.

Abstract:

Companies today experience problems with poor data quality in their systems. Because of the enormous amount of data in companies, the data has to be of good quality if companies want to take advantage of it. Since the purpose of a data warehouse is to gather information from several databases for decision support, it is absolutely vital that data is of good quality. There exist several ways of determining or classifying data quality in databases. In this work the data quality management in a large Swedish company's data warehouse is examined, through a case study, using a framework specialized for data warehouses. The quality of data is examined from syntactic, semantic and pragmatic points of view. The results of the examination are then compared with a similar case study previously conducted in order to find any differences and similarities.

24

Redgert, Rebecca. "Evaluating Data Quality in a Data Warehouse Environment." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-208766.

Abstract:
The amount of data accumulated by organizations have grown significantly during the last couple of years, increasing the importance of data quality. Ensuring data quality for large amounts of data is a complicated task, but crucial to subsequent analysis. This study investigates how to maintain and improve data quality in a data warehouse. A case study of the errors in a data warehouse was conducted at the Swedish company Kaplan, and resulted in guiding principles on how to improve the data quality. The investigation was done by manually comparing data from the source systems to the data integrated in the data warehouse and applying a quality framework based on semiotic theory to identify errors. The three main guiding principles given are (1) to implement a standardized format for the source data, (2) to implement a check prior to integration where the source data are reviewed and corrected if necessary, and (3) to create and implement specific database integrity rules. Further work is encouraged on establishing a guide for the framework on how to best perform a manual approach for comparing data, and quality assurance of source data.
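As a small illustration of the second guiding principle above (a check prior to integration where source data are reviewed), here is a hedged sketch of a row-level validation pass; the field names and rules are invented for the example and do not come from the Kaplan case study.

```python
from datetime import datetime

def _is_iso_date(value):
    """True if value parses as YYYY-MM-DD."""
    try:
        datetime.strptime(str(value), "%Y-%m-%d")
        return True
    except ValueError:
        return False

def _is_non_negative_number(value):
    try:
        return float(value) >= 0
    except (TypeError, ValueError):
        return False

# Hypothetical rules for one source feed; each returns True when the row passes.
RULES = {
    "customer_id is numeric": lambda r: str(r.get("customer_id", "")).isdigit(),
    "order_date is ISO formatted": lambda r: _is_iso_date(r.get("order_date", "")),
    "amount is non-negative": lambda r: _is_non_negative_number(r.get("amount")),
}

def review_rows(rows):
    """Split source rows into rows ready for loading and rows needing correction."""
    clean, rejected = [], []
    for row in rows:
        failures = [name for name, rule in RULES.items() if not rule(row)]
        (clean if not failures else rejected).append((row, failures))
    return clean, rejected
```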
25

Li, Lin. "Data quality and data cleaning in database applications." Thesis, Edinburgh Napier University, 2012. http://researchrepository.napier.ac.uk/Output/5788.

Abstract:
Today, data plays an important role in people's daily activities. With the help of some database applications such as decision support systems and customer relationship management systems (CRM), useful information or knowledge could be derived from large quantities of data. However, investigations show that many such applications fail to work successfully. There are many reasons for such failures, such as poor system infrastructure design or poor query performance. But nothing is more certain to yield failure than lack of concern for the issue of data quality. High-quality data is a key to today's business success. The quality of any large real-world data set depends on a number of factors, among which the source of the data is often the crucial one. It has now been recognized that an inordinate proportion of data in most data sources is dirty. Obviously, a database application with a high proportion of dirty data is not reliable for the purpose of data mining or deriving business intelligence, and the quality of decisions made on the basis of such business intelligence is also unreliable. In order to ensure high-quality data, enterprises need to have processes, methodologies and resources to monitor and analyze the quality of data, and methodologies for preventing and/or detecting and repairing dirty data. This thesis focuses on the improvement of data quality in database applications with the help of current data cleaning methods. It provides a systematic and comparative description of the research issues related to the improvement of the quality of data, and has addressed a number of research issues related to data cleaning. In the first part of the thesis, related literature on data cleaning and data quality is reviewed and discussed. Building on this research, a rule-based taxonomy of dirty data is proposed in the second part of the thesis. The proposed taxonomy not only summarizes the most common dirty data types but is also the basis on which the proposed method for solving the Dirty Data Selection (DDS) problem during the data cleaning process was developed. This helps us to design the DDS process in the proposed data cleaning framework described in the third part of the thesis. This framework retains the most appealing characteristics of existing data cleaning approaches, and improves the efficiency and effectiveness of data cleaning as well as the degree of automation during the data cleaning process. Finally, a set of approximate string matching algorithms are studied and experimental work has been undertaken. Approximate string matching, which has been well studied for many years, is an important part of many data cleaning approaches. The experimental work in the thesis confirmed the statement that there is no clear best technique. It shows that the characteristics of data, such as the size of a dataset, the error rate in a dataset, the type of strings in a dataset and even the type of typo in a string, have a significant effect on the performance of the selected techniques. In addition, the characteristics of data also affect the selection of suitable threshold values for the selected matching algorithms. These experimental results provide the basis for a fundamental improvement in the design of the 'algorithm selection mechanism' in the data cleaning framework, which enhances the performance of the data cleaning system in database applications.
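Approximate string matching of the kind studied above can be illustrated with a plain Levenshtein edit distance and a similarity threshold; the normalisation and the 0.8 threshold are arbitrary choices for the example, not the settings used in the thesis experiments.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_probable_duplicate(a, b, threshold=0.8):
    """Normalised similarity in [0, 1]; values above the threshold are flagged."""
    if not a and not b:
        return True
    similarity = 1.0 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))
    return similarity >= threshold

print(is_probable_duplicate("Jon Smith", "John Smith"))   # True: one edit apart
```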
26

Konaté, Cheick Mohamed. "Enhancing speech coder quality: improved noise estimation for postfilters." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104578.

Abstract:
ITU-T G.711.1 is a multirate wideband extension for the well-known ITU-T G.711 pulse code modulation of voice frequencies. The extended system is fully interoperable with the legacy narrowband one. In the case where the legacy G.711 is used to code a speech signal and G.711.1 is used to decode it, quantization noise may be audible. For this situation, the standard proposes an optional postfilter. The application of postfiltering requires an estimation of the quantization noise. The more accurate the estimate of the quantization noise is, the better the performance of the postfilter can be. In this thesis, we propose an improved noise estimator for the postfilter proposed for the G.711.1 codec and assess its performance. The proposed estimator provides a more accurate estimate of the noise with the same computational complexity.
27

Johansson, Magnus. "On noise and hearing loss : Prevalence and reference data." Doctoral thesis, Linköping : Univ, 2003. http://www.ep.liu.se/diss/science_technology/07/97/index.html.

28

Jeatrakul, Piyasak. "Enhancing classification performance over noise and imbalanced data problems." PhD thesis, Murdoch University, 2012. https://researchrepository.murdoch.edu.au/id/eprint/10044/.

Abstract:
This research presents the development of techniques to handle two issues in data classification: noise and imbalanced data problems. Noise is a significant problem that can degrade the quality of training data in any learning algorithm. Learning algorithms trained by noisy instances generally increase misclassification when they perform classification. As a result, the classification performance tends to decrease. Meanwhile, the imbalanced data problem is another problem affecting the performance of learning algorithms. If some classes have a much larger number of instances than the others, the learning algorithms tend to be dominated by the features of the majority classes, and the features of the minority classes are difficult to recognise. As a result, the classification performance of the minority classes could be significantly lower than that of the majority classes. It is therefore important to implement techniques to better handle the negative effects of noise and imbalanced data problems. Although there are several approaches attempting to handle noise and imbalanced data problems, shortcomings of the available approaches still exist. For the noise handling techniques, even though the noise tolerant approach does not require any data preprocessing, it can tolerate only a certain amount of noise. The classifier developed from noisy data tends to be less predictive if the training data contains a great number of noise instances. Furthermore, for the noise elimination approach, although it can be easily applied to various problem domains, it could degrade the quality of training data if it cannot distinguish between noise and rare cases (exceptions). Besides, for the imbalanced data problem, the available techniques used still present some limitations. For example, the algorithm-level approach can perform effectively only on specific problem domains or specific learning algorithms. The data-level approach can either eliminate necessary information from the training set or produce the over-fitting problem over the minority class. Moreover, when the imbalanced data problem becomes more complex, such as for the case of multi-class classification, it is difficult to apply the re-sampling techniques (the data-level approach), which perform effectively for imbalanced data problems in binary classification, to the multi-class classification. Due to the limitations above, these lead to the motivation of this research to propose and investigate techniques to handle noise and imbalanced data problems more effectively. This thesis has developed three new techniques to overcome the identified problems. Firstly, a cleaning technique called the Complementary Neural Network (CMTNN) data cleaning technique has been developed in order to remove noise (misclassification data) from the training set. The results show that the new noise detection and removal technique can eliminate noise with confidence. Furthermore, the CMTNN cleaning technique can increase the classification accuracy across different learning algorithms, which are Artificial Neural Network (ANN), Support Vector Machine (SVM), k- Nearest Neighbor (k-NN), and Decision Tree (DT). It can provide higher classification performance than other cleaning methods such as Tomek links, the majority voting filtering, and the consensus voting filtering. Secondly, the CMTNN re-sampling technique, which is a new under-sampling technique, has been developed to handle the imbalanced data problem in binary classification. 
The results show that the combined techniques of the CMTNN resampling technique and Synthetic Minority Over-sampling Technique (SMOTE) can perform effectively by improving the classification performance of the minority class instances in terms of Geometric Mean (G-Mean) and the area under the Receiver Operating Characteristic (ROC) curve. It generally provides higher performance than other re-sampling techniques such as Tomek links, Wilson’s Edited Nearest Neighbor Rule (ENN), SMOTE, the combined technique of SMOTE and ENN, and the combined technique of SMOTE and Tomek links. For the third proposed technique, an algorithm named One-Against-All with Data Balancing (OAA-DB) has been developed in order to deal with the imbalanced data problem in multi-class classification. It can be asserted that this algorithm not only improves the performance for the minority class but it also maintains the overall accuracy, which is normally reduced by other techniques. The OAA-DB algorithm can increase the performance in terms of the classification accuracy and F-measure when compared to other multi-class classification approaches including One-Against-All (OAA), One-Against-One (OAO), All and One (A&O), and One Against Higher Order (OAHO) approaches. Furthermore, this algorithm has shown that the re-sampling technique is not only used effectively for the class imbalance problem in binary classification but it has been also applied successfully to the imbalanced data problem in multi-class classification.
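The Geometric Mean criterion used above to judge minority-class performance is straightforward to compute from binary confusion counts; the sketch below uses made-up numbers purely for illustration.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical classifier on an imbalanced test set (100 minority, 900 majority):
# accuracy looks high (0.915) but G-Mean exposes the weak minority recall.
print(g_mean(tp=60, fn=40, tn=855, fp=45))   # ~0.755
```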
29

Hammond, Patrick Douglas. "Deep Synthetic Noise Generation for RGB-D Data Augmentation." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7516.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully-corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively smaller dataset using synthetically damaged depth data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically so that the network is able to learn to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement. We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets.
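The learned noise model in this thesis is a CNN; the naive augmentation it is contrasted with (random dropout plus Gaussian perturbation of depth values) can be sketched in a few lines. The following is a rough illustration of that naive baseline only, with made-up parameter values, not the thesis's noise-generating network.

```python
import numpy as np

def naive_depth_noise(depth, dropout_p=0.05, sigma_mm=10.0, rng=None):
    """Naive synthetic damage for a depth image (values in millimetres):
    random pixel dropout to zero plus additive Gaussian noise.
    Parameter values are illustrative, not taken from the thesis."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = depth + rng.normal(0.0, sigma_mm, size=depth.shape)
    dropout = rng.random(depth.shape) < dropout_p
    noisy[dropout] = 0.0                      # simulate missing-depth pixels
    return np.clip(noisy, 0.0, None)

depth = np.full((480, 640), 1500.0)           # synthetic flat wall at 1.5 m
damaged = naive_depth_noise(depth, rng=np.random.default_rng(0))
```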
30

Cooley, Daniel Warren. "Data acquisition unit for low-noise, continuous glucose monitoring." Diss., University of Iowa, 2012. https://ir.uiowa.edu/etd/2844.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
As the number of people with diabetes continues to increase, research efforts improving glucose testing methods and devices are under way to improve outcomes and quality of life for diabetic patients. This dissertation describes the design and testing of a Data Acquisition Unit (DAU) providing low-noise photocurrent spectra for use in a continuous glucose monitoring system. The goal of this research is to improve the signal-to-noise ratio (SNR) of photocurrent measurements to increase glucose concentration measurement accuracy. The glucose monitoring system consists of a portable monitoring device and base station. The monitoring device measures near infrared (IR) absorption spectra from interstitial fluid obtained by a microdialysis or ultrafiltration probe and transmits the spectra to a base station via USB or a ZigBee radio link. The base station utilizes chemometric calibration methods to calculate glucose concentration from the photocurrent spectra. Future efforts envisage credit card-sized monitoring devices. The glucose monitor system measures the optical absorbance spectrum of an interstitial fluid (ISF) sample pumped through a fluid chamber inside a glucose sensor. Infrared LEDs in the glucose sensor illuminate the ISF sample with IR light covering the 2.2 to 2.4 micron wavelength region where glucose has unique features in its absorption spectrum. Light that passes through the sample propagates through a linearly variable bandpass filter and impinges on a photodiode array. The center frequency of the variable filter is graded along its length such that the filter and photodiode array form a spectrometer. The data acquisition unit (DAU) conditions and samples photocurrent from each photodiode channel and sends the resulting photocurrent spectra to the Main Controller Unit (MCU). The MCU filters the photocurrent samples, providing low-noise photocurrent spectra to a base station via USB or ZigBee radio link. The glucose monitoring system limit of detection (LOD) from a single glucose sensor wavelength is 5.8 mM with a system bandwidth of 0.00108 Hz. Further analysis utilizing multivariate calibration methods such as the net analyte signal method promises to reduce the glucose monitoring system LOD, approaching a clinically useful level of approximately 2 mM.
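As a generic illustration (not taken from the dissertation) of how photocurrent noise relates to a concentration limit of detection, a single-channel 3-sigma estimate can be sketched as follows; the readings and the calibration slope are placeholders.

```python
import numpy as np

# Repeated photocurrent readings at one detector channel (placeholder data).
readings = np.array([1.002, 0.998, 1.001, 0.999, 1.000, 1.003])  # arbitrary units

signal = readings.mean()
noise = readings.std(ddof=1)
snr_db = 20 * np.log10(signal / noise)

sensitivity = 0.01               # assumed calibration slope: photocurrent units per mM
lod_mM = 3 * noise / sensitivity  # common 3-sigma limit-of-detection estimate

print(f"SNR = {snr_db:.1f} dB, LOD ~ {lod_mM:.2f} mM")
```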
31

Tcheheumeni, Djanni Axel Laurel. "Identification and quantification of noise sources in marine towed active electromagnetic data." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28914.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The towed streamer controlled source electromagnetic (CSEM) system collects data faster than the conventional static node-based CSEM system. However, the towed streamer CSEM is typically much noisier than the conventional static node-based CSEM. Identifying and quantifying various sources of noise is important for the development of future robust electromagnetic streamer systems. This is the problem I address in this thesis, in three parts. First, I examine the idea that the towed streamer suffers from noise induced by its motion through the Earth's magnetic field according to Faraday's law of induction. I derive expressions for the motionally-induced noise for the cases of a horizontal streamer parallel to the acquisition vessel's path and a curved streamer caused by a constant cross-current. These expressions demonstrate that the motionally-induced noise is sensitive to the magnitude of the feather angle at the head and at the tail of the streamer, and to the vertical and lateral motion of the streamer. The key finding is that no motionally-induced noise is generated when the streamer is horizontal and moving in a constant magnetic field. By contrast, when the streamer shape is curved because of cross-currents, motionally-induced noise is generated if the velocity of the streamer varies over time. Second, I analyse and compare the noise recorded using the first generation of towed streamer with the noise recorded using a static ocean bottom cable (OBC) CSEM. I find that within the frequency range of interest, 0.01–1 Hz, the towed streamer noise is 20 dB (a factor of 10) greater than the noise recorded with the OBC CSEM. I also show that the motion of the telluric cable between the pair of electrodes in the towed streamer is responsible for this difference in amplitude between the two systems. In the frequency ranges 0.03–0.1 Hz and 0.03–0.2 Hz, the motionally-induced noise is shown to be uncorrelated across all channels. However, within the frequency band 0.1–0.3 Hz, the motionally-induced noise correlation gradually increases and the noise becomes well correlated at about 0.2 Hz. This correlated noise could be caused by ocean swell from surface waves, water flowing around the streamer or cross-currents. Finally, to identify and quantify the contribution of several distinct sources of noise, and to describe the mechanisms generating each source of noise, I co-designed a prototype towed CSEM streamer. I carried out an experiment with the prototype streamer suspended 1 m below the water surface in the controlled environment of the Edinburgh wave tank located on the King's Buildings campus of the University of Edinburgh. I then subjected the streamer to flow running at velocities of 0–1 m/s along its length and to waves propagating in the same direction, at 45°, and perpendicular relative to the streamer direction.
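For orientation, the motional electromotive force that the first part of this analysis builds on follows directly from Faraday's law; in the notation used here (ours, not necessarily the thesis's), for a streamer segment of length L moving with velocity v through the geomagnetic field B:

```latex
% Motional EMF induced along a streamer moving through the Earth's
% magnetic field (general form; notation is illustrative).
\varepsilon \;=\; \int_{0}^{L} \bigl( \mathbf{v}(\ell, t) \times \mathbf{B} \bigr) \cdot \mathrm{d}\boldsymbol{\ell}
```

When the streamer is straight and its velocity and the field are constant in time, ε is constant and contributes no time-varying noise, consistent with the key finding quoted above; curvature induced by cross-currents combined with time-varying streamer velocity produces a fluctuating ε.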
32

Wedin, Jonas. "Replicating noise in video : a comparison between physics-based and deep learning models for simulating noise." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-272110.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Algorithms that track objects in video following Newtonian physics can often be affected by noise in the data. Some types of noise might be hard or expensive to capture, so being able to augment or generate a new data set from models replicating a certain type of noise can be useful. Recent research into unsupervised learning of video sequences using Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTMs) combined with convolutions (ConvLSTM) gives hope of a deep learning model which, when trained on a certain type of noise, can replicate the properties of that noise without copying the exact data. This thesis takes two data sets of different noises (rain and moving insects) and attempts to replicate these. For comparison, two models are created for each noise. One is physics-based and focuses on creating a specific model of a noise for simulation and generation. The other is a deep learning network trained on real-life data representing each noise. Generated sequences from the models are then evaluated using different measurements and compared to a validation set, using both established techniques such as Fréchet Inception Distance (FID) and new ones created to show the difference for this type of sparse data. The result shows that it is difficult to measure such a sparse data set using existing techniques. FID scores for the insect models compared to a validation set are almost equal (103 ≈ 107). However, this is not consistent with a visual inspection of the data, which shows the deep learning model performing worse. Similar results can be seen for the rain models, which makes the FID scores difficult to interpret since they do not match a visual inspection. New measurement techniques show the difference between data sets created with a physics-based model and a deep learning model, but their generalization is questionable. The conclusion is that the physics-based models perform better than the deep learning models; however, they do not generalize as well and take considerable effort to produce.
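FID, the main established metric used in this comparison, is the Fréchet distance between Gaussian fits to feature activations of the two image sets. A minimal, generic sketch of that computation is shown below; it assumes the feature vectors (e.g. Inception activations) have already been extracted and is not the thesis's evaluation code.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_a, feats_b):
    """FID between two sets of feature vectors, each an (n_samples, n_features) array."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)     # matrix square root of the covariance product
    if np.iscomplexobj(covmean):              # discard numerical imaginary residue
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```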
33

Yu, Wenyuan. "Improving data quality : data consistency, deduplication, currency and accuracy." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8899.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data quality is one of the key problems in data management. An unprecedented amount of data has been accumulated and has become a valuable asset of an organization. The value of the data relies greatly on its quality. However, data is often dirty in real life. It may be inconsistent, duplicated, stale, inaccurate or incomplete, which can reduce its usability and increase the cost of businesses. Consequently the need for improving data quality arises, which comprises five central issues, namely, data consistency, data deduplication, data currency, data accuracy and information completeness. This thesis presents the results of our work on the first four issues: data consistency, deduplication, currency and accuracy. The first part of the thesis investigates incremental verification of data consistency in distributed data. Given a distributed database D, a set S of conditional functional dependencies (CFDs), the set V of violations of the CFDs in D, and updates ΔD to D, the problem is to find, with minimum data shipment, the changes ΔV to V in response to ΔD. Although the problems are intractable, we show that they are bounded: there exist algorithms to detect errors such that their computational cost and data shipment are both linear in the size of ΔD and ΔV, independent of the size of the database D. Such incremental algorithms are provided for both vertically and horizontally partitioned data, and we show that the algorithms are optimal. The second part of the thesis studies the interaction between record matching and data repairing. Record matching, the main technique underlying data deduplication, aims to identify tuples that refer to the same real-world object, and repairing is to make a database consistent by fixing errors in the data using constraints. These are treated as separate processes in most data cleaning systems, based on heuristic solutions. However, our studies show that repairing can effectively help us identify matches, and vice versa. To capture the interaction, a uniform framework that seamlessly unifies repairing and matching operations is proposed to clean a database based on integrity constraints, matching rules and master data. The third part of the thesis presents our study of finding certain fixes that are absolutely correct for data repairing. Data repairing methods based on integrity constraints are normally heuristic, and they may not find certain fixes. Worse still, they may even introduce new errors when attempting to repair the data, which may not work well when repairing critical data such as medical records, in which a seemingly minor error often has disastrous consequences. We propose a framework and an algorithm to find certain fixes, based on master data, a class of editing rules and user interactions. A prototype system is also developed. The fourth part of the thesis introduces inferring data currency and consistency for conflict resolution, where data currency aims to identify the current values of entities, and conflict resolution is to combine tuples that pertain to the same real-world entity into a single tuple and resolve conflicts, which is also an important issue for data deduplication. We show that data currency and consistency help each other in resolving conflicts. We study a number of associated fundamental problems, and develop an approach for conflict resolution by inferring data currency and consistency.
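As an illustration of the kind of consistency checking CFDs enable, a textbook-style constant CFD such as "for records with country code 44, ZIP determines street" can be checked in batch with a few lines of pandas. This is a generic example of violation detection only, not the incremental distributed algorithms described above; the toy relation and attribute names are ours.

```python
import pandas as pd

# Toy customer relation; the CFD checked is: for tuples with CC == "44",
# ZIP functionally determines STR.
cust = pd.DataFrame({
    "CC":  ["44", "44", "44", "01"],
    "ZIP": ["EH8 9AB", "EH8 9AB", "EH8 9AB", "07974"],
    "STR": ["Mayfield Rd", "Crichton St", "Mayfield Rd", "Main St"],
})

uk = cust[cust["CC"] == "44"]
violations = uk.groupby("ZIP")["STR"].nunique()
print(violations[violations > 1])   # ZIP values whose tuples violate the CFD
```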
The last part of the thesis reports our study of data accuracy on the longstanding relative accuracy problem which is to determine, given tuples t1 and t2 that refer to the same entity e, whether t1[A] is more accurate than t2[A], i.e., t1[A] is closer to the true value of the A attribute of e than t2[A]. We introduce a class of accuracy rules and an inference system with a chase procedure to deduce relative accuracy, and the related fundamental problems are studied. We also propose a framework and algorithms for inferring accurate values with users’ interaction.
34

Barker, James M. "Data governance: The missing approach to improving data quality." Thesis, University of Phoenix, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10248424.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

In an environment where individuals use applications to drive activities, from what book to purchase and what film to view to what temperature to heat a home, data is the critical element. To make things work, data must be correct, complete, and accurate. Many firms view data governance as a panacea to the ills of systems and organizational challenges, while other firms struggle to realize the value of these programs. This paper documents a study that was executed to understand what is being done by firms in the data governance space and why. The conceptual framework established from the literature on the subject was a set of six areas that should be addressed for a data governance program: data governance councils; data quality; master data management; data security; policies and procedures; and data architecture. There is a wide range of experiences and ways to address data quality, and the focus needs to be on execution. This explanatory case study examined the experiences of 100 professionals at 41 firms to understand what is being done and why professionals are undertaking such an endeavor. The outcome is that firms need to address data quality, data security, and operational standards in a manner that is organized around business value, including strong business leader sponsorship and a documented, dynamic business case. The results of this study provide a foundation for data governance program success and a guide to getting started.

35

Wolf, Hilke. "Data Quality Bench-Marking for High Resolution Bragg Data." Doctoral thesis, Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2014. http://hdl.handle.net/11858/00-1735-0000-0022-5DE2-A.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Swapna, B., and R. VijayaPrakash. "Privacy Preserving Data Mining Operations without Disrupting Data Quality." International Journal of Computer Science and Network (IJCSN), 2012. http://hdl.handle.net/10150/271473.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data mining operations have become prevalent as they can extract trends or patterns that help in taking good business decisions. They often operate on large historical databases or data warehouses to obtain actionable knowledge or business intelligence that supports well-informed decisions. Many tools have emerged in the data mining domain to perform these operations, and they are best used to obtain actionable knowledge from data; doing this manually is not feasible because the data is very large and the process takes a lot of time. Thus the data mining domain is improving at a rapid pace. While data mining operations are very useful in obtaining business intelligence, they also have a drawback: they can extract sensitive information from the database, and people may misuse this freedom by obtaining sensitive information illegally. Preserving the privacy of data is therefore also important. Towards this end, many Privacy Preserving Data Mining (PPDM) algorithms have come into existence that sanitize data to prevent data mining algorithms from extracting sensitive information from databases.
Data mining operations help discover business intelligence from historical data. The extracted business intelligence or actionable knowledge helps in taking well-informed decisions that lead to profit for the organization that makes use of it. While performing mining, the privacy of data has to be given the utmost importance. To achieve this, PPDM (Privacy Preserving Data Mining) came into existence, sanitizing the database to prevent the discovery of association rules. However, this modifies the data and thus disrupts its quality. This paper proposes a new technique and algorithms that can perform privacy-preserving data mining operations while ensuring that data quality is not lost. The empirical results reveal that the proposed technique is useful and can be used in real-world applications.
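For context on what the sanitization targets, the support and confidence of an association rule can be computed directly from the transaction data; the toy transactions and the candidate-sensitive rule below are ours, not from the paper.

```python
# Support and confidence of a candidate-sensitive rule {bread} -> {butter},
# the kind of rule a PPDM sanitization would aim to hide (toy data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

n = len(transactions)
support_ab = sum({"bread", "butter"} <= t for t in transactions) / n
support_a = sum("bread" in t for t in transactions) / n
confidence = support_ab / support_a

print(f"support = {support_ab:.2f}, confidence = {confidence:.2f}")
```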
37

Pedroza, Moises. "TRACKING RECEIVER NOISE BANDWIDTH SELECTION." International Foundation for Telemetering, 1996. http://hdl.handle.net/10150/607591.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California
The selection of the Intermediate Frequency (IF) bandwidth filter for a data receiver for processing PCM data is based on using a peak deviation of 0.35 times the bit rate. The optimum IF bandwidth filter is equal to the bit rate. An IF bandwidth filter of 1.5 times the bit rate degrades the data by approximately 0.7 dB. The selection of the IF bandwidth filter for tracking receivers is based on the narrowest “noise bandwidth” that will yield the best system sensitivity. In some cases the noise bandwidth of the tracking receiver is the same as the IF bandwidth of the data receiver because it is the same receiver. If this is the case, the PCM bit rate determines the IF bandwidth and establishes the system sensitivity. With increasing bit rates and increased transmitter stability characteristics, the IF bandwidth filter selection criteria for a tracking receiver must include system sensitivity considerations. The tracking receiver IF bandwidth filter selection criteria should also be based on the narrowest IF bandwidth that will not cause the tracking errors to be masked by high bit rates and alter the pedestal dynamic response. This paper describes selection criteria for a tracking receiver IF bandwidth filter based on measurements of the tracking error signals versus antenna pedestal dynamic response. Different IF bandwidth filters for low and high bit rates were used.
38

López, Martinez Carlos. "Multidimensional speckle noise. Modelling and filtering related to sar data." Doctoral thesis, Universitat Politècnica de Catalunya, 2003. http://hdl.handle.net/10803/6921.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Synthetic Aperture Radars, or SAR systems, are the best example of active microwave remote sensing systems. Owing to its coherent nature, a SAR system is able to acquire electromagnetic scattering information with high spatial resolution, but this coherent nature also causes the appearance of speckle. Although speckle is an electromagnetic measurement, it can only be analysed as a noise component because of the complexity associated with the electromagnetic scattering process. To remove the effects of speckle noise properly, a noise model is needed that can identify the noise sources and how they degrade the useful information. While such a model exists for one-dimensional SAR systems, known as the multiplicative speckle noise model, it does not exist for multidimensional SAR systems. The work presented in this thesis provides the definition and complete validation of new speckle noise models for multidimensional SAR systems, together with their application to speckle noise reduction and information extraction. In this thesis, multidimensional SAR data are considered under a covariance-matrix formulation, since it allows the data to be analysed on the basis of the Hermitian complex product of pairs of SAR images. Because preserving spatial resolution is an important aspect of SAR image processing, speckle noise reduction in this work is based on wavelet analysis theory.
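For reference, a minimal statement of the single-channel multiplicative model the abstract refers to (standard in the SAR literature; the notation here is ours):

```latex
% Multiplicative speckle model for an L-look SAR intensity image
I(x) = \sigma(x)\, n(x), \qquad
\mathbb{E}[n] = 1, \qquad
\operatorname{Var}[n] = \tfrac{1}{L}
```

Here σ(x) is the underlying radar cross-section and n(x) is unit-mean speckle independent of σ; for single-look intensity (L = 1) the speckle follows an exponential distribution.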
39

Arizaleta, Mikel. "Structured data extraction: separating content from noise on news websites." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9898.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

In this thesis, we have treated the problem of separating content from noise on news websites. We have approached this problem using TiMBL, a memory-based learning tool. We have studied the relevance of similarity in the training data and the effect of data size on the performance of the extractions.

40

Kawaguchi, Hirokazu. "Signal Extraction and Noise Removal Methods for Multichannel Electroencephalographic Data." 京都大学 (Kyoto University), 2014. http://hdl.handle.net/2433/188593.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Shahbazian, Mehdi. "Multiresolution denoising for arbitrarily spaced data contaminated with arbitrary noise." Thesis, University of Surrey, 2005. http://epubs.surrey.ac.uk/843064/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Denoising is an essential ingredient of any data processing task because real data are usually contaminated by some amount of uncertainty, error or noise. The ultimate objective in this study is to handle the multiresolution denoising of an arbitrarily spaced multidimensional data set contaminated with arbitrary noise. Denoising is closely related to function estimation from noisy samples, which is best achieved by complexity control in a structured function space. Multiresolution analysis and wavelets provide a suitable structured space for function estimation. However, conventional wavelet decompositions, such as the fast wavelet transform, are designed for regularly spaced data. Furthermore, the projection and lifting scheme approaches for dealing with irregular data cannot be easily extended to higher dimensions and their application to denoising is not straightforward. In contrast, the least squares wavelet decomposition offers a method for direct decomposition and denoising of multidimensional irregularly spaced data. We show that the frequently applied level by level multiresolution least squares wavelet decomposition suffers from gross interpolation error in the case of irregularly spaced data. The simultaneous least squares wavelet decomposition, with careful wavelet selection, is proposed to overcome this problem. Conventional wavelet domain denoising techniques, such as global and level dependent thresholding, work well for regularly spaced data but more sophisticated coefficient dependent thresholding is required for irregularly spaced data. We propose a new data domain denoising method for Gaussian noise, referred to as the Local Goodness of Fit (LGF) algorithm, which is based on the local application of the conventional goodness of fit measure in a multiresolution structure. We show that the combination of the simultaneous least squares wavelet decomposition and the LGF denoising algorithm is superior to the projection and coefficient dependent thresholding and can handle arbitrarily spaced multidimensional data contaminated with independent, but not necessarily identically distributed, Gaussian noise. For denoising of data contaminated with outliers and/or non-Gaussian long tail noise, the decomposition methods based on mean estimation are not robust. We develop a new robust multiresolution decomposition, based on median estimation in a dyadic multiresolution structure, referred to as the Interpolated Block Median Decomposition (IBMD). The IBMD method overcomes the limitations of existing median preserving transforms and can handle multidimensional irregularly spaced data of arbitrary size. Thresholding methods for the coefficients of robust median preserving decompositions are currently limited to regular data contaminated with noise drawn independently and identically from a known symmetric distribution. To overcome these serious limitations, we develop a fundamentally new data domain robust multiresolution denoising procedure, called the Local Balance of Fit (LBF) algorithm, which is based on local balancing of the data points above and below the denoised function in a dyadic multiresolution structure. The LBF algorithm, which was inspired by the intuitive denoising style carried out by a human operator, is a distribution free method that can handle any arbitrary noise without a priori knowledge or estimation of the noise distribution. 
The combination of the robust IBMD decomposition and the LBF denoising algorithm can effectively handle a wide spectrum of denoising applications involving multidimensional arbitrarily spaced data contaminated with arbitrary and unknown noise. The only limitation is that the noise samples must be independent or uncorrelated.
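The conventional wavelet-domain baseline this thesis contrasts with, global soft thresholding of detail coefficients for regularly spaced samples, can be sketched with PyWavelets. The snippet below is the standard universal-threshold recipe, not the LGF or LBF algorithms proposed in the thesis.

```python
import numpy as np
import pywt

def wavelet_denoise(y, wavelet="db4", level=4):
    """Global soft-thresholding baseline for regularly spaced 1-D samples."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # MAD noise estimate from finest details
    thr = sigma * np.sqrt(2.0 * np.log(len(y)))           # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(y)]

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
noisy = np.sin(8 * np.pi * t) + 0.3 * rng.normal(size=t.size)
clean = wavelet_denoise(noisy)
```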
42

López, Martinez Carlos. "Multidimensional speckle noise, modelling and filtering related to SAR data /." Köln : DLR, Bibliotheks- und Informationswesen, 2004. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=015380575&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

De, Stefano Antonio. "Wavelet-based reduction of spatial video noise." Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.342855.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Angeles, Maria del Pilar. "Management of data quality when integrating data with known provenance." Thesis, Heriot-Watt University, 2007. http://hdl.handle.net/10399/64.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Diallo, Thierno Mahamoudou. "Discovering data quality rules in a master data management context." Thesis, Lyon, INSA, 2013. http://www.theses.fr/2013ISAL0067.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Dirty data continues to be an important issue for companies. The Data Warehouse Institute [Eckerson, 2002], [Rockwell, 2012] stated that poor data costs US businesses $611 billion annually and that erroneously priced data in retail databases costs US customers $2.5 billion each year. Data quality is becoming more and more critical. The database community pays particular attention to this subject, where a variety of integrity constraints like Conditional Functional Dependencies (CFDs) have been studied for data cleaning. Repair techniques based on these constraints are precise at catching inconsistencies but limited in how to exactly correct the data. Master data brings a new alternative for data cleaning thanks to its quality. With the growing importance of Master Data Management (MDM), a new class of data quality rule known as Editing Rules (ERs) tells how to fix errors, pointing out which attributes are wrong and what values they should take. The intuition is to correct dirty data using high-quality data from the master. However, finding data quality rules is an expensive process that involves intensive manual effort, and it remains unrealistic to rely on human designers. In this thesis, we develop pattern mining techniques for discovering ERs from existing source relations with respect to master relations. In this setting, we propose a new semantics of ERs taking advantage of both source and master data. Thanks to the proposed satisfaction-based semantics, the discovery problem of ERs turns out to be strongly related to the discovery of both CFDs and one-to-one correspondences between source and target attributes. We first attack the problem of discovering CFDs, concentrating on the particular class of constant CFDs, known to be very expressive for detecting inconsistencies, and we extend some well-known concepts introduced for traditional Functional Dependencies to solve the discovery problem. Secondly, we propose a method based on INclusion Dependencies to extract one-to-one correspondences from source to master attributes before automatically building ERs. Finally, we propose some heuristics for applying ERs to clean data. We have implemented and evaluated our techniques on both real-life and synthetic databases. Experiments show the feasibility, scalability and robustness of our proposal.
46

Gredmaier, Ludwig Konrad. "The effect of probe tone duration on psychoacoustic frequency selectivity." Thesis, University of Southampton, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.396142.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Wilson, James Harris. "Development and validation of a laminate flooring system sound quality test method." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29660.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (M. S.)--Mechanical Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Cunefare, Kenneth A.; Committee Member: Qu, Jianmin; Committee Member: Ryherd, Erica. Part of the SMARTech Electronic Thesis and Dissertation Collection.
48

Gens, Rüdiger. "Quality assessment of SAR interferometric data." Hannover : Fachrichtung Vermessungswesen der Univ, 1998. http://deposit.ddb.de/cgi-bin/dokserv?idn=95607121X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Berg, Marcus. "Evaluating Quality of Online Behavior Data." Thesis, Stockholms universitet, Statistiska institutionen, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-97524.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis has two purposes: emphasizing the importance of data quality for Big Data, and identifying and evaluating potential error sources in JavaScript tracking (a client-side, on-site clickstream data collection method for online behavior, commonly used in web analytics). The importance of data quality for Big Data is emphasized through the evaluation of JavaScript tracking. The Total Survey Error framework is applied to JavaScript tracking, and 17 nonsampling error sources are identified and evaluated. The bias imposed by these error sources varies from large to small, but the major takeaway is the large number of error sources actually identified. More work is needed. Big Data has much to gain from quality work. Similarly, there is much that can be done with statistics in web analytics.
50

Ma, Shuai. "Extending dependencies for improving data quality." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5045.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This doctoral thesis presents the results of my work on extending dependencies for improving data quality, both in a centralized environment with a single database and in a data exchange and integration environment with multiple databases. The first part of the thesis proposes five classes of data dependencies, referred to as CINDs, eCFDs, CFDcs, CFDps and CINDps, to capture data inconsistencies commonly found in practice in a centralized environment. For each class of these dependencies, we investigate two central problems: the satisfiability problem and the implication problem. The satisfiability problem is to determine, given a set Σ of dependencies defined on a database schema R, whether or not there exists a nonempty database D of R that satisfies Σ. The implication problem is to determine whether or not a set Σ of dependencies defined on a database schema R entails another dependency φ on R; that is, whether each database D of R that satisfies Σ must satisfy φ as well. These problems are important for the validation and optimization of data-cleaning processes. We establish complexity results for the satisfiability problem and the implication problem for all five classes of dependencies, both in the absence of finite-domain attributes and in the general setting with finite-domain attributes. Moreover, SQL-based techniques are developed to detect data inconsistencies for each class of the proposed dependencies, which can be easily implemented on top of current database management systems. The second part of the thesis studies three important topics for data cleaning in a data exchange and integration environment with multiple databases. The first is the dependency propagation problem, which is to determine, given a view defined on data sources and a set of dependencies on the sources, whether another dependency is guaranteed to hold on the view. We investigate dependency propagation for views defined in various fragments of relational algebra, conditional functional dependencies (CFDs) [FGJK08] as view dependencies, and source dependencies given as either CFDs or traditional functional dependencies (FDs). We establish lower and upper bounds, all matching, ranging from PTIME to undecidable. These not only provide the first results for CFD propagation, but also extend the classical work on FD propagation by giving new complexity bounds in the setting with finite domains. We finally provide the first algorithm for computing a minimal cover of all CFDs propagated via SPC views. The algorithm has the same complexity as one of the most efficient algorithms for computing a cover of FDs propagated via a projection view, despite the increased expressive power of CFDs and SPC views. The second topic is matching records from unreliable data sources. A class of matching dependencies (MDs) is introduced for specifying the semantics of unreliable data. As opposed to static constraints for schema design such as FDs, MDs are developed for record matching, and are defined in terms of similarity metrics and a dynamic semantics. We identify a special case of MDs, referred to as relative candidate keys (RCKs), to determine what attributes to compare and how to compare them when matching records across possibly different relations.
We also propose a mechanism for inferring MDs with a sound and complete system, a departure from traditional implication analysis, such that when we cannot match records by comparing attributes that contain errors, we may still find matches by using other, more reliable attributes. We finally provide a quadratic time algorithm for inferring MDs, and an effective algorithm for deducing quality RCKs from a given set of MDs. The last one is finding certain fixes for data monitoring [CGGM03, SMO07], which is to find and correct errors in a tuple when it is created, either entered manually or generated by some process. That is, we want to ensure that a tuple t is clean before it is used, to prevent errors introduced by adding t. As noted by [SMO07], it is far less costly to correct a tuple at the point of entry than fixing it afterward. Data repairing based on integrity constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. We also provide an algorithm to identify minimal certain regions, such that a certain fix is warranted by editing rules and master data as long as one of the regions is correct.
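For orientation, the general shape of a matching dependency follows the form commonly used in this line of work; the notation below is given only as a reminder and does not reproduce the thesis's inference system.

```latex
% A matching dependency (MD) across relations R1 and R2: if the listed
% attribute pairs are pairwise similar, the identified attributes must match.
\bigwedge_{j=1}^{k} \bigl( R_1[A_j] \approx_j R_2[B_j] \bigr)
\;\rightarrow\;
R_1[Z_1] \rightleftharpoons R_2[Z_2]
```

Here each ≈_j is a similarity predicate (for example, an edit distance below a threshold) and ⇌ asserts that the two attribute values should be identified as referring to the same real-world value; a relative candidate key is a special case that fixes which attribute pairs to compare and how.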

To the bibliography