
Doctoral dissertations on the topic "Data analysis and interpretation techniques"

Create a correct reference in APA, MLA, Chicago, Harvard and many other citation styles


Consult the 50 best doctoral dissertations on the topic "Data analysis and interpretation techniques".

An "Add to bibliography" button is available next to each work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in ".pdf" format and read the work's abstract online, provided the relevant parameters are available in its metadata.

Browse doctoral dissertations from a variety of fields and compile an accurate bibliography.

1

Vitale, Raffaele. "Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation". Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/90442.

Abstract:
The present Ph.D. thesis, primarily conceived to support and reinforce the relation between academic and industrial worlds, was developed in collaboration with Shell Global Solutions (Amsterdam, The Netherlands) in the endeavour of applying and possibly extending well-established latent variable-based approaches (i.e. Principal Component Analysis - PCA - Partial Least Squares regression - PLS - or Partial Least Squares Discriminant Analysis - PLSDA) for complex problem solving not only in the fields of manufacturing troubleshooting and optimisation, but also in the wider environment of multivariate data analysis. To this end, novel efficient algorithmic solutions are proposed throughout all chapters to address very disparate tasks, from calibration transfer in spectroscopy to real-time modelling of streaming flows of data. The manuscript is divided into the following six parts, focused on various topics of interest: Part I - Preface, where an overview of this research work, its main aims and justification is given together with a brief introduction on PCA, PLS and PLSDA; Part II - On kernel-based extensions of PCA, PLS and PLSDA, where the potential of kernel techniques, possibly coupled to specific variants of the recently rediscovered pseudo-sample projection, formulated by the English statistician John C. Gower, is explored and their performance compared to that of more classical methodologies in four different applications scenarios: segmentation of Red-Green-Blue (RGB) images, discrimination of on-/off-specification batch runs, monitoring of batch processes and analysis of mixture designs of experiments; Part III - On the selection of the number of factors in PCA by permutation testing, where an extensive guideline on how to accomplish the selection of PCA components by permutation testing is provided through the comprehensive illustration of an original algorithmic procedure implemented for such a purpose; Part IV - On modelling common and distinctive sources of variability in multi-set data analysis, where several practical aspects of two-block common and distinctive component analysis (carried out by methods like Simultaneous Component Analysis - SCA - DIStinctive and COmmon Simultaneous Component Analysis - DISCO-SCA - Adapted Generalised Singular Value Decomposition - Adapted GSVD - ECO-POWER, Canonical Correlation Analysis - CCA - and 2-block Orthogonal Projections to Latent Structures - O2PLS) are discussed, a new computational strategy for determining the number of common factors underlying two data matrices sharing the same row- or column-dimension is described, and two innovative approaches for calibration transfer between near-infrared spectrometers are presented; Part V - On the on-the-fly processing and modelling of continuous high-dimensional data streams, where a novel software system for rational handling of multi-channel measurements recorded in real time, the On-The-Fly Processing (OTFP) tool, is designed; Part VI - Epilogue, where final conclusions are drawn, future perspectives are delineated, and annexes are included.
Vitale, R. (2017). Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90442
2

Smith, Eugene Herbie. "An analytical framework for monitoring and optimizing bank branch network efficiency / E.H. Smith". Thesis, North-West University, 2009. http://hdl.handle.net/10394/5029.

Abstract:
Financial institutions make use of a variety of delivery channels for servicing their customers. The primary channel utilised as a means of acquiring new customers and increasing market share is through the retail branch network. The 1990s saw the Internet explosion and with it a threat to branches. The relatively low cost associated with virtual delivery channels made it inevitable for financial institutions to direct their focus towards such new and more cost-efficient technologies. By the beginning of the 21st century, and with increasing limitations identified in alternative virtual delivery channels, the financial industry returned to a more balanced view which may be seen as the revival of branch networks. The main purpose of this study is to provide a roadmap for financial institutions in managing their branch network. A three-step methodology, representative of data mining and management science techniques, will be used to explain relative branch efficiency. The methodology consists of clustering analysis (CA), data envelopment analysis (DEA) and decision tree induction (DTI). CA is applied to data internal to the financial institution for increasing the discriminatory power of DEA. DEA is used to calculate the relevant operating efficiencies of branches deemed homogeneous during CA. Finally, DTI is used to interpret the DEA results and additional data describing the market environment the branch operates in, as well as inquiring into the nature of the relative efficiency of the branch.
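To make the three-step flow above concrete, here is a minimal sketch of a clustering -> DEA -> decision-tree pipeline in Python. It is not the author's implementation: the branch attributes, the 0.9 efficiency cut-off, and the use of scikit-learn together with an input-oriented CCR DEA solved by scipy's linprog are all illustrative assumptions.

    # Minimal sketch of the clustering -> DEA -> decision-tree pipeline described above.
    # All data, feature groups and thresholds are hypothetical.
    import numpy as np
    from scipy.optimize import linprog
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier

    def dea_ccr_input_oriented(X, Y, o):
        """Efficiency of decision-making unit o given inputs X (n x m) and outputs Y (n x s)."""
        n, m = X.shape
        s = Y.shape[1]
        c = np.r_[1.0, np.zeros(n)]                  # minimise theta
        A_in = np.c_[-X[o], X.T]                     # sum_j lam_j * x_ij <= theta * x_io
        A_out = np.c_[np.zeros(s), -Y.T]             # sum_j lam_j * y_rj >= y_ro
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(m), -Y[o]]
        bounds = [(None, None)] + [(0, None)] * n    # theta free, lambdas >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        return res.x[0]

    rng = np.random.default_rng(0)
    internal = rng.random((60, 4))       # hypothetical internal branch attributes (for clustering)
    inputs = rng.random((60, 2)) + 0.1   # e.g. staff numbers, operating cost
    outputs = rng.random((60, 2)) + 0.1  # e.g. transactions, new accounts
    environment = rng.random((60, 3))    # hypothetical market-environment descriptors

    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(internal)
    for k in np.unique(clusters):
        idx = np.where(clusters == k)[0]
        eff = np.array([dea_ccr_input_oriented(inputs[idx], outputs[idx], i) for i in range(len(idx))])
        labels = (eff >= 0.9).astype(int)   # "near-efficient" vs not; threshold is illustrative
        tree = DecisionTreeClassifier(max_depth=3).fit(environment[idx], labels)
        print(f"cluster {k}: mean efficiency {eff.mean():.2f}, tree depth {tree.get_depth()}")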
Thesis (M.Com. (Computer Science))--North-West University, Potchefstroom Campus, 2010.
3

Carter, Duane B. "Analysis of Multiresolution Data fusion Techniques". Thesis, Virginia Tech, 1998. http://hdl.handle.net/10919/36609.

Abstract:
In recent years, as the availability of remote sensing imagery of varying resolution has increased, merging images of differing spatial resolution has become a significant operation in the field of digital remote sensing. This practice, known as data fusion, is designed to enhance the spatial resolution of multispectral images by merging a relatively coarse-resolution image with a higher resolution panchromatic image of the same geographic area. This study examines properties of fused images and their ability to preserve the spectral integrity of the original image. It analyzes five current data fusion techniques for three complex scenes to assess their performance. The five data fusion models used include one spatial domain model (High-Pass Filter), two algebraic models (Multiplicative and Brovey Transform), and two spectral domain models (Principal Components Transform and Intensity-Hue-Saturation). SPOT data were chosen for both the panchromatic and multispectral data sets. These data sets were chosen for the high spatial resolution of the panchromatic (10 meters) data, the relatively high spectral resolution of the multispectral data, and the low spatial resolution ratio of two to one (2:1). After the application of the data fusion techniques, each merged image was analyzed statistically, graphically, and for increased photointerpretive potential as compared with the original multispectral images. While all of the data fusion models distorted the original multispectral imagery to an extent, both the Intensity-Hue-Saturation Model and the High-Pass Filter model maintained the original qualities of the multispectral imagery to an acceptable level. The High-Pass Filter model, designed to highlight the high frequency spatial information, provided the most noticeable increase in spatial resolution.
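Of the five models compared, the Brovey transform is the simplest to state: each multispectral band is rescaled by the ratio of the panchromatic band to the per-pixel sum of the multispectral bands. The sketch below assumes the bands are already co-registered and resampled to the panchromatic grid; array sizes are arbitrary and not tied to the SPOT data used in the study.

    # Minimal sketch of the Brovey transform, one of the five fusion models compared
    # in the thesis. Inputs are float arrays; shapes are illustrative.
    import numpy as np

    def brovey_fusion(ms, pan, eps=1e-6):
        """ms: (bands, H, W) multispectral cube, pan: (H, W) panchromatic image."""
        intensity = ms.sum(axis=0) + eps      # per-pixel sum of the MS bands
        return ms * (pan / intensity)         # each band scaled by pan / intensity

    ms = np.random.rand(3, 128, 128)          # e.g. green, red, near-infrared
    pan = np.random.rand(128, 128)            # higher-resolution panchromatic band
    fused = brovey_fusion(ms, pan)
    print(fused.shape)                        # (3, 128, 128)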
Master of Science
4

Astbury, S. "Analysis and interpretation of full waveform sonic data". Thesis, University of Oxford, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.371535.

5

Gimblett, Brian James. "The application of artificial intelligence techniques to data interpretation in analytical chemistry". Thesis, University of Salford, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.395862.

6

Lahouar, Samer. "Development of Data Analysis Algorithms for Interpretation of Ground Penetrating Radar Data". Diss., Virginia Tech, 2003. http://hdl.handle.net/10919/11051.

Abstract:
According to a 1999 Federal Highway Administration statistic, the U.S. has around 8.2 million lane-miles of roadways that need to be maintained and rehabilitated periodically. Therefore, in order to reduce rehabilitation costs, pavement engineers need to optimize the rehabilitation procedure, which is achieved by accurately knowing the existing pavement layer thicknesses and localization of subsurface defects. Currently, the majority of departments of transportation (DOTs) rely on coring as a means to estimate pavement thicknesses, instead of using other nondestructive techniques, such as Ground Penetrating Radar (GPR). The use of GPR as a nondestructive pavement assessment tool is limited mainly due to the difficulty of GPR data interpretation, which requires experienced operators. Therefore, GPR results are usually subjective and inaccurate. Moreover, GPR data interpretation is very time-consuming because of the huge amount of data collected during a survey and the lack of reliable GPR data-interpretation software. This research effort attempts to overcome these problems by developing new GPR data analysis techniques that allow thickness estimation and subsurface defect detection from GPR data without operator intervention. The data analysis techniques are based on an accurate modeling of the propagation of the GPR electromagnetic waves through the pavement dielectric materials while traveling from the GPR transmitter to the receiver. Image-processing techniques are also applied to detect layer boundaries and subsurface defects. The developed data analysis techniques were validated utilizing data collected from an experimental pavement system: the Virginia Smart Road. The layer thickness error achieved by the developed system was around 3%. The conditions needed to achieve reliable and accurate results from GPR testing were also established.
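The thickness estimation mentioned above ultimately rests on a simple propagation relation: the wave travels at c divided by the square root of the layer's relative permittivity, and the radargram records the two-way travel time between layer interfaces. A minimal sketch of that relation (not the author's full, operator-free algorithm, which also estimates the permittivities and applies image processing) is:

    # Basic relation behind GPR thickness estimation: the wave travels at
    # c / sqrt(eps_r) in a dielectric layer, and the echo records the two-way
    # travel time. Values below are illustrative only.
    C = 0.3            # speed of light in free space, m/ns

    def layer_thickness(two_way_time_ns, eps_r):
        """Thickness (m) of a layer with relative permittivity eps_r."""
        velocity = C / eps_r ** 0.5     # propagation velocity in the layer, m/ns
        return velocity * two_way_time_ns / 2.0

    print(layer_thickness(two_way_time_ns=2.5, eps_r=6.0))   # ~0.15 m of asphalt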
Ph. D.
7

Pinpart, Tanya. "Techniques for analysis and interpretation of UHF partial discharge signals". Thesis, University of Strathclyde, 2010. http://oleg.lib.strath.ac.uk:80/R/?func=dbin-jump-full&object_id=12830.

8

Deng, Xinping. "Texture analysis and physical interpretation of polarimetric SAR data". Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/396607.

Abstract:
This thesis is dedicated to the study of texture analysis and physical interpretation of PolSAR data. As the starting point, a complete survey of the statistical models for PolSAR data is conducted. All the models are classified into three categories: Gaussian distributions, texture models, and finite mixture models. The texture models, which assume that the randomness of the SAR data is due to two unrelated factors, texture and speckle, are the main subject of this study. The PDFs of the scattering vector and the sample covariance matrix in different models are reviewed. Since many models have been proposed, how to choose the most accurate one for a test dataset is a big challenge. Methods which analyze different polarimetric channels separately or require a filtering of the data are limited in many cases, especially when it comes to high resolution data. In this thesis, the L2-norms of the scattering vectors are studied, and they are found to be advantageous for extracting statistical information from PolSAR data. Statistics based on the L2-norms can be utilized to determine what distribution the data actually follow. A number of models have been suggested to model the texture of PolSAR data, and some are very complex, but most of them lack a physical explanation. The random walk model, which can be interpreted as a discrete analog of the SAR data focusing process, is studied with the objective to understand the data statistics from the point of view of the scattering process. A simulator based on the random walk model is developed, where different variations in the scatterer types and scatterer numbers are considered. It builds a bridge between the mathematical models and the underlying physical mechanisms. It is found that both the mixture and the texture could give the same statistics, such as log-cumulants of the second order and the third order. The two concepts, texture and mixture, represent two quite different scenarios. A further study was carried out to see if it is possible to distinguish them, and higher order statistics are demonstrated to be favorable in this task. They can be physically interpreted to distinguish the scattering from a single type of target from a mixture of targets.
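As a small illustration of the L2-norm statistics mentioned above: under Mellin-kind (log-cumulant) statistics, the nu-th log-cumulant of a positive intensity variable equals the nu-th cumulant of its logarithm, so sample log-cumulants can be computed directly from the log of the squared L2-norms. The sketch below uses synthetic Gaussian scattering vectors, not a real PolSAR scene, and only shows the computation.

    # Sample log-cumulants computed from the L2-norms of scattering vectors
    # (synthetic single-look data, not a real PolSAR scene).
    import numpy as np

    rng = np.random.default_rng(1)
    k = (rng.normal(size=(10000, 3)) + 1j * rng.normal(size=(10000, 3))) / np.sqrt(2)
    intensity = np.sum(np.abs(k) ** 2, axis=1)   # squared L2-norm of each scattering vector

    log_i = np.log(intensity)
    kappa1 = log_i.mean()                        # first log-cumulant
    kappa2 = log_i.var()                         # second log-cumulant
    kappa3 = ((log_i - kappa1) ** 3).mean()      # third log-cumulant
    print(kappa1, kappa2, kappa3)                # texture models separate in the (kappa2, kappa3) plane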
9

Fitzgerald, Tomas W. "Data analysis methods for copy number discovery and interpretation". Thesis, Cranfield University, 2014. http://dspace.lib.cranfield.ac.uk/handle/1826/10002.

Abstract:
Copy number variation (CNV) is an important type of genetic variation that can give rise to a wide variety of phenotypic traits. Differences in copy number are thought to play major roles in processes that involve dosage sensitive genes, providing beneficial, deleterious or neutral modifications to individual phenotypes. Copy number analysis has long been a standard in clinical cytogenetic laboratories. Gene deletions and duplications can often be linked with genetic syndromes such as the 7q11.23 deletion of Williams-Beuren syndrome, the 22q11 deletion of DiGeorge syndrome and the 17q11.2 duplication of Potocki-Lupski syndrome. Interestingly, copy number based genomic disorders often display reciprocal deletion/duplication syndromes, with the latter frequently exhibiting milder symptoms. Moreover, the study of chromosomal imbalances plays a key role in cancer research. The datasets used for the development of analysis methods during this project are generated as part of the cutting-edge translational project, Deciphering Developmental Disorders (DDD). This project, the DDD, is the first of its kind and will directly apply state of the art technologies, in the form of ultra-high resolution microarray and next generation sequencing (NGS), to real-time genetic clinical practice. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI) and the National Health Service (NHS) involving the 24 regional genetic services across the UK and Ireland. Although the application of DNA microarrays for the detection of CNVs is well established, individual change point detection algorithms often display variable performances. The definition of an optimal set of parameters for achieving a certain level of performance is rarely straightforward, especially where data qualities vary.
10

Venugopal, Niveditha. "Annotation-Enabled Interpretation and Analysis of Time-Series Data". PDXScholar, 2018. https://pdxscholar.library.pdx.edu/open_access_etds/4708.

Abstract:
As we continue to produce large amounts of time-series data, the need for data analysis is growing rapidly to help gain insights from this data. These insights form the foundation of data-driven decisions in various aspects of life. Data annotations are information about the data such as comments, errors and provenance, which provide context to the underlying data and aid in meaningful data analysis in domains such as scientific research, genomics and ECG analysis. Storing such annotations in the database along with the data makes them available to help with analysis of the data. In this thesis, I propose a user-friendly technique for Annotation-Enabled Analysis through which a user can employ annotations to help query and analyze data without having prior knowledge of the details of the database schema or any kind of database programming language. The proposed technique receives the request for analysis as a high-level specification, hiding the details of the schema, joins, etc., and parses it, validates the input and converts it into SQL. This SQL query can then be executed in a relational database and the result of the query returned to the user. I evaluate this technique by providing real-world data from a building-data platform containing data about Portland State University buildings such as room temperature, air volume and CO2 level. This data is annotated with information such as class schedules, power outages and control modes (for example, day or night mode). I test my technique with three increasingly sophisticated levels of use cases drawn from this building science domain. (1) Retrieve data with include or exclude annotation selection (2) Correlate data with include or exclude annotation selection (3) Align data based on include annotation selection to support aggregation over multiple periods. I evaluate the technique by performing two kinds of tests: (1) To validate correctness, I generate synthetic datasets for which I know the expected result of these annotation-enabled analyses and compare the expected results with the results generated from my technique (2) I evaluate the performance of the queries generated by this service with respect to execution time in the database by comparing them with alternative SQL translations that I developed.
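The spec-to-SQL step described above can be illustrated with a toy translator. Everything here is hypothetical: the table names (sensor_data, annotations), the columns and the shape of the high-level request are invented for illustration and do not reproduce the thesis's actual specification language or the building-data schema.

    # Illustrative sketch only: turning a high-level "retrieve with include/exclude
    # annotation" request into SQL against a hypothetical schema.
    def annotation_query(stream, include=None, exclude=None):
        sql = ["SELECT d.ts, d.value FROM sensor_data d", f"WHERE d.stream_id = '{stream}'"]
        if include:
            sql.append(
                "AND EXISTS (SELECT 1 FROM annotations a "
                f"WHERE a.stream_id = d.stream_id AND a.label = '{include}' "
                "AND d.ts BETWEEN a.start_ts AND a.end_ts)")
        if exclude:
            sql.append(
                "AND NOT EXISTS (SELECT 1 FROM annotations a "
                f"WHERE a.stream_id = d.stream_id AND a.label = '{exclude}' "
                "AND d.ts BETWEEN a.start_ts AND a.end_ts)")
        return "\n".join(sql)

    print(annotation_query("room_temperature", include="class_in_session", exclude="power_outage"))

A production version would of course build parameterised queries rather than interpolate strings, but the string form keeps the translation idea visible.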
11

Hamby, Stephen Edward. "Data mining techniques for protein sequence analysis". Thesis, University of Nottingham, 2010. http://eprints.nottingham.ac.uk/11498/.

Abstract:
This thesis concerns two areas of bioinformatics related by their role in protein structure and function: protein structure prediction and post translational modification of proteins. The dihedral angles Ψ and Φ are predicted using support vector regression. For the prediction of Ψ dihedral angles the addition of structural information is examined and the normalisation of Ψ and Φ dihedral angles is examined. An application of the dihedral angles is investigated. The relationship between dihedral angles and three bond J couplings determined from NMR experiments is described by the Karplus equation. We investigate the determination of the correct solution of the Karplus equation using predicted Φ dihedral angles. Glycosylation is an important post translational modification of proteins involved in many different facets of biology. The work here investigates the prediction of N-linked and O-linked glycosylation sites using the random forest machine learning algorithm and pairwise patterns in the data. This methodology produces more accurate results when compared to state of the art prediction methods. The black box nature of random forest is addressed by using the trepan algorithm to generate a decision tree with comprehensible rules that represents the decision making process of random forest. The prediction of our program GPP does not distinguish between glycans at a given glycosylation site. We use farthest first clustering, with the idea of classifying each glycosylation site by the sugar linking the glycan to protein. This thesis demonstrates the prediction of protein backbone torsion angles and improves the current state of the art for the prediction of glycosylation sites. It also investigates potential applications and the interpretation of these methods.
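For reference, the Karplus relation mentioned above has the standard empirical form

    {}^{3}J(\theta) = A\cos^{2}\theta + B\cos\theta + C, \qquad \theta = \phi + \Delta,

where A, B and C are empirically calibrated coefficients and Delta is a constant offset that depends on which nuclei are coupled. Because the right-hand side is periodic, a single measured coupling generally maps to several candidate dihedral angles, which is why an independently predicted Phi can be used to select the physically correct solution.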
12

Guile, Geoffrey Robert. "Boosting ensemble techniques for Microarray data analysis". Thesis, University of East Anglia, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.518361.

13

HaKong, L. "Expert systems techniques for statistical data analysis". Thesis, London South Bank University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.381956.

14

Waterworth, Alan Richard. "Data analysis techniques of measured biological impedance". Thesis, University of Sheffield, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.340146.

15

Stark, J. Alex. "Statistical model selection techniques for data analysis". Thesis, University of Cambridge, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390190.

16

Clement, Meagan E. Couper David J. "Analysis techniques for diffusion tensor imaging data". Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2008. http://dc.lib.unc.edu/u?/etd,2010.

Abstract:
Thesis (DrPH)--University of North Carolina at Chapel Hill, 2008.
Title from electronic title page (viewed Feb. 17, 2009). "... in partial fulfillment of the requirements for the degree of Doctorate of Public Health in the School of Public Health Department of Biostatistics." Discipline: Biostatistics; Department/School: Public Health.
17

Ikin, Peter Jonathon Christopher. "Data analysis techniques for triggerless data streams in nuclear structure physics". Thesis, University of Liverpool, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.427039.

18

Tütüncü, Göknur. "Analysis and interpretation of diffraction data from complex, anisotropic materials". [Ames, Iowa : Iowa State University], 2010. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3403852.

19

Stenlund, Hans. "Improving interpretation by orthogonal variation : Multivariate analysis of spectroscopic data". Doctoral thesis, Umeå universitet, Kemiska institutionen, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-43476.

Abstract:
The desire to use the tools and concepts of chemometrics when studying problems in the life sciences, especially biology and medicine, has prompted chemometricians to shift their focus away from their field‘s traditional emphasis on model predictivity and towards the more contemporary objective of optimizing information exchange via model interpretation. The complex data structures that are captured by modern advanced analytical instruments open up new possibilities for extracting information from complex data sets. This in turn imposes higher demands on the quality of data and the modeling techniques used. The introduction of the concept of orthogonal variation in the late 1990‘s led to a shift of focus within chemometrics; the information gained from analysis of orthogonal structures complements that obtained from the predictive structures that were the discipline‘s previous focus. OPLS, which was introduced in the beginning of 2000‘s, refined this view by formalizing the model structure and the separation of orthogonal variations. Orthogonal variation stems from experimental/analytical issues such as time trends, process drift, storage, sample handling, and instrumental differences, or from inherent properties of the sample such as age, gender, genetics, and environmental influence. The usefulness and versatility of OPLS has been demonstrated in over 500 citations, mainly in the fields of metabolomics and transcriptomics but also in NIR, UV and FTIR spectroscopy. In all cases, the predictive precision of OPLS is identical to that of PLS, but OPLS is superior when it comes to the interpretation of both predictive and orthogonal variation. Thus, OPLS models the same data structures but provides increased scope for interpretation, making it more suitable for contemporary applications in the life sciences. This thesis discusses four different research projects, including analyses of NIR, FTIR and NMR spectroscopic data. The discussion includes comparisons of OPLS and PLS models of complex datasets in which experimental variation conceals and confounds relevant information. The PLS and OPLS methods are discussed in detail. In addition, the thesis describes new OPLS-based methods developed to accommodate hyperspectral images for supervised modeling. Proper handling of orthogonal structures revealed the weaknesses in the analytical chains examined. In all of the studies described, the orthogonal structures were used to validate the quality of the generated models as well as gaining new knowledge. These aspects are crucial in order to enhance the information exchange from both past and future studies.
20

Scoon, Alison. "Analysis and interpretation of SAR data for the English Channel". Thesis, University of Southampton, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.261751.

21

Bunks, Carey David. "Random field modeling for interpretation and analysis of layered data". Thesis, Massachusetts Institute of Technology, 1987. http://hdl.handle.net/1721.1/14930.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1987.
MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING
Bibliography: leaves 288-290.
by Carey David Bunks.
Ph.D.
22

Zhang, Lu. "Analysis and Interpretation of Complex Lipidomic Data Using Bioinformatic Approaches". Thesis, Boston College, 2012. http://hdl.handle.net/2345/2656.

Abstract:
Thesis advisor: Jeffrey H. Chuang
The field of lipidomics has rapidly progressed since its inception only a decade ago. Technological revolutions in mass spectrometry, chromatography, and computational biology now enables high-throughput high-accuracy quantification of the cellular lipidome. One significant improvement of these technologies is that lipids can now be identified and quantified as individual molecular species. Lipidomics provides an additional layer of information to genomics and proteomics and opens a new opportunity for furthering our understanding of cellular signaling networks and physiology, which have broad therapeutic values. As with other 'omics sciences, these new technologies are producing vast amounts of lipidomic data, which require sophisticated statistical and computational approaches for analysis and interpretation. However, computational tools for utilizing such data are sparse. The complexity of lipid metabolic systems and the fact that lipid enzymes remain poorly understood also present challenges to computational lipidomics. The focus of my dissertation has been the development of novel computational methods for systematic study of lipid metabolism in cellular function and human diseases using lipidomic data. In this dissertation, I first present a mathematical model describing cardiolipin molecular species distribution in steady state and its relationship with fatty acid chain compositions. Knowledge of this relationship facilitates determination of isomeric species for complex lipids, providing more detailed information beyond current limits of mass spectrometry technology. I also correlate lipid species profiles with diseases and predict potential therapeutics. Second, I present statistical studies of mechanisms influencing phosphatidylcholine and phosphatidylethanolamine molecular architectures, respectively. I describe a statistical approach to examine dependence of sn1 and sn2 acyl chain regulatory mechanisms. Third, I describe a novel network inference approach and illustrate a dynamic model of ethanolamine glycerophospholipid acyl chain remodeling. The model is the first that accurately and robustly describes lipid species changes in pulse-chase experiments. A key outcome is that the deacylation and reacylation rates of individual acyl chains can be determined, and the resulting rates explain the well-known prevalence of sn1 saturated chains and sn2 unsaturated chains. Lastly, I summarize and remark on future studies for lipidomics
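As a point of reference for the species-distribution model mentioned above: under the simplest baseline assumption that the four acyl positions of cardiolipin incorporate chains independently and in proportion to chain abundance, the expected fraction of a species is a multinomial term. The thesis's steady-state model is more refined (it accounts for remodeling), so the sketch below, with made-up chain abundances, only illustrates that baseline.

    # Baseline sketch: expected fraction of a cardiolipin species if its four acyl
    # positions drew chains independently in proportion to abundance. Abundances are made up.
    from math import factorial
    from collections import Counter

    chain_abundance = {"18:2": 0.6, "18:1": 0.25, "16:0": 0.15}   # hypothetical fractions

    def species_fraction(chains):
        """Expected fraction of a cardiolipin species given its 4 acyl chains."""
        counts = Counter(chains)
        perms = factorial(4)
        prob = 1.0
        for chain, n in counts.items():
            perms //= factorial(n)
            prob *= chain_abundance[chain] ** n
        return perms * prob

    print(species_fraction(["18:2", "18:2", "18:2", "18:2"]))   # tetralinoleoyl CL: 0.6**4 ~ 0.13
    print(species_fraction(["18:2", "18:2", "18:2", "18:1"]))   # 4 * 0.6**3 * 0.25 = 0.216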
Thesis (PhD) — Boston College, 2012
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
23

Thomas, Seimon M. "Signal and neutral processing techniques for the interpretation of mobile robot ultrasonic range data". Thesis, Cardiff University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.246116.

24

Chelsom, John James Leonard. "The interpretation of data in intensive care medicine : an application of knowledge-based techniques". Thesis, City University London, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.237826.

25

Riera, Sardà Alexandre. "Computational Intelligence Techniques for Electro-Physiological Data Analysis". Doctoral thesis, Universitat de Barcelona, 2012. http://hdl.handle.net/10803/107818.

Abstract:
This work contains the efforts I have made in the last years in the field of electrophysiological data analysis. Most of the work has been done at Starlab Barcelona S.L. and part of it at the Neurodynamics Laboratory of the Department of Psychiatry and Clinical Psychobiology of the University of Barcelona. The main work deals with the analysis of electroencephalography (EEG) signals, although other signals, such as electrocardiography (ECG), electrooculography (EOG) and electromyography (EMG), have also been used. Several data sets have been collected and analysed applying advanced signal processing techniques. At a later stage, computational intelligence techniques, such as machine learning and genetic algorithms, have been applied, mainly to classify the different conditions from the EEG data sets. Three applications involving EEG and classification are proposed, corresponding to the three case studies presented in this thesis. Analysis of electrophysiological signals for biometric purposes: We demonstrate the potential of using EEG signals for biometric purposes. Using the ENOBIO EEG amplifier, and using only two frontal EEG channels, we are able to authenticate subjects with a performance of up to 96.6%. We also looked at features extracted from the ECG signals, and in that case the performance was equal to 97.9%. We also fused the results of both modalities, achieving perfect performance. Our system is ready to use and, since it only uses 4 channels (2 for EEG, 1 for ECG on the left wrist and 1 as active reference on the right ear lobe), the wireless ENOBIO sensor is perfectly suited for our application. EEG differences in First Psychotic Episode (FPE) patients: From an EEG data set of 15 FPE patients and the same number of controls, we studied the differences in their EEG signals in order to train a classifier able to recognise which group an EEG sample comes from. The features we use are extracted from the EEG by computing the Synchronization Likelihood between all possible pairs of channels. The next step is to build a graph, and from that graph we extract the Mean Path Length and the Clustering Coefficient. Those features, as a function of the connectivity threshold, are then used in our classifiers. We then create several classification problems and reach up to 100% classification accuracy in some cases. Markers of stress in the EEG signal: In this research, we designed a protocol in which the participants were asked to perform different tasks, each one with a different stress level. Among these tasks we can find the Stroop test, mathematical arithmetic and also a fake blood sample test. By extracting the alpha asymmetry and the beta/alpha ratio, we were able to discriminate between the different tasks with performances of up to 88%. This application can be used with only 3 EEG electrodes, and it can also work in real time. Finally, this application can also be used for neurofeedback training to learn how to cope with stress.
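The two stress markers named above (frontal alpha asymmetry and the beta/alpha ratio) can be sketched in a few lines using Welch power spectra. Channel names, band edges, sampling rate and the synthetic signals are illustrative assumptions, not the recording setup used in the thesis.

    # Sketch of the two stress markers on synthetic "frontal" signals.
    import numpy as np
    from scipy.signal import welch

    fs = 256
    t = np.arange(0, 30, 1 / fs)
    rng = np.random.default_rng(0)
    left = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)         # e.g. F3
    right = 0.7 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)  # e.g. F4

    def band_power(x, lo, hi):
        f, pxx = welch(x, fs=fs, nperseg=fs * 2)
        mask = (f >= lo) & (f < hi)
        return pxx[mask].sum() * (f[1] - f[0])     # approximate band power

    alpha_l, alpha_r = band_power(left, 8, 12), band_power(right, 8, 12)
    beta_r = band_power(right, 13, 30)
    print("frontal alpha asymmetry:", np.log(alpha_r) - np.log(alpha_l))
    print("beta/alpha ratio:", beta_r / alpha_r)   # tends to rise with arousal/stress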
26

Tekieh, Mohammad Hossein. "Analysis of Healthcare Coverage Using Data Mining Techniques". Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/20547.

Abstract:
This study explores healthcare coverage disparity using a quantitative analysis on a large dataset from the United States. One of the objectives is to build supervised models, including decision trees and neural networks, to study the efficient factors in healthcare coverage. We also discover groups of people with health coverage problems and inconsistencies by employing unsupervised modeling, including the K-Means clustering algorithm. Our modeling is based on the dataset retrieved from the Medical Expenditure Panel Survey, with 98,175 records in the original dataset. After pre-processing the data, including binning, cleaning, dealing with missing values, and balancing, it contains 26,932 records and 23 variables. We build 50 classification models in IBM SPSS Modeler employing decision trees and neural networks. The accuracy of the models varies between 76% and 81%. The models can predict the healthcare coverage for a new sample based on its significant attributes. We demonstrate that the decision tree models provide higher accuracy than the models based on neural networks. Also, having extensively analyzed the results, we discover the most efficient factors in healthcare coverage to be: access to care, age, poverty level of family, and race/ethnicity.
27

Navarro, Moisés M. "Ocean wave data analysis using Hilbert transform techniques". Thesis, Monterey, California. Naval Postgraduate School, 1996. http://hdl.handle.net/10945/32022.

Abstract:
A novel technique to determine the phase velocity of long-wavelength shoaling waves is investigated. Operationally, the technique consists of three steps. First, using the Hilbert transform of a time series, the phase of the analytic signal is determined. Second, the correlations of the phases of analytic signals between two points in space are calculated and an average time of travel of the wave fronts is obtained. Third, if directional spectra are available or can be determined from time series of large array of buoys, the angular information can be used to determine the true time of travel. The phase velocity is obtained by dividing the distance between buoys by the correlation time. Using the Hilbert transform approach, there is no explicit assumption of the relation between frequency and wavenumber of waves in the wave field, indicating that it may be applicable to arbitrary wave fields, both linear and nonlinear. Limitations of the approach are discussed.
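A minimal sketch of the three operational steps on synthetic records from two hypothetical buoys is given below: the instantaneous phase is taken from the analytic signal (Hilbert transform), the phases are cross-correlated to estimate the travel time, and the phase velocity follows as separation divided by travel time. The directional correction of step three is omitted and all numerical values are invented.

    # Sketch of the Hilbert-transform phase-velocity idea on synthetic buoy records.
    import numpy as np
    from scipy.signal import hilbert

    fs = 2.0                                   # samples per second
    t = np.arange(0, 600, 1 / fs)
    delay_true = 12.0                          # seconds of wave travel between the two buoys
    rng = np.random.default_rng(3)

    def record(shift):
        """Synthetic, non-dispersive multi-component wave record."""
        x = sum(np.sin(2 * np.pi * f * (t - shift) + p)
                for f, p in [(0.06, 0.4), (0.08, 1.7), (0.11, 2.9)])
        return x + 0.2 * rng.standard_normal(t.size)

    eta1, eta2 = record(0.0), record(delay_true)

    # Step 1: instantaneous phase of the analytic signal at each buoy.
    phi1 = np.angle(hilbert(eta1))
    phi2 = np.angle(hilbert(eta2))

    # Step 2: correlate the (envelope-normalised) phases and locate the peak lag.
    xcorr = np.correlate(np.cos(phi2), np.cos(phi1), mode="full")
    lags = np.arange(-t.size + 1, t.size) / fs
    delay_est = lags[np.argmax(xcorr)]

    # Step 3 (directional correction from the directional spectrum) is omitted here.
    print("estimated travel time:", delay_est, "s")
    print("phase velocity:", 500.0 / delay_est, "m/s")   # assuming 500 m buoy separation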
28

Walter, Martin Alan. "Visualization techniques for the analysis of neurophysiological data". Thesis, University of Plymouth, 2004. http://hdl.handle.net/10026.1/2551.

Abstract:
In order to understand the diverse and complex functions of the Human brain, the temporal relationships of vast quantities of multi-dimensional spike train data must be analysed. A number of statistical methods already exist to analyse these relationships. However, as a result of expansions in recording capability hundreds of spike trains must now be analysed simultaneously. In addition to the requirements for new statistical analysis methods, the need for more efficient data representation is paramount. The computer science field of Information Visualization is specifically aimed at producing effective representations of large and complex datasets. This thesis is based on the assumption that data analysis can be significantly improved by the application of Information Visualization principles and techniques. This thesis discusses the discipline of Information Visualization, within the wider context of visualization. It also presents some introductory neurophysiology focusing on the analysis of multidimensional spike train data and software currently available to support this problem. Following this, the Toolbox developed to support the analysis of these datasets is presented. Subsequently, three case studies using the Toolbox are described. The first case study was conducted on a known dataset in order to gain experience of using these methods. The second and third case studies were conducted on blind datasets and both of these yielded compelling results.
29

Hyde, Richard William. "Advanced analysis and visualisation techniques for atmospheric data". Thesis, Lancaster University, 2017. http://eprints.lancs.ac.uk/88136/.

Abstract:
Atmospheric science is the study of a large, complex system which is becoming increasingly important to understand. There are many climate models which aim to contribute to that understanding by computational simulation of the atmosphere. To generate these models, and to confirm the accuracy of their outputs, requires the collection of large amounts of data. These data are typically gathered during campaigns lasting a few weeks, during which various sources of measurements are used. Some are ground based, others airborne sondes, but one of the primary sources is from measurement instruments on board aircraft. Flight planning for the numerous sorties is based on pre-determined goals with unpredictable influences, such as weather patterns, and the results of some limited analyses of data from previous sorties. There is little scope for adjusting the flight parameters during the sortie based on the data received due to the large volumes of data and difficulty in processing the data online. The introduction of unmanned aircraft with extended flight durations also requires a team of mission scientists with the added complications of disseminating observations between shifts. Earth’s atmosphere is a non-linear system, whereas the data gathered is sampled at discrete temporal and spatial intervals introducing a source of variance. Clustering data provides a convenient way of grouping similar data while also acknowledging that, for each discrete sample, a minor shift in time and/ or space could produce a range of values which lie within its cluster region. This thesis puts forward a set of requirements to enable the presentation of cluster analyses to the mission scientist in a convenient and functional manner. This will enable in-flight decision making as well as rapid feedback for future flight planning. Current state of the art clustering algorithms are analysed and a solution to all of the proposed requirements is not found. New clustering algorithms are developed to achieve these goals. These novel clustering algorithms are brought together, along with other visualization techniques, into a software package which is used to demonstrate how the analyses can provide information to mission scientists in flight. The ability to carry out offline analyses on historical data, whether to reproduce the online analyses of the current sortie, or to provide comparative analyses from previous missions, is also demonstrated. Methods for offline analyses of historical data prior to continuing the analyses in an online manner are also considered. The original contributions in this thesis are the development of five new clustering algorithms which address key challenges: speed and accuracy for typical hyper-elliptical offline clustering; speed and accuracy for offline arbitrarily shaped clusters; online dynamic and evolving clustering for arbitrary shaped clusters; transitions between offline and online techniques and also the application of these techniques to atmospheric science data analysis.
30

Davis, Alice. "Modelling techniques for time-to-event data analysis". Thesis, University of Bath, 2018. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.767575.

Abstract:
This thesis focusses on the cumulative hazard function as a tool for modelling time-to-event data, as opposed to the more common hazard or survival functions. By focussing on and providing a detailed discussion of the properties of these functions a new framework is explored for building complex models from, the relatively simple, cumulative hazards. Parametric families are thoroughly explored in this thesis by detailing types of parameters for time-to-event models. The discussion leads to the proposal of combination parametric families, which aim to provide flexible behaviour of the cumulative hazard function. A common issue in the analysis of time-to-event data is the presence of informative censoring. This thesis explores new models which are useful for dealing with this issue.
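For orientation, the textbook identities that make the cumulative hazard H(t) a convenient modelling object are

    H(t) = \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\}, \qquad f(t) = h(t)\,e^{-H(t)},

and any sum of valid cumulative hazards is again a valid cumulative hazard, so, for example, H(t) = (t/\lambda)^k + \rho t combines Weibull-type and constant-hazard behaviour in a single family. The specific combination parametric families proposed in the thesis are not reproduced here; this only illustrates the general construction.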
31

Carrington, Ian B. "Development of blade tip timing data analysis techniques". Thesis, University of Manchester, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492042.

Abstract:
All turbomachines experience vibration from a variety of sources. Resonance may occur that subjects one or more blades to maximum stress conditions. Blade tip-timing (BTT) is an emergent, alternative blade stress measurement technology that is non-intrusive and non-contacting, yet delivers data on all blades in a rotor stage. By measuring blade tip passing times at a number of sensors mounted externally on the rotor's casing, processing and analysis of this data yields resonance frequencies or Engine Orders, and tip amplitudes. This study sought to advance Rolls-Royce's strategy for BTT by improving its data processing and analysis capabilities in two areas:
1. Data analysis of blades undergoing synchronous resonance, by the development of:
• A multiple degrees-of-freedom blade tip displacement simulator, created to provide representative synchronous data under controlled conditions.
• The formulation of a number of approaches new to BTT data analysis and their evaluation and comparison using the above simulator: three variations of an Autoregressive (AR) approach, an Eigen decomposition technique and a matrix Determinant method.
• An experimental rig, designed and constructed to provide real BTT data at low cost, with which to test and begin validation of the best analysis methods.
2. The productionisation of BTT at Rolls-Royce, which will be enhanced by:
• A fast ellipse-fitting algorithm for the Two Parameter Plot (2PP) analysis method.
• An ellipse-fitting Goodness-of-Fit Factor to aid analysis automation.
• Cross-correlation data analysis that detects resonance behaviour automatically.
This work makes the following recommendations:
1. One of the three AR techniques is robust and reliable enough for industrial use.
2. The fast ellipse-fitting algorithm should be written into an industrial 2PP analysis package.
3. Trial Cross-correlation analysis on industrial data to establish its reliability further.
4. Replace optical 'spot' probes with 'line' probes to reduce measurement uncertainties.
5. Increase the industrial BTT system capacity to allow connection of up to eight probes.
32

Kelman, Timothy George Harold. "Techniques for capture and analysis of hyperspectral data". Thesis, University of Strathclyde, 2016. http://oleg.lib.strath.ac.uk:80/R/?func=dbin-jump-full&object_id=26434.

Abstract:
The work presented in this thesis focusses on new techniques for capture and analysis of hyperspectral data. Due to the three-dimensional nature of hyperspectral data, image acquisition often requires some form of movement of either the object or the detector. This thesis presents a novel technique which utilises a rotational line-scan rather than a linear line-scan. Furthermore, a method for automatically calibrating this system using a calibration object is described. Compared with traditional linear scanning systems, the performance is shown to be high enough that a rotational scanning system is a viable alternative. Classification is an important tool in hyperspectral image analysis. In this thesis, five different classification techniques are explained before they are tested on a classification problem: the classification of five different kinds of Chinese tea leaves. The process from capture to pre-processing to classification and post-processing is described. The effects of altering the parameters of the classifiers and the pre- and post-processing steps are also evaluated. This thesis documents the analysis of baked sponges using hyperspectral imaging. By comparing hyperspectral images of sponges of varying ages with the results of an expert tasting panel, a strong correlation is shown between the hyperspectral data and human-determined taste, texture and appearance scores. This data is then used to show the distribution of moisture content throughout a sponge image. While hyperspectral imaging provides significantly more data than a conventional imaging system, the benefits offered by this extra data are not always clear. A quantitative analysis of hyperspectral imaging versus conventional imaging is performed using a rice grain classification problem where spatial, spectral and colour information is compared.
33

Pham, Cuong X. "Advanced techniques for data stream analysis and applications". Thesis, Griffith University, 2023. http://hdl.handle.net/10072/421691.

Abstract:
Deep learning (DL) is one of the most advanced AI techniques that has gained much attention in the last decade and has also been applied in many successful applications such as market stock prediction, object detection, and face recognition. The rapid advances in computational techniques like Graphic Processing Units (GPU) and Tensor Processing Units (TPU) have made it possible to train large deep learning models to obtain high accuracy surpassing human ability in some tasks, e.g., LipNet [9] achieves 93% of accuracy compared with 52% of human to recognize the word from speaker lips movement. Most of the current deep learning research work has focused on designing a deep architecture working in a static environment where the whole training set is known in advance. However, in many real-world applications like predicting financial markets, autonomous cars, and sensor networks, the data often comes in the form of streams with massive volume, and high velocity has affected the scalability of deep learning models. Learning from such data is called continual, incremental, or online learning. When learning a deep model in dynamic environments where the data come from streams, the modern deep learning models usually suffer the so-called catastrophic forgetting problem, one of the most challenging issues that have not been solved yet. Catastrophic forgetting occurs when a model learns new knowledge, i.e., new objects or classes, but its performance in the previously learned classes reduces significantly. The cause of catastrophic forgetting in the deep learning model has been identified and is related to the weight-sharing property. In detail, the model updating the corresponding weights to capture knowledge of the new tasks may push the learned weights of the past tasks away and cause the model performance to degrade. According to the stability-plasticity dilemma [17], if the model weights are too stable, it will not be able to acquire new knowledge, while a model with high plasticity can have large weight changes leading to significant forgetting of the previously learned patterns. Many approaches have been proposed to tackle this issue, like imposing constraints on weights (regularizations) or rehearsal from experience, but significant research gap still exists. First, current regularization methods often do not simultaneously consider class imbalance and catastrophic forgetting. Moreover, these methods usually require more memory to store previous versions of the model, which sometimes is not able to hold a copy of a substantial deep model due to memory constraints. Second, existing rehearsal approaches pay little attention to selecting and storing critical instances that help the model to retain as much knowledge of the learned tasks. This study focuses on dealing with these challenges by proposing several novel methods. We first proposed a new loss function that combines two loss terms to deal with class imbalance data and catastrophic forgetting simultaneously. The former is a modification of a widely used loss function for class imbalance learning, called Focal loss, to handle the exploding gradient (loss goes to NaN) and the ability to learn from highly confident data points. At the same time, the latter is a novel loss term that addresses the catastrophic forgetting within the current mini-batch. In addition, we also propose an online convolution neural network (OCNN) architecture for tabular data that act as a base classifier in an ensemble system (OECNN). 
Next, we introduce a rehearsal-based method to prevent catastrophic forgetting, in which we select a triplet of instances within each mini-batch to store in the memory buffer. These instances are identified as crucial because they can either remind the model of easy tasks or help it revise the hard ones. We also propose a class-wise forgetting detector that monitors the performance of each class encountered so far in a stream; if a class's performance drops below a predefined threshold, that class is identified as a forgetting class. Finally, because real data often comprise many modalities, we study online multi-modal multi-task (M3T) learning problems. Unlike traditional methods in stable environments, online M3T learning needs to be considered in many scenarios, such as missing modalities and incremental tasks. We establish the setting for six frequently occurring M3T scenarios. Most existing M3T methods fail to run in all of these scenarios, so we propose a novel M3T deep learning model called UniCNet that works in all of them and achieves superior performance compared with state-of-the-art M3T methods. To conclude, this dissertation contributes novel computational techniques that deal with the catastrophic forgetting problem in continual deep learning.
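The class-wise forgetting detector described above can be pictured with a minimal sketch such as the following, which tracks a running per-class accuracy over the stream and flags classes that fall below a threshold; the class name, decay factor and threshold are hypothetical choices, not the thesis' implementation.

```python
class ClasswiseForgettingDetector:
    """Illustrative sketch (not the thesis implementation): keep an
    exponentially weighted running accuracy for every class seen so far
    in the stream and flag a class as 'forgetting' when its accuracy
    drops below a predefined threshold."""

    def __init__(self, threshold=0.6, decay=0.5):
        self.threshold = threshold   # hypothetical threshold
        self.decay = decay           # weight given to past accuracy
        self.acc = {}                # class -> running accuracy

    def update(self, y_true, y_pred):
        flagged = []
        for t, p in zip(y_true, y_pred):
            hit = 1.0 if t == p else 0.0
            prev = self.acc.get(t)
            self.acc[t] = hit if prev is None else self.decay * prev + (1 - self.decay) * hit
            if self.acc[t] < self.threshold:
                flagged.append(t)
        return sorted(set(flagged))

det = ClasswiseForgettingDetector()
det.update([0, 1], [0, 1])           # both classes start with accuracy 1.0
print(det.update([0, 1], [0, 0]))    # class 1 is misclassified -> [1]
```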
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Info & Comm Tech
Science, Environment, Engineering and Technology
Full Text
Styles: APA, Harvard, Vancouver, ISO, etc.
34

JABEEN, SAIMA. "Document analysis by means of data mining techniques". Doctoral thesis, Politecnico di Torino, 2014. http://hdl.handle.net/11583/2537297.

Full text source
Abstract:
The huge amount of textual data produced every day by scientists, journalists and Web users allows many different aspects of the information stored in published documents to be investigated. Data mining and information retrieval techniques are exploited to manage and extract information from huge amounts of unstructured textual data. Text mining, also known as text data mining, is the process of extracting high-quality information (focusing on relevance, novelty and interestingness) from text by identifying patterns. It typically involves structuring the input text by means of parsing and other linguistic features, or sometimes by removing extraneous data, and then finding patterns in the structured data. The patterns are finally evaluated and the output is interpreted to accomplish the desired task. Recently, text mining has attracted attention in several fields, such as security (analysis of Internet news), commercial applications (search and indexing) and academia (query answering). Beyond retrieving the documents that contain the words given in a user query, text mining may provide direct answers to the user through semantic, content-based analysis (content meaning and its context). It can also act as an intelligence analyst and is used in some email spam filters to filter out unwanted material. Text mining usually includes tasks such as clustering, categorization, sentiment analysis, entity recognition, entity relation modelling and document summarization. In particular, summarization approaches are suitable for identifying relevant sentences that describe the main concepts presented in a document dataset. Furthermore, the knowledge contained in the most informative sentences can be employed to improve the understanding of user and/or community interests. Different approaches have been proposed to extract summaries from unstructured text documents. Some of them are based on the statistical analysis of linguistic features by means of supervised machine learning or data mining methods, such as Hidden Markov models, neural networks and Naive Bayes methods. An appealing research field is the extraction of summaries tailored to the major user interests; in this context, extracting useful information according to domain knowledge related to the user interests is a challenging task. The main topics have been the study and design of novel data representations and data mining algorithms useful for managing and extracting knowledge from unstructured documents. This thesis describes an effort to investigate the application of data mining approaches firmly established for transactional data (e.g., frequent itemset mining) to textual documents. Frequent itemset mining is a widely used exploratory technique for discovering hidden correlations that frequently occur in the source data. Although its application to transactional data is well established, the use of frequent itemsets in textual document summarization had not been investigated before. This work exploits frequent itemsets for multi-document summarization: a novel multi-document summarizer, ItemSum (Itemset-based Summarizer), is presented, based on an itemset-based model, i.e., a framework composed of the frequent itemsets extracted from the document collection.
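To make the transfer from transactional to textual data concrete, here is a toy Python sketch that treats each sentence as a transaction of distinct terms and counts frequent itemsets by brute force; it is a simplified stand-in for the Apriori-style mining the thesis builds on, and the function name, parameters and example sentences are illustrative only.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(sentences, min_support=2, max_size=2):
    """Toy frequent-itemset miner over sentences treated as transactions
    of distinct lower-cased terms (brute-force enumeration, illustrative
    only; real miners such as Apriori prune the search space)."""
    transactions = [set(s.lower().split()) for s in sentences]
    counts = Counter()
    for terms in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(terms), k):
                counts[itemset] += 1
    return {iset: c for iset, c in counts.items() if c >= min_support}

docs = ["data mining finds hidden correlations",
        "frequent itemset mining finds correlations in data",
        "summarization selects informative sentences"]
print(frequent_itemsets(docs, min_support=2))
```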
Highly representative and non-redundant sentences are selected to generate the summary by considering both sentence coverage, with respect to a tf-idf-based sentence relevance score, and a concise, highly informative itemset-based model. To evaluate ItemSum's performance, a suite of experiments on a collection of news articles was carried out. The results show that ItemSum significantly outperforms widely used previous summarizers in terms of precision, recall and F-measure. We also validated our approach against a large number of approaches on the DUC’04 document collection; performance comparisons, in terms of precision, recall and F-measure, were performed by means of the ROUGE toolkit, and in most cases ItemSum significantly outperforms the considered competitors. The impact of the main algorithm parameters and of the adopted model coverage strategy on summarization performance is investigated as well. In some cases the soundness and readability of the generated summaries are unsatisfactory, because the summaries do not cover all the semantically relevant data facets effectively. A step towards the generation of more accurate summaries has been made by semantics-based summarizers. Such approaches combine general-purpose summarization strategies with ad hoc linguistic analysis; the key idea is to also consider the semantics behind the document content, overcoming the limitations of general-purpose strategies in differentiating between sentences based on their actual meaning and context. Most previously proposed approaches perform the semantics-based analysis as a preprocessing step that precedes the main summarization process, so the generated summaries could not entirely reflect the actual meaning and context of the key document sentences. In contrast, we aim at tightly integrating ontology-based document analysis into the summarization process, taking the semantic meaning of the document content into account during sentence evaluation and selection. With this in mind, we propose a new multi-document summarizer, the Yago-based Summarizer, which integrates an established ontology-based entity recognition and disambiguation step. Named Entity Recognition based on the Yago ontology is used for the summarization task. The Named Entity Recognition (NER) task is concerned with marking mentions of specific objects, which are then classified into a set of predefined categories; standard categories include “person”, “location”, “geo-political organization”, “facility”, “organization” and “time”. The use of NER in text summarization improves the summarization process by increasing the rank of informative sentences. To demonstrate the effectiveness of the proposed approach, we compared its performance on the DUC’04 benchmark document collections with that of a large number of state-of-the-art summarizers. Furthermore, we performed a qualitative evaluation of the soundness and readability of the generated summaries and a comparison with the results produced by the most effective summarizers. A parallel effort has been devoted to integrating semantics-based models and the knowledge acquired from social networks into a document summarization model named SociONewSum.
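A hedged sketch of the ItemSum-style selection step described above: sentences are ranked by a tf-idf relevance score plus the number of not-yet-covered frequent itemsets they contain, and picked greedily. It assumes scikit-learn's TfidfVectorizer is available; the function name, scoring combination and example data are assumptions, not the published algorithm.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def greedy_summary(sentences, itemsets, k=2):
    """Greedy sentence selection combining a tf-idf relevance score with
    coverage of frequent itemsets (illustrative sketch, not ItemSum's
    exact scoring)."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    relevance = tfidf.sum(axis=1).A1          # tf-idf mass per sentence
    terms = [set(s.lower().split()) for s in sentences]
    uncovered, chosen = set(itemsets), []
    while len(chosen) < k and len(chosen) < len(sentences):
        def gain(i):
            cover = sum(1 for iset in uncovered if set(iset) <= terms[i])
            return relevance[i] + cover
        best = max((i for i in range(len(sentences)) if i not in chosen), key=gain)
        chosen.append(best)
        uncovered -= {iset for iset in uncovered if set(iset) <= terms[best]}
    return [sentences[i] for i in sorted(chosen)]

docs = ["data mining finds hidden correlations in transactional data",
        "frequent itemset mining is applied to document summarization",
        "the summarizer selects informative sentences"]
print(greedy_summary(docs, itemsets=[("data", "mining"), ("summarization",)], k=2))
```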
The effort addresses the sentence-based generic multi-document summarization problem, which can be formulated as follows: given a collection of news articles on the same topic, the goal is to extract a concise yet informative summary consisting of the most salient document sentences. An established ontological model is used to improve summarization performance by integrating a textual entity recognition and disambiguation step. Furthermore, the analysis of user-generated content coming from Twitter is exploited to discover current social trends and improve the appeal of the generated summaries. An experimental evaluation of SociONewSum was conducted on real English-written news article collections and Twitter posts. The results demonstrate the effectiveness of the proposed summarizer, in terms of different ROUGE scores, compared with state-of-the-art open-source summarizers as well as with a baseline version of SociONewSum that does not perform any user-generated content (UGC) analysis. The readability of the generated summaries has also been analysed.
Styles: APA, Harvard, Vancouver, ISO, etc.
35

DE, GIORGIS NIKOLAS. "Multi-scale techniques for multi-dimensional data analysis". Doctoral thesis, Università degli studi di Genova, 2018. http://hdl.handle.net/11567/931227.

Full text source
Abstract:
Large datasets of geometric data of various natures are becoming more and more available as sensors become cheaper and more widely used. Due to both their size and their noisy nature, special techniques must be employed to deal with them correctly. In order to handle this amount of data efficiently and to tackle the technical challenges it poses, we propose techniques that analyse a scalar signal by means of its critical points (i.e. maxima and minima), ranking them on a scale of importance; from this ranking we can extract the important information of the input signal and separate it from noise, thus dramatically reducing the complexity of the problem. In order to obtain a ranking of critical points we employ multi-scale techniques. The standard scale-space approach, however, is not sufficient when trying to track critical points across scales. We start from an implementation of the scale-space which computes a linear interpolation between scales in order to make the tracking of critical points easier. The linear interpolation of a process which is not itself linear, though, does not fulfil some theoretical properties of scale-space, making the tracking of critical points much harder. We propose an extension of this piecewise-linear scale-space implementation which recovers the theoretical properties (e.g., avoiding the generation of new critical points as the scale increases) and keeps the tracking consistent. Next, we combine the scale-space with another technique that comes from topology: the classification of critical points based on their persistence value. While the scale-space applies a filtering in the frequency domain, progressively smoothing the input signal with low-pass filters of increasing size, the computation of persistence can be seen as a filtering applied in the amplitude domain, which progressively removes pairs of critical points based on their difference in amplitude. The two techniques, while both related to the concept of scale, express different qualities of the critical points of the input signal; depending on the application domain we can use either of them or, since they both have non-zero values only at critical points, combine them linearly. The thesis will be structured as follows. In Chapter 1 we will present an overview of the problem of analysing huge geometric datasets, focusing on dealing with their size and noise and on reducing the problem to a subset of relevant samples. Chapter 2 will contain a study of the state of the art in scale-space algorithms, followed by a more in-depth analysis of the virtually continuous framework used as the base technique; in its last part, we will propose methods to extend these techniques so that they satisfy the axioms of the continuous scale-space and provide a stronger and more reliable tracking of critical points across scales, and we will present the extraction of the persistence of the critical points of a signal as a variant of the standard scale-space approach, showing the differences between the two and discussing how to combine them. Chapter 3 will introduce an ever-growing source of data, motion capture systems; we will motivate their importance by discussing the many applications in which they have been used over the past two decades. We will briefly summarize the different existing systems and then focus on a particular one, discussing its peculiarities and its output data.
In Chapter 4, we will discuss the problem of studying intra-personal synchronization computed on data coming from such motion-capture systems. We will show how multi-scale approaches can be used to identify relevant instants in the motion and how these instants can be used to study precisely the synchronization between the different parts of the body from which they are extracted. We will apply these techniques to the problem of building a classifier that discriminates between martial artists of different skill levels who were recorded performing karate movements. Chapter 5 will present work on the automatic detection of relevant points of the human face from 3D data. We will show that the Gaussian curvature of the 3D surface is a good feature for distinguishing the so-called fiducial points, but also that multi-scale techniques must be used to extract only the relevant points and get rid of the noise. In closing, Chapter 6 will discuss ongoing work on motion segmentation; after an introduction on the meaning and the different possibilities of motion segmentation, we will present the data we work with, the approach used to identify segments, and some preliminary tools and results.
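A minimal illustration of the scale-space idea underlying the thesis: smoothing a noisy signal with Gaussian filters of increasing width removes the spurious critical points while the salient ones survive. The SciPy-based sketch below is a plain Gaussian scale-space with synthetic data, not the piecewise-linear implementation or the persistence computation developed in the thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def critical_points(signal):
    """Indices of strict local maxima and minima of a 1-D signal."""
    d = np.diff(signal)
    return [i + 1 for i in range(len(d) - 1) if d[i] * d[i + 1] < 0]

# Noisy sine: fine scales keep many spurious extrema, coarse scales keep
# only the meaningful ones (a simple stand-in for the importance ranking).
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 400)) + 0.2 * rng.standard_normal(400)
for sigma in (0, 2, 8, 32):
    smoothed = x if sigma == 0 else gaussian_filter1d(x, sigma)
    print(f"sigma={sigma:>2}: {len(critical_points(smoothed))} critical points")
```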
Styles: APA, Harvard, Vancouver, ISO, etc.
36

Wong, Kok W. "A neural fuzzy approach for well log and hydrocyclone data interpretation". Thesis, Curtin University, 1999. http://hdl.handle.net/20.500.11937/1281.

Full text source
Abstract:
A novel data analysis approach that is automatic, self-learning and self-explained, and which provides accurate and reliable results, is reported. The data analysis tool is capable of performing multivariate non-parametric regression analysis, as well as quantitative inferential analysis using predictive learning. Statistical approaches such as multiple regression or discriminant analysis are usually used to perform this kind of analysis; however, they lack universal capabilities and their success in any particular application is directly affected by the problem complexity. The approach employs Artificial Neural Networks (ANNs) and Fuzzy Logic to perform the data analysis. The features of these two techniques are the means by which the developed data analysis approach is able to perform self-learning as well as allowing user interaction in the learning process. Further, they offer a means by which rules may be generated to assist human understanding of the learned analysis model, enabling an analyst to include external knowledge. Two problems in the resource industry have been used to illustrate the proposed method, as these applications contain non-linearity in the data that is unknown and difficult to derive: well log data analysis in petroleum exploration and hydrocyclone data analysis in mineral processing. This research also explores how the proposed data analysis approach could enhance the analysis process for problems of this type.
Styles: APA, Harvard, Vancouver, ISO, etc.
37

Infantosi, Antonio Fernando Catelli. "Interpretation of case occurrences in two communicable diseases using pattern-analysis techniques". Thesis, Imperial College London, 1986. http://hdl.handle.net/10044/1/38047.

Full text source
Styles: APA, Harvard, Vancouver, ISO, etc.
38

Bossers, Lydia C. A. M. "Alcohol markers in hair : new detection techniques and evidence interpretation". Thesis, University of South Wales, 2014. https://pure.southwales.ac.uk/en/studentthesis/alcohol-markers-in-hair(967fb3a2-2257-49ce-b861-d0105b2150be).html.

Full text source
Abstract:
It can be useful to determine a person's chronic alcohol consumption in child custody cases and to aid in the diagnosis of conditions such as fetal alcohol spectrum disorder. When a single alcohol marker in hair is analysed to indicate chronic use, false negatives and false positives can occur. When two markers (ethyl glucuronide (EtG) and fatty acid ethyl esters (FAEEs)) are analysed, false negatives and false positives can be recognized, providing stronger evidence, as is underlined statistically in this work. For a combined method, the sample preparation and analytical procedures were optimized. The effect of the decontamination step was difficult to interpret, which shows that addressing issues with external contamination is challenging: analytes may be extracted from the hair matrix during decontamination, and analytes can diffuse into the hair shaft from external contamination. The latter is illustrated by the incorporation, via excretions, of endogenous EtG and FAEEs. A novel and sensitive analytical procedure was developed and validated which saves time, and possibly money, compared with analysing both markers separately. The best overall method had a linear calibration curve (r² > 0.99) and an intra-day (n=3) and inter-day (n=9) accuracy for the quality control samples at three concentration levels between 84–118%, with a coefficient of variation of 3–30% for both EtG and the FAEEs. The Bayesian approach was suggested as a new interpretation framework for hair tests, to account for the uncertainties in these tests in a transparent manner. In this work, databases were constructed with EtG and FAEEs hair concentrations linked to the subjects' chronic alcohol use, likelihood ratios were calculated and working examples were provided. This showed that a positive hair test for either EtG or FAEEs may well constitute only 'limited' evidence and therefore should only be used with high prior odds; in other words, a hair test result should not be used in isolation. The large confidence interval in this study also underlines the need for more control data.
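The Bayesian framework mentioned above rests on the odds form of Bayes' rule, posterior odds = likelihood ratio × prior odds. The numbers in the sketch below are purely illustrative and are not taken from the thesis' databases.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = LR * prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    return odds / (1.0 + odds)

# Illustrative numbers only (not the thesis' database values): a hair test
# with a modest LR of 4 moves a 1:3 prior to 4:3, i.e. about 57% --
# 'limited' evidence that should not be used in isolation.
prior = 1 / 3
post = posterior_odds(prior, likelihood_ratio=4.0)
print(round(odds_to_probability(post), 2))
```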
Styles: APA, Harvard, Vancouver, ISO, etc.
39

Patrick, Ellis. "Statistical methods for the analysis and interpretation of RNA-Seq data". Thesis, The University of Sydney, 2013. http://hdl.handle.net/2123/10438.

Full text source
Abstract:
In the post-genomic era, sequencing technologies have become a vital tool in the global analysis of biological systems. RNA-Seq, the sequencing of messenger RNA, in particular has the potential to answer many diverse and interesting questions about the inner workings of cells. Despite the decreasing cost of sequencing, the majority of RNA-Seq experiments still suffer from low replication. The statistical methodology for dealing with low-replicate RNA-Seq experiments is still in its infancy and has room for further development. Incorporating additional information from publicly accessible databases may provide a plausible avenue for overcoming the shortcomings of low replication: not only could this additional information improve the ability to find statistically significant signal, but the signal should also be more biologically interpretable. This thesis is separated into three distinct statistical problems that arise when processing and analysing RNA-Seq data. First, the use of experimental data to customise gene annotations is proposed; when customised annotations are used to summarise read counts, the corresponding measures of transcript abundance include more information than alternative summarisation approaches and offer improved concordance with qRT-PCR data. A moderation methodology that exploits external estimates of variation is then developed to address small-sample differential expression analysis; this approach performs favourably against existing approaches when comparing gene rankings and sensitivity. With the aim of identifying groups of miRNA-mRNA regulatory relationships, a framework for integrating various databases of prior knowledge with small-sample miRNA-Seq and mRNA-Seq data is then outlined; this framework appears to identify more signal than simpler approaches and also provides highly interpretable models of miRNA-mRNA regulation. To conclude, a small-sample miRNA-Seq and mRNA-Seq experiment is presented that seeks to discover miRNA-mRNA regulatory relationships associated with loss of Notch2 function and its links to neurodegeneration; this experiment is used to illustrate the methodologies developed in this thesis.
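As a generic illustration of moderation with external variance estimates (the specific methodology is developed in the thesis and differs in detail), the sketch below shrinks per-gene sample variances toward an external estimate, with the external estimate dominating when replication is low; the weighting scheme and numbers are assumptions.

```python
import numpy as np

def moderated_variance(counts, external_var, prior_weight=4.0):
    """Shrink per-gene sample variances toward external estimates.
    The weighted-average form below is a generic illustration, not the
    specific moderation derived in the thesis."""
    n = counts.shape[1]                      # number of replicates
    sample_var = counts.var(axis=1, ddof=1)  # per-gene sample variance
    w = prior_weight / (prior_weight + (n - 1))
    return w * external_var + (1.0 - w) * sample_var

# Two replicates per gene: the external estimate dominates, as it should
# when replication is this low.
counts = np.array([[5.0, 7.0], [100.0, 140.0]])
external = np.array([2.0, 900.0])
print(moderated_variance(counts, external))
```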
Styles: APA, Harvard, Vancouver, ISO, etc.
40

Campbell, Jonathan G. "Fuzzy logic and neural network techniques in data analysis". Thesis, University of Ulster, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.342530.

Full text source
Styles: APA, Harvard, Vancouver, ISO, etc.
41

Palmer, Nathan Patrick. "Data mining techniques for large-scale gene expression analysis". Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/68493.

Full text source
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 238-256).
Modern computational biology is awash in large-scale data mining problems. Several high-throughput technologies have been developed that enable us, with relative ease and little expense, to evaluate the coordinated expression levels of tens of thousands of genes, evaluate hundreds of thousands of single-nucleotide polymorphisms, and sequence individual genomes. The data produced by these assays have provided the research and commercial communities with the opportunity to derive improved clinical prognostic indicators, as well as develop an understanding, at the molecular level, of the systemic underpinnings of a variety of diseases. Aside from the statistical methods used to evaluate these assays, another, more subtle challenge is emerging. Despite the explosive growth in the amount of data being generated and submitted to the various publicly available data repositories, very little attention has been paid to managing the phenotypic characterization of their samples (i.e., managing class labels in a controlled fashion). If sense is to be made of the underlying assay data, the samples' descriptive metadata must first be standardized in a machine-readable format. In this thesis, we explore these issues, specifically within the context of curating and analyzing a large DNA microarray database. We address three main challenges. First, we acquire a large subset of a publicly available microarray repository and develop a principled method for extracting phenotype information from free-text sample labels, then use that information to generate an index of the sample's medically relevant annotation. The indexing method we develop, Concordia, incorporates pre-existing expert knowledge relating to the hierarchical relationships between medical terms, allowing queries of arbitrary specificity to be efficiently answered. Second, we describe a highly flexible approach to answering the question: "Given a previously unseen gene expression sample, how can we compute its similarity to all of the labeled samples in our database, and how can we utilize those similarity scores to predict the phenotype of the new sample?" Third, we describe a method for identifying phenotype-specific transcriptional profiles within the context of this database, and explore a method for measuring the relative strength of those signatures across the rest of the database, allowing us to identify molecular signatures that are shared across various tissues and diseases. These shared fingerprints may form a quantitative basis for optimal therapy selection and drug repositioning for a variety of diseases.
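The hierarchical indexing idea behind Concordia can be pictured with a toy sketch that stores each sample under its annotation term and all ancestor terms, so a query at any level of specificity finds the relevant samples; the term hierarchy, sample identifiers and helper names here are hypothetical, not the ontology or code used in the thesis.

```python
from collections import defaultdict

# Toy hierarchy of medical terms: child -> parent (hypothetical, not the
# ontology used by Concordia).
PARENT = {"acute myeloid leukemia": "leukemia",
          "leukemia": "cancer",
          "breast carcinoma": "cancer"}

def ancestors(term):
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def build_index(samples):
    """Index every sample under its annotation and all ancestor terms so
    that queries of arbitrary specificity can be answered efficiently."""
    index = defaultdict(set)
    for sample_id, term in samples.items():
        for t in ancestors(term):
            index[t].add(sample_id)
    return index

idx = build_index({"GSM1": "acute myeloid leukemia", "GSM2": "breast carcinoma"})
print(sorted(idx["cancer"]))    # both samples
print(sorted(idx["leukemia"]))  # only GSM1
```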
by Nathan Patrick Palmer.
Ph.D.
Styles: APA, Harvard, Vancouver, ISO, etc.
42

Wong, Kok W. "A neural fuzzy approach for well log and hydrocyclone data interpretation". Curtin University of Technology, School of Electrical and Computer Engineering, 1999. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=10344.

Full text source
Abstract:
A novel data analysis approach that is automatic, self-learning and self-explained, and which provides accurate and reliable results, is reported. The data analysis tool is capable of performing multivariate non-parametric regression analysis, as well as quantitative inferential analysis using predictive learning. Statistical approaches such as multiple regression or discriminant analysis are usually used to perform this kind of analysis; however, they lack universal capabilities and their success in any particular application is directly affected by the problem complexity. The approach employs Artificial Neural Networks (ANNs) and Fuzzy Logic to perform the data analysis. The features of these two techniques are the means by which the developed data analysis approach is able to perform self-learning as well as allowing user interaction in the learning process. Further, they offer a means by which rules may be generated to assist human understanding of the learned analysis model, enabling an analyst to include external knowledge. Two problems in the resource industry have been used to illustrate the proposed method, as these applications contain non-linearity in the data that is unknown and difficult to derive: well log data analysis in petroleum exploration and hydrocyclone data analysis in mineral processing. This research also explores how the proposed data analysis approach could enhance the analysis process for problems of this type.
Styles: APA, Harvard, Vancouver, ISO, etc.
43

Zarei, Mahdi. "Development and evaluation of optimization based data mining techniques analysis of brain data". Thesis, Federation University Australia, 2015. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/97229.

Full text source
Abstract:
Neuroscience is an interdisciplinary science concerned with the structure and function of the brain and nervous system, and it encompasses disciplines such as computer science, mathematics, engineering and linguistics. The structure of the healthy brain and the representation of information by neural activity are among the most challenging problems in neuroscience. Neuroscience is experiencing exponentially growing volumes of data obtained using different technologies, and the investigation of such data has a tremendous impact on developing new, and improving existing, models of both healthy and diseased brains. Various techniques have been used for collecting brain data sets to address neuroscience problems. These data sets can be categorized into two main groups: resting-state and state-dependent data sets. Resting-state data are recorded while a subject does not think about any specific concept, whereas state-dependent data record brain activity related to specific tasks. In general, brain data sets contain a large number of features (e.g. tens of thousands) and significantly fewer samples (e.g. several hundred); such data sets are sparse and noisy and, in addition, typically involve a small number of subjects. Brains are very complex systems, and data about any brain activity reflect very complex relationships between neurons as well as between different parts of the brain. Such relationships are highly nonlinear, and general-purpose data mining algorithms are not always efficient for their study. The development of machine learning techniques for brain data sets is an emerging research area in neuroscience. Over the last decade, various machine learning techniques have been developed for application to brain data sets, and some well-known algorithms such as feature selection and supervised classification have been modified for the analysis of brain data sets. Support vector machines, logistic regression and Gaussian Naive Bayes classifiers are widely applied to brain data sets; however, support vector machines and logistic regression are not efficient for sparse and noisy data sets, and Gaussian Naive Bayes classifiers do not give high accuracy. The aim of this study is to develop new, and modify existing, data mining algorithms for the analysis of brain data sets. The contributions of this thesis are as follows:
1. Development of new algorithms:
1.1. New voxel (feature) selection algorithms for functional magnetic resonance imaging (fMRI) data sets, evaluated on the Haxby and Science 2008 data sets.
1.2. A new feature selection algorithm based on the catastrophe model for regression analysis problems.
2. Development and evaluation of different versions of the adaptive neuro-fuzzy model for the analysis of the spike discharge as a function of other neuronal parameters.
3. Development and evaluation of the modified global k-means clustering algorithm for investigation of the structure of the healthy brain.
4. Development and evaluation of a region of interest (ROI) method for the analysis of brain functional connectivity in healthy subjects and schizophrenia patients.
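A minimal sketch of the region-of-interest connectivity analysis listed as the fourth contribution (illustrative only, not the thesis' pipeline): voxel time series are averaged within each ROI and the ROI-level series are then correlated. The ROI labels and synthetic data are hypothetical.

```python
import numpy as np

def roi_connectivity(voxel_ts, roi_labels):
    """Average the time series of the voxels belonging to each region of
    interest, then return the matrix of pairwise Pearson correlations
    between the region-level series."""
    regions = sorted(set(roi_labels))
    roi_ts = np.vstack([voxel_ts[np.array(roi_labels) == r].mean(axis=0)
                        for r in regions])
    return regions, np.corrcoef(roi_ts)

# 6 voxels x 50 time points, assigned to 3 hypothetical ROIs
rng = np.random.default_rng(1)
ts = rng.standard_normal((6, 50))
labels = ["visual", "visual", "motor", "motor", "frontal", "frontal"]
regions, conn = roi_connectivity(ts, labels)
print(regions)
print(conn.round(2))
```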
Doctor of Philosophy
Styles: APA, Harvard, Vancouver, ISO, etc.
44

Abdalla, Taysir. "Performance analysis of disk mirroring techniques". FIU Digital Commons, 1994. http://digitalcommons.fiu.edu/etd/1061.

Full text source
Abstract:
Unequal improvements in processor and I/O speeds have made many applications, such as databases and operating systems, increasingly I/O bound. Many schemes, such as disk caching and disk mirroring, have been proposed to address the problem; in this thesis we focus only on disk mirroring. In disk mirroring, a logical disk image is maintained on two physical disks, allowing a single disk failure to be transparent to application programs. Although disk mirroring improves data availability and reliability, it has two major drawbacks. First, writes are expensive because both disks must be updated. Second, load balancing during failure-mode operation is poor because all requests are serviced by the surviving disk. The distorted-mirrors scheme was proposed to address the write problem, and interleaved declustering to address the load-balancing problem. In this thesis we perform a comparative study of these two schemes under various operating modes; we also study traditional mirroring to provide a common basis for comparison.
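A toy cost model makes the trade-offs studied in the thesis concrete: writes hit both disks of a mirrored pair, reads are split while both disks are up, and all traffic falls on the survivor after a failure. This is an illustration of the problem setting, not the thesis' simulation model.

```python
def mirrored_load(reads, writes, failed=False):
    """Toy per-disk request counts for a mirrored pair: writes go to both
    disks, reads are split evenly while both disks are up and fall on the
    survivor after a failure (purely illustrative)."""
    if failed:
        return {"disk_a": reads + writes, "disk_b": 0}
    return {"disk_a": reads / 2 + writes, "disk_b": reads / 2 + writes}

print(mirrored_load(reads=1000, writes=200))               # balanced operation
print(mirrored_load(reads=1000, writes=200, failed=True))  # survivor takes everything
```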
Styles: APA, Harvard, Vancouver, ISO, etc.
45

Ferguson, Alexander B. "Higher order strictness analysis by abstract interpretation over finite domains". Thesis, University of Glasgow, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308143.

Full text source
Styles: APA, Harvard, Vancouver, ISO, etc.
46

Tobeck, Daniel. "Data Structures and Reduction Techniques for Fire Tests". Thesis, University of Canterbury. Civil Engineering, 2007. http://hdl.handle.net/10092/1578.

Full text source
Abstract:
To perform fire engineering analysis, data on how an object or group of objects burns is almost always needed. This data should be collected and stored in a logical and complete fashion to allow meaningful analysis later. This thesis details the design of a new fire test Data Base Management System (DBMS), termed UCFIRE, which was built to overcome the limitations of existing fire test DBMSs and was based primarily on the FDMS 2.0 and FIREBASEXML specifications. The UCFIRE DBMS is currently the most comprehensive and extensible DBMS available in the fire engineering community and can store the following test types: Cone Calorimeter, Furniture Calorimeter, Room/Corner Test, LIFT and Ignitability Apparatus tests. Any data reduction performed on this fire test data should be done in an entirely mechanistic fashion rather than relying on human intuition, which is subjective. Currently no other DBMS allows for the semi-automation of the data reduction process. A number of pertinent data reduction algorithms were investigated and incorporated into the UCFIRE DBMS. An ASP.NET Web Service (WEBFIRE) was built to reduce the bandwidth required to exchange fire test information between the UCFIRE DBMS and a UCFIRE document stored on a web server. A number of Mass Loss Rate (MLR) algorithms were investigated, and the Savitzky-Golay filtering algorithm was found to offer the best performance. This algorithm had to be further modified to autonomously filter other noisy events that occurred during the fire tests, and it was then evaluated on test data from exemplar Furniture Calorimeter and Cone Calorimeter tests. The LIFT test standard (ASTM E 1321-97a) requires its ignition and flame spread data to be scrutinised but does not state how to do this; to meet this requirement, the fundamentals of linear regression were reviewed and an algorithm to mechanistically scrutinise ignition and flame spread data was developed. This algorithm seemed to produce reasonable results when used on exemplar ignition and flame spread test data.
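The mass-loss-rate computation described above can be approximated with SciPy's Savitzky-Golay filter by taking the smoothed first derivative of the mass signal; the window length, polynomial order and synthetic burn data below are illustrative assumptions, and the thesis further modifies the filter to handle other noisy events autonomously.

```python
import numpy as np
from scipy.signal import savgol_filter

def mass_loss_rate(mass, dt, window=11, polyorder=3):
    """Estimate the mass loss rate (kg/s) from a noisy mass signal by
    taking the first derivative of a Savitzky-Golay fit; the negative
    sign turns mass decrease into a positive loss rate."""
    return -savgol_filter(mass, window_length=window, polyorder=polyorder,
                          deriv=1, delta=dt)

# Synthetic burn: exponential mass decay plus measurement noise, sampled at 1 Hz
t = np.arange(0, 120, 1.0)
mass = 5.0 * np.exp(-t / 60.0) + 0.01 * np.random.default_rng(2).standard_normal(t.size)
mlr = mass_loss_rate(mass, dt=1.0)
print(mlr[:5].round(4))
```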
Styles: APA, Harvard, Vancouver, ISO, etc.
47

Sloan, Lauren Elizabeth. "Methods for analysis of missing data using simulated longitudinal data with a binary outcome". Oklahoma City : [s.n.], 2005.

Find full text source
Styles: APA, Harvard, Vancouver, ISO, etc.
48

Nyumbeka, Dumisani Joshua. "Using data analysis and Information visualization techniques to support the effective analysis of large financial data sets". Thesis, Nelson Mandela Metropolitan University, 2016. http://hdl.handle.net/10948/12983.

Full text source
Abstract:
There have been a number of technological advances in the last ten years, which have resulted in the amount of data generated in organisations increasing by more than 200% during this period. This rapid increase means that if financial institutions are to derive significant value from this data, they need to identify new ways to analyse it effectively. Due to the considerable size of the data, financial institutions also need to consider how to visualise it effectively. Traditional tools such as relational database management systems have problems processing large amounts of data due to memory constraints, latency issues and the presence of both structured and unstructured data. The aim of this research was to use data analysis and information visualisation (IV) techniques to support the effective analysis of large financial data sets. In order to analyse the data visually and effectively, the underlying data model must produce results that are reliable. A large financial data set was identified and used to demonstrate that IV techniques can support the effective analysis of large financial data sets. A review of the literature on large financial data sets, visual analytics, and existing data management and data visualisation tools identified the shortcomings of existing tools; this led to the determination of the requirements for the data management tool and the IV tool. The data management tool identified was a data warehouse and the IV toolkit identified was Tableau; the IV techniques identified included the Overview, Dashboards and Colour Blending. The IV tool was implemented and published online and can be accessed through a web browser interface. The data warehouse and the IV tool were evaluated to determine their accuracy and effectiveness in supporting the effective analysis of the large financial data set. The experiment used to evaluate the data warehouse yielded positive results, showing that only about 4% of the records had incorrect data. The results of the user study were positive and no major usability issues were identified; the participants found the IV techniques effective for analysing the large financial data set.
Styles: APA, Harvard, Vancouver, ISO, etc.
49

Williams, Saunya Michelle. "Effects of image compression on data interpretation for telepathology". Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42762.

Full text source
Abstract:
When geographical distance poses a barrier, telepathology is designed to offer pathologists the opportunity to replicate their normal activities by using an alternative means of practice. Rapid progress in technology has greatly influenced the appeal of telepathology and its use in multiple domains. Telepathology systems help to provide teleconsultation services for remote locations, improve workload distribution in clinical environments, support quality assurance, and enhance educational programs. While telepathology is an attractive method for many potential users, the resource requirements for digitizing microscopic specimens have hindered widespread adoption. The use of image compression is therefore critical to help advance the pervasiveness of digital images in pathology. For this research, we characterize two different methods that we use to assess compression of pathology images. The first method relies on human judgement of image quality and is completely subjective in terms of interpretation. The second method introduces image analysis, using machine-based interpretation to provide objective results; these objective outcomes may also be used to help confirm tumor classification. With these two methods in mind, the purpose of this dissertation is to quantify the effects of image compression on data interpretation as seen by human experts and by a computerized algorithm for use in telepathology.
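The machine-based, objective assessment of compression effects can be illustrated with standard full-reference image-quality metrics such as PSNR and SSIM from scikit-image; the synthetic data and the choice of metrics below are assumptions made for illustration, not the dissertation's specific analysis algorithm.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Synthetic stand-in for a pathology image tile and its lossy-compressed
# version (illustration only; the dissertation uses real slide images and
# its own analysis algorithm).
rng = np.random.default_rng(3)
original = rng.random((128, 128))
compressed = np.clip(original + 0.02 * rng.standard_normal((128, 128)), 0.0, 1.0)

print("PSNR:", round(peak_signal_noise_ratio(original, compressed, data_range=1.0), 2))
print("SSIM:", round(structural_similarity(original, compressed, data_range=1.0), 3))
```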
Styles: APA, Harvard, Vancouver, ISO, etc.
50

Turville, Christopher, University of Western Sydney, Faculty of Informatics, Science and Technology. "Techniques to handle missing values in a factor analysis". THESIS_FIST_XXX_Turville_C.xml, 2000. http://handle.uws.edu.au:8081/1959.7/395.

Full text source
Abstract:
A factor analysis typically involves a large collection of data, and it is common for some of the data to be unrecorded. This study investigates the ability of several techniques to handle missing values in a factor analysis, including complete cases only, all available cases, imputing means, an iterative component method, singular value decomposition and the EM algorithm. A data set representative of those used for a factor analysis is simulated; some of the data are then randomly removed to represent missing values, and the performance of the techniques is investigated over a wide range of conditions. Several criteria are used to assess the abilities of the techniques to handle missing values in a factor analysis. Overall, no one technique performs best for all of the conditions studied. The EM algorithm is generally the most effective technique, except when ill-conditioned matrices are present or when computing time is of concern. Some theoretical concerns are introduced regarding the effects that changes in the correlation matrix have on the loadings of a factor analysis. A complicated expression is derived showing that the change in factor loadings resulting from a change in the elements of a correlation matrix involves components of eigenvectors and eigenvalues.
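One of the families of techniques compared above, imputation based on singular value decomposition, can be sketched as an iterative low-rank reconstruction: fill the missing cells with column means, then repeatedly replace them with the values of a truncated-SVD approximation. The rank, iteration count and toy data below are illustrative choices, not the thesis' experimental setup.

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50):
    """Fill missing values (NaN) by iterating a low-rank SVD
    reconstruction: start from column-mean imputation, then repeatedly
    overwrite the missing cells with the truncated-SVD approximation."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[missing] = approx[missing]
    return X

data = np.array([[1.0, 2.0, np.nan],
                 [2.0, np.nan, 6.0],
                 [3.0, 6.0, 9.0],
                 [4.0, 8.0, 12.0]])
print(svd_impute(data).round(2))
```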
Doctor of Philosophy (PhD)
Styles: APA, Harvard, Vancouver, ISO, etc.

To bibliography