Dissertations / Theses on the topic 'Data Importance'

Consult the top 50 dissertations / theses for your research on the topic 'Data Importance.'

1

Törnqvist, Christian. "Evaluating the Importance of Disk-locality for Data Analytics Workloads." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-410212.

Abstract:
Designing on-premise hardware platforms for big data analytics should be done in a way that allows the available resources to be scaled both up and down depending on future needs. Two of the main components of an analytics cluster are the data storage and the computational parts. Separating these two components yields great value but can come at the price of performance loss if not set up properly. The objective of this thesis is to examine how much performance is impacted when the computational and storage parts are divided across different hardware nodes. To get data on how well this separation could be done, several tests were conducted on different hardware setups. These tests included real-world workloads run on configurations where both storage and computation took place on the same nodes and on configurations where these components were separated. While those tests were done on a smaller scale with only three compute nodes in parallel, tests with similar workloads were also conducted on a larger scale with up to 32 computational nodes. The tests revealed that separating compute from storage on a smaller scale could be done without any significant performance drawbacks. However, when the computational components grew large enough, bottlenecks in the storage cluster surfaced. While the results on a smaller scale were satisfactory, further improvements could be made for the larger-scale tests.
2

Stephens, Joshua J. "Data Governance Importance and Effectiveness: Health System Employee Perception." Thesis, Central Michigan University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10751061.

Abstract:

The focus of this study was to understand how health system employees define Data Governance (DG), how they perceive its importance and effectiveness to their role and how it may impact strategic outcomes of the organization. Having a better understanding of employee perceptions will help identify areas of education, process improvement and opportunities for more structured data governance within the healthcare industry. Additionally, understanding how employees associate each of these domains to strategic outcomes, will help inform decision-makers on how best to align the Data Governance strategy with that of the organization.

This research is intended to expand the data governance community’s knowledge about how health system employee demographics influence their perceptions of Data Governance. Very little academic research has been done to-date, which is unfortunate given the value of employee engagement to an organization’s culture juxtaposed to the intent of Data Governance to change that culture into one that fully realizes the value of its data and treats it as a corporate asset. This lack of understanding leads to two distinct problems: executive resistance toward starting a Data Governance Program due to the lack of association between organizational strategic outcomes and Data Governance, and employee, or cultural, resistance to the change Data Governance brings to employee roles and processes.

The dataset for this research was provided by a large mid-west health system’s Enterprise Data Governance Program and was collected internally through an electronic survey. A mixed methods approach was taken. The first analysis intended to see how employees varied in their understanding of the definition of data governance as represented by the Data Management Association’s DAMA Wheel. The last three research questions focused on determining which factors influence a health system employee’s perception of the importance, effectiveness, and impact Data Governance has on their role and on the organization.

Perceptions on the definition of Data Governance varied slightly for Gender, Management Role, IT Role, and Role Tenure, and the thematic analysis identified a lack of understanding of Data Governance by health system employees. Perceptions of Data Governance importance and effectiveness varied by participants’ gender, and organizational role as part of analytics, IT, and Management. In general, employees perceive a deficit of data governance to their role based on their perceptions of importance and effectiveness. Lastly, employee perceptions of the impact of Data Governance on strategic outcomes varied among participants by gender for Cost of Care and by Analytics Role for Quality of Analytics. For both Quality of Care and Patient Experience, perceptions did not vary.

Perceptions related to the impact of Data Governance on strategic outcomes found that Data Quality Management was most impactful to all four strategic outcomes included in the study: quality of care, cost of care, patient experience, and quality of analytics. Leveraging the results of this study to tailor communication, education and training, and roles and responsibilities required for a successful implementation of Data Governance in healthcare should be considered by DG practitioners and executive leadership implementing or evaluating a DG Program within a healthcare organization. Additionally, understanding employee perceptions of Data Governance and their impact to strategic outcomes will provide meaningful insight to executive leadership who have difficulty connecting the cost of Data Governance to the value realization, which is moving the organization closer to achieving the Triple Aim by benefiting from their data.

3

Bordoloi, Udeepta Dutta. "Importance-driven algorithms for scientific visualization." Connect to this title online, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1118952958.

Abstract:
Thesis (Ph. D.)--Ohio State University, 2005.
Title from first page of PDF file. Document formatted into pages; contains xiv, 126 p.; also includes graphics. Includes bibliographical references (p. 119-126). Available online via OhioLINK's ETD Center
4

Northrop, Amanda Rosalind. "Importance of various data sources in deterministic stock assessment models." Thesis, Rhodes University, 2008. http://hdl.handle.net/10962/d1002811.

Abstract:
In fisheries, advice for the management of fish populations is based upon management quantities that are estimated by stock assessment models. Fisheries stock assessment is a process in which data collected from a fish population are used to generate a model which enables the effects of fishing on a stock to be quantified. This study determined the effects of various data sources, assumptions, error scenarios and sample sizes on the accuracy with which the age-structured production model and the Schaefer model (assessment models) were able to estimate key management quantities for a fish resource similar to the Cape hakes (Merluccius capensis and M. paradoxus). An age-structured production model was used as the operating model to simulate hypothetical fish resource population dynamics for which management quantities could be determined by the assessment models. Different stocks were simulated with various harvest rate histories. These harvest rates produced Downhill trip data, where harvest rates increase over time until the resource is close to collapse, and Good contrast data, where the harvest rate increases over time until the resource is at less than half of its exploitable biomass, and then decreases, allowing the resource to rebuild. The accuracy of the assessment models was determined when data were drawn from the operating model with various combinations of error. The age-structured production model was more accurate at estimating maximum sustainable yield, maximum sustainable yield level and the maximum sustainable yield ratio. The Schaefer model gave more accurate estimates of Depletion and Total Allowable Catch. While the assessment models were able to estimate management quantities using Downhill trip data, the estimates improved significantly when the models were tuned with Good contrast data. When autocorrelation in the spawner-recruit curve was not accounted for by the deterministic assessment model, inaccuracy in parameter estimates was high. The assessment model management quantities were not greatly affected by multinomial ageing error in the catch-at-age matrices at a sample size of 5000 otoliths. Assessment model estimates were closer to their true values when log-normal error was assumed in the catch-at-age matrix, even when the true underlying error was multinomial. However, the multinomial had smaller coefficients of variation at all sample sizes, between 1000 and 10000, of otoliths aged. It was recommended that the assessment model be chosen based on the management quantity of interest. When the underlying error is multinomial, the weighted log-normal likelihood function should be used in the catch-at-age matrix to obtain accurate parameter estimates. However, the multinomial likelihood should be used to minimise the coefficient of variation. Investigation into correcting for autocorrelation in the stock-recruitment relationship should be carried out, as it had a large effect on the accuracy of management quantities.
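The Schaefer model mentioned above is the classic surplus-production (biomass dynamics) model. The sketch below is a generic, minimal illustration of how such a model projects biomass under a catch history and implies a maximum sustainable yield of rK/4; the parameter values are invented for illustration and are not estimates for the Cape hake resource or results from this thesis.

```python
import numpy as np

def schaefer_projection(b0, r, k, catches):
    """Project biomass under the Schaefer surplus-production model:
    B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t]."""
    biomass = [b0]
    for c in catches:
        b = biomass[-1]
        biomass.append(max(b + r * b * (1.0 - b / k) - c, 0.0))
    return np.array(biomass)

# Illustrative values only (not estimates for Cape hake).
r, k = 0.4, 1000.0           # intrinsic growth rate, carrying capacity
msy = r * k / 4.0            # maximum sustainable yield implied by this model
trajectory = schaefer_projection(b0=800.0, r=r, k=k, catches=[60.0] * 20)
print(f"MSY = {msy:.1f}, final biomass = {trajectory[-1]:.1f}")
```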
5

Wan, Shuyan. "Likelihood-based procedures for obtaining confidence intervals of disease Loci with general pedigree data." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1164815591.

6

Matthäus, Antje, and Markus Dammers. "Computational underground short-term mine planning: the importance of real-time data." Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2018. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-231345.

Abstract:
Short-term mine plans are the key operational basis for ore production targets ranging from shift to weekly or monthly targets. Short-term plans cover detailed operational subprocesses such as development, extraction and backfill schedules as well as materials handling and blending processes. The aim is to make long-term goals feasible by providing a constant plant feed that complies with quality constraints. Short-term mine planning highly depends on the accuracy of the resource model as well as the current production status and equipment fleet. Most of these parameters are characterized by uncertainties due to a lack of information and equipment reliability. At the same time, concentrate production and quality must be kept within acceptable ranges to ensure productivity and economic viability of the operation. Within the EU-funded Real-Time Mining project, the reduction of uncertainty in mine planning is carried out by using real-time data. Ore and rock characteristics of active faces and equipment data are iteratively integrated in a simulation-based optimization tool. Therefore, predicted processing plant efficiencies can be met by delivering constant ore grades. Hence, a constant concentrate quality is ensured and long-term targets can be fulfilled. Consequently, a more reliable exploitation plan of the mineral reserve is facilitated.
7

Mafu, Thandile John. "Modelling of multi-state panel data : the importance of the model assumptions." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/95994.

Abstract:
Thesis (MCom)--Stellenbosch University, 2014.
A multi-state model is a way of describing a process in which a subject moves through a series of states in continuous time. The series of states might, for example, represent the stages of a disease: in state 1 subjects are free of the disease, in state 2 subjects have a mild form of the disease, in state 3 subjects have a severe form, and in the last state, 4, subjects have died from the disease. Markov models estimate the transition probabilities and transition intensity rates that describe the movement of subjects between these states. For example, a particular subject or patient might be slightly sick at age 30 but considerably worse five years later; a Markov model estimates the probability of that patient moving from state 2 to state 3. Markov multi-state models were studied in this thesis with the view of assessing the Markov model assumptions, such as homogeneity of the transition rates through time, homogeneity of the transition rates across the subject population, and the Markov property itself. The assessment of these assumptions was based on a simulated panel (longitudinal) dataset, generated using the R package msm developed by Christopher Jackson (2014); the R code written with this package is attached as an appendix. A longitudinal dataset consists of repeated measurements of the state of a subject and the time between observations. Observations are made on each subject at regular or irregular time intervals until the subject dies, at which point the study ends.
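As a purely illustrative aside (the thesis itself used the R package msm and its own simulation code), the sketch below shows the standard relationship behind time-homogeneous continuous-time Markov multi-state models: transition probabilities over an interval t are obtained as the matrix exponential of a transition intensity matrix Q. The intensity values and the four-state layout are invented, loosely following the disease states described above.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical transition intensity matrix Q for the four states described
# (1 disease-free, 2 mild, 3 severe, 4 dead); each row sums to zero.
Q = np.array([
    [-0.15,  0.15,  0.00,  0.00],
    [ 0.05, -0.25,  0.15,  0.05],
    [ 0.00,  0.02, -0.22,  0.20],
    [ 0.00,  0.00,  0.00,  0.00],   # state 4 (death) is absorbing
])

# For a time-homogeneous model, the transition probability matrix is P(t) = exp(Q*t).
t = 5.0                              # e.g. the five-year interval in the example above
P = expm(Q * t)
print(f"P(mild -> severe within {t:.0f} years) = {P[1, 2]:.3f}")
```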
8

Matthäus, Antje, and Markus Dammers. "Computational underground short-term mine planning: the importance of real-time data." TU Bergakademie Freiberg, 2017. https://tubaf.qucosa.de/id/qucosa%3A23194.

Abstract:
Short-term mine plans are the key operational basis for ore production targets ranging from shift to weekly or monthly targets. Short-term plans cover detailed operational subprocesses such as development, extraction and backfill schedules as well as materials handling and blending processes. The aim is to make long-term goals feasible by providing a constant plant feed that complies with quality constraints. Short-term mine planning highly depends on the accuracy of the resource model as well as the current production status and equipment fleet. Most of these parameters are characterized by uncertainties due to a lack of information and equipment reliability. At the same time, concentrate production and quality must be kept within acceptable ranges to ensure productivity and economic viability of the operation. Within the EU-funded Real-Time Mining project, the reduction of uncertainty in mine planning is carried out by using real-time data. Ore and rock characteristics of active faces and equipment data are iteratively integrated in a simulation-based optimization tool. Therefore, predicted processing plant efficiencies can be met by delivering constant ore grades. Hence, a constant concentrate quality is ensured and long-term targets can be fulfilled. Consequently, a more reliable exploitation plan of the mineral reserve is facilitated.
9

Kaponen, Martina. "Fairness and parameter importance in logistic regression models of criminal sentencing data." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-417359.

10

Nalini, Ramakrishna Sindhu Kanya. "Component importance indices and failure prevention using outage data in distribution systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-287173.

Abstract:
Interruptions in power supply are inevitable due to faults in the power distribution network. These interruptions are not only expensive for the customers but also for the distribution system operator in the form of penalties. Increasing system redundancy or using component-specific sensors can help reduce interruptions. However, these options are not always economically feasible. Therefore, there is a need to examine other possibilities for reducing the risk of outages. The data stored in substations can be used for this purpose by deriving component importance indices, ranking components, and predicting outages. This thesis presents component importance indices derived by identifying the critical components in the grid and assigning an index based on certain criteria. The model for predicting faults is based on the weather conditions observed during past outages. Component importance indices are derived and ranked based on the de-energisation time of components, and the frequency and impact of outages. This helps prioritize components according to the chosen criterion and adapt monitoring strategies by focusing on the most critical components. Based on categorical Naive Bayes, a model is developed to predict the probability of a fault or failure, and the location and component type likely to be affected, for a given set of weather conditions. The results from the component importance indices reveal that each component's rank varies based on the chosen criterion. This indicates that certain components are critical with respect to a specific criterion and not all criteria. However, some components are ranked high in all the methods. These components are critical and need focused monitoring. The reliability of results from component importance indices depends to a great extent on the time frame of the outage data considered for analysis. The prediction model can alert the distribution system operator regarding possible outages in the network for a given set of weather conditions. However, the prediction of the location and component type likely to be affected is relatively inaccurate, since the number of outages considered in the time frame is low. By updating the model regularly with new data, the predictions would become more accurate.
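A minimal, hypothetical sketch of the kind of model described above (categorical Naive Bayes over weather conditions) is given below using scikit-learn's CategoricalNB; the weather features, categories and outage labels are invented and are not the thesis's substation data.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical outage records: categorical weather features per day.
X_raw = np.array([
    ["storm", "cold", "high"],
    ["clear", "warm", "low"],
    ["rain",  "cold", "high"],
    ["clear", "cold", "low"],
    ["storm", "warm", "high"],
    ["rain",  "warm", "low"],
])                                  # columns: precipitation, temperature, wind
y = np.array([1, 0, 1, 0, 1, 0])    # 1 = an outage occurred that day

encoder = OrdinalEncoder()          # CategoricalNB expects integer-coded categories
X = encoder.fit_transform(X_raw).astype(int)

model = CategoricalNB().fit(X, y)
new_day = encoder.transform([["storm", "cold", "low"]]).astype(int)
print("P(outage) =", model.predict_proba(new_day)[0, 1])
```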
11

Williams, Rachel L. "The importance and effectiveness of volunteer-collected data in ecology and conservation." Thesis, University of Gloucestershire, 2012. http://eprints.glos.ac.uk/2459/.

Abstract:
Volunteers have been collecting ecological data for centuries. However, volunteer-collected data are frequently challenged because they lack the precision and rigour of scientific studies. This thesis evaluates the advantages of volunteer‐collected data and the importance of such data for the study of ecology and conservation, and considers methods to verify data to avoid or reduce inaccuracies. Different case studies aimed to answer questions relating to species' ecology, habitat selection, and behaviour. Charismatic mammals were selected in order to increase volunteer participation (water voles Arvicola terrestris; dormice Muscardinus avellanarius; North American otters Lontra canadensis; hedgehogs Erinaceus europaeus). Simple, rapid data collection methods were used so that volunteers and citizen scientists could easily follow instructions. The findings show that simple methods such as scales and estimates can be an effective way of studying water vole habitat associations; however, inter‐observer variability was highly problematic when volunteers collected data based on subjective estimations. A volunteer‐collected long‐term dataset on dormouse nestbox occupancy provided excellent information on habitat selection despite some irregularities when the data were recorded. Untrained citizen scientists could not record activity budgets for captive otters despite simple instructions, whereas citizen scientists were able to record habitat variables within their gardens, but false absences were found to be an issue when they recorded hedgehog sightings. Overall, this thesis suggests that volunteer‐collected data can provide useful insights into various aspects of ecology, for example, for studying distributions and species‐habitat interactions. Encouraging volunteers to collect ecological data has additional benefits such as increasing the health and wellbeing of participants, and it also raises public awareness of conservation issues. Recommendations on how to increase participation rates while minimising sources of error and bias are given.
12

Fang, Tongtong. "Learning from noisy labels by importance reweighting: a deep learning approach." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264125.

Abstract:
Noisy labels can cause severe degradation of classification performance. Especially for deep neural networks, noisy labels can be memorized and lead to poor generalization. Recently, label-noise-robust deep learning has outperformed traditional shallow learning approaches in handling complex input data without prior knowledge of label noise generation. Learning from noisy labels by importance reweighting is well studied. Existing work in this line using deep learning failed to provide a reasonable importance reweighting criterion and thus obtained undesirable experimental performance. Targeting this knowledge gap and inspired by domain adaptation, we propose a novel label-noise-robust deep learning approach based on importance reweighting. Noisy labeled training examples are weighted by minimizing the maximum mean discrepancy between the loss distributions of noisy labeled and clean labeled data. In experiments, the proposed approach outperforms other baselines. The results show a vast research potential of applying domain adaptation to the label noise problem by bridging the two areas. Moreover, the proposed approach potentially motivates other interesting problems in domain adaptation by enabling importance reweighting to be used in deep learning.
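The quantity at the centre of the reweighting described above is the maximum mean discrepancy (MMD) between two loss distributions. The sketch below is a generic illustration of a biased RBF-kernel MMD estimate between two invented samples of per-example losses; it shows only the discrepancy measure, not the thesis's full reweighting or training procedure.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between two 1-D samples."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

# Hypothetical per-example losses on noisy-labelled and clean-labelled data.
rng = np.random.default_rng(0)
noisy_losses = rng.gamma(shape=2.0, scale=1.0, size=500)
clean_losses = rng.gamma(shape=1.2, scale=0.8, size=200)
print("MMD^2 between loss distributions:", mmd2(noisy_losses, clean_losses))
```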
13

王達才 and Tat-choi Wong. "The strategic importance of information systems to airline revenue management." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1995. http://hub.hku.hk/bib/B31266873.

14

Hjerpe, Adam. "Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185496.

Abstract:
The Random Forest model is commonly used as a predictor function and has been proven useful in a variety of applications. Its popularity stems from the combination of high prediction accuracy, the ability to model high-dimensional complex data, and applicability under predictor correlations. This report investigates the random forest variable importance measure (VIM) as a means to find a ranking of important variables. The robustness of the VIM under imputation of categorical noise, and its capability to differentiate informative predictors from non-informative variables, is investigated. The selection of variables may improve the robustness of the predictor, improve prediction accuracy, reduce computational time, and may serve as an exploratory data analysis tool. In addition, the partial dependency plot obtained from the random forest model is examined as a means to find underlying relations in a non-linear simulation study.
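As a generic illustration of the variable importance measures discussed above (not the report's own experiments or data), the sketch below fits a scikit-learn random forest to synthetic data containing a few informative predictors plus noise variables and compares the impurity-based importances with permutation importances computed on held-out data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 5 informative predictors followed by 15 pure-noise predictors.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Impurity-based VIM (computed during training; fast but can be biased).
impurity_vim = forest.feature_importances_

# Permutation VIM on held-out data (slower, usually more reliable).
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
ranking = np.argsort(perm.importances_mean)[::-1]
print("Top 5 features by permutation VIM:", ranking[:5])
```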
15

Xu, Yan. "Using data to answer questions of public health importance for ACT Health, with an emphasis on routinely-collected linked data." Master's thesis, Canberra, ACT : The Australian National University, 2017. http://hdl.handle.net/1885/144601.

Abstract:
My field placement was with the Epidemiology Section in the Population Health Protection and Prevention Division at ACT Health. Within this placement, I have completed four projects for this thesis: an analysis of Emergency Department (ED) data; a gastroenteritis outbreak investigation; an evaluation of a population health survey and, for my main project, a study of unplanned hospital readmissions. One of the motivations for undertaking these projects was to promote better use of the routinely-collected linked data to answer questions of public health importance for ACT Health.

My data analysis project was an analysis of frequent ED use in the Australian Capital Territory (ACT). This is the first study to quantify and characterise ED frequent users in the ACT. The results support existing evidence that frequent users tend to be older, female, and/or single, and commonly present with pain-related conditions. The data also showed that compared to non-frequent ED users, frequent users were more likely to be referred by police, corrective or community services; arrive by ambulance, not wait to be assessed, or leave at their own risk. In addition, we investigated visit intervals, rarely reported on in other studies. This study found around one third of frequent users returned within 7 days, with 41% of their visits having the same diagnosis as the last visit. Early identification and follow-up in the community for frequent users will assist in the development of targeted strategies to improve health service delivery to this vulnerable group.

Unexpected return to hospital has negative impacts on families and healthcare systems. We examined which conditions have the highest rates of readmission and contribute most to 30-day unplanned readmissions in the ACT, and which patient characteristics are associated with readmissions. The study identified a 30-day unplanned readmission rate of 6.2%, with admission rates highest for alcohol-related liver disease (19.2%), and heart valve disorders (17.4%). Older age and comorbidities are strong predictors for 30-day unplanned readmissions. For some conditions the rates were relatively high, suggesting areas to target for reducing readmissions. Therefore, when developing preventative strategies and post-discharge plans, particular consideration should be given to patients at older age or with underlying comorbidities.

As part of the ACT Health Survey Program (HSP), the ACT General Health Survey (GHS) is a computer-assisted telephone interviewing survey conducted every year among ACT residents. My evaluation of the GHS found that it is a useful tool to monitor trends of overweight, obesity, nutrition and physical activity for adults and children in the ACT. The data collected are used to provide evidence to understand and analyse overweight and obesity patterns in the ACT and create awareness of unhealthy lifestyles. However, improvements could be made in a few areas, including: developing a proper evaluation plan and a data quality statement, increasing the sample size and the proportion of young people in the sample population.

I also carried out an outbreak investigation of foodborne gastroenteritis that occurred among staff and public members at a large national institution in Canberra. I conducted two studies for this outbreak – a retrospective cohort study and a case control study. The epidemiological, environmental and laboratory evidence suggested the outbreak was caused by C. perfringens toxin Type A, with the likely vehicles of transmission being butter chicken and rice. The findings of this investigation suggest that a breakdown in temperature control and good food handling practices may have resulted in C. perfringens bacterium growing rapidly and producing a toxin which caused the illness. This project also indicated that the value of a second epidemiological study was questionable given the limited time and resources available.
16

梁南柱 and Nam-chu Alexander Leung. "The strategic importance of information system/technology to the Hong Kong Polytechnic University." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1995. http://hub.hku.hk/bib/B31266708.

17

Reibold, Florian [Verfasser], and C. [Akademischer Betreuer] Dachsbacher. "Data-driven global importance sampling for physically-based rendering / Florian Reibold ; Betreuer: C. Dachsbacher." Karlsruhe : KIT-Bibliothek, 2021. http://d-nb.info/1228439281/34.

18

Faronius, Hofmann Therese, and Linda Håkansson. "Visualization Design Effects on Credibility and Data Perception, and the Importance of Digital Interaction." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-453694.

Abstract:
An effective visualization can often give an insight into data that would otherwise be difficult to analyze. The company Assedon aims to make data understandable to their clients by using data visualization in an interactive user interface. The goal of this study was to create an interactive visual representation of data from the Swedish Public Employment Service with the use of dynamically created digital graphs that are considered credible and beneficial for data perception. Moreover, the goal was to study data perception of the digitally displayed and interactive graphs. The study was conducted by interviewing 19 people with different backgrounds, using a combination of a qualitative and a quantitative interview technique. The interviewees were shown three different designs of a graph type, and rated as well as commented on each design. The results of this study indicated that a graph is more likely to be perceived as credible if it looks modern and professional. This also means that the design of the graphs needs more attention than people might normally appreciate. The perception of data presented in digitally displayed graphs is affected by several factors, but most prominently by the choice of color, which can either enhance the perception or confuse the reader. Lastly, interaction with the data will benefit the perception and create another dimension of the data, but only to a certain extent. If the graph is too difficult to evaluate, the purpose of the graph is lost and the interaction becomes a necessity instead of an asset.
19

Lindmark, Jessica. "Betydelse av datakvalitet vid modellering av grundvatten : The Importance of Data Quality in Groundwater Modelling." Thesis, Uppsala universitet, Luft-, vatten och landskapslära, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-260204.

Abstract:
Groundwater modelling can be applied within many fields, for example as an aid for geotechnical examinations and for contaminant transport. In many cases, however, groundwater modelling is not used because of the need for large quantities of data. A sensitivity analysis was conducted for hydraulic conductivity, groundwater recharge and data resolution in time and space, to examine which parameters affect the result most. A reference case was calibrated to form the basis of the analysis. The reference case was formed by a ground model from airborne laser scanning, probes for the level dividing friction soil and clay, a base model composed of interpretations and probes for the base level, and 19 groundwater pipes. These data were then scaled down, with the geological and hydrological data changed in different experiments. It was clear that the number of information points was not as important as their placement. For both types of data it was important to spread out probe points and make sure that peaks in the topography were included. The results generally showed that recharge areas at higher altitude are the most important recharge areas; these areas have no other water supply unless further boundary conditions apply. A change in the hydraulic conductivity of the friction soil gave a larger difference in model results than an equally large change in the hydraulic conductivity of the clay layer. The largest difference in the model result occurred when the same change was applied to both layers at the same time. The reason the hydraulic conductivity of the friction soil layer matters so much is that it is the layer through which water travels; a change in the clay's hydraulic conductivity does not pose an obstacle in the same way it does in the friction soil. A change in the hydraulic conductivity gave a smaller change in model results than an equal percentage change in groundwater recharge. Since higher uncertainties are associated with hydraulic conductivity, an extended sensitivity analysis was performed for this parameter. This analysis showed that, within their reasonable uncertainty ranges, a change in hydraulic conductivity gives rise to larger differences in model results than a change in groundwater recharge.
20

Fritz, Eric Ryan. "Relational database models and other software and their importance in data analysis, storage, and communication." [Ames, Iowa : Iowa State University], 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1468081.

21

Leung, Kwok-wing, and 梁國榮. "The strategic importance of information systems in the electricity supply industry in Hong Kong." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1995. http://hub.hku.hk/bib/B31266691.

22

Nybacka, A. (Aino). "The role of consumers’ knowledge and attitudes in determining the importance of privacy in big data marketing." Master's thesis, University of Oulu, 2018. http://urn.fi/URN:NBN:fi:oulu-201806062496.

Abstract:
The world has changed due to the digital revolution, which means that the marketing environment has changed and has to keep changing. Even though data have always existed and been a part of marketing, marketing based on customer data has evolved immensely as a result of this digital revolution. Big data offers marketers opportunities that seem endless, owing to the exponentially growing amount of data and the new innovations made to harness it, but it has to be remembered that big data marketing is not a solution without problems. This research focuses on big data marketing and its implications for consumer privacy: what are the knowledge and attitudes of consumers towards privacy in big data marketing, and what is their role in determining the importance of consumer privacy for companies? The question we ask and answer is: how do consumers' knowledge of and attitudes towards privacy in big data marketing affect the reasons why companies should take privacy as a part of their strategy? The aim of the research is to explain why consumers' knowledge of and attitudes towards privacy in big data marketing are important, and thus something that companies should take into account in their strategy when looking at it from this particular point of view. These questions are examined by asking a group of informants about their knowledge and attitudes towards privacy in big data marketing and analyzing the data in its entirety before drawing conclusions from it. The research was conducted with a qualitative questionnaire sent to this group of informants, and the findings are a direct result of the answers collected from them. The findings of the research include both theoretical contributions and managerial implications. The theoretical contribution of this work is a framework that explains how the knowledge and attitudes of consumers are interwoven with companies' actions, what the major aspects of each of these parts are that affect how privacy in big data marketing is seen, and how this relates directly to the reasons why companies should take privacy as a part of their strategy. In the managerial implications, we explain these points further and show how either a positive or a negative outlook towards big data marketing from the consumers' point of view can affect companies on a larger scale, and not just in how customers see them. These results can be used to evaluate how consumers see privacy in big data marketing in a specific company and what can be done to improve the situation: what companies can do to help consumers see privacy in big data marketing in a better light, and to help them understand the possible ramifications of a negative outlook and the benefits of a positive one.
23

Lai, Kam-hung Jimmy, and 黎錦鴻. "The strategic importance of information technology (IT) to the credit card business of a local banking group." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B3126721X.

24

Kimes, Ryan Vincent. "Quantifying the Effects of Correlated Covariates on Variable Importance Estimates from Random Forests." VCU Scholars Compass, 2006. http://scholarscompass.vcu.edu/etd/1433.

Abstract:
Recent advances in computing technology have led to the development of algorithmic modeling techniques. These methods can be used to analyze data which are difficult to analyze using traditional statistical models. This study examined the effectiveness of variable importance estimates from the random forest algorithm in identifying the true predictor among a large number of candidate predictors. A simulation study was conducted using twenty different levels of association among the independent variables and seven different levels of association between the true predictor and the response. We conclude that the random forest method is an effective classification tool when the goals of a study are to produce an accurate classifier and to provide insight regarding the discriminative ability of individual predictor variables. These goals are common in gene expression analysis; therefore we apply the random forest method for the purpose of estimating variable importance on a microarray data set.
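To make the simulation design described above concrete, the sketch below is a simplified, hypothetical version of it: equicorrelated candidate predictors of which only the first truly drives the response, after which random forest variable importances for the true predictor and its correlated decoys are compared. The sample size, correlation level and effect size are invented, not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n, p, rho = 1000, 10, 0.7            # hypothetical sample size, predictors, correlation

# Candidate predictors equicorrelated with each other at level rho.
cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

# Only the first column is the true predictor of the binary response.
logits = 1.5 * X[:, 0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

forest = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, y)
vim = forest.feature_importances_
print("VIM of true predictor:", round(vim[0], 3))
print("Mean VIM of correlated decoys:", round(vim[1:].mean(), 3))
```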
25

Doubleday, Kevin. "Generation of Individualized Treatment Decision Tree Algorithm with Application to Randomized Control Trials and Electronic Medical Record Data." Thesis, The University of Arizona, 2016. http://hdl.handle.net/10150/613559.

Abstract:
With new treatments and novel technology available, personalized medicine has become a key topic in the new era of healthcare. Traditional statistical methods for personalized medicine and subgroup identification primarily focus on single-treatment or two-arm randomized control trials (RCTs). With restricted inclusion and exclusion criteria, data from RCTs may not reflect real-world treatment effectiveness. However, electronic medical records (EMR) offer an alternative venue. In this paper, we propose a general framework to identify individualized treatment rules (ITRs), which connects subgroup identification methods and ITRs. It is applicable to both RCT and EMR data. Given the large scale of EMR datasets, we develop a recursive partitioning algorithm to solve the problem (ITR-Tree). A variable importance measure is also developed for personalized medicine using random forest. We demonstrate our method through simulations, and apply ITR-Tree to datasets from diabetes studies using both RCT and EMR data. A software package is available at https://github.com/jinjinzhou/ITR.Tree.
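The abstract does not spell out how a candidate individualized treatment rule is scored, so the sketch below shows one common generic approach used in this setting rather than the thesis's ITR-Tree criterion: an inverse-probability-weighted estimate of the mean outcome under a rule in a two-arm RCT, applied to entirely invented data.

```python
import numpy as np

def itr_value_ipw(y, a, x, rule, p_treat=0.5):
    """Inverse-probability-weighted estimate of the mean outcome that would be
    observed if every subject were treated according to `rule` (a candidate ITR),
    assuming a two-arm RCT with known treatment probability `p_treat`."""
    d = rule(x)                                   # recommended treatment, 0 or 1
    p = np.where(a == 1, p_treat, 1.0 - p_treat)  # probability of the received arm
    return np.mean((a == d) * y / p)

# Hypothetical RCT data: one biomarker; treatment helps only when the biomarker > 0.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
a = rng.binomial(1, 0.5, size=5000)
y = 1.0 + a * np.where(x > 0, 1.0, -0.5) + rng.normal(scale=0.5, size=5000)

print("Value of 'treat everyone':", itr_value_ipw(y, a, x, lambda x: np.ones_like(x)))
print("Value of 'treat if x > 0':", itr_value_ipw(y, a, x, lambda x: (x > 0).astype(int)))
```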
26

Jankovsky, Zachary Kyle. "Development of Computational and Data Processing Tools for ADAPT to Assist Dynamic Probabilistic Risk Assessment." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524194454292866.

27

Twum, Amoako Benjamin. "The importance of Business Intelligence as a decision-making tool : case study electricity company of Ghana (E.C.G)." Thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-17652.

Abstract:
Demand and technology are driving competition to its best if not to the edge, blurring industrial boundaries and resulting in a substantial re-arrangement of businesses. The advancement of information technology has also made it possible for organisations to hoard large volumes of data from multiple sources through their business processes. To remain competitive in the face of these changing times and fierce competition, a tool is needed which has the capability to allow a holistic view of the operating environment of the organisation, by taking advantage of the huge body of accumulated data and thereby allowing decision makers to be spontaneous with their decision-making. Business Intelligence offers these capabilities and more, for instance the possibility to perform analytics operations on events that demand more clarity about their behaviour. Research in this area, though young, is gradually gaining attention in academia, although it is still scanty in Africa. This thesis investigates whether the adoption of Business Intelligence (BI) systems can help in an organisation's strategic decision-making, in the context of the Electricity Company of Ghana (E.C.G), operating in the utility industry in Ghana. A qualitative approach, employing interviews with seven selected managers at E.C.G, was adopted. The results indicate that BI, or a similar system, has never been adopted by E.C.G, though the company generates huge volumes of data through its operations. Further, the organisation's information systems are not linked together to allow possible discovery of intelligence that would be worthwhile for influencing strategic decisions. The dispersed nature of the current systems is not only causing delays in requests for information from other departments, but also affecting decision-making and the progress of work. E.C.G is a prime candidate for the adoption of BI to leverage its huge data, reduce production waste and costs, and help provide an efficient supply of electricity to its customers. Such a tool would prove to be indispensable.
Program: Master's Programme in Informatics
28

Lubanski, Adam Roman. "Returns to the delivery and support of information services for academic research and learning : the importance of data and information support." Thesis, University College London (University of London), 1999. http://discovery.ucl.ac.uk/10019785/.

29

GORDOS, PYGMALION-ALEXANDROS, and JONAS BULOVAS. "The importance of supplier information quality in purchasing of transport services." Thesis, KTH, Industriell Marknadsföring och Entreprenörskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-236510.

Abstract:
An important prerequisite for successful supply chain integration is the ability to convert data into information, combined with structured storing and sharing processes. The purpose of this master thesis is to investigate the potential relation between supplier data quality and the performance of purchasing of transport services. The output of the thesis generates evidence about the imperative to emphasize supplier data quality throughout the supplier selection process. A supplier data quality assessment framework consisting of 4 dimensions - ease of manipulation, accessibility, accuracy and completeness - is developed as the core product of this research project. The weights of these dimensions were assigned specifically for the case company, Cramo, to determine the quality score for a selected sample of carriers. A coefficient k1, representing the ratio of transport expenditure over sales, was introduced to facilitate the identification of a relation between supplier data quality and transport expenditure. Business units served by transport companies with higher-quality data displayed a lower k1, consequently paying less for transport services in relation to their revenue than business units served by carriers with a lower data quality score. The framework developed is adaptable: dimensions and metrics can be added or excluded according to situational factors and case peculiarities. The application of the supplier data quality assessment framework allows for a more objective and streamlined supplier selection. It stresses the overall costs experienced during the period of cooperation. The finding regarding the importance of supplier data quality in purchasing of transport services can nonetheless be generalized to other cases where companies strive to achieve better-informed strategic decisions.
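As a small worked illustration of the two quantities described above (a weighted data-quality score over the four dimensions, and the coefficient k1 defined as transport expenditure over sales), the sketch below uses invented weights, scores and figures; the actual dimension weights were assigned specifically for Cramo and are not reproduced here.

```python
# Hypothetical dimension weights (must sum to 1) and carrier scores on a 0-1 scale.
weights = {"ease_of_manipulation": 0.20, "accessibility": 0.20,
           "accuracy": 0.35, "completeness": 0.25}

carriers = {
    "carrier_A": {"ease_of_manipulation": 0.9, "accessibility": 0.8,
                  "accuracy": 0.95, "completeness": 0.85},
    "carrier_B": {"ease_of_manipulation": 0.5, "accessibility": 0.6,
                  "accuracy": 0.55, "completeness": 0.40},
}
business_units = {"unit_served_by_A": (120_000, 2_400_000),   # (transport cost, sales)
                  "unit_served_by_B": (150_000, 2_100_000)}

for name, scores in carriers.items():
    quality = sum(weights[d] * scores[d] for d in weights)
    print(f"{name}: data-quality score = {quality:.2f}")

for unit, (transport_cost, sales) in business_units.items():
    print(f"{unit}: k1 = {transport_cost / sales:.3f}")
```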
30

Ladieu, François. "Importance des fluctuations spatio-temporelles et des non linéarités pour le transport dans les verres isolants." Habilitation à diriger des recherches, Université Paris Sud - Paris XI, 2003. http://tel.archives-ouvertes.fr/tel-00003424.

Abstract:
The notion of a plane wave is omnipresent in the physics of the crystalline state. Whether one thinks of Bloch's theorem, which guarantees that the electronic wave functions have the periodicity of the lattice (up to a phase), or of the atomic degrees of freedom called "phonons", the plane-wave picture is always there. It is fairly intuitive that this notion greatly simplifies the understanding of all problems related to transport physics. For heat transport, for example, most crystals at room temperature have roughly the same properties, since the thermal conductivity of all crystals is always of the order of 0.3 W/K/cm, to within a factor of 3 either way. For electrical transport the question becomes only slightly more complicated because of Fermi-Dirac statistics, which expresses the Pauli exclusion principle: the electrical conductivity will be very good if free accessible states exist within an energy distance smaller than the temperature T, and very poor if, on the contrary, a large band gap must be crossed to excite the electrons. This work belongs to the broad body of research aiming to understand transport physics in materials that are very far from the perfection of crystalline periodicity; we deal here with a few questions related to transport in "glasses". For all of these materials (sometimes called "amorphous"), the plane-wave picture can no longer be invoked as naturally, which immediately makes transport phenomena much harder to understand. Indeed, as long as the deviation from periodicity remains small (as in crystals containing defects), the wave picture "naturally averages" the effect of these deviations from ideal periodicity. What happens when the wave picture is no longer so natural? Intuitively, it seems that transport will always be more difficult than in the equivalent crystalline case. This idea suffers a few exceptions, for instance the fact that for T ≤ 7 K bismuth carries current infinitely better when it is amorphous (it is then superconducting) than when it is crystalline (it is then an ordinary metal); but it is well known, since the development of Landau's theory of Fermi liquids, that the superconducting instability plays an altogether exceptional role in electronic systems. We develop this intuition through two ideas that recur in each of the three chapters of this document: (i) transport is so difficult in "glasses" that it is strongly "inhomogeneous", in other words it is dominated by a small portion of the combined "external drive/glassy system"; (ii) the study of the nonlinear regime, where the effects produced on the system cease to be proportional to the applied drive, is a very good probe of the nature of this "small portion" that dominates the transport problem. Thus, in Chapter 1, devoted to the heating produced in insulating glasses subjected to particle microbeams, we will see that the (heat) transport problem is dominated by the temporal fluctuations of the incident beam intensity: the sum of the induced thermal effects is dominated by what happens during the brief instants when the beam intensity is "exceptionally" high.
All of this physics stems from the very low thermal conductivity of glass, 10 to 100 times lower than that of crystals at room temperature. Chapter 2 is devoted to hopping conduction between electronic states localized by disorder. In this case, unlike in Chapter 1, the intensity of the applied drive (the electric field F) can be kept perfectly stable in time. We will see, however, that it is then the "spatial fluctuations" that must be taken into account to understand transport. We show that, depending on the value of the electric field F, the path that carries most of the electric current in the glass is not at all the same: for F → 0 it is a path of the "isotropic percolation" type, whereas in the limit F → ∞ it is a path of the "directed percolation" type. We show that the nonlinear current-voltage behaviour makes it possible to "see" this transformation from one type of path into the other, and to obtain new information on the topology of these percolation paths. Finally, Chapter 3 deals with the low-frequency dielectric constant of insulating glasses. This study is set within the framework of the "double-well model" developed by Anderson, Halperin and Varma (and independently by Phillips) in the early 1970s. We recall that this model, despite its extreme simplicity, accounts for the very peculiar behaviour of glasses, especially at the lowest temperatures. However, we show that, on closer inspection, the nonlinear behaviour of the dielectric constant cannot be explained within this model. We then propose, following recent work by Burin et al., that the transport problem in these systems is dominated by very particular "spatio-temporal fluctuations": double wells interact efficiently only if they are nearly identical (hence, a priori, rather far apart from one another), and, moreover, the interaction plays a role only at certain precise instants of the electric period, which depend on the fine characteristics of the two double wells considered. A remark on form: the length of this document reaches the authorized upper limit because of the efforts made to keep it "readable". In particular, each sub-chapter contains an introduction stating the problem treated and a conclusion summarizing the main results. Whole sub-chapters can therefore be skipped without losing the thread. Another way to read this document is to consider only the figures and their captions, which form a sufficient subset for understanding the essentials of what is said. Finally, one can of course read the whole work; in that case, the exposition of the problems has been designed so that no consultation of the bibliography is necessary, even for a detailed understanding of the results obtained.
APA, Harvard, Vancouver, ISO, and other styles
31

Tang, Han, and Han Tang. "The Importance of Prior Geologic Information on Hydraulic Tomography Analysis at the North Campus Research Site (NCRS)." Thesis, The University of Arizona, 2016. http://hdl.handle.net/10150/621836.

Full text
Abstract:
The purpose of this study is to investigate the importance of prior information about hydraulic conductivity (K), introduced by Kriging using point K data and/or residual covariance, for improving K estimates at the North Campus Research Site (NCRS). Among the many methods that can characterize the mean or detailed distribution of hydraulic conductivity, the Cooper-Jacob straight-line solution, Kriging using point K data, single-well pumping test inversion, and hydraulic tomography (HT) are compared in this study, using head data from 15 cross-hole pumping tests conducted at the NCRS, where 9 existing wells were equipped with packer systems and the pressure responses at different intervals in different wells were monitored with transducers. It is found that the HT method, which fuses all the available pumping test data, yields more accurate and consistent results. However, many studies have indicated that combining hydraulic data with geologic investigation improves the HT estimates. Thus, in this study, hard K data obtained by permeameter (227 data points) are introduced through Kriging and combined with HT to yield a better estimated K field. Moreover, validation against unused tests indicates that the K estimates obtained using the collected K information make more accurate predictions.
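As background for the Cooper-Jacob straight-line solution named in this abstract, its standard textbook form is given below; the notation is generic and is not quoted from the thesis:

$$ s(r,t) \approx \frac{2.3\,Q}{4\pi T}\,\log_{10}\!\left(\frac{2.25\,T\,t}{r^{2}S}\right), \qquad T = K\,b, $$

where s is the drawdown at radial distance r from the pumping well at time t, Q is the pumping rate, T the transmissivity, S the storativity, K the hydraulic conductivity and b the aquifer thickness. Fitting the late-time drawdown against log t on this straight line yields T (hence an average K) and S, which is why the method characterizes a bulk K rather than its detailed spatial distribution.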
APA, Harvard, Vancouver, ISO, and other styles
32

Taylor, La'Shan Denise. "Assessing Health Status, Disease Burden, and Quality of Life in Appalachia Tennessee: The Importance of Using Multiple Data Sources in Health Research." Digital Commons @ East Tennessee State University, 2009. https://dc.etsu.edu/etd/1889.

Full text
Abstract:
As the US population ages, public health agencies must examine better ways to measure the impact of adverse health outcomes on a population. Many reports have asserted that more adverse health events occur in Appalachia; however, few studies have assessed the quality of life and burden of disease of those residing in Appalachia. Therefore, the overall aim of this dissertation was to assess the health status, burden of disease, and quality of life in Appalachia using available data and improved health outcome assessment measures. For this dissertation, 3 secondary data sources collected by the State of Tennessee and the National Center for Health Statistics (NCHS) were used. These data were used to calculate the index of disparity and absolute and relative disparity measures within the study area of 8 Appalachian counties in upper east Tennessee. Vital statistics data for the selected area were also used to calculate Disability Adjusted Life Years (DALYs) by gender for all-cause mortality and stroke mortality. The Behavioral Risk Factor Surveillance System (BRFSS) data were used for prevalence data and to determine what factors impact Health Related Quality of Life (HRQOL) within the study area. The index of disparity (ID) showed that disparity was greatest for stroke mortality in the study area and Tennessee, and least for all-cause mortality in the US. The highest numbers of DALYs were found in the 45-59 age group of the Appalachian study population. Finally, mean general health status did not vary significantly by gender; however, predictors of reporting excellent to good health status did vary by gender. Predictors of fair to poor general health status were low income, having diabetes, or having had a stroke or heart attack. The results of this dissertation are intended to assist health professionals with the creation of health interventions and policy development within the Appalachian area. This dissertation proposes a more comprehensive health status monitoring system for assessing health disparity at a regional level.
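For readers unfamiliar with the DALY metric used above, the standard simplified formulation (the generic WHO definition without age-weighting or discounting, not a formula quoted from the dissertation) is

$$ \mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}, \qquad \mathrm{YLL} = N \times L, \qquad \mathrm{YLD} = I \times DW \times L_{d}, $$

where N is the number of deaths, L the standard life expectancy at the age of death, I the number of incident cases, DW the disability weight of the condition, and L_d the average duration of the disability.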
APA, Harvard, Vancouver, ISO, and other styles
33

Saunders, Gary University of Ballarat. "Pharmacovigilance Decision Support: The value of Disproportionality Analysis Signal Detection Methods, the development and testing of Covariability Techniques, and the importance of Ontology." University of Ballarat, 2006. http://archimedes.ballarat.edu.au:8080/vital/access/HandleResolver/1959.17/12755.

Full text
Abstract:
The cost of adverse drug reactions to society in the form of deaths, chronic illness, foetal malformation, and many other effects is quite significant. For example, in the United States of America, adverse reactions to prescribed drugs are around the fourth leading cause of death. The reporting of adverse drug reactions is spontaneous and voluntary in Australia. Many methods have been used for the analysis of adverse drug reaction data, mostly using a statistical approach as a basis for clinical analysis in drug safety surveillance decision support. This thesis examines new approaches that may be used in the analysis of drug safety data. These methods differ significantly from the statistical methods in that they utilize covariability methods of association to define drug-reaction relationships. Covariability algorithms were developed in collaboration with Musa Mammadov to discover drugs associated with adverse reactions and possible drug-drug interactions. This method uses the system organ class (SOC) classification in the Australian Adverse Drug Reaction Advisory Committee (ADRAC) data to stratify reactions. The text categorization algorithm BoosTexter was found to work with the same drug safety data, and its performance and modus operandi were compared to our algorithms. These alternative methods were compared to standard disproportionality analysis methods for signal detection in drug safety data, including the Bayesian multi-item gamma Poisson shrinker (MGPS), which was found to have problems with similar reaction terms in a report and with innocent bystander drugs. A classification of drug terms was made using the anatomical-therapeutic-chemical (ATC) classification codes. This reduced the number of drug variables from 5081 drug terms to 14 main drug classes. The ATC classification is structured into a hierarchy of five levels. Exploitation of the ATC hierarchy allows the drug safety data to be stratified in such a way as to make them accessible to powerful existing tools. A data mining method that uses association rules, grouping them on the basis of content, was used as a basis for applying the ATC and SOC ontologies to ADRAC data. This allows different views of these associations (even very rare ones). A signal detection method was developed using these association rules, which also incorporates critical reaction terms.
Doctor of Philosophy
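As background for the disproportionality analysis methods this thesis compares against, the short Python sketch below computes the proportional reporting ratio (PRR), one classical disproportionality statistic; it is illustrative only, the counts are invented, and it is neither the covariability method nor the MGPS implementation studied in the thesis.

# Illustrative proportional reporting ratio (PRR); the 2x2 counts are invented.
def proportional_reporting_ratio(a, b, c, d):
    """a: reports with the drug and the reaction
    b: reports with the drug, without the reaction
    c: reports without the drug, with the reaction
    d: reports without the drug and without the reaction"""
    rate_drug = a / (a + b)      # reaction rate among reports mentioning the drug
    rate_other = c / (c + d)     # reaction rate among all other reports
    return rate_drug / rate_other

# Example: 40 of 500 reports for drug X mention the reaction,
# versus 200 of 20000 reports for all other drugs.
print(proportional_reporting_ratio(40, 460, 200, 19800))   # 8.0

A PRR well above 1 flags a drug-reaction pair for clinical review, which is the kind of signal the covariability and ontology-based methods described above aim to improve on.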
APA, Harvard, Vancouver, ISO, and other styles
34

Saunders, Gary. "Pharmacovigilance Decision Support: The value of Disproportionality Analysis Signal Detection Methods, the development and testing of Covariability Techniques, and the importance of Ontology." University of Ballarat, 2006. http://archimedes.ballarat.edu.au:8080/vital/access/HandleResolver/1959.17/15382.

Full text
Abstract:
The cost of adverse drug reactions to society in the form of deaths, chronic illness, foetal malformation, and many other effects is quite significant. For example, in the United States of America, adverse reactions to prescribed drugs are around the fourth leading cause of death. The reporting of adverse drug reactions is spontaneous and voluntary in Australia. Many methods have been used for the analysis of adverse drug reaction data, mostly using a statistical approach as a basis for clinical analysis in drug safety surveillance decision support. This thesis examines new approaches that may be used in the analysis of drug safety data. These methods differ significantly from the statistical methods in that they utilize covariability methods of association to define drug-reaction relationships. Covariability algorithms were developed in collaboration with Musa Mammadov to discover drugs associated with adverse reactions and possible drug-drug interactions. This method uses the system organ class (SOC) classification in the Australian Adverse Drug Reaction Advisory Committee (ADRAC) data to stratify reactions. The text categorization algorithm BoosTexter was found to work with the same drug safety data, and its performance and modus operandi were compared to our algorithms. These alternative methods were compared to standard disproportionality analysis methods for signal detection in drug safety data, including the Bayesian multi-item gamma Poisson shrinker (MGPS), which was found to have problems with similar reaction terms in a report and with innocent bystander drugs. A classification of drug terms was made using the anatomical-therapeutic-chemical (ATC) classification codes. This reduced the number of drug variables from 5081 drug terms to 14 main drug classes. The ATC classification is structured into a hierarchy of five levels. Exploitation of the ATC hierarchy allows the drug safety data to be stratified in such a way as to make them accessible to powerful existing tools. A data mining method that uses association rules, grouping them on the basis of content, was used as a basis for applying the ATC and SOC ontologies to ADRAC data. This allows different views of these associations (even very rare ones). A signal detection method was developed using these association rules, which also incorporates critical reaction terms.
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
35

Posey, Orlando Guy. "Client/Server Systems Performance Evaluation Measures Use and Importance: a Multi-Site Case Study of Traditional Performance Measures Applied to the Client/Server Environment." Thesis, University of North Texas, 1999. https://digital.library.unt.edu/ark:/67531/metadc277882/.

Full text
Abstract:
This study examines the role of traditional computing performance measures when used in a client/server system (C/SS) environment. It also evaluates the effectiveness of traditional mainframe performance measures for use in C/SS. The underlying problem was the lack of knowledge about how performance measures are aligned with key business goals and strategies. This research has identified and evaluated the importance of client/server performance measurements in establishing an effective performance evaluation system. More specifically, this research enables an organization to do the following: (1) compare the relative states of development or importance of performance measures, (2) identify performance measures with the highest priority for future development, and (3) contrast the views of different organizations regarding the current or desired states of development or relative importance of these performance measures.
APA, Harvard, Vancouver, ISO, and other styles
36

Langham, J. "The importance of data quality and risk assessment in developing measures of comparative outcome : the National Study of Subarachnoid Haemorrhage, a case study." Thesis, London School of Hygiene and Tropical Medicine (University of London), 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.549778.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Fortuin, Mildred. "A geographic information systems approach to the identification of Table Mountain group aquifer "type areas" of ecological importance." Thesis, University of the Western Cape, 2004. http://etd.uwc.ac.za/index.php?module=etd&amp.

Full text
Abstract:
The Table Mountain group aquifer system has the potential to be an important supply of water. Although the aquifer system is used to some extent, a number of aspects relating to it are poorly understood and unquantified. This study aimed to take into consideration the importance of different ecosystems, which is essential in predicting the effects of groundwater abstraction. However, the ecological requirements of systems that depend on groundwater are poorly understood. This project identified "type areas" for further detailed research into the impacts of large-scale groundwater abstraction from the Table Mountain group aquifer system, based on the nature and functioning of ecosystems across groundwater-dependent ecosystem boundaries at a regional scale.
APA, Harvard, Vancouver, ISO, and other styles
38

Roxbergh, Linus. "Language Classification of Music Using Metadata." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-379625.

Full text
Abstract:
The purpose of this study was to investigate how metadata from Spotify could be used to identify the language of songs in a dataset containing nine languages. Features based on song name, album name, genre, regional popularity, and vectors describing songs, playlists and users were analysed individually and in combination with each other in different classifiers. In addition, the report explored how different levels of prediction confidence affect performance and how the approach compares to a classifier based on audio input. A random forest classifier proved to have the best performance, with an accuracy of 95.4% for the whole data set. Performance was also investigated when the confidence of the model was taken into account: keeping only the more confident predictions increased accuracy, and retaining the 70% most confident predictions gave an accuracy of 99.4%. The model also proved robust to input in languages other than those it was trained on, and managed to filter out unwanted records not matching the languages of the model. A comparison was made to a classifier based on audio input, where the model using metadata performed better on the training and test set used. Finally, a number of possible improvements and directions for future work are suggested.
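As an illustration of the confidence filtering described above, the sketch below keeps only the 70% most confident predictions of a random forest; it assumes pre-built feature matrices X_train, X_test and label arrays y_train, y_test, and it is not the code used in the thesis.

# Minimal sketch: evaluate accuracy on the 70% most confident predictions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)                    # X_train, y_train assumed to exist

proba = clf.predict_proba(X_test)            # class probabilities per song
confidence = proba.max(axis=1)               # confidence of the top class
threshold = np.quantile(confidence, 0.30)    # cut-off keeping the 70% most confident
keep = confidence >= threshold

y_pred = clf.classes_[proba.argmax(axis=1)]
accuracy_kept = (y_pred[keep] == y_test[keep]).mean()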
APA, Harvard, Vancouver, ISO, and other styles
39

Sternberg, Sebastian [Verfasser], and Thomas [Akademischer Betreuer] Gschwend. "No public, no power? Analyzing the importance of public support for constitutional review with novel data and machine learning methods / Sebastian Sternberg ; Betreuer: Thomas Gschwend." Mannheim : Universitätsbibliothek Mannheim, 2019. http://d-nb.info/1196009902/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Boland, Paul William. "Morphometric analysis of data inherent in examination by magnetic resonance imaging : importance to natural history, prognosis and disease staging of squamous carcinoma of the oral cavity." Thesis, University of Oxford, 2010. http://ora.ox.ac.uk/objects/uuid:934e1e5a-24db-40ab-ab54-5e58901a9c2a.

Full text
Abstract:
Magnetic resonance imaging plays an important yet underutilized role in determining the natural history and prognosis of oral carcinoma. Depth of tumour invasion is an emergent factor in the oral cancer literature. However, problems exist with the definition of cut-points suitable for inclusion in TNM staging criteria. Statistical methodology represents a possible explanation but is underexplored. In this work, a review of the depth of invasion literature is conducted with emphasis on statistical technique. In addition, statistical simulation is used to explore the implications of the minimum p-value method. The results demonstrate that the use of continuous variable categorization and multiple testing is widespread and contributes to cut-point variability and false-positive tests. Depth, as a predictor of OCLNM and survival, must therefore be questioned. The volume of tumour invasion is a promising prognostic factor that has not been fully investigated in the oral carcinoma literature. In this work, the volume of tumour invasion is measured on MRI and compared to thickness and maximum diameter in its capacity to predict 2-year all-cause, disease-related and disease-free survival, as well as occult cervical lymph node metastasis. As part of a comprehensive approach, morphometric factors are incorporated into multifactor predictive models using regression, artificial neural networks and recursive partitioning. It is evident that MRI-based volume is superior to all other linear measurements for both occult cervical lymph node metastasis and survival prediction. Artificial neural networks were superior to all other techniques for survival prediction. There is a case for a unified artificial neural network model for survival prediction that uses volume, midline invasion and N-stage to determine prognosis. This model can be used to determine individualized probabilities of 2-year survival. The lateral extrinsic muscles of the tongue lie just beneath the surface of the lateral tongue, yet their invasion is a criterion for T4 classification in the TNM staging system. In this work, the Visible Human Female is used to conduct an anatomic study of the extrinsic muscles of the tongue. Linear measurement is used to quantify the distance from the surface mucosa to the most superficial muscle fibres of the styloglossus and genioglossus. Furthermore, the lateral extrinsic muscles are poorly demonstrated on MRI. An anatomic atlas of the tongue is fused with MRI images of oral carcinoma to demonstrate lateral muscle invasion. The results demonstrate that the styloglossus and hyoglossus lie very close to the surface of the lateral tongue, in some cases passing within 1 mm of the surface mucosa. These extrinsic muscles are readily invaded by even small tumours of the lateral tongue. Strict application of the TNM T4a criteria leads to unnecessary upstaging, as these carcinomas do not warrant the prognosis and aggressive treatment of Stage IV disease. Extrinsic muscle invasion should be removed as a T4a criterion for the oral cavity. A separate category, T4a (oral tongue), specifying invasion of the genioglossus is also recommended.
The work presented in this thesis is an original contribution to the field of oral cavity cancer research and has determined that there is capacity for improvement in current efforts to determine the natural history and prognosis of oral cavity squamous cell carcinoma. This thesis is the first to examine the role of statistical methodology in oral carcinoma depth-of-invasion cut-point variability. Furthermore, this work presents an original approach to the prediction of regional metastasis and survival using advanced multivariate modeling techniques. No other work has explored MRI-measured volume using the substantial sample size gathered in this thesis. Finally, this work is the first to demonstrate that lateral extrinsic muscle invasion is an unnecessary component of the T4a (oral cavity) classification criteria and should be reconsidered.
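To make the minimum p-value problem examined in this thesis concrete, the simulation sketch below tests many candidate cut-points of a continuous predictor against an outcome that is truly unrelated to it; the design (sample size, cut-point grid, chi-squared test) is an assumption for illustration, not the simulation actually run in the thesis.

# Under the null, searching many cut-points still "finds" significant splits.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n, n_sims, false_pos = 200, 1000, 0

for _ in range(n_sims):
    depth = rng.normal(size=n)               # continuous predictor (e.g. depth)
    outcome = rng.integers(0, 2, size=n)     # outcome independent of the predictor
    p_values = []
    for cut in np.percentile(depth, list(range(10, 91, 5))):   # candidate cut-points
        table = np.array([[np.sum((depth <= cut) & (outcome == 0)),
                           np.sum((depth <= cut) & (outcome == 1))],
                          [np.sum((depth > cut) & (outcome == 0)),
                           np.sum((depth > cut) & (outcome == 1))]])
        p_values.append(chi2_contingency(table)[1])
    if min(p_values) < 0.05:
        false_pos += 1

print(false_pos / n_sims)    # far above the nominal 0.05 false-positive rate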
APA, Harvard, Vancouver, ISO, and other styles
41

Rudelius, Johan, and Erik Zetterström. "The importance of data when training a CNN for medical diagnostics : A study of how dataset size and format affects the learning process of a CNN." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280355.

Full text
Abstract:
Using the computational capabilities of computers within the medical field has become increasingly popular since the emergence of CAD during the middle of the twentieth century. The prevalence of skin cancer attracted research resources, and in 2017 a group of scientists from Stanford University trained a CNN which could outperform board-certified dermatologists in several skin cancer classification tests. The Stanford study gave rise to another study, conducted by Boman and Volminger, who tried to replicate the results using publicly available data but did not achieve the same performance. The initial ambition of this study was to extend the work of Boman and Volminger, but because a large part of their training data was unavailable, direct comparisons were difficult to make and the focus of the study shifted. The models presented in this study achieved 3-way classification accuracies of 82.2% and 87.3% for the balanced and imbalanced models respectively. The balanced model was trained on a data set which had been randomly oversampled and downsampled to make the different classes equal in size. The balanced model showed greater average values of specificity and sensitivity at a relatively small loss of accuracy. Although the accuracies of these models were higher than those produced by Boman and Volminger, it is difficult to draw any conclusions as the methodology in this study diverged from the previous work.
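As an illustration of the class balancing described above, the sketch below randomly over- and under-samples lists of image files so that all classes reach the same size; the data layout is a placeholder and this is not the code used in the study.

# Randomly over-/under-sample image lists so every class has target_size items.
import random

def balance_classes(files_by_class, target_size, seed=0):
    """files_by_class: dict mapping class name -> list of image paths."""
    rng = random.Random(seed)
    balanced = {}
    for label, files in files_by_class.items():
        if len(files) >= target_size:                  # under-sample without replacement
            balanced[label] = rng.sample(files, target_size)
        else:                                          # over-sample with replacement
            balanced[label] = files + rng.choices(files, k=target_size - len(files))
    return balanced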
APA, Harvard, Vancouver, ISO, and other styles
42

Elmsjö, Albert. "Selectivity in NMR and LC-MS Metabolomics : The Importance of Sample Preparation and Separation, and how to Measure Selectivity in LC-MS Metabolomics." Doctoral thesis, Uppsala universitet, Avdelningen för analytisk farmaceutisk kemi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-318296.

Full text
Abstract:
Until now, most metabolomics protocols have been optimized towards high sample throughput and high metabolite coverage, parameters considered highly important for identifying affected biological pathways and for generating as many potential biomarkers as possible. From an analytical point of view this can be troubling, as neither sample throughput nor the number of signals relates to the actual quality of the detected signals/metabolites. A method's selectivity for a specific signal/metabolite, however, is closely associated with the quality of that signal, yet it is a parameter often neglected in metabolomics. This thesis demonstrates the importance of considering selectivity when developing NMR and LC-MS metabolomics methods, and introduces a novel approach for measuring chromatographic and signal selectivity in LC-MS metabolomics. The selectivity of various sample preparations and HILIC stationary phases was compared. The choice of sample preparation affected the selectivity in both NMR and LC-MS. For the stationary phases, selectivity differences related primarily to retention differences of unwanted matrix components, e.g. inorganic salts or glycerophospholipids. Metabolites co-eluting with these matrix components often showed an incorrect quantitative signal, due to influenced ionization efficiency and/or adduct formation. A novel approach for measuring selectivity in LC-MS metabolomics is introduced: by dividing the intensity of each feature (a unique mass at a specific retention time) by the total intensity of the co-eluting features, a ratio representing the combined chromatographic (amount of co-elution) and signal (e.g. in-source fragmentation) selectivity is acquired. The calculated co-feature ratios have successfully been used to compare the selectivity of sample preparations and HILIC stationary phases. In conclusion, standard approaches in metabolomics research might be unwise, as each metabolomics investigation is often unique. The methods used should be adapted to the research question at hand, based primarily on any key metabolites as well as the type of sample to be analyzed. Increased selectivity, through a proper choice of analytical methods, may reduce the risk of matrix-associated effects and thereby reduce the false positive and false negative discovery rates of any metabolomics investigation.
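To make the co-feature ratio concrete, the sketch below divides each feature's intensity by the summed intensity of the features that co-elute with it; the data layout, the retention-time window, and the inclusion of the feature itself in the denominator are assumptions for illustration rather than the published implementation.

# Co-feature ratio sketch: features are (m/z, retention time, intensity) tuples.
def co_feature_ratios(features, rt_window=0.1):
    ratios = []
    for mz, rt, intensity in features:
        co_eluting = sum(i for _, r, i in features if abs(r - rt) <= rt_window)
        ratios.append((mz, rt, intensity / co_eluting))   # 1.0 means no co-elution
    return ratios

peaks = [(146.06, 5.02, 8.0e5), (147.08, 5.04, 2.0e5), (203.08, 7.50, 5.0e5)]
print(co_feature_ratios(peaks))   # the isolated 203.08 feature gets ratio 1.0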
APA, Harvard, Vancouver, ISO, and other styles
43

Réau, Manon. "Importance des données inactives dans les modèles : application aux méthodes de criblage virtuel en santé humaine et environnementale." Thesis, Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1251/document.

Full text
Abstract:
Virtual screening is widely used in the early stages of drug discovery and to build toxicity prediction models. Commonly used protocols include an evaluation of the performance of different tools on benchmarking databases before applying them in prospective studies. The composition of the benchmarking databases is a critical point; most benchmarking databases oppose active data to putatively inactive data because of the scarcity of published inactive data in the literature. Nonetheless, experimentally validated inactive data also carry information. Therefore, we constructed the NR-DBIND, a database dedicated to nuclear receptors that contains solely experimentally validated active and inactive data. The importance of integrating inactive data in the construction of docking and pharmacophore models was evaluated using the NR-DBIND data. Virtual screening protocols were used to resolve the potential binding modes of small molecules on FXR, NRP-1 and TNFα.
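As an illustration of how a screening protocol can be scored on a benchmark that contains experimentally validated actives and inactives, the sketch below computes a ROC AUC from docking scores; the labels and scores are invented placeholders, not NR-DBIND data.

# Score a docking run against experimentally validated actives (1) and inactives (0).
from sklearn.metrics import roc_auc_score

labels = [1, 1, 1, 0, 0, 0, 0, 1]
docking_scores = [-9.2, -8.7, -7.9, -6.1, -5.8, -8.0, -4.9, -8.1]

# More negative docking scores usually mean stronger predicted binding,
# so negate them before computing the AUC.
print(roc_auc_score(labels, [-s for s in docking_scores]))   # about 0.94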
APA, Harvard, Vancouver, ISO, and other styles
44

Pirathiban, Ramethaa. "Improving species distribution modelling: Selecting absences and eliciting variable usefulness for input into standard algorithms or a Bayesian hierarchical meta-factor model." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/134401/1/Ramethaa_Pirathiban_Thesis.pdf.

Full text
Abstract:
This thesis explores and proposes methods to improve species distribution models. Throughout this thesis, a rich class of statistical modelling techniques has been developed to address crucial and interesting issues related to the data input into these models. The overall contribution of this research is the advancement of knowledge on species distribution modelling through an increased understanding of extraneous zeros, quality of the ecological data, variable selection that incorporates ecological theory and evaluating performance of the fitted models. Though motivated by the challenge of species distribution modelling from ecology, this research is broadly relevant to many fields, including bio-security and medicine. Specifically, this research is of potential significance to researchers seeking to: identify and explain extraneous zeros; assess the quality of their data; or employ expert-informed variable selection.
APA, Harvard, Vancouver, ISO, and other styles
45

Tavakkoli, Timon-Amir [Verfasser], Sohrab [Akademischer Betreuer] [Gutachter] Fratz, Peter [Gutachter] Ewert, and Alfred [Gutachter] Hager. "Importance of Hemodynamic Right and Left Ventricular Parameters and CPET-Data in Fallot-Patients and Patients with Fallot-like Pathologies / Timon-Amir Tavakkoli ; Gutachter: Peter Ewert, Sohrab Fratz, Alfred Hager ; Betreuer: Sohrab Fratz." München : Universitätsbibliothek der TU München, 2016. http://d-nb.info/1114393940/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Carneiro, Murillo Guimarães. "Redes complexas para classificação de dados via conformidade de padrão, caracterização de importância e otimização estrutural." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-01022017-100223/.

Full text
Abstract:
Data classification is a machine learning and data mining task in which a classifier is trained over a set of labeled data instances in such a way that the labels of new instances can be predicted. Traditionally, classification techniques define decision boundaries in the data space according to the physical features of a training set, and a new data item is classified by verifying its position relative to the boundaries. Such a kind of classification, based only on the physical attributes of the data, makes traditional techniques unable to detect semantic relationships existing among the data, such as the pattern formation, for instance. On the other hand, recent works have shown that the use of complex networks is a promising way to capture spatial, topological and functional relationships of the data, as the network representation unifies the structure, dynamics and functions of the networked system. In this thesis, the main objective is the development of methods and heuristics based on complex networks for data classification. The main contributions comprise the concepts of pattern conformation, data importance and network structural optimization. For pattern conformation, in which complex networks are employed to estimate the membership of a test item according to the data formation pattern, we present a simple hybrid technique where physical and topological associations are produced from the same network. For data importance, we present a technique which considers the individual importance of the data items in order to determine the label of a given test item. The concept of importance here is derived from the PageRank formulation, the ranking measure behind Google's search engine used to calculate the importance of webpages. For network structural optimization, we present a bioinspired framework which is able to build up the network while optimizing a task-oriented quality function such as classification, dimension reduction, etc. The last investigation presented in this thesis exploits the graph representation and its ability to detect classes of arbitrary distributions for the task of semantic role diffusion. In all investigations, a wide range of experiments on artificial and real-world data sets, and many comparisons with well-known and widely used techniques, are also presented. In summary, the experimental results reveal that the advantages and new concepts provided by the use of networks represent relevant contributions to the areas of classification, learning systems and complex networks.
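To give a flavour of PageRank-style data importance, the sketch below builds a k-nearest-neighbour graph per class, computes PageRank on it, and labels a test item by the class whose important vertices it attaches to most strongly; the graph construction and the scoring rule are simplifying assumptions, not the algorithm proposed in the thesis.

# PageRank-based importance on per-class kNN graphs (illustrative only).
import networkx as nx
from sklearn.neighbors import NearestNeighbors

def fit_class(X_class, k=5):
    """Build a kNN graph over one class and compute PageRank importance."""
    nbrs = NearestNeighbors(n_neighbors=min(k + 1, len(X_class))).fit(X_class)
    _, idx = nbrs.kneighbors(X_class)
    graph = nx.Graph([(i, j) for i, row in enumerate(idx) for j in row[1:]])
    return nx.pagerank(graph), nbrs

def classify(x, models):
    """models: dict mapping label -> (pagerank dict, fitted NearestNeighbors)."""
    scores = {}
    for label, (pr, nbrs) in models.items():
        _, idx = nbrs.kneighbors([x])
        scores[label] = sum(pr.get(i, 0.0) for i in idx[0])   # importance reached
    return max(scores, key=scores.get)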
APA, Harvard, Vancouver, ISO, and other styles
47

Goble, Peter. "Maximizing the utility of available root zone soil moisture data for drought monitoring purposes in the Upper Colorado River Basin and western High Plains, and assessing the interregional importance of root zone soil moisture on warm season water." Thesis, Colorado State University, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10139009.

Full text
Abstract:

Root Zone Soil Moisture (RZSM) data have both drought monitoring and seasonal forecasting applications. It is the lifeblood of vegetation, an integral component of the hydrologic system, a determining factor in irrigation requirements, and works to govern the means by which energy imbalances are settled between land and atmosphere. The National Integrated Drought Information System (NIDIS) has worked in conjunction with the Colorado Climate Center to improve regional drought early warning through enhanced monitoring and understanding of RZSM. The chief goals of this research have been as follows: 1. Examine regional drought monitoring in the Upper Colorado River Basin and eastern Colorado with specific inquiry as to soil moisture’s role in the process. 2. Develop operational products that can be used to improve the weekly drought monitoring process in the Upper Colorado River Basin and eastern Colorado with an emphasis on utilization of soil moisture data. 3. Review in-situ soil moisture data from high elevation Snow Telemetry measurement sites in Colorado in order to understand the descriptive climatology of soil moisture over the Colorado Rockies. 4. Compare output from soil sensors installed by the Snow Telemetry and Colorado Agricultural Meteorological Network using current calibration methods in order to better understand application of direct comparison between output from the two different sensor types. Engineer a soil moisture core measurement protocol that is reliable within ten percent of the true volumetric water content value. This protocol, if successful on a local plot, will be expanded to alpha testers around the United States and used by the USDA for drought monitoring as well as NASA for ground validation of the Soil Moisture Active Passive (SMAP) Satellite. 5. Expose the seasonality and spatial variability of positive feedbacks that occur between RZSM and the atmosphere across the Upper Colorado River Basin and western High Plains using reanalysis data from the North American Land Data Assimilation System Phase-2 (NLDAS).

Regional drought monitoring was found to involve assimilation of data from a bevy of sources. The decision-making process includes assessment of precipitation, soil moisture, snowpack, vegetative health, streamflow, reservoir levels, reference evapotranspiration, surface air temperature, and ground reports from the regional agricultural sector. Drought monitoring was expanded upon in this research through the development of several products intended for future Colorado Climate Center use. In-situ soil moisture time series are now being created from select SNOTEL and SCAN measurement sites. Reservoir monitoring graphics are being produced to accompany spatial analyses downloaded from the Bureau of Reclamation. More soil moisture data are being used, and they now come from an ensemble of models rather than just the VIC model.

While only ten years of data were collected in analyzing the descriptive soil moisture climatology of the Colorado Rockies, these data were telling in terms of the expected seasonal cycle of soil moisture at high elevations. SNOTEL measurements reveal that soil moisture levels peak prior to snowmelt, large decreases in soil moisture are expected in June and early July, a slight recovery is anticipated in association with the North American Monsoon, and the sign of the near-surface water balance flips back to positive in the first two weeks of September before soils freeze. The seasonal variance and distribution of volumetric water content vary in ways that are useful to understand from a drought monitoring standpoint. The data show that measurements are affected when soil freezes.

Comparing output from soil sensor relays using sensor types and calibration methods consistent with current SNOTEL and CoAgMet specifications revealed large differences in output, despite the sensors being subject to the same meteorological conditions.

Soil moisture measurement protocol development proved to be a trial-and-error process. The data collected at Christman Field were not sufficient proof that soil coring results came within ten percent of ground truth, perhaps due to microscale variations in infiltration. It was, however, possible to develop a protocol of an acceptable standard that could be followed by citizen scientists at an estimated cost of $50.

Results from statistical modeling of post-processed NLDAS data from the last 30 years point primarily to a time frame between May and July in which soil moisture anomalies become significantly correlated with seasonal temperature and precipitation anomalies. This time of year is partially characterized by a climatologic maximization of downwelling solar radiation and a northward recession of the polar jet, but also precedes the anticipated arrival of the North American Monsoon. (Abstract shortened by ProQuest.)

APA, Harvard, Vancouver, ISO, and other styles
48

West, Adam. "Hunting for humans in forest ecosystems : are the traces of Iron-age people detectable? : an investigation into the importance of Iron-age slash-an-burn agriculture in KwaZulu-Natal forests using compositional and demographic data and carbon isotope techniques." Master's thesis, University of Cape Town, 1999. http://hdl.handle.net/11427/23678.

Full text
Abstract:
To what extent are humans responsible for the biological landscapes that we see today? We relate to recent phenomena such as urban environments and commercial farmlands as anthropogenically created landscapes; however, historic anthropogenic influence may have been a lot more extensive than previously accepted (Gomez-Pompa & Kaus 1992, Bird 1995, Motzkin et al. 1996). In southern Africa we are surrounded by landscapes influenced by humans to some degree (Hoffman 1997). It is now accepted that even wilderness landscapes previously labelled as "pristine" or "natural" are subject to constant change (Botkin 1990) and could well have been generated, or at least influenced, by humans in the past (Gomez-Pompa & Kaus 1992, Foster et al. 1996, Bird & Cali 1998). This is certainly the case for many forest systems (Binford et al. 1987, Balee 1989, Northrop & Horn 1996, Noble & Dirzo 1997, Ogden et al. 1998, Lindbladh & Bradshaw 1998, Foster et al. 1999). This thesis attempts to answer, for forest ecosystems, the question posed almost 20 years ago by Feely (1980): "Did Iron Age Man have a role in the history of Zululand's wilderness landscapes?" In doing so, I hoped to address the larger issue of "ecosystem virginity" and to what extent landscapes with a lengthy history of human habitation are dependent on human-generated disturbance.
APA, Harvard, Vancouver, ISO, and other styles
49

Alet, Ferran (Alet I. Puig). "Finding important entities in continuous streaming data." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118027.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 65-67).
In many applications that involve processing high-dimensional data, it is important to identify a small set of entities that account for a significant fraction of detections. Rather than formalize this as a clustering problem, in which all detections must be grouped into hard or soft categories, we formalize it as an instance of the frequent items or heavy hitters problem, which finds groups of tightly clustered objects that have a high density in the feature space. We show that the heavy hitters formulation generates solutions that are more accurate and effective than the clustering formulation. In addition, we present a novel online algorithm for heavy hitters, called HAC, which addresses problems in continuous space, and demonstrate its effectiveness on real video and household domains.
by Ferran Alet.
S.M.
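Since the abstract does not spell out how HAC works, the sketch below instead shows the classical Misra-Gries summary for the frequent items (heavy hitters) problem on a discrete stream, purely as background for the formulation the thesis adopts; it is not HAC and, unlike HAC, it does not handle continuous feature spaces.

# Misra-Gries frequent-items summary: candidates with frequency > len(stream)/k.
def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:                  # decrement every counter and drop those hitting zero
            counters = {key: c - 1 for key, c in counters.items() if c > 1}
    return counters

print(misra_gries(list("aababcabdaae"), k=3))   # {'a': 3}: 'a' dominates the stream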
APA, Harvard, Vancouver, ISO, and other styles
50

Korkmaz, Gulberal Kircicegi Yoksul. "Mining Microarray Data For Biologically Important Gene Sets." Phd thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614266/index.pdf.

Full text
Abstract:
Microarray technology enables researchers to measure the expression levels of thousands of genes simultaneously in order to understand relationships between genes, extract pathways, and in general understand a diverse range of biological processes such as diseases and cell cycles. While microarrays provide a great opportunity for revealing information about biological processes, mining the huge amount of information contained in microarray datasets is a challenging task. Generally, since an accurate model for the data is missing, a clustering algorithm is applied first and the resulting clusters are then examined manually to find genes related to the biological process under inspection. We need automated methods for this analysis that can be used to eliminate unrelated genes from the data and mine for biologically important genes. Here, we introduce a general methodology which makes use of traditional clustering algorithms and involves the integration of the two main sources of biological information, Gene Ontology and interaction networks, with microarray data in order to eliminate unrelated information and find a clustering result containing only genes related to a given biological process. We applied our methodology successfully to a number of different cases and different organisms. We assessed the results with the Gene Set Enrichment Analysis method and showed that our final clusters are highly enriched. We also analyzed the results manually and found that most of the genes in the final clusters are indeed related to the biological process under inspection.
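As a concrete example of the kind of enrichment assessment mentioned above, the sketch below runs a hypergeometric over-representation test for one Gene Ontology term in one cluster; it is a simpler relative of the Gene Set Enrichment Analysis used in the thesis, and the gene counts are placeholders.

# Hypergeometric over-representation test for a GO term within a cluster.
from scipy.stats import hypergeom

def enrichment_p(cluster_size, hits_in_cluster, term_size, universe_size):
    """P(X >= hits_in_cluster) when drawing cluster_size genes, without
    replacement, from a universe with term_size genes annotated to the term."""
    return hypergeom.sf(hits_in_cluster - 1, universe_size, term_size, cluster_size)

# Example: 12 genes of a 40-gene cluster carry a term annotated to 300 of 10000 genes.
print(enrichment_p(40, 12, 300, 10000))   # a very small p-value, i.e. enriched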
APA, Harvard, Vancouver, ISO, and other styles