Dissertations / Theses on the topic 'Real world Data'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Real world Data.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Rodittis, Katherine, and Patrick Mattingly. "USING MICROSOFT’S COMPONENT OBJECT MODEL (COM) TO IMPROVE REAL-TIME DISPLAY DEVELOPMENT FOR THE ADVANCED DATA ACQUISITION AND PROCESSING SYSTEM (ADAPS)." International Foundation for Telemetering, 2000. http://hdl.handle.net/10150/606801.

Full text
Abstract:
International Telemetering Conference Proceedings / October 23-26, 2000 / Town & Country Hotel and Conference Center, San Diego, California
Microsoft’s Component Object Model (COM) allows us to rapidly develop display and analysis features for the Advanced Data Acquisition and Processing System (ADAPS).
APA, Harvard, Vancouver, ISO, and other styles
2

Anantharajah, Kaneswaran. "Robust face clustering for real-world data." Thesis, Queensland University of Technology, 2015. https://eprints.qut.edu.au/89400/1/Kaneswaran_Anantharajah_Thesis.pdf.

Full text
Abstract:
This thesis has investigated how to cluster a large number of faces within a multi-media corpus in the presence of large session variation. Quality metrics are used to select the best faces to represent a sequence of faces; and session variation modelling improves clustering performance in the presence of wide variations across videos. Findings from this thesis contribute to improving the performance of both face verification systems and the fully automated clustering of faces from a large video corpus.
APA, Harvard, Vancouver, ISO, and other styles
3

Allen, Brett. "Learning body shape models from real-world data /." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/6969.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Apeh, Edward Tersoo. "Adaptive algorithms for real-world transactional data mining." Thesis, Bournemouth University, 2012. http://eprints.bournemouth.ac.uk/20989/.

Full text
Abstract:
The accurate identification of the right customer to target with the right product at the right time, through the right channel, to satisfy the customer’s evolving needs, is a key performance driver and enhancer for businesses. Data mining is an analytic process designed to explore usually large amounts of data (typically business or market related) in search of consistent patterns and/or systematic relationships between variables for the purpose of generating explanatory/predictive data models from the detected patterns. It provides an effective and established mechanism for accurate identification and classification of customers. Data models derived from the data mining process can aid in effectively recognizing the status and preference of customers - individually and as a group. Such data models can be incorporated into the business market segmentation, customer targeting and channelling decisions with the goal of maximizing the total customer lifetime profit. However, due to costs, privacy and/or data protection reasons, the customer data available for data mining is often restricted to verified and validated data (in most cases, only the business-owned transactional data is available). Transactional data is a valuable resource for generating such data models. Transactional data can be electronically collected and readily made available for data mining in large quantity at minimum extra cost. Transactional data is, however, inherently sparse and skewed. These inherent characteristics of transactional data give rise to the poor performance of data models built using customer data based on transactional data. Data models for identifying, describing, and classifying customers, constructed using evolving transactional data thus need to effectively handle the inherent sparseness and skewness of evolving transactional data in order to be efficient and accurate. Using real-world transactional data, this thesis presents the findings and results from the investigation of data mining algorithms for analysing, describing, identifying and classifying customers with evolving needs. In particular, methods for handling the issues of scalability, uncertainty and adaptation whilst mining evolving transactional data are analysed and presented. A novel application of a new framework for integrating transactional data binning and classification techniques is presented alongside an effective prototype selection algorithm for efficient transactional data model building. A new change mining architecture for monitoring, detecting and visualizing the change in customer behaviour using transactional data is proposed and discussed as an effective means for analysing and understanding the change in customer buying behaviour over time. Finally, the challenging problem of discerning between the change in the customer profile (which may necessitate the effective change of the customer’s label) and the change in performance of the model(s) (which may necessitate changing or adapting the model(s)) is introduced and discussed by way of a novel flexible and efficient architecture for classifier model adaptation and customer profile class relabeling.
APA, Harvard, Vancouver, ISO, and other styles
5

Naulleau, Patrick. "Optical signal processing and real world applications /." Online version of thesis, 1993. http://hdl.handle.net/1850/12136.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Vogetseder, Georg. "Functional Analysis of Real World Truck Fuel Consumption Data." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-1148.

Full text
Abstract:

This thesis covers the analysis of sparse and irregular fuel consumption data of long-distance haulage articulated trucks. It is shown that this kind of data is hard to analyse with multivariate as well as with functional methods. To be able to analyse the data, Principal Components Analysis through Conditional Expectation (PACE) is used, which enables the use of observations from many trucks to compensate for the sparsity of observations in order to get continuous results. The principal component scores generated by PACE can then be used to get rough estimates of the trajectories for single trucks as well as to detect outliers. The data centric approach of PACE is very useful to enable functional analysis of sparse and irregular data. Functional analysis is desirable for this data to sidestep feature extraction and enable a more natural view on the data.
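
For readers unfamiliar with PACE, the sketch below illustrates only the pooling idea behind it — many sparsely observed trucks contribute to one smooth population trend, which is then used to produce a continuous estimate for a single truck. The data are synthetic and a polynomial fit stands in for the local smoothers and conditional-expectation step PACE actually uses; this is not the thesis's code.

```python
# Minimal sketch of the pooling idea behind PACE-style analysis of sparse,
# irregular per-truck fuel-consumption curves. All data are synthetic; real
# PACE additionally smooths a pooled covariance surface and estimates
# principal-component scores by conditional expectation.
import numpy as np

rng = np.random.default_rng(0)

def true_curve(t):
    # hypothetical population-mean fuel consumption over normalised time
    return 30 + 3 * np.sin(2 * np.pi * t) + 2 * t

# each truck is observed at only a handful of irregular time points
trucks = []
for _ in range(200):
    t = np.sort(rng.uniform(0, 1, size=rng.integers(3, 8)))
    y = true_curve(t) + rng.normal(0, 1.0, size=t.size)
    trucks.append((t, y))

# pool all observations across trucks to estimate a smooth mean trend
all_t = np.concatenate([t for t, _ in trucks])
all_y = np.concatenate([y for _, y in trucks])
mean_fit = np.polynomial.Polynomial.fit(all_t, all_y, deg=4)

# rough continuous estimate for one sparsely observed truck:
# pooled mean trend plus that truck's average deviation from it
t_i, y_i = trucks[0]
offset = np.mean(y_i - mean_fit(t_i))
trajectory_estimate = mean_fit(np.linspace(0, 1, 50)) + offset
print(trajectory_estimate[:5])
```
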

APA, Harvard, Vancouver, ISO, and other styles
7

Langdell, Stephen James. "Radial basis function networks for modelling real world data." Thesis, University of Huddersfield, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.285590.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Tsapeli, Theofania Kleio. "Understanding real-world phenomena from human-generated sensor data." Thesis, University of Birmingham, 2018. http://etheses.bham.ac.uk//id/eprint/8445/.

Full text
Abstract:
Nowadays, data availability is constantly increasing. Smartphones, wearable devices, social media, web browsing information and sales recordings are only a few of the newly available information sources. Analysing this kind of information is an important step towards understanding human behaviour. In this dissertation, I propose novel techniques for uncovering the complex dependencies between factors extracted from raw sensor data and real-world phenomena and I demonstrate the potential of utilising the vast amount of human digital traces in order to better understand human behaviour and factors influenced by it. In particular, two main problems are considered: 1) whether there is a dependency between social media data and traded asset prices and 2) how smartphone sensor data can be used to understand factors that influence our stress level. In this thesis, I focus on uncovering the structural dependencies among factors of interest rather than on the detection of mere correlation. Special attention is given to enhancing the reliability of the findings by developing techniques that can better handle the specific characteristics of the examined datasets. Although the developed approaches are motivated by specific problems related to human-generated sensor data, they are general and can be applied to any dataset with similar characteristics.
APA, Harvard, Vancouver, ISO, and other styles
9

Saunders, L. J. "Studies on real world visual field data in glaucoma." Thesis, City, University of London, 2015. http://openaccess.city.ac.uk/16170/.

Full text
Abstract:
Glaucoma is a leading cause of blindness. As a progressive condition, it is important to monitor how the visual field (VF) changes over time with perimetry in preventing vision from deteriorating to a stage where quality of life is affected. However, there is little evidence of how clinical measurements correlate with meaningful quality of life landmarks for the patient or, by extension, the proportion of patients in danger of progressing to these landmarks. Further, measurement variability associated with visual fields makes it difficult to monitor true change over time. The purpose of this thesis was to use large-scale clinical data (almost 500,000 VFs) to address some of these issues. The first study attempted to relate clinical measurements of glaucoma severity to UK legal fitness to drive status. Legal fitness to drive (LFTD) was estimated using the integrated visual field as a surrogate of the Esterman test, which is the approved method by the UK DVLA of defining LFTD, while the mean deviation (MD) was used to represent defect severity. An MD of -14dB or worse in the better eye was found to be associated with a 92% (95% Confidence Interval [CI]: 87-95%) probability of being legally unfit to drive. The second study used a statistical model to estimate the number of patients progressing at rates that could lead to this landmark of significant visual impairment or blindness in their predicted remaining lifetime. A significant minority of patients were progressing at rates that could lead to statutory blindness, as defined by the US Social Security Administration, in their predicted remaining lifetime (5.2% [CI: 4.5-6.0%]) with a further 10% in danger of becoming legally unfit to drive (10.4% [CI: 9.4-11.4%]). More than 90% (CI: 85.7-94.3%) of patients predicted to progress to statutory blindness had an MD worse than -6dB in at least one eye at presentation, suggesting an association between baseline VF damage and risk of future impairment. The next section investigated whether choice of testing algorithm, SITA Standard or SITA Fast, affected the time taken to detect progression in VF follow-up. The precision of the tests was measured using linear modelling techniques and the impact of these differences was analysed using simulations. Though SITA Fast was found to be slightly less precise, no evidence was found to suggest that this resulted in progression being detected later. The final study evaluated a validated and published risk calculator, which utilised baseline risk factors to profile risk of fast progression. A simpler model using baseline VF data was developed to have similar statistical properties for comparison (including equivalent R2 statistics). The results suggested that risk calculators with low R2 statistics had little utility for predicting future progression rate in clinical practice. Together these results contribute a variety of novel findings and demonstrate the benefit of using large quantities of data collected from the everyday clinical milieu to extend clinical knowledge.
APA, Harvard, Vancouver, ISO, and other styles
10

Lövenvald, Frans-Lukas. "FINDING ANOMALOUS TIME FRAMES IN REAL-WORLD LOG DATA." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163311.

Full text
Abstract:
Anomaly detection is a huge field of research focused on the task of finding weird or outlying points in data. This task is useful in all fields that handle large amounts of data and is therefore a big topic of research. The focus of research often lies in finding novel approaches for finding anomalies in already labeled and well-understood data. This thesis will not focus on a novel algorithm but instead display and discuss the power of an anomaly detection process that focuses on feature engineering and feature exploration. The thesis will also compare two unsupervised anomaly classification algorithms, namely k-nearest neighbours and principal component analysis, in terms of explainability and scalability. The results conclude that sometimes feature engineering can display anomalies just as well as novel and complex anomaly detection algorithms.
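
As context for the two unsupervised algorithms compared in this abstract, the following is a minimal sketch — synthetic feature vectors, not the thesis's log data or code — of a k-nearest-neighbour distance score and a PCA reconstruction-error score used as anomaly scores:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(1000, 10))          # "normal" time-frame features
X[-5:] += 6                                    # a few injected anomalies

# kNN score: mean distance to the k nearest neighbours
# (the first neighbour is the point itself; acceptable for a rough score)
knn = NearestNeighbors(n_neighbors=5).fit(X)
dist, _ = knn.kneighbors(X)
knn_score = dist.mean(axis=1)

# PCA score: reconstruction error after projecting to a few components
pca = PCA(n_components=3).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))
pca_score = np.linalg.norm(X - X_rec, axis=1)

print("top frames by kNN score:", np.argsort(knn_score)[-5:])
print("top frames by PCA score:", np.argsort(pca_score)[-5:])
```
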
APA, Harvard, Vancouver, ISO, and other styles
11

Larsson, Daniel. "ARAVQ for discretization of radar data : An experimental study on real world sensor data." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11114.

Full text
Abstract:
The aim of this work was to investigate if interesting patterns could be found in time series radar data that had been discretized by the algorithm ARAVQ into symbolic representations and if the ARAVQ thus might be suitable for use in the radar domain. An experimental study was performed where the ARAVQ was used to create symbolic representations of data sets with radar data. Two experiments were carried out that used a Markov model to calculate probabilities used for discovering potentially interesting patterns. Some of the most interesting patterns were then investigated further. Results have shown that the ARAVQ was able to create accurate representations for several time series and that it was possible to discover patterns that were interesting and represented higher level concepts. However, the results also showed that the ARAVQ was not able to create accurate representations for some of the time series.
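
The sketch below illustrates only the second stage described here: given a symbol sequence of the kind ARAVQ would output (faked in this example, since ARAVQ itself is not reproduced), a first-order Markov model assigns a probability to every observed transition, and low-probability transitions become the candidate "interesting" patterns.

```python
# Estimate first-order Markov transition probabilities over a symbol
# sequence and report the rarest transitions. The sequence is invented,
# standing in for a discretised radar time series.
from collections import Counter, defaultdict

symbols = list("AABBBCABBBCAABBBCABBBCAD")   # hypothetical ARAVQ output

pairs = list(zip(symbols, symbols[1:]))
counts = Counter(pairs)
totals = defaultdict(int)
for (a, _), c in counts.items():
    totals[a] += c

probs = {(a, b): c / totals[a] for (a, b), c in counts.items()}

# transitions with the lowest probability are the candidates of interest
for (a, b), p in sorted(probs.items(), key=lambda kv: kv[1])[:3]:
    print(f"{a} -> {b}: p = {p:.2f}")
```
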
APA, Harvard, Vancouver, ISO, and other styles
12

Nagao, Katashi, Katsuhiko Kaji, and Toshiyuki Shimizu. "Discussion Mining : Knowledge Discovery from Data on the Real World Activities." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2004. http://hdl.handle.net/2237/10350.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Parker, K. N. "Numeric data frames and probabilistic judgments in complex real-world environments." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1536437/.

Full text
Abstract:
This thesis investigates human probabilistic judgment in complex real-world settings to identify processes underpinning biases across groups which relate to numerical frames and formats. Experiments are conducted replicating real-world environments and data to test judgment performance based on framing and format. Regardless of background skills and experience, people in professional and consumer contexts show a strong tendency to perceive the world from a linear perspective, interpreting information in concrete, absolute terms and making judgments based on seeking and applying linear functions. Whether predicting sales, selecting between financial products, or forecasting refugee camp data, people use minimal cues and systematically apply additive methods amidst non-linear trends and percentage points to yield linear estimates in both rich and sparse informational contexts. Depending on data variability and temporality, human rationality and choice may be significantly helped or hindered by informational framing and format. The findings deliver both theoretical and practical contributions. Across groups and individual differences, the effects of informational format and the tendency to linearly extrapolate are connected by the bias to perceive values in concrete terms and make sense of data by seeking simple referent points. People compare and combine referents using additive methods when inappropriate and adhere strongly to defaults when applied in complex numeric environments. The practical contribution involves a framing manipulation which shows that format biases (i.e., additive processing) and optimism (i.e., associated with intertemporal effects) can be counteracted in judgments involving percentages and exponential growth rates by using absolute formats and positioning defaults in future event context information. This framing manipulation was highly effective in improving loan choice and repayment judgments compared to information in standard finance industry formats. There is a strong potential to increase rationality using this data format manipulation in other financial settings and domains such as health behaviour change in which people’s erroneous interpretation of percentages and non-linear relations negatively impacts choice and behaviours in both the short and long-term.
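
A tiny worked example of the linearity bias described above, with invented numbers: a balance growing at a constant percentage rate is often extrapolated additively, which understates the true compound value.

```python
# Additive (linear) extrapolation versus actual compound growth.
principal, rate, years = 10_000.0, 0.10, 10

linear_guess = principal * (1 + rate * years)     # additive extrapolation
compound = principal * (1 + rate) ** years        # exponential growth

print(f"linear estimate : {linear_guess:,.0f}")   # 20,000
print(f"compound balance: {compound:,.0f}")       # ~25,937
```
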
APA, Harvard, Vancouver, ISO, and other styles
14

Treitz, Bastian. "Comparison of data-communication technologies for a Real-World Business Application." Hochschule für Technik, Wirtschaft und Kultur, 2020. https://htwk-leipzig.qucosa.de/id/qucosa%3A73720.

Full text
Abstract:
In a client/server network architecture, communication between the components is governed by protocols, and the choice of protocol influences the delay times in data transfer. This thesis tested selected protocols and technologies for A12 business applications with respect to performance. The focus was on HTTP/2, HTTP long polling, WebSockets and the AMQP implementation RabbitMQ. To this end, an application was developed that implements all use cases for each of the technologies. The defined test scenarios were then executed, and the results show which technologies perform well or poorly in which use cases. HTTP/2 achieves comparatively good results when the use case at hand follows the request/response pattern; if, in addition, large objects have to be sent, HTTP/2 shows delay savings of at least 22% over the other tested technologies. Long polling turns out to be a poor option for A12 business applications with continuously updating live data. WebSockets, in contrast, stand out in this case with very low latency, which is particularly convincing for a larger number of clients; there are use cases in which the WebSocket is 63% faster than the alternatives. RabbitMQ delivers good results across a wide range of use cases. For live data sent to a single client, the message queue can keep up with the WebSocket's latencies; only an increase in the number of clients leads to a growing gap. RabbitMQ is also convincing with short reconnection times and for A12 business applications with a request/response pattern, where there are use cases in which RabbitMQ is 10% faster than HTTP/2.
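
As a purely illustrative companion to this comparison (the thesis's A12 test application is not reproduced here), the sketch below shows one generic way to time a single round trip over a WebSocket and over plain HTTP in Python; the endpoint URLs are placeholders for a local echo server.

```python
# Generic round-trip timing sketch, not the thesis's benchmark code.
import asyncio
import time

import requests          # pip install requests
import websockets        # pip install websockets

async def ws_round_trip(uri: str) -> float:
    async with websockets.connect(uri) as ws:
        start = time.perf_counter()
        await ws.send("ping")
        await ws.recv()
        return time.perf_counter() - start

def http_round_trip(url: str) -> float:
    start = time.perf_counter()
    requests.get(url, timeout=5)
    return time.perf_counter() - start

if __name__ == "__main__":
    # placeholder endpoints; replace with a real echo server
    print("ws  :", asyncio.run(ws_round_trip("ws://localhost:8080/echo")))
    print("http:", http_round_trip("http://localhost:8080/ping"))
```
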
APA, Harvard, Vancouver, ISO, and other styles
15

Nogueira, Mariana. "Machine learning to support exploring and exploiting real-world clinical longitudinal data." Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/669968.

Full text
Abstract:
Following-up on patient evolution by reacquiring the same measurements over time (longitudinal data) is a crucial component in clinical care dynamics, as it creates opportunity for timely decision making in preventing adverse outcome. It is thus important that clinicians have proper longitudinal analysis tools at their service. Nonetheless, most traditional longitudinal analysis tools have limited applicability if data are (1) not highly standardized or (2) very heterogeneous (e.g. images, signal, continuous and categorical variables) and/or high-dimensional. These limitations are extremely relevant, as both scenarios are prevalent in routine clinical practice. The aim of this thesis is the development of tools that facilitate the integration and interpretation of complex and nonstandardized longitudinal clinical data. Specifically, we explore approaches based on unsupervised dimensionality reduction, which allow the integration of complex longitudinal data and their representation as low-dimensional yet clinically interpretable trajectories. We showcase the potential of the proposed approach in the contexts of two specific clinical problems with different scopes and challenges: (1) nonstandardized stress echocardiography and (2) labour monitoring and decision making. In the first application, the proposed approach proved to help in the identification of normal and abnormal patterns in cardiac response to stress and in the understanding of the underlying pathophysiological mechanisms, in a context of nonstandardized longitudinal data collection involving heterogeneous data streams. In the second application, we showed how the proposed approach could be used as the central concept of a personalized labour monitoring and decision support system, outperforming the current reference labour monitoring and decision support tool. Overall, we believe that this thesis validates unsupervised dimensionality reduction as a promising approach to the analysis of complex and nonstandardized clinical longitudinal data.
El seguimiento de la evolución de un paciente tomando las mismas medidas en diferentes instantes temporales (datos longitudinales) es un componente crucial en la dinámica de los cuidados médicos, ya que permite tomar decisiones correctas en el momento idóneo para prevenir eventos adversos. Es entonces importante que los médicos tengan a su disposicion herramientas para analizar datos de carácter longitudinal. Sin embargo, la mayoría de las herramientas que actualmente existen tienen una aplicabilidad limitada si los datos (1) no están suficientemente estandarizados o (2) son muy heterogéneos (eg: imágenes, señales, variables continuas y categóricas) y/o tienen una alta dimensionalidad. Estas limitaciones son tremendamente relevantes, ya que ambos casos son prevalentes en la practica clínica habitual. El objetivo de esta tesis es el desarrollo de herramientas que facilitan la integración e interpretación de datos clínicos longitudinales que son complejos y no están estandarizados. Específicamente, exploramos enfoques basados en la reducción de dimensionalidad no supervisada, que permite integrar datos longitudinales complejos y su representación como una trayectoria de baja dimensión que es clínicamente interpretable. Mostramos el potencial del enfoque propuesto en el contexto de dos problemas clínicos en diferentes ámbitos y con diferentes desafíos: (1) ecocardiografía de estrés no estandarizada y (2) monitoreo de parto y toma de decisiones. En la primera aplicación, el enfoque propuesto ha mostrado ser de ayuda en la identificación de patrones normales y anormales en la respuesta cardiaca al estrés y en entender los mecanismos patofisiologicos subyacentes, en el contexto de una adquisición de datos longitudinales no estandarizados que contiene un flujo de datos heterogéneo. En la segunda aplicación, mostramos como el enfoque propuesto puede ser el concepto central de un sistema de monitoreo del parto y soporte a la decisión personalizado, superando el sistema actual de referencia. En conclusión, creemos que esta tesis muestra que la reducción de dimensión no supervisada es un prometedor enfoque para analizar datos clínicos longitudinales complejos y no estandarizados.
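
To illustrate the general idea of reading longitudinal follow-up as a low-dimensional trajectory, here is a minimal sketch; the abstract does not specify the reduction method, so plain PCA and synthetic patient data are used as stand-ins for the thesis's approach.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_patients, n_visits, n_features = 20, 5, 12

# one row per (patient, visit): per-patient baseline plus gradual drift
rows = []
for _ in range(n_patients):
    baseline = rng.normal(0, 1, n_features)
    drift = rng.normal(0.3, 0.1, n_features)
    for v in range(n_visits):
        rows.append(baseline + v * drift + rng.normal(0, 0.1, n_features))
X = np.array(rows)

# embed all visits jointly, then read each patient's visits as a trajectory
emb = PCA(n_components=2).fit_transform(X)
trajectories = emb.reshape(n_patients, n_visits, 2)
print("patient 0 trajectory in the 2-D space:\n", trajectories[0])
```
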
APA, Harvard, Vancouver, ISO, and other styles
16

Hu, Yang. "PV Module Performance Under Real-world Test Conditions - A Data Analytics Approach." Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1396615109.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Lundmark, Lukas. "Synthetic Meta-Learning: : Learning to learn real-world tasks with synthetic data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264919.

Full text
Abstract:
Meta-learning is an approach to machine learning that teaches models how to learn new tasks with only a handful of examples. However, meta-learning requires a large labeled dataset during its initial meta-learning phase, which restricts what domains meta-learning can be used in. This thesis investigates if this labeled dataset can be replaced with a synthetic dataset without a loss in performance. The approach has been tested on the task of military vehicle classification. The results show that for few-shot classification tasks, models trained with synthetic data can come close to the performance of models trained with real-world data. The results also show that adjustments to the data-generation process, such as light randomization, can have a significant effect on performance, suggesting that fine-tuning to the generation process could further improve performance.
Metainlärning är en metodik inom maskininlärning som gör det möjligt att lära en modell nya uppgifter med endast en handfull mängd träningsexempel. Metainlärning kräver dock en stor mängd träningsdata under själva metaträningsfasen, vilket begränsar de domäner där metodiken kan användas. Detta examensarbete utreder huruvida syntetisk bilddata, som genererats med hjälp av en simulator, kan ersätta verklig bilddata under metainlärningsfasen. Metoden har utvärderats på militär fordonsklassificering. Resultaten visar att för bildklassificering med 1–10 träningsexempel per klass kan en modell metainlärd med syntetisk data närma sig prestandan hos en modell metainlärd med riktig data. Resultaten visar även att små ändringar i genereringsprocessen, exempelvis graden av slumpmässigt ljus, har en stor inverkan på den slutgiltiga prestandan, vilket ger hopp om att ytterligare finjustering av genereringsprocessen kan resultera i ännu fler prestandaförbättringar.
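
As background for the few-shot evaluation protocol mentioned in the abstract, here is a minimal sketch of a single N-way K-shot episode, with a nearest-centroid classifier standing in for a meta-learned model; the embeddings and classes are synthetic placeholders, not the thesis's military-vehicle data.

```python
import numpy as np

rng = np.random.default_rng(3)
n_way, k_shot, n_query, dim = 5, 5, 15, 64

# fake embeddings: each class is a Gaussian blob
means = rng.normal(0, 1, size=(n_way, dim))
support = np.stack([rng.normal(m, 0.5, size=(k_shot, dim)) for m in means])
query = np.stack([rng.normal(m, 0.5, size=(n_query, dim)) for m in means])

prototypes = support.mean(axis=1)                 # (n_way, dim)
q = query.reshape(-1, dim)                        # (n_way * n_query, dim)
labels = np.repeat(np.arange(n_way), n_query)

# classify each query by its nearest class prototype
dists = np.linalg.norm(q[:, None, :] - prototypes[None, :, :], axis=2)
pred = dists.argmin(axis=1)
print("episode accuracy:", (pred == labels).mean())
```
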
APA, Harvard, Vancouver, ISO, and other styles
18

POLLINI, RAMA. "Data exploitation at different levels for Behaviour Analysis in real-world scenarios." Doctoral thesis, Università Politecnica delle Marche, 2018. http://hdl.handle.net/11566/253152.

Full text
Abstract:
La ricerca è stata condotta nell’ambito dell’analisi dei comportamenti, con l’obiettivo di sfruttare e rielaborare differenti tipologie di dati, offrendo le basi per nuovi sviluppi e metodologie all’interno di un determinato scenario. L’analisi del comportamento è stato affrontato in 3 scenari reali: In ambito AAL è stata creata un’architettura basata sulla piattaforma “Zabbix” in grado di monitorare e raccogliere dati provenienti da numerosi smart objects in una casa domotica. In questo ambito l’obiettivo principale è stato quello di sfruttare i dati raccolti per definire e delineare dei comportamenti anomali di persone anziane all’interno della loro abitazione, implementando un sistema intelligente basato su algoritmi di machine learning in grado di far scattare allarmi; un aspetto innovativo di questo sistema è la capacità di incrociare allarmi provenienti da più SOs garantendo una diminuzione di falsi positivi. Il progetto ha coinvolto 16 aziende e 2 università. Nell’ambito Industry 4.0 è stata sviluppata un’applicazione per la realtà aumentata fruibile attraverso smart glasses e finalizzata al training on the job. Il progetto nasce dalla collaborazione dell’azienda Intermac, specializzata nella lavorazione di pietra, vetro e metallo. L’obiettivo è di addestrare l’operatore nello svolgimento di un task mediante animazioni 3d e labels informative. E’ stata posta notevole attenzione allo studio dell’usabilità di dispositivi wearable perché rappresenta una delle principali criticità di questa tecnologia. Nell’ambito E-commerce è stato condotto lo studio sui dati di navigazione degli utenti che hanno effettuato acquisti all’interno dell’e-commerce. Questi dati, opportunamente filtrati e accorpati in sessioni di navigazione, consentono di addestrare reti neurali (LSTM) nel riconoscimento di sequenze di azioni. L’obiettivo principale è quello di migliorare il processo di acquisto dell’utente proponendo prodotti sulla base delle operazioni svolte sul portale e-commerce. Il lavoro presentato in questa tesi è stato reso possibile grazie alla collaborazione con la società informatica Apra Spa che ha cofinanziato il dottorato con una borsa di studio EUREKA.
The research was carried out in the field of behavioural analysis, with the aim of exploiting and re-elaborating different types of data, offering the basis for new developments and methodologies within a given scenario. The analysis of behaviour has been addressed in 3 real-life scenarios. In Ambient Assisted Living (AAL), an architecture based on the "Zabbix" platform has been created that can monitor and collect data from different smart objects in a home automation house. In this context, the main objective was to exploit the data collected to define and outline anomalous behaviours of elderly people in their homes, implementing an intelligent system based on machine learning algorithms capable of triggering alarms; an innovative aspect of this system is the ability to cross-reference alarms from multiple SOs, ensuring a decrease in false positives. The project involved 16 companies and 2 universities. In Industry 4.0, an application has been developed for augmented reality, usable through smart glasses and aimed at training on the job. The project stems from the collaboration of Intermac, a company specialized in stone, glass and metal processing. The goal is to train the operator in carrying out a task by means of 3D animations and informative labels. Considerable attention has been paid to the study of wearable devices' usability because it is one of the main critical aspects of this technology. In the field of E-commerce, a study has been conducted on the navigation data of users who have made purchases within the e-commerce website. These data, suitably filtered and grouped in navigation sessions, make it possible to train neural networks (Long Short Term Memory networks – usually just called "LSTMs") in the recognition of action sequences. The main objective is to improve the user's purchasing process by proposing products based on the operations carried out on the e-commerce portal. The work presented in this thesis was made possible thanks to the collaboration with the IT company Apra Spa which co-financed the doctorate with a EUREKA scholarship.
APA, Harvard, Vancouver, ISO, and other styles
19

Pagliarani, Andrea <1990&gt. "Big Data mining and machine learning techniques applied to real world scenarios." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amsdottorato.unibo.it/8904/1/Pagliarani_Andrea_tesi.pdf.

Full text
Abstract:
Data mining techniques allow the extraction of valuable information from heterogeneous and possibly very large data sources, which can be either structured or unstructured. Unstructured data, such as text files, social media, mobile data, are far more abundant than structured data, and grow at a higher rate. Their high volume and the inherent ambiguity of natural language make unstructured data very hard to process and analyze. Appropriate text representations are therefore required in order to capture word semantics as well as to preserve statistical information, e.g. word counts. In Big Data scenarios, scalability is also a primary requirement. Data mining and machine learning approaches should take advantage of large-scale data, exploiting abundant information and avoiding the curse of dimensionality. The goal of this thesis is to enhance text understanding in the analysis of big data sets, introducing novel techniques that can be employed for the solution of real world problems. The presented Markov methods temporarily achieved the state-of-the-art on well-known Amazon reviews corpora for cross-domain sentiment analysis, before being outperformed by deep approaches in the analysis of large data sets. A noise detection method for the identification of relevant tweets leads to 88.9% accuracy in the Dow Jones Industrial Average daily prediction, which is the best result in literature based on social networks. Dimensionality reduction approaches are used in combination with LinkedIn users' skills to perform job recommendation. A framework based on deep learning and Markov Decision Process is designed with the purpose of modeling job transitions and recommending pathways towards a given career goal. Finally, parallel primitives for vendor-agnostic implementation of Big Data mining algorithms are introduced to foster multi-platform deployment, code reuse and optimization.
APA, Harvard, Vancouver, ISO, and other styles
20

Efendic, Nedim. "Creating a Digital Twin by Using Real World Sensors." Thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-92249.

Full text
Abstract:
Örebro University and Akademiska Hus have started an initiative towards smart buildings. A very important role to this is the Digital Twin for buildings. A digital twin for buildings is a virtual copy of a physical building, and by adding a Data Driven Simulation System, an even smarter building could be achieved. Given a humidity-, temperature-, illuminance- and motion sensor in a specific corridor at Örebro University, this thesis will ascertain what can be done by creating a Data Driven Simulation System and using these sensors to achieve the desired smart building. In this thesis, a simulation was created with simulated sensors and pedestrians. The simulation is a clone of the real world; by using real-life sensors and applying the data to the simulated sensors, this was partially achieved.
APA, Harvard, Vancouver, ISO, and other styles
21

Kenny, Ian Duncan. "An evaluation of performance enhancements to particle swarm optimisation on real-world data." Thesis, Open University, 2016. http://oro.open.ac.uk/47999/.

Full text
Abstract:
Swarm Computation is a relatively new optimisation paradigm. The basic premise is to model the collective behaviour of self-organised natural phenomena such as swarms, flocks and shoals, in order to solve optimisation problems. Particle Swarm Optimisation (PSO) is a type of swarm computation inspired by bird flocks or swarms of bees by modelling their collective social influence as they search for optimal solutions. In many real-world applications of PSO, the algorithm is used as a data pre-processor for a neural network or similar post processing system, and is often extensively modified to suit the application. The thesis introduces techniques that allow unmodified PSO to be applied successfully to a range of problems, specifically three extensions to the basic PSO algorithm: solving optimisation problems by training a hyperspatial matrix, using a hierarchy of swarms to coordinate optimisation on several data sets simultaneously, and dynamic neighbourhood selection in swarms. Rather than working directly with candidate solutions to an optimisation problem, the PSO algorithm is adapted to train a matrix of weights, to produce a solution to the problem from the inputs. The search space is abstracted from the problem data. A single PSO swarm optimises a single data set and has difficulties where the data set comprises disjoint parts (such as time series data for different days). To address this problem, we introduce a hierarchy of swarms, where each child swarm optimises one section of the data set whose gbest particle is a member of the swarm above in the hierarchy. The parent swarm(s) coordinate their children and encourage more exploration of the solution space. We show that hierarchical swarms of this type perform better than single swarm PSO optimisers on the disjoint data sets used. PSO relies on interaction between particles within a neighbourhood to find good solutions. In many PSO variants, possible interactions are arbitrary and fixed on initialisation. Our third contribution is a dynamic neighbourhood selection: particles can modify their neighbourhood, based on the success of the candidate neighbour particle. As PSO is intended to reflect the social interaction of agents, this change significantly increases the ability of the swarm to find optimal solutions. Applied to real-world medical and cosmological data, this modification shows improvements over standard PSO approaches with fixed neighbourhoods.
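
For context, a minimal global-best PSO sketch is given below; it implements the standard algorithm on a toy objective, not the thesis's hyperspatial-matrix, hierarchical or dynamic-neighbourhood extensions, and the hyperparameters are conventional defaults rather than values taken from the thesis.

```python
import numpy as np

def sphere(x):
    # toy objective to minimise
    return float(np.sum(x ** 2))

rng = np.random.default_rng(4)
n_particles, dim, iters = 30, 5, 200
w, c1, c2 = 0.72, 1.49, 1.49          # inertia, cognitive, social weights

pos = rng.uniform(-5, 5, size=(n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([sphere(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([sphere(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best value found:", pbest_val.min())
```
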
APA, Harvard, Vancouver, ISO, and other styles
22

Pereira, Gina Ribeiro. "Real-world data as a tool for establishing the value of a medicine." Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/15417.

Full text
Abstract:
Mestrado em Biomedicina Farmacêutica
Nos últimos anos tem sido discutido em que medida são fornecidos dados suficientes para estimar o valor clínico do medicamento durante o processo de aprovação e autorização de introdução no mercado (AIM). Apesar dos ensaios clínicos randomizados (ECR) possuírem extrema validade interna na avaliação da segurança e eficácia de novos produtos, não permitem a extrapolação dos dados de eficácia para a vida real (efetividade). Alguns peritos têm discutido o potencial uso dos dados recolhidos na vida real (DVR) na contribuição de uma avaliação mais robusta de produtos e resultados em saúde. Os avanços nas tecnologias da informação permitem recolher, partilhar, analisar e utilizar grandes quantidades de informação a um custo relativamente baixo. Neste contexto, os DVR podem ser usados em conjunto com ECR e outros dados médicos para proporcionar perspectivas sobre resultados clínicos reais. Se esses dados e metodologias puderem ser canalizados para a pré- AIM, os titulares serão capazes de direccionar o desenvolvimento de medicamentos para áreas onde o valor é susceptível de ser elevado para os doentes e sistemas de saúde. Assim, as agências regulamentares e de avaliação de tecnologias da saúde terão informação suficiente para tomar decisões devidamente fundamentadas sobre a eficácia relativa de novas intervenções em saúde. O principal objetivo desta tese é promover uma análise da utilidade dos DVR, como criadores de valor, em todas as fases de desenvolvimento de medicamentos e discutir o papel-chave de todas as partes interessadas no uso de DVR.
It has long been discussed to what extent the licensing procedure should assure the availability of sufficient data to assess the clinical value of a new drug at the time of marketing introduction. Despite the high internal validity of randomised clinical trials (RCTs) generated evidence and its ability to robustly indicate the safety and efficacy of new products, it falls short of allowing for extrapolation from efficacy to clinical effectiveness. A number of analysts and academics have signalled the potential of real-world data (RWD) to contribute to improved health products and outcomes. Advances in computing make it possible to collect, share, analyse and use large quantities of data routinely at a relatively low cost. In this context, RWD can be used in conjunction with RCTs and other medical data to provide insights into real-world clinical outcomes. If such data and methodologies could be harnessed in pre-authorisation drug development, drug manufacturers would be able to direct drug development to areas where value is likely to be highest for patients and health systems. In addition, regulatory and Health Technology Assessment (HTA) agencies would be able to make better-informed decisions on relative effectiveness of new health interventions. The main goal of this thesis is to analyse the usefulness of RWD collection, as creator of value, in all drug development phases and discuss the key role of all stakeholders in the use of RWD.
APA, Harvard, Vancouver, ISO, and other styles
23

REA, FEDERICO. "Monitoring and assessing diagnostic-therapeutic paths with healthcare utilization databases: experiences, concerns and challenges." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2020. http://hdl.handle.net/10281/262324.

Full text
Abstract:
The aim of this thesis is to provide the methodology used to develop and validate population-based prognostic scores, and to assess the effectiveness and cost-effectiveness of the diagnostic-therapeutic path of diabetes, using the healthcare utilization databases (or administrative databases) of Italian regions. Thus, the thesis is structured into the following three main parts. First, the reasons to justify the need of real-world studies in addition to evidence from randomized controlled trials, the definitions of real-world data and real-world evidence, and an overview of the Italian healthcare utilization databases are given. Second, because patients should be monitored according to their risk to experience adverse outcomes (e.g., all-cause mortality, hospital admissions), prognostic scores could be used. However, the main limitation in the use of pre-existing score is that they are usually developed in countries different from Italy and from hospital-based or pharmacy-based surveys, so hindering their applicability to all beneficiaries of the National Health Service. Therefore, two population-based prognostic scores were developed and validated using data from some Italian regions. The usefulness of one of these scores (i.e., the so-called Multisource Comorbidity Score) in the risk adjustments and as a tool for health policy planning is also shown. Third, tracing the work carried out from the “Monitoring and assessing care pathways” working group of the Italian Ministry of Health, a description of the following activities is provided: I. the development of process indicators to monitor and assess the quality of care of patients suffering from some chronic disease; II. the comparison of care quality between regions; III. the validation of the diabetes care indicators with respect to selected outcomes (i.e., the assessment of their effectiveness); IV. the assessment of the costs from the National Health Service perspective (calculated by the amount that the Regional Health Authority reimbursed to health providers) according to different levels of adherence to the diagnostic-therapeutic path of diabetes. Finally, the Beaver® regional research platform, able to compute the set of process and outcome indicators defined by the Health Ministry and to generate evidence on effectiveness and cost-effectiveness profile, is described.
APA, Harvard, Vancouver, ISO, and other styles
24

Koskimäki, H. (Heli). "Utilizing similarity information in industrial applications." Doctoral thesis, University of Oulu, 2009. http://urn.fi/urn:isbn:9789514290398.

Full text
Abstract:
Abstract The amount of digital data surrounding us has exploded within the past years. In industry, data are gathered from different production phases with the intent to use the data to improve the overall manufacturing process. However, management and utilization of these huge data sets is not straightforward. Thus, a computer-driven approach called data mining has become an attractive research area. Using data mining methods, new and useful information can be extracted from enormous data sets. In this thesis, diverse industrial problems are approached using data mining methods based on similarity. Similarity information is shown to give an additional advantage in different phases of manufacturing. Similarity information is utilized with smaller-scale problems, but also in a broader perspective when aiming to improve the whole manufacturing process. Different ways of utilizing similarity are also introduced. Methods are chosen to emphasize the similarity aspect; some of the methods rely entirely on similarity information, while other methods just preserve similarity information as a result. The actual problems covered in this thesis are from quality control, process monitoring, improvement of manufacturing efficiency and model maintenance. They are real-world problems from two different application areas: spot welding and steel manufacturing. Thus, this thesis clearly shows how the industry can benefit from the presented data mining methods.
APA, Harvard, Vancouver, ISO, and other styles
25

Almay, Felix, and Oskar Strömberg. "Applicability of Constraint Solving and Simulated Annealing to Real-World Scale University Course Timetabling Problems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259761.

Full text
Abstract:
The university course timetabling problem is the problem of creating a schedule for university courses under certain constraints. The decision variant of this optimisation problem is NP-complete. We have researched this problem and implemented the heuristic simulated annealing. This implementation has been compared with respect to time to the constraint solver CPSolver, based on iterative forward search. Our results show that CPSolver scales better for large problem instances. Simulated annealing as implemented by us is thus not suitable in itself for generating valid solutions to this problem on a real-world scale.
Universitetsschemaläggningsproblemet går ut på att skapa ett schema för universitetskurser under vissa villkor. Beslutsversionen av detta optimeringsproblem är NP-fullständig. Vi har undersökt problemet och implementerat heuristiken simulerad härdning. Denna har jämförts med avseende på tid med villkorsprogrammeringslösaren CPSolver, som är baserad på iterativ framåtsökning. Våra resultat visar att CPSolver skalar bättre för stora probleminstanser. Simulerad härdning som implementerad av oss är därför inte i sig lämplig för att generera giltiga lösningar till verklighetstrogna probleminstanser.
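
The sketch below is a minimal simulated-annealing loop on an invented toy timetabling instance (random clash pairs between courses, conflict count as the cost); it illustrates the heuristic the thesis compares against CPSolver, not the authors' implementation or their benchmark instances.

```python
import math
import random

random.seed(5)
n_courses, n_slots = 30, 8
# pairs of courses assumed to share students (a clash if co-scheduled)
clashes = {(i, j) for i in range(n_courses) for j in range(i + 1, n_courses)
           if random.random() < 0.15}

def cost(assign):
    return sum(1 for i, j in clashes if assign[i] == assign[j])

assign = [random.randrange(n_slots) for _ in range(n_courses)]
best, best_cost = assign[:], cost(assign)
temp = 5.0

for step in range(20_000):
    cand = assign[:]
    cand[random.randrange(n_courses)] = random.randrange(n_slots)
    delta = cost(cand) - cost(assign)
    # accept improvements always, worsenings with Boltzmann probability
    if delta <= 0 or random.random() < math.exp(-delta / temp):
        assign = cand
        if cost(assign) < best_cost:
            best, best_cost = assign[:], cost(assign)
    temp *= 0.9995                              # geometric cooling

print("remaining conflicts:", best_cost)
```
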
APA, Harvard, Vancouver, ISO, and other styles
26

Franquet, Bonet Álvaro. "Statistical methods to manage and analyse real world data. Development of a health observatory in Alt Empordà. Mètodes estadístics per gestionar i analitzar real world data. Desenvolupament d'un observatori de salut a l'Alt Empordà." Doctoral thesis, Universitat de Girona, 2021. http://hdl.handle.net/10803/673830.

Full text
Abstract:
The objective of this thesis is to explore in greater depth the process of creating a public health observatory and the processing of the information needed to take territorial decisions in the county of Alt Empordà. The outcome is the creation of Indika, the Health and Social Pole of the Alt Empordà. Indika is a public health observatory that acts as the territory’s health and well-being agent. The objective of Indika is to generate knowledge about the impact of social determinants of health in Alt Empordà. To do so, it uses a working framework based on three points: to inform, to discuss, and to act. This working framework served to produce the visual information solutions, including infographics, collections of indicators, and a web application whose purpose is to portray the existing problems in the territory to start debate with the different political and social agents to take decisions on improvement initiatives. Apart from creating a public health observatory, two parallel lines of research have also been developed in this thesis: the creation of public information-storage structures, and the analysis of event-based architecture as a way of structuring medical information. These two lines of investigation have resulted in the creation of the "Indika Data Repository", an information repository about the county of Alt Empordà, using the collaborative platform GitHub, and the implementation of the eventr package in the R programming language, currently published on the Comprehensive R Archive Network. Eventr is a package whose purpose is to facilitate the implementation of architectures based on events.
Aquesta tesi té com a objectiu aprofundir en el procés de creació d'un observatori de salut pública i el tractament de la informació necessària per a la presa de decisions territorials a la comarca de l'Alt Empordà. El resultat ha estat la creació d'Indika pol de salut i social de l'Alt Empordà. Indika és un observatori de salut pública que actua com a agent de salut i benestar del territori. L'objectiu d'Indika és generar coneixement sobre l'impacte dels determinants socials de salut a l'Alt Empordà. Per a això fa ús d'un marc de treball basat en tres punts: informar, conversar i actuar. Aquest marc de treball s'ha concretat en la generació de visualitzacions d'informació com infografies, col·leccions d'indicadors i una aplicació web amb l'objectiu de donar a conèixer les problemàtiques del territori i iniciar així un debat amb els diferents actors polítics i socials que permeti concretar accions de millora. En aquesta tesi, a més de la creació d'un observatori de salut pública, s'han desenvolupat dues línies d'investigació paral·leles que són: la creació d'estructures públiques d'emmagatzematge d'informació i l'anàlisi de l'arquitectura basada en esdeveniments com a forma d'estructurar la informació mèdica. Aquestes dues línies d'investigació s'han concretat en la creació de "Indika Data Repository", un repositori d'informació sobre la comarca de l'Alt Empordà a través de la plataforma col·laborativa GitHub i la implementació de la llibreria eventr en el llenguatge de programació R publicada actualment al Comprehensive R Archive Network. Eventr és una llibreria que té com a objectiu facilitar la implementació d'arquitectures basades en esdeveniments
Programa de Doctorat en Biologia Molecular, Biomedicina i Salut
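
As a generic illustration of the event-based structuring idea mentioned in the abstract (this is not the eventr API, whose functions are not reproduced here), a minimal append-and-replay event store might look as follows:

```python
# State is not stored directly; it is reconstructed by replaying an
# append-only log of events through a reducer function.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Event:
    kind: str
    payload: dict

@dataclass
class EventStore:
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def replay(self, reducer: Callable[[dict, Event], dict], state: dict) -> dict:
        for e in self.events:
            state = reducer(state, e)
        return state

def reducer(state: dict, e: Event) -> dict:
    if e.kind == "measurement":                 # e.g. a clinical observation
        state.setdefault(e.payload["name"], []).append(e.payload["value"])
    return state

store = EventStore()
store.append(Event("measurement", {"name": "glucose", "value": 5.4}))
store.append(Event("measurement", {"name": "glucose", "value": 6.1}))
print(store.replay(reducer, {}))                # {'glucose': [5.4, 6.1]}
```
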
APA, Harvard, Vancouver, ISO, and other styles
27

Tsoi, Ada. "The Potential of Event Data Recorders to Improve Impact Injury Assessment in Real World Crashes." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/73805.

Full text
Abstract:
Event data recorders (EDRs) are an invaluable data source that have begun to, and will increasingly, provide novel insight into motor vehicle crash characteristics. Often called the "black boxes" of automobiles, EDRs directly measure precrash and crash kinematics. This data has the potential to eclipse the many traditional surrogate measures used in vehicle safety that often rely upon assumptions and simplifications of real world crashes. Although EDRs have been equipped in passenger vehicles for over two decades, the recent establishment of regulation has greatly affected the quantity, resolution, duration, and accuracy of the recorded data elements. Thus, there was not only a demand to reestablish confidence in the data, but a need to demonstrate the potential of the data. The objectives of the research presented in this dissertation were to (1) validate EDR data accuracy in full-frontal, side-impact moving deformable barrier, and small overlap crash tests; (2) evaluate EDR survivability beyond regulatory crash tests, (3) determine the seat belt accuracy of current databases, and (4) assess the merits of other vehicle-based crash severity metrics relative to delta-v. This dissertation firstly assessed the capabilities of EDRs. Chapter 2 demonstrated the accuracy of 176 crash tests, corresponding to 29 module types, 5 model years, 9 manufacturers, and 4 testing configurations from 2 regulatory agencies. Beyond accuracy, Chapter 3 established that EDRs are anecdotally capable of surviving extreme events of vehicle fire, vehicle immersion, and high delta, although these events are very rare on U.S. highways. The studies in Chapters 4 and 5 evaluated specific applications intended to showcase the potential of EDR data. Even single value data elements from EDRs were shown to be advantageous. In particular, the seat belt use status may become a useful tool to supplement crash investigators, especially in low severity crashes that provide little forensic evidence. Moreover, time-series data from EDRs broadens the number of available vehicle-based crash severity metrics that can be utilized. In particular, EDR data was used to calculate vehicle pulse index (VPI), which was shown to have modestly increased predictive abilities of serious injury compared to the widely used delta-v among belted occupants. Ultimately, this work has strong implications for EDR users, regulatory agencies, and future technologies.
Ph. D.
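
As context for the vehicle-based severity metrics discussed above, the sketch below computes delta-v by integrating an invented longitudinal acceleration pulse of the kind an EDR records; VPI is not reproduced here, and the sampling rate and pulse shape are assumptions, not values from the dissertation.

```python
import numpy as np

dt = 0.0005                                   # 2 kHz sampling, assumed
t = np.arange(0, 0.120, dt)                   # 120 ms crash pulse, assumed
accel_g = -20 * np.sin(np.pi * t / 0.120)     # invented half-sine pulse, in g

accel = accel_g * 9.81                        # convert to m/s^2
delta_v = np.trapz(accel, t)                  # change in velocity, m/s
print(f"delta-v: {delta_v * 3.6:.1f} km/h")   # about -54 km/h for this pulse
```
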
APA, Harvard, Vancouver, ISO, and other styles
28

GARBARINO, DAVIDE. "Acknowledging the structured nature of real-world data with graphs embeddings and probabilistic inference methods." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1092453.

Full text
Abstract:
In the artificial intelligence community there is a growing consensus that real world data is naturally represented as graphs because they can easily incorporate complexity at several levels, e.g. hierarchies or time dependencies. In this context, this thesis studies two main branches for structured data. In the first part we explore how state-of-the-art machine learning methods can be extended to graph modeled data provided that one is able to represent graphs in vector spaces. Such extensions can be applied to analyze several kinds of real-world data and tackle different problems. Here we study the following problems: a) understand the relational nature and evolution of websites which belong to different categories (e-commerce, academic (p.a.) and encyclopedic (forum)); b) model tennis players' scores based on different game surfaces and tournaments in order to predict match results; c) analyze preterm infants' motion patterns able to characterize possible neurodegenerative disorders and d) build an academic collaboration recommender system able to model academic groups and individual research interest while suggesting possible researchers to connect with, topics of interest and representative publications to external users. In the second part we focus on graph inference methods from data which present two main challenges: missing data and non-stationary time dependency. In particular, we study the problem of inferring Gaussian Graphical Models in the following settings: a) inference of Gaussian Graphical Models when data are missing or latent in the context of multiclass or temporal network inference and b) inference of time-varying Gaussian Graphical Models when data is multivariate and non-stationary. Such methods have a natural application in the composition of an optimized stock markets portfolio. Overall this work sheds light on how to acknowledge the intrinsic structure of data with the aim of building statistical models that are able to capture the actual complexity of the real world.
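
As background for the inference problems listed here, the following is a minimal sketch of sparse precision-matrix (Gaussian Graphical Model) estimation with the graphical lasso on synthetic data; the thesis's latent-variable and time-varying extensions are not reproduced.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(6)
n, p = 500, 8

# sample data whose true precision matrix has a sparse chain structure
A = np.eye(p) + 0.4 * np.eye(p, k=1) + 0.4 * np.eye(p, k=-1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(A), size=n)

# fit the graphical lasso with cross-validated regularisation strength
model = GraphicalLassoCV().fit(X)
edges = (np.abs(model.precision_) > 1e-3) & ~np.eye(p, dtype=bool)
print("estimated conditional-dependence edges:\n", edges.astype(int))
```
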
APA, Harvard, Vancouver, ISO, and other styles
29

Albrecht, Philipp, Ingrid Kristine Bjørnå, David Brassat, Rachel Farrell, Peter Feys, Jeremy Hobart, Michael Linnebank, et al. "Prolonged-release fampridine in multiple sclerosis: clinical data and real-world experience. Report of an expert meeting." Sage, 2018. https://tud.qucosa.de/id/qucosa%3A35544.

Full text
Abstract:
Prolonged-release (PR) fampridine is the only approved medication to improve walking in multiple sclerosis (MS), having been shown to produce a clinically meaningful improvement in walking ability in the subset of MS patients with Expanded Disability Status Scale 4–7. Recent responder subgroup analyses in the phase III ENHANCE study show a large effect size in terms of an increase of 20.58 points on the patient-reported 12-item MS Walking Scale in the 43% of patients classified as responders to PR-fampridine, corresponding to a standardized response mean of 1.68. Use of PR-fampridine in clinical practice varies across Europe, depending partly on whether it is reimbursed. A group of European MS experts met in June 2017 to discuss their experience with using PR-fampridine, including their views on the patient population for treatment, assessment of treatment response, re-testing and retreatment, and stopping criteria. This article summarizes the experts' opinions on how PR-fampridine can be used in real-world clinical practice to optimize the benefits to people with MS with impaired walking ability.
APA, Harvard, Vancouver, ISO, and other styles
30

Iwao, Tomohide. "A Methodology of Dataset Generation for Secondary Use of Health Care Big Data." Kyoto University, 2020. http://hdl.handle.net/2433/253411.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Hwang, Yuan-Chun. "Local and personalised models for prediction, classification and knowledge discovery on real world data modelling problems." Click here to access this resource online, 2009. http://hdl.handle.net/10292/776.

Full text
Abstract:
This thesis presents several novel methods to address real-world data modelling issues through the use of local and individualised modelling approaches. A set of real-world data modelling issues, such as modelling evolving processes, defining unique problem subspaces, and identifying and dealing with noise, outliers, missing values, imbalanced data and irrelevant features, is reviewed and their impact on the models is analysed. The thesis makes nine major contributions to information science, including four generic modelling methods, three real-world application systems that apply these methods, a comprehensive review of real-world data modelling problems, and a data analysis and modelling software package. Four novel methods have been developed and published in the course of this study. They are: (1) DyNFIS – Dynamic Neuro-Fuzzy Inference System, (2) MUFIS – A Fuzzy Inference System That Uses Multiple Types of Fuzzy Rules, (3) Integrated Temporal and Spatial Multi-Model System, (4) Personalised Regression Model. DyNFIS addresses the issue of unique problem subspaces by identifying them through a clustering process, creating a fuzzy inference system based on the clusters, and applying supervised learning to update the fuzzy rules, both the antecedent and the consequent parts. This puts strong emphasis on the unique problem subspaces and allows easy-to-understand rules to be extracted from the model, which adds knowledge to the problem. MUFIS takes DyNFIS a step further by integrating a mixture of different types of fuzzy rules in a single fuzzy inference system. In many real-world problems, some problem subspaces were found to be more suitable for one type of fuzzy rule than others and, therefore, by integrating multiple types of fuzzy rules together, a better prediction can be made. The type of fuzzy rule assigned to each unique problem subspace also provides additional understanding of its characteristics. The Integrated Temporal and Spatial Multi-Model System is a different approach that integrates two contrasting views of the problem for better results. The temporal model uses recent data and the spatial model uses historical data to make the prediction. By combining the two through a dynamic contribution adjustment function, the system is able to provide stable yet accurate predictions on real-world data modelling problems that have intermittently changing patterns. The personalised regression model is designed for classification problems. Real-world data modelling problems often involve noisy or irrelevant variables, and the number of input vectors in each class may be highly imbalanced; these issues make the definition of unique problem subspaces less accurate. The proposed method uses a model selection system based on an incremental feature selection method to select the best set of features. A global model is then created based on this set of features and optimised using training input vectors in the test input vector's vicinity. This approach focuses on the definition of the problem space and puts emphasis on the problem subspace in which the test input vector resides. The novel generic prediction methods listed above have been applied to the following three real-world data modelling problems: (1) renal function evaluation, which achieved higher accuracy than all other existing methods while allowing easy-to-understand rules to be extracted from the model for future studies; (2) a milk volume prediction system for Fonterra, which achieved a 20% improvement over the method currently used by Fonterra; and (3) a prognosis system for pregnancy outcome prediction (SCOPE), which achieved more stable and slightly better accuracy than traditional statistical methods. These solutions constitute a contribution to the area of applied information science. In addition to the above contributions, a data analysis software package, NeuCom, was primarily developed by the author prior to and during the PhD study to facilitate some of the standard experiments and analyses on various case studies. This is a full-featured data analysis and modelling software package that is freely available for non-commercial purposes (see Appendix A for more details). In summary, many real-world problems consist of many smaller problems. It was found beneficial to acknowledge the existence of these sub-problems and address them through the use of local or personalised models. The rules extracted from the local models also brought about new knowledge for the researchers and allowed more in-depth study of the sub-problems to be carried out in future research.
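The core idea behind the personalised (local) models described above can be pictured with a small sketch: instead of applying one global model everywhere, a model is fitted only on the training samples closest to the query point. This is a generic illustration under assumed function and parameter names, not the thesis's DyNFIS, MUFIS, or personalised regression implementations.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LogisticRegression

def personalised_predict(X_train, y_train, x_query, k=25):
    """Fit a small model only on the k training samples nearest the query."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(x_query.reshape(1, -1))
    y_local = y_train[idx[0]]
    if len(np.unique(y_local)) == 1:          # neighbourhood contains one class only
        return y_local[0]
    local_model = LogisticRegression(max_iter=1000)
    local_model.fit(X_train[idx[0]], y_local)  # model tuned to the query's vicinity
    return local_model.predict(x_query.reshape(1, -1))[0]
```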
APA, Harvard, Vancouver, ISO, and other styles
32

Reinartz, Thomas [Verfasser]. "Focusing solutions for data mining : analytical studies and experimental results in real world domains / T. Reinartz." Berlin, 1999. http://d-nb.info/965635090/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Lai, Daphne Teck Ching. "An exploration of improvements to semi-supervised fuzzy c-means clustering for real-world biomedical data." Thesis, University of Nottingham, 2014. http://eprints.nottingham.ac.uk/14232/.

Full text
Abstract:
This thesis explores various detailed improvements to semi-supervised learning (using labelled data to guide clustering or classification of unlabelled data) with fuzzy c-means clustering (a 'soft' clustering technique which allows data patterns to be assigned to multiple clusters using membership values), with the primary aim of creating a semi-supervised fuzzy clustering algorithm that shows good performance on real-world data. Hence, there are two main objectives in this work. The first objective is to explore novel technical improvements to semi-supervised fuzzy c-means (ssFCM) that can address the problem of initialisation sensitivity and can improve results. The second objective is to apply the developed algorithm to real biomedical data, such as the Nottingham Tenovus Breast Cancer (NTBC) dataset, to create an automatic methodology for identifying stable subgroups which have previously been elicited semi-manually. Investigations were conducted into detailed improvements to the ssFCM algorithm framework, including a range of distance metrics, initialisation and feature selection techniques, and scaling parameter values. These methodologies were tested on different data sources to demonstrate their generalisation properties. Evaluation results between methodologies were compared to determine suitable techniques on various University of California, Irvine (UCI) benchmark datasets. Results were promising, suggesting that initialisation techniques, feature selection and scaling parameter adjustment can increase ssFCM performance. Based on these investigations, a novel ssFCM framework was developed, applied to the NTBC dataset, and various statistical and biological evaluations were conducted. This demonstrated highly significant improvement in agreement with previous classifications, with solutions that are biologically useful and clinically relevant in comparison with Soria's study [141]. In comparison with the latest NTBC study by Green et al. [63], similar clinical results have been observed, confirming stability of the subgroups. Two main contributions to knowledge have been made in this work. Firstly, the ssFCM framework has been improved through various technical refinements, which may be used together or separately. Secondly, the NTBC dataset has been successfully automatically clustered (in a single algorithm) into clinical sub-groups which had previously been elucidated semi-manually. While results are very promising, it is important to note that full, detailed validation of the framework has only been carried out on the NTBC dataset, and so there is a limit on the general conclusions that may be drawn. Future studies include applying the framework to other biomedical datasets and incorporating distance metric learning into ssFCM. In conclusion, an enhanced ssFCM framework has been proposed and demonstrated to achieve significantly improved accuracy on the NTBC dataset.
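For readers unfamiliar with the underlying algorithm, the sketch below shows a simplified semi-supervised fuzzy c-means: the standard alternating update of cluster centres and fuzzy memberships, with the memberships of labelled points pinned to their known classes. It is an illustrative simplification under assumed parameter names, not the specific ssFCM formulation or the refinements developed in the thesis.

```python
import numpy as np

def ss_fcm(X, n_clusters, labels, m=2.0, n_iter=100, seed=0):
    """Simplified semi-supervised fuzzy c-means (illustration only).

    labels: length-n integer array with a cluster index for labelled points, -1 otherwise.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)                    # random initial fuzzy memberships
    pinned = labels >= 0                                 # which points carry a label

    for _ in range(n_iter):
        U[pinned] = np.eye(n_clusters)[labels[pinned]]   # labels guide the clustering
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # fuzzily weighted cluster centres
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / dist ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)                # standard FCM membership update
    return centers, U
```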
APA, Harvard, Vancouver, ISO, and other styles
34

Hu, Yang. "Temporal Change in the Power Production of Real-world Photovoltaic Systems Under Diverse Climatic Conditions." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1481295879868785.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Pang, Shih-Hao. "Life Cycle Inventory Incorporating Fuel Cycle and Real-World In-Use Measurement Data for Construction Equipment and Vehicles." NCSU, 2008. http://web.lib.ncsu.edu/theses/available/etd-12152007-080346/.

Full text
Abstract:
Biodiesel is an alternative fuel that can be made from vegetable oils or animal fat. This study focuses on whether substitution of soy-based biodiesel fuels for petroleum diesel would produce an overall reduction in emissions of selected pollutants. A life cycle inventory model was developed to estimate energy consumption and emissions of selected pollutants and greenhouse gases. Real-world measurements using a portable emission measurement system (PEMS) were made for 15 construction vehicles, including five backhoes, four front-end loaders, and six motor graders, on both petroleum diesel and soy-based B20 biodiesel. These data are used as the basis for vehicle tailpipe emission factors of CO2, CO, HC, NOx, and PM. The results imply that biodiesel is a promising alternative to diesel fuel, but that there are some environmental trade-offs. Analysis of empirical data reveals that intra-vehicle variability of energy use and emissions is strongly influenced by vehicle activity that leads to variations in engine load, as represented by manifold absolute pressure (MAP). Vehicle-specific models for fuel use and tailpipe emissions were developed for each of the 30 construction vehicles. The time-based regression model has the highest explanatory ability among the six models and is recommended for predicting fuel use and emission rates for diesel-fueled nonroad construction equipment. Representative duty cycles for each type of vehicle were characterized by a frequency distribution of normalized manifold absolute pressure (MAP). Inter-cycle variability was assessed to quantify the variation of fuel use and emissions among duty cycles for a given engine, and inter-engine variability was assessed to quantify the variation among engines for a given duty cycle. The results indicated that time-based inter-cycle and inter-engine variations of fuel use and emissions are significant. Fuel-based emission factors have less variability among cycles and engines than time-based emission factors. Fuel-based emission factors are more robust with respect to inter-engine and inter-cycle variations and are recommended for developing an emissions inventory for nonroad construction vehicles. Real-world in-use measurements should be a basis for developing duty cycle correction factors in models such as NONROAD.
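The distinction between time-based and fuel-based emission factors drawn in this abstract amounts to a different normalization of the same second-by-second PEMS data. The snippet below is a minimal sketch with hypothetical column names and values, not the study's own data or code.

```python
import pandas as pd

# one row per second of PEMS data (hypothetical column names and values)
pems = pd.DataFrame({
    "fuel_g_per_s": [1.2, 2.5, 3.1, 0.8],
    "nox_g_per_s":  [0.010, 0.028, 0.035, 0.006],
})

time_based = pems["nox_g_per_s"].mean()                               # g NOx per second
fuel_based = pems["nox_g_per_s"].sum() / pems["fuel_g_per_s"].sum()   # g NOx per g fuel

print(f"time-based emission factor: {time_based:.4f} g/s")
print(f"fuel-based emission factor: {fuel_based:.4f} g NOx per g fuel")
```

Normalizing by fuel consumed removes much of the engine-load dependence, which is why the fuel-based factors vary less across duty cycles and engines.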
APA, Harvard, Vancouver, ISO, and other styles
36

Cattenoz, Mathieu. "MIMO Radar Processing Methods for Anticipating and Preventing Real World Imperfections." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112077/document.

Full text
Abstract:
Le concept du radar MIMO est prometteur en raison des nombreux avantages qu'il apporte par rapport aux architectures radars actuelles : flexibilité pour la formation de faisceau à l'émission - large illumination de la scène et résolution fine après traitement - et allègement de la complexité des systèmes, via la réduction du nombre d'antennes et la possibilité de transférer des fonctions de contrôle et d'étalonnage du système dans le domaine numérique. Cependant, le radar MIMO reste au stade du concept théorique, avec une prise en compte insuffisante des impacts du manque d'orthogonalité des formes d'onde et des défauts matériels.Ce travail de thèse, dans son ambition de contribuer à ouvrir la voie vers le radar MIMO opérationnel, consiste à anticiper et compenser les défauts du monde réel par des traitements numériques. La première partie traite de l'élaboration des formes d'onde MIMO. Nous montrons que les codes de phase sont optimaux en termes de résolution spatiale. Nous présentons également leurs limites en termes d'apparition de lobes secondaires en sortie de filtre adapté. La seconde partie consiste à accepter les défauts intrinsèques des formes d'onde et proposer des traitements adaptés au modèle de signal permettant d'éliminer les lobes secondaires résiduels induits. Nous développons une extension de l'Orthogonal Matching Pursuit (OMP) qui satisfait les conditions opérationnelles, notamment par sa robustesse aux erreurs de localisation, sa faible complexité calculatoire et la non nécessité de données d'apprentissage. La troisième partie traite de la robustesse des traitements vis-à-vis des écarts au modèle de signal, et particulièrement la prévention et l'anticipation de ces phénomènes afin d'éviter des dégradations de performance. En particulier, nous proposons une méthode numérique d'étalonnage des phases des émetteurs. La dernière partie consiste à mener des expérimentations en conditions réelles avec la plateforme radar MIMO Hycam. Nous montrons que certaines distorsions subies non anticipées, même limitées en sortie de filtre adapté, peuvent impacter fortement les performances en détection des traitements dépendant du modèle de signal
The MIMO radar concept promises numerous advantages compared to today's radar architectures: flexibility for the transmitting beampattern design - including wide scene illumination and fine resolution after processing - and system complexity reduction, through the use of fewer antennas and the possibility to transfer system control and calibration to the digital domain. However, the MIMO radar is still at the stage of theoretical concept, with insufficient consideration for the impacts of the waveforms' lack of orthogonality and of system hardware imperfections. The ambition of this thesis is to contribute to paving the way to the operational MIMO radar. In this perspective, this thesis work consists in anticipating and compensating the imperfections of the real world with processing techniques. The first part deals with MIMO waveform design, and we show that phase code waveforms are optimal in terms of spatial resolution. We also exhibit their limits in terms of sidelobe appearance at the matched filter output. The second part consists in accepting the waveforms' intrinsic imperfections and proposing data-dependent processing schemes for the rejection of the induced residual sidelobes. We develop an extension of the Orthogonal Matching Pursuit (OMP) that satisfies operational requirements, especially robustness to localization errors, low computational complexity, and no need for training data. The third part deals with processing robustness to signal model mismatch, especially how it can be prevented or anticipated to avoid performance degradation. In particular, we propose a digital method of transmitter phase calibration. The last part consists in carrying out experiments in real conditions with the Hycam MIMO radar testbed. We show that some of the unanticipated distortions encountered, even when limited at the matched filter output, can greatly degrade the detection performance of the data-dependent processing methods.
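Since the thesis builds its sidelobe-rejection scheme on an extension of Orthogonal Matching Pursuit, a generic OMP sketch is given below for orientation. In the radar setting the dictionary columns would correspond to the waveform and steering responses of candidate scatterers; this is the textbook algorithm under assumed names, not the thesis's extended version.

```python
import numpy as np

def omp(A, y, n_nonzero):
    """Orthogonal Matching Pursuit: greedy sparse recovery of y ~= A @ x."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(A.T @ residual)))
        if k not in support:
            support.append(k)
        # least-squares fit restricted to the selected atoms
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    x[support] = coeffs
    return x
```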
APA, Harvard, Vancouver, ISO, and other styles
37

Yogeswaran, Arjun. "Self-Organizing Neural Visual Models to Learn Feature Detectors and Motion Tracking Behaviour by Exposure to Real-World Data." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37096.

Full text
Abstract:
Advances in unsupervised learning and deep neural networks have led to increased performance in a number of domains, and to the ability to draw strong comparisons between the biological method of self-organization conducted by the brain and computational mechanisms. This thesis aims to use real-world data to tackle two areas in the domain of computer vision which have biological equivalents: feature detection and motion tracking. The aforementioned advances have allowed efficient learning of feature representations directly from large sets of unlabeled data instead of using traditional handcrafted features. The first part of this thesis evaluates such representations by comparing regularization and preprocessing methods which incorporate local neighbouring information during training on a single-layer neural network. The networks are trained and tested on the Hollywood2 video dataset, as well as the static CIFAR-10, STL-10, COIL-100, and MNIST image datasets. The induction of topography or simple image blurring via Gaussian filters during training produces better discriminative features as evidenced by the consistent and notable increase in classification results that they produce. In the visual domain, invariant features are desirable such that objects can be classified despite transformations. It is found that most of the compared methods produce more invariant features, however, classification accuracy does not correlate to invariance. The second, and paramount, contribution of this thesis is a biologically-inspired model to explain the emergence of motion tracking behaviour in early development using unsupervised learning. The model’s self-organization is biased by an original concept called retinal constancy, which measures how similar visual contents are between successive frames. In the proposed two-layer deep network, when exposed to real-world video, the first layer learns to encode visual motion, and the second layer learns to relate that motion to gaze movements, which it perceives and creates through bi-directional nodes. This is unique because it uses general machine learning algorithms, and their inherent generative properties, to learn from real-world data. It also implements a biological theory and learns in a fully unsupervised manner. An analysis of its parameters and limitations is conducted, and its tracking performance is evaluated. Results show that this model is able to successfully follow targets in real-world video, despite being trained without supervision on real-world video.
APA, Harvard, Vancouver, ISO, and other styles
38

Schirber, Sebastian, Daniel Klocke, Robert Pincus, Johannes Quaas, and Jeffrey L. Anderson. "Parameter estimation using data assimilation in an atmospheric general circulation model: Parameter estimation using data assimilation in an atmosphericgeneral circulation model: from a perfect toward the real world." American Geophysical Union (AGU), 2013. https://ul.qucosa.de/id/qucosa%3A13463.

Full text
Abstract:
This study explores the viability of parameter estimation in the comprehensive general circulation model ECHAM6 using ensemble Kalman filter data assimilation techniques. Four closure parameters of the cumulus-convection scheme are estimated using increasingly less idealized scenarios ranging from perfect-model experiments to the assimilation of conventional observations. Updated parameter values from experiments with real observations are used to assess the error of the model state on short 6 h forecasts and on climatological timescales. All parameters converge to their default values in single parameter perfect-model experiments. Estimating parameters simultaneously has a neutral effect on the success of the parameter estimation, but applying an imperfect model deteriorates the assimilation performance. With real observations, single parameter estimation generates the default parameter value in one case, converges to different parameter values in two cases, and diverges in the fourth case. The implementation of the two converging parameters influences the model state: Although the estimated parameter values lead to an overall error reduction on short timescales, the error of the model state increases on climatological timescales.
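The ensemble Kalman filter parameter estimation described here treats the closure parameters as quantities to be updated from the mismatch between predicted and real observations. A generic stochastic-EnKF analysis step is sketched below; the function and variable names are assumptions for illustration, not the assimilation system used in the study.

```python
import numpy as np

def enkf_parameter_update(params, pred_obs, obs, obs_var, seed=0):
    """One ensemble-Kalman analysis step for parameter estimation.

    params   : (n_ens, n_par) ensemble of parameter vectors
    pred_obs : (n_ens, n_obs) model-predicted observations per ensemble member
    obs      : (n_obs,) observations
    obs_var  : observation-error variance (scalar, diagonal R assumed)
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs)
    n_ens = params.shape[0]
    P = params - params.mean(axis=0)
    Y = pred_obs - pred_obs.mean(axis=0)
    C_py = P.T @ Y / (n_ens - 1)                               # parameter-observation covariance
    C_yy = Y.T @ Y / (n_ens - 1) + obs_var * np.eye(obs.size)  # observation covariance + R
    K = C_py @ np.linalg.inv(C_yy)                             # Kalman gain
    perturbed = obs + rng.normal(0, obs_var ** 0.5, (n_ens, obs.size))
    return params + (perturbed - pred_obs) @ K.T               # nudge each member toward the data
```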
APA, Harvard, Vancouver, ISO, and other styles
39

Nguyen, Thi Yen Lien, Trung Dung Nghiem, and Minh Quý Cao. "Impact of the driving cycle on exhaust emissions of buses in Hanoi." Technische Universität Dresden, 2016. https://tud.qucosa.de/id/qucosa%3A32626.

Full text
Abstract:
The impact of the driving cycle on exhaust emissions of buses in Hanoi is presented in this article. A typical driving cycle of buses in Hanoi was developed based on real-world driving data and was shown to conform well with those data. The typical driving cycle and the European Transient Cycle part 1 (ETC-part1) were used to estimate vehicle emissions under different driving cycles. The results showed that emission levels of CO, VOC, PM, CO2 and NOx of the buses differed considerably between the two driving cycles, especially for CO2 and NOx. This paper therefore reconfirms the necessity of developing a typical driving cycle before conducting an emission inventory for mobile sources.
Tóm tắt: Tác động của chu trình lái tới sự phát thải của xe buýt tại Hà Nội đã được trình bày trong bài báo này. Một chu trình lái đặc trưng của xe buýt Hà Nội đã được xây dựng dựa trên dữ liệu hoạt động ngoài thực tế của phương tiện, và chu trình lái này cũng đã được đánh giá có sự phù hợp rất cao với dữ liệu lái ngoài thực tế. Chu trình lái đặc trưng và chu trình thử ETC-part1 được sử dụng để đánh giá phát thải của phương tiện theo các chu trình lái khác nhau. Các kết quả đạt được cho thấy mức độ phát thải CO, VOC, PM, CO2 và NOx của xe buýt rất khác nhau giữa hai chu trình lái, đặc biệt là CO2 và NOx. Do đó, bài báo khẳng định sự cần thiết phải xây dựng chu trình lái đặc trưng trước khi thực hiện kiểm kê phát thải đối với nguồn động.
APA, Harvard, Vancouver, ISO, and other styles
40

Leroy, Mélanie. "Contribution des bases de données de soin courant à l’amélioration du diagnostic et du pronostic de la maladie d’Alzheimer et des dégénérescences lobaires frontotemporales." Electronic Thesis or Diss., Université de Lille (2018-2021), 2021. http://www.theses.fr/2021LILUS026.

Full text
Abstract:
Les biomarqueurs, tant le liquide cérébrospinal (LCS), l’imagerie par résonance magnétique (IRM) que la tomographie par émission de positons (TEP), ont acquis une place prépondérante dans la démarche diagnostique d’un trouble cognitif. Ils font partie intégrante des critères diagnostiques de la maladie d’Alzheimer (MA) et de son principal diagnostic différentiel, les démences frontotemporales (DFT). En 2021 cependant, l’analyse histologique, où les modifications pathologiques sont observées directement sur les tissus, reste souvent la seule méthode permettant un diagnostic de certitude. Le centre mémoire de ressource et de recherche (CMRR) de Lille a constitué dès 1992 le réseau des consultations mémoire du Nord Pas-de-Calais, Méotis, et mis en place une base de données qui compte à ce jour plus de 120 000 patients. Fort de cette file active conséquente, notre travail a consisté à étudier les caractéristiques cliniques et biochimiques des patients MA ou DFT à différentes échelles (cas rares mais parfaitement caractérisés, cohortes monocentriques de patients ayant subi une ponction lombaire, cohortes régionales ou internationales). Notre première étude s’est intéressée aux corrélations entre le profil biochimique du LCS et les observations post mortem. Nous avons pu montrer que les biomarqueurs amyloïde et tau sont moins sensibles aux pathologies correspondantes quand celles-ci ne sont pas encore complètement développées au sein du cortex, conduisant à une détection incomplète des patients présentant des changements neuropathologiques liés à une MA. Nous nous sommes par la suite intéressés aux patients MA avec des biomarqueurs Tau, pTau et Aβ42/40 pathologiques dans le LCS, associés à un Aβ42 normal. Nous avons montré que le profil cognitif, morphologique et fonctionnel des patients avec une MA et un Aβ42 normal ne diffère pas de ceux avec un Aβ42 pathologique. Dans le cadre d’une DFT, les biomarqueurs du LCS sont utilisés pour écarter un diagnostic possible de MA. Néanmoins, si tel est le cas, tout un spectre de pathologie reste envisageable tant les DFT sont hétérogènes. Il existe à l’heure actuelle peu de corrélations phénotype-pathologie établies, ce qui, à l’aube du développement d’un traitement ciblé, peut représenter une perte de chance pour ces patients. Nous avons souhaité constituer une cohorte multicentrique de patients avec une DFT confirmée post mortem, afin d’améliorer les corrélations clinicopathologiques. Ce travail préliminaire démontre la complexité du spectre des DFT, avec de nombreux recoupements phénotypiques et histologiques. En complément de cette étude, nous avons souhaité considérer l’ensemble de la file active DFT du réseau Méotis. Bien que cette maladie soit rare, la mise en commun de données nous a permis d’atteindre des effectifs importants et de démontrer que les DFT se différencient des MA, tant au niveau des caractéristiques initiales, de la vitesse de progression que sur les traitements. Malgré l’utilisation des derniers critères cliniques, ces pathologies restent sous-diagnostiquées et ne doivent plus être considérées comme circonscrites aux sujets jeunes. Bien que chaque centre mémoire soit dans la capacité de contribuer individuellement à améliorer la compréhension des maladies neurodégénératives, il semble à l’heure actuelle évident que la mise en commun des données de santé est indispensable. Nous avons travaillé dans le cadre du projet européen Human Brain Project à la mise en place d’un outil d’analyse de données fédérées.
La Medical Informatics Platform permet d’effectuer des analyses complexes à partir de bases de données éloignées géographiquement, évitant ainsi les transferts, et les risques de fuites, de données de santé entre les centres de recherche. Les données issues du soin courant sont abondantes et porteuses de nombreuses informations. A nous de les valoriser pour avancer dans la compréhension des maladies neurocognitives, défi des années à venir.
Biomarkers, whether cerebrospinal fluid (CSF), magnetic resonance imaging (MRI), or positron emission tomography (PET), have acquired a prominent place in the diagnostic process of a cognitive disorder. They are an integral part of the diagnostic criteria for Alzheimer's disease (AD) and its main differential diagnosis, frontotemporal dementia (FTD). In 2021, however, histological analysis, where pathological changes are observed directly on tissues, often remains the only method allowing a diagnosis of certainty. In 1992, the Lille Memory Resource and Research Center (CMRR) set up a network of memory consultations in the Nord Pas-de-Calais region. In addition to standardizing care, it has set up a database that now includes over 120,000 patients. Based on this large active file, our work consisted in studying the clinical and biochemical characteristics of AD and FTD patients on different scales (rare but perfectly characterized cases of clinicopathological correlations, monocentric cohorts of patients having undergone lumbar puncture, regional or international cohorts). Our first study focused on the correlations between the biochemical profile of cerebrospinal fluid (CSF) and post-mortem findings. We were able to show that amyloid and tau biomarkers are less sensitive to the corresponding pathologies when these are not yet fully developed in the cortex, leading to incomplete detection of patients with AD-related neuropathological changes. We subsequently focused on AD patients with pathological tau and pTau biomarkers in the CSF, a pathological Aβ42/Aβ40 ratio, but normal Aβ42 levels. We showed that the cognitive, morphological, and functional profile of patients with AD and normal Aβ42 does not differ from that of those with pathological Aβ42. In the setting of FTD, CSF biomarkers are used to rule out a possible diagnosis of AD. Nevertheless, if this is the case, a whole spectrum of pathology remains possible as FTD is so heterogeneous. There are currently few established phenotype-pathology correlations, which, at the dawn of the development of a targeted treatment, may represent a loss of chance for these patients. We wished to constitute a multicentric cohort of patients with confirmed post-mortem FTD, in order to improve clinicopathological correlations. This preliminary work demonstrates the complexity of the FTD spectrum, with many phenotypic and histological overlaps. In addition to this study of gold standard FTD patients, we wished to consider the entire FTD active file of the Nord Pas-de-Calais memory consultation network. Although this disease is rare, the pooling of data within the network allowed us to reach a large number of patients. This work allowed us to demonstrate that FTD differs from AD, both in terms of initial characteristics, speed of progression and treatment. Despite the use of the latest clinical criteria, these pathologies remain under-diagnosed and should no longer be considered as limited to young subjects. Although each memory center, individually, is in a position to contribute to the advancement of science and to help better understand neurodegenerative diseases, it seems obvious at this time that the pooling of health data is indispensable. Within the framework of the European Human Brain Project, we have worked on the implementation of a federated data analysis tool.
The Medical Informatics Platform allows complex analyses to be carried out from geographically distant databases, thus avoiding transfers and the risk of health data leaks between research centers. Data from routine care is abundant and contains a lot of information. It is up to us to make the most of it to advance our understanding of neurocognitive diseases, a challenge for the years to come.
APA, Harvard, Vancouver, ISO, and other styles
41

Ziemssen, Tjalf, and Katja Thomas. "Alemtuzumab in the long-term treatment of relapsing-remitting multiple sclerosis: an update on the clinical trial evidence and data from the real world." Sage, 2017. https://tud.qucosa.de/id/qucosa%3A35541.

Full text
Abstract:
Alemtuzumab is a humanized monoclonal antibody approved for the treatment of relapsing-remitting multiple sclerosis (RRMS), given as two annual courses on five consecutive days at baseline and on three consecutive days 12 months later. Here we provide an update on the long-term efficacy and safety of alemtuzumab in RRMS, including real-world experience, and advances in our understanding of its mechanism of action. Recent data from the phase II/III extension study have demonstrated that alemtuzumab reduces relapse rates, disability worsening, and the rate of brain volume loss over the long term, with many patients achieving no evidence of disease activity. In high proportions of patients, preexisting disability remained stable or improved. Alemtuzumab is associated with a consistent safety profile over the long term, with no new safety signals emerging and the overall annual incidence of reported adverse events decreasing after the first year on treatment. Acyclovir prophylaxis reduces herpetic infections, and monitoring has been shown to mitigate the risk of autoimmune adverse events, allowing early detection and overall effective management. Data from clinical practice and ongoing observational studies are providing additional information on the real-world use of alemtuzumab. Recent evidence on the mechanism of action of alemtuzumab indicates that in addition to its previously known effects of inducing depletion and repopulation of T and B lymphocytes, it also results in a relative increase of cells with memory and regulatory phenotypes and a decrease in cells with a proinflammatory signature, and may further promote an immunoregulatory environment through an impact on other innate immune cells (e.g. dendritic cells) that play a role in MS. These effects may allow preservation of innate immunity and immunosurveillance. Together, these lines of evidence help explain the durable clinical efficacy of alemtuzumab, in the absence of continuous treatment, in patients with RRMS.
APA, Harvard, Vancouver, ISO, and other styles
42

Deshotel, Matthew Wayne. "Enhancing Undergraduate Water Resources Engineering Education Using Data and Modeling Resources Situated in Real-world Ecosystems| Design Principles and Challenges for Scaling and Sustainability." Thesis, University of Louisiana at Lafayette, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10266036.

Full text
Abstract:

Recent research and technological advances in the field of hydrology and water resources call for parallel educational reforms at the undergraduate level. This thesis describes the design, development, and evaluation of a series of undergraduate learning modules that engage students in investigative and inquiry-based learning experiences and introduce data analysis and numerical modeling skills. The modules are situated in the coastal hydrologic basins of Louisiana, USA. Centered on the current crisis of coastal land loss in the region, the modules immerse students in a suite of active-learning experiences in which they prepare and analyze data, reproduce model simulations, interpret results, and balance the beneficial and detrimental impacts of several real-world coastal restoration projects. The modules were developed using a web-based design that includes geospatial visualization via a built-in map interface, textual instructions, video tutorials, and immediate feedback mechanisms. Following pilot implementations, an improvement-focused evaluation was conducted to examine the effectiveness of the modules and their potential for advancing students' experiences with modeling-based analysis in hydrology and water resources. Both qualitative and quantitative data were collected, including Likert-scale surveys, student performance grades, informal interviews, and text-response surveys. Students' perceptions indicated that data- and modeling-driven pedagogy using local real-world projects contributed to their learning and served as an effective supplement to instruction. The evaluation results also pointed out some key aspects of how to design effective and conducive undergraduate learning experiences that adopt technology-enhanced, data- and modeling-based strategies, and how to pedagogically strike a balance between sufficient module complexity, ensuring students' continued engagement, and flexibility to fit within existing curricular limitations. Additionally, to investigate how such learning modules can achieve large-scale adoption, a total of 100 interviews were conducted with academic instructors and practicing professionals in the field of hydrology and water resources engineering. Key perspectives indicate that future efforts should address hindering factors such as steep learning curves, lack of assessment data, refurbishment requirements, rigidity of material, and time limitations.

APA, Harvard, Vancouver, ISO, and other styles
43

Cenzer, Irena [Verfasser], and Helmut [Akademischer Betreuer] Ostermann. "Geriatrics principles in health care of older adults and the use of real-world data in aging-related research / Irena Cenzer ; Betreuer: Helmut Ostermann." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2020. http://d-nb.info/1221960636/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Gulati, Mayank. "Bridging Sim-to-Real Gap in Offline Reinforcement Learning for Antenna Tilt Control in Cellular Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292948.

Full text
Abstract:
Antenna tilt is the angle subtended by the radiation beam and the horizontal plane. This angle plays a vital role in determining the coverage and the interference of the network with neighbouring cells and adjacent base stations. Traditional methods for network optimization rely on rule-based heuristics for antenna tilt decision making to achieve desired network characteristics. However, these methods are quite brittle and incapable of capturing the dynamics of communication traffic. Recent advancements in reinforcement learning have made it a viable solution to overcome this problem, but even this learning approach is either limited to its simulation environment or limited to off-policy offline learning. So far, there has not been any effort to overcome these limitations so as to make it applicable in the real world. This work proposes a method that consists of transferring reinforcement learning policies from a simulated environment to a real environment, i.e. sim-to-real transfer through the use of offline learning. The approach makes use of a simulated environment and a fixed dataset to compensate for the aforementioned limitations. The proposed sim-to-real transfer technique utilizes a hybrid policy model, which is composed of a portion trained in simulation and a portion trained on offline real-world data from the cellular networks. This makes it possible to merge samples from the real-world data into the simulated environment, consequently modifying the standard reinforcement learning training procedures through knowledge sharing between the two environments' representations. On the one hand, simulation achieves better generalization performance than conventional offline learning, as it complements offline learning with learning through unseen simulated trajectories. On the other hand, the offline learning procedure helps close the sim-to-real gap by exposing the agent to real-world data samples. Consequently, this transfer learning regime enables us to establish optimal antenna tilt control, which in turn results in improved coverage and reduced interference with neighbouring cells in the cellular network.
Antennlutning är den vinkel som dämpas av strålningsstrålen och det horisontella planet. Denna vinkel spelar en viktig roll för att bestämma täckningen och störningen av nätverket med angränsande celler och intilliggande basstationer. Traditionella metoder för nätverksoptimering förlitar sig på regelbaserad heuristik för att göra beslutsfattande för antennlutningsoptimering för att uppnå önskade nätverksegenskaper. Dessa metoder är dock ganska styva och är oförmögna att fånga dynamiken i kommunikationstrafiken. De senaste framstegen inom förstärkningsinlärning har gjort det till en lönsam lösning att lösa detta problem, men även denna inlärningsmetod är antingen begränsad till dess simuleringsmiljö eller är begränsad till off-policy offline inlärning. Hittills har inga ansträngningar gjorts för att övervinna de tidigare nämnda begränsningarna för att göra det tillämpligt i den verkliga världen. Detta arbete föreslår en metod som består i att överföra förstärkningsinlärningspolicyer från en simulerad miljö till en verklig miljö, dvs. sim-till-verklig överföring genom användning av offline-lärande. Metoden använder en simulerad miljö och en fast dataset för att kompensera för de understrukna begränsningarna. Den föreslagna sim-till-verkliga överföringstekniken använder en hybridpolicymodell, som består av en del utbildad i simulering och en del utbildad på offline-verkliga data från mobilnätverk. Detta gör det möjligt att slå samman prover från verklig data till den simulerade miljön och därmed modifiera standardutbildningsförfarandena för förstärkning genom kunskapsdelning mellan de två miljöernas representationer. Å ena sidan möjliggör simulering att uppnå bättre generaliseringsprestanda med avseende på konventionellt offlineinlärning eftersom det kompletterar offlineinlärning med inlärning genom osynliga simulerade banor. Å andra sidan möjliggör offline-inlärningsförfarandet att stänga sim-till-real-klyftan genom att exponera agenten för verkliga dataprov. Följaktligen möjliggör detta överföringsinlärningsregime att upprätta optimal antennlutningskontroll som i sin tur resulterar i förbättrad täckning och minskad störning med angränsande celler i mobilnätet.
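The merging of samples between the simulated and real-world datasets described in the abstract above can be pictured as drawing mixed training batches from the two sources. The sketch below is a schematic illustration with assumed names and a fixed mixing ratio, not the hybrid policy architecture actually proposed in the thesis.

```python
import numpy as np

def mixed_batch(sim_transitions, real_transitions, batch_size=64, real_fraction=0.5, seed=None):
    """Sample a training batch mixing simulated and real-world transitions."""
    rng = np.random.default_rng(seed)
    n_real = int(batch_size * real_fraction)
    real_idx = rng.integers(0, len(real_transitions), size=n_real)
    sim_idx = rng.integers(0, len(sim_transitions), size=batch_size - n_real)
    return np.concatenate([real_transitions[real_idx], sim_transitions[sim_idx]], axis=0)

# toy usage: each row stands for a flattened (state, action, reward, next_state) tuple
sim = np.random.rand(1000, 8)
real = np.random.rand(200, 8)
print(mixed_batch(sim, real).shape)   # (64, 8)
```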
APA, Harvard, Vancouver, ISO, and other styles
45

Telikapalli, Surya. "Collaborative design (COLLDESIGN): A real-time interactive unified modeling language tool." CSUSB ScholarWorks, 2004. https://scholarworks.lib.csusb.edu/etd-project/2669.

Full text
Abstract:
This project extended COLLDESIGN, an interactive collaborative modeling tool that was developed by Mr. Hara Totapally. The initial version included a collaborative framework comprised of configurable client and server components. This project accomplished a complete implementation of the Class Diagram view. In addition, extending the framework, text messaging and audio conferencing features have been implemented to allow for real-time textual and audio communication between team members working on a particular project. VideoClient is the GUI of the application.
APA, Harvard, Vancouver, ISO, and other styles
46

Ferreira, E. (Eija). "Model selection in time series machine learning applications." Doctoral thesis, Oulun yliopisto, 2015. http://urn.fi/urn:isbn:9789526209012.

Full text
Abstract:
Model selection is a necessary step for any practical modeling task. Since the true model behind a real-world process cannot be known, the goal of model selection is to find the best approximation among a set of candidate models. In this thesis, we discuss model selection in the context of time series machine learning applications. We cover four steps of the commonly followed machine learning process: data preparation, algorithm choice, feature selection and validation. We consider how the characteristics and the amount of data available should guide the selection of algorithms to be used, and how the data set at hand should be divided for model training, selection and validation to optimize the generalizability and future performance of the model. We also consider what special restrictions and requirements need to be taken into account when applying regular machine learning algorithms to time series data. We especially aim to bring forth problems relating to model over-fitting and over-selection that might occur due to careless or uninformed application of model selection methods. We present our results in three different time series machine learning application areas: resistance spot welding, exercise energy expenditure estimation and cognitive load modeling. Based on our findings in these studies, we draw general guidelines on which points to consider when starting to solve a new machine learning problem, from the point of view of data characteristics, amount of data, computational resources and the possible time series nature of the problem. We also discuss how the practical aspects and requirements set by the environment where the final model will be implemented affect the choice of algorithms to use.
Tiivistelmä Mallinvalinta on oleellinen osa minkä tahansa käytännön mallinnusongelman ratkaisua. Koska mallinnettavan ilmiön toiminnan taustalla olevaa todellista mallia ei voida tietää, on mallinvalinnan tarkoituksena valita malliehdokkaiden joukosta sitä lähimpänä oleva malli. Tässä väitöskirjassa käsitellään mallinvalintaa aikasarjamuotoista dataa sisältävissä sovelluksissa neljän koneoppimisprosessissa yleisesti noudatetun askeleen kautta: aineiston esikäsittely, algoritmin valinta, piirteiden valinta ja validointi. Väitöskirjassa tutkitaan, kuinka käytettävissä olevan aineiston ominaisuudet ja määrä tulisi ottaa huomioon algoritmin valinnassa, ja kuinka aineisto tulisi jakaa mallin opetusta, testausta ja validointia varten mallin yleistettävyyden ja tulevan suorituskyvyn optimoimiseksi. Myös erityisiä rajoitteita ja vaatimuksia tavanomaisten koneoppimismenetelmien soveltamiselle aikasarjadataan käsitellään. Työn tavoitteena on erityisesti tuoda esille mallin ylioppimiseen ja ylivalintaan liittyviä ongelmia, jotka voivat seurata mallinvalin- tamenetelmien huolimattomasta tai osaamattomasta käytöstä. Työn käytännön tulokset perustuvat koneoppimismenetelmien soveltamiseen aikasar- jadatan mallinnukseen kolmella eri tutkimusalueella: pistehitsaus, fyysisen harjoittelun aikasen energiankulutuksen arviointi sekä kognitiivisen kuormituksen mallintaminen. Väitöskirja tarjoaa näihin tuloksiin pohjautuen yleisiä suuntaviivoja, joita voidaan käyttää apuna lähdettäessä ratkaisemaan uutta koneoppimisongelmaa erityisesti aineiston ominaisuuksien ja määrän, laskennallisten resurssien sekä ongelman mahdollisen aikasar- jaluonteen näkökulmasta. Työssä pohditaan myös mallin lopullisen toimintaympäristön asettamien käytännön näkökohtien ja rajoitteiden vaikutusta algoritmin valintaan
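One of the time-series-specific pitfalls discussed in the abstract above, dividing the data so that validation never looks into the future, is illustrated below with scikit-learn's forward-chaining splitter. The toy data and model are assumptions for the sake of a runnable example, not the thesis's case studies.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# toy time series: a single feature (time index) and a smooth target
X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.sin(X[:, 0] / 5.0)

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])   # train only on the past
    scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
print(np.mean(scores))   # forward-chained validation error
```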
APA, Harvard, Vancouver, ISO, and other styles
47

Lanera, Corrado. "Sviluppo e applicazione di tecniche di apprendimento automatico per l'analisi e la classificazione del testo in ambito clinico. Development and Application of Machine Learning Techniques for Text Analyses and Classification in Clinical Research." Doctoral thesis, Università degli studi di Padova, 2020. http://hdl.handle.net/11577/3426256.

Full text
Abstract:
The content of Electronic Health Records (EHRs) is hugely heterogeneous, depending on the overall health system structure. Free text is probably the most prevalent yet most underused type of unstructured data included in EHRs. Nowadays, with Machine Learning (ML), we can take advantage of automatic models to encode narratives with performance comparable to that of humans. This dissertation focuses on investigating ML Techniques (MLT) to extract insights from free text in clinical settings. We considered two main groups of free text involved in clinical research. The first is composed of extensive documents such as research papers or study protocols. For this group, we considered 14 Systematic Reviews (SRs), including 7,494 studies from PubMed and a whole snapshot of 233,609 trials from ClinicalTrials.gov. Pediatric EHRs compose the second group, for which we considered two sources of data: one of 6,903,035 visits from the Italian Pedianet database, and the second of 2,723 Spanish discharge notes from pediatric Emergency Departments (EDs) of nine hospitals in Nicaragua. The first contribution reported is an automatic system trained to replicate a search from specialized search engines to clinical registries. The proposed model showed very high classification performance (AUC from 93.4% to 99.9% among the 14 SRs), with the added value of a reduced number of non-relevant studies extracted (a mean of 472 and a maximum of 2,119 additional records, compared with 572 and 2,680 respectively for the original manual extraction). A comparative study exploring the effect of changing MLTs or methods to manage class imbalance is also reported. An investigation of the pediatric ED visits collected from the nine hospitals in Nicaragua showed a mean accuracy of 78.31% in the classification of discharge diagnoses, demonstrating promising performance of ML for the automatic classification of free-text ED discharge diagnoses in Spanish. A further contribution aimed to improve the accuracy of infectious disease detection at the population level. That is a crucial public health issue, as it can provide the background information necessary for implementing effective control strategies, such as advertising and monitoring the effectiveness of vaccination campaigns. In the two studies reported, classifying cases of Varicella-Zoster Virus and types of otitis, both primary ML paradigms, shallow and deep models, were explored. In both cases the results were highly promising; in the latter, the models reached performance comparable to that of humans (accuracy of 96.59% compared with 95.91% achieved by human annotators, and a balanced F1 score of 95.47% compared with 93.47%). A further relevant side goal concerns the languages investigated. International research on the use of MLTs to classify EHRs focuses mainly on English-language datasets. Hence, results on non-English databases, such as the Italian Pedianet database or the Spanish ED visits considered in this dissertation, are essential to assess the applicability of MLTs across languages. By showing performance comparable to that of humans, the dissertation highlights the real possibility of starting to incorporate ML systems into daily clinical practice to produce a concrete improvement in health care processes when free text is taken into account.
Il contenuto delle cartelle cliniche elettroniche (EHR) è estremamente eterogeneo, dipendendo della struttura generale del sistema sanitario. Al loro interno, il testo libero èprobabilmente la tipologia di dati non strutturato più presente e contemporaneamente sottoutilizzato. Al giorno d'oggi, grazie alle tecniche di Machine Learning (MLT), possiamo sfruttare modelli automatici per codificarne il contenuto testuale con prestazioni comparabili a quelle umane. In questa tesi, l'attenzione si concentra sull'investigazione delle MLT per l'ottenimento di informazioni utili non triviali dal testo libero in contesti clinici. Abbiamo considerato due tipi principali di testo libero coinvolti nella ricerca clinica. Il primo è composto da documenti estesi come articoli scientifici o protocolli di studio. Per questo gruppo, abbiamo preso in considerazione 14 revisioni sistematiche (SR), tra cui 7.494 studi di PubMed e un'intera istantanea composta da 233.609 studi clinici da ClinicalTrials.gov. Le cartelle cliniche elettroniche pediatriche compongono il secondo gruppo, per il quale abbiamo considerato due fonti di dati: una di 6.903.035 visite dal database italiano Pedianet e la seconda da 2.723 note di dimissione ospedaliera scritte in spagnolo e provenienti dai dipartimenti di emergenza (DE) pediatrica di nove ospedali in Nicaragua. Il primo contributo riportato è un sistema automatico addestrato per replicare una ricerca dai motori di ricerca specializzati ai registri clinici. Il modello proposto ha mostrato prestazioni di classificazione molto elevate (AUC dal 93,4% al 99,9% tra i 14 SR), con il valore aggiunto di una quantità ridotta di studi non rilevanti estratti (media di 472 e massimo di 2119 record aggiuntivi rispetto a 572 e 2680 dell'estrazione manuale originale rispettivamente). Viene riportato anche uno studio comparativo per esplorare l'effetto dell'utilizzo di differenti MLT e di metodi diversi per gestire gli effetti dello squilibro di numerosità nelle classi. Nella tesi è riportata inoltre un'intera indagine sulle visite pediatriche presso i DE raccolte presso i nove ospedali del Nicaragua. In tale indagine emerge un'accuratezza media nella classificazione delle diagnosi di dimissione coi modelli proposti del 78,31%, mostrando promettenti prestazioni per un sistema ML per la classificazione automatica delle diagnosi di dimissione da testo libero in lingua spagnola. Un ulteriore contributo riportato ha mirato a migliorare l'accuratezza del rilevamento delle malattie infettive a livello di popolazione. Questo è un problema cruciale per la salute pubblica che può fornire le informazioni di base necessarie per l'implementazione di strategie di controllo efficaci, come la notifica e il monitoraggio di efficacia di campagne di vaccinazione. Tra i due studi riportati, sono stati esplorati entrambi i paradigmi primari di ML classici e profondi. In entrambi i casi i risultati sono stati molto promettenti; nel secondo, raggiungendo prestazioni paragonabili a quelle umane (precisione del 96,59% rispetto al 95,91% raggiunta dagli annotatori umani e livello F1 bilanciato del 95,47% rispetto al 93,47%). Un ulteriore obiettivo secondario ma rilevante raggiunto riguarda le lingue indagate. La ricerca internazionale sull'uso delle MLT per classificare gli EHR si concentra principalmente su set di dati testuali in lingua inglese. 
Pertanto, i risultati su database non inglesi, come il Pedianet italiano o quello spagnolo delle visite ED considerate nella tesi, risultano contributi chiave per valutare l'applicabilità generale delle MLT a livello linguistico generale. Mostrando prestazioni paragonabili a quelle umane, la tesi evidenzia la reale possibilità di iniziare a incorporare i sistemi ML nella pratica clinica quotidiana per produrre un miglioramento concreto nei processi sanitari quando si tiene conto del testo libero.
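As a point of reference for the "shallow" ML paradigm mentioned in the abstract above, a bag-of-words text classification pipeline of the kind commonly used for discharge-diagnosis coding is sketched below. The toy notes, labels, and model choice are assumptions; they are not the dissertation's datasets or its tuned classifiers.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# toy discharge notes (hypothetical); the actual work used large Spanish and Italian records
notes = ["fiebre y tos de tres dias", "otalgia derecha con fiebre",
         "vomitos y diarrea", "dolor de oido izquierdo"]
labels = ["respiratory", "otitis", "gastro", "otitis"]

# TF-IDF n-grams feed a linear classifier: a typical shallow text-classification baseline
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(notes, labels)
print(clf.predict(["fiebre con dolor de oido"]))
```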
APA, Harvard, Vancouver, ISO, and other styles
48

Canavan, Caroline. "Using real world data to generate health economic models : a worked example assessing the cost-effectiveness of referral to gastroenterology for irritable bowel syndrome in the UK." Thesis, University of Nottingham, 2016. http://eprints.nottingham.ac.uk/32666/.

Full text
Abstract:
Introduction: Irritable bowel syndrome (IBS) has a substantial impact on Quality of Life (QoL), and patients have high healthcare utilization. Guidelines recommend diagnosis and management within primary care, yet around 25% of patients are referred to gastroenterology. These studies aimed to assess the incidence of organic gastrointestinal disease in patients diagnosed with IBS, the cost of healthcare utilization and the QoL in patients with IBS before and after seeing a gastroenterologist, and to estimate the cost-effectiveness of a gastroenterology appointment. Methods: Patients with IBS were identified within the UK Clinical Practice Research Datalink. Incidence rates of coeliac disease, colorectal cancer (CRC) and inflammatory bowel disease (IBD) were calculated. Individual-level healthcare utilization data were extracted for IBS patients who first visited a gastroenterologist in 2008 or 2009. Mean costs of total healthcare utilization were calculated before and after gastroenterology attendance. A questionnaire study of patients with IBS attending a gastroenterology outpatient clinic for the first time measured QoL and utility before and after the appointment. Quality Adjusted Life Years (QALYs) were modeled from these utility values. Cost-effectiveness of a referral to gastroenterology in IBS was assessed using mean cost per QALY. Results: Fifteen years after IBS diagnosis, the combined cumulative excess incidence of coeliac disease, IBD and CRC in IBS is 3.7%. Over one year following the gastroenterology appointment, the expected QALY gain compared to no appointment was 0.03 and the expected extra total healthcare costs were £657. The incremental cost-effectiveness ratio was £27,865.64 per QALY. Referral is expected to be less cost-effective for patients younger than 30, for men, and over longer time horizons. Conclusions: My findings provide reassurance that non-specialists are unlikely to be missing an organic condition in the majority of IBS patients. Referral to a gastroenterologist for IBS might be cost-effective for the NHS, but more data, especially on potential QALY gains, are needed.
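The headline figure above is an incremental cost-effectiveness ratio (ICER): the extra cost of referral divided by the extra QALYs it yields compared with no referral. A minimal sketch of the arithmetic is shown below with hypothetical inputs; the thesis's exact model inputs are not reproduced here.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# hypothetical example: an intervention adding GBP 600 in cost and 0.025 QALYs
print(icer(cost_new=600.0, cost_old=0.0, qaly_new=0.025, qaly_old=0.0))  # 24000.0 GBP per QALY
```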
APA, Harvard, Vancouver, ISO, and other styles
49

Cars, Thomas. "Real-Time Monitoring of Healthcare Interventions in Routine Care : Effectiveness and Safety of Newly Introduced Medicines." Doctoral thesis, Uppsala universitet, Kardiovaskulär epidemiologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-304324.

Full text
Abstract:
Before market authorization of new medicines, their efficacy and safety are evaluated using randomized controlled trials. While there is no doubt about the scientific value of randomized trials, they are usually conducted in selected populations with questionable generalizability to routine care.  In the digital data revolution era, with healthcare data growing at an unprecedented rate, drug monitoring in routine care is still highly under-utilized. Although many countries have access to data on prescription drugs at the individual level in ambulatory care, such data are often missing for hospitals. This is a growing problem considering the clear trend towards more new and expensive drugs administered in the hospital setting. The aim of this thesis was therefore to develop methods for extracting data on drug use from a hospital-based electronic health record system and further to build and evaluate models for real-time monitoring of effectiveness and safety of new drugs in routine care using data from electronic health records and regional and national health care registers. Using the developed techniques, we were able to demonstrate drug use and health service utilization for inflammatory bowel disease and to evaluate the comparative effectiveness and safety of antiarrhythmic drugs. With a rapidly evolving drug development, it is important to optimize the evaluation of effectiveness, safety and health economic value of new medicines in routine care. We believe that the models described in this thesis could contribute to fulfil this need.
APA, Harvard, Vancouver, ISO, and other styles
50

Hiroi, Shinzo. "Impact of health insurance coverage for Helicobacter pylori gastritis on the trends in eradication therapy in Japan: retrospective observational study and simulation study based on real world data." Kyoto University, 2018. http://hdl.handle.net/2433/232454.

Full text
APA, Harvard, Vancouver, ISO, and other styles