Dissertations / Theses on the topic 'Aggregated data'

To see the other types of publications on this topic, follow the link: Aggregated data.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Aggregated data.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Tanaka, Yusuke. "Probabilistic Models for Spatially Aggregated Data." Kyoto University, 2020. http://hdl.handle.net/2433/253422.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Mouhoub, Mohamed Lamine. "Aggregated Search of Data and Services." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLED066/document.

Full text
Abstract:
Ces dernières années ont témoigné du succès du projet Linked Open Data (LOD) et de la croissance du nombre de sources de données sémantiques disponibles sur le web. Cependant, il y a encore beaucoup de données qui ne sont pas encore mises à disposition dans le LOD telles que les données sur demande, les données de capteurs etc. Elles sont néanmoins fournies par des API des services Web. L'intégration de ces données au LOD ou dans des applications de mashups apporterait une forte valeur ajoutée. Cependant, chercher de tels services avec les outils de découverte de services existants nécessite une connaissance préalable des répertoires de services ainsi que des ontologies utilisées pour les décrire. Dans cette thèse, nous proposons de nouvelles approches et des cadres logiciels pour la recherche de services web sémantiques avec une perspective d'intégration de données. Premièrement, nous introduisons LIDSEARCH, un cadre applicatif piloté par SPARQL pour chercher des données et des services web sémantiques. De plus, nous proposons une approche pour enrichir les descriptions sémantiques de services web en décrivant les relations ontologiques entre leurs entrées et leurs sorties afin de faciliter l'automatisation de la découverte et de la composition de services. Afin d'atteindre ce but, nous utilisons des techniques de traitement automatique de la langue et d'appariement de textes basées sur le deep-learning pour mieux comprendre les descriptions des services. Nous validons notre travail avec des preuves de concept et utilisons les services et les ontologies d'OWLS-TC pour évaluer nos approches proposées de sélection et d'enrichissement.
Recent years have witnessed the success of the Linked Open Data (LOD) project as well as a significantly growing amount of semantic data sources available on the web. However, there are still a lot of data not being published as fully materialized knowledge bases, such as sensor data, dynamic data, data with limited access patterns, etc. Such data is in general available through web APIs or web services. Integrating such data into the LOD or into mashups would have a significant added value. However, discovering such services requires a lot of effort from developers and a good knowledge of the existing service repositories, which the current service discovery systems do not efficiently overcome. In this thesis, we propose novel approaches and frameworks to search for semantic web services from a data integration perspective. Firstly, we introduce LIDSEARCH, a SPARQL-driven framework to search for linked data and semantic web services. Moreover, we propose an approach to enrich semantic service descriptions with Input-Output relations from ontologies to facilitate the automation of service discovery and composition. To achieve such a purpose, we apply natural language processing techniques and deep-learning-based text similarity techniques to leverage I/O relations from text to ontologies. We validate our work with proof-of-concept frameworks and use OWLS-TC as a dataset for conducting our experiments on service search and enrichment.
APA, Harvard, Vancouver, ISO, and other styles
3

Samita, Sembakutti. "Analysis of aggregated plant disease incidence data." Thesis, University of Edinburgh, 1995. http://hdl.handle.net/1842/27331.

Full text
Abstract:
If diseased plants (or plant units) are randomly dispersed, the frequency distribution of diseased plants (or plant units) per sample may be described by a binomial distribution, and statistical analyses may be based on the linear logistic model. Since most disease incidence data do not have a random spatial pattern, the binomial distribution can hardly ever, in practice, be used to describe observed frequencies. In this study, the use of conditional probability distributions, such as the logistic-normal binomial distribution, for such data is illustrated. Both descriptive distribution fitting and statistical modelling are discussed. The study evaluates several methods for analysis of incidence data which do not exhibit a random spatial pattern. Some of these methods are applied to plant disease data for the first time. A method of choosing between the different analyses is discussed. All the techniques are illustrated using examples and, as an application, survey data collected on pineapple wilt disease in Sri Lanka are extensively studied. As an alternative method of describing disease incidence data with a non-random spatial pattern, the use of two-dimensional distance class (2DCLASS) analysis was evaluated using the same survey data. 2DCLASS analysis is widely accepted in plant disease epidemiology as a method of analysing non-random spatial patterns when the observations are made as presence or absence of the disease on an individual-plant basis. We demonstrate the possibility of using quadrat-based data in 2DCLASS analysis. We investigate the use of 2DCLASS analysis as a methodology and find some drawbacks with this technique, which are discussed in detail. Moreover, this study introduces a new parameter in the 2DCLASS analysis, called the Scaled Core Cluster size, which may be more suitable for comparison of datasets of different sizes.
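For readers who want a concrete feel for the overdispersion that motivates the logistic-normal binomial model mentioned above, here is a minimal Python sketch that simulates quadrat counts under that model and compares their variance with the plain binomial benchmark. It is not taken from the thesis; all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

n_quadrats = 200       # sampling units (e.g. quadrats)
n_plants = 20          # plants assessed per quadrat
mu, sigma = -1.0, 0.8  # assumed logit-scale mean and between-quadrat heterogeneity

# Logistic-normal binomial: the incidence probability varies between quadrats
# on the logit scale, which induces overdispersion relative to the binomial.
logit_p = rng.normal(mu, sigma, size=n_quadrats)
p = 1.0 / (1.0 + np.exp(-logit_p))
diseased = rng.binomial(n_plants, p)

p_bar = diseased.mean() / n_plants
print("observed variance :", diseased.var(ddof=1))
print("binomial variance :", n_plants * p_bar * (1.0 - p_bar))
```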
APA, Harvard, Vancouver, ISO, and other styles
4

Rastogi, Tanay. "Load Identification from Aggregated Data using Generative Modeling." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-249599.

Full text
Abstract:
In view of the exponential increase in demand for energy, there is a need to come up with a sustainable energy consumption system in residential buildings. Several pieces of research show that this can be achieved by providing real-time energy consumption feedback for each appliance to its residents. This can be achieved through a Non-Intrusive Load Monitoring (NILM) system that disaggregates the electricity consumption of individual appliances from the total energy consumption of a household. State-of-the-art NILM faces several challenges that prevent its large-scale implementation, owing to its limited applicability and scalability across different households. Most NILM research only trains the inference model for a specific house with a limited set of appliances and does not create models that can generalize to appliances that are not present in the dataset. In this Master thesis, a novel approach is proposed to tackle the above-mentioned issue in NILM. The thesis proposes to use a Gaussian Mixture Model (GMM) procedure to create a generalizable electrical signature model for each appliance type by training over labelled data from different appliances of the same type, and to create various combinations of appliances by merging the generated models. A maximum likelihood estimation method is used to label the unlabelled aggregated data and disaggregate it into individual appliances. As a proof of concept, the proposed algorithm is evaluated on two datasets, a toy dataset and the ACS-F2 dataset, and is compared with a modified version of a state-of-the-art RNN network on the ACS-F2 dataset. For evaluation, precision, recall and F-score metrics are used on all the implementations. From the evaluation, it can be stated that the GMM procedure can create a generalizable appliance signature model, can disaggregate the aggregated data, and can label previously unseen appliances. The thesis work also shows that, given a small set of training data, the proposed algorithm performs better than the RNN implementation. On the other hand, the proposed algorithm depends heavily on the quality of the data. The algorithm also fails to create an accurate model for appliances due to poor initialization of the GMM parameters. In addition, the proposed algorithm suffers from the same inaccuracies as the state of the art.
På grund av den exponentiella ökningen av energi-efterfrågan är det nödvändigt att komma fram till ett hållbart energiförbrukningssystem i bostäder. Flera undersökningar visar att detta kan uppnås genom att upplysa användaren om energikonsumtionen för varje apparat i huset. Detta kan uppnås genom ett icke-störande övervakningssystem som visar belastningen (NILM) och skiljer elförbrukningen hos enskilda apparater från hushållets totala energiförbrukning. Det senaste NILM har flera utmaningar som försvårar ett omfattande genomförande på grund av begränsad lämplighet hos olika hushåll. I forskningen inom NILM tränas oftast endast inferensmodellen för ett specifikt hus med ett begränsat antal apparater och skapar inte modeller som kan generalisera till apparater som inte finns i datasetet. I detta examensarbete föreslås ett nytt tillvägagångssätt för att angripa det ovan nämnda problemet med NILM. Arbetet avser att använda en Gaussian Mixture Model, GMM-teknik, för att skapa en generaliserbar elektrisk signaturmodell för varje typ av apparat genom att träna över markerade data från olika apparater av samma typ och skapa olika kombinationer av apparater genom att slå samman de genererade modellerna. Maximum likelihood-metoden används för att markera omärkta aggregerade data och disaggregera data i enskilda apparater. Som ett bevis på konceptet utvärderas den föreslagna algoritmen på två dataset, Toy-datasetet och ACS-F2-datasetet, och jämförs med en modifierad version av det senaste RNN-nätverket på ACS-F2-datasetet. Precision, Recall och F-score är mätetal som används för utvärdering av alla implementeringar. Från utvärderingen kan det konstateras att GMM-förfarandet kan skapa en generaliserbar signaturmodell, kan disaggregera aggregerade data och markera tidigare osynliga apparater. Examensarbetet visar också att, givet en liten uppsättning av träningsdata, så har den föreslagna algoritmen bättre prestanda än RNN-genomförandet. Å andra sidan är den föreslagna algoritmen väldigt beroende av kvaliteten hos data. Algoritmen misslyckas också med att skapa en exakt modell för apparater på grund av den dåliga initialiseringen av parametrar för GMM. Dessutom lider den föreslagna algoritmen av samma felaktigheter som den aktuella modellen.
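As a rough illustration of the idea sketched in the abstract, the hypothetical Python snippet below fits one Gaussian mixture per appliance type on labelled data and then labels unseen samples by maximum likelihood. It is not the thesis implementation; the appliance names, the single power feature and all parameter values are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Invented per-appliance power readings (watts); real work would use labelled measurements.
train = {
    "fridge": rng.normal(120, 10, size=(500, 1)),
    "kettle": rng.normal(2000, 60, size=(500, 1)),
    "laptop": rng.normal(60, 8, size=(500, 1)),
}

# One GMM per appliance type serves as a reusable electrical-signature model.
models = {name: GaussianMixture(n_components=2, random_state=0).fit(x)
          for name, x in train.items()}

def label(samples):
    """Assign each unlabelled sample to the appliance model with the highest likelihood."""
    names = list(models)
    log_lik = np.column_stack([models[n].score_samples(samples) for n in names])
    return [names[i] for i in log_lik.argmax(axis=1)]

print(label(np.array([[118.0], [1950.0], [63.0]])))
```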
APA, Harvard, Vancouver, ISO, and other styles
5

Folia, Maria Myrto. "Inference in stochastic systems with temporally aggregated data." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/inference-in-stochastic-systems-with-temporally-aggregated-data(17940c86-e6b3-4f7d-8a43-884bbf72b39e).html.

Full text
Abstract:
The stochasticity of cellular processes and the small number of molecules in a cell make deterministic models inappropriate for modelling chemical reactions at the single cell level. The Chemical Master Equation (CME) is widely used to describe the evolution of biochemical reactions inside cells stochastically but is computationally expensive. The Linear Noise Approximation (LNA) is a popular method for approximating the CME in order to carry out inference and parameter estimation in stochastic models. Data from stochastic systems is often aggregated over time. One such example is in luminescence bioimaging, where a luciferase reporter gene allows us to quantify the activity of proteins inside a cell. The luminescence intensity emitted from the luciferase experiments is collected from single cells and is integrated over a time period (usually 15 to 30 minutes), which is then recorded as a single data point. In this work we consider stochastic systems that we approximate using the Linear Noise Approximation (LNA). We demonstrate our method by learning the parameters of three different models from which aggregated data was simulated: an Ornstein-Uhlenbeck model, a Lotka-Volterra model and a gene transcription model. We additionally compare our approach to the existing approach and find that our method outperforms it. Finally, we apply our method to microscopy data from a translation inhibition experiment.
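The following small Python sketch only illustrates the data-generating setting described above, an Ornstein-Uhlenbeck process observed through temporally aggregated (integrated) windows; it does not implement the LNA-based inference itself, and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ornstein-Uhlenbeck parameters (assumed values): dX = theta*(mu - X) dt + sigma dW.
theta, mu, sigma = 0.5, 10.0, 1.0
dt, n_steps = 0.01, 30_000   # fine simulation grid
window = 1_500               # number of fine steps integrated into one data point

# Euler-Maruyama simulation of the latent process.
x = np.empty(n_steps)
x[0] = mu
for t in range(1, n_steps):
    x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + sigma * np.sqrt(dt) * rng.normal()

# Temporal aggregation: each observation is the integral (approximated by sum * dt)
# of the signal over a window, mimicking luminescence collected over 15-30 minutes.
usable = n_steps - n_steps % window
aggregated = x[:usable].reshape(-1, window).sum(axis=1) * dt
print(aggregated[:5])
```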
APA, Harvard, Vancouver, ISO, and other styles
6

Davis, Brett Andrew. "Inference for Discrete Time Stochastic Processes using Aggregated Survey Data." The Australian National University. Faculty of Economics and Commerce, 2003. http://thesis.anu.edu.au./public/adt-ANU20040806.104137.

Full text
Abstract:
We consider a longitudinal system in which transitions between the states are governed by a discrete time finite state space stochastic process X. Our aim, using aggregated sample survey data of the form typically collected by official statistical agencies, is to undertake model based inference for the underlying process X. We will develop inferential techniques for continuing sample surveys of two distinct types. First, longitudinal surveys in which the same individuals are sampled in each cycle of the survey. Second, cross-sectional surveys which sample the same population in successive cycles but with no attempt to track particular individuals from one cycle to the next. Some of the basic results have appeared in Davis et al (2001) and Davis et al (2002).

Longitudinal surveys provide data in the form of transition frequencies between the states of X. In Chapter Two we develop a method for modelling and estimating the one-step transition probabilities in the case where X is a non-homogeneous Markov chain and transition frequencies are observed at unit time intervals. However, due to their expense, longitudinal surveys are typically conducted at widely, and sometimes irregularly, spaced time points. That is, the observable frequencies pertain to multi-step transitions. Continuing to assume the Markov property for X, in Chapter Three, we show that these multi-step transition frequencies can be stochastically interpolated to provide accurate estimates of the one-step transition probabilities of the underlying process. These estimates for a unit time increment can be used to calculate estimates of expected future occupation time, conditional on an individual’s state at initial point of observation, in the different states of X.

For reasons of cost, most statistical collections run by official agencies are cross-sectional sample surveys. The data observed from an on-going survey of this type are marginal frequencies in the states of X at a sequence of time points. In Chapter Four we develop a model based technique for estimating the marginal probabilities of X using data of this form. Note that, in contrast to the longitudinal case, the Markov assumption does not simplify inference based on marginal frequencies. The marginal probability estimates enable estimation of future occupation times (in each of the states of X) for an individual of unspecified initial state. However, in the applications of the technique that we discuss (see Sections 4.4 and 4.5) the estimated occupation times will be conditional on both gender and initial age of individuals.

The longitudinal data envisaged in Chapter Two is that obtained from the surveillance of the same sample in each cycle of an on-going survey. In practice, to preserve data quality it is necessary to control respondent burden using sample rotation. This is usually achieved using a mechanism known as rotation group sampling. In Chapter Five we consider the particular form of rotation group sampling used by the Australian Bureau of Statistics in their Monthly Labour Force Survey (from which official estimates of labour force participation rates are produced). We show that our approach to estimating the one-step transition probabilities of X from transition frequencies observed at incremental time intervals, developed in Chapter Two, can be modified to deal with data collected under this sample rotation scheme. Furthermore, we show that valid inference is possible even when the Markov property does not hold for the underlying process.
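As a loose illustration of the multi-step problem described in Chapter Three, the hypothetical sketch below recovers a one-step transition matrix from an observed m-step matrix by least squares. This is not the likelihood-based stochastic interpolation developed in the thesis; the matrix and the survey spacing are invented.

```python
import numpy as np
from scipy.optimize import minimize

def softmax_rows(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_one_step(P_m_obs, m, n_states):
    """Least-squares recovery of a one-step transition matrix from an m-step matrix."""
    def loss(flat):
        P = softmax_rows(flat.reshape(n_states, n_states))
        return np.sum((np.linalg.matrix_power(P, m) - P_m_obs) ** 2)
    start = np.log(P_m_obs + 1e-9).ravel()          # start from the observed matrix
    res = minimize(loss, start, method="L-BFGS-B")
    return softmax_rows(res.x.reshape(n_states, n_states))

# Invented example: a 3-state process surveyed only every third period.
P_true = np.array([[0.90, 0.08, 0.02],
                   [0.10, 0.80, 0.10],
                   [0.05, 0.15, 0.80]])
P3_obs = np.linalg.matrix_power(P_true, 3)              # what the survey would reveal
print(fit_one_step(P3_obs, m=3, n_states=3).round(3))   # ideally close to P_true
```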
APA, Harvard, Vancouver, ISO, and other styles
7

Marklund, Emil. "Bayesian inference in aggregated hidden Markov models." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-243090.

Full text
Abstract:
Single molecule experiments study the kinetics of molecular biological systems. Many such studies generate data that can be described by aggregated hidden Markov models, hence there is a need to do inference on such data and models. In this study, model selection in aggregated hidden Markov models was performed with a criterion of maximum Bayesian evidence. Variational Bayes inference was seen to underestimate the evidence for aggregated model fits. Estimation of the evidence integral by brute force Monte Carlo integration theoretically always converges to the correct value, but in far from tractable time. Nested sampling is a promising method for solving this problem by doing faster Monte Carlo integration, but it was here seen to have difficulties generating uncorrelated samples.
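A minimal sketch of the brute-force Monte Carlo evidence estimate discussed above, on a toy conjugate model where the evidence is also available in closed form for comparison. The model and sample sizes are invented and unrelated to the aggregated hidden Markov models studied in the thesis.

```python
import numpy as np
from scipy.special import beta as beta_fn

rng = np.random.default_rng(3)

# Toy data: Bernoulli trials with unknown success probability theta, uniform prior.
data = rng.binomial(1, 0.3, size=50)
k, n = int(data.sum()), data.size

# Brute-force Monte Carlo: the evidence is the likelihood averaged over prior draws.
theta = rng.uniform(0.0, 1.0, size=200_000)
evidence_mc = np.mean(theta**k * (1.0 - theta)**(n - k))

# Closed-form check for this conjugate toy model: integral of the likelihood = B(k+1, n-k+1).
evidence_exact = beta_fn(k + 1, n - k + 1)
print(evidence_mc, evidence_exact)
```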
APA, Harvard, Vancouver, ISO, and other styles
8

Broc, Camilo. "Variable selection for data aggregated from different sources with group of variable structure." Thesis, Pau, 2019. http://www.theses.fr/2019PAUU3048.

Full text
Abstract:
Durant les dernières décennies, la quantité de données disponibles en génétique a considérablement augmenté. D'une part, une amélioration des technologies de séquençage de molécules a permis de réduire fortement le coût d'extraction du génome humain. D'autre part, des consortiums internationaux d'institutions ont permis la mise en commun de la collecte de données sur de larges populations. Cette quantité de données nous permet d'espérer mieux comprendre les mécanismes régissant le fonctionnement de nos cellules. Dans ce contexte, l'épidémiologie génétique est un domaine cherchant à déterminer la relation entre des caractéristiques génétiques et l'apparition d'une maladie. Des méthodes statistiques spécifiques à ce domaine ont dû être développées, en particulier à cause des dimensions que les données présentent : en génétique, l'information est contenue dans un nombre de variables grand par rapport au nombre d'observations. Dans cette dissertation, deux contributions sont présentées. Le premier projet appelé PIGE (Pathway-Interaction Gene Environment) développe une méthode pour déterminer des interactions gène-environnement. Le second projet vise à développer une méthode de sélection de variables adaptée à l'analyse de données provenant de différentes études et présentant une structure de groupe de variables. Le document est divisé en six parties. Le premier chapitre met en relief le contexte, d'un point de vue à la fois biologique et mathématique. Le deuxième chapitre présente les motivations de ce travail et la mise en œuvre d'études en épidémiologie génétique. Le troisième chapitre aborde les questions relatives à l'analyse d'interactions gène-environnement et la première contribution de la thèse y est présentée. Le quatrième chapitre traite des problématiques de méta-analyses. Le développement d'une nouvelle méthode de réduction de dimension répondant à ces questions y est présenté. Le cinquième chapitre met en avant la pertinence de la méthode dans des cas de pléiotropie. Enfin, le sixième et dernier chapitre dresse un bilan du travail présenté et dresse des perspectives pour le futur.
During the last decades, the amount of available genetic data on populations has grown drastically. On the one hand, a refinement of chemical technologies has made possible the extraction of the human genome of individuals at an accessible cost. On the other hand, consortia of institutions and laboratories around the world have permitted the collection of data on a variety of individuals and populations. This amount of data raised hope in our ability to understand the deepest mechanisms involved in the functioning of our cells. Notably, genetic epidemiology is a field that studies the relation between genetic features and the onset of a disease. Specific statistical methods have been necessary for those analyses, especially due to the dimensions of the available data: in genetics, information is contained in a high number of variables compared to the number of observations. In this dissertation, two contributions are presented. The first project, called PIGE (Pathway-Interaction Gene Environment), deals with gene-environment interaction assessment. The second one aims at developing variable selection methods for data which have group structures in both the variables and the observations. The document is divided into six chapters. The first chapter sets the background of this work, where both biological and mathematical notations and concepts are presented, and gives a history of the motivation behind genetics and genetic epidemiology. The second chapter presents an overview of the statistical methods currently in use in genetic epidemiology. The third chapter deals with the identification of gene-environment interactions; it includes a presentation of existing approaches to this problem and a contribution of the thesis. The fourth chapter addresses the problem of meta-analysis: a definition of the problem and an overview of the existing approaches are presented, and then a new approach is introduced. The fifth chapter explains pleiotropy studies and how the method presented in the previous chapter is suited to this kind of analysis. The last chapter compiles conclusions and research lines for the future.
APA, Harvard, Vancouver, ISO, and other styles
9

Molitor, Torsten. "Coverage Prediction for Inter-Frequency Handover using Machine Learning with Aggregated Training Data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286676.

Full text
Abstract:
An important application of Machine Learning (ML) in mobile networks is to predict whether a user device has coverage on a frequency other than the current serving frequency, a use-case called Secondary Carrier Prediction (SCP). In this thesis we investigate whether data across different cells and frequencies can be successfully combined when learning this task, thus reducing the number of models that require training. Aggregation of data involves several challenges, such as different prevalences and varying amounts of available data, but more importantly it offers the possibility of achieving synergies in training by exploiting recurring patterns in the data. Using an experimental setup in which models are trained and validated on aggregated datasets, it is shown that synergies can in fact be achieved through aggregation. The scalability of this task is improved so that the number of models can be reduced by a factor as large as the number of cells times the number of frequencies, while maintaining similar or improved prediction performance.
Prediktion av täckning på sekundära frekvenser är en signifikant tillämpning av maskininlärning inom mobila nätverk. I den här avhandlingen utreds möjligheten att träna modeller på aggregationer av data, med följden att antalet modeller blir färre. Olika klassbalanser och varierande tillgång på data är utmaningar som uppstår vid aggregation, men även möjligheten att uppnå synergier genom att utnyttja återkommande mönster i datat. Med en experimentell uppställning där modeller tränas och valideras på aggregerade dataset visas att synergier kan uppnås genom aggregation. Skalbarheten på denna tillämpning förbättras till den grad att antalet modeller kan reduceras med en faktor lika stor som antalet celler gånger antalet frekvenser, med likvärdig eller förbättrad prediktionsprestanda.
APA, Harvard, Vancouver, ISO, and other styles
10

MacKelvie, Erin. "A Comparison of Traditional Aggregated Data to a Comprehensive Second-by-Second Data Depiction in Functional Analysis Graphs." Scholarly Commons, 2021. https://scholarlycommons.pacific.edu/uop_etds/3730.

Full text
Abstract:
Functional analyses (FAs) are an important component of treatment and the data gathered from FAs are often graphed in an aggregate or summary format, such as mean rate per session. Given the prevalence of undifferentiated analyses, it may be that this common method of data depiction is incomplete. In this paper, we compare the traditional aggregate method to a comprehensive second-by-second demonstration of the data including all appropriate and inappropriate responses emitted, as well as programmed and accidental antecedent and consequent variables, which may help further clarify the results of a functional analysis. We compared the functional analysis results of two participants when the data were depicted using the traditional rate aggregate method and depicted using a comprehensive second-by-second method. Although both rate and comprehensive second-by-second data depiction resulted in similar conclusions regarding the maintaining variables for the participants, comprehensive second-by-second data depiction allowed us to draw the conclusions in less time. Additional advantages and disadvantages of each method as it relates to efficiency, therapeutic risk and safety, and practicality are also discussed. Keywords: efficiency, functional analysis, problem behavior, safety, within-session second-by-second analysis.
APA, Harvard, Vancouver, ISO, and other styles
11

Olsén, Ingefeldt Niclas. "The determinants of voter turnout in OECD : An aggregated cross-national study using panel data." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-295468.

Full text
Abstract:
This paper examines, in a descriptive manner, how two groups of variables, institutional and socio-economic, correlate with voter turnout and whether their magnitudes have changed over time in OECD countries. Previous research is often based on data from the 1970s and 1980s. Since then, voter turnout in democratic countries has decreased, and more citizens do not use their fundamental democratic right to be involved in the process of choosing their representatives. To answer the paper's hypotheses, i.e. to analyse which factors correlate with voter turnout, panel data between 1980 and 2012 are used and estimated by an OLS approach. The empirical estimations indicate that 13 out of 19 variables have a significant relationship with turnout. Most of the variables' magnitudes are somewhat lower than in the previous literature. The time sensitivity analysis indicates that voters are less influenced by the significant variables that relate to the cost of voting. It seems that voters in the 21st century meet voting costs in a different manner than previously.
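For orientation only, a hypothetical sketch of the kind of fixed-effects OLS estimation on aggregated panel data that the abstract describes, using an invented country-year panel and made-up covariates; it is not the author's specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)

# Invented country-year panel with one institutional and one socio-economic covariate.
rows = []
for c in [f"country_{i}" for i in range(20)]:
    country_effect = rng.normal(0, 5)
    for year in range(1980, 2013, 4):
        compulsory = int(rng.integers(0, 2))      # institutional variable (made up)
        unemployment = rng.normal(7, 2)           # socio-economic variable (made up)
        turnout = 70 + country_effect + 8 * compulsory - 0.5 * unemployment + rng.normal(0, 2)
        rows.append(dict(country=c, year=year, compulsory=compulsory,
                         unemployment=unemployment, turnout=turnout))
df = pd.DataFrame(rows)

# Least-squares dummy-variable (fixed effects) OLS: country dummies absorb level differences.
fit = smf.ols("turnout ~ compulsory + unemployment + C(country)", data=df).fit()
print(fit.params[["compulsory", "unemployment"]])
```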
APA, Harvard, Vancouver, ISO, and other styles
12

Lee, Bu Hyoung. "The use of temporally aggregated data on detecting a structural change of a time series process." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/375511.

Full text
Abstract:
Statistics
Ph.D.
A time series process can be influenced by an interruptive event which starts at a certain time point, so a structural break in either mean or variance may occur before and after the event time. However, the traditional statistical tests of two independent samples, such as the t-test for a mean difference and the F-test for a variance difference, cannot be directly used for detecting the structural breaks because it is almost certainly impossible that two random samples exist in a time series. As alternative methods, the likelihood ratio (LR) test for a mean change and the cumulative sum (CUSUM) of squares test for a variance change have been widely employed in the literature. Another point of interest is temporal aggregation in a time series. Most published time series data are temporally aggregated from the original observations of a small time unit to the cumulative records of a large time unit. However, it is known that temporal aggregation has substantial effects on process properties because it transforms a high frequency nonaggregate process into a low frequency aggregate process. In this research, we investigate the effects of temporal aggregation on the LR test and the CUSUM test, through the ARIMA model transformation. First, we derive the proper transformation of ARIMA model orders and parameters when a time series is temporally aggregated. For the LR test for a mean change, its test statistic is associated with model parameters and errors. The parameters and errors in the statistic should be changed when an AR(p) process transforms upon the mth order temporal aggregation to an ARMA(P,Q) process. Using this property, we propose a modified LR test when a time series is aggregated. Through Monte Carlo simulations and empirical examples, we show that the aggregation leads to the null distribution of the modified LR test statistic being shifted to the left. Hence, the test power increases as the order of aggregation increases. For the CUSUM test for a variance change, we show that two aggregation terms will appear in the test statistic and have negative effects on test results when an ARIMA(p,d,q) process transforms upon the mth order temporal aggregation to an ARIMA(P,d,Q) process. Then, we propose a modified CUSUM test to control the terms which are interpreted as the aggregation effects. Through Monte Carlo simulations and empirical examples, the modified CUSUM test shows better performance and higher test power in detecting a variance change in an aggregated time series than the original CUSUM test.
Temple University--Theses
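A small, hypothetical Python sketch of the two ingredients discussed above: m-th order temporal aggregation of a series and a plain (unmodified) CUSUM-of-squares statistic applied to it. The modified tests derived in the dissertation are not reproduced here, and the simulated series is invented.

```python
import numpy as np

rng = np.random.default_rng(5)

def aggregate(x, m):
    """m-th order temporal aggregation: non-overlapping sums of m consecutive observations."""
    x = x[: len(x) - len(x) % m]
    return x.reshape(-1, m).sum(axis=1)

def cusum_of_squares(x):
    """Centred CUSUM-of-squares path: cumulative share of the sum of squares minus k/n."""
    share = np.cumsum(x**2) / np.sum(x**2)
    k = np.arange(1, len(x) + 1)
    return share - k / len(x)

# Invented nonaggregate series with a variance break at its midpoint.
n = 1200
x = np.concatenate([rng.normal(0, 1.0, n // 2), rng.normal(0, 2.0, n // 2)])

for m in (1, 3, 12):
    stat = np.max(np.abs(cusum_of_squares(aggregate(x, m))))
    print(f"aggregation order m = {m:2d}: max |CUSUM of squares| = {stat:.3f}")
```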
APA, Harvard, Vancouver, ISO, and other styles
13

Findlay, Elizabeth. "An Exploration of Aggregated Patterns of Student Curriculum-Based-Measurement Outcome Data Within a Response to Intervention Program." DigitalCommons@USU, 2012. https://digitalcommons.usu.edu/etd/1433.

Full text
Abstract:
One major concern when developing a response to intervention (RTI) program is to select effective practices that will be successfully implemented and sustained with adequate organizational guidance and support. The purpose of this study was to explore patterns of student tier placement data as a school-based case example of the nature and utility of RTI in an applied setting. Specifically, this study aimed to explore the extent to which the percentages of students placed in a three-tier program based on student oral reading fluency (ORF) level and growth trajectories reflect the standard RTI tier placement (80%, 15%, and 5%) at fall, winter, and spring in a school setting. Percentages of the total student population placed in each tier were explored using ORF data from third- and fourth-grade students (N = 429) at two schools in fall, winter, and spring. Results showed that school and ORF data reflected the standard percentages of student populations within each tier in fall, winter, and spring. However, slope data showed greater percentages of students in the more intensive tiers. Moreover, flexible grouping, or movement between tiers, occurred for few students when placement was based on school or ORF level data. No significant differences were found between the school and ORF student tier placements in fall, winter, and spring. A significant difference was found in spring between placement methods, with a larger proportion of students in Tier 1 based on the school assignments and a larger proportion of students in Tier 2 and Tier 3 based on ORF slope assignments.
APA, Harvard, Vancouver, ISO, and other styles
14

Bergström, Balder. "The Swedish payroll tax reduction for young workers : - A study of effects found using publicly available aggregated (macro) data." Thesis, Umeå universitet, Nationalekonomi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-166606.

Full text
Abstract:
In 2007, the Swedish payroll tax was reduced for youths in an attempt to curb the perceived high unemployment among Swedish youths. The reform was later rolled back, in 2016. For this period there is a rich supply of publicly available aggregated (macro) data. This thesis aims to examine, first, whether the aggregated data is suitable for policy evaluation of the reform and, second, the effects of the reform's introduction and repeal. This has been done by using both a conventional fixed effects model and a more unorthodox synthetic control method. Neither of the two methods could produce unbiased and consistent significant estimates of the treatment effects of the reform. Instead, the results of this thesis suggest that the publicly available aggregated data does not contain enough information to evaluate such reforms.
APA, Harvard, Vancouver, ISO, and other styles
15

Bhatt, Shreyansh. "Data-driven and Knowledge-Based Strategies for Realizing Crowd Wisdom on Social Media." Wright State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1578920003779943.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Tönnies, Thaddäus [Verfasser], Ralph [Gutachter] Brinks, and Adrian [Gutachter] Loerbroks. "Application of the illness-death model to estimate epidemiological measures for diabetes based on aggregated data / Thaddäus Tönnies ; Gutachter: Ralph Brinks, Adrian Loerbroks." Düsseldorf : Universitäts- und Landesbibliothek der Heinrich-Heine-Universität Düsseldorf, 2020. http://d-nb.info/121223863X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Rodrigues, Mário Jorge Santos. "Relação entre despesas em saúde e a saúde das populações : análise da base de dados da OCDE 2012." Master's thesis, Escola Nacional de Saúde Pública. Universidade Nova de Lisboa, 2013. http://hdl.handle.net/10362/10780.

Full text
Abstract:
RESUMO - Introdução: A despesa em saúde aumentou consideravelmente nas últimas décadas na maioria dos países industrializados. Por outro lado, os indicadores de saúde melhoraram. A evidência empírica sobre a relação entre as despesas em saúde e a saúde das populações tem sido inconclusiva. Este estudo aborda a relação entre as despesas em saúde e a saúde das populações através de dados agregados para 34 países para o período 1980-2010. Metodologia: Utilizou-se o coeficiente de correlação de Pearson para avaliar a correlação entre as variáveis explicativas e os indicadores de saúde. Procedeu-se ainda à realização de uma regressão multivariada com dados em painel para cada indicador de saúde utilizado como variável dependente: esperança de vida à nascença e aos 65 anos para mulheres e homens, anos de vida potencialmente perdidos para mulheres e homens e mortalidade infantil. A principal variável explicativa utilizada foi a despesa em saúde, mas consideraram-se também vários fatores de confundimento, nomeadamente a riqueza, fatores estilo de vida, e oferta de cuidados. Resultados: A despesa per capita tem impacto nos indicadores de saúde mas ao adicionarmos a variável PIB per capita deixa de ser estatisticamente significativa. Outros fatores têm um impacto significativo para quase todos os indicadores de saúde utilizados: consumo de álcool e tabaco, gordura, o número de médicos e a imunização, confirmando vários resultados da literatura. Conclusão: Os resultados vão ao encontro de alguns estudos que afirmam o impacto marginal das despesas em saúde e do progresso da medicina nos resultados em saúde desde os anos 80 nos países industrializados.
ABSTRACT - Introduction: Health expenditure in most industrialized countries has increased considerably in recent decades. On the other hand, health indicators have improved. However, empirical evidence on the relationship between health expenditure and the health of populations has been inconclusive. This study discusses the relationship between health expenditure and health outcomes using aggregated data for 34 countries for the period between 1980 and 2010. Methodology: Pearson's correlation coefficient has been used to evaluate the correlation between explanatory variables and health indicators. We also performed a multivariate regression with panel data for each health indicator used as the dependent variable: life expectancy at birth and at the age of 65 for females and males, male and female potential years of life lost, and infant mortality. Although the main explanatory variable used was health spending, several other confounding factors such as wealth, lifestyle factors and availability of care were also considered. Results: Although expenditure per capita has an impact on health outcomes, when we add the GDP per capita variable the former is no longer statistically significant. Other factors also have a significant impact on almost all health indicators used: alcohol and tobacco consumption, fat, the number of doctors and immunization, thus confirming several results from the literature. Conclusion: The results confirm several studies that claim a marginal impact of health expenditure and medical progress on health outcomes since the 1980s in industrialized countries.
APA, Harvard, Vancouver, ISO, and other styles
18

Johnsen, Sofia, and Sarah Felldin. "Improving Knowledge of Truck Fuel Consumption Using Data Analysis." Thesis, Linköpings universitet, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-130047.

Full text
Abstract:
The large potential of big data and how it has brought value into various industries have been established in research. Since big data has such large potential if handled and analyzed in the right way, revealing information to support decision making in an organization, this thesis is conducted as a case study at an automotive manufacturer with access to large amounts of customer usage data of their vehicles. The reason for performing an analysis of this kind of data is based on the cornerstones of Total Quality Management, with the end objective of increasing customer satisfaction with the products or services concerned. The case study includes a data analysis exploring how, and if, patterns about what affects fuel consumption can be revealed from aggregated customer usage data of trucks linked to truck applications. Based on the case study, conclusions are drawn about how a company can use this type of analysis as well as how to handle the data in order to turn it into business value. The data analysis reveals properties describing truck usage using Factor Analysis and Principal Component Analysis. One property in particular is concluded to be important, as it appears in the results of both techniques. Based on these properties the trucks are clustered using k-means and Hierarchical Clustering, which shows groups of trucks where the importance of the properties varies. Due to the homogeneity and complexity of the chosen data, the clusters of trucks cannot be linked to truck applications; this would require data that is more easily interpretable. Finally, the importance for fuel consumption in the clusters is explored using model estimation. A comparison of Principal Component Regression (PCR) and the two regularization techniques Lasso and Elastic Net is made. PCR results in poor models that are difficult to evaluate. The two regularization techniques, however, outperform PCR, both giving a higher and very similar explained variance. The three techniques do not show obvious similarities in the models, and no conclusions can therefore be drawn concerning what is important for fuel consumption. During the data analysis many problems with the data are discovered, which are linked to managerial and technical issues of big data. This means, for example, that some of the parameters of interest for the analysis cannot be used, which is likely to have contributed to the inability to get unanimous results in the model estimations. It is also concluded that the data was not originally intended for this type of analysis of large populations, but rather for testing and engineering purposes. Nevertheless, this type of data still contains valuable information and can be used if managed in the right way. From the case study it can be concluded that in order to use the data for more advanced analysis, a big-data plan is needed at a strategic level in the organization. The plan summarizes the suggested solution to the organization's managerial issues around big data. It describes how to handle the data, how the analytic models revealing the information should be designed, and the tools and organizational capabilities needed to support the people using the information.
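The pipeline described above (dimension reduction, clustering of trucks, then regularised regression per cluster) can be sketched in a few lines of Python with scikit-learn. The data, the number of properties and the model settings below are invented and only indicate the shape of such an analysis, not the thesis's actual workflow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)

# Invented aggregated usage data: one row per truck, twelve usage properties,
# and a fuel-consumption target driven by two of them.
X = rng.normal(size=(300, 12))
fuel = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=300)

X_std = StandardScaler().fit_transform(X)

# Dimension reduction, then clustering of trucks on the retained components.
components = PCA(n_components=3).fit_transform(X_std)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)

# Regularised regression per cluster to see which properties relate to fuel consumption.
for c in np.unique(clusters):
    mask = clusters == c
    enet = ElasticNetCV(cv=5).fit(X_std[mask], fuel[mask])
    print(f"cluster {c}: non-zero coefficients at columns", np.flatnonzero(enet.coef_))
```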
APA, Harvard, Vancouver, ISO, and other styles
19

Holmqvist, Oskar. "Datalagerstruktur inom psykiatrin : En analys av vårdens data på ettuniversitetssjukhus psykiatriavdelningar." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18510.

Full text
Abstract:
Denna rapports frågeställning är: hur lämpar sig psykiatrins data för en datalagerstruktur och inleds med en bakgrund där en litteraturstudie genomförs. Efter detta kommer en metod del och en genomförande del där intervjuer med avdelningschefer, verksamhetschefen, samt en verksamhetsutvecklare på ett svenskt universitetssjukhus psykiatriavdelning beskrivs. Genom observationer och analys av sjukhusets data samt svaren från respondenterna sammanställdes ett resultat som visar emot att detta sjukhus inte har en god grund för en datalagerstruktur, men genom omprioriteringar kan detta förbättras. Resultatet baserades på forskning gjord av Inmon (2005) och fyra av de kategorier som han anser är mest relevanta för en datalagerstruktur, samt insamlad data från universitetssjukhuset. Det framkom även att alla svenska sjukhus verkar ha problem med sina system och att hela svenska vården är i en uppgraderingsfas där det vid en sådan investering bör prioriteras att skapa fungerande datalagerstrukturer för att kunna analysera vilka resultat vården ger. Just nu fattas beslut i blindo och detta kommer inte förändras om inte förändring sker.
This report's question is: how well is psychiatry's data suited to a data warehouse structure? It starts with a background in which a literature study is carried out. After this follow a method part and an implementation part, which describe interviews with department heads, the operations manager and an operations developer at a Swedish university hospital's psychiatry department. Through observations and analysis of the hospital data and the respondents' answers, a result was compiled which shows that this hospital does not have a good basis for a data warehouse structure, but that this can be improved through re-prioritization. The result was based on research done by Inmon (2005) and four of the categories that he considers most relevant to a data warehouse structure, as well as data collected from the university hospital. It also emerged that all Swedish hospitals seem to have problems with their systems and that the entire Swedish healthcare system is in an upgrade phase, in which such an investment should prioritize creating functioning data warehouse structures in order to be able to analyze the results that the care delivers. Right now, decisions are being made blindly, and this will not change unless changes are made.
APA, Harvard, Vancouver, ISO, and other styles
20

Peterson, Christer. "Familjeföretag i omvandling : en studie av fusionsförlopp och utvecklingsmönster." Doctoral thesis, Umeå universitet, Företagsekonomi, 1985. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-65873.

Full text
Abstract:
In this study a population of 60 family owned businesses acquired in 1971 are analysed over a period of 15 years. The firms are followed historically for four years before and ten years after the merger. The aim is to identify dominating processes and behaviour in different variables during the period 1967-81. This will be done through the following:
- on an aggregated level, identify and analyse characteristic processes and patterns by the acquired businesses before and after the acquisition
- on an aggregated level, compare the pre- and post-merger performances
- on an individual business level, illustrate, validate and theoretically interpret results and conclusions.
Primarily this study has not a theoretical but an empirical point of departure. A working paradigm is that the "confrontation" between the firms' "external environment and internal resources" results in dynamics having an impact on the firms. The processes are classified in taxonomies/typologies, in an attempt to answer what has happened. Interpreting the forces behind the development is the attempt to answer why it has happened. The empirical data was collected through three different surveys resulting in quantitative and qualitative observations combined in different perspectives in a multimethodological approach. The first is economic data (sales, financial ratios etc) gathered from the firms' external account statements. However, several firms were found to have gone bankrupt, closed down etc. This initiated a second, follow-up study, which had a longitudinal "geography of enterprise" approach and was implemented through a telephone inquiry. The third collection is a case-study of five firms from the population carried out by discussions with representatives of the merging companies. The merged businesses turned out to be extremes compared to branch characteristics respectively. Refinements of the patterns made it possible to construct a three-dimensional typology showing four principal processes. Ten years after the merger there followed five principal spatial and institutional changes: closures, removals from the community and amalgamation with group companies, reduction to production units only, the joining of premises with group companies in the same community, and relatively "independent" affiliations. One third of the population have been closed down or removed. One half do not exist as "independent units". Only one third have escaped larger infringement. Thirty businesses have once more been acquired, some more than once. When comparing the pre- and post-merger performances, a convergence phenomenon was identified. Oscillating and deviating pre-merger trends later converged towards standard variable values and equilibrium, searching for an optimum group course. The different changes and restructuring activities conducted after the acquisitions can be summarized in three principal post-merger processes:
- liquidation and adjustment of output capacity to market demand
- reorientation through new product and market combinations
- growth and development through "multiplying by splitting" and emancipation of expansion potential.

Diss. Umeå : Univ., 1986


APA, Harvard, Vancouver, ISO, and other styles
21

Heredia, Guzman Maria Belen. "Contributions to the calibration and global sensitivity analysis of snow avalanche numerical models." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALU028.

Full text
Abstract:
Une avalanche de neige est un danger naturel défini comme une masse de neige en mouvement rapide. Depuis les années 30, les scientifiques conçoivent des modèles d'avalanche de neige pour décrire ce phénomène. Cependant, ces modèles dépendent de certains paramètres d'entrée mal connus qui ne peuvent pas être mesurés. Pour mieux comprendre les paramètres d'entrée du modèle et les sorties du modèle, les objectifs de cette thèse sont (i) de proposer un cadre pour calibrer les paramètres d'entrée et (ii) de développer des méthodes pour classer les paramètres d'entrée en fonction de leur importance dans le modèle en tenant compte de la nature fonctionnelle des sorties. Dans ce cadre, nous développons des méthodes statistiques basées sur l'inférence bayésienne et les analyses de sensibilité globale. Nos développements sont illustrés sur des cas de test et des données réelles des avalanches de neige. D'abord, nous proposons une méthode d'inférence bayésienne pour récupérer la distribution des paramètres d'entrée à partir de séries chronologiques de vitesse d'avalanche ayant été collectées sur des sites de test expérimentaux. Nos résultats montrent qu'il est important d'inclure la structure d'erreur (dans notre cas l'autocorrélation) dans la modélisation statistique afin d'éviter les biais dans l'estimation des paramètres de frottement. Deuxièmement, pour identifier les paramètres d'entrée importants, nous développons deux méthodes basées sur des mesures de sensibilité basées sur la variance. Pour la première méthode, nous supposons que nous avons un échantillon de données et nous voulons estimer les mesures de sensibilité avec cet échantillon. Dans ce but, nous développons une procédure d'estimation non paramétrique basée sur l'estimateur de Nadaraya-Watson pour estimer les indices agrégés de Sobol. Pour la deuxième méthode, nous considérons le cadre où l'échantillon est obtenu à partir de règles d'acceptation/rejet correspondant à des contraintes physiques. L'ensemble des paramètres d'entrée devient dépendant du fait de l'échantillonnage d'acceptation-rejet, nous proposons donc d'estimer les effets de Shapley agrégés (extension des effets de Shapley à des sorties multivariées ou fonctionnelles). Nous proposons également un algorithme pour construire des intervalles de confiance bootstrap. Pour l'application du modèle d'avalanche de neige, nous considérons différents scénarios d'incertitude pour modéliser les paramètres d'entrée. Dans nos scénarios, la position et le volume de départ de l'avalanche sont les entrées les plus importantes. Nos contributions peuvent aider les spécialistes des avalanches à (i) prendre en compte la structure d'erreur dans la calibration du modèle et (ii) proposer un classement des paramètres d'entrée en fonction de leur importance dans les modèles en utilisant des approches statistiques.
Snow avalanche is a natural hazard defined as a snow mass in fast motion. Since the thirties, scientists have been designing snow avalanche models to describe snow avalanches. However, these models depend on some poorly known input parameters that cannot be measured. To understand better model input parameters and model outputs, the aims of this thesis are (i) to propose a framework to calibrate input parameters and (ii) to develop methods to rank input parameters according to their importance in the model taking into account the functional nature of outputs. Within these two purposes, we develop statistical methods based on Bayesian inference and global sensitivity analyses. All the developments are illustrated on test cases and real snow avalanche data. First, we propose a Bayesian inference method to retrieve input parameter distribution from avalanche velocity time series having been collected on experimental test sites. Our results show that it is important to include the error structure (in our case the autocorrelation) in the statistical modeling in order to avoid bias for the estimation of friction parameters. Second, to identify important input parameters, we develop two methods based on variance based measures. For the first method, we suppose that we have a given data sample and we want to estimate sensitivity measures with this sample. Within this purpose, we develop a nonparametric estimation procedure based on the Nadaraya-Watson kernel smoother to estimate aggregated Sobol' indices. For the second method, we consider the setting where the sample is obtained from acceptance/rejection rules corresponding to physical constraints. The set of input parameters become dependent due to the acceptance-rejection sampling, thus we propose to estimate aggregated Shapley effects (extension of Shapley effects to multivariate or functional outputs). We also propose an algorithm to construct bootstrap confidence intervals. For the snow avalanche model application, we consider different uncertainty scenarios to model the input parameters. Under our scenarios, the release avalanche position and volume are the most crucial inputs. Our contributions should help avalanche scientists to (i) account for the error structure in model calibration and (ii) rank input parameters according to their importance in the models using statistical methods.
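As a rough sketch of the given-data, kernel-based Sobol' estimation mentioned above, the snippet below estimates first-order indices with a Nadaraya-Watson smoother on the standard Ishigami test function. The aggregated (multivariate-output) and Shapley versions developed in the thesis are not shown, and the bandwidth is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)

def nadaraya_watson(x, y, h):
    """Kernel estimate of E[Y | X_i = x_j] at the sample points (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

def first_order_sobol(x_i, y, h=0.2):
    """Given-data estimate of S_i = Var(E[Y | X_i]) / Var(Y)."""
    return nadaraya_watson(x_i, y, h).var() / y.var()

# Ishigami test function (a = 7, b = 0.1), a standard sensitivity-analysis benchmark.
n = 2000
X = rng.uniform(-np.pi, np.pi, size=(n, 3))
Y = np.sin(X[:, 0]) + 7.0 * np.sin(X[:, 1]) ** 2 + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0])

for i in range(3):
    print(f"estimated S_{i + 1} = {first_order_sobol(X[:, i], Y):.2f}")
```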
APA, Harvard, Vancouver, ISO, and other styles
22

Bengtsson, Fredrik. "Efficient aggregate queries on data cubes." Licentiate thesis, Luleå, 2004. http://epubl.luth.se/1402-1757/2004/53.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Khandelwal, Nileshkumar. "An aggregate navigator for data warehouse." Ohio : Ohio University, 2000. http://www.ohiolink.edu/etd/view.cgi?ohiou1172255887.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Badinger, Harald, and Cuaresma Jesus Crespo. "Aggregravity: estimating gravity models from aggregate data." Taylor & Francis, 2015. http://dx.doi.org/10.1080/00036846.2014.1002903.

Full text
Abstract:
This paper considers alternative methods to estimate econometric models based on bilateral data when only aggregate information on the dependent variable is available. Such methods can be used to obtain an indication of the sign and magnitude of bilateral model parameters and, more importantly, to decompose aggregate into bilateral data, which can then be used as proxy variables in further empirical analysis. We perform a Monte Carlo study and carry out a simple real world application using intra-EU trade and capital flows, showing that the methods considered work reasonably well and are worthwhile being considered in the absence of bilateral data. (authors' abstract)
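One simple way to read the idea in the abstract is as a nonlinear least-squares problem: bilateral flows follow an exponential (gravity-type) model, but only their row sums are observed. The hypothetical sketch below fits such a model and then uses the fitted bilateral values as proxies; the regressor, noise level and parameter values are invented, and this is not necessarily the estimator used by the authors.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(8)

# Invented bilateral regressor (e.g. log distance) for 30 reporters x 30 partners.
n = 30
x = rng.normal(size=(n, n))
b0_true, b1_true = 1.0, -0.8

# Only the aggregate dependent variable (the row sum of bilateral flows) is observed.
flows = np.exp(b0_true + b1_true * x)
y_agg = flows.sum(axis=1) * np.exp(rng.normal(scale=0.05, size=n))

def residuals(b):
    return np.log(np.exp(b[0] + b[1] * x).sum(axis=1)) - np.log(y_agg)

fit = least_squares(residuals, x0=np.zeros(2))
b0_hat, b1_hat = fit.x
print("estimated parameters:", round(b0_hat, 2), round(b1_hat, 2))

# The fitted bilateral values can then serve as proxies for the unobserved flows.
bilateral_proxy = np.exp(b0_hat + b1_hat * x)
```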
APA, Harvard, Vancouver, ISO, and other styles
25

Badinger, Harald, and Cuaresma Jesus Crespo. "Aggregravity: Estimating Gravity Models from Aggregate Data." WU Vienna University of Economics and Business, 2014. http://epub.wu.ac.at/4295/1/wp183.pdf.

Full text
Abstract:
This paper considers alternative methods to estimate econometric models based on bilateral data when only aggregate information on the dependent variable is available. Such methods can be used to obtain an indication of the sign and magnitude of bilateral model parameters and, more importantly, to decompose aggregate into bilateral data, which can then be used as proxy variables in further empirical analysis. We perform a Monte Carlo study and carry out a simple real world application using intra-EU trade and capital flows, showing that the methods considered work reasonably well and are worthwhile being considered in the absence of bilateral data. (authors' abstract)
Series: Department of Economics Working Paper Series
APA, Harvard, Vancouver, ISO, and other styles
26

Vasconcellos, Klaus Leite Pinto. "Aspects of forecasting aggregate and discrete data." Thesis, University of Warwick, 1992. http://wrap.warwick.ac.uk/59616/.

Full text
Abstract:
This work studies three related topics arising from the problem of forecasting airline passenger bookings. The first topic concerns the initialization through the starting prior for a DLM (Dynamic Linear Model) or Generalized DLM. An approach is given which uses the first observations of the series much more efficiently than that suggested by Pole and West. Proper marginal priors are derived for stationary model components and proper marginal priors may be obtained for parameter subspaces and used for forecasting within that subspace well before a full proper prior is available. The second topic proposes a model to forecast the number of people booking tickets for particular flights. The model is more realistic than those which are classically used, since it is a dynamic model and acknowledges discrete distributions. The basic idea is given by the Dynamic Generalized Linear Model and a key feature is given by the gamma to log-normal approximation that is developed. The third topic consists of a study of temporal aggregation of a process that can be represented by a DLM. We give representation results for the simplest univariate cases, reveal some surprising phenomena, such as drastic model simplification with aggregation, and discuss some advantages and disadvantages of using the aggregated observations, depending on the forecasting objectives, as well as the importance of aggregation in our particular booking problem.
APA, Harvard, Vancouver, ISO, and other styles
27

Chen, Jianzhong. "Probabilistic relational data mining with aggregates and hierarchies." Thesis, University of Ulster, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.407761.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Mitchell, Richard James Lamacraft. "An integration of aggregate and disaggregate census data." Thesis, University of Southampton, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.242866.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Páircéir, Rónán. "Knowledge discovery from distributed aggregate data in data warehouses and statistical databases." Thesis, University of Ulster, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.274398.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Yang, Shuxiang (Computer Science & Engineering, Faculty of Engineering, UNSW). "Probabilistic threshold range aggregate query processing over uncertain data." Publisher: University of New South Wales, Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/43374.

Full text
Abstract:
Uncertainty is inherent in many novel and important applications such as market surveillance, information extraction, sensor data analysis, etc. In recent decades, uncertain data has attracted considerable research attention. There are various factors that cause the uncertainty, for instance randomness or incompleteness of data, limitations of equipment, and delay or loss in data transfer. A probabilistic threshold range aggregate (PRTA) query retrieves summarized information about the uncertain objects in the database satisfying a range query, with respect to a given probability threshold. This thesis addresses this important type of query, which no previous work has studied. We formulate the problem in both discrete and continuous uncertain data models and develop a novel index structure, the asU-tree (aggregate-based sampling-auxiliary U-tree), which not only supports exact query answering but also provides approximate results with an accuracy guarantee if efficiency is the greater concern. The new asU-tree structure is fully dynamic. Query processing algorithms for both exact and approximate answers based on this new index structure are also proposed. An extensive experimental study shows that the asU-tree is very efficient and effective over real and synthetic datasets.
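To make the query semantics concrete, the following is a hedged sketch of a probabilistic threshold range count under a discrete uncertain-object model; it scans objects naively instead of using the asU-tree index described in the thesis, and all object data are invented.

```python
# Each uncertain object is a list of (location, probability) instances summing to <= 1.
objects = {
    "o1": [(2.0, 0.5), (7.5, 0.5)],
    "o2": [(4.2, 0.9), (9.0, 0.1)],
    "o3": [(12.0, 1.0)],
}

def prta_count(objects, lo, hi, threshold):
    """Count the objects whose probability of falling inside [lo, hi]
    is at least the given threshold (discrete uncertain data model)."""
    count = 0
    for instances in objects.values():
        p_in_range = sum(p for loc, p in instances if lo <= loc <= hi)
        if p_in_range >= threshold:
            count += 1
    return count

print(prta_count(objects, lo=0.0, hi=8.0, threshold=0.8))  # -> 2 (o1 and o2 qualify)
```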
APA, Harvard, Vancouver, ISO, and other styles
31

French, Benjamin. "Analysis of aggregate longitudinal data with time-dependent exposure /." Thesis, Connect to this title online; UW restricted, 2008. http://hdl.handle.net/1773/9569.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Xu, Bojian. "A study of time-decayed aggregates computation on data streams." [Ames, Iowa : Iowa State University], 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3389163.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Zhang, Shuai. "Learning from semantically heterogeneous aggregate data in a distributed environment." Thesis, University of Ulster, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.551564.

Full text
Abstract:
Information cooperation, reuse and integration can be developed on the platform of rapidly growing open distributed environments and can support development of Ambient Intelligence. However, in such environments, information may be only partially observed due to the unreliability of data collection technologies and heterogeneity in the ontologies employed caused by distributed and independent system development. These challenges need to be overcome to facilitate intelligent data analysis. We focus on the use of large-scale databases such as statistical databases and data warehouses, where aggregates can be obtained to summarise information; such aggregates are valuable in providing efficient access, computation and communication. A principle-based learning framework is proposed and developed for semantically heterogeneous aggregate data using maximum likelihood techniques via the EM (Expectation-Maximisation) algorithm. The learning framework inherently handles data incompleteness and schema heterogeneity from unreliable, incomplete or uncertain information sources. The framework is developed for supervised and unsupervised learning from data in a distributed environment. This development is demonstrated using two scenarios. In the first scenario a decision-making mechanism is proposed to support assistive living for elderly people in a smart home environment. The mechanism incorporates modules for learning inhabitants' activities of daily living based on partially observed and unlabelled data, enabling hierarchical activity prediction and assisting inhabitants in completing activities by providing personalised reminders. Real data have been collected in a smart kitchen laboratory, and realistic synthetic data are also used for evaluation. Results show consistent and robust performance, and other information and insights are also obtained. In the second scenario a model-based clustering algorithm is proposed for independently developed distributed heterogeneous databases to support cooperation between organisations, including distributed smart homes from different institutions. Clustering in the presence of data heterogeneity enables the characteristics of similar contexts to be captured. The algorithm is systematically evaluated using simulated data, with encouraging results and good scalability to large numbers of databases.
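As a rough illustration of EM-based learning from aggregate counts, the toy sketch below estimates category proportions when some records are observed only at an aggregated level; the data, the grouping, and the function name em_proportions are assumptions for illustration, not the framework described in the thesis.

```python
def em_proportions(fine_counts, coarse_counts, groups, iters=50):
    """Estimate category proportions when some counts are only observed
    as aggregates over groups of categories (toy EM illustration).

    fine_counts:   {category: count} fully observed counts
    coarse_counts: {group_name: count} counts observed only at group level
    groups:        {group_name: [categories in that group]}
    """
    cats = list(fine_counts)
    total = sum(fine_counts.values()) + sum(coarse_counts.values())
    # Start from the fully observed proportions (floor of 1 to avoid zeros).
    p = {c: max(fine_counts[c], 1) for c in cats}
    norm = sum(p.values())
    p = {c: v / norm for c, v in p.items()}
    for _ in range(iters):
        # E-step: split each aggregated count across its categories.
        expected = dict(fine_counts)
        for g, n in coarse_counts.items():
            mass = sum(p[c] for c in groups[g])
            for c in groups[g]:
                expected[c] += n * p[c] / mass
        # M-step: re-estimate proportions from the completed counts.
        p = {c: expected[c] / total for c in cats}
    return p

print(em_proportions({"tea": 30, "coffee": 50, "juice": 20},
                     {"hot_drinks": 40}, {"hot_drinks": ["tea", "coffee"]}))
```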
APA, Harvard, Vancouver, ISO, and other styles
34

Dodds, Gordon Ivan. "Modelling and forecasting electricity demand using aggregate and disaggregate data." Thesis, Queen's University Belfast, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.306073.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Udell, Michael Alan. "Essays in applied economics : new techniques in aggregate data analysis." Diss., Pasadena, Calif. : California Institute of Technology, 1995. http://resolver.caltech.edu/CaltechETD:etd-10262007-105829.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Holzhauer, Björn [Verfasser]. "Meta-analysis of aggregate data on medical events / Björn Holzhauer." Magdeburg : Universitätsbibliothek, 2017. http://d-nb.info/1149124334/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Bhat, Amit. "Low-latency Estimates for Window-Aggregate Queries over Data Streams." PDXScholar, 2011. https://pdxscholar.library.pdx.edu/open_access_etds/161.

Full text
Abstract:
Obtaining low-latency results from window-aggregate queries can be critical to certain data-stream processing applications. Due to a DSMS's lack of control over incoming data (typically, because of delays and bursts in data arrival), timely results for a window-aggregate query over a data stream cannot be obtained with guarantees about the results' accuracy. In this thesis, I propose a technique, which I term prodding, to obtain early result estimates for window-aggregate queries over data streams. The early estimates are obtained in addition to the regular query results. The proposed technique aims to maximize the contribution to a result-estimate computation from all the stateful operators across a multi-level query plan. I evaluate the benefits of prodding using real-world and generated data streams having different patterns in data arrival and data values. I conclude that, in various DSMS applications, prodding can generate low-latency estimates to window-aggregate query results. The main factors affecting the degree of inaccuracy in such estimates are: the aggregate function used in a query, the patterns in arrivals and values of stream data, and the aggressiveness of demanding the estimates. The utility of the estimates obtained using prodding should be optimized by tuning the aggressiveness in result-estimate demands to the specific latency and accuracy needs of a business, considering any available knowledge about patterns in the incoming data.
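A hedged sketch of the general idea of emitting an early estimate for a window aggregate alongside the exact result is given below; the simple scale-up heuristic and the class name WindowSumWithEstimate are illustrative assumptions, not the prodding mechanism itself.

```python
class WindowSumWithEstimate:
    """Tumbling-window sum that can be asked for an early estimate
    before the window closes (toy illustration)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.partial_sum = 0.0
        self.seen = 0

    def insert(self, value):
        self.partial_sum += value
        self.seen += 1
        if self.seen == self.window_size:          # window closes: emit exact result
            result, self.partial_sum, self.seen = self.partial_sum, 0.0, 0
            return result
        return None

    def estimate(self):
        """Early estimate: scale the partial aggregate up to a full window,
        assuming the remaining tuples resemble the ones already seen."""
        if self.seen == 0:
            return 0.0
        return self.partial_sum * self.window_size / self.seen

agg = WindowSumWithEstimate(window_size=4)
for v in [10, 12, 11]:
    agg.insert(v)
print(agg.estimate())   # 33 * 4 / 3 = 44.0, an early guess at the window sum
```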
APA, Harvard, Vancouver, ISO, and other styles
38

Amara, Joseph George. "Computer Pre-Analysis Of Aggregate Data In High School Science Laboratories." NSUWorks, 1992. http://nsuworks.nova.edu/gscis_etd/387.

Full text
Abstract:
High school students have had difficulty analyzing data collected in laboratory experiments. This problem has been well-documented in the body of literature, and many suggested remedies have been proposed regarding the use of computers. The author attempted to demonstrate that an increase occurs in student data analysis skills when using computers. The hypothesis offered in this dissertation declared that students exposed to the computerized pre-analysis system will increase their abilities to analyze laboratory data into a conclusion better than students who are not exposed to the system. The system suggested placing three micro-computers in the physics, chemistry, and general science laboratories and equip the devices with simple spreadsheets and graphing software. The author devised spreadsheets for 15 experiments commonly used in high school science. The spreadsheet screens and equations are displayed in the appendices of the dissertation. Quizzes were also devised that place data on graphs or in problems similar to those found in each experiment three class meetings following the laboratory experience, each student took the quiz. Following completion of the project the students again took the Arlin Test of Formal Reasoning (ATFR). The study was conducted during a 12 week period of the 1991-1992 school year. The author used student produced laboratory reports, quiz scores, and the pre and post application of the Arlin Test of Formal Reasoning as analysis tools. A t test of significance was performed on these tools. The results determined that students in the classes using the computer were affected by the project. Specific reasoning skills were improved according to the test data. The program developed in this dissertation could serve as a seed in science education. The program contained a study of five experiments in each of three disciplines of science. The expansion to a complete set of experiments used in those courses could increase the benefit to the students involved. The expansion of this concept to other realms of science such as Earth Science, Astronomy, and Life Science could also enhance student skills. The concepts put forward in this dissertation could be expanded to other disciplines. Use of this project could be expanded to social studies or other subjects. The examination of population data, production, economics, and any data of a statistical nature could be studied by students in a laboratory setting such as described in this program.
APA, Harvard, Vancouver, ISO, and other styles
39

Zhu, Wei. "Using Accounting Data to Predict Firm-level and Aggregate Stock Returns." Thesis, Yale University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3578482.

Full text
Abstract:

This dissertation consists of three essays studying the role of accounting data in predicting distributions of stock returns. In the first essay, I explore the ability of accruals to predict future price (earnings) crashes and jumps, representing extreme negative and positive observations in the distribution of firm-level weekly returns (changes in quarterly ROA). I find that high (low) accruals predict a higher probability of price and earnings crashes (jumps) than medium accruals. In the second essay, I re-examine the ability of asset turnover growth, which reflects growth in both assets and sales, to predict future stock returns. While the prevailing view is that this relation is due to the spread between sales and asset growth, my results suggest it is driven mainly by the asset growth component. I do, however, find that this spread is positively related to future returns for a subsample of firms that did not make significant acquisitions or divestitures. In the third essay, I re-examine the puzzling negative correlation between aggregate stock returns and aggregate earnings at the quarterly level. I find that the negative aggregate returns-earnings correlation is unstable and the negative correlation for the period of 1976-2000 is mainly caused by the negative correlation between aggregate earnings and discount rate news.

APA, Harvard, Vancouver, ISO, and other styles
40

Villalba, Navarro Álvaro. "Scalable processing of aggregate functions for data streams in resource-constrained environments." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/667476.

Full text
Abstract:
The fast evolution of data analytics platforms has resulted in an increasing demand for real-time data stream processing. From Internet of Things applications to the monitoring of telemetry generated in large datacenters, a common demand for currently emerging scenarios is the need to process vast amounts of data with low latencies, generally performing the analysis process as close to the data source as possible. Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that is used to store and process the streams. Processing can be done in real time, with transformations and enrichment happening on-the-fly, but it can also happen after data is stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter, batch analytics and queries are of common use. Stream processing platforms are required to be malleable and absorb spikes generated by fluctuations of data generation rates. Data is usually produced as time series that have to be aggregated using multiple operators, sliding windows being one of the most common abstractions used to process data in real time. To satisfy the above-mentioned demands, efficient stream processing techniques that aggregate data with minimal computational cost need to be developed. However, data analytics might require aggregating extensive windows of data. Approximate computing has been a central paradigm for decades in data analytics in order to improve performance and reduce the needed resources, such as memory, computation time, bandwidth or energy. In exchange for these improvements, the aggregated results suffer from a level of inaccuracy that in some cases can be predicted and constrained. This doctoral thesis aims to demonstrate that it is possible to have constant-time and memory-efficient aggregation functions with approximate computing mechanisms for constrained environments. In order to achieve this goal, the work has been structured in three research challenges. First we introduce a runtime to dynamically construct data stream processing topologies based on user-supplied code. These dynamic topologies are built on-the-fly using a data subscription model defined by the applications that consume data. The subscription-based programming model enables multiple users to deploy their own data-processing services. On top of this runtime, we present the Amortized Monoid Tree Aggregator (AMTA) general sliding window aggregation framework, which seamlessly combines the following features: amortized O(1) time complexity and a worst case of O(log n) between insertions; it provides both a window aggregation mechanism and a window slide policy that are user programmable; the enforcement of the window sliding policy exhibits amortized O(1) computational cost for single evictions and supports bulk evictions with cost O(log n); and it requires a local memory space of O(log n). The framework can compute aggregations over multiple data dimensions, and has been designed to support decoupling computation and data storage through the use of distributed Key-Value Stores to keep window elements and partial aggregations. Especially motivated by edge computing scenarios, we contribute the Approximate and Amortized Monoid Tree Aggregator (A2MTA). It is, to our knowledge, the first general purpose programmable sliding window framework that combines constant-time aggregations with error-bounded approximate computing techniques. A2MTA uses statistical analysis of the stream data in order to perform inaccurate aggregations, providing a critical reduction of needed resources for massive stream data aggregation, and an improvement of performance.
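The monoid-based sliding-window aggregation idea that AMTA generalizes can be sketched with the classic two-stack technique, which also achieves amortized O(1) operations; this is a generic illustration under that assumption, not the authors' tree structure or key-value-store integration.

```python
class SlidingWindowAggregator:
    """Sliding-window aggregation for any associative operation with an
    identity (a monoid), using two stacks for amortized O(1) insert/evict/query."""

    def __init__(self, op, identity):
        self.op, self.identity = op, identity
        self.front = []   # (value, running aggregate) pairs; evictions pop from here
        self.back = []    # raw values; insertions append here
        self.back_agg = identity

    def insert(self, value):
        self.back.append(value)
        self.back_agg = self.op(self.back_agg, value)

    def evict(self):
        if not self.front:
            # Flip the back stack, recomputing running aggregates as we go,
            # so that the oldest element ends up on top of the front stack.
            agg = self.identity
            while self.back:
                v = self.back.pop()
                agg = self.op(v, agg)
                self.front.append((v, agg))
            self.back_agg = self.identity
        self.front.pop()

    def query(self):
        front_agg = self.front[-1][1] if self.front else self.identity
        return self.op(front_agg, self.back_agg)

w = SlidingWindowAggregator(op=max, identity=float("-inf"))
for v in [3, 9, 2, 7]:
    w.insert(v)
w.evict()          # window now holds [9, 2, 7]
print(w.query())   # -> 9
```

The same structure works for any associative operation with an identity element, such as sum, max or count, which is what makes the monoid abstraction attractive for window aggregation.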
APA, Harvard, Vancouver, ISO, and other styles
41

Guthrie, Katherine Adams. "A hierarchical aggregate data model with allowance for spatially correlated disease rates /." Thesis, Connect to this title online; UW restricted, 2001. http://hdl.handle.net/1773/9572.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Choo, Sangho. "Aggregate relationships between telecommunications and travel : structural equation modeling of time series data /." For electronic version search Digital dissertations database. Restricted to UC campuses. Access is free to UC campus dissertations, 2004. http://uclibs.org/PID/11984.

Full text
Abstract:
Thesis (Ph.D. in Civil and Environmental Engineering)--University of California, Davis, 2004.
Cover title. Computer-produced typeface. Includes bibliographical references (p. 149-161). Also available via the World Wide Web. (Restricted to UC campuses)
APA, Harvard, Vancouver, ISO, and other styles
43

Margraf, C. "On the use of micro models for claims reversing based on aggregate data." Thesis, City, University of London, 2017. http://openaccess.city.ac.uk/17908/.

Full text
Abstract:
In most developed economies, the insurance sector earns premiums that amount to around eight percent of their GNP. In order to protect both the financial market and the real economy, this results in strict regulations, such as the Solvency II Directive, which has monitored the EU insurance sector since early 2016. The largest item on general insurers' balance sheets is often liabilities, which consist of future costs for reported claims that have not yet been settled, as well as incurred claims that have not yet been reported. The best estimate of these liabilities, the so-called reserve, is addressed in Article 77 of the Solvency II Directive. However, the guidelines in this article are quite vague, so it is not surprising that modern statistics has not been used to a great extent in the reserving departments of insurance companies. This thesis aims to combine some theoretical results with the practical world of claims reserving. All results are motivated by the chain ladder method, and provide different reserving methods that are introduced throughout four separate papers. The first two papers show how claim estimates can be embedded into a full statistical reserving model based on the double chain ladder method. The new methods introduced incorporate available incurred data into the outstanding liability cash flow model. In the third paper a new Bornhuetter-Ferguson method is suggested that enables the actuary to adjust the relative ultimates. Adjusted cash flow estimates are obtained as constrained maximum likelihood estimates. The last paper addresses how to consider reserving issues when there is excess-of-loss reinsurance. It provides a practical example as well as an alternative approach using recent developments in stochastic claims reserving.
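For context, a minimal sketch of the classic chain ladder method that the thesis builds on is shown below, applied to an invented cumulative run-off triangle; it covers only the textbook calculation, not the double chain ladder or Bornhuetter-Ferguson extensions developed in the thesis.

```python
import numpy as np

# Hypothetical cumulative claims triangle: rows = accident years, cols = development years.
# NaN marks cells that have not yet been observed.
triangle = np.array([
    [100.0, 160.0, 180.0],
    [110.0, 170.0, np.nan],
    [120.0, np.nan, np.nan],
])

def chain_ladder_reserve(tri):
    """Classic chain ladder: volume-weighted development factors,
    projection to ultimate, reserve = ultimate minus latest observed diagonal."""
    n = tri.shape[1]
    factors = []
    for j in range(n - 1):
        mask = ~np.isnan(tri[:, j + 1])
        factors.append(tri[mask, j + 1].sum() / tri[mask, j].sum())
    ultimates, reserves = [], []
    for row in tri:
        last_obs = np.where(~np.isnan(row))[0].max()
        ult = row[last_obs]
        for j in range(last_obs, n - 1):
            ult *= factors[j]
        ultimates.append(ult)
        reserves.append(ult - row[last_obs])
    return factors, ultimates, sum(reserves)

print(chain_ladder_reserve(triangle))
```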
APA, Harvard, Vancouver, ISO, and other styles
44

Zhang, Chian-fan. "Applying spatial theory to new democracies : a model for analyzing aggregate election data /." Digital version accessible at:, 1999. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Malchik, Alexander 1975. "An aggregator tool for extraction and collection of data from web pages." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/86522.

Full text
Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.
Includes bibliographical references (p. 54-56).
by Alexander Malchik.
M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
46

Xia, Wei. "Molecular Structural and Electrical Characterization of Rodlike Aggregates of Discotic Phthalocyanines." Diss., Tucson, Arizona : University of Arizona, 2005. http://etd.library.arizona.edu/etd/GetFileServlet?file=file:///data1/pdf/etd/azu%5Fetd%5F1097%5F1%5Fm.pdf&type=application/pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Homem, Irvin. "LEIA: The Live Evidence Information Aggregator : A Scalable Distributed Hypervisor‐based Peer‐2‐Peer Aggregator of Information for Cyber‐Law Enforcement I." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177902.

Full text
Abstract:
The Internet in its most basic form is a complex information sharing organism. There are billions of interconnected elements with varying capabilities that work together supporting numerous activities (services) through this information sharing. In recent times, these elements have become portable, mobile, highly computationally capable and more than ever intertwined with human controllers and their activities. They are also rapidly being embedded into other everyday objects and sharing more and more information in order to facilitate automation, signaling that the rise of the Internet of Things is imminent. In every human society there are always miscreants who prefer to drive against the common good and engage in illicit activity. It is no different within the society interconnected by the Internet (The Internet Society). Law enforcement in every society attempts to curb perpetrators of such activities. However, it is immensely difficult when the Internet is the playing field. The amount of information that investigators must sift through is incredibly massive and prosecution timelines stated by law are prohibitively narrow. The main solution towards this Big Data problem is seen to be the automation of the Digital Investigation process. This encompasses the entire process: From the detection of malevolent activity, seizure/collection of evidence, analysis of the evidentiary data collected and finally to the presentation of valid postulates. This paper focuses mainly on the automation of the evidence capture process in an Internet of Things environment. However, in order to comprehensively achieve this, the subsequent and consequent procedures of detection of malevolent activity and analysis of the evidentiary data collected, respectively, are also touched upon. To this effect we propose the Live Evidence Information Aggregator (LEIA) architecture that aims to be a comprehensive automated digital investigation tool. LEIA is in essence a collaborative framework that hinges upon interactivity and sharing of resources and information among participating devices in order to achieve the necessary efficiency in data collection in the event of a security incident. Its ingenuity makes use of a variety of technologies to achieve its goals. This is seen in the use of crowdsourcing among devices in order to achieve more accurate malicious event detection; Hypervisors with inbuilt intrusion detection capabilities to facilitate efficient data capture; Peer to Peer networks to facilitate rapid transfer of evidentiary data to a centralized data store; Cloud Storage to facilitate storage of massive amounts of data; and the Resource Description Framework from Semantic Web Technologies to facilitate the interoperability of data storage formats among the heterogeneous devices. Within the description of the LEIA architecture, a peer to peer protocol based on the Bittorrent protocol is proposed, corresponding data storage and transfer formats are developed, and network security protocols are also taken into consideration. In order to demonstrate the LEIA architecture developed in this study, a small scale prototype with limited capabilities has been built and tested. The prototype functionality focuses only on the secure, remote acquisition of the hard disk of an embedded Linux device over the Internet and its subsequent storage on a cloud infrastructure. 
The successful implementation of this prototype goes to show that the architecture is feasible and that the automation of the evidence seizure process makes the otherwise arduous process easy and quick to perform.
APA, Harvard, Vancouver, ISO, and other styles
48

Van, Eenoo Edward Charles Jr. "Theoretically Valid Aggregates in the Absence of Homothetic Preferences, Separable Utility, and Complete Price Data." Thesis, Virginia Tech, 1998. http://hdl.handle.net/10919/9782.

Full text
Abstract:
The improper aggregation of commodities can have important consequences when estimating a system of group demand equations. Generally, aggregates are created under the assumptions that intra-group preferences are homothetic and the consumer's utility function is weakly separable over some partition. These assumptions place severe restrictions on the model that can significantly impact parameter and elasticity estimates. An alternative to imposing weak separability is to employ the Generalized Composite Commodity Theorem, which requires the relative intra-group commodity prices to be independent of the group price index. This study compares the results of estimating a demand system for composite beef, pork, and poultry products under the assumptions of weak separability and the Generalized Composite Commodity Theorem. Another important issue related to aggregation is the specification of an appropriate group price index. Price indices consistent with linear homogeneous preferences (a subset of the homothetic class of preferences) and non-homothetic intra-group preferences are identified and it is shown that several of the commonly employed indices are biased in the absence of complete price data.
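The Generalized Composite Commodity Theorem requires relative intra-group prices to be independent of the group price index; this can be probed informally with a correlation check such as the hedged sketch below, where the simulated price series and the function name gcct_correlation_check are illustrative assumptions rather than the study's actual testing procedure.

```python
import numpy as np

def gcct_correlation_check(prices, group_index):
    """Report the correlation between each log relative price
    (log p_i - log P_group) and the log group price index; values near
    zero are consistent with the GCCT independence condition."""
    log_index = np.log(group_index)
    report = {}
    for name, p in prices.items():
        rel = np.log(p) - log_index
        report[name] = float(np.corrcoef(rel, log_index)[0, 1])
    return report

# Hypothetical monthly prices for two beef products and a meat-group price index.
rng = np.random.default_rng(0)
index = np.exp(np.cumsum(rng.normal(0, 0.02, 60)) + 1.0)
prices = {"ground_beef": index * np.exp(rng.normal(0, 0.05, 60)),
          "steak": index * np.exp(rng.normal(0, 0.05, 60))}
print(gcct_correlation_check(prices, index))
```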
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
49

Bergmark, Fabian. "Online aggregate tables : A method forimplementing big data analysis in PostgreSQLusing real time pre-calculations." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-207808.

Full text
Abstract:
In modern user-centric applications, data gathering and analysis is often of vital importance. Current trends in data management software show that traditional relational databases fail to keep up with the growing data sets. Outsourcing data analysis often means data is locked in with a particular service, making transitions between analysis systems nearly impossible. This thesis implements and evaluates a data analysis framework implemented completely within a relational database. The framework provides a structure for implementations of online algorithms of analytical methods to store precomputed results. The result is an even resource utilization with predictable performance that does not decrease over time. The system keeps all raw data gathered to allow for future exportation. A full implementation of the framework is tested based on the current analysis requirements of the company Shortcut Labs, and performance measurements show no problem with managing data sets of over a billion data points.
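The core idea of maintaining pre-calculated aggregate rows as events arrive, so that analytical queries read precomputed results instead of scanning raw data, can be sketched as follows; the bucket structure and names such as OnlineAggregateTable are hypothetical and this is not the thesis's PostgreSQL implementation.

```python
from collections import defaultdict

class OnlineAggregateTable:
    """Keeps raw events plus incrementally maintained (bucket, metric) aggregates,
    so that analytical queries read precomputed rows instead of scanning raw data."""

    def __init__(self):
        self.raw_events = []                                   # kept for future export
        self.aggregates = defaultdict(lambda: {"count": 0, "sum": 0.0})

    def ingest(self, day, metric, value):
        self.raw_events.append((day, metric, value))
        bucket = self.aggregates[(day, metric)]
        bucket["count"] += 1                                   # online pre-calculation
        bucket["sum"] += value

    def average(self, day, metric):
        bucket = self.aggregates[(day, metric)]
        return bucket["sum"] / bucket["count"] if bucket["count"] else None

table = OnlineAggregateTable()
for value in (120.0, 80.0, 100.0):
    table.ingest("2017-05-01", "session_length", value)
print(table.average("2017-05-01", "session_length"))   # -> 100.0
```

In a relational database the same pattern would typically be realized with an aggregate table updated on ingest, which is the approach the thesis evaluates.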
APA, Harvard, Vancouver, ISO, and other styles
50

Wanders, Anne-Christine. "The simulation of small-area migrant populations through integration of aggregate and disaggregate data sources." Thesis, University of Southampton, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.302029.

Full text
APA, Harvard, Vancouver, ISO, and other styles