Dissertations / Theses: 'Data representation'

1

Chintala, Venkatram Reddy. "Digital image data representation." Ohio : Ohio University, 1986. http://www.ohiolink.edu/etd/view.cgi?ohiou1183128563.

2

Lansley, Guy David. "Big data : geodemographics and representation." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10045119/.

Abstract:

Due to the harmonisation of data collection procedures with everyday activities, Big Data can be harnessed to produce geodemographic representations that supplement or even replace traditional sources of population data, which suffer from low response rates or intermittent refreshes. Furthermore, the velocity and diversity of new forms of data also enable the creation of entirely new forms of geodemographic insight. However, their miscellaneous data collection procedures are inconsistent, unregulated and not robustly sampled like conventional social science data sources. Therefore, uncertainty is inherent when attempting to glean representative research on the population at large from Big Data. All data are of partial coverage; however, the provenance of Big Data is poorly understood. Consequently, the use of such data has epistemologically shifted how geographers build representations of the population. In repurposing Big Data, researchers might encounter a variety of data types that are not readily suitable for quantitative analysis and may represent geodemographic phenomena indirectly. Furthermore, whilst there are considerable barriers to acquiring data pertaining to people and their actions, it is also challenging to link Big Data. In light of this, this work explores the fundamental challenges of using geospatial Big Data to represent the population and their activities across space and time. These challenges are demonstrated through original research on various big datasets, including Consumer Registers (which comprise public versions of the Electoral Register and consumer data), Driver and Vehicle Licensing Agency (DVLA) car registration data, and geotagged Twitter posts. While this thesis is critical of Big Data, it remains optimistic about their potential value and demonstrates techniques through which uncertainty can be identified or mitigated to an extent.
In the process, it also exemplifies how new forms of data can produce geodemographic insight that was previously unobservable on a large scale.

3

Dos Santos, Ludovic. "Representation learning for relational data." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066480/document.

Abstract:

The increasing use of social and sensor networks generates a large quantity of data that can be represented as complex graphs. One can imagine many tasks on such data, from information analysis to prediction and retrieval, where the relations between graph nodes should be informative. In this thesis, we proposed models for three different tasks: (i) graph node classification, (ii) relational time series forecasting, and (iii) collaborative filtering. All the proposed models use the representation learning framework in its deterministic or Gaussian variant. First, we proposed two algorithms for the heterogeneous graph labeling task, one using deterministic representations and the other Gaussian representations. Unlike other state-of-the-art models, our solution is able to learn edge weights while simultaneously learning the representations and the classifiers. Second, we proposed an algorithm for relational time series forecasting in which the observations are correlated not only within each series but also across the different series. We use Gaussian representations in this contribution, which was an opportunity to see in what ways using Gaussian representations instead of deterministic ones was beneficial. Finally, we apply the Gaussian representation learning approach to the collaborative filtering task. This is preliminary work to see whether the properties of Gaussian representations found on the two previous tasks also hold for ranking. The goal was then to generalize the approach to more relational data, not only bipartite graphs between users and items.
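
The abstract does not say how Gaussian representations are compared. One common choice in Gaussian embedding work, assumed here purely for illustration, is to give each node a mean vector and a diagonal variance vector and to compare nodes with the Kullback-Leibler (KL) divergence. The function name `kl_diag_gauss` and the example embeddings are hypothetical; this is a sketch, not the thesis's model.

```python
import math
from typing import Sequence


def kl_diag_gauss(mu_p: Sequence[float], var_p: Sequence[float],
                  mu_q: Sequence[float], var_q: Sequence[float]) -> float:
    """KL(P || Q) for Gaussians P, Q with diagonal covariance matrices.

    Each (hypothetical) node embedding is a pair (mean vector, variance vector).
    """
    if not (len(mu_p) == len(var_p) == len(mu_q) == len(var_q)):
        raise ValueError("all vectors must have the same dimension")
    if any(v <= 0 for v in list(var_p) + list(var_q)):
        raise ValueError("variances must be positive")
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Closed form per dimension: 0.5 * (vp/vq + (mq - mp)^2 / vq - 1 + ln(vq/vp))
        total += 0.5 * (vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp))
    return total


# Two made-up 2-D node embeddings with equal means but different variances.
confident = ([0.0, 1.0], [0.1, 0.1])
diffuse = ([0.0, 1.0], [1.0, 1.0])
print(kl_diag_gauss(*confident, *diffuse))
print(kl_diag_gauss(*diffuse, *confident))
```

Unlike a distance between deterministic vectors, this score is asymmetric and depends on the variances as well as the means, so two nodes with identical means can still be far apart when their uncertainties differ.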

4

Dos Santos, Ludovic. "Representation learning for relational data." Electronic Thesis or Diss., Paris 6, 2017. http://www.theses.fr/2017PA066480.

5

Penton, Dave. "Linguistic data models : presentation and representation." Thesis, 2006. http://eprints.unimelb.edu.au/archive/00002875.

6

Sanches, Pedro. "Health Data : Representation and (In)visibility." Doctoral thesis, KTH, Programvaruteknik och Datorsystem, SCS, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-158909.

Abstract:

Health data requires context to be understood. I show how by examining two areas: self-surveillance, with a focus on the representation of bodily data, and mass surveillance, with a focus on representing populations. I critically explore how Information and Communication Technology (ICT) can be made to represent individuals and populations, and identify the implications of such representations. My contributions are: (i) the design of a self-tracking stress management system, (ii) the design of a mass-surveillance system based on mobile phone data, (iii) an empirical study exploring how users of a fitness tracker make sense of their generated data, (iv) an analysis of the discourse of designers of a syndrome surveillance system, (v) a critical analysis of the design process of a mass-surveillance system, and (vi) an analysis of the historicity of the concepts and decisions taken during the design of a stress management system. I show that the production of health data, and in turn the technological characteristics of the algorithms that produce it, depend on factors present in the ICT design process. These factors determine how data is made to represent individuals and populations in ways that may selectively render invisible parts of the population, determinants of health, or individuals' conceptions of self and wellbeing. In addition, I show that the work of producing data does not stop with the work of the engineers who build ICT-based systems: maintenance is constantly required.

7

Parvathala, Rajeev (Rajeev Krishna). "Representation learning for non-sequential data." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119581.

Abstract:

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 85-90).
In this thesis, we design and implement new models to learn representations for sets and graphs. Typically, data collections in machine learning problems are structured as arrays or sequences, with sequential relationships between successive elements. Sets and graphs both break this common mold of data collections that have been extensively studied in the machine learning community. First, we formulate a new method for performing diverse subset selection using neural set function approximation. This method relies on the deep sets idea, which says that any set function $s(X)$ has a universal approximator of the form $f\left(\sum_{x \in X} \phi(x)\right)$. Second, we design a new variational autoencoding model for highly structured, sparse graphs, such as chemical molecules. This method uses the graphon, a random graph model from mathematics, as inspiration for the decoder. Furthermore, an adversary is employed to force the distribution of vertex encodings to follow a target distribution, so that new graphs can be generated by sampling from this target distribution. Finally, we develop a new framework for encoding graphs hierarchically. This approach partitions an input graph into multiple connected subgraphs and creates a new graph in which each node represents one such subgraph. This allows the model to learn a higher-level representation for graphs and increases the robustness of graph encoding to varying input graph sizes.
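
The deep sets form f(sum of phi(x) over x in X) can be illustrated with hand-written functions standing in for learned networks. In this sketch, all names are hypothetical: `phi` maps each element to the vector [1, x, x*x] and `f` turns the pooled sums into the population variance. Because the elements are combined only through a sum, the result does not depend on their order.

```python
from typing import Callable, Iterable, List

Vector = List[float]


def deep_set(phi: Callable[[float], Vector],
             f: Callable[[Vector], float],
             dim: int) -> Callable[[Iterable[float]], float]:
    """Build s(X) = f(sum over x in X of phi(x)); dim is the length of phi's output."""
    def s(xs: Iterable[float]) -> float:
        pooled = [0.0] * dim
        for x in xs:
            # Sum pooling: the only place elements interact, hence order-invariant.
            pooled = [p + v for p, v in zip(pooled, phi(x))]
        return f(pooled)
    return s


def phi(x: float) -> Vector:
    # Hypothetical element encoder: count, value and squared value.
    return [1.0, x, x * x]


def f(z: Vector) -> float:
    # Hypothetical decoder: population variance from (n, sum of x, sum of x^2).
    # Assumes a non-empty set, so that z[0] > 0.
    n, sx, sxx = z
    return sxx / n - (sx / n) ** 2


variance = deep_set(phi, f, dim=3)
print(variance([1.0, 2.0, 3.0]), variance([3.0, 1.0, 2.0]))
```

In a neural set function approximation such as the one the abstract describes, `phi` and `f` would presumably be trained networks rather than fixed formulas.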

8

Andersson, Elin, and Hanna Bengtsson. "Geovisualisering: En rumslig representation av data" [Geovisualization: A spatial representation of data]. Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-43221.

Abstract:

The Internet of Things gives us the ability to identify, control and monitor objects around the world. For the incoming raw data to create meaning and insight for people, it must be presented in the right way. The study therefore examines whether geovisualization can better match human cognitive abilities when taking in and interpreting information. Geovisualization means that spatial data can be explored on a map through an interactive display; it is a link between the human decision-making process, interactive interfaces and data [21]. More research is needed to investigate how geovisualization can be used in systems where large amounts of data need to be presented clearly and decision-making processes need support. The study aims to compare geovisualizations with an existing system that provides continuous updating and monitoring of network cameras, by means of usability tests and interviews. It investigates whether geovisualization can contribute to increased understanding and better interaction in a space that mimics the physical world, and identifies potential problems in order to find future improvements. The results showed that navigation and information overload were recurring problems during the tests of the existing system. For the geovisualizations, the results showed the opposite, as they instead facilitated the understanding of interaction and information. However, some problems were identified with the developed geovisualizations, such as their limited interaction and misinterpretations of objects. Despite this, placing units in their real environment using geovisualization proved advantageous, as it contributed to a better overview and understanding of the system's context.

9

Friedman, Marc T. "Representation and optimization for data integration." Thesis, University of Washington, 1999. http://hdl.handle.net/1773/6979.

10

Jansson, Erika. "Data-model representation for non-programmers." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-394277.

Abstract:

Nowadays, there are people working in the IT industry without much knowledge of programming. Some of them sometimes need to make changes that currently can only be made in the actual code. This project is about finding the best way for non-programmers to make changes to a data model without having to change the code. The project is divided into three parts: the first two are about finding different ways to solve this problem and then evaluating them through expert evaluation and relevant theory. The third part takes the results from parts one and two and develops them further, ending with user tests and follow-up interviews with 12 test participants. Programmers also participate in this part, to give a complete overview of all the intended users' experiences. The result is that a graphical concept is preferable for users with little or no programming experience. For programmers, it is harder to tell which concept is best, and a more extensive investigation would probably be needed to reach a fair result. These conclusions are based on the results of the conducted tests and interviews together with available external theory. The results could be improved with more users and more extensive tests. It is worth noting that, since all users are individuals, different concepts suit different people, and what suits one user best might not suit another at all, regardless of their background as a programmer or non-programmer.

11

Dineva, A. A. "Non-conventional data representation and control." Doctoral thesis, Università degli Studi di Milano, 2017. http://hdl.handle.net/2434/487393.

Abstract:

Non-conventional approaches are prime concerns for most design issues in nonlinear adaptive control and signal processing. During the last decade, major advances have been made in the theory of Adaptive Systems. Owing to the advantageous features of Soft Computing techniques, such as flexibility and robustness, they have become fundamental tools in many areas. These methods are suitable for solving problems that are highly nonlinear or for which only partial, uncertain data is available. In such situations, the application of traditional approaches is often complicated and, moreover, cannot guarantee the expected performance level. Therefore, my primary aim is to reveal new ways to overcome this difficulty by using Soft Computing and non-conventional, novel data representation techniques. In this Thesis, I address novel data representation and control methods that are able to adaptively cope with the usually imperfect, noisy or even missing information (for instance, wavelet-based multiresolution controllers, anytime control, Situational Control, and Robust Fixed Point Transformation (RFPT)-based control). The great majority of adaptive nonlinear control designs are based on Lyapunov's 2nd method, commonly referred to as the “Direct” method. The major drawback of this method is that it is mathematically complicated and works with a large number of arbitrary adaptive control parameters. Moreover, the parameter identification process is in certain cases vulnerable when unknown external perturbations can disturb the system under control. In recent years, the RFPT has been introduced to replace the Lyapunov technique. Hence, my first intention in this Thesis was to investigate in depth the possibilities of combining classical model identification with RFPT-based design. I have proposed a new method that utilizes the geometric interpretation provided by the Lyapunov technique, which can be directly used for parameter tuning.
I have shown that this useful information on the actual parameter estimation error can be obtained by using the same feedback terms and equations of motion as the original methods. To improve the parameter tuning process, I have suggested applying the Modified Gram-Schmidt Algorithm for the possible combination of the RFPT-based method with the Modified Adaptive Inverse Dynamic Robot Controller (MAIDRC) and the Modified Adaptive Slotine-Li Robot Controller (MADSLRC). In addition, I have presented an even simpler tuning technique for the Modified Adaptive Inverse Dynamics Robot Controller that also applies a fixed point transformation-based tuning rule for parameter identification. Afterwards, I have presented a systematic method for generating a new family of Fixed Point Transformations, the so-called “Sigmoid Generated Fixed Point Transformation (SGFPT)”, for the purpose of “Adaptive Dynamic Control” of nonlinear systems. I first outlined the idea for “Single Input - Single Output (SISO)” systems, then showed that it can be extended to “Multiple Input - Multiple Output (MIMO)” systems. Additionally, I have replaced the tuning method by a simple calculation in order to further simplify and improve the method. I have proposed new advances regarding the SGFPT. I have also described a new control strategy based on the combination of “adaptive” and “optimal” control by applying a time-sharing strategy in the SGFPT method, which supports error containment by cyclic control of the different variables. Further, I have introduced new improvements to the SGFPT technique through “Stretched Sigmoid Functions”. The efficiency of the presented control solution has been confirmed by the adaptive control of an underactuated mechanical system. I have investigated the applicability of fuzzy approximation in SGFPT-type control design and demonstrated its usability via simulation investigations.
Furthermore, I have shown a new type of function for the adaptive deformation in the SGFPT. Another important issue is the handling of unwanted sensor noise, which is mainly introduced into the system under control through feedback. Therefore, in the development of a control system, noisy measurement signals have to be addressed first, which calls for more sophisticated signal pre-processing methods. I have therefore considered well-adapted techniques for smoothing problems in the time domain and for fitting data to parametric models. In a wider sense, this means that research is also needed to determine novel approximations that can be used to smooth the operation of the adaptive controller. Next, I investigated the Savitzky-Golay (SG) smoothing and differentiation filter. It has been proven that the performance of the classical SG filter depends on an appropriate setting of the window length and the polynomial degree. The main limitations of this filter are most conspicuous when processing signals with a high rate of change. To overcome this limitation, I have developed a new adaptive method to smooth signals based on the Savitzky-Golay algorithm. The method ensures high-precision noise removal by iterative multi-round smoothing. The signal is approximated by linear regression lines, and corrections are made in each step. In each round, the parameters also change dynamically according to the results of the previous smoothing. To support high-precision reconstruction, I have introduced a new parametric weighting function. The applications and proof of operation have been confirmed by numerical simulations.
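
The classical Savitzky-Golay filter that the abstract builds on fits a polynomial of a chosen degree, by least squares, to each window of 2m + 1 samples and replaces the centre sample with the fitted value; the window length and polynomial degree are the two settings mentioned above. The sketch below covers only that classical, non-adaptive filter in pure Python. The thesis's iterative multi-round method and its parametric weighting function are not described in enough detail to reproduce. Function names are hypothetical, and edge samples are simply left unsmoothed.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sg_coefficients(half_window: int, degree: int) -> List[float]:
    """Weights w such that sum(w[k] * y[i - m + k]) is the fitted polynomial at the centre."""
    ts = range(-half_window, half_window + 1)
    # Normal matrix of the local least-squares fit: (A^T A)[j][l] = sum_t t^(j + l).
    ata = [[float(sum(t ** (j + l) for t in ts)) for l in range(degree + 1)]
           for j in range(degree + 1)]
    # The centre value is the constant coefficient a_0 = e_0^T (A^T A)^-1 A^T y,
    # so the weight of sample t is g . (1, t, t^2, ...) where (A^T A) g = e_0.
    g = _solve(ata, [1.0] + [0.0] * degree)
    return [sum(g[j] * t ** j for j in range(degree + 1)) for t in ts]


def sg_smooth(y: Sequence[float], half_window: int, degree: int) -> List[float]:
    """Classical Savitzky-Golay smoothing; the first and last half_window samples are copied."""
    if 2 * half_window + 1 <= degree:
        raise ValueError("the window must contain more samples than the polynomial degree")
    w = sg_coefficients(half_window, degree)
    out = [float(v) for v in y]
    for i in range(half_window, len(y) - half_window):
        out[i] = sum(wk * y[i - half_window + k] for k, wk in enumerate(w))
    return out


print([round(35 * w) for w in sg_coefficients(2, 2)])
```

For a half-window of 2 and degree 2, the weights match the well-known 5-point quadratic coefficients (-3, 12, 17, 12, -3)/35. Because a polynomial of the same degree is fitted exactly in every window, the filter leaves such polynomials unchanged, which is a convenient sanity check.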

12

Karras, Panagiotis. "Data structures and algorithms for data representation in constrained environments." Thesis, University of Hong Kong, 2007. http://sunzi.lib.hku.hk/hkuto/record/B38897647.

13

Kalaiah, Aravind. "Visual data representation using context-aware samples." College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/2465.

Abstract:

Thesis (Ph. D.) -- University of Maryland, College Park, 2005.
Thesis research directed by: Computer Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.

14

Khanna, Rajiv. "Image data compression using multiple bases representation." Thesis, Virginia Tech, 1990. http://scholar.lib.vt.edu/theses/available/etd-12302008-063722/.

15

Cottee, Michaela J. "The graphical representation of structured multivariate data." Thesis, Open University, 1996. http://oro.open.ac.uk/57616/.

Abstract:

During the past two decades or so, graphical representations have been used increasingly for the examination, summarisation and communication of statistical data. Many graphical techniques exist for exploratory data analysis (i.e., for deciding which model it is appropriate to fit to the data), and a number of graphical diagnostic techniques exist for checking the appropriateness of a fitted model. However, very few techniques exist for representing the fitted model itself. This thesis is concerned with the development of some new and existing graphical representation techniques for the communication and interpretation of fitted statistical models. The first part of this thesis takes the form of a general overview of the use in statistics of graphical representations for exploratory data analysis and diagnostic model checking. In relation to the concern of this thesis, particular consideration is given to the few graphical techniques that already exist for the representation of fitted models. A number of novel two-dimensional approaches are then proposed which go partway towards providing a graphical representation of the main effects and interaction terms of fitted models. This leads on to a description of conditional independence graphs and a consideration of their suitability as a technique for representing fitted models. Conditional independence graphs are then developed further in accordance with the research aims. Since it becomes apparent that none of the approaches taken can yield a simple two-dimensional pen-and-paper technique for the unambiguous graphical representation of all fitted statistical models, an interactive computer package based on the conditional independence graph approach is developed for the construction, communication and interpretation of graphical representations of fitted statistical models.
This package, called the "Conditional Independence Graph Enhancer" (CIGE), does provide unambiguous graphical representations for all the fitted statistical models considered.
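
For the special case of a fitted Gaussian (covariance selection) model, a conditional independence graph can be read directly off the inverse covariance, or concentration, matrix K: variables i and j are conditionally independent given all the others exactly when the partial correlation -K_ij / sqrt(K_ii K_jj) is zero, and the graph has an edge only where it is not. The sketch assumes the fitted model supplies K. It is not the CIGE package, whose interface the abstract does not describe, and the names used are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple


def ci_graph(precision: List[List[float]], tol: float = 1e-9) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, of the conditional independence graph of a Gaussian model.

    An edge is kept only when the partial correlation -K_ij / sqrt(K_ii * K_jj)
    differs from zero by more than the numerical tolerance `tol`.
    """
    n = len(precision)
    edges = set()
    for i, j in combinations(range(n), 2):
        pcorr = -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
        if abs(pcorr) > tol:
            edges.add((i, j))
    return edges


# Made-up concentration matrix of a three-variable chain X0 - X1 - X2.
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
print(sorted(ci_graph(K)))
```

Here X0 and X2 are not joined: they are independent once X1 is known, even though the implied covariance between them (1/4) is not zero.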

16

Todman, Christopher Derek. "The representation of time in data warehouses." Thesis, Open University, 1999. http://oro.open.ac.uk/58004/.

Abstract:

This thesis researches the problems concerning the specification and implementation of temporal requirements in data warehouses. The thesis focuses on two areas: firstly, methods for identifying and capturing business information needs and the associated temporal requirements at the conceptual level; and secondly, methods for classifying and implementing those requirements at the logical level using the relational model. At the conceptual level, eight candidate methodologies were investigated to examine their suitability for creating data models appropriate for a data warehouse. The methods were evaluated on their representation of time, their ability to reflect the dimensional nature of data warehouse models, and their simplicity of use. The research found that none of the methods under review fully satisfied the criteria. At the logical level, the research concluded that the methods widely used in current practice result in data structures that are either incapable of answering some very basic questions involving history or that return inaccurate results. Specific proposals are made in three areas. Firstly, a new conceptual model is described that is designed to capture the information requirements for dimensional models and has full support for time. Secondly, a new approach at the logical level is proposed, providing the data structures that enable the requirements captured in the conceptual model to be implemented, so that historical questions can be answered simply and accurately. Thirdly, a set of rules is developed to help minimise the inaccuracy caused by time. A guide has been produced that provides practitioners with tools and instructions for implementing data warehouses using the methods developed in the thesis.
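
To make "questions involving history" concrete, the sketch below shows one simple way of attaching time to dimension data: each version of a dimension member is stored with a validity interval, and a query asks which value was valid on a given date. This layout is an assumption chosen for illustration; it is not the logical model proposed in the thesis, and the names `DimensionVersion`, `as_of` and the customer data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimensionVersion:
    key: str                  # business key, e.g. a customer identifier
    value: str                # time-varying attribute, e.g. sales region
    valid_from: date          # inclusive start of validity
    valid_to: Optional[date]  # exclusive end; None means "still current"


def as_of(versions: List[DimensionVersion], key: str, when: date) -> Optional[str]:
    """Return the attribute value that was valid for `key` on `when`, if any."""
    for v in versions:
        if v.key == key and v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v.value
    return None


# Made-up history: customer C1 moves from the North region to the South region.
history = [
    DimensionVersion("C1", "North", date(2020, 1, 1), date(2021, 6, 1)),
    DimensionVersion("C1", "South", date(2021, 6, 1), None),
]
print(as_of(history, "C1", date(2021, 1, 15)))
print(as_of(history, "C1", date(2022, 3, 1)))
```

Overwriting the region in place instead would make the first query return the current region, which is one simple example of how a representation without time can give inaccurate historical answers.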


17

Osborne, William George. "Data representation optimisation for reconfigurable hardware design." Thesis, Imperial College London, 2011. http://hdl.handle.net/10044/1/9044.


Abstract:

One of the challenges of designing hardware circuits is representing the data efficiently: minimising area and power while maximising clock frequency. There are several ways of representing variables, each with different characteristics, such as the effect that arithmetic operations have on the absolute and relative error. In the first part of this thesis, a new method of transforming arithmetic by combining different numerical representations to exploit their advantages is discussed. The problem is formulated as a set of linear equations, which are then solved to find the optimal solution. Algorithms that generate sub-optimal solutions are also investigated because they take a fraction of the time to run. A new reconfigurable device structure is proposed based on the results presented. In this case, the accuracy of the original application is guaranteed to be met regardless of the input data. In many applications, guaranteeing that a transformed design has at least the same accuracy as the original is not a strong enough constraint. For this reason, the error on the output is guaranteed to be lower than a specified value. In the second part of this thesis, accuracy reduction is investigated with the goal of minimising circuit area. Energy-efficient run-time reconfigurable hardware is automatically created by systematically deactivating parts of the circuit based on the accuracy required. A model is presented that determines the conditions under which reconfiguring the chip, where this is possible, is more energy-efficient than multiplexing. The approach is extended to general-purpose processors: a new computational model (both software and hardware architecture) to reduce the energy of future devices is introduced.
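The abstract contrasts representations by how arithmetic affects absolute and relative error, without giving the formats used. The sketch below assumes a plain signed fixed-point format (a hypothetical choice for illustration): rounding to `frac_bits` fractional bits bounds the absolute error by half a least-significant bit, 2^-(frac_bits+1), while the relative error grows as values shrink.

```python
import math


def to_fixed(x: float, frac_bits: int) -> int:
    """Quantise x to a signed fixed-point integer with `frac_bits` fractional bits (round to nearest)."""
    return round(x * (1 << frac_bits))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two fixed-point values and round back to `frac_bits` fractional bits (frac_bits >= 1)."""
    # The raw product carries 2 * frac_bits fractional bits; add half an LSB, then shift (round half up).
    return (a * b + (1 << (frac_bits - 1))) >> frac_bits


def quantisation_error(x: float, frac_bits: int):
    """Return the (absolute, relative) error of representing x in this fixed-point format."""
    xq = from_fixed(to_fixed(x, frac_bits), frac_bits)
    abs_err = abs(xq - x)
    rel_err = abs_err / abs(x) if x != 0 else math.inf
    return abs_err, rel_err


# With 8 fractional bits the absolute error never exceeds 2**-9,
# but the relative error is far larger for small magnitudes.
for value in (100.3, 0.75, 0.003):
    print(value, quantisation_error(value, 8))
```

A floating-point format shows the opposite pattern (roughly bounded relative error, magnitude-dependent absolute error), which is one intuition for why combining representations can pay off.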


18

Nan, Lihao. "Privacy Preserving Representation Learning For Complex Data." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20662.


Abstract:

Here we consider a common data encryption problem encountered by users who want to disclose some data to gain utility but preserve their private information. Specifically, we consider the inference attack, in which an adversary conducts inference on the disclosed data to gain information about users' private data. Following the privacy funnel (Makhdoumi et al., 2014), and assuming that the original data $X$ is transformed into $Z$ before disclosure and that the log loss is used for both the privacy and utility metrics, the problem can be modeled as finding a mapping $X \rightarrow Z$ that maximizes the mutual information between $X$ and $Z$ subject to the constraint that the mutual information between $Z$ and the private data $S$ is smaller than a predefined threshold $\epsilon$. In contrast to the original study (Makhdoumi et al., 2014), which focused only on discrete data, we consider the more general and practical setting of continuous and high-dimensional disclosed data (e.g., image data). Most previous work on privacy-preserving representation learning is based on adversarial learning or generative adversarial networks, which have been shown to suffer from the vanishing gradient problem, and it is experimentally difficult to eliminate the relationship with the private data $S$ when $Z$ is constrained to retain more information about $X$. Here we propose a simple but effective variational approach that does not rely on adversarial training. Our experimental results show that our approach is stable and outperforms previous methods in terms of both downstream task accuracy and mutual information estimation.
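The constrained problem stated in words above can be written compactly as follows. Because $Z$ is produced from $X$ alone, $S \to X \to Z$ forms a Markov chain. The Lagrangian form with multiplier $\beta \ge 0$ is shown only as a common relaxation of such constraints; it is an assumption, not necessarily the variational objective used in the thesis.

```latex
\begin{aligned}
&\max_{p(z \mid x)} \; I(X; Z)
  \quad \text{subject to} \quad I(Z; S) \le \epsilon,
  \qquad S \to X \to Z, \\[4pt]
&\text{relaxed form: } \min_{p(z \mid x)} \; -I(X; Z) + \beta \, I(Z; S), \qquad \beta \ge 0 .
\end{aligned}
```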


19

Morell, Vicente. "Contributions to 3D Data Registration and Representation." Doctoral thesis, Universidad de Alicante, 2014. http://hdl.handle.net/10045/42364.


Abstract:

Nowadays, the new generation of computers provides high performance that makes it possible to build computationally expensive computer vision applications for mobile robotics. Building a map of the environment is a common task for a robot and an essential part of allowing robots to move through their environments. Traditionally, mobile robots have used a combination of sensors based on different technologies. Lasers, sonars and contact sensors have typically been used in mobile robotic architectures; however, color cameras are an important sensor because we want robots to use the same information as humans to sense and move through different environments. Color cameras are cheap and flexible, but a lot of work needs to be done to give robots enough visual understanding of scenes. Computer vision algorithms are computationally complex, but robots nowadays have access to different, powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors such as the Microsoft Kinect, which provide 3D colored point clouds at high frame rates, made computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need for scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications, from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of a scene is useful in many areas, such as video game scene modeling, where well-known places are reconstructed and added to game systems, or advertising, where once the 3D model of a room has been obtained, the system can add pieces of furniture using augmented reality techniques.
In this thesis we perform an experimental study of state-of-the-art registration methods to find which one best fits our scene mapping purposes. Different methods are tested and analyzed on scenes with different distributions of visual and geometric appearance. In addition, this thesis proposes two methods for 3D data compression and representation of 3D maps. Our 3D representation proposal is based on the Growing Neural Gas (GNG) method. This Self-Organizing Map (SOM) variant has been successfully used for clustering, pattern recognition and topology representation of various kinds of data. Until now, Self-Organizing Maps have been computed primarily offline, and their application to 3D data has mainly focused on noise-free models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time-consuming, especially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process so that it completes within a predefined time. This thesis proposes a hardware implementation that leverages the computing power of modern GPUs, taking advantage of the paradigm known as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometric 3D compression method seeks to reduce the 3D information by using plane detection as the basic structure for compressing the data. This is because our target environments are man-made, so many points belong to planar surfaces. Our proposed method obtains good compression results in these man-made scenarios. The detected and compressed planes can also be used in other applications, such as surface reconstruction or plane-based registration algorithms.
Finally, we have also demonstrated the suitability of GPU technologies by obtaining a high-performance implementation of a common CAD/CAM technique called Virtual Digitizing.
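The abstract says the compression method relies on plane detection in man-made scenes but does not say how planes are found. The sketch below swaps in basic RANSAC, a common choice for this task, purely as an illustration; the distance `threshold`, the iteration count and the function names are hypothetical.

```python
import numpy as np


def fit_plane(p0, p1, p2):
    """Return (unit normal n, offset d) of the plane n . x + d = 0 through three points, or None if degenerate."""
    n = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(n)
    if norm < 1e-12:  # collinear sample: no unique plane
        return None
    n = n / norm
    return n, -float(np.dot(n, p0))


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Basic RANSAC: keep the plane through 3 random points that has the most points within `threshold`."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iterations):
        idx = rng.choice(len(points), size=3, replace=False)
        plane = fit_plane(*points[idx])
        if plane is None:
            continue
        n, d = plane
        inliers = np.abs(points @ n + d) < threshold  # point-to-plane distances
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, plane
    return best_plane, best_inliers
```

A plane-based compressor could then store four numbers (n, d) plus some description of the inlier region in place of the inliers' individual coordinates, and repeat the search on the remaining points to find further planes.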


20

Ugail, Hassan, and Eyad Elyan. "Efficient 3D data representation for biometric applications." IOS Press, 2007. http://hdl.handle.net/10454/2683.


Abstract:

An important issue in many of today's biometric applications is the development of efficient and accurate techniques for representing related 3D data. Such data is often available through the digitization of complex geometric objects that are of importance to biometric applications. For example, in 3D face recognition, a digital point cloud corresponding to a given face is usually provided by a 3D digital scanner. For efficient storage and timely identification/authentication, such data needs to be represented using a few meaningful parameters or variables. Here we show how mathematical techniques based on Partial Differential Equations (PDEs) can be used to represent complex 3D data so that the data can be parameterized efficiently. For example, in the case of a 3D face, we show how it can be represented using PDEs whereby a handful of key facial parameters can be identified for efficient storage and verification.
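The abstract does not name the PDE. Assuming the fourth-order elliptic equation of the Bloor-Wilson PDE method (an assumption for illustration), a surface patch $\mathbf{X}(u, v)$, periodic in $v$, satisfies the equation below; its closed-form Fourier solution shows why a truncated series yields only a handful of vector coefficients, fixed by boundary curves and the smoothing parameter $a$.

```latex
\begin{aligned}
&\left(\frac{\partial^2}{\partial u^2} + a^2 \frac{\partial^2}{\partial v^2}\right)^{2} \mathbf{X}(u, v) = \mathbf{0}, \\
&\mathbf{X}(u, v) = \mathbf{A}_0(u) + \sum_{n=1}^{\infty} \bigl[\mathbf{A}_n(u) \cos(nv) + \mathbf{B}_n(u) \sin(nv)\bigr], \\
&\mathbf{A}_0(u) = \mathbf{a}_{00} + \mathbf{a}_{01} u + \mathbf{a}_{02} u^2 + \mathbf{a}_{03} u^3, \\
&\mathbf{A}_n(u) = (\mathbf{a}_{n1} + \mathbf{a}_{n2} u)\, e^{anu} + (\mathbf{a}_{n3} + \mathbf{a}_{n4} u)\, e^{-anu},
\end{aligned}
```

with $\mathbf{B}_n(u)$ of the same form as $\mathbf{A}_n(u)$ but with coefficients $\mathbf{b}_{n1}, \dots, \mathbf{b}_{n4}$.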


21

Henning, Gustav. "Visualization of neural data : Dynamic representation and analysis of accumulated experimental data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166770.


Abstract:

The scientific method is an integral part of the investigation and exploration of hypotheses. Although procedures may vary from one field to the next, most have common identifiable stages. Today, there is no lack of tools that illustrate data in different graphical mediums. This thesis focuses instead on the type of tools that researchers use to investigate their hypotheses’ validity. When a sufficient amount of data is gathered, it can be presented for analysis in meaningful ways to illustrate patterns or abnormalities that would otherwise go unnoticed when only viewed as raw numbers. However useful static visualization of data can be when presented in a scientific paper, researchers are often overwhelmed by the number of plots and graphs that can be made using only a sliver of data. Therefore, this thesis introduces software whose purpose is to demonstrate the needs of researchers in analyzing data from repeated experiments, in order to speed up the process of recognizing variations between them.
The scientific method is an integral part of the investigation and exploration of hypotheses. While procedures vary between fields, they broadly resemble one another. Today there is no shortage of tools that visualize data in various graphical contexts. Instead, this thesis focuses on the type of tools that researchers use to examine the integrity of hypotheses. Once enough data has been collected, there are various ways to present it meaningfully in order to demonstrate patterns and deviations that would remain unseen in numbers alone. However useful static visualization of data may be as graphics in scientific reports, the same does not necessarily hold for analysis, because of the many combinations of visualizations that are often possible. Software will be introduced to demonstrate the need for dynamic representation in the analysis of accumulated data, in order to speed up the detection of patterns and deviations.


22

Li, Mingfei, and 李明飞. "Sparse representation and fast processing of massive data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B49617977.


Abstract:

Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in an effective manner. In this thesis, we study sparse representations of data streams and metric spaces, which allow fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space. Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavy hitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator. We also study fault-tolerant spanners in doubling metrics. A subgraph $H$ for a metric space $X$ is called a $k$-vertex-fault-tolerant $t$-spanner ($(k, t)$-VFTS, or simply $k$-VFTS) if, for any subset $S \subseteq X$ with $|S| \le k$, it holds that $d_{H \setminus S}(x, y) \le t \cdot d(x, y)$ for any pair $x, y \in X \setminus S$. For any doubling metric, we give a basic construction of a $k$-VFTS with stretch arbitrarily close to 1 that has the optimal $O(kn)$ edges. We also consider bounded hop-diameter, which is studied here in the context of fault tolerance for the first time, even for Euclidean spanners. We provide a construction of a $k$-VFTS with bounded hop-diameter: for $m \ge 2n$, we can reduce the hop-diameter of the above $k$-VFTS to $O(\alpha(m, n))$ by adding $O(km)$ edges, where $\alpha$ is a functional inverse of Ackermann's function. In addition, we construct a fault-tolerant single-sink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic $k$-VFTS.
As a result, we get a $k$-VFTS with $O(k^2 n)$ edges and maximum degree $O(k^2)$.
Master of Philosophy, Computer Science.
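The $(k, t)$-VFTS definition above can be checked directly on very small inputs. The sketch below is a hypothetical brute-force checker, not the thesis's construction: for every fault set $S$ with $|S| \le k$ it runs Dijkstra on $H \setminus S$ (edge weights are metric distances) and compares $d_{H \setminus S}(x, y)$ with $t \cdot d(x, y)$. It is exponential in $k$ and assumes nodes are mutually comparable (e.g. numbers).

```python
import heapq
from itertools import combinations


def spanner_distances(nodes, edges, dist, source, removed):
    """Dijkstra over spanner H with the vertices in `removed` deleted; weights are dist(u, v)."""
    adj = {v: [] for v in nodes if v not in removed}
    for u, v in edges:
        if u in adj and v in adj:
            w = dist(u, v)
            adj[u].append((v, w))
            adj[v].append((u, w))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


def is_k_vfts(nodes, edges, dist, k, t, tol=1e-9):
    """Brute-force test: for every S with |S| <= k, check d_{H-S}(x, y) <= t * d(x, y) for x != y outside S."""
    for size in range(k + 1):
        for faults in combinations(nodes, size):
            removed = set(faults)
            alive = [v for v in nodes if v not in removed]
            for x in alive:
                reach = spanner_distances(nodes, edges, dist, x, removed)
                for y in alive:
                    if y != x and reach.get(y, float("inf")) > t * dist(x, y) + tol:
                        return False
    return True


def line_metric(a, b):
    """A 1-D metric, d(x, y) = |x - y|, used only for the example."""
    return abs(a - b)


points = [0.0, 1.0, 2.0, 3.0]
path = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
print(is_k_vfts(points, path, line_metric, k=0, t=1.0))  # True: the path preserves all distances
print(is_k_vfts(points, path, line_metric, k=1, t=1.0))  # False: removing 1.0 disconnects 0.0 from 2.0
```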


23

Xie, Hanting. "A generic data representation for predicting player behaviours." Thesis, University of York, 2017. http://etheses.whiterose.ac.uk/20137/.


Abstract:

A common use of predictive models in game analytics is to predict the behaviours of players so that pre-emptive measures can be taken before they make undesired decisions. A standard data pre-processing step in predictive modelling includes both data representation and category definition. Data representation extracts features from the raw dataset to represent the whole dataset. Much research has been done on predicting important player behaviours with game-specific data representations. Some of these efforts have achieved competitive performance; however, because of the game-specific data representations they apply, game companies need to spend extra effort to reuse the proposed methods in more than one product. This work proposes an event-frequency-based data representation that is generally applicable to games. This method of data representation relies only on counts of in-game events instead of prior knowledge of the game. To verify the generality and performance of this data representation, it was applied to three games of different genres to predict players' first-purchase, disengagement and churn behaviours. Experiments show that this data representation method can provide competitive performance across different games. Category definition is another essential component of classification problems. As labelling methods that rely on specific conditions to assign players to classes can often lead to imbalanced classification problems, this work applied two commonly used approaches, random undersampling and the Synthetic Minority Over-Sampling Technique (SMOTE), to rebalance the imbalanced tasks. Results suggested that undersampling provides better performance when the quantity of data is sufficient, whereas SMOTE is the better option when the dataset is too small to be balanced with the undersampling approach.
In addition, this work proposes a new category-definition method that yields a class distribution closer to balanced. The parameters used in this method can also be used to gain insight into the health of the game. Preliminary experimental results show that this method of category definition improves the balance of the class distribution when applied to different games and provides significantly better performance than random classifiers.
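The two pre-processing steps described above can be sketched as follows: an event-frequency representation that turns each player's raw event log into counts over a fixed event vocabulary, and random undersampling of the majority class. The event names, the vocabulary and the function names are hypothetical; the thesis may, for example, restrict counts to an observation window.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def event_frequency_features(logs: Dict[str, List[str]], vocabulary: Sequence[str]) -> Dict[str, List[int]]:
    """Map each player's raw event log to a vector of event counts, in a fixed vocabulary order."""
    features = {}
    for player, events in logs.items():
        counts = Counter(events)
        features[player] = [counts.get(e, 0) for e in vocabulary]
    return features


def random_undersample(samples: List[Tuple[List[int], int]], seed: int = 0) -> List[Tuple[List[int], int]]:
    """Randomly drop samples from larger classes until every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Tuple[List[int], int]]] = {}
    for x, y in samples:
        by_class.setdefault(y, []).append((x, y))
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


logs = {"p1": ["login", "level_up", "login"], "p2": ["login", "purchase"]}
vocabulary = ["login", "level_up", "purchase"]
print(event_frequency_features(logs, vocabulary))  # {'p1': [2, 1, 0], 'p2': [1, 0, 1]}
```

Because the features depend only on event counts, the same code applies to any game that logs named events, which is the generality argument made in the abstract.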


24

Cheung, Jarvis T. "Representation and extraction of trends from process data." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/13186.



25

Baum, Robert Adam. "A tolerance representation scheme for solid models." Thesis, Georgia Institute of Technology, 1989. http://hdl.handle.net/1853/18180.



26

Correia, Filipe Laginha Pinto. "Cartographic representation of spatiotemporal phenomena." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/11185.


Abstract:

Dissertation submitted to obtain the degree of Master in Informatics Engineering (Engenharia Informática).
The field of geovisual analytics focuses on visualization techniques that analyze spatial data by enhancing human cognition. However, spatial data also has a temporal component that is largely disregarded by conventional geovisual analytic tools. Some techniques have been proposed for analyzing spatiotemporal data, but most were designed for specific use cases and are hard to generalize to other situations. A method was therefore needed to describe and compare the existing techniques. This work proposes a catalog that provides a clear description of a set of techniques that deal with spatiotemporal data, allowing the most useful techniques to be identified according to the required criteria. The description of a technique in the catalog relies on two proposed frameworks. The first framework describes spatiotemporal datasets in terms of data scenarios, each of which is a class of datasets; twenty-three data scenarios are described using this framework. The second framework describes analytical tasks on spatiotemporal data; nine different tasks are described using it. This document also proposes two new geovisual analytical techniques that can be applied to spatiotemporal data: the attenuation & accumulation map technique and the overlapping spatiotemporal windows technique. A prototype implementing both techniques was developed as a proof of concept.
Research project “GIAP - GeoInsight Analytics Platform (LISBOA-01-0202-FEDER- 024822)”, funded by Comissão de Coordenação e Desenvolvimento Regional de Lisboa e Vale do Tejo (PORLisboa) and included in Sistema de Incentivos à Investigação e Desenvolvimento Tecnológico (SI I&DT), through an MSc research fellowship from FCT-UNL.


27

Rahman, Md Anisur. "Tabular Representation of Schema Mappings: Semantics and Algorithms." Thèse, Université d'Ottawa / University of Ottawa, 2011. http://hdl.handle.net/10393/20032.


Abstract:

Our thesis investigates a mechanism for representing schema mappings in tabular form and evaluates the utility of the new representation. Schema mapping is a high-level specification that describes the relationship between two database schemas. Schema mappings constitute essential building blocks of data integration, data exchange and peer-to-peer data sharing systems. Global-and-local-as-view (GLAV) is one of the approaches for specifying schema mappings. Tableaux are used for expressing queries and functional dependencies on a single database in tabular form. In our thesis, we first introduce a tabular representation of GLAV mappings. We find that this tabular representation helps to solve many mapping-related algorithmic and semantic problems. For example, a well-known problem is to find the minimal instance of the target schema for a given instance of the source schema and a set of mappings between the source and the target schema. Second, we show that our proposed tabular mapping can be used as an operator on an instance of the source schema to produce an instance of the target schema which is ‘minimal’ and ‘most general’ in nature. There exists a tableaux-based mechanism for deciding the equivalence of two queries. Third, we extend that mechanism to deduce equivalence between two schema mappings using their corresponding tabular representations. Sometimes a schema mapping contains redundant conjuncts, which make data exchange, data integration and data sharing operations more time-consuming. Fourth, we present an algorithm that uses the tabular representations to reduce the number of constraints in schema mappings. At present, either schema-level mappings or data-level mappings are used for data sharing purposes. Fifth, we introduce and give the semantics of bi-level mappings, which combine schema-level and data-level mappings. We also show that bi-level mappings are more effective for data sharing systems.
Finally, we implemented our algorithms and developed a software prototype to evaluate our proposed strategies.
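To make "applying a mapping to a source instance" concrete, the sketch below uses the conventional logical form of a GLAV mapping (a conjunctive source query implying a conjunctive target query with existential variables) and applies it with a naive chase, inventing one labelled null per existential variable per match. This is a hypothetical illustration, not the thesis's tabular operator; its result is a universal solution but is not guaranteed to be minimal. Relation names and the "?" variable convention are assumptions.

```python
from itertools import count
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation, terms); terms starting with "?" are variables
Instance = Dict[str, Set[Tuple[str, ...]]]


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(body: List[Atom], instance: Instance, binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of `binding` that maps all body atoms onto facts of `instance`."""
    if not body:
        yield dict(binding)
        return
    (rel, terms), rest = body[0], body[1:]
    for fact in sorted(instance.get(rel, set())):  # sorted so null numbering is deterministic
        if len(fact) != len(terms):
            continue
        new = dict(binding)
        ok = True
        for term, value in zip(terms, fact):
            if is_var(term):
                # setdefault binds an unbound variable, or returns its existing value for comparison
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, instance, new)


def apply_glav(body: List[Atom], head: List[Atom], source: Instance) -> Instance:
    """Naive chase of one GLAV mapping: for each body match, add the head atoms to the target,
    inventing a fresh labelled null for each head variable that does not occur in the body."""
    fresh = count(1)
    target: Instance = {}
    for binding in matches(body, source, {}):
        for _, terms in head:
            for term in terms:
                if is_var(term) and term not in binding:
                    binding[term] = f"_N{next(fresh)}"
        for rel, terms in head:
            fact = tuple(binding[t] if is_var(t) else t for t in terms)
            target.setdefault(rel, set()).add(fact)
    return target


source = {"Emp": {("alice", "sales"), ("bob", "sales")}}
body = [("Emp", ("?name", "?dept"))]
head = [("Works", ("?name", "?id")), ("Dept", ("?id", "?dept"))]
print(apply_glav(body, head, source))
```

Here each employee receives its own null department identifier, so the result has two Dept facts for "sales"; deciding when such a result is minimal is one of the problems the tabular representation is used for.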


28

Mehta, Nishant A. "On sparse representations and new meta-learning paradigms for representation learning." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/52159.


Abstract:

Given the "right" representation, learning is easy. This thesis studies representation learning and meta-learning, with a special focus on sparse representations. Meta-learning, which amounts to learning to learn, is fundamental to machine learning. The presentation unfolds in two parts. In the first part, we establish learning-theoretic results for learning sparse representations. The second part introduces new multi-task and meta-learning paradigms for representation learning. On the sparse representations front, our main pursuits are generalization error bounds to support a supervised dictionary learning model for Lasso-style sparse coding. Such predictive sparse coding algorithms have been applied with much success in the literature; even more common have been applications of unsupervised sparse coding followed by supervised linear hypothesis learning. We present two generalization error bounds for predictive sparse coding, handling the overcomplete setting (more learned features than original dimensions) and the infinite-dimensional setting. Our analysis led to a fundamental stability result for the Lasso that shows the stability of the solution vector to design matrix perturbations. We also introduce and analyze new multi-task models for (unsupervised) sparse coding and predictive sparse coding, allowing for one dictionary per task but with sharing between the tasks' dictionaries. The second part introduces new meta-learning paradigms to realize unprecedented types of learning guarantees for meta-learning. Specifically sought are guarantees on a meta-learner's performance on new tasks encountered in an environment of tasks. Nearly all previous work produced bounds on the expected risk, whereas we produce tail bounds on the risk, thereby providing performance guarantees on the risk for a single new task drawn from the environment. The new paradigms include minimax multi-task learning (minimax MTL) and sample variance penalized meta-learning (SVP-ML).
Regarding minimax MTL, we provide a high probability learning guarantee on its performance on individual tasks encountered in the future, the first of its kind. We also present two continua of meta-learning formulations, each interpolating between classical multi-task learning and minimax multi-task learning. The idea of SVP-ML is to minimize the task average of the training tasks' empirical risks plus a penalty on their sample variance. Controlling this sample variance can potentially yield a faster rate of decrease for upper bounds on the expected risk of new tasks, while also yielding high probability guarantees on the meta-learner's average performance over a draw of new test tasks. An algorithm is presented for SVP-ML with feature selection representations, as well as a quite natural convex relaxation of the SVP-ML objective.
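Lasso-style sparse coding, which the first part of the abstract builds on, encodes a signal x as a sparse vector z over a dictionary D by minimising 0.5 * ||x - D z||^2 + lam * ||z||_1. The sketch below computes such a code for a fixed, given dictionary using ISTA (iterative soft-thresholding), a standard solver chosen here for illustration; the abstract does not say how codes are computed, and the thesis's contribution is generalization bounds rather than a solver. In predictive sparse coding, codes like z would then feed a learned linear predictor.

```python
import numpy as np


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1: shrink each entry towards zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_sparse_code(D: np.ndarray, x: np.ndarray, lam: float, iters: int = 500) -> np.ndarray:
    """Sparse code z minimising 0.5 * ||x - D z||_2^2 + lam * ||z||_1 via ISTA (proximal gradient)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient (largest singular value squared)
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z


# With an orthonormal dictionary the Lasso solution is simply the soft-thresholded signal.
print(lasso_sparse_code(np.eye(3), np.array([3.0, -0.5, 1.0]), lam=1.0))
```

An overcomplete dictionary simply has more columns than rows (D.shape[1] > D.shape[0]); the same routine applies unchanged.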


29

Brisson, Erik. "Representation of d-dimensional geometric objects." Thesis, University of Washington, 1990. http://hdl.handle.net/1773/6903.



30

Gurung, Topraj. "Compact connectivity representation for triangle meshes." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/47709.


Abstract:

Many digital models used in entertainment, medical visualization, material science, architecture, Geographic Information Systems (GIS), and mechanical Computer Aided Design (CAD) are defined in terms of their boundaries. These boundaries are often approximated using triangle meshes. The complexity of models, which can be measured by triangle count, increases rapidly with the precision of scanning technologies and with the need for higher resolution. An increase in mesh complexity results in increased storage requirements, which in turn increase the frequency of disk accesses or cache misses during mesh processing and hence decrease performance. For example, in a test application involving a mesh with 55 million triangles, performance on a machine with 1GB of memory was lower by a factor of about 6000 than on a machine with 4GB of memory, because of memory thrashing. To help reduce memory thrashing, we focus on decreasing the average storage requirement per triangle, measured in 32-bit integer references per triangle (rpt). This thesis covers compact connectivity representations for triangle meshes and discusses four data structures: 1. Sorted Opposite Table (SOT), which uses 3 rpt and has been extended to support tetrahedral meshes. 2. Sorted Quad (SQuad), which uses about 2 rpt and has been extended to support streaming. 3. Laced Ring (LR), which uses about 1 rpt and offers an excellent compromise between storage compactness and performance of mesh traversal operators. 4. Zipper, an extension of LR, which uses about 6 bits per triangle (equivalently 0.19 rpt) and is therefore the most compact representation. The triangle mesh data structures proposed in this thesis support the standard set of mesh connectivity operators introduced by the previously proposed Corner Table in amortized constant time. They can be constructed in linear time and space from the Corner Table or any equivalent representation.
If geometry is stored as 16-bit coordinates, using Zipper instead of the Corner Table increases the size of the mesh that can be stored in core memory by a factor of about 8.
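The storage figures above are measured against the Corner Table. As a baseline illustration only (not SOT, SQuad, LR or Zipper), the sketch below builds a plain Corner Table for a consistently oriented manifold mesh: the vertex table V and the opposite table O each hold one reference per corner, so the baseline costs 3 + 3 = 6 rpt. The helper names follow common Corner Table usage but are chosen here for illustration.

```python
from typing import List, Sequence, Tuple


def next_corner(c: int) -> int:
    """Next corner in the same triangle (corners 3t, 3t+1, 3t+2 belong to triangle t)."""
    return 3 * (c // 3) + (c + 1) % 3


def prev_corner(c: int) -> int:
    """Previous corner in the same triangle."""
    return 3 * (c // 3) + (c + 2) % 3


def build_corner_table(triangles: Sequence[Tuple[int, int, int]]) -> Tuple[List[int], List[int]]:
    """Build V (vertex of each corner) and O (opposite corner, -1 on the boundary).

    Assumes a consistently oriented manifold mesh, so each interior edge appears once in each direction.
    """
    V = [v for tri in triangles for v in tri]
    O = [-1] * len(V)
    pending = {}  # directed edge -> corner facing it, waiting for its twin
    for c in range(len(V)):
        a, b = V[next_corner(c)], V[prev_corner(c)]  # edge opposite corner c
        twin = pending.pop((b, a), None)
        if twin is None:
            pending[(a, b)] = c
        else:
            O[c], O[twin] = twin, c
    return V, O


def swing(c: int, O: List[int]) -> int:
    """Corner of the same vertex in the adjacent triangle across edge (c, next(c)), or -1 on the boundary."""
    o = O[next_corner(c)]
    return -1 if o == -1 else next_corner(o)


V, O = build_corner_table([(0, 1, 2), (2, 1, 3)])
print(V, O)  # [0, 1, 2, 2, 1, 3] [5, -1, -1, -1, -1, 0]
```

Against this 6-rpt baseline, the abstract's figures of 3, about 2, about 1 and about 0.19 rpt (6 bits / 32 bits = 0.1875) indicate how much connectivity storage each proposed structure saves.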


31

Torres-Rojas, Francisco Jose. "Efficient time representation in distributed systems." Thesis, Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/8301.



32

Becek, Kazimierz. "Biomass Representation in Synthetic Aperture Radar Interferometry Data Sets." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-62707.


Abstract:

This work attempts to explain the origin, features and potential applications of the elevation bias of synthetic aperture radar interferometry (InSAR) datasets over areas covered by vegetation. The rapid development of radar-based remote sensing methods, such as synthetic aperture radar (SAR) and InSAR, has provided an alternative to photogrammetry and LiDAR for determining the third dimension of topographic surfaces. The InSAR method has proved so effective and productive that, during an eleven-day space shuttle mission, it allowed the acquisition of data to develop a three-dimensional model of almost the entire land surface of our planet. This mission is known as the Shuttle Radar Topography Mission (SRTM). Scientists across the geosciences gained unprecedented access to a uniform, high-resolution and exceptionally precise digital elevation model (DEM) of the Earth for a wide variety of scientific and practical inquiries. Unfortunately, InSAR elevations misrepresent the surface of the Earth in places where there is substantial vegetation cover. This is a systematic error of unknown magnitude, though limited by the vertical extent of the vegetation. Up to now, only a limited number of attempts to model this error source have been made, and none offers a robust remedy; they provide only partial or case-based solutions. More work in this area of research is needed, as the number of airborne and space-based InSAR elevation models has been steadily increasing over the last few years, despite strong competition from LiDAR and optical methods. From another perspective, however, this elevation bias, termed here the “biomass impenetrability”, creates a great opportunity to learn about the biomass. This is possible because the impenetrability can be considered a collective response to a few factors originating in the 3D space that encompasses the outermost boundaries of vegetation.
The presence of biomass in InSAR datasets, or simply the biomass impenetrability, is the focus of this research. The report, presented in a sequence of sections, gradually introduces the terminology and the physical and mathematical fundamentals commonly used in describing the propagation of electromagnetic waves, including the Maxwell equations. Synthetic aperture radar (SAR) and InSAR as active remote sensing methods are summarised. In subsequent steps, the major InSAR data sources and data acquisition systems, past and present, are outlined. Various examples of InSAR datasets, including the SRTM C- and X-band elevation products and INTERMAP Inc. IFSAR digital terrain/surface models (DTM/DSM), representing diverse test sites around the world, are used to demonstrate the presence and/or magnitude of the biomass impenetrability for different types of vegetation (usually forest). Results of investigations carried out by selected researchers on the elevation bias in InSAR datasets, and their attempts at mathematical modelling, are also reviewed. In recent years, a few researchers have suggested that the magnitude of the biomass impenetrability is linked to gaps in the vegetation cover. Based on these hints, a mathematical model of the tree and the forest has been developed. Three types of gaps were identified: gaps in landscape-scale forest areas (Type 1), e.g. forest fire scars and logging areas; gaps between three trees forming a triangle (Type 2), which depend, for example, on the shape of the tree crowns; and gaps within a tree itself (Type 3). Experiments have demonstrated that Type 1 gaps follow a power-law density distribution function. One of the most useful features of power-law distributed phenomena is their scale independence. This property was also used to model Type 3 gaps (within the tree crown) by assuming that these gaps follow the same distribution as the Type 1 gaps.
A hypothesis was formulated regarding the penetration depth of radar waves within the canopy. It states that the depth of penetration is simply related to the quantisation level of the radar backscattered signal: a higher number of bits per pixel allows weaker signals arriving from the lower levels of the tree crown to be captured. Assuming certain generic, simplified tree crown shapes, including the cone, paraboloid, sphere and spherical cap, it was possible to model Type 2 gaps analytically. The Monte Carlo simulation method was used to investigate relationships between the impenetrability and various configurations of a modelled forest. One of the most important findings is that the impenetrability is largely explained by the gaps between trees; penetration into the crown cover plays a much less important role. Another important finding is that the impenetrability correlates strongly with vegetation density. Using this feature, a method for vegetation density mapping called the mean maximum impenetrability (MMI) method is proposed. Unlike traditional forest inventory methods, the MMI method allows a much more realistic inventory of vegetation cover, because it captures the current, in situ situation on the ground rather than areas that are merely nominally classified as a “forest-to-be”. The MMI method also allows the mapping of landscape variation in forest or vegetation density, which is a novel and exciting feature of the new 3D remote sensing (3DRS) technique. Besides inventory-type applications, the MMI method can be used as a forest change detection method. For maximum effectiveness of the MMI method, an object-based change detection approach is preferred. A minimum requirement for the MMI method is a time-lapsed reference dataset in the form, for example, of an existing forest map of the area of interest, or a vegetation density map prepared using InSAR datasets.
Preliminary tests aimed at finding the degree of correlation between the impenetrability and other passive and active remote sensing data sources, including TerraSAR-X, NDVI and PALSAR, showed that the data source most sensitive to vegetation density was PALSAR, the Japanese L-band SAR system. Unfortunately, PALSAR backscattered signals become very noisy for impenetrability below 15 m, which means that PALSAR has severe limitations for low biomass loadings per unit area. The proposed applications of InSAR data will remain indispensable wherever persistent cloud cover obscures the sky, making the acquisition of suitable optical data extremely time-consuming or nearly impossible. A limitation of the MMI method is that the impenetrability is calculated using a reference DTM, which must be available beforehand, and in many countries around the world, DTMs of appropriate quality are still unavailable. A possible solution to this obstacle is to use a DEM derived from P-band InSAR elevations or LiDAR. It must be noted, however, that in many cases two InSAR datasets of the same area, acquired at different times, are sufficient for forest change detection or similar applications.
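A minimal sketch of the two map operations mentioned above, assuming impenetrability is taken as the per-pixel height of the InSAR surface above the reference DTM (the exact MMI averaging is not described here and is not attempted). Grids are plain nested lists; the elevations and the 5 m threshold are made-up values. Because the same DTM is subtracted at both epochs, it cancels in the difference, which is why two InSAR datasets of the same area can suffice for change detection.

```python
def impenetrability(insar_surface, reference_dtm):
    """Per-pixel height (m) of the InSAR surface above the reference DTM."""
    return [[s - g for s, g in zip(s_row, g_row)]
            for s_row, g_row in zip(insar_surface, reference_dtm)]


def loss_mask(before, after, threshold_m):
    """True where the height dropped by more than threshold_m between epochs,
    e.g. a candidate clearing. Works on impenetrability maps or raw InSAR surfaces."""
    return [[(b - a) > threshold_m for b, a in zip(b_row, a_row)]
            for b_row, a_row in zip(before, after)]


dtm = [[100.0, 101.0], [102.0, 103.0]]          # hypothetical reference DTM (m)
insar_2000 = [[118.0, 119.5], [103.0, 121.0]]   # hypothetical InSAR surface, epoch 1
insar_2010 = [[118.5, 102.0], [103.5, 120.0]]   # epoch 2: pixel (0, 1) cleared

print(impenetrability(insar_2000, dtm))
print(loss_mask(impenetrability(insar_2000, dtm), impenetrability(insar_2010, dtm), 5.0))
print(loss_mask(insar_2000, insar_2010, 5.0))   # same mask without any DTM
```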

33

Alami, Wassim T. (Wassim Tarek). "Multi-scale object representation and localization using range data." Thesis, McGill University, 1994. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=69782.

Abstract:

This thesis examines an approach to the representation of 3-D objects based on multi-scale surface patches. These patches are computed using multiple alternative decompositions of the surface based on the signs of the mean and Gaussian curvatures. The approach is applicable both to polyhedral objects and to smoothly curved objects.
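The classic way to decompose a surface by the signs of mean curvature H and Gaussian curvature K is the HK sign table, sketched below in Python. This is the textbook classification, not necessarily the exact decomposition used in the thesis; it assumes the sign convention in which H is negative at a peak (flipping the surface normal flips the sign of H), and the tolerance `eps` is an illustrative choice.

```python
def _sign(v, eps):
    """-1, 0 or +1, treating |v| <= eps as zero (curvature estimates are noisy)."""
    if abs(v) <= eps:
        return 0
    return 1 if v > 0 else -1


# (sign H, sign K) -> surface type, in the convention where H < 0 at a peak.
HK_TYPES = {
    (-1, 1): "peak", (1, 1): "pit",
    (-1, 0): "ridge", (1, 0): "valley", (0, 0): "flat",
    (-1, -1): "saddle ridge", (1, -1): "saddle valley", (0, -1): "minimal surface",
}


def hk_type(k1, k2, eps=1e-6):
    """Classify a surface point from its principal curvatures k1, k2."""
    H = (k1 + k2) / 2.0   # mean curvature
    K = k1 * k2           # Gaussian curvature
    # H == 0 with K > 0 cannot occur, because H**2 - K = ((k1 - k2) / 2)**2 >= 0;
    # "undefined" only guards against an unreasonably large eps.
    return HK_TYPES.get((_sign(H, eps), _sign(K, eps)), "undefined")


print(hk_type(-0.5, -0.5), hk_type(0.5, -0.5), hk_type(-1.0, 0.5))
```

Grouping neighbouring points that share a label yields sign-consistent patches; computing the labels at several smoothing scales is one way to obtain the multiple alternative decompositions the abstract mentions.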
A hierarchical ranking of these patches is then used to describe individual objects based on geometric information. These geometric descriptors are ranked according to several criteria expressing their estimated stability and utility.
Pose estimation is cast as an optimal matching problem. The geometric pose transformation between two views of a simple curved object is found by matching the multi-scale descriptions corresponding to the two views. Different combinations of possible three-patch correspondences between the two views are found and ranked, and the position transformation (rotation and translation) is computed. The starting patches are constrained to be those with the most stable description. The cost of matching the two sets of representative patches under the position transformation is then computed. The final pose estimate is obtained from the correspondence that produces the best global consistency.
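Recovering a rotation and translation from a handful of matched points can be sketched with the Kabsch (SVD) method, shown here in Python with NumPy as a stand-in for the thesis's own procedure. Treating each matched patch as a single 3-D point (e.g. its centroid), trying all correspondences and keeping the one with the lowest residual is an assumed simplification of the patch matching and ranking described above; the point coordinates are hypothetical.

```python
from itertools import permutations

import numpy as np


def rigid_transform(P, Q):
    """Least-squares rotation R and translation t such that Q ≈ P @ R.T + t
    (Kabsch method). P, Q: (n, 3) arrays of corresponding points, n >= 3."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp


def rms_error(P, Q, R, t):
    """Root-mean-square residual: a simple global-consistency cost."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    return float(np.sqrt(np.mean(np.sum((P @ R.T + t - Q) ** 2, axis=1))))


def best_correspondence(P, Q):
    """Try every assignment of Q's points to P's and keep the most consistent one."""
    best = None
    for perm in permutations(range(len(Q))):
        Qp = np.asarray(Q, dtype=float)[list(perm)]
        R, t = rigid_transform(P, Qp)
        err = rms_error(P, Qp, R, t)
        if best is None or err < best[0]:
            best = (err, perm, R, t)
    return best


theta = np.radians(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)  # view 1
Q = (P @ R_true.T + t_true)[[2, 0, 3, 1]]   # view 2, point order unknown
err, perm, R, t = best_correspondence(P, Q)
print(perm, round(err, 6))
```

Exhaustive search is only viable for a few points; the abstract's restriction to the most stable starting patches plays the role of pruning this search.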
The algorithm's applicability to pose estimation is demonstrated by examples using real range data and its behaviour in the presence of noise is validated. Its use in object recognition is then discussed.

34

Lustosa, Hermano Lourenço Souza. "Managing numerical simulation data using a multidimensional array representation." Laboratório Nacional de Computação Científica, 2015. https://tede.lncc.br/handle/tede/250.

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Scientific applications, such as numerical simulations, generate an ever-increasing amount of data that needs to be efficiently managed. As most traditional row-store Database Management Systems are not tailored to the analytical workload usually required by such applications, alternative approaches, e.g. column-stores and multidimensional arrays, can offer better query processing times. In this work, we propose new techniques for managing the data produced by numerical simulations, such as those coming from HeMoLab, using multidimensional array technologies. We take advantage of the multidimensional array model, which naturally represents the dimensions and variables used in numerical simulations. However, efficiently mapping the simulation output file onto a multidimensional array is not simple. A naive solution may lead to sparse arrays, impacting query response time, especially when the simulation uses irregular meshes to model its physical domain. We propose novel strategies to solve these problems by defining an efficient mapping of coordinate values in numerical simulations that evenly distributes cells across array chunks using equi-depth histograms and space-filling curves. We evaluated our techniques through experiments on real-world data, comparing them with columnar and row-store relational systems. The results indicate that multidimensional arrays and column-stores are much faster than a traditional row-store system for queries over larger amounts of simulation data. The results also help to identify the scenarios in which multidimensional arrays are the most efficient approach and those in which they are outperformed by the relational column-store approach.
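The two ingredients named in the abstract, equi-depth histograms and space-filling curves, can be combined roughly as follows. This Python sketch assumes 2-D coordinates, builds equi-depth bin boundaries from the sorted values so each bin receives about the same number of cells, and linearises the bin pair with a Z-order (Morton) curve; `chunk_key` and the synthetic data are hypothetical and do not reproduce the thesis's exact mapping.

```python
import bisect


def equi_depth_edges(values, n_bins):
    """Inner bin boundaries so that each bin holds about len(values) / n_bins values."""
    s = sorted(values)
    return [s[(i * len(s)) // n_bins] for i in range(1, n_bins)]


def bin_index(v, edges):
    """Bin number 0 .. len(edges) for value v (a value equal to an edge goes right)."""
    return bisect.bisect_right(edges, v)


def morton2(ix, iy, bits=16):
    """Z-order (Morton) code: bits of ix on even positions, bits of iy on odd ones."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def chunk_key(x, y, x_edges, y_edges):
    """Hypothetical mapping from a mesh coordinate to a linear chunk id."""
    return morton2(bin_index(x, x_edges), bin_index(y, y_edges))


xs = [i ** 3 for i in range(100)]   # strongly skewed coordinates (synthetic)
edges = equi_depth_edges(xs, 4)
counts = [0] * 4
for x in xs:
    counts[bin_index(x, edges)] += 1
print(edges, counts)
```

With equal-width bins the same skewed coordinates would pile most cells into the first bin; equi-depth edges keep the per-chunk occupancy even, and the Morton order keeps spatially close bins close in the linear chunk order.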

35

Houé, Maxime. "Clustering of short sentences through representation of text data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-266121.

Abstract:

Natural Language Processing has developed very quickly in the past few years. Numerous new applications have emerged from new methods, notably following the creation of the popular word embedding Word2Vec by a team of Google researchers. One of these new applications is chatbot technology. The goal of these conversational interfaces is to communicate automatically with humans via written or voice chat. With a chatbot, a company hopes to improve customer relations at a lower cost. Unfortunately, the skills of chatbots can vary a lot, and until now their understanding of humans has often been rather poor. This harsh conclusion raises the question of how chatbot developers can be helped to handle the large amounts of user requests not understood by their chatbot. This thesis was carried out in collaboration with a start-up named Askhub, which aims to help companies develop their chatbots. The aim of this master's thesis is to propose a clustering system to classify the data not understood by a chatbot. To begin with, a study of the different methods of word embeddings was carried out, followed by a study of different clustering techniques suitable for the chosen word embedding. The results are then compared using several metrics, and some proposals were made to improve the clustering results.
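A common baseline for this kind of system, sketched in Python: represent each short sentence as the average of its word vectors and cluster the results with k-means. The 2-D `TOY_VECTORS` table is a hypothetical stand-in for a trained embedding such as Word2Vec, and averaging plus k-means is an assumed choice, not necessarily the combination the thesis settled on.

```python
import math
import random

# Hypothetical 2-D word vectors standing in for a trained embedding such as Word2Vec.
TOY_VECTORS = {
    "refund": (1.0, 0.0), "money": (0.9, 0.1), "back": (0.8, 0.0),
    "password": (0.0, 1.0), "reset": (0.1, 0.9), "login": (0.0, 0.8),
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return None
    return tuple(sum(dim) / len(known) for dim in zip(*known))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def cluster_sentences(sentences, vectors, k):
    """Embed each sentence and cluster; sentences with no known words get label -1."""
    embedded = [(i, sentence_vector(s, vectors)) for i, s in enumerate(sentences)]
    kept = [(i, v) for i, v in embedded if v is not None]
    labels = [-1] * len(sentences)
    for (i, _), lab in zip(kept, kmeans([v for _, v in kept], k)):
        labels[i] = lab
    return labels


sentences = ["I want a refund", "money back please", "refund my money",
             "reset my password", "login problem", "password reset"]
print(cluster_sentences(sentences, TOY_VECTORS, k=2))
```

Averaging discards word order, which is tolerable for short requests; choosing k and judging cluster quality are where metrics such as those compared in the thesis come in.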

36

Sävhammar, Simon. "Uniform interval normalization : Data representation of sparse and noisy data sets for machine learning." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19194.

Abstract:

The uniform interval normalization technique is proposed as an approach to handling sparse and noisy data. The technique is evaluated by transforming and normalizing the MoodMapper and Safebase data sets, and their predictive capabilities are compared by forecasting each data set with an LSTM model. The results are compared with both the commonly used MinMax normalization technique and MinMax normalization with a time2vec layer. Uniform interval normalization was found to perform better on both the sparse MoodMapper data set and the denser Safebase data set. Future work consists of studying the performance of uniform interval normalization on other data sets and with other machine learning models.
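For reference, the MinMax baseline mentioned above can be written in a few lines of Python. The second call illustrates, on made-up numbers, why noise is a concern for this scaling: a single spike fixes the maximum and squeezes every other value into a narrow band near zero. The uniform interval normalization technique itself is not specified in enough detail here to sketch.

```python
def minmax(values, lo=0.0, hi=1.0):
    """Rescale values linearly so the observed minimum maps to lo and the maximum to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:                # constant series: avoid division by zero
        return [lo] * len(values)
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


print(minmax([2.0, 4.0, 6.0]))
print(minmax([2.0, 4.0, 6.0, 100.0]))   # one noisy spike squeezes the rest towards 0
```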

37

Lundgren, Clara. "Female representation and public spending : Investigating female representation as a determinant of local expenditure patterns." Thesis, Uppsala universitet, Nationalekonomiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-435526.

Abstract:

The objective of this thesis is to investigate whether the share of women on the municipal board affects municipalities' spending patterns. The study is based on the assumption that women as a group have particular needs, interests and concerns, and that when they are represented, political decision making will be affected. I used panel data on all 290 municipalities in Sweden for the years 2011, 2015 and 2019 to study the following public spending areas: reception of refugees, elderly care, education and childcare. I estimated a panel regression model with entity and time fixed effects and added several control variables. The results suggest that female representation has a significant effect on spending related to the reception of refugees, whereas the estimated effects on the other spending areas examined (childcare, education and elderly care) are negative but not significant.
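With a balanced panel, entity and time fixed effects can be removed by two-way demeaning before running OLS. The Python sketch below assumes a single regressor and no control variables, and uses synthetic data with a known slope of 2.0, so it only illustrates the estimator; the variable names and numbers are hypothetical and are not the thesis's data or results.

```python
import random


def two_way_demean(panel):
    """Two-way within transformation for a balanced panel {(entity, year): value}:
    z_it - mean_i(z) - mean_t(z) + grand_mean(z)."""
    entities = sorted({e for e, _ in panel})
    years = sorted({t for _, t in panel})
    ent_mean = {e: sum(panel[e, t] for t in years) / len(years) for e in entities}
    year_mean = {t: sum(panel[e, t] for e in entities) / len(entities) for t in years}
    grand = sum(panel.values()) / len(panel)
    return {(e, t): panel[e, t] - ent_mean[e] - year_mean[t] + grand
            for e in entities for t in years}


def twfe_slope(y, x):
    """OLS slope of demeaned y on demeaned x (single regressor, no controls)."""
    yd, xd = two_way_demean(y), two_way_demean(x)
    return sum(xd[k] * yd[k] for k in xd) / sum(xd[k] ** 2 for k in xd)


# Synthetic example: spending = 2.0 * female_share + municipality effect + year effect.
rng = random.Random(1)
years = (2011, 2015, 2019)
munis = [f"m{i}" for i in range(6)]
muni_effect = {m: rng.uniform(-5, 5) for m in munis}
year_effect = {2011: 0.0, 2015: 1.5, 2019: 3.0}
female_share = {(m, t): rng.uniform(0.2, 0.6) for m in munis for t in years}
spending = {(m, t): 2.0 * female_share[m, t] + muni_effect[m] + year_effect[t]
            for m in munis for t in years}
print(round(twfe_slope(spending, female_share), 6))
```

Because the municipality and year effects are additive, demeaning removes them exactly and the slope is recovered; control variables would enter as extra demeaned regressors in a multivariate OLS.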

38

Shi, Peiyang. "Faster Unsupervised Object Detection For Symbolic Representation." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277852.

Abstract:

Symbolic artificial intelligence saw a wave of intense research in the late 20th century. More recently, the fields of deep learning and deep reinforcement learning have been making large strides in computer vision and robotic applications. Both fields have impressive accomplishments but sit at opposite ends of the spectrum in AI research. Mainstream deep learning relies on automatic feature extraction, which often yields abstract features, while symbolic AI often relies on handcrafted symbols and semantics. In this work, we introduce a deep learning algorithm for learning symbolic representations. The algorithm is based on recent advances in unsupervised object detection, and we demonstrate that it can be easily adapted for symbolic representation. Our algorithm, FaSPAIR, is an adaptation of the object detection algorithm SPAIR. We have made several changes to bridge the model to the symbolic representation needed for reinforcement learning and to improve training speed. Our results demonstrate the efficacy of using object detection for learning symbolic representations. We also demonstrate that FaSPAIR provides a large boost in computation speed compared with SPAIR, the current state-of-the-art algorithm.

39

Goebel, Randy. "A logic data model for the machine representation of knowledge." Thesis, University of British Columbia, 1985. http://hdl.handle.net/2429/25799.

Abstract:

DLOG is a logic-based data model developed to show how logic programming can combine contributions of Data Base Management (DBM) and Artificial Intelligence (AI). The DLOG specification includes a language syntax, a proof (or query evaluation) procedure, a description of the language's semantics, and a specification of the relationships between assertions, queries, and application databases. DLOG's data description language is the Horn clause subset of first order logic [Kowalski79, Kowalski81], augmented with descriptive terms and non-Horn integrity constraints. The descriptive terms are motivated by AI representation language ideas, specifically the descriptive terms of the KRL language [Bobrow77]. A similar facility based on logical descriptions is provided in DLOG. DLOG permits the use of definite and indefinite descriptions of individuals and sets in queries and assertions. The meaning of DLOG's extended language is specified as Horn clauses that describe the relation between the basic language and the extensions. The experimental implementation is a Prolog program derived from that specification. The DLOG implementation relies on an extension to the standard Prolog proof procedure. This includes a "unification" procedure that matches embedded terms by recursively invoking the DLOG proof procedure (cf. LOGLISP [Robinson82]). The experimental system includes Prolog implementations of traditional database facilities (e.g., transactions, integrity constraints, data dictionaries, data manipulation language facilities), and an idea for using logic as the basis for heuristic interpretation of queries. This heuristic uses a notion of partial match or sub-proof to produce assumptions under which plausible query answers can be derived. The experimental DLOG knowledge base management system is exercised by describing an undergraduate degree program.
The example application is a description of the Bachelor of Computer Science degree requirements at The University of British Columbia. This application demonstrates how DLOG's descriptive terms provide a concise description of degree program knowledge, and how that knowledge is used to specify student programs and select program options.
Science, Faculty of
Computer Science, Department of
Graduate
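Query evaluation over Horn clauses can be illustrated with a tiny backward-chaining prover in Python. It handles only ground (variable-free) clauses and none of DLOG's descriptive terms or non-Horn integrity constraints, and the course codes are hypothetical rather than the actual UBC requirements; it is a sketch of the general idea, not of the DLOG implementation.

```python
# Ground Horn clauses: head -> list of alternative bodies; an empty body is a fact.
# The course codes are hypothetical, not the actual UBC degree requirements.
RULES = {
    "passed(cpsc_110)": [[]],
    "passed(math_100)": [[]],
    "passed(cpsc_210)": [[]],
    "completed_year_1": [["passed(cpsc_110)", "passed(math_100)"]],
    "eligible(cpsc_310)": [["completed_year_1", "passed(cpsc_210)"]],
    "eligible(cpsc_320)": [["passed(cpsc_221)"]],
}


def prove(goal, rules, visiting=frozenset()):
    """Backward chaining: goal holds if some body for it has all subgoals provable.
    `visiting` stops infinite loops on cyclic rules (a cycle counts as failure)."""
    if goal in visiting:
        return False
    return any(all(prove(sub, rules, visiting | {goal}) for sub in body)
               for body in rules.get(goal, []))


print(prove("eligible(cpsc_310)", RULES))   # derivable from the facts
print(prove("eligible(cpsc_320)", RULES))   # cpsc_221 is not recorded, so not provable
```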

40

Barton, Louis W. G. "Theory of semantic data representation for non-determinate symbol systems." Thesis, University of Oxford, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.669946.

41

Herzog, Erik. "An approach to systems engineering tool data representation and exchange." Doctoral thesis, Linköping : Univ, 2004. http://www.ep.liu.se/diss/science_technology/08/67/index.html.

42

Lodolini, Lucia. "The representation of symmetric patterns using the Quadtree data structure." Online version of thesis, 1988. http://hdl.handle.net/1850/8402.

43

Rosen, Jonathan Adam. "Distortion correction and momentum representation of angle-resolved photoemission data." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/5317.

Abstract:

Angle-Resolved Photoemission Spectroscopy (ARPES) experiments provide a map of intensity as a function of angles and electron kinetic energy to measure the many-body spectral function, but the raw data returned by standard apparatus are not ready for analysis. An image-warping-based distortion correction from slit array calibration is shown to provide the information needed to construct the ARPES intensity as a function of electron momentum. A theory is developed to understand the calculation and uncertainty of the distortion-corrected angle-space data and the final momentum data. An experimental procedure for determining the electron analyzer focal point is described and shown to be in good agreement with predictions. The electron analyzer at the Quantum Materials Laboratory at UBC is found to have a focal point at cryostat position 1.09 mm within 1.00 mm, and the systematic error in the angle is found to be 0.2 degrees. The angular error is shown to be proportional to a functional form of systematic error in the final ARPES data that is highly momentum dependent.
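The conversion from emission angle to momentum that such a correction feeds into is, in the standard free-electron picture, k_∥ = (√(2 m_e E_kin) / ħ) sin θ. The Python sketch below evaluates it and propagates an angle error to first order, δk ≈ (√(2 m_e E_kin) / ħ) cos θ δθ, using the 0.2 degree figure quoted above; the kinetic energy and angles are illustrative, and this is not the thesis's own error model.

```python
import math

# CODATA values in SI units.
M_E = 9.1093837015e-31      # electron mass, kg
HBAR = 1.054571817e-34      # reduced Planck constant, J s
EV = 1.602176634e-19        # 1 eV in J

# sqrt(2 m_e * 1 eV) / hbar, converted from 1/m to 1/Angstrom (about 0.512).
C_K = math.sqrt(2 * M_E * EV) / HBAR * 1e-10


def k_parallel(e_kin_ev, theta_deg):
    """In-plane momentum (1/Angstrom) of a photoelectron with kinetic energy
    e_kin_ev (eV) emitted at polar angle theta_deg (degrees)."""
    return C_K * math.sqrt(e_kin_ev) * math.sin(math.radians(theta_deg))


def k_error(e_kin_ev, theta_deg, dtheta_deg):
    """First-order momentum error (1/Angstrom) caused by an angle error dtheta_deg."""
    return (C_K * math.sqrt(e_kin_ev) * math.cos(math.radians(theta_deg))
            * math.radians(dtheta_deg))


for theta in (0.0, 15.0, 30.0):
    print(theta, round(k_parallel(16.0, theta), 4), round(k_error(16.0, theta, 0.2), 5))
```

The error term depends on both angle and kinetic energy, so a fixed angular offset does not translate into a fixed momentum offset across the measured range.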

44

Lucke, Helmut. "On the representation of temporal data for connectionist word recognition." Thesis, University of Cambridge, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239520.

45

Shen, Yuming. "Deep binary representation learning for single/cross-modal data retrieval." Thesis, University of East Anglia, 2018. https://ueaeprints.uea.ac.uk/67635/.

Abstract:

Data similarity search is widely regarded as a classic topic in computer vision, machine learning and data mining. Given a query, the retrieval model ranks the related candidates in the database according to their similarity, using representation learning methods and nearest-neighbour search. As matching data features in Hamming space is computationally cheaper than in Euclidean space, learning to hash and binary representations are widely used in modern retrieval models. Recent research turns to deep learning to formulate the hash functions, showing great potential in retrieval performance. In this thesis, we gradually extend our research topics and contributions from unsupervised single-modal deep hashing to supervised cross-modal hashing and finally to zero-shot hashing problems, addressing the following challenges in deep hashing. First, existing unsupervised deep hashing methods still do not attain leading retrieval performance compared with shallow ones. To improve this, a novel unsupervised single-modal hashing model named Deep Variational Binaries (DVB) is proposed. We introduce the popular conditional variational auto-encoders to formulate the encoding function. By minimizing the reconstruction error of the latent variables, the proposed model produces compact binary codes without training supervision. Experiments on benchmark datasets show that our model outperforms existing unsupervised hashing methods. The second problem is that current cross-modal hashing methods only consider holistic image representations and fail to model descriptive sentences, which is inadequate for handling the rich semantics of informative cross-modal data in quality textual-visual search tasks. To handle this problem, we propose a supervised deep cross-modal hashing model called Textual-Visual Deep Binaries (TVDB).
Region-based neural networks and recurrent neural networks are used in the image encoding network to make effective use of visual information, while the text encoder is built using a convolutional neural network. We additionally introduce an efficient in-batch optimization routine to train the network parameters. The proposed model outperforms state-of-the-art methods on large-scale datasets. Finally, existing hashing models fail when the categories of query data have never been seen during training. This scenario is further extended into a novel zero-shot cross-modal hashing task in this thesis, and a Zero-shot Sketch-Image Hashing (ZSIH) scheme is proposed with graph convolution and stochastic neurons. Experiments show that the proposed ZSIH model significantly outperforms existing hashing algorithms in the zero-shot retrieval task. Overall, the experiments suggest that our proposed hashing methods outperform state-of-the-art methods in single-modal and cross-modal data retrieval.
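The Hamming-space argument can be illustrated with a minimal Python sketch: binarize real-valued features by sign, pack the bits into an integer, and rank database items by XOR-and-count Hamming distance. Sign thresholding is a hypothetical stand-in for the learned hash functions (DVB, TVDB, ZSIH) described above, and the feature vectors are made up.

```python
def to_code(vec):
    """Binary code from a real-valued feature vector: bit i is 1 where vec[i] > 0."""
    code = 0
    for i, v in enumerate(vec):
        if v > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits: XOR, then count the set bits."""
    return bin(a ^ b).count("1")


def hamming_search(query_vec, db_vecs, top_k):
    """Indices of the top_k database items closest to the query in Hamming space."""
    q = to_code(query_vec)
    codes = [to_code(v) for v in db_vecs]
    return sorted(range(len(codes)), key=lambda i: (hamming(q, codes[i]), i))[:top_k]


db = [[0.9, -0.2, 0.4, -0.7],    # hypothetical real-valued features
      [-0.5, 0.3, -0.1, 0.8],
      [0.7, -0.9, 0.2, 0.1]]
print(hamming_search([0.6, -0.4, 0.5, -0.3], db, top_k=2))
```

Comparing two codes costs one XOR and a bit count, versus a floating-point subtraction and multiply per dimension for a Euclidean distance, which is the computational saving the abstract refers to.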

46

Mohammed, Ayat Mohammed Naguib. "High-dimensional Data in Scientific Visualization: Representation, Fusion and Difference." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78343.

Abstract:

Visualization has proven to be an effective means of analyzing high-dimensional data, especially Multivariate Multidimensional (MVMD) scientific data. Scientific visualization deals with data that have a natural spatial mapping, such as maps, building interiors or even parts of the human body, while information visualization involves abstract, non-spatial data. Visual analytics uses either type of visualization to draw deep inferences about scientific data or information. In recent years, a variety of techniques have been developed that combine statistical and visual analysis tools to represent data of different types in one view, enabling data fusion. One vital feature of such visualization tools is support for comparison: showing the differences between two or more objects. This feature is called visual differencing, or discrimination. Visual differencing is a common requirement across research domains, helping analysts compare different objects in a data set or different attributes of the same object. From a visual analytics point of view, this research examines humans' predictable biases in interpreting visual-spatial and spatiotemporal information, and in inference-making in scientific visualization. In practice, I examined case studies from different domains, such as land suitability in agriculture, spectrum sensing in software-defined radio networks, raster images in remote sensing, pattern recognition in point clouds, airflow distribution in aerodynamics, galaxy catalogs in astrophysics and protein-membrane interaction in molecular dynamics. Each case required different computing power, ranging from a personal computer to a high-performance cluster. Based on this experience across application domains, I propose a high-performance visualization paradigm for scientific visualization that supports three key features of scientific data analysis: representation, fusion, and visual discrimination.
This paradigm is informed by practical work with multiple high-performance computing and visualization platforms, from desktop displays to immersive CAVE displays. To evaluate the applicability of the proposed paradigm, I carried out two user studies. The first addressed data fusion with multivariate maps and the second addressed visual differencing with three multi-view management techniques. The high-performance visualization paradigm and the results of these studies contribute to our knowledge of efficient MVMD designs and provide scientific visualization developers with a framework to mitigate the trade-offs of scalable visualization design, such as data mappings, computing power, and output modality.
Ph. D.

47

Srivastava, Arunima. "Univariate and Multivariate Representation and Modeling of Cancer Biomedical Data." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1577717365850367.

48

Laforgue, Pierre. "Deep kernel representation learning for complex data and reliability issues." Thesis, Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT006.

Abstract:

The first part of this thesis explores deep kernel architectures for complex data. One of the known keys to the success of deep learning algorithms is the ability of neural networks to extract meaningful internal representations. However, the theoretical understanding of why these compositional architectures are so successful remains limited, and deep approaches are almost entirely restricted to vectorial data. Kernel methods, on the other hand, provide functional spaces whose geometry is well studied and understood. Their complexity can be easily controlled through the choice of kernel or penalization. In addition, vector-valued kernel methods can be used to predict kernelized data, which makes it possible to make predictions in complex structured spaces as soon as a kernel can be defined on them.

The deep kernel architecture we propose consists in replacing the basic neural mapping functions with functions from vector-valued Reproducing Kernel Hilbert Spaces (vv-RKHSs). Although very different at first glance, the two functional spaces are actually very similar and differ only in the order in which linear and nonlinear functions are applied. Apart from improving understanding of, and theoretical control over, the layers, using kernel mappings makes it possible to handle structured data, in both input and output, broadening the scope of applicability of networks. We finally present work that ensures a finite-dimensional parametrization of the model, opening the door to efficient optimization procedures for a wide range of losses.

The second part of this thesis investigates alternatives to the sample mean as substitutes for the expectation in the Empirical Risk Minimization (ERM) paradigm. Indeed, ERM implicitly assumes that the empirical mean is a good estimate of the expectation. However, in many practical use cases (e.g. heavy-tailed distributions, presence of outliers, biased training data), this is not the case.

The Median-of-Means (MoM) is a robust mean estimator constructed as follows: the original dataset is split into disjoint blocks, the empirical mean of each block is computed, and the median of these means is returned. We propose two extensions of MoM, to randomized blocks and/or to U-statistics, with provable guarantees. By construction, MoM-like estimators exhibit interesting robustness properties, which are further exploited in the design of robust learning strategies. The (randomized) MoM minimizers are shown to be robust to outliers, while MoM tournament procedures are extended to the pairwise setting.

We close this thesis by proposing an ERM procedure tailored to the sample bias issue. If the training data come from several biased samples, blindly computing the empirical mean yields a biased estimate of the risk. Alternatively, knowing the biasing functions, it is possible to reweight the observations so as to build an unbiased estimate of the risk under the test distribution. We then derived non-asymptotic guarantees for the minimizers of the resulting debiased risk estimate. The soundness of the approach is also empirically endorsed
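
The basic Median-of-Means construction described in the abstract is short enough to state directly. The Python sketch below follows that description (disjoint blocks, one empirical mean per block, median of those means). The function name `median_of_means`, the use of contiguous equal-size blocks, and the choice to drop leftover points are assumptions made for illustration; the sketch covers neither the randomized-block nor the U-statistic extension proposed in the thesis.

```python
import statistics
from typing import Sequence


def median_of_means(data: Sequence[float], n_blocks: int) -> float:
    """Median-of-Means estimate of the expectation of `data` (illustrative sketch).

    Assumption: blocks are contiguous and of equal size n // n_blocks;
    the n % n_blocks leftover points at the end are discarded.
    """
    n = len(data)
    if not 1 <= n_blocks <= n:
        raise ValueError("n_blocks must be between 1 and len(data)")
    block_size = n // n_blocks
    # Empirical mean of each disjoint block.
    block_means = [
        statistics.fmean(data[k * block_size:(k + 1) * block_size])
        for k in range(n_blocks)
    ]
    # The median of the block means is the returned estimate.
    return statistics.median(block_means)


# Hypothetical sample: eight moderate values and one extreme outlier.
sample = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0, 1000.0]
print(median_of_means(sample, n_blocks=3))  # → 2.0
print(statistics.fmean(sample))             # plain empirical mean, pulled far upward
```

With a single extreme value among nine points, the plain empirical mean exceeds 100, whereas only one of the three block means is contaminated, so their median stays at 2.0. In practice the data are often shuffled before splitting so that block membership does not depend on the original ordering.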

49

Lacerda, Fred W. "Comparative advantages of graphic versus numeric representation of quantitative data." Diss., Virginia Polytechnic Institute and State University, 1986. http://hdl.handle.net/10919/49817.

50

Pech Palacio, Manuel Alfredo. "Spatial data modeling and mining using a graph-based representation." Lyon, INSA, 2005. http://theses.insa-lyon.fr/publication/2005ISAL0118/these.pdf.

Abstract:

A single graph-based model is proposed to represent spatial data, non-spatial data and the relations between spatial objects. A graph is thus generated from these three elements. We consider that a graph-based data mining tool can discover patterns involving these three elements, according to three types of spatial relation (topological, cardinal and distance). In our model, spatial data, non-spatial data (non-spatial attributes) and spatial relations form a collection of one or more directed graphs. Vertices represent either spatial objects, spatial relations between two spatial objects, or non-spatial attributes. In addition, an edge can represent either an attribute or the name of a spatial relation. Attribute names can refer to spatial or non-spatial objects. Directed edges are used to represent directional information about the relations between elements and to describe the attributes of objects. SUBDUE was adopted as the graph mining tool. A particular feature called overlap plays an important role in pattern discovery. However, it can only be implemented either to cover the whole graph or to consider no vertex at all. Consequently, we propose a third approach, called limited overlap, which lets the user choose the overlap. We directly analyze three characteristics of the proposed algorithm: search space reduction, computation time reduction, and the discovery of patterns through this type of overlap.
We propose a single graph-based model to represent spatial data, non-spatial data and the spatial relations among spatial objects. We generate datasets composed of graphs containing these three elements. We consider that, by mining a dataset with these characteristics, a graph-based mining tool can search for patterns involving all these elements at the same time, thereby improving the results of the spatial analysis task. A significant characteristic of spatial data is that the attributes of an object's neighbors may influence the object itself. We therefore propose to include three relationship types in the model (topological, orientation and distance relations). In the model, spatial data (i.e. spatial objects), non-spatial data (i.e. non-spatial attributes) and spatial relations are represented as a collection of one or more directed graphs. A directed graph contains a collection of vertices and edges representing all these elements. Vertices represent either spatial objects, spatial relations between two spatial objects (binary relations), or non-spatial attributes describing the spatial objects. Edges represent a link between two vertices of any type. Depending on the type of vertices an edge joins, it can represent either an attribute name or a spatial relation name. The attribute name can refer to a spatial object or a non-spatial entity. We use directed edges to represent directional information about relations among elements (e.g. object x touches object y) and to describe attributes of objects (e.g. object x has attribute z). We propose to adopt the Subdue system, a general graph-based data mining system developed at the University of Texas at Arlington, as our mining tool. A special feature named overlap plays a primary role in the substructure discovery process and consequently has a direct impact on the generated results. However, it is currently implemented in an orthodox way: all or nothing. We therefore propose a third approach, limited overlap, which gives the user the ability to set the vertices over which overlap is allowed. We identify three direct motivations for implementing the new algorithm: search space reduction, processing time reduction, and a search oriented towards specialized overlapping patterns.
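
The abstract describes the model at the level of vertex and edge types. The Python sketch below shows one possible in-memory encoding of such a graph. It assumes (an illustrative choice, not necessarily the thesis's exact schema) that a relation vertex is labelled with its relation category and that both edges attaching it to its two objects carry the relation name; the `SpatialGraph` class, its methods and the school/park example are hypothetical. It does not implement Subdue or the limited-overlap search.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class VertexKind(Enum):
    SPATIAL_OBJECT = "spatial object"
    SPATIAL_RELATION = "spatial relation"  # binary relation between two spatial objects
    ATTRIBUTE = "non-spatial attribute"


@dataclass
class SpatialGraph:
    """Hypothetical directed, labelled graph of spatial objects, relations and attributes."""

    # vertex id -> (kind, label)
    vertices: dict[int, tuple[VertexKind, str]] = field(default_factory=dict)
    # directed edges stored as (source id, target id, edge label)
    edges: list[tuple[int, int, str]] = field(default_factory=list)
    # id generator; ids start at 1 in insertion order
    _ids: count = field(default_factory=lambda: count(1), repr=False, compare=False)

    def _new_vertex(self, kind: VertexKind, label: str) -> int:
        vid = next(self._ids)
        self.vertices[vid] = (kind, label)
        return vid

    def add_object(self, label: str) -> int:
        return self._new_vertex(VertexKind.SPATIAL_OBJECT, label)

    def add_attribute(self, obj: int, name: str, value: str) -> int:
        # "object x has attribute z": the edge label carries the attribute name.
        attr = self._new_vertex(VertexKind.ATTRIBUTE, value)
        self.edges.append((obj, attr, name))
        return attr

    def add_relation(self, src: int, dst: int, category: str, name: str) -> int:
        # "object x touches object y": a relation vertex labelled with its category
        # (assumed: topological, orientation or distance), reached from src and
        # pointing to dst through edges labelled with the relation name.
        rel = self._new_vertex(VertexKind.SPATIAL_RELATION, category)
        self.edges.append((src, rel, name))
        self.edges.append((rel, dst, name))
        return rel


# Hypothetical example: a school with one non-spatial attribute that touches a park.
g = SpatialGraph()
school = g.add_object("school")
park = g.add_object("park")
g.add_attribute(school, "pupils", "450")
g.add_relation(school, park, "topological", "touches")
for src, dst, label in g.edges:
    print(g.vertices[src][1], f"-[{label}]->", g.vertices[dst][1])
```

Making the relation a vertex, rather than a single labelled edge between the two objects, lets a graph miner treat relations as substructure elements in their own right, which is consistent with the abstract's statement that vertices can represent spatial relations.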
