Dissertations / Theses on the topic 'Approximate sampling'


Consult the top 16 dissertations / theses for your research on the topic 'Approximate sampling.'



1. Nutini, Julie Ann. "A derivative-free approximate gradient sampling algorithm for finite minimax problems." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/42200.
Abstract:
Mathematical optimization is the process of minimizing (or maximizing) a function. An algorithm is used to optimize a function when the minimum cannot be found by hand, or when finding it by hand would be inefficient. The minimum of a function is a critical point and corresponds to a gradient (derivative) of 0. Thus, optimization algorithms commonly require gradient calculations. When gradient information of the objective function is unavailable, unreliable or ‘expensive’ in terms of computation time, a derivative-free optimization algorithm is ideal. As the name suggests, derivative-free optimization algorithms do not require gradient calculations. In this thesis, we present a derivative-free optimization algorithm for finite minimax problems. Structurally, a finite minimax problem minimizes the maximum taken over a finite set of functions. We focus on the finite minimax problem due to its frequent appearance in real-world applications. We present convergence results for a regular and a robust version of our algorithm, showing in both cases that either the function is unbounded below (the minimum is −∞) or we have found a critical point. Theoretical results on stopping conditions are also explored. Additionally, theoretical and numerical results are presented for three examples of approximate gradients that can be used in our algorithm: the simplex gradient, the centered simplex gradient and the Gupal estimate of the gradient of the Steklov averaged function. A performance comparison is made between the regular and robust algorithms, the three approximate gradients, and the regular and robust stopping conditions. Finally, an application in seismic retrofitting is discussed.
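
Of the three approximate gradients named above, the simplex gradient is the simplest to state: it solves a small linear system built from function values at nearby points. The following is a minimal numpy sketch on a toy two-function minimax objective; the test functions, step size and evaluation point are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def simplex_gradient(f, x0, h=1e-3):
    """Approximate gradient from function values at the simplex points x0 + h*e_i."""
    n = len(x0)
    S = h * np.eye(n)                        # simplex directions as rows
    df = np.array([f(x0 + S[i]) - f(x0) for i in range(n)])
    return np.linalg.solve(S.T, df)          # solves S^T g = df

def centered_simplex_gradient(f, x0, h=1e-3):
    """Average of the forward and backward simplex gradients; O(h^2) accurate."""
    return 0.5 * (simplex_gradient(f, x0, h) + simplex_gradient(f, x0, -h))

# Finite minimax objective F(x) = max_i f_i(x), on a toy two-function example:
fs = [lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2,
      lambda x: x[0] ** 2 + (x[1] + 2.0) ** 2]
F = lambda x: max(fi(x) for fi in fs)
print(centered_simplex_gradient(F, np.array([0.5, 0.5])))   # near (1.0, 5.0)
```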

2. Rösch, Philipp, and Wolfgang Lehner. "Optimizing Sample Design for Approximate Query Processing." IGI Global, 2013. https://tud.qucosa.de/id/qucosa%3A72930.
Abstract:
The rapid increase of data volumes makes sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatically determining the optimal sample for a given query has remained (almost) unaddressed. To tackle this problem, the authors propose a sample advisor based on a novel cost model. The sample advisor is primarily designed for advising samples for a few queries specified by an expert; the authors additionally propose two extensions. The first extension enhances its applicability by utilizing recorded workload information and taking memory bounds into account. The second extension increases its effectiveness by merging samples in the case of overlapping pieces of sample advice. For both extensions, the authors present exact and heuristic solutions. In their evaluation, the authors analyze the properties of the cost model and demonstrate the effectiveness and efficiency of the heuristic solutions with a variety of experiments.
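
The abstract does not spell out the cost model, but the flavor of a sample advisor can be sketched as a budgeted selection problem: under a memory bound, choose the candidate samples (including merged ones) with the best benefit per byte. Everything below, the names, sizes, benefit numbers, and the greedy rule itself, is a hypothetical illustration rather than the chapter's actual model.

```python
# Toy greedy sample advisor: pick candidates by benefit (saved query cost)
# per byte of memory until the memory bound is exhausted.
def advise(candidates, memory_bound):
    """candidates: iterable of (name, size_bytes, benefit) tuples."""
    chosen, used = [], 0
    for name, size, benefit in sorted(candidates,
                                      key=lambda c: c[2] / c[1], reverse=True):
        if used + size <= memory_bound:
            chosen.append(name)
            used += size
    return chosen

# A merged sample can dominate two overlapping pieces of sample advice:
print(advise([("q1_sample", 40, 100.0),
              ("q2_sample", 70, 120.0),
              ("merged_q1_q2", 90, 250.0)], memory_bound=100))
```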

3. Le, Quoc Do. "Approximate Data Analytics Systems." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-234219.
Abstract:
Today, most modern online services make use of big data analytics systems to extract useful information from raw digital data. The data normally arrives as a continuous stream, at high speed and in huge volumes, and the cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical because the data is growing exponentially, even faster than Moore's law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications accept an approximate rather than an exact output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset of the input data instead of all of it. Unfortunately, advances in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees for stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data, aiming for low latency and efficient use of resources. To achieve these goals, we have designed and built the following approximate data analytics systems:
• StreamApprox: a data stream analytics system for approximate computing. It supports approximate computing for low-latency stream analytics in a transparent way and can adapt to rapid fluctuations of input data streams. For this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error (see the sketch below).
• IncApprox: a data analytics system for incremental approximate computing. It combines approximate and incremental computing in stream processing to achieve high throughput and low latency with efficient resource utilization. For this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
• PrivApprox: a data stream analytics system for privacy-preserving and approximate computing. It supports high-utility, low-latency data analytics while preserving users' privacy, by combining privacy-preserving data analytics with approximate computing.
• ApproxJoin: an approximate distributed join system. It improves the performance of joins, critical but expensive operations in big data systems. Here we employed a sketching technique (Bloom filters) to avoid shuffling non-joinable data items through the network, and proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output.
Our evaluation, based on micro-benchmarks and real-world case studies, shows that these systems achieve significant speedups over state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems let users systematically trade accuracy against throughput/latency, and they require no or minor modifications to existing applications.
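
As a reference point for the sampling machinery mentioned above, here is a minimal per-stratum reservoir sampler (Vitter's Algorithm R run independently per stratum) with a weighted recombination step. It is a simplified, non-adaptive sketch: StreamApprox additionally adapts the per-stratum budgets online, which this toy version does not.

```python
import random
from collections import defaultdict

def stratified_reservoir(stream, k, key):
    """Independent size-k reservoir (Vitter's Algorithm R) per stratum."""
    reservoirs, seen = defaultdict(list), defaultdict(int)
    for item in stream:
        s = key(item)
        seen[s] += 1
        r = reservoirs[s]
        if len(r) < k:
            r.append(item)
        else:
            j = random.randrange(seen[s])      # keep with probability k / seen[s]
            if j < k:
                r[j] = item
    return reservoirs, seen

# Estimate the stream mean by weighting each stratum by its observed size.
stream = [("a", v) for v in range(100)] + [("b", v) for v in range(1000, 1050)]
res, seen = stratified_reservoir(stream, 10, key=lambda t: t[0])
total = sum(seen.values())
est = sum(seen[s] / total * sum(v for _, v in r) / len(r) for s, r in res.items())
print(est)   # close to the exact mean 374.5
```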

4. Jahraus, Karen Veronica. "Using the jackknife technique to approximate sampling error for the cruise-based lumber recovery factor." Thesis, University of British Columbia, 1987. http://hdl.handle.net/2429/26419.
Abstract:
Timber cruises in the interior of British Columbia are designed to meet precision requirements for estimating total net merchantable volume. The effect of this single-objective design on the precision of other cruise-based estimates is not calculated. One key secondary objective, used in the stumpage appraisal of timber in the interior of the province, is estimation of the lumber recovery factor (LRF). The importance of the LRF in determining stumpage values, and the fact that its precision is not presently calculated, prompted this study. Since the LRF is a complicated statistic obtained from a complex sampling design, standard methods of variance calculation cannot be applied. Therefore, the jackknife procedure, a replication technique for approximating variance, was used to determine the sampling error for the LRF. In the four cruises examined, the sampling error for the LRF ranged from 1.27 fbm/m³ to 15.42 fbm/m³. The variability in the LRF was related to the number of sample trees used in its estimation. The impact of variations in the LRF on the appraised stumpage rate was influenced by the lumber selling price, the profit-and-risk ratio and the chip value used in the appraisal calculations. In the cruises investigated, the change in the stumpage rate per unit change in the LRF ranged between $0.17/m³ and $0.21/m³. As a result, sampling error in the LRF can have a significant impact on assessed stumpage rates. Non-sampling error is also a major error source associated with the LRF, but until procedural changes occur, controlling sampling error is the only available means of increasing the precision of the LRF estimate. Consequently, it is recommended that the cruise design objectives be modified to include a maximum allowable level of sampling error for the LRF.
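
The delete-one jackknife applied here has a compact general form: recompute the statistic n times, each time leaving out one sample unit, and turn the spread of those recomputed values into a variance estimate. A minimal sketch for a ratio statistic like the LRF, with entirely hypothetical per-tree numbers:

```python
import numpy as np

def jackknife_se(n, stat):
    """Delete-one jackknife standard error; stat maps an index array to a number."""
    idx = np.arange(n)
    theta = np.array([stat(np.delete(idx, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((theta - theta.mean()) ** 2))

# Hypothetical per-tree lumber recoveries (fbm) and net volumes (m^3). The LRF
# is the ratio of total lumber to total volume, a nonlinear statistic for which
# textbook variance formulas do not apply directly.
lumber = np.array([210.0, 190.0, 250.0, 180.0, 230.0, 205.0])
volume = np.array([1.10, 1.00, 1.30, 0.90, 1.20, 1.05])
lrf = lambda sel: lumber[sel].sum() / volume[sel].sum()

print(f"LRF = {lrf(np.arange(6)):.1f} fbm/m^3, jackknife SE = {jackknife_se(6, lrf):.2f}")
```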

5. Rösch, Philipp. "Design von Stichproben in analytischen Datenbanken" [Sample design in analytical databases]. Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-22916.
Abstract:
Recent studies have shown fast, multi-dimensional growth in analytical databases: over the last four years, the data volume has risen by a factor of 10; the number of users has increased by an average of 25% per year; and the number of queries has been doubling every year since 2004. These queries are increasingly complex join queries with aggregations; they are often of an explorative nature and are submitted interactively. One option to address the need for interactivity under this strong, multi-dimensional growth is the use of samples and approximate query processing based on them. Such a solution offers significantly shorter response times as well as estimates with probabilistic error bounds. Given that joins, groupings and aggregations are the main components of analytical queries, the following requirements arise for the design of samples in analytical databases: 1) the foreign-key integrity between the samples of foreign-key-related tables has to be preserved; 2) all existing groups have to be represented appropriately; 3) aggregation attributes have to be checked for extreme values. For each of these sub-problems, this dissertation presents a sampling technique characterized by memory-bounded samples and low estimation errors. In the first approach, a correlated sampling process guarantees referential integrity while using only a minimum of additional memory. The second technique considers the data distribution, so that arbitrary groupings are supported and all groups are appropriately represented. In the third approach, multi-column outlier handling leads to low estimation errors for any number of aggregation attributes. For all three approaches, the quality of the resulting samples is discussed and taken into account when computing memory-bounded samples. To keep the computation effort, and thus the system load, low, heuristics are provided for each algorithm; these are marked by high efficiency and minimal effects on sample quality. Furthermore, the dissertation examines all possible combinations of the presented sampling techniques; such combinations further reduce estimation errors while widening the range of applicability of the resulting samples. The combination of all three techniques yields a sampling scheme that meets all the requirements for sample design in analytical databases and merges the advantages of the individual solutions, making it possible to answer a wide range of queries approximately but very precisely.
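
The first requirement, referential integrity between samples, is the easiest to make concrete: sample the referencing table, then include exactly the rows it references. A minimal sketch with hypothetical tables; the dissertation's correlated sampling additionally bounds memory and controls the sampling fractions, which this toy version does not.

```python
import random

# Referential-integrity-preserving ("correlated") sampling in miniature:
# sample the referencing (fact) table, then include exactly the dimension
# rows the sampled facts reference. Table contents are hypothetical.
orders = [{"id": i, "customer_id": i % 5, "amount": 10 * i} for i in range(100)]
customers = {c: {"customer_id": c, "region": "EU" if c % 2 else "US"}
             for c in range(5)}

orders_sample = random.sample(orders, 10)               # memory-bounded sample
referenced = {o["customer_id"] for o in orders_sample}
customers_sample = [customers[c] for c in sorted(referenced)]

# Every sampled order still finds its join partner in the customer sample:
assert all(o["customer_id"] in referenced for o in orders_sample)
print(len(orders_sample), len(customers_sample))
```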

6. Žakienė, Inesa. "Horvico ir Tompsono įvertinio dispersijos vertinimas" [Estimation of the variance of the Horvitz-Thompson estimator]. Master's thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120813_131528-29461.
Abstract:
In this master's thesis, weights for estimators of the variance of the Horvitz-Thompson estimator are derived using different distance functions and calibration equations. In this way, eight new estimators of the variance of the Horvitz-Thompson estimator are constructed. Using the Taylor linearization method, approximate variances of the constructed estimators are derived, and estimators of these variances are proposed as well. The thesis also includes a simulation study, carried out with MATLAB programs written by the author, whose aim is to compare the new estimators with each other and with the standard one, and to analyze how the accuracy of the estimators depends on the chosen sampling design.
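
For context, the Horvitz-Thompson estimator weights each sampled value by the inverse of its inclusion probability; under Poisson sampling, its variance estimator also has a simple closed form. A minimal numpy sketch with synthetic data (the thesis studies calibrated variance estimators, which go beyond this baseline):

```python
import numpy as np

rng = np.random.default_rng(1)

# Horvitz-Thompson estimation of a population total under Poisson sampling,
# where unit k enters the sample independently with known probability pi_k.
N = 1000
y = rng.uniform(10.0, 20.0, size=N)        # study variable (synthetic)
pi = np.full(N, 0.1)                       # inclusion probabilities
s = rng.random(N) < pi                     # realized sample

ht_total = np.sum(y[s] / pi[s])            # HT estimator: sum of y_k / pi_k
# Closed-form HT variance estimator for Poisson sampling:
var_hat = np.sum((1.0 - pi[s]) / pi[s] ** 2 * y[s] ** 2)

print(f"true total = {y.sum():.0f}, HT = {ht_total:.0f}, SE = {np.sqrt(var_hat):.0f}")
```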

7. Heng, Jeremy. "On the use of transport and optimal control methods for Monte Carlo simulation." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:6cbc7690-ac54-4a6a-b235-57fa62e5b2fc.
Abstract:
This thesis explores ideas from transport theory and optimal control to develop novel Monte Carlo methods for efficient statistical computation. The first project considers the problem of constructing a transport map between two given probability measures. In the Bayesian formalism, this approach is natural when one introduces a curve of probability measures connecting the prior to the posterior by tempering the likelihood function. The main idea is to move samples from the prior using an ordinary differential equation (ODE), constructed by solving the Liouville partial differential equation (PDE) which governs the time evolution of measures along the curve. In this work, we first study the regularity conditions that solutions of the Liouville equation should satisfy to guarantee the validity of this construction. We place an emphasis on understanding these issues, as they explain the difficulties associated with solutions that have been previously reported. After ensuring that the flow transport problem is well-defined, we give a constructive solution. However, this result is only formal, as the representation is given in terms of intractable integrals. For computational tractability, we propose a novel approximation of the PDE which yields an ODE whose drift depends on the full conditional distributions of the intermediate distributions. Even when the ODE is time-discretized and the full conditional distributions are approximated numerically, the resulting distribution of mapped samples can be evaluated and used as a proposal within Markov chain Monte Carlo and sequential Monte Carlo (SMC) schemes. We then illustrate experimentally that the resulting algorithm can outperform state-of-the-art SMC methods at a fixed computational complexity. The second project aims to exploit ideas from optimal control to design more efficient SMC methods. The key idea is to control the proposal distribution induced by a time-discretized Langevin dynamics so as to minimize the Kullback-Leibler divergence of the extended target distribution from the proposal. The optimal value functions of the resulting optimal control problem can then be approximated using algorithms developed in the approximate dynamic programming (ADP) literature. We introduce a novel iterative scheme to perform ADP, provide a theoretical analysis of the proposed algorithm, and demonstrate that the latter can provide significant gains over state-of-the-art methods at a fixed computational complexity.
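
The tempering curve mentioned above is easy to demonstrate in its plainest form: reweight prior samples stage by stage as the likelihood is gradually switched on, resampling between stages. The sketch below is that baseline on a toy Gaussian model (the model, schedule and particle count are assumptions); the thesis replaces the missing move step with an ODE/flow-based proposal along the curve.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(x):
    return -0.5 * (2.0 - x) ** 2        # toy model: y ~ N(x, 1), observed y = 2

# Likelihood tempering: pi_t(x) ∝ prior(x) * L(x)^lambda_t, lambda: 0 -> 1.
N = 5000
lambdas = np.linspace(0.0, 1.0, 11)
x = rng.normal(0.0, 3.0, size=N)        # draws from the N(0, 3^2) prior
for lam_prev, lam in zip(lambdas[:-1], lambdas[1:]):
    logw = (lam - lam_prev) * log_likelihood(x)     # incremental weight
    w = np.exp(logw - logw.max())
    w /= w.sum()
    x = x[rng.choice(N, size=N, p=w)]               # multinomial resampling
    # A full SMC sampler would now apply an MCMC move targeting pi_t; the
    # thesis instead builds an ODE/flow-based proposal along the curve.
print(x.mean())   # close to the exact posterior mean 2 * 9 / (9 + 1) = 1.8
```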

8. Vo, Brenda. "Novel likelihood-free Bayesian parameter estimation methods for stochastic models of collective cell spreading." Thesis, Queensland University of Technology, 2016. https://eprints.qut.edu.au/99588/1/Brenda_Vo_Thesis.pdf.
Abstract:
Biological processes underlying skin cancer growth and wound healing are governed by various collective cell spreading mechanisms. This thesis develops new statistical methods to provide key insights into the mechanisms driving the spread of cell populations, such as motility, proliferation and cell-to-cell adhesion, using experimental data. The new methods allow us to precisely estimate the parameters of such mechanisms, quantify the associated uncertainty and investigate how these mechanisms are influenced by various factors. The thesis provides a useful tool to measure the efficacy of medical treatments that aim to influence the spread of cell populations.
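
The "likelihood-free" methods of the title are typically variants of Approximate Bayesian Computation (ABC), whose rejection form fits in a few lines. The sketch below uses a toy binomial model as a stand-in for the thesis' stochastic cell-spreading simulators; the prior, summary statistic and tolerance are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rejection ABC, the simplest likelihood-free scheme: draw a parameter from
# the prior, simulate data from the model, and keep the draw if the simulated
# summary statistic lands within epsilon of the observed one.
observed = 15                                 # observed summary (15 of 50 events)
simulate = lambda p: rng.binomial(50, p)      # forward simulation only, no likelihood

accepted = []
while len(accepted) < 2000:
    p = rng.uniform(0.0, 1.0)                 # draw from a Uniform(0, 1) prior
    if abs(simulate(p) - observed) <= 2:      # tolerance epsilon = 2
        accepted.append(p)

print(np.mean(accepted), np.percentile(accepted, [2.5, 97.5]))  # posterior near 0.3
```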

9. Cao, Phuong Thao. "Approximation of OLAP queries on data warehouses." PhD thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00905292.
Abstract:
We study approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema as distributions under the L1 distance, and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: uniform sampling, measure-based sampling and a statistical model. We also introduce an edit distance between data warehouses, with edit operations adapted to data warehouses. Then, in the setting of OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built from streams from different sources. We show a lower bound on the size of the memory needed to approximate queries, and in this case we approximate OLAP queries with finite memory. We also describe a method to discover statistical dependencies, a new notion we introduce, searching for them with decision trees. We apply the method to two data warehouses. The first simulates sensor data, providing weather parameters over time and location from different sources. The second is a collection of RSS feeds from websites on the Internet.
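
The contrast between the first two methods, uniform versus measure-based sampling, can be made concrete in a few lines: with probability-proportional-to-size draws, the Hansen-Hurwitz estimator is exact for the measure's own total and generally pays off on skewed data. The data, sample size and estimator choice below are illustrative assumptions, not the thesis' exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
measure = rng.lognormal(0.0, 2.0, size=100_000)   # skewed hypothetical measure

n = 500
# Uniform sampling: scale the sample mean up to a SUM estimate.
uni = rng.choice(measure, size=n, replace=False)
est_uniform = uni.mean() * measure.size

# Measure-based (probability-proportional-to-size) sampling with the
# Hansen-Hurwitz estimator; each draw i contributes measure_i / p_i.
p = measure / measure.sum()
idx = rng.choice(measure.size, size=n, replace=True, p=p)
est_pps = np.mean(measure[idx] / p[idx])          # exact for the measure's total

print(f"exact={measure.sum():.0f} uniform={est_uniform:.0f} pps={est_pps:.0f}")
```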

10. Sedki, Mohammed Amechtoh. "Échantillonnage préférentiel adaptatif et méthodes bayésiennes approchées appliquées à la génétique des populations" [Adaptive importance sampling and approximate Bayesian methods applied to population genetics]. Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20041/document.
Abstract:
This thesis consists of two parts, which can be read independently. The first part concerns the Adaptive Multiple Importance Sampling (AMIS) algorithm presented in Cornuet et al. (2012), which provides a significant improvement in stability and effective sample size due to the introduction of a recycling procedure. These numerical properties are particularly well adapted to the Bayesian paradigm in population genetics, where the modelling involves a large number of parameters. However, the consistency of the AMIS estimator remains largely open. In this work, we provide a novel Adaptive Multiple Importance Sampling scheme, corresponding to a slight modification of the Cornuet et al. (2012) proposal, that preserves the above-mentioned improvements. Finally, using limit theorems on triangular arrays of conditionally independent random variables, we give a consistency result for the final particle system returned by our new scheme. The second part of this thesis lies within the ABC (Approximate Bayesian Computation) paradigm. ABC has been successfully used in population genetics models to bypass the calculation of the likelihood: these algorithms provide an accurate estimator by comparing the observed dataset to a sample of datasets simulated from the model. Although parallelization is easily achieved, computation times for ensuring a suitable approximation quality of the posterior distribution are still long. To alleviate this issue, we propose a sequential algorithm, adapted from Del Moral et al. (2012), which runs twice as fast as traditional ABC algorithms. Its parameters are calibrated to minimize the number of simulations from the model.
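
The recycling idea behind AMIS can be shown in miniature: at every stage, all past draws are re-weighted against the mixture of all proposals used so far, and the proposal is refit by moment matching. The sketch below is a heavily simplified Gaussian version on a made-up one-dimensional target with equal stage sizes; the actual algorithm and its consistency analysis are more involved.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
log_target = lambda x: norm.logpdf(x, loc=3.0, scale=0.5)   # toy target density

# Simplified AMIS-style loop: draw from the current Gaussian proposal,
# re-weight ALL draws so far against the mixture of ALL proposals used so
# far (the recycling step), then refit the proposal by moment matching.
mus, sigmas = [0.0], [5.0]
xs = np.array([])
for stage in range(5):
    xs = np.concatenate([xs, rng.normal(mus[-1], sigmas[-1], size=1000)])
    mix = np.mean([norm.pdf(xs, m, s) for m, s in zip(mus, sigmas)], axis=0)
    w = np.exp(log_target(xs)) / mix
    w /= w.sum()
    mus.append(float(np.sum(w * xs)))
    sigmas.append(float(np.sqrt(np.sum(w * (xs - mus[-1]) ** 2))))
print(mus[-1], sigmas[-1])   # should approach the target's (3.0, 0.5)
```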

11. Kramer, Stephan Christoph. "CUDA-based Scientific Computing." Doctoral thesis, Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2012. http://hdl.handle.net/11858/00-1735-0000-000D-FB52-0.

12. Craiu, Virgil Radu. "Multivalent framework for approximate and exact sampling and resampling." 2001. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:3006484.

13. Le, Quoc Do. "Approximate Data Analytics Systems." Doctoral thesis, 2017. https://tud.qucosa.de/id/qucosa%3A30872.
Abstract:
Identical to the abstract of entry 3 above (the same thesis under a different catalog record).

14. Yang, Shin-Ta (楊世達). "Improvement of the curve-based method of finding approximate repeating patterns: frame sampling and re-mapping." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/16264505268482025088.
Abstract:
Master's thesis, Fu Jen Catholic University, Department of Computer Science and Information Engineering, ROC academic year 93 (2004).
In the field of music information retrieval, the most important task is extracting features that represent the content of music objects; such content features are useful for music analysis, music retrieval and other services. In this thesis, an application of feature extraction from music data motivates our research on finding approximate repeating patterns in sequence data, one of the key problems in music information retrieval. Liu proposed a curve-based algorithm to efficiently find nontrivial, approximate repeating patterns in music data: the given interval sequence is cut into interval substrings by a sliding window; applying the DCT to each substring maps it to a point in a feature space; points that are close to each other in Euclidean distance are self-joined into trails, so that similar trails correspond to similar interval substrings, which are candidates for approximate repeating patterns; a validation step then confirms the final set of nontrivial repeating patterns. We improve this method in two ways: frame sampling, to locate patterns more exactly, and a re-mapping step that replaces self-joining. Experiments show the efficiency and effectiveness of our approach.
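
The windowing-plus-DCT mapping at the heart of the method fits in a few lines of numpy/scipy. The toy interval sequence, window size w and number of retained coefficients k below are assumptions for illustration; windows 0 and 4 carry the same substring, so their feature points coincide.

```python
import numpy as np
from scipy.fft import dct

# Sliding-window DCT mapping: cut the interval (pitch-difference) sequence
# into substrings, DCT each one, and keep the first k coefficients as the
# point in feature space.
intervals = np.array([2, 2, -4, 2, 2, 2, -4, 2, 5, -1, -2, 3], dtype=float)
w, k = 4, 3
points = np.array([dct(intervals[i:i + w], norm="ortho")[:k]
                   for i in range(len(intervals) - w + 1)])

# Windows whose feature points (nearly) coincide are candidates for an
# approximate repeating pattern; expect the pair (0, 4) to show up here.
d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
print(np.argwhere(np.triu(d < 0.5, k=1)))
```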

15. Le, Huu Minh. "New algorithmic developments in maximum consensus robust fitting." Ph.D. thesis (Research by Publication), School of Computer Science, University of Adelaide, 2018. http://hdl.handle.net/2440/115183.
Abstract:
In many computer vision applications, robustly estimating the parameters of a geometric model is a fundamental problem. Despite longstanding research efforts on robust model fitting, there remains significant scope for investigation. For a large number of geometric estimation tasks in computer vision, maximum consensus is the most popular robust fitting criterion, and this thesis makes several contributions to algorithms for consensus maximization. Randomized hypothesize-and-verify algorithms are arguably the most widely used class of techniques for robust estimation, thanks to their simplicity. Though efficient, these randomized heuristic methods do not guarantee finding good maximum consensus estimates. To improve on them, guided sampling approaches have been developed: they take advantage of additional domain information, such as descriptor matching scores, to guide the sampling process, prioritizing subsets of the data that are more likely to yield good estimates. However, guided sampling is ineffective when good domain information is not available. This thesis tackles that shortcoming by proposing a new guided sampling algorithm based on the class of LP-type problems and Monte Carlo Tree Search (MCTS). The proposed algorithm relies on a fundamental geometric arrangement of the data to guide the sampling process; specifically, we take advantage of the underlying tree structure of the maximum consensus problem and apply MCTS to search the tree efficiently. Empirical results show that the new guided sampling strategy outperforms traditional randomized methods. Consensus maximization also plays a key role in robust point set registration. A special case is the registration of deformable shapes. If the surfaces have the same intrinsic shapes, their deformations can be described accurately by a conformal model. The uniformization theorem allows the shapes to be conformally mapped onto a canonical domain, wherein they can be aligned using a Möbius transformation. The correspondence-free Möbius alignment of two noisy and partially overlapping point sets can be posed as a maximum consensus problem. Solving for the Möbius transformation can be approached by randomized voting-type methods, which offer no guarantee of optimality; local methods such as Iterative Closest Point can be applied, but they assume a good initialization and may converge to a bad local minimum. When a globally optimal solution is required, the literature has so far considered only brute-force search. This thesis contributes a new branch-and-bound algorithm that solves for the globally optimal Möbius transformation much more efficiently. So far, consensus maximization problems have mainly been approached by randomized algorithms, which are efficient but offer no analytical convergence guarantee, and by exact algorithms, which solve the problem to global optimality but are intractable in general due to the NP-hardness of consensus maximization. To fill the gap between these two extremes, this thesis contributes two novel deterministic algorithms that approximately optimize the maximum consensus criterion: one based on non-smooth penalization supported by a Frank-Wolfe-style optimization scheme, and another based on the Alternating Direction Method of Multipliers (ADMM). Both proposed methods can handle the non-linear geometric residuals commonly used in computer vision and, as demonstrated, consistently outperform other heuristics and approximate methods.
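
The randomized hypothesize-and-verify baseline that the thesis improves upon is compact enough to sketch in full: repeatedly fit a model to a minimal subset and keep the hypothesis with the largest consensus (inlier) set. Line fitting, the inlier threshold and the synthetic data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomized hypothesize-and-verify (RANSAC-style) maximum consensus for a
# 2-D line: sample minimal 2-point subsets, count inliers, keep the best.
n = 200
x = rng.uniform(0.0, 10.0, n)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, n)
y[:80] = rng.uniform(0.0, 25.0, 80)                 # 40% gross outliers

best_model, best_consensus = None, -1
for _ in range(500):
    i, j = rng.choice(n, size=2, replace=False)     # minimal 2-point subset
    if x[i] == x[j]:
        continue
    a = (y[j] - y[i]) / (x[j] - x[i])
    b = y[i] - a * x[i]
    consensus = int(np.sum(np.abs(y - (a * x + b)) < 0.3))   # inlier count
    if consensus > best_consensus:
        best_model, best_consensus = (a, b), consensus
print(best_model, best_consensus)                   # close to (2.0, 1.0)
```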

16. Paz, Solange de Lemos. "Processamento aproximado de pesquisas para análise de Big Data" [Approximate query processing for Big Data analysis]. Master's thesis, 2019. http://hdl.handle.net/10316/87927.
Abstract:
Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology.
Over the last ten years, the growth of digital data has increased exponentially. With the increase in the amount of data processed daily, using data analysis to quickly extract relevant information has become an increasingly important and difficult task. Current technologies for data analysis, which rely on relational database systems and data warehouses, have become incapable of handling large amounts of data efficiently: a query on these systems may take hours to return a result, hence the need to improve their performance in terms of cost and time. Approximate query processing systems address this by processing large amounts of data quickly, giving up exact answers in exchange for shorter response times computed over only a portion of the data set. Several approximate query processing techniques have been proposed over the last decades, but they have limitations. This work proposes and evaluates a new approximate query processing technique that mitigates the following shortcomings of current approaches: it requires no changes to the database, since it has a middleware architecture; it allows the confidence level and the maximum admissible error of a query answer to be parameterized; and it handles most types of queries. The technique, named JDBCApprox, is implemented as a Java library that uses simple random sampling without replacement to create samples of the database tables, and then uses a database with an in-memory configuration to speed up query response times. The experimental evaluation showed that JDBCApprox can be up to 24 times faster than PostgreSQL and in most cases returns more accurate answers than the best-performing state-of-the-art system.
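
The core estimator behind this kind of sampling-based approximate query processing is easy to state: answer an aggregate on a simple random sample without replacement, scale up, and attach a CLT-based confidence interval. A minimal sketch with a hypothetical one-column table (JDBCApprox itself is a Java/JDBC library; this shows only the statistical idea):

```python
import numpy as np

rng = np.random.default_rng(0)
amounts = rng.exponential(100.0, size=1_000_000)   # hypothetical table column

# Approximate SELECT SUM(amount): scale up the sample mean and attach a
# CLT-based 95% confidence interval with finite-population correction.
N, n = amounts.size, 10_000
sample = rng.choice(amounts, size=n, replace=False)
est = N * sample.mean()
se = N * sample.std(ddof=1) / np.sqrt(n) * np.sqrt(1.0 - n / N)
print(f"exact={amounts.sum():.0f}  approx={est:.0f} +/- {1.96 * se:.0f}")
```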