Tesis: "String algorithms"

1

Pinzon, Yoan Jose. "String algorithms on sequence comparison". Thesis, King's College London (University of London), 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.395648.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

2

Fischer, Johannes. "Data Structures for Efficient String Algorithms". Diss., lmu, 2007. http://nbn-resolving.de/urn:nbn:de:bvb:19-75053.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

3

Newman, Alantha. "Algorithms for string and graph layout". Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28745.

Texto completo

Resumen

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
Includes bibliographical references (p. 121-125).
Many graph optimization problems can be viewed as graph layout problems. A layout of a graph is a geometric arrangement of the vertices subject to given constraints. For example, the vertices of a graph can be arranged on a line or a circle, on a two- or three-dimensional lattice, etc. The goal is usually to place all the vertices so as to optimize some specified objective function. We develop combinatorial methods as well as models based on linear and semidefinite programming for graph layout problems. We apply these techniques to some well-known optimization problems. In particular, we give improved approximation algorithms for the string folding problem on the two- and three-dimensional square lattices. This combinatorial graph problem is motivated by the protein folding problem, which is central in computational biology. We then present a new semidefinite programming formulation for the linear ordering problem (also known as the maximum acyclic subgraph problem) and show that it provides an improved bound on the value of an optimal solution for random graphs. This is the first relaxation that improves on the trivial "all edges" bound for random graphs.
by Alantha Newman.
Ph.D.

Los estilos APA, Harvard, Vancouver, ISO, etc.

4

Luccio, Flaminia L. Carleton University Dissertation Computer Science. "Distributed algorithms for routing and string recognition". Ottawa, 1995.

Buscar texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

5

Aljamea, Mudhi Mohammed. "Advances in string algorithms for information security applications". Thesis, King's College London (University of London), 2016. https://kclpure.kcl.ac.uk/portal/en/theses/advances-in-string-algorithms-for-information-security-applications(93f6c82c-8b7a-4fb1-a0bd-29f14549d57c).html.

Texto completo

Resumen

This thesis focuses on introducing novel algorithms in information security through studying successful algorithms in bioinformatics and adapting them to solve some open problems in information security. Although, the problems in both bioinformatics and information security are different, yet, they might be considered very similar when it comes to identifying and solving them using Stringology techniques. Different successful bioinformatics algorithms have been studied and introduced to solve different security problems such as malware detection, biometrics and big data. Firstly, we present a dynamic computer malware detection model; a novel approach for detecting malware code embedded in different types of computer files, with consistency, accuracy and in high speed without excessive memory usages. This model was inspired by REAL; an efficient read aligner used by next generation sequencing for processing biological data. In addition, we introduce a novel algorithmic approach to detect malicious URLs in image secret communications. Secondly, we also focus on biometrics, specifically fingerprint which is considered to be one of the most reliable and used technique to identify individuals. In particular, we introduce a new fingerprint matching technique, which matches the fingerprint information using circular approximate string matching to solve the rotation problem overcoming the previous methods’ obstacles. Finally, we conclude with an algorithmic approach to analyse big data readings from smart meters to confirm some privacy issues concerns.

Los estilos APA, Harvard, Vancouver, ISO, etc.

6

Cheng, Lok-lam y 鄭樂霖. "Approximate string matching in DNA sequences". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29350591.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

7

Hunt, Kieran. "A comparison of exact string search algorithms for deep packet inspection". Thesis, Rhodes University, 2018. http://hdl.handle.net/10962/60629.

Texto completo

Resumen

Every day, computer networks throughout the world face a constant onslaught of attacks. To combat these, network administrators are forced to employ a multitude of mitigating measures. Devices such as firewalls and Intrusion Detection Systems are prevalent today and employ extensive Deep Packet Inspection to scrutinise each piece of network traffic. Systems such as these usually require specialised hardware to meet the demand imposed by high throughput networks. Hardware like this is extremely expensive and singular in its function. It is with this in mind that the string search algorithms are introduced. These algorithms have been proven to perform well when searching through large volumes of text and may be able to perform equally well in the context of Deep Packet Inspection. String search algorithms are designed to match a single pattern to a substring of a given piece of text. This is not unlike the heuristics employed by traditional Deep Packet Inspection systems. This research compares the performance of a large number of string search algorithms during packet processing. Deep Packet Inspection places stringent restrictions on the reliability and speed of the algorithms due to increased performance pressures. A test system had to be designed in order to properly test the string search algorithms in the context of Deep Packet Inspection. The system allowed for precise and repeatable tests of each algorithm and then for their comparison. Of the algorithms tested, the Horspool and Quick Search algorithms posted the best results for both speed and reliability. The Not So Naive and Rabin-Karp algorithms were slowest overall.

Los estilos APA, Harvard, Vancouver, ISO, etc.

8

Pockrandt, Christopher Maximilian [Verfasser]. "Approximate String Matching : Improving Data Structures and Algorithms / Christopher Maximilian Pockrandt". Berlin : Freie Universität Berlin, 2019. http://d-nb.info/1183675879/34.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

9

黎少斌 y Shiao-bun Lai. "Trading off time for space for the string matching problem". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B31214216.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

10

Lai, Shiao-bun. "Trading off time for space for the string matching problem /". Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B18061795.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

11

Samir, Uzzaman Mohammad. "Algorithms and optimized implementations in context of circular string and relevant web security". Thesis, King's College London (University of London), 2017. https://kclpure.kcl.ac.uk/portal/en/theses/algorithms-and-optimized-implementations-in-context-of-circular-string-and-relevant-web-security(3f33c6b0-db65-4a39-baa3-d241963deb1e).html.

Texto completo

Resumen

Ammonia plays a central role in the pathogenesis of cerebral oedema in paracetamol-induced acute liver failure (PALF). Infection and inflammation play an important synergistic role in its development. Toll-like receptors (TLRs) sense pathogens and induce inflammation but whether this contributes to the development of cerebral oedema in PALF remains unknown. I postulated that ammonia-induced cerebral oedema and immune dysfunction are mediated by TLR9 and aimed to determine whether this could be prevented in a hyperammonemic TLR9 knockout mouse model. TLR9 expression on circulating neutrophils and their function in PALF was assessed. To examine the influence of PALF plasma and endogenous DNA on TLR9 expression, healthy neutrophils were incubated with PALF plasma with/without DNase. Ammonium acetate (NH4-Ac) was injected intraperitoneally in wild type Black6 (WT-B6), TLR9-/- B6 mice and TLR9fl/fl LysCre B6 mice with TLR9 deleted from neutrophils and macrophages. The TLR9 antagonist ODN2088 was also evaluated. Neutrophil TLR9 correlated with plasma IL-8 and ammonia concentration and increased with severity of hepatic encephalopathy and systemic inflammation. Healthy neutrophil TLR9 expression increased upon stimulation with PALF plasma which was abrogated by pre-incubation with DNase. Following NH4-Ac stimulation, intracellular cytokine (IFN-γ, TNF-α and IL-6) production of lymphocytes and macrophages were increased in WT-B6 mice compared to controls. This was accompanied by increased brain water however in TLR9-/-, cytokine production and brain water content were decreased. This was seen similarly in WT-B6 administered the TLR9 antagonist ODN2088 in conjunction with NH4-Ac. TLR9fl/fl LysCre mice had decreased cytokine production and brain water compared to the TLR9fl/fl group following NH4-Ac injection. Total DNA levels were increased in the circulation after NH4-Ac injection. In summary, ammonia-induced cerebral oedema and immune dysfunction are mediated through TLR9 and DNA dependent. The amelioration of brain oedema and lymphocyte cytokine production by ODN2088 supports exploration of TLR9 antagonism in early PALF to prevent progression to cerebral oedema.

Los estilos APA, Harvard, Vancouver, ISO, etc.

12

Yim, Cheuk-hon Terence. "Approximate string alignment and its application to ESTs, mRNAs and genome mapping". Click to view the E-thesis via HKUTO, 2004. http://sunzi.lib.hku.hk/hkuto/record/B31455736.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

13

Oliveira, Rafael Massambone de. "String-averaging incremental subgradient methods for constrained convex optimization problems". Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-14112017-150512/.

Texto completo

Resumen

In this doctoral thesis, we propose new iterative methods for solving a class of convex optimization problems. In general, we consider problems in which the objective function is composed of a finite sum of convex functions and the set of constraints is, at least, convex and closed. The iterative methods we propose are basically designed through the combination of incremental subgradient methods and string-averaging algorithms. Furthermore, in order to obtain methods able to solve optimization problems with many constraints (and possibly in high dimensions), generally given by convex functions, our analysis includes an operator that calculates approximate projections onto the feasible set, instead of the Euclidean projection. This feature is employed in the two methods we propose; one deterministic and the other stochastic. A convergence analysis is proposed for both methods and numerical experiments are performed in order to verify their applicability, especially in large scale problems.
Nesta tese de doutorado, propomos novos métodos iterativos para a solução de uma classe de problemas de otimização convexa. Em geral, consideramos problemas nos quais a função objetivo é composta por uma soma finita de funções convexas e o conjunto de restrições é, pelo menos, convexo e fechado. Os métodos iterativos que propomos são criados, basicamente, através da junção de métodos de subgradientes incrementais e do algoritmo de média das sequências. Além disso, visando obter métodos flexíveis para soluções de problemas de otimização com muitas restrições (e possivelmente em altas dimensões), dadas em geral por funções convexas, a nossa análise inclui um operador que calcula projeções aproximadas sobre o conjunto viável, no lugar da projeção Euclideana. Essa característica é empregada nos dois métodos que propomos; um determinístico e o outro estocástico. Uma análise de convergência é proposta para ambos os métodos e experimentos numéricos são realizados a fim de verificar a sua aplicabilidade, principalmente em problemas de grande escala.

Los estilos APA, Harvard, Vancouver, ISO, etc.

14

Klaib, Ahmad. "Exact string matching algorithms for searching DNA and protein sequences and searching chemical databases". Thesis, University of Huddersfield, 2014. http://eprints.hud.ac.uk/id/eprint/24266/.

Texto completo

Resumen

The enormous quantities of biological and chemical files and databases are likely to grow year on year, consequently giving rise to the need to develop string-matching algorithms capable of minimizing the searching response time. Being aware of this need, this thesis aims to develop string matching algorithms to search biological sequences and chemical structures by studying exact string matching algorithms in detail. As a result, this research developed a new classification of string matching algorithms containing eight categories according to the pre-processing function of algorithms and proposed five new string matching algorithms; BRBMH, BRQS, Odd and Even algorithm (OE), Random String Matching algorithm (RSMA) and Skip Shift New algorithm (SSN). The main purpose behind the proposed algorithms is to reduce the searching response time and the total number of comparisons. They are tested by comparing them with four well- known standard algorithms, Boyer Moore Horspool (BMH), Quick Search (QS), TVSBS and BRFS. This research applied all of the algorithms to sample data files by implementing three types of tests. The number of comparison tests showed a substantial difference in the number of comparisons our algorithms use compared to the non-hybrid algorithms such as QS and BMH. In addition, the tests showed considerable difference between our algorithms and other hybrid algorithm such as TVSBS and BRFS. For instance, the average elapsed search time tests showed that our algorithms presented better average elapsed search time than the BRFS, TVSBS, QS and BMH algorithms, while the average number of tests showed better number of attempts compared to BMH, QS, TVSBS and BRFS algorithms. A new contribution has been added by this research by using the fastest proposed algorithm, the SSN algorithm, to develop a chemical structure searching toolkit to search chemical structures in our local database. The new algorithms were paralleled using OpenMP and MPI parallel models and tested at the University of Science Malaysia (USM) on a Stealth Cluster with different number of threads and processors to improve the speed of searching pattern in the given text which, as we believe, is another contribution.

Los estilos APA, Harvard, Vancouver, ISO, etc.

15

Bingmann, Timo [Verfasser] y P. [Akademischer Betreuer] Sanders. "Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools / Timo Bingmann ; Betreuer: P. Sanders". Karlsruhe : KIT-Bibliothek, 2018. http://d-nb.info/1164081144/34.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

16

Vishwanathan, S. V. N. "Kernel Methods Fast Algorithms and real life applications". Thesis, Indian Institute of Science, 2003. http://hdl.handle.net/2005/49.

Texto completo

Resumen

Support Vector Machines (SVM) have recently gained prominence in the field of machine learning and pattern classification (Vapnik, 1995, Herbrich, 2002, Scholkopf and Smola, 2002). Classification is achieved by finding a separating hyperplane in a feature space, which can be mapped back onto a non-linear surface in the input space. However, training an SVM involves solving a quadratic optimization problem, which tends to be computationally intensive. Furthermore, it can be subject to stability problems and is non-trivial to implement. This thesis proposes an fast iterative Support Vector training algorithm which overcomes some of these problems. Our algorithm, which we christen Simple SVM, works mainly for the quadratic soft margin loss (also called the l2 formulation). We also sketch an extension for the linear soft-margin loss (also called the l1 formulation). Simple SVM works by incrementally changing a candidate Support Vector set using a locally greedy approach, until the supporting hyperplane is found within a finite number of iterations. It is derived by a simple (yet computationally crucial) modification of the incremental SVM training algorithms of Cauwenberghs and Poggio (2001) which allows us to perform update operations very efficiently. Constant-time methods for initialization of the algorithm and experimental evidence for the speed of the proposed algorithm, when compared to methods such as Sequential Minimal Optimization and the Nearest Point Algorithm are given. We present results on a variety of real life datasets to validate our claims. In many real life applications, especially for the l2 formulation, the kernel matrix K є R n x n can be written as K = Z T Z + Λ , where, Z є R n x m with m << n and Λ є R n x n is diagonal with nonnegative entries. Hence the matrix K - Λ is rank-degenerate, Extending the work of Fine and Scheinberg (2001) and Gill et al. (1975) we propose an efficient factorization algorithm which can be used to find a L D LT factorization of K in 0(nm2) time. The modified factorization, after a rank one update of K, can be computed in 0(m2) time. We show how the Simple SVM algorithm can be sped up by taking advantage of this new factorization. We also demonstrate applications of our factorization to interior point methods. We show a close relation between the LDV factorization of a rectangular matrix and our LDLT factorization (Gill et al., 1975). An important feature of SVM's is that they can work with data from any input domain as long as a suitable mapping into a Hilbert space can be found, in other words, given the input data we should be able to compute a positive semi definite kernel matrix of the data (Scholkopf and Smola, 2002). In this thesis we propose kernels on a variety of discrete objects, such as strings, trees, Finite State Automata, and Pushdown Automata. We show that our kernels include as special cases the celebrated Pair-HMM kernels (Durbin et al., 1998, Watkins, 2000), the spectrum kernel (Leslie et al., 20024, convolution kernels for NLP (Collins and Duffy, 2001), graph diffusion kernels (Kondor and Lafferty, 2002) and various other string-matching kernels. Because of their widespread applications in bio-informatics and web document based algorithms, string kernels are of special practical importance. By intelligently using the matching statistics algorithm of Chang and Lawler (1994), we propose, perhaps, the first ever algorithm to compute string kernels in linear time. This obviates dynamic programming with quadratic time complexity and makes string kernels a viable alternative for the practitioner. We also propose extensions of our string kernels to compute kernels on trees efficiently. This thesis presents a linear time algorithm for ordered trees and a log-linear time algorithm for unordered trees. In general, SVM's require time proportional to the number of Support Vectors for prediction. In case the dataset is noisy a large fraction of the data points become Support Vectors and thus time required for prediction increases. But, in many applications like search engines or web document retrieval, the dataset is noisy, yet, the speed of prediction is critical. We propose a method for string kernels by which the prediction time can be reduced to linear in the length of the sequence to be classified, regardless of the number of Support Vectors. We achieve this by using a weighted version of our string kernel algorithm. We explore the relationship between dynamic systems and kernels. We define kernels on various kinds of dynamic systems including Markov chains (both discrete and continuous), diffusion processes on graphs and Markov chains, Finite State Automata, various linear time-invariant systems etc Trajectories arc used to define kernels introduced on initial conditions lying underlying dynamic system. The same idea is extended to define Kernels on a. dynamic system with respect to a set of initial conditions. This framework leads to a large number of novel kernels and also generalize many previously proposed kernels. Lack of adequate training data is a problem which plagues classifiers. We propose n new method to generate virtual training samples in the case of handwritten digit data. Our method uses the two dimensional suffix tree representation of a set of matrices to encode an exponential number of virtual samples in linear space thus leading to an increase in classification accuracy. This in turn, leads us naturally to a, compact data dependent representation of a test pattern which we call the description tree. We propose a new kernel for images and demonstrate a quadratic time algorithm for computing it by wing the suffix tree representation of an image. We also describe a method to reduce the prediction time to quadratic in the size of the test image by using techniques similar to those used for string kernels.

Los estilos APA, Harvard, Vancouver, ISO, etc.

17

Yim, Cheuk-hon Terence y 嚴卓漢. "Approximate string alignment and its application to ESTs, mRNAs and genome mapping". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31455736.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

18

Robinson, Matthew Brandon Cleaver Gerald B. "Towards a systematic investigation of weakly coupled free fermionic heterotic string gauge group statistics". Waco, Tex. : Baylor University, 2009. http://hdl.handle.net/2104/5358.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

19

Ãvila, Ricardo Lima Feitosa. "Emprego de tÃcnicas de prÃ-processamento textual e algoritmos de comparaÃÃo como suporte Ã correÃÃo de questÃes dissertativas: experimentos, anÃlises e contribuiÃÃes". Universidade Federal do CearÃ, 2013. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=10572.

Texto completo

Resumen

CoordenaÃÃo de AperfeiÃoamento de Pessoal de NÃvel Superior
Esta dissertaÃÃo apresenta um estudo de tÃcnicas que podem ser empregadas como apoio para a correÃÃo de questÃes dissertativas com base na adaptaÃÃo de algoritmos de comparaÃÃo textual combinados a tÃcnicas de prÃ-processamento de textos. O principal desafio na concepÃÃo de uma ferramenta para este tipo de aplicaÃÃo Ã a ambiguidade da linguagem natural. Para analisar situaÃÃes de correÃÃo de questÃes subjetivas, foram efetuados testes com esses algoritmos, tendo-se desenvolvido uma ferramenta para tal propÃsito. Confrontando respostas de alunos ao padrÃo de resposta de questÃes propostas em provas subjetivas, foram analisados o desempenho individual dos algoritmos e de um conjunto de tÃcnicas de prÃ-processamento que sÃo encontrados na literatura, de maneira isolada e combinada. Buscando contornar situaÃÃes especÃficas de falso negativo e falso positivo, foram propostas algumas tÃcnicas auxiliares como contribuiÃÃo deste trabalho. ApÃs a anÃlise dos experimentos realizados, os resultados de Ãndice de similaridade entre respostas indicam o uso da soluÃÃo como suporte a correÃÃo de questÃes discursivas, podendo, ainda, ser aplicado na detecÃÃo de plÃgio e ser integrado a um ambiente virtual de ensino e aprendizagem.
This master thesis presents a study of techniques used as support for a correction of essay questions based in an adaptation of string-matching algorithms combined with text preprocessing techniques. The main challenge to design a tool like this is an ambiguity of natural language. To analyze a correction of subjective questions, tests were performed with these algorithms, and a tool have been developed for this purpose. Comparing student responses with response pattern of questions proposed in subjective tests, we analyzed the performance of individual algorithms and a set of pre-processing techniques that are found in the literature, in isolation and combined. Seeking to neutralize specific situations of false negative and false positive, some techniques have been proposed as auxiliary contribution of this work. After analyzing the experiments, the results of similarity index between responses indicate the use of the solution to support the correction of essay questions, and may also be applied in the detection of plagiarism and be integrated to a learning management system.

Los estilos APA, Harvard, Vancouver, ISO, etc.

20

Toth, Róbert. "Přibližné vyhledávání řetězců v předzpracovaných dokumentech". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236122.

Texto completo

Resumen

This thesis deals with the problem of approximate string matching, also called string matching allowing errors. The thesis targets the area of offline algorithms, which allows very fast pattern matching thanks to index created during initial text preprocessing phase. Initially, we will define the problem itself and demonstrate variety of its applications, followed by short survey of different approaches to cope with this problem. Several existing algorithms based on suffix trees will be explained in detail and new hybrid algorithm will be proposed. Algorithms wil be implemented in C programming language and thoroughly compared in series of experiments with focus on newly presented algorithm.

Los estilos APA, Harvard, Vancouver, ISO, etc.

21

Sastre, Javier M. "Efficient finite-state algorithms for the application of local grammars". Phd thesis, Université de Marne la Vallée, 2011. http://tel.archives-ouvertes.fr/tel-00621249.

Texto completo

Resumen

Notre travail porte sur le développement d'algorithmes performants d'application de grammaires locales, en prenant comme référence ceux des logiciels libres existants: l'analyseur syntaxique descendant d'Unitex et l'analyseur syntaxique à la Earley d'Outilex. Les grammaires locales sont un formalisme de représentation de la syntaxe des langues naturelles basé sur les automates finis. Les grammaires locales sont un modèle de construction de descriptions précises et à grande échelle de la syntaxe des langues naturelles par le biais de l'observation systématique et l'accumulation méthodique de données. L'adéquation des grammaires locales pour cette tâche a été testée à l'occasion de nombreux travaux. À cause de la nature ambiguë des langues naturelles et des propriétés des grammaires locales, les algorithmes classiques d'analyse syntaxique tels que LR, CYK et Tomita ne peuvent pas être utilisés dans le contexte de ce travail. Les analyseurs descendant et Earley sont des alternatives possibles, cependant, ils ont des coûts asymptotiques exponentiels pour le cas des grammaires locales. Nous avons d'abord conçu un algorithme d'application de grammaires locales avec un coût polynomial dans le pire des cas. Ensuite, nous avons conçu des structures de données performantes pour la représentation d'ensembles d'éléments et de séquences. Elles ont permis d'améliorer la vitesse de notre algorithme dans le cas général. Nous avons mis en oeuvre notre algorithme et ceux des systèmes Unitex et Outilex avec les mêmes outils afin de les tester dans les mêmes conditions. En outre, nous avons mis en oeuvre différentes versions de chaque algorithme en utilisant nos structures de données et algorithmes pour la représentation d'ensembles et ceux fournis par la Standard Template Library (STL) de GNU. Nous avons comparé les performances des différents algorithmes et de leurs variantes dans le cadre d'un projet industriel proposé par l'entreprise Telefónica I+D: augmenter la capacité de compréhension d'un agent conversationnel qui fournit des services en ligne, voire l'envoi de SMS à des téléphones portables ainsi que des jeux et d'autres contenus numériques. Les conversations avec l'agent sont en espagnol et passent par Windows Live Messenger. En dépit du domaine limité et de la simplicité des grammaires appliquées, les temps d'exécution de notre algorithme, couplé avec nos structures de données et algorithmes pour la représentation d'ensembles, ont été plus courts. Grâce au coût asymptotique amélioré, on peut s'attendre à des temps d'exécution significativement inférieurs par rapport aux algorithmes utilisés dans les systèmes Unitex et Outilex, pour le cas des grammaires complexes et à large couverture.

Los estilos APA, Harvard, Vancouver, ISO, etc.

22

Neto, Domingos Soares. "Filtros para a busca e extração de padrões aproximados em cadeias biológicas". Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-19102009-002745/.

Texto completo

Resumen

Esta dissertação de mestrado aborda formulações computacionais e algoritmos para a busca e extração de padrões em cadeias biológicas. Em particular, o presente texto concentra-se nos dois problemas a seguir, considerando-os sob as distâncias de Hamming e Levenshtein: a) como determinar os locais nos quais um dado padrão ocorre de modo aproximado em uma cadeia fornecida; b) como extrair padrões que ocorram de modo aproximado em um número significativo de cadeias de um conjunto fornecido. O primeiro problema, para o qual já existem diversos algoritmos polinomiais, tem recebido muita atenção desde a década de 60, e ganhou novos ares com o advento da biologia computacional, nos idos dos anos 80, e com a popularização da Internet e seus mecanismos de busca: ambos os fenômenos trouxeram novos obstáculos a serem superados, em razão do grande volume de dados e das bastante justas restrições de tempo inerentes a essas aplicações. O segundo problema, de surgimento um pouco mais recente, é intrinsicamente desafiador, em razão de sua complexidade computacional, do tamanho das entradas tratadas nas aplicações mais comuns e de sua dificuldade de aproximação. Também é de chamar a atenção o seu grande potencial de aplicação. Neste trabalho são apresentadas formulações adequadas dos problemas abordados, assim como algoritmos e estruturas de dados essenciais ao seu estudo. Em especial, estudamos a extremamente versátil árvore dos sufixos, assim como uma de suas generalizações e sua estrutura irmã: o vetor dos sufixos. Grande parte do texto é dedicada aos filtros baseados em q-gramas para a busca aproximada de padrões e algumas de suas mais recentes variações. Estão cobertos os algoritmos bit-paralelos de Myers e Baeza-Yates-Gonnet para a busca de padrões; os algoritmos de Sagot para a extração de padrões; os algoritmos de filtragem de Ukkonen, Jokinen-Ukkonen, Burkhardt-Kärkkäinen, entre outros.
This thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others.

Los estilos APA, Harvard, Vancouver, ISO, etc.

23

Berry, Thomas. "Algorithm engineering : string processing". Thesis, Liverpool John Moores University, 2002. http://researchonline.ljmu.ac.uk/4973/.

Texto completo

Resumen

The string matching problem has attracted a lot of interest throughout the history of computer science, and is crucial to the computing industry. The theoretical community in Computer Science has a developed a rich literature in the design and analysis of string matching algorithms. To date, most of this work has been based on the asymptotic analysis of the algorithms. This analysis rarely tell us how the algorithm will perform in practice and considerable experimentation and fine-tuning is typically required to get the most out of a theoretical idea. In this thesis, promising string matching algorithms discovered by the theoretical community are implemented, tested and refined to the point where they can be usefully applied in practice. In the course of this work we have presented the following new algorithms. We prove that the time complexity of the new algorithms, for the average case is linear. We also compared the new algorithms with the existing algorithms by experimentation. " We implemented the existing one dimensional string matching algorithms for English texts. From the findings of the experimental results we identified the best two algorithms. We combined these two algorithms and introduce a new algorithm. " We developed a new two dimensional string matching algorithm. This algorithm uses the structure of the pattern to reduce the number of comparisons required to search for the pattern. " We described a method for efficiently storing text. Although this reduces the size of the storage space, it is not a compression method as in the literature. Our aim is to improve both space and time taken by a string matching algorithm. Our new algorithm searches for patterns in the efficiently stored text without decompressing the text. " We illustrated that by pre-processing the text we can improve the speed of the string matching algorithm when we search for a large number of patterns in a given text. " We proposed a hardware solution for searching in an efficiently stored DNA text.

Los estilos APA, Harvard, Vancouver, ISO, etc.

24

Mohamed, Manal Abd El-Kadeer Kasem. "Algorithmic issues on string regularities". Thesis, King's College London (University of London), 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.422247.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

25

Toopsuwan, Chalita. "Algorithms and combinatorics of repetitions in strings". Thesis, King's College London (University of London), 2017. https://kclpure.kcl.ac.uk/portal/en/theses/algorithms-and-combinatorics-of-repetitions-in-strings(a20776fa-6a15-4c37-bf87-505211309fd7).html.

Texto completo

Resumen

Repetitions in strings constitute one of the most fundamental areas of string combinatorics with exactly essential applications to text algorithms, data compression, and also analysis of biological sequences. It is relevant to periodicities, regularities, and compression. The higher compression rate can be obtained from the repetitive behavior of strings, and reversely some compression techniques are at the core of fast algorithms for detecting repetitions. Repetitions are highly periodic factors (or substrings) in strings, there are various type of repetitions such as repeat, repetition, squares, cubes, palindrome, maximal periodicitiie which is also called runs. The aim of this thesis is concentrated on the repetitions in strings in algorithmic and combinatorics approaches as they are very intricate and plenty of interesting works remain as open problems. The critical study of this thesis firstly approach to the maximal periodicities or runs. It presents in Algorithmics of repetitions, local periods and critical factorization. An algorithm is designed in order to compute all runs for a string drawn from an infinite alphabet. On a string of length n, the algorithm runs optimally in time O(n log n) while there is a linear number of runs. The key model of computation is the comparison of letters which is done with the equality operator only. Under the same proposition, another time-optimal algorithm is created. This gives the same running time to compute local periods and all critical factorisations. The prefix table of input strings is applied as the main tool of those algorithms. In this study, we also design a simple algorithm based on the Dictionary of Basic Factors of the input string. The notion of Gapped Palindrome and its Anti-exponent goes toward this research. A palindrome is a string x = a1 · · · an which is equal to its reversal ex = an · · · a1. The definition of a gapped palindromes is given by a string of the form uveu, where u, v are strings, |v| 2, and eu is the reversal of u. Replicating the standard notion of string exponent, we together define the anti-exponent of a gapped palindrome uveu as the quotient of |uveu| by |uv|. In this work, an algorithm is described to compute the maximal anti-exponent of gapped palindromes occurring in an ordinary palindrome-free string. To get an efficient computation of maximal anti-exponent of factors in a palindrome-free string, we apply techniques based on the suffix automaton and the reversed Lempel-Ziv factorisation. The complexity analyse shows that algorithm runs in linear-time on a fixed-size alphabet. Repeats are also of main concern in the domains of text compression and of pattern matching so lastly the study of repeat and its exponents are discussed in this thesis. Here we create linear-time algorithm to compute maximal exponent of repeats occurring in an overlapping-free string. Two main tools for the algorithm are a factorisation of the string and the Suffix Automaton of some factors. Eventually, we obtain the graceful result as the direct consequence from this research. There is the linearity of the number of occurrences of repeats whose exponent is maximal in an overlap-free string. Among all of the previous researches and our further viewpoints in this thesis, acquiring knowledge on repetitions in string remains interesting open questions to continue.

Los estilos APA, Harvard, Vancouver, ISO, etc.

26

Dubois, Simon. "Offline Approximate String Matching forInformation Retrieval : An experiment on technical documentation". Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-22566.

Texto completo

Resumen

Approximate string matching consists in identifying strings as similar even ifthere is a number of mismatch between them. This technique is one of thesolutions to reduce the exact matching strictness in data comparison. In manycases it is useful to identify stream variation (e.g. audio) or word declension (e.g.prefix, suffix, plural). Approximate string matching can be used to score terms in InformationRetrieval (IR) systems. The benefit is to return results even if query terms doesnot exactly match indexed terms. However, as approximate string matchingalgorithms only consider characters (nor context neither meaning), there is noguarantee that additional matches are relevant matches. This paper presents the effects of some approximate string matchingalgorithms on search results in IR systems. An experimental research design hasbeen conducting to evaluate such effects from two perspectives. First, resultrelevance is analysed with precision and recall. Second, performance is measuredthanks to the execution time required to compute matches. Six approximate string matching algorithms are studied. Levenshtein andDamerau-Levenshtein computes edit distance between two terms. Soundex andMetaphone index terms based on their pronunciation. Jaccard similarity calculatesthe overlap coefficient between two strings. Tests are performed through IR scenarios regarding to different context,information need and search query designed to query on a technicaldocumentation related to software development (man pages from Ubuntu). Apurposive sample is selected to assess document relevance to IR scenarios andcompute IR metrics (precision, recall, F-Measure). Experiments reveal that all tested approximate matching methods increaserecall on average, but, except Metaphone, they also decrease precision. Soundexand Jaccard Similarity are not advised because they fail on too many IR scenarios.Highest recall is obtained by edit distance algorithms that are also the most timeconsuming. Because Levenshtein-Damerau has no significant improvementcompared to Levenshtein but costs much more time, the last one is recommendedfor use with a specialised documentation. Finally some other related recommendations are given to practitioners toimplement IR systems on technical documentation.

Los estilos APA, Harvard, Vancouver, ISO, etc.

27

MacLeod, Christopher. "The synthesis of artificial neural networks using single string evolutionary techniques". Thesis, Robert Gordon University, 1999. http://hdl.handle.net/10059/367.

Texto completo

Resumen

The research presented in this thesis is concerned with optimising the structure of Artificial Neural Networks. These techniques are based on computer modelling of biological evolution or foetal development. They are known as Evolutionary, Genetic or Embryological methods. Specifically, Embryological techniques are used to grow Artificial Neural Network topologies. The Embryological Algorithm is an alternative to the popular Genetic Algorithm, which is widely used to achieve similar results. The algorithm grows in the sense that the network structure is added to incrementally and thus changes from a simple form to a more complex form. This is unlike the Genetic Algorithm, which causes the structure of the network to evolve in an unstructured or random way. The thesis outlines the following original work: The operation of the Embryological Algorithm is described and compared with the Genetic Algorithm. The results of an exhaustive literature search in the subject area are reported. The growth strategies which may be used to evolve Artificial Neural Network structure are listed. These growth strategies are integrated into an algorithm for network growth. Experimental results obtained from using such a system are described and there is a discussion of the applications of the approach. Consideration is given of the advantages and disadvantages of this technique and suggestions are made for future work in the area. A new learning algorithm based on Taguchi methods is also described. The report concludes that the method of incremental growth is a useful and powerful technique for defining neural network structures and is more efficient than its alternatives. Recommendations are also made with regard to the types of network to which this approach is best suited. Finally, the report contains a discussion of two important aspects of Genetic or Evolutionary techniques related to the above. These are Modular networks (and their synthesis) and the functionality of the network itself.

Los estilos APA, Harvard, Vancouver, ISO, etc.

28

Rahman, Mohammad Sohel. "Fast and efficient algorithms on strings and sequences". Thesis, King's College London (University of London), 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.617795.

Texto completo

Resumen

The last few decades have witnessed the fascinating outgrowth of the field of String Algorithms. And, with the ever increasing mass of information and rapid pace of dissemination and sharing thereof, the importance and relevance of these algorithms can be expected to grow even further in near future. In this thesis, we have studied a number of interesting problems on strings and sequences and taken an effort to devise efficient algorithms to solve them. The problems we have studied here falls into two main categories of problems, namely the String Matching Problems and the Longest Common Subsequence Problems. The first String matching problem handled in this thesis is the pattern matching problems under the don't care paradigm. Don't care pattern matching problems have been studied since 1974 and the quest for improved algorithms is still on. We present efficient algorithms for different versions of this problem. We also present efficient algorithms for a more general problem, namely the pattern matching problem with degenerate strings. Next, we consider another well-studied problem namely the Swap Matching problem. We present a new graph-theoretic approach to model the problem, which opens a new and hitherto unexplored avenue to solve it. Then, using the model, we devise an efficient algorithm, which works particularly well for short patterns. Computer assisted music analysis and music information retrieval has a number of tasks that can be related to stringology. To this extent, we consider the problem of identifying rhythms in a musical text, which can be used in automatic song classification. Since there doesn't exist a complete agreement on the definitions of related features, we take an effort to present a new paradigm to model this problem and devise an efficient algorithm to solve it. Next, we consider a number ofrelatively new interesting pattern matching problems from indexing point of view. We present efficient indexing schemes for the problem of pattern matching in given intervals, the property matching problem and the circular pattern matching problem. We conclude our study of string matching problems by focusing on devising an efficient data structure to index gapped factors. In the second part of this thesis, we study the longest common subsequence (LCS) problems and variants thereof. We first introduce a number of new LCS variants by applying the notion of gap-constraints. The new variants are motivated by practical applications from computational molecular biology and we present efficient algorithms to solve them. Then, using the techniques developed while solving these variants, we present efficient algorithms for the classic LCS problem and for another relatively new variant, namely, the Constrained LCS (CLCS) problem. Finally, we efficiently solve the LCS and CLCS problems for degenerate strings.

Los estilos APA, Harvard, Vancouver, ISO, etc.

29

Silva, Júnior José Bonifácio da. "Paralelização em CUDA do algoritmo Aho-Corasick utilizando as hierarquias de memórias da GPU e nova compactação da Tabela de Transcrição de Estados". Universidade Federal de Sergipe, 2017. https://ri.ufs.br/handle/riufs/3353.

Texto completo

Resumen

The Intrusion Detection System (IDS) needs to compare the contents of all packets arriving at the network interface with a set of signatures for indicating possible attacks, a task that consumes much CPU processing time. In order to alleviate this problem, some researchers have tried to parallelize the IDS's comparison engine, transferring execution from the CPU to GPU. This This dissertation aims to parallelize the Brute Force and Aho-Corasick string matching algorithms and to propose a new compression of the State Transition Table of the Aho-Corasick algorithm in order to make it possible to use it in shared memory and accelerate the comparison of strings. The two algorithms were parallelized using the NVIDIA CUDA platform and executed in the GPU memories to allow a comparative analysis of the performance of these memories. Initially, the AC algorithm proved to be faster than the Brute Force algorithm and so it was followed for optimization. The AC algorithm was compressed and executed in parallel in shared memory, achieving a performance gain of 15% over other GPU memories and being 48 times faster than its serial version when testing with real network packets. When the tests were done with synthetic data (less random data) the gain reached 73% and the parallel algorithm was 56 times faster than its serial version. Thus, it can be seen that the use of compression in shared memory becomes a suitable solution to accelerate the processing of IDSs that need agility in the search for patterns.
Um Sistema de Detecção de Intrusão (IDS) necessita comparar o conteúdo de todos os pacotes que chegam na interface da rede com um conjunto de assinaturas que indicam possíveis ataques, tarefa esta que consome bastante tempo de processamento da CPU. Para amenizar esse problema, tem-se tentado paralelizar o motor de comparação dos IDSs transferindo sua execução da CPU para a GPU. Esta dissertação tem como objetivo fazer a paralelização dos algoritmos de comparação de strings Força-Bruta e Aho-Corasick e propor uma nova compactação da Tabela de Transição de Estados do algoritmo Aho-Corasick a fim de possibilitar o uso dela na memória compartilhada e acelerar a comparação de strings. Os dois algoritmos foram paralelizados utilizando a plataforma CUDA da NVIDIA e executados nas memórias da GPU a fim de possibilitar uma análise comparativa de desempenho dessas memórias. Inicialmente, o algoritmo AC mostrou-se mais veloz do que o algoritmo Força-Bruta e por isso seguiu-se para sua otimização. O algoritmo AC foi compactado e executado de forma paralela na memória compartilhada, alcançando um ganho de desempenho de 15% em relação às outras memórias da GPU e sendo 48 vezes mais rápido que sua versão na CPU quando os testes foram feitos com pacotes de redes reais. Já quando os testes foram feitos com dados sintéticos (dados menos aleatórios) o ganho chegou a 73% e o algoritmo paralelo chegou a ser 56 vezes mais rápido que sua versão serial. Com isso, pode-se perceber que o uso da compactação na memória compartilhada torna-se uma solução adequada para acelerar o processamento de IDSs que necessitem de agilidade na busca por padrões.

Los estilos APA, Harvard, Vancouver, ISO, etc.

30

Yahiaoui, Said. "Algorithmes et applications pour la coloration et les alliances dans les graphes". Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10274.

Texto completo

Resumen

Dans cette thèse, nous nous intéressons aux aspects algorithmiques et applications de deux problèmes de graphes, à savoir, la coloration et les alliances. La première partie concerne deux variantes de la coloration de graphes, la coloration Grundy et la coloration forte stricte. Nous commençons par l'étude du nombre Grundy des graphes réguliers. Nous donnons une condition fixe k, nous fournissons une condition nécessaire et suffisante pour que le nombre Grundy d'un graphe régulier soit au moins égal k. Nous caractérisons la classe des graphes cubiques (3-réguliers) pour laquelle le nombre Grundy est égal à 4, et nous présentons un algorithme linéaire pour déterminer le nombre Grundy d'un graphe cubique quelconque. Par ailleurs, en se basant sur la coloration forte stricte pour décomposer les arbres en petites composantes, nous présentons un nouvel algorithme pour l'appariement d'arbres étiquetés, non-ordonnés non-enracinés. Nous montrons que la distance calculée entre deux arbres est une pseudo-métrique. Nos expérimentations sur de larges bases synthétiques et des bases de données réelles confirment nos résultats analytiques et montrent que la distance proposée est précise et son algorithme est scalable. La seconde partie de cette thèse est consacrée aux alliances dans les graphes. Nous proposons un algorithme distribué autostabilisant pour la construction d'alliance offensive globale minimale dans un graphe arbitraire. Nous démontrons que cet algorithme converge sous le démon synchrone en temps linéaire. Ensuite, nous donnons le premier algorithme distribué autostabilisant pour le problème de l'alliance forte globale minimale dans un graphe quelconque. Nous prouvons que cet algorithme est polynomial sous le démon inéquitable distribué. Nous montrons par la suite, comment cet algorithme peut être adapté pour des généralisations du problème, comme la k-alliance forte et l'alliance forte pondérée. Enfin, en se basant sur les propriétés structurelles de l'alliance offensive, nous présentons une solution pour décentraliser le protocole de signalisation SIP. Ceci rend possible son déploiement dans un réseau mobile ad hoc
This thesis investigates the algorithmic aspects and applications of two graph problems, namely, colorings and alliances. In the first part, we focus on two variants of the proper vertex coloring, the Grundy coloring and the strict strong coloring. We start by the study of Grundy number for regular graphs. We give a sufficient condition for d-regular graphs with sufficiently large girth to have Grundy number equals d + 1. Then, using graph homomorphism, we obtain a necessary and sufficient condition for d-regular graphs to have Grundy number at least k. Moreover, we characterize cubic graphs (3-regular) for which the Grundy number is d + 1, and present a linear-time algorithm to determine the Grundy number of any arbitrary cubic graph. Subsequently, based on the strict strong coloring, we present an approach for the problem of matching labeled trees. Using this coloring, we propose a new algorithm to deterministically decompose a tree into small components. This leads to an efficient algorithm to measure an accurate distance between unrooted unordered labeled trees. The second part is devoted to the alliances in graphs. We first propose a linear-time self-stabilizing algorithm for the minimal global offensive alliance set problem, under the synchronous distributed scheduler. Then, we give the first self-stabilizing algorithm for the minimal global powerful alliance set problem in arbitrary graphs. Moreover, we show how this algorithm can be adapted to find the minimal global powerful k-alliance and the minimal weighted global powerful alliance sets. We prove that all these algorithms converge in polynomial-time under the unfair distributed scheduler. Finally, based on the structural properties of the offensive alliance, we propose a solution to decentralize the signaling protocol SIP. This enables SIP applications in mobile ad hoc networks

Los estilos APA, Harvard, Vancouver, ISO, etc.

31

Frey, Jeffrey Daniel. "Finding Song Melody Similarities Using a DNA String Matching Algorithm". Kent State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=kent1208961242.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

32

Sawada, Joseph James. "Fast algorithms to generate restricted classes of strings under rotation". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0015/NQ48228.pdf.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

33

Bavirisetty, Rambabu 1963. "COMPARISON OF STRESS RECOVERY ALGORITHMS". Thesis, The University of Arizona, 1987. http://hdl.handle.net/10150/276405.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

34

Kotwal, Mithilesh N. "Encoding of trellises with strong tailbiting property". Ohio : Ohio University, 2005. http://www.ohiolink.edu/etd/view.cgi?ohiou1113584078.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

35

Baker, Audrey. "An algorithm for the strong freeness of quadratic lie polynomials /". Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100761.

Texto completo

Resumen

The Lie algebra associated to the lower central and p-central series of a group is an important invariant of the group but is difficult to compute. For a finitely presented group this Lie algebra, can be determined under a certain condition on the initial forms of the relators, namely that of strong freeness. We give an algorithm for the strong freeness of 4 quadratic Lie polynomials in 4 variables over an arbitrary field thus extending a result of Bush and Labute.

Los estilos APA, Harvard, Vancouver, ISO, etc.

36

Gundu, Pavan Kumar. "Trajectory Tracking Control of Unmanned Ground Vehicles using an Intermittent Learning Algorithm". Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/93213.

Texto completo

Resumen

Traffic congestion and safety has become a major issue in the modern world's commute. Congestion has been causing people to travel billions of hours more and to purchase billions of gallons of fuel extra which account to congestion cost of billions of dollars. Autonomous driving vehicles have been one solution to this problem because of their huge impact on efficiency, pollution, and human safety. Also, extensive research has been carried out on control design of vehicular platoons because a further improvement in traffic throughput while not compromising the safety is possible when the vehicles in the platoon are provided with better predictive abilities. Motion control is a key area of autonomous driving research that handles moving parts of vehicles in a deliberate and controlled manner. A widely worked on problem in motion control concerned with time parameterized reference tracking is trajectory tracking. Having an efficient and effective tracking algorithm embedded in the autonomous driving system is the key for better performance in terms of resources consumed and tracking error. Many tracking control algorithms in literature rely on an accurate model of the vehicle and often, it can be an intimidating task to come up with an accurate model taking into consideration various conditions like friction, heat effects, ageing processes etc. And typically, control algorithms rely on periodic execution of the tasks that update the control actions, but such updates might not be required, which result in unnecessary actions that waste resources. The main focus of this work is to design an intermittent model-free optimal control algorithm in order to enable autonomous vehicles to track trajectories at high-speeds. To obtain a solution which is model-free, a Q-learning setup with an actor-network to approximate the optimal intermittent controller and a critic network to approximate the optimal cost, resulting in the appropriate tuning laws is considered.
Master of Science
A risen research effort in the area of autonomous vehicles has been witnessed in the past few decades because these systems improve safety, comfort, transport time and energy consumption which are some of the main issues humans are facing in the modern world’s highway systems. Systems like emergency braking, automatic parking, blind angle vehicle detection are creating a safer driving environment in populated areas. Advanced driver assistance systems (ADAS) are what such kind of systems are known as. An extension of these partially automated ADAS are vehicles with fully automated driving abilities, which are able to drive by themselves without any human involvement. An extensively proposed approach for making traffic throughput more efficient on existing highways is to assemble autonomous vehicles into platoons. Small intervehicle spacing and many vehicles constituting each platoon formation improve the traffic throughput significantly. Lately, the advancements in computational capabilities, in terms of both algorithms and hardware, communications, and navigation and sensing devices contributed a lot to the development of autonomous systems (both single and multiagent) that operate with high reliability in uncertain/dynamic operating conditions and environments. Motion control is an important area in the autonomous vehicles research. Trajectory-tracking is a widely studied motion control scenario which is about designing control laws that force a system to follow some time-dependent reference path and it is important to have an effective and efficient trajectory-tracking control law in an autonomous vehicle to reduce the resources consumed and tracking error. The goal of this work is to design an intermittent model-free trajectory tracking control algorithm where there is no need of any mathematical model of the vehicle system being controlled and which can reduce the controller updates by allowing the system to evolve in an open loop fashion and close the loop only when an user defined triggering condition is satisfied. The approach is energy efficient in that the control updates are limited to instances when they are needed rather than unnecessary periodic updates. Q-learning which is a model-free reinforcement learning technique is used in the trajectory tracking motion control algorithm to make the vehicles track their respective reference trajectories without any requirement of their motion model, the knowledge of which is generally needed when dealing with a motion control problem. The testing of the designed algorithm in simulations and experiments is presented in this work. The study and development of a vehicle platform in order to perform the experiments is also discussed. Different motion control and sensing techniques are presented and used. The vehicle platform is shown to track a reference trajectory autonomously without any human intervention, both in simulations and experiments, proving the effectiveness of the proposed algorithm.

Los estilos APA, Harvard, Vancouver, ISO, etc.

37

Momeninasab, Leila. "Design and Implementation of a Name Matching Algorithm for Persian Language". Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-102210.

Texto completo

Resumen

Name matching plays a vital and crucial role in many applications. They are for example used in information retrieval or deduplication systems to do comparisons among names to match them together or to find the names that refer to identical objects, persons, or companies. Since names in each application are subject to variations and errors that are unavoidable in any system and because of the importance of name matching, so far many algorithms have been developed to handle matching of names. These algorithms consider the name variations that may happen because of spelling, pattern or phonetic modifications. However most existing methods were developed for use with the English language and so cover the characteristics of this language. Up to now no specific one has been designed and implemented for the Persian language. The purpose of this thesis is to present a name matching algorithm for Persian. In this project, after consideration of all major algorithms in this area, we selected one of the basic methods for name matching that we then expanded to make it work particularly well for Persian names. This proposed algorithm, called Persian Edit Distance Algorithm or shortly PEDA, was built based on the characteristics of the Persian language and it compares Persian names with each other on three levels: phonetic similarity, character form similarity and keyboard distance, in order to give more accurate results for Persian names. The algorithm gets Persian names as its input and determines their similarity as a percentage in the output. In this thesis three series of experiments have been accomplished in order to evaluate the proposed algorithm. The f-measure average shows a value of 0.86 for the first series and a value of 0.80 for the second series results. The first series of experiments have been repeated with Levenshtein as well, and have 33.9% false negatives on average while PEDA has a false negative average of 6.4%. The third series of experiments shows that PEDA works well for one edit, two edits and three edits with true positive average values of 99%, 81%, and 69% respectively.

Los estilos APA, Harvard, Vancouver, ISO, etc.

38

Magnanti, Thomas L. y Rita Vachani. "A Strong Cutting Plane Algorithm for Production Scheduling with Changeover Costs". Massachusetts Institute of Technology, Operations Research Center, 1987. http://hdl.handle.net/1721.1/5192.

Texto completo

Resumen

Changeover costs (and times) are central to numerous manufacturing operations. These costs arise whenever work centers capable of processing only one product at a time switch from the manufacture of one product to another. Although many researchers have contributed to the solution of scheduling problems that include changeover costs, due to the problem's combinatorial explosiveness, optimization-based methods have met with limited success. In this paper, we develop and apply polyhedral methods from integer programming for a dynamic version of the problem. Computational tests with problems containing one to five products (and up to 225 integer variables) show that polyhedral methods based upon a set of facet inequalities developed in this paper can effectively reduce the gap between the value of an integer program formulation of the problem and its linear programming relaxation (by a factor of 94 to 100 per cent). These results suggest the use of a combined cutting plane/branch and bound procedure as a solution approach. In a test with a five product problem, this procedure, when compared with a standard linear programming-based branch and bound approach, reduced computation time by a factor of seven.

Los estilos APA, Harvard, Vancouver, ISO, etc.

39

Jackson, Robert Charles. "Data detection algorithms for perpendicular magnetic recording in the presence of strong media noise". Thesis, University of Warwick, 2008. http://wrap.warwick.ac.uk/851/.

Texto completo

Resumen

As the throughput and density requirements increase for perpendicular magnetic recording channels, the presence of strong media noise degrades performance. Detection algorithms have been developed that increase performance in channels with strong media noise through the use of data dependent detectors. However optimal data dependent detectors are exponentially more complex than data independent detectors, and therefore cannot be fully exploited. In this thesis we shall discuss the existing detection algorithms, comparing the performance against the complexity. We then introduce a new sub-optimal detection algorithm, which employs a simple pre-detector that supplies estimates to a main detector. Numerical simulations are performed which show near optimal performance, but without the exponential increase in complexity. We will also show how detector implementations can exploit structure in the trellis to further reduce complexity, through loops and path invariants. An analytical means of measuring bit error rate from only the statistics of noise is presented, and this is utilised to optimally determine the equaliser and ISI target coefficients for a white noise Viterbi detector. Finally, we introduce a new class of VLSI binary addition algorithms which can be utilised to increase the throughput of a Viterbi detector, but which also has a wider application in hardware design.

Los estilos APA, Harvard, Vancouver, ISO, etc.

40

Foster, Carol Lynn. "Algorithms, abstraction and implementation : a massively multilevel theory of strong equivalence of complex systems". Thesis, University of Edinburgh, 1991. http://hdl.handle.net/1842/6591.

Texto completo

Resumen

This thesis puts forward a formal theory of levels and algorithms to provide a foundation for those terms as they are used in much of cognitive science and computer science. Abstraction with respect to concreteness is distinguished from abstraction with respect to detail, resulting in three levels of concreteness and a large number of algorithmic levels, which are levels of detail and the primary focus of the theory. An algorithm or ideal machine is a set of sequences of states defining a particular level of detail. Rather than one fundamental ideal machine to describe the behaviour of a complex system, there are many possible ideal machines, extending Turing's approach to reflect the multiplicity of system descriptions required to express more than weak input-output equivalence of systems. Cognitive science is concerned with stronger equivalence; e.g., do two models go through the same states at some level of description? The state-based definition of algorithms serves as a basis for such strong equivalence and facilitates formal renditions of abstraction and implementation as relations between algorithms. It is possible to prove within the new framework whether or not one given algorithm is a valid implementation of another, or whether two unequal algorithms have a common abstraction, for example. Some implications of the theory are discussed, notably a characterisation of connectionist versus classical models.

Los estilos APA, Harvard, Vancouver, ISO, etc.

41

Martin, Florent de. "Influence of the nonlinear behaviour of soft soils on strong ground motions". Thesis, Châtenay-Malabry, Ecole centrale de Paris, 2010. http://www.theses.fr/2010ECAP0013/document.

Texto completo

Resumen

Le comportement nonlinéaire des sols observé lors des mouvements sismiques forts est maintenant bien admis et le déploiement des puits accélérométriques a permis des analyses détaillées de la propagation des ondes ainsi qu’une évaluation quantitative des paramètres physiques tels que la vitesse de cisaillement et de compression des ondes et les facteurs d’amortissements en fonction de la déformation. En dépit du nombre grandissant d’études sur ce phénomène, sa connaissance est encore récente et les recherches sur les données de puits accélérométriques restent une étape importante vers la compréhension du comportement complexe in-situ des sédiments soumis à des mouvements sismiques forts.L’objectif de ces travaux est triple. Premièrement, un code d’inversion par algorithme génétique est développé afin d’inverser des données de puits accélérométriques via la théorie des matrices de propagation de Thomson-Haskell. Cette technique nous permet dans un premier temps de valider la structure en une dimension (1D) (e.g., vitesse des ondes de cisaillement, facteurs d’ amortissements) d’un puits accélérométrique dans le domaine linéaire et dans un second temps de mettre en évidence de manière quantitative le comportement nonlinéaire des sédiments lors du séisme de Fukuoka, 2005, Japon. Deuxièmement, les résultats de l’inversion sont utilisés pour tester des lois de comportement simples et avancées en utilisant la Méthode des éléments Finis. Les résultats montrent clairement que l’hypothèse bi-linéaire de la loi de comportement simple produit des séries temporelles non réalistes en vitesse et en accélération. L’utilisation d’une loi de comportement avancée mène à de meilleurs résultats, cependant, le nombre de paramètres ajustables pour obtenir des résultats consistants avec l’observation est un obstable inévitable. Troisièmement, afin d’étendre l’étude des effets de site à des dimensions supérieures, des codes 2D et 3D de la Méthode en éléments Spectraux sont développés et validés en comparant leurs résultats dans le domaine linéaire avec ceux obtenus théoriquement ou via d’autres méthodes numériques
Nonlinear behavior of soft soils observed during strong ground motions isnow well established and the deployment of vertical arrays (i.e., boreholestations) has contributed to detailed wave propagation analyses and the assessmentfor quantitative physical parameters such as shear-wave velocity,pressure-wave velocity and damping factors with respect to shear strain levels.Despite the growing number of studies on this phenomena, its knowledgeis still recent and research on borehole station data remains an importantstep toward the understanding of the complex in-situ behavior of soft sedimentssubjected to strong ground motions.The purpose of this work is threefold. First, an inversion code by geneticalgorithm is developed in order to inverse borehole stations data viathe Thomson-Haskell propagator matrix method. This technique allows usto validate the one-dimensional (1D) structure (e.g., shear-wave velocity,damping factors) of a borehole in the linear elastic domain and to showquantitative evidence of the nonlinear behavior of the soft sediments duringthe 2005 Fukuoka Prefecture western offshore earthquake, Japan. Second,the results of the inversion are used in order to test simple and advancedconstitutive laws using the Finite Elements Method. The results clearlyshow that the bi-linear assumption of the simple constitutive law producesunrealistic velocity and acceleration time histories. The use of the advancedconstitutive law leads to better results, however, the number of parametersto be tuned in order to obtain results consistent with the observation is anunavoidable obstacle. Third, in order to extend the study of site effects tohigher dimensions, 2D and 3D codes of the very efficient Spectral ElementsMethod are developed and validated by comparing their results in the lineardomain with those obtained theoretically or with other numerical methods

Los estilos APA, Harvard, Vancouver, ISO, etc.

42

BERNARDINI, GIULIA. "COMBINATORIAL METHODS FOR BIOLOGICAL DATA". Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021. http://hdl.handle.net/10281/305220.

Texto completo

Resumen

Lo scopo di questa tesi è di elaborare e analizzare metodi rigorosi dal punto di vista matematico per l’analisi di due tipi di dati biologici: dati relativi a pan-genomi e filogenesi. Con il termine “pan-genoma” si indica, in generale, un insieme di sequenze genomiche strettamente correlate (tipicamente appartenenti a individui della stessa specie) che si vogliano utilizzare congiuntamente come sequenze di riferimento per un’intera popolazione. Una filogenesi, invece, rappresenta le relazioni evolutive in un gruppo di entità, che siano esseri viventi, geni, lingue naturali, manoscritti antichi o cellule tumorali. Con l’eccezione di uno dei risultati presentati in questa tesi, relativo all’analisi di filogenesi tumorali, il taglio della dissertazione è prevalentemente teorico: lo scopo è studiare gli aspetti combinatori dei problemi affrontati, più che fornire soluzioni efficaci in pratica. Una conoscenza approfondita degli aspetti teorici di un problema, del resto, permette un'analisi matematicamente rigorosa delle soluzioni già esistenti, individuandone i punti deboli e quelli di forza, fornendo preziosi dettagli sul loro funzionamento e aiutando a decidere quali problemi vadano ulteriormente investigati. Oltretutto, è spesso il caso che nuovi risultati teorici (algoritmi, strutture dati o riduzioni ad altri problemi più noti) si possano direttamente applicare o adattare come soluzione ad un problema pratico, o come minimo servano ad ispirare lo sviluppo di nuovi metodi efficaci in pratica. La prima parte della tesi è dedicata a nuovi metodi per eseguire delle operazioni fondamentali su un testo elastico-degenerato, un oggetto computazionale che codifica in maniera compatta un insieme di testi simili tra loro, come, ad esempio, un pan-genoma. Nello specifico, si affrontano il problema di cercare una sequenza di lettere in un testo elastico-degenerato, sia in maniera esatta che tollerando un numero prefissato di errori, e quello di confrontare due testi degenerati. Nella seconda parte si considerano sia filogenesi tumorali, che ricostruiscono per l'appunto l'evoluzione di un tumore, sia filogenesi "classiche", che rappresentano, ad esempio, la storia evolutiva delle specie viventi. In particolare, si presentano nuove tecniche per confrontare due o più filogenesi tumorali, necessarie per valutare i risultati di diversi metodi che ricostruiscono le filogenesi stesse, e una nuova e più efficiente soluzione a un problema di lunga data relativo a filogenesi "classiche", consistente nel determinare se sia possibile sistemare, in presenza di dati mancanti, un insieme di specie in un albero filogenetico che abbia determinate proprietà.
The main goal of this thesis is to develop new algorithmic frameworks to deal with (i) a convenient representation of a set of similar genomes and (ii) phylogenetic data, with particular attention to the increasingly accurate tumor phylogenies. A “pan-genome” is, in general, any collection of genomic sequences to be analyzed jointly or to be used as a reference for a population. A phylogeny, in turn, is meant to describe the evolutionary relationships among a group of items, be they species of living beings, genes, natural languages, ancient manuscripts or cancer cells. With the exception of one of the results included in this thesis, related to the analysis of tumor phylogenies, the focus of the whole work is mainly theoretical, the intent being to lay firm algorithmic foundations for the problems by investigating their combinatorial aspects, rather than to provide practical tools for attacking them. Deep theoretical insights on the problems allow a rigorous analysis of existing methods, identifying their strong and weak points, providing details on how they perform and helping to decide which problems need to be further addressed. In addition, it is often the case where new theoretical results (algorithms, data structures and reductions to other well-studied problems) can either be directly applied or adapted to fit the model of a practical problem, or at least they serve as inspiration for developing new practical tools. The first part of this thesis is devoted to methods for handling an elastic-degenerate text, a computational object that compactly encodes a collection of similar texts, like a pan-genome. Specifically, we attack the problem of matching a sequence in an elastic-degenerate text, both exactly and allowing a certain amount of errors, and the problem of comparing two degenerate texts. In the second part we consider both tumor phylogenies, describing the evolution of a tumor, and “classical” phylogenies, representing, for instance, the evolutionary history of the living beings. In particular, we present new techniques to compare two or more tumor phylogenies, needed to evaluate the results of different inference methods, and we give a new, efficient solution to a longstanding problem on “classical” phylogenies: to decide whether, in the presence of missing data, it is possible to arrange a set of species in a phylogenetic tree that enjoys specific properties.

Los estilos APA, Harvard, Vancouver, ISO, etc.

43

Barton, Carl Samuel. "Algorithmic problems in strings with applications to the analysis of biological sequences". Thesis, King's College London (University of London), 2015. http://kclpure.kcl.ac.uk/portal/en/theses/algorithmic-problems-in-strings-with-applications-to-the-analysis-of-biological-sequences(461c8961-c256-4ff8-97f7-c0718709367d).html.

Texto completo

Resumen

Recent advances in molecular biology have dramatically changed the way biological data analysis is performed [119, 92]. Next-Generation Sequenc- ing (NGS) technologies produce high-throughput data of highly controlled quality, hundreds of times faster and cheaper than a decade ago. Mapping of short reads to a reference sequence is a fundamental problem in NGS technologies. After finding an occurrence of a high quality fragment of the read, the rest must be approximately aligned, but a good alignment would not be expected to contain a large number of gaps (consecutive insertions or deletions). We present an alternative alignment algorithm which computes the optimal alignment with a bounded number of gaps. Another problem arising from NGS technologies is merging overlapping reads into a single string. We present a data structure which allows for the efficient computation of the overlaps between two strings as well as being applicable to other problems. Weighted strings are a representation of data that allows for a subtle representation of ambiguity in strings. In this document we present algorithms for other problems related to weighted strings: the computation of exact and approximate inverted repeats in weighted strings, computing repetitions and computing covers. We investigate the average-case complexity of wildcard matching. Wildcards can be used to model single nucleotide polymorphisms and so, efficient algorithms to search for strings with wildcards are necessary. In this document we investigate how efficient algorithms for this problem can be on average. There exist many organisms such as viruses, bacteria, eukaryotic cells, and archaea which have a circular DNA structure. If a biologist wishes to find occurrences of a particular virus in a carriers DNA sequence which may not be circular it must be possible to efficiently locate occurrences of circular strings. In this document we present a number of algorithms for circular string matching.

Los estilos APA, Harvard, Vancouver, ISO, etc.

44

Girard, Alexandre 1987. "Fast and strong lightweight robots based on variable gear ratio actuators and control algorithms leveraging the natural dynamics". Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/111689.

Texto completo

Resumen

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mechanical Engineering, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 191-196).
In many applications, robots have to bear large loads while moving slowly and also have to move quickly through the air with almost no load. These type of bi-modal tasks, with conflicting requirements in terms of operating speeds and desired impedances, often lead to the use of oversized and inefficient actuators which are inhibitory particularly for mobile robots. Multiple gear ratios, like in a powertrain, address this issue by allowing an effective use of power over a wide range of output speeds, by enabling significant changes to the reflected intrinsic actuator impedances and by making possible the leveraging or attenuation of the natural load dynamics. This thesis aims to develop the technological solutions needed to use variable gear ratio actuators and exploit the advantages of variable transmissions in a robotic context. First, by addressing the issue of how to make fast and seamless gearshifts between two very different reduction ratios under diverse load conditions, with a solution based on a dual-motor actuator architecture and a control scheme using the null space. Second, by developing control algorithms that select optimal gear ratios dynamically based on state feedback, to move with minimal motor torques and to adjust the output impedance appropriately given a task. The proposed approach exploit variable transmissions not merely for increasing maximum torque and speed, but also to significantly alter the dynamic properties, including load sensitivity, robustness, and backdrivability. Simulations and experiments using a novel lightweight robotic arm using three custom-built dual-speed dual-motor actuators are presented. Results demonstrate very fast gear shifting in highly dynamic situations with dual-speed dual-motor actuators, and show that actively changing gear ratios using the proposed control algorithms can lead to an order-of-magnitude reduction of necessary motor torque and power.
by Alexandre Girard.
Ph. D.

Los estilos APA, Harvard, Vancouver, ISO, etc.

45

Shih, Hung-Te y 施鴻德. "A Survey of String-Matching Algorithms". Thesis, 2004. http://ndltd.ncl.edu.tw/handle/18193520008520957794.

Texto completo

Resumen

碩士
逢甲大學
應用數學所
92
We study the algorithms for pattern matching, i.e., searching a query substring in a given string, which appears in the text processing, image processing, and other applications. The algorithms cover the Direct String-Matching algorithms, the Rabin-Karp algorithm, the Finite Automata, the Knuth-Morris-Pratt algorithm, the Boyer-Moore algorithm, etc. The time complexity, worst-case performance and the criteria for choosing these algorithms are also studied.

Los estilos APA, Harvard, Vancouver, ISO, etc.

46

Kuang, Hsueh Jui y 薛睿光. "A Study on String-Matching Algorithms". Thesis, 2003. http://ndltd.ncl.edu.tw/handle/45412037298316276441.

Texto completo

Resumen

碩士
大同大學
應用數學研究所
91
In this thesis, we give experimental results on the performance of several popular string-matching algorithms. Meanwhile the behaviors of the costs of some variant Boyer-Moore algorithms are investigated. The costs considered include the numbers of character inspections, and the total time used.

Los estilos APA, Harvard, Vancouver, ISO, etc.

47

Chen, Kuei-Hao y 陳奎昊. "Improved Algorithms for Exact String Matching Problems". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/64377810416197972394.

Texto completo

Resumen

博士
國立暨南國際大學
資訊工程學系
101
In this dissertation, we consider two problems: the exact string matching problem and its variation, the exact circular string matching. The exact string matching problem is: Given two strings, a text $T$ of length $n$ and a pattern $P$ of length $m$, find all occurrences of $P$ in $T$. We propose a strategy to analyze the average performance of the reverse factor algorithm. The analysis is based on the assumption that the text is very long as compared to the length of the pattern, and each symbol in the text is drawn uniformly from a random source with $sigma$ symbols. Our analysis uses only elementary techniques in probability theory and avoids applying complicated combinatorics in stringology. We also propose a new algorithm for exact string matching problem under the name Improved BNDM algorithm. The Improved BNDM algorithm uses the $q$-gram filtering technique to speed up the performance of the Turbo BNDM algorithm. The time complexity of the Improved BNDM algorithm achieves $mathcal{O}(n)$ in the worst case and $mathcal{O}(nlog_sigma m/m)$ in the average case where $sigma$ is the alphabet size. It is optimal in both worst case and average case. Another problem we discuss in this dissertation is the exact circular string matching problem. Given a string $P=p_1p_2cdots p_m$, let a string $P^{(i)}=p_ip_{i+1}cdots p_mp_1cdots p_{i-1}$. The exact circular string matching problem is: Given two strings, $T$ of length $n$ and $P$ of length $m$, find all occurrences of $P^{(i)}$ in $T$ for $1leq i leq m$. We propose two algorithms that perform searching of a circular string in the text using bit-parallel technique. Our algorithms use only the composition of bitwise-logical operations and basic arithmetic operations, and apply this technique to solve the problem. These algorithms are given names CSBNDM and $ ext{CSBNDN}_q$, respectively. We give several experiments to verify that they have good performance for random strings and DNA sequence.

Los estilos APA, Harvard, Vancouver, ISO, etc.

48

Lu, Chia Wei y 呂嘉維. "Efficient Exact and Approximate String Matching Algorithms". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/49922113552090356808.

Texto completo

Resumen

博士
國立清華大學
資訊工程學系
102
In this thesis, we first propose two algorithms for exact string matching problem, which aims to find all the positions i's in a given text where a given pattern occurs. Our algorithms find an optimal selective comparing order of the characters of the pattern so that we could have a better performance in the searching phase. To find the optimal comparing order, we adopt the branch and bound approach. Moreover, our proposed algorithm can be combined with other existing exact string matching algorithms to improve the searching efficiency. The experimental results show that our algorithms indeed have the smallest number of character comparisons and are also efficient in time as compared with other existing exact string matching algorithms. Second, we propose a new filtration algorithm, as well as a hybrid filtration strategy, to efficiently solve the approximate string matching problem (also called the k-difference problem), which aims to find all the positions i's in a given text such that there exists a substring of the text ending at position i whose edit distance from a given pattern is less than or equal to a given error bound k. Our experimental results on simulated datasets of DNA sequences show that when compared with other filtration algorithms, our filtration algorithm has better performance on the efficiency to filter out those positions of the text at which the pattern does not occur approximately. Moreover, our hybrid filtration strategy further improves the effectiveness of our filtration algorithm. Third, we propose a progressive approach to solve the DNA resequencing problem which is defined as follows: We are given an unknown DNA sequence X and a known reference sequence R. Our task is to see whether X and R are similar or not. The present popular approach is to break up X into subsequences by the next generation sequencing (NGS) technologies, called reads. We then map the reads of X onto R with a suitable error bound. However, if the similarity between X and R is not very high (<95%), there would be many reads unmapped, and we then cannot obtain the mutations inside the unmapped regions. One can use a large error bound to increase the number of reads mapped. But it is not a good solution because increasing error bound will also increase the probability of false positive mapping. Our approach uses a small error bound and to increase the number of reads mapped, our approach modifies R each time after the reads are mapped. Thus our approach is a progressive approach. Compared with other available tools, our approach allows us to be able to map more reads to the reference sequence. In our simulated experiments, we also show the high correctness of our mapping algorithm.

Los estilos APA, Harvard, Vancouver, ISO, etc.

49

Lu, Chia-Wei y 呂嘉維. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method". Thesis, 2008. http://ndltd.ncl.edu.tw/handle/17632385982720975496.

Texto completo

Resumen

碩士
國立暨南國際大學
資訊工程學系
96
In this thesis, we consider two problems, exact string matching and approximate string matching problem. The exact string matching problem is to determine all of the locations of a pattern string P appearing in a text string T. We first point out a uniqueness property of a given pattern. We then propose four algorithms to solve the exact string matching problem based upon the uniqueness property. Experimental results showed that three of our algorithms are faster than the KMP and Boyer-Moore algorithms. The approximate string matching problem is defined as follows: Given a text string T, a pattern string P and an error bound k, find all substrings of T whose edit distances with P are smaller than or equal to k. We give a method to eliminate candidate locations in text T as there can be no substring S starting from those locations such that the edit distance between S and pattern P is smaller than or equal to k. Our method is simple to implement. Experimental results show that our method is effective, especially in the case when we perform natural language searching.

Los estilos APA, Harvard, Vancouver, ISO, etc.

50

Lu, Chia-Wei. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method". 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200815183100.

Texto completo

Los estilos APA, Harvard, Vancouver, ISO, etc.

Tesis sobre el tema "String algorithms"

Crea una cita precisa en los estilos APA, MLA, Chicago, Harvard y otros