Дисертації з теми "Approximate string matching problem"
Оформте джерело за APA, MLA, Chicago, Harvard та іншими стилями
Ознайомтеся з топ-43 дисертацій для дослідження на тему "Approximate string matching problem".
Біля кожної праці в переліку літератури доступна кнопка «Додати до бібліографії». Скористайтеся нею – і ми автоматично оформимо бібліографічне посилання на обрану працю в потрібному вам стилі цитування: APA, MLA, «Гарвард», «Чикаго», «Ванкувер» тощо.
Також ви можете завантажити повний текст наукової публікації у форматі «.pdf» та прочитати онлайн анотацію до роботи, якщо відповідні параметри наявні в метаданих.
Переглядайте дисертації для різних дисциплін та оформлюйте правильно вашу бібліографію.
Cheng, Lok-lam, and 鄭樂霖. "Approximate string matching in DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29350591.
Повний текст джерелаNguyen, Hong-Thinh. "Approximate string matching distance for image classification." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4029/document.
Повний текст джерелаThe exponential increasing of the number of images requires efficient ways to classify them based on their visual content. The most successful and popular approach is the Bag of visual Word (BoW) representation due to its simplicity and robustness. Unfortunately, this approach fails to capture the spatial image layout, which plays an important roles in modeling image categories. Recently, Lazebnik et al (2006) introduced the Spatial Pyramid Representation (SPR) which successfully incorporated spatial information into the BoW model. The idea of their approach is to split the image into a pyramidal grid and to represent each grid cell as a BoW. Assuming that images belonging to the same class have similar spatial distributions, it is possible to use a pairwise matching as similarity measurement. However, this rigid matching scheme prevents SPR to cope with image variations and transformations. The main objective of this dissertation is to study a more flexible string matching model. Keeping the idea of local BoW histograms, we introduce a new class of edit distance to compare strings of local histograms. Our first contribution is a string based image representation model and a new edit distance (called SMD for String Matching Distance) well suited for strings composed of symbols which are local BoWs. The new distance benefits from an efficient Dynamic Programming algorithm. A corresponding edit kernel including both a weighting and a pyramidal scheme is also derived. The performance is evaluated on classification tasks and compared to the standard method and several related methods. The new method outperforms other methods thanks to its ability to detect and ignore identical successive regions inside images. Our second contribution is to propose an extended version of SMD replacing insertion and deletion operations by merging operations between successive symbols. In this approach, the number of sub regions ie. the grid divisions may vary according to the visual content. We describe two algorithms to compute this merge-based distance. The first one is a greedy version which is efficient but can produce a non optimal edit script. The other one is an optimal version but it requires a 4th degree polynomial complexity. All the proposed distances are evaluated on several datasets and are shown to outperform comparable existing methods
Marco-Sola, Santiago. "Efficient approximate string matching techniques for sequence alignment." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/460835.
Повний текст джерелаUno de los avances más importantes de los últimos años en el campo de la biotecnología ha sido el desarrollo de las llamadas técnicas de secuenciación de alto rendimiento (high-throughput sequencing, HTS). Debido a las limitaciones técnicas para secuenciar un genoma, las técnicas de alto rendimiento secuencian individualmente billones de pequeñas partes del genoma provenientes de regiones aleatorias. Posteriormente, estas pequeñas secuencias han de ser localizadas en el genoma de referencia del organismo en cuestión. Este proceso se denomina alineamiento - o mapeado - y consiste en identificar aquellas regiones del genoma de referencia que comparten una alta similaridad con las lecturas producidas por el secuenciador. De esta manera, en cuestión de horas, la secuenciación de alto rendimiento puede secuenciar un individuo y establecer las diferencias de este con el resto de la especie. En última instancia, estas tecnologías han potenciado nuevos protocolos y metodologías de investigación con un profundo impacto en el campo de la genómica, la medicina y la biología en general. La secuenciación alto rendimiento, sin embargo, supone un reto para los procesos tradicionales de análisis de datos. Debido a la elevada cantidad de datos a analizar, se necesitan nuevas y mejoradas técnicas algorítmicas que puedan escalar con el volumen de datos y producir resultados precisos. Esta tesis aborda dicho problema. Las contribuciones que en ella se realizan se enfocan desde una perspectiva metodológica y otra algorítmica que propone el desarrollo de nuevos algoritmos y técnicas que permitan alinear secuencias de manera eficiente, precisa y escalable. Desde el punto de vista metodológico, esta tesis analiza y propone un marco de referencia para evaluar la calidad de los resultados del alineamiento de secuencias. Para ello, se analiza el origen de los conflictos durante la alineación de secuencias y se exploran los límites alcanzables en calidad con las tecnologías de secuenciación de alto rendimiento. Desde el punto de vista algorítmico, en el contexto de la búsqueda aproximada de patrones, esta tesis propone nuevas técnicas algorítmicas y de diseño de índices con el objetivo de mejorar la calidad y el desempeño de las herramientas dedicadas a alinear secuencias. En concreto, esta tesis presenta técnicas de diseño de índices genómicos enfocados a obtener un acceso más eficiente y escalable. También se presentan nuevas técnicas algorítmicas de filtrado con el fin de reducir el tiempo de ejecución necesario para alinear secuencias. Y, por último, se proponen algoritmos incrementales y técnicas híbridas para combinar métodos de alineamiento y mejorar el rendimiento en búsquedas donde el error esperado es alto. Todo ello sin degradar la calidad de los resultados y con garantías formales de precisión. Para concluir, es preciso apuntar que todos los algoritmos y metodologías propuestos en esta tesis están implementados y forman parte del alineador GEM. Este versátil alineador ofrece resultados de alta calidad en entornos de producción siendo varias veces más rápido que otros alineadores. En la actualidad este software se ofrece gratuitamente, tiene una amplia comunidad de usuarios y ha sido citado en numerosas publicaciones científicas.
Mäkinen, Veli. "Parameterized approximate string matching and local-similarity-based point-pattern matching." Helsinki : University of Helsinki, 2003. http://ethesis.helsinki.fi/julkaisut/mat/tieto/vk/makinen/.
Повний текст джерелаSiragusa, Enrico [Verfasser]. "Approximate string matching for high-throughput sequencing / Enrico Siragusa." Berlin : Freie Universität Berlin, 2015. http://d-nb.info/1074404882/34.
Повний текст джерелаKeng, Leng Hui. "Approximate String Matching With Dynamic Programming and Suffix Trees." UNF Digital Commons, 2006. http://digitalcommons.unf.edu/etd/196.
Повний текст джерелаSmith, David. "Parallel approximate string matching applied to occluded object recognition." PDXScholar, 1987. https://pdxscholar.library.pdx.edu/open_access_etds/3724.
Повний текст джерелаDubois, Simon. "Offline Approximate String Matching forInformation Retrieval : An experiment on technical documentation." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-22566.
Повний текст джерелаPockrandt, Christopher Maximilian [Verfasser]. "Approximate String Matching : Improving Data Structures and Algorithms / Christopher Maximilian Pockrandt." Berlin : Freie Universität Berlin, 2019. http://d-nb.info/1183675879/34.
Повний текст джерелаRichter, Christoph Jan [Verfasser]. "Über die Vermeidung redundanter Betrachtungen beim Approximate String Matching / Christoph Jan Richter." Dortmund : Universitätsbibliothek Technische Universität Dortmund, 2005. http://d-nb.info/1011533499/34.
Повний текст джерелаJupin, Joseph. "Temporal Graph Record Linkage and k-Safe Approximate Match." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/412419.
Повний текст джерелаPh.D.
Since the advent of electronic data processing, organizations have accrued vast amounts of data contained in multiple databases with no reliable global unique identifier. These databases were developed by different departments for different purposes at different times. Organizing and analyzing these data for human services requires linking records from all sources. RL (Record Linkage) is a process that connects records that are related to the identical or a sufficiently similar entity from multiple heterogeneous databases. RL is a data and compute intensive, mission critical process. The process must be efficient enough to process big data and effective enough to provide accurate matches. We have evaluated an RL system that is currently in use by a local health and human services department. We found that they were using the typical approach that was offered by Fellegi and Sunter with tuple-by-tuple processing, using the Soundex as the primary approximate string matching method. The Soundex has been found to be unreliable both as a phonetic and as an approximate string matching method. We found that their data, in many cases, has more than one value per field, suggesting that the data were queried from a 5NF data base. Consider that if a woman has been married 3 times, she may have up to 4 last names on record. This query process produced more than one tuple per database/entity apparently generating a Cartesian product of this data. In many cases, more than a dozen tuples were observed for a single database/entity. This approach is both ineffective and inefficient. An effective RL method should handle this multi-data without redundancy and use edit-distance for approximate string matching. However, due to high computational complexity, edit-distance will not scale well with big data problems. We developed two methodologies for resolving the aforementioned issues: PSH and ALIM. PSH – The Probabilistic Signature Hash is a composite method that increases the speed of Damerau-Levenshtein edit-distance. It combines signature filtering, probabilistic hashing, length filtering and prefix pruning to increase the speed of edit-distance. It is also lossless because it does not lose any true positive matches. ALIM – Aggregate Link and Iterative Match is a graph-based record linkage methodology that uses a multi-graph to store demographic data about people. ALIM performs string matching as records are inserted into the graph. ALIM eliminates data redundancy and stores the relationships between data. We tested PSH for string comparison and found it to be approximately 6,000 times faster than DL. We tested it against the trie-join methods and found that they are up to 6.26 times faster but lose between 10 and 20 percent of true positives. We tested ALIM against a method currently in use by a local health and human services department and found ALIM to produce significantly more matches (even with more restrictive match criteria) and that ALIM ran more than twice as fast. ALIM handles the multi-data problem and PSH allows the use of edit-distance comparison in this RL model. ALIM is more efficient and effective than a currently implemented RL system. This model can also be expanded to perform social network analysis and temporal data modeling. For human services, temporal modeling can reveal how policy changes and treatments affect clients over time and social network analysis can determine the effects of these on whole families by facilitating family linkage.
Temple University--Theses
Owolabi, Olumide. "Hardware and software approximate string matching and pattern recognition for intelligent knowledge based systems." Thesis, University of Strathclyde, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.280018.
Повний текст джерела黎少斌 and Shiao-bun Lai. "Trading off time for space for the string matching problem." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B31214216.
Повний текст джерелаLai, Shiao-bun. "Trading off time for space for the string matching problem /." Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B18061795.
Повний текст джерелаFogla, Prahlad. "Improving the Efficiency and Robustness of Intrusion Detection Systems." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/19772.
Повний текст джерелаBergman, John. "Efficient fuzzy type-ahead search on big data using a ranked trie data structure." Thesis, Umeå universitet, Institutionen för fysik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-145029.
Повний текст джерелаEffektiviteten hos moderna sökmotorer beror på hur väl de presenterar rättstavade resultat för en användare medan en sökning skrivs. Så kallad fuzzy type-ahead sök kombinerar approximativ strängmatchning och sök-medan-du-skriver funktionalitet, vilket skapar ett kraftfullt verktyg för att utforska data. Dagens algoritmer för fuzzy type-ahead sök fungerar väl för små mängder data, men för data i storleksordningen “big data” från t.ex sociala nätverkstjänster så som Facebook, e-handelssidor så som Amazon, eller media tjänster så som YouTube, är en responsiv fuzzy type-ahead sök ännu en stor utmaning. Denna avhandling beskriver en metod som möjliggör responsiv type-ahead sök kombinerat med approximativ strängmatchning för big data genom att hålla söktiden optimal för mänsklig interaktion på bekostnad av lägre precision för mindre populär information när en sök-förfrågan innehåller felstavningar. Detta gör metoden effektiv för e-handel och mediatjänster där populariteten av sök-termer är ett resultat av mänskligt beteende vilket ofta följer en potens-lag distribution.
Toth, Róbert. "Přibližné vyhledávání řetězců v předzpracovaných dokumentech." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236122.
Повний текст джерелаKällberg, David. "Nonparametric Statistical Inference for Entropy-type Functionals." Doctoral thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-79976.
Повний текст джерелаNeto, Domingos Soares. "Filtros para a busca e extração de padrões aproximados em cadeias biológicas." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-19102009-002745/.
Повний текст джерелаThis thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others.
Тодоріко, Ольга Олексіївна. "Моделі та методи очищення та інтеграції текстових даних в інформаційних системах". Thesis, Запорізький національний університет, 2016. http://repository.kpi.kharkov.ua/handle/KhPI-Press/21856.
Повний текст джерелаThe thesis for the candidate degree in technical sciences, speciality 05.13.06 – Information Technologies. – National Technical University "Kharkiv Polytechnic Institute", Kharkiv, 2016. In the thesis the actual scientific and practical problem of increasing the efficiency and quality of cleaning and integration of data in information reference system and information retrieval system is solved. The improvement of information technology of cleaning and integration of data is achieved by reduction of quantity of mistakes in text information by means of use of model of an inflectional paradigm, methods of creation of a lexeme index, advanced methods of tolerant retrieval. The developed model of an inflectional paradigm includes a representation of words as an ordered collection of signatures and an approximate measure of similarity between two representations. The model differs in method of dealing with forms of words and character positions. It provides the basis for the implementation of improved methods of tolerant retrieval, cleaning and integration of datasets. The method of creation of the lexeme index which is based on the offered model of an inflectional paradigm is developed, and it allows mapping a word and all its forms to a record of the index. The method of tolerant retrieval is improved at preliminary filtration stage thanks to the developed model of an inflectional paradigm and the lexeme index. The experimental efficiency evaluation indicates high precision and 99 0,5 % recall. The information technology of cleaning and integration of data is improved using the developed models and methods. The software which on the basis of the developed models and methods carries out tolerant retrieval, cleaning and integration of data sets was developed. Theoretical and practical results of the thesis are introduced in production of document flow of an entrance committee and educational process of mathematical faculty of the State institution of higher education "Zaporizhzhya National University".
Тодоріко, Ольга Олексіївна. "Моделі та методи очищення та інтеграції текстових даних в інформаційних системах". Thesis, НТУ "ХПІ", 2016. http://repository.kpi.kharkov.ua/handle/KhPI-Press/21853.
Повний текст джерелаThe thesis for the candidate degree in technical sciences, speciality 05.13.06 – Information Technologies. – National Technical University «Kharkiv Polytechnic Institute», Kharkiv, 2016. In the thesis the actual scientific and practical problem of increasing the efficiency and quality of cleaning and integration of data in information reference system and information retrieval system is solved. The improvement of information technology of cleaning and integration of data is achieved by reduction of quantity of mistakes in text information by means of use of model of an inflectional paradigm, methods of creation of a lexeme index, advanced methods of tolerant retrieval. The developed model of an inflectional paradigm includes a representation of words as an ordered collection of signatures and an approximate measure of similarity between two representations. The model differs in method of dealing with forms of words and character positions. It provides the basis for the implementation of improved methods of tolerant retrieval, cleaning and integration of datasets. The method of creation of the lexeme index which is based on the offered model of an inflectional paradigm is developed, and it allows mapping a word and all its forms to a record of the index. The method of tolerant retrieval is improved at preliminary filtration stage thanks to the developed model of an inflectional paradigm and the lexeme index. The experimental efficiency evaluation indicates high precision and 99 0,5 % recall. The information technology of cleaning and integration of data is improved using the developed models and methods. The software which on the basis of the developed models and methods carries out tolerant retrieval, cleaning and integration of data sets was developed. Theoretical and practical results of the thesis are introduced in production of document flow of an entrance committee and educational process of mathematical faculty of the State institution of higher education «Zaporizhzhya National University».
Huang, Shih-Yuan, and 黃詩元. "Approximate String Matching under Non-Overlapping Inversions." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/91360854996893997589.
Повний текст джерела國立清華大學
資訊工程學系
102
In this thesis, we introduce and study the approximate string matching problem under non-overlapping inversion distance. Given a text t, a pattern p and a non-negative integer k, the goal of the problem is to find all locations in the text t that match the pattern p with at most k non-overlapping inversions. First, we use the dynamic programming approach to design an algorithm that solves this problem in O(nm^2 ) time and O(m^2) space, where n is the length of the text and m is the length of the pattern. Next, we present another algorithm based on an efficient filtering strategy that has the same worst-case time and space complexities as the first algorithm.
Lu, Chia Wei, and 呂嘉維. "Efficient Exact and Approximate String Matching Algorithms." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/49922113552090356808.
Повний текст джерела國立清華大學
資訊工程學系
102
In this thesis, we first propose two algorithms for exact string matching problem, which aims to find all the positions i's in a given text where a given pattern occurs. Our algorithms find an optimal selective comparing order of the characters of the pattern so that we could have a better performance in the searching phase. To find the optimal comparing order, we adopt the branch and bound approach. Moreover, our proposed algorithm can be combined with other existing exact string matching algorithms to improve the searching efficiency. The experimental results show that our algorithms indeed have the smallest number of character comparisons and are also efficient in time as compared with other existing exact string matching algorithms. Second, we propose a new filtration algorithm, as well as a hybrid filtration strategy, to efficiently solve the approximate string matching problem (also called the k-difference problem), which aims to find all the positions i's in a given text such that there exists a substring of the text ending at position i whose edit distance from a given pattern is less than or equal to a given error bound k. Our experimental results on simulated datasets of DNA sequences show that when compared with other filtration algorithms, our filtration algorithm has better performance on the efficiency to filter out those positions of the text at which the pattern does not occur approximately. Moreover, our hybrid filtration strategy further improves the effectiveness of our filtration algorithm. Third, we propose a progressive approach to solve the DNA resequencing problem which is defined as follows: We are given an unknown DNA sequence X and a known reference sequence R. Our task is to see whether X and R are similar or not. The present popular approach is to break up X into subsequences by the next generation sequencing (NGS) technologies, called reads. We then map the reads of X onto R with a suitable error bound. However, if the similarity between X and R is not very high (<95%), there would be many reads unmapped, and we then cannot obtain the mutations inside the unmapped regions. One can use a large error bound to increase the number of reads mapped. But it is not a good solution because increasing error bound will also increase the probability of false positive mapping. Our approach uses a small error bound and to increase the number of reads mapped, our approach modifies R each time after the reads are mapped. Thus our approach is a progressive approach. Compared with other available tools, our approach allows us to be able to map more reads to the reference sequence. In our simulated experiments, we also show the high correctness of our mapping algorithm.
Pan, Z. H., and 潘子豪. "An Approximate String Matching Based on an Exact String Matching with Constant Pattern Length." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/45738203365580501083.
Повний текст джерела國立暨南國際大學
資訊工程學系
96
The string matching problem is a very important problem which is needed to solve in computer science, computational biology and other domains. The exact matching problem is defined as follows: Given a string T whose length is n and a string P whose length is m, n≧m, find each location where string P occurs in string T. The approximate string matching problem is defined as follows: Given a pattern string P of length and a text string T of length n, n≧m, and a maximal number k of errors allowed, find each location i of T such that there exists a suffix T of T(0,i) and ED(A,B)≦k. In our thesis, we will use the exact string matching to solve the approximate string matching problem.
Pan, Zu-Hao. "An Approximate String Matching Based on an Exact String Matching with Constant Pattern Length." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200816061900.
Повний текст джерелаWang, Hui Min, and 王惠民. "A Pre-processing Procedure for Approximate String Matching." Thesis, 1995. http://ndltd.ncl.edu.tw/handle/93583400034298936718.
Повний текст джерела中原大學
應用數學研究所
83
Given two strings P=p1p2...pm and T=t1t2...tn over an alphabet Omega whose size is s, the edit distance d(P,T) between P and T is the minimum number of editing operations needed to convert P to T. An editing operation is either an insertion, a deletion or a change of a character. The k-differences problem is to find all substrings T'' of T such that d(P,T'')<= k, where k is a positive integer. Many researchers work on this problem. The main purpose of this paper is not proposing a new algorithm but instead of a pre-processing procedure. This procedure can be easily applied to various algorithms. The basic idea is to find the impossible occurrence by counting the number of each symbol. When the approximate strings happen rarely, this procedure performs a good result.
Burkhardt, Stefan [Verfasser]. "Filter algorithms for approximate string matching / von Stefan Burkhardt." 2004. http://d-nb.info/972323015/34.
Повний текст джерелаLIU, YAN-SHAO, and 劉彥劭. "Query by Humming System Based-on Approximate String Matching Technique." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/96901427832257868600.
Повний текст джерела國立臺灣大學
資訊工程學研究所
90
With the growth of digital music released on the Internet, the requirement of the music information retrieval system becomes more and more important. In this thesis, we proposed and realized a query-by-humming system based on string matching technique. Users can find songs they want by humming about ten seconds notes of the song. In this system, we adopt autocorrelation pitch tracking, amplitude based note segmentation, and key adjusting by key finding algorithm, to get the pitch interval and duration ratio representation from the user’s acoustic input. And different melody representations and similarity comparison methods are performed and compared. Besides, we try to use Genetic Algorithm to optimize the system parameter. Through the training process, we can find the most suitable parameter for the humming habits of the user, and it can help us to improve the query accuracy of our system.
Lu, Chia-Wei, and 呂嘉維. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/17632385982720975496.
Повний текст джерела國立暨南國際大學
資訊工程學系
96
In this thesis, we consider two problems, exact string matching and approximate string matching problem. The exact string matching problem is to determine all of the locations of a pattern string P appearing in a text string T. We first point out a uniqueness property of a given pattern. We then propose four algorithms to solve the exact string matching problem based upon the uniqueness property. Experimental results showed that three of our algorithms are faster than the KMP and Boyer-Moore algorithms. The approximate string matching problem is defined as follows: Given a text string T, a pattern string P and an error bound k, find all substrings of T whose edit distances with P are smaller than or equal to k. We give a method to eliminate candidate locations in text T as there can be no substring S starting from those locations such that the edit distance between S and pattern P is smaller than or equal to k. Our method is simple to implement. Experimental results show that our method is effective, especially in the case when we perform natural language searching.
Lu, Chia-Wei. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200815183100.
Повний текст джерелаShieh, Y. K., and 謝一功. "An Improved Approximate String Matching Algorithm Based upon the Boyer-Moore Algorithm." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/66121462520038468689.
Повний текст джерела國立暨南國際大學
資訊工程學系
96
In this thesis, we discuss the exact string matching problem and approximate string matching problem. We avoid using a brute-force algorithm to solve the string matching problem. We improve the Bad Character Rule of Boyer and Moore Algorithm and compare it with the Horspool Algorithm. And we find that the improved method include the information of the Horspool Algorithm. Therefore, we combine the improved method with the Horspool Algorithm. The combinative method has a better efficiency in searching phase.
Shieh, Yi-Kung. "An Improved Approximate String Matching Algorithm Based upon the Boyer-Moore Algorithm." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200816121600.
Повний текст джерелаHuang, Chun-cheng, and 黃俊程. "High-Performance Parallel Location-Aware Algorithms for Approximate String Matching on GPUs." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/5b596z.
Повний текст джерела國立臺灣師範大學
電機工程學系
105
Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement. Finally, we develop a web service which allows users to perform approximate string matching on line and deliver all matched substrings with the start and end positions.
Chen, Hui-Min, and 陳慧敏. "An Exact String Matching Problem Using Data Encoding Scheme." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/99302180071058684639.
Повний текст джерела國立暨南國際大學
資訊工程學系
96
The traditional exact string matching problem is to find all locations of a pattern string with length m in a text with length n. Here we propose a new encoding method to shorten the both lengths of pattern and text by substituting the substring between a special character for its length in O(m+n). Then we use an exact matching algorithm to solve the exact string matching problem on the encoding pattern and text. As can be seen、by using the encoding method、the pattern and text can be shortened about 2/|Σ| times the lengths of the original ones. In practice、it performs better than 2/|Σ|. For instance、for an English sentence pattern whose length is 50 and a text whose length is 200000、in average、the pattern is shortened to 6% of its original length and the text is shortened to 12.4% of its original length. Thus、the exact matching can be done in a much shorter time.
Chen, Hui-Min. "An Exact String Matching Problem Using Data Encoding Scheme." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814110600.
Повний текст джерелаOu, Chia-Shin, and 歐家欣. "On the Bit-Parallel Approaches to String Matching Problem." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/71663047138441425921.
Повний текст джерела國立清華大學
資訊工程學系
102
In this thesis, we consider two problems: (1) Developing a systematical approach to solve problems involving special properties of bit-vectors (2) Developing a bit-parallel approach to solve the nearest neighbor string matching problem. For problem 1, suppose that we are given a vector consisting of only 1's and 0's and we are interested in finding some special properties of this vector. These properties all involve "for-all" or "there-exists" notations and we are interested in bit-parallel processes to find these properties. In Myers' work, a sequence of logical operations can be expressed as a logical formula: (∃k ≤ i, A(k) and ∀x∈[k, i – 1]) = (((A & B) + B) ^ B) | A) which is by no means easy to be obtained. The contribution of our research is to present a systematical method to find such logical formulas to solve problems involving bit vectors with "for-all" and "there-exists" notations. Using our way of thinking, the equation in Myers' work can be deduced step by step. Five logical prototype problems, "single-for-all", "single-there- exists", "multiple-for-all", "multiple- there-exists" and "multiple-there-exists-and-for-all", are defined in this thesis and are proved that all can be computed using bit-parallel operations in O(n/w) time, where w is the word size of the machine. We also consider four variants of the five problems and show that their logical formulas can be obtained using those of the five prototype problems systematically. The nearest neighbor string matching problem is defined as follows: Given a text string T = t1t2…tn and a pattern string P = p1p2…pm, the nearest neighbor string matching problem is to find all substrings of T whose edit distances with P are the smallest, among all substrings of T. The nearest neighbor string matching problem has useful applications in bioinformatics. It can be straight-forwardly solved by the Seller's Algorithm and the Myers Algorithm which are used to solve the approximate string matching problem. Hyyro and Navarro proposed a filtering algorithm to speed up the Myers Algorithm. However, Hyyro and Navarro's filtering approach needs to perform a pre-processing based on the error bound k which is given by the definition of approximate string matching problem. Hence, it is not suitable to be used to solve the nearest neighbor string matching problem which has no k. In this thesis, we present a modification of the Hyyro and Navarro Algorithm, and also present a bit-parallel algorithm combining the Myers Algorithm and the modified Hyyro and Navarro Algorithm. In experiments, we show that our bit-parallel algorithm is efficient.
Lin, Sheng-Yuan, and 林聖淵. "A High Performance Parallel Algorithm for Approximate String Matching on Multi-core Processor." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/96412144907123635479.
Повний текст джерелаHou, Kuan-Wei, and 侯冠維. "The Discrete Convolution Method for Solving the Exact String Matching Problem." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/13881027671610839668.
Повний текст джерела國立清華大學
電機工程學系
100
In this thesis, we introduce discrete convolution method on solving the exact string matching problem. Based on the assumption that all the text and pattern strings are generated randomly, we derived an equation which can approximate the probability of appearing of a pattern string in a text string. From this equation, we see that the probability that a pattern string appears in a text string reduces to 0 quickly as the length of the pattern string increases. Because of this observation, we introduce an algorithm based on the discrete convolution method with early termination. The algorithm terminates as soon as it discovers that a prefix of the pattern string does not appear in the text string. We show that the discrete convolution method with early termination is quite efficient to solve exact string matching problem for randomly generated text and pattern strings. In this thesis, we also show that the shift-add algorithm is equivalent to and can be implemented by the discrete convolution.
Yang, Chao-Yuan, and 楊朝淵. "A New Algorithm for Solving the String Matching with k-Mismatches Problem." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/53970156684807658094.
Повний текст джерела國立暨南國際大學
資訊工程學系
93
String matching with k-mismatches problem is defined as follows. Given a pattern P with length m, a text T with length n and an integer k, string matching with k-mismatches problem wants to find all occurrences of P on T with k maximal number of mismatches allowable. In this thesis, we propose a new algorithm to solve the problem. The algorithm uses the Kangaroo method mentioned in the paper by Galil and Giancarlo and that by Landau and Vishkin. Our algorithm and the Kangaroo Method are distinct from the number of steps to find mismatches. Each step in our algorithm may find two new mismatches or at least one new mismatch in some case. We can compare our algorithm to the Kangaroo Method. Each step in the Kangaroo Method only finds a mismatch. Another difference between our algorithm and the Kangaroo method is the number of positions on the text which we need to verify if those positions are occurrences of the pattern with at most k mismatches. Our algorithm would consider that some positions on the text are impossible to be occurrences of the pattern and then we don’t need to waste any time to check whether those locations are matches with at most k mismatches.
Liao, Kuei-Hui, and 廖桂慧. "Solving the Exact String Matching Problem by Using the 2-Substring Algorithm." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/46723272181241712677.
Повний текст джерела國立暨南國際大學
生物醫學科技研究所
95
String matching is a very important component of many problems, such as data compression, search engine, speech recognition, virus detection, computational biology, and so on. There are many efficient method proposed to solve the string matching problem. For example, KMP algorithm、Boyer-and-Moore algorithm. In this thesis, we proposed a method to solve the exact string matching problem. We proposed a rule, called the 2-substring rule, which avoids the brute force method and can be used to solve the problem. We know the time complexity of KMP algorithm is better than Boyer-and-Moore algorithm. But the practice of Boyer-and-Moore algorithm is faster than KMP algorithm. We implement our method in practice. Our method not only is faster than KMP algorithm, Boyer and Moore algorithm, Horspool algorithm and Quick Search algorithm but also is earlier to implement.
Zhong-He, Chen, and 陳中和. "The Application of Convolution to Suffix to Prefix Rule for the Exact String Matching Problem." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/55234892315310089339.
Повний текст джерела國立暨南國際大學
資訊工程學系
95
In this thesis, we consider the exact string matching problem. We first point out a rule, called the suffix to prefix rule, which can be used to avoid the brute-force sliding window approach. The Backward Nondeterministic DAWG Matching algorithm, Backward Oracle algorithm and Reverse Factor algorithm all use this rule. To implement this rule, we have to find the longest suffix of text T which is equal to a prefix of pattern P. In this thesis, we point out that convolution can be used to do this. As can be seen, the convolution technique is easy to understand and easy to program.
Chen, Zhong-He. "The Application of Convolution to Suffix to Prefix Rule for the Exact String Matching Problem." 2006. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2006200719565300.
Повний текст джерелаDobiášovský, Jan. "Přibližná shoda znakových řetězců a její aplikace na ztotožňování metadat vědeckých publikací." Master's thesis, 2020. http://www.nusl.cz/ntk/nusl-415121.
Повний текст джерела