Academic literature on the topic 'String similarity measure'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'String similarity measure.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "String similarity measure"

1

Revesz, Peter Z. "A Tiling Algorithm-Based String Similarity Measure." WSEAS TRANSACTIONS ON COMPUTER RESEARCH 9 (August 10, 2021): 109–12. http://dx.doi.org/10.37394/232018.2021.9.13.

Abstract:
This paper describes a similarity measure for strings based on a tiling algorithm. The algorithm is applied to a pair of proteins that are described by their respective amino acid sequences. The paper also describes how the algorithm can be used to find highly conserved amino acid sequences and examples of horizontal gene transfer between different species.
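The abstract does not spell out the tiling procedure itself, but the general idea behind a tiling-based similarity can be illustrated with a small sketch: repeatedly find the longest common substring (a "tile") shared by the two sequences, mask it out, and report the fraction of characters covered by tiles. The minimum tile length, the masking trick, and the normalization below are illustrative assumptions, not the algorithm from the paper.

    def longest_common_substring(a, b):
        """Return (start_in_a, start_in_b, length) of a longest common substring."""
        best = (0, 0, 0)
        table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    table[i][j] = table[i - 1][j - 1] + 1
                    if table[i][j] > best[2]:
                        best = (i - table[i][j], j - table[i][j], table[i][j])
        return best

    def tiling_similarity(a, b, min_tile=3):
        """Greedily cover both strings with common tiles and return the covered
        fraction as a similarity score in [0, 1]."""
        a_work, b_work, covered = a, b, 0
        while True:
            i, j, length = longest_common_substring(a_work, b_work)
            if length < min_tile:
                break
            covered += length
            # mask matched tiles with different fillers so they cannot re-match
            a_work = a_work[:i] + "\x00" * length + a_work[i + length:]
            b_work = b_work[:j] + "\x01" * length + b_work[j + length:]
        return 2 * covered / (len(a) + len(b)) if (a or b) else 1.0

    # toy amino acid strings; two tiles ("MKVL" and "AGICQ") give a score of 0.75
    print(tiling_similarity("MKVLAAGICQST", "MKVLSAGICQTT"))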
2

Al-Bakry, Abbas, and Marwa Al-Rikaby. "Enhanced Levenshtein Edit Distance Method functioning as a String-to-String Similarity Measure." Iraqi Journal for Computers and Informatics 42, no. 1 (December 31, 2016): 48–54. http://dx.doi.org/10.25195/ijci.v42i1.83.

Abstract:
Levenshtein is a minimum edit distance method; it is usually used in spell checking applications for generating candidates. The method computes the number of edit operations required to transform one string into another, and it can recognize three types of edit operations: deletion, insertion, and substitution of one letter. Damerau modified the Levenshtein method to consider another type of edit operation, the transposition of two adjacent letters, in addition to the three considered types. However, the modification suffers from the extra time complexity added on top of the original method's quadratic time complexity. In this paper, we proposed a modification of the original Levenshtein method that considers the same four types of operations using a very small number of matching operations, which resulted in a shorter execution time; a similarity measure is also achieved that exploits the resulting distance from any edit distance method to find the amount of similarity between two given strings.
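For context, the following is a minimal sketch of the classical restricted Damerau-Levenshtein (optimal string alignment) distance together with the usual way an edit distance is turned into a similarity score. It is not the paper's optimized modification, and normalizing by the longer string length is just one common convention.

    def damerau_levenshtein(s, t):
        """Edit distance with insertions, deletions, substitutions,
        and transpositions of adjacent letters (optimal string alignment)."""
        m, n = len(s), len(t)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if s[i - 1] == t[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
                if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
        return d[m][n]

    def edit_similarity(s, t):
        """Turn any edit distance into a similarity score in [0, 1]."""
        if not s and not t:
            return 1.0
        return 1.0 - damerau_levenshtein(s, t) / max(len(s), len(t))

    print(damerau_levenshtein("recieve", "receive"))  # 1: a single transposition
    print(edit_similarity("recieve", "receive"))      # about 0.857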
3

Sakunthala Prabha, K. S., C. Mahesh, and S. P. Raja. "An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm." Cybernetics and Information Technologies 21, no. 2 (June 1, 2021): 105–20. http://dx.doi.org/10.2478/cait-2021-0022.

Abstract:
A topic-precise crawler is a special-purpose web crawler that downloads web pages relevant to a particular topic by measuring a cosine similarity or semantic similarity score. The cosine-based similarity measure yields an inaccurate relevance score if the topic term does not occur directly in the web page. The semantic-based similarity measure provides a precise relevance score even if only synonyms of the given topic occur in the web page, but the unavailability of the topic in the ontology causes semantic focused crawlers to produce inaccurate relevance scores. This paper overcomes these glitches with a hybrid string-matching algorithm that combines the semantic similarity-based measure with the probabilistic similarity-based measure. The experimental results revealed that this algorithm increased the efficiency of focused web crawlers and achieved a better Harvest Rate (HR), Precision (P), and Irrelevance Ratio (IR) than existing focused web crawlers.
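As a point of reference for the cosine relevance score that the abstract contrasts with semantic scoring, the sketch below computes cosine similarity between plain term-frequency vectors; the tokenizer and the example texts are illustrative assumptions. It also exhibits the limitation the abstract describes: the page's word "strings" does not match the topic term "string", so it adds nothing to the score.

    import math
    import re
    from collections import Counter

    def term_vector(text):
        """Bag-of-words term-frequency vector."""
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def cosine_similarity(u, v):
        dot = sum(u[w] * v[w] for w in set(u) & set(v))
        norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
        return dot / norm if norm else 0.0

    topic = term_vector("string similarity measure")
    page = term_vector("Edit distance is a measure of similarity between two strings.")
    print(round(cosine_similarity(topic, page), 3))  # only "similarity" and "measure" match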
4

Rakhmawati, Nur Aini, and Miftahul Jannah. "Food Ingredients Similarity Based on Conceptual and Textual Similarity." Halal Research Journal 1, no. 2 (October 27, 2021): 87–95. http://dx.doi.org/10.12962/j22759970.v1i2.107.

Abstract:
Open Food Facts provides a database of food products, including product names, compositions, and additives, to which everyone can contribute data or reuse the existing data. The Open Food Facts data are dirty and need to be processed before being stored in our system. To reduce redundancy in the food ingredients data, we measure the similarity of food ingredients using two notions of similarity: conceptual similarity and textual similarity. The conceptual similarity measures the similarity between the two datasets by word meaning (synonymy), while the textual similarity is based on fuzzy string matching, namely the Levenshtein distance, the Jaro-Winkler distance, and the Jaccard distance. Based on our evaluation, the combination of textual similarity and WordNet-based (conceptual) similarity was the most effective similarity method for food ingredients.
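Of the three textual measures named, the Jaccard distance is the simplest to sketch: compare the sets of character n-grams of two ingredient strings. The trigram size, the lower-casing, and the example spellings below are illustrative assumptions; Levenshtein and Jaro-Winkler scores would be computed separately and then combined with the conceptual (WordNet) score.

    def char_ngrams(s, n=3):
        s = s.lower()
        return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

    def jaccard_similarity(a, b, n=3):
        ga, gb = char_ngrams(a, n), char_ngrams(b, n)
        union = ga | gb
        return len(ga & gb) / len(union) if union else 1.0

    # two spellings of the same ingredient still overlap strongly
    print(round(jaccard_similarity("wheat flour", "flour, wheat"), 2))
    print(round(jaccard_similarity("sugar", "cane sugar"), 2))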
5

Znamenskij, Sergej Vital'evich. "Stable assessment of the quality of similarity algorithms of character strings and their normalizations." Program Systems: Theory and Applications 9, no. 4 (December 28, 2018): 561–78. http://dx.doi.org/10.25209/2079-3316-2018-9-4-561-578.

Abstract:
The choice of search tools for hidden commonality in data of a new nature requires stable and reproducible comparative assessments of the quality of abstract algorithms for the proximity of symbol strings. Conventional estimates based on artificially generated or manually labeled tests vary significantly, evaluating the method of artificial generation rather than the similarity algorithms, and estimates based on user data cannot be accurately reproduced. A simple, transparent, objective, and reproducible numerical quality assessment of a string metric is proposed. Parallel texts of book translations in different languages are used. The quality of a measure is estimated by the percentage of errors over all possible attempts to determine the translation of a given paragraph among two paragraphs of a book in another language, one of which is actually the translation. The stability of the assessments is verified by their independence from the choice of book and language pair. The numerical experiment produced a stable quality ranking of abstract string comparison algorithms and showed a strong dependence on the choice of normalization.
6

Setiawan, Rudi. "Similarity Checking Similarity Checking of Source Code Module Using Running Karp Rabin Greedy String Tiling." Science Proceedings Series 1, no. 2 (April 24, 2019): 43–46. http://dx.doi.org/10.31580/sps.v1i2.624.

Abstract:
Similarity checking of source code modules requires a long process if done manually. To address this problem, this research designed software with a structure-based approach that uses a string matching technique, the Running Karp-Rabin Greedy String Tiling (RKR-GST) algorithm, to check similarity, and the Dice coefficient to measure the level of similarity between two source code modules. The results of the experiments show that RKR-GST, as applied in this system, is capable of recognizing changed statements and changed statement order, and is able to recognize procedure syntax taken from the compared module. Modifications such as adding comments to the source code module and changing the name of a procedure called in the body of another procedure can also be recognized by the system. The processing time needed to produce output depends on the number of lines of program code contained in the source code module.
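RKR-GST itself builds tiles with Karp-Rabin hashing and is too long to reproduce here, but the Dice-coefficient scoring step mentioned in the abstract can be sketched on its own by comparing the sets of adjacent token pairs of two code fragments. The whitespace tokenizer and the toy snippets are assumptions used only for illustration.

    def token_bigrams(code):
        """Very rough lexer: split on whitespace and collect adjacent token pairs."""
        tokens = code.split()
        return {(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)}

    def dice_coefficient(a, b):
        """Dice similarity of two source fragments based on shared token bigrams."""
        ba, bb = token_bigrams(a), token_bigrams(b)
        if not ba and not bb:
            return 1.0
        return 2 * len(ba & bb) / (len(ba) + len(bb))

    snippet1 = "int total = 0 ; for ( int i = 0 ; i < n ; i ++ ) total += a [ i ] ;"
    snippet2 = "int sum = 0 ; for ( int i = 0 ; i < n ; i ++ ) sum += a [ i ] ;"
    print(round(dice_coefficient(snippet1, snippet2), 2))  # high: only the variable name differs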
7

RODRIGUEZ, WLADIMIR, MARK LAST, ABRAHAM KANDEL, and HORST BUNKE. "GEOMETRIC APPROACH TO DATA MINING." International Journal of Image and Graphics 01, no. 02 (April 2001): 363–86. http://dx.doi.org/10.1142/s0219467801000220.

Abstract:
In this paper, a new, geometric approach to pattern identification in data mining is presented. It is based on applying string edit distance computation to measuring the similarity between multi-dimensional curves. The string edit distance computation is extended to allow the possibility of using strings, where each element is a vector rather than just a symbol. We discuss an approach for representing 3D-curves using the curvature and the tension as their symbolic representation. This transformation preserves all the information contained in the original 3D-curve. We validate this approach through experiments using synthetic and digitalized data. In particular, the proposed approach is suitable to measure the similarity of 3D-curves invariant under translation, rotation, and scaling. It also can be applied for partial curve matching.
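A minimal sketch of the generalization described above, an edit distance over sequences whose elements are numeric vectors rather than symbols, is given below. The Euclidean substitution cost, the fixed gap penalty, and the toy curvature/tension samples are assumptions, not the paper's exact cost model.

    import math

    def vector_edit_distance(p, q, gap=1.0):
        """Edit distance between two sequences of numeric vectors: substitution
        costs the Euclidean distance between elements, indels cost a fixed gap."""
        def dist(u, v):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
        m, n = len(p), len(q)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i * gap
        for j in range(1, n + 1):
            d[0][j] = j * gap
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                d[i][j] = min(d[i - 1][j] + gap,
                              d[i][j - 1] + gap,
                              d[i - 1][j - 1] + dist(p[i - 1], q[j - 1]))
        return d[m][n]

    # (curvature, tension) pairs sampled along two similar 3D curves (toy values)
    curve_a = [(0.1, 0.0), (0.4, 0.2), (0.5, 0.1)]
    curve_b = [(0.1, 0.0), (0.45, 0.25), (0.5, 0.1), (0.2, 0.0)]
    print(round(vector_edit_distance(curve_a, curve_b), 3))  # small: one extra sample plus noise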
8

Samanta, Soumitra, Steve O’Hagan, Neil Swainston, Timothy J. Roberts, and Douglas B. Kell. "VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder." Molecules 25, no. 15 (July 29, 2020): 3446. http://dx.doi.org/10.3390/molecules25153446.

Abstract:
Molecular similarity is an elusive but core “unsupervised” cheminformatics concept, yet different “fingerprint” encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are “better” than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a “bowtie”-shaped artificial neural network. In the middle is a “bottleneck layer” or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
9

Zhu, Jin, Dayu Cheng, Weiwei Zhang, Ci Song, Jie Chen, and Tao Pei. "A New Approach to Measuring the Similarity of Indoor Semantic Trajectories." ISPRS International Journal of Geo-Information 10, no. 2 (February 20, 2021): 90. http://dx.doi.org/10.3390/ijgi10020090.

Abstract:
People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods. These methods are based on a measure that reflects the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing trajectory similarity measures ignore the indoor movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to an inaccurate measure of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information such as the indoor point of interest into the indoor trajectory similarity measurement is beneficial to discovering pedestrians having similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is modified from the edit distance that is a measure of the distance between string sequences. The key component of the ISTSM is an indoor navigation graph that is transformed from an indoor floor plan representing the indoor space for computing accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and real dataset for a shopping mall. The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarities, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of a shopping mall shows that the ISTSM effectively reveals customer movement patterns of indoor customers.
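The core idea, an edit distance whose substitution cost reflects how far apart two semantic stops are inside the building, can be sketched as follows. The lookup table of normalized walking distances, the fixed gap penalty, and the toy trajectories are assumptions standing in for the navigation-graph distances that the ISTSM derives from the floor plan.

    def semantic_edit_distance(t1, t2, walk_dist, gap=1.0):
        """Edit distance between two sequences of visited places; substituting
        place a for place b costs a normalized indoor walking distance."""
        def sub(a, b):
            if a == b:
                return 0.0
            return walk_dist.get((a, b), walk_dist.get((b, a), 1.0))
        m, n = len(t1), len(t2)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i * gap
        for j in range(1, n + 1):
            d[0][j] = j * gap
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                d[i][j] = min(d[i - 1][j] + gap,
                              d[i][j - 1] + gap,
                              d[i - 1][j - 1] + sub(t1[i - 1], t2[j - 1]))
        return d[m][n]

    # walking distances between shops, normalized to [0, 1] (toy values)
    walk = {("cafe", "bakery"): 0.2, ("cafe", "electronics"): 0.9}
    a = ["entrance", "cafe", "bookstore", "exit"]
    b = ["entrance", "bakery", "bookstore", "exit"]
    print(semantic_edit_distance(a, b, walk))  # 0.2: the visits differ only by two nearby shops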
10

Sabarish, B. A., Karthi R., and Gireesh Kumar T. "String-Based Feature Representation for Trajectory Clustering." International Journal of Embedded and Real-Time Communication Systems 10, no. 2 (April 2019): 1–18. http://dx.doi.org/10.4018/ijertcs.2019040101.

Abstract:
A trajectory is the spatial trail of a moving object as a function of time. All moving objects, such as humans, robots, clouds, taxis, animals, and mobile phones, generate trajectories. Trajectory clustering is the grouping of trajectories that have similar movement patterns, and the resulting clusters depend on the feature representation, similarity metrics, and clustering algorithm used. In this article, trajectory features are generated after mapping trajectories onto grids, as this smooths the variations that occur in spatial coordinates. These variations arise from differences in how GPS points are generated by the device at varying intervals, even when the trajectories follow the same path. The main motivation for the article is to devise a trajectory clustering algorithm that is independent of the variations introduced by GPS devices. A string-based model is used, in which trajectories are represented as strings and string-based distance metrics measure the similarity between trajectories. A hierarchical method is applied for clustering, and the results are validated using three metrics. An experimental study is conducted, and the results show the effectiveness of the string-based representation and distance metrics for trajectory clustering.
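A rough sketch of the pipeline described, snapping GPS points to grid cells, encoding the visited cells as characters, and comparing the resulting strings, is shown below. The cell size, the one-character-per-cell encoding, and the use of difflib's ratio as a stand-in string similarity are assumptions for illustration only.

    from difflib import SequenceMatcher

    def trajectories_to_strings(trajectories, cell=0.01):
        """Map each (lat, lon) trajectory onto a shared grid and encode every
        visited cell as one character, collapsing consecutive repeats."""
        cell_symbol = {}  # grid cell -> single character
        strings = []
        for points in trajectories:
            chars = []
            for lat, lon in points:
                key = (int(lat // cell), int(lon // cell))
                if key not in cell_symbol:
                    cell_symbol[key] = chr(ord("A") + len(cell_symbol))  # demo-sized alphabet
                sym = cell_symbol[key]
                if not chars or chars[-1] != sym:
                    chars.append(sym)
            strings.append("".join(chars))
        return strings

    def string_similarity(a, b):
        """Stand-in similarity; any string distance metric could be plugged in here."""
        return SequenceMatcher(None, a, b).ratio()

    t1 = [(59.3342, 18.0655), (59.3348, 18.0752), (59.3446, 18.0758)]
    t2 = [(59.3344, 18.0651), (59.3349, 18.0747), (59.3352, 18.0849)]
    s1, s2 = trajectories_to_strings([t1, t2])
    print(s1, s2, round(string_similarity(s1, s2), 2))  # shared prefix, different final cell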

Dissertations / Theses on the topic "String similarity measure"

1

Тодоріко, Ольга Олексіївна. "Моделі та методи очищення та інтеграції текстових даних в інформаційних системах." Thesis, Запорізький національний університет, 2016. http://repository.kpi.kharkov.ua/handle/KhPI-Press/21856.

Abstract:
Дисертація на здобуття наукового ступеня кандидата технічних наук за спеціальністю 05.13.06 – інформаційні технології. – Національний технічний університет "Харківський політехнічний інститут", Харків, 2016. У дисертаційній роботі вирішена актуальна науково-практична задача підвищення ефективності та якості технології очищення та інтеграції текстових даних в довідкових і пошукових інформаційних системах за рахунок використання моделей словозмінної парадигми та методу побудови лексемного індексу при організації пошуку за схожістю. Розроблено моделі словозмінної парадигми, що включають представлення слів та обчислення приблизної міри схожості між ними. Розроблено метод побудови лексемного індексу, що базується на запропонованих моделях словозмінної парадигми та дозволяє відобразити слово і всі його словоформи в один запис індексу. Удосконалено метод пошуку за схожістю за рахунок покращення етапу попередньої фільтрації завдяки використанню розробленої моделі словозмінної парадигми та лексемного індексу. Виконана експериментальна оцінка ефективності вказує на високу точність та 99 0,5 % повноту. Удосконалено інформаційну технологію очищення та інтеграції даних за рахунок розроблених моделей та методів. Розроблено програмну реалізацію, яка на базі запропонованих моделей та методів виконує пошук за схожістю, очищення та інтеграцію наборів даних. Одержані в роботі теоретичні та практичні результати впроваджено у виробничий процес документообігу приймальної комісії та навчальний процес математичного факультету Державного вищого навчального закладу "Запорізький національний університет".
The thesis for the candidate degree in technical sciences, speciality 05.13.06 – Information Technologies. – National Technical University "Kharkiv Polytechnic Institute", Kharkiv, 2016. In the thesis the actual scientific and practical problem of increasing the efficiency and quality of cleaning and integration of data in information reference system and information retrieval system is solved. The improvement of information technology of cleaning and integration of data is achieved by reduction of quantity of mistakes in text information by means of use of model of an inflectional paradigm, methods of creation of a lexeme index, advanced methods of tolerant retrieval. The developed model of an inflectional paradigm includes a representation of words as an ordered collection of signatures and an approximate measure of similarity between two representations. The model differs in method of dealing with forms of words and character positions. It provides the basis for the implementation of improved methods of tolerant retrieval, cleaning and integration of datasets. The method of creation of the lexeme index which is based on the offered model of an inflectional paradigm is developed, and it allows mapping a word and all its forms to a record of the index. The method of tolerant retrieval is improved at preliminary filtration stage thanks to the developed model of an inflectional paradigm and the lexeme index. The experimental efficiency evaluation indicates high precision and 99 ± 0.5 % recall. The information technology of cleaning and integration of data is improved using the developed models and methods. The software which on the basis of the developed models and methods carries out tolerant retrieval, cleaning and integration of data sets was developed. Theoretical and practical results of the thesis are introduced in production of document flow of an entrance committee and educational process of mathematical faculty of the State institution of higher education "Zaporizhzhya National University".
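The thesis's signature-based model is not reproduced here, but the role of a lexeme index as a pre-filtering structure can be illustrated with a deliberately crude sketch: word forms are filed under a shared stem key, so a single lookup retrieves a whole candidate paradigm before an exact similarity measure re-ranks it. The stem-prefix key and the example word forms are assumptions, not the representation used in the thesis.

    from collections import defaultdict

    class LexemeIndex:
        """Toy lexeme index: word forms sharing a crude stem key map to one record."""
        def __init__(self, stem_len=8):
            self.stem_len = stem_len
            self.records = defaultdict(set)

        def _key(self, word):
            return word.lower()[:self.stem_len]

        def add(self, word):
            self.records[self._key(word)].add(word)

        def candidates(self, query):
            return self.records.get(self._key(query), set())

    index = LexemeIndex()
    for form in ["integrate", "integrates", "integrated", "integrating"]:
        index.add(form)
    print(index.candidates("integratd"))  # a misspelled query still reaches the record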
2

Тодоріко, Ольга Олексіївна. "Моделі та методи очищення та інтеграції текстових даних в інформаційних системах." Thesis, НТУ "ХПІ", 2016. http://repository.kpi.kharkov.ua/handle/KhPI-Press/21853.

Abstract:
Дисертація на здобуття наукового ступеня кандидата технічних наук за спеціальністю 05.13.06 – інформаційні технології. – Національний технічний університет «Харківський політехнічний інститут», Харків, 2016. У дисертаційній роботі вирішена актуальна науково-практична задача підвищення ефективності та якості технології очищення та інтеграції текстових даних в довідкових і пошукових інформаційних системах за рахунок використання моделей словозмінної парадигми та методу побудови лексемного індексу при організації пошуку за схожістю. Розроблено моделі словозмінної парадигми, що включають представлення слів та обчислення приблизної міри схожості між ними. Розроблено метод побудови лексемного індексу, що базується на запропонованих моделях словозмінної парадигми та дозволяє відобразити слово і всі його словоформи в один запис індексу. Удосконалено метод пошуку за схожістю за рахунок покращення етапу попередньої фільтрації завдяки використанню розробленої моделі словозмінної парадигми та лексемного індексу. Виконана експериментальна оцінка ефективності вказує на високу точність та 99 0,5 % повноту. Удосконалено інформаційну технологію очищення та інтеграції даних за рахунок розроблених моделей та методів. Розроблено програмну реалізацію, яка на базі запропонованих моделей та методів виконує пошук за схожістю, очищення та інтеграцію наборів даних. Одержані в роботі теоретичні та практичні результати впроваджено у виробничий процес документообігу приймальної комісії та навчальний процес математичного факультету Державного вищого навчального закладу «Запорізький національний університет».
The thesis for the candidate degree in technical sciences, speciality 05.13.06 – Information Technologies. – National Technical University «Kharkiv Polytechnic Institute», Kharkiv, 2016. In the thesis the actual scientific and practical problem of increasing the efficiency and quality of cleaning and integration of data in information reference system and information retrieval system is solved. The improvement of information technology of cleaning and integration of data is achieved by reduction of quantity of mistakes in text information by means of use of model of an inflectional paradigm, methods of creation of a lexeme index, advanced methods of tolerant retrieval. The developed model of an inflectional paradigm includes a representation of words as an ordered collection of signatures and an approximate measure of similarity between two representations. The model differs in method of dealing with forms of words and character positions. It provides the basis for the implementation of improved methods of tolerant retrieval, cleaning and integration of datasets. The method of creation of the lexeme index which is based on the offered model of an inflectional paradigm is developed, and it allows mapping a word and all its forms to a record of the index. The method of tolerant retrieval is improved at preliminary filtration stage thanks to the developed model of an inflectional paradigm and the lexeme index. The experimental efficiency evaluation indicates high precision and 99 ± 0.5 % recall. The information technology of cleaning and integration of data is improved using the developed models and methods. The software which on the basis of the developed models and methods carries out tolerant retrieval, cleaning and integration of data sets was developed. Theoretical and practical results of the thesis are introduced in production of document flow of an entrance committee and educational process of mathematical faculty of the State institution of higher education «Zaporizhzhya National University».
3

Lo, Hao-Yu, and 羅浩毓. "An Improved Similarity Measure for Image Database Based on 2D C+-string." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/02733704989122668239.

Abstract:
Master's thesis, Providence University (靜宜大學), Graduate Institute of Information Management, academic year 92 (ROC calendar).
In an image database system, spatial knowledge representation is an abstraction technique for describing an image. The 2D string and its variants are based on this concept. One of the variants, called the 2D C+-string, considers the sizes of and distances between objects. This method presents three major advantages: (1) more accuracy in picture representation and reconstruction; (2) less ambiguity in similarity retrieval; (3) reasoning about relative sizes, locations, and distances for a symbolic picture is possible. However, the similarity measure based on the 2D C+-string does not consider the ratios of object sizes and distances along the x- and y-axes together, and this neglect causes distorted pictures to be retrieved. In this paper, we improve the similarity measure based on the 2D C+-string. The improved similarity measure modifies the original equations to record the variation of ratios between two symbolic pictures and, furthermore, proposes two new types of similarity measure for discriminating pictures more precisely. By capturing the variation of ratios, pictorial information retrieval becomes more flexible for certain demands.
4

Chen, Yi-Ching, and 陳怡靜. "An Improved Similarity Measure Based on a New Spatial Knowledge Representation-2D Be+-String." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/96d9b3.

Abstract:
Master's thesis, Providence University (靜宜大學), Graduate Institute of Information Management, academic year 97 (ROC calendar).
Spatial knowledge representation is an abstraction technique for describing an image. This concept is mainly based on variants of the 2D string. One of the variants is the 2D Be-string, which improves on the 2D B-string. By applying "dummy objects," the 2D Be-string can represent pictorial spatial information intuitively and naturally, without a cutting mechanism or any spatial operator. Besides, the 2D Be-string can simplify the retrieval process for linear transformations, including rotation and reflection of images. However, the 2D Be-string ignores the sizes of and distances between objects. In consequence, the representation has deficiencies in spatial reasoning and similarity retrieval. In this paper, we propose a new spatial knowledge representation scheme called the 2D Be+-string, which extends the 2D Be-string by including relative metric information about the objects of the image in the strings. Consequently, our scheme offers the advantages of more accurate similarity retrieval and possible spatial reasoning about relative sizes and distances between objects for image databases.
5

Rebenich, Niko. "Counting prime polynomials and measuring complexity and similarity of information." Thesis, 2016. http://hdl.handle.net/1828/7251.

Abstract:
This dissertation explores an analogue of the prime number theorem for polynomials over finite fields as well as its connection to the necklace factorization algorithm T-transform and the string complexity measure T-complexity. Specifically, a precise asymptotic expansion for the prime polynomial counting function is derived. The approximation given is more accurate than previous results in the literature while requiring very little computational effort. In this context asymptotic series expansions for Lerch transcendent, Eulerian polynomials, truncated polylogarithm, and polylogarithms of negative integer order are also provided. The expansion formulas developed are general and have applications in numerous areas other than the enumeration of prime polynomials. A bijection between the equivalence classes of aperiodic necklaces and monic prime polynomials is utilized to derive an asymptotic bound on the maximal T-complexity value of a string. Furthermore, the statistical behaviour of uniform random sequences that are factored via the T-transform are investigated, and an accurate probabilistic model for short necklace factors is presented. Finally, a T-complexity based conditional string complexity measure is proposed and used to define the normalized T-complexity distance that measures similarity between strings. The T-complexity distance is proven to not be a metric. However, the measure can be computed in linear time and space making it a suitable choice for large data sets.

Books on the topic "String similarity measure"

1

Horne, Cynthia M. Transitional Justice in Support of Democratization. Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780198793328.003.0008.

Abstract:
Chapter 7 examines the conditions under which transitional justice affected democratic consolidation, a strong civil society, and low levels of corruption in the post-communist sphere. Lustration measures were robustly associated with democracy, with compulsory programs involving a punitive dimension having more noticeable effects than programs relying on symbolic shaming mechanisms. Wide and compulsory programs were similarly associated with more robust civil societies. However, there was evidence of a weak but negative relationship between truth commissions and democracy and civil society. Moreover, despite the framing of lustration as a corruption corrective, there was no apparent direct relationship. There were, however, indirect relationships between lustration measures and lower levels of corruption, highlighting the possible conditional effects of lustration. Finally, the chapter illustrated that there was a relatively long period of time after the transition within which to pass beneficial transitional justice measures, with declining efficacy several decades after the transition.
2

Regan, Patrick M. A Perceptual Approach to Quality Peace. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780190680121.003.0003.

Abstract:
This chapter tackles the problem of finding data-derived indicators to measure the quality of peace, versus a definition of peace simply as the absence of war. Conceptually, peace is seen as an equilibrium condition where resort to violence is minimal and where the highest quality of peace exists when the idea of armed violence approaches the unthinkable. The author draws upon the early work of Quincy Wright and Kenneth Boulding and progresses from there, establishing first their definitions of and conditions for peace. To put his theories to work, he introduces two proxy indicators: black market currency exchanges and bond market prices. Specifically, he examines and compares the premiums attached to the black market values of currencies in less stable economies and relates them to factors that promote destabilization of the equilibrium. Similarly, he compares the strip spreads on sovereign bonds as an indicator of government stability and instability.

Book chapters on the topic "String similarity measure"

1

Cristo, Marco, Pável Calado, Edleno Silva de Moura, Nivio Ziviani, and Berthier Ribeiro-Neto. "Link Information as a Similarity Measure in Web Classification." In String Processing and Information Retrieval, 43–55. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/978-3-540-39984-1_4.

2

Nguyen, Thi Thuy Anh, and Stefan Conrad. "An Improved String Similarity Measure Based on Combining Information-Theoretic and Edit Distance Methods." In Communications in Computer and Information Science, 228–39. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-25840-9_15.

3

Fred, Ana. "Similarity Measures and Clustering of String Patterns." In Pattern Recognition and String Matching, 155–93. Boston, MA: Springer US, 2003. http://dx.doi.org/10.1007/978-1-4613-0231-5_7.

4

Ardila, Yoan José Pinzón, Raphaël Clifford, and Manal Mohamed. "Necklace Swap Problem for Rhythmic Similarity Measures." In String Processing and Information Retrieval, 234–45. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11575832_27.

5

Luján-Mora, Sergio, and Manuel Palomar. "Comparing String Similarity Measures for Reducing Inconsistency in Integrating Data from Different Sources." In Advances in Web-Age Information Management, 191–202. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-47714-4_18.

6

Alaran, Misturah Adunni, AbdulAkeem Adesina Agboola, Adio Taofiki Akinwale, and Olusegun Folorunso. "A New LCS-Neutrosophic Similarity Measure for Text Information Retrieval." In Neutrosophic Sets in Decision Analysis and Operations Research, 258–80. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-7998-2555-5.ch012.

Abstract:
The reality of human existence and people's interactions with the various things that surround them reveal that the world is imprecise, incomplete, vague, and sometimes even indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in the same global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day in and day out. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method is compared with four existing classical string similarity measures using a wordlist as the data set. The analyses show the performance of the proposed neutrosophic similarity measure to be better than the existing ones in information retrieval tasks, with the evaluation based on precision, recall, highest false match, lowest true match, and separation.
7

Hirschberg, D. S. "Serial Computations of Levenshtein Distances." In Pattern Matching Algorithms. Oxford University Press, 1997. http://dx.doi.org/10.1093/oso/9780195113679.003.0007.

Abstract:
In the previous chapters, we discussed problems involving an exact match of string patterns. We now turn to problems involving similar but not necessarily exact pattern matches. There are a number of similarity or distance measures, and many of them are special cases or generalizations of the Levenshtein metric. The problem of evaluating the measure of string similarity has numerous applications, including one arising in the study of the evolution of long molecules such as proteins. In this chapter, we focus on the problem of evaluating a longest common subsequence, which is expressively equivalent to the simple form of the Levenshtein distance. The Levenshtein distance is a metric that measures the similarity of two strings. In its simple form, the Levenshtein distance, D(x, y), between strings x and y is the minimum number of character insertions and/or deletions (indels) required to transform string x into string y. A commonly used generalization of the Levenshtein distance is the minimum cost of transforming x into y when the allowable operations are character insertion, deletion, and substitution, with costs δ(λ, σ), δ(σ, λ), and δ(σ1, σ2), which are functions of the involved character(s). There are direct correspondences between the Levenshtein distance of two strings, the length of the shortest edit sequence from one string to the other, and the length of the longest common subsequence (LCS) of those strings. If D is the simple Levenshtein distance between two strings having lengths m and n, SES is the length of the shortest edit sequence between the strings, and L is the length of an LCS of the strings, then SES = D and L = (m + n - D)/2. We will focus on the problem of determining the length of an LCS and also on the related problem of recovering an LCS. Another related problem, which will be discussed in Chapter 6, is that of approximate string matching, in which it is desired to locate all positions within string y which begin an approximation to string x containing at most D errors (insertions or deletions).
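The relationships stated above are easy to check numerically: compute the LCS length and the indel-only Levenshtein distance with their standard dynamic programs and verify that L = (m + n - D)/2. The example strings below are arbitrary.

    def lcs_length(x, y):
        """Length of a longest common subsequence (standard dynamic program)."""
        m, n = len(x), len(y)
        c = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if x[i - 1] == y[j - 1]:
                    c[i][j] = c[i - 1][j - 1] + 1
                else:
                    c[i][j] = max(c[i - 1][j], c[i][j - 1])
        return c[m][n]

    def indel_distance(x, y):
        """Simple Levenshtein distance using insertions and deletions only."""
        m, n = len(x), len(y)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if x[i - 1] == y[j - 1]:
                    d[i][j] = d[i - 1][j - 1]
                else:
                    d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1])
        return d[m][n]

    x, y = "CHIMPANZEE", "HUMAN"
    D, L = indel_distance(x, y), lcs_length(x, y)
    print(D, L, (len(x) + len(y) - D) // 2 == L)  # 7 4 True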
8

Pinaire, Jessica, Etienne Chabert, Jérôme Azé, Sandra Bringay, Pascal Poncelet, and Paul Landais. "Prediction of In-Hospital Mortality from Administrative Data: A Sequential Pattern Mining Approach." In Studies in Health Technology and Informatics. IOS Press, 2021. http://dx.doi.org/10.3233/shti210167.

Abstract:
The study of trajectories of care is attractive for predicting medical outcomes. Models based on machine learning (ML) techniques have proven their efficiency for sequence prediction compared to other models. Introducing pattern mining techniques has helped to reduce model complexity. In this respect, we explored methods for predicting medical events based on the extraction of sets of relevant event sequences from a national hospital discharge database, illustrated by predicting the risk of in-hospital mortality in acute coronary syndrome (ACS). We mined sequential patterns from the French Hospital Discharge Database and compared several predictive models that use a text string distance to measure the similarity between patients' patterns of care. We computed combinations of commonly used similarity measurements and ML models. A Support Vector Machine model coupled with an edit-based distance appeared to be the most effective model: discrimination ranged from 0.71 to 0.99, together with good overall accuracy. Thus, sequential pattern mining appears promising for event prediction in medical settings, as described here for ACS.
9

Shahri, Hamid Haidarian. "A Machine Learning Approach to Data Cleaning in Databases and Data Warehouses." In Database Technologies, 2245–60. IGI Global, 2009. http://dx.doi.org/10.4018/978-1-60566-058-5.ch136.

Abstract:
Entity resolution (also known as duplicate elimination) is an important part of the data cleaning process, especially in data integration and warehousing, where data are gathered from distributed and inconsistent sources. Learnable string similarity measures are an active area of research in the entity resolution problem. Our proposed framework builds upon our earlier work on entity resolution, in which fuzzy rules and membership functions are defined by the user. Here, we exploit neuro-fuzzy modeling for the first time to produce a unique adaptive framework for entity resolution, which automatically learns and adapts to the specific notion of similarity at a metalevel. This framework encompasses many of the previous work on trainable and domain-specific similarity measures. Employing fuzzy inference, it removes the repetitive task of hard-coding a program based on a schema, which is usually required in previous approaches. In addition, our extensible framework is very flexible for the end user. Hence, it can be utilized in the production of an intelligent tool to increase the quality and accuracy of data.
10

Cohn, Margit. "The Nature and Use of Unilateral Executive Measures." In A Theory of the Executive Branch, 137–64. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780198821984.003.0006.

Abstract:
Constitutions and constitutional constructs offer executives a repository of fuzzy sources of power which enable unilateral action. This chapter focuses on one of these forms: executive making of (semi)-formal unilateral measures. These orders and edicts have an important edge: on their face, they are ‘lawlike’, and seemingly carry the imprimatur of binding law, even when their legal status is fuzzy. The chapter uses comparative methodology in order to show the strong similarity between such measures as they emerged and continue to be applied in the two systems compared in this book. Orders in Council, Executive Orders and the like, such as the ones brought before the courts in Bancoult and Youngstown, have been at the focus of extensive study; yet to date, such measures, issued in both systems, have never been conjointly discussed. This chapter offers the first comparative analysis. This novel comparative exercise leads to the discovery of a surprising convergence—surprising, if attention is focused on structural regime elements. The findings support two of the main themes advanced in this book: that the emergence and retention of fuzzy legality is an unavoidable feature of the state, despite the ingrained danger it poses to the proper functioning of democracies. A third theme, concerned with the need to constrain fuzziness by robust judicial oversight, is addressed in the last chapter of this book. This chapter also offers new insights on the unclear distinction between constitutional- and statute-derived fuzziness, again, a feature shared by both systems.

Conference papers on the topic "String similarity measure"

1

Debbarma, Abhijit, BS Purkayastha, and Paritosh Bhattacharya. "Stemmer for resource scarce language using string similarity measure." In 2014 International Conference on Optimization, Reliability, and Information Technology (ICROIT). IEEE, 2014. http://dx.doi.org/10.1109/icroit.2014.6798299.

2

Lu, Jiaheng, Chunbin Lin, Wei Wang, Chen Li, and Haiyong Wang. "String similarity measures and joins with synonyms." In the 2013 international conference. New York, New York, USA: ACM Press, 2013. http://dx.doi.org/10.1145/2463676.2465313.

3

Bilenko, Mikhail, and Raymond J. Mooney. "Adaptive duplicate detection using learnable string similarity measures." In the ninth ACM SIGKDD international conference. New York, New York, USA: ACM Press, 2003. http://dx.doi.org/10.1145/956750.956759.

4

Chernyak, Ekaterina. "Comparison of String Similarity Measures for Obscenity Filtering." In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-1415.

5

Unknown. "A Comparison of String Similarity Measures for Toponym Matching." In The First ACM SIGSPATIAL International Workshop. New York, New York, USA: ACM Press, 2013. http://dx.doi.org/10.1145/2534848.2534850.

6

Malakasiotis, Prodromos, and Ion Androutsopoulos. "Learning textual entailment using SVMs and string similarity measures." In the ACL-PASCAL Workshop. Morristown, NJ, USA: Association for Computational Linguistics, 2007. http://dx.doi.org/10.3115/1654536.1654547.

7

Montalvo, Soto, Eduardo G. Pardo, Raquel Martinez, and Victor Fresno. "Automatic cognate identification based on a fuzzy combination of string similarity measures." In 2012 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2012. http://dx.doi.org/10.1109/fuzz-ieee.2012.6250802.

8

Park, Jungseo, Seunghwan Mun, Chungmin Hyun, Byungkwon Kang, and Kwanghee Ko. "Similarity Assessment Method for Automated Curved Plate Forming." In SNAME 5th World Maritime Technology Conference. SNAME, 2015. http://dx.doi.org/10.5957/wmtc-2015-240.

Abstract:
In this paper, a novel similarity estimation method for two shapes in automated thermal forming is proposed. One shape is given as a CAD surface, and the other is given as a set of measured points. These two shapes are registered with respect to a reference coordinate system so that they are aligned as closely as possible using an ICP-based method. Three geometric properties are considered in the method. The first property is the distance between the shapes: at each measured point, the closest distance to the CAD surface is computed, and the defined tolerance for these distances is used as a similarity measure. The second measure is the average of the minimum distances from the measured points to the CAD surface. The third is the average of the bending strain values at the measured points and at the points on the CAD surface that are the orthogonal projections of the measured ones. The proposed similarity is computed as a linear combination of the three properties with weight values that are determined empirically. Extensive experiments show that the proposed similarity method successfully computes the similarity of a plate to its CAD shape during the forming process.
9

Baldwin, Timothy, Huizhi Liang, Bahar Salehi, Doris Hoogeveen, Yitong Li, and Long Duong. "UniMelb at SemEval-2016 Task 3: Identifying Similar Questions by combining a CNN with String Similarity Measures." In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). Stroudsburg, PA, USA: Association for Computational Linguistics, 2016. http://dx.doi.org/10.18653/v1/s16-1131.

10

Xu, Xinyi, Huanhuan Cao, Yanhua Yang, Erkun Yang, and Cheng Deng. "Zero-shot Metric Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/555.

Abstract:
In this work, we tackle the zero-shot metric learning problem and propose a novel method abbreviated as ZSML, with the purpose to learn a distance metric that measures the similarity of unseen categories (even unseen datasets). ZSML achieves strong transferability by capturing multi-nonlinear yet continuous relation among data. It is motivated by two facts: 1) relations can be essentially described from various perspectives; and 2) traditional binary supervision is insufficient to represent continuous visual similarity. Specifically, we first reformulate a collection of specific-shaped convolutional kernels to combine data pairs and generate multiple relation vectors. Furthermore, we design a new cross-update regression loss to discover continuous similarity. Extensive experiments including intra-dataset transfer and inter-dataset transfer on four benchmark datasets demonstrate that ZSML can achieve state-of-the-art performance.

Reports on the topic "String similarity measure"

1

Brenan, J. M., K. Woods, J. E. Mungall, and R. Weston. Origin of chromitites in the Esker Intrusive Complex, Ring of Fire Intrusive Suite, as revealed by chromite trace element chemistry and simple crystallization models. Natural Resources Canada/CMSS/Information Management, 2021. http://dx.doi.org/10.4095/328981.

Abstract:
To better constrain the origin of the chromitites associated with the Esker Intrusive Complex (EIC) of the Ring of Fire Intrusive Suite (RoFIS), a total of 50 chromite-bearing samples from the Black Thor, Big Daddy, Blackbird, and Black Label chromite deposits have been analysed for major and trace elements. The samples represent three textural groups, as defined by the relative abundance of cumulate silicate phases and chromite. To provide deposit-specific partition coefficients for modeling, we also report on the results of laboratory experiments to measure olivine- and chromite-melt partitioning of V and Ga, which are two elements readily detectable in the chromites analysed. Comparison of the Cr/Cr+Al and Fe/Fe+Mg of the EIC chromites and compositions from previous experimental studies indicates overlap in Cr/Cr+Al between the natural samples and experiments done at >1400oC, but significant offset of the natural samples to higher Fe/Fe+Mg. This is interpreted to be the result of subsolidus Fe-Mg exchange between chromite and the silicate matrix. However, little change in Cr/Cr+Al from magmatic values, owing to the lack of an exchangeable reservoir for these elements. A comparison of the composition of the EIC chromites and a subset of samples from other tectonic settings reveals a strong similarity to chromites from the similarly-aged Munro Township komatiites. Partition coefficients for V and Ga are consistent with past results in that both elements are compatible in chromite (DV = 2-4; DGa ~ 3), and incompatible in olivine (DV = 0.01-0.14; DGa ~ 0.02), with values for V increasing with decreasing fO2. Simple fractional crystallization models that use these partition coefficients are developed that monitor the change in element behaviour based on the relative proportions of olivine to chromite in the crystallizing assemblage; from 'normal' cotectic proportions involving predominantly olivine, to chromite-only crystallization. Comparison of models to the natural chromite V-Ga array suggests that the overall positive correlation between these two elements is consistent with chromite formed from a Munro Township-like komatiitic magma crystallizing olivine and chromite in 'normal' cotectic proportions, with no evidence of the strong depletion in these elements expected for chromite-only crystallization. The V-Ga array can be explained if the initial magma responsible for chromite formation is slightly reduced with respect to the FMQ oxygen buffer (~FMQ- 0.5), and has assimilated up to ~20% of wall-rock banded iron formation or granodiorite. Despite the evidence for contamination, results indicate that the EIC chromitites crystallized from 'normal' cotectic proportions of olivine to chromite, and therefore no specific causative link is made between contamination and chromitite formation. Instead, the development of near- monomineralic chromite layers likely involves the preferential removal of olivine relative to chromite by physical segregation during magma flow. As suggested for some other chromitite-forming systems, the specific fluid dynamic regime during magma emplacement may therefore be responsible for crystal sorting and chromite accumulation.
2

Tidd, Alexander N., Richard A. Ayers, Grant P. Course, and Guy R. Pasco. Scottish Inshore Fisheries Integrated Data System (SIFIDS): work package 6 final report development of a pilot relational data resource for the collation and interpretation of inshore fisheries data. Edited by Mark James and Hannah Ladd-Jones. Marine Alliance for Science and Technology for Scotland (MASTS), 2019. http://dx.doi.org/10.15664/10023.23452.

Abstract:
[Extract from Executive Summary] The competition for space from competing sectors in the coastal waters of Scotland has never been greater and thus there is a growing a need for interactive seascape planning tools that encompass all marine activities. Similarly, the need to gather data to inform decision makers, especially in the fishing industry, has become essential to provide advice on the economic impact on fishing fleets both in terms of alternative conservation measures (e.g. effort limitations, temporal and spatial closures) as well as the overlap with other activities, thereby allowing stakeholders to derive a preferred option. The SIFIDS project was conceived to allow the different relevant data sources to be identified and to allow these data to be collated in one place, rather than as isolated data sets with multiple data owners. The online interactive tool developed as part of the project (Work Package 6) brought together relevant data sets and developed data storage facilities and a user interface to allow various types of user to view and interrogate the data. Some of these data sets were obtained as static layers which could sit as background data e.g. substrate type, UK fishing limits; whilst other data came directly from electronic monitoring systems developed as part of the SIFIDS project. The main non-static data source was Work Package 2, which was collecting data from a sample of volunteer inshore fishing vessels (<12m). This included data on location; time; vessel speed; count, time and position of deployment of strings of creels (or as fleets and pots as they are also known respectively); and a count of how many creels were hauled on these strings. The interactive online tool allowed all the above data to be collated in a specially designed database and displayed in near real time on the web-based application.
3

Klement, Eyal, Elizabeth Howerth, William C. Wilson, David Stallknecht, Danny Mead, Hagai Yadin, Itamar Lensky, and Nadav Galon. Exploration of the Epidemiology of a Newly Emerging Cattle-Epizootic Hemorrhagic Disease Virus in Israel. United States Department of Agriculture, January 2012. http://dx.doi.org/10.32747/2012.7697118.bard.

Abstract:
In September 2006 an outbreak of 'Bluetongue like' disease struck the cattle herds in Israel. Over 100 dairy and beef cattle herds were affected. Epizootic hemorrhagic disease virus (EHDV) (an Orbivirusclosely related to bluetongue virus (BTV)), was isolated from samples collected from several herds during the outbreaks. Following are the aims of the study and summary of the results: which up until now were published in 6 articles in peer-reviewed journals. Three more articles are still under preparation: 1. To identify the origin of the virus: The virus identified was fully sequenced and compared with the sequences available in the GenBank. It appeared that while gene segment L2 was clustered with EHDV-7 isolated in Australia, most of the other segments were clustered with EHDV-6 isolates from South-Africa and Bahrain. This may suggest that the strain which affected Israel on 2006 may have been related to similar outbreaks which occurred in north-Africa at the same year and could also be a result of reassortment with an Australian strain (Wilson et al. article in preparation). Analysis of the serological results from Israel demonstrated that cows and calves were similarly positive as opposed to BTV for which seropositivity in cows was significantly higher than in calves. This finding also supports the hypothesis that the 2006 EHD outbreak in Israel was an incursive event and the virus was not present in Israel before this outbreak (Kedmi et al. Veterinary Journal, 2011) 2. To identify the vectors of this virus: In the US, Culicoides sonorensis was found as an efficient vector of EHDV as the virus was transmitted by midges fed on infected white tailed deer (WTD; Odocoileusvirginianus) to susceptible WTD (Ruder et al. Parasites and Vectors, 2012). We also examined the effect of temperature on replication of EHDV-7 in C. sonorensis and demonstrated that the time to detection of potentially competent midges decreased with increasing temperature (Ruder et al. in preparation). Although multiple attempts were made, we failed to evaluate wild-caught Culicoidesinsignisas a potential vector for EHDV-7; however, our finding that C. sonorensis is a competent vector is far more significant because this species is widespread in the U.S. As for Israeli Culicoides spp. the main species caught near farms affected during the outbreaks were C. imicolaand C. oxystoma. The vector competence studies performed in Israel were in a smaller scale than in the US due to lack of a laboratory colony of these species and due to lack of facilities to infect animals with vector borne diseases. However, we found both species to be susceptible for infection by EHDV. For C. oxystoma, 1/3 of the Culicoidesinfected were positive 11 days post feeding. 3. To identify the host and environmental factors influencing the level of exposure to EHDV, its spread and its associated morbidity: Analysis of the cattle morbidity in Israel showed that the disease resulted in an average loss of over 200 kg milk per cow in herds affected during September 2006 and 1.42% excess mortality in heavily infected herds (Kedmi et al. Journal of Dairy Science, 2010). Outbreak investigation showed that winds played a significant role in virus spread during the 2006 outbreak (Kedmi et al. Preventive Veterinary Medicine, 2010). Further studies showed that both sheep (Kedmi et al. Veterinary Microbiology, 2011) and wild ruminants did not play a significant role in virus spread in Israel (Kedmi et al. article in preparation). 
Clinical studies in WTD showed that this species is highly susceptibile to EHDV-7 infection and disease (Ruder et al. Journal of Wildlife Diseases, 2012). Experimental infection of Holstein cattle (cows and calves) yielded subclinical viremia (Ruder et al. in preparation). The findings of this study, which resulted in 6 articles, published in peer reviewed journals and 4 more articles which are in preparation, contributed to the dairy industry in Israel by defining the main factors associated with disease spread and assessment of disease impact. In the US, we demonstrated that sufficient conditions exist for potential virus establishment if EHDV-7 were introduced. The significant knowledge gained through this study will enable better decision making regarding prevention and control measures for EHDV and similar viruses, such as BTV.