Academic literature on the topic 'String similarity measure'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'String similarity measure.'


Journal articles on the topic "String similarity measure"

1

Revesz, Peter Z. "A Tiling Algorithm-Based String Similarity Measure." WSEAS TRANSACTIONS ON COMPUTER RESEARCH 9 (August 10, 2021): 109–12. http://dx.doi.org/10.37394/232018.2021.9.13.

Abstract:
This paper describes a similarity measure for strings based on a tiling algorithm. The algorithm is applied to a pair of proteins that are described by their respective amino acid sequences. The paper also describes how the algorithm can be used to find highly conserved amino acid sequences and examples of horizontal gene transfer between different species.
2

Al-Bakry, Abbas, and Marwa Al-Rikaby. "Enhanced Levenshtein Edit Distance Method functioning as a String-to-String Similarity Measure." Iraqi Journal for Computers and Informatics 42, no. 1 (2016): 48–54. http://dx.doi.org/10.25195/ijci.v42i1.83.

Abstract:
Levenshtein is a Minimum Edit Distance method; it is usually used in spell-checking applications for generating candidates. The method computes the number of edit operations required to transform one string into another, and it can recognize three types of edit operations: deletion, insertion, and substitution of one letter. Damerau modified the Levenshtein method to consider another type of edit operation, the transposition of two adjacent letters, in addition to the three original types. However, the modification adds to the quadratic time complexity of the original method. In this paper, we propose a modification of the original Levenshtein method that considers the same four types using a very small number of matching operations, which results in a shorter execution time. A similarity measure is also derived that uses the distance produced by any Edit Distance method to find the amount of similarity between two given strings.
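
As a point of reference for the abstract above, the following is a minimal Python sketch of the textbook dynamic-programming edit distance, with an optional adjacent-transposition rule (the optimal string alignment variant of Damerau's extension). It is not the enhanced method proposed in the paper. The function names `edit_distance` and `similarity`, and the normalization by the longer string's length, are assumptions made for illustration.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Textbook quadratic-time edit distance (not the paper's enhanced method).

    transpositions=False gives the Levenshtein distance (insert, delete,
    substitute). transpositions=True also counts a swap of two adjacent
    letters as one operation (optimal string alignment variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def similarity(a: str, b: str, transpositions: bool = False) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b, transpositions) / longest
```

For example, `edit_distance("ca", "ac")` counts two substitutions, while `edit_distance("ca", "ac", transpositions=True)` counts a single swap.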
3

Sakunthala Prabha, K. S., C. Mahesh, and S. P. Raja. "An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm." Cybernetics and Information Technologies 21, no. 2 (2021): 105–20. http://dx.doi.org/10.2478/cait-2021-0022.

Abstract:
A topic-precise crawler is a special-purpose web crawler that downloads web pages relevant to a particular topic by measuring a cosine similarity or semantic similarity score. The cosine-based similarity measure yields an inaccurate relevance score if the topic term does not directly occur in the web page. The semantic-based similarity measure provides a precise relevance score even if only synonyms of the given topic occur in the web page. However, semantic focused crawlers produce an inaccurate relevance score when the topic is unavailable in the ontology. This paper overcomes these problems with a hybrid string-matching algorithm that combines the semantic similarity-based measure with the probabilistic similarity-based measure. The experimental results revealed that this algorithm increased the efficiency of focused web crawlers and achieved better Harvest Rate (HR), Precision (P) and Irrelevance Ratio (IR) than existing focused web crawlers.
4

Rakhmawati, Nur Aini, and Miftahul Jannah. "Food Ingredients Similarity Based on Conceptual and Textual Similarity." Halal Research Journal 1, no. 2 (2021): 87–95. http://dx.doi.org/10.12962/j22759970.v1i2.107.

Abstract:
Open Food Facts provides a database of food products, including product names, compositions, and additives, where everyone can contribute data or reuse the existing data. The Open Food Facts data are dirty and need to be processed before being stored in our system. To reduce redundancy in food ingredient data, we measure the similarity of food ingredients using two similarities: the conceptual similarity and the textual similarity. The conceptual similarity measures the similarity between the two datasets by word meaning (synonyms), while the textual similarity is based on fuzzy string matching, namely Levenshtein distance, Jaro-Winkler distance, and Jaccard distance. Based on our evaluation, the combination of textual and WordNet (conceptual) similarity measurements was the best-performing similarity method for food ingredients.
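
Of the three textual measures named in the abstract above, Jaccard distance is the simplest to illustrate. The sketch below assumes whitespace tokenization and lowercasing, which the abstract does not specify; the function name `jaccard_distance` and the sample ingredient strings are hypothetical.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Jaccard distance between the word sets of two strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    Returns 0.0 for identical token sets and 1.0 for disjoint ones.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


# Hypothetical ingredient names: 2 shared words out of 3 distinct words.
distance = jaccard_distance("palm oil", "refined palm oil")
```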
5

Znamenskij, Sergej Vital'evich. "Stable assessment of the quality of similarity algorithms of character strings and their normalizations." Program Systems: Theory and Applications 9, no. 4 (2018): 561–78. http://dx.doi.org/10.25209/2079-3316-2018-9-4-561-578.

Abstract:
Choosing search tools for hidden commonality in data of a new nature requires stable and reproducible comparative assessments of the quality of abstract algorithms for the proximity of symbol strings. Conventional estimates based on artificially generated or manually labeled tests vary significantly, evaluating the method of artificial generation rather than the similarity algorithms, and estimates based on user data cannot be accurately reproduced. A simple, transparent, objective and reproducible numerical quality assessment of a string metric is proposed. Parallel texts of book translations in different languages are used. The quality of a measure is estimated by the percentage of errors over the possible attempts to identify the translation of a given paragraph among two paragraphs of a book in another language, one of which is actually a translation. The stability of the assessments is verified by their independence from the choice of book and language pair. The numerical experiment stably ranked algorithms for abstract character string comparison by quality and showed a strong dependence on the choice of normalization.
6

Setiawan, Rudi. "Similarity Checking Similarity Checking of Source Code Module Using Running Karp Rabin Greedy String Tiling." Science Proceedings Series 1, no. 2 (2019): 43–46. http://dx.doi.org/10.31580/sps.v1i2.624.

Abstract:
Checking the similarity of source code modules is a long process if done manually. To address this problem, this research designed software with a structure-based approach that uses a string matching technique, the Running Karp-Rabin Greedy String Tiling (RKR-GST) algorithm, to check similarity, and the Dice Coefficient method to measure the level of similarity between two source code modules.

The results of the experiments show that RKR-GST as applied in this system is capable of recognizing changed statements and changed statement order, and is able to recognize the syntax procedure testing that has been taken from its comparison module. Modifications that add comments to a source code module or change the name of a procedure called in the body of another procedure can also be recognized by the system. The processing time needed to produce output depends on the number of lines of program code contained in the source code module.
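
The Dice Coefficient mentioned in the abstract above can be sketched in a few lines of Python. This sketch compares plain token sets and does not implement RKR-GST tiling, which the paper uses to find the matched regions; the function name `dice_coefficient` and the sample tokens are illustrative assumptions.

```python
from typing import Iterable


def dice_coefficient(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> float:
    """Dice coefficient: 2 * |A and B| / (|A| + |B|) over distinct tokens.

    Assumption: each module has already been reduced to a list of tokens;
    the paper derives its matches from RKR-GST tiles instead.
    """
    set_a, set_b = set(tokens_a), set(tokens_b)
    total = len(set_a) + len(set_b)
    if total == 0:
        return 1.0  # two empty modules are treated as identical
    return 2.0 * len(set_a & set_b) / total


# Hypothetical token lists: 2 shared tokens, 3 distinct tokens per module.
score = dice_coefficient(["if", "x", "return"], ["x", "return", "while"])
```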
7

Rodriguez, Wladimir, Mark Last, Abraham Kandel, and Horst Bunke. "Geometric Approach to Data Mining." International Journal of Image and Graphics 1, no. 2 (2001): 363–86. http://dx.doi.org/10.1142/s0219467801000220.

Abstract:
In this paper, a new, geometric approach to pattern identification in data mining is presented. It is based on applying string edit distance computation to measuring the similarity between multi-dimensional curves. The string edit distance computation is extended to allow the possibility of using strings, where each element is a vector rather than just a symbol. We discuss an approach for representing 3D-curves using the curvature and the tension as their symbolic representation. This transformation preserves all the information contained in the original 3D-curve. We validate this approach through experiments using synthetic and digitalized data. In particular, the proposed approach is suitable to measure the similarity of 3D-curves invariant under translation, rotation, and scaling. It also can be applied for partial curve matching.
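
The idea of an edit distance over strings whose elements are vectors can be sketched as follows. The cost model here is an assumption, not taken from the paper: substitution costs the Euclidean distance between the two vectors, and each insertion or deletion costs a fixed, hypothetical `gap_cost`.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def vector_edit_distance(a: List[Vector], b: List[Vector],
                         gap_cost: float = 1.0) -> float:
    """Edit distance where each 'character' is a numeric vector.

    Assumed costs: substitution = Euclidean distance between vectors,
    insertion and deletion = gap_cost. All vectors must share one dimension.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * gap_cost
    for j in range(1, cols):
        d[0][j] = j * gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = math.dist(a[i - 1], b[j - 1])
            d[i][j] = min(
                d[i - 1][j] + gap_cost,          # delete a[i-1]
                d[i][j - 1] + gap_cost,          # insert b[j-1]
                d[i - 1][j - 1] + substitution,  # replace a[i-1] by b[j-1]
            )
    return d[-1][-1]
```

In the paper's setting each vector would hold features such as curvature and tension at a point of the curve; any such encoding is left to the caller here.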
8

Samanta, Soumitra, Steve O’Hagan, Neil Swainston, Timothy J. Roberts, and Douglas B. Kell. "VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder." Molecules 25, no. 15 (2020): 3446. http://dx.doi.org/10.3390/molecules25153446.

Abstract:
Molecular similarity is an elusive but core “unsupervised” cheminformatics concept, yet different “fingerprint” encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are “better” than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a “bowtie”-shaped artificial neural network. In the middle is a “bottleneck layer” or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
9

Zhu, Jin, Dayu Cheng, Weiwei Zhang, Ci Song, Jie Chen, and Tao Pei. "A New Approach to Measuring the Similarity of Indoor Semantic Trajectories." ISPRS International Journal of Geo-Information 10, no. 2 (2021): 90. http://dx.doi.org/10.3390/ijgi10020090.

Abstract:
People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods. These methods are based on a measure that reflects the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing trajectory similarity measures ignore the indoor movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to an inaccurate measure of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information such as the indoor point of interest into the indoor trajectory similarity measurement is beneficial to discovering pedestrians having similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is modified from the edit distance that is a measure of the distance between string sequences. The key component of the ISTSM is an indoor navigation graph that is transformed from an indoor floor plan representing the indoor space for computing accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and real dataset for a shopping mall. 
The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarities, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of a shopping mall shows that the ISTSM effectively reveals customer movement patterns of indoor customers.
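
One of the baselines named above, the longest common subsequence, can be sketched for symbolic sequences as follows. This is the exact-match form; trajectory LCSS variants usually match points within a distance threshold, which is omitted here. The function name `lcs_length` and the point-of-interest labels are hypothetical, and this is not the ISTSM itself.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of two symbol sequences."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical semantic trajectories as sequences of visited places.
visit_a = ["entrance", "cafe", "shoes", "exit"]
visit_b = ["entrance", "shoes", "books", "exit"]
shared = lcs_length(visit_a, visit_b)  # entrance, shoes, exit
```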
10

Sabarish, B. A., Karthi R., and Gireesh Kumar T. "String-Based Feature Representation for Trajectory Clustering." International Journal of Embedded and Real-Time Communication Systems 10, no. 2 (2019): 1–18. http://dx.doi.org/10.4018/ijertcs.2019040101.

Abstract:
A trajectory is the spatial trail of a moving object as a function of time. All moving objects such as humans, robots, cloud, taxis, animals, mobile phones generate trajectories. Trajectory clustering is grouping of trajectories that have similar moving patterns, and the formed clusters depend on feature representation, similarity metrics, and clustering algorithm used. In this article, trajectory features are generated after mapping trajectories onto grids, as this smoothens the variations that occur in spatial coordinates. These variations occur due to differences in how GPS points at varying intervals are generated by the device, even when they follow the same path. The main motivation for the article is to devise an algorithm for trajectory clustering that is independent of the variations from GPS devices. A string-based model is used, where trajectories are represented as strings and string-based distance metrics are used to measure the similarity between trajectories. A hierarchical method is applied for clustering and the results are validated using three metrics. An experimental study is conducted and the results show the effectiveness of string-based representation and distance metrics for trajectory clustering.
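
The grid-mapping step described above might look like the sketch below, which turns a list of coordinates into a sequence of grid cells that can then be compared with any string distance. The square grid, the `cell_size` parameter, and the merging of consecutive repeated cells are assumptions made for illustration, not details taken from the article.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def trajectory_to_cells(points: List[Point], cell_size: float) -> List[Cell]:
    """Map each point to a square grid cell and drop consecutive repeats.

    Dropping repeats means that two recordings of the same path sampled at
    different rates can yield the same cell sequence.
    """
    cells: List[Cell] = []
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


# Two hypothetical recordings of one path with different sampling rates.
dense = trajectory_to_cells([(0.1, 0.2), (0.4, 0.9), (1.2, 0.5), (1.9, 1.1)], 1.0)
sparse = trajectory_to_cells([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5)], 1.0)
```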