Journal articles on the topic 'String similarity measure'

1

Revesz, Peter Z. "A Tiling Algorithm-Based String Similarity Measure." WSEAS TRANSACTIONS ON COMPUTER RESEARCH 9 (August 10, 2021): 109–12. http://dx.doi.org/10.37394/232018.2021.9.13.

Abstract:
This paper describes a similarity measure for strings based on a tiling algorithm. The algorithm is applied to pairs of proteins described by their respective amino acid sequences. The paper also describes how the algorithm can be used to find highly conserved amino acid sequences and examples of horizontal gene transfer between different species.
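
The abstract does not reproduce the paper's exact tiling algorithm; as a hedged illustration, the sketch below implements a generic greedy tiling similarity: both strings are repeatedly covered by their longest common substrings ("tiles"), and the score is the tiled fraction of the two strings. The function names and minimum tile length are assumptions for illustration.

```python
# Minimal sketch of a tiling-style string similarity (an assumption, not
# the paper's exact algorithm): greedily cover both strings with their
# longest common substrings ("tiles") and score the covered fraction.

def longest_common_tile(a, b, used_a, used_b, min_len=2):
    """Longest common substring that avoids already-tiled positions."""
    best = (0, 0, 0)  # (length, start_a, start_b)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while (i + k < len(a) and j + k < len(b)
                   and a[i + k] == b[j + k]
                   and not used_a[i + k] and not used_b[j + k]):
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best if best[0] >= min_len else (0, 0, 0)

def tiling_similarity(a, b, min_len=2):
    used_a, used_b = [False] * len(a), [False] * len(b)
    covered = 0
    while True:
        k, i, j = longest_common_tile(a, b, used_a, used_b, min_len)
        if k == 0:
            break
        for t in range(k):
            used_a[i + t] = used_b[j + t] = True
        covered += k
    return 2 * covered / (len(a) + len(b)) if a or b else 1.0

print(tiling_similarity("MKTAYIAKQR", "MKTAHIAKQR"))  # 0.9: 9 of 10 positions tiled
```

On this protein-like example the tiles "IAKQR" and "MKTA" cover nine of the ten positions in each string, giving a similarity of 0.9.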
2

Al-Bakry, Abbas, and Marwa Al-Rikaby. "Enhanced Levenshtein Edit Distance Method functioning as a String-to-String Similarity Measure." Iraqi Journal for Computers and Informatics 42, no. 1 (December 31, 2016): 48–54. http://dx.doi.org/10.25195/ijci.v42i1.83.

Abstract:
Levenshtein is a minimum edit distance method; it is usually used in spell-checking applications for generating candidates. The method computes the number of edit operations required to transform one string into another, and it can recognize three types of edit operations: deletion, insertion, and substitution of one letter. Damerau modified the Levenshtein method to consider a fourth type of edit operation, the transposition of two adjacent letters, in addition to the three considered types. However, the modification adds to the quadratic time complexity of the original method. In this paper, we propose a modification of the original Levenshtein method that considers the same four types of operation using a very small number of matching operations, resulting in a shorter execution time; a similarity measure is also derived that exploits the distance produced by any edit distance method to quantify the similarity between two given strings.
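
For reference, here is a minimal sketch of the classical restricted Damerau-Levenshtein distance the abstract builds on, covering the four edit operations it names. The normalization into a similarity is one common convention assumed here, not necessarily the authors' exact formula.

```python
# Classical (restricted) Damerau-Levenshtein distance: deletion,
# insertion, substitution, and transposition of adjacent letters.
# A baseline sketch, not the authors' optimized variant.

def damerau_levenshtein(s, t):
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def edit_similarity(s, t):
    # one common normalization (an assumption, not the paper's formula)
    if not s and not t:
        return 1.0
    return 1.0 - damerau_levenshtein(s, t) / max(len(s), len(t))

print(damerau_levenshtein("receive", "recieve"))  # 1: a single transposition
print(edit_similarity("receive", "recieve"))      # ~0.857
```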
3

Sakunthala Prabha, K. S., C. Mahesh, and S. P. Raja. "An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm." Cybernetics and Information Technologies 21, no. 2 (June 1, 2021): 105–20. http://dx.doi.org/10.2478/cait-2021-0022.

Abstract:
A topic precise crawler is a special-purpose web crawler that downloads web pages relevant to a particular topic by measuring a cosine similarity or semantic similarity score. The cosine-based similarity measure yields an inaccurate relevance score if the topic term does not directly occur in the web page. The semantic-based similarity measure provides a precise relevance score even when only synonyms of the given topic occur in the web page, but if the topic is unavailable in the ontology, semantic focused crawlers also produce inaccurate relevance scores. This paper overcomes these glitches with a hybrid string-matching algorithm that combines the semantic similarity-based measure with the probabilistic similarity-based measure. The experimental results revealed that this algorithm increased the efficiency of focused web crawlers and achieved better Harvest Rate (HR), Precision (P), and Irrelevance Ratio (IR) than existing focused web crawlers.
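
A tiny sketch of the term-frequency cosine score that the abstract contrasts with semantic similarity; it illustrates why a page using only synonyms of the topic scores zero. The whitespace tokenization is an assumption for illustration.

```python
# Cosine similarity over raw term-frequency vectors: zero overlap in
# vocabulary means a zero score, even for perfect synonyms.
from collections import Counter
from math import sqrt

def cosine_similarity(doc_a, doc_b):
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("car sale", "used car for sale"))  # ~0.71
print(cosine_similarity("car sale", "automobile vendor"))  # 0.0 despite synonymy
```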
4

Rakhmawati, Nur Aini, and Miftahul Jannah. "Food Ingredients Similarity Based on Conceptual and Textual Similarity." Halal Research Journal 1, no. 2 (October 27, 2021): 87–95. http://dx.doi.org/10.12962/j22759970.v1i2.107.

Abstract:
Open Food Facts provides a database of food products, including product names, compositions, and additives, to which everyone can contribute new data or reuse the existing data. The Open Food Facts data are dirty and need to be processed before being stored in our system. To reduce redundancy in the food ingredients data, we measure the similarity of ingredients using two measures: conceptual similarity and textual similarity. Conceptual similarity measures the similarity between two entries by word meaning (synonymy), while textual similarity is based on fuzzy string matching, namely the Levenshtein distance, Jaro-Winkler distance, and Jaccard distance. Based on our evaluation, combining textual similarity with WordNet (conceptual) similarity was the most effective similarity method for food ingredients.
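
As a hedged illustration of the textual side, the sketch below implements the Jaccard distance over token sets, one of the three fuzzy measures named; Levenshtein and Jaro-Winkler plug into the same pairwise pattern. The ingredient strings are illustrative, not drawn from Open Food Facts.

```python
# Jaccard distance over word-token sets: 1 - |A ∩ B| / |A ∪ B|.

def jaccard_distance(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

print(jaccard_distance("palm oil", "palm kernel oil"))  # ~0.33
print(jaccard_distance("sugar", "glucose syrup"))       # 1.0: textual matching fails
                                                        # here; the conceptual
                                                        # (WordNet) side is needed
```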
5

Znamenskij, Sergej Vital'evich. "Stable assessment of the quality of similarity algorithms of character strings and their normalizations." Program Systems: Theory and Applications 9, no. 4 (December 28, 2018): 561–78. http://dx.doi.org/10.25209/2079-3316-2018-9-4-561-578.

Abstract:
Choosing search tools for hidden commonality in data of a new nature requires stable and reproducible comparative assessments of the quality of abstract string-similarity algorithms. Conventional estimates based on artificially generated or manually labeled tests vary significantly, effectively evaluating the method of artificial generation with respect to the similarity algorithms rather than the algorithms themselves, while estimates based on user data cannot be accurately reproduced. This paper proposes a simple, transparent, objective, and reproducible numerical quality assessment of a string metric. Parallel texts of book translations in different languages are used. The quality of a measure is estimated by the percentage of errors made when determining the translation of a given paragraph from two candidate paragraphs of a book in another language, one of which is actually the translation. The stability of the assessments is verified by their independence from the choice of book and language pair. The numerical experiment consistently ranked the abstract string-comparison algorithms by quality and showed a strong dependence on the choice of normalization.
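
A sketch of the evaluation protocol as described, under the assumption that the parallel book is available as two aligned lists of paragraphs; any string similarity function is scored by its percentage of errors at preferring the true translation over a random decoy.

```python
# Error-rate benchmark for a string similarity measure on a parallel
# corpus (assumed layout: paras_lang1[i] translates paras_lang2[i]).
import random

def error_rate(similarity, paras_lang1, paras_lang2, trials=1000, seed=0):
    rng = random.Random(seed)
    errors = 0
    n = len(paras_lang1)
    for _ in range(trials):
        i = rng.randrange(n)
        j = rng.choice([k for k in range(n) if k != i])  # decoy paragraph
        truth = similarity(paras_lang1[i], paras_lang2[i])
        decoy = similarity(paras_lang1[i], paras_lang2[j])
        if decoy >= truth:  # measure failed to prefer the true translation
            errors += 1
    return 100.0 * errors / trials
```

Any of the measures appearing elsewhere in this list (edit-distance similarity, Dice, cosine) can be passed in as `similarity`.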
6

Setiawan, Rudi. "Similarity Checking of Source Code Module Using Running Karp Rabin Greedy String Tiling." Science Proceedings Series 1, no. 2 (April 24, 2019): 43–46. http://dx.doi.org/10.31580/sps.v1i2.624.

Abstract:
Checking the similarity of source code modules takes a long time if done manually. To address this problem, this research designed software with a structure-based approach that uses the Running Karp-Rabin Greedy String Tiling (RKR-GST) string matching algorithm to check similarity and the Dice coefficient to measure the level of similarity between two source code modules. The experiments show that RKR-GST as applied in this system is capable of recognizing changed statements and changed statement order, and of recognizing procedure code copied from the compared module. Modifications such as adding comments to a source code module or renaming a procedure that is called in the body of another procedure can also be recognized by the system. The processing time needed to produce output depends on the number of lines of program code contained in the source code module.
7

RODRIGUEZ, WLADIMIR, MARK LAST, ABRAHAM KANDEL, and HORST BUNKE. "GEOMETRIC APPROACH TO DATA MINING." International Journal of Image and Graphics 01, no. 02 (April 2001): 363–86. http://dx.doi.org/10.1142/s0219467801000220.

Abstract:
In this paper, a new geometric approach to pattern identification in data mining is presented. It is based on applying string edit distance computation to measuring the similarity between multi-dimensional curves. The string edit distance computation is extended to allow strings whose elements are vectors rather than single symbols. We discuss an approach for representing 3D curves using the curvature and the tension as their symbolic representation. This transformation preserves all the information contained in the original 3D curve. We validate this approach through experiments using synthetic and digitized data. In particular, the proposed approach is suitable for measuring the similarity of 3D curves invariant under translation, rotation, and scaling. It can also be applied to partial curve matching.
8

Samanta, Soumitra, Steve O’Hagan, Neil Swainston, Timothy J. Roberts, and Douglas B. Kell. "VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder." Molecules 25, no. 15 (July 29, 2020): 3446. http://dx.doi.org/10.3390/molecules25153446.

Abstract:
Molecular similarity is an elusive but core “unsupervised” cheminformatics concept, yet different “fingerprint” encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are “better” than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a “bowtie”-shaped artificial neural network. In the middle is a “bottleneck layer” or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a novel metric for molecular similarity that is easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
9

Zhu, Jin, Dayu Cheng, Weiwei Zhang, Ci Song, Jie Chen, and Tao Pei. "A New Approach to Measuring the Similarity of Indoor Semantic Trajectories." ISPRS International Journal of Geo-Information 10, no. 2 (February 20, 2021): 90. http://dx.doi.org/10.3390/ijgi10020090.

Abstract:
People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods. These methods are based on a measure that reflects the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing trajectory similarity measures ignore the indoor movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to an inaccurate measure of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information, such as indoor points of interest, into the indoor trajectory similarity measurement is beneficial for discovering pedestrians with similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is modified from the edit distance, a measure of the distance between string sequences. The key component of the ISTSM is an indoor navigation graph, transformed from an indoor floor plan representing the indoor space, for computing accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and a real dataset from a shopping mall. The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarity measures, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of a shopping mall shows that the ISTSM effectively reveals the movement patterns of indoor customers.
10

Sabarish, B. A., Karthi R., and Gireesh Kumar T. "String-Based Feature Representation for Trajectory Clustering." International Journal of Embedded and Real-Time Communication Systems 10, no. 2 (April 2019): 1–18. http://dx.doi.org/10.4018/ijertcs.2019040101.

Abstract:
A trajectory is the spatial trail of a moving object as a function of time. All moving objects, such as humans, robots, clouds, taxis, animals, and mobile phones, generate trajectories. Trajectory clustering is the grouping of trajectories that have similar moving patterns, and the formed clusters depend on the feature representation, similarity metric, and clustering algorithm used. In this article, trajectory features are generated after mapping trajectories onto grids, as this smoothens the variations that occur in spatial coordinates. These variations arise from differences in how GPS points are generated at varying intervals by devices, even when they follow the same path. The main motivation for the article is to devise an algorithm for trajectory clustering that is independent of the variations from GPS devices. A string-based model is used, where trajectories are represented as strings and string-based distance metrics are used to measure the similarity between trajectories. A hierarchical method is applied for clustering and the results are validated using three metrics. An experimental study is conducted and the results show the effectiveness of string-based representation and distance metrics for trajectory clustering.
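
A minimal sketch of the grid-based representation the abstract describes (the cell size, coordinate convention, and deduplication of consecutive cells are assumptions): once GPS points are snapped to cells, a trajectory becomes a sequence of symbols and ordinary string distance metrics apply.

```python
# Snap GPS points to grid cells and drop consecutive duplicates, turning
# a trajectory into a "string" over the alphabet of grid cells.

def trajectory_to_string(points, cell=0.01):
    symbols = []
    for lon, lat in points:
        cell_id = (int(lon // cell), int(lat // cell))
        if not symbols or symbols[-1] != cell_id:
            symbols.append(cell_id)
    return symbols

t1 = trajectory_to_string([(10.001, 50.001), (10.002, 50.0015), (10.011, 50.001)])
t2 = trajectory_to_string([(10.0015, 50.002), (10.012, 50.0005)])
print(t1, t2)  # nearby GPS traces collapse to the same cell sequence
```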
11

Qiu, Dehong, Jialin Sun, and Hao Li. "Improving Similarity Measure for Java Programs Based on Optimal Matching of Control Flow Graphs." International Journal of Software Engineering and Knowledge Engineering 25, no. 07 (September 2015): 1171–97. http://dx.doi.org/10.1142/s0218194015500229.

Abstract:
Measuring program similarity plays an important role in solving many problems in software engineering. However, because programs are instruction sequences with complex structures and semantic functions, and may furthermore be obfuscated deliberately through semantics-preserving transformations, measuring program similarity is a difficult task that has not been adequately addressed. In this paper, we propose a new approach to measuring Java program similarity. The approach first measures the low-level similarity between basic blocks according to the bytecode instruction sequences and the structural property of the basic blocks. Then, an error-tolerant graph matching algorithm that can combat structure transformations is used to match the Control Flow Graphs (CFG) based on the basic block similarity. The high-level similarity between Java programs is subsequently calculated on the matched pairs of the independent paths extracted from the optimal CFG matching. The proposed CFG-Match approach is compared with a string-based approach, a tree-based approach and a graph-based approach. Experimental results show that the CFG-Match approach is more accurate and robust against semantics-preserving transformations. The CFG-Match approach is used to detect Java program plagiarism. Experiments on benchmark program pairs collected from students’ submissions of project assignments demonstrate that the CFG-Match approach outperforms the comparative approaches in the detection of Java program plagiarism.
12

LYRAS, DIMITRIOS P., KYRIAKOS N. SGARBAS, and NIKOLAOS D. FAKOTAKIS. "APPLYING SIMILARITY MEASURES FOR AUTOMATIC LEMMATIZATION: A CASE STUDY FOR MODERN GREEK AND ENGLISH." International Journal on Artificial Intelligence Tools 17, no. 05 (October 2008): 1043–64. http://dx.doi.org/10.1142/s021821300800428x.

Abstract:
This paper addresses the problem of automatic induction of the normalized form (lemma) of regular and mildly irregular words with no direct supervision, using language-independent algorithms. More specifically, two string distance metric models (the Levenshtein edit distance algorithm and the Dice coefficient similarity measure) were employed to deal with the automatic word lemmatization task by combining two alignment models based on string similarity and the most frequent inflectional suffixes. The performance of the proposed model has been evaluated quantitatively and qualitatively. Experiments were performed for the Modern Greek and English languages, and the results, which are within the state of the art, have shown that the proposed model is robust (for a variety of languages) and computationally efficient. The proposed model may be useful as a pre-processing tool for various language engineering and text mining applications such as spell-checkers, electronic dictionaries, and morphological analyzers.
13

Ibrahim, Arsmah, Zainab Abu Bakar, Nuru’l–‘Izzah Othman, and Nor Fuzaina Ismail. "Assessing the Line-By-Line Marking Performance of n-Gram String Similarity Method." Scientific Research Journal 6, no. 1 (June 30, 2009): 15. http://dx.doi.org/10.24191/srj.v6i1.5636.

Abstract:
Manual marking of free-response solutions in mathematics assessments is very demanding in terms of time and effort. Available software equipped with automated marking features for open-ended questions has very limited capabilities; in most cases the marking process focuses on the final answer only. Few available packages are capable of marking the intermediate steps, as is the norm in manual marking. This paper discusses the line-by-line marking performance of the n-gram string similarity method using the Dice coefficient as the means to measure similarity. The marks awarded by the automated marking process are compared with marks awarded by manual marking, which are used as the benchmark to gauge how closely the automated technique approaches manual marking.
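
A hedged sketch of the n-gram/Dice machinery described above, using character bigrams and whitespace removal as assumptions; the paper's exact tokenization of solution lines is not reproduced.

```python
# Dice coefficient over character n-gram sets: 2|A ∩ B| / (|A| + |B|).

def ngrams(s, n=2):
    s = s.replace(" ", "")  # assumed: whitespace-insensitive marking
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def dice_coefficient(a, b, n=2):
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

print(dice_coefficient("2x+3=7", "2x + 3 = 7"))  # 1.0 after whitespace removal
print(dice_coefficient("x=2", "x=4"))            # 0.5: partial credit on a step
```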
14

WANG, SHENG, and WEI-MOU ZHENG. "CLePAPS: FAST PAIR ALIGNMENT OF PROTEIN STRUCTURES BASED ON CONFORMATIONAL LETTERS." Journal of Bioinformatics and Computational Biology 06, no. 02 (April 2008): 347–66. http://dx.doi.org/10.1142/s0219720008003461.

Abstract:
Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized states of 3D segmental structural states. A letter corresponds to a cluster of combinations of the three angles formed by Cα pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. CLePAPS regards an aligned fragment pair (AFP) as an ungapped string pair with a high sum of pairwise CLESUM scores. Using CLESUM scores as the similarity measure, CLePAPS searches for AFPs by simple string comparison. The transformation which best superimposes a highly similar AFP can be used to superimpose the structure pairs under comparison. A highly scored AFP which is consistent with several other AFPs determines an initial alignment. CLePAPS then joins consistent AFPs guided by their similarity scores to extend the alignment by several "zoom-in" iteration steps. A follow-up refinement produces the final alignment. CLePAPS does not implement dynamic programming. The utility of CLePAPS is tested on various protein structure pairs.
15

Son, Nguyen Van, Le Thanh Huong, and Nguyen Chi Thanh. "A two-phase plagiarism detection system based on multi-layer long short-term memory networks." IAES International Journal of Artificial Intelligence (IJ-AI) 10, no. 3 (September 1, 2021): 636. http://dx.doi.org/10.11591/ijai.v10.i3.pp636-648.

Abstract:
Finding plagiarized strings between two given documents is the main task of the plagiarism detection problem. Traditional approaches based on string matching are not very useful in cases of semantic plagiarism. Deep learning approaches solve this problem by measuring the semantic similarity between pairs of sentences. However, these approaches still face the following challenges. First, they cannot handle cases where only part of a sentence belongs to a plagiarized passage. Second, measuring sentential similarity without considering the context of surrounding sentences decreases accuracy. To solve these problems, this paper proposes a two-phase plagiarism detection system based on a multi-layer long short-term memory network model and a feature extraction technique: (i) a passage phase to recognize plagiarized passages, and (ii) a word phase to determine the exact plagiarized strings. Our experimental results on the PAN 2014 corpus reached a 94.26% F-measure, higher than existing research in this field.
16

TSAY, YIH-TAY, and WEN-HSIANG TSAI. "MODEL-GUIDED ATTRIBUTED STRING MATCHING BY SPLIT-AND-MERGE FOR SHAPE RECOGNITION." International Journal of Pattern Recognition and Artificial Intelligence 03, no. 02 (June 1989): 159–79. http://dx.doi.org/10.1142/s0218001489000140.

Abstract:
Due to noise and distortion, segmentation uncertainty is a key problem in structural pattern analysis. In this paper we propose the use of the split operation for shape recognition by attributed string matching. After illustrating the disadvantage of attributed string matching using the merge operation, the split operation is proposed. Under the guidance of the model shape, an input shape can be reapproximated, using the split operation, into a new attributed string representation. By combining the split and the merge operations for shape matching it is unnecessary to apply any type of edit operation to a model shape. This makes the distance between the input shape and the model shape more meaningful and stable, and improves recognition results. An algorithm for attributed string matching by split-and-merge is proposed. To eliminate the effect of the numbers of primitives in the model shape on the shape distance, shape recognition based on a similarity measure is also proposed. Good experimental results prove the feasibility of the proposed approach for general shape recognition.
17

Putra, Pandu Pratama, Afriansyah Afriansyah, and Muhammad Syaifullah. "Pendeteksi Kesamaan Dokumen pada Sistem Informasi Pendaftaran Proposal Skripsi dengan Pendekatan Algoritma Rabin-Karp." INTECOMS: Journal of Information Technology and Computer Science 2, no. 1 (June 30, 2019): 40–47. http://dx.doi.org/10.31539/intecoms.v2i1.738.

Abstract:
Plagiarism is a significant problem in many areas, including universities. Plagiarism of digital content is usually performed by copy-pasting from the original document. To counter it, we need a way to analyze the techniques of plagiarism. Several approaches can be taken, for example the Rabin-Karp string search algorithm, which can be used to detect plagiarism in a text document. In the testing phase, three test documents were used, with similarity levels categorized as low, medium, and high. The tests show that this approach can find the longest identical quotation between two text documents and measure the similarity of the documents. This system will help prevent acts of plagiarism in thesis proposal registration, so that no identical theses are submitted.

Keywords: Plagiarism, Rabin-Karp algorithm, Similarity, Document
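
A minimal Rabin-Karp sketch for context: a rolling hash lets every window of the text be checked against a pattern hash in near-linear time, which is what makes the algorithm practical for plagiarism detection. The base and modulus are illustrative assumptions.

```python
# Rabin-Karp substring search with a rolling hash; matches are verified
# character-by-character to rule out hash collisions.

def rabin_karp(text, pattern, base=256, mod=1_000_003):
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:  # roll the window one character to the right
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits

print(rabin_karp("plagiarism is a significant problem", "significant"))  # [16]
```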
18

CHALI, YLLIAS, and SADID A. HASAN. "Query-focused multi-document summarization: automatic data annotations and supervised learning approaches." Natural Language Engineering 18, no. 1 (April 7, 2011): 109–45. http://dx.doi.org/10.1017/s1351324911000167.

Abstract:
In this paper, we apply different supervised learning techniques to build query-focused multi-document summarization systems, where the task is to produce automatic summaries in response to a given query or specific information request stated by the user. A huge amount of labeled data is a prerequisite for supervised training. It is expensive and time-consuming when humans perform the labeling task manually. Automatic labeling can be a good remedy to this problem. We employ five different automatic annotation techniques to build extracts from human abstracts using ROUGE, Basic Element overlap, syntactic similarity measure, semantic similarity measure, and Extended String Subsequence Kernel. The supervised methods we use are Support Vector Machines, Conditional Random Fields, Hidden Markov Models, Maximum Entropy, and two ensemble-based approaches. During different experiments, we analyze the impact of automatic labeling methods on the performance of the applied supervised methods. To our knowledge, no other study has deeply investigated and compared the effects of using different automatic annotation techniques on different supervised learning approaches in the domain of query-focused multi-document summarization.
19

Pratama, Zudha, Ema Utami, and M. Rudyanto Arief. "Analisa Perbandingan Jenis N-GRAM Dalam Penentuan Similarity Pada Deteksi Plagiat." Creative Information Technology Journal 4, no. 4 (January 12, 2019): 254. http://dx.doi.org/10.24076/citec.2017v4i4.118.

Abstract:
Easy access to information has made plagiarism increasingly prevalent. Such actions can be prevented using a plagiarism detection system. The system can be built on the concept of similarity, with the Rabin-Karp algorithm for string matching and n-grams as the parsing method. Earlier studies using both algorithms show good system results for plagiarism detection. Related work from abroad has addressed plagiarism detection and produced new findings such as cross-language similarity, along with new facts about plagiarism detection obtained from various testing methods and from combining existing methods to improve detection results. Our goal in this study is to compare parsing methods to find out which one gives the fastest results while remaining within a reasonable accuracy. As a control for accuracy, we use Plagiarism Checker X Free, determining the accuracy of our test instrument from the difference between that application's similarity score and our instrument's. We found that word n-grams have the most optimal accuracy compared to other n-grams while still being relatively the fastest.

Keywords: comparison, n-gram, text similarity, plagiarism detection
20

Kuang, Teo Poh, Hamidah Ibrahim, Fatimah Sidi, Nur Izura Udzir, and Ali A. Alwan. "An Effective Naming Heterogeneity Resolution for XACML Policy Evaluation in a Distributed Environment." Symmetry 13, no. 12 (December 12, 2021): 2394. http://dx.doi.org/10.3390/sym13122394.

Abstract:
Policy evaluation is the process of determining whether a request submitted by a user satisfies the access control policies defined by an organization. Naming heterogeneity between the attribute values of a request and a policy is common due to syntactic and terminological variations, particularly among organizations in a distributed environment. Existing policy evaluation engines employ a simple string-equality matching function to evaluate the similarity between the attribute values of a request and a policy, which is inaccurate, since only exact matches are considered similar. This work proposes several matching functions, not limited to string equality, that aim to resolve various types of naming heterogeneity. Our proposed solution is also capable of supporting symmetrical architecture applications, in which the organization can negotiate with users for the release of resources and properties that raise privacy concerns. The effectiveness of the proposed matching functions is evaluated on real XACML policies designed for universities, conference management, and the health care domain. The results show that the proposed solution achieved higher percentages of Recall and F-measure than the standard Sun's XACML implementation; with our improvement, these measures gained up to 70% and 57%, respectively.
21

MOHRI, MEHRYAR. "EDIT-DISTANCE OF WEIGHTED AUTOMATA: GENERAL DEFINITIONS AND ALGORITHMS." International Journal of Foundations of Computer Science 14, no. 06 (December 2003): 957–82. http://dx.doi.org/10.1142/s0129054103002114.

Abstract:
The problem of computing the similarity between two sequences arises in many areas such as computational biology and natural language processing. A common measure of the similarity of two strings is their edit-distance, that is, the minimal cost of a series of symbol insertions, deletions, or substitutions transforming one string into the other. In several applications such as speech recognition or computational biology, the objects to compare are distributions over strings, i.e., sets of strings representing a range of alternative hypotheses with their associated weights or probabilities. We define the edit-distance of two distributions over strings and present algorithms for computing it when these distributions are given by automata. In the particular case where two sets of strings are given by unweighted automata, their edit-distance can be computed using the general algorithm of composition of weighted transducers combined with a single-source shortest-paths algorithm. In the general case, we show that general weighted automata algorithms over the appropriate semirings can be used to compute the edit-distance of two weighted automata exactly. These include classical algorithms such as the composition and ε-removal of weighted transducers and a new and simple synchronization algorithm for weighted transducers which, combined with ε-removal, can be used to normalize weighted transducers with bounded delays. Our algorithm for computing the edit-distance of weighted automata can be used to improve the word accuracy of automatic speech recognition systems. It can also be extended to provide an edit-distance automaton useful for re-scoring and other post-processing purposes in the context of large-vocabulary speech recognition.
22

SAKAKIBARA, YASUBUMI, KRIS POPENDORF, NANA OGAWA, KIYOSHI ASAI, and KENGO SATO. "STEM KERNELS FOR RNA SEQUENCE ANALYSES." Journal of Bioinformatics and Computational Biology 05, no. 05 (October 2007): 1103–22. http://dx.doi.org/10.1142/s0219720007003028.

Abstract:
Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA, and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence detect noncoding RNA regions from genome sequences. A novel kernel function, stem kernel, for the discrimination and detection of functional RNA sequences using support vector machines (SVMs) is proposed. The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm is developed to calculate the stem kernels based on dynamic programming. The stem kernels are then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures. This is because the string kernel is proven to work for the remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel in order to find novel RNA families from genome sequences.
23

Pinaire, Jessica, Etienne Chabert, Jérôme Azé, Sandra Bringay, and Paul Landais. "Sequential Pattern Mining to Predict Medical In-Hospital Mortality from Administrative Data: Application to Acute Coronary Syndrome." Journal of Healthcare Engineering 2021 (May 25, 2021): 1–12. http://dx.doi.org/10.1155/2021/5531807.

Abstract:
Prediction of a medical outcome based on a trajectory of care has generated a lot of interest in medical research. In sequence prediction modeling, models based on machine learning (ML) techniques have proven their efficiency compared to other models. In addition, reducing model complexity is a challenge; solutions have been proposed by introducing pattern mining techniques. Based on these results, we developed a new method to extract sets of relevant event sequences for medical event prediction, applied to predicting the risk of in-hospital mortality in acute coronary syndrome (ACS). From the French Hospital Discharge Database, we mined sequential patterns. They were further integrated into several predictive models using a text string distance to measure the similarity between patients' patterns of care. We evaluated commonly used combinations of similarity measures and ML models. A Support Vector Machine model coupled with an edit-based distance appeared to be the most effective model. We obtained good results in terms of discrimination, with receiver operating characteristic curve scores ranging from 0.71 to 0.99 and good overall accuracy. We demonstrated the value of sequential patterns for event prediction. This could be a first step toward a decision-support tool for the prevention of in-hospital death from ACS.
24

Birkenes, Magnus Breder, and Jürg Fleischer. "Syntactic vs. phonological areas: A quantitative perspective on Hessian dialects." Journal of Linguistic Geography 9, no. 2 (October 2021): 142–61. http://dx.doi.org/10.1017/jlg.2021.9.

Abstract:
This paper takes a quantitative perspective on data from the project Syntax hessischer Dialekte (SyHD), covering dialects in the German state of Hesse, an area with rich dialectal variation. Many previous dialectometric analyses abstracted away from intralocal variation (e.g., by only counting the most frequent variant at a location). In contrast, we do justice to intralocal variation by taking into account local frequency relations. The study shows that the border between Low German and Central German—one of the most important isoglosses in German dialectology—is not relevant for syntactic phenomena. At the same time, a comparison with character n-grams (a global measure of string similarity) reveals that the traditionally assumed dialect areas, primarily defined according to phonological developments, are still present in the twenty-first century data. Different from previous studies, our results are obtained from a uniform data base. Therefore, the differences between syntax and phonology cannot be due to variation in sampling, elicitation method, or time of elicitation.
25

Gali, Najlah, Radu Mariescu-Istodor, Damien Hostettler, and Pasi Fränti. "Framework for syntactic string similarity measures." Expert Systems with Applications 129 (September 2019): 169–85. http://dx.doi.org/10.1016/j.eswa.2019.03.048.

26

Flower, Darren R. "On the Properties of Bit String-Based Measures of Chemical Similarity." Journal of Chemical Information and Computer Sciences 38, no. 3 (April 4, 1998): 379–86. http://dx.doi.org/10.1021/ci970437z.

27

El-ghafar, Randa Mohamed Abd, Ali H. El-Bastawissy, Eman S. Nasr, and Mervat H. Gheith. "An Effective Entity Resolution Approach for Big Data." International Journal of Innovative Technology and Exploring Engineering 10, no. 11 (September 30, 2021): 100–112. http://dx.doi.org/10.35940/ijitee.k9503.09101121.

Abstract:
Entity Resolution (ER) is defined as the process of identifying records/objects that correspond to real-world objects/entities. To define a good ER approach, the schema of the data should be well known. Schema alignment of multiple datasets is not an easy task and may require either a domain expert or an ML algorithm to select which attributes to match. Schema-agnostic meta-blocking tries to solve this problem by considering each token as a blocking key regardless of the attributes it appears in, and may be coupled with meta-blocking to reduce the number of false negatives. However, it requires exact matches of tokens, which are very rare in actual datasets, and it results in very low precision. To overcome these issues, we propose a novel and efficient ER approach for big data implemented in Apache Spark. The proposed approach avoids schema alignment as it treats the attributes as a bag of words and generates a set of n-grams, which is transformed into vectors. The generated vectors are compared using a chosen similarity measure. The proposed approach is generic, as it can accept all types of datasets. It consists of five consecutive sub-modules: 1) dataset acquisition; 2) dataset pre-processing; 3) setting selection criteria, where all settings of the proposed approach are selected, such as the blocking key, the significant attributes, NLP techniques, the ER threshold, and the ER scenario; 4) ER pipeline construction; and 5) clustering, where similar records are grouped into the same cluster. The ER pipeline accepts two types of attributes, Weighted Attributes (WA) or Compound Attributes (CA), in addition to all the settings selected in the third sub-module. The pipeline consists of five phases: 1) generating the tokens composing the attributes; 2) generating n-grams of length n; 3) applying hashing Term Frequency (TF) to convert each set of n-grams to a fixed-length feature vector; 4) applying Locality Sensitive Hashing (LSH), which maps similar input items to the same buckets with a higher probability than dissimilar input items; and 5) classifying pairs of objects as duplicates or not according to the calculated similarity between them. We introduced seven different scenarios as input to the ER pipeline. To minimize the number of comparisons, we proposed a length filter, which greatly improves the effectiveness of the proposed approach, achieving the highest F-measure with the existing computational resources and scaling well with the available worker nodes. Three findings emerged: 1) using the CA in the different scenarios achieves better results than a single WA in terms of efficiency and effectiveness; 2) scenarios 3 and 4 achieve the best running times, because using Soundex and stemming reduces the processing time of the proposed approach; and 3) scenario 7 achieves the highest F-measure, because the length filter restricts comparisons to records whose string lengths are within a pre-determined percentage of each other. LSH takes numHashTables as a parameter; increasing the number of candidate pairs for the same numHashTables reduces the accuracy of the model, and utilizing the length filter helps to minimize the number of candidates, which in turn increases the accuracy of the approach.
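
As a framework-free illustration of the blocking idea (the paper's pipeline uses Spark's HashingTF and MinHashLSH; the standalone MinHash below is an assumption for exposition): records whose n-gram sets overlap heavily receive signatures that agree in many slots, so hashing those slots into LSH buckets makes similar records collide with high probability.

```python
# Standalone MinHash sketch over character n-grams: signature slots agree
# with probability equal to the Jaccard similarity of the n-gram sets.
import hashlib

def char_ngrams(s, n=3):
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def minhash_signature(tokens, num_hashes=16):
    sig = []
    for seed in range(num_hashes):
        # seed-prefixed md5 emulates a family of independent hash functions
        sig.append(min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
                       for t in tokens))
    return tuple(sig)

a = minhash_signature(char_ngrams("John A. Smith, New York"))
b = minhash_signature(char_ngrams("Jon A Smith, New York"))
matches = sum(x == y for x, y in zip(a, b))
print(f"{matches}/16 signature slots agree")  # similar records agree in many slots
```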
28

ARDILA, YOAN JOSÉ PINZÓN, RAPHAËL CLIFFORD, COSTAS S. ILIOPOULOS, GAD M. LANDAU, and MANAL MOHAMED. "NECKLACE SWAP PROBLEM FOR RHYTHMIC SIMILARITY MEASURES." International Journal of Computational Methods 05, no. 03 (September 2008): 351–63. http://dx.doi.org/10.1142/s0219876208001583.

Abstract:
Given two n-bit (cyclic) binary strings, A and B, represented on a circle (necklace instances), let each string have the same number (k) of 1s. We are interested in computing the cyclic swap distance between A and B, i.e., the minimum number of swaps needed to convert A to B, minimized over all possible rotations of B. We show that, given the compressed representation of A and B, this distance may be computed in O(k²) time.
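
A brute-force reference sketch, not the paper's O(k²) algorithm on the compressed representation: for two binary strings with equally many 1s, the adjacent-swap distance under a fixed alignment is the sum of differences between sorted 1-positions, and the cyclic distance here takes the minimum over all rotations of B, treating each rotation as a linear alignment (a simplifying assumption).

```python
# O(n*k) brute-force cyclic swap distance for equal-weight binary strings.

def swap_distance(a, b):
    # minimum adjacent swaps for a fixed (linear) alignment:
    # match sorted 1-positions pairwise and sum the gaps
    pa = [i for i, c in enumerate(a) if c == "1"]
    pb = [i for i, c in enumerate(b) if c == "1"]
    return sum(abs(x - y) for x, y in zip(pa, pb))

def cyclic_swap_distance(a, b):
    # minimize over all rotations of B
    return min(swap_distance(a, b[r:] + b[:r]) for r in range(len(b)))

print(cyclic_swap_distance("11000", "10100"))  # 1: one adjacent swap aligns the second 1
```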
29

Egghe, L., and C. Michel. "Strong similarity measures for ordered sets of documents in information retrieval." Information Processing & Management 38, no. 6 (November 2002): 823–48. http://dx.doi.org/10.1016/s0306-4573(01)00051-6.

30

Jerman-Blažič, Borka, and Milan Randić. "Similarity measures for sets of strings and application in chemical classification." Journal of Mathematical Chemistry 4, no. 1 (December 1990): 217–25. http://dx.doi.org/10.1007/bf01170014.

31

Zhang, Jin, and Marcia Lei Zeng. "A new similarity measure for subject hierarchical structures." Journal of Documentation 70, no. 3 (May 6, 2014): 364–91. http://dx.doi.org/10.1108/jd-12-2012-0160.

Abstract:
Purpose – The purpose of this paper is to introduce a new similarity method to gauge the differences between two subject hierarchical structures. Design/methodology/approach – In the proposed similarity measure, nodes on two hierarchical structures are projected onto a two-dimensional space, respectively, and both structural similarity and subject similarity of nodes are considered in the similarity between the two hierarchical structures. The extent to which the structural similarity impacts on the similarity can be controlled by adjusting a parameter. An experiment was conducted to evaluate soundness of the measure. Eight experts whose research interests were information retrieval and information organization participated in the study. Results from the new measure were compared with results from the experts. Findings – The evaluation shows strong correlations between the results from the new method and the results from the experts. It suggests that the similarity method achieved satisfactory results. Practical implications – Hierarchical structures that are found in subject directories, taxonomies, classification systems, and other classificatory structures play an extremely important role in information organization and information representation. Measuring the similarity between two subject hierarchical structures allows an accurate overarching understanding of the degree to which the two hierarchical structures are similar. Originality/value – Both structural similarity and subject similarity of nodes were considered in the proposed similarity method, and the extent to which the structural similarity impacts on the similarity can be adjusted. In addition, a new evaluation method for a hierarchical structure similarity was presented.
32

Wu, Shuangyuan, Shihong Xia, Zhaoqi Wang, and Chunpeng Li. "Efficient motion data indexing and retrieval with local similarity measure of motion strings." Visual Computer 25, no. 5-7 (March 3, 2009): 499–508. http://dx.doi.org/10.1007/s00371-009-0345-1.

33

Tsuruoka, Y., J. McNaught, J. Tsujii, and S. Ananiadou. "Learning string similarity measures for gene/protein name dictionary look-up using logistic regression." Bioinformatics 23, no. 20 (August 12, 2007): 2768–74. http://dx.doi.org/10.1093/bioinformatics/btm393.

34

ICHISE, RYUTARO. "AN ANALYSIS OF MULTIPLE SIMILARITY MEASURES FOR ONTOLOGY MAPPING PROBLEM." International Journal of Semantic Computing 04, no. 01 (March 2010): 103–22. http://dx.doi.org/10.1142/s1793351x1000095x.

Abstract:
This paper presents an analysis of similarity measures for the ontology mapping problem. To that end, 48 similarity measures such as string matching and knowledge based similarities that have been widely used in ontology mapping systems are defined. The similarity measures are investigated by discriminant analysis with a real-world data set. As a result, it was possible to identify 22 effective similarity measures for the ontology mapping problem out of 48 possible similarity measures. The identified measures have a wide variety in the type of similarity. To test whether the identified similarity measures are effective for the problem, experiments were conducted with all 48 similarity measures and the 22 identified similarity measures by using two major machine learning methods, decision tree and support vector machine. The experimental results show that the performance of the 48 cases and the 22 cases is almost the same regardless of the machine learning method. This implies that effective features for the ontology mapping problem were successfully identified.
35

Revesz, Peter Z. "A Comparative Analysis of Motifs from Minoan and Hungarian Folk Art." MATEC Web of Conferences 210 (2018): 05020. http://dx.doi.org/10.1051/matecconf/201821005020.

Abstract:
This paper presents a similarity measure for motifs. The similarity measure is applied to several ceramic and metal artifacts that contain spiral motifs, and it shows a particularly strong similarity between some Minoan and Hungarian ceramics.
36

Rahal, Imad, and Colin Wielga. "Source Code Plagiarism Detection Using Biological String Similarity Algorithms." Journal of Information & Knowledge Management 13, no. 03 (September 2014): 1450028. http://dx.doi.org/10.1142/s0219649214500282.

Abstract:
Source code plagiarism is easy to commit but difficult to catch. Many approaches have been proposed in the literature to automate its detection; however there is little consensus on what works best. In this paper, we propose two new measures for determining the accuracy of a given technique and describe an approach to convert code files into strings which can then be compared for similarity in order to detect plagiarism. We then compare several string comparison techniques, heavily utilised in the area of biological sequence alignment, and compare their performance on a large collection of student source code containing various types of plagiarism. Experimental results show that the compared techniques succeed in matching a plagiarised file to its original files upwards of 90% of the time. Finally, we propose a modification for these algorithms that drastically improves their runtimes with little or no effect on accuracy. Even though the ideas presented herein are applicable to most programming languages, we focus on a case study pertaining to an introductory-level Visual Basic programming course offered at our institution.
37

Escobar, Marco A., José R. Guzmán Sepúlveda, Jorge R. Parra Michel, and Rafael Guzmán Cabrera. "A proposal to measure the similarity between retinal vessel segmentations images." Nova Scientia 11, no. 22 (May 29, 2019): 224–45. http://dx.doi.org/10.21640/ns.v11i22.1872.

Abstract:
Introduction: We propose a novel approach for assessing the similarity of retinal vessel segmentation images, based on linking the standard performance metrics of a segmentation algorithm with the actual structural properties of the images through the fractal dimension. Method: We apply our methodology to compare the vascularity extracted by automatic segmentation against manually segmented images. Results: We demonstrate that the strong correlation between the standard metrics and the fractal dimension is preserved regardless of the size of the subimages analyzed. Discussion or Conclusion: We show that the fractal dimension is correlated with the segmentation algorithm's performance and can therefore be used as a comparison metric.
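
A standard box-counting estimator of fractal dimension for a binary vessel mask, sketched here as one plausible way to compute the quantity the abstract links to segmentation metrics; the scales, the random stand-in mask, and the fitting choices are assumptions, not the paper's exact procedure.

```python
# Box-counting fractal dimension: count occupied boxes at shrinking
# scales and fit the slope of log(count) against log(1/size).
import numpy as np

def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    counts = []
    for s in sizes:
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    coeffs = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return coeffs[0]  # the slope estimates the dimension

rng = np.random.default_rng(0)
mask = rng.random((256, 256)) < 0.05  # stand-in for a vessel segmentation
print(round(box_counting_dimension(mask), 2))
```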
38

Yang, Jie, Wei Zhou, and Shuai Li. "Similarity measure for multi-granularity rough approximations of vague sets." Journal of Intelligent & Fuzzy Systems 40, no. 1 (January 4, 2021): 1609–21. http://dx.doi.org/10.3233/jifs-200611.

Abstract:
Vague sets are a further extension of fuzzy sets. In rough set theory, a target concept can be characterized by different rough approximation spaces when it is a vague concept. The uncertainty measure of vague sets in rough approximation spaces is an important issue. If the uncertainty measure is not accurate enough, different rough approximation spaces of a vague concept may yield the same result, which makes it impossible to distinguish these approximation spaces for characterizing a vague concept strictly. In this paper, this problem is addressed from the perspective of similarity. Firstly, based on the similarity between vague information granules (VIGs), we propose an uncertainty measure with strong distinguishing ability called rough vague similarity (RVS). Furthermore, by studying the multi-granularity rough approximations of a vague concept, we reveal how RVS changes with changing granularity and conclude that the RVS between any two rough approximation spaces can degenerate to a granularity measure and an information measure. Finally, a case study and related experiments verify that RVS performs better at reflecting differences among the rough approximation spaces describing a vague concept.
39

Botto, C., A. Escalante, M. Arango, and L. Yarzabal. "Morphological differences between Venezuelan and African microfilariae of Onchocerca volvulus." Journal of Helminthology 62, no. 4 (December 1988): 345–51. http://dx.doi.org/10.1017/s0022149x00011755.

Abstract:
AbstractComparative morphological and biometric characteristics of microfilariae of Onchocerca gutturosa and O. volvulus from different geographical areas (Upper Orinoco, Venezuela; Togo; Liberia) were assessed. “Stepwise” discriminant analysis and Mahalanobis estimators were applied to measure distance between populations. The results indicate a strong similarity between the two strains from the Upper Orinoco (Venezuela) and the Togo strain, as well as a clear separation between these strains and that of O. gutturosa. The Liberian strain was easily distinguishable from microfilariae from Togo and Venezuela. Discriminant analysis showed the Liberian deme to be as different from the Venezuelan and Togo demes as these demes were from microfilariae of the reference species, O. gutturosa. Although it is necessary to confirm these data using formalin-fixed specimens obtained from the skin, the present findings suggest the existence of geographically-different strains of O. volvulus in America and Africa.
40

Tanskanen, Antti O., and Anna Rotkirch. "Sibling similarity and relationship quality in Finland." Acta Sociologica 62, no. 4 (June 26, 2018): 440–56. http://dx.doi.org/10.1177/0001699318777042.

Abstract:
Siblings form the strongest horizontal family tie, which often involves life-long emotional closeness and various forms of support. Similarity is often assumed to strengthen sibling relations, but existing evidence is scarce and mixed. Using data from the Generational Transmissions in Finland surveys collected in 2012, we employ both total and sibling fixed-effect regressions and examine whether sibling similarity is associated with relationship quality in two family generations: an older generation born in 1945–1950, and the generation of their children, born in 1962–1993. We study sibling similarity in gender, age, financial condition and parenthood status and measure relationship quality by contact frequency, emotional closeness and provision of practical help. In both generations, being of the same gender was associated with all relationship measures. Age similarity was also associated with more contacts and increased emotional closeness in the younger generation, and differences in parenthood status with increased provision of practical help in the older generation. In most aspects, however, sibling similarity was not associated with relationship quality. While sibling relations tend to be strong in contemporary Finland, this is only partly due to similarity effects.
41

Tuan, Tran Manh, Luong Thi Hong Lan, Shuo-Yan Chou, Tran Thi Ngan, Le Hoang Son, Nguyen Long Giang, and Mumtaz Ali. "M-CFIS-R: Mamdani Complex Fuzzy Inference System with Rule Reduction Using Complex Fuzzy Measures in Granular Computing." Mathematics 8, no. 5 (May 3, 2020): 707. http://dx.doi.org/10.3390/math8050707.

Abstract:
Complex fuzzy theory has strong practical background in many important applications, especially in decision-making support systems. Recently, the Mamdani Complex Fuzzy Inference System (M-CFIS) has been introduced as an effective tool for handling events that are not restricted to only values of a given time point but also include all values within certain time intervals (i.e., the phase term). In such decision-making problems, the complex fuzzy theory allows us to observe both the amplitude and phase values of an event, thus resulting in better performance. However, one of the limitations of the existing M-CFIS is the rule base that may be redundant to a specific dataset. In order to handle the problem, we propose a new Mamdani Complex Fuzzy Inference System with Rule Reduction Using Complex Fuzzy Measures in Granular Computing called M-CFIS-R. Several fuzzy similarity measures such as Complex Fuzzy Cosine Similarity Measure (CFCSM), Complex Fuzzy Dice Similarity Measure (CFDSM), and Complex Fuzzy Jaccard Similarity Measure (CFJSM) together with their weighted versions are proposed. Those measures are integrated into the M-CFIS-R system by the idea of granular computing such that only important and dominant rules are being kept in the system. The difference and advantage of M-CFIS-R against M-CFIS is the usage of the training process in which the rule base is repeatedly changed toward the original base set until the performance is better. By doing so, the new rule base in M-CFIS-R would improve the performance of the whole system. Experiments on various decision-making datasets demonstrate that the proposed M-CFIS-R performs better than M-CFIS.
APA, Harvard, Vancouver, ISO, and other styles
42

Harvey, Andrew S., and Clarke Wilson. "Evolution of Daily Activity Patterns from 1971 to 1981: A Study of the Halifax Activity Panel Survey." Canadian Studies in Population 28, no. 2 (December 31, 2001): 459. http://dx.doi.org/10.25336/p6bc8x.

Full text
Abstract:
Episode sequences from diaries are the richest source of information about daily activities of individuals and households available to social scientists. Their use has been advocated as an approach to urban planning that incorporates explicit consideration of the demands made by daily life on the built environment. The paper examines sequences of daily activities and activities augmented by data on their settings (including location and the presence of other people) to measure change in daily behaviour from 1971 to 1981. Diaries were supplied by respondents to the Halifax panel study carried out at Dalhousie University. Episode sequences are analysed using alignment methods, also called optimal matching, developed in molecular biology. These are implemented through the ClustalG multiple alignment program package. Alignment methods define similarity measures between character strings, which can be used to measure the similarity of two persons’ daily activities, to measure change over time, or to determine the relative similarity of three or more activity diaries. The results of the research showed that both pure activities and activity-settings identified broadly the same behvioural groupings: employed workers, domestic workers, and weekend activities. The similarity of activity patterns of individuals was greater over the ten-year analysis period than the average similarity of the sample in either 1971 or 1981. The average similarity of activity and activitysetting patterns rose from 1971 to 1981, which contradicts observations that daily routines are becoming more complex and diverse.
APA, Harvard, Vancouver, ISO, and other styles
43

Abdul-Jabbar, Safa, and Loay George. "A Comparative Study for String Metrics and the Feasibility of Joining them as Combined Text Similarity Measures." ARO-The Scientific Journal of Koya University 5, no. 2 (2017): 6–18. http://dx.doi.org/10.14500/aro.10180.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Winter, Felix, Nysret Musliu, and Peter Stuckey. "Explaining Propagators for String Edit Distance Constraints." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 02 (April 3, 2020): 1676–83. http://dx.doi.org/10.1609/aaai.v34i02.5530.

Full text
Abstract:
The computation of string similarity measures has been thoroughly studied in the scientific literature and has applications in a wide variety of different areas. One of the most widely used measures is the so called string edit distance which captures the number of required edit operations to transform a string into another given string. Although polynomial time algorithms are known for calculating the edit distance between two strings, there also exist NP-hard problems from practical applications like scheduling or computational biology that constrain the minimum edit distance between arrays of decision variables. In this work, we propose a novel global constraint to formulate restrictions on the minimum edit distance for such problems. Furthermore, we describe a propagation algorithm and investigate an explanation strategy for an edit distance constraint propagator that can be incorporated into state of the art lazy clause generation solvers. Experimental results show that the proposed propagator is able to significantly improve the performance of existing exact methods regarding solution quality and computation speed for benchmark problems from the literature.
APA, Harvard, Vancouver, ISO, and other styles
45

Hanmandlu, Madasu, and Anirban Das. "Content-based Image Retrieval by Information Theoretic Measure." Defence Science Journal 61, no. 5 (September 2, 2011): 415. http://dx.doi.org/10.14429/dsj.61.1177.

Full text
Abstract:
<p>Content-based image retrieval focuses on intuitive and efficient methods for retrieving images from databases based on the content of the images. A new entropy function that serves as a measure of information content in an image termed as 'an information theoretic measure' is devised in this paper. Among the various query paradigms, 'query by example' (QBE) is adopted to set a query image for retrieval from a large image database. In this paper, colour and texture features are extracted using the new entropy function and the dominant colour is considered as a visual feature for a particular set of images. Thus colour and texture features constitute the two-dimensional feature vector for indexing the images. The low dimensionality of the feature vector speeds up the atomic query. Indices in a large database system help retrieve the images relevant to the query image without looking at every image in the database. The entropy values of colour and texture and the dominant colour are considered for measuring the similarity. The utility of the proposed image retrieval system based on the information theoretic measures is demonstrated on a benchmark dataset.</p><p><strong>Defence Science Journal, 2011, 61(5), pp.415-430</strong><strong><strong>, DOI:http://dx.doi.org/10.14429/dsj.61.1177</strong></strong></p>
APA, Harvard, Vancouver, ISO, and other styles
46

Maslov, V. "Research of freak wave effect on a floating object in seakeeping tank." Transactions of the Krylov State Research Centre 3, no. 397 (August 6, 2021): 65–74. http://dx.doi.org/10.24937/2542-2324-2021-3-397-65-74.

Full text
Abstract:
Object and purpose of research. This paper describes physical modeling of interaction process of abnormal wave (freak wave) with a marine floating structure in a seakeeping tank of the Krylov State Research Center. Freak wave is extremely dangerous because of the difference from wind waves by an unusually steep front slope and a gentle trough. Freak wave appears suddenly and collapses rapidly. Research of effect process features is necessary for understanding and analysis of the object behavior at extreme sea conditions. As experiment results it was necessary to obtain empirical data of sea object motions and accelerations at interaction with freak wave on different course angles and speeds. The obtained physical experiment results will be the foundation of theoretical studies and numerical calculation methods. Materials and methods. Physical modeling of the interaction process of freak wave with a marine floating structure was conducted in a deep seakeeping tank. Freak wave was generated by the linear superposition method of four twodimensional unidirectional regular waves with variable steepness in frequency range of 2 to 6 rad/s. To create a control signal was using special software. Wave packets were formed consisting of a sequence of a four harmonicas with a given frequency, height and duration. For parameters registration of freak wave were used string probes installed with a certain step along the length of the tank. A marine floating structure model was fixed by elastic fastening system in a window of a tow cart. For measure the motions of marine floating structure and its accelerations in define points at encounter with freak wave the contactless optic system and two-component acceleration sensors (accelerometers) were used. Cases of structure interaction with freak wave at different course angles and speeds were considered. Main results. As result of physical experimental data of floating structure motions in the interaction with freak wave in conditions of regular sea state at five course angles with speed and without speed were obtained. Dependencies of roll, pitch and heave motions at different course angles and various speeds were built. Similar dependencies of vertical and transverse accelerations on a stem also were built. Comparative analysis of results with data, which were obtained on intensive irregular sea state (spectrum JONSWAP) at identical experiment conditions, and also with foreign results was carried out. Conclusions. The greatest roll and maximum accelerations are registered at alongside position to abnormal wave, but cargo vessel has a sufficient reserve of dynamic stability to withstand such an impulse effect. The values of roll motion and accelerations on irregular sea state are close to the parameters measured at freak wave effect. This similarity is explained by rocking effect of periodic impact of irregular sea state, the proximity of natural period of roll oscillations to average period of waves and sufficiently high waves. In comparison with foreign researches, a wider range of heading angles and speeds is considered, and data about accelerations in a stem are obtained.
APA, Harvard, Vancouver, ISO, and other styles
47

Egghe, L., and C. Michel. "Construction of weak and strong similarity measures for ordered sets of documents using fuzzy set techniques." Information Processing & Management 39, no. 5 (September 2003): 771–807. http://dx.doi.org/10.1016/s0306-4573(02)00027-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Kaivapalu, Annekatrin, and Maisa Martin. "Perceived similarity between written Estonian and Finnish: Strings of letters or morphological units?" Nordic Journal of Linguistics 40, no. 2 (October 2017): 149–74. http://dx.doi.org/10.1017/s0332586517000142.

Full text
Abstract:
The distance or similarity between two languages can be objective or actual, i.e. discoverable by the tools and methods of linguists, or perceived by users of the languages. In this article two methods, the Levenshtein Distance (LD), which purports to measure the objective distance, and the Index of Perceived Similarity (IPS), which quantifies language users’ perceptions, are compared. The data are the quantitative results of a test measuring conscious perceptions of similarity between Estonian and Finnish inflectional morphology by Finnish and Estonian native speakers (‘Finns’ and ‘Estonians’) with no knowledge of and exposure to the other (‘target’) language. The results show that Finns see more similarity between Finnish and Estonian than Estonians do. Also the correlations between LD and the perception results of the Finns are statistically significant while the correlations between the LD and the IPS scores of the Estonians are not. Comments by test participants provide insights into the nature of the perceptions of similarity.
APA, Harvard, Vancouver, ISO, and other styles
49

Bjerrum, Esben, and Boris Sattarov. "Improving Chemical Autoencoder Latent Space and Molecular De Novo Generation Diversity with Heteroencoders." Biomolecules 8, no. 4 (October 30, 2018): 131. http://dx.doi.org/10.3390/biom8040131.

Full text
Abstract:
Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for employment early in a drug discovery project. Here, it is shown that the choice of chemical representation, such as strings from the simplified molecular-input line-entry system (SMILES), has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence to sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory cells (LSTM) to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprints similarity. Using the output from the code layer in quantitative structure activity relationship (QSAR) of five molecular datasets shows that heteroencoder derived vectors markedly outperforms autoencoder derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate of decoding to different molecules than encoded, a tendency that can be counteracted with more complex network architectures.
APA, Harvard, Vancouver, ISO, and other styles
50

Dorji, Yonten, Peter Annighöfer, Christian Ammer, and Dominik Seidel. "Response of Beech (Fagus sylvatica L.) Trees to Competition—New Insights from Using Fractal Analysis." Remote Sensing 11, no. 22 (November 13, 2019): 2656. http://dx.doi.org/10.3390/rs11222656.

Full text
Abstract:
Individual tree architecture and the composition of tree species play a vital role for many ecosystem functions and services provided by a forest, such as timber value, habitat diversity, and ecosystem resilience. However, knowledge is limited when it comes to understanding how tree architecture changes in response to competition. Using 3D-laser scanning data from the German Biodiversity Exploratories, we investigated the detailed three-dimensional architecture of 24 beech (Fagus sylvatica L.) trees that grew under different levels of competition pressure. We created detailed quantitative structure models (QSMs) for all study trees to describe their branching architecture. Furthermore, structural complexity and architectural self-similarity were measured using the box-dimension approach from fractal analysis. Relating these measures to the strength of competition, the trees are exposed to reveal strong responses for a wide range of tree architectural measures indicating that competition strongly changes the branching architecture of trees. The strongest response to competition (rho = −0.78) was observed for a new measure introduced here, the intercept of the regression used to determine the box-dimension. This measure was discovered as an integrating descriptor of the size of the complexity-bearing part of the tree, namely the crown, and proven to be even more sensitive to competition than the box-dimension itself. Future studies may use fractal analysis to investigate and quantify the response of tree individuals to competition.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography