Academic literature on the topic 'Information storage and retrieval systems Chinese characters'

Author: Grafiati

Published: 10 December 2022

Last updated: 28 January 2023

Contents

Journal articles
Dissertations / Theses
Books

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Information storage and retrieval systems Chinese characters.'

Journal articles on the topic "Information storage and retrieval systems Chinese characters"

1

Ma, Chunguang, Hongjun Bei, Guihua Chen, and Jianhui Gao. "An Information Retrieval Algorithm for Accounting Internal Audit Using Multi-Pattern Similarity Matching." Mobile Information Systems 2022 (June 16, 2022): 1–11. http://dx.doi.org/10.1155/2022/6521905.

Abstract:

Similarity-based multi-pattern matching differs notably from classical multi-pattern matching algorithms, which are frequently used in the policy part of TCP connection processing to match English characters. In this article, we analyze the features of multi-pattern similarity matching for audit information retrieval in a cluttered environment. The proposed model analyzes the performance of a multi-pattern matching algorithm for audit information retrieval. It also analyzes the shortcomings of existing multi-pattern similarity systems and proposes a multi-pattern algorithm, based on a trail hash trie matching machine, that suits mixed Chinese and English text. The algorithm converts the set of pattern strings into multiple finite automata and then builds a state driver from the same pattern set. Each character of the string to be matched advances the state driver in turn, and the state driver in turn drives each finite automaton. This achieves approximate multi-pattern matching over mixed English and Chinese characters while allowing insertion errors. The algorithm does not need to compare every character. It exploits the information from unsuccessful match attempts and, combined with an improved text-window mechanism, skips as many characters as possible. It can set an upper limit on the allowed errors for each pattern string separately, and its matching speed is independent of the number k of allowed insertion errors. The algorithm has broad application prospects in information auditing, databases, and information retrieval.
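The abstract gives no code. The Python sketch below is a rough illustration of trie-based multi-pattern matching that tolerates insertion errors over mixed Chinese and English text. It builds a trie from the pattern set and backtracks from every text position, allowing up to `k` extra text characters to fall between pattern characters. This is a naive baseline and not the paper's trail hash trie, finite-automaton, or text-window algorithm. Unlike the method described, its running time does grow with `k`. The names `build_trie` and `match_with_insertions` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    pattern: Optional[str] = None  # set when a pattern ends at this node


def build_trie(patterns: List[str]) -> TrieNode:
    """Insert every pattern character by character. Python str indexes code
    points, so Chinese and English characters are handled the same way."""
    root = TrieNode()
    for p in patterns:
        node = root
        for ch in p:
            node = node.children.setdefault(ch, TrieNode())
        node.pattern = p
    return root


def match_with_insertions(text: str, patterns: List[str], k: int) -> List[Tuple[int, int, str]]:
    """Return (start, end, pattern) for each occurrence of a pattern in
    text[start:end] with at most k extra text characters inserted between
    pattern characters (hypothetical helper, naive backtracking)."""
    root = build_trie(patterns)
    hits = set()
    for start in range(len(text)):
        first = root.children.get(text[start])
        if first is None:
            continue  # an occurrence must begin with a real pattern character
        # Each state: (trie node, next text index, insertions used, last step matched?)
        stack = [(first, start + 1, 0, True)]
        while stack:
            node, pos, used, matched = stack.pop()
            # Report only right after a matched character, so trailing
            # insertions are never counted as part of an occurrence.
            if matched and node.pattern is not None:
                hits.add((start, pos, node.pattern))
            if pos >= len(text):
                continue
            child = node.children.get(text[pos])
            if child is not None:
                stack.append((child, pos + 1, used, True))  # consume a pattern character
            if used < k:
                stack.append((node, pos + 1, used + 1, False))  # treat text[pos] as an insertion
    return sorted(hits)


if __name__ == "__main__":
    print(match_with_insertions("审计audit报告report", ["审计", "audit", "报告"], k=0))
    # [(0, 2, '审计'), (2, 7, 'audit'), (7, 9, '报告')]
    print(match_with_insertions("审x计", ["审计"], k=1))  # [(0, 3, '审计')]
```

With `k=0` the sketch reduces to exact multi-pattern matching. Each unit of `k` lets one unrelated character, such as the `x` above, sit inside a match.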

2

Al-Omoush, Ashraf, Norita Md Norwawi, and Ahmad Akmalludin Mazlan. "Handling Words Duplication and Memory Management for Digital Quran Based on Hexadecimal Representation and Sparse Matrix." International Journal of Engineering & Technology 7, no. 4.15 (October 7, 2018): 481. http://dx.doi.org/10.14419/ijet.v7i4.15.25760.

Abstract:

Al-Quran is the holy book of the Muslims and the most important scripture containing knowledge on many domains. Smart technologies such as smartphones, digital devices, and tablets have recently seen exponential growth in adoption, and they have brought daily routines within a single touch. This paper presents a Digital Quran Model (DQM) based on hexadecimal representation, using Unicode hexadecimal code points and UTF-8 character encoding, which is backward compatible with ASCII. DQM aims to handle all duplicated words and verses in Al-Quran using a sparse matrix with double offset indexing to optimize memory. Three approaches are discussed: indexing and representing the digital Quran to optimize storage, organizing the verse structure with a sparse matrix to handle repetition, and double offset indexing to use space efficiently. The algorithms were implemented using Visual Studio and a Java server, and solution quality was measured by comparing file size before and after applying the DQM model. Storage size was reduced by 25.00% for surah Al-Baqarah, the longest chapter of Al-Quran, and by 47.89% for surah Al-Fatihah. The proposed DQM model optimizes memory space and can be extended to other non-Roman scripts used in information retrieval, such as Hindi, Chinese, and Japanese, that are encoded in the Unicode standard.
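Two ingredients named in the abstract are easy to illustrate. One is the ASCII-compatible UTF-8 hexadecimal encoding of characters. The other is storing each repeated word only once. The Python sketch below shows both under simple assumptions: words are separated by whitespace, and each verse is stored as a list of integer offsets into a shared word table. The abstract does not detail the paper's sparse-matrix double offset indexing scheme, and this sketch does not reproduce it. The names `to_utf8_hex`, `dedup_verses`, and `decode` are hypothetical.

```python
from typing import Dict, List, Tuple


def to_utf8_hex(text: str) -> str:
    """Hexadecimal form of the UTF-8 bytes; ASCII characters keep their one-byte codes."""
    return text.encode("utf-8").hex()


def dedup_verses(verses: List[str]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Store each distinct word once; every verse becomes a list of word ids (offsets)."""
    vocab: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for verse in verses:
        # setdefault assigns the next free id only the first time a word is seen
        encoded.append([vocab.setdefault(w, len(vocab)) for w in verse.split()])
    return vocab, encoded


def decode(vocab: Dict[str, int], ids: List[int]) -> str:
    """Rebuild a verse from its word ids."""
    words = {i: w for w, i in vocab.items()}
    return " ".join(words[i] for i in ids)


if __name__ == "__main__":
    print(to_utf8_hex("A"), to_utf8_hex("中"))  # 41 e4b8ad
    vocab, enc = dedup_verses(["a b a", "b c"])
    print(enc)  # [[0, 1, 0], [1, 2]]
    print(decode(vocab, enc[0]))  # a b a
```

Each repeated word now costs one small integer instead of a full copy of its bytes. Savings therefore grow with the amount of repetition in a text.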

3

Yusof, Mohd Kamir, Wan Mohd Amir Fazamin Wan Hamzah, and Nur Shuhada Md Rusli. "Efficiency of hybrid algorithm for COVID-19 online screening test based on its symptoms." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 1 (January 1, 2022): 440. http://dx.doi.org/10.11591/ijeecs.v25.i1.pp440-449.

Abstract:

The coronavirus COVID-19 is affecting 196 countries and territories around the world, and the number of deaths caused by COVID-19 keeps increasing each day. According to the World Health Organization (WHO), the number of COVID-19 infections is increasing slightly day by day and has now reached 570,000. WHO prefers to conduct COVID-19 screening tests via online systems. A suitable string-matching approach based on symptoms is required to produce fast and accurate results during retrieval. Four recent approaches have been implemented for string matching: character-based algorithms, hashing algorithms, suffix automaton algorithms, and hybrid algorithms. Meanwhile, Extensible Markup Language (XML), JavaScript Object Notation (JSON), Asynchronous JavaScript and XML (AJAX), and jQuery have been widely used for data transmission, data storage, and data retrieval. This paper proposes combining the hybrid algorithm with JSON and jQuery to produce fast and accurate results during the COVID-19 screening process. Several experiments compared performance in terms of execution time and memory usage on five different dataset collections. The results show that the hybrid algorithm performs better than JSON and jQuery. Online COVID-19 screening can hopefully reduce the number of people affected by, and dying from, COVID-19.
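The abstract does not specify its hybrid algorithm. The hashing family it lists can be illustrated with a minimal Rabin-Karp sketch in Python. The sketch finds every occurrence of a symptom keyword in a free-text answer using a rolling hash, and it confirms each hash hit with a direct comparison. The base, the modulus, and the function name `rabin_karp` are assumptions, and the sketch is not the hybrid algorithm evaluated in the paper.

```python
from typing import List


def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> List[int]:
    """Return the start index of every occurrence of pattern in text (Rabin-Karp)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i] and append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits


if __name__ == "__main__":
    answer = "fever, dry cough, fever"
    print(rabin_karp(answer, "fever"))  # [0, 18]
    print(rabin_karp(answer, "cough"))  # [11]
```

Updating the hash in constant time per shift keeps the expected cost near linear in the text length. Verifying each candidate keeps results exact even when hashes collide.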

Dissertations / Theses on the topic "Information storage and retrieval systems Chinese characters"

1

Liu, Chaomei. "Traditional Chinese medical clinic system." CSUSB ScholarWorks, 2004. https://scholarworks.lib.csusb.edu/etd-project/2517.

Abstract:

The Chinese Medical Clinic System is designed to help acupuncturists and their assistants record and store information. The system can maintain and schedule appointments and lets users view patient diagnoses effectively. It will be implemented on an internet-connected desktop PC to facilitate the acupuncturists' record-keeping.

2

"Shape-based image retrieval in iconic image databases." 1999. http://library.cuhk.edu.hk/record=b5889854.

Abstract:

by Chan Yuk Ming.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1999.
Includes bibliographical references (leaves 117-124).
Abstract also in Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Content-based Image Retrieval --- p.3
Chapter 1.2 --- Designing a Shape-based Image Retrieval System --- p.4
Chapter 1.3 --- Information on Trademark --- p.6
Chapter 1.3.1 --- What is a Trademark? --- p.6
Chapter 1.3.2 --- Search for Conflicting Trademarks --- p.7
Chapter 1.3.3 --- Research Scope --- p.8
Chapter 1.4 --- Information on Chinese Cursive Script Character --- p.9
Chapter 1.5 --- Problem Definition --- p.9
Chapter 1.6 --- Contributions --- p.11
Chapter 1.7 --- Thesis Organization --- p.13
Chapter 2 --- Literature Review --- p.14
Chapter 2.1 --- Trademark Retrieval using QBIC Technology --- p.14
Chapter 2.2 --- STAR --- p.16
Chapter 2.3 --- ARTISAN --- p.17
Chapter 2.4 --- Trademark Retrieval using a Visually Salient Feature --- p.18
Chapter 2.5 --- Trademark Recognition using Closed Contours --- p.19
Chapter 2.6 --- Trademark Retrieval using a Two Stage Hierarchy --- p.19
Chapter 2.7 --- Logo Matching using Negative Shape Features --- p.21
Chapter 2.8 --- Chapter Summary --- p.22
Chapter 3 --- Background on Shape Representation and Matching --- p.24
Chapter 3.1 --- Simple Geometric Features --- p.25
Chapter 3.1.1 --- Circularity --- p.25
Chapter 3.1.2 --- Rectangularity --- p.26
Chapter 3.1.3 --- Hole Area Ratio --- p.27
Chapter 3.1.4 --- Horizontal Gap Ratio --- p.27
Chapter 3.1.5 --- Vertical Gap Ratio --- p.28
Chapter 3.1.6 --- Central Moments --- p.28
Chapter 3.1.7 --- Major Axis Orientation --- p.29
Chapter 3.1.8 --- Eccentricity --- p.30
Chapter 3.2 --- Fourier Descriptors --- p.30
Chapter 3.3 --- Chain Codes --- p.31
Chapter 3.4 --- Seven Invariant Moments --- p.33
Chapter 3.5 --- Zernike Moments --- p.35
Chapter 3.6 --- Edge Direction Histogram --- p.36
Chapter 3.7 --- Curvature Scale Space Representation --- p.37
Chapter 3.8 --- Chapter Summary --- p.39
Chapter 4 --- Genetic Algorithm for Weight Assignment --- p.42
Chapter 4.1 --- Genetic Algorithm (GA) --- p.42
Chapter 4.1.1 --- Basic Idea --- p.43
Chapter 4.1.2 --- Genetic Operators --- p.44
Chapter 4.2 --- Why GA? --- p.45
Chapter 4.3 --- Weight Assignment Problem --- p.46
Chapter 4.3.1 --- Integration of Image Attributes --- p.46
Chapter 4.4 --- Proposed Solution --- p.47
Chapter 4.4.1 --- Formalization --- p.47
Chapter 4.4.2 --- Proposed Genetic Algorithm --- p.43
Chapter 4.5 --- Chapter Summary --- p.49
Chapter 5 --- Shape-based Trademark Image Retrieval System --- p.50
Chapter 5.1 --- Problems on Existing Methods --- p.50
Chapter 5.1.1 --- Edge Direction Histogram --- p.51
Chapter 5.1.2 --- Boundary Based Techniques --- p.52
Chapter 5.2 --- Proposed Solution --- p.53
Chapter 5.2.1 --- Image Preprocessing --- p.53
Chapter 5.2.2 --- Automatic Feature Extraction --- p.54
Chapter 5.2.3 --- Approximated Boundary --- p.55
Chapter 5.2.4 --- Integration of Shape Features and Query Processing --- p.58
Chapter 5.3 --- Experimental Results --- p.58
Chapter 5.3.1 --- Experiment 1: Weight Assignment using Genetic Algorithm --- p.59
Chapter 5.3.2 --- Experiment 2: Speed on Feature Extraction and Retrieval --- p.62
Chapter 5.3.3 --- Experiment 3: Evaluation by Precision --- p.63
Chapter 5.3.4 --- Experiment 4: Evaluation by Recall for Deformed Images --- p.64
Chapter 5.3.5 --- Experiment 5: Evaluation by Recall for Hand Drawn Query Trademarks --- p.66
Chapter 5.3.6 --- Experiment 6: Evaluation by Recall for Rotated, Scaled and Mirrored Images --- p.66
Chapter 5.3.7 --- Experiment 7: Comparison of Different Integration Methods --- p.68
Chapter 5.4 --- Chapter Summary --- p.71
Chapter 6 --- Shape-based Chinese Cursive Script Character Image Retrieval System --- p.72
Chapter 6.1 --- Comparison to Trademark Retrieval Problem --- p.79
Chapter 6.1.1 --- Feature Selection --- p.73
Chapter 6.1.2 --- Speed of System --- p.73
Chapter 6.1.3 --- Variation of Style --- p.73
Chapter 6.2 --- Target of the Research --- p.74
Chapter 6.3 --- Proposed Solution --- p.75
Chapter 6.3.1 --- Image Preprocessing --- p.75
Chapter 6.3.2 --- Automatic Feature Extraction --- p.76
Chapter 6.3.3 --- Thinned Image and Linearly Normalized Image --- p.76
Chapter 6.3.4 --- Edge Directions --- p.77
Chapter 6.3.5 --- Integration of Shape Features --- p.78
Chapter 6.4 --- Experimental Results --- p.79
Chapter 6.4.1 --- Experiment 8: Weight Assignment using Genetic Algorithm --- p.79
Chapter 6.4.2 --- Experiment 9: Speed on Feature Extraction and Retrieval --- p.81
Chapter 6.4.3 --- Experiment 10: Evaluation by Recall for Deformed Images --- p.82
Chapter 6.4.4 --- Experiment 11: Evaluation by Recall for Rotated and Scaled Images --- p.83
Chapter 6.4.5 --- Experiment 12: Comparison of Different Integration Methods --- p.85
Chapter 6.5 --- Chapter Summary --- p.87
Chapter 7 --- Conclusion --- p.88
Chapter 7.1 --- Summary --- p.88
Chapter 7.2 --- Future Research --- p.89
Chapter 7.2.1 --- Limitations --- p.89
Chapter 7.2.2 --- Future Directions --- p.90
Chapter A --- A Representative Subset of Trademark Images --- p.91
Chapter B --- A Representative Subset of Cursive Script Character Images --- p.93
Chapter C --- Shape Feature Extraction Toolbox for Matlab V5.3 --- p.95
Chapter C.1 --- central_moment --- p.95
Chapter C.2 --- centroid --- p.96
Chapter C.3 --- cir --- p.96
Chapter C.4 --- css --- p.97
Chapter C.5 --- css_match --- p.100
Chapter C.6 --- ecc --- p.102
Chapter C.7 --- edge_directions --- p.102
Chapter C.8 --- fourier_d --- p.105
Chapter C.9 --- gen_shape --- p.106
Chapter C.10 --- hu7 --- p.108
Chapter C.11 --- isclockwise --- p.109
Chapter C.12 --- moment --- p.110
Chapter C.13 --- normalized_moment --- p.111
Chapter C.14 --- orientation --- p.111
Chapter C.15 --- resample_pts --- p.112
Chapter C.16 --- rectangularity --- p.113
Chapter C.17 --- trace_points --- p.114
Chapter C.18 --- warp_conv --- p.115
Bibliography --- p.117

3

"Robust methods for Chinese spoken document retrieval." 2003. http://library.cuhk.edu.hk/record=b5896122.

Abstract:

Hui Pui Yu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 158-169).
Abstracts in English and Chinese.
Abstract --- p.2
Acknowledgements --- p.6
Chapter 1 --- Introduction --- p.23
Chapter 1.1 --- Spoken Document Retrieval --- p.24
Chapter 1.2 --- The Chinese Language and Chinese Spoken Documents --- p.28
Chapter 1.3 --- Motivation --- p.33
Chapter 1.3.1 --- Assisting the User in Query Formation --- p.34
Chapter 1.4 --- Goals --- p.34
Chapter 1.5 --- Thesis Organization --- p.35
Chapter 2 --- Multimedia Repository --- p.37
Chapter 2.1 --- The Cantonese Corpus --- p.37
Chapter 2.1.1 --- The RealMedia® Collection --- p.39
Chapter 2.1.2 --- The MPEG-1 Collection --- p.40
Chapter 2.2 --- The Multimedia Markup Language --- p.42
Chapter 2.3 --- Chapter Summary --- p.44
Chapter 3 --- Monolingual Retrieval Task --- p.45
Chapter 3.1 --- Properties of Cantonese Video Archive --- p.45
Chapter 3.2 --- Automatic Speech Transcription --- p.46
Chapter 3.2.1 --- Transcription of Cantonese Spoken Documents --- p.47
Chapter 3.2.2 --- Indexing Units --- p.48
Chapter 3.3 --- Known-Item Retrieval Task --- p.49
Chapter 3.3.1 --- Evaluation: Average Inverse Rank --- p.50
Chapter 3.4 --- Retrieval Model --- p.51
Chapter 3.5 --- Experimental Results --- p.52
Chapter 3.6 --- Chapter Summary --- p.53
Chapter 4 --- The Use of Audio and Video Information for Monolingual Spoken Document Retrieval --- p.55
Chapter 4.1 --- Video-based Segmentation --- p.56
Chapter 4.1.1 --- Metric Computation --- p.57
Chapter 4.1.2 --- Shot Boundary Detection --- p.58
Chapter 4.1.3 --- Shot Transition Detection --- p.67
Chapter 4.2 --- Audio-based Segmentation --- p.69
Chapter 4.2.1 --- Gaussian Mixture Models --- p.69
Chapter 4.2.2 --- Transition Detection --- p.70
Chapter 4.3 --- Performance Evaluation --- p.72
Chapter 4.3.1 --- Automatic Story Segmentation --- p.72
Chapter 4.3.2 --- Video-based Segmentation Algorithm --- p.73
Chapter 4.3.3 --- Audio-based Segmentation Algorithm --- p.74
Chapter 4.4 --- Fusion of Video- and Audio-based Segmentation --- p.75
Chapter 4.5 --- Retrieval Performance --- p.76
Chapter 4.6 --- Chapter Summary --- p.78
Chapter 5 --- Document Expansion for Monolingual Spoken Document Retrieval --- p.79
Chapter 5.1 --- Document Expansion using Selected Field Speech Segments --- p.81
Chapter 5.1.1 --- Annotations from MmML --- p.81
Chapter 5.1.2 --- Selection of Cantonese Field Speech --- p.83
Chapter 5.1.3 --- Re-weighting Different Retrieval Units --- p.84
Chapter 5.1.4 --- Retrieval Performance with Document Expansion using Selected Field Speech --- p.84
Chapter 5.2 --- Document Expansion using N-best Recognition Hypotheses --- p.87
Chapter 5.2.1 --- Re-weighting Different Retrieval Units --- p.90
Chapter 5.2.2 --- Retrieval Performance with Document Expansion using N-best Recognition Hypotheses --- p.90
Chapter 5.3 --- Document Expansion using Selected Field Speech and N-best Recognition Hypotheses --- p.92
Chapter 5.3.1 --- Re-weighting Different Retrieval Units --- p.92
Chapter 5.3.2 --- Retrieval Performance with Different Indexed Units --- p.93
Chapter 5.4 --- Chapter Summary --- p.94
Chapter 6 --- Query Expansion for Cross-language Spoken Document Retrieval --- p.97
Chapter 6.1 --- The TDT-2 Corpus --- p.99
Chapter 6.1.1 --- English Textual Queries --- p.100
Chapter 6.1.2 --- Mandarin Spoken Documents --- p.101
Chapter 6.2 --- Query Processing --- p.101
Chapter 6.2.1 --- Query Weighting --- p.101
Chapter 6.2.2 --- Bigram Formation --- p.102
Chapter 6.3 --- Cross-language Retrieval Task --- p.103
Chapter 6.3.1 --- Indexing Units --- p.104
Chapter 6.3.2 --- Retrieval Model --- p.104
Chapter 6.3.3 --- Performance Measure --- p.105
Chapter 6.4 --- Relevance Feedback --- p.106
Chapter 6.4.1 --- Pseudo-Relevance Feedback --- p.107
Chapter 6.5 --- Retrieval Performance --- p.107
Chapter 6.6 --- Chapter Summary --- p.109
Chapter 7 --- Conclusions and Future Work --- p.111
Chapter 7.1 --- Future Work --- p.114
Chapter A --- XML Schema for Multimedia Markup Language --- p.117
Chapter B --- Example of Multimedia Markup Language --- p.128
Chapter C --- Significance Tests --- p.135
Chapter C.1 --- Selection of Cantonese Field Speech Segments --- p.135
Chapter C.2 --- Fusion of Video- and Audio-based Segmentation --- p.137
Chapter C.3 --- Document Expansion with Reporter Speech --- p.137
Chapter C.4 --- Document Expansion with N-best Recognition Hypotheses --- p.140
Chapter C.5 --- Document Expansion with Reporter Speech and N-best Recognition Hypotheses --- p.140
Chapter C.6 --- Query Expansion with Pseudo Relevance Feedback --- p.142
Chapter D --- Topic Descriptions of TDT-2 Corpus --- p.145
Chapter E --- Speech Recognition Output from Dragon in CLSDR Task --- p.148
Chapter F --- Parameters Estimation --- p.152
Chapter F.1 --- Estimating the Number of Relevant Documents, Nr --- p.152
Chapter F.2 --- Estimating the Number of Terms Added from Relevant Documents, Nrt, to Original Query --- p.153
Chapter F.3 --- Estimating the Number of Non-relevant Documents, Nn, from the Bottom-scoring Retrieval List --- p.153
Chapter F.4 --- Estimating the Number of Terms, Selected from Non-relevant Documents (Nnt), to be Removed from Original Query --- p.154
Chapter G --- Abbreviations --- p.155
Bibliography --- p.158

4

"Probabilistic models for information extraction: from cascaded approach to joint approach." Thesis, 2010. http://library.cuhk.edu.hk/record=b6074899.

Abstract:

Based on these observations and analysis, we propose a joint discriminative probabilistic framework to optimize all relevant subtasks simultaneously. This framework defines a joint probability distribution, in the form of an exponential family, over both the segmentations of sequence data and the relations between segments. The model allows tight interactions between segmentations and relations, and it offers a natural formulation for IE tasks. Since exact parameter estimation and inference are intractable, a structured variational inference algorithm is developed to estimate the parameters approximately. For inference, we propose a strong bidirectional MH approach that finds the MAP assignments for joint segmentations and relations. It exploits mutual benefits in both directions, so that segmentations can aid relations and vice versa.
Information Extraction (IE) aims at identifying specific pieces of information (data) in an unstructured or semi-structured textual document and transforming unstructured information in a corpus of documents or Web pages into a structured database. IE has several representative tasks. Named entity recognition (NER) aims at identifying phrases that denote types of named entities. Entity relation extraction aims at discovering the events or relations related to the entities. Coreference resolution aims at determining whether two extracted entity mentions refer to the same object. IE is useful for a wide variety of applications.
The end-to-end performance of high-level IE systems for compound tasks is often hampered by the use of cascaded frameworks. The integrated model we proposed can alleviate some of these problems, but it is only loosely coupled. Parameter estimation is performed independently and it only allows information to flow in one direction. In this top-down integration model, the decision of the bottom sub-model could guide the decision of the upper sub-model, but not vice-versa. Thus, deep interactions and dependencies between different tasks can hardly be well captured.
We have investigated and developed a cascaded framework in an attempt to consider entity extraction and qualitative domain knowledge based on undirected, discriminatively-trained probabilistic graphical models. This framework consists of two stages and it is the combination of statistical learning and first-order logic. As a pipeline model, the first stage is a base model and the second stage is used to validate and correct the errors made in the base model. We incorporated domain knowledge that can be well formulated into first-order logic to extract entity candidates from the base model. We have applied this framework and achieved encouraging results in Chinese NER on the People's Daily corpus.
We perform extensive experiments on three important IE tasks using real-world datasets, namely Chinese NER, entity identification and relationship extraction from Wikipedia's encyclopedic articles, and citation matching, to test our proposed models, including the bidirectional model, the integrated model, and the joint model. Experimental results show that our models significantly outperform current state-of-the-art probabilistic models, such as decoupled and joint models, illustrating the feasibility and promise of our proposed approaches. (Abstract shortened by UMI.)
We present a general, strongly-coupled, and bidirectional architecture based on discriminatively trained factor graphs for information extraction, which consists of two components---segmentation and relation. First we introduce joint factors connecting variables of relevant subtasks to capture dependencies and interactions between them. We then propose a strong bidirectional Markov chain Monte Carlo (MCMC) sampling inference algorithm which allows information to flow in both directions to find the approximate maximum a posteriori (MAP) solution for all subtasks. Notably, our framework is considerably simpler to implement, and outperforms previous ones.
Yu, Xiaofeng.
Adviser: Lam Wai.
Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (leaves 109-123).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.

5

"Automatic construction of English/Chinese parallel corpus." 2001. http://library.cuhk.edu.hk/record=b5890676.

Abstract:

Li Kar Wing.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 88-96).
Abstracts in English and Chinese.
ABSTRACT --- p.i
ACKNOWLEDGEMENTS --- p.v
LIST OF TABLES --- p.viii
LIST OF FIGURES --- p.ix
CHAPTERS
Chapter 1. --- INTRODUCTION --- p.1
Chapter 1.1 --- Application of corpus-based techniques --- p.2
Chapter 1.1.1 --- Machine Translation (MT) --- p.2
Chapter 1.1.1.1 --- Linguistic --- p.3
Chapter 1.1.1.2 --- Statistical --- p.4
Chapter 1.1.1.3 --- Lexicon construction --- p.4
Chapter 1.1.2 --- Cross-lingual Information Retrieval (CLIR) --- p.6
Chapter 1.1.2.1 --- Controlled vocabulary --- p.6
Chapter 1.1.2.2 --- Free text --- p.7
Chapter 1.1.2.3 --- Application corpus-based approach in CLIR --- p.9
Chapter 1.2 --- Overview of linguistic resources --- p.10
Chapter 1.3 --- Written language corpora --- p.12
Chapter 1.3.1 --- Types of corpora --- p.13
Chapter 1.3.2 --- Limitation of comparable corpora --- p.16
Chapter 1.4 --- Outline of the dissertation --- p.17
Chapter 2. --- LITERATURE REVIEW --- p.19
Chapter 2.1 --- Research in automatic corpus construction --- p.20
Chapter 2.2 --- Research in translation alignment --- p.25
Chapter 2.2.1 --- Sentence alignment --- p.27
Chapter 2.2.2 --- Word alignment --- p.28
Chapter 2.3 --- Research in alignment of sequences --- p.33
Chapter 3. --- ALIGNMENT AT WORD LEVEL AND CHARACTER LEVEL --- p.35
Chapter 3.1 --- Title alignment --- p.35
Chapter 3.1.1 --- Lexical features --- p.37
Chapter 3.1.2 --- Grammatical features --- p.40
Chapter 3.1.3 --- The English/Chinese alignment model --- p.41
Chapter 3.2 --- Alignment at word level and character level --- p.42
Chapter 3.2.1 --- Alignment at word level --- p.42
Chapter 3.2.2 --- Alignment at character level: Longest matching --- p.44
Chapter 3.2.3 --- Longest common subsequence(LCS) --- p.46
Chapter 3.2.4 --- Applying LCS in the English/Chinese alignment model --- p.48
Chapter 3.3 --- Reduce overlapping ambiguity --- p.52
Chapter 3.3.1 --- Edit distance --- p.52
Chapter 3.3.2 --- Overlapping in the algorithm model --- p.54
Chapter 4. --- ALIGNMENT AT TITLE LEVEL --- p.59
Chapter 4.1 --- Review of score functions --- p.59
Chapter 4.2 --- The Score function --- p.60
Chapter 4.2.1 --- (C matches E) and (E matches C) --- p.60
Chapter 4.2.2 --- Length similarity --- p.63
Chapter 5. --- EXPERIMENTAL RESULTS --- p.69
Chapter 5.1 --- Hong Kong government press release articles --- p.69
Chapter 5.2 --- Hang Seng Bank economic monthly reports --- p.76
Chapter 5.3 --- Hang Seng Bank press release articles --- p.78
Chapter 5.4 --- Hang Seng Bank speech articles --- p.81
Chapter 5.5 --- Quality of the collections and future work --- p.84
Chapter 6. --- CONCLUSION --- p.87
Bibliography

6

"Automatic index generation for the free-text based database." Chinese University of Hong Kong, 1992. http://library.cuhk.edu.hk/record=b5887040.

Abstract:

by Leung Chi Hong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1992.
Includes bibliographical references (leaves 183-184).
Chapter one: --- Introduction --- p.1
Chapter two: --- Background knowledge and linguistic approaches of automatic indexing --- p.5
Chapter 2.1 --- Definition of index and indexing --- p.5
Chapter 2.2 --- Indexing methods and problems --- p.7
Chapter 2.3 --- Automatic indexing and human indexing --- p.8
Chapter 2.4 --- Different approaches of automatic indexing --- p.10
Chapter 2.5 --- Example of semantic approach --- p.11
Chapter 2.6 --- Example of syntactic approach --- p.14
Chapter 2.7 --- Comments on semantic and syntactic approaches --- p.18
Chapter three: --- Rationale and methodology of automatic index generation --- p.19
Chapter 3.1 --- Problems caused by natural language --- p.19
Chapter 3.2 --- Usage of word frequencies --- p.20
Chapter 3.3 --- Brief description of rationale --- p.24
Chapter 3.4 --- Automatic index generation --- p.27
Chapter 3.4.1 --- Training phase --- p.27
Chapter 3.4.1.1 --- Selection of training documents --- p.28
Chapter 3.4.1.2 --- Control and standardization of variants of words --- p.28
Chapter 3.4.1.3 --- Calculation of associations between words and indexes --- p.30
Chapter 3.4.1.4 --- Discarding false associations --- p.33
Chapter 3.4.2 --- Indexing phase --- p.38
Chapter 3.4.3 --- Example of automatic indexing --- p.41
Chapter 3.5 --- Related researches --- p.44
Chapter 3.6 --- Word diversity and its effect on automatic indexing --- p.46
Chapter 3.7 --- Factors affecting performance of automatic indexing --- p.60
Chapter 3.8 --- Application of semantic representation --- p.61
Chapter 3.8.1 --- Problem of natural language --- p.61
Chapter 3.8.2 --- Use of concept headings --- p.62
Chapter 3.8.3 --- Example of using concept headings in automatic indexing --- p.65
Chapter 3.8.4 --- Advantages of concept headings --- p.68
Chapter 3.8.5 --- Disadvantages of concept headings --- p.69
Chapter 3.9 --- Correctness prediction for proposed indexes --- p.78
Chapter 3.9.1 --- Example of using index proposing rate --- p.80
Chapter 3.10 --- Effect of subject matter on automatic indexing --- p.83
Chapter 3.11 --- Comparison with other indexing methods --- p.85
Chapter 3.12 --- Proposal for applying Chinese medical knowledge --- p.90
Chapter four: --- Simulations of automatic index generation --- p.93
Chapter 4.1 --- Training phase simulations --- p.93
Chapter 4.1.1 --- Simulation of association calculation (word diversity uncontrolled) --- p.94
Chapter 4.1.2 --- Simulation of association calculation (word diversity controlled) --- p.102
Chapter 4.1.3 --- Simulation of discarding false associations --- p.107
Chapter 4.2 --- Indexing phase simulation --- p.115
Chapter 4.3 --- Simulation of using concept headings --- p.120
Chapter 4.4 --- Simulation for testing performance of predicting index correctness --- p.125
Chapter 4.5 --- Summary --- p.128
Chapter five: --- Real case study in database of Chinese Medicinal Material Research Center --- p.130
Chapter 5.1 --- Selection of real documents --- p.130
Chapter 5.2 --- Case study one: Overall performance using real data --- p.132
Chapter 5.2.1 --- Sample results of automatic indexing for real documents --- p.138
Chapter 5.3 --- Case study two: Using multi-word terms --- p.148
Chapter 5.4 --- Case study three: Using concept headings --- p.152
Chapter 5.5 --- Case study four: Prediction of proposed index correctness --- p.156
Chapter 5.6 --- Case study five: Use of (Σ ΔRij) Fi to determine false association --- p.159
Chapter 5.7 --- Case study six: Effect of word diversity --- p.162
Chapter 5.8 --- Summary --- p.166
Chapter six: --- Conclusion --- p.168
Appendix A: List of stopwords --- p.173
Appendix B: Index terms used in case studies --- p.174
References --- p.183

7

"Statistical modeling for lexical chains for automatic Chinese news story segmentation." 2010. http://library.cuhk.edu.hk/record=b5894500.

Abstract:

Chan, Shing Kai.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (leaves 106-114).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgements --- p.v
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Problem Statement --- p.2
Chapter 1.2 --- Motivation for Story Segmentation --- p.4
Chapter 1.3 --- Terminologies --- p.5
Chapter 1.4 --- Thesis Goals --- p.6
Chapter 1.5 --- Thesis Organization --- p.8
Chapter 2 --- Background Study --- p.9
Chapter 2.1 --- Coherence-based Approaches --- p.10
Chapter 2.1.1 --- Defining Coherence --- p.10
Chapter 2.1.2 --- Lexical Chaining --- p.12
Chapter 2.1.3 --- Cosine Similarity --- p.15
Chapter 2.1.4 --- Language Modeling --- p.19
Chapter 2.2 --- Feature-based Approaches --- p.21
Chapter 2.2.1 --- Lexical Cues --- p.22
Chapter 2.2.2 --- Audio Cues --- p.23
Chapter 2.2.3 --- Video Cues --- p.24
Chapter 2.3 --- Pros and Cons and Hybrid Approaches --- p.25
Chapter 2.4 --- Chapter Summary --- p.27
Chapter 3 --- Experimental Corpora --- p.29
Chapter 3.1 --- The TDT2 and TDT3 Multi-language Text Corpus --- p.29
Chapter 3.1.1 --- Introduction --- p.29
Chapter 3.1.2 --- Program Particulars and Structures --- p.31
Chapter 3.2 --- Data Preprocessing --- p.33
Chapter 3.2.1 --- Challenges of Lexical Chain Formation on Chinese Text --- p.33
Chapter 3.2.2 --- Word Segmentation for Word Units Extraction --- p.35
Chapter 3.2.3 --- Part-of-speech Tagging for Candidate Words Extraction --- p.36
Chapter 3.3 --- Chapter Summary --- p.37
Chapter 4 --- Indication of Lexical Cohesiveness by Lexical Chains --- p.39
Chapter 4.1 --- Lexical Chain as a Representation of Cohesiveness --- p.40
Chapter 4.1.1 --- Choice of Word Relations for Lexical Chaining --- p.41
Chapter 4.1.2 --- Lexical Chaining by Connecting Repeated Lexical Elements --- p.43
Chapter 4.2 --- Lexical Chain as an Indicator of Story Segments --- p.48
Chapter 4.2.1 --- Indicators of Absence of Cohesiveness --- p.49
Chapter 4.2.2 --- Indicator of Continuation of Cohesiveness --- p.58
Chapter 4.3 --- Chapter Summary --- p.62
Chapter 5 --- Indication of Story Boundaries by Lexical Chains --- p.63
Chapter 5.1 --- Formal Definition of the Classification Procedures --- p.64
Chapter 5.2 --- Theoretical Framework for Segmentation Based on Lexical Chaining --- p.65
Chapter 5.2.1 --- Evaluation of Story Segmentation Accuracy --- p.65
Chapter 5.2.2 --- Previous Approach of Story Segmentation Based on Lexical Chaining --- p.66
Chapter 5.2.3 --- Statistical Framework for Story Segmentation based on Lexical Chaining --- p.69
Chapter 5.2.4 --- Post Processing of Ratio for Boundary Identification --- p.73
Chapter 5.3 --- Comparing Segmentation Models --- p.75
Chapter 5.4 --- Chapter Summary --- p.79
Chapter 6 --- Analysis of Lexical Chains Features as Boundary Indicators --- p.80
Chapter 6.1 --- Error Analysis --- p.81
Chapter 6.2 --- Window Length in the LRT Model --- p.82
Chapter 6.3 --- The Relative Importance of Each Set of Features --- p.84
Chapter 6.4 --- The Effect of Removing Timing Information --- p.92
Chapter 6.5 --- Chapter Summary --- p.96
Chapter 7 --- Conclusions and Future Work --- p.98
Chapter 7.1 --- Contributions --- p.98
Chapter 7.2 --- Future Works --- p.100
Chapter 7.2.1 --- Further Extension of the Framework --- p.100
Chapter 7.2.2 --- Wider Applications of the Framework --- p.105
Bibliography --- p.106

8

Wong, Jacqueline Wai-ting. "ACTION: automatic classification for Chinese documents." Chinese University of Hong Kong, 1994. http://library.cuhk.edu.hk/record=b5895378.

Abstract:

by Jacqueline Wai-ting Wong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1994.
Includes bibliographical references (p. 107-109).
Abstract --- p.i
Acknowledgement --- p.iii
List of Tables --- p.viii
List of Figures --- p.ix
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Chinese Information Processing --- p.6
Chapter 2.1 --- Chinese Word Segmentation --- p.7
Chapter 2.1.1 --- Statistical Method --- p.8
Chapter 2.1.2 --- Probabilistic Method --- p.9
Chapter 2.1.3 --- Linguistic Method --- p.10
Chapter 2.2 --- Automatic Indexing --- p.10
Chapter 2.2.1 --- Title Indexing --- p.11
Chapter 2.2.2 --- Free-Text Searching --- p.11
Chapter 2.2.3 --- Citation Indexing --- p.12
Chapter 2.3 --- Information Retrieval Systems --- p.13
Chapter 2.3.1 --- Users' Assessment of IRS --- p.13
Chapter 2.4 --- Concluding Remarks --- p.15
Chapter 3 --- Survey on Classification --- p.16
Chapter 3.1 --- Text Classification --- p.17
Chapter 3.2 --- Survey on Classification Schemes --- p.18
Chapter 3.2.1 --- Commonly Used Classification Systems --- p.18
Chapter 3.2.2 --- Classification of Newspapers --- p.31
Chapter 3.3 --- Concluding Remarks --- p.37
Chapter 4 --- System Models and the ACTION Algorithm --- p.38
Chapter 4.1 --- Factors Affecting Systems Performance --- p.38
Chapter 4.1.1 --- Specificity --- p.39
Chapter 4.1.2 --- Exhaustivity --- p.40
Chapter 4.2 --- Assumptions and Scope --- p.42
Chapter 4.2.1 --- Assumptions --- p.42
Chapter 4.2.2 --- System Scope: Data Flow Diagrams --- p.44
Chapter 4.3 --- System Models --- p.48
Chapter 4.3.1 --- Article --- p.48
Chapter 4.3.2 --- Matching Table --- p.49
Chapter 4.3.3 --- Forest --- p.51
Chapter 4.3.4 --- Matching --- p.53
Chapter 4.4 --- Classification Rules --- p.54
Chapter 4.5 --- The ACTION Algorithm --- p.56
Chapter 4.5.1 --- Algorithm Design Objectives --- p.56
Chapter 4.5.2 --- Measuring Node Significance --- p.56
Chapter 4.5.3 --- Pseudocodes --- p.61
Chapter 4.6 --- Concluding Remarks --- p.64
Chapter 5 --- Analysis of Results and Validation --- p.66
Chapter 5.1 --- Seeking for Exhaustivity Rather Than Specificity --- p.67
Chapter 5.1.1 --- The News Article --- p.67
Chapter 5.1.2 --- The Matching Results --- p.68
Chapter 5.1.3 --- The Keyword Values --- p.68
Chapter 5.1.4 --- Analysis of Classification Results --- p.71
Chapter 5.2 --- Catering for Hierarchical Relationships Between Classes and Subclasses --- p.72
Chapter 5.2.1 --- The News Article --- p.72
Chapter 5.2.2 --- The Matching Results --- p.73
Chapter 5.2.3 --- The Keyword Values --- p.74
Chapter 5.2.4 --- Analysis of Classification Results --- p.75
Chapter 5.3 --- A Representative With Zero Occurrence --- p.78
Chapter 5.3.1 --- The News Article --- p.78
Chapter 5.3.2 --- The Matching Results --- p.79
Chapter 5.3.3 --- The Keyword Values --- p.80
Chapter 5.3.4 --- Analysis of Classification Results --- p.81
Chapter 5.4 --- Statistical Analysis --- p.83
Chapter 5.4.1 --- Classification Results with Highest Occurrence Frequency --- p.83
Chapter 5.4.2 --- Classification Results with Zero Occurrence Frequency --- p.85
Chapter 5.4.3 --- Distribution of Classification Results on Level Numbers --- p.86
Chapter 5.5 --- Concluding Remarks --- p.87
Chapter 5.5.1 --- Advantageous Characteristics of ACTION --- p.88
Chapter 6 --- Conclusion --- p.93
Chapter 6.1 --- Perspectives in Document Representation --- p.93
Chapter 6.2 --- Classification Schemes --- p.95
Chapter 6.3 --- Classification System Model --- p.95
Chapter 6.4 --- The ACTION Algorithm --- p.96
Chapter 6.5 --- Advantageous Characteristics of the ACTION Algorithm --- p.96
Chapter 6.6 --- Testing and Validating the ACTION algorithm --- p.98
Chapter 6.7 --- Future Work --- p.99
Chapter 6.8 --- A Final Remark --- p.100
Chapter A --- System Models --- p.102
Chapter B --- Classification Rules --- p.104
Chapter C --- Node Significance Definitions --- p.105
References --- p.107

Books on the topic "Information storage and retrieval systems Chinese characters"

1

Wan, Tian-Long. Experiments with automatic indexing and a relational thesaurus in a Chinese information retrieval system. Ann Arbor, Mich.: UMI Dissertation Services, 1995.

2

Wen xian xin xi jian suo xue [Literature information retrieval]. Nanjing Shi: Nanjing shi fan da xue chu ban she, 2006.

3

Wu, Xuemei, and Dong Wu da xue (Taipei, Taiwan). Zhongguo wen xue xi, eds. Wen xian yu zi xun xue shu yan tao hui lun wen ji [Collected papers of the academic symposium on documents and information]. Taibei Shi: Dong Wu da xue Zhongguo wen xue xi, 2001.

4

Symposium on Computerized Information Retrieval (2nd 1987 Peking, China). Database development and Chinese information needs = Shu ju ku jian she he Zhong guo qing bao xu qiu: proceedings of the Second Beijing International Symposium on Computerised Information Retrieval, 7-11 December 1987. London: Aslib, 1990.

5

Ji, Donghong. Chinese Lexical Semantics: 13th Workshop, CLSW 2012, Wuhan, China, July 6-8, 2012, Revised Selected Papers. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013.

6

Nihon Kikaku Kyōkai, Kokuritsu Kokugo Kenkyūjo (Japan), Jōhō Shori Gakkai (Japan), and Japan Keizai Sangyōshō, eds. Hanʼyō denshi jōhō kōkan kankyō seibi puroguramu: Seika hōkokusho [Program for developing a general-purpose electronic information exchange environment: Results report]. Tōkyō: Nihon Kikaku Kyōkai, 2007.

7

Takada, Tokio (1949-), and Kyōto Daigaku 21-seiki COE Puroguramu "Higashi Ajia Sekai no Jinbun Jōhōgaku Kenkyū Kyōiku Kyoten", eds. "Higashi Ajia sekai no jinbun jōhōgaku kenkyū kyōiku kyoten" hōkokusho [Report of the "Research and Education Center for Humanities Informatics in the East Asian World"]. Kyōto-shi: Kyōto Daigaku 21-seiki COE Puroguramu "Higashi Ajia Sekai no Jinbun Jōhōgaku Kenkyū Kyōiku Kyoten", 2008.

8

Hyon, Kyu-sop. Hanguk pyojun hwakchang Hancha setu chejong pangan yongu [A study on establishing a Korean standard extended Hanja set] (Chulpan yongu chongso [Publishing research series]). Hanguk Chulpan Yonguso, 1994.

9

Guo ji biao zhun tong yong duo ba wei bian ma zi fu ji, tong yi di Zhong Ri Han Han zi = ISO/IEC DIS 10646 information technology universal multiple-octet ... set (UCS) unified ideographic CJK characters. Xing guang (Zhong wen dian nao) you xian gong si (Xianggang), 1992.
