Journal articles on the topic 'Chinese language Technical Chinese Data processing'

1

Cheng, Xian Yi, Wei Kang, and Yu Guo. "An Algorithm of Network Sensitive Information Features Extracting." Applied Mechanics and Materials 556-562 (May 2014): 3558–61. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.3558.

Abstract:
In the vast ocean of data, how to keep the best and discard the dross has become a major issue of the Internet age; it is also a great challenge for data processing and a key to developing the national network economy. To address the time lag, low accuracy, and poor self-adaptability of sensitive information filtering on the Internet, this work studies Chinese text media (webpages, micro-blogging, forums, etc.) and uses opinion mining and natural language processing technology to study a sensitive information feature extraction algorithm. The algorithm reveals the interrelationship between sensitive information and the sensitive dictionary, providing technical support for the sensitive dictionary and for sensitive information recognition.
2

ZHANG, WEN, TAKETOSHI YOSHIDA, and XIJIN TANG. "DISTRIBUTION OF MULTI-WORDS IN CHINESE AND ENGLISH DOCUMENTS." International Journal of Information Technology & Decision Making 08, no. 02 (June 2009): 249–65. http://dx.doi.org/10.1142/s0219622009003399.

Abstract:
As a hybrid of N-gram in natural language processing and collocation in statistical linguistics, multi-word is becoming a hot topic in area of text mining and information retrieval. In this paper, a study concerning distribution of multi-words is carried out to explore a theoretical basis for probabilistic term-weighting scheme. Specifically, the Poisson distribution, zero-inflated binomial distribution, and G-distribution are comparatively studied on a task of predicting probabilities of multi-words' occurrences using these distributions, for both technical multi-words and nontechnical multi-words. In addition, a rule-based multi-word extraction algorithm is proposed to extract multi-words from texts based on words' occurring patterns and syntactical structures. Our experimental results demonstrate that G-distribution has the best capability to predict probabilities of frequency of multi-words' occurrence and the Poisson distribution is comparable to zero-inflated binomial distribution in estimation of multi-word distribution. The outcome of this study validates that burstiness is a universal phenomenon in linguistic count data, which is applicable not only for individual content words but also for multi-words.
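For illustration only, the Poisson baseline named in this abstract can be sketched as follows. The per-document counts are made up and the function names are hypothetical; nothing here is taken from the paper's data or code. The sketch fits the Poisson rate by the sample mean and compares the predicted share of documents with zero occurrences against the observed share.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k occurrences under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def fit_poisson_rate(counts):
    """Maximum-likelihood estimate of the Poisson rate: the sample mean."""
    return sum(counts) / len(counts)

# Made-up occurrences of one multi-word in ten documents (illustrative only).
counts = [0, 0, 0, 0, 0, 0, 0, 0, 6, 4]

lam = fit_poisson_rate(counts)
predicted_zero = poisson_pmf(0, lam)
observed_zero = counts.count(0) / len(counts)

print(f"rate={lam:.2f} predicted P(0)={predicted_zero:.3f} observed P(0)={observed_zero:.3f}")
```

In this toy sample the term is concentrated in two documents, so the observed share of zero-count documents is well above what the fitted Poisson model predicts. That gap is the kind of pattern the abstract refers to as burstiness.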
3

Shi, Lijuan, Ang Li, and Lei Zhang. "Sustainable Fault Diagnosis of Imbalanced Text Mining for CTCS-3 Data Preprocessing." Sustainability 13, no. 4 (February 17, 2021): 2155. http://dx.doi.org/10.3390/su13042155.

Abstract:
At present, the method for fault diagnosis and maintenance of the CTCS-3 (Chinese Train Control System Level 3) electronic equipment relies too heavily on expert knowledge. Moreover, the use of historical fault data is not valued. This paper proposes a sustainable fault diagnosis model based on imbalanced text mining. First, to process fault data from the field recorded in natural language, natural language processing technology is used to extract fault feature words. Then, a term frequency-inverse document frequency model is used to transform the fault feature words extracted from the database into vectors. It is worth noting that imbalance in the fault samples affects the accuracy of this sustainable fault diagnosis model. To solve this problem, we use the borderline-synthetic minority over-sampling technique in the step of predicting train fault components. We also use the backpropagation neural network we proposed and the naive Bayesian model, which is commonly used as a classification model, to compare the prediction accuracy of these two algorithms. The experimental results are good, which shows that the fault diagnosis method using the backpropagation neural network can further assist engineers in completing timely repair and maintenance work. The research in this paper has played a very important role in technical support for intelligent train dispatching and command, and will also play a positive role in technical support for the automatic operation of urban rail transit under the prevention and control of the new coronavirus.
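As a minimal sketch of the vectorization step this abstract mentions, the following turns tokenized records into term frequency-inverse document frequency vectors. The fault feature words are invented for the example, the function name is hypothetical, and the plain unsmoothed weighting tf * log(N / df) is an assumption; the paper may use a different variant.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map tokenized documents to TF-IDF vectors over a shared vocabulary.

    docs: list of token lists. Returns (vocab, vectors).
    Uses unsmoothed idf = log(N / df), one common textbook variant.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vec = []
        for tok in vocab:
            tf = counts[tok] / total if total else 0.0
            idf = math.log(n_docs / df[tok])
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors

# Invented fault feature words, for illustration only.
records = [
    ["balise", "message", "lost"],
    ["radio", "timeout", "message"],
    ["balise", "fault"],
]
vocab, vectors = tfidf_vectors(records)
for record, vec in zip(records, vectors):
    print(record, [round(v, 3) for v in vec])
```

A word that occurs in every record gets weight zero under this variant, so the vectors emphasize the words that distinguish one fault record from another.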
4

Lingling, Wu, and Chen Fuli. "Role of AI Technology in Brend Building of Chinese Higher Education Institution – Thought Based on Integrated Marketing Communicanion." Marketing and Digital Technologies 5, no. 2 (June 29, 2021): 7–13. http://dx.doi.org/10.15276/mdt.5.2.2021.1.

Abstract:
As the competitions among higher education institutions (HEIs) intensify, brand building has gradually become an important means for HEIs to build their images and enhance their competitiveness. For HEIs, the significance of integrated marketing communication lies in the integration of brand image communication content, communication channel and communication process. At present, the influence of traditional communication channels declines, the influence of self-established media is limited, and the negative information is not monitored well. Under such circumstances, AI technology can provide technical support for integrated marketing communications of HEI brand. In terms of communication content, VR/AR, UAV, interactive games and chatbot are mainly applied. In the aspect of communication channels, the data mining technique is mainly used to achieve differentiated communication, and the big data analysis technique is adopted to integrate brand image information communication channels. With regard to negative information monitoring, the natural language processing technology can provide high-efficiency, full-coverage and round-the-clock negative information monitoring.
5

He, Lanfei, Xuefei Zhang, Zhiwei Li, Peng Xiao, Ziming Wei, Xu Cheng, and Shaocheng Qu. "A Chinese Named Entity Recognition Model of Maintenance Records for Power Primary Equipment Based on Progressive Multitype Feature Fusion." Complexity 2022 (February 7, 2022): 1–11. http://dx.doi.org/10.1155/2022/8114217.

Abstract:
Presently, the State Grid Corporation of China has accumulated a large amount of maintenance records for power primary equipment. Unfortunately, most of these records are unstructured data, which makes them difficult to analyze and utilize. The emergence of natural language processing technology and deep learning methods provides a solution for unstructured text data. This paper proposes a progressive multitype feature fusion model to recognize Chinese named entities in unstructured maintenance records for power primary equipment. Firstly, the textual characteristics and word separation difficulties of maintenance records are analyzed, then 7 main entity categories of power technical terms from unstructured maintenance records are chosen, and 3452 maintenance records are labeled with these categories, forming the so-called EPE-MR training dataset. Secondly, the standard test reports, standard maintenance, and fault analysis reports for three types of power primary equipment (namely, main transformer, circuit breaker, and isolating switch) are employed as a corpus to train character embedding in order to obtain some word representation ability for maintenance records. After that, a progressive multilevel radical feature extraction module is designed to obtain detailed and fine semantic information in a hierarchical manner. Further, the radical feature representation and character embedding are concatenated and sent to a BiLSTM module to extract contextual information in order to improve Chinese entity recognition ability. Moreover, a CRF is introduced to handle the dependencies among prediction labels and to output the optimal prediction sequence, from which structured data of maintenance records can easily be obtained. Finally, comparative experiments on the public MSRA dataset, the China People’s Daily corpus, and the EPE-MR dataset are implemented, respectively, which show the effectiveness of the proposed method.
6

Chen, Zi Li. "Research and Application of Clustering Algorithm for Text Big Data." Computational Intelligence and Neuroscience 2022 (June 8, 2022): 1–8. http://dx.doi.org/10.1155/2022/7042778.

Abstract:
In the era of big data, text is a very important information store in all walks of life. From humanities research to government decision-making, from precision medicine to quantitative finance, from customer management to marketing, massive text, as one of the most important information carriers, plays an important role everywhere. The text data generated in these practical problems of humanities research, the financial industry, marketing, and other fields often has obvious domain characteristics, often containing the professional vocabulary and unique language patterns of these fields, and is often accompanied by a variety of “noise.” Dealing with such texts is a great challenge under current technical conditions, especially for Chinese texts. A clustering algorithm provides a better solution for text big data information processing. The clustering algorithm is the core of cluster analysis. The K-means algorithm is widely used in cluster analysis because its implementation principle is simple and its time complexity is low, but its K value needs to be preset, and randomly selected initial cluster centers can lead it into a locally optimal solution. Other clustering algorithms, such as mean shift clustering, are also used in mining text big data. In view of the problems of the above algorithms, this paper first extracts and analyzes the text big data and then does experiments with the clustering algorithm. Experimental conclusion: on large-scale text data, the traditional K-means algorithm has low efficiency and reduced accuracy, and it is susceptible to the influence of the initial centers and abnormal data. To address these problems, the K-means cluster analysis algorithm for data sets with large data volumes is analyzed and improved to raise its execution efficiency and accuracy on such data sets.
Mean shift clustering can be regarded as making many random centers move towards the direction of maximum density gradually, that is, moving their mean centroid continuously according to the probability density of data and finally obtaining multiple maximum density centers. It can also be said that mean shift clustering is a kernel density estimation algorithm.
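The description of mean shift above can be illustrated with a small one-dimensional sketch. It is not the paper's implementation: it assumes a Gaussian kernel, starts one candidate center at every data point rather than at random positions, and uses made-up numbers and a hypothetical function name.

```python
import math

def mean_shift_1d(points, bandwidth=1.0, iters=50, tol=1e-6):
    """Move one candidate center per data point uphill on a Gaussian
    kernel density estimate, then merge centers that end up together."""
    modes = list(points)
    for _ in range(iters):
        largest_move = 0.0
        new_modes = []
        for m in modes:
            # Kernel weights of every data point as seen from the center m.
            weights = [math.exp(-((p - m) ** 2) / (2 * bandwidth ** 2)) for p in points]
            # The weighted mean is the next position of this center.
            shifted = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            largest_move = max(largest_move, abs(shifted - m))
            new_modes.append(shifted)
        modes = new_modes
        if largest_move < tol:
            break
    # Merge centers that converged to (nearly) the same density peak.
    centers = []
    for m in sorted(modes):
        if not centers or m - centers[-1] > bandwidth / 2:
            centers.append(m)
    return centers

# Made-up data with two dense regions, for illustration only.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
centers = mean_shift_1d(data, bandwidth=0.5)
print([round(c, 3) for c in centers])
```

Unlike K-means, the number of clusters is not preset here; it falls out of how many density peaks the centers converge to, and the bandwidth becomes the parameter that has to be chosen.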
7

Kamal, Suhail Muhammad, Yidong Chen, Shaozi Li, Xiaodong Shi, and Jiangbin Zheng. "Technical Approaches to Chinese Sign Language Processing: A Review." IEEE Access 7 (2019): 96926–35. http://dx.doi.org/10.1109/access.2019.2929174.

8

Semenov, Kirill I., Armine K. Titizian, Aleksandra O. Piskunova, Yulia O. Korotkova, Alena D. Tsvetkova, Elena A. Volf, Alexandra S. Konovalova, and Yulia N. Kuznetsova. "Linguistic Annotation of Translated Chinese Texts: Coordinating Theory, Algorithms and Data." Journal of Linguistics/Jazykovedný casopis 72, no. 2 (December 1, 2021): 590–602. http://dx.doi.org/10.2478/jazcas-2021-0054.

Abstract:
The article tackles the problems of linguistic annotation in the Chinese texts presented in the Ruzhcorp – Russian-Chinese Parallel Corpus of RNC, and the ways to solve them. Particular attention is paid to the processing of Russian loanwords. On the one hand, we present the theoretical comparison of the widespread standards of Chinese text processing. On the other hand, we describe our experiments in three fields: word segmentation, grapheme-to-phoneme conversion, and PoS-tagging, on the specific corpus data that contains many transliterations and loanwords. As a result, we propose a preprocessing pipeline for the Chinese texts that will be implemented in Ruzhcorp.
9

Xu, Yi. "Processing relative clauses in Chinese as a second language." Second Language Research 30, no. 4 (July 8, 2014): 439–61. http://dx.doi.org/10.1177/0267658313511485.

Abstract:
This project investigates second language (L2) learners’ processing of four types of Chinese relative clauses crossing extraction types and demonstrative-classifier (DCl) positions. Using a word order judgment task with a whole-sentence reading technique, the study also discusses how psycholinguistic theories bear explanatory power in L2 data. An overall preference for DCl-first structures and an advantage of DCl-subject relative clauses over the other three structures were found. Results were largely compatible with the filler-gap domain theory and indicated a weak subject-gap advantage. These motivations are subject to influences from other factors, and a multi-constraint proposal was proposed.
10

Lu, Cailing, Frank Boers, and Averil Coxhead. "Exploring learners’ understanding of technical vocabulary in Traditional Chinese Medicine." Studies in Second Language Learning and Teaching 11, no. 1 (March 29, 2021): 71–101. http://dx.doi.org/10.14746/ssllt.2021.11.1.4.

Abstract:
This study explores English for specific purposes learners’ understanding of technical words in a previously-developed technical word list in Traditional Chinese Medicine (TCM). The principal aim was to estimate what kind of technical terms pose problems to TCM learners and might therefore merit special attention in instruction. Of particular interest was the question whether there is a divergence in the understanding of technical vocabulary in TCM between Chinese and Western background learners. To achieve these aims, a combination of word association tasks and retrospective interviews was implemented with 11 Chinese and 10 Western background TCM learners. The data showed that both Chinese and Western learners encountered certain difficulties in understanding technical vocabulary in their study. However, their sources of difficulty were different. Comparisons of typical word associations between Chinese and Western learners indicated that there was a degree of divergence in the way these two participant groups understood TCM terms.
11

Chen, Jinyan, Susanne Becken, and Bela Stantic. "Lexicon based Chinese language sentiment analysis method." Computer Science and Information Systems 16, no. 2 (2019): 639–55. http://dx.doi.org/10.2298/csis181015013c.

Abstract:
The growing number of social media users and the vast volume of posts could provide valuable information about sentiment toward different locations, services, and people. Recent advances in Big Data analytics and natural language processing often make it possible to calculate sentiment in these posts automatically. Sentiment analysis is a challenging and computationally demanding task due to the volume of data, misspellings, emoticons, and abbreviations. While significant work has been directed toward the sentiment analysis of English text, the literature has paid limited attention to sentiment analysis of the Chinese language. In this work we propose a method to identify the sentiment in Chinese social media posts; to test it, we rely on posts about the Great Barrier Reef (GBR) sent by visitors using Sina Weibo, the most popular Chinese social media platform. We elaborate the process of capturing Weibo posts, describe the creation of a lexicon, and develop and explain an algorithm for sentiment calculation. In a case study of sentiment toward different GBR destinations, we demonstrate that the proposed method is effective in obtaining the information and is suitable for monitoring visitors' opinions.
12

Jin, Lingzi, Zuohua Ding, and Huihui Zhou. "Evaluation of Chinese Natural Language Processing System Based on Metamorphic Testing." Mathematics 10, no. 8 (April 12, 2022): 1276. http://dx.doi.org/10.3390/math10081276.

Abstract:
A natural language processing system can realize effective communication between human and computer with natural language. Because its evaluation method relies on a large amount of labeled data and human judgment, the question of how to systematically evaluate its quality is still a challenging task. In this article, we use metamorphic testing technology to evaluate natural language processing systems from the user’s perspective to help users better understand the functionalities of these systems and then select the appropriate natural language processing system according to their specific needs. We have defined three metamorphic relation patterns. These metamorphic relation patterns respectively focus on some characteristics of different aspects of natural language processing. Moreover, on this basis, we defined seven metamorphic relations and chose three tasks (text similarity, text summarization, and text classification) to evaluate the quality of the system. Chinese is used as target language. We extended the defined abstract metamorphic relations to these tasks, and seven specific metamorphic relations were generated for each task. Then, we judged whether the metamorphic relations were satisfied for each task, and used them to evaluate the quality and robustness of the natural language processing system without reference output. We further applied the metamorphic test to three mainstream natural language processing systems (including BaiduCloud API, AliCloud API, and TencentCloud API), and on the PWAS-X datasets, LCSTS datasets, and THUCNews datasets. Experiments were carried out, revealing the advantages and disadvantages of each system. These results further show that the metamorphic test can effectively test the natural language processing system without annotated data.
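To make the idea of testing without reference output concrete, here is a minimal sketch of one metamorphic relation for a text-similarity task. The relation (swapping the two inputs should not change the score) is an assumed example and not necessarily one of the seven relations defined in the paper. Both scoring functions are toy stand-ins written for this sketch; no cloud API is called.

```python
def char_jaccard(a, b):
    """Toy similarity system: Jaccard overlap of the character sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def length_ratio(a, b):
    """Deliberately flawed toy system: the score depends on argument order."""
    return len(a) / (len(a) + len(b))

def symmetry_violations(similarity, pairs, tol=1e-9):
    """Return the input pairs for which similarity(a, b) != similarity(b, a).

    No labeled 'correct' score is needed: the check only compares the
    system's output on an input with its output on a transformed input.
    """
    return [(a, b) for a, b in pairs
            if abs(similarity(a, b) - similarity(b, a)) > tol]

# Made-up sentence pairs, for illustration only.
pairs = [("我喜欢北京", "我爱北京"), ("今天天气很好", "今天天气不错")]

print(symmetry_violations(char_jaccard, pairs))   # no violations expected
print(symmetry_violations(length_ratio, pairs))   # flags the first pair
```

The flawed system is caught on the pair whose sentences differ in length, even though no human-annotated similarity score exists for either pair.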
13

Xu, Fan, Yangjie Dan, Keyu Yan, Yong Ma, and Mingwen Wang. "Low-Resource Language Discrimination toward Chinese Dialects with Transfer Learning and Data Augmentation." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 2 (March 31, 2022): 1–21. http://dx.doi.org/10.1145/3473499.

Abstract:
Chinese dialects discrimination is a challenging natural language processing task due to scarce annotation resource. In this article, we develop a novel Chinese dialects discrimination framework with transfer learning and data augmentation (CDDTLDA) in order to overcome the shortage of resources. To be more specific, we first use a relatively larger Chinese dialects corpus to train a source-side automatic speech recognition (ASR) model. Then, we adopt a simple but effective data augmentation method (i.e., speed, pitch, and noise disturbance) to augment the target-side low-resource Chinese dialects, and fine-tune another target ASR model based on the previous source-side ASR model. Meanwhile, the potential common semantic features between source-side and target-side ASR models can be captured by using self-attention mechanism. Finally, we extract the hidden semantic representation in the target ASR model to conduct Chinese dialects discrimination. Our extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialects corpora.
14

Ding, Chen, and Barry Lee Reynolds. "The effects of L1 congruency, L2 proficiency, and the collocate-node relationship on the processing of L2 English collocations by L1-Chinese EFL learners." Review of Cognitive Linguistics 17, no. 2 (December 31, 2019): 331–57. http://dx.doi.org/10.1075/rcl.00038.din.

Abstract:
This study investigated the effects of first language (L1) congruency, second language (L2) proficiency, and the collocate-node relationship (i.e., verb-noun, adjective-noun, noun-noun) on collocation processing by logographic L1-Chinese learners of English. Comparisons were made of accuracy rates and response times to a collocation lexical decision task completed by L1-Chinese English as a Foreign Language (EFL) English Majors (n = 30), L1-Chinese EFL non-English Majors (n = 30), and L1-English Native Speakers (n = 26). Analysis of the data revealed that while congruent collocations were processed more accurately and faster than incongruent collocations by both L1-Chinese participant groups, the English Majors showed a processing advantage over their non-English Major peers. Further analysis revealed a processing advantage for noun-noun collocations, providing additional evidence in explaining the difficulties L1-Chinese have in acquiring verb-noun collocations. These results and other nuanced statistical findings are discussed in relation to pedagogical means of enhancing L2 collocation acquisition by L1-Chinese speakers.
15

Zhao, Shuai, Fucheng You, Wen Chang, Tianyu Zhang, and Man Hu. "Augment BERT with average pooling layer for Chinese summary generation." Journal of Intelligent & Fuzzy Systems 42, no. 3 (February 2, 2022): 1859–68. http://dx.doi.org/10.3233/jifs-211229.

Abstract:
The BERT pre-trained language model has achieved good results in various subtasks of natural language processing, but its performance in generating Chinese summaries is not ideal. The most intuitive reason is that the BERT model is based on character-level composition, while the Chinese language is mostly in the form of phrases. Directly fine-tuning the BERT model cannot achieve the expected effect. This paper proposes a novel summary generation model with BERT augmented by the pooling layer. In our model, we perform an average pooling operation on token embedding to improve the model’s ability to capture phrase-level semantic information. We use LCSTS and NLPCC2017 to verify our proposed method. Experimental data shows that the average pooling model’s introduction can effectively improve the generated summary quality. Furthermore, different data needs to be set with varying pooling kernel sizes to achieve the best results through comparative analysis. In addition, our proposed method has strong generalizability. It can be applied not only to the task of generating summaries, but also to other natural language processing tasks.
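The pooling operation this abstract describes can be sketched in a few lines. This is only an illustration of average pooling over a sequence of token embeddings with a sliding window; the two-dimensional vectors are made up, the function name and the stride parameter are assumptions, and how the paper wires the pooled output back into BERT is not shown.

```python
def average_pool(embeddings, kernel_size, stride=1):
    """Average neighbouring token embeddings with a sliding window.

    embeddings: list of equal-length vectors, one per token (character).
    Returns one averaged vector per window position.
    """
    if kernel_size < 1 or kernel_size > len(embeddings):
        raise ValueError("kernel_size must be between 1 and the sequence length")
    dim = len(embeddings[0])
    pooled = []
    for start in range(0, len(embeddings) - kernel_size + 1, stride):
        window = embeddings[start:start + kernel_size]
        pooled.append([sum(vec[d] for vec in window) / kernel_size
                       for d in range(dim)])
    return pooled

# Made-up 2-dimensional embeddings for four characters (illustrative only).
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
pooled = average_pool(tokens, kernel_size=2)
print(pooled)
```

With a kernel size of 2, each output vector mixes two adjacent characters, which is one simple way to expose phrase-level information to later layers. A larger kernel covers longer phrases, consistent with the abstract's remark that the best kernel size depends on the data.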
16

Zheng, Zihui. "Logical Intelligent Detection Algorithm of Chinese Language Articles Based on Text Mining." Mobile Information Systems 2021 (December 16, 2021): 1–10. http://dx.doi.org/10.1155/2021/8115551.

Abstract:
With the advent of the big data era and the rapid development of the Internet industry, the information processing technology of text mining has come to play an indispensable role in natural language processing. In our daily life, many things cannot be separated from natural language processing technology, such as machine translation, intelligent response, and semantic search. At the same time, with the development of artificial intelligence, text mining technology has gradually developed into a research hotspot. There are many ways to realize text mining. This paper mainly describes the realization of web text mining and of a text structure algorithm based on HTML, and uses a variety of methods to compare the clustering time of web text mining. This comparison also shows which web mining approach is the most efficient. Repeated experimental comparisons on the WebKB datasets also show that Web text mining provides a basis for the Chinese language logic intelligent detection algorithm.
17

Ji, Hongbo, and Christina L. Gagné. "Lexical and relational influences on the processing of Chinese modifier-noun compounds." Mental Lexicon 2, no. 3 (December 7, 2007): 387–417. http://dx.doi.org/10.1075/ml.2.3.05ji.

Abstract:
The data from three experiments indicate that recent exposure to a similar Chinese modifier-noun compound (e.g., 书柜, book cabinet or 饼店, cookie store) influences the ease of processing a subsequent compound (e.g., 书店, book store) by increasing the availability of the lexical entries for the individual constituents, and by altering the availability of the relation (e.g., noun FOR modifier) used to bind the two constituents. The results imply that theories and models about compound processing should take the representation of relational information into account and should be able to accommodate the influence of relation availability.
18

Goswami, Usha, H. L. Sharon Wang, Alicia Cruz, Tim Fosker, Natasha Mead, and Martina Huss. "Language-universal Sensory Deficits in Developmental Dyslexia: English, Spanish, and Chinese." Journal of Cognitive Neuroscience 23, no. 2 (February 2011): 325–37. http://dx.doi.org/10.1162/jocn.2010.21453.

Abstract:
Studies in sensory neuroscience reveal the critical importance of accurate sensory perception for cognitive development. There is considerable debate concerning the possible sensory correlates of phonological processing, the primary cognitive risk factor for developmental dyslexia. Across languages, children with dyslexia have a specific difficulty with the neural representation of the phonological structure of speech. The identification of a robust sensory marker of phonological difficulties would enable early identification of risk for developmental dyslexia and early targeted intervention. Here, we explore whether phonological processing difficulties are associated with difficulties in processing acoustic cues to speech rhythm. Speech rhythm is used across languages by infants to segment the speech stream into words and syllables. Early difficulties in perceiving auditory sensory cues to speech rhythm and prosody could lead developmentally to impairments in phonology. We compared matched samples of children with and without dyslexia, learning three very different spoken and written languages, English, Spanish, and Chinese. The key sensory cue measured was rate of onset of the amplitude envelope (rise time), known to be critical for the rhythmic timing of speech. Despite phonological and orthographic differences, for each language, rise time sensitivity was a significant predictor of phonological awareness, and rise time was the only consistent predictor of reading acquisition. The data support a language-universal theory of the neural basis of developmental dyslexia on the basis of rhythmic perception and syllable segmentation. They also suggest that novel remediation strategies on the basis of rhythm and music may offer benefits for phonological and linguistic development.
19

Wong, Wing Sze Winsy, and Sam Po Law. "Relationship Between Cognitive Functions and Multilevel Language Processing: Data From Chinese Speakers With Aphasia and Implications." Journal of Speech, Language, and Hearing Research 65, no. 3 (March 8, 2022): 1128–44. http://dx.doi.org/10.1044/2021_jslhr-21-00381.

Abstract:
Purpose: This study aims to investigate the relationship between nonverbal cognitive functions and language processing of people with aphasia (PWA) by taking a data-driven approach, as well as multiple cognitive components and multilevel linguistic perspectives. It is hypothesized that language performance is differentially associated with cognitive processing of PWA, with executive functions (EFs) playing a stronger role in language tasks with increasing linguistic complexity. Method: A language battery assessing word comprehension/production, sentence comprehension, and discourse production, together with a series of nonlinguistic cognitive tasks targeting simple/complex attention, short-term/working memory, or EFs, was administered to 53 Cantonese-speaking PWA. Cognitive factors extracted from principal component analysis applied to the cognitive battery served as predictors in four multiple regression analyses to predict PWA's performance at various linguistic levels. Results: Two cognitive factors, representing (a) simple attention and memory and (b) EF, were extracted. The former predicted performance in word processing tasks, whereas EF significantly predicted performance in all language tasks with increasing contribution as a function of linguistic complexity. Conclusion: The results based on Chinese PWA provide comprehensive evidence for the view that language performance is the end product of interaction between linguistic and nonlinguistic functions and have clear implications for clinical management of PWA.
20

Juffs, Alan. "Some effects of first language argument structure and morphosyntax on second language sentence processing." Second Language Research 14, no. 4 (October 1998): 406–24. http://dx.doi.org/10.1191/026765898668800317.

Abstract:
This article explores some effects of first language verb-argument structure on second language processing of English as a second language. Speakers of Chinese, Japanese or Korean, three Romance languages and native English speakers provided word-by-word reading times and grammaticality judgement data in a self-paced reading task. Results suggest that reliable differences in parsing are not restricted to cases where verb-argument structure differs crosslinguistically.
21

Tang, Liqun, Qiang Liu, Wanjiang Yang, and Jianying Wang. "Do agricultural services contribute to cost saving? Evidence from Chinese rice farmers." China Agricultural Economic Review 10, no. 2 (May 8, 2018): 323–37. http://dx.doi.org/10.1108/caer-06-2016-0082.

Abstract:
Purpose: The purpose of this paper is to clarify agricultural services into five categories, including agricultural materials supply service, financial service, technical service, machinery service and processing and sales service, and to examine the effect of agricultural services on cost saving of rice production in China. Design/methodology/approach: Based on a three-year panel data set covering 3,421 rice farmers in 12 Chinese provinces collected from the state rice industry experiment stations’ fixed watch points of China Agriculture Research System, a stochastic frontier model which takes the price vectors of input variables into cost function is developed by stochastic frontier analysis method in the study. Findings: There is a deviation between the actual cost and the minimum cost on rice production in China due to the loss of cost efficiency, whose score is 0.7983 at the mean. Agricultural services can help improve cost efficiency, thus contributing to cost saving. Specifically, the effect of technical service on cost saving is the highest, followed by processing and sales service, machinery service, financial service and agricultural materials supply service. Originality/value: The results of this paper are of great significance to the effectiveness and efficiency of the targeted agricultural services and indicate implications for policy improvement under the context of clear upward trend of agricultural production costs.
22

Gries, Stefan Th, and Stefanie Wulff. "The genitive alternation in Chinese and German ESL learners." International Journal of Corpus Linguistics 18, no. 3 (October 28, 2013): 327–56. http://dx.doi.org/10.1075/ijcl.18.3.04gri.

Abstract:
This paper exemplifies an approach to learner corpus data that adopts a multifactorial definition of ‘context’. We apply a logistic regression to 2,986 attestations of genitive alternation (the squirrel’s nest vs. the nest of the squirrel) from the Chinese and German sub-sections of the International Corpus of Learner English and the British component of the International Corpus of English that were coded for 12 factors. Importantly, the speakers’ L1 was included as a predictor to be able to compare properly the native speakers with the learners as well as the two learner groups with each other. The final regression model predicts all speakers’ genitive choices very accurately (> 93%) and suggests that (i) the learners rely heavily on processing-related factors, which can be overridden by semantic constraints, and (ii) learners’ choices are differentially modulated by their L1. We close with a discussion of how this context-based, multifactorial approach goes beyond traditional learner corpus research.
23

Lu, Xinchao. "Propositional information loss in English-to-Chinese simultaneous conference interpreting." Babel. Revue internationale de la traduction / International Journal of Translation 64, no. 5-6 (December 31, 2018): 792–818. http://dx.doi.org/10.1075/babel.00070.lu.

Abstract:
Simultaneous Interpreting (SI) as a profession has been gaining momentum in China, but little has been researched on Chinese professional conference interpreting on the basis of a large quantity of empirical data. This study adopts an information-based SI fidelity assessment approach to probe into the propositional information loss in an SI corpus of seventeen English(B)-Chinese(A) simultaneous interpreters’ interpretations, and through stimulated retrospective interviews of three conference interpreters. Results show that operational constraints (concurrent listening and speaking, time constraint and incremental processing), source language factors (speed, information density, accent, linguistic complexity, technicality, etc.) and interpreting direction (B to A), etc., account for typical propositional omission, incompletion or error.
24

Zhang, Juan, Chenggang Wu, Tiemin Zhou, and Yaxuan Meng. "Cognate facilitation priming effect is modulated by writing system: Evidence from Chinese-English bilinguals." International Journal of Bilingualism 23, no. 2 (January 10, 2018): 553–66. http://dx.doi.org/10.1177/1367006917749062.

Abstract:
Aims: The present study aims to examine the cross-script cognate facilitation effect that cognates have processing advantages over non-cognates and this effect is strong evidence supporting the non-selective access hypothesis for bilinguals. Methodology: By adopting a masked translation priming paradigm, Experiment 1 used 48 Chinese–English cognates (Chinese words) and 48 non-cognates (Chinese words) as primes and their English translation equivalences as targets. Chinese–English bilinguals were instructed to judge whether the target stimuli were real words or not. In Experiment 2, another group of participants took the same lexical decision task as in Experiment 1, except that English–Chinese cognates and non-cognates (English words) served as primes and their Chinese translation equivalences were targets. Data and analysis: Response latency and accuracy data were submitted to a repeated-measures analysis of variance. Findings/conclusions: Experiment 1 showed that Chinese–English cognates (Chinese words) and non-cognates (Chinese words) produced similar priming effect, while Experiment 2 revealed that English–Chinese cognates (English words) generated a significant priming effect, whereas non-cognates (English words) failed to induce any priming effect. Overall, Chinese words did not show cognate advantage, while English words produced a significant cognate facilitation effect. These results might be attributed to different mappings from orthography to phonology in English and Chinese. Opaque mapping from orthography to phonology in Chinese hindered phonological activation and reduced Chinese–English cognate phonological priming effect. However, English–Chinese cognates benefited from transparent mapping from sound to print and thus generated a significant phonological priming effect. Implications of the current findings for bilingual word recognition models were discussed. 
Originality: The present study is the first to investigate the cross-script cognate facilitation effect by ensuring both the heterogeneity of primes and targets (English and Chinese) and the homogeneity of primes (Chinese or English). The results indicated that the writing systems of the primes constrained the cross-script cognate priming effect.
25

Xu, Qing, and Zhiyou Wang. "A Data-Driven Model for Automated Chinese Word Segmentation and POS Tagging." Computational Intelligence and Neuroscience 2022 (September 16, 2022): 1–10. http://dx.doi.org/10.1155/2022/7622392.

Abstract:
Chinese natural language processing tasks often require the solution of Chinese word segmentation and POS tagging problems. Traditional Chinese word segmentation and POS tagging methods mainly use simple matching algorithms based on lexicons and rules. The simple matching or statistical analysis requires manual word segmentation followed by POS tagging, which leads to the inability to meet the practical requirements for label prediction accuracy. With the continuous development of deep learning technology, data-driven machine learning models provide new opportunities for automated Chinese word segmentation and POS tagging. Therefore, a data-driven automated Chinese word segmentation and POS tagging model is proposed in order to address the above problems. Firstly, the main idea and overall framework of the proposed automated model are outlined, and the tagging strategy and neural network language model used are described. Secondly, two main optimisations are made on the input side of the model: (1) the use of word2Vec for the representation of text features, thus representing the text as a distributed word vector; and (2) the use of an improved AlexNet for efficient encoding of long-range word, and the addition of an attention mechanism to the model. Finally, on the output side, an additional auxiliary loss function was designed to optimise the Chinese text based on its frequency. The experimental results show that the proposed model can significantly improve the accuracy and operational efficiency of Chinese word segmentation and POS tagging compared with other existing models, thus verifying its effectiveness and advancement.
26

Jin, Peng, John Carroll, Yunfang Wu, and Diana McCarthy. "Distributional Similarity for Chinese: Exploiting Characters and Radicals." Mathematical Problems in Engineering 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/347257.

Abstract:
Distributional Similarity has attracted considerable attention in the field of natural language processing as an automatic means of countering the ubiquitous problem of sparse data. As a logographic language, Chinese words consist of characters and each of them is composed of one or more radicals. The meanings of characters are usually highly related to the words which contain them. Likewise, radicals often make a predictable contribution to the meaning of a character: characters that have the same components tend to have similar or related meanings. In this paper, we utilize these properties of the Chinese language to improve Chinese word similarity computation. Given a content word, we first extract similar words based on a large corpus and a similarity score for ranking. This rank is then adjusted according to the characters and components shared between the similar word and the target word. Experiments on two gold standard datasets show that the adjusted rank is superior and closer to human judgments than the original rank. In addition to quantitative evaluation, we examine the reasons behind errors drawing on linguistic phenomena for our explanations.
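The rank adjustment described in this abstract can be illustrated with a small sketch. It is not the authors' formula: the additive `bonus` weight and the function name `rerank_by_shared_characters` are assumptions made for illustration, and only shared characters are counted, because radical overlap would need a character decomposition table that is not given here.

```python
def rerank_by_shared_characters(target, candidates, bonus=0.1):
    """Re-rank distributionally similar words by character overlap.

    target     -- the target word (a string of Chinese characters)
    candidates -- list of (word, similarity_score) pairs from a corpus model
    bonus      -- hypothetical weight added per character shared with target
    """
    target_chars = set(target)
    adjusted = []
    for word, score in candidates:
        # Count distinct characters the candidate shares with the target.
        shared = len(target_chars & set(word))
        adjusted.append((word, score + bonus * shared))
    # Highest adjusted score first.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = rerank_by_shared_characters("医院", [("学校", 0.50), ("医生", 0.45)])
    print([word for word, _ in ranked])
```

In this toy input, the candidate that shares a character with the target moves ahead of a candidate with a slightly higher raw similarity score.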
27

Zhou, Changyin, and Yuhuan Zhang. "An ERP Study on the Processing of Chinese Applied-Object Structures." Chinese Journal of Applied Linguistics 41, no. 2 (June 26, 2018): 204–17. http://dx.doi.org/10.1515/cjal-2018-0012.

Abstract:
Verb-argument relation is a very important aspect of syntax-semantics interaction in sentence processing. Previous ERP (event related potentials) studies in this field concentrated on the relation between the verb and its core arguments. The present study aims to reveal the ERP pattern of Chinese applied object structures (AOSs), in which a peripheral argument is promoted to occupy the position of the patient object, as compared with the patient object structures (POSs). The ERP data were collected when participants were asked to perform acceptability judgments about Chinese phrases. The result shows that, similar to the previous studies of number-of-argument violations, Chinese AOSs show a bilaterally distributed N400 effect. But different from all the previous studies of verb-argument relations, Chinese AOSs demonstrate a sustained anterior positivity (SAP). This SAP, which is very rare in the studies related to complexity of argument structure operation, reflects the integration difficulty of the newly promoted arguments and the progressive nature of well-formedness checking in the processing of Chinese AOSs which is in accordance with the metonymic mechanism of non-patient objects in the relevant cognitive study. It shows that, in Chinese, which is a paratactic language, semantics (thematic roles) plays a more important role in the syntax-semantics interface than that in hypotactic languages.
28

Yu, Chenghai, Shupei Wang, and Jiajun Guo. "Learning Chinese Word Segmentation Based on Bidirectional GRU-CRF and CNN Network Model." International Journal of Technology and Human Interaction 15, no. 3 (July 2019): 47–62. http://dx.doi.org/10.4018/ijthi.2019070104.

Abstract:
Chinese word segmentation is the basis of the Chinese natural language processing (NLP). With the development of the deep learning, various neural network models are applied to the Chinese word segmentation. However, current neural network models have the characteristics of artificial feature extraction, nonstandard word-weight, inability to effectively use long-distance information and long training time of models in Chinese word segmentation. To solve a series of problems, this article presents a CNN-Bidirectional GRU-CRF neural network model (CNN Bidirectional GRU CRF Network, CBiGCN), which breaks through the limit of conventional method window, truly realizes end-to-end processing and applies to the neural network model by the five-Tag set method, bias-variable-weight greedy strategy and supplements by Goldstein-Armijo guidelines. Besides, this model, with simple structure, is easy to be operated. And it can automatically learn features, reduces large amounts of tasks on specific knowledge in the form of handcrafted features and data pre-processing, makes use of context information effectively. The authors set an experiment with two data corpuses for Chinese word segmentation to evaluate their system. The experiment verified their new model can obtain better Chinese word segmentation results and greatly reduce training time.
29

Yao, Qin, and Claire Renaud. "Processing Chinese relative clauses: An investigation of second-language learners from different learning contexts." Chinese as a Second Language Research 5, no. 2 (October 1, 2016): 155–86. http://dx.doi.org/10.1515/caslar-2016-0007.

Abstract:
The goal of this study is to examine the processing of Chinese relative clauses (RCs) through a self-paced reading task and to determine whether the learning environment plays a role in the second-language (L2) acquisition of RCs. We investigated two types of RCs (subject vs. object RCs) along with two positions in which a RC can occur (modifying a matrix subject noun phrase [NP] vs. a matrix object NP). Eighteen native speakers of Chinese and twenty-one L2 learners at an intermediate proficiency level participated in the study. Ten learners were students learning Chinese in the US (i.e., in a foreign-language context), whereas the other eleven learners were students studying Chinese in China (i.e., in a study-abroad context). The comprehension of sentences containing a RC and reading times (RTs) on the RC and the head noun (the segment immediately following the RC) were analyzed. The results show distinct patterns for the learners and the native speakers. The accuracy data reveals that the L2 learners in China performed better than the L2 learners in the US. Additionally, the L2 learners in China exhibited a processing speed advantage to the L2 learners in the US. The RT data highlighted important asymmetries in the L2 learners in the US and the native speakers, while the results were flat for the L2 learners in China. Specifically, L2 learners in the US took longer to read object RCs than subject RCs while the opposite pattern was obtained for the L1 speakers. Moreover, matrix-object-modifying RCs revealed shorter RTs than matrix-subject-modifying RCs for L2 learners in the US, whereas the opposite pattern was found for the L1 speakers. These findings are discussed in light of the Linear Distance Theory and the Structural Distance Theory (e.g., O’Grady 1997. Syntactic development. Chicago: University of Chicago Press).
Overall, these results seem to provide support to the assumption that changes in syntactic processing happen as a result of exposure to the language environment (Cuetos et al. 1996. Parsing in different languages. In Manuel Carreias, Jose E. Garcia-Albea & Nuria Sebastien-Galles (eds.), Language processing in Spanish, 145–187. Mahwah, NJ: Erlbaum; Frenck–Mestre 2002. An on-line look at sentence processing in the second language. In Roberto Heredia & Jeanette Altarriba (eds.), Bilingual sentence processing, 217–236. Amsterdam: Elsevier Science Publishers.).
30

Yang, Xiao. "Application of Speech Recognition Technology in Chinese English Simultaneous Interpretation of Law." International Journal of Circuits, Systems and Signal Processing 16 (March 30, 2022): 956–63. http://dx.doi.org/10.46300/9106.2022.16.117.

Abstract:
Speech recognition is an important research field in natural language processing. In Chinese and English, which have rich data resources, the performance of end-to-end speech recognition model is close to that of Hidden Markov Model—Deep Neural Network (HMM-DNN) model. However, for the low resource speech recognition task of Chinese English hybrid, the end-to-end speech recognition system does not achieve good performance. In the case of limited mixed data between Chinese and English, the modeling method of end-to-end speech recognition is studied. This paper focuses on two end-to-end speech recognition models: connection timing distribution and attention based codec network. In order to improve the performance of Chinese English hybrid speech recognition, this paper studies how to improve the performance of the coder based on connection timing distribution model and attention mechanism, and tries to combine the two models to improve the performance of Chinese English hybrid speech recognition. In low resource Chinese English mixed data, the advantages of different models are used to improve the performance of end-to-end models, so as to improve the recognition accuracy of speech recognition technology in legal Chinese English simultaneous interpretation.
31

Yang, Guiyun. "Features of using e-resources when learning Chinese as a second foreign language in secondary school." RUDN Journal of Informatization in Education 19, no. 3 (December 15, 2022): 171–82. http://dx.doi.org/10.22363/2312-8631-2022-19-3-171-182.

Abstract:
Problem and goal. A priority aspect of learning Chinese as a foreign language is the use of electronic educational complexes and materials. However, in the context of the complex digitalization of society, special factors appear that should be taken into account when developing and implementing appropriate electronic educational tools. The purpose is to explore the features and potential of e-resources for teaching Chinese as a foreign language in secondary school. Methodology. Theoretical analysis and generalization of the literature are used to describe the conditions for effective teaching of the Chinese language, and the problems of improving the quality of students' educational results. The experiment involved 52 students from the Vyatka Humanitarian Gymnasium. Learning Chinese as a foreign language is supported by StudyChinese.ru, Chinese Boost, Shibushi.ru services. Fisher's criterion was used for statistical data processing. Results. In the experimental group, primary school students used e-resources for comprehensive informatization at all stages of learning Chinese as a foreign language (speaking, listening, reading, writing, intercultural communication and collaboration). An assessment of learning outcomes was made and statistically significant differences in the qualitative changes that occurred in the pedagogical system were revealed. The features of using e-resources for integrated informatization in the study of Chinese as a second foreign language in secondary school are described. Conclusion. The types of activities and interactive exercises in the information environment are formulated, which most effectively work to improve the quality of teaching Chinese as a second foreign language.
32

Sun, Weiwei, and Xiaojun Wan. "Towards Accurate and Efficient Chinese Part-of-Speech Tagging." Computational Linguistics 42, no. 3 (September 2016): 391–419. http://dx.doi.org/10.1162/coli_a_00253.

Abstract:
From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Despite the effectiveness to boost accuracy, computationally expensive parsers make hybrid systems inappropriate for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are utilized to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high accuracy with respect to per token classification, but also serve as a front-end to a parser well.
33

Wang, Siting, Fuman Song, Qinqun Qiao, Yuanyuan Liu, Jiageng Chen, and Jun Ma. "A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data." Healthcare 10, no. 6 (June 15, 2022): 1119. http://dx.doi.org/10.3390/healthcare10061119.

Abstract:
(1) Background: Poor adherence to management behaviors in Chinese Type 2 diabetes mellitus (T2DM) patients leads to an uncontrolled prognosis of diabetes, which results in significant economic costs for China. It is imperative to quickly locate vulnerability factors in the management behavior of patients with T2DM. (2) Methods: In this study, a thematic analysis of the collected interview materials was conducted to construct the themes of T2DM management vulnerability. We explored the applicability of the pre-trained models based on the evaluation metrics in text classification. (3) Results: We constructed 12 themes of vulnerability related to the health and well-being of people with T2DM in Tianjin. We considered that Bidirectional Encoder Representation from Transformers (BERT) performed better in this Natural Language Processing (NLP) task with a shorter completion time. With the splitting ratio of 6:3:1 and batch size of 64 for BERT, the test accuracy was 97.71%, the completion time was 10 min 24 s, and the macro-F1 score was 0.9752. (4) Conclusions: Our results proved the applicability of NLP techniques in this specific Chinese-language medical environment. We filled the knowledge gap in the application of NLP technologies in diabetes management. Our study provided strong support for using NLP techniques to rapidly locate vulnerability factors in T2DM management.
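The two evaluation metrics reported in this abstract, test accuracy and macro-F1, can be computed from predicted and true labels as sketched below. This is a generic illustration and not the study's evaluation code; the function names and the toy labels are invented for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    y_true = ["a", "a", "b", "b"]  # toy labels, not data from the study
    y_pred = ["a", "b", "b", "b"]
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because macro-F1 averages over classes without weighting by class size, it can diverge from accuracy when the themes are unevenly represented.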
34

Jiang, Nan, Guiling Hu, Anna Chrabaszcz, and Lijuan Ye. "The activation of grammaticalized meaning in L2 processing: Toward an explanation of the morphological congruency effect." International Journal of Bilingualism 21, no. 1 (July 27, 2016): 81–98. http://dx.doi.org/10.1177/1367006915603823.

Abstract:
Objectives: The study was intended to test the hypothesis that L2 speakers have difficulty in automatically activating a grammaticalized L2 meaning that is not morphologically marked in L1. Methodology: The study consisted of three experiments. A sentence–picture matching task was designed to assess the activation of grammaticalized meaning. The participants were asked to judge if a sentence correctly described the physical relationships of three objects in a picture. Hidden in the stimuli that required a positive response was a number agreement manipulation whereby a noun phrase in the sentence may agree or disagree with the number of objects in the picture. A number disagreement effect, as shown in a delay in producing a positive response on items of number disagreement was used to assess automatic activation of number meanings. Data and Analysis: The data constituted reaction times and accuracy rates from 32 English native speakers, 36 Chinese native speakers, 54 Chinese–English bilinguals, and 26 Russian–English bilinguals. Analyses of variance were performed in analyzing these data. Findings: The results showed a number disagreement effect in L1 and L2 among Russian English as a second language (ESL) speakers only. Chinese ESL speakers showed no difference between the two critical conditions in either language. A follow-up experiment showed that Chinese ESL speakers had no difficulty in automatically activating number meanings which were expressed lexically in English sentence processing. These findings provided support for the idea that the well documented difficulty L2 learners have in learning incongruent L2 inflectional morphemes may have to do with their difficulty in automatically activating a grammaticalized meaning that is not grammaticalized in their L1. Originality: The sentence–picture matching task represented a unique and effective approach to the study of the activation of grammaticalized meanings. 
Significance: The findings from the study represented some first psycholinguistic evidence regarding the activation of grammaticalized meanings among non-native speakers.
35

Si, Xiaopeng, Wenjing Zhou, and Bo Hong. "Cooperative cortical network for categorical processing of Chinese lexical tone." Proceedings of the National Academy of Sciences 114, no. 46 (October 30, 2017): 12303–8. http://dx.doi.org/10.1073/pnas.1710752114.

Abstract:
In tonal languages such as Chinese, lexical tone with varying pitch contours serves as a key feature to provide contrast in word meaning. Similar to phoneme processing, behavioral studies have suggested that Chinese tone is categorically perceived. However, its underlying neural mechanism remains poorly understood. By conducting cortical surface recordings in surgical patients, we revealed a cooperative cortical network along with its dynamics responsible for this categorical perception. Based on an oddball paradigm, we found amplified neural dissimilarity between cross-category tone pairs, rather than between within-category tone pairs, over cortical sites covering both the ventral and dorsal streams of speech processing. The bilateral superior temporal gyrus (STG) and the middle temporal gyrus (MTG) exhibited increased response latencies and enlarged neural dissimilarity, suggesting a ventral hierarchy that gradually differentiates the acoustic features of lexical tones. In addition, the bilateral motor cortices were also found to be involved in categorical processing, interacting with both the STG and the MTG and exhibiting a response latency in between. Moreover, the motor cortex received enhanced Granger causal influence from the semantic hub, the anterior temporal lobe, in the right hemisphere. These unique data suggest that there exists a distributed cooperative cortical network supporting the categorical processing of lexical tone in tonal language speakers, not only encompassing a bilateral temporal hierarchy that is shared by categorical processing of phonemes but also involving intensive speech–motor interactions over the right hemisphere, which might be the unique machinery responsible for the reliable discrimination of tone identities.
36

Qiao, Yong Zhong, and Si Wen Liu. "Research on the Technical Fields Distribution of Patents Licensing of Chinese Firms in the Next-Generation Information Technology Industry." Applied Mechanics and Materials 530-531 (February 2014): 1142–45. http://dx.doi.org/10.4028/www.scientific.net/amm.530-531.1142.

Abstract:
By analyzing the patent licensing data of Huawei, Lenovo, ZTE and Datang in the next-generation information technology industry in China, the following conclusions can be drawn: Huawei's and Lenovo's technical fields of patent licensing concentrate, respectively, on the manufacture of assemblages of electrical components and on speech analysis or synthesis, speech recognition, speech or voice processing, and speech or audio coding or decoding; ZTE's and Datang's technical fields of patent licensing both concentrate on the field of selecting.
37

Zhu, Cong Hui, Shi Liang Wang, and De Quan Zheng. "Discriminate Chinese Word Segmenter with Global and Context Features." Applied Mechanics and Materials 198-199 (September 2012): 267–72. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.267.

Abstract:
A Chinese word segmenter is the basis for all subsequent applications of natural language processing. The corpus-based statistical method has become the predominant method. However, the training corpora are insufficient, especially in certain domains. Therefore, we introduce some global features and context features in order to obtain almost the same performance with a much smaller corpus. The experimental results show that our approach significantly outperforms the original feature sets on the same training data. Meanwhile, the time needed for model training is also reduced. In addition, these features do not depend on the classifier, so our method can easily be transferred to other models.
38

Liu, Jinping, and Hong Liu. "Construction of Medical Academic English Translation Model Driven by Bilingual Corpus-Based Data." Scientific Programming 2022 (May 9, 2022): 1–10. http://dx.doi.org/10.1155/2022/2264235.

Abstract:
With the rapid development of information collection technology and natural language processing technology, the construction of English-Chinese bilingual parallel corpus has developed rapidly. The scale of corpus and related technology have a large research space; and how to obtain effective data and knowledge from massive resources, in order to better serve the basic and applied research, is becoming a trend. Based on the research of parallel bilingual corpora at home and abroad, this article extracts the text features of medical English and constructs English-Chinese bilingual corpora with different levels aligned. Based on the statistical analysis of the distribution characteristics of phrase structures and the acquisition of characteristic knowledge, an English-Chinese bilingual translation model is constructed based on the bilingual corpus, and then the phrase structure knowledge in English and Chinese sentences is extracted from the data-driven English-Chinese bilingual corpus through the model. The results show that the accuracy of the test set is 91.63% and the F value is 90.05% under the condition of keeping the recall rate stable. The accuracy of the text structure features has been significantly improved, and the expected effect has been achieved through the test.
39

Liang, Yuzhi, Min Yang, Jia Zhu, and S. M. Yiu. "Out-domain Chinese new word detection with statistics-based character embedding." Natural Language Engineering 25, no. 2 (February 11, 2019): 239–55. http://dx.doi.org/10.1017/s1351324918000463.

Abstract:
Unlike English and other Western languages, many Asian languages such as Chinese and Japanese do not delimit words by space. Word segmentation and new word detection are therefore key steps in processing these languages. Chinese word segmentation can be considered as a part-of-speech (POS)-tagging problem. We can segment corpus by assigning a label for each character which indicates the position of the character in a word (e.g., “B” for word beginning, and “E” for the end of the word, etc.). Chinese word segmentation seems to be well studied. Machine learning models such as conditional random field (CRF) and bi-directional long short-term memory (LSTM) have shown outstanding performances on this task. However, the segmentation accuracies drop significantly when applying the same approaches to out-domain cases, in which high-quality in-domain training data are not available. An example of out-domain applications is the new word detection in Chinese microblogs for which the availability of high-quality corpus is limited. In this paper, we focus on out-domain Chinese new word detection. We first design a new method Edge Likelihood (EL) for Chinese word boundary detection. Then we propose a domain-independent Chinese new word detector (DICND); each Chinese character is represented as a low-dimensional vector in the proposed framework, and segmentation-related features of the character are used as the values in the vector.
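The character-labelling view of segmentation described in this abstract can be sketched in a few lines. The sketch only shows how position tags map back to words, not the authors' detector. It assumes the common four-tag scheme B/M/E/S (begin, middle, end, single-character word), of which the abstract names only "B" and "E", and the function name `tags_to_words` is hypothetical.

```python
def tags_to_words(chars, tags):
    """Rebuild words from per-character position tags.

    Assumed tag set: B (begin), M (middle), E (end), S (single-character word).
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts, so flush any unfinished word first.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # The word is complete after this character.
            words.append(current)
            current = ""
    if current:
        # Tolerate a tag sequence that ends without a closing E or S.
        words.append(current)
    return words


if __name__ == "__main__":
    print(tags_to_words("我喜欢北京", ["S", "B", "E", "B", "E"]))
```

A tagger such as a CRF or a bidirectional LSTM would predict the tag sequence; the decoding step above is the same regardless of which model produces the tags.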
40

Zhou, Lu, Shuangqiao Liu, Caiyan Li, Yuemeng Sun, Yizhuo Zhang, Yuda Li, Huimin Yuan, Yan Sun, Fengqin Xu, and Yuhang Li. "Natural Language Processing Algorithms for Normalizing Expressions of Synonymous Symptoms in Traditional Chinese Medicine." Evidence-Based Complementary and Alternative Medicine 2021 (October 11, 2021): 1–12. http://dx.doi.org/10.1155/2021/6676607.

Abstract:
Background. The modernization of traditional Chinese medicine (TCM) demands systematic data mining using medical records. However, this process is hindered by the fact that many TCM symptoms have the same meaning but different literal expressions (i.e., TCM synonymous symptoms). This problem can be solved by using natural language processing algorithms to construct a high-quality TCM symptom normalization model for normalizing TCM synonymous symptoms to unified literal expressions. Methods. Four types of TCM symptom normalization models, based on natural language processing, were constructed to find a high-quality one: (1) a text sequence generation model based on a bidirectional long short-term memory (Bi-LSTM) neural network with an encoder-decoder structure; (2) a text classification model based on a Bi-LSTM neural network and sigmoid function; (3) a text sequence generation model based on bidirectional encoder representation from transformers (BERT) with sequence-to-sequence training method of unified language model (BERT-UniLM); (4) a text classification model based on BERT and sigmoid function (BERT-Classification). The performance of the models was compared using four metrics: accuracy, recall, precision, and F1-score. Results. The BERT-Classification model outperformed the models based on Bi-LSTM and BERT-UniLM with respect to the four metrics. Conclusions. The BERT-Classification model has superior performance in normalizing expressions of TCM synonymous symptoms.
41

Koda, Keiko. "The role of phonemic awareness in second language reading." Second Language Research 14, no. 2 (April 1998): 194–215. http://dx.doi.org/10.1191/026765898676398460.

Abstract:
This study investigates the effects of disparate L1 (first language) alphabetic experience on L2 (second language) phonemic awareness and decoding among ESL (English as a Second Language) readers with alphabetic and nonalphabetic L1 orthographic backgrounds. It was hypothesized that amount of L1 alphabetic experience is causally related to the development of L2 phonemic awareness and decoding skills. The specific objectives were threefold: to compare varying aspects of phonemic awareness among Chinese and Korean ESL learners; to explore the relationship between L2 phonemic awareness and decoding skills; and to examine the extent to which L2 text comprehension is facilitated by phonemic awareness and decoding skills. Data demonstrated that the two groups differed neither in their phonemic awareness nor in decoding; phonemic awareness was differentially related to decoding performance between the groups; and strong interconnections existed between reading comprehension, decoding and phonemic awareness among Korean participants, but no such direct relationships occurred among Chinese. Viewed collectively, these findings seem to suggest that, while differential L1 orthographic experience is not directly associated with L2 phonemic awareness, variations in prior processing experience may engender the use of diverse phonological processing procedures and, thus, account for qualitative differences in L2 processing behaviours.
42

Egorova, Maia, Alexander Egorov, Tatyana Orlova, and Elizaveta Trifonova. "Methods of research of hieroglyphs on the oldest artifacts — introduction to problem: history, archeology, linguistics." OOO "Zhurnal "Voprosy Istorii" 2022, no. 3-1 (March 1, 2022): 17–25. http://dx.doi.org/10.31166/voprosyistorii202203statyi10.

Abstract:
Brief information is given about the Chinese language and Chinese language communities (from antiquity to the present day), and the history of the formation of the Chinese language is outlined. The oldest monuments of the Chinese language are the inscriptions on bones and tortoise shells (Jiaguwen), which were used for fortune telling, as well as inscriptions on bronze vessels (Jinwen). In this paper, for the study of hieroglyphic inscriptions on the most ancient artifacts, it is proposed to use a method based on photometry of the investigated surface of the samples and subsequent digital processing of the obtained data on a computer in order to determine the characteristics and parameters of the test sample. To test the method, we used onyx, jasper, and jade specimens, the surface of which is similar to the surface of ancient specimens. Some features of this method for the study of ancient artifacts are noted. The possibility of obtaining reliable results in the study of the most ancient hieroglyphic signs is shown.
43

Zhou, Shuchun. "Data Mining and Analysis of the Compatibility Law of Traditional Chinese Medicines Based on FP-Growth Algorithm." Journal of Mathematics 2021 (December 17, 2021): 1–10. http://dx.doi.org/10.1155/2021/1045152.

Abstract:
The compatibility law of prescriptions is the core link of TCM theory of “theory, method, prescription and medicine,” which is of great significance for guiding clinical practice, new drug development and revealing the scientific connotation of TCM theory, and is also one of the hot spots and difficulties of TCM modernization research. How to efficiently analyze the frequency of drug use, core combination, and association rules between drugs in prescription is a basic core problem in the study of prescription compatibility law. In this paper, a systematic study was made on the compatibility rules of traditional Chinese antiviral classical prescriptions and the mechanism of traditional Chinese medicine molecules. FP-growth algorithm was used to analyze association rules of 961 classical prescriptions collected and to explore the compatibility rules of traditional Chinese antiviral classical prescriptions. In terms of compatibility law of traditional Chinese antiviral prescriptions, this paper studied the compatibility law of traditional Chinese antiviral prescriptions based on the FP-growth algorithm and made exploratory research on the compatibility law information of 961 traditional classical antiviral prescriptions. Firstly, FP tree was constructed based on the classic recipe data set. Then, frequent item set rules were established, and association rules contained in FP tree were extracted. Finally, the frequency and association rules of antiviral TCM prescriptions were analyzed according to dosage forms (decoction, pill, paste, and ingot). The results show that the FP-growth algorithm adopted in this paper has excellent algorithm performance and strong generalization and robustness in the screening and mining of large-scale prescription data sets, which can provide important processing tools and technical methods for the study of the compatibility rule of traditional Chinese medicine prescriptions.
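The quantities this abstract refers to (frequency of use, frequent item sets, and association rules) can be illustrated with a small sketch. It is not the FP-growth algorithm and not the authors' implementation: it counts item sets by brute-force enumeration, which yields the same support and confidence values on small data but does not scale the way an FP tree does. The herb names, thresholds, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for item sets up to max_size (brute force, not FP-growth)."""
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # sort so each item set has one canonical tuple
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def pair_rules(supports, min_confidence):
    """Derive rules lhs -> rhs from frequent pairs; confidence = support(pair) / support(lhs)."""
    rules = []
    for itemset, support in supports.items():
        if len(itemset) != 2:
            continue
        a, b = itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / supports[(lhs,)]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical prescriptions; each one is a list of placeholder herb names.
prescriptions = [
    ["herb_a", "herb_b", "herb_c"],
    ["herb_a", "herb_b"],
    ["herb_a", "herb_c"],
    ["herb_b", "herb_c"],
]
supports = frequent_itemsets(prescriptions, min_support=0.5)
rules = pair_rules(supports, min_confidence=0.6)
```

For a data set of realistic size, a library implementation of FP-growth would replace `frequent_itemsets`; the rule-derivation step stays conceptually the same.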
44

Skobelkina, N. M., and W. Na. "DIFFICULTIES IN BUILDING THE AUDITORY AND PRONUNCIATION SKILLS IN CHINESE STUDENTS WHILE TEACHING RUSSIAN." Pedagogical IMAGE 15, no. 1 (2021): 16–25. http://dx.doi.org/10.32343/2409-5052-2021-15-1-16-25.

Abstract:
The introduction. The paper deals with a problem of Russian speech sound acquisition by Chinese students, examines the bilateral nature of this problem (sound perception and sound pronunciation), and identifies typical difficulties of the Chinese audience. Materials and methods. The paper analyzes the results of an empirical study aimed at identifying the most frequent difficulties encountered by Chinese students learning the Russian Language when mastering auditory and pronunciation skills. The study relies on the methods of experimental research, statistical data processing, and comparative analysis. Results. The findings show the correlation between the two types of mistakes (in perception and pronunciation of Russian sounds) made by Chinese students. This correlation made it possible to obtain the data on the extent to which the processes of sound unit perception and generation are interconnected and interdependent. The experimental study has identified the most typical difficulties of Chinese students and considered their causes. Conclusion. The study has shown that the number of sound perception mistakes significantly exceeds that of sound pronunciation ones. Therefore, focused work is required to develop auditory skills. Both types of mistakes result from the differences in sound systems of the Russian and Chinese languages, which should be fully taken into account when building audial and pronunciation skills of Chinese students. Keywords: the audial and pronunciation skills, sound system, differentiation of sounds, methods of teaching Russian as a foreign language.
45

Xiaofei, Ren, Feng Qinghua, and Wang Nan. "A translator on the target stage." Babel. Revue internationale de la traduction / International Journal of Translation 56, no. 4 (December 31, 2010): 363–76. http://dx.doi.org/10.1075/babel.56.4.05xia.

Abstract:
Ying Ruocheng, an admirable artist in China and abroad, was responsible for the translation and production of many foreign plays in China and Chinese plays abroad, through which he played an important role in transforming China’s cultural life, encouraging international exchange and promoting modern drama. Based on his experience in drama and film acting and directing as well as translating, he argues that the major concern of theatre translation is its performability and speakability, which can be achieved through the recreation of the orality and gestic text with each role’s unique discourse and individuality. The paper focuses on Ying’s text choice and his dramatic dialogue translation to explore the characteristics of his theatre translation and its influence. The study selected two of his well-known translations and productions in the target theatre, Death of a Salesman (English to Chinese) and The Family (Chinese to English), as case studies. Text processing software Concordance 3.0 and TextPreProcessing were used to collect appropriate data. Through careful data analysis of word frequency, sentence length, discourse markers and deixis, Ying Ruocheng’s idea of performability in theatrical translation was shown to hold, which demonstrates his discriminating taste in dramaturgical art and his great influence on Chinese modern drama.
46

Vasilieva, Galina M., Marina A. Chepinskaya, and Jian Wang. "Public food service communication field in the Chinese students’ linguistic consciousness: ethnocultural barriers and obstacles." Russian Language Studies 20, no. 3 (September 30, 2022): 330–43. http://dx.doi.org/10.22363/2618-8163-2022-20-3-330-343.

Abstract:
The authors identify, classify and methodologically interpret communicative barriers and interferences arising for Chinese students in the sphere of catering, which is a fundamentally important culturally marked area of social and everyday communication. The relevance of this study is due to the significance and complexity of this social everyday sphere for the consciousness of foreign students, who face significant communicative barriers and obstacles that require methodological interpretation and consideration in the content of teaching Russian as a foreign language. The aims of this work are to identify and methodologically interpret communicative barriers of Chinese students in everyday communication (on the material of catering sphere), and to establish their correlations with the phenomena of lexical asymmetry. The main methods used in the research: mathematical processing of the obtained data, questionnaires, component analysis of vocabulary and comparison. The research material includes the results of the questionnaire aimed at identifying the barriers and obstacles encountered by Chinese students in the field of catering. It was determined that barriers and interference related to catering communication occurred for more than 75% of the students. Quantitative and meaningful processing of the questionnaire materials demonstrated that barriers and interference appeared in three main aspects: ethnographic, ethnopsychological, and ethnolinguistic. Methodologically oriented interpretation of communicative barriers is based on their correlation with the facts of the language. The integrative approach to the word taking into account its linguistic and extra-linguistic content determined which components of its structure translate ethnographic, ethnopsychological and ethnolinguistic differences of Russian and Chinese linguistic cultures that create communicative barriers. That is why the process of Russian language teaching considers conceptual, proper-lexical, semantic, connotative, background and contextual lacunas. Considering asymmetrical phenomena in the content of vocabulary teaching allows reducing the level of ethnographic, ethnopsychological and ethnolinguistic barriers and hindrances that arise for Chinese students in the sphere of catering. The prospects of the research include creation of the training dictionary “Gastronomic Culture Code in Language Vocabulary,” aimed at the Chinese students studying Russian.
47

XUE, NAIWEN, FEI XIA, FU-DONG CHIOU, and MARTA PALMER. "The Penn Chinese TreeBank: Phrase structure annotation of a large corpus." Natural Language Engineering 11, no. 2 (May 19, 2005): 207–38. http://dx.doi.org/10.1017/s135132490400364x.

Abstract:
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a large bracketed corpus since late 1998. The first two installments of the corpus, 250 thousand words of data, fully segmented, POS-tagged and syntactically bracketed, have been released to the public via LDC (www.ldc.upenn.edu). In this paper, we discuss several Chinese linguistic issues and their implications for our treebanking efforts and how we address these issues when developing our annotation guidelines. We also describe our engineering strategies to improve speed while ensuring annotation quality.
48

Zhang, Qinghui, Qihao Yuan, Pengtao Lv, Mengya Zhang, and Lei Lv. "Research on Medical Text Classification Based on Improved Capsule Network." Electronics 11, no. 14 (July 17, 2022): 2229. http://dx.doi.org/10.3390/electronics11142229.

Abstract:
In the medical field, text classification based on natural language processing (NLP) has shown good results and has great practical application prospects such as clinical medical value, but most existing research focuses on English electronic medical record data, and there is less research on natural language processing tasks for Chinese electronic medical records. Most of the current Chinese electronic medical records are non-institutionalized texts, which generally have low utilization rates and inconsistent terminology, often mingling patients’ symptoms, medications, diagnoses, and other essential information. In this paper, we propose a Capsule network model for electronic medical record classification, which combines LSTM and GRU models and relies on a unique routing structure to extract complex Chinese medical text features. The experimental results show that this model outperforms several other baseline models and achieves excellent results with an F1 value of 73.51% on the Chinese electronic medical record dataset, at least 4.1% better than other baseline models.
49

Huang, Yingshen, Andrew Cox, and Laura Sbaffi. "Research Data Management Policy and Practice in China." International Journal of Digital Curation 15, no. 1 (December 31, 2020): 18. http://dx.doi.org/10.2218/ijdc.v15i1.718.

Abstract:
On April 2, 2018, the State Council of China formally released a national research data management (RDM) policy “Measures for Managing Scientific Data”. Literature review shows that university libraries have played an important role in supporting Research Data Management at an institutional level in countries in North America, Europe and Australasia. The aim of this paper is to capture the current status of RDM in Chinese universities, in particular how university libraries have been involved in taking the agenda forward. This paper uses mixed methods: a website analysis of university policies and services; a questionnaire for university librarians; and semi-structured interviews. Findings from website analysis and questionnaires indicate that research data services (RDS) at a local level in Chinese universities are in their infancy. On the whole there is more evidence of activity in developing data repositories than support services. Despite the existence of a national policy there remain significant barriers to further service development, such as the lag in the creation of local policy, insufficient funding for technical infrastructure, shortages of staff skills in data curation, and language barriers to international data sharing and open science. RDS in Chinese university libraries are still lagging behind the English-speaking countries and Europe.
50

Mi, Chenggang, Shaolin Zhu, and Rui Nie. "Improving Loanword Identification in Low-Resource Language with Data Augmentation and Multiple Feature Fusion." Computational Intelligence and Neuroscience 2021 (April 8, 2021): 1–9. http://dx.doi.org/10.1155/2021/9975078.

Abstract:
Loanword identification has been studied in recent years to alleviate data sparseness in several natural language processing (NLP) tasks, such as machine translation and cross-lingual information retrieval. However, recent studies on this topic usually focus on high-resource languages (such as Chinese, English, and Russian); for low-resource languages, such as Uyghur and Mongolian, loanword identification tends to perform worse because of limited resources and a lack of annotated data. To overcome this problem, we first propose a lexical constraint-based data augmentation method to generate training data for low-resource language loanword identification; then, a loanword identification model based on a log-linear RNN is introduced to improve the performance of low-resource loanword identification by incorporating features such as word-level embeddings, character-level embeddings, pronunciation similarity, and part-of-speech (POS) into one model. Experimental results on loanword identification in Uyghur (in this study, we mainly focus on Arabic, Chinese, Russian, and Turkish loanwords in Uyghur) showed that our proposed method achieves the best performance compared with several strong baseline systems.