Dissertations / Theses on the topic 'Text processing (Computer science)'

Consult the top 50 dissertations / theses for your research on the topic 'Text processing (Computer science).'

You can also download the full text of each publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Nyns, Roland. "Text grammar and text processing: a cognitivist approach." Doctoral thesis, Universite Libre de Bruxelles, 1989. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/213285.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Zaghloul, Waleed A. Lee Sang M. "Text mining using neural networks." Lincoln, Neb. : University of Nebraska-Lincoln, 2005. http://0-www.unl.edu.library.unl.edu/libr/Dissertations/2005/Zaghloul.pdf.

Full text
Abstract:
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2005.
Title from title screen (sites viewed on Oct. 18, 2005). PDF text: 100 p. : col. ill. Includes bibliographical references (p. 95-100 of dissertation).
APA, Harvard, Vancouver, ISO, and other styles
3

Tumu, Sudheer. "An Investigative and Goal driven Workbench for Text Extraction and Image Processing." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1376930066.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

McCaffrey, Corey (Corey Stanley Gordon). "StarLogo TNG : the convergence of graphical programming and text processing." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/36904.

Full text
Abstract:
Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (leaves 67-68).
StarLogo TNG is a robust graphical programming environment for secondary students. Despite the educational advantages of graphical programming, TNG has sustained criticism from some who object to the exclusion of a textual language. Recognizing the benefits of text processing and the power of controlling software with a keyboard, I sought to incorporate text-processing techniques into TNG's graphical language. The key component of this work is an innovation dubbed "Typeblocking," by which users construct block code through the use of a keyboard.
by Corey McCaffrey.
M.Eng. and S.B.
APA, Harvard, Vancouver, ISO, and other styles
5

Ganguli, Nitu. "The design considerations for display oriented proportional text editors using bit-mapped graphics display systems /." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=66142.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Lok, Shien-wai. "A galley and page formatter based on relations /." Thesis, McGill University, 1985. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63352.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Joachims, Thorsten. "Learning to classify text using support vector machines /." Boston [u.a.] : Kluwer Acad. Publ, 2002. http://www.loc.gov/catdir/toc/fy032/2002022127.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Venour, Chris. "A computational model of lexical incongruity in humorous text." Thesis, University of Aberdeen, 2013. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=201735.

Full text
Abstract:
Many theories of humour claim that incongruity is an essential ingredient of humour. However, this idea is poorly understood and little work has been done in computational humour to quantify it. For example, classifiers which attempt to distinguish jokes from regular texts tend to look for secondary features of humorous texts rather than for incongruity. Similarly, most joke generators attempt to recreate structural patterns found in example jokes but do not deliberately endeavour to create incongruity. As in previous research, this thesis develops classifiers and a joke generator which attempt to automatically recognize and generate a type of humour. However, the systems described here differ from previous programs because they implement a model of a certain type of humorous incongruity. We focus on a type of register humour we call lexical register jokes, in which the tones of individual words are in conflict with each other. Our goal is to create a semantic space that reflects the kind of tone at play in lexical register jokes, so that words that are far apart in the space are not simply different but exhibit the kinds of incongruities seen in lexical jokes. This thesis attempts to develop such a space, and various classifiers are implemented to use it to distinguish lexical register jokes from regular texts. The best of these classifiers achieved high levels of accuracy when distinguishing between a test set of lexical register jokes and 4 different kinds of regular text. A joke generator which makes use of the semantic space to create original lexical register jokes is also implemented and described in this thesis. In a test of the generator, texts that were generated by the system were evaluated by volunteers, who considered them not as humorous as human-made lexical register jokes but significantly more humorous than a set of control (i.e. non-joke) texts. This was an encouraging result which suggests that the vector space is somewhat successful in discovering lexical differences in tone and in modelling lexical register jokes.
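
For illustration only: in such a semantic space, a lexically incongruous word can be found as the outlier among a sentence's word vectors. The sketch below is not Venour's system; its miniature "tone" vectors are invented for the example.

```python
# Toy illustration of spotting a lexically incongruous word as the
# outlier in a semantic space. The 2-d "tone" vectors are invented;
# a real system would learn such a space from corpora.
import math

tone = {                      # (formality, archaism): made-up axes
    "the": (0.5, 0.5), "constable": (0.8, 0.9),
    "apprehended": (0.9, 0.8), "a": (0.5, 0.5), "dude": (0.1, 0.0),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_incongruous(words):
    """Return the word with the lowest mean similarity to the rest."""
    def mean_sim(w):
        others = [x for x in words if x != w]
        return sum(cosine(tone[w], tone[x]) for x in others) / len(others)
    return min(words, key=mean_sim)

print(most_incongruous(["the", "constable", "apprehended", "a", "dude"]))
# -> "dude": its informal tone clashes with the formal register
```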
APA, Harvard, Vancouver, ISO, and other styles
9

Green, Charles Arthur. "An empirical study on the effects of a collaboration-aware computer system and several communication media alternatives on product quality and time to complete in a co-authoring environment /." This resource online, 1992. http://scholar.lib.vt.edu/theses/available/etd-01122010-020201/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Zobair, Hamza A. "A method for finding common attributes in heterogeneous DoD databases." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2004. http://library.nps.navy.mil/uhtbin/hyperion/04Jun%5FZobair.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Bellettini, Carlo, Violetta Lonati, Dario Malchiodi, Mattia Monga, Anna Morpurgo, and Mauro Torelli. "What you see is what you have in mind : constructing mental models for formatted text processing." Universität Potsdam, 2013. http://opus.kobv.de/ubp/volltexte/2013/6461/.

Full text
Abstract:
In this paper we report on our experiments in teaching computer science concepts with a mix of tangible and abstract object manipulations. The goal we set ourselves was to let pupils discover the challenges one has to meet to automatically manipulate formatted text. We worked with a group of 25 secondary school pupils (9-10th grade), and they were actually able to “invent” the concept of mark-up language. From this experiment we distilled a set of activities which will be replicated in other classes (6th grade) under the guidance of maths teachers.
APA, Harvard, Vancouver, ISO, and other styles
12

Oldham, Joseph Dowell. "Generating documents by means of computational registers." Lexington, Ky. : [University of Kentucky Libraries], 2000. http://lib.uky.edu/ETD/ukycosc2000d00006/oldham.pdf.

Full text
Abstract:
Thesis (Ph. D.)--University of Kentucky, 2000.
Title from document title page. Document formatted into pages; contains ix, 169 p. : ill. Includes abstract. Includes bibliographical references (p. 160-167).
APA, Harvard, Vancouver, ISO, and other styles
13

Ritholtz, Lee. "Intelligent text recognition system on a heterogeneous multi-core processor cluster : a performance profile and architecture exploration." Diss., State University of New York at Binghamton, 2009.

Find full text
Abstract:
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Electrical and Computer Engineering, 2009.
Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
14

Hon, Wing-kai. "On the construction and application of compressed text indexes." Click to view the E-thesis via HKUTO, 2004. http://sunzi.lib.hku.hk/hkuto/record/B31059739.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Hon, Wing-kai, and 韓永楷. "On the construction and application of compressed text indexes." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31059739.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Smith, Andrew Edward. "Development of a practical system for text content analysis and mining /." [St. Lucia, Qld.], 2002. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe17847.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Preece, Daniel Joseph. "Text Identification by Example." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2060.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Mick, Alan A. "Knowledge based text indexing and retrieval utilizing case based reasoning /." Online version of thesis, 1994. http://hdl.handle.net/1850/11715.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Lazic, Marko. "Using Natural Language Processing to extract information from receipt text." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279302.

Full text
Abstract:
The ability to automatically read, recognize, and extract different information from unstructured text is of key importance to many areas. Most research in this area has been focused on scanned invoices. This thesis investigates the feasibility of using natural language processing to extract information from receipt text. Three different machine learning models, BiLSTM, GCN, and BERT, were trained to extract a total of 7 different data points from a dataset consisting of 790 receipts. In addition, a simple rule-based model was built to serve as a baseline. These four models were then compared on how well they perform on different data points. The best performing machine learning model was BERT with an overall F1 score of 0.455. The second best machine learning model was BiLSTM with an F1 score of 0.278, and GCN had an F1 score of 0.167. These F1 scores are highly affected by the low performance on the product list, which was observed with all three models. BERT showed promising results on vendor name, date, tax rate, price, and currency. However, a simple rule-based method was able to outperform the BERT model on all data points except vendor name and tax rate. Receipt images from the dataset were often blurred, rotated, and crumpled, which introduced a high OCR error. This error then propagated through all of the steps and was most likely the main reason why the machine learning models, especially BERT, were not able to perform. It is concluded that there is potential in using natural language processing for the problem of information extraction. However, further research is needed if it is going to outperform the rule-based models.
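
For illustration, a rule-based baseline of the kind the thesis compares against can be approximated with a few regular expressions; the patterns and the largest-amount heuristic below are guesses, not the rules from the study.

```python
# Minimal sketch of a rule-based receipt-field extractor. The regular
# expressions and the "largest amount is the total" heuristic are
# illustrative only.
import re

DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})\b")
PRICE = re.compile(r"\b(\d+[.,]\d{2})\b")
CURRENCY = re.compile(r"\b(SEK|EUR|USD|kr)\b", re.IGNORECASE)
TAX = re.compile(r"\b(\d{1,2})\s?%")

def first(pattern, text):
    m = pattern.search(text)
    return m.group(1) if m else None

def extract(text: str) -> dict:
    prices = [float(p.replace(",", ".")) for p in PRICE.findall(text)]
    return {
        "date": first(DATE, text),
        "total": max(prices) if prices else None,  # assume total is the largest amount
        "currency": first(CURRENCY, text),
        "tax_rate": first(TAX, text),
    }

print(extract("ICA Kvantum 2020-03-14  Mjolk 14,50  Total 89,00 SEK  Moms 12%"))
# {'date': '2020-03-14', 'total': 89.0, 'currency': 'SEK', 'tax_rate': '12'}
```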
APA, Harvard, Vancouver, ISO, and other styles
20

Williams, Ken. "A framework for text categorization." Thesis, The University of Sydney, 2003. https://hdl.handle.net/2123/27951.

Full text
Abstract:
The field of automatic Text Categorization (TC) concerns the creation of categorizer functions, usually involving Machine Learning techniques, to assign labels from a pre-defined set of categories to documents based on the documents' content. Because of the many variations on how this can be achieved and the diversity of applications in which it can be employed, creating specific TC applications is often a difficult task. This thesis concerns the design, implementation, and testing of an Object-Oriented Application Framework for Text Categorization. By encoding expertise in the architecture of the framework, many of the barriers to creating TC applications are eliminated. Developers can focus on the domain-specific aspects of their applications, leaving the generic aspects of categorization to the framework. This allows significant code and design reuse when building new applications. Chapter 1 provides an introduction to automatic Text Categorization, Object-Oriented Application Frameworks, and Design Patterns. Some common application areas and benefits of using automatic TC are discussed. Frameworks are defined and their advantages compared to other software engineering strategies are presented. Design patterns are defined and placed in the context of framework development. An overview of three related products in the TC space, Weka, Autonomy, and Teragram, follows. Chapter 2 contains a detailed presentation of Text Categorization. TC is formally defined, followed by a detailed account of the main functional areas in Text Categorization that a modern TC framework must provide. These include document tokenizing, feature selection and reduction, Machine Learning techniques, and categorization runtime behavior. Four Machine Learning techniques (Naïve Bayes categorizers, k-Nearest-Neighbor categorizers, Support Vector Machines, and Decision Trees) are presented, with discussions of their core algorithms and the computational complexity involved. Several measures for evaluating the quality of a categorizer are then defined, including precision, recall, and the Fβ measure. The design of a framework that addresses the functional areas from Chapter 2 is presented in Chapter 3. This design is motivated by consideration of the framework's audience and some expected usage scenarios. The core architectural classes in the framework are then presented, and Design Patterns are employed in a detailed discussion of the cooperative relationships among framework classes. This is the first known use of Design Patterns in an academic work on Text Categorization software. Following the presentation of the framework design, some possible design limitations are discussed. The design in Chapter 3 has been implemented as the AI::Categorizer Perl package. Chapter 4 is a short discussion of implementation issues, including considerations in choosing the programming language. Special consideration is given to the implementation of constructor methods in the framework, since they are responsible for enforcing the structural relationships among framework classes. Three data structure issues within the framework are then discussed: feature vectors, sets of document or category objects, and the serialized representation of a framework object. Chapter 5 evaluates the framework from several different perspectives on two corpora. The first corpus is the standard Reuters-21578 benchmark corpus, and the second is assembled from messages sent to an educational ask-an-expert service. Using these corpora, the framework is evaluated on the measures introduced in Chapter 2. The performance on the first corpus is compared to the well-known results in [50]. The Naïve Bayes categorizer is found to be competitive with standard implementations in the literature, and the Support Vector Machine and k-Nearest-Neighbor implementations are outperformed by comparable systems by other researchers. The framework is then evaluated in terms of its resource usage, and several applications using AI::Categorizer are presented in order to show the framework's ability to function in the usage scenarios discussed in Chapter 3.
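
For reference, the evaluation measures named in this abstract are the standard ones from the TC literature; a compact sketch, not drawn from AI::Categorizer, follows.

```python
# Standard categorizer quality measures: precision, recall, and the
# F-beta measure (F1 when beta = 1).
def precision_recall_fbeta(tp: int, fp: int, fn: int, beta: float = 1.0):
    """tp/fp/fn are counts of true-positive, false-positive and
    false-negative category assignments."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    f = ((1 + b2) * precision * recall / (b2 * precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

print(precision_recall_fbeta(tp=80, fp=20, fn=40))
# (0.8, 0.666..., 0.727...): F1 is the harmonic mean of P and R
```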
APA, Harvard, Vancouver, ISO, and other styles
21

Lee, Wing Kuen. "Interpreting tables in text using probabilistic two-dimensional context-free grammars /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LEEW.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Hert, Ronald Sterling. "A Study of One Computer-Driven Text Analysis Package for Collegiate Student Writers." Thesis, University of North Texas, 1988. https://digital.library.unt.edu/ark:/67531/metadc331597/.

Full text
Abstract:
This study examines the effects of the computer-assisted text analysis program, WRITER'S WORKBENCH, on writing performance, levels of writing apprehension, students' writing processes and attitudes about using the computer and WORKBENCH for writing. A sample of 275 subjects enrolled in freshman composition were divided into an experimental group (N = 200) who used WORKBENCH in a mandatory computer lab component in addition to their composition course and a control group (N = 75) who received only the course itself. Because random selection of participants was not possible, a Nonequivalent Control Group design was utilized. Holistic scoring of pre- and posttest essays revealed a significant improvement in writing among both groups as a result of the treatments, but there was no significant difference in writing gains between the group using WORKBENCH and the group who did not (p = .942). Similarly, though both groups demonstrated a small decrease in writing apprehension after instruction, there was no significant difference in the degree of decrease between the two groups (p = .201). Also, the data did not support a relationship between writing performance and apprehension. A 40-item questionnaire was given to the experimental group to determine: 1) attitudes about writing with a computer, 2) how students use WORKBENCH, and 3) students' attitudes about WORKBENCH. Some highlights of these findings are that narrow majorities enjoyed and were comfortable using the computer and WORKBENCH, but substantial minorities dissented or were uncertain. While 60% felt happier with their essays after using WORKBENCH and preferred using a computer to write, 89% of students felt word processing represented the greatest advantage and SPELL was the next most popular feature. Personal interviews conducted with 13 of the most and least apprehensive WORKBENCH users revealed that some students ignored the WORKBENCH analyses, and highly apprehensive students experienced more frustration with the computer, employed different writing processes, used WORKBENCH less often and less skillfully, and expressed more dissatisfaction with the computer.
APA, Harvard, Vancouver, ISO, and other styles
23

Sutter, Christopher M., and Mark D. Eramo. "Automated psychological categorization via linguistic processing system." Thesis, Monterey, California. Naval Postgraduate School, 2004. http://hdl.handle.net/10945/1439.

Full text
Abstract:
Approved for public release; distribution is unlimited
Influencing one's adversary has always been an objective in warfare. However, to date the majority of influence operations have been geared toward the masses or to very small numbers of individuals. Although marginally effective, this approach is inadequate with respect to larger numbers of high value targets and to specific subsets of the population. Limited human resources have prevented a more tailored approach, which would focus on segmentation, because individual targeting demands significant time from psychological analysts. This research examined whether or not Information Technology (IT) tools, specializing in text mining, are robust enough to automate the categorization/segmentation of individual profiles for the purpose of psychological operations (PSYOP). Research indicated that only a handful of software applications claimed to provide adequate functionality to perform these tasks. Text mining via neural networks was determined to be the best approach given the constraints of the profile data and the desired output. Five software applications were tested and evaluated for their ability to reproduce the results of a social psychologist. Through statistical analysis, it was concluded that the tested applications are not currently mature enough to produce accurate results that would enable automated segmentation of individual profiles based on supervised linguistic processing.
Captain, United States Marine Corps
Lieutenant, United States Navy
APA, Harvard, Vancouver, ISO, and other styles
24

Eramo, Mark D. Sutter Christopher M. "Automated psychological categorization via linguistic processing system /." Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2004. http://library.nps.navy.mil/uhtbin/hyperion/04Sep%5FEramo.pdf.

Full text
Abstract:
Thesis (M.S. in Information Technology Management and M.S. in Information Systems and Operations)--Naval Postgraduate School, Sept. 2004.
Thesis advisor(s): Raymond Buettner, Magdi Kamel. Includes bibliographical references (p. 115-122). Also available online.
APA, Harvard, Vancouver, ISO, and other styles
25

Currin, Aubrey Jason. "Text data analysis for a smart city project in a developing nation." Thesis, University of Fort Hare, 2015. http://hdl.handle.net/10353/2227.

Full text
Abstract:
Increased urbanisation against the backdrop of limited resources is complicating city planning and management of functions including public safety. The smart city concept can help, but most previous smart city systems have focused on utilising automated sensors and analysing quantitative data. In developing nations, using the ubiquitous mobile phone as an enabler for crowdsourcing of qualitative public safety reports, from the public, is a more viable option due to limited resources and infrastructure limitations. However, there is no specific best method for the analysis of qualitative text reports for a smart city in a developing nation. The aim of this study, therefore, is the development of a model for enabling the analysis of unstructured natural language text for use in a public safety smart city project. Following the guidelines of the design science paradigm, the resulting model was developed through the inductive review of related literature, assessed and refined by observations of a crowdsourcing prototype and conversational analysis with industry experts and academics. The content analysis technique was applied to the public safety reports obtained from the prototype via computer assisted qualitative data analysis software. This has resulted in the development of a hierarchical ontology which forms an additional output of this research project. Thus, this study has shown how municipalities or local government can use CAQDAS and content analysis techniques to prepare large quantities of text data for use in a smart city.
APA, Harvard, Vancouver, ISO, and other styles
26

Wang, Yalin. "Document analysis : table structure understanding and zone content classification /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6079.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Ramachandran, Venkateshwaran. "A temporal analysis of natural language narrative text." Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Petersen, Sarah E. "Natural language processing tools for reading level assessment and text simplication for bilingual education /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/6906.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Wang, Xuerui. "Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities." Amherst, Mass. : University of Massachusetts Amherst, 2009. http://scholarworks.umass.edu/open_access_dissertations/58/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Popescu, Ana-Maria. "Information extraction from unstructured web text /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/6935.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Varcholik, Paul David. "Multi-touch for general-purpose computing an examination of text entry." Doctoral diss., University of Central Florida, 2011. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5074.

Full text
Abstract:
In recent years, multi-touch has been heralded as a revolution in human-computer interaction. Multi-touch provides features such as gestural interaction, tangible interfaces, pen-based computing, and interface customization--features embraced by an increasingly tech-savvy public. However, multi-touch platforms have not been adopted as "everyday" computer interaction devices; that is, multi-touch has not been applied to general-purpose computing. The questions this thesis seeks to address are: Will the general public adopt these systems as their chief interaction paradigm? Can multi-touch provide such a compelling platform that it displaces the desktop mouse and keyboard? Is multi-touch truly the next revolution in human-computer interaction? As a first step toward answering these questions, we observe that general-purpose computing relies on text input, and ask: "Can multi-touch, without a text entry peripheral, provide a platform for efficient text entry? And, by extension, is such a platform viable for general-purpose computing?" We investigate these questions through four user studies that collected objective and subjective data for text entry and word processing tasks. The first of these studies establishes a benchmark for text entry performance on a multi-touch platform, across a variety of input modes. The second study attempts to improve this performance by examining an alternate input technique. The third and fourth studies include mouse-style interaction for formatting rich-text on a multi-touch platform, in the context of a word processing task. These studies establish a foundation for future efforts in general-purpose computing on a multi-touch platform. Furthermore, this work details deficiencies in tactile feedback with modern multi-touch platforms, and describes an exploration of audible feedback. Finally, the thesis conveys a vision for a general-purpose multi-touch platform, its design and rationale.
ID: 029809614; System requirements: World Wide Web browser and PDF reader.; Mode of access: World Wide Web.; Thesis (Ph.D.)--University of Central Florida, 2011.; Includes bibliographical references (p. 270-277).
Ph.D.
Doctorate
Engineering and Computer Science
Modeling and Simulation
APA, Harvard, Vancouver, ISO, and other styles
32

Van, Leeuwen Theo. "Language and representation : the recontextualisation of participants, activities and reactions." Thesis, The University of Sydney, 1993. http://hdl.handle.net/2123/1615.

Full text
Abstract:
This thesis proposes a model for the description of social practice which analyses social practices into the following elements: (1) the participants of the practice; (2) the activities which constitute the practice; (3) the performance indicators which stipulate how the activities are to be performed; (4) the dress and body grooming for the participants; (5) the times when, and (6)the locations where the activities take place; (7) the objects, tools and materials, required for performing the activities; and (8) the eligibility conditions for the participants and their dress, the objects, and the locations, that is, the characteristics these elements must have to be eligible to participate in, or be used in, the social practice.
APA, Harvard, Vancouver, ISO, and other styles
33

Van, Leeuwen Theo. "Language and representation : the recontextualisation of participants, activities and reactions." University of Sydney, 1993. http://hdl.handle.net/2123/1615.

Full text
Abstract:
Doctor of Philosophy
This thesis proposes a model for the description of social practice which analyses social practices into the following elements: (1) the participants of the practice; (2) the activities which constitute the practice; (3) the performance indicators which stipulate how the activities are to be performed; (4) the dress and body grooming for the participants; (5) the times when, and (6)the locations where the activities take place; (7) the objects, tools and materials, required for performing the activities; and (8) the eligibility conditions for the participants and their dress, the objects, and the locations, that is, the characteristics these elements must have to be eligible to participate in, or be used in, the social practice.
APA, Harvard, Vancouver, ISO, and other styles
34

Geiss, Johanna. "Latent semantic sentence clustering for multi-document summarization." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609761.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Chiu, Pei-Wen Andy. "From Atoms to the Solar System: Generating Lexical Analogies from Text." Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/2943.

Full text
Abstract:
A lexical analogy is two pairs of words (w1, w2) and (w3, w4) such that the relation between w1 and w2 is identical or similar to the relation between w3 and w4. For example, (abbreviation, word) forms a lexical analogy with (abstract, report), because in both cases the former is a shortened version of the latter. Lexical analogies are of theoretic interest because they represent a second order similarity measure: relational similarity. Lexical analogies are also of practical importance in many applications, including text-understanding and learning ontological relations.

This thesis presents a novel system that generates lexical analogies from a corpus of text documents. The system is motivated by a well-established theory of analogy-making, and views lexical analogy generation as a series of three processes: identifying pairs of words that are semantically related, finding clues to characterize their relations, and generating lexical analogies by matching pairs of words with similar relations. The system uses a dependency grammar to characterize semantic relations, and applies machine learning techniques to determine their similarities. Empirical evaluation shows that the system performs remarkably well, generating lexical analogies at a precision of over 90%.
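
For illustration: a common, simpler approximation of relational similarity compares offsets between word embeddings. The sketch below uses invented toy vectors and is not the dependency-grammar method of the thesis.

```python
# Relational similarity approximated by embedding offsets: (w1, w2)
# and (w3, w4) are analogous when w2 - w1 points in roughly the same
# direction as w4 - w3. The thesis itself characterizes relations with
# dependency-grammar clues; the 3-d vectors here are invented.
import math

vec = {
    "abbreviation": (0.2, 0.8, 0.1), "word":   (0.9, 0.8, 0.1),
    "abstract":     (0.2, 0.1, 0.9), "report": (0.9, 0.1, 0.9),
}

def offset(a, b):
    return tuple(y - x for x, y in zip(vec[a], vec[b]))

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# "abbreviation is to word as abstract is to report"
print(cos(offset("abbreviation", "word"), offset("abstract", "report")))  # 1.0
```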
APA, Harvard, Vancouver, ISO, and other styles
36

van, Schijndel Marten. "The Influence of Syntactic Frequencies on Human Sentence Processing." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1502452939626929.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Botha, Gerrit Reinier. "Text-based language identification for the South African languages." Pretoria : [s.n.], 2007. http://upetd.up.ac.za/thesis/available/etd-090942008-133715/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Green, Charles A. "An empirical study on the effects of a collaboration-aware computer system and several communication media alternatives on product quality and time to complete in a co-authoring environment." Thesis, Virginia Tech, 1992. http://hdl.handle.net/10919/40617.

Full text
Abstract:
A new type of software, termed a "group editor", allows multiple users to create and simultaneously edit a single document; this software has ostensibly been developed to increase efficiency in co-authoring environments where users may not be co-located. However, questions as to the effectiveness of this type of communication aid, which is a member of the "groupware" family of tools used for some types of computer supported cooperative work, remain. Particularly, there has been very little objective data on any group editor because of the problems inherent in evaluating writing, as well as due to the few examples of group editors that exist. A method was developed to examine the effect of using a particular group editor, Aspects™ from Group Technologies in Arlington, Va., in conjunction with several communication media, on a simple dyad writing task. Six dyads of college students familiar with journalistic writing were matched on attributes of dominance and writing ability and were asked to write short news articles based on short video clips in a balanced two-factor within-subject analysis of variance design. Six conditions were tested based on communication media: audio only, audio plus video, and face-to-face; each of these with and without the availability of the group editor. Constraints inherent in the task attempted to enforce consistent document quality levels, measured by grammatical quality and content quality (correctness of information and chronological sequencing). Time to complete the articles was used as a measure of efficiency, independent from quality due to the consistent quality levels of the resulting work. Results from the time data indicated a significant effect of communication media, with the face-to-face conditions taking significantly less time to complete than either of the other media alternatives. Grammatical quality of the written articles was found to be consistently high, as measured by a computerized grammar checker. Content quality of the documents did not significantly differ for any of the conditions. A supplemental Latin square analysis showed additional significant differences in time to complete for trial means (a practice effect) and team differences. Further, significantly less variance was found in certain conditions which had the group editor than in other conditions which did not. Subjective data obtained from questionnaires supported these results and additionally showed that subjects significantly preferred trials with the group editor and considered them more productive. The face-to-face conditions may have been more efficient due to the nature of the task or due to increased communication structure within dyads due to practice with the group editor. The significant effect of team differences may have been due to consistent style differences between dyads that affected efficiency levels. The decreased variability in time to complete in certain group editor conditions may have been due to increased communication structure in these conditions, or perhaps due to leveling effects of group writing as opposed to individual writing with team member aid. These hypotheses need to be tested with further study, and generalizability of the experimental task conditions and results from this particular group editor need to be established as well. Face-to-face conditions clearly resulted in the most efficient performance on this task.
The results obtained concerning the group editor suggest possible efficiency or consistency benefits from the use of group editors by co-authoring persons when face-to-face communication is not practical. Perhaps group editors will become a useful method for surrogate travel for persons with disabilities.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
39

Tran, Anh Xuan. "Identifying latent attributes from video scenes using knowledge acquired from large collections of text documents." Thesis, The University of Arizona, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3634275.

Full text
Abstract:

Peter Drucker, a well-known influential writer and philosopher in the field of management theory and practice, once claimed that “the most important thing in communication is hearing what isn't said.” It is not difficult to see that a similar concept also holds in the context of video scene understanding. In almost every non-trivial video scene, most important elements, such as the motives and intentions of the actors, can never be seen or directly observed, yet the identification of these latent attributes is crucial to our full understanding of the scene. That is to say, latent attributes matter.

In this work, we explore the task of identifying latent attributes in video scenes, focusing on the mental states of participant actors. We propose a novel approach to the problem based on the use of large text collections as background knowledge and minimal information about the videos, such as activity and actor types, as query context. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms, as well as their distribution weights. We develop and test several largely unsupervised information extraction models that identify the mental state labels of human participants in video scenes given some contextual information about the scenes. We show that these models produce complementary information and their combination significantly outperforms the individual models, and improves performance over several baseline methods on two different datasets. We present an extensive analysis of our models and close with a discussion of our findings, along with a roadmap for future research.

APA, Harvard, Vancouver, ISO, and other styles
40

Poria, Soujanya. "Novel symbolic and machine-learning approaches for text-based and multimodal sentiment analysis." Thesis, University of Stirling, 2017. http://hdl.handle.net/1893/25396.

Full text
Abstract:
Emotions and sentiments play a crucial role in our everyday lives. They aid decision-making, learning, communication, and situation awareness in human-centric environments. Over the past two decades, researchers in artificial intelligence have been attempting to endow machines with cognitive capabilities to recognize, infer, interpret and express emotions and sentiments. All such efforts can be attributed to affective computing, an interdisciplinary field spanning computer science, psychology, social sciences and cognitive science. Sentiment analysis and emotion recognition have also become a new trend in social media, avidly helping users understand opinions being expressed on different platforms in the web. In this thesis, we focus on developing novel methods for text-based sentiment analysis. As an application of the developed methods, we employ them to improve multimodal polarity detection and emotion recognition. Specifically, we develop innovative text and visual-based sentiment-analysis engines and use them to improve the performance of multimodal sentiment analysis. We begin by discussing challenges involved in both text-based and multimodal sentiment analysis. Next, we present a number of novel techniques to address these challenges. In particular, in the context of concept-based sentiment analysis, a paradigm gaining increasing interest recently, it is important to identify concepts in text; accordingly, we design a syntax-based concept-extraction engine. We then exploit the extracted concepts to develop a concept-based affective vector space, which we term EmoSenticSpace. We then use this for deep learning-based sentiment analysis, in combination with our novel linguistic pattern-based affective reasoning method, termed sentiment flow. Finally, we integrate all our text-based techniques and combine them with a novel deep learning-based visual feature extractor for multimodal sentiment analysis and emotion recognition. Comparative experimental results using a range of benchmark datasets have demonstrated the effectiveness of the proposed approach.
APA, Harvard, Vancouver, ISO, and other styles
41

Zechner, Niklas. "A novel approach to text classification." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-138917.

Full text
Abstract:
This thesis explores the foundations of text classification, using both empirical and deductive methods, with a focus on author identification and syntactic methods. We strive for a thorough theoretical understanding of what affects the effectiveness of classification in general.  To begin with, we systematically investigate the effects of some parameters on the accuracy of author identification. How is the accuracy affected by the number of candidate authors, and the amount of data per candidate? Are there differences in how methods react to the changes in parameters? Using the same techniques, we see indications that methods previously thought to be topic-independent might not be so, but that syntactic methods may be the best option for avoiding topic dependence. This means that previous studies may have overestimated the power of lexical methods. We also briefly look for ways of spotting which particular features might be the most effective for classification. Apart from author identification, we apply similar methods to identifying properties of the author, including age and gender, and attempt to estimate the number of distinct authors in a text sample. In all cases, the techniques are proven viable if not overwhelmingly accurate, and we see that lexical and syntactic methods give very similar results.  In the final parts, we see some results of automata theory that can be of use for syntactic analysis and classification. First, we generalise a known algorithm for finding a list of the best-ranked strings according to a weighted automaton, to doing the same with trees and a tree automaton. This result can be of use for speeding up parsing, which often runs in several steps, where each step needs several trees from the previous as input. Second, we use a compressed version of deterministic finite automata, known as failure automata, and prove that finding the optimal compression is NP-complete, but that there are efficient algorithms for finding good approximations. Third, we find and prove the derivatives of regular expressions with cuts. Derivatives are an operation on expressions to calculate the remaining expression after reading a given symbol, and cuts are an extension to regular expressions found in many programming languages. Together, these findings may be able to improve on the syntactic analysis which we have seen is a valuable tool for text classification.
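
The derivative operation mentioned above has a compact classical form. The sketch below implements Brzozowski derivatives for plain regular expressions; the thesis's extension to expressions with cuts is omitted, and all class names are illustrative.

```python
# Brzozowski derivatives: deriv(r, a) is the language of suffixes w
# such that a+w is in L(r). Matching a string is repeated derivation
# followed by a nullability check.
from dataclasses import dataclass

class Re: pass

@dataclass(frozen=True)
class Empty(Re): pass          # matches nothing

@dataclass(frozen=True)
class Eps(Re): pass            # matches only the empty string

@dataclass(frozen=True)
class Sym(Re):
    c: str

@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re

@dataclass(frozen=True)
class Cat(Re):
    left: Re
    right: Re

@dataclass(frozen=True)
class Star(Re):
    inner: Re

def nullable(r: Re) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, (Empty, Sym)): return False
    if isinstance(r, Alt): return nullable(r.left) or nullable(r.right)
    if isinstance(r, Cat): return nullable(r.left) and nullable(r.right)
    raise TypeError(r)

def deriv(r: Re, a: str) -> Re:
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Sym): return Eps() if r.c == a else Empty()
    if isinstance(r, Alt): return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Cat):
        head = Cat(deriv(r.left, a), r.right)
        return Alt(head, deriv(r.right, a)) if nullable(r.left) else head
    if isinstance(r, Star): return Cat(deriv(r.inner, a), r)
    raise TypeError(r)

def matches(r: Re, s: str) -> bool:
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)

r = Star(Cat(Sym("a"), Sym("b")))          # (ab)*
assert matches(r, "abab") and not matches(r, "aba")
```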
APA, Harvard, Vancouver, ISO, and other styles
42

Li, Jie. "Intention-driven textual semantic analysis." School of Computer Science and Software Engineering, 2008. http://ro.uow.edu.au/theses/104.

Full text
Abstract:
The explosion of the World Wide Web has brought an endless amount of information within our reach. In order to take advantage of this phenomenon, text search becomes a major contemporary research challenge. Due to the nature of the Web, assisting users to find desired information is still a challenging task. In this thesis, we investigate semantic analysis techniques which can facilitate the search process at the semantic level. We also study the problem that short queries are less informative and make it difficult to convey the user's intention to the search system. We propose a generalized framework to address these issues. We conduct a case study of movie plot search in which a semantic analyzer seamlessly works with a user's intention detector. Our experimental results show the importance and effectiveness of intention detection and semantic analysis techniques.
APA, Harvard, Vancouver, ISO, and other styles
43

Chen, Michelle W. M. Eng Massachusetts Institute of Technology. "Comparison of natural language processing algorithms for medical texts." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100298.

Full text
Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Title as it appears in MIT Commencement Exercises program, June 5, 2015: Comparison of NLP systems for medical text. Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 57-58).
With the large corpora of clinical texts, natural language processing (NLP) is growing to be a field that people are exploring to extract useful patient information. NLP applications in clinical medicine are especially important in domains where the clinical observations are crucial to define and diagnose the disease. There are a variety of different systems that attempt to match words and word phrases to medical terminologies. Because of the differences in annotation datasets and lack of common conventions, many of the systems yield conflicting results. The purpose of this thesis project is (1) to create a visual representation of how different concepts compare to each other when using various annotators and (2) to improve upon the NLP methods to yield terms with better fidelity to what the clinicians are trying to express.
by Michelle W. Chen.
M. Eng.
APA, Harvard, Vancouver, ISO, and other styles
44

Finch, Dezon K. "TagLine: Information Extraction for Semi-Structured Text Elements In Medical Progress Notes." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4321.

Full text
Abstract:
Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in semi-structured text elements. A prototype system (TagLine) was developed as a method for extracting information from the semi-structured portions of text using machine learning. Features for the learning machine were suggested by prior work, as well as by examining the text and selecting those attributes that help distinguish the various classes of text lines. The classes were derived empirically from the text and guided by an ontology developed by the Consortium for Health Informatics Research (CHIR), a nationwide research initiative focused on medical informatics. Decision trees and Levenshtein approximate string matching techniques were tested and compared on 5,055 unseen lines of text. The performance of the decision tree method was found to be superior to the fuzzy string match method on this task. Decision trees achieved an overall accuracy of 98.5 percent, while the string match method only achieved an accuracy of 87 percent. Overall, the results for line classification were very encouraging. The labels applied to the lines were used to evaluate TagLine's performance for identifying the semi-structured text elements, including tables, slots and fillers. Results for slots and fillers were impressive, while the results for tables were also acceptable.
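
For reference, the core of the approximate string matching baseline is the standard Levenshtein edit distance, sketched below; TagLine's features and decision-tree training are not reproduced here.

```python
# Levenshtein distance by dynamic programming: the minimum number of
# single-character insertions, deletions and substitutions turning
# one string into another.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete from a
                            curr[j - 1] + 1,           # insert into a
                            prev[j - 1] + (ca != cb))) # substitute
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```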
APA, Harvard, Vancouver, ISO, and other styles
45

Yeates, Stuart Andrew. "Text Augmentation: Inserting markup into natural language text with PPM Models." The University of Waikato, 2006. http://hdl.handle.net/10289/2600.

Full text
Abstract:
This thesis describes a new optimisation and new heuristics for automatically marking up XML documents, and CEM, a Java implementation, using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BibTeX system and marked up in XML with every field from the original BibTeX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists' Communique corpus and the Reuters' corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked up documents. The performance of the new heuristics and optimisation is examined using the four corpora.
APA, Harvard, Vancouver, ISO, and other styles
46

Wu, Qinyi. "Partial persistent sequences and their applications to collaborative text document editing and processing." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/44916.

Full text
Abstract:
In a variety of text document editing and processing applications, it is necessary to keep track of the revision history of text documents by recording changes and the metadata of those changes (e.g., user names and modification timestamps). The recent Web 2.0 document editing and processing applications, such as real-time collaborative note taking and wikis, require fine-grained shared access to collaborative text documents as well as efficient retrieval of metadata associated with different parts of collaborative text documents. Current revision control techniques only support coarse-grained shared access and are inefficient at retrieving metadata of changes at the sub-document granularity. In this dissertation, we design and implement partial persistent sequences (PPSs) to support real-time collaborations and manage metadata of changes at fine granularities for collaborative text document editing and processing applications. As a persistent data structure, PPSs have two important features. First, items in the data structure are never removed. We maintain necessary timestamp information to keep track of both inserted and deleted items and use the timestamp information to reconstruct the state of a document at any point in time. Second, PPSs create unique, persistent, and ordered identifiers for items of a document at fine granularities (e.g., a word or a sentence). As a result, we are able to support consistent and fine-grained shared access to collaborative text documents by detecting and resolving editing conflicts based on the revision history as well as to efficiently index and retrieve metadata associated with different parts of collaborative text documents. We demonstrate the capabilities of PPSs through two important problems in collaborative text document editing and processing applications: data consistency control and fine-grained document provenance management. The first problem studies how to detect and resolve editing conflicts in collaborative text document editing systems. We approach this problem in two steps. In the first step, we use PPSs to capture data dependencies between different editing operations and define a consistency model more suitable for real-time collaborative editing systems. In the second step, we extend our work to the entire spectrum of collaborations and adapt transactional techniques to build a flexible framework for the development of various collaborative editing systems. The generality of this framework is demonstrated by its capabilities to specify three different types of collaborations as exemplified in the systems of RCS, MediaWiki, and Google Docs respectively. We precisely specify the programming interfaces of this framework and describe a prototype implementation over Oracle Berkeley DB High Availability, a replicated database management engine. The second problem of fine-grained document provenance management studies how to efficiently index and retrieve fine-grained metadata for different parts of collaborative text documents. We use PPSs to design both disk-economic and computation-efficient techniques to index provenance data for millions of Wikipedia articles. Our approach is disk economic because we only save a few full versions of a document and only keep delta changes between those full versions. Our approach is also computation-efficient because we avoid the necessity of parsing the revision history of collaborative documents to retrieve fine-grained metadata.
Compared to MediaWiki, the revision control system for Wikipedia, our system uses less than 10% of disk space and achieves at least an order of magnitude speed-up to retrieve fine-grained metadata for documents with thousands of revisions.
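
For illustration, the two features claimed for PPSs (items are never physically removed, and timestamps allow reconstructing any past state) can be shown with a toy structure; this simplification is invented for this summary and is not the dissertation's implementation.

```python
# Toy partial persistent sequence: items carry insert/delete
# timestamps instead of being removed, so the document state at any
# past time can be reconstructed. Deliberately omits the fine-grained
# identifiers and conflict resolution of the real PPS design.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    text: str
    inserted_at: int
    deleted_at: Optional[int] = None  # None => still live

class PPS:
    def __init__(self):
        self.items: list[Item] = []
        self.clock = 0

    def insert(self, pos: int, text: str) -> None:
        self.clock += 1
        # positions are counted over live items only
        live = [i for i, it in enumerate(self.items) if it.deleted_at is None]
        idx = live[pos] if pos < len(live) else len(self.items)
        self.items.insert(idx, Item(text, self.clock))

    def delete(self, pos: int) -> None:
        self.clock += 1
        live = [it for it in self.items if it.deleted_at is None]
        live[pos].deleted_at = self.clock   # mark, never remove

    def state_at(self, t: int) -> str:
        return " ".join(it.text for it in self.items
                        if it.inserted_at <= t
                        and (it.deleted_at is None or it.deleted_at > t))

doc = PPS()
doc.insert(0, "hello")   # t=1
doc.insert(1, "world")   # t=2
doc.delete(0)            # t=3: "hello" is only marked deleted
print(doc.state_at(2))   # "hello world"
print(doc.state_at(3))   # "world"
```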
APA, Harvard, Vancouver, ISO, and other styles
47

Oyarce, Guillermo Alfredo. "A Study of Graphically Chosen Features for Representation of TREC Topic-Document Sets." Thesis, University of North Texas, 2000. https://digital.library.unt.edu/ark:/67531/metadc2456/.

Full text
Abstract:
Document representation is important for computer-based text processing. Good document representations must include at least the most salient concepts of the document. Documents exist in a multidimensional space, which makes it difficult to identify which concepts to include. A current problem is to measure the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution towards this goal, this dissertation studied the visual inter-document relationship in a dimensionally reduced space. The same treatment was done on full text and on three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi-distribution in the whole document set. The third document representation identified features through a novel method. A Coefficient of Variability was calculated by normalizing the Cartesian distance of the discriminating value in the relevant and the non-relevant document subsets. Also, the local dictionary method was used. Cosine similarity values measured the inter-document distance in the information space and formed a matrix to serve as input to the Multi-Dimensional Scaling (MDS) procedure. A Precision-Recall procedure was averaged across all treatments to statistically compare them. Treatments were not found to be statistically the same and the null hypotheses were rejected.
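
For illustration, the measurement pipeline described here (cosine similarities fed to MDS) can be sketched with scikit-learn; the random vectors below stand in for real document representations, and the study's actual software is not specified.

```python
# Sketch of the pipeline: cosine similarities between document
# vectors become a dissimilarity matrix fed to MDS, giving a
# low-dimensional layout for visual inspection of inter-document
# relationships.
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
docs = rng.random((20, 100))             # 20 documents, 100 term features

dissim = 1.0 - cosine_similarity(docs)   # cosine distance matrix
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)
print(coords.shape)                      # (20, 2): one point per document
```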
APA, Harvard, Vancouver, ISO, and other styles
48

Zhang, Shujian. "Evaluation in built-in self-test." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ34293.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Storby, Johan. "Information extraction from text recipes in a web format." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189888.

Full text
Abstract:
Searching the Internet for recipes to find interesting ideas for meals to prepare is getting increasingly popular. It can however be difficult to find a recipe for a dish that can be prepared with the items someone has available at home. In this thesis, a solution to part of that problem is presented. This thesis investigates a method for extracting the various parts of a recipe from the Internet in order to save them and build a searchable database of recipes where users can search for recipes based on the ingredients they have available. The system works for both English and Swedish and is able to identify both languages. This is a problem within Natural Language Processing and the subfield Information Extraction. To solve the Information Extraction problem, rule-based techniques based on Named Entity Recognition, Content Extraction and general rule-based extraction are used. The results indicate generally good but not flawless functionality. For English, the rule-based algorithm achieved an F1-score of 83.8% for ingredient identification, 94.5% for identification of cooking instructions, and an accuracy of 88.0% and 96.4% for cooking time and number of portions respectively. For Swedish, the ingredient identification worked slightly better but the other parts worked slightly worse. The results are comparable to the results of other similar methods and can hence be considered good; however, they are not good enough for the system to be used independently without a supervising human.
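
For illustration, two-language identification can be done with a simple stopword-count heuristic. The sketch below is a stand-in, since the abstract does not specify the thesis's identification method, and the stopword lists are abbreviated.

```python
# Stopword-overlap heuristic for telling English from Swedish text.
# Stopword lists are abbreviated; the thesis's actual method may differ.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "with", "for", "in", "is"},
    "sv": {"och", "att", "det", "som", "med", "för", "i", "är"},
}

def identify_language(text: str) -> str:
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)

print(identify_language("Whisk the eggs with a pinch of salt"))  # en
print(identify_language("Vispa äggen med en nypa salt"))         # sv
```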
APA, Harvard, Vancouver, ISO, and other styles
50

Cimiano, Philipp. "Ontology learning and population from text : algorithms, evaluation and applications /." New York, NY : Springer, 2006. http://www.loc.gov/catdir/enhancements/fy0824/2006931701-d.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles