Dissertations on the topic "Text modeling"
Format your sources in APA, MLA, Chicago, Harvard, and other citation styles
Consult the top 50 dissertations for your research on the topic "Text modeling".
Sauper, Christina (Christina Joan). "Content modeling for social media text." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/75648.
Full text of the source. Cataloged from PDF version of thesis.
Includes bibliographical references (p. 129-136).
This thesis focuses on machine learning methods for extracting information from user-generated content. Instances of this data such as product and restaurant reviews have become increasingly valuable and influential in daily decision making. In this work, I consider a range of extraction tasks such as sentiment analysis and aspect-based review aggregation. These tasks have been well studied in the context of newswire documents, but the informal and colloquial nature of social media poses significant new challenges. The key idea behind our approach is to automatically induce the content structure of individual documents given a large, noisy collection of user-generated content. This structure enables us to model the connection between individual documents and effectively aggregate their content. The models I propose demonstrate that content structure can be utilized at both document and phrase level to aid in standard text analysis tasks. At the document level, I capture this idea by joining the original task features with global contextual information. The coupling of the content model and the task-specific model allows the two components to mutually influence each other during learning. At the phrase level, I utilize a generative Bayesian topic model where a set of properties and corresponding attribute tendencies are represented as hidden variables. The model explains how the observed text arises from the latent variables, thereby connecting text fragments with corresponding properties and attributes.
by Christina Sauper.
Ph.D.
Harrysson, Mattias. "Neural probabilistic topic modeling of short and messy text." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189532.
Exploring enormous amounts of user-generated data through topics postulates a new way of finding useful information. The topics are assumed to be "hidden" and must be "uncovered" by statistical methods such as topic modeling. However, user-generated data is generally short and messy, e.g. informal chat conversations, heavy use of slang, and "noise" such as URLs or other forms of pseudo-text. This kind of data is difficult to process for most natural language algorithms, including topic modeling. In a comparative study, this work attempted to find the method that objectively yields the better topics from short and messy text. The methods compared were latent Dirichlet allocation (LDA), Re-organized LDA (RO-LDA), a Gaussian Mixture Model (GMM) with distributed representations of words, and our own method, named Neural Probabilistic Topic Modeling (NPTM), based on previous work. The conclusion that can be drawn is that NPTM tends to yield better topics on short and messy text than LDA and RO-LDA, while GMM failed to produce any meaningful results at all. The results are less conclusive because NPTM suffers from long running times, which meant that enough samples could not be obtained for a statistical test.
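For readers unfamiliar with the LDA baseline that recurs throughout this listing, the collapsed Gibbs sampler at its core fits in a few dozen lines of plain Python. This is a toy sketch for illustration only; the corpus, hyperparameters, and iteration count below are invented, and it is not any thesis author's implementation:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a tokenized corpus.

    Returns (doc_topic_counts, topic_word_counts) after `iters` sweeps.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * n_topics for _ in docs]               # doc -> topic counts
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # topic -> word counts
    n_k = [0] * n_topics                                # tokens per topic
    z = []                                              # topic of each token
    for d, doc in enumerate(docs):                      # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                             # forget this token
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                # p(z = t | rest): doc preference times topic-word affinity
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                r = rng.random() * sum(weights)         # sample a new topic
                acc = 0.0
                for t, wt in enumerate(weights):
                    acc += wt
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return n_dk, n_kw

# Invented toy corpus of short "messages":
docs = [["cat", "dog", "vet"], ["stock", "market", "trade"],
        ["dog", "cat"], ["market", "stock"]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, iters=100)
```

In practice, optimized libraries (e.g. gensim or MALLET) are used instead; a sketch like this mainly makes visible why short documents hurt LDA: with only a handful of tokens per document, the per-document counts `n_dk` give the sampler very little evidence to work with.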
Reynolds, Douglas A. "A Gaussian mixture modeling approach to text-independent speaker identification." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/16903.
Sad, Hamed. "Text entry interfaces on mobile devices : modeling, design and evaluation." Lorient, 2009. http://www.theses.fr/2009LORIS153.
This thesis concerns text entry on mobile devices, which has been a very active area of human-computer interaction (HCI) in recent years. The research deals more specifically with the evaluation of text entry methods. We address the two main approaches to evaluation: experimental evaluation and model-based evaluation. A platform for experimental evaluation is presented, which aims to make evaluation easier, faster, and more reproducible. This platform, which integrates many entry methods, makes it possible to compare them and greatly simplifies the design and development of new text entry ideas. The platform also includes evaluation tools, such as a tool for building a test corpus representative of the target language and a tool for automating performance analysis based on the standard metrics of the field. We also propose a framework for describing, classifying, and modeling the operations involved in text entry on mobile devices. At its base, we distinguish two stages: planning and execution. The first stage corresponds to the mental process of planning the physical actions required to enter a word with a given method; the second concerns the motor process of producing the text from the actions available to the user. This thesis proposes measures for a theoretical evaluation of these two phases of entry. The theoretical evaluation relies on models of human performance for the various tasks involved in entry. In particular, we studied two tasks frequently used in the execution phase: selecting a word from a word list, and pointing and scrolling through tilt-based interaction.
We present an algorithm and recommendations for designing efficient ambiguous keyboards. A performance model for word selection from a list is proposed, following an experimental study. Another model predicts the execution time of tilt-based pointing and scrolling on a mobile device. Finally, we propose novel directions for text entry concerning the planning phase. The approach exploits our "knowledge of the world" as well as the syntactic nature of the words in the message. We move away as much as possible from letter-by-letter text entry, toward a pictographic approach in which the most frequent words are directly accessible from a graphical representation. The proposed approach also exploits the syntax of the language to let the user filter the desired word by its grammatical category through gestures. This pictographic and syntactic approach uses a dedicated prediction engine and lexicon encoding that provide an efficient data structure suited to the limited performance of mobile devices.
Cheng, Chi Wa. "Probabilistic topic modeling and classification probabilistic PCA for text corpora." HKBU Institutional Repository, 2011. http://repository.hkbu.edu.hk/etd_ra/1263.
Ren, Zhaowei. "Analysis and Modeling of the Structure of Semantic Dynamics in Texts." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1512045439740177.
Preece, Daniel Joseph. "Text Identification by Example." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2060.pdf.
Повний текст джерелаBischof, Jonathan Michael. "Interpretable and Scalable Bayesian Models for Advertising and Text." Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11400.
Statistics
Foulds, James Richard. "Latent Variable Modeling for Networks and Text: Algorithms, Models and Evaluation Techniques." Thesis, University of California, Irvine, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3631094.
In the era of the internet, we are connected to an overwhelming abundance of information. As more facets of our lives become digitized, there is a growing need for automatic tools to help us find the content we care about. To tackle the problem of information overload, a standard machine learning approach is to perform dimensionality reduction, transforming complicated high-dimensional data into a manageable, low-dimensional form. Probabilistic latent variable models provide a powerful and elegant framework for performing this transformation in a principled way. This thesis makes several advances for modeling two of the most ubiquitous types of online information: networks and text data.
Our first contribution is to develop a model for social networks as they vary over time. The model recovers latent feature representations of each individual, and tracks these representations as they change dynamically. We also show how to use text information to interpret these latent features.
Continuing the theme of modeling networks and text data, we next build a model of citation networks. The model finds influential scientific articles and the influence relationships between the articles, potentially opening the door for automated exploratory tools for scientists. The increasing prevalence of web-scale data sets provides both an opportunity and a challenge. With more data we can fit more accurate models, as long as our learning algorithms are up to the task. To meet this challenge, we present an algorithm for learning latent Dirichlet allocation topic models quickly, accurately and at scale. The algorithm leverages stochastic techniques, as well as the collapsed representation of the model. We use it to build a topic model on 4.6 million articles from the open encyclopedia Wikipedia in a matter of hours, and on a corpus of 1740 machine learning articles from the NIPS conference in seconds.
Finally, evaluating the predictive performance of topic models is an important yet computationally difficult task. We develop one algorithm for comparing topic models, and another for measuring the progress of learning algorithms for these models. The latter method achieves better estimates than previous algorithms, in many cases with an order of magnitude less computational effort.
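Evaluating the predictive performance of a topic model, as discussed in the abstract above, usually reduces to scoring held-out text. A simplified per-word perplexity under fixed point estimates might look as follows; this is a sketch over invented inputs, not the thesis's estimator, which tackles the harder problem of integrating over the latent variables:

```python
import math

def perplexity(docs, doc_topic, topic_word):
    """Per-word perplexity of held-out tokens under fixed point estimates.

    doc_topic[d][k] = p(topic k | doc d); topic_word[k][w] = p(word w | topic k).
    Lower is better; a uniform model over a V-word vocabulary scores V.
    """
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(doc_topic[d][k] * topic_word[k].get(w, 0.0)
                    for k in range(len(topic_word)))
            log_lik += math.log(max(p, 1e-12))  # guard against unseen words
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

# Sanity check: a model that is uniform over a 4-word vocabulary
# must score a perplexity of 4 on any text drawn from that vocabulary.
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
print(round(perplexity([["a", "b", "c"]], [[0.5, 0.5]], [uniform, uniform]), 6))
```

The difficulty the abstract alludes to is that in a real evaluation `doc_topic` for a held-out document is itself unknown and must be marginalized out, which is what makes accurate comparison of topic models computationally expensive.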
Alsadhan, Majed. "An application of topic modeling algorithms to text analytics in business intelligence." Thesis, Kansas State University, 2014. http://hdl.handle.net/2097/17580.
Department of Computing and Information Sciences
Doina Caragea
William H. Hsu
In this work, we focus on the task of clustering businesses in the state of Kansas based on the content of their websites and their business listing information. Our goal is to cluster the businesses and overcome the challenges facing current approaches, such as data noise, the low number of clustered businesses, and the lack of an evaluation approach. We propose an LSA-based approach that analyzes the businesses' data, produces business representations in a reduced space, and then clusters the businesses with the Bisecting K-Means algorithm. We also apply an existing LDA-based approach to cluster the businesses and compare the results with our proposed LSA-based approach. We evaluate the results using a human-expert-based evaluation procedure, and we visualize the clusters produced with Google Earth and Tableau. According to our evaluation procedure, the LDA-based approach performed slightly better than the LSA-based approach. However, the LDA-based approach had some limitations: it clustered fewer businesses and could not produce a hierarchical tree for the clusters. With the LSA-based approach, we were able to cluster all the businesses and produce a hierarchical tree for the clusters.
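The Bisecting K-Means step used above splits the corpus top-down: run plain k = 2 k-means on the largest cluster, repeat until the desired number of clusters is reached, and let the sequence of splits define the hierarchical tree mentioned in the abstract. A minimal sketch follows; the toy Euclidean points stand in for the reduced LSA document vectors and this is not the thesis code:

```python
import random

def kmeans2(points, iters=50, seed=0):
    """Split a list of points (tuples) into two clusters with k-means, k = 2."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:  # assign each point to its nearest center
            d0 = sum((a - b) ** 2 for a, b in zip(p, centers[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(p, centers[1]))
            groups[0 if d0 <= d1 else 1].append(p)
        new_centers = []
        for group, old in zip(groups, centers):  # recompute centroids
            if group:
                dim = len(group[0])
                new_centers.append(tuple(sum(p[i] for p in group) / len(group)
                                         for i in range(dim)))
            else:
                new_centers.append(old)
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return groups

def bisecting_kmeans(points, n_clusters):
    """Repeatedly bisect the largest cluster; the order of splits defines a
    hierarchical tree over the clusters (assumes n_clusters <= len(points))."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        clusters.sort(key=len, reverse=True)
        left, right = kmeans2(clusters.pop(0))
        clusters += [left, right]
    return clusters

# Toy 2-D points standing in for reduced LSA document vectors:
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
clusters = bisecting_kmeans(pts, 3)
```

One design point worth noting: because every cluster is produced by a binary split of an earlier cluster, the hierarchy falls out for free, which is exactly the property the LSA-based approach exploits and the LDA-based approach lacked.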
Sun, Yingcheng. "Topic Modeling and Spam Detection for Short Text Segments in Web Forums." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1575281495398615.
Fries, Jason Alan. "Modeling words for online sexual behavior surveillance and clinical text information extraction." Diss., University of Iowa, 2015. https://ir.uiowa.edu/etd/2076.
Lund, Jeffrey A. "Fast Inference for Interactive Models of Text." BYU ScholarsArchive, 2015. https://scholarsarchive.byu.edu/etd/5780.
Wang, Xuerui. "Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities." Amherst, Mass. : University of Massachusetts Amherst, 2009. http://scholarworks.umass.edu/open_access_dissertations/58/.
Jernite, Yacine. "Learning Representations of Text through Language and Discourse Modeling: From Characters to Sentences." Thesis, New York University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10680744.
In this thesis, we consider the problem of obtaining a representation of the meaning expressed in a text. How to do so correctly remains a largely open problem, combining a number of inter-related questions (e.g. what is the role of context in interpreting text? how should language understanding models handle compositionality?). In this work, after reflecting on the notion of meaning and describing the most common sequence modeling paradigms in use in recent work, we focus on two of these questions: at what level of granularity text should be read, and what training objectives can lead models to learn useful representations of a text's meaning.
In the first part, we argue for the use of sub-word information for that purpose, and present new neural network architectures that can either process words in a way that takes advantage of morphological information, or do away with word separation altogether while still identifying relevant units of meaning.
The second part starts by arguing for the use of language modeling as a learning objective, and provides algorithms that address its scalability issues and yield a globally rather than locally normalized probability distribution. It then explores what makes a good language learning objective, and introduces discriminative objectives, inspired by the notion of discourse coherence, that help learn representations of sentence meaning.
Duong-Trung, Nghia [Verfasser]. "Social Media Learning : Novel Text Analytics for Geolocation and Topic Modeling / Nghia Duong-Trung." Göttingen : Cuvillier Verlag, 2017. http://d-nb.info/1136676988/34.
Lipka, Nedim [Verfasser], Benno [Akademischer Betreuer] Stein, and James [Gutachter] Shanahan. "Modeling Non-Standard Text Classification Tasks / Nedim Lipka ; Gutachter: James Shanahan ; Betreuer: Benno Stein." Weimar : Professur Content Management / Web-Technologien, 2013. http://d-nb.info/1116094495/34.
Redyuk, Sergey. "Finding early signals of emerging trends in text through topic modeling and anomaly detection." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15507.
Nguyen, Thi Thu Trang. "HMM-based Vietnamese Text-To-Speech : Prosodic Phrasing Modeling, Corpus Design, System Design, and Evaluation." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112201/document.
The thesis objective is to design and build a high-quality Hidden Markov Model (HMM) based Text-To-Speech (TTS) system for Vietnamese, a tonal language. The system is called VTED (Vietnamese TExt-to-speech Development system). In view of the great importance of lexical tones, a "tonophone", an allophone in tonal context, was proposed as a new speech unit in our TTS system. A new training corpus, VDTS (Vietnamese Di-Tonophone Speech corpus), was designed for 100% coverage of di-phones in tonal contexts (i.e. di-tonophones) using a greedy algorithm over a huge raw text. A total of about 4,000 sentences of VDTS were recorded and pre-processed as a training corpus for VTED. In HMM-based speech synthesis, although pause duration can be modeled as a phoneme, the appearance of pauses cannot be predicted by HMMs. Lower phrasing levels above words may not be completely modeled with basic features. This research aimed at automatic prosodic phrasing for Vietnamese TTS using durational clues alone, as it appeared too difficult to disentangle intonation from lexical tones. Syntactic blocks, i.e. syntactic phrases with a bounded number of syllables (n), were proposed for predicting final lengthening (n = 6) and pause appearance (n = 10). Final lengthening was further improved by strategies for grouping single syntactic blocks. The predictive J48-decision-tree model for pause appearance, using syntactic blocks combined with syntactic link and POS (Part-Of-Speech) features, reached an F-score of 81.4% (Precision = 87.6%, Recall = 75.9%), much better than models with POS alone (F-score = 43.6%) or syntactic links alone (F-score = 52.6%). The architecture of the system was based on the core architecture of HTS, extended with a Natural Language Processing component for Vietnamese. Pause appearance was predicted by the proposed model.
Contextual features included phone identity, locational features, tone-related features, and prosodic features (i.e. POS, final lengthening, break levels). Mary TTS was chosen as the platform for implementing VTED. In the MOS (Mean Opinion Score) test, the first VTED, trained with the old corpus and basic features, was rather good: 0.81 (on a 5-point MOS scale) higher than the previous system, HoaSung (non-uniform unit selection with the same training corpus), but still 1.2-1.5 points lower than natural speech. The quality of the final VTED, trained with the new corpus and the prosodic phrasing model, improved by about 1.04 over the first VTED, and its gap with natural speech was much reduced. In the tone intelligibility test, the final VTED received a high correct rate of 95.4%, only 2.6% lower than natural speech and 18% higher than the initial system. The error rate of the first VTED in the intelligibility test with the Latin square design was about 6-12% higher than natural speech, depending on the syllable, tone, or phone level. The final system diverged by only about 0.4-1.4% from natural speech.
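The F-scores quoted in this abstract are the harmonic mean of precision and recall, so the reported figures can be cross-checked directly:

```python
def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the pause-prediction model above:
print(round(f_score(0.876, 0.759), 3))  # → 0.813
```

This matches the reported 81.4% up to rounding of the published precision and recall values.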
Akinepally, Pratima Rao. "Investigating Performance of Different Models at Short Text Topic Modelling." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288531.
The main purpose of the project reported in this thesis is to quantitatively and qualitatively evaluate and compare how well the Universal Sentence Encoder (USE), a semantic vector space for sentences, and word2vec, a semantic vector space for words, perform at modeling the topical content of text. As training data, the project used written summaries and topic labels for podcast episodes made available by Spotify. The written summaries were used to generate vectors both for the individual podcast episodes and for the topics they cover. The vectors from the two approaches were then evaluated by using them to assign topics to descriptions from a test set. The results were compared, leading both to the general conclusion that semantic vector spaces are well suited to this kind of task, and to the finding that USE overall outperforms the word2vec models.
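The assignment step described above (embed an episode description and the candidate topic labels in the same vector space, then pick the nearest topic) can be sketched with cosine similarity. The vectors below are toy stand-ins for the embeddings that USE or word2vec would actually produce:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_topic(episode_vec, topic_vecs):
    """Pick the topic whose vector is most similar to the episode vector."""
    return max(topic_vecs, key=lambda label: cosine(episode_vec, topic_vecs[label]))

# Hypothetical 3-dimensional embeddings (real USE vectors are 512-dimensional):
topics = {"news": (0.9, 0.1, 0.0), "comedy": (0.1, 0.9, 0.2)}
episode = (0.8, 0.2, 0.1)  # embedded episode description
print(assign_topic(episode, topics))  # → news
```

The comparison in the thesis then reduces to running this assignment with vectors from each embedding model and scoring the resulting topic labels against the ground truth.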
Kim, Hyowon. "Improving Inferences about Preferences in Choice Modeling." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587524882296023.
Barford, Paul R. "Modeling, measurement and performance of World Wide Web transactions." Thesis, Boston University, 2001. https://hdl.handle.net/2144/36753.
Note: Boston University Libraries did not receive an Authorization To Manage form for this thesis, so it is not openly accessible, though it may be available by request from the author or principal advisor (contact open-help@bu.edu).
The size, diversity and continued growth of the World Wide Web combine to make its understanding difficult even at the most basic levels. The focus of our work is in developing novel methods for measuring and analyzing the Web which lead to a deeper understanding of its performance. We describe a methodology and a distributed infrastructure for taking measurements in both the network and end-hosts. The first unique characteristic of the infrastructure is our ability to generate requests at our Web server which closely imitate actual users. This ability is based on detailed analysis of Web client behavior and the creation of the Scalable URL Request Generator (SURGE) tool. SURGE provides us with the flexibility to test different aspects of Web performance. We demonstrate this flexibility in an evaluation of the 1.0 and 1.1 versions of the Hypertext Transfer Protocol. The second unique aspect of our approach is that we analyze the details of Web transactions by applying critical path analysis (CPA). CPA enables us to precisely decompose latency in Web transactions into propagation delay, network variation, server delay, client delay and packet loss delays. We present analysis of performance data collected in our infrastructure. Our results show that our methods can expose surprising behavior in Web servers, and can yield considerable insight into the causes of delay variability in Web transactions.
Johnson, Barbara Denise. "Modeling Cognitive Authority Relationships." Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc955042/.
Das, Manirupa. "Neural Methods Towards Concept Discovery from Text via Knowledge Transfer." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572387318988274.
Al, Madi Naser S. "A STUDY OF LEARNING PERFORMANCE AND COGNITIVE ACTIVITY DURING MULTIMODAL COMPREHENSION USING SEGMENTATION-INTEGRATION MODEL AND EEG." Kent State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=kent1416868268.
Keneshloo, Yaser. "Addressing Challenges of Modern News Agencies via Predictive Modeling, Deep Learning, and Transfer Learning." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/91910.
Doctor of Philosophy
Nowadays, each person is exposed to an immense amount of information from social media, blog posts, and online news portals. Among these sources, news agencies are one of the main content providers for people around the world. Contemporary news agencies are moving from traditional journalism to modern techniques from different angles, either by building smart tools to track readers' reactions to a specific news article or by providing automated tools that help editors deliver higher-quality content. These systems must not only scale with the growth of readership but also process ad-hoc requests precisely, since most of the policies and decisions in these agencies are made based on the results of these analytical tools. As part of this movement toward smart journalism, we have worked with The Washington Post on tools for predicting the popularity of a news article and on an automated text summarization model. We develop a model that monitors each news article after its publication and predicts the number of views the article will receive within the next 24 hours. This model helps content creators not only promote potentially viral articles on the main page of the web portal or on social media, but also gives editors intuition about poorly performing articles so that they can edit their content for better exposure. On the other hand, current news agencies generate more than a thousand news articles per day, and writing three to four summary sentences for each of these pieces is not only becoming infeasible but is also expensive and time-consuming. Therefore, we also develop a separate model for automated text summarization, which generates summary sentences for a news article by selecting the most salient sentences and paraphrasing them into shorter sentences that can serve as a summary of the entire document.
Xiong, Hui. "Combining Subject Expert Experimental Data with Standard Data in Bayesian Mixture Modeling." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1312214048.
Chen, Le. "Identifying Job Categories and Required Competencies for Instructional Technologist: A Text Mining and Content Analysis." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99279.
Doctor of Philosophy
According to Kimmons and Veletsianos (2018), text mining has not been widely applied in the field of instructional technology. This study provides an example of using text mining techniques to discover a set of required job competencies. It can be helpful to researchers unfamiliar with text mining methodology, allowing them to understand its potentials and limitations better. The primary research focus was to examine the efficacy of text mining by comparing text mining results with content analysis results. Both content analysis and text mining procedures were applied to the same data set to extract job competencies. Similarities and differences between the results were compared, and the pros and cons of each methodology were discussed.
SUI, ZHENHUAN. "Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637.
Fonseca, Felipe Penhorate Carvalho da. "Inferência das áreas de atuação de pesquisadores." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-02032018-102111/.
Nowadays, there is a wide range of academic data available on the web. With this information, it is possible to solve tasks such as the discovery of specialists in a given area, identification of potential scholarship holders, and suggestion of collaborators, among others. However, the success of these tasks depends on the quality of the data used, since incorrect or incomplete data tend to impair the performance of the applied algorithms. Several academic data repositories do not contain or do not require explicit information about researchers' areas. In the data of the Lattes curricula, this information exists, but it is inserted manually by the researcher without any kind of validation (and potentially it is outdated, missing, or even incorrect). The present work applied machine learning techniques to infer researchers' areas based on the data registered in the Lattes platform. The titles of the scientific production were used as the data source and were enriched with semantically related information present in other bases, in addition to adopting other representations for the title text and other academic information such as advising activities and research projects. The objective of this dissertation was to evaluate whether data enrichment improves the performance of the classification algorithms tested, as well as to analyze the contribution of factors such as social network metrics, the language of the titles, and the hierarchical structure of the areas to the performance of the algorithms. The proposed technique can be applied to different academic data (it is not restricted to the Lattes platform), but data from this platform was used for the tests and validations of the proposed solution. As a result, we found that the text enrichment technique did not improve the accuracy of the inference. However, social network metrics and numerical representations improved inference accuracy compared to state-of-the-art techniques, as did the use of the hierarchical structure of the classes, which returned the best results among those obtained.
Ahmad, Irfan [Verfasser], Gernot A. [Akademischer Betreuer] Fink, and Laurence [Gutachter] Likforman-Sulem. "Modeling and training options for handwritten Arabic text recognition / Irfan Ahmad ; Gutachter: Laurence Likforman-Sulem ; Betreuer: Gernot A. Fink." Dortmund : Universitätsbibliothek Dortmund, 2016. http://d-nb.info/1128903393/34.
Wei, Wei. "Probabilistic Models of Topics and Social Events." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/941.
Romero, Margaurete. "Comparing Game Simulation to Concept Models for Student-Centered Learning in Biology." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6577.
Shokat, Imran. "Computational Analyses of Scientific Publications Using Raw and Manually Curated Data with Applications to Text Visualization." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78995.
Apelthun, Catharina. "Topic modeling on a classical Swedish text corpus of prose fiction : Hyperparameters' effect on theme composition and identification of writing style." Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-441653.
Alverio, Gustavo. "DISCUSSION ON EFFECTIVE RESTORATION OF ORAL SPEECH USING VOICE CONVERSION TECHNIQUES BASED ON GAUSSIAN MIXTURE MODELING." Master's thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2909.
M.S.E.E.
School of Electrical Engineering and Computer Science
Al, Madi Naser S. "Modeling Eye Movement for the Assessment of Programming Proficiency." Kent State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=kent1595429905152276.
Kramer, Stephanie. "Holy day effects on language: How religious geography, individual affiliation and day of the week relate to sentiment and topics on Twitter." Thesis, University of Oregon, 2018. http://hdl.handle.net/1794/23106.
Barbieri, Francesco. "Machine learning methods for understanding social media communication: modeling irony and emojis." Doctoral thesis, Universitat Pompeu Fabra, 2018. http://hdl.handle.net/10803/461793.
In this thesis we propose algorithms for the analysis of social media texts, focusing on two particular aspects: the automatic recognition of irony, and the analysis and prediction of emojis. We propose automatic systems, based on machine learning methods, capable of recognizing and interpreting these two phenomena. We also explore the problem of bias in sentiment analysis and irony detection, showing that traditional word-based systems are not robust when the training and test data belong to different domains. The model proposed in this thesis for irony recognition is more stable under domain shift than word-based systems. In a series of experiments we show that our model is also able to distinguish between satirical and non-satirical news. Likewise, using distributional semantic models, we explore whether and how the meaning and use of emojis varies across languages as well as across seasons of the year. We also ask whether it is possible to predict the emoji a message contains using only the text of the message. We show that our deep-learning-based system can perform this task with good accuracy, and that the results improve when information about the images accompanying the text is used in addition to the text itself.
Walker, Daniel David. "Bayesian Test Analytics for Document Collections." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3530.
Uslu, Tolga [Verfasser], Alexander [Akademischer Betreuer] Mehler, Alexander [Gutachter] Mehler, and Visvanathan [Gutachter] Ramesh. "Multi-document analysis : semantic analysis of large text corpora beyond topic modeling / Tolga Uslu ; Gutachter: Alexander Mehler, Visvanathan Ramesh ; Betreuer: Alexander Mehler." Frankfurt am Main : Universitätsbibliothek Johann Christian Senckenberg, 2020. http://d-nb.info/1221669125/34.
Di, Fiore Silvia. "La dimensione discorsiva della Politica di Coesione. Confronto fra Content Analysis e Topic Modeling." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/17284/.
Dzhambazov, Georgi. "Knowledge-based probabilistic modeling for tracking lyrics in music audio signals." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404681.
The thesis presented here proposes machine learning and signal processing methodologies for automatically aligning the lyrics of a song with the corresponding audio recording. The research falls within the broad field of Music Information Retrieval (MIR). Within this context, the thesis aims to improve some of the state-of-the-art methodologies of the field by introducing domain-specific knowledge. The goal of this work is to design models capable of detecting, in the audio signal, the sequential aspect of one particular element of song lyrics: the phonemes. Music can be understood as a composition of several elements, among which are the lyrics. The models we build take into account the complementary context of the lyrics, that is, all the musical aspects that complement them; in this thesis we have used the structure of the musical composition, the structure of the melodic phrases, and the rhythmic accents. From this perspective we analyze not only the low-level acoustic features, which represent the musical timbre of the phonemes, but also the high-level features in which the complementary context becomes apparent. We propose specific probabilistic models that represent how the transitions between consecutive sung phonemes are affected by various aspects of the complementary context. The complementary context addressed here unfolds in time according to the particular characteristics of each music tradition. In order to model these characteristics we have created corpora and datasets from two music traditions that are especially rich in these aspects: Beijing opera and Ottoman-Turkish makam music. The data are of several kinds: audio recordings, musical scores, and metadata.
From this perspective, the proposed models can take advantage both of the data themselves and of knowledge specific to the music tradition in order to improve on the current baseline results. As a baseline we take a phoneme recognizer based on Hidden Markov Models (HMM), a methodology widely used to detect phonemes in both sung and spoken voice. We present improvements to the common pipeline of current phoneme recognizers, adapting them to the characteristics of the music traditions studied. Beyond improving the baseline, we also design probabilistic models based on Dynamic Bayesian Networks (DBN) that represent the relation between phoneme transitions and the complementary context. We have created two different models for two aspects of the complementary context: the structure of the melodic phrase (high level) and the metrical structure (fine level). In one model we exploit the fact that the duration of a syllable depends on its position within the melodic phrase; we obtain this phrase information from the score and from knowledge specific to the music tradition. In the other model we analyze how the onsets of vocal notes, estimated directly from the audio recordings, influence the transitions between consecutive vowels and consonants. We also propose a way to detect the temporal positions of note onsets within melodic phrases by simultaneously locating the accents of a musical metrical cycle. To evaluate the potential of the proposed methods we use the specific task of lyrics-to-audio alignment. Each proposed model improves the alignment accuracy compared to the baseline, which relies exclusively on the timbral acoustic features of the phonemes.
In this way we validate our hypothesis that knowledge of the complementary context helps the automatic detection of song lyrics, especially in the case of singing voice with instrumental accompaniment. The results of this work consist not only of theoretical methodologies and data, but also of specific software tools that have been integrated into Dunya, a toolkit created in the context of the CompMusic research project, whose goal is to promote the computational analysis of the world's musics. With these tools we also show that the developed methodologies can be used for other applications in the context of music education or enriched music listening.
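The HMM baseline that the thesis improves upon rests on Viterbi forced alignment with a left-to-right phoneme topology. The following is a minimal standard-library sketch of that idea, with toy labels and hand-made acoustic scores; the thesis's actual models additionally condition the transitions on melodic and metrical context:

```python
def force_align(phonemes, frame_scores):
    """Left-to-right Viterbi forced alignment.

    phonemes: phoneme labels in lyric order.
    frame_scores: frame_scores[t][p] = log-likelihood of phoneme p at frame t
                  (here a dict keyed by label; in practice from an acoustic model).
    Returns the phoneme assigned to each frame, monotonic in lyric order.
    """
    T, N = len(frame_scores), len(phonemes)
    NEG = float("-inf")
    best = [[NEG] * N for _ in range(T)]   # best[t][i]: best score at frame t, phoneme i
    back = [[0] * N for _ in range(T)]
    best[0][0] = frame_scores[0][phonemes[0]]
    for t in range(1, T):
        for i in range(N):
            # Either stay in phoneme i or advance from i-1 (left-to-right topology).
            stay = best[t - 1][i]
            move = best[t - 1][i - 1] if i > 0 else NEG
            back[t][i] = i if stay >= move else i - 1
            best[t][i] = max(stay, move) + frame_scores[t][phonemes[i]]
    # Backtrace from the last phoneme at the last frame.
    path, i = [], N - 1
    for t in range(T - 1, -1, -1):
        path.append(phonemes[i])
        i = back[t][i]
    return path[::-1]

# Toy example: three phonemes over six frames; scores favour a 2-2-2 split.
phonemes = ["s", "a", "o"]
scores = [{p: (0.0 if p == good else -5.0) for p in phonemes} for good in "ssaaoo"]
print(force_align(phonemes, scores))  # -> ['s', 's', 'a', 'a', 'o', 'o']
```

The thesis's DBN models can be seen as replacing the uniform stay/advance decision here with transition probabilities informed by phrase position and note onsets.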
Efer, Thomas. "Graphdatenbanken für die textorientierten e-Humanities." Doctoral thesis, Universitätsbibliothek Leipzig, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-219122.
In light of the recent massive digitization efforts, most of the humanities disciplines are currently undergoing a fundamental transition towards the widespread application of digital methods. Between those traditional scholarly fields and computer science exists a methodological and communicational gap that the so-called "e-Humanities" aim to bridge systematically, via interdisciplinary project work. With text being the most common object of study in this field, many approaches from the area of Text Mining have been adapted to problems of the disciplines. While common workflows and best practices slowly emerge, it is evident that generic solutions are no ultimate fit for many specific application scenarios. To be able to create custom-tailored digital tools, one of the central issues is to digitally represent the text, as well as its many contexts and related objects of interest, in an adequate manner. This thesis introduces a novel form of text representation that is based on Property Graph databases – an emerging technology that is used to store and query highly interconnected data sets. Based on this modeling paradigm, a new text research system called "Kadmos" is introduced. It provides user-definable asynchronous web services and is built to allow for a flexible extension of the data model and system functionality within a prototype-driven development process. With Kadmos it is possible to easily scale up to text collections containing hundreds of millions of words on a single device, and even further when using a machine cluster. It is shown how various methods of Text Mining can be implemented with and adapted for the graph representation at a very fine granularity level, allowing the creation of fitting digital tools for different aspects of scholarly work.
In extended usage scenarios it is demonstrated how the graph-based modeling of domain data can be beneficial even in research scenarios that go beyond a purely text-based study.
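The core idea of Kadmos, representing text in a Property Graph, can be sketched in a few lines: tokens become nodes carrying properties, and NEXT edges preserve reading order, so text-mining primitives turn into graph traversals. The in-memory layout below is an illustrative assumption, not Kadmos's actual schema or database backend:

```python
class TextGraph:
    """A minimal in-memory property graph over tokenized documents."""

    def __init__(self):
        self.nodes = {}   # node_id -> property dict
        self.edges = []   # (source_id, label, target_id)

    def add_document(self, doc_id, tokens):
        prev = None
        for pos, tok in enumerate(tokens):
            node_id = (doc_id, pos)
            self.nodes[node_id] = {"token": tok, "doc": doc_id, "pos": pos}
            if prev is not None:
                self.edges.append((prev, "NEXT", node_id))  # reading order
            prev = node_id

    def bigrams(self, word):
        """Graph traversal: follow NEXT edges out of nodes carrying `word`."""
        return [self.nodes[t]["token"]
                for s, label, t in self.edges
                if label == "NEXT" and self.nodes[s]["token"] == word]

g = TextGraph()
g.add_document("d1", ["the", "graph", "stores", "the", "text"])
print(g.bigrams("the"))  # -> ['graph', 'text']
```

In a real Property Graph database the same traversal would be a declarative query, and further node and edge types (sentences, annotations, metadata) could be added without changing the existing model.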
Albishre, Khaled Mohammed H. "Informative feature discovery for social media mining." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/199464/1/Khaled%20Mohammed%20H_Albishre_Thesis.pdf.
Norkevičius, Giedrius. "Method for creating phone duration models using very large, multi-speaker, automatically annotated speech corpus." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2011. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2011~D_20110201_144440-12017.
The dissertation addresses two previously unstudied problems: 1. Creating phone duration prediction models for the Lithuanian language. Until now, all work examining the durations of Lithuanian speech sounds has been carried out by linguists, but those studies are largely of a descriptive-statistics nature and are limited to analyzing the influence of individual features on a sound's duration. In this work, with the help of a machine learning algorithm, the influence of features on phone durations is learned from data and expressed in the form of a decision tree. 2. A method for creating language-independent phone duration prediction models using a very large, multi-speaker, automatically annotated speech corpus. Because of the pronunciation specifics of different speakers and the inaccuracies of automatic annotation, phone duration modeling worldwide has been limited to small, expert-annotated, single-speaker corpora. The algorithms proposed in this work for normalizing the pronunciation characteristics of different speakers and for rejecting noisy corpus data make it possible to use large, multi-speaker, automatically annotated corpora for building phone duration models. A listening test carried out in the course of this work shows that the durations of the sounds making up a speech signal affect the naturalness of the speech signal as perceived by listeners/respondents; that the use of contextual information in the phone duration prediction task is an important factor influencing the naturalness of synthesized speech; and that, relative to natural speech, the best rated is... [see the full text]
Rivaldo, Ricardo de Moura. "GraphSchema : uma linguagem visual para a criação de modelos de contratos com SML." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2008. http://hdl.handle.net/10183/14908.
It is commonplace to speak of the widespread presence of text documents and of unstructured information stored in natural-language text files. This is even more pronounced for legal professionals, for whom text is the basic tool of their work. These texts come from multiple sources, such as research documents and legislation, and are also the main product of legal activities, i.e., the text documents created by legal professionals. Since the first text editors there have been many initiatives to use information technology to help generate, store, and search legal documents. Among all document types, the generation of legal contracts is especially important because of their ubiquity and their use by all social actors: private individuals, companies, and government agencies. The main focus of this work is the generation of legal contract models. The GraphSchema graphical language is introduced as a proposed solution that enables users to create contract models without the help of a computing professional. It uses a visual representation to create legal contract models, in which concepts, the relationships between them, and constraints can be expressed in a visual paradigm that users can understand. The graphical representation is translated into SML, an extension of XML Schema. By enabling end users to model contracts conceptually without imposing a restricted vocabulary or ontology, GraphSchema, and by consequence the use of SML, has several advantages over plain XML Schema, RDF, and OWL, and in particular over other approaches based on vocabulary definitions and formal ontologies. These advantages stem mainly from its simplicity and flexibility, which allow existing standards for defining contract models, such as the eContracts standard defined by the LegalXML consortium, to be reused. In this way, GraphSchema emerges as an option for implementing and applying this standard in real-world cases.
The availability of a language aimed at non-technical users will make it possible to create contracts with tag markup from the outset when combined with XML-aware text editors. This opens the door to productivity gains in the creation of contracts and legal documents.
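The target of GraphSchema's translation step is an XML Schema document (extended by SML in the thesis). As a rough illustration of what a generated contract model might look like, the sketch below emits a plain XML Schema fragment with Python's standard library; the element names are invented and the SML-specific extensions are omitted:

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XSD)

# Build a toy contract model: a Contract with at least two parties
# and a payment clause. All names are illustrative, not from the thesis.
schema = ET.Element(f"{{{XSD}}}schema")
contract = ET.SubElement(schema, f"{{{XSD}}}element", name="Contract")
ctype = ET.SubElement(contract, f"{{{XSD}}}complexType")
seq = ET.SubElement(ctype, f"{{{XSD}}}sequence")
ET.SubElement(seq, f"{{{XSD}}}element", name="Party",
              type="xs:string", minOccurs="2", maxOccurs="unbounded")
ET.SubElement(seq, f"{{{XSD}}}element", name="PaymentClause", type="xs:string")

xml_text = ET.tostring(schema, encoding="unicode")
print(xml_text)
```

A visual editor like the one the thesis proposes would draw the concepts (Contract, Party) and their cardinalities as diagram elements, then emit markup of this shape automatically.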
Packer, Thomas L. "Scalable Detection and Extraction of Data in Lists in OCRed Text for Ontology Population Using Semi-Supervised and Unsupervised Active Wrapper Induction." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4258.
Gupta, Smita. "Modelling Deception Detection in Text." Thesis, Kingston, Ont. : [s.n.], 2007. http://hdl.handle.net/1974/922.
Svensson, Karin, and Johan Blad. "Exploring NMF and LDA Topic Models of Swedish News Articles." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-429250.