Dissertations on the topic "Textual data-mining"
Format your citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 38 dissertations for your research on the topic "Textual data-mining".
Next to every source in the list of references there is an "Add to bibliography" button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scientific publication in .pdf format and read its abstract online, whenever these are available in the metadata.
Browse dissertations on a wide variety of disciplines and organise your bibliography correctly.
Zhou, Wubai. "Data Mining Techniques to Understand Textual Data." FIU Digital Commons, 2017. https://digitalcommons.fiu.edu/etd/3493.
Ur-Rahman, Nadeem. "Textual data mining applications for industrial knowledge management solutions." Thesis, Loughborough University, 2010. https://dspace.lboro.ac.uk/2134/6373.
Kubalík, Jakub. "Mining of Textual Data from the Web for Speech Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237170.
Kalledat, Tobias. "Tracking domain knowledge based on segmented textual sources." Doctoral thesis, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, 2009. http://dx.doi.org/10.18452/15925.
The research presented here aims to analyse the influence of pre-processing on the results of knowledge generation and to give concrete recommendations for suitable pre-processing of text corpora in textual data mining (TDM). The work focuses on extracting and tracking concepts within certain knowledge domains by segmenting corpora horizontally (along the timeline) and vertically (by persistence of terms). The result is a set of corpus segments ordered along the timeline. Within each timeline segment, clusters of concepts can be built according to their persistence relative to each single time-based corpus segment and to the whole corpus. Based on a simple frequency measure, it can be shown that the statistical quality of a single corpus alone suffices to measure pre-processing quality; no comparison corpora are necessary. In an optimally pre-processed corpus, the frequency time series of the two concept clusters, terms that occur permanently and terms that vary, show a significant negative correlation; the opposite was found in every test set pre-processed with lower quality. The most frequent terms were grouped into concepts using domain-specific taxonomies. For corpora with a high quality of pre-processing, a significant negative correlation was found between the time series of distinct terms per yearly corpus segment and the terms assigned to the taxonomy. A semantic analysis based on a simple TDM method with significant frequency thresholds extracted significantly different knowledge from corpora with different pre-processing qualities. The measures introduced in this research also make it possible to assess the quality of the applied taxonomy. From these results, rules for measuring corpus and taxonomy quality were derived, and advice is given on the appropriate level of pre-processing.
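The frequency-based quality indicator described above can be sketched in a few lines of Python. The yearly segments, tokens, and the choice of Pearson correlation over relative frequencies are invented for illustration; they are not Kalledat's actual data or exact measure.

```python
from collections import Counter

# Hypothetical yearly corpus segments (the timeline segmentation described above).
segments = {
    2001: "data mining methods for text corpora".split(),
    2002: "text mining and corpora preprocessing methods".split(),
    2003: "corpora preprocessing quality and mining checks".split(),
}

freq = {year: Counter(tokens) for year, tokens in segments.items()}

def series(term):
    """Relative-frequency time series of a term across the yearly segments."""
    return [freq[y][term] / sum(freq[y].values()) for y in sorted(freq)]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

# A term that fades out vs. one that appears late: a strongly negative
# correlation, the kind of signal used as a quality indicator above.
r = pearson(series("text"), series("quality"))
```

With these toy inputs the two time series move in opposite directions, so the correlation is close to -1.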
元吉, 忠寛 (Motoyoshi, Tadahiro). "災害のイマジネーション力に関する探索的研究 - 大学生の想像力と阪神淡路大震災の事例との比較 -" [An exploratory study of disaster imagination: comparing university students' imagination with cases from the Great Hanshin-Awaji Earthquake]. 名古屋大学大学院教育発達科学研究科 (Graduate School of Education and Human Development, Nagoya University), 2006. http://hdl.handle.net/2237/9454.
Spiegler, Sebastian R. "Comparative study of clustering algorithms on textual databases: clustering of curricula vitae into competency-based groups to support knowledge management." Saarbrücken: VDM Verl. Müller, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=3035354&prov=M&dok_var=1&dok_ext=htm.
Nieto, Erick Mauricio Gómez. "Projeção multidimensional aplicada a visualização de resultados de busca textual" [Multidimensional projection applied to the visualization of textual search results]. Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05122012-105730/.
Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or web page) and a link to it. This display has many advantages, e.g., it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could probably report some experience of disappointment with this metaphor. Indeed, it has limitations in particular situations, as it fails to provide an overview of the retrieved document collection. Moreover, depending on the nature of the query, e.g., if it is too general, ambiguous, or ill expressed, the desired information may be poorly ranked, or the results may span varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content-wise. We propose a visualization technique to display the results of web queries aimed at overcoming such limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation, employing a multidimensional projection to derive two-dimensional layouts of the query results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying the cosine similarity over a bag-of-words vector representation of the collection built from the snippets. If the snippets are displayed directly according to the derived layout they will overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlap among snippets and the preservation of the neighborhood structure as given in the projected layout. Minimizing this energy functional provides a neighborhood-preserving two-dimensional arrangement of the textual snippets with minimum overlap.
The resulting visualization conveys both a global view of the query results and visual groupings that reflect related results, as illustrated in several examples.
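The similarity computation the abstract describes, cosine similarity over bag-of-words vectors built from the snippets, can be sketched as follows; the snippets are invented examples.

```python
import math
from collections import Counter

# Hypothetical snippets returned for one query.
snippets = [
    "python text mining tutorial",
    "text mining with python examples",
    "best pizza recipes in town",
]

def bow(text):
    """Bag-of-words vector as a term -> count mapping."""
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

vecs = [bow(s) for s in snippets]
sim = [[cosine(u, v) for v in vecs] for u in vecs]
# Related snippets score high; the unrelated snippet scores zero, so a
# neighborhood-preserving projection would place it far from the others.
```

A projection technique (the thesis uses multidimensional projections, then an overlap-removal energy) would take this similarity matrix as its input.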
Fabbri, Renato. "Topological stability and textual differentiation in human interaction networks: statistical analysis, visualization and linked data." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-11092017-154706/.
This work reports stable (invariant) topological properties and textual differentiation in human interaction networks, with data derived from public email lists. Activity over time and topology were observed in snapshots along a timeline and at different scales. The analysis shows that activity is practically the same for all networks at time scales from seconds to months. The participants' principal components in the space of topological metrics remain practically unchanged when different sets of messages are considered. Participant activity follows the expected scale-free profile, yielding the vertex classes of hubs, intermediaries, and peripheral vertices in comparison with the Erdős-Rényi model. The relative sizes of these three sectors are essentially the same for all email lists and over time: typically, 3-12% of the vertices are hubs, 15-45% are intermediaries, and 44-81% are peripheral. The texts of each of these sectors are found to be very different, through an adaptation of Kolmogorov-Smirnov tests. These properties are consistent with the literature and may be general for human interaction networks, which has important implications for establishing a typology of participants based on quantitative criteria. To guide and support this research, we also developed a visualization method for dynamic networks through animations. To facilitate verification and further analysis, we provide a linked-data representation of the data related to our results.
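The sectioning into hubs, intermediaries, and peripheral vertices can be illustrated with a toy degree-based rule. The edge list and thresholds below are invented; the thesis itself derives its sectors by comparison with an Erdős-Rényi model rather than by these cutoffs.

```python
from collections import Counter

# Hypothetical (sender, receiver) interactions from a mailing list.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("f", "a"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("d", "e")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

mean_deg = sum(degree.values()) / len(degree)

# Illustrative sectioning: hubs sit well above the mean degree,
# peripheral vertices below it, intermediaries in between.
hubs = {v for v, d in degree.items() if d >= 1.5 * mean_deg}
peripheral = {v for v, d in degree.items() if d < mean_deg}
intermediary = set(degree) - hubs - peripheral
```

On this toy network, one vertex dominates the traffic and ends up as the sole hub, mirroring the small hub fraction (3-12%) reported in the abstract.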
Mendes, Marília Soares. "MALTU - model for evaluation of interaction in social systems from the Users Textual Language." Universidade Federal do Ceará, 2015. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=14296.
The field of Human-Computer Interaction (HCI) has suggested many ways of evaluating systems in order to improve their usability and the User eXperience (UX). The emergence of web 2.0 allowed the development of applications marked by collaboration, communication, and interactivity among their users in a manner and on a scale never observed before. Social Systems (SS) (e.g., Twitter, Facebook, MySpace, LinkedIn) are examples of such applications, with characteristics such as frequent message exchange and spontaneous expression of feelings. The opportunities and challenges brought by these kinds of applications demand that traditional evaluation methods be rethought in light of these new characteristics. For instance, users' postings in SS reveal their opinions on various subjects, including what they think of the system in use. This thesis tests the hypothesis that users' postings in SS provide relevant data for evaluating Usability and UX (UUX) in SS. In the literature review, no evaluation model was identified that focuses on collecting and analyzing users' postings in order to evaluate the UUX of a system in use. This study therefore proposes MALTU, a model for evaluating interaction in social systems from the Users' Textual Language. To provide a basis for the proposed model, we studied how users express their opinions about systems in natural language. Postings were extracted from users of four SS from distinct contexts. These postings were classified by HCI experts, studied and processed using Natural Language Processing (NLP) and data mining techniques, and analyzed in order to obtain a generic model. MALTU was applied to two SS: an entertainment system and an educational one.
The results show that it is possible to evaluate a system from users' postings in SS. Such evaluations are aided by extraction patterns related to use, to the types of postings, and to the HCI goals used in the evaluation of the system.
Kamenieva, Iryna. "Research Ontology Data Models for Data and Metadata Exchange Repository." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-6351.
For research in the fields of data mining and machine learning, a necessary condition is the availability of various input data sets, and researchers now create databases of such sets. Examples of such systems are the UCI Machine Learning Repository, the Data Envelopment Analysis Dataset Repository, the XMLData Repository, and the Frequent Itemset Mining Dataset Repository. Along with these statistical repositories, a whole range of stores, from simple file stores to specialized repositories, can be used by researchers when solving applied tasks and investigating their own algorithms and scientific problems. At first sight, the only difficulty for the user would be searching and directly understanding the structure of such scattered information stores. However, a detailed study of such repositories reveals deeper problems in data usage: in particular, a complete mismatch and rigidity of data file structures with respect to SDMX (Statistical Data and Metadata Exchange), the standard and structure used by many European organizations; the impossibility of adapting the data in advance to a concrete applied task; and the lack of a history of data usage for particular scientific and applied tasks.
There are now many data mining (DM) methods, as well as large quantities of data stored in various repositories. The repositories, however, contain no DM methods, and the methods are not linked to application areas. An essential problem is linking the subject (problem) domain, the DM methods, and the datasets appropriate for each method. In this work we therefore consider the problem of building ontological models of DM methods, describing the interaction between the methods and the corresponding data from repositories, and designing intelligent agents that allow the user of a statistical repository to choose the method and data appropriate to the task being solved. A system structure is proposed, and an intelligent search agent over the ontological model of DM methods, taking the user's personal queries into account, is implemented.
For the implementation of an intelligent data and metadata exchange repository, an agent-oriented approach was selected, and the model uses a service-oriented architecture. The implementation uses the cross-platform programming language Java, the multi-agent platform Jadex, the database server Oracle Spatial 10g, and the Protégé ontology editor, version 3.4.
Ammari, Ahmad N. "Transforming user data into user value by novel mining techniques for extraction of web content, structure and usage patterns: the development and evaluation of new Web mining methods that enhance information retrieval and improve the understanding of users' Web behavior in websites and social blogs." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/5269.
Malherbe, Emmanuel. "Standardization of textual data for comprehensive job market analysis." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC058/document.
With so many job adverts and candidate profiles available online, e-recruitment constitutes a rich object of study. All this information is, however, textual data, which from a computational point of view is unstructured. The large number and heterogeneity of recruitment websites also mean that there are many vocabularies and nomenclatures. One of the difficulties when dealing with this type of raw textual data is grasping the concepts it contains, which is the problem of standardization tackled in this thesis. The aim of standardization is to create a unified process providing values in a nomenclature. A nomenclature is by definition a finite set of meaningful concepts, which means that the attributes resulting from standardization are a structured representation of the information. Several questions arise, however: Are the websites' structured data usable for unified standardization? What structure of nomenclature is best suited for standardization, and how can it be leveraged? Is it possible to build such a nomenclature automatically from scratch, or to manage the standardization process without one? To illustrate the various obstacles of standardization, the examples we study include inferring the skills or the category of a job advert, or the level of training of a candidate profile. One of the challenges of e-recruitment is that the concepts are continuously evolving, which means that standardization must keep up with job market trends. In light of this, we propose a set of machine learning models that require minimal supervision and can easily adapt to the evolution of the nomenclatures. The questions raised found partial answers using case-based reasoning, semi-supervised learning-to-rank, latent variable models, and the evolving sources of the semantic web and social media.
The proposed models have been tested on real-world data before being implemented in an industrial environment. The resulting standardization is at the core of SmartSearch, a project which provides a comprehensive analysis of the job market.
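The standardization idea, mapping free text onto a finite nomenclature, can be illustrated with a toy token-overlap matcher. The nomenclature, titles, and scoring are invented; the thesis uses far richer, minimally supervised models.

```python
# Toy standardization step: map a free-text job title onto a fixed
# nomenclature by counting shared tokens. Everything here is invented
# for illustration.
nomenclature = {
    "software engineer": {"software", "engineer", "developer", "backend"},
    "data scientist": {"data", "scientist", "machine", "learning"},
}

def standardize(raw_title):
    """Return the nomenclature concept with the largest token overlap."""
    tokens = set(raw_title.lower().split())
    return max(nomenclature, key=lambda concept: len(tokens & nomenclature[concept]))

label = standardize("Senior Machine Learning Scientist")
```

The output of such a step is a structured attribute (a concept from the finite nomenclature), which is what makes downstream job market analysis possible.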
Saneifar, Hassan. "Locating Information in Heterogeneous log files." Thesis, Montpellier 2, 2011. http://www.theses.fr/2011MON20092/document.
In this thesis, we present contributions to the challenging issues encountered in question answering and in locating information in complex textual data, such as log files. Question answering systems (QAS) aim to find a relevant fragment of a document that can be regarded as the best possible concise answer to a question given by a user. In this work, we seek a complete solution for locating information in a special kind of textual data, namely log files generated by EDA design tools. Nowadays, in many application areas, modern computing systems are instrumented to generate huge reports about occurring events in the form of log files. Log files are generated in every computing field to report the status of systems or products, or the causes of problems that may occur. They may also include data about critical parameters, sensor outputs, or a combination of those. Analyzing log files, as an attractive approach to automatic system management and monitoring, has been enjoying a growing amount of attention [Li et al., 2005]. Although the process of generating log files is quite simple and straightforward, log file analysis can be a tremendous task that requires enormous computational resources, long time, and sophisticated procedures [Valdman, 2004]. Indeed, many kinds of log files generated in some application domains are not systematically exploited in an efficient way because of their special characteristics. In this thesis, we are mainly interested in log files generated by Electronic Design Automation (EDA) systems. Electronic design automation is a category of software tools for designing electronic systems such as printed circuit boards and integrated circuits (IC). In this domain, to ensure design quality, certain quality-check rules must be verified, and this verification is principally performed by analyzing the generated log files.
For large designs, where the design tools may generate megabytes or gigabytes of log files each day, the problem is to wade through all of this data to locate the critical information needed to verify the quality-check rules. These log files typically include a substantial amount of data, so manually locating information is a tedious and cumbersome process. Furthermore, the particular characteristics of log files, especially those generated by EDA design tools, raise significant challenges for retrieving information from them. The specific features of log files limit the usefulness of manual analysis techniques and static methods, and automated analysis of such logs is complex due to their heterogeneous and evolving structures and their large, non-fixed vocabulary. Throughout this work we investigate the main concern: how do the specificities of log files influence information extraction and natural language processing methods? In this context, a key challenge is to provide approaches that take the log file specificities into account while considering the issues specific to QA in restricted domains. We present the following contributions:
> A novel method to recognize and identify the logical units in log files in order to segment them according to their structure. We propose a method to characterize the complex logical units found in log files according to their syntactic characteristics, including an original type of descriptor to model the textual structure and layout of text documents.
> An approach to locate the requested information in log files based on passage retrieval. To improve the performance of passage retrieval, we propose a novel query expansion approach that adapts an initial query to all types of corresponding log files and overcomes difficulties such as vocabulary mismatch. Our query expansion approach relies on two relevance feedback steps: in the first, we determine explicit relevance feedback by identifying the context of questions; the second consists of a novel type of pseudo relevance feedback. Our method is based on a new term weighting function introduced in this work, TRQ (Term Relatedness to Query), which scores corpus terms according to their relatedness to the query. We also investigate how to apply our query expansion approach to documents from general domains.
> A study of the use of morpho-syntactic knowledge in our approaches. For this purpose, we are interested in terminology extraction from log files and introduce our approach, Exterlog (EXtraction of TERminology from LOGs). To evaluate the extracted terms and choose the most relevant ones, we propose a candidate term evaluation method using a measure based on the Web, combined with statistical measures, that takes the context of log files into account.
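The TRQ idea, scoring corpus terms by their relatedness to the query, can be illustrated with a crude co-occurrence stand-in. The passages and the scoring function are invented and do not reproduce the thesis's actual TRQ formula.

```python
# Toy passages from a hypothetical EDA log corpus (invented lines).
passages = [p.split() for p in (
    "timing violation on clock net n42",
    "clock skew violation reported",
    "power grid analysis passed",
)]

query = {"clock", "violation"}

def trq_like(term):
    """Fraction of query-matching passages that also contain `term`.

    A crude co-occurrence stand-in for a term-to-query relatedness score;
    the thesis's real TRQ weighting is not reproduced here."""
    matching = [p for p in passages if query & set(p)]
    return sum(term in p for p in matching) / max(len(matching), 1)

vocab = {t for p in passages for t in p}
expansion = {t for t in vocab if trq_like(t) >= 0.5} - query
```

Terms that tend to co-occur with the query (here, "skew") get high scores and become expansion candidates, while terms from unrelated passages (here, "power") are excluded.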
Valentin, Sarah. "Extraction et combinaison d'informations épidémiologiques à partir de sources informelles pour la veille des maladies infectieuses animales" [Extraction and combination of epidemiological information from informal sources for the surveillance of animal infectious diseases]. Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS067.
Epidemic intelligence aims to detect, investigate, and monitor potential health threats while relying on formal (e.g., official health authorities) and informal (e.g., media) information sources. Monitoring of unofficial sources, so-called event-based surveillance (EBS), requires systems designed to retrieve and process unstructured textual data published online. This manuscript focuses on the extraction and combination of epidemiological information from informal sources (i.e., online news), in the context of the international surveillance of animal infectious diseases. The first objective of this thesis is to propose and compare approaches to enhance the identification and extraction of relevant epidemiological information from the content of online news. The second objective is to study the use of epidemiological entities extracted from the news articles (i.e., diseases, hosts, locations, and dates) for event extraction and for retrieval of related online news. This manuscript proposes new textual representation approaches based on selecting, expanding, and combining relevant epidemiological features. We show that adapting and extending text mining and classification methods improves the added value of online news sources for event-based surveillance. We stress the role of domain expert knowledge regarding the relevance and interpretability of the methods proposed in this thesis. While our research is conducted in the context of animal disease surveillance, we discuss the generic aspects of our approaches with respect to unknown threats and One Health surveillance.
Yang, Hsien-Min. "Principal components and texture analysis of the NS-001 Thematic Mapper Simulator data in the Rosemont mining district, Arizona (geologic, digital image processing, texture extraction)." Thesis, The University of Arizona, 1985. http://hdl.handle.net/10150/275436.
Musil, David. "Algoritmus pro detekci pozitívního a negatívního textu" [An algorithm for detecting positive and negative text]. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-242026.
Kalmegh, Prajakta. "Image mining methodologies for content based retrieval." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/39587.
Diaz, Alexandra Katiuska Ramos. "Biagrupamento heurístico e coagrupamento baseado em fatoração de matrizes: um estudo em dados textuais" [Heuristic biclustering and coclustering based on matrix factorization: a study on textual data]. Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-12112018-182428/.
Biclustering and coclustering are data mining tasks that allow the extraction of relevant information from data and have been applied successfully in a wide variety of domains, including those involving textual data, the focus of this research. In biclustering and coclustering tasks, similarity criteria are applied simultaneously to the rows and columns of the data matrices, grouping objects and attributes at the same time and enabling the discovery of biclusters/coclusters. Their definitions, however, vary according to their natures and objectives; the coclustering task can be seen as a generalization of the biclustering task. Applied to textual data, these tasks demand a vector space model representation, which commonly leads to spaces characterized by high dimensionality and sparsity and influences the performance of many algorithms. This work analyzes the behavior of the Cheng and Church biclustering algorithm and the non-negative block value decomposition (NBVD) coclustering algorithm in the context of textual data. Quantitative and qualitative experimental results are shown, from experiments on synthetic datasets created with different sparsity levels and on a real dataset. The results are evaluated in terms of biclustering-oriented measures, internal clustering measures applied to the row projections of the biclusters/coclusters, and the information generated. The analysis of the results clarifies questions related to the difficulties faced by these algorithms in the experimental environment, as well as whether they can provide differentiated information useful to the field of text mining. In general, the analyses showed that the NBVD algorithm is better suited to datasets with high dimensionality and high sparsity, while the Cheng and Church algorithm, although it obtained good results according to its own objectives, provided results of low relevance in the context of textual data.
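The Cheng and Church algorithm evaluated above is driven by the mean squared residue (MSR) of a candidate bicluster, the coherence measure the algorithm tries to keep below a threshold. A minimal implementation on a toy matrix:

```python
# Mean squared residue (MSR) of a submatrix defined by row/column index sets.
def msr(matrix, rows, cols):
    row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
    total = sum(row_mean[i] for i in rows) / len(rows)
    return sum(
        (matrix[i][j] - row_mean[i] - col_mean[j] + total) ** 2
        for i in rows for j in cols
    ) / (len(rows) * len(cols))

# Toy document-term count matrix; the top-left 2x2 block is perfectly
# additive, so its MSR is zero, while the full matrix is not coherent.
m = [[1, 2, 9],
     [3, 4, 9],
     [9, 9, 9]]
```

Cheng and Church greedily delete and add rows/columns to drive this quantity down; NBVD instead factorizes the whole matrix into non-negative blocks.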
Matička, Jiří. "Extrakce klíčových slov z dokumentů" [Keyword extraction from documents]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236533.
Sychra, Martin. "Analýza sentimentu s využitím dolování dat" [Sentiment analysis using data mining]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255424.
Průša, Petr. "Multi-label klasifikace textových dokumentů" [Multi-label classification of text documents]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-412872.
Križan, Viliam. "Analýza sociálních sítí využitím metod rozpoznání vzoru" [Social network analysis using pattern recognition methods]. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-220399.
Hasan, Maryam. "Extracting Structured Knowledge from Textual Data in Software Repositories." Master's thesis, 2011. http://hdl.handle.net/10048/1776.
Dlamini, Phezulu (佩祖露). "Mining Textual Relationships from Social Media Data for Users' E-Learning Experiences." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/r4v6xc.
"Stock market forecasting by integrating time-series and textual information." 2003. http://library.cuhk.edu.hk/record=b5896089.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 88-93).
Abstracts in English and Chinese.
Contents:
Part I - The Very Beginning
1. Introduction (1.1 Contributions; 1.2 Dissertation Organization)
2. Problem Formulation (2.1 Defining the Prediction Task; 2.2 Overview of the System Architecture)
Part II - Literature Review
3. The Social Dynamics of Financial Markets (3.1 The Collective Behavior of Groups; 3.2 Prediction Based on Publicity Information)
4. Time Series Representation (4.1 Technical Analysis; 4.2 Piecewise Linear Approximation)
5. Text Classification (5.1 Document Representation; 5.2 Document Pre-processing; 5.3 Classifier Construction: Naive Bayes, Support Vector Machine)
Part III - Mining Financial Time Series and Textual Documents Concurrently
6. Time Series Representation (6.1 Discovering Trends on the Time Series; 6.2-6.3 t-test Based Split and Merge Segmentation Algorithm: Splitting and Merging Phases)
7. Article Alignment and Pre-processing (7.1 Aligning News Articles to the Stock Trends; 7.2 Selecting Positive Training Examples; 7.3 Selecting Negative Training Examples)
8. System Learning (8.1 Similarity Based Classification Approach; 8.2 Category Sketch Generation: Within-Category, Cross-Category and Average-Importance Coefficients; 8.3 Document Sketch Generation)
9. System Operation
Part IV - Results and Discussions
10. Evaluations (10.1 Time Series Evaluations; 10.2 Classifier Evaluations: Batch Classification, Online Classification, Components Analysis, Document Sketch Analysis; 10.3 Prediction Evaluations: Simulation Results, Hit Rate Analysis)
Part V - The Final Words
11. Conclusion and Future Work
Appendices: A. Hong Kong Stocks Categorization Powered by Reuters; B. Morgan Stanley Capital International (MSCI) Classification; C. Precision, Recall and F1 Measure
Bibliography
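Chapter 6 of the thesis outlined above describes a t-test based split-and-merge segmentation of stock time series. The following is a loose, illustrative sketch of the splitting phase only (the merging phase is omitted, and the price series and threshold are invented):

```python
import math

def tstat(a, b):
    """Welch t statistic between two samples (each of length >= 2)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return abs(ma - mb) / se if se else float("inf")

def split(series, lo, hi, threshold=3.0):
    """Recursively split [lo, hi) at the midpoint while the halves differ."""
    mid = (lo + hi) // 2
    if mid - lo < 2 or hi - mid < 2:
        return [(lo, hi)]
    if tstat(series[lo:mid], series[mid:hi]) < threshold:
        return [(lo, hi)]
    return split(series, lo, mid, threshold) + split(series, mid, hi, threshold)

prices = [1, 2, 1, 2, 9, 8, 9, 8]
trends = split(prices, 0, len(prices))
```

The toy series has one clear level shift, so the splitter returns exactly two segments; the thesis then aligns news articles to such trend segments.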
Wren, Jonathan Daniel. "The iridescent system: an automated data-mining method to identify, evaluate, and analyze sets of relationships within textual databases." 2000. http://edissertations.library.swmed.edu/pdf/WrenJ012403/WrenJonathan.pdf.
Zois, Christos. "Applying text mining techniques to forecast the stock market fluctuations of large IT companies with Twitter data: descriptive and predictive approaches to enhance the research of stock market predictions with textual and semantic data." Master's thesis, 2019. http://hdl.handle.net/10362/92164.
This research project applies advanced text mining techniques to predict stock market fluctuations by merging published tweets and daily stock market prices for a set of American information technology companies. The project follows a systematic approach, implemented mainly in R, to investigate two main objectives: i) which descriptive criteria, patterns, and variables are correlated with stock fluctuations; and ii) whether the use of tweets alone provides a moderate signal for predicting stock market fluctuations with high accuracy. The main expected output of the research is findings about the significance and predictive power of Twitter text, indicating the importance of social media content for stock market fluctuations, using descriptive and predictive data mining approaches such as natural language processing, topic modelling, sentiment analysis, and binary classification with neural networks.
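The kind of descriptive pairing the project performs, daily tweet sentiment against a trading signal, can be sketched as follows. The word lists, tweets, and threshold are invented for illustration and are not the project's actual lexicon or model.

```python
# Toy daily aggregation of tweet sentiment into an up/down signal.
positive = {"growth", "beat", "strong"}
negative = {"miss", "weak", "lawsuit"}

tweets = {
    "2019-05-01": ["strong quarter growth", "beat expectations"],
    "2019-05-02": ["lawsuit filed", "weak guidance ahead"],
}

def day_score(day):
    """Net count of positive minus negative lexicon words for one day."""
    words = " ".join(tweets[day]).split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

signals = {d: ("up" if day_score(d) > 0 else "down") for d in tweets}
```

A predictive study would then compare such daily signals with the actual next-day price movements; the thesis does this with richer features and neural network classifiers.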
(10157291), Yi-Yu Lai. "Relational Representation Learning Incorporating Textual Communication for Social Networks." Thesis, 2021.
Find full text of source
Sarkas, Nikolaos. "Querying, Exploring and Mining the Extended Document." Thesis, 2011. http://hdl.handle.net/1807/29857.
Full text of source
Chen, Jhih-Rong, and 陳之容. "Texture Synthesis Using Data Mining Technique." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/48354363991458229077.
Full text of source
National Dong Hwa University
Department of Computer Science and Information Engineering
92
We present a new texture synthesis algorithm that combines texture synthesis with a data mining technique, and it works well for many types of textures without requiring any knowledge of the physical processes that formed them. Our approach first analyzes the input texture to construct patch candidate data, and then uses these data to find frequent pattern sequences for the synthesis result by applying a data mining technique, Sequential Pattern Mining.
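The core of the sequential-pattern-mining step can be caricatured as counting which short patch-ID sequences recur across candidate sequences. The patch IDs and support threshold below are invented; real sequential pattern mining (the GSP or PrefixSpan family) also handles gaps and longer patterns.

```python
from collections import Counter

def frequent_bigrams(sequences, min_support):
    """Return patch-ID bigrams occurring in at least `min_support` sequences."""
    counts = Counter()
    for seq in sequences:
        # Count each bigram at most once per sequence (sequence-level support).
        counts.update({tuple(seq[i:i + 2]) for i in range(len(seq) - 1)})
    return {bigram for bigram, c in counts.items() if c >= min_support}

# Toy patch-ID sequences extracted from an input texture (invented values).
patch_sequences = [[1, 2, 3, 2], [1, 2, 2, 3], [4, 1, 2]]
print(frequent_bigrams(patch_sequences, min_support=3))  # → {(1, 2)}
```

In a synthesis setting, the surviving frequent patterns would then guide which patch follows which when assembling the output texture.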
"All Purpose Textual Data Information Extraction, Visualization and Querying." Master's thesis, 2018. http://hdl.handle.net/2286/R.I.50530.
Full text of source
Dissertation/Thesis
Masters Thesis Software Engineering 2018
Louis, Anita Lily. "Unsupervised discovery of relations for analysis of textual data in digital forensics." Diss., 2010. http://hdl.handle.net/2263/27479.
Full text of source
Dissertation (MSc)--University of Pretoria, 2010.
Computer Science
unrestricted
Moravcová, Libuše. "Srovnání sylabů předmětů na různých univerzitách dolováním znalosti z textu." Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-428783.
Повний текст джерелаLiao, Shao-An, and 廖紹安. "Using Data Mining Techniques And Texture Analysis for Landslide Change Assessment-A Case Study at Chiufanershan Area." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/28409706792512680728.
Full text of source
Mingdao University
Graduate Institute of Environmental Planning and Design
96
Massive landslides, caused by the catastrophic Chi-Chi earthquake on September 21, 1999, occurred in the Chiufanershan area of Nantou County. In this study, multi-temporal SPOT satellite images were chosen for landslide change analysis. First, an image subtraction method was employed to analyze landslide spectral characteristics, and the ISODATA method was used to select training sites before performing supervised classification. The landslide sites were extracted and compared using the commonly used maximum likelihood estimation (MLE), data mining techniques including support vector machines (SVM) and the C5.0 decision tree, and texture analysis. The results can serve as references for disaster assessment in the landslide area. The analysis shows that the support vector machine (SVM) achieves higher overall accuracy than the other classification methods. With texture analysis, the average overall classification accuracy (Kappa value) rose markedly from 76.5% to 87.2%, indicating that texture information can effectively distinguish the surface characteristics of different land covers. The results also show that landslide areas decreased from 210.8 ha on September 27, 1999 to 72.6 ha on May 5, 2007, with about 65.6% of the area restored, indicating that the landslide sites have gradually recovered over nine years of natural vegetation succession.
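The multi-temporal image-subtraction step can be sketched as flagging pixels whose change between two co-registered band images exceeds a threshold. The band values and threshold below are invented toy numbers; the thesis's actual workflow used SPOT imagery and supervised classifiers on top of such change signals.

```python
def change_mask(before, after, threshold):
    """Per-pixel |after - before| > threshold, for equally sized 2-D grids."""
    return [[abs(a - b) > threshold for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]

# Toy 2x2 band values for two acquisition dates (invented).
before = [[100, 102], [98, 101]]
after = [[101, 150], [97, 160]]
print(change_mask(before, after, threshold=20))  # → [[False, True], [False, True]]
```

The resulting mask marks candidate change pixels (e.g. fresh landslide scars), which a classifier can then label using spectral and texture features.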
Ray, A., P. K. Bala, and Nripendra P. Rana. "Exploring the drivers of customers’ brand attitudes of online travel agency services: A text-mining based approach." 2021. http://hdl.handle.net/10454/18339.
Full text of source
This paper aims to explore the important qualitative aspects of online user-generated content that reflect customers' brand attitudes. These qualitative aspects can help service providers understand customers' brand attitudes by focusing on the important aspects rather than reading entire reviews, saving both time and effort. We utilised a total of 10,000 reviews from TripAdvisor (an online travel agency provider). This study analysed the data using a statistical technique (logistic regression), a predictive model (artificial neural networks) and a structural modelling technique to understand which aspects (i.e. sentiment, emotion or parts of speech) best predict customers' brand attitudes. Results show that sentiment is the most important aspect in predicting brand attitudes. While total sentiment content and content polarity have a significant positive association, negative high-arousal emotions and low-arousal emotions have a significant negative association with customers' brand attitudes. Parts-of-speech aspects, however, have no significant impact on brand attitude. The paper concludes with implications, limitations and future research directions.
The full-text of this article will be released for public view at the end of the publisher embargo on 28 Aug 2022.
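The logistic-regression scoring of review aspects can be sketched as follows. The weights here are invented and mirror only the signs the paper reports (positive for sentiment content and polarity, negative for negative high-arousal and low-arousal emotions); they are not the study's fitted coefficients.

```python
import math

# Invented illustrative weights; only the signs follow the paper's findings.
WEIGHTS = {"sentiment": 1.2, "polarity": 0.8,
           "neg_high_arousal": -1.5, "low_arousal": -0.6}
BIAS = 0.0

def brand_attitude_prob(features):
    """Logistic model: P(positive brand attitude | review aspect scores)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

print(round(brand_attitude_prob({"sentiment": 1.0, "polarity": 1.0}), 3))  # → 0.881
```

A fitted model of this shape lets a service provider score a new review from its aspect features alone, without reading the full text.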
Samson, Anne-Renée. "Extraction automatique et visualisation des thèmes abordés dans des résumés de mémoires et de thèses en anthropologie au Québec, de 1985 à 2009." Thèse, 2013. http://hdl.handle.net/1866/10440.
Full text of source
Taking advantage of recent developments in the automated analysis of textual data, digital document records, data graphics and anthropology, this study uses data mining techniques to create a thematic map of anthropological documents. In this exploratory research, we evaluate the usefulness of thematic analysis through automated classification of textual data as well as information visualizations (based on network analysis). More precisely, we examine hierarchical agglomerative clustering (HCA) for thematic analysis and information extraction. We built our study on a database of 1,240 thesis abstracts, submitted from 1985 to 2009 to the anthropology departments at the University of Montreal and University Laval, as well as the history department at University Laval (for archaeological and ethnological abstracts). In the first section, we present our theoretical framework: definitions of text mining, its origins, practical applications and methodology, ending with a literature review. The second part is devoted to the methodological framework, where we discuss the various stages through which the project was conducted: construction of the database, linguistic and statistical filtering, automated classification, etc. Finally, in the last section, we display the results of two specific experiments and present our interpretations. We also discuss thematic navigation and conceptual approaches. We conclude with the limitations we faced in this project and paths of interest for future research.
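The agglomerative step of HCA can be sketched with a minimal single-linkage implementation. The 2-D toy points below stand in for the study's abstract representations; a real analysis would cluster high-dimensional term vectors built from the 1,240 abstracts.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering down to `n_clusters` clusters.

    Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between closest members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerate(points, n_clusters=2))  # → [[0, 1], [2, 3]]
```

Cutting the merge process at different levels yields the nested theme hierarchy that such a thematic analysis inspects.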
Zouaq, Amal. "Une approche d'ingénierie ontologique pour l'acquisition et l'exploitation des connaissances à partir de documents textuels : vers des objets de connaissances et d'apprentissage." Thèse, 2007. http://hdl.handle.net/1866/6437.
Full text of source