Academic literature on the topic 'Textual data-mining'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Textual data-mining.'

Journal articles on the topic "Textual data-mining"

1

Yasuda, Akio. "Reviewing "Text Mining": Textual Data Mining." IEEJ Transactions on Electronics, Information and Systems 125, no. 5 (2005): 682–89. http://dx.doi.org/10.1541/ieejeiss.125.682.

2

Raiyani, Ronak S., Dr Bankim Radadiya, and Dr Satish Thumar. "Analyzing, Developing and Implementing Data Mining Techniques on Databases, Web Contents and Textual Data." Paripex - Indian Journal Of Research 2, no. 3 (January 15, 2012): 48–50. http://dx.doi.org/10.15373/22501991/mar2013/18.

3

Yassir, Ali Hameed, Ali A. Mohammed, Adel Abdul-Jabbar Alkhazraji, Mustafa Emad Hameed, Mohammed Saad Talib, and Mohanad Faeq Ali. "Sentimental classification analysis of polarity multi-view textual data using data mining techniques." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (October 1, 2020): 5526. http://dx.doi.org/10.11591/ijece.v10i5.pp5526-5534.

Abstract:
The data and information available in most community environments are complex in nature. Sentiment data resources may consist of textual data collected from multiple information sources with different representations, usually handled by different analytical models. Data resources with these characteristics form multi-view polarity textual data. However, creating knowledge from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when the textual data exists in multi-view or unstructured formats, hybrid and integrated analysis with text data mining algorithms is vital to obtain useful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, treated as an unstructured data format, by classifying polarity documents into two categories of useful information. The paper discusses a proposed framework of integrated data mining algorithms, which applies the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying sentimental multi-view textual data into two categories when the proposed framework is applied to an online dataset of polarity user reviews on given topics.
4

Jayasudha, J., and A. Christina Esther. "Mining Sequential Pattern of Data in Textual Document Using Data Mining Classification Technique." Asian Journal of Computer Science and Technology 8, S1 (February 5, 2019): 41–45. http://dx.doi.org/10.51983/ajcst-2019.8.s1.1961.

Abstract:
Text documents are transmitted over the internet for text communication. This causes many problems, such as repeated text, because the same data is provided in many places on the internet. Characterizing and extracting such data is a critical task for researchers. Many researchers have characterized such data and applied the results in real-life scenarios, such as real-time monitoring of abnormal user behaviors. In this setting, detecting and characterizing the personalized behavior of a user has some drawbacks. To address this problem, this paper analyzes sequential data and characterizes user behavior with the help of a data mining sequential pattern matching algorithm.
5

Eltaher, Mohammed, and Jeongkyu Lee. "Social User Mining." International Journal of Multimedia Data Engineering and Management 4, no. 4 (October 2013): 58–70. http://dx.doi.org/10.4018/ijmdem.2013100104.

Abstract:
In recent years, the pervasive use of social media has generated huge amounts of data that have started to gain a lot of attention. Each social media source utilizes different data types, such as textual and visual. For example, Twitter is for short text messages, Flickr is for images and videos, and Facebook allows all of these data types. It is highly desirable to find patterns of social media users from such different data formats. With the use of data mining techniques, social media data opens a lot of opportunities for researchers. Despite its short history, social media mining has become a very active research area. This paper provides a comprehensive survey of recent research on social user mining. In particular, the survey focuses on two aspects: (1) social user mining based on data types, such as textual, visual, and both textual and visual information, and (2) social user mining based on mining techniques. In addition, we present our current research on social user mining as well as its future directions.
6

Davahli, Mohammad Reza, Waldemar Karwowski, Edgar Gutierrez, Krzysztof Fiok, Grzegorz Wróbel, Redha Taiar, and Tareq Ahram. "Identification and Prediction of Human Behavior through Mining of Unstructured Textual Data." Symmetry 12, no. 11 (November 19, 2020): 1902. http://dx.doi.org/10.3390/sym12111902.

Abstract:
The identification of human behavior can provide useful information across multiple job spectra. Recent advances in applying data-based approaches to social sciences have increased the feasibility of modeling human behavior. In particular, studying human behavior by analyzing unstructured textual data has recently received considerable attention because of the abundance of textual data. The main objective of the present study was to discuss the primary methods for identifying and predicting human behavior through the mining of unstructured textual data. Of the 823 articles analyzed, 87 met the predefined inclusion criteria and were included in the literature review. Our results show that the included articles could be symmetrically classified into two groups. The first group of articles attempted to identify the leading indicators of human behavior in unstructured textual data. In this group, the data-based approaches had three main components: (1) collecting self-reported survey data, (2) collecting data from social media and extracting data features, and (3) applying correlation analysis to evaluate the relationship between two sets of data. In contrast, the second group focused on the accuracy of data-based approaches for predicting human behavior. In this group, the data-based approaches could be categorized into (1) approaches based on labeled unstructured textual data and (2) approaches based on unlabeled unstructured textual data. The review provides a comprehensive insight into unstructured textual data mining to identify and predict human behavior and personality traits.
7

HOLZMAN, LARS E., TODD A. FISHER, LEON M. GALITSKY, APRIL KONTOSTATHIS, and WILLIAM M. POTTENGER. "A SOFTWARE INFRASTRUCTURE FOR RESEARCH IN TEXTUAL DATA MINING." International Journal on Artificial Intelligence Tools 13, no. 04 (December 2004): 829–49. http://dx.doi.org/10.1142/s0218213004001843.

Abstract:
Few tools exist that address the challenges facing researchers in the Textual Data Mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a Textual Data Mining Infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments – as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. A brief tutorial is provided on the use of TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at .
8

Chen, Pei Bin, Lan Hu, Hui Yang, Xiang Feng Xue, Chuan Xu Liu, and Xin Jian Li. "Target Value Analysis Based on Data Mining Technology." Applied Mechanics and Materials 602-605 (August 2014): 3096–99. http://dx.doi.org/10.4028/www.scientific.net/amm.602-605.3096.

Abstract:
In this paper, data mining technology and the mining process are explained, and several common data mining methods are described. Based on the characteristics of the target value, the application of text classification and textual association to target value mining is discussed, and a process model of data mining for target value is presented.
9

Ur-Rahman, Nadeem. "Textual Data Mining For Knowledge Discovery and Data Classification: A Comparative Study." European Scientific Journal, ESJ 13, no. 21 (July 31, 2017): 429. http://dx.doi.org/10.19044/esj.2017.v13n21p429.

Abstract:
Business Intelligence solutions are key to enable industrial organisations (either manufacturing or construction) to remain competitive in the market. These solutions are achieved through analysis of data which is collected, retrieved and re-used for prediction and classification purposes. However many sources of industrial data are not being fully utilised to improve the business processes of the associated industry. It is generally left to the decision makers or managers within a company to take effective decisions based on the information available throughout product design and manufacture or from the operation of business or production processes. Substantial efforts and energy are required in terms of time and money to identify and exploit the appropriate information that is available from the data. Data Mining techniques have long been applied mainly to numerical forms of data available from various data sources but their applications to analyse semi-structured or unstructured databases are still limited to a few specific domains. The applications of these techniques in combination with Text Mining methods based on statistical, natural language processing and visualisation techniques could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation and classification and mainly rely on methods and techniques available in the area of Information Retrieval (IR). These help to uncover the hidden information in text documents at an initial level. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods which share techniques from IR and data mining. These techniques may be implemented to analyse textual databases in general but they are demonstrated here using examples of Post Project Reviews (PPR) from the construction industry as a case study. The research is focused on finding key single or multiple term phrases for classifying the documents into two classes, i.e. good information and bad information documents, to help decision makers or project managers to identify key issues discussed in PPRs which can be used as a guide for future project management process.
10

Alguliev, Rasim M., Ramiz M. Aliguliyev, and Saadat A. Nazirova. "Classification of Textual E-Mail Spam Using Data Mining Techniques." Applied Computational Intelligence and Soft Computing 2011 (2011): 1–8. http://dx.doi.org/10.1155/2011/416308.

Abstract:
A new method for clustering of spam messages collected in bases of antispam system is offered. The genetic algorithm is developed for solving clustering problems. The objective function is a maximization of similarity between messages in clusters, which is defined by k-nearest neighbor algorithm. Application of genetic algorithm for solving constrained problems faces the problem of constant support of chromosomes which reduces convergence process. Therefore, for acceleration of convergence of genetic algorithm, a penalty function that prevents occurrence of infeasible chromosomes at ranging of values of function of fitness is used. After classification, knowledge extraction is applied in order to get information about classes. Multidocument summarization method is used to get the information portrait of each cluster of spam messages. Classifying and parametrizing spam templates, it will be also possible to define the thematic dependence from geographical dependence (e.g., what subjects prevail in spam messages sent from certain countries). Thus, the offered system will be capable to reveal purposeful information attacks if those occur. Analyzing origins of the spam messages from collection, it is possible to define and solve the organized social networks of spammers.

Dissertations / Theses on the topic "Textual data-mining"

1

Zhou, Wubai. "Data Mining Techniques to Understand Textual Data." FIU Digital Commons, 2017. https://digitalcommons.fiu.edu/etd/3493.

Abstract:
More than ever, online information delivery and storage rely heavily on text. Billions of texts are produced every day in the form of documents, news, logs, search queries, ad keywords, tags, tweets, messenger conversations, social network posts, etc. Text understanding is a fundamental and essential task involving broad research topics, and it contributes to many applications in areas such as text summarization, search engines, recommendation systems, online advertising, conversational bots and so on. However, understanding text is never a trivial task for computers, especially for noisy and ambiguous text such as logs and search queries. This dissertation mainly focuses on textual understanding tasks derived from two domains, disaster management and IT service management, that mainly use textual data as an information carrier. Improving situation awareness in disaster management and alleviating the human effort involved in IT service management dictate more intelligent and efficient solutions to understand the textual data acting as the main information carrier in the two domains. From the perspective of data mining, four directions are identified: (1) intelligently generating a storyline summarizing the evolution of a hurricane from a relevant online corpus; (2) automatically recommending resolutions according to the textual symptom description in a ticket; (3) gradually adapting the resolution recommendation system for time-correlated features derived from text; (4) efficiently learning distributed representations for short and lousy ticket symptom descriptions and resolutions. Provided with different types of textual data, the data mining techniques proposed in those four research directions successfully address our tasks of understanding and extracting valuable knowledge from those textual data. My dissertation will address the research topics outlined above.
Concretely, I will focus on designing and developing data mining methodologies to better understand textual information, including (1) a storyline generation method for efficient summarization of natural hurricanes based on crawled online corpus; (2) a recommendation framework for automated ticket resolution in IT service management; (3) an adaptive recommendation system on time-varying temporal correlated features derived from text; (4) a deep neural ranking model not only successfully recommending resolutions but also efficiently outputting distributed representation for ticket descriptions and resolutions.
2

Ur-Rahman, Nadeem. "Textual data mining applications for industrial knowledge management solutions." Thesis, Loughborough University, 2010. https://dspace.lboro.ac.uk/2134/6373.

Abstract:
In recent years knowledge has become an important resource to enhance the business and many activities are required to manage these knowledge resources well and help companies to remain competitive within industrial environments. The data available in most industrial setups is complex in nature and multiple different data formats may be generated to track the progress of different projects either related to developing new products or providing better services to the customers. Knowledge Discovery from different databases requires considerable efforts and energies and data mining techniques serve the purpose through handling structured data formats. If however the data is semi-structured or unstructured the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to discovery of knowledge from semi-structured or unstructured data formats through the applications of textual data mining techniques to automate the classification of textual information into two different categories or classes which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques to discover valuable information and knowledge from manufacturing or construction industries have been explored as part of a literature review. The application of text mining techniques to handle semi-structured or unstructured data has been discussed in detail. A novel integration of different data and text mining tools has been proposed in the form of a framework in which knowledge discovery and its refinement processes are performed through the application of Clustering and Apriori Association Rule of Mining algorithms. Finally the hypothesis of acquiring better classification accuracies has been detailed through the application of the methodology on case study data available in the form of Post Project Reviews (PPRs) reports. 
The process of discovering useful knowledge, its interpretation and utilisation has been automated to classify the textual data into two classes.
3

Kubalík, Jakub. "Mining of Textual Data from the Web for Speech Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237170.

Abstract:
The primary goal of this project was to study language modeling for speech recognition and techniques for obtaining textual data from the Web. The text presents the basic techniques of speech recognition and describes in more detail language models based on statistical methods. In particular, the work deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes data mining models and techniques, especially information retrieval. It then presents the problems associated with obtaining data from the web and, in contrast, introduces the Google search engine. Part of the project was the design and implementation of a system for obtaining text from the web, which is described in due detail. However, the main goal of the work was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques thus try to find the optimal way to use data obtained from the Web to improve sample language models as well as models deployed in real recognition systems.
4

Kalledat, Tobias. "Tracking domain knowledge based on segmented textual sources." Doctoral thesis, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, 2009. http://dx.doi.org/10.18452/15925.

Abstract:
This research aims to analyse the influence of pre-processing on the results of knowledge generation and to give concrete recommendations for suitable pre-processing of text corpora in Text Data Mining (TDM) projects. It focuses on the extraction and tracking of concepts within certain knowledge domains using an approach based on horizontal (timeline) and vertical (persistence of terms) segmentation of corpora. The result is a set of time-segmented sub-corpora that reflect the persistence of the terms they contain. Within each time segment, two clusters of terms can be built: one containing the terms that are not persistent with respect to the whole corpus, and one containing the terms that occur in all time segments. Based on simple frequency measures, it can be shown that the statistical quality of a single corpus alone is sufficient to measure pre-processing quality; comparison corpora are not necessary. The time series of the frequency measures show significant negative correlations between the cluster of terms that occur permanently and the cluster of terms that are not persistent across all time segments. This holds only for the optimally pre-processed corpus and was not found in the other test sets, which were pre-processed with lower quality. When the most frequent terms are grouped into concepts using domain-specific taxonomies, a significant negative correlation appears between the number of different terms per time segment and the number of terms assigned to a taxonomy, again only for the corpus with a high level of pre-processing quality. A semantic analysis of data prepared with a threshold-based TDM method yielded significantly different extracted knowledge depending on the quality of pre-processing. With the methods and measures introduced in this research, both the quality of the source corpora and the quality of the applied taxonomies can be measured. Based on these findings, indicators for measuring and evaluating corpora and taxonomies are developed, and recommendations are given for a level of pre-processing appropriate to the goal of the subsequent analysis.
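The persistence-based split of terms described in this abstract can be illustrated with a minimal sketch. It assumes a corpus already tokenized and segmented by year; the function name `split_by_persistence` and the toy data are hypothetical and are not taken from the thesis, which may define persistence differently.

```python
def split_by_persistence(segments):
    """Split the vocabulary of a time-segmented corpus into two clusters.

    segments: dict mapping a segment label (e.g. a year) to a list of tokens.
    Returns (persistent, non_persistent): terms that occur in every segment,
    and terms that are missing from at least one segment.
    """
    term_sets = [set(tokens) for tokens in segments.values()]
    if not term_sets:
        return set(), set()
    persistent = set.intersection(*term_sets)
    non_persistent = set.union(*term_sets) - persistent
    return persistent, non_persistent


# Toy corpus (illustrative data only), one token list per yearly segment.
corpus_by_year = {
    2001: "data mining text".split(),
    2002: "text mining web".split(),
    2003: "text mining corpus".split(),
}
persistent, non_persistent = split_by_persistence(corpus_by_year)
```

Frequency time series for each cluster could then be computed per segment and correlated, as the abstract describes; that step is omitted here.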
5

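Motoyoshi, Tadahiro (元吉 忠寛). "災害のイマジネーション力に関する探索的研究 - 大学生の想像力と阪神淡路大震災の事例との比較 -" [An exploratory study of disaster imagination: A comparison of university students' imagination with cases from the Great Hanshin-Awaji Earthquake]. Graduate School of Education and Human Development, Nagoya University (名古屋大学大学院教育発達科学研究科), 2006. http://hdl.handle.net/2237/9454.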

6

Spiegler, Sebastian R. "Comparative study of clustering algorithms on textual databases: clustering of curricula vitae into competency-based groups to support knowledge management." Saarbrücken: VDM Verl. Müller, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=3035354&prov=M&dok_var=1&dok_ext=htm.

7

Nieto, Erick Mauricio Gómez. "Projeção multidimensional aplicada a visualização de resultados de busca textual." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05122012-105730/.

Abstract:
Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or web page) and a link to it. This display has many advantages, e.g., it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could probably report some disappointing experience with this model. Indeed, it has limitations in particular situations, as it fails to provide an overview of the retrieved document collection. Moreover, depending on the nature of the query (e.g., it may be too general, ambiguous, or ill expressed), the desired information may be poorly ranked, or the results may cover varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how they are related content-wise. We propose a visualization technique for displaying the results of web queries that aims to overcome these limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation, employing a multidimensional projection to derive two-dimensional layouts of the search results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying cosine similarity over a bag-of-words vector representation of the collection built from the snippets. If the snippets are displayed directly according to the derived layout, they overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlap among snippets and the preservation of the neighborhood structure given in the projected layout. Minimizing this energy functional provides a neighborhood-preserving two-dimensional arrangement of the textual snippets with minimum overlap. The resulting visualization conveys both a global view of the query results and visual groupings that reflect related results, as illustrated in several examples.
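The similarity step mentioned in this abstract, cosine similarity over bag-of-words vectors built from snippets, can be sketched as follows. This is a minimal illustration under simplifying assumptions (whitespace tokenization, raw term counts, no stop-word removal or weighting); the helper names and sample snippets are hypothetical and do not reproduce the thesis implementation.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each term."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty snippet is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Illustrative snippets only.
snippets = [
    "text mining of search results",
    "visualization of search results",
    "genetic algorithm for clustering",
]
vectors = [bag_of_words(s) for s in snippets]

# Pairwise similarity matrix; a projection method would take this
# (or the corresponding distances) as input to derive a 2D layout.
similarity = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

The multidimensional projection and the overlap-removing energy minimization described in the abstract are separate steps and are not covered by this sketch.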
8

Fabbri, Renato. "Topological stability and textual differentiation in human interaction networks: statistical analysis, visualization and linked data." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-11092017-154706/.

Abstract:
This work reports on stable (or invariant) topological properties and textual differentiation in human interaction networks, with benchmarks derived from public email lists. Activity along time and topology were observed in snapshots in a timeline, and at different scales. Our analysis shows that activity is practically the same for all networks across timescales ranging from seconds to months. The principal components of the participants in the topological metrics space remain practically unchanged as different sets of messages are considered. The activity of participants follows the expected scale-free outline, thus yielding the hub, intermediary and peripheral classes of vertices by comparison against the Erdös-Rényi model. The relative sizes of these three sectors are essentially the same for all email lists and the same along time. Typically, 3-12% of the vertices are hubs, 15-45% are intermediary and 44-81% are peripheral vertices. Texts from each of such sectors are shown to be very different through direct measurements and through an adaptation of the Kolmogorov-Smirnov test. These properties are consistent with the literature and may be general for human interaction networks, which has important implications for establishing a typology of participants based on quantitative criteria. For guiding and supporting this research, we also developed a visualization method of dynamic networks through animations. To facilitate verification and further steps in the analyses, we supply a linked data representation of data related to our results.
9

Mendes, Marília Soares. "MALTU - model for evaluation of interaction in social systems from the Users Textual Language." Universidade Federal do Ceará, 2015. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=14296.

Abstract:
The field of Human-Computer Interaction (HCI) has suggested various methods for evaluating systems in order to improve their usability and User eXperience (UX). The advent of Web 2.0 has allowed the development of applications marked by collaboration, communication and interaction among their users in a way and on a scale never seen before. Social Systems (SS) (e.g. Twitter, Facebook, MySpace, LinkedIn etc.) are examples of such applications and have features such as frequent exchange of messages, spontaneity and expression of feelings. The opportunities and challenges posed by these types of applications require the traditional evaluation methods to be reassessed, taking these new characteristics into consideration. For instance, the postings of users on SS reveal their opinions on various issues, including what they think of the system. This work aims to test the hypothesis that the postings of users in SS provide relevant data for evaluating usability and UX in SS. In the literature we did not identify any evaluation model that collects and interprets texts from users in order to assess the user experience and system usability. Thus, this thesis proposes MALTU - Model for evaluation of interaction in social systems from the Users Textual Language. In order to provide a basis for the development of the proposed model, we conducted a study of how users express their opinions on the system in natural language. We extracted postings of users from four SS of different contexts. HCI experts classified and studied these postings, which were processed using Natural Language Processing (NLP) and data mining techniques and then analyzed in order to obtain a generic model. MALTU was applied to two SS: an entertainment SS and an educational SS. The results show that it is possible to evaluate a system from the postings of users in SS. Such assessments are aided by extraction patterns related to the use, to the types of postings and to the HCI factors used in evaluating the system.
10

Kamenieva, Iryna. "Research Ontology Data Models for Data and Metadata Exchange Repository." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-6351.

Abstract:

Research in data mining and machine learning requires a variety of input data sets, and researchers now build databases of such sets. Examples include the UCI Machine Learning Repository, the Data Envelopment Analysis Dataset Repository, the XMLData Repository, and the Frequent Itemset Mining Dataset Repository. Alongside these statistical repositories, a whole range of stores, from simple file stores to specialized repositories, can be used by researchers when solving applied tasks and investigating their own algorithms and scientific problems. It might seem that the only difficulty for the user is searching these scattered information stores and understanding their structure. However, a detailed study of such repositories reveals deeper problems in data usage: in particular, a complete mismatch and rigidity of data file structures with respect to SDMX (Statistical Data and Metadata Exchange), a standard and structure used by many European organizations; the impossibility of preliminary preparation of data for a concrete applied task; and the lack of a data usage history for particular scientific and applied tasks.

There are now many data mining (DM) methods, as well as large quantities of data stored in various repositories. The repositories contain no DM methods and, moreover, methods are not linked to application areas. An essential problem is linking the subject (problem) domain, the DM methods, and the datasets appropriate for each method. In this work we therefore consider the problem of building ontological models of DM methods, describing the interaction between methods and the corresponding data from repositories, and intelligent agents that allow the statistical repository user to choose the method and data appropriate to the task being solved. A system structure is proposed, and an intelligent search agent over the ontological model of DM methods, taking the user's personal queries into account, is implemented.

An agent-oriented approach was selected for implementing the intelligent data and metadata exchange repository. The model uses a service-oriented architecture. The implementation uses the cross-platform programming language Java, the multi-agent platform Jadex, the database server Oracle Spatial 10g, and the development environment for ontological models Protégé version 3.4.


Books on the topic "Textual data-mining"

1

Inmon, William H. Tapping into unstructured data: Integrating unstructured data and textual analytics into business intelligence. Upper Saddle River, NJ: Prentice Hall, 2008.

2

Poibeau, Thierry. Traitement automatique du contenu textuel. Paris: Hermès science-Lavoisier, 2011.

3

Inmon, Bill. Turning Text into Gold: Taxonomies and Textual Analytics. Technics Publications, LLC, 2017.

4

Inmon, William, and Anthony Nesavich. Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence. Pearson Education, 2007.

5

Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence. Prentice Hall PTR, 2007.


Book chapters on the topic "Textual data-mining"

1

Poon, Leonard K. M., Chun Fai Leung, and Nevin L. Zhang. "Mining Textual Reviews with Hierarchical Latent Tree Analysis." In Data Mining and Big Data, 401–8. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-61845-6_40.

2

Nguyen, Thin, Svetha Venkatesh, and Dinh Phung. "Textual Cues for Online Depression in Community and Personal Settings." In Advanced Data Mining and Applications, 19–34. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-49586-6_2.

3

Kurach, Karol, Krzysztof Pawłowski, Łukasz Romaszko, Marcin Tatjewski, Andrzej Janusz, and Hung Son Nguyen. "An Ensemble Approach to Multi-label Classification of Textual Data." In Advanced Data Mining and Applications, 306–17. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35527-1_26.

4

Balbi, Simona, and Emilio Meglio. "Contributions of Textual Data Analysis to Text Retrieval." In Classification, Clustering, and Data Mining Applications, 511–20. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-642-17103-1_48.

5

Cho, Vincent, and Beat Wüthrich. "Combining Forecasts from Multiple Textual Data Sources." In Methodologies for Knowledge Discovery and Data Mining, 174–79. Berlin, Heidelberg: Springer Berlin Heidelberg, 1999. http://dx.doi.org/10.1007/3-540-48912-6_24.

6

Galanopoulos, Damianos, Milan Dojchinovski, Krishna Chandramouli, Tomáš Kliegr, and Vasileios Mezaris. "Multimodal Fusion: Combining Visual and Textual Cues for Concept Detection in Video." In Multimedia Data Mining and Analytics, 295–310. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-14998-1_13.

7

Takasu, Atsuhiro. "A Sequence Labeling Method Using Syntactical and Textual Patterns for Record Linkage." In Pattern Recognition and Data Mining, 199–208. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11551188_22.

8

Singhal, Mayank, and Suman Banerjee. "Group Trip Planning Queries on Road Networks Using Geo-Tagged Textual Information." In Advanced Data Mining and Applications, 243–57. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-95405-5_18.

9

Rajman, Martin, and Romaric Besançon. "Text Mining - Knowledge extraction from unstructured textual data." In Studies in Classification, Data Analysis, and Knowledge Organization, 473–80. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/978-3-642-72253-0_64.

10

Lai, Kwok-Yin, and Wai Lam. "Meta-learning Models for Automatic Textual Document Categorization." In Advances in Knowledge Discovery and Data Mining, 78–89. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-45357-1_11.


Conference papers on the topic "Textual data-mining"

1

Caputo, G. M., and N. F. F. Ebecken. "Computational system for the textual processing of industrial patents." In DATA MINING AND MIS 2006. Southampton, UK: WIT Press, 2006. http://dx.doi.org/10.2495/data060171.

2

Fize, Jacques, Mathieu Roche, and Maguelonne Teisseire. "Matching Heterogeneous Textual Data Using Spatial Features." In 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2018. http://dx.doi.org/10.1109/icdmw.2018.00197.

3

Tan, Pang-Ning, Hannah Blau, Steve Harp, and Robert Goldman. "Textual data mining of service center call records." In the sixth ACM SIGKDD international conference. New York, New York, USA: ACM Press, 2000. http://dx.doi.org/10.1145/347090.347177.

4

Michalenko, Joshua J., Andrew S. Lan, and Richard G. Baraniuk. "Data-Mining Textual Responses to Uncover Misconception Patterns." In L@S 2017: Fourth (2017) ACM Conference on Learning @ Scale. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3051457.3053996.

5

Nath, Devjyoti, Anirban Roy, Sumitra Kumari Shaw, Amlan Ghorai, and Shanta Phani. "Textual Lyrics Based Emotion Analysis of Bengali Songs." In 2020 International Conference on Data Mining Workshops (ICDMW). IEEE, 2020. http://dx.doi.org/10.1109/icdmw51313.2020.00015.

6

Xu, Jia. "Joint Visual and Textual Mining on Social Media." In 2014 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE, 2014. http://dx.doi.org/10.1109/icdmw.2014.114.

7

Akhtar, Nadeem, Bushra Siddique, and Rounaque Afroz. "Visual and textual summarization of webpages." In 2014 International Conference on Data Mining and Intelligent Computing (ICDMIC). IEEE, 2014. http://dx.doi.org/10.1109/icdmic.2014.6954267.

8

Roos, Teemu, and Yuan Zou. "Analysis of Textual Variation by Latent Tree Structures." In 2011 IEEE 11th International Conference on Data Mining (ICDM). IEEE, 2011. http://dx.doi.org/10.1109/icdm.2011.24.

9

Neri, Federico, and Paolo Geraci. "Mining Textual Data to Boost Information Access in OSINT." In 2009 13th International Conference Information Visualisation, IV. IEEE, 2009. http://dx.doi.org/10.1109/iv.2009.99.

10

Hristidis, Vagelis, Oscar Valdivia, Michail Vlachos, and Philip S. Yu. "A System for Keyword Search on Textual Streams." In Proceedings of the 2007 SIAM International Conference on Data Mining. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2007. http://dx.doi.org/10.1137/1.9781611972771.52.


Reports on the topic "Textual data-mining"

1

Dooley, Kevin, Steven Corman, and Dan Ballard. Centering Resonance Analysis: A Superior Data Mining Algorithm for Textual Data Streams. Fort Belvoir, VA: Defense Technical Information Center, March 2004. http://dx.doi.org/10.21236/ada422048.

2

Neyedley, K., J. J. Hanley, Z. Zajacz, and M. Fayek. Accessory mineral thermobarometry, trace element chemistry, and stable O isotope systematics, Mooshla Intrusive Complex (MIC), Doyon-Bousquet-LaRonde mining camp, Abitibi greenstone belt, Québec. Natural Resources Canada/CMSS/Information Management, 2021. http://dx.doi.org/10.4095/328986.

Abstract:
The Mooshla Intrusive Complex (MIC) is an Archean polyphase magmatic body located in the Doyon-Bousquet-LaRonde (DBL) mining camp of the Abitibi greenstone belt, Québec, that is spatially associated with numerous gold (Au)-rich VMS, epizonal 'intrusion-related' Au-Cu vein systems, and shear zone-hosted (orogenic?) Au deposits. To elucidate the P-T conditions of crystallization, and oxidation state of the MIC magmas, accessory minerals (zircon, rutile, titanite) have been characterized using a variety of analytical techniques (e.g., trace element thermobarometry). The resulting trace element and oxythermobarometric database for accessory minerals in the MIC represents the first examination of such parameters in an Archean magmatic complex in a world-class mineralized district. Mineral thermobarometry yields P-T constraints on accessory mineral crystallization consistent with the expected conditions of tonalite-trondhjemite-granite (TTG) magma genesis, well above peak metamorphic conditions in the DBL camp. Together with textural observations, and mineral trace element data, the P-T estimates reassert that the studied minerals are of magmatic origin and not a product of metamorphism. Oxygen fugacity constraints indicate that while the magmas are relatively oxidizing (as indicated by the presence of magmatic epidote, titanite, and anhydrite), zircon trace element systematics indicate that the magmas were not as oxidized as arc magmas in younger (post-Archean) porphyry environments. The data presented provides first constraints on the depth and other conditions of melt generation and crystallization of the MIC. The P-T estimates and qualitative fO2 constraints have significant implications for the overall model for formation (crystallization, emplacement) of the MIC and potentially related mineral deposits.
3

Neyedley, K., J. J. Hanley, P. Mercier-Langevin, and M. Fayek. Ore mineralogy, pyrite chemistry, and S isotope systematics of magmatic-hydrothermal Au mineralization associated with the Mooshla Intrusive Complex (MIC), Doyon-Bousquet-LaRonde mining camp, Abitibi greenstone belt, Québec. Natural Resources Canada/CMSS/Information Management, 2021. http://dx.doi.org/10.4095/328985.

Abstract:
The Mooshla Intrusive Complex (MIC) is an Archean polyphase magmatic body located in the Doyon-Bousquet-LaRonde (DBL) mining camp of the Abitibi greenstone belt, Québec. The MIC is spatially associated with numerous gold (Au)-rich VMS, epizonal 'intrusion-related' Au-Cu vein systems, and shear zone-hosted (orogenic?) Au deposits. To elucidate genetic links between deposits and the MIC, mineralized samples from two of the epizonal 'intrusion-related' Au-Cu vein systems (Doyon and Grand Duc Au-Cu) have been characterized using a variety of analytical techniques. Preliminary results indicate gold (as electrum) from both deposits occurs relatively late in the systems as it is primarily observed along fractures in pyrite and gangue minerals. At Grand Duc gold appears to have formed syn- to post-crystallization relative to base metal sulphides (e.g. chalcopyrite, sphalerite, pyrrhotite), whereas base metal sulphides at Doyon are relatively rare. The accessory ore mineral assemblage at Doyon is relatively simple compared to Grand Duc, consisting of petzite (Ag3AuTe2), calaverite (AuTe2), and hessite (Ag2Te), while accessory ore minerals at Grand Duc are comprised of tellurobismuthite (Bi2Te3), volynskite (AgBiTe2), native Te, tsumoite (BiTe) or tetradymite (Bi2Te2S), altaite (PbTe), petzite, calaverite, and hessite. Pyrite trace element distribution maps from representative pyrite grains from Doyon and Grand Duc were collected and confirm petrographic observations that Au occurs relatively late. Pyrite from Doyon appears to have been initially trace-element poor, then became enriched in As, followed by the ore metal stage consisting of Au-Ag-Te-Bi-Pb-Cu enrichment and lastly a Co-Ni-Se(?) stage enrichment. Grand Duc pyrite is more complex with initial enrichments in Co-Se-As (Stage 1) followed by an increase in As-Co(?) concentrations (Stage 2). 
The ore metal stage (Stage 3) is indicated by another increase in As coupled with Au-Ag-Bi-Te-Sb-Pb-Ni-Cu-Zn-Sn-Cd-In enrichment. The final stage of pyrite growth (Stage 4) is represented by the same element assemblage as Stage 3 but at lower concentrations. Preliminary sulphur isotope data from Grand Duc indicates pyrite, pyrrhotite, and chalcopyrite all have similar delta-34S values (~1.5 ± 1 permille) with no core-to-rim variations. Pyrite from Doyon has slightly higher delta-34S values (~2.5 ± 1 permille) compared to Grand Duc but similarly does not show much core-to-rim variation. At Grand Duc, the occurrence of Au concentrating along the rim of pyrite grains and associated with an enrichment in As and other metals (Sb-Ag-Bi-Te) shares similarities with porphyry and epithermal deposits, and the overall metal association of Au with Te and Bi is a hallmark of other intrusion-related gold systems. The occurrence of the ore metal-rich rims on pyrite from Grand Duc could be related to fluid boiling which results in the destabilization of gold-bearing aqueous complexes. Pyrite from Doyon does not show this inferred boiling texture but shares characteristics of dissolution-reprecipitation processes, where metals in the pyrite lattice are dissolved and then reconcentrated into discrete mineral phases that commonly precipitate in voids and fractures created during pyrite dissolution.