Dissertations on the topic "Methods of text mining"

To see the other types of publications on this topic, follow the link: Methods of text mining.

Format your source in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations for your research on the topic "Methods of text mining".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever these details are available in the source's metadata.

Browse dissertations on a wide variety of disciplines and organise your bibliography correctly.

1

Johnson, Eamon B. "Methods in Text Mining for Diagnostic Radiology." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1459514073.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
2

Eales, James Matthew. "Text-mining of experimental methods in phylogenetics." Thesis, University of Manchester, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.529251.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
3

Ashton, Triss A. "Accuracy and Interpretability Testing of Text Mining Methods." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc283791/.

Full text of the source
Abstract:
Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining, first began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos, each based on unique mathematical algorithms. How good each of these methods is at analyzing text is unclear. Method development typically evolves from some silo-centric requirement, with the success of the method measured by a custom requirement-based metric. Results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method for text mining that eliminates research-silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a random block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. The contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not clear, there are characteristics within the text collection that affect the algorithms' ability to extract meaningful results. Third, this research develops an experimental design process for testing the algorithms that is adaptable to other areas of software development and algorithm testing. This design eliminates the biased practices historically employed by algorithm developers.
APA, Harvard, Vancouver, ISO, and other styles
4

Zakaria, Suliman Zubi. "Retrieving Electronic Data Interchange (EDI) Dataset using Text Mining Methods." Thesis, Сумський державний університет, 2012. http://essuir.sumdu.edu.ua/handle/123456789/28658.

Full text of the source
Abstract:
The internet is a huge source of documents, containing a massive number of texts in many languages on a wide range of topics. These texts appear as electronic documents hosted on the web, exchanged using special forms in an Electronic Data Interchange (EDI) environment. Using web text mining approaches to mine documents in an EDI environment opens new, challenging directions in web text mining. Applying text mining to discover previously unknown patterns in web documents, using partitional cluster analysis such as the k-means method with a Euclidean distance measure on EDI text document datasets, is a unique area of research these days. Our experiments apply the standard k-means algorithm to an EDI text document dataset of the kind most commonly used in electronic interchange. We also report results obtained with the text mining clustering application WEKA. This study will provide high-quality services to any organization that is willing to use the system.
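The k-means clustering with a Euclidean distance measure that this abstract describes can be sketched as follows. This is a minimal illustration, not the thesis' actual pipeline: the toy term-frequency vectors, three-word vocabulary, and deterministic initialisation are all invented for the example.

```python
import numpy as np

def kmeans2(X, n_iter=10):
    """Two-cluster k-means with a Euclidean distance measure.
    Deterministic init (for the sketch): the first vector, then the vector
    farthest from it."""
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centroids = np.stack([c0, c1])
    for _ in range(n_iter):
        # Assign each document vector to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned vectors.
        for j in range(2):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Toy term-frequency vectors over a three-word vocabulary: two document types.
docs = np.array([
    [5.0, 0.0, 0.0], [4.0, 1.0, 0.0], [5.0, 1.0, 0.0],  # e.g. invoice-like EDI texts
    [0.0, 0.0, 5.0], [0.0, 1.0, 4.0], [1.0, 0.0, 5.0],  # e.g. order-like EDI texts
])
labels = kmeans2(docs)
```

On this toy data the two document types separate cleanly into the two clusters; real EDI document vectors would be high-dimensional and typically need multiple restarts.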
APA, Harvard, Vancouver, ISO, and other styles
5

Bhattacharya, Sanmitra. "Computational methods for mining health communications in web 2.0." Diss., University of Iowa, 2014. https://ir.uiowa.edu/etd/4576.

Full text of the source
Abstract:
Data from social media platforms are being actively mined for trends and patterns of interest. Problems such as sentiment analysis and prediction of election outcomes have become tremendously popular due to the unprecedented availability of social interactivity data of different types. In this thesis we address two problems that have been relatively unexplored. The first relates to mining beliefs, in particular health beliefs, and their surveillance using social media. The second relates to the investigation of factors associated with engagement of U.S. Federal Health Agencies via Twitter and Facebook. In addressing the first problem we propose a novel computational framework for belief surveillance. This framework can be used for 1) surveillance of any given belief in the form of a probe, and 2) automatically harvesting health-related probes. We present our estimates of support, opposition and doubt for these probes, some of which represent true information, in the sense that they are supported by scientific evidence; others represent false information; and the remaining represent debatable propositions. We show, for example, that the levels of support for false and debatable probes are surprisingly high. We also study the scientific novelty of these probes and find that some of the harvested probes with sparse scientific evidence may indicate novel hypotheses. We also show the suitability of off-the-shelf classifiers for belief surveillance. We find these classifiers are quite generalizable and can be used for classifying newly harvested probes. Finally, we show the ability to harvest and track probes over time. Although our work is focused on health care, the approach is broadly applicable to other domains as well. For the second problem, our specific goals are to study factors associated with the amount and duration of engagement of organizations.
We use negative binomial hurdle regression models and Cox proportional hazards survival models for this. For Twitter, the hurdle analysis shows that the presence of a user mention is positively associated with the amount of engagement, while negative sentiment has an inverse association. The content of tweets is equally important for engagement. The survival analyses indicate that engagement duration is positively associated with follower count. For Facebook, both hurdle and survival analyses show that the number of page likes and positive sentiment are correlated with higher and prolonged engagement, while a few content types are negatively correlated with engagement. We also find patterns of engagement that are consistent across Twitter and Facebook.
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Xiaodan Hu Xiaohua. "Exploiting external/domain knowledge to enhance traditional text mining using graph-based methods /." Philadelphia, Pa. : Drexel University, 2009. http://hdl.handle.net/1860/3076.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
7

Davis, Aaron Samuel. "Bisecting Document Clustering Using Model-Based Methods." BYU ScholarsArchive, 2009. https://scholarsarchive.byu.edu/etd/1938.

Full text of the source
Abstract:
We all have access to large collections of digital text documents, which are useful only if we can make sense of them all and distill important information from them. Good document clustering algorithms that organize such information automatically in meaningful ways can make a difference in how effective we are at using that information. In this paper we use model-based document clustering algorithms as a base for bisecting methods in order to identify increasingly cohesive clusters from larger, more diverse clusters. We specifically use the EM algorithm and Gibbs Sampling on a mixture of multinomials as the base clustering algorithms on three data sets. Additionally, we apply a refinement step, using EM, to the final output of each clustering technique. Our results show improved agreement with human annotated document classes when compared to the existing base clustering algorithms, with marked improvement in two out of three data sets.
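The bisecting strategy the abstract describes — repeatedly splitting the largest cluster with a base two-way clusterer to obtain increasingly cohesive clusters — can be sketched as follows. For brevity a simple distance-based splitter stands in for the thesis' actual base algorithms (EM or Gibbs sampling on a mixture of multinomials), and the 2-D points are invented:

```python
import numpy as np

def split2(X):
    """Two-way split of a set of vectors. The thesis uses EM / Gibbs sampling
    on a mixture of multinomials here; this illustrative stand-in assigns each
    point to the nearer of the two mutually farthest points."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    i, j = np.unravel_index(d.argmax(), d.shape)
    near_i = np.linalg.norm(X - X[i], axis=1) <= np.linalg.norm(X - X[j], axis=1)
    return near_i.astype(int)

def bisecting_cluster(X, n_clusters):
    """Repeatedly bisect the largest cluster until n_clusters remain."""
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        big = clusters.pop()                 # the largest current cluster
        halves = split2(X[big])
        clusters += [big[halves == 1], big[halves == 0]]
    return clusters

# Toy 2-D "document" vectors forming three groups of sizes 3, 2 and 2.
X = np.array([[0.0, 0], [0, 1], [1, 0], [8, 0], [8, 1], [20, 0], [20, 1]])
clusters = bisecting_cluster(X, 3)
parts = sorted(sorted(c.tolist()) for c in clusters)
```

The refinement step the abstract mentions would correspond to one final pass of the base algorithm (EM) over the full assignment produced by the bisecting loop.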
APA, Harvard, Vancouver, ISO, and other styles
8

Boynukalin, Zeynep. "Emotion Analysis Of Turkish Texts By Using Machine Learning Methods." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614521/index.pdf.

Full text of the source
Abstract:
Automatically analysing the emotion in texts is of increasing interest in today's research fields. The aim is to develop a machine that can detect the type of a user's emotion from his/her text. Emotion classification of English texts has been studied by several researchers and promising results have been achieved. In this thesis, an emotion classification study on Turkish texts is introduced. To the best of our knowledge, this is the first study on emotion analysis of Turkish texts. In English there exist some well-defined datasets for the purpose of emotion classification, but we could not find datasets in Turkish suitable for this study. Therefore, another important contribution is the generation of a new dataset in Turkish for emotion analysis. The dataset is generated by combining two types of sources. Several classification algorithms are applied on the dataset and results are compared. Due to the nature of the Turkish language, new features are added to the existing methods to improve the success of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
9

Palma, Michael, and Shidi Zhou. "A Web Scraper For Forums : Navigation and text extraction methods." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219903.

Full text of the source
Abstract:
Web forums are a popular way of exchanging information and discussing various topics. These websites usually have a special structure, divided into boards, threads and posts. Although the structure might be consistent across forums, the layout of each forum is different. The way a web forum presents the user posts is also very different from how a news website presents a single piece of information. All of this makes the navigation and extraction of text a hard task for web scrapers. The focus of this thesis is the development of a web scraper specialized in forums. Three different methods for text extraction are implemented and tested before choosing the most appropriate method for the task. The methods are Word Count, Text-Detection Framework and Text-to-Tag Ratio. The handling of link duplicates is also considered and solved by implementing a multi-layer bloom filter. The thesis applies a qualitative methodology. The results indicate that the Text-to-Tag Ratio has the best overall performance and gives the most desirable result in web forums. Thus, this was the method selected for the final version of the web scraper.
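The Text-to-Tag Ratio that the thesis selects can be illustrated with a minimal sketch: lines of HTML dense in tags (navigation chrome) score low, while lines dense in visible text (forum posts) score high. The regex-based tag counting and the two sample lines below are assumptions made for the illustration, not the thesis' implementation:

```python
import re

TAG = re.compile(r"<[^>]*>")

def text_to_tag_ratio(html_line):
    """Characters of plain text on the line divided by the number of tags;
    content-rich lines score high, markup-heavy chrome scores low."""
    tags = TAG.findall(html_line)
    text = TAG.sub("", html_line).strip()
    return len(text) / max(len(tags), 1)

lines = [
    '<div class="nav"><a href="/a">Home</a><a href="/b">Boards</a></div>',
    "<p>This is the actual forum post, with several sentences of content.</p>",
]
ratios = [text_to_tag_ratio(line) for line in lines]
```

A scraper can then keep only the lines (or DOM regions) whose ratio exceeds a threshold, which is why the method copes well with the varied layouts of different forums.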
APA, Harvard, Vancouver, ISO, and other styles
10

Nhlabano, Valentine Velaphi. "Fast Data Analysis Methods For Social Media Data." Diss., University of Pretoria, 2018. http://hdl.handle.net/2263/72546.

Full text of the source
Abstract:
The advent of Web 2.0 technologies, which support the creation and publishing of social media content in a collaborative and participatory way in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide, in order to take advantage of this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally equipped for handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect to and collect public datasets of Twitter data from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume, efficiently storing this data in the Hadoop File System (HDFS). Apache Flume is a distributed, reliable, and available system used to efficiently collect, aggregate and move large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open-source software library that runs on low-cost commodity hardware and has the ability to store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost.
A lexicon-based sentiment analysis approach was taken, and the AFINN-111 lexicon was used for scoring. The Twitter data was analysed from HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big datasets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time.
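The lexicon-based scoring step can be sketched as below. The tiny word list is a stand-in for the full AFINN-111 lexicon (which maps roughly 2,500 words to integer valences from -5 to +5), and the function mirrors the per-tweet score that the dissertation's Java MapReduce mapper would emit:

```python
import re

# Toy subset in the style of the AFINN-111 word list (not the real file).
AFINN_SAMPLE = {"good": 3, "great": 3, "happy": 3, "sad": -2, "bad": -3, "terrible": -3}

def score_tweet(text):
    """Lexicon-based sentiment score: the sum of the valences of every
    known word in the tweet; unknown words contribute nothing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(AFINN_SAMPLE.get(t, 0) for t in tokens)

pos = score_tweet("Great phone, I am so happy with it")   # 3 + 3
neg = score_tweet("Terrible battery, bad screen")          # -3 + -3
```

In the MapReduce setting, each mapper emits such a score keyed by tweet (or by time window), and the reducers aggregate the scores into the overall sentiment trend.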
Dissertation (MSc)--University of Pretoria, 2019.
National Research Foundation (NRF) - Scarce skills
Computer Science
MSc
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
11

Klock, Robert. "Quality of SQL Code Security on StackOverflow and Methods of Prevention." Oberlin College Honors Theses / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=oberlin1625831198110328.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
12

Balahur, Dobrescu Alexandra. "Methods and resources for sentiment analysis in multilingual documents of different text types." Doctoral thesis, Universidad de Alicante, 2011. http://hdl.handle.net/10045/19437.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
13

Hirao, Eiji, Takeshi Furuhashi, Tomohiro Yoshikawa, and Daisuke Kobayashi. "A Study of Visualization Method with HK Graph Using Concept Words." 日本知能情報ファジィ学会, 2010. http://hdl.handle.net/2237/20687.

Full text of the source
Abstract:
Session ID: TH-B1-3
SCIS & ISIS 2010, Joint 5th International Conference on Soft Computing and Intelligent Systems and 11th International Symposium on Advanced Intelligent Systems. December 8-12, 2010, Okayama Convention Center, Okayama, Japan
APA, Harvard, Vancouver, ISO, and other styles
14

Pieper, Michael J. [Verfasser], and Svetlozar T. [Akademischer Betreuer] Račev. "Advanced Text Mining Methods for the Financial Markets and Forecasting of Intraday Volatility / Michael J. Pieper. Betreuer: S. T. Rachev." Karlsruhe : KIT-Bibliothek, 2011. http://d-nb.info/1018232648/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
15

Issa, Ahmad. "A method for ontology and knowledge-base assisted text mining for diabetes discussion forum." Thesis, University of Warwick, 2015. http://wrap.warwick.ac.uk/71006/.

Full text of the source
Abstract:
Social media offers researchers vast amounts of unstructured text as a source to discover hidden knowledge and insights. However, social media poses new challenges to text mining and knowledge discovery due to its short length, temporal nature and informal language. In order to identify the main requirements for analysing unstructured text in social media, this research takes as a case study a large discussion forum in the diabetes domain. It then reviews and evaluates existing text mining methods against the requirements for analysing such a domain. Using domain background knowledge to bridge the semantic gap in traditional text mining methods was identified as a key requirement for analysing text in discussion forums. Existing ontology engineering methodologies encounter difficulties in deriving suitable domain knowledge with the appropriate breadth and depth in domain-specific concepts and a rich relationship structure. These limitations usually originate from a reliance on human domain experts. This research developed a novel semantic text mining method. It can identify the concepts and topics being discussed and the strength of the relationships between them, and then display the emergent knowledge from a discussion forum. The derived method has a modular design consisting of three main components: the ontology building process, semantic annotation and topic identification, and visualisation tools. The ontology building process generates a domain ontology quickly with little need for domain experts. The topic identification component utilises a hybrid system of domain ontology and a general knowledge base for text enrichment and annotation, while the visualisation methods of dynamic tag clouds and co-occurrence networks for pattern discovery enable a flexible visualisation of these results and can help uncover hidden knowledge. Application of the derived text mining method within the case study helped identify trending topics in the forum and how they change over time.
The derived method performed better in semantic annotation of the text compared to the other systems evaluated. The new text mining method appears to be "generalisable" to domains other than diabetes. Future study needs to confirm this ability and to evaluate its applicability to other types of social media text sources.
APA, Harvard, Vancouver, ISO, and other styles
16

Joshi, Apoorva. "Trajectory-based methods to predict user churn in online health communities." Thesis, University of Iowa, 2018. https://ir.uiowa.edu/etd/6152.

Full text of the source
Abstract:
Online Health Communities (OHCs) have positively disrupted the modern global healthcare system as patients and caregivers are interacting online with similar peers to improve quality of their life. Social support is the pillar of OHCs and, hence, analyzing the different types of social support activities contributes to a better understanding and prediction of future user engagement in OHCs. This thesis used data from a popular OHC, called Breastcancer.org, to first classify user posts in the community into the different categories of social support using Word2Vec for language processing and six different classifiers were explored, resulting in the conclusion that Random Forest was the best approach for classification of the user posts. This exercise helped identify the different types of social support activities that users participate in and also detect the most common type of social support activity among users in the community. Thereafter, three trajectory-based methods were proposed and implemented to predict user churn (attrition) from the OHC. Comparison of the proposed trajectory-based methods with two non-trajectory-based benchmark methods helped establish that user trajectories, which represent the month-to-month change in the type of social support activity of users are effective pointers for user churn from the community. The results and findings from this thesis could help OHC managers better understand the needs of users in the community and take necessary steps to improve user retention and community management.
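The post-classification step described above — posts represented through their word vectors, then assigned to social-support categories — can be sketched as follows. The two-dimensional toy vectors replace the trained Word2Vec embeddings, the category names and example posts are invented, and a nearest-centroid rule stands in for the thesis' Random Forest classifier:

```python
import numpy as np

# Toy 2-D word vectors; the thesis trains real Word2Vec embeddings on the
# Breastcancer.org posts (these words and values are invented).
VEC = {
    "hug":  np.array([1.0, 0.0]), "love": np.array([0.9, 0.1]),
    "dose": np.array([0.0, 1.0]), "scan": np.array([0.1, 0.9]),
}

def post_vector(post):
    """A post's features: the mean of its word vectors (unknown words skipped)."""
    vs = [VEC[w] for w in post.lower().split() if w in VEC]
    return np.mean(vs, axis=0)

# One labelled post per (invented) social-support category.
train = [("sending you a big hug and much love", "emotional"),
         ("my dose schedule and next scan date", "informational")]
centroids = {lab: np.mean([post_vector(p) for p, l in train if l == lab], axis=0)
             for lab in ("emotional", "informational")}

def classify(post):
    """Nearest-centroid rule standing in for the thesis' Random Forest."""
    v = post_vector(post)
    return min(centroids, key=lambda lab: np.linalg.norm(v - centroids[lab]))
```

The monthly sequence of predicted categories per user then forms the trajectory that the churn-prediction methods consume.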
APA, Harvard, Vancouver, ISO, and other styles
17

Goluchowicz, Kerstin Martina [Verfasser], and Knut [Akademischer Betreuer] Blind. "Standardisation Foresight - an indicator-based, text mining and Delphi method / Kerstin Martina Goluchowicz. Betreuer: Knut Blind." Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2012. http://d-nb.info/1025931017/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
18

Zeeh, Julia, Karl Ledermüller, and Michaela Kobler-Weiß. "Evaluierung von Motivationsschreiben als Instrument in universitären Aufnahmeverfahren." zfhe, 2018. http://dx.doi.org/10.3217/zfhe-13-04/13.

Full text of the source
Abstract:
While admission tests at universities are usually evaluated, corresponding procedures for evaluating other steps of the application process - such as the submission of motivation letters - are not yet established. To close this gap, this contribution presents a multi-method approach to evaluating motivation letters, in which text-mining techniques are combined with elements of content analysis. It shows how different "signals" sent by students correlate with academic success, and demonstrates that sociodemographic effects would have to be taken into account when assessing motivation letters.
APA, Harvard, Vancouver, ISO, and other styles
19

Siffer, Alban. "New statistical methods for data mining, contributions to anomaly detection and unimodality testing." Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S113.

Full text of the source
Abstract:
This thesis proposes new statistical algorithms in two different data mining areas: anomaly detection and unimodality testing. First, a new unsupervised method for detecting outliers in streaming data is developed. It is based on the computation of probabilistic thresholds, which are themselves used to discriminate abnormal observations. The strength of this method is its ability to run automatically without prior knowledge or hypotheses about the input data. Similarly, the generic aspect of the algorithm makes it able to operate in various fields; in particular, we develop a cyber-security use case. This thesis also proposes a new unimodality test which determines whether a data distribution has one or several modes. This test is new in two respects: its ability to handle multivariate distributions, but also its low complexity, allowing it to be applied to streaming data. This more fundamental component has applications mainly in other areas of data mining, such as clustering. A new algorithm incrementally searching for the k-means parameter setting is detailed at the end of this manuscript.
APA, Harvard, Vancouver, ISO, and other styles
20

Chaudhary, Amit. "Supplementing consumer insights at Electrolux by mining social media: An exploratory case study." Thesis, Högskolan i Jönköping, Internationella Handelshögskolan, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-16096.

Full text of the source
Abstract:
Purpose – The aim of this thesis is to explore the possibility of text mining social media for consumer insights, from an organizational perspective. Design/methodology/approach – An exploratory, single-case embedded case study with an inductive approach and a partially mixed, concurrent, dominant-status mixed-method research design. The case study contains three different studies to triangulate the research findings and to support the research objective of using social media for consumer insights for new products and new ideas, and for helping the research and development process of any organization. Findings – Text mining is a useful, novel, flexible and unobtrusive method to harness the hidden information in social media. By text mining social media, an organization can find consumer insights in a large dataset; this initiative requires an understanding of social media and its building blocks. In addition, a consumer-focused product development approach not only drives social media mining but is also enriched by using consumer insights from social media. Research limitations/implications – Text mining is a relatively new subject, and a focus on developing better analytical tool kits would promote the use of this novel method. Researchers in the field of consumer-driven new product development can use social media as additional evidence in their research. Practical implications – The consumer insights gained from the text mining of social media, within a workable ethical policy, are positive implications for any organization. Unlike conventional marketing research methods, text mining of social media is cost- and time-effective. Originality/value – This thesis attempts an innovative use of text-mining tools, which appear in the field of computer science, to mine social media and gain a better understanding of consumers, thereby enriching the field of marketing research - a cross-industry effort.
The ability of consumers to spread electronic word of mouth (eWOM) using social media is no secret, and organizations should now consider social media as a source to supplement, if not replace, the insights captured using conventional marketing research methods.
Keywords – Social media, Web 2.0, Consumer generated content, Text mining, Mixed methods design, Consumer insights, Marketing research, Case study, Analytic coding, Hermeneutics, Asynchronous, Emergent strategy
Paper type – Master's thesis
APA, Harvard, Vancouver, ISO, and other styles
21

Pagliarani, Andrea. "New markov chain based methods for single and cross-domain sentiment classification." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/8445/.

Full text of the source
Abstract:
Nowadays communication is switching from a centralized scenario, where media like newspapers, radio and TV programs produce information and people are just consumers, to a completely different, decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs and forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information that enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of text mining techniques like sentiment analysis, which aims to extract opinions from huge amounts of unstructured text. This could help determine, for instance, the degree of user satisfaction with products, services, politicians and so on. In this context, this dissertation presents new document sentiment classification methods based on the mathematical theory of Markov chains. All these approaches rely on a Markov chain based model, which is language independent and whose killer features are simplicity and generality, making it interesting with respect to previous, sophisticated techniques. Every discussed technique has been tested in both single-domain and cross-domain sentiment classification settings, comparing performance with that of two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in the literature, with reference to both single-domain and cross-domain tasks, in 2-class (i.e. positive and negative) document sentiment classification.
However, there is still room for improvement; this work also shows the way forward to enhance performance, namely that a good novel feature-selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in 2-class single-domain sentiment classification, future work will also validate these results in tasks with more than 2 classes.
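One simple reading of a Markov-chain-based document sentiment classifier — per-class word-transition probabilities, with documents scored by the log-likelihood of their word sequence under each class's chain — can be sketched as below. The toy corpus, add-one smoothing scheme, and vocabulary size are illustrative assumptions, not the dissertation's exact model:

```python
import math
from collections import defaultdict

def train_chain(texts):
    """First-order word-transition counts for one class."""
    counts = defaultdict(lambda: defaultdict(int))
    for t in texts:
        words = t.lower().split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return counts

VOCAB = 9  # vocabulary size of the toy corpus below, for add-one smoothing

def log_likelihood(text, counts):
    """Smoothed log-probability of the text's word transitions under a chain."""
    words = text.lower().split()
    ll = 0.0
    for a, b in zip(words, words[1:]):
        row = counts[a]
        ll += math.log((row[b] + 1) / (sum(row.values()) + VOCAB))
    return ll

pos_chain = train_chain(["i love this phone", "love this great screen",
                         "this phone is great"])
neg_chain = train_chain(["i hate this phone", "hate this awful screen",
                         "this phone is awful"])

def classify(text):
    """Pick the class whose chain gives the text the higher likelihood."""
    lp, ln = log_likelihood(text, pos_chain), log_likelihood(text, neg_chain)
    return "positive" if lp >= ln else "negative"
```

Because the model is built purely from word-to-word transition statistics, it is language independent, which is the property the abstract highlights; cross-domain transfer would additionally require mapping the source-domain chain onto target-domain vocabulary.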
APA, Harvard, Vancouver, ISO, and other styles
22

Duck, Geraint. "Extraction of database and software usage patterns from the bioinformatics literature." Thesis, University of Manchester, 2015. https://www.research.manchester.ac.uk/portal/en/theses/extraction-of-database-and-software-usage-patterns-from-the-bioinformatics-literature(fac16cb8-5b5b-4732-b7af-77a41cc64487).html.

Full text of the source
Abstract:
Method forms the basis of scientific research, enabling criticism, selection and extension of current knowledge. However, methods are usually confined to the literature, where they are often difficult to find, understand, compare, or repeat. Bioinformatics and computational biology provide a rich opportunity for resource creation and discovery, with a rapidly expanding "resourceome". Many of these resources are difficult to find due to the large choice available, and there are only a limited number of sufficiently populated lists that can help inform resource selection. Text mining has enabled large-scale data analysis and extraction from within the scientific literature, and so can provide a way to explore the vast wealth of resources that form the basis of bioinformatics methods.
This thesis therefore surveys the computational biology literature, using text mining to extract database and software resource name mentions. By evaluating the common pairs and patterns of usage of these resources within such articles, an abstract approximation of the in silico methods employed within the target domain is developed. Specifically, this thesis provides an analysis of the difficulties of resource name extraction from the literature, and uses this knowledge to develop bioNerDS, a rule-based system that can detect database and software name mentions within full-text documents (with a final F-score of 67%). bioNerDS is then applied to the full-text document corpus from PubMed Central, and the results are explored to identify the differences in resource usage between domains (bioinformatics, biology and medicine) through time, across journals and across document sections. In particular, the well-established resources (e.g., BLAST, GO and GenBank) remain pervasive throughout the domains, although they are seeing a slight decline in usage. Statistical programs see high levels of usage, with R in bioinformatics and SPSS in medicine frequently mentioned throughout the literature.
An overview of the common resource pairs has been generated by pairing database and software names which directly co-occur after one another in text. Combining and aggregating these resource pairs across the literature enables the generation of a network of common resource patterns within computational biology, which provides an abstract representation of the common in silico methods used. For example, sequence alignment tools remain an important part of several computational biology analysis pipelines, and GO is a strong network sink (primarily used for data annotation). The networks also show the emergence of proteomics and next-generation sequencing resources, and provide a specialised overview of a typical phylogenetics method. This work performs an analysis of common resource usage patterns, and thus provides an important first step towards in silico method extraction using text mining. This should have future implications in community best practice, both for resource and method selection.
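The pairing step described in this abstract (database and software names that directly co-occur in text, aggregated into a network) can be sketched with a plain co-occurrence count. This is only an illustration with an invented resource lexicon and corpus, not bioNerDS itself:

```python
from collections import Counter

# Illustrative resource-name lexicon (a stand-in for bioNerDS's rule-based detector)
RESOURCES = {"BLAST", "GenBank", "GO", "ClustalW", "R"}

def detect(tokens):
    """Return (position, name) for every known resource name in a token list."""
    return [(i, t) for i, t in enumerate(tokens) if t in RESOURCES]

def resource_pairs(sentences):
    """Count ordered pairs of resource mentions that directly follow
    one another in a sentence -- the edges of a usage network."""
    edges = Counter()
    for s in sentences:
        hits = detect(s.split())
        for (i, a), (j, b) in zip(hits, hits[1:]):
            edges[(a, b)] += 1
    return edges

corpus = [
    "Sequences were retrieved from GenBank and aligned with ClustalW",
    "GenBank entries were searched using BLAST",
    "Alignments from ClustalW were annotated with GO terms",
]
edges = resource_pairs(corpus)
print(edges.most_common())
```

Aggregating such edge counts over a whole corpus yields the kind of directed resource network the thesis describes.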
23

Bobrik, Annette [Verfasser], and Hermann [Akademischer Betreuer] Krallmann. "Content-based Clustering in Social Corpora - A New Method for Knowledge Identification based on Text Mining and Cluster Analysis / Annette Bobrik. Betreuer: Hermann Krallmann." Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2013. http://d-nb.info/1031075364/34.

24

Bobrik, Annette [Verfasser], and Hermann [Akademischer Betreuer] Krallmann. "Content-based Clustering in Social Corpora - A New Method for Knowledge Identification based on Text Mining and Cluster Analysis / Annette Bobrik. Betreuer: Hermann Krallmann." Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2013. http://nbn-resolving.de/urn:nbn:de:kobv:83-opus-38461.

25

Salehian, Ali. "PREDICTING THE DYNAMIC BEHAVIOR OF COAL MINE TAILINGS USING STATE-OF-PRACTICE GEOTECHNICAL FIELD METHODS." UKnowledge, 2013. http://uknowledge.uky.edu/ce_etds/9.

Abstract:
This study is focused on developing a method to predict the dynamic behavior of mine tailings dams under earthquake loading. Tailings are a by-product of coal mining and processing activities. Mine tailings impoundments are prone to instability and failure under seismic loading as a result of the mechanical behavior of the tailings. Due to the existence of potential seismic sources in close proximity to the coal mining regions in the United States, it is necessary to assess the post-earthquake stability of these tailings dams. To develop the aforementioned methodology, 34 cyclic triaxial tests along with vane shear tests were performed on undisturbed mine tailings specimens from two impoundments in Kentucky. From these tests, the liquefaction resistance and the residual shear strength of the specimens were measured. The laboratory cyclic strength curves for the coal mine specimens were produced, and the relationships between plasticity, density, cyclic stress ratio, and number of cycles to liquefaction were identified. The samples from the Big Branch impoundment were generally loose, while the Abner Fork specimens were dense, older and slightly cemented. The data suggest that the number of loading cycles required to initiate liquefaction in mine tailings, NL, decreases with increasing CSR and with decreasing density. This trend is similar to what is typically observed in soil. For a number of selected specimens, using the results of a series of small-strain cyclic triaxial tests, the shear modulus reduction curves and damping ratio plots were created. The data obtained from laboratory experiments were correlated to the previously recorded geotechnical field data from the two impoundments. The field parameters, including the SPT blow counts (N1)60, corrected CPT cone tip resistance (qt), and shear wave velocity (vs), were correlated to the laboratory-measured cyclic resistance ratio (CRR).
The results indicate that, in general, the higher the (N1)60 and the tip resistance (qt), the higher the CRR. Ultimately, practitioners will be able to use these correlations along with common state-of-practice geotechnical field methods to predict cyclic resistance in fine tailings and to assess the liquefaction potential and post-earthquake stability of the impoundment structures.
26

Mueller, Marianne Larissa [Verfasser], Stefan [Akademischer Betreuer] Kramer, and Frank [Akademischer Betreuer] Puppe. "Data Mining Methods for Medical Diagnosis : Test Selection, Subgroup Discovery, and Constrained Clustering / Marianne Larissa Mueller. Gutachter: Stefan Kramer ; Frank Puppe. Betreuer: Stefan Kramer." München : Universitätsbibliothek der TU München, 2012. http://d-nb.info/1024964264/34.

27

SOARES, FABIO DE AZEVEDO. "AUTOMATIC TEXT CATEGORIZATION BASED ON TEXT MINING." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23213@1.

Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
Text Categorization, one of the tasks performed in Text Mining, can be described as the achievement of a function that is able to assign a document to the category, previously defined, to which it belongs. The main goal of building a taxonomy of documents is to make easier obtaining relevant information. However, the implementation and execution of Text Categorization is not a trivial task: Text Mining tools are under development and still require high technical expertise to be handled, also having great significance in a Text Mining process, the language of the documents should be treated with the peculiarities of each idiom. Yet there is great need for tools that provide proper handling to Portuguese of Brazil. Thus, the main aims of this work are to research, propose, implement and evaluate a Text Mining Framework for Automatic Text Categorization, capable of assisting the execution of knowledge discovery process and provides language processing for Brazilian Portuguese.
28

Baker, Simon. "Semantic text classification for cancer text mining." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/275838.

Abstract:
Cancer researchers and oncologists benefit greatly from text mining major knowledge sources in biomedicine such as PubMed. Fundamentally, text mining depends on accurate text classification. In conventional natural language processing (NLP), this requires experts to annotate scientific text, which is costly and time consuming, resulting in small labelled datasets. This leads to extensive feature engineering and handcrafting in order to fully utilise small labelled datasets, which is again time consuming, and not portable between tasks and domains. In this work, we explore emerging neural network methods to reduce the burden of feature engineering while outperforming the accuracy of conventional pipeline NLP techniques. We focus specifically on the cancer domain in terms of applications, where we introduce two NLP classification tasks and datasets: the first task is that of semantic text classification according to the Hallmarks of Cancer (HoC), which enables text mining of scientific literature assisted by a taxonomy that explains the processes by which cancer starts and spreads in the body. The second task is that of the exposure routes of chemicals into the body that may lead to exposure to carcinogens. We present several novel contributions. We introduce two new semantic classification tasks (the hallmarks, and exposure routes) at both sentence and document levels along with accompanying datasets, and implement and investigate a conventional pipeline NLP classification approach for both tasks, performing both intrinsic and extrinsic evaluation. We propose a new approach to classification using multilevel embeddings and apply this approach to several tasks; we subsequently apply deep learning methods to the task of hallmark classification and evaluate its outcome. Utilising our text classification methods, we develop two novel text mining tools targeting real-world cancer researchers.
The first tool is a cancer hallmark text mining tool that identifies association between a search query and cancer hallmarks; the second tool is a new literature-based discovery (LBD) system designed for the cancer domain. We evaluate both tools with end users (cancer researchers) and find they demonstrate good accuracy and promising potential for cancer research.
29

Lu, Zhiyong. "Text mining on GeneRIFs /." Connect to full text via ProQuest. Limited to UCD Anschutz Medical Campus, 2007.

Abstract:
Thesis (Ph.D. in ) -- University of Colorado Denver, 2007.
Typescript. Includes bibliographical references (leaves 174-182). Free to UCD affiliates. Online version available via ProQuest Digital Dissertations;
30

Gonçalves, Lea Silvia Martins. "Categorização em Text Mining." Universidade de São Paulo, 2002. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-22062015-202748/.

Abstract:
The technological and scientific progress of the last decades has enabled the development of ever more efficient methods for the storage and processing of data. It is possible to obtain knowledge through the analysis and interpretation of the data. Knowledge has become an element of fundamental importance for several organizations, due to its role in aiding decision making. Most of the data available today are found in textual form; an example of this is the Internet's vertiginous growth. As texts are unstructured data, it is necessary to carry out a series of steps to transform them into structured data for analysis. The process entitled Text Mining is an emergent technology and aims at analyzing large collections of documents. This master's dissertation approaches the use of different techniques and tools for Text Mining, which, together with the text pre-processing module designed and implemented by Imamura (2001), can be used for texts in Portuguese. Some algorithms used for knowledge extraction from data are explored, such as: Nearest Neighbor, Naive Bayes, Decision Tree, Decision Rule, Decision Table and Support Vector Machines. To verify the behavior of these algorithms on texts in Portuguese, some experiments were performed.
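One of the algorithms listed in this abstract, Naive Bayes, lends itself to a compact sketch for text classification. The training examples below are invented, and this is a generic multinomial Naive Bayes illustration rather than the dissertation's implementation:

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: (text, label) pairs. Builds per-class word counts and
    document counts for a multinomial Naive Bayes classifier."""
    words_by_class = defaultdict(Counter)
    n_docs = Counter()
    for text, label in docs:
        words_by_class[label].update(text.lower().split())
        n_docs[label] += 1
    vocab = set().union(*words_by_class.values())
    return words_by_class, n_docs, vocab

def classify(text, words_by_class, n_docs, vocab):
    """Pick the class with the highest log posterior, using
    add-one (Laplace) smoothing for unseen words."""
    total = sum(n_docs.values())
    best, best_lp = None, float("-inf")
    for label, counts in words_by_class.items():
        lp = math.log(n_docs[label] / total)        # log prior
        denom = sum(counts.values()) + len(vocab)   # smoothed denominator
        for w in text.lower().split():
            lp += math.log((counts[w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented two-class training set
train_set = [
    ("stocks fell sharply on the exchange", "finance"),
    ("the bank reported quarterly profit", "finance"),
    ("the team won the final match", "sport"),
    ("players trained before the match", "sport"),
]
model = train(train_set)
print(classify("the bank reported a profit", *model))  # -> finance
```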
31

Al-Halimi, Reem Khalil. "Mining Topic Signals from Text." Thesis, University of Waterloo, 2003. http://hdl.handle.net/10012/1165.

Abstract:
This work aims at studying the effect of word position in text on understanding and tracking the content of written text. In this thesis we present two uses of word position in text: topic word selectors and topic flow signals. The topic word selectors identify important words, called topic words, by their spread through a text. The underlying assumption here is that words that repeat across the text are likely to be more relevant to the main topic of the text than ones that are concentrated in small segments. Our experiments show that manually selected keywords correspond more closely to topic words extracted using these selectors than to words chosen using more traditional indexing techniques. This correspondence indicates that topic words identify the topical content of the documents more than words selected using the traditional indexing measures that do not utilize word position in text. The second approach to applying word position is through topic flow signals. In this representation, words are replaced by the topics to which they refer. The flow of any one topic can then be traced throughout the document and viewed as a signal that rises when a word relevant to the topic is used and falls when an irrelevant word occurs. To reflect the flow of the topic in larger segments of text we use a simple smoothing technique. The resulting smoothed signals are shown to be correlated to the ideal topic flow signals for the same document. Finally, we characterize documents using the importance of their topic words and the spread of these words in the document. When incorporated into a Support Vector Machine classifier, this representation is shown to drastically reduce the vocabulary size and improve the classifier's performance compared to the traditional word-based, vector space representation.
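The topic flow signal described above (a signal that rises on topic-relevant words and is then smoothed to reflect larger segments) can be sketched directly; the topic vocabulary and sentence below are invented:

```python
def topic_signal(tokens, topic_words):
    """1.0 where a token refers to the topic, else 0.0."""
    return [1.0 if t in topic_words else 0.0 for t in tokens]

def smooth(signal, window=3):
    """Simple moving average so the signal reflects larger text segments."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        seg = signal[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

text = "the gene was expressed while the market price fell as the gene mutated"
tokens = text.split()
genetics = {"gene", "expressed", "mutated"}
raw = topic_signal(tokens, genetics)
smoothed = smooth(raw)
print(smoothed)
```

The smoothed signal is high at the start and end of the sentence (genetics vocabulary) and falls to zero in the middle (market vocabulary), mirroring the rise-and-fall behaviour the abstract describes.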
32

Zaghloul, Waleed A. Lee Sang M. "Text mining using neural networks." Lincoln, Neb. : University of Nebraska-Lincoln, 2005. http://0-www.unl.edu.library.unl.edu/libr/Dissertations/2005/Zaghloul.pdf.

Abstract:
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2005.
Title from title screen (sites viewed on Oct. 18, 2005). PDF text: 100 p. : col. ill. Includes bibliographical references (p. 95-100 of dissertation).
33

Rice, Simon B. "Text data mining in bioinformatics." Thesis, University of Manchester, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488351.

34

Munyana, Nicole. "Le text mining et XML." Thèse, Trois-Rivières : Université du Québec à Trois-Rivières, 2007. http://www.uqtr.ca/biblio/notice/resume/30024815R.pdf.

35

Theußl, Stefan, Ingo Feinerer, and Kurt Hornik. "Distributed Text Mining in R." WU Vienna University of Economics and Business, 2011. http://epub.wu.ac.at/3034/1/Theussl_etal%2D2011%2Dpreprint.pdf.

Abstract:
R has recently gained explicit text mining support with the "tm" package enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data to be processed in a single machine is usually limited by the available main memory (i.e., RAM), and (2) an increase of the amount of data to be analyzed leads to increasing computational workload. Fortunately, adequate parallel programming models like MapReduce and the corresponding open source implementation called Hadoop allow for processing data sets beyond what would fit into memory. In this paper we present the package "tm.plugin.dc" offering a seamless integration between "tm" and Hadoop. We show on the basis of an application in culturomics that we can efficiently handle data sets of significant size.
Series: Research Report Series / Department of Statistics and Mathematics
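The MapReduce model that the package builds on can be illustrated independently of R and Hadoop: a map phase emits key-value pairs per document, a shuffle groups them by key, and a reduce phase aggregates each group. A minimal single-machine sketch with an invented corpus:

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (term, 1) for every token -- runs independently per document."""
    return [(tok, 1) for tok in text.lower().split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    """Aggregate each key's values; here, a distributed term count."""
    return {k: sum(vs) for k, vs in groups.items()}

corpus = {1: "text mining in R", 2: "distributed text mining"}
emitted = [p for d, t in corpus.items() for p in map_phase(d, t)]
counts = reduce_phase(shuffle(emitted))
print(counts)
```

Because each map call touches only one document and each reduce call only one key's group, the same computation can be spread over many machines, which is what lets the tm/Hadoop combination handle corpora larger than main memory.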
36

Meyer, David, Kurt Hornik, and Ingo Feinerer. "Text Mining Infrastructure in R." American Statistical Association, 2008. http://epub.wu.ac.at/3978/1/textmining.pdf.

Abstract:
During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels. (authors' abstract)
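The count-based representation underlying such a framework, a document-term matrix, can be sketched in a few lines. tm itself is R code, so this is only a language-neutral illustration of the data structure:

```python
def document_term_matrix(docs):
    """Rows are documents, columns the sorted corpus vocabulary,
    cells the raw term frequencies."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = []
    for d in docs:
        row = [0] * len(vocab)
        for w in d.lower().split():
            row[index[w]] += 1
        matrix.append(row)
    return vocab, matrix

docs = ["text mining with text corpora", "clustering of corpora"]
vocab, dtm = document_term_matrix(docs)
print(vocab)
print(dtm)
```

Clustering and classification in such frameworks then operate on these rows as term-frequency vectors.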
37

Martins, Bruno. "Geographically Aware Web Text Mining." Master's thesis, Department of Informatics, University of Lisbon, 2009. http://hdl.handle.net/10451/14301.

Abstract:
Text mining and search have become important research areas over the past few years, mostly due to the large popularity of the Web. A natural extension for these technologies is the development of methods for exploring the geographic context of Web information. Human information needs often present specific geographic constraints. Many Web documents also refer to specific locations. However, relatively little effort has been spent on developing the facilities required for geographic access to unstructured textual information. Geographically aware text mining and search remain relatively unexplored. This thesis addresses this new area, arguing that Web text mining can be applied to extract geographic context information, and that this information can be explored for information retrieval. Fundamental questions investigated include handling geographic references in text, assigning geographic scopes to the documents, and building retrieval applications that handle/use geographic scopes. The thesis presents appropriate solutions for each of these challenges, together with a comprehensive evaluation of their effectiveness. By investigating these questions, the thesis presents several findings on how the geographic context can be effectively handled by text processing tools.
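The first two questions above, handling geographic references and assigning document scopes, can be illustrated with a toy gazetteer lookup and a centroid-based scope. The place names, coordinates and method here are illustrative only, not the thesis's actual solution:

```python
# Toy gazetteer: place name -> (latitude, longitude); values are illustrative
GAZETTEER = {
    "Lisbon": (38.72, -9.14),
    "Porto": (41.15, -8.61),
}

def geo_references(text):
    """Return (name, coordinates) for every gazetteer entry found,
    a crude stand-in for geographic named-entity recognition."""
    refs = []
    for token in text.replace(",", " ").split():
        if token in GAZETTEER:
            refs.append((token, GAZETTEER[token]))
    return refs

def document_scope(text):
    """Assign a document scope as the centroid of its place mentions."""
    refs = geo_references(text)
    if not refs:
        return None
    lats = [lat for _, (lat, lon) in refs]
    lons = [lon for _, (lat, lon) in refs]
    return (sum(lats) / len(lats), sum(lons) / len(lons))

doc = "The conference moves from Lisbon to Porto next year."
refs = geo_references(doc)
scope = document_scope(doc)
print(refs)
print(scope)
```

A retrieval application could then index documents by their scopes and match them against the geographic constraint of a query.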
38

McDonald, Daniel Merrill. "Combining Text Structure and Meaning to Support Text Mining." Diss., The University of Arizona, 2006. http://hdl.handle.net/10150/194015.

Abstract:
Text mining methods strive to make unstructured text more useful for decision making. As part of the mining process, language is processed prior to analysis. Processing techniques have often focused primarily on either text structure or text meaning in preparing documents for analysis. As approaches have evolved over the years, increases in the use of lexical semantic parsing usually have come at the expense of full syntactic parsing. This work explores the benefits of combining structure and meaning, or syntax and lexical semantics, to support the text mining process.
Chapter two presents the Arizona Summarizer, which includes several processing approaches to automatic text summarization. Each approach has varying usage of structural and lexical semantic information. The usefulness of the different summaries is evaluated in the finding stage of the text mining process. The summary produced using structural and lexical semantic information outperforms all others in the browse task.
Chapter three presents the Arizona Relation Parser, a system for extracting relations from medical texts. The system is a grammar-based system that combines syntax and lexical semantic information in one grammar for relation extraction. The relation parser attempts to capitalize on the high precision performance of semantic systems and the good coverage of the syntax-based systems. The parser performs in line with the top reported systems in the literature.
Chapter four presents the Arizona Entity Finder, a system for extracting named entities from text. The system greatly expands on the combination grammar approach from the relation parser. Each tag is given a semantic and syntactic component and placed in a tag hierarchy. Over 10,000 tags exist in the hierarchy. The system is tested on multiple domains and is required to extract seven additional types of entities in the second corpus. The entity finder achieves a 90 percent F-measure on the MUC-7 data and an 87 percent F-measure on the Yahoo data where additional entity types were extracted.
Together, these three chapters demonstrate that combining text structure and meaning in algorithms to process language has the potential to improve the text mining process. A lexical semantic grammar is effective at recognizing domain-specific entities and language constructs. Syntax information, on the other hand, allows a grammar to generalize its rules when possible. Balancing performance and coverage in light of the world's growing body of unstructured text is important.
39

Höckert, Linda. "Kemisk stabilisering av gruvavfall från Ljusnarsbergsfältet med mesakalk och avloppsslam." Thesis, Uppsala University, Department of Earth Sciences, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-88825.

Abstract:

Mine waste from Ljusnarsbergsfältet in Kopparberg, Sweden, is considered to constitute a great risk for human health and the surrounding environment. Some of the waste rock consists of sulphide minerals. When sulphide minerals come into contact with dissolved oxygen and precipitation, oxidation may occur resulting in acid mine drainage (AMD) and the release of heavy metals. The purpose of this study has been to characterise the waste material and try to chemically stabilize the waste rock with a mixture of sewage sludge and calcium carbonate. The drawback of using organic matter is the risk that dissolved organic matter can act as a complexing agent for heavy metals and in this way increase their mobility. An additional study to examine this risk has therefore also been performed.

The project started with a pilot study in order to identify the material fraction that was suitable for the experiment. When suitable material had been chosen, a column test was carried out for the purpose of studying the slurry’s influence on the mobility of metals along with the production of acidity. To clarify the organic material’s potential for complexation a pH-stat batch test was used. Drainage water samples, from the columns, were regularly taken during the experiment. These samples were analysed for pH, electrical conductivity, alkalinity, redox potential, dissolved organic carbon (DOC), sulphate and leaching metals. The effluent from the pH-stat-test were only analysed on a few occasions and only for metal content and change in DOC concentration.

The results from the laboratory experiments showed that the waste rock from Ljusnarsberg easily leached large amounts of metals. The stabilization of the waste rock succeeded in maintaining a near neutral pH in the rock waste leachate, compared to a pH 3 leachate from untreated rock waste. The average concentration of copper and zinc in the leachate from untreated waste rock exceeded 100 and 1000 mg/l respectively, while these metals were detected at concentrations around 0.1 and 1 mg/l, respectively, in the leachate from the treated wastes. Examined metals had concentrations between 40 and 4000 times lower in the leachate from treated waste rock, which implies that the stabilisation with reactive amendments succeeded. The long term effects are, however, not determined. The added sludge contributed to immobilise metals at neutral pH despite a small increase in DOC concentration. The problem with adding sludge is that if pH decreases with time there is a risk of increased metal leaching.


40

Olsson, Elin. "Deriving Genetic Networks Using Text Mining." Thesis, University of Skövde, Department of Computer Science, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-708.

Abstract:

On the Internet an enormous amount of information is available, represented in an unstructured form. The purpose of a text mining tool is to collect this information and present it in a more structured form. In this report, text mining is used to create an algorithm that searches abstracts available from PubMed and finds specific relationships between genes that can be used to create a network. The algorithm can also be used to find information about a specific gene. The network created by Mendoza et al. (1999) was verified in all the connections but one using the algorithm. This connection contained implicit information. The results suggest that the algorithm is better at extracting information about specific genes than at finding connections between genes. One advantage of the algorithm is that it can also find connections between genes and proteins, and between genes and other chemical substances.
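The simplest form of the relationship search described above is sentence-level co-occurrence of gene names. The gene lexicon and abstracts below are invented, and the thesis's algorithm looks for more specific relational patterns than bare co-occurrence:

```python
from collections import Counter
from itertools import combinations

# Illustrative gene lexicon
GENES = {"lacZ", "araC", "crp"}

def cooccurring_pairs(abstracts):
    """Count unordered gene pairs mentioned in the same sentence --
    candidate edges for a gene network."""
    pairs = Counter()
    for abstract in abstracts:
        for sentence in abstract.split("."):
            found = sorted({w.strip(",;()") for w in sentence.split()} & GENES)
            for a, b in combinations(found, 2):
                pairs[(a, b)] += 1
    return pairs

abstracts = [
    "Expression of lacZ is activated by crp. The araC regulator was unaffected.",
    "Both crp and lacZ respond to glucose levels.",
]
pairs = cooccurring_pairs(abstracts)
print(pairs)
```

Repeated co-occurrence across many abstracts raises confidence that a pair reflects a genuine biological relationship rather than chance.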

41

Fivelstad, Ole Kristian. "Temporal Text Mining : The TTM Testbench." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8764.

Abstract:

This master thesis presents the Temporal Text Mining (TTM) Testbench, an application for discovering association rules in temporal document collections. It is a continuation of work done in a project in the fall of 2005 and in a project in the fall of 2006; these projects laid the foundation for this thesis. The focus of the work is on identifying and extracting meaningful terms from textual documents to improve the meaningfulness of the mined association rules. Much work has been done to compile the theoretical foundation of this project, and this foundation has been used for assessing different approaches for finding meaningful and descriptive terms. The old TTM Testbench has been extended to include usage of WordNet, and operations for finding collocations, performing word sense disambiguation, and extracting higher-level concepts and categories from the individual documents. A method for rating association rules based on the semantic similarity of the terms present in the rules has also been implemented, in an attempt to narrow down the result set and filter out rules which are not likely to be interesting. Experiments performed with the improved application show that the usage of WordNet and the new operations can help increase the meaningfulness of the rules. One factor which plays a big part in this is that synonyms of words are added to make the terms more understandable. However, the experiments showed that it was difficult to decide whether a rule was interesting or not, which made it impossible to draw any conclusions regarding the suitability of semantic similarity for finding interesting rules. All work on the TTM Testbench so far has focused on finding association rules in web newspapers. It may, however, be useful to perform experiments in a more limited domain, for example medicine, where the interestingness of a rule may be more easily decided.
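Association rules of the kind the TTM Testbench mines are conventionally scored with support and confidence over a document collection. A minimal sketch with invented documents, where each document's term set plays the role of a transaction:

```python
def rule_metrics(docs, antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent,
    where a 'transaction' is the set of terms in one document."""
    transactions = [set(d.lower().split()) for d in docs]
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

docs = [
    "election results announced today",
    "election debate covered the economy",
    "economy slows amid election fears",
    "football season opens today",
]
s, c = rule_metrics(docs, "election", "economy")
print(s, c)
```

Temporal text mining applies such measures per time window, so that rules can be tracked as they appear and fade over time.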

42

Jelier, Rob. "Text mining applied to molecular biology." [S.l.] : Rotterdam : [The Author] ; Erasmus University [Host], 2008. http://hdl.handle.net/1765/10866.

43

Rentzmann, René. "Text mining im Customer-relationship-Management." Hamburg Kovač, 2007. http://d-nb.info/987473808/04.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
44

Rentzmann, René. "Text Mining im Customer Relationship Management /." Hamburg : Kovač, 2008. http://www.verlagdrkovac.de/978-3-8300-3510-7.htm.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
45

Leroy, Gondy, Hsinchun Chen, Jesse D. Martinez, Shauna Eggers, Ryan R. Falsey, Kerri L. Kislin, Zan Huang, et al. "Genescene: Biomedical Text And Data Mining." Wiley Periodicals, Inc, 2005. http://hdl.handle.net/10150/105791.

Повний текст джерела
Анотація:
Artificial Intelligence Lab, Department of MIS, University of Arizona
To access the content of digital texts efficiently, it is necessary to provide more sophisticated access than keyword-based searching. Genescene provides biomedical researchers with research findings and background relations automatically extracted from text and experimental data, giving a more detailed overview of the available information. The extracted relations were evaluated by qualified researchers and found to be precise. An ongoing qualitative evaluation of the current online interface indicates that this method of searching the literature is more useful and efficient than keyword-based searching.
Стилі APA, Harvard, Vancouver, ISO та ін.
46

Gilli, Giacomo. "Text Mining mediante l'utilizzo di Orange." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5041/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
47

GOMES, ROBERTO MIRANDA. "WORD SENSE DESAMBIGUATION IN TEXT MINING." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2009. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=14103@1.

Повний текст джерела
Анотація:
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
This dissertation investigated the application of text mining processes, based on computational intelligence and machine learning techniques, to the problem of word sense ambiguity. The work, in the area of decision support methods, aimed to develop techniques capable of automating disambiguation processes and to build a prototype implementing some of those techniques. Word sense disambiguation is the process of assigning a meaning to a word using information gathered from the context in which it occurs; one of its goals is to mitigate the mistakes introduced by ambiguous textual constructions, thereby supporting decision making. The work also sought to reuse concepts, tools, and documentation practices considered in previous work, so as to continue that line of research and leave a legacy that is more easily reused in future work. Special attention was given to the process of detecting ambiguities, and for this reason a differentiated approach was employed. Unlike the most common form of disambiguation, in which a machine is trained to disambiguate a specific term, the present work avoided any dependence on knowing the term to be treated, making the system more robust and generic. To this end, specific heuristics based on computational intelligence techniques were developed. The semantic criteria for identifying ambiguous terms were extracted from clustering techniques applied to lexicons built after a term normalization process. The prototype, SID (Sistema Inteligente de Desambiguação, an intelligent disambiguation system), was developed in .NET, which supports a wide range of development languages and thus facilitates code reuse, whether to continue the research or to apply the implemented techniques in a text mining application. The language chosen was C#, for its robustness, ease of use, and syntactic similarity to Java and C++, languages widely known and used by most developers.
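The ambiguity-detection idea described here, flagging a term as ambiguous when its occurrence contexts do not group together, can be sketched roughly as follows; the Jaccard measure, the threshold, and the toy sentences are illustrative assumptions, not the thesis's actual heuristics:

```python
def context_sets(sentences, target):
    """Collect the set of co-occurring words for each occurrence of target."""
    return [set(w for w in s.lower().split() if w != target)
            for s in sentences if target in s.lower().split()]

def jaccard(a, b):
    """Set overlap: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def looks_ambiguous(sentences, target, threshold=0.25):
    """Flag target as ambiguous if its contexts are, on average, dissimilar."""
    ctx = context_sets(sentences, target)
    if len(ctx) < 2:
        return False
    sims = [jaccard(ctx[i], ctx[j])
            for i in range(len(ctx)) for j in range(i + 1, len(ctx))]
    return sum(sims) / len(sims) < threshold

sentences = [
    "the bank approved the loan",
    "the bank raised the interest rate on the loan",
    "they sat on the bank of the river",
]
print(looks_ambiguous(sentences, "bank"))  # finance contexts vs. river context
```

The point of the sketch is the term-independence the abstract emphasizes: nothing in `looks_ambiguous` knows what "bank" means; only the grouping behavior of its contexts matters.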
Стилі APA, Harvard, Vancouver, ISO та ін.
48

AGUIAR, C. Z. "Concept Maps Mining for Text Summarization." Universidade Federal do Espírito Santo, 2017. http://repositorio.ufes.br/handle/10/9846.

Повний текст джерела
Анотація:
Concept maps are graphical tools for representing and constructing knowledge. Concepts and relations form the basis for learning, so concept maps have been widely used in different situations and for different purposes in education, one of which is the representation of written text. Even a grammatically complex text can be represented by a concept map containing only the concepts and relations that express what was stated in a more complicated form. However, manually constructing a concept map requires considerable time and effort to identify and structure knowledge, especially when the map is meant to represent not the concepts of the author's cognitive structure but the concepts expressed in a text. Several technological approaches have therefore been proposed to facilitate the construction of concept maps from texts. This dissertation proposes a new approach for the automatic construction of concept maps as summaries of scientific texts. The summarization aims to produce a concept map as a condensed representation of the text that preserves its diverse and most important features. Summarization can ease text comprehension at a time when students struggle with the cognitive overload caused by the growing amount of textual information available, a growth that can also hinder knowledge construction. We therefore consider the hypothesis that summarizing a text as a concept map can highlight the features needed to assimilate the text's knowledge while reducing its complexity and the time needed to process it. In this context, we conducted a literature review covering 1994 to 2016 on approaches aimed at the automatic construction of concept maps from texts.
From this review we built a categorization to better identify and analyze the resources and characteristics of these technological approaches, and we sought to identify their limitations and gather the best features of related work to inform our own approach. We also present a Concept Map Mining process organized along four dimensions: Data Source Description, Domain Definition, Element Identification, and Map Visualization. With the goal of developing a computational architecture to automatically build concept maps as summaries of academic texts, this research produced the public tool CMBuilder, an online tool for the automatic construction of concept maps from texts, as well as a Java API called ExtroutNLP, which contains libraries for information extraction and public services. To achieve the proposed objective, efforts were directed at the areas of natural language processing and information retrieval. The main task in reaching our goal is to extract propositions of the form (concept, relation, concept) from the text. Under this premise, the research introduces a pipeline comprising: grammatical rules and depth-first search for extracting concepts and relations from text; preposition mapping, anaphora resolution, and named-entity exploitation for labeling concepts; concept ranking based on element frequency analysis and map topology; and proposition summarization based on graph topology. The approach also proposes the use of supervised clustering and classification techniques, together with a thesaurus, to define the text's domain and build a conceptual domain vocabulary. Finally, an objective analysis to validate the accuracy of the ExtroutNLP library was carried out, yielding 0.65 precision on the corpus.
In addition, a subjective analysis to validate the quality of the concept maps built by CMBuilder was performed, yielding precision/recall of 0.75/0.45 for concepts and 0.57/0.23 for relations in English, and 0.68/0.38 for concepts and 0.41/0.19 for relations in Portuguese. Furthermore, an experiment was carried out to verify whether the concept maps summarized by CMBuilder help readers understand the subject of a text, reaching 60% correct answers for maps extracted from short texts with multiple-choice questions and 77% correct answers for maps extracted from long texts with open-ended questions.
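The pipeline's core task, extracting (concept, relation, concept) propositions from text, can be illustrated with a deliberately naive pattern-based sketch; the verb list and sentences are hypothetical, and CMBuilder's actual grammatical rules, anaphora resolution, and ranking are far richer:

```python
import re

# A toy "NP VERB NP" grammatical pattern; the linking verbs are an
# illustrative assumption, not CMBuilder's rule set.
LINK_VERBS = r"(is|are|contains|produces|requires|uses)"
PATTERN = re.compile(
    rf"(?:the\s+)?(\w+(?:\s\w+)?)\s+{LINK_VERBS}\s+(?:a\s+|an\s+|the\s+)?(\w+(?:\s\w+)?)",
    re.IGNORECASE,
)

def extract_propositions(text):
    """Extract (concept, relation, concept) triples from simple sentences."""
    triples = []
    for sentence in re.split(r"[.!?]", text):
        m = PATTERN.search(sentence.strip())
        if m:
            concept1, relation, concept2 = m.groups()
            triples.append((concept1.lower(), relation.lower(), concept2.lower()))
    return triples

text = "The cell contains a nucleus. The nucleus produces ribosomes."
print(extract_propositions(text))
```

Each extracted triple becomes one labeled arc of the concept map; ranking and summarization then prune the resulting graph down to the most important propositions.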
Стилі APA, Harvard, Vancouver, ISO та ін.
49

Hellström, Karlsson Rebecca. "Aiding Remote Diagnosis with Text Mining." Thesis, KTH, Människa och Kommunikation, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215760.

Повний текст джерела
Анотація:
The topic of this thesis is how text mining could be applied to patient-reported symptom descriptions, and how it could aid doctors in their diagnostic process. Healthcare delivery today struggles to provide care to remote settings, and costs are increasing together with the aging population. How much text mining on patient descriptions can aid doctors is unknown. Investigating whether text mining can aid doctors, by presenting additional information based on what patients with descriptions similar to the current patient's have written, could be relevant to many settings in healthcare. It has the potential to improve the quality of care in remote settings and to increase the number of patients treated with the limited resources available. In this work, patient texts were represented using the Bag-of-Words model and clustered using the k-means algorithm. The final clustering model used 41 clusters, and the ten most important words for each cluster centroid were used as representative words for that cluster. An experiment was then performed to gauge how doctors were aided in their diagnostic process when patient texts were paired with these additional words. The results were that the words aided doctors in difficult patient cases, and that the clustering algorithm can be used to provide the current patient with specific follow-up questions.
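The abstract's pipeline (Bag-of-Words vectors, k-means clustering, top centroid words per cluster) can be sketched in plain Python; the toy symptom corpus, k=2, and the deterministic initialisation are illustrative assumptions, unlike the thesis's 41-cluster model:

```python
from collections import Counter

def bow_vectors(docs):
    """Bag-of-Words: one count vector per document over a shared vocabulary."""
    vocab = sorted(set(w for d in docs for w in d))
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(d).items():
            v[index[w]] = float(c)
        vecs.append(v)
    return vocab, vecs

def kmeans(vecs, k, iters=10):
    """Plain k-means with deterministic init (first k points as centroids)."""
    centroids = [list(v) for v in vecs[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            # Assign each vector to its nearest centroid (squared Euclidean).
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(v)
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

docs = [
    "fever cough headache".split(),
    "knee pain swelling".split(),
    "cough fever sore throat".split(),
    "pain swelling ankle".split(),
]
vocab, vecs = bow_vectors(docs)
centroids, clusters = kmeans(vecs, k=2)
for c in centroids:
    # The highest-weighted centroid words act as the cluster's representatives.
    top = [vocab[i] for i in sorted(range(len(c)), key=lambda i: -c[i])[:3]]
    print(top)
```

On this toy data the two clusters separate the fever-like and joint-pain descriptions, and the top centroid words are exactly the "representative words" the thesis pairs with patient texts (there, the ten best words of 41 centroids).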
Стилі APA, Harvard, Vancouver, ISO та ін.
50

Stolt, Richard. "The Business Value of Text Mining." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-13740.

Повний текст джерела
Анотація:
Text mining is an enabling technology that will change how businesses derive insights and knowledge from the textual data available to them. The current literature focuses on text mining algorithms and techniques, whereas the practical aspects of text mining are lacking. This study aims to help companies understand the business value of text mining with the help of a case study. An SMS-survey method was then used to identify additional business areas where text mining could be used to derive business value. A literature review was conducted to conceptualize the business value of text mining, and a concept matrix was established in which each business category is specified together with its derived insights and knowledge, domain, and data source. The concept matrix was then used to decide when information is of business value, in order to show that text mining can be used to derive such information. Text mining analyses were conducted on survey feedback from a traffic school. The results were several patterns, with business value derived mainly in the categories of Quality Control and Quality Assurance. After comparing the results of the SMS survey with the empirical data from the case study, some difficulties emerged in categorizing the derived information, implying that the categories need to become more specific and distinct. Furthermore, the concept matrix does not yet comprise all the business categories that are likely to exist.
Стилі APA, Harvard, Vancouver, ISO та ін.