A selection of scientific literature on the topic "Big text data"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Select a source type:

Consult the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Big text data".

Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and the bibliographic reference for the chosen work will be formatted automatically in the citation style you need (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication as a PDF and read its online abstract, provided the relevant parameters are included in the metadata.

Journal articles on the topic "Big text data"

1

N.J., Anjala. "Algorithmic Assessment of Text based Data Classification in Big Data Sets". Journal of Advanced Research in Dynamical and Control Systems 12, SP4 (March 31, 2020): 1231–34. http://dx.doi.org/10.5373/jardcs/v12sp4/20201598.

2

Hassani, Hossein, Christina Beneki, Stephan Unger, Maedeh Taj Mazinani and Mohammad Reza Yeganegi. "Text Mining in Big Data Analytics". Big Data and Cognitive Computing 4, no. 1 (January 16, 2020): 1. http://dx.doi.org/10.3390/bdcc4010001.

Abstract:
Text mining in big data analytics is emerging as a powerful tool for harnessing the power of unstructured textual data by analyzing it to extract new knowledge and to identify significant patterns and correlations hidden in the data. This study seeks to determine the state of text mining research by examining the developments within published literature over past years and provide valuable insights for practitioners and researchers on the predominant trends, methods, and applications of text mining research. In accordance with this, more than 200 academic journal articles on the subject are included and discussed in this review; the state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, across a broad range of application areas are also investigated. Additionally, the benefits and challenges related to text mining are also briefly outlined.
3

Kodabagi, M. M., Deepa Sarashetti and Vilas Naik. "A Text Information Retrieval Technique for Big Data Using Map Reduce". Bonfring International Journal of Software Engineering and Soft Computing 6, Special Issue (October 31, 2016): 22–26. http://dx.doi.org/10.9756/bijsesc.8236.

4

Courtney, Kyle, Rachael Samberg and Timothy Vollmer. "Big data gets big help: Law and policy literacies for text data mining". College & Research Libraries News 81, no. 4 (April 9, 2020): 193. http://dx.doi.org/10.5860/crln.81.4.193.

Abstract:
A wealth of digital texts and the proliferation of automated research methodologies enable researchers to analyze large sets of data at a speed that would be impossible to achieve through manual review. When researchers use these automated techniques and methods for identifying, extracting, and analyzing patterns, trends, and relationships across large volumes of un- or thinly structured digital content, they are applying a methodology called text data mining or TDM. TDM is also referred to, with slightly different emphases, as “computational text analysis” or “content mining.”
5

Rajagopal, D., and K. Thilakavalli. "Efficient Text Mining Prototype for Big Data". International Journal of Data Mining And Emerging Technologies 5, no. 1 (2015): 38. http://dx.doi.org/10.5958/2249-3220.2015.00007.5.

6

Iqbal, Waheed, Waqas Ilyas Malik, Faisal Bukhari, Khaled Mohamad Almustafa and Zubiar Nawaz. "Big Data Full-Text Search Index Minimization Using Text Summarization". Information Technology and Control 50, no. 2 (June 17, 2021): 375–89. http://dx.doi.org/10.5755/j01.itc.50.2.25470.

Abstract:
An efficient full-text search is achieved by indexing the raw data with an additional 20 to 30 percent storage cost. In the context of Big Data, this additional storage space is huge and introduces challenges to entertain full-text search queries with good performance. It also incurs overhead to store, manage, and update the large size index. In this paper, we propose and evaluate a method to minimize the index size to offer full-text search over Big Data using an automatic extractive-based text summarization method. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed actual and summarized datasets using Apache Lucene and studied average simple overlapping, Spearman's rho correlation, and average ranking score measures of search results obtained using different search queries. Our experimental evaluation shows that automatic text summarization is an effective method to reduce the index size significantly. We obtained a maximum of 82% reduction in index size with 42% higher relevance of the search results using the proposed solution to minimize the full-text index size.
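The paper's core idea, indexing extractive summaries instead of raw text to shrink the index, can be illustrated in miniature. The sketch below is not the authors' Apache Lucene pipeline: it uses a naive frequency-based summarizer and a toy inverted index, and the two sample documents are invented.

```python
import re
from collections import Counter, defaultdict

def extractive_summary(text, ratio=0.4):
    """Score sentences by summed word frequency; keep the top fraction."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    ranked = sorted(sentences, reverse=True,
                    key=lambda s: sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())))
    return " ".join(ranked[:max(1, int(len(sentences) * ratio))])

def build_index(docs):
    """Toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index

docs = {
    1: "Indexing raw text is costly at scale. Summaries keep the salient terms. "
       "A smaller index still answers most queries well.",
    2: "Full-text search over big data needs a large index. "
       "Extractive summarization drops redundant sentences before indexing.",
}
full = build_index(docs)
small = build_index({d: extractive_summary(t) for d, t in docs.items()})
print(f"index terms: {len(full)} full vs {len(small)} summarized")
```

On a real corpus one would then measure, as the authors do, both the index-size reduction and the overlap or rank correlation of search results against the full index.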
7

Toon, Elizabeth, Carsten Timmermann and Michael Worboys. "Text-Mining and the History of Medicine: Big Data, Big Questions?" Medical History 60, no. 2 (March 14, 2016): 294–96. http://dx.doi.org/10.1017/mdh.2016.18.

8

Lepper, Marcel. "Big Data, Global Villages". Philological Encounters 1, no. 1-4 (January 26, 2016): 131–62. http://dx.doi.org/10.1163/24519197-00000006.

Abstract:
How should the field of philology react to the ongoing quantitative growth of its material basis? This essay will first discuss two opposing strategies: The quantitative analysis of large amounts of data, promoted above all by Franco Moretti, is contrasted with the canon-oriented method of resorting to small corpora. Yet both the culturally conservative anxiety over growing masses of texts as well as the enthusiasm for the ‘digital humanities’ and the technological indexation of large text corpora prove to be unmerited when considering the complexity of the problem. Therefore, this essay advocates for a third, heuristic approach, which 1) accounts for the changes in global text production and storage, 2) is conscious of the material-political conditions that determine the accessibility of texts, and 3) creates a bridge between close and distant reading by binding quantitative approaches to fundamental, qualitative philological principles, thus helping philologists keep track of the irritating, provocative, and subversive elements of texts that automated queries inevitably miss.
9

Khan, Zaheer, and Tim Vorley. "Big data text analytics: an enabler of knowledge management". Journal of Knowledge Management 21, no. 1 (February 13, 2017): 18–34. http://dx.doi.org/10.1108/jkm-06-2015-0238.

Abstract:
Purpose: The purpose of this paper is to examine the role of big data text analytics as an enabler of knowledge management (KM). The paper argues that big data text analytics represents an important means to visualise and analyse data, especially unstructured data, which have the potential to improve KM within organisations.
Design/methodology/approach: The study uses text analytics to review 196 articles published in two of the leading KM journals, Journal of Knowledge Management and Journal of Knowledge Management Research & Practice, in 2013 and 2014. The text analytics approach is used to process, extract and analyse the 196 papers to identify trends in terms of keywords, topics and keyword/topic clusters to show the utility of big data text analytics.
Findings: The findings show how big data text analytics can have a key enabler role in KM. Drawing on the 196 articles analysed, the paper shows the power of big data-oriented text analytics tools in supporting KM through the visualisation of data. In this way, the authors highlight the nature and quality of the knowledge generated through this method for efficient KM in developing a competitive advantage.
Research limitations/implications: The research has important implications concerning the role of big data text analytics in KM, and specifically the nature and quality of knowledge produced using text analytics. The authors use text analytics to exemplify the value of big data in the context of KM and highlight how future studies could develop and extend these findings in different contexts.
Practical implications: Results contribute to understanding the role of big data text analytics as a means to enhance the effectiveness of KM. The paper provides important insights that can be applied to different business functions, from supply chain management to marketing management, to support KM through the use of big data text analytics.
Originality/value: The study demonstrates the practical application of big data tools for data visualisation and, with it, improving KM.
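As a rough illustration of the kind of keyword and topic clustering the authors perform on the 196 articles, the sketch below applies TF-IDF and k-means from scikit-learn to a few invented stand-in abstracts; the corpus and cluster count are illustrative assumptions, not the paper's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented stand-ins for the abstracts of KM journal articles.
abstracts = [
    "tacit knowledge sharing and organisational learning",
    "knowledge management practices in supply chains",
    "big data analytics for marketing decision support",
    "text analytics and data visualisation for managers",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for c, centre in enumerate(km.cluster_centers_):
    top = centre.argsort()[::-1][:3]  # highest-weight terms = the cluster's "keywords"
    print(f"cluster {c}:", [terms[i] for i in top])
```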
10

Kagan, Pavel. "Big data sets in construction". E3S Web of Conferences 110 (2019): 02007. http://dx.doi.org/10.1051/e3sconf/201911002007.

Abstract:
The paper studies the processing of large information data arrays (Big Data) in construction. It considers the applicability of the big data concept at various stages of the life cycle of buildings and structures, and proposes methods for converting data for further processing. The methods used in the analysis of big data allow working with unstructured data sets (data mining). An approach is considered in which the analysis of arbitrary data can be reduced to text analysis, similar to the analysis of ordinary text messages. It is important to isolate the non-obvious links present in the analysed data. An advantage of using big data is that it is not necessary to formulate hypotheses in advance for testing; hypotheses emerge during data analysis. Dependence analysis is the basic approach when working with big data. The concept of an automatic big data analysis system is proposed: text analysis algorithms should be used for data mining, and discriminant functions for the main problem to be solved (data classification).

Dissertations on the topic "Big text data"

1

Šoltýs, Matej. "Big Data v technológiách IBM". Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193914.

Abstract:
This diploma thesis presents Big Data technologies and their possible use cases and applications. The theoretical part first defines the term Big Data and then focuses on Big Data technology, particularly the Hadoop framework. It describes the principles of Hadoop, such as distributed storage and data processing, and its individual components, and presents the largest vendors of Big Data technologies. This part of the thesis closes with possible use cases of Big Data technologies and several case studies. The practical part describes the implementation of a demo example of Big Data technologies and is divided into two chapters. The first chapter covers the conceptual design of the demo example, the products used, and the architecture of the solution. The second chapter then describes the implementation of the demo example, from the preparation of the demo environment to the creation of the applications. The goals of this thesis are the description and characteristics of Big Data, a presentation of the largest vendors and their Big Data products, a description of possible use cases of Big Data technologies, and especially the implementation of a demo example in Big Data tools from IBM.
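The distributed data-processing principle of Hadoop that the thesis describes is classically illustrated by word count. Below is a minimal Hadoop-Streaming-style mapper and reducer in Python, a generic sketch rather than the thesis's IBM-specific implementation; Hadoop Streaming feeds both scripts through stdin/stdout and sorts the mapper output by key between the two phases.

```python
import sys

def mapper():
    """Emit 'word<TAB>1' for every word on stdin (the map phase)."""
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word.lower()}\t1")

def reducer():
    """Sum counts per word; input arrives grouped by key (the reduce phase)."""
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current and current is not None:
            print(f"{current}\t{count}")
            count = 0
        current = word
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":  # e.g. `python wordcount.py map` or `python wordcount.py reduce`
    mapper() if "map" in sys.argv[1:] else reducer()
```

In a cluster this would be submitted with something like `hadoop jar hadoop-streaming.jar -mapper ... -reducer ...`; the exact jar name and paths vary by distribution.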
2

Leis Machín, Angela. "Studying depression through big data analytics on Twitter". Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/671365.

Abstract:
Mental disorders have become a major concern in public health, since they are one of the main causes of the overall disease burden worldwide. Depressive disorders are the most common mental illnesses, and they constitute the leading cause of disability worldwide. Language is one of the main tools on which mental health professionals base their understanding of human beings and their feelings, as it provides essential information for diagnosing and monitoring patients suffering from mental disorders. In parallel, social media platforms such as Twitter allow us to observe the activity, thoughts and feelings of people's daily lives, including those of patients suffering from mental disorders such as depression. Based on the characteristics and linguistic features of tweets, it is possible to identify signs of depression among Twitter users. Moreover, the effect of antidepressant treatments can be linked to changes in the features of the tweets posted by depressive users. The analysis of this huge volume and diversity of data, the so-called "Big Data", can provide relevant information about the course of mental disorders and the treatments these patients are receiving, which allows us to detect, monitor and predict depressive disorders. This thesis presents different studies carried out on Twitter data in the Spanish language, with the aim of detecting behavioural and linguistic patterns associated with depression, which can constitute the basis of new and complementary tools for the diagnosis and follow-up of patients suffering from this disease.
3

Nhlabano, Valentine Velaphi. "Fast Data Analysis Methods for Social Media Data". Diss., University of Pretoria, 2018. http://hdl.handle.net/2263/72546.

Abstract:
The advent of Web 2.0 technologies, which support the creation and publishing of various social media content in a collaborative and participatory way by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide, in order to take advantage of this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally used to handling and analysing structured data from business transactions. The research reported in this dissertation investigates fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect and collect public data sets of Twitter data from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume, storing this data efficiently in the Hadoop File System (HDFS). Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open source software library that runs on low-cost commodity hardware and can store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost. A lexicon-based sentiment analysis approach was taken, using the AFINN-111 lexicon for scoring. The Twitter data was analysed from HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time.
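The lexicon-based scoring step is easy to show in isolation. The sketch below scores tweets with a tiny stand-in for the AFINN-111 lexicon, which in full assigns integer valences between -5 and +5 to roughly 2,500 terms; the five-entry lexicon and the sample tweets here are invented.

```python
# Tiny stand-in for AFINN-111 (the real file maps ~2,500 terms to -5..+5).
AFINN = {"love": 3, "good": 3, "happy": 3, "bad": -3, "terrible": -3}

def sentiment(tweet: str) -> int:
    """Sum the valence of every lexicon word found in the tweet."""
    return sum(AFINN.get(word, 0) for word in tweet.lower().split())

tweets = [
    "I love this phone, the camera is good",
    "terrible support and a bad battery",
]
for t in tweets:
    print(sentiment(t), t)  # positive total = positive sentiment
```

In the dissertation's setting, the same per-tweet scoring runs as the map step of a Java MapReduce job over tweets stored in HDFS.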
Dissertation (MSc)--University of Pretoria, 2019.
National Research Foundation (NRF) - Scarce skills
4

Bischof, Jonathan Michael. "Interpretable and Scalable Bayesian Models for Advertising and Text". Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11400.

Abstract:
In the era of "big data", scalable statistical inference is necessary to learn from new and growing sources of quantitative information. However, many commercial and scientific applications also require models to be interpretable to end users in order to generate actionable insights about quantities of interest. We present three case studies of Bayesian hierarchical models that improve the interpretability of existing models while also maintaining or improving the efficiency of inference. The first paper is an application to online advertising that presents an augmented regression model interpretable in terms of the amount of revenue a customer is expected to generate over his or her entire relationship with the company, even if complete histories are never observed. The resulting Poisson Process Regression employs a marginal inference strategy that avoids specifying customer-level latent variables used in previous work that complicate inference and interpretability. The second and third papers are applications to the analysis of text data that propose improved summaries of topic components discovered by these mixture models. While the current practice is to summarize topics in terms of their most frequent words, we show significantly greater interpretability in online experiments with human evaluators by using words that are also relatively exclusive to the topic of interest. In the process we develop a new class of topic models that directly regularize the differential usage of words across topics in order to produce stable estimates of the combined frequency-exclusivity metric as well as proposing efficient and parallelizable MCMC inference strategies.
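The combined frequency-exclusivity idea can be sketched numerically. One common formulation, popularized in later topic-modeling software as FREX, takes a weighted harmonic mean of a word's within-topic frequency rank and its cross-topic exclusivity rank; the sketch below applies it to an invented topic-word matrix and should be read as an illustration of the metric's shape, not the dissertation's exact estimator.

```python
import numpy as np

# Invented topic-word probability matrix: rows are topics, columns are words.
phi = np.array([[0.50, 0.30, 0.15, 0.05],
                [0.10, 0.10, 0.40, 0.40]])

def frex(phi, topic, weight=0.5):
    """Weighted harmonic mean of frequency and exclusivity ranks."""
    freq = phi[topic]                    # how often the topic uses each word
    excl = phi[topic] / phi.sum(axis=0)  # share of each word "owned" by the topic
    rank = lambda x: (np.argsort(np.argsort(x)) + 1) / len(x)  # empirical CDF in (0, 1]
    return 1.0 / (weight / rank(excl) + (1 - weight) / rank(freq))

print(frex(phi, topic=0))  # highest for words both frequent in and exclusive to topic 0
```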
5

Abrantes, Filipe André Catarino. "Processos e ferramentas de análise de Big Data: a análise de sentimento no twitter". Master's thesis, Instituto Superior de Economia e Gestão, 2017. http://hdl.handle.net/10400.5/15802.

Abstract:
Master's in Information Systems Management
Due to the exponential increase in data produced worldwide, it is crucial to find processes and tools for analysing this large volume of data (commonly called Big Data), especially unstructured data such as text. Companies today try to extract value from these data, much of which is generated by customers or potential customers and can give the companies a competitive advantage. The difficulty lies in how to analyse unstructured data, in particular data produced through digital networks, which are one of the biggest sources of information for organizations. This project frames the problem of structuring and analysing Big Data, presents the different approaches to solving it, and tests one of those approaches on a selected block of data. The sentiment analysis approach was chosen, using text mining techniques, the R language, and text shared on Twitter relating to four technology giants: Amazon, Apple, Google and Microsoft. The development and testing of the prototype built in this project shows that it is possible to perform sentiment analysis of tweets using R, extracting valuable information from large blocks of data.
6

Hill, Geoffrey. "Sensemaking in Big Data: Conceptual and Empirical Approaches to Actionable Knowledge Generation from Unstructured Text Streams". Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1433597354.

7

Chennen, Kirsley. "Maladies rares et 'Big Data': solutions bioinformatiques vers une analyse guidée par les connaissances: applications aux ciliopathies". Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAJ076/document.

Abstract:
Over the last decade, biomedical research and medical practice have been revolutionized by the post-genomic era and the emergence of Big Data in biology. Rare diseases, however, are characterized by scarcity at every level, from the number of patients to the domain knowledge. Nevertheless, rare diseases are of real interest, because the fundamental knowledge accumulated by studying them as models, and the therapeutic solutions derived from it, can also benefit more common disorders. This thesis focuses on the development of new bioinformatics solutions, integrating Big Data and knowledge-guided approaches to improve the study of rare diseases. In particular, my work resulted in (i) the creation of PubAthena, a literature-screening tool that recommends relevant new publications, and (ii) the development of VarScrut, a tool for the analysis of exome data that combines multi-level knowledge to improve the resolution rate.
8

Soen, Kelvin, and Bo Yin. "Customer Behaviour Analysis of E-commerce: What information can we get from customers' reviews through big data analysis". Thesis, KTH, Entreprenörskap och Innovation, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254194.

Abstract:
Online transactions have grown exponentially in the last decade, contributing up to 11% of total retail sales. One of the parameters of success in online transactions is online reviews, where customers have the chance to record their level of satisfaction with their purchase. This review system acts as bargaining power for customers, so that their suppliers pay more attention to their satisfaction, and as a benchmark for future prospective customers. This research digs into what actually causes customers to assign a high level of satisfaction in their online purchase experience: whether it is packaging, delivery time, or something else. It also examines customer behaviour related to online reviews from three different perspectives: gender, culture and economic structure. A data mining methodology is used to collect and analyse the data, providing a reliable quantitative study. The end result of this study is expected to assist marketing decisions aimed at capturing the types of consumers who base their purchasing decisions significantly on online reviews.
9

Lindén, Johannes. "Understand and Utilise Unformatted Text Documents by Natural Language Processing algorithms". Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-31043.

Abstract:
News companies need to automate and streamline the editorial process of writing about new and breaking events. Current technologies involve robotic programs that fill in values in templates, and website listeners that notify editors when changes are made so that they can read up on the source change at the actual website. Editors could provide news faster and better if they were directly provided with abstracts of the external sources. This study applies deep learning algorithms to automatically formulate abstracts and tag sources with appropriate tags based on the context. The study is a full-stack solution, which manages both the editors' need for speed and the training, testing and validation of the algorithms. Decision Tree, Random Forest, Multi-Layer Perceptron and phrase document vectors are used to evaluate the categorisation, and Recurrent Neural Networks are used to paraphrase unformatted texts. In the evaluation, models trained by the algorithms with a variation of parameters are compared on the F-score. The results show that the F-scores increase the more documents the training has and decrease the more categories the algorithm needs to consider. The Multi-Layer Perceptron performs best, followed by Random Forest and finally Decision Tree. Document length matters: when larger documents are considered during training, the score increases considerably. A user survey about the paraphrase algorithms shows the paraphrase results are insufficient to satisfy editors' needs, and confirms a need for more memory to conduct longer experiments.
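The model comparison the thesis reports, ranking Decision Tree, Random Forest and Multi-Layer Perceptron by F-score, can be reproduced in miniature with scikit-learn; the six news snippets (repeated so cross-validation has enough data) and the two labels below are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

texts = ["stock markets fell sharply today", "the team wins the championship final",
         "central bank raises interest rates", "the striker scores twice in the derby",
         "inflation hits a ten year high", "the coach praises the young goalkeeper"] * 5
labels = [0, 1, 0, 1, 0, 1] * 5  # 0 = finance, 1 = sports

X = TfidfVectorizer().fit_transform(texts)
models = [("Decision Tree", DecisionTreeClassifier(random_state=0)),
          ("Random Forest", RandomForestClassifier(random_state=0)),
          ("MLP", MLPClassifier(max_iter=1000, random_state=0))]
for name, clf in models:
    f1 = cross_val_score(clf, X, labels, cv=3, scoring="f1").mean()
    print(f"{name}: mean F1 = {f1:.2f}")
```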
10

Savalli, Antonino. "Tecniche analitiche per 'Open Data'". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17476/.

Abstract:
The last decade has made the concept of Open Government extremely popular: an open model of administration founded on the principles of transparency, participation and collaboration. In 2011 the project Dati.gov.it was launched, a portal that serves as the "national catalogue of metadata on the data released in open format by Italian public administrations". The goal of the thesis is to provide an effective tool to search, use and compare the information available on the Dati.gov.it portal, identifying similarities between datasets that can resolve and/or limit the heterogeneity of the data. The project spans three main areas of study: Open Data and metadata standards, record linkage, and data fusion. Specifically, seven functions were implemented in a single library. The search function queries the dati.gov.it portal. The ext function extracts information from seven source formats: csv, json, xml, xls, rdf, pdf and txt. The pre-process function performs data cleaning. The find_claims function is the heart of the project: it contains the text-mining algorithm that establishes relationships between datasets by identifying shared words of sufficient importance within the context. The header_linkage function finds similarity between the attribute names of two datasets, suggesting which attributes to concatenate. Analogously, record_linkage finds similarity between the attribute values of two datasets, suggesting which attributes to concatenate. Finally, the merge_keys function merges the results of header_linkage and record_linkage. Experimental results provided positive feedback on the main methods implemented as regards syntactic similarity between two datasets.
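The header_linkage idea, suggesting which attributes of two datasets to join by comparing their names, can be sketched with standard-library string similarity; the column names below are invented, and difflib's ratio is just one reasonable similarity choice, not necessarily the one the thesis uses.

```python
from difflib import SequenceMatcher

def header_linkage(cols_a, cols_b, threshold=0.6):
    """Return (a, b, score) pairs of likely-matching column names."""
    pairs = [(a, b, SequenceMatcher(None, a.lower(), b.lower()).ratio())
             for a in cols_a for b in cols_b]
    return sorted([(a, b, round(s, 2)) for a, b, s in pairs if s >= threshold],
                  key=lambda p: -p[2])

print(header_linkage(["comune", "anno", "popolazione"],
                     ["Comune", "Anno di riferimento", "Popolazione residente"]))
```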

Books on the topic "Big text data"

1

Jo, Taeho. Text Mining: Concepts, Implementation, and Big Data Challenge (Studies in Big Data). Springer, 2018.

2

Jo, Taeho. Text Mining: Concepts, Implementation, and Big Data Challenge. Springer, 2019.

3

Struhl, Steven. Practical Text Analytics: Interpreting Text and Unstructured Data for Business Intelligence. Kogan Page, 2016.

4

Practical Text Analytics: Interpreting Text and Unstructured Data for Business Intelligence. Kogan Page, 2015.

5

Zaydman, Mikhail. Tweeting About Mental Health: Big Data Text Analysis of Twitter for Public Policy. RAND Corporation, 2017. http://dx.doi.org/10.7249/rgsd391.

6

Deep Text: Using Text Analytics to Conquer Information Overload, Get Real Value from Social Media, and Add Bigger Text to Big Data. Information Today Inc, 2016.

7

Brayne, Sarah. Predict and Surveil. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780190684099.001.0001.

Abstract:
The scope of criminal justice surveillance, from policing to incarceration, has expanded rapidly in recent decades. At the same time, the use of big data has spread across a range of fields, including finance, politics, health, and marketing. While law enforcement’s use of big data is hotly contested, very little is known about how the police actually use it in daily operations and with what consequences. This book offers an inside look at how police use big data and new surveillance technologies, leveraging on-the-ground fieldwork with one of the most technologically advanced law enforcement agencies in the world—the Los Angeles Police Department. Drawing on original interviews and ethnographic observations from over two years of fieldwork with the LAPD, the text examines the causes and consequences of big data and algorithmic control. It reveals how the police use predictive analytics and new surveillance technologies to deploy resources, identify criminal suspects, and conduct investigations; how the adoption of big data analytics transforms police organizational practices; and how the police themselves respond to these new data-driven practices. While big data analytics has the potential to reduce bias, increase efficiency, and improve prediction accuracy, the book argues that it also reproduces and deepens existing patterns of inequality, threatens privacy, and challenges civil liberties.
8

Jockers, Matthew L. Theme. University of Illinois Press, 2017. http://dx.doi.org/10.5406/illinois/9780252037528.003.0008.

Abstract:
This chapter demonstrates how big data and computation can be used to identify and track recurrent themes as the products of external influence. It first considers the limitations of the Google Ngram Viewer as a tool for tracing thematic trends over time before turning to Douglas Biber's Corpus Linguistics: Investigating Language Structure and Use, a primer on various factors complicating word-focused text analysis and the subsequent conclusions one might draw regarding word meanings. It then discusses the results of the author's application of latent Dirichlet allocation (LDA) to a corpus of 3,346 nineteenth-century novels using the open-source MALLET (MAchine Learning for LanguagE Toolkit), a software package for topic modeling. It also explains the different types of analyses performed by the author, including text segmentation, word chunking, and author nationality, gender and time-themes relationship analyses. The thematic data from the LDA model reveal the degree to which author nationality, author gender, and date of publication could be predicted by the thematic signals expressed in the nineteenth-century novels corpus.
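Jockers ran LDA in MALLET over 3,346 novels; the same kind of theme extraction can be sketched at toy scale with scikit-learn's LatentDirichletAllocation, using six invented one-line "documents" as stand-ins for the corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the ship sailed the stormy sea", "the captain steered the ship home",
        "she inherited the family estate", "the marriage united two great estates",
        "waves broke over the wooden deck", "the heiress refused the proposal"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:4]  # the topic's most probable words = its "theme"
    print(f"theme {k}:", [words[i] for i in top])
```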
9

Morin, Jean-Frédéric, Christian Olsson and Ece Özlem Atikcan, eds. Research Methods in the Social Sciences: An A-Z of key concepts. Oxford University Press, 2021. http://dx.doi.org/10.1093/hepl/9780198850298.001.0001.

Abstract:
Research Methods in the Social Sciences features chapters that cover a wide range of concepts, methods, and theories. Each chapter begins with an introduction to a method, using real-world examples from a wide range of academic disciplines, before discussing the benefits and limitations of the approach, its current status in academic practice, and finally providing tips and advice on when and how to apply the method in research. The text covers both well-established concepts and emerging ideas, such as big data and network analysis, for qualitative and quantitative research methods.
10

Jockers, Matthew L. Revolution. University of Illinois Press, 2017. http://dx.doi.org/10.5406/illinois/9780252037528.003.0001.

Abstract:
This chapter discusses the enormous promise of computational approaches to the study of literature, with particular emphasis on digital humanities as an emerging field. By 2008 computers, with their capacity for number crunching and processing large-scale data sets, had revolutionized the way that scientific research is carried out. Now, the same elements that have had such an impact on the sciences are slowly and surely revolutionizing the way that research in the humanities gets done. This chapter considers the history of digital humanities, also known as humanities computing, community of practice, or field of study/theory/methodology, and how revolution in this emerging field is being catalyzed by big data. It also emphasizes the potential of literary computing and cites the existence of digital libraries and large electronic text collections as factors that are sparking the digital humanities revolution.

Book chapters on the topic "Big text data"

1

Ye, Zhonglin, Haixing Zhao, Ke Zhang, Yu Zhu and Yuzhi Xiao. "Text-Associated Max-Margin DeepWalk". In Big Data, 301–21. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-2922-7_21.

2

Jo, Taeho. "Text Summarization". In Studies in Big Data, 271–94. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_13.

3

Jo, Taeho. "Text Segmentation". In Studies in Big Data, 295–317. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_14.

4

Jo, Taeho. "Text Indexing". In Studies in Big Data, 19–40. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_2.

5

Jo, Taeho. "Text Encoding". In Studies in Big Data, 41–58. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_3.

6

Jo, Taeho. "Text Association". In Studies in Big Data, 59–75. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_4.

7

Aswin, T. S., Rahul Ignatius and Mathangi Ramachandran. "Integration of Text Classification Model with Speech to Text System". In Big Data Analytics, 103–12. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-72413-3_7.

8

Jo, Taeho. "Text Clustering: Approaches". In Studies in Big Data, 203–24. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_10.

9

Jo, Taeho. "Text Clustering: Implementation". In Studies in Big Data, 225–47. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_11.

10

Jo, Taeho. "Text Clustering: Evaluation". In Studies in Big Data, 249–68. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_12.


Conference papers on the topic "Big text data"

1

Lee, Song-Eun, Kang-Min Kim, Woo-Jong Ryu, Jemin Park and SangKeun Lee. "From Text Classification to Keyphrase Extraction for Short Text". In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019. http://dx.doi.org/10.1109/bigdata47090.2019.9006409.

2

Buchler, Marco, Greta Franzini, Emily Franzini and Maria Moritz. "Scaling historical text re-use". In 2014 IEEE International Conference on Big Data (Big Data). IEEE, 2014. http://dx.doi.org/10.1109/bigdata.2014.7004449.

3

Blanke, Tobias, and Jon Wilson. "Identifying epochs in text archives". In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8258172.

4

Richardet, Renaud, Jean-Cedric Chappelier, Shreejoy Tripathy and Sean Hill. "Agile text mining with Sherlok". In 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015. http://dx.doi.org/10.1109/bigdata.2015.7363910.

5

Vandierendonck, Hans, Karen Murphy, Mahwish Arif and Dimitrios S. Nikolopoulos. "HPTA: High-performance text analytics". In 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016. http://dx.doi.org/10.1109/bigdata.2016.7840632.

6

Ge, Lihao, and Teng-Sheng Moh. "Improving text classification with word embedding". In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8258123.

7

Lulli, Alessandro, Thibault Debatty, Matteo Dell'Amico, Pietro Michiardi and Laura Ricci. "Scalable k-NN based text clustering". In 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015. http://dx.doi.org/10.1109/bigdata.2015.7363845.

8

Song, Xiaoli, XiaoTong Wang and Xiaohua Hu. "Semantic pattern mining for text mining". In 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016. http://dx.doi.org/10.1109/bigdata.2016.7840600.

9

Bingmann, Timo, Simon Gog and Florian Kurpicz. "Scalable Construction of Text Indexes with Thrill". In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018. http://dx.doi.org/10.1109/bigdata.2018.8622171.

10

Alzhrani, Khudran, Ethan M. Rudd, C. Edward Chow and Terrance E. Boult. "Automated big security text pruning and classification". In 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016. http://dx.doi.org/10.1109/bigdata.2016.7841028.


Organization reports on the topic "Big text data"

1

Currie, Janet, Henrik Kleven and Esmée Zwiers. Technology and Big Data Are Changing Economics: Mining Text to Track Methods. Cambridge, MA: National Bureau of Economic Research, January 2020. http://dx.doi.org/10.3386/w26715.

2

Doucet, Rachel A., Deyan M. Dontchev, Javon S. Burden and Thomas L. Skoff. Big Data Analytics Test Bed. Fort Belvoir, VA: Defense Technical Information Center, September 2013. http://dx.doi.org/10.21236/ada589903.

3

Cerdeira, Pablo, Marcus Mentzingen de Mendonça and Urszula Gabriela Lagowska. Políticas públicas orientadas por dados: Os caminhos possíveis para governos locais. Edited by Mauricio Bouskela, Marcelo Facchina and Hallel Elnir. Inter-American Development Bank, October 2020. http://dx.doi.org/10.18235/0002727.

Abstract:
This discussion paper covers preliminary studies from the project "Big Data for Sustainable Urban Development", conducted by Fundação Getulio Vargas (FGV) in partnership with the IDB, with the cities of Miraflores (Peru), Montevideo (Uruguay), Quito (Ecuador), São Paulo (Brazil) and Xalapa (Mexico), and with the support of the Waze app. The project is part of Regional Technical Cooperation RG-T3095, financed by the IDB through the Regional Public Goods program and executed by FGV. At the IDB the study was coordinated by the Housing and Urban Development Division; at FGV, by the Center for Technology and Development (CTD), in partnership with the Center for Politics and Economics of the Public Sector (CEPESP, institutional aspects), the Rio de Janeiro Law School (FGV Direito Rio, regulatory aspects) and the School of Applied Mathematics (FGV EMAp, data science).
4

de Caritat, Patrice, Brent McInnes und Stephen Rowins. Towards a heavy mineral map of the Australian continent: a feasibility study. Geoscience Australia, 2020. http://dx.doi.org/10.11636/record.2020.031.

Abstract:
Heavy minerals (HMs) are minerals with a specific gravity greater than 2.9 g/cm3. They are commonly highly resistant to physical and chemical weathering, and therefore persist in sediments as lasting indicators of the (former) presence of the rocks they formed in. The presence/absence of certain HMs, their associations with other HMs, their concentration levels, and the geochemical patterns they form in maps or 3D models can be indicative of geological processes that contributed to their formation. Furthermore, trace element and isotopic analyses of HMs have been used to vector to mineralisation or constrain the timing of geological processes. The positive role of HMs in mineral exploration is well established in other countries, but comparatively little understood in Australia. Here we present the results of a pilot project that was designed to establish, test and assess a workflow to produce a HM map (or atlas of maps) and dataset for Australia. This would represent a critical step in the ability to detect anomalous HM patterns, as it would establish the background HM characteristics (i.e., unrelated to mineralisation). Further, the extremely rich dataset produced would be a valuable input into any future machine learning/big data-based prospectivity analysis. The pilot project consisted of selecting ten sites from the National Geochemical Survey of Australia (NGSA) and separating and analysing the HM contents from the 75-430 µm grain-size fraction of the top (0-10 cm depth) sediment samples. A workflow was established and tested based on the density separation of the HM-rich phase by combining a shake table and the use of dense liquids. The automated mineralogy quantification was performed on a TESCAN® Integrated Mineral Analyser (TIMA) that identified and mapped thousands of grains in a matter of minutes for each sample. The results indicated that: (1) the NGSA samples are appropriate for HM analysis; (2) over 40 HMs were effectively identified and quantified using TIMA automated quantitative mineralogy; (3) the resultant HM mineralogy is consistent with the samples' bulk geochemistry and regional geological setting; and (4) the HM makeup of the NGSA samples varied across the country, as shown by the mineral mounts and preliminary maps. Based on these observations, HM mapping of the continent using NGSA samples will likely result in coherent and interpretable geological patterns relating to bedrock lithology, metamorphic grade, degree of alteration and mineralisation. It could assist in geological investigations, especially where outcrop is minimal, challenging to correctly attribute due to extensive weathering, or simply difficult to access. It is believed that a continental-scale HM atlas for Australia could assist in de-risking mineral exploration and lead to investment, e.g., via tenement uptake, exploration, discovery and ultimately exploitation. As some HMs are hosts for technology-critical elements such as rare earth elements, their systematic and internally consistent quantification and mapping could lead to resource discovery essential for a more sustainable, lower-carbon economy.
5

Holland, Darren, and Nazmina Mahmoudzadeh. Foodborne Disease Estimates for the United Kingdom in 2018. Food Standards Agency, January 2020. http://dx.doi.org/10.46756/sci.fsa.squ824.

Abstract:
In February 2020 the FSA published two reports which produced new estimates of foodborne norovirus cases. These were the 'Norovirus Attribution Study' (NoVAS study) (O'Brien et al., 2020) and the accompanying internal FSA technical review 'Technical Report: Review of Quantitative Risk Assessment of foodborne norovirus transmission' (NoVAS model review) (Food Standards Agency, 2020). The NoVAS study produced a Quantitative Microbiological Risk Assessment model (QMRA) to estimate foodborne norovirus. The NoVAS model review considered the impact of using alternative assumptions and other data sources on these estimates. From these two pieces of work, a revised estimate of foodborne norovirus was produced. The FSA has therefore updated its estimates of annual foodborne disease to include these new results and also to take account of more recent data related to other pathogens. The estimates produced include:
• Estimates of GP presentations and hospital admissions for foodborne norovirus based on the new estimates of cases (the NoVAS study only produced estimates for cases)
• Estimates of foodborne cases, GP presentations and hospital admissions for 12 other pathogens
• Estimates of unattributed cases of foodborne disease
• Estimates of total foodborne disease from all pathogens
Previous estimates: An FSA funded research project, 'The second study of infectious intestinal disease in the community', published in 2012 and referred to as the IID2 study (Tam et al., 2012), estimated that there were 17 million cases of infectious intestinal disease (IID) in 2009. These include illness caused by all sources, not just food. Of these 17 million cases, around 40% (around 7 million) could be attributed to 13 known pathogens. These pathogens included norovirus. The remaining 60% of cases (equivalent to 10 million cases) were unattributed cases. These are cases where the causal pathogen is unknown. Reasons for this include: the causal pathogen was not tested for, the test was not sensitive enough to detect the causal pathogen, or the pathogen is unknown to science. A second project, 'Costed extension to the second study of infectious intestinal disease in the community', published in 2014 and known as the IID2 extension (Tam, Larose and O'Brien, 2014), estimated that there were 566,000 cases of foodborne disease per year caused by the same 13 known pathogens. Although a proportion of the unattributed cases would also be due to food, no estimate was provided for this in the IID2 extension.
New estimates: We estimate that there were 2.4 million cases of foodborne disease in the UK in 2018 (95% credible interval 1.8 million to 3.1 million), with 222,000 GP presentations (95% Cred. Int. 150,000 to 322,000) and 16,400 hospital admissions (95% Cred. Int. 11,200 to 26,000). Of the estimated 2.4 million cases, 0.9 million (95% Cred. Int. 0.7 million to 1.2 million) were from the 13 known pathogens included in the IID2 extension and 1.4 million (95% Cred. Int. 1.0 million to 2.0 million) were unattributed cases. Norovirus was the pathogen with the largest estimate, with 383,000 cases a year. However, this estimate is within the 95% credible interval for Campylobacter of 127,000 to 571,000. The pathogen with the next highest number of cases was Clostridium perfringens with 85,000 (95% Cred. Int. 32,000 to 225,000). While the methodology used in the NoVAS study does not lend itself to producing credible intervals for cases of norovirus, this does not mean that there is no uncertainty in these estimates. There were a number of parameters used in the NoVAS study which, while based on the best science currently available, were acknowledged to have uncertain values. Sensitivity analysis undertaken as part of the study showed that changes to the values of these parameters could make big differences to the overall estimates. Campylobacter was estimated to have the most GP presentations with 43,000 (95% Cred. Int. 19,000 to 76,000), followed by norovirus with 17,000 (95% Cred. Int. 11,000 to 26,000) and Clostridium perfringens with 13,000 (95% Cred. Int. 6,000 to 29,000). For hospital admissions, Campylobacter was estimated to have 3,500 (95% Cred. Int. 1,400 to 7,600), followed by norovirus with 2,200 (95% Cred. Int. 1,500 to 3,100) and Salmonella with 2,100 admissions (95% Cred. Int. 400 to 9,900). As many of these credible intervals overlap, any ranking needs to be undertaken with caution. While the estimates provided in this report are for 2018, the methodology described can be applied to future years.
6

Transfer of Air Force technical procurement bid set data to small businesses, using CALS and EDI: Test report. Office of Scientific and Technical Information (OSTI), August 1994. http://dx.doi.org/10.2172/46712.
