Dissertations / Theses on the topic 'Big text data'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Big text data.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Šoltýs, Matej. "Big Data v technológiách IBM." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193914.
Leis Machín, Angela. "Studying depression through big data analytics on Twitter." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/671365.
Nhlabano, Valentine Velaphi. "Fast Data Analysis Methods For Social Media Data." Diss., University of Pretoria, 2018. http://hdl.handle.net/2263/72546.
Dissertation (MSc)--University of Pretoria, 2019.
Bischof, Jonathan Michael. "Interpretable and Scalable Bayesian Models for Advertising and Text." Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11400.
Abrantes, Filipe André Catarino. "Processos e ferramentas de análise de Big Data : a análise de sentimento no twitter." Master's thesis, Instituto Superior de Economia e Gestão, 2017. http://hdl.handle.net/10400.5/15802.
Given the exponential increase of data produced worldwide, it has become crucial to find processes and tools for analysing this large volume of data (commonly called Big Data), especially unstructured data such as text. Companies today try to extract value from these data, much of them generated by customers or potential customers, which can give them a competitive advantage. The difficulty lies in how to analyse unstructured data, in particular data produced through digital networks, which are one of the main sources of information for organisations. This work frames the problem of structuring and analysing Big Data, presents the different approaches to solving it, and tests one of these approaches on a selected block of data. The sentiment analysis approach was chosen, using text mining techniques, the R language, and text shared on Twitter about four technology giants: Amazon, Apple, Google and Microsoft. The development and testing of the prototype built in this project show that it is possible to perform sentiment analysis of tweets using R, extracting valuable information from large blocks of data.
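The lexicon-based scoring behind this kind of tweet sentiment analysis can be sketched in a few lines. The fragment below is a minimal Python illustration; the five-word lexicons and sample tweets are invented examples, not the dictionaries or data used in the thesis (which worked in R).

```python
# Minimal lexicon-based sentiment scorer: count positive minus negative
# lexicon hits per tweet. The five-word lexicons are invented examples.
import re

POSITIVE = {"great", "love", "good", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "slow", "broken"}

def sentiment_score(tweet):
    """Return (#positive words - #negative words) for one tweet."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "Love the new Apple keynote, great products",
    "Microsoft support was terrible and slow",
]
scores = [sentiment_score(t) for t in tweets]  # [2, -2]
```

A positive score marks a tweet as positive, a negative one as negative; real systems add negation handling and weighted lexicons on top of this skeleton.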
Hill, Geoffrey. "Sensemaking in Big Data: Conceptual and Empirical Approaches to Actionable Knowledge Generation from Unstructured Text Streams." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1433597354.
Chennen, Kirsley. "Maladies rares et "Big Data" : solutions bioinformatiques vers une analyse guidée par les connaissances : applications aux ciliopathies." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAJ076/document.
Over the last decade, biomedical research and medical practice have been revolutionized by the post-genomic era and the emergence of Big Data in biology. The field of rare diseases is characterized by scarcity, from the number of patients to the domain knowledge. Nevertheless, rare diseases represent a real interest, as the fundamental knowledge accumulated and the therapeutic solutions developed can also benefit more common disorders. This thesis focuses on the development of new bioinformatics solutions, integrating Big Data and Big Data-associated approaches to improve the study of rare diseases. In particular, my work resulted in (i) the creation of PubAthena, a tool for the recommendation of relevant literature updates, and (ii) the development of VarScrut, a tool for the analysis of exome datasets that combines multi-level knowledge to improve the resolution rate.
Soen, Kelvin, and Bo Yin. "Customer Behaviour Analysis of E-commerce : What information can we get from customers' reviews through big data analysis." Thesis, KTH, Entreprenörskap och Innovation, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254194.
Lindén, Johannes. "Understand and Utilise Unformatted Text Documents by Natural Language Processing algorithms." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-31043.
Savalli, Antonino. "Tecniche analitiche per “Open Data”." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17476/.
Yu, Shuren. "How to Leverage Text Data in a Decision Support System? : A Solution Based on Machine Learning and Qualitative Analysis Methods." Thesis, Umeå universitet, Institutionen för informatik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163899.
Alshaer, Mohammad. "An Efficient Framework for Processing and Analyzing Unstructured Text to Discover Delivery Delay and Optimization of Route Planning in Realtime." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1105/document.
The Internet of Things (IoT) is leading to a paradigm shift within the logistics industry, and its advent has been changing the logistics service management ecosystem. Logistics service providers today use sensor technologies such as GPS or telemetry to collect data in real time while the delivery is in progress. Real-time data collection enables service providers to track and manage their shipment process efficiently; its key advantage is that it allows providers to act proactively to prevent outcomes such as delivery delay caused by unexpected or unknown events. Furthermore, providers today tend to use data stemming from external sources such as Twitter, Facebook, and Waze, because these sources provide critical information about events such as traffic, accidents, and natural disasters. Data from such external sources enrich the dataset and add value to the analysis, and collecting them in real time provides an opportunity to use the data for on-the-fly analysis and to prevent unexpected outcomes (such as delivery delay) at run time. However, data are collected raw and need to be processed for effective analysis. Collecting and processing data in real time is an enormous challenge, mainly because data stem from heterogeneous sources at very high speed. The high speed and variety of data create challenges for complex processing operations such as cleansing, filtering, and handling incorrect data. The variety of data (structured, semi-structured, and unstructured) raises challenges in processing data both in batch style and in real time, since different types of data may require different processing techniques. A technical framework that enables the processing of such heterogeneous data is heavily challenging and not currently available.
In addition, performing data processing operations in real time is heavily challenging; efficient techniques are required to carry out the operations on high-speed data, which cannot be done using conventional logistics information systems. Therefore, in order to exploit Big Data in logistics service processes, an efficient solution for collecting and processing data in both real-time and batch style is critically important. In this thesis, we developed and experimented with two data processing solutions: SANA and IBRIDIA. SANA is built on a Multinomial Naïve Bayes classifier, whereas IBRIDIA relies on Johnson's hierarchical clustering (HCL) algorithm, a hybrid technology that enables data collection and processing in batch style and in real time. SANA is a service-based solution which deals with unstructured data. It serves as a multi-purpose system to extract relevant events, including the context of the event (such as place, location, time, etc.), and it can also be used to perform text analysis over the targeted events. IBRIDIA was designed to process unknown data stemming from external sources and cluster them on the fly in order to gain knowledge and understanding of the data, which assists in extracting events that may lead to delivery delay. According to our experiments, both approaches show a unique ability to process logistics data. However, SANA proved more promising, since its underlying technology (the Naïve Bayes classifier) outperformed IBRIDIA in our performance measurements. SANA generates graph knowledge from the events collected immediately in real time, without any need to wait, thus drawing maximum benefit from these events, whereas IBRIDIA remains valuable within the logistics domain for identifying the most influential categories of events that affect delivery.
Unfortunately, IBRIDIA must wait for a minimum number of events to arrive and always faces a cold start. Because we are interested in re-optimizing the route on the fly, we adopted SANA as our data processing framework.
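The Multinomial Naïve Bayes classification underlying SANA can be illustrated with a small from-scratch sketch. The labelled snippets about traffic events below are invented stand-ins for the thesis's logistics feeds, not its actual training data.

```python
# Toy multinomial Naive Bayes text classifier, the model family behind
# SANA; the labelled training snippets are invented examples.
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: iterable of (label, text) pairs."""
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for label, text in docs:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(model, text):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total)   # log prior
        n = sum(word_counts[label].values())
        for w in text.lower().split():
            # Laplace smoothing over the shared vocabulary
            lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([
    ("delay", "accident on highway traffic jam"),
    ("delay", "road closed heavy traffic"),
    ("normal", "clear roads on time delivery"),
])
```

With this toy training set, `predict(model, "traffic accident ahead")` returns `"delay"`: the unseen word "ahead" is smoothed away and the known delay vocabulary dominates the log-likelihood.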
Musil, David. "Algoritmus pro detekci pozitívního a negatívního textu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-242026.
Cancellieri, Andrea. "Analisi di tecniche per l'estrazione di informazioni da documenti testuali e non strutturati." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/7773/.
Canducci, Marco. "Previsioni di borsa mediante analisi di dati testuali: studio ed estensione di un metodo basato su Google Trends." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017.
Risch, Jean-Charles. "Enrichissement des Modèles de Classification de Textes Représentés par des Concepts." Thesis, Reims, 2017. http://www.theses.fr/2017REIMS012/document.
Most text-classification methods use the "bag of words" paradigm to represent texts. However, Bloehdorn and Hotho have identified four limits of this representation: (1) some words are polysemous, (2) others can be synonyms and yet be differentiated in the analysis, (3) some words are strongly semantically linked without this being taken into account in the representation, and (4) certain words lose their meaning if they are extracted from their nominal group. To overcome these problems, some methods no longer represent texts with words but with concepts extracted from a domain ontology (bag of concepts), integrating the notion of meaning into the model. Models based on the bag of concepts remain less used because of their unsatisfactory results, so several methods have been proposed to enrich text features with new concepts extracted from knowledge bases. My work follows these approaches by proposing a model-enrichment step using a domain ontology; I proposed two measures to estimate how strongly these new concepts belong to each category. Using the naive Bayes classifier algorithm, I tested and compared my contributions on the Ohsumed corpus using the "Disease Ontology" domain ontology. The satisfactory results led me to analyse more precisely the role of semantic relations in the enrichment step. This further work was the subject of a second experiment, in which we evaluate the contribution of the hierarchical relations of hypernymy and hyponymy.
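The bag-of-concepts enrichment step can be pictured as follows: terms already mapped to ontology concepts are extended with their hypernyms (parent concepts). The tiny ontology below is a made-up stand-in for the Disease Ontology, and the traversal depth is an assumed parameter.

```python
# Sketch of bag-of-concepts enrichment: extend each concept in the bag
# with its hypernyms. The tiny ontology is a made-up stand-in for the
# Disease Ontology used in the thesis.
HYPERNYM = {  # concept -> parent concept
    "influenza": "viral infection",
    "viral infection": "infectious disease",
    "diabetes": "metabolic disease",
}

def enrich(concepts, levels=2):
    """Add up to `levels` ancestors of each concept to the bag."""
    bag = set(concepts)
    for c in concepts:
        for _ in range(levels):
            c = HYPERNYM.get(c)
            if c is None:
                break
            bag.add(c)
    return bag

bag = enrich({"influenza"})  # adds "viral infection" and "infectious disease"
```

The enriched bag then feeds the classifier in place of raw word features; how strongly each added ancestor should count toward a category is exactly what the thesis's two membership measures estimate.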
Gerrish, Charlotte. "European Copyright Law and the Text and Data Mining Exceptions and Limitations : With a focus on the DSM Directive, is the EU Approach a Hindrance or Facilitator to Innovation in the Region?" Thesis, Uppsala universitet, Juridiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385195.
Mariaux, Sébastien. "Les organisations de l'économie sociale et solidaire face aux enjeux écologiques : stratégies de communication et d'action environnementale." Electronic Thesis or Diss., Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0463.
The protection of the natural environment is a key issue for the future of humanity. The social and solidarity economy (SSE), which shares the principles of sustainable development, is particularly well suited to implement more environmentally friendly development alternatives. The purpose of this research is to examine the factors and modalities of environmental action in this heterogeneous economy. The thesis looks at SSE organisations from the perspective of organisational identity and focuses on environmental communication on the one hand, and concrete actions on the other. The study of environmental communication draws on the social network Twitter; it is based on a program coded in Python and on automatic text-mining techniques, and it highlights several rhetorical strategies. A second study deals with seven cases, based on semi-directive interviews, and sheds light on the role of individual commitment as well as collective logic in environmental action. This work makes a methodological contribution by developing the approach of automatic text mining, which is rarely used in management sciences. On the theoretical level, the thesis introduces the collective dimension as an element of the organisational identity of SSE organisations. We then adapt an environmental action model by identifying an additional determinant specific to these organizations. Finally, the research invites the SSE to put ecological issues back at the centre and offers suggestions for supporting organisations in their efforts to protect the environment.
Francia, Matteo. "Progettazione di un sistema di Social Intelligence e Sentiment Analysis per un'azienda del settore consumer goods." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/3850/.
Doucet, Rachel A., Deyan M. Dontchev, Javon S. Burden, and Thomas L. Skoff. "Big data analytics test bed." Thesis, Monterey, California: Naval Postgraduate School, 2013. http://hdl.handle.net/10945/37615.
The proliferation of big data has significantly expanded the quantity and breadth of information throughout the DoD. The task of processing and analyzing this data has become difficult, if not infeasible, using traditional relational databases. The Navy has a growing priority for information processing, exploitation, and dissemination, which makes use of the vast network of sensors that produce a large amount of big data. This capstone report explores the feasibility of a scalable Tactical Cloud architecture that will harness and utilize open-source tools for big data analytics. A virtualized cloud environment was built and analyzed at the Naval Postgraduate School, offering a test bed suitable for studying novel variations of these architectures. The technologies used to implement the test bed demonstrate a sustainable methodology for rapidly configuring and deploying virtualized machines and provide an environment for performance benchmarking and testing. The capstone findings indicate the strategies and best practices to automate the deployment, provisioning, and management of big data clusters. The functionality we seek to support serves a far more general goal: finding open-source tools that help deploy and configure large clusters for on-demand big data analytics.
Lucchi, Giulia. "Applicazione web per visualizzare e gestire dati estratti da Twitter." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12555/.
Grepl, Filip. "Aplikace pro řízení paralelního zpracování dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445490.
Jing, Liping, and 景麗萍. "Text subspace clustering with feature weighting and ontologies." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B39332834.
O'Sullivan, Jack William. "Biostatistical and meta-research approaches to assess diagnostic test use." Thesis, University of Oxford, 2018. http://ora.ox.ac.uk/objects/uuid:1419df96-1534-4cfe-b686-cde554ff7345.
陳我智 and Ngor-chi Chan. "Text-to-speech conversion for Putonghua." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1990. http://hub.hku.hk/bib/B31209580.
Cardinal, Robert W. "DATA REDUCTION AND PROCESSING SYSTEM FOR FLIGHT TEST OF NEXT GENERATION BOEING AIRPLANES." International Foundation for Telemetering, 1993. http://hdl.handle.net/10150/608878.
This paper describes the recently developed Loral Instrumentation ground-based equipment used to select and process post-flight test data from the Boeing 777 airplane as it is played back from a digital tape recorder (e.g., the Ampex DCRSi II) at very high speeds. Gigabytes (GB) of data, stored on recorder cassettes in the Boeing 777 during flight testing, are played back on the ground at a 15-30 MB/sec rate into ten multiplexed Loral Instrumentation System 500 Model 550s for high-speed decoding, processing, time correlation, and subsequent storage or distribution. The ten Loral 550s are multiplexed for independent data path processing from ten separate tape sources simultaneously. This system features a parallel multiplexed configuration that allows Boeing to perform critical 777 flight test processing at unprecedented speeds. Boeing calls this system the Parallel Multiplexed Processing Data (PMPD) System. The key advantage of the ground station's design is that Boeing engineers can add their own application-specific control and setup software. The Loral 550 VMEbus allows Boeing to add VME modules when needed, ensuring system growth with the addition of other LI-developed products, Boeing-developed products or purchased VME modules. With hundreds of third-party VME modules available, system expansion is unlimited. The final system has the capability to input data at 15 MB/sec. The present aggregate throughput capability of all ten 24-bit Decoders is 150 MB/sec from ten separate tape sources. A 24-bit Decoder was designed to support the 30 MB/sec DCRSi III so that the system can eventually support a total aggregate throughput of 300 MB/sec. Clearly, such high-speed data selection, rejection, and processing will significantly accelerate flight certification and production testing of today's state-of-the-art aircraft. This system was supplied with low-level software interfaces so that the customer could develop their own application-specific code and displays.
The Loral 550 lends itself to this kind of application due to its VME chassis, VxWorks operating system and the modularity of the software.
Hon, Wing-kai, and 韓永楷. "On the construction and application of compressed text indexes." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31059739.
Lam, Yan-ki Jacky. "Developmental normative data for the random gap detection test." Click to view the E-thesis via HKU Scholars Hub, 2005. http://lookup.lib.hku.hk/lookup/bib/B38279289.
Full text"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2005." Also available in print.
Lee, Wai-ming, and 李慧明. "Correlation of PCPT and SPT data from a shallow marine site investigation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B44570077.
Ho, Yuen-ying, and 何婉瑩. "The effect of introducing a computer software in enhancing comprehension of classical Chinese text." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1995. http://hub.hku.hk/bib/B31957869.
Franco, Davide. "The Borexino experiment: test of the purification systems and data analysis in the counting test facility." [S.l.]: [s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=974442968.
Yang, Wenwei, and 楊文衛. "Development and application of automatic monitoring system for standard penetration test in site investigation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B36811919.
Smedley, Mark, and Gary Simpson. "SHOCK & VIBRATION TESTING OF AN AIRBORNE INSTRUMENTATION DIGITAL RECORDER." International Foundation for Telemetering, 2000. http://hdl.handle.net/10150/606747.
Shock and vibration testing was performed on the Metrum-Datatape Inc. 32HE recorder to determine its viability as an airborne instrumentation recorder. A secondary goal of the testing was to characterize the recorder's operational shock and vibration envelope. Both flight testing and laboratory environmental testing of the recorder were performed to make these determinations. This paper addresses the laboratory portion of the shock and vibration testing and covers the test methodology and rationale, test set-up, results, challenges, and lessons learned.
Wong, Ping-wai, and 黃炳蔚. "Semantic annotation of Chinese texts with message structures based on HowNet." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B38212389.
Kozák, David. "Indexace rozsáhlých textových dat a vyhledávání v zaindexovaných datech." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417263.
Stolz, Carsten Dirk. "Erfolgsmessung informationsorientierter Websites." kostenfrei, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?idn=989985180.
Mittermayer, Marc-André. "Einsatz von Text Mining zur Prognose kurzfristiger Trends von Aktienkursen nach der Publikation von Unternehmensnachrichten." Berlin: dissertation.de, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2871284&prov=M&dok_var=1&dok_ext=htm.
Moyse, Gilles. "Résumés linguistiques de données numériques : interprétabilité et périodicité de séries." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066526/document.
Our research is in the field of fuzzy linguistic summaries (FLS), which allow the generation of natural language sentences describing very large amounts of numerical data, providing concise and intelligible views of these data. We first focus on the interpretability of FLS, crucial for providing end-users with easily understandable text but hard to achieve due to its linguistic form. Beyond existing works on that topic, based on the basic components of FLS, we propose a general approach to the interpretability of summaries, considering them globally as groups of sentences. We focus more specifically on their consistency. In order to guarantee it in the framework of standard fuzzy logic, we introduce a new model of oppositions between increasingly complex sentences. The model allows us to show that these consistency properties can be satisfied by selecting a specific negation approach. Moreover, based on this model, we design a 4-dimensional cube displaying all the possible oppositions between sentences in an FLS and show that it generalises several existing logical opposition structures. We then consider the case of data in the form of numerical series and focus on linguistic summaries about their periodicity: the sentences we propose indicate the extent to which the series are periodic and offer an appropriate linguistic expression of their periods. The proposed extraction method, called DPE for Detection of Periodic Events, splits the data in an adaptive manner and without any prior information, using tools from mathematical morphology. The segments are then exploited to compute the period and the periodicity, measuring the quality of the estimation and the extent to which the series is periodic. Lastly, DPE returns descriptive sentences of the form "Approximately every 2 hours, the customer arrival is important". Experiments with artificial and real data show the relevance of the proposed DPE method.
From an algorithmic point of view, we propose an incremental and efficient implementation of DPE, based on established update formulas. This implementation makes DPE scalable and allows it to process real-time streams of data. We also present an extension of DPE based on the local periodicity concept, allowing the identification of locally periodic subsequences in a numerical series, using an original statistical test. The method, validated on artificial and real data, returns natural language sentences that extract information of the form "Every two weeks during the first semester of the year, sales are high".
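For intuition, the period of a numerical series can be recovered without prior information by a simple autocorrelation sketch. This is a hypothetical illustration only: DPE itself relies on mathematical morphology rather than autocorrelation, and the sample series is invented.

```python
# Illustrative period estimation for a numerical series. DPE uses
# mathematical morphology; this autocorrelation sketch only conveys the
# idea of recovering a period without prior information.
def estimate_period(series, min_lag=2):
    """Return the lag with the strongest positive autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]

    def autocorr(lag):
        return sum(centered[i] * centered[i + lag] for i in range(n - lag))

    return max(range(min_lag, n // 2), key=autocorr)

# A noiseless series repeating every 5 samples.
series = [0, 1, 2, 1, 0] * 6
period = estimate_period(series)  # period == 5
```

A full summariser would then measure how well the series matches this period and verbalise both, along the lines of "approximately every 5 samples, the value is high".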
O’Donnell, John. "SOME PRACTICAL CONSIDERATIONS IN THE USE OF PSEUDO-RANDOM SEQUENCES FOR TESTING THE EOS AM-1 RECEIVER." International Foundation for Telemetering, 1998. http://hdl.handle.net/10150/609651.
There are well-known advantages in using pseudo-random sequences for testing of data communication links. The sequences, also called pseudo-noise (PN) sequences, approximate random data very well, especially for sequences thousands of bits long. They are easy to generate and are widely used for bit error rate testing because it is easy to synchronize a slave pattern generator to a received PN stream for bit-by-bit comparison. There are other aspects of PN sequences, however, that are not as widely known or applied. This paper points out how some of the less familiar characteristics of PN sequences can be put to practical use in the design of a Digital Test Set and other special-built test equipment used for checkout of the EOS AM-1 Space Data Receiver. The paper also shows how knowledge of these PN sequence characteristics can simplify troubleshooting the digital sections in the Space Data Receiver. Finally, the paper addresses the sufficiency of PN data testing in characterizing the performance of a receiver/data recovery system.
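A maximal-length PN sequence of the kind used for bit error rate testing is typically generated with a linear-feedback shift register (LFSR). The 7-stage register below is a generic textbook configuration, not necessarily the polynomial or length used for EOS AM-1 testing.

```python
# One period of a maximal-length pseudo-noise (PN) sequence from a 7-stage
# Fibonacci LFSR. The tap choice is a standard textbook configuration,
# not necessarily the one used for EOS AM-1 testing.
def pn_sequence(nbits=7):
    lfsr = 1                                   # any nonzero seed works
    out = []
    for _ in range(2**nbits - 1):              # full period: 2^7 - 1 = 127
        out.append(lfsr & 1)                   # output the low bit
        fb = (lfsr ^ (lfsr >> 1)) & 1          # feedback from the two end stages
        lfsr = (lfsr >> 1) | (fb << (nbits - 1))
    return out

seq = pn_sequence()
# A length-127 m-sequence is nearly balanced: 64 ones and 63 zeros.
```

Because the receiver can run an identical register synchronized to the incoming stream, a bit-by-bit XOR of the two sequences directly counts channel errors, which is exactly the bit error rate test setup the paper describes.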
Kopylova, Evguenia. "Algorithmes bio-informatiques pour l'analyse de données de séquençage à haut débit." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2013. http://tel.archives-ouvertes.fr/tel-00919185.
Nyström, Josefina. "Multivariate non-invasive measurements of skin disorders." Doctoral thesis, Umeå University, Chemistry, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-865.
The present thesis proposes new methods for obtaining objective and accurate diagnoses in modern healthcare. Non-invasive techniques have been used to examine or diagnose three different medical conditions, namely neuropathy among diabetics, radiotherapy induced erythema (skin redness) among breast cancer patients and diagnoses of cutaneous malignant melanoma. The techniques used were Near-InfraRed spectroscopy (NIR), Multi Frequency Bio Impedance Analysis of whole body (MFBIA-body), Laser Doppler Imaging (LDI) and Digital Colour Photography (DCP).
The neuropathy for diabetics was studied in papers I and II. The first study was performed on diabetics and control subjects of both genders. A separation was seen between males and females and therefore the data had to be divided in order to obtain good models. NIR spectroscopy was shown to be a viable technique for measuring neuropathy once the division according to gender was made. The second study on diabetics, where MFBIA-body was added to the analysis, was performed on males exclusively. Principal component analysis showed that healthy reference subjects tend to separate from diabetics. Also, diabetics with severe neuropathy separate from persons less affected.
The preliminary study presented in paper III was performed on breast cancer patients in order to investigate if NIR, LDI and DCP were able to detect radiotherapy induced erythema. The promising results in the preliminary study motivated a new and larger study. This study, presented in papers IV and V, intended to investigate the measurement techniques further but also to examine the effect that two different skin lotions, Essex and Aloe vera have on the development of erythema. The Wilcoxon signed rank sum test showed that DCP and NIR could detect erythema, which is developed during one week of radiation treatment. LDI was able to detect erythema developed during two weeks of treatment. None of the techniques could detect any differences between the two lotions regarding the development of erythema.
The use of NIR to diagnose cutaneous malignant melanoma is presented as unpublished results in this thesis. This study gave promising but inconclusive results. NIR could be of interest for future development of instrumentation for diagnosis of skin cancer.
Narmack, Kirill. "Dynamic Speed Adaptation for Curves using Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233545.
Tomorrow's vehicles will be more sophisticated, intelligent and safe than today's. The future leans towards fully autonomous vehicles. This thesis provides a data-driven solution for a speed adaptation system that can compute a vehicle speed in curves suited to the driver's driving style, the road's properties and the prevailing weather. A curve speed adaptation system aims to compute a vehicle speed for curves that can be used in Advanced Driver Assistance Systems (ADAS) or Autonomous Driving (AD) applications. The thesis was carried out at Volvo Car Corporation. Literature on speed adaptation systems and on the factors that affect vehicle speed in curves was studied. Naturalistic driving data was collected by driving a car, extracted from Volvo's database, and processed. A new speed adaptation system was designed, implemented and evaluated. The system proved capable of computing a vehicle speed appropriate to the driver's driving style under the prevailing weather conditions and road properties. Two different artificial neural networks and two mathematical models were used to compute the vehicle speed, and these methods were compared and evaluated.
Andrade, Carina Sofia Marinho de. "Text mining na análise de sentimentos em contextos de big data." Master's thesis, 2015. http://hdl.handle.net/1822/40034.
Full textThe evolution of technology, together with the constant use of different devices connected to the internet, drives vast growth in the volume and variety of data generated daily at great velocity, a phenomenon usually called Big Data. Related to the growth in data volume is the rising prominence of the various Text Mining techniques, essentially because of the possibility of extracting greater value from the data generated by the many applications, thereby seeking information that benefits several areas of study. One of the current points of interest in this respect is Sentiment Analysis, where through several techniques it is possible to understand, across the most varied types of data, which sentiments and opinions are implicit in them. Since the purpose of this dissertation is the development of a system based on Big Data technology, built on Text Mining and Sentiment Analysis techniques for decision support, the document conceptually frames the three concepts mentioned above, providing a global view of them and describing practical applications where they are generally used. In addition, an architecture is proposed for Sentiment Analysis in the context of data from the Twitter social network, and practical applications are developed, drawing on everyday examples where Sentiment Analysis brings benefits when applied. With the demonstration cases presented, it is possible to verify the role of each technology used and of the technique adopted for Sentiment Analysis. On the other hand, the conclusions drawn from the demonstration cases reveal the difficulties still associated with carrying out Sentiment Analysis: the difficulties in text processing, the lack of Portuguese lexicons, among other topics addressed in this document.
The evolution of technology, associated with the common use of different devices connected to the internet, provides vast growth in the volume and variety of data generated daily at high velocity, a phenomenon commonly denominated Big Data. Related to the growth in data volume is the increased awareness of several Text Mining techniques, which make possible the extraction of useful insight from the data generated by multiple applications, thus trying to obtain information beneficial to multiple study areas. One of the current interests concerning this topic is Sentiment Analysis, where through the use of several data analysis techniques it is possible to understand, among a vast variety of data and data types, which sentiments and opinions are implicit in them. Since the purpose of this dissertation is the development of a system based on Big Data technologies that implements Text Mining and Sentiment Analysis techniques for decision support, this document presents a conceptual framework of the three concepts mentioned above, providing a global overview of them and describing practical applications where they are generally used. In addition, an architecture is proposed for Sentiment Analysis in the context of data from the Twitter social network. For that, practical applications are developed, using real-world examples where Sentiment Analysis brings benefits when applied. With the presented demonstration cases it is possible to verify the role of each technology used and the techniques adopted for Sentiment Analysis. Moreover, the conclusions drawn from the demonstration cases allow us to understand the difficulties that are still present in the development of Sentiment Analysis: difficulties in text processing, the lack of Portuguese lexicons, among other topics addressed in this document.
WU, JIA-HAO, and 吳家豪. "On-line Health News Analysis Involving Big Data based on Text Mining." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/70822968718916156554.
Full textNational United University
Master's Program, Department of Information Management
104
People in Taiwan have been alerted by problems of food safety in the past few years; therefore, they have paid more attention to health news. This study tries to find the critical terms in on-line health news and to predict the number of "Like" votes the news receives, based on text mining and business intelligence algorithms. In addition, to deal with the potentially big data from on-line news, this study proposes a big data system structure with a Hadoop-based platform and the Spark parallel framework, using parallel processing on multiple data nodes. The results show that the support vector machine with 50 concept dimensions has the best prediction accuracy. When the number of iterations was raised, the corresponding execution time increased; however, the execution time grew at a much lower rate than the number of iterations. Moreover, when the amount of data becomes huge, the performance of the Spark distributed computing structure improves significantly. The proposed approach can help managers of on-line news sites to select or invest in more popular health news and thus attract more potential readers. The proposed structure and analytic results regarding big data can also provide insights for future studies.
Amado, Maria Alexandra Amorim. "A review of the literature on big data in marketing using text mining." Master's thesis, 2015. http://hdl.handle.net/10071/11101.
Full textWith the amount of data that exists today, organisations have access to ever more information, collecting data of all kinds and easily accumulating terabytes or petabytes of data. These data come from several sources: social network streams, mobile devices, images, transactions, GPS signals, and so on. Analysing this large amount of data, currently called Big Data, is an increasing competitive concern for organisations, boosting productivity growth and innovation. But what exactly is Big Data? Big Data is more than just a question of size: with the emergence of new data-collection technologies and advanced Data Mining, supported by powerful data analysis tools, Big Data offers an unprecedented opportunity to acquire knowledge from new types of data and to discover business opportunities faster. Its application to Marketing can bring great potential to organisations, since it allows them to improve their view of the market and to create better interactions with customers by investigating their behaviour, in order to identify the right message to deliver on the right channel, at the right time, to the right customer. These improved interactions result in increased revenue and competitive differentiation. In this study, Text Mining was used to develop an automated literature review and to analyse the application of Big Data in Marketing along four dimensions: time, geography, sectors and products.
Ke, Cheng Hao, and 柯政豪. "The Ecology of Big Data: A Memetic Approach on the Evolution of Online Text." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/z5bard.
Full textNational Chengchi University
Department of Public Administration
104
The mismatch between theory and method is a crisis which the discipline of public administration cannot afford to ignore. The arrival of the "Era of Big Data" only serves to make matters worse. As data becomes uncoupled from the individual, so goes any pretense of trying to provide analyses beyond mere description. If public administration refuses to import new ontology and epistemology, then very little can be gained from online text research. The Darwinian theory of evolution, ever since the Modern Synthesis, has embraced the replicator-centered point of view when explaining all living phenomena. This has unshackled the theory from the limitations of the traditional individual-centered view of evolution. Memetics is a recent offshoot of the theory of evolution. It views social-cultural change as a process based on the evolution of a cultural unit of selection, the meme. Given memetics' ability to explain social-cultural evolution from the meme's point of view, it is a natural candidate for examining the dynamics of "big" online text data. The first part of this research constructs an online text analysis framework, with testable hypotheses, through the integration of past literature on evolution, social-cultural evolution, memetics and ecology. The second part tests the framework on empirical data. The text corpus used in this research contains 1,761 news reports from the Yahoo! News website on the issue of high school curriculum change. Chinese term segmentation and text clustering algorithms were applied to the corpus in order to extract text quasi-species composed of similar memes. Statistical tests were then used to determine the influence of text characteristics and temporal distribution dynamics on the population of quasi-species. Findings indicate that the population dynamics of text quasi-species were influenced by density dependence.
Text characteristics, such as word length and sentiment, also exert significant influence on the number of comments that each text receives. However, these influences are not equal under different density conditions. The location of the news articles within the website also creates a difference in the number of comments received. Finally, interactions between the temporal distributions of different quasi-species, and between quasi-species and term groups, also yielded significant positive and negative correlations. The results show that memetics is an ideal theoretical platform to connect theory with text mining and analysis methods. It allows for a theory-based approach and the creation of testable hypotheses. Frameworks and methods from evolutionary and ecological research are also applicable under memetics. The empirical findings point to the importance of monitoring the temporal distribution of online text, and to the significance of text characteristics and website environments for text population changes. The results also illustrate the importance of term groups in influencing text population dynamics. Together, these variables and effects are central to understanding changes in online text and comment numbers, and the effect of past text populations on current population changes. Online texts from different websites should also be analyzed separately. This research recommends that future public administration big data analyses continue to adopt the memetic approach. Nevertheless, attention should be given to the strengths and weaknesses of different text mining algorithms and density dependence tests. Big data time series from different websites and with longer temporal spans should also be considered, while social-cultural artifacts other than texts should not be excluded from memetics-based research.
New frameworks must also be constructed to integrate and understand the interactions between important variables, such as text characteristics and environmental influences. Findings on all forms of online data would also be enhanced through comparisons with results from questionnaires designed with memetics in mind.
熊原朗. "Optimization Study of Applying Apache Spark on Plain Text Big Data with Association Rules Operations." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6b74rz.
Full textNational Changhua University of Education
Department of Computer Science and Information Engineering
107
The amount of plain text generated by humans on the Internet keeps increasing, and service providers use this data to build competitive systems that provide more appropriate services. Among the various big data computing frameworks, it is quite common to use Apache Spark to process plain text data and to use collaborative filtering to build recommendation systems. However, when using Spark for data processing, developers may implement the same text operation with different APIs, which has a considerable impact on performance and efficiency. Moreover, many researchers and medium-sized enterprises run small-scale clusters, while most research on Spark parameter tuning targets large-scale clusters; in small-scale clusters, the parameters and node performance interact differently. This paper provides a performance optimization study for small-scale cluster deployments in the context of applying Spark to association rule operations on plain text big data. Through different APIs and different operating parameters, it compensates for the limited computational power of small-scale clusters to achieve the highest efficiency in a constrained environment. Using the improved implementation of this paper, speed can be increased by up to 3.44 times, and the computation can complete even when the output data size exceeds 3 times the available memory of a single node. After simulating small-scale cluster loads, it was found that using Kryo serialization, the recommended parallelism, and letting Spark allocate core resources itself instead of allocating them manually yields the highest computing performance.
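As a rough illustration of the association-rule operations this thesis runs on Spark, the following is a minimal frequent-pair rule miner in plain Python; the function name, example documents and thresholds are invented for illustration and are not taken from the thesis:

```python
from itertools import combinations
from collections import Counter

def mine_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine pairwise association rules (A -> B) from tokenized documents."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = set(t)                  # deduplicate tokens per document
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), c in pair_counts.items():
        if c / n < min_support:         # prune infrequent pairs
            continue
        for head, body in ((a, b), (b, a)):
            conf = c / item_counts[head]
            if conf >= min_confidence:
                rules.append((head, body, c / n, conf))
    return rules

docs = [["spark", "hadoop", "cluster"],
        ["spark", "cluster"],
        ["spark", "kryo"],
        ["hadoop", "cluster"]]
rules = mine_rules(docs, min_support=0.5, min_confidence=0.6)
```

In a Spark deployment like the one studied, the same item and pair counting would be distributed across partitions via map/reduce-style transformations rather than computed in a single pass on one machine.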
Veiga, Hugo Miguel Ferrão Casal da. "Text mining e twitter : o poder das redes sociais num mercado competitivo." Master's thesis, 2016. http://hdl.handle.net/10362/17365.
Full textToday, with the mass adoption of social networks, companies spread their message through their communication channels, but consumers give their opinion about it. They argue, opine and criticise (Nardi, Schiano, Gumbrecht, & Swartz, 2004). Positively or negatively. In this context, Text Mining emerges as an interesting approach to the need to obtain knowledge from the existing data. In this work we used a hierarchical Clustering algorithm with the goal of discovering distinct topics in a set of tweets obtained over a given period of time for the companies Burger King and McDonald's. To understand the sentiment associated with these topics, a sentiment analysis was performed on each topic found, using a Bag-of-Words algorithm. We concluded that the Clustering algorithm was able to find topics in the tweets obtained, essentially linked to products and services sold by the companies. The Sentiment Analysis algorithm assigned a sentiment to these topics, making it possible to understand which of the identified products/services obtained a positive or negative polarity, and thus to flag potential problem areas in the companies' strategy, as well as positive situations that may indicate successful operational decisions.
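The Bag-of-Words sentiment step described above can be sketched as a tiny lexicon-based scorer in Python; the lexicon, tweets and function names here are invented for illustration and are not taken from the dissertation:

```python
# Minimal Bag-of-Words sentiment scorer: a tweet's polarity is the
# sum of lexicon scores of its tokens (unknown words score 0).
POSITIVE = {"good", "great", "tasty", "love"}
NEGATIVE = {"bad", "slow", "cold", "hate"}

def score(tweet):
    """+1 per positive token, -1 per negative token."""
    tokens = tweet.lower().split()
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

def polarity(tweet):
    s = score(tweet)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

tweets = ["Love the new fries, tasty", "Service was slow and food cold"]
labels = [polarity(t) for t in tweets]
```

A real pipeline of this kind would first clean the tweets (punctuation, stop words, hashtags) and use a much larger lexicon, which is precisely where the scarcity of non-English lexicons becomes a practical obstacle.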
KUANG, PEI-WEI, and 匡裴暐. "A Study of On-line Tourism News base on Business Intelligence and Text Mining – A Big Data Structure." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/jv5f42.
Full textNational United University
Master's Program, Department of Information Management
105
awareness in Taiwan, demands for travel are growing fast; therefore, people have paid more attention to tourism-related news. In addition, because of the booming development of internet technology, the amount of data is increasing dramatically, and traditional database structures are insufficient for problems involving big data. This study employs the concepts of text mining and business intelligence to analyze and predict on-line tourism news on a Hadoop-based big data structure. First, a text mining approach is utilized to analyze the content of the on-line tourism news. Correlation analysis and an association rule algorithm are adopted to analyze the relationships among the content of the news and the numbers of "Click", "Share" and "Hashtag". Then, a genetic-based ensemble method consisting of ordinal logistic regression, support vector machine and decision tree algorithms is developed to predict the numbers of "Click" and "Share" of on-line tourism news, and the number of domestic tourists. The results show that the proposed approach and structure can increase hit rates and computational efficiency.
CHEN, SZU-LING, and 陳思伶. "Using Big Data and Text Analytics to Understand How Customer Experiences Posted on Yelp.com Impact the Hospitality Industry." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/67055472370562629103.
Full textNational Taipei University
Department of Business Administration
104
Nowadays, the E-commerce systems used by major Internet organizations, such as Google, Amazon and expedia.com, include highly scalable E-commerce platforms and social media platforms. These companies try to make use of web data that is less structured but rich in customer views and behavioral information. However, the use of such unstructured data to generate business value is still under-researched. This paper focuses on exploring the value of customer reviews posted on social media platforms in the hospitality industry by using big data analytic techniques. We aim to find the keywords that can help customers find a suitable hotel. To be more specific, this study combines programming skills and applies data mining approaches to analyze a large number of consumer reviews extracted from Yelp.com, to deconstruct the hotel guest experience and uncover textual patterns that can be applied when searching for or booking hotels. More importantly, the new approach we use in this study makes it possible to utilize big data analytics to find perspectives that might not have been studied in the existing hospitality literature. Moreover, it serves as a basis for further research on unstructured data in the E-commerce and hospitality industries.
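One common way to surface review keywords of the kind this study looks for is TF-IDF weighting; the following plain-Python sketch uses invented reviews and function names, and is only an illustrative stand-in for the study's actual data mining pipeline:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=2):
    """Rank each document's terms by TF-IDF; return the top terms per doc."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        # term frequency weighted by inverse document frequency
        scores = {t: (c / len(toks)) * math.log(n / df[t])
                  for t, c in tf.items()}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_n])
    return results

reviews = ["clean room friendly staff",
           "noisy room thin walls",
           "friendly staff great breakfast"]
keywords = tfidf_keywords(reviews)
```

Terms that appear in every review (here "room", "friendly", "staff") are down-weighted, so the distinctive words of each review ("clean", "noisy", "breakfast") rise to the top, which is the behaviour a hotel-search keyword extractor wants.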