Dissertations / Theses on the topic 'Named Entity Classification'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 18 dissertations / theses for your research on the topic 'Named Entity Classification.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.
Alasiry, Areej Mohammed. "Named entity recognition and classification in search queries." Thesis, Birkbeck (University of London), 2015. http://bbktheses.da.ulcc.ac.uk/154/.
Rosvall, Erik. "Comparison of sequence classification techniques with BERT for named entity recognition." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-261419.
This thesis takes its starting point from recent developments in natural language processing brought about by the Transformer architecture. One of the more recent models, a deep bidirectional model called BERT, improved the state of the art on several NLP benchmarks. BERT is pre-trained for general language understanding on large amounts of text and is then fine-tuned for a specific problem domain. BERT can be used for several NLP tasks, but this thesis looks specifically at named entity recognition. It compares the originally proposed model with a new classifier based on Conditional Random Fields. The models were evaluated on CoNLL-03, a dataset of Reuters news articles written in English. The results showed that the Conditional Random Field classifier performed better in terms of F1 score, by roughly 0.25 percentage points. The thesis did not manage to reproduce BERT's original results, but it compares the two architectures across the hyperparameters suggested for fine-tuning on the task. Conditional Random Fields gave better results for most model configurations, and also lower variance across parameter settings, which is a strong incentive to use Conditional Random Fields as the classifier.
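As a concrete illustration of what the thesis compares, the sketch below contrasts independent per-token decoding (a softmax head) with CRF-style Viterbi decoding over the same emission scores. It is plain NumPy with invented emission and transition scores, not the thesis code; a real BERT+CRF tagger would learn both from data.

import numpy as np

# Toy emission scores from a token classifier (e.g. a fine-tuned BERT head)
# for a four-token sentence over the tags O, B-PER, I-PER. Scores are invented.
tags = ["O", "B-PER", "I-PER"]
emissions = np.array([
    [2.0, 1.0, 0.5],   # token 1
    [0.4, 1.0, 1.2],   # token 2
    [0.3, 0.2, 1.0],   # token 3
    [1.5, 0.2, 0.4],   # token 4
])

# Independent decoding: pick the best tag for each token separately.
independent = [tags[i] for i in emissions.argmax(axis=1)]

# CRF-style decoding: add a transition score between consecutive tags and
# find the best overall sequence with Viterbi. The single penalty below
# stands in for the label-sequence constraints a learned CRF picks up.
trans = np.zeros((3, 3))
trans[0, 2] = -5.0  # O -> I-PER is implausible

def viterbi(emissions, trans):
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + trans + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[i] for i in reversed(path)]

print("independent:", independent)               # contains an invalid O -> I-PER step
print("viterbi:    ", viterbi(emissions, trans))  # repairs it to B-PER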
Kliegr, Tomáš. "Unsupervised Entity Classification with Wikipedia and WordNet." Doctoral thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-126861.
Volkova, Svitlana. "Entity extraction, animal disease-related event recognition and classification from web." Thesis, Kansas State University, 2010. http://hdl.handle.net/2097/4593.
Department of Computing and Information Sciences
William H. Hsu
Global epidemic surveillance is an essential task for national biosecurity management and bioterrorism prevention. The main goal is to protect the public from major health threats. To perform this task effectively one requires reliable, timely and accurate medical information from a wide range of sources. Towards this goal, we present a framework for epidemiological analytics that can be used to extract and visualize infectious disease outbreaks automatically from a variety of unstructured web sources. More precisely, in this thesis we consider several research tasks including document relevance classification, entity extraction and animal disease-related event recognition in the veterinary epidemiology domain. First, we crawl web sources and classify collected documents by topical relevance using supervised learning algorithms. Next, we propose a novel approach for automated ontology construction in the veterinary medicine domain. Our approach is based on semantic relationship discovery using syntactic patterns. We then apply our automatically constructed ontology to the domain-specific entity extraction task. Moreover, we compare our ontology-based entity extraction results with an alternative sequence labeling approach. We introduce a sequence labeling method for entity tagging that relies on syntactic feature extraction using a sliding window. Finally, we present our novel sentence-based event recognition approach that includes three main steps: extraction of animal disease, species, location and date entities and confirmation-status n-grams; classification of event-related sentences into two categories, suspected or confirmed; and automated event tuple generation and aggregation. We show that our document relevance classification results as well as our entity extraction and disease-related event recognition results are significantly better than the results reported by other animal disease surveillance systems.
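The final step of the pipeline, event tuple generation and aggregation, can be pictured with the minimal sketch below. The extracted entities, field names and counting-based aggregation are invented stand-ins; the thesis builds its tuples from ontology-based and sequence-labelling extractors over real web documents.

from collections import Counter
from itertools import product

# Toy per-sentence extraction output (entities plus a suspected/confirmed
# label from the sentence classifier). All values are invented.
sentences = [
    {"disease": ["avian influenza"], "species": ["poultry"],
     "location": ["Giza"], "date": ["2010-03-01"], "status": "confirmed"},
    {"disease": ["avian influenza"], "species": ["poultry"],
     "location": ["Giza"], "date": ["2010-03-01"], "status": "confirmed"},
    {"disease": ["foot-and-mouth disease"], "species": ["cattle"],
     "location": ["Paju"], "date": ["2010-04-12"], "status": "suspected"},
]

def sentence_tuples(s):
    # One candidate event tuple per combination of mentions in the sentence.
    for d, sp, loc, date in product(s["disease"], s["species"],
                                    s["location"], s["date"]):
        yield (d, sp, loc, date, s["status"])

# Aggregate identical tuples across sentences, keeping a supporting count.
events = Counter(t for s in sentences for t in sentence_tuples(s))
for event, count in events.items():
    print(count, event)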
Yosef, Mohamed Amir [Verfasser], and Gerhard [Akademischer Betreuer] Weikum. "U-AIDA : a customizable system for named entity recognition, classification, and disambiguation / Mohamed Amir Yosef. Betreuer: Gerhard Weikum." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2016. http://d-nb.info/1083894722/34.
Mendes, Pablo N. "Adaptive Semantic Annotation of Entity and Concept Mentions in Text." Wright State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=wright1401665504.
Sidås, Albin, and Simon Sandberg. "Conversational Engine for Transportation Systems." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176810.
Urbansky, David. "Automatic Extraction and Assessment of Entities from the Web." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-97469.
Liaghat, Zeinab. "Quality-efficiency trade-offs in machine learning applied to text processing." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/402575.
Nowadays, the amount of digital documents available is growing rapidly, expanding at a considerable rate and coming from a variety of sources. Sources of unstructured and semi-structured information include the World Wide Web, news articles, biological databases, e-mails, digital libraries, government electronic repositories, chat rooms, online forums, blogs and social media such as Facebook, Instagram, LinkedIn, Pinterest, Twitter and YouTube, among many others. Extracting information from these resources and finding useful information in such collections has become a challenge that makes organising this enormous amount of data a necessity. Data mining, machine learning and natural language processing are powerful techniques that can be used together to address this challenge. Depending on the task or problem at hand, there are many different approaches that can be used. The methods being implemented are continuously optimised, but these supervised machine-learning methods have been tested and compared on large training data. The question is: what happens to the quality of the methods if we increase the data from 100 MB to 1 GB? Moreover, are the improvements in quality worth it when the data processing rate slows down? Can we trade quality for efficiency, and recover the loss in quality by processing more data? This thesis is a first approach to answering these questions in a general way for text-processing tasks, since there has not been enough research comparing these methods with respect to the trade-off between data size, result quality and processing time. We therefore propose a framework to analyse this trade-off and apply it to three important text-processing problems: Named Entity Recognition, Sentiment Analysis and Document Classification. These problems were also selected because they have different levels of granularity: words, opinions and full documents. For each problem we select different machine-learning algorithms and evaluate the trade-off between these variables for the different algorithms on large public datasets (news, reviews, patents), using subsets of different sizes, from 50 MB to several GB, to explore the trade-off. In the end, as we had assumed, an algorithm that is efficient on small data will not necessarily be efficient on large amounts of data. For the last two problems we also consider similar algorithms together with two different datasets and evaluation techniques, in order to study the impact of these two parameters on the results; we show that the results do not change significantly with these changes.
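The trade-off framework the abstract describes, training on progressively larger subsets while tracking both quality and processing time, can be sketched as below. The dataset, model and subset fractions are synthetic placeholders; the thesis works with real news, review and patent corpora from 50 MB up to several GB.

import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a text-processing dataset.
X, y = make_classification(n_samples=50_000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the training subset and record quality (F1) and processing time.
for fraction in (0.01, 0.1, 0.5, 1.0):
    n = int(len(X_train) * fraction)
    start = time.perf_counter()
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    elapsed = time.perf_counter() - start
    quality = f1_score(y_test, model.predict(X_test))
    print(f"train size {n:6d}  F1 {quality:.3f}  seconds {elapsed:.2f}")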
Skeppstedt, Maria. "Extracting Clinical Findings from Swedish Health Record Text." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-109254.
Asbayou, Omar. "L'identification des entités nommées en arabe en vue de leur extraction et classification automatiques : la construction d’un système à base de règles syntactico-sémantique." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2136.
This thesis explains and presents our rule-based approach to Arabic named entity recognition and classification. This work involves two disciplines: linguistics and computer science. Computer tools and linguistic rules are merged to give birth to a new discipline, Natural Language Processing, which operates at different levels (morphosyntactic, syntactic, semantic, syntactico-semantic…). So, in our particular case, we have put the necessary linguistic information and rules at the service of the software, which should be able to apply and implement them in order to recognise and classify, by syntactic and semantic annotations, the different named entity classes. This thesis falls within the general domain of natural language processing, and in particular within the continuity of the work accomplished on morphosyntactic analysis and on the lexical databases SAMIA and later DIINAR, as well as the accompanying scientific research. This task aims at lexical enrichment with simple and complex named entities and at establishing the transition from morphological analysis to syntactic and syntactico-semantic analysis. The ultimate objective is text analysis. To understand what this involves, it was important to start with a definition of named entities. To carry out this task, we distinguished between two main named entity types: pure proper names and descriptive named entities. We also established a referential classification on the basis of different classes and sub-classes which constitute the reference for our semantic annotations. Nevertheless, we are confronted with two major difficulties: lexical ambiguity and the boundaries of complex named entities. Our system adopts a syntactico-semantic rule-based approach. After Level 0 of morphosyntactic analysis, the system is made up of five levels of syntactic and syntactico-semantic patterns based on the necessary linguistic information (i.e. morphosyntactic, syntactic, semantic and syntactico-semantic information). This work obtained very good results in terms of precision, recall and F-measure. The output of our system makes an interesting contribution to different applications of natural language processing, especially to the two tasks of information retrieval and information extraction, in both of which we have concretely exploited our system's output. In future work we envisage extending our system to sentence extraction and classification, in which classified entities, mainly named entities and verbs, play the roles of arguments and predicates respectively. A second objective consists in the enrichment of different types of lexical resources such as ontologies.
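To make the idea of a syntactico-semantic rule layer concrete, here is a toy, English-language stand-in: trigger words project a class onto the adjacent proper name. The rules, lexicon and example sentence are invented; the thesis applies analogous, and much richer, rules to Arabic after morpho-syntactic analysis.

import re

# One toy rule layer: a trigger word projects an entity class onto the
# capitalised sequence next to it. Rules and the example are invented.
rules = [
    (re.compile(r"\bpresident\s+(?P<e>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"), "PERSON"),
    (re.compile(r"\bcity of\s+(?P<e>[A-Z][a-z]+)"), "LOCATION"),
    (re.compile(r"\b(?P<e>[A-Z][a-z]+ University)\b"), "ORGANIZATION"),
]

def annotate(text):
    # Collect (surface form, class, start, end) for every rule match.
    entities = []
    for pattern, label in rules:
        for m in pattern.finditer(text):
            entities.append((m.group("e"), label, m.start("e"), m.end("e")))
    return entities

print(annotate("A statement by president Jacques Martin in the city of Lyon "
               "was forwarded to Lumiere University."))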
Pfeifer, Katja. "Serviceorientiertes Text Mining am Beispiel von Entitätsextrahierenden Diensten." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-150646.
Doms, Andreas. "GoPubMed: Ontology-based literature search for the life sciences." Doctoral thesis, Technische Universität Dresden, 2008. https://tud.qucosa.de/id/qucosa%3A23835.
Lee, Sunshin. "Geo-Locating Tweets with Latent Location Information." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/75022.
Ph. D.
Zaccara, Rodrigo Constantin Ctenas. "Anotação e classificação automática de entidades nomeadas em notícias esportivas em Português Brasileiro." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-06092012-135831/.
The main goal of this research is to develop an automatic named entity classification tool for sports news written in Brazilian Portuguese. To reduce the scope, only sports news about the 2011 São Paulo Championship published by UOL (Universo Online) was used for training and analysis. The first artefact developed was the WebCorpus tool, which aims to make it easier to add metadata to words through a rich web interface; using it, all the news in the corpus was tagged manually. The database behind this tool was fed by a crawler, also developed during this research. The second artefact developed was the corpus UOLCP2011 (UOL Campeonato Paulista 2011), which was manually tagged using the WebCorpus tool. During this process, seven classification concepts were used: person, place, organization, team, championship, stadium and fans. To develop the automatic named entity classification tool, three different approaches were analysed: maximum entropy, inverted index, and techniques merging both. Each approach had three steps: algorithm development, training using machine learning techniques, and analysis of the best scores.
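A minimal sketch of the third, merged approach might look like the following: a maximum-entropy (logistic-regression) classifier over simple context features, backed by a gazetteer lookup that plays the role of the inverted index. The features, training examples, gazetteer entries and merge rule are all invented; the thesis trains on the manually tagged UOLCP2011 corpus and uses seven classes.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny invented training set: context features around a candidate entity
# and its class (only three of the seven classes are shown).
train = [
    ({"prev": "goleiro", "cap": True}, "person"),
    ({"prev": "tecnico", "cap": True}, "person"),
    ({"prev": "estadio", "cap": True}, "stadium"),
    ({"prev": "contra", "cap": True}, "team"),
    ({"prev": "no", "cap": True}, "stadium"),
]
vec = DictVectorizer()
X = vec.fit_transform([features for features, _ in train])
y = [label for _, label in train]
maxent = LogisticRegression(max_iter=1000).fit(X, y)

# Gazetteer lookup built from the tagged corpus (inverted-index stand-in).
gazetteer = {"Morumbi": "stadium", "Corinthians": "team"}

def classify(token, features):
    # One possible merge strategy: trust an exact gazetteer hit,
    # otherwise fall back to the maximum-entropy model.
    if token in gazetteer:
        return gazetteer[token]
    return maxent.predict(vec.transform([features]))[0]

print(classify("Morumbi", {"prev": "estadio", "cap": True}))
print(classify("Rogerio", {"prev": "goleiro", "cap": True}))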
Yeh, Cheng-Hui, and 葉政輝. "A Corpus-Based Chinese Named-Entity Classification." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/13056048608687113177.
National Chiao Tung University
Department of Computer and Information Science
Academic year 91 (2002)
Named-entity identification plays an important role in natural language processing, especially in document processing and message understanding. A named entity can serve as a keyword in web or full-text retrieval, and correct named-entity identification lets us understand relationships among persons, events, locations, dates and times in documents. In this thesis, we use the probabilities of characters commonly used in Chinese person names to identify Chinese person names. Furthermore, we propose a co-occurring-neighbor word model and a part-of-speech model that combine key terms and part-of-speech information appearing immediately before and after named entities. After training, we obtain 89% precision and a 99% recall rate in Chinese person name classification experiments, and 89% precision and an 84% recall rate in organization classification experiments.
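The character-probability idea for person names can be sketched as follows. The probability tables, scoring threshold and candidate strings are invented for illustration; the thesis estimates character-usage statistics from a corpus of known Chinese person names and combines them with the neighboring-word and part-of-speech models.

# Toy character-usage probabilities for Chinese person names: P(surname) for
# a leading character and P(char in given name) for following characters.
# The numbers are invented; the thesis estimates them from real name data.
surname_prob = {"陳": 0.045, "林": 0.040, "葉": 0.012, "電": 1e-6}
given_prob = {"政": 0.010, "輝": 0.008, "明": 0.015, "腦": 1e-6}

def name_score(candidate, default=1e-7):
    # Product of the surname probability and the given-name character
    # probabilities, i.e. a unigram character model of person names.
    score = surname_prob.get(candidate[0], default)
    for ch in candidate[1:]:
        score *= given_prob.get(ch, default)
    return score

for cand in ("葉政輝", "電腦輝"):
    label = "person name" if name_score(cand) > 1e-8 else "not a name"
    print(cand, f"{name_score(cand):.2e}", label)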
Thomas, Stefan. "Verbesserung einer Erkennungs- und Normalisierungsmaschine für natürlichsprachige Zeitausdrücke." 2012. https://ul.qucosa.de/id/qucosa%3A17239.
Usbeck, Ricardo. "Knowledge Extraction for Hybrid Question Answering." Doctoral thesis, 2016. https://ul.qucosa.de/id/qucosa%3A15647.