Dissertations / Theses on the topic 'Metadata mining'
Consult the top 22 dissertations / theses for your research on the topic 'Metadata mining.'
Demšar, Urška. "Exploring geographical metadata by automatic and visual data mining." Licentiate thesis, KTH, Infrastructure, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-1779.
Metadata are data about data. They describe the characteristics and content of an original piece of data. Geographical metadata describe geospatial data: maps, satellite images and other geographically referenced material. Such metadata have two characteristics, high dimensionality and diversity of attribute data types, which present a problem for traditional data mining algorithms.
Other problems that arise during the exploration of geographical metadata are linked to the expertise of the user performing the analysis. The large amounts of metadata and hundreds of possible attributes limit the exploration for a non-expert user, which results in a potential loss of information that is hidden in metadata.
In order to solve some of these problems, this thesis presents an approach for exploration of geographical metadata by a combination of automatic and visual data mining.
Visual data mining is a principle that involves the human in the data exploration by presenting the data in some visual form, allowing the human to get insight into the data and to recognise patterns. The main advantages of visual data exploration over automatic data mining are that the visual exploration allows a direct interaction with the user, that it is intuitive, and that it does not require complex understanding of mathematical or statistical algorithms. As a result the user has a higher confidence in the resulting patterns than if they were produced by a computer only.
In the thesis we present the Visual Data Mining tool (VDM tool), which was developed for exploration of geographical metadata for site planning. The tool provides five different visualisations: a histogram, a table, a pie chart, a parallel coordinates visualisation and a clustering visualisation. The visualisations are connected using the interactive selection principle called brushing and linking.
In the VDM tool the visual data mining concept is integrated with an automatic data mining method, clustering, which finds a hierarchical structure in the metadata, based on similarity of metadata items. In the thesis we present a visualisation of the hierarchical structure in the form of a snowflake graph.
Keywords: visualisation, data mining, clustering, tree drawing, geographical metadata.
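The abstract above pairs automatic clustering with a snowflake-graph visualisation. Purely as an illustrative sketch of the hierarchical clustering step (not the thesis's implementation), assuming the metadata have already been encoded as numeric feature vectors, which real geographical metadata with mixed attribute types would first require:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented numeric metadata profiles, e.g. [scale, extent, layer count];
# a real pipeline would first encode heterogeneous attribute types.
metadata = np.array([
    [1.0, 0.2, 5],
    [1.1, 0.3, 6],
    [9.0, 4.5, 40],
    [8.7, 4.4, 42],
])

# Agglomerative (Ward) clustering builds the hierarchy that a tool like
# the VDM tool could visualise; here we only print flat cluster labels.
Z = linkage(metadata, method="ward")
print(fcluster(Z, t=2, criterion="maxclust"))  # -> [1 1 2 2]
```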
Tang, Yaobin. "Butterfly -- A model of provenance." Worcester, Mass. : Worcester Polytechnic Institute, 2009. http://www.wpi.edu/Pubs/ETD/Available/etd-031309-095511/.
Ramakrishnan, Cartic. "Extracting, Representing and Mining Semantic Metadata from Text: Facilitating Knowledge Discovery in Biomedicine." Wright State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=wright1222021939.
Full textDong, Zheng. "Automated Extraction and Retrieval of Metadata by Data Mining : a Case Study of Mining Engine for National Land Survey Sweden." Thesis, University of Gävle, Department of Technology and Built Environment, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-6811.
Metadata is the information describing geographical data resources and their key elements; it is used to guarantee the availability and accessibility of the data. ISO 19115 is a metadata standard for geographic information, making geographical metadata shareable, retrievable, and understandable at the global level. In order to cope with the massive, high-dimensional and high-diversity nature of geographical data, data mining is an applicable method for discovering the metadata.
This thesis develops and evaluates an automated mining method for extracting metadata from the data environment on the Local Area Network at the National Land Survey of Sweden (NLS). These metadata are prepared and provided across Europe according to the metadata implementing rules for the Infrastructure for Spatial Information in Europe (INSPIRE). The metadata elements are defined according to the formats of four different data entities: document data, time-series data, webpage data, and spatial data. To evaluate the method for further improvement, attributes and the corresponding metadata of geographical data files are extracted automatically as metadata records in testing and arranged in a database. Based on the extracted metadata schema, a retrieval function is used to find the files containing a keyword supplied by the user. Overall, the average success rate of metadata extraction and retrieval is 90.0%.
The mining engine is developed in the C# programming language on top of a SQL Server 2005 database. Lucene.Net is also integrated with Visual Studio 2005 to build an indexing framework for extracting and accessing metadata in the database.
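The engine described above is written in C# with Lucene.Net; purely as a minimal Python sketch of the same extract-then-retrieve idea (the root directory and query keyword are hypothetical):

```python
import os
from datetime import datetime

def extract_metadata(root):
    """Walk a directory tree and collect simple file-level metadata records."""
    records = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            records.append({
                "file_name": name,
                "path": path,
                "format": os.path.splitext(name)[1].lstrip(".").lower(),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
            })
    return records

def search(records, keyword):
    """Naive keyword retrieval over the extracted metadata records."""
    keyword = keyword.lower()
    return [r for r in records if any(keyword in str(v).lower() for v in r.values())]

if __name__ == "__main__":
    catalog = extract_metadata(".")       # hypothetical data environment root
    for hit in search(catalog, "shp"):    # e.g. look for shapefiles
        print(hit["path"], hit["size_bytes"])
```

A production engine would, as in the thesis, persist the records in a database and use a full-text index instead of the linear scan shown here.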
Al-Natsheh, Hussein. "Text Mining Approaches for Semantic Similarity Exploration and Metadata Enrichment of Scientific Digital Libraries." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE2062.
For scientists and researchers, it is critical to ensure that knowledge is accessible for re-use and further development. Moreover, the way we store and manage scientific articles and their metadata in digital libraries determines the number of relevant articles we can discover and access, depending on what is actually meant in a search query. Yet, are we able to explore all semantically relevant scientific documents with existing keyword-based information retrieval systems? This is the primary question addressed in this thesis. Hence, the main purpose of our work is to broaden the knowledge spectrum of researchers working in an interdisciplinary domain when they use the information retrieval systems of multidisciplinary digital libraries. The problem arises when such researchers use community-dependent search keywords while other scientific names for the relevant concepts are in use in a different research community. Towards proposing a solution to this semantic exploration task in multidisciplinary digital libraries, we applied several text mining approaches. First, we studied the semantic representation of words, sentences, paragraphs and documents for better semantic similarity estimation. In addition, we utilized the semantic information of words in lexical databases and knowledge graphs in order to enhance our semantic approach. Furthermore, the thesis presents a couple of use-case implementations of our proposed model.
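The thesis builds on richer semantic representations (embeddings, lexical databases, knowledge graphs); as a baseline illustration of the similarity-estimation task it improves upon, a minimal bag-of-words sketch (documents invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented abstracts from different research communities
docs = [
    "latent semantic analysis of scholarly article abstracts",
    "topic models for exploring scientific literature",
    "gene expression profiling in tumor samples",
]

# TF-IDF sees no similarity between synonymous community-specific terms --
# exactly the limitation that semantic representations address.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
print(cosine_similarity(X).round(2))  # pairwise document similarity matrix
```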
Petersson, Andreas. "Data mining file sharing metadata : A comparison between Random Forests Classification and Bayesian Networks." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11180.
Full textPetersson, Andreas. "Data mining file sharing metadata : A comparison between Random Forests Classification and Bayesian Networks." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11285.
Full textFerrill, Paul. "REFERENCE DESIGN FOR A SQUADRON LEVEL DATA ARCHIVAL SYSTEM." International Foundation for Telemetering, 2006. http://hdl.handle.net/10150/604259.
As more aircraft are fitted with solid-state memory recording systems, the need for a large data archival storage system becomes increasingly important. In addition, there is a need to keep classified and unclassified data separate but available to the aircrews for training and debriefing, along with some type of system for cataloging and searching for specific missions. This paper will present a novel approach, along with a reference design, for using commercially available hardware and software and a minimal amount of custom programming to help address these issues.
Lockard, Michael T., R. Rajagopalan, and James A. Garling. "MINING IRIG-106 CHAPTER 10 AND HDF-5 DATA." International Foundation for Telemetering, 2006. http://hdl.handle.net/10150/604264.
Rapid access to ever-increasing amounts of test data is becoming a problem. The authors have developed a data-mining methodology to catalog test files, search metadata attributes to identify test data files of interest, and query test data measurements using a web-based engine that produces results in seconds. Generated graphs allow the user to visualize an overview of the entire test for a selected set of measurements, with areas highlighted where the query conditions were satisfied. The user can then zoom into areas of interest and export selected information.
Srinivasan, Uma (Computer Science & Engineering, Faculty of Engineering, UNSW). "A FRAMEWORK FOR CONCEPTUAL INTEGRATION OF HETEROGENEOUS DATABASES." Awarded by: University of New South Wales, School of Computer Science and Engineering, 1997. http://handle.unsw.edu.au/1959.4/33463.
Full textKamenieva, Iryna. "Research Ontology Data Models for Data and Metadata Exchange Repository." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-6351.
For research in the fields of data mining and machine learning, the availability of varied input data sets is a necessary condition. Researchers now create databases of such sets; examples include the UCI Machine Learning Repository, the Data Envelopment Analysis Dataset Repository, the XMLData Repository, and the Frequent Itemset Mining Dataset Repository. Alongside these statistical repositories, a whole range of stores, from simple file stores to specialized repositories, can be used by researchers when solving applied tasks and investigating their own algorithms and scientific problems. At first sight, the only difficulty for the user would seem to be finding and understanding the structure of such scattered information stores. However, a detailed study of these repositories reveals deeper problems in data usage: in particular, a complete mismatch and rigidity of data file structures with respect to SDMX (Statistical Data and Metadata Exchange), the standard and structure used by many European organizations; the impossibility of preparing the data for a concrete applied task in advance; and the lack of a data usage history for particular scientific and applied tasks.
There are now many data mining (DM) methods, as well as large quantities of data stored in various repositories. The repositories themselves contain no DM methods, and, moreover, the methods are not linked to application areas. An essential problem is linking the subject (problem) domain, the DM methods, and the datasets appropriate for each method. This work therefore considers the problem of building ontological models of DM methods, describing the interaction of the methods with the corresponding repository data, and providing intelligent agents that allow the user of a statistical repository to choose the method and data appropriate to the task being solved. The work proposes a system structure and implements an intelligent search agent over the ontological model of DM methods that takes the user's personal queries into account.
For the implementation of an intelligent data and metadata exchange repository, an agent-oriented approach was selected. The model uses a service-oriented architecture and is built with the cross-platform programming language Java, the multi-agent platform Jadex, the database server Oracle Spatial 10g, and the ontology development environment Protégé version 3.4.
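As a toy stand-in (not the thesis's Java/Jadex implementation) for how an agent might consult an ontological model linking tasks, DM methods, and datasets, with all names invented:

```python
# Invented task -> methods -> datasets mapping standing in for the ontology
ONTOLOGY = {
    "classification": {"methods": ["decision tree", "naive bayes"],
                       "datasets": ["UCI iris", "UCI adult"]},
    "frequent itemsets": {"methods": ["apriori"],
                          "datasets": ["Frequent Itemset Mining retail"]},
}

def recommend(task):
    """Crude agent behaviour: look up methods and data matching the user's task."""
    return ONTOLOGY.get(task, {"methods": [], "datasets": []})

print(recommend("classification"))
```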
Šmerda, Vojtěch. "Grafický editor metadat pro OLAP." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235893.
Full textAlserafi, Ayman. "Dataset proximity mining for supporting schema matching and data lake governance." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671540.
Full textAmb l’enorme creixement de la quantitat de dades generades pels sistemes d’informació, és habitual avui en dia emmagatzemar conjunts de dades en els seus formats bruts (és a dir, sense cap pre-processament de dades ni transformacions) en dipòsits de dades a gran escala anomenats Data Lakes (DL). Aquests dipòsits emmagatzemen conjunts de dades d’àrees temàtiques heterogènies (que abasten molts temes empresarials) i amb molts esquemes diferents. Per tant, és un repte per als científics de dades que utilitzin la DL per a l’anàlisi de dades trobar conjunts de dades rellevants per a les seves tasques d’anàlisi sense cap suport ni govern de dades. L’objectiu és poder extreure metadades i informació sobre conjunts de dades emmagatzemats a la DL per donar suport al científic en trobar fonts rellevants. Aquest és l’objectiu principal d’aquesta tesi, on explorem diferents tècniques de perfilació de dades, concordança d’esquemes holístics i recomanació d’anàlisi per donar suport al científic. Proposem un nou marc basat en l’aprenentatge automatitzat supervisat per extreure automàticament metadades que descriuen conjunts de dades, incloent el càlcul de les seves similituds i coincidències de dades mitjançant tècniques de concordança d’esquemes holístics. Utilitzem les relacions extretes entre conjunts de dades per categoritzar-les automàticament per donar suport al científic del fet de trobar conjunts de dades rellevants amb la intersecció entre les seves dades. Això es fa mitjançant una nova tècnica basada en metadades anomenada mineria de proximitat que consumeix els metadades extrets mitjançant algoritmes automatitzats de mineria de dades per tal de detectar conjunts de dades relacionats i proposar-ne categories rellevants. Ens centrem en conjunts de dades plans (tabulars) organitzats com a files d’instàncies de dades i columnes d’atributs que descriuen les instàncies. El nostre marc proposat utilitza les quatre tècniques principals següents: (1) Esquema de concordança basat en instàncies per detectar ítems rellevants de dades entre conjunts de dades heterogènies, (2) Extracció de metadades de nivell de dades i mineria de proximitat per detectar conjunts de dades relacionats, (3) Extracció de metadades a nivell de atribut i mineria de proximitat per detectar conjunts de dades relacionats i, finalment, (4) Categorització de conjunts de dades automàtica mitjançant tècniques supervisades per k-Nearest-Neighbour (kNN). Posem en pràctica els nostres algorismes proposats mitjançant un prototip que mostra la viabilitat d’aquest marc. El prototip s’experimenta en un escenari DL real del món per demostrar la viabilitat, l’eficàcia i l’eficiència del nostre enfocament, de manera que hem pogut aconseguir elevades taxes de record i guanys d’eficiència alhora que millorem el consum computacional d’espai i temps mitjançant dues ordres de magnitud mitjançant el nostre es van proposar tècniques de poda anticipada i pre-filtratge en comparació amb tècniques de concordança d’esquemes basades en instàncies clàssiques. Això demostra l'efectivitat dels nostres mètodes automàtics proposats en les tasques de poda inicial i pre-filtratge per a la coincidència d'esquemes holístics i la classificació automàtica del conjunt de dades, tot demostrant també millores en l'anàlisi de dades basades en humans per a les mateixes tasques.
Avec l’énorme croissance de la quantité de données générées par les systèmes d’information, il est courant aujourd’hui de stocker des ensembles de données (datasets) dans leurs formats bruts (c’est-à-dire sans prétraitement ni transformation de données) dans des référentiels de données à grande échelle appelés Data Lakes (DL). Ces référentiels stockent des ensembles de données provenant de domaines hétérogènes (couvrant de nombreux sujets commerciaux) et avec de nombreux schémas différents. Par conséquent, il est difficile pour les data-scientists utilisant les DL pour l’analyse des données de trouver des datasets pertinents pour leurs tâches d’analyse sans aucun support ni gouvernance des données. L’objectif est de pouvoir extraire des métadonnées et des informations sur les datasets stockés dans le DL pour aider le data-scientist à trouver des sources pertinentes. Cela constitue l’objectif principal de cette thèse, où nous explorons différentes techniques de profilage de données, de correspondance holistique de schéma et de recommandation d’analyse pour soutenir le data-scientist. Nous proposons une nouvelle approche basée sur l’intelligence artificielle, spécifiquement l’apprentissage automatique supervisé, pour extraire automatiquement les métadonnées décrivant les datasets, calculer automatiquement les similitudes et les chevauchements de données entre ces ensembles en utilisant des techniques de correspondance holistique de schéma. Les relations entre datasets ainsi extraites sont utilisées pour catégoriser automatiquement les datasets, afin d’aider le data-scientist à trouver des datasets pertinents avec intersection entre leurs données. Cela est fait via une nouvelle technique basée sur les métadonnées appelée proximity mining, qui consomme les métadonnées extraites via des algorithmes de data mining automatisés afin de détecter des datasets connexes et de leur proposer des catégories pertinentes. Nous nous concentrons sur des datasets plats (tabulaires) organisés en rangées d’instances de données et en colonnes d’attributs décrivant les instances. L’approche proposée utilise les quatres principales techniques suivantes: (1) Correspondance de schéma basée sur l’instance pour détecter les éléments de données pertinents entre des datasets hétérogènes, (2) Extraction de métadonnées au niveau du dataset et proximity mining pour détecter les datasets connexes, (3) Extraction de métadonnées au niveau des attributs et proximity mining pour détecter des datasets connexes, et enfin, (4) catégorisation automatique des datasets via des techniques supervisées k-Nearest-Neighbour (kNN). Nous implémentons les algorithmes proposés via un prototype qui montre la faisabilité de cette approche. Nous appliquons ce prototype à une scénario DL du monde réel pour prouver la faisabilité, l’efficacité et l’efficience de notre approche, nous permettant d’atteindre des taux de rappel élevés et des gains d’efficacité, tout en diminuant le coût en espace et en temps de deux ordres de grandeur, via nos techniques proposées d’élagage précoce et de pré-filtrage, comparé aux techniques classiques de correspondance de schémas basées sur les instances. Cela prouve l’efficacité des méthodes automatiques proposées dans les tâches d’élagage précoce et de pré-filtrage pour la correspondance de schéma holistique et la cartegorisation automatique des datasets, tout en démontrant des améliorations par rapport à l’analyse de données basée sur l’humain pour les mêmes tâches.
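A minimal sketch of the final kNN categorization step alone, assuming the earlier profiling and proximity-mining stages have already reduced each dataset to numeric metadata features; the features and labels below are invented:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Invented dataset-level metadata profiles:
# [log10(row count), attribute count, mean overlap with labelled datasets]
X_train = np.array([
    [4.2, 12, 0.81],
    [4.0, 10, 0.77],
    [6.5, 55, 0.12],
    [6.8, 60, 0.09],
])
y_train = ["customer_data", "customer_data", "sensor_data", "sensor_data"]

# kNN in metadata space: datasets with nearby profiles share a category
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

new_profile = np.array([[4.1, 11, 0.79]])  # an uncategorized dataset
print(knn.predict(new_profile))            # -> ['customer_data']
```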
Savalli, Antonino. "Tecniche analitiche per “Open Data”." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17476/.
Bauckmann, Jana, Ziawasch Abedjan, Ulf Leser, Heiko Müller, and Felix Naumann. "Covering or complete? : Discovering conditional inclusion dependencies." Universität Potsdam, 2012. http://opus.kobv.de/ubp/volltexte/2012/6208/.
Full textDatenabhängigkeiten (wie zum Beispiel Integritätsbedingungen), werden verwendet, um die Qualität eines Datenbankschemas zu erhöhen, um Anfragen zu optimieren und um Konsistenz in einer Datenbank sicherzustellen. In den letzten Jahren wurden bedingte Abhängigkeiten (conditional dependencies) vorgestellt, die die Qualität von Daten analysieren und verbessern sollen. Eine bedingte Abhängigkeit ist eine Abhängigkeit mit begrenztem Gültigkeitsbereich, der über Bedingungen auf einem oder mehreren Attributen definiert wird. In diesem Bericht betrachten wir bedingte Inklusionsabhängigkeiten (conditional inclusion dependencies; CINDs). Wir generalisieren die Definition von CINDs anhand der Unterscheidung von überdeckenden (covering) und vollständigen (completeness) Bedingungen. Wir stellen einen Anwendungsfall für solche CINDs vor, der den Nutzen von CINDs bei der Lösung komplexer Datenqualitätsprobleme aufzeigt. Darüber hinaus definieren wir Qualitätsmaße für Bedingungen basierend auf Sensitivität und Genauigkeit. Wir stellen effiziente Algorithmen vor, die überdeckende und vollständige Bedingungen innerhalb vorgegebener Schwellwerte finden. Unsere Algorithmen wählen nicht nur die Werte der Bedingungen, sondern finden auch die Bedingungsattribute automatisch. Abschließend zeigen wir, dass unser Ansatz effizient sinnvolle und hilfreiche Ergebnisse für den vorgestellten Anwendungsfall liefert.
Alves, Luiz Gustavo Pacola. "CollaboraTVware: uma infra-estrutura ciente de contexto para suporte a participação colaborativa no cenário da TV Digital Interativa." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-20072009-114734/.
The advent of Interactive Digital TV around the world is transforming the experience of watching TV, making it richer mainly by enabling user interactivity. Users become proactive and begin to interact in very different ways: building virtual communities, discussing content, sending messages and recommendations, etc. In this scenario, collaborative user participation assumes an important and essential role. Additionally, reception in Interactive Digital TV is handled by devices that, due to digital convergence, are increasingly present in ubiquitous environments. Another issue to consider, resulting from this medium, is the growing number and diversity of available programs and interactive services, which increases the difficulty of selecting relevant content. Thus, the main objective of this work is to propose and implement a software infrastructure for the Interactive Digital Television environment, entitled CollaboraTVware, to transparently guide users in the choice of programs and interactive services through the collaborative participation of other users with similar profiles and contexts. In the scope of this work, collaborative participation corresponds to the ratings given by users to express opinions about transmitted content. The modeling of the user, the device used, and the context of user interaction, essential for the development of CollaboraTVware, is represented by granular metadata standards used in the field of Interactive Digital TV (MPEG-7, MPEG-21 and TV-Anytime) and the necessary extensions of them. The CollaboraTVware architecture is composed of two subsystems: the user device and the service provider. The classification task, from the theory of data mining, is the approach adopted in the infrastructure design. The concept of a participative usage profile is presented and discussed. To demonstrate the functionality in a usage scenario, an application (a collaborative EPG) was developed as a case study using CollaboraTVware.
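CollaboraTVware adopts classification from data mining; as a hedged sketch of that design choice (features, labels, and values invented, not the thesis's model), a classifier predicting whether a user would rate a program positively from profile and context attributes:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Invented training data: (genre, device, daypart) -> past user rating class
X = [
    ["news",   "tv",     "evening"],
    ["sports", "mobile", "evening"],
    ["news",   "tv",     "morning"],
    ["movies", "tv",     "night"],
]
y = ["liked", "liked", "disliked", "liked"]

model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(random_state=0))
model.fit(X, y)

# Recommend (or not) an unseen program for a given viewing context
print(model.predict([["sports", "tv", "evening"]]))
```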
Raad, Elie. "Découverte des relations dans les réseaux sociaux." PhD thesis, Université de Bourgogne, 2011. http://tel.archives-ouvertes.fr/tel-00702269.
Full textMoreux, Jean-Philippe, and Guillaume Chiron. "Image Retrieval in Digital Libraries: A Large Scale Multicollection Experimentation of Machine Learning techniques." Sächsische Landesbibliothek - Staats- und Universitätsbibliothek Dresden, 2017. https://slub.qucosa.de/id/qucosa%3A16444.
Historically, heritage digital libraries were first populated with images, but they quickly took advantage of OCR technology to index printed collections and improve the scope and performance of the information retrieval services offered to users. Access to iconographic resources, however, has not enjoyed the same progress, and these resources remain in the shadows: manual indexing that is patchy, heterogeneous, and unsustainable at scale; documentary silos by iconographic genre; and content-based image retrieval (CBIR) that is still barely operational on heritage collections. Yet today it would be possible to make better use of these resources, in particular by exploiting the enormous volumes of OCR produced over the last two decades (both as a textual descriptor and for the automatic identification of printed illustrations), and thus to showcase these engravings, drawings, photographs, maps, etc., both for their own value and as entry points into the collections, encouraging discovery and movement from document to document and from collection to collection. This article describes an ETL (extract-transform-load) approach applied to the images of an encyclopedic digital library: identify and extract the iconography wherever it is found (in image collections but also in printed matter: press, journals, monographs); transform, harmonize, and enrich its descriptive metadata using machine learning techniques for automatic classification and indexing; and load these data into a web application dedicated to iconographic search (or into other library services). The approach can be called pragmatic in two respects, since it both gives value to existing digital resources and makes use of (almost) mature technologies.
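A rough sketch of the transform/enrich step alone (the article's pipeline is much larger), assuming OCR-derived captions and a few manually labelled genres, with all strings invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented OCR captions with manually assigned iconographic genres
captions = [
    "portrait of the author, engraving",
    "map of the city and surrounding provinces",
    "photograph of the exhibition hall",
    "engraved plate showing botanical specimens",
]
genres = ["engraving", "map", "photograph", "engraving"]

# Text -> TF-IDF features -> genre label, used to enrich image metadata
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(captions, genres)
print(clf.predict(["engraving of a country landscape"]))  # -> ['engraving']
```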
Engvall, Tove. "Nyckeln till arkiven : En kritisk diskursanalytisk studie om interoperabilitet och kollektivt minne." Thesis, Mittuniversitetet, Avdelningen för arkiv- och datavetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-21171.
Full textLangenberg, Tristan Matthias [Verfasser], Florentin [Akademischer Betreuer] Wörgötter, Florentin [Gutachter] Wörgötter, Carsten [Gutachter] Damm, Wolfgang [Gutachter] May, Jens [Gutachter] Grabowski, Stephan [Gutachter] Waack, and Minija [Gutachter] Tamosiunaite. "Deep Learning Metadata Fusion for Traffic Light to Lane Assignment / Tristan Matthias Langenberg ; Gutachter: Florentin Wörgötter, Carsten Damm, Wolfgang May, Jens Grabowski, Stephan Waack, Minija Tamosiunaite ; Betreuer: Florentin Wörgötter." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2019. http://d-nb.info/1191989100/34.
Full textΓιαννακούδη, Θεοδούλα. "Προηγμένες τεχνικές και αλγόριθμοι εξόρυξης γνώσης για την προσωποποίηση της πρόσβασης σε δικτυακούς τόπους." 2005. http://nemertes.lis.upatras.gr/jspui/handle/10889/151.
Web personalization is a domain which has gained great momentum not only in research, where many research units have addressed the problem from different perspectives, but also in industry, where a variety of modules for the personalization process is available. The objective is to mine the information hidden in web server log files in order to discover the interactions between web site visitors and web site pages. This information can be further exploited for web site optimization, ensuring more effective navigation for the user and, in the industrial case, client retention. A primary step before personalization is web usage mining, where the knowledge hidden in the log files is revealed. Web usage mining is the procedure whereby the information stored in the web server logs is processed by applying statistical and data mining techniques such as clustering, association rules discovery, classification, and sequential pattern discovery, in order to reveal useful patterns that can be further analyzed. Recently, there has been an effort to incorporate web content in the web usage mining process in order to enhance the effectiveness of personalization. This thesis focuses on the domain of knowledge mining for web site usage and how this procedure can take advantage of attributes of the Semantic Web. Initially, techniques and algorithms that have recently been proposed in the field of web usage mining are presented. Then, the role of content in the usage mining process is introduced and two relevant works are presented: a usage mining technique based on the PLSA model, which may integrate attributes of the site content, and a personalization system which uses the site content to enhance a recommendation engine. After analyzing the usage mining domain theoretically, a new system is proposed, ORGAN, whose name derives from Ontology-oRiented usaGe ANalysis. ORGAN concerns the stage of log file analysis and the domain of knowledge mining for web site usage based on the semantic attributes of the web site. These semantic attributes have resulted from applying data mining techniques to the web site pages and have been annotated with an OWL ontology. ORGAN provides an interface for submitting queries concerning the average level of visitation and the semantics of the web site pages, exploiting the knowledge about the site as derived from the ontology. The design, development, and experimental evaluation of the system are described in detail.
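Before any of the mining described above can run, server logs must be parsed and grouped into sessions; a minimal sketch of that preprocessing, with invented Common Log Format lines and crude IP-based grouping standing in for real sessionization:

```python
import re
from collections import defaultdict

# Invented Common Log Format lines; real input would be web server logs
LOG_LINES = [
    '10.0.0.1 - - [01/Jan/2005:10:00:01 +0200] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [01/Jan/2005:10:00:40 +0200] "GET /courses.html HTTP/1.0" 200 2300',
    '10.0.0.2 - - [01/Jan/2005:10:01:05 +0200] "GET /index.html HTTP/1.0" 200 1043',
]
PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) [^"]+" \d{3} \d+')

# Group page requests by visitor IP as a stand-in for session reconstruction
sessions = defaultdict(list)
for line in LOG_LINES:
    m = PATTERN.match(line)
    if m:
        ip, page = m.groups()
        sessions[ip].append(page)

for ip, pages in sessions.items():
    print(ip, "->", pages)  # per-visitor page sequences for later mining
```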
Schöneberg, Hendrik. "Semiautomatische Metadaten-Extraktion und Qualitätsmanagement in Workflow-Systemen zur Digitalisierung historischer Dokumente." Doctoral thesis, 2014. https://nbn-resolving.org/urn:nbn:de:bvb:20-opus-104878.
The extraction of metadata from historical documents is a time-consuming, complex, and highly error-prone activity that usually has to be performed by a human expert. It is nevertheless necessary in order to establish links between documents, to answer search queries about historical events correctly, or to build semantic connections. To reduce the manual effort of this task, named entity recognition techniques are to be applied. The classification of terms in historical manuscripts, however, poses a major challenge, because the domain exhibits a high variance in spelling due to, among other things, orthography agreed upon only by convention. This work presents methods that can operate even in complex syntactic environments by drawing on information from the context of the terms to be classified and combining it with domain-specific heuristics. Furthermore, it evaluates how the metadata obtained in this way can be used in workflow systems for the digitization of historical manuscripts to add value through heuristics for detecting production errors.
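A deliberately simple sketch of the context-heuristic idea (not the thesis's method): classify capitalized tokens using their immediate left context and small hypothetical cue lists:

```python
# Hypothetical context cues; the thesis combines far richer context
# information with domain-specific heuristics for historical spelling.
PLACE_CUES = {"in", "near", "at"}
PERSON_TITLES = {"abbot", "duke", "bishop"}

def classify_terms(tokens):
    """Label capitalized tokens as PLACE or PERSON from their left neighbour."""
    labels = []
    for i, tok in enumerate(tokens):
        if tok[0].isupper() and i > 0:
            prev = tokens[i - 1].lower()
            if prev in PLACE_CUES:
                labels.append((tok, "PLACE"))
            elif prev in PERSON_TITLES:
                labels.append((tok, "PERSON"))
    return labels

text = "the abbot Heinricus founded a chapel near Wirzeburc"
print(classify_terms(text.split()))
# -> [('Heinricus', 'PERSON'), ('Wirzeburc', 'PLACE')]
```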