Dissertations / Theses on the topic 'Matching Schemes'

Consult the top 50 dissertations / theses for your research on the topic 'Matching Schemes.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Lie, Chin Cheong Patrick. "Geometrically constrained matching schemes." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=39316.

Full text
Abstract:
We present an effective method for solving different types of noisy pattern matching problems in Euclidean space. The matching is performed in either a least-squares or a mixed-norm sense under the constraint that a transformation matrix $\Theta$ is restricted to belong to the orthogonal group. Matching problems of this type can be recast as function optimization problems which can be solved by representing the orthogonal group to which $\Theta$ belongs as a Lie group and then investigating the gradient vector field associated with the function to be optimized. The projection of the gradient field onto the tangent space of the Lie group at $\Theta$, i.e., the Lie algebra, results in a descent/ascent equation for the function. The descent/ascent equation so obtained is used in a classical steepest-descent/ascent algorithm and a singular value decomposition-based recursive method in order to determine the maximum or minimum point of the function under consideration. Since $\Theta$ belongs to the orthogonal group which includes the group of permutations as a subgroup, the proposed procedure works not only for patterns consisting of ordered feature points, but also for the combinatorial problem involving patterns having unordered feature points. Generalizations of the matching problem are also formulated and include the matching of patterns from Euclidean spaces of different dimensions and the matching of patterns having unequal numbers of feature points from the same Euclidean space. Simulations are performed which demonstrate the effectiveness and the efficiency of the proposed approach in solving some practical matching problems which arise in computer vision and pattern analysis.
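For readers unfamiliar with this class of problems, the sketch below shows the closed-form, SVD-based solution to the basic orthogonal Procrustes problem (least-squares fitting of an orthogonal matrix to two point patterns). It is only a minimal illustration of the kind of matching the thesis studies, not the author's Lie-group gradient method; the function and variable names are our own.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Least-squares fit of an orthogonal matrix Theta minimizing ||Theta @ A - B||_F.

    A, B: d x n arrays of corresponding feature points (one point per column).
    Returns Theta in the orthogonal group O(d).
    """
    U, _, Vt = np.linalg.svd(B @ A.T)
    return U @ Vt

# Toy usage: recover a random orthogonal transformation from noisy correspondences.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # ground-truth orthogonal matrix
A = rng.standard_normal((3, 50))
B = Q @ A + 0.01 * rng.standard_normal((3, 50))    # noisy transformed pattern
Theta = orthogonal_procrustes(A, B)
print(np.allclose(Theta, Q, atol=0.05))            # True: the estimate is close to Q
```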
APA, Harvard, Vancouver, ISO, and other styles
2

Elgedawy, Islam Moukhtar, and islam_elgedawy@yahoo.com.au. "Correctness-Aware High-Level Functional Matching Approaches For Semantic Web Services." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20070511.162143.

Full text
Abstract:
Existing service matching approaches trade precision for recall, creating the need for humans to choose the correct services, which is a major obstacle for automating the service matching and the service aggregation processes. To overcome this problem, the matchmaker must automatically determine the correctness of the matching results according to the defined users' goals. That is, only the service(s) achieving the users' goals are considered correct. This requires the high-level functional semantics of services, users, and application domains to be captured in a machine-understandable format. This also requires the matchmaker to determine the achievement of users' goals without invoking the services. We propose the G+ model to capture the high-level functional specifications of services and users (namely goals, achievement contexts and external behaviors), providing the basis for automated goal-achievement determination; we also propose the concepts substitutability graph to capture the application domains' semantics. To avoid the false negatives resulting from adopting existing constraint and behavior matching approaches during service matching, we also propose new constraint and behavior matching approaches to match constraints with different scopes, and behavior models with different numbers of state transitions. Finally, we propose two correctness-aware matching approaches (direct and aggregate) that semantically match and aggregate semantic web services according to their G+ models, providing the required theoretical proofs and the corresponding verifying simulation experiments.
APA, Harvard, Vancouver, ISO, and other styles
3

Zhang, Nan. "TRANSFORM BASED AND SEARCH AWARE TEXT COMPRESSION SCHEMES AND COMPRESSED DOMAIN TEXT RETRIEVAL." Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3938.

Full text
Abstract:
In recent times, we have witnessed an unprecedented growth of textual information via the Internet, digital libraries and archival text in many applications. While a good fraction of this information is of transient interest, useful information of archival value will continue to accumulate. We need ways to manage, organize and transport this data from one point to the other on data communications links with limited bandwidth. We must also have means to speedily find the information we need from this huge mass of data. Sometimes, a single site may also contain large collections of data such as a library database, thereby requiring an efficient search mechanism even to search within the local data. To facilitate information retrieval, an emerging ad hoc standard for uncompressed text is XML, which preprocesses the text by adding user-defined metadata such as DTDs or hyperlinks to enable searching with better efficiency and effectiveness. This increases the file size considerably, underscoring the importance of applying text compression. On account of efficiency (in terms of both space and time), there is a need to keep the data in compressed form for as long as possible. Text compression is concerned with techniques for representing digital text data in alternate representations that take less space. Not only does it help conserve storage space for archival and online data, it also helps system performance by requiring fewer secondary storage (disk or CD-ROM) accesses and improves network transmission bandwidth utilization by reducing the transmission time. Unlike static images or video, there is no international standard for text compression, although compressed formats like .zip, .gz and .Z files are increasingly being used. In general, data compression methods are classified as lossless or lossy. Lossless compression allows the original data to be recovered exactly. Although used primarily for text data, lossless compression algorithms are useful in special classes of images such as medical imaging, fingerprint data, astronomical images and databases containing mostly vital numerical data, tables and text information. Many lossy algorithms use lossless methods at the final stage of encoding, underscoring the importance of lossless methods for both lossy and lossless compression applications. In order to effectively utilize the full potential of compression techniques for future retrieval systems, we need efficient information retrieval in the compressed domain. This means that techniques must be developed to search the compressed text without decompression or with only partial decompression, independent of whether the search is done on the text or on some inversion table corresponding to a set of key words for the text. In this dissertation, we make the following contributions: (1) Star family compression algorithms: We have proposed an approach to develop a reversible transformation that can be applied to a source text to improve existing algorithms' ability to compress it. We use a static dictionary to convert English words into predefined symbol sequences. These transformed sequences create additional context information that is superior to the original text. Thus we achieve some compression at the preprocessing stage. We have a series of transforms which improve the performance. The star transform requires a static dictionary of a certain size.
To avoid the considerable complexity of conversion, we employ a ternary tree data structure that efficiently converts the words in the text to the words in the star dictionary in linear time. (2) Exact and approximate pattern matching in Burrows-Wheeler transformed (BWT) files: We propose a method to extract the useful context information in linear time from the BWT-transformed text. The auxiliary arrays obtained from the BWT inverse transform yield logarithmic search time. Meanwhile, approximate pattern matching can be performed based on the results of exact pattern matching to extract the possible candidates for the approximate pattern matching. A fast verification algorithm can then be applied to those candidates, which may be just small parts of the original text. We present algorithms for both k-mismatch and k-approximate pattern matching in BWT-compressed text. A typical compression system based on BWT has Move-to-Front and Huffman coding stages after the transformation. We propose a novel approach to replace the Move-to-Front stage in order to extend compressed-domain search capability all the way to the entropy coding stage. A modification to the Move-to-Front makes it possible to randomly access any part of the compressed text without referring to the part before the access point. (3) Modified LZW algorithm that allows random access and partial decoding for compressed text retrieval: Although many compression algorithms provide good compression ratios and/or time complexity, LZW was the first one studied for compressed pattern matching because of its simplicity and efficiency. Modifications to the LZW algorithm provide the extra advantages of fast random access and partial decoding that are especially useful for text retrieval systems. Based on this algorithm, we can provide a dynamic hierarchical semantic structure for the text, so that the text search can be performed at the expected level of granularity. For example, the user can choose to retrieve a single line, a paragraph, or a file that contains the keywords. More importantly, we show that parallel encoding and decoding algorithms are trivial with the modified LZW. Both encoding and decoding can be performed easily with multiple processors, and the encoding and decoding processes are independent of the number of processors.
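For illustration, the sketch below shows exact pattern matching directly on a Burrows-Wheeler transformed string using the standard backward-search idea (counting occurrences from the C and Occ tables). It is a naive, uncompressed toy in Python assuming a sentinel-terminated text; it is not the dissertation's k-mismatch/k-approximate algorithms or its modified Move-to-Front stage.

```python
from bisect import bisect_left

def bwt(text):
    """Burrows-Wheeler transform of text terminated by a sentinel '\x00'."""
    text += "\x00"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(bwt_text, pattern):
    """Count occurrences of pattern in the original text using backward search."""
    first_col = sorted(bwt_text)
    # C[c]: number of characters in the text strictly smaller than c.
    C = {c: bisect_left(first_col, c) for c in set(bwt_text)}
    # Occ(c, i): occurrences of c in bwt_text[:i] (computed naively for clarity).
    def occ(c, i):
        return bwt_text[:i].count(c)
    lo, hi = 0, len(bwt_text)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

b = bwt("mississippi")
print(backward_search(b, "ssi"))  # 2 occurrences
print(backward_search(b, "ppi"))  # 1 occurrence
```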
Ph.D.
School of Computer Science
Engineering and Computer Science
Computer Science
APA, Harvard, Vancouver, ISO, and other styles
4

ARATA, LINDA. "Il Ruolo dei Programmi Agro-ambientali: un'analisi attraverso il Propensity Score Matching e la Programmazione Matematica Positiva con il Rischio." Doctoral thesis, Università Cattolica del Sacro Cuore, 2014. http://hdl.handle.net/10280/2469.

Full text
Abstract:
La crescente attenzione riguardo l’interconnessione tra agricoltura e aspetti ambientali così come la crescita di volatilità dei prezzi dei prodotti agricoli ha posto una nuova enfasi sull’introduzione di misure ambientali nella politiche agricole e sulla ricerca di nuovi strumenti di stabilizzazione del reddito degli agricoltori. La ricerca di questa tesi di dottorato si inserisce in questo contesto e analizza i contratti agro-ambientali, misure della Politica Agricola Comunitaria (PAC) in Unione Europea (UE), sotto una duplice prospettiva. Il primo lavoro di ricerca consiste in un’analisi degli effetti dell’adesione a tali contratti sulle scelte produttive e sulle perfomance economiche degli agricoltori in cinque Paesi dell’UE. I risultati indicano un’eterogeneità di questi effetti: in alcuni Paesi i contratti agro-ambientali sembrano essere più efficaci nel promuovere pratiche agricole sostenibili, così come in alcuni Paesi il pagamento compensativo agro-ambientale sembra non essere sufficiente a compensare la perdita di reddito dei partecipanti. Questo studio è stato condotto combinando il Propensity Score Matching con lo stimatore Difference-in-Differences. Il secondo lavoro di ricerca sviluppa una nuova proposta metodologica che incorpora il rischio in un framework di Programmazione Matematica Positiva (PMP). Il modello elaborato presenta caratteri innovativi rispetto alla letteratura sull’argomento e permette di stimare simultaneamente i prezzi ombra delle risorse, la funzione di costo non lineare dell’azienda agricola e un coefficiente di avversione al rischio specifico per ciascuna azienda. Il modello è stato applicato a tre campioni di aziende e i risultati delle stime testano la calibrazione del modello e indicano valori del coefficiente di avversione al rischio coerenti con la letteratura. Infine il modello è stato impiegato nella simulazione di diversi scenari al fine di verificare il ruolo potenziale di un contratto agro-ambientale come strumento di gestione del rischio a diversi livelli di volatilità dei prezzi agricoli.
The increasing attention to the relationship between agriculture and the environment and the rise in price volatility on agricultural markets have led to a new emphasis on agri-environmental policies as well as to a search for new risk management strategies for the farmer. The research objective of this PhD thesis is in line with this challenging context, since it provides an analysis of the EU agri-environmental schemes (AESs) from two viewpoints. First, an ex-post analysis aims at investigating the AESs for their traditional role as measures which encourage sustainable farming while compensating the farmer for the income foregone in five EU Member States. The effects of AES participation on farmers' production plans and economic performance differ widely across Member States, and in some of them the environmental payment is not enough to compensate for the income foregone by participants. This study has been performed by applying a semi-parametric technique which combines a Difference-in-Differences estimator with a Propensity Score Matching estimator. The second piece of research develops a new methodological proposal to incorporate risk into a farm-level Positive Mathematical Programming (PMP) model. The model presents some innovations with respect to the previous literature and estimates simultaneously the resource shadow prices, the farm non-linear cost function and a farm-specific coefficient of absolute risk aversion. The proposed model has been applied to three farm samples and the estimation results confirm the calibration ability of the model and show values for risk aversion coefficients consistent with the literature. Finally, different scenarios have been simulated to test the potential role of an AES as a risk management tool under different levels of crop price volatility.
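As a rough illustration of the estimation strategy named in the abstract (Propensity Score Matching combined with a Difference-in-Differences estimator), the following Python sketch applies the two steps to synthetic data. All variables, coefficients and numbers are invented for the example; the thesis's actual farm-level data and model specification are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal((n, 3))                         # hypothetical farm covariates
treated = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))   # participation depends on covariates
beta = np.array([1.0, 0.5, -0.5])
effect = 2.0                                             # true effect on the treated
y_before = x @ beta + rng.standard_normal(n)
y_after = y_before + 1.0 + effect * treated + rng.standard_normal(n)

# 1) Estimate propensity scores for participation.
ps = LogisticRegression().fit(x, treated).predict_proba(x)[:, 1]

# 2) 1:1 nearest-neighbour matching on the propensity score (with replacement).
t_idx, c_idx = np.where(treated == 1)[0], np.where(treated == 0)[0]
matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]

# 3) Difference-in-differences on the matched sample.
did = ((y_after[t_idx] - y_before[t_idx]) - (y_after[matches] - y_before[matches])).mean()
print(f"DiD estimate on matched sample: {did:.2f} (true effect {effect})")
```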
APA, Harvard, Vancouver, ISO, and other styles
5

Van, der Merwe Nick. "Development of an image matching scheme using feature- and area based matching techniques." Doctoral thesis, University of Cape Town, 1995. http://hdl.handle.net/11427/21341.

Full text
Abstract:
Image matching is widely considered to be one of the most difficult tasks of a digital photogrammetric system. Traditionally image matching has been approached from either an area based or a feature based point of view. In recent years significant progress has been made in Area Based Matching (ABM) techniques such as Multiphoto Geometrically Constrained Least Squares Matching. Also in the field of Feature Based Matching (FBM) improvements have been made in extracting and matching image features, using for example the Forstner Operator followed by feature matching. Generally, area- and feature based matching techniques have been developed independently from each other. The aim of this research project was to design an automated image matching scheme that combines aspects of Feature Based Matching (FBM) and Area Based Matching (ABM). The reason for taking a hybrid approach is to encapsulate only the advantages of each matching scheme while cancelling out the disadvantages. The approach taken was to combine traditional aspects of ABM in digital photogrammetry with image analysis techniques found more commonly in the area of image processing and specifically machine vision.
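The sketch below illustrates, in simplified form, the hybrid idea of refining a feature-point correspondence with an area-based similarity score (normalized cross-correlation over a search window along the same row). It is a generic toy example on synthetic images, not the matching scheme developed in the thesis.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_feature(left, right, row, col, half=7, search=20):
    """Area-based refinement of a feature point (row, col) detected in the left image:
    slide the surrounding patch along the same row of the right image and keep the
    column with the highest NCC score."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1]
    best_col, best_score = None, -1.0
    for c in range(max(half, col - search), min(right.shape[1] - half, col + search + 1)):
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(patch, cand)
        if score > best_score:
            best_col, best_score = c, score
    return best_col, best_score

# Toy usage: the right image is the left image shifted 5 pixels horizontally.
rng = np.random.default_rng(2)
left = rng.random((100, 100))
right = np.roll(left, 5, axis=1)
print(match_feature(left, right, row=50, col=40))   # expected column ~45, score ~1.0
```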
APA, Harvard, Vancouver, ISO, and other styles
6

Liu, Yau-Jr. "Marital-property scheme, marriage promotion and matching market equilibrium." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/5856.

Full text
Abstract:
Thesis (Ph. D.)--University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file viewed on March 5, 2007. Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
7

Saleem, Khalid. "Schema Matching and Integration in Large Scale Scenarios." Montpellier 2, 2008. http://www.theses.fr/2008MON20126.

Full text
Abstract:
Le besoin d'intégrer et d'analyser des grands ensembles de données issus des bases de données publiées sur le web est présent dans de nombreux domaines d'applications comme la génomique, l'environnement, la médecine et le commerce électronique. Ces données sont, après intégration, utilisées pour prendre des décisions, des échanges de services, etc. Les outils existants pour la découverte de correspondances (appelés matchers) permettent de traiter les schémas deux par deux et nécessitent l'intervention d'un expert afin de garantir une bonne qualité des correspondances. Dans un scénario de large échelle, ces approches ne sont plus pertinentes et sont voire même infaisables à cause du nombre important de schémas et de leur taille (de l'ordre d'un millier d'éléments). Il est donc nécessaire d'automatiser la découverte de correspondances. Cependant, une méthode automatique doit préserver la qualité des correspondances et garantir des performances acceptables si l'on veut qu'elle soit utilisable. Nous avons développé des méthodes qui passent à l'échelle et qui réaliseront une découverte automatique. Nous avons proposé une méthode PORSCHE (Performance ORiented SCHEma Mediation) qui permet d'intégrer plusieurs schémas simultanément et de fournir un schéma médiateur. Cette méthode utilise un algorithme basé sur la fouille d'arbres (tree mining) et a été implémentée et expérimentée sur un grand nombre de schémas disponibles sur le web. Le Web sémantique est fortement dépendant du paradigme XML, qui suit une structure hiérarchique. Par ailleurs, l'utilisation d'ontologie se développe fortement. Nous nous intéressons à la construction d'ontologie à partir de schemas XML disponible sur le web. Nous proposons une approche automatique pour modéliser la sémantique émergente des ontologies. C'est une méthode collaborative pour la construction d'ontologie sans l'interaction directe avec les utilisateurs du domaine, des experts ou des développeurs. Une des caractéristiques très importante d'une ontologie est sa structure hiérarchique des concepts. Nous considérons des grands ensembles de schémas pour un domaine spécifique comme étant des arbres et leur appliquons des algorithmes d'extraction de sous-arbres fréquents pour découvrir des motifs (patterns) hiérarchiques en vue de construire une ontologie. Nous présentons un technique pour découvrir et proposer des correspondances complexes entre deux schemas. Ces correspondances sont ensuite validées à l'aide des mini-taxonomies qui sont les sous-arbres fréquents. La technique démontre une fois de plus la construction de la taxonomie ontologie de domaine. À cet égard, nous considérons le plus grand arbre ou un arbre créé par la fusion de l'ensemble des plus grands souvent sous-arbres comme étant une taxonomie. Nous plaidons en faveur de la confiance d'une telle taxonomie et des concepts associés car elle a été extraite à partir des schémas utilisés dans le domaine spécifié considéré
Semantic matching of schemas in heterogeneous data sharing systems is time-consuming and error-prone. The dissertation presents a new robust automatic method which integrates a large set of domain-specific schemas, represented as tree structures, based upon semantic correspondences among them. The method also creates the mappings from source schemas to the integrated schema. Existing mapping tools employ semi-automatic techniques for mapping two schemas at a time. In a large-scale scenario, where data sharing involves a large number of data sources, such techniques are not suitable. Semi-automatic matching requires user intervention to finalize a certain mapping. Although it provides the flexibility to compute the best possible mapping, it slows down the whole matching process in terms of time performance. First, the dissertation gives a detailed discussion of the state of the art in schema matching. We summarize the deficiencies of the currently available tools and techniques in meeting the requirements of large-scale schema matching scenarios. Our approach, PORSCHE (Performance ORiented SCHEma Mediation), is juxtaposed to these shortcomings and its advantages are highlighted with sound experimental support. The algorithms associated with PORSCHE first cluster the tree nodes based on linguistic label similarity. Then, they apply a tree mining technique using node ranks calculated during depth-first traversal. This minimises the target node search space and improves time performance, which makes the technique suitable for large-scale data sharing. PORSCHE implements a hybrid approach, which also, in parallel, incrementally creates an integrated schema encompassing all schema trees, and defines mappings from the contributing schemas to the integrated schema. The approach discovers 1:1 mappings for integration and mediation purposes. Formal experiments on real and synthetic data sets show that PORSCHE is scalable in time performance for large-scale scenarios. The quality of the mappings and the integrity of the integrated schema are also verified by the experimental evaluation. Moreover, we present a technique for discovering complex match (1:n, n:1 and n:m) propositions between two schemas, validated by mini-taxonomies. These mini-taxonomies are extracted from the large set of domain-specific metadata instances represented as tree structures. We propose a framework, called ExSTax (Extracting Structurally Coherent Mini-Taxonomies), based on frequent sub-tree mining, to support our idea. We further extend the ExSTax framework to extract a reliable domain-specific taxonomy.
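As a toy illustration of the first PORSCHE step mentioned above (clustering tree nodes by linguistic label similarity), the following sketch groups schema-element labels with a simple string-similarity measure. The labels, threshold and greedy clustering strategy are invented for the example and do not reproduce the thesis's algorithms.

```python
from difflib import SequenceMatcher

def label_similarity(a, b):
    """Simple linguistic similarity between two element labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_labels(labels, threshold=0.6):
    """Greedy single-pass clustering: a label joins the first cluster whose
    representative it resembles closely enough, otherwise it starts a new one."""
    clusters = []
    for label in labels:
        for cluster in clusters:
            if label_similarity(label, cluster[0]) >= threshold:
                cluster.append(label)
                break
        else:
            clusters.append([label])
    return clusters

# Hypothetical node labels drawn from several book-domain schema trees.
labels = ["author", "authors", "writer", "title", "book_title", "isbn", "ISBN"]
print(cluster_labels(labels))
```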
APA, Harvard, Vancouver, ISO, and other styles
8

Do, Hong-Hai. "Schema matching and mapping based data integration architecture, approaches and evaluation." Saarbrücken VDM, Müller, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2863983&prov=M&dok_var=1&dok_ext=htm.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Vojíř, Stanislav. "Mapování PMML a BKEF dokumentů v projektu SEWEBAR-CMS." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-75744.

Full text
Abstract:
In the data mining process, it is necessary to prepare the source dataset - for example, to select the cutting or grouping of continuous data attributes - and to use knowledge from the problem area. Such a preparation process can be guided by background (domain) knowledge obtained from experts. In the SEWEBAR project, we collect the knowledge from experts in a rich XML-based representation language, called BKEF, using a dedicated editor, and save it into the database of our custom-tailored (Joomla!-based) CMS system. Data mining tools are then able to generate, from this dataset, mining models represented in the standardized PMML format. It is then necessary to map a particular column (attribute) from the dataset (in PMML) to a relevant 'metaattribute' of the BKEF representation. This specific type of schema mapping problem is addressed in my thesis in terms of algorithms for automatically suggesting mappings from columns to metaattributes and from the values of these columns to BKEF 'metafields'. Manual corrections of this mapping by the user are also supported. The implementation is based on the PHP language and was tested on datasets with information about courses taught at 5 universities in the U.S.A. from the Illinois Semantic Integration Archive. On these datasets, the auto-mapping suggestion process achieved a precision of about 70% and a recall of about 77% on unknown columns, but when mapping previously user-mapped data (using the implemented learning module), the recall is between 90% and 100%.
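The following sketch illustrates the general shape of such an auto-mapping suggestion: each (column, metaattribute) pair is scored by a mix of name similarity and overlap between observed column values and the metaattribute's metafields. It is written in Python for brevity (the thesis's implementation is in PHP), and all names, weights and values are hypothetical.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_overlap(column_values, metafield_values):
    """Jaccard overlap between observed column values and the metafields
    (allowed values) declared for a BKEF metaattribute."""
    col, meta = set(column_values), set(metafield_values)
    return len(col & meta) / len(col | meta) if col | meta else 0.0

def suggest_mapping(column, values, metaattributes, w_name=0.5, w_values=0.5):
    """Rank candidate metaattributes for one dataset column."""
    scores = {
        name: w_name * name_similarity(column, name) + w_values * value_overlap(values, fields)
        for name, fields in metaattributes.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical PMML column and BKEF metaattributes (names and metafields are made up).
metaattributes = {
    "CourseLevel": ["introductory", "intermediate", "advanced"],
    "Department": ["mathematics", "computer science", "physics"],
}
print(suggest_mapping("course_level", ["advanced", "introductory"], metaattributes))
```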
APA, Harvard, Vancouver, ISO, and other styles
10

Tao, Cui. "Schema Matching and Data Extraction over HTML Tables." Diss., Brigham Young University, 2003. http://contentdm.lib.byu.edu/ETD/image/etd279.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Karagoz, Funda. "Application Of Schema Matching Methods To Semantic Web Service Discovery." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12607593/index.pdf.

Full text
Abstract:
The Web is turning into a collection of services that interoperate through the Internet. As the number of services increases, it is getting more and more difficult for users to find, filter and integrate these services depending on their requirements. Automatic techniques are being developed to fulfill these tasks. The first step toward automatic composition is the discovery of the services needed. UDDI, which is one of the accepted web standards, provides a registry of web services. However, the representation capabilities of UDDI are insufficient to search for services on the basis of what they provide. Semantic web initiatives like OWL and OWL-S are promising for locating exact services based on their capabilities. In this thesis, a new semantic service discovery mechanism is implemented based on OWL-S service profiles. The service profiles of an advertisement and a request are matched based on the OWL ontologies describing them. In contrast to previous work on the subject, the ontologies of the advertisement and the request are not assumed to be the same. In case they are different, schema matching algorithms are applied. Schema matching algorithms find the mappings between the given schema models. A hybrid combination of semantic, syntactic and structural schema matching algorithms is applied to match the ontologies.
APA, Harvard, Vancouver, ISO, and other styles
12

Riaz, Muhammad Atif, and Sameer Munir. "An Instance based Approach to Find the Types of Correspondence between the Attributes of Heterogeneous Datasets." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-1938.

Full text
Abstract:
Context: Determining attribute correspondence is the most important, time-consuming and knowledge-intensive part of database integration. It is also used in other data manipulation applications such as data warehousing, data design, the semantic web and e-commerce. Objectives: In this thesis the aim is to investigate how to find the types of correspondence between the attributes of heterogeneous datasets when the schema design information of the datasets is unknown. Methods: A literature review was conducted to extract the knowledge related to the approaches that are used to find the correspondence between the attributes of heterogeneous datasets. The knowledge extracted from the literature review is used in developing an instance-based approach for finding types of correspondence between the attributes of heterogeneous datasets when schema design information is unknown. To validate the proposed approach, an experiment was conducted in a real environment using data provided by the telecom industry (Ericsson, Karlskrona). Evaluation of the results was carried out using the well-known and most widely used measures from the information retrieval field: precision, recall and F-measure. Results: To find the types of correspondence between the attributes of heterogeneous datasets, good results depend on the ability of the algorithm to avoid unmatched pairs of rows during the Row Similarity Phase. An evaluation of the proposed approach is performed via experiments. We obtained an F-measure of 96.7% (average of three experiments). Conclusions: The analysis showed that the proposed approach was feasible to use and provided users a means to find the corresponding attributes and the types of correspondence between corresponding attributes, based on the information extracted from similar pairs of rows from the heterogeneous datasets, where their similarity is based on the same common primary key values.
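For reference, the evaluation measures mentioned above (precision, recall and F-measure over suggested attribute correspondences) can be computed as in the sketch below. The attribute pairs are invented; they do not come from the Ericsson data used in the thesis.

```python
def precision_recall_f1(suggested, gold):
    """Precision, recall and F-measure of suggested attribute correspondences
    against a manually verified gold standard (both given as sets of pairs)."""
    suggested, gold = set(suggested), set(gold)
    tp = len(suggested & gold)
    precision = tp / len(suggested) if suggested else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical correspondences between attributes of two telecom datasets.
gold = {("cell_id", "CellIdentifier"), ("start_time", "PeriodStart"), ("traffic", "LoadKb")}
suggested = {("cell_id", "CellIdentifier"), ("start_time", "PeriodStart"), ("operator", "Vendor")}
print(precision_recall_f1(suggested, gold))  # (0.667, 0.667, 0.667)
```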
APA, Harvard, Vancouver, ISO, and other styles
13

Warren, Julie. "Talking a good match : a case study of placement matching in a specialist adolescent foster care scheme." Thesis, University of Edinburgh, 2001. http://hdl.handle.net/1842/23247.

Full text
Abstract:
The study addresses an important gap in the research, the nature of foster placement matching for adolescents. The study setting was an established, specialist adolescent fostering scheme in a large local authority social work department. Contextual data was collected by preliminary fieldwork and analysis of agency documents. Data was collected by observation and interview at the site of placement decision-making, and this was analysed by a novel method which employed text-based, meaning-centred and simple quantitative analytical techniques. The enquiry centred on possible inconsistencies between the agency's aspiration for its matching practice and what actually took place in the decision-making. The practice was found to deviate from its intended goals in important ways and to leave placements open to risk of instability, breakdown and failing to meet children's needs. Certain of the findings lend strong support to current developments in the field with regard to care planning, recording and monitoring. They also raise questions yet to be addressed in policy, practice or research about the relationship between the care agency, as seeker and regulator of foster placement resources, and foster carers, as the providers. The study concludes with recommendations based on its findings.
APA, Harvard, Vancouver, ISO, and other styles
14

TANIMOTO, Masayuki, Toshiaki FUJII, Bunpei TOUJI, Tadahiko KIMOTO, and Takashi IMORI. "A Segmentation-Based Multiple-Baseline Stereo (SMBS) Scheme for Acquisition of Depth in 3-D Scenes." Institute of Electronics, Information and Communication Engineers, 1998. http://hdl.handle.net/2237/14997.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Alserafi, Ayman. "Dataset proximity mining for supporting schema matching and data lake governance." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671540.

Full text
Abstract:
With the huge growth in the amount of data generated by information systems, it is common practice today to store datasets in their raw formats (i.e., without any data preprocessing or transformations) in large-scale data repositories called Data Lakes (DLs). Such repositories store datasets from heterogeneous subject-areas (covering many business topics) and with many different schemata. Therefore, it is a challenge for data scientists using the DL for data analysis to find relevant datasets for their analysis tasks without any support or data governance. The goal is to be able to extract metadata and information about datasets stored in the DL to support the data scientist in finding relevant sources. This shapes the main goal of this thesis, where we explore different techniques of data profiling, holistic schema matching and analysis recommendation to support the data scientist. We propose a novel framework based on supervised machine learning to automatically extract metadata describing datasets, including computation of their similarities and data overlaps using holistic schema matching techniques. We use the extracted relationships between datasets in automatically categorizing them to support the data scientist in finding relevant datasets with intersection between their data. This is done via a novel metadata-driven technique called proximity mining which consumes the extracted metadata via automated data mining algorithms in order to detect related datasets and to propose relevant categories for them. We focus on flat (tabular) datasets organised as rows of data instances and columns of attributes describing the instances. Our proposed framework uses the following four main techniques: (1) Instance-based schema matching for detecting relevant data items between heterogeneous datasets, (2) Dataset level metadata extraction and proximity mining for detecting related datasets, (3) Attribute level metadata extraction and proximity mining for detecting related datasets, and finally, (4) Automatic dataset categorization via supervised k-Nearest-Neighbour (kNN) techniques. We implement our proposed algorithms via a prototype that shows the feasibility of this framework. We apply the prototype in an experiment on a real-world DL scenario to prove the feasibility, effectiveness and efficiency of our approach, whereby we were able to achieve high recall rates and efficiency gains while improving the computational space and time consumption by two orders of magnitude via our proposed early-pruning and pre-filtering techniques in comparison to classical instance-based schema matching techniques. This proves the effectiveness of our proposed automatic methods in the early-pruning and pre-filtering tasks for holistic schema matching and the automatic dataset categorisation, while also demonstrating improvements over human-based data analysis for the same tasks.
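As a minimal illustration of the last step listed above (supervised kNN categorization of datasets from their extracted metadata), the sketch below trains a k-Nearest-Neighbour classifier on hypothetical dataset-level profiles. The features, categories and values are made up and only hint at the proximity-mining metadata the thesis actually computes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical dataset-level metadata profiles: [n_attributes, numeric_ratio, mean_overlap].
profiles = np.array([
    [12, 0.80, 0.65],   # sales-like datasets
    [15, 0.75, 0.70],
    [40, 0.10, 0.20],   # customer/contact-like datasets
    [35, 0.15, 0.25],
])
categories = ["sales", "sales", "customers", "customers"]

knn = KNeighborsClassifier(n_neighbors=3).fit(profiles, categories)
new_dataset = np.array([[14, 0.70, 0.60]])   # profile of a newly ingested dataset
print(knn.predict(new_dataset))              # expected: ['sales']
```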
Amb l’enorme creixement de la quantitat de dades generades pels sistemes d’informació, és habitual avui en dia emmagatzemar conjunts de dades en els seus formats bruts (és a dir, sense cap pre-processament de dades ni transformacions) en dipòsits de dades a gran escala anomenats Data Lakes (DL). Aquests dipòsits emmagatzemen conjunts de dades d’àrees temàtiques heterogènies (que abasten molts temes empresarials) i amb molts esquemes diferents. Per tant, és un repte per als científics de dades que utilitzin la DL per a l’anàlisi de dades trobar conjunts de dades rellevants per a les seves tasques d’anàlisi sense cap suport ni govern de dades. L’objectiu és poder extreure metadades i informació sobre conjunts de dades emmagatzemats a la DL per donar suport al científic en trobar fonts rellevants. Aquest és l’objectiu principal d’aquesta tesi, on explorem diferents tècniques de perfilació de dades, concordança d’esquemes holístics i recomanació d’anàlisi per donar suport al científic. Proposem un nou marc basat en l’aprenentatge automatitzat supervisat per extreure automàticament metadades que descriuen conjunts de dades, incloent el càlcul de les seves similituds i coincidències de dades mitjançant tècniques de concordança d’esquemes holístics. Utilitzem les relacions extretes entre conjunts de dades per categoritzar-les automàticament per donar suport al científic del fet de trobar conjunts de dades rellevants amb la intersecció entre les seves dades. Això es fa mitjançant una nova tècnica basada en metadades anomenada mineria de proximitat que consumeix els metadades extrets mitjançant algoritmes automatitzats de mineria de dades per tal de detectar conjunts de dades relacionats i proposar-ne categories rellevants. Ens centrem en conjunts de dades plans (tabulars) organitzats com a files d’instàncies de dades i columnes d’atributs que descriuen les instàncies. El nostre marc proposat utilitza les quatre tècniques principals següents: (1) Esquema de concordança basat en instàncies per detectar ítems rellevants de dades entre conjunts de dades heterogènies, (2) Extracció de metadades de nivell de dades i mineria de proximitat per detectar conjunts de dades relacionats, (3) Extracció de metadades a nivell de atribut i mineria de proximitat per detectar conjunts de dades relacionats i, finalment, (4) Categorització de conjunts de dades automàtica mitjançant tècniques supervisades per k-Nearest-Neighbour (kNN). Posem en pràctica els nostres algorismes proposats mitjançant un prototip que mostra la viabilitat d’aquest marc. El prototip s’experimenta en un escenari DL real del món per demostrar la viabilitat, l’eficàcia i l’eficiència del nostre enfocament, de manera que hem pogut aconseguir elevades taxes de record i guanys d’eficiència alhora que millorem el consum computacional d’espai i temps mitjançant dues ordres de magnitud mitjançant el nostre es van proposar tècniques de poda anticipada i pre-filtratge en comparació amb tècniques de concordança d’esquemes basades en instàncies clàssiques. Això demostra l'efectivitat dels nostres mètodes automàtics proposats en les tasques de poda inicial i pre-filtratge per a la coincidència d'esquemes holístics i la classificació automàtica del conjunt de dades, tot demostrant també millores en l'anàlisi de dades basades en humans per a les mateixes tasques.
Avec l’énorme croissance de la quantité de données générées par les systèmes d’information, il est courant aujourd’hui de stocker des ensembles de données (datasets) dans leurs formats bruts (c’est-à-dire sans prétraitement ni transformation de données) dans des référentiels de données à grande échelle appelés Data Lakes (DL). Ces référentiels stockent des ensembles de données provenant de domaines hétérogènes (couvrant de nombreux sujets commerciaux) et avec de nombreux schémas différents. Par conséquent, il est difficile pour les data-scientists utilisant les DL pour l’analyse des données de trouver des datasets pertinents pour leurs tâches d’analyse sans aucun support ni gouvernance des données. L’objectif est de pouvoir extraire des métadonnées et des informations sur les datasets stockés dans le DL pour aider le data-scientist à trouver des sources pertinentes. Cela constitue l’objectif principal de cette thèse, où nous explorons différentes techniques de profilage de données, de correspondance holistique de schéma et de recommandation d’analyse pour soutenir le data-scientist. Nous proposons une nouvelle approche basée sur l’intelligence artificielle, spécifiquement l’apprentissage automatique supervisé, pour extraire automatiquement les métadonnées décrivant les datasets, calculer automatiquement les similitudes et les chevauchements de données entre ces ensembles en utilisant des techniques de correspondance holistique de schéma. Les relations entre datasets ainsi extraites sont utilisées pour catégoriser automatiquement les datasets, afin d’aider le data-scientist à trouver des datasets pertinents avec intersection entre leurs données. Cela est fait via une nouvelle technique basée sur les métadonnées appelée proximity mining, qui consomme les métadonnées extraites via des algorithmes de data mining automatisés afin de détecter des datasets connexes et de leur proposer des catégories pertinentes. Nous nous concentrons sur des datasets plats (tabulaires) organisés en rangées d’instances de données et en colonnes d’attributs décrivant les instances. L’approche proposée utilise les quatres principales techniques suivantes: (1) Correspondance de schéma basée sur l’instance pour détecter les éléments de données pertinents entre des datasets hétérogènes, (2) Extraction de métadonnées au niveau du dataset et proximity mining pour détecter les datasets connexes, (3) Extraction de métadonnées au niveau des attributs et proximity mining pour détecter des datasets connexes, et enfin, (4) catégorisation automatique des datasets via des techniques supervisées k-Nearest-Neighbour (kNN). Nous implémentons les algorithmes proposés via un prototype qui montre la faisabilité de cette approche. Nous appliquons ce prototype à une scénario DL du monde réel pour prouver la faisabilité, l’efficacité et l’efficience de notre approche, nous permettant d’atteindre des taux de rappel élevés et des gains d’efficacité, tout en diminuant le coût en espace et en temps de deux ordres de grandeur, via nos techniques proposées d’élagage précoce et de pré-filtrage, comparé aux techniques classiques de correspondance de schémas basées sur les instances. Cela prouve l’efficacité des méthodes automatiques proposées dans les tâches d’élagage précoce et de pré-filtrage pour la correspondance de schéma holistique et la cartegorisation automatique des datasets, tout en démontrant des améliorations par rapport à l’analyse de données basée sur l’humain pour les mêmes tâches.
APA, Harvard, Vancouver, ISO, and other styles
16

Duchateau, Fabien. "Towards a Generic Approach for Schema Matcher Selection : Leveraging User Pre- and Post-match Effort for Improving Quality and Time Performance." Montpellier 2, 2009. http://www.theses.fr/2009MON20213.

Full text
Abstract:
L'interopérabilité entre applications et les passerelles entre différentes sources de données sont devenues des enjeux cruciaux pour permettre des échanges d'informations op- timaux. Cependant, certains processus nécessaires à cette intégration ne peuvent pas être complétement automatisés à cause de leur complexité. L'un de ces processus, la mise en correspondance de schémas, est maintenant étudié depuis de nombreuses années. Il s'attaque au problème de la découverte de correspondances sémantiques entre éléments de différentes sources de données, mais il reste encore principalement effectué de manière manuelle. Par conséquent, le déploiement de larges systèmes de partage d'informations ne sera possible qu'en (semi-)automatisant ce processus de mise en correspondance. De nombreux outils de mise en correspondance de schémas ont été développés ces dernières décennies afin de découvrir automatiquement des mappings entre éléments de schémas. Cependant, ces outils accomplissent généralement des tâches de mise en correspondance pour des critères spécifiques, comme un scénario à large échelle ou la décou- verte de mappings complexes. Contrairement à la recherche sur l'alignement d'ontologies, il n'existe aucune plate-forme commune pour évaluer ces outils. Aussi la profusion d'outils de découverte de correspondances entre schémas, combinée aux deux problèmes évoqués précedemment, ne facilite pas, pour une utilisatrice, le choix d'un outil le plus ap- proprié pour découvrir des correspondances entre schémas. La première contribution de cette thèse consiste à proposer un outil d'évaluation, appelé XBenchMatch, pour mesurer les performances (en terme de qualité et de temps) des outils de découverte de correspondances entre schémas. Un corpus comprenant une dizaine de scénarios de mise en correspondance sont fournis avec XBenchMatch, chacun d'entre eux représentant un ou plusieurs critères relatif au processus de mise en correspondance de schémas. Nous avons également conçu et implémenté de nouvelles mesures pour évaluer la qualité des schémas intégrés et le post-effort de l'utilisateur. Cette étude des outils existants a permis une meilleure compréhension du processus de mise en correspondance de schémas. Le premier constat est que sans ressources externes telles que des dictionnaires ou des ontologies, ces outils ne sont généralement pas capables de découvrir des correspondances entre éléments possédant des étiquettes très différentes. Inversement, l'utilisation de ressources ne permet que rarement la découverte de correspondances entre éléments dont les étiquettes se ressemblent. Notre seconde contribution, BMatch, est un outil de découverte de correspondances entre schémas qui inclut une mesure de similarité structurelle afin de contrer ces problèmes. Nous démontrons ensuite de manière empirique les avantages et limites de notre approche. En effet, comme la plupart des outils de découverte de correspondances entre schémas, BMatch utilise une moyenne pondérée pour combiner plusieurs valeurs de similarité, ce qui implique une baisse de qualité et d'efficacité. De plus, la configuration des divers paramètres est une autre difficulté pour l'utilisatrice. Pour remédier à ces problèmes, notre outil MatchPlanner introduit une nouvelle méth- ode pour combiner des mesures de similarité au moyen d'arbres de décisions. Comme ces arbres peuvent être appris par apprentissage, les paramètres sont automatiquement config- urés et les mesures de similarité ne sont pas systématiquement appliquées. 
Nous montrons ainsi que notre approche améliore la qualité de découverte de correspondances entre sché- mas et les performances en terme de temps d'exécution par rapport aux outils existants. Enfin, nous laissons la possibilité à l'utilisatrice de spécifier sa préférence entre précision et rappel. Bien qu'équipés de configuration automatique de leurs paramètres, les outils de mise en correspondances de schémas ne sont pas encore suffisamment génériques pour obtenir des résultats qualitatifs acceptables pour une majorité de scénarios. C'est pourquoi nous avons étendu MatchPlanner en proposant une “fabrique d'outils” de découverte de correspondances entre schémas, nommée YAM (pour Yet Another Matcher). Cet outil apporte plus de flexibilité car il génère des outils de mise en correspondances à la carte pour un scénario donné. En effet, ces outils peuvent être considérés comme des classifieurs en apprentissage automatique, puisqu'ils classent des paires d'éléments de schémas comme étant pertinentes ou non en tant que mappings. Ainsi, le meilleur outil de mise en cor- respondance est construit et sélectionné parmi un large ensemble de classifieurs. Nous mesurons aussi l'impact sur la qualité lorsque l'utilisatrice fournit à l'outil des mappings experts ou lorsqu'elle indique une préférence entre précision et rappel
Interoperability between applications or bridges between data sources are required to allow optimal information exchanges. Yet, some of the processes needed to bring about this integration cannot be fully automated due to their complexity. One of these processes, called matching, has now been studied for years. It aims at discovering semantic correspondences between data source elements and is still largely performed manually. Thus, deploying large data sharing systems requires the (semi-)automation of this matching process. Many schema matching tools were designed to discover mappings between schemas. However, some of these tools intend to fulfill matching tasks with specific criteria, like a large-scale scenario or the discovery of complex mappings. And contrary to the ontology alignment research field, there is no common platform to evaluate them. The abundance of schema matching tools, added to the two previously mentioned issues, does not facilitate the choice, by a user, of the most appropriate tool to match a given scenario. In this dissertation, our first contribution is a benchmark, XBenchMatch, to evaluate schema matching tools. It consists of several schema matching scenarios, each of which features one or more criteria. Besides, we have designed new measures to evaluate the quality of integrated schemas and the user post-match effort. This study and analysis of existing matching tools enables a better understanding of the matching process. Without external resources, most matching tools are mainly not able to detect a mapping between elements with totally dissimilar labels. On the other hand, they cannot invalidate a mapping between elements with similar labels. Our second contribution, BMatch, is a matching tool which includes a structural similarity measure and aims at solving these issues by only using the schema structure. Terminological measures enable the discovery of mappings whose schema elements share similar labels. Conversely, structural measures, based on the cosine measure, detect mappings when schema elements have the same neighbourhood. BMatch's second aspect aims at improving time performance by using an indexing structure, the B-tree, to accelerate the schema matching process. We empirically demonstrate the benefits and the limits of our approach. Like most schema matching tools, BMatch uses an aggregation function to combine similarity values, thus implying several drawbacks in terms of quality and performance. Tuning the parameters is another burden for the user. To tackle these issues, MatchPlanner introduces a new method to combine similarity measures by relying on decision trees. As decision trees can be learned, parameters are automatically tuned and similarity measures are only computed when necessary. We show that our approach provides an increase in matching quality and better time performance with regard to other matching tools. We also give users the possibility to choose a preference between precision and recall. Even with tuning capabilities, schema matching tools are still not generic enough to provide acceptable quality results for most schema matching scenarios. We finally extend MatchPlanner by proposing a factory of schema matchers, named YAM (for Yet Another Matcher). This tool brings more flexibility since it generates an 'a la carte' matcher for a given schema matching scenario.
Indeed, schema matchers can be seen as machine learning classifiers, since they classify pairs of schema elements either as relevant or irrelevant. Thus, the best matcher in terms of matching quality is built and selected from a set of different classifiers. We also show the impact on quality when the user provides some inputs, namely a list of expert mappings and a preference between precision and recall.
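To give a flavour of the decision-tree idea mentioned above (combining similarity measures so that a measure is only consulted when the tree needs it), the sketch below trains a small decision tree over two similarity features for candidate element pairs. The features, labels and thresholds are invented for the example and are not MatchPlanner's learned trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [label similarity, structural similarity] for a pair of schema elements.
X = np.array([
    [0.95, 0.30], [0.90, 0.80], [0.15, 0.85],
    [0.20, 0.20], [0.85, 0.10], [0.10, 0.60],
])
y = [1, 1, 1, 0, 1, 0]   # toy labels: 1 = mapping, 0 = no mapping

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["label_sim", "structural_sim"]))
print(tree.predict([[0.9, 0.2], [0.1, 0.4]]))   # expected: [1 0]
```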
APA, Harvard, Vancouver, ISO, and other styles
17

Sottovia, Paolo. "Information Extraction from data." Doctoral thesis, Università degli studi di Trento, 2019. http://hdl.handle.net/11572/242992.

Full text
Abstract:
Data analysis is the process of inspecting, cleaning, extracting, and modeling data with the intention of extracting useful information in order to support users in their decisions. With the advent of Big Data, data analysis has become more complicated due to the volume and variety of data. This process begins with the acquisition of the data and the selection of the data that is useful for the desired analysis. With such an amount of data, even expert users are not able to inspect the data and understand if a dataset is suitable or not for their purposes. In this dissertation, we focus on five problems in the broad data analysis process to help users find insights from the data when they do not have enough knowledge about it. First, we analyze the data description problem, where the user is looking for a description of the input dataset. We introduce data descriptions: a compact, readable and insightful formula of boolean predicates that represents a set of data records. Finding the best description for a dataset is computationally expensive and task-specific; we, therefore, introduce a set of metrics and heuristics for generating meaningful descriptions at interactive performance. Secondly, we look at the problem of order dependency discovery, which discovers another kind of metadata that may help the user in understanding the characteristics of a dataset. Our approach leverages the observation that discovering order dependencies can be guided by the discovery of a more specific form of dependencies called order compatibility dependencies. Thirdly, textual data encodes much hidden information. To allow this data to reach its full potential, there has been an increasing interest in extracting structural information from it. In this regard, we propose a novel approach for extracting events that is based on temporal co-reference among entities. We consider an event to be a set of entities that collectively experience relationships between them in a specific period of time. We developed a distributed strategy that is able to scale with the largest on-line encyclopedia available, Wikipedia. Then, we deal with the evolving nature of the data by focusing on the problem of finding synonymous attributes in evolving Wikipedia Infoboxes. Over time, several attributes have been used to indicate the same characteristic of an entity. This poses several issues when we are trying to analyze the content of different time periods. To solve it, we propose a clustering strategy that combines two contrasting distance metrics. We developed an approximate solution that we assess over 13 years of Wikipedia history, proving its flexibility and accuracy. Finally, we tackle the problem of identifying movements of attributes in evolving datasets. In an evolving environment, entities not only change their characteristics, but they sometimes exchange them over time. We propose a strategy that is able to discover those cases, and we also test our strategy on real datasets. We formally present the five problems, which we validate both in terms of theoretical results and experimental evaluation, and we demonstrate that the proposed approaches efficiently scale with a large amount of data.
APA, Harvard, Vancouver, ISO, and other styles
18

Rodrigues, Diego de Azevedo. "A Study on Machine Learning Techniques for the Schema Matching Networks Problem." Universidade Federal do Amazonas, 2018. https://tede.ufam.edu.br/handle/tede/6801.

Full text
Abstract:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Schema Matching is the problem of finding semantic correspondences between elements from different schemas. This is a challenging problem, since the same concept is often represented by disparate elements in the schemas. The traditional instances of this problem involved a pair of schemas to be matched. However, recently there has been an increasing interest in matching several related schemas at once, a problem known as Schema Matching Networks, where the goal is to identify elements from several schemas that correspond to a single concept. We propose a family of methods for schema matching networks based on machine learning, which proved to be a competitive alternative for the traditional matching problem in several domains. To overcome the issue of requiring a large amount of training data, we also propose a bootstrapping procedure to automatically generate training data. In addition, we leverage constraints that arise in network scenarios to improve the quality of this data. We also propose a strategy for receiving user feedback to validate some of the generated matchings and, relying on this feedback, to improve the quality of the final result. Our experiments show that our methods can outperform baselines, reaching an F1-score of up to 0.83.
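The sketch below illustrates the general machine-learning formulation described above: each candidate pair of schema elements is represented by similarity features and a classifier is trained on (possibly bootstrapped) labels. The features, labels and classifier choice are assumptions for the example, not the thesis's actual models or network constraints.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each candidate pair of schema elements is described by similarity features,
# e.g. [name similarity, data-type compatibility, instance-value overlap].
X_train = np.array([
    [0.9, 1.0, 0.8],   # "price"  vs "Price"        -> match
    [0.2, 1.0, 0.1],   # "price"  vs "Quantity"     -> no match
    [0.7, 0.0, 0.0],   # "date"   vs "DateComment"  -> no match
    [0.8, 1.0, 0.6],   # "author" vs "Authors"      -> match
])
y_train = [1, 0, 0, 1]   # labels, e.g. produced by a bootstrapping heuristic

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

candidate_pairs = np.array([[0.85, 1.0, 0.7], [0.3, 0.0, 0.05]])
print(clf.predict(candidate_pairs))          # expected: [1 0]
print(clf.predict_proba(candidate_pairs))    # per-pair match probabilities
```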
Casamento de Esquemas é a tarefa de encontrar correpondências entre elementos de diferentes esquemas de bancos de dados. É um problema desafiador, uma vez que o mesmo conceito geralmente é representado de maneiras distintas nos esquemas.Tradicionalmente, a tarefa envolve um par de esquemas a serem mapeados. Entretanto, houve um crescimento na necessidade de mapear vários esquemas ao mesmo tempo, tarefa conhecida como Casamento de Esquemas em Rede, onde o objetivo é identificar elementos de vários esquemas que correspondem ao mesmo conceito. Este trabalho propõe uma famı́lia de métodos para o problema do casamento de esquemas em rede baseados em aprendizagem de máquina, que provou ser uma alternativa viável para o problema do casamento tradicional em diversos domı́nios. Para superar obstáculo de obter bastantes instâncias de treino, também é proposta uma técnica de bootstrapping para gerar treino automático. Além disso, o trabalho considera restrições de integridade que ajudam a nortear o processo de casamento em rede. Este trabalho também propõe uma estratégia para receber avaliações do usuário, com o propósito de melhorar o resultado final. Experimentos mostram que o método proposto supera outros métodos comparados alcançando valor F1 até 0.83 e sem utilizar muitas avaliações do usuário.
APA, Harvard, Vancouver, ISO, and other styles
19

Pilling, Valerie Kay. "Increasing the effectiveness of messages promoting responsible undergraduate drinking : tailoring to personality and matching to context." Diss., Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/665.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Kabisch, Thomas. "Extraction and integration of Web query interfaces." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2011. http://dx.doi.org/10.18452/16398.

Full text
Abstract:
This thesis focuses on the integration of Web query interfaces. We model the integration process in several steps: First, unknown interfaces have to be classified with respect to their application domain (classification); only then is a domain-wise treatment possible. Second, interfaces must be transformed into a machine-readable format (extraction) to allow their automated analysis. Third, as a prerequisite to integration across databases, pairs of semantically similar elements among multiple interfaces need to be identified (matching). Only when all these tasks have been solved can systems that provide an integrated view of several data sources be set up. This thesis presents new algorithms for each of these steps. We developed a novel extraction algorithm that exploits a small set of common-sense design rules to derive a hierarchical schema for query interfaces. In contrast to prior solutions that use mainly flat schema representations, the hierarchical schema better represents the structure of the interfaces, leading to better accuracy in the integration step. Next, we describe a multi-step matching method for query interfaces which builds on the hierarchical schema representation. It uses methods from the theory of bipartite graphs to globally optimize the matching result. As a third contribution, we present a new method for the domain classification problem of unknown interfaces that, for the first time, combines lexical and structural properties of schemas. All our new methods have been evaluated on real-life datasets and perform better than previous works in their respective fields. Additionally, we present the system VisQI, which implements all introduced algorithmic steps and provides a comfortable graphical user interface to support the integration process.
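The global optimization step described above can be illustrated with a small sketch that treats the match problem as a bipartite assignment over a similarity matrix; the element names, scores and threshold are made up, and the thesis' own algorithm may differ.

# Illustrative global 1:1 assignment over a similarity matrix via the
# Hungarian algorithm (bipartite-graph optimization).
import numpy as np
from scipy.optimize import linear_sum_assignment

left = ["title", "author", "publication_year"]
right = ["book_title", "writer", "year"]

# Similarity matrix, e.g. produced by terminological/structural matchers.
sim = np.array([[0.85, 0.10, 0.05],
                [0.12, 0.70, 0.08],
                [0.05, 0.06, 0.90]])

rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
for i, j in zip(rows, cols):
    if sim[i, j] >= 0.5:                   # discard weak correspondences
        print(f"{left[i]} <-> {right[j]} (score {sim[i, j]:.2f})")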
APA, Harvard, Vancouver, ISO, and other styles
21

Kadri, Imen. "Controlled estimation algorithms of disparity map using a compensation compression scheme for stereoscopic image coding." Thesis, Paris 13, 2020. http://www.theses.fr/2020PA131002.

Full text
Abstract:
Nowadays, 3D technology is in ever-growing demand because stereoscopic imaging creates an immersion sensation. However, the price of this realistic representation is the doubling of the information needed for storage or transmission compared to a 2D image, because a stereoscopic pair results from the generation of two views of the same scene. This thesis focuses on stereoscopic image coding and in particular on improving the disparity map estimation when using the Disparity Compensated Compression (DCC) scheme. Classically, when using a block matching algorithm with DCC, a disparity map is estimated between the left image and the right one, and a predicted image is then computed. The difference between the original right view and its prediction is called the residual error. The residual error, after encoding and decoding, is injected to reconstruct the right view by compensation (i.e. refinement). Our first algorithm takes this refinement into account to estimate the disparity map. This gives a proof of concept showing that selecting the disparity according to the compensated image instead of the predicted one is more efficient, although at the expense of an increased numerical complexity. To deal with this shortcoming, a simplified model of how the JPEG coder, through the quantization of the DCT components of the residual error, affects the compensation is proposed. In the last part, a method to select the disparity map minimizing a joint bitrate-distortion metric is proposed; it is based on the bitrate needed for encoding the disparity map and the distortion of the predicted view, and combines two existing stereoscopic image coding algorithms.
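A minimal block-matching sketch helps make the disparity estimation step concrete; it assumes rectified grayscale images and uses a plain SAD criterion, without the compensation-aware selection or the JPEG quantization model proposed in the thesis.

import numpy as np

def block_matching_disparity(left, right, block=8, max_disp=32):
    # For every block of the left view, search along the same row of the
    # right view for the shift (disparity) with the lowest SAD.
    h, w = left.shape
    disp = np.zeros((h // block, w // block), dtype=np.int32)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = left[y:y + block, x:x + block].astype(np.float64)
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):
                cand = right[y:y + block, x - d:x - d + block].astype(np.float64)
                sad = np.abs(ref - cand).sum()     # sum of absolute differences
                if sad < best:
                    best, best_d = sad, d
            disp[by, bx] = best_d
    return disp

left = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
right = np.roll(left, -4, axis=1)                  # toy pair with a 4-pixel shift
print(block_matching_disparity(left, right)[2])    # one row of block disparities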
APA, Harvard, Vancouver, ISO, and other styles
22

Mergen, Sérgio Luis Sardi. "Casamento de esquemas XML e esquemas relacionais." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2005. http://hdl.handle.net/10183/10421.

Full text
Abstract:
The matching between XML schemas and relational schemas has many applications, such as information integration and data exchange. Typically, schema matching is done manually by domain experts, sometimes using a graphical tool. However, the matching of large schemas is a time-consuming and error-prone task. The use of (semi-)automatic schema matching techniques can help the user in finding the correct matches, thereby reducing his labor. The schema matching problem has already been addressed in the literature. Nevertheless, the matching of XML schemas and relational schemas is still an open issue. This comes from the fact that the existing work is either specific to schemas designed in the same model, or too generic for the problem in discussion. The main goal of this dissertation is to develop specific techniques for the matching of XML schemas and relational schemas. Such techniques exploit the particularities found when analyzing the two schemas together, and use these cues to leverage the matching process. The techniques are evaluated by running experiments with real-world schemas.
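A toy matcher between XML leaf paths and relational columns illustrates the kind of cues such techniques can exploit (name similarity plus type compatibility); the element names, types and weights are illustrative assumptions, not the dissertation's techniques.

from difflib import SequenceMatcher

xml_leaves = {"customer/name": "string", "customer/birthDate": "date"}
columns = {"cust_name": "varchar", "birth_dt": "date", "order_total": "decimal"}
TYPE_COMPATIBLE = {("string", "varchar"), ("date", "date"), ("decimal", "decimal")}

def name_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for path, xtype in xml_leaves.items():
    leaf = path.split("/")[-1]                  # compare the XML leaf name
    scored = []
    for col, ctype in columns.items():
        score = 0.7 * name_sim(leaf, col)       # name similarity cue
        score += 0.3 if (xtype, ctype) in TYPE_COMPATIBLE else 0.0
        scored.append((score, col))
    best_score, best_col = max(scored)
    print(f"{path} -> {best_col} ({best_score:.2f})")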
APA, Harvard, Vancouver, ISO, and other styles
23

Jain, Prateek. "Linked Open Data Alignment & Querying." Wright State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=wright1345575500.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Watanabe, Toyohide, Yuusuke Uehara, Yuuji Yoshida, and Teruo Fukumura. "A semantic data model for intellectual database access." IEEE, 1990. http://hdl.handle.net/2237/6923.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

CARVALHO, Marcus Vinícius Ribeiro de. "UMA ABORDAGEM BASEADA NA ENGENHARIA DIRIGIDA POR MODELOS PARA SUPORTAR MERGING DE BASE DE DADOS HETEROGÊNEAS." Universidade Federal do Maranhão, 2014. http://tedebc.ufma.br:8080/jspui/handle/tede/511.

Full text
Abstract:
Model Driven Engineering (MDE) aims to address the development, maintenance and evolution of complex software systems, focusing on models and model transformations. This approach can be applied in other domains, such as database schema integration. In this research work, we propose a framework to integrate database schemas in the MDE context. Metamodels for defining the database model, database model matching, database model merging, and the integrated database model are proposed in order to support our framework. An algorithm for database model matching and an algorithm for database model merging are presented. We also present a prototype that extends the MT4MDE and SAMT4MDE tools in order to demonstrate the implementation of our proposed framework, methodology, and algorithms. An illustrative example helps in understanding the proposed framework.
APA, Harvard, Vancouver, ISO, and other styles
26

Pfeifer, Katja. "Serviceorientiertes Text Mining am Beispiel von Entitätsextrahierenden Diensten." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-150646.

Full text
Abstract:
Most business-relevant knowledge today exists as unstructured information in the form of text data on web pages, in office documents or in forum posts. A variety of text mining solutions have been developed to extract and exploit this unstructured information. Many of these systems have recently been made accessible as web services in order to simplify their use and integration. Combining several such text mining services to solve concrete extraction tasks appears promising, since existing strengths can be exploited, weaknesses of the individual systems can be minimized, and the use of text mining solutions can be simplified. This thesis addresses the flexible combination of text mining services in a service-oriented system and extends the state of the art with dedicated methods for selecting the text mining services, aggregating their results and mapping the classification schemes they use. First, the currently existing service landscape is analyzed and, building on this, an ontology for the functional description of the services is provided, enabling function-driven selection and combination of the text mining services. Furthermore, using entity-extracting services as an example, algorithms for the quality-improving combination of extraction results are developed and extensively evaluated. The work is complemented by additional mapping and integration processes that ensure applicability even in heterogeneous service landscapes where different classification schemes are used. In addition, possibilities for transferring the approach to other text mining methods are discussed.
APA, Harvard, Vancouver, ISO, and other styles
27

Rodrigues, Diego de Azevedo. "Casamento de esquemas de banco de dados aplicando aprendizado ativo." Universidade Federal do Amazonas, 2013. http://tede.ufam.edu.br/handle/tede/4146.

Full text
Abstract:
FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas
Given two database schemas within the same domain, the schema matching problem is the task of finding pairs of schema elements that have the same semantics for that domain. Traditionally, this task was performed manually by a specialist, making it tedious and costly because the specialist had to know the schemas and their domain well. Currently, this process is assisted by semi-automatic schema matching methods. Current methods use heuristics to generate matchings and many of them share a common modeling: they build a similarity matrix between the elements from functions called matchers and, based on the matrix values, decide according to a criterion which of the matchings are correct. This thesis presents an active-learning based method that uses the similarity matrix generated by the matchers, a machine learning algorithm and specialist interventions to generate matchings. The presented method differs from others because it has no fixed heuristic and uses the specialist's expertise only when necessary. In our experiments, we evaluate the proposed method against a baseline on two datasets: the first is the same dataset used by the baseline and the second contains schemas from a benchmark for schema integration. We show that the baseline achieves good results on its original dataset, but its fixed strategy is not as effective for other schemas. Moreover, the proposed active-learning method is shown to be more consistent, achieving an average F-measure of 0.64.
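An illustrative active-learning loop in the spirit of the abstract is sketched below: a classifier trained on matcher similarities queries an oracle (standing in for the specialist) only about the most uncertain pairs. The data, the matcher scores and the logistic-regression model are simulated assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((200, 3))                       # matcher scores for candidate pairs
true = (X.mean(axis=1) > 0.5).astype(int)      # hidden ground truth (the "specialist")

# Bootstrapped seed labels, built to contain one match and one non-match.
labeled = [int(np.argmax(true)), int(np.argmin(true))]
labeled += [i for i in range(20) if i not in labeled][:8]
unlabeled = [i for i in range(200) if i not in labeled]

for rounds in range(1, 6):
    clf = LogisticRegression().fit(X[labeled], true[labeled])
    proba = clf.predict_proba(X[unlabeled])[:, 1]
    # Ask the specialist only about the pair the model is least sure of.
    idx = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
    unlabeled.remove(idx)
    labeled.append(idx)

accuracy = (clf.predict(X) == true).mean()
print(f"accuracy after {rounds} feedback rounds: {accuracy:.2f}")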
APA, Harvard, Vancouver, ISO, and other styles
28

NÓBREGA, Thiago Pereira da. "Pareamento privado de atributos no contexto da resolução de entidades com preservação de privacidade." Universidade Federal de Campina Grande, 2018. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/1671.

Full text
Abstract:
Privacy-Preserving Record Linkage (PPRL) aims to identify entities whose information cannot be disclosed (e.g., medical records) that correspond to the same real-world object across different databases. It is crucial that the PPRL task is executed without revealing any information between the participants (database owners), so that the privacy of the original data is preserved. At the end of a PPRL task, each participant identifies which entities in its database are present in the databases of the other participants. Thus, before starting the PPRL task, the participants must agree on the entity and the attributes to be compared in the task. In general, this agreement requires the participants to expose their schemas, sharing (meta-)information that can be used to break the privacy of the data. This work proposes a semiautomatic approach to identify similar attributes (attribute pairing) to be used when comparing the entities. The approach is inserted as a preliminary step of the PPRL (Handshake), and its result (similar attributes) can be used by subsequent steps (Blocking and Comparison). In the proposed approach, the participants generate a privacy-preserving representation (Data Signatures) of the attribute values, which is sent to a trusted third party to identify similar attributes from different data sources. This eliminates the need to share information about their schemas, consequently improving the security and privacy of the PPRL task. The evaluation of the approach points out that the quality of the attribute pairing is equivalent to that of a solution that does not consider data privacy, while the approach is capable of preserving data privacy.
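One common way to realize such privacy-preserving attribute signatures is to hash the q-grams of attribute values into fixed-size bit sets and compare only those; the sketch below follows this idea, but the actual encoding used in the dissertation may differ.

import hashlib

def signature(values, bits=256, k=3):
    # Hash the 2-grams of every value into a fixed-size bit set.
    sig = set()
    for v in values:
        v = v.lower()
        for gram in {v[i:i + 2] for i in range(len(v) - 1)}:
            for seed in range(k):
                h = hashlib.sha1(f"{seed}:{gram}".encode()).hexdigest()
                sig.add(int(h, 16) % bits)
    return sig

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

# Each participant builds signatures locally and shares only the signatures.
party_a = {"patient_name": ["alice", "bob"], "birth_date": ["1990-01-02"]}
party_b = {"name": ["alicia", "robert"], "dob": ["1991-05-06"]}

for attr_a, vals_a in party_a.items():
    best = max(party_b,
               key=lambda attr_b: dice(signature(vals_a), signature(party_b[attr_b])))
    print(attr_a, "pairs with", best)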
APA, Harvard, Vancouver, ISO, and other styles
29

See, Chan H. "Computation of electromagnetic fields in assemblages of biological cells using a modified finite difference time domain scheme. Computational electromagnetic methods using quasi-static approximate version of FDTD, modified Berenger absorbing boundary and Floquet periodic boundary conditions to investigate the phenomena in the interaction between EM fields and biological systems." Thesis, University of Bradford, 2007. http://hdl.handle.net/10454/4762.

Full text
Abstract:
There is an increasing need for accurate models describing the electrical behaviour of individual biological cells exposed to electromagnetic fields. For this class of linear problems, the most frequently used technique for computing the EM field is the Finite-Difference Time-Domain (FDTD) method. When modelling objects that are small compared with the wavelength, for example biological cells at radio frequencies, the standard FDTD method requires extremely small time-step sizes, which may lead to excessive computation times. The problem can be overcome by implementing a quasi-static approximate version of FDTD, based on transferring the working frequency to a higher frequency and scaling back to the frequency of interest after the field has been computed. An approach to modelling and analysis of biological cells, incorporating the Hodgkin and Huxley membrane model, is presented here. Since the external medium of the biological cell is a lossy material, a modified Berenger absorbing boundary condition is used to truncate the computation grid. Linear assemblages of cells are investigated and then Floquet periodic boundary conditions are imposed to imitate the effect of periodic replication of the assemblages. Thus, the analysis of a large structure of cells is made more computationally efficient than the modelling of the entire structure. The total fields of the simulated structures are shown to give reasonable and stable results at 900 MHz, 1800 MHz and 2450 MHz. This method will facilitate deeper investigation of the phenomena in the interaction between EM fields and biological systems. Moreover, the nonlinear response of a biological cell exposed to a 0.9 GHz signal is discussed by observing the second harmonic at 1.8 GHz. For this, an electrical circuit model is proposed to calibrate the performance of nonlinear RF energy conversion inside a high-quality-factor resonant cavity with a known nonlinear device. The first and second harmonic responses of the cavity due to loading with the lossy material are also demonstrated. The results from the proposed mathematical model give a good indication of the input power required to detect the weak effects of the second harmonic signal prior to performing the measurement. Hence, the proposed mathematical model will help to determine how sensitively the second harmonic signal can be detected for a given input power.
APA, Harvard, Vancouver, ISO, and other styles
30

See, Chan Hwang. "Computation of electromagnetic fields in assemblages of biological cells using a modified finite difference time domain scheme : computational electromagnetic methods using quasi-static approximate version of FDTD, modified Berenger absorbing boundary and Floquet periodic boundary conditions to investigate the phenomena in the interaction between EM fields and biological systems." Thesis, University of Bradford, 2007. http://hdl.handle.net/10454/4762.

Full text
Abstract:
There is an increasing need for accurate models describing the electrical behaviour of individual biological cells exposed to electromagnetic fields. For this class of linear problems, the most frequently used technique for computing the EM field is the Finite-Difference Time-Domain (FDTD) method. When modelling objects that are small compared with the wavelength, for example biological cells at radio frequencies, the standard FDTD method requires extremely small time-step sizes, which may lead to excessive computation times. The problem can be overcome by implementing a quasi-static approximate version of FDTD, based on transferring the working frequency to a higher frequency and scaling back to the frequency of interest after the field has been computed. An approach to modelling and analysis of biological cells, incorporating the Hodgkin and Huxley membrane model, is presented here. Since the external medium of the biological cell is a lossy material, a modified Berenger absorbing boundary condition is used to truncate the computation grid. Linear assemblages of cells are investigated and then Floquet periodic boundary conditions are imposed to imitate the effect of periodic replication of the assemblages. Thus, the analysis of a large structure of cells is made more computationally efficient than the modelling of the entire structure. The total fields of the simulated structures are shown to give reasonable and stable results at 900 MHz, 1800 MHz and 2450 MHz. This method will facilitate deeper investigation of the phenomena in the interaction between EM fields and biological systems. Moreover, the nonlinear response of a biological cell exposed to a 0.9 GHz signal is discussed by observing the second harmonic at 1.8 GHz. For this, an electrical circuit model is proposed to calibrate the performance of nonlinear RF energy conversion inside a high-quality-factor resonant cavity with a known nonlinear device. The first and second harmonic responses of the cavity due to loading with the lossy material are also demonstrated. The results from the proposed mathematical model give a good indication of the input power required to detect the weak effects of the second harmonic signal prior to performing the measurement. Hence, the proposed mathematical model will help to determine how sensitively the second harmonic signal can be detected for a given input power.
APA, Harvard, Vancouver, ISO, and other styles
31

Abadie, Nathalie. "Formalisation, acquisition et mise en œuvre de connaissances pour l’intégration virtuelle de bases de données géographiques : les spécifications au cœur du processus d’intégration." Thesis, Paris Est, 2012. http://www.theses.fr/2012PEST1054/document.

Full text
Abstract:
This PhD thesis deals with topographic database integration. This process aims at facilitating the use of several heterogeneous databases by making the relationships between them explicit. To automate database integration, several aspects of data heterogeneity must be detected and solved. Identifying heterogeneities between topographic databases implies comparing some knowledge about their respective contents. Therefore, we propose to formalise and acquire this knowledge and to use it for topographic database integration. Our work focuses on the specific problem of topographic database schema matching, as a first step in an integration application. To reach this goal, we propose to use a specific knowledge source, namely the database specifications, which describe the data implementation rules. Firstly, they are used as the main resource for the knowledge acquisition process in an ontology learning application. As a first approach for schema matching, the domain ontology created from the texts of IGN's database specifications is used as a background knowledge source in a schema matching application based on terminological and structural matching techniques. In a second approach, this ontology is used to support the representation, in the OWL 2 language, of the topographic entity selection and geometry capture rules described in the database specifications. This knowledge is then used by a reasoner in a semantic-based schema matching application.
APA, Harvard, Vancouver, ISO, and other styles
32

Masri, Ali. "Multi-Network integration for an Intelligent Mobility." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV091/document.

Full text
Abstract:
Multimodality requires the integration of heterogeneous transportation data and services to construct a broad view of the transportation network. Many new transportation services (e.g. ridesharing, car-sharing, bike-sharing) are emerging and gaining a lot of popularity since in some cases they provide better trip solutions. However, these services are still isolated from the existing multimodal solutions and are proposed as alternative plans without being really integrated in the suggested plans. The concept of open data is on the rise and is being adopted by many companies, which publish their data sources on the web in order to gain visibility. The goal of this thesis is to use these data to enable multimodality by constructing an extended transportation network that links these new services to existing ones. The challenges we face mainly arise from the integration problem in both transportation services and transportation data.
APA, Harvard, Vancouver, ISO, and other styles
33

Gentilhomme, Théophile. "Intégration multi-échelles des données de réservoir et quantification des incertitudes." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0089/document.

Full text
Abstract:
In this work, we propose to follow a multi-scale approach for spatial reservoir property characterization using direct (well observations) and indirect (seismic and production history) data at different resolutions. Two decompositions are used to parameterize the problem: wavelets and Gaussian pyramids. Using these parameterizations, we show the advantages of the multi-scale approach with two uncertainty quantification problems based on minimization. The first one concerns the simulation of property fields from a multiple-point geostatistics algorithm. It is shown that the multi-scale approach based on Gaussian pyramids improves the quality of the output realizations and the match of the conditioning data, and reduces the computational time compared to the standard approach. The second problem concerns the preservation of the prior models during the assimilation of the production history. In order to re-parameterize the problem, we develop a new 3D grid adaptive wavelet transform, which can be used on complex reservoir grids containing dead or zero-volume cells. An ensemble-based optimization method is integrated into the multi-scale history matching approach, so that an estimate of the uncertainty is obtained at the end of the optimization. This method is applied on several application examples where we observe that the final realizations better preserve the spatial distribution of the prior models and are less noisy than the realizations updated using a standard approach, while matching the production data equally well.
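A minimal sketch of the Gaussian-pyramid parameterization is shown below: a property field is represented at successively coarser resolutions so that large-scale features can be updated first; the 2x2 mean used for decimation is an assumption, not the thesis' exact filter.

import numpy as np

def gaussian_pyramid(field, levels=3):
    # pyramid[0] is the finest level, pyramid[-1] the coarsest.
    pyramid = [field]
    for _ in range(levels - 1):
        f = pyramid[-1]
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2
        coarse = f[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(coarse)
    return pyramid

field = np.random.rand(64, 64)          # e.g. one porosity realization
for level, f in enumerate(gaussian_pyramid(field)):
    print(f"level {level}: shape {f.shape}")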
APA, Harvard, Vancouver, ISO, and other styles
34

Wu, Bing-Jhen, and 吳秉禎. "Cluster-Based Pattern-Matching Localization Schemes for Large-Scale Wireless Networks." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/04102399954953427819.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Network Engineering
95
In location-based services, the response time of location determination is critical, especially in real-time applications. This is particularly true for pattern-matching localization methods when the sensing field is large (such as a wireless city); these methods rely on comparing an object's current signal strength pattern against a pre-established location database of signal strength patterns collected during the training phase. In this work, we propose a cluster-based localization framework to speed up the positioning process for pattern-matching localization schemes. By grouping training locations with similar signal strength patterns, we show how to reduce the associated comparison cost so as to accelerate the pattern-matching process. To deal with signal fluctuations, several clustering strategies are proposed. Extensive simulation studies are conducted. Experimental results show that, on average, more than 90% of the computation cost can be reduced without degrading the positioning accuracy.
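The two-stage lookup behind such a cluster-based framework can be sketched as follows: training fingerprints are grouped with k-means and a query is compared only against the fingerprints of the closest cluster. The synthetic data and the cluster count are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
fingerprints = rng.normal(-70, 10, size=(500, 6))    # RSSI from 6 APs per training spot
locations = rng.random((500, 2)) * 100                # (x, y) of each training spot

km = KMeans(n_clusters=10, n_init=10, random_state=1).fit(fingerprints)

def locate(sample):
    cluster = km.predict(sample[None, :])[0]
    members = np.where(km.labels_ == cluster)[0]      # search only this cluster
    dists = np.linalg.norm(fingerprints[members] - sample, axis=1)
    return locations[members[np.argmin(dists)]]

query = fingerprints[42] + rng.normal(0, 2, 6)        # noisy re-measurement
print("estimated position:", locate(query), "true position:", locations[42])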
APA, Harvard, Vancouver, ISO, and other styles
35

Lu, Wei-Yuan, and 呂偉元. "A Low Bitrate Video System Using New Block Matching And Rate Control Schemes." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/92340124721772058595.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Chiang, Yun-Jung, and 蔣云容. "Matching Pursuit based DoA Estimation Schemes for Non-Nyquist Spatially Sampled Ultrasound Array Radars." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5441092%22.&searchmode=basic.

Full text
Abstract:
Master's thesis
National Chung Hsing University
Department of Electrical Engineering
107
Ultrasound radar has been widely used in short-range obstacle detection due to its simplicity and cost effectiveness in implementation. Conventional ultrasound radars use pulse transmission and adopt the "time of flight" principle to calculate distance. The detection quality is susceptible to noise and interference. FMCW (frequency-modulated continuous waveform) radars, nonetheless, use the frequency offset between the transmitted and the received signals and can function properly in inferior SNR environments. However, to obtain the direction information of the objects, an array instead of a single transducer is needed. This thesis develops an ultrasonic radar system for indoor, people-rich environments. The system adopts an FMCW modulation scheme and employs an array configuration to perform distance and direction estimations simultaneously. The proposed system has the following features: 1) it is tailored to FMCW-based systems, 2) it performs the estimations in the frequency domain, 3) it is capable of detecting multiple objects either at an identical distance from the radar or along the same incident direction, and 4) it mitigates the angle aliasing problem when a non-Nyquist spatially sampled transducer array is adopted. The system can thus construct obstacle maps of the surrounding environment. Because the detection principle of FMCW relies on the frequency offset between the transmitted and received signals, a Fast Fourier Transform (FFT) is always required. The Direction of Arrival (DoA) estimation can thus be performed in the frequency domain on a per-frequency-component basis. Because the reflection signals from two equally distant objects co-exist in the same frequency component in an FMCW system, the DoA estimation should be able to distinguish them. A matching pursuit (MP) plus least squares (LS) estimation scheme is thus developed to find the directions from a codebook with predefined steering vectors. In particular, the MP scheme finds the candidate vectors first and the LS scheme determines the best ones among them. Because no low-frequency (<100kHz) ultrasonic transducer array devices are available, individual transducers are put together to form an array. But the size of the ultrasonic transducer is too large to build an array meeting the Nyquist spatial sampling criterion, which leads to an aliasing problem in detection. To mitigate the problem, we propose a new array configuration consisting of 6+2 transducers. Six transducers are placed collinearly to form a linear array with a 3λ/2 spacing. Two auxiliary transducers are put on opposite sides of the linear array with horizontal displacements of λ/2 and λ, respectively, to resolve the aliasing issue. The steering vectors of the codebook are redefined subject to this new array configuration, and the proposed DoA estimation scheme can be applied unchanged. Matlab simulations are conducted to verify the performance of the proposed scheme. We start with simulations assuming a virtual linear array with a λ/2 spacing is available. After this verification, simulations using the proposed 6+2 array configuration are conducted. Two object models are adopted in the simulations: the first assumes that each object has only one reflection point, and the second assumes nine reflection points are associated with the object. In addition, we assume the codebook contains steering vectors with an angular resolution of 5°. The simulations are conducted under different SNR settings with no interference.
The evaluation criteria include the accuracy of selecting the best-matching steering vectors from the codebook and the root mean square (RMS) of the estimation error in degrees. The tolerance of the estimation error is set to 3°. The DoA estimation schemes under comparison include the conventional MUSIC and ESPRIT schemes and an approach based on orthogonal matching pursuit (OMP). The simulation results indicate the effectiveness of the proposed scheme. Taking an SNR of 10dB as an example, the estimation error rate of the proposed scheme is merely 4.67% while the RMS of the estimation error is just 0.3523°. These numbers are better than those of the OMP-based approach. The conventional MUSIC and ESPRIT schemes fail to distinguish objects at equal distances. As for the simulation results using the proposed 6+2 array configuration, the object model containing 9 reflection points is adopted. The estimation error rate is 1.33% and the RMS of the estimation error is 3.5863°. Again, these numbers are better than those of the OMP-based approach, while the complexity reduction can be as high as 73.4%.
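A hedged sketch of matching-pursuit direction finding followed by a least-squares fit is given below for a plain λ/2 uniform linear array; the 6+2 geometry, the FMCW front end and the exact codebook of the thesis are not modelled here.

import numpy as np

M, d = 8, 0.5                                   # sensors, spacing in wavelengths
angles = np.arange(-90, 91, 5)                  # codebook grid in degrees

def steering(theta_deg):
    k = 2 * np.pi * d * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * k * np.arange(M))

A = np.stack([steering(a) for a in angles], axis=1)    # codebook matrix

# Simulated snapshot: two equally distant reflectors at -20 and 35 degrees.
x = 1.0 * steering(-20) + 0.8 * steering(35)
x += 0.05 * (np.random.randn(M) + 1j * np.random.randn(M))

residual, picked = x.copy(), []
for _ in range(2):                               # greedy matching-pursuit picks
    scores = np.abs(A.conj().T @ residual)
    picked.append(int(np.argmax(scores)))
    coef, *_ = np.linalg.lstsq(A[:, picked], x, rcond=None)   # LS refinement
    residual = x - A[:, picked] @ coef

print("estimated DoAs (degrees):", [int(angles[j]) for j in picked])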
APA, Harvard, Vancouver, ISO, and other styles
37

Lee, Heung Ki. "Adaptive Resource Management Schemes for Web Services." 2009. http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7608.

Full text
Abstract:
Web cluster systems provide cost-effective solutions when scalable and reliable web services are required. However, as the number of servers in web cluster systems increases, web cluster systems incur long and unpredictable delays to manage servers. This study presents efficient management schemes for web cluster systems. First of all, we propose an efficient request distribution scheme for web cluster systems. Distributor-based systems forward user requests to a balanced set of waiting servers in complete transparency to the users. The policy employed in forwarding requests from the frontend distributor to the backend servers plays an important role in the overall system performance. In this study, we present a proactive request distribution scheme (ProRD) to provide intelligent distribution at the distributor. Second, we propose heuristic memory management schemes built around a web prefetching scheme. For this study, we design a Double Prediction-by-Partial-Match Scheme (DPS) that can be adapted to modern web frameworks. In addition, we present an Adaptive Rate Controller (ARC) to determine the prefetch rate dynamically depending on the memory status. For evaluating the prefetch gain in a server node, we implement an Apache module. Lastly, we design an adaptive web streaming system for wireless networks. The rapid growth of new wireless and mobile devices accessing the internet has contributed to a whole new level of heterogeneity in web streaming systems. In particular, in-home networks have also increased in heterogeneity by using various devices such as laptops, cell phones and PDAs. In our study, a set-top box (STB) is the access point between the internet and a home network. We design an ActiveSTB which has buffering and quality adaptation capabilities based on estimation of the available bandwidth in the wireless LAN.
APA, Harvard, Vancouver, ISO, and other styles
38

Huang, Chin-Chung, and 黃清忠. "A Methodology for the Integration of Hopfield Network and Genetic Algorithm Schemes for Graph Matching Problems." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/07832368643089770958.

Full text
Abstract:
Doctoral dissertation
National Sun Yat-sen University
Department of Mechanical and Electro-Mechanical Engineering
93
Object recognition is of much interest in recent industrial automation. Although a variety of approaches have been proposed to tackle the recognition problem, some cases, such as overlapping objects, articulated objects, and low-resolution images, are still not easy for the existing schemes. Coping with these more complex images has remained a challenging task in the field. This dissertation, aiming to recognize objects from such images, proposes a new integrated method. For images with overlapping or articulated objects, graph matching methods are often used, seeing the task as a combinatorial optimization problem. Both the Hopfield network and the genetic algorithm (GA) are decent tools for combinatorial optimization problems. Unfortunately, they both have intolerable drawbacks. The Hopfield network is sensitive to its initial state and stops at a local minimum if that state is not properly chosen. The GA, on the other hand, only finds a near-global solution, and it is time-consuming for large-scale tasks. This dissertation proposes to combine these two methods, keeping their strengths while eliminating their weaknesses, to solve some complex recognition problems. Before the integration, some arrangements are required. For instance, specialized 2-D GA operators are used to accelerate the convergence. Also, the "seeds" of the GA solution are extracted as the initial state of the Hopfield network. By doing so, the efficiency of the system is greatly improved. Additionally, several fine-tuning post-matching algorithms are also needed. In order to solve the homomorphic graph matching problem, i.e., multiple occurrences in a single scene image, the Hopfield network has to repeat itself until the stopping criteria are met. The method can not only be used to obtain the homomorphic mapping between the model and the scene graphs, but it can also be applied to articulated object recognition. Here we do not need to know in advance whether the model is really an articulated object. The proposed method has been applied to measure some kinematic properties of simple machines, such as the positions of the joints and relative linear and angular displacements. The subject of articulated object recognition has rarely been addressed in the literature, particularly under affine transformations. Another unique application of the proposed method is also included in the dissertation. It concerns low-resolution images, where the contour of an object is easily affected by noise. To increase the performance, we use the hexagonal grid in dealing with such low-resolution images. A hexagonal FFT simulation is first presented to pre-process the hexagonal images for recognition. A feature vector matching scheme and a similarity matching scheme are also devised to recognize simpler images with only isolated objects. For complex low-resolution images with occluded objects, the integrated method has to be tailored to the hexagonal grid. The low-resolution, hexagonal version of the integrated scheme has also been shown to be suitable and robust.
APA, Harvard, Vancouver, ISO, and other styles
39

Tsai, Tsung-Lin, and 蔡宗霖. "Integration of data, function, pipeline partition schemes on distributed system--real-time implementation of correspondence matching in stereo images." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/94128290188731236059.

Full text
Abstract:
Master's thesis
National Dong Hwa University
Department of Computer Science and Information Engineering
92
We use a distributed system and three partition schemes to make a program achieve real-time performance. The three partition schemes are data partitioning, function partitioning, and pipeline partitioning. In this thesis, we analyze the advantages and disadvantages of the three schemes. For example, the advantage of data partitioning is that the communication cost between processors is low, but it is only suitable when the algorithm uses local data. Function partitioning can assign different tasks to different hardware, which makes more efficient use of the hardware, but it can only be used when there is no input/output dependency between tasks. Pipeline partitioning is easy to apply to a program and can raise the overall throughput, but it is only suitable for successive inputs and increases the response time of the system. Finally, we propose a strategy that integrates the three partition schemes to exploit the highest degree of parallelism and obtain the best throughput. In the field of computer vision, using two images to compute the depth of objects is a long-studied technique. Before computing the depth of objects in the images, we must compute the disparity of corresponding points; because of the massive computation required for matching corresponding points, this technique could not previously be applied in real time, which limited its applications. To compute the disparity of corresponding points in real time, we employ an efficient algorithm and a distributed system. The algorithm uses two calibrated images and a special data structure to compute the disparity of corresponding points.
APA, Harvard, Vancouver, ISO, and other styles
40

Rahm, Erhard, Hong-Hai Do, and Sabine Massmann. "Matching Large XML Schemas." 2004. https://ul.qucosa.de/id/qucosa%3A31966.

Full text
Abstract:
Current schema matching approaches still struggle with very large and complex schemas. Such schemas are increasingly written in the standard language W3C XML Schema, especially in E-business applications. The high expressive power and versatility of this schema language, in particular its type system and support for distributed schemas and namespaces, introduce new issues. In this paper, we study some of the important problems in matching such large XML schemas. We propose a fragment-oriented match approach to decompose a large match problem into several smaller ones and to reuse previous match results at the level of schema fragments.
APA, Harvard, Vancouver, ISO, and other styles
41

Do, Hong-Hai, Sergey Melnik, and Erhard Rahm. "Comparison of Schema Matching Evaluations." 2003. https://ul.qucosa.de/id/qucosa%3A32456.

Full text
Abstract:
Recently, schema matching has found considerable interest in both research and practice. Determining matching components of database or XML schemas is needed in many applications, e.g. for E-business and data integration. Various schema matching systems have been developed to solve the problem semi-automatically. While there have been some evaluations, the overall effectiveness of currently available automatic schema matching systems is largely unclear. This is because the evaluations were conducted in diverse ways making it difficult to assess the effectiveness of each single system, let alone to compare their effectiveness. In this paper we survey recently published schema matching evaluations. For this purpose, we introduce the major criteria that influence the effectiveness of a schema matching approach and use these criteria to compare the various systems. Based on our observations, we discuss the requirements for future match implementations and evaluations.
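The match-quality measures such evaluations rely on can be made concrete in a few lines; the proposed correspondences and the gold standard below are purely illustrative.

# Precision, recall and F-measure of a match result against a gold standard.
gold = {("author", "writer"), ("title", "book_title"), ("year", "pub_year")}
found = {("author", "writer"), ("title", "book_title"), ("isbn", "pub_year")}

tp = len(gold & found)                    # correctly found correspondences
precision = tp / len(found)
recall = tp / len(gold)
f_measure = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f-measure={f_measure:.2f}")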
APA, Harvard, Vancouver, ISO, and other styles
42

Huang, Nai-Lun, and 黃迺倫. "Efficient Pattern Matching Scheme in LZW Compressed Sequences." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/17797263375166320625.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Communication Engineering
94
Compressed pattern matching (CPM) is an emerging research field addressing the following problem: given a compressed sequence and a pattern, find the pattern occurrence(s) in the (uncompressed) sequence with minimal (or no) decompression. It can be applied directly to the detection of computer viruses and confidential information leakage in compressed files. In this thesis, we report our work on CPM in LZW compressed sequences. LZW is one of the most effective and most widely used compression algorithms. We propose a simple bitmap-based realization of the well-known Amir-Benson-Farach algorithm. We also generalize the algorithm to find all pattern occurrences (rather than just the first one) and to report their absolute positions in the uncompressed sequence. Experiments are conducted to compare the performance of our proposed generalization with the decompress-then-search scheme, and we found that our generalization is much faster. The memory space requirement of our generalization is compared with that of the Navarro-Raffinot scheme, an alternative CPM algorithm that can likewise be realized with bitmaps. Results show that our generalization has better space performance than the Navarro-Raffinot scheme for moderate and long patterns.
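For intuition about bitmap-based matching, the following sketch shows the classic Shift-And bit-parallel matcher on an uncompressed string; it is only an illustration of how pattern state can be tracked with bitmaps, not the compressed-domain Amir-Benson-Farach realization described in the thesis.

# Minimal Shift-And bit-parallel matcher (illustration of bitmap-based
# matching only; the thesis operates directly on LZW-compressed input).
def shift_and(text, pattern):
    m = len(pattern)
    # Precompute, for each character, a bitmask of its positions in the pattern.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    state, hits = 0, []
    for j, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & (1 << (m - 1)):
            hits.append(j - m + 1)  # absolute position of the occurrence
    return hits

print(shift_and("abracadabra", "abra"))  # [0, 7]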
APA, Harvard, Vancouver, ISO, and other styles
43

Huang, Lan-Ya. "An Exact String Matching Algorithms Using Hashing Scheme." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814285000.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Huang, Lan-Ya, and 黃蘭雅. "An Exact String Matching Algorithms Using Hashing Scheme." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/16933370591311289241.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Information Management
96
In this thesis, we consider how to solve the exact string matching problem: finding all locations of a pattern string P in a text string T. In general, string matching algorithms work in linear time and linear space; two well-known examples are the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore (BM) algorithm. We use a hashing scheme to solve the exact string matching problem. Our method is simple to implement, and our algorithm works in constant space. Experiments show that our algorithm outperforms the brute-force algorithm and the KMP algorithm.
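As a hedged illustration of a hashing-based exact matcher (the thesis's exact scheme may differ), a standard Rabin-Karp rolling hash with collision verification looks like this:

# Illustrative Rabin-Karp rolling-hash matcher; constant extra space,
# with candidate positions verified to rule out hash collisions.
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)           # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:                        # slide the window by one character
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits

print(rabin_karp("abracadabra", "abra"))    # [0, 7]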
APA, Harvard, Vancouver, ISO, and other styles
45

Rau, Shiun-Hau, and 饒訓豪. "An Ontology-Based Matching Scheme for Web Services." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/23813100033825312905.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Information Management
91
Automatic Web services discovery, matchmaking, composition, and execution will play an important role in future electronic commerce environments. Service providers need a mechanism to advertise their services to attract service users, and service users need a way to search for the best service provider that can meet their requirements. This paper focuses on the following questions: What language do service providers use to describe their services, and service users to describe their requirements? How can a user discover the needed Web services easily and efficiently? How does a service matchmaker match service requirements against advertisements to find the best providers? Currently, UDDI is the primary Web services registry; it allows searching for business entities or Web services by well-known identifiers, taxonomy, and keyword-based string matching, with very limited semantic search ability. In order to increase the precision of Web services searching, we propose a matchmaking scheme that applies Semantic Web technology. We then design and implement a service matchmaker based on the proposed algorithm and show how it may be applied in a Web service brokering system.
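A common way to realize such semantic matchmaking is to rank advertisements by the subsumption relation between requested and advertised concepts in a shared ontology. The sketch below uses a toy concept hierarchy and generic degree names (exact / plug-in / subsumes / fail) purely for illustration; it is not the thesis's actual algorithm or ontology.

# Toy sketch of ontology-based matchmaking: rank an advertised concept
# against a requested one by their subsumption relation in a small,
# hypothetical "is-a" hierarchy.
PARENT = {                      # child -> parent ("is-a" edges)
    "EconomyCarRental": "CarRental",
    "LuxuryCarRental": "CarRental",
    "CarRental": "VehicleRental",
}

def ancestors(concept):
    seen = []
    while concept in PARENT:
        concept = PARENT[concept]
        seen.append(concept)
    return seen

def degree_of_match(requested, advertised):
    if requested == advertised:
        return "exact"
    if requested in ancestors(advertised):
        return "plug-in"    # advertised is more specific than requested
    if advertised in ancestors(requested):
        return "subsumes"   # advertised is more general than requested
    return "fail"

print(degree_of_match("CarRental", "EconomyCarRental"))  # plug-in
print(degree_of_match("CarRental", "VehicleRental"))     # subsumes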
APA, Harvard, Vancouver, ISO, and other styles
46

Drumm, Christian, Matthias Schmitt, Hong-Hai Do, and Erhard Rahm. "QuickMig: automatic schema matching for data migration projects." 2007. https://ul.qucosa.de/id/qucosa%3A32494.

Full text
Abstract:
A common task in many database applications is the migration of legacy data from multiple sources into a new one. This requires identifying semantically related elements of the source and target systems and creating mapping expressions to transform instances of those elements from the source format to the target format. Currently, data migration is typically done manually, a tedious and time-consuming process that is difficult to scale to a high number of data sources. In this paper, we describe QuickMig, a new semi-automatic approach to determining semantic correspondences between schema elements for data migration applications. QuickMig advances the state of the art with a set of new techniques exploiting sample instances, domain ontologies, and reuse of existing mappings to detect not only element correspondences but also their mapping expressions. QuickMig further includes new mechanisms to effectively incorporate users' domain knowledge into the matching process. The results of a comprehensive evaluation using real-world schemas and data indicate the high quality and practicability of the overall approach.
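One way to illustrate the instance-exploiting idea (a generic sketch with hypothetical field names and sample values, not QuickMig's actual techniques) is to score candidate correspondences by the overlap of sample instance values:

# Hypothetical sketch of instance-based correspondence detection:
# score source/target fields by the Jaccard overlap of sample values.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

source_samples = {"cust_name": ["Smith", "Jones"], "cust_ctry": ["DE", "FR"]}
target_samples = {"CustomerName": ["Jones", "Miller"], "Country": ["DE", "US"]}

for s_field, s_vals in source_samples.items():
    best = max(target_samples, key=lambda t: jaccard(s_vals, target_samples[t]))
    print(s_field, "->", best, round(jaccard(s_vals, target_samples[best]), 2))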
APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Hui-Min. "An Exact String Matching Problem Using Data Encoding Scheme." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814110600.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Yuan-chung, and 王原中. "A Fast IP Lookup Scheme for Longest Prefix Matching." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/p48vus.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Institute of Electronic and Information Engineering
96
IP lookup is the chief bottleneck affecting the performance of current routers. A router examines the destination address (DA) of each incoming packet against its forwarding table, the longest-prefix-matching entry is chosen, and the next-hop decision for the packet is made. In this paper we design a scheme named IP Lookup based on Cover-list (IPLC) to construct the forwarding table. The cover list stores the covering relation among prefixes: whenever a prefix is covered, it is stored in a cover list together with all the prefixes covering it. Hence, when we perform an IP lookup for a DA, finding the longest matching prefix amounts to seeking the prefix that covers the DA and whose covering range is the narrowest in the cover list. The fundamental idea of this paper is that IP lookup can be performed on prefix ranges. IPLC is a 4-level scheme in which each node has room to store more than one prefix. The number of memory accesses during an IP lookup is restricted to at most 3, so the average time to search for a DA is reduced and high-performance search is achieved. IPLC has the following characteristics: about 11 million IP lookups per second in our experimental environment, fast updating of the forwarding table, no preprocessing time, and at most 3 levels accessed during a search. The experimental results show that IPLC outperforms MRT[20] and PIBT[21] in terms of search performance, update performance, and memory requirement.
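The covering-range view of longest-prefix matching can be illustrated with a naive scan: among the prefixes whose address range covers the destination address, pick the one with the narrowest range, i.e. the longest prefix. The sketch below is only this linear-scan illustration with hypothetical entries; it does not reproduce the 4-level IPLC structure or its cover lists.

# Naive illustration of longest-prefix matching as "narrowest covering
# range"; the real IPLC scheme organizes prefixes in cover lists.
import ipaddress

prefixes = {                       # hypothetical forwarding entries
    "10.0.0.0/8": "next-hop A",
    "10.1.0.0/16": "next-hop B",
    "10.1.2.0/24": "next-hop C",
}

def lookup(destination):
    da = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, hop in prefixes.items():
        net = ipaddress.ip_network(prefix)
        if da in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop   # narrower range covering the DA
    return best_hop

print(lookup("10.1.2.7"))   # next-hop C
print(lookup("10.9.9.9"))   # next-hop A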
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, Hui-Min, and 陳慧敏. "An Exact String Matching Problem Using Data Encoding Scheme." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/99302180071058684639.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Computer Science and Information Engineering
96
The traditional exact string matching problem is to find all locations of a pattern string of length m in a text of length n. We propose a new encoding method that shortens both the pattern and the text by substituting, for each substring delimited by a special character, its length; the encoding takes O(m+n) time. We then use an exact matching algorithm to solve the exact string matching problem on the encoded pattern and text. By using the encoding method, the pattern and text can be shortened to about 2/|Σ| of their original lengths, and in practice the reduction is even better. For instance, for an English-sentence pattern of length 50 and a text of length 200000, on average the pattern is shortened to 6% of its original length and the text to 12.4% of its original length. Thus, the exact matching can be done in a much shorter time.
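One hedged reading of this encoding (the thesis's exact construction may differ) is that each maximal substring between occurrences of a chosen special character, e.g. the space in English text, is replaced by its length:

# Hypothetical sketch of the length-encoding idea: replace each maximal
# run between occurrences of a special character by the run's length.
def length_encode(s, special=" "):
    out, run = [], 0
    for ch in s:
        if ch == special:
            out.append(str(run))
            out.append(special)
            run = 0
        else:
            run += 1
    out.append(str(run))
    return "".join(out)

text = "the quick brown fox jumps over the lazy dog"
pattern = "quick brown fox"
print(length_encode(text))     # "3 5 5 3 5 4 3 4 3"
print(length_encode(pattern))  # "5 5 3"
# Matching then runs on the much shorter encoded strings, with candidate
# positions verified against the original text.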
APA, Harvard, Vancouver, ISO, and other styles
50

Pin, Shou-Yu, and 賓少鈺. "A Fast IP Lookup Scheme for Longest-Matching Prefix." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/15738688847194388326.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Institute of Electronic and Information Engineering (Master's Program)
90
One of the key design issues for next-generation IP routers is the IP lookup mechanism. IP lookup is an important operation in a router: it finds the next hop of each incoming packet via the longest-prefix-match address in the routing table. In this paper, we propose an IP lookup mechanism in which the number of memory accesses for a lookup is one in the best case and four in the worst case. The forwarding table needed by our mechanism is small enough to fit in SRAM; for example, a large routing table with 40000 routing entries can be compacted into a forwarding table of 260 KBytes in our scheme. Moreover, the data structure of the forwarding table allows fast updates compared with other schemes, since it can be updated without reconstruction from scratch when the routing table changes. Keywords: IP lookup, routing table, CIDR, forwarding table, trie
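For intuition only, a naive binary trie that answers longest-prefix-match queries is sketched below with hypothetical prefixes; the forwarding-table compaction that lets the real structure fit in SRAM, and its fast update properties, are not reproduced here.

# Illustrative binary trie for longest-prefix matching; the paper's
# actual forwarding table is a compacted structure, which this naive
# sketch does not attempt to model.
class TrieNode:
    def __init__(self):
        self.children = [None, None]
        self.next_hop = None            # set if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for bit in prefix_bits:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def lookup(root, address_bits):
    node, best = root, None
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop        # remember the longest match so far
    return best

root = TrieNode()
insert(root, "0000101", "hop-1")         # hypothetical prefixes in bit form
insert(root, "000010100001", "hop-2")
print(lookup(root, "0000101000011111"))  # hop-2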
APA, Harvard, Vancouver, ISO, and other styles