Dissertations / Theses: 'Record linkage'

1

Larsen, Stasha Ann Bown. "Record Linkage." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/3833.

Abstract:

This document explains the different metrics involved in record linkage. There are two forms of record linkage: deterministic and probabilistic. We focus on probabilistic record linkage as used in merging and updating two databases. Record pairs are compared using character-based and phonetic-based similarity metrics to determine how closely they match. Performance measures are then calculated and Receiver Operating Characteristic (ROC) curves are constructed. Finally, an economic model is applied that returns the optimal tolerance level two databases should use to declare a record pair a match in order to maximize profit.
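The threshold-selection step described in this abstract can be sketched as follows. This is a minimal illustration, assuming each candidate record pair already carries a similarity score in [0, 1] and a known true-match label; the helper names and the linear profit function (`gain_tp`, `cost_fp`, `cost_fn`) are hypothetical stand-ins, not the economic model used in the thesis.

```python
from typing import List, Tuple


def confusion_counts(scores: List[float], labels: List[bool], threshold: float) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives when pairs
    with score >= threshold are declared matches."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return tp, fp, fn


def roc_points(scores, labels, thresholds):
    """Return (threshold, false positive rate, true positive rate) for each threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp, fp, _ = confusion_counts(scores, labels, t)
        fpr = fp / negatives if negatives else 0.0
        tpr = tp / positives if positives else 0.0
        points.append((t, fpr, tpr))
    return points


def best_threshold(scores, labels, thresholds, gain_tp=1.0, cost_fp=2.0, cost_fn=0.5):
    """Pick the threshold that maximises a hypothetical linear profit:
    gain per correct link minus the costs of false links and missed links."""
    def profit(t):
        tp, fp, fn = confusion_counts(scores, labels, t)
        return gain_tp * tp - cost_fp * fp - cost_fn * fn
    return max(thresholds, key=profit)


scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]
labels = [True, True, False, True, False, False]
print(roc_points(scores, labels, [0.5, 0.85]))
print(best_threshold(scores, labels, [0.5, 0.7, 0.85]))  # prints 0.85
```

Raising the threshold trades missed links for fewer false links; the profit weights decide where that trade-off lands.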

2

Pixton, Burdette N. "Improving Record Linkage Through Pedigrees." Diss., Brigham Young University, 2006. http://contentdm.lib.byu.edu/ETD/image/etd1398.pdf.

3

Honório, Frederico Santos. "Multi-domain record linkage platform." Master's thesis, Universidade de Aveiro, 2013. http://hdl.handle.net/10773/11750.

Abstract:

Master's in Computer and Telematics Engineering
This dissertation proposes and presents a technological framework for performing record linkage. The proposed approach begins by defining the sequence of tasks necessary for record linkage. Then, several methods for splitting these tasks into small work units are discussed. The solution architecture is based on multiple executors that carry out the work units in parallel, thereby making the process faster. Finally, the benefits of using this approach in different contexts are also presented.
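A rough sketch of splitting record linkage into small work units that independent executors process in parallel. The unit layout (fixed-size slices of the candidate-pair cross product), the `difflib` similarity ratio and the threshold are assumptions for illustration, not the platform's design. Threads keep the example self-contained; for CPU-bound comparisons in CPython, a process pool or separate worker machines would be needed for a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from itertools import product
from typing import List, Tuple

Pair = Tuple[int, int]


def make_work_units(n_left: int, n_right: int, unit_size: int) -> List[List[Pair]]:
    """Split the full cross product of record indices into fixed-size work units."""
    pairs = list(product(range(n_left), range(n_right)))
    return [pairs[i:i + unit_size] for i in range(0, len(pairs), unit_size)]


def run_unit(unit: List[Pair], left: List[str], right: List[str], threshold: float) -> List[Pair]:
    """Compare every pair in one work unit; keep pairs whose similarity reaches the threshold."""
    return [(i, j) for i, j in unit
            if SequenceMatcher(None, left[i].lower(), right[j].lower()).ratio() >= threshold]


def link_parallel(left, right, unit_size=4, threshold=0.85, workers=4) -> List[Pair]:
    """Dispatch work units to a pool of executors and merge their results."""
    units = make_work_units(len(left), len(right), unit_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda u: run_unit(u, left, right, threshold), units))
    return sorted(pair for unit_result in results for pair in unit_result)


left = ["Maria Silva", "João Santos", "Ana Costa"]
right = ["Ana Costa", "Maria Silva", "Pedro Nunes"]
print(link_parallel(left, right))
```

Because each unit is independent, the same split could be handed to remote executors; only the final merge needs to see all results.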

4

Welford, John Anthony. "Nominal record linkage: the development of computer strategies to achieve the family-based record linkage of nineteenth century demographic data." Thesis, n.p., 1989. http://ethos.bl.uk/.

5

Boyd, James Hutchison. "Record Linkage Techniques: Exploring and developing data matching methods to create national record linkage infrastructure to support population level research." Thesis, Curtin University, 2016. http://hdl.handle.net/20.500.11937/54163.

Abstract:

In a world where the growth in digital information and systems continues to expand, researchers have access to unprecedented amounts of data. These large and complex data reservoirs require creative, innovative and scalable tools to unlock the potential of this ‘big data’. Record linkage is a powerful tool in the ‘big data’ arsenal. This thesis demonstrates the value of national record linkage infrastructure and how this has been achieved for the Australian research community.

6

Nin Guerrero, Jordi. "Contributions to Record Linkage for Disclosure Risk Assessment." Doctoral thesis, Universitat Autònoma de Barcelona, 2008. http://hdl.handle.net/10803/5787.

Abstract:

Every day, a large amount of data is collected by statistical agencies. This fact, combined with the growth the Internet has experienced in recent years, makes one wonder whether one's confidential data is stored and distributed in a secure way.
In this framework, data protection methods are of great importance: it becomes crucial to anonymize confidential attributes so that they can be released in a private and secure manner. When a protection method is applied, a new and challenging problem arises: evaluating the privacy provided by that method. Re-identification techniques, such as record linkage methods, are among the most common techniques for evaluating the security of a protection method.
This thesis applies record linkage techniques to the calculation of the disclosure risk of a protection method. The aim of this application is to evaluate the security of a protection method in a real and fair way. The main contributions are:
· The definition of three specific record linkage techniques for evaluating two of the most common protection methods: rank swapping and microaggregation.
· The definition of an empirical disclosure risk measure for multivariate microaggregation.
· The development of new variants of rank swapping and microaggregation resistant to record linkage methods and disclosure risk measures defined in this thesis.
· The study of new disclosure risk scenarios. In particular, we have developed a record linkage method that applies aggregation functions to re-identify individuals when the intruder has no access to any of the original attributes of the protected data. We have also developed a framework for evaluating protection methods when they are applied to time series data, and for this scenario we defined a set of measures for evaluating information loss and disclosure risk.
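To make the disclosure-risk idea concrete, here is a sketch pairing fixed-size individual-ranking microaggregation with distance-based record linkage: each protected record is linked to its nearest original record, and the share of correct links serves as an empirical re-identification rate. This is a generic pairing under stated assumptions (Euclidean distance, a simple fixed-size grouping), not one of the three specialised linkage methods defined in the thesis.

```python
import math
from typing import List, Sequence


def microaggregate_column(values: Sequence[float], k: int) -> List[float]:
    """Fixed-size univariate microaggregation: sort the values, form groups of k
    consecutive values (the last group absorbs any remainder) and replace each
    value by its group mean."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    n_groups = max(1, n // k)
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            out[i] = mean
    return out


def microaggregate(records: List[List[float]], k: int) -> List[List[float]]:
    """Individual-ranking microaggregation: each attribute is aggregated independently."""
    protected_columns = [microaggregate_column(col, k) for col in zip(*records)]
    return [list(row) for row in zip(*protected_columns)]


def reidentification_rate(original: List[List[float]], protected: List[List[float]]) -> float:
    """Distance-based record linkage: link each protected record to its nearest
    original record (Euclidean distance) and return the share of correct links."""
    hits = 0
    for i, p in enumerate(protected):
        nearest = min(range(len(original)), key=lambda j: math.dist(p, original[j]))
        hits += nearest == i
    return hits / len(protected)


original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
for k in (1, 2, 4):
    print(k, reidentification_rate(original, microaggregate(original, k)))
```

With k = 1 the data are effectively unprotected and every record is re-identified; larger groups typically lower the rate at the cost of more information loss.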

7

Cournut, Pierre. "Identification model of musical works using record linkage." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-249713.

Abstract:

This thesis is based on a project that is part of IBM’s collaboration with a collecting rights organization that collects and distributes payments of authors’ rights. The project aimed to help this organization identify the rights beneficiaries of musical tracks listened to on online streaming platforms. Given as input a list of tracks described by metadata such as artist names, titles and listening statistics, the goal was to match each line with its corresponding element in the organization’s documentation. Since each broadcaster has its own music catalogue, it can sometimes be hard to find the correct match for each song. In practice, the organization has a dedicated team that manually handles some of the non-trivial cases. Their identification process focuses on the resources that account for 90% of the revenue of each listening report and achieves an identification rate of around 70% of the declared resources, which leaves a substantial number of tracks unprocessed and results in losses for the rights holders. In this thesis, we investigate the possibility of outperforming the current solution and design a new identification model that combines concepts and technologies from various fields, including search engines, string metrics and machine learning. First, the identification process used by the organization was reproduced and refined to quickly process the most trivial cases. On top of this, an identification algorithm relying on a machine learning framework was built to process non-trivial cases. This method showed very promising results: it achieves an identification rate and a false discovery rate of the same order as those of the current solution, without the use of a dedicated team of experts.

8

Dannelöv, Johannes. "Study on Record Linkage regarding Accuracy and Scalability." Thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-155357.

Abstract:

The idea of record linkage is to find records that refer to the same entity across different data sources. Record linkage goes by multiple synonyms, such as data matching, entity resolution, entity disambiguation and deduplication. Record linkage is useful in many practices, including data cleaning, data management, and business intelligence. Machine learning methods, including both unsupervised and supervised learning, have been applied to the problem of record linkage. The rise of the big data era has presented new challenges: the trade-off between accuracy and scalability raises several critical issues for the linkage process. The objective of this study is to present an overview of state-of-the-art machine learning algorithms for record linkage, compare them, and explore the optimization possibilities of these algorithms based on different similarity functions. The optimization is evaluated in terms of accuracy and scalability. Results showed that supervised classification algorithms, even with a relatively small training set, classified data sets in a shorter time and had approximately the same accuracy as their unsupervised counterparts.

9

Holmlund, Olof. "Evaluating Record Linkage Methods for Manifold Identity Detection." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-164720.

Abstract:

Record linkage is the process of linking two or more records in a database to the same real-world entity. These records do not share a common identifier, which makes connecting them a difficult task, since they can only be linked based on similarities in their data. The data can also contain errors such as misspellings or missing fields, further increasing the difficulty of the task. In this thesis, common methods for comparing records and finding duplicates are presented. Methods for increasing performance and reducing the computing power needed are also presented, to show how record linkage can be used with large amounts of data. Building on this knowledge, several experiments comparing these methods were conducted using two benchmark data sets: Freely Extensible Biomedical Record Linkage (FEBRL) and the North Carolina Voter Registration (NCVR) data set. The results show that different types of similarity measures can have similar performance, and that supervised methods provide better prediction rates than unsupervised methods. Finally, suggestions for future work and improvements are given.
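One common way to reduce the computing power needed is blocking: records are compared only when they share a blocking key. The sketch below assumes a hypothetical key (first letter of the surname plus birth year); the abstract does not say which keys or methods the thesis evaluates.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]


def blocking_key(rec: Record) -> str:
    """Hypothetical blocking key: first letter of the surname plus the birth year."""
    return rec["surname"][:1].upper() + rec["birth_year"]


def candidate_pairs(records: List[Record]) -> List[Tuple[int, int]]:
    """Standard blocking for deduplication: only records sharing a key are compared."""
    blocks: Dict[str, List[int]] = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    pairs: List[Tuple[int, int]] = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


records = [
    {"surname": "Smith", "birth_year": "1980"},
    {"surname": "Smyth", "birth_year": "1980"},
    {"surname": "Jones", "birth_year": "1975"},
    {"surname": "Johnson", "birth_year": "1975"},
    {"surname": "Brown", "birth_year": "1990"},
]
n = len(records)
print(len(candidate_pairs(records)), "of", n * (n - 1) // 2, "pairs compared")  # prints "2 of 10 pairs compared"
```

The saving comes at a price: an error in a key field (e.g., a wrong birth year) silently drops a true match from the candidate set, so blocking keys are best built from fields with few errors.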

10

Tušimová, Lucia. "Generování rodokmenů z matričních záznamů" [Generating Pedigrees from Parish Register Records]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417252.

Abstract:

This work discusses the field of genealogy, the different types of records, and the data they contain. The thesis describes the comparison of data and record linkage. It further discusses the design and implementation of the resulting system. The developed system connects people from parish records into larger pedigrees, which are then stored in a graph database. The success of the record interconnection was tested on the provided data sets.

11

Bauman, G. John. "Computation of Weights for Probabilistic Record Linkage Using the EM Algorithm." Diss., Brigham Young University, 2006. http://contentdm.lib.byu.edu/ETD/image/etd1361.pdf.

12

Snae, Chakkrit. "An investigation of computer based nominal data record linkage." Thesis, University of Liverpool, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.433784.

Abstract:

The Internet now provides access to vast volumes of nominal data (data associated with names, e.g., birth/death records, parish records, text articles, multimedia) collected for a range of different purposes. This research focuses on parish registers containing baptism, marriage, and burial records. Mining these data resources involves linkage: investigating how two records are related with regard to attributes such as surname, spatio-temporal location, legal association and inter-relationships. Furthermore, as well as handling the implicit constraints of nominal data, such a system must also be able to automatically handle a range of temporal and spatial rules and constraints. The research examines the linkage rules that apply and how such rules interact. This investigation reports on current practices in several disciplines (e.g., history, demography, genealogy, and epidemiology) and how these are implemented in current computer and database systems. The practical aspects of this study, and the proposed workbench approach, are centred on the extensive Lancashire & Cheshire Parish Register archive held on the MIMAS database computer at Manchester University. The research also proposes how these findings can have wider applications. This thesis describes some initial research into this problem. It describes three prototypes of a nominal data workbench that allow the specification and examination of several linkage types, and discusses the merits of alternative name matching methods, name grouping techniques and method comparisons. The conclusion is that, in the cases examined so far, effective nominal data linkage is essentially a query optimisation process. The process is made more efficient if linkage-specific indexes exist, and the work suggests that query re-organization based on these indexes, though a complex process, is entirely feasible.
To facilitate the use of indexes and to guide the optimization process, the work suggests the use of formal ontologies.
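Soundex is one widely used phonetic code for the kind of name grouping discussed here; the abstract does not say which name matching or grouping methods the prototypes used, so treat this as an illustrative assumption. The sketch implements American Soundex and uses it as a simple linkage-specific index mapping each code to the records that carry it.

```python
from collections import defaultdict
from typing import Dict, List

# American Soundex digit groups; vowels, y, h and w carry no digit.
_CODES = {letter: digit
          for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                 "4": "l", "5": "mn", "6": "r"}.items()
          for letter in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":
            # vowels reset `prev`, so a repeated digit after a vowel is coded again;
            # h and w leave `prev` unchanged, so digits on either side merge
            prev = digit
    return (code + "000")[:4]


def build_name_index(names: List[str]) -> Dict[str, List[int]]:
    """Group record positions by Soundex code so that spelling variants share a bucket."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, name in enumerate(names):
        index[soundex(name)].append(pos)
    return dict(index)


print(build_name_index(["Smith", "Smyth", "Jones", "Johns"]))
```

A lookup then only scans the bucket for the query's code instead of the whole register, which is the kind of index-driven query optimisation the abstract describes.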

13

Jupin, Joseph. "Temporal Graph Record Linkage and k-Safe Approximate Match." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/412419.

Abstract:

Computer and Information Science
Ph.D.
Since the advent of electronic data processing, organizations have accrued vast amounts of data in multiple databases with no reliable globally unique identifier. These databases were developed by different departments for different purposes at different times. Organizing and analyzing these data for human services requires linking records from all sources. Record linkage (RL) is a process that connects records referring to the same, or a sufficiently similar, entity across multiple heterogeneous databases. RL is a data- and compute-intensive, mission-critical process: it must be efficient enough to process big data and effective enough to provide accurate matches. We evaluated an RL system currently in use by a local health and human services department. We found that it used the typical approach proposed by Fellegi and Sunter, with tuple-by-tuple processing and Soundex as the primary approximate string matching method. Soundex has been found to be unreliable both as a phonetic and as an approximate string matching method. We also found that, in many cases, their data has more than one value per field, suggesting that the data were queried from a 5NF database. For example, a woman who has been married three times may have up to four last names on record. This query process produced more than one tuple per database/entity, apparently generating a Cartesian product of the data; in many cases, more than a dozen tuples were observed for a single database/entity. This approach is both ineffective and inefficient. An effective RL method should handle this multi-valued data without redundancy and use edit-distance for approximate string matching. However, due to its high computational complexity, edit-distance does not scale well to big data problems. We developed two methodologies for resolving these issues: PSH and ALIM.
PSH – The Probabilistic Signature Hash is a composite method that speeds up Damerau-Levenshtein (DL) edit-distance computation. It combines signature filtering, probabilistic hashing, length filtering and prefix pruning. It is also lossless: it does not lose any true positive matches.
ALIM – Aggregate Link and Iterative Match is a graph-based record linkage methodology that uses a multi-graph to store demographic data about people. ALIM performs string matching as records are inserted into the graph, eliminates data redundancy, and stores the relationships between data.
We tested PSH for string comparison and found it to be approximately 6,000 times faster than DL. We tested it against trie-join methods and found that they are up to 6.26 times faster but lose between 10 and 20 percent of true positives. We tested ALIM against a method currently in use by a local health and human services department and found that ALIM produced significantly more matches (even with more restrictive match criteria) and ran more than twice as fast. ALIM handles the multi-data problem, and PSH allows the use of edit-distance comparison in this RL model. ALIM is more efficient and effective than the currently implemented RL system. This model can also be extended to perform social network analysis and temporal data modeling. For human services, temporal modeling can reveal how policy changes and treatments affect clients over time, and social network analysis can determine the effects of these on whole families by facilitating family linkage.
Temple University--Theses
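The length-filtering component that PSH combines with other filters can be illustrated on its own. This sketch assumes the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, which allows adjacent transpositions; it does not reproduce PSH's signature filtering, probabilistic hashing or prefix pruning.

```python
from typing import List


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertions, deletions, substitutions and adjacent transpositions each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def approx_matches(query: str, candidates: List[str], k: int) -> List[str]:
    """Return candidates within distance k of the query. The length filter is
    lossless: every edit changes the length by at most one, so a length
    difference above k already implies a distance above k."""
    matches = []
    for cand in candidates:
        if abs(len(cand) - len(query)) > k:
            continue  # pruned without running the quadratic dynamic program
        if osa_distance(query, cand) <= k:
            matches.append(cand)
    return matches


print(approx_matches("Jonson", ["Johnson", "Jnoson", "Jones", "Jonathanson"], k=1))
```

Only candidates that survive the cheap filter pay for the O(mn) table, which is where filter-based speed-ups of this kind come from.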

14

Jensen, Krista Peine. "Probabilistic Methodology for Record Linkage Determining Robustness of Weights." BYU ScholarsArchive, 2004. https://scholarsarchive.byu.edu/etd/590.

Abstract:

Record linkage is the process that joins separately recorded pieces of information for a particular individual from one or more sources. To facilitate record linkage, a reliable computer-based approach is ideal. In genealogical research, computerized record linkage is useful for combining information about an individual across multiple censuses. In creating a computerized method for linking census records, it must be determined whether weights calculated from one geographical area can be used to link records from another geographical area. Research performed by Marcie Francis calculated field weights using census records from 1910 and 1920 for Ascension Parish, Louisiana. These weights are re-calculated to take into account population changes of the time period and then used on five data sets from different geographical locations to determine their robustness. HeritageQuest provided indexed census records for four states, California, Connecticut, Illinois and Michigan, in addition to Louisiana. Because the California record set was large and we desired at least five data sets for comparison, this state was split into two groups based on geographical location. Weights for Louisiana were re-calculated to take into account Visual Basic code modifications for the fields "Place of Origin", "Age" and "Location" (enumeration district). The validity of these weights was a concern due to the low number of known matches present in the Louisiana data set. Thus, to get a better feel for how weights calculated from a data source with a larger number of known matches would perform, weights were calculated for Michigan census records. Error rates obtained using weights calculated from the Michigan data set were lower than those obtained using Louisiana weights. In order to determine weight robustness, weights for Southern California were also calculated to allow for comparison between two samples.
Error rates obtained using Southern California weights were much lower than either of the previously calculated error rates. This led to the decision to calculate weights for each of the data sets, average them, and use the averaged weights to link each data set, taking into account fluctuations of the population between geographical locations. Error rates obtained using the averaged weights proved robust enough for use in any of the geographical areas sampled. The weights obtained in this project can be used when linking any census records from 1910 and 1920. When linking census records from other decades, new weights must be calculated to account for fluctuations specific to that time period.
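Field weights of the kind discussed here are commonly Fellegi-Sunter agreement and disagreement weights computed from m- and u-probabilities. The sketch below assumes that formulation, a plain unweighted average across geographical samples, and made-up m/u values; the abstract does not give the exact weighting formulas or figures used in the project.

```python
import math
from typing import Dict, List, Tuple

Weights = Dict[str, Tuple[float, float]]  # field -> (agreement weight, disagreement weight)


def field_weights(m: float, u: float) -> Tuple[float, float]:
    """Fellegi-Sunter weights from m = P(field agrees | true match) and
    u = P(field agrees | non-match)."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def average_weights(per_area: List[Weights]) -> Weights:
    """Unweighted mean of each field's weights across geographical samples."""
    n = len(per_area)
    return {field: (sum(w[field][0] for w in per_area) / n,
                    sum(w[field][1] for w in per_area) / n)
            for field in per_area[0]}


def match_score(agreement: Dict[str, bool], weights: Weights) -> float:
    """Sum agreement weights for fields that agree and disagreement weights otherwise."""
    return sum(weights[f][0] if agrees else weights[f][1] for f, agrees in agreement.items())


# Hypothetical m/u estimates for two areas (made-up numbers for illustration)
area_a = {"surname": field_weights(0.95, 0.01), "age": field_weights(0.80, 0.20)}
area_b = {"surname": field_weights(0.90, 0.02), "age": field_weights(0.70, 0.25)}
avg = average_weights([area_a, area_b])
print(match_score({"surname": True, "age": False}, avg))
```

A pair is then classified by comparing its total score with upper and lower thresholds; averaging smooths out area-specific fluctuations in the m and u estimates.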

15

MacLeod, Margaret Catriona Morag. "Record linkage: applied to a clinical trial and cohort study." Thesis, University of Glasgow, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.297030.

16

Brown, Adrian Paul. "Implementing Privacy-Preserving Record Linkage in a Cloud Computing Environment." Thesis, Curtin University, 2021. http://hdl.handle.net/20.500.11937/86133.

Abstract:

Increased demand for record linkage of administrative data, coupled with privacy risks, presents considerable challenges for many organisations. The primary objective of this research was to develop an operational cloud model for privacy-preserving record linkage utilising scalable computing infrastructure. The thesis presents and evaluates a cloud model for record linkage that links records without sharing personally identifying information, incorporating new techniques for improving accuracy, privacy and performance.
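The abstract does not say which encoding the cloud model uses. As an assumption, the sketch shows one widely used privacy-preserving technique: each name is split into character bigrams, the bigrams are hashed into a Bloom filter with a keyed HMAC, and encodings are compared with the Dice coefficient, so similar names can be matched without exchanging the names themselves. It is an illustration, not a hardened implementation; the filter size, hash count and key are hypothetical.

```python
import hashlib
import hmac
from typing import Set


def bigrams(value: str) -> Set[str]:
    """Character bigrams of a padded, lower-cased value ('ann' -> {'_a', 'an', 'nn', 'n_'})."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 1000, num_hashes: int = 20) -> Set[int]:
    """Encode a value as the set of bit positions set in a Bloom filter.
    Each bigram is hashed num_hashes times with a keyed HMAC, so the encoding
    cannot be reproduced without the shared secret."""
    bits: Set[int] = set()
    for gram in bigrams(value):
        for i in range(num_hashes):
            digest = hmac.new(secret, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient between two bit sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


key = b"shared-linkage-secret"  # hypothetical key agreed by the data custodians
cathy, kathy, rob = (bloom_encode(n, key) for n in ("Catherine", "Katherine", "Robert"))
print(round(dice(cathy, kathy), 2), round(dice(cathy, rob), 2))
```

The linkage unit only ever sees bit positions, yet "Catherine" and "Katherine" still score far higher than "Catherine" and "Robert" because they share most of their bigrams.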

17

Haque, Shovanur S. "Assessing the accuracy of record matching algorithms in data linkage." Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/123042/1/Shovanur_Haque_Thesis.pdf.

Abstract:

This thesis develops a Markov Chain-based Monte Carlo simulation approach (MaCSim), implemented in the R software, for assessing the accuracy of a linked file, and illustrates the utility of the approach using ABS (Australian Bureau of Statistics) synthetic data in realistic data settings. MaCSim can be used either to assess a linking method or to compare multiple linking methods. The accuracy results from MaCSim can inform decisions on a preferred linking method or on whether records are linkable at all. This will prove extremely important in applying analysis techniques that can adequately account for the errors associated with linkage.

18

Grzebala, Pawel B. "Private Record Linkage: A Comparison of Selected Techniques for Name Matching." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1461096562.

19

Shah, Pooja P. "Combination of a Probabilistic-Based and a Rule-Based Approach for Genealogical Record Linkage." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1353.

Abstract:

Record linkage is the task of identifying records within one or multiple databases that refer to the same entity. Many different approaches to record linkage currently exist; some incorporate heuristic rules, mathematical models, Markov models, or machine learning. This thesis focuses on the application of record linkage to genealogical records within family trees. Today, large collections of genealogical records are stored in databases, which may contain multiple records that refer to a single individual. Resolving duplicate genealogical records can extend our knowledge of who has lived, and more complete information can be constructed by combining all information referring to an individual. Simple string matching is not a feasible option for identifying duplicate records due to inconsistencies such as typographical errors, data entry errors, and missing data. Record linkage algorithms can be classified into two broad categories: rule-based (heuristic) approaches and probabilistic approaches. The Cocktail Approach, presented by Shirley Ong Ai Pei, combines a probabilistic approach with a rule-based approach for record linkage. This thesis discusses a re-implementation and adaptation of the Cocktail Approach to genealogical records.

20

Agha, Mohammad Mehdi. "Congenital abnormalities and childhood cancer, a record-linkage cohort study, Ontario, Canada." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0019/NQ53665.pdf.

21

Ashbury, Janet E. "Selective serotonin reuptake inhibitors (SSRIs) and breast cancer: a record linkage study." Thesis, Kingston, Ont.: [s.n.], 2008. http://hdl.handle.net/1974/971.

22

Li, Pei. "Linking records with value diversity." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2013. http://hdl.handle.net/10281/42976.

Abstract:

Most record linkage techniques assume that information about the underlying entities does not change and is provided in different representations, sometimes with errors. For example, mailing lists may contain multiple entries representing the same physical address, but each record may be slightly different, e.g., containing different spellings or missing some information. As a second example, consider a company that has different customer databases (e.g., one for each subsidiary). A given customer may appear in different ways in each database, and there is a fair amount of guesswork in determining which customers match. In practice, however, we often observe value diversity in real-world data sets for linkage. For example, many data sets contain temporal records over a long period of time; each record is associated with a time stamp and describes some aspects of a real-world entity at that particular time (e.g., author information in DBLP). In such cases, we often wish to identify records that describe the same entity over time, thereby enabling interesting longitudinal data analysis. Value diversity also exists in group linkage: linking records that refer to entities in the same group. Applications of group linkage include finding businesses in the same chain, finding conference attendees from the same affiliation, finding players from the same team, etc. In such cases, although different members of the same group can share some similar global values, they represent different entities and so can also have distinct local values, requiring a high tolerance for value diversity. However, most existing record linkage techniques assume that records describing the same real-world entity are fairly consistent and often focus on different representations of the same value, such as “IBM” and “International Business Machines”. Thus, they can fall short when values vary for the same entity.
This dissertation studies how to improve the linkage quality of integrated data with tolerance to fairly high value diversity, including temporal linkage and group linkage. We solve the problem of temporal record linkage in two ways. First, we apply time decay to capture the effect of elapsed time on the evolution of entity values. Second, instead of comparing each pair of records locally, we propose clustering methods that consider the time order of the records and make global decisions. Experimental results show that our algorithms significantly outperform traditional linkage methods on various temporal data sets. For group linkage, we present a two-stage algorithm: the first stage identifies cores containing records that are very likely to belong to the same group; the second stage collects strong evidence from the cores and leverages it to merge more records into the same group, while being tolerant to differences in other values. Our algorithm is designed to ensure efficiency and scalability. An experiment shows that it finished in 2.4 hours on a real-world data set containing 6.8 million records and obtained both a precision and a recall above 0.95. Finally, we build the CHRONOS system, which offers users a useful tool for finding real-world entities over time and understanding the history of entities in the bibliography domain. The core of CHRONOS is a temporal record-linkage algorithm that is tolerant to value evolution over time. Our algorithm can obtain an F-measure of over 0.9 in linking author records and can fix errors made by DBLP. We show how CHRONOS allows users to explore the history of authors, and how it helps users understand our linkage results by comparing them with those of existing systems, highlighting differences in the results, explaining our decisions to users, and answering “what-if” questions.
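The time-decay idea can be sketched as a similarity in which disagreements count less as the time gap between two records grows. The exponential form and the five-year half-life below are hypothetical choices for illustration; the dissertation's decay functions are not specified in this abstract.

```python
from typing import List


def disagreement_decay(delta_years: float, half_life: float = 5.0) -> float:
    """Hypothetical decay: the chance that an entity's value has legitimately
    changed after delta_years, modelled as 1 - 0.5 ** (delta / half_life)."""
    return 1.0 - 0.5 ** (delta_years / half_life)


def temporal_similarity(a: dict, b: dict, attrs: List[str], half_life: float = 5.0) -> float:
    """Share of agreeing attributes, where each disagreement is down-weighted by
    the decay: the longer the time gap, the less a changed value counts against a match."""
    decay = disagreement_decay(abs(a["year"] - b["year"]), half_life)
    agree = sum(1 for attr in attrs if a[attr] == b[attr])
    disagree = len(attrs) - agree
    denominator = agree + (1.0 - decay) * disagree
    return agree / denominator if denominator else 0.0


r2000 = {"year": 2000, "name": "J. Smith", "affiliation": "UNC"}
r2000b = {"year": 2000, "name": "J. Smith", "affiliation": "UCLA"}
r2010 = {"year": 2010, "name": "J. Smith", "affiliation": "UCLA"}
attrs = ["name", "affiliation"]
print(temporal_similarity(r2000, r2000b, attrs), temporal_similarity(r2000, r2010, attrs))
```

The same affiliation change is strong evidence against a match within a single year but weak evidence after a decade, when a move between institutions is plausible.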

23

Nasseh, Daniel. "Einsatz und Optimierung einer überwachten Klassifizierungsmethode im Kontext eines Privacy-Preserving-Record-Linkage [Use and optimization of a supervised classification method in the context of privacy-preserving record linkage]." Diss., Ludwig-Maximilians-Universität München, 2014. http://nbn-resolving.de/urn:nbn:de:bvb:19-178141.

24

Kotze, E., and T. McDonald. "A longitudinal patient record for patients receiving antiretroviral treatment." Journal for New Generation Sciences, Vol 10, Issue 1: Central University of Technology, Free State, Bloemfontein, 2012. http://hdl.handle.net/11462/598.

Abstract:

Published Article
In response to the Human Immunodeficiency Virus (HIV) epidemic in the country, the South African Government began providing Antiretroviral Therapy (ART) in the public health sector. Monitoring and evaluating the effectiveness of the ART programme is of the utmost importance. The existing patient information system could not supply the information required to manage the rollout of the ART programme. A data warehouse, consisting of several data marts, was developed that integrated several disparate systems related to HIV/AIDS/ART into one system. It was, however, not possible to trace a patient across all the data marts in the data warehouse: no unique identifiers existed for the patient records in the different data marts, and the marts also had different structures. Record linkage, in conjunction with a mapping process, was used to link all the data marts and thereby identify the same patient in all of them. This resulted in a longitudinal patient record for each ART patient that displayed all the treatments received by the patient in all public health care facilities in the province.
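
The mapping-then-linkage step described above can be sketched as follows. The article does not publish its schemas or matching rules, so this is only a minimal illustration under stated assumptions: two hypothetical data marts (`pharmacy`, `laboratory`) with made-up column names, birth dates already stored in one format, and exact agreement on normalized surname, first name, and birth date as the linkage rule. Linking real patient records would usually also need fuzzy comparisons or probabilistic weights.

```python
import unicodedata

# Hypothetical mappings from each data mart's columns to one common schema.
FIELD_MAPS = {
    "pharmacy": {"SURNAME": "surname", "FIRST_NAME": "first_name", "DOB": "birth_date"},
    "laboratory": {"last": "surname", "given": "first_name", "date_of_birth": "birth_date"},
}


def normalize(text: str) -> str:
    """Upper-case, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.upper().split())


def to_common_schema(mart: str, row: dict) -> dict:
    """Mapping step: rename a mart-specific row into the common schema."""
    return {common: normalize(str(row[source])) for source, common in FIELD_MAPS[mart].items()}


def link_patients(marts: dict) -> list:
    """Linkage step: give records with the same normalized key the same patient ID."""
    key_to_patient = {}
    linked = []
    for mart, rows in marts.items():
        for row in rows:
            record = to_common_schema(mart, row)
            key = (record["surname"], record["first_name"], record["birth_date"])
            # New keys get the next ID; known keys reuse the existing one.
            patient_id = key_to_patient.setdefault(key, len(key_to_patient) + 1)
            linked.append((patient_id, mart, record))
    return linked


if __name__ == "__main__":
    marts = {
        "pharmacy": [{"SURNAME": "Mokoena", "FIRST_NAME": "Thabo", "DOB": "1975-03-02"}],
        "laboratory": [
            {"last": "MOKOENA ", "given": "thabo", "date_of_birth": "1975-03-02"},
            {"last": "Dlamini", "given": "Zanele", "date_of_birth": "1980-11-20"},
        ],
    }
    for patient_id, mart, record in link_patients(marts):
        print(patient_id, mart, record["surname"])
```

Keeping the mapping separate from the matching means a new data mart only needs a new entry in the field map.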

25

Davies, Hywel Rhodri. "Nominal record linkage of historical data : procedures and applications in a North Wales parish." Thesis, University of Southampton, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238880.

26

ALICANDRO, GIANFRANCO. "SOCIOECONOMIC INEQUALITY IN PREMATURE MORTALITY IN ITALY: A NATIONAL CENSUS-BASED RECORD LINKAGE STUDY." Doctoral thesis, Università degli Studi di Milano, 2019. http://hdl.handle.net/2434/612572.

Abstract:

Background: Socioeconomic status (SES) is a well-recognized determinant of health. A high prevalence of risk factors for non-communicable diseases and reduced access to early diagnosis and effective treatment have historically been thought to be the main mechanisms underlying the relationship between low SES and poor health. However, the phenomenon is more complex and also involves psychosocial factors, such as stress, depression, financial difficulties, lack of social support, and low job control, all of which are risk factors for poor health. Today, inequalities in health represent a major challenge for health policies, having a high social, ethical, and economic impact even in high-income countries. This is particularly true during economic recessions, when unemployment and financial problems are expected to affect more people with medium or low SES. However, results on the impact of macroeconomic changes on socioeconomic inequalities in health are controversial, and substantial differences exist among European countries, with higher inequality in northern and eastern Europe than in southern European countries. In Europe, the evidence on inequalities in mortality comes mainly from national longitudinal studies, either census-linked or unlinked, whereas in Italy most of the data are based only on urban areas. The lack of a national study precludes a comprehensive analysis of socioeconomic inequalities in mortality in Italy that could measure the impact of SES on cause-specific mortality and evaluate within-country geographic differences and interactions with other variables. Aim: The study aimed to quantify socioeconomic inequality in premature mortality in absolute and relative terms in Italy, also considering geographic differences and the role of other variables, such as marital status, size of the municipality, and social and material vulnerability of the municipality of residence.
Methods: The study was based on the record linkage of national administrative databases, including the 2011 census and the mortality registries. Each death that occurred in Italy from the census date (9 October 2011) onwards was linked to the census using the tax identification number as the linkage key. This made it possible to conduct a cohort study based on all Italian residents. In this report, the mortality registries for the period 2011-2015 were linked to the 2011 census, and the individuals alive on 1 January 2012 were included in the cohort. Education and occupation were used to determine the SES of the individual. Four levels of education were considered: no education or primary school, middle school, high school, and university. Occupation-based social class was obtained using the Erikson-Goldthorpe scheme with the following classes: non-skilled manual workers, skilled manual workers, farmers, self-employed workers, routine non-manual workers, and upper non-manual workers. Relative inequality was measured by computing the age-adjusted mortality rate ratio (MRR) and the relative index of inequality (RII), whereas absolute inequality was measured by calculating the slope index of inequality (SII). The MRR and the RII were estimated by fitting multiplicative Poisson regression models, whereas the SII was estimated by fitting additive Poisson models. RII and SII were obtained by regressing the mortality rate of SES groups on a specific measure of their relative position in the social hierarchy: the socioeconomic rank, i.e., the proportion of the population that has a higher position, scaled to take values between 0 (highest rank) and 1 (lowest rank). The level of education was used to obtain the socioeconomic rank. RII and SII express the magnitude of socioeconomic inequality in relative and absolute terms, respectively, by providing a single estimate of the inequality that can be used for comparisons within the same population or between different populations.
The resulting figures can be interpreted as the ratio (for RII) or difference (for SII) of mortality rates between those at the bottom and those at the top of the social hierarchy. RII and SII were used to rank the causes of death by relative and absolute inequality. Results: A total of 35,708,445 subjects aged between 30 and 74 years were included in the study. In four years of follow-up, 573,335 deaths were registered over 137,847,954 person-years at risk. A low level of education and a less prestigious job had a negative effect on overall premature mortality and on mortality from most of the causes of death considered in this study. Compared with men with the highest level of education (university graduates), the MRR from all causes was 1.30 (95% CI: 1.10-1.53) for men with a high school diploma, 1.64 (95% CI: 1.40-1.92) for those with a middle school diploma, and 1.93 (95% CI: 1.65-2.27) for those with no education or a primary school certificate. Compared with women with the highest level of education (university graduates), the MRR from all causes was 1.14 (95% CI: 1.01-1.29) for women with a high school diploma, 1.31 (95% CI: 1.16-1.48) for those with a middle school diploma, and 1.44 (95% CI: 1.28-1.63) for those with no education or a primary school certificate. Compared with men in the upper non-manual class, the MRR from all causes was 1.24 (95% CI: 1.18-1.30) among routine non-manual workers, 1.31 (95% CI: 1.24-1.38) among the self-employed, 1.48 (95% CI: 1.35-1.63) among farmers, 1.37 (95% CI: 1.30-1.45) among skilled manual workers, and 1.63 (95% CI: 1.55-1.71) among non-skilled manual workers. In women, all the other classes showed only a slight increase in mortality compared with upper non-manual workers, the only exception being farmers, who had comparable mortality rates.
The MRR was 1.07 (95% CI: 1.02-1.13) among routine non-manual workers, 1.14 (95% CI: 1.06-1.23) among the self-employed, 1.03 (95% CI: 0.89-1.19) among farmers, 1.08 (95% CI: 0.98-1.20) among skilled manual workers, and 1.09 (95% CI: 1.03-1.16) among non-skilled manual workers. Socioeconomic inequality in all-cause mortality was higher in men than in women, in both relative (RII for men: 2.07, 95% CI: 1.81-2.37; RII for women: 1.51, 95% CI: 1.35-1.68) and absolute terms (SII for men: 373 deaths per 100,000 person-years, 95% CI: 327-419; SII for women: 113 deaths per 100,000 person-years, 95% CI: 88-138). In relative terms, the causes of death with the highest inequality were laryngeal cancer (RII: 5.69, 95% CI: 4.54-7.15), chronic liver diseases (RII: 5.03, 95% CI: 3.72-6.80), chronic lower respiratory diseases (RII: 4.83, 95% CI: 3.59-6.50), and HIV/AIDS (RII: 4.77, 95% CI: 3.11-7.31) among men, and diabetes (RII: 5.75, 95% CI: 4.48-7.37), HIV/AIDS (RII: 4.33, 95% CI: 2.55-7.38), and chronic liver diseases (RII: 3.47, 95% CI: 2.71-4.44) among women. The causes of death with the highest absolute socioeconomic inequality were circulatory system diseases (SII: 85 deaths per 100,000 person-years, 95% CI: 76-94) and lung cancer (SII: 58 deaths, 95% CI: 52-64) among men, and circulatory system diseases (SII: 43 deaths, 95% CI: 37-49) and diabetes (SII: 12 deaths, 95% CI: 10-14) among women. Socioeconomic inequality in all-cause mortality was higher among single (RII in men: 3.24, 95% CI: 2.68-3.92; RII in women: 2.71, 95% CI: 2.11-3.49) and separated or divorced individuals (RII in men: 2.58, 95% CI: 2.30-2.58; RII in women: 1.67, 95% CI: 1.26-1.50) than among married individuals (RII in men: 1.80, 95% CI: 1.63-1.98; RII in women: 1.42, 95% CI: 1.32-1.53).
People living in large municipalities (≥50,000 residents) showed a higher level of socioeconomic inequality (RII in men: 2.42, 95% CI: 2.09-2.79; RII in women: 1.68, 95% CI: 1.52-1.86) than those living in small municipalities (<2,000 residents) (RII in men: 1.88, 95% CI: 1.66-2.14; RII in women: 1.40, 95% CI: 1.12-1.62). Women living in municipalities with high social and material vulnerability showed higher socioeconomic inequality in overall mortality (RIIs: 1.70, 95% CI: 1.51-1.91 and 1.34, 95% CI: 1.19-1.49 for those living in the last and first fifths of the distribution of the vulnerability index of the municipality of residence, respectively), whereas the estimates among men overlapped. Socioeconomic inequality in mortality from circulatory system diseases and diabetes was greater in women from southern Italy, whereas there were no substantial geographic differences in men. The RIIs for all circulatory system diseases were 2.73 (95% CI: 2.39-3.12) in women living in the South, 1.86 (95% CI: 1.55-2.22) in those living in the Center, and 2.01 (95% CI: 1.72-2.36) in those living in the North. The RIIs for diabetes were 6.21 (95% CI: 4.80-8.08) in women living in the South, 4.32 (95% CI: 3.33-5.61) in those living in the Center, and 3.71 (95% CI: 2.78, 1.72-4.95) in those living in the North of the country. Conclusions: The successful linkage of national databases made it possible, for the first time, to provide a comprehensive picture of socioeconomic inequality in mortality in Italy. Socioeconomic inequality in premature mortality is still a major public health problem in Italy. It is more pronounced among some groups of the population, such as single, separated, and divorced individuals, those living in large municipalities, and women living in southern Italy or in municipalities with high social and material vulnerability.
Lung cancer (in men), circulatory system diseases (in both sexes) and diabetes (in women) are the major contributors to the absolute socioeconomic inequality in Italy. The findings of this study will have important implications for planning policies to reduce the existing disparities in mortality in Italy.
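
The RII and SII described in the Methods can be illustrated with a simplified sketch. It is not the study's code. It assumes the usual midpoint convention for the socioeconomic rank (share of the population in higher groups plus half of the group's own share), ignores the age adjustment used in the study, and fits both Poisson models by plain Newton-Raphson. The multiplicative model log(rate) = a + b * rank gives RII = exp(b); the additive model rate = a + b * rank gives SII = b.

```python
import math


def socioeconomic_rank(group_sizes):
    """Midpoint rank of each group, with groups ordered from highest to lowest SES.

    Rank = share of the population in higher groups + half of the group's own
    share, so values lie between 0 (top of the hierarchy) and 1 (bottom).
    """
    total = float(sum(group_sizes))
    ranks, above = [], 0.0
    for size in group_sizes:
        share = size / total
        ranks.append(above + share / 2)
        above += share
    return ranks


def _newton(terms, a, b, valid=lambda a, b: True, max_iter=100, tol=1e-12):
    """Newton-Raphson for a concave two-parameter log-likelihood.

    terms(a, b) returns the score (ua, ub) and the negative Hessian (jaa, jab, jbb).
    A step is halved while `valid` rejects the proposed point.
    """
    for _ in range(max_iter):
        (ua, ub), (jaa, jab, jbb) = terms(a, b)
        det = jaa * jbb - jab * jab
        da = (jbb * ua - jab * ub) / det
        db = (jaa * ub - jab * ua) / det
        while not valid(a + da, b + db):
            da, db = da / 2, db / 2
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def rii(deaths, person_years, ranks):
    """Multiplicative Poisson model log(rate) = a + b * rank; RII = exp(b)."""
    def terms(a, b):
        mu = [py * math.exp(a + b * r) for py, r in zip(person_years, ranks)]
        resid = [d - m for d, m in zip(deaths, mu)]
        score = (sum(resid), sum(r * e for r, e in zip(ranks, resid)))
        info = (sum(mu), sum(r * m for r, m in zip(ranks, mu)),
                sum(r * r * m for r, m in zip(ranks, mu)))
        return score, info

    a0 = math.log(sum(deaths) / sum(person_years))
    _, b = _newton(terms, a0, 0.0)
    return math.exp(b)


def sii(deaths, person_years, ranks):
    """Additive Poisson model rate = a + b * rank; SII = b (deaths per person-year)."""
    def terms(a, b):
        rates = [a + b * r for r in ranks]
        resid = [d / lam - py for d, lam, py in zip(deaths, rates, person_years)]
        w = [d / lam ** 2 for d, lam in zip(deaths, rates)]
        score = (sum(resid), sum(r * e for r, e in zip(ranks, resid)))
        info = (sum(w), sum(r * x for r, x in zip(ranks, w)),
                sum(r * r * x for r, x in zip(ranks, w)))
        return score, info

    def positive(a, b):
        return all(a + b * r > 0 for r in ranks)  # rates must stay positive

    a0 = sum(deaths) / sum(person_years)
    _, b = _newton(terms, a0, 0.0, valid=positive)
    return b


if __name__ == "__main__":
    ranks = socioeconomic_rank([250_000] * 4)  # four equal-sized education groups
    person_years = [1_000_000.0] * 4
    # Synthetic deaths: rate doubles from the top to the bottom of the hierarchy.
    deaths = [py * 0.002 * 2 ** r for py, r in zip(person_years, ranks)]
    print(round(rii(deaths, person_years, ranks), 4))  # 2.0
    # Synthetic deaths: rate rises by 0.003 per person-year from top to bottom.
    deaths = [py * (0.002 + 0.003 * r) for py, r in zip(person_years, ranks)]
    print(round(sii(deaths, person_years, ranks) * 100_000, 1))  # 300.0
```

Because the synthetic deaths lie exactly on each model, the fits recover the planted RII of 2 and SII of 300 deaths per 100,000 person-years. This matches the abstract's reading of RII and SII as the bottom-to-top ratio and difference in rates.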

27

Johansson, Lars Age. "Targeting Non-obvious Errors in Death Certificates." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis : Universitetsbiblioteket [distributor], 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8420.

28

Lee, Yiu-fai. "Analysis for segmental sharing and linkage disequilibrium: a genomewide association study on myopia." E-thesis via HKUTO, 2009. http://sunzi.lib.hku.hk/hkuto/record/B43912217.

29

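Suzuki, Katia Mitiko Firmino. "O uso de método de relacionamento de dados (record linkage) para integração de informação em sistemas heterogêneos de saúde: estudo de aplicabilidade entre níveis primário e terciário [The use of a data linkage method (record linkage) to integrate information in heterogeneous health systems: an applicability study between primary and tertiary levels]." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/17/17138/tde-23092013-103026/.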

Abstract:

Record linkage originated in the area of public health and is currently applied in several other areas, such as epidemiology, medical research, the establishment of clinical trials, marketing, customer relationship management, fraud detection, law enforcement, and government administration. The technique consists of comparing two or more records in different databases, and its key strategies are manual comparison, Deterministic Record Linkage (DRL), and Probabilistic Record Linkage (PRL). This study aimed to apply record linkage to heterogeneous databases used by the health care network of Ribeirão Preto and to identify the best strategy for integrating databases in health care. The databases evaluated in this study were those of the Municipal Health Department of Ribeirão Preto (SMS-RP) and of the Clinical Hospital of the School of Medicine of Ribeirão Preto (HCFMRP/USP). The inclusion criterion was limited to records of patients whose reported municipality of residence was Ribeirão Preto and who had received care at the Basic District Health Unit (UDBS) - School Health Center "Joel Domingos Machado" (CSE-Sumarezinho) between January 2006 and August 2008 and at the HCFMRP/USP. A simple random sample was selected, resulting in a set of 1,100 patient records in the CSE-Sumarezinho database and 370,375 records in the HCFMRP/USP database. Four linking variables were then selected (name, mother's name, gender, and birth date). The strategies adopted were exact DRL, DRL allowing disagreement in one linking variable, DRL based on similarity functions (Dice, Levenshtein, Jaro, and Jaro-Winkler), and, finally, PRL.
Exact DRL resulted in 334 matched records, and the strategies allowing disagreement in one variable resulted in 335, 343, 383, and 495 matched records when the discordant variable was gender, birth date, name, and mother's name, respectively. Among the similarity functions, Jaro-Winkler and Jaro stood out. Regarding the accuracy of the methods applied, PRL obtained the best sensitivity and specificity (sensitivity = 97.75% (95% CI 96.2 to 98.8) and specificity = 98.55% (95% CI 97.0 to 99.4)), followed by DRL with the Jaro-Winkler similarity function (sensitivity = 91.3% (95% CI 88.7 to 93.4) and specificity = 99% (95% CI 97.6 to 99.7)) and then by Jaro (sensitivity = 73.1% (95% CI 69.4 to 76.6) and specificity = 99.6% (95% CI 98.5 to 99.9)). The area under the ROC curve for PRL differed significantly (p = 0.0001) from those of DRL allowing disagreement in mother's name, Jaro-Winkler, and Jaro. The results indicate that PRL is the most accurate of the techniques evaluated. However, the techniques using the Jaro-Winkler and Jaro similarity functions are also viable alternatives because of their ease of use, despite having slightly lower sensitivity than PRL.
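
Since Jaro and Jaro-Winkler stood out among the similarity functions compared above, a minimal sketch of the standard Jaro-Winkler similarity may help. It uses the conventional defaults (prefix scaling factor p = 0.1, common prefix capped at four characters). The abstract does not state the parameters or the agreement thresholds used in the study, so these are assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: share of matching characters, penalized by transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    out_of_order, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2  # transpositions
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix bonus suits name fields, where typing errors tend to occur later in the string rather than in the first few letters.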

30

Wong, Wing-kong. "Landscape linkage along the edge: waterfront design at Shau Kei Wan typhoon shelter." E-thesis via HKUTO, 2009. http://sunzi.lib.hku.hk/hkuto/record/B42664378.

Abstract:

Thesis (M. L. A.)--University of Hong Kong, 2009.
Includes special report study entitled: Treatments of the tidal edge for appreciation. Includes bibliographical references. Also available in print.

31

Kinnear, H. R. "Describing and explaining variations in breast screening uptake in Northern Ireland : a census record-linkage study." Thesis, Queen's University Belfast, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.546370.

32

Borgs, Christian [author], and Rainer [academic supervisor] Schnell. "Optimal Parameter Choice for Bloom Filter-based Privacy-preserving Record Linkage / Christian Borgs; Supervisor: Rainer Schnell." Duisburg, 2019. http://d-nb.info/1193591090/34.

33

Russ, Thomas Charles. "Integrated investigation of dementia risk factors : insights from geography, record linkage, and individual participant meta-analysis." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8823.

Abstract:

Dementia is a public health priority, and its importance is projected to increase in coming decades, particularly in low- to middle-income countries. A description of the methodological challenges of observational studies and the limitations of previous attempts to combine the published literature leads me to discuss the ascertainment of dementia cases and the suitability of dementia mortality as an outcome. I report the findings of a memory clinic study in which 71.5% of 502 deceased individuals with probable Alzheimer dementia had dementia correctly recorded on their death certificate, an improvement on similar results from two decades earlier. I review the evidence for geographical variation in dementia and discuss the implication that such variation might point towards potentially modifiable risk or protective factors for dementia. I have attempted to overcome the methodological challenges alluded to above by examining only within-study comparisons. A meta-analysis of rural-urban comparisons reveals some evidence of increased prevalence (odds ratio; 90% confidence interval (CI): 1.11; 0.79, 1.57) and incidence (1.20; 0.84, 1.71) of dementia in rural areas. These associations were stronger for Alzheimer dementia, and particularly so in studies that identified early-life rural residence (prevalence 2.22; 1.19, 4.16; incidence 1.64; 1.08, 2.50). Since there are no effective treatments, there is an obvious need to focus on prevention and an urgent need to improve our understanding of the aetiology of dementia in order to attempt to prevent or delay its onset. However, it is clear that prevention must begin sufficiently early in life to have an effect; intervening in later life might be too late. I describe a body of work using the Health Survey for England cohort studies examining the association between a series of risk factors and later dementia-related death, including cardiovascular disease risk factors, psychological distress, and socioeconomic status.
For example, there is a dose-response relationship between increasing psychological distress and dementia death (12-item General Health Questionnaire score 1-3 vs 0 age- and sex-adjusted hazard ratio; 95% CI: 1.44; 1.17, 1.78; score 4-12 vs 0: 1.74; 1.36, 2.22). I conclude by summarising the contribution these publications have made to the field of dementia epidemiology and by outlining ongoing and future projects building on the work presented in this thesis.

34

Leepe, Khadija Akter. "Felligi-Sunter Mixed Model And Beta Record Linkage Approach To Integrate Business Data With Social Data." Thesis, Örebro universitet, Handelshögskolan vid Örebro Universitet, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-67993.

35

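Peres, Stela Verzinhasse. "Uso da técnica de linkage nos sistemas de informação em saúde: aplicação na base de dados do Registro de Câncer de base populacional do município de São Paulo [Use of the linkage technique in health information systems: application to the database of the population-based cancer registry of the municipality of São Paulo]." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/6/6132/tde-27032012-173312/.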

Abstract:

The availability of large computerized health databases has made record linkage a viable alternative for different study designs. This technique generates a more complete database at low operational cost. Objective: To investigate the possibility of completing/improving information from the RCBP-SP database for the period 1997 to 2005, using record linkage with three other databases, namely the Mortality Improvement Program (PRO-AIM), Authorization of Highly Complex Procedures (APAC-SIA/SUS), and State System of Data Analysis (FSeade), and comparing different strategies. Methods: We used the RCBP-SP database, composed of 343,306 incident cancer cases in the Municipality of São Paulo registered between 1997 and 2005, with ages ranging from under one to 106 years, of both sexes. To complete the RCBP-SP database, three databases were used, namely PRO-AIM, APAC-SIA/SUS, and FSeade. Both probabilistic and deterministic record linkage were used. Probabilistic linkage was performed using the Reclink III software, version 3.1.6. For deterministic record linkage, the routines were implemented in Visual Basic, with the databases hosted on SQL Server. Crude incidence rates (CIR) and crude mortality rates (CMR) were calculated before and after record linkage. Overall survival analysis was performed using the Kaplan-Meier technique, and curves were compared using the log-rank test. To determine the score cut-off with the greatest precision in identifying true matches, we calculated the area under the curve, as well as sensitivity and specificity. Results: After record linkage, there was a gain of 101.5 per cent for the variable address, 31.5 per cent for death date, and 80.0 per cent for the date of latest information.
As for the variable mother's name, before record linkage this information was present in only 0.5 per cent of the RCBP-SP records; overall, it was completed in 76,332 records. The overall survival analysis showed that, before record linkage, the probability of being alive was underestimated for all periods assessed. In general, for survival truncated at seven years, the probability of being alive at the first year of follow-up was lower before record linkage than after it (48.8 per cent vs. 61.1 per cent; p < 0.001). Conclusion: Both probabilistic and deterministic record linkage were effective for completing/improving information in the RCBP-SP database. Moreover, the CIR increased by 3.4 per cent and the CMR by 25.8 per cent. After record linkage, it was found that the values for overall survival had been underestimated for both sexes, all age groups, and all cancer sites.
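
The abstract chooses the linkage score cut-off from the area under the curve, sensitivity, and specificity, but does not name the selection rule. As an assumption, the sketch below uses Youden's J (sensitivity + specificity - 1) to pick the cut-off, given reference labels for a set of candidate pairs (for example, from clerical review). The function name `choose_cutoff` and the example scores and labels are hypothetical.

```python
def choose_cutoff(scores, is_true_pair):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A candidate pair is classified as a match when its score >= cutoff.
    Ties in J keep the lowest cutoff.
    """
    positives = sum(1 for y in is_true_pair if y)
    negatives = len(is_true_pair) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both true and false pairs to estimate a cut-off")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_true_pair) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, is_true_pair) if not y and s < cutoff)
        sensitivity, specificity = tp / positives, tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1:]


if __name__ == "__main__":
    scores = [12.1, 10.4, 9.8, 8.0, 7.5, 6.2, 5.0, 3.3]  # hypothetical linkage scores
    labels = [True, True, True, False, True, False, False, False]
    print(choose_cutoff(scores, labels))  # (7.5, 1.0, 0.75)
```

Youden's J weights missed matches and false matches equally. A study that cares more about one kind of error would shift the cut-off accordingly.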

36

Williams, Naomi. "Infant and child mortality in urban areas of nineteenth century England and Wales : a record linkage study." Thesis, University of Liverpool, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.236007.

37

Synnestvedt, Marie B. Lin Xia. "Data preparation for biomedical knowledge domain visualization: a probabilistic record linkage and information fusion approach to citation data." Philadelphia, Pa. : Drexel University, 2007. http://hdl.handle.net/1860/2532.

38

Fleming, Michael. "Using Scotland-wide record linkage to investigate the educational and health outcomes of children treated for chronic conditions." Thesis, University of Glasgow, 2017. http://theses.gla.ac.uk/8594/.

Abstract:

Objectives: This study linked Scottish education data to a number of administrative health datasets to explore associations between childhood chronic ill health and subsequent educational and health outcomes. Chronic conditions investigated were diabetes, asthma, epilepsy, attention deficit hyperactivity disorder (ADHD) and depression. Educational outcomes were number of days absent from school, number of school exclusions, special educational need (SEN), academic attainment and unemployment. Health outcomes were all-cause and cause-specific hospital admission, total number of hospital admissions, total length of hospital admission and all-cause mortality. Approach: Pupil census data and associated education records for all children attending primary and secondary schools in Scotland between 2009 and 2013 were linked to national prescribing data, hospital admissions, death records and retrospective maternity records enabling outcomes to be studied whilst controlling for socioeconomic, demographic and obstetric factors including birth outcomes and maternal antecedents. Specific medications are prescribed for some particular chronic conditions; therefore, children identified as receiving these medications whilst at school were assumed to have these conditions. Results: Children treated for each of the five conditions had more frequent absenteeism from school and were more likely than their peers to have SEN. However, only children treated for depression, epilepsy or ADHD experienced poorer academic attainment and increased odds of unemployment. Furthermore, children treated for depression or ADHD were significantly more likely to be excluded from school. Children treated for asthma experienced poorer academic attainment but no increased odds of unemployment and the association with attainment disappeared after adjusting for their increased absenteeism. 
Children treated for each of the five conditions had an increased risk of hospital admission and children treated for depression or epilepsy also had an increased risk of recurrent hospitalisation and longer stays in hospital. All of the chronic conditions, with the exception of ADHD, were associated with increased mortality. Conclusion: All five of the chronic conditions investigated in this thesis were associated with adverse educational and health outcomes. The number of outcomes affected varied by condition. Treated depression, epilepsy and ADHD were associated with the most wide-ranging impacts. Children treated for depression fared worse than their peers across all nine outcomes, and children treated for epilepsy and ADHD across eight and six respectively. In contrast, children treated for asthma and diabetes fared worse than their peers in respect of around half the outcomes investigated. Children with these chronic conditions at school appear to experience significant educational and health disadvantage; therefore further work is required to understand the underlying mechanisms and to develop effective interventions to reduce their risk.

39

Nasseh, Daniel, and Jürgen Stausberg (academic supervisor). "Einsatz und Optimierung einer überwachten Klassifizierungsmethode im Kontext eines Privacy-Preserving-Record-Linkage" [Use and optimization of a supervised classification method in the context of privacy-preserving record linkage]. München: Universitätsbibliothek der Ludwig-Maximilians-Universität, 2014. http://d-nb.info/1065611021/34.

40

Cheung, Ching-lung. "Genetic linkage and association studies to identify candidate genes for bone mineral density variation in Southern Chinese." Thesis, The University of Hong Kong (HKUTO), 2007. http://sunzi.lib.hku.hk/hkuto/record/b40203281.

41

Churches, Timothy. "Estimation of a lower bound for the cumulative incidence of failure of female surgical sterilisation in NSW: a population-based study." Thesis, The University of Sydney, 2006. http://hdl.handle.net/2123/1968.

Abstract:

Female tubal sterilisation, often referred to as "tubal ligation" but more often performed these days using laparoscopically-applied metal clips, remains a popular form of contraception in women who have completed their families. A review of the literature on the incidence of failure of tubal sterilisation found many reports of case-series and small clinic-based studies, but only a few larger studies with good epidemiological designs, most recently the US CREST study conducted during the 1980s and early 1990s. The CREST study reported a conditional (life-table) cumulative incidence of failure of 0.55, 0.84, 1.18 and 1.85 per 100 women at 1, 2, 4 and 10 years of follow-up respectively. The study described here estimated a lower bound for the incidence of tubal sterilisation failure in NSW by probabilistically linking routinely-collected hospital admission records for women undergoing sterilisation surgery to hospital admission records for the same women that were indicative of subsequent conception or that represented censoring events such as hysterectomy or death in hospital. Data for the period July 1992 to June 2000 were used. Kaplan-Meier and proportional-hazards survival analyses were performed on the resulting linked data set. The conditional cumulative incidence per 100 women at 1, 2, 4 and 8 years of follow-up was estimated to be 0.74 (95% CI 0.68-0.81), 1.05 (0.97-1.13), 1.33 (1.23-1.42) and 1.51 (1.39-1.62) respectively. Forty percent of failures ended in abortion and 14% presented as ectopic pregnancies. Age, private health insurance status and sterilisation in a smaller hospital were all found to be associated with lower rates of failure. Strong evidence of time-limited excess numbers of failures in women undergoing surgery in particular hospitals was also found. The study demonstrates the feasibility of using linked, routinely-collected health data to evaluate relatively rare, long-term outcomes such as sterilisation failure on a population-wide basis.
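
The conditional (life-table) cumulative incidence quoted above is one minus the Kaplan-Meier survival estimate. The following is a minimal sketch, assuming each woman contributes a follow-up time and an event flag (1 = sterilisation failure, 0 = censored, e.g. by hysterectomy, death or the end of the data period); the function name `km_cumulative_incidence` and the toy data are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple


def km_cumulative_incidence(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, 1 - S(t)) at each distinct event time using the Kaplan-Meier estimator.

    times  -- follow-up time for each subject
    events -- 1 if the subject experienced the event (failure), 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Group all subjects sharing time t; both failures and censorings leave the risk set after t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            survival *= 1 - failures / n_at_risk
            curve.append((t, 1 - survival))
        n_at_risk -= removed
    return curve
```

For example, `km_cumulative_incidence([1, 2, 2, 3, 4, 5], [1, 0, 1, 0, 1, 0])` gives cumulative incidences of 1/6, 1/3 and 2/3 at times 1, 2 and 4. Censored subjects still count in the risk set up to their censoring time, which is why the estimate differs from a naive proportion of failures.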

42

Gonçalves, Veralice Maria. "Mapeamento da trajetória de usuários de crack na rede pública de atenção à saúde com o uso da metodologia de record linkage" [Mapping the trajectory of crack users in the public health care network using record linkage methodology]. Biblioteca Digital de Teses e Dissertações da UFRGS, 2015. http://hdl.handle.net/10183/143067.

Abstract:

The consequences of psychoactive substance use for the health of the world's population are a matter of concern; most users' problems still stem from lack of access to treatment. Longitudinal studies seek to identify relapse outcomes, but they are costly. As an alternative, epidemiological studies using secondary databases have been carried out worldwide using record linkage techniques. In Brazil, the use of medical records has increased; however, there is little literature on their use for following up psychiatric patients, especially in studies of drug users. Many public health information systems lack a unique identifier that could be used to locate a patient across multiple databases, which is a practical precondition for applying the technique. The aim of this study was to produce information from secondary data to map the trajectory of crack users in the health care network, using record linkage methodology to follow patients after hospital discharge. To this end, the care data made available by the public health information systems were analysed to determine whether it was feasible to produce information for following up crack users in the health care network after hospital discharge. Since this follow-up could not be carried out with the available data, probabilistic record linkage was used to trace the hospitalizations of crack users and the continuity of their outpatient treatment. The portion of public information made available by the Health Information Systems did not allow users to be followed across the services of the care network.
The methodology was applied to a sample of 293 patients being treated for crack use at two institutions: 217 patients were located in the hospital admission data and 180 in the outpatient care database; 55% were identified as true pairs in the first database, but only 12% in the outpatient database. Across the hospital and outpatient data, only 10 of the users who received hospital treatment were seen in the outpatient network during the period. Producing information to map the trajectory of crack users from the databases of health information systems is possible through record linkage, as an alternative to longitudinal studies of this hard-to-reach population. This study is particularly important because it can also contribute to the evaluation of treatment programmes through indicators such as readmission, length of stay, survival curves and others. Information is essential for implementing management models that guarantee the interventions needed by users with substance use disorders, especially crack users.
The consequences of psychotropic substance use for the health of the world's population are a matter of concern; most of the problems faced by users are still related to lack of access to treatment. Longitudinal studies seek to identify these outcomes, but they are expensive. Alternatively, epidemiological studies based on secondary data have been carried out worldwide using record linkage methods. In Brazil, there has been an increase in the use of medical records. However, literature on their use for the follow-up of psychiatric patients, especially in studies of drug users, is scarce. In the public health area, several information systems lack a unique identifier that would allow a patient to be located across multiple databases, which is a practical precondition for applying the record linkage technique. The aim of this study was to produce information based on secondary data for mapping crack users' pathways in the public healthcare network using record linkage, to follow them up after hospital discharge. For this, public health information systems were analysed to determine the feasibility of producing information for the follow-up of crack users in the healthcare network. Since follow-up could not be conducted with the available data, probabilistic record linkage was used to trace crack users' hospitalizations and the continuity of their outpatient treatment after discharge. The public information available from the National Information Systems allows neither the follow-up of patients across healthcare services nor the monitoring of continuity of treatment within the healthcare network. In a sample of 293 patients being treated for crack use in two hospitals in Porto Alegre/RS, 217 patients were located in the hospital admission data and 180 in the outpatient care database; 55% were identified as true matches in the first database, whereas the outpatient database yielded only 12%.
Data from both hospital and outpatient care revealed that, among patients who received hospital treatment, only 10 attended outpatient care during the study period. Record linkage makes it possible to produce information tracking patients' pathways, as an alternative to longitudinal studies of hard-to-reach populations. This study is particularly relevant because it can also contribute to the evaluation of treatment programmes through indicators such as rehospitalization, length of stay and survival rate. The formulation of public policies requires evidence based on information that, up to now, has not been adequately used, particularly that produced by existing Health Information Systems. Information is crucial for implementing management models able to guarantee the necessary care to individuals with disorders resulting from drug use, especially crack users.

43

Sengoku, Tami. "Diagnostic accuracy of FDG-PET cancer screening in asymptomatic individuals: use of record linkage from the Osaka Cancer Registry." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/199216.

44

Gunnarsdóttir, Oddný. "Users of a hospital emergency department: Diagnoses and mortality of those discharged home from the emergency department." Thesis, Nordic School of Public Health NHV, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:norden:org:diva-3323.

Abstract:

Objectives: To ascertain the annual number of users who were discharged home after visits to the emergency department, grouped by age, gender and number of visits during the calendar year, and to assess whether an increasing number of visits to the department predicted higher mortality. Methods: This is a retrospective cohort study at the emergency department of Landspitali University Hospital, in the Reykjavik capital area, Iceland. Between 1995 and 2001, 19,259 users visited the emergency department and were discharged home; they were followed up for cause-specific mortality through a national registry. Standardised mortality ratios were calculated, with expected numbers based on national mortality rates, and hazard ratios according to the number of visits per calendar year were computed using time-dependent multivariate regression analysis. Results: The annual increase in visits to the emergency department among patients discharged home was 7 to 14 per cent per age group during the period 1995 to 2001, with the highest increase among older men. The most common discharge diagnosis was the category Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified. When emergency department users were compared with the general population, the standardised mortality ratio was 1.81 for men and 1.93 for women. Among those attending the emergency department twice, or three or more times, in a calendar year, the mortality rate was higher than among those coming only once a year. The causes of death that led to the highest mortality among frequent users of the emergency department were neoplasms, ischemic heart disease and the category external causes, particularly drug intoxication, suicides and probable suicides. Conclusions: The mortality of emergency department users who had been discharged home was higher than that of the general population.
Frequent users of the emergency department had higher mortality than those visiting the department no more than once a year. Since the emergency department serves general medicine and surgery patients, not injuries, the high mortality due to drug intoxication, suicide and probable suicide is notable. Further studies are needed into the discharge diagnoses of those who frequently use emergency departments, in an attempt to understand and possibly prevent this mortality.

ISBN 91-7997-128-8
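
The standardised mortality ratios reported above (1.81 for men, 1.93 for women) compare observed deaths with the deaths expected if national stratum-specific rates applied to the cohort's person-years (indirect standardisation). A minimal sketch, assuming rates are expressed per person-year; the strata, person-years and rates are hypothetical, not data from the thesis.

```python
from typing import Dict, Tuple


def standardised_mortality_ratio(
    cohort: Dict[str, Tuple[int, float]],
    reference_rates: Dict[str, float],
) -> float:
    """Indirectly standardised mortality ratio: observed deaths / expected deaths.

    cohort          -- stratum -> (observed deaths, person-years at risk) in the study cohort
    reference_rates -- stratum -> death rate per person-year in the reference (national) population
    """
    observed = sum(deaths for deaths, _ in cohort.values())
    # Expected deaths: apply each national stratum-specific rate to the cohort's person-years.
    expected = sum(person_years * reference_rates[stratum] for stratum, (_, person_years) in cohort.items())
    if expected == 0:
        raise ValueError("expected number of deaths is zero")
    return observed / expected
```

With hypothetical strata `{"men 40-59": (36, 10000.0), "men 60+": (90, 5000.0)}` and national rates `{"men 40-59": 0.002, "men 60+": 0.01}`, 70 deaths are expected against 126 observed, giving an SMR of 1.8. An SMR above 1 means more deaths than the national rates predict for a population with the cohort's age and sex structure.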

45

Tseliou, Foteini. "The use of routine administrative datasets and record-linkage to measure population mental health of young people in Northern Ireland." Thesis, Queen's University Belfast, 2017. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.728683.

Abstract:

The lifetime burden of mental ill-health has been recognised as a major public health issue. Inconsistencies in assessment and diverse methodologies have led to a lack of comparability. This PhD thesis attempts to assess the use of secondary administrative datasets to measure mental health at the population level in Northern Ireland. First, it focuses on the linkage between the Child Health System (CHS) dataset and nation-wide prescription data using the Health and Care Number, through which the interplay between perinatal factors and later uptake of psychotropic medication is investigated. The second data source assessed was information drawn from Census returns, stored in either the Northern Ireland Longitudinal Study (NILS) or the Northern Ireland Mortality Study (NIMS). Two separate studies were run: the first focused on the effect of childhood residential mobility on mental health, and the second investigated the burden of caregiving on mental health and mortality rates, with specific reference to differences between age groups of caregivers. Finally, the potential use of the Northern Ireland Health Survey for measuring population mental ill-health was investigated by comparing self-reported psychotropic medication uptake with GHQ-12 scores. This thesis highlights the potential utility and associated caveats of administrative datasets and record linkage methodologies in the measurement of population mental ill-health and its association with potential risk factors across the life course.
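
Because the CHS and prescribing data share a unique identifier (the Health and Care Number), their linkage can be deterministic: records are joined on exact equality of the key. A minimal sketch, assuming each dataset is a list of dicts with a hypothetical `hcn` field; the function name and field names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List


def deterministic_link(children: List[dict], prescriptions: List[dict], key: str = "hcn") -> Dict[str, List[dict]]:
    """Link each child record to all prescription records carrying the same identifier.

    Records whose identifier is missing or empty are left unlinked rather than guessed.
    Children with no matching prescription map to an empty list.
    """
    # Index prescriptions by identifier so each child lookup is a dictionary access.
    by_key: Dict[str, List[dict]] = defaultdict(list)
    for rx in prescriptions:
        if rx.get(key):
            by_key[rx[key]].append(rx)
    linked: Dict[str, List[dict]] = {}
    for child in children:
        k = child.get(key)
        if k:
            # .get() on a defaultdict does not create missing entries.
            linked[k] = by_key.get(k, [])
    return linked
```

Unlike probabilistic linkage, this approach cannot recover a link when the identifier is mistyped or missing, so its reliability depends on the completeness and accuracy of the shared key.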

46

Rentsch, C. T. "Point-of-contact interactive record linkage between demographic surveillance and health facilities to measure patterns of HIV service utilisation in Tanzania." Thesis, London School of Hygiene and Tropical Medicine (University of London), 2018. http://researchonline.lshtm.ac.uk/4650292/.

Abstract:

As significant investments and efforts have been made to strengthen HIV prevention and care service provision throughout sub-Saharan Africa, approaches to monitoring uptake of these services have grown in importance. Global HIV/AIDS organisations use routinely updated estimates of the UNAIDS 90-90-90 targets, which state that by 2020, 90% of all people living with HIV (PLHIV) should be diagnosed, 90% of diagnosed PLHIV should be receiving treatment, and 90% of PLHIV receiving treatment should achieve viral suppression. Currently, estimates of these targets in sub-Saharan Africa use population-based demographic and HIV serological surveillance systems, which comprehensively measure vital events and HIV status but rely on self-reports of health service use. In contrast, most analyses of health service use are limited to patients already diagnosed and enrolled in clinical care and lack a population perspective. This thesis aims to extend existing computer software to support a novel approach to record linkage, termed point-of-contact interactive record linkage (PIRL), and to produce an infrastructure of linked surveillance data and medical records from clinics located within a surveillance area in northwest Tanzania. The linked data are then used to investigate methodological and substantive research questions. Paper A details the PIRL software that was used to collect the data for this thesis. Paper B reviews the data created by PIRL and reports record linkage statistics, including match percentages and attributes associated with (un)successful linkage. A subset of personal identifiers was found to drive the success of the probabilistic linkage algorithm, and PIRL was shown to outperform a fully automated linkage approach. Paper C provides original evidence measuring bias and precision in analyses of linked data with substantial linkage errors.
Paper D critiques the estimation of the first 90-90-90 target and shows that current guidelines may underestimate the percentage diagnosed by a relative factor of between 10% and 20%. Finally, Paper E determines that, while HIV serological surveillance has increased testing coverage, PLHIV who were diagnosed with HIV in a facility-based clinic were statistically significantly more likely to register for HIV care than those diagnosed at village-level temporary clinics during a surveillance round. Once individuals were in care, there was no evidence of any further delays to treatment initiation by testing modality. The collective findings of this thesis demonstrate the feasibility of using PIRL to link community and medical records and of using the linked data to measure patterns of HIV service use in a population.
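
As a worked check on the 90-90-90 targets quoted above: each target is a proportion of the previous stage, so meeting all three means that about 73% of all people living with HIV would be virally suppressed.

```latex
\underbrace{0.90}_{\text{diagnosed}} \times \underbrace{0.90}_{\text{on treatment}} \times \underbrace{0.90}_{\text{virally suppressed}} = 0.729 \approx 73\% \text{ of all PLHIV}
```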

47

Samad, L. "Epidemiology of intussusception in children: national surveillance and use of record linkage to validate the incidence, and study of incidence trends." Thesis, University College London (University of London), 2014. http://discovery.ucl.ac.uk/1433621/.

Abstract:

Introduction: Intussusception (IS), an abdominal emergency in young children, has been linked to rotavirus vaccines used to prevent rotavirus gastroenteritis. We aimed to determine the pre-vaccination incidence of IS among infants in the United Kingdom (UK) and Republic of Ireland (ROI) in 2008-2009. IS incidence trends in children aged < 16 years were estimated for England, 1995-2009. Methods: The established British Paediatric Surveillance Unit (BPSU) system was used to estimate the IS incidence (per 100,000 live births) among infants in the UK and ROI using the standard Brighton Collaboration case definition. Incidence rates for England were validated by record linkage between the BPSU and Hospital Episode Statistics (HES) datasets (2008-2009). The completeness of BPSU and HES data was calculated using capture-recapture methodology. IS incidence trends were estimated for England using HES. Results: The annual (BPSU) IS incidence in infants was 24.8 (95% CI: 21.7-28.2) in the UK and 24.2 (95% CI: 15.0-37.0) in ROI. UK rates varied from 40.6 (95% CI: 21.0-71.8) in Northern Ireland to 28.7 (95% CI: 17.5-44.3) in Scotland, 24.2 (95% CI: 20.9-27.9) in England and 16.9 (95% CI: 6.8-34.8) in Wales. Record linkage increased the incidence among infants in England from 24.2 (95% CI: 20.9-27.9) to 28.9 (95% CI: 25.3-33.0). The completeness of BPSU reporting was 81.5% compared to 85.8% for HES. A decline in IS incidence was observed from 1995 to 2009, predominantly for infants, with significantly higher (p=0.001) rates in winter and spring for both BPSU and HES data. Present clinical management of IS in the UK and ROI is associated with a higher than expected rate of surgical intervention and a lower rate of successful air-enema reduction than in other similar countries. Conclusions: National (UK) and ROI pre-vaccination incidence rates of IS are now available to inform post-marketing rotavirus vaccine surveillance. The present findings should improve the future epidemiological assessment of IS.
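
The capture-recapture completeness figures above rest on how many cases the two linked sources share. A minimal sketch, assuming the simplest two-source model with independent sources and using Chapman's variant of the Lincoln-Petersen estimator; the abstract does not say which estimator the thesis used, and the counts in the example are hypothetical.

```python
def capture_recapture(n_a: int, n_b: int, n_both: int) -> dict:
    """Two-source capture-recapture estimate of the total number of cases.

    n_a    -- cases reported by source A (e.g. surveillance reports)
    n_b    -- cases found in source B (e.g. hospital episode records)
    n_both -- cases matched in both sources by record linkage
    Assumes the two sources capture cases independently of each other.
    """
    if n_both > min(n_a, n_b):
        raise ValueError("overlap cannot exceed either source")
    # Chapman's nearly unbiased variant of the Lincoln-Petersen estimator.
    total = (n_a + 1) * (n_b + 1) / (n_both + 1) - 1
    return {
        "estimated_total": total,
        "completeness_a": n_a / total,
        "completeness_b": n_b / total,
        "completeness_combined": (n_a + n_b - n_both) / total,
    }
```

For example, `capture_recapture(9, 9, 4)` estimates 19 cases in total, so each source alone is about 47% complete and the two combined about 74%. If the sources are positively dependent (cases found by one are more likely to be found by the other), the estimate of the total is biased downwards.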

48

Weng, Chao. "A pilot evaluation study on benefits of a record linkage between a hospital diabetes database and the information systems within the NHS." Thesis, King's College London (University of London), 2000. https://kclpure.kcl.ac.uk/portal/en/theses/a-pilot-evaluation-study-on-benefits-of-a-record-linkage-between-a-hospital-diabetes-database-and-the-information-systems-within-the-nhs(065d944e-29fe-442e-a981-15012719d063).html.

49

Li, Xinran. "Evaluation et amélioration des méthodes de chaînage de données" [Evaluation and improvement of data linkage methods]. Thesis, Clermont-Ferrand 1, 2015. http://www.theses.fr/2015CLF1MM02/document.

Abstract:

Record linkage is the task of identifying, across different data sources, the records that refer to the same entities. In the absence of a common identification key, this task can be performed using other fields containing identifying information, whose quality is unfortunately not perfect. Many so-called "record linkage" methods have been proposed over recent decades for this purpose. To ensure valid and fast linkage of records belonging to the same patients within GINSENG, a project that aimed to set up a grid computing infrastructure for sharing distributed medical data, it was necessary to inventory, study and sometimes adapt some of the various methods commonly used for record linkage. These include methods for approximately comparing record fields according to their spelling and pronunciation, deterministic and probabilistic record linkage, and their extensions. The advantages and drawbacks of these methods are clearly set out here. In practice, since the fields to be compared are often imperfect because of typographical errors, our interest focuses particularly on probabilistic record linkage methods. The implementation of the probabilistic methods proposed by Fellegi and Sunter (PRL-FS) and by Winkler (PRL-W) is described precisely, together with their evaluation and comparison.
Because knowing the true match status of records is essential for evaluating the validity of linkage results, synthetic data sets are generated in this work, and configurable algorithms for generating them are proposed and detailed. Although, to our knowledge, PRL-W is one of the best-performing methods in terms of linkage validity when the fields containing identifying attributes have typographical errors, it nevertheless has some characteristics that could be improved. For example, PRL-W does not handle the problem of missing data satisfactorily. It is also a method whose implementation is not simple and whose response times are hard to reconcile with some routine uses. Several solutions have been proposed and evaluated to address these difficulties, notably several approaches to improve the effectiveness of PRL-W in the presence of missing data, and others intended to optimise its computation time while ensuring that this reduction in processing time does not compromise the validity of the linkage decisions it produces.
Record linkage is the task of identifying which records from different data sources refer to the same entities. Without a common identification key across databases, this task can be performed by comparing corresponding fields (containing identifying information) in the records to be linked. Many record linkage methods have been proposed for this purpose in recent decades. In order to ensure valid and fast linkage of the same patients' records for GINSENG, a research project which aimed to implement a grid computing infrastructure for sharing medical data, we first studied various commonly used record linkage methods: approximate comparison of record fields according to their spelling and pronunciation, and deterministic and probabilistic record linkage and their extensions. The advantages and disadvantages of these methods are clearly set out. In practice, as the fields to be compared are often subject to typographical errors, we focused on probabilistic record linkage. The implementation of the probabilistic methods proposed by Fellegi and Sunter (PRL-FS) and by Winkler (PRL-W) is described in detail, together with their evaluation and comparison. Synthetic data sets were used in this work so that the true match status was known when evaluating linkage results; a configurable algorithm for generating synthetic data was therefore proposed. To our knowledge, PRL-W is one of the most effective methods in terms of linkage validity in the presence of typographical errors in the fields. However, PRL-W does not handle missing data in the fields satisfactorily, and its implementation is complex, with a computation time that limits its use in routine practice. Solutions are proposed here with the objective of improving the effectiveness of PRL-W in the presence of missing data in the fields.
Other solutions are tested to simplify the PRL-W algorithm, both reducing computation time and keeping optimal linkage accuracy.
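
In the Fellegi-Sunter model (PRL-FS), each compared field adds log2(m/u) to a record pair's weight when the values agree and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u the probability among non-matches; the summed weight is compared with an upper and a lower threshold. A minimal sketch with hypothetical m/u values and thresholds: it uses exact field agreement only (the Winkler variant, PRL-W, uses graded string comparators), does not estimate m and u (commonly done with the EM algorithm), and simply skips missing values, which is only one of several possible conventions.

```python
import math
from typing import Dict, Tuple


def fs_weight(rec_a: dict, rec_b: dict, params: Dict[str, Tuple[float, float]]) -> float:
    """Sum of Fellegi-Sunter agreement/disagreement weights (log base 2) over the compared fields.

    params -- field -> (m, u), with 0 < u < 1 and 0 < m < 1
    """
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # Missing value: contribute nothing (a simple convention, not the thesis's method).
            continue
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float, lower: float) -> str:
    """Three-way decision: link at or above upper, non-link at or below lower, review in between."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"
```

For example, with `params = {"surname": (0.9, 0.1), "dob": (0.8, 0.2)}`, a pair agreeing on surname but not on date of birth scores log2(9) + log2(0.25), about 1.17, which falls in the clerical-review band for thresholds of 0 and 3.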

50

Miyoshi, Newton Shydeo Brandão. "Arquitetura e métodos de integração de dados e interoperabilidade aplicados na saúde mental" [Architecture and methods for data integration and interoperability applied to mental health]. Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/17/17138/tde-20072018-100724/.

Abstract:

The availability and integration of health information about the same patient across different levels of care or different health institutions is usually incomplete or non-existent. This happens mainly because the information systems that support health professionals are not interoperable, which also makes it difficult to manage services at the municipal and regional level. This fragmentation of information is also challenging and worrying in mental health, which usually requires long-term care integrating different types of health services. Problems such as poor quality and unavailability of information, as well as duplicate records, are important aspects of the management and long-term care of patients with mental disorders. Despite this, there are still no objective studies demonstrating the actual impact of interoperability and data integration on management and data quality in mental health. Objectives: In this context, the general aim of the project is to propose an interoperability architecture for regionalized health care and to evaluate the effectiveness of data integration and interoperability techniques for managing mental health care and hospitalizations in the Ribeirão Preto region, as well as their impact on the improvement and availability of data, using well-defined metrics. Methods: The proposed interoperability framework is based on a layered client-server architecture. The interoperability information model was based on international and national health standards. A terminology server based on health information standards was proposed. Record Linkage algorithms were also used to guarantee unique patient identification.
To test and validate the proposal, data from different levels of health care, originating from services in the psychosocial care network in the Ribeirão Preto region, were used. The data were extracted from five different sources: (i) the Unidade Básica de Saúde da Família - I (Family Health Basic Unit I) of Santa Cruz da Esperança; (ii) the Centro de Atenção Integrada à Saúde (Integrated Health Care Center) of Santa Rita do Passa Quatro; (iii) Hospital Santa Tereza; (iv) hospitalization request information held in SISAM (Sistema de Informação em Saúde Mental, the Mental Health Information System); and (v) demographic data from the National Health Card Service Bus (Barramento do Cartão Nacional de Saúde) of the Ministry of Health. The data quality metrics used were completeness, consistency, duplication and accuracy. Results: As a result of this work, the health interoperability platform called eHealth-Interop was designed, developed and tested. Interoperability was implemented through web services, with a data integration model based on a centralizing database. A terminology server, called eHealth-Interop Terminology Server, was also developed; it can be used as an independent component and in other medical contexts. In total, 31,340 patient records were obtained from SISAM, e-SUS AB of Santa Cruz da Esperança, CAIS of Santa Rita do Passa Quatro, Hospital Santa Tereza and the CNS Service Bus of the Ministry of Health. Of these, 30.47% (9,548) records were identified as present in more than one information source, with different levels of accuracy and completeness. The data quality analysis, covering all integrated records, showed an improvement in mean completeness of 18.40% (from 56.47% to 74.87%) and in mean syntactic accuracy of 1.08% (from 96.69% to 96.77%). In the consistency analysis, all information sources improved, with improvements ranging from a minimum of 14.4% to a maximum of 51.5%.
With the Record Linkage module, 1,066 duplicates could be quantified, of which 226 were verified manually. Conclusions: The availability and quality of information are important aspects of continuity of care and health service management. The solution proposed in this work aims to establish a computational model to fill this gap. The interoperability environment was able to integrate information in the mental health use case, supported by international and national clinical terminologies, and is flexible enough to be extended to other health care domains.
The availability and integration of health information from the same patient between different care levels or between different health services is usually incomplete or non-existent. This happens especially because the information systems that support health professionals are not interoperable, making it difficult to manage services at the municipal and regional level. This fragmentation of information is also challenging and worrying in the area of mental health, where long-term care is often required and integrates different types of health services and professionals. Problems such as poor quality and unavailability of information, as well as duplicate records, are important aspects in the management and long-term care of patients with mental disorders. Despite this, there are still no objective studies that demonstrate the effective impact of interoperability and data integration on the management and quality of data for the mental health area. Objectives: In this context, this project proposes an interoperability architecture for regionalized health care management. It also proposes to evaluate the effectiveness of data integration and interoperability techniques for the management of mental health hospitalizations in the Ribeirão Preto region as well as the improvement in data availability through well-defined metrics. Methods: The proposed framework is based on client-service architecture to be deployed in the web. The interoperability information model was based on international and national health standards. It was proposed a terminology server based on health information standards. Record Linkage algorithms were implemented to guarantee the patient identification. In order to test and validate the proposal, we used data from different health care levels provided by the mental health care network in the Ribeirão Preto region. 
The data were extracted from five different sources: the Family Health Unit I of Santa Cruz da Esperança, the Center for Integrated Health Care of Santa Rita do Passa Quatro, Santa Tereza Hospital, the hospitalization request records in SISAM (Mental Health Information System), and demographic data from the Brazilian Ministry of Health's service bus. Results: As a result of this work, the health interoperability platform eHealth-Interop was designed, developed and tested. Interoperability was implemented through web services, with a data integration model based on a centralized database. A terminology server, the eHealth-Interop Terminology Server, was developed; it can be used as an independent component and in other medical contexts. In total, 31340 patient records were obtained from SISAM, from eSUS-AB in Santa Cruz da Esperança, from CAIS in Santa Rita do Passa Quatro, from Santa Tereza Hospital and from the CNS Service Bus of the Brazilian Ministry of Health. 9548 records (47%) were identified as present in more than one information source, with different levels of accuracy and completeness. The data quality analysis, covering all integrated records, showed an improvement in average completeness of 18.40 percentage points (from 56.47% to 74.87%) and in mean syntactic accuracy of 1.08% (from 96.69% to 96.77%). The consistency analysis showed improvements in all information sources, ranging from 14.4% to 51.5%. The Record Linkage module detected 1066 duplicates, of which 226 were verified manually. Conclusions: The availability and quality of information are both important for continuity of care and health services management. The solution proposed in this work aims to establish a computational model to fill this gap. It has been successfully applied in the mental health care context and is flexible enough to be extended to other medical domains.
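The abstract reports average completeness before and after integration but does not define the metric. The sketch below assumes completeness is the fraction of non-empty fields per record, and that linked records are merged by keeping the first non-empty value for each field. It is a minimal illustration of why combining sources can raise completeness; the field names and merge rule are hypothetical, not the thesis's exact method.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical schema; the thesis's actual field list is not given in the abstract.
FIELDS = ["name", "birth_date", "mother_name", "address", "phone"]


def is_filled(value: Optional[str]) -> bool:
    """A field counts as filled if it is present and not blank."""
    return value is not None and value.strip() != ""


def completeness(record: dict) -> float:
    """Fraction of schema fields that hold a non-empty value."""
    return sum(is_filled(record.get(f)) for f in FIELDS) / len(FIELDS)


def average_completeness(records: list[dict]) -> float:
    """Mean per-record completeness over a collection of records."""
    return sum(completeness(r) for r in records) / len(records)


def merge_linked(records: list[dict]) -> dict:
    """Merge records already linked to the same patient: first non-empty value wins per field."""
    return {
        f: next((r[f] for r in records if is_filled(r.get(f))), None)
        for f in FIELDS
    }


if __name__ == "__main__":
    source_a = {"name": "Maria Souza", "birth_date": "1980-05-12"}
    source_b = {"name": "Maria Souza", "mother_name": "Ana Souza", "phone": "555-0100"}
    print(average_completeness([source_a, source_b]))  # 0.5
    print(completeness(merge_linked([source_a, source_b])))  # 0.8
```

In this toy case the two sources average 50% completeness, while the merged record reaches 80%. Applied across many linked patients, the same mechanism is one way integration can raise average completeness.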
