Dissertations / Theses on the topic 'Private data publishing'




Consult the top 28 dissertations / theses for your research on the topic 'Private data publishing.'


You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, Yihua. "ON DATA UTILITY IN PRIVATE DATA PUBLISHING." Miami University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=miami1272986770.

2

Shang, Hui. "Privacy Preserving Kin Genomic Data Publishing." Miami University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=miami1594835227299524.

3

Lin, Zehua. "Privacy Preserving Social Network Data Publishing." Miami University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=miami1610045108271476.

4

Loukides, Grigorios. "Data utility and privacy protection in data publishing." Thesis, Cardiff University, 2008. http://orca.cf.ac.uk/54743/.

Abstract:
Data about individuals is being increasingly collected and disseminated for purposes such as business analysis and medical research. This has raised some privacy concerns. In response, a number of techniques have been proposed which attempt to transform data prior to its release so that sensitive information about the individuals contained within it is protected. K-Anonymisation is one such technique that has attracted much recent attention from the database research community. K-Anonymisation works by transforming data in such a way that each record is made identical to at least k-1 other records with respect to those attributes that are likely to be used to identify individuals. This helps prevent sensitive information associated with individuals from being disclosed, as each individual is represented by at least k records in the dataset. Ideally, a k-anonymised dataset should maximise both data utility and privacy protection, i.e. it should allow intended data analytic tasks to be carried out without loss of accuracy while preventing sensitive information disclosure, but these two notions are conflicting and only a trade-off between them can be achieved in practice. The existing works, however, focus on how either the utility or the protection requirement may be satisfied, which often results in anonymised data with an unnecessarily and/or unacceptably low level of utility or protection. In this thesis, we study how to construct k-anonymous data that satisfies both data utility and privacy protection requirements. We propose new criteria to capture utility and protection requirements, and new algorithms that allow k-anonymisations with required utility/protection trade-offs or guarantees to be generated. Our extensive experiments using both benchmarking and synthetic datasets show that our methods are efficient, can produce k-anonymised data with desired properties, and outperform state-of-the-art methods in retaining data utility and providing privacy protection.
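To make the k-anonymity property above concrete, here is a minimal sketch (not Loukides's own algorithm) that checks whether a table is k-anonymous with respect to a chosen set of quasi-identifier attributes; the attribute names and data are hypothetical.

    from collections import Counter

    def is_k_anonymous(rows, quasi_identifiers, k):
        """True if every combination of quasi-identifier values appears in
        at least k records, i.e. each record is identical to at least k-1
        others on those attributes."""
        counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
        return all(c >= k for c in counts.values())

    # Hypothetical generalised table: each record is a dict of attribute -> value.
    table = [
        {"age": "30-39", "zip": "537**", "disease": "flu"},
        {"age": "30-39", "zip": "537**", "disease": "cancer"},
        {"age": "40-49", "zip": "538**", "disease": "flu"},
        {"age": "40-49", "zip": "538**", "disease": "flu"},
    ]
    print(is_k_anonymous(table, ["age", "zip"], k=2))  # True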
5

Chen, Xiaoqiang. "Privacy Preserving Data Publishing for Recommender System." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-155785.

Abstract:
Driven by mutual benefits, the exchange and publication of data among various parties is an inevitable trend. However, released data often contains sensitive information, so direct publication violates individual privacy. This undertaking is in the scope of privacy-preserving data publishing (PPDP). Among many privacy models, the k-anonymity framework is popular and well studied; it protects data by constructing groups of anonymous records such that each record in the released table is covered by no fewer than k-1 other records. This thesis investigates different privacy models and focuses on achieving k-anonymity for large-scale and sparse databases, especially recommender systems. We present a general process for anonymization of large-scale databases. A preprocessing phase strategically extracts a preference matrix from the original data by Singular Value Decomposition (SVD), eliminating the high-dimensionality and sparsity problems. A new clustering-based k-anonymity heuristic named Bisecting K-Gather (BKG) is introduced and shown to be efficient and accurate. To support customized user privacy assignments, we also propose a new concept called customized k-anonymity along with a corresponding algorithm. Experiments on the MovieLens database are presented. The results show we can release anonymized data with little compromise of privacy.
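A rough sketch of the SVD preprocessing step this abstract describes, assuming numpy is available; the rank and the toy rating matrix are hypothetical, and the Bisecting K-Gather clustering itself is not reproduced here.

    import numpy as np

    def svd_preferences(ratings, rank=2):
        """Project a (users x items) rating matrix onto its top `rank`
        singular directions, yielding a dense, low-dimensional preference
        matrix that avoids the sparsity of the raw data."""
        U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
        return U[:, :rank] * s[:rank]  # one short preference vector per user

    # Hypothetical 6 users x 5 items rating matrix; zeros mean "unrated".
    ratings = np.array([[5, 0, 3, 0, 1],
                        [4, 0, 3, 0, 1],
                        [0, 2, 0, 4, 0],
                        [0, 3, 0, 5, 0],
                        [1, 0, 4, 0, 5],
                        [1, 0, 5, 0, 4]], dtype=float)
    prefs = svd_preferences(ratings)
    print(prefs.shape)  # (6, 2): users can now be clustered into groups of >= k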
6

Sehatkar, Morvarid. "Towards a Privacy Preserving Framework for Publishing Longitudinal Data." Thesis, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31629.

Abstract:
Recent advances in information technology have enabled public organizations and corporations to collect and store huge amounts of individuals' data in data repositories. Such data are powerful sources of information about an individual's life such as interests, activities, and finances. Corporations can employ data mining and knowledge discovery techniques to extract useful knowledge and interesting patterns from large repositories of individuals' data. The extracted knowledge can be exploited to improve strategic decision making, enhance business performance, and improve services. However, person-specific data often contain sensitive information about individuals and publishing such data poses potential privacy risks. To deal with these privacy issues, data must be anonymized so that no sensitive information about individuals can be disclosed from published data while distortion is minimized to ensure usefulness of data in practice. In this thesis, we address privacy concerns in publishing longitudinal data. A data set is longitudinal if it contains information of the same observation or event about individuals collected at several points in time. For instance, the data set of multiple visits of patients of a hospital over a period of time is longitudinal. Due to temporal correlations among the events of each record, potential background knowledge of adversaries about an individual in the context of longitudinal data has specific characteristics. None of the previous anonymization techniques can effectively protect longitudinal data against an adversary with such knowledge. In this thesis we identify the potential privacy threats on longitudinal data and propose a novel framework of anonymization algorithms in a way that protects individuals' privacy against both identity disclosure and attribute disclosure, and preserves data utility. Particularly, we propose two privacy models: (K,C)^P-privacy and (K,C)-privacy, and for each of these models we propose efficient algorithms for anonymizing longitudinal data. An extensive experimental study demonstrates that our proposed framework can effectively and efficiently anonymize longitudinal data.
7

Wang, Hui. "Secure query answering and privacy-preserving data publishing." Thesis, University of British Columbia, 2007. http://hdl.handle.net/2429/31721.

Abstract:
The last several decades have witnessed a phenomenal growth in the networking infrastructure connecting computers all over the world. The Web has now become a ubiquitous channel for information sharing and dissemination. More and more data is being exchanged and published on the Web. This growth has created a whole new set of research challenges, while giving a new spin to some existing ones. For example, XML (eXtensible Markup Language), a self-describing and semi-structured data format, has emerged as the standard for representing and exchanging data between applications across the Web. An important issue of data publishing is the protection of sensitive and private information. However, security/privacy-enhancing techniques bring disadvantages: security-enhancing techniques may incur overhead for query answering, while privacy-enhancing techniques may ruin data utility. In this thesis, we study how to overcome such overhead. Specifically, we address the following two problems in this thesis: (a) efficient and secure query evaluation over published XML databases, and (b) publishing relational databases while protecting privacy and preserving utility. The first part of this thesis focuses on efficiency and security issues of query evaluation over XML databases. To protect sensitive information in the published database, security policies must be defined and enforced, which will result in unavoidable overhead. Due to the security overhead and the complex structure of XML databases, query evaluation may become inefficient. In this thesis, we study how to securely and efficiently evaluate queries over XML databases. First, we consider the access-controlled database. We focus on a security model by which every XML element either is locally assigned a security level or inherits the security level from one of its ancestors. Security checks in this model can cause considerable overhead for query evaluation. We investigate how to reduce the security overhead by analyzing the subtle interactions between inheritance of security levels and the structure of the XML database. We design a sound and complete set of rules and develop efficient, polynomial-time algorithms for optimizing security checks on queries. Second, we consider an encrypted XML database in a "database-as-service" model, in which the private database is hosted by an untrusted server. Since the untrusted server has no decryption key, its power of query processing is very limited, which results in inefficient query evaluation. We study how to support secure and efficient query evaluation in this model. We design the metadata that will be hosted on the server side with the encrypted database. We show that the presence of the metadata not only facilitates query processing but also guarantees data security. We prove that by observing a series of queries from the client and its own responses, the server's knowledge about the sensitive information in the database is always below a given security threshold. The second part of this thesis studies the problem of preserving both privacy and utility when publishing relational databases. To preserve utility, the published data will not be perturbed. Instead, the base table in the original database will be decomposed into several view tables. First, we define a general framework to measure the likelihood of privacy breach of a published view. We propose two attack models, unrestricted and restricted models, and derive formulas to quantify the privacy breach for each model.
Second, we take utility into consideration. Specifically, we study the problem of how to design the scheme of published views, so that data privacy is protected while maximum utility is guaranteed. Given a database and its scheme, there are exponentially many candidates for published views that satisfy both privacy and utility constraints. We prove that finding the globally optimal safe and faithful view, i.e., the view that does not violate any privacy constraints and provides the maximum utility, is NP-hard. We propose the locally optimal safe and faithful view as the heuristic, and show how we can efficiently find a locally optimal safe and faithful view in polynomial time.
Faculty of Science, Department of Computer Science (Graduate).
8

Huang, Zhengli. "Privacy and utility analysis of the randomization approach in Privacy-Preserving Data Publishing." Related electronic resource: Current Research at SU : database of SU dissertations, recent titles available full text, 2008. http://wwwlib.umi.com/cr/syr/main.

9

Hajian, Sara. "Simultaneous discrimination prevention and privacy protection in data publishing and mining." Doctoral thesis, Universitat Rovira i Virgili, 2013. http://hdl.handle.net/10803/119651.

Abstract:
Data mining is an increasingly important technology for extracting useful knowledge hidden in large collections of data. There are, however, negative social perceptions about data mining, among which potential privacy violation and potential discrimination. The former is an unintentional or deliberate disclosure of a user profile or activity data as part of the output of a data mining algorithm or as a result of data sharing. For this reason, privacy preserving data mining has been introduced to trade off the utility of the resulting data/models for protecting individual privacy. The latter consists of treating people unfairly on the basis of their belonging to a specific group. Automated data collection and data mining techniques such as classification have paved the way to making automated decisions, like loan granting/denial, insurance premium computation, etc. If the training datasets are biased in what regards discriminatory attributes like gender, race, religion, etc., discriminatory decisions may ensue. For this reason, anti-discrimination techniques including discrimination discovery and prevention have been introduced in data mining. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on discriminatory attributes. Indirect discrimination occurs when decisions are made based on non-discriminatory attributes which are strongly correlated with biased discriminatory ones. In the first part of this thesis, we tackle discrimination prevention in data mining and propose new techniques applicable for direct or indirect discrimination prevention individually or both at the same time. We discuss how to clean training datasets and outsourced datasets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate (non-discriminatory) classification rules. The experimental evaluations demonstrate that the proposed techniques are effective at removing direct and/or indirect discrimination biases in the original dataset while preserving data quality. In the second part of this thesis, by presenting samples of privacy violation and potential discrimination in data mining, we argue that privacy and discrimination risks should be tackled together. We explore the relationship between privacy preserving data mining and discrimination prevention in data mining to design holistic approaches capable of addressing both threats simultaneously during the knowledge discovery process. As part of this effort, we have investigated for the first time the problem of discrimination and privacy aware frequent pattern discovery, i.e. the sanitization of the collection of patterns mined from a transaction database in such a way that neither privacy-violating nor discriminatory inferences can be inferred on the released patterns. Moreover, we investigate the problem of discrimination and privacy aware data publishing, i.e. transforming the data, instead of patterns, in order to simultaneously fulfill privacy preservation and discrimination prevention. In the above cases, it turns out that the impact of our transformation on the quality of data or patterns is the same or only slightly higher than the impact of achieving just privacy preservation.
10

Yang, Cao. "Rigorous and Flexible Privacy Protection Framework for Utilizing Personal Spatiotemporal Data." 京都大学 (Kyoto University), 2017. http://hdl.handle.net/2433/225733.

11

Jafer, Yasser. "Task Oriented Privacy-preserving (TOP) Technologies Using Automatic Feature Selection." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34320.

Abstract:
A large amount of digital information collected and stored in datasets creates vast opportunities for knowledge discovery and data mining. These datasets, however, may contain sensitive information about individuals and, therefore, it is imperative to ensure that their privacy is protected. Most research in the area of privacy preserving data publishing does not make any assumptions about an intended analysis task applied on the dataset. In many domains, however, such as healthcare and finance, it is possible to identify the analysis task beforehand. Incorporating such knowledge of the ultimate analysis task may improve the quality of the anonymized data while protecting the privacy of individuals. Furthermore, the existing research which considers the ultimate analysis task (e.g., classification) is not suitable for high-dimensional data. We show that automatic feature selection (which is a well-known dimensionality reduction technique) can be utilized in order to consider both aspects of privacy and utility simultaneously. In doing so, we show that feature selection can enhance existing privacy preserving techniques addressing k-anonymity and differential privacy, and protect privacy while reducing the amount of modifications applied to the dataset; hence, in most cases, achieving higher utility. We consider incorporating the concept of privacy-by-design within the feature selection process. We propose techniques that turn filter-based and wrapper-based feature selection into privacy-aware processes. To this end, we build a layer of privacy on top of the regular feature selection process and obtain a privacy preserving feature selection that is guided not only by accuracy but also by the amount of protected private information. In addition to considering privacy after feature selection, we introduce a framework for a privacy-aware feature selection evaluation measure. That is, we incorporate privacy during feature selection and obtain a list of candidate privacy-aware attribute subsets that consider (and satisfy) both efficacy and privacy requirements simultaneously. Finally, we propose a multi-dimensional, privacy-aware evaluation function which incorporates efficacy, privacy, and dimensionality weights and enables the data holder to obtain the best attribute subset according to its preferences.
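The multi-dimensional evaluation function is described only abstractly; as a hedged illustration, one plausible weighted form (the weights, scores, and attribute names below are hypothetical, not the thesis's) might look like this.

    def subset_score(efficacy, privacy, dimensionality,
                     w_eff=0.5, w_priv=0.3, w_dim=0.2):
        """Weighted aggregate of efficacy, privacy, and dimensionality scores,
        each normalised to [0, 1]; the weights express the data holder's
        preferences. Illustrative only."""
        return w_eff * efficacy + w_priv * privacy + w_dim * dimensionality

    # Pick the best of several hypothetical candidate attribute subsets.
    candidates = {
        ("age", "zip"):        subset_score(0.90, 0.40, 0.8),
        ("age",):              subset_score(0.75, 0.85, 0.9),
        ("age", "zip", "job"): subset_score(0.93, 0.20, 0.7),
    }
    print(max(candidates, key=candidates.get))  # ('age',) under these weights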
12

Li, Yidong. "Preserving privacy in data publishing and analysis." Thesis, 2011. http://hdl.handle.net/2440/68556.

Abstract:
As data collection and storage techniques have been greatly improved, data analysis is becoming an increasingly important issue in many business and academic collaborations, enhancing their productivity and competitiveness. Multiple techniques for data analysis, such as data mining, business intelligence, statistical analysis and predictive analytics, have been developed in different science, commerce and social science domains. To ensure quality data analysis, effective information sharing between organizations becomes a vital requirement in today's society. However, the shared data often contains person-specific and sensitive information like medical records. As more and more real-world datasets are released publicly, there is a growing concern about privacy breaches for the entities involved. To respond to this challenge, this thesis discusses the problem of eliminating privacy threats while, at the same time, preserving useful information in the released database for data analysis. The first part of this thesis discusses the problem of privacy preservation on relational data. Due to the inherent drawbacks of applying equi-depth data swapping in distance-based data analysis, we study efficient swapping algorithms based on equi-width partitioning for relational data publishing. We develop effective methods for both univariate and multivariate data swapping. With extensive theoretical analysis and experimental validation, we show that Equi-Width Swapping (EWS) can achieve a similar performance in privacy preservation to that of Equi-Depth Swapping (EDS) if the number of partitions is sufficiently large (e.g. ≳ √n, where n is the size of the dataset). In addition, our analysis shows that the multivariate EWS algorithm has a much lower computational complexity, O(n), than that of the multivariate EDS (which is basically O(n³)), while it still provides good protection for sensitive information. The second part of this thesis focuses on solving the problem of privacy preservation on graphs, which has increasing significance as more and more real-world graphs modelling complex systems such as social networks are released publicly. We point out that the real labels of a large portion of nodes can be easily re-identified with some weight-related attacks in a weighted graph, even if the graph is perturbed with weight-independent invariants like degree. Two concrete attacks have been identified based on the following elementary weight invariants: 1) volume: the sum of adjacent weights for a vertex; and 2) histogram: the neighborhood weight distribution of a vertex. In order to protect a graph from these attacks, we formalize a general model for weighted graph anonymization and provide efficient methods with respect to a two-step framework including property anonymization and graph reconstruction. Moreover, we theoretically prove the histogram anonymization problem is NP-hard in the general case, and present an efficient heuristic algorithm for this problem running in near-quadratic time on graph size. The final part of this thesis turns to exploring efficient privacy preserving techniques for hypergraphs while maintaining the quality of community detection. We first model a background knowledge attack based on so-called rank, which is one of the important properties of hyperedges. Then, we show empirically how high the disclosure risk is with the attack to breach the real-world data. We formalize a general model for rank-based hypergraph anonymization, and justify its hardness.
As a solution, we extend the two-step framework for graph anonymization to our new problem and propose efficient algorithms that perform well on preserving data privacy. Also, we explore, for the first time so far as we know, the issue of constructing a hypergraph with a specified rank set. The proposed construction algorithm also has the characteristic of minimizing the bias of community detection between the original and the perturbed hypergraphs. In addition, we consider two de-anonymizing schemes that may be used to attack an anonymized hypergraph and verify that both schemes fail in breaching the privacy of a hypergraph with rank anonymity in the real-world case.
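The volume and histogram invariants named in this abstract are straightforward to compute; a minimal sketch for an undirected weighted graph stored as an adjacency dict (a hypothetical representation, not the thesis's).

    from collections import Counter

    # Hypothetical weighted graph: node -> {neighbour: edge weight}.
    graph = {
        "a": {"b": 2.0, "c": 1.5},
        "b": {"a": 2.0, "c": 0.5},
        "c": {"a": 1.5, "b": 0.5},
    }

    def volume(g, v):
        """Sum of the edge weights adjacent to vertex v."""
        return sum(g[v].values())

    def histogram(g, v):
        """Multiset of adjacent edge weights: the neighborhood weight
        distribution of v."""
        return Counter(g[v].values())

    print(volume(graph, "a"))     # 3.5
    print(histogram(graph, "a"))  # Counter({2.0: 1, 1.5: 1})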
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2011
13

"Privacy preserving data publishing." Thesis, 2008. http://library.cuhk.edu.hk/record=b6074672.

Abstract:
The advance of information technologies has enabled various organizations (e.g., census agencies, hospitals) to collect large volumes of sensitive personal data (e.g., census data, medical records). Due to the great research value of such data, it is often released for public benefit purposes, which, however, poses a risk to individual privacy. A typical solution to this problem is to anonymize the data before releasing it to the public. In particular, the anonymization should be conducted in a careful manner, such that the published data not only prevents an adversary from inferring sensitive information, but also remains useful for data analysis.
This thesis presents an extensive study on the anonymization techniques for privacy preserving data publishing. We explore various aspects of the problem (e.g., definitions of privacy, modeling of the adversary, methodologies of anonymization), and devise novel solutions that address several important issues overlooked by previous work. Experiments with real-world data confirm the effectiveness and efficiency of our techniques.
Xiao, Xiaokui.
Adviser: Yufei Tao.
Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3618.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves 307-314).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
14

Iftikhar, Masooma. "Privacy-Preserving Data Publishing." Phd thesis, 2022. http://hdl.handle.net/1885/272877.

Abstract:
With the advances of data analytics, preserving privacy in publishing data about individuals becomes an important task. The data publishing process includes two phases: (i) the data collection phase, and (ii) the data publishing phase. In the data collection phase, companies, organizations, and government agencies collect data from individuals through different means (such as surveys, polls, and questionnaires). Subsequently, in the data publishing phase, the data publisher or data holder publishes the collected data and information for analysis and research purposes, which are later used to inform policy decision making. Given the private nature of collected data about individuals, releasing such data may raise privacy concerns, and there has been much interest in devising privacy-preserving mechanisms for data analysis. Moreover, preserving the privacy of an individual while enhancing the utility of published data is one of the most challenging problems in data privacy, requiring well-designed privacy-preserving mechanisms for data publishing. In recent years, differential privacy has emerged as one formal notion of privacy. To publish data under the guarantees of differential privacy, there is a need for preserving data utility, along with data privacy. However, the utility of published data under differential privacy is often limited, due to the amount of noise needed to achieve differential privacy. One of the key challenges in differentially private data publishing mechanisms is to simultaneously preserve data privacy while enhancing data utility. This thesis undertakes this challenge and introduces novel privacy-preserving mechanisms under the privacy guarantee of differential privacy to publish individuals' data while enhancing published data utility for different data structures. In this thesis, I explore both relational data publishing and graph data publishing. The first part of this thesis considers the problem of generating differentially private datasets by integrating microaggregation into relational data publishing methods in order to enhance published data utility. The second part of this thesis considers graph data publishing. When applying differential privacy to network data, two interpretations of differential privacy exist: edge differential privacy (edge-DP) and node differential privacy (node-DP). Under edge-DP, I propose a microaggregation-based framework for graph anonymization which preserves the topological structures of an original graph at different levels of granularity through adding controlled perturbation to its edges. Under node-DP, I study the problem of publishing higher-order network statistics. Furthermore, I consider personalization to achieve personal data protection under personalized (edge or node) differential privacy while enhancing network data utility. To this end, four approaches are proposed to handle the personal privacy requirements of individuals. I have conducted extensive experiments using real-world datasets to verify the utility enhancement and privacy guarantee of the proposed frameworks against existing state-of-the-art methods to publish relational and graph data.
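The thesis's microaggregation-based frameworks are not reproduced here, but as background, a standard way to satisfy differential privacy on a numeric query is the Laplace mechanism, sketched below; the edge-count example and parameters are hypothetical.

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
        """Return a noisy answer satisfying epsilon-differential privacy by
        adding Laplace noise with scale sensitivity/epsilon."""
        rng = rng or np.random.default_rng()
        return true_value + rng.laplace(scale=sensitivity / epsilon)

    # Under edge-DP, adding or removing one edge changes an edge-count
    # query by at most 1, so its sensitivity is 1.
    print(laplace_mechanism(true_value=1234, sensitivity=1, epsilon=0.5))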
15

Cao, Ming. "Privacy Protection on RFID Data Publishing." Thesis, 2009. http://spectrum.library.concordia.ca/976641/1/MR63109.pdf.

Abstract:
Radio Frequency IDentification (RFID) is a technology of automatic object identification. Retailers and manufacturers have created compelling business cases for deploying RFID in their supply chains. Yet, the uniquely identifiable objects pose a privacy threat to individuals. In this paper, we study the privacy threats caused by publishing RFID data. Even if the explicit identifying information, such as name and social security number, has been removed from the published RFID data, an adversary may identify a target victim's record or infer her sensitive value by matching a priori known visited locations and time. RFID data by its nature is high-dimensional and sparse, so applying traditional k-anonymity to RFID data suffers from the curse of high-dimensionality, and results in poor information usefulness. We define a new privacy model and develop an anonymization algorithm to accommodate special challenges on RFID data. Then, we evaluate its effectiveness on synthetic data sets.
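A toy illustration of the linking attack this abstract describes, assuming each published record carries a set of (location, time) visits; all identifiers and values are hypothetical.

    # Published RFID records: pseudonymous ID -> visited (location, time)
    # pairs plus a sensitive value. All data are hypothetical.
    published = {
        "r1": {"visits": {("mall", 9), ("clinic", 11)}, "diagnosis": "flu"},
        "r2": {"visits": {("mall", 9), ("office", 10)}, "diagnosis": "none"},
        "r3": {"visits": {("gym", 8), ("clinic", 11)}, "diagnosis": "hiv"},
    }

    # Adversary's background knowledge about the victim's movements.
    known_visits = {("mall", 9), ("clinic", 11)}

    matches = [rid for rid, rec in published.items()
               if known_visits <= rec["visits"]]
    if len(matches) == 1:  # a unique match means re-identification succeeds
        print("victim is", matches[0], "->", published[matches[0]]["diagnosis"])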
16

Chen, Rui. "Toward Privacy in High-Dimensional Data Publishing." Thesis, 2012. http://spectrum.library.concordia.ca/974691/4/Chen_PhD_F2012.pdf.

Abstract:
Nowadays data sharing among multiple parties has become inevitable in various application domains for diverse reasons, such as decision support, policy development and data mining. Yet, data in its raw format often contains person-specific sensitive information, and publishing such data without proper protection may jeopardize individual privacy. This fact has spawned extensive research on privacy-preserving data publishing (PPDP), which balances the fundamental trade-off between individual privacy and the utility of published data. Early research of PPDP focuses on protecting private and sensitive information in relational and statistical data. However, the recent prevalence of several emerging types of high-dimensional data has rendered unique challenges that prevent traditional PPDP techniques from being directly used. In this thesis, we address the privacy concerns in publishing four types of high-dimensional data, namely set-valued data, trajectory data, sequential data and network data. We develop effective and efficient non-interactive data publishing solutions for various utility requirements. Most of our solutions satisfy a rigorous privacy guarantee known as differential privacy, which has been the de facto standard for privacy protection. This thesis demonstrates that our solutions have exhibited great promise for releasing useful high-dimensional data without endangering individual privacy.
17

"Preservation of privacy in sensitive data publishing." 2008. http://library.cuhk.edu.hk/record=b5893631.

Abstract:
Li, Jiexing.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves [105]-110).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Problem Statement --- p.1
Chapter 1.2 --- Contributions --- p.3
Chapter 1.3 --- Thesis Organization --- p.5
Chapter 2 --- Background Study --- p.7
Chapter 2.1 --- Generalization Algorithms --- p.7
Chapter 2.2 --- Privacy Principles --- p.10
Chapter 2.3 --- Other Related Research --- p.11
Chapter 3 --- Anti-Corruption Privacy Preserving Publication --- p.13
Chapter 3.1 --- Motivation --- p.13
Chapter 3.2 --- Problem Settings --- p.14
Chapter 3.3 --- Defects of Generalization --- p.18
Chapter 3.4 --- Perturbed Generalization --- p.23
Chapter 3.5 --- Modeling Privacy Attacks --- p.26
Chapter 3.5.1 --- Corruption-Aided Linking Attacks --- p.26
Chapter 3.5.2 --- Posterior Confidence Derivation --- p.28
Chapter 3.6 --- Formal Results --- p.30
Chapter 3.7 --- Experiments --- p.34
Chapter 3.8 --- Summary --- p.37
Chapter 4 --- Preservation of Proximity Privacy --- p.39
Chapter 4.1 --- Motivation --- p.39
Chapter 4.2 --- Formalization --- p.40
Chapter 4.2.1 --- Privacy Attacks --- p.41
Chapter 4.2.2 --- "(ε, m)-Anonymity" --- p.42
Chapter 4.3 --- Inadequacy of the Existing Methods --- p.44
Chapter 4.3.1 --- Inadequacy of Generalization Principles --- p.45
Chapter 4.3.2 --- Inadequacy of Perturbation --- p.49
Chapter 4.4 --- Characteristics of (ε, m)-Anonymity --- p.51
Chapter 4.4.1 --- A Reduction --- p.51
Chapter 4.4.2 --- Achievable Range of m Given ε1 and ε2 --- p.53
Chapter 4.4.3 --- Achievable ε1 and ε2 Given m --- p.57
Chapter 4.4.4 --- Selecting the Parameters --- p.60
Chapter 4.5 --- Generalization Algorithm --- p.61
Chapter 4.5.1 --- Non-Monotonicity and Predictability --- p.61
Chapter 4.5.2 --- The Algorithm --- p.63
Chapter 4.6 --- Experiments --- p.65
Chapter 4.7 --- Summary --- p.70
Chapter 5 --- Privacy Preserving Publication for Multiple Users --- p.71
Chapter 5.1 --- Motivation --- p.71
Chapter 5.2 --- Problem Definition --- p.74
Chapter 5.2.1 --- K-Anonymity --- p.75
Chapter 5.2.2 --- An Observation --- p.76
Chapter 5.3 --- The Butterfly Method --- p.78
Chapter 5.3.1 --- The Butterfly Structure --- p.78
Chapter 5.3.2 --- Anonymization Algorithm --- p.83
Chapter 5.4 --- Extensions --- p.89
Chapter 5.4.1 --- Handling More Than Two QIDs --- p.89
Chapter 5.4.2 --- Handling Collusion --- p.91
Chapter 5.5 --- Experiments --- p.93
Chapter 5.6 --- Summary --- p.101
Chapter 6 --- Conclusions and Future Work --- p.102
Chapter A --- List of Publications --- p.104
Bibliography --- p.105
18

HSIAO, MEI-HUI, and 蕭美慧. "Privacy-Preserving Data Publishing with Missing Values." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/7t7u9u.

Abstract:
Master's thesis
National University of Kaohsiung
Master's Program, Department of Computer Science and Information Engineering
Academic year 105 (2016)
Recently, privacy preserving data publishing has become an important research issue. Over the past few years, although many different privacy preserving data anonymization methods have been proposed by researchers, all of them deal with non-missing data. However, in the real world most published data contain missing values. No contemporary work notices this problem and investigates the effect of missing values on privacy preserving data publishing. The aim of this research is to discuss the impact of missing values on current privacy preserving anonymization methods and propose appropriate solutions. We investigate possible strategies, as well as their deficiencies, for adopting contemporary anonymization methods for missing values. Accordingly, we propose a new strategy and two privacy protection models, called Closed k-anonymity and Closed l-diversity. Closed k-anonymity can prevent record linkage attacks, while Closed l-diversity can prevent attribute linkage attacks. We also propose two corresponding algorithms, called Closed k-anonymization and Closed l-diversification. Finally, we compare our methods with the well-known k-anonymity and l-diversity, evaluating their performance by measuring the information loss, privacy risk and data utility on anonymizing two real datasets, including census data and FAERS data. Experimental results show that our methods can effectively anonymize data with missing values, not only preventing privacy disclosure but also sustaining data utility and analysis results.
19

Al-Hussaeni, Khalil. "Preserving Data Privacy and Information Usefulness for RFID Data Publishing." Thesis, 2009. http://spectrum.library.concordia.ca/976457/1/MR63082.pdf.

Abstract:
Radio-Frequency IDentification (RFID) is an emerging technology that employs radio waves to identify, locate, and track objects. RFID technology has wide applications in many areas including manufacturing, healthcare, and transportation. However, the manipulation of uniquely identifiable objects gives rise to privacy concerns for the individuals carrying these objects. Most previous works on privacy-preserving RFID technology, such as EPC re-encryption and killing tags, have focused on the threats caused by the physical RFID tags in the data collection phase, but these techniques cannot address privacy threats in the data publishing phase, when a large volume of RFID data is released to a third party. We explore the privacy threats in RFID data publishing. We illustrate that even though explicit identifying information, such as phone numbers and SSNs, is removed from the published RFID data, an attacker may still be able to perform privacy attacks by utilizing background knowledge about a target victim's visited locations and timestamps. Privacy attacks include identifying a target victim's record and/or inferring their sensitive information. High-dimensionality is an inherent characteristic in RFID data; therefore, applying traditional anonymity models, such as K-anonymity, to RFID data would significantly reduce data utility. We propose a new privacy model, devise an anonymization algorithm to address the special challenges of RFID data, and experimentally evaluate the performance of our method. Experiments suggest that applying our model significantly improves the data utility when compared to applying the traditional K-anonymity model.
20

Yang, Duen-Chuan, and 楊敦筌. "Privacy Preserving Data Publishing Techniques for Spontaneous Reporting System Data." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/53521333395278780908.

Abstract:
Master's thesis
National University of Kaohsiung
Master's Program, Department of Computer Science and Information Engineering
Academic year 103 (2014)
In recent years, spontaneous reporting systems (SRSs) have been widely established to collect adverse drug events (ADEs) for ADR detection and analysis, e.g., the FDA Adverse Event Reporting System (FAERS). Usually, SRS data contain sensitive personal health information that should be protected to prevent the identification of individuals, raising the need to anonymize the raw data before publication, namely privacy-preserving data publishing (PPDP). Although much work has been done on PPDP, very few studies have focused on protecting the privacy of SRS data. In this thesis, we first present the problem of and research issues for anonymizing spontaneous ADE reporting data for privacy-preserving ADR signal detection. Four main characteristics of spontaneous ADE data are identified, including rare ADE events, multiple individual records, multi-valued sensitive attributes, and missing values. We examine the feasibility of contemporary privacy-preserving models for anonymizing SRS datasets, showing their incompetence in handling these issues, which arouses the need for new privacy models and data anonymizing methods. Therefore, we present a new privacy-preserving model, called MS(k, θ*)-bounding, together with an associated anonymization algorithm, MS-Anonymization.
21

WANG, CHIEH-TENG, and 王介騰. "Privacy Preserving Anonymity for Periodical SRS Data Publishing." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/16278646066845717875.

Abstract:
Master's thesis
National University of Kaohsiung
Master's Program, Department of Computer Science and Information Engineering
Academic year 104 (2015)
In recent years, many countries have built spontaneous reporting systems to collect adverse drug events for ADR detection and analysis, e.g., the FDA Adverse Event Reporting System (FAERS). The SRS data are provided to researchers, and even opened to the public, to foster ADR research. Normally, SRS data contain personal information and private values such as indication. Thus, it is necessary to de-identify SRS data to prevent the disclosure of individual privacy before publication. However, researchers have pointed out that removing personal identifiers alone is not enough to protect personal privacy. To publish data in a safer way, the technique of privacy-preserving data publishing (PPDP) has gradually attracted much attention. Although many different PPDP models (privacy models) have been proposed by researchers, they are not suitable for protecting SRS data from disclosure due to some features of SRS data. As such, we proposed a privacy model called MS(k, θ*)-bounding and the associated algorithm MS-Anonymization in our previous work. In the real world, SRS data grow dynamically and need to be published periodically, which thwarts our single-release method MS(k, θ*)-bounding, leaving cracks in the anonymization that an attacker can exploit. In this research, we investigated the attacks on periodically published SRS data and proposed a new privacy model called PPMS(k, θ*)-bounding and the associated algorithm PPMS-Anonymization. Experimental results on the FAERS dataset show that our new method can prevent privacy disclosure from attacks in the periodical data publishing scenario with a reasonable sacrifice of data utility and acceptable deviation in the strength of ADR signals.
22

HSU, KUANG-YUNG, and 許絖詠. "Privacy-Preserving SRS Data Publishing with Missing Values." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/vy754k.

Abstract:
Master's thesis
National University of Kaohsiung
Master's Program, Department of Computer Science and Information Engineering
Academic year 106 (2017)
In recent years, many countries have established spontaneous reporting systems (SRSs) for the detection and analysis of adverse drug reactions (ADRs), such as the US Food and Drug Administration's Adverse Event Reporting System (FAERS). These SRS data usually contain sensitive personal privacy information. In order to prevent personal privacy leakage, the data must be de-identified and processed by some privacy-preserving data publishing (PPDP) technique before being published. Although many scholars have proposed various privacy protection models, they overlooked the characteristics of SRS data. Therefore, our lab proposed a feasible privacy model, MS(k, θ*)-bounding, dedicated to SRS data, and a corresponding anonymization method, MS-Anonymization. However, this method is only applicable to complete data, not considering the fact that there is a large amount of missing data. On the other hand, our lab proposed privacy models for handling missing values, Closed k-anonymity and Closed l-diversity, but these are not tailored to the characteristics of SRS data. Therefore, in this thesis, we propose a new privacy model, Closed MS(k, θ*)-bounding, which combines MS(k, θ*)-bounding with Closed k-anonymity and Closed l-diversity, and propose three new anonymization methods, Closed-MSpartition, Closed-MSdirect, and Closed-MSsorting, to process SRS data with missing values. We used FAERS data to test and compare our three methods from the aspects of information loss, privacy risk, and data utility. The results show that Closed-MSdirect has better performance on information distortion, privacy exposure risk and data utility. Although Closed-MSpartition and Closed-MSsorting have higher information loss and privacy risk, and lower data utility, than Closed-MSdirect, the results are still within an acceptable range. In summary, when a large proportion of SRS records contain missing values, our proposed new methods can effectively prevent attackers from learning personal privacy.
23

"Privacy preserving in serial data and social network publishing." 2010. http://library.cuhk.edu.hk/record=b5894365.

Abstract:
Liu, Jia.
"August 2010."
Thesis (M.Phil.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (p. 69-72).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Related Work --- p.3
Chapter 3 --- Privacy Preserving Network Publication against Structural Attacks --- p.5
Chapter 3.1 --- Background and Motivation --- p.5
Chapter 3.1.1 --- Adversary knowledge --- p.6
Chapter 3.1.2 --- Targets of Protection --- p.7
Chapter 3.1.3 --- Challenges and Contributions --- p.10
Chapter 3.2 --- Preliminaries and Problem Definition --- p.11
Chapter 3.3 --- Solution: K-Isomorphism --- p.15
Chapter 3.4 --- Algorithm --- p.18
Chapter 3.4.1 --- Refined Algorithm --- p.21
Chapter 3.4.2 --- Locating Vertex Disjoint Embeddings --- p.30
Chapter 3.4.3 --- Dynamic Releases --- p.32
Chapter 3.5 --- Experimental Evaluation --- p.34
Chapter 3.5.1 --- Datasets --- p.34
Chapter 3.5.2 --- Data Structure of K-Isomorphism --- p.37
Chapter 3.5.3 --- Data Utilities and Runtime --- p.42
Chapter 3.5.4 --- Dynamic Releases --- p.47
Chapter 3.6 --- Conclusions --- p.47
Chapter 4 --- Global Privacy Guarantee in Serial Data Publishing --- p.49
Chapter 4.1 --- Background and Motivation --- p.49
Chapter 4.2 --- Problem Definition --- p.54
Chapter 4.3 --- Breach Probability Analysis --- p.57
Chapter 4.4 --- Anonymization --- p.58
Chapter 4.4.1 --- AG size Ratio --- p.58
Chapter 4.4.2 --- Constant-Ratio Strategy --- p.59
Chapter 4.4.3 --- Geometric Strategy --- p.61
Chapter 4.5 --- Experiment --- p.62
Chapter 4.5.1 --- Dataset --- p.62
Chapter 4.5.2 --- Anonymization --- p.63
Chapter 4.5.3 --- Evaluation --- p.64
Chapter 4.6 --- Conclusion --- p.68
Bibliography --- p.69
24

CHANG, YU-HSIANG, and 張煜祥. "Privacy-Preserving High Dimensional Data Publishing Mechanism Meets K-Anonymity and Differential Privacy." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/7n3k93.

25

Khokhar, Rashid Hussain. "Quantifying the Costs and Benefits of Privacy-Preserving Health Data Publishing." Thesis, 2013. http://spectrum.library.concordia.ca/977136/1/Khokhar_MASc_S2013.pdf.

Abstract:
Cost-benefit analysis is required for making good business decisions. This analysis is crucial in the field of privacy-preserving data publishing. In the economic trade of data privacy and utility, organizations have the obligation to respect the privacy of individuals. They intend to maximize utility in order to earn revenue, while also aiming to achieve an acceptable level of privacy. In this thesis, we study the privacy and utility trade-offs and propose an analytical cost model which can help an organization make better decisions when sharing customer data with another party. We examine the relevant cost factors associated with earning the revenue and the potential damage cost. Our proposed model is suitable for health information custodians (HICs) who share raw patient electronic health records (EHRs) with another health center or health insurer for research and commercial purposes. Health data in its raw form contains a significant volume of sensitive data, and sharing this data raises issues of privacy breach. Our analytical cost model could be utilized for nonperturbative and perturbative anonymization techniques for relational data. We show, through extensive experiments on a real-life dataset, that our approach can achieve the optimal value for each choice of privacy model, namely K-anonymity, LKC-privacy, and ε-differential privacy, together with its anonymization algorithm and level.
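The abstract describes the cost model only at a high level; a toy version of such a cost-benefit calculation, with entirely hypothetical numbers and not the thesis's actual formulation, might read:

    def net_benefit(revenue, breach_probability, damage_cost):
        """Expected net benefit of a data release: revenue earned from the
        shared data minus the expected cost of a privacy breach."""
        return revenue - breach_probability * damage_cost

    # Stronger anonymisation lowers both the data's market value and the
    # probability of a breach; the numbers below are purely illustrative.
    for level, revenue, p_breach in [("raw", 100_000, 0.30),
                                     ("k=5", 80_000, 0.05),
                                     ("k=50", 50_000, 0.01)]:
        print(level, net_benefit(revenue, p_breach, damage_cost=500_000))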
26

"Privacy preserving data publishing: an expected gain model with negative association immunity." 2012. http://library.cuhk.edu.hk/record=b5549584.

Abstract:
Privacy preserving is an important issue in many applications, especially for applications that involve humans. In privacy preserving data publishing (PPDP), we study how to publish a database, which contains data records of some individuals, so that the privacy of the individuals is preserved while the published database still contains useful information for research or data analysis.
This thesis focuses on privacy models and algorithms in PPDP. We first propose an expected gain model to define whether privacy is preserved for publishing a database. The expected gain model satisfies the six axioms in quantifying private information proposed in this thesis, where the sixth axiom considers human factors in the view of social psychology. In addition, it considers the amount of advantage gained by an adversary by exploiting the private information deduced from a published database. Hence, the model reflects the reality that the adversary uses such an advantage to earn a profit, which is not considered by other existing privacy models. Then, we propose an algorithm to generate published databases that satisfy the expected gain model. Experiments on real datasets are conducted to show that the proposed algorithm is feasible for real applications. After that, we propose a value suppression framework to make the published databases immune to negative association, which is a kind of background / foreground knowledge attack. Experiments are conducted to show that negative association immunity can be achieved by suppressing only a few percent of sensitive values on average. Finally, we investigate PPDP in a non-centralized environment, in which two or more data holders generate their own different but related published databases. We propose a non-centralized distinct l-diversity requirement as the privacy model and an algorithm to generate published databases for this requirement. Experiments are conducted to show that the proposed algorithm is feasible for real applications.
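The "expected gain of a fair guessing game" is defined formally in the thesis chapters listed below; as a rough, hypothetical illustration of the idea, the expected payoff of an adversary who guesses the most probable sensitive value could be computed like this.

    def expected_gain(posteriors, payoff_correct=1.0, payoff_wrong=-1.0):
        """Expected payoff when the adversary guesses the sensitive value
        with the highest posterior probability. A toy stand-in for the
        thesis's model, not its actual definition."""
        p_best = max(posteriors.values())
        return p_best * payoff_correct + (1 - p_best) * payoff_wrong

    # Hypothetical posterior over a victim's sensitive value, as deduced
    # from a published table.
    print(expected_gain({"flu": 0.2, "cancer": 0.7, "none": 0.1}))  # 0.4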
Cheong, Chi Hong.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 186-193).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Background --- p.1
Chapter 1.2 --- Thesis Contributions and Organization --- p.2
Chapter 1.3 --- Other Related Areas --- p.5
Chapter 1.3.1 --- Privacy Preserving Data Mining --- p.5
Chapter 1.3.2 --- Partition-Based Approach vs. Differential Privacy Approach --- p.5
Chapter 2 --- Expected Gain Model --- p.7
Chapter 2.1 --- Introduction --- p.8
Chapter 2.1.1 --- Background and Motivation --- p.8
Chapter 2.1.2 --- Contributions --- p.11
Chapter 2.2 --- Table Models --- p.12
Chapter 2.2.1 --- Private Table --- p.12
Chapter 2.2.2 --- Published Table --- p.13
Chapter 2.3 --- Private Information Model --- p.14
Chapter 2.3.1 --- Proposition --- p.14
Chapter 2.3.2 --- Private Information and Private Probability --- p.15
Chapter 2.3.3 --- Public Information and Public Probability --- p.18
Chapter 2.3.4 --- Axioms in Quantifying Private Information --- p.20
Chapter 2.4 --- Quantifying Private Information --- p.34
Chapter 2.4.1 --- Expected Gain of a Fair Guessing Game --- p.34
Chapter 2.4.2 --- Analysis --- p.41
Chapter 2.5 --- Tuning the Importance of Opposite Information --- p.48
Chapter 2.6 --- Conclusions --- p.53
Chapter 3 --- Generalized Expected Gain Model --- p.56
Chapter 3.1 --- Introduction --- p.58
Chapter 3.2 --- Table Models --- p.60
Chapter 3.2.1 --- Private Table --- p.62
Chapter 3.2.2 --- Published Table --- p.62
Chapter 3.3 --- Expected Gain Model --- p.63
Chapter 3.3.1 --- Random Variable and Probability Distribution --- p.64
Chapter 3.3.2 --- Public Information --- p.64
Chapter 3.3.3 --- Private Information --- p.65
Chapter 3.3.4 --- Expected Gain Model --- p.66
Chapter 3.4 --- Generalization Algorithm --- p.75
Chapter 3.4.1 --- Generalization Property and Subset Property --- p.75
Chapter 3.4.2 --- Modified Version of Incognito --- p.78
Chapter 3.5 --- Related Work --- p.80
Chapter 3.5.1 --- k-Anonymity --- p.80
Chapter 3.5.2 --- l-Diversity --- p.81
Chapter 3.5.3 --- Confidence Bounding --- p.83
Chapter 3.5.4 --- t-Closeness --- p.84
Chapter 3.6 --- Experiments --- p.85
Chapter 3.6.1 --- Experiment Set 1: Average/Max/Min Expected Gain --- p.85
Chapter 3.6.2 --- Experiment Set 2: Expected Gain Distribution --- p.90
Chapter 3.6.3 --- Experiment Set 3: Modified Version of Incognito --- p.95
Chapter 3.7 --- Conclusions --- p.99
Chapter 4 --- Negative Association Immunity --- p.100
Chapter 4.1 --- Introduction --- p.100
Chapter 4.2 --- Related Work --- p.104
Chapter 4.3 --- Negative Association Immunity and Value Suppression --- p.107
Chapter 4.3.1 --- Negative Association --- p.108
Chapter 4.3.2 --- Negative Association Immunity --- p.111
Chapter 4.3.3 --- Achieving Negative Association Immunity by Value Suppression --- p.114
Chapter 4.4 --- Local Search Algorithm --- p.123
Chapter 4.5 --- Experiments --- p.125
Chapter 4.5.1 --- Settings --- p.125
Chapter 4.5.2 --- Results and Discussions --- p.128
Chapter 4.6 --- Conclusions --- p.129
Chapter 5 --- Non-Centralized Distinct l-Diversity --- p.130
Chapter 5.1 --- Introduction --- p.130
Chapter 5.2 --- Related Work --- p.138
Chapter 5.3 --- Table Models --- p.140
Chapter 5.3.1 --- Private Tables --- p.140
Chapter 5.3.2 --- Published Tables --- p.141
Chapter 5.4 --- Private Information Deduced from Multiple Published Tables --- p.143
Chapter 5.4.1 --- Private Information Deduced by Simple Counting on Each Published Tables --- p.143
Chapter 5.4.2 --- Private Information Deduced from Multiple Published Tables --- p.145
Chapter 5.4.3 --- Probabilistic Table --- p.156
Chapter 5.5 --- Non-Centralized Distinct l-Diversity and Algorithm --- p.158
Chapter 5.5.1 --- Non-centralized Distinct l-diversity --- p.159
Chapter 5.5.2 --- Algorithm --- p.165
Chapter 5.5.3 --- Theorems --- p.171
Chapter 5.6 --- Experiments --- p.174
Chapter 5.6.1 --- Settings --- p.174
Chapter 5.6.2 --- Metrics --- p.176
Chapter 5.6.3 --- Results and Discussions --- p.179
Chapter 5.7 --- Conclusions --- p.181
Chapter 6 --- Conclusions --- p.183
Bibliography --- p.186
27

Zhang, X. "Toward scalable and cost-effective privacy-preserving big data publishing in cloud computing." Thesis, 2014. http://hdl.handle.net/10453/30324.

Abstract:
University of Technology, Sydney. Faculty of Engineering and Information Technology.
Big data and cloud computing are two disruptive trends nowadays, provisioning numerous opportunities to the current IT industry and research communities while posing significant challenges on them as well. The massive increase in computing power and data storage capacity provisioned by the cloud and the advances in big data mining and analytics have expanded the scope of information available to businesses, government, and individuals by orders of magnitude. A major obstacle to the adoption of cloud computing in sectors such as health and business for big data analysis is the privacy risk associated with releasing data sets to third parties in the cloud. The data sets in the sectors mentioned above often contain personal privacy-sensitive data, e.g., electronic health records and financial transaction records, while these data sets can offer significant economic and social benefits if analysed or mined by organizations such as disease research centres. Although some privacy issues are not new, the situation is aggravated due to features of cloud computing like ubiquitous access and multi-tenancy, and the three V properties of big data, i.e., Volume, Velocity and Variety. Therefore, it is still a significant challenge to achieve privacy-preserving big data publishing in cloud computing. A widely-adopted technique for privacy-preserving data publishing with semantic correctness guarantees is to anonymise data via generalisation, and a number of anonymisation approaches have been proposed. However, most existing approaches are either inherently sequential or distributed without directly optimising scalability, thus rendering them unsuitable for data-intensive applications and inapplicable to state-of-the-art parallel and distributed paradigms like MapReduce. In this thesis, we mainly investigate the problem of big data anonymisation for privacy preservation from the perspectives of scalability and cost-effectiveness. The cloud computing advantages, including on-demand resource provisioning, rapid elasticity and the pay-as-you-go fashion, are exploited to address the problem, aiming at high scalability and cost-effectiveness. Specifically, we examine three major phases in the lifecycle of privacy-preserving data publishing or sharing in cloud environments: data anonymisation, anonymous data update and anonymous data management. Accordingly, a scalable and cost-effective privacy-preserving framework is proposed to provide a holistic conceptual foundation for privacy preservation over big data and enable users to accomplish the full potential of the high scalability, elasticity, and cost-effectiveness of the cloud. We develop a corresponding prototype system consisting of a series of solutions to the scalability issues that lie in the three phases based on MapReduce, the de facto standard big data processing paradigm at present, for the sake of high scalability, cost-effectiveness and compatibility with other big data mining and analytical tools. In terms of extensive experiments on real-world data sets, this thesis demonstrates that our solutions can significantly improve the scalability and cost-effectiveness of big data privacy preservation compared to existing approaches.
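MapReduce itself is only named in the abstract; as a hedged sketch of the pattern such anonymisation jobs build on, here is a pure-Python stand-in that counts quasi-identifier groups, a typical first step before deciding which groups need further generalisation (attribute names are hypothetical).

    from collections import defaultdict

    def map_phase(record):
        """Emit (quasi-identifier tuple, 1) for each record."""
        yield (record["age_range"], record["zip_prefix"]), 1

    def reduce_phase(key, values):
        """Sum the group size; groups smaller than k would need further
        generalisation to satisfy k-anonymity."""
        return key, sum(values)

    records = [{"age_range": "30-39", "zip_prefix": "537"},
               {"age_range": "30-39", "zip_prefix": "537"},
               {"age_range": "40-49", "zip_prefix": "538"}]

    shuffled = defaultdict(list)           # the framework's shuffle step
    for rec in records:
        for key, value in map_phase(rec):
            shuffled[key].append(value)

    for key, values in shuffled.items():
        print(reduce_phase(key, values))   # group sizes for a k-anonymity check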
28

Ho, Shih-Han, and 何是翰. "Maximizing Discriminability on Dynamic Attributes for Privacy-Preserving Data Publishing Using K-Anonymity." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/w2cpcb.

Abstract:
Master's thesis
National Chung Hsing University
Department of Electrical Engineering
Academic year 107 (2018)
There are increasing demands on open data for scientific, medical, and social applications. Open data is a new trend, and more data are being released for data mining and decision making. To avoid the leakage of personal privacy caused by the release of data, data must be processed through privacy protection methods before being released. Since the optimization of privacy-preserving models like K-anonymity and L-diversity is NP-hard, most previous privacy-preserving methods trade off privacy preservation against data utility by designing heuristic algorithms to reduce information loss. Different from previous works, our main idea is that the released data should be privacy protected while providing different levels of discrimination for different individuals, i.e., observers with different backgrounds shall get different levels of information from the released data. For example, data are required to be released for the supervision of public administration or the financial inspection of foundations. Also, for a mass casualty incident (MCI), up-to-date information on injured and ill patients, especially their status and location, should be released so that ambulance and medical staff or the patients' families can easily find the resources the patients require. However, few privacy protection methods in the literature take into account both privacy and data discrimination. In this thesis, we study the privacy protection and data discrimination problem and find that the attributes of a dataset can be classified into static and dynamic attributes. Considering the dynamic discrimination privacy-preserving problem, we propose a new privacy-preserving model called the K_1 K_2-anonymization model. It ensures that each equivalence class on the static attributes still satisfies K-anonymization, while within an equivalence class the number of records with the same dynamic attributes is less than K_2. If the dynamic attribute values within equivalence classes are similar, there is no solution because they are hard to differentiate. We propose a clustering-based SimDiv algorithm to make the dynamic attributes within equivalence classes more discriminable as a compromise solution to the K_1 K_2-anonymization problem. To validate the effectiveness, we conduct experiments on a real dataset. The experimental results show that the proposed method outperforms other methods of similar models in terms of discriminability on dynamic attributes.
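Based on the description above, a minimal checker for the K_1 K_2 condition might read as follows; this is one interpretation of the abstract's wording, with hypothetical attribute names, not the thesis's code.

    from collections import Counter, defaultdict

    def satisfies_k1k2(rows, static_attrs, dynamic_attrs, k1, k2):
        """Check that every equivalence class on the static attributes has
        at least k1 records, and that within each class fewer than k2
        records share identical dynamic attribute values."""
        classes = defaultdict(list)
        for row in rows:
            key = tuple(row[a] for a in static_attrs)
            classes[key].append(tuple(row[a] for a in dynamic_attrs))
        for dyn_values in classes.values():
            if len(dyn_values) < k1:
                return False
            if any(c >= k2 for c in Counter(dyn_values).values()):
                return False
        return True

    rows = [{"age": "30-39", "zip": "402**", "status": "stable"},
            {"age": "30-39", "zip": "402**", "status": "critical"}]
    print(satisfies_k1k2(rows, ["age", "zip"], ["status"], k1=2, k2=2))  # True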