Log in

Relevant bibliographies by topics / Anonymisation des graphes

Contents

Journal articles
Dissertations / Theses
Book chapters

Academic literature on the topic 'Anonymisation des graphes'

Author: Grafiati

Published: 23 September 2022

Last updated: 28 September 2022

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Anonymisation des graphes.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Journal articles on the topic "Anonymisation des graphes"

1

Grau, Bernardo Cuenca, and Egor V. Kostylev. "Logical Foundations of Linked Data Anonymisation." Journal of Artificial Intelligence Research 64 (February 16, 2019): 253–314. http://dx.doi.org/10.1613/jair.1.11355.

Full text

Abstract:

The widespread adoption of the Linked Data paradigm has been driven by the increasing demand for information exchange between organisations, as well as by regulations in domains such as health care and governance that require certain data to be published. In this setting, sensitive information is at high risk of disclosure since published data can be often seamlessly linkedwith arbitrary external data sources.In this paper we lay the logical foundations of anonymisation in the context of Linked Data. We consider anonymisations of RDF graphs (and, more generally, relational datasets with labelled nulls) and define notions of policy-compliant and linkage-safe anonymisations. Policy compliance ensures that an anonymised dataset does not reveal any sensitive information as specified by a policy query. Linkage safety ensures that an anonymised dataset remains compliant even if it is linked to (possibly unknown) external datasets available on the Web, thus providing provable protection guarantees against data linkage attacks. We establish the computational complexity of the underpinning decision problems both under the open-world semantics inherent to RDF and under the assumption that an attacker has complete, closed-world knowledge over some parts of the original data.

APA, Harvard, Vancouver, ISO, and other styles

2

Mauw, Sjouke, Yunior Ramírez-Cruz, and Rolando Trujillo-Rasua. "Preventing active re-identification attacks on social graphs via sybil subgraph obfuscation." Knowledge and Information Systems 64, no. 4 (2022): 1077–100. http://dx.doi.org/10.1007/s10115-022-01662-z.

Full text

Abstract:

AbstractActive re-identification attacks constitute a serious threat to privacy-preserving social graph publication, because of the ability of active adversaries to leverage fake accounts, a.k.a. sybil nodes, to enforce structural patterns that can be used to re-identify their victims on anonymised graphs. Several formal privacy properties have been enunciated with the purpose of characterising the resistance of a graph against active attacks. However, anonymisation methods devised on the basis of these properties have so far been able to address only restricted special cases, where the adversaries are assumed to leverage a very small number of sybil nodes. In this paper, we present a new probabilistic interpretation of active re-identification attacks on social graphs. Unlike the aforementioned privacy properties, which model the protection from active adversaries as the task of making victim nodes indistinguishable in terms of their fingerprints with respect to all potential attackers, our new formulation introduces a more complete view, where the attack is countered by jointly preventing the attacker from retrieving the set of sybil nodes, and from using these sybil nodes for re-identifying the victims. Under the new formulation, we show that k-symmetry, a privacy property introduced in the context of passive attacks, provides a sufficient condition for the protection against active re-identification attacks leveraging an arbitrary number of sybil nodes. Moreover, we show that the algorithm K-Match, originally devised for efficiently enforcing the related notion of k-automorphism, also guarantees k-symmetry. Empirical results on real-life and synthetic graphs demonstrate that our formulation allows, for the first time, to publish anonymised social graphs (with formal privacy guarantees) that effectively resist the strongest active re-identification attack reported in the literature, even when it leverages a large number of sybil nodes.

APA, Harvard, Vancouver, ISO, and other styles

3

Cuenca Grau, Bernardo, and Egor Kostylev. "Logical Foundations of Privacy-Preserving Publishing of Linked Data." Proceedings of the AAAI Conference on Artificial Intelligence 30, no. 1 (2016). http://dx.doi.org/10.1609/aaai.v30i1.10105.

Full text

Abstract:

The widespread adoption of Linked Data has been driven by the increasing demand for information exchange between organisations, as well as by data publishing regulations in domains such as health care and governance. In this setting, sensitive information is at risk of disclosure since published data can be linked with arbitrary external data sources. In this paper we lay the foundations of privacy-preserving data publishing (PPDP) in the context of Linked Data. We consider anonymisations of RDF graphs (and, more generally, relational datasets with labelled nulls) and define notions of safe and optimal anonymisations. Safety ensures that the anonymised data can be published with provable protection guarantees against linking attacks, whereas optimality ensures that it preserves as much information from the original data as possible, while satisfying the safety requirement. We establish the complexity of the underpinning decision problems both under open-world semantics inherent to RDF and a closed-world semantics, where we assume that an attacker has complete knowledge over some part of the original data.

APA, Harvard, Vancouver, ISO, and other styles

4

Scheliga, Bernhard, Milan Markovic, Helen Rowlands, Artur Wozniak, Katie Wilde, and Jessica Butler. "Data provenance tracking and reporting in a high-security digital research environment." International Journal of Population Data Science 7, no. 3 (2022). http://dx.doi.org/10.23889/ijpds.v7i3.1909.

Full text

Abstract:

ObjectiveTo protect privacy, routinely-collected data are processed and anonymised by third parties before being used for research. However, the methods used to do this are rarely shared, leaving the resulting research difficult to evaluate and liable to undetected errors. Here, we present a provenance-based approach for documenting and auditing such methods. ApproachWe designed the Safe Haven Provenance (SHP) ontology for representing provenance information about data, users, and activities within high-security environments as knowledge graphs. The work was based on a case study of the Grampian Data Safe Haven (DASH) which holds and processes medical records for 600,000 people in Scotland. The SHP ontology was designed as an extension to the standard W3C PROV-O ontology. The auditing capabilities of our approach were evaluated against a set of transparency requirements through a prototype interactive dashboard. ResultsWe demonstrated the ability of the SHP ontology to document the workflow within DASH: capturing the extraction and anonymisation process using a structured vocabulary of entities (e.g. datasets), activities (e.g. linkage, anonymisation) and agents (e.g. analysts, data owners). Two provenance reporting templates were designed following interviews with DASH staff and clinical researchers: 1) a detailed report for use within DASH for quality assurance, and 2) a summary report for researchers that was safe for public release. Using a prototype data-linkage project, we formalised queries for report generation, and demonstrated use of automated rules for error detection (e.g., data discrepancies) using the structure of the SHP knowledge graphs. All of the project outputs are available under an open-source license. ConclusionsThis project lays a foundation for more transparent high-quality research using public data for health care and innovation. The SHP ontology is extendible for different domains and potentially represents a key component for further automation of provenance capture and reporting in high-security research environments.

APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Anonymisation des graphes"

1

Maag, Maria Coralia Laura. "Apprentissage automatique de fonctions d'anonymisation pour les graphes et les graphes dynamiques." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066050/document.

Full text

Abstract:

La confidentialité des données est un problème majeur qui doit être considéré avant de rendre publiques les données ou avant de les transmettre à des partenaires tiers avec comme but d'analyser ou de calculer des statistiques sur ces données. Leur confidentialité est principalement préservée en utilisant des techniques d'anonymisation. Dans ce contexte, un nombre important de techniques d'anonymisation a été proposé dans la littérature. Cependant, des méthodes génériques capables de s'adapter à des situations variées sont souhaitables. Nous adressons le problème de la confidentialité des données représentées sous forme de graphe, données qui nécessitent, pour différentes raisons, d'être rendues publiques. Nous considérons que l'anonymiseur n'a pas accès aux méthodes utilisées pour analyser les données. Une méthodologie générique est proposée basée sur des techniques d'apprentissage artificiel afin d'obtenir directement une fonction d'anonymisation et d'optimiser la balance entre le risque pour la confidentialité et la perte dans l'utilité des données. La méthodologie permet d'obtenir une bonne procédure d'anonymisation pour une large catégorie d'attaques et des caractéristiques à préserver dans un ensemble de données. La méthodologie est instanciée pour des graphes simples et des graphes dynamiques avec une composante temporelle. La méthodologie a été expérimentée avec succès sur des ensembles de données provenant de Twitter, Enron ou Amazon. Les résultats sont comparés avec des méthodes de référence et il est montré que la méthodologie proposée est générique et peut s'adapter automatiquement à différents contextes d'anonymisation<br>Data privacy is a major problem that has to be considered before releasing datasets to the public or even to a partner company that would compute statistics or make a deep analysis of these data. Privacy is insured by performing data anonymization as required by legislation. In this context, many different anonymization techniques have been proposed in the literature. These techniques are difficult to use in a general context where attacks can be of different types, and where measures are not known to the anonymizer. Generic methods able to adapt to different situations become desirable. We are addressing the problem of privacy related to graph data which needs, for different reasons, to be publicly made available. This corresponds to the anonymized graph data publishing problem. We are placing from the perspective of an anonymizer not having access to the methods used to analyze the data. A generic methodology is proposed based on machine learning to obtain directly an anonymization function from a set of training data so as to optimize a tradeoff between privacy risk and utility loss. The method thus allows one to get a good anonymization procedure for any kind of attacks, and any characteristic in a given set. The methodology is instantiated for simple graphs and complex timestamped graphs. A tool has been developed implementing the method and has been experimented with success on real anonymized datasets coming from Twitter, Enron or Amazon. Results are compared with baseline and it is showed that the proposed method is generic and can automatically adapt itself to different anonymization contexts

APA, Harvard, Vancouver, ISO, and other styles

2

Alchicha, Élie. "Confidentialité Différentielle et Blowfish appliquées sur des bases de données graphiques, transactionnelles et images." Thesis, Pau, 2021. http://www.theses.fr/2021PAUU3067.

Full text

Abstract:

Les données numériques jouent un rôle crucial dans notre vie quotidienne en communiquant, en enregistrant des informations, en exprimant nos pensées et nos opinions et en capturant nos moments précieux sous forme d'images et de vidéos numériques. Les données numériques présentent d'énormes avantages dans tous les aspects de la vie moderne, mais constituent également une menace pour notre vie privée. Dans cette thèse, nous considérons trois types de données numériques en ligne générées par les utilisateurs des médias sociaux et les clients du commerce électronique : les graphiques, les transactions et les images. Les graphiques sont des enregistrements des interactions entre les utilisateurs qui aident les entreprises à comprendre qui sont les utilisateurs influents dans leur environnement. Les photos postées sur les réseaux sociaux sont une source importante de données qui nécessitent des efforts d'extraction. Les ensembles de données transactionnelles représentent les opérations qui ont eu lieu sur les services de commerce électronique.Nous nous appuyons sur une technique de préservation de la vie privée appelée Differential Privacy (DP) et sa généralisation Blowfish Privacy (BP) pour proposer plusieurs solutions permettant aux propriétaires de données de bénéficier de leurs ensembles de données sans risque de violation de la vie privée pouvant entraîner des problèmes juridiques. Ces techniques sont basées sur l'idée de récupérer l'existence ou la non-existence de tout élément dans l'ensemble de données (tuple, ligne, bord, nœud, image, vecteur, ...) en ajoutant respectivement un petit bruit sur la sortie pour fournir un bon équilibre entre intimité et utilité.Dans le premier cas d'utilisation, nous nous concentrons sur les graphes en proposant trois mécanismes différents pour protéger les données personnelles des utilisateurs avant d'analyser les jeux de données. Pour le premier mécanisme, nous présentons un scénario pour protéger les connexions entre les utilisateurs avec une nouvelle approche où les utilisateurs ont des privilèges différents : les utilisateurs VIP ont besoin d'un niveau de confidentialité plus élevé que les utilisateurs standard. Le scénario du deuxième mécanisme est centré sur la protection d'un groupe de personnes (sous-graphes) au lieu de nœuds ou d'arêtes dans un type de graphes plus avancé appelé graphes dynamiques où les nœuds et les arêtes peuvent changer à chaque intervalle de temps. Dans le troisième scénario, nous continuons à nous concentrer sur les graphiques dynamiques, mais cette fois, les adversaires sont plus agressifs que les deux derniers scénarios car ils plantent de faux comptes dans les graphiques dynamiques pour se connecter à des utilisateurs honnêtes et essayer de révéler leurs nœuds représentatifs dans le graphique.Dans le deuxième cas d'utilisation, nous contribuons dans le domaine des données transactionnelles en présentant un mécanisme existant appelé Safe Grouping. Il repose sur le regroupement des tuples de manière à masquer les corrélations entre eux que l'adversaire pourrait utiliser pour violer la vie privée des utilisateurs. D'un autre côté, ces corrélations sont importantes pour les propriétaires de données dans l'analyse des données pour comprendre qui pourrait être intéressé par des produits, biens ou services similaires. Pour cette raison, nous proposons un nouveau mécanisme qui expose ces corrélations dans de tels ensembles de données, et nous prouvons que le niveau de confidentialité est similaire au niveau fourni par Safe Grouping.Le troisième cas d'usage concerne les images postées par les utilisateurs sur les réseaux sociaux. Nous proposons un mécanisme de préservation de la confidentialité qui permet aux propriétaires des données de classer les éléments des photos sans révéler d'informations sensibles. Nous présentons un scénario d'extraction des sentiments sur les visages en interdisant aux adversaires de reconnaître l'identité des personnes<br>Digital data is playing crucial role in our daily life in communicating, saving information, expressing our thoughts and opinions and capturing our precious moments as digital pictures and videos. Digital data has enormous benefits in all the aspects of modern life but forms also a threat to our privacy. In this thesis, we consider three types of online digital data generated by users of social media and e-commerce customers: graphs, transactional, and images. The graphs are records of the interactions between users that help the companies understand who are the influential users in their surroundings. The photos posted on social networks are an important source of data that need efforts to extract. The transactional datasets represent the operations that occurred on e-commerce services.We rely on a privacy-preserving technique called Differential Privacy (DP) and its generalization Blowfish Privacy (BP) to propose several solutions for the data owners to benefit from their datasets without the risk of privacy breach that could lead to legal issues. These techniques are based on the idea of recovering the existence or non-existence of any element in the dataset (tuple, row, edge, node, image, vector, ...) by adding respectively small noise on the output to provide a good balance between privacy and utility.In the first use case, we focus on the graphs by proposing three different mechanisms to protect the users' personal data before analyzing the datasets. For the first mechanism, we present a scenario to protect the connections between users (the edges in the graph) with a new approach where the users have different privileges: the VIP users need a higher level of privacy than standard users. The scenario for the second mechanism is centered on protecting a group of people (subgraphs) instead of nodes or edges in a more advanced type of graphs called dynamic graphs where the nodes and the edges might change in each time interval. In the third scenario, we keep focusing on dynamic graphs, but this time the adversaries are more aggressive than the past two scenarios as they are planting fake accounts in the dynamic graphs to connect to honest users and try to reveal their representative nodes in the graph. In the second use case, we contribute in the domain of transactional data by presenting an existed mechanism called Safe Grouping. It relies on grouping the tuples in such a way that hides the correlations between them that the adversary could use to breach the privacy of the users. On the other side, these correlations are important for the data owners in analyzing the data to understand who might be interested in similar products, goods or services. For this reason, we propose a new mechanism that exposes these correlations in such datasets, and we prove that the level of privacy is similar to the level provided by Safe Grouping.The third use-case concerns the images posted by users on social networks. We propose a privacy-preserving mechanism that allows the data owners to classify the elements in the photos without revealing sensitive information. We present a scenario of extracting the sentiments on the faces with forbidding the adversaries from recognizing the identity of the persons. For each use-case, we present the results of the experiments that prove that our algorithms can provide a good balance between privacy and utility and that they outperform existing solutions at least in one of these two concepts

APA, Harvard, Vancouver, ISO, and other styles

3

Delanaux, Rémy. "Intégration de données liées respectueuse de la confidentialité." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1303.

Full text

Abstract:

La confidentialité des données personnelles est un souci majeur et un problème peu étudié pour la publication de données dans le Web des données ouvertes (ou LOD cloud, pour Linked Open Data cloud) . Ce nuage formé par le LOD est un réseau d'ensembles de données interconnectés et accessibles publiquement sous la forme de graphes de données modélisés dans le format RDF, et interrogés via des requêtes écrites dans le langage SPARQL. Ce cadre très standardisé est très utilisé de nos jours par des organismes publics et des entreprises. Mais certains acteurs notamment du secteur privé sont toujours réticents à la publication de leurs données, découragés par des soucis potentiels de confidentialité. Pour pallier cela, nous présentons et développons un cadre formel déclaratif pour la publication de données liées respectant la confidentialité, dans lequel les contraintes de confidentialité et d'utilité des données sont spécifiées sous forme de politiques (des ensembles de requêtes SPARQL). Cette approche est indépendante des données et du graphe considéré, et consiste en l'analyse statique d'une politique de confidentialité et d'une politique d'utilité pour déterminer des séquences d'opérations d'anonymization à appliquer à n'importe quel graphe RDF pour satisfaire les politiques fournies. Nous démontrons la sûreté de nos algorithmes et leur efficacité en terme de performance via une étude expérimentale. Un autre aspect à prendre en compte est qu'un nouveau graphe publié dans le nuage LOD est évidemment exposé à des failles de confidentialité car il peut être relié à des données déjà publiées dans d'autres données liées. Dans le second volet de cette thèse, nous nous concentrons donc sur le problème de construction d'anonymisations *sûres* d'un graphe RDF garantissant que relier le graphe anonymisé à un graphe externe quelconque ne causera pas de brèche de confidentialité. En prenant un ensemble de requêtes de confidentialité en entrée, nous étudions le problème de sûreté indépendamment des données du graphe, et la construction d'une séquence d'opérations d'anonymisation permettant d'assurer cette sûreté. Nous détaillons des conditions suffisantes sous lesquelles une instance d'anonymisation est sûre pour une certaine politique de confidentialité fournie. Par ailleurs, nous montrons que nos algorithmes sont robustes même en présence de liens de type sameAs (liens d'égalité entre entités en RDF), qu'ils soient explicites ou inférés par de la connaissance externe. Enfin, nous évaluons l'impact de cette contribution assurant la sûreté de données en la testant sur divers graphes. Nous étudions notamment la performance de cette solution et la perte d'utilité causée par nos algorithmes sur des données RDF réelles comme synthétiques. Nous étudions d'abord les diverses mesures d'utilité existantes et nous en choisissons afin de comparer le graphe original et son pendant anonymisé. Nous définissons également une méthode pour générer de nouvelles politiques de confidentialité à partir d'une politique de référence, via des modifications incrémentales. Nous étudions le comportement de notre contribution sur 4 graphes judicieusement choisis et nous montrons que notre approche est efficace avec un temps très faible même sur de gros graphes (plusieurs millions de triplets). Cette approche est graduelle : le plus spécifique est la politique de confidentialité, le plus faible est son impact sur les données. Pour conclure, nous montrons via différentes métriques structurelles (adaptées aux graphes) que nos algorithmes ne sont que peu destructeurs, et cela même quand les politiques de confidentialité couvrent une grosse partie du graphe<br>Individual privacy is a major and largely unexplored concern when publishing new datasets in the context of Linked Open Data (LOD). The LOD cloud forms a network of interconnected and publicly accessible datasets in the form of graph databases modeled using the RDF format and queried using the SPARQL language. This heavily standardized context is nowadays extensively used by academics, public institutions and some private organizations to make their data available. Yet, some industrial and private actors may be discouraged by potential privacy issues. To this end, we introduce and develop a declarative framework for privacy-preserving Linked Data publishing in which privacy and utility constraints are specified as policies, that is sets of SPARQL queries. Our approach is data-independent and only inspects the privacy and utility policies in order to determine the sequence of anonymization operations applicable to any graph instance for satisfying the policies. We prove the soundness of our algorithms and gauge their performance through experimental analysis. Another aspect to take into account is that a new dataset published to the LOD cloud is indeed exposed to privacy breaches due to the possible linkage to objects already existing in the other LOD datasets. In the second part of this thesis, we thus focus on the problem of building safe anonymizations of an RDF graph to guarantee that linking the anonymized graph with any external RDF graph will not cause privacy breaches. Given a set of privacy queries as input, we study the data-independent safety problem and the sequence of anonymization operations necessary to enforce it. We provide sufficient conditions under which an anonymization instance is safe given a set of privacy queries. Additionally, we show that our algorithms are robust in the presence of sameAs links that can be explicit or inferred by additional knowledge. To conclude, we evaluate the impact of this safety-preserving solution on given input graphs through experiments. We focus on the performance and the utility loss of this anonymization framework on both real-world and artificial data. We first discuss and select utility measures to compare the original graph to its anonymized counterpart, then define a method to generate new privacy policies from a reference one by inserting incremental modifications. We study the behavior of the framework on four carefully selected RDF graphs. We show that our anonymization technique is effective with reasonable runtime on quite large graphs (several million triples) and is gradual: the more specific the privacy policy is, the lesser its impact is. Finally, using structural graph-based metrics, we show that our algorithms are not very destructive even when privacy policies cover a large part of the graph. By designing a simple and efficient way to ensure privacy and utility in plausible usages of RDF graphs, this new approach suggests many extensions and in the long run more work on privacy-preserving data publishing in the context of Linked Open Data

APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Anonymisation des graphes"

1

Alamán Requena, Guillermo, Rudolf Mayer, and Andreas Ekelhart. "Anonymisation of Heterogeneous Graphs with Multiple Edge Types." In Lecture Notes in Computer Science. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-12423-5_10.

Full text

APA, Harvard, Vancouver, ISO, and other styles

We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!