Dissertations on the topic "Privacy preserving machine learning"
Format your source citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations for your research on the topic "Privacy preserving machine learning."
Next to every entry in the list of references you will find an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style of your choice: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a .pdf file and read its abstract online, whenever these details are available in the metadata.
Browse dissertations from a wide variety of disciplines and compile your bibliography correctly.
Bozdemir, Beyza. "Privacy-preserving machine learning techniques." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS323.
Machine Learning as a Service (MLaaS) refers to a service that enables companies to delegate their machine learning tasks to one or more untrusted but powerful third parties, namely cloud servers. Thanks to MLaaS, the computational resources and domain expertise required to execute machine learning techniques are significantly reduced. Nevertheless, companies face increasing challenges in ensuring data privacy guarantees and compliance with data protection regulations. Executing machine learning tasks over sensitive data requires the design of privacy-preserving protocols for machine learning techniques. In this thesis, we aim to design such protocols for MLaaS and study three machine learning techniques under privacy protection: neural network classification, trajectory clustering, and data aggregation. Our goal is to guarantee data privacy while keeping an acceptable level of performance and accuracy/quality when executing the privacy-preserving variants of these machine learning techniques. To ensure data privacy, we employ several advanced cryptographic techniques: secure two-party computation, homomorphic encryption, homomorphic proxy re-encryption, multi-key homomorphic encryption, and threshold homomorphic encryption. We have implemented our privacy-preserving protocols and studied the trade-off between privacy, efficiency, and accuracy/quality for each of them.
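The secure two-party computation named above is easiest to see in its simplest building block. Below is a minimal, self-contained sketch (not taken from the thesis) of additive secret sharing over a finite ring, the primitive behind many secure-aggregation protocols; the two-server setup and all values are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of additive secret sharing: each client splits its value
# into random shares so that no single server learns anything, yet the sum
# of all shares reconstructs the exact aggregate.

MODULUS = 2**31 - 1  # arithmetic is done in a finite ring

def share(value: int, n_servers: int, rng: np.random.Generator) -> np.ndarray:
    """Split `value` into `n_servers` additive shares modulo MODULUS."""
    shares = rng.integers(0, MODULUS, size=n_servers - 1)
    last = (value - shares.sum()) % MODULUS
    return np.append(shares, last)

def reconstruct(shares: np.ndarray) -> int:
    return int(shares.sum() % MODULUS)

rng = np.random.default_rng(0)
client_values = [12, 7, 30]                      # private inputs of three clients
all_shares = np.array([share(v, 2, rng) for v in client_values])

# Each server only ever sees one column of random-looking shares...
server_sums = all_shares.sum(axis=0) % MODULUS
# ...yet combining the per-server aggregates yields the true total.
assert reconstruct(server_sums) == sum(client_values)  # 49
```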
Hesamifard, Ehsan. "Privacy Preserving Machine Learning as a Service." Thesis, University of North Texas, 2020. https://digital.library.unt.edu/ark:/67531/metadc1703277/.
Grivet Sébert, Arnaud. "Combining differential privacy and homomorphic encryption for privacy-preserving collaborative machine learning." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG037.
The purpose of this PhD is to design protocols for collaboratively training machine learning models while keeping the training data private. To do so, we focus on two privacy tools, namely differential privacy and homomorphic encryption. Differential privacy makes it possible to deliver a functional model that is immune to attacks by end-users on the privacy of the training data, while homomorphic encryption allows a server to be used as a totally blind intermediary between the data owners, providing computational resources without any access to information in the clear. Yet these two techniques are of very different natures, and each entails its own constraints, which may interfere: differential privacy generally requires continuous, unbounded noise, whereas homomorphic encryption can only handle numbers encoded with a rather limited number of bits. The presented contributions make these two privacy tools work together by coping with their interferences and even leveraging them so that the two techniques benefit from each other. In our first work, SPEED, we build on the Private Aggregation of Teacher Ensembles (PATE) framework and extend the threat model to deal with an honest-but-curious server by covering the server computations with a homomorphic layer. We carefully define which operations are realised homomorphically, so as to perform as little computation as possible in the costly encrypted domain while revealing little enough information in the clear for it to be easily protected by differential privacy. This trade-off forced us to realise an argmax operation in the encrypted domain which, even if reasonable, remained expensive. That is why, in another contribution, we propose SHIELD, an argmax operator made inaccurate on purpose, both to satisfy differential privacy and to lighten the homomorphic computation. The last presented contribution combines differential privacy and homomorphic encryption to secure a federated learning protocol. The main challenge of this combination comes from the necessary quantisation of the noise induced by encryption, which complicates the differential privacy analysis and justifies the design and use of a novel quantisation operator that commutes with the aggregation.
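To make the noisy-argmax idea concrete, here is a hedged plaintext sketch of "report noisy max", the standard differentially private argmax that SHIELD's deliberately inaccurate operator is reminiscent of; the thesis's actual operator runs under homomorphic encryption, which this toy version does not attempt. The vote counts and the scale calibration (2/epsilon for sensitivity-1 vote histograms) are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of a differentially private argmax ("report noisy max"):
# Laplace noise is added to each class's vote count before taking the argmax,
# so close runner-ups sometimes win, which is exactly what protects privacy.

def noisy_argmax(votes: np.ndarray, epsilon: float,
                 rng: np.random.Generator) -> int:
    """Return the index of the (noisily) most-voted class."""
    noise = rng.laplace(loc=0.0, scale=2.0 / epsilon, size=votes.shape)
    return int(np.argmax(votes + noise))

rng = np.random.default_rng(1)
teacher_votes = np.array([48, 45, 3, 4])   # votes of a teacher ensemble per class

# Strong privacy (small epsilon) flips the winner between close classes more
# often; weak privacy almost always returns the true argmax (class 0).
for eps in (0.05, 1.0, 10.0):
    picks = [noisy_argmax(teacher_votes, eps, rng) for _ in range(1000)]
    print(f"epsilon={eps:>5}: class 0 chosen {picks.count(0) / 10:.1f}% of runs")
```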
Cyphers, Bennett James. "A system for privacy-preserving machine learning on personal data." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/119518.
This thesis describes the design and implementation of a system which allows users to generate machine learning models with their own data while preserving privacy. We approach the problem in two steps. First, we present a framework with which a user can collate personal data from a variety of sources in order to generate machine learning models for problems of the user's choosing. Second, we describe AnonML, a system which allows a group of users to share data privately in order to build models for classification. We analyze AnonML under differential privacy and test its performance on real-world datasets. In tandem, these two systems will help democratize machine learning, allowing people to make the most of their own data without relying on trusted third parties.
Esperança, Pedro M. "Privacy-preserving statistical and machine learning methods under fully homomorphic encryption." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:a081311c-b25c-462e-a66b-1e4ac4de5fc2.
Повний текст джерелаZhang, Kevin M. Eng Massachusetts Institute of Technology. "Tiresias : a peer-to-peer platform for privacy preserving machine learning." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129840.
Big technology firms have a monopoly over user data. To remedy this, we propose a data science platform that allows users to collect their personal data and offer computations on them in a differentially private manner. The platform provides a mechanism for contributors to offer computations on their data in a privacy-preserving way, and for requesters (anyone who can benefit from applying machine learning to the users' data) to request computations on user data they would otherwise not be able to collect. Through carefully designed differential privacy mechanisms, we can create a platform which gives people control over their data and enables new types of applications.
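The "carefully designed differential privacy mechanisms" can be illustrated with the textbook Laplace mechanism; this sketch is not Tiresias code, and the query, data, and parameters are invented for illustration.

```python
import numpy as np

# A minimal sketch of the Laplace mechanism: a query with L1-sensitivity
# `sens`, answered with Laplace(sens / epsilon) noise, satisfies
# epsilon-differential privacy.

def laplace_mechanism(true_answer: float, sens: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    return true_answer + rng.laplace(0.0, sens / epsilon)

rng = np.random.default_rng(42)
ages = np.array([31, 45, 27, 52, 38, 41, 29, 60])   # contributors' shared records

# Counting queries have sensitivity 1: adding or removing one person changes
# the count by at most one.
count_over_40 = float((ages > 40).sum())
print(laplace_mechanism(count_over_40, sens=1.0, epsilon=0.5, rng=rng))
```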
Langelaar, Johannes, and Adam Strömme Mattsson. "Federated Neural Collaborative Filtering for privacy-preserving recommender systems." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446913.
Dou, Yanzhi. "Toward Privacy-Preserving and Secure Dynamic Spectrum Access." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/81882.
García Recuero, Álvaro. "Discouraging abusive behavior in privacy-preserving decentralized online social networks." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S010/document.
The main goal of this thesis is to evaluate privacy-preserving protocols for detecting abuse in future decentralised online social platforms and microblogging services, where often only a limited amount of metadata is available for data analytics. Taking such data minimisation into account, we obtain acceptable results compared to machine learning techniques that use all available metadata. We draw a series of conclusions and recommendations that will aid in the design and development of a privacy-preserving decentralised social network that discourages abusive behavior.
Ligier, Damien. "Functional encryption applied to privacy-preserving classification: practical use, performances and security." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2018. http://www.theses.fr/2018IMTA0040/document.
Machine Learning (ML) algorithms have proven themselves very powerful. Classification, especially, enables information in large datasets to be identified efficiently. However, it raises concerns about the privacy of this data, and thus brings to the forefront the challenge of designing machine learning algorithms able to preserve confidentiality. This thesis proposes a way to combine certain cryptographic systems with classification algorithms to obtain a privacy-preserving classifier. The cryptographic family in question is functional encryption, a generalisation of traditional public-key encryption in which decryption keys are associated with a function. We ran experiments on this combination in a realistic scenario using the MNIST dataset of handwritten digit images; in this use case, our system is able to determine which digit is written in an encrypted digit image. We also study its security in this real-life scenario, which raises concerns about the use of functional encryption schemes in general, not just in our use case. We then introduce a way to balance, in our construction, the efficiency of the classification against these risks.
Sarmadi, Soheil. "On the Feasibility of Profiling, Forecasting and Authenticating Internet Usage Based on Privacy Preserving NetFlow Logs." Scholar Commons, 2018. https://scholarcommons.usf.edu/etd/7568.
Chatalic, Antoine. "Efficient and privacy-preserving compressive learning." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S030.
The topic of this Ph.D. thesis lies on the borderline between signal processing, statistics and computer science. It mainly focuses on compressive learning, a paradigm for large-scale machine learning in which the whole dataset is compressed down to a single vector of randomized generalized moments, called the sketch. An approximate solution of the learning task at hand is then estimated from this sketch, without using the initial data. This framework is by nature suited to learning from distributed collections or data streams, and has already been instantiated with success on several unsupervised learning tasks such as k-means clustering, density fitting using Gaussian mixture models, and principal component analysis. We improve this framework in multiple directions. First, it is shown that perturbing the sketch with additive noise is sufficient to derive (differential) privacy guarantees. Sharp bounds on the noise level required to obtain a given privacy level are provided, and the proposed method is shown empirically to compare favourably with state-of-the-art techniques. Then, the compression scheme is modified to leverage structured random matrices, which reduce the computational cost of the framework and make it possible to learn on high-dimensional data. Lastly, we introduce a new algorithm based on message-passing techniques to learn from the sketch for the k-means clustering problem. These contributions open the way for a broader application of the framework.
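A rough, assumption-laden sketch of the compression step described above: average random Fourier moments of the data into a fixed-size vector, then perturb it with additive noise as the privacy result suggests. The feature map and noise scale here are illustrative, not the thesis's exact calibration.

```python
import numpy as np

# Compressive learning's compression step, in miniature: the whole dataset
# is summarised by the empirical average of random Fourier features, and a
# noisy version of that sketch is what would be released.

rng = np.random.default_rng(0)
n, d, m = 10_000, 2, 64          # n points in dimension d, sketch size m
X = rng.normal(size=(n, d)) + rng.choice([-3, 3], size=(n, 1))  # two clusters

Omega = rng.normal(scale=1.0, size=(d, m))   # random frequency matrix

def sketch(X: np.ndarray) -> np.ndarray:
    """Average the random Fourier features exp(i * x @ Omega) over the data."""
    Z = np.exp(1j * X @ Omega)
    return Z.mean(axis=0)

s_clean = sketch(X)              # fixed-size summary of all n points
s_private = s_clean + rng.normal(scale=0.01, size=m) \
                    + 1j * rng.normal(scale=0.01, size=m)

# The sketch's size is independent of n, and neighbouring datasets give
# nearby sketches, which is what makes noise addition effective for privacy.
print(np.linalg.norm(s_private - s_clean))
```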
Nan, Lihao. "Privacy Preserving Representation Learning For Complex Data." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20662.
Ma, Jianjie. "Learning from perturbed data for privacy-preserving data mining." Washington State University, 2006. http://www.dissertations.wsu.edu/Dissertations/Summer2006/j%5Fma%5F080406.pdf.
Повний текст джерелаTorfi, Amirsina. "Privacy-Preserving Synthetic Medical Data Generation with Deep Learning." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99856.
Computer programs have been widely used for clinical diagnosis but are often designed with assumptions that limit their scalability and interoperability. The recent proliferation of abundant health data, significant increases in computer processing power, and the superior performance of data-driven methods enable a trending paradigm shift in healthcare technology. This involves the adoption of artificial intelligence methods, such as deep learning, to improve healthcare knowledge and practice. Despite the success of deep learning in many other domains, privacy challenges in the healthcare field make collaborative research difficult, as working with data-driven methods may jeopardize patients' privacy. To overcome these challenges, researchers propose to generate and utilize realistic synthetic data that can be used instead of real private data. Existing methods for artificial data generation are limited by being bound to special use cases, and their generalizability to real-world problems is questionable. There is a need to establish valid synthetic data that overcomes privacy restrictions and functions as a real-world analog for training healthcare deep learning models. We propose the use of Generative Adversarial Networks to simultaneously overcome the realism and privacy challenges associated with healthcare data.
Chen, Xuhui. "Secure and Privacy-Aware Machine Learning." Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1563196765900275.
Zhang, Sixiao. "Classifier Privacy in Machine Learning Markets." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1586460332748024.
Liu, Menghan. "Pulmonary Function Monitoring Using Portable Ultrasonography and Privacy-Preserving Learning." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1481034164747838.
Nguyen, Trang Pham Ngoc. "A privacy preserving online learning framework for medical diagnosis applications." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2022. https://ro.ecu.edu.au/theses/2503.
Sitta, Alessandro. "Privacy-Preserving Distributed Optimization via Obfuscated Gradient Tracking." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Aryasomayajula, Naga Srinivasa Baradwaj. "Machine Learning Models for Categorizing Privacy Policy Text." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535633397362514.
Mhanna, Maggie. "Privacy-Preserving Quantization Learning for Distributed Detection with Applications to Smart Meters." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS047/document.
This thesis investigates source coding problems in which some secrecy should be ensured with respect to eavesdroppers. In the first part, we provide new fundamental results on both detection- and secrecy-oriented source coding in the presence of side information at the receiving terminals. We provide several new results of optimality and a single-letter characterization of the achievable rate-error-equivocation region, and propose practical algorithms to obtain solutions as close as possible to the optimum, which requires the design of optimal quantization in the presence of an eavesdropper. In the second part, we study the problem of secure estimation in a utility-privacy framework where the user is either looking to extract relevant aspects of complex data or to hide them from a potential eavesdropper. The objective is mainly centered on the development of a general framework that combines information theory with communication theory, aiming to provide a novel and powerful tool for security in Smart Grids. From a theoretical perspective, this research was able to quantify fundamental limits, and thus the trade-off between security and performance (estimation/detection).
Rodríguez Hoyos, Ana Fernanda. "Contribution to privacy-enhancing technologies for machine learning applications." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669919.
Big data applications currently drive accelerated innovation by taking advantage of the vast amount of information generated from users' interactions with technology. Thus, any entity is able to exploit data efficiently to obtain utility, using machine learning and unprecedented computing capabilities. However, serious concerns arise in this scenario regarding user privacy, since personal information is involved. Although several protection mechanisms have been proposed, there are challenges to their adoption in practice, that is, to their actual usability. To begin with, the real impact of these mechanisms on data utility is unclear, which is why their empirical evaluation is important. Moreover, considering that large volumes of data are currently being handled, usable privacy requires not only preservation of data utility but also computationally efficient algorithms. Satisfying both requirements is key to encouraging the adoption of privacy measures. Although there are several efforts to design less "destructive" privacy mechanisms, the utility metrics employed may not be appropriate, so these protection mechanisms could be evaluated incorrectly. On the other hand, despite the advent of big data, existing research does not focus much on improving efficiency. Unfortunately, if the requirements of current applications are not met, the adoption of privacy technologies will be hindered. In the first part of this thesis we address the problem of measuring the impact of k-anonymous microaggregation on the empirical utility of microdata. To this end, we quantify utility as the accuracy of classification models trained on microaggregated data and evaluated on original test data. Experiments show that the impact of the standard microaggregation algorithm on the performance of machine learning algorithms is usually small across a variety of evaluated datasets. Moreover, experimental evidence suggests that the traditional data-distortion metric is inappropriate for evaluating the empirical utility of microaggregated data. We also study the problem of preserving the empirical utility of data while it is being anonymised. By transforming the original data records into a different data space, our approach, based on linear discriminant analysis, enables the k-anonymous microaggregation process to be adapted to the data's application domain. To do so, first, the data are rotated (projected) in the direction of maximum discrimination and, second, scaled in this direction, penalising distortion across the classification threshold. As a result, data utility is preserved in terms of the accuracy of machine learning models on several datasets. Subsequently, we propose a mechanism to reduce the running time of k-anonymous microaggregation. This is achieved by simplifying the internal operations of the chosen algorithm. Through extensive experimentation on several datasets, we show that the new algorithm is considerably faster, with no loss of data utility.
Finally, in a more applied approach, we propose a tool to protect the privacy of individuals and organisations by anonymising sensitive data included in security logs. Several anonymisation mechanisms are designed and implemented on the basis of a privacy-policy definition, in the context of a European project that aims to build a unified security system.
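As a concrete (and much simplified) illustration of k-anonymous microaggregation, the sketch below sorts records along their first principal direction, groups them into cells of k, and replaces each cell by its centroid. It is a hypothetical stand-in for algorithms such as MDAV, and it omits the merging of a short final cell that a real implementation needs.

```python
import numpy as np

# Toy univariate-projection microaggregation: after replacement, every
# published record is identical to at least k-1 others within its cell.

def microaggregate(X: np.ndarray, k: int) -> np.ndarray:
    Xc = X - X.mean(axis=0)
    # First principal axis via SVD; project records onto it for ordering.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    order = np.argsort(Xc @ Vt[0])
    X_anon = np.empty_like(X)
    for start in range(0, len(X), k):
        cell = order[start:start + k]
        X_anon[cell] = X[cell].mean(axis=0)   # centroid replaces the cell
    return X_anon

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_anon = microaggregate(X, k=5)

# Every row now appears at least k times (here 100 % 5 == 0, so all cells
# have exactly 5 rows; a real implementation must merge a short last cell).
_, counts = np.unique(X_anon, axis=0, return_counts=True)
print(counts.min())  # >= 5
```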
Panfilo, Daniele. "Generating Privacy-Compliant, Utility-Preserving Synthetic Tabular and Relational Datasets Through Deep Learning." Doctoral thesis, Università degli Studi di Trieste, 2022. http://hdl.handle.net/11368/3030920.
Two trends have been rapidly redefining the artificial intelligence (AI) landscape over the past several decades. The first is the rapid technological development that makes increasingly sophisticated AI feasible. From a hardware point of view, this includes increased computational power and efficient data storage. From a conceptual and algorithmic viewpoint, fields such as machine learning have undergone a surge, and synergies between AI and other disciplines have resulted in considerable developments. The second trend is the growing societal awareness around AI. While institutions are becoming increasingly aware that they have to adopt AI technology to stay competitive, issues such as data privacy and explainability have become part of public discourse. Combined, these developments result in a conundrum: AI can improve all aspects of our lives, from healthcare to environmental policy to business opportunities, but invoking it requires the use of sensitive data. Unfortunately, traditional anonymization techniques do not provide a reliable solution to this conundrum. They are insufficient in protecting personal data, yet they also reduce the analytic value of data through distortion. However, the emerging study of deep-learning generative models (DLGM) may form a more refined alternative to traditional anonymization. Originally conceived for image processing, these models capture probability distributions underlying datasets. Such distributions can subsequently be sampled, giving new data points not present in the original dataset. However, the overall distribution of synthetic datasets, consisting of data sampled in this manner, is equivalent to that of the original dataset. In our research activity, we study the use of DLGM as an enabling technology for wider AI adoption. To do so, we first study legislation around data privacy with an emphasis on the European Union, and in doing so also provide an outline of traditional data anonymization technology. We then provide an introduction to AI and deep learning. Two case studies are discussed to illustrate the field's merits, namely image segmentation and cancer diagnosis. We then introduce DLGM, with an emphasis on variational autoencoders. The application of such methods to tabular and relational data is novel and involves innovative preprocessing techniques. Finally, we assess the developed methodology in reproducible experiments, evaluating both the analytic utility and the degree of privacy protection through statistical metrics.
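A minimal, hypothetical sketch of the core mechanism described above: a variational autoencoder fitted to standardised numeric tabular data and then sampled for synthetic rows. The thesis's actual architectures and preprocessing for tabular and relational data are considerably richer than this toy.

```python
import torch
from torch import nn

# A tiny VAE for tabular data: encode rows to a latent Gaussian, decode
# samples from the latent prior to obtain synthetic rows.

class TabularVAE(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_features))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparam. trick
        return self.decoder(z), mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()               # reconstruction
    kld = -0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(dim=1).mean()
    return recon + kld

X = torch.randn(512, 10)                   # placeholder for a real table
model = TabularVAE(10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                       # toy training loop
    x_hat, mu, logvar = model(X)
    loss = loss_fn(X, x_hat, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

# Synthetic rows: decode draws from the latent prior N(0, I).
with torch.no_grad():
    synthetic = model.decoder(torch.randn(5, 8))
print(synthetic.shape)                     # torch.Size([5, 10])
```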
Anderberg, Jesper, and Nazdar Fathullah. "A machine learning approach to enhance the privacy of customers." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20629.
During a phone call between a customer and a company representative, a varying amount of information is exchanged: everything from the customer's name, identification number, and home address to conversations about the weather and other generic subjects. Companies' knowledge about their customers is a vital part of their business. Therefore, analyzing the conversations in the form of transcripts may be necessary to develop and improve the overall customer service within a company. However, with new legislation like the GDPR, special considerations must be taken into account when storing personal information. In this paper we examine, using two machine learning algorithms, the possibility of classifying data from a transcribed phone call so as to leave out sensitive information. The machine learning model is built by following an iterative system development method. Using the Naive Bayes and Support Vector Machine algorithms, classification of sensitive data, such as a person's name and location, is conducted. Evaluation methods like 10-fold cross-validation, learning curves, classification reports, and ROC curves are used to evaluate the system. The results show that the algorithms achieved a higher accuracy when the dataset contained more data samples, compared to a dataset with fewer samples. Furthermore, pre-processing the data increased the accuracy of the machine learning models.
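A toy sketch of this classification setup using scikit-learn's Naive Bayes (one of the two algorithms named); the training phrases and labels are invented, since the thesis's transcripts are not public.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Flag transcript snippets that contain sensitive personal data so they can
# be left out before storage; a real system would train on labelled
# transcripts and tune features, as the thesis describes.

snippets = [
    "my name is Anna Larsson",        "I live on Storgatan 12 in Malmo",
    "my personal number is 840212",   "what terrible weather today",
    "I would like to change my plan", "can you check my last invoice",
]
labels = ["sensitive", "sensitive", "sensitive",
          "generic", "generic", "generic"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(snippets, labels)

for text in ["the weather is nice", "my name is Karl Berg"]:
    print(text, "->", model.predict([text])[0])
```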
Lundmark, Magnus, and Carl-Johan Dahlman. "Differential privacy and machine learning: Calculating sensitivity with generated data sets." Thesis, KTH, Data- och elektroteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209481.
Never before has privacy been more important to uphold than in today's information society, where companies and organisations collect large amounts of data about their users. Most of this information is regarded as valuable and can be used to produce statistics that, in turn, can provide insight in areas such as medicine, economics, or behavioural patterns among individuals. To ensure that an individual's privacy is upheld, a technique called differential privacy has been developed. It makes it possible to produce useful statistics while the individual's privacy is maintained. Differential privacy has one drawback, however: the magnitude of the randomised noise used to hide the individual in a data query. This study investigated whether this noise could be improved by using machine learning to generate a data set on which the noise could be based. The idea was that the generated data set could provide a local representation of the underlying data set that would be safe to use when calculating the magnitude of the randomised noise. The research shows that this approach is currently not supported by the results: the magnitude of the calculated noise was not large enough and consequently resulted in an unacceptable amount of leaked information. The research does show, however, that bounding the noise from below by a minimum level calculated from the local data set may suffice to fulfil all privacy requirements. Further research is needed to establish whether this provides the necessary level of privacy. Moreover, the accuracy of the machine learning algorithm and its impact on the usefulness of the noise was not examined, which could be a direction for further studies.
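For context, the quantity this thesis tries to estimate from generated data is the query's sensitivity, which calibrates the Laplace noise. A minimal sketch with a clipped-mean query follows; the bounds and parameters are illustrative assumptions, and estimating sensitivity from a stand-in dataset (as the thesis attempts) risks under-estimating it.

```python
import numpy as np

# Sensitivity of a mean over values clipped to [lo, hi]: changing one of
# the n records moves the mean by at most (hi - lo) / n, which sets the
# Laplace scale for epsilon-differential privacy.

def mean_sensitivity(lo: float, hi: float, n: int) -> float:
    """Worst-case change of a clipped mean when one record changes."""
    return (hi - lo) / n

def dp_mean(values: np.ndarray, lo: float, hi: float, epsilon: float,
            rng: np.random.Generator) -> float:
    clipped = np.clip(values, lo, hi)
    sens = mean_sensitivity(lo, hi, len(values))
    return clipped.mean() + rng.laplace(0.0, sens / epsilon)

rng = np.random.default_rng(7)
incomes = rng.lognormal(mean=10, sigma=0.5, size=1_000)
print(dp_mean(incomes, lo=0.0, hi=200_000.0, epsilon=1.0, rng=rng))
```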
Vu, Xuan-Son. "Privacy-awareness in the era of Big Data and machine learning." Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-162182.
Tania, Zannatun Nayem. "Machine Learning with Reconfigurable Privacy on Resource-Limited Edge Computing Devices." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292105.
Distributed computing enables efficient data storage, processing and retrieval, but it brings security and privacy issues. Sensors are the cornerstone of IoT-based pipelines, since they continuously collect data until it can be analysed on central cloud resources. However, these sensor nodes are often constrained by limited resources. Ideally, it is desirable to make all collected data features private, but due to resource limitations this may not always be possible. Making all features private could cause over-utilisation of resources, which in turn would affect the performance of the whole system. In this thesis we design and implement a system that can find the optimal set of data features to make private, given the constraints of the device resources and the desired performance or accuracy of the system. Using the generalisation techniques of data anonymisation, we create user-defined injectable privacy-encoding functions to make each feature in the dataset private. Regardless of resource availability, certain data features are defined by the user as essential features to make private; all other data features that could pose a privacy threat are called non-essential features. We propose Dynamic Iterative Greedy Search (DIGS), a greedy search algorithm that takes the resource consumption of each non-essential feature as input and returns the most optimal set of non-essential features that can be made private given the available resources. The most optimal set contains the features that consume the least resources. We evaluate our system on a Fitbit dataset containing 17 data features, 4 of which are essential private features for a given classification application. Our results show that we can offer 9 additional private features besides the 4 essential features of the Fitbit dataset, which contains 1663 records. Furthermore, we can save 26.21% memory compared to making all features private. We also test our method on a larger dataset generated with a Generative Adversarial Network (GAN). However, the chosen edge device, a Raspberry Pi, cannot accommodate the large dataset due to insufficient resources. Our evaluations using 1/8 of the GAN dataset result in 3 extra private features, with up to 62.74% memory savings compared to making all data features private. Enforcing privacy not only requires additional resources but also has consequences for the performance of the designed applications. However, we find that privacy encoding has a positive impact on the accuracy of the classification model for our chosen classification application.
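A simplified, hypothetical sketch of the greedy idea behind DIGS: given a resource budget and per-feature encoding costs, privatise the cheapest non-essential features first so as to protect as many as possible. The feature names and costs are invented, and the real algorithm additionally iterates dynamically against measured resource usage.

```python
# Cheapest-first greedy selection of non-essential features to privatise.
# For maximising the *number* of protected features under a single budget,
# picking cheapest-first is optimal.

def greedy_private_features(costs: dict[str, float], budget: float) -> list[str]:
    chosen, remaining = [], budget
    for feature, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        if cost <= remaining:
            chosen.append(feature)
            remaining -= cost
    return chosen

non_essential_costs = {          # memory cost (MB) of encoding each feature
    "steps": 1.2, "heart_rate": 3.5, "sleep_minutes": 0.8,
    "calories": 2.1, "floors": 0.5,
}
print(greedy_private_features(non_essential_costs, budget=4.0))
# -> ['floors', 'sleep_minutes', 'steps']
```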
Shaham, Sina. "Location Privacy in the Era of Big Data and Machine Learning." Thesis, The University of Sydney, 2019. https://hdl.handle.net/2123/21689.
Romanelli, Marco. "Machine Learning methods for privacy protection: leakage measurement and mechanism design." Doctoral thesis, Università di Siena, 2020. http://hdl.handle.net/11365/1118314.
Carlsson, Robert. "Privacy-Preserved Federated Learning: A survey of applicable machine learning algorithms in a federated environment." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-424383.
Zheng, Yao. "Privacy Preservation for Cloud-Based Data Sharing and Data Analytics." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/73796.
Demetrio, Luca. "Formalizing evasion attacks against machine learning security detectors." Doctoral thesis, Università degli studi di Genova, 2021. http://hdl.handle.net/11567/1035018.
Mivule, Kato. "An investigation of data privacy and utility using machine learning as a gauge." Thesis, Bowie State University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3619387.
The purpose of this investigation is to study and pursue a user-defined approach to preserving data privacy while maintaining an acceptable level of data utility, using machine learning classification techniques as a gauge in the generation of synthetic data sets. This dissertation deals with data privacy, data utility, machine learning classification, and the generation of synthetic data sets; data privacy and utility preservation using machine learning classification as a gauge is thus the central focus of this study. Many organizations that transact in large amounts of data have to comply with state, federal, and international laws to guarantee that the privacy of individuals and other sensitive data is not compromised. Yet at some point during the data privacy process, data loses its utility, a measure of how useful a privatized dataset is to the user of that dataset. Data privacy researchers have documented that attaining an optimal balance between data privacy and utility is an NP-hard challenge, thus an intractable problem. We therefore propose the classification error gauge (x-CEG) approach, a data utility quantification concept that employs machine learning classification techniques to gauge data utility based on the classification error. In the initial phase of this proposed approach, a data privacy algorithm such as differential privacy, Gaussian noise addition, generalization, and/or k-anonymity is applied to a dataset for confidentiality, generating a privatized synthetic data set. The privatized synthetic data set is then passed through a machine learning classifier, after which the classification error is measured. If the classification error is lower than or equal to a set threshold, then better utility might be achieved; otherwise, the data privacy parameters are adjusted and the refined synthetic data set is sent to the machine learning classifier again, and the process repeats until the error threshold is reached. Additionally, this study presents the Comparative x-CEG concept, in which a privatized synthetic data set is passed through a series of classifiers, each of which returns a classification error, and the classifier with the lowest classification error is chosen after parameter adjustments, an indication of better data utility. Preliminary results from this investigation show that fine-tuning parameters in data privacy procedures, for example in the case of differential privacy, or increasing the number of weak learners in an ensemble classifier, might lead to lower classification error and thus better utility. Furthermore, this study explores the application of this approach by employing signal processing techniques in the generation of privatized synthetic data sets and improving data utility. This dissertation presents theoretical and empirical work examining various data privacy and utility methodologies using machine learning classification as a gauge. Similarly, this study presents a resourceful approach to the generation of privatized synthetic data sets, and an innovative conceptual framework for the data privacy engineering process.
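The x-CEG feedback loop described above can be sketched as follows; the noise mechanism, classifier, dataset, and threshold are illustrative stand-ins, not the dissertation's exact choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Privatise the data (here with simple Gaussian noise addition, one of the
# mechanisms the dissertation lists), train a classifier, and relax the
# privacy parameter until the classification error meets a target threshold.

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def privatize(X: np.ndarray, noise_scale: float, rng) -> np.ndarray:
    return X + rng.normal(scale=noise_scale * X.std(axis=0), size=X.shape)

rng = np.random.default_rng(0)
error_threshold, noise_scale = 0.10, 2.0
while noise_scale > 0.05:
    clf = RandomForestClassifier(random_state=0)
    clf.fit(privatize(X_tr, noise_scale, rng), y_tr)
    error = 1.0 - clf.score(X_te, y_te)    # error gauged on held-out data
    print(f"noise={noise_scale:.2f}  error={error:.3f}")
    if error <= error_threshold:
        break                              # acceptable utility reached
    noise_scale /= 2                       # relax privacy, try again
```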
Sharma, Sagar. "Towards Data and Model Confidentiality in Outsourced Machine Learning." Wright State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1567529092809275.
Petrucci, Edoardo. "A Personalized Privacy Management Framework for Android Applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016.
Haupt, Johannes Sebastian. "Machine Learning for Marketing Decision Support." Doctoral thesis, Humboldt-Universität zu Berlin, 2020. http://dx.doi.org/10.18452/21554.
The digitization of the economy has fundamentally changed the way in which companies interact with customers and has made customer targeting a key intersection of marketing and information systems. Building models of customer behavior at scale requires the development of tools at the intersection of data management and statistical knowledge discovery. This dissertation widens the scope of research on predictive modeling by focusing on the intersections of model building with data collection and decision support. Its goals are (1) to develop and validate new machine learning methods explicitly designed to optimize customer targeting decisions in direct marketing and customer retention management, and (2) to study the implications of data collection for customer targeting from the perspective of the company and its customers. First, the thesis proposes methods that utilize the richness of e-commerce data, reduce the cost of data collection through efficient experiment design, and address the targeting decision setting during model building. The underlying state-of-the-art machine learning models scale to high-dimensional customer data and can be conveniently applied by practitioners. These models further address the problem of causal inference that arises when the causal attribution of customer behavior to a marketing incentive is difficult. Marketers can directly apply the model estimates to identify profitable targeting policies under complex cost structures. Second, the thesis quantifies the savings potential of efficient experiment design and the monetary cost of an internal principle of data privacy. An analysis of data collection practices in direct marketing emails reveals the ubiquity of tracking mechanisms without user consent in e-commerce communication. These results form the basis for a machine-learning-based system for the detection and deletion of tracking elements from emails.
Darwish, Roba N. "A Detailed Study of User Privacy Behavior in Social Media." Kent State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=kent1510704797892479.
Dinh, The Canh. "Distributed Algorithms for Fast and Personalized Federated Learning." Thesis, The University of Sydney, 2023. https://hdl.handle.net/2123/30019.
Bahrak, Behnam. "Ex Ante Approaches for Security, Privacy, and Enforcement in Spectrum Sharing." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/24720.
Minelli, Michele. "Fully homomorphic encryption for machine learning." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE056/document.
Fully homomorphic encryption enables computation on encrypted data without leaking any information about the underlying data. In short, a party can encrypt some input data, while another party, which does not have access to the decryption key, can blindly perform some computation on this encrypted input. The final result is also encrypted, and it can be recovered only by the party that possesses the secret key. In this thesis, we present new techniques and designs for FHE that are motivated by applications to machine learning, with particular attention to the problem of homomorphic inference, i.e., the evaluation of already trained cognitive models on encrypted data. First, we propose a novel FHE scheme that is tailored to evaluating neural networks on encrypted inputs. Our scheme achieves complexity that is essentially independent of the number of layers in the network, whereas the efficiency of previously proposed schemes strongly depends on the topology of the network. Second, we present a new technique for achieving circuit privacy for FHE. This allows us to hide the computation that is performed on the encrypted data, as is necessary to protect proprietary machine learning algorithms. Our mechanism incurs very small computational overhead while keeping the same security parameters. Together, these results strengthen the foundations of efficient FHE for machine learning and pave the way towards practical privacy-preserving deep learning. Finally, we present and implement a protocol based on homomorphic encryption for the problem of private information retrieval, i.e., the scenario where a party wants to query a database held by another party without revealing the query itself.
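For flavour, here is a hedged example of blind computation on encrypted data using the python-paillier package (`phe`, an assumed external dependency). Paillier is only additively homomorphic, far weaker than the FHE schemes this thesis designs, but the encrypt/compute/decrypt workflow is the same; the model weights and inputs are invented.

```python
from phe import paillier  # assumes the python-paillier package is installed

# A server scores a linear model on an encrypted feature vector without ever
# seeing the features: ciphertext * plaintext scalar and ciphertext addition
# are the only operations Paillier supports, and they suffice here.

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

features = [5.1, 3.5, 1.4]                 # client's private input
weights, bias = [0.4, -0.2, 1.3], 0.7      # server's model (in the clear)

encrypted = [public_key.encrypt(x) for x in features]   # client side

# Server side: dot product with plaintext weights, plus the bias.
enc_score = sum(w * x for w, x in zip(weights, encrypted)) + bias

print(private_key.decrypt(enc_score))      # client decrypts: ~3.86
```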
Sperandio, Ricardo Carlini. "Time series retrieval using DTW-preserving shapelets." Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S061.
Establishing the similarity of time series is at the core of many data mining tasks, such as time series classification, time series clustering, and time series retrieval, among others. Metrics for establishing similarities between time series are specific in the sense that they must be able to take into account the differences in the values making up the series as well as distortions along the timeline. The most popular similarity metric is the Dynamic Time Warping (DTW) measure. However, it is costly to compute, and using it against numerous and/or very long time series is difficult in practice. There have been numerous attempts to accelerate the DTW, yet scaling it remains a major difficulty. An elegant research direction proposes to change the representation of time series such that similarities become much cheaper to establish. This typically relies on an embedding process where vectorial representations of time series are constructed, allowing their similarity to be estimated using, e.g., L2 distances, which are much faster to compute than the DTW. Naturally, the quality of this representation largely depends on the embedding process, and the family of contributions relying on the concept of shapelets proves to work particularly well. Shapelets, and the transform operation materializing the embedding process, were originally proposed for time series classification. Shapelets are independent subsequences extracted or learned from time series to form discriminatory features; they are used to transform time series into high-dimensional (Euclidean) vectors. Recently, it was proposed to embed time series into a Euclidean space such that the distance in this embedded space well approximates the true DTW; that contribution targets time series clustering. The work presented in this Ph.D. manuscript builds on the idea of transforming time series using shapelets, and shows how shapelets that preserve DTW measures can be used in the specific context of large-scale time series retrieval. This manuscript makes three major contributions: (1) it explains how DTW-preserving shapelets can be used in the specific context of time series retrieval; (2) it proposes shapelet selection strategies in order to cope with scale, that is, to deal with extremely large collections of time series; (3) it details how to handle both univariate and multivariate time series, hence covering the whole spectrum of time series retrieval problems. The core of the contribution presented in this manuscript makes it easy to trade off the complexity of the transformation against the accuracy of the retrieval. Experiments using the UCR and UEA datasets demonstrate vast performance improvements compared to state-of-the-art techniques.
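Since everything in this entry revolves around the DTW, a minimal dynamic-programming implementation is sketched below (standard textbook form, not the thesis's code).

```python
import numpy as np

# Dynamic Time Warping distance: an O(len(a) * len(b)) dynamic program over
# all monotone alignments of two series; this quadratic cost is exactly why
# the thesis seeks cheap shapelet-based approximations.

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed alignment moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

t = np.linspace(0, 2 * np.pi, 50)
a, b = np.sin(t), np.sin(t + 0.5)          # same shape, shifted in time
print(dtw(a, b), np.abs(a - b).sum())      # DTW forgives the shift; L1 does not
```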
Babina, Chiara. "Privacy nel contesto location-based services." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16199/.
Spolaor, Riccardo. "Security and Privacy Threats on Mobile Devices through Side-Channels Analysis." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3426796.
In recent years, mobile devices (such as smartphones and tablets) have become essential tools in everyday life for billions of people around the world. Users continuously use these devices for daily communication activities and social network interactions; hence, such devices contain an enormous amount of private and sensitive information. For this reason, mobile devices have become popular targets of attacks. In most attacks on mobile devices, the adversary aims to take local or remote control of the device in order to access the user's sensitive information. However, such violations are not easy to carry out, since they must exploit a vulnerability of the system or a careless user (e.g., one who installs a malware app from an untrusted source). A different approach that does not have these shortcomings is side-channel analysis. Side channels are physical phenomena that can be measured from inside or outside a device. They are mainly due to the user's interaction with a mobile device, but also to the context in which the device is used; hence, they can reveal private information such as the user's identity and habits, the environment, and the operating system itself. This approach consists of inferring private information that leaks from a mobile device through a side channel. Moreover, side-channel information is extremely valuable for reinforcing security mechanisms such as user authentication and intrusion and information-theft detection. This thesis studies new challenges concerning security and privacy in side-channel analysis of mobile devices. It consists of three parts, each focused on a different side channel: (i) the use of network traffic analysis to infer a user's private information; (ii) the energy consumption of mobile devices during battery charging as a means of identifying a user and as a covert channel for exfiltrating data; and (iii) the possible security application of data collected from sensors embedded in mobile devices, to authenticate the user and to thwart sandbox detection by malware. In the first part of this thesis, we consider an adversary able to intercept the device's network traffic on the network side (e.g., by controlling a WiFi access point). The fact that network traffic is often encrypted makes the attack even more challenging. Our work demonstrates that it is possible to exploit machine learning techniques to identify user activities and the apps installed on mobile devices by analysing the encrypted network traffic they produce. This information is becoming a very attractive data-gathering technique for adversaries, network administrators, investigators, and marketing agencies. In the second part of this thesis, we examine the analysis of electrical energy consumption. In this case, an adversary is able to measure, with a power monitor, the amount of energy supplied to a mobile device. Indeed, we observed that the usage of mobile device resources (e.g., CPU, network capabilities) directly affects the amount of energy drawn, whether from the USB port for smartphones or from the wall socket for laptops.
Exploiting energy traces, we are able to recognise a specific laptop user within a group and to detect potential intruders (i.e., users not belonging to the group). Furthermore, we show the feasibility of a covert channel for exfiltrating user data that relies on timed spikes of energy consumption. In the last part of this thesis, we present a side channel that can be measured within the mobile device itself. This channel consists of data collected from the sensors a mobile device is equipped with (e.g., accelerometer, gyroscope). First, we present DELTA, a novel tool that collects data from such sensors and logs user and operating system events. We then present MIRAGE, a framework that relies on sensor data to improve sandboxes against malware analysis evasion.
Rekanar, Kaavya. "Text Classification of Legitimate and Rogue online Privacy Policies : Manual Analysis and a Machine Learning Experimental Approach." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13363.
Alisic, Rijad. "Privacy of Sudden Events in Cyber-Physical Systems." Licentiate thesis, KTH, Reglerteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299845.
In recent years, cyberattacks against critical infrastructures have been a growing problem. These infrastructures are particularly vulnerable to cyberattacks, since they fulfil functions necessary for a society to work, which makes them desirable targets for an attacker. If a critical infrastructure is prevented from fulfilling its function, the consequences for a nation's economy, security, or public health can be devastating. The number of attacks has increased because critical infrastructures have become ever more complex, as they are now part of large networks of various cyber components. It is precisely through these cyber components that an attacker can gain access to the system and stage cyberattacks. In this thesis we develop methods that can be used as a first line of defence against cyberattacks on cyber-physical systems (CPS). We begin by investigating how information leaks about the system dynamics can help an attacker create hard-to-detect attacks. Such attacks are often devastating for a CPS, since the attacker can push the system to its breaking point without being detected by an operator whose task is to ensure its continued operation. We prove that an attacker can use relatively small amounts of data to generate these hard-to-detect attacks. More specifically, we derive an expression for the minimum amount of information required for an attack to be hard to detect, even in the case where an operator adopts methods to check whether the system is under attack. In the thesis we construct defence methods against information leaks using the Hammersley-Chapman-Robbins inequality. With this inequality we can study how the information leak can be suppressed by injecting noise into the data. Specifically, we investigate how much information about structured inputs to a dynamical system, which we call events, an attacker can extract from its outputs, and how this amount of information depends on the system dynamics. For example, we show that a system with fast dynamics leaks more information than a slower system, although for slower systems the information is smeared out over a longer time interval, so an attacker who starts eavesdropping on a system long after the event occurred can still estimate it. Moreover, we show how the sensor placement in a CPS affects the information leak. These results can be used to help an operator analyse the confidentiality of a CPS. We also use the Hammersley-Chapman-Robbins inequality to develop defences against information leaks that can be applied online. We propose modifications to the structured input signal so that the system's existing noise is better exploited to hide the event. If the operator is pursuing other control objectives, this method can be used to trade off confidentiality against those objectives. Finally, we show how an attacker's estimate of the event improves as a function of the amount of data obtained. The operator can use this information to determine when the attacker might be ready to strike, and to alter the system before that happens, rendering the attacker's information useless.
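For reference, here is the Hammersley-Chapman-Robbins inequality the abstract relies on, in its standard textbook form for an unbiased estimator (the thesis's version for dynamical systems is more general):

```latex
% Hammersley-Chapman-Robbins bound: for any unbiased estimator
% \hat{\theta} of \theta from an observation X ~ P_\theta,
\operatorname{Var}_{\theta}\bigl(\hat{\theta}(X)\bigr)
  \;\ge\; \sup_{h \neq 0}
  \frac{h^{2}}{\mathbb{E}_{\theta}\!\left[\left(
      \frac{p_{\theta+h}(X)}{p_{\theta}(X)} - 1\right)^{2}\right]}
  \;=\; \sup_{h \neq 0}
  \frac{h^{2}}{\chi^{2}\!\left(P_{\theta+h} \,\|\, P_{\theta}\right)}.
```

The denominator is a chi-squared divergence between the output distributions with and without the perturbation, which is why injecting noise that makes these distributions harder to distinguish directly suppresses how well an eavesdropper can estimate the event.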
Baier, Lucas. "Concept Drift Handling in Information Systems: Preserving the Validity of Deployed Machine Learning Models." Doctoral thesis, Karlsruhe: KIT-Bibliothek, 2021. http://d-nb.info/1241189250/34.
Wang, Yu-Xiang. "New Paradigms and Optimality Guarantees in Statistical Learning and Estimation." Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/1113.
Hathurusinghe, Rajitha. "Building a Personally Identifiable Information Recognizer in a Privacy Preserved Manner Using Automated Annotation and Federated Learning." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41011.
Tout, Hicham Refaat. "Measuring the Impact of email Headers on the Predictive Accuracy of Machine Learning Techniques." NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/325.