Dissertations / Theses on the topic 'Sensitive data'

Consult the top 50 dissertations / theses for your research on the topic 'Sensitive data.'


1

Ema, Ismat. "Sensitive Data Migration to the Cloud." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-64736.

2

Folkesson, Carl. "Anonymization of directory-structured sensitive data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160952.

Abstract:
Data anonymization is a relevant and important field within data privacy, which tries to find a good balance between utility and privacy in data. The field is especially relevant since the GDPR came into force, because the GDPR does not regulate anonymous data. This thesis focuses on anonymization of directory-structured data, which means data structured into a tree of directories. In the thesis, four of the most common models for anonymization of tabular data, k-anonymity, ℓ-diversity, t-closeness and differential privacy, are adapted for anonymization of directory-structured data. This adaptation is done by creating three different approaches for anonymizing directory-structured data: SingleTable, DirectoryWise and RecursiveDirectoryWise. These models and approaches are compared and evaluated using five metrics and three attack scenarios. The results show that there is always a trade-off between utility and privacy when anonymizing data. In particular, it was concluded that the differential privacy model when using the RecursiveDirectoryWise approach gives the highest privacy, but also the highest information loss. Conversely, the k-anonymity model when using the SingleTable approach or the t-closeness model when using the DirectoryWise approach gives the lowest information loss, but also the lowest privacy. The differential privacy model and the RecursiveDirectoryWise approach were also shown to give the best protection against the chosen attacks. Finally, it was concluded that the differential privacy model when using the RecursiveDirectoryWise approach was the most suitable combination to use when trying to follow the GDPR when anonymizing directory-structured data.
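Of the four tabular models listed in this abstract, k-anonymity is the simplest to state. The sketch below is a minimal, hypothetical illustration of checking it with pandas over illustrative quasi-identifier columns; it is not the thesis's directory-structured adaptation.

```python
# Minimal sketch of a k-anonymity check over a tabular dataset,
# assuming pandas and hypothetical quasi-identifier columns.
import pandas as pd

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list[str], k: int) -> bool:
    """Return True if every combination of quasi-identifier values
    occurs in at least k rows (the k-anonymity property)."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

# Hypothetical usage: columns and threshold are illustrative only.
records = pd.DataFrame({
    "zip": ["12345", "12345", "12345", "67890"],
    "age_band": ["30-39", "30-39", "30-39", "40-49"],
    "diagnosis": ["A", "B", "A", "C"],
})
print(is_k_anonymous(records, ["zip", "age_band"], k=3))  # False: the 67890 group has only 1 row
```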
3

Subbiah, Arun. "Efficient Proactive Security for Sensitive Data Storage." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/19719.

Abstract:
Fault tolerant and secure distributed data storage systems typically require that only up to a threshold of storage nodes can ever be compromised or fail. In proactively-secure systems, this requirement is modified to hold only in a time interval (also called epoch), resulting in increased security. An attacker or adversary could compromise distinct sets of nodes in any two time intervals. This attack model is also called the mobile adversary model. Proactively-secure systems require all nodes to "refresh" themselves periodically to a clean state to maintain the availability, integrity, and confidentiality properties of the data storage service. This dissertation investigates the design of a proactively-secure distributed data storage system. Data can be stored at storage servers using encoding schemes called secret sharing, or encryption-with-replication. The primary challenge is that the protocols that the servers run periodically to maintain integrity and confidentiality must scale with large amounts of stored data. Determining how much data can be proactively secured in practical settings is an important objective of this dissertation. The protocol for maintaining the confidentiality of stored data is developed in the context of data storage using secret sharing. We propose a new technique called the GridSharing framework that uses a combination of XOR secret sharing and replication for storing data efficiently. We experimentally show that the algorithm can secure several hundred GBs of data. We give distributed protocols run periodically by the servers for maintaining the integrity of replicated data under the mobile adversary model. This protocol is integrated into a document repository to make it proactively-secure. The proactively-secure document repository is implemented and evaluated on the Emulab cluster (http://www.emulab.net). The experimental evaluation shows that several hundred GBs of data can be proactively secured. This dissertation also includes work on fault and intrusion detection - a necessary component in any secure system. We give a novel Byzantine-fault detection algorithm for quorum systems, and experimentally evaluate its performance using simulations and by deploying it in the AgileFS distributed file system.
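For context on the GridSharing idea above, the following is a minimal sketch of plain n-of-n XOR secret sharing, the building block that the abstract combines with replication; the share layout and sample data are illustrative rather than the dissertation's actual protocol.

```python
# Minimal sketch of n-of-n XOR secret sharing: all n shares are needed
# to reconstruct the secret, and any n-1 of them reveal nothing.
import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_shares(secret: bytes, n: int) -> list[bytes]:
    """Split `secret` into n shares of equal length."""
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    last = reduce(xor_bytes, shares, secret)   # secret XOR r1 XOR ... XOR r(n-1)
    return shares + [last]

def reconstruct(shares: list[bytes]) -> bytes:
    return reduce(xor_bytes, shares)

data = b"sensitive record"
parts = make_shares(data, 4)
assert reconstruct(parts) == data
```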
4

Bakri, Mustafa al. "Uncertainty-Sensitive Reasoning over the Web of Data." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM073.

Abstract:
Dans cette thèse, nous étudions plusieurs approches destinées à aider les utilisateurs à trouver des informations utiles et fiables dans le Web de données, en utilisant les technologies du Web sémantique. Nous abordons pour cela deux thèmes de recherche : le liage de données dans le Linked Data et la confiance dans les réseaux P2P sémantiques. Nous modélisons le problème de liage dans le Web de données comme un problème de raisonnement sur des données incomplètes, qu'il s'agit d'enrichir en interrogeant de façon précise et pertinente le cloud du Linked Data. Nous avons conçu et implémenté un nouvel algorithme qui, à partir d'une requête de liage (du type « sameAs ») et d'une base de règles modélisant de manière uniforme diverses connaissances du domaine (contraintes du schéma, axiomes d'inclusion ou d'exclusion d'une ontologie, règles expertes, mappings), construit itérativement des requêtes SPARQL pour importer des sources externes pertinentes du Linked Data les données utiles pour répondre à la requête de liage. Les expérimentations que nous avons menées sur des données réelles ont démontré la faisabilité de cette approche et son utilité dans la pratique pour le liage de données et la résolution d'homonymie. En outre, nous proposons une adaptation de cette approche pour prendre en compte des données et des connaissances éventuellement incertaines, avec en résultat l'inférence de liens 'sameAs' et 'differentFrom' associés à des poids de probabilité. Dans cette adaptation nous modélisons l'incertitude comme des valeurs de probabilité. Nos expérimentations ont montré que notre approche passe à l'échelle pour des bases de connaissances constituées de plusieurs millions de faits RDF et produit des poids probabilistes fiables. Concernant la confiance, nous introduisons un mécanisme de confiance permettant de guider le processus de réponse aux requêtes dans des réseaux P2P sémantiques. Les différents pairs dans les réseaux P2P sémantiques organisent leur information en utilisant des ontologies distinctes et dépendent d'alignements entre ontologies pour traduire leurs requêtes. La notion de confiance dans un tel contexte est subjective ; elle estime la probabilité qu'un pair apportera des réponses satisfaisantes pour les requêtes spécifiques dans les interactions futures. Le mécanisme proposé de calcul de valeurs de confiance combine les informations fournies par les alignements avec celles provenant des interactions passées entre pairs. Les valeurs de confiance calculées sont affinées progressivement à chaque cycle de requête/réponse en utilisant l'inférence bayésienne. Pour l'évaluation de notre mécanisme, nous avons construit un système P2P de partage de signets sémantiques (TrustMe) dans lequel il est possible de faire varier différents paramètres quantitatifs et qualitatifs. Les résultats expérimentaux montrent la convergence des valeurs de confiance ; ils mettent également en évidence le gain en termes de qualité des réponses des pairs - mesurées selon la précision et le rappel - lorsque le processus de réponse aux requêtes est guidé par notre mécanisme de confiance.
In this thesis we investigate several approaches that help users to find useful and trustful information in the Web of Data using Semantic Web technologies. For this purpose, we tackle two research issues: Data Linkage in Linked Data and Trust in Semantic P2P Networks. We model the problem of data linkage in Linked Data as a reasoning problem on possibly decentralized data. We describe a novel Import-by-Query algorithm that alternates steps of subquery rewriting and of tailored querying of the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Experiments conducted on real-world datasets have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation. Furthermore, we propose an adaptation of this approach to take into account possibly uncertain data and knowledge, resulting in the inference of same-as and different-from links associated with probabilistic weights. In this adaptation we model uncertainty as probability values. Our experiments have shown that our adapted approach scales to large data sets and produces meaningful probabilistic weights. Concerning trust, we introduce a trust mechanism for guiding the query-answering process in Semantic P2P Networks. Peers in Semantic P2P Networks organize their information using separate ontologies and rely on alignments between their ontologies for translating queries. Trust in such a setting is subjective and estimates the probability that a peer will provide satisfactory answers for specific queries in future interactions. In order to compute trust, the mechanism exploits the information provided by alignments, along with the one that comes from peers' past experiences. The calculated trust values are refined over time using Bayesian inference as more queries are sent and answers received. For the evaluation of our mechanism, we have built a semantic P2P bookmarking system (TrustMe) in which we can vary different quantitative and qualitative parameters. The experimental results show the convergence of trust, and highlight the gain in the quality of peers' answers —measured with precision and recall— when the process of query answering is guided by our trust mechanism.
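The trust mechanism described above refines, after every query/answer cycle, an estimate of the probability that a peer answers satisfactorily. The snippet below is a minimal illustration of such Bayesian refinement using a Beta-Bernoulli update; the prior and the exact update rule are assumptions made for illustration, not the model defined in the thesis.

```python
# Minimal sketch of Bayesian trust refinement: trust is the posterior mean
# probability of a satisfactory answer, updated after each interaction.
class PeerTrust:
    def __init__(self, prior_success: float = 1.0, prior_failure: float = 1.0):
        # Beta(alpha, beta) prior; the prior could encode alignment quality.
        self.alpha = prior_success
        self.beta = prior_failure

    def update(self, satisfactory: bool) -> None:
        if satisfactory:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def value(self) -> float:
        # Posterior mean: estimated probability of a satisfactory answer.
        return self.alpha / (self.alpha + self.beta)

peer = PeerTrust()
for outcome in [True, True, False, True]:
    peer.update(outcome)
print(round(peer.value, 2))  # 0.67 after 3 satisfactory answers out of 4
```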
5

Ljus, Simon. "Purging Sensitive Data in Logs Using Machine Learning." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-411610.

Abstract:
This thesis investigates how to remove personal data from logs using machine learning when rule-based scripts are not enough and manual scanning is too extensive. Three types of machine learning models were created and compared: a word model using logistic regression, another word model using an LSTM, and a sentence model also using an LSTM. Data logs were cleaned and annotated using rule-based scripts, datasets from various countries and dictionaries from various languages. The created dataset for the sentence-based model was imbalanced, and a light version of data augmentation was applied. A hyperparameter optimization library was used to find the best hyperparameter combination. The models learned the training and the validation set well but performed worse on the test set, which consisted of log data from a different server logging other types of data.
Detta examensarbete undersöker om det är möjligt att skapa ett program som automatiskt identifierar och tar bort persondata från dataloggar med hjälp av maskinlärning. Att förstå innebörden av vissa ord kräver också kontext: Banan kan syfta på en banan som man kan äta eller en bana som man kan springa på. Kan en maskinlärningsmodell ta nytta av föregående och efterkommande ord i en sekvens av ord för att få en bättre noggrannhet på om ordet är känsligt eller ej. Typen av data som förekommer i loggarna kan vara bland annat namn, personnummer, användarnamn och epostadress. För att modellen ska kunna lära sig att känna igen datan krävs det att det finns data som är färdigannoterad med facit i hand. Telefonnummer, personnummer och epostadress kan bara se ut på ett visst sätt och behöver nödvändigtvis ingen maskininlärning för att kunna pekas ut. Kan man skapa en generell modell som fungerar på flera typer av dataloggar utan att använda regelbaserade algoritmer. Resultaten visar att den annoterade datan som användes för träning kan ha skiljt allt för mycket från de loggar som har testats på (osedd data), vilket betyder att modellen inte är bra på att generalisera.
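The abstracts note that identifiers with a fixed structure (phone numbers, personal identity numbers, e-mail addresses) can be handled by rule-based scripts without machine learning. The sketch below illustrates that rule-based baseline; the regular expressions and placeholder labels are illustrative and not taken from the thesis.

```python
# Minimal sketch of a rule-based log scrubber: structured identifiers are
# masked with placeholders, leaving the harder cases to a learned model.
import re

RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d \-]{7,}\d"),
}

def purge(line: str) -> str:
    for label, pattern in RULES.items():
        line = pattern.sub(f"<{label}>", line)
    return line

print(purge("user anna@example.com called from +46 70 123 45 67"))
# -> "user <EMAIL> called from <PHONE>"
```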
6

Oshima, Sonoko. "Neuromelanin‐Sensitive Magnetic Resonance Imaging Using DANTE Pulse." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263531.

7

El-Khoury, Hiba. "Introduction of New Products in the Supply Chain : Optimization and Management of Risks." Thesis, Jouy-en Josas, HEC, 2012. http://www.theses.fr/2012EHEC0001/document.

Abstract:
Les consommateurs d'aujourd'hui ont des goûts très variés et cherchent les produits les plus récents. Avec l'accélération technologique, les cycles de vie des produits se sont raccourcis et donc, de nouveaux produits doivent être introduits au marché plus souvent et progressivement, les anciens doivent y être retirés. L'introduction d'un nouveau produit est une source de croissance et d'avantage concurrentiel. Les directeurs du Marketing et Supply Chain se sont confrontés à la question de savoir comment gérer avec succès le remplacement de leurs produits et d'optimiser les coûts de la chaîne d'approvisionnement associée. Dans une situation idéale, la procédure de rollover est efficace et claire : l'ancien produit est vendu jusqu'à une date prévue où un nouveau produit est introduit. Dans la vie réelle, la situation est moins favorable. Le but de notre travail est d'analyser et de caractériser la politique optimale du rollover avec une date de disponibilité stochastique pour l'introduction du nouveau produit sur le marché. Pour résoudre le problème d'optimisation, nous utilisons dans notre premier article deux mesures de minimisation : le coût moyen et le coût de la valeur conditionnelle à risque. On obtient des solutions en forme explicite pour les politiques optimales. En outre, nous caractérisons l'influence des paramètres de coûts sur la structure de la politique optimale. Dans cet esprit, nous analysons aussi le comportement de la politique de rollover optimale dans des contextes différents. Dans notre deuxième article, nous examinons le même problème mais avec une demande constante pour le premier produit et une demande linéaire au début puis constante pour le deuxième. Ce modèle est inspiré par la demande de Bass. Dans notre troisième article, la date de disponibilité du nouveau produit existe mais elle est inconnue. La seule information disponible est un ensemble historique d'échantillons qui sont tirés de la vraie distribution. Nous résolvons le problème avec l'approche data-driven et nous obtenons des formulations tractables. Nous développons aussi des bornes sur le nombre d'échantillons nécessaires pour garantir qu'avec une forte probabilité, le coût n'est pas très loin du vrai coût optimal.
Shorter product life cycles and rapid product obsolescence provide increasing incentives to introduce new products to markets more quickly. As a consequence of rapidly changing market conditions, firms focus on improving their new product development processes to reap the benefits of early market entry. Researchers have analyzed market entry, but have seldom provided quantitative approaches for the product rollover problem. This research builds upon the literature by using established optimization methods to examine how firms can minimize their net loss during the rollover process. Specifically, our work explicitly optimizes the timing of removal of old products and introduction of new products, the optimal strategy, and the magnitude of net losses when the market entry approval date of a new product is unknown. In the first paper, we use the conditional value at risk to optimize the net loss and investigate the effect of the risk perception of the manager on the rollover process. We compare it to the minimization of the classical expected net loss. We derive conditions for optimality and unique closed-form solutions for single and dual rollover cases. In the second paper, we investigate the rollover problem, but for a time-dependent demand rate for the second product trying to approximate the Bass Model. Finally, in the third paper, we apply the data-driven optimization approach to the product rollover problem where the probability distribution of the approval date is unknown. We rather have historical observations of approval dates. We develop the optimal times of rollover and show the superiority of the data-driven method over the conditional value at risk in cases where it is difficult to guess the real probability distribution.
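The first paper contrasts minimizing the expected net loss with minimizing its conditional value-at-risk (CVaR). As a point of reference, the sketch below estimates both objectives from sampled scenarios; the loss distribution and confidence level are placeholders rather than the paper's model.

```python
# Minimal sketch of the two objectives: expected loss versus CVaR of the loss,
# estimated from hypothetical sampled scenarios.
import numpy as np

def expected_loss(losses: np.ndarray) -> float:
    return float(losses.mean())

def cvar(losses: np.ndarray, alpha: float = 0.95) -> float:
    """Average of the worst (1 - alpha) fraction of losses."""
    threshold = np.quantile(losses, alpha)
    tail = losses[losses >= threshold]
    return float(tail.mean())

rng = np.random.default_rng(0)
sampled_losses = rng.gamma(shape=2.0, scale=10.0, size=10_000)  # hypothetical scenarios
print(expected_loss(sampled_losses), cvar(sampled_losses, 0.95))
```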
8

Gholami, Ali. "Security and Privacy of Sensitive Data in Cloud Computing." Doctoral thesis, KTH, Parallelldatorcentrum, PDC, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186141.

Abstract:
Cloud computing offers the prospect of on-demand, elastic computing, provided as a utility service, and it is revolutionizing many domains of computing. Compared with earlier methods of processing data, cloud computing environments provide significant benefits, such as the availability of automated tools to assemble, connect, configure and reconfigure virtualized resources on demand. These make it much easier to meet organizational goals as organizations can easily deploy cloud services. However, the shift in paradigm that accompanies the adoption of cloud computing is increasingly giving rise to security and privacy considerations relating to facets of cloud computing such as multi-tenancy, trust, loss of control and accountability. Consequently, cloud platforms that handle sensitive information are required to deploy technical measures and organizational safeguards to avoid data protection breakdowns that might result in enormous and costly damages. Sensitive information in the context of cloud computing encompasses data from a wide range of different areas and domains. Data concerning health is a typical example of the type of sensitive information handled in cloud computing environments, and it is obvious that most individuals will want information related to their health to be secure. Hence, with the growth of cloud computing in recent times, privacy and data protection requirements have been evolving to protect individuals against surveillance and data disclosure. Some examples of such protective legislation are the EU Data Protection Directive (DPD) and the US Health Insurance Portability and Accountability Act (HIPAA), both of which demand privacy preservation for handling personally identifiable information. There have been great efforts to employ a wide range of mechanisms to enhance the privacy of data and to make cloud platforms more secure. Techniques that have been used include: encryption, trusted platform module, secure multi-party computing, homomorphic encryption, anonymization, container and sandboxing technologies. However, correctly building usable privacy-preserving cloud systems that handle sensitive data securely is still an open problem, due to two research challenges. First, existing privacy and data protection legislation demands strong security, transparency and auditability of data usage. Second, there is a lack of familiarity with the broad range of emerging and existing security solutions needed to build efficient cloud systems. This dissertation focuses on the design and development of several systems and methodologies for handling sensitive data appropriately in cloud computing environments. The key idea behind the proposed solutions is enforcing the privacy requirements mandated by existing legislation that aims to protect the privacy of individuals in cloud-computing platforms. We begin with an overview of the main concepts from cloud computing, followed by identifying the problems that need to be solved for secure data management in cloud environments. It then continues with a description of background material in addition to reviewing existing security and privacy solutions that are being used in the area of cloud computing. Our first main contribution is a new method for modeling threats to privacy in cloud environments which can be used to identify privacy requirements in accordance with data protection legislation. This method is then used to propose a framework that meets the privacy requirements for handling data in the area of genomics.
That is, health data concerning the genome (DNA) of individuals. Our second contribution is a system for preserving privacy when publishing sample availability data. This system is noteworthy because it is capable of cross-linking over multiple datasets. The thesis continues by proposing a system called ScaBIA for privacy-preserving brain image analysis in the cloud. The final section of the dissertation describes a new approach for quantifying and minimizing the risk of operating system kernel exploitation, in addition to the development of a system call interposition reference monitor for Lind - a dual sandbox.
“Cloud computing”, eller “molntjänster” som blivit den vanligaste svenska översättningen, har stor potential. Molntjänster kan tillhandahålla exakt den datakraft som efterfrågas, nästan oavsett hur stor den är; dvs. molntjänster möjliggör vad som brukar kallas för “elastic computing”. Effekterna av molntjänster är revolutionerande inom många områden av datoranvändning. Jämfört med tidigare metoder för databehandling ger molntjänster många fördelar; exempelvis tillgänglighet av automatiserade verktyg för att montera, ansluta, konfigurera och re-konfigurera virtuella resurser “allt efter behov” (“on-demand”). Molntjänster gör det med andra ord mycket lättare för organisationer att uppfylla sina målsättningar. Men det paradigmskifte, som införandet av molntjänster innebär, skapar även säkerhetsproblem och förutsätter noggranna integritetsbedömningar. Hur bevaras det ömsesidiga förtroendet, hur hanteras ansvarsutkrävandet, vid minskade kontrollmöjligheter till följd av delad information? Följaktligen behövs molnplattformar som är så konstruerade att de kan hantera känslig information. Det krävs tekniska och organisatoriska hinder för att minimera risken för dataintrång, dataintrång som kan resultera i enormt kostsamma skador såväl ekonomiskt som policymässigt. Molntjänster kan innehålla känslig information från många olika områden och domäner. Hälsodata är ett typiskt exempel på sådan information. Det är uppenbart att de flesta människor vill att data relaterade till deras hälsa ska vara skyddad. Så den ökade användningen av molntjänster på senare år har medfört att kraven på integritets- och dataskydd har skärpts för att skydda individer mot övervakning och dataintrång. Exempel på skyddande lagstiftning är “EU Data Protection Directive” (DPD) och “US Health Insurance Portability and Accountability Act” (HIPAA), vilka båda kräver skydd av privatlivet och bevarandet av integritet vid hantering av information som kan identifiera individer. Det har gjorts stora insatser för att utveckla fler mekanismer för att öka dataintegriteten och därmed göra molntjänsterna säkrare. Exempel på detta är: kryptering, “trusted platform modules”, säker “multi-party computing”, homomorfisk kryptering, anonymisering, container- och “sandlåde”-tekniker. Men hur man korrekt ska skapa användbara, integritetsbevarande molntjänster för helt säker behandling av känsliga data är fortfarande i väsentliga avseenden ett olöst problem på grund av två stora forskningsutmaningar. För det första: Existerande integritets- och dataskyddslagar kräver transparens och noggrann granskning av dataanvändningen. För det andra: Bristande kännedom om en rad kommande och redan existerande säkerhetslösningar för att skapa effektiva molntjänster. Denna avhandling fokuserar på utformning och utveckling av system och metoder för att hantera känsliga data i molntjänster på lämpligaste sätt. Målet med de framlagda lösningarna är att svara mot de integritetskrav som ställs i redan gällande lagstiftning, som har som uttalad målsättning att skydda individers integritet vid användning av molntjänster. Vi börjar med att ge en överblick av de viktigaste begreppen i molntjänster, för att därefter identifiera problem som behöver lösas för säker databehandling vid användning av molntjänster.
Avhandlingen fortsätter sedan med en beskrivning av bakgrundsmaterial och en sammanfattning av befintliga säkerhets- och integritetslösningar inom molntjänster. Vårt främsta bidrag är en ny metod för att simulera integritetshot vid användning av molntjänster, en metod som kan användas till att identifiera de integritetskrav som överensstämmer med gällande dataskyddslagar. Vår metod används sedan för att föreslå ett ramverk som möter de integritetskrav som ställs för att hantera data inom området “genomik”. Genomik handlar i korthet om hälsodata avseende arvsmassan (DNA) hos enskilda individer. Vårt andra större bidrag är ett system för att bevara integriteten vid publicering av biologiska provdata. Systemet har fördelen att kunna sammankoppla flera olika uppsättningar med data. Avhandlingen fortsätter med att föreslå och beskriva ett system kallat ScaBIA, ett integritetsbevarande system för hjärnbildsanalyser processade via molntjänster. Avhandlingens avslutande kapitel beskriver ett nytt sätt för kvantifiering och minimering av risk vid “kernel exploitation” (“utnyttjande av kärnan”). Denna nya ansats är även ett bidrag till utvecklingen av ett nytt system (call interposition reference monitor for Lind - the dual-layer sandbox).

9

Mathew, George. "A Perturbative Decision Making Framework for Distributed Sensitive Data." Diss., Temple University Libraries, 2014. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/269109.

Abstract:
Computer and Information Science
Ph.D.
In various business domains, intelligence garnered from data owned by peer institutions can provide useful information. But, due to regulations, privacy concerns and legal ramifications, peer institutions are reluctant to share raw data. For example, in the medical domain, HIPAA regulations, Personally Identifiable Information and privacy issues are impediments to data sharing. However, intelligence can be learned from distributed data sets if their key characteristics are shared among desired parties. In scenarios where samples are rare locally, but adequately available collectively from other sites, sharing key statistics about the data may be sufficient to make proper decisions. The objective of this research is to provide a framework in a distributed environment that helps decision-making using statistics of data from participating sites; thereby eliminating the need for raw data to be shared outside the institution. Distributed ID3-based Decision Tree (DIDT) model building is proposed for effectively building a Decision Support System based on labeled data from distributed sites. The framework includes a query mechanism, a global schema generation process brokered by a clearing-house (CH), crosstable matrices generation by participating sites and entropy calculation (for test) using aggregate information from the crosstable matrices by CH. Empirical evaluations were done using synthetic and real data sets. Due to local data policies, participating sites may place restrictions on attribute release. The concept of "constraint graphs" is introduced as an out-of-band, high-level filtering mechanism for data in transit. Constraint graphs can be used to implement various data transformations including attribute exclusions. Experiments conducted using constraint graphs yielded results consistent with baseline results. In the case of medical data, it was shown that communication costs for DIDT can be contained by auto-reduction of features with predefined thresholds for near-constant attributes. In another study, it was shown that hospitals with insufficient data to build local prediction models were able to collaboratively build a common prediction model with better accuracy using DIDT. This prediction model also reduced the number of incorrectly classified patients. A natural follow-up question is: Can a hospital with a sufficiently large number of instances provide a prediction model to a hospital with insufficient data? This was investigated and the signature of a large hospital dataset that can provide such a model is presented. It is also shown that the error rate of such a model is not statistically significantly different from that of the collaboratively built model. When rare instances of data occur in a local database, it is quite valuable to draw conclusions collectively from such occurrences in other sites. However, in such situations, there will be a huge imbalance in classes among the relevant base population. We present a system that can collectively build a distributed classification model without the need for raw data from each site in the case of imbalanced data. The system uses a voting ensemble of experts for the decision model, where each expert is built using DIDT on selective data generated by oversampling of minority-class and undersampling of majority-class data. The imbalance condition can be detected and the number of experts needed for the ensemble can be determined by the system.
Temple University--Theses
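The entropy computation described above needs only the class-count crosstables reported by the participating sites, never their raw rows. The following is a minimal sketch of that aggregation step under assumed data structures; it illustrates the idea rather than the DIDT implementation.

```python
# Minimal sketch of a clearing-house-style split-entropy computation from
# per-site crosstables: crosstables[i][attribute_value][class_label] = count.
import math
from collections import Counter

def entropy(class_counts: dict[str, int]) -> float:
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in class_counts.values() if c)

def split_entropy(crosstables: list[dict[str, dict[str, int]]]) -> float:
    """Weighted entropy of a candidate attribute, using only aggregated counts."""
    merged: dict[str, Counter] = {}
    for table in crosstables:
        for value, counts in table.items():
            merged.setdefault(value, Counter()).update(counts)
    grand_total = sum(sum(c.values()) for c in merged.values())
    return sum(sum(c.values()) / grand_total * entropy(c) for c in merged.values())

# Hypothetical crosstables from two sites for one candidate attribute.
site_a = {"high": {"yes": 8, "no": 2}, "low": {"yes": 1, "no": 9}}
site_b = {"high": {"yes": 5, "no": 5}}
print(round(split_entropy([site_a, site_b]), 3))
```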
10

Ansell, Peter. "A context sensitive model for querying linked scientific data." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/49777/1/Peter_Ansell_Thesis.pdf.

Abstract:
This thesis provides a query model suitable for context sensitive access to a wide range of distributed linked datasets which are available to scientists using the Internet. The model is designed based on scientific research standards which require scientists to provide replicable methods in their publications. Although there are query models available that provide limited replicability, they do not contextualise the process whereby different scientists select dataset locations based on their trust and physical location. In different contexts, scientists need to perform different data cleaning actions, independent of the overall query, and the model was designed to accommodate this function. The query model was implemented as a prototype web application and its features were verified through its use as the engine behind a major scientific data access site, Bio2RDF.org. The prototype showed that it was possible to have context sensitive behaviour for each of the three mirrors of Bio2RDF.org using a single set of configuration settings. The prototype provided executable query provenance that could be attached to scientific publications to fulfil replicability requirements. The model was designed to make it simple to independently interpret and execute the query provenance documents using context specific profiles, without modifying the original provenance documents. Experiments using the prototype as the data access tool in workflow management systems confirmed that the design of the model made it possible to replicate results in different contexts with minimal additions, and no deletions, to query provenance documents.
11

Sobel, Louis (Louis A. ). "Secure Input Overlays : increasing security for sensitive data on Android." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100624.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 44-47).
Mobile devices and the applications that run on them are an important part of people's lives. Often, an untrusted mobile application will need to obtain sensitive inputs, such as credit card information or passwords, from the user. The application needs these sensitive inputs in order to send them to a trusted service provider that enables the application to implement some useful functionality such as authentication or payment. In order for the inputs to be secure, there needs to be a trusted path from the user, through a trusted base on the mobile device, and to the remote service provider. In addition, remote attestation is necessary to convince the service provider that the inputs it receives traveled through the trusted path. There are two orthogonal parts to establishing the trusted path: local attestation and data protection. Local attestation is the user being convinced that they are interacting with the trusted base. Data protection is ensuring that inputs remain isolated from untrusted applications until they reach the trusted service provider. This paper categorizes previous research solutions to these two components of a trusted path. I then introduce a new solution addressing data protection: Secure Input Overlays. They keep input safe from untrusted applications by completely isolating the inputs from the untrusted mobile application. However, the untrusted application is still able to perform a limited set of queries for validation purposes. These queries are logged. When the application wants to send the inputs to a remote service provider, it declaratively describes the request. The trusted base sends the contents and the log of queries. An attestation generated by trusted hardware verifies that the request is coming from an Android device. The remote service provider can use this attestation to verify the request, then check the log of queries against a whitelist to make a trust decision about the supplied data.
by Louis Sobel.
M. Eng.
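On the service-provider side, the design above comes down to verifying the attestation and then checking the logged validation queries against a whitelist before trusting the supplied inputs. The sketch below illustrates that final check; the query names and whitelist are hypothetical.

```python
# Minimal sketch of the provider-side trust decision: accept the attested
# request only if every logged validation query is on the whitelist.
ALLOWED_QUERIES = {"length", "is_digits_only", "matches_luhn"}

def accept_request(attestation_valid: bool, query_log: list[str]) -> bool:
    if not attestation_valid:
        return False
    return all(q in ALLOWED_QUERIES for q in query_log)

print(accept_request(True, ["length", "matches_luhn"]))          # True
print(accept_request(True, ["length", "read_full_plaintext"]))   # False
```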
12

Landegren, Nils. "How Sensitive Are Cross-Lingual Mappings to Data-Specific Factors?" Thesis, Stockholms universitet, Institutionen för lingvistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-185069.

13

Lindblad, Christopher John. "A programming system for the dynamic manipulation of temporally sensitive data." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/37744.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (p. 255-277).
by Christopher John Lindblad.
Ph.D.
14

Le, Tallec Yann. "Robust, risk-sensitive, and data-driven control of Markov Decision Processes." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/38598.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2007.
Includes bibliographical references (p. 201-211).
Markov Decision Processes (MDPs) model problems of sequential decision-making under uncertainty. They have been studied and applied extensively. Nonetheless, there are two major barriers that still hinder the applicability of MDPs to many more practical decision making problems:
* The decision maker is often lacking a reliable MDP model. Since the results obtained by dynamic programming are sensitive to the assumed MDP model, their relevance is challenged by model uncertainty.
* The structural and computational results of dynamic programming (which deals with expected performance) have been extended with only limited success to accommodate risk-sensitive decision makers.
In this thesis, we investigate two ways of dealing with uncertain MDPs and we develop a new connection between robust control of uncertain MDPs and risk-sensitive control of dynamical systems. The first approach assumes a model of model uncertainty and formulates the control of uncertain MDPs as a problem of decision-making under (model) uncertainty. We establish that most formulations are at least NP-hard and thus suffer from the "curse of uncertainty." The worst-case control of MDPs with rectangular uncertainty sets is equivalent to a zero-sum game between the controller and nature. The structural and computational results for such games make this formulation appealing. By adding a penalty for unlikely parameters, we extend the formulation of worst-case control of uncertain MDPs and mitigate its conservativeness. We show a duality between the penalized worst-case control of uncertain MDPs with rectangular uncertainty and the minimization of a Markovian dynamically consistent convex risk measure of the sample cost. This notion of risk has desirable properties for multi-period decision making, including a new Markovian property that we introduce and motivate. This Markovian property is critical in establishing the equivalence between minimizing some risk measure of the sample cost and solving a certain zero-sum Markov game between the decision maker and nature, and to tackling infinite-horizon problems. An alternative approach to dealing with uncertain MDPs, which avoids the curse of uncertainty, is to exploit observational data directly. Specifically, we estimate the expected performance of any given policy (and its gradient with respect to certain policy parameters) from a training set comprising observed trajectories sampled under a known policy. We propose new value (and value gradient) estimators that are unbiased and have low training-set-to-training-set variance. We expect our approach to outperform competing approaches when there are few system observations compared to the underlying MDP size, as indicated by numerical experiments.
by Yann Le Tallec.
Ph.D.
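In the worst-case formulation above, nature adversarially picks transition probabilities from a rectangular uncertainty set. The sketch below is a toy robust value iteration in that spirit, with finite candidate transition sets standing in for the uncertainty sets; the MDP and the solution method are illustrative, not the thesis's algorithms.

```python
# Minimal sketch of worst-case (robust) value iteration with (s, a)-rectangular
# uncertainty given as finite lists of candidate transition distributions.
import numpy as np

def robust_value_iteration(rewards, uncertainty_sets, gamma=0.9, iters=200):
    """rewards[s][a]: immediate reward; uncertainty_sets[s][a]: list of
    transition distributions (arrays over next states) nature may choose."""
    n_states = len(rewards)
    V = np.zeros(n_states)
    for _ in range(iters):
        new_V = np.empty(n_states)
        for s in range(n_states):
            action_values = []
            for a, candidates in enumerate(uncertainty_sets[s]):
                worst = min(p @ V for p in candidates)     # nature minimizes
                action_values.append(rewards[s][a] + gamma * worst)
            new_V[s] = max(action_values)                  # controller maximizes
        V = new_V
    return V

# Hypothetical 2-state, 2-action MDP.
rewards = [[1.0, 0.5], [0.0, 2.0]]
uncertainty_sets = [
    [[np.array([0.8, 0.2]), np.array([0.6, 0.4])],   # state 0, action 0
     [np.array([0.5, 0.5])]],                         # state 0, action 1
    [[np.array([0.1, 0.9])],                          # state 1, action 0
     [np.array([0.3, 0.7]), np.array([0.2, 0.8])]],   # state 1, action 1
]
print(robust_value_iteration(rewards, uncertainty_sets))
```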
15

Torabian, Hajaralsadat. "Protecting sensitive data using differential privacy and role-based access control." Master's thesis, Université Laval, 2016. http://hdl.handle.net/20.500.11794/26580.

Abstract:
Dans le monde d'aujourd'hui où la plupart des aspects de la vie moderne sont traités par des systèmes informatiques, la vie privée est de plus en plus une grande préoccupation. En outre, les données ont été générées massivement et traitées en particulier dans les deux dernières années, ce qui motive les personnes et les organisations à externaliser leurs données massives à des environnements infonuagiques offerts par des fournisseurs de services. Ces environnements peuvent accomplir les tâches pour le stockage et l'analyse de données massives, car ils reposent principalement sur Hadoop MapReduce qui est conçu pour traiter efficacement des données massives en parallèle. Bien que l'externalisation de données massives dans le nuage facilite le traitement de données et réduise le coût de la maintenance et du stockage de données locales, elle soulève de nouveaux problèmes concernant la protection de la vie privée. La question est donc de savoir comment on peut effectuer des calculs sur des données massives et sensibles tout en préservant la vie privée. Par conséquent, la construction de systèmes sécurisés pour la manipulation et le traitement de telles données privées et massives est cruciale. Nous avons besoin de mécanismes pour protéger les données privées, même lorsque le calcul en cours d'exécution est non sécurisé. Plusieurs recherches ont porté sur la recherche de solutions aux problèmes de confidentialité et de sécurité lors de l'analyse de données dans les environnements infonuagiques. Dans cette thèse, nous étudions quelques travaux existants pour protéger la vie privée de tout individu dans un ensemble de données, en particulier la notion de vie privée connue comme confidentialité différentielle. La confidentialité différentielle a été proposée afin de mieux protéger la vie privée du forage des données sensibles, assurant que le résultat global publié ne révèle rien sur la présence ou l'absence d'un individu donné. Enfin, nous proposons une idée de combiner la confidentialité différentielle avec une autre méthode de préservation de la vie privée disponible.
In today's world, where most aspects of modern life are handled and managed by computer systems, privacy has increasingly become a big concern. In addition, data has been massively generated and processed, especially over the last two years. The rate at which data is generated on one hand, and the need to efficiently store and analyze it on the other hand, lead people and organizations to outsource their massive amounts of data (namely Big Data) to cloud environments supported by cloud service providers (CSPs). Such environments can readily undertake the tasks of storing and analyzing big data since they mainly rely on the Hadoop MapReduce framework, which is designed to efficiently handle big data in parallel. Although outsourcing big data into the cloud facilitates data processing and reduces the maintenance cost of local data storage, it raises new problems concerning privacy protection. The question is how one can perform computations on sensitive and big data while still preserving privacy. Therefore, building secure systems for handling and processing such private massive data is crucial. We need mechanisms to protect private data even when the running computation is untrusted. There has been considerable research focused on finding solutions to the privacy and security issues of data analytics in cloud environments. In this dissertation, we study some existing work to protect the privacy of any individual in a data set, specifically a notion of privacy known as differential privacy. Differential privacy has been proposed to better protect the privacy of data mining over sensitive data, ensuring that the released aggregate result reveals almost nothing about whether or not any given individual has contributed to the data set. Finally, we propose an idea of combining differential privacy with another available privacy-preserving method.
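As a concrete reference for the notion of differential privacy discussed above, the sketch below shows the standard Laplace mechanism for releasing a noisy count; the figures are illustrative and the code is not taken from the thesis.

```python
# Minimal sketch of the Laplace mechanism: noise scaled to sensitivity/epsilon
# is added to a count so the release satisfies epsilon-differential privacy.
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a noisy count; a single individual changes the count by at most
    `sensitivity`, so the noise scale is sensitivity / epsilon."""
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(42)
print(laplace_count(1203, epsilon=0.5, rng=rng))  # noisy version of the true count 1203
```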
16

Hedlin, Johan, and Joakim Kahlström. "Detecting access to sensitive data in software extensions through static analysis." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-162281.

Abstract:
Static analysis is a technique to automatically audit code without having to execute or manually read through it. It is highly effective and can scan large amounts of code or text very quickly. This thesis uses static analysis to find potential threats within a software's extension modules. These extensions are developed by third parties and should not be allowed to access information belonging to other extensions. However, due to the structure of the software there is no easy way to restrict this and still keep the software's functionality intact. The use of a static analysis tool could detect such threats by analyzing the code of an extension before it is published online, and therefore keep all current functionality intact. As the software is based on a lesser known language and there is a specific threat by way of information disclosure, a new static analysis tool has to be developed. To achieve this, a combination of language specific functionality and features available in C++ are combined to create an extendable tool which has the capability to detect cross-extension data access.
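As a rough illustration of the kind of rule such a tool can enforce, the sketch below flags places where an extension's source reads data belonging to another extension's namespace. The access function, the namespace convention and the sample source are hypothetical, since the thesis targets a proprietary language and implements its tool in C++.

```python
# Minimal sketch of a cross-extension data-access check over source text:
# find calls to a (hypothetical) shared-data accessor and report any
# namespaces that do not belong to the extension under analysis.
import re

ACCESS_PATTERN = re.compile(r'get_shared_data\(\s*"([\w.]+)"')

def find_cross_extension_access(source: str, own_namespace: str) -> list[str]:
    """Return the foreign namespaces this extension's code reads from."""
    hits = ACCESS_PATTERN.findall(source)
    return [ns for ns in hits if not ns.startswith(own_namespace + ".")]

sample = '''
value = get_shared_data("com.example.payments.token")
own   = get_shared_data("com.example.notes.settings")
'''
print(find_cross_extension_access(sample, "com.example.notes"))
# -> ['com.example.payments.token']
```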
17

Barreau, Emilie. "Accès aux droits sociaux et numérique : les enjeux de la digitalisation dans l’accès aux aides sociales départementales." Electronic Thesis or Diss., Angers, 2024. http://www.theses.fr/2024ANGE0012.

Abstract:
La dématérialisation des procédures est un fait général qui revêt une portée spécifique en matière de droits sociaux. En matière d’aide sociale, ces droits s’adressent à un public vulnérable qui peut cumuler des facteurs de difficultés. La dématérialisation des procédures qui se traduit par l’absence de guichets et d’interlocuteurs, se déploie sans que la particularité des droits sociaux ou de la vulnérabilité des personnes concernées ne soient prises en compte. Les potentialités du numérique permettent d’envisager des moyens pour renforcer l’accès aux droits sociaux desdites personnes. Néanmoins, ces solutions constituent une forme d’incertitude quant à l’effectivité des droits sociaux. Il en va particulièrement ainsi des plateformes qui constituent des interfaces entre le demandeur ou le bénéficiaire de l’aide sociale et l’autorité qui doit en assurer la garantie et le suivi, tels les conseils départementaux. Le caractère innovant de ces outils ne doit pourtant pas faire perdre de vue leur fonction sociale initiale. Si un encadrement plus inclusif des pratiques se développe, le cadre juridique actuel semble toutefois être mobilisé en faveur du numérique (dématérialisation, ouverture des données publiques, mise en place de divers algorithmes, etc.). À cet égard, le rapport entre l’accès aux droits sociaux et le numérique dévoile des points de divergences eu égard à l’organisation de proximité des conseils départementaux, à la sensibilité des données concernées, aux conséquences de l’automatisation des décisions administratives individuelles et à la valeur économique de la donnée. Dès lors, la posture adoptée dans le cadre de cette recherche consiste à mettre en exergue l’ensemble des conditions permettant d’assurer, face à ces évolutions, le respect des droits sociaux
The dematerialization of administrative procedures is a general fact that has a specific scope in terms of social rights. When it comes to social assistance, these rights are aimed at a vulnerable public that may combine several factors of difficulty. The dematerialization of administrative procedures, which results in the absence of offices and interlocutors, is deployed without the particular nature of social rights or the vulnerability of the persons concerned being taken into account. Consequently, the desired objective of strengthening access to social rights through the potential of digital technology quickly gives way to uncertainty about the effectiveness of social rights. This is particularly the case in the context of platforms that constitute interfaces between the applicant or the beneficiary of social assistance and the authority that must ensure and monitor it, such as departmental councils. The innovative nature of these tools must not, however, lead us to lose sight of their initial social function. While a more inclusive framework of practices is developing, the current legal framework seems to be mobilized in favor of digital technology (dematerialization, open data, algorithms, etc.). In this respect, the relationship between access to social rights and digital technology reveals points of divergence such as the local organization of departmental councils, the sensitivity of personal data, the consequences of automating individual administrative decisions and the economic value of data. Therefore, the position adopted in this research is to highlight all the conditions allowing the respect of social rights to be ensured in the face of these changes.
18

Krishnaswamy, Vijaykumar. "Shared state management for time-sensitive distributed applications." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/8197.

19

El-Khoury, Hiba. "Introduction of New Products in the Supply Chain : Optimization and Management of Risks." PhD thesis, HEC, 2012. http://pastel.archives-ouvertes.fr/pastel-00708801.

Abstract:
Shorter product life cycles and rapid product obsolescence provide increasing incentives to introduce new products to markets more quickly. As a consequence of rapidly changing market conditions, firms focus on improving their new product development processes to reap the benefits of early market entry. Researchers have analyzed market entry, but have seldom provided quantitative approaches for the product rollover problem. This research builds upon the literature by using established optimization methods to examine how firms can minimize their net loss during the rollover process. Specifically, our work explicitly optimizes the timing of removal of old products and introduction of new products, the optimal strategy, and the magnitude of net losses when the market entry approval date of a new product is unknown. In the first paper, we use the conditional value at risk to optimize the net loss and investigate the effect of the risk perception of the manager on the rollover process. We compare it to the minimization of the classical expected net loss. We derive conditions for optimality and unique closed-form solutions for single and dual rollover cases. In the second paper, we investigate the rollover problem, but for a time-dependent demand rate for the second product trying to approximate the Bass Model. Finally, in the third paper, we apply the data-driven optimization approach to the product rollover problem where the probability distribution of the approval date is unknown. We rather have historical observations of approval dates. We develop the optimal times of rollover and show the superiority of the data-driven method over the conditional value at risk in cases where it is difficult to guess the real probability distribution.
20

Khire, Sourabh Mohan. "Time-sensitive communication of digital images, with applications in telepathology." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29761.

Abstract:
Thesis (M. S.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2010.
Committee Chair: Jayant, Nikil; Committee Member: Anderson, David; Committee Member: Lee, Chin-Hui. Part of the SMARTech Electronic Thesis and Dissertation Collection.
21

Murphy, Brian R. "Order-sensitive XML query processing over relational sources." Link to electronic thesis, 2003. http://www.wpi.edu/Pubs/ETD/Available/etd-0505103-123753.

Abstract:
Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: computation pushdown; XML; order-based Xquery processing; relational database; ordered SQL queries; data model mapping; XQuery; XML data mapping; SQL; XML algebra rewrite rules; XML document order. Includes bibliographical references (p. 64-67).
22

McCullagh, Karen. "The social, cultural, epistemological and technical basis of the concept of 'private' data." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/the-social-cultural-epistemological-and-technical-basis-of-the-concept-of-private-data(e2ea538a-8e5b-43e3-8dc2-4cdf602a19d3).html.

Abstract:
In July 2008, the UK Information Commissioner launched a review of EU Directive 95/46/EC on the basis that: “European data protection law is increasingly seen as out of date, bureaucratic and excessively prescriptive. It is showing its age and is failing to meet new challenges to privacy, such as the transfer of personal details across international borders and the huge growth in personal information online. It is high time the law is reviewed and updated for the modern world.” Legal practitioners such as Bergkamp have expressed a similar sense of dissatisfaction with the current legislative approach: “Data Protection as currently conceived by the EU is a fallacy. It is a shotgun remedy against an incompletely conceptualised problem. It is an emotional, rather than rational reaction to feelings of discomfort with expanding data flows. The EU regime is not supported by any empirical data on privacy risks and demand… A future EU privacy program should focus on actual harms and apply targeted remedies.” Accordingly, this thesis critiques key concepts of existing data protection legislation, namely ‘personal’ and ‘sensitive’ data, in order to explore whether current data protection laws can simply be amended and supplemented to manage privacy in the information society. The findings from empirical research will demonstrate that a more radical change in EU law and policy is required to effectively address privacy in the digital economy. To this end, proposed definitions of data privacy and private data were developed and tested through semi-structured interviews with privacy and data protection experts. The expert responses indicate that Bergkamp et al. have indeed identified a potential future direction for privacy and data protection, but that further research is required in order to develop a coherent definition of privacy protection based on managing risks to personal data, and harm from misuse of such information.
23

Raber, Frederic Christian [Verfasser]. "Supporting lay users in privacy decisions when sharing sensitive data / Frederic Christian Raber." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2020. http://d-nb.info/1220691127/34.

24

Ailem, Melissa. "Sparsity-sensitive diagonal co-clustering algorithms for the effective handling of text data." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB087.

Abstract:
In the current context, there is a clear need for text mining techniques to analyse the huge quantity of unstructured text documents available on the Internet. These textual data are often represented by sparse, high-dimensional matrices whose rows and columns correspond to documents and terms, respectively. It is therefore worthwhile to group these terms and documents simultaneously into meaningful clusters, making this substantial amount of data easier to handle and interpret. Co-clustering techniques serve exactly this purpose. Although many existing co-clustering approaches have successfully revealed homogeneous blocks in several domains, these techniques are still challenged by the high dimensionality and sparsity of document-term matrices. Due to this sparsity, several co-clusters are composed primarily of zeros; while homogeneous, such co-clusters are irrelevant and must be filtered out in a post-processing step to keep only the most significant ones. The objective of this thesis is to propose new co-clustering algorithms tailored to these sparsity-related issues. The proposed algorithms seek a block-diagonal structure and identify the most useful co-clusters directly, which makes them especially effective for text co-clustering. Our contributions can be summarized as follows. First, we introduce and demonstrate the effectiveness of a novel co-clustering algorithm based on direct maximization of graph modularity. While existing graph-based co-clustering algorithms rely on spectral relaxation, the proposed algorithm uses an iterative alternating optimization procedure to reveal the most meaningful co-clusters in a document-term matrix. Moreover, the proposed optimization avoids the computation of eigenvectors, a task which is prohibitive for high-dimensional data; this is an improvement over spectral approaches, where computing eigenvectors is necessary to perform the co-clustering. Second, we use a probabilistic approach to discover block-diagonal structures in document-term matrices. We rely on mixture models, which offer strong theoretical foundations and considerable flexibility for uncovering various co-cluster structures. More precisely, we propose a rigorous probabilistic model based on the Poisson distribution and the well-known latent block model; interestingly, this model includes sparsity in its formulation, which makes it particularly suited to text data. Estimating the model's parameters under the Maximum Likelihood (ML) and Classification Maximum Likelihood (CML) approaches yields four co-clustering algorithms: a hard, a soft, and a stochastic variant, plus a fourth that combines the benefits of the soft and stochastic variants. As a last contribution, we propose a new biomedical text mining framework that includes some of the above co-clustering algorithms. This work shows the contribution of co-clustering to a real biomedical text mining problem: the proposed framework generates new clues about the results of genome-wide association studies (GWAS) by mining PubMed abstracts. It has been applied to asthma and makes it possible to assess the strength of associations between asthma genes reported in previous GWAS as well as to discover new candidate genes likely associated with asthma.
In a nutshell, while several text co-clustering algorithms already exist, their performance can be substantially increased if more appropriate models and algorithms are available. According to the extensive experiments conducted on several challenging real-world text data sets, we believe that this thesis has served this objective well.
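To make the block-diagonal criterion concrete, here is a minimal Python sketch (not code from the thesis) of one common modularity measure for a diagonal co-clustering of a document-term matrix, where a document-term cell contributes only when document and term fall in the same co-cluster; the exact objective and optimization procedure used in the thesis may differ.

```python
import numpy as np

def coclustering_modularity(A, row_labels, col_labels):
    """Modularity of a diagonal co-clustering of a document-term matrix A.

    row_labels[i] / col_labels[j] give the co-cluster assigned to document i
    and term j; a cell contributes only when both indices share a co-cluster.
    """
    A = np.asarray(A, dtype=float)
    N = A.sum()                                              # total term occurrences
    expected = np.outer(A.sum(axis=1), A.sum(axis=0)) / N    # independence model
    same_block = np.equal.outer(np.asarray(row_labels), np.asarray(col_labels))
    return float(((A - expected) * same_block).sum() / N)

# Toy example: 4 documents x 6 terms with two diagonal blocks.
A = [[3, 2, 1, 0, 0, 1],
     [2, 4, 1, 0, 0, 0],
     [0, 0, 0, 3, 2, 2],
     [0, 1, 0, 2, 4, 3]]
print(coclustering_modularity(A, [0, 0, 1, 1], [0, 0, 0, 1, 1, 1]))
```

An alternating algorithm of the kind described above would repeatedly reassign row and column labels to increase this score.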
APA, Harvard, Vancouver, ISO, and other styles
25

Ailem, Melissa. "Sparsity-sensitive diagonal co-clustering algorithms for the effective handling of text data." Electronic Thesis or Diss., Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB087.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Jarvis, Ryan D. "Protecting Sensitive Credential Content during Trust Negotiation." Diss., CLICK HERE for online access, 2003. http://contentdm.lib.byu.edu/ETD/image/etd192.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Becerra, Bonache Leonor. "On the learnibility of Mildly Context-Sensitive languages using positive data and correction queries." Doctoral thesis, Universitat Rovira i Virgili, 2006. http://hdl.handle.net/10803/8780.

Full text
Abstract:
With this dissertation we bring together the theory of grammatical inference and studies of language acquisition, in pursuit of a final goal: to deepen our understanding of the process of language acquisition by using the theory of inference of formal grammars.

Our three main contributions are:

1. The introduction of a new class of languages called Simple p-dimensional external contextual (SEC). Although research in grammatical inference has focused on learning regular or context-free languages, we propose to direct these studies towards classes of languages that are more relevant from a linguistic point of view (families of languages that occupy an orthogonal position in the Chomsky hierarchy and are Mildly Context-Sensitive, for example SEC).

2. The presentation of a new learning paradigm based on correction queries. One of the main results in the theory of formal learning is that deterministic finite automata (DFA) are efficiently learnable from membership queries and equivalence queries. Taking into account that the correction of errors can play an important role in first language acquisition, we introduce a novel learning model that replaces membership queries with correction queries.

3. The presentation of results based on the two previous contributions. First, we prove that SEC is learnable from positive data only. Second, we prove that it is possible to learn DFA from corrections and that the number of queries is reduced considerably.

The results obtained in this dissertation represent an important contribution to studies of grammatical inference, where current research has focused mainly on the mathematical aspects of the models. Moreover, these results could be extended to studies directly related to machine translation, robotics, natural language processing, and bioinformatics.
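As a small illustration of the correction-query model (not code from the dissertation), the sketch below implements a correction oracle for a known DFA: instead of answering a membership query with yes/no, it returns the shortest suffix that turns the queried string into a member of the language. The DFA encoding and the example automaton are illustrative assumptions.

```python
from collections import deque

def correction_query(dfa, s):
    """Correction oracle for a DFA given as (alphabet, delta, start, accepting).

    Returns "" if s is already in the language, otherwise the shortest suffix w
    such that s + w is accepted, or None if no completion exists.
    """
    alphabet, delta, start, accepting = dfa
    state = start
    for ch in s:                                  # run the DFA on the query string
        state = delta[(state, ch)]
    queue, seen = deque([(state, "")]), {state}
    while queue:                                  # BFS yields the shortest correction
        q, suffix = queue.popleft()
        if q in accepting:
            return suffix
        for ch in alphabet:
            nxt = delta[(q, ch)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, suffix + ch))
    return None

# Example DFA over {a, b} accepting exactly the strings that end in "ab".
delta = {("s0", "a"): "s1", ("s0", "b"): "s0",
         ("s1", "a"): "s1", ("s1", "b"): "s2",
         ("s2", "a"): "s1", ("s2", "b"): "s0"}
dfa = ("ab", delta, "s0", {"s2"})
print(correction_query(dfa, "aba"))   # -> "b"
print(correction_query(dfa, "ab"))    # -> ""
```

A learner in this paradigm would pose such queries and use the returned corrections, rather than plain yes/no answers, to refine its hypothesis automaton.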
APA, Harvard, Vancouver, ISO, and other styles
28

Vilsmaier, Christian. "Contextualized access to distributed and heterogeneous multimedia data sources." Thesis, Lyon, INSA, 2014. http://www.theses.fr/2014ISAL0094/document.

Full text
Abstract:
Making multimedia data available online becomes cheaper and more convenient every day, which fuels web phenomena such as Facebook, Twitter, and Flickr. Their growing acceptance in society leads in turn to a multiplication of the number of images available online. This vast amount of frequently public, and therefore searchable, images already exceeds the zettabyte bound. Executing a similarity search over the magnitude of images that are publicly available, and receiving a high-quality result, is a challenge that the scientific community has only recently begun to address. One approach to this problem relies on distributed, heterogeneous Content-Based Image Retrieval systems (CBIRs). Several issues emerge from such a distributed query scenario: the involved CBIRs use distinct metadata formats to describe their content, expose unequal technical and structural information, and apply individual metrics to compute the similarity between pictures, each combined in its own specific way. Overall, obtaining good results in this environment is a labor-intensive task that has not yet been comprehensively explored. The problem addressed in this work is the collection of pictures from CBIRs that are similar to a given query picture, as the response to a distributed multimedia query. The main contribution of this thesis is the construction of a network of Content-Based Image Retrieval systems that can extract and exploit information about an input image's semantic concept. This so-called semantic CBIRn is mainly composed of CBIRs that are configured by the semantic CBIRn itself; complementarily, specialized external sources can be integrated. The semantic CBIRn collects and merges the results of all attached CBIRs. In order to integrate external sources that are willing to join the network but not willing to disclose their configuration, an algorithm was developed that approximates these configurations. By categorizing existing as well as external CBIRs and analyzing incoming queries, image queries are forwarded exclusively to the most suitable CBIRs, so that images of no use to the user can be omitted beforehand. The returned images are then made comparable so that they can be merged into a single result list of images similar to the input image. The feasibility of the approach and the resulting improvement of the search process are demonstrated by a prototypical implementation: using it, the number of returned images that share the semantic concept of the input image increases by a factor of 4.75 with respect to a predefined non-semantic CBIRn.
APA, Harvard, Vancouver, ISO, and other styles
29

Darshana, Dipika. "DELAY SENSITIVE ROUTING FOR REAL TIME TRAFFIC OVER AD-HOC NETWORKS." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2802.

Full text
Abstract:
A wireless ad hoc network consists of inexpensive nodes that form a mobile communication network. Due to the limited transmission range, the nodes rely on each other to forward packets so that messages can be delivered across the network. The selection of the path along which a packet is forwarded from the source node to the destination node is done by the routing algorithm. The most commonly used routing algorithms, though effective for non-real-time applications, cannot handle real-time applications that require strict delay bounds on packet delivery. In this thesis, we propose a routing protocol that ensures timely delivery of real-time data packets. The idea is to route packets in such a way that, irrespective of factors like traffic load and node density, the average delay remains within acceptable bounds. This is done by carefully assessing the resources available to a route before a session is admitted along that route. Each link in the route is checked for sufficient bandwidth, not only for the new session to be admitted but also for the sessions that are already using that link. The new session is admitted only if the admission does not violate the delay bounds of any ongoing sessions. This method of route selection, coupled with per-hop link reservations, allows us to provide bounds on the delay performance. Extensive simulation experiments demonstrate the performance of the proposed routing protocol in terms of throughput, session blocking probability, packet drop probability, average path length, and delay.
M.S.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering MSCpE
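A minimal Python sketch (names and data layout are assumptions, not from the thesis) of the kind of per-link admission check described in this abstract: a new session is admitted, and its bandwidth reserved, only if every link on the route can carry the extra demand and the end-to-end delay stays within the bound.

```python
def admit_session(route, demand_bps, deadline_s, links):
    """Admit a delay-sensitive session only if no link gets oversubscribed
    and the cumulative per-hop delay along the route meets the deadline."""
    total_delay = 0.0
    for link_id in route:
        link = links[link_id]
        if link["reserved_bps"] + demand_bps > link["capacity_bps"]:
            return False                  # would break guarantees of ongoing sessions
        total_delay += link["delay_s"]
    if total_delay > deadline_s:
        return False                      # route cannot meet the delay bound
    for link_id in route:                 # reserve bandwidth only after all checks pass
        links[link_id]["reserved_bps"] += demand_bps
    return True

links = {
    "A-B": {"capacity_bps": 2e6, "reserved_bps": 1.2e6, "delay_s": 0.020},
    "B-C": {"capacity_bps": 2e6, "reserved_bps": 0.4e6, "delay_s": 0.030},
}
print(admit_session(["A-B", "B-C"], demand_bps=5e5, deadline_s=0.1, links=links))  # True
print(admit_session(["A-B", "B-C"], demand_bps=5e5, deadline_s=0.1, links=links))  # False: A-B is now full
```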
APA, Harvard, Vancouver, ISO, and other styles
30

Winandy, Marcel [Verfasser]. "Security and Trust Architectures for Protecting Sensitive Data on Commodity Computing Platforms / Marcel Winandy." Aachen : Shaker, 2012. http://d-nb.info/106773497X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Vaskovich, Daria. "Cloud Computing and Sensitive Data : A Case of Beneficial Co-Existence or Mutual Exclusiveness?" Thesis, KTH, Hållbarhet och industriell dynamik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-169597.

Full text
Abstract:
Cloud computing is today a hot topic that has changed how IT is delivered and created new business models to pursue. The main benefits of Cloud computing include flexibility and scalability. It is widely adopted by individuals through services such as Google Drive and Dropbox. However, there exists a certain degree of caution towards Cloud computing at organizations that possess sensitive data, which may decelerate adoption. Hence, this master thesis investigates Cloud computing in combination with sensitive data, in order to give organizations a base of knowledge to support their decision making when a transition into the Cloud is considered. Sensitive data is defined as information protected by the Swedish Personal Data Act. Previous studies show that organizations value a high degree of security when making a transition into Cloud computing and request several measures to be implemented by the Cloud computing service provider. Legal conformance of a Cloud computing service is another important aspect. The data gathering activities consisted of a survey directed towards 101 Swedish organizations, aimed at mapping their usage of Cloud computing services and identifying aspects that may decelerate adoption, together with interviews with three (3) experts within the fields of law and Cloud computing. The results were analyzed and discussed, leading to the conclusions that a hybrid Cloud is a well-chosen alternative for a cautious organization, that the terms of the SLA should be thoroughly negotiated before an agreement is reached, and that providers well established on the Swedish market should primarily be chosen in order to minimize the risk of a legally non-compliant solution. Finally, each organization should decide whether the security provided by the Cloud computing provider is sufficient for the organization's purposes, since this largely involves an element of risk-taking.
APA, Harvard, Vancouver, ISO, and other styles
32

Peng, Zhen. "Novel Data Analytics for Developing Sensitive and Reliable Damage Indicators in Structural Health Monitoring." Thesis, Curtin University, 2022. http://hdl.handle.net/20.500.11937/89064.

Full text
Abstract:
This thesis focuses on developing novel data analytics and damage detection methods that are applicable to the condition assessment of civil engineering structures subjected to operational and environmental condition changes, nonlinearity and/or measurement noise. Comprehensive numerical and experimental studies validate the effectiveness and performance of using the proposed approaches for practical structural health monitoring applications.
APA, Harvard, Vancouver, ISO, and other styles
33

Aljandal, Waleed A. "Itemset size-sensitive interestingness measures for association rule mining and link prediction." Diss., Manhattan, Kan. : Kansas State University, 2009. http://hdl.handle.net/2097/1119.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Flory, Long Mrs. "A WEB PERSONALIZATION ARTIFACT FOR UTILITY-SENSITIVE REVIEW ANALYSIS." VCU Scholars Compass, 2015. http://scholarscompass.vcu.edu/etd/3739.

Full text
Abstract:
Online customer reviews are web content voluntarily posted by the users of a product (e.g. a camera) or service (e.g. a hotel) to express their opinions about that product or service. Online reviews are important resources for businesses and consumers. This dissertation focuses on the important consumer concern of review utility, i.e., the helpfulness or usefulness of online reviews in informing consumer purchase decisions. Review utility concerns consumers because not all online reviews are useful or helpful, and the quantity of online reviews for a product or service tends to be very large. Manual assessment of review utility is not only time consuming but also information overloading. To address this issue, review helpfulness research (RHR) has become a very active research stream dedicated to studying utility-sensitive review analysis (USRA) techniques for automating review utility assessment. Unfortunately, prior RHR solutions are inadequate, and RHR researchers call for more suitable USRA approaches. Our research responds to this call by addressing the research problem: what is an adequate USRA approach? We address this problem by offering novel Design Science (DS) artifacts for personalized USRA (PUSRA). Our proposed solution extends not only RHR research but also web personalization research (WPR), which studies web-based solutions for personalized web provision. We have evaluated the proposed solution by applying three evaluation methods: analytical, descriptive, and experimental. The evaluations corroborate the practical efficacy of the proposed solution. This research contributes what we believe to be (1) the first DS artifacts in the knowledge body of RHR and WPR, and (2) the first PUSRA contribution to USRA practice. Moreover, we consider our evaluations the first comprehensive assessment of USRA solutions. In addition, this research contributes to the advancement of decision support research and practice: the proposed solution is a web-based decision support artifact with the capability to substantially improve accurate personalized webpage provision. Website designers can also apply our solution to fundamentally transform their work, and such transformation can add substantial value to businesses.
APA, Harvard, Vancouver, ISO, and other styles
35

Ording, Marcus. "Context-Sensitive Code Completion : Improving Predictions with Genetic Algorithms." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-205334.

Full text
Abstract:
Within the area of context-sensitive code completion there is a need for accurate predictive models in order to provide useful code completion predictions. The traditional method for optimizing the performance of code completion systems is to empirically evaluate the effect of each system parameter individually and fine-tune the parameters. This thesis presents a genetic algorithm that can optimize the system parameters with a degree of freedom equal to the number of parameters to optimize. The study evaluates the effect of the optimized parameters on the prediction quality of the studied code completion system. The previous evaluation of the reference code completion system is also extended to include model size and inference speed. The results of the study show that the genetic algorithm is able to improve the prediction quality of the studied code completion system. Compared with the reference system, the enhanced system is able to recognize 1 in 10 additional previously unseen code patterns. This increase in prediction quality does not significantly impact system performance, as the inference speed remains below 1 ms for both systems.
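A minimal sketch of the kind of genetic search this abstract describes, optimizing all system parameters jointly rather than one at a time; the fitness function, bounds, and operator choices here are illustrative assumptions, not the thesis configuration.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=50,
                   crossover_rate=0.7, mutation_rate=0.1):
    """Tiny real-valued GA: each individual is a full parameter vector, so all
    parameters are optimized jointly with one degree of freedom per parameter."""
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]                 # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, dim) if random.random() < crossover_rate else 0
            child = a[:cut] + b[cut:]                    # one-point crossover
            child = [min(hi, max(lo, g + random.gauss(0, 0.1 * (hi - lo))))
                     if random.random() < mutation_rate else g
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Stand-in fitness: in the thesis this would be the completion system's prediction
# quality on held-out code, evaluated as a function of its parameter vector.
best = genetic_search(lambda p: -sum((x - 0.5) ** 2 for x in p), bounds=[(0.0, 1.0)] * 4)
print(best)
```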
APA, Harvard, Vancouver, ISO, and other styles
36

Li, Xinfeng. "Time-sensitive Information Communication, Sensing, and Computing in Cyber-Physical Systems." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397731767.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Engin, Melih. "Text Classificaton In Turkish Marketing Domain And Context-sensitive Ad Distribution." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/12610457/index.pdf.

Full text
Abstract:
Online advertising is continuously increasing in popularity, and the target audience of this new advertising method is huge. In addition, another rapidly growing and crowded group related to internet advertising consists of web publishers. Contextual advertising systems make it easier for publishers to present online ads on their web sites, since these online marketing systems automatically divert ads to web sites with related content. Web publishers join ad networks and gain revenue by enabling ads to be displayed on their sites. Therefore, the accuracy of automated ad systems in determining ad-context relevance is crucial. In this thesis we construct a method for semantic classification of web site contexts in the Turkish language and develop an ad serving system to display context-related ads on web documents. The classification method uses both semantic and statistical techniques. The method is supervised and therefore needs processed sample data for learning classification rules, so we generate a Turkish marketing dataset and use it in our classification approaches. We form successful classification methods using different feature spaces and support vector machine configurations, and our results provide a good comparison between these methods.
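A minimal sketch of the statistical half of such a pipeline (the thesis also uses semantic techniques, which are omitted here): a TF-IDF representation with a linear SVM assigns a marketing category to a page so that ads from that category can be served. The tiny Turkish corpus and category names are made-up placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Placeholder training data: (page text, marketing category) pairs.
train_texts = ["kredi kartı kampanyası ve faiz oranları",
               "cep telefonu aksesuar ve kılıf modelleri",
               "uçak bileti ve otel rezervasyonu fırsatları",
               "banka mevduat hesabı ve kredi başvurusu"]
train_labels = ["finance", "electronics", "travel", "finance"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())   # bag-of-words -> linear SVM
model.fit(train_texts, train_labels)

page = "yeni cep telefonu modelleri ve kılıf fiyatları"
print(model.predict([page])[0])   # category whose ads should be served on this page
```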
APA, Harvard, Vancouver, ISO, and other styles
38

Forde, Edward Steven. "Security Strategies for Hosting Sensitive Information in the Commercial Cloud." ScholarWorks, 2017. https://scholarworks.waldenu.edu/dissertations/3604.

Full text
Abstract:
IT experts often struggle to find strategies to secure data in the cloud. Although current security standards might provide cloud compliance, they fail to offer guarantees of security assurance. The purpose of this qualitative case study was to explore the strategies used by IT security managers to host sensitive information in the commercial cloud. The study's population consisted of information security managers from a government agency in the eastern region of the United States. Routine activity theory, developed by Cohen and Felson, was used as the conceptual framework for the study. The data collection process included IT security manager interviews (n = 7), organizational documents and procedures (n = 14), and direct observation of a training meeting (n = 35). Data collected from organizational documents and observations were summarized, and coding from the interviews and member checking were triangulated with the organizational documents and observational data/field notes to produce major and minor themes. Through methodological triangulation, 5 major themes emerged from the data analysis: avoiding social engineering vulnerabilities, avoiding weak encryption, maintaining customer trust, training to create a cloud security culture, and developing sufficient policies. The findings of this study may benefit information security managers by enhancing their information security practices to better protect their organization's information stored in the commercial cloud. Improved information security practices may contribute to social change by exposing customers to a lower risk of having their identity or data stolen by internal and external thieves.
APA, Harvard, Vancouver, ISO, and other styles
39

He, Yuting. "RVD2: An ultra-sensitive variant detection model for low-depth heterogeneous next-generation sequencing data." Digital WPI, 2014. https://digitalcommons.wpi.edu/etd-theses/499.

Full text
Abstract:
Motivation: Next-generation sequencing technology is increasingly being used for clinical diagnostic tests. Unlike research cell lines, clinical samples are often genomically heterogeneous due to low sample purity or the presence of genetic subpopulations. Therefore, a variant calling algorithm for calling low-frequency polymorphisms in heterogeneous samples is needed. Result: We present a novel variant calling algorithm that uses a hierarchical Bayesian model to estimate allele frequency and call variants in heterogeneous samples. We show that our algorithm improves upon current classifiers and has higher sensitivity and specificity over a wide range of median read depth and minor allele frequency. We apply our model and identify twelve mutations in the PAXP1 gene in a matched clinical breast ductal carcinoma tumor sample; two of which are loss-of-heterozygosity events.
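As a rough illustration of Bayesian variant calling on read counts (a deliberately simplified, non-hierarchical caricature of what RVD2 does; the actual model shares information across positions and compares case against control samples), one can compute the posterior over the non-reference allele frequency at a position and call a variant when that frequency is unlikely to be explained by sequencing error. All priors and thresholds below are illustrative assumptions.

```python
from scipy.stats import beta

def call_variant(alt_reads, total_reads, error_rate=0.01,
                 prior_a=1.0, prior_b=1.0, posterior_cutoff=0.95):
    """Beta-Binomial variant call at one position.

    With a Beta(prior_a, prior_b) prior on the allele frequency and a binomial
    likelihood for the read counts, the posterior is
    Beta(prior_a + alt_reads, prior_b + total_reads - alt_reads); the position
    is called a variant when most of the posterior mass lies above the
    assumed sequencing error rate.
    """
    posterior = beta(prior_a + alt_reads, prior_b + total_reads - alt_reads)
    p_above_error = posterior.sf(error_rate)   # P(allele freq > error rate | reads)
    return p_above_error >= posterior_cutoff, p_above_error

# 30x depth with 3 alternate reads, e.g. a ~10% subclone in a heterogeneous sample.
print(call_variant(alt_reads=3, total_reads=30))
```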
APA, Harvard, Vancouver, ISO, and other styles
40

Hsu, William. "Using knowledge encoded in graphical disease models to support context-sensitive visualization of medical data." Diss., Restricted to subscribing institutions, 2009. http://proquest.umi.com/pqdweb?did=1925776141&sid=13&Fmt=2&clientId=1564&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Hoang, Van-Hoan. "Securing data access and exchanges in a heterogeneous ecosystem : An adaptive and context-sensitive approach." Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS009.

Full text
Abstract:
Cloud-based data storage and sharing services have proven successful over the last decades. The underlying model spares users expensive hardware for storing data while still allowing them to access and share data anywhere and whenever they desire. In this context, security is vital to protecting users and their resources. Users need to be securely authenticated to prove their eligibility to access resources, yet showing credentials enables the service provider to detect who shares data with whom, or to build a profile of each user. As for outsourced data, because deploying effective key management in such services is complex, data is often encrypted not by the users but by the service providers, which enables the providers to read it. In this thesis, we make a set of contributions that address these issues. First, we design a password-based authenticated key exchange protocol to establish a secure channel between users and service providers over an insecure environment. Second, we construct a privacy-enhancing decentralized public key infrastructure that allows building secure authentication protocols while preserving user privacy. Third, we design two revocable ciphertext-policy attribute-based encryption schemes; these provide effective key management systems that help a data owner encrypt data before outsourcing it while retaining the capacity to share it securely with others. Fourth, we build a decentralized data sharing platform by leveraging blockchain technology and the IPFS network. The platform aims at providing high data availability, data confidentiality, secure access control, and user privacy.
APA, Harvard, Vancouver, ISO, and other styles
42

Sankara, Krishnan Shivaranjani. "Delay sensitive delivery of rich images over WLAN in telemedicine applications." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29673.

Full text
Abstract:
Thesis (M. S.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Jayant, Nikil; Committee Member: Altunbasak, Yucel; Committee Member: Sivakumar, Raghupathy. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
43

Bhattacharya, Arindam. "Gradient Dependent Reconstruction from Scalar Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1449181983.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Ljungberg, Lucas. "Using unsupervised classification with multiple LDA derived models for text generation based on noisy and sensitive data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255010.

Full text
Abstract:
Creating models that generate contextual responses to input queries is a difficult problem, and it is even more difficult when the available data contains noise and sensitive information. Finding models or methods to handle such issues is important in order to put such data to productive use. This thesis proposes a model based on a cooperating pair of topic models with differing tasks (LDA and GSDMM) in order to alleviate these problematic properties of the data. The model is tested on a real-world dataset exhibiting these difficulties as well as on a dataset without them. The goals are to 1) examine the behaviour of the different topic models to see whether their topical representation of the data is of use as input or output for other models, and 2) find out which data properties can be alleviated as a result. The results show that topic modeling can represent the semantic information of documents well enough to produce well-behaved input data for other models, while also dealing well with large vocabularies and noisy data. The topical clustering of the response data is sufficient for a classification model to predict the context of a response, from which valid responses can be created.
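A minimal sketch of the cooperating-pair idea: topic proportions from one model represent the noisy queries, a clustering of the responses defines the target contexts, and a classifier maps the former to the latter. The thesis clusters responses with GSDMM; scikit-learn has no GSDMM, so KMeans over bag-of-words is used here purely as a stand-in, and the tiny corpus is made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

queries = ["my invoice is wrong", "cannot log in to my account",
           "billing amount too high", "password reset not working"]
responses = ["we have corrected the invoice", "please try resetting your password",
             "the billing issue has been refunded", "a new password link was sent"]

# Topical features for the (noisy) queries.
q_counts = CountVectorizer().fit_transform(queries)
q_topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(q_counts)

# Cluster the responses into contexts (GSDMM in the thesis; KMeans as a stand-in here).
r_counts = CountVectorizer().fit_transform(responses)
response_context = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(r_counts)

# Learn to predict the response context from the query's topic mixture.
clf = LogisticRegression().fit(q_topics, response_context)
print(clf.predict(q_topics))
```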
APA, Harvard, Vancouver, ISO, and other styles
45

Qi, Hao. "Computing resources sensitive parallelization of neural neworks for large scale diabetes data modelling, diagnosis and prediction." Thesis, Brunel University, 2011. http://bura.brunel.ac.uk/handle/2438/6346.

Full text
Abstract:
Diabetes has become one of the most severe diseases due to an increasing number of diabetes patients globally. A large amount of digital data on diabetes has been collected through various channels, and how to utilize these data sets to help doctors make decisions on diagnosis, treatment and prediction for diabetic patients poses many challenges to the research community. The thesis investigates mathematical models, with a focus on neural networks, for large scale diabetes data modelling and analysis by utilizing modern computing technologies such as grid computing and cloud computing. These computing technologies provide users with an inexpensive way to access extensive computing resources over the Internet for solving data and computationally intensive problems. This thesis evaluates the performance of seven representative machine learning techniques in the classification of diabetes data; the results show that the neural network produces the best classification accuracy but incurs a high overhead in data training. As a result, the thesis develops MRNN, a parallel neural network model based on the MapReduce programming model, which has become an enabling technology in support of data intensive applications in the cloud. By partitioning the diabetic data set into a number of equally sized data blocks, the training workload is distributed among a number of computing nodes for speedup. MRNN is first evaluated in small scale experimental environments using 12 mappers and subsequently in large scale simulated environments using up to 1000 mappers. Both the experimental and simulation results show the effectiveness of MRNN in classification and its high scalability in data training. MapReduce does not have a sophisticated job scheduling scheme for heterogeneous computing environments in which the computing nodes may have varied computing capabilities. For this purpose, this thesis develops a load balancing scheme based on genetic algorithms that aims to balance the training workload among heterogeneous computing nodes, so that nodes with more computing capacity receive more MapReduce jobs for execution. Divisible load theory is employed to guide the evolutionary process of the genetic algorithm towards fast convergence. The proposed load balancing scheme is evaluated in large scale simulated MapReduce environments with varied levels of heterogeneity using different sizes of data sets. All the results show that the genetic algorithm based load balancing scheme significantly reduces the makespan of job execution in comparison with the time consumed without load balancing.
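A minimal sketch of the partition-train-combine pattern behind a MapReduce-style learner: each "mapper" updates a model on its own equally sized data block and a "reducer" averages the partial models. A logistic-regression step stands in for the neural network training here, and the synthetic data, block count, and learning rate are illustrative assumptions.

```python
import numpy as np

def map_train(block, weights, lr=0.1):
    """One 'mapper': a single gradient step of logistic regression on its block."""
    X, y = block
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def reduce_average(partial_weights):
    """The 'reducer': average the per-block models into one."""
    return np.mean(partial_weights, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

weights = np.zeros(5)
blocks = [(X[i::4], y[i::4]) for i in range(4)]      # 4 equally sized blocks / mappers
for _ in range(50):                                  # repeated MapReduce rounds
    weights = reduce_average([map_train(b, weights) for b in blocks])
print(weights)
```

In a heterogeneous cluster, a scheduler of the kind described above would hand larger or more numerous blocks to the faster nodes so that all mappers finish at roughly the same time.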
APA, Harvard, Vancouver, ISO, and other styles
46

Koop, Martin [Verfasser], and Stefan [Akademischer Betreuer] Katzenbeisser. "Preventing the Leakage of Privacy Sensitive User Data on the Web / Martin Koop ; Betreuer: Stefan Katzenbeisser." Passau : Universität Passau, 2021. http://d-nb.info/1226425577/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Oguchi, Chizoba. "A Comparison of Sensitive Splice Aware Aligners in RNA Sequence Data Analysis in Leaping towards Benchmarking." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18513.

Full text
Abstract:
Bioinformatics, as a field, develops rapidly, and such development requires the design of algorithms and software. RNA-seq provides robust information on RNAs, both already known and new, hence the increased study of RNA. Alignment is an important step in downstream analyses, and the ability to map reads across splice junctions is a requirement for an aligner to be suitable for mapping RNA-seq reads; hence the need for a standard splice-aware aligner. STAR, Rsubread and HISAT2 have not been individually studied for the purpose of benchmarking one of them as a standard aligner for spliced RNA-seq reads. This study compared these aligners, all found to be sensitive to splice sites, with regard to their sensitivity to splice sites, their performance with default parameter settings, and the resources used during the alignment process. The aligners were paired with featureCounts. The results show that STAR and Rsubread outperform HISAT2 in the aspects of sensitivity and default parameter settings. Rsubread was more sensitive to splice junctions than STAR but underperformed with featureCounts. STAR had a consistent performance, with more demand on memory and time resources, but showed it could be more sensitive with real data.
APA, Harvard, Vancouver, ISO, and other styles
48

Olorunnimbe, Muhammed. "Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32340.

Full text
Abstract:
In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications such as network intrusion detection, fraud detection and financial forecasting, amongst others. In this setting, it is crucial to create data mining algorithms that can seamlessly adapt to temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The resultant models produced by such algorithms should not only be highly accurate and able to adapt swiftly to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization. This is especially relevant when we aim to build personalized, near-instant models in a Big Data setting. This research focuses on mining a data stream with concept drift, using an online bagging method, with consideration of memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, in which the models of multiple classifiers are combined into an ensemble; this approach has been very successful in building accurate models against data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility, and this research focuses on that issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift in order to vary the ensemble size during the data mining process. We aim to minimize memory usage while maintaining highly accurate models with high utility. We evaluated our method against a number of benchmarking datasets and compared our results against the state of the art. Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy, in contrast to the time and memory invested; we aimed to achieve a high ROI without compromising the accuracy of the result. Our experimental results indicate that we achieved this goal.
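A minimal sketch (assumed class and method names; the ROI-driven resizing policy from the thesis is not reproduced here) of Oza-style online bagging with a variable ensemble size: every arriving example is shown to each base learner k ~ Poisson(1) times, and grow()/shrink() let a controller trade memory for accuracy as the stream drifts.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

class OnlineBagging:
    """Oza-style online bagging over a data stream with a resizable ensemble."""

    def __init__(self, n_learners, classes, seed=0):
        self.rng = np.random.default_rng(seed)
        self.classes = classes
        self.learners = [SGDClassifier() for _ in range(n_learners)]

    def partial_fit(self, x, y):
        # Each learner sees the example k ~ Poisson(1) times, mimicking a bootstrap.
        for learner in self.learners:
            for _ in range(self.rng.poisson(1.0)):
                learner.partial_fit(x.reshape(1, -1), [y], classes=self.classes)

    def predict(self, x):
        votes = [l.predict(x.reshape(1, -1))[0]
                 for l in self.learners if hasattr(l, "coef_")]  # skip untrained members
        return max(set(votes), key=votes.count) if votes else self.classes[0]

    def grow(self):                      # called when more accuracy is worth the memory
        self.learners.append(SGDClassifier())

    def shrink(self):                    # called when memory should be reclaimed
        if len(self.learners) > 1:
            self.learners.pop()

# Toy stream: the ensemble is updated one example at a time.
ens = OnlineBagging(n_learners=5, classes=[0, 1])
for i in range(200):
    x = np.array([np.sin(i / 10.0), (i % 7) / 7.0])
    ens.partial_fit(x, int(x[0] > 0))
print(ens.predict(np.array([0.9, 0.3])))
```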
APA, Harvard, Vancouver, ISO, and other styles
49

Jun, Mi Kyung. "Effects of survey mode, gender, and perceived sensitivity on the quality of data regarding sensitive health behaviors." [Bloomington, Ind.] : Indiana University, 2005. http://wwwlib.umi.com/dissertations/fullcit/3167794.

Full text
Abstract:
Thesis (Ph.D.)--Indiana University, School of Health, Physical Education and Recreation, 2005.
Source: Dissertation Abstracts International, Volume: 66-04, Section: B, page: 2011. Adviser: Nathan W. Shier. "Title of dissertation home page (viewed Nov. 22, 2006)."
APA, Harvard, Vancouver, ISO, and other styles
50

Hellman, Hanna. "Data Aggregation in Time Sensitive Multi-Sensor Systems : Study and Implementation of Wheel Data Aggregation for Slip Detection in an Autonomous Vehicle Convoy." Thesis, KTH, Mekatronik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217857.

Full text
Abstract:
With an impending shift to more advanced safety systems and driver assistance (ADAS) in the vehicles we drive, and also towards increased autonomy, comes an increased amount of data on the internal vehicle data bus. There is a need to lessen the amount of data and at the same time increase its value. Data aggregation, often applied in the fields of environmental sensing and small mobile robots (WMRs), could be a partial solution. This thesis investigates an aggregation strategy applied to a use case concerning slip detection in a vehicle convoy. The approach was implemented in a physical demonstrator, in the shape of a small autonomous vehicle convoy, to produce quantitative data. The results imply that a weighted adaptive average can be used for vehicle velocity estimation based on the input of four individual wheel velocities. Thereafter a slip ratio can be calculated, which is used to decide whether or not slip exists. A limitation of the proposed approach is, however, the number of velocity references needed, since the results currently apply to one-wheel slip on a four-wheel vehicle. A proposed future direction related to the use case of convoy driving is to include platooning vehicles as extra velocity references for the vehicles in the convoy, thus increasing the accuracy of the slip detection and merging the areas of CO-CPS and data aggregation.
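A minimal sketch of the two steps this abstract describes: a coherence-weighted average of the four wheel speeds estimates the vehicle velocity, and a per-wheel slip ratio flags slip. The weighting rule, slip-ratio form, and 0.2 threshold are illustrative assumptions rather than the demonstrator's actual parameters.

```python
import numpy as np

def estimate_vehicle_speed(wheel_speeds, weights=None, adapt_gain=2.0):
    """Coherence-weighted vehicle speed estimate from individual wheel speeds.

    Wheels that deviate from the current consensus (e.g. a spinning wheel on ice)
    get their weight reduced, so the estimate leans on the coherent wheels.
    """
    wheel_speeds = np.asarray(wheel_speeds, dtype=float)
    weights = np.ones_like(wheel_speeds) if weights is None else np.asarray(weights, float)
    consensus = np.average(wheel_speeds, weights=weights)
    weights = weights / (1.0 + adapt_gain * np.abs(wheel_speeds - consensus))
    weights = weights / weights.sum()
    return float(np.average(wheel_speeds, weights=weights)), weights

def slip_ratio(wheel_speed, vehicle_speed, eps=1e-6):
    """Longitudinal slip ratio: ~0 when rolling freely, approaching 1 when spinning."""
    denom = max(abs(wheel_speed), abs(vehicle_speed), eps)
    return (wheel_speed - vehicle_speed) / denom

wheel_speeds = [1.02, 0.98, 1.00, 1.55]          # m/s; the fourth wheel is spinning
v_est, w = estimate_vehicle_speed(wheel_speeds)
slipping = [abs(slip_ratio(ws, v_est)) > 0.2 for ws in wheel_speeds]
print(round(v_est, 3), slipping)                 # only the fourth wheel is flagged
```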
APA, Harvard, Vancouver, ISO, and other styles