Dissertations / Theses on the topic 'Data'
Consult the top 50 dissertations / theses for your research on the topic 'Data.'
Riminucci, Stefania. "COVID-19, Open data e data visualization: interazione con dati epidemiologici." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21577/.
Mondaini, Luca. "Data Visualization di dati spazio-temporali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16853/.
Yu, Wenyuan. "Improving data quality : data consistency, deduplication, currency and accuracy." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8899.
Long, Christopher C. "Data Processing for NASA's TDRSS DAMA Channel." International Foundation for Telemetering, 1996. http://hdl.handle.net/10150/611474.
Presently, NASA's Space Network (SN) cannot receive random messages from the satellites that use it. Service must be scheduled by the spacecraft's owner through Goddard Space Flight Center (GSFC). NASA's goal is to improve the current system so that random messages generated on board a satellite can be received by the SN. These messages will be requests for service that the satellite's control system deems necessary. They will then be forwarded to the spacecraft's owner, where appropriate action and scheduling can take place. This new service is known as the Demand Assignment Multiple Access system (DAMA).
Budd, Chris. "Data Protection and Data Elimination." International Foundation for Telemetering, 2015. http://hdl.handle.net/10150/596395.
Data security is becoming increasingly important in all areas of storage. News outlets frequently report on lost or stolen storage devices and the panic this causes. Data security in an SSD usually involves two components: data protection and data elimination. Data protection includes passwords to guard against unauthorized access and encryption to prevent data from being recovered directly from the flash chips. Data elimination includes erasing the encryption key and erasing the flash. Telemetry applications frequently add requirements such as write protection, external erase triggers, and overwriting the flash after the erase. This presentation reviews these data security features.
Furrier, Sean Alexander. "Communicating Data: Data-Driven Storytelling." Thesis, The University of Arizona, 2017. http://hdl.handle.net/10150/624989.
Chitondo, Pepukayi David Junior. "Data policies for big health data and personal health data." Thesis, Cape Peninsula University of Technology, 2016. http://hdl.handle.net/20.500.11838/2479.
Health information policies are increasingly becoming a key feature in directing information usage in healthcare. Since the passing of the Health Information Technology for Economic and Clinical Health (HITECH) Act in 2009 and the Affordable Care Act (ACA) in 2010 in the United States, there has been an increase in health-system innovation. Accompanying this wave of innovation is the current buzz concept in information technology, "big data". Big data holds great potential, even more so in healthcare, where the accuracy of data is life-critical. How big health data can be used to improve health is now the goal of the current health informatics practitioner. Even more exciting is the amount of health data being generated by patients via personal handheld devices and other technologies that exclude the healthcare practitioner. This patient-generated data is also known as Personal Health Records (PHRs). To achieve meaningful use of PHRs, and of healthcare data in general, through big data, several hurdles have to be overcome. First and foremost is the privacy and confidentiality of the patients whose data is concerned. Second is the perceived trustworthiness of PHRs among healthcare practitioners. Other issues to take into account are data rights and ownership, data suppression, IP protection, data anonymisation and re-identification, information flow and regulations, and consent biases. This study sought to understand the role of data policies in the process of data utilisation in the healthcare sector, with particular interest in the utilisation of PHRs as part of big health data.
Braschi, Giacomo. "La circolazione dei dati e l'analisi big data." Doctoral thesis, Università degli studi di Pavia, 2019. http://hdl.handle.net/11571/1244327.
A description of the legal instruments that regulate the circulation of data, and an analysis of possible legislative developments that would be desirable to favor it.
Perovich, Laura J. (Laura Jones). "Data Experiences : novel interfaces for data engagement using environmental health data." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/95612.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 71-81).
For the past twenty years, the data visualization movement has reworked the way we engage with information. It has brought fresh excitement to researchers and reached broad audiences. But what comes next for data? I seek to create example "Data Experiences" that will contribute to developing new spaces of information engagement. Using data from Silent Spring Institute's environmental health studies as a test case, I explore Data Experiences that are immersive, interactive, and aesthetic. Environmental health datasets are ideal for this application as they are highly relevant to the general population and have appropriate complexity. Dressed in Data will focus on the experience of an individual with her/his own environmental health data while BigBarChart focuses on the experience of the community with the overall dataset. Both projects seek to present opportunities for nontraditional learning, community relevance, and social impact.
S.M.
Wang, Yi. "Data Management and Data Processing Support on Array-Based Scientific Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436157356.
Dedge, Parks Dana M. "Defining Data Science and Data Scientist." Thesis, University of South Florida, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10639701.
The world's datasets are growing exponentially every day because of the large number of devices generating data residue across the multitude of global data centers. What to do with these massive data stores, how to manage them, and who performs these tasks have not been adequately defined or agreed upon by academics and practitioners. Data science is a cross-disciplinary amalgam of skills, techniques, and tools that allow business organizations to identify trends and build assumptions that lead to key decisions. It is still evolving as new technologies and capabilities are developed and deployed. This document defines the data science tasks, and the data scientist skills, needed to succeed with analytics across these data stores. Research drawing on twenty-two academic articles, one book, eleven interviews, and seventy-eight surveys is combined to articulate the convergence on the term data science. In addition, the research identified five key skill categories (themes) comprising fifty-five competencies that data scientists use globally to perform the art and science of data science.
Unspecified proportions of statistics, technology programming, model development, and calculation are combined to determine outcomes that lead global organizations to make strategic decisions every day.
This research is intended to provide a constructive summary of the topics data science and data scientist, in order to spark a dialogue toward formally finalizing the definitions and, ultimately, changing the world by establishing set guidelines for how data science is performed and measured.
Proskurnia, Iuliia. "Genium Data Store : Distributed Data store." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-141552.
Morshedzadeh, Iman. "Data Classification in Product Data Management." Thesis, Högskolan i Skövde, Institutionen för teknik och samhälle, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14651.
Dedge, Parks Dana M. "Defining Data Science and Data Scientist." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/7014.
Strand, Mattias. "External Data Incorporation into Data Warehouses." Doctoral thesis, Dept. of Computer and System Sciences, Stockholm University (Kista), and School of Humanities and Informatics, University of Skövde (Skövde), 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-660.
Radhakrishnan, Radhika. "Genome data modeling and data compression." 2007. Abstract and full-text PDF (free order and download for UNR users only). http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1447611.
Abedjan, Ziawasch. "Improving RDF data with data mining." PhD thesis, Universität Potsdam, 2014. http://opus.kobv.de/ubp/volltexte/2014/7133/.
Linked Open Data (LOD) comprises many, often very large, public datasets and knowledge bases, mostly in the RDF triple structure of subject, predicate, and object, where each triple represents a fact. Unfortunately, the heterogeneity of the available public data requires significant integration steps before the data can be used in applications. Metadata such as ontological structures and range definitions of predicates are desirable and ideally available through a knowledge base. However, knowledge bases in the context of LOD are often incomplete or simply unavailable. It is therefore useful to be able to generate meta-information automatically, such as ontological dependencies, range and domain definitions, and topical associations of resources. A new and promising technique for examining such data is based on discovering association rules, which was originally applied to sales analysis in transactional databases. We designed an adaptation of this technique to RDF data and introduce the concept of mining configurations, which allows us to detect patterns in RDF data in different ways. Different configurations let us discover schema and value relationships that can be used for interesting applications. Accordingly, we present association-based methods for predicate suggestion, data completion, ontology improvement, and query facilitation. Predicate suggestion addresses the problem of inconsistent ontology usage by proposing a ranked list of suitable predicates to a user who wants to add a new fact to an RDF dataset. Combining different configurations extends this method so that entirely new facts for a knowledge base are generated automatically.
Here we present two approaches: a user-driven approach, in which a user selects the entity to be extended, and a data-driven approach, in which an algorithm itself selects the entities to be extended with missing facts. Because knowledge bases continually grow and change, another way to facilitate the use of RDF data is ontology improvement. For this we present an association-rule-based method that reconciles the data with the underlying ontologies. By interleaving different configurations, we derive a new algorithm that discovers synonymous predicates. These predicates can be used to expand query results or to support a user while formulating a query. For each of our applications, we present a wide range of experiments on real-world datasets. The experiments and evaluations show the added value of association rule generation for the integration and usability of RDF data and confirm the appropriateness of our configuration-based methodology for deriving such rules.
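The predicate-suggestion idea described in this abstract can be illustrated with a minimal sketch. It assumes one particular mining configuration (each subject is a transaction and its predicates are the items) and ranks candidate predicates by the confidence of single-item rules; the function names and toy triples are hypothetical, and this is not the thesis's actual implementation.

```python
from collections import defaultdict
from itertools import permutations


def predicate_sets(triples):
    """Group triples by subject: each subject becomes a 'transaction'
    whose items are the predicates it uses (an assumed configuration)."""
    tx = defaultdict(set)
    for s, p, _o in triples:
        tx[s].add(p)
    return tx


def rule_confidences(transactions):
    """Confidence of every single-item rule p1 -> p2,
    i.e. support({p1, p2}) / support({p1})."""
    single = defaultdict(int)
    pair = defaultdict(int)
    for preds in transactions.values():
        for p in preds:
            single[p] += 1
        for p1, p2 in permutations(preds, 2):
            pair[(p1, p2)] += 1
    return {(p1, p2): n / single[p1] for (p1, p2), n in pair.items()}


def suggest_predicates(existing, confidences, min_conf=0.5):
    """Rank predicates an entity does not use yet by the highest confidence
    of any rule whose antecedent the entity already has."""
    scores = {}
    for (p1, p2), conf in confidences.items():
        if p1 in existing and p2 not in existing and conf >= min_conf:
            scores[p2] = max(scores.get(p2, 0.0), conf)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy triples (subject, predicate, object).
triples = [
    ("alice", "name", "Alice"), ("alice", "birthPlace", "Berlin"), ("alice", "birthDate", "1970"),
    ("bob", "name", "Bob"), ("bob", "birthPlace", "Paris"), ("bob", "birthDate", "1980"),
    ("carol", "name", "Carol"), ("carol", "birthPlace", "Rome"),
]
conf = rule_confidences(predicate_sets(triples))
print(suggest_predicates({"name", "birthPlace"}, conf))  # prints [('birthDate', 0.6666666666666666)]
```

Two of the three subjects that have `name` also have `birthDate`, so a new entity described only by `name` and `birthPlace` is offered `birthDate` with confidence 2/3. Other configurations (for example, grouping by object instead of subject) would yield different kinds of rules.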
Fahmy, A. "Data encryption of communication data links." Thesis, University of Kent, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.385199.
Sun, Wenjun. "Parallel data processing for semistructured data." Thesis, London South Bank University, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.434394.
Wiemann, Stefan. "Data Fusion in Spatial Data Infrastructures." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-216985.
Over the last decade, the development of the Internet has substantially improved the availability and public awareness of geodata, as well as the possibilities for collecting and using it. This is due both to the establishment of official spatial data infrastructures (SDI) and to the growing number of community-based and commercial offerings. However, because the focus is mostly on providing geodata, there are hardly any ways to link and merge the many datasets distributed across the Internet ad hoc, which sometimes leads to isolated geodata holdings. Yet the ability to fuse them is essential for extracting information that supports decisions on spatio-temporal questions. To enable ad hoc fusion of geodata on the Internet, this thesis addresses two main topics. First, a service-based implementation of the fusion process is designed to extend existing SDIs functionally. To this end, well-defined, reusable function blocks are described and provided via standardized service interfaces. This enables the dynamic composition of application-specific fusion processes over the Internet. Furthermore, geoprocessing patterns are defined to describe popular and frequently used service chains for handling specific subtasks of geodata fusion and to simplify the composition and automation of fusion processes. As its second focus, the thesis addresses how relations between geodata holdings on the Internet can be created, described, and used. The chosen approach is based on Linked Data principles and bridges service-oriented SDIs and the Semantic Web.
While the geodata thus remain in existing SDIs, Semantic Web tools and standards can be used to derive information from the identified geodata relations. To validate the developed concepts, a series of use cases was designed, implemented with a prototype, and then evaluated. The emphasis was on an interoperable, transparent, and extensible implementation of service-based fusion processes and on a formalized description of data relations, using open and established standards. The software follows a modular structure and is freely available as open source. It supports both the development of new functionality by developers and the integration of existing data and processing services during the composition of a fusion process.
Virinchi, Billa. "Data Visualization of Telenor mobility data." Thesis, Blekinge Tekniska Högskola, Institutionen för kommunikationssystem, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13951.
Gullipalli, Deep Kumar. "Data envelopment analysis with sparse data." Thesis, Kansas State University, 2011. http://hdl.handle.net/2097/13092.
Department of Industrial & Manufacturing Systems Engineering
David H. Ben-Arieh
The quest for continuous improvement among organizations and the problem of missing data in data analysis are never-ending. This thesis brings these two topics under one roof, i.e., it evaluates the productivity of organizations with sparse data. The study focuses on Data Envelopment Analysis (DEA) to determine the efficiency of 41 member clinics of the Kansas Association of Medically Underserved (KAMU) with missing data. The primary focus of this thesis is to develop new, reliable methods to determine the missing values and to execute DEA. DEA is a linear programming methodology for evaluating the relative technical efficiency of homogeneous Decision Making Units using multiple inputs and outputs. The effectiveness of DEA depends on the quality and quantity of the data used. DEA outcomes are susceptible to missing data, which creates a need to supplement sparse data in a reliable manner. Determining missing values more precisely improves the robustness of the DEA methodology. This thesis proposes three methods for determining the missing values, each based on a different platform. The first, the Average Ratio Method (ARM), uses the average of all the ratios between two variables. The second is based on a modified Fuzzy C-Means clustering algorithm that can handle missing data; the issues associated with this clustering algorithm are resolved to improve its effectiveness. The third is based on an interval approach: missing values are replaced by interval ranges estimated by experts, and crisp efficiency scores are identified along similar lines to how DEA determines efficiency scores using the best set of weights. There is no unique way to evaluate the effectiveness of these methods, so they are tested by taking a complete dataset and treating varying levels of its data as missing. The best set of recovered missing values, based on the above methods, then serves as the input for executing DEA.
Results show that the DEA efficiency scores generated with recovered values are within close proximity of the actual efficiency scores that would be generated with the complete data. In summary, this thesis provides an effective and practical approach for replacing missing values needed for DEA.
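One plausible reading of the Average Ratio Method summarized in this abstract is sketched below: compute the ratio y/x on every record where both variables are observed, average those ratios, and impute a missing y as that average times the record's observed x. This interpretation, the function name, and the sample numbers are assumptions for illustration, not the thesis's exact procedure.

```python
from statistics import mean
from typing import List, Optional


def average_ratio_impute(x: List[float], y: List[Optional[float]]) -> List[float]:
    """Fill missing y values (None) using the mean of y/x over complete pairs.

    Assumes x is fully observed; pairs with x == 0 are skipped when
    computing the average ratio.
    """
    ratios = [yi / xi for xi, yi in zip(x, y) if yi is not None and xi != 0]
    if not ratios:
        raise ValueError("need at least one complete (x, y) pair")
    avg_ratio = mean(ratios)
    # Observed values are kept; each gap is estimated from its own x.
    return [yi if yi is not None else avg_ratio * xi for xi, yi in zip(x, y)]


# Hypothetical clinic data: x = staff FTEs, y = patient visits (thousands).
staff = [10.0, 20.0, 15.0, 8.0]
visits = [5.0, 12.0, None, 4.0]
print(average_ratio_impute(staff, visits))
```

Here the complete pairs give ratios 0.5, 0.6, and 0.5, so the missing value is estimated at about 15 × 0.533 ≈ 8. The completed input/output table would then be fed to the DEA model.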
Kostopoulos, A. "Combinatorial data analysis for data ordering." Thesis, University of Liverpool, 2016. http://livrepository.liverpool.ac.uk/3004631/.
Tata, Maitreyi. "Data analytics on Yelp data set." Kansas State University, 2017. http://hdl.handle.net/2097/38237.
Department of Computing and Information Sciences
William H. Hsu
In this report, I describe a query-driven system that helps decide which restaurant to invest in, or which area is a good place to open a new restaurant. The analysis is performed on existing businesses in every state, based on factors such as the average star rating, the total number of reviews for a restaurant, and its price range. The results indicate which restaurants in a city are successful, which helps in deciding where to invest and what to keep in mind when starting a new business. The main focus of the project is analytics and data visualization.
Liu, Tantan. "Data Mining over Hidden Data Sources." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1343313341.
Yang, Ying. "Interactive Data Management and Data Analysis." Thesis, State University of New York at Buffalo, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10288109.
Everyone today has a big data problem. Data is everywhere and comes in different forms, variously described as data lakes, data streams, or data swamps. To extract knowledge or insights from the data, or to support decision-making, we need to go through a process of collecting, cleaning, managing, and analyzing it. In this process, data cleaning and data analysis are two of the most important and time-consuming components.
One common challenge in these two components is a lack of interaction. Data cleaning and data analysis are typically done as a batch process, operating on the whole dataset without any feedback. This leads to long, frustrating delays during which users have no idea whether the process is effective. Without interaction, human experts must decide which algorithms or parameters these two components should use.
We should teach computers to talk to humans, not the other way around. This dissertation focuses on building systems, Mimir and CIA, that help users conduct data cleaning and analysis through interaction. Mimir is a system that lets users clean big data in a cost- and time-efficient way through interaction, a process I call on-demand ETL. Convergent inference algorithms (CIA) are a family of inference algorithms for probabilistic graphical models (PGMs) that, through interaction, enjoy the benefits of both exact and approximate inference.
Mimir provides a general language for users to express different data cleaning needs. It acts as a shim layer that wraps around the database, making it possible for the bulk of the ETL process to remain within a classical deterministic system. Mimir also helps users measure the quality of an analysis result and ranks cleaning tasks so that result quality can be improved cost-efficiently. CIA focuses on providing user interaction throughout the process of inference in PGMs. The goal of CIA is to free users from an upfront commitment to either approximate or exact inference and to give them more control over time/accuracy trade-offs, to direct decision-making and the allocation of computation instances. This dissertation describes the Mimir and CIA frameworks to demonstrate that it is feasible to build efficient interactive data management and data analysis systems.
Taylor, Phillip. "Data mining of vehicle telemetry data." Thesis, University of Warwick, 2015. http://wrap.warwick.ac.uk/77645/.
Yao, Fang. "Functional data analysis for longitudinal data." 2003. Electronic version available via the Digital Dissertations database; access restricted to UC campuses, where it is free. http://uclibs.org/PID/11984.
Ramljak, Dusan. "Data Driven High Performance Data Access." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/530207.
Ph.D.
Low-latency, high-throughput mechanisms for retrieving data become increasingly crucial as cyber and cyber-physical systems pour out growing amounts of data that often must be analyzed online. Generally, as data volume increases, the marginal utility of an "average" data item tends to decline, which requires greater effort to identify the most valuable data items and make them available with minimal overhead. We believe that mechanisms driven by data analytics have a big role to play in solving this needle-in-the-haystack problem. We rely on the claim that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many applications, offers significant potential to enable many I/O optimizations. Our research covers exploiting the storage hierarchy for data-driven caching and tiering, reducing the distance between data and computations, removing redundancy in data, using sparse representations of data, the impact of data access mechanisms on resilience, energy consumption, and storage usage, and enabling new classes of data-driven applications. For caching and prefetching, we offer a powerful model that separates the process of access prediction from the data retrieval mechanism. Predictions are made per data entity and use the notion of "context" and its aspects, such as "belief", to uncover and leverage future data needs. This approach allows truly opportunistic use of predictive information. We elaborate on which aspects of context we use in areas other than caching and prefetching, and why each is appropriate in its situation. We present in more detail the methods we have developed: BeliefCache for data-driven caching and prefetching, and AVSC for pattern-mining-based compression of data.
In BeliefCache, we use a belief, an aspect of context representing an estimate of the probability that a storage element will be needed, within a modular framework that makes unified, informed decisions about that element or a group of elements. For the workloads we examined, we were able to capture complex non-sequential access patterns better than a state-of-the-art framework for optimizing cloud storage gateways. Moreover, our framework adjusts to variations in the workload faster. It also does not require a static workload to be effective, since the modular framework can discover and adapt to changes in the workload. In AVSC, we use an aspect of context to gauge the similarity of events, and compress the data by keeping relevant events intact and approximating the others. We do this in two stages: we first generate a summary of the data, then approximately match the remaining events with existing patterns where possible, or add new patterns to the summary otherwise. We show gains over plain lossless compression for a specified level of accuracy in identifying the state of the system, and a clear trade-off between compressibility and fidelity. For the other research areas mentioned, we present challenges and opportunities in the hope that they will spur researchers to further examine these issues in the space of rapidly emerging data-intensive applications. We also discuss how our research in other domains could be applied in our efforts to provide high-performance data access.
Temple University--Theses
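To make the separation between access prediction and data retrieval concrete, the following is a minimal sketch of belief-driven cache eviction. It is not BeliefCache itself: the first-order "which item usually follows the current one" model used as the belief, and the class names, are hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict


class SuccessorPredictor:
    """Hypothetical predictor: belief that an item is needed next, estimated
    from how often it followed the most recent access (first-order model)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def observe(self, item):
        if self.prev is not None:
            self.counts[self.prev][item] += 1
        self.prev = item

    def belief(self, item):
        followers = self.counts[self.prev] if self.prev is not None else {}
        total = sum(followers.values())
        return followers.get(item, 0) / total if total else 0.0


class BeliefEvictingCache:
    """Fixed-size cache that evicts the entry with the lowest belief.
    Prediction lives in the predictor object, so it can be swapped out
    without touching the retrieval logic."""

    def __init__(self, capacity, predictor):
        self.capacity = capacity
        self.predictor = predictor
        self.store = set()
        self.hits = 0

    def access(self, item):
        hit = item in self.store
        self.predictor.observe(item)  # update the model before deciding
        if hit:
            self.hits += 1
            return True
        if len(self.store) >= self.capacity:
            # Evict what is least likely to be needed right after `item`
            # (ties broken by key so the behaviour is deterministic).
            victim = min(self.store, key=lambda k: (self.predictor.belief(k), k))
            self.store.remove(victim)
        self.store.add(item)
        return False


cache = BeliefEvictingCache(2, SuccessorPredictor())
for item in "abcabcabc":
    cache.access(item)
print(cache.hits)  # prints 3
```

On this cyclic pattern the sketch gets 3 hits out of 9, whereas an LRU cache with two slots would miss every time. The point is only that eviction is driven by a pluggable probability estimate; any other predictor exposing `observe` and `belief` could be dropped in.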
Sherikar, Vishnu Vardhan Reddy. "I2MAPREDUCE: DATA MINING FOR BIG DATA." CSUSB ScholarWorks, 2017. https://scholarworks.lib.csusb.edu/etd/437.
Mathew, Avin D. "Asset management data warehouse data modelling." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/19310/1/Avin_Mathew_Thesis.pdf.
Mathew, Avin D. "Asset management data warehouse data modelling." Queensland University of Technology, 2008. http://eprints.qut.edu.au/19310/.
Niggemann, Oliver. "Visual data mining of graph based data." [S.l. : s.n.], 2001. http://deposit.ddb.de/cgi-bin/dokserv?idn=962400505.
Peralta, Veronika. "Data Quality Evaluation in Data Integration Systems." PhD thesis, Université de Versailles-Saint Quentin en Yvelines, 2006. http://tel.archives-ouvertes.fr/tel-00325139.
Sánchez, Adam. "Big Data, Linked Data y Web semántica." Universidad Peruana de Ciencias Aplicadas (UPC), 2016. http://hdl.handle.net/10757/620705.
A lecture addressing aspects of the Linked Data protocol, Big Data topics, and the Semantic Web.
Nyström, Simon, and Joakim Lönnegren. "Processing data sources with big data frameworks." Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188204.
Big data is a rapidly growing concept. As more and more data is generated and collected, there is an increasing need for efficient solutions that can process all this data in an attempt to extract value from it. The aim of this degree project is to find an efficient way to quickly process a large number of relatively small files. More specifically, it tests two frameworks that can be used for big data processing. The two frameworks tested against each other are Apache NiFi and Apache Storm. A method is described for, first, constructing a data flow and, second, testing the performance and scalability of the frameworks running that data flow. The results reveal that Apache Storm is faster than NiFi on the type of test that was performed. When the number of nodes in the tests was increased, performance did not always improve. This shows that adding nodes to a big data processing chain does not always lead to better performance, and that other measures are sometimes required to increase it.
Li, Liangchun. "Web-based data visualization for data mining." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ35845.pdf.
Tran, Viet-Trung. "Scalable data-management systems for Big Data." PhD thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00920432.
Zebbiche, K. "Data Hiding for Securing Fingerprint Data Access." Thesis, Queen's University Belfast, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.517622.
Jägerhult, Fjelberg Marianne. "Predicting data traffic in cellular data networks." Thesis, KTH, Matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-169388.
Ross, Colin. "Applications of data fusion in data approximation." Thesis, University of Huddersfield, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.247372.
Conlin, Adrian Keith. "Complex sensor data analysis through data augmentation." Thesis, University of Newcastle Upon Tyne, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.320016.
Cao, Yang. "Querying big data with bounded data access." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/25421.
Mueller, G. "Data Consistency Checks on Flight Test Data." International Foundation for Telemetering, 2014. http://hdl.handle.net/10150/577405.
This paper reflects the principal results of a study performed internally by Airbus's flight test centers. The purpose of this study was to share the body of knowledge concerning data consistency checks among all Airbus business units. An analysis of the test process is followed by the identification of the process stakeholders involved in ensuring data consistency. The main part of the paper lists several different possibilities for improving data consistency; it is left to the reader's discretion to determine the appropriateness of these methods.
Fitzgerald, Alan. "DATA STORAGE SUITED TO FLIGHT DATA RECORDERS." International Foundation for Telemetering, 2004. http://hdl.handle.net/10150/605266.
Oladele, Kazeem Ayinde. "Investigating pluralistic data architectures in data warehousing." Thesis, Brunel University, 2015. http://bura.brunel.ac.uk/handle/2438/10534.
Peralta, Costabel Veronika del Carmen. "Data quality evaluation in data integration systems." Versailles-St Quentin en Yvelines, 2006. http://www.theses.fr/2006VERS0020.
This thesis addresses data quality in Data Integration Systems (DIS). More precisely, we are interested in the problems of evaluating the quality of the data delivered to users in response to their queries and of satisfying users' quality requirements. We also analyze the use of quality measures to improve the design of the DIS and, consequently, the quality of the data. Our approach consists of studying one quality factor at a time: analyzing its relationship with the DIS, proposing techniques for its evaluation, and proposing actions for its improvement. Among the quality factors that have been proposed, this thesis analyzes two: data freshness and data accuracy.
Zhang, Yihua. "ON DATA UTILITY IN PRIVATE DATA PUBLISHING." Miami University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=miami1272986770.
Modur, Sharada P. "Missing Data Methods for Clustered Longitudinal Data." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1274876785.
Zha, Xiao. "Topological Data Analysis on Road Network Data." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu155563664988436.