Doctoral dissertations on the topic "Genomics Big Data Engineering"
Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles
Browse the 50 best doctoral dissertations on the topic "Genomics Big Data Engineering".
An "Add to bibliography" button is available next to each work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a .pdf file and read its abstract online whenever the corresponding details are provided in the work's metadata.
Browse doctoral dissertations from many different disciplines and compile your bibliography correctly.
Goldstein, Theodore C. "Tools for extracting actionable medical knowledge from genomic big data". Thesis, University of California, Santa Cruz, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3589324.
Full text source
Cancer is an ideal target for personal genomics-based medicine that uses high-throughput genome assays such as DNA sequencing, RNA sequencing, and expression analysis (collectively called omics); however, researchers and physicians are overwhelmed by the quantities of big data from these assays and cannot interpret this information accurately without specialized tools. To address this problem, I have created software methods and tools called OCCAM (OmiC data Cancer Analytic Model) and DIPSC (Differential Pathway Signature Correlation) for automatically extracting knowledge from these data and turning them into an actionable knowledge base called the activitome. An activitome signature measures a mutation's effect on the cellular molecular pathway. Activitome signatures can also be computed for clinical phenotypes. By comparing the vectors of activitome signatures of different mutations and clinical outcomes, intrinsic relationships between these events may be uncovered. OCCAM identifies activitome signatures that can be used to guide the development and application of therapies. DIPSC overcomes the confounding problem of correlating multiple activitome signatures from the same set of samples. In addition, to support the collection of this big data, I have developed MedBook, a federated distributed social network designed as a medical research and decision support system. OCCAM and DIPSC are two of the many apps that will operate inside MedBook. MedBook extends the Galaxy system with a signature database, an end-user-oriented application platform, a rich-data medical knowledge-publishing model, and the Biomedical Evidence Graph (BMEG). The goal of MedBook is to improve outcomes by learning from every patient.
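For readers unfamiliar with the technique, the following toy sketch illustrates the kind of signature comparison DIPSC performs: correlating pathway-level activity vectors across events. The vectors, event names, and the choice of Pearson correlation are invented for illustration and are not taken from the thesis.

    import numpy as np

    # Hypothetical activitome signatures: one pathway-activity vector per event
    # (mutation or clinical phenotype), measured over the same set of pathways.
    signatures = {
        "TP53_mut":     np.array([0.9, -0.2, 0.4, 0.1]),
        "KRAS_mut":     np.array([0.8, -0.1, 0.5, 0.0]),
        "poor_outcome": np.array([0.7, -0.3, 0.6, 0.2]),
    }

    # Pairwise Pearson correlation between signature vectors; a high correlation
    # hints at an intrinsic relationship between the underlying events.
    names = list(signatures)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = np.corrcoef(signatures[a], signatures[b])[0, 1]
            print(f"{a} vs {b}: r = {r:.2f}")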
Miller, Chase Allen. "Towards a Web-Based, Big Data, Genomics Ecosystem". Thesis, Boston College, 2014. http://hdl.handle.net/2345/bc-ir:104052.
Full text source
Rapid advances in genome sequencing enable a wide range of biological experiments on a scale that was until recently restricted to large genome centers. However, the analysis of the resulting vast genomic datasets is time-consuming and unintuitive, and it requires considerable computational expertise and costly infrastructure. Collectively, these factors effectively exclude many bench biologists from genome-scale analyses. Web-based visualization and analysis libraries, frameworks, and applications were developed to empower all biological researchers to analyze the large biomedical datasets essential to their research easily, interactively, and in a visually driven manner, without bioinformatics expertise or costly hardware.
Thesis (PhD) — Boston College, 2014
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
Hansen, Simon, and Erik Markow. "Big Data : Implementation av Big Data i offentlig verksamhet". Thesis, Högskolan i Halmstad, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-38756.
Full text source
Kämpe, Gabriella. "How Big Data Affects User Experience : Reducing cognitive load in big data applications". Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163995.
Full text source
Luo, Changqing. "Towards Secure Big Data Computing". Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1529929603348119.
Full text source
Schobel, Seth Adam Micah. "The viral genomics revolution: Big data approaches to basic viral research, surveillance, and vaccine development". Thesis, University of Maryland, College Park, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10011480.
Full text source
Since the decoding of the first RNA virus in 1976, the field of viral genomics has exploded, first through the use of Sanger sequencing technologies and later with the use of next-generation sequencing approaches. With the development of these sequencing technologies, viral genomics has entered an era of big data, and new challenges for analyzing these data are now apparent. Here, we describe novel methods to extend the current capabilities of viral comparative genomics. Through the use of antigenic distancing techniques, we have examined the relationship between the antigenic phenotype and the genetic content of influenza virus to establish a more systematic approach to viral surveillance and vaccine selection. Distancing of Antigenicity by Sequence-based Hierarchical Clustering (DASH) was developed and used to perform a retrospective analysis of 22 influenza seasons. Our methods produced vaccine candidates identical to, or with a high concordance of antigenic similarity with, those selected by the WHO. In a second effort, we have developed VirComp and OrionPlot, two independent yet related tools. These tools first generate gene-based genome constellations, or genotypes, of viral genomes, and second create visualizations of the resulting genome constellations. VirComp uses sequence-clustering techniques to infer genome constellations and prepares genome-constellation data matrices for visualization with OrionPlot. OrionPlot is a Java application for tailoring genome-constellation figures for publication. OrionPlot allows color selection of gene-cluster assignments, customized box sizes to enable the visualization of gene comparisons based on sequence length, and label coloring. We have provided five analyses designed as vignettes to illustrate the utility of our tools for performing viral comparative genomic analyses. A third study focused on the analysis of respiratory syncytial virus (RSV) genomes circulating during the 2012-2013 RSV season. We discovered a correlation between a recent tandem duplication within the G gene of RSV-A and a decrease in severity of infection. Our data suggest that this duplication is associated with a higher infection rate in female infants than is generally observed. Through these studies, we have extended the state of the art of genotype analysis and phenotype/genotype studies, and established correlations between clinical metadata and RSV sequence data.
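The DASH implementation itself is not reproduced in this listing, but the general pattern it names, hierarchical clustering of strains from pairwise sequence-based distances, can be sketched with SciPy as follows; the strain names and the distance matrix are invented.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    strains = ["A/H3N2/2010", "A/H3N2/2011", "A/H3N2/2014", "A/H3N2/2015"]

    # Hypothetical pairwise antigenic/sequence distances (symmetric, zero diagonal).
    dist = np.array([
        [0.0, 0.1, 0.6, 0.7],
        [0.1, 0.0, 0.5, 0.6],
        [0.6, 0.5, 0.0, 0.2],
        [0.7, 0.6, 0.2, 0.0],
    ])

    # Average-linkage hierarchical clustering on the condensed distance matrix.
    tree = linkage(squareform(dist), method="average")

    # Cut the tree into antigenic groups; one representative per group could
    # then be considered as a vaccine candidate.
    groups = fcluster(tree, t=0.4, criterion="distance")
    for strain, g in zip(strains, groups):
        print(strain, "-> antigenic group", g)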
Cheelangi, Madhusudan. "Result Distribution in Big Data Systems". Thesis, University of California, Irvine, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=1539891.
Full text source
We are building a Big Data Management System (BDMS) called AsterixDB at UCI. Since AsterixDB is designed to operate on large volumes of data, the results of its queries can be potentially very large, and AsterixDB is also designed to operate under high-concurrency workloads. As a result, we need a specialized mechanism to manage these large volumes of query results and deliver them to the clients. In this thesis, we present an architecture and an implementation of a new result distribution framework that is capable of handling large volumes of results under high-concurrency workloads. We present the various components of this result distribution framework and show how they interact with each other to manage large volumes of query results and deliver them to clients. We also discuss various result distribution policies that are possible with our framework and compare their performance through experiments.
We have implemented a REST-like HTTP client interface on top of the result distribution framework to allow clients to submit queries and obtain their results. This client interface provides two modes for clients to choose from to read their query results: synchronous mode and asynchronous mode. In synchronous mode, query results are delivered to a client as a direct response to its query within the same request-response cycle. In asynchronous mode, a query handle is returned instead to the client as a response to its query. The client can store the handle and send another request later, including the query handle, to read the result for the query whenever it wants. The architectural support for these two modes is also described in this thesis. We believe that the result distribution framework, combined with this client interface, successfully meets the result management demands of AsterixDB.
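A minimal client-side sketch of the two read modes described above, assuming a hypothetical endpoint and payload format (the real AsterixDB interface may differ in URLs, parameters, and responses):

    import time
    import requests

    BASE = "http://localhost:19002/query"  # hypothetical endpoint

    def query_sync(statement):
        # Synchronous mode: the result comes back within the same
        # request-response cycle as the query itself.
        return requests.post(BASE, json={"statement": statement, "mode": "sync"}).json()

    def query_async(statement, poll_interval=1.0):
        # Asynchronous mode: a handle is returned first; the client stores it
        # and asks for the result whenever it wants.
        handle = requests.post(BASE, json={"statement": statement, "mode": "async"}).json()["handle"]
        while True:
            resp = requests.get(BASE + "/result", params={"handle": handle})
            if resp.status_code == 200:   # result is ready
                return resp.json()
            time.sleep(poll_interval)     # not ready yet; poll again later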
Laurila, M. (Mikko). "Big data in Finnish financial services". Bachelor's thesis, University of Oulu, 2017. http://urn.fi/URN:NBN:fi:oulu-201711243156.
Full text source
The aim of this thesis is to examine the concept of big data and to develop an understanding of the big data maturity of the Finnish financial sector. The research questions are: "What kinds of big data solutions have been implemented in the financial sector in Finland?" and "Which factors slow down the implementation of big data solutions in the financial sector in Finland?". As a concept, big data is usually associated with enormous masses of data and economies of scale, which makes it an interesting topic to study in the Finnish context, where the size of datasets is to some extent limited by the size of the market. The thesis presents a definition of big data based on the literature, along with a summary of the application of big data in Finland based on earlier research. A qualitative analysis of publicly available information on the use of big data in the Finnish financial sector was carried out. The results show that big data is being exploited to some extent in the financial sector in Finland, at least in large organizations. Solutions specific to the financial sector include, for example, the automation of application-processing workflows. The clearest factors slowing the implementation of big data solutions are the shortage of skilled labor and the pressure that new regulations place on development resources. The thesis thus forms an overall picture of the use of big data in the Finnish financial sector. The study is based on an analysis of public material, which lays a foundation for further research on the topic; in the future, interviews could deepen this knowledge further.
Flike, Felix, and Markus Gervard. "BIG DATA-ANALYS INOM FOTBOLLSORGANISATIONER : En studie om big data-analys och värdeskapande". Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20117.
Full text source
Nyström, Simon, and Joakim Lönnegren. "Processing data sources with big data frameworks". Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188204.
Full text source
Big data is a concept that is growing fast. As more and more data is generated and collected, there is a growing need for efficient solutions that can process all of this data in an attempt to extract value from it. The purpose of this thesis is to find an efficient way to quickly process a large number of relatively small files, and more specifically to test two frameworks that can be used for big data processing. The two frameworks tested against each other are Apache NiFi and Apache Storm. A method is described for, first, constructing a data flow and, second, constructing a procedure for testing the performance and scalability of the frameworks running that data flow. The results reveal that Apache Storm is faster than NiFi for the type of test performed. When the number of nodes in the tests was increased, performance did not always improve. This shows that increasing the number of nodes in a big data processing chain does not always lead to better performance, and that other measures are sometimes required.
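As a framework-agnostic illustration of the kind of measurement such a comparison requires, a toy throughput harness over many small files might look like the following; the directory, file format, and per-file work are placeholders, not the thesis's actual test setup.

    import time
    from pathlib import Path

    def process_file(path: Path) -> int:
        # Placeholder for whatever the dataflow does per file (parse, route, store).
        return len(path.read_bytes())

    def measure_throughput(files):
        start = time.perf_counter()
        for f in files:
            process_file(f)
        elapsed = time.perf_counter() - start
        return len(files) / elapsed  # files processed per second

    files = sorted(Path("data").glob("*.bin"))  # many relatively small files
    print(f"throughput: {measure_throughput(files):.1f} files/s")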
Adler, Philip David Felix. "Crystalline cheminformatics : big data approaches to crystal engineering". Thesis, University of Southampton, 2015. https://eprints.soton.ac.uk/410940/.
Full text source
Ohlsson, Anna, and Dan Öman. "A guide in the Big Data jungle". Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-1057.
Full text source
Al-Shiakhli, Sarah. "Big Data Analytics: A Literature Review Perspective". Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-74173.
Full text source
Hellström, Hampus, and Oscar Ohm. "Big Data - Stort intresse, nya möjligheter". Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20307.
Full text source
Today's information society consists of people, businesses and machines that together generate large amounts of data every day. This exponential growth in data generation has led to the creation of what we call Big Data. Among other things, the data produced, gathered and stored can be used by companies for knowledge-based business development. Traditionally, the methods used for generating knowledge about a business environment and market have been time-consuming and expensive, often carried out by a specialized research company conducting market research and surveys. Today the analysis of existing data sets is becoming increasingly valuable, and research companies have a great opportunity to mine value from society's huge amounts of data. The study is designed as an exploratory case study that investigates how research companies in Sweden work with these data sets and identifies some of the challenges they face in applying Big Data analysis in their business. The results show that the participating research companies use Big Data tools to streamline existing business processes and, to some extent, as a complement to traditional research and surveys. Although they see possibilities in the technology, the participating companies are unwilling to drive the development of new business processes supported by Big Data analysis. One identified challenge is the lack of competence in the Swedish market. The results also cover some of the ethical aspects research companies need to take into consideration; these are especially problematic when data that can be linked to an individual is processed and analysed in real time.
Full text source
Huttanus, Herbert M. "Screening and Engineering Phenotypes using Big Data Systems Biology". Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/102706.
Full text source
Doctor of Philosophy
Smith, Derik Lafayette, and Satya Prakash Dhavala. "Using big data for decisions in agricultural supply chain". Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/81106.
Full text source
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 53-54).
Agriculture is an industry where historical and current data abound. This paper investigates the numerous data sources available in the agricultural field and analyzes them for use in supply chain improvement. We identified applicable data and investigated methods of using these data to make better supply chain decisions within the agricultural chemical distribution chain. We identified a specific product, AgChem, for this study. AgChem, like many agricultural chemicals, is forecasted and produced months in advance of a very short sales window. With improved demand forecasting based on abundantly available data, Dow AgroSciences, the manufacturer of AgChem, can make better production and distribution decisions. We analyzed various data to identify factors that influence AgChem sales. Many of these factors relate to corn production, since AgChem is generally used with corn crops. Using regression models, we identified leading indicators that help forecast future demand for the product. We developed three regression models to forecast demand over various horizons. The first model identified that the price of corn and the price of fertilizer affect the annual, nation-wide demand for the product. The second model explains the expected geographic distribution of this annual demand. It shows that the number of retailers in an area is correlated with the total annual demand in that area. The model also quantifies the relationship between sales in the first few weeks of the season and total sales for the season. The third model serves as a short-term demand-sensing tool to predict the timing of demand within certain geographies. We found that weather conditions and the timing of harvest affect when AgChem sales occur. With these models, Dow AgroSciences has a better understanding of how external factors influence the sale of AgChem. With this new understanding, they can make better decisions about the distribution of the product and position inventory in a timely manner at the source of demand.
by Derik Lafayette Smith and Satya Prakash Dhavala.
M.Eng. in Logistics
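A toy version of the first model described above, regressing annual demand on corn and fertilizer prices, might look as follows; the figures are invented and scikit-learn is assumed, not the authors' actual tooling.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Invented historical data: corn price, fertilizer price -> annual demand.
    X = np.array([[3.2, 410], [3.7, 450], [4.1, 500], [4.8, 540], [5.0, 580]])
    y = np.array([100, 112, 121, 138, 143])  # demand in arbitrary units

    model = LinearRegression().fit(X, y)
    print("coefficients:", model.coef_, "intercept:", model.intercept_)

    # Forecast next season's nation-wide demand from expected prices.
    print("forecast:", model.predict(np.array([[4.5, 520]]))[0])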
Lu, Feng. "Big data scalability for high throughput processing and analysis of vehicle engineering data". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-207084.
Full text source
Stjerna, Albin. "Medium Data on Big Data: Predicting Disk Failures in CERN's NetApp-based Data Storage System". Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-337638.
Full text source
Bao, Shunxing. "Algorithmic Enhancements to Data Colocation Grid Frameworks for Big Data Medical Image Processing". Thesis, Vanderbilt University, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13877282.
Full text source
Large-scale medical imaging studies to date have predominantly leveraged in-house, laboratory-based, or traditional grid computing resources for their computing needs, where the applications often use hierarchical data structures (e.g., Network File System file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. Results for laboratory-based approaches reveal that performance is impeded by standard network switches, since typical processing can saturate network bandwidth during transfer from storage to processing nodes for even moderate-sized studies. On the other hand, the grid may be costly to use due to the dedicated resources required to execute the tasks and the lack of elasticity. With the increasing availability of cloud-based big data frameworks such as Apache Hadoop, cloud-based services for executing medical imaging studies have shown promise.
Despite this promise, our studies have revealed that existing big data frameworks exhibit different performance limitations for medical imaging applications, which calls for new algorithms that optimize their performance and suitability for medical imaging. For instance, Apache HBase's data distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). Big data medical image processing applications involving multi-stage analysis often exhibit significant variability in processing times, ranging from a few seconds to several days. Because traditional software technologies and platforms execute the analysis stages sequentially, any errors in the pipeline are only detected at the later stages, even though the sources of error lie predominantly in the highly compute-intensive first stage. This wastes precious computing resources and incurs prohibitively higher costs for re-executing the application. To address these challenges, this research proposes a framework, Hadoop & HBase for Medical Image Processing (HadoopBase-MIP), which develops a range of performance optimization algorithms and employs a number of system-behavior models for data storage, data access, and data processing. We also describe how to build prototypes to support empirical verification of system behaviors. Furthermore, we report a discovery made during the development of HadoopBase-MIP: a new type of contrast for deep-brain-structure enhancement in medical imaging. Finally, we show how to carry the Hadoop-based framework design forward into a commercial big data / high-performance-computing cluster with a cheap, scalable, and geographically distributed file system.
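To make the row-key concern concrete, here is one hedged sketch of a composite key that keeps the hierarchy (project, subject, session, scan, slice) lexicographically adjacent in an ordered store such as HBase; the scheme is illustrative and is not the key design used by HadoopBase-MIP.

    def row_key(project, subject, session, scan, slice_idx):
        # Zero-padding makes lexicographic order match numeric order, so all
        # slices of a scan (and all scans of a session, etc.) stay adjacent
        # in a store that sorts rows by key.
        return f"{project}|{subject:0>8}|{session:0>4}|{scan:0>4}|{slice_idx:06d}"

    keys = [row_key("proj1", "subj42", "2", "1", i) for i in (3, 1, 2)]
    for k in sorted(keys):  # adjacent keys = one contiguous scan of the table
        print(k)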
Jiang, Yiming. "Automated Generation of CAD Big Data for Geometric Machine Learning". The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1576329384392725.
Full text source
Moran, Andrew M. Eng Massachusetts Institute of Technology. "Improving big data visual analytics with interactive virtual reality". Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105972.
Full text source
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 80-84).
For decades, the growth and volume of digital data collection has made it challenging to digest large volumes of information and extract underlying structure. Coined "Big Data", these massive amounts of information have quite often been gathered inconsistently (e.g., from many sources, of various forms, at different rates, etc.). These factors impede the practices of not only processing data, but also analyzing and displaying it in an efficient manner to the user. Many efforts have been made in the data mining and visual analytics communities to create effective ways to further improve analysis and achieve the knowledge desired for better understanding. Our approach for improved big data visual analytics is two-fold, focusing on both visualization and interaction. Given geo-tagged information, we are exploring the benefits of visualizing datasets in the original geospatial domain by utilizing a virtual reality platform. After running proven analytics on the data, we intend to represent the information in a more realistic 3D setting, where analysts can achieve an enhanced situational awareness and rely on familiar perceptions to draw in-depth conclusions on the dataset. In addition, developing a human-computer interface that responds to natural user actions and inputs creates a more intuitive environment. Tasks can be performed to manipulate the dataset and allow users to dive deeper upon request, adhering to desired demands and intentions. Due to the volume and popularity of social media, we developed a 3D tool visualizing Twitter on MIT's campus for analysis. Utilizing emerging technologies of today to create a fully immersive tool that promotes visualization and interaction can help ease the process of understanding and representing big data.
by Andrew Moran.
M. Eng.
Jun, Sang-Woo. "Scalable multi-access flash store for Big Data analytics". Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/87947.
Full text source
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 47-49).
For many "Big Data" applications, the limiting factor in performance is often the transportation of large amounts of data from hard disks to where they can be processed, i.e., DRAM. In this work we examine an architecture for a scalable distributed flash store which aims to overcome this limitation in two ways. First, the architecture provides high-performance, high-capacity, scalable random-access storage. It achieves high throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. The additional latency for remote data access via this network is negligible compared to flash access time. Second, it permits some computation near the data via an FPGA-based programmable flash controller. The controller is located in the datapath between the storage and the host, and provides hardware acceleration for applications without any additional latency. We have constructed a small-scale prototype whose network bandwidth scales directly with the number of nodes, and where the average latency for user software to access the flash store is less than 70 μs, including 3.5 μs of network overhead.
by Sang-Woo Jun.
S.M.
Hansson, Karakoca Josef. "Big Data Types : Internally Parallel in an Actor Language". Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-372248.
Full text source
Lindberg, Johan. "Big Data och Hadoop : Nästa generation av lagring". Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-31079.
Full text source
The goal of this report and study is to investigate, at a theoretical level, the possibilities for Försäkringskassan IT to switch storage platforms for the data and information used in their daily work. Försäkringskassan collects enormous amounts of data on a daily basis, containing everything from personal data and program code to payments and customer-service cases. Today all of this is stored in large relational databases, which leads to problems with scalability and performance. The new platform under investigation is built on a storage technology called Hadoop. Hadoop is designed both to store and to process data distributed across so-called clusters of cheaper server hardware. The platform promises near-linear scalability, the ability to store all data with high fault tolerance, and the capacity to handle enormous data volumes. The study was carried out through literature studies and a proof of concept. The literature studies focus on the background of Hadoop, its architecture and structure, and its future outlook. Försäkringskassan's current storage setup is specified and compared with the new platform. A proof of concept was carried out in a test environment at Försäkringskassan, where a Hadoop platform from Hortonworks was used to demonstrate how storage can work and that so-called unstructured data can be stored. The study reveals no theoretical obstacles to switching to the new platform. However, it identifies a need to move the handling of data from write time to read time: today's relational-database solution requires well-structured data before it can be stored, whereas Hadoop can store everything without any imposed structure. On the other hand, Hadoop requires more manual work when it comes to retrieving and working with the data.
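The write-time versus read-time point can be sketched with PySpark's schema-on-read: raw records are stored as-is, and structure is applied only when the data is queried. The paths and field names below are invented, and this is only one possible way of using such a platform.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

    # Ingest: raw records land in HDFS as-is; no structure is imposed up front.
    raw = spark.read.text("hdfs:///landing/cases/*.json")

    # Read: structure is inferred and applied only at query time.
    cases = spark.read.json("hdfs:///landing/cases/*.json")
    cases.createOrReplaceTempView("cases")
    spark.sql("SELECT case_type, count(*) FROM cases GROUP BY case_type").show()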
Toole, Jameson Lawrence. "Putting big data in its place : understanding cities and human mobility with new data sources". Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/98631.
Full text source
Cataloged from PDF version of thesis. "February 2015."
Includes bibliographical references (pages 223-241).
According to the United Nations Population Fund (UNFPA), 2008 marked the first year in which the majority of the planet's population lived in cities. Urbanization, already over 80% in many western regions, is increasing rapidly as migration into cities continues. The density of cities provides residents access to places, people, and goods, but also gives rise to problems related to health, congestion, and safety. In parallel with rapid urbanization, ubiquitous mobile computing, namely the pervasive use of cellular phones, has generated a wealth of data that can be analyzed to understand and improve urban systems. These devices and the applications that run on them passively record the social, mobility, and a variety of other behaviors of their users with extremely high spatial and temporal resolution. This thesis presents a variety of novel methods and analyses that leverage the data generated by these devices to understand human behavior within cities. It details new ways to measure and quantify human behaviors related to mobility, social influence, and economic outcomes.
by Jameson Lawrence Toole.
Ph. D.
Bhagattjee, Benoy. "Emergence and taxonomy of big data as a service". Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/90709.
Full text source
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 82-83).
The amount of data that we produce and consume is growing exponentially in the modern world. Increasing use of social media and new innovations such as smartphones generate large amounts of data that can yield invaluable information if properly managed. These large datasets, popularly known as Big Data, are difficult to manage using traditional computing technologies. New technologies are emerging in the market to address the problem of managing and analyzing Big Data to produce invaluable insights from it. Organizations are finding it difficult to implement these Big Data technologies effectively due to problems such as a lack of available expertise. Some of the latest innovations in the industry are related to cloud computing and Big Data, and there is significant interest in academia and industry in combining the two to create new technologies that can solve the Big Data problem. Big Data based on cloud computing is an emerging area in computer science, and many vendors are presenting their ideas on the topic. The combination of Big Data technologies and cloud computing platforms has led to the emergence of a new category of technology called Big Data as a Service, or BDaaS. This thesis aims to define the BDaaS service stack and to evaluate a few technologies in the cloud computing ecosystem using it. The BDaaS service stack provides an effective way to classify Big Data technologies, enabling technology users to evaluate and choose the technology that meets their requirements, while technology vendors can use the same stack to communicate their product offerings better to consumers.
by Benoy Bhagattjee.
S.M. in Engineering and Management
Jun, Sang-Woo. "Big data analytics made affordable using hardware-accelerated flash storage". Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118088.
Full text source
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 175-192).
Vast amounts of data are continuously being collected from sources including social networks, web pages, and sensor networks, and their economic value depends on our ability to analyze them in a timely and affordable manner. High-performance analytics have traditionally required a machine or a cluster of machines with enough DRAM to accommodate the entire working set, due to their need for random accesses. However, datasets of interest are now regularly exceeding terabytes in size, and the cost of purchasing and operating a cluster with hundreds of machines is becoming a significant overhead. Furthermore, the performance of many random-access-intensive applications plummets even when a fraction of the data does not fit in memory. On the other hand, such datasets could be stored easily in the flash-based secondary storage of a rack-scale cluster, or even a single machine, for a fraction of the capital and operating costs. While flash storage has much better performance than hard disks, there are many hurdles to overcome in order to reach the performance of DRAM-based clusters. This thesis presents a new system architecture as well as operational methods that enable flash-based systems to achieve performance comparable to much costlier DRAM-based clusters for many important applications. We describe a highly customizable architecture called BlueDBM, which includes flash storage devices augmented with in-storage hardware accelerators, networked using a separate storage-area network. Using a prototype BlueDBM cluster with custom-designed accelerated storage devices, as well as novel accelerator designs and storage management algorithms, we have demonstrated high performance at low cost for applications including graph analytics, sorting, and database operations. We believe this approach is an attractive solution to the cost-performance issue of Big Data analytics.
by Sang-Woo Jun.
Ph. D.
Battle, Leilani Marie. "Interactive visualization of big data leveraging databases for scalable computation". Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/84906.
Full text source
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 55-57).
Modern database management systems (DBMS) have been designed to efficiently store, manage and perform computations on massive amounts of data. In contrast, many existing visualization systems do not scale seamlessly from small data sets to enormous ones. We have designed a three-tiered visualization system called ScalaR to deal with this issue. ScalaR dynamically performs resolution reduction when the expected result of a DBMS query is too large to be effectively rendered on existing screen real estate. Instead of running the original query, ScalaR inserts aggregation, sampling or filtering operations to reduce the size of the result. This thesis presents the design and implementation of ScalaR, and shows results for two example applications, visualizing earthquake records and satellite imagery data, stored in SciDB as the back-end DBMS.
by Leilani Marie Battle.
S.M.
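A hedged sketch of the resolution-reduction idea, not ScalaR's actual code: estimate the result size of a query and, if it exceeds what the screen can render, substitute an aggregating query instead. The table, columns, and thresholds below are invented.

    MAX_POINTS = 100_000  # roughly what the available screen real estate can render

    def reduce_resolution(estimated_rows: int, grid: int = 512) -> str:
        base = "SELECT lon, lat, magnitude FROM quakes"
        if estimated_rows <= MAX_POINTS:
            return base  # small enough: run the original query unchanged
        # Too large: insert an aggregation that bins points into a coarser
        # spatial grid, so far fewer rows come back to the renderer.
        return (
            "SELECT floor(lon * {g}) / {g} AS lon_bin, "
            "floor(lat * {g}) / {g} AS lat_bin, avg(magnitude) "
            "FROM quakes GROUP BY lon_bin, lat_bin"
        ).format(g=grid)

    print(reduce_resolution(estimated_rows=25_000_000))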
Wu, Sherwin Zhang. "Sifter : a generalized, efficient, and scalable big data corpus generator". Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100684.
Full text source
Cataloged from PDF version of thesis.
Includes bibliographical references (page 61).
Big data has reached the point where the volume, velocity, and variety of data place significant limitations on the computer systems that process and analyze them. Working with very large data sets has become increasingly unwieldy. Therefore, our goal was to create a system that supports efficient extraction of data subsets to a size that can be manipulated on a single machine. Sifter was developed as a big data corpus generator that lets scientists generate these smaller datasets from an original larger one. Sifter's three-layer architecture allows client users to easily create their own custom data corpus jobs, while allowing administrative users to easily integrate additional core data sets into Sifter. This thesis presents the implemented Sifter system deployed on an initial Twitter dataset. We further show how we added support for a secondary MIMIC medical dataset, and demonstrate the scalability of Sifter with very large datasets.
M. Eng.
Eigner, Martin. "Das Industrial Internet – Engineering Prozesse und IT-Lösungen". Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-214588.
Full text source
Backurs, Arturs. "Below P vs NP : fine-grained hardness for big data problems". Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120376.
Pełny tekst źródłaThis electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 145-156).
The theory of NP-hardness has been remarkably successful in identifying problems that are unlikely to be solvable in polynomial time. However, many other important problems do have polynomial-time algorithms, but large exponents in their runtime bounds can make them inefficient in practice. For example, quadratic-time algorithms, although practical on moderately sized inputs, can become inefficient on big data problems that involve gigabytes or more of data. Although for many data analysis problems no sub-quadratic time algorithms are known, any evidence of quadratic-time hardness has remained elusive. In this thesis we present hardness results for several text analysis and machine learning tasks: (1) lower bounds for edit distance, regular expression matching, and other pattern matching and string processing problems; (2) lower bounds for empirical risk minimization, such as kernel support vector machines and other kernel machine learning problems. All of these problems have polynomial-time algorithms, but despite an extensive amount of research, no near-linear time algorithms have been found. We show that, under a natural complexity-theoretic conjecture, such algorithms do not exist. We also show how these lower bounds have inspired the development of efficient algorithms for some variants of these problems.
by Arturs Backurs.
Ph. D.
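For context, this is the textbook quadratic-time dynamic program for edit distance, the algorithm whose conjectured near-optimality the thesis supports:

    def edit_distance(a: str, b: str) -> int:
        # Standard O(len(a) * len(b)) dynamic program; the thesis gives evidence
        # that no strongly subquadratic algorithm exists, under a natural
        # complexity-theoretic conjecture.
        m, n = len(a), len(b)
        prev = list(range(n + 1))
        for i in range(1, m + 1):
            cur = [i] + [0] * n
            for j in range(1, n + 1):
                cur[j] = min(prev[j] + 1,                          # deletion
                             cur[j - 1] + 1,                       # insertion
                             prev[j - 1] + (a[i - 1] != b[j - 1])) # substitution
            prev = cur
        return prev[n]

    print(edit_distance("kitten", "sitting"))  # prints 3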
Bunpuckdee, Bhadin, and Ömer Tekbas. "Ideation with Big Data : A case study of a large mature firm". Thesis, KTH, Maskinkonstruktion (Inst.), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277732.
Full text source
Big Data has received a great deal of attention in recent years. The development of various technologies has made it possible to process and store large amounts of data more easily, which has led companies to consider how Big Data can create value. However, Big Data does not automatically generate business opportunities; companies must understand how to refine the data and implement the resulting insights. To make this possible, new competences must be acquired, and companies must adapt to a more co-creative way of working. The purpose of this thesis is to investigate which innovation processes a department of data experts, working cross-functionally within an organization, uses to ideate new business opportunities. The goal is to provide recommendations on how companies can become more effective at ideation. This case study was carried out at a large, established auditing firm, within a department with expertise in data analytics, automation and artificial intelligence. The data in this report was gathered through internal interviews at department A. The case study resulted in recommendations on what to keep in mind when ideating with Big Data. An important consideration is that Big Data enables co-creation, and it is therefore crucial that customers, domain experts and Big Data experts ideate together.
Landelius, Cecilia. "Data governance in big data : How to improve data quality in a decentralized organization". Thesis, KTH, Industriell ekonomi och organisation (Inst.), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301258.
Full text source
The increased use of the internet has increased the amount of data available and collected. Companies therefore launch initiatives to analyze these large amounts of data in order to gain better insight. However, the value of the analysis, and of the decisions based on it, depends on the quality of the underlying data. For this reason, data quality has become an important issue for companies. Failures in data quality management are often due to organizational aspects. Since decentralized organizational forms are becoming increasingly popular, there is a need to understand how a decentralized organization can work with issues such as data quality and its improvement. This thesis is a qualitative study of a company in the logistics industry that is currently shifting towards becoming data driven and that has problems maintaining its data quality. The purpose of the thesis is to answer the following questions:
• RQ1: What is data quality in the context of logistics data?
• RQ2: What are the obstacles to improving data quality in a decentralized organization?
• RQ3: How can these obstacles be overcome?
Several data quality dimensions were identified and categorized as critical issues, issues and non-issues. From the collected information, the dimensions completeness, accuracy and consistency were found to be critical data quality issues for the company. The three most common obstacles to improving data quality were data ownership, data standardization and understanding the importance of data quality. To overcome these obstacles, the most important measures are to create structures for data ownership, to implement data quality management practices, and to shift employees' attitudes towards data quality in a data-driven direction. The generalizability of a single case study is low; however, this study provides several important insights and trends that can be used in future studies and by companies undergoing similar transformations.
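As a small illustration of checking the three critical dimensions named above (completeness, accuracy, consistency) on tabular logistics data, one might use pandas as below; the column names and validity rules are invented.

    import pandas as pd

    df = pd.DataFrame({
        "shipment_id": ["S1", "S2", "S3", "S3"],
        "weight_kg":   [120.0, None, -5.0, 80.0],
        "origin":      ["SE", "SE", "XX", "SE"],
    })

    # Completeness: share of non-missing values per column.
    print(df.notna().mean())

    # Accuracy: values must fall inside a plausible domain
    # (missing values also fail this range check).
    print("bad weights:", (~df["weight_kg"].between(0, 30_000)).sum())

    # Consistency: one record per shipment_id across systems.
    print("duplicate ids:", df["shipment_id"].duplicated().sum())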
Islam, Md Zahidul. "A Cloud Based Platform for Big Data Science". Thesis, Linköpings universitet, Programvara och system, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-103700.
Full text source
Akusok, Anton. "Extreme Learning Machines: novel extensions and application to Big Data". Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/3036.
Full text source
Dawany, Noor Tozeren Aydin. "Large-scale integration of microarray data : investigating the pathologies of cancer and infectious diseases /". Philadelphia, Pa. : Drexel University, 2010. http://hdl.handle.net/1860/3251.
Full text source
Kalila, Adham. "Big data fusion to estimate driving adoption behavior and urban fuel consumption". Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119335.
Full text source
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 63-68).
Data from mobile phones is constantly increasing in accuracy, quantity, and ubiquity. Methods that utilize such data in the field of transportation demand forecasting have been proposed and represent a welcome addition. We propose a framework that uses the resulting travel demand and computes fuel consumption. The model is calibrated for application to any range of car fuel efficiency and is combined with other sources of data to produce urban fuel consumption estimates for the city of Riyadh as an application. Targeted traffic congestion reduction strategies are compared to random traffic reduction, and the results indicate a factor-of-2 improvement in fuel savings. Moreover, an agent-based innovation adoption model is used with a network of women from Call Detail Records to simulate the time at which women may adopt driving after the ban on female driving is lifted in Saudi Arabia. The resulting adoption rates are combined with fuel costs from simulating empty driver trips to forecast the fuel savings potential of such a historic policy change.
by Adham Kalila.
S.M. in Transportation
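A toy version of the agent-based adoption idea, a threshold model on a small contact network; the network, thresholds, and seed are invented and do not reflect the calibrated model in the thesis.

    # Each agent adopts once the fraction of adopters among her contacts
    # reaches her personal threshold.
    contacts = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2]}
    threshold = {1: 0.3, 2: 0.5, 3: 0.4, 4: 0.9}
    adopted = {1}  # seed adopter

    for step in range(1, 6):
        new = {a for a in contacts if a not in adopted and
               sum(c in adopted for c in contacts[a]) / len(contacts[a]) >= threshold[a]}
        if not new:
            break  # diffusion has stopped
        adopted |= new
        print(f"step {step}: adopters = {sorted(adopted)}")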
Abounia, Omran Behzad. "Application of Data Mining and Big Data Analytics in the Construction Industry". The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu148069742849934.
Pełny tekst źródłaKhalilikhah, Majid. "Traffic Sign Management: Data Integration and Analysis Methods for Mobile LiDAR and Digital Photolog Big Data". DigitalCommons@USU, 2016. https://digitalcommons.usu.edu/etd/4744.
Pełny tekst źródłaPurcaro, Michael J. "Analysis, Visualization, and Machine Learning of Epigenomic Data". eScholarship@UMMS, 2017. https://escholarship.umassmed.edu/gsbs_diss/938.
Pełny tekst źródłaLi, Zhen. "CloudVista: a Framework for Interactive Visual Cluster Exploration of Big Data in the Cloud". Wright State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=wright1348204863.
Pełny tekst źródłaPergert, Anton, i William George. "Teoretisk undersökning om relationen mellan Big Data och ekologisk hållbarhet i tillverkande industri". Thesis, KTH, Maskinkonstruktion (Inst.), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299636.
Pełny tekst źródłaThe Industrial Revolution had its beginnings in the middle of the 18th century. Today we are athe beginning of the fourth industrial revolution, also known as Industry 4.0 where smart technologies are integrated in factories. One result of this is the collection and management of large amounts of data, which has introduced Big Data into the manufacturing industry. At the same time, the focus on ecological sustainability is growing due to the increased environmental degradation and depletion of natural resources. Therefore, an important aspect of Industry 4.0 is to implement smart technologies that make factories more ecologically sustainable. This study consists of a theory study, where information is compiled from relevant scientific publications. In adition, the study includes interviews with relevant companies and researchers. Based on these, the question whether Big Data as a smart technology, affects manufacturing companies, from an ecological sustainability perspective, is answered. The results show that Big Data is a smart technology can contribute to a more energy efficient production, by collecting data and in various ways analysing and optimizing processes based on the collected information. However, the threshold for the technology can be steep, both in terms of pricing and knowledge. Furthermore, Big Data can also accelerate the shift to a more circular economy, by collecting data and making informed decisions regarding the transition to a more circular and ecologically sustainable production. In adition, Big Data can facilitate and be implemented in circular services, such as machine rental, which replaces linear and traditional methods, where the product is purchased, used and discarded. Big Data can also be used in the form of predictive maintenance, which reduces the use of ecological resources by collecting and analysing real-time data to make decisions, which in turn can increase the service life of the equipment. This also reduces the amount of spare parts and scrap. The study therefore shows that Big Data can contribute to increased ecological sustainability in various ways.
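As a minimal illustration of the predictive-maintenance pattern mentioned above (stream in sensor readings, flag drift before failure), assuming invented readings and limits:

    from collections import deque

    WINDOW, LIMIT = 5, 1.2  # rolling-window size and alarm threshold

    readings = [1.0, 1.01, 0.99, 1.02, 1.0, 1.1, 1.25, 1.4, 1.5]  # vibration (a.u.)
    window = deque(maxlen=WINDOW)

    for t, value in enumerate(readings):
        window.append(value)
        avg = sum(window) / len(window)
        if avg > LIMIT:
            print(f"t={t}: rolling mean {avg:.2f} above limit -> schedule maintenance")
            break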
Kumlin, Jesper. "True operation simulation for urban rail : Energy efficiency from access to Big data". Thesis, Mälardalens högskola, Industriell ekonomi och organisation, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44264.
Full text source
Obeso Duque, Aleksandra. "Performance Prediction for Enabling Intelligent Resource Management on Big Data Processing Workflows". Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-372178.
Full text source
Koseler, Kaan Tamer. "Realization of Model-Driven Engineering for Big Data: A Baseball Analytics Use Case". Miami University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=miami1524832924255132.
Full text source
Saenyi, Betty. "Opportunities and challenges of Big Data Analytics in healthcare : An exploratory study on the adoption of big data analytics in the Management of Sickle Cell Anaemia". Thesis, Internationella Handelshögskolan, Högskolan i Jönköping, IHH, Informatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-42864.
Full text source
Taratoris, Evangelos. "A single-pass grid-based algorithm for clustering big data on spatial databases". Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113168.
Pełny tekst źródłaCataloged from PDF version of thesis.
Includes bibliographical references (pages 79-80).
The problem of clustering multi-dimensional data has been well researched in the scientific community. It is a problem with wide scope and applications. With the rapid growth of very large databases, traditional clustering algorithms become inefficient due to insufficient memory capacity. Grid-based algorithms try to solve this problem by dividing the space into cells and then performing clustering on the cells. However, these algorithms also become inefficient when even the grid becomes too large to be kept in memory. This thesis presents a new algorithm, SingleClus, that performs clustering on a 2-dimensional dataset in a single pass over the dataset. Moreover, it optimizes the amount of disk I/O while making modest use of main memory, and it is therefore theoretically optimal in terms of performance. It modifies and improves the Hoshen-Kopelman clustering algorithm while dealing with that algorithm's fundamental challenges when operating in a Big Data setting.
by Evangelos Taratoris.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
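For orientation, here is a compact sketch of the classic Hoshen-Kopelman labeling that SingleClus builds on: union-find over occupied grid cells in a single raster scan. The grid below is a toy example; SingleClus's disk-aware optimizations are not shown.

    def find(labels, x):
        while labels[x] != x:              # path-halving union-find
            labels[x] = labels[labels[x]]
            x = labels[x]
        return x

    def hoshen_kopelman(grid):
        # grid[r][c] is 1 for an occupied cell; returns a cluster label per cell.
        rows, cols = len(grid), len(grid[0])
        out = [[0] * cols for _ in range(rows)]
        labels = [0]                       # labels[0] unused; labels[i] = parent of i
        for r in range(rows):
            for c in range(cols):
                if not grid[r][c]:
                    continue
                up = out[r - 1][c] if r else 0
                left = out[r][c - 1] if c else 0
                if not up and not left:            # start a new cluster
                    labels.append(len(labels))
                    out[r][c] = len(labels) - 1
                elif up and left and find(labels, up) != find(labels, left):
                    a, b = find(labels, up), find(labels, left)
                    labels[max(a, b)] = min(a, b)  # merge the two clusters
                    out[r][c] = min(a, b)
                else:
                    out[r][c] = find(labels, up or left)
        # Second sweep resolves every cell to its root label.
        return [[find(labels, v) if v else 0 for v in row] for row in out]

    for row in hoshen_kopelman([[1, 1, 0], [0, 1, 0], [1, 0, 1]]):
        print(row)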
Zhang, Liangwei. "Big Data Analytics for Fault Detection and its Application in Maintenance". Doctoral thesis, Luleå tekniska universitet, Drift, underhåll och akustik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-60423.
Full text source
Newth, Oliver Edward. "Predicting extreme events : the role of big data in quantifying risk in structural development". Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/90028.
Pełny tekst źródłaCataloged from PDF version of thesis.
Includes bibliographical references (pages 71-73).
Engineers are well placed to calculate the required resistance to natural and non-natural hazards. However, there are two main problems with the current approach. First, while hazards are one of the primary causes of catastrophic damage, and design against risk contributes vastly to the cost of design and construction, risk is only considered late in the development process. Second, current design approaches tend to provide guidelines that do not explain the rationale behind the presented values, leaving the engineer without any true understanding of the actual risk of a hazard occurring. Data is a key aspect of accurate prediction, though its sources are often sparsely distributed, and engineers rarely have the background in statistics to process it into meaningful and useful results. This thesis explores the existing approaches to designing against hazards, focussing on natural hazards such as earthquakes, and the kinds of geographic information systems (GIS) that exist to assist in this process. A conceptual design for a hazard-related GIS is then proposed, covering the key requirements for a system that could communicate key hazard-related data and how it could be designed and implemented. Sources of hazard-related data are then discussed. Finally, models and methodologies for interpreting hazard-related data are examined, with a schematic for how a hazard-focussed system could be structured. These examine how risk can be predicted in a transparent way that ensures the user of such a system is able to understand the hazard-related risks for a given location.
by Oliver Edward Newth.
M. Eng.
Guzun, Gheorghi. "Distributed indexing and scalable query processing for interactive big data explorations". Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2087.
Full text source