
Dissertations / Theses on the topic 'Data processing'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Data processing.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Long, Christopher C. "Data Processing for NASA's TDRSS DAMA Channel." International Foundation for Telemetering, 1996. http://hdl.handle.net/10150/611474.

Full text
Abstract:
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California
Presently, NASA's Space Network (SN) does not have the ability to receive random messages from satellites using the system. Scheduling of the service must be done by the owner of the spacecraft through Goddard Space Flight Center (GSFC). The goal of NASA is to improve the current system so that random messages generated on board the satellite can be received by the SN. The messages will be requests for service that the satellite's control system deems necessary. These messages will then be sent to the owner of the spacecraft, where appropriate action and scheduling can take place. This new service is known as the Demand Assignment Multiple Access system (DAMA).
APA, Harvard, Vancouver, ISO, and other styles
2

Sun, Wenjun. "Parallel data processing for semistructured data." Thesis, London South Bank University, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.434394.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Giordano, Manfredi. "Autonomic Big Data Processing." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14837/.

Full text
Abstract:
Apache Spark is an open-source framework for large-scale distributed computing, characterised by an in-memory engine that delivers higher performance than competing solutions when processing data at rest (batch) or in motion (streaming). In this work we present several techniques designed and implemented to improve the elasticity and adaptability of the framework with respect to dynamic changes in the execution environment or in the workload. The primary purpose of these techniques is to allow concurrent applications to share the physical resources of the underlying cluster infrastructure efficiently. The context in which distributed applications run can hardly be considered static: hardware components can fail, processes can crash, and users may unpredictably allocate additional resources in an attempt to speed up the computation or lighten the workload. Finally, not only the physical resources but also the input data can vary in size and complexity during execution, so that neither data nor resources can be regarded as static. An immutable cluster configuration will not achieve the best possible efficiency across all the different workloads. It follows that a distributed computing framework that is aware of changes in the environment and in the workload, and that is able to adapt to them, can outperform a framework that only allows static configurations. Our experiments with highly parallelisable Big Data applications show that the cost of the proposed solution is minimal and that our more dynamic and adaptive version of Spark can bring benefits in terms of flexibility, scalability and efficiency.
APA, Harvard, Vancouver, ISO, and other styles
4

Rydell, Joakim. "Advanced MRI Data Processing." Doctoral thesis, Linköping : Department of Biomedical Engineering, Linköpings universitet, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Irick, Nancy. "Post Processing Data Analysis." International Foundation for Telemetering, 2009. http://hdl.handle.net/10150/606091.

Full text
Abstract:
ITC/USA 2009 Conference Proceedings / The Forty-Fifth Annual International Telemetering Conference and Technical Exhibition / October 26-29, 2009 / Riviera Hotel & Convention Center, Las Vegas, Nevada
Once the test is complete, the job of the Data Analyst begins. Files from the various acquisition systems are collected, and it is the job of the analyst to put these files together in a readable format so that the success or failure of the test can be determined. This paper will discuss the process of breaking down these files, comparing data from different systems, and methods of presenting the data.
APA, Harvard, Vancouver, ISO, and other styles
6

Castro, Fernandez Raul. "Stateful data-parallel processing." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/31596.

Full text
Abstract:
Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial - it brings domain-specific knowledge from broad fields - but data scientists do not have adequate tools to write algorithms and execute them at scale. Processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs - with explicit state that they can update, such as matrices in machine learning algorithms - and are used to imperative-style languages. These programs struggle to execute with high-performance in stateless data-parallel systems. Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines. In the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses the previous challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) an integrated approach to scale out and fault tolerance that recovers large state - spanning the memory of multiple machines. To support imperative-style programs a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems.
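As a rough illustration of the idea of state as a first-class citizen, the following Python sketch shows a keyed, explicitly mutable operator state that can be checkpointed and restored for scale-out and recovery; the class and method names are hypothetical and are not SEEP's actual API:

from collections import defaultdict

class StatefulCountOperator:
    def __init__(self):
        # Explicit keyed state: it can be partitioned across workers by key,
        # checkpointed for fault tolerance, and repartitioned to scale out.
        self.state = defaultdict(int)

    def process(self, key, value):
        self.state[key] += value          # update mutable state in place
        return key, self.state[key]       # emit the running aggregate

    def checkpoint(self):
        # Snapshot the state so a failed worker can be recovered elsewhere.
        return dict(self.state)

    def restore(self, snapshot):
        self.state = defaultdict(int, snapshot)

if __name__ == "__main__":
    op = StatefulCountOperator()
    for k, v in [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3)]:
        print(op.process(k, v))
    backup = op.checkpoint()              # e.g. before a scale-out or a failure
    op.restore(backup)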
APA, Harvard, Vancouver, ISO, and other styles
7

Nyström, Simon, and Joakim Lönnegren. "Processing data sources with big data frameworks." Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188204.

Full text
Abstract:
Big data is a concept that is expanding rapidly. As more and more data is generated and garnered, there is an increasing need for efficient solutions that can be utilized to process all this data in attempts to gain value from it. The purpose of this thesis is to find an efficient way to quickly process a large number of relatively small files. More specifically, the purpose is to test two frameworks that can be used for processing big data. The frameworks that are tested against each other are Apache NiFi and Apache Storm. A method is devised in order to, firstly, construct a data flow and secondly, construct a method for testing the performance and scalability of the frameworks running this data flow. The results reveal that Apache Storm is faster than Apache NiFi at the sort of task that was tested. As the number of nodes included in the tests went up, the performance did not always do the same. This indicates that adding more nodes to a big data processing pipeline does not always result in a better-performing setup and that, sometimes, other measures must be taken to improve performance.
APA, Harvard, Vancouver, ISO, and other styles
8

Mai, Luo. "Towards efficient big data processing in data centres." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/64817.

Full text
Abstract:
Large data processing systems require a high degree of coordination, and exhibit network bottlenecks due to massive communication data. This motivates my PhD study to propose system control mechanisms that improve monitoring and coordination, and efficient communication methods by bridging applications and networks. The first result is Chi, a new control plane for stateful streaming systems. Chi has a control loop that embeds control messages in data channels to seamlessly monitor and coordinate a streaming pipeline. This design helps monitor system and application-specific metrics in a scalable manner, and perform complex modification with on-the-fly data. The behaviours of control messages are customisable, thus enabling various control algorithms. Chi has been deployed into production systems, and exhibits high performance and scalability in test-bed experiments. With effective coordination, data-intensive systems need to remove network bottlenecks. This is important in data centres as their networks are usually over-subscribed. Hence, my study explores an idea that bridges applications and networks for accelerating communication. This idea can be realised (i) in the network core through a middlebox platform called NetAgg that can efficiently execute application-specific aggregation functions along busy network paths, and (ii) at network edges through a server network stack that provides powerful communication primitives and traffic management services. Test-bed experiments show that these methods can improve the communication of important analytics systems. A tight integration of applications and networks, however, requires an intuitive network programming model. My study thus proposes a network programming framework named Flick. Flick has a high-level programming language for application-specific network services. The services are compiled to dataflows and executed by a high-performance runtime. To be production-friendly, this runtime can run in commodity network elements and guarantee fair resource sharing among services. Flick has been used for developing popular network services, and its performance is shown in real-world benchmarks.
APA, Harvard, Vancouver, ISO, and other styles
9

Mueller, Guenter. "DIGITAL DATA RECORDING: NEW WAYS IN DATA PROCESSING." International Foundation for Telemetering, 2000. http://hdl.handle.net/10150/606505.

Full text
Abstract:
International Telemetering Conference Proceedings / October 23-26, 2000 / Town & Country Hotel and Conference Center, San Diego, California
With the introduction of digital data recorders new ways of data processing have been developed. The three most important improvements are discussed in this paper: A) By processing PCM Data from a digital recorder by using the SCSI-Interface our ground station has developed software to detect the synchronization pattern of the PCM data and then perform software frame decommutation. Many advantages will be found with this method. B) New digital recorders already use the CCSDS Standard as the internal recording format. Once this technique is implemented in our ground station’s software and becomes part of our software engineering team’s general know-how, the switch to CCSDS telemetry in the future will require no quantum leap in effort. C) Digital recorders offer a very new application: Writing data to a digital tape in the recorder’s own format, allows the replay of data using the recorder’s interfaces; i.e. writing vibration data from the host system to tape, using the analog format of the digital recorder, allows the analysis of the data either in analog form, using the analog interface of the recorder, or in digital form.
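The abstract above describes detecting the PCM synchronization pattern in software and then performing software frame decommutation. A minimal Python sketch of that idea follows, assuming an illustrative 32-bit sync word and a fixed 256-byte minor-frame length (both invented for this example; real formats are defined by the flight-test setup):

SYNC = bytes.fromhex("FE6B2840")   # a common 32-bit PCM frame sync pattern
FRAME_LEN = 256                    # assumed minor-frame length in bytes

def decommutate(stream: bytes):
    """Yield the payload of every frame that starts with the sync pattern."""
    i = stream.find(SYNC)
    while i != -1 and i + FRAME_LEN <= len(stream):
        yield stream[i + len(SYNC):i + FRAME_LEN]    # payload words after the sync
        i = stream.find(SYNC, i + FRAME_LEN)         # look for the next frame

if __name__ == "__main__":
    fake = b"\x00" * 10 + SYNC + bytes(range(252)) + SYNC + bytes(252)
    print(sum(1 for _ in decommutate(fake)), "frames found")   # 2 frames found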
APA, Harvard, Vancouver, ISO, and other styles
10

Macias, Filiberto. "Real Time Telemetry Data Processing and Data Display." International Foundation for Telemetering, 1996. http://hdl.handle.net/10150/611405.

Full text
Abstract:
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California
The Telemetry Data Center (TDC) at White Sands Missile Range (WSMR) is now beginning to modernize its existing telemetry data processing system. Modern networking and interactive graphical displays are now being introduced. This infusion of modern technology will allow the TDC to provide our customers with enhanced data processing and display capability. The intent of this project is to outline this undertaking.
APA, Harvard, Vancouver, ISO, and other styles
11

Neukirch, Maik. "Non Stationary Magnetotelluric Data Processing." Doctoral thesis, Universitat de Barcelona, 2014. http://hdl.handle.net/10803/284932.

Full text
Abstract:
Studies have proven that the desired signal for Magnetotellurics (MT) in the electromagnetic (EM) field can be regarded as 'quasi stationary' (i.e. sufficiently stationary to apply a windowed Fourier transform). However, measured time series often contain environmental noise. Hence, they may not fulfill the stationarity requirement for the application of the Fourier Transform (FT) and therefore may lead to false or unreliable results under methods that rely on the FT. In light of the paucity of algorithms for MT data processing in the presence of non stationary noise, it is the goal of this thesis to elaborate a robust, non stationary algorithm, which can compete with sophisticated, state-of-the-art algorithms in terms of accuracy and precision. In addition, I prove mathematically the algorithm's viability and validate its superiority to other codes processing non stationary, synthetic and real MT data. Non stationary EM data may affect the computation of Fourier spectra in unforeseeable manners and consequently, the traditional estimation of the MT transfer functions (TF). The TF estimation scheme developed in this work is based on an emerging nonlinear, non stationary time series analysis tool, called Empirical Mode Decomposition (EMD). EMD decomposes time series into Intrinsic Mode Functions (IMF) in the time-frequency domain, which can be represented by the instantaneous parameters amplitude, phase and frequency. In the first part of my thesis, I show that time slices of well defined IMFs equal time slices of Fourier Series, where the instantaneous parameters of the IMF define amplitude and phase of the Fourier Series parameters. Based on these findings I formulate the theorem that non stationary convolution of an IMF with a general time domain response function translates into a multiplication of the IMF with the respective spectral domain response function, which is explicitly permitted to vary over time. Further, I employ real world MT data to illustrate that a de-trended signal's IMFs can be convolved independently and then be used for further time-frequency analysis as done for MT processing. In the second part of my thesis, I apply the newly formulated theorem to the MT method. The MT method analyses the correlation between the electric and magnetic field due to the conductivity structure of the subsurface. For sufficiently low frequencies (i.e. when the EM field interacts diffusively), the conductive body of the Earth acts as an inductive system response, which convolves with magnetic field variations and results in electric field variations. The frequency representation of this system response is commonly referred to as MT TF and its estimation from measured electric and magnetic time series is summarized as MT processing. The main contribution in this thesis is the design of the MT TF estimation algorithm based on EMD. In contrast to previous works that employ EMD for MT data processing, I (i) point out the advantages of a multivariate decomposition, (ii) highlight the possibility to use instantaneous parameters, and (iii) define the homogenization of frequency discrepancies between data channels. In addition, my algorithm estimates the transfer functions using robust statistical methods such as (i) robust principal component analysis and (ii) iteratively re-weighted least squares regression with a Huber weight function. Finally, TF uncertainties are estimated by iterating the complete robust regression, including the robust weight computation, by means of a bootstrap routine.
The proposed methodology is applied to synthetic and real data with and without non stationary character and the results are compared with other processing techniques. I conclude that non stationary noise can heavily affect Fourier based MT data processing but the presented non stationary approach is nonetheless able to extract the impedances correctly even when the other methods fail.
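As an illustration of the robust regression step named above, the following Python sketch implements iteratively re-weighted least squares with a Huber weight function for a linear transfer-function model; it is a minimal stand-in under assumed data shapes, not the author's code:

import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber weights: 1 inside the threshold, k/|r| outside."""
    scale = 1.4826 * np.median(np.abs(residuals)) + 1e-12   # robust scale estimate
    r = np.abs(residuals) / scale
    return np.where(r <= k, 1.0, k / r)

def irls(H, E, n_iter=20):
    """Estimate Z in E ~ H @ Z by iteratively re-weighted least squares."""
    Z = np.linalg.lstsq(H, E, rcond=None)[0]                 # ordinary LS start
    for _ in range(n_iter):
        w = huber_weights(E - H @ Z)                         # down-weight outliers
        Z = np.linalg.lstsq(np.sqrt(w)[:, None] * H, np.sqrt(w) * E, rcond=None)[0]
    return Z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(200, 2)) + 1j * rng.normal(size=(200, 2))
    Z_true = np.array([2.0 - 1.0j, 0.5 + 0.3j])
    E = H @ Z_true + 0.05 * rng.normal(size=200)
    E[::25] += 5.0                                           # simulated non-stationary spikes
    print(irls(H, E))                                        # close to Z_true despite the spikes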
APA, Harvard, Vancouver, ISO, and other styles
12

Brewster, Wayne Allan. "Space tether - radar data processing." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1994. http://handle.dtic.mil/100.2/ADA289654.

Full text
Abstract:
Thesis (M.S. in Electrical Engineering and M.S. in Applied Physics)--Naval Postgraduate School, September 1994.
Thesis advisor(s): Richard Christopher Olsen, Ralph Hippenstiel. "September 1994." Bibliography: p. 71. Also available online.
APA, Harvard, Vancouver, ISO, and other styles
13

Caon, John. "Multi-channel radiometric data processing /." Title page, abstract and contents only, 1993. http://web4.library.adelaide.edu.au/theses/09SB/09sbc235.pdf.

Full text
Abstract:
Thesis (B. Sc.(Hons.))--University of Adelaide, Dept. of Geology and Geophysics, 1994.
Cover title: Advantages of multi-channel radiometric processing. Two maps have overlays. National map series reference Forbes, N.S.W. 1:250,000 Sheet SI/55-7. Includes bibliographical references (leaf 38).
APA, Harvard, Vancouver, ISO, and other styles
14

Rupprecht, Lukas. "Network-aware big data processing." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/52455.

Full text
Abstract:
The scale-out approach of modern data-parallel frameworks such as Apache Flink or Apache Spark has enabled them to deal with large amounts of data. These applications are often deployed in large-scale data centres with many resources. However, as deployments and data continue to grow, more network communication is incurred during a data processing query. At the same time, data centre networks (DCNs) are becoming increasingly more complex in terms of the physical network topology, the variety of applications that are sharing the network, and the different requirements of these applications on the network. The high complexity of DCNs combined with the increased traffic demands of applications has made the network a bottleneck for query performance. In this thesis, we explore ways of making data-parallel frameworks network-aware, i.e. we combine specific knowledge about the application and the physical network to reduce query completion times. We identify three main types of traffic that occur during query processing and add network-awareness to each of them to optimise network usage. 1) Traffic reduction for aggregatable traffic exploits the physical network topology and the associativity and commutativity of aggregation queries to reduce traffic as early as possible. In-network aggregation trees utilise existing networking hardware and the tree topology of DCNs to partially aggregate and thereby reduce data as it flows through the network. 2) Traffic balancing for non-aggregatable traffic monitors the network throughput of an application and uses knowledge about the query to optimise the overall network utilisation. By dynamically changing the destinations of parts of the transferred data, network hotspots, which can occur when many applications share the network, can be avoided. 3) Traffic elimination for storage traffic gives control over data placement to the application instead of the distributed storage system. This allows the application to optimise where data is stored across the cluster based on application properties and thereby eliminate unnecessary network traffic.
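The first optimisation relies on the associativity and commutativity of aggregation queries; the short Python sketch below illustrates how partial aggregates merged up a tree reduce the data volume that has to cross the network core (the topology and keys are invented for illustration):

from collections import Counter

def partial_aggregate(records):
    """Combine records locally: one (key, partial_sum) pair per key."""
    agg = Counter()
    for key, value in records:
        agg[key] += value
    return agg

def aggregate_tree(leaves, fan_in=2):
    """Merge partial aggregates level by level up a tree until one result remains."""
    level = [partial_aggregate(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [sum(level[i:i + fan_in], Counter())
                 for i in range(0, len(level), fan_in)]
    return level[0]

if __name__ == "__main__":
    # Four racks each produce local (key, value) records.
    racks = [[("url_a", 1), ("url_b", 2)], [("url_a", 3)],
             [("url_b", 1)], [("url_a", 1), ("url_b", 4)]]
    print(aggregate_tree(racks))   # Counter({'url_b': 7, 'url_a': 5})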
APA, Harvard, Vancouver, ISO, and other styles
15

Chiu, Cheng-Jung. "Data processing in nanoscale profilometry." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/36677.

Full text
Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 1995.
Includes bibliographical references (p. 176-177).
New developments on the nanoscale are taking place rapidly in many fields. Instrumentation used to measure and understand the geometry and property of the small scale structure is therefore essential. One of the most promising devices to head the measurement science into the nanoscale is the scanning probe microscope. A prototype of a nanoscale profilometer based on the scanning probe microscope has been built in the Laboratory for Manufacturing and Productivity at MIT. A sample is placed on a precision flip stage and different sides of the sample are scanned under the SPM to acquire its separate surface topography. To reconstruct the original three dimensional profile, many techniques like digital filtering, edge identification, and image matching are investigated and implemented in the computer programs to post process the data, and with greater emphasis placed on the nanoscale application. The important programming issues are addressed, too. Finally, this system's error sources are discussed and analyzed.
by Cheng-Jung Chiu.
M.S.
APA, Harvard, Vancouver, ISO, and other styles
16

Garlick, Dean, Glen Wada, and Pete Krull. "SPIRIT III Data Verification Processing." International Foundation for Telemetering, 1996. http://hdl.handle.net/10150/608393.

Full text
Abstract:
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California
This paper will discuss the functions performed by the Spatial Infrared Imaging Telescope (SPIRIT) III Data Processing Center (DPC) at Utah State University (USU). The SPIRIT III sensor is the primary instrument on the Midcourse Space Experiment (MSX) satellite; and as builder of this sensor system, USU is responsible for developing and operating the associated DPC. The SPIRIT III sensor consists of a six-color long-wave infrared (LWIR) radiometer system, an LWIR spectrographic interferometer, contamination sensors, and housekeeping monitoring systems. The MSX spacecraft recorders can capture up to 8+ gigabytes of data a day from this sensor. The DPC is subsequently required to provide a 24-hour turnaround to verify and qualify these data by implementing a complex set of sensor and data verification and quality checks. This paper addresses the computing architecture, distributed processing software, and automated data verification processes implemented to meet these requirements.
APA, Harvard, Vancouver, ISO, and other styles
17

Ostroumov, Ivan Victorovich. "Real time sensors data processing." Thesis, Polit. Challenges of science today: XIV International Scientific and Practical Conference of Young Researchers and Students, April 2–3, 2014 : theses. – К., 2014. – 35p, 2014. http://er.nau.edu.ua/handle/NAU/26582.

Full text
Abstract:
Sensors are among the most important parts of any system, and the aviation industry uses millions of them for different purposes. Another key task of avionics equipment is transferring data from the sensors to the processing equipment. Why is it so important to stream data into MatLab online? Unmanned aerial vehicles are developing rapidly; if we can transmit data from UAV sensors into MatLab, we can process it and extract the desired information about the UAV, ideally using the cheapest transfer method available. Today almost everyone has a mobile phone, and many phones carry a range of sensors: pressure, temperature, gravity, gyroscope, rotation vector, proximity, light, orientation, magnetic field, accelerometer, GPS receiver and so on. Real-time data from these sensors can therefore be used for navigation tasks. In our work we use a Samsung Galaxy SIII mobile phone with all of the sensors listed above except the temperature sensor. Many programs exist for reading and displaying sensor data, such as "Sensor Kinetics", "Sensors", "Data Recording" and "Android Sensors Viewer"; we used "Data Recording". Data can be transmitted from the phone via GPRS (mobile internet), Bluetooth, USB cable or Wi-Fi. After comparing these methods we concluded that GPRS is unsuitable because it has to be paid for, Bluetooth has limited range, and a USB cable lacks the portability of the other methods, so Wi-Fi is the optimal way to transmit data for our goal.
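As a rough sketch of the receiving side of such a Wi-Fi transfer, the following Python snippet listens on a UDP port and parses comma-separated sensor samples; the port number and the "timestamp,sensor,x,y,z" line format are assumptions made for illustration and do not reflect the actual output of the "Data Recording" app:

import socket

def receive_samples(port=5555):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    print(f"listening on udp/{port} ...")
    while True:
        data, addr = sock.recvfrom(1024)
        fields = data.decode("ascii", errors="replace").strip().split(",")
        timestamp, sensor, values = fields[0], fields[1], fields[2:]
        print(timestamp, sensor, [float(v) for v in values])

if __name__ == "__main__":
    receive_samples()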
APA, Harvard, Vancouver, ISO, and other styles
18

Silva, João Paulo Sá da. "Data processing in Zynq APSoC." Master's thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/14703.

Full text
Abstract:
Master's degree in Computer and Telematics Engineering
Field-Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985, i.e. less than 30 years ago. The influence of FPGAs on many directions in engineering is growing continuously and rapidly. There are many reasons for such progress and the most important are the inherent reconfigurability of FPGAs and relatively cheap development cost. Recent field-configurable micro-chips combine the capabilities of software and hardware by incorporating multi-core processors and reconfigurable logic enabling the development of highly optimized computational systems for a vast variety of practical applications, including high-performance computing, data, signal and image processing, embedded systems, and many others. In this context, the main goals of the thesis are to study the new micro-chips, namely the Zynq-7000 family and to apply them to two selected case studies: data sort and Hamming weight calculation for long vectors.
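The second case study, Hamming weight calculation for long vectors, can be illustrated with a few lines of plain Python (the thesis itself maps the computation onto Zynq programmable logic):

def hamming_weight(vector: bytes) -> int:
    """Number of set bits in an arbitrarily long byte vector."""
    # int.bit_count() needs Python 3.10+; older versions can use bin(byte).count("1")
    return sum(byte.bit_count() for byte in vector)

if __name__ == "__main__":
    long_vector = bytes([0b10110010] * 1_000_000)   # 1 MB test vector
    print(hamming_weight(long_vector))              # 4 set bits per byte -> 4,000,000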
APA, Harvard, Vancouver, ISO, and other styles
19

Wang, Yue-Jin. "Adaptive data processing satellite positioning." Thesis, Queensland University of Technology, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
20

Rydman, Oskar. "Data processing of Controlled Source Audio Magnetotelluric (CSAMT) Data." Thesis, Uppsala universitet, Geofysik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-387246.

Full text
Abstract:
During this project, three distinct methods to improve the processing of Controlled Source Audio Magnetotellurics (CSAMT) data are implemented and their advantages and disadvantages discussed. The methods in question are: detrending the time series in the time domain instead of in the frequency domain; implementing a coherency test to pinpoint data segments of low quality and remove them from the calculations; and implementing a method to detect and remove transients from the time series to reduce background noise in the frequency spectra. Both the time-domain detrending and the transient removal show potential to improve data quality, even if the improvements are small (both in the 1-10% range). Due to technical limitations, no coherency test was implemented. Overall, the processes discussed in the report improved the data quality and may serve as groundwork for further improvements to come.
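Two of the steps described above, time-domain detrending and transient removal, can be sketched in a few lines of numpy; this is an illustrative stand-in under assumed thresholds, not the processing code used in the thesis:

import numpy as np

def detrend_time_domain(x):
    """Subtract the best-fitting straight line from the series."""
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def remove_transients(x, threshold=5.0):
    """Replace samples far from the median (in robust-scale units) by the median."""
    med = np.median(x)
    mad = np.median(np.abs(x - med)) + 1e-12
    cleaned = x.copy()
    cleaned[np.abs(x - med) > threshold * 1.4826 * mad] = med
    return cleaned

if __name__ == "__main__":
    t = np.linspace(0, 10, 2000)
    series = np.sin(2 * np.pi * 1.5 * t) + 0.02 * t      # signal plus a slow drift
    series[700] += 15.0                                   # a transient spike
    cleaned = remove_transients(detrend_time_domain(series))
    print(cleaned.std(), cleaned.max())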
APA, Harvard, Vancouver, ISO, and other styles
21

Chitondo, Pepukayi David Junior. "Data policies for big health data and personal health data." Thesis, Cape Peninsula University of Technology, 2016. http://hdl.handle.net/20.500.11838/2479.

Full text
Abstract:
Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2016.
Health information policies are constantly becoming a key feature in directing information usage in healthcare. After the passing of the Health Information Technology for Economic and Clinical Health (HITECH) Act in 2009 and the Affordable Care Act (ACA) passed in 2010, in the United States, there has been an increase in health systems innovations. Coupling this health systems hype is the current buzz concept in Information Technology, „Big data‟. The prospects of big data are full of potential, even more so in the healthcare field where the accuracy of data is life critical. How big health data can be used to achieve improved health is now the goal of the current health informatics practitioner. Even more exciting is the amount of health data being generated by patients via personal handheld devices and other forms of technology that exclude the healthcare practitioner. This patient-generated data is also known as Personal Health Records, PHR. To achieve meaningful use of PHRs and healthcare data in general through big data, a couple of hurdles have to be overcome. First and foremost is the issue of privacy and confidentiality of the patients whose data is in concern. Secondly is the perceived trustworthiness of PHRs by healthcare practitioners. Other issues to take into context are data rights and ownership, data suppression, IP protection, data anonymisation and reidentification, information flow and regulations as well as consent biases. This study sought to understand the role of data policies in the process of data utilisation in the healthcare sector with added interest on PHRs utilisation as part of big health data.
APA, Harvard, Vancouver, ISO, and other styles
22

Wang, Yi. "Data Management and Data Processing Support on Array-Based Scientific Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436157356.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Lloyd, Ian J. "Data processing and individual freedom : data protection and beyond." Thesis, University of Strathclyde, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.233213.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Aygar, Alper. "Doppler Radar Data Processing And Classification." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609890/index.pdf.

Full text
Abstract:
In this thesis, improving the performance of the automatic recognition of the Doppler radar targets is studied. The radar used in this study is a ground-surveillance doppler radar. Target types are car, truck, bus, tank, helicopter, moving man and running man. The input of this thesis is the output of the real doppler radar signals which are normalized and preprocessed (TRP vectors: Target Recognition Pattern vectors) in the doctorate thesis by Erdogan (2002). TRP vectors are normalized and homogenized doppler radar target signals with respect to target speed, target aspect angle and target range. Some target classes have repetitions in time in their TRPs. By the use of these repetitions, improvement of the target type classification performance is studied. K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms are used for doppler radar target classification and the results are evaluated. Before classification PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), NMF (Nonnegative Matrix Factorization) and ICA (Independent Component Analysis) are implemented and applied to normalized doppler radar signals for feature extraction and dimension reduction in an efficient way. These techniques transform the input vectors, which are the normalized doppler radar signals, to another space. The effects of the implementation of these feature extraction algorithms and the use of the repetitions in doppler radar target signals on the doppler radar target classification performance are studied.
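A hedged sketch of such a pipeline, PCA for dimension reduction followed by KNN and SVM classifiers, is shown below using scikit-learn on synthetic stand-in data; the real inputs are the normalized TRP vectors described in the abstract:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
X = rng.normal(size=(700, 128))        # stand-in for 128-sample TRP vectors
y = rng.integers(0, 7, size=700)       # 7 target classes (car, truck, bus, ...)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel="rbf", C=1.0))]:
    model = make_pipeline(PCA(n_components=20), clf)   # dimension reduction + classifier
    model.fit(X_train, y_train)
    print(name, "accuracy:", round(model.score(X_test, y_test), 3))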
APA, Harvard, Vancouver, ISO, and other styles
25

Fernandez, Noemi. "Statistical information processing for data classification." FIU Digital Commons, 1996. http://digitalcommons.fiu.edu/etd/3297.

Full text
Abstract:
This thesis introduces new algorithms for analysis and classification of multivariate data. Statistical approaches are devised for the objectives of data clustering, data classification and object recognition. An initial investigation begins with the application of fundamental pattern recognition principles. Where such fundamental principles meet their limitations, statistical and neural algorithms are integrated to augment the overall approach for an enhanced solution. This thesis provides a new dimension to the problem of classification of data as a result of the following developments: (1) application of algorithms for object classification and recognition; (2) integration of a neural network algorithm which determines the decision functions associated with the task of classification; (3) determination and use of the eigensystem using newly developed methods with the objectives of achieving optimized data clustering and data classification, and dynamic monitoring of time-varying data; and (4) use of the principal component transform to exploit the eigensystem in order to perform the important tasks of orientation-independent object recognition, and dimensionality reduction of the data such as to optimize the processing time without compromising accuracy in the analysis of this data.
APA, Harvard, Vancouver, ISO, and other styles
26

Bernecker, Thomas. "Similarity processing in multi-observation data." Diss., lmu, 2012. http://nbn-resolving.de/urn:nbn:de:bvb:19-154119.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Cukrowski, Jacek, and Manfred M. Fischer. "Efficient Organization of Collective Data-Processing." WU Vienna University of Economics and Business, 1998. http://epub.wu.ac.at/4148/1/WSG_DP_6498.pdf.

Full text
Abstract:
The paper examines the application of the concept of economic efficiency to organizational issues of collective information processing in decision making. Information processing is modeled in the framework of the dynamic parallel-processing model of associative computation with an endogenous set-up cost of the processors. The model is extended to include the specific features of collective information processing in the team of decision makers which could cause an error in data analysis. In such a model, the conditions for efficient organization of information processing are defined and the architecture of the efficient structures is considered. We show that specific features of collective decision making procedures require a broader framework for judging organizational efficiency than has traditionally been adopted. In particular, and contrary to the results presented in economic literature, we show that in human data processing (unlike in computer systems), there is no unique architecture for efficient information processing structures, but a number of various efficient forms can be observed. The results indicate that technological progress resulting in faster data processing (ceteris paribus) will lead to more regular information processing structures. However, if the relative cost of the delay in data analysis increases significantly, less regular structures could be efficient. (authors' abstract)
Series: Discussion Papers of the Institute for Economic Geography and GIScience
APA, Harvard, Vancouver, ISO, and other styles
28

Jones, Jonathan A. "Nuclear magnetic resonance data processing methods." Thesis, University of Oxford, 1992. http://ora.ox.ac.uk/objects/uuid:7df97c9a-4e65-4c10-83eb-dfaccfdccefe.

Full text
Abstract:
This thesis describes the application of a wide variety of data processing methods, in particular the Maximum Entropy Method (MEM), to data from Nuclear Magnetic Resonance (NMR) experiments. Chapter 1 provides a brief introduction to NMR and to data processing, which is developed in chapter 2. NMR is described in terms of the classical model due to Bloch, and the principles of conventional (Fourier transform) data processing developed. This is followed by a description of less conventional techniques. The MEM is derived on several grounds, and related to both Bayesian reasoning and Shannon information theory. Chapter 3 describes several methods of evaluating the quality of NMR spectra obtained by a variety of data processing techniques; the simple criterion of spectral appearance is shown to be completely unsatisfactory. A Monte Carlo method is described which allows several different techniques to be compared, and the relative advantages of Fourier transformation and the MEM are assessed. Chapter 4 describes in vivo NMR, particularly the application of the MEM to data from Phase Modulated Rotating Frame Imaging (PMRFI) experiments. In this case the conventional data processing is highly unsatisfactory, and MEM processing results in much clearer spectra. Chapter 5 describes the application of a range of techniques to the estimation and removal of splittings from NMR spectra. The various techniques are discussed using simple examples, and then applied to data from the amino acid iso-leucine. The thesis ends with five appendices which contain historical and philosophical notes, detailed calculations pertaining to PMRFI spectra, and a listing of the MEM computer program.
APA, Harvard, Vancouver, ISO, and other styles
29

Hein, C. S. "Integrated topics in geochemical data processing." Thesis, University of Bristol, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.354700.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Sun, Youshun 1970. "Processing of randomly obtained seismic data." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/59086.

Full text
Abstract:
Thesis (S.M. in Geosystems)--Massachusetts Institute of Technology, Dept. of Earth, Atmospheric, and Planetary Sciences, 1998.
Includes bibliographical references (leaves 62-64).
by Youshun Sun.
S.M. in Geosystems
APA, Harvard, Vancouver, ISO, and other styles
31

Bisot, Clémence. "Spectral Data Processing for Steel Industry." Thesis, KTH, Optimeringslära och systemteori, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-175880.

Full text
Abstract:
For the steel industry, knowing and understanding the characteristics of a steel strip surface at every step of the production process is a key element in controlling final product quality. Today, as quality requirements increase, this task becomes more and more important. The surface of new steel grades with complex chemical compositions has behaviors that are especially hard to master. For those grades in particular, surface control is critical and difficult. One of the promising techniques to address the problem of surface quality control is spectral analysis. Over the last few years, ArcelorMittal, the world's leading integrated steel and mining company, has led several projects to investigate the possibility of using devices to measure the light spectrum of its products at different stages of production. The large amount of data generated by these devices makes it absolutely necessary to develop efficient data treatment pipelines to get meaningful information out of the recorded spectra. In this thesis, we developed mathematical models and statistical tools to treat signals measured with spectrometers in the framework of different research projects.
APA, Harvard, Vancouver, ISO, and other styles
32

Faber, Marc. "On-Board Data Processing and Filtering." International Foundation for Telemetering, 2015. http://hdl.handle.net/10150/596433.

Full text
Abstract:
ITC/USA 2015 Conference Proceedings / The Fifty-First Annual International Telemetering Conference and Technical Exhibition / October 26-29, 2015 / Bally's Hotel & Convention Center, Las Vegas, NV
One of the requirements resulting from mounting pressure on flight test schedules is the reduction of time needed for data analysis, in pursuit of shorter test cycles. This requirement has ramifications such as the demand for record and processing of not just raw measurement data but also of data converted to engineering units in real time, as well as for an optimized use of the bandwidth available for telemetry downlink and ultimately for shortening the duration of procedures intended to disseminate pre-selected recorded data among different analysis groups on ground. A promising way to successfully address these needs consists in implementing more CPU-intelligence and processing power directly on the on-board flight test equipment. This provides the ability to process complex data in real time. For instance, data acquired at different hardware interfaces (which may be compliant with different standards) can be directly converted to more easy-to-handle engineering units. This leads to a faster extraction and analysis of the actual data contents of the on-board signals and busses. Another central goal is the efficient use of the available bandwidth for telemetry. Real-time data reduction via intelligent filtering is one approach to achieve this challenging objective. The data filtering process should be performed simultaneously on an all-data-capture recording and the user should be able to easily select the interesting data without building PCM formats on board nor to carry out decommutation on ground. This data selection should be as easy as possible for the user, and the on-board FTI devices should generate a seamless and transparent data transmission, making a quick data analysis viable. On-board data processing and filtering has the potential to become the future main path to handle the challenge of FTI data acquisition and analysis in a more comfortable and effective way.
APA, Harvard, Vancouver, ISO, and other styles
33

Brown, Barbie, Parminder Ghuman, Johnny Medina, and Randy Wilke. "A DESKTOP SATELLITE DATA PROCESSING SYSTEM." International Foundation for Telemetering, 1997. http://hdl.handle.net/10150/607552.

Full text
Abstract:
International Telemetering Conference Proceedings / October 27-30, 1997 / Riviera Hotel and Convention Center, Las Vegas, Nevada
The international space community, including National Aeronautics and Space Administration (NASA), European Space Agency (ESA), Japanese National Space Agency (NASDA) and others, are committed to using the Consultative Committee for Space Data Systems (CCSDS) recommendations for low earth orbiting satellites. With the advent of the CCSDS standards and the availability of direct broadcast data from a number of current and future spacecraft, a large number of users could have access to earth science data. However, to allow for the largest possible user base, the cost of processing this data must be as low as possible. By utilizing Very Large Scale Integration (VLSI) Application-Specific Integrated Circuits (ASIC), pipelined data processing, and advanced software development technology and tools, highly integrated CCSDS data processing can be attained in a single desktop system. This paper describes a prototype desktop system based on the Peripheral Component Interconnect (PCI) bus that performs CCSDS standard frame synchronization, bit transition density decoding, Cyclical Redundancy Check (CRC) error checking, Reed-Solomon decoding, data unit sorting, packet extraction, annotation and other CCSDS service processing. Also discussed is software technology used to increase the flexibility and usability of the desktop system. The reproduction cost for the system described is less than 1/8th the current cost of commercially available CCSDS data processing systems.
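One of the listed service-processing steps, CRC error checking, can be illustrated with a short Python sketch; CCSDS transfer frames use the 16-bit CRC-CCITT polynomial (0x1021, all-ones initial value), while the frame layout assumed here is simplified for illustration:

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16-CCITT (polynomial 0x1021, initial value 0xFFFF)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def frame_ok(frame: bytes) -> bool:
    """Assume the last two bytes of the frame hold the transmitted CRC."""
    payload, received = frame[:-2], int.from_bytes(frame[-2:], "big")
    return crc16_ccitt(payload) == received

if __name__ == "__main__":
    payload = bytes(range(32))
    frame = payload + crc16_ccitt(payload).to_bytes(2, "big")
    print(frame_ok(frame), frame_ok(frame[:-1] + b"\x00"))   # True False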
APA, Harvard, Vancouver, ISO, and other styles
34

Turver, Kim D. "Batch Processing of Flight Test Data." International Foundation for Telemetering, 1993. http://hdl.handle.net/10150/611885.

Full text
Abstract:
International Telemetering Conference Proceedings / October 25-28, 1993 / Riviera Hotel and Convention Center, Las Vegas, Nevada
Boeing's Test Data Retrieval System not only acts as an interface between the Airborne Data Acquisition System and a mainframe computer but also does batch mode processing of data at faster than real time. Analysis engineers request time intervals and measurements of interest. Time intervals and measurements requested are acquired from the flight tape, converted to first order engineering units, and output to 3480 data cartridge tape for post processing. This allows all test data to be stored and only the data of interest to be processed at any given time.
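Conversion to "first order engineering units" typically means applying a linear slope/offset calibration per measurement; the following Python sketch illustrates the idea with an invented calibration table:

CALIBRATIONS = {
    # measurement: (slope, offset, unit) -- values invented for illustration
    "airspeed": (0.125, -40.0, "kt"),
    "oil_temp": (0.05, -20.0, "degC"),
}

def to_engineering_units(measurement: str, counts: int) -> str:
    """Apply the linear calibration for one measurement to a raw count value."""
    slope, offset, unit = CALIBRATIONS[measurement]
    return f"{measurement} = {slope * counts + offset:.2f} {unit}"

if __name__ == "__main__":
    for name, raw in [("airspeed", 2200), ("oil_temp", 1500)]:
        print(to_engineering_units(name, raw))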
APA, Harvard, Vancouver, ISO, and other styles
35

White, Allan P., and Richard K. Dean. "Real-Time Test Data Processing System." International Foundation for Telemetering, 1989. http://hdl.handle.net/10150/614650.

Full text
Abstract:
International Telemetering Conference Proceedings / October 30-November 02, 1989 / Town & Country Hotel & Convention Center, San Diego, California
The U.S. Army Aviation Development Test Activity at Fort Rucker, Alabama needed a real-time test data collection and processing capability for helicopter flight testing. The system had to be capable of collecting and processing both FM and PCM data streams from analog tape and/or a telemetry receiver. The hardware and software were to be off the shelf whenever possible. The integration was to result in a stand-alone telemetry collection and processing system.
APA, Harvard, Vancouver, ISO, and other styles
36

Eccles, Lee H., and John J. Muckerheide. "FLIGHT TEST AIRBORNE DATA PROCESSING SYSTEM." International Foundation for Telemetering, 1986. http://hdl.handle.net/10150/615393.

Full text
Abstract:
International Telemetering Conference Proceedings / October 13-16, 1986 / Riviera Hotel, Las Vegas, Nevada
The Experimental Flight Test organization of the Boeing Commercial Airplane Company has an onboard data reduction system known as the Airborne Data Analysis/Monitor System or ADAMS. ADAMS has evolved over the last 11 years from a system built around a single minicomputer to a system using two minicomputers to a distributed processing system based on microprocessors. The system is built around two buses. One bus is used for passing setup and control information between elements of the system. This is burst type data. The second bus is used for passing periodic data between the units. This data originates in the sensors installed by Flight Test or in the Black Boxes on the airplane. These buses interconnect a number of different processors. The Application Processor is the primary data analysis processor in the system. It runs the application programs and drives the display devices. A number of Application Processors may be installed. The File Processor handles the mass storage devices and such common peripheral devices as the printer. The Acquisition Interface Assembly is the entry point for data into ADAMS. It accepts serial PCM data from either the data acquisition system or the tape recorder. This data is then concatenated, converted to engineering units, and passed to the rest of the system for further processing and display. Over 70 programs have been written to support activities on the airplane. Programs exist to aid the instrumentation engineer in preparing the system for flight and to minimize the amount of paper which must be dealt with. Additional programs are used by the analysis engineer to evaluate the aircraft performance in real time. These programs cover the tests from takeoff through cruise testing and aircraft maneuvers to landing. They are used to analyze everything from brake performance to fuel consumption. Using these programs has reduced the amount of data reduction done on the ground and in many cases eliminated it completely.
APA, Harvard, Vancouver, ISO, and other styles
37

Tamura, Yoshiaki. "Study on Precise Tidal Data Processing." 京都大学 (Kyoto University), 2000. http://hdl.handle.net/2433/157208.

Full text
Abstract:
The full-text data is a PDF converted from image files created in the National Diet Library's FY2010 digitisation of doctoral dissertations.
Kyoto University (京都大学)
0048
New system, thesis doctorate: Doctor of Science
Degree numbers: 乙第10362号; 論理博第1378号; 新制||理||1180 (University Library); UT51-2000-F428
Examining committee: Prof. 竹本修三 (chair), Assoc. Prof. 福田洋一, Prof. 古澤保
Awarded under Article 4, Paragraph 2 of the Degree Regulations
APA, Harvard, Vancouver, ISO, and other styles
38

Nasr, Kamil. "Comparison of Popular Data Processing Systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293494.

Full text
Abstract:
Data processing is generally defined as the collection and transformation of data to extract meaningful information. Data processing involves a multitude of processes such as validation, sorting, summarization, and aggregation, to name a few. Many analytics engines exist today for large-scale data processing, namely Apache Spark, Apache Flink and Apache Beam. Each one of these engines has its own advantages and drawbacks. In this thesis report, we used all three of these engines to process data from the Carbon Monoxide Daily Summary Dataset to determine the emission levels per area and unit of time. Then, we compared the performance of these 3 engines using different metrics. The results showed that Apache Beam, while offering greater convenience when writing programs, was slower than Apache Flink and Apache Spark. The Spark Runner in Beam was the fastest runner, and Apache Spark was the fastest data processing framework overall.
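A hedged PySpark sketch of the aggregation task described above follows; the file name and column names are assumptions about the CSV layout rather than details taken from the thesis:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("co-emissions").getOrCreate()

# Assumed columns: state_name, date_local, arithmetic_mean (daily CO concentration)
df = spark.read.csv("daily_42101_2017.csv", header=True, inferSchema=True)

emissions = (df
             .withColumn("month", F.date_format("date_local", "yyyy-MM"))
             .groupBy("state_name", "month")
             .agg(F.avg("arithmetic_mean").alias("avg_co_ppm"))
             .orderBy("state_name", "month"))

emissions.show(10)
spark.stop()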
APA, Harvard, Vancouver, ISO, and other styles
39

Marselli, Catherine. "Data processing of a navigation microsystem." Université de Franche-Comté. UFR des sciences et techniques, 1998. http://www.theses.fr/1998BESA2078.

Full text
Abstract:
This research is part of a Swiss-French academic project whose goal was the determination of some limits in the design and use of microtechnologies and microsystems, using as a common thread example a navigation system based on microaccelerometers and angular rate microsensors (gyros). The entire project was divided into four parts, including design at the component level as well as at the system level. This PhD report describes the data processing of the navigation microsystem realised at the Electronics and Signal Processing Laboratory of the Institute of Microtechnology, University of Neuchâtel. Current low-cost microsensors are less expensive but less accurate than mechanical or optical sensors. In a navigation system, the accelerometer and gyro outputs are integrated, leading to the accumulation of errors. Thus, the measured trajectory quickly becomes wrong and a corrective system has to be designed. Hence, the goal of the data processing system is to compute the navigation parameters (position, velocity, orientation) while preventing the trajectory from diverging, following two approaches: reducing the sensor errors, and regularly updating the trajectory using an aiding navigation system.
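The Kalman filter mentioned above is a recursive estimator that alternates a model-based prediction with a measurement update; the following Python sketch shows that structure for a simple scalar position/velocity model rather than the full inertial navigation problem:

import numpy as np

dt = 0.01
F_mat = np.array([[1, dt], [0, 1]])        # state transition (position, velocity)
H = np.array([[1.0, 0.0]])                 # only position is measured (e.g. an aiding fix)
Q = 1e-4 * np.eye(2)                       # process noise (sensor drift)
R = np.array([[0.25]])                     # measurement noise variance

x = np.zeros((2, 1))                       # state estimate
P = np.eye(2)                              # estimate covariance

def kalman_step(x, P, z):
    # Predict: propagate the state with the motion model.
    x = F_mat @ x
    P = F_mat @ P @ F_mat.T + Q
    # Update: correct the prediction with the external measurement z.
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    return x + K @ y, (np.eye(2) - K @ H) @ P

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for k in range(100):
        true_pos = 0.5 * k * dt            # target moving at 0.5 m/s
        z = np.array([[true_pos + rng.normal(scale=0.5)]])
        x, P = kalman_step(x, P, z)
    print("estimated position, velocity:", x.ravel())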
APA, Harvard, Vancouver, ISO, and other styles
40

Ives, Zachary G. "Efficient query processing for data integration /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6864.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Lian, Xiang. "Efficient query processing over uncertain data /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20LIAN.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Neeli, Sandeep Wilamowski Bogdan M. "Internet data acquisition, search and processing." Auburn, Ala., 2009. http://hdl.handle.net/10415/1969.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Lei, Chuan. "Recurring Query Processing on Big Data." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-dissertations/550.

Full text
Abstract:
The advances in hardware, software, and networks have enabled applications from business enterprises, scientific and engineering disciplines, to social networks, to generate data at an unprecedented volume, variety, velocity, and veracity not possible before. Innovation in these domains is thus now hindered by the difficulty of analyzing and discovering knowledge from the collected data in a timely and scalable fashion. To facilitate such large-scale big data analytics, the MapReduce computing paradigm and its open-source implementation Hadoop are among the most popular and widely used technologies. Hadoop's success as a competitor to traditional parallel database systems lies in its simplicity, ease of use, flexibility, automatic fault tolerance, superior scalability, and cost effectiveness due to its use of inexpensive commodity hardware that can scale to petabytes of data over thousands of machines. Recurring queries, repeatedly executed for long periods of time over rapidly evolving high-volume data, have become a bedrock component in most of these analytic applications. Efficient execution and optimization techniques must be designed to assure the responsiveness and scalability of these recurring queries. In this dissertation, we thoroughly investigate topics in the area of recurring query processing on big data. We first propose a novel scalable infrastructure called Redoop that treats recurring queries over big evolving data as first-class citizens during query processing. This is in contrast to state-of-the-art MapReduce/Hadoop systems, which experience significant challenges when faced with recurring queries, including redundant computations, significant latencies, and huge application development effort. Redoop offers innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. Redoop also retains the fault tolerance of MapReduce via automatic cache recovery and task re-execution support. Second, we address the crucial need to accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated data sets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. On top of Redoop, we built a scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called Helix. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Furthermore, Helix introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize SLA satisfaction. Third, recurring analytics queries tend to be expensive, especially when query processing consumes data sets in the hundreds of terabytes or more. Time-sensitive recurring queries, such as fraud detection, often come with tight response-time constraints as query deadlines. Data sampling is a popular technique for computing approximate results with an acceptable error bound while reducing high-demand resource consumption and thus improving query turnaround times.
In this dissertation, we propose the first fast approximate query engine for recurring workloads in the MapReduce infrastructure, called Faro. Faro introduces two key innovations: (1) a deadline-aware sampling strategy that builds samples from the original data with reduced sample sizes compared to uniform sampling, and (2) adaptive resource allocation strategies that maximally improve the approximate results while still meeting the response-time requirements specified in recurring queries. In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in scalability, effectiveness, and robustness.
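As a loose illustration of the slice-sharing and inter-window caching ideas mentioned above, the following Python sketch aggregates each time slice once and lets every recurring window reuse the cached partial result. The slice granularity, record layout and aggregate are placeholders chosen for the example, not Redoop's or Helix's actual mechanisms.

    slice_cache = {}                            # slice_id -> partial aggregate

    def slice_aggregate(records):
        # Partial result for one time slice (a plain sum, for illustration).
        return sum(r["value"] for r in records)

    def process_slice(slice_id, records):
        # Each slice of newly arrived data is aggregated exactly once.
        if slice_id not in slice_cache:
            slice_cache[slice_id] = slice_aggregate(records)

    def answer_window(window_slices):
        # A recurring query over a window merges cached slice results instead of
        # rescanning raw data it shares with other, overlapping windows.
        return sum(slice_cache[s] for s in window_slices if s in slice_cache)

    for sid in range(6):
        process_slice(sid, [{"value": sid}])
    print(answer_window(range(3, 6)))           # 3-slice window -> 12
    print(answer_window(range(1, 6)))           # 5-slice window -> 15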
APA, Harvard, Vancouver, ISO, and other styles
44

Liu, Kun. "Multi-View Oriented 3D Data Processing." Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0273/document.

Full text
Abstract:
Point cloud refinement and surface reconstruction are two fundamental problems in geometry processing. Most existing methods have been targeted at range-sensor data and turn out to be ill-adapted to multi-view data. In this thesis, two novel methods are proposed for these two problems, with special attention to multi-view data. The first method smooths point clouds originating from multi-view reconstruction without impairing the data. The problem is formulated as a nonlinear constrained optimization and addressed as a series of unconstrained optimization problems by means of a barrier method. The second method triangulates point clouds into meshes using an advancing-front strategy directed by a sphere-packing criterion. The method is algorithmically simple and can produce high-quality meshes efficiently. Experiments on synthetic and real-world data demonstrate the robustness and efficiency of the methods. The developed methods are suitable for applications which require accurate and consistent position information, such as photogrammetry and tracking in computer vision.
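The barrier-method formulation lends itself to a compact illustration. The numpy sketch below smooths a noisy polyline while a log-barrier term keeps every point inside a small trust region around its measured position; the smoothness energy, radius and step size are assumptions made for the example, and the actual method works on multi-view 3D point clouds with outer iterations that progressively reduce the barrier weight.

    import numpy as np

    def smooth_with_barrier(p0, eps=0.05, mu=1e-2, steps=2000, lr=1e-3):
        p = p0.copy()
        for _ in range(steps):
            # Smoothness term: squared differences of consecutive points.
            grad = np.zeros_like(p)
            grad[1:-1] = 2 * (2 * p[1:-1] - p[:-2] - p[2:])
            # Log-barrier of the constraint ||p_i - p0_i||^2 <= eps^2, which
            # prevents the smoothing from impairing the measured data.
            d = p - p0
            slack = eps**2 - np.sum(d * d, axis=-1, keepdims=True)
            grad += mu * 2 * d / np.clip(slack, 1e-12, None)
            p -= lr * grad
        return p

    pts = np.cumsum(np.random.randn(100, 2) * 0.01, axis=0)   # noisy 2-D polyline
    smoothed = smooth_with_barrier(pts)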
APA, Harvard, Vancouver, ISO, and other styles
45

Liu, Kun. "Multi-View Oriented 3D Data Processing." Electronic Thesis or Diss., Université de Lorraine, 2015. http://www.theses.fr/2015LORR0273.

Full text
Abstract:
Point cloud refinement and surface reconstruction are two fundamental problems in geometry processing. Most existing methods have been targeted at range-sensor data and turn out to be ill-adapted to multi-view data. In this thesis, two novel methods are proposed for these two problems, with special attention to multi-view data. The first method smooths point clouds originating from multi-view reconstruction without impairing the data. The problem is formulated as a nonlinear constrained optimization and addressed as a series of unconstrained optimization problems by means of a barrier method. The second method triangulates point clouds into meshes using an advancing-front strategy directed by a sphere-packing criterion. The method is algorithmically simple and can produce high-quality meshes efficiently. Experiments on synthetic and real-world data demonstrate the robustness and efficiency of the methods. The developed methods are suitable for applications which require accurate and consistent position information, such as photogrammetry and tracking in computer vision.
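The second method relies on a sphere-packing criterion to drive the advancing front. Under simplifying assumptions, the numpy fragment below sketches one way such a criterion could score candidates: for a given front edge, prefer the point whose triangle circumradius is closest to a target sphere radius. The helper names and the scoring rule are hypothetical, not the algorithm of the thesis.

    import numpy as np

    def circumradius(a, b, c):
        # Circumradius of triangle (a, b, c) via Heron's formula.
        la, lb, lc = np.linalg.norm(b - c), np.linalg.norm(a - c), np.linalg.norm(a - b)
        s = 0.5 * (la + lb + lc)
        area = max(np.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0)), 1e-12)
        return (la * lb * lc) / (4.0 * area)

    def best_candidate(edge_a, edge_b, candidates, target_radius):
        # Pick the candidate whose triangle best matches the target sphere size.
        scores = [abs(circumradius(edge_a, edge_b, c) - target_radius) for c in candidates]
        return candidates[int(np.argmin(scores))]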
APA, Harvard, Vancouver, ISO, and other styles
46

Dao, Quang Minh. "High performance processing of metagenomics data." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS203.

Full text
Abstract:
With the advent of next-generation sequencing technology, an ever-increasing amount of genomic data is produced as the cost of sequencing decreases. This has allowed the field of metagenomics to grow rapidly. As a consequence, the bioinformatics community faces unprecedented computational bottlenecks in processing huge metagenomic datasets. Traditional metagenomics pipelines consist of several steps and use different distributed and parallel computing platforms to improve their performance. However, these tools do not scale efficiently: they exhibit heavy runtime overheads when preprocessing large amounts of data and cannot automatically scale up to harvest additional computing resources. Moreover, the lack of built-in modularity makes them difficult to maintain and evolve. Here, we designed QMSpy, a new all-in-one platform that is both scalable and modular. From the start, raw sequencing reads are stored on distributed storage and transformed into distributed objects, which are preprocessed (trimmed, cleaned, filtered, etc.), mapped against the reference genome catalogue, and counted to generate abundance tables. QMSpy was built on a high-performance computing cluster, using the PySpark framework, which supports Python on Spark and extends the Hadoop MapReduce model. QMSpy was tested with simulated and real datasets. In this pipeline, we integrated well-known bioinformatics tools such as Bowtie2, Trimmomatic, Bwa, HiSat and Minimap to process sequencing data. Our approach supports the creation of customizable workflows by using a tool wrapper to distribute external software as executable modules that are deployed on the Spark cluster and run in parallel. In addition, QMSpy can be deployed on almost all popular high-performance computing service platforms such as Google Cloud, Amazon Web Services, Microsoft Azure or Docker, and integrates flexibly into enterprise and organizational environments such as Hortonworks Data Platform, Salesforce or Teradata. By benchmarking QMSpy on real and simulated datasets, we identified some of the most important factors that influence the accuracy of the quantification process. Finally, QMSpy, with features such as scalability and modularity, allows bioinformaticians to propose new algorithms that improve the genetic, taxonomic and functional quantification of microbial ecosystems. We believe that this resource will be of great value to the field of quantitative metagenomics.
The assessment and characterization of the gut microbiome has become a focus of research in the area of human autoimmune diseases. Many conditions such as obesity, inflammatory bowel disease (IBD), lean or obese twins, colorectal cancers and so on (Qin et al. 2010; Turnbaugh et al. 2009) have already been found to be associated with changes in the human microbiome. To investigate these relationships, quantitative metagenomics (QM) studies based on sequencing data can be performed. Understanding the role of the microbiome in human health and how it can be modulated is becoming increasingly relevant for precision medicine and for the medical management of chronic diseases. Results from such QM studies, which report the organisms present in the samples and profile their abundances, are used for subsequent analyses. The terms microbiome and microbiota are used interchangeably to describe the community of microorganisms that live in a given environment. The development of high-throughput DNA sequencing technologies has boosted microbiome research through the study of microbial genomes, allowing a more precise quantification of microbial and functional abundance. However, microbiome data analysis is challenging because it involves high-dimensional, structured, multivariate, sparse data and because of the compositional structure of microbiome data. Data preprocessing is typically implemented as a pipeline (workflow) of third-party software tools that each process input files and produce output files. The pipelines are often deep, with ten or more tools, which can be written in different languages such as R, Python or Perl and integrated into different frameworks (Leipzig 2017) such as Galaxy, Apache Taverna or Toil. The challenge with existing approaches is that they are not always efficient with very large datasets, in terms of scalability of the individual tools in a metagenomics pipeline, and their execution speed has not met the expectations of bioinformaticians. To date, more and more data are captured or generated in many different research areas such as physics, climatology, sociology, remote sensing or management, as well as bioinformatics. Indeed, Big Data Analytics (BDA) describes the unprecedented growth of data generated and collected from all kinds of data sources as mentioned above. This growth can be in the volume of data, in the speed of data moving in and out, or in the speed of analyzing data, which depends on high-performance computing (HPC) technologies. In the past few decades since the invention of the computer, HPC has contributed significantly to our quality of life - driving scientific innovation, enhancing engineering design and consumer goods manufacturing, as well as strengthening national and international security. This has been recognised and emphasised by both government and industry, with major ongoing investments in areas encompassing weather forecasting, scientific research and development as well as drug design and healthcare outcomes. In many ways, those two worlds (HPC and big data) are slowly but surely converging. They are the keys to overcoming the limitations of bioinformatics analysis in general and of quantitative metagenomics analysis in particular. Within the scope of this thesis, we contribute a novel bioinformatics framework and pipeline called QMSpy, which helps bioinformaticians overcome limitations related to the HPC and big data domains in the context of quantitative metagenomics.
QMSpy tackles two challenges introduced by large-scale NGS data: (i) sequencing data alignment, a computation-intensive task, and (ii) quantifying metagenomic objects, a memory-intensive task. By leveraging a powerful distributed computing engine (Apache Spark) in combination with big data workflow management (Hortonworks Data Platform), QMSpy allows us not only to bypass [...]
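A central pattern described here - storing raw reads on distributed storage and streaming them through external bioinformatics tools as parallel executable modules - can be sketched in a few lines of PySpark. The paths, the wrapper script align_reads.sh, and its output format are hypothetical placeholders, not QMSpy's actual wrappers.

    from pyspark import SparkContext

    sc = SparkContext(appName="qm-pipeline-sketch")

    # Raw sequencing reads on distributed storage (path is illustrative).
    reads = sc.textFile("hdfs:///data/reads.fastq", minPartitions=64)

    # align_reads.sh is a hypothetical wrapper that feeds its stdin to an external
    # aligner (e.g. Bowtie2) and prints one reference-gene name per mapped read.
    mapped = reads.pipe("./align_reads.sh")

    # Count mapped reads per reference gene to build an abundance table.
    gene_counts = (mapped
        .map(lambda ref: (ref, 1))
        .reduceByKey(lambda a, b: a + b))

    gene_counts.saveAsTextFile("hdfs:///results/gene_abundance")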
APA, Harvard, Vancouver, ISO, and other styles
47

Bilalli, Besim. "Learning the impact of data pre-processing in data analysis." Doctoral thesis, Universitat Politècnica de Catalunya, 2018. http://hdl.handle.net/10803/587221.

Full text
Abstract:
There is a clear correlation between data availability and data analytics, and hence, as data availability increases --- unavoidably, according to Moore's law --- the need for data analytics increases too. This certainly engages many more people, not necessarily experts, in performing analytics tasks. However, the different, challenging, and time-consuming steps of the data analytics process overwhelm non-experts, who therefore require support (e.g., through automation or recommendations). A very important and time-consuming step that stands out from the rest is data pre-processing. Data pre-processing is challenging but at the same time has a heavy impact on the overall analysis. In this regard, previous works have focused on providing user assistance in data pre-processing, but without considering its impact on the analysis. Hence, the goal has generally been to enable analysis through data pre-processing and not to improve it. In contrast, this thesis aims at developing methods that provide assistance in data pre-processing with the sole goal of improving the result of the overall analysis (e.g., increasing the predictive accuracy of a classifier). To this end, we propose a method and define an architecture that leverages ideas from meta-learning to learn the relationship between transformations (i.e., pre-processing operators) and mining algorithms (i.e., classification algorithms). This eventually enables ranking and recommending transformations according to their potential impact on the analysis. To reach this goal, we first study the currently available methods and systems that provide user assistance, either for the individual steps of data analytics or for the whole process altogether. Next, we classify the metadata these different systems use and then specifically focus on the metadata used in meta-learning. We apply a method to study the predictive power of these metadata, and we extract and select the metadata that are most relevant. Finally, we focus on user assistance in the pre-processing step. We devise an architecture and build a tool, PRESISTANT, that, given a classification algorithm, is able to recommend pre-processing operators that, once applied, positively impact the final result (e.g., increase the predictive accuracy). Our results show that providing assistance in data pre-processing with the goal of improving the result of the analysis is feasible and also very useful for non-experts. Furthermore, this thesis is a step towards demystifying the non-trivial task of pre-processing, which has so far been an exclusive asset in the hands of experts.
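For intuition, the scikit-learn sketch below ranks a handful of pre-processing operators by their measured impact on the cross-validated accuracy of a classifier; PRESISTANT's contribution is to predict such a ranking through meta-learning instead of measuring it, and the operators and dataset here are illustrative assumptions.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)
    operators = {"none": None, "standardize": StandardScaler(),
                 "min-max": MinMaxScaler(), "power": PowerTransformer()}

    scores = {}
    for name, op in operators.items():
        model = DecisionTreeClassifier(random_state=0)
        pipe = model if op is None else make_pipeline(op, model)
        scores[name] = cross_val_score(pipe, X, y, cv=5).mean()

    # Rank transformations by their impact on predictive accuracy.
    for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {acc:.3f}")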
APA, Harvard, Vancouver, ISO, and other styles
48

Derksen, Timothy J. (Timothy John). "Processing of outliers and missing data in multivariate manufacturing data." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/38800.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.
Includes bibliographical references (leaf 64).
by Timothy J. Derksen.
M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
49

Cena, Bernard Maria. "Reconstruction for visualisation of discrete data fields using wavelet signal processing." University of Western Australia. Dept. of Computer Science, 2000. http://theses.library.uwa.edu.au/adt-WU2003.0014.

Full text
Abstract:
The reconstruction of a function and its derivative from a set of measured samples is a fundamental operation in visualisation. Multiresolution techniques, such as wavelet signal processing, are instrumental in improving the performance and algorithm design for data analysis, filtering and processing. This dissertation explores the possibilities of combining traditional multiresolution analysis and processing features of wavelets with the design of appropriate filters for reconstruction of sampled data. On the one hand, a multiresolution system allows data feature detection, analysis and filtering. Wavelets have already been proven successful in these tasks. On the other hand, a choice of discrete filter which converges to a continuous basis function under iteration permits efficient and accurate function representation by providing a “bridge” from the discrete to the continuous. A function representation method capable of both multiresolution analysis and accurate reconstruction of the underlying measured function would make a valuable tool for scientific visualisation. The aim of this dissertation is not to try to outperform existing filters designed specifically for reconstruction of sampled functions. The goal is to design a wavelet filter family which, while retaining properties necessary to perform multiresolution analysis, possesses features to enable the wavelets to be used as efficient and accurate “building blocks” for function representation. The application to visualisation is used as a means of practical demonstration of the results. Wavelet and visualisation filter design is analysed in the first part of this dissertation and a list of wavelet filter design criteria for visualisation is collated. Candidate wavelet filters are constructed based on a parameter space search of the BC-spline family and direct solution of equations describing filter properties. Further, a biorthogonal wavelet filter family is constructed based on point and average interpolating subdivision and using the lifting scheme. The main feature of these filters is their ability to reconstruct arbitrary degree piecewise polynomial functions and their derivatives using measured samples as direct input into a wavelet transform. The lifting scheme provides an intuitive, interval-adapted, time-domain filter and transform construction method. A generalised factorisation for arbitrary primal and dual order point and average interpolating filters is a result of the lifting construction. The proposed visualisation filter family is analysed quantitatively and qualitatively in the final part of the dissertation. Results from wavelet theory are used in the analysis which allow comparisons among wavelet filter families and between wavelets and filters designed specifically for reconstruction for visualisation. Lastly, the performance of the constructed wavelet filters is demonstrated in the visualisation context. One-dimensional signals are used to illustrate reconstruction performance of the wavelet filter family from noiseless and noisy samples in comparison to other wavelet filters and dedicated visualisation filters. The proposed wavelet filters converge to basis functions capable of reproducing functions that can be represented locally by arbitrary order piecewise polynomials. They are interpolating, smooth and provide asymptotically optimal reconstruction in the case when samples are used directly as wavelet coefficients. 
The reconstruction performance of the proposed wavelet filter family approaches that of continuous spatial domain filters designed specifically for reconstruction for visualisation. This is achieved in addition to retaining multiresolution analysis and processing properties of wavelets.
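The lifting construction mentioned in the abstract can be illustrated with one level of a simple interpolating wavelet transform: a predict step estimates each odd sample from its even neighbours, and an update step preserves the running average of the coarse signal. The weights below correspond to the standard linear interpolating (CDF 2,2) case with periodic boundaries, shown as a sketch rather than the dissertation's visualisation filter family.

    import numpy as np

    def lifting_forward(signal):
        # One level of a forward lifting-scheme wavelet transform (even length assumed).
        even, odd = signal[0::2].astype(float), signal[1::2].astype(float)
        # Predict: each odd sample from the average of its two even neighbours.
        predicted = 0.5 * (even + np.roll(even, -1))
        detail = odd - predicted
        # Update: correct the even samples so the coarse signal keeps the average.
        approx = even + 0.25 * (detail + np.roll(detail, 1))
        return approx, detail

    x = np.sin(np.linspace(0, 2 * np.pi, 64))
    approx, detail = lifting_forward(x)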
APA, Harvard, Vancouver, ISO, and other styles
50

Chintala, Venkatram Reddy. "Digital image data representation." Ohio : Ohio University, 1986. http://www.ohiolink.edu/etd/view.cgi?ohiou1183128563.

Full text
APA, Harvard, Vancouver, ISO, and other styles
