Dissertations / Theses on the topic 'Datasety'
Consult the top 50 dissertations / theses for your research on the topic 'Datasety.'
Zembjaková, Martina. "Prieskum a taxonómia sieťových forenzných nástrojov." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445488.
Kratochvíla, Lukáš. "Trasování objektu v reálném čase." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403748.
Singh, Manjeet. "A Comparison of Rule Extraction Techniques with Emphasis on Heuristics for Imbalanced Datasets." Ohio University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1282139633.
Silva, Jesús, Palma Hugo Hernández, Núñez William Niebles, David Ovallos-Gazabon, and Noel Varela. "Parallel Algorithm for Reduction of Data Processing Time in Big Data." Institute of Physics Publishing, 2020. http://hdl.handle.net/10757/652134.
Munyombwe, Theresa. "The harmonisation of stroke datasets : a case study of four UK datasets." Thesis, University of Leeds, 2016. http://etheses.whiterose.ac.uk/13511/.
Furman, Yoel Avraham. "Forecasting with large datasets." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:69f2833b-cc53-457a-8426-37c06df85bc2.
Mumtaz, Shahzad. "Visualisation of bioinformatics datasets." Thesis, Aston University, 2015. http://publications.aston.ac.uk/25261/.
Mazumdar, Suvodeep. "Visualising large semantic datasets." Thesis, University of Sheffield, 2013. http://etheses.whiterose.ac.uk/5932/.
De, León Eduardo Enrique. "Medical abstract inference dataset." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/119516.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (page 35).
In this thesis, I built a dataset for predicting clinical outcomes from medical abstracts and their titles. Medical Abstract Inference consists of 1,794 data points. Titles were filtered to include the abstract's reported medical intervention and clinical outcome. Data points were annotated with the intervention's effect on the outcome. Resulting labels were one of the following: increased, decreased, or had no significant difference on the outcome. In addition, rationale sentences were marked; these sentences supply the necessary supporting evidence for the overall prediction. Preliminary modeling was also done to evaluate the corpus. Preliminary models included top-performing Natural Language Inference models as well as rationale-based models and linear classifiers.
by Eduardo Enrique de León.
M. Eng.
Schöner, Holger. "Working with real world datasets preprocessing and prediction with large incomplete and heterogeneous datasets /." [S.l.] : [s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=973424672.
Gemulla, Rainer. "Sampling Algorithms for Evolving Datasets." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2008. http://nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644.
Schmidt, Heiko A. "Phylogenetic trees from large datasets." [S.l. : s.n.], 2003. http://deposit.ddb.de/cgi-bin/dokserv?idn=968534945.
Jones, Martin. "Multigene datasets for deep phylogeny." Thesis, University of Edinburgh, 2007. http://hdl.handle.net/1842/2575.
Traore, Michael. "Interactive visualization for volumetric datasets." Thesis, Toulouse, ISAE, 2018. http://www.theses.fr/2018ESAE0028.
Occlusion is an issue in volumetric visualization as it prevents direct visualization of the region of interest. While most existing systems use a combination of Direct Volume Rendering (DVR) technique and its corresponding Transfer Function (TF), we considered alternative interaction techniques to explore such datasets. First, we proposed a new interactive visualization system for 3D scanned baggage accelerated with GPGPU techniques in accordance with the needs we extracted from the contextual inquiry with the airport security agents. Secondly, we proposed a novel technique which combines high-quality DVR with a fast, versatile, and easy to use lens to support the interactive exploration of occluded data in volumes.
Giritharan, Balathasan. "Incremental Learning with Large Datasets." Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc149595/.
Barnathan, Michael. "Mining Complex High-Order Datasets." Diss., Temple University Libraries, 2010. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/82058.
Ph.D.
Selection of an appropriate structure for storage and analysis of complex datasets is a vital but often overlooked decision in the design of data mining and machine learning experiments. Most present techniques impose a matrix structure on the dataset, with rows representing observations and columns representing features. While this assumption is reasonable when features are scalar and do not exhibit co-dependence, the matrix data model becomes inappropriate when dependencies between non-target features must be modeled in parallel, or when features naturally take the form of higher-order multilinear structures. Such datasets particularly abound in functional medical imaging modalities, such as fMRI, where accurate integration of both spatial and temporal information is critical. Although necessary to take full advantage of the high-order structure of these datasets and built on well-studied mathematical tools, tensor analysis methodologies have only recently entered widespread use in the data mining community and remain relatively absent from the literature within the biomedical domain. Furthermore, naive tensor approaches suffer from fundamental efficiency problems which limit their practical use in large-scale high-order mining and do not capture local neighborhoods necessary for accurate spatiotemporal analysis. To address these issues, a comprehensive framework based on wavelet analysis, tensor decomposition, and the WaveCluster algorithm is proposed for addressing the problems of preprocessing, classification, clustering, compression, feature extraction, and latent concept discovery on large-scale high-order datasets, with a particular emphasis on applications in computer-assisted diagnosis. 
Our framework is evaluated on a 9.3 GB fMRI motor task dataset of both high dimensionality and high order, performing favorably against traditional voxelwise and spectral methods of analysis, discovering latent concepts suggestive of subject handedness, and reducing space and time complexities by up to two orders of magnitude. Novel wavelet and tensor tools are derived in the course of this work, including a novel formulation of an r-dimensional wavelet transform in terms of elementary tensor operations and an enhanced WaveCluster algorithm capable of clustering real-valued as well as binary data. Sparseness-exploiting properties are demonstrated and variations of core algorithms for specialized tasks such as image segmentation are presented.
Temple University--Theses
Alawini, Abdussalam. "Identifying Relationships between Scientific Datasets." Thesis, Portland State University, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10127966.
Scientific datasets associated with a research project can proliferate over time as a result of activities such as sharing datasets among collaborators, extending existing datasets with new measurements, and extracting subsets of data for analysis. As such datasets begin to accumulate, it becomes increasingly difficult for a scientist to keep track of their derivation history, which complicates data sharing, provenance tracking, and scientific reproducibility. Understanding what relationships exist between datasets can help scientists recall their original derivation history. For instance, if dataset A is contained in dataset B, then the connection between A and B could be that A was extended to create B.
We present a relationship-identification methodology as a solution to this problem. To examine the feasibility of our approach, we articulated a set of relevant relationships, developed algorithms for efficient discovery of these relationships, and organized these algorithms into a new system called ReConnect to assist scientists in relationship discovery. We also evaluated existing alternative approaches that rely on flagging differences between two spreadsheets and found that they were impractical for many relationship-discovery tasks. Additionally, we conducted a user study, which showed that relationships do occur in real-world spreadsheets, and that ReConnect can improve scientists' ability to detect such relationships between datasets.
The promising results of ReConnect's evaluation encouraged us to explore a more automated approach for relationship discovery. In this dissertation, we introduce an automated end-to-end prototype system, ReDiscover, that identifies, from a collection of datasets, the pairs that are most likely related, and the relationship between them. Our experimental results demonstrate the overall effectiveness of ReDiscover in predicting relationships in a scientist's or a small group of researchers' collections of datasets, and the sensitivity of the overall system to the performance of its various components.
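The containment relationship mentioned in this abstract (dataset A contained in dataset B) can be illustrated with a minimal Python sketch. This is not ReConnect's or ReDiscover's algorithm; the function name `is_contained` and the sample rows are hypothetical, and the check assumes exact row equality and ignores row order and duplicates.

```python
def is_contained(a_rows, b_rows):
    """Return True if every row of dataset A also appears in dataset B.

    Rows are compared by exact equality; order and duplicates are ignored.
    """
    return set(map(tuple, a_rows)) <= set(map(tuple, b_rows))


# Hypothetical data: B extends A with one new measurement.
dataset_a = [("s1", 4.2), ("s2", 3.9)]
dataset_b = [("s1", 4.2), ("s2", 3.9), ("s3", 5.1)]

print(is_contained(dataset_a, dataset_b))
print(is_contained(dataset_b, dataset_a))
```

A positive result in one direction only is consistent with the abstract's reading that A may have been extended to create B; real spreadsheets would likely need tolerance for reordered columns and near-matching values.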
Jayaraman, Jayakumar. "Dental age assessment of Southern Chinese using Demirjian's dataset and the United Kingdom dataset." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2010. http://hub.hku.hk/bib/B45447767.
Horečný, Peter. "Metody segmentace obrazu s malými trénovacími množinami." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412996.
Koufakou, Anna. "SCALABLE AND EFFICIENT OUTLIER DETECTION IN LARGE DISTRIBUTED DATA SETS WITH MIXED-TYPE ATTRIBUTES." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3431.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering PhD
Sysoev, Oleg. "Monotonic regression for large multivariate datasets /." Linköping : Department of Computer and Information Science, Linköping University, 2010. http://www2.bibl.liu.se/liupubl/disp/disp2010/stat11s.pdf.
Mahmood, Muhammad Habib. "Motion annotation in complex video datasets." Doctoral thesis, Universitat de Girona, 2018. http://hdl.handle.net/10803/667583.
Motion segmentation refers to the process of separating regions and trajectories of a video sequence into coherent subsets of space and time. In this thesis we have created a new, multifaceted dataset of real-life sequences that includes different numbers of motions and frames per sequence, as well as distortions with missing data. It also includes ground truth for all frames, based on trajectory and region measures. We have also proposed a new semi-automatic tool for delineating trajectories in complex videos, even in videos captured with moving cameras. With minimal manual annotation of the objects, the algorithm is able to propagate it to all frames. During occlusions, label correction is performed by applying mask tracking for each depth order. The results obtained show that our approach delivers successful results on a wide variety of video sequences.
Shi, Xiaojin. "Visual learning from small training datasets /." Diss., Digital Dissertations Database. Restricted to UC campuses, 2005. http://uclibs.org/PID/11984.
Cotter, Andrew. "Regression on datasets containing missing elements." Diss., Connect to online resource, 2005. http://wwwlib.umi.com/cr/colorado/fullcit?p1425786.
Zhang, Xiaoyu. "Scalable isocontour visualization for large datasets /." Full text (PDF) from UMI/Dissertation Abstracts International, 2001. http://wwwlib.umi.com/cr/utexas/fullcit?p3064695.
Yang, Chaozheng. "Sufficient Dimension Reduction in Complex Datasets." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/404627.
Ph.D.
This dissertation focuses on two problems in dimension reduction. The first is using a permutation approach to test predictor contribution. The permutation approach applies to marginal coordinate tests based on dimension reduction methods such as SIR, SAVE and DR. This approach no longer requires calculation of the method-specific weights to determine the asymptotic null distribution. The second is combining a clustering method with robust regression (least absolute deviation) to estimate the dimension reduction subspace. Compared with ordinary least squares, the proposed method is more robust to outliers; also, this method replaces the global linearity assumption with the more flexible local linearity assumption through k-means clustering.
Temple University--Theses
Blum, Joshua (Joshua M.). "Pinky : interactively analyzing large EEG datasets." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105939.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 75-77).
In this thesis, I describe a system I designed and implemented for interactively analyzing large electroencephalogram (EEG) datasets. Trained experts, known as encephalographers, analyze EEG data to determine if a patient has experienced an epileptic seizure. Since EEG analysis is time intensive for large datasets, there is a growing corpus of unanalyzed EEG data. Fast analysis is essential for building a set of example data of EEG results, allowing doctors to quickly classify the behavior of future EEG scans. My system aims to reduce the cost of analysis by providing near real-time interaction with the datasets. The system has three optimized layers handling the storage, computation, and visualization of the data. I evaluate the design choices for each layer and compare three different implementations across different workloads.
by Joshua Blum.
M. Eng.
Hilton, Erwin. "Visual datasets for artificial intelligence agents." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119553.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 41).
In this thesis, I designed and implemented two visual dataset generation tool frameworks. With these tools, I introduce procedurally generated new data to test VQA agents and other visual AI models on. The first tool is Spatial IQ Generative Dataset (SIQGD). This tool generates images based on the Raven's Progressive Matrices spatial IQ examination metric. The second tool is a collection of 3D models along with a Blender3D extension that renders images of the models from multiple viewpoints along with their depth maps.
by Erwin Hilton.
M. Eng.
Roizman, Violeta. "Flexible clustering algorithms for heterogeneous datasets." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG002.
The goal of the clustering task is to find groups of elements that are homogeneous with respect to a chosen distance. Given its unsupervised nature, clustering can be applied to any kind of data and there is no need to proceed to the costly labelling process. One of the most popular clustering algorithms is the one built on the Gaussian Mixture Model (GMM). This algorithm is very intuitive and works well when the clusters have an elliptical shape. Regardless of its popularity, the GMM model has a poor performance when the data points do not fulfil a basic assumption: Gaussian distributed clusters. The model performance can be strongly degraded by the non-robustness of the classical estimators implicated in the model fitting when the data contains outliers or noise. In this thesis, we give an alternative approach to the robustification of the GMM-EM method. We adopt a model based on Elliptical Symmetric distributions that manages to describe a more general range of distributions. Besides, we introduce extra parameters that increase the flexibility of our model and lead to generalizations of classical robust estimators. In order to support the robust claims about our algorithm, we provide theoretical and practical analyses that help to understand the general character of the proposal. Afterwards, we tackle the outlier rejection task. We consider a robust version of the Mahalanobis distance and study its distribution. Knowing the distribution helps us set a rejection threshold. Finally, we address two applications related to radar images through a clustering perspective. First, we consider the image segmentation task. In the end, we apply our flexible algorithm to solve the change detection problem for image time series.
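The idea of turning a distance distribution into a rejection threshold can be sketched for the classical (non-robust) Gaussian case, which is a simplification and not the robust estimator developed in the thesis. For 2-D Gaussian data with known mean and covariance, the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, whose quantile at level p has the closed form -2 ln(1 - p). All function names and the level p = 0.99 below are illustrative assumptions.

```python
import math


def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def chi2_quantile_2dof(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


def is_outlier(point, mean, cov, p=0.99):
    """Flag a point whose squared distance exceeds the level-p chi-square quantile."""
    return mahalanobis_sq_2d(point, mean, cov) > chi2_quantile_2dof(p)


# Illustrative values: standard 2-D Gaussian (zero mean, identity covariance).
mean = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 1.0))
print(is_outlier((4.0, 0.0), mean, cov))
print(is_outlier((1.0, 1.0), mean, cov))
```

With estimated rather than known parameters, or with the heavier-tailed elliptical models the abstract describes, the distance no longer follows this chi-square law exactly, which is presumably why the thesis studies the distribution of its robust variant.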
Katcoff, Abigail. "Aligning heterogenous single cell assay datasets." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123030.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 51-53).
Pluripotent stem cells offer strong promise for regenerative medicine but the pluripotent cell state is poorly understood. The goal of this thesis is the development of methods to analyze how the multiple facets of cell state (including gene expression, chromosome contacts, and chromatin accessibility) relate in the context of stem cells. The variability of each of these characteristics cannot be deduced from population studies, and while recent advances in single-cell transcriptomics have led to the development of a number of different single-cell assays, datasets that collect multiple types of assays on the same cells are rare. In this thesis, we explore the ability of three methods to integrate datasets from different single-cell assays based on an existing paired single-cell dataset of ATAC-seq and RNA-seq for human A549 cells. We then apply these methods to map the variability between three single-cell datasets (ATAC-seq, RNA-seq, and Hi-C) on pluripotent mouse embryonic stem cells and assess the performance of these methods.
by Abigail Katcoff.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Gerick, Steven Anthony. "Information Engineering with E-Learning Datasets." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-265008.
Rapid development in the e-learning industry makes fast and generalizable methods for information engineering with e-learning datasets necessary. This work applies various traditional machine learning and mathematical methods to one such dataset to identify patterns in user proficiency that cannot easily be detected by reading through the dataset. It also analyzes the generalizability of the methods, in particular where they can be used, their drawbacks, and what the datasets need to satisfy in order to be easily analyzed with them. We find that many of the methods can shed light on structures and patterns in the dataset, even though the methods are limited in effectiveness and generalizability. The methods are also easier to apply when the dataset's items are associated with clear timestamps and the students' grades have high resolution. We propose changes to the data collection technique that can simplify parallelizable large-scale application of machine learning methods to many datasets simultaneously.
Smith, Zach. "Joining and aggregating datasets using CouchDB." Master's thesis, University of Cape Town, 2018. http://hdl.handle.net/11427/29530.
Somasundaram, Jyothilakshmi. "Releasing Recommendation Datasets while Preserving Privacy." Miami University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=miami1306427987.
Han, Qian. "Mining Shared Decision Trees between Datasets." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1274807201.
Joshi, Vineet. "Unsupervised Anomaly Detection in Numerical Datasets." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744.
Liu, Fang. "Mining Security Risks from Massive Datasets." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78684.
Ph. D.
Siddique, Nahian A. "PATTERN RECOGNITION IN CLASS IMBALANCED DATASETS." VCU Scholars Compass, 2016. http://scholarscompass.vcu.edu/etd/4480.
Fraser, Ross Macdonald. "Computational analysis of nucleosome positioning datasets." Thesis, University of Edinburgh, 2006. http://hdl.handle.net/1842/29110.
Tao, F. "Data mining for relationships in large datasets." Thesis, Queen's University Belfast, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.273298.
Talár, Ondřej. "Redukce šumu audionahrávek pomocí hlubokých neuronových sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-317118.
Romuld, Daniel, and Markus Ruhmén. "Compiling attention datasets : Developing a method for annotating face datasets with human performance attention labels using crowdsourcing." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166708.
This thesis addresses the problem of detecting human attention, which is a problem in computer vision. To make progress towards solving the problem, a method was developed for creating attention labels for datasets of face images. The labels constitute a measure of the perceived level of attention of the people in the images. The work in this thesis is motivated by the lack of datasets with attention labels and the potential usefulness of the developed method. The method was constructed with a focus on maximizing the reliability and usability of the collected data and the resulting dataset. As a first step in developing the method, images of crowds were generated using the Labeled Faces in the Wild dataset. This made it possible to evaluate the level of attention of the people in the images as individuals in a crowd. This attribute was evaluated by workers on the crowdsourcing platform CrowdFlower. The responses were analyzed and combined to compute a human performance attention measure for each individual in the images. The analysis of the results showed that the responses from the CrowdFlower workers were reliable, with high internal consistency. The developed method was considered a valid approach for creating attention labels. Possible improvements were identified in several parts of the method and are reported as part of the main results of the thesis.
Liu, Qing Computer Science & Engineering Faculty of Engineering UNSW. "Summarization of very large spatial dataset." Awarded by:University of New South Wales. School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/25489.
Lamichhane, Niraj. "Prediction of Travel Time and Development of Flood Inundation Maps for Flood Warning System Including Ice Jam Scenario. A Case Study of the Grand River, Ohio." Youngstown State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1463789508.
Akuney, Arseniy. "Information flow identification in large email datasets." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/39847.
Kolloju, Naresh Kumar. "Flexible and efficient exploration of rated datasets." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44028.
Östergaard, Johan. "Planet Rendering Using Online High-Resolution Datasets." Thesis, Linköpings universitet, Institutionen för teknik och naturvetenskap, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-95360.
DUSI, VENKATA SATYA SRIDHAR. "AUTOMATED DETECTION OF FEATURES IN CFD DATASETS." MSSTATE, 2001. http://sun.library.msstate.edu/ETD-db/theses/available/etd-11082001-152601/.
Goldstein, Markus [Verfasser]. "Anomaly Detection in Large Datasets / Markus Goldstein." München : Verlag Dr. Hut, 2014. http://d-nb.info/1052374948/34.
Osman, Ahmad. "Automated evaluation of three dimensional ultrasonic datasets." Phd thesis, INSA de Lyon, 2013. http://tel.archives-ouvertes.fr/tel-00995119.
Obalappa, Dinesh Tretiak Oleh J. "Optimal caching of large multi-dimensional datasets /." Philadelphia, Pa. : Drexel University, 2004. http://dspace.library.drexel.edu/handle/1860/307.