Doctoral dissertations on the topic "ANALYZE BIG DATA"
Create an accurate reference in APA, MLA, Chicago, Harvard, and many other citation styles
Consult the top 50 doctoral dissertations on the topic "ANALYZE BIG DATA".
An "Add to bibliography" button is available next to each work in the list. Use it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a .pdf file and read its abstract online, whenever these details are available in the metadata.
Browse doctoral dissertations from a wide range of disciplines and compile an accurate bibliography.
SHARMA, DIVYA. "APPLICATION OF ML TO MAKE SENCE OF BIOLOGICAL BIG DATA IN DRUG DISCOVERY PROCESS". Thesis, DELHI TECHNOLOGICAL UNIVERSITY, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18378.
Uřídil, Martin. "Big data - použití v bankovní sféře". Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-149908.
Flike, Felix, and Markus Gervard. "BIG DATA-ANALYS INOM FOTBOLLSORGANISATIONER En studie om big data-analys och värdeskapande". Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20117.
Šoltýs, Matej. "Big Data v technológiách IBM". Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193914.
Victoria, Åkestrand, and Wisen My. "Big Data-analyser och beslutsfattande i svenska myndigheter". Thesis, Högskolan i Halmstad, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-34752.
Kleisarchaki, Sofia. "Analyse des différences dans le Big Data : Exploration, Explication, Évolution". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM055/document.
Pełny tekst źródłaVariability in Big Data refers to data whose meaning changes continuously. For instance, data derived from social platforms and from monitoring applications, exhibits great variability. This variability is essentially the result of changes in the underlying data distributions of attributes of interest, such as user opinions/ratings, computer network measurements, etc. {em Difference Analysis} aims to study variability in Big Data. To achieve that goal, data scientists need: (a) measures to compare data in various dimensions such as age for users or topic for network traffic, and (b) efficient algorithms to detect changes in massive data. In this thesis, we identify and study three novel analytical tasks to capture data variability: {em Difference Exploration, Difference Explanation} and {em Difference Evolution}.Difference Exploration is concerned with extracting the opinion of different user segments (e.g., on a movie rating website). We propose appropriate measures for comparing user opinions in the form of rating distributions, and efficient algorithms that, given an opinion of interest in the form of a rating histogram, discover agreeing and disargreeing populations. Difference Explanation tackles the question of providing a succinct explanation of differences between two datasets of interest (e.g., buying habits of two sets of customers). We propose scoring functions designed to rank explanations, and algorithms that guarantee explanation conciseness and informativeness. Finally, Difference Evolution tracks change in an input dataset over time and summarizes change at multiple time granularities. We propose a query-based approach that uses similarity measures to compare consecutive clusters over time. Our indexes and algorithms for Difference Evolution are designed to capture different data arrival rates (e.g., low, high) and different types of change (e.g., sudden, incremental). 
The utility and scalability of all our algorithms relies on hierarchies inherent in data (e.g., time, demographic).We run extensive experiments on real and synthetic datasets to validate the usefulness of the three analytical tasks and the scalability of our algorithms. We show that Difference Exploration guides end-users and data scientists in uncovering the opinion of different user segments in a scalable way. Difference Explanation reveals the need to parsimoniously summarize differences between two datasets and shows that parsimony can be achieved by exploiting hierarchy in data. Finally, our study on Difference Evolution provides strong evidence that a query-based approach is well-suited to tracking change in datasets with varying arrival rates and at multiple time granularities. Similarly, we show that different clustering approaches can be used to capture different types of change
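The abstract does not fix a particular measure for comparing rating distributions; as a minimal illustration of the Difference Exploration setting, one classical choice is the total variation distance between two segments' rating histograms (the function names and the 1-5 scale here are assumptions, not the thesis's actual measures):

```python
from collections import Counter

def rating_histogram(ratings, scale=range(1, 6)):
    """Normalize a list of 1-5 star ratings into a distribution."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return [counts.get(r, 0) / total for r in scale]

def total_variation(p, q):
    """Total variation distance between two distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

segment_a = rating_histogram([5, 5, 4, 5, 3, 4])   # e.g. one user segment
segment_b = rating_histogram([1, 2, 2, 1, 3, 2])   # e.g. a disagreeing segment
print(total_variation(segment_a, segment_b))
```

A large distance flags the two segments as candidates for "disagreeing populations" in the sense described above.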
Nováková, Martina. "Analýza Big Data v oblasti zdravotnictví". Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-201737.
El Alaoui, Imane. "Transformer les big social data en prévisions - méthodes et technologies : Application à l'analyse de sentiments". Thesis, Angers, 2018. http://www.theses.fr/2018ANGE0011/document.
Pełny tekst źródłaExtracting public opinion by analyzing Big Social data has grown substantially due to its interactive nature, in real time. In fact, our actions on social media generate digital traces that are closely related to our personal lives and can be used to accompany major events by analysing peoples' behavior. It is in this context that we are particularly interested in Big Data analysis methods. The volume of these daily-generated traces increases exponentially creating massive loads of information, known as big data. Such important volume of information cannot be stored nor dealt with using the conventional tools, and so new tools have emerged to help us cope with the big data challenges. For this, the aim of the first part of this manuscript is to go through the pros and cons of these tools, compare their respective performances and highlight some of its interrelated applications such as health, marketing and politics. Also, we introduce the general context of big data, Hadoop and its different distributions. We provide a comprehensive overview of big data tools and their related applications.The main contribution of this PHD thesis is to propose a generic analysis approach to automatically detect trends on given topics from big social data. Indeed, given a very small set of manually annotated hashtags, the proposed approach transfers information from hashtags known sentiments (positive or negative) to individual words. The resulting lexical resource is a large-scale lexicon of polarity whose efficiency is measured against different tasks of sentiment analysis. The comparison of our method with different paradigms in literature confirms the impact of our method to design accurate sentiment analysis systems. Indeed, our model reaches an overall accuracy of 90.21%, significantly exceeding the current models on social sentiment analysis
Pragarauskaitė, Julija. "Frequent pattern analysis for decision making in big data". Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2013. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2013~D_20130701_092451-80961.
Pełny tekst źródłaDidžiuliai informacijos kiekiai yra sukaupiami kiekvieną dieną pasaulyje bei jie sparčiai auga. Apytiksliai duomenų tyrybos algoritmai yra labai svarbūs analizuojant tokius didelius duomenų kiekius, nes algoritmų greitis yra ypač svarbus daugelyje sričių, tuo tarpu tikslieji metodai paprastai yra lėti bei naudojami tik uždaviniuose, kuriuose reikalingas tikslus atsakymas. Ši disertacija analizuoja kelias duomenų tyrybos sritis: dažnų sekų paiešką bei vizualizaciją sprendimų priėmimui. Dažnų sekų paieškai buvo pasiūlyti trys nauji apytiksliai metodai, kurie buvo testuojami naudojant tikras bei dirbtinai sugeneruotas duomenų bazes: • Atsitiktinės imties metodas (Random Sampling Method - RSM) formuoja pradinės duomenų bazės atsitiktinę imtį ir nustato dažnas sekas, remiantis atsitiktinės imties analizės rezultatais. Šio metodo privalumas yra teorinis paklaidų tikimybių įvertinimas, naudojantis standartiniais statistiniais metodais. • Daugybinio perskaičiavimo metodas (Multiple Re-sampling Method - MRM) yra RSM metodo patobulinimas, kuris formuoja kelias pradinės duomenų bazės atsitiktines imtis ir taip sumažina paklaidų tikimybes. • Markovo savybe besiremiantis metodas (Markov Property Based Method - MPBM) kelis kartus skaito pradinę duomenų bazę, priklausomai nuo Markovo proceso eilės, bei apskaičiuoja empirinius dažnius remdamasis Markovo savybe. Didelio duomenų kiekio vizualizavimui buvo naudojami pirkėjų internetu elgsenos duomenys, kurie analizuojami naudojant... [toliau žr. visą tekstą]
Landelius, Cecilia. "Data governance in big data : How to improve data quality in a decentralized organization". Thesis, KTH, Industriell ekonomi och organisation (Inst.), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301258.
Pełny tekst źródłaDen ökade användningen av internet har ökat mängden data som finns tillgänglig och mängden data som samlas in. Företag påbörjar därför initiativ för att analysera dessa stora mängder data för att få ökad förståelse. Dock är värdet av analysen samt besluten som baseras på analysen beroende av kvaliteten av den underliggande data. Av denna anledning har datakvalitet blivit en viktig fråga för företag. Misslyckanden i datakvalitetshantering är ofta på grund av organisatoriska aspekter. Eftersom decentraliserade organisationsformer blir alltmer populära, finns det ett behov av att förstå hur en decentraliserad organisation kan arbeta med frågor som datakvalitet och dess förbättring. Denna uppsats är en kvalitativ studie av ett företag inom logistikbranschen som i nuläget genomgår ett skifte till att bli datadrivna och som har problem med att underhålla sin datakvalitet. Syftet med denna uppsats är att besvara frågorna: • RQ1: Vad är datakvalitet i sammanhanget logistikdata? • RQ2: Vilka är hindren för att förbättra datakvalitet i en decentraliserad organisation? • RQ3: Hur kan dessa hinder överkommas? Flera datakvalitetsdimensioner identifierades och kategoriserades som kritiska problem, problem och icke-problem. Från den insamlade informationen fanns att dimensionerna, kompletthet, exakthet och konsekvens var kritiska datakvalitetsproblem för företaget. De tre mest förekommande hindren för att förbättra datakvalité var dataägandeskap, standardisering av data samt att förstå vikten av datakvalitet. För att överkomma dessa hinder är de viktigaste åtgärderna att skapa strukturer för dataägandeskap, att implementera praxis för hantering av datakvalitet samt att ändra attityden hos de anställda gentemot datakvalitet till en datadriven attityd. Generaliseringsbarheten av en enfallsstudie är låg. Dock medför denna studie flera viktiga insikter och trender vilka kan användas för framtida studier och för företag som genomgår liknande transformationer.
Grüning, Björn [Verfasser], i Stefan [Akademischer Betreuer] Günther. "Integrierte bioinformatische Methoden zur reproduzierbaren und transparenten Hochdurchsatz-Analyse von Life Science Big Data". Freiburg : Universität, 2015. http://d-nb.info/1122593996/34.
Åhlander, Niclas, and Saed Aldaamsah. "Inhämtning & analys av Big Data med fokus på sociala medier". Thesis, Högskolan i Halmstad, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-29978.
Rivetti di Val Cervo, Nicolo. "Efficient Stream Analysis and its Application to Big Data Processing". Thesis, Nantes, 2016. http://www.theses.fr/2016NANT4046/document.
Pełny tekst źródłaNowadays stream analysis is used in many context where the amount of data and/or the rate at which it is generated rules out other approaches (e.g., batch processing). The data streaming model provides randomized and/or approximated solutions to compute specific functions over (distributed) stream(s) of data-items in worst case scenarios, while striving for small resources usage. In particular, we look into two classical and related data streaming problems: frequency estimation and (distributed) heavy hitters. A less common field of application is stream processing which is somehow complementary and more practical, providing efficient and highly scalable frameworks to perform soft real-time generic computation on streams, relying on cloud computing. This duality allows us to apply data streaming solutions to optimize stream processing systems. In this thesis, we provide a novel algorithm to track heavy hitters in distributed streams and two extensions of a well-known algorithm to estimate the frequencies of data items. We also tackle two related problems and their solution: provide even partitioning of the item universe based on their weights and provide an estimation of the values carried by the items of the stream. We then apply these results to both network monitoring and stream processing. In particular, we leverage these solutions to perform load shedding as well as to load balance parallelized operators in stream processing systems
Chen, Longbiao. "Big data-driven optimization in transportation and communication networks". Electronic Thesis or Diss., Sorbonne université, 2018. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2018SORUS393.pdf.
Pełny tekst źródłaThe evolution of metropolitan structures and the development of urban systems have created various kinds of urban networks, among which two types of networks are of great importance for our daily life, the transportation networks corresponding to human mobility in the physical space, and the communication networks supporting human interactions in the digital space. The rapid expansion in the scope and scale of these two networks raises a series of fundamental research questions on how to optimize these networks for their users. Some of the major objectives include demand responsiveness, anomaly awareness, cost effectiveness, energy efficiency, and service quality. Despite the distinct design intentions and implementation technologies, both the transportation and communication networks share common fundamental structures, and exhibit similar spatio-temporal dynamics. Correspondingly, there exists an array of key challenges that are common in the optimization in both networks, including network profiling, mobility prediction, traffic clustering, and resource allocation. To achieve the optimization objectives and address the research challenges, various analytical models, optimization algorithms, and simulation systems have been proposed and extensively studied across multiple disciplines. Generally, these simulation-based models are not evaluated in real-world networks, which may lead to sub-optimal results in deployment. With the emergence of ubiquitous sensing, communication and computing diagrams, a massive number of urban network data can be collected. Recent advances in big data analytics techniques have provided researchers great potentials to understand these data. Motivated by this trend, we aim to explore a new big data-driven network optimization paradigm, in which we address the above-mentioned research challenges by applying state-of-the-art data analytics methods to achieve network optimization goals. 
Following this research direction, in this dissertation, we propose two data-driven algorithms for network traffic clustering and user mobility prediction, and apply these algorithms to real-world optimization tasks in the transportation and communication networks. First, by analyzing large-scale traffic datasets from both networks, we propose a graph-based traffic clustering algorithm to better understand the traffic similarities and variations across different area and time. Upon this basis, we apply the traffic clustering algorithm to the following two network optimization applications. 1. Dynamic traffic clustering for demand-responsive bikeshare networks. In this application, we dynamically cluster bike stations with similar usage patterns to obtain stable and predictable cluster-wise bike traffic demands, so as to foresee over-demand stations in the network and enable demand-responsive bike scheduling. Evaluation results using real-world data from New York City and Washington, D.C. show that our framework accurately foresees over-demand clusters (e.g. with 0.882 precision and 0.938 recall in NYC), and outperforms other baseline methods significantly. 2. Complementary traffic clustering for cost-effective C-RAN. In this application, we cluster RRHs with complementary traffic patterns (e.g., an RRH in residential area and an RRH in business district) to reuse the total capacity of the BBUs, so as to reduce the overall deployment cost. We evaluate our framework with real-world network data collected from the city of Milan, Italy and the province of Trentino, Italy. Results show that our method effectively reduces the overall deployment cost to 48.4\% and 51.7\% of the traditional RAN architecture in the two datasets, respectively, and consistently outperforms other baseline methods. Second, by analyzing large-scale user mobility datasets from both networks, we propose [...]
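The dissertation's graph-based clustering operates on real bikeshare and cellular traffic; as a self-contained sketch of the underlying idea, stations whose demand series correlate strongly can be linked in a graph and clustered as connected components (the threshold and the data below are illustrative, not the thesis's algorithm):

```python
def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def cluster_stations(series, threshold=0.9):
    """Link stations whose demand series correlate above threshold,
    then return connected components as clusters."""
    names = list(series)
    adj = {s: set() for s in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                adj[a].add(b)
                adj[b].add(a)
    seen, clusters = set(), []
    for s in names:
        if s not in seen:
            stack, comp = [s], set()
            while stack:
                u = stack.pop()
                if u not in comp:
                    comp.add(u)
                    stack.extend(adj[u] - comp)
            seen |= comp
            clusters.append(comp)
    return clusters

demand = {
    "station_1": [10, 50, 8, 45],   # hypothetical commuter pattern
    "station_2": [12, 55, 9, 48],   # similar commuter pattern
    "station_3": [40, 5, 42, 6],    # opposite, leisure pattern
}
print(cluster_stations(demand))
```

Cluster-level demand is more stable than station-level demand, which is what makes the over-demand forecasting described above tractable.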
Tian, Yongchao. "Accéler la préparation des données pour l'analyse du big data". Thesis, Paris, ENST, 2017. http://www.theses.fr/2017ENST0017/document.
Pełny tekst źródłaWe are living in a big data world, where data is being generated in high volume, high velocity and high variety. Big data brings enormous values and benefits, so that data analytics has become a critically important driver of business success across all sectors. However, if the data is not analyzed fast enough, the benefits of big data will be limited or even lost. Despite the existence of many modern large-scale data analysis systems, data preparation which is the most time-consuming process in data analytics has not received sufficient attention yet. In this thesis, we study the problem of how to accelerate data preparation for big data analytics. In particular, we focus on two major data preparation steps, data loading and data cleaning. As the first contribution of this thesis, we design DiNoDB, a SQL-on-Hadoop system which achieves interactive-speed query execution without requiring data loading. Modern applications involve heavy batch processing jobs over large volume of data and at the same time require efficient ad-hoc interactive analytics on temporary data generated in batch processing jobs. Existing solutions largely ignore the synergy between these two aspects, requiring to load the entire temporary dataset to achieve interactive queries. In contrast, DiNoDB avoids the expensive data loading and transformation phase. The key innovation of DiNoDB is to piggyback on the batch processing phase the creation of metadata, that DiNoDB exploits to expedite the interactive queries. The second contribution is a distributed stream data cleaning system, called Bleach. Existing scalable data cleaning approaches rely on batch processing to improve data quality, which are very time-consuming in nature. We target at stream data cleaning in which data is cleaned incrementally in real-time. Bleach is the first qualitative stream data cleaning system, which achieves both real-time violation detection and data repair on a dirty data stream. 
It relies on efficient, compact and distributed data structures to maintain the necessary state to clean data, and also supports rule dynamics. We demonstrate that the two resulting systems, DiNoDB and Bleach, both of which achieve excellent performance compared to state-of-the-art approaches in our experimental evaluations, and can help data scientists significantly reduce their time spent on data preparation
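Bleach's actual design (distributed state, rule dynamics, repair) is far richer; the core notion of incremental violation detection on a stream can be illustrated for a single functional dependency (the record layout and names are hypothetical):

```python
def stream_fd_violations(stream, lhs, rhs):
    """Incrementally check a functional dependency lhs -> rhs on a record
    stream, reporting each record that contradicts a previously seen mapping."""
    seen = {}          # lhs value -> first rhs value observed
    violations = []
    for record in stream:
        key, value = record[lhs], record[rhs]
        if key in seen and seen[key] != value:
            violations.append(record)
        else:
            seen.setdefault(key, value)
    return violations

records = [
    {"zip": "75001", "city": "Paris"},
    {"zip": "75001", "city": "Paris"},
    {"zip": "75001", "city": "Lyon"},   # violates zip -> city
]
print(stream_fd_violations(records, "zip", "city"))
```

Each record is examined once against compact state, rather than re-running a batch cleaning job over the whole dataset, which is the contrast with batch approaches drawn above.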
Rodriguez, Pellière Lineth Arelys. "A qualitative analysis to investigate the enablers of big data analytics that impacts sustainable supply chain". Thesis, Ecole centrale de Nantes, 2019. http://www.theses.fr/2019ECDN0019/document.
Pełny tekst źródłaScholars and practitioners already shown that Big Data and Predictive Analytics also known in the literature as BDPA can play a pivotal role in transforming and improving the functions of sustainable supply chain analytics (SSCA). However, there is limited knowledge about how BDPA can be best leveraged to grow social, environmental and financial performance simultaneously. Therefore, with the knowledge coming from literature around SSCA, it seems that companies still struggled to implement SSCA practices. Researchers agree that is still a need to understand the techniques, tools, and enablers of the basics SSCA for its adoption; this is even more important to integrate BDPA as a strategic asset across business activities. Hence, this study investigates, for instance, what are the enablers of SSCA, and what are the tools and techniques of BDPA that enable the triple bottom line (3BL) of sustainability performances through SCA. The thesis adopted moderate constructionism since understanding of how the enablers of big data impacts sustainable supply chain analytics applications and performances. The thesis also adopted a questionnaire and a case study as a research strategy in order to capture the different perceptions of the people and the company on big data application on sustainable supply chain analytics. The thesis revealed a better insight of the factors that can affect in the adoption of big data on sustainable supply chain analytics. This research was capable to find the factors depending on the variable loadings that impact in the adoption of BDPA for SSCA, tools and techniques that enable decision making through SSCA, and the coefficient of each factor for facilitating or delaying sustainability adoption that wasn’t investigated before. The findings of the thesis suggest that the current tools that companies are using by itself can’t analyses data. The companies need more appropriate tools for the data analysis
Pšurný, Michal. "Big data analýzy a statistické zpracování metadat v archivu obrazové zdravotnické dokumentace". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-316821.
Botes, André Romeo. "An artefact to analyse unstructured document data stores / by André Romeo Botes". Thesis, North-West University, 2014. http://hdl.handle.net/10394/10608.
Pełny tekst źródłaMSc (Computer Science), North-West University, Vaal Triangle Campus, 2014
Chennen, Kirsley. "Maladies rares et "Big Data" : solutions bioinformatiques vers une analyse guidée par les connaissances : applications aux ciliopathies". Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAJ076/document.
Pełny tekst źródłaOver the last decade, biomedical research and medical practice have been revolutionized by the post-genomic era and the emergence of Big Data in biology. The field of rare diseases, are characterized by scarcity from the patient to the domain knowledge. Nevertheless, rare diseases represent a real interest as the fundamental knowledge accumulated as well as the developed therapeutic solutions can also benefit to common underlying disorders. This thesis focuses on the development of new bioinformatics solutions, integrating Big Data and Big Data associated approaches to improve the study of rare diseases. In particular, my work resulted in (i) the creation of PubAthena, a tool for the recommendation of relevant literature updates, (ii) the development of a tool for the analysis of exome datasets, VarScrut, which combines multi-level knowledge to improve the resolution rate
Sinkala, Musalula. "Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers". Doctoral thesis, Faculty of Health Sciences, 2020. http://hdl.handle.net/11427/32983.
Adjout, Rehab Moufida. "Big Data : le nouvel enjeu de l'apprentissage à partir des données massives". Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCD052.
Pełny tekst źródłaIn recent years we have witnessed a tremendous growth in the volume of data generatedpartly due to the continuous development of information technologies. Managing theseamounts of data requires fundamental changes in the architecture of data managementsystems in order to adapt to large and complex data. Single-based machines have notthe required capacity to process such massive data which motivates the need for scalablesolutions.This thesis focuses on building scalable data management systems for treating largeamounts of data. Our objective is to study the scalability of supervised machine learningmethods in large-scale scenarios. In fact, in most of existing algorithms and datastructures,there is a trade-off between efficiency, complexity, scalability. To addressthese issues, we explore recent techniques for distributed learning in order to overcomethe limitations of current learning algorithms.Our contribution consists of two new machine learning approaches for large scale data.The first contribution tackles the problem of scalability of Multiple Linear Regressionin distributed environments, which permits to learn quickly from massive volumes ofexisting data using parallel computing and a divide and-conquer approach to providethe same coefficients like the classic approach.The second contribution introduces a new scalable approach for ensembles of modelswhich allows both learning and pruning be deployed in a distributed environment.Both approaches have been evaluated on a variety of datasets for regression rangingfrom some thousands to several millions of examples. The experimental results showthat the proposed approaches are competitive in terms of predictive performance while reducing significantly the time of training and prediction
Lindh, Felicia, i Anna Södersten. "Användning av Big Data-analys vid revision : En jämförelse mellan revisionsbyråers framställning och revisionsteamens användning". Thesis, Luleå tekniska universitet, Institutionen för ekonomi, teknik, konst och samhälle, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85115.
Bycroft, Clare. "Genomic data analyses for population history and population health". Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:c8a76d94-ded6-4a16-b5af-09bbad6292a2.
Al-Odat, Zeyad Abdel-Hameed. "Analyses, Mitigation and Applications of Secure Hash Algorithms". Diss., North Dakota State University, 2020. https://hdl.handle.net/10365/32058.
Belghache, Elhadi. "AMAS4BigData : analyse dynamique de grandes masses de données par systèmes multi-agents adaptatifs". Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30149.
Pełny tekst źródłaUnderstanding data is the main purpose of data science and how to achieve it is one of the challenges of data science, especially when dealing with big data. The big data era brought us new data processing and data management challenges to face. Existing state-of-the-art analytics tools come now close to handle ongoing challenges and provide satisfactory results with reasonable cost. But the speed at which new data is generated and the need to manage changes in data both for content and structure lead to new rising challenges. This is especially true in the context of complex systems with strong dynamics, as in for instance large scale ambient systems. One existing technology that has been shown as particularly relevant for modeling, simulating and solving problems in complex systems are Multi-Agent Systems. The AMAS (Adaptive Multi-Agent Systems) theory proposes to solve complex problems for which there is no known algorithmic solution by self-organization. The cooperative behavior of the agents enables the system to self-adapt to a dynamical environment so as to maintain the system in a functionality adequate state. In this thesis, we apply this theory to Big Data Analytics. In order to find meaning and relevant information drowned in the data flood, while overcoming big data challenges, a novel analytic tool is needed, able to continuously find relations between data, evaluate them and detect their changes and evolution over time. The aim of this thesis is to present the AMAS4BigData analytics framework based on the Adaptive Multi-agent systems technology, which uses a new data similarity metric, the Dynamics Correlation, for dynamic data relations discovery and dynamic display. This framework is currently being applied in the neOCampus operation, the ambient campus of the University Toulouse III - Paul Sabatier
Lindström, Maja. "Food Industry Sales Prediction : A Big Data Analysis & Sales Forecast of Bake-off Products". Thesis, Umeå universitet, Institutionen för fysik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184184.
Pełny tekst źródłaI denna avhandling har försäljningen av matbröd och fikabröd på Coop Värmland AB studerats. Målet var att hitta vilka faktorer som är viktiga för försäljningen och sedan förutsäga hur försäljningen kommer att se ut i framtiden för att minska svinn och öka vin- ster. Big data- analys och explorativ dataanalys har använts för att lära känna datat och hitta de faktorer som påverkar försäljningen mest. Tidsserieprediktion och olika mask- ininlärningsmodeller användes för att förutspå den framtida försäljningen. Huvudfokus var fem olika modeller som jämfördes och analyserades. De var Decision tree regression, Random forest regression, Artificial neural networks, Recurrent neural networks och en tidsseriemodell som kallas Prophet. Jämförelse mellan de observerade värdena och de värden som predicerats med modellerna indikerade att de modeller som är baserade på tidsserierna är att föredra, det vill säga Prophet och Recurrent neural networks. Dessa två modeller gav de lägsta felen och därmed de mest exakta resultaten. Prophet gav genomsnittliga absoluta procentuella fel på 8.295% för matbröd och 9.156% för fikabröd. Recurrent neural network gav genomsnittliga absoluta procentuella fel på 7.938% för matbröd och 13.12% för fikabröd. Det är ungefär dubbelt så korrekt som de modeller de använder idag på Coop som baseras på medelvärdet av tidigare försäljning.
Leonardelli, Lorena. "Grapevine acidity: SVM tool development and NGS data analyses". Doctoral thesis, University of Trento, 2014. http://eprints-phd.biblio.unitn.it/1350/1/PhD-Thesis.pdf.
Kozas, Anastasios. "OLAP-Analyse von Propagationsprozessen". [S.l. : s.n.], 2005. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB12168115.
Leonardelli, Lorena. "Grapevine acidity: SVM tool development and NGS data analyses". Doctoral thesis, Università degli studi di Trento, 2014. https://hdl.handle.net/11572/368613.
Mansiaux, Yohann. "Analyse d'un grand jeu de données en épidémiologie : problématiques et perspectives méthodologiques". Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066272/document.
Pełny tekst źródła
The increasing size of datasets is a growing issue in epidemiology. The CoPanFlu-France cohort (1450 subjects), intended to study H1N1 pandemic influenza infection risk as a combination of biological, environmental, socio-demographic and behavioral factors, and in which hundreds of covariates are collected for each patient, is a good example. The statistical methods usually employed to explore associations have many limits in this context. We compare the contribution of data-driven exploratory methods, assuming the absence of a priori hypotheses, to hypothesis-driven methods, requiring the development of preliminary hypotheses.
Firstly, a data-driven study is presented, assessing the ability to detect influenza infection determinants of two data-mining methods, random forests (RF) and boosted regression trees (BRT), of the conventional logistic regression framework (Univariate Followed by Multivariate Logistic Regression - UFMLR) and of the Least Absolute Shrinkage and Selection Operator (LASSO), with penalty in multivariate logistic regression to achieve a sparse selection of covariates. A simulation approach was used to estimate the True (TPR) and False (FPR) Positive Rates associated with these methods. Between three and twenty-four determinants of infection were identified, the pre-epidemic antibody titer being the unique covariate selected with all methods. The mean TPR was highest for RF (85%) and BRT (80%), followed by the LASSO (up to 78%), while the UFMLR methodology was inefficient (below 50%). A slight increase of alpha risk (mean FPR up to 9%) was observed for logistic-regression-based models, LASSO included, while the mean FPR was 4% for the data-mining methods.
Secondly, we propose a hypothesis-driven causal analysis of the infection risk, with a structural equation model (SEM). We exploited the SEM specificity of modeling latent variables to study very diverse factors, their relative impact on the infection, as well as their eventual relationships. Only the latent variables describing host susceptibility (modeled by the pre-epidemic antibody titer) and compliance with preventive behaviors were directly associated with infection. The behavioral factors describing risk perception and preventive measures perception positively influenced compliance with preventive behaviors. The intensity (number and duration) of social contacts was not associated with the infection.
This thesis shows the necessity of considering novel statistical approaches for the analysis of large datasets in epidemiology. Data mining and LASSO are credible alternatives to the tools generally used to explore associations with a high number of variables. SEM allows the integration of variables describing diverse dimensions and the explicit modeling of their relationships; these models are therefore of major interest in a multidisciplinary study such as CoPanFlu.
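The simulation framework described in this abstract scores each selection method by how often it recovers the true determinants (TPR) versus noise covariates (FPR). A minimal sketch of that scoring step, with hypothetical variable names not taken from the thesis:

```python
def selection_rates(selected, true_covariates, all_covariates):
    """True/False Positive Rates for one simulated covariate-selection run."""
    selected, true_set = set(selected), set(true_covariates)
    noise = set(all_covariates) - true_set
    tpr = len(selected & true_set) / len(true_set)
    fpr = len(selected & noise) / len(noise) if noise else 0.0
    return tpr, fpr

# Example: 2 of 3 true determinants recovered, 1 of 7 noise covariates kept
tpr, fpr = selection_rates(
    selected=["antibody_titer", "age", "x3"],
    true_covariates=["antibody_titer", "age", "contacts"],
    all_covariates=[f"x{i}" for i in range(7)] + ["antibody_titer", "age", "contacts"],
)
```

Averaging these two rates over many simulated datasets gives the per-method mean TPR and FPR figures quoted above.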
Scholz, Matthias. "Approaches to analyse and interpret biological profile data". Phd thesis, [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=980988799.
Pełny tekst źródła
El, Ouazzani Saïd. "Analyse des politiques publiques en matière d’adoption du cloud computing et du big data : une approche comparative des modèles français et marocain". Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLE009/document.
Pełny tekst źródła
Our research concerns the analysis of public policy on how cloud computing and big data are adopted by the French and Moroccan States, with a comparative approach between the two models. We have covered these main areas: the impact of digital technology on the organization of States and governments; digital public policy in both France and Morocco; the concepts related to data protection and data privacy; the limits between security, in particular homeland security, and civil liberties; the future and governance of the Internet; and a use case on how the cloud could change the daily work of a public administration. Our research aims to analyze how the public sector could be impacted by the current digital (r)evolution and how the States could be changed by the emergence of a new model in the digital area, the Cyber-State. This new concept is a representation of the State in cyberspace. We tried to analyze the digital transformation by looking at how the public authorities treat the new economic, security and social issues and challenges, with cloud computing and big data as the key elements of the digital transformation. We also tried to understand how the States – France and Morocco – face the new security challenges and how they fight terrorism, in particular in cyberspace. We studied the recent adoption of new laws and legislation that aim to regulate digital activities. We analyzed the limits between security risks and civil liberties in the context of terrorist attacks, and the concepts related to data privacy and data protection. Finally, we also focused on the future of the internet, the impacts on the current internet architecture, and the challenges of keeping it free and available as is the case today.
Leonardelli, Lorena. "Grapevine acidity: SVM tool development and NGS data analyses". Doctoral thesis, country:IT, 2014. http://hdl.handle.net/10449/24467.
Pełny tekst źródła
Carel, Léna. "Analyse de données volumineuses dans le domaine du transport". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLG001/document.
Pełny tekst źródła
The aim of this thesis is to apply new methodologies to public transportation data. Indeed, we are more and more surrounded by sensors and computers generating huge amounts of data. In the field of public transportation, smart cards generate data about our purchases and our travels every time we use them. In this thesis, we used this data for two purposes. First of all, we wanted to be able to detect passenger groups with similar temporal habits. To that end, we began by using Non-negative Matrix Factorization as a pre-processing tool for clustering. Then, we introduced the NMF-EM algorithm, allowing simultaneous dimension reduction and clustering on a multinomial mixture model. The second purpose of this thesis is to apply regression methods to these data in order to forecast the number of check-ins on a network and give a range of likely check-ins. We also used this methodology to detect anomalies on the network.
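The NMF pre-processing step mentioned in this abstract factorizes a non-negative passenger-by-time check-in matrix into low-rank parts before clustering. A toy sketch using Lee-Seung multiplicative updates in plain NumPy; matrix sizes and rank are illustrative, and the thesis's own NMF-EM algorithm is more involved:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((30, 24))  # e.g. 30 smart cards x 24 hourly check-in counts

def nmf(V, rank=4, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||V - WH|| (Frobenius)."""
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis profiles
    return W, H

W, H = nmf(V)
# Each row of W is a low-dimensional temporal profile, ready for clustering
```

The updates keep both factors non-negative, which is what makes the rows of W interpretable as additive temporal habit profiles.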
Ren, Zheng. "Case Studies on Fractal and Topological Analyses of Geographic Features Regarding Scale Issues". Thesis, Högskolan i Gävle, Samhällsbyggnad, GIS, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-23996.
Pełny tekst źródła
Asadi, Abduljabbar [Verfasser], i Peter [Akademischer Betreuer] Dietrich. "Advanced Data Mining and Machine Learning Algorithms for Integrated Computer-Based Analyses of Big Environmental Databases / Abduljabbar Asadi ; Betreuer: Peter Dietrich". Tübingen : Universitätsbibliothek Tübingen, 2017. http://d-nb.info/1199392979/34.
Pełny tekst źródła
GRIMAUDO, LUIGI. "Data Mining Algorithms for Internet Data: from Transport to Application Layer". Doctoral thesis, Politecnico di Torino, 2014. http://hdl.handle.net/11583/2537089.
Pełny tekst źródła
Coquidé, Célestin. "Analyse de réseaux complexes réels via des méthodes issues de la matrice de Google". Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCD038.
Pełny tekst źródła
In a period where people use the Internet more and more and are connected worldwide, our lives become easier. Network science, a recent scientific domain coming from graph theory, handles such connected complex systems. A network is a mathematical object consisting of a set of nodes and a set of links connecting them. We find networks in nature, such as networks of mycelium, which grow underground and are able to feed their cells with organic nutrients located at short and long range from them, or the circulatory system transporting blood throughout the human body. Networks also exist at a human scale, where humans are the nodes. In this thesis we are interested in what we call real complex networks, which are networks constructed from databases. We can extract information which is normally hard to get, since such a network might contain one million nodes and one hundred times more links. Moreover, the networks we study are directed, meaning that links have a direction. One can represent a random walk through a directed network with the so-called Google matrix. The PageRank is the leading eigenvector associated with this stochastic matrix and allows us to measure node importance. We can also build a smaller Google matrix based on the Google matrix and a subregion of the network. This reduced Google matrix allows us to extract every existing link between the nodes composing the subregion of interest, as well as all possible indirect connections between them spreading through the entire network. With the tools developed from the Google matrix, especially the reduced Google matrix, and considering the network of Wikipedia's articles, we have identified interactions between universities of the world as well as their influence. We have extracted social trends by using data related to actual Wikipedia user behaviour.
Regarding the World Trade Network, we were able to measure the economic response of the European Union to external petroleum and gas price variations. Regarding the world network of economic activities, we have identified the interdependence of production sectors related to powerhouses such as the United States of America and China. We also built a crisis contagion model that we applied to the World Trade Network and to the Bitcoin transaction network.
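The PageRank used in this line of work is the leading eigenvector of the Google matrix G = alpha*S + (1-alpha)/N, obtainable by power iteration. A minimal sketch on a hypothetical 4-node directed network (the graph, damping factor and iteration count are illustrative):

```python
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # toy directed network
N, alpha = 4, 0.85

# Column-stochastic transition matrix S: S[j, i] = 1/outdeg(i) if i -> j
S = np.zeros((N, N))
for i, outs in links.items():
    for j in outs:
        S[j, i] = 1.0 / len(outs)

G = alpha * S + (1 - alpha) / N * np.ones((N, N))  # Google matrix

p = np.full(N, 1.0 / N)
for _ in range(100):  # power iteration converges to the leading eigenvector
    p = G @ p
p /= p.sum()
# p approximates the PageRank; node 2, which collects most links, ranks first
```

The reduced Google matrix studied in these theses is built on top of this object, restricting G to a subset of nodes while accounting for all indirect paths through the rest of the network.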
Walczak, Nathalie. "La protection des données personnelles sur l’internet.- Analyse des discours et des enjeux sociopolitiques". Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO20052/document.
Pełny tekst źródła
This thesis, in Communication and Information Sciences, raises the question of the protection of personal data on the internet through the analysis of the discourses of four actors concerned with this subject: internet companies, regulatory authorities, the French population and the national press. The objective is to understand how, through the discourses of each of these actors, the question of the blurring of the private and public spheres on the Internet takes shape. This question grows with the development of the Internet, in particular with the multiplication of digital social networks, which give Internet users various opportunities to display their private lives. The multiplication of interpersonal connection devices is thus accompanied by a contemporary dialectic between the private and public spheres, not always controlled by the people concerned. This interaction between private and public leads to a shift of the border which separates the two spheres and can involve drifts on the part of specialized companies, such as Google and Facebook, toward the aggregation of personal data. Indeed, databases are central to the economic system of these companies and have gained commercial value. However, the commercial use of these data is not necessarily known to the user and can occur without their agreement, at least in an explicit way. This double questioning related to the blurring of the private and public spheres, i.e., firstly, the individual aspect, where the Internet user is incited to reveal more and more personal elements, and, secondly, the aspect related to the selling of data by Internet companies, then raises the question of individual freedom and data confidentiality.
The regulatory authorities, in France and in the European Union, try to provide answers in order to protect Internet users, by setting up actions relating to the right to be forgotten or by prosecuting Google, for example, when the company does not comply with the laws in force on the territory concerned. The various angles of incidence as well as the diversity of the actors studied required the constitution of a multidimensional corpus in order to take a comparative approach to the different representations. This corpus includes recorded texts such as political discourses, discourses of the regulatory authorities, discourses of Internet companies, specifically Google and Facebook, and press discourses, which occupy a meta-discursive position since they repeat the discourses of the actors previously mentioned. It also includes oral discourse, made up of interviews specially recorded for this research with people taken at random from the French population. A quantitative analysis of the discourses between 2010 and 2013, the period contemporary with the thesis, made it possible to carry out a first sorting and to select only the discourses most relevant to our hypotheses. The qualitative analysis which followed was based on the theoretical framework previously elaborated, in order to cross the representations of the actors in connection with personal data and to highlight the various visions of this question.
El, Zant Samer. "Google matrix analysis of Wikipedia networks". Thesis, Toulouse, INPT, 2018. http://www.theses.fr/2018INPT0046/document.
Pełny tekst źródła
This thesis concentrates on the analysis of the large directed network representation of Wikipedia. Wikipedia stores valuable fine-grained dependencies among articles by linking webpages together for diverse types of interactions. Our focus is to capture fine-grained and realistic interactions between a subset of webpages in this Wikipedia network. Therefore, we propose to leverage a novel Google matrix representation of the network called the reduced Google matrix. This reduced Google matrix (GR) is derived for the subset of webpages of interest (i.e. the reduced network). As for the regular Google matrix, one component of GR captures the probability of two nodes of the reduced network being directly connected in the full network. But unique to GR, another component accounts for the probability of having both nodes indirectly connected through all possible paths in the full network. In this thesis, we demonstrate with several case studies that GR offers a reliable and meaningful representation of direct and indirect (hidden) links of the reduced network. We show that GR analysis is complementary to the well-known PageRank analysis and can be leveraged to study the influence of a link variation on the rest of the network structure. Case studies are based on Wikipedia networks originating from different language editions. Interactions between several groups of interest are studied in detail: painters, countries and terrorist groups. For each study, a reduced network is built, and direct and indirect interactions are analyzed and confronted with historical, geopolitical or scientific facts. A sensitivity analysis is conducted to understand the influence of the ties in each group on other nodes (e.g. countries in our case). From our analysis, we show that it is possible to extract valuable interactions between painters, countries or terrorist groups. Networks of painters with GR capture art-historical facts such as painting-movement classification. Well-known interactions between major EU countries and worldwide are underlined as well in our results. Similarly, networks of terrorist groups show relevant ties in line with their objectives and their historical or geopolitical relationships. We conclude this study by showing that the reduced Google matrix analysis is a novel powerful analysis method for large directed networks. We argue that this approach can also find useful applications for different types of datasets constituted by the exchange of dynamic content. This approach offers new possibilities to analyze effective interactions in a group of nodes embedded in a large directed network.
Corné, Josefine, i Amanda Ullvin. "Prediktiv analys i vården : Hur kan maskininlärningstekniker användas för att prognostisera vårdflöden?" Thesis, KTH, Skolan för teknik och hälsa (STH), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-211286.
Pełny tekst źródła
This project was performed in cooperation with Siemens Healthineers. The project aimed to investigate possibilities to forecast healthcare processes by investigating how big data and machine learning can be used for predictive analytics. The project consisted of two separate case studies. Based on data from previous MRI examinations, the aim was to investigate whether it is possible to predict the duration of MRI examinations and identify potential no-show patients. The case studies were performed with the programming language R, and three machine learning methods were used to develop predictive models for each case study. The results from the case studies indicate that with a greater amount of data of better quality it would be possible to predict the duration of MRI examinations and potential no-show patients. The conclusion is that these types of predictive models can be used to forecast healthcare processes. This could contribute to increased efficiency and reduced waiting time in healthcare.
Barosen, Alexander, i Sadok Dalin. "Analysis and comparison of interfacing, data generation and workload implementation in BigDataBench 4.0 and Intel HiBench 7.0". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254332.
Pełny tekst źródła
One of the major challenges in Big Data is the accurate and meaningful assessment of system performance. Unlike in other systems, small differences in efficiency can escalate into large differences in cost and power consumption. While there are several tools on the market for measuring the performance of Big Data systems, few of them have been examined in depth. In this report, the interfacing, data generation and workloads of two Big Data benchmarking suites, BigDataBench and HiBench, were examined. The purpose of the study was to determine the capabilities of each tool with respect to the given criteria. An exploratory and qualitative approach was used to gather information and analyze each benchmarking tool. Source code, documentation and reports written and published by the developers were used as sources of information. The results showed that BigDataBench and HiBench were designed similarly with respect to interfacing and data flow during the execution of a workload, with the exception of streaming workloads. BigDataBench provided more realistic data generation, while data generation in HiBench was easier to control. Regarding workload design, the workloads in BigDataBench were designed to be applicable to multiple frameworks, whereas the workloads in HiBench were targeted at the Hadoop family. In conclusion, neither of the benchmarking suites was superior to the other. They were both designed for different purposes and should be applied on a case-by-case basis.
Inacio, Eduardo Camilo. "Caracterização e modelagem multivariada do desempenho de sistemas de arquivos paralelos". reponame:Repositório Institucional da UFSC, 2015. https://repositorio.ufsc.br/xmlui/handle/123456789/132478.
Pełny tekst źródłaMade available in DSpace on 2015-04-29T21:10:29Z (GMT). No. of bitstreams: 1 332968.pdf: 1630035 bytes, checksum: ab750b282530f4ce742e30736aa9d74d (MD5) Previous issue date: 2015
Abstract: The amount of digital data generated daily has increased significantly. Consequently, applications need to handle increasing volumes of data, in a variety of formats and from a variety of sources, at high velocity; this is known as the Big Data problem. Since storage devices have not followed the performance evolution observed in processors and main memories, they become the bottleneck of these applications. Parallel file systems are software solutions that have been widely adopted to mitigate the input and output (I/O) limitations found in current computing platforms. However, the efficient utilization of these storage solutions depends on an understanding of their behavior under different conditions of use. This is a particularly challenging task because of the multivariate nature of the problem, namely the fact that the overall performance of the system depends on the relationships and influence of a large set of variables. This dissertation proposes an analytical multivariate model to represent storage performance behavior in parallel file systems for different configurations and workloads. An extensive set of experiments, executed in four real computing environments, was conducted in order to identify a significant number of relevant variables, to determine the influence of these variables on overall system performance, and to build and evaluate the proposed model. As a result of the characterization effort, the effect of three factors not explored in previous works is presented. The results of the model evaluation, comparing the behavior and values estimated by the model with those measured in real environments for different usage scenarios, showed that the proposed model was successful in representing system performance.
Although some deviations were found in the values estimated by the model, considering the significantly higher number of usage scenarios evaluated in this research work compared to previous proposals found in the literature, the accuracy of prediction was considered acceptable.
Britto, Fernando Perez de. "Perspectivas organizacional e tecnológica da aplicação de analytics nas organizações". Pontifícia Universidade Católica de São Paulo, 2016. https://tede2.pucsp.br/handle/handle/19282.
Pełny tekst źródła
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The use of Analytics technologies is gaining prominence in organizations exposed to pressures for greater profitability and efficiency, and to a highly globalized and competitive environment in which cycles of economic growth and recession and cycles of liberalism and interventionism, short or long, are more frequent. However, the use of these technologies is complex and influenced by conceptual, human, organizational and technological aspects, the latter especially in relation to the manipulation and analysis of large volumes of data, Big Data. Based on a bibliographic survey of the organizational and technological perspectives, this work initially deals with the concepts and technologies relevant to the use of Analytics in organizations, and then explores issues related to the alignment between business processes and data and information, the assessment of the potential of the use of Analytics, the use of Analytics in performance management, in process optimization and as decision support, and the establishment of a continuous improvement process. This enables, at the end, a reflection on the directions, approaches, referrals, opportunities and challenges related to the use of Analytics in organizations.
Ledieu, Thibault. "Analyse et visualisation de trajectoires de soins par l’exploitation de données massives hospitalières pour la pharmacovigilance". Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1B032/document.
Pełny tekst źródła
The massification of health data is an opportunity to answer questions about vigilance and quality of care. In this thesis work, we present approaches to exploit the diversity and volume of intra-hospital data for pharmacovigilance and for monitoring the proper use of drugs. This approach is based on the modelling of intra-hospital care trajectories adapted to the specific needs of pharmacovigilance. Using data from a hospital warehouse, it is necessary to characterize events of interest and identify a link between the administration of these health products and the occurrence of adverse reactions, or to look for cases of misuse of a drug. The hypothesis put forward in this thesis is that an interactive visual approach is suitable for the exploitation of these heterogeneous, multi-domain biomedical data in the field of pharmacovigilance. We have developed two prototypes allowing the visualization and analysis of care trajectories. The first prototype is a tool for visualizing the patient file in the form of a timeline. The second is a tool for visualizing and searching a cohort of event sequences. The latter tool is based on the implementation of sequence analysis algorithms (Smith-Waterman, Apriori, GSP) for searching for similarities or patterns of recurring events. These human-machine interfaces have been the subject of usability studies on use cases from actual practice that have proven their potential for routine use.
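The Smith-Waterman algorithm cited in this abstract finds the best locally aligned sub-sequence shared by two care trajectories. A compact sketch over sequences of event codes; the event names and scoring parameters are illustrative, not drawn from the thesis:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two event sequences."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # DP matrix, floored at 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Two care trajectories sharing the sub-sequence ["admit", "drugA", "drugB"]
s1 = ["admit", "drugA", "drugB", "discharge"]
s2 = ["er", "admit", "drugA", "drugB", "adverse_event"]
score = smith_waterman(s1, s2)  # 3 consecutive matches -> score 6
```

Because scores are floored at zero, the alignment is local: only the best-matching stretch of the two trajectories contributes, which is what makes the method suitable for finding shared care episodes inside longer patient histories.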
Huguet, Thibault. "La société connectée : contribution aux analyses sociologiques des liens entre technique et société à travers l'exemple des outils médiatiques numériques". Thesis, Montpellier 3, 2017. http://www.theses.fr/2017MON30002/document.
Pełny tekst źródła
Under way for several decades, the development of digital technology has left its deep stamp on the minds and bodies of our contemporary society. More than a simple social phenomenon, it seems to be generally agreed that we are witnessing today a true « anthropological mutation ». Nevertheless, while analyses of the links between technology and society have long been characterized by deterministic perspectives, we propose to explore in this thesis the dynamic relations which make a technology eminently social, and a society intrinsically technical. Adopting a comprehensive approach, this research seeks to highlight the significations and meaning systems related to the use of digital media tools, at macro-social and micro-social scales, to explain causally the importance we ascribe to this specific category of objects. The dynamics at work, at both the individual and collective levels, are examined sociologically, alternately from historical, philosophical, economic, political, and socio-cultural points of view. As artefact-symbols of our present-day societies – total social objects – digital media are the tools upon which we organize the contemporaneity of our relationship with the world: we regard them as a sociological prism through which it is possible to grasp the connected society.
Loose, Tobias Sebastian. "Konzept für eine modellgestützte Diagnostik mittels Data Mining am Beispiel der Bewegungsanalyse". Karlsruhe : Univ.-Verl, 2004. http://deposit.d-nb.de/cgi-bin/dokserv?idn=973140607.
Pełny tekst źródła
Maraun, Douglas. "What can we learn from climate data? : Methods for fluctuation, time/scale and phase analysis". Phd thesis, [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=981698980.
Pełny tekst źródła
Schwarz, Holger. "Integration von Data-Mining und online analytical processing : eine Analyse von Datenschemata, Systemarchitekturen und Optimierungsstrategien /". [S.l. : s.n.], 2003. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10720634.
Pełny tekst źródła
Wong, Shing-tat, i 黃承達. "Disaggregate analyses of stated preference data for capturing parking choice behavior". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B36393678.
Pełny tekst źródła