Theses on the topic "Données massives – Gestion"
Consult the 39 best theses for your research on the topic "Données massives – Gestion".
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Dia, Amadou Fall. "Filtrage sémantique et gestion distribuée de flux de données massives". Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS495.
Our daily use of the Internet and related technologies generates, at rapid and variable speeds, large volumes of heterogeneous data issued from sensor networks, search engine logs, multimedia content sites, weather forecasting, geolocation, Internet of Things (IoT) applications, etc. Processing such data in conventional databases (Relational Database Management Systems) may be very expensive in terms of time and memory storage resources. To effectively respond to the needs of rapid decision-making, these streams require real-time processing. Data Stream Management Systems (DSMSs) evaluate queries on the recent data of a stream within structures called windows. The input data come in different formats such as CSV, XML, RSS, or JSON. This heterogeneity issue stems from the nature of the data streams and must be resolved. For this, several research groups have benefited from the advantages of semantic web technologies (RDF and SPARQL) by proposing RDF data stream processing systems called RSPs. However, large volumes of RDF data, high input streams, concurrent queries, the combination of RDF streams with large volumes of stored RDF data, and expensive processing drastically reduce the performance of these systems. A new approach is required to considerably reduce the processing load of RDF data streams. In this thesis, we propose several complementary solutions to reduce the processing load in a centralized environment. An on-the-fly RDF graph stream sampling approach is proposed to reduce data and processing load while preserving semantic links. This approach is deepened by adopting a graph-oriented summary approach to extract the most relevant information from RDF graphs by using centrality measures issued from Social Network Analysis. We also adopt a compressed format of RDF data and propose an approach for querying compressed RDF data without a decompression phase. To ensure parallel and distributed data stream management, the presented work also proposes two solutions for reducing the processing load in a distributed environment: an engine and approaches for parallel and distributed processing of RDF graph streams. Finally, an optimized processing approach for static and dynamic data combination operations is also integrated into a new distributed RDF graph stream management system.
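To make the windowing idea in this abstract concrete, here is a minimal Python sketch of a time-based sliding window evaluating a single triple pattern, in the spirit of RSP engines. Class and field names are ours, purely illustrative; the thesis's engine is far more elaborate.

```python
import time
from collections import deque

class SlidingTripleWindow:
    """Time-based sliding window over a stream of RDF triples
    (a simplified, hypothetical sketch of RSP-style windowing)."""

    def __init__(self, width_s: float):
        self.width_s = width_s
        self.buffer = deque()  # (timestamp, (subject, predicate, object))

    def push(self, triple, ts=None):
        ts = time.time() if ts is None else ts
        self.buffer.append((ts, triple))
        # Drop triples that fell out of the window.
        while self.buffer and ts - self.buffer[0][0] > self.width_s:
            self.buffer.popleft()

    def match(self, s=None, p=None, o=None):
        """Evaluate one triple pattern on the window content
        (None plays the role of a SPARQL variable)."""
        for _, (s2, p2, o2) in self.buffer:
            if (s is None or s == s2) and \
               (p is None or p == p2) and \
               (o is None or o == o2):
                yield (s2, p2, o2)

# Usage: a 10-second window over sensor readings
w = SlidingTripleWindow(10.0)
w.push((":sensor1", ":hasValue", "42"))
print(list(w.match(p=":hasValue")))
```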
Castanié, Laurent. "Visualisation de données volumiques massives : application aux données sismiques". Thesis, Vandoeuvre-les-Nancy, INPL, 2006. http://www.theses.fr/2006INPL083N/document.
Seismic reflection data are a valuable source of information for the three-dimensional modeling of subsurface structures in the exploration-production of hydrocarbons. This work focuses on the implementation of visualization techniques for their interpretation. We face both qualitative and quantitative challenges. It is indeed necessary to consider (1) the particular nature of seismic data and the interpretation process, and (2) the size of the data. Our work focuses on these two distinct aspects: 1) From the qualitative point of view, we first highlight the main characteristics of seismic data. Based on this analysis, we implement a volume visualization technique adapted to the specificity of the data. We then focus on the multimodal aspect of interpretation, which consists in combining several sources of information (seismic and structural). Depending on the nature of these sources (strictly volumes, or both volumes and surfaces), we propose two different visualization systems. 2) From the quantitative point of view, we first define the main hardware constraints involved in seismic interpretation. Focused on these constraints, we implement a generic memory management system. Initially able to couple visualization and data processing on massive data volumes, it is then improved and specialised to build a dynamic system for distributed memory management on PC clusters. This latter version, dedicated to visualization, makes it possible to manipulate regional-scale seismic data (100-200 GB) in real time. The main aspects of this work are studied both in the scientific context of visualization and in the application context of geosciences and seismic interpretation.
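The memory management described here revolves around keeping a bounded set of data bricks resident and fetching the rest out-of-core. A toy LRU brick cache illustrates the principle; names and structure are our own, not the thesis's system.

```python
from collections import OrderedDict

class BrickCache:
    """LRU cache of fixed-size volume bricks (illustrative sketch)."""

    def __init__(self, capacity: int, load_brick):
        self.capacity = capacity          # max bricks held in memory
        self.load_brick = load_brick      # callback: brick_id -> data
        self.bricks = OrderedDict()       # brick_id -> data, LRU order

    def get(self, brick_id):
        if brick_id in self.bricks:
            self.bricks.move_to_end(brick_id)   # mark as recently used
            return self.bricks[brick_id]
        data = self.load_brick(brick_id)        # out-of-core fetch
        self.bricks[brick_id] = data
        if len(self.bricks) > self.capacity:
            self.bricks.popitem(last=False)     # evict least recently used
        return data

# Usage: a fake loader standing in for disk/network I/O
cache = BrickCache(capacity=2, load_brick=lambda bid: f"brick-{bid}")
cache.get(0); cache.get(1); cache.get(2)   # brick 0 is evicted here
```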
Castelltort, Arnaud. "Historisation de données dans les bases de données NoSQL orientées graphes". Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20076.
This thesis deals with data historization in the context of graphs. Graph data have been dealt with for many years, but their exploitation in information systems, especially in NoSQL engines, is recent. The emerging Big Data and 3V contexts (Variety, Volume, Velocity) have revealed the limits of classical relational databases. Historization, for its part, has long been considered as linked only with technical and backup issues, and more recently with decisional reasons (Business Intelligence). However, historization is now taking on more and more importance in management applications. In this framework, graph databases, which are often used, have received little attention regarding historization. Our first contribution consists in studying the impact of historized data in management information systems. This analysis relies on the hypothesis that historization is taking on more and more importance. Our second contribution aims at proposing an original model for managing historization in NoSQL graph databases. This proposition consists, on the one hand, in elaborating a unique and generic system for representing the history and, on the other hand, in proposing query features. We show that the system can support both simple and complex queries. Our contributions have been implemented and tested over synthetic and real databases.
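One simple way to picture historization in a graph store is to attach validity intervals to edges and answer snapshot queries. This is a hypothetical illustration of the problem the thesis addresses, not its actual (richer) model; all names are invented.

```python
class HistorizedGraph:
    """Tiny edge-list graph where each edge carries a validity interval."""

    def __init__(self):
        self.edges = []   # (src, label, dst, valid_from, valid_to)

    def add_edge(self, src, label, dst, ts):
        # Close the previous version of this edge, if any.
        for i, (s, l, d, t0, t1) in enumerate(self.edges):
            if (s, l) == (src, label) and t1 is None:
                self.edges[i] = (s, l, d, t0, ts)
        self.edges.append((src, label, dst, ts, None))

    def at(self, ts):
        """Snapshot query: edges valid at time ts."""
        return [(s, l, d) for (s, l, d, t0, t1) in self.edges
                if t0 <= ts and (t1 is None or ts < t1)]

g = HistorizedGraph()
g.add_edge("alice", "works_at", "acme", ts=1)
g.add_edge("alice", "works_at", "globex", ts=5)
print(g.at(3))   # [('alice', 'works_at', 'acme')]
```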
Baron, Benjamin. "Transport intermodal de données massives pour le délestage des réseaux d'infrastructure". Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066454/document.
In this thesis, we exploit the daily mobility of vehicles to create an alternative transmission medium. Our objective is to draw on the many vehicular trips taken by cars or public transport to overcome the limitations of conventional data networks such as the Internet. In the first part, we take advantage of the bandwidth resulting from the mobility of vehicles equipped with storage capabilities to offload large amounts of delay-tolerant traffic from the Internet. Data is transloaded to data storage devices we refer to as offloading spots, located where vehicles stop often and long enough to transfer large amounts of data. Those devices act as data relays, i.e., they store data until it is loaded on and carried by a vehicle to the next offloading spot, where it can be dropped off for later pick-up and delivery by another vehicle. We further extend the concept of offloading spots in two directions in the context of vehicular cloud services. In the first extension, we exploit the storage capabilities of the offloading spots to design a cloud-like storage and sharing system for vehicle passengers. In the second extension, we dematerialize the offloading spots into pre-defined areas with high densities of vehicles that meet long enough to transfer large amounts of data. The performance evaluation of the various works conducted in this thesis shows that the everyday mobility of entities surrounding us enables innovative services with limited reliance on conventional data networks.
Gueye, Ndeye. "Une démarche de gestion stratégique et opérationnelle du changement dans le contexte de l'exploitation avancée de données massives internes aux organisations". Master's thesis, Université Laval, 2017. http://hdl.handle.net/20.500.11794/30367.
Garmaki, Mahda. "La capacité des "Big Data Analytics" et la création de valeur : l’effet médiateur de l’apprentissage organisationnel sur la performance des entreprises". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLE018.
The purpose of this grounded theory research is to explore to which extent firms can achieve value from big data analytics (BDA) in order to improve firm performance. BDA is dramatically influencing the way firms perform and compete within the digital era. In this light, BDA has become the trending issue that generates innovative solutions and valuable insights through a predictive approach. Despite the hype surrounding BDA value creation, the features that drive value and sustain competitive advantage from BDA are not fully understood. Using classic grounded theory, this thesis conducted interviews with twenty-two executives from different firms. Through substantive theory, BDA capability is conceptualized as the core competency which enables firms to accomplish value from BDA, transform the business into a data-driven approach, and subsequently enhance firm performance over time. The core contribution of this grounded theory research focuses on capability building to implement and manipulate BDA. The findings of this study contribute to the knowledge of BDA value creation and digitalization through the following discussions: 1) while the conventional approach to BDA focuses on data collection or investment in technologies, the findings indicate that various dimensions (internal and external resources and capabilities) should jointly contribute to building the overall BDA capability; 2) furthermore, these dimensions and their properties create an integrative network, which is incomplete in the absence of any individual dimension or its properties; 3) to identify the variables that are influenced by BDA capability, enhanced organizational learning is introduced as the "hidden value" of BDA capability, a dynamic process which consequently develops sustained competitive advantage; 4) within the digital era, BDA is the primary digital asset as well as a digital lever, and BDA capability fosters digital transformation by providing prerequisite capabilities; 5) drawing on the resource-based view, the knowledge-based view, and dynamic capabilities, the conceptual model of this research is addressed through the combination of different resources (tangible, intangible, and personnel-based) and capabilities. The conceptual model demonstrates the direct effect of BDA capability on firm performance, as well as the indirect effect mediated by organizational learning.
Barry, Mariam. "Adaptive Scalable Online Learning for Handling Heterogeneous Streaming Data in Large-Scale Banking Infrastructure". Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAT006.
Artificial Intelligence (AI) is a powerful tool to extract valuable insights for decision-making. However, learning from heterogeneous and unstructured streaming data presents a multitude of challenges that this research aims to tackle. The creation of big data is projected to experience exponential growth, with expectations to surpass 2,000 zettabytes by the year 2035. Such growth highlights the importance of efficient, incremental, and adaptive models. Online Learning, also known as Streaming Machine Learning (SML), is a dynamic technique for building and updating learning models as new data arrive, without the need for periodic complete model replacement; it is the most efficient technique for big data stream learning. The change detection task is a proactive way to detect and prevent critical events such as cyber-attacks, fraud, or IT incidents in an online fashion. The research conducted during this thesis aims to develop adaptive and scalable online machine-learning solutions to learn from heterogeneous streaming data that can be operationalized on large-scale infrastructures, particularly in the banking sector. This Ph.D. thesis delves into algorithmic and infrastructure challenges related to continuously training and serving online machine learning over high-velocity streaming data from diverse sources, specifically focusing on large-scale IT infrastructures (AIOps). Thesis contributions include techniques like StreamFlow for summarizing information from big data streams, Stream2Graph for dynamically building and updating knowledge graphs for batch and online learning tasks, and StreamChange, an efficient and explainable online change detection model. Evaluation results on real-world open data and industrial data demonstrate performance improvements in learned models. StreamChange surpasses state-of-the-art techniques in detecting gradual and abrupt changes. Additionally, the thesis introduces a conceptual framework, StreamMLOps, for scaling and serving online machine learning in real time without pausing the inference pipeline. This framework showcases the effectiveness of the proposed MLOps pipeline on a feature-evolving dataset with millions of dimensions for malicious-event detection tasks. Finally, we share lessons learned regarding Streaming Machine Learning systems, AI at scale, and online model management in large-scale banking, with a focus on streaming data and real-time applications.
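For intuition about the online change detection task StreamChange addresses, here is a simplified Page-Hinkley detector, a classic baseline for detecting mean shifts in a stream. This is not the thesis's algorithm, only an illustration of the problem setting; the parameters are arbitrary.

```python
class PageHinkley:
    """Simplified Page-Hinkley online change detector (illustrative baseline)."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerance for small fluctuations
        self.threshold = threshold  # detection threshold
        self.mean = 0.0
        self.n = 0
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0

    def update(self, x: float) -> bool:
        self.n += 1
        self.mean += (x - self.mean) / self.n   # running mean
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold  # change detected?

# Usage: a mean shift in the stream eventually triggers detection
ph = PageHinkley()
stream = [1.0] * 100 + [5.0] * 100
print([i for i, x in enumerate(stream) if ph.update(x)][:1])
```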
Brahem, Mariem. "Optimisation de requêtes spatiales et serveur de données distribué - Application à la gestion de masses de données en astronomie". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLV009/document.
The big scientific data generated by modern observation telescopes raises recurring performance problems, in spite of advances in distributed data management systems. The main reasons are the complexity of the systems and the difficulty of adapting access methods to the data. This thesis proposes new physical and logical optimizations for the execution plans of astronomical queries using transformation rules. These methods are integrated in ASTROIDE, a distributed system for large-scale astronomical data processing. ASTROIDE achieves scalability and efficiency by combining the benefits of distributed processing using Spark with the relevance of an astronomical query optimizer. It supports data access using ADQL, the commonly used astronomical query language, and implements astronomical query algorithms (cone search, kNN search, cross-match, and kNN join) tailored to the proposed physical data organization. Indeed, ASTROIDE offers a data partitioning technique that allows efficient processing of these queries by ensuring load balancing and eliminating irrelevant partitions. This partitioning uses an indexing technique adapted to astronomical data in order to reduce query processing time.
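A cone search, one of the query algorithms listed above, selects all catalog objects within a given angular radius of a sky position. A naive Python version shows the semantics; ASTROIDE's contribution is precisely avoiding this full scan through partition pruning. Field names here are illustrative.

```python
import math

def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))

def cone_search(catalog, ra0, dec0, radius_deg):
    """Naive cone search: keep objects within radius of (ra0, dec0)."""
    return [obj for obj in catalog
            if angular_sep(obj["ra"], obj["dec"], ra0, dec0) <= radius_deg]

# Usage on a tiny in-memory catalog
catalog = [{"id": 1, "ra": 10.0, "dec": 20.0},
           {"id": 2, "ra": 180.0, "dec": -45.0}]
print(cone_search(catalog, 10.1, 20.05, 0.5))   # -> object 1 only
```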
Hatia, Saalik. "Leveraging formal specification to implement a database backend". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS137.
Conceptually, a database storage backend is just a map of keys to values. However, to provide performance and reliability, a modern store is a complex, concurrent software system, opening many opportunities for bugs. This thesis reports on our journey from the formal specification of a store to its implementation. The specification is terse and unambiguous, and helps reason about correctness. Read as pseudocode, the specification provides a rigorous grounding for implementation. The specification describes a store as a simple transactional shared memory, with two (behaviourally equivalent) variants, map-based and journal-based. We implement these two basic variants verbatim in Java. We specify the features of a modern store, such as a write-ahead log with checkpointing and truncation, as a dynamic composition of instances of the two basic variants. The specification of correct composition is particularly simple. Our experimental evaluation shows that the implementation has acceptable performance, while our rigorous methodology increases confidence in its correctness.
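The two specification variants can be pictured in a few lines: a map-based store holds the current state directly, while a journal-based store replays an append-only log. This toy Python rendering (the thesis implements its variants in Java) shows why the two are behaviourally equivalent for reads after the same sequence of writes.

```python
class MapStore:
    """Map-based variant: the current state is stored directly."""
    def __init__(self):
        self.state = {}
    def put(self, k, v):
        self.state[k] = v
    def get(self, k):
        return self.state.get(k)

class JournalStore:
    """Journal-based variant: state is the replay of an append-only log."""
    def __init__(self):
        self.journal = []          # append-only list of (key, value)
    def put(self, k, v):
        self.journal.append((k, v))
    def get(self, k):
        # Replay: the last write for k wins.
        for key, value in reversed(self.journal):
            if key == k:
                return value
        return None

# Both variants answer reads identically for the same writes.
for store in (MapStore(), JournalStore()):
    store.put("a", 1); store.put("a", 2)
    assert store.get("a") == 2
```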
Jain, Sheenam. "Big data management using artificial intelligence in the apparel supply chain : opportunities and challenges". Thesis, Lille 1, 2020. http://www.theses.fr/2020LIL1I051.
Over the past decade, the apparel industry has seen several applications of big data and artificial intelligence (AI) in dealing with various business problems. With the increase in competition and in customer demands for the personalization of products and services, which can enhance brand experience and satisfaction, supply-chain managers in apparel firms are constantly looking for ways to improve their business strategies so as to bring speed and cost efficiency to their organizations. The big data management solutions presented in this thesis highlight opportunities for apparel firms to look into their supply chains, identify big data resources that may be valuable, rare, and inimitable, and use them to create data-driven strategies and establish dynamic capabilities to sustain their businesses in an uncertain business environment. With the help of these data-driven strategies, apparel firms can produce garments smartly to provide customers with a product that more closely meets their needs, and as such drive sustainable consumption and production practices. In this context, this thesis aims to investigate whether apparel firms can improve their business operations by employing big data and AI, and in so doing, seek big data management opportunities using AI solutions. Firstly, the thesis identifies and classifies AI techniques that can be used at various stages of the supply chain to improve existing business operations. Secondly, the thesis presents product-related data to create a classification model and design rules that can create opportunities for providing personalized recommendations or customization, enabling better shopping experiences for customers. Thirdly, this thesis draws on evidence from the industry and the existing literature to make suggestions that may guide managers in developing data-driven strategies for improving customer satisfaction through personalized services. Finally, this thesis shows the effectiveness of data-driven analytical solutions in sustaining competitive advantage via the data and knowledge already present within the apparel supply chain. More importantly, this thesis also contributes to the field by identifying specific opportunities for big data management using AI solutions. These opportunities can be a starting point for other research in the field of technology and management.
Liu, Rutian. "Semantic services for assisting users to augment data in the context of analytic data sources". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS208.
The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, it still remains difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets. This thesis presents a new solution for business users and data scientists who want to augment the schema of analytic datasets with attributes coming from other semantically related datasets. We introduce attribute graphs as a novel, concise, and natural way to represent literal functional dependencies over hierarchical dimension level types and to infer unique dimension and fact table identifiers. We give formal definitions of schema augmentation, schema complement, and merge query in the context of analytic tables. We then introduce several reduction operations to enforce schema complements when schema augmentation yields a row multiplication in the augmented dataset. We define formal quality criteria and algorithms to control the correctness, non-ambiguity, and completeness of generated schema augmentations and schema complements. We describe the implementation of our solution as a REST service within the SAP HANA platform and provide a detailed description of our algorithms. We evaluate the performance of our algorithms to compute unique identifiers in dimension and fact tables and analyze the effectiveness of our REST service using two application scenarios.
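The core guard behind schema complements can be illustrated simply: augmenting a dataset is safe only when the join key uniquely identifies rows in the complementing dataset, otherwise rows multiply. A purely illustrative sketch with invented names, not the thesis's SAP HANA implementation:

```python
def schema_augmentation(base, other, key, new_attrs):
    """Augment `base` rows with `new_attrs` from `other`, joining on `key`.
    Refuses ambiguous augmentations: a non-unique key in `other` would
    multiply rows (the case the thesis's reduction operations handle)."""
    index = {}
    for row in other:
        if row[key] in index:
            raise ValueError(f"{key} is not a unique identifier in `other`")
        index[row[key]] = row
    return [
        {**row, **{a: index.get(row[key], {}).get(a) for a in new_attrs}}
        for row in base
    ]

# Usage: add a 'region' attribute to a sales dataset
sales = [{"city": "Paris", "amount": 10}]
geo = [{"city": "Paris", "region": "IDF"}]
print(schema_augmentation(sales, geo, key="city", new_attrs=["region"]))
```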
De, Oliveira Joffrey. "Gestion de graphes de connaissances dans l'informatique en périphérie : gestion de flux, autonomie et adaptabilité". Electronic Thesis or Diss., Université Gustave Eiffel, 2023. http://www.theses.fr/2023UEFL2069.
The research work carried out as part of this PhD thesis lies at the interface between the Semantic Web, databases, and edge computing. Indeed, our objective is to design, develop, and evaluate a database management system (DBMS) based on the W3C Resource Description Framework (RDF) data model, adapted to the terminals found in Edge computing. The possible applications of such a system are numerous and cover a wide range of sectors such as industry, finance, and medicine, to name but a few. As proof of this, the subject of this thesis was defined with the team from the Computer Science and Artificial Intelligence Laboratory (CSAI) at ENGIE Lab CRIGEN. The latter is ENGIE's research and development centre dedicated to green gases (hydrogen, biogas, and liquefied gases), new uses of energy in cities and buildings, industry, and emerging technologies (digital and artificial intelligence, drones and robots, nanotechnologies and sensors). CSAI financed this thesis as part of a CIFRE-type collaboration. The functionalities of a system satisfying these characteristics must enable anomalies and exceptional situations to be detected in a relevant and effective way from measurements taken by sensors and/or actuators. In an industrial context, this could mean detecting excessively high measurements, for example of pressure or flow rate in a gas distribution network, which could potentially compromise infrastructure or even the safety of individuals. This detection must be carried out using a user-friendly approach to enable as many users as possible, including non-programmers, to describe risk situations. The approach must therefore be declarative, not procedural, and must be based on a query language such as SPARQL. We believe that Semantic Web technologies can make a major contribution in this context. Indeed, the ability to infer implicit consequences from explicit data and knowledge is a means of creating new services that are distinguished by their ability to adjust to the circumstances encountered and to make autonomous decisions. This can be achieved by generating new queries in certain alarming situations, or by defining a minimal knowledge sub-graph that an instance of our DBMS needs in order to respond to all of its queries. The design of such a DBMS must also take into account the inherent constraints of Edge computing, i.e., the limits in terms of computing capacity, storage, bandwidth, and sometimes energy (when the terminal is powered by a solar panel or a battery). Architectural and technological choices must therefore be made to meet these limitations. With regard to the representation of data and knowledge, our design choice fell on succinct data structures (SDS), which offer, among other advantages, the fact that they are very compact and do not require decompression during querying. Similarly, it was necessary to integrate data flow management within our DBMS, for example with support for windowing in continuous SPARQL queries, and for the various services supported by our system. Finally, as anomaly detection is an area where knowledge can evolve, we have integrated support for modifications to the knowledge graphs stored on the client instances of our DBMS. This support translates into an extension of certain SDS structures used in our prototype.
Khelil, Abdallah. "Gestion et optimisation des données massives issues du Web Combining graph exploration and fragmentation for scalable rdf query processing Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System? EXGRAF : Exploration et Fragmentation de Graphes au Service du Traitement Scalable de Requêtes RDF". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2020. http://www.theses.fr/2020ESMA0009.
Big Data represents a challenge not only for the socio-economic world but also for scientific research. Indeed, as has been pointed out in several scientific articles and strategic reports, modern computer applications are facing new problems and issues that are mainly related to the storage and exploitation of data generated by modern observation and simulation instruments. The management of such data represents a real bottleneck which has the effect of slowing down the exploitation of the various data collected, not only in the framework of international scientific programs but also by companies, the latter relying increasingly on the analysis of large-scale data. Much of this data is published today on the Web. Indeed, we are witnessing an evolution of the traditional web, designed basically to manage documents, towards a web of data that offers mechanisms for querying semantic information. Several data models have been proposed to represent this information on the Web. The most important is the Resource Description Framework (RDF), which provides a simple and abstract representation of knowledge for resources on the Web. Each semantic Web fact can be encoded with an RDF triple. In order to explore and query structured information expressed in RDF, several query languages have been proposed over the years. In 2008, SPARQL became the official W3C Recommendation language for querying RDF data. The need to efficiently manage and query RDF data has led to the development of new systems specifically designed to process this data format. These approaches can be categorized as centralized, relying on a single machine to manage RDF data, and distributed, combining multiple machines connected through a computer network. Some of these approaches are based on an existing data management system, such as Virtuoso and Jena; others rely on an approach specifically designed for the management of RDF triples, such as GRIN, RDF3X, and gStore. With the evolution of RDF datasets (e.g., DBPedia) and SPARQL, most systems have become obsolete and/or inefficient. For example, no existing centralized system is able to manage 1 billion triples provided under the WatDiv benchmark. Distributed systems could, under certain conditions, improve on this point, but at the cost of performance degradation. In this PhD thesis, we propose the centralized system "RDF_QDAG", which finds a good compromise between scalability and performance by combining physical data fragmentation and data graph exploration. "RDF_QDAG" supports multiple types of queries based not only on basic graph patterns but also incorporating filters based on regular expressions as well as aggregation and sorting functions. "RDF_QDAG" relies on the Volcano execution model, which allows controlling the main memory, avoiding any overflow even if the hardware configuration is limited. To the best of our knowledge, "RDF_QDAG" is the only centralized system that maintains good performance when managing several billion triples. We compared this system with other systems that represent the state of the art in RDF data management: a relational approach (Virtuoso), a graph-based approach (gStore), an intensive indexing approach (RDF-3X), and two parallel approaches (CliqueSquare and gStore-D). "RDF_QDAG" surpasses existing systems when it comes to ensuring both scalability and performance.
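The Volcano execution model mentioned above is a pull-based iterator model: each operator exposes a next() method and produces one tuple at a time, which is what keeps memory bounded regardless of input size. A minimal sketch, illustrative only and not RDF_QDAG's code:

```python
class Scan:
    """Volcano-style leaf operator: yields one tuple per next() call."""
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)   # None signals end of stream

class Filter:
    """Volcano-style operator: pulls from its child, tuple by tuple."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def next(self):
        while (t := self.child.next()) is not None:
            if self.predicate(t):
                return t
        return None

# Usage: tuples flow one by one through the plan, never all at once
plan = Filter(Scan([(1, "a"), (2, "b"), (3, "c")]), lambda t: t[0] > 1)
while (t := plan.next()) is not None:
    print(t)
```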
Honore, Valentin. "Convergence HPC - Big Data : Gestion de différentes catégories d'applications sur des infrastructures HPC". Thesis, Bordeaux, 2020. http://www.theses.fr/2020BORD0145.
Numerical simulations are complex programs that allow scientists to solve, simulate, and model complex phenomena. High Performance Computing (HPC) is the domain in which these complex and heavy computations are performed on large-scale computers, also called supercomputers. Nowadays, most scientific fields need supercomputers to undertake their research. This is the case for cosmology, physics, biology, and chemistry. Recently, we have observed a convergence between Big Data/Machine Learning and HPC. Applications coming from these emerging fields (for example, using Deep Learning frameworks) are becoming highly compute-intensive. Hence, HPC facilities have emerged as an appropriate solution to run such applications. From the large variety of existing applications has arisen a necessity for all supercomputers: they must be generic and compatible with all kinds of applications. Computing nodes are themselves highly diverse, going from CPU to GPU, with specific nodes designed to perform dedicated computations. Each category of node is designed to perform very fast operations of a given type (for example, vector or matrix computation). Supercomputers are used in a competitive environment. Indeed, multiple users simultaneously connect and request a set of computing resources to run their applications. This competition for resources is managed by the machine itself via a specific program called the scheduler. This program reviews, assigns, and maps the different user requests. Each user asks for (that is, pays for the use of) access to the resources of the supercomputer in order to run his application. The user is granted access to some resources for a limited amount of time. This means that users need to estimate how many compute nodes they want to request and for how long, which is often difficult to decide. In this thesis, we provide solutions and strategies to tackle these issues. We propose mathematical models, scheduling algorithms, and resource partitioning strategies in order to optimize high-throughput applications running on supercomputers. In this work, we focus on two types of applications in the context of the HPC/Big Data convergence: data-intensive and irregular (or stochastic) applications. Data-intensive applications represent typical HPC frameworks. These applications are made up of two main components. The first one is called simulation, a very compute-intensive code that generates a tremendous amount of data by simulating a physical or biological phenomenon. The second component is called analytics, during which sub-routines post-process the simulation output to extract, generate, and save the final result of the application. We propose to optimize these applications by designing automatic resource partitioning and scheduling strategies for both of their components. To do so, we use the well-known in situ paradigm, which consists in scheduling both components together in order to reduce the huge cost of saving all simulation data on disks. We propose automatic resource partitioning models and scheduling heuristics to improve the overall performance of in situ applications. Stochastic applications are applications for which the execution time depends on the input, while in usual data-intensive applications the makespans of simulation and analytics are not affected by such parameters. Stochastic jobs originate from Big Data or Machine Learning workloads, whose performance is highly dependent on the characteristics of input data. These applications have recently appeared on HPC platforms. However, the uncertainty of their execution time remains a strong limitation when using supercomputers. Indeed, the user needs to estimate how long his job will have to be executed by the machine, and enters this estimation as his first reservation value. But if the job does not complete successfully within this first reservation, the user has to resubmit the job, this time requesting a longer reservation.
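The reservation problem sketched in this abstract can be made concrete with a small cost model: given a sequence of increasing reservation lengths and a discrete runtime distribution, the expected cost trades off over-reservation against paying for failed attempts. The model below is a simplification in our own notation, not the thesis's exact formulation.

```python
def expected_cost(reservations, runtimes, probs):
    """Expected cost of a reservation sequence for a job whose runtime
    follows a discrete distribution (runtimes[i] with probability probs[i]).
    Each failed attempt is paid in full before the next, longer one.
    Assumes the last reservation covers the longest possible runtime."""
    cost = 0.0
    for t, p in zip(runtimes, probs):
        paid = 0.0
        for r in reservations:
            paid += r          # pay for this reservation attempt
            if r >= t:         # job fits: done for this runtime value
                break
        cost += p * paid
    return cost

# Usage: runtime is 2h or 8h with equal probability
print(expected_cost([2, 8], runtimes=[2, 8], probs=[0.5, 0.5]))  # 6.0
print(expected_cost([8],    runtimes=[2, 8], probs=[0.5, 0.5]))  # 8.0
```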
Cappuzzo, Riccardo. "Deep learning models for tabular data curation". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS047.
Data curation is a pervasive and far-reaching topic, affecting everything from academia to industry. Current solutions rely on manual work by domain users but are not adequate. We investigate how to apply deep learning to tabular data curation. We focus our work on developing unsupervised data curation systems and on designing curation systems that intrinsically model categorical values in their raw form. We first implement EmbDI to generate embeddings for tabular data, and address the tasks of entity resolution and schema matching. We then turn to the data imputation problem, using graph neural networks in a multi-task learning framework called GRIMP.
Ali, Shayar. "Smart City : Implementation and development of platforms for the management of SunRise Smart Campus". Thesis, Lille 1, 2018. http://www.theses.fr/2018LIL1I027/document.
This work concerns the implementation of professional platforms and the development of the SunRise platform for managing a Smart City. It is part of the SunRise project, which aims at turning the Scientific Campus of the University of Lille into a large-scale demonstrator site of the "Smart and Sustainable City". The campus is representative of a small town of 25,000 inhabitants and 100 km of urban infrastructure. This thesis includes five parts. The first part includes a literature review concerning Smart Cities, with their definitions and components. The second part presents the role of data in Smart Cities, as well as the latest technologies that are used for Smart City management. It also presents the different existing architectures and platforms for managing a Smart City. The third part presents the SunRise Smart City demonstrator, which is used as a basis for this thesis. This part details the instrumentation installed in the demonstration site as well as the GIS model of the demonstrator. The fourth part concerns the architecture of the two professional platforms PI System and OpenDataSoft, as well as their implementation and use for the analysis of water consumption. The last part describes the architecture of the SunRise platform and details its layers. It also presents the stages of the platform's development and implementation.
Fertier, Audrey. "Interprétation automatique de données hétérogènes pour la modélisation de situations collaboratives : application à la gestion de crise". Thesis, Ecole nationale des Mines d'Albi-Carmaux, 2018. http://www.theses.fr/2018EMAC0009/document.
The present work is applied to the field of French crisis management, and specifically to the crisis response phase which follows a major event, like a flood or an industrial accident. In the aftermath of the event, crisis cells are activated to prevent and deal with the consequences of the crisis. They face many difficulties under severe time pressure. The stakeholders are numerous, autonomous, and heterogeneous; the coexistence of contingency plans favours contradictions; and the interconnection of networks promotes cascading effects. These observations arise as the volume of available data continues to grow. The data come, for example, from sensors, social media, or volunteers on the crisis theatre. This is an opportunity to design an information system able to collect the available data and interpret it into information suited to the crisis cells. To succeed, it has to manage the 4Vs of Big Data: the Volume, Variety, and Veracity of data and information, while following the dynamics (Velocity) of the ongoing crisis. Our literature review on the different parts of this architecture enables us to define such an information system, able (i) to receive different types of events emitted from data sources both known and unknown, (ii) to use interpretation rules directly deduced from official business rules, and (iii) to structure the information that will be used by the stakeholders. Its architecture is event-driven and coexists with the service-oriented architecture of the software developed by the CGI laboratory. The implemented system has been tested on a 100-year flood scenario elaborated by two French forecasting centres. The model describing the current crisis situation, deduced by the proposed information system, can be used to (i) deduce a crisis response process, (ii) detect unexpected situations, and (iii) update a common operational picture (COP) suited to the decision-makers.
Zarebski, David. "Ontologie naturalisée et ingénierie des connaissances". Thesis, Paris 1, 2018. http://www.theses.fr/2018PA01H232/document.
«What do I need to know about something to know it?» It is no wonder that such a general, hard-to-grasp, riddle-like question remained the exclusive domain of a single discipline for centuries: philosophy. In this context, the distinction of the primitive components of reality – the so-called "world's furniture" – and their relations is called an ontology. This work investigates the emergence of similar questions in two different though related fields, namely Artificial Intelligence and Knowledge Engineering. We show here that the way these disciplines apply an ontological methodology to either cognition or knowledge representation is not a mere analogy but raises a range of relevant questions and challenges, from both an applied and a speculative point of view. More specifically, we suggest that some of the technical answers to the issues addressed by Big Data invite us to revisit many traditional philosophical positions concerning the role of language or common-sense reasoning in thought, or the existence of mind-independent structure in reality.
Boukorca, Ahcène. "Hypergraphs in the Service of Very Large Scale Query Optimization. Application : Data Warehousing". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2016. http://www.theses.fr/2016ESMA0026/document.
The emergence of the Big Data phenomenon has led to new, larger, and urgent needs to share data between users and communities, which has engendered a large number of queries that DBMSs must handle. This problem is compounded by further needs for query recommendation and exploration. Since data processing is still possible through solutions of query optimization, physical design, and deployment architectures, in which these solutions are the results of query-based combinatorial problems, it is essential to review traditional methods to respond to new needs of scalability. This thesis focuses on the problem of very numerous queries and proposes a scalable approach, implemented in a framework called Big-queries and based on the hypergraph, a flexible data structure which has a larger modeling power and allows accurate formulation of many problems of combinatorial scientific computing. This approach is the result of a collaboration with the company Mentor Graphics. It aims to capture query interaction in a unified query plan and to use partitioning algorithms to ensure scalability and to derive optimal optimization structures (materialized views and data partitioning). The unified plan is also used in the deployment phase of parallel data warehouses, by partitioning data into fragments and allocating these fragments to the corresponding processing nodes. An intensive experimental study showed the interest of our approach in terms of the scalability of the algorithms and the minimization of query response time.
Tran, Viet-Trung. "Scalable data-management systems for Big Data". Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00920432.
Debaere, Steven. "Proactive inferior member participation management in innovation communities". Thesis, Lille, 2018. http://www.theses.fr/2018LIL1A012.
Nowadays, companies increasingly recognize the benefits of innovation communities (ICs) for injecting external consumer knowledge into innovation processes. Despite the advantages of ICs, guaranteeing their viability poses two important challenges. First, ICs are big data environments that can quickly overwhelm community managers as members communicate through posts, thereby creating substantial (volume), rapidly expanding (velocity), and unstructured data that might encompass combinations of linguistic, video, image, and audio cues (variety). Second, most online communities fail to generate successful outcomes, as they are often unable to derive value from individual IC members owing to members' inferior participation. This doctoral dissertation leverages customer relationship management strategies to tackle these challenges and adds value by introducing a proactive inferior-member-participation management framework that allows community managers to proactively reduce inferior member participation while effectively dealing with the data-rich IC environment. It proves that inferior member participation can be identified proactively by analyzing community actors' writing style. It shows that dependencies between members' participation behaviour can be exploited to improve prediction performance. Using a field experiment, it demonstrates that a proactive targeted email campaign can effectively reduce inferior member participation.
Saif, Abdulqawi. "Experimental Methods for the Evaluation of Big Data Systems". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0001.
In the era of big data, many systems and applications are created to collect, store, and analyze massive data in multiple domains. Although these big data systems are subjected to multiple evaluations during their development life-cycle, academia and industry encourage further experimentation to ensure their quality of service and to understand their performance under various contexts and configurations. However, the experimental challenges of big data systems are not trivial. While many pieces of research still employ legacy experimental methods to face such challenges, we argue that experimentation activity can be improved by proposing flexible experimental methods. In this thesis, we address particular challenges to improve experimental context and observability for big data experiments. We first enable experiments to customize the performance of their environmental resources, encouraging researchers to perform scalable experiments over heterogeneous configurations. We then introduce two experimental tools, IOscope and MonEx, to improve observability. IOscope allows performing low-level observations on the I/O stack to detect potential performance issues in target systems, showing that high-level evaluation techniques should be accompanied by such complementary tools to understand systems' performance. In contrast, the MonEx framework works at higher levels to facilitate experimental data collection. MonEx opens directions for practicing experiment-based monitoring independently of the underlying experimental environments. We finally apply statistics to improve experimental designs, reducing the number of experimental scenarios and obtaining a refined set of experimental factors as quickly as possible. Together, these contributions complement each other to facilitate the experimentation activity across almost all phases of the big data experiment life-cycle.
Ramdane, Yassine. "Big Data Warehouse : modèle de distribution des cubes de données à la volée". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE2099.
Partitioning and distribution of data have been widely used in shared-nothing systems, more particularly in distributed systems that use the MapReduce paradigm, such as the Hadoop ecosystem and Spark. They have been used for many purposes, such as load balancing, skipping the loading of unnecessary data partitions, and guiding the physical design of distributed databases or data warehouses. Analysis over data warehouses is usually performed with OLAP queries. An OLAP query is a complex query that contains several costly operations, such as the star join, projection, filtering, and aggregate functions. In this thesis, we propose different static and dynamic approaches to partitioning and load balancing of data, to improve the performance of distributed big data warehouses over a Hadoop cluster. We have proposed different static and dynamic schemes of a big data warehouse over a cluster of homogeneous nodes, which can help the distributed system to improve the execution time of OLAP query operations, such as the star join operation, table scanning, and the Group-By operation. We have proposed four approaches. The first approach is a new data placement strategy that enables a query processing system to perform a star join operation in only one MapReduce cycle, without a shuffle phase. In the second contribution, we propose different partitioning and bucketing techniques to skip the loading of some HDFS blocks and to enhance the parallel processing of the distributed system, based on a workload-driven model. In the third approach, we propose a novel physical design of a distributed big data warehouse over a Hadoop cluster, combining our first data-driven approach with the second workload-driven solution. The fourth contribution improves Group-By and aggregate functions by using a dynamic method which is able to define on the fly the best partitioning scheme for the reducer inputs. To evaluate our approaches, we have conducted experiments on different cluster sizes, using different data warehouse volumes where the fact table has more than 28 billion records. We have used the TPC-DS benchmark, a Hadoop-YARN platform, a Spark engine, and the Ray and Hive systems. Our experiments show that our methods outperform state-of-the-art approaches in many aspects, especially in OLAP query execution time.
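The first contribution's idea, a star join without a shuffle phase, rests on co-partitioning the fact and dimension tables on the join key so that every join is local to a bucket. A single-process Python toy of that placement strategy follows; it is not the thesis's Hadoop/Spark implementation, and all names are illustrative.

```python
def hash_partition(rows, key, n):
    """Hash-partition rows on `key` into n buckets."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def local_star_join(fact_parts, dim_parts, key):
    """Fact and dimension share the same partitioning on the join key,
    so each bucket joins locally -- no shuffle between nodes."""
    for f_part, d_part in zip(fact_parts, dim_parts):
        dim_index = {d[key]: d for d in d_part}
        for f in f_part:
            if f[key] in dim_index:
                yield {**f, **dim_index[f[key]]}

n = 4
fact = [{"prod_id": i, "qty": i * 10} for i in range(6)]
dim = [{"prod_id": i, "name": f"p{i}"} for i in range(6)]
rows = local_star_join(hash_partition(fact, "prod_id", n),
                       hash_partition(dim, "prod_id", n), "prod_id")
print(list(rows))
```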
Mercier, Michael. "Contribution to High Performance Computing and Big Data Infrastructure Convergence". Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM031/document.
The amount of data produced, both in the scientific community and in the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are made for intensive parallel computations. The HPC community is also facing more and more data because of new high-definition sensors and large physics apparatus. The convergence of the two fields is currently happening. In fact, the HPC community is already using Big Data tools, but they are not integrated correctly, especially at the level of the file system and the Resources and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges are for HPC infrastructures, we have studied multiple aspects of the convergence: we made a survey of software provisioning methods, with a focus on data-intensive applications. We also propose a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code whereas similar solutions use at least 1000x more. We evaluate this mechanism under real conditions and in simulation with our simulator Batsim.
Liu, Jixiong. "Semantic Annotations for Tabular Data Using Embeddings : Application to Datasets Indexing and Table Augmentation". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS529.
With the development of Open Data, a large number of data sources are made available to communities (including data scientists and data analysts). This data is the treasure of digital services, as long as it is cleaned, unbiased, and combined with explicit and machine-processable semantics in order to foster exploitation. In particular, structured data sources (CSV, JSON, XML, etc.) are the raw material for many data science processes. However, this data derives from different domains with which consumers are not always familiar (knowledge gap), which complicates its appropriation, while this is a critical step in creating machine learning models. Semantic models (in particular, ontologies) make it possible to explicitly represent the implicit meaning of data by specifying the concepts and relationships present in the data. The provision of semantic labels on datasets facilitates the understanding and reuse of data by providing documentation that can be easily used by a non-expert. Moreover, semantic annotation opens the way to search modes that go beyond simple keywords, allowing queries at a high conceptual level on both the content and the structure of datasets while overcoming the problems of syntactic heterogeneity encountered in tabular data. This thesis introduces a complete pipeline for the extraction, interpretation, and application of tables in the wild with the help of knowledge graphs. We first refresh the existing definition of tables from the perspective of table interpretation and develop systems for collecting and extracting tables from the Web and from local files. Three table interpretation systems are then proposed, based either on heuristic rules or on graph representation models, addressing the challenges observed in the literature. Finally, we introduce and evaluate two table augmentation applications based on semantic annotations, namely data imputation and schema augmentation.
Alexandre-Barff, Welcome. "Architecture out-of-core basée GPU pour de la visualisation interactive de séries temporelles de données AMR". Electronic Thesis or Diss., Reims, 2024. http://www.theses.fr/2024REIMS005.
This manuscript presents a scalable approach for the interactive visualization of large-scale Adaptive Mesh Refinement (AMR) time series. AMR data can be defined as a dynamic gridding format of cells hierarchically refined from a computational domain, described in this study as a regular Cartesian grid. This adaptive feature is essential for tracking time-dependent evolutionary phenomena and makes the AMR format an essential representation for 3D numerical simulations. However, the visualization of numerical simulation data highlights one critical issue: the significant increase in the memory footprint of the generated data, reaching petabytes and thus greatly exceeding the memory capabilities of the most recent graphics hardware. The question, therefore, is how to access this massive data - AMR time series in particular - for interactive visualization on a simple workstation. To overcome this problem, we present an out-of-core GPU-based architecture. Our proposal is a cache system based on an ad-hoc bricking identified by space-filling-curve (SFC) indexing and managed by a GPU-based page table that loads the required AMR data on the fly from disk to GPU memory.
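Brick identification by a space-filling curve can be illustrated with a Morton (Z-order) code, a common SFC choice; whether the thesis uses Morton specifically is our assumption, so treat this as a generic sketch of the indexing idea.

```python
def morton3d(x: int, y: int, z: int) -> int:
    """Interleave the bits of (x, y, z) into a 3D Morton (Z-order) index."""
    def spread(v: int) -> int:
        out = 0
        for i in range(21):            # supports coordinates < 2**21
            out |= (v >> i & 1) << (3 * i)
        return out
    return spread(x) | (spread(y) << 1) | (spread(z) << 2)

# Neighbouring cells get nearby indices, which keeps bricks that are
# close in space close on disk -- good for cache locality.
print(morton3d(0, 0, 0), morton3d(1, 0, 0), morton3d(1, 1, 1))  # 0 1 7
```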
Da, Silva Carvalho Paulo. "Plateforme visuelle pour l'intégration de données faiblement structurées et incertaines". Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4020/document.
We hear a lot about Big Data, Open Data, Social Data, Scientific Data, etc. The importance currently given to data is, in general, very high. We are living in the era of massive data. The analysis of these data is important if the objective is to successfully extract value from them so that they can be used. The work presented in this thesis is related to the understanding, assessment, correction/modification, management, and finally the integration of data, in order to allow their exploitation and reuse. Our research is exclusively focused on Open Data and, more precisely, Open Data organized in tabular form (CSV being one of the most widely used formats in the Open Data domain). The term Open Data first appeared in 1995, when the GCDIS group (Global Change Data and Information System, from the United States) used this expression to encourage entities having the same interests and concerns to share their data [Data et System, 1995]. However, the Open Data movement has only recently undergone a sharp increase and has become a popular phenomenon all over the world. As the movement is recent, the field is still growing and its importance is very strong. The encouragement given by governments and public institutions to have their data published openly plays an important role at this level.
Chihoub, Houssem Eddine. "Managing consistency for big data applications : tradeoffs and self-adaptiveness". Thesis, Cachan, Ecole normale supérieure, 2013. http://www.theses.fr/2013DENS0059/document.
In the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployments, and cost-efficient Pay-As-You-Go usage. In this context, replication is an essential feature in the cloud for dealing with Big Data challenges. Replication enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across different copies. Consistency management is critical for Big Data systems. Strong consistency models introduce serious limitations to systems' scalability and performance due to the required synchronization efforts. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability. However, these models may tolerate, under certain scenarios, too much temporal inconsistency. In this PhD thesis, we address this issue of consistency tradeoffs in large-scale Big Data systems and applications. We first focus on consistency management at the storage system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scales the consistency level up or down at runtime, when needed, in order to provide as high performance as possible while preserving the application's consistency requirements. In addition, we present a thorough study of the impact of consistency management on the monetary cost of running in the cloud. We then leverage this study in order to propose a cost-efficient consistency tuning (named Bismar) in the cloud. In a third direction, we study the impact of consistency management on energy consumption within the data center. Building on our findings, we investigate adaptive configurations of the storage system cluster that target energy saving. In order to complete our system-side study, we focus on the application level. Applications are different, and so are their consistency requirements. Understanding such requirements at the storage system level is not possible. Therefore, we propose an application behavior model that apprehends the consistency requirements of an application. Based on this model, we propose an online prediction approach (named Chameleon) that adapts to the application's specific needs and provides customized consistency.
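Harmony's idea of scaling the consistency level up or down at runtime can be caricatured in a few lines: periodically estimate the stale-read rate and map it to a consistency level. The thresholds and level names below are invented for illustration (they echo Cassandra-style levels, not necessarily the thesis's implementation).

```python
def choose_consistency(stale_read_rate, threshold_up=0.05, threshold_down=0.01):
    """Pick a consistency level from an estimated stale-read rate
    (a toy version of self-adaptive consistency tuning)."""
    if stale_read_rate > threshold_up:
        return "QUORUM"   # too many stale reads: strengthen consistency
    if stale_read_rate < threshold_down:
        return "ONE"      # stale reads are rare: relax for performance
    return "TWO"          # middle ground

# Usage: the controller would be re-evaluated periodically at runtime
for rate in (0.001, 0.03, 0.2):
    print(rate, "->", choose_consistency(rate))
```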
Caigny, Arno de. "Innovation in customer scoring for the financial services industry". Thesis, Lille, 2019. http://www.theses.fr/2019LIL1A011.
Texto completo
This dissertation improves customer scoring. Customer scoring matters for companies' decision-making processes because it helps solve key managerial issues, such as deciding which customers to target in a marketing campaign or assessing which customers are likely to leave the company. The research in this dissertation makes several contributions in three areas of the customer-scoring literature. First, new sources of data are used to score customers. Second, the methodology for going from data to decisions is improved. Third, customer life event prediction is proposed as a new application of customer scoring
El, Garrab Hamza. "Amélioration de la chaine logistique de pièces de rechange en boucle fermée : application des modèles d’apprentissage". Thesis, Angers, 2020. http://www.theses.fr/2020ANGE0019.
Texto completo
In the field of after-sales service, and particularly in maintenance, quick intervention and repair of the customer's equipment are key to customer satisfaction and to building the brand's image in the market. The work presented in this thesis proposes a Big Data and Machine Learning approach for improving the information flow in the spare-parts supply chain. Our contribution focuses on load forecasting in spare-parts repair centers, which are the main suppliers of the parts used to repair customers' systems. The size and complexity of the supply chain, the large number of part numbers, and the multitude of special cases (countries with specific laws, special parts, etc.) mean that classical approaches do not provide reliable forecasts for repair services. In this project, we propose learning algorithms that build knowledge from large volumes of data instead of relying on manual implementation. We review the models in the literature, present our methodology, and then implement the models and evaluate their performance against existing algorithms
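The abstract does not fix a single algorithm; as an illustration of learning a load forecast from historical data rather than hand-crafting rules, the following sketch fits a gradient-boosting regressor on lagged values of a weekly repair-center workload (the series and the number of lags are hypothetical):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def make_lag_features(series, n_lags=4):
    """Turn a weekly repair-load series into (lag features, target) pairs."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return np.array(X), np.array(y)

# Hypothetical weekly workload: number of parts arriving at one repair center.
load = np.array([120, 135, 128, 150, 160, 155, 170, 165, 180, 175, 190, 200],
                dtype=float)
X, y = make_lag_features(load)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
next_week = model.predict(load[-4:].reshape(1, -1))
print(f"forecast for next week: {next_week[0]:.0f} parts")
```

In practice one such model per part family (or a global model with part-number features) would be compared against the classical baselines mentioned in the abstract.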
Yildiz, Orcun. "Efficient Big Data Processing on Large-Scale Shared Platforms : managing I/Os and Failure". Thesis, Rennes, École normale supérieure, 2017. http://www.theses.fr/2017ENSR0009/document.
Texto completo
As of 2017, we live in a data-driven world where data-intensive applications bring fundamental improvements to our lives in many different areas such as business, science, health care and security. This has boosted the growth of data volumes (i.e., the deluge of Big Data). To extract useful information from this huge amount of data, different data processing frameworks have emerged, such as MapReduce, Hadoop, and Spark. Traditionally, these frameworks run on large-scale platforms (i.e., HPC systems and clouds) to leverage their computation and storage power. Usually, these large-scale platforms are used concurrently by multiple users and multiple applications with the goal of better resource utilization. Although sharing these platforms has benefits, it also raises several challenges, among which I/O and failure management are the major ones that can affect efficient data processing. To this end, we first focus on I/O-related performance bottlenecks for Big Data applications on HPC systems. We start by characterizing the performance of Big Data applications on these systems and identify I/O interference and latency as the major performance bottlenecks. Next, we zoom in on the I/O interference problem to further understand its root causes. We then propose an I/O management scheme to mitigate the high latencies that Big Data applications may encounter on HPC systems. Moreover, we introduce interference models for Big Data and HPC applications based on the findings of our experimental study of the root causes of I/O interference, and we leverage these models to minimize the impact of interference on the performance of Big Data and HPC applications. Second, we focus on the impact of failures on the performance of Big Data applications by studying failure handling in shared MapReduce clusters. We introduce a failure-aware scheduler that enables fast failure recovery while optimizing data locality, thus improving application performance
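The scheduler itself is not detailed in this abstract; as a sketch of the idea of combining fast failure recovery with data locality, the following hypothetical function places a re-executed task on a free node that holds a replica of its input whenever possible (all names and cluster state are illustrative):

```python
def schedule_recovery_task(task_blocks, free_slots, block_locations):
    """Place a re-executed task on a free node holding a replica of its input,
    falling back to any free node only when no local slot exists.

    task_blocks: input block ids of the failed task
    free_slots: set of nodes with an available execution slot
    block_locations: block id -> set of nodes storing a replica
    """
    local_nodes = set()
    for block in task_blocks:
        local_nodes |= block_locations.get(block, set())
    candidates = local_nodes & free_slots
    if candidates:
        return candidates.pop()          # data-local recovery: no network transfer
    return next(iter(free_slots), None)  # remote recovery as a last resort

# Hypothetical cluster state after node "n3" fails while running a task on block b7:
print(schedule_recovery_task(
    task_blocks=["b7"],
    free_slots={"n1", "n4"},
    block_locations={"b7": {"n3", "n4"}},
))  # -> "n4", which still holds a replica of b7
```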
Jlassi, Aymen. "Optimisation de la gestion des ressources sur une plate-forme informatique du type Big Data basée sur le logiciel Hadoop". Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4042.
Texto completo"Cyres-Group" is working to improve the response time of his clusters Hadoop and optimize how the resources are exploited in its data center. That is, the goals are to finish work as soon as possible and reduce the latency of each user of the system. Firstly, we decide to work on the scheduling problem in the Hadoop system. We consider the problem as the problem of scheduling a set of jobs on a homogeneous platform. Secondly, we decide to propose tools, which are able to provide more flexibility during the resources management in the data center and ensure the integration of Hadoop in Cloud infrastructures without unacceptable loss of performance. Next, the second level focuses on the review of literature. We conclude that, existing works use simple mathematical models that do not reflect the real problem. They ignore the main characteristics of Hadoop software. Hence, we propose a new model ; we take into account the most important aspects like resources management and the relations of precedence among tasks and the data management and transfer. Thus, we model the problem. We begin with a simplistic model and we consider the minimisation of the Cmax as the objective function. We solve the model with mathematical solver CPLEX and we compute a lower bound. We propose the heuristic "LocFirst" that aims to minimize the Cmax. In the third level, we consider a more realistic modelling of the scheduling problem. We aim to minimize the weighted sum of the following objectives : the weighted flow time ( ∑ wjCj) and the makespan (Cmax). We compute a lower bound and we propose two heuristics to resolve the problem
Darrous, Jad. "Scalable and Efficient Data Management in Distributed Clouds : Service Provisioning and Data Processing". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEN077.
Texto completo
This thesis focuses on scalable data management solutions to accelerate service provisioning and enable the efficient execution of data-intensive applications in large-scale distributed clouds. Data-intensive applications increasingly run on distributed infrastructures (multiple clusters). The two main reasons for this trend are that (1) moving computation to the data sources can eliminate the latency of data transmission, and (2) storing data on one site may not be feasible given the continuous increase in data size. On the one hand, most applications run on virtual clusters to provide isolated services and require virtual machine images (VMIs) or container images to provision those services. It is therefore important to enable fast provisioning of virtualization services in order to reduce the waiting time of newly started services or applications. Unlike previous work, in the first part of this thesis we optimize data retrieval and placement while taking into account challenging issues, including the continuous increase in the number and size of VMIs and container images, and the limited bandwidth and heterogeneity of wide area network (WAN) connections. On the other hand, data-intensive applications rely on replication to provide dependable and fast services, but replication has become expensive and even infeasible with the unprecedented growth of data size. The second part of this thesis provides one of the first studies on understanding and improving the performance of data-intensive applications when replication is replaced with the storage-efficient erasure coding (EC) technique
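To see why replication becomes expensive at scale, the following sketch compares the raw storage required by r-way replication with that of a (k, m) erasure code; the concrete figures (100 GB of data, 3-way replication, a 6+3 code) are illustrative assumptions, not parameters from the thesis:

```python
def storage_overhead(data_gb, scheme):
    """Raw storage needed to keep data_gb of user data under a redundancy scheme.

    scheme: ("replication", r) for r full copies, or ("ec", k, m) for an
    erasure code that splits each object into k data chunks plus m parity chunks.
    """
    if scheme[0] == "replication":
        _, r = scheme
        return data_gb * r
    _, k, m = scheme
    return data_gb * (k + m) / k

data = 100.0  # GB of user data (hypothetical)
print(storage_overhead(data, ("replication", 3)))  # 300.0 GB, survives 2 lost copies
print(storage_overhead(data, ("ec", 6, 3)))        # 150.0 GB, survives any 3 lost chunks
```

The same fault tolerance (or better) at half the storage cost is what makes EC attractive, at the price of encoding/decoding work and multi-node reads, which is precisely the performance question the second part of the thesis studies.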
Nesvijevskaia, Anna. "Phénomène Big Data en entreprise : processus projet, génération de valeur et Médiation Homme-Données". Thesis, Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1247.
Texto completo
Big Data, a sociotechnical phenomenon carrying myths, is reflected in companies by the implementation of their first projects, especially Data Science projects, which do not seem to generate the expected value. The action research carried out over three years in the field, through an in-depth qualitative study of multiple cases, points to key factors that limit this generation of value, including overly self-contained project process models. The result is (1) an open, usage-oriented data project model (Brizo_DS) that includes knowledge capitalization, is intended to reduce the uncertainties inherent in these exploratory projects, and is transferable to the scale of a company's data project portfolio. It is complemented by (2) a tool for documenting the quality of the processed data, the Databook, and (3) a Human-Data Mediation device, which together guarantee the alignment of the actors towards an optimal result
Méral, Hélène. "De la relance multicanal du client fidèle à la performance commerciale des enseignes de distribution spécialisées". Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0385/document.
Texto completo
In a context of developing multi-channel marketing strategies, linked mainly to the constant evolution of Internet tools and new distribution technologies (Dupuis, Prunet, 2001; Dabholkar, 1996), it is important to understand the effects of these strategies on companies' commercial performance through the process of customer loyalty development. This research project therefore investigates, through various quantitative studies, the effect of multi-channel relaunches within retail chains' loyalty programs by means of a "check" (voucher) operation. The study will make it possible to build a benchmark model whose replicability will be verified so that it can be exploited in several business sectors
Soleman, Ramzi. "La théorie des ressources et l'évaluation du système d'information : le cas des outils de surveillance des médias sociaux (Social Media Monitoring)". Thesis, Paris 10, 2018. http://www.theses.fr/2018PA100018.
Texto completo
Recently, social media data, known as Big Social Data (BSD), have attracted more and more attention from researchers and professionals, particularly after the emergence of Social Media Monitoring (SMM) tools used to process BSD. The promises associated with SMM concern the improvement of decision-making processes, or even the transformation of business processes. Despite increasing investments, the effective use of these tools in companies varies widely. In this research, we seek to understand how, and for what purposes, SMM tools are used. For the evaluation of these tools, we build upon the Resource-Based Theory (RBT). To carry out this research, we used a mixed-method approach, consisting of a qualitative study that was used to develop and enrich a second, quantitative study. The results show that combining SMM resources (quality of the SMM tool, human resources, etc.) with complementary resources makes it possible to build SMM capabilities (measurement, process, interaction, etc.) that lead to performance. Moreover, the support of the organization, and more specifically the role of managers, in activating SMM resources and capabilities is consistent with recent advances in resource management. However, we detected some ambiguities in the RBT; to deal with them, we propose resorting to the extended resource-based theory. Finally, we present the contributions, limitations and perspectives of our research
Renault, Thomas. "Three essays on the informational efficiency of financial markets through the use of Big Data Analytics". Thesis, Paris 1, 2017. http://www.theses.fr/2017PA01E009/document.
Texto completo
The massive increase in the availability of data generated every day by individuals on the Internet has made it possible to address the predictability of financial markets from a different perspective. Without claiming to offer a definitive answer to a debate that has persisted for forty years between partisans of the efficient market hypothesis and behavioral finance academics, this dissertation aims to improve our understanding of the price formation process in financial markets through the use of Big Data analytics. More precisely, it analyzes (1) how to measure intraday investor sentiment and determine the relation between investor sentiment and aggregate market returns, (2) how to measure investor attention to news in real time and identify the relation between investor attention and the price dynamics of large-capitalization stocks, and (3) how to detect suspicious behaviors that could undermine the informational role of financial markets, and determine the relation between the level of posting activity on social media and small-capitalization stock returns. The first essay proposes a methodology for constructing a novel indicator of investor sentiment by analyzing an extensive dataset of user-generated content published on the social media platform StockTwits. Examining users' self-reported trading characteristics, the essay provides empirical evidence of sentiment-driven noise trading at the intraday level, consistent with behavioral finance theories. The second essay proposes a methodology for measuring investor attention to news in real time by combining data from traditional newswires with the content published by experts on the social media platform Twitter. The essay demonstrates that news garnering high attention leads to large and persistent changes in trading activity, volatility, and price jumps; it also demonstrates that the pre-announcement effect is reduced when corrected newswire timestamps are considered. The third essay provides new insights into the empirical literature on small-capitalization stock market manipulation by examining a novel dataset of messages published on the social media platform Twitter. It proposes a novel methodology for identifying suspicious behaviors by analyzing interactions between users, and provides empirical evidence of suspicious stock recommendations on social media that could be related to market manipulation. The conclusions of this essay should reinforce regulators' efforts to better monitor social media and highlight the need for better education of individual investors
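The essay's exact indicator is not reproduced in this abstract; as an illustration of one standard way to aggregate message-level sentiment into an index, the following sketch computes a log-ratio bullishness measure in the spirit of Antweiler and Frank (2004) over a hypothetical interval of StockTwits messages:

```python
import math

def bullishness_index(n_bullish, n_bearish):
    """Log-ratio sentiment index in the spirit of Antweiler and Frank (2004):
    positive when bullish messages dominate an interval, negative when bearish
    ones do; the +1 terms keep the index defined for empty intervals."""
    return math.log((1 + n_bullish) / (1 + n_bearish))

# Hypothetical 30-minute interval of messages tagged bullish/bearish by their authors:
print(f"{bullishness_index(n_bullish=42, n_bearish=17):.3f}")  # > 0: net bullish interval
```

Computed interval by interval, such an index yields the kind of intraday sentiment series that can then be related to aggregate market returns.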
Lombardo-Fiault, Bernard. "Collaboration numérique et nouvelles formes de visibilité professionnelle : proposition d’une méthodologie et d’un dispositif réflexif d’adoption des pratiques collaboratives". Thesis, Paris 8, 2017. http://www.theses.fr/2017PA080097/document.
Texto completo
Ten years after the release of digital socio-collaborative platforms, it appears that their use is still not generalized or taken for granted; their integration into professional environments seems difficult, despite the promise of efficiency gains, despite the tools' proximity to traditional desktop office applications, and despite efforts, particularly in consulting services, that fail to sustain usage. This work demonstrates that a new form of visibility induced by sharing, which is the foundation of digital collaboration, can be either a brake on or a lever for adoption, and that it should be taken into account in the change process. It also contributes to the knowledge of the collaborative paradigm by proposing a typology of uses based on their intrinsic and social value, an adoption methodology geared towards the local transformation of daily work practices (Get Collaboration Done!™), and an indicator of the "value" of collaborative behavior, represented (and calculated) by an index determined according to algorithmic modalities (Collaboration-Index™)