Academic literature on the topic 'Nettoyage des données'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Nettoyage des données.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Journal articles on the topic "Nettoyage des données"
Yasseen III, Abdool, Deborah Weiss, Sandy Remer, Nina Dobbin, Morgan MacNeill, Bojana Bogeljic, Dennis Leong, et al. "Augmentation du nombre d’appels relatifs à une exposition à certains nettoyants et désinfectants au début de la pandémie de COVID-19 : données des centres antipoison canadiens." Promotion de la santé et prévention des maladies chroniques au Canada 41, no. 1 (January 2021): 27–32. http://dx.doi.org/10.24095/hpcdp.41.1.03f.
Bouzeghoub, Mokrane, Zoubida Kedad, and Assia Soukane. "Génération de requêtes de médiation intégrant le nettoyage de données." Ingénierie des systèmes d'information 7, no. 3 (June 24, 2002): 39–66. http://dx.doi.org/10.3166/isi.7.3.39-66.
Kremp, Elizabeth. "Nettoyage de fichiers dans le cas de données individuelles : recherche de la cohérence transversale." Économie & prévision 119, no. 3 (1995): 171–93. http://dx.doi.org/10.3406/ecop.1995.5738.
Rivera Andía, Juan Javier, and Geneviève Deschamps. "Comparaison entre la herranza, la « fête de l’eau » et la zafa-casa dans les Andes." Recherches amérindiennes au Québec 44, no. 2-3 (June 1, 2015): 39–48. http://dx.doi.org/10.7202/1030965ar.
Khardi, Abdeslam, Abdelaziz Nogot, Mustapha Abdellaoui, and Fatima Jaiti. "Valorisation des sous-produits du palmier-dattier pour contribuer à la durabilité des oasis du Maroc." Cahiers Agricultures 33 (2024): 3. http://dx.doi.org/10.1051/cagri/2023027.
Lefrançois, Mélanie, Johanne Saint-Charles, and Karen Messing. "« Travailler la nuit pour voir ses enfants, ce n’est pas l’idéal ! »." Articles 72, no. 1 (April 19, 2017): 99–124. http://dx.doi.org/10.7202/1039592ar.
Couture, Andréanne, Najat Bhiry, James Woollett, and Yves Monette. "Géoarchéologie de maisons multifamiliales inuit de la période de contact au Labrador." Études/Inuit/Studies 39, no. 2 (December 2, 2016): 233–58. http://dx.doi.org/10.7202/1038149ar.
Carmen GNELE, Baï Dodji Laurenda, Pierre OUASSA, Expédit Wilfrid VISSIN, and Moussa GIBIGAYE. "Facteurs De Contaminations Des Aliments Dans Les Restaurants De Rue De La Commune D’Abomey-Calavi Au Sud Du Benin, Afrique De L’ouest." International Journal of Progressive Sciences and Technologies 41, no. 1 (October 22, 2023): 93. http://dx.doi.org/10.52155/ijpsat.v41.1.5674.
Anderson, Maureen, Ashok Chhetri, Edith Halyk, Amanda Lang, Ryan McDonald, Julie Kryzanowski, Jessica Minion, and Molly Trecker. "Une éclosion de COVID-19 associée à un centre d’entraînement physique en Saskatchewan : leçons pour la prévention." Relevé des maladies transmissibles au Canada 47, no. 11 (November 10, 2021): 538–44. http://dx.doi.org/10.14745/ccdr.v47i11a08f.
Schroth, Robert J., Grace Kyoon-Achan, Mary McNally, Jeanette Edwards, Penny White, Hannah Tait Neufeld, Mary Bertone, et al. "Initiative en santé buccodentaire des enfants : le point de vue des intervenants quant à ses effets dans les communautés des Premières Nations." Promotion de la santé et prévention des maladies chroniques au Canada 43, no. 9 (September 2023): 439–50. http://dx.doi.org/10.24095/hpcdp.43.9.01f.
Full textDissertations / Theses on the topic "Nettoyage des données"
Galhardas, Héléna. "Nettoyage de données : modèle, langage déclaratif et algorithmes." Versailles-St Quentin en Yvelines, 2001. http://www.theses.fr/2001VERS0032.
The problem of data cleaning, which consists of removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehouses. This holds regardless of the application: relational database joining, web-related, or scientific. In all cases, existing ETL (Extraction, Transformation, Loading) and data cleaning tools for writing data cleaning programs are insufficient. The main challenge is the design and implementation of a data flow graph that effectively generates clean data. Needed improvements to the current state of the art include (i) a clear separation between the logical specification of data transformations and their physical implementation, (ii) debugging of the reasoning behind cleaning results, and (iii) interactive facilities to tune a data cleaning program. This thesis presents a language, an execution model and algorithms that enable users to express data cleaning specifications declaratively and perform the cleaning efficiently. We use as an example a set of bibliographic references used to construct the Citeseer web site. The underlying data integration problem is to derive structured and clean textual records so that meaningful queries can be performed. Experimental results report on the assessment of the proposed framework for data cleaning.
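As a purely illustrative aside (not the thesis's actual declarative language or framework), the short Python sketch below shows the general idea of separating a logical cleaning specification from its physical execution: rules are declared as data and applied by a generic engine. All rule names, columns and values are assumptions invented for the example.

```python
import re

# Logical specification: declarative cleaning rules, kept separate from execution.
# Each rule names a column, a validity test, and a repair function (all illustrative).
RULES = [
    {"column": "year",  "valid": lambda v: re.fullmatch(r"\d{4}", v) is not None,
     "repair": lambda v: re.sub(r"\D", "", v)[:4] or None},
    {"column": "title", "valid": lambda v: v == v.strip(),
     "repair": lambda v: v.strip()},
]

def clean(records, rules):
    """Physical execution: apply the declared rules to a list of dict records."""
    for rec in records:
        for rule in rules:
            col = rule["column"]
            value = rec.get(col)
            if value is not None and not rule["valid"](value):
                rec[col] = rule["repair"](value)
    return records

# Toy bibliographic records, loosely inspired by the CiteSeer example in the abstract.
refs = [{"title": " Data Cleaning ", "year": "c1999"}]
print(clean(refs, RULES))   # -> [{'title': 'Data Cleaning', 'year': '1999'}]
```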
Ben Salem, Aïcha. "Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD054/document.
Nowadays, complex applications such as knowledge extraction, data mining, e-learning or web applications use heterogeneous and distributed data. The quality of any decision depends on the quality of the data used. The absence of rich, accurate and reliable data can potentially lead an organization to make bad decisions. The subject covered in this thesis aims at assisting the user in their quality approach. The goal is to better extract, mix, interpret and reuse data. For this, the data must be related to its semantic meaning, data types, constraints and comments. The first part deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, including the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies. The second part is the data cleansing using the reports on anomalies returned by the first part. It allows corrections to be made within a column itself (data homogenization), between columns (semantic dependencies), and between lines (eliminating duplicates and similar data). Throughout all this process, recommendations and analyses are provided to the user.
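Purely as an illustration of the column-categorization step described above (not the thesis's actual method), the sketch below guesses a column's semantic category from simple regular-expression patterns and reports the values that do not fit it; the categories, patterns and the 0.6 dominance threshold are assumptions.

```python
import re
from collections import Counter

# Illustrative semantic categories and the patterns that recognize them.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date":  re.compile(r"\d{4}-\d{2}-\d{2}$"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}$"),
}

def categorize(values, min_share=0.6):
    """Guess the dominant semantic category of a column and list its anomalies."""
    votes = Counter()
    for v in values:
        for cat, pat in PATTERNS.items():
            if pat.match(v):
                votes[cat] += 1
                break
    if not votes:
        return "unknown", []
    cat, count = votes.most_common(1)[0]
    if count / len(values) < min_share:
        return "unknown", []
    anomalies = [v for v in values if not PATTERNS[cat].match(v)]
    return cat, anomalies

col = ["a@b.org", "c@d.fr", "not-an-email", "e@f.ca"]
print(categorize(col))   # -> ('email', ['not-an-email'])
```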
Tian, Yongchao. "Accéler la préparation des données pour l'analyse du big data." Thesis, Paris, ENST, 2017. http://www.theses.fr/2017ENST0017/document.
We are living in a big data world, where data is being generated in high volume, high velocity and high variety. Big data brings enormous values and benefits, so that data analytics has become a critically important driver of business success across all sectors. However, if the data is not analyzed fast enough, the benefits of big data will be limited or even lost. Despite the existence of many modern large-scale data analysis systems, data preparation, which is the most time-consuming process in data analytics, has not yet received sufficient attention. In this thesis, we study the problem of how to accelerate data preparation for big data analytics. In particular, we focus on two major data preparation steps, data loading and data cleaning. As the first contribution of this thesis, we design DiNoDB, a SQL-on-Hadoop system which achieves interactive-speed query execution without requiring data loading. Modern applications involve heavy batch processing jobs over large volumes of data and at the same time require efficient ad-hoc interactive analytics on temporary data generated in batch processing jobs. Existing solutions largely ignore the synergy between these two aspects, requiring the entire temporary dataset to be loaded to achieve interactive queries. In contrast, DiNoDB avoids the expensive data loading and transformation phase. The key innovation of DiNoDB is to piggyback the creation of metadata on the batch processing phase, which DiNoDB then exploits to expedite the interactive queries. The second contribution is a distributed stream data cleaning system, called Bleach. Existing scalable data cleaning approaches rely on batch processing to improve data quality, which is very time-consuming in nature. We target stream data cleaning, in which data is cleaned incrementally in real time. Bleach is the first qualitative stream data cleaning system, which achieves both real-time violation detection and data repair on a dirty data stream. It relies on efficient, compact and distributed data structures to maintain the necessary state to clean data, and also supports rule dynamics. We demonstrate that the two resulting systems, DiNoDB and Bleach, both achieve excellent performance compared to state-of-the-art approaches in our experimental evaluations, and can help data scientists significantly reduce the time they spend on data preparation.
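The following is a minimal, single-process sketch of incremental violation detection and repair on a record stream, in the spirit of the stream-cleaning setting described above; it is not Bleach itself. The functional dependency used (zip_code determines city), the "first value seen wins" repair policy and all field names are assumptions.

```python
# Incremental check of the functional dependency zip_code -> city on a record stream.
# State is a small dict mapping each zip_code to the city first seen for it.
def clean_stream(records, key="zip_code", dependent="city"):
    state = {}
    for rec in records:
        zip_code, city = rec[key], rec[dependent]
        expected = state.setdefault(zip_code, city)
        if city != expected:
            # Violation detected: repair by enforcing the value already in the state.
            rec = {**rec, dependent: expected, "repaired": True}
        yield rec

stream = [
    {"zip_code": "75013", "city": "Paris"},
    {"zip_code": "75013", "city": "Pariss"},   # dirty record
]
for out in clean_stream(stream):
    print(out)
# {'zip_code': '75013', 'city': 'Paris'}
# {'zip_code': '75013', 'city': 'Paris', 'repaired': True}
```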
Manad, Otman. "Nettoyage de corpus web pour le traitement automatique des langues." Thesis, Paris 8, 2018. http://www.theses.fr/2018PA080011.
Corpora are the main material of computational linguistics and natural language processing. Few languages have corpora built from web resources (forums, blogs, etc.), even among those that lack other resources. Web resources contain a lot of noise (menus, ads, etc.). Filtering boilerplate and repetitive data requires large-scale manual cleaning by the researcher. This thesis presents an automatic system that constructs web corpora with a low level of noise. It consists of three modules: (a) one for building corpora in any language and for any type of data, intended to be collaborative and to preserve corpus history; (b) one for crawling web forums and blogs; (c) one for extracting relevant data using clustering techniques with different distances, based on the structure of the web page. The system is evaluated in terms of noise-filtering efficacy and computing time. Our experiments, carried out on four languages, are evaluated against our own gold-standard corpus. To measure quality, we use recall, precision and F-measure. Feature distance and Jaro distance give the best results, but not in the same contexts, with feature distance having the best average quality. We compare our method with three methods dealing with the same problem: Nutch, BootCat and JusText. Our system performs better in terms of extraction quality, even though Nutch and BootCat dominate in computing time.
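As an illustration of the clustering intuition only (not the thesis's DOM segmentation, feature distance or Jaro distance), the sketch below flags text blocks that recur nearly verbatim across pages of a site as boilerplate, using difflib's ratio as a stand-in similarity; the 0.9 threshold and the toy pages are assumptions.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.9):
    return SequenceMatcher(None, a, b).ratio() >= threshold

def split_boilerplate(pages, min_pages=2):
    """Blocks that reappear (near-verbatim) on several pages are flagged as boilerplate."""
    boilerplate, content = set(), []
    for i, page in enumerate(pages):
        for block in page:
            hits = sum(any(similar(block, other) for other in p)
                       for j, p in enumerate(pages) if j != i)
            if hits + 1 >= min_pages:
                boilerplate.add(block)
            else:
                content.append(block)
    return boilerplate, content

pages = [
    ["Home | Forum | Contact", "First real post about data cleaning."],
    ["Home | Forum | Contact", "Another genuine post, different topic."],
]
print(split_boilerplate(pages))
# ({'Home | Forum | Contact'},
#  ['First real post about data cleaning.', 'Another genuine post, different topic.'])
```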
Zaidi, Houda. "Amélioration de la qualité des données : correction sémantique des anomalies inter-colonnes." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1094/document.
Data quality represents a major challenge because the cost of anomalies can be very high, especially for large databases in enterprises that need to exchange information between systems and integrate large amounts of data. Decision making based on erroneous data has a bad influence on the activities of organizations. The quantity of data continues to increase, as do the risks of anomalies. The automatic correction of these anomalies is a topic that is becoming more and more important, both in business and in the academic world. In this work, we propose an approach to better understand the semantics and the structure of the data. Our approach helps to automatically correct intra-column and inter-column anomalies. We aim to improve the quality of data by processing null values and the semantic dependencies between columns.
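The sketch below is a small, hedged illustration of one inter-column correction in the spirit of the abstract above, not the method of the thesis: missing values in a dependent column are filled from the value most frequently observed with the same determinant value. Column names and data are invented for the example.

```python
from collections import Counter, defaultdict

def fill_from_dependency(rows, determinant, dependent):
    """Fill missing `dependent` values using the value most often seen
    together with the same `determinant` (an assumed inter-column dependency)."""
    seen = defaultdict(Counter)
    for r in rows:
        if r[dependent] is not None:
            seen[r[determinant]][r[dependent]] += 1
    for r in rows:
        if r[dependent] is None and seen[r[determinant]]:
            r[dependent] = seen[r[determinant]].most_common(1)[0][0]
    return rows

rows = [
    {"country_code": "FR", "country": "France"},
    {"country_code": "FR", "country": None},      # null repaired from the other FR row
    {"country_code": "MA", "country": "Maroc"},
]
print(fill_from_dependency(rows, "country_code", "country"))
```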
Cappuzzo, Riccardo. "Deep learning models for tabular data curation." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS047.
Data curation is a pervasive and far-reaching topic, affecting everything from academia to industry. Current solutions rely on manual work by domain users, but they are not adequate. We investigate how to apply deep learning to tabular data curation. We focus our work on developing unsupervised data curation systems and designing curation systems that intrinsically model categorical values in their raw form. We first implement EmbDI to generate embeddings for tabular data, and address the tasks of entity resolution and schema matching. We then turn to the data imputation problem, using graph neural networks in a multi-task learning framework called GRIMP.
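For illustration only, the sketch below is a heavily simplified stand-in for the embedding idea: each row is flattened into a "sentence" of column:value tokens and a small gensim Word2Vec model is trained on those sentences, so that tokens appearing in similar rows land near each other in the embedding space. This is not EmbDI (which builds a graph over rows, columns and values and trains on random walks over it), on such a toy corpus the similarities are only indicative, and all tokens and parameters are assumptions.

```python
from gensim.models import Word2Vec

# Two near-duplicate rows and one distinct row, flattened into "sentences"
# of column:value tokens (an illustrative stand-in for a real table).
rows = [
    ["title:alien", "year:1979", "director:ridley_scott"],
    ["title:alien", "year:1979", "director:r_scott"],
    ["title:blade_runner", "year:1982", "director:ridley_scott"],
]

# Train skip-gram embeddings on the row sentences (tiny, illustrative settings).
model = Word2Vec(sentences=rows, vector_size=16, window=3,
                 min_count=1, sg=1, seed=1, epochs=200)

# Tokens that share row contexts (e.g. the two spellings of the director) tend to
# end up close in the embedding space; entity resolution and schema matching then
# reduce to nearest-neighbour queries over these vectors.
print(model.wv.most_similar("director:r_scott", topn=2))
```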
Cadot, Martine. "Extraire et valider les relations complexes en sciences humaines : statistiques, motifs et règles d'association." Phd thesis, Université de Franche-Comté, 2006. http://tel.archives-ouvertes.fr/tel-00594174.
Lemieux Perreault, Louis-Philippe. "Approches bio-informatiques appliquées aux technologies émergentes en génomique." Thèse, 2014. http://hdl.handle.net/1866/10884.
Genetic studies, such as linkage and association studies, have contributed greatly to a better understanding of the etiology of several diseases. Nonetheless, despite the tens of thousands of genetic studies performed to date, a large part of the heritability of diseases and traits remains unexplained. The last decade saw unprecedented progress in genomics. For example, the use of microarrays for high-density comparative genomic hybridization demonstrated the existence of large-scale copy number variations and polymorphisms. These are now detectable using DNA microarrays or high-throughput sequencing. In addition, high-throughput sequencing has shown that the majority of variations in the exome are rare or unique to the individual. This has led to the design of a new type of DNA microarray that is enriched for rare variants, which can be genotyped quickly and inexpensively at high throughput. In this context, the general objective of this thesis is the development of methodological approaches and bioinformatics tools for the detection, at the highest quality standards, of copy number polymorphisms and rare single nucleotide variations. It is expected that, by doing so, more of the missing heritability of complex traits can be accounted for, contributing to the advancement of knowledge of the etiology of diseases. We developed an algorithm for the partition of copy number polymorphisms, making it feasible to use these structural changes in genetic linkage studies with family data. We also conducted an extensive study in collaboration with the Wellcome Trust Centre for Human Genetics of the University of Oxford to characterize metrics for defining rare copy number variants and their impact on the results of studies with unrelated individuals. We conducted a thorough comparison of the performance of genotyping algorithms when used with a new DNA microarray composed of a majority of very rare genetic variants. Finally, we developed a bioinformatics tool for the fast and efficient processing of genetic data, to increase quality and reproducibility of results and to reduce spurious associations.
Book chapters on the topic "Nettoyage des données"
"Nettoyage et découverte." In L'analyse des données de sondage avec SPSS, 47–66. Presses de l'Université du Québec, 2018. http://dx.doi.org/10.2307/j.ctv10qqx59.7.
Costanzo, Lucia. "Le nettoyage de données dans le processus de gestion des données de recherche." In La gestion des données de recherche dans le contexte canadien: un guide pour la pratique et l'apprentissage. Western University, Western Libraries, 2023. http://dx.doi.org/10.5206/rhbn7291.
Luo, Rong, and Berenica Vejvoda. "Nouvelles aventures en nettoyage des données: travailler avec des données dans Excel et R." In La gestion des données de recherche dans le contexte canadien: un guide pour la pratique et l'apprentissage. Western University, Western Libraries, 2023. http://dx.doi.org/10.5206/dpci3894.