
Journal articles on the topic "Big Data et algorithmes"

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles


Consult the 50 best scholarly journal articles on the topic "Big Data et algorithmes".

An "Add to bibliography" button is available next to each work in the list. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read the abstract of the work online, whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and compile your bibliography correctly.

1

Benavent, Christophe. "Big Data, algorithme et marketing : rendre des comptes". Statistique et société 4, no. 3 (2016): 25–35. https://doi.org/10.3406/staso.2016.1009.

Full text of the source
Abstract:
This article looks at the large-scale deployment of algorithms used in marketing and embedded in a platform logic. Taking into account repeated observations of negative externalities produced by algorithms – segregation, selection bias, polarization, heterogenization – as well as their intrinsic weaknesses resulting from technical debt, data dependencies, and the adversarial context in which they operate, we arrive at the need for algorithmic accountability and ask how algorithms should be governed.
APA, Harvard, Vancouver, ISO and other citation styles
2

Jauréguiberry, Francis. "L’individu hypermoderne face aux big data". Sociologie et sociétés 49, no. 2 (4.12.2018): 33–58. http://dx.doi.org/10.7202/1054273ar.

Full text of the source
Abstract:
Big data, data mining, and profiling, together with the whole range of applications supporting individual and collective action that derive from them, rightly raise concerns regarding, on the one hand, the protection of privacy in an environment that captures everyone's deeds and movements and, on the other hand, forms of governance increasingly informed by predictive algorithms. Without dismissing these dangers, an almost opposite position is defended here as a hypothesis: far from leading to the decline of individual autonomy, of the self as a singular person capable of reflexivity and able to make autonomous choices, the renewed confrontation with a purely quantitative and utilitarian personal image (the profile) can lead to a recovery of the self, so that choices are no longer guided only by a narcissistic, utilitarian, and quantitative logic, but just as much by principles of individual coherence, ethics, and morality that ultimately give meaning to life.
APA, Harvard, Vancouver, ISO and other citation styles
3

Koch, Olivier. "Les données de la guerre. Big Data et algorithmes à usage militaire". Les Enjeux de l'information et de la communication N° 19/2, no. 2 (2018): 113. http://dx.doi.org/10.3917/enic.025.0113.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
4

Gori, Roland. "La biopolitique à l’ère des algorithmes". Cliniques méditerranéennes 110, no. 2 (25.09.2024): 147–65. http://dx.doi.org/10.3917/cm.110.0147.

Full text of the source
Abstract:
Michel Foucault showed at length how the hermeneutics of the modern subject led it to externalize the apparatuses of "techniques of the self", but what would he say today about a subject's knowledge of itself that passes through Facebook, Big Data, and digital technology, in short a subject in the era of selfies and artificial intelligence? Where do we stand today with respect to the ways of governing human beings and the training techniques that teach them to govern themselves? There is a genuine knot between the ways of knowing oneself, of fashioning oneself as an ethical subject, and the ways of governing. This is why the way of knowing oneself through digital technology and algorithms corresponds to a political government of human beings through digital prediction and management. Faced with the crisis of popular trust in democratic governments, states take refuge in technocratic management, an almost algorithmic administration of populations, not without submitting to the demands of the markets, with which the technocratic government of human beings can get along well. With such a transition toward post-democracy heading toward a post-political era, already present in the new ways of caring for souls, we would be facing a crisis of truth and subjectivity, and, even more, the emergence of a new system, a new political and subjective episteme, a new relation of the subject to power.
APA, Harvard, Vancouver, ISO and other citation styles
5

Besse, Philippe, Céline Castets-Renard and Aurélien Garivier. "L’IA du Quotidien peut elle être Éthique ?" Statistique et société 6, no. 3 (2018): 9–31. https://doi.org/10.3406/staso.2018.1083.

Full text of the source
Abstract:
Combining massive data (big data) and machine learning algorithms, the power of automatic decision tools arouses as much hope as fear. Numerous recently enacted European (GDPR) and French legislative texts attempt to regulate the uses of these tools. Leaving aside the well-identified problems of data confidentiality and obstacles to competition, we focus on the risks of discrimination, the problems of transparency, and the quality of algorithmic decisions. A detailed comparison of the legal texts with the complexity and opacity of learning algorithms reveals the need for major technological disruptions, whether to detect or reduce the risk of discrimination or to satisfy the right to an explanation. Since the trust of developers and, above all, of users (citizens, litigants, customers) is essential, algorithms that exploit personal data must be deployed within a strict ethical framework. In conclusion, to meet this need, we list some possibilities for control to be developed: institutional oversight, an ethics charter, and an external audit attached to the issuing of a label.
APA, Harvard, Vancouver, ISO and other citation styles
6

Viglino, Manon. "La présomption d’innocence à l’ère du numérique". Revue de la recherche juridique, no. 2 (5.01.2021): 1039–63. http://dx.doi.org/10.3917/rjj.190.1039.

Full text of the source
Abstract:
New technologies are developing and forms of artificial intelligence are becoming more sophisticated. Algorithms, big data, and predictive justice are now at the heart of debates about the future of justice. Far from denying the progress enabled by these new technologies and the benefits of the endless possibilities they offer, their dangers should nevertheless be underlined. In particular, at the moment of their official adoption and development, it seems essential to ensure respect for the most fundamental rights and freedoms. In the criminal context especially, the presumption of innocence already appears somewhat weakened, a weakening induced by the ease of sharing unverified information, notably through social networks, but also by the numerous biases and the very nature of certain algorithms. An ethical reflection should therefore be initiated in order to guarantee the rule of law without hindering progress.
APA, Harvard, Vancouver, ISO and other citation styles
7

Nazeer, Mohammed Yaseer, and Mohammad Tarik Nadir. "Data Deluge Dynamics: Tracing the Evolution and Ramifications of Big Data Phenomenon". International Journal of Research and Innovation in Social Science VIII, no. V (2024): 2147–56. http://dx.doi.org/10.47772/ijriss.2024.805157.

Full text of the source
Abstract:
This paper presents a comprehensive review of the evolution, methodologies, challenges, and implications of Big Data in various domains. Big Data has emerged as a critical resource, offering unprecedented opportunities for decision-making, innovation, and societal advancement. The analysis delves into the historical trajectory of Big Data, examining its evolution from the early 2000s to its current status as a cornerstone of contemporary data-driven practices. Drawing on seminal works by Chen et al. (2014), Manyika et al. (2011), and Kitchin (2014), the review highlights the fundamental characteristics of Big Data, encapsulated by the “three Vs” – Volume, Velocity, and Variety – along with the emerging dimensions of Veracity and Value. Methodologically, the paper surveys the diverse approaches and technologies employed in Big Data analytics, ranging from descriptive and predictive analytics to advanced machine learning algorithms. Provost and Fawcett (2013) and Zikopoulos et al. (2011) provide valuable insights into the practical applications of these methodologies across sectors such as healthcare, finance, marketing, and governance. However, amidst the promise of Big Data lies a myriad of challenges, including data quality issues, scalability constraints, and ethical dilemmas. Davenport and Harris (2007) discuss the imperative of organizations to compete on analytics while navigating the complexities of managing large and heterogeneous datasets. Moreover, the paper examines the ethical, legal, and social considerations inherent in Big Data practices, emphasizing the importance of privacy, consent, fairness, transparency, and accountability. These concerns are further underscored by recent controversies surrounding data privacy breaches and algorithmic biases, prompting calls for enhanced regulatory frameworks and ethical guidelines. Looking ahead, the paper outlines future research directions in Big Data, including the development of ethical frameworks for governance, the integration of diverse data sources, and the exploration of emerging applications in smart cities, precision agriculture, and autonomous vehicles. In conclusion, while Big Data holds immense potential for driving innovation and progress, its responsible and ethical utilization is paramount to ensuring equitable and sustainable societal outcomes.
APA, Harvard, Vancouver, ISO and other citation styles
8

Berriche, Amira, Dominique Crié and Michel Calciu. "Une Approche Computationnelle Ancrée : Étude de cas des tweets du challenge #Movember en prévention de santé masculine". Décisions Marketing N° 112, no. 4 (25.01.2024): 79–103. http://dx.doi.org/10.3917/dm.112.0079.

Full text of the source
Abstract:
• Objective: The objective of this study is to present the computational grounded theory approach, which relies on researchers interpreting the themes detected by artificial intelligence (AI) algorithms, and then to apply it to the #Movember case. • Methodology: Unsupervised classification with LDA and sentiment analysis were performed on 144,906 tweets from different participating countries (France, Italy, Belgium, Australia, USA, UK, Saudi Arabia, etc.). • Results: The results show that the process of individual engagement in the #Movember social movement comprises three main elements: (1) four segments of individual engagement (sympathizers, aware, engaged, and sustained), (2) collective emotions (positive and negative), and (3) cognitive and motivational factors (benefit-cost calculation, collective efficacy, and identity). • Managerial implications: The results suggest marketing actions tailored to each segment to help both the #Movember organizers and health professionals achieve two main objectives: (1) screening and (2) awareness, recruitment, and fundraising, thanks to big data, by targeting people with a family history. • Originality: Research on #Movember usually relies on supervised algorithms, which have several limitations such as confirmation bias, lack of repeatability, and time requirements. This work uses the unsupervised LDA model to let the machine identify latent concepts, in a computational grounded theory (CGT) perspective.
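A minimal sketch of the kind of pipeline the abstract describes (unsupervised LDA topic detection whose themes are then interpreted by the researcher), assuming a plain list of tweet strings and scikit-learn; the toy tweets and the number of topics are illustrative, not taken from the study.

```python
# Hypothetical sketch: unsupervised topic detection on tweets with LDA,
# loosely following the computational-grounded-theory workflow described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "Growing a moustache for #Movember to raise awareness",
    "Donated to my friend's Movember page today",
    "Prostate cancer screening saved my dad's life",
]

# Bag-of-words representation of the corpus.
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(tweets)

# Fit an LDA model; the number of latent themes is a researcher's choice
# that is then interpreted qualitatively (the "grounded" step).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)

# Print the top words per detected theme for manual interpretation.
terms = vectorizer.get_feature_names_out()
for k, component in enumerate(lda.components_):
    top = [terms[i] for i in component.argsort()[-5:][::-1]]
    print(f"theme {k}: {top}")
```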
APA, Harvard, Vancouver, ISO and other citation styles
9

Polton, Dominique. "Les données de santé". médecine/sciences 34, no. 5 (May 2018): 449–55. http://dx.doi.org/10.1051/medsci/20183405018.

Full text of the source
Abstract:
In health as in other sectors, a growing mass of digitized data from various sources is available and can be exploited. It is one of the areas where the potential of big data appears very promising, with multiple innovations for the benefit of patients and of the health system (faster research and development, knowledge of diseases and risk factors, personalized medicine, support for diagnosis and treatment, a greater role for patients, pharmacovigilance, etc.), even though concerns are also expressed about the societal, economic, and ethical impacts that the growing use of algorithms and artificial intelligence could induce. Developing the use of these data is a strategic objective of all health systems, and from this point of view the French national health data system (Système national de données de santé, SNDS) is a valuable asset for France, but one that needs to be completed and enriched.
APA, Harvard, Vancouver, ISO and other citation styles
10

Bullich, Vincent, and Viviane Clavier. "Production des données, « Production de la société ». Les Big Data et algorithmes au regard des Sciences de l’information et de la communication". Les Enjeux de l'information et de la communication N° 19/2, no. 2 (2018): 5. http://dx.doi.org/10.3917/enic.025.0005.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
11

Cotton, Anne-Marie. "European Communication Monitor 2016 : 10 ans de recherche pan-européenne sur la communication stratégique". Revue Communication & professionnalisation, no. 4 (26.01.2017): 181–205. http://dx.doi.org/10.14428/rcompro.vi4.713.

Full text of the source
Abstract:
The research team of the European Communication Monitor (ECM) publishes the results of the tenth edition of their pan-European study on the developments and dynamics of strategic communication in 43 European countries. In the 2016 edition, they focused on the analysis of "big data", communication algorithms, communication practices specifically dealing with coaching and consulting, stakeholder engagement, active influencers on social networks, and the knowledge, skills and know-how of communication professionals. 2,710 communication professionals participated in the study. The comparisons with the results of the ECM 2013 reveal a weak evolution of the average level of competences of communication professionals, with the exception of the prevention and management of crises on social networks. To support professionalisation standards, the researchers created the comparative excellence framework (CEF), which aims to identify the characteristics that distinguish professionals and the practices of excellence.
APA, Harvard, Vancouver, ISO and other citation styles
12

Ibrahim, Nadia, Alaa Hassan and Marwah Nihad. "Big Data Analysis of Web Data Extraction". International Journal of Engineering & Technology 7, no. 4.37 (13.12.2018): 168. http://dx.doi.org/10.14419/ijet.v7i4.37.24095.

Full text of the source
Abstract:
This study considers large-scale data extraction techniques, including the detection of patterns and hidden relationships between numerous factors, in order to bring in the required information. Rapid analysis of massive data can lead to innovation and to concepts of theoretical value. Compared with mining traditional data sets, working with vast amounts of large, heterogeneous, interdependent data can expand the knowledge and ideas about the target domain. This research studies data mining on the Internet. The various networks used to extract data from different locations can sometimes appear complex, and web technology has been used to extract information for analysis (Marwah et al., 2016). In this research, we extracted information from large numbers of web pages, examined the pages of each site using Java code, and added the extracted information to a dedicated database for the web pages. We used a data network function to evaluate and categorize the pages found, identifying trusted or risky web pages, and exported the data to a CSV file. These data were then examined and categorized using WEKA to obtain accurate results. We concluded that the applied data mining algorithms outperform other techniques in classification, data extraction, and performance.
APA, Harvard, Vancouver, ISO and other citation styles
13

Abdessamad, Sara. "La jurisprudence « as code », vers quelle substantialité jurisprudentielle ?" Pin Code 32, no. 4 (18.11.2024): 1–6. https://doi.org/10.3917/pinc.020.0001.

Full text of the source
Abstract:
Predictive justice is one of the key concepts of twenty-first-century justice, a justice that reflects the use of new "intelligent" technologies to enshrine the principle of efficiency in the judicial sphere, especially in a global environment that is changing and in crisis. The robotization of justice thus becomes a safeguard for meeting the competition between legal systems, a competition that in turn encourages a pragmatic mindset. According to this calculating approach, such a mindset would produce a "mechanical" case law, generated by big data algorithms. The question of the spirit of "algorithmized" case law also highlights judicial language, which is the metalanguage of the language of law. Deep learning has changed the game in language processing; it is indeed a technological revolution that imitates the human brain. The leitmotif of this contribution is to show, through the themes covered, that the calculating mindset would impoverish the substance of case law as a norm expressed in a language, and that the spirit of case law stems from a purely creative mind, in which the human judge constantly confronts unprecedented phenomena, a complicated operation for a humanoid judge guided by its big data.
APA, Harvard, Vancouver, ISO and other citation styles
14

Sumrahadi, Abdullah, Musa Maliki and Harryanto Aryodiguno. "Navigating the Data Stream: The Intersection of Digital Politics and Indonesian Foreign Policy in the Era of Big Data". Journal of Law and Politics 5, no. 1 (April 2024): 1–16. http://dx.doi.org/10.69648/rywo5712.

Full text of the source
Abstract:
In the contemporary landscape of international relations, the fusion of digital politics and big data analytics has emerged as a pivotal force reshaping diplomatic strategies and national foreign policy. This paper delves into the intricate interplay between digital politics and Indonesian foreign policy within the expansive realm of big data. Indonesia is a dynamic archipelago boasting a burgeoning digital ecosystem that serves as an illuminating case study to unravel the multifaceted dynamics of this intersection. In this era of information overload (Wood et al., 1998), big data analytics revolutionized the traditional diplomacy paradigm, offering unprecedented insight into global trends, public sentiment, and policy preferences. Traditionally grounded in diplomatic norms and statecraft, Indonesian foreign policy is now navigating the data stream, leveraging digital technologies to enhance strategic decision-making. The convergence of digital politics and big data has democratized access to information and posed novel challenges, including privacy, misinformation, and algorithmic biases (Boyd, 2008). Against this backdrop, Indonesian policymakers are tasked with crafting nuanced approaches to harness the potential of big data while safeguarding national interest and democratic values. Through a comprehensive analysis of Internet research methods through case studies, policy initiatives, and theoretical frameworks, this paper illuminates the transformative potential of big data in shaping the contours of Indonesian foreign policy. By exploring the synergies between digital politics and diplomatic endeavors, this study contributes to a deeper understanding of the evolving landscape of international relations in the digital age.
APA, Harvard, Vancouver, ISO and other citation styles
15

Zolynski, Célia. "OS BIG DATA E OS DADOS PESSOAIS ENTRE OS PRINCÍPIOS DA PROTEÇÃO E DA INOVAÇÃO". Law, State and Telecommunications Review 12, no. 1 (16.03.2020): 225–45. http://dx.doi.org/10.26512/lstr.v12i1.30007.

Full text of the source
Abstract:
Objective – The article contrasts the problem of Big Data with the possibilities and limits of personal data protection. It is an original contribution to the academic discussion about the regulation of the Internet and the management of algorithms, focusing on Big Data. Methodology/approach/design – The article provides bibliographic research on the opposition between Big Data and personal data protection, focusing on European Union law and French law. From this research it is possible to identify regulatory alternatives to Big Data, whether of a legal-administrative or a technological nature. Findings – The article shows that, in addition to the traditional regulatory options based on the law, there are technological options for regulating Big Data and algorithms. The article goes through an analysis of administrative enforcement, such as that of France's CNIL (Commission nationale informatique et libertés), to show that it has limits. Thus, the article concludes that there is a need to build a new type of regulation, one that is open to the input of regulated parties and civil society, in the form of new co-regulatory arrangements. Practical implications – The article has an obvious application, since the production of legal solutions for Internet regulation requires combining them with technological solutions. Brazil and several Latin American countries are experiencing this agenda, as they are building institutions and solutions to solve the dilemma of personal data protection. Originality/value – The article clarifies several parts of the General Data Protection Regulation (EU Regulation 2016/679) and its applicability to Big Data. These new types of data processing impose several legal and regulatory challenges, whose solutions cannot be trivial and will rely on new theories and practices.
APA, Harvard, Vancouver, ISO and other citation styles
16

Nguyen Thai, B., and A. Olasz. "RASTER DATA PARTITIONING FOR SUPPORTING DISTRIBUTED GIS PROCESSING". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XL-3/W3 (20.08.2015): 543–51. http://dx.doi.org/10.5194/isprsarchives-xl-3-w3-543-2015.

Full text of the source
Abstract:
The big data concept has already had an impact on the geospatial sector. Several studies apply techniques originating in computer science to GIS processing of huge amounts of geospatial data, while other studies consider geospatial data to have always been big data (Lee and Kang, 2015). Nevertheless, data acquisition methods have improved substantially, not only in the amount but also in the spectral, spatial and temporal resolution of the raw data. A significant portion of big data is geospatial data, and the size of such data is growing rapidly, by at least 20% every year (Dasgupta, 2013). Of the increasing volume of raw data, produced in different formats and representations and for different purposes, only the wealth of information derived from these data sets represents valuable results. However, computing capability and processing speed run into limitations, even when semi-automatic or automatic procedures are applied to complex geospatial data (Kristóf et al., 2014). Recently, distributed computing has reached many interdisciplinary areas of computer science, including remote sensing and geographic information processing. Cloud computing, even more so, requires appropriate processing algorithms to be distributed in order to handle geospatial big data. The Map-Reduce programming model and distributed file systems have proven their capabilities for processing non-GIS big data, but it is sometimes inconvenient or inefficient to rewrite existing algorithms for the Map-Reduce programming model, and GIS data cannot be partitioned like text-based data by lines or bytes. Hence, we would like to find an alternative solution for data partitioning, data distribution and execution of existing algorithms without rewriting them, or with only minor modifications. This paper focuses on a technical overview of currently available distributed computing environments, as well as on GIS (raster) data partitioning, distribution and distributed processing of GIS algorithms. A proof-of-concept implementation has been made for raster data partitioning, distribution and processing. The first performance results have been compared against the commercial software ERDAS IMAGINE 2011 and 2014. Partitioning methods depend heavily on the application area, so data partitioning can be considered a preprocessing step before applying processing services to the data. As a proof of concept, we have implemented a simple tile-based partitioning method, splitting an image into smaller grids (N×M tiles) and comparing the processing time of an NDVI calculation with existing methods. The concept is demonstrated using our own open-source processing framework.
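A minimal NumPy sketch of the tile-based partitioning idea mentioned in the abstract (splitting a raster into N×M tiles and computing NDVI per tile); the array shapes, band names, and the grid size are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: split a two-band raster into N x M tiles and
# compute NDVI per tile, mimicking the tile-based partitioning idea above.
import numpy as np

def split_into_tiles(band: np.ndarray, n: int, m: int):
    """Yield (row, col, tile) sub-arrays covering the raster in an n x m grid."""
    rows = np.array_split(np.arange(band.shape[0]), n)
    cols = np.array_split(np.arange(band.shape[1]), m)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            yield i, j, band[r[0]:r[-1] + 1, c[0]:c[-1] + 1]

def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    return (nir - red) / np.clip(nir + red, 1e-6, None)

# Toy raster with a red and a near-infrared band.
red = np.random.rand(1000, 1200).astype(np.float32)
nir = np.random.rand(1000, 1200).astype(np.float32)

# Each tile could be shipped to a different worker; here we simply loop.
results = {}
for (i, j, red_tile), (_, _, nir_tile) in zip(
    split_into_tiles(red, 4, 3), split_into_tiles(nir, 4, 3)
):
    results[(i, j)] = ndvi(red_tile, nir_tile)

print(len(results), "tiles processed")
```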
APA, Harvard, Vancouver, ISO and other citation styles
17

Phan, Thanh Huan, and Hoài Bắc Lê. "A Comprehensive Survey of Frequent Itemsets Mining on Transactional Database with Weighted Items". Journal of Research and Development on Information and Communication Technology 2021, no. 1 (17.06.2021): 19–28. http://dx.doi.org/10.32913/mic-ict-research.v2021.n1.967.

Full text of the source
Abstract:
In 1993, Agrawal et al. proposed the first algorithm for mining traditional frequent itemsets on a binary transactional database with unweighted items. This algorithm is essential for finding hidden relationships among items in the data. By 1998, with the development of various types of transactional databases, researchers had proposed frequent itemset mining algorithms for transactional databases with weighted items (where the importance/meaning/value of items differs), which provide more knowledge than traditional frequent itemset mining. In this article, the authors present a survey of frequent itemset mining algorithms for transactional databases with weighted items over the past twenty years. This survey helps researchers choose the right technical solution when scaling up to big data mining. Finally, the authors give their recommendations and directions for future research.
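A minimal, unweighted Apriori-style sketch of the frequent itemset mining idea the survey covers, in pure Python; the transactions and the support threshold are illustrative toy values.

```python
# Hypothetical sketch: naive Apriori-style frequent itemset mining on an
# unweighted transactional database, illustrating the technique surveyed above.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support = 3  # absolute support threshold (toy value)

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Level-wise search: frequent 1-itemsets first, then grow by one item at a time.
items = sorted({item for t in transactions for item in t})
level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
all_frequent = set(level)
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = {c for c in candidates if support(c) >= min_support}
    all_frequent |= level

for itemset in sorted(all_frequent, key=len):
    print(set(itemset), support(itemset))
```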
APA, Harvard, Vancouver, ISO and other citation styles
18

ALAM, MAHBOOB, and MOHD AMJAD. "A precipitation forecasting model using machine learning on big data in clouds environment". MAUSAM 72, no. 4 (1.11.2021): 781–90. http://dx.doi.org/10.54302/mausam.v72i4.3546.

Full text of the source
Abstract:
Numerical weather prediction (NWP) has long been a difficult task for meteorologists. Atmospheric dynamics is extremely complicated to model, and chaos theory teaches us that the mathematical equations used to predict the weather are sensitive to initial conditions; that is, slightly perturbed initial conditions could yield very different forecasts. Over the years, meteorologists have developed a number of different mathematical models for atmospheric dynamics, each making slightly different assumptions and simplifications, and hence each yielding different forecasts. It has been noted that each model has its strengths and weaknesses forecasting in different situations, and hence to improve performance, scientists now use an ensemble forecast consisting of different models and running those models with different initial conditions. This ensemble method uses statistical post-processing; usually linear regression. Recently, machine learning techniques have started to be applied to NWP. Studies of neural networks, logistic regression, and genetic algorithms have shown improvements over standard linear regression for precipitation prediction. Gagne et al proposed using multiple machine learning techniques to improve precipitation forecasting. They used Breiman’s random forest technique, which had previously been applied to other areas of meteorology. Performance was verified using Next Generation Weather Radar (NEXRAD) data. Instead of using an ensemble forecast, it discusses the usage of techniques pertaining to machine learning to improve the precipitation forecast. This paper is to present an approach for mapping of precipitation data. The project attempts to arrive at a machine learning method which is optimal and data driven for predicting precipitation levels that aids farmers thereby aiming to provide benefits to the agricultural domain.
APA, Harvard, Vancouver, ISO and other citation styles
19

Bagui, Sikha, and Timothy Bennett. "Optimizing random forests: spark implementations of random genetic forests". BOHR International Journal of Engineering 1, no. 1 (2022): 42–51. http://dx.doi.org/10.54646/bije.2022.09.

Full text of the source
Abstract:
The Random Forest (RF) algorithm, originally proposed by Breiman et al. (1), is a widely used machine learning algorithm that gains its merit from its fast learning speed as well as high classification accuracy. However, despite its widespread use, the different mechanisms at work in Breiman's RF are not yet fully understood, and there is still ongoing research on several aspects of optimizing the RF algorithm, especially in the big data environment. To optimize the RF algorithm, this work builds new ensembles that optimize the random portions of the RF algorithm using genetic algorithms, yielding Random Genetic Forests (RGF), Negatively Correlated RGF (NC-RGF), and Preemptive RGF (PFS-RGF). These ensembles are compared with Breiman's classic RF algorithm in Hadoop's big data framework using Spark on a large, high-dimensional network intrusion dataset, UNSW-NB15.
APA, Harvard, Vancouver, ISO and other citation styles
20

Qu, Quanbo, Baocang Wang, Yuan Ping and Zhili Zhang. "Improved Cryptanalysis of a Fully Homomorphic Symmetric Encryption Scheme". Security and Communication Networks 2019 (2.06.2019): 1–6. http://dx.doi.org/10.1155/2019/8319508.

Full text of the source
Abstract:
Homomorphic encryption is widely used in the scenarios of big data and cloud computing for supporting calculations on ciphertexts without leaking plaintexts. Recently, Li et al. designed a symmetric homomorphic encryption scheme for outsourced databases. Wang et al. proposed a successful key-recovery attack on the homomorphic encryption scheme but required the adversary to know some plaintext/ciphertext pairs. In this paper, we propose a new ciphertext-only attack on the symmetric fully homomorphic encryption scheme. Our attack improves the previous Wang et al.’s attack by eliminating the assumption of known plaintext/ciphertext pairs. We show that the secret key of the user can be recovered by running lattice reduction algorithms twice. Experiments show that the attack successfully and efficiently recovers the secret key of the randomly generated instances with an overwhelming probability.
APA, Harvard, Vancouver, ISO and other citation styles
21

Duan, Guiduo, Xiaotong Wang, Tianxi Huang and Jürgen Kurths. "An Improved Group Similarity-Based Association Rule Mining Algorithm in Complex Scenes". International Journal of Pattern Recognition and Artificial Intelligence 34, no. 02 (14.06.2019): 2059005. http://dx.doi.org/10.1142/s0218001420590053.

Full text of the source
Abstract:
Association rule (AR) mining in complex scenes has attracted extensive attention from researchers in recent years. Typically, researchers have focused on an algorithm itself and ignored generalization methods that improve the performance of AR mining. Tuna et al. presented a general data structure, the Speeding-Up AR Structure with Inverted Index Compression (SAII), which can be utilized in most existing algorithms to improve their performance [IEEE Trans. Cybern. 46(12) (2016) 3059–3072]. However, we found that this algorithm consumes a lot of time re-ordering data, because a one-to-one comparison method is used in this process, which is the main reason the speeding-up structure is difficult to establish when coping with much larger amounts of data. To overcome these problems, this paper proposes an improved speeding-up AR algorithm based on group similarity and the Apache Spark framework to further reduce memory requirements and runtime. Our simulation results on a police-business big dataset make clear that the improved approach performs well and is more suitable for a big data environment.
APA, Harvard, Vancouver, ISO and other citation styles
22

Aziz, Sameen, Saleem Ullah, Bushra Mughal, Faheem Mushtaq and Sabih Zahra. "Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms". Pakistan Journal of Engineering and Technology 3, no. 2 (22.10.2020): 172–77. http://dx.doi.org/10.51846/vol3iss2pp172-177.

Full text of the source
Abstract:
People talk on social media because they find it a good and easy way to express their feelings about a topic, a post, or a product on e-commerce websites. In Asia, people mostly use the Roman Urdu script to express their opinions about a topic. Sentiment analysis of Roman Urdu (Bilal et al., 2016) is a big challenge for researchers because of the lack of resources and the language's non-structured, non-standard syntax and script. We collected a dataset from Kaggle containing 21,000 manually annotated entries and prepared the data for machine learning. We then applied different machine learning algorithms (SVM, logistic regression, random forest, naïve Bayes, AdaBoost, KNN) (Bowers et al., 2018) with different parameters and kernels and with TF-IDF features (unigram, bigram, uni-bigram) (Pereira et al., 2018) to find the best-fitting algorithm. From the best-performing algorithms we chose four and combined them on the dataset, but after hyperparameter tuning the best model was built by a support vector machine with a linear kernel, reaching 80% accuracy, an F1 score of 0.79, precision of 0.79, and recall of 0.78, using grid search with 5-fold cross-validation (Ezpeleta et al., 2018). We then performed experiments on robust linear regression model estimation using RANSAC (random sample consensus) (Huang, Gao, and Zhou, 2018; Chum and Matas, 2008), which gives the best estimators with 82.19%.
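A minimal scikit-learn sketch of the configuration reported above (TF-IDF n-grams feeding a linear-kernel SVM tuned with grid search); the toy sentences, labels, fold count, and parameter grid are placeholders, not the paper's setup.

```python
# Hypothetical sketch: TF-IDF n-grams + linear SVM with grid-searched parameters,
# mirroring the best configuration summarised in the abstract above.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

texts = ["bohat acha product hai", "bilkul bakwas cheez", "acha laga mujhe", "paisa zaya na karo"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy data)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC(kernel="linear")),
])
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],  # unigram vs. uni+bigram features
    "svm__C": [0.1, 1.0, 10.0],
}
# The paper reports 5-fold CV; the toy data here only supports 2 folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```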
APA, Harvard, Vancouver, ISO and other citation styles
23

Rahman, Md Atiqur, and Mohamed Hamada. "Lossless Image Compression Techniques: A State-of-the-Art Survey". Symmetry 11, no. 10 (11.10.2019): 1274. http://dx.doi.org/10.3390/sym11101274.

Full text of the source
Abstract:
Modern daily life activities result in a huge amount of data, which creates a big challenge for storing and communicating them. As an example, hospitals produce a huge amount of data on a daily basis, which makes it a big challenge to store it in limited storage or to communicate it through the restricted bandwidth over the Internet. Therefore, there is an increasing demand for more research in data compression and communication theory to deal with such challenges. Such research responds to the requirements of data transmission at high speed over networks. In this paper, we focus on deep analysis of the most common techniques in image compression. We present a detailed analysis of run-length, entropy and dictionary based lossless image compression algorithms with a common numeric example for a clear comparison. Following that, the state-of-the-art techniques are discussed based on some benchmark images. Finally, we use standard metrics such as average code length (ACL), compression ratio (CR), peak signal-to-noise ratio (PSNR), efficiency, encoding time (ET) and decoding time (DT) in order to measure the performance of the state-of-the-art techniques.
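A minimal sketch of one of the lossless families the survey analyses (run-length encoding), shown on a toy byte sequence rather than the paper's benchmark images.

```python
# Hypothetical sketch: run-length encoding/decoding, one of the lossless
# compression families compared in the survey above.
def rle_encode(data: bytes) -> list:
    """Return (value, count) pairs for consecutive runs of equal bytes."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

def rle_decode(runs: list) -> bytes:
    return bytes(b for value, count in runs for b in [value] * count)

row = bytes([255] * 10 + [0] * 5 + [255] * 3)  # one image row with long runs
encoded = rle_encode(row)
assert rle_decode(encoded) == row
print(encoded)  # [(255, 10), (0, 5), (255, 3)]
```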
APA, Harvard, Vancouver, ISO and other citation styles
24

Yang, Junjie. "Technology Focus: Data Analytics (October 2023)". Journal of Petroleum Technology 75, no. 10 (1.10.2023): 93–94. http://dx.doi.org/10.2118/1023-0093-jpt.

Full text of the source
Abstract:
Throughout the present decade, the oil and gas industry is experiencing and enduring digital transformation, marked by continuous evolution. In the pursuit of increased efficiency, companies have embraced cutting-edge technologies in artificial intelligence and big data, effectively automating their existing work flows. Having undergone extensive exploration and refinement, the debate between physics-based models and data-driven models has now found a harmonious middle ground. This convergence has paved the way for significant advancements in the realms of transfer learning and machine-learning-assisted simulation, opening up new avenues for research and practical application. The paramount objective of data scientists in the oil and gas industry is to decipher hidden patterns within the data and subsequently transform these insights into domain knowledge, which, in turn, guides operation and facilitates educated decision-making. Hence, despite the thrilling advancements in deep learning and large language models, the upstream industry continues to extensively use traditional machine-learning models such as tree-based algorithms for processing structured data sets because of their strong interpretability. Another prominent trend involves the widespread development and application of no-code or low-code machine-learning platforms, empowering citizen data scientists to efficiently leverage machine-learning models. As we press forward, we are confronted with a series of challenges, including the following: - Coping with an abundance of data with low quality that need significant effort in manipulation and quality control - Contending with the relatively high cost of data acquisition, which curtails the full exploitation of data-hungry models - Encountering a scarcity of successful proof-of-concept projects that make it to production - Lacking interpretability of complex but powerful models, hindering our ability to deliver tangible insight to subject-matter experts and management The selected list of papers shares some interesting examples to showcase the value added by close collaboration between data scientists and subject-matter experts and how the power of the digital revolution actively revitalizes our industry. Recommended additional reading at OnePetro: www.onepetro.org. SPE 204712 Data-Driven Solution for Enhancing Workover Intervention Activities by Saniya Karnik, SLB, et al. SPE 211363 Generation of a Regional Fluid Database for Gas Condensate Assets in a Thrusted Carbonate Environment of the Northern Emirates by Harshil Saradva, Sharjah National Oil Corporation, et al. SPE 211719 Machine-Learning-Assisted Well-Log Data Quality Control and Preprocessing Lab by Nader Gerges, ADNOC, et al.
APA, Harvard, Vancouver, ISO and other citation styles
25

Flores, Eibar, Hannah Hansen and Simon Clark. "Semantic Technologies to Model Battery Data and Knowledge". ECS Meeting Abstracts MA2023-02, no. 1 (22.12.2023): 112. http://dx.doi.org/10.1149/ma2023-021112mtgabs.

Full text of the source
Abstract:
Data-intensive battery research is here to stay. With open data incentives on the rise and simpler software interfaces granting access to powerful machine learning algorithms, researchers are uncovering a wealth of insights from battery data. However, data without context are just unusable digital bits. Battery data must be appropriately described to render it interoperable and reusable for traditional and data-intensive research. This requires overcoming numerous challenges associated to data heterogeneity, inconsistency, standardization, and meaning. Semantic technologies address these challenges by providing a unified and flexible framework for integrating and reasoning about data from heterogeneous sources. In this contribution, we will introduce Semantic Technologies and how they enable both human and software agents to understand and reuse battery data. We will present the tools we are developing to i) encourage the adoption of a common standard vocabulary to describe the battery domain[1], ii) connect research resources (datasets, experts, reports, publications, projects) within an accessible platform[2], and iii) reduce the effort required to upgrade data with semantic context. Overall, these tools take concrete steps towards guaranteeing compliance with FAIR data principles. We will further outline plans for future development, including strategies for reaching out to the global battery landscape, facilitate the curation of battery data for data-driven research, and enable knowledge exploration via natural language models. [1] S. Clark et al., "Toward a Unified Description of Battery Data," Adv. Energy Mater., vol. 2102702, 2021. [2] https://github.com/BIG-MAP/Demo-BatteryDataSemanticSearch
APA, Harvard, Vancouver, ISO and other citation styles
26

Çetin, F., and M. O. Kulekci. "AN EXPERIMENTAL ANALYSIS OF SPATIAL INDEXING ALGORITHMS FOR REAL TIME SAFETY CRITICAL MAP APPLICATION". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-4-2021 (17.06.2021): 41–48. http://dx.doi.org/10.5194/isprs-annals-v-4-2021-41-2021.

Full text of the source
Abstract:
Abstract. This paper presents a study that compares the three space partitioning and spatial indexing techniques, KD Tree, Quad KD Tree, and PR Tree. KD Tree is a data structure proposed by Bentley (Bentley and Friedman, 1979) that aims to cluster objects according to their spatial location. Quad KD Tree is a data structure proposed by Berezcky (Bereczky et al., 2014) that aims to partition objects using heuristic methods. Unlike Bereczky’s partitioning technique, a new partitioning technique is presented based on dividing objects according to space-driven, in the context of this study. PR Tree is a data structure proposed by Arge (Arge et al., 2008) that is an asymptotically optimal R-Tree variant, enables data-driven segmentation. This study mainly aimed to search and render big spatial data in real-time safety-critical avionics navigation map application. Such a real-time system needs to efficiently reach the required records inside a specific boundary. Performing range query during the runtime (such as finding the closest neighbors) is extremely important in performance. The most crucial purpose of these data structures is to reduce the number of comparisons to solve the range searching problem. With this study, the algorithms’ data structures are created and indexed, and worst-case analyses are made to cover the whole area to measure the range search performance. Also, these techniques’ performance is benchmarked according to elapsed time and memory usage. As a result of these experimental studies, Quad KD Tree outperformed in range search analysis over the other techniques, especially when the data set is massive and consists of different geometry types.
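A minimal sketch of the kind of range query these index structures are built to accelerate, using SciPy's k-d tree rather than the specific KD Tree, Quad KD Tree, or PR Tree variants compared in the paper; the point cloud, radius, and window are illustrative.

```python
# Hypothetical sketch: range searching with a k-d tree, the kind of query
# the spatial index structures compared above are designed to speed up.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(100_000, 2))  # toy map features (x, y)
tree = cKDTree(points)                            # build the spatial index once

# "Find every feature within radius r of the current position" style query.
query_position = np.array([50.0, 50.0])
neighbour_idx = tree.query_ball_point(query_position, r=2.0)
print(len(neighbour_idx), "features inside the radius")

# Axis-aligned window query by brute-force scan, for comparison of counts.
window = (points[:, 0] > 48) & (points[:, 0] < 52) & (points[:, 1] > 48) & (points[:, 1] < 52)
print(int(window.sum()), "features inside the 4 x 4 window (linear scan)")
```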
APA, Harvard, Vancouver, ISO and other citation styles
27

Lysaght, Tamra, Hannah Yeefen Lim, Vicki Xafis and Kee Yuan Ngiam. "AI-Assisted Decision-making in Healthcare". Asian Bioethics Review 11, no. 3 (September 2019): 299–314. http://dx.doi.org/10.1007/s41649-019-00096-0.

Full text of the source
Abstract:
Abstract Artificial intelligence (AI) is set to transform healthcare. Key ethical issues to emerge with this transformation encompass the accountability and transparency of the decisions made by AI-based systems, the potential for group harms arising from algorithmic bias and the professional roles and integrity of clinicians. These concerns must be balanced against the imperatives of generating public benefit with more efficient healthcare systems from the vastly higher and accurate computational power of AI. In weighing up these issues, this paper applies the deliberative balancing approach of the Ethics Framework for Big Data in Health and Research (Xafis et al. 2019). The analysis applies relevant values identified from the framework to demonstrate how decision-makers can draw on them to develop and implement AI-assisted support systems into healthcare and clinical practice ethically and responsibly. Please refer to Xafis et al. (2019) in this special issue of the Asian Bioethics Review for more information on how this framework is to be used, including a full explanation of the key values involved and the balancing approach used in the case study at the end of this paper.
APA, Harvard, Vancouver, ISO and other citation styles
28

Suwalska, Aleksandra, and Joanna Polanska. "GMM-Based Expanded Feature Space as a Way to Extract Useful Information for Rare Cell Subtypes Identification in Single-Cell Mass Cytometry". International Journal of Molecular Sciences 24, no. 18 (13.09.2023): 14033. http://dx.doi.org/10.3390/ijms241814033.

Full text of the source
Abstract:
Cell subtype identification from mass cytometry data presents a persisting challenge, particularly when dealing with millions of cells. Current solutions are consistently under development, however, their accuracy and sensitivity remain limited, particularly in rare cell-type detection due to frequent downsampling. Additionally, they often lack the capability to analyze large data sets. To overcome these limitations, a new method was suggested to define an extended feature space. When combined with the robust clustering algorithm for big data, it results in more efficient cell clustering. Each marker’s intensity distribution is presented as a mixture of normal distributions (Gaussian Mixture Model, GMM), and the expanded space is created by spanning over all obtained GMM components. The projection of the initial flow cytometry marker domain into the expanded space employs GMM-based membership functions. An evaluation conducted on three established cellular identification algorithms (FlowSOM, ClusterX, and PARC) utilizing the most substantial publicly available annotated dataset by Samusik et al. demonstrated the superior performance of the suggested approach in comparison to the standard. Although our approach identified 20 cell clusters instead of the expected 24, their intra-cluster homogeneity and inter-cluster differences were superior to the 24-cluster FlowSOM-based solution.
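A minimal sketch of the expansion step described above: each marker's intensity distribution is modelled with a Gaussian mixture and the marker is replaced by its component membership probabilities before clustering. The toy data, the number of components per marker, and the use of k-means as the downstream clusterer are assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: expand each marker into GMM-component membership
# probabilities, then cluster in the expanded space, as outlined above.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
cells = rng.normal(size=(5000, 3))   # toy intensities: 5000 cells x 3 markers
n_components_per_marker = 2          # per-marker choice; assumed here

expanded_columns = []
for m in range(cells.shape[1]):
    marker = cells[:, m].reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components_per_marker, random_state=0).fit(marker)
    expanded_columns.append(gmm.predict_proba(marker))  # GMM-based membership functions

expanded = np.hstack(expanded_columns)  # cells x (markers * components)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(expanded)
print(expanded.shape, np.bincount(labels))
```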
APA, Harvard, Vancouver, ISO and other citation styles
29

Giri, Pradip Raj, and Toyanath Upadhayaya. "Digital Ethics and Mindful Circulation: Navigating the Complexities in the Digital Age". Myagdi Guru 5, no. 1 (31.12.2022): 1–10. http://dx.doi.org/10.3126/mg.v5i1.70615.

Full text of the source
Abstract:
In the digital age, characterized by unprecedented connectivity and rapid information exchange, the dynamics of writing, communication, and ethical considerations have undergone profound transformations. This paper aims at exploring why the ethical implications of digital communication are important during circulating and disseminating information in the context of digital ethics. The primary texts for this paper are tweets by former President of United States of America, Donald Trump and some other posts on social media. To support the claim, the researchers use a wide range of scholars such as L. Floridi who focuses on digital ethics and L.E. Gries who discusses “circulation” as a central concept in digital rhetoric. This paper also argues that algorithms and big data applications should prioritize improving human well-being, dignity, rights, and autonomy. For that, the researchers have used Gilli A. et al.’s concept of “human-centered” writing, M.C. Murphy’s insights about the role of social media algorithms, and B. McComiskey’s concept of “post-truth.” This paper concludes that digital technologies and algorithms should shape the ethical landscape of information dissemination thereby emphasizing the importance of ethical frameworks in navigating the complexities of digital communication to safeguard societal values, truthfulness, and democratic discourse amidst the challenges posed by misinformation and capitalist influences.
APA, Harvard, Vancouver, ISO and other citation styles
30

Amicelle, Anthony. "Policing & big data. La mise en algorithmes d’une politique internationale". Critique internationale N° 92, no. 3 (28.06.2021): 23–48. http://dx.doi.org/10.3917/crii.092.0026.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
31

Dimitriadis, George, Joana P. Neto and Adam R. Kampff. "t-SNE Visualization of Large-Scale Neural Recordings". Neural Computation 30, no. 7 (July 2018): 1750–74. http://dx.doi.org/10.1162/neco_a_01097.

Full text of the source
Abstract:
Electrophysiology is entering the era of big data. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, that is, single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention (Rey, Pedreira, & Quian Quiroga, 2015; Rossant et al., 2016) but is still far from solved. Current methods still rely on human input and thus become unfeasible as the size of the data sets grows exponentially. Here we introduce the t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction method (Van der Maaten & Hinton, 2008) as a visualization tool in the spike sorting process. t-SNE embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low- (usually two-) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets from both hybrid (Rossant et al., 2016) and paired juxtacellular/extracellular recordings (Neto et al., 2016). We have released a graphical user interface (GUI) written in Python as a tool for the manual clustering of the t-SNE embedded spikes and as a tool for an informed overview and fast manual curation of results from different clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes. They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.
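A minimal sketch of embedding high-dimensional spike feature vectors with t-SNE for visual cluster inspection, using scikit-learn rather than the authors' released GUI; the synthetic "units" are placeholders for real spike features.

```python
# Hypothetical sketch: t-SNE embedding of spike feature vectors for visual
# inspection of putative single units, in the spirit of the tool described above.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Toy data: 3 "units", each a cloud in a 30-dimensional feature space.
spikes = np.vstack([rng.normal(loc=c, scale=0.5, size=(300, 30)) for c in (0.0, 3.0, -3.0)])

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(spikes)

plt.scatter(embedding[:, 0], embedding[:, 1], s=3)
plt.title("t-SNE embedding of spike features")
plt.show()
```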
APA, Harvard, Vancouver, ISO and other citation styles
32

Jiao, Yang, Bruno Goutorbe, Christelle Grauer, Matthieu Cornec and Jérémie Jakubowicz. "The categorization challenge organized by Cdiscount on datascience.net in 2015: analysis of the released data set and winning contributions". Statistique et Enseignement 8, no. 2 (2017): 125–35. https://doi.org/10.3406/staso.2017.1375.

Full text of the source
Abstract:
In 2015, Cdiscount challenged the community to predict the correct category of its products from some of their attributes, such as the title, description, price, or associated image. Candidates had access to the entire catalogue of products active in May 2015, about 15.8 million items spread over 5,789 categories, except for a small part used as a test set. Data quality is far from homogeneous and the distribution of categories is extremely unbalanced, which makes the categorization task harder. The five winning algorithms, selected among more than 3,500 submissions, reach a rate of 66–68% correct predictions on the test set. Most of them use simple linear models such as logistic regressions, which suggests that the preliminary steps, such as text pre-processing, vectorization, and data resampling, are more crucial than the choice of complex non-linear models. In particular, the winners all correct the category imbalance by random sampling methods or by weighting according to the importance of the categories. The two best algorithms stand out by aggregating large numbers of models trained on random subsets of the data. The product catalogue is made available to the research and teaching community, which will thus have real e-commerce data to benchmark and improve text- and image-based classification algorithms in a setting with a very large number of classes.
APA, Harvard, Vancouver, ISO and other citation styles
33

Laroche, Arnaud. "La « vague de fond » du BigData". Statistique et société 2, no. 4 (2014): 13–18. https://doi.org/10.3406/staso.2014.928.

Full text of the source
Abstract:
What characterizes the "big data" era is not only the size of the data files or the speed at which they must be processed: it is above all the place assigned to them in industrial and service processes. From now on, it is "the data" that drives applications and decisions. Use cases are multiplying, from the micro-targeting of advertisements to the management of energy or transport networks. Sectors such as telecommunications, insurance, and banking are directly affected, and public economic and social information will be affected as well. This vast movement raises questions: what is the place of traditional statistical analyses compared with "data-driven" algorithms? How can citizen-consumers be given control over their own information? How can the social implications of the generalization of algorithms be better designed and controlled?
APA, Harvard, Vancouver, ISO and other citation styles
34

Potier, Victor. "Dominique Cardon, À quoi rêvent les algorithmes ? Nos vies à l’heure du big data". Sociologie 9, no. 3 (2018): 339. http://dx.doi.org/10.3917/socio.093.0339.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
35

Quellet, Céline. "Cardon Dominique, À quoi rêvent les algorithmes ? Nos vies à l’heure des big data". Recherches sociologiques et anthropologiques 47, no. 2 (31.12.2016): 154–56. http://dx.doi.org/10.4000/rsa.1779.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
36

Beaudouin, Valérie, and Winston Maxwell. "La prédiction du risque en justice pénale aux états-unis : l’affaire propublica-compas". Réseaux N° 240, no. 4 (21.09.2023): 71–109. http://dx.doi.org/10.3917/res.240.0071.

Full text of the source
Abstract:
An article published by ProPublica in 2016 argued that the Compas software, used in the United States to predict recidivism, harms the Black population: "It's biased against blacks." The publication created a shock wave in the public sphere and fueled debates on the fairness of algorithms and on the soundness of these risk prediction tools, debates which until then had been confined to specialist circles. Starting from the ProPublica-Compas affair, we explored the different branches of the controversy in the data science arena and in the world of criminal justice. While in the media sphere the Compas affair illustrates the drifts associated with algorithms and reinforces worries about artificial intelligence (fear of replacement, of the reinforcement of inequalities, and of opacity), in the academic world two arenas take up the affair. In the data science arena, researchers discuss fairness criteria and their incompatibility, showing how problematic the translation of a moral principle into statistical indicators is. They also debate the supposed superiority of the machine over humans in prediction tasks. In the criminal justice arena, a much more heterogeneous space, the ProPublica-Compas affair reinforces the awareness that tools need to be better evaluated before being used and that it is necessary to understand how judges appropriate these tools in context, and it leads the NGOs that defend prisoners, as well as legislators, to change their stance toward these prediction tools. Whereas the data science arena operates within a disciplinary bubble, focused on data and algorithms out of context, in the legal arena, which brings together heterogeneous actors, the question of how these tools fit into professional practice takes center stage.
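A minimal sketch of the kind of computation at the heart of the fairness debate summarised above: comparing group-wise false-positive rates (ProPublica's criterion) with calibration among those flagged (the vendor's criterion). The numbers are invented toy data, not COMPAS data.

```python
# Hypothetical sketch: two competing fairness criteria from the
# ProPublica-Compas debate, computed on invented toy data.
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=10_000)                    # two demographic groups (0 / 1)
risk_score = rng.uniform(0, 1, size=10_000)                # model's predicted risk
reoffended = rng.uniform(0, 1, size=10_000) < risk_score   # toy ground truth
flagged = risk_score > 0.5                                  # "high risk" decision

for g in (0, 1):
    mask = group == g
    # Error-rate balance: false-positive rate within the group.
    fpr = np.mean(flagged[mask] & ~reoffended[mask]) / max(np.mean(~reoffended[mask]), 1e-9)
    # Calibration: observed reoffence rate among those the model flagged.
    calibration = reoffended[mask & flagged].mean()
    print(f"group {g}: FPR = {fpr:.2f}, P(reoffend | flagged) = {calibration:.2f}")
```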
Style APA, Harvard, Vancouver, ISO itp.
37

Ampomah, Ernest Kwame, Zhiguang Qin i Gabriel Nyame. "Evaluation of Tree-Based Ensemble Machine Learning Models in Predicting Stock Price Direction of Movement". Information 11, nr 6 (20.06.2020): 332. http://dx.doi.org/10.3390/info11060332.

Pełny tekst źródła
Streszczenie:
Forecasting the direction and trend of stock prices is an important task that helps investors make prudent financial decisions in the stock market. Investment in the stock market carries a big risk, and minimising prediction error reduces the investment risk. Machine learning (ML) models typically perform better than statistical and econometric models, and ensemble ML models have been shown in the literature to produce performance superior to that of single ML models. In this work, we compare the effectiveness of tree-based ensemble ML models (Random Forest (RF), XGBoost Classifier (XG), Bagging Classifier (BC), AdaBoost Classifier (Ada), Extra Trees Classifier (ET), and Voting Classifier (VC)) in forecasting the direction of stock price movement. Data for eight different stocks from three stock exchanges (NYSE, NASDAQ, and NSE) are randomly collected and used for the study. Each data set is split into a training and a test set. Ten-fold cross-validation accuracy is used to evaluate the ML models on the training set. In addition, the ML models are evaluated on the test set using accuracy, precision, recall, F1-score, specificity, and area under the receiver operating characteristic curve (AUC-ROC). Kendall's W test of concordance is used to rank the performance of the tree-based ML algorithms. On the training set, the AdaBoost model performed better than the rest of the models. On the test set, the accuracy, precision, F1-score, and AUC metrics produced results significant enough to rank the models, and the Extra Trees classifier outperformed the other models in all the rankings.
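A minimal sketch of this kind of comparison, in Python with scikit-learn, is given below. It uses synthetic stand-in data rather than the authors' stock features, default hyperparameters, and scikit-learn's gradient boosting in place of the separate xgboost package; it illustrates the workflow (10-fold cross-validation on the training split, then test-set metrics) rather than reproducing the study.

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the engineered stock features and the up/down target.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "RF": RandomForestClassifier(random_state=42),
    "BC": BaggingClassifier(random_state=42),
    "Ada": AdaBoostClassifier(random_state=42),
    "ET": ExtraTreesClassifier(random_state=42),
    "GB": GradientBoostingClassifier(random_state=42),  # stand-in for XGBoost
}
models["VC"] = VotingClassifier([(k, v) for k, v in models.items()], voting="soft")

for name, model in models.items():
    # 10-fold cross-validation accuracy on the training set, then test-set metrics.
    cv_acc = cross_val_score(model, X_tr, y_tr, cv=10, scoring="accuracy").mean()
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    print(name,
          "cv_acc", round(cv_acc, 3),
          "test_acc", round(accuracy_score(y_te, pred), 3),
          "f1", round(f1_score(y_te, pred), 3),
          "auc", round(roc_auc_score(y_te, proba), 3))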
Style APA, Harvard, Vancouver, ISO itp.
38

Monino, Jean-Louis. "Big Data et Open Data". Marché et organisations N° 51, nr 3 (31.07.2024): 169–98. http://dx.doi.org/10.3917/maorg.051.0169.

Pełny tekst źródła
Streszczenie:
"Big Data" and "Open Data" constitute reservoirs of information that are still under-exploited. The question underlying this article is which elements can shed light on a better integration of their potential. Through conceptual analysis, the article shows their capacity to transform business, government and society. Big Data provides the power to analyse and change the world; Open Data ensures that this power is shared and encourages open-innovation approaches, bringing together, on the one hand, organisations holding exploitable data and, on the other, creative firms, including startups and SMEs, that supply innovative technologies, ultimately leading to the emergence of radically new solutions. As the analytical approach evolves, business models based on monetisation strategies therefore have to be built. By becoming the active foundation of new products and services, Big Data analytics calls for a thorough revision of business models, which are now necessarily data-oriented.
Style APA, Harvard, Vancouver, ISO itp.
39

Dinh-Xuan, A. T. "«Big Data» et pneumologie". Revue des Maladies Respiratoires Actualités 7, nr 3 (wrzesień 2015): 197–99. http://dx.doi.org/10.1016/s1877-1203(15)30052-5.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
40

Baudin, Bruno. "Biologie et big data". Revue Francophone des Laboratoires 2019, nr 509 (luty 2019): 1. http://dx.doi.org/10.1016/s1773-035x(19)30042-5.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
41

Charaudeau, Marie-Odile, Alexis Fritel, Charles Huot, Philippe Martin i Laurent Prével. "Et demain ? Archivage et big data". La Gazette des archives 240, nr 4 (2015): 373–84. http://dx.doi.org/10.3406/gazar.2015.5319.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
42

Lin, Ai Qin, Min Li Zheng, Chun Guang Fan i Lin Yang. "Surface Morphology Simulation of High Speed Milled of Face Milling Cutters". Advanced Materials Research 305 (lipiec 2011): 225–29. http://dx.doi.org/10.4028/www.scientific.net/amr.305.225.

Pełny tekst źródła
Streszczenie:
Taking face milling cutters as the research object, a 3D surface morphology simulation model for high-speed cutting is established that accounts for the spindle tilt ("partial pendulum") of the cutter tooth surfaces, using the principle of graphic matrix transformation and vector algorithms. By meshing the workpiece in the simulation algorithm and comparing the simulated and predicted surface morphology and surface roughness with those of the actually machined surface, the correctness of the simulation model is verified. The simulation analyses how surface morphology and surface roughness change as cutting parameters and tool geometry are varied, which helps in choosing reasonable cutting and geometrical parameters and is significant for actual machining. Face milling cutters offer high efficiency and good quality when cutting large flat or curved surfaces. With the development of high-speed cutting technology, built-up edge and scale-like burrs hardly occur in high-speed milling, so cutter geometry, cutting data and related factors become the main influences on surface roughness. At present, tools are chosen and milling parameters determined largely from experience in order to meet quality requirements, which is a limited approach. The surface roughness of machined components is reflected directly in the microscopic geometry of the machined surface, so the microscopic surface geometry produced by theoretical simulation is valuable for forecasting surface roughness and selecting reasonable cutting parameters. Several simulation studies of microscopic surface geometry exist: Zhao Xiao ming et al. [1, 2] studied simulation modelling of the microscopic geometry produced by ball-end mills; Xu An ping et al. [3, 4] studied simulation modelling methods for peripheral milling; Zhang Guang Peng et al. [5] studied surface morphology simulation for multi-tooth milling cutters and developed simulation software. All of the above, however, are idealised simulations of surface shape, and few studies simulate the surface shape under spindle tilt. Taking face milling cutters as the object, this article investigates simulation modelling methods for surface topography under high-speed milling and describes the corresponding simulation algorithm. It also derives how the microscopic surface geometry depends on the milling parameters, the cutter geometry and the amount of eccentricity, and draws conclusions of significance for actual production.
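The general z-buffer idea behind such topography simulations (rotate and translate the cutting-edge points with transformation matrices, then keep the lowest height that passes over each cell of a meshed workpiece) can be sketched as follows in Python. This is an illustrative simplification with assumed parameter values, one cutting point per insert and a small tilt angle; it is not the authors' model.

import numpy as np

# Assumed cutter and cutting parameters (illustrative values only).
R, n_teeth = 40.0, 4                         # insert radius [mm], number of inserts
fz, rpm = 0.2, 3000.0                        # feed per tooth [mm], spindle speed [rev/min]
tilt = np.radians(0.05)                      # small spindle tilt ("partial pendulum")
runout = np.array([0.0, 5e-3, 2e-3, 7e-3])   # axial runout of each insert [mm]

feed_rate = fz * n_teeth * rpm / 60.0        # table feed [mm/s]
dx = 0.05
x = np.arange(0.0, 20.0, dx)                 # workpiece mesh along the feed direction
y = np.arange(-10.0, 10.0, dx)
Z = np.full((y.size, x.size), np.inf)        # z-buffer: lowest height left in each cell

# Rotation matrix tilting the spindle axis about y.
tilt_rot = np.array([[np.cos(tilt), 0.0, np.sin(tilt)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(tilt), 0.0, np.cos(tilt)]])

dt = 1e-4
for t in np.arange(0.0, x[-1] / feed_rate, dt):
    phi = 2.0 * np.pi * rpm / 60.0 * t       # spindle rotation angle
    for k in range(n_teeth):
        a = phi + 2.0 * np.pi * k / n_teeth
        tip = np.array([R * np.cos(a), R * np.sin(a), runout[k]])  # insert tip in the cutter frame
        p = tilt_rot @ tip                    # rotate into the tilted spindle frame
        p[0] += feed_rate * t                 # translate with the feed motion
        i = int(round((p[1] - y[0]) / dx))
        j = int(round((p[0] - x[0]) / dx))
        if 0 <= i < y.size and 0 <= j < x.size:
            Z[i, j] = min(Z[i, j], p[2])

valid = np.isfinite(Z)
Ra = np.mean(np.abs(Z[valid] - Z[valid].mean()))   # crude roughness indicator
print("cells touched:", int(valid.sum()), " Ra ~", round(float(Ra), 5), "mm")

A finer time step and a full description of the cutting-edge geometry would be needed for the kind of comparison with machined surfaces reported in the article.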
Style APA, Harvard, Vancouver, ISO itp.
43

Charpentier, Arthur. "Big Data, GAFA et assurance". Annales des Mines - Réalités industrielles Février 2020, nr 1 (2020): 53. http://dx.doi.org/10.3917/rindu1.201.0053.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
44

Bensamoun, Alexandra, i Célia Zolynski. "Cloud computing et big data". Réseaux 189, nr 1 (2015): 103. http://dx.doi.org/10.3917/res.189.0103.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
45

Salam, Ghizlane. "Big Data et Intelligence Collaborative". Economia N.A, nr 29 (kwiecień 2017): 29–33. http://dx.doi.org/10.12816/0046972.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
46

de Corbière, François. "ERP, RSN et Big data". Systèmes d'information & management Volume 27, nr 3 (24.01.2023): 3–4. http://dx.doi.org/10.3917/sim.223.0003.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
47

Sédar Paluku, Mutsotsya, Mpia Héritier Nsenge, Munga Manassé Nzanzu, Baelani Nephtali Inipaivudu i Kasolene Moïse Katembo. "Prédiction des notes finales des étudiants en fin du premier cycle : Utilisation de Data Mining éducatif". Revue Internationale Multidisciplinaire Etincelle 25, nr 2 (6.09.2024): 1–17. http://dx.doi.org/10.61532/rime252114.

Pełny tekst źródła
Streszczenie:
The analysis of academic performance rests on a number of factors that measure intellectual abilities, and this measure of performance relies on a number of variables that can be used to predict academic performance and final grades. Statistical tools serve above all to measure the influence of variables on one another; to that end, the study of correlations is the best compromise for separating true from false predictors among the variables. Likewise, through descriptive statistics, the distribution of the data matters for measuring the different dispersions of the data during exploration; these careful analyses lead the researcher to target the elements needed to predict the outputs. To predict students' academic performance, it is important to have a picture of their academic history, so as to identify the factors that may be decisive in predicting the final result at the end of the cycle. Machine learning models have played a key role in performance prediction. After exploring the data, we were able to compare the performance of different algorithms. We compared six algorithms, two of which reached an acceptable performance of 80% for the coefficient of determination: linear regression and SVM. The best-performing algorithm was linear regression, which is consistent with the problem addressed being a regression problem; regression algorithms are generally suited to quantitative analysis where the target variable takes continuous values. Once trained, the models were deployed on a web platform using the Python framework Flask, so as to make the results available to the general public. To illustrate our model, we used 6 variables likely to identify the best predictors. Once trained, the model can be saved and reused for later deployment.
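A minimal sketch of the pipeline described (train a linear regression on historical records, check the coefficient of determination, save the model and expose it through a small Flask endpoint) is given below in Python. The file name, the six feature names and the target column are hypothetical placeholders, not the authors' variables.

import joblib
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical predictors and target; the study's actual variables are not reproduced here.
FEATURES = ["grade_year1", "grade_year2", "attendance", "age", "math_score", "french_score"]

def train(csv_path="students.csv"):
    df = pd.read_csv(csv_path)                       # hypothetical dataset
    X, y = df[FEATURES], df["final_grade"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    print("R2 on held-out data:", r2_score(y_te, model.predict(X_te)))
    joblib.dump(model, "final_grade_model.joblib")   # saved for later deployment

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Load the saved model and return a prediction for the posted feature values.
    model = joblib.load("final_grade_model.joblib")
    row = [[request.json[f] for f in FEATURES]]
    return jsonify({"predicted_final_grade": float(model.predict(row)[0])})

if __name__ == "__main__":
    train()
    app.run()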
Style APA, Harvard, Vancouver, ISO itp.
48

Courtney, M. "Puzzling out big data". Engineering & Technology 7, nr 12 (1.12.2012): 56–60. http://dx.doi.org/10.1049/et.2012.1215.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
49

Le Ven, Éric. "Dam et big data : entre fantasme et réalité". Archimag N°318, nr 8 (1.10.2018): 20–22. http://dx.doi.org/10.3917/arma.318.0020.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
50

Nwanosike, E. M., W. Sunter, M. A. Ansari, H. Merchant, B. R. Conway i S. S. Hasan. "469 Use of data-driven technology to optimise DOACs in morbidly obese patients". International Journal of Pharmacy Practice 31, Supplement_1 (1.04.2023): i20—i21. http://dx.doi.org/10.1093/ijpp/riad021.023.

Pełny tekst źródła
Streszczenie:
Abstract Introduction Despite the advantages of Direct Oral Anticoagulants (DOACs) over older classes of anticoagulants, clinical experience is limited in special populations; data from landmark trials on safety and efficacy are relatively scarce (compared to warfarin). This makes it challenging for clinicians to prescribe the right DOAC at the right dose for such patients (e.g., morbidly obese patients) (1). Insights derived from analysing real-world data have proven to be a vital source of clinical evidence backing the recommendation of medications (2). Therefore, data-driven technologies like machine learning can harness big data in electronic health records (EHRs) to optimise DOAC therapy and improve clinical outcomes. Aim The study aims to accurately predict clinical outcomes in morbidly obese patients, and identify the key variables in the model for optimising the safety and efficacy of Direct Oral Anticoagulant (DOAC) doses. Methods An observational, retrospective cohort study was carried out in partnership with an NHS Trust. Based on eligibility criteria, the dataset of morbidly obese patients on DOACs was extracted from EHRs, pre-processed and analysed considering the access granted. After partitioning the entire dataset into a 70:30 split, the training dataset (70%) was run through selected machine learning (ML) classifiers (Random Forest, decision trees, K-nearest neighbours (KNN), bootstrap aggregation algorithm, gradient boosting classifier, support vector machines, and logistic regression) to rank variables and derive predictions, which were evaluated against the test dataset (30%). A multivariate regression model was used to adjust for confounders and to explore the relationships between DOAC regimens and clinical outcomes. Results We identified 4,275 morbidly obese patients out of n=97,413 records overall. The bootstrap aggregation, decision trees, and random forest classifiers (from the ML algorithms tested) achieved superior prediction accuracies (98.6%, 97.9%, and 98.3%, respectively) for the individual DOAC doses, with excellent values for precision, recall, and F1 scores (performance metrics). The most important characteristics in the model for predicting mortality and stroke were age, treatment days, and length of stay. Among DOACs, apixaban (84%) was the most frequently prescribed DOAC, followed by rivaroxaban (15%). Apixaban 2.5 mg (twice daily) received the highest ranking for relevance to mortality, and it raised the mortality risk (OR 1.430, 95% CI: 1.181, 1.732, p=0.001). There were mixed results for apixaban 5 mg (twice daily), the most widely prescribed dose of apixaban (54%), with significantly reduced risk of mortality (OR 0.751, 95% CI: 0.632, 0.905, p=0.003), but significantly increased risk of stroke events (OR 32.457, 95% CI: 17.083-61.664, p=0.001). Conclusion Given the large sample size, a strength of our study, data-driven technologies were successfully employed in predicting the safety and efficacy of DOACs in morbidly obese patients using the real-world dataset; the key variables in the model for optimising clinical outcomes were identified. However, the limitations of our study, such as reporting errors, selection bias, and confounding bias, were not ruled out. Therefore, confirmatory studies (e.g., external validation with prospective data) are needed to confirm findings and provide a sound basis for universal deployment in clinical settings. References 1. Chen A, Stecker E, Warden BA. Direct Oral Anticoagulant Use: A Practical Guide to Common Clinical Challenges. J Am Heart Assoc. 2020 Jul 7;9(13):e017559. 2. Hu C, Liu Z, Jiang Y, Shi O, Zhang X, Xu K, et al. Early prediction of mortality risk among patients with severe COVID-19, using machine learning. Int J Epidemiol. 2021;49(6):1918–29.
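The two analysis steps described, classifier comparison on a 70:30 split and adjusted odds ratios from a multivariable logistic regression, can be sketched as follows in Python. The data are synthetic and the variable names (exposure indicator, confounders, outcome) are hypothetical placeholders; this is not the study's code.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# (1) Classifier comparison on a 70:30 split (synthetic stand-in for the EHR features).
X, y = make_classification(n_samples=4000, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
for name, clf in [("bagging", BaggingClassifier(random_state=1)),
                  ("decision tree", DecisionTreeClassifier(random_state=1)),
                  ("random forest", RandomForestClassifier(random_state=1))]:
    clf.fit(X_tr, y_tr)
    print(name, "test accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 3))

# (2) Adjusted odds ratio for an outcome, controlling for confounders (hypothetical columns).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "apixaban_2_5mg_bd": rng.integers(0, 2, 4000),   # hypothetical exposure indicator
    "age": rng.normal(70, 10, 4000),
    "length_of_stay": rng.normal(8, 3, 4000),
    "mortality": rng.integers(0, 2, 4000),
})
X_lr = sm.add_constant(df[["apixaban_2_5mg_bd", "age", "length_of_stay"]])
res = sm.Logit(df["mortality"], X_lr).fit(disp=0)
odds_ratios = np.exp(res.params)
ci = np.exp(res.conf_int())
print(pd.concat([odds_ratios.rename("OR"), ci.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))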
Style APA, Harvard, Vancouver, ISO itp.
