Journal articles on the topic "Twitter corpora"

Below are the top 50 journal articles for research on the topic "Twitter corpora".

1

Aditya Prakash. "Twitter Sentimental Analysis". International Journal for Modern Trends in Science and Technology 6, no. 12 (December 18, 2020): 355–59. http://dx.doi.org/10.46501/ijmtst061266.

Abstract:
Twitter sentiment analysis (TSA) provides methods to survey public emotions about products or the events associated with them. Categorizing opinions in tweets offers great scope for study and may yield interesting results and insights on public opinion and social behavior towards different events, services, products, geopolitical issues, situations and scenarios that concern mankind at large. These attributes are expressed explicitly through emoticons, exclamations, sentiment words and so on. In this paper, we introduce a word embedding (Word2Vec) technique obtained by unsupervised learning on large Twitter corpora; this process uses co-occurrence statistical characteristics between words in tweets and their hidden contextual semantic interrelations.
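The co-occurrence statistics this abstract refers to can be made concrete with a minimal pure-Python sketch; the sample tweets, whitespace tokenizer, and window size below are illustrative assumptions, not the paper's actual setup:

```python
from collections import Counter

def cooccurrence_counts(tweets, window=2):
    """Count how often word pairs co-occur within a fixed window.

    These raw pair counts are the kind of co-occurrence statistics
    that embedding methods such as Word2Vec exploit implicitly.
    """
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i, word in enumerate(tokens):
            # look only ahead, so each unordered pair is counted once
            for context in tokens[i + 1:i + 1 + window]:
                counts[tuple(sorted((word, context)))] += 1
    return counts

counts = cooccurrence_counts(["great phone love it", "love this great camera"])
```

Here `counts[("great", "love")]` is 2, since the pair falls within the window in both toy tweets; real systems would normalize such counts (e.g. via PMI) before or instead of feeding raw co-occurrences to an embedding model.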
2

Danielewicz-Betz, A., H. Kaneda, M. Mozgovoy and M. Purgina. "Creating English and Japanese Twitter Corpora for Emotion Analysis". International Journal of Knowledge Engineering-IACSIT 1, no. 2 (2015): 120–24. http://dx.doi.org/10.7763/ijke.2015.v1.20.

3

Hu, Yuheng, Kartik Talamadupula and Subbarao Kambhampati. "Dude, srsly?: The Surprisingly Formal Nature of Twitter's Language". Proceedings of the International AAAI Conference on Web and Social Media 7, no. 1 (August 3, 2021): 244–53. http://dx.doi.org/10.1609/icwsm.v7i1.14443.

Abstract:
Twitter has become the de facto information sharing and communication platform. Given the factors that influence language on Twitter - size limitation as well as communication and content-sharing mechanisms - there is a continuing debate about the position of Twitter's language in the spectrum of language on various established mediums. These include SMS and chat on the one hand (size limitations) and email (communication), blogs and newspapers (content sharing) on the other. To provide a way of determining this, we propose a computational framework that offers insights into the linguistic style of all these mediums. Our framework consists of two parts. The first part builds upon a set of linguistic features to quantify the language of a given medium. The second part introduces a flexible factorization framework, soclin, which conducts a psycholinguistic analysis of a given medium with the help of an external cognitive and affective knowledge base. Applying this analytical framework to various corpora from several major mediums, we gather statistics in order to compare the linguistics of Twitter with these other mediums via a quantitative comparative study. We present several key insights: (1) Twitter's language is surprisingly more conservative, and less informal than SMS and online chat; (2) Twitter users appear to be developing linguistically unique styles; (3) Twitter's usage of temporal references is similar to SMS and chat; and (4) Twitter has less variation of affect than other more formal mediums. The language of Twitter can thus be seen as a projection of a more formal register into a size-restricted space.
4

Sifianou, Maria. "Conceptualizing politeness in Greek: Evidence from Twitter corpora". Journal of Pragmatics 86 (September 2015): 25–30. http://dx.doi.org/10.1016/j.pragma.2015.05.019.

5

Ling, Wang, Luís Marujo, Chris Dyer, Alan W. Black and Isabel Trancoso. "Mining Parallel Corpora from Sina Weibo and Twitter". Computational Linguistics 42, no. 2 (June 2016): 307–43. http://dx.doi.org/10.1162/coli_a_00249.

Abstract:
Microblogs such as Twitter, Facebook, and Sina Weibo (China's equivalent of Twitter) are a remarkable linguistic resource. In contrast to content from edited genres such as newswire, microblogs contain discussions of virtually every topic by numerous individuals in different languages and dialects and in different styles. In this work, we show that some microblog users post “self-translated” messages targeting audiences who speak different languages, either by writing the same message in multiple languages or by retweeting translations of their original posts in a second language. We introduce a method for finding and extracting this naturally occurring parallel data. Identifying the parallel content requires solving an alignment problem, and we give an optimally efficient dynamic programming algorithm for this. Using our method, we extract nearly 3M Chinese–English parallel segments from Sina Weibo using a targeted crawl of Weibo users who post in multiple languages. Additionally, from a random sample of Twitter, we obtain substantial amounts of parallel data in multiple language pairs. Evaluation is performed by assessing the accuracy of our extraction approach relative to a manual annotation as well as in terms of utility as training data for a Chinese–English machine translation system. Relative to traditional parallel data resources, the automatically extracted parallel data yield substantial translation quality improvements in translating microblog text and modest improvements in translating edited news content.
6

Liu, Xuan, Guohui Zhou, Minghui Kong, Zhengtong Yin, Xiaolu Li, Lirong Yin and Wenfeng Zheng. "Developing Multi-Labelled Corpus of Twitter Short Texts: A Semi-Automatic Method". Systems 11, no. 8 (August 1, 2023): 390. http://dx.doi.org/10.3390/systems11080390.

Abstract:
Facing the fast-increasing number of electronic documents in the digital media age, the need to extract textual features of online texts for better communication is growing. Sentiment classification may be the key method for capturing the emotions of online communication, and developing corpora annotated with emotions is the first step towards sentiment classification. However, labour-intensive and costly manual annotation has resulted in a lack of corpora for emotional words. Furthermore, single-label semantic corpora can hardly meet the requirements of modern analysis of users' complicated emotions, and tagging emotional words with multiple labels is even more difficult than usual. Improved methods of automatic emotion tagging with multiple emotion labels for constructing new semantic corpora are urgently needed. Taking Twitter short texts as the case, this study proposes a new semi-automatic method to annotate Internet short texts with multiple labels and form a multi-labelled corpus for further algorithm training. Each sentence is tagged with both its emotional tendency and polarity, and each tweet, which generally contains several sentences, is tagged with its first two major emotional tendencies. The semi-automatic multi-labelled annotation proceeds by selecting the base corpus and emotional tags, preprocessing the data, annotating automatically through word matching and weight calculation, and correcting manually where multiple emotional tendencies are found. Experiments on the published Sentiment140 Twitter corpus demonstrate the effectiveness of the proposed approach and show consistency between the results of semi-automatic and manual annotation. By applying this method, the study summarises the annotation specification and constructs a multi-labelled emotion corpus of 6500 tweets for further algorithm training.
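The word-matching-and-voting idea behind such semi-automatic multi-label annotation can be sketched as follows; the miniature lexicon, the uniform per-word weights, and the example tweet are hypothetical stand-ins, not the study's actual resources:

```python
from collections import Counter

# Hypothetical miniature emotion lexicon; the study's real lexicon
# and weight calculation are far richer than this sketch.
LEXICON = {
    "love": "joy", "great": "joy", "happy": "joy",
    "hate": "anger", "angry": "anger",
    "sad": "sadness", "cry": "sadness",
}

def tag_tweet(tweet, lexicon=LEXICON, top_n=2):
    """Return the top-N emotional tendencies of a tweet by word matching.

    Each matched lexicon word casts one equal-weight vote for its
    emotion; the two most frequent emotions become the tweet's labels.
    """
    votes = Counter(
        emotion
        for emotion in (lexicon.get(token) for token in tweet.lower().split())
        if emotion is not None
    )
    return [emotion for emotion, _ in votes.most_common(top_n)]

labels = tag_tweet("I love this great phone but the battery makes me angry")
```

For the sample tweet this yields `["joy", "anger"]` (two joy votes, one anger vote); in the paper's pipeline, tweets where multiple tendencies surface like this are the ones flagged for manual correction.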
7

Bokányi, Eszter, Dániel Kondor and Gábor Vattay. "Scaling in words on Twitter". Royal Society Open Science 6, no. 10 (October 2019): 190027. http://dx.doi.org/10.1098/rsos.190027.

Abstract:
Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the metropolitan and micropolitan statistical areas of the United States. We observe a slightly superlinear urban scaling with the city population for the total volume of the tweets and words created in a city. We then find that a certain core vocabulary follows the scaling relationship of that of the bulk text, but most words are sensitive to city size, exhibiting a super- or a sublinear urban scaling. For both regimes, we can offer a plausible explanation based on the meaning of the words. We also show that the parameters for Zipf’s Law and Heaps' Law differ on Twitter from that of other texts, and that the exponent of Zipf’s Law changes with city size.
8

Yadav, Madan Lal, Anurag Dugar and Kuldeep Baishya. "Decoding Customer Opinion for Products or Brands Using Social Media Analytics". International Journal of Intelligent Information Technologies 18, no. 2 (April 2022): 1–20. http://dx.doi.org/10.4018/ijiit.296271.

Abstract:
This study uses aspect-level sentiment analysis with a lexicon-based approach to analyse online reviews of an Indian brand called Patanjali, which sells many FMCG products under its name. The reviews were collected from the microblogging site Twitter, from which a total of 4,961 tweets about ten Patanjali-branded products were extracted and analysed. Along with the aspect-level sentiment analysis, an opinion-tagged corpus has also been developed. Machine learning approaches, namely Support Vector Machine (SVM), Decision Tree, and Naïve Bayes, have also been used to perform the sentiment analysis and to identify the classifiers best suited to such product-review analysis. The authors first identify customer preferences and/or opinions about a product or brand by analysing online customer reviews as expressed on the social media platform Twitter, using aspect-level sentiment analysis. The authors also address the scarcity of opinion-tagged data, required to train supervised classifiers for sentiment analysis, by developing a tagged corpus.
9

González Fernández, Adela. "Big data y corpus lingüísticos para el estudio de la densidad léxica". SKOPOS. Revista Internacional de Traducción e Interpretación 9 (July 15, 2019): 107–22. http://dx.doi.org/10.21071/skopos.v9i0.12144.

Abstract:
The union of computer science and linguistics is increasingly common in research on language and languages. Corpus linguistics in particular is benefiting from this pairing, thanks to advances in managing and processing corpora. In this work we go a step further and propose corpus-linguistic work through big data in general and Twitter in particular. Thanks to the creation of a software tool designed specifically for linguistic work on big data, we obtain an immense amount of textual information that serves to compile corpora with which we study lexical diversity in the language of four Spanish writers. To this end, we extract the tweets published on their Twitter accounts and process them with our tool to obtain the desired information. We also attempt to demonstrate the improvement that this new methodology brings to this type of study.
10

Abdalla, Mohamed, Magnus Sahlgren and Graeme Hirst. "Enriching Word Embeddings with a Regressor Instead of Labeled Corpora". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6188–95. http://dx.doi.org/10.1609/aaai.v33i01.33016188.

Abstract:
We propose a novel method for enriching word embeddings without the need for a labeled corpus. Instead, we show that relying on a regressor, trained with a small lexicon to predict pseudo-labels, significantly improves performance over current techniques that rely on human-derived sentence-level labels for an entire corpus. Our approach enables enrichment for corpora that have no labels (such as Wikipedia). Exploring the utility of this general approach in both sentiment and non-sentiment-focused tasks, we show how enriching embeddings, for both Twitter- and Wikipedia-based embeddings, provides notable improvements in performance for binary sentiment classification, SemEval tasks, an embedding analogy task, and document classification. Importantly, our approach is notably better and more generalizable than other state-of-the-art approaches for enriching both labeled and unlabeled corpora.
11

Adams, Julia Bahia, and Carlos Augusto Jardim Chiarelli. "A guide on extracting and tidying tweets with R". Cadernos de Linguística 2, no. 4 (December 3, 2021): e410. http://dx.doi.org/10.25189/2675-4916.2021.v2.n4.id410.

Abstract:
Social media platforms represent a deep resource for academic research and a wide range of untapped possibilities for linguists (D'ARCY; YOUNG, 2012). This rapidly developing field presents various ethical issues and unique challenges regarding methods to retrieve and analyze data. This tutorial provides a straightforward guide to harvesting and tidying Twitter data, focused mainly on the Tweets' text, by using the R programming language (R CORE TEAM, 2020) via Twitter's APIs. The R code was developed in Adams (2020), based on the rtweet package (KEARNEY, 2018), and successfully resulted in a script for corpora compilation. In this tutorial, we discuss limitations, problems, and solutions in our framework for conducting ethical research on this social networking site. Our ethical concerns go beyond what we "agree to" in terms of use and privacy policies, that is, we argue that their content does not contemplate all the concerns researchers need to attend to. Additionally, our aim is to show that using Twitter as a data source does not require advanced computational skills.
12

Vo, Duc-Thuan, Vo Thuan Hai and Cheol-Young Ock. "Exploiting Language Models to Classify Events from Twitter". Computational Intelligence and Neuroscience 2015 (2015): 1–11. http://dx.doi.org/10.1155/2015/401024.

Abstract:
Classifying events on Twitter is challenging because tweet texts contain a large amount of temporal data with a lot of noise and various kinds of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarities with learned language models such as ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), which have been widely studied on large text corpora within computational linguistics. The relationships of term words in tweets are discovered by checking them under each model. We then propose a method to compute the similarity between tweets based on tweet features, including common term words and relationships among their distinguishing term words. This makes it explicit and convenient to apply k-nearest-neighbor techniques for classification. We carefully applied experiments on the Edinburgh Twitter Corpus to show that our method achieves competitive results for classifying events.
13

Alshaabi, Thayer, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth and Peter Sheridan Dodds. "Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter". Science Advances 7, no. 29 (July 2021): eabe6534. http://dx.doi.org/10.1126/sciadv.abe6534.

Abstract:
In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing curation of over 100 billion tweets containing 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into 1-, 2-, and 3-grams across 100+ languages, generating frequencies for words, hashtags, handles, numerals, symbols, and emojis. We make the dataset available through an interactive time series viewer and as downloadable time series and daily distributions. Although Storywrangler leverages Twitter data, our method of tracking dynamic changes in n-grams can be extended to any temporally evolving corpus. Illustrating the instrument’s potential, we present example use cases including social amplification, the sociotechnical dynamics of famous individuals, box office success, and social unrest.
14

Pota, Marco, Mirko Ventura, Rosario Catelli and Massimo Esposito. "An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian". Sensors 21, no. 1 (December 28, 2020): 133. http://dx.doi.org/10.3390/s21010133.

Abstract:
Over the last decade industrial and academic communities have increased their focus on sentiment analysis techniques, especially applied to tweets. State-of-the-art results have been recently achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle the Twitter jargon. This work aims to introduce a different approach for Twitter sentiment analysis based on two steps. Firstly, the tweet jargon, including emojis and emoticons, is transformed into plain text, exploiting procedures that are language-independent or easily applicable to different languages. Secondly, the resulting tweets are classified using the language model BERT, but pre-trained on plain text, instead of tweets, for two reasons: (1) pre-trained models on plain text are easily available in many languages, avoiding resource- and time-consuming model training directly on tweets from scratch; (2) available plain text corpora are larger than tweet-only ones, therefore allowing better performance. A case study describing the application of the approach to Italian is presented, with a comparison with other Italian existing solutions. The results obtained show the effectiveness of the approach and indicate that, thanks to its general basis from a methodological perspective, it can also be promising for other languages.
15

Bansal, Neetika, Vishal Goyal and Simpel Rani. "Experimenting Language Identification for Sentiment Analysis of English Punjabi Code Mixed Social Media Text". International Journal of E-Adoption 12, no. 1 (January 2020): 52–62. http://dx.doi.org/10.4018/ijea.2020010105.

Abstract:
People do not always use Unicode; rather, they mix multiple languages. Processing code-mixed data becomes challenging due to its linguistic complexities, and noisy text further complicates language identification. The dataset used in this article contains Facebook and Twitter messages collected through the Facebook Graph API and the Twitter API. The annotated English-Punjabi code-mixed dataset has been trained using a pipeline of a Dictionary Vectorizer and an n-gram approach with some additional features. Furthermore, the classifiers Logistic Regression, Decision Tree and Gaussian Naïve Bayes are used to perform language identification at the word level. The results show that Logistic Regression performs best, with an accuracy of 86.63% and an F1 measure of 0.88. The success of machine learning approaches depends on the quality of labeled corpora.
16

Rajasekar, Devaraj, and Lourdusamy Robert. "Unsupervised Word Embedding with Ensemble Deep Learning for Twitter Rumor Identification". Revue d'Intelligence Artificielle 36, no. 5 (December 23, 2022): 769–76. http://dx.doi.org/10.18280/ria.360515.

Abstract:
In social networks, rumor identification is a major problem. The structural data in a topic can be used to derive useful attributes for rumor identification. Most standard rumor identification methods concentrate on local structural attributes, ignoring the global structural attributes that exist between the source tweet and its responses. To tackle this issue, a Source-Replies relation Graph (SR-graph) has been built to develop an Ensemble Graph Convolutional neural Net (EGCN) with a Nodes Proportion Allocation Mechanism (NPAM) that identifies rumors. However, the word vectors were trained with a standard word-embedding model, which does not increase accuracy on large Twitter databases. To solve this problem, an unsupervised word-embedding method is needed for large Twitter corpora. As a result, the Twitter word-embedded EGCN (T-EGCN) model is proposed in this article, which uses unsupervised learning-based word embedding to find rumors in huge Twitter databases. Initially, the latent contextual semantic correlation and co-occurrence statistical attributes among words in tweets are extracted. Then, to create a rumor attribute vector of tweets, these word embeddings are concatenated with the GloVe model's word attribute vectors, Twitter-specific attributes, and n-gram attributes. Further, the EGCN is trained using this attribute vector to identify rumors in a huge Twitter database. Finally, the test results show that the T-EGCN achieves 87.56% accuracy, whereas the RNN, GCN, PGNN, EGCN, and BiLSTM-CNN attain 65.38%, 68.41%, 75.04%, 81.87%, and 86.12%, respectively, for rumor identification.
17

Alsaedi, Nasser, Pete Burnap and Omer Rana. "Automatic Summarization of Real World Events Using Twitter". Proceedings of the International AAAI Conference on Web and Social Media 10, no. 1 (August 4, 2021): 511–14. http://dx.doi.org/10.1609/icwsm.v10i1.14766.

Abstract:
Microblogging sites, such as Twitter, have become increasingly popular in recent years for reporting details of real world events via the Web. Smartphone apps enable people to communicate with a global audience to express their opinion and commentate on ongoing situations - often while geographically proximal to the event. Due to the heterogeneity and scale of the data and the fact that some messages are more salient than others for the purposes of understanding any risk to human safety and managing any disruption caused by events, automatic summarization of event-related microblogs is a non-trivial and important problem. In this paper we tackle the task of automatic summarization of Twitter posts, and present three methods that produce summaries by selecting the most representative posts from real-world tweet-event clusters. To evaluate our approaches, we compare them to the state-of-the-art summarization systems and human generated summaries. Our results show that our proposed methods outperform all the other summarization systems for English and non-English corpora.
18

Neves, Natane Isabel de Souza, Lia Rodrigues Lessa De Lima and Washington Sales Do Monte. "OS DISCURSOS DA PANDEMIA: UMA ANÁLISE LEXICOGRÁFICA NO TWITTER". REVISTA FOCO 16, no. 8 (August 3, 2023): e2756. http://dx.doi.org/10.54751/revistafoco.v16n8-037.

Abstract:
During the Covid-19 pandemic, it was possible to observe the population's many individual expressions on social networks regarding the social, economic and political impacts of the disease. In this study we carried out a lexicographic analysis of posts on the social network Twitter using the free software IRAMUTEQ. Using mass-capture methods, we analysed about 16,000 tweets, divided into two distinct time periods, dealing with topics related to the Covid-19 pandemic. The methodology comprised three steps: 1) automated extraction of Twitter data in pre- and post-vaccination periods; 2) preparation and refinement of the textual corpus; 3) data analysis with IRAMUTEQ. The results show that on Twitter, Covid was a topic of debate, venting and discussion across many subjects and social circles. We believe the techniques explored in this work can inspire their use in organizations for analysing content from social networks, since this study showed that the IRAMUTEQ tool can be used successfully to analyse textual corpora composed of tweets.
19

De Ávila Othero, Gabriel, Sonia Maria Lazzarini Cyrino, Leonardo Teixeira Madrid Alves, Rodrigo Barreto Viana Rosito and Giulia Rotava Schabbach. "Objeto nulo e pronome pleno na retomada anafórica em PB: uma análise em corpora escritos com características de fala". Revista da Anpoll 1, no. 45 (August 22, 2018): 68–89. http://dx.doi.org/10.18309/anp.v1i45.1113.

Abstract:
In this paper, we present the results of our investigation into anaphoric direct-object resumption in Brazilian Portuguese, analysing two written corpora that display features of orality and attempt to approximate speech: one corpus of children's comic books and another of Twitter posts. We investigate these corpora with two goals. The first is to compare two hypotheses about the distribution of null objects and pronouns (one based on the animacy and specificity features of the anaphoric element's antecedent, the other on the antecedent's semantic gender feature). The second is to determine the distribution of null objects, pronominal clitics and full third-person pronouns in the anaphoric resumption of direct objects, in order to verify whether this distribution is closer to what the literature reports for spoken or for written language.
20

Zsombok, Gyula. "Official new terms in the age of social media: the story of hashtag on French Twitter". Journal of French Language Studies 32, no. 2 (July 2022): 145–64. http://dx.doi.org/10.1017/s0959269522000072.

Abstract:
As other chapters in this special issue demonstrate, social media offers widely available data for sociolinguistic analysis. Twitter is an ideal resource for implementing variationist approaches regarding regional differences, features specific to gender, and metrics of social media influence. At the same time, official intervention in language use, while somewhat studied in other corpora, is less explored on Twitter. French shows a long tradition of purist and prescriptive ideologies, embodied by the Académie française in France and the Office québécois de la langue française in Québec. The injection of recommended terminology aimed at eradicating foreign influence often has a questionable success rate, especially in such an informal setting as Twitter. This article thus investigates lexical variation, in particular the implantation between 2010 and 2016 of the official new French translations mot-dièse and mot-clic, which are meant to replace the English word hashtag. Results corroborate previous findings on the lacklustre implantation of the prescribed terms, while also revealing that users in Québec are more inclined to adopt them. Furthermore, diffusion online reflects face-to-face patterns, that is, cascading spread from large urban areas to smaller cities.
21

Aastha, Joshi, and Gaud Nirmal. "Hate speech detection on twitter using machine learning techniques". i-manager's Journal on Information Technology 11, no. 2 (2022): 1. http://dx.doi.org/10.26634/jit.11.2.18919.

Abstract:
People can now create, post and share content to connect with each other, as social media platforms have grown in popularity. On the other hand, social media has also become a forum for hatred and conflict. The rampant spread of hatred on social media has had a significant impact on society, dividing people into pros and cons on topics that concern the status of a person, place, community or country. Hate speech on social media is difficult to recognize because messages contain paralinguistic signs, jumbled language and poorly written content. The lack of consensus on what constitutes hate speech, and the lack of background information, make it even more difficult to detect. Creating huge marked-up corpora with plenty of relevant context is a difficult task. Even though researchers have found that hate is a problem on all social media platforms, there is no perfect method to detect it accurately. This paper describes the current state and complexity of the field, as well as the main algorithms, methodologies and key characteristics used. It focuses on the important areas that have been explored for hate speech detection and also applies machine learning algorithms to detect it.
22

Bruneau, Pierrick, Etienne Brangbour, Stéphane Marchand-Maillet, Renaud Hostache, Marco Chini, Ramona-Maria Pelich, Patrick Matgen and Thomas Tamisier. "Measuring the Impact of Natural Hazards with Citizen Science: The Case of Flooded Area Estimation Using Twitter". Remote Sensing 13, no. 6 (March 18, 2021): 1153. http://dx.doi.org/10.3390/rs13061153.

Abstract:
Twitter has significant potential as a source of Volunteered Geographic Information (VGI), as its content is updated at high frequency, with high availability thanks to dedicated interfaces. However, the diversity of content types and the low average accuracy of geographic information attached to individual tweets remain obstacles in this context. The contributions in this paper relate to the general goal of extracting actionable information regarding the impact of natural hazards on a specific region from social platforms, such as Twitter. Specifically, our contributions describe the construction of a model classifying whether given spatio-temporal coordinates, materialized by raster cells in a remote sensing context, lie in a flooded area. For training, remotely sensed data are used as the target variable, and the input covariates are built on the sole basis of textual and spatial data extracted from a Twitter corpus. Our contributions enable the use of trained models for arbitrary new Twitter corpora collected for the same region, but at different times, allowing for the construction of a flooded area measurement proxy available at a higher temporal frequency. Experimental validation uses true data that were collected during Hurricane Harvey, which caused significant flooding in the Houston urban area between mid-August and mid-September 2017. Our experimental section compares several spatial information extraction methods, as well as various textual representation and aggregation techniques, which were applied to the collected Twitter data. The best configuration yields a F1 score of 0.425, boosted to 0.834 if restricted to the 10% most confident predictions.
23

Spina, Stefania. "Le tre fasi del discorso politico italiano in Twitter: una storia senza lieto fine?" Italienisch 44, n. 87 (5 settembre 2022): 47–63. http://dx.doi.org/10.24053/ital-2022-0006.

Abstract (sommario):
The relationship between Twitter and Italian politicians started around 2010. Since then, it is possible to trace an evolution, and to identify three distinct phases that have characterised this relationship. This study describes some of the linguistic shifts that characterise the political discourse on Twitter, in its progressive move from the naive attitude to ‘being social’ of the beginnings, to the self-promotional monologue, up to the verbal excesses and forms of language aggression of the last three years. Through the analysis of corpora of tweets written by Luigi di Maio, Giorgia Meloni and Matteo Renzi, the focus is mainly on some of the lexical and discursive features that characterise this verbal aggression. The use of a rhetoric of simplification, through which politicians constantly tend to banalise and reduce reality to its extreme forms, strongly contributes to the diffusion of a highly polarised form of political discourse, as well as to our daily experience with social media.
24

Haustein, Stefanie. "SIG/MET: METRICS 2015: Workshop on Informetric and Scientometric Research". Bulletin of the Association for Information Science and Technology 42, n. 3 (febbraio 2016): 24–27. http://dx.doi.org/10.1002/bul2.2016.1720420308.

Abstract (sommario):
EDITOR'S SUMMARYAt the fifth SIG/MET workshop, held during the 2015 ASIS&T Annual Meeting, the group shared papers, posters and discussions exploring developments in information measurement. The opening session on bibliometric case studies examined interdisciplinarity among consumers of academic research, increased funding for coauthors of previously funded authors and a classification of acknowledgement types. A session on information retrieval in relation to bibliometrics included studies on overcoming the limits of computational linguistics in very large corpora, an interactive context explorer of bibliographic data called Ariadne and comparative approaches to visualizing the structure of a very large dataset. The alternative metrics session covered application of altmetrics for analyzing public policy documents, a novel usage indicator promoting article discovery, the basis for connections among faculty members using Twitter and the heavy use of Twitter among academics. The daylong workshop included awards for best papers, best student papers and two featured presentations on the application and use of metrics.
25

Novaes, Maria Fernanda Alvares Travassos de Avelino. "The Recognition of Brazilian Baiano and Gaucho Regional Dialects on Twitter Using Text Mining". U.Porto Journal of Engineering 6, n. 1 (29 aprile 2020): 42–51. http://dx.doi.org/10.24840/2183-6493_006.001_0005.

Abstract (sommario):
The internet has broken geographical barriers and brought people and cultures closer, independently of their physical location. However, language, idiom, dialects and accents continue to mark individuals' origins. The Brazilian regional dialect is the object of study of this research, which deals with linguistic corpora analyzed from a volume of data extracted from Twitter. This paper presents the results of the mining phase that makes up the first stage of a project to create a technique for recognizing Brazilian Portuguese regional dialects. Analysis and conclusions were made only for the baiano and gaucho dialects, considering the significant size of the samples and the need to reach a diagnosis of the collected data set.
26

Hua, Yiqing, Thomas Ristenpart e Mor Naaman. "Towards Measuring Adversarial Twitter Interactions against Candidates in the US Midterm Elections". Proceedings of the International AAAI Conference on Web and Social Media 14 (26 maggio 2020): 272–82. http://dx.doi.org/10.1609/icwsm.v14i1.7298.

Abstract (sommario):
Adversarial interactions against politicians on social media such as Twitter have a significant impact on society. In particular, they disrupt substantive political discussions online, and may discourage people from seeking public office. In this study, we measure the adversarial interactions against candidates for the US House of Representatives during the run-up to the 2018 US general election. We gather a new dataset consisting of 1.7 million tweets involving candidates, one of the largest corpora focusing on political discourse. We then develop a new technique for detecting tweets with toxic content that are directed at any specific candidate. This technique allows us to more accurately quantify adversarial interactions towards political candidates. Further, we introduce an algorithm to induce candidate-specific adversarial terms to capture more nuanced adversarial interactions that previous techniques may not consider toxic. Finally, we use these techniques to outline the breadth of adversarial interactions seen in the election, including offensive name-calling, threats of violence, posting discrediting information, attacks on identity, and adversarial message repetition.
27

Jahić, Sead, e Jernej Vičič. "Annotated Lexicon for Sentiment Analysis in the Bosnian Language". Slovenščina 2.0: empirične, aplikativne in interdisciplinarne raziskave 11, n. 2 (22 dicembre 2023): 59–83. http://dx.doi.org/10.4312/slo2.0.2023.2.59-83.

Abstract (sommario):
The paper presents the first sentiment-annotated lexicon of the Bosnian language. The annotation process and methodology are presented along with a usability study, which concentrates on language coverage. The composition of the starting base was done by translating the Slovenian annotated lexicon and later manually checking the translations and annotations. The language coverage was observed using two reference corpora. The Bosnian language is still considered a low-resource language. A reference corpus comprised of automatically crawled web pages is available for the Bosnian language, but the authors had a hard time sourcing any corpora with a clear time frame for the text contained therein. A corpus of contemporary texts was constructed by collecting news articles from several Bosnian web portals. Two language coverage methods were used in this experiment. The first used a frequency list of all words extracted from two reference Bosnian language corpora, and the second ignored the frequencies as the main factor in counting. The computed coverage using the first presented method for the first corpus was 19.24%, while the second corpus yielded 28.05%. The second method yielded 2.34% coverage for the first corpus and 6.98% for the second corpus. The results of the study present a language coverage that is comparable to the state of the art in the field. The usability of the lexicon was already proven in a Twitter-based comparison.
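The two counting methods described above correspond to token (frequency-weighted) and type coverage of the lexicon over a reference corpus. A minimal sketch of both computations, with illustrative names and toy data (not the authors' actual corpora):

```python
from collections import Counter

def coverage(lexicon, corpus_tokens):
    """Return (token_coverage, type_coverage) of a sentiment lexicon
    over a tokenized reference corpus: the first weights each word by
    its corpus frequency, the second ignores frequencies."""
    freqs = Counter(corpus_tokens)
    lex = set(lexicon)
    total_tokens = sum(freqs.values())
    covered_tokens = sum(n for w, n in freqs.items() if w in lex)
    token_cov = covered_tokens / total_tokens
    type_cov = sum(1 for w in freqs if w in lex) / len(freqs)
    return token_cov, type_cov
```

Frequency-weighted coverage is typically the higher of the two, since annotated lexicons tend to contain common words, which matches the gap between the two methods' percentages reported in the abstract.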
28

Nini, Andrea, Carlo Corradini, Diansheng Guo e Jack Grieve. "The application of growth curve modeling for the analysis of diachronic corpora". Language Dynamics and Change 7, n. 1 (2017): 102–25. http://dx.doi.org/10.1163/22105832-00701001.

Abstract (sommario):
This paper introduces growth curve modeling for the analysis of language change in corpus linguistics. In addition to describing growth curve modeling, which is a regression-based method for studying the dynamics of a set of variables measured over time, we demonstrate the technique through an analysis of the relative frequencies of words that are increasing or decreasing over time in a multi-billion word diachronic corpus of Twitter. This analysis finds that increasing words tend to follow a trajectory similar to the s-curve of language change, whereas decreasing words tend to follow a decelerated trajectory, thereby showing how growth curve modeling can be used to uncover and describe underlying patterns of language change in diachronic corpora.
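As a rough illustration of fitting an s-curve of language change, a logistic trajectory can be linearized with a logit transform and fitted by least squares. This is a simplified stand-in for the paper's growth curve models, not their actual method; the data below are synthetic:

```python
import numpy as np

def fit_s_curve(t, y, eps=1e-6):
    """Fit a logistic s-curve y = 1 / (1 + exp(-(a + b*t))) by linear
    regression on logit-transformed relative frequencies.
    b > 0 suggests an increasing word, b < 0 a decreasing one."""
    y = np.clip(np.asarray(y, float), eps, 1 - eps)
    logit = np.log(y / (1 - y))
    b, a = np.polyfit(np.asarray(t, float), logit, 1)  # slope, intercept
    return a, b

# Synthetic rising word share over 20 time steps.
t = np.arange(20)
share = 1 / (1 + np.exp(-(-5 + 0.6 * t)))
a, b = fit_s_curve(t, share)
```

On this noiseless series the fit recovers the generating parameters (a ≈ -5, b ≈ 0.6); real diachronic frequencies would of course be noisy and may need the richer polynomial growth curves the paper describes.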
29

Martinez, Mario Antonio. "What do people write about COVID-19 and teaching, publicly? Insulators and threats to newly habituated and institutionalized practices for instruction". PLOS ONE 17, n. 11 (10 novembre 2022): e0276511. http://dx.doi.org/10.1371/journal.pone.0276511.

Abstract (sommario):
COVID-19 brought about major changes in teaching across the world. This study examined some of those changes through tweets that contained threats and insulators to the habitualization of newer teaching practices. The investigator harvested tweets to determine sentiment differences between teaching and schools and teaching and online. Topic modeling explored the topics in two separate corpora. Omnibus Yuen’s robust bootstrapped t-tests tested for sentiment differences between the two corpora based on emotions such as fear, anger, disgust, etc. Qualitative responses voiced ideas of insulation and threats to teaching modalities institutionalized during the pandemic. The investigator found that ‘teaching and school’ was associated with higher anger, distrust, and negative emotions than the ‘teaching and online’ corpus sets. Qualitative responses indicated support for online instruction, albeit complicated by topic modeling concerns with the modality. Some Twitter responses criticized government actions as restrictive. The investigator concluded that insulators and threats towards the habitualization and institutionalization of newer teaching modalities during COVID-19 are rich and sometimes at odds with each other, showing tension at times.
30

Miličević, Maja, e Nikola Ljubešić. "Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets". Slovenščina 2.0: empirical, applied and interdisciplinary research 4, n. 2 (27 settembre 2016): 156–88. http://dx.doi.org/10.4312/slo2.0.2016.2.156-188.

Abstract (sommario):
In this paper we discuss the parallel manual normalisation of samples extracted from Croatian and Serbian Twitter corpora. We describe the datasets, outline the unified guidelines provided to annotators, and present a series of analyses of standard-to-non-standard transformations found in the Twitter data. The results show that closed part-of-speech classes are transformed more frequently than the open classes, that the most frequently transformed lemmas are auxiliary and modal verbs, interjections, particles and pronouns, that character deletions are more frequent than insertions and replacements, and that more transformations occur at the word end than in other positions. Croatian and Serbian are found to share many, but not all transformation patterns; while some of the discrepancies can be ascribed to the structural differences between the two languages, others appear to be better explained by looking at extralinguistic factors. The produced datasets and their initial analyses can be used for studying the properties of non-standard language, as well as for developing language technologies for non-standard data.
31

Monderin, Camille, e Mildred B. Go. "Emerging Netspeak Word Choices in Social Media on Filipino Pop Culture". International Journal of Linguistics, Literature and Translation 4, n. 6 (30 giugno 2021): 49–61. http://dx.doi.org/10.32996/ijllt.2021.4.6.7.

Abstract (sommario):
The emergence of the Internet gave birth to a new form of language that is unique to the users of the network. Netspeak is the language of the Internet and has adapted features of both speaking and writing; however, it also has unique characteristics of its own. This study aimed to find the emerging lexical patterns of Netspeak as used by Filipinos, the extent of use of Netspeak in the three most popular social media platforms (Facebook, Instagram and Twitter) as well as in various domains of pop culture (entertainment, politics, fashion and sports), and its implications for language studies in the Philippines. Both qualitative and quantitative methods were used in this study. The corpora of the study were gathered from two months’ worth of social media activities, focusing on the comments on the Facebook, Instagram and Twitter accounts of selected public figures. The findings showed that the emerging lexical patterns of Netspeak were abbreviations and homophones, and that social media platforms and pop culture domains affect the use of Netspeak features. The platform and domain with the highest extent of Netspeak usage were Twitter and politics, respectively. The results of this study will help in understanding the language used on the Internet, as well as raise awareness that this kind of language exists.
32

Gourisaria, Mahendra Kumar, Satish Chandra, Himansu Das, Sudhansu Shekhar Patra, Manoj Sahni, Ernesto Leon-Castro, Vijander Singh e Sandeep Kumar. "Semantic Analysis and Topic Modelling of Web-Scrapped COVID-19 Tweet Corpora through Data Mining Methodologies". Healthcare 10, n. 5 (10 maggio 2022): 881. http://dx.doi.org/10.3390/healthcare10050881.

Abstract (sommario):
The evolution of the coronavirus (COVID-19) disease took a toll on the social, healthcare, economic, and psychological prosperity of human beings. In the past couple of months, many organizations, individuals, and governments have adopted Twitter to convey their sentiments on COVID-19, the lockdown, the pandemic, and hashtags. This paper aims to analyze the psychological reactions and discourse of Twitter users related to COVID-19. In this experiment, Latent Dirichlet Allocation (LDA) has been used for topic modeling. In addition, a Bidirectional Long Short-Term Memory (BiLSTM) model and various classification techniques such as random forest, support vector machine, logistic regression, naive Bayes, decision tree, logistic regression with stochastic gradient descent optimizer, and majority voting classifier have been adapted for analyzing the polarity of sentiment. The effectiveness of the aforesaid approaches along with LDA modeling has been tested, validated, and compared with several benchmark datasets and on a newly generated dataset for analysis. To achieve better results, a dual dataset approach has been incorporated to determine the frequency of positive and negative tweets and word clouds, which helps to identify the most effective model for analyzing the corpora. The experimental result shows that the BiLSTM approach outperforms the other approaches with an accuracy of 96.7%.
33

Schafer, Valérie, Gérôme Truc, Romain Badouard, Lucien Castex e Francesca Musiani. "Paris and Nice terrorist attacks: Exploring Twitter and web archives". Media, War & Conflict 12, n. 2 (3 aprile 2019): 153–70. http://dx.doi.org/10.1177/1750635219839382.

Abstract (sommario):
The attacks suffered by France in January and November 2015, and then in the course of 2016, especially the Nice attack, provoked intense online activity both during the events and in the months that followed. The digital traces left by this reactivity and reactions to events gave rise, from the very first days and even hours after the attacks, to a ‘real-time’ institutional archiving by the National Library of France ( Bibliothèque nationale de France, BnF) and the National Audio-visual Institute ( Institut national de l’audiovisuel, Ina). The results amount to millions of archived tweets and URLs. This article seeks to highlight some of the most significant issues raised by these relatively unprecedented corpora, from collection to exploitation, from online stream of data to its mediation and re-composition. Indeed, web archiving practices in times of emergency and crises are significant, almost emblematic, loci to explore the human and technical agencies, and the complex temporalities, of ‘born-digital’ heritage. The cases examined here emphasize the way these ‘emergency collections’ challenge the perimeters and the very nature of web archives as part of our digital and societal heritage, and the guiding visions of its governance and mission. Finally, the present analysis underlines the need for a careful contextualization of the design process – both of original web pages or tweets and of their archived images – and of the tools deployed to collect, retrieve and analyse them.
34

Tahir, Bilal, e Muhammad Amir Mehmood. "Anbar: Collection and analysis of a large scale Urdu language Twitter corpus". Journal of Intelligent & Fuzzy Systems 42, n. 5 (31 marzo 2022): 4789–800. http://dx.doi.org/10.3233/jifs-219266.

Abstract (sommario):
The confluence of high performance computing algorithms and large scale high-quality data has led to the availability of cutting edge tools in computational linguistics. However, these state-of-the-art tools are available only for the major languages of the world. The preparation of large scale high-quality corpora for low-resource language such as Urdu is a challenging task as it requires huge computational and human resources. In this paper, we build and analyze a large scale Urdu language Twitter corpus Anbar. For this purpose, we collect 106.9 million Urdu tweets posted by 1.69 million users during one year (September 2018-August 2019). Our corpus consists of tweets with a rich vocabulary of 3.8 million unique tokens along with 58K hashtags and 62K URLs. Moreover, it contains 75.9 million (71.0%) retweets and 847K geotagged tweets. Furthermore, we examine Anbar using a variety of metrics like temporal frequency of tweets, vocabulary size, geo-location, user characteristics, and entities distribution. To the best of our knowledge, this is the largest repository of Urdu language tweets for the NLP research community which can be used for Natural Language Understanding (NLU), social analytics, and fake news detection.
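Corpus statistics of the kind reported for Anbar (unique hashtags and URLs, retweet counts) can be sketched with simplified regular expressions; the patterns and sample tweets are illustrative, not the authors' extraction code:

```python
import re

def tweet_stats(tweets):
    """Count unique hashtags, unique URLs, and retweets in a list of
    raw tweet strings (regexes deliberately simplified)."""
    hashtags, urls, retweets = set(), set(), 0
    for t in tweets:
        hashtags.update(re.findall(r"#\w+", t))
        urls.update(re.findall(r"https?://\S+", t))
        if t.startswith("RT @"):
            retweets += 1
    return {"hashtags": len(hashtags), "urls": len(urls), "retweets": retweets}
```

At the scale described (106.9 million tweets), such counting would be streamed rather than held in memory, but the per-tweet logic is the same.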
35

Al-Kadi, Abdu M. Talib, e Rashad Ali Ahmed. "EVOLUTION OF ENGLISH IN THE INTERNET AGE". Indonesian Journal of Applied Linguistics 7, n. 3 (31 gennaio 2018): 727. http://dx.doi.org/10.17509/ijal.v7i3.9823.

Abstract (sommario):
Although the Internet came into existence in the second half of the twentieth century, its influence on language began to escalate from 1990 onwards. It has drastically changed the way people communicate and use English, both in writing and in speaking. Consequently, the world has become increasingly interconnected through synchronous and asynchronous communicational scripts, such as SMS, online chat, Yahoo messengers, emails, blogs, and wikis, which have become retrievable as accessible corpora for analysis. These corpora can yield anecdotal evidence of historical language change. The arrival of Web 2.0 tools and applications, such as Facebook, Twitter, Skype, WhatsApp, and Viber, can likewise reveal changes that English has recently undergone. The Internet has given rise to what is arguably a new variety of English that differs from standard varieties. This article provides an account of the development of English from dialects spoken by a small number of people in the British Isles to an international and global language. It emphasizes the language shifts that have taken place more recently since the widespread use of the Internet. The pervasiveness of the Internet has led to new changes in form and usage described as Internet English.
36

Semenova, Marina. "Ecolinguistic sustainability of Spanglish and Chinglish communities". E3S Web of Conferences 371 (2023): 01051. http://dx.doi.org/10.1051/e3sconf/202337101051.

Abstract (sommario):
Spanglish and Chinglish, the two most popular ‘Glishes’, have developed to a phase at which their sustainability can seriously be considered, implying an extensive and stable use of the various linguistic mechanisms built into their language structure. The paper aims at analyzing such translingual mechanisms and the corresponding code-switching functions from the perspective of ecolinguistics. The research is based on two corpora of Spanglish and Chinglish code-switches, which are analyzed using the methods of linguistic, componential, distributional and statistical analysis. Both epistemologically and deductively, the paper demonstrates the two most vivid categories of code-switches in both Spanglish and Chinglish: stylistic play on words based mainly on semantic borrowing, and translingual allusions and idioms used in Twitter messages over the period between 2017 and 2022. The study shows a certain correlation across the ‘Glishes’ as well as a degree of fluctuation in the corpus-based statistical data. This might imply a different sustainability weight of Spanglish and Chinglish in globalized communities. The paper concludes that the ecolinguistic monitoring of translingual communities is vitally important as we witness an influence redistribution process on the global scale.
37

Mahfouz, Iman Mohamed. "Word Shortening Strategies: Egyptian vs. Non-Egyptian English Tweets". English Language and Literature Studies 8, n. 3 (22 agosto 2018): 27. http://dx.doi.org/10.5539/ells.v8n3p27.

Abstract (sommario):
The language of Computer-mediated Communication (CMC) is known to deviate from standard language in many ways dictated by the characteristics of the medium in order to achieve brevity, speed as well as innovation. Together with the intrinsic features of CMC in general, the character limitation imposed by the popular social media platform, Twitter has triggered the use of a number of linguistic devices including shortening strategies in addition to unconventional spelling and grammar. Using two parallel corpora of English tweets written by Egyptians and non-Egyptians on a similar hashtag, the study attempts to compare the shortening strategies used in both datasets. A taxonomy for orthographic and morphological shortening strategies was adapted from Thurlow and Brown (2003) and Denby (2010) with particular focus on message length, punctuation, clipping, abbreviations, contractions, alphanumeric homophones and accent stylization. Given the scarcity of linguistic studies conducted on Egyptian tweets despite the vast amount of data they offer, the study compares the findings about tweets written by Egyptians in English as a foreign language to previous studies. The findings suggest that Egyptians tend to omit punctuation more frequently, whereas non-Egyptians favor abbreviations, contractions and clipped forms. The results also indicate that Twitter may be shifting towards longer messages while at the same time increasingly employing more shortening strategies. The study also reveals that character limitation is not the only factor shaping language use on Twitter since not all linguistic choices are governed by brevity of communication.
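A hedged sketch of how shortening strategies like those in the adapted taxonomy might be counted with regular expressions; the pattern lists below are toy examples, not the study's actual inventories:

```python
import re

# Illustrative patterns for three of the shortening strategies
# (alphanumeric homophones, contractions, abbreviations).
PATTERNS = {
    "alphanumeric_homophone": re.compile(r"\b(?:gr8|b4|2day|4u|w8)\b", re.I),
    "contraction": re.compile(r"\b\w+'\w+\b"),
    "abbreviation": re.compile(r"\b(?:lol|omg|idk|btw|tbh)\b", re.I),
}

def count_strategies(tweet):
    """Count occurrences of each shortening strategy in one tweet."""
    return {name: len(p.findall(tweet)) for name, p in PATTERNS.items()}
```

Aggregating these counts over the two parallel corpora would yield the per-dataset frequency comparisons the study reports for each strategy.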
38

Kutlu, Ethan, e Ruth Kircher. "A Corpus-Assisted Discourse Study of Attitudes toward Spanish as a Heritage Language in Florida". Languages 6, n. 1 (28 febbraio 2021): 38. http://dx.doi.org/10.3390/languages6010038.

Abstract (sommario):
Spanish speakers constitute the largest heritage language community in the US. The state of Florida is unusual in that, on one hand, it has one of the highest foreign-born resident rates in the country, most of whom originate from Latin America—but on the other hand, Florida has a comparatively low Spanish language vitality. In this exploratory study of attitudes toward Spanish as a heritage language in Florida, we analyzed two corpora (one English: 5,405,947 words, and one Spanish: 525,425 words) consisting of recent Twitter data. We examined frequencies, collocations, concordance lines, and larger text segments. The results indicate predominantly negative attitudes toward Spanish on the status dimension, but predominantly positive attitudes on the solidarity dimension. Despite the latter, transmission and use of Spanish were found to be affected by pressure to assimilate, and fear of negative societal repercussions. We also found Spanish to be used less frequently than English to tweet about attitudes; instead, Spanish was frequently used to attract Twitter users’ attention to specific links in the language. We discuss the implications of our findings (should they generalize) for the future of Spanish in Florida, and we provide directions for future research.
39

Jain, Gauri, Manisha Sharma e Basant Agarwal. "Spam Detection on Social Media Using Semantic Convolutional Neural Network". International Journal of Knowledge Discovery in Bioinformatics 8, n. 1 (gennaio 2018): 12–26. http://dx.doi.org/10.4018/ijkdb.2018010102.

Abstract (sommario):
This article describes how spam detection in social media text is becoming increasingly important because of the exponential increase in spam volume over the network. The task is challenging, especially for texts limited to a small number of characters. Effective spam detection requires a larger number of efficient features to be learned. In the current article, the use of a deep learning technology known as a convolutional neural network (CNN) is proposed for spam detection, with a semantic layer added on top of it. The resultant model is known as a semantic convolutional neural network (SCNN). The semantic layer is composed by training random word vectors with the help of Word2vec to obtain semantically enriched word embeddings. WordNet and ConceptNet are used to find a word similar to a given word in case it is missing from the Word2vec vocabulary. The architecture is evaluated on two corpora: the SMS Spam dataset (UCI repository) and a Twitter dataset (tweets scraped from public live tweets). The authors' approach outperforms state-of-the-art results, with 98.65% accuracy on the SMS spam dataset and 94.40% accuracy on the Twitter dataset.
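The out-of-vocabulary fallback described in the abstract (substituting a similar word's vector when a token is missing from the Word2vec vocabulary) can be sketched as follows. The embedding table and synonym map here are toy stand-ins for pretrained Word2vec vectors and WordNet/ConceptNet lookups:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table standing in for trained Word2vec vectors.
emb = {w: rng.normal(size=8) for w in ["free", "win", "prize", "call"]}

# Toy similar-word map standing in for WordNet/ConceptNet queries.
synonyms = {"gratis": "free", "reward": "prize"}

def lookup(word, dim=8):
    """Return the word's vector; fall back to a lexically similar
    in-vocabulary word, else a zero vector."""
    if word in emb:
        return emb[word]
    if word in synonyms and synonyms[word] in emb:
        return emb[synonyms[word]]
    return np.zeros(dim)
```

The resulting per-token vectors would then be stacked into the input matrix fed to the convolutional layers.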
40

Střelec, Karel. "LAZAR, Jan: Á PROPOS DES PRATI QUES SCRIPTURALES DANS L´ESPACE VIRTUEL: ENTRE FACEBOOK ET TWITTER. Ostrava: Ostravská univerzita 2017. 257 s. ISBN 978-80-7464-811-3." Journal of Linguistics/Jazykovedný casopis 68, n. 3 (1 dicembre 2017): 500–502. http://dx.doi.org/10.2478/jazcas-2018-0005.

Abstract (sommario):
Abstract The review starts by stating that modern communication technologies do, without doubt, influence several domains of human behaviour, including the domain of language. The reviewed title focuses on the peculiarities of one of these new ways of communication, namely, communication on two prominent social networks. The review gives an overview of the book's structure and comments on its scope. The core of the book's research is located in the second part of the monograph, where changes in orthography and some other features, such as the use of emoticons, are observed in excerpts from Facebook and Twitter corpora, each counting 18,000 tokens. The review concludes that the typology of orthographical differences proposed by the book is accurate and reflects the real situation.
41

Sboev, Aleksandr, Ivan Moloshnikov, Dmitry Gudovskikh e Roman Rybka. "A Gender Identification of Russian Text Author on Base of Multigenre Data-Driven Approach using Machine Learning Models". 2018 International Conference on Multidisciplinary Research 2018 (31 dicembre 2018): 41–52. http://dx.doi.org/10.26803/myres.2018.04.

Abstract (sommario):
In this work, data-driven approaches to identifying the gender of the author of a Russian text are investigated, with the purpose of clarifying to what extent machine learning models trained on texts of a certain genre can give accurate results on texts of another genre. The set of data corpora includes: one collected via a crowdsourcing platform, essays of Russian students (RusPersonality), the Gender Imitation corpus, and the corpora used at the Forum for Information Retrieval Evaluation 2017 (FIRE), containing texts from Facebook, Twitter and Reviews. We present an analysis of numerical experiments based on different features (morphological data, vectors of character n-gram frequencies, LIWC and others) of the input texts, along with various machine learning models (neural networks, gradient boosting methods, CNN, LSTM, SVM, Logistic Regression, Random Forest). The results of these experiments are compared with the results of the FIRE competition to evaluate the effects of multi-genre training. The presented results, obtained on a wide set of data-driven models, establish the accuracy level for the task of identifying the gender of the author of a Russian text in the multi-genre case. As shown, the average loss in F1 caused by training on a genre other than the one used for testing is about 11.7%.
42

Jamatia, Anupam, Amitava Das e Björn Gambäck. "Deep Learning-Based Language Identification in English-Hindi-Bengali Code-Mixed Social Media Corpora". Journal of Intelligent Systems 28, n. 3 (26 luglio 2019): 399–408. http://dx.doi.org/10.1515/jisys-2017-0440.

Abstract (sommario):
Abstract This article addresses language identification at the word level in Indian social media corpora taken from Facebook, Twitter and WhatsApp posts that exhibit code-mixing between English-Hindi, English-Bengali, as well as a blend of both language pairs. Code-mixing is a fusion of multiple languages previously mainly associated with spoken language, but which social media users also deploy when communicating in ways that tend to be rather casual. The coarse nature of code-mixed social media text makes language identification challenging. Here, the performance of deep learning on this task is compared to feature-based learning, with two Recurrent Neural Network techniques, Long Short-Term Memory (LSTM) and bidirectional LSTM, being contrasted to a Conditional Random Fields (CRF) classifier. The results show the deep learners outscoring the CRF, with the bidirectional LSTM demonstrating the best language identification performance.
43

Frey, William R., Desmond U. Patton, Michael B. Gaskell e Kyle A. McGregor. "Artificial Intelligence and Inclusion: Formerly Gang-Involved Youth as Domain Experts for Analyzing Unstructured Twitter Data". Social Science Computer Review 38, n. 1 (18 luglio 2018): 42–56. http://dx.doi.org/10.1177/0894439318788314.

Abstract (sommario):
Mining social media data for studying the human condition has created new and unique challenges. When analyzing social media data from marginalized communities, algorithms lack the ability to accurately interpret off-line context, which may lead to dangerous assumptions about and implications for marginalized communities. To combat this challenge, we hired formerly gang-involved young people as domain experts for contextualizing social media data in order to create inclusive, community-informed algorithms. Utilizing data from the Gang Intervention and Computer Science Project—a comprehensive analysis of Twitter data from gang-involved youth in Chicago—we describe the process of involving formerly gang-involved young people in developing a new part-of-speech tagger and content classifier for a prototype natural language processing system that detects aggression and loss in Twitter data. We argue that involving young people as domain experts leads to more robust understandings of context, including localized language, culture, and events. These insights could change how data scientists approach the development of corpora and algorithms that affect people in marginalized communities and who to involve in that process. We offer a contextually driven interdisciplinary approach between social work and data science that integrates domain insights into the training of qualitative annotators and the production of algorithms for positive social impact.
44

Yoon, Sunmoo, Robert Lucero, Mary S. Mittelman, José A. Luchsinger and Suzanne Bakken. "Mining Twitter to Inform the Design of Online Interventions for Hispanic Alzheimer’s Disease and Related Dementias Caregivers". Hispanic Health Care International 18, no. 3 (24 October 2019): 138–43. http://dx.doi.org/10.1177/1540415319882777.

Abstract:
Background/Objective: Hispanics are about 1.5 times as likely as non-Hispanic Whites to experience Alzheimer’s disease and related dementias (AD/ADRD). Eight percent of AD/ADRD caregivers are Hispanics. The purpose of this article is to provide a methodological case study of using data mining methods and the Twitter platform to inform online self-management and social support intervention design and evaluation for Hispanic AD/ADRD caregivers. It will enable other researchers to replicate the methods for their phenomena of interest. Method: We extracted an analytic corpus of 317,658 English and Spanish tweets, applied content mining (topic models) and network structure analysis (macro-, meso-, and micro-levels) methods, and created visualizations of results. Results: The topic models showed differences in content between English and Spanish tweet corpora and between years analyzed. Our methods detected significant structural changes between years including increases in network size and subgroups, decrease in proportion of isolates, and increase in proportion of triads of the balanced communication type. Discussion/Conclusion: Each analysis revealed key lessons that informed the design and/or evaluation of online self-management and social support interventions for Hispanic AD/ADRD caregivers. These lessons are relevant to others wishing to use Twitter to characterize a particular phenomenon or as an intervention platform.
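The macro-level structural measures mentioned here (network size, number of subgroups, proportion of isolates) can be sketched over a simple edge list. The node and edge names below are invented for illustration; the study computed these measures over large tweet-interaction networks:

```python
def network_summary(nodes, edges):
    """Macro-level summary of a social network: size, isolates, components."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    isolates = [n for n in nodes if not adj[n]]
    # count connected components (subgroups) with a depth-first search
    seen, components = set(), 0
    for n in nodes:
        if n not in seen:
            components += 1
            stack = [n]
            while stack:
                cur = stack.pop()
                if cur not in seen:
                    seen.add(cur)
                    stack.extend(adj[cur] - seen)
    return {
        "size": len(nodes),
        "isolate_share": len(isolates) / len(nodes) if nodes else 0.0,
        "components": components,
    }

summary = network_summary(
    nodes=["a", "b", "c", "d", "e"],
    edges=[("a", "b"), ("b", "c")],
)
```

Comparing such summaries across years is one way to detect the structural changes the abstract describes, e.g. a falling `isolate_share`.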
45

Fadhli, Imen, Lobna Hlaoua and Mohamed Nazih Omri. "Sentiment Analysis CSAM Model to Discover Pertinent Conversations in Twitter Microblogs". International Journal of Computer Network and Information Security 14, no. 5 (8 October 2022): 28–46. http://dx.doi.org/10.5815/ijcnis.2022.05.03.

Abstract:
In recent years, the most heavily used sources of information, such as Facebook, Instagram, LinkedIn and Twitter, have come to be considered major sources of misinformation. The presence of false information on these social networks has a very negative impact on the opinions and thinking of Internet users. To address this problem of misinformation, several techniques have been used, the most popular of which is sentiment analysis. This technique, which consists of exploring opinions in text corpora, has become an essential topic in this field. In this article, we propose a new approach, called the Conversational Sentiment Analysis Model (CSAM), which, given text written on a subject through messages exchanged between different users (a conversation), finds the passages describing feelings, emotions, opinions and attitudes. This approach is based on: (i) conditional probability, to analyse the sentiment of the individual conversation items in the Twitter microblog, which are characterized by their small size and the presence of emoticons and emojis; (ii) the aggregation of conversation items using uncertainty theory, to evaluate the overall sentiment of the conversation. We conducted a series of experiments on the standard SemEval-2019 datasets using three different standard packages: the TextBlob sentiment analysis library, the Flair framework, and VADER, a dictionary-based sentiment reasoner. We evaluated our model on two datasets, SemEval-2019 and ScenarioSA; the analysis of the results obtained at the end of this experimental study confirms the feasibility of our model as well as its performance in terms of precision, recall and F-measure.
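The two steps of scoring conversation items and then aggregating them into one conversation-level sentiment can be illustrated as below. The tiny lexicon and the plain averaging are invented for the sketch; the CSAM model itself uses conditional probability and uncertainty theory, and the study's scores come from TextBlob, Flair, and VADER:

```python
# Tiny illustrative polarity lexicon (invented for this sketch)
LEXICON = {"love": 1.0, "great": 0.8, "happy": 0.6,
           "bad": -0.7, "hate": -1.0, "sad": -0.6}

def item_sentiment(text):
    """Average lexicon polarity of the words in one conversation item."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def conversation_sentiment(items):
    """Aggregate item scores into one label for the whole conversation."""
    scores = [item_sentiment(t) for t in items]
    mean = sum(scores) / len(scores)
    return "positive" if mean > 0 else "negative" if mean < 0 else "neutral"

label = conversation_sentiment([
    "i love this phone",
    "the battery is great",
    "shipping was bad",
])
```

Short, emoji-heavy tweets are exactly where such bag-of-words scoring is weakest, which motivates the probabilistic treatment the article proposes.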
46

Baviera, Tomás. "Influence in the political Twitter sphere: Authority and retransmission in the 2015 and 2016 Spanish General Elections". European Journal of Communication 33, no. 3 (18 March 2018): 321–37. http://dx.doi.org/10.1177/0267323118763910.

Abstract:
Candidates, parties, media and citizens have the same ability to post tweets. For this reason, mapping the dynamics of interaction among users is essential to evaluate the processes of influence in an electoral campaign. However, characterising these aspects requires methodologies that consider the interconnections generated by users globally. The discipline of social network analysis provides the concepts of centrality and modularity, both very suitable for the context of network communication. This article analyses the political conversation on Twitter during the 2015 and 2016 General Elections in Spain, in which four candidates with significant popularity in the electorate participated. Two corpora of 8.9 million and 9.7 million tweets were collected from each campaign, respectively, to analyse the networks of mentions and retweets. The network of mentions appears more blurred than that of retweets, allowing us to better estimate users’ partisan preference. The graphs of the network of retweets show a strong internal activity within clusters, and the proximity between them reflects the ideological axis of each party.
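A minimal sketch of one centrality measure relevant to retweet networks: in-degree, i.e. how often a user's tweets are retweeted by others. The user names are invented; the study applies full social network analysis (centrality and modularity over millions of tweets), not just this measure:

```python
from collections import Counter

def indegree_centrality(retweets):
    """In-degree per user from (retweeter, original_author) pairs.

    In a retweet network, a high in-degree signals retransmission influence:
    many users amplify that account's messages.
    """
    return Counter(author for _, author in retweets)

retweets = [
    ("u1", "candidate_a"), ("u2", "candidate_a"), ("u3", "candidate_a"),
    ("u1", "candidate_b"), ("u4", "media_x"),
]
ranking = indegree_centrality(retweets).most_common()
```

Cluster detection (modularity) would then group users by who they retweet, which is what makes the retweet network sharper than the mention network for estimating partisanship.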
47

Molinari, Milena de Paula, Estela Demarque and Maria Cristina Parreira da Silva. "A unidade lexical crush e seus usos: inglês, português do Brasil e francês". Acta Scientiarum. Language and Culture 41, no. 2 (16 December 2019): e46971. http://dx.doi.org/10.4025/actascilangcult.v41i2.46971.

Abstract:
With globalization and easy access to the internet, distances have been shortened by the growing use of technology, especially through social networks. This article studies the use of the lexical unit (LU) crush, a word of English origin that, through media such as Facebook, Instagram and Twitter, has become increasingly popular in several languages, including Brazilian Portuguese and French. It is important to note that we work only with Brazilian Portuguese, not European Portuguese. The theoretical framework for this research is Lexicology, more specifically studies on borrowing, foreignisms (Silva, 2006) and neologisms (Alves, 1996). The article also draws on Corpus Linguistics (Berber Sardinha, 2004) for the selection and processing of all the corpora needed for this study. The English, Portuguese and French corpora were compiled with the BootCat tool (Baroni & Bernardini, 2004) in order to analyze the use of the LU crush in these three languages. Our research shows that the English LU can already be considered a loanword in Portuguese that is in the process of adapting syntactically. Thus, the LU is no longer regarded as a foreignism in Portuguese, but rather as a neologism.
48

Kusum, Kusum and Supriya P. Panda. "Sentiment analysis using global vector and long short-term memory". Indonesian Journal of Electrical Engineering and Computer Science 26, no. 1 (1 April 2022): 414. http://dx.doi.org/10.11591/ijeecs.v26.i1.pp414-422.

Abstract:
Tweet sentiment analysis is a deep learning application that is useful for automatically determining public opinion on a given topic. Using the Long Short-Term Memory (LSTM) algorithm, this paper proposes a Twitter analysis technique that divides tweets into two categories (positive and negative). Global Vector (GloVe) word embedding scores for the selected words are used as network input. GloVe converts words into vectors by building a co-occurrence matrix over the corpus. GloVe outperforms its prior model owing to its smaller vector and corpus sizes, and achieves higher accuracy than the word2vec embeddings, Continuous Bag of Words (CBOW) and Skip-gram. Preprocessing variations were tested to assess sentiment classification performance. The test results show that the proposed method classifies successfully, achieving a best accuracy of 95.61%.
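GloVe's input is a word-word co-occurrence matrix built from the corpus. A minimal sketch of constructing those counts over invented toy sentences (without the 1/distance weighting that some implementations apply within the window):

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Symmetric word-word co-occurrence counts within a fixed window,
    the statistic that GloVe factorizes into word vectors."""
    counts = defaultdict(float)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    counts[(w, tokens[j])] += 1.0
    return counts

counts = cooccurrence_counts(["good movie", "good good movie"])
```

Training then fits vectors so that their dot products approximate the logarithms of these counts, which is why GloVe is often described as a global (corpus-level) method, in contrast to word2vec's local context windows.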
49

Zhao, Wanying, Siyi Guo, Kristina Lerman and Yong-Yeol Ahn. "Discovering Collective Narratives Shifts in Online Discussions". Proceedings of the International AAAI Conference on Web and Social Media 18 (28 May 2024): 1804–17. http://dx.doi.org/10.1609/icwsm.v18i1.31427.

Abstract:
Narratives are a foundation of human cognition and decision making. Because narratives play a crucial role in societal discourse and in the spread of misinformation, and because social media use is pervasive, narrative dynamics on social media can have a profound societal impact. Yet systematic, computational understanding of online narratives faces the critical challenges of scale and dynamics: how can we reliably and automatically extract narratives from massive amounts of text? How do narratives emerge, spread, and die? Here, we propose a systematic narrative discovery framework that fills this gap by combining change point detection, semantic role labeling (SRL), and automatic aggregation of narrative fragments into narrative networks. We evaluate our model with synthetic and empirical data: two Twitter corpora about COVID-19 and the 2017 French Election. Results demonstrate that our approach can recover major narrative shifts that correspond to major events.
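The change point detection step can be illustrated with a minimal mean-shift criterion over a daily count series. The data and the scoring rule below are invented for illustration, not the framework's actual detector:

```python
def best_change_point(series):
    """Single change point found by maximizing the squared mean shift
    between the left and right segments (a minimal CUSUM-style criterion)."""
    best_k, best_score = None, -1.0
    for k in range(1, len(series)):
        left, right = series[:k], series[k:]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        # weight by segment sizes so tiny segments are not favored
        score = len(left) * len(right) / len(series) * (mean_l - mean_r) ** 2
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Hypothetical daily counts of a narrative fragment: a jump after day 4
k = best_change_point([2, 3, 2, 3, 10, 11, 9, 10])
```

In the full pipeline, each detected shift would then be paired with SRL-extracted narrative fragments (who did what to whom) from the tweets around that date.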
50

Turner, Jason, Mehmed Kantardzic and Rachel Vickers-Smith. "Infodemiological Examination of Personal and Commercial Tweets About Cannabidiol: Term and Sentiment Analysis". Journal of Medical Internet Research 23, no. 12 (20 December 2021): e27307. http://dx.doi.org/10.2196/27307.

Abstract:
Background In the absence of official clinical trial information, data from social networks can be used by public health and medical researchers to assess public claims about loosely regulated substances such as cannabidiol (CBD). For example, this can be achieved by comparing the medical conditions targeted by those selling CBD against the medical conditions patients commonly treat with CBD. Objective The objective of this study was to provide a framework for public health and medical researchers to use for identifying and analyzing the consumption and marketing of unregulated substances. Specifically, we examined CBD, a substance that is often presented to the public as medication despite incomplete evidence of its efficacy and safety. Methods We collected 567,850 tweets by searching Twitter with the Tweepy Python package using the terms “CBD” and “cannabidiol.” We trained two binary text classifiers to create two corpora of 167,755 personal-use and 143,322 commercial/sales tweets. Using medical, standard, and slang dictionaries, we identified and compared the most frequently occurring medical conditions, symptoms, side effects, body parts, and other substances referenced in both corpora. In addition, to assess popular claims about the efficacy of CBD as a medical treatment circulating on Twitter, we performed sentiment analysis via the VADER (Valence Aware Dictionary and sEntiment Reasoner) model on the personal CBD tweets. Results We found references to medically relevant terms that were unique to either the personal or the commercial CBD tweet class, as well as medically relevant terms common to both classes. When we calculated the average sentiment scores for both personal and commercial CBD tweets referencing at least one of 17 medical condition/symptom terms, an overall positive sentiment was observed in both classes. We observed instances of negative sentiment conveyed in personal CBD tweets referencing autism, whereas CBD was also marketed multiple times as a treatment for autism within commercial tweets. Conclusions Our proposed framework provides a tool for public health and medical researchers to analyze the consumption and marketing of unregulated substances on social networks. Our analysis showed that most users of CBD are satisfied with it in regard to the condition it is being advertised for, with the exception of autism.
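Averaging sentiment per medical term across a tweet corpus can be sketched as below. The tweets, scores, and term list are invented for illustration; in the study itself the scores come from VADER and the terms from curated dictionaries:

```python
def mean_sentiment_by_term(tweets, terms):
    """Average sentiment score per term, over the tweets mentioning it.

    `tweets` is a list of (text, sentiment_score) pairs, where each score
    would come from a sentiment model such as VADER.
    """
    sums, counts = {}, {}
    for text, score in tweets:
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                sums[term] = sums.get(term, 0.0) + score
                counts[term] = counts.get(term, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}

means = mean_sentiment_by_term(
    tweets=[
        ("cbd helped my anxiety so much", 0.7),
        ("cbd did nothing for my anxiety", -0.4),
        ("trying cbd for chronic pain", 0.2),
    ],
    terms=["anxiety", "pain"],
)
```

Running this separately on the personal and commercial corpora and comparing the per-term means is what surfaces mismatches like the autism case described above.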
