Academic literature on the topic 'Visual Linguistic Task'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual Linguistic Task.'

Journal articles on the topic "Visual Linguistic Task"

1

Fox, Sonya, and Beryl Exley. "Historical Timelines Analyzing Multimodal Text Design." Social Studies Research and Practice 4, no. 3 (November 1, 2009): 17–27. http://dx.doi.org/10.1108/ssrp-03-2009-b0002.

Abstract:
The recent focus on literacy in Social Studies has been on linguistic design, particularly that related to the grammar of written and spoken text. When students are expected to produce complex hybridized genres such as timelines, a focus on the teaching and learning of linguistic design is necessary but not sufficient to complete the task. Theorizations of new literacies identify five interrelated meaning making designs for text deconstruction and reproduction: linguistic, spatial, visual, gestural, and audio design. Honing in on the complexity of timelines, this paper casts a lens on the linguistic, visual, spatial, and gestural designs of three pairs of primary school aged Social Studies learners. Drawing on a functional metalanguage, we analyze the linguistic, visual, spatial, and gestural designs of their work. We also offer suggestions of their effect, and from there consider the importance of explicit instruction in text design choices for this Social Studies task. We conclude the analysis by suggesting the foci of explicit instruction for future lessons.
2

Suhr, Alane, Mike Lewis, James Yeh, and Yoav Artzi. "Evaluating Visual Reasoning through Grounded Language Understanding." AI Magazine 39, no. 2 (July 1, 2018): 45–52. http://dx.doi.org/10.1609/aimag.v39i2.2796.

Abstract:
Autonomous systems that understand natural language must reason about complex language and visual observations. Key to making progress towards such systems is the availability of benchmark datasets and tasks. We introduce the Cornell Natural Language Visual Reasoning (NLVR) corpus, which targets reasoning skills like counting, comparisons, and set theory. NLVR contains 92,244 examples of natural language statements paired with synthetic images and annotated with boolean values for the simple task of determining whether the sentence is true or false about the image. While it presents a simple task, NLVR has been developed to challenge systems with diverse linguistic phenomena and complex reasoning. Linguistic analysis confirms that NLVR presents diversity and complexity beyond what is provided by contemporary benchmarks. Empirical evaluation of several methods further demonstrates the open challenges NLVR presents.
3

Wang, Huafeng, Mengwei Tu, and Meiqiong Liang. "Effects of Perceptual Learning Styles on Chinese EFL Learners’ Writing Proficiency in the Reading-writing Integrated Continuation Task." International Journal of Linguistics 14, no. 6 (December 4, 2022): 77. http://dx.doi.org/10.5296/ijl.v14i6.20521.

Abstract:
Previous studies have manifested that the reading-writing integrated continuation task has great language learning potential and linguistic alignment facilitated by the continuation task positively affects L2 learners’ written performance. As an individual difference construct, perceptual learning style has been investigated from its impact on EFL learning, while research on how it affects learners’ performance in the continuation task seems deficient. To this end, this study investigated the relationship between Chinese EFL learners’ perceptual learning style and writing proficiency in the reading-writing integrated continuation task. Participants were 46 intermediate learners of L2 English from two intact classes who were required to perform both independent topic writing and the continuation task. The results showed that 1) group and auditory style learners slightly outperformed on phrasal alignment while visual and tactile performed better on clausal alignment; 2) visual, tactile and auditory learners were likely to generate content-rich, well-organized and more accurate written production, but students’ linguistic fluency in topic writing outperformed that in the continuation task; 3) learners who prefer audio input showed inferiority on the continuation writing. These findings confirm that perceptual learning style might be a mediator affecting learners’ linguistic alignment within the continuation task.
4

Lu, Youtao, and James L. Morgan. "Homophone auditory processing in cross-linguistic perspective." Proceedings of the Linguistic Society of America 5, no. 1 (March 23, 2020): 529. http://dx.doi.org/10.3765/plsa.v5i1.4733.

Abstract:
Previous studies reported conflicting results for the effects of homophony on visual word processing across languages. On finding significant differences in homophone density in Japanese, Mandarin Chinese and English, we conducted two experiments to compare native speakers’ competence in homophone auditory processing across these three languages. A lexical decision task showed that the effect of homophony on word processing in Japanese was significantly less detrimental than in Mandarin and English. A word-learning task showed that native Japanese speakers were the fastest in learning novel homophones. These results suggest that language-intrinsic properties influence corresponding language processing abilities of native speakers.
5

Yang, Chih-Chun, Wan-Cyuan Fan, Cheng-Fu Yang, and Yu-Chiang Frank Wang. "Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3036–44. http://dx.doi.org/10.1609/aaai.v36i3.20210.

Abstract:
As a key characteristic in audio-visual speech recognition (AVSR), relating linguistic information observed across visual and audio data has been a challenge, benefiting not only audio/visual speech recognition (ASR/VSR) but also for manipulating data within/across modalities. In this paper, we present a feature disentanglement-based framework for jointly addressing the above tasks. By advancing cross-modal mutual learning strategies, our model is able to convert visual or audio-based linguistic features into modality-agnostic representations. Such derived linguistic representations not only allow one to perform ASR, VSR, and AVSR, but also to manipulate audio and visual data output based on the desirable subject identity and linguistic content information. We perform extensive experiments on different recognition and synthesis tasks to show that our model performs favorably against state-of-the-art approaches on each individual task, while ours is a unified solution that is able to jointly tackle the aforementioned audio-visual learning tasks.
6

Will, Udo, Guido Nottbusch, and Rüdiger Weingarten. "Linguistic units in word typing." Written Language and Literacy 9, no. 1 (July 20, 2006): 153–76. http://dx.doi.org/10.1075/wll.9.1.10wil.

Abstract:
This study reports on two experiments in which German participants had to type words presented to them in various modes. Experiment 1 compares typing following visual and oral word presentation with typing following picture presentation. In the second experiment typing responses following oral and visual word presentation were delayed by an extended preparatory period. Both experiments demonstrate significantly increased inter-keystroke intervals (IKIs) at exclusive syllable (S) boundaries and combined syllable and morpheme (SM) boundaries in comparison to within-syllable (L) boundaries. SM-IKIs are significantly larger than S-IKIs and influenced by word frequencies, indicating lexical dependencies. SM-IKIs were found to be significantly longer for oral than for visual word presentation. This is taken as an indication that additional processes are involved in the accessing of graphemic word forms when words are presented orally. Two effects of the typing delay were identified: a decrease of word initial latencies and the disappearance of size differences between SM-IKIs following visual and oral word presentation. On the other hand, the persistence of augmented SM- and S-IKIs in the delayed typing task indicates that input into the motor system is constituted by sub-word units instead by fully specified words. As SM- and S-IKIs reflect influences of different hierarchical levels of language processing, these findings suggest a processing architecture in which the peripheral motor system essentially connects at several hierarchical levels with central processing units.
7

Champoux-Larsson, Marie-France, Alexandra S. Dylman, Helena Örnkloo, and Francisco Esteves. "Identification of facial expressions of emotion by 4-year-old children from different linguistic environments." International Journal of Bilingualism 23, no. 5 (June 13, 2018): 1208–19. http://dx.doi.org/10.1177/1367006918781069.

Abstract:
The current study investigated the identification of facial expressions of emotion, a socio-emotional task that has not previously been examined in children from different linguistic environments. Eighty-four 4-year-olds growing up in one of three linguistic environments (monolingual, dominant bilingual, balanced bilingual) performed a task where they identified facial expressions (happiness, anger, sadness, fear). Accuracy was analysed with a mixed-design analysis of variance using group (monolinguals, dominant bilinguals and balanced bilinguals) and emotion (happy, angry, sad and scared) as between- and within-group variables, respectively. Our results showed a main effect of emotion, but there was no main effect of group. This suggests that 4-year-olds’ linguistic environment does not affect performance on an identification of facial expressions task. This study was the first to investigate the identification of facial expressions of emotion in children coming from different linguistic environments. As the socio-emotional development of bilinguals is not yet well understood, especially regarding the visual perception of emotions, this study is amongst the first to contribute to this area of research. Our results are therefore of significance as a building block for additional studies that should explore the visual perception of emotions in other types of tasks and populations.
8

Gross, Stephanie, Brigitte Krenn, and Matthias Scheutz. "Multi-modal referring expressions in human-human task descriptions and their implications for human-robot interaction." Interaction Studies 17, no. 2 (December 14, 2016): 180–210. http://dx.doi.org/10.1075/is.17.2.02gro.

Abstract:
Human instructors often refer to objects and actions involved in a task description using both linguistic and non-linguistic means of communication. Hence, for robots to engage in natural human-robot interactions, we need to better understand the various relevant aspects of human multi-modal task descriptions. We analyse reference resolution to objects in a data collection comprising two object manipulation tasks (22 teacher-student interactions in Task 1 and 16 in Task 2) and find that 78.76% of all referring expressions to the objects relevant in Task 1 are verbally underspecified and 88.64% of all referring expressions are verbally underspecified in Task 2. The data strongly suggests that a language processing module for robots must be genuinely multi-modal, allowing for seamless integration of information transmitted in the verbal and the visual channel, whereby tracking the speaker’s eye gaze and gestures as well as object recognition are necessary preconditions.
9

Esaulova, Yulia, Sarah Dolscheid, Sabine Reuters, and Martina Penke. "The Alignment of Agent-First Preferences with Visual Event Representations: Contrasting German and Arabic." Journal of Psycholinguistic Research 50, no. 4 (March 11, 2021): 843–61. http://dx.doi.org/10.1007/s10936-020-09750-3.

Abstract:
How does non-linguistic, visual experience affect language production? A series of experiments addressed this question by examining linguistic and visual preferences for agent positions in transitive action scenarios. In Experiment 1, 30 native German speakers described event scenes where agents were positioned either to the right or to the left of patients. Produced utterances had longer speech onset times for scenes with right- rather than left-positioned agents, suggesting that the visual organization of events can affect sentence production. In Experiment 2 another cohort of 36 native German participants indicated their aesthetic preference for left- or right-positioned agents in mirrored scenes and displayed a preference for scenes with left-positioned agents. In Experiment 3, 37 Arabic native participants performed the same non-verbal task showing the reverse preference. Our findings demonstrate that non-linguistic visual preferences seem to affect sentence production, which in turn may rely on the writing system of a specific language.
10

Carpio, Claudio Antonio, Diana Valeria Barrios, María Guadalupe Montes, Francisco Aguilar, Daniel García-Gallardo, and Virginia Pacheco. "Linguistic Mediation of Perceptual Adjustment in University Students." Revista Argentina de Ciencias del Comportamiento 13, no. 3 (December 23, 2021): 59–69. http://dx.doi.org/10.32348/1852.4206.v13.n3.27985.

Abstract:
Students from different areas of academic training (Psychology vs. Optometry) completed a task in which they had to locate a "lost moving target" in a simulated forest on a computer screen. The effects of three independent variables were assessed: a) the type of trajectory of the moving target (regular and irregular), b) the time elapsed since the loss of visual contact with the moving target (delays of 1, 4 and 6 seconds), and c) administration / non administration of verbal consequences for localization responses. Results indicated that accuracy in localization responses was higher on 1) regular trajectories, 2) shortest delays, 3) verbal consequences condition, and 4) Optometry students. Findings are discussed in terms of the parameters of the task. Contributions of the academic training of the participants are discussed as a linguistic scenario in which differential modes of the contact with the environment’s mediation are learned.

Dissertations / Theses on the topic "Visual Linguistic Task"

1

Valsecchi, Matteo, Olaf Dimigen, Reinhold Kliegl, Werner Sommer, and Massimo Turatto. "Microsaccadic Inhibition and P300 Enhancement in a Visual Oddball Task." Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2011/5717/.

Abstract:
It has recently been demonstrated that the presentation of a rare target in a visual oddball paradigm induces a prolonged inhibition of microsaccades. In the field of electrophysiology, the amplitude of the P300 component in event-related potentials (ERP) has been shown to be sensitive to the stimulus category (target vs. non target) of the eliciting stimulus, its overall probability, and the preceding stimulus sequence. In the present study we further specify the functional underpinnings of the prolonged microsaccadic inhibition in the visual oddball task, showing that the stimulus category, the frequency of a stimulus and the preceding stimulus sequence influence microsaccade rate. Furthermore, by co-recording ERPs and eye-movements, we were able to demonstrate that, despite being largely sensitive to the same experimental manipulation, the amplitude of P300 and the microsaccadic inhibition predict each other very weakly, and thus constitute two independent measures of the brain’s response to rare targets in the visual oddball paradigm.
2

Skorniakova, Oxana G. "Sensitivity to sub-phonemic variation: Evidence from a Visual Analogue Scale (VAS) goodness-rating task." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1290127664.

3

Bibyk, Sarah Alaine. "The Development of Children’s Processing of English Pitch Accents in a Visual Search Task." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1275443086.

4

McGuire, Steven Paul. "Vocabulary Learning Through Cooperatively Structured Art-Based Tasks." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/414473.

Abstract:
Teaching & Learning
Ed.D.
This study is a multi-method exploratory quantitative and qualitative examination of the degree to which students produce, share, and learn vocabulary and cooperative skills as they carry out three types of individually and cooperatively structured art-based tasks regarding carefully selected and sequenced artworks. The artwork was selected from, and the tasks were adapted from Visual Thinking Strategies, an approach for teaching art appreciation and critical thinking skills. There has been little research that reports the degree of vocabulary through the use of images in general, very little research on cooperative learning and language learning, and an extremely limited amount of research on cooperative learning carried out in the field of foreign language learning through the use of artwork in the Japanese context. This study aims to fill these gaps. There were five main purposes of this study. The first purpose was to explore the range of vocabulary elicited through the cooperatively structured art-based tasks regarding the artworks. The second purpose was to measure students’ learning and use of two cooperative skills as they carried out the art-based tasks. The third purpose was to examine the implementation of the art-based tasks adapted for language learning in the Japanese college context investigated in this study. The fourth purpose was to explore the degree to which vocabulary is produced, shared, and learned in the adapted art-based tasks. The fifth and final purpose was a qualitative and quantitative examination of students’ attitudes towards the art tasks and towards working cooperatively in groups. To answer questions based on the purposes listed above, AntWordProfiler was used to analyze students’ production of vocabulary as they wrote their individual comments about the artworks and the RANGE feature of AntWordProfiler was used to analyze the frequency of particular vocabulary within and across groups in the group activities.
The degree of learning was measured through pretests and posttests adapted from the Vocabulary Knowledge Scale. Finally an ANOVA was used to compare the vocabulary learned in the individual and cooperative drawing tasks following a Latin Square design. The qualitative study involved examination of many sources of data, including the worksheets students filled out as they carried out the art-based tasks, the artwork they drew, and audio recordings. Finally, a combined quantitative and qualitative survey at the end of the semester allowed an exploration of students’ opinions regarding art-based tasks, working and learning in groups, and the class as a whole. The results to the 12 research questions showed very little predictability in the specific vocabulary elicited, but did find patterns in the frequency of vocabulary elicited through the artworks, especially in terms of the percentage of vocabulary elicited. Students showed a significant increase in vocabulary knowledge between the pretests and posttests on all tasks, although there was a significant difference in vocabulary learned by students who did the drawing task individually for one artwork over those who drew that artwork in cooperative groups. A frequency analysis of student self-reports of their use of the cooperative skills they were taught and an examination of audio recordings showed they used and processed their use of the skills in ways that cooperative research suggests are beneficial for learning. Finally, the results of the quantitative and qualitative course-final survey showed that students had generally positive attitudes towards both learning vocabulary using artwork and working in groups and that students enjoyed interacting and learning from fellow group members. There were some negative views of the cooperative tasks that need to be addressed in future use of these tasks, primarily making students aware of the reasoning behind the way they were being asked to carry out the tasks.
The findings showed teachers can use artwork with confidence that students will learn vocabulary and that students are generally positive to the cooperatively structured art-based tasks. Future research needs to be carried out with other artwork, in different contexts, with students at different levels of language ability, and with additional art-based tasks.
Temple University--Theses
5

Malapetsa, Christina. "Stroop tasks with visual and auditory stimuli : How different combinations of spoken words, written words, images and natural sounds affect reaction times." Thesis, Stockholms universitet, Institutionen för lingvistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-185057.

Abstract:
The Stroop effect is the delay in reaction times due to interference. Since the original experiments of 1935, it has been used primarily in linguistic context. Language is a complex skill unique to humans, which involves a large part of the cerebral cortex and many subcortical regions. It is perceived primarily in auditory form (spoken) and secondarily in visual form (written), but it is also always perceived in representational form (natural sounds, images, smells, etc). Auditory signals are processed much faster than visual signals, and the language processing centres are closer to the primary auditory cortex than the primary visual cortex, but due to the integration of stimuli and the role of the executive functions, we are able to perceive both simultaneously and coherently. However, auditory signals are still processed faster, and this study focused on establishing how auditory and visual, linguistic and representational stimuli interact with each other and affect reaction times in four Stroop tasks with four archetypal mammals (dog, cat, mouse and pig): a written word against an image, a spoken word against an image, a written word against a natural sound and a spoken word against a natural sound. Four hypotheses were tested: in all tasks reaction times would be faster when the stimuli were congruent (Stroop Hypothesis); reaction times would be faster when both stimuli are auditory than when they are visual (Audiovisual Hypothesis); reaction times would be similar in the tasks where one stimulus is auditory and the other visual (Similarity Hypothesis); finally, reaction times would be slower when stimuli come from two sources than when they come from one source (Attention Hypothesis). Twelve native speakers of Swedish between the ages of 22 and 40 participated. The experiment took place in the EEG lab of the Linguistics Department of Stockholm University. The same researcher (the author) and equipment was used for all participants. 
The results confirmed the Stroop Hypothesis, did not confirm the Audiovisual and Similarity Hypothesis, and the results of the Attention Hypothesis were mixed. The somewhat controversial results were mostly attributed to a false initial assumption, namely that having two different auditory stimuli (one on each ear) was considered one source of stimuli, and possibly the poor quality of some natural sounds. With this additional consideration, the results seemed to be in accord with previous research. Future research could focus on more efficient ways to test the reaction times of Stroop tasks involving auditory and visual stimuli, as well as different populations, especially neurodiverse and bilingual populations.
6

Hamilton, Joshua. "These Walls Can Talk: An Ethnographic Study of the Interior Schoolscape of Three High Schools." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc1062876/.

Abstract:
The schoolhouse is a place in which messages for student consumption are typically found with classroom lectures, text, and activities. As with any social setting, however, the communication is not confined to one space but extends, in this case, to hallways, common spaces, and exterior of the building. One of the most common practices for the delivery of messages to students within the schoolhouse is through visual signage. Visual signage can traverse disciplines encompassing concepts from the fields of communication, semiotics, language, literacy, and even interior design. In an effort to understand the impact these signs have on student populations this dissertation asks the question: How are signs within public high schools produced, consumed, and influential to persons in contact with intended messages that are presented in public school spaces? The study utilizes ethnography to describe the production, consumption, and influence of fixed signs in the interior hallways and common spaces at three public high schools in Texas. At each campus, student volunteers, one from each grade level, provided their individual course schedule to follow their daily route from class to class at their particular high school. Post these observations these students engaged in focus groups to discuss the various signs displayed on their campus. In addition, faculty/staff members from each high school volunteered to participate in a separate faculty/staff focus group to discuss the use of signage in schools and the observations made by both the students and myself during the observations. The data suggest that district directives and social happenings guide the production of messages for each campus. The consumption and influence of these messages though is far more complex as a variety of factors contributed to the student and faculty/staff consumption, or lack thereof, and influence to action. 
As ethnography, this dissertation sheds light onto these complexities revealing that a host of external and internal issues dictate the messages displayed through school signage within the individual schoolhouse.
7

Heuer, Sabine. "AN EVALUATION OF TEST IMAGES FOR MULTIPLE-CHOICE COMPREHENSION ASSESSMENT IN APHASIA." Ohio University / OhioLINK, 2004. http://www.ohiolink.edu/etd/view.cgi?ohiou1090264500.

8

Drummond, Jane Elizabeth. "Children’s transitive reasoning: effects of visual-spatial and linguistic task conditions." Thesis, 1992. http://hdl.handle.net/2429/2959.

Abstract:
This research was designed to explore the nature of reasoning. In general, three categories of theories about reasoning (the inferential rule approach, the mental models approach, and the operational constructive approach) are used to explain reasoning. In this research, a simple transitivity of length task was selected as the experimental vehicle to explore these approaches for their veracity. Each approach was assessed for spatial and linguistic conditions which might influence reasoning about transitive length relations. The length difference under consideration in the reasoning task, the order in which the premise statements about the length differences were presented and the linguistic relational term used to describe the length difference were selected as the experimental variables. Three measures of reasoning about transitive length relations were assessed: judgements, judgements-plus-justifications, and necessity understanding. A between-within factorial, cross-sectional design was employed. The order of the premise statements (optimal/control) was manipulated as the experimental between-subjects factor. The two experimental within-subjects factors, length difference (large/small) and linguistic relational term (“longer”/”shorter”), were fully crossed and counterbalanced. Ninety-six preschool and school-age children, evenly divided by gender and age (5-6 years, 7-8 years, 9-10 years), participated in the study. The developmental character of transitive reasoning in the age range studied was confirmed for two of the three measures of reasoning. More failures of judgement were observed when a large length difference was matched with the linguistic relational term “longer” and when a small length difference was matched with the linguistic relational term “shorter” than when the length differences and relational terms were mismatched. 
The arrangement of the premise figure did not directly influence any measure of transitive reasoning but a large length difference in combination with the control premise figure was found to increase the frequency of transitive judgements-plus-justifications. It is concluded from the analysis of the findings of this research that transitive reasoning about length is likely to result from constructive processes, rather than from application of logical rules. However, it is unclear whether the constructive processes in question are best explained in terms of cognitive operations or in terms of figurative mental models.

Books on the topic "Visual Linguistic Task"

1

Satoko, Hayashi, and Tsuda Center for Japanese Language Teaching., eds. 24 tasks for basic modern Japanese. Tokyo: Tsuda Center for Japanese Language Teaching, 1989.

2

Motohashi, Fujiko. 24 tasks for basic modern Japanese. Tokyo: Japan Times, 1990.

3

Motohashi, Fujiko. 24 Tasks for basic modern Japanese =: Nihongo kiite hanashite. Tōkyō: Japan Times, Ltd., 1989.

4

Motohashi, Fujiko, and Satoko Hayashi. 24 Tasks for Basic Modern Japanese Volume 1. Japan Times Japan, 1992.

5

Maree, Claire. queerqueen. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780190869618.001.0001.

Abstract:
Queerqueen examines the editing and writing of queer excess into Japanese popular culture through mediatization of queerqueen styles. The book illustrates how a diversity of gender identifications, sexual orientations, and discursive styles are packaged together as if to form a homogenous character—the queerqueen. In a range of genres from conversational dialogue books to lifestyle television and animations, queerqueen styles are configured as crossing into popular media via the body of the authentically “queer male,” whose “authentic” speech is produced spontaneously without scripting. Editorial interventions enacted through the collaborative language labor of stenographers and record makers, graphic designers and illustrators, and editorial teams (re)trace the sonic qualities of the queerqueen. Through visual mimesis, contemporaneous citational practices, and the mobilization of nostalgia, queerqueen styles are enregistered as talk that is inherently excessive and in need of containment. Editorial acts of containment such as self-censorship simultaneously expose the sexualized nature of gendered norms of talk in Japanese. It is also here that possible spaces for dissent open up through contestation of the limits to excess. The visual and sonic crossings of gender norms unsettle heteronormative mapping of speech styles onto statically gendered bodies. Strategic use of a variety of linguistic resources such as hyper-masculine forms and hyper-politeness exposes the veneers of technologies that seek to regiment excess. Analysis of the inscription of queerqueen styles reveals metapragmatic stereotypes of gender, sexuality, and desire that are essential to the business of mainstream entertainment.

Book chapters on the topic "Visual Linguistic Task"

1

Koner, Rajat, Hang Li, Marcel Hildebrandt, Deepan Das, Volker Tresp, and Stephan Günnemann. "Graphhopper: Multi-hop Scene Graph Reasoning for Visual Question Answering." In The Semantic Web – ISWC 2021, 111–27. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-88361-4_7.

Abstract:
Visual Question Answering (VQA) is concerned with answering free-form questions about an image. Since it requires a deep semantic and linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires multi-modal reasoning from both computer vision and natural language processing. We propose Graphhopper, a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques. Concretely, our method is based on performing context-driven, sequential reasoning based on the scene entities and their semantic and spatial relationships. As a first step, we derive a scene graph that describes the objects in the image, as well as their attributes and their mutual relationships. Subsequently, a reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths, which are the basis for deriving answers. We conduct an experimental study on the challenging dataset GQA, based on both manually curated and automatically generated scene graphs. Our results show that we keep up with human performance on manually curated scene graphs. Moreover, we find that Graphhopper outperforms another state-of-the-art scene graph reasoning model on both manually curated and automatically generated scene graphs by a significant margin.
2

Leung, Kei Yan. "Reflections on Doing Cross-Cultural Research Through and with Visual Methods." In Co-Creativity and Engaged Scholarship, 265–97. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-84248-2_9.

Abstract:
As a traditional and dominant practice of qualitative research, interviewing is heavily dependent on meanings constructed by language. In a cross-cultural setting, the challenge of adequately capturing what interviewees want to convey is well acknowledged by researchers. Indeed, meanings are not only tied to linguistic meanings but also to cultural practices. Moreover, when the focus of one’s research is to understand the mindsets and practices of farmers, focusing solely on spoken words may also hide the fact that farmers also engage with plants, soil and nature through emotions and feelings. In this chapter I will reflect on my personal experiences as a non-Japanese Asian researcher working with an interpreter during my field work in Japan. In the interviews I conducted with farmers, I used photographs of local artwork to elicit information to understand what relationships they may build between the artworks and their farming practices. I used photo elicitation to supplement the limitations of language in making sense of meanings tied to farming practices. Also, to convey results to a western audience, I explore the use of visual illustrations to complement verbal quotes to more fully convey the meaning of the quotes. Two main observations emerged from this cross-cultural experience: first, the gap between language and cultural meaning can provide valuable opportunities for researchers to experiment with different methods, that broaden our sensibilities beyond rational reasoning in data collection; second, using photography in interviews can unfold different layers of realities than talk-only interviews. I argue that visual methods can take us beyond language and open up a more diverse picture to understand the practices of farmers. It is therefore important for cross-cultural researchers to be reflexive about the limitations of language, transform these challenges to an opportunity to remake method and open up different layers of understanding.
3

Francis, Elaine J. "Gradient acceptability, methodological diversity, and theoretical interpretation." In Gradient Acceptability and Linguistic Theory, 194–236. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780192898944.003.0008.

Abstract:
Chapter 8 revisits the issues of form–meaning isomorphism and soft constraints, providing additional examples and arguments to support the positions taken in earlier chapters. The chapter then highlights three studies of split intransitivity in Spanish and English which used visual probe recognition tasks, cross-modal lexical priming tasks, and structural priming tasks to test a widely accepted syntactic distinction between unaccusative and unergative predicates. It is argued that while the results of these studies are open to different theoretical interpretations, the information gleaned from these and other alternative task types is potentially valuable for addressing syntactic questions. The chapter concludes with some brief remarks on big data, neurolinguistics, and the future of syntactic theory within an increasingly diverse methodological landscape.
4

Gromik, Nicolas. "Producing Cell Phone Video Diaries." In Handbook of Research on Web 2.0 and Second Language Learning, 259–73. IGI Global, 2009. http://dx.doi.org/10.4018/978-1-60566-190-2.ch014.

Abstract:
This chapter reports on an ongoing project conducted at Tohoku University in Sendai, Japan. A mixed group of seven advanced EFL learners produced weekly cell phone video diaries that were then delivered online via blip.tv. Participants completed this task as an independent learning project. Using the video recording feature of their cell phones, participants produced videos between 15 and 30 seconds long. As a piece of preliminary research, the aim was not to gather evidence about the linguistic gains that such technology affords, but rather to assess whether or not such a learning approach was feasible and suitable for students. The findings revealed that while the majority of the students found merit in this project, some had reservations. The outcome of this project demonstrates how Web 2.0 is redefining the Internet as a platform for individual content delivery, especially in terms of audio and visual productions.
5

Stepanek, Libor, and Constanza Guillermina Arriaga. "Video Summaries of Academic Texts." In Cases on Audio-Visual Media in Language Education, 328–49. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-2724-4.ch014.

Abstract:
This chapter introduces a case study on the use of video summaries in a university setting. It presents a practical insight into a simple, yet complex and versatile activity that aims at improving the language and communication skills of students in authentic intercultural academic situations. This activity has been developed, implemented and used in language courses at Masaryk University (MU), Brno, Czech Republic, and Universidad Nacional del Sur (UNS), Bahía Blanca, Argentina, since February 2014. Students undertaking language courses at MU and UNS write texts and record video summaries outside of class. They then reflect and discuss linguistic, cultural and organisational topics in class or online discussion forums. The activity consists of a set of interconnected tasks that improve students' understanding of similarities and differences between written and spoken styles, encourage their learning autonomy and enhance the authenticity of their language development.
6

Zhu, Meng, and Atta Badii. "Cross-Modal Semantic-Associative Labelling, Indexing and Retrieval of Multimodal Data." In Multiple Sensorial Media Advances and Applications, 234–57. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-60960-821-7.ch012.

Abstract:
Digitalised multimedia information today is typically represented in different modalities and distributed through various channels. The use of such a huge amount of data is highly dependent on effective and efficient cross-modal labelling, indexing and retrieval of multimodal information. In this chapter, we focus mainly on combining the primary and collateral modalities of the information resource in an intelligent and effective way in order to provide better multimodal information understanding, classification, labelling and retrieval. Image and text are the two modalities mainly discussed here. A novel framework for semantic-based collaterally cued image labelling has been proposed and implemented, aiming to automatically assign linguistic keywords to regions of interest in an image. A visual vocabulary was constructed based on manually labelled image segments. We use Euclidean distance and Gaussian distribution to map the low-level region-based image features to the high-level visual concepts defined in the visual vocabulary. Both the collateral content and context knowledge were extracted from the collateral textual modality to bias the mapping process. A semantic-based high-level image feature vector model was constructed based on the labelling results, and the performance of image retrieval using this feature vector model appears to outperform both content-based and text-based approaches in terms of its capability for combining both perceptual and conceptual similarity of the image content.
7

Markus, Dace, and Valentīna Kaļiņina. "Dzimtās un otrās latviešu valodas prasme pirmsskolas vecumā: vārdu krājums." In Latviešu valodas apguve. XIII Starptautiskais baltistu kongress : rakstu krājums, 40–56. Liepājas Universitāte, 2021. http://dx.doi.org/10.37384/lva.2021.040.

Abstract:
The study has been carried out within the subproject No. 8 “Latvian Language Acquisition” framework of the National Research Programme “Latvian Language”. The aim of this article is to analyse the results of Latvian language skills of the minority pre-school children who attend pre-school groups with Russian as the everyday communication language, the minority pre-school children who attend pre-school groups with Latvian as the everyday language, and Latvian pre-school children. The recordings of children’s speeches were made in Kurzeme pre-school education institutions during May and June of 2019 and 2020 before the children started to attend primary school. The findings obtained in this study are illustrated only with the results in vocabulary acquisition, taking into account that one of the most important tasks in learning a second language at pre-school age is vocabulary acquisition. Creating a conviction for beginning a new activity – communication in another language, not in the mother tongue, is of linguodidactic and psychological importance. Knowledge of a larger or smaller vocabulary is the basis for starting to speak a language. The study uses a picture-based conversation, with a maximum of 20 minutes spent in conversation with each child. The criteria proposed by Ingēra Tomme-Jukēvica (Tomme-Jukēvica 2018) have been used; they indicate the level of language skills (0 (insufficient level) – not showing or showing very minimal (<5%) knowledge and skills; 1 (low level) shows minimal (<25%) knowledge and skills; 2 (medium level) shows mediocre (>50%) knowledge and skills; 3 (high level) shows good (>75%) knowledge and skills). The article points out that each individual’s worldview forms with the mother tongue’s help and compares some striking linguistic lexical differences, paying particular attention to the comparative examples of Latvian and Russian languages.
By referring to Latvian and Russian examples, the authors demonstrate that it may be necessary to divide the action expressed in one word in one language by creating a word group or even a phrase in another language. The Latvian language proficiency researchers should be aware that children with different native languages (Latvian or Russian) may have different worldviews, demanding additional actions of thinking and speech from the second language speaker. Therefore, second language acquisition at the pre-school age is an essential prerequisite for continuing bilingual studies or studies in Latvian at school. Observations made during the research in the National Research Programme testify that in pre-school education institutions, the process of education usually is interesting for children. However, as the analysis of the recordings of children’s speech in Kurzeme reveals, in those minority children groups where the everyday communication language is Russian and where Latvian is usually taught only two times a week for approximately 30–45 minutes, and also where the visual information in Russian dominates, insufficient skills of the state language and substantially worse experience of the Latvian language use have been observed. At the same time, it should be acknowledged that those minority children who attend groups with Latvian as the everyday language have learned Latvian sufficiently to continue education in the first grade of primary school. These children have not lost their native language, usually Russian, which they use actively at home. Therefore, they have the basis for several language acquisitions when they start learning at school. Learning Latvian as the second language requires optimization of this process in the pre-school education institutions, ensuring regular communication with the child in Latvian, and the use of appropriate methodologies in teaching activities. 
In this context, not only teaching and practicing Latvian lessons are particularly important, but also communication with other children and the possibility of talking Latvian with the staff of the pre-school educational institution. In accordance with earlier conclusions of linguists, the study conducted in Kurzeme shows that in the speech of pre-school children, independently of their mother tongue, nouns are dominating, but minority children attending groups with the dominant Russian language mostly use nouns in the nominative. Because of the task of preparing minority children for bilingual studies or studies in Latvian in the first grade, the authors of the article recommend ensuring bilingual communication on a day-to-day basis in minority groups of pre-school children.
8

Luengo, Isabel. "A Diagrammatic Subsystem of Hilbert's Geometry." In Logical Reasoning with Diagrams. Oxford University Press, 1996. http://dx.doi.org/10.1093/oso/9780195104271.003.0012.

Abstract:
In the last few years there has been an increasing interest in the visual representation of mathematical concepts. The fact that computers can help us perform graphical tasks very easily has been translated into an increasing interest in diagrammatic representations in general. Several experiments have shown that diagrammatic reasoning plays a main role in the way in which experts in several areas solve problems (Gobert and Frederiksen [1992] and Kindfield [1992]). Two kinds of explanations have been given for the advantages of visual representations over linguistic ones. The first kind of explanation is psychological. It has been argued that visual representations are easier to use because they resemble the mental models humans build to solve problems (Stenning and Oberlander [1991], Johnson-Laird and Byrne [1991], and Tversky [1991]). The second kind of explanation is related to computational efficiency. Larkin and Simon [1987] have argued that diagrammatic representations are computationally more efficient than sentential representations because the location of each element in the diagram corresponds to the spatial or topological properties of the objects they represent. However, the efficiency of the use of diagrams is not enough justification for their use in analytical areas of knowledge. Mathematical discoveries often have been made using visual reasoning, but those very same discoveries were not justified by the visual reasoning. Diagrams are associated with intuitions and illustrations, not with rigorous proofs. Visual representations are allowed in the context of discovery, not in the context of justification. Many authors have considered diagrams in opposition to deductive systems. Lindsay [1988], for instance, has claimed that the main feature of visual representations is that they correspond to a non-deductive kind of inference system.
Koedinger and Anderson [1991] have related diagrammatic reasoning in geometry to informal, inductive strategies to solve problems. Thus, though we have an empirical justification for the use of diagrams in mathematics (people use them and they work!) we do not usually have an analytical justification. In fact, the history of mathematics, and especially the history of geometry, is full of mistakes related to the use of diagrams.
9

Vadhana, Chandra, Shanthi Bala P., and Immanuel Zion Ramdinthara. "Impact of Deep Learning Techniques in IoT." In Advances in Web Technologies and Engineering, 196–226. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-3111-2.ch012.

Abstract:
Deep learning models can achieve high accuracy, sometimes exceeding human-level performance. This is crucial for safety-critical applications such as driverless cars, aerospace, defence, medical research, and industrial automation. Most deep learning methods are built on neural networks that have many hidden layers and learn patterns for decision making. Deep learning is a subset of machine learning that performs end-to-end learning, is capable of learning from unsupervised data, and provides a very flexible, learnable framework for representing visual and linguistic information. It has greatly changed the way computing devices process human-centric content in areas such as speech, image recognition, and natural language processing. Deep learning plays a major role in IoT-related services. Bringing deep learning into the IoT environment makes complex sensing and recognition tasks easier. It helps to automatically identify patterns and detect anomalies in the data generated by IoT devices. This chapter discusses the impact of deep learning in the IoT environment.
10

Vasylyshyna, N. M., and N. V. Honcharenko-Zakrevska. "SECTION #2. INNOVATIVE METHODS AND TECHNOLOGIES IN THE STUDY AND TEACHING OF FOREIGN 2.1 THE EFFICIENCY OF NEW TEACHING METHODOLOGIES IN THE PROCESS OF SHAPING FOREIGN COMMUNICATIVE COMPETENCIES." In CURRENT THEORY AND PRACTICE ASPECTS OF LINGUISTICS, SOCIOLINGUISTICS AND METHODOLOGY OF FOREIGN LANGUAGES AT UNIVERSITIES IN MODERN GLOBAL HIGHER EDUCATIONAL SPACE. RS Global Sp. z O.O., 2022. http://dx.doi.org/10.31435/rsglobal/052-4.

Abstract:
Learning a foreign language as if it were a mother tongue would be the ideal way, since the need to learn grammar and structures would be obviated. This is difficult if the teachers themselves are non-native and is therefore one of the most complicated aspects. Things have changed over the years, and though it was one of the most effective ways of teaching, it is no longer considered so. This is due to various reasons: the present generation gets exposure to the world through social media; their knowledge base is augmented by the information available on the internet; and students nowadays are more impatient, so to grab their attention, teaching methods need to cater to their dynamic thinking process. Language teaching, like any other topic, has undergone a lot of changes. It has shifted from traditional ways, such as lectures by facilitators with only a blackboard for support, spelling repetition, and grammar worksheets, to role-plays, interactive games, and short visuals. Thus, we consider it very important to investigate how to teach English in each situation. Sometimes it is not a matter of teaching English but a matter of teaching in English. The main purpose is to create a new method made of all the different methods already known and take advantage of all the positive features in each method. However, just a simple mixture of all methods would not be enough since we are dealing with very different situations regarding age, level and resources. Therefore, the main idea is to use all the methods in a varying proportion depending on the circumstances. Learning a foreign language may cause stress and anxiety, and in order to mitigate this problem, teachers could follow a Natural approach involving teaching in a setting as close as possible to the one in which people learn their mother tongue.
The relevance of our research is shown by the fact that digitization has no doubt changed our education system, but we cannot say that it has diminished the value of traditional classroom learning. The best part about the digitization of foreign language education in the 21st century is that it combines aspects of both classroom learning and online learning methods. Walking hand in hand, both act as a support system to each other, which gives a stronghold to our modern students. In addition, digitization in foreign language education has also proved to be the right method for saving resources. Online examination platforms have restricted the frivolous usage of paper. During the research we noticed that there is no consensus in academia on the effectiveness and the appropriateness of the use of gaming activities in teaching or learning English. However, we consider that their expedient and relevant use can increase motivation to study the English language. We have identified the following benefits of using online resources while studying English: it increases interest and motivates learners to perform tasks; immerses them in an English environment; stimulates the ability to work independently; promotes the development of critical thinking, memory, and attention; forms foreign language competence in listening comprehension and socio-cultural competence; activates the desire to communicate in English when discussing the material viewed; provides an opportunity to form realistic and modern situations for discussion; allows the use of a wide range of exercises and various forms of work at the previewing and postviewing stages; and helps higher education learners to understand nonverbal communication and enrich their active and passive conversational vocabularies.
The research concluded that all online measures developed to improve foreign language training of the foreign languages discipline are developed by teachers of the department of foreign languages and teaching methods of foreign languages; graduation proposals are to be taken into account by language departments. We hope that the results will develop further steps in optimization of foreign language training in blended learning and distance education.

Conference papers on the topic "Visual Linguistic Task"

1

Diao, Xiaolei. "Building a Visual Semantics Aware Object Hierarchy." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/826.

Abstract:
The semantic gap is defined as the difference between the linguistic representations of the same concept, which usually leads to misunderstanding between individuals with different knowledge backgrounds. Since linguistically annotated images are extensively used for training machine learning models, semantic gap problem (SGP) also results in inevitable bias on image annotations and further leads to poor performance on current computer vision tasks. To address this problem, we propose a novel unsupervised method to build visual semantics aware object hierarchy, aiming to get a classification model by learning from pure-visual information and to dissipate the bias of linguistic representations caused by SGP. Our intuition in this paper comes from real-world knowledge representation where concepts are hierarchically organized, and each concept can be described by a set of features rather than a linguistic annotation, namely visual semantic. The evaluation consists of two parts, firstly we apply the constructed hierarchy on the object recognition task and then we compare our visual hierarchy and existing lexical hierarchies to show the validity of our method. The preliminary results reveal the efficiency and potential of our proposed method.
2

Agrawal, Samyak, and Radhika Mamidi. "LastResort at SemEval-2022 Task 5: Towards Misogyny Identification using Visual Linguistic Model Ensembles And Task-Specific Pretraining." In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.semeval-1.79.

3

Song, Wei, Ziyao Song, Lizhen Liu, and Ruiji Fu. "Hierarchical Multi-task Learning for Organization Evaluation of Argumentative Student Essays." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/536.

Abstract:
Organization evaluation is an important dimension of automated essay scoring. This paper focuses on discourse element (i.e., functions of sentences and paragraphs) based organization evaluation. Existing approaches mostly separate discourse element identification and organization evaluation. In contrast, we propose a neural hierarchical multi-task learning approach for jointly optimizing sentence and paragraph level discourse element identification and organization evaluation. We represent the organization as a grid to simulate the visual layout of an essay and integrate discourse elements at multiple linguistic levels. Experimental results show that the multi-task learning based organization evaluation can achieve significant improvements compared with existing work and pipeline baselines. Multiple level discourse element identification also benefits from multi-task learning through mutual enhancement.
4

Stenger, I., and T. Avgustinova. "VISUAL VS. AUDITORY PERCEPTION OF BULGARIAN STIMULI BY RUSSIAN NATIVE SPEAKERS." In International Conference on Computational Linguistics and Intellectual Technologies "Dialogue". Russian State University for the Humanities, 2020. http://dx.doi.org/10.28995/2075-7182-2020-19-684-695.

Abstract:
This study contributes to a better understanding of receptive multilingualism by determining similarities and differences in successful processing of written and spoken cognate words in an unknown but (closely) related language. We investigate two Slavic languages with regard to their mutual intelligibility. The current focus is on the recognition of isolated Bulgarian words by Russian native speakers in a cognate guessing task, considering both written and audio stimuli. The experimentally obtained intercomprehension scores show a generally high degree of intelligibility of Bulgarian cognates to Russian subjects, as well as processing difficulties in case of visual vs. auditory perception. In search of an explanation, we examine the linguistic factors that can contribute to various degrees of written and spoken word intelligibility. The intercomprehension scores obtained in the online word translation experiments are correlated with (i) the identical and mismatched correspondences on the orthographic and phonetic level, (ii) the word length of the stimuli, and (iii) the frequency of Russian cognates. Additionally we validate two measuring methods: the Levenshtein distance and the word adaptation surprisal as potential pr
5

Yan, Liqi, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, and Dongfang Liu. "GL-RG: Global-Local Representation Granularity for Video Captioning." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/384.

Abstract:
Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework for video captioning, namely a Global-Local Representation Granularity. Our GL-RG demonstrates three advantages over the prior efforts: 1) we explicitly exploit extensive visual representations from different video ranges to improve linguistic expression; 2) we devise a novel global-local encoder to produce rich semantic vocabulary to obtain a descriptive granularity of video contents across frames; 3) we develop an incremental training strategy which organizes model learning in an incremental fashion to incur an optimal captioning behavior. Experimental results on the challenging MSR-VTT and MSVD datasets show that our GL-RG outperforms recent state-of-the-art methods by a significant margin. Code is available at https://github.com/ylqi/GL-RG.
6

Long, Siqu, Feiqi Cao, Soyeon Caren Han, and Haiqin Yang. "Vision-and-Language Pretrained Models: A Survey." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/773.

Abstract:
Pretrained models have produced great success in both Computer Vision (CV) and Natural Language Processing (NLP). This progress leads to learning joint representations of vision and language pretraining by feeding visual and linguistic contents into a multi-layer transformer, Visual-Language Pretrained Models (VLPMs). In this paper, we present an overview of the major advances achieved in VLPMs for producing joint representations of vision and language. As the preliminaries, we briefly describe the general task definition and generic architecture of VLPMs. We first discuss the language and vision data encoding methods and then present the mainstream VLPM structure as the core content. We further summarise several essential pretraining and fine-tuning strategies. Finally, we highlight three future directions for both CV and NLP researchers to provide insightful guidance.
7

Yang, Shuo, and Xinxiao Wu. "Entity-aware and Motion-aware Transformers for Language-driven Action Localization." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/216.

Abstract:
Language-driven action localization in videos is a challenging task that involves not only visual-linguistic matching but also action boundary prediction. Recent progress has been achieved through aligning language queries to video segments, but estimating precise boundaries is still under-explored. In this paper, we propose entity-aware and motion-aware Transformers that progressively localize actions in videos by first coarsely locating clips with entity queries and then finely predicting exact boundaries in a shrunken temporal region with motion queries. The entity-aware Transformer incorporates the textual entities into visual representation learning via cross-modal and cross-frame attentions to facilitate attending action-related video clips. The motion-aware Transformer captures fine-grained motion changes at multiple temporal scales via integrating long short-term memory into the self-attention module to further improve the precision of action boundary prediction. Extensive experiments on the Charades-STA and TACoS datasets demonstrate that our method achieves better performance than existing methods.
8

Öman, Anne. "Design and Redesign of a Multimodal Classroom Task – Implications for Teaching and Learning." In InSITE 2015: Informing Science + IT Education Conferences: USA. Informing Science Institute, 2015. http://dx.doi.org/10.28945/2242.

Abstract:
Digital technologies are increasingly implemented in Swedish schools, which impact on education in the contemporary classroom. Screen-based practice opens up for new forms and multiplicity of representations, taking into account that language in a globalized society is more than reading and writing skills. This paper presents a case study of technology-mediated instruction at the primary-school level including an analysis of the designed task and how the teacher orchestrated the digital resources during three introductory classes. The aim was also to explore the pupils’ redesigning of advertising films based on teacher’s instructions and available digital resources. Sequences of a learning trajectory were video recorded and analysed from a multimodal perspective with a focus on the designed task and the processes of how pupils orchestrate meaning through their selection and configuration of available designs. The findings show a distinction between the selection of design elements in the teacher’s orchestration of the laptop resources during instruction and the pupils’ redesigning of the task. Pupils’ work developed from the linguistic design provided by the teacher towards visual design and the use of images as the central mode of expression in the process of creating advertising films. The findings also indicate a lack of orientation towards subject content due to the teacher’s primary focus on introducing the software. This paper that was presented at the conference was previously published in the Journal of IT Education: Research.
9

Dang, Long Hoang, Thao Minh Le, Vuong Le, and Truyen Tran. "Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/88.

Abstract:
Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands lifting from associative visual pattern recognition to symbol-like manipulation over objects, their behavior and interactions. Toward reaching this goal, we propose an object-oriented reasoning approach in that video is abstracted as a dynamic stream of interacting objects. At each stage of the video event flow, these objects interact with each other, and their interactions are reasoned about with respect to the query and under the overall context of a video. This mechanism is materialized into a family of general-purpose neural units and their multi-level architecture called Hierarchical Object-oriented Spatio-Temporal Reasoning (HOSTR) networks. This neural model maintains the objects' consistent lifelines in the form of a hierarchically nested spatio-temporal graph. Within this graph, the dynamic interactive object-oriented representations are built up along the video sequence, hierarchically abstracted in a bottom-up manner, and converge toward the key information for the correct answer. The method is evaluated on multiple major Video QA datasets and establishes new state-of-the-arts in these tasks. Analysis into the model's behavior indicates that object-oriented reasoning is a reliable, interpretable and efficient approach to Video QA.
10

Le, Hieu, Taufiq Daryanto, Fabian Zhafransyah, Derry Wijaya, Elizabeth Coppock, and Sang Chin. "Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach." In 3rd International Conference on Data Mining and Machine Learning (DMML 2022). Academy and Industry Research Collaboration Center (AIRCC), 2022. http://dx.doi.org/10.5121/csit.2022.120709.

Abstract:
This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene. One common theoretical approach to this problem is to model the task as a two-agent cooperative scheme in which a ‘speaker’ agent would generate the expression that best describes a targeted area and a ‘listener’ agent would identify the target. Several recent REG systems have used deep learning approaches to represent the speaker/listener agents. The Rational Speech Act framework (RSA), a Bayesian approach to pragmatics that can predict human linguistic behavior quite accurately, has been shown to generate high quality and explainable expressions on toy datasets involving simple visual scenes. Its application to large scale problems, however, remains largely unexplored. This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes in a multi-step process with the aim of generating better-explained expressions. We carry out experiments on the RefCOCO and RefCOCO+ datasets and compare our approach with other end-to-end deep learning approaches as well as a variation of RSA to highlight our key contribution. Experimental results show that while achieving lower accuracy than SOTA deep learning methods, our approach outperforms a similar RSA approach in human comprehension and has an advantage over end-to-end deep learning under a limited data scenario. Lastly, we provide a detailed analysis of the expression generation process with concrete examples, thus providing a systematic view on error types and deficiencies in the generation process and identifying possible areas for future improvements.

Reports on the topic "Visual Linguistic Task"

1

Yatsymirska, Mariya. SOCIAL EXPRESSION IN MULTIMEDIA TEXTS. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11072.

Abstract:
The article investigates functional techniques of extralinguistic expression in multimedia texts; the effectiveness of figurative expressions as a reaction to modern events in Ukraine and their influence on the formation of public opinion is shown. Publications of journalists, broadcasts of media resonators, experts, public figures, politicians, readers are analyzed. The language of the media plays a key role in shaping the worldview of the young political elite in the first place. The essence of each statement is a focused thought that reacts to events in the world or in one’s own country. The most popular platform for mass information and social interaction is, first of all, network journalism, which is characterized by mobility and unlimited time and space. Authors have complete freedom to express their views in direct language, including their own word formation. Phonetic, lexical, phraseological and stylistic means of speech create expression of the text. A figurative word, a good aphorism or proverb, a paraphrased expression, etc. enhance the effectiveness of a multimedia text. This is especially important for headlines that simultaneously inform and influence the views of millions of readers. Given the wide range of issues raised by the Internet as a medium, research in this area is interdisciplinary. The science of information, combining language and social communication, is at the forefront of global interactions. The Internet is an effective source of knowledge and a forum for free thought. Nonlinear texts (hypertexts) – «branching texts or texts that perform actions on request», multimedia texts change the principles of information collection, storage and dissemination, involving billions of readers in the discussion of global issues. Mastering the word is not an easy task if the author of the publication is not well-read, is not deep in the topic, does not know the psychology of the audience for which he writes. 
Therefore, the study of media broadcasting is an important component of the professional training of future journalists. The functions of the language of the media require the authors to make the right statements and convincing arguments in the text. Journalism education is not only knowledge of imperative and dispositive norms, but also apodictic ones. In practice, this means that there are rules in media creativity that are based on logical necessity. Apodicticity is the first sign of impressive language on the platform of print or electronic media. Social expression is a combination of creative abilities and linguistic competencies that a journalist realizes in his activity. Creative self-expression is realized in a set of many important factors in the media: the choice of topic, convincing arguments, logical presentation of ideas and deep philological education. Linguistic art, in contrast to painting, music, sculpture, accumulates all visual, auditory, tactile and empathic sensations in a universal sign – the word. The choice of the word for the reproduction of sensory and semantic meanings, its competent use in the appropriate context distinguishes the journalist-intellectual from other participants in forums, round tables, analytical or entertainment programs. Expressive speech in the media is a product of the intellect (ability to think) of all those who write on socio-political or economic topics. In the same plane with him – intelligence (awareness, prudence), the first sign of which (according to Ivan Ogienko) is a good knowledge of the language. Intellectual language is an important means of organizing a journalistic text. It, on the one hand, logically conveys the author’s thoughts, and on the other – encourages the reader to reflect and comprehend what is read. The richness of language is accumulated through continuous self-education and interesting communication. 
Studies of social expression as an important factor influencing the formation of public consciousness should open up new facets of rational and emotional media broadcasting; to trace physical and psychological reactions to communicative mimicry in the media. Speech mimicry as one of the methods of disguise is increasingly becoming a dangerous factor in manipulating the media. Mimicry is an unprincipled adaptation to the surrounding social conditions; one of the most famous examples of an animal characterized by mimicry (change of protective color and shape) is a chameleon. In a figurative sense, chameleons are called adaptive journalists. Observations show that mimicry in politics is to some extent a kind of game that, like every game, is always conditional and artificial.