Journal articles on the topic 'Multi-modal image translation'

To see the other types of publications on this topic, follow the link: Multi-modal image translation.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 28 journal articles for your research on the topic 'Multi-modal image translation.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Yang, Pengcheng, Boxing Chen, Pei Zhang, and Xu Sun. "Visual Agreement Regularized Training for Multi-Modal Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9418–25. http://dx.doi.org/10.1609/aaai.v34i05.6484.

Full text
Abstract:
Multi-modal machine translation aims at translating the source sentence into a different language in the presence of the paired image. Previous work suggests that additional visual information only provides dispensable help to translation, which is needed in several very special cases such as translating ambiguous words. To make better use of visual information, this work presents visual agreement regularized training. The proposed approach jointly trains the source-to-target and target-to-source translation models and encourages them to share the same focus on the visual information when generating semantically equivalent visual words (e.g. “ball” in English and “ballon” in French). Besides, a simple yet effective multi-head co-attention model is also introduced to capture interactions between visual and textual features. The results show that our approaches can outperform competitive baselines by a large margin on the Multi30k dataset. Further analysis demonstrates that the proposed regularized training can effectively improve the agreement of attention on the image, leading to better use of visual information.
APA, Harvard, Vancouver, ISO, and other styles
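The multi-head co-attention in the entry above is described only at a high level here. As a rough illustration of the general idea, the hypothetical PyTorch sketch below lets decoder text states attend over regional image features and exposes the attention weights that an agreement-style regulariser could compare; all module names, dimensions, and the L2 agreement penalty are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MultiHeadCoAttention(nn.Module):
    """Hypothetical sketch: decoder text states attend over image regions."""

    def __init__(self, text_dim=512, image_dim=2048, num_heads=8):
        super().__init__()
        # Project CNN region features into the text space, then reuse a
        # standard multi-head attention layer for the cross-modal attention.
        self.img_proj = nn.Linear(image_dim, text_dim)
        self.attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)

    def forward(self, text_states, image_regions):
        # text_states:   (batch, tgt_len, text_dim)    decoder hidden states
        # image_regions: (batch, n_regions, image_dim) e.g. a 7x7 CNN grid
        img = self.img_proj(image_regions)
        context, weights = self.attn(query=text_states, key=img, value=img)
        return context, weights   # weights: (batch, tgt_len, n_regions)

if __name__ == "__main__":
    coattn = MultiHeadCoAttention()
    regions = torch.randn(2, 49, 2048)
    # Pretend these are the source-to-target and target-to-source decoders
    # looking at the same image; an agreement regulariser would pull their
    # attention maps together (approximated here by a simple L2 penalty).
    _, attn_fwd = coattn(torch.randn(2, 7, 512), regions)
    _, attn_bwd = coattn(torch.randn(2, 7, 512), regions)
    agreement_penalty = ((attn_fwd - attn_bwd) ** 2).mean()
    print(agreement_penalty.item())
```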
2

Kaur, Jagroop, and Gurpreet Singh Josan. "English to Hindi Multi Modal Image Caption Translation." Journal of Scientific Research 64, no. 02 (2020): 274–81. http://dx.doi.org/10.37398/jsr.2020.640238.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Guo, Xiaobin. "Image Visual Attention Mechanism-based Global and Local Semantic Information Fusion for Multi-modal English Machine Translation." 電腦學刊 33, no. 2 (April 2022): 37–50. http://dx.doi.org/10.53106/199115992022043302004.

Full text
Abstract:
Machine translation is a hot research topic at present. Traditional machine translation methods are not effective because they require a large number of training samples. Visual semantic information from images can improve the performance of text machine translation models. Most existing works fuse the visual semantic information of the whole image into the translation model, but an image may contain several distinct semantic objects, and these local semantic objects affect the decoder's word prediction differently. Therefore, this paper proposes a multi-modal machine translation model based on an image visual attention mechanism that fuses global and local semantic information. The global and local semantic information in the image are fused into the text attention weights as the image attention, further enhancing the alignment between the decoder's hidden state and the source-language text. Experimental results on the English-German and Indonesian-Chinese translation pairs of the Multi30K dataset show that the proposed model outperforms state-of-the-art multi-modal machine translation models, with BLEU values exceeding 43% for English-German and 29% for Indonesian-Chinese, demonstrating its effectiveness.
APA, Harvard, Vancouver, ISO, and other styles
4

Shi, Xiayang, Jiaqi Yuan, Yuanyuan Huang, Zhenqiang Yu, Pei Cheng, and Xinyi Liu. "Reference Context Guided Vector to Achieve Multimodal Machine Translation." Journal of Physics: Conference Series 2171, no. 1 (January 1, 2022): 012076. http://dx.doi.org/10.1088/1742-6596/2171/1/012076.

Full text
Abstract:
Traditional multi-modal machine translation mainly introduces static images from another modality to improve translation quality. A variety of methods are combined to improve the data and features so that the translation result approaches its upper limit, and some even rely on the sensitivity of a sample-distance algorithm to the data. At the same time, multi-modal MT can suffer from a lack of semantic interaction in the attention mechanism within the same corpus, or from over-encoding of identical text-image information and corpus-irrelevant information, resulting in excessive noise. To address these problems, this article adds a new visual-image input to the decoder. The core idea is to combine visual image information with the traditional attention mechanism at each decoding time step: a dynamic router extracts the relevant visual features, the multi-modal visual features are integrated into the decoder, and the target word is predicted with the help of the visual image. Experiments on the Multi30K English, French, and Czech translation data demonstrate the benefit of letting the decoder extract features from visual images.
APA, Harvard, Vancouver, ISO, and other styles
5

Calixto, Iacer, and Qun Liu. "An error analysis for image-based multi-modal neural machine translation." Machine Translation 33, no. 1-2 (April 8, 2019): 155–77. http://dx.doi.org/10.1007/s10590-019-09226-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Gómez, Jose L., Gabriel Villalonga, and Antonio M. López. "Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches." Sensors 21, no. 9 (May 4, 2021): 3185. http://dx.doi.org/10.3390/s21093185.

Full text
Abstract:
Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN highly depends on both the raw sensor data and their associated ground truth (GT). Collecting such GT is usually done through human labeling, which is time-consuming and does not scale as we wish. This data-labeling bottleneck may be intensified due to domain shifts among image sensors, which could force per-sensor data labeling. In this paper, we focus on the use of co-training, a semi-supervised learning (SSL) method, for obtaining self-labeled object bounding boxes (BBs), i.e., the GT to train deep object detectors. In particular, we assess the goodness of multi-modal co-training by relying on two different views of an image, namely, appearance (RGB) and estimated depth (D). Moreover, we compare appearance-based single-modal co-training with multi-modal. Our results suggest that in a standard SSL setting (no domain shift, a few human-labeled data) and under virtual-to-real domain shift (many virtual-world labeled data, no human-labeled data) multi-modal co-training outperforms single-modal. In the latter case, by performing GAN-based domain translation both co-training modalities are on par, at least when using an off-the-shelf depth estimation model not specifically trained on the translated images.
APA, Harvard, Vancouver, ISO, and other styles
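The entry above compares single- and multi-modal co-training for self-labelling. The following is a minimal, hypothetical sketch of two-view co-training on synthetic features with scikit-learn classifiers, standing in for the RGB and depth views and for the much heavier object-detection setting of the paper; pool sizes, confidence cut-offs, and the shared labeled pool are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for two modalities ("appearance" and "depth" features).
n, n_labeled = 1000, 50
y_true = rng.integers(0, 2, size=n)
view_a = y_true[:, None] + 0.8 * rng.normal(size=(n, 5))  # appearance-like view
view_b = y_true[:, None] + 0.8 * rng.normal(size=(n, 5))  # depth-like view

# Small human-labeled pool; everything else starts unlabeled.
lab_idx = list(range(n_labeled))
lab_y = list(y_true[:n_labeled])
unlab_idx = list(range(n_labeled, n))

clf_a, clf_b = LogisticRegression(), LogisticRegression()
for _ in range(5):                       # a few co-training rounds
    clf_a.fit(view_a[lab_idx], lab_y)
    clf_b.fit(view_b[lab_idx], lab_y)
    if not unlab_idx:
        break
    # Each view's model pseudo-labels the pool; its most confident predictions
    # join a shared labeled pool used by both views (a simplified variant of
    # classic co-training, where each view teaches the other).
    for clf, view in ((clf_a, view_a), (clf_b, view_b)):
        proba = clf.predict_proba(view[unlab_idx])
        top = np.argsort(-proba.max(axis=1))[:20]      # 20 most confident
        for j in sorted(top, reverse=True):            # pop from the back
            lab_idx.append(unlab_idx.pop(j))
            lab_y.append(int(proba[j].argmax()))

print("labeled pool:", len(lab_idx),
      "view-A accuracy:", (clf_a.predict(view_a) == y_true).mean())
```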
7

Rodrigues, Ana, Bruna Sousa, Amílcar Cardoso, and Penousal Machado. "“Found in Translation”: An Evolutionary Framework for Auditory–Visual Relationships." Entropy 24, no. 12 (November 22, 2022): 1706. http://dx.doi.org/10.3390/e24121706.

Full text
Abstract:
The development of computational artifacts to study cross-modal associations has been a growing research topic, as they allow new degrees of abstraction. In this context, we propose a novel approach to the computational exploration of relationships between music and abstract images, grounded by findings from cognitive sciences (emotion and perception). Due to the problem’s high-level nature, we rely on evolutionary programming techniques to evolve this audio–visual dialogue. To articulate the complexity of the problem, we develop a framework with four modules: (i) vocabulary set, (ii) music generator, (iii) image generator, and (iv) evolutionary engine. We test our approach by evolving a given music set to a corresponding set of images, steered by the expression of four emotions (angry, calm, happy, sad). Then, we perform preliminary user tests to evaluate if the user’s perception is consistent with the system’s expression. Results suggest an agreement between the user’s emotional perception of the music–image pairs and the system outcomes, favoring the integration of cognitive science knowledge. We also discuss the benefit of employing evolutionary strategies, such as genetic programming on multi-modal problems of a creative nature. Overall, this research contributes to a better understanding of the foundations of auditory–visual associations mediated by emotions and perception.
APA, Harvard, Vancouver, ISO, and other styles
8

Lu, Chien-Yu, Min-Xin Xue, Chia-Che Chang, Che-Rung Lee, and Li Su. "Play as You Like: Timbre-Enhanced Multi-Modal Music Style Transfer." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 1061–68. http://dx.doi.org/10.1609/aaai.v33i01.33011061.

Full text
Abstract:
Style transfer of polyphonic music recordings is a challenging task when considering the modeling of diverse, imaginative, and reasonable music pieces in the style different from their original one. To achieve this, learning stable multi-modal representations for both domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner is critical. In this paper, we propose an unsupervised music style transfer method without the need for parallel data. Besides, to characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system. This allows one to generate diverse outputs from the learned latent distributions representing contents and styles. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuance in instrument-specific performance, cognitively plausible features including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope, are combined with the widely-used mel-spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Network (RaGAN) is also utilized to achieve fast convergence and high stability. We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer with improved sound quality and in allowing users to manipulate the output.
APA, Harvard, Vancouver, ISO, and other styles
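The entry above builds on the MUNIT framework, whose core idea is to factor an input into a domain-invariant content code and a domain-specific style code so that recombining codes yields diverse translations. A heavily simplified, hypothetical PyTorch sketch of that factorisation follows (dense layers on flat "spectrogram frame" vectors rather than the paper's convolutional encoders; all dimensions are illustrative).

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Split an input into a content code and a style code (MUNIT-style)."""

    def __init__(self, in_dim=128, content_dim=64, style_dim=8):
        super().__init__()
        self.content = nn.Sequential(nn.Linear(in_dim, content_dim), nn.ReLU())
        self.style = nn.Sequential(nn.Linear(in_dim, style_dim), nn.ReLU())

    def forward(self, x):
        return self.content(x), self.style(x)

class Decoder(nn.Module):
    """Reassemble an output from any (content, style) pair."""

    def __init__(self, content_dim=64, style_dim=8, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(content_dim + style_dim, 256),
                                 nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, content, style):
        return self.net(torch.cat([content, style], dim=-1))

# Toy "style transfer": keep the content of domain A, borrow the style of B.
enc_a, enc_b, dec_b = Encoder(), Encoder(), Decoder()
x_a = torch.randn(4, 128)          # e.g. frames from a piano-solo spectrogram
x_b = torch.randn(4, 128)          # e.g. frames from a guitar-solo spectrogram
content_a, _ = enc_a(x_a)
_, style_b = enc_b(x_b)
x_a_in_style_b = dec_b(content_a, style_b)
print(x_a_in_style_b.shape)        # torch.Size([4, 128])
```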
9

Islam, Kh Tohidul, Sudanthi Wijewickrema, and Stephen O’Leary. "A rotation and translation invariant method for 3D organ image classification using deep convolutional neural networks." PeerJ Computer Science 5 (March 4, 2019): e181. http://dx.doi.org/10.7717/peerj-cs.181.

Full text
Abstract:
Three-dimensional (3D) medical image classification is useful in applications such as disease diagnosis and content-based medical image retrieval. It is a challenging task due to several reasons. First, image intensity values are vastly different depending on the image modality. Second, intensity values within the same image modality may vary depending on the imaging machine and artifacts may also be introduced in the imaging process. Third, processing 3D data requires high computational power. In recent years, significant research has been conducted in the field of 3D medical image classification. However, most of these make assumptions about patient orientation and imaging direction to simplify the problem and/or work with the full 3D images. As such, they perform poorly when these assumptions are not met. In this paper, we propose a method of classification for 3D organ images that is rotation and translation invariant. To this end, we extract a representative two-dimensional (2D) slice along the plane of best symmetry from the 3D image. We then use this slice to represent the 3D image and use a 20-layer deep convolutional neural network (DCNN) to perform the classification task. We show experimentally, using multi-modal data, that our method is comparable to existing methods when the assumptions of patient orientation and viewing direction are met. Notably, it shows similarly high accuracy even when these assumptions are violated, where other methods fail. We also explore how this method can be used with other DCNN models as well as conventional classification approaches.
APA, Harvard, Vancouver, ISO, and other styles
10

Hamghalam, Mohammad, Baiying Lei, and Tianfu Wang. "High Tissue Contrast MRI Synthesis Using Multi-Stage Attention-GAN for Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4067–74. http://dx.doi.org/10.1609/aaai.v34i04.5825.

Full text
Abstract:
Magnetic resonance imaging (MRI) provides varying tissue contrast images of internal organs based on a strong magnetic field. Despite the non-invasive advantage of MRI in frequent imaging, the low contrast MR images in the target area make tissue segmentation a challenging problem. This paper demonstrates the potential benefits of image-to-image translation techniques to generate synthetic high tissue contrast (HTC) images. Notably, we adopt a new cycle generative adversarial network (CycleGAN) with an attention mechanism to increase the contrast within underlying tissues. The attention block, as well as training on HTC images, guides our model to converge on certain tissues. To increase the resolution of HTC images, we employ multi-stage architecture to focus on one particular tissue as a foreground and filter out the irrelevant background in each stage. This multi-stage structure also alleviates the common artifacts of the synthetic images by decreasing the gap between source and target domains. We show the application of our method for synthesizing HTC images on brain MR scans, including glioma tumor. We also employ HTC MR images in both the end-to-end and two-stage segmentation structure to confirm the effectiveness of these images. The experiments over three competitive segmentation baselines on the BraTS 2018 dataset indicate that incorporating the synthetic HTC images in the multi-modal segmentation framework improves the average Dice scores by 0.8%, 0.6%, and 0.5% on the whole tumor, tumor core, and enhancing tumor, respectively, while eliminating one real MRI sequence from the segmentation procedure.
APA, Harvard, Vancouver, ISO, and other styles
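The entry above attaches an attention mechanism to a CycleGAN so that training concentrates on particular tissues. One small piece of such a design might be an attention-weighted cycle-consistency term; the sketch below is a hypothetical illustration of that idea only (the background floor and the L1 penalty are arbitrary choices, not the paper's loss).

```python
import torch

def attention_weighted_cycle_loss(x, x_cycled, attention, floor=0.1):
    """Cycle-consistency penalty concentrated on attended (foreground) tissue.

    x, x_cycled: (batch, 1, H, W) source images and their round-trip
    reconstructions; attention: (batch, 1, H, W) soft mask in [0, 1].
    The floor keeps a small penalty on the background so it is never ignored.
    """
    weights = floor + (1.0 - floor) * attention
    return (weights * (x - x_cycled).abs()).mean()

x = torch.rand(2, 1, 64, 64)
x_cycled = torch.rand(2, 1, 64, 64)
attn = torch.sigmoid(torch.randn(2, 1, 64, 64))   # stand-in attention maps
print(attention_weighted_cycle_loss(x, x_cycled, attn).item())
```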
11

Blanco, Jose-Luis, Javier González-Jiménez, and Juan-Antonio Fernández-Madrigal. "A robust, multi-hypothesis approach to matching occupancy grid maps." Robotica 31, no. 5 (January 11, 2013): 687–701. http://dx.doi.org/10.1017/s0263574712000732.

Full text
Abstract:
This paper presents a new approach to matching occupancy grid maps by means of finding correspondences between a set of sparse features detected in the maps. The problem is stated here as a special instance of generic image registration. To cope with the uncertainty and ambiguity that arise from matching grid maps, we introduce a modified RANSAC algorithm which searches for a dynamic number of internally consistent subsets of feature pairings from which to compute hypotheses about the translation and rotation between the maps. By providing a (possibly multi-modal) probability distribution of the relative pose of the maps, our method can be seamlessly integrated into large-scale mapping frameworks for mobile robots. This paper provides a benchmarking of different detectors and descriptors, along with extensive experimental results that illustrate the robustness of the algorithm with a 97% success ratio in loop-closure detection for ~1700 matchings between local maps obtained from four publicly available datasets.
APA, Harvard, Vancouver, ISO, and other styles
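The entry above estimates the translation and rotation between occupancy grid maps from sparse feature pairings with a modified, multi-hypothesis RANSAC. The sketch below shows only the standard single-hypothesis building block, assuming point-feature correspondences are already available: RANSAC over minimal samples with a Kabsch-style least-squares fit of a 2D rigid transform (numpy; thresholds and iteration counts are illustrative).

```python
import numpy as np

def fit_rigid_2d(p, q):
    """Least-squares rotation R and translation t such that q ~= R @ p + t."""
    cp, cq = p.mean(axis=0), q.mean(axis=0)
    H = (p - cp).T @ (q - cq)                 # cross-covariance (Kabsch)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def ransac_rigid_2d(p, q, iters=200, thresh=0.5, rng=np.random.default_rng(0)):
    best_inliers = np.zeros(len(p), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(p), size=2, replace=False)   # minimal sample
        R, t = fit_rigid_2d(p[idx], q[idx])
        err = np.linalg.norm((p @ R.T + t) - q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_rigid_2d(p[best_inliers], q[best_inliers]), best_inliers

# Toy example: 2D features under a known rotation/translation plus outliers.
rng = np.random.default_rng(1)
p = rng.uniform(-10, 10, size=(40, 2))
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
q = p @ R_true.T + np.array([2.0, -1.0]) + 0.05 * rng.normal(size=p.shape)
q[:8] = rng.uniform(-10, 10, size=(8, 2))     # corrupt some pairings
(R, t), inliers = ransac_rigid_2d(p, q)
print("recovered angle (deg):", np.degrees(np.arctan2(R[1, 0], R[0, 0])),
      "inliers:", inliers.sum())
```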
12

Bashiri, Fereshteh, Ahmadreza Baghaie, Reihaneh Rostami, Zeyun Yu, and Roshan D’Souza. "Multi-Modal Medical Image Registration with Full or Partial Data: A Manifold Learning Approach." Journal of Imaging 5, no. 1 (December 30, 2018): 5. http://dx.doi.org/10.3390/jimaging5010005.

Full text
Abstract:
Multi-modal image registration is the primary step in integrating information stored in two or more images, which are captured using multiple imaging modalities. In addition to intensity variations and structural differences between images, they may have partial or full overlap, which adds an extra hurdle to the success of the registration process. In this contribution, we propose a multi-modal to mono-modal transformation method that facilitates direct application of well-founded mono-modal registration methods in order to obtain accurate alignment of multi-modal images in both cases, with complete (full) and incomplete (partial) overlap. The proposed transformation facilitates recovering strong scales, rotations, and translations. We explain the method thoroughly and discuss the choice of parameters. For evaluation purposes, the effectiveness of the proposed method is examined and compared with widely used information theory-based techniques using simulated and clinical human brain images with full data. Using the RIRE dataset, mean absolute errors of 1.37, 1.00, and 1.41 mm are obtained for registering CT images with PD-, T1-, and T2-MRIs, respectively. In the end, we empirically investigate the efficacy of the proposed transformation in registering multi-modal partially overlapped images.
APA, Harvard, Vancouver, ISO, and other styles
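The entry above benchmarks its transformation against the information-theoretic similarity measures that are standard in multi-modal registration. For context, here is a minimal numpy sketch of the usual histogram-based mutual information between two images; this is the baseline family referred to, not the paper's manifold-learning method, and the bin count is an arbitrary choice.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Histogram estimate of mutual information between two equal-size images."""
    hist_2d, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = hist_2d / hist_2d.sum()             # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginals
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Toy check: an image shares more information with a monotone remapping of
# itself (a crude stand-in for "same anatomy, different modality") than with
# unrelated noise, which is why MI works as a multi-modal similarity measure.
rng = np.random.default_rng(0)
img = rng.random((128, 128))
remapped = np.sqrt(img)                       # intensity mapping, same structure
noise = rng.random((128, 128))
print(mutual_information(img, remapped), ">", mutual_information(img, noise))
```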
13

Lu, Chunying. "On the Translation of Chinese Diplomatic Discourse from the Perspective of Multi-Modal Discourse: Take the China-US High-Level Strategic Dialogue as an Example." Asia-Pacific Journal of Humanities and Social Sciences 2, no. 2 (June 30, 2022): 153–59. http://dx.doi.org/10.53789/j.1653-0465.2022.0202.019.p.

Full text
Abstract:
Translation is not only a task but also a tool in international political competition. It is urgent to strengthen and improve international communication work in order to present a real, three-dimensional, and comprehensive China, and the English translation of diplomatic discourse matters here as well. This study finds that the English translation of Chinese diplomatic discourse is mainly delivered in verbal symbols, with little attention given to pictures, images, audio, and other multi-modal forms. The latter, however, are often more intuitive and vivid and can enhance the communication effect from a multi-angle, three-dimensional perspective, increasing audience coverage. From the perspective of multi-modal discourse analysis, this paper examines the China-US Alaska high-level strategic dialogue in two respects, i.e., verbal and non-verbal modal discourse analysis, in the hope of shedding some light on the English translation of Chinese diplomatic discourse.
APA, Harvard, Vancouver, ISO, and other styles
14

Wang, Edmond. "Glioblastoma Synthesis and Segmentation with 3D Multi-Modal MRI: A Study using Generative Adversarial Networks." International Journal on Computational Science & Applications 11, no. 6 (December 31, 2021): 1–14. http://dx.doi.org/10.5121/ijcsa.2021.11601.

Full text
Abstract:
The Grade IV cancer Glioblastoma is an extremely common and aggressive brain tumour. It is of significant consequence that histopathologic examinations should be able to identify and capture the tumour’s genetic variability for assistance in treatment. The use of Deep Learning - in particular CNNs and GANs - has become prominent in dealing with various image segmentation and detection tasks. GANs have a further importance: they can expand the available training set by generating realistic pseudo-medical images. Multi-modal MRIs, moreover, are also crucial as they lead to more successful performance. Nonetheless, accurate segmentation and realistic image synthesis remain challenging tasks. In this study, the history and various breakthroughs/challenges of utilising deep learning in glioblastoma detection are outlined and evaluated. To see networks in action, an adjusted and calibrated Vox2Vox network - a 3D implementation of the Pix2Pix translator - is trained on the biggest public brain tumour dataset, BraTS 2020. The experimental results demonstrate the versatility and improvability of GAN networks in both augmentation and segmentation. Overall, deep learning in medical imaging remains an extremely intoxicating field full of meticulous and innovative new studies.
APA, Harvard, Vancouver, ISO, and other styles
15

Desjardins, Renée. "Inter-Semiotic Translation within the Space of the Multimodal Text." TranscUlturAl: A Journal of Translation and Cultural Studies 1, no. 1 (August 8, 2008): 48. http://dx.doi.org/10.21992/t9f63h.

Full text
Abstract:
Though Jakobson conceptualized inter-semiotic transfer as a valid form of translation between texts some 40 years ago, it remained a relatively marginal area of investigation until recently. Additionally, we may note that information and communication technologies (ICTs) have also markedly modified previously held notions of what constitutes a “text”. Basing my research on these two observations, this paper will draw attention to inter-semiotic translation within the space of the newscast, a prime example of an ever-evolving multi-modal text created by newer forms of ICTs. More specifically, the focus lies in how we can posit translational activity in the construction of the newscast (for instance, how the “visual” images translate the “verbal” narration of the newscaster or journalist). This conceptualization of “news-making” will lead us to consider the ways in which cultural stereotypes are created through the “translation” of interacting texts (verbal, visual, aural) on the same interface. To suggest that newscasts, and by extension media, proliferate cultural stereotypes, is by no means novel. However, to consider how inter-semiotic translation plays a role in their creation may be a departure from previous paradigms. In fact, Kress and van Leeuwen state: “This incessant process of ‘translation’, or ‘transcoding’ – ‘transduction’ – between a range of semiotic modes represents […] a better, more adequate understanding of representation and communication (2006:39)”. Furthermore, because cultural stereotyping is often at the root of conflict, this type of investigation becomes, we suggest, all the more worthwhile in understanding how we “translate” difference across borders, semiotic or otherwise.
APA, Harvard, Vancouver, ISO, and other styles
16

Zhao, Shanshan, Lixiang Li, Haipeng Peng, Zihang Yang, and Jiaxuan Zhang. "Image Caption Generation via Unified Retrieval and Generation-Based Method." Applied Sciences 10, no. 18 (September 8, 2020): 6235. http://dx.doi.org/10.3390/app10186235.

Full text
Abstract:
Image captioning is a multi-modal transduction task, translating the source image into the target language. Numerous dominant approaches primarily employed the generation-based or the retrieval-based method. These two kinds of frameworks have their advantages and disadvantages. In this work, we make the best of their respective advantages. We adopt the retrieval-based approach to search for visually similar images and their corresponding captions for each queried image in the MSCOCO data set. Based on the retrieved similar sequences and the visual features of the queried image, the proposed de-noising module yielded a set of attended textual features which brought additional textual information for the generation-based model. Finally, the decoder makes use of not only the visual features but also the textual features to generate the output descriptions. Additionally, the incorporated visual encoder and the de-noising module can be applied as a preprocessing component for the decoder-based attention mechanisms. We evaluate the proposed method on the MSCOCO benchmark data set. Extensive experiments yield state-of-the-art performance, and the incorporated module raises the baseline models in terms of almost all the evaluation metrics.
APA, Harvard, Vancouver, ISO, and other styles
17

Ferrari, Luca, Fabio Dell’Acqua, Peng Zhang, and Peijun Du. "Integrating EfficientNet into an HAFNet Structure for Building Mapping in High-Resolution Optical Earth Observation Data." Remote Sensing 13, no. 21 (October 29, 2021): 4361. http://dx.doi.org/10.3390/rs13214361.

Full text
Abstract:
Automated extraction of buildings from Earth observation (EO) data is important for various applications, including updating of maps, risk assessment, urban planning, and policy-making. Combining data from different sensors, such as high-resolution multispectral images (HRI) and light detection and ranging (LiDAR) data, has shown great potential in building extraction. Deep learning (DL) is increasingly used in multi-modal data fusion and urban object extraction. However, DL-based multi-modal fusion networks may under-perform due to insufficient learning of “joint features” from multiple sources and oversimplified approaches to fusing multi-modal features. Recently, a hybrid attention-aware fusion network (HAFNet) has been proposed for building extraction from a dataset, including co-located Very-High-Resolution (VHR) optical images and light detection and ranging (LiDAR) joint data. The system reported good performances thanks to the adaptivity of the attention mechanism to the features of the information content of the three streams but suffered from model over-parametrization, which inevitably leads to long training times and heavy computational load. In this paper, the authors propose a restructuring of the scheme, which involved replacing VGG-16-like encoders with the recently proposed EfficientNet, whose advantages counteract exactly the issues found with the HAFNet scheme. The novel configuration was tested on multiple benchmark datasets, reporting great improvements in terms of processing times, and also in terms of accuracy. The new scheme, called HAFNetE (HAFNet with EfficientNet integration), appears indeed capable of achieving good results with less parameters, translating into better computational efficiency. Based on these findings, we can conclude that, given the current advancements in single-thread schemes, the classical multi-thread HAFNet scheme could be effectively transformed by the HAFNetE scheme by replacing VGG-16 with EfficientNet blocks on each single thread. The remarkable reduction achieved in computational requirements moves the system one step closer to on-board implementation in a possible, future “urban mapping” satellite constellation.
APA, Harvard, Vancouver, ISO, and other styles
18

Lin, Jia-Ren, Yu-An Chen, Daniel Campton, Jeremy Cooper, Shannon Coy, Clarence Yapp, Erin McCarty, et al. "Abstract A028: Multi-modal digital pathology by sequential acquisition and joint analysis of highly multiplexed immunofluorescence and hematoxylin and eosin images." Cancer Research 82, no. 23_Supplement_1 (December 1, 2022): A028. http://dx.doi.org/10.1158/1538-7445.crc22-a028.

Full text
Abstract:
Abstract Histopathology using Hematoxylin and Eosin (H&E) stained tissue sections plays a central role in the diagnosis and staging of diseases. The transition to digital H&E pathology affords an opportunity for integration with recently developed, highly multiplexed tissue imaging methods. Here we describe an approach (and instrument) for collecting and analyzing H&E and high-plex immunofluorescence (IF) images from the same cells at subcellular-resolution in a whole-slide format suitable for translational and clinical research and eventual deployment in a diagnostic setting. IF and H&E images provide highly complementary information for analysis by human experts and machine learning algorithms. Using images of 40 human colorectal cancer resections, we demonstrate the automated generation and ranking of computational models, based either on immune infiltration or tumor-intrinsic features, that are highly predictive of progression-free survival. When these models are combined, a hazard ratio of ~0.045 can be achieved, suggesting the potential of integrated H&E and high-plex imaging in the generation of high performance prognostic biomarkers. Citation Format: Jia-Ren Lin, Yu-An Chen, Daniel Campton, Jeremy Cooper, Shannon Coy, Clarence Yapp, Erin McCarty, Keith L. Ligon, Steven Reese, Tad George, Sandro Santagata, Peter Sorger. Multi-modal digital pathology by sequential acquisition and joint analysis of highly multiplexed immunofluorescence and hematoxylin and eosin images [abstract]. In: Proceedings of the AACR Special Conference on Colorectal Cancer; 2022 Oct 1-4; Portland, OR. Philadelphia (PA): AACR; Cancer Res 2022;82(23 Suppl_1):Abstract nr A028.
APA, Harvard, Vancouver, ISO, and other styles
19

Gu, Xiaoling, Jun Yu, Yongkang Wong, and Mohan S. Kankanhalli. "Toward Multi-Modal Conditioned Fashion Image Translation." IEEE Transactions on Multimedia, 2020, 1. http://dx.doi.org/10.1109/tmm.2020.3009500.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Yan, Shouang, Chengyan Wang, Weibo Chen, and Jun Lyu. "Swin transformer-based GAN for multi-modal medical image translation." Frontiers in Oncology 12 (August 8, 2022). http://dx.doi.org/10.3389/fonc.2022.942511.

Full text
Abstract:
Medical image-to-image translation is considered a new direction with many potential applications in the medical field. Medical image-to-image translation is dominated by two models: the supervised Pix2Pix and the unsupervised cycle-consistency generative adversarial network (GAN). However, existing methods still have two shortcomings: 1) Pix2Pix requires paired and pixel-aligned images, which are difficult to acquire, while the optimum output of the cycle-consistency model may not be unique; 2) they are still deficient in capturing global features and modeling long-distance interactions, which are critical for regions with complex anatomical structures. We propose a Swin Transformer-based GAN for Multi-Modal Medical Image Translation, named MMTrans. Specifically, MMTrans consists of a generator, a registration network, and a discriminator. The Swin Transformer-based generator can generate images with the same content as the source modality images and style information similar to the target modality images. The encoder part of the registration network, based on the Swin Transformer, is utilized to predict deformable vector fields. The convolution-based discriminator determines whether the target modality images come from the generator or from the real images. Extensive experiments conducted on a public dataset and clinical datasets showed that our network outperformed other advanced medical image translation methods on both aligned and unpaired datasets and has great potential for clinical applications.
APA, Harvard, Vancouver, ISO, and other styles
21

Dufera, Bisrat Derebssa. "Improved Training of Multi-Modal Unsupervised Image-to-Image Translation." SSRN Electronic Journal, 2022. http://dx.doi.org/10.2139/ssrn.4221662.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Gu, Xiaoling, Jie Huang, Yongkang Wong, Jun Yu, Jianping Fan, Pai Peng, and Mohan S. Kankanhalli. "PAINT: Photo-realistic Fashion Design Synthesis." ACM Transactions on Multimedia Computing, Communications, and Applications, June 30, 2022. http://dx.doi.org/10.1145/3545610.

Full text
Abstract:
In this paper, we investigate a new problem of generating a variety of multi-view fashion designs conditioned on a human pose and texture examples of arbitrary sizes, which can replace the repetitive and low-level design work for fashion designers. To solve this challenging multi-modal image translation problem, we propose a novel Photo-reAlistic fashIon desigN synThesis (PAINT) framework, which decomposes the framework into three manageable stages. In the first stage, we employ a Layout Generative Network (LGN) to transform an input human pose into a series of person semantic layouts. In the second stage, we propose a Texture Synthesis Network (TSN) to synthesize textures on all transformed semantic layouts. Specifically, we design a novel attentive texture transfer mechanism for precisely expanding texture patches to the irregular clothing regions of the target fashion designs. In the third stage, we leverage an Appearance Flow Network (AFN) to generate the fashion design images of other viewpoints from a single-view observation by learning 2D multi-scale appearance flow fields. Experimental results demonstrate that our method is capable of generating diverse photo-realistic multi-view fashion design images with fine-grained appearance details conditioned on the provided multiple inputs. The source code and trained models are available at https://github.com/gxl-groups/PAINT.
APA, Harvard, Vancouver, ISO, and other styles
23

Kofler, Florian, Ivan Ezhov, Lucas Fidon, Carolin M. Pirkl, Johannes C. Paetzold, Egon Burian, Sarthak Pati, et al. "Robust, Primitive, and Unsupervised Quality Estimation for Segmentation Ensembles." Frontiers in Neuroscience 15 (December 30, 2021). http://dx.doi.org/10.3389/fnins.2021.752780.

Full text
Abstract:
A multitude of image-based machine learning segmentation and classification algorithms has recently been proposed, offering diagnostic decision support for the identification and characterization of glioma, Covid-19 and many other diseases. Even though these algorithms often outperform human experts in segmentation tasks, their limited reliability, and in particular the inability to detect failure cases, has hindered translation into clinical practice. To address this major shortcoming, we propose an unsupervised quality estimation method for segmentation ensembles. Our primitive solution examines discord in binary segmentation maps to automatically flag segmentation results that are particularly error-prone and therefore require special assessment by human readers. We validate our method both on segmentation of brain glioma in multi-modal magnetic resonance images and of lung lesions in computed tomography images. Additionally, our method provides an adaptive prioritization mechanism to maximize efficacy in use of human expert time by enabling radiologists to focus on the most difficult, yet important cases while maintaining full diagnostic autonomy. Our method offers an intuitive and reliable uncertainty estimation from segmentation ensembles and thereby closes an important gap toward successful translation of automatic segmentation into clinical routine.
APA, Harvard, Vancouver, ISO, and other styles
24

Ramalhinho, João, Bongjin Koo, Nina Montaña-Brown, Shaheer U. Saeed, Ester Bonmati, Kurinchi Gurusamy, Stephen P. Pereira, Brian Davidson, Yipeng Hu, and Matthew J. Clarkson. "Deep hashing for global registration of untracked 2D laparoscopic ultrasound to CT." International Journal of Computer Assisted Radiology and Surgery, April 2, 2022. http://dx.doi.org/10.1007/s11548-022-02605-3.

Full text
Abstract:
Purpose: The registration of Laparoscopic Ultrasound (LUS) to CT can enhance the safety of laparoscopic liver surgery by providing the surgeon with awareness on the relative positioning between critical vessels and a tumour. In an effort to provide a translatable solution for this poorly constrained problem, Content-based Image Retrieval (CBIR) based on vessel information has been suggested as a method for obtaining a global coarse registration without using tracking information. However, the performance of these frameworks is limited by the use of non-generalisable handcrafted vessel features. Methods: We propose the use of a Deep Hashing (DH) network to directly convert vessel images from both LUS and CT into fixed size hash codes. During training, these codes are learnt from a patient-specific CT scan by supplying the network with triplets of vessel images which include both a registered and a mis-registered pair. Once hash codes have been learnt, they can be used to perform registration with CBIR methods. Results: We test a CBIR pipeline on 11 sequences of untracked LUS distributed across 5 clinical cases. Compared to a handcrafted feature approach, our model improves the registration success rate significantly from 48% to 61%, considering a 20 mm error as the threshold for a successful coarse registration. Conclusions: We present the first DH framework for interventional multi-modal registration tasks. The presented approach is easily generalisable to other registration problems, does not require annotated data for training, and may promote the translation of these techniques.
APA, Harvard, Vancouver, ISO, and other styles
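The entry above trains a Deep Hashing network on triplets of vessel images so that registered LUS/CT pairs map to nearby binary codes usable for content-based retrieval. The exact architecture is not given here; the following hypothetical PyTorch sketch shows the general triplet-trained hashing pattern (a tiny convolutional encoder, tanh relaxation during training, sign binarisation at retrieval time), with all layer sizes and the 64-bit code length chosen for illustration.

```python
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    """Map an image to a K-bit code; tanh keeps training differentiable."""

    def __init__(self, bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, bits))

    def forward(self, x):
        return torch.tanh(self.net(x))        # relaxed codes for training

    @torch.no_grad()
    def hash(self, x):
        return (self.net(x) > 0).float()      # binary codes for retrieval

encoder = HashEncoder()
triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Toy triplets: an anchor vessel image, a registered counterpart (positive)
# and a mis-registered one (negative); random tensors stand in for real data.
for _ in range(3):
    anchor, pos, neg = (torch.rand(8, 1, 64, 64) for _ in range(3))
    loss = triplet(encoder(anchor), encoder(pos), encoder(neg))
    opt.zero_grad()
    loss.backward()
    opt.step()

codes = encoder.hash(torch.rand(4, 1, 64, 64))
print(codes.shape, codes[0][:8])              # 64-bit code per image
```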
25

Cui, Ying, and Hui Wang. "Film song translation: Verbal, vocal, and visual dimensions." Babel. Revue internationale de la traduction / International Journal of Translation, July 19, 2022. http://dx.doi.org/10.1075/babel.00280.cui.

Full text
Abstract:
As films are distributed across the globe, film song translation has become a subject of study, which entails considering multi-modal factors. This paper aims to explore the major dimensions and parameters involved in film song translation. Based on previous research on music and translation, this paper proposes a framework for studying film song translation from verbal, vocal, and visual dimensions. The verbal dimension involves semantic meaning, metaphors, images, mood, and emotion. The vocal dimension includes the number of syllables and musical notes, the length of musical notes, rhyme and parallelism, the rise and fall of the melody, and the segmentation of a line. The visual dimension covers the plot, characters, and background pictures. This paper uses this framework to analyze the Chinese translation of Amazing Grace in the film Forever Young to demonstrate how film song translation can be flexible in tackling verbal, vocal, and visual restrictions and possibilities.
APA, Harvard, Vancouver, ISO, and other styles
26

Cornia, Marcella, Lorenzo Baraldi, and Rita Cucchiara. "Explaining transformer-based image captioning models: An empirical analysis." AI Communications, October 29, 2021, 1–19. http://dx.doi.org/10.3233/aic-210172.

Full text
Abstract:
Image Captioning is the task of translating an input image into a textual description. As such, it connects Vision and Language in a generative fashion, with applications that range from multi-modal search engines to help visually impaired people. Although recent years have witnessed an increase in accuracy in such models, this has also brought increasing complexity and challenges in interpretability and visualization. In this work, we focus on Transformer-based image captioning models and provide qualitative and quantitative tools to increase interpretability and assess the grounding and temporal alignment capabilities of such models. Firstly, we employ attribution methods to visualize what the model concentrates on in the input image, at each step of the generation. Further, we propose metrics to evaluate the temporal alignment between model predictions and attribution scores, which allows measuring the grounding capabilities of the model and spot hallucination flaws. Experiments are conducted on three different Transformer-based architectures, employing both traditional and Vision Transformer-based visual features.
APA, Harvard, Vancouver, ISO, and other styles
27

Fan, Yao, Jiaji Li, Linpeng Lu, Jiasong Sun, Yan Hu, Jialin Zhang, Zhuoshi Li, et al. "Smart computational light microscopes (SCLMs) of smart computational imaging laboratory (SCILab)." PhotoniX 2, no. 1 (September 3, 2021). http://dx.doi.org/10.1186/s43074-021-00040-2.

Full text
Abstract:
Computational microscopy, as a subfield of computational imaging, combines optical manipulation and image algorithmic reconstruction to recover multi-dimensional microscopic images or information of micro-objects. In recent years, the revolution in light-emitting diodes (LEDs), low-cost consumer image sensors, modern digital computers, and smartphones provides fertile opportunities for the rapid development of computational microscopy. Consequently, diverse forms of computational microscopy have been invented, including digital holographic microscopy (DHM), transport of intensity equation (TIE), differential phase contrast (DPC) microscopy, lens-free on-chip holography, and Fourier ptychographic microscopy (FPM). These computational microscopy techniques not only provide high-resolution, label-free, quantitative phase imaging capability but also decipher new and advanced biomedical research and industrial applications. Nevertheless, most computational microscopy techniques are still at an early stage of “proof of concept” or “proof of prototype” (based on commercially available microscope platforms). Translating those concepts to stand-alone optical instruments for practical use is an essential step for the promotion and adoption of computational microscopy by the wider bio-medicine, industry, and education community. In this paper, we present four smart computational light microscopes (SCLMs) developed by our laboratory, i.e., smart computational imaging laboratory (SCILab) of Nanjing University of Science and Technology (NJUST), China. These microscopes are empowered by advanced computational microscopy techniques, including digital holography, TIE, DPC, lensless holography, and FPM, which not only enable multi-modal contrast-enhanced observations for unstained specimens, but also can recover their three-dimensional profiles quantitatively. We introduce their basic principles, hardware configurations, reconstruction algorithms, and software design, quantify their imaging performance, and illustrate their typical applications for cell analysis, medical diagnosis, and microlens characterization.
APA, Harvard, Vancouver, ISO, and other styles
28

Droumeva, Milena. "Curating Everyday Life: Approaches to Documenting Everyday Soundscapes." M/C Journal 18, no. 4 (August 10, 2015). http://dx.doi.org/10.5204/mcj.1009.

Full text
Abstract:
In the last decade, the cell phone’s transformation from a tool for mobile telephony into a multi-modal, computational “smart” media device has engendered a new kind of emplacement, and the ubiquity of technological mediation into the everyday settings of urban life. With it, a new kind of media literacy has become necessary for participation in the networked social publics (Ito; Jenkins et al.). Increasingly, the way we experience our physical environments, make sense of immediate events, and form impressions is through the lens of the camera and through the ear of the microphone, framed by the mediating possibilities of smartphones. Adopting these practices as a kind of new media “grammar” (Burn 29)—a multi-modal language for public and interpersonal communication—offers new perspectives for thinking about the way in which mobile computing technologies allow us to explore our environments and produce new types of cultural knowledge. Living in the Social Multiverse Many of us are concerned about new cultural practices that communication technologies bring about. In her now classic TED talk “Connected but alone?” Sherry Turkle talks about the world of instant communication as having the illusion of control through which we micromanage our immersion in mobile media and split virtual-physical presence. According to Turkle, what we fear is, on the one hand, being caught unprepared in a spontaneous event and, on the other hand, missing out or not documenting or recording events—a phenomenon that Abha Dawesar calls living in the “digital now.” There is, at the same time, a growing number of ways in which mobile computing devices connect us to new dimensions of everyday life and everyday experience: geo-locative services and augmented reality, convergent media and instantaneous participation in the social web. These technological capabilities arguably shift the nature of presence and set the stage for mobile users to communicate the flow of their everyday life through digital storytelling and media production. According to a Digital Insights survey on social media trends (Bennett), more than 500 million tweets are sent per day and 5 Vines tweeted every second; 100 hours of video are uploaded to YouTube every minute; more than 20 billion photos have been shared on Instagram to date; and close to 7 million people actively produce and publish content using social blogging platforms. There are more than 1 billion smartphones in the US alone, and most social media platforms are primarily accessed using mobile devices. The question is: how do we understand the enormity of these statistics as a coherent new media phenomenon and as a predominant form of media production and cultural participation? More importantly, how do mobile technologies re-mediate the way we see, hear, and perceive our surrounding evironment as part of the cultural circuit of capturing, sharing, and communicating with and through media artefacts? Such questions have furnished communication theory even before McLuhan’s famous tagline “the medium is the message”. Much of the discourse around communication technology and the senses has been marked by distinctions between “orality” and “literacy” understood as forms of collective consciousness engendered by technological shifts. 
Leveraging Jonathan Sterne’s critique of this “audio-visual litany”, an exploration of convergent multi-modal technologies allows us to focus instead on practices and techniques of use, considered as both perceptual and cultural constructs that reflect and inform social life. Here in particular, a focus on sound—or aurality—can help provide a fresh new entry point into studying technology and culture. The phenomenon of everyday photography is already well conceptualised as a cultural expression and a practice connected with identity construction and interpersonal communication (Pink, Visual). Much more rarely do we study the act of capturing information using mobile media devices as a multi-sensory practice that entails perceptual techniques as well as aesthetic considerations, and as something that in turn informs our unmediated sensory experience. Daisuke and Ito argue that—in contrast to hobbyist high-quality photographers—users of camera phones redefine the materiality of urban surroundings as “picture-worthy” (or not) and elevate the “mundane into a photographic object.” Indeed, whereas traditionally recordings and photographs hold institutional legitimacy as reliable archival references, the proliferation of portable smart technologies has transformed user-generated content into the gold standard for authentically representing the everyday. Given that visual approaches to studying these phenomena are well underway, this project takes a sound studies perspective, focusing on mediated aural practices in order to explore the way people make sense of their everyday acoustic environments using mobile media. Curation, in this sense, is a metaphor for everyday media production, illuminated by the practice of listening with mobile technology. Everyday Listening with Technology: A Case Study The present conceptualisation of curation emerged out of a participant-driven qualitative case study focused on using mobile media to make sense of urban everyday life. The study comprised 10 participants using iPod Touches (a device equivalent to an iPhone, without the phone part) to produce daily “aural postcards” of their everyday soundscapes and sonic experiences, over the course of two to four weeks. This work was further informed by, and updates, sonic ethnography approaches nascent in the World Soundscape Project, and the field of soundscape studies more broadly. Participants were asked to fill out a questionnaire about their media and technology use, in order to establish their participation in new media culture and correlate that to the documentary styles used in their aural postcards. With regard to capturing sonic material, participants were given open-ended instructions as to content and location, and encouraged to use the full capabilities of the device—that is, to record audio, video, and images, and to use any applications on the device. Specifically, I drew their attention to a recording app (Recorder) and a decibel measurement app (dB), which combines a photo with a static readout of ambient sound levels. One way most participants described the experience of capturing sound in a collection of recordings for a period of time was as making a “digital scrapbook” or a “media diary.” Even though they had recorded individual (often unrelated) soundscapes, almost everyone felt that the final product came together as a stand-alone collection—a kind of gallery of personalised everyday experiences that participants, if anything, wished to further organise, annotate, and flesh out. 
Examples of aural postcard formats used by participants: decibel photographs of everyday environments and a comparison audio recording of rain on a car roof with and without wipers (in the middle). Working with 139 aural postcards comprising more than 250 audio files and 150 photos and videos, the first step in the analysis was to articulate approaches to media documentation in terms of format, modality, and duration as deliberate choices in conversation with dominant media forms that participants regularly consume and are familiar with. Ambient sonic recordings (audio-only) comprised a large chunk of the data, and within this category there were two approaches: the sonic highlight, a short vignette of a given soundscape with minimal or no introduction or voice-over; and the process recording, featuring the entire duration of an unfolding soundscape or event. Live commentaries, similar to the conventions set forth by radio documentaries, represented voice-over entries at the location of the sound event, sometimes stationary and often in motion as the event unfolded. Voice memos described verbal reflections, pre- or post- sound event, with no discernable ambience—that is, participants intended them to serve as reflective devices rather than as part of the event. Finally, a number of participants also used the sound level meter app, which allowed them to generate visual records of the sonic levels of a given environment or location in the form of sound level photographs. Recording as a Way of Listening In their community soundwalking practice, Förnstrom and Taylor refer to recording sound in everyday settings as taking world experience, mediating it through one’s body and one’s memories and translating it into approximate experience. The media artefacts generated by participants as part of this study constitute precisely such ‘approximations’ of everyday life accessed through aural experience and mediated by the technological capabilities of the iPod. Thinking of aural postcards along this technological axis, the act of documenting everyday soundscapes involves participants acting as media producers, ‘framing’ urban everyday life through a mobile documentary rubric. In the process of curating these documentaries, they have to make decisions about the significance and stylistic framing of each entry and the message they wish to communicate. In order to bring the scope of these curatorial decisions into dialogue with established media forms, in this work’s analysis I combine Bill Nichols’s classification of documentary modes in cinema with Karin Bijsterveld’s concept of soundscape ‘staging’ to characterise the various approaches participants took to the multi-modal curation of their everyday (sonic) experience. In her recent book on the staging of urban soundscapes in both creative and documentary/archival media, Bijsterveld describes the representation of sound as particular ‘dramatisations’ that construct different kinds of meanings about urban space and engender different kinds of listening positions. Nichols’s articulation of cinematic documentary modes helps detail ways in which the author’s intentionality is reflected in the styling, design, and presentation of filmic narratives. Michel Chion’s discussion of cinematic listening modes further contextualises the cultural construction of listening that is a central part of both design and experience of media artefacts. 
The conceptual lens is especially relevant to understanding mobile curation of mediated sonic experience as a kind of mobile digital storytelling. Working across all postcards, settings, and formats, the following four themes capture some of the dominant stylistic dimensions of mobile media documentation. The exploratory approach describes a methodology for representing everyday life as a flow, predominantly through ambient recordings of unfolding processes that participants referred to in the final discussion as a ‘turn it on and forget it’ approach to recording. As a stylistic method, the exploratory approach aligns most closely with Nichols’s poetic and observational documentary modes, combining a ‘window to the world’ aesthetic with minimal narration, striving to convey the ‘inner truth’ of phenomenal experience. In terms of listening modes reflected in this approach, exploratory aural postcards most strongly engage causal listening, to use Chion’s framework of cinematic listening modes. By and large, the exploratory approach describes incidental documentaries of routine events: soundscapes that are featured as a result of greater attentiveness and investment in the sonic aspects of everyday life. The entries created using this approach reflect a process of discovering (seeing and hearing) the ordinary as extra-ordinary; re-experiencing sometimes mundane and routine places and activities with a fresh perspective; and actively exploring hidden characteristics, nuances of meaning, and significance. For instance, in the following example, one participant explores a new neighborhood while on a work errand:The narrative approach to creating aural postcards stages sound as a springboard for recollecting memories and storytelling through reflecting on associations with other soundscapes, environments, and interactions. Rather than highlighting place, routine, or sound itself, this methodology constructs sound as a window into the identity and inner life of the recordist, mobilising most strongly a semantic listening mode through association and narrative around sound’s meaning in context (Chion 28). This approach combines a subjective narrative development with a participatory aesthetic that draws the listener into the unfolding story. This approach is also performative, in that it stages sound as a deeply subjective experience and approaches the narrative from a personally significant perspective. Most often this type of sound staging was curated using voice memo narratives about a particular sonic experience in conjunction with an ambient sonic highlight, or as a live commentary. Recollections typically emerged from incidental encounters, or in the midst of other observations about sound. In the following example a participant reminisces about the sound of wind, which, interestingly, she did not record: Today I have been listening to the wind. It’s really rainy and windy outside today and it was reminding me how much I like the sound of wind. And you know when I was growing up on the wide prairies, we sure had a lot of wind and sometimes I kind of miss the sound of it… (Participant 1) The aesthetic approach describes instances where the creation of aural postcards was motivated by a reduced listening position (Chion 29)—driven primarily by the qualities and features of the soundscape itself. 
This curatorial practice for staging mediated aural experience combines a largely subjective approach to documenting with an absence of traditional narrative development and an affective and evocative aesthetic. Where the exploratory documentary approach seeks to represent place, routine, environment, and context through sonic characteristics, the aesthetic approach features sound first and foremost, aiming to represent and comment on sound qualities and characteristics in a more ‘authentic’ manner. The media formats most often used in conjunction with this approach were the incidental ambient sonic highlight and the live commentary. In the following example, the making of coffee figures as an important domestic ritual whose auditory qualities are foregrounded:

That’s the sound of a stovetop percolator which I’ve been using for many years and I pretty much know exactly how long it takes to make a pot of coffee by the sound that it makes. As soon as it starts gurgling I know I have about a minute before it burns. It’s like the coffee calls and I come. (Participant 6)

The analytical approach characterises entries that stage mediated aural experience as a way of systematically and inductively investigating everyday phenomena. It is a conceptual, experimental methodology employed to confirm or disprove a ‘hypothesis’, or to form a theory about sonic relations, developed in the course of the study. As such, this approach most strongly aligns with Chion’s semantic listening mode, with the addition of the interactive element of analytical inquiry. In this context, sound is treated as a variable to be measured, compared, researched, and theorised about in an explicit attempt to form conclusions about social relationships, personal significance, place, or function. This analytical methodology combines an explicit and critical focus on the process of documenting itself (whether measuring decibels or systematically attending to sonic qualities) with a distinctive analytical synthesis presented as ‘formal discovery’ or even ‘truth.’ In using this approach, participants most often mobilised the format of short sonic highlights and follow-up voice memos. While these aural postcards typically contained sound level photographs (decibel measurement values), in some cases the inquiry and subsequent conclusions were reached inductively through sustained observation of a series of soundscapes. The following example is by a participant who exclusively recorded various domestic spaces in terms of sound levels, comparing and contrasting them using voice memos. The reflection below accompanies a sound level photograph of his home computer system:

So I decided to record sitting next to my computer today just because my computer is loud, so I wanted to see exactly how loud it really was. But I kept the door closed just to be sort of fair, see how quiet it could possibly get. I think it peaked at 75 decibels, and that’s like, I looked up a decibel scale, and apparently a lawn mower is like 90 decibels. (Participant 2)

Mediated Curation as a New Media Cultural Practice?

One aspect of applying the metaphor of ‘curation’ to everyday media production is that it shifts the critical discourse on aesthetic expression from the realm of specialised expertise to general practice (“Everyone’s a photographer”).
The act of curation is filtered through the aesthetic and technological capabilities of the smartphone, a device that has become co-constitutive of our routine sensorial encounters with the world. Revisiting McLuhan-inspired discourses on communication technologies stages the iPhone not as a device that itself shifts consciousness but as an agent in a media ecology co-constructed by the forces of use and design: a “crystallization of cultural practices” (Sterne). As such, mobile technology is continuously re-crystallised as design ‘constraints’ meet both normative and transgressive user approaches to interacting with everyday life. The concept of ‘social curation’ already exists in commercial discourse for social web marketing (O’Connell; Allton). High-traffic, wide-integration web services such as Digg and Pinterest, as well as older portals such as Reddit, all work on the principle of arranging user-generated, web-aggregated, and re-purposed content around custom themes. From a business perspective, however, the notion of ‘social curation’ captures, unsurprisingly, only the surface level of consumer behaviour rather than the kinds of values and meaning that this process holds for people. In the more traditional sense, art curation involves aesthetic, pragmatic, epistemological, and communicative choices about the subject of (re)presentation, including considerations such as manner of display, intended audience, and affective and phenomenal impact. In his 2012 book tracing the discourse and culture of curating, Paul O’Neill proposes that over the last few decades the role of the curator has shifted from that of arts administrator to that of an important agent in the production of cultural experiences: an influential cultural figure in her own right, independent of artistic content (88). Such discursive shifts in the formulation of ‘curatorship’ can easily be transposed from a specialised to a generalised context of cultural production, in which everyone with the technological means to capture, share, and frame the material and sensory content of everyday life is a curator of sorts. Each of us is an agent with a unique aesthetic and epistemological perspective, regardless of the content we curate. The entire communicative exchange is necessarily located within a nexus of new media practices, as an activity that simultaneously frames a cultural construction of sensory experience and serves as a cultural production of the self.

To return to the question of listening, and to a sound studies perspective on mediated cultural practices: technology has not single-handedly changed the way we listen and attend to everyday experience, but it has certainly influenced the range and manner in which we make sense of the sensory ‘everyday’. Unlike acoustic listening, mobile digital technologies prompt us to frame sonic experience in a multi-modal and multi-medial fashion: through the microphone, through the camera, and through the interactive, analytical capabilities of the device itself. Each decision for sensory capture, as a curatorial act, is both epistemological and aesthetic; it implies personal significance and an intention to communicate meaning. The occurrences that are captured constitute impressions, highlights, significant moments, emotions, reflections, experiments, and creative efforts: very different knowledge artefacts from those produced through textual means.
Framing phenomenal experience (in this case, listening) in this way is, I argue, a core characteristic of a more general type of new media literacy and sensibility: that of multi-modal documenting of sensory materialities, or the curation of everyday life.

References

Allton, Mike. “5 Cool Content Curation Tools for Social Marketers.” Social Media Today 15 Apr. 2013. 10 June 2015 ‹http://socialmediatoday.com/mike-allton/1378881/5-cool-content-curation-tools-social-marketers›.

Bennett, Shea. “Social Media Stats 2014.” Mediabistro 9 June 2014. 20 June 2015 ‹http://www.mediabistro.com/alltwitter/social-media-statistics-2014_b57746›.

Bijsterveld, Karin, ed. Soundscapes of the Urban Past: Staged Sound as Mediated Cultural Heritage. Bielefeld: Transcript-Verlag, 2013.

Burn, Andrew. Making New Media: Creative Production and Digital Literacies. New York: Peter Lang, 2009.

Chion, Michel. Audio-Vision: Sound on Screen. New York: Columbia UP, 1994.

Förnstrom, Mikael, and Sean Taylor. “Creative Soundwalks.” Urban Soundscapes and Critical Citizenship Symposium, Limerick, Ireland, 27–29 Mar. 2014.

Ito, Mizuko, ed. Hanging Out, Messing Around, and Geeking Out: Kids Living and Learning with New Media. Cambridge, MA: MIT Press, 2010.

Jenkins, Henry, Ravi Purushotma, Margaret Weigel, Katie Clinton, and Alice J. Robison. Confronting the Challenges of Participatory Culture: Media Education for the 21st Century. White paper prepared for the MacArthur Foundation, 2006.

McLuhan, Marshall. Understanding Media: The Extensions of Man. New York: McGraw-Hill, 1964.

Nichols, Bill. Introduction to Documentary. Bloomington and Indianapolis: Indiana UP, 2001.

Nielsen. “State of the Media – The Social Media Report.” Nielsen 4 Dec. 2012. 12 May 2015 ‹http://www.nielsen.com/us/en/insights/reports/2012/state-of-the-media-the-social-media-report-2012.html›.

O’Connell, Judy. “Social Content Curation – A Shift from the Traditional.” 8 Aug. 2011. 11 May 2015 ‹http://judyoconnell.com/2011/08/08/social-content-curation-a-shift-from-the-traditional/›.

Okabe, Daisuke, and Mizuko Ito. “Camera Phones Changing the Definition of Picture-worthy.” Japan Media Review. 8 Aug. 2015 ‹http://www.dourish.com/classes/ics234cw04/ito3.pdf›.

O’Neill, Paul. The Culture of Curating and the Curating of Culture(s). Cambridge, MA: MIT Press, 2012.

Pink, Sarah. Doing Visual Ethnography. London: Sage, 2007.

———. Situating Everyday Life. London: Sage, 2012.

Schafer, R. Murray, ed. World Soundscape Project: European Sound Diary. Reprint. Vancouver: A.R.C. Publications, 1977.

Sterne, Jonathan. The Audible Past: Cultural Origins of Sound Reproduction. Durham, NC: Duke UP, 2003.

Turkle, Sherry. “Connected But Alone?” TED Talk, Feb. 2012. 8 Aug. 2015 ‹http://www.ted.com/talks/sherry_turkle_alone_together?language=en›.
APA, Harvard, Vancouver, ISO, and other styles
