Dissertations / Theses on the topic 'MULTI VIEW VIDEOS'
Consult the top 50 dissertations / theses for your research on the topic 'MULTI VIEW VIDEOS.'
Wang, Dongang. "Action Recognition in Multi-view Videos." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/19740.
Canavan, Shaun. "Face recognition by multi-frame fusion of rotating heads in videos." Connect to resource online, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1210446052.
Canavan, Shaun J. "Face Recognition by Multi-Frame Fusion of Rotating Heads in Videos." Youngstown State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1210446052.
Balusu, Anusha. "Multi-Vehicle Detection and Tracking in Traffic Videos Obtained from UAVs." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1593266183551245.
Twinanda, Andru Putra. "Vision-based approaches for surgical activity recognition using laparoscopic and RGBD videos." Thesis, Strasbourg, 2017. http://www.theses.fr/2017STRAD005/document.
The main objective of this thesis is to address the problem of activity recognition in the operating room (OR). Activity recognition is an essential component in the development of context-aware systems, which will enable various applications, such as automated assistance during difficult procedures. Here, we focus on vision-based approaches, since cameras are a common source of information for observing the OR without disrupting the surgical workflow. Specifically, we propose to use two complementary video types: laparoscopic and OR-scene RGBD videos. We investigate how state-of-the-art computer vision approaches perform on these videos and propose novel deep learning approaches to carry out the tasks. To evaluate the proposed approaches, we generate large datasets of recordings of real surgeries. The results demonstrate that the proposed approaches outperform state-of-the-art methods in surgical activity recognition on these new datasets.
Ozcinar, Cagri. "Multi-view video communication." Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/807807/.
Salvador Marcos, Jordi. "Surface reconstruction for multi-view video." Doctoral thesis, Universitat Politècnica de Catalunya, 2011. http://hdl.handle.net/10803/108907.
This thesis presents a set of techniques defining a methodology for obtaining an alternative representation of the video sequences captured by calibrated multi-camera systems in controlled environments with a known scene background. As the title suggests, this representation consists of a three-dimensional description of the surfaces of the foreground objects. This approach to representing multi-view data makes it possible to recover part of the three-dimensional information of the original scene that is lost in the projection performed by each camera. The choice of representation and the design of the reconstruction techniques respond to three requirements arising in controlled environments such as smart rooms or recording studios, where the sequences captured by the multi-camera system are used both for analysis applications and for different interactive visualization methods. The first requirement is that the reconstruction method must be fast, so that it can be used in interactive applications. The second is that the surface representation be efficient, resulting in a compression of the multi-view data. The third is that the representation be effective, i.e., usable in analysis applications as well as for visualization. Once the foreground and background contents of each view have been separated, which is possible in controlled environments with a known background, the strategy followed in the thesis is to divide the reconstruction process into two stages. The first obtains a sampling of the surfaces (including orientation and texture). The second provides closed, continuous surfaces from the set of samples through an interpolation process. The result of the first stage is a set of oriented points in 3D space that locally represent the position, orientation and texture of the surfaces visible to the set of cameras. The sampling process is interpreted as a search for 3D positions that result in correspondences of image features between different views. This search can be driven by different mechanisms, which are presented in the first part of the thesis. The first proposal is an image-based method that searches for surface samples along the half-line starting at the projection center of each camera and passing through a given point of the corresponding image. This method is well suited to exploiting photo-consistency in a static scene and has characteristics favorable for GPU implementation, which is desirable, but it is not designed to exploit the temporal redundancies present in multi-view sequences, nor does it provide closed surfaces. The second method searches from an initial sampled surface that encloses the space containing the objects to be reconstructed. Searching in the direction opposite to the normals, pointing inwards, yields closed surfaces with an algorithm that exploits the temporal correlation of the scene to evolve successive 3D reconstructions over time. A drawback of this method is the set of topological operations on the initial surface, which in general cannot be applied efficiently on GPUs.
The third sampling strategy targets parallelization on the GPU and the exploitation of temporal and spatial correlations in the search for surface samples. Starting from an initial search space containing the objects to be reconstructed, a few seed samples are sought at random on the object surfaces. New surface samples are then searched for around each seed, an expansion process, until a sufficient density is reached. To improve the efficiency of the initial seed search, the search space is reduced by exploiting temporal correlations in multi-view sequences on the one hand and by applying multi-resolution on the other. The expansion then proceeds, exploiting the spatial correlation in the distribution of the surface samples. The second part of the thesis presents a meshing algorithm that interpolates the surface between the samples. Starting from an initial triangle connecting three coherently oriented points, the surface is iteratively expanded over the complete set of samples. Compared with the state of the art, the proposed method yields a very precise reconstruction (it does not modify the positions of the samples) and results in a correct topology. Moreover, it is fast enough to be usable in interactive applications, unlike most available methods. The final results, applying both stages, sampling and interpolation, demonstrate the validity of the proposal. The experimental data show that the presented methodology provides a fast, efficient (compression) and effective (complete) representation of the foreground elements of the scene.
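A minimal sketch of the first, image-based sampling strategy described in this abstract: walking along a camera ray and keeping the first photo-consistent 3D position. The camera abstraction, names and the consistency threshold are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def sample_along_ray(origin, direction, views, step=0.01, n_steps=500, tau=10.0):
    """Walk along the half-line from a camera center through an image point
    and return the first 3D position whose projections into the views have
    photo-consistent colors.

    views: list of (project, sample_color) callables; project(p) returns
    pixel coordinates, or None if p is not visible in that view."""
    for k in range(n_steps):
        p = origin + k * step * direction          # candidate surface point
        colors = []
        for project, sample_color in views:
            uv = project(p)
            if uv is not None:
                colors.append(sample_color(uv))
        # photo-consistent if at least two views agree on the color
        if len(colors) >= 2 and np.std(np.asarray(colors, float), axis=0).mean() < tau:
            return p
    return None                                     # no surface sample found
```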
Abdullah, Jan Mirza, and Mahmododfateh Ahsan. "Multi-View Video Transmission over the Internet." Thesis, Linköping University, Department of Electrical Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-57903.
3D television using multi-view rendering is receiving increasing interest. In this technology a number of video sequences are transmitted simultaneously, providing a larger view of the scene or a stereoscopic viewing experience. With two views, stereoscopic rendition is possible. Nowadays 3D displays are available that are capable of displaying several views simultaneously, and the user is able to see different views by moving his head.
The thesis work aims at implementing a demonstration system with a number of simultaneous views. The system includes two cameras, computers at both the transmitting and receiving ends, and a multi-view display. Besides setting up the hardware, the main task is to implement software so that the transmission can be done over an IP network.
This thesis report includes an overview of and experiences with similar published systems, the implementation of real-time video capture, compression, encoding, and transmission over the internet with the help of socket programming, and finally the multi-view display in 3D format. The report also describes in more detail the design considerations regarding the video coding and network protocols.
Fecker, Ulrich. "Coding techniques for multi-view video signals." Aachen: Shaker, 2009. http://d-nb.info/993283179/04.
Ozkalayci, Burak Oguz. "Multi-view Video Coding Via Dense Depth Field." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12607517/index.pdf.
Cigla, Cevahir. "Real-time Stereo To Multi-view Video Conversion." PhD thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614513/index.pdf.
This thesis addresses two functional blocks, stereo matching and virtual view rendering, which enable extraction of 3D information from stereo video and synthesis of nonexistent virtual views, respectively. In the intermediate steps of these functional blocks, a novel edge-preserving filter is proposed that recursively constructs connected support regions for each pixel among color-wise similar neighboring pixels. The proposed recursive update structure eliminates the pre-defined window dependency of conventional approaches, providing complete content adaptability with quite low computational complexity. Extensive tests show that the proposed filtering technique yields better or competitive results against some leading techniques in the literature. The proposed filter is mainly applied in stereo matching to aggregate cost functions, and it also handles occlusions, enabling high-quality disparity maps for the stereo pairs. Similar to the box filter paradigm, this novel technique yields matching of arbitrary-shaped regions in constant time. Based on Middlebury benchmarking, the proposed technique is currently the best local matching technique in the literature in terms of both precision and complexity. Next, virtual view synthesis is conducted through depth-image-based rendering, in which the reference color views of the left and right pairs are warped to the desired virtual view using the estimated disparity maps. A feedback mechanism based on disparity error is introduced at this step to remove salient distortions for the sake of visual quality. Furthermore, the proposed edge-aware filter is re-utilized to assign proper texture to holes and occluded regions during view synthesis. The efficiency of the proposed scheme is validated by a real-time implementation on a graphics card that enables parallel computing. Extensive experiments on stereo matching and virtual view rendering show that the proposed method yields fast execution, low memory requirements, and high-quality outputs with superior performance compared to most state-of-the-art techniques.
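The recursive, window-free aggregation idea in this abstract can be illustrated with a toy one-dimensional pass. This is a sketch under assumed weights and parameters; the thesis's actual filter builds connected 2D support regions.

```python
import numpy as np

def recursive_aggregate_row(cost, guide, sigma=10.0):
    """One left-to-right pass of edge-aware cost aggregation.

    cost:  (H, W) matching cost for a single disparity hypothesis
    guide: (H, W) grayscale guidance image
    The propagation weight decays across color edges, so each pixel
    effectively aggregates over a connected, content-adaptive region
    instead of a pre-defined window."""
    guide = guide.astype(np.float64)
    out = cost.astype(np.float64).copy()
    for x in range(1, out.shape[1]):
        # color-similarity weight between horizontally adjacent pixels
        w = np.exp(-np.abs(guide[:, x] - guide[:, x - 1]) / sigma)
        out[:, x] += w * out[:, x - 1]
    return out
```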
Lawan, Sagir. "Adaptive intra refresh for robust wireless multi-view video." Thesis, Brunel University, 2016. http://bura.brunel.ac.uk/handle/2438/13078.
Talebpourazad, Mahsa. "3D-TV Content generation and multi-view video coding." Thesis, University of British Columbia, 2010. http://hdl.handle.net/2429/25949.
Bouyagoub, Samira. "Multi-camera optimisation for view synthesis and video communications." Thesis, University of Bristol, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.529898.
Lee, Yung-Lyul. "Trend of Multi-View Video Coding in Korea (3D AV)." Intelligent Media Integration Nagoya University / COE, 2005. http://hdl.handle.net/2237/10360.
Ekmekcioglu, Erhan. "Advanced three-dimensional multi-view video coding and evaluation techniques." Thesis, University of Surrey, 2009. http://epubs.surrey.ac.uk/843601/.
Pouladzadeh, Parvaneh. "Design and Implementation of Video View Synthesis for the Cloud." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/37048.
Yang, Fan. "Integral Video Coding." Thesis, KTH, Kommunikationsteori, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-162922.
Cigla, Cevahir. "Dense Depth Map Estimation For Object Segmentation In Multi-view Video." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608647/index.pdf.
Hany, Hanafy Mahmoud Said. "Low bitrate multi-view video coding based on H.264/AVC." Thesis, Staffordshire University, 2015. http://eprints.staffs.ac.uk/2206/.
Mohib, Hamdullah. "End-to-end 3D video communication over heterogeneous networks." Thesis, Brunel University, 2014. http://bura.brunel.ac.uk/handle/2438/8293.
Andersson, Håkan. "3D Video Playback: A modular cross-platform GPU-based approach for flexible multi-view 3D video rendering." Thesis, Mittuniversitetet, Institutionen för informationsteknologi och medier, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-12389.
Yamaguchi, Tatsuhisa. "3D Video Capture of a Moving Object in a Wide Area Using Active Cameras." 京都大学 (Kyoto University), 2013. http://hdl.handle.net/2433/180466.
Su, Tianyu. "An Architecture for 3D Multi-view video Transmission based on Dynamic Adaptive Streaming over HTTP (DASH)." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32505.
Danielsen, Eivind. "An exploration of user needs and experiences towards an interactive multi-view video presentation." Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8997.
After a literature review of multi-view video technologies, the focus was placed on a multi-view video presentation in which the user receives multiple video streams and can freely switch between them. User interaction was considered a key function of this system. The goal was to explore user needs and expectations towards an interactive multi-view video presentation. A multi-view video player was implemented according to specifications derived from possible scenarios and from the user needs and expectations gathered through an online survey. The media player was written in Objective-C and Cocoa and was developed using the integrated development environment Xcode and the graphical user interface tool Interface Builder. The player was built around QuickTime's QTKit framework; a plugin tool, Perian, added extra media format support to QuickTime. The results from the online survey show that only a minority has experience with such a multi-view video presentation; however, those who had tried multi-view video are positive towards it. The usefulness of the system depends strongly on content, which should be highly entertainment- and action-oriented. Switching of views was considered a key feature by experienced users in the conducted test of the multi-view video player. This feature provides a more interactive application and more satisfied users when the content is suitable for multi-view video, and rearranging and hiding views also contributed to a positive viewing experience. It is important to note, however, that these results are not sufficient to fully investigate user needs and expectations towards an interactive multi-view video presentation.
Fecker, Ulrich [Verfasser]. "Coding Techniques for Multi-View Video Signals: Verfahren zur Codierung von Mehrkamera-Videosignalen / Ulrich Fecker." Aachen: Shaker, 2009. http://d-nb.info/1156517311/34.
Gale, Nicholas C. "FUSION OF VIDEO AND MULTI-WAVEFORM FMCW RADAR FOR TRAFFIC SURVEILLANCE." Wright State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=wright1315857639.
Fülöp-Balogh, Beatrix-Emőke. "Acquisition multi-vues et rendu de scènes animées." Thesis, Lyon, 2021. http://www.theses.fr/2021LYSE1308.
Recent technological breakthroughs have led to an abundance of consumer-friendly video recording devices. Nowadays new smartphone models, for instance, are equipped not only with multiple cameras but also with depth sensors. This means that any event can easily be captured by several different devices and technologies at the same time, and it raises questions about how one can process the data in order to render a meaningful 3D scene. Most current solutions focus on static scenes only: LiDAR scanners produce extremely accurate depth maps, and multi-view stereo algorithms can reconstruct a scene in 3D from a handful of images. However, these ideas are not directly applicable to dynamic scenes. Depth sensors trade accuracy for speed, or vice versa, and color-image-based methods suffer from temporal inconsistencies or are too computationally demanding. In this thesis we aim to provide consumer-friendly solutions that fuse multiple, possibly heterogeneous, technologies to reconstruct and render dynamic 3D scenes. First, we introduce an algorithm that corrects distortions produced by small motions in time-of-flight acquisitions and outputs a corrected animated sequence. We do so by combining a slow but high-resolution time-of-flight LiDAR system and a fast but low-resolution consumer depth sensor. We cast the problem as a curve-to-volume registration, by seeing the LiDAR point cloud as a curve in the 4-dimensional spacetime and the captured low-resolution depth video as a 4-dimensional spacetime volume. We then advect the details of the high-resolution point cloud to the depth video using its optical flow. Second, we tackle the reconstruction and rendering of dynamic scenes captured by multiple RGB cameras. In casual settings, the two problems are hard to merge: structure from motion (SfM) produces spatio-temporally unstable and sparse point clouds, while the rendering algorithms that rely on the reconstruction need to produce temporally consistent videos. To ease the challenge, we consider the two steps together. First, for SfM, we recover stable camera poses, then we defer the requirement for temporally consistent points across the scene and reconstruct only a sparse point cloud per timestep that is noisy in space-time. Second, for rendering, we present a variational diffusion formulation on depths and colors that lets us robustly cope with the noise by enforcing spatio-temporal consistency via per-pixel reprojection weights derived from the input views. Overall, our work contributes to the understanding of the acquisition and rendering of casually captured dynamic scenes.
Ding, Sihao. "Multi-Perspective Image and Video Processing for Human-Machine Interaction." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1488462115943949.
Thonat, Théo. "Complétion d'image, segmentation et mixture de vidéos dans un contexte multi-vue, pour un rendu basé image plus polyvalent." Thesis, Université Côte d'Azur (ComUE), 2019. http://theses.univ-cotedazur.fr/2019AZUR4047.
Creating realistic images with the traditional rendering pipeline requires tedious work, starting with complex manual work to create 3D models, materials, and lighting, followed by computationally expensive realistic rendering. Such a process requires both skilled artists and significant computing power. Image-Based Rendering (IBR) is an alternative way to create high-quality content using only an unstructured set of photos as input. IBR allows casual users to create and render realistic and immersive scenes in real time, for applications such as virtual tourism, cultural heritage, interactive mapping, urban and architecture planning, and movie production. Existing IBR methods produce generally good image quality, but still suffer from limitations. First, many types of scene content produce visually unappealing rendering artifacts, because the underlying scene representation is insufficient, e.g., for reflective surfaces, thin structures, and dynamic content. Second, scenes are often captured under real-world constraints that require editing to meet the user's requirements, yet existing IBR methods do not allow this. To address editing, we propose to extend single-image inpainting to allow sparse multi-view object removal. Such inpainting requires hallucinating both color and geometry behind the object to be removed in a multi-view coherent fashion. Our method reduces rendering artifacts by removing objects that are not well represented by IBR methods, or by moving well-represented objects in the scene. To address rendering quality, we enlarge the scope of casual IBR in two different ways. First, we deal with thin structures, which are extremely challenging for multi-view 3D reconstruction and represent a major limitation of IBR in an urban context. We propose a pipeline that locates and renders thin structures supported by simple surfaces. We introduce both a multi-view segmentation algorithm for thin structures and a rendering method that extends traditional IBR with transparency information. Second, we propose an approach to extend IBR to dynamic content. By focusing on time-dependent stochastic textures, we preserve both the casual capture setup and the free-viewpoint navigation of the rendered scene. Our key insight is to use a video representation that is adapted to video looping and spatio-temporal blending. Our results for all methods show improved visual quality compared to previous solutions on a variety of input scenes.
Kulasekera, Sunera C. "Multiplierless DFT, DCT Approximations for Multi-Beam RF Aperture and HEVC HD Video Applications: Digital Systems Implementation." University of Akron / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=akron1454023102.
Powers, Jennifer Ann. ""Designing" in the 21st century English language arts classroom: processes and influences in creating multimodal video narratives." [Kent, Ohio]: Kent State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=kent1194639677.
Title from PDF t.p. (viewed Mar. 31, 2008). Advisor: David Bruce. Keywords: multiliteracies, multi-modal literacies, language arts education, secondary education, video composition. Includes survey instrument. Includes bibliographical references (p. 169-179).
Bartoli, Simone. "Deploying deep learning for 3D reconstruction from monocular video sequences." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22402/.
Mora, Elie-Gabriel. "Codage multi-vues multi-profondeur pour de nouveaux services multimédia." Thesis, Paris, ENST, 2014. http://www.theses.fr/2014ENST0007/document.
This PhD thesis deals with improving the coding efficiency in 3D-HEVC. We propose both constrained approaches aimed at standardization and more innovative approaches based on optical flow. In the constrained category, we first propose a method that predicts the depth Intra modes using those of the texture. The inheritance is driven by a criterion measuring how well the two are expected to match. Second, we propose two simple ways to improve inter-view motion prediction in 3D-HEVC: the first adds an inter-view disparity vector candidate to the Merge list, and the second modifies the derivation process of this disparity vector. Third, an inter-component tool is proposed in which the link between the texture and depth quadtree structures is exploited to save both runtime and bits through a joint coding of the quadtrees. In the more innovative category, we propose two methods based on dense motion vector field estimation using optical flow. The first computes such a field on a reconstructed base view; it is then warped to a dependent view, where it is inserted as a dense candidate in the Merge list of prediction units in that view. The second method improves the view synthesis process: four fields are computed on the left and right reference views using a past and a future temporal reference. These are then warped to the synthesized view, corrected using an epipolar constraint, and the four corresponding predictions are blended together. Both methods bring significant coding gains, which confirms the potential of such innovative solutions.
Bosc, Emilie. "Compression des données Multi-View-plus-Depth (MVD) : De l'analyse de la qualité perçue à l'élaboration d'outils pour le codage des données MVD." PhD thesis, INSA de Rennes, 2012. http://tel.archives-ouvertes.fr/tel-00777710.
Hossain, Md Amjad. "DESIGN OF CROWD-SCALE MULTI-PARTY TELEPRESENCE SYSTEM WITH DISTRIBUTED MULTIPOINT CONTROL UNIT BASED ON PEER TO PEER NETWORK." Kent State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=kent1606570495229229.
Ezzo, Anthony John. "Using typography and iconography to express emotion (or meaning) in motion graphics as a learning tool for ESL (English as a second language) in a multi-device platform." Kent State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=kent1460146374.
魏震豪. "Multi-view video synthesis from stereo videos with iterative depth refinement." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/77312469437977067092.
National Tsing Hua University, Department of Computer Science, academic year 101 (2012-2013).
In this thesis, we propose a novel algorithm to refine depth maps and generate multi-view video sequences from two-view video sequences for modern autostereoscopic displays. In order to generate realistic content for virtual views, high-quality depth maps are critical to the view synthesis results; refining the depth maps is therefore the main challenge in this task. We propose an iterative depth refinement algorithm, comprising error detection and error correction, to correct errors in the depth maps. The error types are classified into across-view color-depth-inconsistency errors and local color-depth-inconsistency errors, and the erroneous pixels are corrected by sampling local candidates. Next, we apply a trilateral filter that includes intensity, spatial, and temporal terms in the filter weighting to enhance temporal and spatial consistency across frames, so that the virtual views can be synthesized from the refined depth maps. To combine the two warped images, disparity-based view interpolation is introduced to alleviate translucent artifacts, and a directional filter is applied to reduce aliasing around object boundaries. Finally, high-quality virtual views between the two input views are generated. We demonstrate the superior image quality of the virtual views synthesized by the proposed algorithm over state-of-the-art view synthesis methods through experiments on benchmark image and video datasets.
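As an illustration of the trilateral weighting this abstract describes, here is a minimal sketch; the Gaussian form and the sigma values are assumptions, not the thesis's exact kernel.

```python
import numpy as np

def trilateral_weight(d_intensity, d_space, d_time,
                      sigma_i=15.0, sigma_s=5.0, sigma_t=1.0):
    """Weight of a neighboring sample when filtering a depth value.
    The intensity, spatial, and temporal terms jointly down-weight
    neighbors that differ in color, lie far away, or belong to
    distant frames, enforcing spatio-temporal consistency."""
    return (np.exp(-d_intensity ** 2 / (2 * sigma_i ** 2)) *
            np.exp(-d_space ** 2 / (2 * sigma_s ** 2)) *
            np.exp(-d_time ** 2 / (2 * sigma_t ** 2)))
```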
Dai, Yu-Chia, and 戴佑家. "Automatic Alignment of Multi-View Event Videos by Fast Sequence Matching." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/10220095380663952412.
National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 99 (2010-2011).
The high availability of digital video capture devices and the increasing diversity of social video sharing sites make sharing and searching easy. Multi-view event videos provide diverse visual content and different audio information for the same event. Compared with single-view video, users prefer more diverse and comprehensive views (video segments) of the same event, so the alignment of multi-view event videos is becoming more and more important. It is a challenging task because the scene's visual appearance from different views can look very dissimilar. This problem has previously been solved using audio, but the audio track of a video is not always available. In this work, we investigate the effect of different visual features and focus on regions of interest. Moreover, we propose a time-sensitive dynamic time warping algorithm that takes the temporal factor into consideration. In addition, we reduce the computational cost through LSH indexing to improve time efficiency. Experimental results show that our proposed method provides an efficient way to align videos and derives robust matching results.
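A sketch of dynamic time warping with an added temporal term, in the spirit of the time-sensitive variant mentioned above; the linear drift penalty and its weight are assumptions.

```python
import numpy as np

def time_sensitive_dtw(dist, lam=0.1):
    """Accumulated DTW cost over a precomputed frame-distance matrix
    dist (N x M), penalizing warp paths that drift away from the
    diagonal, i.e. alignments that are temporally implausible."""
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # normalized deviation of (i, j) from the diagonal path
            drift = abs((i - 1) * m - (j - 1) * n) / float(n * m)
            acc[i, j] = dist[i - 1, j - 1] + lam * drift + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]
```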
Lee, Ji-Tang, and 李繼唐. "Efficient Caching for Multi-view 3D Videos with Depth-Image-Based Rendering." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/02216570548253704949.
National Taiwan University, Graduate Institute of Communication Engineering, academic year 104 (2015-2016).
Due to the emergence of mobile 3D and VR devices, multi-view 3D videos are expected to play increasingly important roles in the near future. Compared with traditional single-view videos, a multi-view 3D video requires a much larger storage space, yet efficient caching of multi-view 3D videos in a proxy has not been explored in the literature. In this thesis, we first observe that the storage space can be effectively reduced by leveraging Depth-Image-Based Rendering (DIBR) in multi-view 3D. We then formulate a new cache replacement problem, named View Selection and Cache Operation (VSCO), and find the optimal policy based on a Markov Decision Process. In addition, we devise an efficient and effective algorithm, named the Efficient View Exploration Algorithm (EVEA), to solve the problem in large cases. Simulation results show that the proposed algorithm can significantly improve the cache hit rate and reduce the total cost compared with previous renowned cache replacement algorithms.
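The key observation here, that DIBR lets a cached left/right pair stand in for an uncached view, can be sketched as follows; the view indices, synthesis-distance bound, and function name are assumptions.

```python
def servable_from_cache(view, cached, max_gap=2):
    """A requested view is servable if it is cached directly, or if the
    cache holds one view on each side within max_gap positions, so the
    request can be synthesized via DIBR instead of fetched upstream."""
    if view in cached:
        return True
    has_left = any(0 < view - v <= max_gap for v in cached)
    has_right = any(0 < v - view <= max_gap for v in cached)
    return has_left and has_right

# e.g. caching views {1, 4} lets views 2 and 3 be synthesized as well
assert servable_from_cache(2, {1, 4}) and servable_from_cache(3, {1, 4})
```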
Goel, Yuvraj. "3D Video Coding." Thesis, 2011. http://dspace.dtu.ac.in:8080/jspui/handle/repository/13874.
Interest in 3DTV has increased recently, with more and more products and services becoming available for the consumer market. 3D video is an emerging trend in the development of digital video systems. Three-dimensional multi-view video is typically obtained from a set of synchronized cameras capturing the same scene from different viewpoints. The video (texture) plus depth (V+D) representation is an interesting way to realize 3D video: a depth map is simply a grayscale image that represents the distance between a pixel and the camera. A major problem when dealing with multi-view video, however, is the intrinsically large amount of data to be compressed, decompressed, and rendered. We extend the standard H.264/MPEG-4 MVC to handle the compression of multi-view video. An algorithm is implemented that, instead of producing separate bit-streams for depth and texture, produces a single texture bit-stream that also contains the depth data. As opposed to the Multi-view Video Coding (MVC) standard, which encodes only the multi-view texture data, the proposed algorithm compresses both the texture and the depth multi-view sequences. The proposed extension is based on exploiting the correlation between the multiple camera views. The goal of this thesis work is to establish an efficient method to encode depth information along with a limited number of views. The software used is JMVC (Joint Multi-view Video Coding) 8.0, an open-source software package for the Multi-view Video Coding (MVC) project of the Joint Video Team (JVT) of the ISO/IEC Moving Pictures Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG).
Wu, Szu-Hua, and 吳思樺. "A Parallax Adjustable Multi-view Rendering System based on Stereoscopic Videos with Depth Information." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/61927844729721099451.
National Cheng Kung University, Institute of Computer and Communication Engineering, academic year 101 (2012-2013).
With the development of 3D techniques, related products have become available in recent years. However, traditional 3D television requires the viewer to wear polarized glasses, shutter glasses, or red-cyan glasses to perceive stereo, and wearing glasses while watching television is inconvenient. Therefore, in order to broaden the application area of 3D techniques, glasses-free (autostereoscopic) displays are the trend of future development. A multi-view display system can provide different viewing angles for perceiving stereo vision. This thesis focuses on a parallax-adjustable multi-view rendering system. With rich view information, we can provide better rendering results than a system using one view plus one depth map. The proposed stereo-based direct single-image multi-view rendering algorithm builds on the traditional depth-image-based rendering (DIBR) algorithm and can directly render the output image with multi-view information. The proposed stereo-based parallax-adjustable multi-view rendering system is implemented on a GPU to reduce the rendering time.
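A minimal sketch of the DIBR step underlying such a renderer: pixels of a reference view are shifted by a disparity derived from depth, leaving disocclusion holes for later filling. Parameter names and the hole marker are assumptions, and a real implementation also resolves overlaps by depth order.

```python
import numpy as np

def dibr_warp(color, depth, baseline, focal, hole=-1.0):
    """Forward-warp a reference view to a horizontally shifted virtual
    viewpoint in a rectified setup: disparity = focal * baseline / depth.
    Assumes float color images and strictly positive depth values."""
    h, w = depth.shape
    out = np.full_like(color, hole)
    disp = np.round(focal * baseline / depth).astype(int)
    for y in range(h):
        for x in range(w):
            xv = x - disp[y, x]        # destination column in the virtual view
            if 0 <= xv < w:
                out[y, xv] = color[y, x]
    return out                          # pixels still equal to `hole` need inpainting
```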
Huang, Yao-Ching, and 黃耀慶. "Summarization of Multi-view Surveillance Videos by an Object-Based Key Frame Extraction Method." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/06942591994137463963.
Fu Jen Catholic University, Department of Electrical Engineering, academic year 99 (2010-2011).
Video summarization, an important technique of interest in many research fields, generates a short summary of a video for presentation to users for browsing and navigation. Multi-view summarization is also beneficial for video surveillance, since vast public security areas are covered by many cameras, and huge amounts of unimportant information need to be filtered out. In this work, we propose a multi-view video summarization approach that extracts semantic-level key frames using object information from multiple cameras. Our main goal is to avoid redundant key frames in multi-view videos; a dominant-camera selection is presented to decentralize the key frame extraction. The proposed approach is a new formulation that integrates a camera selection algorithm into key frame extraction for optimization. It has been verified on a large video dataset covering different surveillance scenes and compared with other camera selection methods. Experiments show that this method not only extracts representative key frames but also reduces redundant key frames in multi-view videos.
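A toy version of the dominant-camera idea in this abstract; the activity scores, threshold, and data layout are assumptions, and the thesis's object-based formulation is more elaborate.

```python
def extract_key_frames(activity, threshold=0.5):
    """activity[cam][t]: object-activity score of camera `cam` at time t.
    At each time step only the dominant (most active) camera may emit a
    key frame, suppressing redundant key frames from overlapping views."""
    n_frames = len(next(iter(activity.values())))
    key_frames = []
    for t in range(n_frames):
        cam = max(activity, key=lambda c: activity[c][t])   # dominant camera
        if activity[cam][t] > threshold:
            key_frames.append((t, cam))
    return key_frames

# two overlapping cameras: only the more active one contributes per step
print(extract_key_frames({"cam1": [0.9, 0.2], "cam2": [0.4, 0.8]}))
```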
Hsiao, Tung, and 蕭桐. "Improved Depth Upsampling and Multi-view Generation for Depacking Centralized Texture Depth Depacked 3D Videos." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/rt8jj7.
Full textHuang, Xin-Xian, and 黃信憲. "Efficient Multi-view Video Coding Algorithm Using Inter-View Information." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/zma67u.
Full text國立東華大學
電機工程學系
100
Multi-view video coding (MVC) extends H.264/AVC and improves the coding efficiency for multi-view video. However, MVC incurs much higher computational complexity than single-view video coding, so improving the coding speed is an important issue for the numerous applications of multi-view video. This thesis proposes a fast mode decision algorithm that tackles this enormous computational complexity by deciding the mode partition early. We utilize the best mode partition in the reference views to determine the complexity of the corresponding macroblock in the current view; the mode candidates that need to be evaluated are then obtained according to this complexity. If the complexity is low or medium, the search range can also be reduced. The threshold on the rate-distortion cost for the current macroblock is calculated from the costs of the co-located and neighboring macroblocks in the previously coded view and used as the criterion for early termination. The motion vector difference with respect to the co-located macroblock in the reference view is used to adaptively adjust the search range for the current macroblock. Experimental results verify that the proposed algorithm achieves 81.04% and 92.34% time savings for TZ fast search and full search, respectively, while maintaining good quality and bit-rate performance.
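The gist of such a fast mode decision can be sketched as follows; the complexity classes, candidate sets, and threshold rule are illustrative assumptions, not the thesis's definitions.

```python
LOW    = {"SKIP", "16x16"}
MEDIUM = LOW | {"16x8", "8x16"}
HIGH   = MEDIUM | {"8x8", "INTRA"}

def candidate_modes(ref_best_mode):
    """Pick the candidate set for the current macroblock from the best
    mode of its co-located macroblock in the reference view; the LOW and
    MEDIUM classes would also allow a reduced motion search range."""
    if ref_best_mode == "SKIP":
        return LOW
    if ref_best_mode in {"16x16", "16x8", "8x16"}:
        return MEDIUM
    return HIGH

def early_terminate(rd_cost, neighbor_costs):
    """Stop checking further modes once the RD cost falls below a
    threshold derived from co-located/neighboring macroblock costs."""
    return rd_cost < sum(neighbor_costs) / len(neighbor_costs)
```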
Hu, Kai-Wen, and 胡凱文. "Multicast Multi-View 3D Video over WLANs." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/zt7d57.
National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 105 (2016-2017).
In recent years, many video service providers have started to offer 3D video content; moreover, a new service called multi-view 3D video has arisen. Multi-view 3D video provides multiple viewpoints to choose from and offers a more immersive experience than traditional single-view 3D video. However, transmitting all the views of a multi-view 3D video would consume significant bandwidth. Another relevant and well-developed technology is Wireless Local Area Networks (WLANs), which can support efficient multicast streaming under limited wireless bandwidth; yet multicasting to a set of heterogeneous users over multiple wireless Access Points (APs) is a complicated problem. In this thesis, we solve the problem of multicasting multi-view 3D video in WLANs. We exploit Depth-Image-Based Rendering (DIBR) to synthesize a user's subscribed view from nearby left and right views. In this problem, we need to decide the user-to-AP association and the view sessions to be multicast from each AP, with the aim of maximizing the number of satisfied users under a bandwidth constraint. We propose an efficient algorithm to solve this problem, and simulation results show that our algorithm properly accounts for view synthesis and inter-AP coordination and effectively satisfies most user demands.
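A greedy sketch of the association/multicast decision described above; the data model, the DIBR neighbor rule, and the uniform per-view cost are assumptions, and the thesis's algorithm is more involved.

```python
def greedy_multicast(users, budget, view_cost=1):
    """users: list of (requested_view, reachable_aps).
    budget: dict AP -> remaining multicast bandwidth.
    A user is satisfied if a reachable AP already multicasts the requested
    view, or both neighboring views needed to DIBR-synthesize it; otherwise
    we try to start a new session within the AP's bandwidth budget."""
    sessions = {ap: set() for ap in budget}
    satisfied = 0
    for view, reachable in users:
        for ap in reachable:
            views = sessions[ap]
            if view in views or {view - 1, view + 1} <= views:
                satisfied += 1          # served by existing sessions
                break
            if budget[ap] >= view_cost:
                views.add(view)         # start a new multicast session
                budget[ap] -= view_cost
                satisfied += 1
                break
    return satisfied
```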
Wang, Poching, and 王柏青. "Morphing-based View Synthesis without Depth for Multi-view Video Players." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/03497455498882364187.
National Chung Cheng University, Institute of Computer Science and Information Engineering, academic year 100 (2011-2012).
In this thesis, we present a morphing-based view synthesis approach. With the proposed algorithm, users can use two cameras and generate virtual views at different viewpoints. First, we use the SIFT algorithm to detect and extract feature points. We then use the correspondences, together with the normalized direct linear transformation, to solve for the parameters of the multiple-view geometry. After the morphed view is obtained, the distortion in the initial virtual view is eliminated in an image re-projection and repairing stage. The proposed method does not require camera calibration; instead, it makes use of multiple-view geometry to achieve this effect. Moreover, since the proposed method is morphing-based, no 3D model needs to be built. The method can be applied in various applications, from digital photo frames and multi-angle displays to 360-degree street-level imagery.
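The first two stages can be sketched with OpenCV, where cv2.findHomography (which normalizes points internally) stands in for the normalized direct linear transformation mentioned above; file names and thresholds are placeholders.

```python
import cv2
import numpy as np

img1 = cv2.imread("view_left.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_right.png", cv2.IMREAD_GRAYSCALE)

# 1) detect and describe feature points with SIFT
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2) match descriptors and keep reliable correspondences (Lowe's ratio test)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# 3) solve the two-view geometry from the correspondences (DLT + RANSAC)
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
```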
Mahapatra, Ansuman. "Framework and Algorithms for Multi-view Video Synopsis." Thesis, 2018. http://ethesis.nitrkl.ac.in/9442/1/2018_PhD_AMahapatra_511CS108_Framework.pdf.
Kuan, Yuan-Kai, and 官元凱. "Error Concealment Algorithm Using Inter-View Correlation for Multi-View Video Decoding." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/44843855194809419466.
National Dong Hwa University, Department of Electrical Engineering, academic year 101 (2012-2013).
This thesis proposes an error concealment algorithm for whole-frame loss in multi-view video decoding. Compared with H.264/AVC, Multi-view Video Coding (MVC) utilizes inter-view correlation to reduce the bit-rate. However, when network transmission delays or transmission errors occur, the decoded video is damaged and errors propagate, so concealing the error is a very important issue. When a whole frame is lost or damaged in a two-view sequence, the proposed algorithm uses both the inter-view and intra-view domains to conceal the damaged frame. Experimental results show that the proposed algorithm provides better video quality than previous work and reduces error propagation.
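A toy per-pixel selector between the two concealment domains mentioned above; the decision rule and inputs are assumptions, not the thesis's algorithm.

```python
import numpy as np

def conceal_lost_frame(prev_frame, other_view, disparity, motion_mag, t=1.0):
    """Conceal a lost frame by mixing a temporal copy (intra-view) with a
    disparity-compensated copy from the other view (inter-view).

    prev_frame, other_view: (H, W) float luma frames
    disparity: (H, W) int horizontal disparity into the other view
    motion_mag: (H, W) motion magnitude estimated from past frames"""
    h, w = prev_frame.shape
    cols = np.clip(np.arange(w)[None, :] - disparity, 0, w - 1)
    warped = np.take_along_axis(other_view, cols, axis=1)
    # static areas copy the previous frame; moving areas use the other view
    return np.where(motion_mag < t, prev_frame, warped)
```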
Huang, Jun-Te, and 黃潤德. "View Synthesis for Multi-view Video Plus Depth Using Spatial-temporal Information." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/52759014517694521488.
National Chung Cheng University, Institute of Computer Science and Information Engineering, academic year 102 (2013-2014).
View synthesis is an important technique for free-viewpoint TV applications to reduce the transmission bit rate. In order to simultaneously reduce the bit rate and display high-quality video, a common solution is to use reference viewpoint video plus depth information (the multi-view video plus depth format) to synthesize virtual viewpoint video. In this study, a view synthesis method for multi-view video plus depth using spatial-temporal information is proposed. The approach comprises five steps: (1) disparity map estimation, optimization, and projection onto the virtual viewpoint; (2) virtual view synthesis from spatial information; (3) virtual view synthesis from temporal information; (4) selection of the best virtual view; and (5) motion compensation. Experimental results show that the synthesized views of the proposed approach are better than those of the view synthesis reference software (VSRS).
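Steps (4) and (5) of this pipeline can be illustrated with a toy per-pixel selection between the two synthesized candidates; the error maps and the hole marker are assumptions.

```python
import numpy as np

def select_best_virtual_view(spatial, temporal, err_s, err_t, hole=-1.0):
    """Per pixel, keep whichever of the spatially- or temporally-synthesized
    candidates has the lower warping error, and fill remaining holes in the
    chosen candidate from the other one."""
    best = np.where(err_s <= err_t, spatial, temporal)
    other = np.where(err_s <= err_t, temporal, spatial)
    return np.where(best == hole, other, best)
```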