Dissertations / Theses on the topic '3D video system'

Consult the top 46 dissertations / theses for your research on the topic '3D video system.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Chen, Dongbin. "Development of a 3D video-theodolite image based survey system." Thesis, University of East London, 2003. http://roar.uel.ac.uk/3555/.

Full text
Abstract:
The scope of this thesis is to investigate and develop a zoom lens video-theodolite system, which comprises a zoom lens CCD camera, a motorised theodolite, and a computer running the developed system software. A novel automatic calibration procedure is developed for the zoom lens CCD video-theodolite system; this method is also suitable for the efficient calibration of any other video-theodolite system used in fieldwork. A novel image edge detection algorithm is developed: the maximum directional edge detection algorithm uses the maximum directional gradient to identify edges in an image. A novel line detection algorithm based on the Hough line transform was developed for the applications of the video-theodolite system. This new algorithm obtains not only the line parameters r and θ but also the two terminal image points of the detected line. A novel method of constructing panoramic images from sequential images is developed based on the zoom lens video-theodolite system, and it is effectively applied in the system to generate a panorama of a scene. A novel image matching algorithm is developed, in which line features are matched under the constraint of epipolar lines. An experiment matching real-world buildings shows that this stereo matching algorithm is robust and effective. Another image matching algorithm is developed to automatically measure the image displacement between the stereo images for the video-theodolite system. The accuracy of the zoom lens video-theodolite system is evaluated in three experiments; the measuring accuracy of the system is within 0.09 pixels. The system software, which runs on a PC, has a standard MFC Windows interface with menu controls and includes the control, measuring, and image processing functions for the zoom lens video-theodolite system.
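As a rough illustration only (not the thesis algorithm), a probabilistic Hough transform yields the same pair of outputs the abstract describes: the line parameters (r, θ) and the segment endpoints. In this minimal Python/OpenCV sketch the file name and thresholds are placeholder assumptions:

```python
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
edges = cv2.Canny(img, 50, 150)  # stand-in edge map (the thesis uses its own detector)

# The probabilistic Hough transform returns segment endpoints directly;
# (r, theta) for each detected line can then be recovered from them.
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                           minLineLength=30, maxLineGap=5)
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        theta = np.arctan2(y2 - y1, x2 - x1) + np.pi / 2  # angle of the line normal
        r = x1 * np.cos(theta) + y1 * np.sin(theta)       # signed distance from origin
        print(f"r={r:.1f}, theta={np.degrees(theta):.1f} deg, "
              f"endpoints ({x1},{y1})-({x2},{y2})")
```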
APA, Harvard, Vancouver, ISO, and other styles
2

Magaia, Lourenco Lazaro. "A video-based traffic monitoring system." Thesis, Stellenbosch : University of Stellenbosch, 2006. http://hdl.handle.net/10019.1/1243.

Full text
Abstract:
Thesis (PhD (Mathematical Sciences. Applied Mathematics))--University of Stellenbosch, 2006.
This thesis addresses the problem of building a video-based traffic monitoring system. We employ clustering, tracking and three-dimensional reconstruction of moving objects over a long image sequence. We present an algorithm that robustly recovers the motion and reconstructs three-dimensional shapes from a sequence of video images, Magaia et al [91]. The problem ...
APA, Harvard, Vancouver, ISO, and other styles
3

Markström, Johannes. "3D Position Estimation of a Person of Interest in Multiple Video Sequences : People Detection." Thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-98140.

Full text
Abstract:
In most cases today, when a specific person's whereabouts are monitored through video surveillance, it is done manually, and his or her location when not seen is based on assumptions about how fast he or she can move. Since humans are good at recognizing people, this can be done accurately given good video data, but the time needed to go through all the data is extensive and therefore expensive. Because of rapid technical development, computers are getting cheaper to use and are therefore more attractive for tedious work. This thesis is part of a larger project that aims to determine to what extent it is possible to estimate a person of interest's time-dependent 3D position when seen in surveillance videos. The surveillance videos are recorded with non-overlapping monocular cameras. Furthermore, the project aims to determine whether the person of interest's movement can be predicted when position data are unavailable. The outcome of the project is software capable of following a person of interest's movement, with an error estimate visualized as an area indicating where the person of interest might be at a specific time. The main focus of this thesis is to implement and evaluate a people detector to be used in the project, to reduce noise in position measurements, to predict the position when the person of interest's location is unknown, and to evaluate the complete project. The project combines known methods in computer vision and signal processing, and the outcome is software that can be used on a normal PC running a Windows operating system. The software implemented in the thesis uses a Hough transform-based people detector and a Kalman filter for one-step-ahead prediction. The detector is evaluated with known methods such as miss rate vs. false positives per window or image (FPPW and FPPI, respectively) and recall vs. 1-precision. The results indicate that it is possible to estimate a person of interest's 3D position with single monocular cameras. It is also possible to follow the movement, to some extent, where position data are unavailable. However, the software needs more work in order to be robust enough to handle the diversity that may appear in different environments and to handle large-scale sensor networks.
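As an illustration of the one-step-ahead prediction mentioned in the abstract, here is a minimal constant-velocity Kalman filter sketch; it is not the thesis implementation, and the noise covariances and toy detections are assumptions:

```python
import numpy as np

class ConstantVelocityKF:
    """One-step-ahead predictor for a 2D position track."""
    def __init__(self, dt=1.0, q=0.01, r=1.0):
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)   # process noise (tuning assumption)
        self.R = r * np.eye(2)   # measurement noise (tuning assumption)
        self.x = np.zeros(4)     # state: [x, y, vx, vy]
        self.P = np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]        # position to report when no detection is seen

    def update(self, z):
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = ConstantVelocityKF()
for z in [[10.0, 5.0], [11.0, 5.5], [12.2, 6.1]]:  # toy detections
    kf.predict()
    kf.update(np.array(z))
print("one-step-ahead prediction:", kf.predict())
```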
APA, Harvard, Vancouver, ISO, and other styles
4

Johansson, Victor. "3D Position Estimation of a Person of Interest in Multiple Video Sequences : Person of Interest Recognition." Thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-97970.

Full text
Abstract:
Because of the increase in the number of security cameras, there is more video footage available than a human could efficiently process. Combined with the fact that computers are getting more powerful, it is becoming more and more interesting to solve the problem of detecting and recognizing people automatically. Therefore a method is proposed for estimating the 3D path of a person of interest across multiple non-overlapping monocular cameras. This project is a collaboration between two master's theses. This thesis focuses on recognizing a person of interest among several possible candidates, as well as estimating the person's 3D position and providing a graphical user interface for the system. The recognition of the person of interest includes keeping track of said person frame by frame, and identifying said person in video sequences where the person of interest has not been seen before. The final product is able to both detect and recognize people in video, as well as estimate their 3D position relative to the camera. The product is modular and any part can be improved or changed completely without changing the rest of the product. This results in a highly versatile product which can be tailored for any given situation.
APA, Harvard, Vancouver, ISO, and other styles
5

Martell, Angel Alfredo. "Benchmarking structure from motion algorithms with video footage taken from a drone against laser-scanner generated 3D models." Thesis, Luleå tekniska universitet, Rymdteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-66280.

Full text
Abstract:
Structure from motion is a novel approach to generating 3D models of objects and structures. The dataset simply consists of a series of images of an object taken from different positions. The ease of data acquisition and the wide array of available algorithms make the technique easily accessible. The structure from motion method identifies features in all the images of the dataset, such as edges with gradients in multiple directions, tries to match these features between all the images, and then computes the relative motion that the camera was subject to between any pair of images. It builds a 3D model from the correlated features, creating a 3D point cloud with colour information of the scanned object. There are different implementations of the structure from motion method that use different approaches to solve the feature-correlation problem between the images of the dataset, different methods for detecting the features, and different alternatives for sparse and dense reconstruction. These differences produce variations in the final output across distinct algorithms. This thesis benchmarked these different algorithms in accuracy and processing time. For this purpose, a terrestrial 3D laser scanner was used to scan structures and buildings to generate a ground truth reference against which the structure from motion algorithms were compared. A video feed from a drone with a built-in camera was then captured while flying around the structure or building to generate the input for the structure from motion algorithms. Different structures are considered, taking into account how rich or poor in features they are, since this impacts the result of the structure from motion algorithms. The structure from motion algorithms generated 3D point clouds, which were then analysed with a tool like CloudCompare to benchmark how similar they are to the laser-scanner-generated data, and the runtime was recorded for comparison across all algorithms. Subjective criteria were also assessed, such as how easy each algorithm is to use and how complete the produced model looks in comparison to the others. In the comparison it was found that there is no absolute best algorithm, since each algorithm excels in different aspects. There are algorithms that are able to generate a model very fast, scaling their execution time linearly with the size of the input, but at the expense of accuracy. There are also algorithms that take a long time for dense reconstruction but generate almost complete models even in the presence of featureless surfaces, like COLMAP's modified PatchMatch algorithm. The structure from motion methods are able to generate models with an accuracy of up to 3 cm when scanning a simple building, where Visual Structure from Motion and Open Multi-View Environment ranked among the most accurate. It is worth highlighting that the error in accuracy grows as the complexity of the scene increases. Finally, it was found that the structure from motion method cannot correctly reconstruct structures with reflective surfaces, as well as repetitive patterns when the images are taken from mid to close range, as the resulting errors can be as high as 1 m on a large structure.
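For a sense of how such a cloud-to-cloud benchmark works, here is a minimal sketch (not the thesis tooling, which used CloudCompare) of nearest-neighbour error statistics between an SfM cloud and a laser-scanned reference; the arrays are random stand-ins:

```python
import numpy as np
from scipy.spatial import cKDTree

# Stand-ins for the laser-scanned ground truth and the SfM point cloud.
ground_truth = np.random.rand(100_000, 3)
sfm_cloud = ground_truth + np.random.normal(scale=0.03, size=ground_truth.shape)

tree = cKDTree(ground_truth)
dists, _ = tree.query(sfm_cloud)  # nearest ground-truth point per SfM point

print(f"mean error:       {dists.mean():.4f} m")
print(f"RMS error:        {np.sqrt((dists ** 2).mean()):.4f} m")
print(f"95th percentile:  {np.percentile(dists, 95):.4f} m")
```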
APA, Harvard, Vancouver, ISO, and other styles
6

Yin, Ling. "Automatic Stereoscopic 3D Chroma-Key Matting Using Perceptual Analysis and Prediction." Thesis, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31851.

Full text
Abstract:
This research presents a novel framework for automatic chroma keying and its optimization for real-time and stereoscopic 3D processing. It first simulates the process by which human perception isolates foreground elements in a given scene by perceptual analysis, and then predicts foreground colours and the alpha map based on the analysis results and a restored clean background plate, rather than direct sampling. In addition, an object-level depth map is generated through stereo matching on a carefully determined feature map. Three prototypes on different platforms have been implemented according to their hardware capabilities based on the proposed framework. To achieve real-time performance, the entire procedure is optimized for parallel processing and data paths on the GPU, as well as heterogeneous computing between GPU and CPU. Qualitative comparisons between results generated by the proposed algorithm and other existing algorithms show that the proposed one is able to generate more acceptable alpha maps and foreground colours, especially in regions that contain translucencies and details. The quantitative evaluations also validate its advantages in both quality and speed.
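As context for the matting problem, the sketch below shows the crude colour-distance chroma keying that perceptual approaches like this one improve upon; the key colour and softness parameter are illustrative assumptions:

```python
import numpy as np

def chroma_key_alpha(rgb, key=(0, 255, 0), soft=80.0):
    """Alpha in [0, 1]: 0 on the key colour, 1 far from it."""
    diff = rgb.astype(float) - np.array(key, dtype=float)
    dist = np.linalg.norm(diff, axis=-1)      # colour distance to the key
    return np.clip(dist / soft, 0.0, 1.0)

frame = np.zeros((4, 4, 3), dtype=np.uint8)   # toy all-green frame
frame[..., 1] = 255
alpha = chroma_key_alpha(frame)
print(alpha)  # all zeros: pure key colour is fully transparent
```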
APA, Harvard, Vancouver, ISO, and other styles
7

Koz, Alper. "Watermarking For 3d Representations." Phd thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608886/index.pdf.

Full text
Abstract:
In this thesis, a number of novel watermarking techniques for different 3D representations are presented. A novel watermarking method is proposed for mono-view video, which might be interpreted as the basic implicit representation of 3D scenes. The proposed method solves the common flickering problem in existing video watermarking schemes by adjusting the watermark strength with respect to the temporal contrast thresholds of the human visual system (HVS), which define the maximum invisible distortions in the temporal direction. The experimental results indicate that the proposed method gives better results in both objective and subjective measures, compared to some recognized methods in the literature. The watermarking techniques for the geometry- and image-based representations of 3D scenes, denoted as 3D watermarking, are examined and classified into three groups, 3D-3D, 3D-2D and 2D-2D watermarking, in which the pair of symbols identifies whether the watermark is embedded and detected in a 3D model or a 2D projection of it. A detailed literature survey on 3D-3D watermarking is presented that mainly focuses on protecting the intellectual property rights of 3D geometrical representations. This analysis points out specific problems in 3D-3D geometry watermarking, such as the lack of a unique 3D scene representation, of standardization for the coding schemes, and of benchmarking tools for 3D geometry watermarking. For the 2D-2D watermarking category, the copyright problem for the emerging free-view television (FTV) is introduced. The proposed watermarking method for this original problem embeds watermarks into each view of the multi-view video by utilizing the spatial sensitivity of the HVS. The hidden signal in a selected virtual view is detected by computing the normalized correlation between the selected view and a generated pattern, namely the rendered watermark, which is obtained by applying to the original watermark the same rendering operations that produced the selected view. An algorithm for estimating the virtual camera position and rotation is also developed based on the projective planar relations between image planes. The simulation results show the applicability of the method to FTV systems. Finally, the thesis also presents a novel 3D-2D watermarking method, in which a watermark is embedded into the 3D representation of the object and detected from a 2D projection (image) of the same model. A novel solution based on projective invariants is proposed, which modifies the cross-ratio of five coplanar points on the 3D model according to the watermark bit and extracts the embedded bit from 2D projections of the model by computing the cross-ratio. After demonstrating the applicability of the algorithm via simulations, future directions for this novel 3D watermarking problem are addressed.
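The 3D-2D scheme rests on the cross-ratio of five coplanar points being a projective invariant. A small sketch of that invariant follows, using one of several equivalent indexing conventions; the points and homography are made up purely to check invariance:

```python
import numpy as np

def tri_det(p, i, j, k):
    """Determinant of the homogeneous coordinates of points i, j, k."""
    return np.linalg.det(np.stack([p[i], p[j], p[k]]))

def cross_ratio5(pts2d):
    p = np.hstack([pts2d, np.ones((5, 1))])   # homogeneous coordinates
    return (tri_det(p, 0, 1, 3) * tri_det(p, 0, 2, 4)) / \
           (tri_det(p, 0, 1, 4) * tri_det(p, 0, 2, 3))

pts = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.]])
H = np.array([[1.1, 0.2, 3.0], [0.1, 0.9, -1.0], [0.001, 0.002, 1.0]])
proj = (H @ np.hstack([pts, np.ones((5, 1))]).T).T
proj = proj[:, :2] / proj[:, 2:3]             # points after projection
print(cross_ratio5(pts), cross_ratio5(proj))  # equal up to round-off
```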
APA, Harvard, Vancouver, ISO, and other styles
8

Göransson, Rasmus. "Automatic 3D reconstruction with kinect : A modular system for creating high quality light weight textured meshes from rgbd video." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-142480.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wong, Timothy. "System Design and Analysis for Creating a 3D Virtual Street Scene for Autonomous Vehicles using Geometric Proxies from a Single Video Camera." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2041.

Full text
Abstract:
Self-driving vehicles use a variety of sensors to understand the environment they are in. In order to do so, they must accurately measure the distances and positions of the objects around them. A common representation of the environment around the vehicle is a 3D point cloud, a set of 3D data points which represent the positions of objects in the real world relative to the car. However, while accurate and useful, these point clouds require large amounts of memory compared to other representations such as lightweight polygonal meshes. In addition, 3D point clouds can be difficult for a human to understand visually, as the data points do not always form a naturally coherent object. This paper introduces a system to lower the memory consumption needed for the graphical representation of a virtual street environment. The proposed system currently takes a single front-facing video as input. The system uses the video to retrieve still images of a scene, which are then segmented to distinguish the relevant objects, such as cars and stop signs. The system generates a corresponding virtual street scene in which these key objects are visualized as low-poly, or low-resolution, models of the respective objects. This virtual 3D street environment is created to allow a remote operator to visualize the world the car is traveling through. At present, the virtual street includes geometric proxies for parallel-parked cars in the form of lightweight polygonal meshes. These meshes are predefined, taking up less memory than a point cloud, which can be costly to transmit from the remote vehicle and potentially difficult for a remote human operator to understand. This paper contributes a design and analysis of an initial system for generating and placing these geometric proxies of parked cars in a virtual street environment from one input video. We discuss the limitations and measure the error of this system, as well as reflect on future improvements.
APA, Harvard, Vancouver, ISO, and other styles
10

Gurram, Prudhvi K. "Automated 3D object modeling from aerial video imagery /." Online version of thesis, 2009. http://hdl.handle.net/1850/11207.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Draréni, Jamil. "Exploitation de contraintes photométriques et géométriques en vision : application au suivi, au calibrage et à la reconstruction." Grenoble, 2010. http://www.theses.fr/2010GRENM061.

Full text
Abstract:
The topic of this thesis revolves around three fundamental problems in computer vision: video tracking, camera calibration, and shape recovery. The proposed methods are solely based on photometric and geometric constraints found in the images. Video tracking, usually performed on a video sequence, consists in tracking a region of interest selected manually by an operator. We extend a successful tracking method by adding the ability to estimate the orientation of the tracked object. Furthermore, we consider another fundamental problem in computer vision: calibration. Here we tackle the problem of calibrating linear (pushbroom) cameras and video projectors, which until now were calibrated in laborious ways. For the former we propose a convenient plane-based calibration algorithm, and for the latter a calibration algorithm that does not require a physical grid, together with a planar auto-calibration algorithm. Finally, we directed our third line of research toward shape reconstruction using coplanar shadows. This technique is known to suffer from a bas-relief ambiguity if no extra information on the scene or light source is provided. We propose a simple method to reduce this ambiguity from four parameters to a single one, achieved by taking into account the visibility of the light spots in the camera.
APA, Harvard, Vancouver, ISO, and other styles
12

Sexton, Ian. "Three dimensional television : an investigation concerning programmable parallax barriers." Thesis, De Montfort University, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.391205.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

AZEVEDO, ROBERTO GERSON DE ALBUQUERQUE. "SUPPORTING MULTIMEDIA APPLICATIONS IN STEREOSCOPIC AND DEPTH-BASED 3D VIDEO SYSTEMS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2015. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=26551@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
FUNDAÇÃO DE APOIO À PESQUISA DO ESTADO DO RIO DE JANEIRO
Two-dimensional video technologies have evolved quickly in the last few years. Even so, they do not achieve a realistic and immersive view of the world, since they do not offer important depth cues to the human visual system. Three-dimensional video (3DV) technologies try to fill this gap through video representations that enable 3D displays to provide those additional depth cues. Although CSV (Conventional Stereoscopic Video) has been the most widely used 3DV representation, other 3DV representations have emerged in recent years. Examples include MVV (Multi-view video), 2D plus Z (2D plus depth), MVD (Multi-view plus depth), and LDV (Layered-depth video). Although end-to-end 3DV delivery chains based on those 3DV formats have been studied, the integration of interactive multimedia applications into those delivery chains has not yet been explored enough. Integrating multimedia applications with 3D media using these new representations has the potential to enable new rich content, user experiences, and business models. In this thesis, two approaches for the integration of multimedia applications into 3DV end-to-end delivery chains are proposed. First, a backward-compatible approach for integrating CSV-based media into 2D-only multimedia languages is discussed. In this proposal, it is possible to add depth information to 2D-only media objects. The proposal consists of extensions to multimedia languages and a process for converting the original multimedia application into its stereoscopic version. It does not require any change to the language player and is ready to run on current CSV-based 3DV delivery chains and digital receiver hardware. Second, extensions to multimedia languages based on layered-depth media are proposed, and a software architecture for the graphics composition of multimedia applications using those extensions is presented. As an example, both proposals are implemented and integrated into an end-to-end 3DV delivery chain based on the Brazilian Digital TV System.
APA, Harvard, Vancouver, ISO, and other styles
14

Kaller, Ondřej. "Pokročilé metody snímání a hodnocení kvality 3D videa." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-369744.

Full text
Abstract:
This dissertation deals with methods for capturing 3D images and videos and assessing their quality. After a brief summary of the physiology of spatial perception, it reviews the state of the art on the adaptive parallax problem and on camera configurations for capturing a classic stereo pair. It also summarizes current options for depth map estimation; both active and passive methods are covered, with profilometric scanning explained in more detail. Selected technical parameters of two current 3D display technologies, polarization-separating and time-multiplexed displays, were measured, for example the crosstalk between the left and right images. The core of the thesis is a new method, designed and tested by the author, for creating a depth map when capturing a 3D scene. The innovation of this approach lies in a clever combination of current active and passive scene-depth sensing methods that exploits the advantages of both. Finally, the results of subjective 3D video quality tests are presented; the main contribution here is a proposed metric modelling the results of these subjective tests.
APA, Harvard, Vancouver, ISO, and other styles
15

Solh, Mashhour M. "Depth-based 3D videos: quality measurement and synthesized view enhancement." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/43743.

Full text
Abstract:
Three-dimensional television (3DTV) is believed to be the future of television broadcasting and will replace current 2D HDTV technology. In the future, 3DTV will bring a more life-like and visually immersive home entertainment experience, in which users will have the freedom to navigate through the scene and choose a different viewpoint. A desired view can be synthesized at the receiver side using depth image-based rendering (DIBR). While this approach has many advantages, one of the key challenges in DIBR is generating high quality synthesized views. This work presents novel methods to measure and enhance the quality of 3D videos generated through DIBR. For quality measurement, we describe a novel method to characterize and measure distortions introduced by the multiple cameras used to capture stereoscopic images. In addition, we present an objective quality measure for DIBR-based 3D videos by evaluating the elements of visual discomfort in stereoscopic 3D videos. We also introduce a new concept called the ideal depth estimate, and define the tools to estimate that depth. Full-reference and no-reference profiles for calculating the proposed measures are also presented. Moreover, we introduce two innovative approaches to improve the quality of the synthesized views generated by DIBR. The first approach is based on hierarchical blending of the background and foreground information around the disocclusion areas, which produces a natural-looking synthesized view with seamless hole filling. This approach yields virtual images that are free of any geometric distortions, unlike other algorithms that preprocess the depth map. In contrast to other hole-filling approaches, ours is not sensitive to depth maps with a high percentage of bad pixels from stereo matching. The second approach further enhances the results through a depth-adaptive preprocessing of the colored images. Finally, we propose an enhancement of a depth estimation algorithm using monocular depth cues from luminance and chrominance. The estimated depth is evaluated using our quality measure, and the hole-filling algorithm is used to generate synthesized views. This application demonstrates how our quality measures and enhancement algorithms can help in the development of high quality stereoscopic depth-based synthesized videos.
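To make the DIBR terminology concrete, here is a toy view-synthesis sketch, assuming a convention in which larger depth values are nearer and map to larger disparities; it only marks the disocclusion holes that hole-filling methods such as the hierarchical blending above would then fill:

```python
import numpy as np

def render_view(color, depth, max_disp=16):
    """Warp pixels horizontally by a depth-derived disparity (toy version)."""
    h, w, _ = color.shape
    view = np.zeros_like(color)
    filled = np.zeros((h, w), dtype=bool)
    disp = (depth.astype(float) / 255.0 * max_disp).astype(int)
    # Visit pixels far-to-near (ascending depth value under our convention)
    # so nearer pixels overwrite farther ones in the virtual view.
    for idx in np.argsort(depth, axis=None):
        y, x = divmod(idx, w)
        xs = x - disp[y, x]            # shift toward the virtual viewpoint
        if 0 <= xs < w:
            view[y, xs] = color[y, x]
            filled[y, xs] = True
    return view, ~filled               # unfilled pixels are disocclusion holes

color = np.random.randint(0, 255, (4, 8, 3), dtype=np.uint8)
depth = np.random.randint(0, 255, (4, 8), dtype=np.uint8)
view, holes = render_view(color, depth)
print("disoccluded pixels to fill:", holes.sum())
```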
APA, Harvard, Vancouver, ISO, and other styles
16

Cunat, Christophe. "Accélération matérielle pour le rendu de scènes multimédia vidéo et 3D." Phd thesis, Télécom ParisTech, 2004. http://tel.archives-ouvertes.fr/tel-00077593.

Full text
Abstract:
Un processus de convergence des techniques algorithmiques de deux domaines autrefois disjoints, convergence facilité par l'émergence de normes telles que MPEG-4, s'est engagé au cours de ces dernières années. Grâce au concept de codage par objets, une scène peut être reconstituée par la composition de divers objets dans un ordre déterminé.
Cette thèse s'inscrit dans le cadre de la composition d'objets visuels qui peuvent être de natures différentes (séquences vidéo, images fixes, objets synthétiques 3D, etc.). Néanmoins, les puissances de calcul nécessaires afin d'effectuer cette composition demeurent prohibitives sans mise en place d'accélérateurs matériels spécialisés et deviennent critiques dans un contexte de terminal portable.
Une revue tant algorithmique qu'architecturale des différents domaines est effectuée afin de souligner à la fois les points de convergence et de différence. Ensuite, trois axes (interdépendants) de réflexions concernant les problématiques de représentation des données, d'accès aux données et d'organisation des traitements sont principalement discutés.
Ces réflexions sont alors appliquées au cas concret d'un terminal portable pour la labiophonie : application de téléphonie où le visage de l'interlocuteur est reconstruit à partir d'un maillage de triangles et d'un placage de texture. Une architecture unique d'un compositeur d'image capable de traiter indifféremment ces objets visuels est ensuite définie. Enfin, une synthèse sur une plateforme de prototypage de cet opérateur autorise une comparaison avec des solutions existantes, apparues pour la plupart au cours de cette thèse.
APA, Harvard, Vancouver, ISO, and other styles
17

Jenkins, Dave A. "Teaching First-Semester General Chemistry Using 3D Video Games following an Atoms First Approach to Chemistry." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1248376/.

Full text
Abstract:
The unified learning model (ULM) focuses on students' engagement, motivation, prior knowledge, and working memory. This study employs the use of video games to assess students' learning through a 3D chemistry gaming environment. In this human-subjects research, students carried out missions and applied reasoning to solve problems appropriate for general chemistry content. For learning to occur, students must be engaged and motivated as stated in the ULM. Learning cannot necessarily be accomplished by experience alone, and critical thinking is required to turn the experience into learning. The interpretation of educational theory applied to video games and this proposed study are discussed. A moderately positive correlation was found between exam score and study time (playing the game). Essentially the more time spent playing the game or an online activity the higher the exam scores. There was an alpha level less than 0.05 (p < 0.05) between the experimental group and non-traditional group (no game or online activity). Supporting that there was a statistically significant difference between groups, the null hypothesis was accepted between the game and online activity. Furthermore, as stated under the ULM, engagement is necessary for optimal learning.
APA, Harvard, Vancouver, ISO, and other styles
18

Brewin, Michael A. "Reduced information integral imaging." Thesis, De Montfort University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.391420.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Schwarz, Sebastian. "Gaining Depth : Time-of-Flight Sensor Fusion for Three-Dimensional Video Content Creation." Doctoral thesis, Mittuniversitetet, Avdelningen för informations- och kommunikationssystem, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-21938.

Full text
Abstract:
The successful revival of three-dimensional (3D) cinema has generated a great deal of interest in 3D video. However, contemporary eyewear-assisted displaying technologies are not well suited for the less restricted scenarios outside movie theaters. The next generation of 3D displays, autostereoscopic multiview displays, overcome the restrictions of traditional stereoscopic 3D and can provide an important boost for 3D television (3DTV). Then again, such displays require scene depth information in order to reduce the amount of necessary input data. Acquiring this information is quite complex and challenging, thus restricting content creators and limiting the amount of available 3D video content. Nonetheless, without broad and innovative 3D television programs, even next-generation 3DTV will lack customer appeal. Therefore simplified 3D video content generation is essential for the medium's success. This dissertation surveys the advantages and limitations of contemporary 3D video acquisition. Based on these findings, a combination of dedicated depth sensors, so-called Time-of-Flight (ToF) cameras, and video cameras, is investigated with the aim of simplifying 3D video content generation. The concept of Time-of-Flight sensor fusion is analyzed in order to identify suitable courses of action for high quality 3D video acquisition. In order to overcome the main drawback of current Time-of-Flight technology, namely the high sensor noise and low spatial resolution, a weighted optimization approach for Time-of-Flight super-resolution is proposed. This approach incorporates video texture, measurement noise and temporal information for high quality 3D video acquisition from a single video plus Time-of-Flight camera combination. Objective evaluations show benefits with respect to state-of-the-art depth upsampling solutions. Subjective visual quality assessment confirms the objective results, with a significant increase in viewer preference by a factor of four. Furthermore, the presented super-resolution approach can be applied to other applications, such as depth video compression, providing bit rate savings of approximately 10 percent compared to competing depth upsampling solutions. The work presented in this dissertation has been published in two scientific journals and five peer-reviewed conference proceedings.  In conclusion, Time-of-Flight sensor fusion can help to simplify 3D video content generation, consequently supporting a larger variety of available content. Thus, this dissertation provides important inputs towards broad and innovative 3D video content, hopefully contributing to the future success of next-generation 3DTV.
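As a rough illustration of texture-guided depth upsampling, the following sketch implements plain joint bilateral upsampling rather than the dissertation's weighted optimization; sizes, sigmas, and the random inputs are placeholders:

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, guide, scale, sigma_s=2.0, sigma_r=10.0):
    """Upsample a low-res depth map using a high-res luminance guide."""
    h, w = guide.shape
    out = np.zeros((h, w))
    r = int(2 * sigma_s)
    for y in range(h):
        for x in range(w):
            yl, xl = y // scale, x // scale           # low-res neighbourhood
            num = den = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ys, xs = yl + dy, xl + dx
                    if 0 <= ys < depth_lo.shape[0] and 0 <= xs < depth_lo.shape[1]:
                        gs = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                        dr = guide[y, x] - guide[min(ys * scale, h - 1),
                                                 min(xs * scale, w - 1)]
                        gr = np.exp(-(dr * dr) / (2 * sigma_r ** 2))
                        num += gs * gr * depth_lo[ys, xs]
                        den += gs * gr
            out[y, x] = num / den
    return out

guide = np.random.rand(16, 16) * 255     # high-res video luminance (stand-in)
depth_lo = np.random.rand(4, 4) * 255    # low-res ToF depth (stand-in)
print(joint_bilateral_upsample(depth_lo, guide, scale=4).shape)
```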
APA, Harvard, Vancouver, ISO, and other styles
20

Samrouth, Khouloud. "Représentation et compression à haut niveau sémantique d’images 3D." Thesis, Rennes, INSA, 2014. http://www.theses.fr/2014ISAR0025/document.

Full text
Abstract:
La diffusion de données multimédia, et particulièrement les images, continuent à croitre de manière très significative. La recherche de schémas de codage efficaces des images reste donc un domaine de recherche très dynamique. Aujourd'hui, une des technologies innovantes les plus marquantes dans ce secteur est sans doute le passage à un affichage 3D. La technologie 3D est largement utilisée dans les domaines de divertissement, d'imagerie médicale, de l'éducation et même plus récemment dans les enquêtes criminelles. Il existe différentes manières de représenter l'information 3D. L'une des plus répandues consiste à associer à une image classique dite de texture, une image de profondeur de champs. Cette représentation conjointe permet ainsi une bonne reconstruction 3D dès lors que les deux images sont bien corrélées, et plus particulièrement sur les zones de contours de l'image de profondeur. En comparaison avec des images 2D classiques, la connaissance de la profondeur de champs pour les images 3D apporte donc une information sémantique importante quant à la composition de la scène. Dans cette thèse, nous proposons un schéma de codage scalable d'images 3D de type 2D + profondeur avec des fonctionnalités avancées, qui préserve toute la sémantique présente dans les images, tout en garantissant une efficacité de codage significative. La notion de préservation de la sémantique peut être traduite en termes de fonctionnalités telles que l'extraction automatique de zones d'intérêt, la capacité de coder plus finement des zones d'intérêt par rapport au fond, la recomposition de la scène et l'indexation. Ainsi, dans un premier temps, nous introduisons un schéma de codage scalable et joint texture/profondeur. La texture est codée conjointement avec la profondeur à basse résolution, et une méthode de compression de la profondeur adaptée aux caractéristiques des cartes de profondeur est proposée. Ensuite, nous présentons un schéma global de représentation fine et de codage basé contenu. Nous proposons ainsi schéma global de représentation et de codage de "Profondeur d'Intérêt", appelé "Autofocus 3D". Il consiste à extraire finement des objets en respectant les contours dans la carte de profondeur, et de se focaliser automatiquement sur une zone de profondeur pour une meilleure qualité de synthèse. Enfin, nous proposons un algorithme de segmentation en régions d'images 3D, fournissant une forte consistance entre la couleur, la profondeur et les régions de la scène. Basé sur une exploitation conjointe de l'information couleurs, et celle de profondeur, cet algorithme permet la segmentation de la scène avec un degré de granularité fonction de l'application visée. Basé sur cette représentation en régions, il est possible d'appliquer simplement le même principe d'Autofocus 3D précédent, pour une extraction et un codage de la profondeur d'Intérêt (DoI). L'élément le plus remarquable de ces deux approches est d'assurer une pleine cohérence spatiale entre texture, profondeur, et régions, se traduisant par une minimisation des problèmes de distorsions au niveau des contours et ainsi par une meilleure qualité dans les vues synthétisées
Dissemination of multimedia data, in particular the images, continues to grow very significantly. Therefore, developing effective image coding schemes remains a very active research area. Today, one of the most innovative technologies in this area is the 3D technology. This 3D technology is widely used in many domains such as entertainment, medical imaging, education and very recently in criminal investigations. There are different ways of representing 3D information. One of the most common representations, is to associate a depth image to a classic colour image called texture. This joint representation allows a good 3D reconstruction, as the two images are well correlated, especially along the contours of the depth image. Therefore, in comparison with conventional 2D images, knowledge of the depth of field for 3D images provides an important semantic information about the composition of the scene. In this thesis, we propose a scalable 3D image coding scheme for 2D + depth representation with advanced functionalities, which preserves all the semantics present in the images, while maintaining a significant coding efficiency. The concept of preserving the semantics can be translated in terms of features such as an automatic extraction of regions of interest, the ability to encode the regions of interest with higher quality than the background, the post-production of the scene and the indexing. Thus, firstly we introduce a joint and scalable 2D plus depth coding scheme. First, texture is coded jointly with depth at low resolution, and a method of depth data compression well suited to the characteristics of the depth maps is proposed. This method exploits the strong correlation between the depth map and the texture to better encode the depth map. Then, a high resolution coding scheme is proposed in order to refine the texture quality. Next, we present a global fine representation and contentbased coding scheme. Therefore, we propose a representation and coding scheme based on "Depth of Interest", called "3D Autofocus". It consists in a fine extraction of objects, while preserving the contours in the depth map, and it allows to automatically focus on a particular depth zone, for a high rendering quality. Finally, we propose 3D image segmentation, providing a high consistency between colour, depth and regions of the scene. Based on a joint exploitation of the colour and depth information, this algorithm allows the segmentation of the scene with a level of granularity depending on the intended application. Based on such representation of the scene, it is possible to simply apply the same previous 3D Autofocus, for Depth of Interest extraction and coding. It is remarkable that both approaches ensure a high spatial coherence between texture, depth, and regions, allowing to minimize the distortions along object of interest's contours and then a higher quality in the synthesized views
APA, Harvard, Vancouver, ISO, and other styles
21

Fan, Yu. "Quality assessment of stereoscopic 3D content based on binocular perception." Thesis, Poitiers, 2019. http://www.theses.fr/2019POIT2266.

Full text
Abstract:
The great advance of stereoscopic/3D technologies has led to a remarkable growth of 3D content in various applications, thanks to the realistic and immersive experience they offer. However, these technologies have also brought technical challenges and issues regarding quality assessment and compression, due to the complex processes of binocular vision. Aiming to evaluate and optimize the performance of 3D imaging systems with respect to their storage capacity and quality of experience (QoE), this thesis focuses on two main parts: 1- spatial visibility thresholds of the human visual system (HVS) and 2- stereoscopic image quality assessment (SIQA). It is well known that the HVS cannot detect changes in a compressed image if these changes are below the just noticeable difference (JND) threshold. Therefore, an extensive study based on objective and subjective analysis has been conducted on existing 3D-JND models. In addition, a new 3D-JND model has been proposed, based on psychophysical experiments that measure the effect of binocular disparity and spatial masking on the visual thresholds. In the second part, we explored new approaches for SIQA from two different perspectives. First, we developed a reference-based model accounting for both monocular and cyclopean quality. Then, we proposed a new blind quality metric relying on a combination of local contrast statistics of the stereo pair. Both models consider the binocular fusion and binocular rivalry behaviours of the HVS in order to accurately simulate the human judgment of 3D quality.
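For context, a classic 2D luminance-masking JND threshold of the kind such models build on can be sketched as follows; the coefficients follow a commonly cited formulation and are illustrative, and the proposed 3D-JND model additionally accounts for binocular disparity:

```python
import numpy as np

def luminance_jnd(bg):
    """Visibility threshold as a function of background luminance (0-255)."""
    bg = bg.astype(float)
    low = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0   # dark regions mask more
    high = 3.0 / 128.0 * (bg - 127.0) + 3.0          # bright-region slope
    return np.where(bg <= 127, low, high)

background = np.array([[0, 64, 127, 200, 255]])
print(luminance_jnd(background))  # thresholds rise toward the luminance extremes
```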
APA, Harvard, Vancouver, ISO, and other styles
22

Burbano, Andres. "Système de caméras intelligentes pour l’étude en temps-réel de personnes en mouvement." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS139/document.

Full text
Abstract:
We propose a system for detecting and tracking people moving in large spaces. Our solution is based on a network of smart cameras that extract spatiotemporal information about the observed people. Each camera is composed of a 3D sensor, an embedded processing system, and a communication and power supply system. We showed the effectiveness of placing the 3D sensors in an overhead (zenithal) position to reduce occlusions and scale variations. The processing runs in real time (~20 fps), detecting fast movements with a precision of up to 99%, and supports parametric filtering of unwanted targets such as children or shopping trolleys. Finally, we carried out a study on the use of space and a global analysis of the trajectories recovered by our system and others, able to track people in large and complex spaces, as well as on the technological viability of the results for large spaces, making the solution ready for industrialization.
APA, Harvard, Vancouver, ISO, and other styles
23

Chen, You-Sheng, and 陳友聖. "Adaptive 3D Video Generation System." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/zctjr5.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Electronic Engineering
ROC academic year 103
3D filming systems can be divided into two types: camera arrays and depth cameras. With these cameras, 3D video can be produced and played back on a stereoscopic display. Planar (2D) videos shot by other methods must go through a post-conversion technology to produce stereoscopic depth, after which Depth Image Based Rendering (DIBR) is applied to generate the left and right views for playback on a stereoscopic display. Since 3D filming equipment is expensive and planar videos are the most common video type, developing a 2D-to-3D conversion technology can solve the problem of insufficient 3D content; in addition, the cost of 3D content production becomes lower and more flexible. This paper applies an automatic depth estimation method to develop a 2D-to-3D video conversion technology. With DIBR, the left and right views are generated and the videos are played on a stereoscopic display. The method automatically adjusts the thresholds required for establishing video groups according to the RGB distribution of the source images, and determines whether the video content requires adjustment to extend the depth information according to the change of the RGB distribution over time. In addition, the amount of image motion is used to correct redundant motion, and the movement of objects in front of the camera is used to establish object depth. Moreover, texture features are computed over the video groups to determine the depth direction due to atmospheric perspective, particular conditions, and linear perspective. This depth information is integrated with the motion trajectory, which emphasizes object depth while suppressing erroneous depth, to establish the stereoscopic depth. The proposed 2D-to-3D video conversion technology can establish stereoscopic depth flexibly in different videos and delineate its outline accurately. It addresses the problem that current 2D-to-3D conversion technologies cannot simultaneously emphasize 3D features and keep 3D video jitter low. Finally, POSIX threads and multi-core CUDA are applied to accelerate the proposed adaptive 2D-to-3D depth estimation algorithm, enabling a conversion rate of over 15 fps at 1080p.
APA, Harvard, Vancouver, ISO, and other styles
24

Martins, Nuno Alexandre Bettencourt. "Network distributed 3D video quality monitoring system." Master's thesis, 2014. http://hdl.handle.net/10400.26/12295.

Full text
Abstract:
This project description presents a research and development work whose primary goal was the design and implementation of an Internet Protocol (IP) network distributed video quality assessment tool. Even though the system was designed to monitor H.264 three-dimensional (3D) stereo video quality it is also applicable to different formats of 3D video (such as texture plus depth) and can use different video quality assessment models making it easily customizable and adaptable to varying conditions and transmission scenarios. The system uses packet level data collection done by a set of network probes located at convenient network points, that carry out packet monitoring, inspection and analysis to obtain information about 3D video packets passing through the probe's locations. The information gathered is sent to a central server for further processing including 3D video quality estimation based on packet level information. Firstly an overview of current 3D video standards, their evolution and features is presented, strongly focused on H.264/AVC and HEVC. Then follows a description of video quality assessment metrics, describing in more detail the quality estimator used in the work. Video transport methods over the Internet Protocol are also explained in detail as thorough knowledge of video packetization schemes is important to understand the information retrieval and parsing performed at the front stage of the system, the probes. After those introductory themes are addressed, a general system architecture is shown, explaining all its components and how they interact with each other. The development steps of each of the components are then thoroughly described. In addition to the main project, a 3D video streamer was created to be used in the implementation tests of the system. This streamer was purposely built for the present work as currently available free-domain streamers do not support 3D video streaming. The overall result is a system that can be deployed in any IP network and is flexible enough to help in future video quality assessment research, since it can be used as a testing platform to validate any proposed new quality metrics, serve as a network monitoring tool for video transmission or help to understand the impact that some network characteristics may have on video quality.
APA, Harvard, Vancouver, ISO, and other styles
25

Chuang, Hui-Min, and 莊惠閔. "Advanced 3D Video Capture and Stabilization System." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/11624282658060243627.

Full text
Abstract:
Master's thesis
National Central University
Graduate Institute of Electrical Engineering
ROC academic year 99
When recording video with a handheld camera, the recording method may introduce unwanted vibration, which gives audiences an uncomfortable viewing experience. This unwanted vibration therefore needs to be removed or reduced. When a video is filmed with a handheld 3D stereo camera, vibration causes more problems than in traditional 2D monocular video. A stereo camera has two lenses on a single device, and vibration may destroy the 3D relationship between the left-view and right-view videos; the audience may then not experience the intended stereoscopic effect and may feel even more uncomfortable due to the destroyed 3D relationship. To address these problems, different solutions for monocular and stereo video stabilization are proposed in this thesis. For monocular video stabilization, an efficient hardware architecture is proposed: it supports 1920x1080 (Full HD) video at 30 fps and operates at a clock rate of 100 MHz. For stereoscopic video, an advanced 3D video stabilization algorithm is presented. A 3D stereo video stabilization system must consider not only how to smooth the vibration in the video, but also how to maintain the relationship between the two cameras; the proposed algorithm takes both issues into account. A porting project was also completed in this thesis: the stereo video stabilization system was ported to the Texas Instruments (TI) DaVinci DM6446 evaluation platform. This system contains the control of the input/output interfaces and the processing of the proposed algorithm, and achieves the goal of real-time processing.
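As background, the smoothing step at the heart of most stabilizers can be sketched as follows; this is a generic moving-average trajectory filter, not the thesis' hardware pipeline or its stereo constraints:

```python
import numpy as np

def smooth_trajectory(per_frame_motion, radius=15):
    """Return per-frame corrective offsets that low-pass the camera path."""
    traj = np.cumsum(per_frame_motion, axis=0)        # accumulated camera path
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    pad = np.pad(traj, ((radius, radius), (0, 0)), mode="edge")
    smooth = np.stack([np.convolve(pad[:, i], kernel, mode="valid")
                       for i in range(traj.shape[1])], axis=1)
    return smooth - traj                              # warp each frame by this

motion = np.random.normal(size=(100, 2))              # toy per-frame (dx, dy)
correction = smooth_trajectory(motion)
print(correction.shape)                               # (100, 2) offsets
```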
APA, Harvard, Vancouver, ISO, and other styles
26

Wu, Kuo-Hsuan, and 吳國暄. "2D to 3D Video Conversion System Design." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/08421135642980549863.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Graduate School of Electronic and Optoelectronic Engineering
ROC academic year 100
Progress in the display industry and the ambition of every display company to increase sales have led to prevalent price-cutting competition. To enhance the competitiveness and profitability of a product, it is essential to innovate and add extra value. The 3D movie "Avatar", a blockbuster released in December 2009, successfully attracted attention in the display industry and thus motivated the development of 3D displays. However, 3D display development now faces a new challenge: except for 3D movies and videos shot with multi-view cameras, it is difficult for common viewers to obtain large quantities of 3D video. Therefore, to popularize 3D displays, it is necessary to convert existing 2D videos to 3D, so as to solve the problem of insufficient video sources. The key to 2D-to-3D video conversion is reproducing the disparity information normally experienced by human eyes. It is therefore essential to first produce the depth information [1] of 2D videos (the relative distance between different objects) and then apply Depth-Image-Based Rendering (DIBR) [2] to produce multi-view (3D) video that can be viewed on any 3D display. The proposed 2D-to-3D video conversion system consists of four main parts: object segmentation, background depth configuration, depth information integration, and generation of multi-view video with hole filling. To separate the objects, we use the edges of moving objects and the property of similarly coloured blocks to develop an object segmentation algorithm, which includes homogeneous region determination, motion detection, object edge detection, colour grouping, object cutting, and so on. On the other hand, we exploit psychological factors of visual perception to create an algorithm for the depth distribution of the video scene, including atmospheric perspective, horizon analysis, linear perspective, and so on. Because object segmentation does not provide object depth information, we use the depth distribution of the video scene to create the object depth, and then integrate the video objects and background to create complete depth information. Finally, from this information and the 2D video, DIBR is used to create the multi-view video that consumers expect, according to the information required by the 3D display. Experiments show that this system can create realistic 3D video quality on a 3D display.
APA, Harvard, Vancouver, ISO, and other styles
27

Cavan, Neil. "Reconstruction of 3D Points From Uncalibrated Underwater Video." Thesis, 2011. http://hdl.handle.net/10012/6242.

Full text
Abstract:
This thesis presents a 3D reconstruction software pipeline that is capable of generating point cloud data from uncalibrated underwater video. This research project was undertaken as a partnership with 2G Robotics, and the pipeline described in this thesis will become the 3D reconstruction engine for a software product that can generate photo-realistic 3D models from underwater video. The pipeline proceeds in three stages: video tracking, projective reconstruction, and autocalibration. Video tracking serves two functions: tracking recognizable feature points, as well as selecting well-spaced keyframes with a wide enough baseline to be used in the reconstruction. Video tracking is accomplished using Lucas-Kanade optical flow as implemented in the OpenCV toolkit. This simple and widely used method is well-suited to underwater video, which is taken by carefully piloted and slow-moving underwater vehicles. Projective reconstruction is the process of simultaneously calculating the motion of the cameras and the 3D location of observed points in the scene. This is accomplished using a geometric three-view technique. Results are presented showing that the projective reconstruction algorithm detailed here compares favourably to state-of-the-art methods. Autocalibration is the process of transforming a projective reconstruction, which is not suitable for visualization or measurement, into a metric space where it can be used. This is the most challenging part of the 3D reconstruction pipeline, and this thesis presents a novel autocalibration algorithm. Results are shown for two existing cost function-based methods in the literature which failed when applied to underwater video, as well as the proposed hybrid method. The hybrid method combines the best parts of its two parent methods, and produces good results on underwater video. Final results are shown for the 3D reconstruction pipeline operating on short underwater video sequences to produce visually accurate 3D point clouds of the scene, suitable for photorealistic rendering. Although further work remains to extend and improve the pipeline for operation on longer sequences, this thesis presents a proof-of-concept method for 3D reconstruction from uncalibrated underwater video.
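The tracking stage described above can be sketched with OpenCV's pyramidal Lucas-Kanade implementation; the keyframe criterion below (mean feature displacement exceeding a pixel baseline) is an illustrative assumption, not necessarily the thesis's rule:

import cv2
import numpy as np

def track_and_select_keyframes(frames, min_baseline=30.0):
    # Track Shi-Tomasi corners with pyramidal Lucas-Kanade optical flow;
    # keep a frame as a keyframe once the mean displacement of surviving
    # features since the last keyframe exceeds min_baseline pixels.
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    anchor, keyframes = pts.copy(), [0]
    for i, frame in enumerate(frames[1:], start=1):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        ok = status.ravel() == 1
        pts = nxt[ok].reshape(-1, 1, 2)
        anchor = anchor[ok].reshape(-1, 1, 2)
        disp = np.linalg.norm((pts - anchor).reshape(-1, 2), axis=1).mean()
        if disp > min_baseline:
            keyframes.append(i)
            anchor = pts.copy()
        prev = gray
    return keyframes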
APA, Harvard, Vancouver, ISO, and other styles
28

Li, Chung-Te, and 李宗德. "A Brain-Inspired and Visual-Cortex-Aware Interactive 3D Video Processing System." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/15619307105211631973.

Full text
Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Electrical Engineering
Academic year 101 (2012–2013, ROC calendar)
This dissertation describes a brain-inspired and visual-cortex-aware interactive 3D video processing system. Compared with current smart 3DTVs, the proposed system focuses on (1) enhancing 3D visual quality and (2) natural and smart 3D interaction. We enhance 3D visual quality by analyzing perception in the human visual system. Psychologists have shown that human beings perceive 3D effects through various monocular and binocular depth cues, and that while watching a 3D video, conflicts between those cues make the viewer feel unnatural or uncomfortable. At the same time, the lack of 3D content is a well-known fundamental problem for current 3DTV systems. We therefore propose a depth-cue conflict-free 2D-to-3D conversion to generate 3D videos with higher visual quality. To eliminate potential conflicts between depth cues in the converted videos, we compute depth from conventional 2D videos by mimicking how the brain analyzes depth. Since neuroscientists have found that depth perception arises from combining all the depth cues in a Bayesian way (the so-called "Bayesian brain"), we convert 2D videos to 3D by solving a Bayesian inference problem over the depth cues; we call this "brain-inspired 2D-to-3D conversion". Subjectively, it outperforms earlier conversion methods by preserving more reliable depth cues; objectively, it achieves gains of 0.70-3.14 dB in the modified peak signal-to-noise ratio and 0.0059-0.1517 in the disparity distortion measure. In addition, natural and smart 3D interaction is performed between the viewer's intention and the corresponding 3D perception. In our system, viewers express their intentions with their hands, one of the most natural forms of interaction; however, the 3D perception must not be distorted during the interaction, which is a fundamental and necessary condition for a smart 3D interaction system. We model early vision in the visual cortex to ensure that no distortion appears in the 3D perception. Because the images on the retinas are the inputs to the visual cortex, we propose an interactive 3D video retargeting method based on estimating the retinal images and the responses of early vision in the visual cortex, called "visual-cortex-aware interactive 3D retargeting". Notably, while watching television the retinal images are pre-processed in early vision according to the viewing angle, so the proposed retargeting accounts for this pre-processing to preserve a strong 3D perception. Several demonstrations of 3D interaction, including 3D viewpoint-interactive video and a 3D interactive window, are also designed in this dissertation; subjectively, they are much more immersive and preferred over current non-interactive 3D, since the perceptual distortion is greatly reduced. In summary, the proposed brain-inspired 2D-to-3D conversion provides more 3D content with enhanced visual quality, and the proposed visual-cortex-aware interactive 3D retargeting lets viewers enjoy their 3D experience during natural and smart 3D interaction.
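One simple reading of the Bayesian cue combination described above, offered as a hedged sketch rather than the dissertation's actual model: if each monocular cue yields a per-pixel depth estimate with a Gaussian uncertainty, the posterior mean under a flat prior is the precision-weighted average.

import numpy as np

def fuse_depth_cues(cue_depths, cue_vars):
    # cue_depths: list of HxW depth maps, one per cue (e.g. motion,
    # linear perspective, defocus); cue_vars: matching variance maps.
    # Independent Gaussian likelihoods make the posterior mean a
    # precision-weighted average of the cue estimates.
    precisions = [1.0 / v for v in cue_vars]
    total = np.sum(precisions, axis=0)
    fused = np.sum([d * p for d, p in zip(cue_depths, precisions)], axis=0)
    return fused / total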
APA, Harvard, Vancouver, ISO, and other styles
29

Chen, Kai-ping, and 陳凱萍. "Embedded system design and implementation of high efficiency 3D video depth processing." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/81306753328219997936.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Graduate Institute of Electronic and Optoelectronic Engineering (Master's Program)
Academic year 101 (2012–2013, ROC calendar)
Many 2D-to-3D video conversion methods have been proposed to obtain 3D content from existing 2D video, and all of them include depth map generation. Higher 2D video complexity induces heavier computation, and if the conversion runs slowly it greatly reduces the efficiency of an embedded platform. This thesis proposes a low-computation method, a fast depth interpolation algorithm, to improve 2D-to-3D conversion performance. The video is divided into groups of frames, and only the first and last frames of each group are put through the full 2D-to-3D conversion to produce their depth maps. The proposed fast interpolation algorithm then generates the depth maps of the intermediate frames from these two boundary depth maps and the 2D video, reducing the computation load without noticeably affecting the precision of the 3D depth information. The algorithm can be combined with any existing 2D-to-3D conversion method. Experimental results show that the proposed method raises the frame rate by 13.6 fps on x86 and by 7.44 fps on the embedded system compared with the original method, while the highest average PSNR reaches 36.17 dB.
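A minimal sketch of the interpolation idea (per-pixel linear blending between the two converted boundary depth maps; the actual thesis algorithm also consults the 2D video, which is omitted here):

import numpy as np

def interpolate_group_depths(depth_first, depth_last, group_size):
    # Only the group's first and last frames go through the full
    # 2D-to-3D conversion; the frames in between get linearly blended
    # depth maps. Assumes group_size >= 2.
    depths = []
    for k in range(group_size):
        t = k / (group_size - 1)
        blend = (1.0 - t) * depth_first + t * depth_last
        depths.append(blend.astype(depth_first.dtype))
    return depths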
APA, Harvard, Vancouver, ISO, and other styles
30

Yang, Yi, and 楊毅. "3D Fish Animation Generating System Based on Analysis of Video-based Fish Swimming Motion." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/39230407155953886572.

Full text
Abstract:
Master's thesis
National Sun Yat-sen University
Department of Computer Science and Engineering (Graduate Program)
Academic year 103 (2014–2015, ROC calendar)
This thesis describes a system that imitates the swimming behavior of a real fish from video recordings of its motion. The proposed method does not require placing markers or sensors on the target, as in 3D movie production; object data are obtained from videos captured from multiple viewpoints, with top and front cameras recording the fish's motion. The videos are calibrated using the pole-polar relationship, and the deformable fish is tracked by template matching combined with a level-set method regularized by selective binary and Gaussian filtering. After tracking, skeletons are extracted from the contours using Delaunay triangulation, and a proposed line-fitting method combines the 2D skeletons obtained from the two views into a 3D skeleton. The method enables simulation of fish motion that matches the motion of the real fish in the video.
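The template-matching step of the tracker can be sketched with OpenCV (a hedged illustration; the level-set refinement and the pole-polar calibration are not shown):

import cv2

def locate_fish(frame_gray, template):
    # Normalized cross-correlation template matching; the matched
    # window would seed the level-set contour refinement.
    result = cv2.matchTemplate(frame_gray, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    h, w = template.shape
    x, y = max_loc
    return (x, y, w, h), max_val  # bounding box and match confidence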
APA, Harvard, Vancouver, ISO, and other styles
31

Wu, Meng-Jer, and 吳孟哲. "Design of Multi-code CDMA System for 3D stereoscopic Video over wireless ATM Networks." Thesis, 1997. http://ndltd.ncl.edu.tw/handle/80074019134121216152.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Communications Engineering
Academic year 85 (1996–1997, ROC calendar)
This thesis investigates the application of multi-code SS-CDMA techniques to three-dimensional stereoscopic video transmission over wireless ATM networks. 3D visual communication through stereoscopic images achieves fully realistic display and lets users share a virtual reality (VR) world without geographical restrictions. To create a 3D system in which two images (left and right) are transmitted simultaneously over a band-limited mobile channel, a cost-effective MPEG-based wavelet multiresolution coding with joint motion and disparity compensation is developed to reduce the large amount of image information to within the low transmission rate of mobile channels. However, the rapidly variable bit rate (VBR) of MPEG-based 3D video is a weakness when such video is transmitted over a constant bit rate (CBR) mobile channel. The ATM technique is especially well suited to VBR MPEG-based 3D video because of its ability to allocate bandwidth on demand, but since mobile radio has limited channel capacity, a traditional ATM-based SS-CDMA system may not accommodate the MPEG-based 3D video services requested simultaneously by multiple mobile users. To tackle this difficulty, a multi-code CDMA technique is proposed that supports VBR MPEG-based 3D video by varying the number of spreading codes assigned to each video according to its dynamic throughput requirement. Because extra spreading codes also create interference for the other videos and degrade their picture quality, a cost-effective spreading code assignment mechanism is proposed that dynamically assigns an appropriate number of codes to each video, achieving both maximum resource utilization and the best picture quality for multiple video services over the mobile radio channel. Furthermore, fading and interference on SS-CDMA radio channels cause significant transmission errors, and MPEG-based 3D video bit streams are very vulnerable to them, so powerful forward error correction (FEC) codes are necessary for the video data to be transmitted successfully at acceptable signal power levels. Two separate FEC schemes are applied to the header and payload of an ATM cell carrying 3D video data: the header is protected by a relatively powerful FEC code to ensure correct delivery and a low cell loss rate (CLR), while the payload is encoded with varying degrees of error protection according to the priority of the payload data in MPEG-2 video. An adaptive FEC code-combining scheme is proposed that provides good protection for payload data while maximizing its code rate, minimizing the extra bandwidth spent on FEC overhead. Simulation results show that the proposed FEC code-combining scheme keeps the average number of ATM cell losses in each typical 3D picture frame in the order of O(1).
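The core of the dynamic code assignment can be caricatured as a rate-to-codes mapping with a budget constraint. The sketch below is a deliberately simplified stand-in (the thesis balances utilization against multi-code interference, which is reduced here to throttling the heaviest user):

import math

def assign_spreading_codes(video_rates_kbps, base_rate_kbps, total_codes):
    # Each VBR stream requests enough spreading codes for its current
    # bit rate; if demand exceeds the code budget, trim the largest
    # holders first (more codes per user also means more interference).
    assert total_codes >= len(video_rates_kbps), "need one code per stream"
    demand = [max(1, math.ceil(r / base_rate_kbps)) for r in video_rates_kbps]
    while sum(demand) > total_codes:
        demand[demand.index(max(demand))] -= 1
    return demand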
APA, Harvard, Vancouver, ISO, and other styles
32

Tsai, Ho-Cheng, and 蔡和成. "Real-time implementation of Digital 2D-to-3D conversion algorithm for stereo video system." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/80379398905791105716.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Graduate Institute of Electronic and Optoelectronic Engineering (Master's Program)
Academic year 101 (2012–2013, ROC calendar)
This thesis presents a fast and complete 2D-to-3D conversion algorithm for videos with a static background, divided into two parts: a background depth algorithm and a foreground object segmentation algorithm. The background depth algorithm partitions the image using the concept of a horizon, referred to here as the dividing line, which is found from salient edges and color differences. Once found, its location determines the depth layout: a dividing line near the center of the image is set as the farthest distance, while a dividing line near the bottom is taken as the nearest position. The depth map generated from the dividing line then provides the background depth information. For foreground object segmentation, moving objects are cut out using displacement: the difference between the current image and its neighboring images is computed, and the total difference of each adjacent pair determines whether that pair is usable. Qualifying difference images are binarized with a block-based adaptive threshold: blocks with a larger total difference get a lower threshold, so their pixels are more easily preserved, while blocks with a smaller total difference get a higher threshold, so noise is filtered out. If the amount of preserved information is too small, the object is judged incomplete and a rectangle-filling step makes it more complete. The two results are finally combined through DIBR (Depth-Image-Based Rendering): deeper depth values receive larger displacements and shallower values smaller ones, forming a red-blue anaglyph 3D image. The algorithm currently runs on digital TV input at 125 ms per image and, when ported to an embedded board, at 146 ms per image, achieving on-the-fly 2D-to-3D conversion.
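The block-adaptive binarization described above might look like the following sketch (the block size and thresholds are illustrative assumptions):

import cv2
import numpy as np

def foreground_mask(prev_gray, curr_gray, block=16, base_thresh=25.0):
    # Binarize the frame difference with a block-adaptive threshold:
    # blocks with more total motion get a lower threshold (pixels are
    # kept more easily), quiet blocks a higher one (noise is dropped).
    diff = cv2.absdiff(curr_gray, prev_gray).astype(np.float32)
    h, w = diff.shape
    mask = np.zeros((h, w), np.uint8)
    for y in range(0, h, block):
        for x in range(0, w, block):
            roi = diff[y:y + block, x:x + block]
            thresh = base_thresh if roi.mean() > base_thresh else 2 * base_thresh
            mask[y:y + block, x:x + block] = (roi > thresh) * 255
    return mask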
APA, Harvard, Vancouver, ISO, and other styles
33

Lee, Juisheng, and 李睿勝. "Low Complexity 3D Video/Image Depth Generation Algorithm and Its Realization on An Embedded System." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/84221132551212199987.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Academic year 99 (2010–2011, ROC calendar)
This thesis presents a low-complexity 3D video/image depth generation algorithm and its realization on an embedded system. The method automatically generates depth information from a single-view 2D video/image for stereo applications. Because videos/images have different scene characteristics, we propose a classification mechanism that uses edge points and vanishing lines to classify the content into scenery, vanishing-region, and close-up features. According to the detected feature, the 2D content is processed with the proposed low-complexity techniques to generate the depth map. In particular, a human detection method strengthens the depth information in pictures containing people, and a post-processing step repairs the automatically generated depth map. While maintaining good depth map quality, the proposed algorithm reduces complexity by about 75%. Thanks to its low complexity, the algorithm can be implemented in hardware or as embedded software for all kinds of mobile devices and digital photo frames.
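For the "scenery" class, a depth gradient anchored at the detected horizon is the classic low-complexity assignment; the sketch below is an assumed simplification of that idea, not the thesis's exact rule:

import numpy as np

def scenery_depth(height, width, horizon_row):
    # Rows at or above the horizon sit on the far plane (sky/distance);
    # rows below it ramp linearly toward the viewer.
    # Assumes horizon_row < height - 1.
    depth = np.zeros((height, width), np.uint8)
    span = height - 1 - horizon_row
    for y in range(horizon_row + 1, height):
        depth[y, :] = 255 * (y - horizon_row) // span
    return depth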
APA, Harvard, Vancouver, ISO, and other styles
34

Jiang, Jiyin-Chang, and 江錦昌. "LTPS-TFT Digital Cell Library Apply 2D-to-3D Conversion System for Three-dimensional Video." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/6b3gw5.

Full text
Abstract:
Master's thesis
National Taipei University of Technology
Graduate Institute of Computer and Communication
Academic year 96 (2007–2008, ROC calendar)
To build circuits on panel, designers tend to use special device technologies such as Low Temperature Poly Silicon (LTPS) TFT for the various circuits on LCD panels. However, because of the substantial variation in device characteristics, most LTPS-TFT designs still follow a full-custom design style, which wastes manpower and leads to long turn-around times. To achieve a robust and repeatable LCD circuit design flow, we propose constructing an LTPS-TFT cell library for digital circuit design. Video devices now drive autostereoscopic 3D displays, and this thesis describes a 2D-to-3D conversion system-on-chip that requires no depth information. The controller circuit is implemented in a 3 um 1P2M LTPS-TFT technology using a standard cell-based VLSI design flow, and the 2D/3D conversion system without depth information is implemented in TSMC 0.18 um 1P6M technology using a standard cell-based VLSI design flow.
APA, Harvard, Vancouver, ISO, and other styles
35

Wei, Ling-Fei, and 魏凌飛. "Algorithm and Architecture Design of Motion Compensated Infinite Impulse Response Filters for Depth Maps in 3D Video System." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/05374262628791101330.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Electrical Engineering (Master's and Doctoral Program)
Academic year 99 (2010–2011, ROC calendar)
This thesis proposes a motion-compensated infinite impulse response (IIR) filtering algorithm for depth maps in a 3D video system, used for depth map post-processing. If inappropriate depth maps are used to render stereo image pairs, the rendered view flickers and becomes difficult for human eyes to converge on. The proposed filter smooths the depth maps to relieve this uncomfortable visual effect when the eyes focus on the rendered image pair. The algorithm computes the motion magnitude, and the filter coefficient adapts to that magnitude so the filter is applicable to different scenes. Results show that the visual quality of the rendered image pair improves after the depth maps are post-processed by the proposed algorithm. The thesis also presents an architecture design for the algorithm: following a top-down design flow, we expand the design space, extract architectural information, and explore from the top level down to lower levels of abstraction. Through this systematic approach, the optimal solution found in the design space that meets the specification is mapped onto the architecture design.
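The motion-adaptive IIR idea can be written in a few lines; the clipping range and gain below are assumptions, and the motion-compensated warp of the previous output is taken as given:

import numpy as np

def filter_depth(prev_depth_mc, curr_depth, motion_mag, gain=0.05):
    # First-order IIR filter with a motion-adaptive coefficient: where
    # motion is small, lean on the motion-compensated previous depth to
    # suppress temporal flicker; where motion is large, trust the
    # current depth to avoid trailing artifacts.
    alpha = np.clip(gain * motion_mag, 0.1, 1.0)  # weight on current frame
    return alpha * curr_depth + (1.0 - alpha) * prev_depth_mc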
APA, Harvard, Vancouver, ISO, and other styles
36

Lin, Yi-Ting, and 林奕廷. "Design and Implementation of Real-Time Camera Dispatching System with Dynamic Requirements for video Surveillance in 3D Environments." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/55818723439362064670.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Electrical Engineering
Academic year 104 (2015–2016, ROC calendar)
As people grow more concerned about security, demand increases for good monitoring systems that protect people and valuable property. With recent advances in camera technology, video surveillance has become one of the most popular security monitoring approaches, applicable to a variety of environments such as companies, factories, and schools. However, most current video surveillance systems adopt a simple 2D representation of a camera's field of view, while in reality 3D objects, spaces, and areas are involved and cannot be covered precisely by a 2D model. This research therefore designs a real-time video surveillance system better suited to 3D environments under given viewing-angle and resolution requirements. The system uses cameras with horizontal, vertical, and zoom control (pan-tilt-zoom cameras), so their focal lengths and directions can be configured dynamically by algorithms to simultaneously monitor multiple objects, spaces, or areas of different sizes, locations, and orientations. The system can monitor emergency events, such as office, factory, or campus fires and unexpected gatherings or fights, as well as unauthorized entry by visitors, strangers, or burglars. In the performance evaluation, we demonstrate the efficiency of our algorithm against existing algorithms: it uses fewer deployed cameras yet achieves a higher coverage ratio, proving that the developed system can complete camera deployment within the valid time period and monitor more objects, spaces, and areas.
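A heavily simplified sketch of such a dispatcher, in which the feasibility test (viewing angle, resolution, PTZ limits) is abstracted into a callback; this is an assumed greedy baseline, not the thesis's algorithm:

def dispatch_cameras(cameras, targets, can_cover):
    # can_cover(cam, tgt) -> (ok, ptz): whether the camera can satisfy
    # the target's viewing-angle/resolution requirement, and with which
    # pan-tilt-zoom setting. Each camera in turn takes every pending
    # target it can cover; leftovers are reported unassigned.
    pending, plan = set(targets), {}
    for cam in cameras:
        covered = {t: ptz for t in pending
                   for ok, ptz in [can_cover(cam, t)] if ok}
        if covered:
            plan[cam] = covered
            pending -= set(covered)
    return plan, pending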
APA, Harvard, Vancouver, ISO, and other styles
37

Herbison, N., S. Cobb, R. Gregson, I. Ash, R. Eastgate, J. Purdy, T. Hepburn, D. MacKeith, A. Foss, and I. BiT study group. "Interactive binocular treatment (I-BiT) for amblyopia: results of a pilot study of 3D shutter glasses system." 2013. http://hdl.handle.net/10454/9578.

Full text
Abstract:
No
PURPOSE: A computer-based interactive binocular treatment system (I-BiT) for amblyopia has been developed, which utilises commercially available 3D 'shutter glasses'. The purpose of this pilot study was to report the effect of treatment on visual acuity (VA) in children with amblyopia. METHODS: Thirty minutes of I-BiT treatment was given once weekly for 6 weeks. Treatment sessions consisted of playing a computer game and watching a DVD through the I-BiT system. VA was assessed at baseline, mid-treatment, at the end of treatment, and at 4 weeks post treatment. Standard summary statistics and an exploratory one-way analysis of variance (ANOVA) were performed. RESULTS: Ten patients were enrolled with strabismic, anisometropic, or mixed amblyopia. The mean age was 5.4 years. Nine patients (90%) completed the full course of I-BiT treatment with a mean improvement of 0.18 (SD=0.143). Six out of nine patients (67%) who completed the treatment showed a clinically significant improvement of 0.125 LogMAR units or more at follow-up. The exploratory one-way ANOVA showed an overall effect over time (F=7.95, P=0.01). No adverse effects were reported. CONCLUSION: This small, uncontrolled study has shown VA gains with 3 hours of I-BiT treatment. Although it is recognised that this pilot study had significant limitations (it was unblinded, uncontrolled, and too small to permit formal statistical analysis), these results suggest that further investigation of I-BiT treatment is worthwhile.
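The exploratory one-way ANOVA reported above can be reproduced with SciPy; the visual-acuity values below are invented for illustration (the per-patient data are not in the abstract), and a repeated-measures design would strictly call for a repeated-measures test:

from scipy import stats

# Hypothetical LogMAR readings for nine completers at the four
# assessment points; real values are not given in the abstract.
baseline  = [0.60, 0.50, 0.70, 0.55, 0.65, 0.60, 0.50, 0.75, 0.60]
mid       = [0.55, 0.45, 0.60, 0.50, 0.55, 0.55, 0.45, 0.70, 0.50]
end       = [0.45, 0.40, 0.50, 0.45, 0.50, 0.45, 0.40, 0.65, 0.45]
follow_up = [0.45, 0.35, 0.50, 0.40, 0.50, 0.45, 0.40, 0.60, 0.45]

f_stat, p_value = stats.f_oneway(baseline, mid, end, follow_up)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")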
APA, Harvard, Vancouver, ISO, and other styles
38

Draréni, Jamil. "Exploitation de contraintes photométriques et géométriques en vision : application au suivi, au calibrage et à la reconstruction." Thèse, 2010. http://hdl.handle.net/1866/4868.

Full text
Abstract:
The topic of this thesis revolves around three fundamental problems in computer vision: video tracking, camera calibration, and shape recovery. The proposed methods are solely based on photometric and geometric constraints found in the images. Video tracking, usually performed on a video sequence, consists in tracking a region of interest selected manually by an operator. We extend a successful tracking method by adding the ability to estimate the orientation of the tracked object. Furthermore, we consider another fundamental problem in computer vision: calibration. Here we tackle the problem of calibrating linear (pushbroom) cameras and video projectors, which until now were calibrated laboriously. For the former we propose a convenient plane-based calibration algorithm, and for the latter, a calibration algorithm that does not require a physical grid together with a planar auto-calibration algorithm. Finally, we point our third research direction toward shape reconstruction using coplanar shadows. This technique is known to suffer from a bas-relief ambiguity if no extra information on the scene or light source is provided. We propose a simple method to reduce this ambiguity from four parameters to a single one, achieved by taking into account the visibility of the light spots in the camera.
This thesis was carried out under a joint supervision (cotutelle) agreement with the Institut National Polytechnique de Grenoble (France). The research was conducted in the 3D vision laboratory (DIRO, UdM) and the PERCEPTION-INRIA laboratory (Grenoble).
APA, Harvard, Vancouver, ISO, and other styles
39

"Image motion estimation for 3D model based video conferencing." 2000. http://library.cuhk.edu.hk/record=b5890519.

Full text
Abstract:
Cheung Man-kin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 116-120).
Abstracts in English and Chinese.
Table of contents:
1 Introduction
  1.1 Building of the 3D Wireframe and Facial Model
  1.2 Description of 3D Model Based Video Conferencing
  1.3 Wireframe Model Fitting or Conformation
  1.4 Pose Estimation
  1.5 Facial Motion Estimation and Synthesis
  1.6 Thesis Outline
2 Wireframe Model Fitting
  2.1 Algorithm of WFM Fitting
    2.1.1 Global Deformation: (a) scaling; (b) shifting
    2.1.2 Local Deformation: (a) shifting; (b) scaling
    2.1.3 Fine Updating
  2.2 Steps of Fitting
  2.3 Functions of Different Deformations
  2.4 Experimental Results
    2.4.1 Output wireframe in each step
    2.4.2 Examples of mis-fitted wireframe with incoming image
    2.4.3 Fitted 3D facial wireframe
    2.4.4 Effect of mis-fitted wireframe after compensation of motion
  2.5 Summary
3 Epipolar Geometry
  3.1 Pinhole Camera Model and Perspective Projection
  3.2 Concepts in Epipolar Geometry
    3.2.1 Working with normalized image coordinates
    3.2.2 Working with pixel image coordinates
    3.2.3 Summary
  3.3 8-point Algorithm (Essential and Fundamental Matrix)
    3.3.1 Outline of the 8-point algorithm
    3.3.2 Modification of the obtained Fundamental Matrix
    3.3.3 Transformation of image coordinates: (a) translation to mean of points; (b) normalizing transformation
    3.3.4 Summary of the 8-point algorithm
  3.4 Estimation of Object Position by Decomposition of the Essential Matrix
    3.4.1 Algorithm Derivation
    3.4.2 Algorithm Outline
  3.5 Noise Sensitivity
    3.5.1 Rotation vector of the model
    3.5.2 The projection of the rotated model
    3.5.3 Noisy image
    3.5.4 Summary
4 Pose Estimation
  4.1 Linear Method
    4.1.1 Theory
    4.1.2 Normalization
    4.1.3 Experimental Results: (a) synthesized image by the linear method without normalization; (b) performance of the linear method with and without normalization; (c) performance of the linear method under quantization noise with different transformation components; (d) performance of the normalized case without transformation in the z-component
    4.1.4 Summary
  4.2 Two-Stage Algorithm
    4.2.1 Introduction
    4.2.2 The Two-Stage Algorithm: (a) Stage 1 (iterative method); (b) Stage 2 (non-linear optimization)
    4.2.3 Summary of the Two-Stage Algorithm
    4.2.4 Experimental Results
    4.2.5 Summary
5 Facial Motion Estimation and Synthesis
  5.1 Facial Expression based on Face Muscles
    5.1.1 Review of the Action Unit Approach
    5.1.2 Distribution of Motion Units
    5.1.3 Algorithm: (a) unidirectional motion units; (b) circular motion units (eyes); (c) circular motion units (mouth)
    5.1.4 Experimental Results
    5.1.5 Summary
  5.2 Detection of Facial Expression by a Muscle-based Approach
    5.2.1 Theory
    5.2.2 Algorithm: (a) sheet muscle; (b) circular muscle; (c) mouth muscle
    5.2.3 Steps of the Algorithm
    5.2.4 Experimental Results
    5.2.5 Summary
6 Conclusion
  6.1 WFM Fitting
  6.2 Pose Estimation
  6.3 Facial Estimation and Synthesis
  6.4 Discussion on Future Improvements: WFM fitting; pose estimation; facial motion estimation and synthesis
7 Appendix
  7.1 Newton's Method (Newton-Raphson Method)
  7.2 H.261
  7.3 3D Measurement
Bibliography
APA, Harvard, Vancouver, ISO, and other styles
40

Kuo, Pin-Chen, and 郭品岑. "Fast Algorithms and Their VLSI Implementation for 3D Video Systems." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/70566420966927893880.

Full text
Abstract:
Doctoral dissertation
National Cheng Kung University
Department of Electrical Engineering
Academic year 104 (2015–2016, ROC calendar)
In recent years, three-dimensional (3D) video systems, which represent scenes with color texture plus depth, have brought users a more realistic visual experience. To enhance the video quality of such systems, this dissertation proposes fast algorithms and hardware designs for a 3D video system. First, a fast mode decision method is proposed to reduce the computational complexity of three-dimensional high-efficiency video coding (3D-HEVC): the boundary of the depth map is extracted and used to detect the coding unit partition mode before the color texture is partitioned. Compared with related work, this algorithm reduces encoding time by about 67.49%. Second, for stereo matching, an adaptive support weight method and a census transform-based algorithm are proposed to estimate the depth map. The adaptive support weight method works in two stages: a rough depth map is generated rapidly in the first stage, and an adaptive case-by-case refinement is applied in the second, cutting computation time by about 85%. The census transform-based algorithm uses an iterative aggregation process that reduces complexity and suits hardware realization; the corresponding VLSI implementation reaches 60 fps at 1080p resolution. For the depth image-based rendering (DIBR) system, two inpainting-based algorithms are proposed, for nine-view and multi-view rendering respectively. The nine-view DIBR method is implemented on a GPU to increase rendering speed. For multi-view rendering, a patch-matched hole-filling algorithm searches the surrounding region with an adaptive window size for the best patch to fill the holes; this design supports 1080p video in real time at a maximum operating frequency of 160.2 MHz. The dissertation also proposes a hardware architecture for a new texture-plus-depth, 2D-compatible packing and de-packing format, whose operating frequency reaches 166.56 MHz and which supports frame sizes up to Full HD in real time.
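The census-transform cost named above is simple enough to sketch; this is a generic textbook version (the window size and cost aggregation are assumptions, and the thesis's iterative aggregation is not shown):

import numpy as np

def census_transform(gray, window=5):
    # Encode each pixel as a bit string of comparisons against its
    # window neighbours; a 5x5 window yields 24 bits, fitting uint32.
    r = window // 2
    h, w = gray.shape
    census = np.zeros((h, w), np.uint32)
    center = gray[r:h - r, r:w - r]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = gray[r + dy:h - r + dy, r + dx:w - r + dx]
            block = census[r:h - r, r:w - r]
            census[r:h - r, r:w - r] = (block << 1) | (shifted < center).astype(np.uint32)
    return census

def hamming_cost(census_left, census_right):
    # Matching cost between two census codes = number of differing bits.
    x = census_left ^ census_right
    cost = np.zeros(x.shape, np.uint8)
    while np.any(x):
        cost += (x & 1).astype(np.uint8)
        x = x >> 1
    return cost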
APA, Harvard, Vancouver, ISO, and other styles
41

Hu, Ya-Han, and 胡雅涵. "An Object-based 3D Effect Adjustment System for Stereoscopic Videos." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/akwgb9.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Institute of Computer and Communication Engineering
Academic year 103 (2014–2015, ROC calendar)
Recently, three-dimensional (3D) capture and display techniques have developed rapidly, bringing mass production of stereoscopic 3D content, and post-production adjustment of the 3D effect in stereoscopic images has become a topic worth exploring. Reducing the 3D depth of an object can restrain viewers' visual fatigue, while increasing it can achieve stronger 3D perception; post-processing of the 3D effect is thus a low-cost, adjustable, and feasible solution. To this end, we propose a framework that adjusts the 3D effect of a specific object according to the preference of the 3D effect director. However, adjusting an object's disparity and depth inevitably causes view incompletion and temporal discontinuity, so we also propose a stereoscopic inpainting algorithm that changes the depth information while keeping stereoscopic consistency and temporal continuity for better viewing performance. Simulation results show that the proposed framework achieves the targeted 3D effect adjustment while maintaining viewing quality, and the framework is also compatible with other stereo matching and object segmentation methods.
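The adjustment itself reduces to rescaling disparity inside the object mask, after which the inpainting stage repairs the exposed regions; a minimal sketch (the gain parameter is an assumption):

import numpy as np

def adjust_object_disparity(disparity, object_mask, gain=1.5):
    # Scale the disparity of one segmented object to strengthen
    # (gain > 1) or flatten (gain < 1) its perceived depth, leaving the
    # rest of the scene untouched; vacated and newly occluded regions
    # must then be repaired by the stereoscopic inpainting stage.
    out = disparity.astype(np.float32).copy()
    out[object_mask] *= gain
    return out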
APA, Harvard, Vancouver, ISO, and other styles
42

Tseng, Hsi-Chun, and 曾喜駿. "Efficient 3D Video Packing and Comfort Disparity Modification for Traditional Broadcasting Systems." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/03443684237570760409.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Institute of Computer and Communication Engineering
Academic year 101 (2012–2013, ROC calendar)
In recent years, 3DTV displays have become popular in the consumer market, and delivering 3D video services everywhere through cable, terrestrial channels, and the Internet has become an important issue; bandwidth limitation remains the main problem to be solved. The first difficulty is the frame-compatible format: the best-known 3D frame-compatible formats for transmitting stereoscopic views are side-by-side and top-and-bottom, but a better frame-compatible format is still needed for packing one view together with its depth information. A further problem is visual discomfort, which commonly appears when the audience watches stereoscopic 3D video for a long period. Addressing these problems, this thesis proposes several efficient 3D video packing methods and a comfort disparity modification. Experimental results show that the coding performance of the novel 3D video packing formats is better than the current formats, and the comfort disparity helps adjust existing DIBR techniques to generate more comfortable stereoscopic videos.
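For reference, the two standard frame-compatible packings mentioned above are trivial to express (a sketch using OpenCV resizing; anti-alias filtering choices are left to the implementation):

import cv2
import numpy as np

def pack_side_by_side(left, right):
    # Halve each view horizontally and place them side by side so the
    # stereo pair rides through a 2D broadcast chain unchanged.
    h, w = left.shape[:2]
    return np.hstack([cv2.resize(left, (w // 2, h)),
                      cv2.resize(right, (w // 2, h))])

def pack_top_and_bottom(left, right):
    # Same idea with half vertical resolution per view.
    h, w = left.shape[:2]
    return np.vstack([cv2.resize(left, (w, h // 2)),
                      cv2.resize(right, (w, h // 2))])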
APA, Harvard, Vancouver, ISO, and other styles
43

Liao, Chung-Chen, and 廖崇辰. "3D Computer Graphic Special Effect Editing Systems for Multi-View Video Production." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/09649066564292724755.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Chiu, Han-Pang, and 邱漢邦. "3D C-String: A New Spatio-temporal Knowledge Representation for Video Database Systems." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/76749926736281833530.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Information Management
Academic year 89 (2000–2001, ROC calendar)
The knowledge structure called 2D C+-string, proposed by P.W. Huang et al., represents the spatial relations between the objects in an image and allows spatial knowledge in images to be represented. The knowledge structure called 3D string, proposed by A.L.P. Chen et al., represents the spatial and temporal relations between the objects in a video; however, because it represents each object only by its center point and starting frame number, it cannot handle overlapping relations in the spatial and temporal dimensions, nor information about motions and size changes. In this thesis, we propose a new spatio-temporal knowledge representation called 3D C-string that overcomes these weaknesses. Based on the concepts of 2D C+-string, 3D C-string uses the projections of video objects to represent the spatial and temporal relations between them, and it can also keep track of the motions and size changes of the objects in a video. This approach provides an easy and efficient way to retrieve, visualize, and manipulate video objects in video database systems. String generation and video reconstruction algorithms for the 3D C-string representation are also developed: by introducing the concepts of template objects and nearest former objects, the string generated by the string generation algorithm is unique, and so is the symbolic video reconstructed from a given 3D C-string. Compared with spatial relation inference and similarity retrieval in image database systems, their counterparts in video database systems are fuzzier concepts; we therefore extend the ideas behind relation inference and similarity retrieval from 2D C+-string to 3D C-string, define similarity measures, and propose a similarity retrieval algorithm. Finally, experiments show the efficiency and effectiveness of the proposed algorithm.
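The building block of such string representations is the relation between two objects' projections on one axis (or between two time spans). A toy classifier, much coarser than the actual 3D C-string cutting mechanism:

def interval_relation(a_begin, a_end, b_begin, b_end):
    # Classify the 1D relation between two projections; a C-string
    # style representation records such relations per dimension.
    if a_end < b_begin:
        return "A before B"
    if b_end < a_begin:
        return "B before A"
    if a_end == b_begin:
        return "A meets B"
    if b_end == a_begin:
        return "B meets A"
    if a_begin >= b_begin and a_end <= b_end:
        return "A contained in B"
    if b_begin >= a_begin and b_end <= a_end:
        return "B contained in A"
    return "partial overlap"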
APA, Harvard, Vancouver, ISO, and other styles
45

"Creating virtual environment by 3D computer vision techniques." 2000. http://library.cuhk.edu.hk/record=b5890257.

Full text
Abstract:
Lao Tze Kin Jackie.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 83-87).
Abstracts in English and Chinese.
Table of contents:
1 Introduction
  1.1 3D Modeling using Active Contour
  1.2 Rectangular Virtual Environment Construction
  1.3 Thesis Contribution
  1.4 Thesis Outline
2 Background
  2.1 Panoramic Representation
    2.1.1 Static Mosaic
    2.1.2 Advanced Mosaic Representation
    2.1.3 Panoramic Walkthrough
  2.2 Active Contour Model
    2.2.1 Parametric Active Contour Model
  2.3 3D Shape Estimation
    2.3.1 Model formation with both intrinsic and extrinsic parameters
    2.3.2 Model formation with only intrinsic parameters and epipolar geometry
3 3D Object Modeling using Active Contour
  3.1 Point Acquisition through Active Contour
  3.2 Object Segmentation and Panorama Generation
    3.2.1 Object Segmentation
    3.2.2 Panorama Construction
  3.3 3D Modeling and Texture Mapping
    3.3.1 Texture Mapping from Parameterization
  3.4 Experimental Results
    3.4.1 Experimental Error
    3.4.2 Comparison between the virtual 3D model and the actual model
    3.4.3 Comparison with Existing Techniques
  3.5 Discussion
4 Rectangular Virtual Environment Construction
  4.1 Construction using Traditional (Horizontal) Panoramic Scenes
    4.1.1 Image Manipulation
    4.1.2 Panoramic Mosaic Creation
    4.1.3 Measurement of Panning Angles
    4.1.4 Estimation of the Side Ratio
    4.1.5 Wireframe Modeling and Cylindrical Projection
    4.1.6 Experimental Results
  4.2 Construction using Vertical Panoramic Scenes
  4.3 Building Virtual Environments for Complex Scenes
  4.4 Comparison with Existing Techniques
  4.5 Discussion and Future Directions
5 System Integration
6 Conclusion
Bibliography
APA, Harvard, Vancouver, ISO, and other styles
46

(8771429), Ashley S. Dale. "3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING." Thesis, 2021.

Find full text
Abstract:

An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1 score over all classes and epochs for networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for networks trained exclusively with real data, and the standard deviation of the maximum mean F1 score is σ*F1 = 0.015 for synthetically trained networks, compared to σF1 = 0.020 for networks trained exclusively with real data. Varying the backgrounds in the synthetic data was shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network; the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias toward clustering based on image background.
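The latent-space analysis can be sketched with scikit-learn and umap-learn; the latent vectors below are random stand-ins, since the actual VAE outputs are not available here:

import numpy as np
from sklearn.decomposition import PCA
import umap  # from the umap-learn package

# Stand-in latent vectors: rows are images, is_real flags the small
# real-world subset mixed into the synthetic data.
latents = np.random.randn(1000, 128).astype(np.float32)
is_real = np.random.rand(1000) < 0.1

pca_2d = PCA(n_components=2).fit_transform(latents)
umap_2d = umap.UMAP(n_components=2, random_state=42).fit_transform(latents)

# If real and synthetic points mix within the same clusters in both
# embeddings, the latent space does not separate the two domains --
# the conclusion the abstract reports.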

APA, Harvard, Vancouver, ISO, and other styles
