Dissertations / Theses on the topic 'Computer vision'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Computer vision.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Revell, James Duncan. "Computer vision elastography." Thesis, University of Bristol, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.412361.
Chiu, Kevin (Kevin Geeyoung). "Vision on tap : an online computer vision toolkit." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/67714.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 60-64).
In this thesis, we present an online toolkit that combines a Scratch-based programming environment with computer vision libraries, manifested as blocks within the environment, and integrates them with a community platform for diffusing advances in computer vision to a general populace. We show that with these tools, non-developers are able to create and publish computer vision applications. The visual development environment includes a collection of algorithms that, despite being well known in the computer vision community, provide capabilities to commodity cameras that are not yet common knowledge. In support of this environment, we also present an online community that allows users to share applications made in it, assisting the dissemination of both the knowledge of camera capabilities and the capabilities themselves to users who have not yet been exposed to them or are not yet comfortable with their use. Initial evaluations consist of user studies that quantify the abilities the toolkit affords novice computer vision users, baselined against experienced computer vision users.
by Kevin Chiu.
S.M.
Rihan, Jonathan. "Computer vision based interfaces for computer games." Thesis, Oxford Brookes University, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.579554.
Klomark, Marcus. "Occupant Detection using Computer Vision." Thesis, Linköping University, Linköping University, Computer Vision, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54363.
The purpose of this master’s thesis was to study the possibility of using computer vision methods to detect and classify objects in the front passenger seat of a car. This work presents different approaches to solving this problem and evaluates the usefulness of each technique. The classification information should later be used to modulate the speed and force of the airbag, in order to provide each occupant with optimal protection and safety.
This work shows that computer vision has great potential to provide data that may be used to perform reliable occupant classification. The future choice of method depends on many factors, for example cost and the requirements imposed on the system by law and by car manufacturers. Further evaluation and testing of the methods in this thesis, of other methods, of the ABE approach, and of post-processing of the results should also be carried out before a reliable classification algorithm can be written.
Purdy, Eric. "Grammatical methods in computer vision." Thesis, The University of Chicago, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3557428.
In computer vision, grammatical models are models that represent objects hierarchically as compositions of sub-objects. This allows us to specify rich object models in a standard Bayesian probabilistic framework. In this thesis, we formulate shape grammars, a probabilistic model of curve formation that allows for both continuous variation and structural variation. We derive an EM-based training algorithm for shape grammars. We demonstrate the effectiveness of shape grammars for modeling human silhouettes, and also demonstrate their effectiveness in classifying curves by shape. We also give a general method for heuristically speeding up a large class of dynamic programming algorithms. We provide a general framework for discussing coarse-to-fine search strategies, and provide proofs of correctness. Our method can also be used with inadmissible heuristics.
Finally, we give an algorithm for doing approximate context-free parsing of long strings in linear time. We define a notion of approximate parsing in terms of restricted families of decompositions, and construct small families which can approximate arbitrary parses.
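The chart-parsing machinery behind grammatical models of this kind can be made concrete with a standard probabilistic CYK parser. This is the textbook cubic-time algorithm, shown only as an illustration of what "best parse" means; it is not the thesis's linear-time approximate parser, and the toy grammar below is hypothetical.

```python
# Illustrative probabilistic CYK parsing over a tiny grammar in Chomsky
# normal form. (lhs, rhs) -> rule probability; rhs is a 1-tuple (terminal)
# or a 2-tuple (pair of nonterminals).
import math
from collections import defaultdict

RULES = {
    ("S", ("A", "B")): 0.9,
    ("S", ("a",)): 0.1,
    ("A", ("a",)): 1.0,
    ("B", ("b",)): 1.0,
}

def cyk_best(tokens):
    """Log-probability of the best parse of `tokens` rooted at 'S'."""
    n = len(tokens)
    best = defaultdict(lambda: float("-inf"))  # (i, j, symbol) -> log prob
    for i, tok in enumerate(tokens):           # fill in the terminals
        for (lhs, rhs), p in RULES.items():
            if rhs == (tok,):
                best[i, i + 1, lhs] = max(best[i, i + 1, lhs], math.log(p))
    for span in range(2, n + 1):               # combine adjacent spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, rhs), p in RULES.items():
                    if len(rhs) == 2:
                        score = (math.log(p)
                                 + best[i, k, rhs[0]]
                                 + best[k, j, rhs[1]])
                        best[i, j, lhs] = max(best[i, j, lhs], score)
    return best[0, n, "S"]
```

The thesis's contribution is precisely to avoid this cubic blow-up on long strings by restricting the family of decompositions considered.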
Newman, Rhys A. "Automatic learning in computer vision." Thesis, University of Oxford, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390526.
Crossley, Simon. "Robust temporal stereo computer vision." Thesis, University of Sheffield, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.327614.
Fletcher, Gordon James. "Geometrical problems in computer vision." Thesis, University of Liverpool, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.337166.
Mirmehdi, Majid. "Transputer configurations for computer vision." Thesis, City University London, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.292339.
Hovhannisyan, Vahan. "Multilevel optimisation for computer vision." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/55874.
Clayton, Tyler (Tyler T. ). "Motion tracking with computer vision." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/109687.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 27).
In the Mechatronics laboratory, work is being done to develop methods for robot collision avoidance. A vital component of the project is motion detection and tracking. Currently, 3D imaging software and hardware are employed, but this technique suffers from blind spots in the environment. Because the camera is placed directly above the robot, the regions underneath the robot are unobserved, which is a major problem: the robot is meant to work side by side with a human counterpart to allow quicker assembly of parts, yet with the current vision system it cannot detect limbs that maneuver underneath its linkages. In this thesis, an automated rotary vision system attachable to each linkage of the robot is proposed. By attaching cameras directly to the robot, blind spots are eliminated and objects in the environment can be detected. The proposed assembly involves a four-piece clamp-on shaft collar: two parts clamp to the linkage while the other two clamp around them, enabling free rotation. In testing, this solution was able to track and detect objects, but at the cost of added weight on the linkages and slower image processing. Suggestions for improving the device are outlined. Overall, it shows much promise for the Optical Assembly Station.
by Tyler Clayton.
S.B.
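The detection step a camera system like this needs can be illustrated with a minimal frame-differencing baseline. This is a generic textbook approach, not the thesis's implementation; the threshold value is arbitrary.

```python
# Minimal frame-differencing motion detector for grayscale frames: flag
# pixels whose intensity changed more than a threshold between two frames.
import numpy as np

def detect_motion(prev_frame, curr_frame, threshold=25):
    """Boolean mask of pixels that changed noticeably between two frames."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def motion_fraction(prev_frame, curr_frame, threshold=25):
    """Fraction of the image flagged as moving, a crude trigger signal."""
    return float(detect_motion(prev_frame, curr_frame, threshold).mean())
```

Casting to a signed type before subtracting avoids the wrap-around that unsigned image arrays would otherwise produce.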
Christie, Gordon A. "Computer Vision for Quarry Applications." Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/42762.
Master of Science
Anani-Manyo, Nina K. "Computer Vision and Building Envelopes." Kent State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=kent1619539038754026.
Chavali, Neelima. "Object Proposals in Computer Vision." Thesis, Virginia Tech, 2015. http://hdl.handle.net/10919/56590.
Master of Science
Kileel, Joseph David. "Algebraic Geometry for Computer Vision." Thesis, University of California, Berkeley, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10282753.
This thesis uses tools from algebraic geometry to solve problems about three-dimensional scene reconstruction. 3D reconstruction is a fundamental task in multiview geometry, a field of computer vision. Given images of a world scene, taken by cameras in unknown positions, how can we best build a 3D model for the scene? Novel results are obtained for various challenging minimal problems, which are important algorithmic routines in Random Sampling Consensus pipelines for reconstruction. These routines reduce overfitting when outliers are present in image data.
Our approach throughout is to formulate inverse problems as structured systems of polynomial equations, and then to exploit underlying geometry. We apply numerical algebraic geometry, commutative algebra and tropical geometry, and we derive new mathematical results in these fields. We present simulations on image data as well as an implementation of general-purpose homotopy-continuation software for implicitization in computational algebraic geometry.
Chapter 1 introduces the relevant computer vision background. Chapters 2 and 3 are devoted to the recovery of camera positions from images. We resolve an open problem concerning two calibrated cameras raised by Sameer Agarwal, a vision expert at Google Research, by using the algebraic theory of Ulrich sheaves. This gives a robust test for identifying outliers in terms of spectral gaps. Next, we quantify the algebraic complexity of notoriously poorly understood cases for three calibrated cameras. This is achieved by formulating them in terms of structured linear sections of an explicit moduli space and then computing via homotopy continuation. In Chapter 4, a new framework for modeling image distortion is proposed, based on lifting algebraic varieties in projective space to varieties in other toric varieties. We check that our formulation leads to faster and more stable solvers than the state of the art. Lastly, Chapter 5 concludes by studying possible pictures of simple objects, as varieties inside products of projective planes. In particular, this dissertation demonstrates that algebro-geometric methods can be useful in practical settings.
Reading, Ivan Alaric Derrick. "Pedestrian detection by computer vision." Thesis, Edinburgh Napier University, 1999. http://researchrepository.napier.ac.uk/Output/6915.
Ryan, David Andrew. "Crowd monitoring using computer vision." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/65652/1/David_Ryan_Thesis.pdf.
AHMED, WAQAR. "Collaborative Learning in Computer Vision." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1069010.
Douillard, Arthur. "Continual Learning for Computer Vision." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS165.
I first review the existing methods based on regularization for continual learning. While regularizing a model's probabilities is very efficient at reducing forgetting on large-scale datasets, few works consider constraints on intermediate features. In this chapter I cover two contributions aiming to regularize the latent space of a ConvNet directly. The first, PODNet, reduces the drift of spatial statistics between the old and new models, which drastically reduces forgetting of old classes while enabling efficient learning of new classes. In a second part I show a complementary method in which we avoid pre-emptive forgetting by allocating locations in the latent space for yet-unseen future classes. Then, I describe a recent application of CIL to semantic segmentation. I show that the very nature of CSS offers new specific challenges, namely forgetting on large images and a background shift. We tackle the first problem by extending the distillation loss introduced in the previous chapter to multiple scales. The second problem is solved by an efficient pseudo-labeling strategy. Next, we consider common rehearsal learning, applied this time to CSS. I show that it cannot be used naively because of memory complexity, and design a lightweight rehearsal that is even more efficient. Finally, I consider a completely different approach to continual learning: dynamic networks, where parameters are extended during training to adapt to new tasks. Previous works in this domain are hard to train and often suffer from parameter-count explosion. For the first time in continual computer vision, we propose to use the Transformer architecture: the model dimensions are mostly fixed and shared across tasks, except for an expansion of learned task tokens.
With an encoder/decoder strategy in which the decoder forward pass is specialized by a task token, we show state-of-the-art robustness to forgetting while our memory and computational complexities barely grow.
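The spatial-statistics distillation idea attributed to PODNet above can be sketched, in deliberately simplified form, as a penalty on pooled activation maps. This is a loose illustration, not the published loss; the (C, H, W) shape convention and the plain L2 weighting are assumptions.

```python
# Simplified POD-style spatial distillation: penalise drift between the
# height- and width-pooled statistics of an old and a new model's
# activation maps.
import numpy as np

def pod_spatial_loss(old_feat, new_feat):
    """Scalar penalty on pooled spatial statistics of two (C, H, W) maps."""
    loss = 0.0
    for axis in (1, 2):  # pool over height, then over width
        pooled_old = old_feat.sum(axis=axis)
        pooled_new = new_feat.sum(axis=axis)
        loss += np.linalg.norm(pooled_old - pooled_new) / pooled_old.size
    return float(loss)
```

Pooling before comparing is the key design choice: it constrains where activation mass sits spatially without pinning every individual activation, leaving the new model slack to learn new classes.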
Riba, Pi Edgar. "Geometric Computer Vision Techniques for Scene Reconstruction." Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/671624.
From the early stages of Computer Vision, scene reconstruction has been one of the most studied topics, leading to a wide variety of new discoveries and applications. Object grasping and manipulation, localization and mapping, or even visual effect generation are different examples of applications in which scene reconstruction has taken an important role in industries such as robotics, factory automation, or audiovisual production. However, scene reconstruction is an extensive topic that can be approached in many different ways, with already existing solutions that work effectively in controlled environments. Formally, the problem of scene reconstruction can be formulated as a sequence of independent processes which compose a pipeline. In this thesis, we analyse some parts of the reconstruction pipeline, to which we contribute novel methods using Convolutional Neural Networks (CNN), proposing innovative solutions that consider the optimisation of the methods in an end-to-end fashion. First, we review the state of the art of classical local feature detectors and descriptors and contribute two novel methods that inherently improve pre-existing solutions in the scene reconstruction pipeline. It is a fact that computer science and software engineering are two fields that usually go hand in hand and evolve according to mutual needs, making the design of complex and efficient algorithms easier. For this reason, we contribute Kornia, a library specifically designed to work with classical computer vision techniques along with deep neural networks. In essence, we created a framework that eases the design of complex pipelines for computer vision algorithms so that they can be included within neural networks and be used to backpropagate gradients through a common optimisation framework. Finally, in the last chapter of this thesis we develop the aforementioned concept of designing end-to-end systems with classical projective geometry.
Thus, we contribute a solution to the problem of synthetic view generation by hallucinating novel views of highly deformable cloth objects using a geometry-aware end-to-end system. To summarize, in this thesis we demonstrate that a proper design combining classical geometric computer vision methods with deep learning techniques can lead to improved solutions for the problem of scene reconstruction.
Watiti, Tom Wanjala. "Vision-based virtual mouse system." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2009. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.
Wakefield, Jonathan P. "A framework for generic computer vision." Thesis, University of Huddersfield, 1994. http://eprints.hud.ac.uk/id/eprint/4003/.
Lankton, Shawn M. "Localized statistical models in computer vision." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/31644.
Committee Chair: Tannenbaum, Allen; Committee Member: Al Regib, Ghassan; Committee Member: Niethammer, Marc; Committee Member: Shamma, Jeff; Committee Member: Stillman, Arthur; Committee Member: Yezzi, Anthony. Part of the SMARTech Electronic Thesis and Dissertation Collection.
Barngrover, Christopher M. "Computer vision techniques for underwater navigation." Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/fullcit?p1477884.
Title from first page of PDF file (viewed July 10, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (leaf 59).
Zandifar, Ali. "Computer vision for scene text analysis." College Park, Md. : University of Maryland, 2004. http://hdl.handle.net/1903/1767.
Thesis research directed by: Electrical Engineering. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
Johansson, Björn. "Multiscale Curvature Detection in Computer Vision." Licentiate thesis, Linköping University, Linköping University, Computer Vision, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54966.
This thesis presents a new method for detection of complex curvatures such as corners, circles, and star patterns. The method is based on a second degree local polynomial model applied to a local orientation description in double angle representation. The theory of rotational symmetries is used to compute curvature responses from the parameters of the polynomial model. The responses are made more selective using a scheme of inhibition between different symmetry models. These symmetries can serve as feature points at a high abstraction level for use in hierarchical matching structures for 3D estimation, object recognition, image database search, etc.
A very efficient approximative algorithm for single and multiscale polynomial expansion is developed, which is used for detection of the complex curvatures in one or several scales. The algorithm is based on the simple observation that polynomial functions multiplied with a Gaussian function can be described in terms of partial derivatives of the Gaussian. The approximative polynomial expansion algorithm is evaluated in an experiment to estimate local orientation on 3D data, and the performance is comparable to previously tested algorithms which are more computationally expensive.
The curvature algorithm is demonstrated on natural images and in an object recognition experiment. Phase histograms based on the curvature features are developed and shown to be useful as an alternative compact image representation.
The importance of curvature is furthermore motivated by reviewing examples from biological and perceptual studies. The usefulness of local orientation information to detect curvature is also motivated by an experiment about learning a corner detector.
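The observation above, that a polynomial multiplied by a Gaussian can be written in terms of Gaussian derivatives, suggests estimating local polynomial coefficients by derivative-of-Gaussian filtering. Below is a 1D sketch of that principle (illustrative only; the sigma and kernel radius are arbitrary choices, and the thesis's algorithm is multiscale and 2D/3D).

```python
# Estimate the local slope of a 1D signal by convolving with a
# derivative-of-Gaussian kernel. np.convolve flips the kernel, so
# convolving with g'(x) yields the derivative of the Gaussian-smoothed
# signal.
import numpy as np

def gaussian(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def gaussian_deriv(x, sigma):
    return -x / sigma**2 * gaussian(x, sigma)  # d/dx of the Gaussian

def local_slope(signal, sigma=2.0, radius=8):
    """First-order (slope) coefficient of a local polynomial model."""
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = gaussian_deriv(x, sigma)
    return np.convolve(signal, kernel, mode="same")
```

On a linear ramp the interior output recovers the ramp's slope, which is the sense in which Gaussian-derivative filters read off polynomial coefficients.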
Bårman, Håkan. "Hierarchical curvature estimation in computer vision." Doctoral thesis, Linköpings universitet, Bildbehandling, 1991. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54887.
Safari-Foroushani, Ramin. "Form registration, a computer vision approach." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0012/NQ52413.pdf.
Yuan, Dan. "Environmental exploration via computer vision techniques /." Diss., Digital Dissertations Database. Restricted to UC campuses, 2007. http://uclibs.org/PID/11984.
Phillips, Walter. "VHDL design of computer vision tasks." Honors in the Major Thesis, University of Central Florida, 2001. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/240.
Bachelors
Engineering
Computer Science
Marshall, Christopher. "Robot trajectory generation using computer vision." Thesis, University of Newcastle Upon Tyne, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.443107.
Robles-Kelly, Antonio A. "Graph-spectral methods for computer vision." Thesis, University of York, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.399252.
Ali, Abdulamer T. "Computer vision aided road traffic analysis." Thesis, University of Bristol, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333953.
Sofeikov, Konstantin Igorevich. "Measure concentration in computer vision applications." Thesis, University of Leicester, 2018. http://hdl.handle.net/2381/42791.
Viloria, John A. (John Alexander) 1978. "Optimizing clustering algorithms for computer vision." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/86847.
Panish, Robert Martin. "Vehicle egomotion estimation using computer vision." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/46370.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 107-108).
A vision-based navigation filter is developed for application on UAVs and tested in simulation. This filter is meant to allow the UAV to navigate in GPS-denied environments using measurements from a suite of cameras. The extended Kalman filter integrates measurements from multiple non-overlapping cameras as well as an IMU and occasional GPS. Simulations are conducted to evaluate the performance of the filter in a variety of flight regimes as well as to assess the value of using multiple cameras. Simulations demonstrate the value of using multiple cameras for egomotion estimation. Multiple non-overlapping cameras are useful for resolving motion in an unobservable direction that manifests as an ambiguity between translation and rotation. Additionally, multiple cameras are extremely useful when flying in an environment such as an urban canyon, where features remain in the fields of view for a very short period of time.
by Robert Martin Panish.
S.M.
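The sensor-fusion principle behind such an extended Kalman filter can be illustrated with a much simpler toy: a scalar Kalman filter blending a constant-state motion model with noisy position fixes. This is not the thesis's multi-camera filter; the noise variances below are invented for illustration.

```python
# Toy scalar Kalman filter: predict with a constant-state model, then
# correct with each noisy measurement, weighting by the Kalman gain.
def kalman_1d(measurements, q=0.01, r=1.0):
    """q: process noise variance, r: measurement noise variance."""
    x, p = 0.0, 1.0              # state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                   # predict: uncertainty grows over time
        k = p / (p + r)          # Kalman gain
        x += k * (z - x)         # correct with the measurement residual
        p *= 1.0 - k
        estimates.append(x)
    return estimates
```

The full EKF generalizes this to a vector state with linearized camera, IMU, and GPS measurement models, but the predict/correct structure is the same.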
Churcher, Stephen. "VLSI neural networks for computer vision." Thesis, University of Edinburgh, 1993. http://hdl.handle.net/1842/13397.
Farajidavar, Nazli. "Transductive transfer learning for computer vision." Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/807998/.
Matuszewski, Damian Janusz. "Computer vision for continuous plankton monitoring." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-24042014-150825/.
Planktonic microorganisms form the base of the marine food chain and play a major role in reducing carbon dioxide in the atmosphere. They are also very sensitive to environmental changes, making it possible to perceive (and potentially counteract) such changes more quickly than in any other medium. As such, they not only influence the fishing industry but are also frequently used to analyze changes in exploited coastal zones and the influence of these interferences on the local environment and climate. Consequently, there is a strong need to develop highly efficient systems that allow plankton communities to be observed at large scales of time and volume. This gives us a better understanding of the role of plankton in the global climate and helps maintain the balance of a fragile environment. The sensors typically used provide large amounts of data that must be processed efficiently without intensive manual work by specialists. A new system for monitoring plankton in large volumes is presented. It was developed and optimized for continuous plankton monitoring; however, it can be applied as a versatile tool for analyzing moving fluids or in any application that aims to detect and identify movement in a unidirectional flow. The proposed system consists of three stages: data acquisition, target detection, and target identification. Optical equipment is used to record images of small particles immersed in the water flow. Target detection is performed by a method based on the Visual Rhythm, which significantly accelerates processing time and allows a higher volume throughput. The proposed method detects, counts, and measures organisms present in the water flow passing in front of the camera sensor.
Moreover, the developed software allows segmented plankton images to be saved, which not only considerably reduces the required storage space but also constitutes the input for their automatic identification. To guarantee maximum performance of up to 720 MB/s, the algorithm was implemented using CUDA for GPGPU. The method was tested on a large data set and compared with the alternative frame-by-frame approach. The images obtained were used to build a classifier that is applied to the automatic identification of organisms in plankton analysis experiments. For this purpose, feature extraction software was developed. Several subsets of the 55 features were tested with the available learning models. The best accuracy, of approximately 92%, was obtained with a support vector machine. This result is comparable to the average manual identification performed by specialists. This work was developed under the co-supervision of Professor Rubens Lopes (IO-USP).
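The Visual Rhythm idea referred to above, sampling a fixed line from each frame and stacking the samples into a 2D image, can be sketched as follows. This is a simplified illustration of the principle, not the thesis's CUDA implementation; the deviation threshold is arbitrary.

```python
# Visual-rhythm sketch: sample one pixel column per frame, stack the
# columns into a 2D "rhythm" image, and flag frames whose column departs
# from the static (median) background.
import numpy as np

def visual_rhythm(frames, column=0):
    """frames: iterable of (H, W) arrays; returns an (H, n_frames) image."""
    return np.stack([f[:, column] for f in frames], axis=1)

def frames_with_targets(rhythm, threshold=20):
    """Indices of frames deviating from the median background column."""
    background = np.median(rhythm, axis=1, keepdims=True)
    deviation = np.abs(rhythm.astype(np.int16) - background)
    return [int(i) for i in np.nonzero(deviation.max(axis=0) > threshold)[0]]
```

Because only one column per frame is inspected, detection cost is independent of frame width, which is what makes the approach fast for unidirectional flow.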
Zhan, Beibei. "Learning crowd dynamics using computer vision." Thesis, Kingston University, 2008. http://eprints.kingston.ac.uk/20302/.
Rubio, Romano Antonio. "Fashion discovery : a computer vision approach." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.
The semantic interpretation of images from the fashion world is undoubtedly one of the most challenging domains for computer vision. Slight variations in color and shape can confer different meanings or interpretations on an image. It is a domain closely tied to subjective human understanding, but also to the interpretation and recognition of scenes and contexts. Being able to extract fashion-specific information from images and interpret it correctly can be useful in many situations and can help in understanding the underlying information in an image. Moreover, fashion is one of the most important businesses worldwide, with an estimated value of three trillion dollars and a constantly growing online market, which increases the interest in image-based algorithms for searching, classifying, or recommending garments. This doctoral thesis aims to solve specific problems related to the treatment of data from virtual fashion stores, ranging from the most basic pixel-level information to a more abstract understanding that allows conclusions to be drawn about the garments present in an image, taking advantage of the multi-modality of the available data to develop some of the solutions. The contributions include: a new superpixel extraction method aimed at improving the annotation process for fashion images; the construction of a common embedding space for representing fashion images and texts; and the application of that space to the task of identifying the main product in an image showing an outfit of several garments. In summary, fashion is a domain that is complex at many levels in terms of computer vision and machine learning, and developing specific algorithms capable of capturing the essential information from images and texts is not a trivial task.
In order to solve some of the challenges it poses, and considering that this is an industrial doctorate, we contribute to the topic with a variety of solutions that can improve the performance of many tasks that are extremely useful for the online fashion industry.
Automàtica, robòtica i visió
Pizenberg, Matthieu. "Interactive computer vision through the Web." Thesis, Toulouse, INPT, 2020. http://www.theses.fr/2020INPT0023.
Full textComputer vision is the computational science aiming at reproducing and improving the ability of human vision to understand its environment. In this thesis, we focus on two fields of computer vision, namely image segmentation and visual odometry and we show the positive impact that interactive Web applications provide on each. The first part of this thesis focuses on image annotation and segmentation. We introduce the image annotation problem and challenges it brings for large, crowdsourced datasets. Many interactions have been explored in the literature to help segmentation algorithms. The most common consist in designating contours, bounding boxes around objects, or interior and exterior scribbles. When crowdsourcing, annotation tasks are delegated to a non-expert public, sometimes on cheaper devices such as tablets. In this context, we conducted a user study showing the advantages of the outlining interaction over scribbles and bounding boxes. Another challenge of crowdsourcing is the distribution medium. While evaluating an interaction in a small user study does not require complex setup, distributing an annotation campaign to thousands of potential users might differ. Thus we describe how the Elm programming language helped us build a reliable image annotation Web application. A highlights tour of its functionalities and architecture is provided, as well as a guide on how to deploy it to crowdsourcing services such as Amazon Mechanical Turk. The application is completely opensource and available online. In the second part of this thesis we present our open-source direct visual odometry library. In that endeavor, we provide an evaluation of other open-source RGB-D camera tracking algorithms and show that our approach performs as well as the currently available alternatives. The visual odometry problem relies on geometry tools and optimization techniques traditionally requiring much processing power to perform at realtime framerates. 
Since we aspire to run those algorithms directly in the browser, we review past and present technologies enabling high-performance computation on the Web. In particular, we detail how to target a new standard called WebAssembly from the C++ and Rust programming languages. Our library was written from scratch in the Rust programming language, which then allowed us to port it easily to WebAssembly. Thanks to this property, we are able to showcase a visual odometry Web application offering multiple types of interaction. A timeline enables one-dimensional navigation along the video sequence. Pairs of image points can be picked on two 2D thumbnails of the image sequence to realign cameras and correct drift. Colors are also used to identify parts of the 3D point cloud, which can be selected to reinitialize camera positions. Combining those interactions improves the tracking and 3D point reconstruction results.
Burns, James Ian. "Agricultural Crop Monitoring with Computer Vision." Thesis, Virginia Tech, 2014. http://hdl.handle.net/10919/52563.
Full text
Master of Science
Millman, Michael Peter. "Computer vision for yarn quality inspection." Thesis, Loughborough University, 2000. https://dspace.lboro.ac.uk/2134/34196.
Full text
Pellegrini, Lorenzo <1993>. "Continual learning for computer vision applications." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/10401/1/Lorenzo%20Pellegrini%20-%20PhD%20Thesis.pdf.
Full text
Bristow, Hilton K. "Registration and representation in computer vision." Thesis, Queensland University of Technology, 2016. https://eprints.qut.edu.au/99587/1/Hilton_Bristow_Thesis.pdf.
Full text
Raufdeen, Ramzi A. "SE4S toolkit extension project vision diagramming tool build your vision." Thesis, California State University, Long Beach, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10147325.
Full text
Sustainability is an important topic when developing software because it encourages eco-friendly programs. Software can contribute towards sustainability by supporting sustainable goals, which can be supported efficiently if they are considered early in a project by requirements engineers. This project helps requirements engineers make that sustainable contribution through the development of the SE4S toolkit extension project, a vision diagramming tool that contributes towards sustainability. This interactive tool is developed using HTML, SVG, and the JointJS library. The vision diagramming tool is an open-source project that can be used in any browser, allowing requirements engineers to bring their visions to life while keeping sustainability in mind. With help from this tool, requirements engineers can easily demonstrate their sustainability vision to their stakeholders and pass it on to the rest of the development team.
TRUYENQUE, MICHEL ALAIN QUINTANA. "A COMPUTER VISION APPLICATION FOR HAND-GESTURES HUMAN COMPUTER INTERACTION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2005. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=6585@1.
Full text
Computer vision can be used to capture gestures and create more intuitive and faster devices for interacting with computers. Current commercial gesture-based interaction devices rely on expensive equipment (tracking devices, gloves, special cameras, etc.) and special environments that hinder their dissemination to the general public. This work presents a study on the feasibility of using Web cameras as interaction devices based on hand gestures. In our study, we assume that the hand is bare, that is, it carries no mechanical, magnetic, or optical device. We also assume that the environment where the interaction takes place has the characteristics of a normal workplace, that is, without special lighting or backgrounds. To evaluate the feasibility of this interaction mechanism, we developed some prototypes in which hand gestures and finger positions are used to simulate functions of mice and keyboards, such as selecting states and objects and defining directions and positions. Based on these prototypes, we present some conclusions and suggestions for future work.
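As an illustration of the kind of low-cost processing a webcam-based prototype can start from, the sketch below applies a common RGB skin-color heuristic to find candidate hand pixels. The thresholds and function names are illustrative assumptions, not the method described in the thesis.

```python
def is_skin(r, g, b):
    """Crude RGB skin-color test (a common heuristic): skin pixels are
    reddish, reasonably bright, and have a moderate channel spread."""
    return (r > 95 and g > 40 and b > 20 and
            r > g and r > b and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15)

def hand_mask(image_rgb):
    """Binary mask of candidate hand pixels in a row-major RGB image,
    given as nested lists of (r, g, b) tuples."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image_rgb]
```

In practice such a mask is only a first step; the prototypes described above would still need to extract the hand contour and classify the gesture.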
Cheda, Diego. "Monocular Depth Cues in Computer Vision Applications." Doctoral thesis, Universitat Autònoma de Barcelona, 2012. http://hdl.handle.net/10803/121644.
Full text
Depth perception is a key aspect of human vision. It is a routine and essential visual task that humans perform effortlessly in many daily activities. Depth perception has often been associated with stereo vision, but humans have a remarkable ability to perceive depth relations even from a single image by using several monocular cues. In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since obtaining depth information has frequently required three-dimensional reconstruction techniques that need two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to estimate the scene depth map precisely by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not really necessary to compute a costly and detailed depth map of the image. Indeed, even a rough depth description can be very valuable in many problems. In this thesis, we demonstrate how coarse depth information can be integrated into different tasks, following holistic and alternative strategies, to obtain more precise and robust results. To that end, we have proposed a simple but sufficiently reliable technique whereby image scene regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we have explored the potential usefulness of our method in three application domains from novel viewpoints: camera rotation estimation, background estimation, and pedestrian candidate generation.
In the first case, we estimate the rotation of a camera mounted on a moving vehicle using two novel methods that identify distant elements in the image, where the translation component of the image flow field is negligible. In background estimation, we propose a novel method to reconstruct the background by penalizing close regions in a cost function that integrates color, motion, and depth terms. Finally, we exploit the geometric and depth information available in single images for pedestrian candidate generation, significantly reducing the number of generated windows to be processed further by a pedestrian classifier. In all cases, the results have shown that our depth-based approaches contribute to better performance.
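The coarse depth representation described above amounts to quantizing per-region depth into a few discrete ranges. A minimal sketch of such a quantizer (the function name and range bounds are illustrative, not the thesis's exact pipeline):

```python
def coarse_depth_map(depths, ranges):
    """Quantize per-region depth values into discrete depth-range labels.

    depths: list of per-region depth estimates (e.g. in metres).
    ranges: sorted list of range upper bounds; label i means
            depth <= ranges[i], and the last label catches everything
            beyond the final bound.
    """
    labels = []
    for d in depths:
        label = len(ranges)          # default: farther than all bounds
        for i, upper in enumerate(ranges):
            if d <= upper:
                label = i
                break
        labels.append(label)
    return labels
```

For example, with bounds of 5 m and 20 m, regions fall into "near", "middle", and "far" classes; this is the kind of rough description the thesis argues is sufficient for many downstream tasks.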
Moe, Anders. "Passive Aircraft Altitude Estimation using Computer Vision." Licentiate thesis, Linköping University, Computer Vision, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-53415.
Full text
This thesis presents a number of methods to estimate 3D structures with a single translating camera. The camera is assumed to be calibrated and to have a known translation and rotation.
Applications for aircraft altitude estimation and ground structure estimation ahead of the aircraft are discussed. The idea is to mount a camera on the aircraft and use the motion estimates obtained from the inertial navigation system. One reason for this arrangement is to make the aircraft more passive, in comparison to conventional radar-based altitude estimation.
Two groups of methods are considered: optical-flow-based and region-tracking-based. Both groups have advantages and drawbacks.
Two methods to estimate the optical flow are presented. The accuracy of the estimated ground structure is increased by varying the temporal distance between the frames used in the optical flow estimation algorithms.
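Under a pinhole model with purely lateral camera translation, flow magnitude and depth are related by u = f * t / Z, so depth follows directly once the translation is known from the inertial navigation system. The sketch below uses this standard relation as a stand-in for the thesis's actual algorithms; variable names are illustrative.

```python
def depth_from_flow(flow_px, translation_m, focal_px):
    """Depth Z from the lateral-translation flow relation u = f * t / Z.

    flow_px:       optical-flow magnitude in pixels between two frames.
    translation_m: known camera translation between the same frames.
    focal_px:      focal length expressed in pixels.
    """
    if flow_px <= 0:
        raise ValueError("flow magnitude must be positive")
    return focal_px * translation_m / flow_px
```

Using frames further apart scales both the translation and the flow, which raises the flow magnitude relative to pixel noise; this is one way to read the accuracy gain from varying the temporal distance between frames.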
Four region tracking algorithms are presented. Two of them use canonical correlation, and the other two are based on the sum of squared differences and complex correlation, respectively.
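Of the four trackers, the sum-of-squared-differences variant is the simplest to sketch: slide the template over the search image and keep the offset with the lowest SSD score. This exhaustive-search illustration is not the thesis's implementation.

```python
def ssd(patch_a, patch_b):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))

def track_region(template, image):
    """Return the (row, col) offset in `image` whose window minimizes
    the SSD against `template` (brute-force search over all offsets)."""
    th, tw = len(template), len(template[0])
    best, best_pos = float("inf"), (0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            window = [row[x:x + tw] for row in image[y:y + th]]
            score = ssd(template, window)
            if score < best:
                best, best_pos = score, (y, x)
    return best_pos
```

Real trackers restrict the search to a window predicted from the previous frame; the brute-force loop here is only to make the matching criterion explicit.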
The depth estimates are then temporally filtered using weighted least squares or a Kalman filter.
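For a scalar depth state with identity dynamics, one Kalman filter step reduces to a few lines. The noise variances below are illustrative assumptions, not values from the thesis.

```python
def kalman_update(z, x, p, r, q=0.0):
    """One scalar Kalman step for a (near-)constant depth state.

    z: new depth measurement, x/p: prior state mean and variance,
    r: measurement noise variance, q: process noise variance.
    Returns the posterior (mean, variance).
    """
    p = p + q                      # predict (identity dynamics)
    k = p / (p + r)                # Kalman gain
    x = x + k * (z - x)            # correct with the innovation
    p = (1.0 - k) * p              # posterior variance
    return x, p
```

Each new per-frame depth estimate is fed in as `z`, pulling the filtered depth toward the measurement in proportion to the gain; weighted least squares over a window of frames would play a similar smoothing role.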
A simple estimate of the computational complexity and memory requirements of the algorithms is presented to aid in assessing the hardware requirements.
Tests on real flight sequences are performed, showing that the aircraft altitude can be estimated with good accuracy.