Dissertations / Theses on the topic 'Video annotation'

Consult the top 50 dissertations / theses for your research on the topic 'Video annotation.'

1. Chaudhary, Ahmed. "Video annotation tools." Thesis, Texas A&M University, 2008. http://hdl.handle.net/1969.1/85911.

Abstract:
This research deals with annotations in scholarly work. Annotations have been studied extensively, and a significant body of research has shown that, rather than implementing domain-specific annotation applications directly, a better approach is to develop general-purpose annotation toolkits from which domain-specific applications can be built. A video annotation toolkit, together with toolkits for searching, retrieving, analyzing and presenting videos, can help achieve the broader goal of creating integrated workspaces for scholarly work in humanities research, similar to existing environments in fields such as mathematics, engineering, statistics, software development and bioinformatics. This research implements a video annotation toolkit and evaluates it by examining its usefulness in creating applications for different areas. It was found that many areas of study in the arts and sciences can benefit from a video annotation application tailored to their specific needs, and that an annotation toolkit can significantly reduce the time needed to develop such applications. The toolkit was engineered through successive refinements of prototype applications developed for different application areas. Its design was also guided by a set of features identified by the research community for an ideal general-purpose annotation toolkit. This research contributes by combining these two approaches to toolkit design and construction into a hybrid approach, which could be useful for similar or related efforts.
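As a rough sketch of the kind of core data model such a general-purpose toolkit might expose (all class and field names below are hypothetical, not taken from the thesis):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VideoAnnotation:
    """One annotation anchored to a time interval of a video."""
    video_id: str
    start_sec: float                 # start of the annotated interval
    end_sec: float                   # end of the annotated interval
    text: str                        # the annotation body
    author: Optional[str] = None
    tags: list = field(default_factory=list)   # domain-specific labels

class AnnotationStore:
    """Toolkit core that domain-specific applications would build on."""
    def __init__(self):
        self._annotations = []

    def add(self, annotation: VideoAnnotation):
        self._annotations.append(annotation)

    def in_range(self, video_id, t0, t1):
        """All annotations overlapping the interval [t0, t1] of a video."""
        return [a for a in self._annotations
                if a.video_id == video_id and a.start_sec < t1 and a.end_sec > t0]

store = AnnotationStore()
store.add(VideoAnnotation("dance_01", 12.0, 18.5, "arabesque begins", tags=["ballet"]))
print(store.in_range("dance_01", 10.0, 15.0))
```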
2. Hartley, Edward. "Automating video annotation." Thesis, Lancaster University, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.435884.

3. Clement, Michael David. "Obstacle Annotation by Demonstration." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd1722.pdf.

4. Mahmood, Muhammad Habib. "Motion annotation in complex video datasets." Doctoral thesis, Universitat de Girona, 2018. http://hdl.handle.net/10803/667583.

Abstract:
Motion segmentation refers to the process of separating regions and trajectories from a video sequence into coherent subsets of space and time. In this thesis, we created a new multifaceted motion segmentation dataset enclosing real-life long and short sequences, with different numbers of motions and frames per sequence, and real distortions with missing data. Trajectory- and region-based ground-truth is provided on all the frames of all the sequences. We also proposed a new semi-automatic tool for delineating the trajectories in complex videos, even in videos captured from moving cameras. With a minimal manual annotation of an object mask, the algorithm is able to propagate the label mask in all the frames. Object label correction based on static and moving occluder is performed by applying occluder mask tracking for a given depth ordering. The results show that our cascaded-naive approach provides successful results in a variety of video sequences.
5. Aydinlilar, Merve. "Semi-automatic Semantic Video Annotation Tool." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613966/index.pdf.

Abstract:
Semantic annotation of video content is necessary for the indexing and retrieval tasks of video management systems. Currently, it is not possible to extract all high-level semantic information from video data automatically. Video annotation tools assist users in generating annotations to represent video data. Generated annotations can also be used for testing and evaluating content-based retrieval systems. In this study, a semi-automatic semantic video annotation tool is presented. Generated annotations are in MPEG-7 metadata format to ensure interoperability. With the help of image processing and pattern recognition solutions, the annotation process is partly automated and annotation time is reduced. Annotations can be made for spatio-temporal decompositions of video data. Extraction of low-level visual descriptions is included to obtain complete descriptions.
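For illustration, a minimal sketch of producing an MPEG-7-style segment annotation; the element names follow the general MPEG-7 description pattern, but the fragment is schematic and not validated against the MPEG-7 schema:

```python
import xml.etree.ElementTree as ET

def segment_annotation(segment_id, start, duration, free_text):
    """Build a schematic MPEG-7-style description of one video segment."""
    seg = ET.Element("VideoSegment", id=segment_id)
    media_time = ET.SubElement(seg, "MediaTime")
    ET.SubElement(media_time, "MediaTimePoint").text = start      # e.g. "T00:01:30"
    ET.SubElement(media_time, "MediaDuration").text = duration    # e.g. "PT10S"
    text = ET.SubElement(seg, "TextAnnotation")
    ET.SubElement(text, "FreeTextAnnotation").text = free_text
    return seg

seg = segment_annotation("shot_042", "T00:01:30", "PT10S", "goalkeeper saves a penalty")
print(ET.tostring(seg, encoding="unicode"))
```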
6. Foley-Fisher, Zoltan. "A pursuit method for video annotation." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/43613.

Abstract:
Video annotation is a process of describing or elaborating on objects or events represented in video. Part of this process involves time-consuming manual interactions to define spatio-temporal entities, such as a region of interest within the video. This dissertation proposes a pursuit method for video annotation to quickly define a particular type of spatio-temporal entity known as a point-based path. A pursuit method is particularly suited to annotation contexts where a precise bounding region is not needed, such as when annotators draw attention to objects in consumer video. We demonstrate the validity of the pursuit method with measurements of both accuracy and annotation time when annotators create point-based paths. Annotation tool designers can now choose a pursuit method for suitable annotation contexts.
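A minimal sketch of the idea behind a pursuit interaction, assuming cursor samples arrive while the video plays (the names and the 25 Hz rate are invented for the example):

```python
class PursuitRecorder:
    """Records a point-based path while the annotator's cursor pursues
    an object in the playing video (illustrative names throughout)."""
    def __init__(self):
        self.path = []  # (video_time_sec, x, y) samples

    def on_cursor_move(self, video_time_sec, x, y):
        # Called for every cursor sample while pursuit mode is active.
        self.path.append((video_time_sec, x, y))

    def position_at(self, t):
        """Nearest-sample lookup of the pursued point at video time t."""
        if not self.path:
            return None
        return min(self.path, key=lambda sample: abs(sample[0] - t))

rec = PursuitRecorder()
for i in range(5):                        # e.g. cursor sampled at 25 Hz
    rec.on_cursor_move(i * 0.04, 100 + i, 80)
print(rec.position_at(0.1))               # nearest sample to t = 0.1 s
```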
7. Salway, Andrew. "Video annotation: the role of specialist text." Thesis, University of Surrey, 1999. http://epubs.surrey.ac.uk/843350/.

Abstract:
Digital video is among the most information-intensive modes of communication. The retrieval of video from digital libraries, along with sound and text, is a major challenge for the computing community in general and for the artificial intelligence community specifically. The advent of digital video has set some old questions in a new light. Questions relating to aesthetics and to the role of surrogates - image for reality and text for image - invariably touch upon the link between vision and language. Dealing with this link computationally is important for the artificial intelligence enterprise. Interesting images to consider, both aesthetically and for research in video retrieval, include those which are constrained and patterned, and which convey rich meanings; for example, dance. These are specialist images for us and require a special language for description and interpretation. Furthermore, they require specialist knowledge to be understood, since there is usually more than meets the untrained eye: this knowledge may also be articulated in the language of the specialism. In order to be retrieved effectively and efficiently, video has to be annotated, particularly so for specialist moving images. Annotation involves attaching keywords from the specialism along with, for us, commentaries produced by experts, including those written and spoken specifically for annotation and those obtained from a corpus of extant texts. A system that processes such collateral text for video annotation should perhaps be grounded in an understanding of the link between vision and language. This thesis attempts to synthesise ideas from artificial intelligence, multimedia systems, linguistics, cognitive psychology and aesthetics. The link between vision and language is explored by focusing on moving images of dance and the special language used to describe and interpret them. We have developed an object-oriented system, KAB, which helps to annotate a digital video library with a collateral corpus of texts and terminology. User evaluation has been encouraging. The system is now available on the WWW.
8. Silva, João Miguel Ferreira da. "People and object tracking for video annotation." Master's thesis, Faculdade de Ciências e Tecnologia, 2012. http://hdl.handle.net/10362/8953.

Abstract:
Dissertation for obtaining the degree of Master in Informatics Engineering.
Object tracking is a thoroughly researched problem, with a body of associated literature dating at least as far back as the late 1970s. However, and despite the development of some satisfactory real-time trackers, it has not yet seen widespread use. This is not due to a lack of applications for the technology, since several interesting ones exist. In this document, it is postulated that this status quo is due, at least in part, to a lack of easy to use software libraries supporting object tracking. An overview of the problems associated with object tracking is presented and the process of developing one such library is documented. This discussion includes how to overcome problems like heterogeneities in object representations and requirements for training or initial object position hints. Video annotation is the process of associating data with a video’s content. Associating data with a video has numerous applications, ranging from making large video archives or long videos searchable, to enabling discussion about and augmentation of the video’s content. Object tracking is presented as a valid approach to both automatic and manual video annotation, and the integration of the developed object tracking library into an existing video annotator, running on a tablet computer, is described. The challenges involved in designing an interface to support the association of video annotations with tracked objects in real-time are also discussed. In particular, we discuss our interaction approaches to handle moving object selection on live video, which we have called “Hold and Overlay” and “Hold and Speed Up”. In addition, the results of a set of preliminary tests are reported.
This work was partially funded by the project “TKB – A Transmedia Knowledge Base for contemporary dance” (PTDC/EAT/AVP/098220/2008, funded by FCT/MCTES), by the UT Austin – Portugal Digital Media Program (SFRH/BD/42662/2007, FCT/MCTES), and by CITI/DI/FCT/UNL (Pest-OE/EEI/UI0527/2011).
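As an illustration of the kind of unified interface such a tracking library might offer over heterogeneous trackers (a sketch under assumed names, not the thesis's actual API):

```python
from abc import ABC, abstractmethod

class Tracker(ABC):
    """Unified facade over heterogeneous tracker implementations."""

    @abstractmethod
    def start(self, frame, bbox):
        """Initialise tracking from an object's bounding box."""

    @abstractmethod
    def update(self, frame):
        """Return the object's bounding box in the next frame, or None."""

class OpenCVTracker(Tracker):
    """Adapter wrapping any OpenCV tracker factory, e.g. cv2.TrackerKCF_create."""
    def __init__(self, make_tracker):
        self._make = make_tracker
        self._impl = None

    def start(self, frame, bbox):
        self._impl = self._make()
        self._impl.init(frame, bbox)

    def update(self, frame):
        ok, bbox = self._impl.update(frame)
        return bbox if ok else None
```

An adapter like this hides differences in object representation and initialisation requirements behind one interface, which is the kind of heterogeneity problem the abstract describes.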
9. Dye, Brigham R. "Reliability of Pre-Service Teachers Coding of Teaching Videos Using Video-Annotation Tools." BYU ScholarsArchive, 2007. https://scholarsarchive.byu.edu/etd/990.

Abstract:
Teacher education programs that aspire to help pre-service teachers develop expertise must help students engage in deliberate practice along the dimensions of teaching expertise. However, field teaching experiences often lack the quantity and quality of feedback needed to help students engage in meaningful teaching practice. The limited availability of supervising teachers makes it difficult to personally observe and evaluate each student teacher's field teaching performances. Furthermore, when a supervising teacher debriefs such an observation, the supervising teacher and student may struggle to communicate meaningfully about the teaching performance, because the student teacher and supervisor often have very different perceptions of the same teaching performance. Video analysis tools show promise for improving the quality of feedback student teachers receive on their teaching performance by providing a common reference for evaluative debriefing and by allowing students to generate their own feedback by coding videos of their own teaching. This study investigates the reliability of pre-service teacher coding using a video analysis tool. It found that students were moderately reliable coders when coding video of an expert teacher (49%-68%). However, when the reliability of student coding of their own teaching videos was audited, students showed a high degree of accuracy (91%). These contrasting findings suggest that coding reliability scores may not be simple indicators of student understanding of the teaching competencies represented by a coding scheme. Instead, reliability scores may also be subject to the influence of extraneous factors; for example, reliability scores in this study were influenced by differences in the technical aspects of how students implemented the coding system, and by how coding proficiency was measured. Because this study also suggests that students can be taught to improve their coding reliability, further research may improve reliability scores, and make them a more valid reflection of student understanding of teaching competency, by training students in the technical aspects of implementing a coding system.
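For reference, the simplest form of the reliability measure discussed here, percent agreement between a student's codes and a reference coding, can be computed as follows (the codes are invented for the example):

```python
def percent_agreement(codes_a, codes_b):
    """Fraction of coded events given the same code by both coders."""
    assert len(codes_a) == len(codes_b)
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

student   = ["praise", "question", "lecture", "question", "praise"]
reference = ["praise", "question", "lecture", "feedback", "praise"]
print(f"{percent_agreement(student, reference):.0%}")   # 80%
```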
10. Wang, Yang. "Digital video segmentation and annotation in news programs." Hong Kong: University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record.jsp?B23273082.

11. Goldman, Daniel R. "A framework for video annotation, visualization, and interaction." Thesis, University of Washington, 2007. http://hdl.handle.net/1773/6994.

12. Demirdizen, Goncagul. "An Ontology-driven Video Annotation and Retrieval System." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612592/index.pdf.

Abstract:
In this thesis, a system called Ontology-Driven Video Annotation and Retrieval System (OntoVARS) is developed in order to provide a video management system for ontology-driven semantic content annotation and querying. The proposed system is based on the MPEG-7 ontology, which provides interoperability and a common communication platform with other MPEG-7 ontology compatible systems. The Rhizomik MPEG-7 ontology is used as the core ontology, and domain-specific ontologies are integrated into the core ontology in order to provide ontology-based video content annotation and querying capabilities to the user. The proposed system supports content-based annotation and spatio-temporal data modeling in video databases by using the domain ontology concepts. Moreover, the system enables ontology-driven query formulation and processing according to the domain ontology instances and concepts. In the developed system, ontology-driven concept querying, spatio-temporal querying, and region-based and time-based querying are provided as simple query types. Besides these simple query types, compound queries can be generated by combining simple queries with "(", ")", "AND" and "OR" operators. For all these query types, the system supports both general and video-specific query processing. By this means, the user is able to pose queries over all videos in the video database as well as over the details of a specific video of interest.
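A minimal sketch of how such compound queries can be composed programmatically from simple ones with AND/OR operators (the class names are hypothetical, not OntoVARS code):

```python
class Query:
    def matches(self, annotation) -> bool:
        raise NotImplementedError

class ConceptQuery(Query):
    """Simple query: does the annotation mention a given ontology concept?"""
    def __init__(self, concept):
        self.concept = concept
    def matches(self, annotation):
        return self.concept in annotation["concepts"]

class And(Query):
    def __init__(self, *parts):
        self.parts = parts
    def matches(self, annotation):
        return all(q.matches(annotation) for q in self.parts)

class Or(Query):
    def __init__(self, *parts):
        self.parts = parts
    def matches(self, annotation):
        return any(q.matches(annotation) for q in self.parts)

# (goalkeeper AND save) OR penalty
q = Or(And(ConceptQuery("goalkeeper"), ConceptQuery("save")),
       ConceptQuery("penalty"))
print(q.matches({"concepts": {"penalty", "referee"}}))   # True
```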
13. Adam, Jameel. "Video annotation wiki for South African sign language." Thesis, University of the Western Cape, 2011. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_1540_1304499135.

Abstract:

The SASL project at the University of the Western Cape aims at developing a fully automated translation system between English and South African Sign Language (SASL). Three important aspects of this system require SASL documentation and knowledge: recognition of SASL from a video sequence, linguistic translation between SASL and English, and the rendering of SASL. Unfortunately, SASL documentation is a scarce resource, and no official or complete documentation exists. This research focuses on creating an online collaborative video annotation knowledge management system for SASL, to which various members of the community can upload SASL videos and annotate them in any of the sign language notation systems SignWriting, HamNoSys and/or Stokoe. As such, knowledge about SASL structure is pooled into a central and freely accessible knowledge base that can be used as required. The usability and performance of the system were evaluated. Usability was graded by users on a rating scale from one to five for a specific set of tasks, and the system was found to have an overall usability of 3.1, slightly better than average. The performance evaluation included load and stress tests, which measured the system response time for a number of users on a specific set of tasks. It was found that the system is stable and can scale up to cater for a growing user base by improving the underlying hardware.

14. Bodin, Erik. "Video content annotation automation using machine learning techniques." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-155751.

Abstract:
Annotations describing the semantic content of video material are essential for efficient search of such content, e.g. allowing search engine queries to return only the relevant segments of video clips within large content management systems. However, manual annotation of video material is a dull and time-consuming task, effectively lowering the quality and quantity of such annotations. In this report a system to automate most of the process is suggested. The system learns from video material with user-provided annotations to infer annotations for new material automatically, without requiring any system changes between different user-created labeling schemes. The prototype of such a system presented in this report, evaluated on a few concepts, shows promising results for concepts with high influence on the scene environments.
15. Lindström, Lucas. "Towards a Video Annotation System using Face Recognition." Thesis, Umeå universitet, Institutionen för datavetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-85251.

Abstract:
A face recognition software framework was developed to lay the foundation for a future video annotation system. The framework provides a unified and extensible interface to multiple existing implementations of face detection and recognition algorithms from OpenCV and Wawo SDK. The framework supports face detection with cascade classification using Haar-like features, and face recognition with Eigenfaces, Fisherfaces, local binary pattern histograms, the Wawo algorithm and an ensemble method combining the output of the four algorithms. An extension to the cascade face detector was developed that covers yaw rotations. CAMSHIFT object tracking was combined with an arbitrary face recognition algorithm to enhance face recognition in video. The algorithms in the framework and the extensions were evaluated on several different test databases with different properties in terms of illumination, pose, obstacles, background clutter and imaging conditions. The results of the evaluation show that the algorithmic extensions provide improved performance over the basic algorithms under certain conditions.
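A minimal sketch of the detection-plus-recognition pipeline described above, using OpenCV's stock Haar cascade and LBPH recognizer (requires opencv-contrib-python for the cv2.face module; file paths are placeholders):

```python
import cv2

# Stock frontal-face Haar cascade shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# LBPH recognizer from opencv-contrib, loaded from a previously trained model.
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("trained_lbph_model.yml")        # placeholder path

frame = cv2.imread("frame.png")                  # placeholder video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5):
    label, distance = recognizer.predict(gray[y:y + h, x:x + w])
    print(f"person id {label} (distance {distance:.1f}) at {(x, y, w, h)}")
```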
16. Wang, Yang (王揚). "Digital video segmentation and annotation in news programs." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31225305.

17. Markatopoulou, Foteini. "Machine learning architectures for video annotation and retrieval." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/44693.

Abstract:
In this thesis we design machine learning methodologies for solving the problem of video annotation and retrieval using either pre-defined semantic concepts or ad-hoc queries. Concept-based video annotation refers to the annotation of video fragments with one or more semantic concepts (e.g. hand, sky, running) chosen from a predefined concept list. Ad-hoc queries refer to textual descriptions that may contain objects, activities, locations etc., and combinations thereof. Our contributions are: i) a thorough analysis of extending and using different local descriptors towards improved concept-based video annotation, and a stacking architecture whose first layer uses concept classifiers trained on local descriptors and whose last layer improves their prediction accuracy by implicitly capturing concept relations; ii) a cascade architecture that orders and combines many classifiers, trained on different visual descriptors, for the same concept; iii) a deep learning architecture that exploits concept relations at two different levels: at the first level, building on ideas from multi-task learning, we propose an approach to learn concept-specific representations that are sparse linear combinations of representations of latent concepts; at the second level, building on ideas from structured output learning, we introduce at training time a new cost term that explicitly models the correlations between the concepts, thereby explicitly modelling the structure of the output space (i.e., the concept labels); iv) a fully automatic ad-hoc video search architecture that combines concept-based video annotation and textual query analysis, and transforms concept-based keyframe and query representations into a common semantic embedding space. Our architectures have been extensively evaluated on TRECVID SIN 2013, TRECVID AVS 2016, and other large-scale datasets, demonstrating their effectiveness compared to similar approaches.
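As a rough sketch of contribution (i), a two-layer stack in which second-layer models re-predict each concept from all first-layer concept scores, implicitly capturing concept relations (shapes and data are invented; in practice the second layer would be trained on held-out first-layer outputs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

np.random.seed(0)
n_shots, n_concepts = 1000, 10
X = np.random.rand(n_shots, 128)                              # per-shot visual descriptors
Y = (np.random.rand(n_shots, n_concepts) > 0.8).astype(int)   # per-shot concept labels

# First layer: an independent classifier per concept on the raw descriptors.
first_layer = [LinearSVC(dual=False).fit(X, Y[:, c]) for c in range(n_concepts)]
scores = np.column_stack([clf.decision_function(X) for clf in first_layer])

# Second layer: each concept is re-predicted from ALL concepts' first-layer
# scores, so correlations between concepts inform the final prediction.
second_layer = [LogisticRegression().fit(scores, Y[:, c]) for c in range(n_concepts)]
refined = np.column_stack([clf.predict_proba(scores)[:, 1] for clf in second_layer])
print(refined.shape)   # (1000, 10): refined concept probabilities per shot
```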
18. Lamia, Louis M. "Video annotation for choreographers on the NB platform." M.Eng. thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/105994.

Abstract:
The NB Platform, originally developed to host discussion on class materials, has been adapted to suit the needs of dance choreographers. A Video Annotator was developed, and features requested by choreographers were added to help suit the platform to their needs. The updated annotator was then subjected to user testing to determine more ways in which software can be developed for the dance community and video annotation software can be improved.
19. Li, Honglin. "Hierarchical video semantic annotation: the vision and techniques." Ph.D. diss., Ohio State University, 2003. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1071863899.

20

VON, WITTING DANIEL. "Annotation and indexing of video content basedon sentiment analysis." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-156387.

Abstract:
Due to scientific advances in mobility and connectivity, digital media can be distributed to multiple platforms through streams and video-on-demand services. The abundance of video productions poses a problem in terms of storage, organization and cataloging. How movies or TV series should be sorted and retrieved is largely dictated by user preferences, motivating proper indexing and annotation of video content. While movies tend to be described by keywords or genre, this thesis constitutes an attempt to automatically index videos based on their semantics. Representing a video by the sentiment it evokes would not only be more descriptive, but could also be used to compare movies directly based on the actual content. Since filmmaking is biased by human perception, this project looks to utilize these characteristics for machine learning. A video is modeled as a sequence of shots, attempting to capture the temporal nature of the information. Sentiment analysis of videos has been used to provide labels for a supervised learning algorithm, namely an SVM using a string kernel. Besides the specifics of learning, the work of this thesis involves other relevant fields such as feature extraction and video segmentation. The results show that there are patterns in video fit for learning; however, the performance of the method is inconclusive due to lack of data. It would therefore be interesting to evaluate the approach further, using more data along with minor modifications.
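A minimal sketch of supervised learning with a precomputed string kernel over shot-class sequences; the shared-substring kernel below is a stand-in for illustration, not the kernel used in the thesis:

```python
import numpy as np
from sklearn.svm import SVC

def substring_kernel(s, t, p=2):
    """Count the length-p substrings two shot-symbol strings share
    (a set-intersection kernel, which is positive semi-definite)."""
    subs = lambda x: {x[i:i + p] for i in range(len(x) - p + 1)}
    return len(subs(s) & subs(t))

videos = ["AABCB", "ABCBB", "CCACA", "CACAA"]   # one shot-class string per video
labels = [1, 1, 0, 0]                           # sentiment class per video

K = np.array([[substring_kernel(a, b) for b in videos] for a in videos], float)
clf = SVC(kernel="precomputed").fit(K, labels)

K_test = np.array([[substring_kernel("ABCAB", b) for b in videos]], float)
print(clf.predict(K_test))   # predicted sentiment class for the new sequence
```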
21. Diakopoulos, Nicholas A. "Collaborative annotation, analysis, and presentation interfaces for digital video." Diss., Atlanta, Ga.: Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29680.

22. Shao, Wenbin. "Automatic annotation of digital photos." Thesis, University of Wollongong, 2007. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20080403.120857/index.html.

23. Schroeter, Ronald. "Collaborative video indexing, annotation and discussion over high-bandwidth networks." Thesis, University of Queensland, St. Lucia, Qld, 2004. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe18014.pdf.

24. Al-Athel, Mohammed S. "The role of terminology and local grammar in video annotation." Thesis, University of Surrey, 2008. http://epubs.surrey.ac.uk/1018/.

Abstract:
The linguistic annotation of video sequences is an intellectually challenging task involving the investigation of how images and words are linked together, a task that is ultimately financially rewarding in that the eventual automatic retrieval of video (sequences) can be much less time-consuming, subjective and expensive than manual retrieval. Much effort has been focused on automatic or semi-automatic annotation. Computational linguistic methods of video annotation rely on collections of collateral text in the form of keywords and proper nouns. Keywords are often used in a particular order, indicating an identifiable pattern which is often limited and can subsequently be used to annotate the portion of a video where such a pattern occurred. Once the relevant keywords and patterns have been stored, they can then be used to annotate the remainder of the video, excluding all collateral text which does not match the keywords or patterns. A new method of video annotation is presented in this thesis. The method facilitates a) extraction of specialist terms within a corpus of collateral text, and b) identification of frequently used linguistic patterns describing recurring key events within the data-set. The use of the method has led to the development of a system that can automatically assign keywords and key patterns to a number of frames, found in the commentary text approximately contemporaneous to the selected frames. The system does not perform video analysis; it only analyses the collateral text. The method is based on corpus linguistics and is mainly frequency-based: the frequency of occurrence of a keyword or key pattern is taken as the basis of its representation. No assumptions are made about the grammatical structure of the language used in the collateral text, nor is a lexicon of keywords predefined. Our system has been designed to annotate videos of football matches in English and Arabic, as well as cricket videos in English, and to retrieve annotated clips. The system not only provides a simple search method for annotated clip retrieval, it also provides more advanced, complex search methods.
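As a rough illustration of a frequency-based extraction step in this spirit, comparing term frequencies in commentary text against a general reference corpus (the weighting is a simple invented ratio, not the thesis's measure):

```python
from collections import Counter
import re

def candidate_terms(commentary, reference, top_n=10):
    """Rank terms that are unusually frequent in the commentary
    compared with a general reference corpus."""
    tokenize = lambda text: re.findall(r"[a-z']+", text.lower())
    domain_counts = Counter(tokenize(commentary))
    reference_counts = Counter(tokenize(reference))
    weight = {w: n / (reference_counts[w] + 1)
              for w, n in domain_counts.items() if len(w) > 2}
    return sorted(weight, key=weight.get, reverse=True)[:top_n]

commentary = "corner kick, header, goal! a stunning goal from the corner"
general = "the a from and of to in everyday general english text"
print(candidate_terms(commentary, general))
```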
25. Yan, Fei. "Tennis ball tracking for automatic annotation of broadcast tennis video." Thesis, University of Surrey, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.441906.

26. Skowronek, Ondřej. "Anotace obrazu a videa formou hry" [Image and video annotation as a game]. Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236063.

Abstract:
This master's thesis addresses the problem of creating video and image annotations, approaching it through crowdsourcing. Crowdsourcing games were designed and implemented to solve the problem, and testing proved that these games are capable of producing high-quality annotations. Launching these games on a larger scale could create a large database of annotated videos and images.
27. Vrochidis, Stefanos. "Interactive video retrieval using implicit user feedback." Thesis, Queen Mary, University of London, 2013. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8729.

Abstract:
In recent years, the rapid development of digital technologies and the low cost of recording media have led to a great increase in the availability of multimedia content worldwide. This availability creates demand for the development of advanced search engines. Traditionally, manual annotation of video was one of the usual practices to support retrieval. However, the vast amounts of multimedia content make such practices very expensive in terms of human effort. At the same time, the availability of low-cost wearable sensors delivers a plethora of user-machine interaction data. Therefore, there is an important challenge in exploiting implicit user feedback (such as user navigation patterns and eye movements) during interactive multimedia retrieval sessions, with a view to improving video search engines. In this thesis, we focus on automatically annotating video content by exploiting aggregated implicit feedback of past users, expressed as click-through data and gaze movements. Towards this goal, we conducted interactive video retrieval experiments in order to collect click-through and eye movement data in not strictly controlled environments. First, we generate semantic relations between multimedia items by proposing a graph representation of aggregated past interaction data, and exploit them to generate recommendations as well as to improve content-based search. Then, we investigate the role of user gaze movements in interactive video retrieval and propose a methodology for inferring user interest by employing support vector machines and gaze movement-based features. Finally, we propose an automatic video annotation framework which combines query clustering into topics, by constructing gaze movement-driven random forests and temporally enhanced dominant sets, with video shot classification for predicting the relevance of viewed items with respect to a topic. The results show that exploiting heterogeneous implicit feedback from past users is of added value for future users of interactive video retrieval systems.
28. Oliveira, Laura Sofia Martins. "Interfaces for television content sharing and annotation." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10973.

Abstract:
Dissertation for obtaining the degree of Master in Informatics Engineering.
The ways of television consumption and production are changing significantly, with the viewers moving away from the traditional linear model. The various devices for accessing content have a significant role in these changes and suggest new paradigms of access. Social experience has also changed and takes on new forms molded by technology. Content sharing and production from users are some of the trends that globally influence how we relate to audiovisual content. Therefore the aim is to develop ways to access television content, that allow commenting and sharing, through multimodal annotations. These annotations include text, sketches, handwriting and images. Our solution provides users a way to watch and annotate television content, in real-time and in a collaborative environment. Using a mobile device, users can annotate content together with other users, while watching both content and annotations on a TV. These annotations can also be shared through social networks or saved on other platforms. Finally, the system also uses content provided by the users to search and link to television content.
This work was funded by Fundação para a Ciência e a Tecnologia under grant UTA-Est/MAI/0010/2009, project ImTV (On-Demand Immersive-TV for Communities of Media Producers and Consumers).
29. Kucuk, Dilek. "Exploiting Information Extraction Techniques for Automatic Semantic Annotation and Retrieval of News Videos in Turkish." PhD thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613043/index.pdf.

Abstract:
Information extraction (IE) is known to be an effective technique for automatic semantic indexing of news texts. In this study, we propose a text-based, fully automated system for the semantic annotation and retrieval of news videos in Turkish which exploits several IE techniques on the video texts. The IE techniques employed by the system include named entity recognition, automatic hyperlinking, person entity extraction with coreference resolution, and event extraction. The system utilizes the outputs of the components implementing these IE techniques as the semantic annotations for the underlying news video archives. Apart from the IE components, the proposed system comprises a news video database, in addition to components for news story segmentation, sliding text recognition, and semantic video retrieval. We also propose a semi-automatic counterpart of the system, where the only manual intervention takes place during text extraction. Both systems are executed on genuine video data sets consisting of videos broadcast by the Turkish Radio and Television Corporation. The current study is significant as it proposes the first fully automated system to facilitate semantic annotation and retrieval of news videos in Turkish, yet the proposed system and its semi-automated counterpart are quite generic, and hence could be customized to build similar systems for video archives in other languages as well. Moreover, IE research on Turkish texts is known to be rare, and within the course of this study, we have proposed and implemented novel techniques for several IE tasks on Turkish texts. As an application example, we have demonstrated the utilization of the implemented IE components to facilitate multilingual video retrieval.
30. Volkmer, Timo. "Semantics of Video Shots for Content-based Retrieval." Thesis, RMIT University, Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20090220.122213.

Abstract:
Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage. Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to rank among the best existing shot segmentation algorithms in large-scale evaluations. The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content, but they require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content. We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness of manual annotation of visual content through better design and specification of the process. Automatic speech recognition techniques, along with semantic classification of video content, can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content. The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows better understanding of video content and a more holistic approach to multimedia retrieval in the future.
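For illustration, the classic histogram-difference approach to detecting abrupt transitions (cuts); the thesis's algorithm evaluates sets of frames, whereas this sketch compares only consecutive frames, with an arbitrary threshold:

```python
import cv2
import numpy as np

def detect_cuts(video_path, threshold=0.5, bins=8):
    """Report frame indices where the colour histogram changes abruptly."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [bins] * 3, [0, 256] * 3).flatten()
        hist /= hist.sum() + 1e-9                      # normalise to sum 1
        if prev_hist is not None:
            d = 0.5 * np.abs(hist - prev_hist).sum()   # L1 distance in [0, 1]
            if d > threshold:
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts

print(detect_cuts("news.mp4"))   # placeholder file name
```

Gradual transitions are the hard case precisely because this per-frame difference stays below any sensible threshold throughout a dissolve, which is why the thesis evaluates sets of frames instead.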
31. Chen, Juan. "Content-based Digital Video Processing: Digital Videos Segmentation, Retrieval and Interpretation." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4256.

Abstract:
Recent research approaches in semantics-based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and a finite state machine (FSM). Thirdly, shot detection is implemented using local and global indicators. Fourthly, a context awareness approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies; it is robust to complicated distortions and capable of locating copied segments inside original videos. Then, objects and events are extracted from MPEG sequences for video highlights indexing and retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.
32. Uggerud, Nils. "AnnotEasy: A gesture and speech-to-text based video annotation tool for note taking in pre-recorded lectures in higher education." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105962.

Abstract:
This paper investigates students' attitudes towards using gestures and speech-to-text (GaST) to take notes while watching recorded lectures. A literature review on video-based learning, an expert interview, and a background survey of students' note-taking habits led to the creation of the prototype AnnotEasy, a tool that allows students to use GaST to take notes. AnnotEasy was tested in three iterations with 18 students and was updated after each iteration. The students watched a five-minute lecture and took notes using AnnotEasy. The participants' perceived ease of use (PEU) and perceived usefulness (PU) were evaluated based on the TAM, and their general attitudes were evaluated in semi-structured interviews. The results showed that the students had a high PEU and PU of AnnotEasy and were mainly positive towards taking notes using GaST. Further, the results suggest that AnnotEasy could facilitate the process of structuring a lecture's content. Lastly, even though students had positive attitudes towards using speech to create notes, observations showed that this was problematic when users attempted to create longer notes, indicating that speech may be more suitable for shorter notes.
33. Cabral, Diogo Nuno Crespo Ribeiro. "Video interaction using pen-based technology." Doctoral thesis, Faculdade de Ciências e Tecnologia, 2014. http://hdl.handle.net/10362/11503.

Abstract:
Dissertation for obtaining the degree of Doctor in Informatics.
Video can be considered one of the most complete and complex media, and manipulating it is still a difficult and tedious task. This research applies pen-based technology to video manipulation, with the goal of improving this interaction. Despite human familiarity with pen-based devices, how they can be used for video interaction, in order to make it more natural while fostering the user's creativity, remains an open question. Two types of interaction with video were considered in this work: video annotation and video editing. Each interaction type allows the study of one of the modes of using pen-based technology: indirectly, through digital ink, or directly, through pen gestures or pressure. This research contributes two approaches to pen-based video interaction: pen-based video annotations and video as ink. The first uses pen-based annotations combined with motion tracking algorithms in order to augment video content with sketches or handwritten notes; it studies how pen-based technology can be used to annotate moving objects and how to maintain the association between a pen-based annotation and the annotated moving object. The second concept replaces digital ink with video content, studying how pen gestures and pressure can be used in video editing and what kind of changes are needed in the interface, in order to provide a more familiar and creative interaction in this usage context.
This work was partially funded by the UT Austin – Portugal Digital Media Program (Ph.D. grant SFRH/BD/42662/2007, FCT/MCTES); by the HP Technology for Teaching Grant Initiative 2006; by the project "TKB - A Transmedia Knowledge Base for contemporary dance" (PTDC/EAT/AVP/098220/2008, funded by FCT/MCTES); and by CITI/DI/FCT/UNL (PEst-OE/EEI/UI0527/2011).
34. Persson, Jakob. "How to annotate in video for training machine learning with a good workflow." Thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-187078.

Abstract:
Artificial intelligence and machine learning are used in many different areas; one of these areas is image recognition. In the production of a TV show or film, image recognition can be used to help editors find specific objects, scenes, or people in the video content, which speeds up production. But image recognition does not yet work perfectly in all cases, and cannot be used in TV or film production as intended. The image recognition algorithms therefore need to be trained on large datasets to become better. Creating these datasets takes time, however, and tools are needed that let users create specific datasets and retrain algorithms. The aim of this master's thesis was to investigate whether it is possible to create a tool that can annotate objects and people in video content and use the data as training sets, together with a tool that can retrain an image recognition system on its output to improve it. It was also important that the tools offer users a good workflow. The study consisted of a theoretical study, to gain more knowledge about annotation and about UX design with a good workflow, and interviews, held to establish the requirements of the product. This resulted in a user scenario and a workflow that were used, together with the knowledge from the theoretical study, to create a hi-fi prototype through an iterative process with usability testing. The final hi-fi prototype has a good design and a good workflow for users; it is possible to annotate objects and people with a bounding box, and to retrain an image recognition program that has been used on video content.
35. Ren, Jinchang. "Semantic content analysis for effective video segmentation, summarisation and retrieval." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4251.

Abstract:
This thesis focuses on four main research themes, namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights-based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows. Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance-based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is robust even to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity-level measurement, shots and sub-shots are detected for content-adaptive video summarisation. Fourthly, highlights-based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed. Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems. A high-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. As most of the work is implemented in the compressed domain, one additional benefit is high efficiency, which will be useful for many online applications.
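A sketch of basic phase correlation for translation estimation, the building block underlying SPC (the subspace and gradient-based extensions are not reproduced here): the normalised cross-power spectrum of two frames yields a delta peak at their relative shift.

```python
import numpy as np

def phase_correlation(f, g):
    """Estimate the (dy, dx) translation of g relative to f."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    cross = np.conj(F) * G
    r = np.fft.ifft2(cross / (np.abs(cross) + 1e-9))   # delta peak at the shift
    dy, dx = np.unravel_index(np.argmax(np.abs(r)), r.shape)
    if dy > f.shape[0] // 2:                            # wrap to negative shifts
        dy -= f.shape[0]
    if dx > f.shape[1] // 2:
        dx -= f.shape[1]
    return dy, dx

f = np.random.rand(64, 64)
g = np.roll(f, (3, -5), axis=(0, 1))    # g is f translated by (3, -5)
print(phase_correlation(f, g))          # (3, -5)
```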
36. Gokturk, Ozkan Ziya. "Metadata Extraction from Text in Soccer Domain." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609871/index.pdf.

Abstract:
Video databases and content-based retrieval in these databases have become popular with the improvements in technology. Metadata extraction techniques are used for providing data about video content. One popular metadata extraction technique for multimedia is information extraction from text. For some domains, it is possible to find text accompanying the video, as in the soccer, movie and news domains. In this thesis, we present an approach to metadata extraction from match reports for the soccer domain. The UEFA Cup and UEFA Champions League match reports are downloaded from the UEFA web site by a web crawler. These match reports are preprocessed using regular expressions, and then important events are extracted using hand-written rules. In addition to the hand-written rules, two different machine learning techniques are applied on the match corpus to learn event patterns and automatically extract match events. Extracted events are saved in an MPEG-7 file. A user interface is implemented to query the events in the MPEG-7 match corpus and view the corresponding video segments.
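As a rough illustration of a hand-written extraction rule of the kind described, here is a regular expression pulling goal events from free-text match reports (the pattern and report text are invented for the example):

```python
import re

GOAL_RULE = re.compile(
    r"(?P<player>[A-Z][a-z]+(?: [A-Z][a-z]+)*) (?:scored|netted|struck)"
    r".{0,40}?in the (?P<minute>\d{1,3})(?:st|nd|rd|th) minute"
)

report = ("Pressure finally told when Luis Garcia scored brilliantly "
          "in the 73rd minute to seal the tie.")

for m in GOAL_RULE.finditer(report):
    print({"event": "goal",
           "player": m.group("player"),
           "minute": int(m.group("minute"))})
# {'event': 'goal', 'player': 'Luis Garcia', 'minute': 73}
```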
37. Nielsen, Ryler Jay. "Japanese Vocabulary Learning Through an Interactive Video Platform: Comparative Effects of L1 Versus L2 Definitions and Kana Versus Kanji Presentation." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/6096.

Abstract:
Advances in digital technology have recently allowed for richer text annotation in combination with authentic second-language video media. As a result, many educational researchers are focusing increased attention on the effect this application of technology can have on second language acquisition. This study examines the comparative effectiveness of target vocabulary treatments with either native language (L1) definitions or target language (L2) definitions, when target vocabulary is presented in either kana or kanji as part of the subtitles of an L2 film, as measured by participant performance on vocabulary assessments. The study also examines the participants' perceived helpfulness of the varying word treatments. The results suggest that providing annotations for target words in the L2 increases learning of those words more than L1 annotations for intermediate learners of Japanese. On the other hand, learners rated access to L1 definitions as more helpful than L2 definitions, and expressed a preference for understanding the story over learning the target vocabulary.
APA, Harvard, Vancouver, ISO, and other styles
38

Derathé, Arthur. "Modélisation de la qualité de gestes chirurgicaux laparoscopiques." Thesis, Université Grenoble Alpes, 2020. https://thares.univ-grenoble-alpes.fr/2020GRALS021.pdf.

Full text
Abstract:
Laparoscopic surgery is an increasingly common practice across surgical specialties, owing to its substantial benefits for the patient in terms of complications and length of hospital stay. However, it differs greatly from so-called "open" surgery and presents its own difficulties, notably in the handling of surgical instruments and the control of the operating field. A better understanding of the laparoscopic surgical gesture would help improve the tools used to train young surgeons. The objective of this work was to develop and validate a method that explains, in clinical terms and through an algorithmic approach, certain key aspects of surgical practice. Since understanding the clinical context of this thesis is essential, substantial work was devoted to eliciting and formalising the surgeon's knowledge. The second part of this work consisted in developing an algorithmic method to predict the quality of the surgical gesture and the operating surgeon. Finally, through the analysis of data describing the quality and practice of the surgical gesture, we studied and validated the clinical relevance of new elements of clinical knowledge. We worked on a cohort of 30 patients who underwent sleeve gastrectomy in the digestive surgery department of the Grenoble University Hospital (CHU de Grenoble). This surgical technique is now commonly used to treat patients with morbid obesity, with or without comorbidities. Through joint reflection with our surgical partner, we formalised the key notions of this surgical procedure. For each surgery in the cohort, we performed three distinct annotations: an annotation of the procedure and of the surgeon's hand actions, an assessment of the quality of exposure of the surgical scene for each dissection gesture performed by the surgeon, and the complete segmentation of the image associated with each assessed dissection gesture. The procedure annotation and the segmentation made it possible to extract metrics characterising the gesture and the surgical scene. We then developed an algorithm whose objective was to predict the quality of exposure from these metrics, along with an environment dedicated to optimising the hyper-parameters of our algorithm to maximise predictive performance, in particular to handle the specificities of our dataset. In a third step, we set up a method to confront the quantitative algorithmic analysis of our data with the clinical expertise of the surgeons who performed the surgeries. To do so, we first extracted the variables most important for our prediction task, then translated the information carried by these variables into statements with clinical meaning, and finally extracted video samples representative of each statement. From these statements and their video samples, we built a validation questionnaire and presented it to our surgical partners.
We thus conducted a clinical validation to gather their opinion on the clinical relevance of our approach. We therefore proposed a quantitative analysis method that makes explicit the link between visual and temporal observations and clinical criteria relating to laparoscopic surgeries. In the long run, a better understanding of these links would make it possible to offer systems supporting surgeon training in this complex practice.
Under laparoscopy, surgical treatment allows better patient care, and its practice is increasingly frequent in routine clinical work. It nevertheless presents its own difficulties for the surgeon and requires extended training during and after residency. To facilitate this training, one option is to develop tools for evaluating and analysing surgical practice. With this in mind, the objective of this thesis is to study the feasibility of a methodology that, starting from algorithmic processing, offers analyses of clinical relevance to the surgeon. I therefore addressed the following problems: collecting and annotating a dataset, implementing a learning environment dedicated to predicting a specific aspect of surgical practice, and proposing an approach to translate my algorithmic results into a form relevant to the surgeon. Wherever possible, we sought to validate the different steps of this methodology.
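As a rough illustration of the prediction and hyper-parameter-search step (not the thesis's actual environment), one could combine a random forest with a patient-wise cross-validated grid search in scikit-learn; all features, labels and grouping below are simulated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))           # 8 scene/gesture metrics per sample
y = rng.integers(0, 2, size=300)        # good vs. poor exposure (simulated)
groups = rng.integers(0, 30, size=300)  # 30 patients: keep folds patient-wise

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [3, None]},
    cv=GroupKFold(n_splits=5),          # no patient appears in train and test
    scoring="balanced_accuracy",
)
search.fit(X, y, groups=groups)
print(search.best_params_, search.best_score_)
```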
APA, Harvard, Vancouver, ISO, and other styles
39

Tran, Hoang Tung. "Automatic tag correction in videos : an approach based on frequent pattern mining." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4028/document.

Full text
Abstract:
This thesis presents a new system for video auto-tagging which aims at correcting the tags provided by users for videos uploaded on the Internet. Most existing auto-tagging systems rely mainly on the textual information and learn a great number of classifiers (one per possible tag) to tag new videos. However, the existing user-provided video annotations are often incorrect and incomplete. Indeed, users uploading videos might want to rapidly increase their video's number of views by tagging it with popular tags which are irrelevant to the video; they can also forget an obvious tag which might greatly help an indexing process. In this thesis, we limit the use of this questionable textual information and do not build a supervised model to perform the tag propagation. We propose to compare directly the visual content of the videos described by different sets of features such as SIFT-based bags of visual words or frequent patterns built from them. We then propose an original tag correction strategy based on the frequency of the tags in the visual neighbourhood of the videos. We have also introduced a number of strategies and datasets to evaluate our system. The experiments show that our method can effectively improve the existing tags and that frequent patterns built from bags of visual words are useful for constructing accurate visual features.
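A minimal sketch of the neighbourhood-voting idea, assuming each video is already summarised as a bag-of-visual-words histogram; the similarity measure and voting rule here are simplifications, not the thesis's exact strategy.

```python
from collections import Counter

import numpy as np

def correct_tags(query_hist, db_hists, db_tags, k=5, n_tags=3):
    # Histogram intersection as a simple visual similarity measure
    sims = [np.minimum(query_hist, h).sum() for h in db_hists]
    neighbours = np.argsort(sims)[::-1][:k]          # k most similar videos
    votes = Counter(tag for i in neighbours for tag in db_tags[i])
    return [tag for tag, _ in votes.most_common(n_tags)]

# Toy database: 50 videos with normalised 100-bin histograms and tags
db_hists = [np.random.dirichlet(np.ones(100)) for _ in range(50)]
db_tags = [["cat", "funny"] if i % 2 else ["dog"] for i in range(50)]
print(correct_tags(np.random.dirichlet(np.ones(100)), db_hists, db_tags))
```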
APA, Harvard, Vancouver, ISO, and other styles
40

Tarakci, Hilal. "An Ontology-based Multimedia Information Management System." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609865/index.pdf.

Full text
Abstract:
In order to manage the content of multimedia data, the content must be annotated. Although any user-defined annotation is acceptable, it is preferable if systems agree on the same annotation format. MPEG-7 is a widely accepted standard for multimedia content annotation. However, in MPEG-7, semantically identical metadata can be represented in multiple ways due to the lack of precise semantics in its XML-based syntax, which unfortunately prevents metadata interoperability. To overcome this problem, the MPEG-7 standard has been translated into an ontology. In this thesis, the MPEG-7 ontology is used as the upper ontology, and user-defined ontologies are attached to it via a user-friendly interface, thus building MPEG-7-based ontologies automatically. The proposed system is an ontology-based multimedia information management framework thanks to its modular architecture, the ease of integrating domain-specific ontologies naturally, and the automatic harmonization of the MPEG-7 ontology with domain-specific ontologies. Integration with domain-specific ontologies is carried out by enabling the import of domain ontologies via a user-friendly interface, which makes the system independent of application domains.
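In RDF terms, attaching a domain class beneath an MPEG-7 ontology class can be as simple as asserting a subclass link; the sketch below uses rdflib with placeholder URIs, not the system's actual namespaces.

```python
from rdflib import Graph, Namespace, RDF, RDFS

MPEG7 = Namespace("http://example.org/mpeg7#")   # placeholder namespace
SOCCER = Namespace("http://example.org/soccer#") # hypothetical domain ontology

g = Graph()
# Hang the domain class under an MPEG-7 class, then type an instance
g.add((SOCCER.GoalEvent, RDFS.subClassOf, MPEG7.Event))
g.add((SOCCER.goal42, RDF.type, SOCCER.GoalEvent))
print(g.serialize(format="turtle"))
```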
APA, Harvard, Vancouver, ISO, and other styles
41

Heggland, Jon. "OntoLog : Flexible Management of Semantic Video Content Annotations." Doctoral thesis, Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-707.

Full text
Abstract:

To encode, query and present the semantic content of digital video precisely and flexibly is very useful for many kinds of knowledge work: system analysis and evaluation, documentation and education, to name a few. However, that kind of video management is not a trivial matter. The traditional stratified annotation model has quite poor facilities for specifying the meaning – the structure and relationships – of the strata. Because of this, it may also be troublesome to present the annotations to the users in a clear and flexible manner.

This thesis presents OntoLog, a system for managing the semantic content of video. It extends the stratified annotation model by defining the strata as objects and classes in ontologies, thereby making their semantic meaning more explicit and relating them to each other in a semantic network. The same ontologies are also used to define properties and objects for describing both the strata, individual video intervals and entire videos. This constitutes a very customisable, expressive and precise description model, without sacrificing simplicity and conceptual integrity.

Arranging the annotation strata in a near-hierarchical network with specified semantics (classes, subclasses and instances) also enables reasoning about the annotations during query and browsing. In particular, it enables visual aggregation of traditional timeline-based strata graphics. Using this to create compact content visualisations, the OntoLog system is able to present tens of videos on screen at the same time, thus providing inter-video browsing. By judiciously disaggregating selected parts of the strata hierarchy, users can focus on relevant strata at their preferred level of detail – overview-and-zoom functionality for semantic annotations, in other words.

The OntoLog system has been implemented in the form of six Java applications and web services – together covering annotation editing, browsing, analysis, search, query and presentation with various approaches – built on top of an RDF database founded on SQL. The system has been tested under realistic conditions in several real-world projects, with good results. A novel information gathering interface for OntoLog data, Savanta, has been created. This is based on an iterative interaction paradigm featuring inter-video browsing, filtering, navigation and context-sensitive temporal analysis of the annotations. In a comparative usability evaluation, Savanta is shown to outperform more traditional user interfaces for video search/browsing with regard to expressive power, straightforwardness and user satisfaction.

APA, Harvard, Vancouver, ISO, and other styles
42

Yilmazturk, Mehmet Celaleddin. "Online And Semi-automatic Annotation Of Faces In Personal Videos." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12611936/index.pdf.

Full text
Abstract:
Video annotation has become an important issue due to the rapidly increasing amount of video available. For efficient video content searches, annotation has to be done beforehand, which is a time-consuming process if done manually. Automatic annotation of faces for person identification is a major challenge in the context of content-based video retrieval. This thesis focuses on the development of a semi-automatic face annotation system which benefits from online learning methods. The system creates a face database by using face detection and tracking algorithms to collect samples of the faces encountered in the video and by receiving labels from the user. Using this database, a learner model is trained. While the training session continues, the system starts offering labels for newly encountered faces and lets the user acknowledge or correct the suggested labels; the learner is thus updated online throughout the video. The user is free to train the learner until satisfactory results are obtained. In order to create the face database, a shot boundary detection algorithm is implemented to partition the video into semantically meaningful segments, and the user browses through the video from one shot boundary to the next. A face detector followed by a face tracker is implemented to collect face samples between two shot boundary frames. For online learning, computationally efficient feature extraction and classification methods are investigated and evaluated, and sequential variants of some robust batch classification algorithms are implemented. Combinations of feature extraction and classification methods have been tested and compared according to their face recognition accuracy and computational performance.
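The online-learning loop can be sketched as follows, with scikit-learn's SGDClassifier standing in for the sequential classifiers evaluated in the thesis; face descriptors and user labels are simulated.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

classes = np.array([0, 1, 2])            # person identities known so far
clf = SGDClassifier(loss="log_loss", random_state=0)
rng = np.random.default_rng(0)

for step in range(100):                  # faces arriving along the video
    x = rng.normal(size=(1, 64))         # one face descriptor (simulated)
    if step > 0:                         # after the first update, the system
        suggestion = clf.predict(x)[0]   # ...suggests a label to the user
    true_label = rng.integers(0, 3)      # user acknowledges or corrects
    clf.partial_fit(x, [true_label], classes=classes)  # online update
```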
APA, Harvard, Vancouver, ISO, and other styles
43

Yaprakkaya, Gokhan. "Face Identification, Gender And Age Groups Classifications For Semantic Annotation Of Videos." Thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612848/index.pdf.

Full text
Abstract:
This thesis presents a robust face recognition method and a combination of methods for gender identification and age group classification for semantic annotation of videos. A 256-bin local binary pattern (LBP) histogram and pixel intensity differences are used as facial features for gender classification; DCT-mod2 features and edge detection results around facial landmarks are used as facial features for age group classification. In the gender classification module, a Random Trees classifier is trained with LBP features and an AdaBoost classifier is trained with pixel intensity differences. In the age group classification module, DCT-mod2 features are used to train one Random Trees classifier and LBP features around facial landmark points are used to train another. In the face identification module, DCT-mod2 features of the detected faces, morphed by a two-dimensional face morphing method based on the Active Appearance Model and barycentric coordinates, are used as inputs to a nearest neighbour classifier with weights obtained from the trained Random Forest classifier. Different feature extraction methods were tried and compared, and the best-performing one was chosen for the face recognition module. We compared our classification results with those of successful earlier works in experiments on the same datasets and obtained satisfactory results.
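For concreteness, here is a minimal NumPy sketch of the 256-bin LBP histogram used as a gender feature, assuming an 8-bit grayscale face crop; the exact neighbourhood ordering in the thesis may differ.

```python
import numpy as np

def lbp_histogram(img):
    c = img[1:-1, 1:-1]                    # compare each pixel to 8 neighbours
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                  img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
                  img[2:, :-2], img[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):   # one bit per neighbour comparison
        codes |= (n >= c).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()               # normalised 256-bin histogram

face = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
print(lbp_histogram(face).shape)           # (256,)
```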
APA, Harvard, Vancouver, ISO, and other styles
44

Zilevu, Kobla Setor. "Interactive Interfaces for Capturing and Annotating Videos of Human Movement." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/91424.

Full text
Abstract:
In this thesis, I describe the iterative service design process I used in identifying and understanding the needs of diverse stakeholders, the development of technologies to support their mutually beneficial needs, and the evaluation of the end-user experience with these technologies. Over three iterative design cycles, the set of identified end-user customers expanded to include the patient, the supervising therapist, the annotating therapist, and other members of the development team. Multiple versions of interactive movement capture and annotation tools were developed as the needs of these stakeholders were clarified and evolved, and the optimal data forms and structures became evident. Interactions between the stakeholders and the developed technologies operating in various environments were evaluated and assessed to help improve and optimize the entire service ecosystem. Results and findings from these three design cycles are being used to direct and shape my ongoing and future doctoral research.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
45

Rocque, Ryan K. "A Study of the Effectiveness of Annotations in Improving the Listening Comprehension of Intermediate ESL Learners." Diss., CLICK HERE for online access, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2370.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Zhao, Yang. "Person Retrieval with Deep Learning." Thesis, Griffith University, 2022. http://hdl.handle.net/10072/411526.

Full text
Abstract:
Person retrieval aims at matching person images across multiple non-overlapping camera views. It has facilitated a wide range of important applications in intelligent video analysis. The task remains challenging due to dramatic changes in visual appearance caused by large intra-class variations from human pose and camera viewpoint, misaligned person detection and occlusion. How to learn discriminative features under these challenging conditions becomes the core issue for person retrieval. According to the input modality, person retrieval can be categorised into image-based and video-based retrieval. Despite decades of effort, person retrieval remains unsolved due to the following factors: 1) the large intra-class variations (e.g., pose variation) of pedestrian images, leading to dramatic changes in their appearance; 2) only heuristically coarse-grained region strips, or pixel-level annotations borrowed directly from pretrained human parsing models, are employed, impeding the efficacy and practicality of region representation learning; 3) the absence of useful temporal cues for boosting video person retrieval. This thesis reports a series of technical solutions to these challenges. To address the large intra-class variations among person images, we introduce an improved triplet loss such that the global feature representations from the same identity are closely clustered. To learn a discriminative region representation within fine-grained segments while avoiding expensive pixel-level annotations, we introduce a novel identity-guided human region segmentation method that predicts informative region segments, enabling discriminative region representation learning. To extract useful temporal cues for video person retrieval, we build a two-stream architecture, named the appearance-gait network, to jointly learn appearance features and gait features from RGB video clips and silhouette video clips. To further provide potentially useful information for person retrieval, we introduce a lightweight and effective knowledge distillation method for facial landmark detection. We believe that the proposed approaches can serve as benchmark methods and provide new perspectives for the person retrieval task.
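The margin-based triplet objective at the heart of the first contribution can be sketched in a few lines of PyTorch; this is the standard formulation, not the thesis's improved variant, and the embeddings are simulated.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    d_ap = F.pairwise_distance(anchor, positive)   # same identity
    d_an = F.pairwise_distance(anchor, negative)   # different identity
    return F.relu(d_ap - d_an + margin).mean()     # penalise margin violations

a, p, n = (torch.randn(32, 128) for _ in range(3)) # batch of 128-d embeddings
print(triplet_loss(a, p, n))
```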
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology
APA, Harvard, Vancouver, ISO, and other styles
47

Prié, Yannick. "Vers une phénoménologie des inscriptions numériques. Dynamique de l'activité et des structures informationnelles dans les systèmes d'interprétation." Habilitation à diriger des recherches, Université Claude Bernard - Lyon I, 2011. http://tel.archives-ouvertes.fr/tel-00655574.

Full text
Abstract:
This report presents our work at Université Claude Bernard Lyon 1. It is composed of three parts: the first proposes an original framing of the links between activity and the digital inscriptions of activity in the form of informational structures; the two others are devoted to our main research themes, namely audiovisual active reading and interpretation systems on the one hand, and modelled-trace-based systems on the other. The first chapter explores the links between computer-mediated activity and the representations involved in it. A first critical study describes the notion of knowledge inscription proposed by B. Bachimont within a theory of the support, which aims to capture the encounter, or experience, that a consciousness has of technological devices and inscriptions. This proposal seeks to theorise the digital nature of inscriptions, which is why it interests us, but in our view it proves insufficient for thinking about an active individual engaged in a process of manipulating inscriptions. A second study is then devoted to the links between action, activity and inscriptions in so-called "post-cognitivist" theories of cognition. Inscriptions both support and underlie activity, and we are particularly interested in inscriptions as perceived and manipulated by a human being according to instrumental theory; however, inscriptions insofar as they can be digital are hardly theorised there. Our last study centres on the proposal to think of the notion of informational structure, and the associated instruments, as what articulates the digital world to human activity and human activity to the digital world. An informational structure is a digital inscription in action, objectifiable but not necessarily canonical, that is, explicitly manipulated by the system. This proposal makes it possible to think the human side of instrumented activity while keeping computation and the associated representations as a computer-science concern. We also present the notion of informational space that a user enacts, and the research directions opened up by our proposals. The second chapter is mainly devoted to our work on audiovisual active reading systems. We first propose a brief study of the general framework of intellectual technologies as supporting intellectual work, the open activity of interpretation and the manipulation of personal knowledge inscriptions. Such inscriptions and re-inscriptions take place within what we propose to call interpretation systems, which offer users the possibility of consciously manipulating informational structures of all kinds, for example in the form of data, schemas, style sheets or forms, and of sharing them as reifications of practices. Audiovisual active reading is an intellectual activity carried out within an interpretation system that makes it possible to build hypervideos from annotations.
We then present our work on the Advene project (Annotate Digital Video, Exchange on the NEt), in particular the associated models, the generic Advene tool for annotating and building hypervideos, and several active-reading applications related to interaction analysis and film criticism. We then take stock of almost ten years of this project and propose some directions for the future. The third and last chapter essentially presents our work on digital traces. We first define the notion of trace in general as an inscription that allows the past to be targeted in the course of an interpretation, and show how traces of mediated activity are widely used in systems ranging from analysis to reflexivity. We consider that the challenge is to manipulate explicit digital traces, defined as "temporally oriented canonical inscriptions", within trace-oriented interpretation systems. We then present the notion of modelled trace as a certain type of explicit trace, as well as our work in this framework over more than ten years: the Musette approach, the general framework of modelled-trace-based systems (SBTm), and the formalisation of traces and transformations for building trace-base management systems (SGBT). Various applied works are then presented that illustrate the different uses of modelled traces in varied application contexts, in particular to support awareness, recollection, reflexivity, redocumentation and sharing, as well as activity resumption.
APA, Harvard, Vancouver, ISO, and other styles
48

Qiu, Shun-Qun, and 邱順群. "Video Bookmarks – Prototype of Video Annotation Tool." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/a6ddv7.

Full text
Abstract:
Master's thesis
Shih Chien University
Master's Program, Department of Industrial Product Design
106
"Video Bookmarks" is a supplementary information on the timeline in a movie screen. It has two display types. The first type is unexpanded and is presented on the timeline in the form of a dialog bubble; after being clicked, it is rendered in a dialog window, providing the comment content of the video paragraph.This study is to explore the possibility of the development of video bookmarker annotation tool, which is based on the learn and discussion of video. The research is divided into four stages: "using requirements exploration", "designing with users", "usability testing", "product prototype description", and so on. According to the interview data and the three versions of the transformation process to show the results of the study. Since the timeline of the video player has both the "status display" and "accept adjustments" features. Adding a subsidy message around the timeline will change the "status display" of the timeline to help the user pre-identify the selected video paragraph. Therefore, film bookmarks can provide three kinds of services: time axis supplementary information, annotated content to strengthen specific information, annotated content to bring up a total of contrasting information, and these three services have become the guiding signs for users in the process of non-linear browsing. According to the interview data, this design is suitable for "closed discussion" and knowledge representation belongs to "procedural knowledge" educational context, in order to help teachers and students to communicate and review the blind spot in the learning process. There are two kinds of operation behavior: reading video bookmark and adding video bookmark, which correspond to " video bookmark" and " annotation tool " and so on. Enables users to use annotation tool to edit video bookmark to clearly communicate the message. The tagging tool should retain the text string, graffiti, voice input and combined with the film control instructions to remove the pressure of time for the user to facilitate editing operations. The size of the video bookmark, display location and interactive type are important design conditions of this product. These are the development points of the basic version of the Movie Bookmarker-tagging tool. From the data from the usability test, we can infer the mapping effect of user identity "video bookmark" on the timeline and consider it to be a nudge. In the future, the "annotation tool " can be adapted in detail to help teachers and students edit " video bookmarks" so that both sides can have a cross-space-time exchange and discussion based on the video.
APA, Harvard, Vancouver, ISO, and other styles
49

Wu, Wen-Kai, and 吳文凱. "Wiki-based Collaborative Video Annotation System." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/11740303788819181923.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Computer Science and Information Engineering
97
With the rapid development of Web 2.0, sharing information on social platforms has become common and popular, as on Wikipedia and blogs. Wikipedia is the representative collaborative platform allowing users to write and edit web content. Furthermore, streaming and compression technologies let users access multimedia data through the network more easily than before, and YouTube is the best-known website for sharing video. Our system offers an easy-to-use interface for annotating and adding information to video. By integrating the Wiki spirit with shared YouTube videos, users can collaboratively annotate and edit videos on our platform. Finally, using an equation to determine the highest credit, the system recommends annotated videos to users.
APA, Harvard, Vancouver, ISO, and other styles
50

Hong, Guo-Xiang, and 洪國翔. "Optimal Training Set Selection for Video Annotation." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/28133091433393507547.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Department of Electrical Engineering
97
Most learning-based video semantic analysis methods require a large training set to obtain a good semantic model and achieve good performance. However, annotating a large video is labour-intensive, and collecting the training set is not easy either. Most training-set selection schemes adopt random selection or select only parts of the video data, neglecting the similarity and coverage characteristics of the training set. In this thesis, we propose several methods to construct the training set while reducing user involvement: clustering-based, spatial-dispersiveness, temporal-dispersiveness and sample-based selection. Using these schemes, we aim to construct a small yet effective training set by exploiting the spatial and temporal distribution and the clustering information of the whole video collection. If the selected training data represent the characteristics of the whole video collection, classification performance will be better even when the training set is much smaller than the full data. We choose the best samples for training a semantic model and use an SVM to classify each sample. This thesis classifies shots into five semantic categories: person, landscape, cityscape, map and others. Experimental results show that these methods are effective for training-set selection in video annotation and outperform random selection.
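One way to realise the clustering-based scheme (a sketch under our own assumptions, not the thesis's exact criteria) is to run k-means over all shot features and hand-annotate only the sample nearest each centroid, so a small training set still covers the data distribution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

features = np.random.rand(1000, 32)          # one feature vector per shot
km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(features)
# Pick the shot closest to each cluster centre as a training candidate
selected = pairwise_distances_argmin(km.cluster_centers_, features)
print(sorted(set(selected))[:10])            # shot indices to hand-annotate
```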
APA, Harvard, Vancouver, ISO, and other styles
