
Dissertations / Theses on the topic 'Crowded scenes'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 18 dissertations / theses for your research on the topic 'Crowded scenes.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Ali, Saad. "Taming Crowded Visual Scenes." Doctoral diss., University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3593.

Full text
Abstract:
Computer vision algorithms have played a pivotal role in commercial video surveillance systems for a number of years. However, a common weakness among these systems is their inability to handle crowded scenes. In this thesis, we have developed algorithms that overcome some of the challenges encountered in videos of crowded environments such as sporting events, religious festivals, parades, concerts, train stations, airports, and malls. We adopt a top-down approach by first performing a global-level analysis that locates dynamically distinct crowd regions within the video. This knowledge is then employed in the detection of abnormal behaviors and tracking of individual targets within crowds. In addition, the thesis explores the utility of contextual information necessary for persistent tracking and re-acquisition of objects in crowded scenes. For the global-level analysis, a framework based on Lagrangian Particle Dynamics is proposed to segment the scene into dynamically distinct crowd regions or groupings. For this purpose, the spatial extent of the video is treated as a phase space of a time-dependent dynamical system in which transport from one region of the phase space to another is controlled by the optical flow. Next, a grid of particles is advected forward in time through the phase space using a numerical integration to generate a "flow map". The flow map relates the initial positions of particles to their final positions. The spatial gradients of the flow map are used to compute a Cauchy Green Deformation tensor that quantifies the amount by which the neighboring particles diverge over the length of the integration. The maximum eigenvalue of the tensor is used to construct a forward Finite Time Lyapunov Exponent (FTLE) field that reveals the Attracting Lagrangian Coherent Structures (LCS). The same process is repeated by advecting the particles backward in time to obtain a backward FTLE field that reveals the repelling LCS. 
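The FTLE construction described above (flow map → Cauchy-Green deformation tensor → maximum eigenvalue) can be sketched in a few lines. This is a minimal NumPy illustration that assumes the flow map has already been obtained by advecting a grid of particles through the optical flow; the function and variable names, and the unit grid spacing, are our assumptions rather than the thesis's code:

```python
import numpy as np

def ftle_field(fx, fy, T):
    """Forward FTLE from a flow map: fx/fy hold the final position of the
    particle that started at each grid cell after integration time T
    (unit grid spacing assumed)."""
    # spatial gradients of the flow map w.r.t. the initial grid coordinates
    dfx_dy0, dfx_dx0 = np.gradient(fx)
    dfy_dy0, dfy_dx0 = np.gradient(fy)
    out = np.zeros_like(fx, dtype=float)
    for idx in np.ndindex(fx.shape):
        J = np.array([[dfx_dx0[idx], dfx_dy0[idx]],
                      [dfy_dx0[idx], dfy_dy0[idx]]])
        C = J.T @ J                      # Cauchy-Green deformation tensor
        lam = np.linalg.eigvalsh(C)[-1]  # largest eigenvalue
        # exponential rate at which neighbouring particles diverge
        out[idx] = np.log(np.sqrt(max(lam, 1e-12))) / abs(T)
    return out
```

Running the same computation on a flow map obtained by advecting particles backward in time yields the backward FTLE field; per the abstract, ridges of the forward field mark the attracting LCS and ridges of the backward field the repelling LCS.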
The attracting and repelling LCS are the time dependent invariant manifolds of the phase space and correspond to the boundaries between dynamically distinct crowd flows. The forward and backward FTLE fields are combined to obtain one scalar field that is segmented using a watershed segmentation algorithm to obtain the labeling of distinct crowd-flow segments. Next, abnormal behaviors within the crowd are localized by detecting changes in the number of crowd-flow segments over time. Next, the global-level knowledge of the scene generated by the crowd-flow segmentation is used as an auxiliary source of information for tracking an individual target within a crowd. This is achieved by developing a scene structure-based force model. This force model captures the notion that an individual, when moving in a particular scene, is subjected to global and local forces that are functions of the layout of that scene and the locomotive behavior of other individuals in his or her vicinity. The key ingredients of the force model are three floor fields that are inspired by research in the field of evacuation dynamics; namely, Static Floor Field (SFF), Dynamic Floor Field (DFF), and Boundary Floor Field (BFF). These fields determine the probability of moving from one location to the next by converting the long-range forces into local forces. The SFF specifies regions of the scene that are attractive in nature, such as an exit location. The DFF, which is based on the idea of active walker models, corresponds to the virtual traces created by the movements of nearby individuals in the scene. The BFF specifies influences exhibited by the barriers within the scene, such as walls and no-entry areas. By combining influence from all three fields with the available appearance information, we are able to track individuals in high-density crowds. The results are reported on real-world sequences of marathons and railway stations that contain thousands of people. 
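The way the three floor fields turn long-range forces into local move probabilities can be illustrated with a toy transition kernel. In evacuation-dynamics cellular automata, the probability of stepping into a neighbouring cell is typically proportional to an exponential of the weighted fields; the coupling constants and the additive combination below are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def move_probabilities(sff, dff, bff, k_s=1.0, k_d=1.0, k_b=1.0):
    """Transition probabilities over neighbouring cells from the Static,
    Dynamic, and Boundary Floor Fields (toy version)."""
    # higher field value -> more attractive cell; exponential weighting
    score = np.exp(k_s * np.asarray(sff)
                   + k_d * np.asarray(dff)
                   + k_b * np.asarray(bff))
    return score / score.sum()
```

As the abstract notes, the tracker combines such field-driven probabilities with the available appearance information when scoring candidate target locations.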
A comparative analysis with respect to an appearance-based mean shift tracker is also conducted by generating the ground truth. The result of this analysis demonstrates the benefit of using floor fields in crowded scenes. The occurrence of occlusion is very frequent in crowded scenes due to a high number of interacting objects. To overcome this challenge, we propose an algorithm that has been developed to augment a generic tracking algorithm to perform persistent tracking in crowded environments. The algorithm exploits the contextual knowledge, which is divided into two categories consisting of motion context (MC) and appearance context (AC). The MC is a collection of trajectories that are representative of the motion of the occluded or unobserved object. These trajectories belong to other moving individuals in a given environment. The MC is constructed using a clustering scheme based on the Lyapunov Characteristic Exponent (LCE), which measures the mean exponential rate of convergence or divergence of the nearby trajectories in a given state space. Next, the MC is used to predict the location of the occluded or unobserved object in a regression framework. It is important to note that the LCE is used for measuring divergence between a pair of particles while the FTLE field is obtained by computing the LCE for a grid of particles. The appearance context (AC) of a target object consists of its own appearance history and appearance information of the other objects that are occluded. The intent is to make the appearance descriptor of the target object more discriminative with respect to other unobserved objects, thereby reducing the possible confusion between the unobserved objects upon re-acquisition. This is achieved by learning the distribution of the intra-class variation of each occluded object using all of its previous observations. In addition, a distribution of inter-class variation for each target-unobservable object pair is constructed. 
Finally, the re-acquisition decision is made using both the MC and the AC.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
APA, Harvard, Vancouver, ISO, and other styles
2

Bhatnagar, Deepti S. M. Massachusetts Institute of Technology. "Dropped object detection in crowded scenes." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53204.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 83-85).
In the last decade, the topic of automated surveillance has become very important in the computer vision community. Especially important is the protection of critical transportation places and infrastructure such as airports and railway stations. As a step in that direction, we consider the problem of detecting abandoned objects in a crowded scene. Assuming that the scene is being captured through a mid-field static camera, our approach consists of segmenting the foreground from the background and then using a change analyzer to detect any objects that meet certain criteria. In this thesis, we describe a background model and a method of bootstrapping that model in the presence of foreign objects in the foreground. We then use a Markov Random Field formulation to segment the foreground in image frames sampled periodically from the video camera. We use a change analyzer to detect foreground blobs that remain static in the scene and, based on certain rules, decide whether a blob could be a potentially abandoned object.
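The change-analyzer rule at the end of the abstract (a foreground blob that stays static long enough becomes a candidate abandoned object) reduces, at its simplest, to a per-pixel persistence counter. A minimal sketch, with names and thresholds chosen for illustration:

```python
import numpy as np

def abandoned_candidates(fg_masks, min_static_frames):
    """Per-pixel persistence counter over sampled foreground masks:
    pixels that stay foreground for min_static_frames consecutive
    samples are candidate abandoned-object regions."""
    streak = np.zeros(fg_masks[0].shape, dtype=int)
    for mask in fg_masks:
        # extend the streak where the pixel is foreground, reset elsewhere
        streak = np.where(mask, streak + 1, 0)
    return streak >= min_static_frames
```

A real system would additionally group the flagged pixels into blobs and apply size and shape rules before raising an alert.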
by Deepti Bhatnagar.
S.M.
3

Pathan, Saira Saleem [Verfasser], Bernd [Akademischer Betreuer] Michaelis, and Ayoub [Akademischer Betreuer] Al-Hamadi. "Behavior understanding in non-crowded and crowded scenes / Saira Saleem Pathan. Betreuer: Bernd Michaelis ; Ayoub Al-Hamadi." Magdeburg : Universitätsbibliothek, 2012. http://d-nb.info/1053914083/34.

Full text
4

Pathan, Saira Saleem [Verfasser], Bernd [Akademischer Betreuer] Michaelis, and Ayoub [Akademischer Betreuer] Al-Hamadi. "Behavior understanding in non-crowded and crowded scenes / Saira Saleem Pathan. Betreuer: Bernd Michaelis ; Ayoub Al-Hamadi." Magdeburg : Universitätsbibliothek, 2012. http://d-nb.info/1053914083/34.

Full text
5

Wang, Lu, and 王璐. "Three-dimensional model based human detection and tracking in crowded scenes." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46587421.

Full text
6

Tang, Siyu [Verfasser], and Bernt [Akademischer Betreuer] Schiele. "People detection and tracking in crowded scenes / Siyu Tang ; Betreuer: Bernt Schiele." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2017. http://d-nb.info/1142919722/34.

Full text
7

Simonnet, Damien Remi Jules Joseph. "Detecting and tracking humans in crowded scenes based on 2D image understanding." Thesis, Kingston University, 2012. http://eprints.kingston.ac.uk/28213/.

Full text
Abstract:
Tracking pedestrians in surveillance videos is an important task, not only in itself but also as a component of pedestrian counting, activity and event recognition, and scene understanding in general. Robust tracking in crowded environments remains a major challenge, mainly due to the occlusions and interactions between pedestrians. Methods to detect humans in a single frame are becoming increasingly accurate. Therefore, the majority of multi-target tracking algorithms in crowds follow a tracking-by-detection approach, along with models of individual and group behaviour, and various types of features to re-identify any given pedestrian (and discriminate them from the remainder). The aim is, given a Closed Circuit TeleVision (CCTV) camera view (moving or static) of a crowded scene, to produce tracks that indicate which pedestrians are entering and leaving the scene, to be used in further applications (e.g. a multi-camera tracking scenario). Therefore, this output should be accurate in terms of position and have few false alarms and identity changes (i.e. tracks must not be fragmented or switch identity). Consequently, the presented algorithm concentrates on two important characteristics: firstly, production of a real-time or near real-time output that is practically usable for further applications without penalising the final system; secondly, management of occlusions, which is the main challenge in crowds. The methodology presented, based on a tracking-by-detection approach, advances both aspects through a hierarchical framework that solves short and long occlusions with two novel methods. First, at a fine temporal scale, kinematic features and appearance features based on non-occluded parts are combined to generate short and reliable 'tracklets'. More specifically, this part uses an occlusion map which attributes a local measurement (by searching over the non-occluded parts) to a target without a global measurement (i.e.
a measurement generated by the global detector), and demonstrates better results in terms of tracklet length without generating more false alarms or identity changes. Over a longer scale, these tracklets are associated with each other to build up longer tracks for each pedestrian in the scene. This tracklet data association is based on a novel approach that uses dynamic time warping to locate and measure the possible similarities of appearances between tracklets, by varying the time step and phase of the frame-based visual feature. The method, which does not require any target initialisations or camera calibrations, shows significant improvements in terms of false alarms and identity changes, the latter being a critical point for evaluating tracking algorithms. The evaluation framework, based on different metrics introduced in the literature, consists of a set of new track-based metrics (in contrast to frame-based ones) which enable the failing parts of a tracker to be identified and algorithms to be compared with a single value. Finally, the advantages of the dual method proposed to solve long and short occlusions are that it simultaneously reduces the problems of track fragmentation and identity switches, and that it is naturally extensible to a multi-camera scenario. Results are presented as a tag-and-track system over a network of moving and static cameras. The new methodology introduced (i.e. building tracklets based on non-occluded pedestrian parts plus re-identification with dynamic time warping) shows significant improvements on public datasets for multi-target tracking in crowds (e.g. the Oxford Town Centre (OTC) dataset). Two new datasets are introduced to test the robustness of the proposed algorithm in more challenging scenarios. Firstly, a CCTV view of a shopping centre is used to demonstrate the effectiveness of the algorithm in a more crowded scenario.
Secondly, a dataset with a network of CCTV Pan Tilt Zoom (PTZ) cameras tracking a single pedestrian, demonstrates the capability of the algorithm to handle a very difficult scenario (abrupt motion and non-overlapping camera views) and therefore its applicability as a component of a multitarget tracker in a network of static and PTZ cameras. The thesis concludes with a critical analysis of the work and presents future research opportunities (notably the use of this framework in a non-overlapping network of static and PTZ cameras).
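The tracklet re-identification step above rests on dynamic time warping between sequences of frame-based appearance features. A textbook DTW distance, shown here on generic feature rows (the thesis additionally varies the time step and phase of the feature, which this sketch omits):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences;
    each row of a and b is one per-frame appearance descriptor."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Two tracklets whose appearance sequences warp onto each other at low cost are candidates for belonging to the same pedestrian.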
8

Bažout, David. "Detekce anomálií v chování davu ve video-datech z dronu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445484.

Full text
Abstract:
Many new drone applications have appeared in recent years, and drones are often used by national security forces. The aim of this work is to design and implement a tool for crowd behavior analysis in drone video data. This tool identifies suspicious behavior of persons and facilitates its localization. The main contributions include the design of a suitable video stabilization algorithm that compensates for small jitters and recovers the scene after it is lost. Furthermore, two anomaly detectors are proposed, differing in the method of feature vector extraction and background modeling. Compared to state-of-the-art approaches, they achieve comparable results while adding the possibility of online data processing.
9

Mladinovic, Mirjam. "'In order when most out of order' : crowds and crowd scenes in Shakespearean drama." Thesis, University of Liverpool, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.569436.

Full text
Abstract:
This thesis investigates the representations of crowds and crowd scenes in Shakespearean drama. Contrary to the assumption that the crowd's character in early modern drama had a peripheral role, this thesis argues that Shakespeare's crowd is a complex "character" in its own right, and that the playwright's use of it in his drama reveals its dramatic importance. On the stage the crowd was not dangerous because its role was scripted. This study further proposes to view the character of the crowd from a perspective that has not been applied before in reading Shakespeare's drama. It employs Martin Buber's concept 'I-Thou', aiming to demonstrate that Shakespeare's dramatic characters should be perceived as "dramatic items", and examined through their relations, dramatic and theatrical. Furthermore, this thesis introduces the concept of 'the space of the character' which, unlike the term 'character', refers to theatrical relations that shape "dramatic identities" during the theatrical production. This thesis argues that our understanding of the dramatised hero and the crowd is only fully accomplished when we understand, and acknowledge, the relation between them, and that the relation is not only apparent, but inherent to crowd scenes. It is this non-tangible outcome of interaction between staged characters, and the network of these different theatrical relations, that constitutes the 'theatrical' effectiveness of the crowd scene. This thesis further argues that the crowd scenes are always political in nature, and that they focus not only on the interaction between the crowd and the authority figure, but also on the interaction between the stage and the audience. The key point is that the role of the audience in theatre has been widely debated and recognised, and yet the role of crowd scenes has not.
This study insists that a crowd scene should be seen as a dramaturgical device or a theatrical trope that utilises the presence of the audience in such a way that no other scene can. It can incorporate the audience in the theatre and simultaneously give them voice on the stage. Through his dramatisation of the character of the crowd Shakespeare reforms our views about crowds. He reminds his audience that the "crowd" is not a many-headed multitude at all times, but that it consists of individuals with different viewpoints. Shakespeare's crowd is thus meaningful and always 'in order when most out of order'.
10

Lister, Wayne Daniel. "Real-time rendering of animated crowd scenes." Thesis, University of East Anglia, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.551209.

Full text
Abstract:
Simulated crowds can be found in a wide range of real-time applications. Examples include urban planning and cultural heritage visualizations, disaster and military training simulations, through to perhaps most prominently the use of virtual crowds purely for entertainment purposes in the gaming industry. Crowd simulation is very much an interdisciplinary concern and its importance has motivated researchers from a variety of fields, including computer graphics, psychology and robotics. This thesis considers the problem purely from a computer graphics perspective and introduces three new techniques to animate and draw a crowd of virtual humans in real-time. Contribution 1 addresses vertex skinning and begins by noting that for scenes in which many thousands of characters are visualized, it is often the case that individuals are doing much the same thing. A caching system is therefore proposed and used to accelerate the rendering of a crowd by taking advantage of the temporal and intra-crowd coherencies that are inherent within a populated scene. The approach can be considered a geometric interpretation of dynamic impostors and is best suited to low-entropy scenes such as sports fans clapping and cheering in a stadium. Contributions 2 and 3 consider skeletal animation. For performance reasons previous works have relied heavily on pre-computation when animating their crowds, but the associated trade-off is control. It is currently far too difficult to make members of a crowd do anything other than play a scripted animation clip, and high-level techniques such as inverse kinematics are yet to be fully explored. This thesis describes how a combination of compute shaders and middleware can remove the need for pre-computation and enable a huge library of 'off-the-shelf' animation techniques, not usually available when visualizing a crowd, to be deployed on thousands of crowd members simultaneously.
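The caching idea in Contribution 1 (crowd members in the same pose share one skinning computation) can be sketched as a memo table keyed on the animation state. The class and method names below are ours, and a real renderer would also evict stale entries each frame:

```python
class PoseCache:
    """Cache of skinned vertex buffers keyed by (clip, frame): crowd
    members playing the same clip at the same frame reuse one result."""

    def __init__(self):
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def skinned_vertices(self, clip, frame, skin_fn):
        key = (clip, frame)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = skin_fn(clip, frame)  # expensive skinning
        else:
            self.hits += 1
        return self.cache[key]
```

The scheme pays off exactly in the low-entropy scenes the abstract mentions, where many characters repeat the same short clips.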
11

Pellicanò, Nicola. "Tackling pedestrian detection in large scenes with multiple views and representations." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS608/document.

Full text
Abstract:
Pedestrian detection and tracking have become important fields in Computer Vision research, due to their implications for many applications, e.g. surveillance, autonomous cars, and robotics. Pedestrian detection in high-density crowds is a natural extension of this research body. The ability to track each pedestrian independently in a dense crowd has multiple applications: the study of human social behavior under high densities; the detection of anomalies; and large-event infrastructure planning. On the other hand, high-density crowds introduce novel problems to the detection task. First, clutter and occlusion problems are taken to the extreme, so that only heads are visible, and they are not easily separable from the moving background. Second, heads are usually small (typically less than ten pixels in diameter) and have little or no texture. This stems from two independent constraints: the need for each camera to have as wide a field of view as possible, and the need for anonymization, i.e. the pedestrians must not be identifiable because of privacy concerns. In this work we develop a complete framework to handle the pedestrian detection and tracking problems in the presence of these novel difficulties, using multiple cameras in order to implicitly handle the heavy occlusion.
As a first contribution, we propose a robust method for camera pose estimation in surveillance environments. We handle problems such as large distances between cameras, large perspective variations, and scarcity of matching information by exploiting an entire video stream to perform the calibration, in such a way that it exhibits fast convergence to a good solution. Moreover, we are concerned not only with the global fitness of the solution, but also with reaching low local errors. As a second contribution, we propose an unsupervised multiple-camera detection method which exploits the visual consistency of pixels between multiple views in order to estimate the presence of a pedestrian. After a fully automatic metric registration of the scene, one is capable of jointly estimating the presence of a pedestrian and its height, allowing for the projection of detections on a common ground plane, and thus for 3D tracking, which can be much more robust than image-space-based tracking. In the third part, we study different methods for performing supervised pedestrian detection on single views. Specifically, we aim to build a dense pedestrian segmentation of the scene starting from spatially imprecise labeling of the data, i.e. head centers instead of full head contours, since their extraction is unfeasible in a dense crowd. Most notably, deep architectures for semantic segmentation are studied and adapted to the problem of small-head detection in cluttered environments.
As a final contribution, we propose a novel framework for performing efficient information fusion in 2D spaces. The final aim is to perform multiple-sensor fusion (supervised detectors on each view, and an unsupervised detector on multiple views) at ground-plane level, which is thus our discernment frame. Since the space complexity of such a discernment frame is very large, we propose an efficient compound-hypothesis representation which has been shown to be invariant to the scale of the search space. Through such a representation, we are capable of defining efficient basic operators and combination rules of Belief Function Theory. Furthermore, we propose a complementary graph-based description of the relationships between compound hypotheses (i.e. intersections and inclusions), in order to enable efficient algorithms for, e.g., high-level decision making. Finally, we demonstrate our information fusion approach both at a spatial level, i.e. between detectors of different natures, and at a temporal level, by performing evidential tracking of pedestrians on real large-scale scenes in sparse and dense conditions.
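The belief-function machinery in the final contribution builds on Dempster's rule of combination. A toy version over frozenset focal elements illustrates the rule that the compound-hypothesis representation is designed to make tractable (the real discernment frame, a discretised ground plane, is far too large for this naive dictionary form):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions whose focal elements are
    frozensets over a small frame of discernment."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b            # mass falling on the empty set
    norm = 1.0 - conflict                # renormalise the remaining mass
    return {A: v / norm for A, v in combined.items()}
```

In the fusion framework, the two mass functions would come from the supervised per-view detectors and the unsupervised multi-view detector, combined on the ground plane.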
12

Solmaz, Berkan. "Holistic Representations for Activities and Crowd Behaviors." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5870.

Full text
Abstract:
In this dissertation, we address the problem of analyzing the activities of people in a variety of scenarios that are commonly encountered in vision applications. The overarching goal is to devise new representations for the activities, in settings where individuals or a number of people may take part in specific activities. Different types of activities can be performed by either an individual at the fine level or by several people constituting a crowd at the coarse level. We take into account the domain-specific information for modeling these activities. A summary of the proposed solutions follows. The holistic description of videos is appealing for visual detection and classification tasks for several reasons, including capturing the spatial relations between the scene components, simplicity, and performance [1, 2, 3]. First, we present a holistic (global) frequency-spectrum-based descriptor for representing the atomic actions performed by individuals such as: bench pressing, diving, hand waving, boxing, playing guitar, mixing, jumping, horse riding, hula hooping, etc. We model and learn these individual actions for classifying complex user-uploaded videos. Our method bypasses the detection of interest points, the extraction of local video descriptors and the quantization of local descriptors into a code book; it represents each video sequence as a single feature vector. This holistic feature vector is computed by applying a bank of 3-D spatio-temporal filters on the frequency spectrum of a video sequence; hence it integrates the information about the motion and scene structure. We tested our approach on two of the most challenging datasets, UCF50 [4] and HMDB51 [5], and obtained promising results which demonstrate the robustness and the discriminative power of our holistic video descriptor for classifying videos of various realistic actions.
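The descriptor pipeline above (whole-clip frequency spectrum, filtered by a bank of 3-D filters, pooled into one feature vector) can be sketched as follows. This is a schematic illustration only: we stand in random filters for the dissertation's designed spatio-temporal filter bank, and the pooling by summation is our simplification:

```python
import numpy as np

def holistic_descriptor(video, n_filters=4, rng_seed=0):
    """Toy holistic descriptor: apply a bank of 3-D filters to the
    magnitude spectrum of the whole clip, pooling each response into
    one scalar, so the clip becomes a single feature vector."""
    spectrum = np.abs(np.fft.fftn(video))          # 3-D frequency magnitude
    rng = np.random.default_rng(rng_seed)
    filters = rng.random((n_filters,) + video.shape)  # stand-in filter bank
    return np.array([(spectrum * f).sum() for f in filters])
```

Note there is no interest-point detection and no codebook: the clip maps directly to one vector, which is the property the abstract emphasises.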
In the above approach, a holistic feature vector of a video clip is acquired by dividing the video into spatio-temporal blocks then concatenating the features of the individual blocks together. However, such a holistic representation blindly incorporates all the video regions regardless of their contribution in classification. Next, we present an approach which improves the performance of the holistic descriptors for activity recognition. In our novel method, we improve the holistic descriptors by discovering the discriminative video blocks. We measure the discriminativity of a block by examining its response to a pre-learned support vector machine model. In particular, a block is considered discriminative if it responds positively for positive training samples, and negatively for negative training samples. We pose the problem of finding the optimal blocks as a problem of selecting a sparse set of blocks, which maximizes the total classifier discriminativity. Through a detailed set of experiments on benchmark datasets [6, 7, 8, 9, 5, 10], we show that our method discovers the useful regions in the videos and eliminates the ones which are confusing for classification, which results in significant performance improvement over the state-of-the-art. In contrast to the scenes where an individual performs a primitive action, there may be scenes with several people, where crowd behaviors may take place. For these types of scenes the traditional approaches for recognition will not work due to severe occlusion and computational requirements. The number of videos is limited and the scenes are complicated, hence learning these behaviors is not feasible. For this problem, we present a novel approach, based on the optical flow in a video sequence, for identifying five specific and common crowd behaviors in visual scenes. In the algorithm, the scene is overlaid by a grid of particles, initializing a dynamical system which is derived from the optical flow. 
Numerical integration of the optical flow provides particle trajectories that represent the motion in the scene. Linearization of the dynamical system allows a simple and practical analysis and classification of the behavior through the Jacobian matrix. Essentially, the eigenvalues of this matrix are used to determine the dynamic stability of points in the flow and each type of stability corresponds to one of the five crowd behaviors. The identified crowd behaviors are (1) bottlenecks: where many pedestrians/vehicles from various points in the scene are entering through one narrow passage, (2) fountainheads: where many pedestrians/vehicles are emerging from a narrow passage only to separate in many directions, (3) lanes: where many pedestrians/vehicles are moving at the same speeds in the same direction, (4) arches or rings: where the collective motion is curved or circular, and (5) blocking: where there is an opposing motion and the desired movement of groups of pedestrians is somehow prohibited. The implementation requires identifying a region of interest in the scene, and checking the eigenvalues of the Jacobian matrix in that region to determine the type of flow, which corresponds to various well-defined crowd behaviors. The eigenvalues are only considered in these regions of interest, consistent with the linear approximation and the implied behaviors. Since changes in eigenvalues can mean changes in stability, corresponding to changes in behavior, we can repeat the algorithm over clips of long video sequences to locate changes in behavior. This method was tested on real videos representing crowd and traffic scenes.
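The eigenvalue-to-behavior mapping can be made concrete for the 2x2 Jacobian of the linearized flow. The decision rules below are an illustrative reading of the five categories (complex eigenvalues for rotation, stable/unstable nodes for convergence/divergence, saddle for opposing motion), not the dissertation's exact criteria:

```python
import numpy as np

def classify_flow(J):
    """Map the eigenvalues of a 2x2 Jacobian of the linearized flow
    to one of the five crowd behaviors (illustrative rules)."""
    ev = np.linalg.eigvals(J)
    re, im = ev.real, ev.imag
    if np.any(im != 0):
        return "arch/ring"        # rotational motion: complex eigenvalues
    if np.all(re < 0):
        return "bottleneck"       # stable node: trajectories converge
    if np.all(re > 0):
        return "fountainhead"     # unstable node: trajectories diverge
    if np.isclose(re[0], 0) and np.isclose(re[1], 0):
        return "lane"             # neutral: parallel, uniform motion
    return "blocking"             # saddle: opposing motions meet
```

In practice the Jacobian would be estimated from the optical-flow field inside the chosen region of interest.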
Ph.D.
Doctorate
Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering
APA, Harvard, Vancouver, ISO, and other styles
13

Li, Wun-Jie, and 李文杰. "The Study of Anomaly Detection in Crowded Scenes using a Subspace Approach." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/17252558645219766152.

Full text
Abstract:
Master's
National Taiwan University of Science and Technology
Department of Electronic Engineering
105
In this thesis, we propose a new method for abnormality detection based on Principal Component Analysis (PCA), adapting techniques from network-traffic anomaly diagnosis to the detection of image anomalies. First, we detect space-time interest points (STIPs) to obtain points with relatively high response in the video, then gather the information around each point to form a cube. Finally, we segment each frame with horizontal and vertical lines into local windows, dividing the video into cuboids, each of which is used to train a separate model. We describe the cuboids with several spatial and temporal features: histogram of oriented gradients (HOG), histogram of oriented optical flow (HOF), a motion-direction descriptor, and a motion-magnitude descriptor. These provide not only velocity and directionality information but also physical characteristics of the cuboids. To determine the normal (model) subspace and the residual subspace, we apply PCA and treat each feature vector as a data point, calculating the distance between a data point and the model and judging whether the current data point is abnormal by comparing this distance with a normality threshold. Because only a few principal components are needed for detection, we are able to reduce the dimensionality of the features. We also compared our proposed method against others on published datasets and verified its validity, reliability, and accuracy through simulation experiments.
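The subspace test described above can be sketched with a minimal PCA model. The function names, the use of the squared residual energy as the distance, and the threshold are illustrative assumptions rather than the thesis's exact formulation:

```python
import numpy as np

def fit_pca_model(X, k):
    """Fit a PCA normal-subspace model on rows of X (n_samples x n_features)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:k].T          # top-k principal directions span the normal subspace
    return mu, P

def residual_energy(x, mu, P):
    """Squared prediction error: energy of x outside the normal subspace."""
    d = x - mu
    r = d - P @ (P.T @ d)
    return float(r @ r)

def is_abnormal(x, mu, P, threshold):
    """A data point is flagged abnormal when its residual exceeds the threshold."""
    return residual_energy(x, mu, P) > threshold
```

A feature vector consistent with the training data has near-zero residual energy, while a vector with a large component orthogonal to the normal subspace is flagged as abnormal.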
APA, Harvard, Vancouver, ISO, and other styles
14

陳建榮. "Human counting and tracking in crowded scene." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/32103214266610394328.

Full text
Abstract:
Master's
National Chiao Tung University
Institute of Computer Science and Engineering
94
Manually monitoring and controlling crowded situations is not only tedious but also error-prone. Automatic head counting and tracking can save manpower and reduce the chance of human negligence. Because of the occlusion between people, it is difficult to count people from a frontal view. In order to reduce the occlusion effect, we use an overhead view of people to develop a human counting and tracking method for crowded scenes. Based on the radiation of the grey-level gradient direction along the human head contour, the method detects human head positions in an image by clustering, and tracks multiple people by color and trajectory analysis over the detection results in the image sequence. The experimental results show that, whether under sparse or crowded conditions, our method achieves a correct detection rate above 80%, indicating that it is not affected by occlusion and can be used in crowded scenes. We implemented our method as an automatic surveillance system and applied it in a real-world setting.
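The gradient-direction idea can be sketched as a radial vote: each strong-gradient pixel votes at a fixed head radius along its gradient line, so votes from a roughly circular head contour pile up at the head centre. This is a generic approximation of the detection described above; the function name, the dual-direction vote, and all parameters are our own assumptions:

```python
import numpy as np

def head_vote_map(gray, radius, grad_thresh):
    """Accumulate votes along grey-level gradient directions at a fixed
    head radius; peaks of the map indicate candidate head centres."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    votes = np.zeros_like(mag)
    h, w = mag.shape
    ys, xs = np.nonzero(mag > grad_thresh)
    for y, x in zip(ys, xs):
        uy, ux = gy[y, x] / mag[y, x], gx[y, x] / mag[y, x]
        # Gradient polarity depends on head/background contrast,
        # so vote in both directions along the gradient line.
        for s in (radius, -radius):
            cy, cx = int(round(y + s * uy)), int(round(x + s * ux))
            if 0 <= cy < h and 0 <= cx < w:
                votes[cy, cx] += 1.0
    return votes
```

Clustering the local maxima of the vote map (rather than a single argmax) would then give one detection per head, as in the thesis.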
APA, Harvard, Vancouver, ISO, and other styles
15

"Scene-Independent Crowd Understanding and Crowd Behavior Analysis." 2016. http://repository.lib.cuhk.edu.hk/en/item/cuhk-1292523.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Wang, Jian-Cheng, and 王建程. "Multi-Mode Target Tracking on a Crowd Scene." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/86779133720054828783.

Full text
Abstract:
Master's
Chung Hua University
Master's Program, Department of Computer Science and Information Engineering
95
With the great demand for constructing a safe and secure environment, video surveillance is becoming more and more important. Conventional video surveillance systems often have several shortcomings. First, target detection cannot be accurate under varying lighting or against a cluttered background; in particular, light reflections and back-lighting can seriously hinder target detection. Second, tracking multiple targets becomes difficult in a crowded scene because splits, merges, and occlusions among the tracked targets occur frequently and irregularly. Third, it is difficult to partition the tracked targets from a merged image blob, so the target tracking may fail. Finally, tracking efficiency and precision are reduced by inaccurate foreground detection. In this study, a spatio-temporal probability background model, a multi-mode tracking scheme, color-based difference projection, and ground-point detection are proposed to address the abovementioned problems. In addition, instead of top-down target tracking, bottom-up target tracking is adopted for the crowded scene. Experimental results show that targets in the crowded scene can be tracked with the correct tracking modes at a tracking rate above 15 fps.
APA, Harvard, Vancouver, ISO, and other styles
17

WENG, WEI-TENG, and 翁偉騰. "Cross-Scenes and Multi-View Crowd Density Evaluation and Counting Based on Multi-Column Convolutional Neural Network." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/7m2bf2.

Full text
Abstract:
Master's
National Taipei University
Department of Computer Science and Information Engineering
105
The global issue of crowd disaster management has been a serious concern for many years. Many accidents occur due to unexpected crowd squeezes. With limited space available during famous shows, protests, or religious occasions, a high crowd density can result in an unexpected tragedy in the event of an accident or sometimes even a rumor, which can unfold within a span of a few seconds. In such a case, the force exerted by the rear of the crowd causes the front of the crowd to feel extremely suffocated. There is, therefore, a stringent need for organizers of large-scale activities to avoid these accidents by controlling the number of people who assemble in a given region of the crowd. To achieve the goals of crowd control and public safety, accurately estimating crowd counts and computing crowd density from monitoring images or videos has become a popular research topic and a hard challenge for computer vision researchers. The error rate of crowd counting is about 10% with traditional image-processing methods, but this is not enough for practical applications; in fact, large accidents often arise within these seemingly small error margins, and hence a 10% error rate is indeed alarming. We therefore need more accurate technology to decrease the error and thereby decrease the probability of accidents. Deep learning is a field of machine learning whose highly complicated, deep hierarchical structures are loosely inspired by the human brain; this motivates establishing a neural network that can simulate such processing and using it for analysis and learning. Considering the challenges mentioned above, in this work we employ a three-tier convolutional neural network based on the Multi-Column Convolutional Neural Network (MCNN) architecture to precisely estimate crowd density. We distinguish three regions, from the far field to the near field, to produce a crowd density map. 
Based on the MCNN architecture, we can account for changes in the apparent size of people in the crowd according to distance. We examined the possibility of incorporating additional features and show their impact on precisely estimating the crowd density map. In our tests, we found promising results on the ShanghaiTech dataset: compared to the native MCNN, the accuracy of crowd counting using our proposed method increases by 18%.
APA, Harvard, Vancouver, ISO, and other styles
18

Moria, Kawther. "Computer vision-based detection of fire and violent actions performed by individuals in videos acquired with handheld devices." Thesis, 2016. http://hdl.handle.net/1828/7423.

Full text
Abstract:
Advances in social networks and multimedia technologies greatly facilitate the recording and sharing of video data on violent social and/or political events via the Internet. These video data are a rich source of information in terms of identifying the individuals responsible for damaging public and private property through violent behavior. Any abnormal, violent individual behavior could trigger a cascade of undesirable events, such as vandalism and damage to stores and public facilities. When such incidents occur, investigators usually need to analyze thousands of hours of videos recorded using handheld devices in order to identify suspects. The exhaustive manual investigation of these video data is highly time- and resource-consuming. Automated detection techniques for abnormal events and actions based on computer vision would offer a more efficient solution to this problem. The first contribution described in this thesis consists of a novel method for fire detection in riot videos acquired with handheld cameras and smart-phones. This is a typical example of computer vision in the wild, where we have no control over the data acquisition process, and where the quality of the video data varies considerably. The proposed spatial model is based on the Mixtures of Gaussians model and exploits color adjacency in the visible spectrum of incandescence. The experimental results demonstrate that using this spatial model in concert with motion cues leads to highly accurate results for fire detection in noisy, complex scenes of rioting crowds. The second contribution consists of a method for detecting abnormal, violent actions that are performed by individual subjects and witnessed by passive crowds. The problem of abnormal individual behavior, such as a fight, witnessed by passive bystanders gathered into a crowd has not been studied before. We show that the presence of a passive, standing crowd is an important indicator that an abnormal action might occur. 
Thus, detecting the standing crowd improves the performance of detecting the abnormal action. The proposed method performs crowd detection first, followed by the detection of abnormal motion events. Our main theoretical contribution consists in linking crowd detection to abnormal, violent actions, as well as in defining novel sets of features that characterize static crowds and abnormal individual actions in both the spatial and spatio-temporal domains. Experimental results are computed on a custom dataset, the Vancouver Riot Dataset, which we generated using amateur video footage acquired with handheld devices and uploaded to public social network sites. Our approach achieves good precision and recall values, which validates our system's reliability in localizing the crowds and the abnormal actions. To summarize, this thesis focuses on the detection of two types of abnormal events occurring in violent street movements, with data gathered by passive participants in these movements using handheld devices. Although our datasets are drawn from one single social movement (the Vancouver 2011 Stanley Cup riot), we are confident that our approaches would generalize well and would be helpful in forensic activities performed in the context of other similar violent occasions.
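The color-model side of such a fire detector can be sketched as a per-pixel likelihood under a Gaussian mixture, combined with a motion cue. The mixture parameters, thresholds, and function names here are illustrative assumptions, not the thesis's fitted model, which additionally exploits color adjacency:

```python
import numpy as np

def gmm_fire_likelihood(pixels, means, covs, weights):
    """Likelihood of RGB pixels under a Gaussian mixture fitted to fire colors.
    pixels: (n, 3); means: (k, 3); covs: (k, 3, 3); weights: (k,)."""
    lik = np.zeros(pixels.shape[0])
    for w, mu, S in zip(weights, means, covs):
        d = pixels - mu
        Sinv = np.linalg.inv(S)
        norm = 1.0 / np.sqrt(((2 * np.pi) ** 3) * np.linalg.det(S))
        lik += w * norm * np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, Sinv, d))
    return lik

def fire_mask(frame, means, covs, weights, motion, color_thresh, motion_thresh):
    """Flag pixels that both match the fire-color model and exhibit motion."""
    h, w, _ = frame.shape
    lik = gmm_fire_likelihood(frame.reshape(-1, 3).astype(float),
                              means, covs, weights)
    return (lik.reshape(h, w) > color_thresh) & (motion > motion_thresh)
```

Requiring both cues suppresses static fire-colored objects (e.g. orange signage) and moving non-fire-colored regions, which is the intuition behind combining the spatial model with motion.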
Graduate
APA, Harvard, Vancouver, ISO, and other styles

To the bibliography