Dissertations / Theses on the topic 'Audio data'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Audio data.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Lundberg, Anton. "Data-Driven Procedural Audio : Procedural Engine Sounds Using Neural Audio Synthesis." Thesis, KTH, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280132.

Full text
Abstract:
The currently dominant approach for rendering audio content in interactive media, such as video games and virtual reality, involves playback of static audio files. This approach is inflexible and requires management of large quantities of audio data. An alternative approach is procedural audio, where sound models are used to generate audio in real time from live inputs. While providing many advantages, procedural audio has yet to find widespread use in commercial productions, partly because the audio produced by many of the proposed models does not meet industry standards. This thesis investigates how procedural audio can be performed using data-driven methods. We do this by specifically investigating how to generate the sound of car engines using neural audio synthesis. Building on a recently published method that integrates digital signal processing with deep learning, called Differentiable Digital Signal Processing (DDSP), our method obtains sound models by training deep neural networks to reconstruct recorded audio examples from interpretable latent features. We propose a method for incorporating engine cycle phase information, as well as a differentiable transient synthesizer. Our results illustrate that DDSP can be used for procedural engine sounds; however, further work is needed before our models can generate engine sounds without undesired artifacts and before they can be used in live real-time applications. We argue that our approach can be useful for procedural audio in more general contexts, and discuss how our method can be applied to other sound sources.
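The thesis builds on DDSP, in which a harmonic-plus-noise synthesizer is driven by frame-level controls that a neural network learns to predict. As a rough illustration of that synthesis layer only, here is a minimal NumPy sketch with hand-set controls standing in for the network outputs; the 4-cylinder firing-frequency formula, hop size and amplitude values are assumptions made for the example, not details taken from the thesis.

```python
import numpy as np

def harmonic_plus_noise(f0, harm_amps, noise_gain, sr=44100, hop=256):
    """Render audio from frame-level controls, DDSP-style.

    f0         : (frames,) fundamental frequency in Hz
    harm_amps  : (frames, n_harmonics) per-harmonic amplitudes
    noise_gain : (frames,) gain of the noise component
    """
    n = len(f0) * hop
    t_frames = np.arange(len(f0)) * hop
    t_samples = np.arange(n)
    # upsample frame-level controls to the sample rate
    f0_s = np.interp(t_samples, t_frames, f0)
    phase = 2 * np.pi * np.cumsum(f0_s) / sr          # running phase of the fundamental
    audio = np.zeros(n)
    for k in range(harm_amps.shape[1]):               # additive harmonic synthesis
        amp_k = np.interp(t_samples, t_frames, harm_amps[:, k])
        audio += amp_k * np.sin((k + 1) * phase)
    noise = np.interp(t_samples, t_frames, noise_gain) * np.random.randn(n)
    return audio + noise

# toy engine controls: RPM ramp 900 -> 3000, assumed 4-cylinder 4-stroke firing frequency
frames = 400
rpm = np.linspace(900, 3000, frames)
f0 = rpm / 60.0 * (4 / 2)                             # firing frequency in Hz (assumption)
harm_amps = 0.3 / np.arange(1, 11)[None, :] * np.ones((frames, 10))
noise_gain = np.full(frames, 0.02)
audio = harmonic_plus_noise(f0, harm_amps, noise_gain)
```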
APA, Harvard, Vancouver, ISO, and other styles
2

Rydman, Oskar. "Data processing of Controlled Source Audio Magnetotelluric (CSAMT) Data." Thesis, Uppsala universitet, Geofysik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-387246.

Full text
Abstract:
During this project three distinct methods to improve the data processing of Controlled Source Audio Magnetotellurics (CSAMT) data are implemented and their advantages and disadvantages are discussed. The methods in question are: (1) detrending the time series in the time domain, instead of detrending in the frequency domain; (2) implementing a coherency test to pinpoint data segments of low quality and remove these data from the calculations; and (3) implementing a method to detect and remove transients from the time series to reduce background noise in the frequency spectra. Both the detrending in the time domain and the transient removal show potential for improving data quality, even if the improvements are small (both in the 1-10% range). Due to technical limitations no coherency test was implemented. Overall, the processes discussed in the report did improve the data quality and may serve as groundwork for further improvements to come.
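Two of the three processing steps named in the abstract, time-domain detrending and transient removal, can be illustrated with a short sketch. The rolling-median despiking scheme below is only one plausible reading of "detect and remove transients"; the window length, threshold factor and synthetic test signal are assumptions.

```python
import numpy as np
from scipy.signal import detrend

def remove_transients(x, window=501, k=6.0):
    """Flag samples that deviate strongly from a local median and
    replace them by linear interpolation (a simple despiking scheme)."""
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")
    med = np.array([np.median(xp[i:i + window]) for i in range(len(x))])
    resid = x - med
    mad = np.median(np.abs(resid)) + 1e-12            # robust scale estimate
    bad = np.abs(resid) > k * 1.4826 * mad
    good = ~bad
    return np.interp(np.arange(len(x)), np.flatnonzero(good), x[good]), bad

# synthetic CSAMT-like record: slow drift + harmonic signal + a few spikes
fs = 1000
t = np.arange(20 * fs) / fs
x = 0.002 * t + np.sin(2 * np.pi * 14 * t)
x[[3000, 9000, 15000]] += 25.0                        # artificial transients

x_dt = detrend(x, type="linear")                      # time-domain detrending
x_clean, spikes = remove_transients(x_dt)
print("flagged samples:", spikes.sum())
```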
APA, Harvard, Vancouver, ISO, and other styles
3

Levy, Marcel Andrew. "Ringermute: an audio data mining toolkit." abstract and full text PDF (free order & download UNR users only), 2005. http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1433402.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Larsen, Vegard Andreas. "Combining Audio Fingerprints." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8869.

Full text
Abstract:

Large music collections are now more common than ever before. Yet, search technology for music is still in its infancy. Audio fingerprinting is one method that allows searching for music. In this thesis, several audio fingerprinting solutions are combined into a single solution to determine whether such a combination can yield better results than any of the solutions can separately. The combined solution is used to find duplicate music files in a personal collection. The results show that applying the weighted root-mean-square (WRMS) to the problem ranked the results most effectively, notably better than the other approaches tried. The WRMS produced 61% more correct matches than the original FDMF solution, and 49% more correct matches than libFooID.
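The abstract does not spell out how the weighted root-mean-square combines the individual fingerprinting solutions, so the sketch below assumes a per-backend similarity score in [0, 1] and a weighted RMS of those scores; the backend names, weights and scores are hypothetical.

```python
import numpy as np

def wrms_combine(scores, weights):
    """Combine per-fingerprinter similarity scores for one candidate file.

    scores  : dict backend -> similarity in [0, 1]
    weights : dict backend -> relative trust in that backend
    """
    s = np.array([scores[b] for b in scores])
    w = np.array([weights[b] for b in scores])
    return np.sqrt(np.sum(w * s**2) / np.sum(w))

# hypothetical scores from two fingerprinting backends for three candidates
weights = {"fdmf": 1.0, "libfooid": 2.0}
candidates = {
    "track_a.mp3": {"fdmf": 0.91, "libfooid": 0.88},
    "track_b.mp3": {"fdmf": 0.40, "libfooid": 0.95},
    "track_c.mp3": {"fdmf": 0.35, "libfooid": 0.30},
}
ranked = sorted(candidates, key=lambda c: wrms_combine(candidates[c], weights), reverse=True)
print(ranked)
```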

APA, Harvard, Vancouver, ISO, and other styles
5

Morimoto, Norishige. "Techniques for data hiding in audio files." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/11422.

Full text
Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.
Includes bibliographical references (leaves 75-76).
by Norishige Morimoto.
M.S.
APA, Harvard, Vancouver, ISO, and other styles
6

Spina, Michelle S. (Michelle Suzanne). "Analysis and transcription of general audio data." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/86479.

Full text
Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.
Includes bibliographical references (p. 141-147).
by Michelle S. Spina.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
7

Gartenlaub, Arie Gal. "Hi fi digital audio tape to SUN workstation transfer system for digital audio data." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1994. http://handle.dtic.mil/100.2/ADA282550.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Shelley, Michael. "Bay audio repair website & data management application." Click here to view, 2010. http://digitalcommons.calpoly.edu/cscsp/5/.

Full text
Abstract:
Thesis (B.S.)--California Polytechnic State University, 2010.
Project advisor: Franz Kurfess. Title from PDF title page; viewed on Apr. 19, 2010. Includes bibliographical references. Also available on microfiche.
APA, Harvard, Vancouver, ISO, and other styles
9

Lu, Xinyou. "Inversion of controlled-source audio-frequency magnetotelluric data /." Thesis, Connect to this title online; UW restricted, 1999. http://hdl.handle.net/1773/6799.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Lee, Jong Seo. "RECOMMENDER SYSTEM FOR AUDIO RECORDINGS." DigitalCommons@CalPoly, 2010. https://digitalcommons.calpoly.edu/theses/238.

Full text
Abstract:
Nowadays the largest e-commerce or e-service websites offer millions of products for sale. A recommender system is software used by such websites to recommend commercial or noncommercial items to users according to the users' tastes. In this project, we develop a recommender system for a private multimedia web service company. In particular, we devise three recommendation engines using different data filtering methods (weighted-average, K-nearest neighbors, and item-based), all based on collaborative filtering techniques, which work by recording user preferences on items and anticipating users' future likes and dislikes by comparing the records. To acquire proper input data for the three engines, we retrieve data from a database using three data collection techniques: active filtering, passive filtering, and item-based filtering. For experimental purposes we compare the prediction accuracy of the three recommendation engines, and we additionally evaluate the performance of the weighted-average method using an empirical analysis approach, a methodology devised for verification of predictive accuracy.
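As a sketch of the first of the three engines, the weighted-average collaborative filtering predictor below estimates a missing rating as a similarity-weighted average of other users' ratings. Cosine similarity and the toy rating matrix are assumptions, since the project's exact formulation is not given here.

```python
import numpy as np

def predict_weighted_average(R, user, item):
    """Predict R[user, item] as a similarity-weighted average of other
    users' ratings of that item (0 marks a missing rating)."""
    mask = R[:, item] > 0
    mask[user] = False
    if not mask.any():
        return 0.0
    # cosine similarity between the target user and every rater of the item
    u = R[user]
    sims = R[mask] @ u / (np.linalg.norm(R[mask], axis=1) * np.linalg.norm(u) + 1e-12)
    ratings = R[mask, item]
    return float(np.sum(sims * ratings) / (np.sum(np.abs(sims)) + 1e-12))

# toy user-item rating matrix (rows: users, columns: audio recordings)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)
print(predict_weighted_average(R, user=1, item=1))
```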
APA, Harvard, Vancouver, ISO, and other styles
11

Abefelt, Fredrik. "Synchronized audio playback over WIFI and Ethernet : A proof of concept multi-room audio playback system." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187345.

Full text
Abstract:
This thesis aims to develop an audio playback system that can perform synchronized audio playback on multiple devices. Two different approaches for developing the system have been investigated: one using an existing off-the-shelf product, and the other using an open-source framework. The system developed is a proof of concept that can perform synchronized playback on five devices connected by Wi-Fi or Ethernet, and it can use Bluetooth devices or common media players as its sound source.
APA, Harvard, Vancouver, ISO, and other styles
12

Bosch, Vicente Juan José. "From heuristics-based to data-driven audio melody extraction." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404678.

Full text
Abstract:
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications.
APA, Harvard, Vancouver, ISO, and other styles
13

Wang, Shuai. "Embedding data in an audio signal, using acoustic OFDM." Thesis, Linköpings universitet, Kommunikationssystem, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-71427.

Full text
Abstract:
OFDM technology has been extensively used in many radio communication technologies; for example, OFDM is the core technology applied in WiFi, WiMAX and LTE. Its main advantages include high bandwidth utilization, strong noise immunity and the capability to resist frequency-selective fading. However, OFDM is not only applied in the field of radio communication, but has also been developed greatly in acoustic communication, namely the so-called acoustic OFDM. Thanks to acoustic OFDM technology, information can be embedded in audio and then transmitted, so that the receiver can obtain the required information through certain demodulation mechanisms without severely affecting the audio quality.

This thesis mainly discusses how to embed and transmit information in audio by making use of acoustic OFDM. Based on the theoretical system structure, it also designs a simulation system and a measurement system. In these two systems, channel coding, modulation and demodulation schemes, timing synchronization and the parameters of the functional components are configured in the most reasonable way in order to achieve relatively strong stability and robustness. Moreover, power control and the compatibility between audio and OFDM signals are also explained and analyzed in this thesis.

Based on the experimental results, the author analyzes the performance of the system and the factors that affect it, such as the type of audio, the distance between transmitter and receiver, the audio output level and so on. According to this analysis, the simulation system can work steadily on any audio in WAV format and transmit information correctly. However, due to hardware limitations of the receiver and sender devices, the measurement system is unstable to a certain degree. Finally, the thesis draws conclusions from the research results and points out unsolved problems in the experiments; some expectations for this research direction are stated and relevant suggestions are proposed.
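A minimal sketch of the modulation side of acoustic OFDM, assuming QPSK subcarriers placed in an audio band starting at 6 kHz at a 44.1 kHz sample rate, with a cyclic prefix; the thesis's actual channel coding, synchronization and power-control choices are not reproduced here.

```python
import numpy as np

def ofdm_symbol(bits, fs=44100, n_fft=4096, f_low=6000, cp_len=512):
    """Map bits to QPSK subcarriers in an audio band and return one real
    OFDM symbol with a cyclic prefix."""
    n_sub = len(bits) // 2
    b = bits.reshape(n_sub, 2)
    qpsk = ((2 * b[:, 0] - 1) + 1j * (2 * b[:, 1] - 1)) / np.sqrt(2)
    spec = np.zeros(n_fft // 2 + 1, dtype=complex)
    k0 = int(f_low * n_fft / fs)                       # first occupied FFT bin
    spec[k0:k0 + n_sub] = qpsk
    sym = np.fft.irfft(spec, n=n_fft)                  # real time-domain symbol
    sym /= np.max(np.abs(sym)) + 1e-12
    return np.concatenate([sym[-cp_len:], sym])        # prepend cyclic prefix

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=256)
tx = 0.05 * ofdm_symbol(bits)                          # keep it quiet before mixing with audio
print(tx.shape)
```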
APA, Harvard, Vancouver, ISO, and other styles
14

Marques, Janet 1976. "An automatic annotation system for audio data containing music." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80547.

Full text
Abstract:
Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.
Includes bibliographical references (leaves 51-53).
by Janet Marques.
S.B.and M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
15

Kohlsdorf, Daniel. "Data mining in large audio collections of dolphin signals." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53968.

Full text
Abstract:
The study of dolphin cognition involves intensive research of animal vocalizations recorded in the field. In this dissertation I address the automated analysis of audible dolphin communication. I propose a system called the signal imager that automatically discovers patterns in dolphin signals. These patterns are invariant to frequency shifts and time warping transformations. The discovery algorithm is based on feature learning and unsupervised time series segmentation using hidden Markov models. Researchers can inspect the patterns visually and interactively run comparative statistics between the distribution of dolphin signals in different behavioral contexts. The required statistics for the comparison describe dolphin communication as a combination of the following models: a bag-of-words model, an n-gram model and an algorithm to learn a set of regular expressions. Furthermore, the system can use the patterns to automatically tag dolphin signals with behavior annotations. My results indicate that the signal imager provides meaningful patterns to the marine biologist and that the comparative statistics are aligned with the biologists’ domain knowledge.
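The bag-of-words and n-gram statistics mentioned in the abstract operate on sequences of discovered pattern labels. A toy sketch, with hypothetical label sequences standing in for two behavioural contexts:

```python
from collections import Counter
from itertools import islice

def ngrams(labels, n=2):
    """Count n-grams in a sequence of discrete pattern labels."""
    return Counter(zip(*(islice(labels, i, None) for i in range(n))))

# hypothetical pattern-label sequences from two behavioural contexts
foraging = ["A", "B", "A", "C", "A", "B", "B", "A"]
play     = ["C", "C", "A", "C", "B", "C", "C", "A"]

bow_foraging, bow_play = Counter(foraging), Counter(play)   # bag-of-words models
bi_foraging, bi_play = ngrams(foraging), ngrams(play)       # bigram models

# simple contrast: which bigrams are over-represented in one context
diff = {g: bi_foraging[g] - bi_play[g] for g in set(bi_foraging) | set(bi_play)}
print(sorted(diff.items(), key=lambda kv: kv[1], reverse=True)[:3])
```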
APA, Harvard, Vancouver, ISO, and other styles
16

Fridlund, Julia. "Processing of Noisy Controlled Source Audio Magnetotelluric (CSAMT) Data." Thesis, Uppsala universitet, Geofysik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-396255.

Full text
Abstract:
Controlled Source Audio Magnetotellurics (CSAMT) is a geophysical method for characterizing the resistivity of the subsurface with the help of electromagnetic waves. The method is used for various purposes, such as geothermal and hydrocarbon exploration, mineral prospecting and investigation of groundwater resources. Electromagnetic fields are created by running an alternating current in a grounded electric dipole, and by varying the frequency, different depths can be targeted. Orthogonal components of the electromagnetic fields are measured at receiver stations a few kilometers away from the source. From these field components, so-called magnetotelluric transfer functions are estimated, which can be used to invert for the resistivity of the subsurface. The data used in this project are from a survey conducted in 2014 and 2016 in Kiruna by Uppsala University and the mining company LKAB. Measurements were made at 31 stations along two orthogonal profiles. The data have been processed earlier, but due to noise, especially in the lower frequencies, a significant part of the data set could not be inverted. The aim of this project was to improve the results by analyzing the data and testing different methods to remove noise. First, robust regression was used to account for possible non-Gaussian noise in the estimation of the magnetotelluric transfer functions. Except for one station on profile 1, the robust method did not improve the results, which suggests that the noise is mostly Gaussian. Then modified versions of least squares, each affected by a different bias, were used to estimate the transfer functions. Where there is more noise, the estimates should differ more due to their different biases. The estimates differed most for low frequencies, especially on the part of profile 2 that was measured in 2014. It was investigated whether the railway network could explain part of the low-frequency noise. Measures were taken to reduce spectral leakage from the railway signal at 16 ⅔ Hz to the closest transmitter frequencies, 14 Hz and 20 Hz, but no clear improvement was seen and more detailed studies need to be conducted to settle this matter. Finally, a method based on comparing the ratio of short-term and long-term averages was tested to remove transients in the measured time series of the electromagnetic field components. This proved difficult to implement due to the variability of the time series’ behavior between different stations, frequencies and field components. However, the method showed some potential for stations 9 and 10 on profile 1, and could probably be developed further to remove transients more efficiently and thus improve the data.
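The transient-detection idea described at the end of the abstract, comparing short-term and long-term averages, is essentially an STA/LTA trigger. A minimal sketch on a synthetic time series, with window lengths and threshold chosen only for illustration:

```python
import numpy as np

def sta_lta(x, fs, sta_win=0.05, lta_win=2.0):
    """Ratio of short-term to long-term average power, a classic trigger
    for spotting transients in a time series."""
    p = x**2
    sta = np.convolve(p, np.ones(int(sta_win * fs)) / (sta_win * fs), mode="same")
    lta = np.convolve(p, np.ones(int(lta_win * fs)) / (lta_win * fs), mode="same")
    return sta / (lta + 1e-12)

fs = 1000
t = np.arange(30 * fs) / fs
x = np.sin(2 * np.pi * 20 * t) + 0.1 * np.random.randn(len(t))
x[12000:12050] += 8.0                                  # synthetic transient (e.g. a sferic)

ratio = sta_lta(x, fs)
transient_samples = np.flatnonzero(ratio > 5.0)        # threshold is an assumption
print(transient_samples[:5], transient_samples.size)
```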
APA, Harvard, Vancouver, ISO, and other styles
17

Chin, Craig. "Multilevel data compression techniques for transmission of audio over networks." FIU Digital Commons, 2001. http://digitalcommons.fiu.edu/etd/2336.

Full text
Abstract:
With the spread of the Internet into mainstream society, there has come a demand for the efficient transmission of multimedia information. Accompanying the drive to find more efficient ways of utilizing limited transmission bandwidth is a need to find novel ways of compressing data. This thesis proposed the utilization of transform coding compression techniques for the transmission of audio data across networks. The Discrete Cosine Transform (DCT) and the Discrete Sine Transform (DST) were the primary transforms utilized. The thesis investigated the viability of utilizing individual transforms, as well as nested modifications of these transforms, for compression purposes. These techniques were compared to those already in existence. Viability was determined using objective compression measures. It was found that transform coding techniques provided a useful alternative to existing techniques. A voice-over-IP (VoIP) application that utilized one of the transform coding techniques was implemented.
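A minimal sketch of transform coding with the DCT, one of the two transforms named in the abstract: each block is transformed, only the largest coefficients are kept, and the block is inverse-transformed. The block size and the number of retained coefficients are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_code(x, block=512, keep=64):
    """Block DCT-II coder: keep only the `keep` largest-magnitude
    coefficients per block and zero the rest."""
    n_blocks = len(x) // block
    y = np.zeros(n_blocks * block)
    for b in range(n_blocks):
        frame = x[b * block:(b + 1) * block]
        c = dct(frame, norm="ortho")
        drop = np.argsort(np.abs(c))[:-keep]           # indices of discarded coefficients
        c[drop] = 0.0
        y[b * block:(b + 1) * block] = idct(c, norm="ortho")
    return y

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
x_hat = dct_code(x)
snr = 10 * np.log10(np.sum(x[:len(x_hat)]**2) / np.sum((x[:len(x_hat)] - x_hat)**2))
print(f"kept 64/512 coefficients, SNR = {snr:.1f} dB")
```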
APA, Harvard, Vancouver, ISO, and other styles
18

Navalekar, Abhijit C. "Design of a high data rate audio band OFDM modem." Worcester, Mass. : Worcester Polytechnic Institute, 2006. http://www.wpi.edu/Pubs/ETD/Available/etd-041806-174713/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Kwon, Patrick (Patrick Ryan) 1975. "Speaker spotting : automatic annotation of audio data with speaker identity." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/47608.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Fenet, Sébastien. "Empreintes audio et stratégies d'indexation associées pour l'identification audio à grande échelle." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0051/document.

Full text
Abstract:
In this work we give a precise definition of large-scale audio identification. In particular, we make a distinction between exact and approximate matching. In the first case, the goal is to match two signals coming from the same recording with different post-processing. In the second case, the goal is to match two signals that are musically similar. In light of these definitions, we conceive and evaluate different audio-fingerprint models.
APA, Harvard, Vancouver, ISO, and other styles
21

Hemgren, Dan. "Fuzzy Content-Based Audio Retrieval Using Visualization Tools." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264514.

Full text
Abstract:
Music composition and sound design in the digital domain often involve sifting through large collections of audio files to find the right sample. Traditionally, this involves searching through metadata such as filenames and descriptors, either via text search or by manually searching through folders. This paper presents a fast, scalable method for implementing a search engine in which the contents of audio files are used as queries to retrieve similar audio files. The presented approach applies visualization tools to speed up retrieval time compared to a simple KD-tree algorithm. Qualitative and quantitative results are presented, and benefits and drawbacks of the approach are discussed. While the qualitative results show promise, they are deemed inconclusive. The quantitative results show that the application of UMAP yields an order-of-magnitude speed-up at some loss of accuracy, and that the approach scales well with larger datasets.
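A sketch of the retrieval pipeline the abstract describes: audio descriptors are reduced with UMAP (the third-party umap-learn package) and similar files are found with a KD-tree. The feature matrix is random stand-in data and the dimensionalities are assumptions.

```python
import numpy as np
import umap                                   # pip install umap-learn
from sklearn.neighbors import KDTree

rng = np.random.default_rng(1)
features = rng.normal(size=(5000, 40))        # stand-in for per-file audio descriptors

reducer = umap.UMAP(n_components=8, random_state=1).fit(features)
embedded = reducer.embedding_                 # low-dimensional index space
tree = KDTree(embedded)

query = rng.normal(size=(1, 40))              # descriptors of the query audio file
q_emb = reducer.transform(query)
dist, idx = tree.query(q_emb, k=5)            # five most similar files
print(idx)
```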
APA, Harvard, Vancouver, ISO, and other styles
22

Hansen, Vedal Amund. "Unsupervised Audio Spectrogram Compression using Vector Quantized Autoencoders." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264947.

Full text
Abstract:
Despite the recent successes of neural networks in a variety of domains, musical audio modeling is still considered a hard task, with features typically spanning tens of thousands of dimensions in input space. By formulating audio data compression as an unsupervised learning task, this project investigates the applicability of vector quantized neural network autoencoders for compressing spectrograms, image-like representations of audio. Using a recently proposed gradient-based method for approximating waveforms from reconstructed (real-valued) spectrograms, the discrete pipeline produces listenable reconstructions of surprising fidelity compared to uncompressed versions, even for out-of-domain examples. The results suggest that the learned discrete quantization method achieves roughly 9x higher spectrogram compression than its continuous counterpart, while achieving similar reconstructions, both qualitatively and in terms of quantitative error metrics.
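The thesis uses a vector-quantized autoencoder; as a much simpler stand-in for the quantization idea alone, the sketch below learns a k-means codebook over spectrogram frames and stores each frame as a single codebook index. The codebook size and the random stand-in spectrogram are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
spec = np.abs(rng.normal(size=(2000, 128)))   # stand-in for magnitude spectrogram frames

# learn a small codebook over frames; each frame is then stored as one integer index
codebook_size = 256
km = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(spec)
codes = km.predict(spec)                      # compressed representation: (2000,) integers
recon = km.cluster_centers_[codes]            # decoded spectrogram frames

mse = np.mean((spec - recon) ** 2)
bits_per_frame = np.log2(codebook_size)
print(f"{bits_per_frame:.0f} bits/frame, reconstruction MSE = {mse:.4f}")
```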
APA, Harvard, Vancouver, ISO, and other styles
23

Olofsson, Oskar. "Detecting Unsynchronized Audio and Subtitles using Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-261414.

Full text
Abstract:
Unsynchronized audio and subtitle files are common within streaming media. As subtitles are often an essential part of the viewing experience, this can have large consequences, possibly making the content inaccessible. Detecting the unsynchronization manually is a time-consuming task, as entire media files have to be viewed and evaluated by a person. In this thesis, an investigation of how to detect unsynchronized audio and subtitles automatically using machine learning is performed. The process is divided into two parts. The first part consists of training the models Support Vector Machine, Random Forest and Multilayer Perceptron to classify whether subtitles should be present, given features extracted from audio. As part of this process the algorithms are compared and evaluated based on their accuracy and time-efficiency. The second part uses the best model to detect unsynchronization. This is done through a similarity measurement between the predicted subtitle distribution and the distribution of the actual subtitles. If a better similarity can be found by shifting the subtitles, the files are classified as unsynchronized. The project shows that Random Forest has the highest accuracy and is thus best suited for the purpose. Of ten file pairs tested for unsynchronization, the method successfully categorized nine. The conclusion is that the approach works, and future work includes increasing the accuracy by testing other algorithms and audio feature extraction techniques.
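The second stage, shifting the subtitles to look for a better match with the predicted speech activity, can be sketched as a brute-force lag search over two binary sequences; the window counts, noise level and agreement measure below are assumptions, not the thesis's exact similarity metric.

```python
import numpy as np

def best_offset(pred_speech, sub_present, max_shift=150):
    """Find the subtitle shift (in windows) that maximises agreement with
    the predicted speech-activity sequence."""
    best, best_shift = -1.0, 0
    for s in range(-max_shift, max_shift + 1):
        agreement = np.mean(np.roll(sub_present, s) == pred_speech)
        if agreement > best:
            best, best_shift = agreement, s
    return best_shift, best

rng = np.random.default_rng(2)
pred_speech = rng.integers(0, 2, size=3000)            # model output per audio window
sub_present = np.roll(pred_speech, 40)                 # subtitles delayed by 40 windows
sub_present[rng.random(3000) < 0.05] ^= 1              # plus some prediction noise

shift, score = best_offset(pred_speech, sub_present)
print(f"estimated offset: {shift} windows (agreement {score:.2f})")
```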
APA, Harvard, Vancouver, ISO, and other styles
24

Hargreaves, Steven. "Music metadata capture in the studio from audio and symbolic data." Thesis, Queen Mary, University of London, 2014. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8816.

Full text
Abstract:
Music Information Retrieval (MIR) tasks, in the main, are concerned with the accurate generation of one of a number of different types of music metadata (beat onsets or melody extraction, for example). Almost always, they operate on fully mixed digital audio recordings. Commonly, this means that a large amount of signal processing effort is directed towards the isolation, and then identification, of certain highly relevant aspects of the audio mix. In some cases, results of one MIR algorithm are useful, if not essential, to the operation of another: a chord detection algorithm, for example, is highly dependent upon accurate pitch detection. Although not clearly defined in all cases, certain rules exist which we may take from music theory in order to assist the task, such as the particular note intervals which make up a specific chord. On the question of generating accurate, low-level music metadata (e.g. chromatic pitch and score onset time), a potentially huge advantage lies in the use of multitrack, rather than mixed, audio recordings, in which the separate instrument recordings may be analysed in isolation. Additionally, in MIR, as in many other research areas currently, there is an increasing push towards the use of the Semantic Web for publishing metadata using the Resource Description Framework (RDF). Semantic Web technologies, though, also facilitate the querying of data via the SPARQL query language, as well as logical inferencing via the careful creation and use of Web Ontology Language (OWL) ontologies. This, in turn, opens up the intriguing possibility of deferring our decision regarding which particular type of MIR query to ask of our low-level music metadata until some point later down the line, long after all the heavy signal processing has been carried out. In this thesis, we describe an over-arching vision for an alternative MIR paradigm, built around the principles of early, studio-based metadata capture, and exploitation of open, machine-readable Semantic Web data. Using the specific example of structural segmentation, we demonstrate that by analysing multitrack rather than mixed audio, we are able to achieve a significant and quantifiable increase in the accuracy of our segmentation algorithm. We also provide details of a new multitrack audio dataset with structural segmentation annotations, created as part of this research and available for public use. Furthermore, we show that it is possible to fully implement a pair of pattern discovery algorithms (the SIA and SIATEC algorithms, which are highly applicable to, but not restricted to, symbolic music data analysis) using only Semantic Web technologies: the SPARQL query language, acting on RDF data, in tandem with a small OWL ontology. We describe the challenges encountered by taking this approach and the particular solution we have arrived at, and we evaluate the implementation both in terms of its execution time and within the wider context of our vision for a new MIR paradigm.
APA, Harvard, Vancouver, ISO, and other styles
25

Ruiter, Julia. "Practical Chaos: Using Dynamical Systems to Encrypt Audio and Visual Data." Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/scripps_theses/1389.

Full text
Abstract:
Although dynamical systems have a multitude of classical uses in physics and applied mathematics, new research in theoretical computer science shows that dynamical systems can also be used as a highly secure method of encrypting data. Properties of Lorenz and similar systems of equations yield chaotic outputs that are good at masking the underlying data both physically and mathematically. This paper aims to show how Lorenz systems may be used to encrypt text and image data, as well as provide a framework for how physical mechanisms may be built using these properties to transmit encrypted wave signals.
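A toy sketch of the encryption idea: a keystream is derived from the chaotic x-trajectory of a Lorenz system, with the initial condition acting as the shared secret, and XORed with the data bytes. The sampling step, transient skip and byte mapping are arbitrary choices for illustration, and this is in no way a secure implementation of the thesis's scheme.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def keystream(n_bytes, key=(1.0, 1.0, 1.0)):
    """Derive a byte stream from the chaotic x-trajectory of a Lorenz system.
    The initial condition plays the role of the shared secret."""
    t_eval = np.linspace(5.0, 5.0 + 0.01 * n_bytes, n_bytes)   # skip an initial transient
    sol = solve_ivp(lorenz, (0.0, t_eval[-1]), key, t_eval=t_eval, rtol=1e-9, atol=1e-9)
    x = sol.y[0]
    return ((np.abs(x) * 1e4) % 256).astype(np.uint8)

msg = np.frombuffer(b"audio sample bytes go here", dtype=np.uint8)
ks = keystream(len(msg))
cipher = msg ^ ks                       # encrypt
plain = cipher ^ ks                     # decrypt with the same initial condition
print(plain.tobytes())
```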
APA, Harvard, Vancouver, ISO, and other styles
26

Reuben, Mugisha. "Addressing Civil Servants' training needs through audio-visual content." Thesis, Örebro universitet, Handelshögskolan vid Örebro universitet, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-12558.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Lanciani, Christopher A. "Compressed-domain processing of MPEG audio signals." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13760.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Pérez López, Andrés. "Parametric analysis of ambisonic audio: contributions to methods, applications and data generation." Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/669962.

Full text
Abstract:
Due to the recent advances in virtual and augmented reality, ambisonics has emerged as the de facto standard for immersive audio. Ambisonic audio can be captured using spherical microphone arrays, which are becoming increasingly popular. Yet, many methods for acoustic and microphone array signal processing are not specifically tailored for spherical geometries. Therefore, there is still room for improvement in the field of automatic analysis and description of ambisonic recordings. In the present thesis, we tackle this problem using methods based on the parametric analysis of the sound field. Specifically, we present novel contributions in the scope of blind reverberation time estimation, diffuseness estimation, and sound event localization and detection. Furthermore, several software tools developed for ambisonic dataset generation and management are also presented.
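One parametric-analysis building block mentioned in the abstract, diffuseness estimation, is commonly computed from the pseudo-intensity vector of first-order ambisonic signals, together with a direction-of-arrival estimate. The sketch below ignores ambisonic normalization constants (which affect scale but not direction), uses random stand-in STFT data, and is not a reproduction of the thesis's method.

```python
import numpy as np

def foa_doa_diffuseness(W, X, Y, Z):
    """Estimate direction of arrival and a simple diffuseness measure from
    first-order ambisonic (B-format) STFT frames.

    W, X, Y, Z : complex arrays of shape (frames, bins)
    """
    # pseudo-intensity vector per time-frequency bin (scale factors ignored)
    I = np.stack([np.real(np.conj(W) * X),
                  np.real(np.conj(W) * Y),
                  np.real(np.conj(W) * Z)], axis=-1)
    I_mean = I.mean(axis=(0, 1))                       # average over the analysis block
    azimuth = np.arctan2(I_mean[1], I_mean[0])
    elevation = np.arctan2(I_mean[2], np.linalg.norm(I_mean[:2]))
    # diffuseness: near 1 when intensity directions cancel out, near 0 for one plane wave
    diffuseness = 1.0 - np.linalg.norm(I_mean) / (np.mean(np.linalg.norm(I, axis=-1)) + 1e-12)
    return np.degrees(azimuth), np.degrees(elevation), diffuseness

rng = np.random.default_rng(3)
shape = (50, 257)
W = rng.normal(size=shape) + 1j * rng.normal(size=shape)
X, Y, Z = 0.9 * W, 0.3 * W, 0.1 * W                    # roughly a single frontal-ish source
print(foa_doa_diffuseness(W, X, Y, Z))
```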
APA, Harvard, Vancouver, ISO, and other styles
29

Smith, Strether. "DATA ACQUISITION SYSTEMS FOR AUDIO-FREQUENCY, MECHANICAL-TESTING APPLICATIONS — RECENT DEVELOPMENTS 2001 —." International Foundation for Telemetering, 2001. http://hdl.handle.net/10150/606437.

Full text
Abstract:
International Telemetering Conference Proceedings / October 22-25, 2001 / Riviera Hotel and Convention Center, Las Vegas, Nevada
The objective of any data acquisition system is to make accurate measurements of physical phenomena. Many of the phenomena to be characterized contain data that is in the audio-frequency range between 0 and 50,000 Hertz. Examples include structural vibration, wind-tunnel measurements, turbine engines and acoustics in air and water. These tests often require a large number of channels and may be very expensive. In some cases, there may be only one opportunity to acquire the data. This paper describes a testing/measurement philosophy and the use of advances in available hardware/software systems to implement the requirements. Primary emphasis is on robustness (assurance that critical data is properly recorded), measurement/characterization of unexpected results (generated by accidents or unexpected behavior), and test safety (for both the test article and the facility). Finally, a data acquisition system that encompasses the features discussed is described.
APA, Harvard, Vancouver, ISO, and other styles
30

Chen, Howard. "AZIP, audio compression system: Research on audio compression, comparison of psychoacoustic principles and genetic algorithms." CSUSB ScholarWorks, 2005. https://scholarworks.lib.csusb.edu/etd-project/2617.

Full text
Abstract:
The purpose of this project is to investigate the differences between psychoacoustic principles and genetic algorithms (GA). These will be discussed separately. The review will also compare the compression ratio and the quality of the decompressed files decoded by these two methods.
APA, Harvard, Vancouver, ISO, and other styles
31

Melih, Kathy. "Audio Source Separation Using Perceptual Principles for Content-Based Coding and Information Management." Griffith University. School of Information Technology, 2004. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20050114.081327.

Full text
Abstract:
The information age has brought with it a dual problem. In the first place, the ready access to mechanisms to capture and store vast amounts of data in all forms (text, audio, image and video), has resulted in a continued demand for ever more efficient means to store and transmit this data. In the second, the rapidly increasing store demands effective means to structure and access the data in an efficient and meaningful manner. In terms of audio data, the first challenge has traditionally been the realm of audio compression research that has focused on statistical, unstructured audio representations that obfuscate the inherent structure and semantic content of the underlying data. This has only served to further complicate the resolution of the second challenge resulting in access mechanisms that are either impractical to implement, too inflexible for general application or too low level for the average user. Thus, an artificial dichotomy has been created from what is in essence a dual problem. The founding motivation of this thesis is that, although the hypermedia model has been identified as the ideal, cognitively justified method for organising data, existing audio data representations and coding models provide little, if any, support for, or resemblance to, this model. It is the contention of the author that any successful attempt to create hyperaudio must resolve this schism, addressing both storage and information management issues simultaneously. In order to achieve this aim, an audio representation must be designed that provides compact data storage while, at the same time, revealing the inherent structure of the underlying data. Thus it is the aim of this thesis to present a representation designed with these factors in mind. Perhaps the most difficult hurdle in the way of achieving the aims of content-based audio coding and information management is that of auditory source separation. The MPEG committee has noted this requirement during the development of its MPEG-7 standard, however, the mechanics of "how" to achieve auditory source separation were left as an open research question. This same committee proposed that MPEG-7 would "support descriptors that can act as handles referring directly to the data, to allow manipulation of the multimedia material." While meta-data tags are a part solution to this problem, these cannot allow manipulation of audio material down to the level of individual sources when several simultaneous sources exist in a recording. In order to achieve this aim, the data themselves must be encoded in such a manner that allows these descriptors to be formed. Thus, content-based coding is obviously required. In the case of audio, this is impossible to achieve without effecting auditory source separation. Auditory source separation is the concern of computational auditory scene analysis (CASA). However, the findings of CASA research have traditionally been restricted to a limited domain. To date, the only real application of CASA research to what could loosely be classified as information management has been in the area of signal enhancement for automatic speech recognition systems. In these systems, a CASA front end serves as a means of separating the target speech from the background "noise". As such, the design of a CASA-based approach, as presented in this thesis, to one of the most significant challenges facing audio information management research represents a significant contribution to the field of information management. 
Thus, this thesis unifies research from three distinct fields in an attempt to resolve some specific and general challenges faced by all three. It describes an audio representation that is based on a sinusoidal model from which low-level auditory primitive elements are extracted. The use of a sinusoidal representation is somewhat contentious with the modern trend in CASA research tending toward more complex approaches in order to resolve issues relating to co-incident partials. However, the choice of a sinusoidal representation has been validated by the demonstration of a method to resolve many of these issues. The majority of the thesis contributes several algorithms to organise the low-level primitives into low-level auditory objects that may form the basis of nodes or link anchor points in a hyperaudio structure. Finally, preliminary investigations in the representation’s suitability for coding and information management tasks are outlined as directions for future research.
APA, Harvard, Vancouver, ISO, and other styles
32

Melih, Kathy. "Audio Source Separation Using Perceptual Principles for Content-Based Coding and Information Management." Thesis, Griffith University, 2004. http://hdl.handle.net/10072/366279.

Full text
Abstract:
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
33

Goussard, George Willem. "Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems." Thesis, Stellenbosch : University of Stellenbosch, 2011. http://hdl.handle.net/10019.1/6686.

Full text
Abstract:
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2011.
ENGLISH ABSTRACT: This thesis presents a system that is designed to replace the manual process of generating a pronunciation dictionary for use in automatic speech recognition. The proposed system has several stages. The first stage segments the audio into what will be known as the subword units, using a frequency domain method. In the second stage, dynamic time warping is used to determine the similarity between the segments of each possible pair of these acoustic segments. These similarities are used to cluster similar acoustic segments into acoustic clusters. The final stage derives a pronunciation dictionary from the orthography of the training data and corresponding sequence of acoustic clusters. This process begins with an initial mapping between words and their sequence of clusters, established by Viterbi alignment with the orthographic transcription. The dictionary is refined iteratively by pruning redundant mappings, hidden Markov model estimation and Viterbi re-alignment in each iteration. This approach is evaluated experimentally by applying it to two subsets of the TIMIT corpus. It is found that, when test words are repeated often in the training material, the approach leads to a system whose accuracy is almost as good as one trained using the phonetic transcriptions. When test words are not repeated often in the training set, the proposed approach leads to better results than those achieved using the phonetic transcriptions, although the recognition is poor overall in this case.
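The pairwise comparison of acoustic segments in the second stage uses dynamic time warping. Below is a plain-NumPy sketch of the classic DTW distance between two feature sequences; the MFCC-like dimensionality and segment lengths are placeholders, and in practice the pairwise distances would feed the clustering step described in the abstract.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences
    (frames x dims), used as a similarity measure between segments."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])     # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(4)
seg1 = rng.normal(size=(40, 13))   # e.g. feature frames of one sub-word segment
seg2 = rng.normal(size=(55, 13))
print(dtw_distance(seg1, seg2))
```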
APA, Harvard, Vancouver, ISO, and other styles
34

Lopes, Batres Mario. "Integrating Spatial Audio in Voice Guidance Systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289640.

Full text
Abstract:
Navigation systems are commonly used in our daily lives. Research has shown that spatial audio presents one opportunity for communicating the direction of the next manoeuvre to the driver more effectively. This thesis project proposes a new feature for the spatialisation of the audio cues triggered by a mobile navigation system, using a virtualised vector-based panning (VVBP) architecture for encoding and decoding. The prototype developed during this thesis enables spatialisation over headphone- or speaker-based systems. This study aims to offer a new sound experience to the user, which can be used to increase the safety and performance of driving. Based on an expert review and a user test, the application was tested in different scenarios. The participants selected for these sessions were drawn from HERE Technologies, which made it possible to reach design experts who already knew the company's current application, making the comparison with the proposal easier. This selection could also be a limitation of the study, since the users might have a personal bias towards seeing new features in a product they have already worked on. Analysis of the results obtained during the testing sessions showed high user satisfaction with the feature and a better understanding of the surroundings. Consequently, this indicates that spatial audio can improve driving performance by introducing a new source of information for locating the next turn or obstacle. Further research is needed to identify other factors that could strengthen the effectiveness of the product.
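The abstract does not detail the VVBP encoder and decoder, so the following Python sketch only illustrates the generic idea of vector-based amplitude panning for a single (possibly virtual) loudspeaker pair; the speaker angles, the power normalisation and the example cue direction are assumptions, not the thesis's design.

    import numpy as np

    def stereo_vbap_gains(source_deg, left_deg=-30.0, right_deg=30.0):
        """Gains for one loudspeaker pair so the cue appears at source_deg.

        Angles are in degrees relative to straight ahead; the returned
        (g_left, g_right) pair is power-normalised.
        """
        def unit(deg):
            rad = np.radians(deg)
            return np.array([np.sin(rad), np.cos(rad)])

        base = np.column_stack([unit(left_deg), unit(right_deg)])  # 2x2 speaker base
        g = np.linalg.solve(base, unit(source_deg))                # raw gains
        g = np.clip(g, 0.0, None)                                  # no negative gains
        return g / np.linalg.norm(g)                               # constant power

    # Example: place a "turn right" cue 20 degrees to the right of the driver.
    g_left, g_right = stereo_vbap_gains(20.0)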
APA, Harvard, Vancouver, ISO, and other styles
35

Udaya, Kumar Magesh Kumar. "Classification of Parkinson’s Disease using MultiPass Lvq,Logistic Model Tree,K-Star for Audio Data set : Classification of Parkinson Disease using Audio Dataset." Thesis, Högskolan Dalarna, Datateknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:du-5596.

Full text
Abstract:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects, PD can have a profound effect on speech and voice. The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and an abnormally fast rate of speech; this cluster of speech symptoms is often termed hypokinetic dysarthria. The disease can be difficult to diagnose accurately, especially in its early stages, so automatic techniques based on artificial intelligence could increase diagnostic accuracy and help doctors make better decisions. The aim of this thesis work is to predict PD from audio files collected from various patients. The audio files are preprocessed to obtain features; the preprocessed data contains 23 attributes and 195 instances, with on average six voice recordings per person. The number of instances is reduced using a data compression technique, the Discrete Cosine Transform (DCT). After compression, attribute selection is performed using several built-in WEKA methods such as ChiSquared, GainRatio and InfoGain; after identifying the important attributes, the attributes are evaluated one by one using stepwise regression. Based on the selected attributes, classification is carried out in WEKA using a cost-sensitive classifier with algorithms such as MultiPass LVQ, Logistic Model Tree (LMT) and K-Star. The classification results average around 80%, and with the selected features an approximate classification accuracy of 95% for PD is achieved. This shows that, using the audio dataset, PD could be predicted with a high level of accuracy.
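The workflow above is WEKA-based; as a loose Python analogue (not the author's setup) the sketch below uses scikit-learn, with a chi-squared attribute selector standing in for WEKA's ChiSquared/InfoGain rankers and a cost-weighted decision tree standing in for the cost-sensitive LMT/LVQ/K-Star classifiers. The feature matrix is a placeholder; the real data would be the 195 preprocessed voice instances.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.random((195, 22))        # placeholder for the real voice features
    y = rng.integers(0, 2, 195)      # placeholder labels: 0 = healthy, 1 = PD

    pipe = Pipeline([
        ("scale", MinMaxScaler()),                    # chi2 needs non-negative input
        ("select", SelectKBest(chi2, k=10)),          # keep the strongest attributes
        ("clf", DecisionTreeClassifier(class_weight={0: 1, 1: 2})),  # cost-sensitive stand-in
    ])
    print(cross_val_score(pipe, X, y, cv=5).mean())   # meaningful only with real data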
APA, Harvard, Vancouver, ISO, and other styles
36

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Full text
Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. The thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques for normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach is shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
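A minimal sketch of the kind of per-state audio-visual score fusion that synchronous HMMs rely on is given below. The stream weighting shown here is a common convention, not necessarily the normalisation technique the thesis proposes, and the weight value is illustrative.

    import numpy as np

    def av_state_loglik(log_p_audio, log_p_video, audio_weight=0.7):
        """Combine per-state audio and video log-likelihoods for one frame.

        log_p_audio, log_p_video: arrays of shape (n_states,);
        audio_weight in [0, 1], the visual stream gets the complement.
        """
        lam = np.clip(audio_weight, 0.0, 1.0)
        return lam * log_p_audio + (1.0 - lam) * log_p_video

    # Example: three shared states scored by the two stream classifiers.
    la = np.array([-12.1, -9.4, -15.0])   # acoustic log-likelihoods
    lv = np.array([-8.7, -10.2, -9.9])    # visual log-likelihoods
    best_state = int(np.argmax(av_state_loglik(la, lv)))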
APA, Harvard, Vancouver, ISO, and other styles
37

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Queensland University of Technology, 2008. http://eprints.qut.edu.au/17689/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Jiang, Jing Jing. "Self-synchronization and LUT based client side digital audio watermarking." Thesis, University of Macau, 2011. http://umaclib3.umac.mo/record=b2550676.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Prykhodko, S. B. "Application of Nonlinear Stochastic Differential Systems for Data Protection in Audio and Graphics Files." Thesis, Sumy State University, 2015. http://essuir.sumdu.edu.ua/handle/123456789/41209.

Full text
Abstract:
Data protection in audio and graphics files is one of the significant problems in the field of information security. In computer systems this problem is usually solved with cryptographic methods, but new solutions are still being sought. The application of nonlinear stochastic differential systems (SDSs) is one such new method [1].
APA, Harvard, Vancouver, ISO, and other styles
40

Barakat, Arian. "What makes an (audio)book popular?" Thesis, Linköpings universitet, Statistik och maskininlärning, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152871.

Full text
Abstract:
Audiobook reading has traditionally been used for educational purposes but has in recent times grown into a popular alternative to the more traditional means of consuming literature. In order to differentiate themselves from other players in the market, but also to provide their users with enjoyable literature, several audiobook companies have lately directed their efforts towards producing their own content. Creating highly rated content is, however, no easy task, and one recurring challenge is how to make a bestselling story. In an attempt to identify latent features shared by successful audiobooks and to evaluate proposed methods for literary quantification, this thesis employs an array of frameworks from the fields of statistics, machine learning and natural language processing on data and literature provided by Storytel - Sweden's largest audiobook company. We analyze and identify important features from a collection of 3077 Swedish books with respect to their promotional and literary success. By considering features from the aspects Metadata, Theme, Plot, Style and Readability, we found that popular books are typically published as part of a book series, cover 1-3 central topics, deal with topics such as daughter-mother relationships and human closeness, and also contain, on average, a higher proportion of verbs and a lower proportion of short words. Despite successfully identifying these and other factors, we found that none of our models predicted "bestsellers" adequately, and future work may study additional factors, employ other models or even use different metrics to define and measure popularity. From our evaluation of the literary quantification methods, namely topic modeling and narrative approximation, we found that these methods are, in general, suitable for Swedish texts but that they require further improvement and experimentation before they can be successfully deployed for Swedish literature. For topic modeling, we recognized that the sole use of nouns provided more interpretable topics and that the inclusion of character names tended to pollute the topics. We also identified and discussed the possible problem of word inflections when modeling topics for more morphologically complex languages, and noted that additional preprocessing treatments such as word lemmatization or post-training text normalization may improve the quality and interpretability of topics. For the narrative approximation, we discovered that the method currently suffers from three shortcomings: (1) unreliable sentence segmentation, (2) unsatisfactory dictionary-based sentiment analysis and (3) the possible loss of sentiment information induced by translations. Despite only examining a handful of literary works, we further found that books written originally in Swedish had narratives that were more cross-language consistent compared to books written in English and then translated to Swedish.
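As a hedged illustration of the noun-only topic modelling finding mentioned above (not the thesis's actual pipeline), the sketch below runs LDA with gensim on documents that are assumed to have already been reduced to lemmatised noun tokens; the toy documents and the number of topics are placeholders.

    from gensim import corpora, models

    # Assume each book has been reduced upstream to lemmatised noun tokens
    # (a Swedish POS tagger and lemmatiser would be needed for real data).
    docs = [
        ["dotter", "mamma", "relation", "familj"],
        ["mord", "polis", "utredning", "stad"],
        ["kärlek", "relation", "längtan", "närhet"],
    ]

    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                          passes=10, random_state=0)
    for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
        print(topic_id, [w for w, _ in words])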
APA, Harvard, Vancouver, ISO, and other styles
41

Miller, Robin J. "COFDM for HF digital broadcasting." Thesis, University of Brighton, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287067.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Shakespeare, Simon Adam. "Fetal heart rate derivation via Doppler ultrasound." Thesis, University of Nottingham, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.342473.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Hansjons, Vegeborn Victor. "LjudMAP: A Visualization Tool for Exploring Audio Collections with Real-Time Concatenative Synthesis Capabilities." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277831.

Full text
Abstract:
This thesis presents the software tool "LjudMAP," which combines music informatics techniques and unsupervised machine learning methods to assist in the exploration of audio collections. LjudMAP builds on concepts from the software tool "Temporally Disassembled Audio," which was developed to enable fast browsing of recorded speech material. LjudMAP is instead intended for analysis and real-time composition of electroacoustic music, and is programmed in a way that can include more audio features. This thesis presents investigations into how LjudMAP can be used for identifying similarities and clusters within audio collections. A key contribution is the coagulation of clusters of sound based on principles of proximity in time and feature space. The thesis also shows how LjudMAP can be used for composition, with several demonstrations provided by an electroacoustic composer working with a variety of sound materials. The source code for LjudMAP is available at: https://github.com/victorwegeborn/LjudMAP.
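A rough sketch of the kind of processing chain such a tool implies: describe short audio frames with spectral features and project them to a 2-D point cloud for browsing, keeping each frame's time stamp so temporally adjacent points can be linked or merged. The feature set (MFCCs), the reduction method (PCA) and the file path below are assumptions; LjudMAP's actual choices are documented in the thesis and repository.

    import numpy as np
    import librosa
    from sklearn.decomposition import PCA

    # Describe short windows of one recording with MFCC vectors.
    y, sr = librosa.load("collection/example.wav", sr=22050)    # hypothetical file
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=512)
    frames = mfcc.T                                             # one row per time frame

    # Project per-frame features to 2-D for plotting as an explorable map.
    coords = PCA(n_components=2).fit_transform(frames)
    times = librosa.frames_to_time(np.arange(len(frames)), sr=sr, hop_length=512)
    # `coords` gives each frame's map position, `times` its position in the file.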
APA, Harvard, Vancouver, ISO, and other styles
44

Slater, P. "The creation and control of digital audio waveforms : An investigation into techniques for the creation and real-time control of audio waveforms using data representations which result in timbral flexibility and high audio quality." Thesis, University of Bradford, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.233660.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Henzl, David. "VST Plug-IN pro vodoznačení audio signálů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217498.

Full text
Abstract:
This thesis deals with digital signal processing methods, the possibilities for processing audio signals, and in particular audio watermarking as a way of safeguarding the rights of the authors of audio content. The thesis outlines basic audio watermarking methods and the possibilities for watermark detection. To illustrate watermarking, the audio watermarking method known as Echo Hiding is described. This method embeds watermarks into the audio content in the time domain, while watermark detection is performed in the cepstral domain using the Fast Fourier Transform and a correlation function. The method is implemented as a VST plug-in and, together with ASIO drivers that minimise signal latency, provides audio watermarking in real time. The aim of the first part of this thesis is to introduce VST technology, the ASIO driver and the creation of VST plug-ins. The second part of the thesis deals with the implementation of the watermarking method in conjunction with VST technology.
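As an illustration of the echo-hiding scheme described here (time-domain embedding, cepstral-domain detection), a minimal Python sketch is given below; it is not the thesis's VST implementation, and the frame length, echo delays and echo amplitude are illustrative values.

    import numpy as np

    def embed_echo(signal, bits, frame_len=8192, d0=50, d1=100, alpha=0.3):
        """Hide one bit per frame by adding a faint echo at delay d0 (bit 0) or d1 (bit 1)."""
        out = signal.astype(float).copy()
        for k, bit in enumerate(bits):
            start, delay = k * frame_len, (d1 if bit else d0)
            frame = signal[start:start + frame_len].astype(float)
            echo = np.zeros_like(frame)
            echo[delay:] = frame[:-delay]
            out[start:start + frame_len] = frame + alpha * echo
        return out

    def detect_bit(frame, d0=50, d1=100):
        """Decide the hidden bit from the real cepstrum peaks at delays d0 and d1."""
        spectrum = np.fft.fft(frame)
        cepstrum = np.fft.ifft(np.log(np.abs(spectrum) ** 2 + 1e-12)).real
        return int(cepstrum[d1] > cepstrum[d0])

    # Usage sketch: watermarked = embed_echo(audio, [1, 0, 1])
    #               first_bit = detect_bit(watermarked[:8192])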
APA, Harvard, Vancouver, ISO, and other styles
46

Saeed, Nausheen. "Automated Gravel Road Condition Assessment : A Case Study of Assessing Loose Gravel using Audio Data." Licentiate thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-36402.

Full text
Abstract:
Gravel roads connect sparse populations and provide highways for agriculture and the transport of forest goods. Gravel roads are an economical choice where traffic volume is low. In Sweden, 21% of all public roads are state-owned gravel roads, covering over 20,200 km. In addition, there are some 74,000 km of gravel roads and 210,000 km of forest roads that are owned by the private sector. The Swedish Transport Administration (Trafikverket) rates the condition of gravel roads according to the severity of irregularities (e.g. corrugations and potholes), dust, loose gravel, and gravel cross-sections. This assessment is carried out during the summertime when roads are free of snow. One of the essential parameters for gravel road assessment is loose gravel. Loose gravel can cause a tire to slip, leading to a loss of driver control.  Assessment of gravel roads is carried out subjectively by taking images of road sections and adding some textual notes. A cost-effective, intelligent, and objective method for road assessment is lacking. Expensive methods, such as laser profiler trucks, are available and can offer road profiling with high accuracy. These methods are not applied to gravel roads, however, because of the need to maintain cost-efficiency.  In this thesis, we explored the idea that, in addition to machine vision, we could also use machine hearing to classify the condition of gravel roads with respect to loose gravel. Several suitable classical supervised learning methods and convolutional neural networks (CNNs) were tested. When people drive on gravel roads, they can make sense of the road condition by listening to the gravel hitting the bottom of the car: the more gravel we hear hitting the bottom of the car, the more loose gravel there is likely to be and, therefore, the worse the road condition might be. Based on this idea, we hypothesized that machines could also undertake such a classification when trained with labeled sound data, identifying gravel and non-gravel sounds. In this thesis, we used traditional machine learning algorithms, such as support vector machines (SVM), decision trees, and ensemble classification methods. We also explored CNNs for classifying spectrograms of audio recordings and images of gravel roads. Both classical supervised learning and CNNs were used, and their results were compared in this study. Among the classical algorithms, ensemble bagged tree (EBT) classifiers performed best for classifying gravel and non-gravel sounds; EBT is also useful in reducing the misclassification of non-gravel sounds. The CNN approach achieved an accuracy of 97.91%. Using a CNN makes the classification process more intuitive because the network architecture takes responsibility for selecting the relevant training features. Furthermore, the classification results can be visualized on road maps, which can help road monitoring agencies assess road conditions and schedule maintenance activities for a particular road.
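As a rough sketch of the classical branch described above (not the thesis's exact feature set or toolchain), the Python example below summarises each clip with MFCC statistics and trains a bagged decision-tree ensemble, analogous to the ensemble bagged tree classifier; the file paths and labels are placeholders.

    import numpy as np
    import librosa
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    def clip_features(path, sr=16000):
        """Summarise one audio clip as the mean and std of its MFCCs."""
        y, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # Placeholder clips: 1 = loose-gravel sound, 0 = other in-car road sound.
    paths = ["clips/gravel_01.wav", "clips/quiet_01.wav"]       # hypothetical files
    labels = np.array([1, 0])
    X = np.vstack([clip_features(p) for p in paths])

    ebt = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)
    ebt.fit(X, labels)   # with a real labelled set, evaluate with cross-validation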


APA, Harvard, Vancouver, ISO, and other styles
47

Scholz, Anne-Charlot. "Voice Qualities in Audio Subtitles : Opportunities and Challenges in Voice Design for accessibility and beyond." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299944.

Full text
Abstract:
This paper explores novel experiential qualities of the voice in audio subtitles through a research-through-design approach. Audio subtitling is an accessibility service for users who have trouble comprehending subtitles in audiovisual content and has recently been developed for video-on-demand platforms such as SVT Play. In order to explore possibilities in its voice design, short video clips of films and TV series with different types of audio subtitles were produced, presented to, and discussed with a small number of potential users of audio subtitles, including people with dyslexia, cognitive difficulties and autism. The results indicated that applied voices that did not support the user's expectations, low and high pitches, as well as low-quality speech synthesis, made for uncomfortable experiences, which could prove useful for provoking reflection and challenging norms. The paper also discusses how voice design for this service has the potential to match the filmmakers' intentions by translating more than semantic information, as well as how audio subtitles could potentially be produced by professional sound designers and filmmakers instead of video-on-demand services. Finally, challenges such as misgendering and insensitive choices of voice in voice design for audio subtitles are considered, underscoring how ethics cannot be avoided when working with the voice modality.
APA, Harvard, Vancouver, ISO, and other styles
48

Vijjapu, Sudheer. "RC implementation of an audio frequency band Butterworth MASH delta-sigma analog to digital data converter." Diss., The archival copy of this thesis can be found at SOAR (password protected), 2006. http://soar.wichita.edu/dspace/handle/10057/568.

Full text
Abstract:
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical and Computer Engineering.
"August 2006." Title from PDF title page (viewed on May 2, 2007). Thesis adviser: Larry D. Paarmann. Includes bibliographic references (leaves 41-43).
APA, Harvard, Vancouver, ISO, and other styles
49

Rintala, Jonathan. "Speech Emotion Recognition from Raw Audio using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278858.

Full text
Abstract:
Traditionally, in Speech Emotion Recognition, models require a large number of manually engineered features and intermediate representations such as spectrograms for training. However, hand-engineering such features often requires both expert domain knowledge and resources. Recently, with the emerging paradigm of deep learning, end-to-end models that extract features themselves and learn directly from the raw speech signal have been explored. A previous approach has been to combine multiple parallel CNNs with different filter lengths to extract multiple temporal features from the audio signal, and then feed the resulting sequence to a recurrent block. Other recent work presents high accuracies when utilizing local feature learning blocks (LFLBs) for reducing the dimensionality of a raw audio signal, extracting the most important information. Thus, this study combines the idea of LFLBs for feature extraction with a block of parallel CNNs with different filter lengths for capturing multitemporal features; this is finally fed into an LSTM layer for global contextual feature learning. To the best of our knowledge, such a combined architecture has not yet been properly investigated. Further, this study investigates different configurations of such an architecture. The proposed model is then trained and evaluated on the well-known speech databases EmoDB and RAVDESS, both in a speaker-dependent and a speaker-independent manner. The results indicate that the proposed architecture can produce results comparable with the state of the art, despite excluding data augmentation and advanced pre-processing. Three parallel CNN pipes yielded the highest accuracy, together with a series of modified LFLBs that utilize average pooling and ReLU activation. This shows the power of leaving the feature learning up to the network and opens up interesting future research on time complexity and the trade-off between introducing complexity in pre-processing or in the model architecture itself.
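A compact PyTorch sketch of the architecture family described above (parallel 1-D convolutions with different kernel sizes over the raw waveform, LFLB-style pooling, then an LSTM) is given below; the kernel sizes, channel counts, hidden size and number of classes are illustrative, not the configurations evaluated in the thesis.

    import torch
    import torch.nn as nn

    class ParallelCNNLSTM(nn.Module):
        """Raw-waveform emotion classifier: parallel 1-D convs with different
        kernel sizes, concatenated, then an LSTM and a linear classifier."""

        def __init__(self, n_classes=7, channels=32, kernel_sizes=(3, 9, 27)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv1d(1, channels, k, padding=k // 2),
                    nn.ReLU(),
                    nn.AvgPool1d(4),          # LFLB-style downsampling
                )
                for k in kernel_sizes
            ])
            self.lstm = nn.LSTM(channels * len(kernel_sizes), 64, batch_first=True)
            self.head = nn.Linear(64, n_classes)

        def forward(self, wave):              # wave: (batch, samples)
            x = wave.unsqueeze(1)             # -> (batch, 1, samples)
            feats = torch.cat([b(x) for b in self.branches], dim=1)  # (batch, C*3, T)
            out, _ = self.lstm(feats.transpose(1, 2))                # (batch, T, 64)
            return self.head(out[:, -1])                             # last time step

    logits = ParallelCNNLSTM()(torch.randn(2, 16000))   # two 1-second clips at 16 kHz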
APA, Harvard, Vancouver, ISO, and other styles
50

Karri, Janardhan Bhima Reddy. "Low Power Real-time Video and Audio Embedded System Design for Naturalistic Bicycle Study." Scholar Commons, 2015. https://scholarcommons.usf.edu/etd/5518.

Full text
Abstract:
According to NHTSA Traffic Safety Facts [9], 732 bicyclist deaths and 48,000 injuries were recorded in 2013. In the State of Florida the safety of bicyclists is of particular concern, as bicycle fatality rates are nearly triple the national average; Florida has ranked first in the nation in bicycle fatality rate for several years. To determine the causes of near-misses and crashes, a detailed study of bicyclist behavior and environmental conditions is needed. In a Florida Department of Transportation (FDOT) funded project, USF CUTR has proposed a naturalistic bicycle study based on ride data collected from 100 bicyclists over 3,000 hours. To this end, a Bicycle Data Acquisition System (BDAS) is being researched and developed. The main objective of this thesis work is to design and implement the low-power video and audio subsystems of BDAS as specified by domain experts (USF CUTR researchers). This work also involves the design of a graphical user interface (a Windows application) to visualize the data in a synchronized manner. Selection of appropriate hardware to capture and store data is critical, as it should meet several criteria such as low power consumption, low cost, and small form factor. Several camera controllers were evaluated in terms of their performance and cost. The major challenges in this design are synchronization of the collected data, storage of the video and sensor data, and the design of low-power embedded subsystems.
APA, Harvard, Vancouver, ISO, and other styles
