To see the other types of publications on this topic, follow the link: Automatic data processing.

Dissertations / Theses on the topic 'Automatic data processing'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Automatic data processing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

余銘龍 and Ming-lung Yu. "Automatic processing of Chinese language bank cheques." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2002. http://hub.hku.hk/bib/B31225548.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Hoyt, Matthew Ray. "Automatic Tagging of Communication Data." Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc149611/.

Full text
Abstract:
Globally distributed software teams are widespread throughout industry, but finding reliable methods that can properly assess a team's activities is a real challenge. Methods such as surveys and manual coding of activities are too time-consuming and are often unreliable. Recent advances in information retrieval and linguistics, however, suggest that automated and/or semi-automated text classification algorithms could be an effective way of finding differences in the communication patterns among individuals and groups. Communication among group members is frequent and generates a significant amount of data. Thus, a web-based tool that can automatically analyze the communication patterns among global software teams could lead to a better understanding of group performance. The goal of this thesis, therefore, is to compare automatic and semi-automatic measures of communication and evaluate their effectiveness in classifying different types of group activities that occur within a global software development project. To achieve this goal, we developed a web-based component that helps clean and classify communication activities. The component was then used to compare different automated text classification techniques on various group activities to determine their effectiveness in correctly classifying data from a global software development team project.
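To make the classification step concrete, here is a minimal sketch (ours, not the thesis's tooling) of supervised text classification over communication messages; the example messages and activity labels are invented for illustration:

```python
# Minimal sketch: TF-IDF features + logistic regression to classify group
# communication messages into activity types. Messages and labels below are
# illustrative placeholders, not data from the thesis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "Fixed the merge conflict in the build script",
    "Can we move the standup to 9am tomorrow?",
    "The requirements doc still lacks the API section",
]
labels = ["implementation", "coordination", "documentation"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(messages, labels)
print(clf.predict(["Please review the updated design doc"]))
```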
APA, Harvard, Vancouver, ISO, and other styles
3

Lee, Hiu-wing Doris, and 李曉穎. "A study of automatic expansion of Chinese abbreviations." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B31609338.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Ikei, Mitsuru. "Automatic program restructuring for distributed memory multicomputers." Full text open access at:, 1992. http://content.ohsu.edu/u?/etd,191.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

張少能 and Siu-nang Bruce Cheung. "A theory of automatic language acquisition." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1994. http://hub.hku.hk/bib/B31233521.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Josifovski, Ljubomir. "Robust automatic speech recognition with missing and unreliable data." Thesis, University of Sheffield, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.275021.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Wang, Wei. "Automatic Chinese calligraphic font generation with machine learning technology." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3950605.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Wong, Angela Sai On. "A fully automatic analytic approach to budget-constrained system upgrade." Thesis, University of British Columbia, 1987. http://hdl.handle.net/2429/26670.

Full text
Abstract:
This thesis describes the development of a software package to upgrade computer systems. The package, named OPTIMAL, solves the following problem: given an existing computer system and its workload, a budget, and the costs and descriptions of available upgrade alternatives for devices in the system, what is the most cost-effective way of upgrading and tuning the system to produce the optimal system throughput? To enhance the practicality of OPTIMAL, the research followed two criteria: i) input required by OPTIMAL must be system and workload characteristics directly measurable from the system under consideration; ii) other than gathering the appropriate input data, the package must be completely automated and must not require any specialized knowledge in systems performance evaluation to interpret the results. The output of OPTIMAL consists of the optimal system throughput under the budget constraint, the workload and system configuration (or upgrade strategy) that provide such throughput, and the cost of the upgrade. Various optimization techniques, including saturation analysis and fine tuning, have been applied to enhance the performance of OPTIMAL.
Science, Faculty of
Computer Science, Department of
Graduate
APA, Harvard, Vancouver, ISO, and other styles
9

Morgan, Clifford Owen. "Development of computer aided analysis and design software for studying dynamic process operability." Thesis, Georgia Institute of Technology, 1986. http://hdl.handle.net/1853/10187.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Paithoonwattanakij, Kitti. "Automatic pattern recognition techniques for geometrical correction on satellite data." Thesis, University of Dundee, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.293190.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Avadhani, Umesh D. "Data processing in a small transit company using an automatic passenger counter." Thesis, Virginia Tech, 1986. http://hdl.handle.net/10919/45669.

Full text
Abstract:

This thesis describes the work done in the second stage of the implementation of the Automatic Passenger Counter (APC) system at the Roanoke Valley - Metro Transit Company. This second stage deals with the preparation of a few reports and plots that would help the transit managers in efficiently managing the transit system. The reports and plots give an evaluation of the system and service operations that decision makers can use to support their decisions.

For efficient management of the transit system, data on ridership activity, running times, schedule information, and fare revenue are required. From these data it is possible to produce management information reports and summary statistics.

The present data collection program at Roanoke Valley-Metro is carried out by checkers and supervisors who collect ridership and schedule adherence information using manual methods. The information needed for efficient management of transit operations is both difficult and expensive to obtain. The new APC system offers management a new and powerful tool that will enhance its capability to make better decisions when allocating service. The data from the APC are essential for the transit property's ongoing planning and scheduling activities. Management can easily quantify the service demands on a route or for the whole system, as desired by the user.


Master of Science
APA, Harvard, Vancouver, ISO, and other styles
12

McKay, Cory. "Automatic genre classification of MIDI recordings." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81503.

Full text
Abstract:
A software system that automatically classifies MIDI files into hierarchically organized taxonomies of musical genres is presented. This extensible software includes an easy-to-use and flexible GUI. An extensive library of high-level musical features is compiled, including many original features. A novel hybrid classification system is used that makes use of hierarchical, flat and round-robin classification. Both k-nearest neighbour and neural network-based classifiers are used, and feature selection and weighting are performed using genetic algorithms. A thorough review of previous research in automatic genre classification is presented, along with an overview of automatic feature selection and classification techniques. Also included is a discussion of the theoretical issues relating to musical genre, including but not limited to what mechanisms humans use to classify music by genre and how realistic genre taxonomies can be constructed.
APA, Harvard, Vancouver, ISO, and other styles
13

Nelson, Jeffrey Ernest. "Automatic, incremental, on-the-fly garbage collection of actors." Thesis, Virginia Tech, 1989. http://hdl.handle.net/10919/43103.

Full text
Abstract:
Garbage collection is an important topic of research for operating systems, because applications are easier to write and maintain if they are unburdened by the concerns of storage management. The actor computation model is another important topic: it is a powerful, expressive model of concurrent computation. This thesis is motivated by the need for an actor garbage collector for a distributed real-time system under development by the Real-Time Systems Group at Virginia Tech. It is shown that traditional garbage collectors, even those that operate on computational objects, are not sufficient for actors. Three algorithms, with varying degrees of efficiency, are presented as solutions to the actor garbage collection problem. The correctness and execution complexity of the algorithms are derived. Implementation methods are explored, and directions for future research are proposed.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
14

Andersson, Jakob. "Automatic Invoice Data Extraction as a Constraint Satisfaction Problem." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-411596.

Full text
Abstract:
Invoice processing has traditionally been heavily dependent on manual labor, where the task is to identify and move certain information from an origin to a destination. This is a time-demanding task with a high interest in automation to reduce execution time, fault risk and cost. With the ever-growing interest in automation and Artificial Intelligence (AI), this thesis explores the possibilities of automating the task of extracting and mapping information of interest by defining the problem as a Constraint Optimization Problem (COP) using numeric relations between the information present. The problem is then solved by extracting the numerical values in a document and using them as an input space, where each combination of numeric values is tested using a backend solver. Several different models were defined, using different approaches and constraints on relations between possible existing fields. A solution to an invoice was considered correct if the total, tax, net and rounding amounts were estimated correctly. The final best achieved results were 84.30% correct and 8.77% incorrect solutions on a set of 1400 various types of invoices. The achieved results show a promising alternative route to proposed solutions using e.g. machine learning or other intelligent solutions using graphical or positional data. Because it regards only the numerical values present in each document, the proposed solution becomes decentralized and can therefore be implemented and run on any set of invoices without any pre-training phase.
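The numeric-relation idea is easy to picture in code. The sketch below (an illustration under our own assumptions, not the thesis's solver) brute-forces assignments of extracted numbers to the net, tax, total and rounding fields, with `tax <= net` standing in for the richer field constraints the thesis describes:

```python
# Brute-force sketch of the constraint idea: assign numbers scraped from an
# invoice to (net, tax, total) so that total = net + tax + rounding, with
# tax <= net as an illustrative field constraint. A backend CP solver would
# replace this exhaustive search in practice.
from itertools import permutations

def solve_invoice(numbers, tol=0.005):
    for net, tax, total in permutations(numbers, 3):
        if tax > net:                       # heuristic relation between fields
            continue
        rounding = round(total - (net + tax), 2)
        # Rounding amounts are tiny; anything larger means a wrong assignment.
        if abs(rounding) <= 0.05 and abs(total - (net + tax + rounding)) < tol:
            return {"net": net, "tax": tax, "total": total, "rounding": rounding}
    return None

numbers = [2024.0, 100.0, 125.0, 500.0, 625.0]   # values found on a document
print(solve_invoice(numbers))  # {'net': 500.0, 'tax': 125.0, 'total': 625.0, ...}
```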
APA, Harvard, Vancouver, ISO, and other styles
15

Mohapatra, Deepankar. "Automatic Removal of Complex Shadows From Indoor Videos." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc804942/.

Full text
Abstract:
Shadows in indoor scenarios are usually characterized by multiple light sources that produce complex shadow patterns of a single object. Without shadow removal, the foreground object tends to be erroneously segmented. The inconsistent hue and intensity of shadows make automatic removal a challenging task. In this thesis, a dynamic thresholding and transfer learning-based method for removing shadows is proposed. The method suppresses light shadows with a dynamically computed threshold and removes dark shadows using an online learning strategy that is built upon a base classifier trained with manually annotated examples and refined with automatically identified examples in the new videos. Experimental results demonstrate that, despite variation of lighting conditions in videos, our proposed method is able to adapt to the videos and remove shadows effectively. The sensitivity of shadow detection changes slightly with different confidence levels used in example selection for classifier retraining, and a high confidence level usually yields better performance with fewer retraining iterations.
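As a rough illustration of the dynamic-thresholding half of the method (the transfer-learning half is omitted), the following sketch marks weakly darkened pixels as light shadow relative to a background model; the threshold rule here is our own stand-in:

```python
# Rough sketch of dynamic thresholding for light shadows: pixels that darken
# only mildly relative to a background model are treated as shadow rather
# than foreground. The threshold rule (mean minus one std of the darkening
# ratio) is an illustrative stand-in, not the thesis's exact rule.
import numpy as np

def split_foreground_and_light_shadow(frame, background, eps=1e-6):
    ratio = frame.astype(float) / (background.astype(float) + eps)
    darkened = ratio < 1.0
    t = ratio[darkened].mean() - ratio[darkened].std()  # dynamic threshold
    foreground = ratio < t               # strong darkening: object / dark shadow
    light_shadow = darkened & ~foreground
    return foreground, light_shadow

rng = np.random.default_rng(0)
background = np.full((64, 64), 200.0)
frame = background * rng.uniform(0.4, 1.0, size=background.shape)
fg, shadow = split_foreground_and_light_shadow(frame, background)
print(fg.sum(), "foreground pixels;", shadow.sum(), "light-shadow pixels")
```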
APA, Harvard, Vancouver, ISO, and other styles
16

Schurmann, Paul R. "Automatic flowchart displays for software visualisation." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 1998. https://ro.ecu.edu.au/theses/985.

Full text
Abstract:
Understanding large software projects and maintaining them can be a time-consuming process. For instance, when changes are made to source code, corresponding changes have to be made to any related documentation. One large part of the documentation process is the creation and management of diagrams. Currently there are very few automated diagramming systems that can produce diagrams from source code, and the majority of these systems require a significant amount of time to generate diagrams. This research investigates the process of creating flowchart diagrams from source code and how this process can be fully automated. Automating the diagram-creation process can save the developer both time and money, allowing them to concentrate on more critical areas of their project. This thesis involves the design and implementation of a prototype software tool that allows the user to quickly and easily construct meaningful diagrams from source code. The project focuses on translating the Pascal language into flowcharts. The emphasis is on the arrangement of the flowchart, with the goal of creating clear and understandable diagrams.
APA, Harvard, Vancouver, ISO, and other styles
17

Gergel, Barry, and University of Lethbridge Faculty of Arts and Science. "Automatic compression for image sets using a graph theoretical framework." Thesis, Lethbridge, Alta. : University of Lethbridge, Faculty of Arts and Science, 2007, 2007. http://hdl.handle.net/10133/538.

Full text
Abstract:
A new automatic compression scheme that adapts to any image set is presented in this thesis. The proposed scheme requires no a priori knowledge of the properties of the image set. The scheme is obtained using a unified graph-theoretical framework that allows compression strategies to be compared both theoretically and experimentally. The strategy achieves optimal lossless compression by computing a minimum spanning tree of a graph constructed from the image set. For lossy compression, the scheme is near-optimal and a performance guarantee relative to the optimal scheme is provided. Experimental results demonstrate that this compression strategy compares favorably to previously proposed strategies, with improvements of up to 7% in the case of lossless compression and 72% in the case of lossy compression. This thesis also shows that the choice of underlying compression algorithm is important when compressing image sets using the proposed scheme.
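The graph-theoretical idea can be sketched compactly: treat images as nodes, approximate delta-storage cost with a generic compressor, and take a minimum spanning tree. This is our illustration, with zlib standing in for the image codecs the thesis actually compares:

```python
# Sketch: nodes are images, edge weights approximate the cost of storing one
# image as a delta of another, and a minimum spanning tree (Prim's algorithm)
# picks the cheapest set of deltas. zlib is an illustrative stand-in codec.
import zlib
import numpy as np

def delta_cost(a, b):
    # Size of the compressed pixel-wise difference: a cheap proxy for the
    # cost of storing image a as a delta of image b.
    return len(zlib.compress((a - b).tobytes()))

def mst_parents(images):
    """Prim's algorithm over the complete image graph; returns parent[i]
    for each image (the root, image 0, has parent -1)."""
    n = len(images)
    parent = {0: -1}
    best = {i: (delta_cost(images[i], images[0]), 0) for i in range(1, n)}
    while best:
        i = min(best, key=lambda k: best[k][0])
        parent[i] = best.pop(i)[1]
        for j in best:
            c = delta_cost(images[j], images[i])
            if c < best[j][0]:
                best[j] = (c, i)
    return parent

imgs = [np.full((32, 32), v, dtype=np.int16) for v in (10, 12, 50)]
print(mst_parents(imgs))   # e.g. {0: -1, 1: 0, 2: 1} — each image stored as a delta
```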
x, 77 leaves ; 29 cm.
APA, Harvard, Vancouver, ISO, and other styles
18

Zhao, Guang, and 趙光. "Automatic boundary extraction in medical images based on constrained edge merging." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B31223904.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition." Thesis, Queensland University of Technology, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
20

Parshakov, Ilia. "Automatic class labeling of classified imagery using a hyperspectral library." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Geography, c2012, 2012. http://hdl.handle.net/10133/3372.

Full text
Abstract:
Image classification is a fundamental information extraction procedure in remote sensing that is used in land-cover and land-use mapping. Despite being considered a replacement for manual mapping, it still requires some degree of analyst intervention, which makes the process of image classification time-consuming, subjective, and error prone. For example, in unsupervised classification, pixels are automatically grouped into classes, but the user has to manually label the classes as one land-cover type or another. As a general rule, the larger the number of classes, the more difficult it is to assign meaningful class labels. A fully automated post-classification procedure for class labeling was developed in an attempt to alleviate this problem. It labels spectral classes by matching their spectral characteristics with reference spectra. A Landsat TM image of an agricultural area was used for performance assessment. The algorithm was used to label 20-class and 100-class images generated by the ISODATA classifier. The 20-class image was used to compare the technique with traditional manual labeling of classes, and the 100-class image was used to compare it with the Spectral Angle Mapper and Maximum Likelihood classifiers. The proposed technique produced a map with an overall accuracy of 51%, outperforming manual labeling (40% to 45% accuracy, depending on the analyst performing the labeling) and the Spectral Angle Mapper classifier (39%), but underperforming the Maximum Likelihood technique (53% to 63%). The newly developed class-labeling algorithm provided better results for alfalfa, beans, corn, grass and sugar beet, whereas canola, corn, fallow, flax, potato, and wheat were identified with similar or lower accuracy, depending on the classifier it was compared with.
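The matching step lends itself to a short sketch. Below, each unlabeled class's mean spectrum is assigned the label of the closest reference spectrum by spectral angle; the library values are invented for illustration:

```python
# Sketch of the labeling step: match a class's mean spectrum to the closest
# reference spectrum by spectral angle. Library values are illustrative,
# not taken from the thesis's hyperspectral library.
import numpy as np

def spectral_angle(a, b):
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

library = {
    "alfalfa": np.array([0.05, 0.09, 0.07, 0.45, 0.30]),
    "fallow":  np.array([0.12, 0.15, 0.18, 0.22, 0.25]),
}

def label_class(mean_spectrum):
    return min(library, key=lambda name: spectral_angle(mean_spectrum, library[name]))

print(label_class(np.array([0.06, 0.10, 0.08, 0.40, 0.28])))  # -> 'alfalfa'
```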
vii, 93 leaves : ill., maps (some col.) ; 29 cm
APA, Harvard, Vancouver, ISO, and other styles
21

Lind, Marcus. "Automatic Segmentation of Knee Cartilage Using Quantitative MRI Data." Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-138403.

Full text
Abstract:
This thesis investigates whether support vector machine classification is a suitable approach for automatic segmentation of knee cartilage using quantitative magnetic resonance imaging data. The data sets used are part of a clinical project that investigates whether patients who have suffered recent knee damage will develop cartilage damage. The thesis therefore also investigates whether the segmentation results can be used to predict the clinical outcome of the patients. Two methods that perform the segmentation using support vector machine classification are implemented and evaluated. The evaluation indicates that it is a good approach for the task, but the implemented methods need to be further improved and tested on more data sets before clinical use. It was not possible to relate the cartilage properties to clinical outcome using the segmentation results. However, the investigation demonstrated good promise for how the segmentation results, if improved, could be used in combination with quantitative magnetic resonance imaging data to analyze how cartilage properties change over time or vary between knees.
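As a schematic of the classification setup (not the thesis's implementation), the sketch below trains an SVM on synthetic per-voxel features; the feature choice and distributions are assumptions for illustration:

```python
# Schematic sketch: an SVM separating cartilage voxels from background using
# two synthetic quantitative-MRI features (e.g. a T2-like value and a
# normalized intensity). Feature choice and distributions are assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
cartilage = np.column_stack([rng.normal(45, 5, 200), rng.normal(0.7, 0.1, 200)])
background = np.column_stack([rng.normal(80, 10, 200), rng.normal(0.2, 0.1, 200)])
X = np.vstack([cartilage, background])
y = np.array([1] * 200 + [0] * 200)      # 1 = cartilage voxel

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict([[47.0, 0.65]]))       # expected: [1] (cartilage-like voxel)
```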
APA, Harvard, Vancouver, ISO, and other styles
22

Nicolson, Aaron M. "Deep Learning for Minimum Mean-Square Error and Missing Data Approaches to Robust Speech Processing." Thesis, Griffith University, 2020. http://hdl.handle.net/10072/399974.

Full text
Abstract:
Speech corrupted by background noise (or noisy speech) can cause misinterpretation and fatigue during phone and conference calls, and for hearing aid users. Noisy speech can also severely impact the performance of speech processing systems such as automatic speech recognition (ASR), automatic speaker verification (ASV), and automatic speaker identification (ASI) systems. Currently, deep learning approaches are employed in an end-to-end fashion to improve robustness. The target speech (or clean speech) is used as the training target or large noisy speech datasets are used to facilitate multi-condition training. In this dissertation, we propose competitive alternatives to the preceding approaches by updating two classic robust speech processing techniques using deep learning. The two techniques include minimum mean-square error (MMSE) and missing data approaches. An MMSE estimator aims to improve the perceived quality and intelligibility of noisy speech. This is accomplished by suppressing any background noise without distorting the speech. Prior to the introduction of deep learning, MMSE estimators were the standard speech enhancement approach. MMSE estimators require the accurate estimation of the a priori signal-to-noise ratio (SNR) to attain a high level of speech enhancement performance. However, current methods produce a priori SNR estimates with a large tracking delay and a considerable amount of bias. Hence, we propose a deep learning approach to a priori SNR estimation that is significantly more accurate than previous estimators, called Deep Xi. Through objective and subjective testing across multiple conditions, such as real-world non-stationary and coloured noise sources at multiple SNR levels, we show that Deep Xi allows MMSE estimators to produce the highest quality enhanced speech amongst all clean speech magnitude spectrum estimators. Missing data approaches improve robustness by performing inference only on noisy speech features that reliably represent clean speech. In particular, the marginalisation method was able to significantly increase the robustness of Gaussian mixture model (GMM)-based speech classification systems (e.g. GMM-based ASR, ASV, or ASI systems) in the early 2000s. However, deep neural networks (DNNs) used in current speech classification systems are non-probabilistic, a requirement for marginalisation. Hence, multi-condition training or noisy speech pre-processing is used to increase the robustness of DNN-based speech classification systems. Recently, sum-product networks (SPNs) were proposed, which are deep probabilistic graphical models that can perform the probabilistic queries required for missing data approaches. While available toolkits for SPNs are in their infancy, we show through an ASI task that SPNs using missing data approaches could be a strong alternative for robust speech processing in the future. This dissertation demonstrates that MMSE estimators and missing data approaches are still relevant approaches to robust speech processing when assisted by deep learning.
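For context, the "current methods" of a priori SNR estimation that the abstract contrasts Deep Xi with are decision-directed estimators; the classic form (Ephraim and Malah) can be written as:

```latex
% Decision-directed a priori SNR estimate at frame l. \hat{A}(l-1) is the
% previous clean-speech magnitude estimate, \lambda_d the noise power,
% \gamma the a posteriori SNR; the smoothing weight \alpha (typically ~0.98)
% is the source of the tracking delay and bias mentioned in the abstract.
\hat{\xi}(l) = \alpha \,\frac{\hat{A}^{2}(l-1)}{\lambda_{d}(l-1)}
             + (1-\alpha)\,\max\{\gamma(l)-1,\; 0\}
```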
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
23

Shenoy, U. Nagaraj. "Automatic Data Partitioning By Hierarchical Genetic Search." Thesis, Indian Institute of Science, 1996. https://etd.iisc.ac.in/handle/2005/172.

Full text
Abstract:
The introduction of languages like High Performance Fortran (HPF), which allow the programmer to indicate how the arrays used in the program are to be distributed across the local memories of a multi-computer, has not completely unburdened the parallel programmer from the intricacies of these architectures. In order to tap the full potential of these architectures, the compiler has to perform the crucial task of data partitioning automatically. This would not only unburden the programmer but would make programs more efficient, since the compiler can be made intelligent enough to take care of architectural nuances. The topic of this thesis, automatic data partitioning, deals with finding the best data partition for the various arrays used in the entire program in such a way that the cost of executing the entire program is minimized. The compiler could resort to runtime redistribution of the arrays at various points in the program if found profitable. Several aspects of this problem have been proven to be NP-complete, and other researchers have suggested heuristic solutions. In this thesis we propose a genetic algorithm, the Hierarchical Genetic Search algorithm, to solve this problem.
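The genetic-search idea can be sketched briefly. In the toy below (ours, not the thesis's hierarchical variant), a chromosome assigns each array a distribution, and a stand-in cost function plays the role of the compiler's cost model:

```python
# Toy sketch of genetic search over data partitions: a chromosome assigns each
# array a distribution; fitness would come from a compiler cost model of
# communication + computation, replaced here by an illustrative stand-in.
import random

DISTS = ["block", "cyclic", "replicated"]

def toy_cost(assignment):
    # Stand-in for the compiler's cost estimate of a whole-program partition.
    return sum({"block": 1, "cyclic": 2, "replicated": 3}[d] for d in assignment)

def genetic_search(n_arrays, pop_size=20, generations=50, mutation_rate=0.1):
    population = [[random.choice(DISTS) for _ in range(n_arrays)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=toy_cost)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_arrays)
            child = a[:cut] + b[cut:]                    # one-point crossover
            if random.random() < mutation_rate:          # point mutation
                child[random.randrange(n_arrays)] = random.choice(DISTS)
            children.append(child)
        population = survivors + children
    return min(population, key=toy_cost)

print(genetic_search(4))   # converges to ['block', 'block', 'block', 'block']
```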
CDAC
APA, Harvard, Vancouver, ISO, and other styles
24

Shenoy, U. Nagaraj. "Automatic Data Partitioning By Hierarchical Genetic Search." Thesis, Indian Institute of Science, 1996. http://hdl.handle.net/2005/172.

Full text
Abstract:
CDAC
The introduction of languages like High Performance Fortran (HPF), which allow the programmer to indicate how the arrays used in the program are to be distributed across the local memories of a multi-computer, has not completely unburdened the parallel programmer from the intricacies of these architectures. In order to tap the full potential of these architectures, the compiler has to perform the crucial task of data partitioning automatically. This would not only unburden the programmer but would make programs more efficient, since the compiler can be made intelligent enough to take care of architectural nuances. The topic of this thesis, automatic data partitioning, deals with finding the best data partition for the various arrays used in the entire program in such a way that the cost of executing the entire program is minimized. The compiler could resort to runtime redistribution of the arrays at various points in the program if found profitable. Several aspects of this problem have been proven to be NP-complete, and other researchers have suggested heuristic solutions. In this thesis we propose a genetic algorithm, the Hierarchical Genetic Search algorithm, to solve this problem.
APA, Harvard, Vancouver, ISO, and other styles
25

Van der Walt, Craig. "An investigation into the practical implementation of speech recognition for data capturing." Thesis, Cape Technikon, 1993. http://hdl.handle.net/20.500.11838/1156.

Full text
Abstract:
Thesis (Master Diploma (Technology))--Cape Technikon, Cape Town, 1993
A study into the practical implementation of speech recognition for the purposes of data capturing within Telkom SA is described. As data capturing is increasing in demand, a more efficient method of capturing is sought. The technology relating to speech recognition is examined, and practical guidelines for selecting a speech recognition system are described. These guidelines are used to show how commercially available systems can be evaluated. Specific tests on a selected speech recognition system are described, relating to the accuracy and adaptability of the system. The results obtained illustrate why, at present, speech recognition systems are not advisable for the purpose of data capturing. The results also demonstrate how the selection of keywords can affect system performance. Areas of further research are highlighted relating to recognition performance and vocabulary selection.
APA, Harvard, Vancouver, ISO, and other styles
26

Boying, Lu, Zhang Jun, Nie Shuhui, and Huang Xinjian. "AUTOMATIC DEPENDENT SURVEILLANCE (ADS) SYSTEM RESEARCH AND DEVELOPMENT." International Foundation for Telemetering, 2002. http://hdl.handle.net/10150/607495.

Full text
Abstract:
International Telemetering Conference Proceedings / October 21, 2002 / Town & Country Hotel and Conference Center, San Diego, California
This paper presents the basic concept, construction principle and implementation work for the Automatic Dependent Surveillance (ADS) system. As a part of the ADS system, the ADS message processing system based on a PC was given particular attention. Furthermore, the paper introduces the status of ADS trials and points out that ADS implementation will bring tremendous economic and social benefits.
APA, Harvard, Vancouver, ISO, and other styles
27

Farley, Mark Harrison. "Predicting machining accuracy and duration of an NC mill by computer simulation." Thesis, Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/16499.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Yang, Wenwei, and 楊文衛. "Development and application of automatic monitoring system for standard penetration test in site investigation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B36811919.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Palmer, David Donald. "Modeling uncertainty for information extraction from speech data /." Thesis, Connect to this title online; UW restricted, 2001. http://hdl.handle.net/1773/5834.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Alvarado, Mantecon Jesus Gerardo. "Towards the Automatic Classification of Student Answers to Open-ended Questions." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39093.

Full text
Abstract:
One of the main research challenges nowadays in the context of Massive Open Online Courses (MOOCs) is effectively automating the evaluation of text-based assessments. Text-based assessments, such as essay writing, have been proven to be better indicators of a higher level of understanding than machine-scored assessments (e.g., multiple-choice questions). Nonetheless, due to the rapid growth of MOOCs, text-based evaluation has become a difficult task for human markers, creating the need for automated grading systems. In this thesis, we focus on the automated short answer grading (ASAG) task, which automatically assesses natural language answers to open-ended questions as correct or incorrect. We propose an ensemble supervised machine learning approach that relies on two types of classifiers: a response-based classifier, which centers on feature extraction from available responses, and a reference-based classifier, which considers the relationships between responses, model answers and questions. For each classifier, we explored a set of features based on words and entities. For the response-based classifier, we tested and compared five features: traditional n-gram models, entity URIs (Uniform Resource Identifiers) and entity mentions, both extracted using a semantic annotation API, entity mention embeddings based on GloVe, and entity URI embeddings extracted from Wikipedia. For the reference-based classifier, we explored fourteen features: cosine similarity between sentence embeddings of student answers and model answers; the number of overlapping elements (words, entity URIs, entity mentions) between student answers and model answers or the question text; the Jaccard similarity coefficient between student answers and model answers or the question text (based on words, entity URIs or entity mentions); and a sentence embedding representation. We evaluated our classifiers on three datasets, two of which belong to the SemEval ASAG competition (Dzikovska et al., 2013). Our results show that, in general, reference-based features perform much better than response-based features in terms of accuracy and macro-average f1-score. Within the reference-based approach, we observe that the S6 embedding representation, which considers the question text, student answer and model answer, generated the best-performing models; nonetheless, combining it with other similarity features helped build more accurate classifiers. As for response-based classifiers, models based on traditional n-gram features remained the best. Finally, we combined our best reference-based and response-based classifiers using an ensemble learning model. Our ensemble classifiers combining both approaches achieved the best results for one of the evaluation datasets, but underperformed on the remaining two. We also compared the best two classifiers with some of the main state-of-the-art results of the SemEval competition. Our final embedded meta-classifier outperformed the top-ranking result on the SemEval Beetle dataset, and our top classifier on SemEval SciEntBank, trained on reference-based features, obtained second position. In conclusion, the reference-based approach, powered mainly by sentence-level embeddings and other similarity features, proved to generate the most efficient models on two out of three datasets, and the ensemble model was the best on the SemEval Beetle dataset.
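Two of the reference-based features are simple enough to sketch directly; in the illustration below (ours), random vectors stand in for the sentence embeddings a real encoder would produce:

```python
# Sketch of two reference-based features: word-level Jaccard similarity and
# cosine similarity between sentence embeddings. Random vectors stand in for
# embeddings from a real sentence encoder.
import numpy as np

def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

student = "the battery stores chemical energy"
model = "a battery stores energy in chemical form"
print(jaccard(student, model))            # word-overlap feature

rng = np.random.default_rng(0)
e_student, e_model = rng.normal(size=384), rng.normal(size=384)
print(cosine(e_student, e_model))         # embedding-similarity feature
```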
APA, Harvard, Vancouver, ISO, and other styles
31

Truter, J. N. J. "Using CAMAC hardware for access to a particle accelerator." Master's thesis, University of Cape Town, 1988. http://hdl.handle.net/11427/17049.

Full text
Abstract:
Includes bibliographical references and index.
The design and implementation of a method for software-interfacing high-level application programs used for the control and monitoring of a particle accelerator is described. Effective methods of interfacing the instrumentation bus system with a real-time multitasking operating system were examined and optimized for efficient utilization of the operating-system software and available hardware. Various methods of accessing the instrumentation bus are implemented, as well as demand-response servicing of the instruments on the bus.
APA, Harvard, Vancouver, ISO, and other styles
32

Goussard, George Willem. "Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems." Thesis, Stellenbosch : University of Stellenbosch, 2011. http://hdl.handle.net/10019.1/6686.

Full text
Abstract:
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2011.
ENGLISH ABSTRACT: This thesis presents a system that is designed to replace the manual process of generating a pronunciation dictionary for use in automatic speech recognition. The proposed system has several stages. The first stage segments the audio into what will be known as the subword units, using a frequency domain method. In the second stage, dynamic time warping is used to determine the similarity between the segments of each possible pair of these acoustic segments. These similarities are used to cluster similar acoustic segments into acoustic clusters. The final stage derives a pronunciation dictionary from the orthography of the training data and corresponding sequence of acoustic clusters. This process begins with an initial mapping between words and their sequence of clusters, established by Viterbi alignment with the orthographic transcription. The dictionary is refined iteratively by pruning redundant mappings, hidden Markov model estimation and Viterbi re-alignment in each iteration. This approach is evaluated experimentally by applying it to two subsets of the TIMIT corpus. It is found that, when test words are repeated often in the training material, the approach leads to a system whose accuracy is almost as good as one trained using the phonetic transcriptions. When test words are not repeated often in the training set, the proposed approach leads to better results than those achieved using the phonetic transcriptions, although the recognition is poor overall in this case.
AFRIKAANSE OPSOMMING (translated): The goal of this thesis is to describe a system designed to replace the manual process of compiling a dictionary for use in automatic speech recognition systems. The proposed system consists of a number of steps. The first step segments the audio into so-called subword units using a frequency-domain technique. In the second step, the dynamic time warping algorithm is used to determine the similarity between the segments of every possible pair of acoustic segments. These similarities are then used to cluster the acoustic segments into acoustic groups. The final step compiles the dictionary using the orthographic transcription of the training data and the corresponding sequence of acoustic groups. It begins with an initial mapping from words to their sequences of cluster identifiers, established by Viterbi alignment with the orthographic transcription. The dictionary is refined iteratively by pruning redundant mappings, training hidden Markov models, and applying Viterbi re-alignment in each iteration. The approach was evaluated experimentally on two subsets of the TIMIT corpus. It was found that, when words are repeated in the training data, the approach matches the accuracy of a system trained on the phonetic transcriptions. When words are not repeated in the training data, the approach is more accurate than a system trained on the phonetic transcriptions, although recognition is poor overall in this case.
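The pairwise-similarity step described above is classic dynamic time warping, sketched here (our illustration) for two short feature sequences:

```python
# Sketch of the pairwise-similarity step: dynamic time warping between two
# feature sequences (frames x features), as used to compare candidate
# subword segments before clustering. Sequences below are toy examples.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # frame-level distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

seg1 = np.array([[0.0], [1.0], [2.0], [2.0]])
seg2 = np.array([[0.0], [2.0], [2.0]])
print(dtw_distance(seg1, seg2))  # small value -> acoustically similar segments
```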
APA, Harvard, Vancouver, ISO, and other styles
33

Yeung, Dit-yan, and 楊瓞仁. "A hierarchical approach to the automatic identification of Putonghua unvoiced consonants in isolated syllables." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1985. http://hub.hku.hk/bib/B42128195.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Jungfer, Kim Michael. "Semi automatic generation of CORBA interfaces for databases in molecular biology." Thesis, University College London (University of London), 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.272561.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Palm, Myllylä Johannes. "Domain Adaptation for Hypernym Discovery via Automatic Collection of Domain-Specific Training Data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157693.

Full text
Abstract:
Identifying semantic relations in natural language text is an important component of many knowledge extraction systems. This thesis studies the task of hypernym discovery, i.e. discovering terms that are related by the hypernymy (is-a) relation. Specifically, it explores how state-of-the-art methods for hypernym discovery perform when applied to specific language domains. Current state-of-the-art methods for hypernym discovery are mostly supervised machine learning models that leverage distributional word representations such as word embeddings. These models require labeled training data in the form of term pairs that are known to be related by hypernymy, and such labeled training data is often not available when working with a specific language domain. This thesis presents experiments with an automatic training data collection algorithm that leverages a pre-defined domain-specific vocabulary and the lexical resource WordNet to extract training pairs automatically. The thesis contributes experimental results from leveraging such automatically collected domain-specific training data for domain adaptation. Experiments are conducted in two different domains: one with a large amount of text data, and another with a much smaller amount. Results show that the automatically collected training data has a positive impact on performance in both domains; the performance boost is most significant in the domain with a large amount of text data, with mean average precision increasing by up to 8 points.
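The collection algorithm's core idea can be sketched with NLTK's WordNet interface (our illustration; the domain vocabulary below is invented):

```python
# Sketch of the collection idea: for each term in a domain vocabulary, walk
# its WordNet hypernym links and emit (term, hypernym) training pairs.
# Requires `nltk` with the WordNet corpus downloaded.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def hypernym_pairs(vocabulary):
    pairs = set()
    for term in vocabulary:
        for synset in wn.synsets(term, pos=wn.NOUN):
            for hyper in synset.hypernyms():
                for lemma in hyper.lemma_names():
                    pairs.add((term, lemma.replace("_", " ")))
    return pairs

# Toy medical-domain vocabulary, purely for illustration.
print(sorted(hypernym_pairs(["stent", "catheter"]))[:5])
```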
APA, Harvard, Vancouver, ISO, and other styles
36

Ravishankar, Mahesh. "Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1400085733.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Swarnkar, Divya. "Experience and analysis of the real time data acquisition system." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 59 p, 2005. http://proquest.umi.com/pqdweb?did=994252331&sid=12&Fmt=2&clientId=8331&RQT=309&VName=PQD.

Full text
Abstract:
Thesis (M.S.)--University of Delaware, 2005.
Principal faculty advisors: Martin Swany, Dept. of Computer & Information Sciences; and David Seckel, Dept. of Physics & Astronomy. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
38

Salvi, Giampiero. "Mining Speech Sounds : Machine Learning Methods for Automatic Speech Recognition and Analysis." Doctoral thesis, Stockholm : KTH School of Computer Science and Comunication, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4111.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Mazidi, Karen. "Infusing Automatic Question Generation with Natural Language Understanding." Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc955021/.

Full text
Abstract:
Automatically generating questions from text for educational purposes is an active research area in natural language processing. The automatic question generation system accompanying this dissertation is MARGE, which is a recursive acronym for: MARGE automatically reads, generates and evaluates. MARGE generates questions from both individual sentences and the passage as a whole, and is the first question generation system to successfully generate meaningful questions from textual units larger than a sentence. Prior work in automatic question generation from text treats a sentence as a string of constituents to be rearranged into as many questions as allowed by English grammar rules. Consequently, such systems overgenerate and create mainly trivial questions. Further, none of these systems to date has been able to automatically determine which questions are meaningful and which are trivial, because the research focus has been placed on NLG at the expense of NLU. In contrast, the work presented here infuses the question generation process with natural language understanding. From the input text, MARGE creates a meaning analysis representation for each sentence in a passage via the DeconStructure algorithm presented in this work. Questions are generated from sentence meaning analysis representations using templates. The generated questions are automatically evaluated for question quality and importance via a ranking algorithm.
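The template step can be pictured with a toy sketch (ours; MARGE's meaning analysis representation is far richer than a bare triple):

```python
# Toy sketch of template-based question generation from a (subject, verb,
# object) triple. The naive "verb + s" inflection works only for regular
# verbs; a real system would use proper morphology.
def questions_from_triple(subj: str, verb: str, obj: str) -> list[str]:
    return [
        f"What does {subj} {verb}?",   # asks for the object
        f"What {verb}s {obj}?",        # asks for the subject
    ]

print(questions_from_triple("the mitochondrion", "produce", "ATP"))
```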
APA, Harvard, Vancouver, ISO, and other styles
40

Hernańdez, Correa Evelio. "Control of nonlinear systems using input-output information." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/11176.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Paul, Douglas James. "Parallel microcomputer control of a 3DOF robotic arm." Thesis, Georgia Institute of Technology, 1989. http://hdl.handle.net/1853/18371.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Park, Jonghun. "Structural analysis and control of resource allocation systems using petri nets." Diss., Georgia Institute of Technology, 2000. http://hdl.handle.net/1853/24529.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Jung, Uk. "Wavelet-based Data Reduction and Mining for Multiple Functional Data." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5084.

Full text
Abstract:
Advanced technology such as various types of automatic data acquisition, management, and networking systems has created a tremendous capability for managers to access valuable production information to improve their operation quality and efficiency. Signal processing and data mining techniques are more popular than ever in many fields, including intelligent manufacturing. As data sets increase in size, their exploration, manipulation, and analysis become more complicated and resource consuming. Timely synthesized information such as functional data is needed for product design, process trouble-shooting, quality/efficiency improvement and resource allocation decisions. A major obstacle in such intelligent manufacturing systems is that tools for processing a large volume of information coming from numerous stages of manufacturing operations are not available. Thus, the underlying theme of this thesis is to reduce the size of data in a mathematically rigorous framework, and to apply existing or new procedures to the reduced-size data for various decision-making purposes. This thesis first proposes the Wavelet-based Random-effect Model, which can generate multiple functional data signals that have wide fluctuations (between-signal variations) in the time domain. The random-effect wavelet atom position in the model has a locally focused impact, which distinguishes it from traditional random-effect models in the biological field. For data-size reduction, in order to deal with heterogeneously selected wavelet coefficients for different single curves, this thesis introduces the newly defined Wavelet Vertical Energy metric of multiple curves and utilizes it for an efficient data reduction method. The proposed method selects important positions for the whole set of multiple curves by comparing each vertical energy metric against a threshold (Vertical Energy Threshold, VET), which is optimally decided based on an objective function that balances the reconstruction error against the data reduction ratio. Based on the class membership information of each signal, this thesis then proposes the Vertical Group-Wise Threshold method to increase the discriminative capability of the reduced-size data, so that the reduced data set retains salient differences between classes as much as possible. A real-life example (tonnage data) shows the proposed method is promising.
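The vertical-energy selection is easy to sketch with PyWavelets (our illustration; the signals and the 90th-percentile stand-in for the optimized VET are assumptions):

```python
# Sketch of the vertical-energy idea: decompose every curve with the same
# wavelet, sum squared coefficients across curves at each position, and keep
# only positions whose vertical energy clears a threshold (the VET).
# Requires PyWavelets; signals and threshold are illustrative.
import numpy as np
import pywt

curves = [np.sin(np.linspace(0, 4 * np.pi, 128)) + 0.1 * i for i in range(5)]
coeffs = [np.concatenate(pywt.wavedec(c, "db4", level=3)) for c in curves]
C = np.vstack(coeffs)                       # curves x coefficient positions

vertical_energy = (C ** 2).sum(axis=0)      # energy per position, across curves
vet = np.quantile(vertical_energy, 0.90)    # stand-in for the optimized VET
keep = vertical_energy >= vet               # shared positions kept for all curves
print(keep.sum(), "of", keep.size, "coefficient positions kept")
```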
APA, Harvard, Vancouver, ISO, and other styles
44

Sanford, Jerald Patrick. "An automatic system for converting digitized line drawings into highly compressed mathematical primitives." Thesis, Virginia Polytechnic Institute and State University, 1985. http://hdl.handle.net/10919/101259.

Full text
Abstract:
The design of an efficient, low-cost system for automatically converting a hardcopy technical drawing into a highly compressed electronic representation is the motivation for this work. An improved method for extracting line and region information from a typical engineering drawing is presented. An efficient encoding method has also been proposed that takes advantage of the preprocessing done by the region and line extraction steps. Finally, a technique for creating a highly compressed mathematical representation (based on spline approximations) for the drawing is presented.
M.S.
APA, Harvard, Vancouver, ISO, and other styles
45

何敏聰 and Man-chung Ho. "A recognizer of Guangdonghua: development of speech controlled telephone directory system." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31220903.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Ma, Bin, and 馬斌. "A study on acoustic modeling and adaptation in HMM-based speech recognition." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B31242145.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Jouenne, Vincent Y. "Critical Issues in the Processing of cDNA Microarray Images." Thesis, Virginia Tech, 2001. http://hdl.handle.net/10919/33960.

Full text
Abstract:
Microarray technology enables simultaneous monitoring of gene expression levels for thousands of genes. While this technology has been recognized as a powerful and cost-effective tool for large-scale analysis, the many systematic sources of experimental variation introduce inherent errors in the extracted data. Data are gathered by processing scanned images of microarray slides; robust image processing is therefore particularly important and has a large impact on downstream analysis. The processing of the scanned images can be subdivided into three phases: gridding, segmentation and data extraction. To measure gene expression levels, the processing of cDNA microarray images must overcome a large set of issues in these three phases, which motivates this study. This study presents automatic gridding methods and compares their performance. Two segmentation techniques already in use, the Seeded Region Growing algorithm and the Mann-Whitney test, are examined and their limitations presented. Finally, we study the data extraction method used in MicroArray Suite (MS), a microarray analysis software package, via synthetic images, and explain its intricacies.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
48

Ghosh, Sushmita. "Real time data acquisition for load management." Thesis, Virginia Tech, 1985. http://hdl.handle.net/10919/45726.

Full text
Abstract:
Demand for data transfer between computers has increased ever since the introduction of the personal computer (PC). Data communication on the personal computer is much more productive because the PC is an intelligent terminal that can connect to various hosts on the same I/O hardware circuit as well as execute processes on its own as an isolated system. Yet the PC on its own is useless for data communication: it requires a hardware interface circuit and software for controlling the handshaking signals and setting up communication parameters. Often the data are distorted by noise in the line; such transmission errors are embedded in the data and require careful filtering. This thesis deals with the development of a data acquisition system that collects real-time load and weather data and stores them as a historical database for use in a load forecast algorithm in a load management system. A filtering technique has been developed that checks for transmission errors in the raw data. The microcomputers used in this development are the IBM PC/XT and the AT&T 3B2 supermicro computer.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
49

"Automatic topic detection from news stories." 2001. http://library.cuhk.edu.hk/record=b5890594.

Full text
Abstract:
Hui Kin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 115-120).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Topic Detection Problem --- p.2
Chapter 1.1.1 --- What is a Topic? --- p.2
Chapter 1.1.2 --- Topic Detection --- p.3
Chapter 1.2 --- Our Contributions --- p.5
Chapter 1.2.1 --- Thesis Organization --- p.6
Chapter 2 --- Literature Review --- p.7
Chapter 2.1 --- Dragon Systems --- p.7
Chapter 2.2 --- University of Massachusetts (UMass) --- p.9
Chapter 2.3 --- Carnegie Mellon University (CMU) --- p.10
Chapter 2.4 --- BBN Technologies --- p.11
Chapter 2.5 --- IBM T. J. Watson Research Center --- p.12
Chapter 2.6 --- National Taiwan University (NTU) --- p.13
Chapter 2.7 --- Drawbacks of Existing Approaches --- p.14
Chapter 3 --- System Overview --- p.16
Chapter 3.1 --- News Sources --- p.17
Chapter 3.2 --- Story Preprocessing --- p.21
Chapter 3.3 --- Named Entity Extraction --- p.22
Chapter 3.4 --- Gross Translation --- p.22
Chapter 3.5 --- Unsupervised Learning Module --- p.24
Chapter 4 --- Term Extraction and Story Representation --- p.27
Chapter 4.1 --- IBM Intelligent Miner For Text --- p.28
Chapter 4.2 --- Transformation-based Error-driven Learning --- p.31
Chapter 4.2.1 --- Learning Stage --- p.32
Chapter 4.2.2 --- Design of New Tags --- p.33
Chapter 4.2.3 --- Lexical Rules Learning --- p.35
Chapter 4.2.4 --- Contextual Rules Learning --- p.39
Chapter 4.3 --- Extracting Named Entities Using Learned Rules --- p.42
Chapter 4.4 --- Story Representation --- p.46
Chapter 4.4.1 --- Basic Representation --- p.46
Chapter 4.4.2 --- Enhanced Representation --- p.47
Chapter 5 --- Gross Translation --- p.52
Chapter 5.1 --- Basic Translation --- p.52
Chapter 5.2 --- Enhanced Translation --- p.60
Chapter 5.2.1 --- Parallel Corpus Alignment Approach --- p.60
Chapter 5.2.2 --- Enhanced Translation Approach --- p.62
Chapter 6 --- Unsupervised Learning Module --- p.68
Chapter 6.1 --- Overview of the Discovery Algorithm --- p.68
Chapter 6.2 --- Topic Representation --- p.70
Chapter 6.3 --- Similarity Calculation --- p.72
Chapter 6.3.1 --- Similarity Score Calculation --- p.72
Chapter 6.3.2 --- Time Adjustment Scheme --- p.74
Chapter 6.3.3 --- Language Normalization Scheme --- p.75
Chapter 6.4 --- Related Elements Combination --- p.78
Chapter 7 --- Experimental Results and Analysis --- p.84
Chapter 7.1 --- TDT corpora --- p.84
Chapter 7.2 --- Evaluation Methodology --- p.85
Chapter 7.3 --- Experimental Results on Various Parameter Settings --- p.88
Chapter 7.4 --- Experimental Results on Various Named Entity Extraction Approaches --- p.89
Chapter 7.5 --- Experimental Results on Various Story Representation Approaches --- p.100
Chapter 7.6 --- Experimental Results on Various Translation Approaches --- p.104
Chapter 7.7 --- Experimental Results on the Effect of the Language Normalization Scheme on Detection Approaches --- p.106
Chapter 7.8 --- TDT2000 Topic Detection Result --- p.110
Chapter 8 --- Conclusions and Future Works --- p.112
Chapter 8.1 --- Conclusions --- p.112
Chapter 8.2 --- Future Work --- p.114
Bibliography --- p.115
Chapter A --- List of Topics annotated for TDT2 Corpus --- p.121
Chapter B --- Significant Test Results --- p.124
APA, Harvard, Vancouver, ISO, and other styles
50

Yao, Yi-Cheng, and 姚奕丞. "An Automatic Pre-Processing Framework for Big Data." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/58812215485504227958.

Full text
Abstract:
Master's thesis
Chung Hua University
Department of Information Management
Academic year 104 (2015)
Big data is one of the hottest topics in today's IT industry. With advancing storage technology and hardware, not only is the quantity of data growing incredibly fast, but the types of data and file formats are becoming more and more diverse. These raw data have to go through pre-processing steps such as capturing, analysis, correction and testing, and often suffer from problems like incomplete data, noise and inconsistency. Today, when data are pre-processed, significant amounts of money are spent on modeling, and more time and manpower are required to repeat execution and testing for data of similar sources. One solution to these issues is a consistent data pre-processing procedure that deals with data from a variety of sources automatically. This study designs an automatic pre-processing framework for big data as such a solution.

The framework consists of six function modules: source upload, data format standardization, XML data registration, data cleansing and transformation, data export, and rule management. The "source upload" module transfers the source uploaded by a user to a server; the "data format standardization" module analyzes the source and converts the data into a unified standard format by capturing a specified range; the "XML data registration" module keeps the standardized data in the system for access by other modules; the "data cleansing and transformation" module processes the data using data cleansing and transformation techniques; the "data export" module exports files in various formats as needed; and the "rule management" module is in charge of accessing data in the database and communicating with other modules for automation.

Based on this design concept, an automatic pre-processing system for big data was built using object-oriented techniques. The system was then tested and evaluated using practical data of various types and from a number of sources. The evaluation suggested that the system was capable of effectively converting data of different types and sources into the standardized format. The data cleansing and transformation techniques were able to process data with incomplete contents, noise and/or inconsistency and convert them into the file format required by users for their specific analysis environments. By processing data from various sources with this system, a standard mechanism was built for automatic pre-processing, saving the time, money and manpower previously required for modeling and repeated execution. When dealing with a wide variety of big data sources, the system converts files into the standard format designed in this study and exports them in the format required for analysis, helping companies handle data from various sources in real time, accurately and completely, and thus allowing decision makers to make the right decisions.
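To picture the module flow, here is a minimal sketch (ours, not the thesis's system) reducing four of the six modules to callable stages over a toy CSV source:

```python
# Minimal sketch of the described pipeline: upload -> standardize -> clean ->
# export. Module names mirror the thesis; bodies are illustrative stubs.
import csv, io, json

def upload(raw: bytes) -> bytes:                 # source upload
    return raw

def standardize(raw: bytes) -> list[dict]:       # data format standardization
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))

def clean(rows: list[dict]) -> list[dict]:       # data cleansing and transformation
    return [
        {k: v.strip() for k, v in r.items() if v and v.strip()}
        for r in rows
        if any(v.strip() for v in r.values())    # drop empty records
    ]

def export(rows: list[dict]) -> str:             # data export (JSON here)
    return json.dumps(rows, ensure_ascii=False)

raw = b"name,score\nAlice, 90\n,\nBob,85\n"
print(export(clean(standardize(upload(raw)))))
```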
APA, Harvard, Vancouver, ISO, and other styles