Journal articles on the topic "Parse data"

Author: Grafiati

Published: 1 June 2024

See the top 50 journal articles for research on the topic "Parse data".

1

Marimon, Montserrat, Núria Bel, and Lluís Padró. "Automatic Selection of HPSG-Parsed Sentences for Treebank Construction". Computational Linguistics 40, no. 3 (September 2014): 523–31. http://dx.doi.org/10.1162/coli_a_00190.

Abstract:

This article presents an ensemble parse approach to detecting and selecting high-quality linguistic analyses output by a hand-crafted HPSG grammar of Spanish implemented in the LKB system. The approach uses full agreement (i.e., exact syntactic match) along with a MaxEnt parse selection model and a statistical dependency parser trained on the same data. The ultimate goal is to develop a hybrid corpus annotation methodology that combines fully automatic annotation and manual parse selection, in order to make the annotation task more efficient while maintaining high accuracy and the high degree of consistency necessary for any foreseen uses of a treebank.

2

Kallmeyer, Laura, and Wolfgang Maier. "Data-Driven Parsing using Probabilistic Linear Context-Free Rewriting Systems". Computational Linguistics 39, no. 1 (March 2013): 87–119. http://dx.doi.org/10.1162/coli_a_00136.

Abstract:

This paper presents the first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRSs). LCFRS, an extension of CFG, can describe discontinuities in a straightforward way and is therefore a natural candidate to be used for data-driven parsing. To speed up parsing, we use different context-summary estimates of parse items, some of them allowing for A* parsing. We evaluate our parser with grammars extracted from the German NeGra treebank. Our experiments show that data-driven LCFRS parsing is feasible and yields output of competitive quality.
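
Illustrative sketch, not taken from the cited paper: the abstract above describes a weighted deductive CYK parser for PLCFRS. The Python below shows only the classic, unweighted CYK recognizer for an ordinary context-free grammar in Chomsky normal form, as a simpler point of reference. The function name `cyk_recognize`, the grammar encoding, and the toy grammar are assumptions made for this example; discontinuities, rule weights, and A* estimates, which the paper addresses, are out of scope here.

```python
from itertools import product


def cyk_recognize(tokens, lexical, binary, start="S"):
    """Return True if `tokens` can be derived from `start` under a CNF grammar.

    lexical: dict mapping a terminal to the set of nonterminals A with A -> terminal
    binary:  dict mapping a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False
    # chart[i][j] holds the nonterminals that derive tokens[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        chart[i][i + 1] = set(lexical.get(tok, ()))
    # Fill spans of increasing width from smaller spans already computed
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]


# Toy grammar (hypothetical): S -> NP VP, VP -> V NP, NP -> "she" | "fish", V -> "eats"
LEXICAL = {"she": {"NP"}, "fish": {"NP"}, "eats": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}

if __name__ == "__main__":
    print(cyk_recognize(["she", "eats", "fish"], LEXICAL, BINARY))
```

The chart is filled bottom-up by span width, which is what gives CYK its cubic running time in the sentence length for a fixed grammar.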

3

Dehbi, Y., C. Staat, L. Mandtler, and L. Plümer. "Incremental Refinement of Façade Models with Attribute Grammar from 3D Point Clouds". ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-3 (6 June 2016): 311–16. http://dx.doi.org/10.5194/isprsannals-iii-3-311-2016.

Abstract:

Data acquisition using unmanned aerial vehicles (UAVs) has gotten more and more attention over the last years. Especially in the field of building reconstruction the incremental interpretation of such data is a demanding task. In this context formal grammars play an important role for the top-down identification and reconstruction of building objects. Up to now, the available approaches expect offline data in order to parse an a-priori known grammar. For mapping on demand an on the fly reconstruction based on UAV data is required. An incremental interpretation of the data stream is inevitable. This paper presents an incremental parser of grammar rules for an automatic 3D building reconstruction. The parser enables a model refinement based on new observations with respect to a weighted attribute context-free grammar (WACFG). The falsification or rejection of hypotheses is supported as well. The parser can deal with and adapt available parse trees acquired from previous interpretations or predictions. Parse trees derived so far are updated in an iterative way using transformation rules. A diagnostic step searches for mismatches between current and new nodes. Prior knowledge on façades is incorporated. It is given by probability densities as well as architectural patterns. Since we cannot always assume normal distributions, the derivation of location and shape parameters of building objects is based on a kernel density estimation (KDE). While the level of detail is continuously improved, the geometrical, semantic and topological consistency is ensured.
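
Illustrative sketch, not taken from the cited paper: the abstract above bases the derivation of object parameters on kernel density estimation. The Python below is a minimal one-dimensional KDE with a Gaussian kernel, f(x) = (1 / (n h)) * sum over i of K((x - x_i) / h). The function name `gaussian_kde`, the fixed bandwidth, and the sample values (imagined window widths in metres) are assumptions for illustration only.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x."""
    if not samples:
        raise ValueError("samples must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(samples)
    # Normalising constant of n Gaussian kernels with standard deviation `bandwidth`
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical observations: three similar window widths and one outlier
SAMPLE_WIDTHS = [1.0, 1.2, 0.9, 3.0]

if __name__ == "__main__":
    density = gaussian_kde(SAMPLE_WIDTHS, bandwidth=0.3)
    print(density(1.0), density(2.0))
```

Unlike a fitted normal distribution, the estimate keeps a separate bump around the outlier, which is the property that matters when the data are not normally distributed.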

4

Dehbi, Y., C. Staat, L. Mandtler, and L. Plümer. "Incremental Refinement of Façade Models with Attribute Grammar from 3D Point Clouds". ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-3 (6 June 2016): 311–16. http://dx.doi.org/10.5194/isprs-annals-iii-3-311-2016.

5

Toutanova, Kristina, Aria Haghighi, and Christopher D. Manning. "A Global Joint Model for Semantic Role Labeling". Computational Linguistics 34, no. 2 (June 2008): 161–91. http://dx.doi.org/10.1162/coli.2008.34.2.161.

Abstract:

We present a model for semantic role labeling that effectively captures the linguistic intuition that a semantic argument frame is a joint structure, with strong dependencies among the arguments. We show how to incorporate these strong dependencies in a statistical joint model with a rich set of features over multiple argument phrases. The proposed model substantially outperforms a similar state-of-the-art local model that does not include dependencies among different arguments. We evaluate the gains from incorporating this joint information on the Propbank corpus, when using correct syntactic parse trees as input, and when using automatically derived parse trees. The gains amount to 24.1% error reduction on all arguments and 36.8% on core arguments for gold-standard parse trees on Propbank. For automatic parse trees, the error reductions are 8.3% and 10.3% on all and core arguments, respectively. We also present results on the CoNLL 2005 shared task data set. Additionally, we explore considering multiple syntactic analyses to cope with parser noise and uncertainty.

6

Homayounfar, Hooman, and Fangju Wang. "Sibling-First Data Organization for Parse-Free XML Data Processing". International Journal of Web Information Systems 2, no. 3/4 (27 September 2007): 176–86. http://dx.doi.org/10.1108/17440080780000298.

7

Clark, Stephen, and James R. Curran. "Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models". Computational Linguistics 33, no. 4 (December 2007): 493–552. http://dx.doi.org/10.1162/coli.2007.33.4.493.

Abstract:

This article describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are “full” parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (up to 25 GB), which is satisfied using a parallel implementation of the BFGS optimization algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly, given CCG's “spurious ambiguity,” the parsing speeds are significantly higher than those reported for comparable parsers in the literature. We also extend the existing parsing techniques for CCG by developing a new model and efficient parsing algorithm which exploits all derivations, including CCG's nonstandard derivations. This model and parsing algorithm, when combined with normal-form constraints, give state-of-the-art accuracy for the recovery of predicate-argument dependencies from CCGbank. The parser is also evaluated on DepBank and compared against the RASP parser, outperforming RASP overall and on the majority of relation types. The evaluation on DepBank raises a number of issues regarding parser evaluation. This article provides a comprehensive blueprint for building a wide-coverage CCG parser. We demonstrate that both accurate and highly efficient parsing is possible with CCG.

8

Ammar, Waleed, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah A. Smith. "Many Languages, One Parser". Transactions of the Association for Computational Linguistics 4 (December 2016): 431–44. http://dx.doi.org/10.1162/tacl_a_00109.

Abstract:

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.

9

Rioth, Matthew J., Ramya Thota, David B. Staggs, Douglas B. Johnson, and Jeremy L. Warner. "Pragmatic precision oncology: the secondary uses of clinical tumor molecular profiling". Journal of the American Medical Informatics Association 23, no. 4 (28 March 2016): 773–76. http://dx.doi.org/10.1093/jamia/ocw002.

Abstract:

Background: Precision oncology increasingly utilizes molecular profiling of tumors to determine treatment decisions with targeted therapeutics. The molecular profiling data is valuable in the treatment of individual patients as well as for multiple secondary uses. Objective: To automatically parse, categorize, and aggregate clinical molecular profile data generated during cancer care as well as use this data to address multiple secondary use cases. Methods: A system to parse, categorize and aggregate molecular profile data was created. A naïve Bayesian classifier categorized results according to clinical groups. The accuracy of these systems was validated against a published expertly-curated subset of molecular profiling data. Results: Following one year of operation, 819 samples have been accurately parsed and categorized to generate a data repository of 10,620 genetic variants. The database has been used for operational, clinical trial, and discovery science research. Conclusions: A real-time database of molecular profiling data is a pragmatic solution to several knowledge management problems in the practice and science of precision oncology.

10

Zou, Feng, Xingshu Chen, Yonggang Luo, Tiemai Huang, Zhihong Liao, and Keer Song. "Spray: Streaming Log Parser for Real-Time Analysis". Security and Communication Networks 2022 (6 September 2022): 1–11. http://dx.doi.org/10.1155/2022/1559270.

Abstract:

Logs is an important source of data in the field of security analysis. Log messages characterized by unstructured text, however, pose extreme challenges to security analysis. To this end, the first issue to be addressed is how to efficiently parse logs into structured data in real-time. The existing log parsers mostly parse raw log files by batch processing and are not applicable to real-time security analysis. It is also difficult to parse large historical log sets with such parsers. Some streaming log parsers also have some demerits in accuracy and parsing performance. To realize automatic, accurate, and efficient real-time log parsing, we propose Spray, a streaming log parser for real-time analysis. Spray can automatically identify the template of a real-time incoming log and accurately match the log and its template for parsing based on the law of contrapositive. We also improve Spray’s parsing performance based on key partitioning and search tree strategies. We conducted extensive experiments from such aspects as accuracy and performance. Experimental results show that Spray is much more accurate in parsing a variety of public log sets and has higher performance for parsing large log sets.

11

Passino, Diana. "Positional factors in syllabification". Acta Linguistica Academica 67, no. 1 (March 2020): 91–108. http://dx.doi.org/10.1556/2062.2020.00007.

Abstract:

From the perspective of standard generative phonological theory, syllable structure is not recorded in the lexicon but it is obtained by means of a syllabification algorithm based on a series of principles. In a given language, the algorithm should parse obstruent+liquid clusters as tautosyllabic both in word-initial and word-internal positions. The tautosyllabic parse as a branching onset complies with all principles on which the syllable-building algorithm is based. In standard theory, if branching onsets of obstruent+liquid are allowed in a language and documented in word-initial position, tautosyllabic parse is predicted to hold also word-internally. Likewise, Kaye’s (1992) Uniformity Principle makes the same prediction, since it states that sequences of contiguous positions that are in a governing relation and contain the same phonological material have the same constituent structure. The present paper draws attention to empirical data showing obstruent+liquid clusters being parsed tautosyllabically in word-initial position and heterosyllabically in word-internal position in the same language. An account is proposed to explain the data discussed, claiming that positional factors may also be relevant in determining syllabification.

12

Barsnes, Harald, Steffen Huber, Albert Sickmann, Ingvar Eidhammer, and Lennart Martens. "OMSSA Parser: An open-source library to parse and extract data from OMSSA MS/MS search results". PROTEOMICS 9, no. 14 (July 2009): 3772–74. http://dx.doi.org/10.1002/pmic.200900037.

13

Merkys, Andrius, Antanas Vaitkus, Justas Butkus, Mykolas Okulič-Kazarinas, Visvaldas Kairys, and Saulius Gražulis. "COD::CIF::Parser: an error-correcting CIF parser for the Perl language". Journal of Applied Crystallography 49, no. 1 (1 February 2016): 292–301. http://dx.doi.org/10.1107/s1600576715022396.

Abstract:

A syntax-correcting CIF parser, COD::CIF::Parser, is presented that can parse CIF 1.1 files and accurately report the position and the nature of the discovered syntactic problems. In addition, the parser is able to automatically fix the most common and the most obvious syntactic deficiencies of the input files. Bindings for Perl, C and Python programming environments are available. Based on COD::CIF::Parser, the cod-tools package for manipulating the CIFs in the Crystallography Open Database (COD) has been developed. The cod-tools package has been successfully used for continuous updates of the data in the automated COD data deposition pipeline, and to check the validity of COD data against the IUCr data validation guidelines. The performance, capabilities and applications of different parsers are compared.

14

Ryazanov, Yu D., and S. V. Nazina. "Building parsers based on syntax diagrams with multiport components". Prikladnaya Diskretnaya Matematika, no. 55 (2022): 102–19. http://dx.doi.org/10.17223/20710410/55/8.

Abstract:

The problem of constructing parsers from syntax diagrams with multiport components (SD) is solved. An algorithm for constructing a parser based on the GLL algorithm is proposed, which results in the compact representation of the input chain parse forest. The proposed algorithm makes it possible to build parsers based on the SD of an arbitrary structure and does not require preliminary SD transformations. We introduce the concepts of “inference tree” and “parsing forest” for SD and describe the data structures used by the parser, such as a graph-structured stack, a parser descriptor, and a compact representation of the parsing forest. The algorithm for constructing parsers based on SD is described and an example of parser constructing is given.

15

Wang, Dingquan, and Jason Eisner. "The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages". Transactions of the Association for Computational Linguistics 4 (December 2016): 491–505. http://dx.doi.org/10.1162/tacl_a_00113.

Abstract:

We release Galactic Dependencies 1.0—a large set of synthetic languages not found on Earth, but annotated in Universal Dependencies format. This new resource aims to provide training and development data for NLP methods that aim to adapt to unfamiliar languages. Each synthetic treebank is produced from a real treebank by stochastically permuting the dependents of nouns and/or verbs to match the word order of other real languages. We discuss the usefulness, realism, parsability, perplexity, and diversity of the synthetic languages. As a simple demonstration of the use of Galactic Dependencies, we consider single-source transfer, which attempts to parse a real target language using a parser trained on a “nearby” source language. We find that including synthetic source languages somewhat increases the diversity of the source pool, which significantly improves results for most target languages.

16

Wang, Dingquan, and Jason Eisner. "Surface Statistics of an Unknown Language Indicate How to Parse It". Transactions of the Association for Computational Linguistics 6 (December 2018): 667–85. http://dx.doi.org/10.1162/tacl_a_00248.

Abstract:

We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous work’s interpretable typological features that require parsed corpora or expert categorization of the language. Our best method improved attachment scores on held-out test languages by an average of 5.6 percentage points over past work that does not inspect the unparsed data (McDonald et al., 2011), and by 20.7 points over past “grammar induction” work that does not use training languages (Naseem et al., 2010).

17

Bod, Rens. "Do all fragments count?" Natural Language Engineering 9, no. 4 (25 November 2003): 307–23. http://dx.doi.org/10.1017/s1351324903003140.

Abstract:

We aim at finding the minimal set of fragments that achieves maximal parse accuracy in Data Oriented Parsing (DOP). Experiments with the Penn Wall Street Journal (WSJ) treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank. We isolate a number of dependency relations which previous models neglect but which contribute to higher accuracy. We show that the history of statistical parsing models displays a tendency towards using more and larger fragments from training data.

18

Kramer, Frank, Michaela Bayerlová, Florian Klemm, Annalen Bleckmann, and Tim Beißbarth. "rBiopaxParser—an R package to parse, modify and visualize BioPAX data". Bioinformatics 29, no. 4 (28 December 2012): 520–22. http://dx.doi.org/10.1093/bioinformatics/bts710.

19

Tsivtsivadze, Evgeni, Tapio Pahikkala, Jorma Boberg, and Tapio Salakoski. "Locality kernels for sequential data and their applications to parse ranking". Applied Intelligence 31, no. 1 (4 March 2008): 81–88. http://dx.doi.org/10.1007/s10489-008-0114-2.

20

Lin, Ziheng, Hwee Tou Ng, and Min-Yen Kan. "A PDTB-styled end-to-end discourse parser". Natural Language Engineering 20, no. 2 (6 November 2012): 151–84. http://dx.doi.org/10.1017/s1351324912000307.

Abstract:

Since the release of the large discourse-level annotation of the Penn Discourse Treebank (PDTB), research work has been carried out on certain subtasks of this annotation, such as disambiguating discourse connectives and classifying Explicit or Implicit relations. We see a need to construct a full parser on top of these subtasks and propose a way to evaluate the parser. In this work, we have designed and developed an end-to-end discourse parser to parse free texts in the PDTB style in a fully data-driven approach. The parser consists of multiple components joined in a sequential pipeline architecture, which includes a connective classifier, argument labeler, explicit classifier, non-explicit classifier, and attribution span labeler. Our trained parser first identifies all discourse and non-discourse relations, locates and labels their arguments, and then classifies the sense of the relation between each pair of arguments. For the identified relations, the parser also determines the attribution spans, if any, associated with them. We introduce novel approaches to locate and label arguments, and to identify attribution spans. We also significantly improve on the current state-of-the-art connective classifier. We propose and present a comprehensive evaluation from both component-wise and error-cascading perspectives, in which we illustrate how each component performs in isolation, as well as how the pipeline performs with errors propagated forward. The parser gives an overall system F1 score of 46.80 percent for partial matching utilizing gold standard parses, and 38.18 percent with full automation.

21

Jie, Hu, Jia Quan Feng, and Da Lin Chen. "Some Improvements of Genetic Programming in Data Fitting". Advanced Materials Research 201-203 (February 2011): 2536–39. http://dx.doi.org/10.4028/www.scientific.net/amr.201-203.2536.

Abstract:

This paper proposed some improvement measures of Genetic Programming (GP) in data fitting, including developed new ways of crossover and mutation, improved the calculation efficiency greatly, and avoided the problem of parse tree expansion. The new adopted mutation method improved the problem of constant modification to some extent. Numerical simulation obtained a considerable good fitting and prediction precision.

22

Cameron, Sharon, Nicky Chong-White, Kiri Mealings, Tim Beechey, Harvey Dillon, and Taegan Young. "The Parsing Syllable Envelopes Test for Assessment of Amplitude Modulation Discrimination Skills in Children: Development, Normative Data, and Test–Retest Reliability Studies". Journal of the American Academy of Audiology 29, no. 02 (February 2018): 151–63. http://dx.doi.org/10.3766/jaaa.16146.

Abstract:

Intensity peaks and valleys in the acoustic signal are salient cues to syllable structure, which is accepted to be a crucial early step in phonological processing. As such, the ability to detect low-rate (envelope) modulations in signal amplitude is essential to parse an incoming speech signal into smaller phonological units. The Parsing Syllable Envelopes (ParSE) test was developed to quantify the ability of children to recognize syllable boundaries using an amplitude modulation detection paradigm. The envelope of a 750-msec steady-state /a/ vowel is modulated into two or three pseudo-syllables using notches with modulation depths varying between 0% and 100% along an 11-step continuum. In an adaptive three-alternative forced-choice procedure, the participant identified whether one, two, or three pseudo-syllables were heard. Development of the ParSE stimuli and test protocols, and collection of normative and test–retest reliability data. Eleven adults (aged 23 yr 10 mo to 50 yr 9 mo, mean 32 yr 10 mo) and 134 typically developing, primary-school children (aged 6 yr 0 mo to 12 yr 4 mo, mean 9 yr 3 mo). There were 73 males and 72 females. Data were collected using a touchscreen computer. Psychometric functions (PFs) were automatically fit to individual data by the ParSE software. Performance was related to the modulation depth at which syllables can be detected with 88% accuracy (referred to as the upper boundary of the uncertainty region [UBUR]). A shallower PF slope reflected a greater level of uncertainty. Age effects were determined based on raw scores. z Scores were calculated to account for the effect of age on performance. Outliers, and individual data for which the confidence interval of the UBUR exceeded a maximum allowable value, were removed. Nonparametric tests were used as the data were skewed toward negative performance. Across participants, the performance criterion (UBUR) was met with a median modulation depth of 42%. The effect of age on the UBUR was significant (p < 0.00001). The UBUR ranged from 50% modulation depth for 6-yr-olds to 25% for adults. Children aged 6–10 had significantly higher uncertainty region boundaries than adults. A skewed distribution toward negative performance occurred (p = 0.00007). There was no significant difference in performance on the ParSE between males and females (p = 0.60). Test–retest z scores were strongly correlated (r = 0.68, p < 0.0000001). The ParSE normative data show that the ability to identify syllable boundaries based on changes in amplitude modulation improves with age, and that some children in the general population have performance much worse than their age peers. The test is suitable for use in planned studies in a clinical population.

23

Huang, Jui-Chan, Po-Chang Ko, Cher-Min Fong, Sn-Man Lai, Hsin-Hung Chen, and Ching-Tang Hsieh. "Statistical Modeling and Simulation of Online Shopping Customer Loyalty Based on Machine Learning and Big Data Analysis". Security and Communication Networks 2021 (18 February 2021): 1–12. http://dx.doi.org/10.1155/2021/5545827.

Abstract:

With the increase in the number of online shopping users, customer loyalty is directly related to product sales. This research mainly explores the statistical modeling and simulation of online shopping customer loyalty based on machine learning and big data analysis. This research mainly uses machine learning clustering algorithm to simulate customer loyalty. Call the k-means interactive mining algorithm based on the Hash structure to perform data mining on the multidimensional hierarchical tree of corporate credit risk, continuously adjust the support thresholds for different levels of data mining according to specific requirements and select effective association rules until satisfactory results are obtained. After conducting credit risk assessment and early warning modeling for the enterprise, the initial preselected model is obtained. The information to be collected is first obtained by the web crawler from the target website to the temporary web page database, where it will go through a series of preprocessing steps such as completion, deduplication, analysis, and extraction to ensure that the crawled web page is correctly analyzed, to avoid incorrect data due to network errors during the crawling process. The correctly parsed data will be stored for the next step of data cleaning or data analysis. For writing a Java program to parse HTML documents, first set the subject keyword and URL and parse the HTML from the obtained file or string by analyzing the structure of the website. Secondly, use the CSS selector to find the web page list information, retrieve the data, and store it in Elements. In the overall fit test of the model, the root mean square error approximation (RMSEA) value is 0.053, between 0.05 and 0.08. The results show that the model designed in this study achieves a relatively good fitting effect and strengthens customers’ perception of shopping websites, and relationship trust plays a greater role in maintaining customer loyalty.
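
Illustrative sketch, not taken from the cited paper: the abstract above describes a Java program that parses HTML and uses a CSS selector to pull list information out of a page. The Python below approximates a selector such as `ul.products li` using only the standard library's `html.parser`. The class name `ListItemExtractor`, the helper `extract_items`, and the sample markup are hypothetical, and nested lists are not handled.

```python
from html.parser import HTMLParser


class ListItemExtractor(HTMLParser):
    """Collect the text of every <li> inside a <ul> that carries a given class."""

    def __init__(self, list_class):
        super().__init__()
        self.list_class = list_class
        self.in_list = False  # currently inside the target <ul>
        self.in_item = False  # currently inside an <li> of that list
        self.items = []

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated class names
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and self.list_class in classes:
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False
        elif tag == "ul":
            self.in_list = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


def extract_items(html, list_class):
    """Return the stripped text of each list item in the matching list."""
    parser = ListItemExtractor(list_class)
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


# Hypothetical page fragment: only the first list has the target class
SAMPLE = '<ul class="products"><li>Lamp</li><li> Desk </li></ul><ul><li>Other</li></ul>'

if __name__ == "__main__":
    print(extract_items(SAMPLE, "products"))
```

A dedicated selector engine would cover far more of the CSS syntax; the point here is only the flow the abstract names: parse the document, select the list, and collect its entries for later cleaning.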

24

Xie, Yi, Huimei Chen, Vasuki Ranjani Chellamuthu, Ahmad bin Mohamed Lajam, Salvatore Albani, Andrea Hsiu Ling Low, Enrico Petretto, and Jacques Behmoaras. "Comparative Analysis of Single-Cell RNA Sequencing Methods with and without Sample Multiplexing". International Journal of Molecular Sciences 25, no. 7 (29 March 2024): 3828. http://dx.doi.org/10.3390/ijms25073828.

Testo completo

Gli stili APA, Harvard, Vancouver, ISO e altri

Abstract (sommario):

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique for investigating biological heterogeneity at the single-cell level in human systems and model organisms. Recent advances in scRNA-seq have enabled the pooling of cells from multiple samples into single libraries, thereby increasing sample throughput while reducing technical batch effects, library preparation time, and the overall cost. However, a comparative analysis of scRNA-seq methods with and without sample multiplexing is lacking. In this study, we benchmarked methods from two representative platforms: Parse Biosciences (Parse; with sample multiplexing) and 10x Genomics (10x; without sample multiplexing). By using peripheral blood mononuclear cells (PBMCs) obtained from two healthy individuals, we demonstrate that demultiplexed scRNA-seq data obtained from Parse showed similar cell type frequencies compared to 10x data where samples were not multiplexed. Despite relatively lower cell capture affecting library preparation, Parse can detect rare cell types (e.g., plasmablasts and dendritic cells) which is likely due to its relatively higher sensitivity in gene detection. Moreover, a comparative analysis of transcript quantification between the two platforms revealed platform-specific distributions of gene length and GC content. These results offer guidance for researchers in designing high-throughput scRNA-seq studies.

25

Gormley, Matthew R., Mark Dredze, and Jason Eisner. "Approximation-Aware Dependency Parsing by Belief Propagation". Transactions of the Association for Computational Linguistics 3 (December 2015): 489–501. http://dx.doi.org/10.1162/tacl_a_00153.

Abstract:

We show how to train the fast dependency parser of Smith and Eisner (2008) for improved accuracy. This parser can consider higher-order interactions among edges while retaining $O(n^3)$ runtime. It outputs the parse with maximum expected recall, but for speed this expectation is taken under a posterior distribution that is constructed only approximately, using loopy belief propagation through structured factors. We show how to adjust the model parameters to compensate for the errors introduced by this approximation, by following the gradient of the actual loss on training data. We find this gradient by back-propagation. That is, we treat the entire parser (approximations and all) as a differentiable circuit, as others have done for loopy CRFs (Domke, 2010; Stoyanov et al., 2011; Domke, 2011; Stoyanov and Eisner, 2012). The resulting parser obtains higher accuracy with fewer iterations of belief propagation than one trained by conditional log-likelihood.

26

Taghizadeh, Nasrin, and Heshaam Faili. "Cross-lingual Adaptation Using Universal Dependencies". ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 4 (May 26, 2021): 1–23. http://dx.doi.org/10.1145/3448251.

Abstract:

We describe a cross-lingual adaptation method based on syntactic parse trees obtained from the Universal Dependencies (UD), which are consistent across languages, to develop classifiers in low-resource languages. The idea of UD parsing is to capture similarities as well as idiosyncrasies among typologically different languages. In this article, we show that models trained using UD parse trees for complex NLP tasks can characterize very different languages. We study two tasks of paraphrase identification and relation extraction as case studies. Based on UD parse trees, we develop several models using tree kernels and show that these models trained on the English dataset can correctly classify data of other languages, e.g., French, Farsi, and Arabic. The proposed approach opens up avenues for exploiting UD parsing in solving similar cross-lingual tasks, which is very useful for languages for which no labeled data is available.

27

Ren, Wenbo, Xinran Bian, Jiayuan Gong, Anqing Chen, Ming Li, Zhuofei Xia, and Jingnan Wang. "Analysis and Visualization of New Energy Vehicle Battery Data". Future Internet 14, no. 8 (July 26, 2022): 225. http://dx.doi.org/10.3390/fi14080225.

Abstract:

To use Li-ion batteries safely and efficiently and to extend their life, it is important to analyze the original battery data accurately and to predict the state of charge (SOC) quickly. However, most current work analyzes SOC directly, while analysis of the original battery data and of how to obtain the factors affecting SOC is still lacking. This paper therefore uses a visualization method to preprocess, clean, and parse the collected original battery data (hexadecimal), then visualizes and analyzes the parsed data, and finally uses the K-Nearest Neighbor (KNN) algorithm to predict the SOC. Experiments show that the method can fully parse hexadecimal battery data based on the GB/T32960 standard, including three different types of messages: vehicle login, real-time information reporting, and vehicle logout. At the same time, the visualization method is used to analyze the factors affecting SOC intuitively and concisely. Additionally, the KNN algorithm identifies the K value and P value using dynamic parameters, and the resulting mean square error (MSE) and test score are 0.625 and 0.998, respectively. Across the overall experimental process, the method can analyze the battery data from the source, visually analyze the various factors, and predict SOC.
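
The prediction step named in this abstract can be sketched as a plain K-Nearest Neighbor regressor. This is a minimal illustration and not the authors' implementation: the features (voltage, current), the training values, and the reading of the "P value" as the exponent of the Minkowski distance are all assumptions made for the example.

```python
def minkowski(a, b, p):
    """Minkowski distance between two feature vectors (p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_x, train_y, query, k=3, p=2):
    """Predict a value for `query` as the mean target of its k nearest neighbours."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: minkowski(pair[0], query, p),
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical parsed records: (voltage in V, current in A) -> SOC in percent.
train_x = [(3.2, 10.0), (3.6, 8.0), (3.7, 8.0), (3.8, 7.0), (4.1, 5.0)]
train_y = [20, 50, 60, 70, 95]

soc = knn_predict(train_x, train_y, query=(3.7, 8.0), k=3, p=2)
print(soc)
```

In practice K and p would be chosen by validation on held-out data rather than fixed by hand as they are here.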

28

Bar-Haim, Roy, Ido Dagan, and Jonathan Berant. "Knowledge-Based Textual Inference via Parse-Tree Transformations". Journal of Artificial Intelligence Research 54 (September 9, 2015): 1–57. http://dx.doi.org/10.1613/jair.4584.

Abstract:

Textual inference is an important component in many applications for understanding natural language. Classical approaches to textual inference rely on logical representations for meaning, which may be regarded as "external" to the natural language itself. However, practical applications usually adopt shallower lexical or lexical-syntactic representations, which correspond closely to language structure. In many cases, such approaches lack a principled meaning representation and inference framework. We describe an inference formalism that operates directly on language-based structures, particularly syntactic parse trees. New trees are generated by applying inference rules, which provide a unified representation for varying types of inferences. We use manual and automatic methods to generate these rules, which cover generic linguistic structures as well as specific lexical-based inferences. We also present a novel packed data-structure and a corresponding inference algorithm that allows efficient implementation of this formalism. We proved the correctness of the new algorithm and established its efficiency analytically and empirically. The utility of our approach was illustrated on two tasks: unsupervised relation extraction from a large corpus, and the Recognizing Textual Entailment (RTE) benchmarks.

29

Collins, Michael, and Terry Koo. "Discriminative Reranking for Natural Language Parsing". Computational Linguistics 31, no. 1 (March 2005): 25–70. http://dx.doi.org/10.1162/0891201053630273.

Abstract:

This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this initial ranking, using additional features of the tree as evidence. The strength of our approach is that it allows a tree to be represented as an arbitrary set of features, without concerns about how these features interact or overlap and without the need to define a derivation or a generative model which takes these features into account. We introduce a new method for the reranking task, based on the boosting approach to ranking problems described in Freund et al. (1998). We apply the boosting method to parsing the Wall Street Journal treebank. The method combined the log-likelihood under a baseline model (that of Collins [1999]) with evidence from an additional 500,000 features over parse trees that were not included in the original model. The new model achieved 89.75% F-measure, a 13% relative decrease in F-measure error over the baseline model's score of 88.2%. The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the parsing data. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach. We argue that the method is an appealing alternative, in terms of both simplicity and efficiency, to work on feature selection methods within log-linear (maximum-entropy) models. Although the experiments in this article are on natural language parsing (NLP), the approach should be applicable to many other NLP problems which are naturally framed as ranking tasks, for example, speech recognition, machine translation, or natural language generation.
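
The "13% relative decrease in F-measure error" follows from the two scores quoted in the abstract: the error is taken as 100 minus the F-measure, and the reduction is expressed relative to the baseline error. A short check of that arithmetic (the function name is ours, chosen for illustration):

```python
def relative_error_reduction(baseline_f, new_f):
    """Relative reduction in error, where error = 100 - F-measure (in percent)."""
    baseline_error = 100.0 - baseline_f   # 11.8 for the baseline model
    new_error = 100.0 - new_f             # 10.25 for the reranked model
    return (baseline_error - new_error) / baseline_error


reduction = relative_error_reduction(88.2, 89.75)
print(f"{reduction:.1%}")  # about 13%
```

So an absolute gain of 1.55 F-measure points corresponds to removing roughly one eighth of the baseline's remaining error.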

30

Owhonda, Golden, Anwuri Luke, Japheth Russell Inyele, and Chidinma Eze-Emiri. "The data analysis and validation engine: an application of artificial intelligence in the improvement of COVID-19 data management". International Journal Of Community Medicine And Public Health 9, no. 6 (May 27, 2022): 2437. http://dx.doi.org/10.18203/2394-6040.ijcmph20221517.

Abstract:

Background: The proper management of healthcare data is fundamental to the health system processes; artificial intelligence has proven its value in these processes. Artificial intelligence can simplify the management of information, improve data security, and automate data flow. It is also useful in the analysis and interpretation of big data. Hence, it has the possibility of screening and diagnosing diseases, categorizing disease severity, detecting therapeutic agents, and forecasting outbreak spots. Methods: A data analysis and validation engine (DAVE) was developed to perform data quality control checks, classify addresses, and generate epidemiology numbers using the index and parse commands on the command-line interface of DAVE. Results: DAVE correctly formatted data and created a local copy of the datastore and the index. It also returned previous EPID numbers to each entry and assigned a new EPID number to missed entries. DAVE imported the entries into the data template of the existing data management tool and generated a sample manifest that is then sent to the laboratory. The data flow from the point of collection to storage and reporting was assessed as 100% accurate without errors and in real time; there was also the ability to roll back if any error occurred. Conclusions: DAVE is a semi-autonomous system that operates with minimal human intervention; it is faster because it leverages computing power to parse, store, and retrieve data while practically eliminating the need for manual data quality assessment. The DAVE functionality can be extended to incorporate additional features such as forecasting outbreaks of emerging/re-emerging diseases, categorizing the severity of diseases, and analysis of data in our setting.

31

Graben, Peter beim, Markus Huber, Werner Meyer, Ronald Römer, and Matthias Wolff. "Vector Symbolic Architectures for Context-Free Grammars". Cognitive Computation 14, no. 2 (December 24, 2021): 733–48. http://dx.doi.org/10.1007/s12559-021-09974-y.

Abstract:

Vector symbolic architectures (VSA) are a viable approach for the hyperdimensional representation of symbolic data, such as documents, syntactic structures, or semantic frames. We present a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e. infinite-dimensional Hilbert space as being used in quantum field theory. We define a novel normal form for CFG by means of term algebras. Using a recently developed software toolbox, called FockBox, we construct Fock space representations for the trees built up by a CFG left-corner (LC) parser. We prove a universal representation theorem for CFG term algebras in Fock space and illustrate our findings through a low-dimensional principal component projection of the LC parser state. Our approach could leverage the development of VSA for explainable artificial intelligence (XAI) by means of hyperdimensional deep neural computation.

32

Britvin, Artur, Jawad Hammad Alrawashdeh, and Rostyslav Tkachuck. "Client-Server System for Parsing Data from Web Pages". Advances in Cyber-Physical Systems 7, no. 1 (June 27, 2022): 8–14. http://dx.doi.org/10.23939/acps2022.01.008.

Abstract:

The paper gives an overview of the basic principles and approaches for extracting and processing information from web pages. Based on the analyzed information about data parsing, a methodology is created for developing a client-server system built on Selenium, a tool for automating work in web browsers. A third-party API is used as the user interface to simplify and speed up system development. User access without downloading additional software is enabled. Data from web pages are received and processed. Following this methodology, the authors develop their own client-server system, which is used to parse and collect the information presented on web pages. Cloud technology services are analyzed for further deployment of the system that collects data from web pages. Finally, the viability of the system running autonomously in a cloud service over long-term operation is assessed and analyzed.

33

Chen, Jianbo, and Michael Jordan. "LS-Tree: Model Interpretation When the Data Are Linguistic". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3454–61. http://dx.doi.org/10.1609/aaai.v34i04.5749.

Abstract:

We study the problem of interpreting trained classification models in the setting of linguistic data sets. Leveraging a parse tree, we propose to assign least-squares-based importance scores to each word of an instance by exploiting syntactic constituency structure. We establish an axiomatic characterization of these importance scores by relating them to the Banzhaf value in coalitional game theory. Based on these importance scores, we develop a principled method for detecting and quantifying interactions between words in a sentence. We demonstrate that the proposed method can aid in interpretability and diagnostics for several widely-used language models.
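
The Banzhaf value mentioned in this abstract averages a player's marginal contribution over all coalitions of the other players. The sketch below computes it by brute force for a three-word toy sentence; it illustrates the game-theoretic quantity only and is not the LS-Tree least-squares procedure. The words and the scoring function `toy_value` are hypothetical.

```python
from itertools import combinations


def banzhaf_values(players, value):
    """Brute-force Banzhaf value of each player.

    `value` maps a frozenset of players (a coalition) to a number.
    """
    n = len(players)
    scores = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Sum the marginal contribution of p over every coalition of the others.
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += value(coalition | {p}) - value(coalition)
        scores[p] = total / 2 ** (n - 1)
    return scores


def toy_value(words):
    """Hypothetical sentiment score of a subset of words."""
    if "good" in words and "not" in words:
        return -0.5
    if "good" in words:
        return 1.0
    return 0.0


scores = banzhaf_values(["not", "good", "movie"], toy_value)
print(scores)  # {'not': -0.75, 'good': 0.25, 'movie': 0.0}
```

The enumeration is exponential in the number of words, which is why it only serves as an illustration on very short inputs.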

34

Guo, Yinuo, Zeqi Lin, Jian-Guang Lou, and Dongmei Zhang. "Iterative Utterance Segmentation for Neural Semantic Parsing". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (May 18, 2021): 12937–45. http://dx.doi.org/10.1609/aaai.v35i14.17530.

Abstract:

Neural semantic parsers usually fail to parse long and complex utterances into correct meaning representations, because they do not exploit the principle of compositionality. To address this issue, we present a novel framework for boosting neural semantic parsers via iterative utterance segmentation. Given an input utterance, our framework iterates between two neural modules: a segmenter for segmenting a span from the utterance, and a parser for mapping the span into a partial meaning representation. Then, these intermediate parsing results are composed into the final meaning representation. One key advantage is that this framework does not require any handcrafted templates or additional labeled data for utterance segmentation: we achieve this by proposing a novel training method, in which the parser provides pseudo supervision for the segmenter. Experiments on Geo, ComplexWebQuestions, and Formulas show that our framework can consistently improve the performance of neural semantic parsers in different domains. On data splits that require compositional generalization, our framework brings significant accuracy gains: Geo from 63.1 to 81.2, Formulas from 59.7 to 72.7, and ComplexWebQuestions from 27.1 to 56.3.

35

Lintean, Mihai, and Vasile Rus. "Large Scale Experiments with Naive Bayes and Decision Trees for Function Tagging". International Journal on Artificial Intelligence Tools 17, no. 03 (June 2008): 483–99. http://dx.doi.org/10.1142/s0218213008004011.

Abstract:

This paper describes the use of two machine learning techniques, naive Bayes and decision trees, to address the task of assigning function tags to nodes in a syntactic parse tree. Function tags are extra functional information, such as logical subject or predicate, that can be added to certain nodes in syntactic parse trees. We model the function tag assignment problem as a classification problem. Each function tag is regarded as a class, and the task is to find which class/tag a given node in a parse tree belongs to from a set of predefined classes/tags. The paper offers the first systematic comparison of the two techniques, naive Bayes and decision trees, for the task of function tag assignment. The comparison is based on a standardized data set, the Penn Treebank, a collection of sentences annotated with syntactic information including function tags. We found that decision trees generally outperform naive Bayes for the task of function tagging. Furthermore, this is the first large-scale evaluation of decision-tree-based solutions to the task of function tagging.

36

Tshering, Younten, Suyogya Ratna Tamrakar, and Sai Preetham Kamishetty. "Developing programming language with compilers using JFlex in NetBean: Expanding and testing simple operators by implementing a calculator". International Journal for Research in Applied Science and Engineering Technology 10, no. 8 (August 31, 2022): 1992–2005. http://dx.doi.org/10.22214/ijraset.2022.46665.

Abstract:

JFlex is a lexical analyzer generator that takes as input a specification consisting of a set of regular expressions and corresponding actions. It creates a program (a lexer) that reads input, matches the input against the regular expressions, and runs the matching action. This paper shows how a programming language can be developed. The work develops a simple programming language with a compiler using JFlex in NetBeans, supporting assignment statements, if-then-else, while-do, type checking, and execution. The data types included are int, real, char, and Boolean/String. The key concept used in this work is the execution of the grammar rules in the cup file. The parse tree records the sequence of rules the parser applies to recognize the input. The tool used to develop the lexical analyzer is JFlex; JFlex lexers are based on deterministic finite automata (DFAs). To show the implementation and working of operators, a simple calculator was designed that supports addition and multiplication. Further, key compiler concepts such as lexical analysis, semantic analysis, and parse trees are discussed. This paper will help readers understand the syntax and the way to develop a simple language. For the programming language developed, expressions and statements are evaluated recursively. Type checking and error checking are also performed: two operands are checked for compatibility with the operator, and incompatible expressions are reported.
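
The pipeline this abstract describes (a lexer driven by regular expressions, then recursive evaluation of expressions with addition and multiplication) can be sketched without JFlex or CUP. The code below is a hand-written Python analogue under our own assumptions: the token names, the grammar in the comments, and the function `evaluate` are illustrative and do not come from the paper.

```python
import re

# One alternative per token class; whitespace is matched so it can be skipped.
TOKEN_RE = re.compile(
    r"(?P<NUM>\d+(?:\.\d+)?)|(?P<OP>[+*()])|(?P<WS>\s+)|(?P<BAD>.)"
)


def tokenize(text):
    """Turn the input string into a list of (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "NUM":
            tokens.append(("NUM", float(match.group())))
        elif kind == "OP":
            tokens.append((match.group(), None))
        elif kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
    return tokens


class Parser:
    """Recursive-descent evaluator; '*' binds tighter than '+'."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):  # expr := term ('+' term)*
        value = self.term()
        while self.peek() == "+":
            self.advance()
            value += self.term()
        return value

    def term(self):  # term := factor ('*' factor)*
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):  # factor := NUM | '(' expr ')'
        kind = self.peek()
        if kind == "NUM":
            return self.advance()[1]
        if kind == "(":
            self.advance()
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return value
        raise SyntaxError(f"unexpected token {kind!r}")


def evaluate(text):
    """Lex and evaluate an arithmetic expression."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() != "EOF":
        raise SyntaxError("trailing input")
    return result


print(evaluate("2 + 3 * 4"))    # 14.0
print(evaluate("(2 + 3) * 4"))  # 20.0
```

Each grammar rule becomes one method, so the call stack during evaluation mirrors the sequence of rules that a parse tree would record.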

37

Maharaj, Robin, Vipin Balyan, and Mohammed Tariq Kahn. "Design of IIoT device to parse data directly to scada systems using LoRa physical layer". International Journal on Smart Sensing and Intelligent Systems 15, no. 1 (January 1, 2022): 1–13. http://dx.doi.org/10.21307/ijssis-2021-023.

Abstract:

With the advent of the 4th Industrial Revolution and the need to improve productivity levels, there is an increasing need for data analytics. Techniques should be pursued to extend the range of sensors both to local systems and to remote systems. In this paper, the proof of concept is implemented on an STM34F4 development board to realise the performance of two wireless technologies to extend the range of sensor connectivity to a local process control system. The technologies investigated are physical layer LoRa nodes, nrf2401l (a 2.4 GHz radio module), 4 to 20 mA converters together with protocol wrappers and interfaces to enhance the data value chain in a legacy process control system.

38

Tao, Shimin, Weibin Meng, Yimeng Cheng, Yichen Zhu, Ying Liu, Chunning Du, Tao Han, Yongpeng Zhao, Xiangguang Wang, and Hao Yang. "LogStamp". ACM SIGMETRICS Performance Evaluation Review 49, no. 4 (June 2, 2022): 93–98. http://dx.doi.org/10.1145/3543146.3543168.

Abstract:

Logs are among the most critical data for service management. They contain rich runtime information for both services and users. Since logs are often enormous in size and have free-form handwritten structures, a typical log-based analysis first needs to parse logs into a structured format. However, we observe that most existing log parsing methods cannot parse logs online, which is essential for online services. In this paper, we present an automatic online log parsing method named LogStamp. We extensively evaluate LogStamp on five public datasets to demonstrate the effectiveness of our proposed method. The experiments show that our proposed method can achieve high accuracy with only a small portion of the training set. For example, it can achieve an average accuracy of 0.956 when using only 10% of the data for training.

39

Dong, Li, Furu Wei, Shujie Liu, Ming Zhou, and Ke Xu. "A Statistical Parsing Framework for Sentiment Classification". Computational Linguistics 41, no. 2 (June 2015): 293–336. http://dx.doi.org/10.1162/coli_a_00221.

Abstract:

We present a statistical parsing framework for sentence-level sentiment classification in this article. Unlike previous works that use syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence. We show that complicated phenomena in sentiment analysis (e.g., negation, intensification, and contrast) can be handled the same way as simple and straightforward sentiment expressions in a unified and probabilistic way. We formulate the sentiment grammar upon Context-Free Grammars (CFGs), and provide a formal description of the sentiment parsing framework. We develop the parsing model to obtain possible sentiment parse trees for a sentence, from which the polarity model is proposed to derive the sentiment strength and polarity, and the ranking model is dedicated to selecting the best sentiment tree. We train the parser directly from examples of sentences annotated only with sentiment polarity labels but without any syntactic annotations or polarity annotations of constituents within sentences. Therefore we can obtain training data easily. In particular, we train a sentiment parser, s.parser, from a large amount of review sentences with users' ratings as rough sentiment polarity labels. Extensive experiments on existing benchmark data sets show significant improvements over baseline sentiment classification approaches.

40

Hu, Jie, Xi Nong Zhang, and Shi Lin Xie. "An Intelligent Modeling Method Based on Genetic Programming and Genetic Algorithm". Advanced Materials Research 33-37 (March 2008): 795–800. http://dx.doi.org/10.4028/www.scientific.net/amr.33-37.795.

Abstract:

This paper uses Genetic Programming (GP) and a Genetic Algorithm (GA) to analyze experimental data. The purpose of this research is to establish a function model of the data. The core methodology is to use GP to obtain an approximate model first, and then to optimize the parameters and improve the fitness value of the model using GA. To validate this method, two examples are given: one is the reconstruction of the permeability-strain equation of rock from the literature [1]; the other is an automatic function search on wire cable isolator experimental data. For programming the parse tree, this paper adopts a new approach that differs from the three traditional methods: the parse tree is described by a matrix of a special size. More significantly, this new method makes the genetic operations of crossover and mutation intuitive, so that even the transparent Matlab programming language can implement them.

41

Vitagliano, Gerardo, Mazhar Hameed, Lan Jiang, Lucas Reisener, Eugene Wu, and Felix Naumann. "Pollock: A Data Loading Benchmark". Proceedings of the VLDB Endowment 16, no. 8 (April 2023): 1870–82. http://dx.doi.org/10.14778/3594512.3594518.

Abstract:

Any system at play in a data-driven project has a fundamental requirement: the ability to load data. The de-facto standard format to distribute and consume raw data is csv. Yet, the plain text and flexible nature of this format make such files often difficult to parse and correctly load their content, requiring cumbersome data preparation steps. We propose a benchmark to assess the robustness of systems in loading data from non-standard csv formats and with structural inconsistencies. First, we formalize a model to describe the issues that affect real-world files and use it to derive a systematic "pollution" process to generate dialects for any given grammar. Our benchmark leverages the pollution framework for the csv format. To guide pollution, we have surveyed thousands of real-world, publicly available csv files, recording the problems we encountered. We demonstrate the applicability of our benchmark by testing and scoring 16 different systems: popular csv parsing frameworks, relational database tools, spreadsheet systems, and a data visualization tool.
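
To make the loading problem in this abstract concrete, the sketch below reads the same text once with default settings and once with an explicit dialect, then flags rows whose field count differs from the header. It uses Python's standard `csv` module and is not part of the Pollock benchmark; the sample file and the helper names are hypothetical.

```python
import csv
import io


def load_rows(text, delimiter=",", quotechar='"'):
    """Parse CSV text with an explicit delimiter and quote character."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
    )
    return [row for row in reader]


def find_ragged_rows(rows):
    """Return indices of rows whose field count differs from the header's."""
    width = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != width]


# Hypothetical non-standard file: ';' delimiter, single-quote quoting,
# and a last record with a missing field.
sample = "id;name;note\n1;'Smith; John';ok\n2;Lee\n"

naive = load_rows(sample)                                # default dialect
rows = load_rows(sample, delimiter=";", quotechar="'")   # matching dialect
ragged = find_ragged_rows(rows)

print(naive[0])  # ['id;name;note']
print(rows[1])   # ['1', 'Smith; John', 'ok']
print(ragged)    # [2]
```

With the default dialect every line collapses into a single field, which is the kind of silent mis-load that a robustness benchmark is meant to expose.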

42

Song, Hyun-Je, Seong-Bae Park, and Se Young Park. "Computation of Program Source Code Similarity by Composition of Parse Tree and Call Graph". Mathematical Problems in Engineering 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/429807.

Abstract:

This paper proposes a novel method to compute how similar two program source codes are. Since a program source code is represented as a structural form, the proposed method adopts convolution kernel functions as a similarity measure. Actually, a program source code has two kinds of structural information. One is syntactic information and the other is the dependencies of function calls lying on the program. Since the syntactic information of a program is expressed as its parse tree, the syntactic similarity between two programs is computed by a parse tree kernel. The function calls within a program provide a global structure of a program and can be represented as a graph. Therefore, the similarity of function calls is computed with a graph kernel. Then, both structural similarities are reflected simultaneously into comparing program source codes by composing the parse tree and the graph kernels based on a cyclomatic complexity. According to the experimental results on a real data set for program plagiarism detection, the proposed method is proved to be effective in capturing the similarity between programs. The experiments show that the plagiarized pairs of programs are found correctly and thoroughly by the proposed method.

43

Baldridge, Jason, and Miles Osborne. "Active learning and logarithmic opinion pools for HPSG parse selection". Natural Language Engineering 14, no. 2 (April 2008): 191–222. http://dx.doi.org/10.1017/s1351324906004396.

Abstract:

For complex tasks such as parse selection, the creation of labelled training sets can be extremely costly. Resource-efficient schemes for creating informative labelled material must therefore be considered. We investigate the relationship between two broad strategies for reducing the amount of manual labelling necessary to train accurate parse selection models: ensemble models and active learning. We show that popular active learning methods for reducing annotation costs can be outperformed by instead using a model class which uses the available labelled data more efficiently. For this, we use a simple type of ensemble model called the Logarithmic Opinion Pool (LOP). We furthermore show that LOPs themselves can benefit from active learning. As predicted by a theoretical explanation of the predictive power of LOPs, a detailed analysis of active learning using LOPs shows that component model diversity is a strong predictor of successful LOP performance. Other contributions include a novel active learning method, a justification of our simulation studies using timing information, and cross-domain verification of our main ideas using text classification.
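
A logarithmic opinion pool combines component distributions as a weighted product, p(y) ∝ ∏ₖ pₖ(y)^{wₖ}, followed by renormalization. The sketch below applies that general definition to two toy distributions over candidate parses; the labels, probabilities, and weights are hypothetical and not taken from the article.

```python
import math


def logarithmic_opinion_pool(distributions, weights):
    """Combine distributions over the same labels as a weighted geometric mean.

    Every probability must be strictly positive, since logarithms are taken.
    """
    labels = list(distributions[0].keys())
    log_scores = {
        y: sum(w * math.log(d[y]) for d, w in zip(distributions, weights))
        for y in labels
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {y: math.exp(s - top) for y, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {y: v / z for y, v in unnormalized.items()}


# Two hypothetical component models scoring candidate parses A and B.
model_1 = {"A": 0.8, "B": 0.2}
model_2 = {"A": 0.5, "B": 0.5}

pooled = logarithmic_opinion_pool([model_1, model_2], weights=[0.5, 0.5])
print(pooled)  # A is close to 2/3, B close to 1/3
```

Because the pool multiplies probabilities, a parse that any component rates as very unlikely is penalized heavily, unlike in a linear average of the same models.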

44

Tan, Jin Jack, Jiun Cai Ong, Kin Keong Chan, Kam Hing How, and Jee Hou Ho. "Development of a Portable Automated Piano Player CantaPlayer". Applied Mechanics and Materials 284-287 (January 2013): 2037–43. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.2037.

Abstract:

This paper describes the development of a low cost, compact and portable automated piano player CantaPlayer. The system accepts digital MIDI (Musical Instrument Digital Interface) files as input and develops pushing actions against piano keys which in turn produces sounds of notes. CantaPlayer uses Pure Data, an audio processing software to parse MIDI files and serve as user interfaces. The parsed information will be sent to Arduino, an open source microcontroller platform, via serial communication. The Arduino I/O pins will be triggered based on the information from Pure Data of which connected transistors will be activated, acting as a switch to draw in larger power supply to power the solenoids. The solenoids will then push the respective piano keys and produce music. The performance of CantaPlayer is evaluated by examining the synchronousness of the note playing sequence for a source MIDI and the corresponding reproduced MIDI. Three types of MIDI playing sequence (scale, polyphonic and rapid note switching) were tested and the results were satisfactory.

45

Wagdarikar, Rohitkumar Rudrappa, and Sandhya P. "Parallelism in web services: design of parallel XML parser for web services". Indonesian Journal of Electrical Engineering and Computer Science 16, no. 3 (December 1, 2019): 1407. http://dx.doi.org/10.11591/ijeecs.v16.i3.pp1407-1415.

Abstract:

A web service (WS) provides communication between heterogeneous systems. In doing so, we need to focus on the QoS of the consumer, the provider, and the registry directory. Parameters such as WS selection, prediction, and ranking need to be considered when implementing QoS in web services. When integrating web services, we need to focus on QoS requirements regarding server and network performance. The performance of a WS is related to location, i.e., the network distance and the Internet connections between consumer and provider. Many QoS approaches work on QoS data collected by consumers; based on these data, a system can predict the QoS of a WS. Throughput and response time are QoS measures of a WS. In this paper, we propose a parallel XML parser that parses UDDI, WSDL, and SOAP XML files in parallel, which improves the response time and throughput of the WS.
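
The idea of parsing several XML documents concurrently can be sketched with Python's standard library. This is an illustration of the structure only, not the parser proposed in the paper: the three documents are hypothetical stand-ins for UDDI, WSDL, and SOAP files, and with threads in CPython the actual speed-up for CPU-bound parsing may be small.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


def root_tag(xml_text):
    """Parse one XML document and return the tag of its root element."""
    return ET.fromstring(xml_text).tag


def parse_all(documents, max_workers=3):
    """Parse every document in a worker pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tags = list(pool.map(root_tag, documents.values()))
    return dict(zip(documents.keys(), tags))


# Hypothetical, heavily simplified stand-ins for the three file types.
documents = {
    "uddi": "<businessEntity><name>Example</name></businessEntity>",
    "wsdl": "<definitions><service name='Demo'/></definitions>",
    "soap": "<Envelope><Body><ping/></Body></Envelope>",
}

tags = parse_all(documents)
print(tags)
```

`pool.map` returns results in the order of its inputs, so pairing them back with the dictionary keys is safe even when the workers finish out of order.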

46

Sirri, Paul, Elizabeth M. Palmer and Essam Heggy. "Processing and Analysis for Radio Science Experiments (PARSE): Graphical Interface for Bistatic Radar". Planetary Science Journal 3, no. 1 (1 January 2022): 24. http://dx.doi.org/10.3847/psj/ac3a07.

Abstract:

Opportunistic bistatic radar (BSR) observations of planetary surfaces can probe the textural and electrical properties of several solar system bodies without needing a dedicated instrument or additional mission requirements, providing unique insights into volatile enrichment and supporting future landing, anchoring, and in situ sampling. Given their opportunistic nature, complex observation geometries, and required radiometric knowledge of the received radio signal, these data are particularly challenging to process, analyze, and interpret for most planetary science data users, who can be unfamiliar with link budget analysis of received echoes. The above impedes real-time use of BSR data to support mission operations, such as identifying safe landing locations on small bodies, as was the case for the Rosetta mission. To address this deficiency, we develop an open-source graphical user interface, Processing and Analysis for Radio Science Experiments (PARSE), that assesses the feasibility of performing BSR observations and automates radiometric signal processing, power spectral analysis, and visualization of DSN planetary radio science data sets acquired during mission operations or archived on NASA’s Planetary Data System. In this first release, PARSE automates the processing chain developed for Dawn at Asteroid Vesta, streamlining the detection of DSN-received surface-scatter echoes generated as the spacecraft enters/exits occultations behind the target. Future releases will include support for existing Arecibo data sets and other Earth-based radio observatories. Our tool enables the broader planetary science community, beyond planetary radar signal processing experts, to utilize BSR data sets to characterize electrical and textural properties of planetary surfaces. Such tools are becoming increasingly important as the number of space missions (and of subsequent opportunities for orbital radio science observations) continues to grow.

47

Shcherbina, Olha, and Yuliya Vakulenko. "IMPLEMENTATION AND PROSPECTS OF USING PARSER PROGRAMS IN HIGHER EDUCATION INSTITUTIONS". Scientific journal “Library Science. Record Studies. Informology”, no. 2 (1 September 2021): 88–95. http://dx.doi.org/10.32461/2409-9805.2.2021.238787.

Abstract:

The purpose of the article is to develop a concept for organizing communication with the help of parser programs in the personnel department of a higher education institution in order to improve educational activities. Methodology. The research applied methods of informology and social communication, the logical-dialectical method of cognition, and methods of system analysis and abstraction, which made it possible to reveal the content of implementing parser programs and to analyze their use in higher education institutions, taking Vasyl' Stus Donetsk National University as an example. Scientific Novelty. The necessity of introducing and using parser programs in higher education institutions is substantiated. The use of Google Spreadsheets in the human resources department of the university as an information retrieval system and text-digital database to improve the educational activities of higher education institutions has been studied. Conclusions. An analysis of the communications of a higher education institution identifies their main directions: communication between the organization and its external environment; communication between management levels and departments; communication between manager and subordinate; communication between the leader and the working group; and informal communications. The purpose of using parser programs in higher education institutions is primarily to increase the efficiency of the educational process, improve service and reduce data processing costs. The use of parsing in the personnel department of a higher education institution is multifaceted. Today, parser programs perform parsing, lexical analysis and automated analysis, process many information requests, and parse documents, extracting from them the data that users need. The Google Spreadsheets parser program aims to use the latest information and communication technologies, which help improve management, improve employee interaction, and promote social innovation.

48

SANTAMARÍA, JESÚS, and LOURDES ARAUJO. "Pattern-based unsupervised parsing method". Natural Language Engineering 22, no. 3 (4 June 2014): 397–422. http://dx.doi.org/10.1017/s1351324914000072.

Abstract:

We have developed a heuristic method for unsupervised parsing of unrestricted text. Our method relies on detecting certain patterns of part-of-speech tag sequences of words in sentences. This detection is based on statistical data obtained from the corpus and allows us to classify part-of-speech tags into classes that play specific roles in the parse trees. These classes are then used to construct the parse tree of new sentences via a set of deterministic rules. Aiming to assess the viability of the method on different languages, we have tested it on English, Spanish, Italian, Hebrew, German, and Chinese. We have obtained a significant improvement over other unsupervised approaches for some languages, including English, and provided, as far as we know, the first results of this kind for others.

49

Domingues, Patrício, Luís Andrade and Miguel Frade. "A Digital Forensic View of Windows 10 Notifications". Forensic Sciences 2, no. 1 (31 January 2022): 88–106. http://dx.doi.org/10.3390/forensicsci2010007.

Abstract:

Windows Push Notifications (WPN) is a relevant part of Windows 10 interaction with the user. It comprises badges, tiles and toasts. Important and meaningful data can be conveyed by notifications, namely by so-called toasts, which can pop up with information regarding a new incoming email or a recent message from a social network. In this paper, we analyze the Windows 10 notification system from a digital forensic perspective, focusing on the main forensic artifacts conveyed by WPN. We also briefly analyze the WPN system of the first release of Windows 11, observing that its internal data structures are practically identical to those of Windows 10. We provide an open source Python 3 command line application to parse and extract data from the Windows Push Notification SQLite3 database, and a Jython module that allows the well-known Autopsy digital forensic software to interact with the application and thus also to parse and process Windows Push Notifications forensic artifacts. From our study, we observe that forensic data provided by WPN are scarce, although they still need to be considered, namely if traditional Windows forensic artifacts are not available. Furthermore, toasts are clearly WPN’s most relevant source of forensic data.
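The kind of extraction this abstract describes (reading notification rows from an SQLite3 database and pulling the visible text out of toast payloads) can be sketched in Python as follows. This is not the authors' tool: the `notification` table, its `id`, `type` and `payload` columns, and the `demo_db`, `toast_texts` and `read_toasts` helpers are hypothetical names used only for illustration, and the real database may store payloads differently. Only the `<toast>`, `<visual>`, `<binding>` and `<text>` elements follow the general shape of toast XML.

```python
import sqlite3
import xml.etree.ElementTree as ET


def toast_texts(payload):
    """Return the content of every <text> element in a toast XML payload."""
    root = ET.fromstring(payload)
    return [element.text or "" for element in root.iter("text")]


def read_toasts(conn):
    """Return [(id, [texts])] for toast rows of the hypothetical table."""
    rows = conn.execute(
        "SELECT id, payload FROM notification WHERE type = 'toast' ORDER BY id"
    )
    return [(row_id, toast_texts(payload)) for row_id, payload in rows]


def demo_db():
    """Build an in-memory database with a made-up schema and two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notification (id INTEGER PRIMARY KEY, type TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO notification (type, payload) VALUES (?, ?)",
        [
            (
                "toast",
                "<toast><visual><binding template='ToastGeneric'>"
                "<text>New mail</text><text>From Alice</text>"
                "</binding></visual></toast>",
            ),
            ("badge", "<badge value='3'/>"),
        ],
    )
    return conn
```

With the sample data, `read_toasts(demo_db())` yields a single entry for the toast row and skips the badge row.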

50

Desigar, Tyagraj. "Golang Library/CLI Tool to Provide Data Analysis and Conversion Utilities". International Journal for Research in Applied Science and Engineering Technology 9, no. VI (20 June 2021): 1643–45. http://dx.doi.org/10.22214/ijraset.2021.35329.

Abstract:

When developing or testing complex software that deals with data, networks or low-level system software, when doing regular administration of development or production environments, or while learning about security or networks, people often need tools to convert, parse, analyse and use data in various ways. These tools can be difficult to find, scattered, and sometimes much more in-depth than needed. In this paper, we propose a system that aggregates such operations into a library accessible through the command line interface. The goal of this project is to create a versatile library/CLI tool that takes a simple text input, performs the user’s desired operations on it and produces an output.
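The "text in, chosen operation, text out" design described in this abstract can be sketched as a small dispatch table behind a command line parser. The paper's tool is written in Go and its actual operations are not listed here, so this Python sketch is only an assumed shape: the operation names `hex`, `base64` and `upper`, and the `convert` and `main` functions, are hypothetical.

```python
import argparse
import base64

# Hypothetical operation table: each entry maps a name to a str -> str function.
OPERATIONS = {
    "hex": lambda text: text.encode("utf-8").hex(),
    "base64": lambda text: base64.b64encode(text.encode("utf-8")).decode("ascii"),
    "upper": str.upper,
}


def convert(operation, text):
    """Apply the named operation to the text, rejecting unknown names."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return OPERATIONS[operation](text)


def main(argv=None):
    """Parse `<operation> <text>` from argv and print the converted text."""
    parser = argparse.ArgumentParser(description="Tiny text conversion utility")
    parser.add_argument("operation", choices=sorted(OPERATIONS))
    parser.add_argument("text")
    args = parser.parse_args(argv)
    print(convert(args.operation, args.text))
```

For instance, `main(["hex", "hi"])` prints the hexadecimal encoding of the input. Keeping `convert` separate from `main` lets the same operations be used as a library or from the command line, which matches the library/CLI split the abstract mentions.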