Journal articles on the topic "Data-to-text generation"

Below are the top 50 journal articles for research on the topic "Data-to-text generation".


1

Yang, Sen, and Yang Liu. "Data-to-text Generation via Planning." Journal of Physics: Conference Series 1827, no. 1 (March 1, 2021): 012190. http://dx.doi.org/10.1088/1742-6596/1827/1/012190.

2

Puduppully, Ratish, Yao Fu, and Mirella Lapata. "Data-to-text Generation with Variational Sequential Planning." Transactions of the Association for Computational Linguistics 10 (2022): 697–715. http://dx.doi.org/10.1162/tacl_a_00484.

Abstract:
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input. We focus on generating long-form text, that is, documents with multiple paragraphs, and propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way. We infer latent plans sequentially with a structured variational model, while interleaving the steps of planning and generation. Text is generated by conditioning on previous variational decisions and previously generated text. Experiments on two data-to-text benchmarks (RotoWire and MLB) show that our model outperforms strong baselines and is sample-efficient in the face of limited training data (e.g., a few hundred instances).
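The interleaved plan-then-generate loop this abstract describes can be sketched in a few lines of Python. This is a minimal illustration of the control flow only; `plan_posterior` and `decoder` are hypothetical stand-ins for the structured variational model and the text decoder, not the authors' implementation.

```python
# Minimal sketch of interleaved sequential planning and generation,
# assuming hypothetical `plan_posterior` and `decoder` callables.

def generate_document(records, plan_posterior, decoder, num_paragraphs):
    plans, paragraphs = [], []
    for _ in range(num_paragraphs):
        # Infer the next latent plan from the input records, the earlier
        # plans, and the text generated so far.
        z = plan_posterior(records, plans, paragraphs)
        plans.append(z)
        # Generate the next paragraph conditioned on that plan and on
        # the previously generated paragraphs.
        paragraphs.append(decoder(records, z, paragraphs))
    return "\n\n".join(paragraphs)
```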
3

Gong, Heng, Xiaocheng Feng, and Bing Qin. "DiffuD2T: Empowering Data-to-Text Generation with Diffusion." Electronics 12, no. 9 (May 7, 2023): 2136. http://dx.doi.org/10.3390/electronics12092136.

Abstract:
Surrounded by structured data, such as medical data, financial data, knowledge bases, etc., data-to-text generation has become an important natural language processing task that can help people better understand the meaning of those data by providing them with user-friendly text. Existing methods for data-to-text generation show promising results in tackling two major challenges: content planning and surface realization, which transform structured data into fluent text. However, they lack an iterative refinement process for generating text, which can enable the model to perfect the text step-by-step while accepting control over the process. In this paper, we explore enhancing data-to-text generation with an iterative refinement process via diffusion. We have four main contributions: (1) we use the diffusion model to improve the prefix tuning for data-to-text generation; (2) we propose a look-ahead guiding loss to supervise the iterative refinement process for better text generation; (3) we extract content plans from reference text and propose a planning-then-writing pipeline to give the model content planning ability; and (4) we conducted experiments on three data-to-text generation datasets, and both automatic evaluation criteria (BLEU, NIST, METEOR, ROUGE-L, CIDEr, TER, MoverScore, BLEURT, and BERTScore) and human evaluation criteria (Quality and Naturalness) show the effectiveness of our model. Our model can improve the competitive prefix tuning method by 2.19% in terms of the widely-used automatic evaluation criterion BLEU (BiLingual Evaluation Understudy) on the WebNLG dataset with GPT-2 Large as the pretrained language model backbone. Human evaluation criteria also show that our model can improve the quality and naturalness of the generated text across all three datasets.
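The iterative refinement process mentioned here can be pictured as a generic diffusion-style denoising loop over a prefix embedding. This is a hedged sketch under assumed interfaces: `denoiser` is a hypothetical conditional model, not the DiffuD2T architecture.

```python
import torch

# Generic diffusion-style refinement: start from noise and repeatedly
# let a conditional denoiser clean up a prefix embedding, guided by an
# encoding of the structured data. `denoiser` is a hypothetical model.

def refine_prefix(denoiser, data_encoding, steps=50, dim=512):
    x = torch.randn(dim)                    # pure noise at step T
    for t in reversed(range(steps)):        # steps T-1 ... 0
        x = denoiser(x, data_encoding, t)   # one refinement step
    return x                                # refined prefix for the LM
```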
4

Puduppully, Ratish, and Mirella Lapata. "Data-to-text Generation with Macro Planning." Transactions of the Association for Computational Linguistics 9 (2021): 510–27. http://dx.doi.org/10.1162/tacl_a_00381.

Abstract:
Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text that is fluent (but often imprecise) and perform quite poorly at selecting appropriate content and ordering it coherently. To overcome some of these issues, we propose a neural model with a macro planning stage followed by a generation stage, reminiscent of traditional methods which embrace separate modules for planning and surface realization. Macro plans represent high-level organization of important content such as entities, events, and their interactions; they are learned from data and given as input to the generator. Extensive experiments on two data-to-text benchmarks (RotoWire and MLB) show that our approach outperforms competitive baselines in terms of automatic and human evaluation.
5

Zhang, Dell, Jiahao Yuan, Xiaoling Wang, and Adam Foster. "Probabilistic Verb Selection for Data-to-Text Generation." Transactions of the Association for Computational Linguistics 6 (December 2018): 511–27. http://dx.doi.org/10.1162/tacl_a_00038.

Abstract:
In data-to-text Natural Language Generation (NLG) systems, computers need to find the right words to describe phenomena seen in the data. This paper focuses on the problem of choosing appropriate verbs to express the direction and magnitude of a percentage change (e.g., in stock prices). Rather than simply using the same verbs again and again, we present a principled data-driven approach to this problem based on Shannon’s noisy-channel model so as to bring variation and naturalness into the generated text. Our experiments on three large-scale real-world news corpora demonstrate that the proposed probabilistic model can be learned to accurately imitate human authors’ pattern of usage around verbs, outperforming the state-of-the-art method significantly.
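The core idea, choosing a verb by sampling from a conditional distribution estimated from corpus counts rather than always emitting the most frequent verb, can be shown in a toy Python sketch. The counts and magnitude bins below are invented for illustration; the paper's noisy-channel model is richer.

```python
import random
from collections import defaultdict

# Toy noisy-channel-style verb choice for a percentage change, given
# (verb, direction, magnitude_bin) counts from an invented mini-corpus.

counts = defaultdict(int)
corpus = [("rise", "up", "small"), ("soar", "up", "large"),
          ("climb", "up", "small"), ("plunge", "down", "large"),
          ("dip", "down", "small"), ("soar", "up", "large")]
for verb, direction, magnitude in corpus:
    counts[(verb, direction, magnitude)] += 1

def choose_verb(direction, magnitude):
    # P(verb | direction, magnitude) is proportional to corpus counts;
    # sampling (rather than argmax) brings variation into the output.
    candidates = {v: c for (v, d, m), c in counts.items()
                  if d == direction and m == magnitude}
    verbs, weights = zip(*candidates.items())
    return random.choices(verbs, weights=weights)[0]

print(choose_verb("up", "large"))  # e.g. "soar"
```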
6

Li, Shujie, Liang Li, Ruiying Geng, Min Yang, Binhua Li, Guanghu Yuan, Wanwei He, et al. "Unifying Structured Data as Graph for Data-to-Text Pre-Training." Transactions of the Association for Computational Linguistics 12 (2024): 210–28. http://dx.doi.org/10.1162/tacl_a_00641.

Abstract:
Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved to be powerful in enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored for a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different D2T generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer, encoding relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix to incorporate graph structures into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source codes are available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t.
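A toy version of the two graph-derived matrices described here (an adjacency-based attention mask and shortest-path relative positions) can be built with NumPy. This illustrates the idea only, not the paper's implementation.

```python
import numpy as np

# Build an attention mask and a relative-position matrix from a toy
# graph over 4 linearized nodes (illustrative, not the UniD2T code).

edges = [(0, 1), (1, 2), (1, 3)]
n = 4

adj = np.eye(n, dtype=int)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1       # attention allowed along edges

# Shortest-path distances (Floyd-Warshall) as relative positions.
INF = n + 1
dist = np.where(adj > 0, 1, INF)
np.fill_diagonal(dist, 0)
for k in range(n):
    dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])

attn_mask = adj > 0                 # restrict attention to connected nodes
print(dist)                         # feed as relative-position buckets
```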
7

Gong, Heng, Xiaocheng Feng, and Bing Qin. "Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning." Applied Sciences 13, no. 9 (April 30, 2023): 5573. http://dx.doi.org/10.3390/app13095573.

Abstract:
Data-to-text generation plays an important role in natural language processing by processing structured data and helping people understand those data by generating user-friendly descriptive text. It can be applied to news generation, financial report generation, customer service, etc. However, in practice, it needs to adapt to different domains that may lack an annotated training corpus. To alleviate this dataset scarcity problem, distantly-supervised data-to-text generation has emerged, which constructs a training corpus automatically and is more practical to apply to new domains when well-aligned data is expensive to obtain. However, this distant supervision method of training induces an over-generation problem, since the automatically aligned text includes hallucinations: expressions that cannot be inferred from the data and that misguide the model to produce unfaithful text. To exploit the noisy dataset while maintaining faithfulness, we empower the neural data-to-text model by dynamically increasing the weights of well-aligned training instances and reducing the weights of low-quality ones via meta learning. To the best of our knowledge, we are the first to alleviate the noise in distantly-supervised data-to-text generation via meta learning. In addition, we rewrite those low-quality texts to provide better training instances. Finally, we construct a new distantly-supervised dataset, DIST-ToTTo (abbreviation for Distantly-supervised Table-To-Text), and conduct experiments on both the benchmark WITA (abbreviation for the data sources Wikipedia and Wikidata) and DIST-ToTTo datasets. The evaluation results show that our model improves on the state-of-the-art DSG (Distant Supervision Generation) model across all automatic evaluation metrics, with an improvement of 3.72% on the WITA dataset and 3.82% on the DIST-ToTTo dataset in terms of the widely used metric BLEU (BiLingual Evaluation Understudy). Furthermore, based on human evaluation, our model generates more grammatically correct and more faithful text than the state-of-the-art DSG model.
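The instance-weighting idea amounts to a weighted training loss in which well-aligned instances count more. In the paper the weights come from meta learning on held-out quality signals; in the hedged sketch below, `alignment_score` is a hypothetical stand-in for that learned weighting.

```python
# Minimal sketch of quality-aware loss weighting for distantly
# supervised training. `alignment_score` and `per_instance_loss` are
# hypothetical callables; the paper learns the weights via meta learning.

def weighted_loss(instances, per_instance_loss, alignment_score):
    total, norm = 0.0, 0.0
    for inst in instances:
        w = alignment_score(inst)        # in [0, 1]; high = well aligned
        total += w * per_instance_loss(inst)
        norm += w
    return total / max(norm, 1e-8)       # down-weights noisy instances
```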
8

Puduppully, Ratish, Li Dong, and Mirella Lapata. "Data-to-Text Generation with Content Selection and Planning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6908–15. http://dx.doi.org/10.1609/aaai.v33i01.33016908.

Abstract:
Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines, improving the state-of-the-art on the recently released RotoWire dataset.
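A toy sketch of the two-stage decomposition (content plan first, realization second). The salience-based planner and the lambda realizer below are illustrative stand-ins for the trained neural components.

```python
# Minimal sketch of "plan then realize": stage 1 selects and orders
# records, stage 2 verbalizes the plan. Toy stand-ins, not the model.

def plan(records):
    selected = [r for r in records if r.get("salience", 0) > 0]
    return sorted(selected, key=lambda r: r["salience"], reverse=True)

def generate(records, realizer):
    content_plan = plan(records)     # stage 1: content selection/ordering
    return realizer(content_plan)    # stage 2: surface realization

records = [{"entity": "Heat", "type": "PTS", "value": 102, "salience": 2},
           {"entity": "Heat", "type": "FT%", "value": 81, "salience": 0}]
print(generate(records, lambda p: " ".join(
    f"{r['entity']} {r['type']}={r['value']}" for r in p)))
```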
9

Gkatzia, Dimitra, Oliver Lemon, and Verena Rieser. "Data-to-Text Generation Improves Decision-Making Under Uncertainty." IEEE Computational Intelligence Magazine 12, no. 3 (August 2017): 10–17. http://dx.doi.org/10.1109/mci.2017.2708998.

10

Rebuffel, Clement, Marco Roberti, Laure Soulier, Geoffrey Scoutheeten, Rossella Cancelliere, and Patrick Gallinari. "Controlling hallucinations at word level in data-to-text generation." Data Mining and Knowledge Discovery 36, no. 1 (October 22, 2021): 318–54. http://dx.doi.org/10.1007/s10618-021-00801-4.

Abstract:
Data-to-Text Generation (DTG) is a subfield of Natural Language Generation aiming at transcribing structured data into natural language descriptions. The field has recently been boosted by the use of neural-based generators which exhibit, on the one hand, great syntactic skills without the need for hand-crafted pipelines; on the other hand, the quality of the generated text reflects the quality of the training data, which in realistic settings only offer imperfectly aligned structure-text pairs. Consequently, state-of-the-art neural models include misleading statements, usually called hallucinations, in their outputs. The control of this phenomenon is today a major challenge for DTG, and is the problem addressed in this paper. Previous work deals with this issue at the instance level, using an alignment score for each table-reference pair. In contrast, we propose a finer-grained approach, arguing that hallucinations should rather be treated at the word level. Specifically, we propose a Multi-Branch Decoder which is able to leverage word-level labels to learn the relevant parts of each training instance. These labels are obtained following a simple and efficient scoring procedure based on co-occurrence analysis and dependency parsing. Extensive evaluations, via automated metrics and human judgment on the standard WikiBio benchmark, show the accuracy of our alignment labels and the effectiveness of the proposed Multi-Branch Decoder. Our model is able to reduce and control hallucinations, while keeping fluency and coherence in generated texts. Further experiments on a degraded version of ToTTo show that our model could be successfully used in very noisy settings.
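The word-level labels can be approximated with a simple co-occurrence check, as in the toy sketch below; the paper's scoring procedure additionally uses dependency parsing, so this is only the crudest version of the idea.

```python
# Toy word-level alignment labels via co-occurrence: a reference word is
# "supported" if it also appears among the table values. Unsupported
# words are candidate hallucinations. Illustrative only.

def word_labels(table_values, reference_tokens):
    support = {tok.lower() for v in table_values for tok in str(v).split()}
    return [(tok, tok.lower() in support) for tok in reference_tokens]

table = {"name": "John Doe", "born": "1950"}
ref = "John Doe born 1950 was a famous painter".split()
print(word_labels(table.values(), ref))
# 'famous' and 'painter' come out unsupported -> candidate hallucinations
```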
11

Jang, Jungsun, Hyungjong Noh, Yeonsoo Lee, Soo-Min Pantel, and Haechang Rim. "Narrative context-based data-to-text generation for ambient intelligence." Journal of Ambient Intelligence and Humanized Computing 11, no. 4 (January 17, 2019): 1421–29. http://dx.doi.org/10.1007/s12652-019-01176-7.

12

Jolly, Shailza, Zi Xuan Zhang, Andreas Dengel, and Lili Mou. "Search and Learn: Improving Semantic Coverage for Data-to-Text Generation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10858–66. http://dx.doi.org/10.1609/aaai.v36i10.21332.

Abstract:
Data-to-text generation systems aim to generate text descriptions based on input data (often represented in the tabular form). A typical system uses huge training samples for learning the correspondence between tables and texts. However, large training sets are expensive to obtain, limiting the applicability of these approaches in real-world scenarios. In this work, we focus on few-shot data-to-text generation. We observe that, while fine-tuned pretrained language models may generate plausible sentences, they suffer from the low semantic coverage problem in the few-shot setting. In other words, important input slots tend to be missing in the generated text. To this end, we propose a search-and-learning approach that leverages pretrained language models but inserts the missing slots to improve the semantic coverage. We further finetune our system based on the search results to smooth out the search noise, yielding better-quality text and improving inference efficiency to a large extent. Experiments show that our model achieves high performance on E2E and WikiBio datasets. Especially, we cover 98.35% of input slots on E2E, largely alleviating the low coverage problem.
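The coverage idea is easy to illustrate: detect input slots whose values are missing from the draft and splice them back in. The naive string-based insertion below is only a toy; the paper uses a proper search procedure followed by learning from the search results.

```python
# Toy slot-coverage check plus naive insertion (illustrative only).

def missing_slots(slots, text):
    return {k: v for k, v in slots.items()
            if str(v).lower() not in text.lower()}

def insert_slots(slots, text):
    fixes = [f"{k}: {v}" for k, v in missing_slots(slots, text).items()]
    if not fixes:
        return text
    return text.rstrip(".") + " (" + "; ".join(fixes) + ")."

slots = {"name": "Aromi", "food": "Italian", "area": "riverside"}
draft = "Aromi serves Italian food."
print(insert_slots(slots, draft))   # adds the missing 'riverside' slot
```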
13

Uehara, Yui, and Tatsuya Ishigaki. "Commentary on “Learning with Contrastive Examples for Data-to-Text Generation”." Journal of Natural Language Processing 28, no. 2 (2021): 710–15. http://dx.doi.org/10.5715/jnlp.28.710.

14

Dale, Robert. "Navigating the text generation revolution: Traditional data-to-text NLG companies and the rise of ChatGPT." Natural Language Engineering 29, no. 4 (July 2023): 1188–97. http://dx.doi.org/10.1017/s1351324923000347.

Abstract:
Since the release of ChatGPT at the end of November 2022, generative AI has been talked about endlessly in both the technical press and the mainstream media. Large language model technology has been heralded as many things: the disruption of the search engine, the end of the student essay, the bringer of disinformation … but what does it mean for commercial providers of earlier iterations of natural language generation technology? We look at how the major players in the space are responding, and where things might go in the future.
15

Ma, Ting-Huai, Xin Yu, and Huan Rong. "A comprehensive transfer news headline generation method based on semantic prototype transduction." Mathematical Biosciences and Engineering 20, no. 1 (2022): 1195–228. http://dx.doi.org/10.3934/mbe.2023055.

Abstract:
Most current deep learning-based news headline generation models only target domain-specific news data. When a new news domain appears, it is usually costly to obtain a large amount of data with reference truth for model training, so text generation models trained by traditional supervised approaches often do not generalize well to the new domain. Inspired by the idea of transfer learning, this paper designs a cross-domain transfer text generation method based on domain data distribution alignment, intermediate domain redistribution, and zero-shot learning semantic prototype transduction, focusing on the data problem with no reference truth in the target domain. The model can then be guided by the most relevant source domain data to generate headlines for target domain news text, through the semantic correlation between source and target domain data during training, even without any reference-truth headlines in the target domain, which improves the usability of the text generation model in real scenarios. The experimental results show that the proposed transfer text generation method has a good domain transfer effect and outperforms other existing transfer text generation methods on various text generation evaluation indexes, proving the method's effectiveness.
16

Joseph, Ethan, Julian Lioanag, and Mei Si. "Improving Data-to-Text Generation via Preserving High-Frequency Phrases and Fact-Checking." Italian Journal of Computational Linguistics 7, no. 1 | 2 (December 1, 2021): 223–44. http://dx.doi.org/10.4000/ijcol.909.

17

Ma, Mingyu Derek, Xiaoxuan Wang, Po-Nien Kung, P. Jeffrey Brantingham, Nanyun Peng, and Wei Wang. "STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 18751–59. http://dx.doi.org/10.1609/aaai.v38i17.29839.

Abstract:
Information extraction tasks such as event extraction require an in-depth understanding of the output structure and sub-task dependencies. They heavily rely on task-specific training data in the form of (passage, target structure) pairs to obtain reasonable performance. However, obtaining such data through human annotation is costly, leading to a pressing need for low-resource information extraction approaches that require minimal human labeling for real-world applications. Fine-tuning supervised models with synthesized training data would be a generalizable method, but the existing data generation methods either still rely on large-scale ground-truth data or cannot be applied to complicated IE tasks due to their poor performance. To address these challenges, we propose STAR, a data generation method that leverages Large Language Models (LLMs) to synthesize data instances given limited seed demonstrations, thereby boosting low-resource information extraction performance. Our approach involves generating target structures (Y) followed by generating passages (X), all accomplished with the aid of LLMs. We design fine-grained step-by-step instructions to obtain the initial data instances. We further reduce errors and improve data quality through self-reflection error identification and self-refinement with iterative revision. Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks, even surpassing the effectiveness of human-curated data. Human assessment of the data quality shows STAR-generated data exhibit higher passage quality and better align with the task definitions compared with the human-curated data.
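The "structure first, passage second" recipe can be sketched as a small prompting loop. `llm` below is a hypothetical completion function, and real usage would add the fine-grained step-by-step instructions and self-refinement passes the abstract describes.

```python
import json
import random

# Minimal sketch of structure-to-text synthesis with an LLM: sample a
# target structure Y, then ask the model to write a passage X realizing
# it. `llm` is a hypothetical text-completion function.

def synthesize_pair(llm, seed_structures):
    y = random.choice(seed_structures)        # target structure Y first
    prompt = ("Write a short passage in which exactly this event occurs, "
              "with these arguments:\n" + json.dumps(y) + "\nPassage:")
    x = llm(prompt)                           # passage X second
    return x, y

demo = [{"event": "Attack", "attacker": "group A", "place": "city B"}]
print(synthesize_pair(lambda p: "<LLM output here>", demo))
```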
18

Avhad, Pranjali. "WordCanvas: Text-to-Image Generation." International Journal of Scientific Research in Engineering and Management 08, no. 05 (May 7, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem32152.

Abstract:
This project investigates the novel use of stable diffusion techniques to generate high-quality images from detailed text descriptions. The combination of natural language understanding and computer vision in text-to-image conversion opens up new possibilities for content creation and communication. Using cutting-edge stable diffusion models, our project builds a solid foundation for the generation process, which includes tokenization, pre-processing, specialized architecture design, and post-processing techniques. The advantages include eye-catching images, increased user engagement, content personalization, and improved accessibility. Automation of content generation has applications in marketing, education, data visualization, and creative expression. However, challenges such as model accuracy, ethical concerns, and biases need addressing. Achieving a balance between automation and human supervision is critical for the responsible application of this transformative capability.
Index Terms: Stable diffusion, Text-to-image conversion, Natural language understanding, Pre-processing, Post-processing techniques, Content personalization.
19

Theune, M., E. Klabbers, J. R. de Pijper, E. Krahmer, and J. Odijk. "From data to speech: a general approach." Natural Language Engineering 7, no. 1 (March 2001): 47–86. http://dx.doi.org/10.1017/s1351324901002625.

Abstract:
We present a data-to-speech system called D2S, which can be used for the creation of data-to-speech systems in different languages and domains. The most important characteristic of a data-to-speech system is that it combines language and speech generation: language generation is used to produce a natural language text expressing the system's input data, and speech generation is used to make this text audible. In D2S, this combination is exploited by using linguistic information available in the language generation module for the computation of prosody. This allows us to achieve a better prosodic output quality than can be achieved in a plain text-to-speech system. For language generation in D2S, the use of syntactically enriched templates is guided by knowledge of the discourse context, while for speech generation pre-recorded phrases are combined in a prosodically sophisticated manner. This combination of techniques makes it possible to create linguistically sound but efficient systems with a high quality language and speech output.
20

Fu, Zihao, Lidong Bing, and Wai Lam. "Open Domain Event Text Generation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7748–55. http://dx.doi.org/10.1609/aaai.v34i05.6278.

Abstract:
Text generation tasks aim at generating human-readable text from different kinds of data. Normally, the generated text only contains the information included in the data, and its application is thus restricted to some limited scenarios. In this paper, we extend the task to an open domain event text generation scenario with an entity chain as its skeleton. Specifically, given an entity chain containing several related event entities, the model should retrieve from a trustworthy repository (e.g., Wikipedia) the detailed information of these entities and generate a description text based on the retrieved sentences. We build a new dataset called WikiEvent that provides 34K pairs of entity chains and their corresponding description sentences. To solve the problem, we propose a wiki augmented generator framework that contains an encoder, a retriever, and a decoder. The encoder encodes the entity chain into a hidden space while the decoder decodes from the hidden space and generates description text. The retriever retrieves relevant text from a trustworthy repository which provides more information for generation. To alleviate the overfitting problem, we propose a novel random drop component that randomly deletes words from the retrieved sentences, making our model more robust for handling long input sentences. We apply the proposed model on the WikiEvent dataset and compare it with a few baselines. The experimental results show that our carefully-designed architecture does help generate better event text, and extensive analysis further uncovers the characteristics of the proposed task.
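The random drop component is simple enough to show directly; a minimal sketch:

```python
import random

# Randomly delete words from retrieved sentences during training so the
# generator becomes robust to long, noisy inputs. Illustrative toy.

def random_drop(tokens, p=0.1, rng=random):
    kept = [t for t in tokens if rng.random() > p]
    return kept or tokens          # never drop everything

print(random_drop("the painter was born in Paris in 1840".split()))
```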
21

Journal, IJSREM. "Enhancing Data Representation: A Novel Text-to-Image Protocol for Advanced Visual Content Generation using Generative Pre-trained Transformers." International Journal of Scientific Research in Engineering and Management 08, no. 01 (January 15, 2024): 1–13. http://dx.doi.org/10.55041/ijsrem28134.

Abstract:
The rapid advancement of text-to-image generation has led to the development of innovative protocols for creating visual content from textual descriptions. This article presents a cutting-edge text-to-image protocol designed to enhance data representation through advanced neural network architectures and natural language processing techniques. The protocol leverages state-of-the-art deep learning models to generate high-fidelity images from textual inputs, offering significant potential for applications in diverse fields such as art generation, e-commerce, and content creation. The proposed protocol demonstrates promising results in producing realistic and contextually relevant images, marking a substantial leap forward in the realm of text-to-image technology.
Key Words: Text-to-Image Protocol, Data Representation, Neural Network Architectures, Natural Language Processing, Deep Learning Models, Image Generation, E-commerce Applications.
22

Currie, Janet, Henrik Kleven, and Esmée Zwiers. "Technology and Big Data Are Changing Economics: Mining Text to Track Methods." AEA Papers and Proceedings 110 (May 1, 2020): 42–48. http://dx.doi.org/10.1257/pandp.20201058.

Abstract:
The last 40 years have seen huge innovations in computing and in the availability of data. Data derived from millions of administrative records or by using (as we do) new methods of data generation such as text mining are now common. New data often requires new methods, which in turn can inspire new data collection. If history is any guide, some methods will stick and others will prove to be a flash in the pan. However, the larger trends toward demanding greater credibility and transparency from researchers in applied economics and a 'collage' approach to assembling evidence will likely continue.
23

Shonenkov, A. V., D. K. Karachev, M. Y. Novopoltsev, M. S. Potanin, D. V. Dimitrov, and A. V. Chertok. "Handwritten text generation and strikethrough characters augmentation." Computer Optics 46, no. 3 (June 2022): 455–64. http://dx.doi.org/10.18287/2412-6179-co-1049.

Abstract:
We introduce two data augmentation techniques, which, used with a Resnet-BiLSTM-CTC network, significantly reduce Word Error Rate and Character Error Rate beyond best-reported results on handwriting text recognition tasks. We apply a novel augmentation that simulates strikethrough text (HandWritten Blots) and a handwritten text generation method based on printed text (StackMix), which proved to be very effective in handwriting text recognition tasks. StackMix uses a weakly-supervised framework to get character boundaries. Because these data augmentation techniques are independent of the network used, they could also be applied to enhance the performance of other networks and approaches to handwriting text recognition. Extensive experiments on ten handwritten text datasets show that HandWritten Blots augmentation and StackMix significantly improve the quality of handwriting text recognition models.
24

Libbi, Claudia Alessandra, Jan Trienes, Dolf Trieschnigg, and Christin Seifert. "Generating Synthetic Training Data for Supervised De-Identification of Electronic Health Records." Future Internet 13, no. 5 (May 20, 2021): 136. http://dx.doi.org/10.3390/fi13050136.

Abstract:
A major hurdle in the development of natural language processing (NLP) methods for Electronic Health Records (EHRs) is the lack of large, annotated datasets. Privacy concerns prevent the distribution of EHRs, and the annotation of data is known to be costly and cumbersome. Synthetic data presents a promising solution to the privacy concern, if synthetic data has comparable utility to real data and if it preserves the privacy of patients. However, the generation of synthetic text alone is not useful for NLP because of the lack of annotations. In this work, we propose the use of neural language models (LSTM and GPT-2) for generating artificial EHR text jointly with annotations for named-entity recognition. Our experiments show that artificial documents can be used to train a supervised named-entity recognition model for de-identification, which outperforms a state-of-the-art rule-based baseline. Moreover, we show that combining real data with synthetic data improves the recall of the method, without manual annotation effort. We conduct a user study to gain insights on the privacy of artificial text. We highlight privacy risks associated with language models to inform future research on privacy-preserving automated text generation and metrics for evaluating privacy-preservation during text generation.
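For flavor, here is a minimal sketch that samples synthetic EHR-style text with an off-the-shelf GPT-2 via Hugging Face transformers. The placeholder-tag prompt is an assumption made for illustration; the paper trains LSTM/GPT-2 models on in-domain data and generates entity annotations jointly.

```python
from transformers import pipeline

# Sample synthetic EHR-style text with a general-purpose GPT-2.
# The placeholder tags in the prompt are an illustrative assumption:
# kept in the output, they can double as named-entity annotations.

generator = pipeline("text-generation", model="gpt2")
prompt = "Patient <NAME> was admitted on <DATE> with"
out = generator(prompt, max_new_tokens=40, do_sample=True)
print(out[0]["generated_text"])
```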
25

Pandraju, Saichandra, and Sakthi Ganesh Mahalingam. "Answer-Aware Question Generation from Tabular and Textual Data using T5." International Journal of Emerging Technologies in Learning (iJET) 16, no. 18 (September 20, 2021): 256. http://dx.doi.org/10.3991/ijet.v16i18.25121.

Abstract:
Automatic Question Generation (AQG) systems are applied in a myriad of domains to generate questions from sources such as documents, images, knowledge graphs to name a few. With the rising interest in such AQG systems, it is equally important to recognize structured data like tables while generating questions from documents. In this paper, we propose a single model architecture for question generation from tables along with text using “Text-to-Text Transfer Transformer” (T5) - a fully end-to-end model which does not rely on any intermediate planning steps, delexicalization, or copy mechanisms. We also present our systematic approach in modifying the ToTTo dataset, release the augmented dataset as TabQGen along with the scores achieved using T5 as a baseline to aid further research.
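A minimal sketch of the single-model setup with Hugging Face's T5 classes. The table linearization and prompt format below are assumptions for illustration, and an off-the-shelf t5-small would need fine-tuning on TabQGen-style pairs before producing sensible questions.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Answer-aware question generation with T5 from a linearized table.
# Prompt format is an illustrative assumption, not the TabQGen spec.

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

table = "page: Beyoncé | column: birth year | value: 1981"
inp = f"generate question: answer: 1981 context: {table}"
ids = tok(inp, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```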
26

Zhang, Yangqianhui. "Exploration of Cross-Modal Text Generation Methods in Smart Justice." Scientific Programming 2021 (October 21, 2021): 1–14. http://dx.doi.org/10.1155/2021/3225933.

Abstract:
With the development of modern science and technology, information technology has brought great changes to many fields. Smart justice has become one of the areas that people are paying increasing attention to. For example, large and small cases occur every day, and the legal library is continuously updated; a large number of documents and evidence-collection archives therefore put tremendous pressure on the judiciary. Text generation technology can automatically present the results extracted from these redundant legal data and express the results of the analysis in natural language. It facilitates the handling of huge amounts of legal data, which relieves the work pressure of the judicial department. However, text generation algorithms have not been widely adopted in justice. Therefore, this paper focuses on what benefits text generation can produce in law and how to apply text generation technology in the legal field. The survey first provides a comprehensive overview of text generation by summarizing the existing methods, that is, text to text, data to text, and visual to text. Then, we examine the process of the practical application of text generation in law. Furthermore, this paper puts forward the challenges and possible solutions for judicial text generation, which provides pointers for future work.
27

Sun, Peijie, Le Wu, Kun Zhang, Yu Su, and Meng Wang. "An Unsupervised Aspect-Aware Recommendation Model with Explanation Text Generation." ACM Transactions on Information Systems 40, no. 3 (July 31, 2022): 1–29. http://dx.doi.org/10.1145/3483611.

Abstract:
Review based recommendation utilizes both users’ rating records and the associated reviews for recommendation. Recently, with the rapid demand for explanations of recommendation results, reviews are used to train the encoder–decoder models for explanation text generation. As most of the reviews are general text without detailed evaluation, some researchers leveraged auxiliary information of users or items to enrich the generated explanation text. Nevertheless, the auxiliary data is not available in most scenarios and may suffer from data privacy problems. In this article, we argue that the reviews contain abundant semantic information to express the users’ feelings for various aspects of items, while these information are not fully explored in current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial since the aspect is hidden and unlabeled. Besides, it is also very challenging to inject aspect information for generating explanation text with noisy review input. To solve these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn the aspect-aware representation of each review sentence. Thus, users and items can be represented in the aspect space based on their historical associated reviews. After that, we detail how to better predict ratings and generate explanation text with the user and item representations in the aspect space. We further dynamically assign review sentences which contain larger proportion of aspect words with larger weights to control the text generation process, and jointly optimize rating prediction accuracy and explanation text generation quality with a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model for both recommendation accuracy and explainability.
28

Gong, Haisong, Qiang Liu, Shu Wu, and Liang Wang. "Text-Guided Molecule Generation with Diffusion Language Model." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (March 24, 2024): 109–17. http://dx.doi.org/10.1609/aaai.v38i1.27761.

Abstract:
Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Recently, most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the remarkable effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains. Code will be released at: https://github.com/Deno-V/tgm-dlm.
29

Zahoor, Saniya, and Shabir A. Sofi. "Automatic Podcast Generation." Journal of University of Shanghai for Science and Technology 23, no. 10 (October 1, 2021): 22–28. http://dx.doi.org/10.51201/jusst/21/09700.

Abstract:
A massive leap forward in the field of Human-Computer Interaction in living memory has been achieved by the Google Duo system, which can sustain a natural-sounding and coherent phone call with a human being without them being able to tell the difference. The system capitalized on recent developments in the field of synthetic voice generation along with real-time processing and response generation. The aim of this work is to replicate the success of that presentation as well as to build upon that body of work and generate useful content summaries which can be converted into high-quality podcasts. In particular, our approach first comprises extracting text data from web pages using various Natural Language Processing (NLP) tools as well as deep neural networks. After that, it summarises the text into byte-sized chunks using extractive summarisation. Finally, it generates clear, high-quality audio podcasts from the produced summaries using recently developed text-to-speech engines.
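The pipeline reads naturally as three steps: fetch page text, summarize extractively, synthesize speech. The sketch below uses requests/BeautifulSoup/gTTS and a trivial length-based sentence scorer purely as illustrative choices, not the authors' exact toolchain.

```python
import requests
from bs4 import BeautifulSoup
from gtts import gTTS

# Toy "web page -> summary -> podcast" pipeline. Library choices and
# the length-based scorer are illustrative assumptions.

def page_text(url):
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))

def extractive_summary(text, k=3):
    # Trivial scorer: prefer longer sentences; real systems rank salience.
    sents = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sorted(sents, key=len, reverse=True)[:k]) + "."

def make_podcast(url, out="episode.mp3"):
    gTTS(extractive_summary(page_text(url))).save(out)
```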
30

Rosenberg, Harrison, Shimaa Ahmed, Guruprasad Ramesh, Kassem Fawaz, and Ramya Korlakai Vinayak. "Limitations of Face Image Generation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 13 (March 24, 2024): 14838–46. http://dx.doi.org/10.1609/aaai.v38i13.29403.

Abstract:
Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We applied our framework on faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models. Our survey data and analytics code can be found online at https://github.com/wi-pi/Limitations_of_Face_Generation
31

Li, Quanzhi, and Qiong Zhang. "Court Opinion Generation from Case Fact Description with Legal Basis." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 14840–48. http://dx.doi.org/10.1609/aaai.v35i17.17742.

Abstract:
In this study, we proposed an approach to automatically generating court view from the fact description of a legal case. This is a text-to-text natural language generation problem, and it can help the automatic legal document generation. Due to the specialty of the legal domain, our model exploits the charge and law article information in the generation process, instead of utilizing just the fact description text. The BERT model is used as the encoder and a Transformer architecture is used as decoder. To smoothly integrate these two parts together, we employ two separate optimizers for the two components during the training process. The experiments on two data sets of Chinese legal cases show that our approach outperforms other methods.
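The two-optimizer trick for jointly training a pretrained encoder and a freshly initialized decoder can be sketched in PyTorch; the linear modules below are stand-ins for BERT and the Transformer decoder, and the learning rates are typical-looking assumptions.

```python
import torch

# Two separate optimizers: a small LR for the pretrained encoder and a
# larger LR for the fresh decoder. The modules are toy stand-ins.

encoder = torch.nn.Linear(10, 10)     # stand-in for BERT
decoder = torch.nn.Linear(10, 10)     # stand-in for the decoder

enc_opt = torch.optim.AdamW(encoder.parameters(), lr=2e-5)
dec_opt = torch.optim.AdamW(decoder.parameters(), lr=1e-4)

loss = decoder(encoder(torch.randn(4, 10))).pow(2).mean()
loss.backward()
enc_opt.step(); dec_opt.step()        # each part updated at its own pace
enc_opt.zero_grad(); dec_opt.zero_grad()
```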
32

Barbosa, Wendson A. S., and Daniel J. Gauthier. "Learning spatiotemporal chaos using next-generation reservoir computing." Chaos: An Interdisciplinary Journal of Nonlinear Science 32, no. 9 (September 2022): 093137. http://dx.doi.org/10.1063/5.0098707.

Abstract:
Forecasting the behavior of high-dimensional dynamical systems using machine learning requires efficient methods to learn the underlying physical model. We demonstrate spatiotemporal chaos prediction using a machine learning architecture that, when combined with a next-generation reservoir computer, displays state-of-the-art performance with a computational time [Formula: see text]–[Formula: see text] times faster for training process and training data set [Formula: see text] times smaller than other machine learning algorithms. We also take advantage of the translational symmetry of the model to further reduce the computational cost and training data, each by a factor of [Formula: see text]10.
33

Philip, Philemon, and Sidra Minhas. "A Brief Survey on Natural Language Processing Based Text Generation and Evaluation Techniques." VFAST Transactions on Software Engineering 10, no. 3 (September 27, 2022): 24–36. http://dx.doi.org/10.21015/vtse.v10i3.1104.

Abstract:
Text generation is a pressing topic of Natural Language Processing that involves the prediction of upcoming text. Applications like auto-complete, chatbots, auto-correct, and many others use text generation to meet certain communicative requirements. However, more accurate text generation methods are needed to encapsulate all possibilities of natural language communication. In this survey, we present cutting-edge methods being adopted for text generation. These methods are divided into three broad categories, i.e., 1) Sequence-to-Sequence models (Seq2Seq), 2) Generative Adversarial Networks (GAN), and 3) Miscellaneous. Sequence-to-Sequence involves supervised methods, while GANs are unsupervised, aimed at reducing the dependence of models on training data. After this, we also list a few other text generation methods. We also summarize some evaluation metrics available for text generation and their performance.
34

Yu, Kyungho, Hyoungju Kim, Jeongin Kim, Chanjun Chun, and Pankoo Kim. "A Study on Webtoon Generation Using CLIP and Diffusion Models." Electronics 12, no. 18 (September 21, 2023): 3983. http://dx.doi.org/10.3390/electronics12183983.

Abstract:
This study focuses on harnessing deep-learning-based text-to-image transformation techniques to help webtoon creators' creative outputs. We converted publicly available datasets (e.g., MSCOCO) into a multimodal webtoon dataset using CartoonGAN. First, the dataset was leveraged for training contrastive language image pre-training (CLIP), a model composed of multi-lingual BERT and a Vision Transformer that learnt to associate text with images. Second, a pre-trained diffusion model was employed to generate webtoons through text and text-similar image input. The webtoon dataset comprised treatments (i.e., textual descriptions) paired with their corresponding webtoon illustrations. CLIP (operating through contrastive learning) extracted features from different data modalities and aligned similar data more closely within the same feature space while pushing dissimilar data apart. This model learnt the relationships between various modalities in multimodal data. To generate webtoons using the diffusion model, the process involved providing the CLIP features of the desired webtoon's text, together with those of the most text-similar image, to a pre-trained diffusion model. In the experiments, both single-text and continuous-text inputs were used to generate webtoons, and the results showed an inception score of 7.14 when using continuous-text inputs. The text-to-image technology developed here could streamline the webtoon creation process for artists by enabling the efficient generation of webtoons based on the provided text. However, the current inability to generate webtoons from multiple sentences or images while maintaining a consistent artistic style was noted. Therefore, further research is imperative to develop a text-to-image model capable of handling multi-sentence and multi-lingual input while ensuring coherence in the artistic style across the generated webtoon images.
35

Jiang, Nan, Jing Chen, Ri-Gui Zhou, Changxing Wu, Honglong Chen, Jiaqi Zheng, and Tao Wan. "PAN: Pipeline assisted neural networks model for data-to-text generation in social internet of things." Information Sciences 530 (August 2020): 167–79. http://dx.doi.org/10.1016/j.ins.2020.03.080.

36

Guan, Xinyi, and Shun Long. "Hierarchical Keyword Generation Method for Low-Resource Social Media Text." Information 14, no. 11 (November 15, 2023): 615. http://dx.doi.org/10.3390/info14110615.

Abstract:
The exponential growth of social media text information presents a challenging issue in terms of retrieving valuable information efficiently. Utilizing deep learning models, we can automatically generate keywords that express core content and topics of social media text, thereby facilitating the retrieval of critical information. However, the performance of deep learning models is limited by the labeled text data in the social media domain. To address this problem, this paper presents a hierarchical keyword generation method for low-resource social media text. Specifically, the text segment is introduced as a hierarchical unit of social media text to construct a hierarchical model structure and design a text segment recovery task for self-supervised training of the model, which not only improves the ability of the model to extract features from social media text, but also reduces the dependence of the keyword generation model on the labeled data in the social media domain. Experimental results from publicly available social media datasets demonstrate that the proposed method can effectively improve the keyword generation performance even given limited social media labeled data. Further discussions demonstrate that the self-supervised training stage based on the text segment recovery task indeed benefits the model in adapting to the social media text domain.
37

Chary, Podakanti Satyajith. "Text Generation: Using Markov Model & LSTM Networks to Generate Realistic Text." International Journal for Research in Applied Science and Engineering Technology 11, no. 12 (December 31, 2023): 1323–27. http://dx.doi.org/10.22214/ijraset.2023.57601.

Abstract:
Text generation plays a crucial role in various natural language processing applications, ranging from creative writing to chatbots. This research delves into the realm of text generation by exploring and comparing two distinct techniques: Markov models and Long Short-Term Memory (LSTM) networks. The study focuses on their ability to generate realistic text within specific styles or genres, providing valuable insights into their respective strengths and limitations. Markov models, rooted in probability theory, and LSTM networks, a type of recurrent neural network, represent contrasting approaches to text generation. The research employs these techniques on a carefully curated dataset, evaluating their performance based on coherence, style, and contextual relevance. The comparison aims to elucidate the nuanced differences in how these models capture dependencies within the data and their effectiveness in simulating authentic linguistic patterns. Through rigorous experimentation, this research investigates the intricacies of both Markov models and LSTM networks, shedding light on their individual contributions to the task of text generation. The examination extends beyond mere algorithmic efficacy, considering the impact of these techniques on the quality and diversity of the generated text. Additionally, the study explores the influence of hyperparameters, such as temperature in the context of LSTM networks, on the output's richness and variability.
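Of the two techniques compared, the Markov model is compact enough to show in full; a minimal bigram sketch:

```python
import random
from collections import defaultdict

# Bigram Markov text generator: record next-word continuations from a
# corpus, then sample a chain. Toy illustration of the technique.

def train(corpus_tokens):
    model = defaultdict(list)
    for a, b in zip(corpus_tokens, corpus_tokens[1:]):
        model[a].append(b)       # duplicates encode frequencies
    return model

def generate(model, start, n=10):
    out = [start]
    for _ in range(n):
        nxt = model.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

tokens = "the cat sat on the mat and the cat ran".split()
print(generate(train(tokens), "the"))
```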
38

Liu, Chang, Yuanhe Tian, Weidong Chen, Yan Song, and Yongdong Zhang. "Bootstrapping Large Language Models for Radiology Report Generation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 18635–43. http://dx.doi.org/10.1609/aaai.v38i17.29826.

Abstract:
Radiology report generation (RRG) aims to automatically generate a free-text description from a specific clinical radiograph, e.g., chest X-ray images. Existing approaches tend to perform RRG with specific models trained from scratch on public yet limited data, where they often lead to inferior performance owing to inefficient capabilities in both aligning visual and textual features and generating informative reports accordingly. Currently, large language models (LLMs) offer a promising solution to text generation with their power in learning from big data, especially for cross-modal scenarios such as RRG. However, most existing LLMs are pre-trained on general data, and if applied to RRG they suffer from the same knowledge gap between the general and medical domains as conventional approaches. Therefore, in this paper, we propose an approach to bootstrapping LLMs for RRG with an in-domain instance induction and a coarse-to-fine decoding process. Specifically, the in-domain instance induction process learns to align the LLM to radiology reports from general texts through contrastive learning. The coarse-to-fine decoding performs a text elevating process for those reports from the ranker, further enhanced with visual features and refinement prompts. Experimental results on two prevailing RRG datasets, namely IU X-Ray and MIMIC-CXR, demonstrate the superiority of our approach over previous state-of-the-art solutions. Further analyses illustrate that, for the LLM, the induction process enables it to better align with the medical domain, and the coarse-to-fine generation allows it to conduct more precise text generation.
39

Li, Linfeng, Licheng Zhang, Chiwei Zhu, and Zhendong Mao. "QGAE: an End-to-end Answer-Agnostic Question Generation Model for Generating Question-Answer Pairs." JUSTC 53 (2023): 1. http://dx.doi.org/10.52396/justc-2023-0002.

Abstract:
Question generation aims to generate meaningful and fluent questions, which can address the lack of question-answer type annotated corpus by augmenting the available data. Using unannotated text with optional answers as input contents, question generation can be divided into two types based on whether answers are provided: answer-aware and answer-agnostic. While generating questions with providing answers is challenging, generating high-quality questions without providing answers is even more difficult, for both humans and machines. In order to address this issue, we proposed a novel end-to-end model called QGAE, which is able to transform answer-agnostic question generation into answer-aware question generation by directly extracting candidate answers. This approach effectively utilizes unlabeled data for generating high-quality question-answer pairs, and its end-to-end design makes it more convenient compared to a multi-stage method that requires at least two pre-trained models. Moreover, our model achieves better average scores and greater diversity. Our experiments show that QGAE achieves significant improvements in generating question-answer pairs, making it a promising approach for question generation.
40

Chai, Yuyang, Zhuang Li, Jiahui Liu, Lei Chen, Fei Li, Donghong Ji, and Chong Teng. "Compositional Generalization for Multi-Label Text Classification: A Data-Augmentation Approach." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17727–35. http://dx.doi.org/10.1609/aaai.v38i16.29725.

Abstract:
Despite significant advancements in multi-label text classification, the ability of existing models to generalize to novel and seldom-encountered complex concepts, which are compositions of elementary ones, remains underexplored. This research addresses this gap. By creating unique data splits across three benchmarks, we assess the compositional generalization ability of existing multi-label text classification models. Our results show that these models often fail to generalize to compositional concepts encountered infrequently during training, leading to inferior performance on tests with these new combinations. To address this, we introduce a data augmentation method that leverages two innovative text generation models designed to enhance the classification models' capacity for compositional generalization. Our experiments show that this data augmentation approach significantly improves the compositional generalization capabilities of classification models on our benchmarks, with both generation models surpassing other text generation baselines. Our code is available at https://github.com/yychai74/LD-VAE.
41

Laha, Anirban, Parag Jain, Abhijit Mishra, and Karthik Sankaranarayanan. "Scalable Micro-planned Generation of Discourse from Structured Data." Computational Linguistics 45, no. 4 (January 2020): 737–63. http://dx.doi.org/10.1162/coli_a_00363.

Abstract:
We present a framework for generating natural language descriptions from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps.
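A toy rendering of the three-stage pipeline (canonicalize, realize one simple sentence per entry, compound with co-reference replacement); the rules here are deliberately simplistic stand-ins for the paper's modules.

```python
# Toy three-stage pipeline: canonicalize -> simple sentences -> compound.

def canonicalize(table):
    return [(table["subject"], attr, val)
            for attr, val in table["attributes"].items()]

def simple_sentence(triple):
    subj, attr, val = triple
    return f"{subj}'s {attr} is {val}."

def compound(sentences, subject):
    # Toy co-reference replacement after the first mention.
    rest = [s.replace(f"{subject}'s", "Its", 1) for s in sentences[1:]]
    return " ".join(sentences[:1] + rest)

table = {"subject": "Everest",
         "attributes": {"height": "8,849 m", "location": "Nepal"}}
sents = [simple_sentence(t) for t in canonicalize(table)]
print(compound(sents, "Everest"))
# -> "Everest's height is 8,849 m. Its location is Nepal."
```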
42

Hei, Nailei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, and Wenqiang Zhang. "A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2139–47. http://dx.doi.org/10.1609/aaai.v38i3.27986.

Abstract:
Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images. Although existing prompt engineering methods can provide high-level guidance, it is challenging for novice users to achieve the desired results by manually entering prompts due to a discrepancy between novice-user-input prompts and the model-preferred prompts. To bridge the distribution gap between user input behavior and model training datasets, we first construct a novel Coarse-Fine Granularity Prompts dataset (CFP) and propose a novel User-Friendly Fine-Grained Text Generation framework (UF-FGTG) for automated prompt optimization. For CFP, we construct a novel dataset for text-to-image tasks that combines coarse and fine-grained prompts to facilitate the development of automated prompt generation methods. For UF-FGTG, we propose a novel framework that automatically translates user-input prompts into model-preferred prompts. Specifically, we propose a prompt refiner that continually rewrites prompts to empower users to select results that align with their unique needs. Meanwhile, we integrate image-related loss functions from the text-to-image model into the training process of text generation to generate model-preferred prompts. Additionally, we propose an adaptive feature extraction module to ensure diversity in the generated results. Experiments demonstrate that our approach is capable of generating more visually appealing and diverse images than previous state-of-the-art methods, achieving an average improvement of 5% across six quality and aesthetic metrics. Data and code are available at https://github.com/Naylenv/UF-FGTG.
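
The iterative "prompt refiner" loop described above could be sketched as below; the refiner checkpoint and prompt template are assumptions for illustration, not the released UF-FGTG code (which trains its own refiner with image-related losses):

```python
# Hedged sketch of iterative prompt refinement: a seq2seq "refiner"
# repeatedly rewrites the user prompt into richer candidates.
from transformers import pipeline

# Placeholder model; UF-FGTG trains a dedicated refiner instead.
refiner = pipeline("text2text-generation", model="google/flan-t5-base")

def refine_prompt(user_prompt: str, rounds: int = 3) -> list:
    candidates, current = [], user_prompt
    for _ in range(rounds):
        out = refiner("Rewrite this text-to-image prompt with more "
                      f"visual detail: {current}", max_new_tokens=64)
        current = out[0]["generated_text"]
        candidates.append(current)  # the user picks the best match
    return candidates

print(refine_prompt("a cat on a chair"))
```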
43

Rossiello, Gaetano, Md Faisal Mahbub Chowdhury, Nandana Mihindukulasooriya, Owen Cornec, and Alfio Massimiliano Gliozzo. "KnowGL: Knowledge Generation and Linking from Text." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16476–78. http://dx.doi.org/10.1609/aaai.v37i13.27084.

Abstract:
We propose KnowGL, a tool that allows converting text into structured relational data represented as a set of ABox assertions compliant with the TBox of a given Knowledge Graph (KG), such as Wikidata. We address this problem as a sequence generation task by leveraging pre-trained sequence-to-sequence language models, e.g. BART. Given a sentence, we fine-tune such models to detect pairs of entity mentions and jointly generate a set of facts consisting of the full set of semantic annotations for a KG, such as entity labels, entity types, and their relationships. To showcase the capabilities of our tool, we build a web application consisting of a set of UI widgets that help users to navigate through the semantic data extracted from a given input text. We make the KnowGL model available at https://huggingface.co/ibm/knowgl-large.
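
Since the checkpoint is public, a minimal sketch of querying it with the Hugging Face transformers library might look like this (the exact format of the generated fact string should be checked against the model card):

```python
# Minimal sketch: generate KG-style facts from a sentence with KnowGL.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("ibm/knowgl-large")
model = AutoModelForSeq2SeqLM.from_pretrained("ibm/knowgl-large")

sentence = "Leonardo da Vinci painted the Mona Lisa."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
# The decoded string encodes entity mentions, labels, types, and relations.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```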
44

Konstas, I., and M. Lapata. "A Global Model for Concept-to-Text Generation." Journal of Artificial Intelligence Research 48 (October 30, 2013): 305–46. http://dx.doi.org/10.1613/jair.4025.

Abstract:
Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection ("what to say") and surface realization ("how to say") in an unsupervised domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We recast generation as the task of finding the best derivation tree for a set of database records and describe an algorithm for decoding in this framework that allows us to intersect the grammar with additional information capturing fluency and syntactic well-formedness constraints. Experimental evaluation on several domains achieves results competitive with state-of-the-art systems that use domain-specific constraints, explicit feature engineering, or labeled data.
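
To make the "best derivation" view concrete, here is a toy Viterbi-style search over a hand-written weighted grammar; the actual model induces its grammar from database records and intersects it with language-model scores, so this only illustrates picking the highest-probability derivation:

```python
# Toy illustration of generation as best-derivation search in a
# weighted grammar: each rule rewrites a nonterminal into symbols.
import math

RULES = {  # nonterminal -> list of (probability, expansion)
    "S":     [(0.6, ["FIELD", "VALUE"]), (0.4, ["VALUE"])],
    "FIELD": [(0.7, ["temperature"]), (0.3, ["wind"])],
    "VALUE": [(0.5, ["is", "high"]), (0.5, ["is", "low"])],
}

def best_derivation(symbol):
    # Returns (log-probability, word list) of the best expansion.
    if symbol not in RULES:              # terminal word
        return 0.0, [symbol]
    best = (-math.inf, [])
    for prob, expansion in RULES[symbol]:
        logp, words = math.log(prob), []
        for child in expansion:
            child_logp, child_words = best_derivation(child)
            logp += child_logp
            words += child_words
        if logp > best[0]:
            best = (logp, words)
    return best

print(best_derivation("S"))  # highest-probability surface string
```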
45

Mahajan, Dhruva, Ashish Gapat, Lalita Moharkar, Prathamesh Sawant, and Kapil Dongardive. "Artificial Generation of Realistic Voices." International Journal of Applied Sciences and Smart Technologies 03, no. 01 (June 21, 2021): 11–26. http://dx.doi.org/10.24071/ijasst.v3i1.2744.

Abstract:
In this paper, we propose an end-to-end text-to-speech system in which a user feeds in text data that is synthesized, varied, and rendered as an artificial voice at the output end. The goal is a text-to-speech model, that is, a model capable of generating speech with the help of trained datasets. The system organizes generation of the output sequence into three parts: a speaker encoder, a synthesizer, and a vocoder. Using these datasets, the model generates voice after prior training and maintains the naturalness of speech throughout; for naturalness of speech we implement a zero-shot adaptation technique. The primary capability of the model is voice regeneration, which has a variety of applications in the advancement of speech synthesis. With the help of the speaker encoder, the model can synthesize speech in the user's own voice, fed in through a microphone in the GUI, and the voice-regeneration capability generates similar voice waveforms for any text.
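
A hedged skeleton of the three-part pipeline (speaker encoder -> synthesizer -> vocoder) follows; the function bodies are numpy placeholders standing in for the trained networks of an SV2TTS-style system:

```python
# Skeleton of the three-stage TTS pipeline; stage bodies are
# placeholders where the trained networks would go.
import numpy as np

def speaker_encoder(reference_audio: np.ndarray) -> np.ndarray:
    # Placeholder: map a reference utterance to a fixed speaker embedding.
    return np.random.default_rng(0).standard_normal(256)

def synthesizer(text: str, speaker_embedding: np.ndarray) -> np.ndarray:
    # Placeholder: produce a mel spectrogram conditioned on text + speaker.
    return np.zeros((80, 10 * len(text)))

def vocoder(mel: np.ndarray) -> np.ndarray:
    # Placeholder: invert the mel spectrogram into a raw waveform.
    return np.zeros(mel.shape[1] * 256)

embedding = speaker_encoder(np.zeros(16000))   # user's mic recording
waveform = vocoder(synthesizer("Hello world", embedding))
print(waveform.shape)
```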
46

Banik, Nayan, Chayti Saha, Ikbal Ahmed, and Kulsum Akter Shapna. "Bangla text generation system by incorporating attention in sequence-to-sequence model." World Journal of Advanced Research and Reviews 14, no. 1 (April 30, 2022): 080–94. http://dx.doi.org/10.30574/wjarr.2022.14.1.0292.

Abstract:
In this AI-driven digital era, the pervasive nature of digital data is made possible by widespread and cheap access to the Internet, which is continuously flourishing with data in many forms. Among them, textual data are a great source of information in which people share their expressions in written form. Social media, blogs, online newspapers, and government documents are some notable sources of textual data. Extracting information from this enormous amount of data by manual inspection is time-consuming, cumbersome, and sometimes impossible. Natural Language Processing (NLP) is the computational domain that addresses these limitations by solving human-language-related problems. Text summarization, named entity recognition, and question answering are some of these problems, and a common task across them is for a machine to generate coherent text. In such scenarios, the input is a sequence of text and the output is also a sequence, but they differ in length. Sequence-to-Sequence (Seq2Seq) is an algorithmic approach that addresses this scenario by utilizing layers of recurrent units. However, the simple Seq2Seq model fails to capture long-term relationships in the input sequence; research shows that an attention mechanism guides the model to concentrate on specific inputs. The existing literature shows a lack of quality research on this text generation problem in the Bangla language, whereas many languages show excellent results. This work aims to develop such a system by incorporating attention into the Seq2Seq model and justifying its applicability by comparing it with baseline models. The model perplexity shows that the system can generate human-level readable text using a preprocessed dataset.
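
The attention step that lets the decoder "concentrate on specific inputs" can be written in a few lines; this is standard dot-product attention over encoder states, not the authors' exact code:

```python
# Dot-product attention over encoder states, as used in attentive Seq2Seq.
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_outputs):
    # decoder_state: (hidden,)   encoder_outputs: (src_len, hidden)
    scores = encoder_outputs @ decoder_state        # (src_len,)
    weights = F.softmax(scores, dim=0)              # focus on relevant inputs
    context = weights @ encoder_outputs             # (hidden,)
    return context, weights

enc = torch.randn(7, 128)       # 7 source tokens, hidden size 128
dec = torch.randn(128)
context, weights = attend(dec, enc)
print(weights)                  # one attention weight per source token
```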
47

Vinay S and Kumar Siddamallappa U. "A novel package of key generation and integrity validation in symmetric key cryptography." International Journal of Science and Research Archive 9, no. 2 (August 30, 2023): 997–1002. http://dx.doi.org/10.30574/ijsra.2023.9.2.0480.

Abstract:
In this paper we present a new symmetric-key cryptography method that also ensures the integrity of the data during transmission. Encryption and decryption operate on the bit stream of the data, so the method is suitable for any type of text file. The scheme is simple yet effective at securing data in a network against passive attacks such as traffic analysis, providing data confidentiality. The cryptography package generates a simple encryption key by choosing a byte of data from the plaintext. Using that key, plaintext is converted to ciphertext, and at the destination the ciphertext is converted back to plaintext by performing basic XOR and related operations. To ensure the integrity of the data, standard parity-bit methods are applied.
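
A minimal sketch of the scheme as described (single-byte key chosen from the plaintext, XOR encryption, per-byte parity for integrity); the key-selection rule below is an assumption for illustration:

```python
# Sketch: derive a one-byte key from the plaintext, XOR every byte
# with it, and attach per-byte parity bits to detect tampering.

def parity(byte: int) -> int:
    return bin(byte).count("1") % 2

def encrypt(plaintext: bytes):
    key = plaintext[0]                      # assumed key-selection rule
    cipher = bytes(b ^ key for b in plaintext)
    parities = [parity(b) for b in cipher]  # integrity metadata
    return key, cipher, parities

def decrypt(key: int, cipher: bytes, parities):
    if any(parity(b) != p for b, p in zip(cipher, parities)):
        raise ValueError("integrity check failed: data altered in transit")
    return bytes(b ^ key for b in cipher)   # XOR is its own inverse

key, cipher, parities = encrypt(b"secret message")
assert decrypt(key, cipher, parities) == b"secret message"
```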
48

Rizzo, Thomas G. "Limits on New Fermions from $p\bar p$ Collider Data." Modern Physics Letters A 02, no. 07 (July 1987): 505–11. http://dx.doi.org/10.1142/s0217732387000628.

Abstract:
Recent data from the UA2 Collaboration are used to place new limits on the top quark mass as well as on hypothetical new fermions such as additional light neutrinos, a fourth-generation charged lepton L, a fourth-generation [Formula: see text] down-type quark b′, and the [Formula: see text] isosinglet E6 exotic quark D.
49

Liu, Lei, Yeguo Sun, Yihong Liu, Rachel Edita O. Roxas, and Rodolfo C. Raga. "Research and Implementation of Text Generation Based on Text Augmentation and Knowledge Understanding." Computational Intelligence and Neuroscience 2022 (September 10, 2022): 1–10. http://dx.doi.org/10.1155/2022/2988639.

Abstract:
Text generation has always been limited by the lack of corpus data required for language model (LM) training and by the low quality of the generated text. Researchers have proposed solutions, but these are often complex and greatly increase the consumption of computing resources. Drawing on the current main solutions, this paper proposes a lightweight language model (EDA-BoB) based on text-augmentation technology and a knowledge-understanding mechanism. Experiments show that the EDA-BoB model can not only expand the scale of the training data set but also ensure data quality, at the cost of consuming few computing resources. Moreover, our model is shown to combine the contextual semantics of sentences to generate rich and accurate texts.
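
The "EDA" in the model name refers to Easy Data Augmentation; a minimal sketch of its classic operations follows, with a toy synonym table standing in for a real thesaurus:

```python
# Minimal sketch of Easy Data Augmentation (EDA) operations:
# synonym replacement, random swap, and random deletion.
import random

SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "joyful"]}  # toy

def synonym_replacement(words, p=0.3):
    out = list(words)
    for i, w in enumerate(out):
        if w in SYNONYMS and random.random() < p:
            out[i] = random.choice(SYNONYMS[w])
    return out

def random_swap(words):
    out = list(words)
    i, j = random.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p=0.1):
    kept = [w for w in words if random.random() > p]
    return kept or [random.choice(words)]   # never delete everything

sentence = "the quick brown fox is happy".split()
print(" ".join(random_deletion(synonym_replacement(random_swap(sentence)))))
```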
50

Chen, Yiwen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, and Guosheng Lin. "IT3D: Improved Text-to-3D Generation with Explicit View Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (March 24, 2024): 1237–44. http://dx.doi.org/10.1609/aaai.v38i2.27886.

Abstract:
Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs). Nonetheless, existing Text-to-3D approaches often grapple with challenges such as over-saturation, inadequate detailing, and unrealistic outputs. This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images based on the renderings of coarse 3D models. Although the generated images mostly alleviate the aforementioned issues, challenges such as view inconsistency and significant content variance persist due to the inherent generative nature of large diffusion models, posing extensive difficulties in leveraging these images effectively. To overcome this hurdle, we advocate integrating a discriminator alongside a novel Diffusion-GAN dual training strategy to guide the training of 3D models. For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data. We conduct a comprehensive set of experiments that demonstrate the effectiveness of our method over baseline approaches.
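
The discriminator update described above (diffusion-synthesized multi-view images labeled "real", renderings of the current 3D model labeled "fake") can be sketched as a standard GAN loss; the tiny network and tensor shapes are illustrative only:

```python
# Sketch of the Diffusion-GAN discriminator step: synthesized views
# count as real data, 3D-model renderings count as fake data.
import torch
import torch.nn as nn

discriminator = nn.Sequential(        # illustrative tiny conv classifier
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 4, stride=2, padding=1), nn.Flatten(),
    nn.Linear(16 * 16, 1),            # 64x64 input -> 16x16 feature map
)
bce = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

synthesized_views = torch.rand(4, 3, 64, 64)   # "real": LDM-refined images
renderings = torch.rand(4, 3, 64, 64)          # "fake": current 3D renders

logits_real = discriminator(synthesized_views)
logits_fake = discriminator(renderings)
loss = bce(logits_real, torch.ones_like(logits_real)) + \
       bce(logits_fake, torch.zeros_like(logits_fake))
opt.zero_grad(); loss.backward(); opt.step()
```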
