Academic literature on the topic 'Text document classification'
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Text document classification.'
Journal articles on the topic "Text document classification"
Mukherjee, Indrajit, Prabhat Kumar Mahanti, Vandana Bhattacharya, and Samudra Banerjee. "Text classification using document-document semantic similarity." International Journal of Web Science 2, no. 1/2 (2013): 1. http://dx.doi.org/10.1504/ijws.2013.056572.
Cheng, Betty Yee Man, Jaime G. Carbonell, and Judith Klein-Seetharaman. "Protein classification based on text document classification techniques." Proteins: Structure, Function, and Bioinformatics 58, no. 4 (January 11, 2005): 955–70. http://dx.doi.org/10.1002/prot.20373.
Yao, Liang, Chengsheng Mao, and Yuan Luo. "Graph Convolutional Networks for Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 7370–77. http://dx.doi.org/10.1609/aaai.v33i01.33017370.
Kim, Jiyun, and Han-joon Kim. "Multidimensional Text Warehousing for Automated Text Classification." Journal of Information Technology Research 11, no. 2 (April 2018): 168–83. http://dx.doi.org/10.4018/jitr.2018040110.
P, Ashokkumar, Siva Shankar G, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, and Thippa Reddy Gadekallu. "A Two-stage Text Feature Selection Algorithm for Improving Text Classification." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 3 (May 2021): 1–19. http://dx.doi.org/10.1145/3425781.
Zheng, Jianming, Yupu Guo, Chong Feng, and Honghui Chen. "A Hierarchical Neural-Network-Based Document Representation Approach for Text Classification." Mathematical Problems in Engineering 2018 (2018): 1–10. http://dx.doi.org/10.1155/2018/7987691.
Lee, Kangwook, Sanggyu Han, and Sung-Hyon Myaeng. "A discourse-aware neural network-based text model for document-level text classification." Journal of Information Science 44, no. 6 (December 4, 2017): 715–35. http://dx.doi.org/10.1177/0165551517743644.
Rahamat Basha, S., J. Keziya Rani, and J. J. C. Prasad Yadav. "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy." Engineering, Technology & Applied Science Research 9, no. 6 (December 1, 2019): 5001–5. http://dx.doi.org/10.48084/etasr.3173.
M.Shaikh, Mustafa, Ashwini A. Pawar, and Vibha B. Lahane. "Pattern Discovery Text Mining for Document Classification." International Journal of Computer Applications 117, no. 1 (May 20, 2015): 6–12. http://dx.doi.org/10.5120/20516-2101.
Elhadad, Mohamed, Khaled Badran, and Gouda Salama. "Towards Ontology-Based web text Document Classification." Journal of Engineering Science and Military Technologies 17, no. 17 (April 1, 2017): 1–8. http://dx.doi.org/10.21608/ejmtc.2017.21564.
Dissertations / Theses on the topic "Text document classification"
Mondal, Abhro Jyoti. "Document Classification using Characteristic Signatures." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1511793852923472.
Sendur, Zeynel. "Text Document Categorization by Machine Learning." Scholarly Repository, 2008. http://scholarlyrepository.miami.edu/oa_theses/209.
Blein, Florent. "Automatic Document Classification Applied to Swedish News." Thesis, Linköping University, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-3065.
The first part of this paper briefly presents the ELIN system, an electronic newspaper project. ELIN is a framework that stores news items and displays them to the end user. The news items are formatted in XML. The project partner Corren provided ELIN with XML articles, but in a different format. My first task was to develop software that converts the news from one XML format (Corren) to another (ELIN).
The second and main part addresses the problem of automatic document classification and seeks a solution to a specific issue. The goal is to automatically classify news articles from a Swedish newspaper company (Corren) into the IPTC news categories.
This work was carried out by implementing several classification algorithms, testing them, and comparing their accuracy with that of existing software. The training and test documents were 3 weeks of the Corren newspaper that had to be classified into 2 categories.
The last tests were run with only one algorithm (Naïve Bayes) over a larger amount of data (7, then 10 weeks) and more categories (12) to simulate a more realistic environment.
The results show that the Naïve Bayes algorithm, although the oldest, was the most accurate in this particular case. An issue raised by the results is that feature selection improves speed but can, in rare cases, reduce accuracy by removing too many features.
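As an illustration of the kind of classifier this abstract reports as most accurate, the following is a minimal sketch of multinomial Naïve Bayes with add-one smoothing in Python. The toy English documents, the two category names, and the function names `train_naive_bayes` and `predict` are assumptions made for this example; they are not taken from the thesis, which worked on Swedish news articles and IPTC categories.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    class_counts = Counter(labels)        # documents per class
    word_counts = defaultdict(Counter)    # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()     # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for `text`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue                                 # ignore unseen words
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
docs = [
    "the team won the match",
    "the striker scored a goal",
    "the bank raised interest rates",
    "shares fell on the stock market",
]
labels = ["sport", "sport", "economy", "economy"]
model = train_naive_bayes(docs, labels)
print(predict(model, "a late goal won the match"))
```

Removing vocabulary entries before training would shrink the inner loop, which is the speed benefit of feature selection the abstract mentions; removing too many would leave little evidence for `predict` to use.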
Anne, Chaitanya. "Advanced Text Analytics and Machine Learning Approach for Document Classification." ScholarWorks@UNO, 2017. http://scholarworks.uno.edu/td/2292.
Alsaad, Amal. "Enhanced root extraction and document classification algorithm for Arabic text." Thesis, Brunel University, 2016. http://bura.brunel.ac.uk/handle/2438/13510.
McElroy, Jonathan David. "Automatic Document Classification in Small Environments." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/682.
Felhi, Mehdi. "Document image segmentation : content categorization." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0109/document.
In this thesis I discuss the document image segmentation problem and describe our new approaches for detecting and classifying document contents. First, I discuss our skew angle estimation approach, which aims to estimate the skew angle of text in document images automatically and precisely. Our first method is based on Maximum Gradient Difference (MGD) and R-signature. I then describe our second method, which is based on the Ridgelet transform.
Our second contribution is a new hybrid page segmentation approach. I first describe our stroke-based descriptor, which detects text and line candidates using the skeleton of the binarized document image. Then, an active contour model is applied to segment the rest of the image into photo and background regions. Finally, text candidates are clustered by size using mean-shift analysis. The method is applied to segmenting scanned document images (newspapers and magazines) that contain text, lines, and photo regions.
Finally, I describe our stroke-based text extraction method. Our approach begins by extracting connected components and selecting text character candidates over the CIE LCH color space using the Histogram of Oriented Gradients (HOG) correlation coefficients in order to detect low-contrast regions. The text region candidates are clustered using two different approaches: a depth-first search over a graph and a stable text line criterion. Finally, the resulting regions are refined by classifying the text line candidates into "text" and "non-text" regions using a Kernel Support Vector Machine (K-SVM) classifier.
Wang, Yanbo Justin. "Language-independent pre-processing of large document bases for text classification." Thesis, University of Liverpool, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445960.
Wang, Yalin. "Document analysis : table structure understanding and zone content classification /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6079.
Wei, Zhihua. "The research on chinese text multi-label classification." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20025/document.
The thesis focuses on text classification, a rapidly expanding field with many current and potential applications. Its main contributions concern two points. The first is the specific character of encoding and automatically processing the Chinese language: words may consist of one, two, or three characters; there is no typographic separation between words; and a sentence admits a large number of possible word orders, all of which leads to difficult ambiguity problems. Encoding text as "n-grams" (sequences of n = 1, 2, or 3 characters) is particularly well suited to Chinese, because it is fast and requires neither a prior dictionary-based word recognition step nor word segmentation. The second is multi-label classification, that is, the case where each individual can be assigned to one or more classes. For texts, the classes sought correspond to topics, and a single text may be attached to one or more topics. This multi-label approach is more general: a single patient may suffer from several pathologies; a single company may be active in several industrial or service sectors. The thesis analyzes these problems and attempts to provide solutions, first for single-label classifiers and then for multi-label ones. The difficulties include defining the variables that characterize the texts, the large number of such variables, the handling of sparse tables (many zeros in the text-by-descriptor matrix), and the relatively poor performance of the usual multi-class classifiers.
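Text classification is an important research area in information science with considerable practical value. As the content handled by text classification grows more complex and diverse, classification goals have also diversified; developing effective text classification techniques that meet real application needs has become a challenging task, and research on multi-label classification has emerged in response. Building on an analysis of a large number of single-label and multi-label text classification algorithms, this thesis addresses the high dimensionality of features in text representation, data sparsity, and the high complexity and low accuracy of multi-label classification. It attempts to solve these problems from different angles using rough set theory and proposes corresponding algorithms. The main contributions are as follows.
To address the curse of dimensionality that arises when n-grams are used as Chinese text features, a two-step feature selection method is proposed that combines removal of rare within-class features with between-class feature selection. Extensive experiments on large-scale Chinese corpora examine the choice of n, the choice of feature weights, and feature correlation when n-grams are used as features, yielding some useful conclusions.
To address the low efficiency and high cost of classification caused by high-dimensional text representations, a multi-label text classification algorithm based on the LDA model is proposed, which uses the topics extracted by the LDA model as text features to build an efficient classifier. Under the PT3 multi-label transformation method, the algorithm performs well on both Chinese and English datasets, on par with the multi-label classification methods currently regarded as the best.
To address the arbitrariness of existing smoothing strategies for the LDA model, a smoothing strategy for the LDA language model based on tolerance rough sets is proposed. It first constructs tolerance classes of words over the global vocabulary, then assigns smoothing values to the out-of-vocabulary words of each document class according to the frequencies of the words in the tolerance classes. Extensive experiments on Chinese and English, balanced and imbalanced corpora show that this smoothing method significantly improves the classification performance of the LDA model, especially on imbalanced corpora.
To address the high complexity and low accuracy of multi-label classification, a composite multi-label text classification framework based on variable precision rough sets is proposed. The framework partitions the text feature space using the variable precision rough set method, and thereby decomposes the multi-label classification problem into several two-class single-label problems and several multi-label problems with fewer labels. That is, when an unknown text falls in the lower approximation region of a class, a simple single-label classifier can determine its category directly; when it falls in the boundary region, the multi-label classifier for that region is used. Experiments show that both classification accuracy and algorithmic efficiency improve considerably under this framework.
The thesis also designs and implements a system for visualizing web search results based on multi-label classification (MLWC). The system directly consumes the results returned by a search engine and applies an improved Naïve Bayes multi-label classification algorithm to classify them in real time, so that users can quickly locate the texts of interest among the search results.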
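The character n-gram encoding described in this abstract (sequences of 1, 2, or 3 characters, with no dictionary lookup or word segmentation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `char_ngrams`, the choice to drop whitespace, and the sample string are made up for the example and are not taken from the thesis.

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """Count overlapping character n-grams for each n in `n_values`."""
    # Dropping whitespace is an assumption; Chinese text normally has
    # no spaces between words, so segmentation is not needed here.
    chars = [c for c in text if not c.isspace()]
    counts = Counter()
    for n in n_values:
        for i in range(len(chars) - n + 1):
            counts["".join(chars[i:i + n])] += 1
    return counts


# Hypothetical sample: the four characters of "text classification".
features = char_ngrams("文本分类")
print(sorted(features))
```

A four-character string yields four 1-grams, three 2-grams, and two 3-grams, which hints at why the feature space grows quickly on a real corpus and why the abstract pairs n-grams with feature selection.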
Books on the topic "Text document classification"
Automatic indexing and abstracting of document texts. Boston: Kluwer Academic Publishers, 2000.
Meister, Burkhardt W. The German Limited Liability Company: An introduction to the Act on limited liability companies with German/English text, synoptically arranged, of the act, a sample of articles of association, samples of the other formation documents of the company, the classification of the balance sheet and the profit and loss statement of a company and an extract from the commercial register = Die deutsche Gesellschaft mit beschränkter Haftung : eine Einführung zum Gesetz betreffend die Gesellschaften mit beschränkter Haftung mit synoptisch angeordnetem deutsch/englischem Text des Gesetzes, eines Gesellschaftsvertrages, der sonstigen Gründungsdokumente einer Bilanz und der Gewinn- und Verlustrechnung einer Gesellschaft und eines Auszugs aus dem Handelsregister. 7th ed. München: Beck, 2010.
Intelligent Text Categorization And Clustering. Springer, 2008.
Book chapters on the topic "Text document classification"
Guthrie, Louise, Joe Guthrie, and James Leistensnider. "Document Classification and Routing." In Text, Speech and Language Technology, 289–310. Dordrecht: Springer Netherlands, 1999. http://dx.doi.org/10.1007/978-94-017-2388-6_12.
Huang, Chaochao, Xipeng Qiu, and Xuanjing Huang. "Text Classification with Document Embeddings." In Lecture Notes in Computer Science, 131–40. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-12277-9_12.
Hrala, Michal, and Pavel Král. "Multi-label Document Classification in Czech." In Text, Speech, and Dialogue, 343–51. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40585-3_44.
Kralicek, Jiri, and Jiri Matas. "Fast Text vs. Non-text Classification of Images." In Document Analysis and Recognition – ICDAR 2021, 18–32. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86337-1_2.
Král, Pavel, and Ladislav Lenc. "Confidence Measure for Czech Document Classification." In Computational Linguistics and Intelligent Text Processing, 525–34. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-18117-2_39.
Penha, Gustavo, Raphael Campos, Sérgio Canuto, Marcos André Gonçalves, and Rodrygo L. T. Santos. "Document Performance Prediction for Automatic Text Classification." In Lecture Notes in Computer Science, 132–39. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-15719-7_17.
Howland, Peg, and Haesun Park. "Cluster-Preserving Dimension Reduction Methods for Document Classification." In Survey of Text Mining II, 3–23. London: Springer London, 2008. http://dx.doi.org/10.1007/978-1-84800-046-9_1.
Xia, Zhonghang, Guangming Xing, Houduo Qi, and Qi Li. "Applications of Semidefinite Programming in XML Document Classification." In Survey of Text Mining II, 129–44. London: Springer London, 2008. http://dx.doi.org/10.1007/978-1-84800-046-9_7.
Gelbukh, Alexander, Grigori Sidorov, and Adolfo Guzman-Arénas. "Use of a Weighted Topic Hierarchy for Document Classification." In Text, Speech and Dialogue, 133–38. Berlin, Heidelberg: Springer Berlin Heidelberg, 1999. http://dx.doi.org/10.1007/3-540-48239-3_24.
Lehečka, Jan, and Jan Švec. "Improving Multi-label Document Classification of Czech News Articles." In Text, Speech, and Dialogue, 307–15. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24033-6_35.
Conference papers on the topic "Text document classification"
Alshamari, Fatimah, and Abdou Youssef. "A Study into Math Document Classification using Deep Learning." In 8th International Conference on Computational Science and Engineering (CSE 2020). AIRCC Publishing Corporation, 2020. http://dx.doi.org/10.5121/csit.2020.101702.
Wu, Qin, Eddie Fuller, and Cun-Quan Zhang. "Text Document Classification and Pattern Recognition." In 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM). IEEE, 2009. http://dx.doi.org/10.1109/asonam.2009.21.
Chen, ZhiHang, Liping Huang, and Yi L. Murphey. "Incremental Learning for Text Document Classification." In 2007 International Joint Conference on Neural Networks. IEEE, 2007. http://dx.doi.org/10.1109/ijcnn.2007.4371367.
Zhang, Haopeng, and Jiawei Zhang. "Text Graph Transformer for Document Classification." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.emnlp-main.668.
Simske, Steven J., and Rafael Lins. "Automatic Text Summarization and Classification." In DocEng '18: ACM Symposium on Document Engineering 2018. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3209280.3232791.
Lima, João Marcos Carvalho, and José Everardo Bessa Maia. "A Topical Word Embeddings for Text Classification." In XV Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2018. http://dx.doi.org/10.5753/eniac.2018.4401.
Thi Xuan Lam, Thanh, Anh Duc Le, and Masaki Nakagawa. "User Interface for Text and Non-Text Classification." In 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW). IEEE, 2019. http://dx.doi.org/10.1109/icdarw.2019.20044.
"Contextual Latent Semantic Networks used for Document Classification." In Special Session on Text Mining. SciTePress - Science and Technology Publications, 2012. http://dx.doi.org/10.5220/0004109304250430.
Zhao, Miao, Rui-Qi Wang, Fei Yin, Xu-Yao Zhang, Lin-Lin Huang, and Jean-Marc Ogier. "Fast Text/non-Text Image Classification with Knowledge Distillation." In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019. http://dx.doi.org/10.1109/icdar.2019.00234.
Jiang, Suqi, Jason Lewris, Michael Voltmer, and Hongning Wang. "Integrating rich document representations for text classification." In 2016 Systems and Information Engineering Design Symposium (SIEDS). IEEE, 2016. http://dx.doi.org/10.1109/sieds.2016.7489319.
Reports on the topic "Text document classification"
Idakwo, Gabriel, Sundar Thangapandian, Joseph Luttrell, Zhaoxian Zhou, Chaoyang Zhang, and Ping Gong. Deep learning-based structure-activity relationship modeling for multi-category toxicity classification : a case study of 10K Tox21 chemicals with high-throughput cell-based androgen receptor bioassay data. Engineer Research and Development Center (U.S.), July 2021. http://dx.doi.org/10.21079/11681/41302.