Journal articles on the topic 'Automated information extraction'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Automated information extraction.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Adefowoke Ojokoh, Bolanle, Olumide Sunday Adewale, and Samuel Oluwole Falaki. "Automated document metadata extraction." Journal of Information Science 35, no. 5 (June 11, 2009): 563–70. http://dx.doi.org/10.1177/0165551509105195.

Abstract:
Web documents are available in various forms, most of which do not carry additional semantics. This paper presents a model for general document metadata extraction. The model, which combines segmentation by keywords and pattern matching techniques, was implemented using PHP, MySQL, JavaScript and HTML. The system was tested with 40 randomly selected PDF documents (mainly theses). The system was evaluated using standard measures, namely precision, recall, accuracy and F-measure. The results show that the model is relatively effective for the task of metadata extraction, especially for theses and dissertations. A combination of machine learning with these rule-based methods will be explored in the future for better results.
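To picture the kind of rule-based pipeline this paper describes, here is a minimal Python sketch that combines keyword segmentation with pattern matching. The keyword and pattern choices are illustrative assumptions only; the authors' system was implemented in PHP and its actual rules are not reproduced here.

import re

def extract_metadata(text):
    """Toy keyword-segmentation and pattern-matching metadata extractor."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    meta = {"title": lines[0] if lines else ""}
    # Pattern matching: take a 'by ...' line near the top as the author.
    for ln in lines[1:6]:
        m = re.match(r"(?i)^by\s+(.+)$", ln)
        if m:
            meta["author"] = m.group(1)
    # Keyword segmentation: the span between 'Abstract' and 'Introduction'
    # is treated as the abstract.
    m = re.search(r"(?is)\babstract\b[:\s]*(.+?)(?:\bintroduction\b|\Z)", text)
    if m:
        meta["abstract"] = " ".join(m.group(1).split())
    return meta

print(extract_metadata("A Study of X\nby Jane Doe\nAbstract: We study X.\nIntroduction ..."))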
2

Musaev, Alexander A., and Dmitry A. Grigoriev. "Technologies for Automatic Knowledge Extraction from Poorly Structured Information for Management Tasks in Unstable Immersion Environments." Bulletin of the Saint Petersburg State Institute of Technology (Technical University) 63 (2022): 68–77. http://dx.doi.org/10.36807/1998-9849-2022-63-89-68-77.

Abstract:
The problem of automatic knowledge extraction from poorly structured text data is considered. The application context is proactive management in unstable immersion environments. A brief overview and critical analysis of the current state of knowledge extraction technologies for text messages are presented. The task of extracting knowledge from textual information is formally stated. The structures of an automated system for preprocessing text documents and a training data testbed were developed. Options for creating search and statistical technologies for extracting knowledge from text messages are presented.
3

Andrade, Miguel A., and Peer Bork. "Automated extraction of information in molecular biology." FEBS Letters 476, no. 1-2 (June 26, 2000): 12–17. http://dx.doi.org/10.1016/s0014-5793(00)01661-6.

4

Townsend, Joe A., Sam E. Adams, Christopher A. Waudby, Vanessa K. de Souza, Jonathan M. Goodman, and Peter Murray-Rust. "Chemical documents: machine understanding and automated information extraction." Organic & Biomolecular Chemistry 2, no. 22 (2004): 3294. http://dx.doi.org/10.1039/b411033a.

5

Cemus, Karel, and Tomas Cerny. "Automated extraction of business documentation in enterprise information systems." ACM SIGAPP Applied Computing Review 16, no. 4 (January 13, 2017): 5–13. http://dx.doi.org/10.1145/3040575.3040576.

6

Valls-Vargas, Josep, Jichen Zhu, and Santiago Ontanon. "Error Analysis in an Automated Narrative Information Extraction Pipeline." IEEE Transactions on Computational Intelligence and AI in Games 9, no. 4 (December 2017): 342–53. http://dx.doi.org/10.1109/tciaig.2016.2575823.

7

Guan, Haiyan, Jonathan Li, Yongtao Yu, Michael Chapman, and Cheng Wang. "Automated Road Information Extraction From Mobile Laser Scanning Data." IEEE Transactions on Intelligent Transportation Systems 16, no. 1 (February 2015): 194–205. http://dx.doi.org/10.1109/tits.2014.2328589.

8

Cook, Tessa S., Stefan Zimmerman, Andrew D. A. Maidment, Woojin Kim, and William W. Boonn. "Automated Extraction of Radiation Dose Information for CT Examinations." Journal of the American College of Radiology 7, no. 11 (November 2010): 871–77. http://dx.doi.org/10.1016/j.jacr.2010.06.026.

9

Grant, Gerry H., and Sumali J. Conlon. "EDGAR Extraction System: An Automated Approach to Analyze Employee Stock Option Disclosures." Journal of Information Systems 20, no. 2 (September 1, 2006): 119–42. http://dx.doi.org/10.2308/jis.2006.20.2.119.

Abstract:
Past alternative accounting choices and new accounting standards for stock options have hindered analysts' ability to compare corporate financial statements. Financial analysts need specific information about stock options in order to accurately assess the financial position of companies. Finding this information is often a tedious task. The SEC's EDGAR database is the richest source of financial statement information on the Web. However, the information is stored in text or HTML files, making it difficult to search and extract data. Information Extraction (IE), the process of finding and extracting useful information in unstructured text, can effectively help users find vital financial information. This paper examines the development and use of the EDGAR Extraction System (EES), a customized, automated system that extracts relevant information about employee stock options from financial statement disclosure notes in the EDGAR database.
10

Reimeier, Fabian, Dominik Röpert, Anton Güntsch, Agnes Kirchhoff, and Walter G. Berendsohn. "Service-based information extraction from herbarium specimens." Biodiversity Information Science and Standards 2 (May 21, 2018): e25415. http://dx.doi.org/10.3897/biss.2.25415.

Abstract:
On herbarium sheets, data elements such as plant name, collection site, collector, barcode and accession number are found mostly on labels glued to the sheet. The data are thus visible on specimen images. With continuously improving technologies for collection mass-digitisation it has become easier and easier to produce high quality images of herbarium sheets and in the last few years herbarium collections worldwide have started to digitize specimens on an industrial scale (Tegelberg et al. 2014). To use the label data contained in these massive numbers of images, they have to be captured and databased. Currently, manual data entry prevails and forms the principal cost and time limitation in the digitization process. The StanDAP-Herb Project has developed a standard process for (semi-) automatic detection of data on herbarium sheets. This is a formal extensible workflow integrating a wide range of automated specimen image analysis services, used to replace time-consuming manual data input as far as possible. We have created web-services for OCR (Optical Character Recognition); for identifying regions of interest in specimen images and for the context-sensitive extraction of information from text recognized by OCR. We implemented the workflow as an extension of the OpenRefine platform (Verborgh and De Wilde 2013).
11

Manjunath, Akanksh Aparna, Manjunath Sudhakar Nayak, Santhanam Nishith, Satish Nitin Pandit, Shreyas Sunkad, Pratiba Deenadhayalan, and Shobha Gangadhara. "Automated invoice data extraction using image processing." IAES International Journal of Artificial Intelligence (IJ-AI) 12, no. 2 (June 1, 2023): 514. http://dx.doi.org/10.11591/ijai.v12.i2.pp514-521.

Abstract:
Manually processing invoices which are in the form of scanned photocopies is a time-consuming process. There is a need to automate the task of extracting data from invoices with a similar format. In this paper, we investigate and analyse various techniques of image processing and text extraction to improve the results of the optical character recognition (OCR) engine, which is applied to extract the text from the invoice. This paper also proposes the design and implementation of a web-enabled invoice processing system (IPS). The IPS consists of an annotation tool and an extraction tool. The annotation tool is used to mark the fields of interest in the invoice which are to be extracted. The extraction tool makes use of open-source computer vision library (OpenCV) algorithms to detect text. The proposed system was tested on more than 25 types of invoices, with the average accuracy score lying between 85% and 95%. Finally, to provide ease of use, a web application is developed which also presents the results in a structured format. The entire system is designed to provide flexibility and automate the process of extracting details of interest from invoices.
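As a rough sketch of the OCR step described above, the following Python snippet reads an annotated invoice region, applies simple OpenCV pre-processing, and passes it to the Tesseract engine via pytesseract. The box coordinates and pre-processing choices are assumptions for illustration, not the IPS implementation.

import cv2
import pytesseract  # needs the Tesseract OCR binary installed

def extract_field(image_path, box):
    """OCR one annotated region (x, y, w, h) of a scanned invoice."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Denoise and binarise before OCR, in the spirit of the paper's
    # image-processing stage.
    img = cv2.GaussianBlur(img, (3, 3), 0)
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    x, y, w, h = box
    return pytesseract.image_to_string(img[y:y + h, x:x + w]).strip()

# Usage: the box would come from the paper's annotation tool, e.g.
# extract_field("invoice.png", (120, 80, 300, 40))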
12

Kang, SungKu, Lalit Patil, Arvind Rangarajan, Abha Moitra, Tao Jia, Dean Robinson, and Debasish Dutta. "Automated feedback generation for formal manufacturing rule extraction." Artificial Intelligence for Engineering Design, Analysis and Manufacturing 33, no. 3 (March 19, 2019): 289–301. http://dx.doi.org/10.1017/s0890060419000027.

Abstract:
Manufacturing knowledge is maintained primarily in unstructured text in industry. To facilitate the reuse of the knowledge, previous efforts have utilized Natural Language Processing (NLP) to classify manufacturing documents or to extract structured knowledge (e.g. ontology) from manufacturing text. On the other hand, extracting more complex knowledge, such as manufacturing rules, has not been feasible in a practical scenario, as standard NLP techniques cannot address input text that needs validation. Specifically, if the input text contains information irrelevant to the rule definition or a semantically invalid expression, standard NLP techniques cannot selectively derive precise information for the extraction of the desired formal manufacturing rule. To address the gap, we developed a feedback generation method based on Constraint-based Modeling (CBM) coupled with NLP and domain ontology, designed to support formal manufacturing rule extraction. Specifically, the developed method identifies the necessity of input text validation based on predefined constraints and provides relevant feedback to help the user modify the input text, so that the desired rule can be extracted. We proved the feasibility of the method by extending the previously implemented formal rule extraction framework. The effectiveness of the method is demonstrated by enabling the extraction of correct manufacturing rules from all the cases that need input text validation, about 30% of the dataset, after modifying the input text based on the feedback. We expect the feedback generation method will contribute to the adoption of semantics-based technology in the manufacturing field, by facilitating precise knowledge acquisition from manufacturing-related documents in a practical scenario.
13

Alim, Sophia. "Automated Data Extraction from Online Social Network Profiles." International Journal of Virtual Communities and Social Networking 5, no. 4 (October 2013): 24–42. http://dx.doi.org/10.4018/ijvcsn.2013100102.

Abstract:
As the use of online social networking (OSN) sites is increasing, data extraction from OSN profiles is providing researchers with a rich source of data. Data extraction is divided into non-automated and automated approaches. However, researchers face a variety of ethical challenges especially using automated data extraction approaches. In social networking, there has been a lack of research that looks into the unique ethical challenges of using automated data extraction compared to non-automated extraction. This article explores the history of social research ethics and the unique ethical challenges associated with using automated data extraction, as well as how these impact the researcher. The author's review has highlighted that researchers face challenges when designing an experiment involving automated extraction from OSN profiles due to issues such as extraction methods, the speed at which the field of social media is moving and a lack of information on how to deal with ethical challenges.
14

La Salle, John, Quentin Wheeler, Paul Jackway, Shaun Winterton, Donald Hobern, and David Lovell. "Accelerating taxonomic discovery through automated character extraction." Zootaxa 2217, no. 1 (September 2, 2009): 43–55. http://dx.doi.org/10.11646/zootaxa.2217.1.3.

Abstract:
This paper discusses the following key messages. Taxonomy is (and taxonomists are) more important than ever in times of global change. Taxonomic endeavour is not occurring fast enough: in 250 years since the creation of the Linnean Systema Naturae, only about 20% of Earth’s species have been named. We need fundamental changes to the taxonomic process and paradigm to increase taxonomic productivity by orders of magnitude. Currently, taxonomic productivity is limited principally by the rate at which we capture and manage morphological information to enable species discovery. Many recent (and welcomed) initiatives in managing and delivering biodiversity information and accelerating the taxonomic process do not address this bottleneck. Development of computational image analysis and feature extraction methods is a crucial missing capacity needed to enable taxonomists to overcome the taxonomic impediment in a meaningful time frame.
15

Mori, Tatsunori, Atsushi Fujioka, and Ichiro Murata. "Automated Extraction of Statistical Expressions from Text for Information Compilation." Transactions of the Japanese Society for Artificial Intelligence 23 (2008): 310–18. http://dx.doi.org/10.1527/tjsai.23.310.

16

Li, Zhixia, Madhav V. Chitturi, Andrea R. Bill, Dongxi Zheng, and David A. Noyce. "Automated Extraction of Horizontal Curve Information for Low-Volume Roads." Transportation Research Record: Journal of the Transportation Research Board 2472, no. 1 (January 2015): 172–84. http://dx.doi.org/10.3141/2472-20.

17

Radhi, Abdul Kareem M. "Construction of Automated System for Information Extraction and Text Categorization." Journal of Al-Nahrain University Science 11, no. 3 (December 1, 2008): 156–74. http://dx.doi.org/10.22401/jnus.11.3.20.

18

Zhou, Peng, and Nora El-Gohary. "Ontology-based automated information extraction from building energy conservation codes." Automation in Construction 74 (February 2017): 103–17. http://dx.doi.org/10.1016/j.autcon.2016.09.004.

19

Lacson, Ronilda, Martha E. Goodrich, Kimberly Harris, Phyllis Brawarsky, and Jennifer S. Haas. "Assessing Inaccuracies in Automated Information Extraction of Breast Imaging Findings." Journal of Digital Imaging 30, no. 2 (November 14, 2016): 228–33. http://dx.doi.org/10.1007/s10278-016-9927-4.

20

Shokri, D., H. Rastiveis, A. Shams, and W. A. Sarasua. "Utility Poles Extraction from Mobile LiDAR Data in Urban Area Based on Density Information." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4/W18 (October 19, 2019): 1001–7. http://dx.doi.org/10.5194/isprs-archives-xlii-4-w18-1001-2019.

Abstract:
Utility poles located along roads play a key role in road safety and planning as well as communications and electricity distribution. In this regard, new sensing technologies such as Mobile Terrestrial Laser Scanner (MTLS) could be an efficient method to detect utility poles and other planimetric objects along roads. However, due to the vast amount of data collected by MTLS in the form of a point cloud, automated techniques are required to extract objects from this data. This study proposes a novel method for automatic extraction of utility poles from MTLS point clouds. The proposed algorithm is composed of three consecutive steps: pre-processing, cable area detection, and pole extraction. The point cloud is first pre-processed, and then candidate areas for utility poles are specified based on the Hough Transform (HT). Utility poles are extracted by applying horizontal and vertical density information to these areas. The performance of the method was evaluated on a sample point cloud, and 98% accuracy was achieved in extracting utility poles using the proposed method.
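A minimal Python sketch of the density idea (omitting the Hough-transform stage for cable areas): rasterise the cloud into a horizontal grid and flag cells whose point count and vertical extent look pole-like. The cell size and thresholds are illustrative assumptions, not the paper's parameters.

import numpy as np

def candidate_pole_cells(points, cell=0.25, min_pts=40, min_height=4.0):
    """points: (N, 3) array of x, y, z from an MTLS point cloud."""
    xy = np.floor(points[:, :2] / cell).astype(int)
    keys, inverse = np.unique(xy, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)
    z = points[:, 2]
    zmin = np.full(len(keys), np.inf)
    zmax = np.full(len(keys), -np.inf)
    np.minimum.at(zmin, inverse, z)  # per-cell lowest point
    np.maximum.at(zmax, inverse, z)  # per-cell highest point
    mask = (counts >= min_pts) & ((zmax - zmin) >= min_height)
    return keys[mask] * cell  # lower-left corners of candidate cells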
21

Lauriston, Andy. "Automatic recognition of complex terms." Terminology 1, no. 1 (January 1, 1994): 147–70. http://dx.doi.org/10.1075/term.1.1.11lau.

Abstract:
While the term-extraction decisions made by terminologists are based primarily on semantic and pragmatic criteria, automated processes have barely started operating at these levels of linguistic analysis. This paper discusses the graphic, lexical, syntactic and semantic difficulties encountered in automated text processing in general and emphasizes in particular certain specific problems that arise in the automatic recognition of complex terms. In order to illustrate the current limitations of existing systems, the article goes on to describe TERMINO, a morphosyntactic text-analysis system developed to help in French-language term extraction. A quantitative and qualitative assessment is made of the system's performance in recognizing complex terms.
22

Westerlund, Parvaneh, Ingemar Andersson, Tero Päivärinta, and Jörgen Nilsson. "Towards automated pre-ingest workflow for bridging information systems and digital preservation services." Records Management Journal 29, no. 3 (November 18, 2019): 289–304. http://dx.doi.org/10.1108/rmj-05-2018-0011.

Abstract:
Purpose: This paper aims to automate the pre-ingest workflow for preserving digital content, such as records, through middleware that integrates potentially many information systems with potentially several alternative digital preservation services. Design/methodology/approach: This design research approach resulted in a design for model- and component-based software for such a workflow. A proof-of-concept prototype was implemented and demonstrated in the context of a European research project, ForgetIT. Findings: The study identifies design issues of automated pre-ingest for digital preservation while using middleware as a design choice for this purpose. The resulting model and solution suggest functionalities and interaction patterns based on open interface protocols between the source systems of digital content, middleware and digital preservation services. The resulting workflow automates the tasks of fetching digital objects from the source system with metadata extraction, preservation preparation and transfer to a selected preservation service. The proof of concept verified that the suggested model for the pre-ingest workflow and the suggested component architecture were technologically implementable. Future research and development needs to include new solutions to support context-aware preservation management, with increased support for configuring submission agreements as a basis for dynamic automation of pre-ingest and more automated error handling. Originality/value: The paper addresses design issues for middleware as a design choice to support automated pre-ingest in digital preservation. The suggested middleware architecture supports many-to-many relationships between the source information systems and digital preservation services through open interface protocols, thus enabling dynamic digital preservation solutions for records management.
23

Alzubi, Raid, Hadeel Alzoubi, Stamos Katsigiannis, Daune West, and Naeem Ramzan. "Automated Detection of Substance-Use Status and Related Information from Clinical Text." Sensors 22, no. 24 (December 8, 2022): 9609. http://dx.doi.org/10.3390/s22249609.

Abstract:
This study aims to develop and evaluate an automated system for extracting information related to patient substance use (smoking, alcohol, and drugs) from unstructured clinical text (medical discharge records). The authors propose a four-stage system for the extraction of the substance-use status and related attributes (type, frequency, amount, quit-time, and period). The first stage uses a keyword search technique to detect sentences related to substance use and to exclude unrelated records. In the second stage, an extension of the NegEx negation detection algorithm is developed and employed for detecting the negated records. The third stage involves identifying the temporal status of the substance use by applying windowing and chunking methodologies. Finally, in the fourth stage, regular expressions, syntactic patterns, and keyword search techniques are used in order to extract the substance-use attributes. The proposed system achieves an F1-score of up to 0.99 for identifying substance-use-related records, 0.98 for detecting the negation status, and 0.94 for identifying temporal status. Moreover, F1-scores of up to 0.98, 0.98, 1.00, 0.92, and 0.98 are achieved for the extraction of the amount, frequency, type, quit-time, and period attributes, respectively. Natural Language Processing (NLP) and rule-based techniques are employed efficiently for extracting substance-use status and attributes, with the proposed system being able to detect substance-use status and attributes over both sentence-level and document-level data. Results show that the proposed system outperforms the compared state-of-the-art substance-use identification system on an unseen dataset, demonstrating its generalisability.
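The negation stage can be pictured with a toy NegEx-style check in Python: a negation cue preceding a substance term flips the status to negated. The cue and term lists below are tiny illustrative stand-ins for the extended NegEx algorithm the authors developed.

import re

NEGATION_CUES = [r"\bdenies\b", r"\bno\b", r"\bnever\b", r"\bnegative for\b"]
SUBSTANCE_TERMS = [r"\bsmok\w*", r"\btobacco\b", r"\balcohol\b", r"\bdrug\w*"]

def substance_status(sentence):
    s = sentence.lower()
    if not any(re.search(t, s) for t in SUBSTANCE_TERMS):
        return "unrelated"
    # NegEx scopes a cue over the words that follow it; here we simply
    # test whether any substance term appears after a cue.
    for cue in NEGATION_CUES:
        m = re.search(cue, s)
        if m and any(re.search(t, s[m.end():]) for t in SUBSTANCE_TERMS):
            return "negated"
    return "affirmed"

print(substance_status("Patient denies tobacco use."))    # negated
print(substance_status("Patient smokes one pack daily.")) # affirmed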
24

Zhang, Rui, Xiang Lan, Yao Liu, and Qing Yang Liu. "Web Information Extraction and Conversion for Mashup." Applied Mechanics and Materials 556-562 (May 2014): 5471–76. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.5471.

Abstract:
Existing Mashup platforms can only exploit web services that provide APIs, which limits their application scope. This paper analyzes the HTTP protocol and related technologies, and proposes a semi-automated method for Web information extraction and conversion: the WebAPI system. In WebAPI, users first mark pages by hand with browser plug-ins, and then the proxy server grabs the HTTP message flow to obtain the parameters needed. The conversion module of WebAPI analyzes the parameters to create the corresponding Web Service APIs so that Mashup applications can utilize the general Web Service by invoking these interfaces. In this indirect manner, the range of resources that Mashup platforms can utilize is extensively expanded.
25

Grover, Claire, Richard Tobin, Kate Byrne, Matthew Woollard, James Reid, Stuart Dunn, and Julian Ball. "Use of the Edinburgh geoparser for georeferencing digitized historical collections." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368, no. 1925 (August 28, 2010): 3875–89. http://dx.doi.org/10.1098/rsta.2010.0149.

Abstract:
We report on two JISC-funded projects that aimed to enrich the metadata of digitized historical collections with georeferences and other information automatically computed using geoparsing and related information extraction technologies. Understanding location is a critical part of any historical research, and the nature of the collections makes them an interesting case study for testing automated methodologies for extracting content. The two projects (GeoDigRef and Embedding GeoCrossWalk) have looked at how automatic georeferencing of resources might be useful in developing improved geographical search capacities across collections. In this paper, we describe the work that was undertaken to configure the geoparser for the collections as well as the evaluations that were performed.
26

Jamali, A., P. Kumar, and A. Abdul Rahman. "Automated Extraction of Buildings from Aerial LiDAR Point Clouds and Digital Imaging Datasets." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4/W16 (October 1, 2019): 303–8. http://dx.doi.org/10.5194/isprs-archives-xlii-4-w16-303-2019.

Abstract:
LiDAR technology provides a rapid, continuous and cost-effective capability for acquiring 3D geospatial information. In this paper, two automated approaches for extracting building features from integrated aerial LiDAR point cloud and digital imaging datasets are proposed. The assumption of the two approaches is that the LiDAR data can be used to distinguish between high- and low-rise objects, while the multispectral dataset can be used to filter out vegetation from the data. Object-based image analysis techniques are applied to the extracted building objects. The two automated building extraction approaches are tested on a fusion of aerial LiDAR point cloud and digital imaging datasets of Istanbul city. The object-based automated technique presents better results compared to the threshold-based technique for extraction of building objects in terms of visual interpretation.
27

Dogon-yaro, M. A., P. Kumar, A. Abdul Rahman, and G. Buyuksalih. "Extraction of Urban Trees from Integrated Airborne Based Digital Image and LiDAR Point Cloud Datasets - Initial Results." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W1 (October 26, 2016): 81–88. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w1-81-2016.

Abstract:
Timely and accurate acquisition of information on the condition and structural changes of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building up strategies for sustainable development. The conventional techniques used for extracting tree features include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints, such as labour-intensive field work, high financial requirements, and the influence of weather conditions and topographical cover, which can be overcome by means of integrated airborne based LiDAR and very high resolution digital image datasets. This study presented a semi-automated approach for extracting urban trees from integrated airborne based LiDAR and multispectral digital image datasets over Istanbul city of Turkey. The scheme includes detection and extraction of shadow-free vegetation features based on spectral properties of digital images using shadow index and NDVI techniques, and automated extraction of 3D information about vegetation features from the integrated processing of the shadow-free vegetation image and LiDAR point cloud datasets. The developed algorithms show promising results as an automated and cost-effective approach to estimating and delineating 3D information of urban trees. The research also proved that integrated datasets are a suitable technology and a viable source of information for city managers to use in urban tree management.
28

Prokhorov, I. V., O. T. Kochetkov, and A. A. Filatov. "Practical Training of Students on the Extraction and Analysis of Big Data." KnE Social Sciences 3, no. 2 (February 15, 2018): 361. http://dx.doi.org/10.18502/kss.v3i2.1565.

Abstract:
The article deals with the study, development and practical use of complex laboratory work on extracting and analyzing big data in teaching, to train specialists in specialty 10.05.04, 'Information and Analytical Security Systems', training direction 'Information Security of Financial and Economic Structures', within the educational discipline 'Distributed Automated Information Systems'. Keywords: big data, data scientist, extraction, processing and analysis of big data, information security of financial and economic structures, the Internet, Yandex, Google, application programming interface (API).
29

Midhu Bala, G., and K. Chitra. "Data Extraction and Scratching Information Using R." Shanlax International Journal of Arts, Science and Humanities 8, no. 3 (January 1, 2021): 140–44. http://dx.doi.org/10.34293/sijash.v8i3.3588.

Abstract:
Web scraping is the process of automatically extracting multiple web pages from the World Wide Web. It is a field with active developments that shares a common goal with text processing, the semantic web vision, semantic understanding, machine learning, artificial intelligence and human-computer interaction. Current web scraping solutions range from ad hoc approaches requiring human effort to fully automated systems that are able to extract the required unstructured information and convert it into structured information, with limitations. This paper describes a method for developing a web scraper using R programming that locates files on a website and then extracts the filtered data and stores it. The modules used and the algorithm for automating the navigation of a website via links are described in this paper. Further, it can be used for data analytics.
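The paper implements its scraper in R; purely for consistency with the other sketches on this page, here is a hypothetical Python analogue of the same idea: fetch a page, locate its links, and filter them down to the files of interest.

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def scrape_links(url, keyword):
    """Return the link targets on a page whose URL contains `keyword`."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if keyword in a["href"]]

# e.g. scrape_links("https://example.org/data", ".csv")  # hypothetical URL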
30

Wang, Jingyuan, Xinli Hu, Qingyan Meng, Linlin Zhang, Chengyi Wang, Xiangchen Liu, and Maofan Zhao. "Developing a Method to Extract Building 3D Information from GF-7 Data." Remote Sensing 13, no. 22 (November 11, 2021): 4532. http://dx.doi.org/10.3390/rs13224532.

Abstract:
The three-dimensional (3D) information of buildings can describe the horizontal and vertical development of a city. The GaoFen-7 (GF-7) stereo-mapping satellite can provide multi-view and multi-spectral satellite images, which can clearly describe the fine spatial details within urban areas, while the feasibility of extracting building 3D information from GF-7 imagery remains understudied. This article establishes an automated method for extracting building footprints and height information from GF-7 satellite imagery. First, we propose a multi-stage attention U-Net (MSAU-Net) architecture for building footprint extraction from multi-spectral images. Then, we generate the point cloud from the multi-view image and construct a normalized digital surface model (nDSM) to represent the height of off-terrain objects. Finally, the building height is extracted from the nDSM and combined with the results of building footprints to obtain building 3D information. We select Beijing as the study area to test the proposed method, and in order to verify the building extraction ability of MSAU-Net, we choose the GF-7 self-annotated building dataset and a public dataset (WuHan University (WHU) Building Dataset) for model testing, while the accuracy is evaluated in detail through comparison with other models. The results are summarized as follows: (1) In terms of building footprint extraction, our method can achieve intersection-over-union indicators of 89.31% and 80.27% for the WHU Dataset and GF-7 self-annotated datasets, respectively; these values are higher than the results of other models. (2) The root mean square error between the extracted building height and the reference building height is 5.41 m, and the mean absolute error is 3.39 m. In summary, our method could be useful for accurate and automatic 3D building information extraction from GF-7 satellite images, and has good application potential.
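The last step, reading a building's height off the nDSM within its extracted footprint, can be sketched as below. The percentile aggregation is an assumption for illustration; the abstract does not state which statistic the authors use.

import numpy as np

def building_height(ndsm, footprint_mask, percentile=95):
    """ndsm: 2-D array of heights above terrain (DSM minus terrain model);
    footprint_mask: boolean array of the same shape marking one footprint.
    A high percentile rather than the maximum suppresses outlier returns."""
    return float(np.percentile(ndsm[footprint_mask], percentile))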
31

Li, Zhixia, Madhav V. Chitturi, Andrea R. Bill, and David A. Noyce. "Automated Identification and Extraction of Horizontal Curve Information from Geographic Information System Roadway Maps." Transportation Research Record: Journal of the Transportation Research Board 2291, no. 1 (January 2012): 80–92. http://dx.doi.org/10.3141/2291-10.

32

Ganesh, K. M., P. A. R. K. Raju, A. S. Satya Vara Prasad, and D. Ratnagiri. "Automated Mapping of Water Bodies from Resourcesat-2 AWiFS Image Using Automated Algorithm, Nalgonda District, Telangana State, India." International Journal of Engineering & Technology 7, no. 3.31 (August 24, 2018): 224. http://dx.doi.org/10.14419/ijet.v7i3.31.18301.

Abstract:
In the recent past, a lot of research has taken place on surface water features. Surface water includes lakes, ponds, rivers, streams and other exposed inland water bodies. The variation in the spatial extent of these features is a function of rainfall amount, rainfall intensity, etc. over the season/year. Remote sensing provides a lot of data for extracting information about these changes from time to time. Nowadays, satellite image processing is widely used in the extraction of water bodies. Different researchers use various methods to delineate water bodies from satellite imagery varying in spatial, spectral, and temporal characteristics. In FCC, water bodies appear in different hues depending on their physical characteristics, such as depth of water (bottom reflection), turbidity, etc. Water appears dark because it absorbs nearly all infrared radiation, which helps in an easy contrast distinction between water and land in the near-infrared band. Our present work takes an automatic approach to capture water bodies from Resourcesat-2 AWiFS (Advanced Wide-Field Sensor) imagery using an automated surface water body extraction algorithm. The study presents a geospatial analysis of water features extracted for January 2018 over the study area. A geospatial database of water body information has been created from the Resourcesat-2 AWiFS image, using bands of 1.55-1.70 µm (SWIR), 0.77-0.86 µm (NIR), 0.62-0.68 µm (red) and 0.52-0.59 µm (green) for the estimation of the water spread area. The water spread area (WSA) calculated is 37231 ha [1] and [2].
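The authors' exact algorithm is not reproduced here, but band arithmetic of this kind typically takes the form of a normalised difference index. As a representative example, the sketch below computes the standard NDWI from the green and NIR bands and thresholds it to obtain a water mask.

import numpy as np

def water_mask(green, nir, threshold=0.0):
    """NDWI = (green - nir) / (green + nir); water reflects green light and
    absorbs NIR, so NDWI is positive over water pixels."""
    green = green.astype(np.float64)
    nir = nir.astype(np.float64)
    ndwi = (green - nir) / np.maximum(green + nir, 1e-9)  # avoid divide-by-zero
    return ndwi > threshold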
33

Dore, C., and M. Murphy. "Current State of the Art Historic Building Information Modelling." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W5 (August 18, 2017): 185–92. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w5-185-2017.

Abstract:
In an extensive review of existing literature a number of observations were made in relation to the current approaches for recording and modelling existing buildings and environments: Data collection and pre-processing techniques are becoming increasingly automated to allow for near real-time data capture and fast processing of this data for later modelling applications. Current BIM software is almost completely focused on new buildings and has very limited tools and pre-defined libraries for modelling existing and historic buildings. The development of reusable parametric library objects for existing and historic buildings supports modelling with high levels of detail while decreasing the modelling time. Mapping these parametric objects to survey data, however, is still a time-consuming task that requires further research. Promising developments have been made towards automatic object recognition and feature extraction from point clouds for as-built BIM. However, results are currently limited to simple and planar features. Further work is required for automatic accurate and reliable reconstruction of complex geometries from point cloud data. Procedural modelling can provide an automated solution for generating 3D geometries but lacks the detail and accuracy required for most as-built applications in AEC and heritage fields.
34

Li, Xinhua, Da Zhang, and Bob Liu. "Automated Extraction of Radiation Dose Information From CT Dose Report Images." American Journal of Roentgenology 196, no. 6 (June 2011): W781—W783. http://dx.doi.org/10.2214/ajr.10.5718.

35

Goudelis, G., A. Tefas, and I. Pitas. "Automated Facial Pose Extraction From Video Sequences Based on Mutual Information." IEEE Transactions on Circuits and Systems for Video Technology 18, no. 3 (March 2008): 418–24. http://dx.doi.org/10.1109/tcsvt.2008.918457.

36

Papamichail, D., A. Ploussi, S. Kordolaimi, E. Karavasilis, P. Papadimitroulas, V. Syrgiamiotis, and E. Efstathopoulos. "Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry." Journal of Physics: Conference Series 637 (September 16, 2015): 012022. http://dx.doi.org/10.1088/1742-6596/637/1/012022.

37

Silverstein, Marc D. "Assessing Automated Extraction of Prognostic Information for Intensive Care Unit Patients." Mayo Clinic Proceedings 87, no. 9 (September 2012): 811–13. http://dx.doi.org/10.1016/j.mayocp.2012.07.001.

38

Feng, Chunlai, Fumiyoshi Yamashita, and Mitsuru Hashida. "Automated Extraction of Information from the Literature on Chemical-CYP3A4 Interactions." Journal of Chemical Information and Modeling 47, no. 6 (September 27, 2007): 2449–55. http://dx.doi.org/10.1021/ci700091m.

39

Zhu, Donghua, and Alan L. Porter. "Automated extraction and visualization of information for technological intelligence and forecasting." Technological Forecasting and Social Change 69, no. 5 (June 2002): 495–506. http://dx.doi.org/10.1016/s0040-1625(01)00157-3.

40

Pogorilyy, S. D., and A. A. Kramov. "Automated extraction of structured information from a variety of web pages." PROBLEMS IN PROGRAMMING, no. 2-3 (2018): 149–58. http://dx.doi.org/10.15407/pp2018.02.149.

41

Nowak, Jacqueline, Kristin Gennermann, Staffan Persson, and Zoran Nikoloski. "CytoSeg 2.0: automated extraction of actin filaments." Bioinformatics 36, no. 9 (January 23, 2020): 2950–51. http://dx.doi.org/10.1093/bioinformatics/btaa035.

Abstract:
Motivation: Actin filaments (AFs) are dynamic structures that substantially change their organization over time. The dynamic behavior and the relatively low signal-to-noise ratio during live-cell imaging have rendered the quantification of the actin organization a difficult task. Results: We developed an automated image-based framework that extracts AFs from fluorescence microscopy images and represents them as networks, which are automatically analyzed to identify and compare biologically relevant features. Although the source code is freely available, we have now implemented the framework into a graphical user interface that can be installed as a Fiji plugin, thus enabling easy access by the research community. Availability and implementation: CytoSeg 2.0 is open-source software under the GPL and is available on GitHub: https://github.com/jnowak90/CytoSeg2.0. Supplementary information: Supplementary data are available at Bioinformatics online.
42

Previtali, M., L. Barazzetti, and M. Scaioni. "Automated Road Information Extraction from High Resolution Aerial LiDAR Data for Smart Road Applications." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2020 (August 21, 2020): 533–39. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2020-533-2020.

Abstract:
Automatic extraction of road features from LiDAR data is a fundamental task for different applications, including asset management. The availability of updated and reliable models is even more important in the context of smart roads. One of the main advantages of LiDAR data compared with other sensing instruments is the possibility to directly get 3D information. However, the task of deriving road networks from LiDAR data acquired with Airborne Laser Scanning (ALS) may be quite complex due to occlusions, low feature separability and shadowing from contextual objects. Indeed, even if road elements can be identified in the ALS point cloud, the automated identification of the network starting from them can be involved, due to large variability in the size of roads, shapes and the presence of connected off-road features such as parking lots. This paper presents a workflow aimed at partially solving the automatic creation of a road network from high-resolution ALS data. The presented method consists of three main steps: (i) labelling of road points; (ii) a multi-level voting scheme; and (iii) the regularization of the extracted road segments. The developed method has been tested using the “Vaihingen”, “Toronto” and “Tobermory” data sets provided by the ISPRS.
43

Phan, Xuan Hieu, Susumu Horiguchi, and Tu Bao Ho. "Automated data extraction from the web with conditional models." International Journal of Business Intelligence and Data Mining 1, no. 2 (2005): 194. http://dx.doi.org/10.1504/ijbidm.2005.008362.

44

Vanetik, Natalia, and Marina Litvak. "Definition Extraction from Generic and Mathematical Domains with Deep Ensemble Learning." Mathematics 9, no. 19 (October 6, 2021): 2502. http://dx.doi.org/10.3390/math9192502.

Abstract:
Definitions are extremely important for efficient learning of new materials. In particular, mathematical definitions are necessary for understanding mathematics-related areas. Automated extraction of definitions could be very useful for automated indexing of educational materials, building taxonomies of relevant concepts, and more. For definitions that are contained within a single sentence, this problem can be viewed as a binary classification of sentences into definitions and non-definitions. In this paper, we focus on automatic detection of one-sentence definitions in mathematical and general texts. We experiment with different classification models arranged in an ensemble and applied to a sentence representation containing syntactic and semantic information. Our ensemble model is applied to data adjusted with oversampling. Our experiments demonstrate the superiority of our approach over state-of-the-art methods in both general and mathematical domains.
45

Gargoum, Suliman, Karim El-Basyouny, Joseph Sabbagh, and Kenneth Froese. "Automated Highway Sign Extraction Using Lidar Data." Transportation Research Record: Journal of the Transportation Research Board 2643, no. 1 (January 2017): 1–8. http://dx.doi.org/10.3141/2643-01.

Abstract:
Traffic signs are integral elements of any transportation network; however, keeping records of those signs and their condition is a tedious, time-consuming, and labor-intensive process. As a result, many agencies worldwide have been working toward automating the process. One form of automation uses remote sensing techniques to extract traffic sign information. An algorithm is proposed that can automatically extract traffic signs from mobile light detection and ranging data. After the number of signs on a road segment has been determined, the coordinates of those signs are mapped onto the road segment. The sign extraction procedure involves applying multiple filters to the point cloud data and clustering the data into traffic signs. The proposed algorithm was tested on three highways located in different regions of the province of Alberta, Canada. The segments on which the algorithm was tested include a two-lane undivided rural road and four-lane divided highways. The highway geometry varied, as did vegetation and tree density. Success rates ranged from 93% to 100%, and the algorithm performed better on highways without overhead signs. Results indicate that the proposed method is simple but effective for creating an accurate inventory of traffic signs.
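A highly simplified Python sketch of the filter-then-cluster idea: keep the high-intensity (retro-reflective) returns and group them with DBSCAN. The intensity threshold and the choice of DBSCAN are illustrative assumptions, not the authors' exact filters.

from sklearn.cluster import DBSCAN

def sign_positions(points, intensity, min_intensity=200.0,
                   eps=0.5, min_samples=15):
    """points: (N, 3) numpy array; intensity: (N,) return intensities.
    Returns one mean coordinate per detected sign cluster."""
    candidates = points[intensity >= min_intensity]  # retro-reflective returns
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(candidates)
    return [candidates[labels == k].mean(axis=0)
            for k in set(labels) if k != -1]  # -1 marks DBSCAN noise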
46

Divekar, Akash. "Analysis on Text Summarization." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (June 30, 2022): 4222–29. http://dx.doi.org/10.22214/ijraset.2022.44848.

Abstract:
As we enter the 21st century, with the advent of mobile phones and access to vast information stores, we seem to be surrounded by more information than we have the time or ability to process. Automated summarization is a clever human solution to this complex problem. However, applying this solution is complicated. In fact, a number of problems need to be addressed before the promise of automated text summarization can be fully realized. Essentially, it is necessary to understand how people summarize text and to build a system based on that. However, people differ so much in their thinking and interpretation that it is difficult to produce a 'gold standard' summary against which generated summaries can be tested. In this paper, we discuss the basic concepts of text summarization by providing the most appropriate definitions, a characterization, the types, and the two different methods of automatic text summarization: extraction and abstraction. Special attention is given to the extraction method. It consists of selecting sentences and paragraphs that are important in the original text and combining them into a shorter form. It is conceptually simple and easy to implement.
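The extraction approach the paper highlights can be pictured with a tiny frequency-based sentence scorer in Python: rank sentences by the frequency of the words they contain and keep the top few in their original order. This generic sketch is an illustration, not the paper's specific system.

import re
from collections import Counter

def extractive_summary(text, n=2):
    """Keep the n highest-scoring sentences, preserving document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scores = [(sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())), i)
              for i, s in enumerate(sentences)]
    top = sorted(sorted(scores, reverse=True)[:n], key=lambda t: t[1])
    return " ".join(sentences[i] for _, i in top)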
47

Paglialonga, Alessia, Massimo Schiavo, and Enrico Gianluca Caiani. "Automated Characterization of Mobile Health Apps' Features by Extracting Information From the Web: An Exploratory Study." American Journal of Audiology 27, no. 3S (November 19, 2018): 482–92. http://dx.doi.org/10.1044/2018_aja-imia3-18-0008.

Abstract:
Purpose: The aim of this study was to test the viability of a novel method for automated characterization of mobile health apps. Method: In this exploratory study, we developed the basic modules of an automated method, based on text analytics, able to characterize the apps' medical specialties by extracting information from the web. We analyzed apps in the Medical and Health & Fitness categories on the U.S. iTunes store. Results: We automatically crawled 42,007 Medical and 79,557 Health & Fitness apps' webpages. After removing duplicates and non-English apps, the database included 80,490 apps. We tested the accuracy of the automated method on a subset of 400 apps. We observed 91% accuracy for the identification of apps related to health or medicine, 95% accuracy for sensory systems apps, and an average of 82% accuracy for classification into medical specialties. Conclusions: These preliminary results suggested the viability of automated characterization of apps based on text analytics and highlighted directions for improvement in terms of classification rules and vocabularies, analysis of semantic types, and extraction of key features (promoters, services, and users). The availability of automated tools for app characterization is important as it may support health care professionals in informed, aware selection of health apps to recommend to their patients.
48

Bae, Jung Ho, Hyun Wook Han, Sun Young Yang, Gyuseon Song, Soonok Sa, Goh Eun Chung, Ji Yeon Seo, Eun Hyo Jin, Heecheon Kim, and DongUk An. "Natural Language Processing for Assessing Quality Indicators in Free-Text Colonoscopy and Pathology Reports: Development and Usability Study." JMIR Medical Informatics 10, no. 4 (April 15, 2022): e35257. http://dx.doi.org/10.2196/35257.

Abstract:
Background: Manual data extraction of colonoscopy quality indicators is time and labor intensive. Natural language processing (NLP), a computer-based linguistics technique, can automate the extraction of important clinical information, such as adverse events, from unstructured free-text reports. NLP information extraction can facilitate the optimization of clinical work by helping to improve quality control and patient management. Objective: We developed an NLP pipeline to analyze free-text colonoscopy and pathology reports and evaluated its ability to automatically assess adenoma detection rate (ADR), sessile serrated lesion detection rate (SDR), and postcolonoscopy surveillance intervals. Methods: The NLP tool for extracting colonoscopy quality indicators was developed using a data set of 2000 screening colonoscopy reports from a single health care system, with an associated 1425 pathology reports. The NLP system was then tested on a data set of 1000 colonoscopy reports and its performance was compared with that of 5 human annotators. Additionally, data from 54,562 colonoscopies performed between 2010 and 2019 were analyzed using the NLP pipeline. Results: The NLP pipeline achieved an overall accuracy of 0.99-1.00 for identifying polyp subtypes, 0.99-1.00 for identifying the anatomical location of polyps, and 0.98 for counting the number of neoplastic polyps. The NLP pipeline achieved performance similar to clinical experts for assessing ADR, SDR, and surveillance intervals. NLP analysis of a 10-year colonoscopy data set identified great individual variance in colonoscopy quality indicators among 25 endoscopists. Conclusions: The NLP pipeline could accurately extract information from colonoscopy and pathology reports and demonstrated clinical efficacy for assessing ADR, SDR, and surveillance intervals in these reports. Implementation of the system enabled automated analysis and feedback on quality indicators, which could motivate endoscopists to improve the quality of their performance and improve clinical decision-making in colorectal cancer screening programs.
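Once the pipeline has labelled each procedure, a quality indicator such as ADR reduces to simple counting. A minimal Python sketch, assuming per-procedure boolean labels produced by an NLP stage like the one described:

def adenoma_detection_rate(records):
    """ADR = screening colonoscopies with at least one adenoma detected,
    divided by all screening colonoscopies."""
    screening = [r for r in records if r.get("screening")]
    if not screening:
        return 0.0
    return sum(1 for r in screening if r.get("adenoma")) / len(screening)

print(adenoma_detection_rate([
    {"screening": True, "adenoma": True},
    {"screening": True, "adenoma": False},
]))  # 0.5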
49

Puttinaovarat, Supattra, and Paramate Horkaew. "Multi-spectral and Topographic Fusion for Automated Road Extraction." Open Geosciences 10, no. 1 (September 14, 2018): 461–73. http://dx.doi.org/10.1515/geo-2018-0036.

Abstract:
Road geometry is pertinent information in various GIS studies. Reliable and updated road information thus calls for the conventional on-site survey to be replaced by more accurate and efficient remote sensing technology. Generally, this approach involves image enhancement and extraction of relevant features, such as elongate gradients and intersecting corners. Thus far, its application is often impeded by the wrong extraction of other urban peripherals with similar pixel characteristics. This paper therefore proposes the fusion of THEOS satellite imagery and topographic derivatives obtained from underlying Digital Surface Models (DSM). Multi-spectral indices in thematic layers and surface properties of designated roads were both fed into state-of-the-art machine learning algorithms. The results were later fused, taking into account the consistently leveled road surface. The proposed technique was thus able to eliminate irrelevant urban structures such as buildings and other constructions, otherwise left by conventional index-based extraction. The numerical assessment indicates a recall of 84.64%, precision of 97.40% and overall accuracy of 97.78%, with a Kappa statistic of 0.89. Visual inspection reported herewith also confirms consistency with the ground truth reference.
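The fusion idea, stacking spectral indices and DSM-derived surface properties as features for a supervised road classifier, can be sketched as follows. The feature names, dummy data, and the choice of a random forest are assumptions for illustration; the paper evaluates its own set of machine learning algorithms.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.random(1000),  # e.g. a multi-spectral index layer (assumed feature)
    rng.random(1000),  # e.g. slope derived from the DSM (assumed feature)
    rng.random(1000),  # e.g. local height variance (assumed feature)
])
y = rng.integers(0, 2, 1000)  # dummy labels: 1 = road, 0 = background

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
road_probability = clf.predict_proba(X)[:, 1]  # per-pixel road likelihood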
50

Choi, Jun-Woo, Yong-Joon Jun, Jin-ha Yoon, Young-hak Song, and Kyung-Soon Park. "A Study of Energy Simulation Integrated Process by Automated Extraction Module of the BIM Geometry Module." Energies 12, no. 13 (June 26, 2019): 2461. http://dx.doi.org/10.3390/en12132461.

Abstract:
Despite the international trend of actively utilizing BIM in the field of energy simulation, it is difficult to actively utilize BIM in domestic certification-related practice. As a result, the work process is not integrated and work efficiency is reduced, for example through redundant tasks. In this paper, an integrated process based on a BIM geometry information automated extraction module is presented, and its effectiveness is reviewed and verified through a self-developed automation module and an analysis of uniformity and error causes. The construction of an integrated process through automatic geometry information extraction has the advantages of reducing the workload, enabling additional tasks through module function expansion, sharing information through the web, and ease of service creation. In addition, an experiment comparing the data of automatic and manual calculations by three subjects confirmed that an automatic calculation database produced through the automation module is more reliable and uniform, with a lower proportion of human error, than manual calculation. This not only improves the reliability of the energy simulation itself, but also reduces the human and temporal loads that occur during the post-simulation data verification process.
