Journal articles on the topic "Data models, storage and indexing"

Follow this link to see other types of publications on this topic: Data models, storage and indexing.

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles

Consult the top 50 journal articles on the topic "Data models, storage and indexing".

An "Add to bibliography" button is available next to each work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in ".pdf" format and read the abstract of the work online, if the corresponding parameters are available in the metadata.

Browse journal articles from a wide range of disciplines and compile an appropriate bibliography.

1

Wang, Zhaoguo, Haibo Chen, Youyun Wang, Chuzhe Tang and Huan Wang. "The Concurrent Learned Indexes for Multicore Data Storage". ACM Transactions on Storage 18, no. 1 (28.02.2022): 1–35. http://dx.doi.org/10.1145/3478289.

Full text source
Abstract:
We present XIndex, a concurrent index library designed for fast queries. It includes a concurrent ordered index (XIndex-R) and a concurrent hash index (XIndex-H). Similar to a recent proposal of the learned index, the indexes in XIndex use learned models to optimize index efficiency. Compared with the learned index, the ordered index XIndex-R handles concurrent writes effectively and adapts its structure according to runtime workload characteristics, while the hash index XIndex-H avoids blocking concurrent writes during resize operations. Furthermore, the indexes in XIndex can index string keys much more efficiently than the learned index. We demonstrate the advantages of XIndex with YCSB, TPC-C (KV), a TPC-C-inspired benchmark for key-value stores, and micro-benchmarks. Compared with the ordered indexes Masstree and Wormhole, XIndex-R achieves up to 3.2× and 4.4× performance improvement on a 24-core machine. Compared with the Intel TBB HashMap hash index, XIndex-H achieves up to 3.1× speedup. The performance further improves by 91% after adding the optimizations for indexing string keys. The library is open-sourced.
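For readers who want a concrete feel for the learned-index idea XIndex builds on, here is a minimal single-threaded sketch (our own illustration, not the authors' code, and without XIndex's concurrency or string-key machinery): a linear model predicts a key's position in a sorted array, and a bounded search around the prediction corrects the model's error.

```python
# Minimal learned-index sketch: a linear model predicts where a key sits
# in a sorted array; a bounded binary search around the prediction
# corrects the model's error. Single-threaded toy, not XIndex itself.
import bisect

class ToyLearnedIndex:
    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        # Fit position ~ slope * key + intercept from the endpoints (a crude "model").
        span = sorted_keys[-1] - sorted_keys[0] or 1
        self.slope = (n - 1) / span
        self.intercept = -self.slope * sorted_keys[0]
        # Maximum prediction error, measured once over all keys.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(len(self.keys), pos + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

idx = ToyLearnedIndex(list(range(0, 1000, 3)))
print(idx.lookup(999), idx.lookup(500))  # 333, None
```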
APA, Harvard, Vancouver, ISO, and other styles
2

Selihat, Maali, Belal Abuhaja and Khalid Alkaabneh. "Secure audio file indexing based on hidden markov model (HMM) on the cloud storage". International Journal Artificial Intelligent and Informatics 2, no. 1 (16.07.2021): 1–8. http://dx.doi.org/10.33292/ijarlit.v2i1.30.

Full text source
Abstract:
With the introduction of many social media applications and the exponential growth in the number of people using them to exchange audio files as a main way of conveying confidential messages over public telecommunications networks, the need arises to secure audio data files and preserve the integrity of the message both while traversing public networks and when the data is at rest in the cloud. Therefore, in this research, several algorithms have been devised to ensure the confidentiality and integrity of audio files while reducing storage space. To achieve this, we utilized public key infrastructure and hash functions along with Hidden Markov Models (HMM). The results show a significant drop in the storage space needed, along with a remarkable reduction in network transmission time. When comparing the original audio file size with the converted file size after applying HMM, the results show a variation of only 0% to 10%, with over 50% reduction in storage space in some cases.
APA, Harvard, Vancouver, ISO, and other styles
3

Frady, E. Paxon, Denis Kleyko and Friedrich T. Sommer. "A Theory of Sequence Indexing and Working Memory in Recurrent Neural Networks". Neural Computation 30, no. 6 (June 2018): 1449–513. http://dx.doi.org/10.1162/neco_a_01084.

Full text source
Abstract:
To accommodate structured approaches of neural computation, we propose a class of recurrent neural networks for indexing and storing sequences of symbols or analog data vectors. These networks with randomized input weights and orthogonal recurrent weights implement coding principles previously described in vector symbolic architectures (VSA) and leverage properties of reservoir computing. In general, the storage in reservoir computing is lossy, and crosstalk noise limits the retrieval accuracy and information capacity. A novel theory to optimize memory performance in such networks is presented and compared with simulation experiments. The theory describes linear readout of analog data and readout with winner-take-all error correction of symbolic data as proposed in VSA models. We find that diverse VSA models from the literature have universal performance properties, which are superior to what previous analyses predicted. Further, we propose novel VSA models with the statistically optimal Wiener filter in the readout that exhibit much higher information capacity, in particular for storing analog data. The theory we present also applies to memory buffers, networks with gradual forgetting, which can operate on infinite data streams without memory overflow. Interestingly, we find that different forgetting mechanisms, such as attenuating recurrent weights or neural nonlinearities, produce very similar behavior if the forgetting time constants are matched. Such models exhibit extensive capacity when their forgetting time constant is optimized for given noise conditions and network size. These results enable the design of new types of VSA models for the online processing of data streams.
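As a rough illustration of the VSA-style sequence indexing analyzed in the paper (a toy sketch under simplifying assumptions, not the authors' theory or their capacity-optimal Wiener-filter readout): each time step is indexed by repeated application of a fixed random permutation, items are superposed into one trace, and winner-take-all readout correlates the trace against the codebook.

```python
# Toy vector-symbolic sequence memory: superpose random bipolar codewords,
# indexing each time step by repeated application of a fixed permutation.
# Readout correlates the trace with permuted codewords (winner-take-all).
import numpy as np

rng = np.random.default_rng(0)
D = 2048                                   # vector dimensionality
symbols = list("ABCDE")
codebook = {s: rng.choice([-1, 1], D) for s in symbols}
perm = rng.permutation(D)                  # fixed random permutation = time index

def encode(seq):
    trace = np.zeros(D)
    for t, s in enumerate(seq):
        v = codebook[s]
        for _ in range(t):                 # apply the permutation t times
            v = v[perm]
        trace += v
    return trace

def decode(trace, length):
    out = []
    for t in range(length):
        best, best_score = None, -np.inf
        for s, v in codebook.items():
            for _ in range(t):
                v = v[perm]
            score = trace @ v              # crosstalk noise comes from other items
            if score > best_score:
                best, best_score = s, score
        out.append(best)
    return "".join(out)

trace = encode("CADBE")
print(decode(trace, 5))  # 'CADBE' with high probability for D = 2048
```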
APA, Harvard, Vancouver, ISO, and other styles
4

Ibtisam, Ferrahi Ibtisam, Sandro Bimonte and Kamel Boukhalfa. "Logical and Physical Design of Spatial Non-Strict Hierarchies in Relational Spatial Data Warehouse". International Journal of Data Warehousing and Mining 15, no. 1 (January 2019): 1–18. http://dx.doi.org/10.4018/ijdwm.2019010101.

Full text source
Abstract:
The emergence of spatial or geographic data in DW systems defines new models that support the storage and manipulation of the data. The need to build an SDW and to optimize SOLAP queries has continued to attract the interest of researchers in recent years. Several spatial data models have been investigated to extend classical multidimensional data models with spatial concepts. However, most existing models do not handle non-strict spatial hierarchies. Moreover, the complexity of spatial data makes the execution time of spatial queries considerable. Often, spatial indexing methods are applied to optimize access to large volumes of data and to help reduce the cost of spatial OLAP queries, but most existing indexes support only predefined spatial hierarchies. The authors show, in this article, that the logical models and indexing techniques proposed in the literature are not suitable for non-strict hierarchies. The authors propose a new logical schema supporting non-strict hierarchies and a bitmap index to optimize queries defined on spatial dimensions with several non-strict hierarchies.
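The bitmap-index idea for a non-strict hierarchy can be sketched in a few lines (an invented toy schema for illustration, not the paper's logical design): when a child member rolls up to several parents, its fact rows simply set bits in several parent bitmaps, and SOLAP-style filters become bitwise operations.

```python
# Bitmap-index sketch for a non-strict spatial hierarchy: each parent region
# keeps a bitmap over fact rows; a child mapped to several parents simply
# sets bits in several bitmaps. Python ints serve as arbitrary-length bitsets.
facts_city = ["geneva", "basel", "lyon", "geneva"]   # city of each fact row

# Non-strict roll-up: a border city can belong to more than one region.
city_to_regions = {
    "geneva": ["france", "switzerland"],   # spans the border
    "basel":  ["switzerland"],
    "lyon":   ["france"],
}

region_bitmap = {}
for row, city in enumerate(facts_city):
    for region in city_to_regions[city]:
        region_bitmap[region] = region_bitmap.get(region, 0) | (1 << row)

def rows(bitmap):
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

# SOLAP-style filter: which fact rows fall in Switzerland? In both regions?
print(rows(region_bitmap["switzerland"]))                            # [0, 1, 3]
print(rows(region_bitmap["france"] & region_bitmap["switzerland"]))  # [0, 3]
```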
APA, Harvard, Vancouver, ISO, and other styles
5

Abebe, Michael, Horatiu Lazu and Khuzaima Daudjee. "Tiresias". Proceedings of the VLDB Endowment 15, no. 11 (July 2022): 3126–36. http://dx.doi.org/10.14778/3551793.3551857.

Full text source
Abstract:
To store and query a DBMS efficiently, administrators must select storage and indexing configurations. For example, one must decide whether data should be stored in rows or columns, in memory or on disk, and which columns to index. These choices can be challenging to make for mixed workloads that require hybrid transactional and analytical processing (HTAP) support. There is growing interest in system designs that can adapt how data is stored and indexed to execute these workloads efficiently. We present Tiresias, a predictor that learns the cost of data accesses and predicts their latency and likelihood under different storage scenarios. Tiresias makes these predictions by collecting observed latencies and access histories to build predictive models in an online manner, enabling autonomous storage and index adaptation. Experimental evaluation shows the benefits of predictive adaptation and the trade-offs for different predictive techniques.
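A minimal sketch of online cost prediction in the spirit described above; the exponentially weighted average per (operation, layout) pair is an illustrative stand-in, not Tiresias's actual predictive models:

```python
# Online access-cost predictor sketch: keep an exponentially weighted moving
# average of observed latencies per (operation, storage layout) pair and
# use it to pick the cheaper layout. Illustrative only, not Tiresias itself.
from collections import defaultdict

class OnlineCostModel:
    def __init__(self, alpha=0.2):
        self.alpha = alpha                         # weight of the newest sample
        self.est = defaultdict(lambda: None)       # (op, layout) -> latency estimate

    def observe(self, op, layout, latency_ms):
        key = (op, layout)
        prev = self.est[key]
        self.est[key] = latency_ms if prev is None else (
            self.alpha * latency_ms + (1 - self.alpha) * prev)

    def best_layout(self, op, layouts):
        known = [(l, self.est[(op, l)]) for l in layouts
                 if self.est[(op, l)] is not None]
        return min(known, key=lambda x: x[1])[0] if known else layouts[0]

model = OnlineCostModel()
for ms in (0.9, 1.1, 1.0):
    model.observe("scan", "row", ms)
for ms in (0.3, 0.4):
    model.observe("scan", "column", ms)
print(model.best_layout("scan", ["row", "column"]))  # 'column'
```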
APA, Harvard, Vancouver, ISO, and other styles
6

Rao, Sirisala Nageswara. "Optimized Cost Model for k-NN Queries in R*-Trees over Random Distribution". Advanced Materials Research 403-408 (November 2011): 3315–21. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.3315.

Full text source
Abstract:
Efficient storage and retrieval of multidimensional data in large volumes has become one of the key issues in the design and implementation of commercial and application software. The kinds of queries posed on such data are also multifarious. Nearest neighbor queries are one such category and have particular significance in GIS-type applications. The R-tree and its sequels are data-partitioned hierarchical multidimensional indexing structures that serve this purpose. Research has now turned towards the development of powerful analytical methods to predict the performance of such indexing structures for various categories of queries such as range, nearest neighbor, join, etc. This paper focuses on the performance of the R*-tree for k nearest neighbor (kNN) queries. While general approaches are available in the literature that work better for larger k over uniform data, few have explored the impact of small values of k. This paper proposes an improved performance analysis model for kNN queries with small k over random data. The results are tabulated and compared with existing models; the proposed model outperforms the existing models significantly for small k.
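As a worked example of the kind of analytical estimate such cost models rest on (a textbook approximation for uniform data, not the paper's proposed model): the expected distance to the k-th nearest neighbor of n uniform points in the unit square is roughly sqrt(k / (π n)), which the snippet below checks against brute force.

```python
# Quick check of a textbook estimate used in kNN cost models for uniform 2-D
# data: the expected distance to the k-th nearest neighbor of a query point
# is roughly sqrt(k / (pi * n)). Illustrative, not the paper's exact model.
import numpy as np

rng = np.random.default_rng(1)
n, k = 20000, 4
pts = rng.random((n, 2))                     # uniform points in the unit square
q = np.array([0.5, 0.5])                     # central query avoids edge effects

d = np.sort(np.linalg.norm(pts - q, axis=1))[k - 1]   # measured k-NN distance
est = np.sqrt(k / (np.pi * n))                         # analytical estimate
print(f"measured {d:.4f} vs estimated {est:.4f}")
```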
APA, Harvard, Vancouver, ISO, and other styles
7

Ma, Chaohong, Xiaohui Yu, Yifan Li, Xiaofeng Meng and Aishan Maoliniyazi. "FILM". Proceedings of the VLDB Endowment 16, no. 3 (November 2022): 561–73. http://dx.doi.org/10.14778/3570690.3570704.

Full text source
Abstract:
As modern applications generate data at an unprecedented speed and often require the querying/analysis of data spanning a large duration, it is crucial to develop indexing techniques that cater to larger-than-memory databases, where data reside on heterogeneous storage devices (such as memory and disk), and support fast data insertion and query processing. In this paper, we propose FILM, a Fully learned Index for Larger-than-Memory databases. FILM is a learned tree structure that uses simple approximation models to index data spanning different storage devices. Compared with existing techniques for larger-than-memory databases, such as anti-caching, FILM allows for more efficient query processing at significantly lower main-memory overhead. FILM is also designed to effectively address one of the bottlenecks in existing methods for indexing larger-than-memory databases that is caused by data swapping between memory and disk. More specifically, updating the LRU (Least Recently Used) structure employed by existing methods for cold-data identification (determining the data to be evicted to disk when the available memory runs out) often incurs significant delay in query processing. FILM takes a drastically different approach by proposing an adaptive LRU structure and piggybacking its update onto query processing with minimal overhead. We thoroughly study the performance of FILM and its components on a variety of datasets and workloads, and the experimental results demonstrate its superiority in improving query processing performance and reducing index storage overhead (by orders of magnitude) compared with applicable baselines.
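The cold-data identification mechanism described above is easy to see with a plain LRU (this baseline sketch is ours; FILM's adaptive LRU and learned models are more refined): every query touches the recency structure, and eviction falls on the least recently used entry.

```python
# Plain LRU sketch for cold-data identification: every lookup moves the key
# to the "hot" end; when memory is full, the coldest entry is evicted to
# "disk". FILM's adaptive LRU is more refined; this shows the baseline idea.
from collections import OrderedDict

class HotStore:
    def __init__(self, capacity, disk):
        self.capacity, self.disk = capacity, disk
        self.hot = OrderedDict()               # key -> value, coldest first

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)          # recency update piggybacked on query
            return self.hot[key]
        value = self.disk[key]                 # miss: fetch from cold storage
        self.put(key, value)
        return value

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.capacity:
            cold_key, cold_val = self.hot.popitem(last=False)
            self.disk[cold_key] = cold_val     # evict the least recently used

disk = {k: k * 10 for k in range(6)}
store = HotStore(capacity=3, disk=disk)
for k in (0, 1, 2, 0, 3):                      # touching 0 keeps it hot
    store.get(k)
print(list(store.hot))                         # [2, 0, 3]; 1 was evicted
```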
APA, Harvard, Vancouver, ISO, and other styles
8

Ge, Jiake, Huanchen Zhang, Boyu Shi, Yuanhui Luo, Yunda Guo, Yunpeng Chai, Yuxing Chen and Anqun Pan. "SALI: A Scalable Adaptive Learned Index Framework based on Probability Models". Proceedings of the ACM on Management of Data 1, no. 4 (8.12.2023): 1–25. http://dx.doi.org/10.1145/3626752.

Full text source
Abstract:
The growth in data storage capacity and the increasing demands for high performance have created several challenges for concurrent indexing structures. One promising solution is the learned index, which uses a learning-based approach to fit the distribution of stored data and predictively locate target keys, significantly improving lookup performance. Despite their advantages, prevailing learned indexes exhibit constraints and encounter issues of scalability on multi-core data storage. This paper introduces SALI, the Scalable Adaptive Learned Index framework, which incorporates two strategies aimed at achieving high scalability, improving efficiency, and enhancing the robustness of the learned index. Firstly, a set of node-evolving strategies is defined to enable the learned index to adapt to various workload skews and enhance its concurrency performance in such scenarios. Secondly, a lightweight strategy is proposed to maintain statistical information within the learned index, with the goal of further improving the scalability of the index. Furthermore, to validate their effectiveness, SALI applied the two strategies mentioned above to the learned index structure that utilizes fine-grained write locks, known as LIPP. The experimental results have demonstrated that SALI significantly enhances the insertion throughput with 64 threads by an average of 2.04x compared to the second-best learned index. Furthermore, SALI accomplishes a lookup throughput similar to that of LIPP+.
APA, Harvard, Vancouver, ISO, and other styles
9

Boguslawski, P., P. Balak and C. Gold. "DATABASE STORAGE AND TRANSPARENT MEMORY LOADING OF BIG SPATIAL DATASETS IMPLEMENTED WITH THE DUAL HALF-EDGE DATA STRUCTURE". International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B4-2022 (1.06.2022): 9–15. http://dx.doi.org/10.5194/isprs-archives-xliii-b4-2022-9-2022.

Full text source
Abstract:
3D spatial models covering big areas, such as cities, have been widely developed in recent years. Loading a whole model from a hard drive into computer memory is often not possible due to the large amount of data and memory size limitations. Optimisation techniques based on spatial indexing, such as tiling, are applied in order to load at least part of a model as soon as possible, while the remaining parts are fetched in the background. This is especially useful in the visualisation of cities. A similar idea is proposed for the transparent loading of a model implemented with the dual half-edge (DHE) data structure and stored in a database. Existing DHE-based solutions require the whole model to be present in memory, which is a considerable limitation in the case of models covering big areas and including detailed representations of city objects, such as buildings and their interiors. The prototype mechanism developed in this work includes loading and unloading of model parts at the level of single edges, as well as model tiling. This allows for spatial analysis without completely loading a large amount of data into memory.
APA, Harvard, Vancouver, ISO, and other styles
10

Ramzan, Bajwa, Kazmi and Amna. "Challenges in NoSQL-Based Distributed Data Storage: A Systematic Literature Review". Electronics 8, no. 5 (30.04.2019): 488. http://dx.doi.org/10.3390/electronics8050488.

Full text source
Abstract:
Key-value stores (KVSs) are the most flexible and simplest model of NoSQL databases, and they have become highly popular over the last few years due to salient features such as availability, portability, reliability, and low operational cost. From the perspective of software engineering, the chief obstacle for KVSs is to achieve the software quality attributes (consistency, throughput, latency, security, performance, load balancing, and query processing) that ensure quality. The presented research is a Systematic Literature Review (SLR) to find the state-of-the-art research in the KVS domain, and through doing so determine the major challenges and solutions. This work reviews 45 papers published between 2010 and 2018 that were found to be closely relevant to our study area. The results show that performance is addressed in 31% of the studies, consistency in 20%, latency and throughput in 16%, query processing in 13%, security in 11%, and load balancing in 9%. Different models are used for execution: indexing techniques were used in 20% of the studies, hashing in 13%, caching and security techniques together in 9%, batching in 5%, encoding techniques together with Paxos in 4%, and 36% of the studies used other techniques. This systematic review will enable researchers to design key-value stores as efficient storage. Regarding future collaborations, trust and privacy are quality attributes that can still be addressed; KVS is an emerging facet due to its widespread popularity, opening the way to deploy it with proper protection.
APA, Harvard, Vancouver, ISO, and other styles
11

Cheng, Yinyi, Kefa Zhou, Jinlin Wang and Jining Yan. "Big Earth Observation Data Integration in Remote Sensing Based on a Distributed Spatial Framework". Remote Sensing 12, no. 6 (17.03.2020): 972. http://dx.doi.org/10.3390/rs12060972.

Full text source
Abstract:
The arrival of the era of big data for Earth observation (EO) indicates that traditional data management models can no longer meet the needs of remote sensing data in big data environments. Since the launch of the first remote sensing satellite, the volume of remote sensing data has been increasing continuously, and traditional data storage methods have been unable to ensure the efficient management of large amounts of remote sensing data. Therefore, a professional remote sensing big data integration method is sorely needed. In recent years, the emergence of new technical methods has provided effective solutions for multi-source remote sensing data integration. This paper proposes a multi-source remote sensing data integration framework based on a distributed management model. In this framework, the multi-source remote sensing data are partitioned by the proposed spatial segmentation indexing (SSI) model through spatial grid segmentation. The designed complete information description system, based on International Organization for Standardization (ISO) 19115, can describe multi-source remote sensing data in detail. A distributed storage method based on MongoDB is then used to store the multi-source remote sensing data. The distributed storage method is physically based on the sharding mechanism of the MongoDB database, and it benefits the security and performance of remote sensing data preservation. Finally, several experiments were designed to test the performance of this framework in integrating multi-source remote sensing data. The results show that the storage and retrieval performance of the proposed distributed remote sensing data integration framework is superior. At the same time, the grid level of the SSI model also has an important impact on the storage efficiency of remote sensing data. Therefore, the remote sensing data integration framework, based on distributed storage, can provide new technical support and development prospects for big EO data.
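A toy sketch of the spatial-grid-segmentation idea behind an index like SSI (the cell-ID scheme and cell size here are invented for illustration, not the paper's layout): scenes are bucketed into fixed grid cells by bounding box, so a range query only visits the cells it overlaps.

```python
# Toy spatial grid segmentation: scenes are assigned to fixed grid cells by
# their bounding box, and queries touch only the cells they overlap. The
# cell-ID scheme is illustrative, not the SSI model's actual layout.
from collections import defaultdict

CELL_DEG = 10.0                               # grid level: 10-degree cells

def cell_id(lon, lat):
    return (int((lon + 180) // CELL_DEG), int((lat + 90) // CELL_DEG))

def cells_for_bbox(min_lon, min_lat, max_lon, max_lat):
    x0, y0 = cell_id(min_lon, min_lat)
    x1, y1 = cell_id(max_lon, max_lat)
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

grid_index = defaultdict(list)                # cell -> scene ids

def insert_scene(scene_id, bbox):
    for c in cells_for_bbox(*bbox):
        grid_index[c].append(scene_id)

insert_scene("landsat-001", (86.0, 43.0, 88.5, 45.0))
insert_scene("landsat-002", (95.0, 40.0, 97.0, 42.0))

# Range query: collect candidates from overlapping cells only. A full system
# would refine candidates against exact scene geometries afterwards.
hits = {s for c in cells_for_bbox(85.0, 42.0, 89.0, 44.0) for s in grid_index[c]}
print(hits)                                   # {'landsat-001'}
```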
APA, Harvard, Vancouver, ISO, and other styles
12

Gnatyuk, Sergiy, Rat Berdibayev, Viktoriia Sydorenko, Oksana Zhyharevych and Tetiana Smirnova. "SYSTEM FOR CYBER SECURITY EVENTS CORRELATION AND INCIDENT MANAGEMENT IN CRITICAL INFRASTRUCTURE OBJECTS". Cybersecurity: Education, Science, Technique 3, no. 19 (2023): 176–96. http://dx.doi.org/10.28925/2663-4023.2023.19.176196.

Full text source
Abstract:
Modern information infrastructure consists of a large number of systems and components that require constant monitoring and control. To identify, analyze and eliminate possible cyber threats, it is recommended to use a single common solution: so-called SIEM systems. SIEM technology collects event log data, detects unusual activity through real-time analysis, identifies threats, generates alerts, and suggests appropriate action scenarios. Today, the number and quality of SIEM systems have grown significantly, and the latest technologies of artificial intelligence, the Internet of Things, and cloud computing are used to ensure fast and effective detection of threats. This work therefore studies modern SIEM systems, their functionality and basic principles of operation, and provides a comparative analysis of their capabilities and differences, as well as the advantages and disadvantages of their use. In addition, a universal system for event correlation and cyber security incident management at critical infrastructure facilities was developed and experimentally investigated. Models of the operation of a hybrid security data storage have been developed, which allow the indexing service to access external data storages, to scale as the volume of data increases, to ensure high search speed, etc. Models, methods and algorithms for the operation of a distributed data bus have been developed, which allow for high-speed processing of large flows of information, minimal delays in data processing, high resistance to failures, and flexibility and expandability of storage. The proposed system is designed to solve a number of current cyber security problems and meets the main requirements of international standards and best global practices regarding the creation of cyber incident management systems.
APA, Harvard, Vancouver, ISO, and other styles
13

Kumar, K., H. Ledoux and J. Stoter. "COMPARATIVE ANALYSIS OF DATA STRUCTURES FOR STORING MASSIVE TINS IN A DBMS". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B2 (7.06.2016): 123–30. http://dx.doi.org/10.5194/isprs-archives-xli-b2-123-2016.

Full text source
Abstract:
Point cloud data are an important source of 3D geoinformation. Modern 3D data acquisition and processing techniques such as airborne laser scanning and multi-beam echosounding generate billions of 3D points for an area of just a few square kilometers. With the size of point clouds exceeding the billion mark even for a small area, there is a need for their efficient storage and management. These point clouds are sometimes associated with attributes and constraints as well. Storing billions of 3D points is currently possible, as confirmed by initial implementations in Oracle Spatial SDO_PC and the PostgreSQL Point Cloud extension. But to be able to analyse and extract useful information from point clouds, we need more than just points, i.e., we require the surface defined by these points in space. There are different ways to represent surfaces in GIS, including grids, TINs, boundary representations, etc. In this study, we investigate database solutions for the storage and management of massive TINs. The classical (face- and edge-based) and compact (star-based) data structures are discussed at length with reference to their structure, advantages and limitations in handling massive triangulations, and are compared with the current solution of PostGIS Simple Feature. The main test dataset is the TIN generated from the third national elevation model of the Netherlands (AHN3), with a point density of over 10 points/m². PostgreSQL/PostGIS DBMS is used for storing the generated TIN. The data structures are tested with the generated TIN models to account for their geometry, topology, storage, indexing, and loading time in a database. Our study is useful in identifying the limitations of the existing data structures for storing massive TINs and what is required to optimise these structures for managing massive triangulations in a database.
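For intuition about the compact star-based structures compared in the study, here is a toy reconstruction of triangles from vertex stars (illustrative only; real star-based TIN storage handles the convex hull with a sentinel "infinite" vertex and packs the stars compactly, neither of which this tiny mesh needs):

```python
# Sketch of a star-based (compact) TIN representation: instead of storing
# triangles explicitly, each vertex stores its neighbors in counter-clockwise
# order; triangles are recovered from consecutive neighbor pairs.
# Toy mesh: a unit square split by the diagonal 0-2 into two triangles.
stars = {
    0: [1, 2, 3],   # CCW-ordered neighbors around vertex 0 at (0, 0)
    1: [0, 2],      # vertex 1 at (1, 0)
    2: [3, 0, 1],   # vertex 2 at (1, 1)
    3: [0, 2],      # vertex 3 at (0, 1)
}

def triangles(stars):
    tris = set()
    for v, ring in stars.items():
        for a, b in zip(ring, ring[1:]):        # consecutive CCW neighbors
            tris.add(tuple(sorted((v, a, b))))  # canonical form deduplicates
    return sorted(tris)

print(triangles(stars))   # [(0, 1, 2), (0, 2, 3)]
```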
APA, Harvard, Vancouver, ISO, and other styles
15

Benito-Picazo, Jesús, Ezequiel López-Rubio and Enrique Domínguez. "Growing Neural Forest-Based Color Quantization Applied to RGB Images". International Journal of Computer Vision and Image Processing 7, no. 3 (July 2017): 13–25. http://dx.doi.org/10.4018/ijcvip.2017070102.

Full text source
Abstract:
Although recent improvements in both physical storage technologies and image handling techniques have eased image management processes, the large amount of information handled nowadays constantly demands more efficient ways to store and transmit image data streams. Among the alternatives for this purpose is color quantization, which consists of color indexing for minimal-perceptual-distortion image compression. In this context, artificial-intelligence-based algorithms, and more specifically artificial neural networks, have been consolidated as a powerful tool for unsupervised tasks, and therefore for color quantization purposes. In this work, a novel approach to color quantization is presented based on the Growing Neural Forest (GNF), a Growing Neural Gas (GNG) variant in which a set of trees is learnt instead of a general graph. Experimental results support the use of GNF for image quantization tasks, where it outperforms other self-organized models including SOM, GHSOM and GNG. Future work will include more datasets and different competitive models for comparison.
APA, Harvard, Vancouver, ISO, and other styles
16

Vrgoč, Domagoj, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros and Juan Romero. "MillenniumDB: An Open-Source Graph Database System". Data Intelligence 5, no. 3 (2023): 560–610. http://dx.doi.org/10.1162/dint_a_00229.

Full text source
Abstract:
In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported, thus providing a flexible data management engine for diverse types of knowledge graph. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
APA, Harvard, Vancouver, ISO, and other styles
17

Zhao, Hong, Wei-Jie Wang, Tao Wang, Zhao-Bin Chang and Xiang-Yan Zeng. "Key-Frame Extraction Based on HSV Histogram and Adaptive Clustering". Mathematical Problems in Engineering 2019 (22.09.2019): 1–10. http://dx.doi.org/10.1155/2019/5217961.

Full text source
Abstract:
Along with the fast development of digital information technology and the application of the Internet, video data has begun to grow explosively. Some applications with high real-time requirements, such as object detection, require strong online video storage and analysis capabilities. Key-frame extraction is an important technique in video analysis, providing an organizational framework for dealing with video content and reducing the amount of data required in video indexing. To address the problem, this study proposes a key-frame extraction method based on the HSV (hue, saturation, value) histogram and adaptive clustering. The HSV histogram is used as the color feature of each frame, which reduces the amount of data. Furthermore, by using the transformed one-dimensional eigenvector, a fixed number of features can be extracted for images of different sizes. Then, a cluster validation technique, the silhouette coefficient, is employed to get the appropriate number of clusters without setting any clustering parameters. Finally, several algorithms are compared in the experiments. The density peak clustering algorithm (DPCA) model is shown to be more effective than the other four models in precision and F-measure.
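A minimal sketch of histogram-based key-frame selection (simplified: a distance threshold against the last key frame stands in for the paper's adaptive clustering with silhouette validation, and frames are assumed already converted to HSV values in [0, 1]):

```python
# Sketch of histogram-based key-frame selection: each frame is summarized by
# a color histogram over (assumed pre-converted) HSV channels, and a frame is
# kept as a key frame when its histogram differs enough from the last key
# frame. The paper clusters histograms instead; this is the simpler cousin.
import numpy as np

def hsv_histogram(frame_hsv, bins=(8, 4, 4)):
    # frame_hsv: H x W x 3 array with all channels scaled to [0, 1]
    hist, _ = np.histogramdd(frame_hsv.reshape(-1, 3), bins=bins,
                             range=((0, 1), (0, 1), (0, 1)))
    return hist.ravel() / hist.sum()           # fixed-length feature vector

def key_frames(frames, threshold=0.3):
    keys, last = [], None
    for i, f in enumerate(frames):
        h = hsv_histogram(f)
        if last is None or np.abs(h - last).sum() / 2 > threshold:
            keys.append(i)                     # L1 histogram distance in [0, 1]
            last = h
    return keys

rng = np.random.default_rng(0)
shot_a = [np.clip(rng.normal(0.2, 0.05, (48, 64, 3)), 0, 1)] * 5
shot_b = [np.clip(rng.normal(0.7, 0.05, (48, 64, 3)), 0, 1)] * 5
print(key_frames(shot_a + shot_b))             # [0, 5]: one key frame per shot
```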
APA, Harvard, Vancouver, ISO, and other styles
18

Zhu, Guicun, Meihui Hao, Changlong Zheng and Linlin Wang. "Design of Knowledge Graph Retrieval System for Legal and Regulatory Framework of Multilevel Latent Semantic Indexing". Computational Intelligence and Neuroscience 2022 (19.07.2022): 1–11. http://dx.doi.org/10.1155/2022/6781043.

Full text source
Abstract:
Latent semantic analysis (LSA) is a natural language statistical model, considered a method to acquire, generalize, and represent knowledge. Compared with retrieval models based on concept dictionaries or concept networks, the retrieval model based on LSA has the advantages of strong computability and less human participation. LSA establishes a latent semantic space through truncated singular value decomposition. Words and documents in the latent semantic space are projected onto the dimensions representing latent concepts, and the semantic relationships between words can then be extracted to present the semantic structure of natural language. This paper designs the system architecture of a public prosecutorial knowledge graph. Combining graph data storage technology and the characteristics of the public domain ontology, a knowledge graph storage method is designed. By building a prototype system, the functions of knowledge management, knowledge query, and knowledge push are realized. A named entity recognition method based on bidirectional long short-term memory (bi-LSTM) combined with conditional random fields (CRF) is proposed. Bi-LSTM-CRF performs named entity recognition based on character-level features. CRF can use the transition matrix to further obtain the relationship between position labels, so that bi-LSTM-CRF not only retains the context information but also considers the influence between the current position and the previous position. The experimental results show that the LSTM-entity-context method proposed in this paper improves the representation ability of text semantics compared with other algorithms. However, this method only introduces relevant entity information to supplement the semantic representation of the text. The order of events in a case is often ignored, especially when it comes to the time series of case characteristics, and this "order problem" may eventually affect the final prediction result. The ontology-based knowledge graph of legal documents for theft cases can be updated and maintained in real time. The knowledge graph can conceptualize, share, and perpetuate knowledge related to procuratorial organs and can also reasonably utilize and mine much useful experience and knowledge to assist decision-making.
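The truncated-SVD construction at the heart of LSA fits in a few lines (a toy term-document matrix with invented counts, purely for illustration):

```python
# LSA in a nutshell: truncated SVD of a term-document matrix projects terms
# and documents into a low-rank "latent semantic space" where related words
# end up close even if they never co-occur. Toy illustration of the idea.
import numpy as np

terms = ["court", "verdict", "theft", "goal", "match", "league"]
docs = np.array([          # columns = documents, rows = term counts (invented)
    [2, 1, 0, 0],          # court
    [1, 2, 0, 0],          # verdict
    [1, 1, 0, 1],          # theft
    [0, 0, 2, 1],          # goal
    [0, 0, 1, 2],          # match
    [0, 1, 2, 1],          # league
], dtype=float)

U, s, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2                                         # keep the top-2 latent concepts
term_vecs = U[:, :k] * s[:k]                  # term coordinates in LSA space

def similarity(a, b):
    va, vb = term_vecs[terms.index(a)], term_vecs[terms.index(b)]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

print(f"court~verdict {similarity('court', 'verdict'):.2f}")   # high
print(f"court~goal    {similarity('court', 'goal'):.2f}")      # low
```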
APA, Harvard, Vancouver, ISO, and other styles
19

Ujang, Uznir, Francois Anton, Suhaibah Azri, Alias Abdul Rahman and Darka Mioc. "3D Hilbert Space Filling Curves in 3D City Modeling for Faster Spatial Queries". International Journal of 3-D Information Modeling 3, no. 2 (April 2014): 1–18. http://dx.doi.org/10.4018/ij3dim.2014040101.

Full text source
Abstract:
The advantages of three-dimensional (3D) city models can be seen in various applications including photogrammetry, urban and regional planning, computer games, etc. They expand the visualization and analysis capabilities of Geographic Information Systems on cities, and they can be developed using web standards. However, these 3D city models consume much more storage compared to two-dimensional (2D) spatial data. They involve extra geometrical and topological information together with semantic data. Without a proper spatial data clustering method and its corresponding spatial data access method, retrieving portions of, and especially searching, these 3D city models will not be done optimally. Even though current developments are based on an open data model allotted by the Open Geospatial Consortium (OGC) called CityGML, its XML-based structure makes it challenging to cluster the 3D urban objects. In this research, the authors propose an opponent data constellation technique of space-filling curves (3D Hilbert curves) for 3D city model data representation. Unlike previous methods, which try to project 3D or n-dimensional data down to 2D or 3D using Principal Component Analysis (PCA) or Hilbert mappings, in this research they extend the Hilbert space-filling curve to one higher dimension for 3D city model data implementations. The query performance was tested for single object, nearest neighbor and range search queries using a CityGML dataset of 1,000 building blocks, and the results are presented in this paper. Implementing space-filling curves in 3D city modeling will improve data retrieval time by means of optimized 3D adjacency, nearest neighbor information and 3D indexing. The Hilbert mapping, which maps a sub-interval of the [0,1] interval to the corresponding portion of the d-dimensional Hilbert curve, preserves the Lebesgue measure and is Lipschitz continuous. Depending on the application, several alternatives are possible for clustering spatial data together in the third dimension compared to its clustering in 2D.
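To see how a space-filling curve turns 3D adjacency into 1D key locality, here is a sketch using the Morton (Z-order) curve, a simpler cousin of the 3D Hilbert curve the paper employs (Hilbert preserves locality better, but the clustering-by-key principle is the same):

```python
# Space-filling-curve clustering in one dimension: interleave the bits of
# (x, y, z) cell coordinates into a single key. This is the Morton (Z-order)
# curve, a simpler cousin of the 3D Hilbert curve used in the paper.
def morton3d(x, y, z, bits=10):
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)       # bit i of x -> key bit 3i
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

# Sort 3D grid cells (e.g., buildings bucketed into a city grid) by curve key;
# cells that are near each other in space tend to land near each other in key order.
cells = [(5, 1, 0), (4, 0, 0), (5, 0, 0), (0, 7, 3)]
for cell in sorted(cells, key=lambda c: morton3d(*c)):
    print(cell, format(morton3d(*cell), "b"))
```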
APA, Harvard, Vancouver, ISO, and other styles
20

Peng, Yang, Fangqiang Yu, Ming Zhang, Jinglin Xu and Shang Gao. "Solutions on Establishing and Utilizing BIM for General Contractors in Key Scenarios of Digital Construction". Advances in Civil Engineering 2023 (27.07.2023): 1–13. http://dx.doi.org/10.1155/2023/3951336.

Full text source
Abstract:
Nowadays, digital construction has become popular, bringing convenience and efficiency to the traditional building construction industry. The primary tool of digital construction is the building information model (BIM). However, from the perspective of general contractors, unresolved puzzles still hinder obtaining the benefits of digital construction. When establishing a unified BIM from submodels provided by subcontractors, there may be incomplete or inconsistent data during model merging, while extracting submodels from a unified BIM often includes redundant data, making the models less usable for subcontractors. It is also difficult for general contractors to effectively and accurately utilize resource information and submodel changes. This paper proposes solutions that rely on the widely adopted Industry Foundation Classes standard to ensure the universality of the methods. First, a model merging algorithm is proposed to support the continuous merging of submodels created by different subcontractors. Second, an instance-level model extraction method based on strongly related entities is proposed, which extracts model instances to the minimum submodel while meeting subcontractor requirements. Third, a new model storage and indexing method is designed to reduce the complexity of model data and support rapid data retrieval, and a new BIM change detection method based on object metadata is provided. The proposed methods were applied by the general contractor of a large airport project during the construction stage. The application results proved that the proposed methods could ensure the quality of established deepening design models and extracted submodels and significantly reduce human labor by improving efficiency when utilizing BIM, which in turn supported key scenarios throughout the digital construction workflow.
APA, Harvard, Vancouver, ISO, and other styles
21

Liu, Dandan, and Zhaonian Zou. "gCore: Exploring Cross-Layer Cohesiveness in Multi-Layer Graphs". Proceedings of the VLDB Endowment 16, no. 11 (July 2023): 3201–13. http://dx.doi.org/10.14778/3611479.3611519.

Full text source
Abstract:
As multi-layer graphs can give a more accurate and reliable picture of the complex relationships between entities, cohesive subgraph mining, a fundamental task in graph analysis, has been studied on multi-layer graphs in the literature. However, existing cohesive subgraph models are designated for special multi-layer graphs such as multiplex networks and heterogeneous information networks. In this paper, we propose generalized core (gCore), a new notion of cohesive subgraph on general multi-layer graphs without any predefined constraints on the interconnections between vertices. The gCore model considers both the intra-layer and cross-layer cohesiveness of vertices. Three related problems are studied in this paper including gCore search (GCS), gCore decomposition (GCD), and gCore indexing (GCI). A polynomial-time algorithm based on the peeling paradigm is proposed to solve the GCS problem. By considering the containment among gCores, a "tree of trees" data structure called KP-tree is designed for efficiently solving the GCD problem and serving as a compact storage and index of all gCores. Several advanced lossless compaction techniques including node/subtree elimination, subtree transplant, and subtree merge are proposed to help reduce the storage overhead of the KP-tree and speed up the process of solving GCD and GCI. Besides, a KP-tree-based GCS algorithm is designed, which can retrieve any gCore in linear time in the size of the gCore and the height of the KP-tree. The experiments on 10 real-world graphs verify the effectiveness of the gCore model and the efficiency of the proposed algorithms.
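The peeling paradigm mentioned above is the classic k-core algorithm; the sketch below shows it on a single-layer graph (gCore's GCS algorithm generalizes this peeling across layers and cross-layer cohesiveness, which this toy omits):

```python
# The peeling paradigm behind core search: repeatedly delete vertices whose
# degree falls below k until every survivor has >= k neighbors left. This is
# the classic single-layer k-core; gCore extends the idea across layers.
from collections import deque

def k_core(adj, k):
    deg = {v: len(ns) for v, ns in adj.items()}
    queue = deque(v for v, d in deg.items() if d < k)
    dead = set(queue)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dead:
                deg[u] -= 1                    # peeling v weakens its neighbors
                if deg[u] < k:
                    dead.add(u)
                    queue.append(u)
    return {v for v in adj if v not in dead}

adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"},
    "c": {"a", "b", "d"}, "d": {"a", "b", "c", "e"},
    "e": {"d", "f"}, "f": {"e"},
}
print(sorted(k_core(adj, 3)))   # ['a', 'b', 'c', 'd']: the 3-core
```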
APA, Harvard, Vancouver, ISO, and other styles
22

Snehal Eknath Phule. "Graph Theory Applications in Database Management". International Journal of Scientific Research in Modern Science and Technology 3, no. 3 (16.03.2024): 13–17. http://dx.doi.org/10.59828/ijsrmst.v3i3.190.

Full text source
Abstract:
Graph theory, a branch of discrete mathematics, has emerged as a powerful tool in various domains, including database management. This abstract investigates the ways in which ideas and methods from graph theory can be applied to database systems, offering a thorough synopsis of their benefits. Complex interactions within data can be well modeled using the basic concepts of graph theory, such as nodes, edges, and relationships. Because of their capacity to represent and query complex relationships, graph databases have become more and more popular in the field of database administration. Graph databases are well suited for situations such as social networks, recommendation systems, and interconnected data domains because they are excellent at representing and traversing relationships, in contrast to standard relational databases, which excel at managing structured data. The abstract delves into the key graph-based data models, such as property graphs and RDF (Resource Description Framework), explaining how they facilitate the representation of diverse relationships. Furthermore, it explores the efficient storage and retrieval mechanisms that leverage graph traversal algorithms to extract valuable insights from interconnected datasets. The document highlights specific use cases where graph theory contributes to database management, including fraud detection, social network analysis, and recommendation systems. Additionally, it discusses the challenges associated with integrating graph databases into existing infrastructures and proposes solutions to address scalability and performance concerns. The abstract also touches upon advancements in graph database query languages such as Cypher and SPARQL, showcasing their expressive power in querying complex relationships. The inclusion of graph-based indexing and optimization techniques demonstrates how database systems can efficiently handle queries involving large-scale graph data. As graph databases continue to evolve, this abstract concludes by outlining potential future directions at the intersection of graph theory and database management. It emphasizes the importance of ongoing research in developing scalable and efficient solutions for managing interconnected data, ultimately paving the way for more sophisticated and context-aware database systems.
APA, Harvard, Vancouver, ISO, and other styles
23

Saukova, Y. N., and M. A. Hundzina. "Tensor Calculus in Digital Colorimetry". Devices and Methods of Measurements 13, no. 3 (24.10.2022): 216–27. http://dx.doi.org/10.21122/2220-9506-2022-13-3-216-227.

Full text source
Abstract:
Any object can have many implementations in the form of digital images, and any digital image can be processed many times, increasing or decreasing accuracy and reliability. Digital colorimetry faces the need to work out issues of ensuring accuracy, metrological traceability and reliability. The purpose of this work was to generalize approaches to the description of multidimensional quantized spaces and show the possibilities of their adaptation to digital colorimetry. This approach will minimize the private and global risks in measurements. For color identification, digital colorimetry uses standard color models and spaces. Most of them are empirical and are improved during the transition from standard to real observation conditions, taking into account the phenomena of vision and the age of observers. From the point of view of measurement, a digital image can be represented by a combinatorial model of an information and measurement channel, with the appearance of a color covariance hypercube requiring a significant amount of memory for data storage and processing. The transition from the covariance hypercube to high-dimensional matrices and tensors of the first, second and higher ranks offers the prospect of optimizing the color parameters of a digital image by the criterion of information entropy. Tensor calculus provides opportunities for expanding the dynamic range in color measurements, describing multidimensional vector fields and quantized spaces with indexing tensors and decomposing them into matrices of low orders. The proposed complex approach is based on tensor calculus. According to this approach, the color space is a set of directed vector fields undergoing sampling, quantization and coding operations; it is also a dynamic open system exchanging information with the environment at a given level, allowing color to be identified with specified levels of accuracy, reliability, uncertainty and entropy.
APA, Harvard, Vancouver, ISO, and other styles
24

Iliopoulos, Costas. "Storage and indexing of massive data". Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 372, no. 2016 (28.05.2014): 20130213. http://dx.doi.org/10.1098/rsta.2013.0213.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
25

Cao, Zhixuan, Abani Patra, Marcus Bursik, E. Bruce Pitman and Matthew Jones. "Plume-SPH 1.0: a three-dimensional, dusty-gas volcanic plume model based on smoothed particle hydrodynamics". Geoscientific Model Development 11, no. 7 (9.07.2018): 2691–715. http://dx.doi.org/10.5194/gmd-11-2691-2018.

Full text source
Abstract:
Plume-SPH provides the first particle-based simulation of volcanic plumes. Smoothed particle hydrodynamics (SPH) has several advantages over currently used mesh-based methods in modeling multiphase free boundary flows like volcanic plumes. This tool will provide more accurate eruption source terms to users of volcanic ash transport and dispersion models (VATDs), greatly improving volcanic ash forecasts. The accuracy of these terms is crucial for forecasts from VATDs, and the 3-D SPH model presented here will provide better numerical accuracy. As an initial effort to exploit the feasibility and advantages of SPH in volcanic plume modeling, we adopt a relatively simple physics model (a 3-D dusty-gas dynamic model assuming well-mixed eruption material, dynamic equilibrium and thermodynamic equilibrium between erupted material and the air entrained into the plume, and minimal effect of winds) targeted at capturing the salient features of a volcanic plume. The documented open-source code is easily obtained and extended to incorporate other models of physics of interest to the large community of researchers investigating multiphase free boundary flows of volcanic or other origins. The Plume-SPH code (https://doi.org/10.5281/zenodo.572819) also incorporates several newly developed techniques in SPH needed to address numerical challenges in simulating multiphase compressible turbulent flow. The code should thus also be of general interest to the much larger community of researchers using and developing SPH-based tools. In particular, the SPH-ε turbulence model is used to capture mixing at unresolved scales. Heat exchange due to turbulence is calculated by a Reynolds analogy, and a corrected SPH is used to handle tensile instability and deficiency of particle distribution near the boundaries. We also developed methodology to impose velocity inlet and pressure outlet boundary conditions, both of which are scarce in traditional implementations of SPH. The core solver of our model is parallelized with the message passing interface (MPI), obtaining good weak and strong scalability using novel techniques for data management based on space-filling curves (SFCs), object creation time-based indexing and hash-table-based storage schemes. These techniques are of interest to researchers engaged in developing particle-in-cell-type methods. The code is first verified by 1-D shock tube tests, then by comparing velocity and concentration distributions along the central axis and on transverse cross sections with experimental results of JPUE (jet or plume that is ejected from a nozzle into a uniform environment). Profiles of several integrated variables are compared with those calculated by existing 3-D plume models for an eruption with the same mass eruption rate (MER) estimated for the Mt. Pinatubo eruption of 15 June 1991. Our results are consistent with existing 3-D plume models. Analysis of the plume evolution process demonstrates that this model is able to reproduce the physics of plume development.
APA, Harvard, Vancouver, ISO, and other styles
26

Du, M., J. Wang, C. Jing, J. Jiang and Q. Chen. "HIERARCHICAL DATA MODEL FOR STORAGE AND INDEXING OF MASSIVE STREET VIEW". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W13 (5.06.2019): 1295–99. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w13-1295-2019.

Full text source
Abstract:
Maintaining an up-to-date inventory of urban infrastructure such as fire hydrants is critical to urban management. Street view databases such as Google Street View and Baidu Street View contain street-level images whose potential for urban management has not been fully explored. For such massive imagery, a data model for storage and indexing is an important research issue. Considering the multiple cameras and the GPS device in the image capturing platform, a hierarchical data model named 3D-Grid is proposed. Massive street view images are stored according to grid ID, GPS time and camera ID. An efficient time indexing algorithm is put forward to replace spatial indexing. Real test experiments were conducted in a project, and the validity and feasibility of 3D-Grid, including the time indexing algorithm, were confirmed.
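A sketch of what a 3D-Grid-style composite key buys (the key layout below is illustrative, not the paper's exact model): storing images under (grid cell, GPS time, camera) keys lets a cheap time-range scan within one cell stand in for a spatial index lookup.

```python
# Sketch of a hierarchical composite key for street view imagery: images are
# stored under (grid cell, GPS time, camera) so that a time-range scan within
# one grid cell replaces a spatial index lookup. Key layout is illustrative.
import bisect

store = []   # sorted list of ((grid_id, gps_time, camera_id), image_ref)

def put(grid_id, gps_time, camera_id, image_ref):
    bisect.insort(store, ((grid_id, gps_time, camera_id), image_ref))

def time_range(grid_id, t0, t1):
    lo = bisect.bisect_left(store, ((grid_id, t0, -1), ""))
    hi = bisect.bisect_right(store, ((grid_id, t1, 1 << 30), ""))
    return [ref for _, ref in store[lo:hi]]

put("cell-42", 1559700000, 3, "img_0001.jpg")
put("cell-42", 1559700005, 1, "img_0002.jpg")
put("cell-07", 1559700002, 3, "img_0003.jpg")
print(time_range("cell-42", 1559700000, 1559700004))   # ['img_0001.jpg']
```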
APA, Harvard, Vancouver, ISO, and other styles
27

Moses, Timothy, Abubakar Usman Othman, Umar Yahaya Aisha, Abdulsalam Ya’u Gital, Boukari Souley and Badmos Tajudeen Adeleke. "Big data indexing: Taxonomy, performance evaluation, challenges and research opportunities". Journal of Computer Science and Engineering (JCSE) 3, no. 2 (6.09.2022): 71–94. http://dx.doi.org/10.36596/jcse.v3i2.548.

Full text source
Abstract:
To efficiently retrieve information from very large and complex datasets with distributed storage in cloud computing, indexing methods are continually applied to big data. Big data has grown quickly due to the accessibility of internet connections, mobile devices such as smartphones and tablets, body-sensor devices, and cloud applications. Big data indexing faces a variety of problems as a result of this expansion, which is seen in the healthcare industry, manufacturing, the sciences, commerce, social networks, and agriculture. Due to their high storage and processing requirements, current indexing approaches fall short of meeting the needs of big data in cloud computing. An effective index strategy is necessary to fulfil the indexing requirements of big data. This paper presents the state-of-the-art indexing techniques currently proposed for big data, identifies the problems these techniques and big data are facing, and outlines future directions for research on big data indexing in cloud computing. It also compares the performance of these techniques across the taxonomy, based on mean average precision and precision-recall rate.
APA, Harvard, Vancouver, ISO, and other styles
28

Adeleke, Imran A., Adegbuyi D. Gbadebo and Abayomi O. Dawodu. "A B+-Tree-Based Indexing and Storage of Numerical Records in School Databases". Asian Journal of Research in Computer Science 16, no. 4 (26.12.2023): 418–27. http://dx.doi.org/10.9734/ajrcos/2023/v16i4401.

Full text source
Abstract:
The need for effective indexing and retrieval of data is paramount in any contemporary organization, and tree data structures have proved effective in this regard, as is evident in the literature. This article gives an overview of the B+-tree data structure, its indexing technique, and its application to indexing and retrieving students' academic records in the school system in order to make such records flexible. The study demonstrates the indexing and arrangement patterns of some numerical data. In essence, it discusses how to adopt the B+-tree data structure to manage numerical data in order to enhance the indexing, retrieval and modification of such records. It concludes that good record management results in more convenient indexing and retrieval of students' academic records within the school system.
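To make the B+-tree intuition concrete, here is a deliberately tiny two-level sketch in its spirit (sorted leaf pages plus a separator array; a real B+-tree recurses, links leaves, and rebalances on update, all of which this toy omits):

```python
# A tiny, two-level B+-tree-like index over numeric records: sorted leaf
# pages of fixed capacity, a separator array routing to leaves, and leaves
# scanned left-to-right for range queries. Shows why lookups and range
# scans are cheap; not a full B+-tree implementation.
import bisect

LEAF_CAP = 4

def build(records):
    items = sorted(records.items())             # (student_id, record) pairs
    leaves = [items[i:i + LEAF_CAP] for i in range(0, len(items), LEAF_CAP)]
    seps = [leaf[0][0] for leaf in leaves[1:]]  # smallest key of each later leaf
    return seps, leaves

def lookup(index, key):
    seps, leaves = index
    leaf = leaves[bisect.bisect_right(seps, key)]   # route via separators
    i = bisect.bisect_left([k for k, _ in leaf], key)
    return leaf[i][1] if i < len(leaf) and leaf[i][0] == key else None

def range_scan(index, lo, hi):
    seps, leaves = index
    out = []
    for leaf in leaves[bisect.bisect_right(seps, lo):]:
        for k, v in leaf:
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
    return out

idx = build({sid: f"record-{sid}" for sid in range(100, 160, 5)})
print(lookup(idx, 125))              # record-125
print(range_scan(idx, 120, 137))     # keys 120, 125, 130, 135 with their records
```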
APA, Harvard, Vancouver, ISO, and other styles
29

Vimal, Vrince. "An Efficient and Secure Query Processing and Indexing model for Secure Dynamic Cloud Storage". Turkish Journal of Computer and Mathematics Education (TURCOMAT) 10, no. 2 (10.09.2019): 1043–48. http://dx.doi.org/10.17762/turcomat.v10i2.13623.

Full text source
Abstract:
To ensure the security and privacy of stored data, as well as the efficacy and efficiency of cloud storage, it is necessary to overcome significant challenges, such as efficient and secure query processing and indexing in dynamic cloud storage. There are a number of limitations with the present methodologies and tactics for query processing and indexing in cloud storage, including high processing overhead, scalability problems, and security concerns. In this paper, we provide a method for efficiently and securely executing queries and indexing data in dynamic cloud storage. The suggested system incorporates scalable indexing techniques, secure query processing, and dynamic data management to overcome these issues. The proposed system has several potential uses in many different areas, including healthcare, finance, e-commerce, government, and research. As new problems arise with cloud storage services, the proposed approach will need to be adjusted and enhanced via ongoing research and development. The proposed method has the potential to enhance data administration and analysis in dynamically managed cloud storage service environments while also protecting data privacy and security.
APA, Harvard, Vancouver, ISO, and other styles
30

Chen, S., Z. Wang, L. Bai, K. Liu, J. Gao, M. Zhao and M. D. Mulvenna. "LARGE VECTOR SPATIAL DATA STORAGE AND QUERY PROCESSING USING CLICKHOUSE". International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-M-1-2023 (21.04.2023): 65–72. http://dx.doi.org/10.5194/isprs-archives-xlviii-m-1-2023-65-2023.

Full text source
Abstract:
The exponential growth of geospatial data resulting from the development of Earth observation technology has created significant challenges for traditional relational databases. While NoSQL databases based on distributed file systems can handle massive data storage, they often struggle to cope with real-time queries. Column-oriented databases, on the other hand, are highly effective at both storage and query processing for large-scale datasets. In this paper, we propose a spatial version of ClickHouse that leverages R-tree indexing to enable efficient storage and real-time analysis of massive remote sensing data. ClickHouse is a column-oriented, open-source database management system designed for handling large-scale datasets. By integrating R-tree indexing, we have created a highly efficient system for storing and querying geospatial data. To evaluate the performance of our system, we compare it with HBase, a popular distributed NoSQL database system. Our experimental results show that ClickHouse outperforms HBase in handling spatial data queries, with a response time approximately three times faster than HBase. We attribute this performance gain to the highly efficient R-tree indexing used in ClickHouse, which allows for fast spatial data queries.
APA, Harvard, Vancouver, ISO, and other styles
31

Wang, Sheng, David Maier and Beng Chin Ooi. "Lightweight indexing of observational data in log-structured storage". Proceedings of the VLDB Endowment 7, no. 7 (March 2014): 529–40. http://dx.doi.org/10.14778/2732286.2732290.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
32

Tulkinbekov, Khikmatullo, and Deok-Hwan Kim. "Data Modifications in Blockchain Architecture for Big-Data Processing". Sensors 23, no. 21 (27.10.2023): 8762. http://dx.doi.org/10.3390/s23218762.

Full text source
Abstract:
Due to the immutability of blockchain, the integration with big-data systems creates limitations on redundancy, scalability, cost, and latency. Additionally, large amounts of invaluable data result in the waste of energy and storage resources. As a result, the demand for data deletion possibilities in blockchain has risen over the last decade. Although several prior studies have introduced methods to address data modification features in blockchain, most of the proposed systems need shorter deletion delays and security requirements. This study proposes a novel blockchain architecture called Unlichain that provides data-modification features within public blockchain architecture. To achieve this goal, Unlichain employed a new indexing technique that defines the deletion time for predefined lifetime data. The indexing technique also enables the deletion possibility for unknown lifetime data. Unlichain employs a new metadata verification consensus among full and meta nodes to avoid delays and extra storage usage. Moreover, Unlichain motivates network nodes to include more transactions in a new block, which motivates nodes to scan for expired data during block mining. The evaluations proved that Unlichain architecture successfully enables instant data deletion while the existing solutions suffer from block dependency issues. Additionally, storage usage is reduced by up to 10%.
33

Singhal, Shubhanshi, Akanksha Kaushik and Pooja Sharma. "A Novel approach of data deduplication for distributed storage". International Journal of Engineering & Technology 7, no. 2.4 (10.03.2018): 46. http://dx.doi.org/10.14419/ijet.v7i2.4.10040.

Abstract:
Due to the drastic growth of digital data, data deduplication has become a standard component of modern backup systems. It reduces data redundancy, saves storage space, and simplifies the management of data chunks. The process is performed in three steps: chunking, fingerprinting, and indexing of fingerprints. In chunking, data files are divided into chunks, and the chunk boundary is decided by the value of the divisor. For each chunk, a unique identifying value is computed using a hash function (e.g., MD5, SHA-1, SHA-256), known as its fingerprint. Finally, these fingerprints are stored in the index to detect redundant chunks, i.e., chunks having the same fingerprint values. In chunking, the chunk size is an important factor that should be optimal for better performance of the deduplication system. The genetic algorithm (GA) is gaining much popularity and can be applied to find the best value of the divisor. Secondly, indexing also enhances the performance of the system by reducing search time. Binary search tree (BST)-based indexing has a search time complexity of O(log n), which is minimal among the searching algorithms considered. A new model is proposed that uses a GA to find the value of the divisor; it is the first attempt to apply a GA in the field of data deduplication. The second improvement in the proposed system is that a BST index tree is applied to index the fingerprints. The performance of the proposed system is evaluated on VMDK, Linux, and Quanto datasets, and a good improvement in deduplication ratio is achieved.
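As a rough illustration of the three-step pipeline this abstract describes (divisor-based chunking, fingerprinting, fingerprint indexing), here is a hedged Python sketch. The MD5-based boundary test is a simplified stand-in for a true rolling hash, the `DIVISOR` and `WINDOW` values are arbitrary, and a plain dict replaces the paper's BST index.

```python
# Sketch of chunking, fingerprinting, and fingerprint indexing.
import hashlib

DIVISOR = 4096          # chunk-boundary parameter (the value a GA would tune)
WINDOW = 16             # bytes in the sliding window

def chunk(data: bytes):
    start = 0
    for i in range(WINDOW, len(data)):
        # Boundary test: weak hash of the last WINDOW bytes, modulo DIVISOR.
        h = int.from_bytes(hashlib.md5(data[i - WINDOW:i]).digest()[:4], "big")
        if h % DIVISOR == 0:
            yield data[start:i]
            start = i
    if start < len(data):
        yield data[start:]

index: dict[str, bytes] = {}        # fingerprint -> stored chunk

def store(data: bytes) -> int:
    """Store a file; return how many bytes were actually new."""
    new_bytes = 0
    for c in chunk(data):
        fp = hashlib.sha256(c).hexdigest()   # the chunk fingerprint
        if fp not in index:                  # only unseen chunks are stored
            index[fp] = c
            new_bytes += len(c)
    return new_bytes

payload = bytes(range(256)) * 64
print(store(payload), store(payload))  # second call stores 0 new bytes
```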
34

Jiang, Chao, Jinlin Wang and Yang Li. "An Efficient Indexing Scheme for Network Traffic Collection and Retrieval System". Electronics 10, no. 2 (15.01.2021): 191. http://dx.doi.org/10.3390/electronics10020191.

Abstract:
Historical network traffic retrieval, both at the packet and flow levels, has been applied in many fields of network security, such as network traffic analysis and network forensics. To retrieve specific packets from a vast number of packet traces, building indexes for the query attributes is an effective solution. However, packet indexing brings challenges of storage consumption and construction-time overhead. To address these challenges, we propose an efficient indexing scheme called IndexWM based on the wavelet matrix data structure. Moreover, we design a packet storage format based on the PcapNG format for our network traffic collection and retrieval system, which can speed up the extraction of index data from packet traces. Offline experiments on randomly generated network traffic and actual network traffic are performed to evaluate the performance of the proposed indexing scheme. We choose an open-source and widely used bitmap indexing scheme, FastBit, for comparison. Apart from the native bitmap compression method Word-Aligned Hybrid (WAH), we implement an efficient bitmap compression method, Scope-Extended COMPAX (SECOMPAX), in FastBit for performance evaluation. The comparison results show that our scheme outperforms the selected bitmap indexing schemes in terms of time consumption, storage consumption, and retrieval efficiency.
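For context on what the paper's wavelet-matrix scheme is compared against, a toy value-per-bitmap index over one packet attribute can look as follows. This sketches only the uncompressed FastBit-style baseline idea, not WAH/SECOMPAX compression or the IndexWM structure itself, and the `proto` attribute is an illustrative assumption.

```python
# Toy bitmap index: one bitmap per distinct attribute value, one bit per packet.
from collections import defaultdict

packets = [{"proto": "tcp"}, {"proto": "udp"}, {"proto": "tcp"}, {"proto": "icmp"}]

bitmaps: dict[str, int] = defaultdict(int)   # value -> bitmap stored as a Python int
for pos, pkt in enumerate(packets):
    bitmaps[pkt["proto"]] |= 1 << pos        # set bit `pos` in that value's bitmap

def lookup(value: str) -> list[int]:
    # Return packet positions whose attribute equals `value`.
    bm = bitmaps.get(value, 0)
    return [i for i in range(len(packets)) if bm >> i & 1]

print(lookup("tcp"))  # -> [0, 2]
```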
35

Geetha, K., and A. Vijaya. "Cross-Layer Fragment Indexing based File Deduplication using Hyper Spectral Hash Duplicate Filter (HSHDF) for Optimized Cloud Storage". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 8s (18.08.2023): 565–75. http://dx.doi.org/10.17762/ijritcc.v11i8s.7239.

Abstract:
Cloud storage is a major service for maintaining large amounts of data on centralized servers, where users store and retrieve data under a pay-per-use service model. Because storage grows with the presence of duplicate copies in different scenarios, the increased size leads to increased cost. To resolve this problem, we propose Cross-Layer Fragment Indexing (CLFI) based file deduplication using a Hyper Spectral Hash Duplicate Filter (HSHDF) for optimized cloud storage. Initially, file storage indexing is carried out with a Lexical Syntactic Parser (LSP) to split the files into blocks. Then a comparative sector is created based on chunk staking. Based on file frequency weight, relative indexing is verified through Cross-Layer Fragment Indexing (CLFI). The fragmented index is then grouped by the maximum relative threshold margin using Intra Subset Near-Duplicate Clusters (ISNDC). Hashing is applied to get comparative index points based on a hyper-correlation comparer using the Hyper Spectral Hash Duplicate Filter (HSHDF). This filters near-duplicate content depending on file content differences to identify duplicates. The proposed system achieves high performance compared to other systems: it optimizes cloud storage and has a higher precision rate than other methods.
36

Kanellakis, Paris, Sridhar Ramaswamy, Darren E. Vengroff and Jeffrey Scott Vitter. "Indexing for Data Models with Constraints and Classes". Journal of Computer and System Sciences 52, no. 3 (June 1996): 589–612. http://dx.doi.org/10.1006/jcss.1996.0043.

37

YAO, Qiu-Lin, Ying WANG, Ping LIU and Li GUO. "Storage Optimized Containment-Encoded Intervals Indexing for Data Stream Querying". Journal of Software 20, no. 9 (13.11.2009): 2462–69. http://dx.doi.org/10.3724/sp.j.1001.2009.03402.

38

Gibson, Seann, and Kerr Gibson. "Mtree data structure for storage, indexing and retrieval of information". Laboratory Automation & Information Management 33, no. 1 (June 1997): 64. http://dx.doi.org/10.1016/s1381-141x(97)80054-8.

39

S, Karthi, and Prabu S. "Execution Analysis of Spatial Data Storage Indexing on Cloud Environment". Scalable Computing: Practice and Experience 19, no. 4 (29.12.2018): 339–49. http://dx.doi.org/10.12694/scpe.v19i4.1421.

Abstract:
Cloud computing overcomes the GIS issues of huge storage, computation, and reliability. Cloud computing with the SpatialHadoop framework gives high performance in GIS. In this paper, spatial partitioning, global indexing, and MapReduce operations are studied and described in detail. A Bloom-filter R-tree (BR-tree) index is introduced into MapReduce to provide more efficiency than existing approaches. The BR-tree index on MapReduce is implemented in the SpatialHadoop process, which reduces intermediate data access time. The global index decreases the number of data accesses for range queries and thus improves efficiency. Experimental results show that the proposed index in a cloud environment performs better than existing techniques.
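The membership pre-check that a Bloom-filter-augmented R-tree adds in front of node accesses can be sketched as below. The sizes (`m_bits`, `k_hashes`) and the salted-SHA-256 hash construction are illustrative assumptions, not the paper's settings.

```python
# Minimal Bloom filter: a compact set membership test with false positives
# but no false negatives, used to skip index lookups for absent keys.
import hashlib

class BloomFilter:
    def __init__(self, m_bits: int = 1024, k_hashes: int = 3):
        self.m, self.k = m_bits, k_hashes
        self.bits = 0                      # bit array packed into a Python int

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(i.to_bytes(1, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add(b"tile_12_7")
print(bf.might_contain(b"tile_12_7"), bf.might_contain(b"tile_0_0"))  # True False
```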
40

Guan, Runda, Ziyu Wang, Xiaokang Pan, Rongjie Zhu, Biao Song and Xinchang Zhang. "SbMBR Tree—A Spatiotemporal Data Indexing and Compression Algorithm for Data Analysis and Mining". Applied Sciences 13, no. 19 (22.09.2023): 10562. http://dx.doi.org/10.3390/app131910562.

Abstract:
In the field of data analysis and mining, applying efficient data indexing and compression techniques to spatiotemporal data can significantly reduce computational and storage overhead, by controlling the volume of data and exploiting its spatiotemporal characteristics. However, traditional lossy compression techniques are hardly suitable due to their inherently random nature: they often inflict unpredictable damage on scientific data, which affects the results of data mining and analysis tasks that require a certain precision. In this paper, we propose the similarity-based minimum bounding rectangle (SbMBR) tree, a tree-based indexing and compression method, to address the aforementioned problem. Our method hierarchically selects appropriate minimum bounding rectangles according to the given maximum acceptable errors and uses the average value contained in each selected MBR to replace the original data, achieving data compression with multi-layer loss control. This paper also provides the corresponding tree-construction and range-query processing algorithms for the indexing structure. To evaluate data-quality preservation in cross-domain data analysis and mining scenarios, we use mutual information as the estimation metric. Experimental results demonstrate the superiority of our method over several typical indexing and compression algorithms.
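The core loss-controlled replacement step this abstract describes, substituting the values inside a region with their average only when a maximum-error budget holds, can be shown in one short function. This one-dimensional version with the hypothetical `max_err` parameter is a simplification of the hierarchical SbMBR construction.

```python
# Replace a segment with its mean only if every point stays within max_err.
def compress_segment(values: list[float], max_err: float):
    mean = sum(values) / len(values)
    if max(abs(v - mean) for v in values) <= max_err:
        return ("avg", mean)          # store one number for the whole segment
    return ("raw", values)            # error bound violated: keep the originals

print(compress_segment([1.0, 1.1, 0.9], 0.2))   # ('avg', 1.0)
print(compress_segment([1.0, 5.0], 0.2))        # ('raw', [1.0, 5.0])
```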
41

Kouahla, Zineddine, Ala-Eddine Benrazek, Mohamed Amine Ferrag, Brahim Farou, Hamid Seridi, Muhammet Kurulay, Adeel Anjum and Alia Asheralieva. "A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues". Future Internet 14, no. 1 (31.12.2021): 19. http://dx.doi.org/10.3390/fi14010019.

Abstract:
The past decade has been characterized by growing volumes of data due to the widespread use of Internet of Things (IoT) applications, which has introduced many challenges for efficient data storage and management. Thus, the efficient indexing and searching of large data collections is a very topical and urgent issue. Such solutions can provide users with valuable information about IoT data. However, efficient retrieval and management of such information in terms of index size and search time require optimization of indexing schemes, which is rather difficult to implement. The purpose of this paper is to examine and review existing indexing techniques for large-scale data. A taxonomy of indexing techniques is proposed to enable researchers to understand and select the techniques that will serve as a basis for designing a new indexing scheme. The real-world applications of the existing indexing techniques in different areas, such as health, business, scientific experiments, and social networks, are presented. Open problems and research challenges, e.g., privacy and large-scale data mining, are also discussed.
42

Li, Wu, Wu and Zhao. "An Adaptive Construction Method of Hierarchical Spatio-Temporal Index for Vector Data under Peer-to-Peer Networks". ISPRS International Journal of Geo-Information 8, no. 11 (12.11.2019): 512. http://dx.doi.org/10.3390/ijgi8110512.

Abstract:
Spatio-temporal indexing is a key technique in spatio-temporal data storage and management. Indexing methods based on space-filling curves are popular in research on the spatio-temporal indexing of vector data in non-relational (NoSQL) databases. However, the existing methods mostly focus on spatial indexing, which makes it difficult to balance the efficiency of temporal and spatial queries. In addition, for non-point elements (line and polygon elements), it remains difficult to determine the optimal index level. To address these issues, this paper proposes an adaptive construction method for a hierarchical spatio-temporal index for vector data. Firstly, a joint spatio-temporal information coding based on the combination of partition- and sort-key strategies is presented. Secondly, the multilevel expression structure of spatio-temporal elements, consisting of point and non-point elements, in the joint coding is given. Finally, an adaptive multi-level index tree is proposed to realize the spatio-temporal index (Multi-level Sphere 3, MLS3) based on the spatio-temporal characteristics of geographical entities. Comparison with the XZ3 index algorithm proposed by GeoMesa proved that the MLS3 indexing method not only reasonably expresses the spatio-temporal features of non-point elements and determines their optimal index level, but also avoids storage hotspots while achieving highly efficient spatio-temporal retrieval.
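A common way to realize joint spatio-temporal coding is bit interleaving over discretized coordinates, as in Morton/Z-order keys: the sketch below shows that general idea only, since the exact XZ3 and MLS3 encodings differ and the 10-bit resolution is an assumption.

```python
# Interleave the bits of discretized x, y, t into one sortable key, so that
# cells close in space and time tend to receive numerically close keys.
def interleave3(x: int, y: int, t: int, bits: int = 10) -> int:
    key = 0
    for i in range(bits):
        key |= ((x >> i & 1) << (3 * i)) \
            |  ((y >> i & 1) << (3 * i + 1)) \
            |  ((t >> i & 1) << (3 * i + 2))
    return key

# Nearby cells at nearby times get nearby keys, enabling range scans:
print(interleave3(5, 9, 3), interleave3(5, 9, 4))
```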
43

Yang, Yuqi, Xiaoqing Zuo, Kang Zhao and Yongfa Li. "Non-Uniform Spatial Partitions and Optimized Trajectory Segments for Storage and Indexing of Massive GPS Trajectory Data". ISPRS International Journal of Geo-Information 13, no. 6 (12.06.2024): 197. http://dx.doi.org/10.3390/ijgi13060197.

Abstract:
Publicly accessible GPS mobile devices produce abundant spatio-temporal information on the locations of moving objects, which makes it crucial to collect, analyze, and mine such information. It is therefore necessary to index large volumes of trajectory data to facilitate efficient trajectory retrieval and access. Existing indexing methods, which primarily rely on data-driven indexing structures (such as the R-tree) or space-driven indexing structures (such as the quadtree), struggle to support efficient analysis and computation based on spatio-temporal range queries, especially when applied to massive trajectory data. In this study, we propose a massive GPS data storage and indexing method based on non-uniform spatial partitioning and optimized trajectory segmentation. First, the method divides GPS trajectories in a large spatio-temporal data space into multiple MBR sequences using a greedy algorithm. Then, a hybrid indexing model for the segmented trajectories is constructed to form a global spatio-temporal segmentation scheme, called the HHBITS index, to achieve hierarchical organization of trajectory data. Finally, a spatio-temporal range query processing method is proposed based on this index. This paper implements and evaluates the index in MongoDB and compares it with two other spatio-temporal composite indexes in performing spatio-temporal range queries efficiently. The experimental results show that the method performs well in responding to spatio-temporal queries over large-scale trajectory data.
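The greedy division of a trajectory into MBR sequences can be illustrated as below. The area budget used as the split criterion (`max_area`) is an assumption for illustration; the paper's actual optimization criterion is more elaborate.

```python
# Greedily extend the current bounding box until it exceeds an area budget,
# then close the segment and start a new one.
def greedy_mbrs(points: list[tuple[float, float]], max_area: float):
    segments, cur = [], [points[0]]
    for p in points[1:]:
        cand = cur + [p]
        xs, ys = [q[0] for q in cand], [q[1] for q in cand]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if area > max_area:           # box grew too much: close the segment
            segments.append(cur)
            cur = [p]
        else:
            cur = cand
    segments.append(cur)
    return segments

track = [(0, 0), (0.1, 0.1), (0.2, 0.1), (5, 5), (5.1, 5.2)]
print(len(greedy_mbrs(track, max_area=1.0)))   # -> 2 segments
```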
44

Mohamed, Mohamed Attia, Manal A. Abdel-Fattah and Ayman E. Khedr. "Challenges and Recommendations in Big Data Indexing Strategies". International Journal of e-Collaboration 17, no. 2 (April 2021): 22–39. http://dx.doi.org/10.4018/ijec.2021040102.

Abstract:
Index structures are one of the main strategies for effective data access. With the expansion of data, traditional indexing strategies on big data meet several challenges that lead to weak performance; they cannot handle the rapid increase of data while preserving accurate retrieval results and acceptable processing times. It therefore becomes necessary to substitute the traditional index with a more efficient index structure, the learned index. A learned index uses machine learning models to tackle such issues and achieve further improvements in processing time and result accuracy. In this research, the authors discuss different indexing strategies on big data, both traditional and learned indexes, demonstrate their main features, compare their performance, and present big data indexing challenges and solutions. Consequently, the research suggests replacing traditional indexes with dynamic learned-index models, which lead to shorter processing times and more accurate results, taking the specification of the hardware used into consideration.
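The learned-index idea the article surveys can be made concrete with a minimal sketch: a linear model predicts a key's position in a sorted array, and the model's maximum training error bounds a local correction search. Real learned indexes use model hierarchies and per-segment error bounds; this single-model version is only a sketch.

```python
# Minimal learned index: linear position model + bounded local search.
import bisect

class LearnedIndex:
    def __init__(self, keys: list[int]):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept through the two endpoints.
        span = self.keys[-1] - self.keys[0] or 1
        self.slope = (n - 1) / span
        self.intercept = -self.slope * self.keys[0]
        # Max prediction error over the keys bounds the correction window.
        self.err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key: int) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: int) -> bool:
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i < hi and self.keys[i] == key

idx = LearnedIndex([2, 3, 5, 7, 11, 13, 17, 19])
print(idx.lookup(11), idx.lookup(4))   # True False
```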
45

Kunfang, Song, and Hongwei Lu. "Efficient Querying Distributed Big-XML Data using MapReduce". International Journal of Grid and High Performance Computing 8, no. 3 (July 2016): 70–79. http://dx.doi.org/10.4018/ijghpc.2016070105.

Abstract:
MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. This paper proposes an approach to exploit data parallelism in XML processing using MapReduce in Hadoop. The authors' solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process massive amounts of XML data. Specifically, the authors introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, an advanced two-phase MapReduce solution is designed that efficiently addresses the issues of labeling, indexing, and query processing on big XML data. The experimental results show the efficiency and effectiveness of the proposed parallel XML processing approach using Hadoop.
46

Aouat, Saliha, and Slimane Larabi. "Object Retrieval Using the Quad-Tree Decomposition". Journal of Intelligent Systems 23, no. 1 (1.01.2014): 33–47. http://dx.doi.org/10.1515/jisys-2013-0014.

Abstract:
We propose in this article an indexing and retrieval approach applied to outline shapes. Models of objects are stored in a database using the textual descriptors of their silhouettes. We extract from the textual description a set of efficient similarity measures to index the silhouettes. The extracted features are geometric quasi-invariants that vary only slightly with small changes in viewpoint. We use a textual description and quasi-invariant features to minimize storage space and to achieve an efficient indexing process. We also use the quad-tree structure to improve processing time during indexing. Using both geometric features and quad-tree decomposition facilitates the recognition and retrieval processes. Our approach is applied to the outline shapes of three-dimensional objects. Experiments conducted on two well-known databases show the efficiency of our method in real-world applications, especially for image indexing and retrieval.
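Quad-tree decomposition, which the authors use to speed up indexing, recursively splits a region into four quadrants until each is uniform. Below is a minimal sketch over a binary silhouette image; the input is illustrative, not the authors' descriptor pipeline.

```python
# Quad-tree decomposition of a square binary image: uniform regions become
# leaves, mixed regions split into NW, NE, SW, SE quadrants.
def quadtree(img, x=0, y=0, size=None):
    size = size or len(img)
    vals = {img[y + j][x + i] for j in range(size) for i in range(size)}
    if len(vals) == 1 or size == 1:
        return vals.pop()                    # leaf: the region is uniform
    h = size // 2
    return [quadtree(img, x, y, h),          # NW
            quadtree(img, x + h, y, h),      # NE
            quadtree(img, x, y + h, h),      # SW
            quadtree(img, x + h, y + h, h)]  # SE

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1]]
print(quadtree(img))   # -> [0, 1, [0, 1, 1, 1], 1]
```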
47

Leung, A. W., M. Shao, T. Bisson, S. Pasupathy and E. L. Miller. "High-performance metadata indexing and search in petascale data storage systems". Journal of Physics: Conference Series 125 (1.07.2008): 012069. http://dx.doi.org/10.1088/1742-6596/125/1/012069.

48

Doan, Quang-Tu, A. S. M. Kayes, Wenny Rahayu and Kinh Nguyen. "Integration of IoT Streaming Data With Efficient Indexing and Storage Optimization". IEEE Access 8 (2020): 47456–67. http://dx.doi.org/10.1109/access.2020.2980006.

49

Krommyda, Maria, and Verena Kantere. "Spatial Data Management in IoT Systems: Solutions and Evaluation". International Journal of Semantic Computing 15, no. 01 (March 2021): 117–39. http://dx.doi.org/10.1142/s1793351x21300016.

Abstract:
As Internet of Things (IoT) systems gain in popularity, an increasing number of Big Data sources are available. Ranging from small sensor networks designed for household use to large, fully automated industrial environments, IoT systems create billions of measurements each second, making traditional storage and indexing solutions obsolete. While research around Big Data has focused on scalable solutions that can support the datasets produced by these systems, the focus has been mainly on managing the volume and velocity of the data, rather than on providing efficient solutions for their retrieval and analysis. A key characteristic of these data, which is more often than not overlooked, is the spatial information that can be used to integrate data from multiple sources and conduct multi-dimensional analysis of the collected information. We present here the solutions currently available for the storage and indexing of spatial datasets produced by IoT systems, and we discuss their applicability in real-world scenarios.
50

Song, Zhen, Jian Chen and Jiu Yan Ye. "A Mobile Storage System for Massive Spatial Data". Advanced Materials Research 962-965 (June 2014): 2730–34. http://dx.doi.org/10.4028/www.scientific.net/amr.962-965.2730.

Abstract:
This paper addresses the problems of large data volumes, inefficient computation, and poor user experience that exist in GIS applications on mobile devices. To solve these issues, we use GIS technologies including spatial data organization, map browsing, and spatial indexing. We focus on how to effectively utilize system resources and organize spatial data rationally and efficiently. Moreover, we improve the R-tree indexing algorithm to establish a fast spatial index structure and use hierarchical classification techniques to optimize real-time visualization of spatial data on mobile devices.