
Dissertations / Theses on the topic 'Distributed databases'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Distributed databases.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Unnava, Vasundhara. "Query processing in distributed database systems." Connect to resource, 1992. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1261314105.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bielecki, Pavel. "Distributed relational database system of occasionally connected databases." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2000. http://handle.dtic.mil/100.2/ADA378092.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Karlapalem, Kamalakar. "Redesign of distributed relational databases." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/9173.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Dixon, Eric Richard. "Developing distributed applications with distributed heterogenous databases." Thesis, Virginia Tech, 1993. http://hdl.handle.net/10919/42748.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Hsu, Ing-Miin. "Distributed rule monitoring in distributed active databases /." The Ohio State University, 1993. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487841975356679.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Juntunen, R. (Risto). "Tradeoffs in distributed databases." Bachelor's thesis, University of Oulu, 2016. http://urn.fi/URN:NBN:fi:oulu-201602231230.

Full text
Abstract:
In a distributed database, data is spread throughout the network into separate nodes with different DBMS systems (Date, 2000). According to the CAP theorem, three database properties (consistency, availability and partition tolerance) cannot be achieved simultaneously in distributed database systems: two of these properties can be achieved, but not all three at the same time (Brewer, 2000). Since this theorem was formulated there has been some development in network infrastructure, and new methods to achieve consistency in distributed databases have emerged. This paper discusses these trade-offs in distributed databases.
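
For readers new to the area, the consistency/availability trade-off described above is often reasoned about with quorum arithmetic. The following minimal Python sketch (an illustration, not taken from the thesis) shows the standard rule that a replica configuration is strongly consistent only when read and write quorums overlap:

```python
# Minimal sketch (illustrative, not from the thesis) of the quorum rule
# often used to reason about the CAP trade-off: with N replicas, a read
# quorum R and a write quorum W, every read overlaps the latest write
# only when R + W > N; smaller quorums favour availability instead.

def quorum_properties(n: int, r: int, w: int) -> dict:
    """Summarize what a replica/quorum configuration trades away."""
    return {
        "strongly_consistent": r + w > n,   # reads always see the latest write
        "read_fault_tolerance": n - r,      # replicas that may be down on read
        "write_fault_tolerance": n - w,     # replicas that may be down on write
    }

# A consistency-leaning and an availability-leaning configuration:
print(quorum_properties(n=3, r=2, w=2))
print(quorum_properties(n=3, r=1, w=1))
```
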
APA, Harvard, Vancouver, ISO, and other styles
7

Andriopoulos, X. "Databases for distributed realtime systems." Thesis, Imperial College London, 1986. http://hdl.handle.net/10044/1/37926.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Xu, Lianghong. "Online Deduplication for Distributed Databases." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/719.

Full text
Abstract:
The rate of data growth outpaces the decline of hardware costs, and there has been an ever-increasing demand in reducing the storage and network overhead for online database management systems (DBMSs). The most widely used approach for data reduction in DBMSs is block-level compression. Although this method is simple and effective, it fails to address redundancy across blocks and therefore leaves significant room for improvement for many applications. This dissertation proposes a systematic approach, termed similarity-based deduplication, which reduces the amount of data stored on disk and transmitted over the network beyond the benefits provided by traditional compression schemes. To demonstrate the approach, we designed and implemented dbDedup, a lightweight record-level similarity-based deduplication engine for online DBMSs. The design of dbDedup exploits key observations we find in database workloads, including small item sizes, temporal locality, and the incremental nature of record updates. The proposed approach differs from traditional chunk-based deduplication approaches in that, instead of finding identical chunks anywhere else in the data corpus, similarity-based deduplication identifies a single similar data-item and performs differential compression to remove the redundant parts for greater savings. To achieve high efficiency, dbDedup introduces novel encoding, caching and similarity selection techniques that significantly mitigate the deduplication overhead with minimal loss of compression ratio. For evaluation, we integrated dbDedup into the storage and replication components of a distributed NoSQL DBMS and analyzed its properties using four real datasets. Our results show that dbDedup achieves up to 37× reduction in the storage size and replication traffic of the database on its own and up to 61× reduction when paired with the DBMS's block-level compression. dbDedup provides both benefits with negligible effect on DBMS throughput or client latency (average and tail).
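
The core mechanism described, finding one similar record and storing a differential encoding against it, can be sketched in a few lines. The following toy Python is an illustration of the general idea, not dbDedup's implementation; the 0.5 similarity threshold and the use of difflib are assumptions of this sketch.

```python
# Toy illustration of similarity-based deduplication (not dbDedup itself):
# instead of searching for byte-identical chunks, find the most similar
# stored record and keep only a delta against it.
import difflib

store: dict[int, str] = {}   # record_id -> full text of stored records

def most_similar(new: str) -> int | None:
    best_id, best_ratio = None, 0.0
    for rid, old in store.items():
        ratio = difflib.SequenceMatcher(None, old, new).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = rid, ratio
    return best_id if best_ratio > 0.5 else None

def encode(new: str):
    """Return (base_id, delta); delta is the raw text if nothing is similar."""
    base = most_similar(new)
    if base is None:
        return None, new
    ops = difflib.SequenceMatcher(None, store[base], new).get_opcodes()
    # keep only the non-equal spans: a simple differential encoding
    return base, [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def decode(base, delta) -> str:
    if base is None:
        return delta
    out, cursor = [], 0
    for i1, i2, text in delta:
        out.append(store[base][cursor:i1])   # unchanged span copied from the base
        out.append(text)                     # replacement/insertion from the delta
        cursor = i2
    out.append(store[base][cursor:])
    return "".join(out)

store[1] = "id=42 name=alice city=Paris balance=100"
base, delta = encode("id=42 name=alice city=Paris balance=120")
assert decode(base, delta) == "id=42 name=alice city=Paris balance=120"
```
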
APA, Harvard, Vancouver, ISO, and other styles
9

Garcia, Hong-Mei Chen. "A semantics-based methodology for integrated distributed database design: Toward combined logical and fragmentation design and design automation." Diss., The University of Arizona, 1992. http://hdl.handle.net/10150/185936.

Full text
Abstract:
The many advantages of Distributed Database (DDB) systems can only be achieved through proper DDB designs. Since designing a DDB is very difficult and expert designers are relatively few in number, "good" DDB design methodologies and associated computer-aided design tools are needed to help designers cope with design complexity and improve their productivity. Unfortunately, previous DDB design research focused on solving subproblems of data distribution design in isolation. As a result, past research on a general DDB design methodology offered only methodological frameworks that, at best, aggregate a set of non-integrated design techniques. The conventional separation of logical design from fragmentation design is problematic, but has not been fully analyzed. This dissertation presents the SEER-DTS methodology developed for the purposes of overcoming the methodological inadequacies of conventional design methodologies, resolving the DDB design problem in an integrated manner and facilitating design automation. It is based on a static semantic data model, SEER (Synthesized Extended Entity-Relationship Model) and a dynamic data model, DTS (Distributed Transaction Scheme), which together provide complete and consistent modeling mechanisms for acquiring/representing DDB design inputs and facilitating DDB schema design. In this methodology, requirement/distribution analysis and conceptual design are integrated and logical and fragmentation designs are combined. "Semantics-based" design techniques have been developed to allow for end-user design specifications and seamless design schema transformations, thereby simplifying design tasks. Towards our ultimate goal of design automation, an architectural framework for a computer-aided DDB design system, Auto-DDB, was formulated and the system was prototyped. As part of the developmental effort, a real-world DDB design case study was conducted to verify the applicability of the SEER-DTS methodology in a manual design mode. The results of a laboratory experiment showed that the SEER-DTS methodology produced better design outcomes (in terms of design effectiveness and efficiency) than a Conventional Best methodology performed by non-expert designers in an automated design mode. However, no statistically significant difference was found in user-perceived ease of use.
APA, Harvard, Vancouver, ISO, and other styles
10

Potter, Anthony. "Query answering in distributed RDF databases." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:2ed8a003-7850-4699-bdbf-38be68673813.

Full text
Abstract:
To simplify data integration and exchange, modern applications often represent their data using the Resource Description Framework (RDF). As the amount of the available data keeps increasing, many RDF datasets cannot be processed using centralised RDF stores. A common solution is to distribute RDF data in a cluster of shared-nothing servers, and to query the data using a distributed query algorithm. Existing approaches typically use a variant of the data exchange operator to shuffle partial query answers between servers and thus ensure that every query answer is produced. Decisions as to when and where to shuffle the data are usually made statically - that is, at query compile time. In this thesis, we argue that such approaches can miss opportunities for local computation and thus incur considerable overheads. Moreover, we present a novel distributed query evaluation algorithm for RDF based on dynamic data exchange, where all computation that can be done locally is guaranteed to be performed on a single server. Our approach can successfully process any query even if the memory available at each server is bounded, and we argue that this is critical in distributed systems where intermediate results can easily exceed the capacity of each server. We also present a new query planning approach that balances the cost of communication against the cost of local processing at each server, as well as a new approach to partitioning RDF data that aims to increase locality in each server. We have implemented our approach in the well-known RDFox data store, and our empirical evaluation suggests that our techniques can outperform the state of the art by orders of magnitude in terms of query evaluation times, network communication, and memory use.
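
One simple way to increase locality of the kind the abstract mentions is to hash-partition triples by subject, so star-shaped patterns around a subject join locally. The Python sketch below is an illustration under that assumption, not the thesis's partitioning algorithm:

```python
# Minimal sketch (an assumption of this illustration, not the thesis's
# scheme) of subject-hash partitioning for RDF triples: all triples that
# share a subject land on the same server, so star-shaped query patterns
# centred on a subject can be answered without shuffling partial answers.
import hashlib

NUM_SERVERS = 4

def server_for(subject: str) -> int:
    digest = hashlib.sha1(subject.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SERVERS

triples = [
    ("ex:alice", "ex:knows",   "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:bob",   "ex:knows",   "ex:carol"),
]

partitions: dict[int, list] = {i: [] for i in range(NUM_SERVERS)}
for s, p, o in triples:
    partitions[server_for(s)].append((s, p, o))

# Both ex:alice triples are now co-located, so the pattern
# { ex:alice ex:knows ?x . ex:alice ex:worksAt ?y } is a purely local join.
print(partitions)
```
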
APA, Harvard, Vancouver, ISO, and other styles
11

KUMAR, SUSMIT. "NEAREST NEIGHBOR SEARCH IN DISTRIBUTED DATABASES." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1022879916.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Lundin, Mats. "Building Distributed Control Systems Using Distributed Active Real-Time Databases." Thesis, University of Skövde, Department of Computer Science, 1998. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-234.

Full text
Abstract:

From the field of control theory, we can see that varying communication delays in a control system may be hard or even impossible to handle. From this point of view it is preferable to have these delays bounded and as small as possible in order to adapt the control process to them. On the other hand, in some cases delays are inevitable and must be handled by the control system.

A control system may for different reasons be distributed, e.g., because of a distributed environment or severe environment demands such as heat or dust at some locations. Information in such a system will suffer from delays due to transportation from one place to another. These delays often show up in a random fashion, especially if a general network is used for transportation. Another source of delays is the system environment itself. For predictability reasons a real-time database is preferable if the delays are to be controlled.

A straightforward way of handling delays in a control system is to build the system such that delays are constant, i.e., to build a time invariant system. The time from sensor reading to actuation is made constant either by adding a suitable delay to achieve a total constant delay or by using time-triggered reading and actuation. These are simple ways of controlling the delays, but may be very inefficient because worst-case execution time must always be used. Other ways of handling varying delays are by using more tolerant control algorithms. There are two suitable control models proposed by Nilsson (1998) for this purpose. The tolerant algorithm approach is assumed in this work.

This thesis uses a distributed active real-time database system as a basis for building control systems. One of the main objectives is to determine how active functionality can be used to express the control system, i.e., how rules in the database can be used to express the control algorithm and to handle propagation of information. Another objective is to look at how the choice of consistency level in the database affects the result of the control system, i.e., how different consistency levels affect the delays. Of interest is also to characterize what type of applications each level is suited for.
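
The time-invariant approach mentioned above, padding each control cycle so that the sensor-to-actuation delay is constant, can be sketched as follows (a hypothetical illustration; the constant and the callback names are assumptions, not from the thesis):

```python
# Hypothetical sketch of the time-invariant approach described above: the
# sensor-to-actuation delay is made constant by always actuating at a
# fixed offset (the worst-case delay) after the sensor reading, no matter
# how fast computation and communication actually were this cycle.
import time

WORST_CASE_DELAY = 0.050   # seconds; must bound compute + communication time

def control_cycle(read_sensor, compute, actuate):
    t_read = time.monotonic()
    value = read_sensor()
    command = compute(value)            # may take a variable amount of time
    pad = WORST_CASE_DELAY - (time.monotonic() - t_read)
    if pad > 0:
        time.sleep(pad)                 # pad the cycle up to the constant delay
    actuate(command)                    # always ~WORST_CASE_DELAY after the read
```
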

APA, Harvard, Vancouver, ISO, and other styles
13

Zhou, Wanlei. "Building reliable distributed systems." Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20051017.160921.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Tuck, Terry W. "Temporally Correct Algorithms for Transaction Concurrency Control in Distributed Databases." Thesis, University of North Texas, 2001. https://digital.library.unt.edu/ark:/67531/metadc2743/.

Full text
Abstract:
Many activities are comprised of temporally dependent events that must be executed in a specific chronological order. Supportive software applications must preserve these temporal dependencies. Whenever the processing of this type of an application includes transactions submitted to a database that is shared with other such applications, the transaction concurrency control mechanisms within the database must also preserve the temporal dependencies. A basis for preserving temporal dependencies is established by using (within the applications and databases) real-time timestamps to identify and order events and transactions. The use of optimistic approaches to transaction concurrency control can be undesirable in such situations, as they allow incorrect results for database read operations. Although the incorrectness is detected prior to transaction committal and the corresponding transaction(s) restarted, the impact on the application or entity that submitted the transaction can be too costly. Three transaction concurrency control algorithms are proposed in this dissertation. These algorithms are based on timestamp ordering, and are designed to preserve temporal dependencies existing among data-dependent transactions. The algorithms produce execution schedules that are equivalent to temporally ordered serial schedules, where the temporal order is established by the transactions' start times. The algorithms provide this equivalence while supporting concurrency to the extent of out-of-order commits and reads. With respect to the stated concern with optimistic approaches, two of the proposed algorithms are risk-free and return to read operations only committed data-item values. Risk with the third algorithm is greatly reduced by its conservative bias. All three algorithms avoid deadlock while providing risk-free or reduced-risk operation. The performance of the algorithms is determined analytically and with experimentation. Experiments are performed using functional database management system models that implement the proposed algorithms and the well-known Conservative Multiversion Timestamp Ordering algorithm.
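
For orientation, the textbook basic timestamp-ordering check that such algorithms build on can be sketched as below; this is the classic scheme, not one of the dissertation's three proposed algorithms:

```python
# Sketch of textbook basic timestamp ordering (the starting point for
# temporally ordered schedules; not the dissertation's own algorithms).
# Each data item remembers the timestamps of its latest read and write;
# operations arriving "too late" in timestamp order abort their transaction.

class Item:
    def __init__(self):
        self.read_ts = 0    # largest timestamp of any transaction that read it
        self.write_ts = 0   # largest timestamp of any transaction that wrote it

def read(item: Item, ts: int) -> bool:
    if ts < item.write_ts:              # a younger transaction already wrote
        return False                    # abort: the value this read needs is gone
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item: Item, ts: int) -> bool:
    if ts < item.read_ts or ts < item.write_ts:
        return False                    # abort: would invalidate a later read/write
    item.write_ts = ts
    return True
```
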
APA, Harvard, Vancouver, ISO, and other styles
15

Gong, Guohui. "On concurrency control in logbased databases." Thesis, Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/8175.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Bhasker, Bharat. "Query processing in heterogeneous distributed database management systems." Diss., Virginia Tech, 1992. http://hdl.handle.net/10919/39437.

Full text
Abstract:
The goal of this work is to present an advanced query processing algorithm formulated and developed in support of heterogeneous distributed database management systems. Heterogeneous distributed database management systems view the integrated data through a uniform global schema. The query processing algorithm described here produces an inexpensive strategy for a query expressed over the global schema. The research addresses the following aspects of query processing: (1) Formulation of a low-level query language to express the fundamental heterogeneous database operations; (2) Translation of the query expressed over the global schema to an equivalent query expressed over a conceptual schema; (3) An estimation methodology to derive the intermediate result sizes of the database operations; (4) A query decomposition algorithm to generate an efficient sequence of the basic database operations to answer the query. This research addressed the first issue by developing an algebraic query language called cluster algebra. The cluster algebra consists of the following operations: (a) Selection, union, intersection and difference, which are extensions of their relational algebraic counterparts to heterogeneous databases; (b) Normal-join and normal-projection, which replace their counterparts, join and projection, in the relational algebra; (c) Two new operators, embed and unembed, to restructure the database schema. The second issue of query translation was addressed by the development of an algorithm that translates a cluster algebra query expressed over the virtual views to an equivalent cluster algebra query expressed over the conceptual databases. A non-parametric estimation methodology to estimate the result size of a cluster algebra operation was developed to address the third issue described above. Finally, this research developed a query decomposition algorithm, applicable to relational and non-relational databases, that decomposes a query by computing all profitable semi-join operations, followed by the determination of the best sequence of join operations per processing site. The join optimization is performed by formulating a zero-one integer linear program that uses the non-parametric estimation technique to compute the sizes of intermediate results. The query processing algorithm was implemented in the context of DAVID, a heterogeneous distributed database management system.
Ph. D.
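
The semi-join reduction step mentioned in the decomposition algorithm can be illustrated with a short sketch (generic, not the DAVID implementation): only the distinct join keys cross the network before the reduced relation is shipped back.

```python
# Sketch of semi-join reduction: before shipping a whole relation between
# sites, ship only the distinct join-column values and use them to filter
# the remote relation, cutting the transfer volume.

def semi_join(local_rows, remote_rows, local_key, remote_key):
    # Step 1: ship just the join keys of the local relation (small).
    keys = {row[local_key] for row in local_rows}
    # Step 2: the remote site returns only its matching rows (reduced).
    return [row for row in remote_rows if row[remote_key] in keys]

orders = [{"cust": 1, "total": 99}, {"cust": 2, "total": 10}]
customers = [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Bo"}]

# Only customer 1 is shipped back; customer 3 never crosses the network.
print(semi_join(orders, customers, "cust", "id"))
```
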
APA, Harvard, Vancouver, ISO, and other styles
17

Ashraf, Imran, and Amir Shahzed Khokhar. "Principles for Distributed Databases in Telecom Environment." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4753.

Full text
Abstract:
Centralized databases are becoming a bottleneck for organizations that are physically distributed and access data remotely. Data management is easy in centralized databases, but they carry high communication costs and, most importantly, high response times. The concept of distributing the data over various locations is very attractive for such organizations. In such cases the database is split into fragments that are distributed to the locations where they are needed. This kind of distribution provides local control of data, and data access is also very fast in such databases. However, concurrency control, query optimization and data allocation are factors that affect the response time and must be investigated prior to implementing distributed databases. This thesis uses a mixed-method approach to meet its objective. In the quantitative section, we performed an experiment at Ericsson to compare the response time of two databases: centralized and fragmented/distributed. A literature review was also done to find out about other important response-time-related issues such as query optimization, concurrency control and data allocation. The literature review revealed that these factors can further improve the response time in a distributed environment. Results of the experiment showed a substantial decrease in response time due to the fragmentation and distribution.
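
The fragmentation that produced the measured improvement can be illustrated with a minimal sketch (hypothetical data, not the Ericsson setup): rows are split into horizontal fragments by a predicate, and each fragment is allocated where it is used.

```python
# Illustrative sketch of horizontal fragmentation: rows are split into
# fragments by a site predicate (here, region), so queries for a region
# touch only the local fragment instead of the whole centralized table.

subscribers = [
    {"id": 1, "region": "north", "plan": "basic"},
    {"id": 2, "region": "south", "plan": "premium"},
    {"id": 3, "region": "north", "plan": "premium"},
]

fragments = {}
for row in subscribers:
    fragments.setdefault(row["region"], []).append(row)

# The "north" site now answers its queries from its own fragment.
print(fragments["north"])
```
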
APA, Harvard, Vancouver, ISO, and other styles
18

Rogers, Brandon Lamar. "A Statistical Performance Model of Homogeneous Raidb Clusters." Diss., 2005. http://contentdm.lib.byu.edu/ETD/image/etd709.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Milton, Robert. "Time-series in distributed real-time databases." Thesis, University of Skövde, Department of Computer Science, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-827.

Full text
Abstract:

In a distributed real-time environment where it is imperative to make correct decisions, it is important to have all facts available to make the most accurate decision in a certain situation. An example of such an environment is an Unmanned Aerial Vehicle (UAV) system where several UAVs cooperate to carry out a certain task and the data recorded is analyzed after the completion of the mission. This project aims to define and implement a time series architecture for use together with a distributed real-time database, providing the ability to store temporal data. The result of this project is a time series (TS) architecture that uses DeeDS, a distributed real-time database, for storage. The TS architecture is used by an application modelled on a UAV scenario for storing temporal data; the temporal data is produced by a simulator. The TS architecture solves the problem of storing temporal data for applications using DeeDS. It is also useful as a foundation for integrating time series in DeeDS, since it is designed for space efficiency and real-time requirements.
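
A minimal sketch of a per-stream time-series store with a range query, in the spirit of the TS architecture described (a generic illustration, not the DeeDS-backed implementation):

```python
# Generic sketch of a time-series store: per-stream append of
# (timestamp, value) samples plus a time-range query.
import bisect

class TimeSeries:
    def __init__(self):
        self._ts: list[float] = []     # timestamps, kept sorted
        self._vals: list = []

    def append(self, ts: float, value) -> None:
        i = bisect.bisect_right(self._ts, ts)   # tolerate out-of-order arrival
        self._ts.insert(i, ts)
        self._vals.insert(i, value)

    def range(self, start: float, end: float) -> list:
        lo = bisect.bisect_left(self._ts, start)
        hi = bisect.bisect_right(self._ts, end)
        return list(zip(self._ts[lo:hi], self._vals[lo:hi]))

uav_altitude = TimeSeries()
uav_altitude.append(0.0, 120.0)
uav_altitude.append(1.0, 125.5)
print(uav_altitude.range(0.0, 0.5))   # [(0.0, 120.0)]
```
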

APA, Harvard, Vancouver, ISO, and other styles
20

Gottemukkala, Vibby. "Scalability issues in distributed and parallel databases." Diss., Georgia Institute of Technology, 1996. http://hdl.handle.net/1853/8176.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

KHEDR, AHMED MOHAMED. "DESIGN OF DECOMPOSABLE ALGORITHMS FOR DISTRIBUTED DATABASES." University of Cincinnati / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1044894428.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

SHINDE, KAUSTUBH ARUN. "FUNCTION COMPUTING IN VERTICALLY PARTITIONED DISTRIBUTED DATABASES." University of Cincinnati / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1163574762.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Friedman, Marc T. "Representation and optimization for data integration /." Thesis, Connect to this title online; UW restricted, 1999. http://hdl.handle.net/1773/6979.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Obermeyer, Lincoln Lance. "Abstractions and algorithms for active multidatabases /." Digital version, 1999. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Mukhopadhyay, Meenakshi. "Performance analysis of a distributed file system." PDXScholar, 1990. https://pdxscholar.library.pdx.edu/open_access_etds/4198.

Full text
Abstract:
An important design goal of a distributed file system, a component of many distributed systems, is to provide UNIX file access semantics, e.g., the result of any write system call is visible to all processes as soon as the call completes. In a distributed environment, these semantics are difficult to implement because processes on different machines do not share kernel cache and data structures. Strong data consistency guarantees may be provided only at the expense of performance. This work investigates the time costs paid by AFS 3.0, which uses a callback mechanism to provide consistency guarantees, and those paid by AFS 4.0, which uses typed tokens for synchronization. AFS 3.0 provides moderately strong consistency guarantees, but they are not UNIX-like because data are written back to the server only after a file is closed. AFS 4.0 writes data back to the server whenever other clients want to access it, the effect being UNIX-like file access semantics. Also, AFS 3.0 does not guarantee synchronization of multiple writers, whereas AFS 4.0 does.
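
The callback mechanism described for AFS 3.0 can be sketched as follows (a simplified illustration, not the AFS code): the server tracks which clients cache a file and "breaks" their callbacks when a new version is stored.

```python
# Simplified sketch of a callback-style invalidation scheme: the server
# remembers which clients cache a file and notifies them (breaks their
# callbacks) when a new version is stored, so stale copies are discarded.

class Server:
    def __init__(self):
        self.callbacks: dict[str, set[str]] = {}   # file -> clients with a callback

    def fetch(self, path: str, client: str) -> None:
        self.callbacks.setdefault(path, set()).add(client)

    def store(self, path: str, writer: str) -> list[str]:
        broken = self.callbacks.get(path, set()) - {writer}
        self.callbacks[path] = {writer}
        return sorted(broken)       # clients that must drop their cached copy

srv = Server()
srv.fetch("/vol/paper.tex", "clientA")
srv.fetch("/vol/paper.tex", "clientB")
print(srv.store("/vol/paper.tex", "clientA"))   # ['clientB']
```
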
APA, Harvard, Vancouver, ISO, and other styles
26

Weng, Bin. "Dynamic integration of evolving distributed databases using services." Thesis, Durham University, 2010. http://etheses.dur.ac.uk/322/.

Full text
Abstract:
This thesis investigates the integration of many separate existing heterogeneous and distributed databases which, due to organizational changes, must be merged and appear as one database. A solution to some database evolution problems is presented. It presents an Evolution Adaptive Service-Oriented Data Integration Architecture (EA-SODIA) to dynamically integrate heterogeneous and distributed source databases, aiming to minimize the cost of the maintenance caused by database evolution. An algorithm, named Relational Schema Mapping by Views (RSMV), is designed to integrate source databases that are exposed as services into a pre-designed global schema that resides in a data integrator service. Instead of producing hard-coded programs, views are built using relational algebra operations to eliminate the heterogeneities among the source databases. More importantly, the definitions of those views are represented and stored in the meta-database with constraints to test their validity. Consequently, the method, called Evolution Detection, is able to identify in the meta-database the views affected by evolutions and then modify them automatically. An evaluation is presented using a case study. Firstly, it is shown that most types of heterogeneity defined in this thesis can be eliminated by RSMV, except semantic conflict. Secondly, it shows that little manual modification of the system is required as long as the evolutions follow the rules; for only three types of database evolution is human intervention required, and some existing views are discarded. Thirdly, the computational cost of the automatic modification shows a slow linear growth in the number of source databases. Other characteristics addressed include EA-SODIA's scalability, domain independence, autonomy of source databases, and the potential to involve other data sources (e.g. XML). Finally, a descriptive comparison with other data integration approaches is presented. It shows that although other approaches may provide better query processing performance in some circumstances, the service-oriented architecture provides better autonomy, flexibility and capability of evolution.
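
The evolution-detection idea, storing view definitions as data so that affected views can be found and regenerated automatically, can be sketched like this (a simplification of RSMV with invented table names, not its implementation):

```python
# Sketch of the evolution-detection idea: view definitions are stored as
# data in a meta-database, so when a source table evolves, the affected
# views can be found and flagged for automatic modification instead of
# editing hard-coded integration programs. Names here are hypothetical.

meta_db = {
    "global_customer": {"sources": {"db1.clients", "db2.customers"},
                        "definition": "union of projections over both sources"},
    "global_order":    {"sources": {"db2.orders"},
                        "definition": "rename + projection over db2.orders"},
}

def affected_views(evolved_source: str) -> list[str]:
    return [v for v, d in meta_db.items() if evolved_source in d["sources"]]

# db2.customers changes: only views that reference it need rebuilding.
print(affected_views("db2.customers"))   # ['global_customer']
```
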
APA, Harvard, Vancouver, ISO, and other styles
27

Taylor, M. "Data integration and query decomposition in distributed databases." Thesis, University of Aberdeen, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.377623.

Full text
Abstract:
Preci* is a generalised distributed database management system, capable of supporting heterogeneous, pre-existing databases as nodes. The system is fully decentralised, supporting both retrieval and update of the data. Varying degrees of location transparency can be provided, according to user requirements. The work presented here is concerned with data integration and query decomposition. An extended relational algebra (PAL) is developed, which serves both as a query language and as a mapping language for data integration. The suitability of PAL for data integration is demonstrated by a number of examples, and by comparison with existing proposals. A major attraction of PAL is that it can also be used as a query language, thereby making query decomposition much easier. The relational algebraic approach is shown to be particularly appropriate for query decomposition, since queries can be easily parsed and represented in tree form. Such parse trees are readily transformed to yield equivalent expressions which will execute more efficiently. An algorithm is given for decomposing global PAL queries into nodal subqueries, and for coordinating their execution. The general problem of allocating subqueries to execution nodes is not tackled, though it is shown that the algorithm will do this allocation under specific implementation conditions. A prototype of Preci* has been implemented in 'C'.
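
The parse-tree transformations described can be illustrated by the classic rewrite of pushing a selection below a join (a generic sketch; the tuple encoding of query trees is an assumption of this illustration, not PAL's):

```python
# Sketch of an equivalence-preserving tree transformation: a selection is
# pushed below a join when its predicate references only one side, so the
# filtering happens before data is combined. Trees are nested tuples;
# leaves are relation names.

def mentions(tree, rel: str) -> bool:
    return tree == rel if isinstance(tree, str) else any(mentions(c, rel) for c in tree[1:])

def push_selection(tree):
    if isinstance(tree, tuple) and tree[0] == "select":
        pred_rel, pred, child = tree[1], tree[2], tree[3]
        if isinstance(child, tuple) and child[0] == "join":
            _, left, right = child
            if mentions(left, pred_rel) and not mentions(right, pred_rel):
                return ("join", push_selection(("select", pred_rel, pred, left)), right)
            if mentions(right, pred_rel) and not mentions(left, pred_rel):
                return ("join", left, push_selection(("select", pred_rel, pred, right)))
    return tree

q = ("select", "EMP", "salary > 50k", ("join", "EMP", "DEPT"))
print(push_selection(q))
# ('join', ('select', 'EMP', 'salary > 50k', 'EMP'), 'DEPT')
```
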
APA, Harvard, Vancouver, ISO, and other styles
28

Jones, Evan P. C. (Evan Philip Charles) 1981. "Fault-tolerant distributed transactions for partitioned OLTP databases." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/71477.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 103-112).
This thesis presents Dtxn, a fault-tolerant distributed transaction system designed specifically for building online transaction processing (OLTP) databases. Databases have traditionally been designed as general purpose data processing tools. By being designed only for OLTP workloads, Dtxn can be more efficient. It is designed to support very large databases by partitioning data across a cluster of commodity servers in a data center. Combining multiple servers together allows systems built with Dtxn to be cost effective, highly available, scalable, and fault-tolerant. Dtxn provides three novel features. First, it provides reusable infrastructure for building a distributed OLTP database out of single machine databases. This allows developers to take a specialized backend storage engine and use it across multiple machines, without needing to re-implement the distributed transaction infrastructure. We used Dtxn to build four different applications: a simple key/value store, a specialized TPC-C implementation, a main-memory OLTP database, and a traditional disk-based OLTP database. Second, Dtxn provides a novel concurrency control mechanism called speculative concurrency control, designed for main memory OLTP workloads that are primarily composed of transactions with a single round of communication between the application and database. Speculative concurrency control executes one transaction at a time, with no concurrency control overhead. In cases where there may be stalls due to network communication, it speculates future transactions. Our results show that this provides significantly better throughput than traditional two-phase locking, outperforming it by a factor of two on the TPC-C benchmark. Finally, Dtxn supports live migration, allowing part of the data on one server to be moved to another server while processing transactions. Our experiments show that our approach has nearly no visible impact on throughput or latency when moving data under moderate to high loads. It has significantly less impact than the best commercially available systems when the database is overloaded. The period of time where the throughput is reduced is less than half as long as failing over to another replica or using virtual machine migration.
by Evan Philip Charles Jones.
Ph.D.
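
Speculative concurrency control as described above can be sketched in miniature (an illustration, not Dtxn's code): while one distributed transaction waits for its commit decision, later transactions execute on the tentative state and are undone and re-run if the decision is abort.

```python
# Miniature sketch of speculative concurrency control. Transactions are
# functions over a dict state and run one at a time with no locks; while
# a distributed transaction awaits its two-phase-commit decision, later
# transactions run speculatively instead of stalling the CPU.
import copy

class SpeculativeExecutor:
    def __init__(self):
        self.state = {}         # database state (tentative while speculating)
        self.snapshot = None    # undo point taken before the pending txn
        self.pending = None     # txn awaiting its commit/abort decision
        self.speculated = []    # txns executed on top of the tentative state

    def run(self, txn):
        if self.pending is None:
            self.snapshot = copy.deepcopy(self.state)
            txn(self.state)
            self.pending = txn      # now blocked on the network, not the CPU
        else:
            txn(self.state)         # speculate instead of idling
            self.speculated.append(txn)

    def decision(self, commit: bool):
        redo, self.speculated, self.pending = self.speculated, [], None
        if not commit:
            self.state = self.snapshot   # roll back pending and speculation
            for txn in redo:             # re-run speculated work from scratch
                self.run(txn)            # (the first re-run becomes pending)
```
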
APA, Harvard, Vancouver, ISO, and other styles
29

Pappas, Nicholas Peter. "Searching Biological Sequence Databases Using Distributed Adaptive Computing." Thesis, Virginia Tech, 2003. http://hdl.handle.net/10919/31074.

Full text
Abstract:
Genetic research projects currently can require enormous computing power to process the vast quantities of data available. Further, DNA sequencing projects are generating data at an exponential rate that exceeds the pace of microprocessor development; thus, new, faster methods and techniques of processing this data are needed. One common type of processing involves searching a sequence database for the most similar sequences. Here we present a distributed database search system that utilizes adaptive computing technologies. The search is performed using the Smith-Waterman algorithm, a common sequence comparison algorithm. To reduce the total search time, an initial search is performed using a version of the algorithm, implemented in adaptive computing hardware, which is designed to efficiently perform the initial search. A final search is performed using a complete version of the algorithm. This two-stage search, employing adaptive and distributed hardware, achieves a performance increase of several orders of magnitude over similar processor-based systems.
Master of Science
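
For reference, the Smith-Waterman local-alignment scoring that both search stages implement is the following dynamic program (plain software version; the thesis's contribution is running a reduced form of it in adaptive hardware, and the scoring parameters here are illustrative):

```python
# Plain dynamic-programming version of the Smith-Waterman local-alignment
# score: each cell extends a match/mismatch diagonally or a gap from the
# left/top, floored at zero so alignments can restart anywhere.

def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            h[i][j] = max(0, diag, h[i-1][j] + gap, h[i][j-1] + gap)
            best = max(best, h[i][j])
    return best   # score of the best local alignment between a and b

print(smith_waterman("ACACACTA", "AGCACACA"))
```
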
APA, Harvard, Vancouver, ISO, and other styles
30

Kim, Kihwan. "Managing motion triggered executables in distributed mobile databases." [Ames, Iowa : Iowa State University], 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3389114.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Bergman, Sara. "Permissioned Blockchains and Distributed Databases : A Performance Study." Thesis, Linköpings universitet, Programvara och system, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152230.

Full text
Abstract:
Blockchain technology is a booming new field in both computer science and economics, and use cases other than cryptocurrencies are on the rise. Permissioned blockchains are one instance of the blockchain technique. In a permissioned blockchain, the nodes which validate new transactions are trusted. Permissioned blockchains and distributed databases are essentially two different ways of storing data, but how do they compare in terms of performance? This thesis compares Hyperledger Fabric to Apache Cassandra in four experiments to investigate their insert and read latency. The experiments are executed using Docker on an Azure virtual machine and the studied systems consist of up to 20 logical nodes. Latency measurements are performed using varying network size and load. For small networks, the insert latency of Cassandra is twice as high as that of Fabric, whereas for larger networks Fabric has almost twice as high insert latency as Cassandra. Fabric has around 40 ms latency for reading data and Cassandra between 150 ms and 250 ms, thus Fabric scales better for reading. The insert latency of different workloads is heavily affected by the configuration of Fabric and by the Docker overhead for Cassandra. The read latency is not affected by different workloads for either system.
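
The kind of latency measurement the experiments perform can be sketched generically (this harness and its numbers are illustrative assumptions, not the thesis's benchmark code):

```python
# Generic sketch of a latency benchmark: issue N operations from T client
# threads and report the average per-operation latency.
import time
from concurrent.futures import ThreadPoolExecutor

def measure(op, n_ops: int, n_threads: int) -> float:
    def timed_op(_):
        t0 = time.perf_counter()
        op()
        return time.perf_counter() - t0
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        latencies = list(pool.map(timed_op, range(n_ops)))
    return sum(latencies) / len(latencies)

# 'insert_row' would wrap a real client call (Fabric SDK, Cassandra driver, ...)
insert_row = lambda: time.sleep(0.001)          # stand-in for a database call
print(f"avg latency: {measure(insert_row, 1000, 8) * 1000:.2f} ms")
```
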
APA, Harvard, Vancouver, ISO, and other styles
32

Muessig, Mikael. "Bounded Delay Replication in Distributed Databases with Eventual Consistency." Thesis, University of Skövde, Department of Computer Science, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-830.

Full text
Abstract:

Distributed real-time database systems demand consistency and timeliness. One approach to this problem is eventual consistency, which guarantees local consistency within predictable time. Global consistency can be reached by best-effort mechanisms, but for some scenarios, e.g., an alarm signal, this may not be sufficient. Bounded delay replication, which provides global consistency in bounded time, ensures that after the local commit of a transaction its updates are propagated to and integrated at any remote node within bounded time. The DRTS group at the University of Skövde is working on a project called DeeDS, a distributed real-time database prototype. In this prototype, eventual consistency with as soon as possible (ASAP) replication is implemented. The goal of this dissertation is to further develop replication in this prototype in coexistence with the existing eventual consistency, which implies extending both the theory and the implementation.

The main issue with bounded time replication is to make all parts involved in the replication process predictable while simultaneously supporting eventual consistency with as soon as possible replication.

APA, Harvard, Vancouver, ISO, and other styles
33

Schneider, Jan, Héctor Cárdenas, and José Alfonso Talamantes. "Using Web Services for Transparent Access to Distributed Databases." Thesis, Jönköping University, JTH, Computer and Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-940.

Full text
Abstract:

This thesis presents a strategy to integrate distributed systems with the aid of web services. The focus of this research involves three subjects: web services, distributed database systems, and their application to a real-life project.

To define the context of this thesis, we present the research methodology that provides the path along which the investigation will be performed, together with the general concepts of the running environment and architecture of web services.

The major contribution of this thesis is a solution for Chamber Trade in Sweden and VNemart in Vietnam, comprising the requirement specification according to the SPIDER project needs and our software design specification using distributed databases and web services.

As results, we present the software implementation and the way our software meets the requirements previously defined. For future web services developments, this document provides guidance on best practices in this subject.

APA, Harvard, Vancouver, ISO, and other styles
34

Nanongkai, Danupon. "Graph and geometric algorithms on distributed networks and databases." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41056.

Full text
Abstract:
In this thesis, we study the power and limits of algorithms on various models, aiming at applications in distributed networks and databases. In distributed networks, graph algorithms are fundamental to many applications. We focus on computing random walks, an important primitive employed in a wide range of applications that has always been computed naively. We show that a faster solution exists and subsequently develop faster algorithms by exploiting random walk properties, leading to two immediate applications. We also show that this algorithm is optimal. Our technique for proving a lower bound shows the first non-trivial connection between communication complexity and lower bounds of distributed graph algorithms. We show that this technique has a wide range of applications by proving new lower bounds for many problems; some of these lower bounds show that the existing algorithms are tight. In database searching, we think of the database as a large set of multi-dimensional points stored on disk and want to help users quickly find the most desired point. In this thesis, we develop an algorithm that is significantly faster than previous algorithms both theoretically and experimentally. The insight is to solve the problem in the streaming model, which helps emphasize the benefits of sequential access over random disk access. We also introduced the randomization technique to the area. The results were complemented with a lower bound. We also initiate a new direction as an attempt to get a better query. We are the first to quantify the output quality using "user satisfaction", which is made possible by borrowing the idea of modeling users by utility functions from game theory, and we justify our approach through a geometric analysis.
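
The random-walk primitive in question is easy to state on a single machine; the naive version below is the baseline the thesis improves on in the distributed setting (a generic sketch, not the thesis's algorithm):

```python
# Naive single-machine random walk on an adjacency-list graph; the thesis
# is about computing such walks faster when the graph is spread across
# many servers.
import random

def random_walk(graph: dict, start, length: int) -> list:
    node, walk = start, [start]
    for _ in range(length):
        node = random.choice(graph[node])   # step to a uniformly random neighbour
        walk.append(node)
    return walk

g = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(random_walk(g, "a", 5))
```
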
APA, Harvard, Vancouver, ISO, and other styles
35

Mathiason, Gunnar. "Virtual Full Replication for Scalable Distributed Real-Time Databases." Doctoral thesis, Linköpings universitet, Institutionen för datavetenskap, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-20661.

Full text
Abstract:
A fully replicated distributed real-time database provides high availability and predictable access times, independent of user location, since all the data is available at each node. However, full replication requires that all updates are replicated to every node, resulting in exponential growth of bandwidth and processing demands with the number of nodes and objects added. To eliminate this scalability problem, while retaining the advantages of full replication, this thesis explores Virtual Full Replication (ViFuR); a technique that gives database users a perception of using a fully replicated database while only replicating a subset of the data. We use ViFuR in a distributed main memory real-time database where timely transaction execution is required. ViFuR enables scalability by replicating only data used at the local nodes. Also, ViFuR enables flexibility by adaptively replicating the currently used data, effectively providing logical availability of all data objects. Hence, ViFuR substantially reduces the problem of non-scalable resource usage of full replication, while allowing timely execution and access to arbitrary data objects. In the thesis we pursue ViFuR by exploring the use of database segmentation. We give a scheme (ViFuR-S) for static segmentation of the database prior to execution, where access patterns are known a priori. We also give an adaptive scheme (ViFuR-A) that changes segmentation during execution to meet the evolving needs of database users. Further, we apply an extended approach of adaptive segmentation (ViFuR-ASN) in a wireless sensor network - a typical dynamic large-scale and resource-constrained environment. We use up to several hundreds of nodes and thousands of objects per node, and apply a typical periodic transaction workload with operation modes where the used data set changes dynamically. We show that when replacing full replication with ViFuR, resource usage scales linearly with the required number of concurrent replicas, rather than exponentially with the system size.
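
The adaptive flavour of ViFuR can be caricatured in a few lines (an illustration of replicate-on-first-access, not the thesis's ViFuR-A protocol):

```python
# Illustration of the adaptive idea: an object is replicated to a node on
# first local access, so only data actually used at a node consumes
# replication resources, while every object stays logically available.

class Node:
    def __init__(self, name: str, store: dict):
        self.name = name
        self.replicas = {}      # locally replicated objects
        self.store = store      # stands in for the rest of the system

    def read(self, key):
        if key not in self.replicas:
            # set up a local replica on demand instead of replicating everything
            self.replicas[key] = self.store[key]
            print(f"{self.name}: new replica of {key!r}")
        return self.replicas[key]

global_objects = {"sensor1": 42, "sensor2": 17}
n = Node("node-3", global_objects)
n.read("sensor1")    # first access creates the replica
n.read("sensor1")    # subsequent accesses are local
```
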
APA, Harvard, Vancouver, ISO, and other styles
36

Cooper, C. "Space subdivision and distributed databases in a multiprocessor raytracer." University of Canberra. Information Sciences & Engineering, 1991. http://erl.canberra.edu.au./public/adt-AUC20060629.145540.

Full text
Abstract:
This thesis deals with computer generated images. The thesis begins with an overview of a generalised computer graphics system, including a brief survey of typical methods for generating photorealistic images. One such technique, ray tracing, is used as the basis for the work which follows. The overview section concludes with a statement of the aim which is to: Investigate the effective use of available processing power and effective utilisation of available memory by implementing a ray tracing programme which uses space subdivision, multiple processors and a distributed world model database. The problem formulation section describes the ray tracing principle and then introduces the main areas of study. The INMOS Transputer (a building block for concurrent systems) is used to implement the multiple process ray tracer. Space subdivision is achieved by repeated and regular subdivision of a world cube (which contains the scene to be ray traced) into named cubes, called octrees. The subdivision algorithm continues to subdivide space until no octree contains more than a specified number of objects, or until the practical limit of space subdivision is reached. The objects in the world model database are distributed in a round robin manner to the ray trace processes. During execution of the ray trace programme, information about each object is passed between processes by a message mechanism. The concurrent code for the transputer processes, written in OCCAM 2, was developed using timing diagrams and signal flow diagrams derived by analogy from digital electronics. Structure diagrams, modified to be consistent with OCCAM 2 processes, were derived from the timing diagrams and signal flow diagrams. These were used as a basis for the coding. The results show that space subdivision is an effective use of processor power because the number of trial intersections of rays with objects is dramatically reduced. In addition, distribution of the world model database avoids duplication of the database in the memory of each process and hence better utilisation of available memory is achieved. The programmes are supported by a menu driven interface (running on a PC AT) which enables the user to control the ray trace processes running on the transputer board housed in the PC.
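
The space-subdivision rule described, splitting a cube into eight octants until no cube holds more than a fixed number of objects or the depth limit is reached, can be sketched as follows (an illustration; the constants are assumptions):

```python
# Sketch of regular octree subdivision: a cube is recursively split into
# eight octants until no cube holds more than MAX_OBJECTS objects or the
# practical depth limit of subdivision is reached.

MAX_OBJECTS = 2
MAX_DEPTH = 8

def subdivide(cube, objects, contains, depth=0):
    """cube: (x, y, z, size); contains(cube, obj) -> bool."""
    inside = [o for o in objects if contains(cube, o)]
    if len(inside) <= MAX_OBJECTS or depth == MAX_DEPTH:
        return {"cube": cube, "objects": inside}          # leaf octree node
    x, y, z, s = cube
    h = s / 2
    children = [subdivide((x + dx * h, y + dy * h, z + dz * h, h),
                          inside, contains, depth + 1)
                for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
    return {"cube": cube, "children": children}

def contains(cube, point):
    x, y, z, s = cube
    px, py, pz = point
    return x <= px < x + s and y <= py < y + s and z <= pz < z + s

tree = subdivide((0, 0, 0, 16), [(1, 1, 1), (2, 2, 2), (9, 9, 9)], contains)
```
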
APA, Harvard, Vancouver, ISO, and other styles
37

Oza, Smita. "Implementing real-time transactions using distributed main memory databases." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0031/MQ27056.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Srinivasan, Arati. "Role of distributed databases in an apparel supply chain." Thesis, Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/9163.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Kuruganti, NSR Sankaran. "Distributed databases for Multi Mediation : Scalability, Availability & Performance." Thesis, Blekinge Tekniska Högskola, Institutionen för kommunikationssystem, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-1018.

Full text
Abstract:
Context: Multi Mediation is a process of collecting data from network(s) & network elements, pre-processing this data and distributing it to various systems like Big Data analysis, Billing Systems, Network Monitoring Systems, and Service Assurance etc. With the growing demand for networks and the emergence of new services, data collected from networks is growing. There is a need for efficiently organizing this data, and this can be done using databases. Although RDBMSs offer scale-up solutions to handle voluminous data and concurrent requests, this approach is expensive. So, alternatives like distributed databases are an attractive solution. A suitable distributed database for Multi Mediation needs to be investigated. Objectives: In this research we analyze two distributed databases in terms of performance, scalability and availability. The inter-relations between performance, scalability and availability of distributed databases are also analyzed. The distributed databases that are analyzed are MySQL Cluster 7.4.4 and Apache Cassandra 2.0.13. Performance, scalability and availability are quantified, and measurements are made in the context of a Multi Mediation system. Methods: The methods to carry out this research are both qualitative and quantitative. A qualitative study is made for the selection of databases for evaluation. A benchmarking harness application is designed to quantitatively evaluate the performance of distributed databases in the context of Multi Mediation. Several experiments are designed and performed using the benchmarking harness on the database cluster. Results: Results collected include average response time & average throughput of the distributed databases in various scenarios. The average throughput & average INSERT response time results favor Apache Cassandra's low-availability configuration. MySQL Cluster's average SELECT response time is better than Apache Cassandra's for larger numbers of client threads, in both high-availability and low-availability configurations. Conclusions: Although Apache Cassandra outperforms MySQL Cluster, the support for transactions and ACID compliance are not to be forgotten in the selection of a database. Apart from the contextual benchmarks, organizational choices, development costs, resource utilization etc. are more influential parameters for the selection of a database within an organization. There is still a need for further evaluation of distributed databases.

APA, Harvard, Vancouver, ISO, and other styles
40

Patvarczki, Jozsef. "Layout Optimization for Distributed Relational Databases Using Machine Learning." Digital WPI, 2012. https://digitalcommons.wpi.edu/etd-dissertations/291.

Full text
Abstract:
A common problem when running Web-based applications is how to scale up the database. The solution to this problem usually involves having a smart Database Administrator determine how to spread the database tables out amongst computers that will work in parallel. Laying out database tables across multiple machines so they can act together as a single efficient database is hard. Automated methods are needed to help eliminate the time required for database administrators to create optimal configurations. There are four operators that we consider that can create a search space of possible database layouts: 1) denormalizing, 2) horizontally partitioning, 3) vertically partitioning, and 4) fully replicating. Textbooks offer general advice that is useful for dealing with extreme cases - for instance, you should fully replicate a table if the ratio of inserts to selects is close to zero. But even this seemingly obvious statement is not necessarily one that will lead to a speed up once you take into account that some nodes might be a bottleneck. There can be complex interactions between the 4 different operators which make it even more difficult to predict what the best thing to do is. Instead of using best practices to do database layout, we need a system that collects empirical data on when these 4 different operators are effective. We have implemented a state-based search technique to try different operators, and then we used the empirically measured data to see if any speed up occurred. We recognized that the costs of creating the physical database layout are potentially large, but it is necessary since we want to know the "Ground Truth" about what is effective and under what conditions. After creating a dataset where these four different operators have been applied to make different databases, we can employ machine learning to induce rules to help govern the physical design of the database across an arbitrary number of computer nodes. This learning process, in turn, would allow the database placement algorithm to get better over time as it trains over a set of examples. What this algorithm calls for is that it will try to learn 1) "What is a good database layout for a particular application given a query workload?" and 2) "Can this algorithm automatically improve itself in making recommendations by using machine-learned rules to try to generalize when it makes sense to apply each of these operators?" There has been considerable research done in parallelizing databases where large amounts of data are shipped from one node to another to answer a single query. Sometimes the costs of shipping the data back and forth might be high, so in this work we assume that it might be more efficient to create a database layout where each query can be answered by a single node. To make this assumption requires that all the incoming query templates are known beforehand. This requirement can easily be satisfied in the case of a Web-based application due to the characteristic that users typically interact with the system through a web interface such as web forms. In this case, unseen queries are not necessarily answerable without first possibly reconstructing the data on a single machine. Prior knowledge of these exact query templates allows us to select the best possible database table placements across multiple nodes.
But in the case of trying to improve the efficiency of a Web-based application, a web site provider might feel that they are willing to suffer the inconvenience of not being able to answer an arbitrary query, if they are in turn provided with a system that runs more efficiently.
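
The empirical, state-based search described can be sketched as follows (an illustration with hypothetical callbacks, not the dissertation's system): each candidate layout is actually benchmarked, and the measurements double as training examples for the learner.

```python
# Sketch of an empirical state-based layout search: candidate layouts are
# produced by the four operators, each candidate is benchmarked to obtain
# "Ground Truth", the best measured layout wins, and the measurements
# become training data for learning placement rules.

OPERATORS = ["denormalize", "partition_horizontally",
             "partition_vertically", "replicate_fully"]

def search_layout(initial_layout, apply_op, benchmark, rounds=3):
    best, best_time = initial_layout, benchmark(initial_layout)
    training_data = []
    for _ in range(rounds):
        for op in OPERATORS:
            candidate = apply_op(best, op)
            t = benchmark(candidate)             # empirical measurement
            training_data.append((best, op, t))  # examples for the learner
            if t < best_time:
                best, best_time = candidate, t
    return best, best_time, training_data
```
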
APA, Harvard, Vancouver, ISO, and other styles
41

Oza, Smita. "Implementing real-time transactions using distributed main memory databases." Carleton University Dissertation, Computer Science. Ottawa, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
42

Durrett, John Randall. "Distributed information systems design through software teams /." Digital version, 1999. http://wwwlib.umi.com/cr/utexas/fullcit?p9959479.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Mena, Eduardo, and Arantza Illarramendi. "Ontology-based query processing for global information systems /." Boston [u.a.] : Kluwer Acad. Publ, 2001. http://www.loc.gov/catdir/enhancements/fy0813/2001029621-d.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Mends, Diana. "Access Control and Storage of Distributed IoT Data." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37356.

Full text
Abstract:
There has been a growth of a class of databases known as Not only SQL (NoSQL) databases in recent years. Their quick growth has been fueled by high demand from businesses, as they offer a convenient way to store data and are significantly different from traditional relational databases. They make it easy to process unstructured data, offer a cloud-friendly approach, and grow through the distribution of data over lots of commodity computers. Most of these NoSQL databases are distributed in several different locations, spanning countries, and are known as geo-distributed cloud datastores. We work to customize one of these, known as Cassandra. Given the size of the database and the size of the applications accessing the data stored, it has been challenging to customize it to meet existing application Service Level Agreements (SLAs). We live in an era of data breaches, and even though some types of information are stripped of all sensitive data, there are ways to easily identify and link it to the data of real persons or governments. Data saved in different countries are subject to the rules and regulations of that specific country and to the security measures employed to safeguard consumer data. In this thesis, we describe mechanisms for selectively replicating data in a large-scale NoSQL datastore in accordance with privacy and legal regulations. We introduce an easily extensible constraint language to implement these policy constraints through the creation of a pluggable topology provider in the configuration files of Cassandra. Experiments using the modified Cassandra trunk demonstrate that our techniques work well, respect response times and improve read and write latencies.
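
The policy-constrained placement idea can be sketched as below (a simplification with invented datacenter names, not the thesis's pluggable Cassandra topology provider):

```python
# Sketch of policy-constrained replica placement: candidate datacenters
# are filtered by a per-record residency constraint before the replication
# factor is satisfied. Datacenter names and regions are hypothetical.

DATACENTERS = {"eu-west": "EU", "eu-north": "EU", "us-east": "US", "ap-south": "IN"}

def place_replicas(record_region: str, rf: int) -> list[str]:
    allowed = [dc for dc, region in DATACENTERS.items() if region == record_region]
    if len(allowed) < rf:
        raise ValueError("policy leaves too few datacenters for the replication factor")
    return allowed[:rf]

# An EU record with replication factor 2 may only live in EU datacenters.
print(place_replicas("EU", 2))   # ['eu-west', 'eu-north']
```
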
APA, Harvard, Vancouver, ISO, and other styles
45

Venugopal, Srikumar. "Scheduling distributed data-intensive applications on global grids /." Connect to thesis, 2006. http://eprints.unimelb.edu.au/archive/0002929.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Belkeir, Nasr Eddine. "Multicast communication in distributed systems with dynamic groups." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/8134.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Milton, Robert. "CORBA in the aspect of replicated distributed real-time databases." Thesis, University of Skövde, Department of Computer Science, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-644.

Full text
Abstract:

A distributed real-time database (DRTDB) is a database distributed over a network on several nodes and where the transactions are associated with deadlines. The issues of concern in this kind of database are data consistency and the ability to meet deadlines. In addition, there is the possibility that the nodes, on which the database is distributed, are heterogeneous. This means that the nodes may be built on different platforms and written in different languages. This makes the integration of these nodes difficult, since data types may be represented differently on different nodes. The common object request broker architecture (CORBA), defined by the Object Management Group (OMG), is a distributed object computing (DOC) middleware created to overcome problems with heterogeneous sites.

The project described in this paper aims to investigate the suitability of CORBA as a middleware in a DRTDB. Two extensions to CORBA, Fault-Tolerance CORBA (FT-CORBA) and Real-Time CORBA (RT-CORBA) is of particular interest since the combination of these extensions provides the features for object replication and end-to-end predictability, respectively. The project focuses on the ability of RT-CORBA meeting hard deadlines and FT-CORBA maintaining replica consistency by using replication with eventual consistency. The investigation of the combination of RT-CORBA and FT-CORBA results in two proposed architectures that meet real-time requirements and provides replica consistency with CORBA as the middleware in a DRTDB.

APA, Harvard, Vancouver, ISO, and other styles
48

Ouerd, Messaouda. "Learning in belief networks and its application to distributed databases." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0015/NQ57060.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

KINSEY, MICHAEL LOY. "PRIVACY PRESERVING INDUCTION OF DECISION TREES FROM GEOGRAPHICALLY DISTRIBUTED DATABASES." University of Cincinnati / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1123855448.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Chiu, Lin. "A methodology for designing concurrency control schemes in distributed databases /." The Ohio State University, 1987. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487584612163117.

Full text
APA, Harvard, Vancouver, ISO, and other styles