Scalability of commercial database management systems as RDF stores
Abstract
With data on the Web growing exponentially, the World Wide Web Consortium has proposed storing data semantically so that common formats are maintained for data integration. Researchers have extended the advantages of semantic storage to life sciences data. Choosing the right database for semantic data depends on several factors, including the amount of data involved, the information to be extracted from the stored data, the functionality the database must provide, and the available expertise and infrastructure. The author compared three RDF stores (Oracle 11g, Virtuoso and TDB) for storage and retrieval of RDF data on basic commodity hardware. The data used was cancer proteomics data generated by a mass spectrometry instrument. Performance was compared on three factors only: data loading time, query response time and query throughput. For all datasets, TDB gave the lowest data loading times, followed by Virtuoso; Oracle gave the highest. For most queries, Virtuoso performed best on the smaller datasets. On the bigger datasets, TDB performed better in most cases and was closely followed by Virtuoso. For all queries, Oracle gave the longest query response times. Thus, combining data loading times and query response times, TDB performed best, closely followed by Virtuoso, and Oracle showed the worst performance of the three databases.
Degree
M.S.
Advisors
Springer, Purdue University.
Subject Area
Information Technology|Information science