I was recently tasked with re-writing the database representation for a potentially very large list of elements, each with an arbitrary number of possibly different attributes. The problem is concerning a standalone Java desktop application. Currently, the list of elements is internally stored in an embedded SQL database. In order to deal with the fixed schema requirements of SQL, the following schema was designed:
When this was designed, the number of elements was supposed to be limited to a couple thousand. You already guessed it, this limitation is long gone. Users now load files containing hundreds of thousands of elements. The current approach can not handle this load properly, which is not surprising. The relational database needs to perform JOIN operations for almost all use cases. Simply displaying an element with all its properties requires a JOIN of all three tables.
This is a very common approach with relational databases. I can not think of many ways to improve this schema without introducing other pitfalls. I would be very interested to hear about improved schema suggestions in the Disqus comments below.
However, there is definitely one way to improve this situation:
Getting rid of the relational database completely.
Having a list of elements that can each have an arbitrary number of properties that might differ from each other is a perfect use case for a document-oriented NoSQL database.
When I started to look for NoSQL alternatives to tackle the described problem, I immediately thought of MongoDB. I have used it already in some small projects and feel quite comfortable with it. Unfortunately, there is no embedded version of MongoDB, except for testing and for mobile app development.
So my search went on. At first, I stumbled upon some very interesting key-value store projects:
I am happy to have these on my radar now, they are certainly very interesting and might find some use in future projects. However, for the above described use case, a key-value store is not a good fit. Instead, a document-oriented approach similar to MongoDB would be ideal. With this requirement, I actually had trouble finding open source projects. In the end, I had two relatively unknown alternatives:
- Jsondb: https://github.com/Jsondb/jsondb-core
- Nitrite Database: https://github.com/dizitart/nitrite-database
The decision in favor of Nitrite was quite obvious, it has beautiful and extensive documentation and boasts a “Very fast and lightweight MongoDB like API”.
Implementing our use case in a document-oriented NoSQL database is actually very straightforward. We simply create a
NitriteCollection and add all our elements as
Document to this collection. Each document will contain exactly the attributes that the respective element actually has set. This requires much less effort than with fixed-schema databases.
tableId attribute is specific to my use case. It allows to index my elements for later use with a
Using Nitrite as JTable Backend
I did not (yet) create a proper, reproducible evaluation setup. However, it is quite easy to get a feeling for how fast
Nitrite compares to embedded SQL solutions (Derby, H2, …).
Simply create a a basic Java Swing GUI containing a JTable with the following
This allows to easily filter collections and display the results.
nitriteTableModel.setFilteredIndices(nitriteCollection.find(Filters.eq(attributeNameField.getText(), attributeValueField.getText()))); nitriteTableModel.fireTableDataChanged();
nitriteTableModel.setFilteredIndices(nitriteCollection.find(FindOptions .sort(attributeNameField.getText(), SortOrder.Ascending))); nitriteTableModel.fireTableDataChanged();
attributeValueField are basic
JTextField GUI elements that allow to specify which column should be used for filtering or sorting as well as which value the filtered elements have to match. Nitrite provides all the obvious filtering (
in, …) and sorting (ascending, descending) options out of the box.
I have used Nitrite as
JTable backend with several million elements. The table performs amazingly smooth. Filtering and sorting on indexed attributes is almost instant as compared to several seconds or even minutes with the SQL-based approach described in the beginning.
If you happen to know of a more extensive performance evaluation of Nitrite, ideally comparing it against embedded SQL-based systems, please post it in the comments or contact me directly. If there is none, I might do one myself in a future blog post.