Recently, I had the exciting challenge to redesign the database representation of a potentially very large list of elements, each with an arbitrary number of possibly different attributes. Previously, the list was internally stored in an embedded SQL database. In order to deal with the fixed schema requirements of SQL, the following schema was conceptualized:
This approach is perfectly fine for a few thousand elements. However, when performance becomes critical because you would like to support very large databases, this approach reaches its limitations. This is not surprising because the relational database needs to perform JOIN operations for almost all use cases. Simply displaying an element with all its properties requires a JOIN of all three tables.
Nevertheless, it is a very common approach with relational databases. I cannot think of many ways to improve this schema without introducing other pitfalls. I would be very interested to hear about improved schema suggestions in the comments below.
However, there is definitely one way to improve this situation:
Getting rid of the relational database completely.
Having a list of elements that can each have an arbitrary number of properties which might differ from each other is a perfect use case for a document-oriented NoSQL database. The only challenge remaining was to find an embeddable open-source project, as those were strict requirements.
When I started to look for NoSQL alternatives to tackle the described problem, I immediately thought of MongoDB. I have used it already in some small projects and feel quite comfortable with it. Unfortunately, there is no embedded version of MongoDB, except for testing and for mobile app development.
So my search went on. At first, I stumbled upon some very interesting key-value store projects:
I am happy to have these on my radar now, they are certainly very interesting and might find some use in future projects. However, for the above described use case, a key-value store is not a good fit. Instead, a document-oriented approach similar to MongoDB would be ideal. With this requirement, I actually had trouble finding open source projects. In the end, I had two relatively unknown alternatives:
- Jsondb: https://github.com/Jsondb/jsondb-core
- Nitrite Database: https://github.com/dizitart/nitrite-database
The decision in favor of Nitrite was quite obvious, it has beautiful and extensive documentation and boasts a “Very fast and lightweight MongoDB like API”.
Implementing our use case in a document-oriented NoSQL database is actually very straightforward. We simply create a
NitriteCollection and add all our elements as
Document to this collection. Each document will contain exactly the attributes that the respective element actually has set. This requires much less effort than with fixed-schema databases.
tableId attribute is specific to my use case. It allows to index my elements for later use with a
Using Nitrite as JTable Backend
I did not (yet) create a proper, reproducible evaluation setup. However, it is quite easy to get a feeling for how fast
Nitrite compares to embedded SQL solutions (Derby, H2, …).
Simply create a a basic Java Swing GUI containing a JTable with the following
This allows to easily filter collections and display the results.
nitriteTableModel.setFilteredIndices(nitriteCollection.find(Filters.eq(attributeNameField.getText(), attributeValueField.getText()))); nitriteTableModel.fireTableDataChanged();
nitriteTableModel.setFilteredIndices(nitriteCollection.find(FindOptions .sort(attributeNameField.getText(), SortOrder.Ascending))); nitriteTableModel.fireTableDataChanged();
attributeValueField are basic
JTextField GUI elements that allow to specify which column should be used for filtering or sorting as well as which value the filtered elements have to match. Nitrite provides all the obvious filtering (
in, …) and sorting (ascending, descending) options out of the box.
I have used Nitrite as
JTable backend with several million elements. The table performs amazingly smooth. Filtering and sorting on indexed attributes is almost instant as compared to several seconds or even minutes with the SQL-based approach described in the beginning.
If you happen to know of a more extensive performance evaluation of Nitrite, ideally comparing it against embedded SQL-based systems, please post it in the comments or contact me directly. If there is none, I might do one myself in a future blog post.