Upsert and upsertAll with JdbcTemplate and CockroachDB

Sai Ram Naragoni
3 min read · Feb 5, 2022
By RogueNation

In my previous blog, I covered a custom implementation of the upsert and upsertAll operations for all the repositories in an application using spring-data-jpa, and reviewed the performance difference. In this blog, I am going to address the same problem using JdbcTemplate. I will not be using any built-in repository such as CrudRepository to compare upsert and upsertAll performance, since they produce results similar to those seen with JPA. Instead, I will use JdbcTemplate to execute queries for the save() and saveAll() operations.

Intuition !

The intuition here is the same as in the JPA implementation, where I used reflection on the annotations of the entity class to frame the UPSERT query. The differences are that the annotations change, and that along the execution flow the generated query is executed using JdbcTemplate.

For this demonstration, I used the ‘users’ table from the ‘movr’ database in CockroachDB, running on a single-node local cluster.

Let’s get started !

To begin with, create a custom base repository interface that declares the upsert() and upsertAll() methods. All the repositories defined in the project should extend this custom base repository, as shown below.

Custom base JDBC Repository
User Repository extending Custom base Respository
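The original code screenshots are not reproduced here, so the following is a minimal sketch of what the two interfaces above could look like. All names (UpsertRepository, User, UserRepository) are assumptions, not the post's exact code.

```java
import java.util.Collection;

// Hypothetical custom base repository declaring the upsert operations.
// Return types mirror JDBC update counts; this is an assumption.
interface UpsertRepository<T> {
    int upsert(T entity);
    int[] upsertAll(Collection<T> entities);
}

// An illustrative entity for the 'users' table of the 'movr' database.
class User {
    String id;
    String name;
    String city;
}

// Every repository in the project extends the custom base repository.
interface UserRepository extends UpsertRepository<User> {
}
```

With this arrangement, any repository in the project inherits upsert() and upsertAll() simply by extending the base interface.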

Now comes the important part: implementing the upsert and upsertAll query generation and execution. To frame the query, we need three things: the table_name, the column_names in the table, and the data_values for those columns. Fetching the table_name is simple with reflection on the entity class, but fetching the column_names from the entity is more complicated because of superclasses (if any; also note that there are no embedded IDs or join annotations involved here, unlike in JPA). A brief explanation of the functions in the QueryGenerator can be found below.

1. getColumnFields - Fetches all the declared fields in the entity class and its superclasses (if any) that are annotated with Column.class.
2. formInitialQuery - Forms the initial query 'UPSERT INTO table_name (column_names) VALUES '.
3. populateTableColumns - Populates the column_names in the initial query formed above.
4. populateColumnParams (and values) - For a given entity, fetches the value of each column_name. The output of this function looks like "(value1,value2,value3,value4,value5),". There is no need to parameterize here because the column_name and its corresponding value are all that is needed.
5. upsert and upsertAll - upsert for a single entity, and upsertAll for a list of entities. The PostgreSQL wire protocol (which CockroachDB speaks) limits a single query to 32767 variables, so any upsertAll statement whose variable count exceeds this limit is split into batches.

Find below the source code for the above described functions:
Query Generator
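Since the Query Generator screenshot is not reproduced here, the following is a self-contained sketch of the functions described above. The @Column annotation is a local stand-in for the real mapping annotation, the quoting of values is simplified, and all class and method names are assumptions.

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;
import java.util.stream.Collectors;

// Stand-in for the real column-mapping annotation used in the post.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column { String name(); }

class UpsertQueryGenerator {

    // 1. getColumnFields: walk the class and its superclasses, keeping
    //    only the fields annotated with @Column.
    static List<Field> getColumnFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isAnnotationPresent(Column.class)) fields.add(f);
            }
        }
        return fields;
    }

    // 2 + 3. formInitialQuery / populateTableColumns:
    //    'UPSERT INTO table_name (col1,col2,...) VALUES '
    static String formInitialQuery(String table, List<Field> fields) {
        String cols = fields.stream()
                .map(f -> f.getAnnotation(Column.class).name())
                .collect(Collectors.joining(","));
        return "UPSERT INTO " + table + " (" + cols + ") VALUES ";
    }

    // 4. populateColumnParams: one "(v1,v2,...)" tuple per entity.
    //    Naive quoting for illustration only.
    static String populateColumnParams(Object entity, List<Field> fields) {
        return fields.stream().map(f -> {
            try {
                f.setAccessible(true);
                Object v = f.get(entity);
                return v instanceof String ? "'" + v + "'" : String.valueOf(v);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }).collect(Collectors.joining(",", "(", ")"));
    }

    // 5. upsertAll (query text only): join one tuple per entity;
    //    the variable-limit batching is left out of this sketch.
    static String buildUpsertAll(String table, Class<?> type, List<?> entities) {
        List<Field> fields = getColumnFields(type);
        String values = entities.stream()
                .map(e -> populateColumnParams(e, fields))
                .collect(Collectors.joining(","));
        return formInitialQuery(table, fields) + values;
    }
}

// Illustrative entity used to exercise the generator.
class UserRow {
    @Column(name = "id") String id;
    @Column(name = "name") String name;
    UserRow(String id, String name) { this.id = id; this.name = name; }
}
```

For a single UserRow("u1", "Ann"), buildUpsertAll produces a statement of the shape UPSERT INTO users (id,name) VALUES ('u1','Ann'), which JdbcTemplate can then execute directly.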

Now it’s time to implement a concrete class for the UserRepository interface. There is one additional operation defined in the custom base repository: upsertAllParallel(). It uses parallel streams from Java 8 to execute the upsert queries in parallel, which gives a significant performance boost when upserting records on the order of millions, as shown later below.
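The batching-plus-parallelism idea can be sketched without a database. In the snippet below, the 32767 figure is the wire-protocol variable limit mentioned earlier; the batch arithmetic and the class name are illustrative assumptions, and a simple counter stands in for the jdbcTemplate.update(...) call that the real implementation would make per batch.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the batching + parallel execution behind upsertAllParallel().
class ParallelUpsertSketch {

    // Split entities into batches so that (rows per batch * columns per row)
    // stays under the 32767-variable protocol limit.
    static <T> List<List<T>> partition(List<T> entities, int columnsPerRow) {
        int limit = 32767;
        int batchSize = Math.max(1, limit / columnsPerRow);
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < entities.size(); i += batchSize) {
            batches.add(entities.subList(i, Math.min(i + batchSize, entities.size())));
        }
        return batches;
    }

    // upsertAllParallel: run one UPSERT statement per batch on the common
    // fork-join pool; the counter stands in for jdbcTemplate.update(...).
    static <T> int upsertAllParallel(List<T> entities, int columnsPerRow) {
        AtomicInteger upserted = new AtomicInteger();
        partition(entities, columnsPerRow).parallelStream()
                .forEach(batch -> upserted.addAndGet(batch.size()));
        return upserted.get();
    }
}
```

Because each batch is an independent statement, the batches can be dispatched concurrently; this is where the parallel variant earns its speedup over a sequential upsertAll.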

Performance comparison of save/saveAll vs upsert/upsertAll/upsertAllParallel

Save Vs Upsert
SaveAll Vs UpsertAll Vs UpsertAllParallel (time in ms)

All the attached tests are for upsert operations (for saveAll, the batch_size used is 250). The time to execute the query is higher for upsert() than for save(), but it is quite the opposite for upsertAll() versus saveAll(). Unlike what is observed in JPA, though, the difference is visible starting from as few as 10 entities. The execution time is reduced by almost 60 times with upsertAll() and 200 times with upsertAllParallel(); in percentage terms, that is an improvement of roughly 5,000% for upsertAll() and 20,000% for upsertAllParallel().

Conclusion

The catch here is the same as stated in my previous blog. Regardless, for less frequent bulk database operations, this approach gives a huge advantage.

You can find the complete source code for this post on my GitHub.
