Context:

Recently, I was asked to load some data quickly for a particular tab. For around 1000 rows of data, it was taking almost 40-45 seconds! The admins had to wait for a long time when they used to see those statistics for each customer!! I tested and got frustrated, tired and fed up debugging the code. My senior told me that we might have to split up existing database architecture. Here was the main problem because the database architecture is very good of this project. It is a LEGACY project (projects which run for a long period of time, this one is being run for 6 years+) which has no Hibernate support.

Approaching the problem:

When I saw the query for the first time, I was lost for a while. I am sharing the query here. But, I have renamed the tables, alias names, just retained some information so that it can be understandable. So, basically, what it does, it gets unpaid invoices from the database tables by join and subqueries and its related information such as customer, debtor, transaction date, due period etc.

1) At first, I beautified the code from some online site to make it readable and here it is:

 

Figure 1: existing query

Here, customer ‘id’ was passed dynamically in the highlighted line of the query. The query returned 357 rows of data with a time of 20.5 seconds!

Figure 1.1: existing query’s execution time

    2) After searching some online suggestions, I could not find much but I found out that a particular id was always given priority in the WHERE clause. So, I did change the
above query to the following:

 

Figure 2: slightly optimized query

        It did save some time! for like 4-5 seconds. So overall, it was taking 15-16 seconds(I do not have it’s execution time’s snapshot).

3) This was my final attempt, and a few hours already passed. I thought, If I could not find some solution better than this, I might need to restructure the database architecture and refactor the existing codes. I tried to find relations among the different tables. There was no Model level joins in this project which had made it more difficult. After a  lot, I found a solution to the inner query (from lines 20 – 28) which had a join between two tables. As we were providing the customer_id some value in the tbl_ic. So, I  thought while joining, it would have been better if I could join only that customer’s transaction data rather than the whole data which consisted all the customer’s transactions. So, I did add a line 25 to the above query :

Figure 3: optimized query

The query returned 357 rows of data with a time of 0.60 seconds! Can you believe ! not even 1 second.

Figure 3.1: optimized query’s execution time

 

Summary:

The query returned all the rows first then it filtered the data for a particular customer_id. So, I passed the customer_id first in the WHERE condition so that the query only filters the required results rather than fetching all the customer’s data. Moreover, it had a JOIN on the subquery. The inner query is run separately for every row of your table over and over again.  So, whenever you are in such conditions try to limit the data or filter out the data set with a condition. It will save your life and job too.