Today, I attended a SQL “tuning” training. I had some initial concerns that the topics might be a bit over my head, given that my current SQL knowledge sits somewhere around what I would maybe call “solid fundamentals” — although I’m not so sure about the “solid” part now 😀

But my worries turned out to be unfounded. On the contrary, I found it utterly fascinating and the instructor managed to keep us all engaged, filling in any gaps in our fundamental knowledge as needed. The methods he introduced to make queries more efficient were far from high-brow and now I’m really excited to delve deeper into SQL and apply this new knowledge.

The topics covered included GROUP functions, analytical functions, PL/SQL queries, and the use of the DATE type, all primarily focused on the Oracle database.

I’d like to highlight one small example here. It concerns queries that mix SQL and PL/SQL instructions. PL/SQL is essentially the programming language in Oracle, which, for example, allows functions or other procedures programmed there to be used in SQL queries. Please don’t hold me to every detail, I’m writing this down just as I remembered and noted it.

Let’s take a look at a simple query:

select * 
 from tab_names 
 where lastname = v(last_lastname);

Here, the table tab_names is being searched for entries where the value v(last_lastname) is in the lastname column. This value is a procedure previously defined in PL/SQL that returns the value of the last requested surname.

The database now goes through these queries in such a way that it takes the table tab_names, goes through every entry there and compares the last_name column with the value from v(last_lastname). It does this for EVERY row in the table, calling the v(last_lastname) procedure. It does this because at this point it does not know that the value will remain the same.

However, things look different when you “hide” the procedure in a subselect and make the comparison that way:

select *
   from tab_names
   where lastname = (
     select v(last_lastname) 
      from dual);

The dual table here is just a standard dummy table in SQL with exactly one entry, used when you’re not interested in the data from a table but only in those from system functions, for instance. In this case, we’re interested in the value v(last_lastname), the rest is there because that’s how we get to this value via a SELECT statement in Oracle. This is also something I learned today…

Back to the topic: why is this syntax better now? It seems more complicated, and intuitively it feels like there’s even more work for the database.

Well, it may look that way, but it isn’t. Because now, the database looks at the subselect individually and recognizes that the same thing is always being done here. This is due to the internal processing of SELECT statements in Oracle. For the database to understand the request, it is essentially compiled and optimized. To speed things up, the database stores individual queries by generating a hash from the corresponding string of a SELECT. If another query comes in, it is also converted to a hash and the database checks if it already exists in the memory. In this case, there’s no need to compile and optimize in the background again.

A good example of this is the following query:

Select dbms_random.value random_wert1, 
  (select dbms_random.value from dual) random_wert2 
  from tab_123;

Suppose the table tab_123 contains 5 rows. The function dbms_random.value returns a random number between 0 and 1. The above query yields, for instance, the following result:

random_value1random_value2
0.123456789012340.43872456523768
0.983451748586920.43872456523768
0.337582374753440.43872456523768
0.582364172712220.43872456523768
0.238748237475740.43872456523768

Here, the database takes a shortcut and sees that the subselect (select dbms_random.value from dual) always has the same string and therefore only executes it ONCE, while the random function for random_value1 is executed anew each time.

Why is this particularly interesting in the above example with PL/SQL? Well, there’s always an environment switch between SQL and PL/SQL. And that costs. The less I do this, the better. And when it comes to dealing with a LOT of data, you start noticing these differences quite clearly.

I find this incredibly intriguing and I’m eager to see if I can put this new knowledge into practice soon!

Until then, I’ve learned something new again 🙂

Leave a Reply