Some queries, mostly reporting ones, have to be optimised when they run against tables with millions of rows, because their execution times make them unusable. We will see how to improve their performance, including some features that are specific to Oracle databases.
Prepare in advance
Before starting, check that there are enough automated tests covering most of the scenarios; that way you can be confident that the changes don’t break any existing functionality, and you reduce the time spent manually testing the same cases over and over.
Try to prepare multiple scenarios in which the old and new versions of the query may perform differently, and try to have two separate but similar instances so you can compare the old and new queries running in parallel. Also, add log lines before and after the query execution to monitor the execution time, as the queries may perform differently on each environment because their execution plans may differ.
Oracle caches the results, which makes it difficult to compare multiple executions, as the first run may take minutes and the next ones only a few seconds. Flushing solves this by clearing the cache, but it is a bad approach because it affects everyone using the same database. It is better to prepare different sets of parameters so the query runs without using previously cached results. If possible, also prepare some basic performance tests that run the query multiple times with different parameters to get the average time, as the execution times may depend on the server and database load.
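For reference, flushing the caches looks like the snippet below. As mentioned, only do this on a private instance, never on a shared database:

```sql
-- Clears cached data blocks and parsed plans. Requires ALTER SYSTEM
-- privileges and impacts every session on the instance.
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH SHARED_POOL;
```

Rotating realistic parameter sets (e.g. different date ranges or customers) gives comparable timings without this side effect.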
You will usually need to make multiple small changes, as it is rare for a single change to drastically improve performance. Run the automated tests after every significant change to check that all the scenarios still work, and don’t forget to commit the changes to a branch after each stable version so you can go back if something breaks, as you may not remember which change caused it. Also, run the integration tests frequently if the results of the query are used as the input of another one.
Most RDBMS have a command like explain plan that shows how they process a query: which algorithms they use, which indexes are picked, the estimated CPU cost… which is really helpful to detect issues. Most tools, such as SQL Developer, also have a UI for it that makes the plan easier to read. When looking at it, check whether the rows are matched using hash joins or nested loops, whether the expected indexes are used or full scans are done, etc.
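In Oracle this can be done from any SQL client; the table name here is just a placeholder:

```sql
-- Store the plan for the query (nothing is executed yet)
EXPLAIN PLAN FOR
SELECT id, total
FROM orders
WHERE customer_id = :customer_id;

-- Print the stored plan in a readable format
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The output lists each step with its access method (e.g. TABLE ACCESS FULL vs INDEX RANGE SCAN), estimated rows and cost, which is where full scans and missing indexes show up.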
Queries perform a lot faster when they use indexes, as they avoid full scans (where every single row has to be read), and some indexes work better than others depending on their cardinality. Sometimes it helps to add new indexes or composite ones, but they affect the performance of inserts and updates, so they have to be added carefully. If a column contains nulls, it may work better to use NVL than a combination of comparing the value and checking IS NULL. And, if possible, avoid filtering by calculated fields (concatenations, case-whens, …), as they cannot take advantage of the indexes.
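As a sketch of the last two points, on a hypothetical tickets table:

```sql
-- Handling nulls: a single NVL condition instead of two OR-ed ones
-- (instead of: WHERE status = 'OPEN' OR status IS NULL)
SELECT id FROM tickets
WHERE NVL(status, 'OPEN') = 'OPEN';

-- Calculated fields disable regular indexes on the column:
-- WHERE TO_CHAR(created_at, 'YYYY') = '2024'      -- forces a full scan
-- An equivalent range condition can use an index on created_at:
SELECT id FROM tickets
WHERE created_at >= DATE '2024-01-01'
  AND created_at <  DATE '2025-01-01';
```

Note that applying NVL to the column is itself a calculation, so check the execution plan; a function-based index may be needed if that filter is the main access path.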
Only return the columns needed
We should avoid doing “select *”, as most columns may not be needed and fetching all their data is a lot slower. Consider this even in the subqueries.
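A minimal example, with hypothetical column names:

```sql
-- Instead of fetching every column of every matching row:
-- SELECT * FROM orders WHERE created_at >= DATE '2024-01-01';

-- fetch only what the report actually uses:
SELECT id, customer_id, total
FROM orders
WHERE created_at >= DATE '2024-01-01';
```

Besides moving less data over the network, a narrow column list can sometimes be answered entirely from an index, skipping the table access.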
Oracle allows refactoring queries using with clauses that extract parts of them to make them more readable. However, they make it harder to see how to change the joins, whether the filters are applied in the right places or duplicated unnecessarily, whether there are unnecessary orderings in the sub-queries, etc. Consider unfolding these clauses, looking at the query as a whole to spot performance improvements that went unnoticed before, and then refactoring again with with clauses if needed.
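A small illustration of unfolding, using made-up tables:

```sql
-- Refactored version with a WITH clause:
WITH recent_orders AS (
  SELECT id, customer_id, total
  FROM orders
  WHERE created_at >= DATE '2024-01-01'
  ORDER BY created_at            -- ordering hidden inside the subquery
)
SELECT c.name, SUM(r.total)
FROM recent_orders r
JOIN customers c ON c.id = r.customer_id
GROUP BY c.name;

-- Unfolded, the ORDER BY is clearly useless before a GROUP BY
-- and can simply be dropped:
SELECT c.name, SUM(o.total)
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= DATE '2024-01-01'
GROUP BY c.name;
```

Once the query is visible as a whole, redundant sorts and filters like this become much easier to spot.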
Optimise the joins
The way tables are joined affects performance a lot. The main improvement is putting conditions in the on clause instead of the where clause, as the filtering is done before joining, so a much smaller row combination is processed. Inner joins are usually faster than left or right outer joins, although that may sound counter-intuitive. The order of the joins may also affect performance if the tables have very different sizes (e.g. when joining a table with millions of rows with one that only has a few thousand).
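A sketch of the first point, on hypothetical orders and payments tables; note that with outer joins, moving a condition on the right-hand table between on and where also changes the results, so this must be tested:

```sql
SELECT o.id, p.amount
FROM orders o
LEFT JOIN payments p
       ON p.order_id = o.id
      AND p.status = 'COMPLETED'   -- filters payments before joining,
                                   -- and keeps orders with no such payment
WHERE o.created_at >= DATE '2024-01-01';

-- If p.status = 'COMPLETED' were in the WHERE clause instead, rows where
-- p is null would be discarded, silently turning this into an inner join.
```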
There are small things that affect performance. E.g. avoid using parentheses in the where clauses unless necessary, as they may make it harder for Oracle to resolve the condition and use the right indexes or partitions. Also, when comparing dates, ensure that the input is an actual date instead of a string, as Oracle may apply implicit conversions on each row to compare them, which would hurt performance.
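For the date comparison, the safe options are a date literal or an explicit TO_DATE with a format mask (column name is a placeholder):

```sql
-- Risky: a string forces an implicit, NLS-dependent conversion
-- WHERE created_at > '01-JAN-2024'

-- Better: an actual DATE value
WHERE created_at > DATE '2024-01-01'

-- Or, explicitly:
WHERE created_at > TO_DATE('2024-01-01', 'YYYY-MM-DD')
```

Explicit dates also behave the same across environments with different NLS date settings, which matters when comparing the two instances mentioned earlier.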
Take advantage of the partitions
Oracle allows splitting big tables into partitions, e.g. by year, month, … It is good to know them, both to make sure we filter by the columns they are defined on and to do parallel processing. E.g. we can use hints like parallel(x) to process different partitions in parallel using multiple threads. Consider also preparing a separate query that figures out whether the search interval can be reduced, so the main query has to process fewer partitions.
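A sketch of both ideas on a hypothetical sales table partitioned by sale_date; the degree of parallelism (4 here) is an arbitrary example and should match what the server can afford:

```sql
SELECT /*+ PARALLEL(s, 4) */
       s.product_id, SUM(s.amount)
FROM sales s
-- Filtering on the partition key lets Oracle prune the partitions
-- outside this range instead of scanning all of them:
WHERE s.sale_date >= DATE '2024-01-01'
  AND s.sale_date <  DATE '2024-04-01'
GROUP BY s.product_id;
```

The execution plan shows whether pruning happened (a PARTITION RANGE step with specific partition numbers rather than ALL).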
There may be cases where not much improvement can be achieved. If the queries are slow but within an acceptable SLA, consider workarounds like a loading animation so users know the report is still working and they just have to wait a bit. Also, consider supporting multiple tabs so they can run other reports while waiting for that one to finish.