When most people first start learning to tune SQL, they develop a fascination with this mysterious column on the execution plan called “cost.” What is cost? And how can you use it to tune SQL?
So here’s, high level, what cost is:
Cost is a measurement of ESTIMATED time for a query to complete.
BIG, BIG, BIG emphasis on the word “ESTIMATED.”
It is NOT the ACTUAL amount of time that it takes to run a query…it’s “this is how long Oracle thinks it will take to run this query based on the information it has.”
So is “Cost” in seconds then? Milliseconds? Microseconds? Minutes? Hours? Days? What’s the unit of time for cost?
None of those! It’s actually a much weirder unit! One unit of cost is supposed to represent the amount of time that it takes to read a block off of disk.
At least…sort of.
Hard Drive Technology
The geeky history: A long time ago, the SLOWEST part of a database used to be the hard disk drive. This was the case for lots of reasons–hard drives weren’t as fast as back then. They are a lot faster now for lots of reasons:
- Our spinning hard disks can now spin much faster than they used to…up to 10,000 rpm!
- We’ve developed new technologies that allow us to store bits on hard disk drives in a more compact fashion. This means each time the disk spins, it’s able to read more data and have a higher throughput.
- Caching is better, at multiple layers. The operating system has a page cache. The hard disk drives now are built with something called a disk buffer. So just because you’re asking for information from the hard drive doesn’t mean the platters are necessarily spinning and reading. Everything could be cached at one layer or another.
- Nowadays we have solid state disks, which gets rid of a lot of the time we normally spent waiting. We don’t have to wait for the hard disk drive arm to move, center itself over the right track. We don’t have to wait for the sector that we’re interested to pass under the head of the disk drive. So a lot of the “moving parts” time just disappears.
Anyway! Long ago, physical hard drives didn’t have our fancy-pants technology that we have today. So we spent a lot of the time associated with a query reading data off of a disk.
Let’s start with the “old” way of costing. Whenever Oracle initially created the concept of the “cost-based optimizer” its whole job was to try look at a whole bunch of different ways to run the query and see which one would work best.
Different methods of running the query include things like:
- What order should I join tables in?
- Which indexes should I use?
- What methods should I use to join these tables?)
It’s a lot like Dr. Strange looking into the future in Avengers Infinity War:
Oracle is trying to look into the future, and is going “based on what I know now, what’s the best way to to run this query?”
So Oracle looks for a bunch of ways to run the query. For each method it finds, it measures the amount of time it thinks it’ll take based on “how many blocks will I have to read?”
This is the “old way” of costing. It’s called “IO” costing because it only takes into consideration the amount of IO your query needs.
At some point, technology became faster, and Oracle’s costing formula started to break down.
- Database Buffer Caches became larger. As a result, we spent less time reading data off of disk, more time on the CPU.
- Hard drives got faster. Hard drive time became less significant.
- More CPU-intensive operations became introduced with more complex features.
All of these lead to physical IO being “less of a big deal” and CPU as “more significant.”
Note: Physical IO is still be a big deal in queries today! But as a whole, it’s less of a big deal than it was.
So in order to capture the amount of time being spent on the CPU, Oracle started estimating two things: The number of blocks a query would read and the number of CPU cycles it would burn executing its query.
Well…the obvious problem here is if we want to know how much time a query takes, we can’t just add the number of CPU cycles + the number of blocks we need to read. They have to be the same unit before we can add them.
So Oracle converts the number of CPU cycles to the unit of “how long does it take to do a random read on a disk.”
Hint: It takes a TON of CPU cycles to equal the amount of time it takes to read even just 1 block off of disk for most systems.
Once they’re in the same “unit” Oracle adds the two numbers together.
How Can We Use Cost to Find the Most Efficient Query for Tuning?
Bad new: You can’t.
Why Cost is Useless for Measuring Anything “Real”
Cost is a guess that has a lot of assumptions baked in.
- The database assumes things like “there won’t be any data skew as a result of a join.” Or, “the distribution of data across a table is uniform.” Etc.
- Some filters in your WHERE clause may be so complex that the optimizer doesn’t have a good way to figure out how much data it’s going to get back. So the optimizer literally plugs in totally arbitrary guesses.
- Even though nowadays cost takes both CPU and Hard drive time into account, it doesn’t take everything into account. It still doesn’t include the amount of time it takes for results to travel across the network, or additional time as a result of database contention.
- Cost does not take the buffer cache into account at all. It assumes that we’ll be reading all blocks from disk. So if your table or index is in the buffer cache, it can throw your estimated time off.
So those are all extremely good reasons why Oracle’s cost estimation is basically useless for trying to measure anything real. It is an EXTREMELY ROUGH guess which makes a ton of assumptions.
But that’s not the real reason you have to ignore cost when tuning.
The Real Reason Why You Can’t Use Cost When Tuning:
One of the main reasons we tune queries is because Oracle has picked a bad execution plan. When Oracle has chosen a bad plan, it’s usually because Oracle’s Understanding of the Situation is off.
If Oracle misunderstands what the situation looks like…
- Maybe it thinks an index is a good idea when it’s not
- Perhaps it misunderstands how many rows it’ll pull back from a table
- Maybe its misestimated the number of rows that would result from a join
…then Oracle will often get “tricked” into using a bad plan.
So think about this:
- Cost is based off of Oracle’s understanding of what the data in your query looks like…
- When Oracle gets a bad query plan, it is almost always because Oracle misunderstands what the data looks like…
- …therefore, if Oracle gets tricked into using a bad plan, the cost of that plan is going to be wrong.
Like, WRONG WRONG.
Like WAAAAY OFF.
I’m not talking “different ballpark,” I’m talking “wrong state” or “wrong country.”
So if Oracle gives you an unreasonable execution plan, the overwhelming majority of the time it’s because the cost of the query is dead wrong.
So! When you’re tuning a query, pay absolutely no attention to cost. Seriously–it will not give you any valuable information. And it’s just about guaranteed to be wrong.
Free Oracle SQL Tuning Guide
Checkout my FREE guide, 7 SQL Tuning Secrets You Can Use Immediately, Even If You’ve Never Tuned a Query In Your Life!
Get it here: tuningsql.com/secrets