Understanding IO Issues with Spotlight on SQL Server

Finding the statements and objects involved in your bottlenecks

The modern explosion of data has created a huge demand for large volumes of storage, written and read at high speeds. The sheer size of many modern databases makes it impractical to maintain them in-memory, explaining our continued reliance on hard drives. SQL Server itself, as an ACID-compliant relational database, adds to this hard-drive intensive load through the continuous use of transaction log and tempdb files.

With all of this pressure on storage systems, IO performance is a common bottleneck. To understand it, you’ll need to know the processes, resources and configurations involved.  This post will show you how Spotlight on SQL Server gives you this information in a few clicks.

Spotlight can alert you by email, mobile alert and on the dashboard when an IO operation takes longer than it should.

In the example above, Spotlight is showing a Max I/O Stall Time alarm, which indicates that reading from and/or writing to a database file has taken too long.  Clicking on the dashboard component has brought up a descriptive pop-up, and the Diagnose button drop-down is offering two ways to look at the issue.  Let’s looks at the I/O by File option first:

Here you can see the I/O by File drilldown, which has a list of database files that can be selected to see a graph of the trend of IO load on each.  This helps you see the IO issue from a hardware and configuration perspective, and to determine whether you might want to create additional files in order to distribute the IO load on the objects inside them, or need to relocate heavily used files to a higher performance drive, or even upgrade your drive system altogether.

Of course, SQL Server performance problems should be addressed not only by adding resources or making configuration changes, but also by seeking opportunities to optimize the t-sql statements hitting them. The second option in the Diagnose button drop-down menu, shown in the first screenshot above, is “Wait Events”. This takes you to a screen that lets you see the heaviest-hitting statements for a particular time period, revealing your best opportunities for t-sql optimization.

When using the Wait Events screen to understand IO issues, you’ll want to change the Category to IO, which will ensure that you only see IO-related activity and statements. The “Show data for” dropdown lets you specify a time range that you’d like to investigate.  You’ll then see graphs of the activity for that time range, and you can expand the Statements branch in the dimension tree to see the statements that created the load.  It’s easy to see where your best opportunities for tuning are, because the statements are ordered with the heaviest ones at the top.

Let’s take a look at one last screen, this time giving you an excellent big picture, at-a-glance view of IO issues over time. It’s called Alarms by Time:

Alarms by Time shows you all Spotlight alarms that occurred in the last week, drawn out on a timeline (by the way, Spotlight can retain more than a week of history if you allocate the necessary space for it).  Along the left edge of the timeline we can see all of the files that have recently experienced IO Stall Time alarms (you can hover over the file path or colored alarm bar to see the full path).  What’s great about this view is that you can see in a single shot the severity and duration of the alarms, and whether they were related.  Multiple concurrent IO alarms on files placed on the same drive might indicate that the combined IO load on them may be more that the drive can handle, prompting you to consider relocating some of the files, or upgrading the storage system they’re on.

Thanks for taking the time to check out this post, more like this can be found at Database Management posts can be found at https://www.quest.com/community/b/en/p/database-management.

All of my Quest Database Performance blog posts can be found here.

Anonymous