Whenever there was a performance problem, one of the first things our development teams and testing teams would ask was ‘What does Spotlight tell us?’
Imagine a firm that manages financial assets for one of the world’s largest investment banks. Its database team offers 24/7 support to internal customers with a staff of nine database administrators (DBAs) spread across the UK and India.
As in many enterprises, users turn to the database team when system performance unexpectedly lags, even when the root of the problem lies elsewhere. But this team has the advantage of using Spotlight on SQL Server Enterprise to investigate performance problems — wherever they originate.
“We ran on a mainframe system until our recent migration to .NET applications on SQL Server databases,” says a senior DBA on the team. “We migrated because the cost of running an aging mainframe system kept growing. Over time, we had developed internal expertise with .NET and SQL Server, so migration to a modern, easily maintained platform made more sense.”
To supplement that expertise, the firm had purchased licenses for Spotlight on SQL Server Enterprise, a tool for monitoring and optimizing database performance. The DBA had inherited Spotlight upon starting at the company and came to appreciate its value during the migration project. The two-year project consisted of thousands of hours of testing plus migration of millions of lines of code and terabytes of data across production and non-production.
“We bought additional licenses for Spotlight as we migrated,” says the DBA, ”first for our test servers, then for our production boxes. Whenever there was a performance problem, one of the first things our development teams and testing teams would ask was ‘What does Spotlight tell us?’”
The DBA appreciated how Spotlight captured system activity at multiple levels, including CPU and memory usage by SQL Server. That enabled his team to scroll back to the important minutes before, during and after an interruption so they could examine system events:
Network traffic — All DBAs could see quite easily that during the batch process window, many delays were caused by network traffic, so they knew processes were waiting on network input/output (I/O).
Parallelism — They identified situations in which unnecessary parallelism arose, allowing them to raise the threshold for it.
Update locks — They discovered contention on update locks and reported occurrences so that the database developers could address how the locking strategy was being implemented during batch processes.
Deadlocks — Spotlight helped them identify points at which blocking escalated into deadlocks.
Server configuration — With graphs from Spotlight illustrating CPU and memory usage during test batch runs, the DBAs demonstrated that the servers as specified would suffice for the workload. That enabled them to extrapolate the optimal sizes for production servers and user acceptance test (UAT) servers.
Workload spikes — When developers reported a performance hit at, say, 2:40 a.m., DBAs could replay the Spotlight recording of system activity and point to resource-intensive activity like reports running concurrently. The teams agreed to stagger workloads to relieve the bottlenecks.
“Not only was Spotlight useful throughout our migration,” says the DBA, “but we’ve also come to rely on it whenever there’s a problem. Our colleagues always ask us, ‘Can we just see what was happening 20 minutes ago when this problem first started?’ They ask us to trace through and look for blocking, or to see where the wait states changed. More than anything, the biggest benefit we have with Spotlight is the fact that it records what’s happened and allows us to play it back. And then it presents that data very clearly.”
He prizes that recording capability — “we can time-travel like Doctor Who in the TARDIS” — because performance questions come back so often to the database and the DBA. Unfortunately, investigations begin several hours after the fact, when the performance details are no longer in SQL Server buffers. Once that data is lost, all that’s usually left is speculation, but Spotlight allows DBAs to scroll back through multiple levels and drill into system events.
While Spotlight is designed for monitoring and diagnosing problems with SQL Server databases, its capabilities also pay off in code optimization. During migration, developers pointed out to the database team that a particular process kept being killed off. The DBAs used Spotlight to determine that code translated from the mainframe was trying to lock tables for an index update. It also revealed that another process was simultaneously trying an index update lock on the same tables in reverse order — a deadly embrace that migration engineers had to address.
Developers have brought up blocking problems that Spotlight has helped solve. The DBAs played back the incident and extracted information revealing a spike in SQL activity and the change in wait statistics during that time frame. By mapping the waits to the active sessions, they were able to show the blocking chains so that the developers could see where to examine their code.
“We had thousands of hours of testing,” says the DBA, “and Spotlight proved very useful in helping to identify problems that were in fact coding errors. Or perhaps they were database problems, but they were related to design and the data model. For instance, we’ve identified blocking and determined that indexes are no longer efficient because data volume or skew has changed, so the teams have looked at new indexing strategies.”
The database team monitors the Windows Server machines and the SQL Server instances running on them. As the firm’s SQL Server estate has grown to dozens of licenses, so has the value of Spotlight. That estate spans a dozen production instances, development servers, a quality assurance server for integration testing, a UAT server, pre-live servers and several servers used for sandboxing and proofs of concept. It’s a mixture of virtual and physical servers, both clustered and non-clustered.
In a financial environment, database activity tends to reflect current events, and the DBA can point to problems caused by unusual transaction profiles on certain days. “Elections are a classic case,” he says, “because large numbers of investors watch the vote tallies, then suddenly decide to sell or buy. Spotlight lets us monitor transaction counts and the effects they have on the databases, which makes trends stand out for us. There might be a massive change in physical reads, or a sudden drop in cache hit ratios, or a block that spreads from one to two to 20 or 30 processes in a few seconds.”
The SQL Server databases are also at the heart of the testing that precedes the rollout of new offerings. Developers ask the DBAs to monitor the UAT system while they conduct volume testing in the hundreds of thousands of transactions for a new type of trade. After the test, the DBAs hold a video conference and share Spotlight screens so the developers can see how the database performed.
“We play back the dashboard for them, which gives them a degree of confidence that the production system will handle the new type of trade,” says the DBA. “Being able to replay history like that has been worth its weight in gold to us — more than any other feature of Spotlight.”