For many organizations, Microsoft SQL Server provides the back-end power behind some of their most important applications. Unfortunately, many of those applications can be space hogs, producing megabytes (or gigabytes) of new data every day. Even worse, many applications provide little or no archiving capability, expecting you to simply keep piling on the data. If you’ve got a nice big SAN to store all of that data, you might not mind, but in some cases you might want to look at saving a bit of room – especially on older servers with shrinking free capacity.
Use NTFS Compression?
SQL Server supports the use of Windows native NTFS compression capabilities. Unfortunately, it only supports this compression for read-only filegroups. That’s great if you can move older, historical data into a separate filegroup – but typically your application needs to explicitly support that kind of structure. If you’re stuck leaving your data in the active, read-write database, then NTFS compression isn’t an option. Because only read-only databases are allowed to be compressed this way, SQL Server’s own master, model, msdb, resource, and tempdb database can’t be compressed – but those typically aren’t all that large. The good news is that NTFS-compressed databases can still be backed up, support all SELECT statements and read operations (although execution time will be slower), and other operations that involve only reading. Note that transaction logs can’t be compressed, but you won’t have a very large transaction log for read-only data.
Enabling NTFS compression is easy: As shown in Figure 1, simply right-click the appropriate .NDF database files from Windows Explorer, and enable compression by selecting the checkbox.
Use SQL Server Database Compression?
SQL Server also offers its own compression capabilities. This doesn’t compress the database files per se, but rather compresses the data within those files. You’ll save storage, although as with NTFS compression you’ll pay a price in lowered performance. SQL Server actually performs the compression and decompression in-memory, meaning more data can be read and written per 8KB database page than without compression.
SQL Server actually supports two kinds of compression:
Row-level compression reduces the metadata overhead that accompanies each row within the database. It also changes the physical data types used for some data storage, such as numbers, in order to save space. For example, fixed-length strings are stored as variable-length strings and blank characters are discarded. This isn’t truly “compression” in the normal sense of the word, but it does achieve space savings with only a very minor performance hit.
Page-level compression compresses the actual page-based data, for both tables and indexes. This includes row-level compression and typically saves more space, but lowers performance more as well.
SQL Server offers the ability to estimate the usefulness of compression before deciding to use it. To estimate your space savings, use the sp_estimate_data_compression_savings stored procedure, which will produce output something like the following:
EXEC sp_estimate_data_compression_savings 'dbo', 'TestCompression', NULL, NULL, 'ROW' ; Go object_name : TestCompression schema_name :DBO index_id :0 partition_number : 1 size_with_current_compression_setting(KB) : 704888 size_with_requested_compression_setting(KB) : 9952 sample_size_with_current_compression_setting(KB) : 40784 sample_size_with_requested_compression_setting(KB) : 576
That’s for row-level compression; to test page-level compression you’d do this:
EXEC sp_estimate_data_compression_savings 'dbo', 'TestCompression', NULL, NULL, 'Page' ; Go object_name : TestCompression schema_name :DBO index_id :0 partition_number : 1 size_with_current_compression_setting(KB) : 704888 size_with_requested_compression_setting(KB) : 9134 sample_size_with_current_compression_setting(KB) : 39664 sample_size_with_requested_compression_setting(KB) : 560
The output gives you an idea of how much each technique would save. I ran this against a table that used a large character-type column; you can see that simply eliminating the blank characters with row-level compression achieved a great deal of space savings. Moving up to page-level compression, which would impact performance more, actually offers very little additional benefit, so with this table I probably wouldn’t do it.
SQL Server Management Studio offers the ability to actually enable compression. Note that this can be a very lengthy process, especially on a large existing data set, so you should schedule this for an off-hours maintenance window. As shown in Figure 2, you can also have the compression wizard schedule the compression for a later time, so you don’t necessarily have to be around to watch it happen.
Note that data compression is only available in the Enterprise and Developer versions of SQL Server, and my examples refer specifically to SQL Server 2008 (although 2008R2 works the same). If you’re using the Standard edition, then you can’t use data compression.
Interestingly, SQL Server goes to great lengths to avoid a negative performance hit. For example, when a compressed page is read into memory, SQL Server only decompresses the bits that have actually been requested or modified. That helps it save time, although it depends on well-written queries that only request the data the application actually needs at the time. However, because data is read and written in compressed form, you can actually see better disk I/O performance when compression is turned on, because more data is read or written with each 8KB disk operation. The performance impact you’ll see will be on the CPU, so compression might not be a good idea for a server that’s already close to being CPU-bound. Microsoft has a great discussion of the feature and its pros and cons at https://msdn.microsoft.com/en-us/library/dd894051(v=sql.100).aspx.
What About Backups?
Backups are another SQL Server space-hog, and with these you have a lot more options. SQL Server has its own in-product compression option for backups, and almost every third-party backup product offers compression of some kind. De-duplication is a valuable companion to compression, and the two together can often reduce backup sizes by as much as 90%. Hop on your favorite search engine and look for something like “sql backup compression” to start reviewing your options.
It’s All About Saving Space
In the end, it’s possible to dramatically reduce the space your SQL Server databases occupy – even for databases that support third-party applications over which you have little developmental control. There can be a performance price to pay, so it’s worth doing some lab testing to see what you’re up against before you deploy compression in a production environment. Backup compression is another great way to conserve space, and with the right tool you can save a lot of it.
Do you use compression in your production environment? What kind of savings did you achieve, and what kind of performance impact did you run into?