Does Your Code Have a Preamble?

Okay, here is a pet peeve of mine: I think every stored procedure, function, view, etc. should contain a block of code I refer to as a preamble. If yours doesn’t, I strongly recommend you start adding one. It drives me crazy when I see code with no documentation of any kind telling me what it is for and when it was written or changed.

Why? A preamble documents the use, need, and changes for the code. It also leaves breadcrumbs as to how, why, and what you did. I don’t know about you, but I may code something and not have to change it for two years. When I do, I think back and ask why I did that, or who changed this code last. Working as a lone DBA, leaving breadcrumbs was critical as I constantly jumped from task to task.

Below is an example of the preamble I use for all code I write. It tells who wrote it, what it is, what it is called by, how to run it, and lists any changes made to it. I find one of the most helpful items is the Run documentation. Here I place an exact run statement. It shows how the parameters should look and gives me a quick way to test it.
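Here is a minimal sketch of the layout; the procedure name, report, dates, and parameters are just placeholders.

```sql
/*=====================================================================================
Object:         dbo.rptMonthlySales            -- placeholder name
Author:         <your name>
Created:        2018-01-15
Description:    Returns monthly sales totals for the monthly sales report.
Called By:      Reporting Services report MonthlySales.rdl
Run:            EXEC dbo.rptMonthlySales @StartDate = '2018-01-01', @EndDate = '2018-01-31';
Change History:
    2018-01-15  <your name>     Initial version
    2018-06-03  <your name>     Added @EndDate parameter
=====================================================================================*/
```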

There are a million and one reasons as to why you should be doing this in your code. If you’re not doing it just take a second and start doing it. You’ll thank me for it later.

Just Check ALL the Boxes

Today I ran into something on a client server I unfortunately see too often. The DBA goes through the trouble of configuring and setting up alerts and operators but doesn’t really understand what the options in the configuration mean. So unfortunately, they take the CYA (cover your ass) approach and check all of them. I have seen this not only with alerts but also with things like security configurations. My advice is to always take a second and research what each option does before you check the little boxes, especially when it comes to security. Always follow the rule that less is more.

In the case I ran into, the administrator had enabled alerts for an operator using the CYA approach: they checked email, pager, and net send.

So, what’s the big deal? This server experienced an insufficient resources (space) alert that fired every minute, and because pager notifications were enabled it caused the error log to bloat, consumed unnecessary space, and created noise in the logs.

The administrator of this environment really only needed to configure the email notification, as the company did not use net send nor have pager duties configured. To be honest, I have yet to see an environment use more than that, and per Microsoft both Pager and Net Send will be removed in future versions.
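Scripted out, the email-only setup is just a couple of calls against msdb; the operator name, address, and alert name below are made up for illustration.

```sql
USE msdb;
GO
-- create the operator with an email address only; leave pager and net send alone
EXEC dbo.sp_add_operator
    @name          = N'DBA Team',                -- hypothetical operator name
    @enabled       = 1,
    @email_address = N'dbateam@example.com';     -- hypothetical address
GO
-- tie an existing alert to the operator using email only (notification method 1 = email)
EXEC dbo.sp_add_notification
    @alert_name          = N'Severity 017 - Insufficient Resources',  -- hypothetical alert name
    @operator_name       = N'DBA Team',
    @notification_method = 1;
GO
```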

So, the moral of the story is, please take the time to research what the little checkboxes do before you enable them. The example above is a pretty benign one, but you can imagine what kind of messes you can get yourself into with other, more critical things like security.

A Side Note:

If you want to learn how to set up your alerts and operators, I’ve already written a blog on that, with scripts; you can find it here.

You can also visit github.com/dc-ac for a full install script that includes the alert and operator setups: https://github.com/DC-AC/SQL2016_Scripted_Install

Hmmm… What’s This?

OK, so I was doing some digging and peeking around in SQL Server again and came across a database option called Date Correlation Optimization Enabled = False. Honestly, I had no clue what it did, so I took it as a learning opportunity to look into it and do a little research. Who knows, it may actually help me solve one of the many problems I run into day to day for clients.

Syntax
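The option is a database-level setting; substitute your own database name.

```sql
-- turn the option on for a single database
ALTER DATABASE [YourDatabase]
SET DATE_CORRELATION_OPTIMIZATION ON;
```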

So, What Does It Do?

According to MSDN – The DATE_CORRELATION_OPTIMIZATION database SET option improves the performance of queries that perform an equi-join between two tables whose date or datetime columns are correlated, and which specify a date restriction in the query predicate.

How many of you read what MSDN says and think “wuuuuuttt, English please”? I do.

In English

Basically, it uses a foreign key relationship between tables in SQL Server to enhance the performance of date and datetime queries when the dates in the two tables fall within a certain range of each other (i.e. they correlate). Ok, that’s cool, but what’s the big deal? The power really comes in for things like reporting, validation, and data warehouses. With this option turned on, SQL Server maintains statistics between the correlated columns and creates improved execution plans that read less data.

Let’s See It in Action

Consider this: all internet orders that are received must be sent out (the due date) within 10 days of when the order is received. Therefore, the OrderDate and DueDate are correlated; they are related to each other.

Here is a query you would normally run.
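Something along these lines, using a pair of hypothetical order tables where the header holds the OrderDate and the detail holds the DueDate, joined on a foreign key:

```sql
-- hypothetical tables: dbo.OrderHeader (OrderDate) and dbo.OrderDetail (DueDate),
-- joined on a foreign key, both with datetime columns
SELECT  oh.OrderID,
        oh.OrderDate,
        od.DueDate,
        od.LineTotal
FROM    dbo.OrderHeader AS oh
JOIN    dbo.OrderDetail AS od
        ON od.OrderID = oh.OrderID
WHERE   oh.OrderDate BETWEEN '2017-01-01' AND '2017-01-31';
```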

Without DATE_CORRELATION_OPTIMIZATION turned on, the optimizer would create a plan just like it would for anything else; with it set to ON, the optimizer can make more granular execution plans.

Here’s how

With each INSERT, UPDATE, and DELETE between these two tables, SQL Server gathers statistics which help the optimizer infer the query to be more like the one below. This is where the power comes in. The optimizer can better narrow down the records it needs to read and therefore return results faster.

Here is the way SQL Server interprets the dates now that correlation is turned on and statistics are being gathered. Based on those statistics, it can now infer that each DueDate is exactly 10 days after the OrderDate.
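In effect, it is as if the correlated date restriction had been written out explicitly (same hypothetical tables as above):

```sql
-- the implied predicate on DueDate lets the optimizer read far fewer rows
SELECT  oh.OrderID,
        oh.OrderDate,
        od.DueDate,
        od.LineTotal
FROM    dbo.OrderHeader AS oh
JOIN    dbo.OrderDetail AS od
        ON od.OrderID = oh.OrderID
WHERE   oh.OrderDate BETWEEN '2017-01-01' AND '2017-01-31'
  AND   od.DueDate   BETWEEN '2017-01-11' AND '2017-02-10';
```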

Depending on the number of records in the two tables this can be a VERY significant decrease to execution times.

The Caution

You should not enable DATE_CORRELATION_OPTIMIZATION in update-intensive database environments. SQL Server keeps all the correlation information as statistics, which means every INSERT, UPDATE, and DELETE incurs additional overhead.

As always, be sure to test it before you use it in production.

What does this little check box do?

Ever wander around SQL Server properties and wonder what those little check boxes turn on? I do, and I get very tempted to check them. Here is one of those tempting little boxes that seems pretty handy: “Use query governor to prevent long-running queries.”

Syntax
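It is set server-wide with sp_configure; 180 here is just an example value.

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
-- 0 means off; any other value is the limit the optimizer's estimate is checked against
EXEC sp_configure 'query governor cost limit', 180;
RECONFIGURE;
GO
```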

How Does it Work?

It’s simple. This option, available in SQL Server 2008 Standard and forward, will prevent long-running queries based on run time measured in seconds. If I specify a value of 180, the query governor will not allow the execution of any query that it estimates will exceed that value. Notice it says ESTIMATES, which means it is based on optimizer estimates and not ACTUAL run times. It does NOT KILL an actively running query after the designated amount of time, so there are no worries about rollback scenarios or partial data.

CAUTION

This is an advanced option, and keep in mind it is instance-wide. It will also affect your maintenance queries, so please use it with caution; this is not a “let me check this box for fun” option.

But Wait There’s More

Now there is also a session-level option available to us that will limit a specific query. This option will take the query’s estimate and prevent it from running if it would go over the limit we have set. Notice we set the limit before the query and then back to 0 after.
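A quick sketch; the SELECT is just a stand-in for your real query.

```sql
-- limit just this session: any query estimated to exceed 180 will not be started
SET QUERY_GOVERNOR_COST_LIMIT 180;

SELECT COUNT(*) FROM sys.objects;   -- stand-in for your real query

-- turn the limit back off for the session
SET QUERY_GOVERNOR_COST_LIMIT 0;
```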

Again, playing with any old check box is not a recommended practice. Make sure you research it first and understand the full impact before checking that tempting little box.

Please Don’t Do This!

Please, please, please, admins, do not leave your default index fill factor at 0. This means you are telling SQL Server to fill the page 100% full when creating indexes. It also means you are forcing it to a new page when additional inserts are done. These are called page splits, which take time to perform and are a resource-intensive operation. Leaving pages completely full will cause more page splits and index fragmentation, decrease performance, and increase IO.

If you find that this is how your system is configured, all is not lost. You can correct it by changing the default value so that new indexes are created with a proper fill factor, and by rebuilding your existing indexes with another fill factor value. I like to use 80 across the board for most; of course there is always the “it depends” scenario, but 80 is a pretty safe bet. One of those “it depends” cases is a logging table that has the correct clustering key and never gets inserts in between existing values (make sense?); there I don’t want a fill factor of 80. I’d want 0/100 to maximize page density, since page splits won’t occur if the clustered key is monotonically increasing.

Note: having the additional 20% free on a page will increase your storage requirements, but the benefit outweighs the cost.

Example syntax for changing the default
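Something along these lines; fill factor is an advanced sp_configure option, and 80 is just my example value.

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
-- new indexes created without an explicit FILLFACTOR will now default to 80
-- note: this server setting only takes effect after the instance is restarted
EXEC sp_configure 'fill factor (%)', 80;
RECONFIGURE;
GO
```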

Example script for rebuilding an index with a new fill factor
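For example (the index and table names are placeholders):

```sql
-- rebuild a single index with an explicit fill factor of 80
ALTER INDEX IX_Orders_OrderDate
ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80);
```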

 

VLFs the Forgotten Foe

How many of you check the amount of Virtual Log Files (VLFs) your transaction logs have?

Working as a consultant now, I see this as something that is often ignored by DBAs. It is an easy thing to maintain and yet so many don’t know how. Keeping these in check can give you a performance boost not only on startup but with your insert/update/delete as well as backup/restore operations. SQL Server performs better with a smaller number of right-sized virtual log files. I highly recommend you add this to your server reviews.

What is a VLF?

Every transaction log is composed of smaller segments called virtual log files. Every time a growth event occurs, new segments (virtual log files) are created at the end of your transaction log file. A large number of VLFs can slow things down.

What causes High VLFs?

As transactions force growth of the log file, inappropriate log file sizing or auto-growth settings can cause a high number of VLFs. Each growth event adds VLFs to the log file. The more often you grow, combined with smaller growth increments, the more VLFs your transaction log will have.

Example

If you grow your log by the default 1 MB, you may end up with thousands of VLFs, as opposed to growing in 1 GB increments. MSDN does a great job of explaining how transaction logs work; for a deeper dive I recommend reading it.
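Fixing the autogrowth increment itself is a one-liner; the database and log file names below are placeholders.

```sql
-- swap a tiny default growth increment for a fixed 1 GB increment
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_log', FILEGROWTH = 1024MB);
```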

How do I know how many VLFs my log files have?

It’s very easy to figure out how many VLFs you have in your log file.

Make sure you are in the context of the database you want to run it against, in this case tempdb, and run the DBCC LOGINFO command.
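For tempdb that looks like this:

```sql
USE tempdb;
GO
DBCC LOGINFO;   -- returns one row per virtual log file
GO
```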

The command will return one row per virtual log file for that database; the count of those rows is the number of VLFs you have.

Now there are many ways you can get fancy with it using T-SQL, so have fun with it. Write something that rolls through all your databases and gives you the counts for each. There are plenty of useful examples on the internet.

The VLF count should ideally be under 100; anything above that should be addressed.

*New for 2017 is a DMV that gives you an even easier way to get the VLF counts: sys.dm_db_log_stats ( database_id ).
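If memory serves, the DMV exposes a total_vlf_count column, so a quick check across all databases looks something like this:

```sql
-- VLF count per database, SQL Server 2017 and later
SELECT  d.name AS database_name,
        ls.total_vlf_count
FROM    sys.databases AS d
CROSS APPLY sys.dm_db_log_stats(d.database_id) AS ls
ORDER BY ls.total_vlf_count DESC;
```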

How do you Fix?

These transaction log files should be shrunk until there are only two VLFs, then grown in chunks back to the current size.

  • Perform a shrink using DBCC SHRINKFILE

  • Regrow your log in an increment that makes sense for your environment. However, if your target size is in excess of 8GB it is recommended to grow in 8000MB chunks while manually regrowing the file. Your autogrowth should be set to a lower value. There is no set rule for what those values should be; it may take trial and error to figure out what is best for your environment. A sketch of the shrink-and-regrow process follows this list.
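Here is a bare-bones sketch of those two steps; the database name, log file name, and sizes are placeholders.

```sql
USE [YourDatabase];
GO
-- Step 1: shrink the log file as small as possible to collapse the existing VLFs
DBCC SHRINKFILE (N'YourDatabase_log', 1);
GO
-- Step 2: grow it back to its working size in sensible chunks (8000MB at a time for large logs)
ALTER DATABASE [YourDatabase]
    MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 8000MB);
GO
ALTER DATABASE [YourDatabase]
    MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 16000MB);
GO
```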

Note: Growing out your log can cause a performance hit and block ongoing transactions, so be sure to perform this during a maintenance window.

It’s that simple; now go take a look at your files. You may be surprised by what you find.

Back to Basics: Why not parameterize?

I think sometimes those of us who have been doing database administration/development for a while take it for granted that everyone knows the basics. One such basic is parameterizing stored procedures. This allows us to consolidate multiple stored procedures into a single procedure. It’s a simple thing to do that many don’t.

I try to parameterize as many stored procedures as possible. This not only minimizes the number of procedures I need to maintain, it is, in my opinion, a much cleaner way to code. It disturbs me when I see multiple stored procedures that pull the exact same data but have slight differences between them. Whether it’s a sort, a where clause, or even just an extra field or two, some developers think you need a different procedure for each one. Why not consolidate and parameterize?

Exhibit A

The code below is an example of a real work scenario. Originally, it was 8 stored procedures with 8 correlated reports. By simply adding a Report Type parameter I was able to make it one stored procedure as well as consolidate to a single report.
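Here is a trimmed-down sketch of the idea; the procedure and parameter names come from the report setup below, but the table, columns, and report type logic are placeholders.

```sql
-- One procedure, one Report Type parameter, one report.
-- dbo.MonthlyStats and the 'T'/'TD' layouts are hypothetical, for illustration only.
CREATE PROCEDURE dbo.rptTrackMonthlyStats
    @ReportType VARCHAR(10),
    @year       INT,
    @startdate  DATE,
    @enddate    DATE
AS
BEGIN
    SET NOCOUNT ON;

    IF @ReportType = 'T'        -- totals-only layout
        SELECT  MONTH(s.StatDate) AS StatMonth,
                SUM(s.StatValue)  AS MonthlyTotal
        FROM    dbo.MonthlyStats AS s
        WHERE   s.StatYear = @year
          AND   s.StatDate BETWEEN @startdate AND @enddate
        GROUP BY MONTH(s.StatDate)
        ORDER BY StatMonth;
    ELSE IF @ReportType = 'TD'  -- totals-with-detail layout
        SELECT  s.StatDate,
                s.StatCategory,
                s.StatValue
        FROM    dbo.MonthlyStats AS s
        WHERE   s.StatYear = @year
          AND   s.StatDate BETWEEN @startdate AND @enddate
        ORDER BY s.StatDate, s.StatCategory;
END;
GO
```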

To add a new dataset, just right click on Datasets and choose Add Dataset. Since the report is driven by a stored procedure, we set the dataset query text to the stored procedure name and its parameters. This is just my preferred method; you can also choose the stored procedure from the drop-down.


rptTrackMonthlyStats @ReportType, @year, @startdate, @enddate


In the Report Type parameter properties, add the Available Values. I typed in each option so the user can choose which report layout/data they want to see from the drop-down. That parameter is passed to the stored procedure upon execution and the proper dataset is returned. The users never see the T, TD, etc.; they only see the label, so it doesn’t make any difference to them what those values are.


You can even go as far as using these parameters to hide and show different report elements, but that’s for another time. Stay tuned for more back to the basics.

NOTE: There are some reasons not to do this, like reuse of execution plans and parameter sniffing, but in this case consolidating was not an issue as the procedures use the same parameters.

Initial SQL Server Configurations

Wonder if I Do Things Differently?

I am always wondering what other DBAs do and whether I do things differently. One such thing is my initial server setup; basically, what I configure on each of my new servers. So, why not blog about it and see what others chime in with after they read this? Keep in mind that everyone has different requirements and different ways they like to do the actual configurations.

For now, I am not going to go into what each of these configurations does and why I choose the values I do. That’s for another time. If you want that information you can always go to Books Online or just Google it. In this case, I am just going to give you a running list, with scripts, that I’ve added to over the years based on best practices and experience.

How Does Yours Compare?

I’d really love to hear what others do and whether I may have missed something that you like to implement that could benefit me as well. So leave a comment, tweet at me, or send me an email; let’s compare notes.

The List

So here are the basic setups I do on every server post-install, in no particular order. A sample script covering a few of them follows the list.

* Value varies based on server configuration

  1. Min and Max Memory *
  2. Enable and Configure Database Mail (this is only to enable; the full script will be in a later post)
  3. Set Default Database Locations
  4. Set SQL Agent Job History Retention
  5. Set Cost Threshold for Parallelism *
  6. Set Max Degree of Parallelism *
  7. Set Optimize for Ad Hoc Workloads
  8. Change Number of Error Logs
  9. Create Cycle Error Log Job
  10. Add Additional TempDB Files, All With the Same Size and Growth Rates *
  11. Set Media Retention
  12. Set Backup Compression Default On
  13. Change to Audit Successful and Failed Logins
  14. Set Default Growths in Model to Not Be Percentages
  15. Set Ad Hoc Distributed Queries Off
  16. Set CLR Enabled Off
  17. Set Ole Automation Procedures Off
  18. Set Scan for Startup Procs Off
  19. Set xp_cmdshell Off
  20. Set Up Operators
  21. Set Up Alerts 17-25 and Error Codes 823, 824, 825 (remember to add the alerts to the operator)
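To give a flavor of the scripting, here are a few of the items above set via sp_configure; the values are examples only, and the starred items vary by server.

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
EXEC sp_configure 'max server memory (MB)', 28672;        -- max memory, varies by server *
EXEC sp_configure 'cost threshold for parallelism', 50;   -- *
EXEC sp_configure 'max degree of parallelism', 4;         -- *
EXEC sp_configure 'optimize for ad hoc workloads', 1;
EXEC sp_configure 'media retention', 3;
EXEC sp_configure 'backup compression default', 1;
EXEC sp_configure 'Ad Hoc Distributed Queries', 0;
EXEC sp_configure 'clr enabled', 0;
EXEC sp_configure 'Ole Automation Procedures', 0;
EXEC sp_configure 'scan for startup procs', 0;
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
GO
```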

Note: Most of these can be set using the GUI as well as the scripts above. Also, in addition to these configurations, I make sure that the server is brought up to the most current stable CU or Service Pack. Everyone’s environment is different; my list may not be right for you.

So let’s talk naming conventions

How many of you have come across a database that had stored procedures, views, or functions where you had no clue, by name, what they were for? Having standard naming conventions helps to prevent that. Everyone has their own preferences and opinions on what they should be, so I thought I’d share mine.

My opinion

In a nutshell, the name of any object should be informative: specifically, what the object is used for and where it is used. This is accomplished by utilizing prefixes in conjunction with specific naming conventions. I apply these standards to all of my stored procedures, views, and functions whenever possible. Of course, there are always times where naming conventions don’t fit the situation; you have to be flexible and logically choose a name that fits that scenario.

Here Are My Standards


Stored Procedures

  • Stored Procedure name should accurately reflect the content and function
  • No Spaces
  • No Underscores
  • Use Camel Case only, for example “rptTheNewStoredProcedure”
  • The following prefixes should be used for all names
  • New prefixes can be added as needed when deemed applicable

  • rpt – Reporting Services report
  • app – Application
  • web – Website exclusively
  • xls – Direct Excel data pull
  • ssis – Stored procedure only used in SSIS packages
  • job – SQL Agent job step
  • cube – Analysis Services cube data

If a stored procedure has multiple uses, I choose the highest entry in the usage hierarchy as the prefix. For example, take a job that calls a stored procedure for an application. The job is what calls it, so it would be named with the job prefix. Using the prefix job will point me to SQL Agent. There I should see a job with a job step of the same name as the stored procedure, minus the prefix, and a stored procedure of the same name prefixed with job. At any point it should give the DBA a trail to follow. Don’t duplicate a stored procedure to use in multiple places; otherwise, any change will have to be made multiple times.

Views

  • View name should accurately reflect the content and function
  • No Spaces
  • No Underscores
  • Use Camel Case only, for example “rptVwTheNewView”

  • rptVw – Reporting Services report
  • appVw – Application
  • webVw – Website exclusively
  • jobVw – Job related
  • cubeVw – Analysis Services cube

I add Vw just so I can differentiate a view from a stored procedure or table. Now you may question why I add Vw to views but not a p or proc prefix to stored procedures. It’s simply because, for me, stored procedures are the default and don’t need it.

Functions

  • Function name should accurately reflect the content and function
  • No Spaces
  • No Underscores
  • Prefix with fx
  • Use Camel Case only, for example “fxTheNewFunction”

Because functions can be called by lots of objects, I tend to only use fx and not add rpt, etc. Functions are the only objects that deviate from the norm for me.

Right Way

The best example I can give of how I use naming conventions to identify an object’s purpose is a stored procedure for a Reporting Services report. The stored procedure should be named exactly what the report is or will be named, but with the prefix rpt. Whoever comes into the database should be able to look at the name and know:

  1. It is a stored procedure called by a report
  2. There will be an .rdl file that has the exact same name minus the prefix

Exhibit A

Stored Procedure: rptCostofManufacturedItems

Report Services Report: CostofManufacturedItems.rdl

The Not So Right Way



Like the toilet paper roll being put on the wrong way, it is a huge pet peeve of mine to see stored procedures named in a way that tells you nothing about what they are. I get irritated when people prefix a stored procedure with p_ or proc; it’s just a thing for me. I also despise the wretched underscore and mismatched casing. As you can probably tell, I am particularly persnickety when it comes to naming conventions.

Exhibit B (a real exhibit I have encountered)

Stored Procedure: p_DatabaseA_Skus_customers_ROLLING_vj_v5

Report Services Report: customer items.rdl
The name, as I figured out, was “p” for procedure, the database name (which makes no sense since I found it in that database), then the procedure’s use, the developer’s initials, and finally a version number. As you can see, the report it is associated with tells you in absolutely no way, shape, or form that it uses that procedure. Notice the “v5” in the name. Yep, there were 5 of these puppies. I could not tell without locating the report which one was actually used. Finding that report was a task in and of itself, since the report was named completely differently from the stored procedure. With naming conventions like this you waste time just trying to figure out what goes with what.

What’s your standard?

Everyone’s environment is different and some may not agree with my opinions, but this works for me. I’d be interested to hear what standards other people use. Maybe I can pick up an idea or two.

The key is to come up with standards of your own and implement them. Use them not only for the objects I listed but also for triggers, indexes, jobs, and replication. It’s less important WHAT your conventions are; it’s more important that you have them. Pick what works for your company’s business model and push all those who deploy to production, including report writers, to use the same naming conventions. Cleanly named objects make for happy DBAs.