Does Your Code Have a Preamble?

Okay, here is a pet peeve of mine: I think every stored procedure, function, view, etc. should contain a block of code I refer to as a preamble. If yours doesn't, I strongly recommend you start adding one. It drives me crazy when I see code with no documentation of any kind telling me what it is for and when it was written or changed.

Why? A preamble documents the use, need, and changes for the code. It also leaves breadcrumbs as to how, why, and what you did. I don't know about you, but I may code something and not have to change it for two years. When I finally do, I think back and ask, why did I do that, or who changed this code last? Working as a lone DBA, leaving breadcrumbs was critical, as I constantly jumped from task to task.

Below is the kind of preamble I use for all code I write. It tells who wrote it, what it is, what calls it, how to run it, and lists any changes made to it. I find one of the most helpful items is the Run documentation: here I place an exact run statement, which shows how the parameters should look and gives me a quick way to test it.
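Here is a minimal sketch of what that header block might look like (the object name, author, dates, and job name are all illustrative):

/**********************************************************************
Object:      dbo.usp_LoadDailySales          -- illustrative name
Author:      J. Smith
Created:     2017-01-15
Description: Loads the daily sales extract into dbo.DailySales.
Called by:   SQL Agent job "Load Daily Sales" (step 2)
Run:         EXEC dbo.usp_LoadDailySales @LoadDate = '2017-01-15';
Changes:
  2017-03-02  J. Smith   Added error handling for a missing source file.
  2017-06-10  A. Jones   Changed @LoadDate default to GETDATE().
**********************************************************************/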

There are a million and one reasons why you should be doing this in your code. If you're not doing it, take a second and start. You'll thank me for it later.

SQL Sequence vs Identity Column

Let's take a look at what a Sequence is in relation to an Identity column in SQL Server. Did you know Sequences even existed? I didn't until I was asked about them. It's amazing how much you can skip over and never notice in SSMS. Ever notice the little Sequences folder under Programmability in Management Studio? Yep, it's there: SQL Server has this very handy thing called Sequences. They are a relatively new feature, existing only since SQL Server 2012, but they have long existed in Oracle (where there are no identity columns).

What is a Sequence?

Per MSDN, a sequence is a user-defined, schema-bound object that generates a sequence of numeric values according to the specification with which the sequence was created. The sequence of numeric values is generated in ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted. It's important to note that sequence values can be cached and are not guaranteed to be in sequential order.

The Code
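A minimal example of creating a sequence (the name and values are illustrative):

CREATE SEQUENCE dbo.OrderNumberSequence
    AS INT
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 1000000
    CACHE 50;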

After creation, you can look at the properties in the GUI. Note that you can set the increment, restart the sequence, and even set minimum and maximum values.

How to Query
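To get a value, you ask the sequence for the next number (using the illustrative sequence above):

SELECT NEXT VALUE FOR dbo.OrderNumberSequence;

Each call returns the next number in the sequence, independent of any table.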

What’s an Identity Column?

An identity column is a property of a table column that is set with an initial seed value (starting value). For each insert, it assigns a new incremental value that is added to the identity value of the previous row that was loaded.

The Code
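A minimal sketch of a table with an identity column (the table and column names are illustrative):

CREATE TABLE dbo.Person
(
    NameID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name   VARCHAR(50) NOT NULL
);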

Note: after the data type, the IDENTITY property declares the SEED (1), then the INCREMENT value (1). You can see this in the column properties in the GUI.

How to Query

Let’s insert two records and see the NameID Identity column increment.
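A sketch of that insert, using the illustrative table above; the NameID values are generated automatically:

INSERT INTO dbo.Person (Name) VALUES ('First Person');
INSERT INTO dbo.Person (Name) VALUES ('Second Person');

SELECT NameID, Name FROM dbo.Person;   -- NameID comes back as 1, then 2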

Comparing the two

  • Object level: a sequence lives at the database level; an identity is a property of a table
  • Limit: a sequence can have min/max limits set; an identity is limited by the data type (INT vs. BIGINT)
  • Values: a sequence value is generated by an application call using NEXT VALUE FOR; an identity value is generated on INSERT into the table
  • Increments: a sequence INCREMENT is declared at setup and can be anything, including a negative number to make the values descend instead of ascend; an identity INCREMENT is declared at setup and can be any positive number, so the values ascend
  • Scope: sequence values are generated outside the scope of a transaction; identity values are generated within a transaction
  • Number assignment: sequence values can be preallocated (for example, assign me numbers 1-25); identity values cannot be preallocated and are assigned in order by INSERT
  • Gaps: both can experience gaps
  • Uniqueness: a sequence is not guaranteed unique, as the number can be reset and reused; an identity is often used as a primary key (you must choose that property to ensure the value is unique)

Summary

So, this was just a quick look at what a Sequence is compared to an Identity column. Both can be very useful. If you're looking for a unique value, your best bet is to go with an Identity column and the Primary Key option. If you just want an auto-generated value you can use in an application outside of a table, a Sequence is a sure bet. Play around with them; I am sure you can come up with a million and one uses for each.

*** UPDATE: New in SQL Server 2017 ***

Per MSDN, there is a new option:

IDENTITY_CACHE = { ON | OFF }

Applies to: SQL Server 2017 and Azure SQL Database (feature is in public preview)

Enables or disables identity cache at the database level. The default is ON. Identity caching is used to improve INSERT performance on tables with Identity columns. To avoid gaps in the values of the Identity column in cases where the server restarts unexpectedly or fails over to a secondary server, disable the IDENTITY_CACHE option. This option is similar to the existing SQL Server Trace Flag 272, except that it can be set at the database level rather than only at the server level.
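A sketch of disabling it for the current database (run in the context of that database):

ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;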

Just Check ALL the Boxes

Today I ran into something on a client server that I unfortunately see too often: the DBA goes through the trouble of configuring and setting up alerts/operators but doesn't really understand what the options in the configuration mean. So, unfortunately, they take the CYA (cover your ass) approach and check all of them. I have seen this not only with alerts but also with things like security configurations. My advice is to always take a second and research what each option does before you check the little boxes, especially when it comes to security. Always follow the rule that less is more.

In the example below, the administrator enabled alerts for an operator using the CYA approach: they checked e-mail, pager, and net send.

So, what's the big deal? This server experienced an insufficient resources (space) alert that fired every minute, and because pager notifications were enabled, the error log bloated, consumed unnecessary space, and filled with noise.

The administrator of this environment really only needed to configure the e-mail notification, as the company did not use net send and had no pager duties configured. To be honest, I have yet to see an environment use more than that, and per Microsoft, both pager and net send notifications will be removed in a future version.
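A sketch of setting up an operator and attaching it to an alert with e-mail notification only (the operator name, address, and alert name are illustrative, and the alert is assumed to already exist):

EXEC msdb.dbo.sp_add_operator
    @name = N'DBA Team',
    @enabled = 1,
    @email_address = N'dba-team@example.com';

EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 017',
    @operator_name = N'DBA Team',
    @notification_method = 1;   -- 1 = e-mail only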

So, the moral of the story is: please take the time to research what the little checkboxes do before you enable them. The example above is a pretty benign one, but you can imagine what kind of messes you can get yourself into with more critical things like security.

A Side Note:

If you want to learn how to set up your alerts and operators, I've already written a blog post on that, with scripts; you can find it here.

You can also visit github.com/dc-ac for a full install script that includes the alert and operator setups: https://github.com/DC-AC/SQL2016_Scripted_Install

What does this little check box do?

Ever wander around the SQL Server properties and wonder what all those little check boxes turn on? I do, and I get very tempted to check them. Here is one of those tempting little boxes that seems pretty handy: Use query governor to prevent long-running queries.

Syntax
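The server-wide setting can also be changed through sp_configure; a sketch using a 180-second limit:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'query governor cost limit', 180;
RECONFIGURE;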

How Does it Work?

It's simple. This option, available in SQL Server 2008 Standard and later, prevents long-running queries based on run time measured in seconds. If I specify a value of 180, the query governor will not allow execution of any query that it estimates will exceed that value. Notice it says ESTIMATES, which means the decision is based on optimizer estimates and not ACTUAL run times. It does NOT kill an actively running query after the designated amount of time, so there are no worries about rollback scenarios or partial data.

CAUTION

This is an advanced option, and keep in mind it applies to the whole server instance. It will also affect your maintenance queries, so please use it with caution; this is not a "let me check this box for fun" option.

But Wait There’s More

There is also a query-level, "transaction"-based option available to us that limits a specific query. This option estimates the cost of the statement and prevents it from running if it would go over the boundary we have set. Notice we set the limit before the query and then back to 0 afterward.
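A sketch of the session-level version (the query itself is illustrative):

SET QUERY_GOVERNOR_COST_LIMIT 180;   -- limit for this session, in estimated seconds

SELECT * FROM dbo.BigTable;          -- blocked if the estimate exceeds 180

SET QUERY_GOVERNOR_COST_LIMIT 0;     -- 0 = back to no limit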

Again, playing with any old check box is not a recommended practice. Make sure you research it first and understand the full impact before checking that tempting little box.

Please Don’t Do This!

Please, please, please, admins: do not leave your default index fill factor at 0. That tells SQL Server to fill the page 100% full when creating indexes, which forces it to a new page when additional inserts land on a full page. These are called PAGE SPLITS, which take time to perform and are resource-intensive operations. Frequent page splits cause more index fragmentation, decrease performance, and increase I/O.

If you find that this is how your system is configured, all is not lost. You can correct it by changing the default value so that new indexes are created with a proper fill factor, and by rebuilding your existing indexes with a new fill factor value. I like to use 80 across the board; of course, there is always an "it depends" scenario, but 80 is a pretty safe bet. One of those "it depends" cases is a logging table with the correct clustering key that never gets updates in between existing values; there I don't want a fill factor of 80, I'd want 0/100 to maximize page density, since page splits won't occur when the clustered key is monotonically increasing.

Note that leaving the additional 20% free on a page will increase your storage requirements, but the benefit outweighs the cost.

Example syntax for changing the default
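A sketch of changing the server-wide default to 80 (this is an advanced option and requires a restart of SQL Server to take effect):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'fill factor (%)', 80;
RECONFIGURE;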

Example script for rebuilding an index with a new fill factor
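A sketch of rebuilding an existing index with a fill factor of 80 (the table and index names are illustrative):

ALTER INDEX IX_SalesOrder_OrderDate
    ON dbo.SalesOrder
    REBUILD WITH (FILLFACTOR = 80);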

 

VLFs the Forgotten Foe

How many of you check the amount of Virtual Log Files (VLFs) your transaction logs have?

Working as a consultant now, I see this as something that is often ignored by DBAs. It is an easy thing to maintain, and yet so many don't know how. Keeping these in check can give you a performance boost not only on startup but also with your insert/update/delete and backup/restore operations. SQL Server simply performs better with a smaller number of right-sized virtual log files. I highly recommend you add this to your server reviews.

What is a VLF?

Every transaction log is composed of smaller segments called virtual log files. Every time a growth event occurs, new segments (virtual log files) are created at the end of your transaction log file. A large number of VLFs can slow things down.

What causes High VLFs?

As transactions force growth of the log file, inappropriate log file sizing or auto-growth settings can cause a high number of VLFs. Each growth event adds VLFs to the log file; the more often you grow, and the smaller the growth increments, the more VLFs your transaction log will have.

Example

If you grow your log by the default 1 MB, you may end up with thousands of VLFs, as opposed to growing in 1 GB increments. MSDN does a great job of explaining how transaction logs work; for a deeper dive, I recommend reading it.

How do I know how many VLFs my log files have?

It’s very easy to figure out how many VLFs you have in your log file.

Make sure you are in the context of the database you want to run it against (in this case TEMPDB) and run the DBCC LOGINFO command.
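For example:

USE tempdb;
GO
DBCC LOGINFO;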

The command returns one row for each virtual log file in that database; the COUNT of those rows is the number of VLFs you have.

Now there are many ways you can get fancy with it using TSQL, so have fun with it. Write something that rolls through all your databases and gives you the record counts for each. There are plenty of useful examples on the internet.

Ideally, the VLF count should be under 100; anything above that should be addressed.

New for SQL Server 2017 is a DMV that gives you an even easier way to get the VLF count: sys.dm_db_log_stats ( database_id ).
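A quick sketch of using it (assuming the total_vlf_count column it exposes):

SELECT total_vlf_count
FROM sys.dm_db_log_stats(DB_ID('tempdb'));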

How do you Fix?

These transaction log files should be shrunk until there are only two VLFs, then grown in chunks back to the current size.

  • Perform Shrink using DBCC SHRINKFILE

  • Regrow your log in increments that make sense for your environment. However, if your file growth is in excess of 8 GB, it is recommended to grow in 8000 MB chunks while manually regrowing the file. Your autogrowth should be set to a lower value. There is no set rule for what those values should be; it may take trial and error to figure out what is best for your environment.

Note: Growing out your log can cause a performance hit and block ongoing transactions, so be sure to perform this during a maintenance window.
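A sketch of the shrink-and-regrow steps (the database and log file names are illustrative; adjust the target size to your environment):

USE MyDatabase;
GO
DBCC SHRINKFILE (MyDatabase_log, 1);   -- shrink the log as far as possible
GO
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_log, SIZE = 8000MB);   -- regrow in chunks to the desired size
GO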

It's that simple. Now go take a look at your files; you may be surprised by what you find.

Ooops! Was that me? (Blog Challenge)

We have all made mistakes in our careers. I thought I'd share one of mine as a quick tip so that you don't make the same one.

Everyone has their SQL alerts set up, right? If not, I have included a script below, and here is the MSDN link to find out more (https://msdn.microsoft.com/en-us/library/ms180982.aspx).


For those who have set up their alerts, how many of you remembered to set the DELAY BETWEEN RESPONSES setting?


When I worked at the Port of Virginia, I was a little less experienced with SQL Server and didn't notice this lovely little option, so of course I failed to set it. Can anyone guess what happened? Yep: we got low on resources in the wee hours of the morning, and SQL Server kicked off a Severity 017 (Insufficient Resources) alert. Thousands of emails were generated, which took the Exchange server down and caused some other issues as well. The worst part is that all the emails had to finish processing before we could delete them from the system. I think when all was said and done, there were well over 250k messages created.

So the moral of the story is: pay attention to this tiny little option when you set up your alerts. Your Exchange admin will thank you for it.

Blog Challenge


Do you have an "Oops, was that me?" story to tell? If so, share it using the hashtag #sqlmistakes and link back to this blog, so we can all learn from each other. I can't wait to hear your stories.

Create Alert Script
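A sketch of creating such an alert with the delay set, using sp_add_alert (the alert and operator names are illustrative, and the operator is assumed to already exist):

EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 017 - Insufficient Resources',
    @severity = 17,
    @enabled = 1,
    @delay_between_responses = 900,      -- 15 minutes, in seconds
    @include_event_description_in = 1;   -- include the error text in the e-mail

EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 017 - Insufficient Resources',
    @operator_name = N'DBA Team',
    @notification_method = 1;            -- e-mail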

 

Run Book, Run!!!

How many of you actually have a "Hit-by-the-Bus" handbook? What is that, you ask? It is a document that explains how to execute all your jobs and SSIS packages. In addition, I preface mine with all the key elements someone might need, like where passwords are stored, architectures, backup times, where backups are stored, etc., and then dig into the job steps. The purpose of this document is so that someone with some SQL skills could step in if needed. You never know when you will be hit by a bus, or win the lottery, and someone has to take over for you.

Important things to note:

  • Step by Step with Pictures
  • Diagrams – Pictures are worth a thousand words
  • Plain English— Do this, then this, because of this, and watch out for that
  • Jobs – Rerun information, what to do if a job fails, and what not to rerun (and when)
  • Make a HARD Copy

Here is an example:

SERVER NAME

JOBS

LoadEDIDataandValidate: Imports a file (\\EDI_FTP\CUSTOM_HOLD_RELEASE\EDI35020110908.log) of EDI records that were sent from Gentran to Server A and Server B. It then validates that Server A and Server B have posted those records to their systems. Alerts are sent when something does not post within 15 minutes or when a record sits in QUEUE status on Server B for more than 60 minutes. Server A and Server B data are kept separate on purpose; do not combine those tables. As of 3/9/2015, it also sends out a text message if the count of unposted records is >50.

Schedule: Runs daily every 15 minutes between 2:16 and 11:21 am. This corresponds to 15 minutes after Gentran begins and ends its daily processing.

Steps: Executes the SSIS package EDI350Import.dtsx and executes two stored procedures: jobValidateEDIServerAEDI350ServerB and jobValidateEDI350.

Rerun: Can be rerun at any time. Right-click on the Agent job and choose Start Job at Step…; there is only one step.


Here are some other examples of rerun information (try to be as clear as possible):

Rerun: Can be rerun prior to 4 pm. If run after 4 pm, you'll have to manually change the date (@pdate) of the data being pulled. Always verify that no partial data was brought into the table before rerunning; clear out any data that was loaded.

Rerun: Do not rerun. Load the data manually to Server X for any missing data, using date_billed as the key field for the data pull.

Rerun: This job will fail if there is a duplicate XXX number. You'll need to resolve the duplicate before you can successfully rerun. It can be rerun prior to 4 pm. If run after 4 pm, you'll have to manually change the date (@pdate) of the data being pulled. Always verify that no partial data was brought into the table before rerunning; clear out any data that was loaded.

Why Share My Knowledge?

Don't try to build job security into what you do. I know many who worry about giving up their knowledge to others. Having the sole "how to" knowledge gives some a sense of job security. While to a point that might be true, it also locks you into your current position. Many who hoard their knowledge never advance, because they have made themselves irreplaceable where they are: "We can't move them because they are the only ones who know about such and such." Why put yourself in that position? If you can't ever be replaced, you also can't move up.

As a lone DBA, I find this run book to be vital. It allows me to direct someone to the book and walk them through running anything I need them to in my absence. It allows me to take a vacation or a day off while giving others the tools to get things done.

Why is it important to have a hard copy?

I've found over the years that having tangible steps in hand to follow and make notes on helps those who have to cover for me. It's very easy for them to grab the book off my shelf and follow steps 1, 2, and 3. It also gives them a place to take notes as they go, which I can later use to modify the documentation for better clarity.

If you don't have a run book, I highly suggest you take the time to make one. Keep in mind a run book is only a helping guide; I automate as much error handling as possible and build in code to minimize the need for the book. However, in my opinion it is invaluable. The book gives someone else what they need to cover for you, and when that day comes when you win the lottery, you will have left everyone great notes on how to run things.

Now, off to buy that lottery ticket. Wish me luck!

Hide and Group Columns in SSRS Using a Parameter

Ever had users come to you and request another version of a report just to add another field and group the data differently? Today was such a day for me. I really don't like having multiple versions of the same report out there, so I got a little fancy with the current version of the report: I added a parameter, then used expressions to group the data differently and hide columns. For those new to SSRS, I've embedded some links to MSDN to help you along the way.

Current Report

The report gives summarized counts by invoice date. It currently has a ROW group using date_invoiced, and the detail row is hidden from the user.


New Version

To complete the user's request to have Item Codes and Descriptions added to the report, I need a way to group the data by Item and show the Item columns without disturbing the current report, which is used by many consumers.

To Do:

  • Add Parameter
  • Set Available Values
  • Set Default Values
  • Add New Columns
  • Change Visibility
  • Change Grouping to group data using parameter

Step 1: Add Parameter


Step 2: Set Available Values


Step 3: Set Default Values. I want to make sure my current users still get their version of the report with no extra effort, so I set the default to No (N).


Step 4: Next, add the columns. I was lucky that the fields the user requested (Item Code, Item Desc) were already part of the dataset, so no additional coding was needed in the stored procedure.


Step 5: Next, change the Visibility attributes. You want to HIDE the column when the IncludeItemDetails parameter is NOT Yes (Y). I did this for both item columns.
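A sketch of the Hidden expression I used on each item column (the parameter name comes from Step 1; the exact field names in your report may differ):

=IIF(Parameters!IncludeItemDetails.Value = "Y", False, True)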


Step 6: Next, I needed to change the grouping. The report is currently grouped by date_invoiced only. To make the data total by Item, I need to group by Item only when the IncludeItemDetails parameter is Yes (Y). I did this using an IIF expression: if IncludeItemDetails = Y, group using the field value; otherwise don't (0). Again, I did this for both fields.
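A sketch of the group expression (the field name item_code is illustrative):

=IIF(Parameters!IncludeItemDetails.Value = "Y", Fields!item_code.Value, 0)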


You will see it’s relatively simple to do, and prevents a whole new report version from being created. For you beginners out there, it’s a very easy way to start to minimize the number of reports you have to maintain. Try it.

 

 

Back to Basics: Why not parameterize?

I think sometimes those of us who have been doing database administration/development for a while take it for granted that everyone knows the basics. One such basic is parameterizing stored procedures, which allows us to consolidate multiple stored procedures into a single procedure. It's a simple thing to do, yet many don't.

I try to parameterize as many stored procedures as possible. This not only minimizes the number of procedures I need to maintain; in my opinion, it is also a much cleaner way to code. It disturbs me when I see multiple stored procedures that pull the exact same data but have slight differences between them. Whether it's a sort, a WHERE clause, or just an extra field or two, some developers think you need a different procedure for each one. Why not consolidate and parameterize?

Exhibit A

The code below is an example from a real work scenario. Originally, there were 8 stored procedures with 8 corresponding reports. By simply adding a Report Type parameter, I was able to make it one stored procedure and consolidate to a single report.
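A simplified sketch of the idea (the procedure name and parameters come from the report dataset below; the table, columns, and report-type values are illustrative):

CREATE PROCEDURE dbo.rptTrackMonthlyStats
    @ReportType VARCHAR(2),    -- e.g. 'T' or 'TD'
    @year       INT,
    @startdate  DATE,
    @enddate    DATE
AS
BEGIN
    SET NOCOUNT ON;

    IF @ReportType = 'T'        -- summary layout
        SELECT MONTH(s.stat_date) AS stat_month,
               COUNT(*)           AS total_count
        FROM dbo.MonthlyStats AS s
        WHERE YEAR(s.stat_date) = @year
          AND s.stat_date BETWEEN @startdate AND @enddate
        GROUP BY MONTH(s.stat_date);
    ELSE IF @ReportType = 'TD'  -- detail layout
        SELECT s.stat_date,
               s.item_code,
               COUNT(*) AS total_count
        FROM dbo.MonthlyStats AS s
        WHERE YEAR(s.stat_date) = @year
          AND s.stat_date BETWEEN @startdate AND @enddate
        GROUP BY s.stat_date, s.item_code;
END;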

To add a new dataset, just right-click on Datasets and choose Add Dataset. Since the report is driven by a stored procedure, we set the dataset query text to the stored procedure name and its parameters. This is just my preferred method; you can also choose the stored procedure from the drop-down.


rptTrackMonthlyStats @ReportType, @year, @startdate, @enddate


In the Report Type parameter, add the Available Values. I typed in each option so the user can choose which report layout/data they want to see from the drop-down. That parameter is passed to the stored procedure upon execution, and the proper dataset is returned. The users never see the T, TD, etc.; they only see the label, so it makes no difference to them what those values are.


You can even go as far as using these parameters to hide and show different report elements, but that's for another time. Stay tuned for more back-to-basics posts.

NOTE: There are some reasons not to do this, like execution plan reuse and parameter sniffing, but in this case consolidating was not an issue, as the procedures used the same parameters.