Search This Blog

Tuesday, November 9, 2010

SQL Server Setup has encountered the following error:File format is not valid

Today , we faced an issue where SQL Server 2008 R2 setup was failing at the very beginning .This issue can also be reproduced on 2008 and also in 2005 (in a slightely different way) .Please find the RCA below :

Version : SQL Server 2008 R2
OS : Win Server 2008

Error :
TITLE: SQL Server Setup failure.
------------------------------
SQL Server Setup has encountered the following error:
File format is not valid..
------------------------------
BUTTONS:
OK
------------------------------

Resolution :
Its very clear that there is a file that does not have a correct format and SQL Server cannot read it .So we need to find which file is that .We need to first open the Setup logs .In SQL 2008 a folder is created with the timstamp and all the logs are created inside it .On my machine it was C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20101109_201205 .

I first opened Detail_ComponentUpdate.txt

2010-11-09 20:13:14 Slp: Running Action: GatherUserSettings
2010-11-09 20:13:21 Slp: -- PidPublicConfigObject : ValidateSettings is normalizing input pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPrivateConfigObject : NormalizePid is normalizing input pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPrivateConfigObject : NormalizePid found a pid containing dashes, assuming pid is normalized, output pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPublicConfigObject : ValidateSettings proceeding with normalized pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPrivateConfigObject : Initialize is initializing using input pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPrivateConfigObject : NormalizePid is normalizing input pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPrivateConfigObject : NormalizePid found a pid containing dashes, assuming pid is normalized, output pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPrivateConfigObject : Initialize proceeding with normalized pid=[PID value hidden]
2010-11-09 20:13:21 Slp: -- PidPrivateConfigObject : Initialize called ValidatePid, output is pid=[PID value hidden] validateSuccess=True output editionId=EVAL(0x2467BCA1)
2010-11-09 20:13:21 Slp: -- PidPublicConfigObject : ValidateSettings initialized private object, result is initializeResult=Success

2010-11-09 20:13:22 Slp: Detected localization resources folder: 1033
2010-11-09 20:13:22 Slp: License file: C:\Documents and Settings\Abhay\Desktop\2008 R2 X64\2008_R2_x86\x86\1033\License_EVAL.rtf
2010-11-09 20:13:22 Slp: Error: Action "GatherUserSettings" threw an exception during execution.

2010-11-09 20:13:22 Slp: Microsoft.SqlServer.Setup.Chainer.Workflow.ActionExecutionException: Thread was being aborted. ---> System.Threading.ThreadAbortException: Thread was being aborted.
2010-11-09 20:13:22 Slp: at System.Threading.WaitHandle.WaitOneNative(SafeWaitHandle waitHandle, UInt32 millisecondsTimeout, Boolean hasThreadAffinity, Boolean exitContext)
2010-11-09 20:13:22 Slp: at System.Threading.WaitHandle.WaitOne(Int64 timeout, Boolean exitContext)
2010-11-09 20:13:22 Slp: at System.Threading.WaitHandle.WaitOne(Int32 millisecondsTimeout, Boolean exitContext)
2010-11-09 20:13:22 Slp: at System.Threading.WaitHandle.WaitOne()
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Configuration.UIExtension.Request.Wait()
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Configuration.UIExtension.UserInterfaceProxy.SubmitAndWait(Request request)
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Configuration.UIExtension.UserInterfaceProxy.NavigateToWaypoint(String moniker)
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Configuration.UIExtension.UserInterfaceService.Waypoint(String moniker)
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Configuration.UIExtension.WaypointAction.ExecuteAction(String actionId)
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Chainer.Infrastructure.Action.Execute(String actionId, TextWriter errorStream)
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Setup.Chainer.Workflow.ActionInvocation.InvokeAction(WorkflowObject metabase, TextWriter statusStream)
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Setup.Chainer.Workflow.PendingActions.InvokeActions(WorkflowObject metaDb, TextWriter loggingStream)
2010-11-09 20:13:22 Slp: --- End of inner exception stack trace ---
2010-11-09 20:13:22 Slp: at Microsoft.SqlServer.Setup.Chainer.Workflow.PendingActions.InvokeActions(WorkflowObject metaDb, TextWriter loggingStream)
2010-11-09 20:13:25 Slp: Received request to add the following file to Watson reporting: C:\Documents and Settings\Abhay\Local Settings\Temp\tmp11E.tmp
2010-11-09 20:13:25 Slp: The following is an exception stack listing the exceptions in outermost to innermost order
2010-11-09 20:13:25 Slp: Inner exceptions are being indented
2010-11-09 20:13:25 Slp:
2010-11-09 20:13:25 Slp: Exception type: System.ArgumentException
2010-11-09 20:13:25 Slp: Message:
2010-11-09 20:13:25 Slp: File format is not valid.
2010-11-09 20:13:25 Slp: Stack:
2010-11-09 20:13:25 Slp: at System.Windows.Forms.RichTextBox.StreamIn(Stream data, Int32 flags)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.RichTextBox.LoadFile(Stream data, RichTextBoxStreamType fileType)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.RichTextBox.LoadFile(String path, RichTextBoxStreamType fileType)
2010-11-09 20:13:25 Slp: at Microsoft.SqlServer.Configuration.InstallWizard.EULAPidView.UpdateLicenseText(String filepath)
2010-11-09 20:13:25 Slp: at Microsoft.SqlServer.Configuration.InstallWizard.EULAPidController.LoadData()
2010-11-09 20:13:25 Slp: at Microsoft.SqlServer.Configuration.InstallWizardFramework.InstallWizardPageHost.PageEntered(PageChangeReason reason)
2010-11-09 20:13:25 Slp: at Microsoft.SqlServer.Configuration.WizardFramework.UIHost.set_SelectedPageIndex(Int32 value)
2010-11-09 20:13:25 Slp: at Microsoft.SqlServer.Configuration.WizardFramework.UIHost.GoNext()
2010-11-09 20:13:25 Slp: at Microsoft.SqlServer.Configuration.WizardFramework.NavigationButtons.nextButton_Click(Object sender, EventArgs e)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Control.OnClick(EventArgs e)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Button.OnClick(EventArgs e)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Control.WndProc(Message& m)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.ButtonBase.WndProc(Message& m)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Button.WndProc(Message& m)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
2010-11-09 20:13:25 Slp: at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
2010-11-09 20:28:33 Slp: Sco: Attempting to write hklm registry key SOFTWARE\Microsoft\Microsoft SQL Server to file C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20101109_201205\Registry_SOFTWARE_Microsoft_Microsoft SQL Server.reg_
2010-11-09 20:28:33 Slp: Sco: Attempting to write hklm registry key SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall to file C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20101109_201205\Registry_SOFTWARE_Microsoft_Windows_CurrentVersion_Uninstall.reg_
2010-11-09 20:28:33 Slp: Sco: Attempting to write hklm registry key SOFTWARE\Microsoft\MSSQLServer to file C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20101109_201205\Registry_SOFTWARE_Microsoft_MSSQLServer.reg_
2010-11-09 20:28:36 Slp: File format is not valid.
2010-11-09 20:28:36 Slp: Watson bucket for exception based failure has been created
2010-11-09 20:28:36 Slp: Sco: Attempting to create base registry key HKEY_LOCAL_MACHINE, machine
2010-11-09 20:28:36 Slp: Sco: Attempting to open registry subkey Software\Microsoft\PCHealth\ErrorReporting\DW\Installed
2010-11-09 20:28:36 Slp: Sco: Attempting to get registry value DW0200
2010-11-09 20:29:01 Slp: Submitted 1 of 1 failures to the Watson data repository

Assuming that there is a file with incorrect format I took a chance to open this file as mentioned in the error above .C:\Documents and Settings\Abhay\Desktop\2008 R2 X64\2008_R2_x86\x86\1033\License_EVAL.rtf . Since this is an RTF file we can open it in WORDPAD ....

When opened , I found it unreadable ...Initially I thought it is suppose to be like that as there might be something encrypted .
However , there were other license files in the same folder which were absolutely readable ..

This made me curious and I checked the same file on my machine as I also had the same EVAL setup ...I was able to read it word by word .So it was clear that the file was corrupt ....We tried and swapped the file between my TP and the server ...The setup moved forward :) .

Hope this helps .

Error 1706. An installation package for the product Microsoft SQL Server 2005 cannot be found. Try the installation again using a valid copy of the installation package 'SqlRun_SQL.msi'.

adding a new post after a good gap ...
Recently we faced an issue where we lost physical files of Master database (master.mdf and mastlog.ldf).We had the backup files but we could not use them unless SQL Server is up and running .So we had no choice but to rebuild master .

We tried the step below via DOS prompt :
C:\Documents and Settings\Abhay\Desktop\softwares\SQLEVAL_2005\Servers>start /wait setup.exe /qn INSTANCENAME=CORRUPT REINSTALL=SQL_Engine REBUILDDATABASE=1 SAPWD=XXXXX

This had always worked for me and is also mentioned in BOL .However , this time this did not work .The errors I got in the log file were (to find the error we should search for Return Value 3):

Error 1706. An installation package for the product Microsoft SQL Server 2005 cannot be found. Try the installation again using a valid copy of the installation package 'SqlRun_SQL.msi'.
MSI (s) (2C:08) [14:07:26:265]: User policy value 'DisableRollback' is 0
MSI (s) (2C:08) [14:07:26:265]: Machine policy value 'DisableRollback' is 0
Action ended 14:07:26: InstallFinalize. Return value 3.

The setup files were valid and had been used many time in the past ..I ran SQLRun_SQL.msi manually and it was running fine .I also used 2 different setups and go the same error .
Also

MSI (s) (2C:08) [14:07:26:281]: File: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\modellog.ldf; To be installed; Won't patch; No existing file
MSI (s) (2C:08) [14:07:26:281]: Executing op: FileCopy(SourceName=C:\Config.Msi\3bf6ceb.rbf,,DestName=C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\model.mdf,Attributes=32800,FileSize=1245184,PerTick=0,,VerifyMedia=0,ElevateFlags=3,,,,,,,InstallMode=4194308,,,,,,,)
MSI (s) (2C:08) [14:07:26:281]: File: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\model.mdf; To be installed; Won't patch; No existing file
MSI (s) (2C:08) [14:07:26:296]: Executing op: FileCopy(SourceName=C:\Config.Msi\3bf6cea.rbf,,DestName=C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\msdblog.ldf,Attributes=32800,FileSize=786432,PerTick=0,,VerifyMedia=0,ElevateFlags=3,,,,,,,InstallMode=4194308,,,,,,,)
MSI (s) (2C:08) [14:07:26:296]: File: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\msdblog.ldf; To be installed; Won't patch; No existing file
MSI (s) (2C:08) [14:07:26:296]: Executing op: FileCopy(SourceName=C:\Config.Msi\3bf6ce9.rbf,,DestName=C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\msdbdata.mdf,Attributes=32800,FileSize=12255232,PerTick=0,,VerifyMedia=0,ElevateFlags=3,,,,,,,InstallMode=4194308,,,,,,,)
MSI (s) (2C:08) [14:07:26:296]: File: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\msdbdata.mdf; To be installed; Won't patch; No existing file
MSI (s) (2C:08) [14:07:26:296]: Executing op: FileCopy(SourceName=C:\Config.Msi\3bf6ce8.rbf,,DestName=C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\mastlog.ldf,Attributes=32800,FileSize=853016576,PerTick=0,,VerifyMedia=0,ElevateFlags=3,,,,,,,InstallMode=4194308,,,,,,,)
MSI (s) (2C:08) [14:07:26:296]: File: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\mastlog.ldf; To be installed; Won't patch; No existing file
MSI (s) (2C:08) [14:07:26:312]: Executing op: FileCopy(SourceName=C:\Config.Msi\3bf6ce7.rbf,,DestName=C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\master.mdf,Attributes=32800,FileSize=92602368,PerTick=0,,VerifyMedia=0,ElevateFlags=3,,,,,,,InstallMode=4194308,,,,,,,)
MSI (s) (2C:08) [14:07:26:312]: File: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\master.mdf; To be installed; Won't patch; No existing file

This was strange as these files were existing ....

I ran the same command using \qb option and got the same error but in a form of a pop-up box.

I then checked and found that there is an option called as REINSTALLMODE .;REINSTALLMODE is used to repair installed components. The supported values are:
O – Reinstall if file is missing, or an older version is present.
M – Rewrite machine specific reg keys under HKLM
U – Rewrite user specific reg keys under HKCU
S – Reinstall all shortcuts

The Option O looked appropriate but I used all i.e.
C:\Documents and Settings\Abhay\Desktop\softwares\SQLEVAL_2005\Servers>start /wait setup.exe /qn INSTANCENAME=CORRUPT REINSTALL=SQL_Engine REINSTALLMODE=OMUS REBUILDDATABASE=1 SAPWD=XXXXX

This resolved the issue on my laptop but not on client server .
Finally I found that there is one more option which is never documented ...and this option is V
The issue was that the setup was copied for a different server and the original media location of where the RTM bits where installed in some cache file.That was the reason we were getting the error about the installation package not being found.To resolve this we had to use the option V to re-cache the media from the new location.

C:\Documents and Settings\Abhay\Desktop\softwares\SQLEVAL_2005\Servers>start /wait setup.exe /qn INSTANCENAME=CORRUPT REINSTALL=SQL_Engine REINSTALLMODE=V REBUILDDATABASE=1 SAPWD=XXXXX


This ran like a knife through butter ..

Hope it will help you in future ...

Wednesday, September 29, 2010

Error: 26049, Severity: 16, State: 1 :Server local connection provider failed to listen on [ \\.\pipe\SQLLocal\XXXXX ]. Error: 0x5

There might be many reasons and many solutions for this kind of error.But let me explain my situation :) ...For testing I installed a new default instance on one of the test servers.The setup was successful .However , later one of the other named instance did not come up after the restart.

The errors were :

2010-09-30 03:17:31.65 Server Error: 26049, Severity: 16, State: 1.
2010-09-30 03:17:31.65 Server Server local connection provider failed to listen on [ \\.\pipe\SQLLocal\XXXXX ]. Error: 0x5
2010-09-30 03:17:31.65 Server Error: 17182, Severity: 16, State: 1.
2010-09-30 03:17:31.65 Server TDSSNIClient initialization failed with error 0x5, status code 0x40.
2010-09-30 03:17:31.65 Server Error: 17182, Severity: 16, State: 1.
2010-09-30 03:17:31.65 Server TDSSNIClient initialization failed with error 0x5, status code 0x1.
2010-09-30 03:17:31.65 Server Error: 17826, Severity: 18, State: 3.
2010-09-30 03:17:31.65 Server Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
2010-09-30 03:17:31.65 Server Error: 17120, Severity: 16, State: 1.
2010-09-30 03:17:31.65 Server SQL Server could not spawn FRunCM thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.

The issue is quite simple unlike it looks like .I tried everything like changing the named pipe , etc. etc.
Assuming, that 0x5 is the OS error which means Access Denied ,I gave the permission to the domain ID on the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer.
This resolved the issue .

status code 0x40 means that there is an issue with Shared memory listener
status code 0x50 means that there is an issue with Named pipe listener
status code 0x0A means that there is an issue with TCP/IP listener

Please go through this MSDN blog (which has one more link in it).
http://blogs.msdn.com/b/sql_protocols/archive/2006/03/09/546655.aspx?wa=wsignin1.0

Happy Learning !!!

Tuesday, September 28, 2010

A simple VB script to retain Errorlogs worth 90 days (or as you like)

So far ,I heard retaining X number of errlogs which is widely used (So I am not writing that script here)...But one of our clients asked us to retain errlogs worth only 90 days .The client was not ready to recycle the errorlogs and wanted us to keep them to the default ...

Finally , we could come out with a simple VB script that can do it .The script code is mentioned below .

Code :


sFolder = "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG"
iMaxAge = 90
Set oFSO = CreateObject("Scripting.FileSystemObject")
If oFSO.FolderExists(sFolder) Then
for each oFile in oFSO.GetFolder(sFolder).Files
If DateDiff("d", oFile.DateLastModified, Now) > iMaxAge and (oFile.name= "ERRORLOG.1" or oFile.name= "ERRORLOG.2" or oFile.name= "ERRORLOG.3" or oFile.name= "ERRORLOG.4" or oFile.name= "ERRORLOG.5" or oFile.name= "ERRORLOG.6") Then
wscript.echo "Deleting" &oFile.Name
oFile.Delete
End If
next
End If

You will need to create a scheduled task/Or SQL Server job using xp_cmdshell, to run at a specific time .Once its kicked off , if any of the files mentioned in the code (Note : errorlog will not be tried upon) have a timestamp greater than 90 days from the day you are executing the file , It will delete those files ...for example if I have 7 files below :

File Timestamp
Errorlog 12/9/2009
Errorlog.1 12/8/2009
Errorlog.2 12/7/2009
Errorlog.3 12/6/2009
Errorlog.4 12/5/2009
Errorlog.5 12/4/2009
Errorlog.6 12/3/2009

The files deleted will be : Errorlog.3,Errorlog.4,Errorlog.5 and Errorlog.6

You need to change the Path of sFolder variable ..
Happy learning ...

Abhay

checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Native Client][SQL Server]Login failed for user 'XXXXX\clusterservice

a little Background :
As per our security guidelines Builtin\Administrator login should be removed from all the SQL Server instances.It was implemented on all the SQL Server instances including those which are on MCSC (Windows Cluster).
After that, the nodes were rebooted due to patching requirements .The nodes came up , but SQL Server did not :D ...

Error in cluster logs (you will not find it in SQL Server logs) :

ERR SQL Server : [sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Native Client][SQL Server]Login failed for user 'XXXXX\clusterservice'.
ERR SQL Server : [sqsrvres] ODBC sqldriverconnect failed

The error was clear .The cluster service was not able to login to SQL Server through user XXXXX\clusterservice but via a LOGIN ...That login is BUILTIN\Administrators.
But why it needs to login to SQL Server ?? Because it needs to run the isAlive check to make sure that the SQL Server is up and running .It also runs the looksalive (its a function)check but that does not need to query SQL Server .Is Alive check runs select @@servername and waits for the return message through ODBC client (in our case its SQL Server Native client).Thus the Isalive check was not able to create a trusted connection and we lost the access to Virtual server.

So, in a SQL Server 2005\2008 failover cluster installation, the cluster service account relies on membership in the BUILTIN\Administrators group to log on to SQL Server 2005\2008 to run the IsAlive check.If you remove the BUILTIN\Administrators group from a failover cluster, you must explicitly grant the MSCS service account permissions to log on to the SQL Server 2005 failover cluster.

The SQL Server 2005 resource starts an instance of the Sqlcmd.exe utility under the security context of the MSCS service account. Then, the SQL Server 2005 resource runs an SQL script over a dedicated administrator connection (DAC) that samples various dynamic management views (DMV). Because a DAC connection is used to collect some diagnostic data, the clustering service account must be provisioned in the SYSADMIN fixed server role. If later someone says that clustering service account cannot be provisioned in the SYSADMIN fixed server role, then we can create a login for cluster service account that is not given the SYSADMIN fixed server role .I have not tested it yet .So cannot confirm that this will work on not ...

Commands :
CREATE LOGIN [\] FROM WINDOWS WITH DEFAULT_DATABASE=[master]
EXEC master.sp_addsrvrolemember @loginame = N'\ ', @rolename = N'sysadmin'

happly learning .....
Regards
Abhay

Monday, September 20, 2010

Msg 22004, Level 16, State 1, Line 0 :Failed to open loopback connection. Please see event log for more information.Failed to open loopback connection. Please see event log for more information.

I am back with one more solution :).The problem was simple but the error made me thinking ..I was trying to do xp_readerrorlog on a small file .But my SPID hanged ..after some time I got this error :

SQL Server error in QA :
Msg 22004, Level 16, State 1, Line 0
Failed to open loopback connection. Please see event log for more information.
Msg 22004, Level 16, State 1, Line 0
error log location not found

I read somewhere that this error comes when SQL Server Agent failes to come up.Yes my agent was down.But I faied to understand what is the relation between running xp_readerrlog and SQL Agent not running .Still, I tried to run the agent and got the error ...So something is related to SQL Agent here and that something is that If SQL Agent is not running , I can run xp_readerrlog successfully (I will prove it wrong later).

I checked the application logs immediately and got these errors :

Event Type: Error
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17052
Date: 09/20/2010
Time: 18:24:22
User: N/A
Computer: abchaudh
Description:
Severity: 16 Error:10061, OS: 10061 [Microsoft][SQL Native Client]TCP Provider: No connection could be made because the target machine actively refused it.


For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 4d 27 00 00 0a 00 00 00 M'......
0008: 12 00 00 00 61 00 62 00 ....a.b.
0010: 63 00 68 00 61 00 75 00 c.h.a.u.
0018: 64 00 68 00 00 00 0e 00 d.h.....
0020: 00 00 6d 00 61 00 73 00 ..m.a.s.
0028: 74 00 65 00 72 00 00 00 t.e.r...


Event Type: Error
Event Source: SQLAgent$CORRUPT
Event Category: Service Control
Event ID: 103
Date: 09/20/2010
Time: 18:24:45
User: N/A
Computer: abchaudh
Description:
SQLServerAgent could not be started (reason: Unable to connect to server 'abchaudh\CORRUPT'; SQLServerAgent cannot start).

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Information
Event Source: SQLAgent$CORRUPT
Event Category: Service Control
Event ID: 102
Date: 09/20/2010
Time: 18:24:52
User: N/A
Computer: abchaudh
Description:
SQLServerAgent service successfully stopped.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

I also checked the SQL Agent logs :

SQL Agent logs :

2010-09-20 18:25:36 - ! [298] SQLServer Error: 10061, TCP Provider: No connection could be made because the target machine actively refused it. [SQLSTATE 08001]
2010-09-20 18:25:36 - ! [165] ODBC Error: 0, Login timeout expired [SQLSTATE HYT00]
2010-09-20 18:25:36 - ! [298] SQLServer Error: 10061, An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. [SQLSTATE 08001]
2010-09-20 18:25:36 - ! [000] Unable to connect to server 'abchaudh\CORRUPT'; SQLServerAgent cannot start
2010-09-20 18:25:42 - ! [298] SQLServer Error: 10061, TCP Provider: No connection could be made because the target machine actively refused it. [SQLSTATE 08001]
2010-09-20 18:25:42 - ! [165] ODBC Error: 0, Login timeout expired [SQLSTATE HYT00]
2010-09-20 18:25:42 - ! [298] SQLServer Error: 10061, An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. [SQLSTATE 08001]
2010-09-20 18:25:42 - ! [382] Logon to server 'abchaudh\CORRUPT' failed (DisableAgentXPs)
2010-09-20 18:25:43 - ? [098] SQLServerAgent terminated (normally)

So, now I have 2 issues : SQL Agent is not running and xp_readerrorlog is timing out.

.If you see one of my older posts on "target machine actively refuses it" , you will have some information .

So I opened CLICONFG and found 3 incorrect aliases which were not using the right port ..

I removed them and SQL Agent came on line ...xp_read errorlog also started working ...

I stopped SQL Agent and stil everything was working ...
So the issue was that xp_readerrlog tries to connect to SQL Server but stucks due to the Alias pointing to incorrect port.

But this does not affect SQL Server service .To check why its not affecting SQL Server service , I disabled the sahred memory protocol and BANG....SQL Server connection failed ...

Happy learning ....

Wednesday, August 25, 2010

Could not load the DLL xpstar90.dll, or one of the DLLs it references. Reason: 126(The specified module could not be found.)

Hi Team ,
This issue was faced by someone outside IBM but my main intention is to explain the benefit of another nice tool : Dependency Walker (http://www.dependencywalker.com/)

Issue :
SQL Server Agent failed to come up after the service account password was reset at AD level .

Error(s) :
In the event log you will see these errors in sequence :

Description:
Could not load the DLL xpstar90.dll, or one of the DLLs it references. Reason: 126(The specified module could not be found.).

Description:
Failed to retrieve SQLPath for syssubsystems population.

Description:
SQLServerAgent could not be started (reason: Failed to load any subsystems. Check errorlog for details.).

The first error is the main error and rest are the errors following the first error and we need not to think about them .

Troubleshooting and Resolution :
The error clearly says that either there is a problem with xpstar90.dll or the other dlls that this dll references .
This file is located in I first tried to re-register xpstar90.dll by using regsvr32 xrstar90.dll and got this message :

xpstar90.dll was loaded , But the dllRegisterServer entry point was not found.

I have heard that sometimes there is a different way of registering some DLLs , so by this error I did not come to the conclusion that this file is corrupt.
I was also thinking that there might be some other DDL that this DLL refers to , which got corrupted.

I decided to see the tree structure of xpstar90.dll in Dependency Walker . I opened the C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\xpstar90.dll in it and go this output.



So, in this case XPSTAR90.DLL itself was corrupt .I found its version 2005.90.4035.0 and replaced it with another one that I had in another instance .
SQL Agent came online .
In case if it dos not , then we need to uninstall Native client from ADD Remove Programs and reinstall it .

Happy learning
Abhay

Monday, August 2, 2010

Error 1117 :The request could not be performed because of an I/O device error.

Our backups were failing under these conditions :

Scenario 1: The System databases plus few user databases are on local disk & few user databases are on LUNs.

Scenario 2: The System & user databases are completely on LUNs

The backups were running for some good amount of time but then use to fail with Error 1117.I know that taking backups on network is not suported but I was breaking my head on this ERROR (1117)to know the reason behind this error .After going through a few tests on my machine using external HDDs ,my understanding of this error is :


-> Error 1117 is ERROR_IO_DEVICE .Thats fine .But I was curious about knowing the situations under which this error might occur and what is the exact meaning on this Error .Does Error_IO_Device means that the Hardware is corrupt ? Found that this error occurs under the below situations and then found the reasons behind those situations as well :

STATUS_FT_MISSING_MEMBER
ERROR_IO_DEVICE

An attempt was made to explicitly access the secondary copy of information via a device control to the fault tolerance driver and the secondary copy is not present in the system.


STATUS_FT_ORPHANING
ERROR_IO_DEVICE
{FT Orphaning} A disk that is part of a fault-tolerant volume can no longer be accessed.


STATUS_DATA_OVERRUN
ERROR_IO_DEVICE
{Data Overrun} A data overrun error occurred.

STATUS_DATA_LATE_ERROR
ERROR_IO_DEVICE
{Data Late} A data late error occurred.


STATUS_IO_DEVICE_ERROR
ERROR_IO_DEVICE
The I/O device reported an I/O error

STATUS_DEVICE_PROTOCOL_ERROR
ERROR_IO_DEVICE
A protocol error was detected between the driver and the device.


STATUS_DRIVER_INTERNAL_ERROR
ERROR_IO_DEVICE
An error was detected between two drivers or within an I/O driver.


So this error mapping says that this error will be thrown out if anyof these conditions are met .In my situation we were falling in into STATUS_DATA_LATE_ERROR since we were also getting thses entries in the SQL serve errorlogs : "x I/O requests are pending for more then 15 secs ............filename.mdf"

If you are running backup jobs you might also get this error -1073548784 .
This is a common error and may come when the query you are running remotely is incorrect , or the table you are trying to drop does not exist .Try to export a table that already exists in another DB and you will recreate this OLEDB error.So we need not to worry about finding the message identifier for this number .


Action plan :
-----------------
--try to take backup of another database located remotely and of near about same size . I mean around 20GB.

--Run Chkdsk on this drive or ask someone to do that and see if the consistency errors come up .

--Create a similar database on another external drive like this one and take the backup .


Conclusion :
---------------
I am very much certain that the issue is with the drive and(OR)Network.The 15 sec IO delay messages in Errorlogs also suggests the same .But as you can see this error also comes when dataa gets late in reaching the destination (STATUS_DATA_LATE_ERROR) I am suspecting that the network might also be a bit slow and contributing to the backup failure .

Now the ball is in your court how you explain this to the client :) .

Hapy Learning

Tuesday, July 27, 2010

Msg 8914, Level 16, State 1, Line 1 -> Incorrect PFS free space information for page

Msg 8914, Level 16, State 1, Line 1
Incorrect PFS free space information for page (1:61991) in object ID 1993058136, index ID 1, partition ID 72057594955366400, alloc unit ID 71906736119218176 (type LOB data). Expected value 0_PCT_FULL, actual value 100_PCT_FULL.


This was the error we were getting in the Docs table of one of the Sharepoint database .The Compatibility level was 80 and Build was 1399 (2005 RTM).

I tried a lot of things on it like :
-> I Rebuilt the clustered index with and without LOB_COMPACTION option .
-> DBCC page shows is fill factor 100
-> Changed the fill factor to 100 explicitly
-> Ran dbcc updateusage
-> changed the compatibility level to 90
-> changed the fill factor to to 99 ,50 etc
Nothing helped .The profiler did not show much (my intention was to know what checkdb is doing internally).

Finally I took the backup of the database and restored it as a test database .It did not give any errors .This means that actually its not a corruption .

On the restored database Ran DBCC checkdb with repair_allow_data_loss .
It fixed the issue without harming the data .Finally Ran the same on the Sharepoint database and it resolved the issue .

Hope this gives you the confidence to run the repair_allow_data_loss for this issue .
But remember , almost every time if you run it with repair_allow_data_loss you will end up loosing the data .So be careful .

This situation was AN EXCEPTION and you can safely use this option of checkdb.

Root cause :
Microsoft says that
the engine (just like OS does which giving pages to processes ) pre-allocates a set of data pages (say X) to the SPID which needs it and marks them as 100% full in PFS assuming that those pages will eventually get filled very soon.It does this to avoid frequently updating PFS page and improving performance.But later when the SPID completes its work in less pages (say X-Y) , these remaining pages are released .However, the remaining pages should be marked again as empty (0_PCT_FULL) which it does not do and hence DBCC CheckDB reports those errors (SQL 2000 silently use to fix it ).Repair_allow_data_loss will fix it with no data loss actually.

Regards
Abhay

Monday, July 26, 2010

Finding the last date when the LOG/FULL/DIFF/FILEGROUP backup was taken for all the databases

Hi Guys ,
While creating a few scripts , a requirement came where I had to find the last backups (all types) taken for all the databases (except tempdb) .
Please find the script below .Hope it helps you in your daily activities .If you want to automate it for all the instances in your environment , please let me know and I can send you some more files.



/*
Script : Last_bckp.sql
Author : Abhay Chaudhary,
Date : 26th JUL, 2010
Purpose : Collecting SQL Server 2000/2005/2008 last backup taken information.
Requirements : Do a CTRL+F and change the to the DB where you want to
create the object.
Suggestions : hi_abhay78@yahoo.co.in
Version : 1.0
*/


USE
set nocount on
if not exists (select * from ..sysobjects where name ='bckp_types' and type ='S')
begin
create table ..bckp_types (num int identity(1,1),type varchar(1),bkp_name varchar(20))
insert into ..bckp_types (type,bkp_name) values ('D','Full backup')
insert into ..bckp_types (type,bkp_name) values ('L','Log Backup')
insert into ..bckp_types (type,bkp_name) values ('F','Filegroup backup')
insert into ..bckp_types (type,bkp_name) values ('I','Differential backup')
end
go

Declare @loop int
select @loop= max(num) from bckp_types
While (@loop !=0)
begin
Select 'last ' + bkp_name +' taken details.' from bckp_types where num=@loop
declare @bk_type varchar(1)
select @bk_type = type from bckp_types where num=@loop

SELECT s.name 'database Name',
b.backup_finish_date 'last backup date',
bmf.physical_device_name 'location of backup'
FROM master..sysdatabases s LEFT OUTER JOIN msdb..backupset b ON s.name = b.database_name
INNER JOIN msdb..backupmediafamily bmf ON b.media_set_id = bmf.media_set_id
WHERE s.name <> 'tempdb'
AND b.backup_finish_date = (SELECT MAX(backup_finish_date)
FROM msdb..backupset
WHERE database_name = b.database_name
AND type = @bk_type)
ORDER BY s.name

set @loop=@loop-1
end
go
Drop table ..bckp_types


Happy Learning ...
Abhay

Friday, July 16, 2010

SQLServer Error: 848, SQL Network Interfaces: The system detected a possible attempt to compromise security.

We faced a strange but simple issue yesterday and as usual I would like to share it with you .



Situation :
-------------
SQL server 2005 SP2
Windows Server 2003 SP2
Cluster : Yes 2 node A-P cluster

Service account of SQL Server Agent service and SQL Server service were same .SQL Server is Clustered .

While SQL Sevrer as well as agent were running fine the account under both the services are running ,got locked(we came to know this later as a rootcause of this issue).Still ,everything was fine and there was no issue since the account got locked after SQL Server and agent were started.

Then we found that all the jobs that were scheduled stopped working .In the job history we found that there is no JOB HISTORY created since the jobs stopped working .But there was not a single failure of the jobs .

Which means that the jobs were not scheduled by the Job schedular >> to the Threads >> to the SPIDs .So , we manually executed the jobs and all of them completed successfully .But again , there was no history being created and those jobs were not doing anything .For example , the backup job was running successfully when we ran it explicitly but no backups were taken .

To drill down further , we ran the commands under the jobs in QA and those were running fine .We created new jobs and there was no change at all in the situation .

Then we checked the SQL Agent logs and found this :

[298] SQLServer Error: 848, SQL Network Interfaces: The system detected a possible attempt to compromise security. Please ensure that you can contact the server that authenticated you. [SQLSTATE HY000]
[298] SQLServer Error: 848, Cannot generate SSPI context [SQLSTATE HY000]
[382] Logon to server '(local)' failed (ConnAttemptCachableOp)

This was strange to us as why the connectivity error is not being displayed when we were explicitely executing the job, which completes successfully and doing nothing.
But since it was the connectivity error by agent , we decided to run the jobs by logging on to the server using the account under which SQL Server and agent are running.

We then found out the the account was locked under which SQL Server and agent were still running.

Once the account got unlocked at the AD ,the jobs ACTUALLY started working .

To me it looks like a bug in design and i have logged it on the CONNECT :
https://connect.microsoft.com/SQLServer/feedback/details/575388/strange-behaviour-in-sql-agent-job-on-cluster-where-the-job-runs-but-does-not-do-anything

hope it helos you to resolve your issue .

Saturday, June 19, 2010

Data auditing in SQL Server

There are 2 ways we can audit the SQL Server events to the tracefile (people call it audit log file).

- setting sp_configure parameter 'c2 audit mode' to 1.This will automatically capture all the audit events for all the databases and all the columns . You cannot modify it .Even if you try to , it will
not take the changes made manually .

- Creating our own trace for selected events and columns .Please check BOL for it .

In case you want to go through the second option and that is to create our own trace please see the demo below:

Step 1
In this step we are creating test_yasir trace in C: drive.Then we are setting the Events and columns adn settin gthem to ON .I have choosen a few events and columns .

declare @TraceIdOut int
exec sp_trace_create @TraceIdOut OUTPUT,6, N'c:\test_Yasir'
PRINT @TraceIdOut

declare @On bit
SET @On = 1
exec sp_trace_setevent @TraceIdOut, 14, 6, @On
exec sp_trace_setevent @TraceIdOut, 14, 7, @On
exec sp_trace_setevent @TraceIdOut, 14, 8, @On
exec sp_trace_setevent @TraceIdOut, 14, 9, @On
exec sp_trace_setevent @TraceIdOut, 14, 10, @On
exec sp_trace_setevent @TraceIdOut, 15, 6, @On
exec sp_trace_setevent @TraceIdOut, 15, 7, @On
exec sp_trace_setevent @TraceIdOut, 15, 8, @On
exec sp_trace_setevent @TraceIdOut, 15, 9, @On
exec sp_trace_setevent @TraceIdOut, 15, 10, @On
exec sp_trace_setevent @TraceIdOut, 20, 6, @On
exec sp_trace_setevent @TraceIdOut, 20, 7, @On
exec sp_trace_setevent @TraceIdOut, 20, 8, @On
exec sp_trace_setevent @TraceIdOut, 20, 9, @On
exec sp_trace_setevent @TraceIdOut, 20, 10, @On


exec sp_trace_setevent @TraceIdOut, 104, 1, @On
exec sp_trace_setevent @TraceIdOut, 104, 3, @On
exec sp_trace_setevent @TraceIdOut, 104, 6, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On
exec sp_trace_setevent @TraceIdOut, 104, 10, @On
exec sp_trace_setevent @TraceIdOut, 104, 11, @On
exec sp_trace_setevent @TraceIdOut, 104, 14, @On
exec sp_trace_setevent @TraceIdOut, 104, 22, @On
exec sp_trace_setevent @TraceIdOut, 104, 26, @On
exec sp_trace_setevent @TraceIdOut, 104, 35, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On

exec sp_trace_setevent @TraceIdOut, 107, 1, @On
exec sp_trace_setevent @TraceIdOut, 107, 3, @On
exec sp_trace_setevent @TraceIdOut, 107, 6, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On
exec sp_trace_setevent @TraceIdOut, 107, 10, @On
exec sp_trace_setevent @TraceIdOut, 107, 11, @On
exec sp_trace_setevent @TraceIdOut, 107, 14, @On
exec sp_trace_setevent @TraceIdOut, 107, 22, @On
exec sp_trace_setevent @TraceIdOut, 107, 26, @On
exec sp_trace_setevent @TraceIdOut, 107, 35, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On

exec sp_trace_setevent @TraceIdOut, 106, 1, @On
exec sp_trace_setevent @TraceIdOut, 106, 3, @On
exec sp_trace_setevent @TraceIdOut, 106, 6, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On
exec sp_trace_setevent @TraceIdOut, 106, 10, @On
exec sp_trace_setevent @TraceIdOut, 106, 11, @On
exec sp_trace_setevent @TraceIdOut, 106, 14, @On
exec sp_trace_setevent @TraceIdOut, 106, 22, @On
exec sp_trace_setevent @TraceIdOut, 106, 26, @On
exec sp_trace_setevent @TraceIdOut, 106, 35, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On


exec sp_trace_setevent @TraceIdOut, 105, 1, @On
exec sp_trace_setevent @TraceIdOut, 105, 3, @On
exec sp_trace_setevent @TraceIdOut, 105, 6, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On
exec sp_trace_setevent @TraceIdOut, 105, 10, @On
exec sp_trace_setevent @TraceIdOut, 105, 11, @On
exec sp_trace_setevent @TraceIdOut, 105, 14, @On
exec sp_trace_setevent @TraceIdOut, 105, 22, @On
exec sp_trace_setevent @TraceIdOut, 105, 26, @On
exec sp_trace_setevent @TraceIdOut, 105, 35, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On


Step 2 :

In this step we wil apply the filter since you said you need to audit only a user database.We will achieve it using sp_trace_setfilter

sp_trace_setfilter 3,3,0,0,1

In this example I have set the filter on databaseid (to 1 which is master) in traceid 3

Step 3:
In this step we will first confirm if our trace is showing up in the metadata.Do a select * from sys.traces and check the trace you created as well its trace id .
Then start the trace (which is 1) using sp_trace_setstatus

example :
sp_trace_setstatus 3,1

Here traceid is 3 and staus is 1
Further , if you want to add or remove any event use sp_trace_setevent after stopping the trace using sp_trace_setstatus


But in this method there is a problem .The problem is that , if you restart the instance the trace metadata will be washed from the sys.traces DMV.
So you will have to manually run it again .Further the physical trace file (log file) still exist.So you will get the error while creaing the trace .To over come this :

1) I have added the datetime in the file name .So it will create a unique file each minute.
2) I have encapsulated the query into an SP and pinned it to SQL Server startup.

So now

step 1 would be

create proc audit_trace as
declare @TraceIdOut int
Declare @D1 nvarchar(30)
Declare @D2 nvarchar(30)
Declare @D3 nvarchar(30)
Declare @D4 nvarchar(30)
Declare @D5 nvarchar(30)
Declare @trace_name nvarchar(256)

SELECT @D1=DATENAME(Day, GETDATE())
SELECT @D2=DATENAME(month, GETDATE())
SELECT @D3=DATENAME(year, GETDATE())
SELECT @D4=DATENAME(hour, GETDATE())
SELECT @D5=DATENAME(minute, GETDATE())

set @trace_name='c:\trace_'+@d1+'_'+@d2+'_'+@d3+'_'+@d4+'_'+@d5+'_'
print @trace_name

--set @trace_name = 'c:\trace_'+@trace_date+'.trc'
--print @trace_name
exec sp_trace_create @TraceIdOut OUTPUT,6, @trace_name
PRINT @TraceIdOut

declare @On bit
SET @On = 1
exec sp_trace_setevent @TraceIdOut, 14, 6, @On
exec sp_trace_setevent @TraceIdOut, 14, 7, @On
exec sp_trace_setevent @TraceIdOut, 14, 8, @On
exec sp_trace_setevent @TraceIdOut, 14, 9, @On
exec sp_trace_setevent @TraceIdOut, 14, 10, @On
exec sp_trace_setevent @TraceIdOut, 15, 6, @On
exec sp_trace_setevent @TraceIdOut, 15, 7, @On
exec sp_trace_setevent @TraceIdOut, 15, 8, @On
exec sp_trace_setevent @TraceIdOut, 15, 9, @On
exec sp_trace_setevent @TraceIdOut, 15, 10, @On
exec sp_trace_setevent @TraceIdOut, 20, 6, @On
exec sp_trace_setevent @TraceIdOut, 20, 7, @On
exec sp_trace_setevent @TraceIdOut, 20, 8, @On
exec sp_trace_setevent @TraceIdOut, 20, 9, @On
exec sp_trace_setevent @TraceIdOut, 20, 10, @On


exec sp_trace_setevent @TraceIdOut, 104, 1, @On
exec sp_trace_setevent @TraceIdOut, 104, 3, @On
exec sp_trace_setevent @TraceIdOut, 104, 6, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On
exec sp_trace_setevent @TraceIdOut, 104, 10, @On
exec sp_trace_setevent @TraceIdOut, 104, 11, @On
exec sp_trace_setevent @TraceIdOut, 104, 14, @On
exec sp_trace_setevent @TraceIdOut, 104, 22, @On
exec sp_trace_setevent @TraceIdOut, 104, 26, @On
exec sp_trace_setevent @TraceIdOut, 104, 35, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On
exec sp_trace_setevent @TraceIdOut, 104, 7, @On
exec sp_trace_setevent @TraceIdOut, 104, 8, @On

exec sp_trace_setevent @TraceIdOut, 107, 1, @On
exec sp_trace_setevent @TraceIdOut, 107, 3, @On
exec sp_trace_setevent @TraceIdOut, 107, 6, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On
exec sp_trace_setevent @TraceIdOut, 107, 10, @On
exec sp_trace_setevent @TraceIdOut, 107, 11, @On
exec sp_trace_setevent @TraceIdOut, 107, 14, @On
exec sp_trace_setevent @TraceIdOut, 107, 22, @On
exec sp_trace_setevent @TraceIdOut, 107, 26, @On
exec sp_trace_setevent @TraceIdOut, 107, 35, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On
exec sp_trace_setevent @TraceIdOut, 107, 7, @On
exec sp_trace_setevent @TraceIdOut, 107, 8, @On

exec sp_trace_setevent @TraceIdOut, 106, 1, @On
exec sp_trace_setevent @TraceIdOut, 106, 3, @On
exec sp_trace_setevent @TraceIdOut, 106, 6, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On
exec sp_trace_setevent @TraceIdOut, 106, 10, @On
exec sp_trace_setevent @TraceIdOut, 106, 11, @On
exec sp_trace_setevent @TraceIdOut, 106, 14, @On
exec sp_trace_setevent @TraceIdOut, 106, 22, @On
exec sp_trace_setevent @TraceIdOut, 106, 26, @On
exec sp_trace_setevent @TraceIdOut, 106, 35, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On
exec sp_trace_setevent @TraceIdOut, 106, 7, @On
exec sp_trace_setevent @TraceIdOut, 106, 8, @On


exec sp_trace_setevent @TraceIdOut, 105, 1, @On
exec sp_trace_setevent @TraceIdOut, 105, 3, @On
exec sp_trace_setevent @TraceIdOut, 105, 6, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On
exec sp_trace_setevent @TraceIdOut, 105, 10, @On
exec sp_trace_setevent @TraceIdOut, 105, 11, @On
exec sp_trace_setevent @TraceIdOut, 105, 14, @On
exec sp_trace_setevent @TraceIdOut, 105, 22, @On
exec sp_trace_setevent @TraceIdOut, 105, 26, @On
exec sp_trace_setevent @TraceIdOut, 105, 35, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On
exec sp_trace_setevent @TraceIdOut, 105, 7, @On
exec sp_trace_setevent @TraceIdOut, 105, 8, @On

/*adding the SP to execute at SQl Server startup */
exec sp_procoption N'audit_trace', 'startup', 'on'

Step 2 and 3 will be same as mentioned above in the begening .

Disadvantage
-------------
Simple ..Its resource consuming .Do not add a lot of columns in the trace .Do specifically what you want to audit.
It entirely depends what all columns you are auditing.
You need to keep the instance under testing phase and monitor the resource consumption due to tracing .
Clear the client that we need to make sure that we have fast disks ,more/faster CPUs, IO processing capabbilities and enough RAM in case they need to do extensive auditing (if there are performance issues).

Using WMI and SQL Agent to fire low memory threshold alert ...

This will work perfectly ....The only thing I wanted to add to the table was when it alerts you it should also fill the column with available MBytes so that you know how much memory was available .......But after trying it for 2 days , I realized that the class through which I am checking another class (Perfmon >> memory >> Available Mbytes) does not have a column for this.I am using "_instance modification" class .May be its due to this that the alert is fired but the job fails when it inserts the availableMbytes ...because this column is not in _instancemodificationevent class...the error number also suggests that .

By the way this one will only alert if your RAM is > 256 every 10 seconds ....this is because I wanted to test it ....you need to modify it to < 256 and every 300 seconds ...so that you get alert every 5 mins or whatever you decide ....


/*******************************************************************************************
* This script will create an Alert to Monitor Physical RAM reaching a low threshold.
* The alert will run a job and the job will enter data in a table.
*******************************************************************************************/

/* Step 1: creating the table to capture the Event information */

USE Master
GO

IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[memory]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE [dbo].[memory]
GO

CREATE TABLE [dbo].[memory] (
[PostTime] [datetime] NOT NULL default (getdate()) ,
[computerName] sql_variant Not Null ,
[RecordID] [int] IDENTITY (1,1) NOT FOR REPLICATION NOT NULL,
[Flag] [int] NOT NULL CONSTRAINT [DF_MEMORY_Flag] DEFAULT ((0)),
) ON [PRIMARY]
GO

CREATE INDEX [Memory_IDX01] ON [dbo].[memory]([recordid]) WITH FILLFACTOR = 100 ON [PRIMARY]
GO

/*Step 2 : Creating the Job that will enter values into the table we created above*/
/*Service account and sql operator option are optional*/

USE [msdb]
GO

IF EXISTS (SELECT job_id FROM msdb.dbo.sysjobs_view WHERE name = N'Capture memory Event')
EXEC msdb.dbo.sp_delete_job @job_name = N'Capture Memory Event', @delete_unused_schedule=1

GO
--DECLARE @ServiceAccount varchar(128)
--SET @ServiceAccount = N''
--DECLARE @SQLOperator varchar(128)
--SET @SQLOperator = N''

BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Capture Memory Event',
@enabled=1,
@notify_level_eventlog=2,
@notify_level_email=3,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'Job for responding to memory events',
@category_name=N'[Uncategorized (Local)]',
--@owner_login_name=@ServiceAccount,
--@notify_email_operator_name=@SQLOperator,
@job_id = @jobId OUTPUT

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/*Step 3: Insert graph into LogEvents*/


EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Insert data into LogEvents',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'
declare @@server sql_variant
select @@server =serverproperty (''machinename'')

INSERT INTO memory (
PostTime,
Computername
)

VALUES (
GETDATE(),
@@server)
',
@database_name=N'master',
@flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION

GOTO EndSave

QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

/*Creating the alert and associating it with the Job to be fired */

USE [msdb]
GO

IF EXISTS (SELECT name FROM msdb.dbo.sysalerts WHERE name = N'Respond to Memory_event')
EXEC msdb.dbo.sp_delete_alert @name=N'Respond to memory_event'

GO

DECLARE @server_namespace varchar(255)
SET @server_namespace = N'\\.\root\Cimv2\'

EXEC msdb.dbo.sp_add_alert @name=N'Respond to memory_event',
@message_id=0,
@severity=0,
@enabled=1,
@delay_between_responses=0,
@include_event_description_in=0,
@category_name=N'[Uncategorized]',
@wmi_namespace=N'\\.\root\Cimv2',
@wmi_query=N'SELECT * FROM __InstanceModificationEvent WITHIN 10 WHERE TargetInstance ISA ''Win32_PerfFormattedData_PerfOS_Memory'' AND TargetInstance.AvailableBytes > 256',
@job_name='Capture memory Event' ;

--EXEC msdb.dbo.sp_add_notification @alert_name=N'Respond to memory_event', @operator_name=N'Test', @notification_method = 1
--GO

--/* Step 5: Create a stored proc for sending the [Create_user] information as .CSV file */

--Create proc [dbo].[Deadlock_rpt]
--as
--DECLARE @SQL varchar(Respond to memory_event2000)
--DECLARE @date varchar (2000)
--DECLARE @File varchar(1000)
--select @date= convert(date,GETDATE())
--SET @SQL = 'select * from [Create_user] where flag = 0'
--SET @File = '[Create_user] report'+@date+'.csv'

--EXECUTE msdb.dbo.sp_send_dbmail
--@profile_name = 'test',
--@recipients = 'your email.com',
--@subject = 'low memory threshold reached...',
--@body = '***URGENT***Attached please find the low memory threshold report',
--@query =@SQL ,
--@attach_query_result_as_file = 1,
--@query_attachment_filename = @file,
--@query_result_header = 1,
--@query_result_separator = ' ',
--@query_result_no_padding = 1,
--@query_result_width = 32767


--/* Step 6: Changing the flag to 1 so that next time this information is not sent*/
--update dbo.[Create_user] set flag = 1 where flag = 0
--go

Printing DBCC inputbuffer for any SPID > 50

It helps in many cases .For example if SPIDS are involved in blocking .and we are doing inputbuffer one by one.
For blocking you can just add : and blocked !=0 in the sysprocesses query
Similarly you can modify it for waittype,waittime, CPU etc (as per your need)

--Input buffers
PRINT 'Input buffer SPIDs'

declare @spid smallint
declare @i_buff_string char(30)
set nocount on
declare bufCursor CURSOR FOR SELECT spid from master.dbo.sysprocesses where spid > 50
FOR READ ONLY
open bufCursor
fetch next from bufCursor into @spid
while (@@fetch_status <> -1)
begin
SET @i_buff_string = ('DBCC INPUTBUFFER (' + convert(char(6),@spid) +')')
PRINT '-> '+@i_buff_string
exec (@i_buff_string)
PRINT ''
fetch next from bufCursor into @spid
end
close bufCursor
deallocate bufCursor

Why do we need the Non-clustered Index

I was asked this question by an SME , So I thought let me pass on if it makes sense .

Lets create a table test which has a few columns from Person.address table (Adventureworks database)
select addressid,addressline1,addressline2,postalcode,stateprovinceid into test from Person.Address

On this table I created clustered index on AddressID and stateprovinceID
CREATE UNIQUE CLUSTERED INDEX [combined index] ON [dbo].[testo]
(
[addressid] ASC,
[stateprovinceid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Another non-clustered index was created with include option
CREATE NONCLUSTERED INDEX [provinceID] ON [dbo].[testo]
(
[postalcode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]


*** BUT I WILL DISABLE THE INDEX ********

Now I ran a few queries:
-----------------------------------
select * from test where postalcode >'98501'



select * from test where postalcode >'98501' order by stateprovinceid



select postalcode from testo where postalcode >'98501' order by stateprovinceid




select addressline1,addressline2 from testo where postalcode >'98501' order by stateprovinceid



Notice that In all the cases above we have the clustered index Scan and not a simple SCAN .
So the clustered index is being used .And since Its visible that a clustered index is being used there is no need of any other index .Why ............because the data itself lies in the clustered index and we can not improve the plan anymore.

Lets see why a scan would be used
Your Data is sorted on the bases of 2 columns in the clustered index .So any query will use that Index because the data lies there .So in worst to worst case you will have a Clustered Index Scan (Better than a Index scan [heap scan]).

But why Scan and not Seek
Because the columns you are using in the where clause or the columns you are looking for as an output have no index or you can say they are not indexed (indirectly , working like a
heap).So , You will do a scan for those Rows .But since its ordered already , So it will be a clustered Scan.

Now you enable the non-clustered index and run few queries

select postalcode from testo where postalcode >'98501'


It did an index seek .remember it is not a Clustered index seek like below

select addressid from testo where addressid < 100



The reason is that In the earlier case a non-cluseterd index was used to find out the postalcode very quickly but since the data was in the clustered index it needed to point to that Leaf and hence some more efforts .But these efforts were far less than Clustered index scan .

That is the reason we need to Non-clustered index .TO reduce the ROW Lookups or Bookmark lookups.which are the pointers to the clustered index.

Happy learning!!

xp_sendmail: failed with mail error 0x80040111

Hi guys,
Its been long since i posted any new stuff ...so far I've posted nothing related SQL 2000 but i thought this one is needed .

We recently faced a SQL Mail issue on WIN 2k Server .Anyone who has worked on SQL Mail issue in the past would be knowing how irritating this issue can be sometimes.But its actually our dis-interest that makes this issue difficult.Somehow we don't like such issues :) ...Since we resolved this issue and went through different permutations and combinations I would like to discuss this with you all so that in future we can save some time and resources on the same issue.This includes

-> Some very basic understanding on SQL mail .
-> What you should do/do not do when SQL Mail issue comes up.

Brief understanding on SQL Mail
--------------------------------
There are two ways you can send mails in SQL Server 2000 (Not talking about 2005 or 2008).

-> SQL Agent Mail
-> SQL Mail .

Yes , you heard it right .SQL Agent uses its own Extended M(ail)API session where as SQL Mail uses its own .However,Both SQL Mail and SQL Agent Mail connect with Microsoft Exchange Server, Microsoft Windows NT® Mail, or a Post Office Protocol 3 (POP3) server using Mail Profile. It is possible to use different profiles for SQL Mail and SQL Agent Mail at the same time.
Extended MAPI is nothing but a better version over Simple MAPI .Anyway, we don't have to go in this direction .

The SPs that we use in SQL Mail are :

xp_startmail <-- used to start the mail session
xp_sendmail <-- To send mail .This invokes sp_processmail.
xp_findnextmsg <--sp_processmail uses this SP
xp_readmail <-- used by sp_processmail
xp_deletemail<--deletes the message.
sp_processmail <-- processses the mail
xp_stopmail<-- Stops the mail session

We will concentrate on First two and the last one in the list above.You can read about these SPs under : http://msdn.microsoft.com/en-us/library/ms186841.aspx?PHPSESSID=lm71lj7i6gj5fjtebtb2srl4n3
This link is for sp_findnextmsg but has link to all other SPs and XPs.


The Issue that we faced.
-------------------------
One of the job was not sending the mail after completion .SQL Mail was connecting to Exchange server successfully before .
Outlook 2000 client was installed on the Win 2K server (and was working fine under SQL Server service account).
Although we concentrated mostly troubleshooting SQLMail it was SQL Agent mail issue actually .
But Since troubleshooting steps are same for both , we resolved the issue .

TroubleShooting done

-> Logged in to the Server > opened the QA and ran sp_stopmail .Success
-> Sp_startmail ....Took 10 mins to start and with error.
-> The error was :
Server: Msg 18025, Level 16, State 1, Line 0
xp_startmail: failed with mail error 0x80040111
-> We tried to send the test mail using xp_sendmail
xp_sendmail 'abhay.chaudhary@in.ibm.com','test'
Server: Msg 18025, Level 16, State 1, Line 0
xp_sendmail: failed with mail error 0x80040111
-> Assuming that the Profile might be corrupt we created nwe profile through control panel >> Mail option .
-> To our surprise we could not see that profile in the drop down list of both SQL Mail and Agent properties.
-> So we failed there as well .We have the new profile , but we cant use it .Another Big question ? WHY ?
-> Agent logs were showing this :[260] Unable to start mail session (reason: Unable to logon (reason: Unable to get the default MAPI Message Store due to MAPI
error 273: The information store could not be opened); check the mail profile and/or the SQLSERVERAGENT service startup account)
-> After wasting time on 18025 we concentrated on Error 0x80040111
-> Found KB 238119 (List of Extended MAPI numeric result codes)
-> found that the error means : MAPI_E_LOGON_FAILED
-> Now we were clear that this is an authentication issue.
-> SQL Agent mail was not able to talk to Exchange Server.
-> The exchange Engineer cleared that since both the servers can ping each other and we can use outlook web using the same account under Exchange is running , there is no isues with
Exchange Mail box .
-> He was pretty convincing.
-> Then we found that (Actually we already knew that )in the SQL Mail configuration it was mentioned that we need to make sure that in we are logged in to the server using the SQL Server
service account.
-> That is what we have not done and were not able to do actually (Since begening) because of some permission issues on that account while logging on to the Win 2k Server.
-> Since rest all trials were failing , we decided to force ourselves for one last time in resolving this issue after loggin gon tho the server using SQL Server domain account .
-> The Intel team helped us in doing that .
-> We logged on to the server >> created a new profile >> Changed the SQL Agent setting to point to that profile (which was now visible..wow).
-> Started the mail session using xp_startmail.
SQL Mail session started.
-> Sent a test mail : xp_sendmail 'abhay.chaudhary@in.ibm.com','test'
Mail sent.
-> That resolved the issue.

Do's to solve this issue
--------------------------
-> Ping the exchange server or the POP3 Server and make sure its a success.
-> Make sure that the Domain account for SQL Server(including agent) , Exchange and OS are same .
-> Stop the mail session first as it might be Stuck.
-> Start the mail session .If it works , fine .
-> If not then read the MAPI error and match it with the KB 238199 to understand the root cause .
-> In one of the situations when I was testing it on Lotus Notes to POP 3 account on Gmail I got 8004011d which means MAPI_E_FAILONEPROVIDER (due to slow network)
-> Check the Agent logs and make sure MAPI32.dll exists (Using MAPI32.DLL from C:\WINDOWS\SYSTEM32 (version 1.0.2536.0))
-> Make sure the mapistub.dll exists in C:\WINDOWS\SYSTEM32 folder
-> You can also test you MAPI profile : xp_test_mapi_profile 'hello'
-> Create the new profile and see if that helps .If it does it means that the previous profile was corrupt(We did not test this in our case and created a new profile straightaway).

Don'ts
---------
Never try to resolve this issue without logging on to the server with SQL Server service account .

Hope this helps .
let me know your comments ..

Tuesday, January 5, 2010

All about locking , blocking and deadlocking ....Part 5

This is the last one in this series .Let me know if you have any comments : Good or Bad (",)

How to reduce blocking and deadlocks:
I can give you one answer from my own and then we will see more : Do not commit the stupidity that I did in the example in the previous post (deadlock example).

As a DBA
-> Make sure that there are proper Indexes on tables and that they are actually being used.sometimes we feel relaxed after creating the indexes .But the query might be written in such a way that , that particular index might not be used and we might be doing scan instead of seek .One Such example is not using SARGable arguments (Now dont ask whats that ..we will discuss some other day or refer BOL).

-> Make sure you rebuild the indexes regularly based on the fragmentation level (you might be having so many scripts to do that).Do not bother to rebuild if the number of pages are less than 1000.rebuilding index will recompute the stats

-> Check if the Query is doing parallelism .If yes then Try everything else before running the query with maxdop hint.Many a times under CPU pressure the ECIDs have to wait on CPUs as its been taken by other threads.Thus causing blocking and or deadlocking .

-> Although the auto-update stats is ON by default.It does not mean that you should not update the stats .I will not cover this fact in detail here .Check auto-update statistics in BOL and you will know a lot ....trust me .

-> Any query that is going to do sorts in tempdb will surely take a lot of time as compared to in database sorting .For more on this check the facts on my Blog entry http://ms-abhay.blogspot.com/2009/11/sorts-spills-in-tempdb-due-to-incorrect.html

-> make sure that you are not facing disk IOs contention even after following steps .Perfmon will show high Disk queue length .Normally DQL of 2 per spindle is OK

-> Check on what resource the query is waiting using sysprocesses .It might be Network IO as well .

-> When you do a bulk insert disable the indexes and enable them during the time of least activity so that it can catch up to the new rows and update the stats accordingly .Or better go for rebuilding it after the bulk insert or delete.

-> Do not play with the locks option in sp_configure .I would suggest to keep the default setting of 0.

-> Deadlocks can occur due to memory pressure .When concurrent requests are waiting for memory grants that cannot be satisfied with the available memory,a deadlock can occur.For example,two concurrent queries,Q1 and Q2,acquire 10MB and 20MB of memory respectively.If each query needs 30MB and the total available memory is 20MB,then Q1 and Q2 must wait for each other to release memory.This results in a deadlock.


As a Developer

-> Use Temp tables instead of temp variables .Or test both and see which one works better .There is a great debate outside on this topic.

-> Use as less Adhoc queries as possible .There might be situations where too many adhoc queries kicks off the cached plans of SPs and in turn those SPs will
again be recompiled.If you want to use Adhoc queries use EXEC or sp_executesql and the plans will be cached.E.g. exec ('select * from t2') .First time it took 10 seconds .Second time it took 5 seconds.

-> Keep Small batches and transactions .Do not run a lot of transactions from one connection or SPs and try to achieve everything in one go.

-> Try to avoid Sorts in tempdb by creating and calling the temp table from the same SP and not from other SP .Google or Bing on SORT IN TEMPDB

-> Its not always true to say that more tempdb files will improveperformance .Your SAN should be capable enough to do that .

-> Choose Isolation level as per requirement .If you can live with read-uncommitted , go for it .

-> Its not bad to go for Snapshot isolation level but it will stress your tempdb.

-> Keep your table normalized so that you use the DISTINCT clause as less as possible .But make sure you do not over normalize the tables else you will be
ending up making your query more complex to understand for optimizer as it will have more joins .

-> One of the most important reasons for blocking and deadlocking could be lock escalation .If you have a slightest doubt that your query is going for lock
escalation , use Lock:Escalation event in profiler and see if its actually happening .If you find it happening , check if the same SPID is the blocker or
being blocked as well .you can disable lock escalation by enabling trace flag 1211 But I would not suggest that.To avoid lock escalation modify your query
to take less locks.

Tip :
Mostly I have seen that, if the deadlocking issue comes right after the application goes live then its the application fault (the way it synchronizes the
access to the data) .But if the application owner says that everything was fine and he is facing the deadlocks all of a suden then some major data change
might have happened in the tables that might be causing blocking and deadlocking .One such cause is " Parameter Sniffing ".

Sometimes we do a select in such a way that we crosses the threshold of in-database sort and the sort goes to tempdb
(http://ms-abhay.blogspot.com/2009/11/sorts-spills-in-tempdb-due-to-incorrect.html) .In this case we need to keep on testing the query until we find the threshold where the query spills the sorts in the tempdb and reduce the value of the column in the where clause.

Both Developer and DBA has to work together in resolving such issues.


SET DEADLOCK_PRIORITY Option

Sometimes , you might wish that in any case a particular transaction should not be treated as a victim in the event of deadlock.
In that case you can set the deadlock_priority of the trasnaction higher than others.An example could be a case of an update and select , you might be ready
to rollback the select but you want the update to be successful .

Which session is chosen as the deadlock victim depends on each session's deadlock priority:
-If both sessions have the same deadlock priority, the instance of SQL Server chooses the session that is less expensive to roll back as the deadlock victim.
For example, if both sessions have set their deadlock priority to HIGH, the instance will choose as a victim the session it estimates is less costly to roll
back.

-If the sessions have different deadlock priorities, the session with the lowest
deadlock priority is chosen as the deadlock victim.

Unresolved Deadlocks
You need not to bother about it .If it happens open a case with Microsoft as a prooduct bug .Either they have the hotfix or they dont .
In the error log you will find the deadlock graph and the Dump information .

*******************************************************************************
2007-09-12 11:43:53.51 spid4s *
2007-09-12 11:43:53.51 spid4s * BEGIN STACK DUMP:
2007-09-12 11:43:53.51 spid4s * 09/12/07 11:43:53 spid 4
2007-09-12 11:43:53.51 spid4s *
2007-09-12 11:43:53.51 spid4s * Unresolved deadlock
2007-09-12 11:43:53.51 spid4s *
2007-09-12 11:43:53.51 spid4s *
2007-09-12 11:43:53.51 spid4s *
*******************************************************************************
You will also see Cost:(N/A) and deadlock victim=process0 in the deadlock graph.


Few references
Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005
http://technet.microsoft.com/hi-in/library/cc966425(en-us).aspx

How to resolve blocking problems that are caused by lock escalation in SQL Server
http://support.microsoft.com/kb/323630

All about locking , blocking and deadlocking ....Part 4

continuing from the previous 3 entries ............

Blocking
So far we have executed many queries and seen blocking as well.But I would still like to discuss blocking a bit more ...
Use sysprocesses DMV/Table to find blocking .This has 3 advanages .One that you will see only the blocking SPIDs and not all the SPIDs by using sp_lock.
Two , you will be able to see head of the blocking chain. Three , you wil be able to see on what resource the SPID is waiting for .Once you have all the knowledge above you can surely use sp_lock to get some more information .

Lab

1st User
set transaction isolation level serializable
begin tran
update t1 set c1 = 3000 where c2 < 3000
--rollback

2nd user
set transaction isolation level serializable
begin tran
select * from t1

Now check the sysprocesses DMV (I am using 2005) :
select * from sys.sysprocesses where blocked <>0

You will notice that lastwaittype is LCK_M_IS
This means that the SPID which is being blocked is actually waiting for an IS lock to grab but cannot and being blockd by another spid.
The SPID that is blocking is shown in the "Blocked" column .
You can cross check by running sp_lock .

Now if you run one more query :
set transaction isolation level serializable
begin tran
select * from t1

You will notice the now there are 2 rows in the sysprocesses DMV where blocked <>0
here you will see :

SPID A is blocked by SPID B --> SPID B is blocked by SPID C
So who is the actual culprit ..yes its C and is called as the head of the blockign chain .


Now a tough question for you :
Why SPID A is blocked by SPID B and not C ? .....Think :) ..


Deadlocks
you have already seen blocking ..
Lets try our hands at creating a deadlock.I am not going to explain what is a deadlock .

For the benefit of all I will keep the Isolation level to serializable .
It will help us in creating deadlocks easily .

Lab
First do this :
select count(*) from t2
select count(*) from t1

If the number of rows are same then we are good to go .
If not, then drop the table with less number of rows and recreate it with the help of another table .
How to Do it .......check previous updates ...



First Window :
set transaction isolation level serializable
go
begin tran
update dbo.t3 set c2 = 230 where c2 > 1000 and c3 > 700

Second Window :
set transaction isolation level serializable
go
begin tran
update dbo.t2 set c3 = 230 where c2 > 1000 and c3 > 700


First Window :
begin tran
select * from t2
--rollback

Second Window :
begin tran
select * from t3
--rollback


You will see that the SPID in second window will be the deadlock victim:

Msg 1205, Level 13, State 56, Line 2
Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.


Like this you can create many deadlocks

How to capture the deadlock Graph
There are 2 ways :

- Use Traceflag 1204 <-- for 2000 and 1222 <-- for 2005 and above.
- Use WMI class deadlock_graph to capture the graph for SQL Server 2005 and above(for 2000 its a bit tricky .Hence not mentioning it here.)


Using TraceFlag 1222
dbcc traceon(1222 , -1)
go

Create the deadlock (I am creating the same one as mentioned above)

Using WMI Class and SQL Server Agent
Please go through one of my previous post : http://ms-abhay.blogspot.com/2009/09/wmi-alert-for-deadlocking-no-need-to.html

How to read the deadlock graph
Check the Error logs and copy the deadlock information on a separate notepad window.
You can find my analysis below:

deadlock-list
deadlock victim=process968d48
process-list
process id=process968d48 taskpriority=0 logused=543996 waitresource=OBJECT: 8:629577281:0 waittime=3359 ownerId=29121 transactionname=user_transaction lasttranstarted=2009-12-28T15:40:39.240 XDES=0x3e8f838 lockMode=IS schedulerid=2 kpid=6124 status=suspended spid=52 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2009-12-28T15:40:56.500 lastbatchcompleted=2009-12-28T15:40:39.413 lastattention=2009-12-28T15:31:16.907 clientapp=Microsoft SQL Server Management Studio - Query hostname=abchaudh hostpid=1336 loginname=abchaudh\Abhay isolationlevel=serializable (4) xactid=29121 currentdb=8 lockTimeout=4294967295 clientoption1=671090784 clientoption2=390200
executionStack
frame procname=adhoc line=2 stmtstart=26 sqlhandle=0x02000000bea8742faf157cd9e8e65d380d705fefe8dfb182
select * from t3
inputbuf
begin tran
select * from t3
process id=process969018 taskpriority=0 logused=576116 waitresource=OBJECT: 8:597577167:0 waittime=10000 ownerId=28599 transactionname=user_transaction lasttranstarted=2009-12-28T15:40:34.780 XDES=0x3e8f290 lockMode=IS schedulerid=2 kpid=8140 status=suspended spid=53 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2009-12-28T15:40:49.873 lastbatchcompleted=2009-12-28T15:40:34.907 lastattention=2009-12-28T15:27:27.483 clientapp=Microsoft SQL Server Management Studio - Query hostname=abchaudh hostpid=1336 loginname=abchaudh\Abhay isolationlevel=serializable (4) xactid=28599 currentdb=8 lockTimeout=4294967295 clientoption1=671090784 clientoption2=390200
executionStack
frame procname=adhoc line=2 stmtstart=26 sqlhandle=0x02000000b7cdeb20804433e7571f29d85516ff688d005724
select * from t2
inputbuf
begin tran
select * from t2



resource-list
objectlock lockPartition=0 objid=597577167 subresource=FULL dbid=8 objectname=deadlocktest.dbo.t2 id=lockcc7f700 mode=X associatedObjectId=597577167
owner-list
owner id=process968d48 mode=X
waiter-list
waiter id=process969018 mode=IS requestType=wait
objectlock lockPartition=0 objid=629577281 subresource=FULL dbid=8 objectname=deadlocktest.dbo.t3 id=lockcc7f780 mode=X associatedObjectId=629577281
owner-list
owner id=process969018 mode=X
waiter-list
waiter id=process968d48 mode=IS requestType=wait




Analysis :
----------
Divide the output into 2 parts : Deadlock list and resource list

process968d48
--------------
logused=543996
waitresource=OBJECT: 8:629577281:0 <-- here 8 is DBID ,629577281 is the object ID and 0 is heap
lockMode=IS
spid=52
sbid=0 <-- Its the Batch ID .It will be 0 until the application is using MARS (check BOL for MARS)
ecid=0 <-- zero means no paralelism is going on (Refer BOL for Query Parallelism)
transcount=2 <-- We are running 2 transaction under this SPID (select and update)
isolationlevel=serializable
currentdb=8
procname=adhoc (select * from t3) <-- Its an adhoc query
inputbuf =
begin tran
select * from t3


process969018
--------------
logused=576116
waitresource=OBJECT: 8:597577167:0
lockMode=IS
spid=53
sbid=0
ecid=0
transcount=2
isolationlevel=serializable
currentdb=8
procname=adhoc (Select * from t2)
inputbuf=
begin tran
select * from t2


Story
---------
-> At the time of deadlock 2 queries were running (under SPID 52 and 53) .
-> The processes were process968d48 for SPID 52 and process969018 for SPID 53
-> process968d48 took X lock on table T2 and requested IS on table T3 ..which was blocked by process969018
-> process969018 took X lock on table T3 and requested IS on table T2.. which was blocked by process968d48
-> deadlock victim is process968d48

Why process968d48 is deadlock victim ?
Because it has less undo to rollback .

How do you know if it has less undo ?
See log used values .


Rest ..........in next blog ....keep looking ...