Yukon-Katmai Discussion: July 2010

Tuesday, July 27, 2010

Msg 8914, Level 16, State 1, Line 1 -> Incorrect PFS free space information for page

Msg 8914, Level 16, State 1, Line 1
Incorrect PFS free space information for page (1:61991) in object ID 1993058136, index ID 1, partition ID 72057594955366400, alloc unit ID 71906736119218176 (type LOB data). Expected value 0_PCT_FULL, actual value 100_PCT_FULL.

This was the error we were getting in the Docs table of one of the Sharepoint database .The Compatibility level was 80 and Build was 1399 (2005 RTM).

I tried a lot of things on it like :
-> I Rebuilt the clustered index with and without LOB_COMPACTION option .
-> DBCC page shows is fill factor 100
-> Changed the fill factor to 100 explicitly
-> Ran dbcc updateusage
-> changed the compatibility level to 90
-> changed the fill factor to to 99 ,50 etc
Nothing helped .The profiler did not show much (my intention was to know what checkdb is doing internally).

Finally I took the backup of the database and restored it as a test database .It did not give any errors .This means that actually its not a corruption .

On the restored database Ran DBCC checkdb with repair_allow_data_loss .
It fixed the issue without harming the data .Finally Ran the same on the Sharepoint database and it resolved the issue .

Hope this gives you the confidence to run the repair_allow_data_loss for this issue .
But remember , almost every time if you run it with repair_allow_data_loss you will end up loosing the data .So be careful .

This situation was AN EXCEPTION and you can safely use this option of checkdb.

Root cause :
Microsoft says that
the engine (just like OS does which giving pages to processes ) pre-allocates a set of data pages (say X) to the SPID which needs it and marks them as 100% full in PFS assuming that those pages will eventually get filled very soon.It does this to avoid frequently updating PFS page and improving performance.But later when the SPID completes its work in less pages (say X-Y) , these remaining pages are released .However, the remaining pages should be marked again as empty (0_PCT_FULL) which it does not do and hence DBCC CheckDB reports those errors (SQL 2000 silently use to fix it ).Repair_allow_data_loss will fix it with no data loss actually.

Regards
Abhay

Monday, July 26, 2010

Finding the last date when the LOG/FULL/DIFF/FILEGROUP backup was taken for all the databases

Hi Guys ,
While creating a few scripts , a requirement came where I had to find the last backups (all types) taken for all the databases (except tempdb) .
Please find the script below .Hope it helps you in your daily activities .If you want to automate it for all the instances in your environment , please let me know and I can send you some more files.

/*
Script : Last_bckp.sql
Author : Abhay Chaudhary,
Date : 26th JUL, 2010
Purpose : Collecting SQL Server 2000/2005/2008 last backup taken information.
Requirements : Do a CTRL+F and change the to the DB where you want to
create the object.
Suggestions : hi_abhay78@yahoo.co.in
Version : 1.0
*/

USE
set nocount on
if not exists (select * from ..sysobjects where name ='bckp_types' and type ='S')
begin
create table ..bckp_types (num int identity(1,1),type varchar(1),bkp_name varchar(20))
insert into ..bckp_types (type,bkp_name) values ('D','Full backup')
insert into ..bckp_types (type,bkp_name) values ('L','Log Backup')
insert into ..bckp_types (type,bkp_name) values ('F','Filegroup backup')
insert into ..bckp_types (type,bkp_name) values ('I','Differential backup')
end
go

Declare @loop int
select @loop= max(num) from bckp_types
While (@loop !=0)
begin
Select 'last ' + bkp_name +' taken details.' from bckp_types where num=@loop
declare @bk_type varchar(1)
select @bk_type = type from bckp_types where num=@loop

SELECT s.name 'database Name',
b.backup_finish_date 'last backup date',
bmf.physical_device_name 'location of backup'
FROM master..sysdatabases s LEFT OUTER JOIN msdb..backupset b ON s.name = b.database_name
INNER JOIN msdb..backupmediafamily bmf ON b.media_set_id = bmf.media_set_id
WHERE s.name <> 'tempdb'
AND b.backup_finish_date = (SELECT MAX(backup_finish_date)
FROM msdb..backupset
WHERE database_name = b.database_name
AND type = @bk_type)
ORDER BY s.name

set @loop=@loop-1
end
go
Drop table ..bckp_types

Happy Learning ...
Abhay

Friday, July 16, 2010

SQLServer Error: 848, SQL Network Interfaces: The system detected a possible attempt to compromise security.

We faced a strange but simple issue yesterday and as usual I would like to share it with you .

Situation :
-------------
SQL server 2005 SP2
Windows Server 2003 SP2
Cluster : Yes 2 node A-P cluster

Service account of SQL Server Agent service and SQL Server service were same .SQL Server is Clustered .

While SQL Sevrer as well as agent were running fine the account under both the services are running ,got locked(we came to know this later as a rootcause of this issue).Still ,everything was fine and there was no issue since the account got locked after SQL Server and agent were started.

Then we found that all the jobs that were scheduled stopped working .In the job history we found that there is no JOB HISTORY created since the jobs stopped working .But there was not a single failure of the jobs .

Which means that the jobs were not scheduled by the Job schedular >> to the Threads >> to the SPIDs .So , we manually executed the jobs and all of them completed successfully .But again , there was no history being created and those jobs were not doing anything .For example , the backup job was running successfully when we ran it explicitly but no backups were taken .

To drill down further , we ran the commands under the jobs in QA and those were running fine .We created new jobs and there was no change at all in the situation .

Then we checked the SQL Agent logs and found this :

[298] SQLServer Error: 848, SQL Network Interfaces: The system detected a possible attempt to compromise security. Please ensure that you can contact the server that authenticated you. [SQLSTATE HY000]
[298] SQLServer Error: 848, Cannot generate SSPI context [SQLSTATE HY000]
[382] Logon to server '(local)' failed (ConnAttemptCachableOp)

This was strange to us as why the connectivity error is not being displayed when we were explicitely executing the job, which completes successfully and doing nothing.
But since it was the connectivity error by agent , we decided to run the jobs by logging on to the server using the account under which SQL Server and agent are running.

We then found out the the account was locked under which SQL Server and agent were still running.

Once the account got unlocked at the AD ,the jobs ACTUALLY started working .

To me it looks like a bug in design and i have logged it on the CONNECT :
https://connect.microsoft.com/SQLServer/feedback/details/575388/strange-behaviour-in-sql-agent-job-on-cluster-where-the-job-runs-but-does-not-do-anything

hope it helos you to resolve your issue .

Yukon-Katmai Discussion

Search This Blog