Leggo My Craiggo: February 2009

Friday, February 6, 2009

SQL Cluster configuration and moving the MSDTC.

Over the last 5 or 6 days we have been dealing with a mess of a SQL cluster here at the office. After speaking at length with MS about the issue, we needed to perform the following actions:

Move the entire SQL Cluster group along with the MSDTC resource to the cluster group.

Fix the dependencies on the MSDTC resource.

Add a new clustered instance of SQL that can fail-over between all 3 nodes.

According to MS, having the MSDTC resource depend on the SQL drive is NOT a supported configuration. The MS SQL team and the MS Windows Cluster team both had to be involved in case things went bad during the move. MS wouldn't allow me to perform these cluster changes myself because of the possibility of the whole thing going south.

So we moved the MSDTC and SQL group. Then we deleted the MSDTC resource and created a new one, and deleted the folder that had been on the SQL data drive. We made the new MSDTC dependent on the Quorum drive, so at this point MSDTC was all set. From there we moved all the SQL stuff to the right place in the first SQL cluster instance.
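The dependency change can be pictured as a small graph check. This is only a minimal sketch in Python: the resource names are hypothetical stand-ins rather than the real names from our cluster, and the dictionaries are a hand-built model, not something read from the cluster itself.

```python
from typing import Dict, List, Set

def depends_on(deps: Dict[str, List[str]], resource: str, target: str) -> bool:
    """Return True if `resource` depends on `target`, directly or through other resources."""
    seen: Set[str] = set()
    stack = list(deps.get(resource, []))
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, []))
    return False

# Hypothetical model of the layout before the change: MSDTC hangs off the SQL data drive.
before = {"MSDTC": ["SQL Data Drive"], "SQL Server": ["SQL Data Drive"]}

# Hypothetical model after the change: MSDTC depends on the Quorum drive instead.
after = {"MSDTC": ["Quorum Drive"], "SQL Server": ["SQL Data Drive"]}

print(depends_on(before, "MSDTC", "SQL Data Drive"))
print(depends_on(after, "MSDTC", "SQL Data Drive"))
```

The first check flags the configuration MS told us was unsupported; the second shows the dependency is gone once MSDTC points at the Quorum drive.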

Then we installed the second instance and then upgraded it to fail over between all 3 nodes of the cluster.

Now that all the cluster stuff was fixed, we had to move TEMPDB, the SQL DBs, the T-logs and the backups. From there we were good. Since we were on an EMC array, PowerPath FULL was then installed on all 3 nodes to ensure that connectivity via fibre-channel was as reliable as it could be.
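For anyone wondering what the TEMPDB move looks like, here is a rough sketch, not the exact script we ran. It is a small Python helper that only builds the T-SQL text; it does not connect to SQL Server. The target folders are made up for illustration, and it assumes the default logical file names tempdev and templog.

```python
from typing import List

def tempdb_move_statements(data_dir: str, log_dir: str) -> List[str]:
    """Build the ALTER DATABASE statements that point tempdb at new folders."""
    # Assumes the default logical file names (tempdev, templog) are still in use.
    data_path = data_dir.rstrip("\\") + "\\tempdb.mdf"
    log_path = log_dir.rstrip("\\") + "\\templog.ldf"
    return [
        f"ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = '{data_path}');",
        f"ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = '{log_path}');",
    ]

# Hypothetical target folders, for illustration only.
for statement in tempdb_move_statements(r"T:\TempDB", r"L:\TempDBLogs"):
    print(statement)
```

The new tempdb location only takes effect after the SQL Server service is restarted, so on a cluster this is something to plan a window for.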

Since it was 5am at this point and I had been working all night, I calmed down and wrapped everything up. Then, as quickly as it started, it was all over and I could get some sleep :). LOL

Wednesday, February 4, 2009

Be BOLD!

In my employment history I have run into a number of situations that I wanted to discuss with other professionals, mainly because true SPARKs or TALENTs understand and appreciate being in similar types of situations.

I started asking myself the following questions:

If I could go back in time and speak to myself 10 years ago, what would I tell myself?
Would I discuss world-views and big picture corporate understanding?
What tidbits that I know now would I tell myself?

These are all good questions. If you are a young person, or really ANY person, who wants a deeper understanding of how to circumvent the "Old Guard", then keep your eyes peeled.

Go to: The Corporate Bold Homepage.

There you will read about a book that could help many people. I have been selected as a co-author. The book is slated to come out this summer. It is a collection of 100 short experiences. I can't give you any more information, but realize that it's 100 experiences and pieces of truth from 100 Top Performers.

If you have questions visit the site and ask them.

More SQL Cluster stuff to come soon. :)

Monday, February 2, 2009

SQL Cluster Storage Changes.

This past weekend we had a client environment that needed to move from Microsoft iSCSI to Fibre for SAN connectivity. The reasons for this are numerous.

Fibre connectivity has greater throughput.
The Microsoft iSCSI initiator is junk for a clustered environment. (When installing 2.07, the cluster nodes would intermittently lock up and lose resources; this is a known issue with 2.07. We updated to 2.08 and still had issues.)
PowerPath on Fibre has proven to be solid numerous times.
Less complexity.
Greater reliability.

We ran into some issues, though, and I didn't see any direct posts about them on the web or on TechNet, so I will detail the issue and the solution here.

Specs:

Servers: HP DL380G5; Windows 2003 R2 Enterprise 32-bit; SQL Enterprise 32-bit. Current cluster configuration is Active/Passive. Initial storage configuration is iSCSI. Moving to Fibre HBAs.

Issue:

After installing a dual-port Fibre HBA card and zoning it to the storage, we uninstalled the MS iSCSI initiator and disabled the iSCSI NIC ports on the server. Upon rebooting we attempted a full failover. The cluster group would fail over with the Quorum, but the SQL instance would fail over the IP resources and the T-log drive and then hang. The result was that a full failover was not possible, which in an environment that HAS to be online is not a good thing. The event ID would reference not being able to flush the transaction log.

Solution:

After reboots and trying to see if we missed anything, these are the steps that we followed to resolve the issue:

Uninstall the NICs used for iSCSI from Windows.
Remove the physical adaptor used for iSCSI from the server.
Reboot.

Once these steps were completed a full failover was again possible.
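One simple way to confirm a full failover is to check that every resource in the group reports Online on the target node. Below is a minimal sketch of that check in Python. It assumes the resource states have already been collected into a dictionary, by hand or with whatever tool you prefer, and the resource names and states shown are hypothetical examples, not output captured from this environment.

```python
from typing import Dict, List

def resources_not_online(states: Dict[str, str]) -> List[str]:
    """Return the names of resources whose state is anything other than Online."""
    return sorted(name for name, state in states.items() if state != "Online")

def failover_complete(states: Dict[str, str]) -> bool:
    """A failover counts as complete only when every resource is Online."""
    return not resources_not_online(states)

# Hypothetical snapshot of a failover that has stalled part way through.
stalled = {
    "SQL IP Address": "Online",
    "T-Log Drive": "Online",
    "SQL Server": "Online Pending",
}

# Hypothetical snapshot after the fix.
healthy = {name: "Online" for name in stalled}

print(failover_complete(stalled), resources_not_online(stalled))
print(failover_complete(healthy), resources_not_online(healthy))
```

Anything left in the list after a test failover is where to start looking.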

If you have any questions or concerns, or if you are having this issue yourself, please feel free to comment. We will do our best to assist you.
