I've noticed that SAN admins like to say, "just reboot the host and it will see the new WWNs and devices"...
Well, guess what: Unix hosts (and Linux too, I guess) should need to be rebooted about once a decade if they are set up right. (Microsoft is making efforts in this area too...)
I should not need to reboot for something as trivial as adding a new bit of storage. This is supposed to be enterprise equipment after all.
I get pretty cut up about Fibre Channel SANs lacking what I would consider pretty basic functionality.
Basically, every SAN device has a unique ID, right? Just like Ethernet. A SAN is a packet-based network for storage... so why can't it be used like Ethernet?
Imagine if your server vendors told you that every time you added another device to the IP network, you had to reboot every machine you wanted to talk to it.
I am not talking two-bit SAN here either. I have the same problems with HDS USP-V, EMC Symmetrix, EMC CLARiiON, and Cisco MDS 9513 Directors ("Director" sounds so much classier than "switch", let's double the price...). Multi-million-dollar equipment.
OK, rant over... here comes the useful bit. Apologies if my tech terms are a bit off; only the SAN guys really care about front-end ports, back-end directors, WWNs, etc. Most Unix admins just want to know how to get to the point where the host sees the disk so they can start building volumes and filesystems.
Option 3 below has worked for me every time; I have yet to find a Solaris 10 box that it didn't work on.
I came to the conclusion that a reboot should not be necessary; all I needed to do was find out what the server does when initializing the HBA (host bus adapter; again, just a classy name for a SAN network card, which costs about $4k).
So here are some Solaris 10 commands that I have used to detect new SAN equipment that is zoned to the host. (I have been mostly working on Solaris for the last year, not out of preference but for commercial reasons: my employer decided to use Sun rather than another vendor.) I guess there should be similar functionality on AIX and HP-UX, but I haven't yet had a need to find out...
I assume you are root for all of the below. If you don't know what that means, then I can't help you much... reboot the server (gently) like the SAN folks told you.
Also, I use ksh and am too lazy to port to other shells.
There are 3 basic scenarios:
1. Adding a LUN from the same SAN, the same front-end ports (WWNs), and the same SCSI target that is already hosting LUNs you are using on the Solaris server:
In this case you have already established the HBA<->SAN communication and you are just registering a new LUN.
Just running "devfsadm" should be enough to let you see the new LUN(s).
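If the box already has a lot of LUNs it can be hard to spot what just showed up, so I sometimes snapshot the device list before and after. A small sketch of the trick (the /dev/rdsk path is the Solaris location; the snapshot filenames are just examples I made up):

```shell
# Snapshot the raw-disk device list before and after devfsadm so any
# freshly created LUN device nodes stand out. /dev/rdsk is where
# Solaris keeps them; the before/after diff is the actual trick.
before=/tmp/rdsk.before
after=/tmp/rdsk.after

ls /dev/rdsk > "$before" 2>/dev/null || : > "$before"

if command -v devfsadm >/dev/null 2>&1    # only exists on a Solaris host
then
    devfsadm
fi

ls /dev/rdsk > "$after" 2>/dev/null || : > "$after"

# Lines only in the "after" snapshot are the new device nodes.
comm -13 "$before" "$after"
```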
2. Adding new LUNs from the same SAN and front-end ports but a different SCSI target:
Here you need to probe the controllers to see if they can register new devices:
#! /bin/ksh
fcinfo hba-port | grep "^HBA"| awk '{print $4 }' | while read ln
do
fcinfo remote-port -p $ln -s >/dev/null 2>&1
done
devfsadm
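For reference, "fcinfo hba-port" prints a line starting "HBA Port WWN:" for each port, which is why the grep/awk pipeline above pulls field 4. A quick sanity check on some captured sample output (the WWNs here are made up):

```shell
# Hypothetical fcinfo hba-port output; only the "HBA Port WWN:" lines
# matter to the pipeline, the rest is indented per-port detail.
sample='HBA Port WWN: 10000000c9a1b2c3
        OS Device Name: /dev/cfg/c2
        State: online
HBA Port WWN: 10000000c9d4e5f6
        OS Device Name: /dev/cfg/c4
        State: online'

# Same extraction as the script above: keep the header lines and print
# the 4th field, i.e. the port WWN itself.
printf '%s\n' "$sample" | grep "^HBA" | awk '{print $4}'
```

Each WWN that drops out is then fed to "fcinfo remote-port -p" to make that HBA port probe its targets.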
3. Lastly, for new SAN arrays you have to find all the HBAs and force them to do a Link Initialization Protocol (LIP) so that they start talking to whatever is out there. This is what the host does on reboot. I have done this on production hosts with no ill effects, but it is possible it could cause some interruption to existing data flow, so use it with care and only if you MUST. From experience there is a medium risk of filesystem corruption.
Find which controllers are present:
cfgadm -la | grep fc-fabric
Force a Link Initialization Protocol (LIP) for each controller you find:
luxadm -e forcelip /dev/cfg/c2
****Stop. Wait 5 minutes, get a coffee; it can take time for the HBA to come back up and for mpathadm to recover the paths via the controller you reset. If you don't wait, you will likely lose all good paths to the LUN, which gives an I/O error and will likely corrupt your filesystem. Very BAD.****
luxadm -e forcelip /dev/cfg/c4
Check what the HBAs can see (if you want to):
luxadm -e dump_map /dev/cfg/c2
luxadm -e dump_map /dev/cfg/c4
Go check the newly discovered WWNs for LUNs:
#! /bin/ksh
fcinfo hba-port | grep "^HBA"| awk '{print $4 }' | while read ln
do
fcinfo remote-port -p $ln -s >/dev/null 2>&1
done
Configure the LUNs onto your server:
cfgadm -c configure c2
cfgadm -c configure c4
Check if you got them:
cfgadm -la | grep c2
cfgadm -la | grep c4
Install the disk devices into Solaris:
devfsadm
You can then do what you need via format, Solaris Volume Manager, Veritas, or whatever.
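If you want to script option 3 rather than typing the c2/c4 commands by hand, the controller list can be pulled straight out of cfgadm. A hedged ksh sketch: the sample cfgadm output and the 300-second settle time are my assumptions, and the luxadm/cfgadm/devfsadm calls are guarded so the parsing can be eyeballed off-box:

```shell
#! /bin/ksh
# On a real host, replace the sample with:  cfgadm -la | grep fc-fabric
# The made-up sample here just makes the parsing visible anywhere.
fabric='c2     fc-fabric    connected    configured   unknown
c4     fc-fabric    connected    configured   unknown'

# The first field of each fc-fabric line is the controller name (c2, c4, ...).
controllers=$(printf '%s\n' "$fabric" | awk '/fc-fabric/ {print $1}')

for c in $controllers
do
    if command -v luxadm >/dev/null 2>&1    # only exists on a Solaris host
    then
        # Force the LIP, then WAIT for mpathadm to recover the paths
        # before touching the next controller (see the warning above).
        luxadm -e forcelip /dev/cfg/$c
        sleep 300
        cfgadm -c configure $c
    fi
done

if command -v devfsadm >/dev/null 2>&1      # only exists on a Solaris host
then
    devfsadm
fi

echo $controllers
```

The point of the guard-and-wait structure is that you reset one controller at a time, keeping the other path alive, which is what protects the filesystems.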