Private Note: My wife has been diagnosed with stage III ovarian cancer. Please join and help if you can: http://www.facebook.com/group.php?gid=141171588041&ref=nf
Filer General
Console message logging is configured from
/etc/syslog.conf.sample
By default there is no /etc/syslog.conf, but once the user edits the sample file and saves it, they will have
/etc/syslog.conf ----------- which tells the filer where to direct
console messages ( typically
/etc/messages )
sysconfig -t ( tape information )
source -v /etc/rc - reads and executes, line by line, any file
containing filer commands
Autosupport ( user-triggered autosupport )
options autosupport.doit autosupport@netapp.com
Telnetting to Filer
Only one user can telnet at a time
options telnet
Autosupport Configuration
Filer> options autosupport
autosupport.mailhost < >
autosupport.support.to < autosupport@netapp.com >
autosupport.doit <string>
autosupport.support.transport https or smtp
autosupport.support.url < url address must be reachable >
Autosupport troubleshooting
1. ping netapp.com from filer
2. TCP 443 ( SSL ) should be open at the SMTP server
The SMTP server may sit on the DMZ side
3. Mail relay must be configured in Exchange, with the filer's host name or IP address specified in the mail relay. Routing for netapp.com, whether by this host name or by this IP, must be enabled for the filer. The filer acts as an SMTP client. In a typical mail setup relaying is blocked: an SMTP client cannot send mail through the mail server to another SMTP server when the host's identity does not match the mail ID.
4. The http / https proxy server must pass the http url
Raid Scrub weekly
a. raid.scrub.duration 360 - scrub runs for only 6 hrs
b. raid.scrub.schedule sun@01 - forces the scrub to start on Sunday at 1 am
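The two settings above can be sanity-checked with a small script. This is only a sketch: it handles just the "day@hour" form and the minutes value that appear in these notes, and the function names are made up for illustration. The filer itself may accept other schedule forms that this parser does not cover.

```python
# Sketch only: handles just the "day@hour" form used in these notes.
DAYS = ("sun", "mon", "tue", "wed", "thu", "fri", "sat")


def parse_scrub_schedule(value):
    """Split a string such as 'sun@01' into (day, hour)."""
    day, hour_text = value.split("@")
    day = day.strip().lower()
    hour = int(hour_text)
    if day not in DAYS:
        raise ValueError(f"unknown day: {day!r}")
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return day, hour


def scrub_duration_hours(minutes):
    """Convert a raid.scrub.duration value (minutes) to hours."""
    return minutes / 60


print(parse_scrub_schedule("sun@01"))  # ('sun', 1)
print(scrub_duration_hours(360))       # 6.0
```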
RAID group
vol add vol0 -g rg0 2                 adds 2 disks to raid group 0 of vol0
vol options yervol raidsize 16        changes the raidsize setting
                                      of the vol yervol to 16
vol create newvol -t raid_dp -r 16 32@36
                                      - creates newvol with raid_dp protection.
                                      RAID group size is 16 disks. Since the vol
                                      consists of 32 disks, they form 2 RAID
                                      groups, rg0 & rg1
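The arithmetic behind "32 disks with raidsize 16 gives rg0 and rg1" can be sketched as below. This assumes groups are simply filled in order up to raidsize, with any leftover disks landing in a final, smaller group; how the filer actually places leftover disks is not covered in these notes, and the function name is hypothetical.

```python
def raid_group_sizes(total_disks, raidsize):
    """Return the disk count per RAID group, filling groups in order (assumption)."""
    if total_disks <= 0 or raidsize <= 0:
        raise ValueError("total_disks and raidsize must be positive")
    full_groups, remainder = divmod(total_disks, raidsize)
    sizes = [raidsize] * full_groups
    if remainder:
        sizes.append(remainder)  # leftover disks form a smaller last group
    return sizes


for index, size in enumerate(raid_group_sizes(32, 16)):
    print(f"rg{index}: {size} disks")
# rg0: 16 disks
# rg1: 16 disks
```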
Max Raid groupsize
Raid DP 28
Raid 4 14
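A requested raidsize can be checked against the limits listed above before running vol options or vol create. A minimal sketch, using only the two limits from these notes; the dictionary keys and function name are chosen for illustration.

```python
# Limits copied from the notes above; other platforms or releases may differ.
MAX_RAIDSIZE = {"raid_dp": 28, "raid4": 14}


def raidsize_allowed(raid_type, raidsize):
    """True if raidsize does not exceed the listed maximum for raid_type."""
    return 1 <= raidsize <= MAX_RAIDSIZE[raid_type]


print(raidsize_allowed("raid_dp", 16))  # True
print(raidsize_allowed("raid4", 16))    # False
```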
vol options for snapshots
nosnapdir off < default off >
nosnap off < default off >
Disk Fail/unfail
priv set advanced
disk fail
disk unfail
sysconfig -d
When a disk goes bad partially, a prefail copy is seen when sysconfig -r
is done. Sometimes it may just hang there, so disk fail -i <disk name> would
release the disk & reconstruct the RAID group
Disk troubleshoot
priv set advanced
led_on < 1d.16 >
led_off < drive id >
blink_on 4.19 ( failed disk now will be orange )
blink_off 4.19
Spare disks in vol
vol status -s
FAS 270 ( this must be done, otherwise the disk is not seen )
priv set advanced
disk show -v ----- shows who owns the disk. If it has come from
                   another filer, the disk block header needs to
                   be removed. For that
disk unfail <disk id>
disk assign 0b.23
fcadmin device_map
If drive not shown in FilerView
Filer> storage show disk -p
Zeroing disks
priv set advanced
disk zero spares --- zeroes out the data on all spares
sysconfig -r ( shows the % zeroed for the spare disks )
R100 & R150 Disk Swap
1. find bad disk , identify it
2. type disk swap < disk id >
3. Remove disk
4. Wait 20 sec
5. disk swap again
6. insert new disk
7. wait 20 sec to rescan
Out of inodes
All Rights Reserved Copyright @ 2007
1. Check % of inodes used with
Filer> df -i
2. To increase
Filer> maxfiles < vol name > <max>
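The percentage that df -i reports can be reproduced from the used and free inode counts, which helps when deciding on a new maxfiles value. A minimal sketch with made-up numbers; the function name and the sample counts are hypothetical, not taken from a real filer.

```python
def inode_percent_used(iused, ifree):
    """Percentage of inodes in use, given used and free counts."""
    total = iused + ifree
    if total <= 0:
        raise ValueError("volume reports no inodes")
    return 100.0 * iused / total


# Hypothetical counts for illustration only.
print(inode_percent_used(250, 750))  # 25.0
```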
NVRAM
Battery check
Filer> priv set diag
Filer> nv
=> should show battery status as OK and voltage as
NVRAM3 6V
raid.timeout in options raid controls the trigger ( 24 hr ) when the battery is low
In 940s, NVRAM5 is also used as the Cluster interconnect card, “two in one”, in slot 11
Time Daemon
( ports 123, 13, 37 must be open )
When there is a large skew, many messages like this appear:
CfTimeDaemon : displacements /skews:10/3670,10/3670, 11/3670
Because of this, hourly snapshot creation also fails or an "in progress" message appears.
Because timed.max_skew is set to 30 min, the above message may appear every 30 min - 1 hr.
Set this to 5s and watch how the skew develops. If many skew messages appear ( once timed.log is turned on ), motherboard ( MB ) replacement may be required.
As a temporary measure, set
cf.timed.enable on on both cluster filers and watch those errors
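The max_skew idea above is a simple threshold check: a time difference larger than the limit counts as a skew. The sketch below compares two clock readings in epoch seconds against a 30 min limit and against the tighter 5s setting. It does not parse the CfTimeDaemon message, and the names and sample values are hypothetical.

```python
MAX_SKEW_SECONDS = 30 * 60  # the 30 min timed.max_skew value from the notes


def skew_exceeds_limit(filer_time, reference_time, max_skew=MAX_SKEW_SECONDS):
    """True if the two clock readings (epoch seconds) differ by more than max_skew."""
    return abs(filer_time - reference_time) > max_skew


# Hypothetical readings: the filer is 40 seconds behind the reference clock.
print(skew_exceeds_limit(1000, 1040))              # False with the 30 min limit
print(skew_exceeds_limit(1000, 1040, max_skew=5))  # True with the 5s limit
```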
Checking from unix host
# ntptrace -v filername
From filer check
Filer> options timed
( check all the options of this )
From FilerView => set date and time : Synchronize now < ip of NTP server > => do synchronize now, then check NTP from the unix host.
Tip : if the filer has multiple interfaces, make sure they are properly listed in the NIS or DNS server. The same host name with multiple ip addresses may be required.
BPS ( Bytes Per Sector ) of Disk
Block Append Checksum requires each disk drive to be formatted at 520 BPS rather than 512. This provides a total of 4160 bytes in 8 sectors. This space is broken into two parts. The first part is 4096 bytes ( 4K - the WAFL block size ) of file system data. The remaining 64 bytes contain the checksum data for the preceding 4096 bytes. In this manner, the checksum block is appended to each block of data.
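The byte counts above can be verified with a few lines of arithmetic. This sketch uses only the figures quoted in the paragraph; the constant names are chosen for illustration.

```python
SECTOR_BYTES = 520        # bytes per sector with block append checksum
SECTORS_PER_BLOCK = 8     # sectors backing one 4K block
WAFL_BLOCK_BYTES = 4096   # file system data per block

total_bytes = SECTOR_BYTES * SECTORS_PER_BLOCK
checksum_bytes = total_bytes - WAFL_BLOCK_BYTES

print(total_bytes)     # 4160
print(checksum_bytes)  # 64
```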
Environmental Status
The top line for each channel shows failures as yes, if there are any.
Subsequent lines should show
Power
Cooling
Temperature
Embedded switching [ all as none ]
( if there is no problem )
Volume
vol options vol0
vol status vol0 -r ( raid info of volume )
sysconfig -r
vol options vol0 raidsize 9
vol add vol0 <number of disks>
vol status -l ---- displays all volumes
Aggr Volume creation
Filer> aggr create aggr1 10
Filer> vol create log1 aggr1 20g
When a vol has gone bad
vol wafliron start <vol name> -f
To list broken disks in a volume
vol status -f
sysconfig -r will also show the failed disks
Double Parity
vol create -t raid_dp -r 2 ( minimum of two )
( there are two parity disks, holding parity and double parity data )
Environment status - like temp/shelf issues
environment chassis list-sensors
environment dump
RSH options - rsh access to the filer
options rsh.enabled on
The admin host must be added before it can do RSH ( can be done from
FilerView ) - not root. RSH security settings must be set with either ip or hostname, and with a matching username for the logon accounts ( not root, but the domain admin account )
RSH access from unix host
# rsh -l root <console p/w> <ip of filer> “<command>”
( add this unix host to the /etc/hosts.equiv file - similar for a windows host as well )
( this command can be put in cron on unix to run it on a schedule )
RSH Port 514 / TCP
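For scheduled runs from a unix host, the rsh command line can be assembled by a small wrapper. The sketch below only builds and prints the command and does not run it. It assumes the common rsh client form "rsh -l user host command" and leaves the password out entirely; the function name, the 192.0.2.10 address and the sample command are placeholders.

```python
import shlex


def build_rsh_command(filer, user, command):
    """Build an rsh argument list for running one filer command (assumed client syntax)."""
    if not filer or not user or not command:
        raise ValueError("filer, user and command are all required")
    # The filer command is passed as one argument so the shell does not split it.
    return ["rsh", "-l", user, filer, command]


args = build_rsh_command("192.0.2.10", "root", "sysconfig -r")
print(shlex.join(args))  # quoted form suitable for pasting into a cron entry
```

The list form can be handed to subprocess.run as-is once the host is listed in /etc/hosts.equiv on the filer.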
Registry Walk
Filer> registry walk status.vol.<vol name>