On Oracle M5000 SPARC servers, hardware faults reported by fmadm can be
cleared by applying the steps below in order. First, list the faults recorded
on the server:
root@FENERBAH2:~$ fmadm faulty -a
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
May 30 2012     ef9dfccb-37bc-63d1-f8df-fe6031378ca8  PCIEX-8000-3S  Critical
Host        : FENERBAH2
Platform    : SUNW,SPARC-Enterprise   Chassis_id  : GALATA2
Product_sn  :
Fault class : fault.io.pciex.device-interr max 50%
              fault.io.pciex.bus-linkerr 25%
Affects     : dev:////pci@12,600000/pci@0
              dev:////pci@12,600000
                  faulted but still in service
FRU         : "iou#1-pci#3" (hc:///component=iou#1-pci#3)
                  faulty
Description : A problem has been detected on one of the specified devices or on
              one of the specified connecting buses.
              Refer to http://sun.com/msg/PCIEX-8000-3S for more information.
Response    : One or more device instances may be disabled
Impact      : Loss of services provided by the device instances associated with
              this fault
Action      : If a plug-in card is involved check for badly-seated cards or
              bent pins. Otherwise schedule a repair procedure to replace the
              affected device(s). Use fmadm faulty to identify the devices or
              contact Sun for support.
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
May 31 2012     e4b06bcb-e8ed-6f9c-c1cc-ee8216c1ec63  SUNOS-8000-FU  Major
Host        : FENERBAH2
Platform    : SUNW,SPARC-Enterprise   Chassis_id  : GALATA2
Product_sn  :
Fault class : defect.sunos.eft.undiag.fme
FRU         : None
                  faulty
Description : The diagnosis engine encountered telemetry for which it was
              unable to perform a diagnosis. Refer to
              http://sun.com/msg/SUNOS-8000-FU for more information.
Response    : Error reports have been logged for examination by Sun.
Impact      : Automated diagnosis and response for these events will not occur.
Action      : Ensure that the latest Solaris Kernel and Predictive Self-Healing
              (PSH) patches are installed.
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
May 30 2012     4eb0d6f7-19fe-ef16-f047-d81526351d1b  PCIEX-8000-MH  Major
Host        : FENERBAH2
Platform    : SUNW,SPARC-Enterprise   Chassis_id  : GALATA2
Product_sn  :
Fault class : fault.io.pciex.device-interr-unaf
Affects     : dev:////pci@12,600000/pci@0
                  faulted but still in service
FRU         : "iou#1-pci#3" (hc:///component=iou#1-pci#3)
                  faulty
Description : Too many recovered errors have been detected, which indicates a
              problem with the specified PCIEX device. This may degrade into an
              unrecoverable fault.
              Refer to http://sun.com/msg/PCIEX-8000-MH for more information.
Response    : One or more device instances may be disabled
Impact      : Loss of services provided by the device instances associated with
              this fault
Action      : Schedule a repair procedure to replace the affected device. Use
              fmadm faulty to identify the device or contact Sun for support.
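The EVENT-ID values in this listing are what the later repair step needs. As a rough sketch only, they could be pulled out of the listing with a small filter instead of being copied by hand. The function name extract_event_ids is made up for this example, and it assumes a POSIX awk (on Solaris that would be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# Hypothetical helper: print every UUID-shaped field read from stdin.
# A field qualifies when it has five dash-separated parts of length
# 8-4-4-4-12 and contains only lowercase hex digits and dashes.
extract_event_ids() {
    awk '{
        for (i = 1; i <= NF; i++) {
            n = split($i, part, "-")
            if (n == 5 && length(part[1]) == 8 && length(part[2]) == 4 &&
                length(part[3]) == 4 && length(part[4]) == 4 &&
                length(part[5]) == 12 && $i !~ /[^0-9a-f-]/)
                print $i
        }
    }'
}
```

On the server it would be used as `fmadm faulty -a | extract_event_ids`, which prints one event ID per line in the order the faults are listed.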
Once the underlying problems have been fixed, tell the FMD database that
these faults have been repaired:
root@FENERBAH2:~$ fmadm repair 4eb0d6f7-19fe-ef16-f047-d81526351d1b
fmadm: recorded repair to 4eb0d6f7-19fe-ef16-f047-d81526351d1b
root@FENERBAH2:~$ fmadm repair e4b06bcb-e8ed-6f9c-c1cc-ee8216c1ec63
fmadm: recorded repair to e4b06bcb-e8ed-6f9c-c1cc-ee8216c1ec63
root@FENERBAH2:~$ fmadm repair ef9dfccb-37bc-63d1-f8df-fe6031378ca8
fmadm: recorded repair to ef9dfccb-37bc-63d1-f8df-fe6031378ca8
root@FENERBAH2:~$
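When there are many faults, the same fmadm repair call can be wrapped in a loop. The following is only a sketch: the function name repair_events and the FMADM override variable are assumptions made for this example, not part of fmadm. Setting FMADM="echo fmadm" prints the commands instead of running them, which allows a rehearsal before touching the FMD database.

```shell
# Hypothetical helper: record a repair for each event ID given as an argument.
# FMADM is an assumed override; when unset, the real fmadm is called.
repair_events() {
    for id in "$@"; do
        # Stop at the first failure so a bad ID does not go unnoticed.
        ${FMADM:-fmadm} repair "$id" || return 1
    done
}
```

For example, `FMADM="echo fmadm" repair_events 4eb0d6f7-19fe-ef16-f047-d81526351d1b` only prints the repair command for that ID.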
For the faults marked as repaired, the remaining steps are applied as shown
below. I could not find good Turkish equivalents for the subheadings, so I
have kept them in their original English form.
Clear ereports and resource cache
In this step, some directories and files on the server are cleaned out.
root@FENERBAH2:~$ cd /var/fm/fmd/
root@FENERBAH2:/var/fm/fmd$ ls
ckpt    errlog  fltlog  rsrc    xprt
root@FENERBAH2:/var/fm/fmd$ ls -al
total 299
drwxr-xr-x   5 root     sys            7 May 30  2012 .
drwxr-xr-x   3 root     sys            3 May 25  2012 ..
drwx------   4 root     sys            4 May 31  2012 ckpt
-rw-r--r--   1 root     root       80995 Dec 27 18:24 errlog
-rw-r--r--   1 root     root       62538 Jun  5 15:17 fltlog
drwx------   2 root     sys            7 Mar  9  2013 rsrc
drwx------   2 root     sys            2 May 25  2012 xprt
root@FENERBAH2:/var/fm/fmd$ rm e*
root@FENERBAH2:/var/fm/fmd$ ls
ckpt    fltlog  rsrc    xprt
root@FENERBAH2:/var/fm/fmd$ rm f*
root@FENERBAH2:/var/fm/fmd$ rm ckpt/eft/*
root@FENERBAH2:/var/fm/fmd$ rm rsrc/*
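The four rm commands above depend on the current directory and on the e* and f* globs matching only errlog and fltlog. A slightly more defensive variant, given here purely as a sketch, names the files explicitly and takes the directory as a parameter so it can be tried on a scratch copy first. The function name clear_fmd_cache is made up for this example.

```shell
# Hypothetical helper: remove the error log, fault log, eft checkpoints and
# resource cache under $1 (default /var/fm/fmd). The directories are kept.
# rm -f is used so that an already missing file is not treated as an error.
clear_fmd_cache() {
    dir=${1:-/var/fm/fmd}
    rm -f "$dir"/errlog "$dir"/fltlog
    rm -f "$dir"/ckpt/eft/* "$dir"/rsrc/*
}
```

Calling `clear_fmd_cache /tmp/fmd-copy` on a copied tree is a harmless way to see what it removes; with no argument it acts on /var/fm/fmd.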
Clearing out FMA files with no reboot needed
The commands below show how this database can be cleared without rebooting
the server.
root@FENERBAH2:/var/fm/fmd$ svcs -a | grep fmd
online         Nov_06   svc:/system/fmd:default
root@FENERBAH2:/var/fm/fmd$ svcadm disable -s svc:/system/fmd:default
root@FENERBAH2:/var/fm/fmd$ cd /var/fm/fmd/
root@FENERBAH2:/var/fm/fmd$ ls
ckpt    rsrc    xprt
root@FENERBAH2:/var/fm/fmd$ ls -al
total 15
drwxr-xr-x   5 root     sys            5 Jun  5 15:19 .
drwxr-xr-x   3 root     sys            3 May 25  2012 ..
drwx------   4 root     sys            4 May 31  2012 ckpt
drwx------   2 root     sys            2 Jun  5 15:19 rsrc
drwx------   2 root     sys            2 May 25  2012 xprt
root@FENERBAH2:/var/fm/fmd$ find /var/fm/fmd -type f -exec ls {} \;
root@FENERBAH2:/var/fm/fmd$ find /var/fm/fmd -type f -exec rm {} \;
root@FENERBAH2:/var/fm/fmd$ svcadm enable svc:/system/fmd:default
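The disable, find and enable commands above could be collected into one function. This is a sketch under assumptions: the name reset_fmd_files is invented here, and the rule that the service is only stopped and started when the target is the real /var/fm/fmd is a design choice of this example, made so the function can be tried on a scratch directory without touching fmd.

```shell
# Hypothetical helper: delete every regular file below $1 (default
# /var/fm/fmd) while leaving the directory tree in place.
reset_fmd_files() {
    dir=${1:-/var/fm/fmd}
    svc=svc:/system/fmd:default
    # Only manage the service when working on the live fmd directory.
    if [ "$dir" = /var/fm/fmd ]; then
        svcadm disable -s "$svc" || return 1
    fi
    find "$dir" -type f -exec rm {} \;
    if [ "$dir" = /var/fm/fmd ]; then
        svcadm enable "$svc"
    fi
}
```

The -s flag makes svcadm wait until fmd has actually stopped, so no file is removed while the daemon still has it open.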
Reset the FMD serd modules
At this stage the modules are reset as follows.
root@FENERBAH2:/var/fm/fmd$ fmadm reset cpumem-diagnosis
fmadm: cpumem-diagnosis module has been reset
root@FENERBAH2:/var/fm/fmd$ fmadm reset cpumem-retire
fmadm: cpumem-retire module has been reset
root@FENERBAH2:/var/fm/fmd$ fmadm reset eft
fmadm: eft module has been reset
root@FENERBAH2:/var/fm/fmd$ fmadm reset io-retire
fmadm: io-retire module has been reset
root@FENERBAH2:/var/fm/fmd$
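The four resets above can also be written as a loop over the same module names. As before, this is only a sketch: reset_fmd_modules and the FMADM override variable are names assumed for this example, and the module list is simply the one used in this post, which may differ on other systems.

```shell
# Hypothetical helper: reset the four fmd modules used in this procedure.
# FMADM is an assumed override; FMADM="echo fmadm" turns this into a dry run.
reset_fmd_modules() {
    for module in cpumem-diagnosis cpumem-retire eft io-retire; do
        ${FMADM:-fmadm} reset "$module" || return 1
    done
}
```

Running `FMADM="echo fmadm" reset_fmd_modules` prints the four fmadm reset commands without executing them.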
The procedure is now complete. From this point on, listing the faults with
the command below should print nothing.
root@FENERBAH2:/var/fm/fmd$ fmadm faulty -a