DL585 samba hung filesystem

 
charles weber_1


For the last 1.5 years I have had occasional problems on a large (6.8 TB) Samba server. Two of the mounted filesystems partially dismount at intervals ranging from 3 days to 3 months. Files remain open, but any local access to the filesystem, such as "ls", hangs. I end up having to do a hard shutdown, because a normal reboot also hangs while trying to unmount the filesystem.
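One way to catch the hang as soon as it starts, rather than when users complain, might be a small probe run from cron. The following is only a minimal sketch: the function name `probe_mount`, the 10-second timeout, and the use of `ls` as the probe command are my own assumptions, not something already running on this server. The probe runs in a child process because a process blocked on a hung filesystem usually cannot be killed, so the parent must never wait on it without a timeout.

```python
import subprocess


def probe_mount(path, timeout_s=10.0, cmd=None):
    """Return 'ok', 'error', or 'hung' for a directory on the filesystem.

    The probe command runs in a child process so that a hang inside the
    kernel blocks only the child, never this script.
    """
    if cmd is None:
        # Hypothetical default probe: list the directory, as "ls" would.
        cmd = ["ls", path]
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a child in uninterruptible sleep will not die,
        # so do not wait for it again after sending the signal.
        proc.kill()
        return "hung"
    return "ok" if rc == 0 else "error"
```

Calling `probe_mount("/some/mount/point")` every few minutes and logging the first "hung" result with a timestamp would at least narrow down when the filesystem stops responding. The mount point shown is a placeholder.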

I have found no logged errors. I have three HP DL585 servers with multiple 6404 RAID controllers. Two run Samba and the third is NFS only. The problem occurs on only one server, but unfortunately it is the busiest one. I have replaced cables and 6404 cards. The filesystems have been checked with xfs_repair. HP diagnostics have been run for hours. One of our other DL585 servers sits physically very close to the problem server but serves its XFS filesystems over NFS instead of Samba. It has not had this problem. The only significant hardware difference between the NFS server and the Samba server is that the NFS server has all U320 hard drives.
Physical config:
HP DL585 with dual processors and three 6404 4-channel SCSI RAID controllers. Six 4200 drive chassis converted to U320, holding 72 GB U3/U320 and 146 GB U320 drives. 8 GB RAM. Firmware for all parts, including disks, has been flashed repeatedly over the last two years to current levels. Firmware changes have made no noticeable difference in this problem. I do wonder about the mix of U3 and U320 drives, but each disk carrier is either U3 or U320. Each disk carrier is set up as one ADG array and one logical drive, which is then partitioned, formatted with XFS, and mounted (for example /dev/cciss/c2d0p1).
Software:
I started with Fedora Core 2 x86_64 and have worked my way up to Fedora Core 5 with samba 3.0.22-1.fc5, acl 2.2.34, and xfsprogs 2.7.3-1.2.1. No software change has made any difference that I can see in this problem. The Samba shares support ACLs.
Hardware possibilities:
The problem has always occurred in the same two disk carriers. I could change the disk carriers or the U320 modules. I also worry about the mix of U320 and U3 disks. I set up a test server, a DL385, with a 6404 from the problem server and a disk carrier with a mix of drives, but I could not recreate the problem.
Software possibilities:
Kernel, Samba, ACLs, and XFS. But I have tried many versions and have not seen any logged errors or any change in behavior.
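Since nothing is logged, it may help to record which processes are stuck at the moment of a hang. The sketch below is only one possible approach: it reads `/proc/<pid>/stat` for processes in state D (uninterruptible sleep) and `/proc/<pid>/wchan` for the kernel function each one is waiting in. The helper names `parse_stat_line` and `blocked_processes` are hypothetical, and I am assuming the running kernel exposes `wchan`.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, command name, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """Return sorted (pid, comm, wchan) tuples for processes in state D."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as fh:
                wchan = fh.read().strip()
        except OSError:
            wchan = ""  # wchan not available for this process
        found.append((pid, comm, wchan))
    return sorted(found)
```

Printing the result of `blocked_processes()` from a shell that is still responsive during a hang would show whether the stuck processes are smbd, and whether they are waiting in XFS code or in the block or SCSI layer, which could help separate the hardware possibilities from the software ones.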