
CIFS problem with MS Excel files

 
enrico.nic
Regular Advisor


Hi,

 

I am facing a strange problem with a few Excel files opened via CIFS on a server running HP-UX 11.31 with Samba/CIFS version A.03.01.05.

On the morning of July 25, a single user tried to open at least 4 different Excel files (.xlsx files, in Excel 2007 format) hosted in at least 2 different Samba shares; the user was unable to open any of these files directly from the application.

 

Examining the files with a text editor showed that the beginning of each file had been overwritten: the original bytes were replaced by several lines of text taken from the Samba log for the client PC the user was connected from.

 

For example, at the beginning of one of these "Excel" files I've found the following lines:

 

[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)
baldini opened file BeTACTIC/NEWSLETTER/per istogramma.xlsx read=Yes write=Yes (numopen=3)
[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)
baldini opened file BeTACTIC/NEWSLETTER/per istogramma.xlsx read=Yes write=No (numopen=4)
[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)
baldini opened file BeTACTIC/NEWSLETTER/per istogramma.xlsx read=Yes write=No (numopen=5)
[2013/07/25 10:04:57, 2] smbd/close.c:609(close_normal_file)
baldini closed file BeTACTIC/NEWSLETTER/per istogramma.xlsx (numopen=4) NT_STATUS_OK
[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)
baldini opened file BeTACTIC/NEWSLETTER/per istogramma.xlsx read=No write=No (numopen=5)
[2013/07/25 10:04:57, 2] smbd/close.c:609(close_normal_file)
baldini closed file BeTACTIC/NEWSLETTER/per istogramma.xlsx (numopen=4) NT_STATUS_OK
[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)
baldini opened file BeTACTIC/NEWSLETTER/per istogramma.xlsx read=No write=No (numopen=5)
[2013/07/25 10:04:57, 2] smbd/close.c:609(close_normal_file)
baldini closed file BeTACTIC/NEWSLETTER/per istogramma.xlsx (numopen=4) NT_STATUS_OK
[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)
baldini opened file BeTACTIC/NEWSLETTER/~$per istogramma.xlsx read=Yes write=Yes (numopen=5)
[2013/07/25 10:04:57, 2] smbd/close.c:609(close_normal_file)
baldini closed file BeTACTIC/NEWSLETTER/~$per istogramma.xlsx (numopen=5) NT_STATUS_OK
[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)
baldini opened file BeTACTIC/NEWSLETTER/per istogramma.xlsx read=No write=No (numopen=6)
[2013/07/25 10:04:57, 2] smbd/close.c:609(close_normal_file)
baldini closed file BeTACTIC/NEWSLETTER/per istogramma.xlsx (numopen=4) NT_STATUS_OK
[2013/07/25 10:04:59, 2] smbd/close.c:609(close_normal_file)
baldini closed file BeTACTIC/NEWSLETTER/per istogramma.xlsx (numopen=3) NT_STATUS_OK

 

These lines were followed by some unreadable bytes.
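An .xlsx file is a ZIP archive, so an intact one should begin with the bytes PK\x03\x04 rather than with log text. The following is a minimal sketch, assuming Python is available on a machine that can read the shares, of how the remaining files could be screened for the same kind of damage. The function names are illustrative and not part of the original post, and the check only looks at the first four bytes, so it cannot detect damage further into a file.

```python
import os
import sys

# Local file header signature found at the start of a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_xlsx(path):
    """Return True if the file starts with the ZIP signature an .xlsx should have."""
    with open(path, "rb") as fh:
        return fh.read(4) == ZIP_MAGIC


def find_damaged(root):
    """Walk `root` and return sorted paths of .xlsx files without a ZIP signature."""
    damaged = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Skip Excel's "~$" owner files: they are not ZIP archives.
            if not name.lower().endswith(".xlsx") or name.startswith("~$"):
                continue
            full = os.path.join(dirpath, name)
            try:
                if not looks_like_xlsx(full):
                    damaged.append(full)
            except OSError as exc:
                print(f"cannot read {full}: {exc}", file=sys.stderr)
    return sorted(damaged)


if __name__ == "__main__":
    # The directory to scan is passed on the command line (default: current directory).
    for path in find_damaged(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Run against the root directory of each share, this prints every .xlsx file whose header is not a ZIP signature; an empty result means only that the first four bytes of each file look right.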

 

I will now recover the original contents from a valid backup, but obviously I want to understand what happened, so that I can avoid it in the future.

 

Can anybody please help me?

 

Thank you in advance

Enrico

3 REPLIES
Matti_Kurkela
Honored Contributor

Re: CIFS problem with MS Excel files

I'm afraid that might be data corruption on the filesystem that hosts the CIFS shares. From the information you've given so far, it's impossible to guess the root cause of the filesystem corruption.

 

The messages look like Samba server logs. So although they may be talking about the client workstation where the corruption was detected, they are actually generated by the Samba/CIFS server that fulfilled the workstation's requests.
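Since these are server-side log entries, the server's own smbd log can be searched for opens that requested write access around the time of the corruption. Below is a rough sketch, assuming the log uses the header-plus-message layout quoted in the question and that header lines are not wrapped. The regular expressions and function names are illustrative guesses based only on those quoted lines, not a documented Samba log grammar.

```python
import re

# Header line, e.g. "[2013/07/25 10:04:57, 2] smbd/open.c:581(open_file)".
HEADER = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),\s*(\d+)\] ([^\s(]+)\((\w+)\)$"
)
# Message line, e.g. "baldini opened file some/path.xlsx read=Yes write=No".
OPENED = re.compile(r"^(\S+) opened file (.+?) read=(Yes|No) write=(Yes|No)")


def parse_entries(lines):
    """Group raw log lines into (timestamp, function, message) tuples."""
    entries = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = [match.group(1), match.group(4), ""]
            entries.append(current)
        elif current is not None and line:
            # Continuation lines are glued on without a separator, because the
            # quoted log was wrapped at a fixed column, sometimes mid-word.
            current[2] += line
    return [tuple(entry) for entry in entries]


def write_opens(entries):
    """Return (timestamp, user, path) for each open that requested write access."""
    result = []
    for stamp, func, message in entries:
        match = OPENED.match(message)
        if func == "open_file" and match and match.group(4) == "Yes":
            result.append((stamp, match.group(1), match.group(2)))
    return result
```

Feeding the lines of a log file to `parse_entries` and the result to `write_opens` lists who opened which file for writing and when. This only works if the log level is high enough for smbd to record opens, as it was for the level 2 entries quoted above.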

 

You really should carefully check the storage configuration of your HP-UX server.

In some cases, if the SAN storage-side settings are not correct for HP-UX 11.31, a single multipathed LUN may appear as several new-style (agile) DSFs. At that point, it might be possible to use the same LUN twice if you are not careful.

 

If you cannot find any obvious configuration errors, the next step is to take an extra backup of the filesystem that contained the corrupted files, unmount it, and run a full filesystem check on it (fsck -o full /dev/vgXX/lvolYY).

 

(I recommend an extra backup before unmounting because some files might still be readable only because the system is currently holding an uncorrupted version of parts of the filesystem metadata in its disk cache. When the filesystem is unmounted, this cache will be flushed: if the filesystem is too badly corrupted, it might be impossible to mount it again.)

 

-----

 

Last time I saw something like that, it was because of a SAN configuration error: the same LUN had accidentally been presented to two unrelated computers (one was HP-UX, the other Linux). On the HP-UX system, the LUN had been used to extend a filesystem that contained production data: as the extension was done proactively, it took a few weeks for the production data to grow to use the new LUN. Meanwhile, the Linux development system was happily using the same LUN.

 

When the HP-UX production data reached the new LUN, the first issue was a developer who reported corruption in his files: strange lines full of numbers in the middle of his Java source code. While I was restoring his files from backups, we got another report of file corruption, this time from the HP-UX side. A bit after that we realized what those "strange lines of numbers" amidst the Java source were: they exactly matched the data format used by the HP-UX system that was also having data corruption issues. At that point, I used a storage-system specific LUN identification tool on both systems, to see the storage system IDs of each LUN presented to the system. Sure enough, there it was: the same LUN was presented to both systems, and was being used by two incompatible file systems simultaneously.

 

At that point, the production application was stopped and an extra backup of the corrupted filesystem was taken on both systems involved, in case some of the files were salvageable. Then new LUNs were presented to both systems. Reconfiguring the VGs on both systems and restoring the filesystems from uncorrupted backups was the easy part: identifying the exact extent of the corruption and re-processing the appropriate production data from uncorrupted original data files took more effort.

MK
enrico.nic
Regular Advisor

Re: CIFS problem with MS Excel files

Hi,

 

In my previous message I was quoting the text found inside the corrupted file itself; I have also looked into the Samba log files, but they gave no further indication about the problem.

Anyway, this morning I looked in the system log (/var/adm/syslog/syslog.log) and found the following lines:

 

Jul 25 08:55:58 gissi smbd[8339]: [2013/07/25 08:55:58, 0] modules/vfs_hpuxacl.c:300(hpuxacl_sys_acl_set_file)
Jul 25 08:55:58 gissi smbd[8339]: ERROR calling acl: Not owner
Jul 25 09:05:06 gissi smbd[8624]: [2013/07/25 09:05:06, 0] smbd/nttrans.c:2081(call_nt_transact_ioctl)
Jul 25 09:05:06 gissi smbd[8624]: call_nt_transact_ioctl(0x1401c4): Currently not implemented.
Jul 25 09:05:07 gissi smbd[8624]: [2013/07/25 09:05:07, 0] modules/vfs_hpuxacl.c:300(hpuxacl_sys_acl_set_file)
Jul 25 09:05:07 gissi smbd[8624]: ERROR calling acl: Not owner
Jul 25 09:05:25 gissi smbd[8624]: [2013/07/25 09:05:25, 0] modules/vfs_hpuxacl.c:300(hpuxacl_sys_acl_set_file)
Jul 25 09:05:40 gissi smbd[8624]: [2013/07/25 09:05:40, 0] modules/vfs_hpuxacl.c:300(hpuxacl_sys_acl_set_file)

 

 

Moreover, all the physical disks the CIFS server uses are either internal, or external to the HP-UX box but directly SCSI-connected.

 

Thank you

Enrico

 

enrico.nic
Regular Advisor

Re: CIFS problem with MS Excel files

Hi,

 

Some brief updates about the file corruption problem.

 

I have discovered that:

 

- all the files (5 in total) were corrupted on July 25, 2013, between 9:55 and 10:04

- all the corruptions happened over a CIFS connection to the HP-UX server, originating from the same client and the same user, who tried to open the 5 files one after the other

- the file systems involved are DIFFERENT (at least one of the 5 files is in /dev/vg00, the other 4 are in /dev/vg03). Moreover, the two file systems are on two different disks: one is on an internal hard drive (vg00), the other is on an external, SCSI-connected hard drive (vg03). So I suspect (but this has to be verified) that the problem is related to the Samba/CIFS server, and that there is no file system corruption on this box.
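To check whether other files were touched during the same window, the shares can be scanned for files whose modification time falls inside it. This is a minimal sketch, assuming the damage updated each file's modification time and that the files have not been restored since; the function name and the share path are placeholders, not taken from this system.

```python
import os
from datetime import datetime


def modified_between(root, start, end):
    """Return sorted (path, mtime) pairs for files under `root` with start <= mtime <= end."""
    lo, hi = start.timestamp(), end.timestamp()
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.stat(full).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if lo <= mtime <= hi:
                hits.append((full, datetime.fromtimestamp(mtime)))
    return sorted(hits)


if __name__ == "__main__":
    # Window from the findings above, in the server's local time, padded to 10:05.
    # "/path/to/share" is a hypothetical placeholder for a real share root.
    window = (datetime(2013, 7, 25, 9, 55), datetime(2013, 7, 25, 10, 5))
    for path, when in modified_between("/path/to/share", *window):
        print(when.isoformat(sep=" "), path)
```

Any hit that is not one of the 5 known files would be worth inspecting by hand before users touch it again.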

 

By the way, I haven't performed the suggested file system check yet; I will do it this evening, when no users are connected.

 

If any of you has a further suggestion, it is welcome.

 

Thank you

Enrico