
UTF-8 charset under HPUX

 
Dbourcier
Advisor


Hi everyone,

I've been dealing with a problem for 3 weeks now.
Let me explain.

CIFS SERVER UNDER HP-UX B.11.23 U ia64
======================================

Configuration file (/etc/opt/samba/smb.conf):

[global]
workgroup = ADHARA
client code page = 850
character set = ISO8859-1
server string = "Samba Server"
log file = /var/opt/samba/log.%m
max log size = 1000
security = user
encrypt passwords = yes
socket options = TCP_NODELAY
local master = no
preserve case = yes
short preserve case = no
dos filetime resolution = yes
read only = no
syslog = 0

(I removed the commented-out lines.)

[GEMMA]
comment = GEMMA
path = /tmp/gemma_file
valid users = @smbadm, jjgui
public = no
writable = yes
printable = no
write list = @smbadm, jjgui
======================================
Client:
Microsoft Windows XP ...
French customer
======================================

Problem:

When the user copies a file from Windows to the Unix share, the conversion goes wrong: strange tab-like characters appear, displayed as little squares.

I received a mail from the customer saying :

"The original format is ANSI (Unix format). Once converted with the unix2dos command, the file becomes UTF-8 (DOS/Windows format). It might be enough to set the conversion to UTF-8."
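The customer's description mixes two separate properties of a file: its character encoding and its line-ending convention. A quick way to test the theory is to look at the raw bytes before and after unix2dos, checking whether the non-ASCII bytes form valid UTF-8 and counting line endings separately. A minimal Python sketch, assuming the whole file fits in memory; `describe_encoding` and `line_endings` are hypothetical helper names, and the classification is only a heuristic (a pure-ASCII file is valid in all of these encodings at once).

```python
def describe_encoding(data: bytes) -> str:
    """Roughly classify raw file bytes. Heuristic only, not a definitive check."""
    if all(b < 0x80 for b in data):
        # Pure 7-bit ASCII: identical in ISO8859-1, CP1252, CP850 and UTF-8.
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"  # the non-ASCII bytes form valid UTF-8 sequences
    except UnicodeDecodeError:
        return "8-bit"  # some single-byte code page, e.g. ISO8859-1/CP1252 or CP850


def line_endings(data: bytes) -> dict:
    """Count CR LF (DOS/Windows) and bare LF (Unix) line endings."""
    crlf = data.count(b"\r\n")
    return {"crlf": crlf, "lf_only": data.count(b"\n") - crlf}


# Example: a Latin-1 file with Unix line endings.
sample = "Réunion\n".encode("latin-1")
print(describe_encoding(sample), line_endings(sample))  # 8-bit {'crlf': 0, 'lf_only': 1}
```

Running both functions on the same file before and after unix2dos should show whether anything other than the line-ending counts changes.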

======================================

Unfortunately, I cannot find any code page for UTF-8 under HP-UX. I've tried these options:

dos charset = CP850
unix charset = UTF-8


Results:

Ignoring unknown parameter "dos charset"
[2007/12/10 20:00:35, 0] param/loadparm.c:map_parameter(2129)
Unknown parameter encountered: "unix charset"
[2007/12/10 20:00:35, 0] param/loadparm.c:lp_do_parameter(2817)

So I tried:

client code page = 850
character set = UTF-8

I get an error because the file /etc/opt/samba/codepages/codepage.UTF-8 cannot be found.

======================================

My questions are:

1- Are there any parameters I can set, like dos charset and unix charset?

2- Is there anywhere I can find the UTF-8 code page file?

Thanks in advance for your advice or questions.

David Bourcier
Andrew C Fieldsend
Respected Contributor

Re: UTF-8 charset under HPUX

You might find this useful: http://samba.org/samba/docs/man/Samba-HOWTO-Collection/unicode.html. It indicates that support for Unicode (and therefore UTF-8) was introduced with version 3 of Samba, which might explain your problem if you're running version 2.x.

Technically, UTF-8 is an 8-bit encoding of Unicode, not a character set, although Samba (and others) are a little free with the meaning of "character set".

There can't be a UTF-8 code page because a code page selection switches the top 128 characters in an 8-bit code set (the bottom 128 are the "standard" ASCII set), while UTF-8 uses a variable-length encoding of the full Unicode character set (apologies if I'm telling you something you already know).
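To make the code page point concrete: in a single-byte code page every character is exactly one byte, but the same character gets a different byte value depending on which code page is selected, while UTF-8 uses a variable number of bytes per character. A small illustration with Python's standard codecs (the sample characters are arbitrary):

```python
text = "é"

# Single-byte code pages: always one byte, but the value depends on the code page.
print(text.encode("cp850"))    # b'\x82'
print(text.encode("latin-1"))  # b'\xe9'  (ISO8859-1)

# UTF-8: variable length. ASCII stays one byte, "é" takes two, "€" takes three.
print("A".encode("utf-8"))     # b'A'
print(text.encode("utf-8"))    # b'\xc3\xa9'
print("€".encode("utf-8"))     # b'\xe2\x82\xac'

# Reading UTF-8 bytes as if they were CP850 produces two wrong characters.
print(text.encode("utf-8").decode("cp850"))  # ├®
```

The last line is the typical symptom of a mismatch between the encoding a file was written in and the one it is read with.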
Dbourcier
Advisor

Re: UTF-8 charset under HPUX

Hello,

Thanks for your very useful help. I've upgraded the CIFS server on the HP-UX box to CIFS-SERVER 3 (based on Samba 3.0.2). The good news is that Samba now recognizes UTF-8, but the customer still has the same problem. Here are the parameters I set:

dos charset = CP850
unix charset = UTF-8
display charset = UTF-8

The parameter "client code page" is now reported as unknown.

Any idea how I can avoid this problem? Or, alternatively, how could I set up automatic conversion of files with the ux2dos program whenever someone transfers a file to the share?

Thanks in advance for your help
Matti_Kurkela
Honored Contributor

Re: UTF-8 charset under HPUX

As far as I understand, the "client code page", "unix charset" and "display charset" settings affect only the characters in the _filenames_.
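The kind of translation these settings control can be pictured as a decode/encode step applied to a name: bytes in one charset are decoded and re-encoded in another. The sketch below is a hypothetical Python illustration of that idea (the helper name and the sample file name are made up), not Samba's actual code; note that nothing in it reads or alters a file's contents.

```python
def dos_to_unix_name(raw: bytes, dos_charset: str = "cp850",
                     unix_charset: str = "utf-8") -> bytes:
    """Hypothetical helper: re-encode a file *name* from one charset to another."""
    return raw.decode(dos_charset).encode(unix_charset)


name = "Compte-rendu été.txt"
dos_bytes = name.encode("cp850")   # the name as bytes with dos charset = CP850
unix_bytes = name.encode("utf-8")  # the same name with unix charset = UTF-8

print(len(dos_bytes), len(unix_bytes))  # 20 22 (each "é" takes 2 bytes in UTF-8)
print(dos_to_unix_name(dos_bytes) == unix_bytes)  # True
```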

As there is no way to automatically determine with 100% certainty which files contain binary data which must not be modified and which are plaintext, Samba won't convert the _contents_ of the files at all.
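This becomes obvious as soon as one tries to write the test. Below is a typical heuristic: a NUL byte, or too many control characters, in the first few kilobytes means "binary". The 8 KB sample size and the 10% threshold are arbitrary assumptions. It already misfires on ordinary files: UTF-16 text (for example what Notepad writes with its "Unicode" save option) contains NUL bytes and is flagged as binary, so any automatic conversion driven by such a guess would eventually damage a file.

```python
def looks_binary(data: bytes, sample_size: int = 8192, threshold: float = 0.10) -> bool:
    """Guess whether data is binary. A heuristic: it can be wrong in both directions."""
    sample = data[:sample_size]
    if not sample:
        return False  # treat an empty file as text
    if b"\x00" in sample:
        return True  # NUL bytes are rare in 8-bit text files
    # Count control bytes other than tab, LF, form feed and CR.
    controls = sum(1 for b in sample if b < 0x20 and b not in (0x09, 0x0A, 0x0C, 0x0D))
    return controls / len(sample) > threshold


print(looks_binary(b"Compte rendu\r\n"))                     # False
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))              # True (start of an ELF header)
print(looks_binary("Compte rendu\r\n".encode("utf-16-le")))  # True, yet it is plain text
```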

MK
Dbourcier
Advisor

Re: UTF-8 charset under HPUX

Hello,

So it seems there is no workaround for my problem, at least not in the Samba configuration.

Thanks to all for your valuable help. Don't hesitate to "ping" me if you find a workaround :)

Have a nice day. Bye
Andrew C Fieldsend
Respected Contributor

Re: UTF-8 charset under HPUX

MK is correct that the Samba settings only affect translation of file names.

When and how does your customer see the "translation"? If he reads the file on the same system it was written, it should be unchanged.

Since he mentions unix2dos, I assume he's writing it on one system (UNIX) and reading it on another (DOS)? Unix2dos should only change the line-ending sequences (LF to CR LF).
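A pure line-ending conversion of this kind can be sketched in a few lines of Python. This illustrates the LF to CR LF step only; it is not the source of unix2dos or of HP-UX's ux2dos, and the function name is hypothetical. Every byte other than a bare LF is copied unchanged, so a file that was ISO8859-1 before the conversion is still ISO8859-1 afterwards.

```python
import re


def unix_to_dos(data: bytes) -> bytes:
    """Replace each LF not already preceded by CR with CR LF.

    All other bytes, including non-ASCII ones, are copied unchanged,
    so the character encoding of the file is not touched.
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


before = "été\nhiver\r\n".encode("latin-1")
after = unix_to_dos(before)
print(after)  # b'\xe9t\xe9\r\nhiver\r\n'
```

Running it twice is harmless, because an LF that already follows a CR is skipped.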