topic Re: strtok in Operating System - HP-UX

strtok

Leslie Chaim — Thu, 24 Oct 2002 20:18:31 GMT

Is there a C function to tokenize a string and get the token values? If I use strtok() I cannot know what the last delimiting token was. Is there anything else out there?

Thanks,

Re: strtok

A. Clay Stephenson — Thu, 24 Oct 2002 20:41:02 GMT

Actually strtok(s1,s2) will do the job. The first time you invoke the function, you pass in a non-NULL value for s1. It returns a pointer to the first token, delimited by at least one of the chars in s2. You then call strtok() again, but this time with s1 set to (char *) NULL. You repeat the process until strtok() returns NULL (meaning your original s1 has been completely parsed). You are then ready for a new s1.

strtok() works for simple tokenizing but for more complex tasks, you can use lex to generate a lexical engine and for really complex tasks, it's time to use yacc.
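
The strtok() calling sequence described in this reply can be sketched as below. This is only a minimal illustration: split_tokens() and its parameters are hypothetical names, not part of any library, and the input must be a writable array because strtok() overwrites each delimiter with '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits text in place and stores up to max token
   pointers in out[]. Returns the number of tokens stored.
   text must be writable, because strtok() modifies it. */
size_t split_tokens(char *text, const char *sep, char *out[], size_t max)
{
    size_t n = 0;
    char *tok;

    /* The first call passes the string; later calls pass NULL to continue. */
    for (tok = strtok(text, sep); tok != NULL && n < max; tok = strtok(NULL, sep))
        out[n++] = tok;

    return n;
}
```

Called on a writable copy of "one two,three" with the separators " ,", this should store three pointers: "one", "two" and "three". The delimiters themselves are lost, since strtok() replaces them with terminators.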

Re: strtok

Leslie Chaim — Thu, 24 Oct 2002 20:56:40 GMT

But..

Here is what I am trying to do.

char *sep = "()*+,/ ";

char *str = "a * b + c/foo(x,y,z)";

strtok will give me the following tokens:
a
b
c
foo
x
y
z

But I want this:
a
*
b
+
c
/
foo
(
x
,
y
,
z
)
----------------------

In other words, I want to use a function where I know WHAT was the delimiting token.

Or must I write my own tokenizer?

Thanks

Re: strtok

A. Clay Stephenson — Thu, 24 Oct 2002 21:38:22 GMT

That's beyond the ability of strtok(). You could change the rules to require whitespace between tokens and then strtok() would work but if you need to do actual expression evaluation then it's time to use yacc. Yacc will allow you to build an interpreter or compiler about as complex as you wish.

Re: strtok

Adam J Markiewicz — Fri, 25 Oct 2002 09:08:00 GMT

I've been dealing with parsing strings lately.
If I had to do this now I would use regular expressions.
They are not so hard to learn, and flexible enough.
man regexec
man regcomp

Don't forget regfree at the end. ;)

Adam
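
A minimal sketch of the regcomp()/regexec()/regfree() sequence mentioned above, assuming a POSIX <regex.h>. The helper name first_delimiter() and the choice of pattern (the delimiter set from earlier in the thread) are assumptions made for illustration only.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: finds the first character of text that belongs to
   the delimiter set and stores its offset in *pos.
   Returns 1 on a match, 0 on no match, -1 if the pattern fails to compile. */
int first_delimiter(const char *text, size_t *pos)
{
    regex_t re;
    regmatch_t m[1];
    int found;

    if (regcomp(&re, "[()*+,/ ]", REG_EXTENDED) != 0)
        return -1;

    /* m[0] receives the offsets of the whole match. */
    found = (regexec(&re, text, 1, m, 0) == 0);
    if (found)
        *pos = (size_t)m[0].rm_so;

    regfree(&re);   /* release whatever regcomp() allocated */
    return found;
}
```

Each regexec() call reports a single match, so finding every delimiter still needs a loop that restarts the search after the previous match.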

Re: strtok

Jean-Louis Phelix — Fri, 25 Oct 2002 10:30:27 GMT

Hi,

I also don't think that you can do it using strtok only ... I also noticed that spaces have a special meaning. If you don't have time to invest, I wrote this which uses a "mirror" string.

Regards,

Jean-Louis.

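#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *sep = "()*+,/ ";
    char str[] = "a * b + c/foo(x,y,z)"; /* writable array: strtok() modifies its argument */
    char o[BUFSIZ];                      /* "mirror" copy of str without the spaces */
    char *v;
    char *p;
    int first;

    /* Build the mirror string by copying everything except spaces. */
    p = o;
    for (v = str; *v; v++)
    {
        if (*v != ' ')
        {
            *p = *v;
            p++;
        }
    }
    *p = '\0';
    p = o;
    printf("string <%s>\ntokens <%s>\nnewstr<%s>\n\n", str, sep, o);
    first = 1;
    /* strtok() returns NULL when no tokens remain, so test the pointer itself. */
    for (v = strtok(str, sep); v != NULL; v = strtok(NULL, sep))
    {
        if (first)
        {
            first = 0;
            printf("<%s> first pass\n", v);
        }
        else
        {
            /* p points at this token in the mirror; p-1 is the delimiter before it. */
            printf("<%s> token <%c>\n", v, *(p-1));
        }
        p += strlen(v) + 1; /* skip the token and the single delimiter after it */
    }
    return 0;
}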

Re: strtok

Leslie Chaim — Fri, 25 Oct 2002 12:50:25 GMT

I am not building an expression evaluator here, I just have a string with *different* delims that I need to parse *and* depending on the delim used, my program must take different actions. Therefore, I don't think I need to go the Yacc route, although I would like to study it. All I know about Yacc is the name :) Where would I start to learn about it?

Back to my problem:
I was looking for something like Java's StringTokenizer, where there is an option to return the delims as tokens, that's all.

Regular Expressions.... I am very comfortable with regex, in fact in Perl I would do this:

@values = grep { length } split m!([ ()*+,/])!;

@values would get all non-zero length tokens, including the delims (notice the capturing parenthesis).

I have not used any regex in C, and I will certainly give it thought. For a start, does C have a split function, or better, how would you write the above line in C?

In the meantime, for simplicity I think I would go with Jean's simple approach and do it in C.

Nevertheless, the regex route is cool:)
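
Standard C has no split function, but the effect of the Perl line (text runs and delimiters both returned, empty pieces dropped) can be approximated with strcspn(). The following is only a sketch: next_token() and its parameters are hypothetical, and the caller supplies the buffer so no static state or malloc() is involved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the next token starting at *cursor into buf
   and advances *cursor past it. A token is either one delimiter character
   or a maximal run of non-delimiter characters.
   Returns 1 if a token was produced, 0 at end of input or if buf is too small. */
int next_token(const char **cursor, const char *sep, char *buf, size_t bufsize)
{
    const char *p = *cursor;
    size_t len;

    if (*p == '\0')
        return 0;

    len = strcspn(p, sep);  /* length of the text run, 0 if p is at a delimiter */
    if (len == 0)
        len = 1;            /* p points at a delimiter: return it on its own */

    if (len >= bufsize)
        return 0;

    memcpy(buf, p, len);
    buf[len] = '\0';
    *cursor = p + len;
    return 1;
}
```

A caller would set a const char pointer to the start of the string and call next_token() in a loop until it returns 0. As in the Perl version, a space in the separator list comes back as a token of its own.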

Re: strtok

Adam J Markiewicz — Fri, 25 Oct 2002 15:18:25 GMT

Unluckily it's not as easy as in Perl, mainly because you are limited to the number of subexpressions in the pattern. They are counted in the expression pattern rather than in the string itself, so one regexec() call will not hand back every delimiter found in the string.

As I think more about it I come to the conclusion that after all you would end up with an iterating loop, so in your case maybe the simple Jean-Louis (the winner of the month!) approach would be more practical.

However if you were interested in regex in C check the

But one warning:
Don't remove spaces at the beginning. I think they should be ignored later. If you had two strings without an operator between them, they should be interpreted separately (and probably later considered a semantic error). If you just remove the spaces, they will concatenate into a single string and you won't detect it.

Example:

"a + b + c d" -> "a+b+cd"

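One way to catch the case in this warning is to keep the spaces as separators while scanning, instead of deleting them first. A small sketch under that assumption follows; has_adjacent_operands() is a hypothetical name, and ops is assumed to list the operator characters without the space.

```c
#include <string.h>

/* Hypothetical check: returns 1 if two operands appear in a row with only
   spaces between them (e.g. "c d"), 0 otherwise.
   A space separates tokens but is not itself an operator. */
int has_adjacent_operands(const char *text, const char *ops)
{
    int prev_was_operand = 0;
    const char *p = text;

    while (*p != '\0')
    {
        if (*p == ' ')
        {
            p++;                    /* skip the space, keep prev_was_operand */
        }
        else if (strchr(ops, *p) != NULL)
        {
            prev_was_operand = 0;   /* an operator resets the state */
            p++;
        }
        else
        {
            if (prev_was_operand)
                return 1;           /* operand directly after operand */
            prev_was_operand = 1;
            while (*p != '\0' && *p != ' ' && strchr(ops, *p) == NULL)
                p++;                /* consume the whole operand */
        }
    }
    return 0;
}
```

With the operators "()*+,/", the string "a + b + c d" is flagged, while the space-stripped "a+b+cd" is not, which is exactly the information that removing the spaces throws away.
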
Re: strtok

Leslie Chaim — Fri, 25 Oct 2002 15:29:16 GMT

Thank you all for your input.
As an academic exercise, it would be interesting to see how many lines of C will be needed to do what the above Perl does in one line.

Re: strtok

Adam J Markiewicz — Mon, 28 Oct 2002 09:23:43 GMT

It's me again.
I was thinking about good regular expressions to define and I found a few potential traps you can fall into, so I decided to write.

You have to treat operators differently from the names. For the names a good regex seems to be:
"[a-zA-Z_][a-zA-Z0-9_]*"
of course, if you think about C.
But for the operators it's not so nice. Should they only be single characters? If so, what about operators like "<=" (of course, if you respect them)?
But you cannot collect all operators automatically, because in an expression like "a*(b+c)" the first operator you will get is "*(".

So, I'm afraid, operators should be defined more strictly, and for names I would use the above regexp. The rest I would do in a loop with cases.

So, the conclusion is I would do this with regexp. The Perl and (sorry) Jean-Louis solutions could generate errors for you.

Good luck

Adam
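
The approach outlined in this post (a regex for names, a strict list for operators) might look roughly like the sketch below. Everything here is an assumption for illustration: the scan_token() name, the token_kind enum, and the operator list, which adds "<=", ">=", "<" and ">" to the thread's delimiters. It assumes a POSIX <regex.h>.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

enum token_kind { TOK_END, TOK_NAME, TOK_OPERATOR, TOK_ERROR };

/* Operators are listed explicitly, longest first, so "<=" wins over "<". */
static const char *const operators[] =
    { "<=", ">=", "<", ">", "(", ")", "*", "+", ",", "/" };

/* Hypothetical scanner: skips spaces, then classifies the token at p.
   Stores the token's start in *start and its length in *len. */
enum token_kind scan_token(const char *p, const char **start, size_t *len)
{
    regex_t name_re;
    regmatch_t m[1];
    size_t i;
    int is_name;

    while (*p == ' ')
        p++;
    *start = p;
    *len = 0;
    if (*p == '\0')
        return TOK_END;

    /* '^' anchors the match to the current position. */
    if (regcomp(&name_re, "^[a-zA-Z_][a-zA-Z0-9_]*", REG_EXTENDED) != 0)
        return TOK_ERROR;
    is_name = (regexec(&name_re, p, 1, m, 0) == 0);
    regfree(&name_re);
    if (is_name)
    {
        *len = (size_t)m[0].rm_eo;
        return TOK_NAME;
    }

    for (i = 0; i < sizeof operators / sizeof operators[0]; i++)
    {
        size_t n = strlen(operators[i]);
        if (strncmp(p, operators[i], n) == 0)
        {
            *len = n;
            return TOK_OPERATOR;
        }
    }
    return TOK_ERROR;   /* neither a name nor a known operator */
}
```

Because each operator is matched individually, "a*(b+c)" yields "*" and "(" as two tokens rather than "*(". The pattern is compiled on every call only to keep the sketch short; real code would probably compile it once and reuse it.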

Re: strtok

Leslie Chaim — Thu, 31 Oct 2002 20:29:39 GMT

OK, so here is what I did with my_strtok(). Please voice your critiques.

Regards,
Leslie

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOKEN_DELIMITER 100
#define TOKEN_TEXT 200

/*
 * Like strtok(), but each delimiter is returned as a one-character token too.
 * Pass the text on the first call and NULL afterwards; returns NULL at the end.
 * The returned buffer belongs to my_strtok() and is overwritten by the next call.
 */
char * my_strtok (char *text, const char *sep)
{
    static char *p;
    static char *this_token;
    static short next_token_type;
    static size_t current_token_max_len = 50; /* Should be good for most tokens */

    size_t token_length;
    char *bigger;

    if ( text != NULL )
    {
        p = text; /* Let 'p' point to the text given */

        if ( *p != '\0' && strchr (sep, *p) )
            next_token_type = TOKEN_DELIMITER;
        else
            next_token_type = TOKEN_TEXT;
    }

    if ( p == NULL ) /* called with NULL before any text was supplied */
        return NULL;

    if ( this_token == NULL )
    {
        fprintf (stderr, "Info: Calling malloc\n");

        if ( !(this_token = malloc (current_token_max_len)) )
        {
            perror ("my_strtok() malloc failed");
            return NULL;
        }
    }

    if ( next_token_type == TOKEN_TEXT )
    {
        token_length = strcspn (p, sep);
        next_token_type = TOKEN_DELIMITER;
    }
    else
    {
        token_length = strspn (p, sep);

        if ( token_length == 1 )
        {
            next_token_type = TOKEN_TEXT;
        }
        else if ( token_length > 1 )
        {
            token_length = 1; /* return one delimiter per call */
            next_token_type = TOKEN_DELIMITER;
        }
    }

    if ( token_length == 0 )
    {
        /* End of text: release the buffer and clear the statics so a later
           call neither frees it twice nor uses a dangling pointer. */
        free (this_token);
        this_token = NULL;
        p = NULL;
        return NULL;
    }

    if ( token_length >= current_token_max_len )
    {
        fprintf (stderr, "Info: Calling realloc\n");

        /* Keep the old pointer until realloc succeeds, so it can still be freed. */
        if ( !(bigger = realloc (this_token, token_length + 1)) )
        {
            perror ("my_strtok() realloc failed");
            free (this_token);
            this_token = NULL;
            return NULL;
        }
        this_token = bigger;
        current_token_max_len = token_length + 1; /* remember the new capacity */
    }

    memcpy (this_token, p, token_length);
    this_token[token_length] = '\0';

    p += token_length;

    return this_token;
}

Re: strtok

Leslie Chaim — Thu, 31 Oct 2002 20:31:58 GMT

Here is my_strtok in an attachment.

If we could only edit our posts:(

Re: strtok

Adam J Markiewicz — Mon, 04 Nov 2002 19:45:50 GMT

If you're interested in a regexp version I can think about it.

I'm afraid I don't like your code.
I don't see the point in the way you handle dynamic memory.
Apart from that I see some bugs.

Actually I would do it from scratch rather than rework that code.

But I have to warn you: I like things to be done precisely, so my code will be more complicated.

I'll think about it in the near future if you still need it.

Adam