Operating System - OpenVMS
1827876 Members
1530 Online
109969 Solutions
New Discussion

Re: Multiple errors SQL process

 
odwillia
Frequent Advisor

Multiple errors SQL process



Our programmers are seeing an issue wherein some of our users are getting different (incorrect) SQL results. All of them are running the same query in the same environment but for whatever reason the result set is different. They believe that this is due to a profile issue. However these users have run the same process previously without error. To test we matched profiles with a user that received the correct results but the test user still got the same incorrect SQL results.

Then the programmer received the following errors for the processusing a test account we created:

RDB-F-SYS_REQUEST, error from system services request

The programmer tried the SQL under TEST_DB & executed to create the CSV file out on PCDATA. This is the error message at the bottom of the SQLOUT file found in this user account home directory.

%RDB-F-SYS_REQUEST, error from system services request
-COSI-E-OPENERR, cannot open file
-RDMS-F-ERROPENOUT, error opening RMADSK04:[SORT_WORKSPACE_2]SORTWORK.TMP; as output
-RMS-E-PRV, insufficient privilege or file protection violation

When the programmer logged out of the test account she received these errors:

%BAS-E-FATSYSIO_, Fatal system I/O failure │ │[ ] 80 │
-BAS-I-ON_CHAFIL, on channel 5 for file TT:[].; at user PC 00000000 │[
________________________________________
] 132 │
-RMS-F-IFI, invalid internal file identifier (IFI) value──────────┘ └────────┘
TEST_DB logged out at 13-FEB-2009 15:44:28.44─────────────────────────┐


Let me know if you need more info or this is not clear.

We believe this is an application issue.

As always I appreciate your assistance.
17 REPLIES 17
Hein van den Heuvel
Honored Contributor

Re: Multiple errors SQL process

>> They believe that this is due to a profile issue.

Please try to explain in OpenVMS terms what you mean with 'profile'. SYSUAF account attributes? Logical names / symbols perhpas set up by a login.com? A logical name table being included? Some RDB settings?

>> However these users have run the same process previously without error.

So something changed. That 'profile'?
Or maybe just 'more data' requireing an external sort now, versus memory sort before?

>> To test we matched profiles with a user that received the correct results but the test user still got the same incorrect SQL results.

Can you identify the difference between a workgn and failing 'profile'?



>> -RDMS-F-ERROPENOUT, error opening RMADSK04:[SORT_WORKSPACE_2]SORTWORK.TMP; as output
>> -RMS-E-PRV, insufficient privilege or file protection violation

That's a clear, and hard, error. Does it make sence?
- $DIRECTORY/SECURITY RMADSK04:[000000]SORT_WORKSPACE_2.DIR
- What identifiers does the user have?
- If you temporarely open remove protection on that directory, does that problem go away?


>> When the programmer logged out of the test account she received these errors:
>> %BAS-E-FATSYSIO_, Fatal system I/O failure >> -BAS-I-ON_CHAFIL, on channel 5 for file ->> RMS-F-IFI, invalid internal file identifier (IFI) value

Don't worry too much about that, just now.
It means BASIC asked RMS to close a file using a FAB which was either completely invalid, or had already been closed.
Typically it suggest that some code stomped over dynamic memory corrupting some of the BASIC RTL structures. Most likely an error handler. This is not good, and ought to be figured out some say, but it is likely to be a secondary problem just now.

Hope this helps some,
Hein.
EdgarZamora_1
Respected Contributor

Re: Multiple errors SQL process

I'm not sure if it's related to your SQL problems, but your SORT "privilege" error is usually caused when you redefine SYS$SCRATCH to be a common directory for your users. It becomes a timing issue with the .tmp files being created there. You will find it's an intermittent error depending on timing of processes using SORT. IMO, SORT should be using uniquely named temp files but that's not the case right now. You probably have good reason for making SYS$SCRATCH a common area (such as disk space reasons) but you might wanna try deassigning it temporarily for test purposes to see if that's your problem. You would need sufficient disk space for the home directories of the users if you deassign SYS$SCRATCH.
odwillia
Frequent Advisor

Re: Multiple errors SQL process

>>Please try to explain in OpenVMS terms what you mean with 'profile'. SYSUAF account attributes? Logical names / symbols perhpas set up by a login.com? A logical name table being included? Some RDB settings?

By profile I mean the SYSUAF account attributes.

>>So something changed. That 'profile'?
Or maybe just 'more data' requireing an external sort now, versus memory sort before?

We tested by making the user accounts match exactly.


>>Can you identify the difference between a workgn and failing 'profile'?

No difference that I can see.



>> -RDMS-F-ERROPENOUT, error opening RMADSK04:[SORT_WORKSPACE_2]SORTWORK.TMP; as output
>> -RMS-E-PRV, insufficient privilege or file protection violation

That's a clear, and hard, error. Does it make sence?

- $DIRECTORY/SECURITY RMADSK04:[000000]SORT_WORKSPACE_2.DIR
- What identifiers does the user have?
- If you temporarely open remove protection on that directory, does that problem go away?

I'll test this.



>>Hope this helps some,
Hein.

Every little bit helps. Thanks.
odwillia
Frequent Advisor

Re: Multiple errors SQL process

>>I'm not sure if it's related to your SQL problems, but your SORT "privilege" error is usually caused when you redefine SYS$SCRATCH to be a common directory for your users. It becomes a timing issue with the .tmp files being created there. You will find it's an intermittent error depending on timing of processes using SORT. IMO, SORT should be using uniquely named temp files but that's not the case right now. You probably have good reason for making SYS$SCRATCH a common area (such as disk space reasons) but you might wanna try deassigning it temporarily for test purposes to see if that's your problem. You would need sufficient disk space for the home directories of the users if you deassign SYS$SCRATCH.

How can I tell if I have redefinned SYS$SCRATCH?

EdgarZamora_1
Respected Contributor

Re: Multiple errors SQL process

$ SHOW LOGICAL SYS$SCRATCH

normally SYS$SCRATCH is defined to be your login directory. Some sites define a system-wide (or group-wide, etc.) logical for SYS$SCRATCH to a common area for all users.

I've seen this SORT problem a few times... a user owns a sort work temp file in the directory... another user tries to open the same file name and gets the privilege error. All my users have their own accounts. If you have multiple users using the same account name, you'd probably have a more subtle problem.
odwillia
Frequent Advisor

Re: Multiple errors SQL process

This what I get for a logical for SYS$SCRATCH and the disk with the error.

$ SHOW LOGICAL SYS$SCRATCH
"SYS$SCRATCH" = "CLUSTERDISK:[CONRES]" (LNM$JOB_8213F8C0)

$ sh logical RMADSK04
"RMADSK04" = "$1$DGA858:" (LNM$SYSTEM_TABLE)
$
EdgarZamora_1
Respected Contributor

Re: Multiple errors SQL process


It looks like "CLUSTERDISK:[CONRES]" is your login home directory. That doesn't match the work directory shown in your original post (RMADSK04:[SORT_WORKSPACE_2]). What I'm trying to tell you is that the accounts running the application (and experiencing the problem) must be using/sharing the same SORT temp directory. You might wanna try to separate that (just to see if that's your problem).
Chris Barratt
Frequent Advisor

Re: Multiple errors SQL process

I believe that there is also a logical that can be used to point Rdb sortwork files to a particular directory. Given that RMADSK04:[SORT_WORKSPACE_2] is not your home directory, I would be checking to see whether that logical is turned on. I don't remember it's name, but a "Show log rdm*" should pick it up - or just check the documentation to be sure.

It could be that the working user has accessto this directory, whereas the other users do not.

Why then would it have worked previously ?
- maybe the logical has only been defined recently
- maybe the query didn't use to require a sort work file, but something has changed in the optimisation (often cardinalities), and now it does.

Just one more idea...

cheers,
chris
odwillia
Frequent Advisor

Re: Multiple errors SQL process

Thanks Chris. I will look into that as well.
Hakan Zanderau ( Anders
Trusted Contributor

Re: Multiple errors SQL process

If it is a privilege-problem....

Try

$ SET AUDIT/ALARM/ENABLE=FILE=FAIL
$ REPLY/ENABLE

and hopefully you will get an event telling you
what file(s) that don't allow access.

Hakan
Don't make it worse by guessing.........
Wim Van den Wyngaert
Honored Contributor

Re: Multiple errors SQL process

The sort files come per default in sys$scratch. If they exist, a new version is created. And that's were it goes wrong in your case. The previous version has a different owner.

If you define sys$scratch as a common directory you will get sorts that use the same workfiles (the pid is not in the work file name, bad program).

Solution 1 : put an acl on the directory so that every file created is rwed by anyone.

Solution 2 : in login.com you define sortwork (and sortworkN if you also use /work=N) as a unique directory (e.g. [sortwork.xxxxxxxx] where xxxxxxxx is the pid) and also create that directory. Then the collision is gone.

Solution 3 : don't use workfiles. Increase PGFLQ in the sysuaf until sortfiles are no longer used (will consume memory). But you're never sure that it will be high enough.

I also wonder if the sort always succeeds correctly when 2 processes are competing for the files.

Wim
Wim
EdgarZamora_1
Respected Contributor

Re: Multiple errors SQL process

It could be an Rdb logical as previous poster mentioned, but the bottom line is the utility that's creating the file is SORT and SORT is not using uniquely named temp files. So if you had a user who exited abnormally and the sort work file in the scratch directory didn't get cleaned up, another account could get a privilege error trying to use the same file name.

Here is the email thread of when we were experiencing the SORT "insufficient privilege" problem. Hope it helps...

From: xxxx, Kathy
Sent: Thursday, March 27, 2008 2:12 PM
To: Zamora, Edgar; xxxxxx, Tom
Subject: RE: CLCC - Protections

Thanks for looking into this. The redefinition of SYS$SCRATCH also seems to be working.

-------------------------------------------

From: Zamora, Edgar
Sent: Thursday, March 27, 2008 1:27 PM
To: xxxx, Kathy; xxxxxx, Tom
Subject: RE: CLCC - Protections


I deleted those. It should be fine now.

Ok, I tested this... it looks like the sort utility creates these temp files (and normally deletes them when done, but if thereâ s an error they may not get deleted). The problem arises when two people use sort simultaneously (DTR uses sort, I also discovered R2CONSOL uses sort because this problem arose in the RDA system while a user was running R2CONSOL) The sort utility creates these temp files with the protection (S:RWED,O:RWED,G,W) overriding any default protection we specify. So... bottom line the problem is because we use a common scratch/work directory... I don't know how to fix this other than recommending that you retry at a later time if you encounter the problem (deleting the sys$scratch:sortwork*.tmp.* files will work as long as the other user no longer has these files open).

-----------------------------------------

From: xxxx, Kathy
Sent: Thursday, March 27, 2008 1:19 PM
To: Zamora, Edgar; xxxxxx, Tom
Subject: RE: CLCC - Protections


There are old files in SYS$SCRATCH so it still does not work for me.

$ cat sys$scratch:sort*

Directory $1$DGA426:[SCRATCH]

SORTWORK0.TMP;1 0/3600 21-MAR-2008 10:37:02.97 [1,1]

SORTWORK1.TMP;1 0/4400 21-MAR-2008 10:37:02.97 [1,1]

Total of 2 files, 0/8000 blocks.


SYS$SCRATCH was already defined to be my users directory. But I believe the MJ_SITE_TABLE definition was overriding that. So, I have now defined it in my LOGIN.COM to be in the MJ_PROCESS_SPECIFIC table and that seems to be working. Thanks.


$ show log sys$scratch

"SYS$SCRATCH" = "SYS$USERS:[Kxxxx]" (MJ_PROCESS_SPECIFIC)

"SYS$SCRATCH" = "$1$DGA426:[SCRATCH]" (MJ_SITE_TABLE)

"SYS$SCRATCH" = "SYS$USERS:[Kxxxx]" (LNM$JOB_82B44900)


------------------------------------------

From: Zamora, Edgar
Sent: Thursday, March 27, 2008 1:02 PM
To: xxxxxx, Tom; xxxx, Kathy
Subject: RE: CLCC - Protections


I just realized I didn't explain it clearly... someone was running something that used the same name temp file. That is why you got the error. If you run it at another time it will probably work. Like right now, no one else has a file named sortxxxx in sys$scratch.


------------------------------------------

From: Zamora, Edgar
Sent: Thursday, March 27, 2008 12:56 PM
To: xxxxxx, Tom; xxxx, Kathy
Subject: RE: CLCC - Protections

This problem just occurred in RDA. It's not a privilege problem per se, because the directory is actually wide open. However, because MJ redefines sys$scratch to be that one central directory (instead of the VMS default that your login dir is your scratch dir) identically named files get created by some software and this results in conflict. This is the main reason why in most MJ scripts, the PID or some unique id is used as part of the file name of a temp file so there's no conflict.

I think we need to discuss this a bit more, but sys$scratch is already wide open. Datatrieve (or Sort) is not creating uniquely named work files.

As a workaround for your current problem, I believe if you define sys$scratch to be your login directory you will circumvent the error.

CLCC> dir/secu $1$dga426:[000000]scratch

Directory $1$DGA426:[000000]

SCRATCH.DIR;1 [1,1] (RWED,RWED,RWED,RWED)

(DEFAULT_PROTECTION,SYSTEM:RWED,OWNER:RWED,GROUP:RWED,WORLD:RWED)



Total of 1 file.

--------------------------------------------------------------------------------

From: xxxx, Kathy
Sent: Thursday, March 27, 2008 12:41 PM
To: Zamora, Edgar
Cc: xxxxxx, Tom
Subject: CLCC - Protections


Is it possible to modify the protections on the SYS_WORK directory on CLCC so that I can use Datatrieve? When my collection of records is large, I cannot even sort the records (Datatrieve appears to sort in memory when it can and to go to disk when necessary). Here is an example.

$ dtr32

DTR> ready mag_cdd.retpost

DTR> find all retpost

[23038 records found]

DTR> sum 1 by rp-art

%SORT-E-OPENOUT, error opening $1$DGA426:[SCRATCH]SORTWORK1.TMP; as output

-RMS-E-PRV, insufficient privilege or file protection violation

DTR> find first 10 retpost

[10 records found]

DTR> sum 1 by rp-art


Thanks.
EdgarZamora_1
Respected Contributor

Re: Multiple errors SQL process

Ahhh... reply collisions... I didn't see Wim's response while I was writing mine... but he explained the problem clearly (I am REALLY BAD at explaining)...

"The sort files come per default in sys$scratch. If they exist, a new version is created. And that's were it goes wrong in your case. The previous version has a different owner."

Hein van den Heuvel
Honored Contributor

Re: Multiple errors SQL process

So in summary

- is 'worked' when using the same profile because the owners of the sortwork files became the same.

- if you use a shared scratch directory, then be sure to put an acl with option=default on that directory to allow anyone to write any file in there.

- consider providing per-user scratch directories as is the default:... sys$login:

- consider logical name definitions for sortwork to include a Process-ID or other, unique to the process, identifier.

Hein.

odwillia
Frequent Advisor

Re: Multiple errors SQL process

How can I find all of the sort directories being used?
odwillia
Frequent Advisor

Re: Multiple errors SQL process

I got it nevermind.
Wim Van den Wyngaert
Honored Contributor

Re: Multiple errors SQL process

Note that my solution 3 will speed up the sort too. You can implement it with the other solutions. If you don't have an idea of how much to increase it, increase it 10% at the time and check the result (memory usage). Note that they must login again after the modification.

Wim
Wim