Operating System - HP-UX

Illegal instruction, bus error

John Kugelman
New Member

I've written a client program that wipes the hard drives of a number of machines and reports the results to a central server. I originally wrote this for Linux and have now been trying to port it to HP-UX 11.11. The HP-UX client works great on my development machine--it wipes the drive, sends a message to the server saying "wipe complete", and then shuts down the machine. But on a different machine (another company's system) it crashes after wiping but before sending the "wipe complete" message. The error message is "Illegal instruction. Bus error."

I'm not familiar with the intricacies of the HP-UX operating system, so my main question is:

>>> What are the possible differences between our machines that could cause this different behavior? <<<

I'm not sure what the right keywords or concepts are here. Kernel settings? Drivers? Operating system version? Processor? What's relevant here?


[Details]

The trickiest part of this is that it runs on a live system, so we're wiping the hard drive while the OS is running. The wiping is done with commercial off-the-shelf software, which supports this kind of in-place wipe out of the box, and it seems to work fine. But on the other company's machine, after the wipe completes and the wipe program exits, returning control to my client, the client crashes with the error: "Illegal instruction. Bus error".
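As a point of reference, the popen()/pclose() step the client relies on can be sketched in C as below. This is a minimal sketch, not the poster's code: the helper name run_wipe_command() is hypothetical, and it assumes the wipe tool reports success with exit status 0. The useful part is decoding pclose()'s wait status, so that a wipe tool killed by a signal is not mistaken for a clean exit. Note that popen() starts the command through /bin/sh, so the shell and the wipe binary are loaded from disk at that moment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: run the wipe command through popen(), drain its
 * output, and translate pclose()'s wait status into an exit code.
 * Returns the command's exit status, or -1 if it could not be started
 * or was terminated by a signal. */
int run_wipe_command(const char *cmd)
{
    char line[256];
    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    /* Read (and here simply echo) whatever the tool prints, so it never
     * blocks writing to a full pipe. */
    while (fgets(line, sizeof line, fp) != NULL)
        fputs(line, stdout);

    int status = pclose(fp);        /* waits for the child to finish */
    if (status == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* normal exit: report its code */
    return -1;                      /* killed by a signal */
}
```

A caller would only send "wipe complete" when run_wipe_command() returns 0, and report a failure otherwise.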

They have their system a lot more locked down and stripped down than my dev machine, i.e. they've removed as many packages as humanly possible to get the size down and make it more secure. I don't know exactly what they've done but I can ask if you can tell me what to ask about.


[Client description]

Here's exactly what the client is doing at the point that it crashes, if it provides any clues:

1. The wipe software is run with popen() and finishes with pclose(). That appears to work.
2. The client sends a “wipe complete” message via an SSL socket:
2a. In the main thread, it uses pthread_mutex_lock() to lock a mutex and adds a message to the outgoing message queue. Then it signals the message writer thread with pthread_cond_signal() and then calls pthread_mutex_unlock().
2b. In the writer thread, once signalled it exits from a pthread_cond_wait() call and calls pthread_mutex_unlock(), then sends the message sitting in the queue to the DM via OpenSSL’s SSL_write() function.
3. Repeat steps 2a and 2b for a “phase change” message. If this 2nd message goes through then the server will show the downgrade as successful.
4. In the main thread, it does sleep(5) followed by reboot(RB_HALT | RB_NOSYNC). These two things happen in parallel with step 3.
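Steps 2a, 2b and 4 describe a standard mutex/condition-variable handoff. A minimal sketch of that pattern follows, assuming a small fixed-size queue; all names (struct msg_queue, queue_push(), queue_pop(), queue_mark_sent(), queue_wait_sent()) are hypothetical, and the SSL_write() call is left to the caller. queue_wait_sent() is one possible alternative to the fixed sleep(5) in step 4: because the halt runs in parallel with step 3, a timed wait on a "sent" counter lets the main thread halt as soon as the writer reports both messages sent, while still halting if the writer never does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

#define QCAP 16
#define MSGLEN 64

/* Hypothetical queue shared by the main thread (producer) and the
 * writer thread (consumer). */
struct msg_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;  /* signalled when a message is queued */
    pthread_cond_t  sent_cv;   /* signalled when a message was sent */
    char msgs[QCAP][MSGLEN];
    int head, count;
    int sent;                  /* messages the writer has finished sending */
};

void queue_init(struct msg_queue *q)
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    pthread_cond_init(&q->sent_cv, NULL);
}

/* Step 2a: lock, append, signal, unlock. Returns -1 if the queue is full. */
int queue_push(struct msg_queue *q, const char *text)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QCAP) {
        int tail = (q->head + q->count) % QCAP;
        strncpy(q->msgs[tail], text, MSGLEN - 1);
        q->msgs[tail][MSGLEN - 1] = '\0';
        q->count++;
        pthread_cond_signal(&q->nonempty);
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Step 2b: wait for a message, copy it out, unlock. The caller then
 * sends it (e.g. with SSL_write()) without holding the lock. */
void queue_pop(struct msg_queue *q, char out[MSGLEN])
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->nonempty, &q->lock);
    memcpy(out, q->msgs[q->head], MSGLEN);
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
}

/* Writer thread calls this after a send has returned successfully. */
void queue_mark_sent(struct msg_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->sent++;
    pthread_cond_broadcast(&q->sent_cv);
    pthread_mutex_unlock(&q->lock);
}

/* Wait until at least `want` messages were sent, or `timeout_sec`
 * seconds pass. Returns 0 on success, ETIMEDOUT on timeout. */
int queue_wait_sent(struct msg_queue *q, int want, int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;
    clock_gettime(CLOCK_REALTIME, &deadline); /* default cond clock */
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&q->lock);
    while (q->sent < want && rc == 0)
        rc = pthread_cond_timedwait(&q->sent_cv, &q->lock, &deadline);
    if (q->sent >= want)
        rc = 0;
    pthread_mutex_unlock(&q->lock);
    return rc;
}
```

In this design the main thread would call queue_wait_sent(&q, 2, 30) (the timeout is an arbitrary example) before reboot(). Keep in mind that a successful SSL_write() only means the data was handed to the socket, not that the server has received it.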
James R. Ferguson
Acclaimed Contributor

Re: Illegal instruction, bus error

Hi:

To get to the root of the problem you will need to do a stack trace using the debugger. For HP-UX you use the Wildebeest Debugger (WDB):

http://www.hp.com/go/wdb

This may already be available in your '/opt/langtools/bin'.

Regards!

...JRF...


Bill Hassell
Honored Contributor

Re: Illegal instruction, bus error

Actually, this is fairly easy to diagnose. You are ripping everything out from underneath the kernel in memory. All it takes is a reference to a shared library, a spawn of a network daemon from inetd or simply the end of the client program which returns to the kernel. The details of what happens at that moment are dependent on the amount of RAM, the model of the computer, the version of various patches, and so on. I would expect exactly this behavior from most large scale systems.

My question is: why do you (or the customer) care about a program crash? The OS has been destroyed on the disk so the job is done. If you really want a clean shutdown then you'll need to write a memory resident piece of code (really difficult since there is no OS) and boot this code off a CD or tape. I know of no practical way to make this work cleanly, especially for something that appears to be using networking (SSL).

If the customer wants 'official' confirmation that the wipe is complete, the display of the message would never be adequate for me. I would have to try to boot off all the installed disks before I was sure the disks were at least partially wiped. The only traceable method would be to move the disks to another server, wipe them and then use dd to read selected tracks. It all depends on what the security requirements might be.
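Bill's idea of reading back selected tracks with dd can also be done from C. The sketch below is illustrative only: sample_is_wiped() is a hypothetical helper, and it assumes the wipe leaves a single known fill byte (for example zeros), which not every wipe tool or standard does. A tool whose final pass writes random data would need a different check.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: read `len` bytes at each of the `n` offsets from
 * `path` (a raw disk device, or an image file for testing) and confirm
 * every byte equals `fill`. Returns 1 if all sampled regions match,
 * 0 if any byte differs, -1 on an open/read error or short device. */
int sample_is_wiped(const char *path, const off_t *offsets, size_t n,
                    size_t len, unsigned char fill)
{
    unsigned char buf[4096];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int result = 1;
    for (size_t i = 0; i < n && result == 1; i++) {
        size_t done = 0;
        while (done < len && result == 1) {
            size_t want = len - done < sizeof buf ? len - done : sizeof buf;
            ssize_t got = pread(fd, buf, want, offsets[i] + (off_t)done);
            if (got <= 0) {         /* read error or unexpected end */
                result = -1;
                break;
            }
            for (ssize_t j = 0; j < got; j++) {
                if (buf[j] != fill) {
                    result = 0;     /* found a byte the wipe missed */
                    break;
                }
            }
            done += (size_t)got;
        }
    }
    close(fd);
    return result;
}
```

A result of 1 only says the sampled regions match; like reading a few tracks with dd, it is a spot check rather than proof that the whole disk was wiped.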


Bill Hassell, sysadmin
Laurent Menase
Honored Contributor

Re: Illegal instruction, bus error

In fact, you can't predict what your system will do once you wipe out the system disk.

It would be better to use Ignite-UX's scripting capabilities, or take inspiration from how Ignite-UX boots from a RAM disk. Then you can safely wipe out the system disk.

Alternatively, you can create a safe environment on a RAM disk:
1. Create a RAM disk.
2. Copy all the shared libraries, executables and files you need to that RAM disk.
3. chroot to that RAM disk, then run the wipe and send your message.
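The chroot step might look like the sketch below. It is hypothetical: enter_ramdisk_root() is an illustrative name, and it assumes the RAM disk has already been created, mounted and populated (the HP-UX commands for that are not shown). It must run as root. After chroot(), only files inside the RAM disk are reachable by path, so the wipe tool, its shared libraries, /bin/sh (used by popen()), any device files it opens, and any certificate or configuration files the client reads all have to be present there.

```c
#define _DEFAULT_SOURCE /* glibc: expose chroot(); ignored elsewhere */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: make the already-populated RAM disk mounted at
 * `root` the process's root directory. Returns 0 on success, -1 on
 * failure (with a message on stderr). */
int enter_ramdisk_root(const char *root)
{
    if (chdir(root) != 0) {         /* move into the RAM disk first */
        perror("chdir");
        return -1;
    }
    if (chroot(".") != 0) {         /* current directory becomes "/" */
        perror("chroot");
        return -1;
    }
    if (chdir("/") != 0) {          /* keep the cwd inside the new root */
        perror("chdir /");
        return -1;
    }
    return 0;
}
```

Changing into the directory before calling chroot(".") and then chdir("/") ensures the working directory ends up inside the new root rather than pointing back at the old disk.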
Dennis Handly
Acclaimed Contributor

Re: Illegal instruction, bus error

>Bill: You are ripping everything out from underneath the kernel in memory.

Right.

>The OS has been destroyed on the disk so the job is done.

But possibly not the user data or all of the files.
John Kugelman
New Member

Re: Illegal instruction, bus error

I understand that the system is in a precarious state, which is why I limit my actions and don't access any files on disk--at least, I try not to. The pthread and OpenSSL libraries are already loaded into memory and have been used the entire time the client is running, so I wouldn't think accessing them then would cause a problem. In particular, the kernel should have been wiped twenty minutes previously when that part of the hard drive was wiped. I don't know why it chooses the end of the wipe as the time to crash, except that obviously it's something to do with the wipe process ending or the message being sent.

I care about the client crashing because it needs to report status back to the server saying that the wipe is finished. It's crashing right before it's able to do that.

Anyways, I guess I'll know more once they get some info from the debugger. And if that doesn't help I'll look into the ramdisk option. That's a good idea.
Dennis Handly
Acclaimed Contributor

Re: Illegal instruction, bus error

>I guess I'll know more once they get some info from the debugger.

How? Wasn't the system wiped before it crashed?
Any corefile left?
John Kugelman
New Member

Re: Illegal instruction, bus error

No, I am betting that it's trying to core dump (bus error) and then blows up trying to do that (illegal instruction). If the debugger is attached to the live process I'm hoping it'll work even though the disk is wiped out, as long as it doesn't try to load any symbol files or whatever when the program crashes.
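One way to test that theory without relying on the disk is to catch the signals and report them directly. The sketch below is a hypothetical diagnostic: install_fatal_signal_handlers() and fatal_signal_handler() are illustrative names. The handler writes a fixed message to stderr with the async-signal-safe write() and then calls _exit(), so no core file is attempted. It runs on an alternate signal stack; sigaltstack() only affects the calling thread, so the writer thread would need its own call. The handler's own code can still fault if its page was never brought into memory before the wipe.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Report the fatal signal using only async-signal-safe calls, then exit
 * immediately without a core dump. */
static void fatal_signal_handler(int sig)
{
    static const char bus[] = "client: caught SIGBUS\n";
    static const char ill[] = "client: caught SIGILL\n";
    if (sig == SIGBUS)
        write(STDERR_FILENO, bus, sizeof bus - 1);
    else
        write(STDERR_FILENO, ill, sizeof ill - 1);
    _exit(128 + sig);
}

/* Install the handler for SIGBUS and SIGILL on an alternate stack for
 * the calling thread. Returns 0 on success, -1 on failure. */
int install_fatal_signal_handlers(void)
{
    static char altstack[64 * 1024];  /* static, so no malloc at crash time */
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = altstack;
    ss.ss_size = sizeof altstack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_signal_handler;
    sa.sa_flags = SA_ONSTACK;         /* run the handler on altstack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0 || sigaction(SIGILL, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

If the message appears on the console, the bus error is confirmed and the second failure was most likely in the default core-dump path.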
Don Morris_1
Honored Contributor

Re: Illegal instruction, bus error

Do you use mlockall() or other means to ensure all the process pages are resident? If not, one thing that comes to mind is an instruction page (your text or shared library text) that hadn't been needed yet and so was never faulted in. When it is finally referenced, the page-in fails with an I/O error, which I believe produces a SIGBUS rather than a SIGSEGV. That all fits.
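A minimal sketch of that suggestion: call mlockall() early, before the wipe starts, so the client's text, data, stack and already-mapped shared libraries are made resident and stay resident. lock_process_in_memory() is a hypothetical wrapper; mlockall() with MCL_CURRENT | MCL_FUTURE is the standard POSIX call. It usually needs root or equivalent privilege, it covers only this process (not the wipe tool started with popen()), and with MCL_FUTURE later allocations can fail if they would exceed the locked-memory limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Lock all current and future pages of this process into RAM so that,
 * once the disk has been wiped, no page fault has to read from it.
 * Returns 0 on success, -1 on failure (with a message on stderr). */
int lock_process_in_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

The call belongs near the start of main(), after the SSL and pthread libraries are loaded but before the wipe is launched, so every page the client will touch is already in memory.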
Dennis Handly
Acclaimed Contributor

Re: Illegal instruction, bus error

>as long as it doesn't try to load any symbol files or whatever when the program crashes.

The debugger typically delays reading in info so that it may read it after the crash. Anything related to the instructions and unwinding would be read as needed.