Operating System - HP-UX

lpsched behaviour and performance

 
Ian Killer_1
Regular Advisor

lpsched behaviour and performance

Hi there experts... I need some detailed information on how the lp spooler behaves. I have an HP cluster protecting European printing services. The servers have 3500 queues and print 120,000 jobs per week (shared between the nodes). Recently I've started to see some performance degradation, but both the number of queues and the quantity of jobs have increased proportionally. I've been granted another server in order to share the load, but I'm not sure how best to share it. The big question is really how the spooler works. Here are my questions...

Does the spooler cycle through queues looking for pending requests? Does it disable queues to prevent unnecessary reads?

Does the spooler execute jobs as they arrive through rlp?

In theory there should be an lpsched process for every print job but I never see more than 15 lpscheds at a time when I know there are more than 400 pending jobs in the requests directory.

I often see queues get disabled (sometimes for weeks) and queues build up to 100 or so jobs. Do these excessive pending jobs hurt the performance of the scheduler at all?

How does the scheduler choose which print job is next?

Appreciate your help.

Thanks..

ian
Where ever the gypsies rome.
11 REPLIES
Chris Vail
Honored Contributor
Solution

Re: lpsched behaviour and performance

WOW! That's a BIG print scheduler! I've not worked on anything that big, but I'll gladly share my ignorance with you. Are all these printers attached directly to the machine (omigod!) or are they on a network? This makes a HUGE difference in the way lpsched works.
If they're attached directly to the system, then lpsched has to do A LOT of work. And you need to tell us how you directly connected 3500 printers to a single host.
If they're all on a network, then lpsched has a lot less work to do. All it will do is contact the server (or network card) attached to the printer, send it the print job, then go about its merry way.
Here's what I know about the answers to your questions:
1) Yes, the spooler cycles through the queues looking for unprocessed print jobs. AFAIK, it does this in the order in which the directories are listed.
2) Disabling the queue does not prevent unnecessary reads. AFAIK, the only way to prevent the daemon from reading the queue is to remove it.
3) A job pending doesn't spawn an lpsched daemon. A job being printed (or handed off) does.
4) Excessive jobs in the print queue is more of a disk space issue than a performance issue.
5) Print job priority depends on how the request was made. Do a man on lpfence for an example.
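
To make point 5 a little more concrete, here is a small Python model of priority-plus-fence selection. It assumes the behaviour described in the lp and lpfence man pages (request priorities 0 to 7, with a per-printer fence below which requests are held). The tie-break by arrival order, the request IDs and every name in the code are illustrative assumptions, not lpsched internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    request_id: str
    priority: int   # 0 (lowest) to 7 (highest), as assumed for lp -p
    arrival: int    # arrival sequence number; smaller means earlier


def next_request(pending: List[Request], fence: int) -> Optional[Request]:
    """Pick the next request for one printer, or None if nothing is eligible."""
    # Requests below the fence stay queued but are not printed.
    eligible = [r for r in pending if r.priority >= fence]
    if not eligible:
        return None
    # Highest priority first; among equals, earliest arrival (assumed tie-break).
    return min(eligible, key=lambda r: (-r.priority, r.arrival))


# Hypothetical queue contents for a single printer.
queue = [
    Request("lj4-101", priority=0, arrival=1),
    Request("lj4-102", priority=3, arrival=2),
    Request("lj4-103", priority=3, arrival=3),
]
chosen = next_request(queue, fence=1)
print(chosen.request_id if chosen else "nothing eligible")
```

With a fence of 1, the priority-0 request is held even though it arrived first, and the earlier of the two priority-3 requests is chosen.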
One last comment: what kind of machine is this? You should have some pretty beefy CPUs, perhaps one of the rp series of machines. When you have that many print jobs, you'll need LOTS of CPU cycles and a bunch of memory.


Chris
Bill Douglass
Esteemed Contributor

Re: lpsched behaviour and performance

In regards to rlp, I believe that rlpdaemon simply queues print jobs that it receives. It is then up to lpsched to determine when to send the job to the printer.

As for performance dropping off, are you seeing an increase in paging (check vmstat, sar, glance)? Also check on filesystem fragmentation (fsadm -F vxfs -D -E if using vxfs), as that much creating and deleting of files could be slowing things down somewhat.
Geoff Wild
Honored Contributor

Re: lpsched behaviour and performance

Are you running the HP Spooler as a package in MC/SG?

If not, and you want to, I can share my package information so you can see how I did it.

My environment is on SAP with about 650 printers...currently have 775 jobs in queue - don't notice any performance issues...

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Ian Killer_1
Regular Advisor

Re: lpsched behaviour and performance

Hey Chris... thanks for all that info. That's actually really bad news because of the way our server is constructed. The printers are all remote, and the box is a D370 with plenty of disk space, 512 MB of RAM and a 160 MHz CPU. It's a tad old, with little possibility of new H/W. I never see a worrisome CPU percentage except when our admin scripts run to clean up the queues. The CPU is generally around 30% or idle, but at the same time the run queue can be as high as 8, though only for a short time.

Hey Bill.. nope.. paging is at a minimum. I'll attach a typical iostat and vmstat.

Geoff.. for sure!! I'd love to have a gander at your package... scripts. Heh. I'd like to add some of the crucial processes as services.
Where ever the gypsies rome.
Bill Hassell
Honored Contributor

Re: lpsched behaviour and performance

The lp scheduler is pretty simple but as mentioned, the daemon lpsched (the first one started) will scan the queues. Now that doesn't mean that each job is read. The 3 control files: pstatus, qstatus and outputq are used to keep track of printers and jobs. CPU and disk I/O cycles are burned when the jobs are submitted and when a specific job is being printed but the rest of the print jobs just sit on disk waiting for their turn.

So there is little overhead for lpsched since all job requests go through the FIFO file which is read by the daemon lpsched. A workload of 8 is not a problem and doesn't really reflect how busy the system might be. It indicates that there are 8 jobs ready to run, one or more (depending on how many CPUs you have) are actually consuming CPU cycles, the rest are waiting.

A spooling server like this is mostly I/O bound, but I would assume that most of the printers are LAN-based, and those are slow when compared to disk speeds. Occasionally, LAN problems will cause a printer script to start consuming lots of CPU time trying to communicate with the printer. top will show hpnpf as a big CPU user. In that case, disabling the printer should drop the load while you see what is wrong with the printer.

A print queue can be disabled for many reasons, all having to do with non-zero exit codes from the printer script. Since the scripts do not set explicit error codes for each failure, it is tricky to locate a reason. A large queue in a disabled printer is not a performance issue but is usually a disk space issue. (trick question: what is the first thing a user does when the printer won't print? and the second thing?...you get the point).

For this type of application, I would set up a cron job to scan the request queues and notify a sysadmin when the total space in a queue or the total number of jobs in a queue exceeds a certain number. Then take appropriate action.
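
A minimal Python sketch of such a cron check might look like the following. It assumes the request directory is /var/spool/lp/request with one subdirectory per queue, and it counts files rather than jobs, since one request may consist of more than one file. The thresholds and function names are made up for illustration.

```python
import os
import sys
from typing import Dict, List, Tuple

REQUEST_ROOT = "/var/spool/lp/request"  # assumed layout: one subdirectory per queue
MAX_FILES = 200                          # hypothetical threshold; tune per site
MAX_BYTES = 50 * 1024 * 1024             # hypothetical threshold: 50 MB per queue


def scan_queues(root: str) -> Dict[str, Tuple[int, int]]:
    """Return {queue: (file_count, total_bytes)} for each subdirectory of root."""
    usage: Dict[str, Tuple[int, int]] = {}
    if not os.path.isdir(root):
        return usage
    for name in sorted(os.listdir(root)):
        qdir = os.path.join(root, name)
        if not os.path.isdir(qdir):
            continue
        count = size = 0
        for entry in os.listdir(qdir):
            path = os.path.join(qdir, entry)
            if os.path.isfile(path):
                count += 1
                size += os.path.getsize(path)
        usage[name] = (count, size)
    return usage


def over_limit(usage: Dict[str, Tuple[int, int]],
               max_files: int, max_bytes: int) -> List[str]:
    """List the queues whose file count or disk usage exceeds either limit."""
    return [q for q, (n, b) in usage.items() if n > max_files or b > max_bytes]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else REQUEST_ROOT
    usage = scan_queues(root)
    for q in over_limit(usage, MAX_FILES, MAX_BYTES):
        n, b = usage[q]
        # cron normally mails whatever a job prints to the crontab owner
        print(f"{q}: {n} files, {b} bytes queued")
```

Run from cron, this stays silent while every queue is under both limits, so mail only arrives when something needs attention.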


Bill Hassell, sysadmin
Leif Halvarsson_2
Honored Contributor
Bill Hassell
Honored Contributor

Re: lpsched behaviour and performance

Actually, I would not recommend HPDPS for very large print queues. After releasing DPS a while ago, HP has lowered the expectations of this product, especially in very large environments. It uses DCE for multi-system coordination, and this can be quite a headache to keep running on a busy network. In your case, the print jobs are all handled on the one system, so DPS would not be of much value.

The lp spooler will start as many jobs as there are printers (of course, only one per printer) when every request queue has something to print. The load will be based on the number of busy lpsched children (those with the lpsched daemon as their parent), and since it is heavy I/O, the runqueue (from uptime or top) may look high. That's because during the measurement period, each lpsched and the child shell running the printer script have very little to do except initiate and wait for I/O. So the programs are in the runqueue for a very short time and then go to sleep waiting on the I/O to complete.
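
The one-job-per-printer rule also explains how a few hundred pending requests can coexist with only a handful of lpsched children. A rough Python model, assuming at most one active request per printer and none for a disabled printer; the queue names and counts are invented for illustration.

```python
from typing import Dict, Set


def active_children(pending: Dict[str, int], disabled: Set[str]) -> int:
    """Upper bound on concurrent print children: one per enabled printer with work."""
    return sum(1 for q, n in pending.items() if n > 0 and q not in disabled)


# Hypothetical snapshot: pending request count per queue.
pending = {"paris01": 120, "bonn02": 95, "oslo03": 4, "rome04": 0, "kiev05": 1}
disabled = {"paris01", "bonn02"}   # backlog has piled up behind disabled queues

total = sum(pending.values())
print(total, "requests pending,", active_children(pending, disabled), "can print at once")
```

In this snapshot almost all of the backlog sits behind the two disabled queues, so only two children would be busy no matter how many requests are on disk.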

So I wouldn't be concerned at runqueues up to 10 or 20 for the CPU, or CPU usage up to 50% for system overhead on a spooling server. That means you are getting your money's worth out of this solid performer, albeit slow compared to new rp servers. The CPU is still way faster than the printers it services.


Bill Hassell, sysadmin
Ian Killer_1
Regular Advisor

Re: lpsched behaviour and performance

Thanks Bill. No Network printers. They're all remote. We already have enable / queue admin scripts cron'ed every 20 minutes.

As a standard, the remote queues run on NT print servers, which then forward on to the printer. The slow link in the chain would then be I/O, as you suggest in your entry above. Do the I/O numbers in my attachment above look high to you? Avg. 60/second.
Where ever the gypsies rome.
Geoff Wild
Honored Contributor

Re: lpsched behaviour and performance

Ian,

Here's my package...things to note:

The following are in a volume group on EMC disk:

/etc/lp Directory of spooler configuration data
/var/sam/lp Backup directory of spooler configuration
/var/spool/lp Directory of LP spooling files and directories
/var/adm/lp Directory of spooler log files

Before adding/removing a printer, you MUST touch the Cluster Monitoring Lock File, /tmp/iprprt.lock, then wait 60 seconds.
If you don't, the iprprtpkg cluster package will fail over to the other node.
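
That pre-change step could be scripted roughly as follows. This is only a sketch in Python: the lock path and the 60-second wait come from the note above, while the function name is hypothetical, and how the package monitor actually interprets the lock file is not shown here.

```python
import time
from pathlib import Path

LOCK_FILE = "/tmp/iprprt.lock"  # path quoted in the note above
WAIT_SECONDS = 60               # wait quoted in the note above


def prepare_for_printer_change(lock_file: str = LOCK_FILE,
                               wait_seconds: float = WAIT_SECONDS) -> None:
    """Touch the monitoring lock file, then pause before the spooler is changed."""
    Path(lock_file).touch()      # creates the file or updates its timestamp
    time.sleep(wait_seconds)     # assumption: the monitor needs this long to notice


# Example (commented out so that nothing is touched when the file is loaded):
# prepare_for_printer_change()
# ...then add or remove the printer...
```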

Rgds...Geoff

Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Ian Killer_1
Regular Advisor

Re: lpsched behaviour and performance

Further to this question... Does anyone have any experience with the spooler capabilities of Red Hat?
Where ever the gypsies rome.
Geoff Wild
Honored Contributor

Re: lpsched behaviour and performance

Red Hat - I'm using CUPS on my home server on RH 9 with Samba.

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.