<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Problem with Linux ‘ps’ command can cause false failover of packages in Operating System - Linux</title>
    <link>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561541#M58362</link>
    <description>If you're just discarding the output of 'grep', why not just 'grep -q $PROC /proc/$p_pid/stat'?  Either way, you're going to get ugly messages if the pid doesn't currently exist (No such file or directory).&lt;BR /&gt;&lt;BR /&gt;A further step would be to put the grep straight into the if:&lt;BR /&gt;&lt;BR /&gt;if grep -q $PROC /proc/$p_pid/stat 2&amp;gt;/dev/null&lt;BR /&gt;&lt;BR /&gt;as 'if' checks the exit status of the command, which is just a bit quicker than launching 'test' ([) and checking $?.&lt;BR /&gt;&lt;BR /&gt;Anyway, just some thoughts.</description>
    <pubDate>Thu, 09 Jun 2005 18:14:17 GMT</pubDate>
    <dc:creator>Stuart Browne</dc:creator>
    <dc:date>2005-06-09T18:14:17Z</dc:date>
    <item>
      <title>Problem with Linux ‘ps’ command can cause false failover of packages</title>
      <link>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561540#M58361</link>
      <description>The Linux command ‘ps pid’ will sometimes return empty output, even if the process ‘pid’ exists.  This problem occurs with different frequencies in various releases of Linux.  It has been seen on RedHat 2.1 and RedHat 3.  It is believed to exist in SLES8 and may exist in SLES9 and RedHat 4.  The details are in &lt;A href="https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=158277" target="_blank"&gt;https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=158277&lt;/A&gt;.  The problem is related to how the ‘ps’ command checks the list of pids in the /proc filesystem.  It is most likely to occur in a very dynamic environment where a large number of short-lived processes are being created.&lt;BR /&gt;&lt;BR /&gt;Various Serviceguard for Linux toolkits use this command, and it is a suggested method for users writing their own scripts.  They check the pid to see if the process being monitored is still running.  The ‘ps’ error causes the monitor script to falsely determine that the process is no longer running, causing the package to fail over.&lt;BR /&gt;&lt;BR /&gt;The exact command lines that have problems are:&lt;BR /&gt;&lt;BR /&gt;pid=`ps $p_pid | grep $PROC | awk '{print $1}'`&lt;BR /&gt;if [ -z "$pid" ]; then&lt;BR /&gt;&lt;BR /&gt;These should be replaced with:&lt;BR /&gt;&lt;BR /&gt;grep $PROC /proc/$p_pid/stat &amp;gt;/dev/null&lt;BR /&gt;if [ $? -ne 0 ]; then&lt;BR /&gt;&lt;BR /&gt;Rather than looking through all of the pids in the /proc filesystem, this just checks the pid that is being monitored.&lt;BR /&gt;&lt;BR /&gt;If you think you have experienced a false failover, then check the monitor scripts and make this change.&lt;BR /&gt;&lt;BR /&gt;Even if you have not experienced a false failover, it is recommended that you make this change.  Any contributed toolkit that uses the ‘ps’ command in this way will be changed in its next release.  Because of testing, this may take up to 3 months for any specific toolkit.&lt;BR /&gt;&lt;BR /&gt;Remember to make the change on all servers that may run the package.  Note that because the file is open on the server running the package, it will not be updated immediately.  This last node will only be updated after the package is moved.  During a maintenance period, move the package and recheck the file on all nodes.  Remember, if a server fails between a change to the file and the maintenance period, the file may not have been updated.  That is why it is CRITICAL to recheck all nodes after the package move.&lt;BR /&gt;&lt;BR /&gt;As new or updated toolkits are released,</description>
      <pubDate>Thu, 09 Jun 2005 17:55:33 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561540#M58361</guid>
      <dc:creator>Serviceguard for Linux</dc:creator>
      <dc:date>2005-06-09T17:55:33Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Linux ‘ps’ command can cause false failover of packages</title>
      <link>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561541#M58362</link>
      <description>If you're just discarding the output of 'grep', why not just 'grep -q $PROC /proc/$p_pid/stat'?  Either way, you're going to get ugly messages if the pid doesn't currently exist (No such file or directory).&lt;BR /&gt;&lt;BR /&gt;A further step would be to put the grep straight into the if:&lt;BR /&gt;&lt;BR /&gt;if grep -q $PROC /proc/$p_pid/stat 2&amp;gt;/dev/null&lt;BR /&gt;&lt;BR /&gt;as 'if' checks the exit status of the command, which is just a bit quicker than launching 'test' ([) and checking $?.&lt;BR /&gt;&lt;BR /&gt;Anyway, just some thoughts.</description>
      <pubDate>Thu, 09 Jun 2005 18:14:17 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561541#M58362</guid>
      <dc:creator>Stuart Browne</dc:creator>
      <dc:date>2005-06-09T18:14:17Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Linux ‘ps’ command can cause false failover of packages</title>
      <link>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561542#M58363</link>
      <description>Oh la! la! ... thanks for letting us in on this one.  I don't use Serviceguard for Linux, but as you say this may affect any "ps pid".&lt;BR /&gt;&lt;BR /&gt;I will search all my scripts for uses of it.&lt;BR /&gt;&lt;BR /&gt;I did read the Bugzilla entry to try to understand it all, but it seems to me Stuart Browne's thoughts are the correct way to go, or is there something we missed?&lt;BR /&gt;&lt;BR /&gt;Jean-Pierre Huc</description>
      <pubDate>Fri, 10 Jun 2005 05:17:57 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561542#M58363</guid>
      <dc:creator>Huc_1</dc:creator>
      <dc:date>2005-06-10T05:17:57Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Linux ‘ps’ command can cause false failover of packages</title>
      <link>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561543#M58364</link>
      <description>Stuart,&lt;BR /&gt;&lt;BR /&gt;We will keep it this way because we'll get any errors from "grep" logged.  Also, test is a shell builtin, so there is no major launch overhead.&lt;BR /&gt;&lt;BR /&gt;There may be some advantage to the -q.&lt;BR /&gt;&lt;BR /&gt;We really want to change as little as possible to minimize the risk of introducing another problem.&lt;BR /&gt;&lt;BR /&gt;Huc,&lt;BR /&gt;&lt;BR /&gt;That's why I posted it with the description - to make everyone who uses this aware of possible problems.  Glad it may help.</description>
      <pubDate>Mon, 13 Jun 2005 13:04:30 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561543#M58364</guid>
      <dc:creator>Serviceguard for Linux</dc:creator>
      <dc:date>2005-06-13T13:04:30Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Linux ‘ps’ command can cause false failover of packages</title>
      <link>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561544#M58365</link>
      <description>Hrm.. I guess it's only when shown these situations that new things are learnt.  I always thought that in Bourne-based shells, the single []'s were based off the external test:&lt;BR /&gt;&lt;BR /&gt;lrwxrwxrwx  1 root root 4 May  1  2004 /usr/bin/[ -&amp;gt; test&lt;BR /&gt;&lt;BR /&gt;and that '[[]]' are inbuilt.  Sometimes shell man pages are just too long:&lt;BR /&gt;&lt;BR /&gt;man bash: under 'CONDITIONAL EXPRESSIONS'&lt;BR /&gt;Conditional expressions are used by the [[ compound command and the test and [ builtin commands to test file attributes and perform string and arithmetic comparisons.&lt;BR /&gt;&lt;BR /&gt;My apologies.&lt;BR /&gt;&lt;BR /&gt;In any case, as the grep isn't discarding STDERR, you'll still get an ugly error when the '$p_pid' doesn't exist.</description>
      <pubDate>Mon, 13 Jun 2005 17:36:00 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/problem-with-linux-ps-command-can-cause-false-failover-of/m-p/3561544#M58365</guid>
      <dc:creator>Stuart Browne</dc:creator>
      <dc:date>2005-06-13T17:36:00Z</dc:date>
    </item>
  </channel>
</rss>

