<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Nice process, bad scheduling, big trouble in Operating System - Linux</title>
    <link>https://community.hpe.com/t5/operating-system-linux/nice-process-bad-schedul-big-trouble/m-p/3452121#M71601</link>
    <description>Dear all,&lt;BR /&gt;I am having trouble with an NX2 Nastran process running on a DL380 G3 (dual-processor, multithreading enabled) with the 2.4.18-14smp Linux kernel.&lt;BR /&gt;This server was installed about a year ago and the problem never occurred before.&lt;BR /&gt;This is how things go:&lt;BR /&gt;The analysis process starts fine with a nice scheduling priority of 10.&lt;BR /&gt;It does its work (computing and writing about 100 GB of data), taking almost 100% of one CPU (we don't have a Nastran parallel (//) license).&lt;BR /&gt;BUT&lt;BR /&gt;After a while (depending on the size of the computation) it takes 100% of ALL CPUs and never finishes (I let it run for 12 hours on a computation that should take 2 hours).&lt;BR /&gt;So I tried to kill it. kill -15: no effect. kill -9: the process stays alive. reboot: ignored (the server does not reboot at all).&lt;BR /&gt;&lt;BR /&gt;In fact, the process takes 100% of the CPU in system mode, waiting for some I/O to complete (I think?!).&lt;BR /&gt;I let the process run until an "ASR Detected by System ROM" event rebooted the system.&lt;BR /&gt;We restarted the computation and the process went the same way; every computation goes the same way.&lt;BR /&gt;&lt;BR /&gt;After checking for hardware problems (I didn't find any), I tried to renice the analysis process: at level 5 nothing changed; at level 0 the process stopped by itself after a few seconds.&lt;BR /&gt;&lt;BR /&gt;We started a new computation at nice level 0 and the process stopped by itself. But sometimes a process at nice level 0 gets into trouble again and I must renice it further (-1, -2, -3, etc.).&lt;BR /&gt;My question is:&lt;BR /&gt;What does this process expect from me? More seriously, I understand that lowering the nice value gives a process a higher scheduling priority, but it seems that whatever the priority, the process freezes, and the only way to stop it is to renice it to a higher priority.&lt;BR /&gt;&lt;BR /&gt;The data is written to a local SCSI hard disk with enough free space.
No error messages in /var/log/*, dmesg, or the iLO log.&lt;BR /&gt;We stress-tested the system (NFS, I/O writes, CPU, memory) for 12 hours and nothing seemed to be wrong.&lt;BR /&gt;I tried with multithreading enabled and then disabled, and with the iLO card enabled and disabled.&lt;BR /&gt;If you have any ideas...&lt;BR /&gt;&lt;BR /&gt;Thanks a lot&lt;BR /&gt;Best regards&lt;BR /&gt;&lt;BR /&gt;Laurent Ranni&lt;BR /&gt;</description>
    <pubDate>Wed, 29 Dec 2004 08:33:18 GMT</pubDate>
    <dc:creator>RANNI</dc:creator>
    <dc:date>2004-12-29T08:33:18Z</dc:date>
    <item>
      <title>Nice process, bad scheduling, big trouble</title>
      <link>https://community.hpe.com/t5/operating-system-linux/nice-process-bad-schedul-big-trouble/m-p/3452121#M71601</link>
      <description>Dear all,&lt;BR /&gt;I am having trouble with an NX2 Nastran process running on a DL380 G3 (dual-processor, multithreading enabled) with the 2.4.18-14smp Linux kernel.&lt;BR /&gt;This server was installed about a year ago and the problem never occurred before.&lt;BR /&gt;This is how things go:&lt;BR /&gt;The analysis process starts fine with a nice scheduling priority of 10.&lt;BR /&gt;It does its work (computing and writing about 100 GB of data), taking almost 100% of one CPU (we don't have a Nastran parallel (//) license).&lt;BR /&gt;BUT&lt;BR /&gt;After a while (depending on the size of the computation) it takes 100% of ALL CPUs and never finishes (I let it run for 12 hours on a computation that should take 2 hours).&lt;BR /&gt;So I tried to kill it. kill -15: no effect. kill -9: the process stays alive. reboot: ignored (the server does not reboot at all).&lt;BR /&gt;&lt;BR /&gt;In fact, the process takes 100% of the CPU in system mode, waiting for some I/O to complete (I think?!).&lt;BR /&gt;I let the process run until an "ASR Detected by System ROM" event rebooted the system.&lt;BR /&gt;We restarted the computation and the process went the same way; every computation goes the same way.&lt;BR /&gt;&lt;BR /&gt;After checking for hardware problems (I didn't find any), I tried to renice the analysis process: at level 5 nothing changed; at level 0 the process stopped by itself after a few seconds.&lt;BR /&gt;&lt;BR /&gt;We started a new computation at nice level 0 and the process stopped by itself. But sometimes a process at nice level 0 gets into trouble again and I must renice it further (-1, -2, -3, etc.).&lt;BR /&gt;My question is:&lt;BR /&gt;What does this process expect from me? More seriously, I understand that lowering the nice value gives a process a higher scheduling priority, but it seems that whatever the priority, the process freezes, and the only way to stop it is to renice it to a higher priority.&lt;BR /&gt;&lt;BR /&gt;The data is written to a local SCSI hard disk with enough free space.
No error messages in /var/log/*, dmesg, or the iLO log.&lt;BR /&gt;We stress-tested the system (NFS, I/O writes, CPU, memory) for 12 hours and nothing seemed to be wrong.&lt;BR /&gt;I tried with multithreading enabled and then disabled, and with the iLO card enabled and disabled.&lt;BR /&gt;If you have any ideas...&lt;BR /&gt;&lt;BR /&gt;Thanks a lot&lt;BR /&gt;Best regards&lt;BR /&gt;&lt;BR /&gt;Laurent Ranni&lt;BR /&gt;</description>
      <pubDate>Wed, 29 Dec 2004 08:33:18 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/nice-process-bad-schedul-big-trouble/m-p/3452121#M71601</guid>
      <dc:creator>RANNI</dc:creator>
      <dc:date>2004-12-29T08:33:18Z</dc:date>
    </item>
    <item>
      <title>Re: Nice process, bad scheduling, big trouble</title>
      <link>https://community.hpe.com/t5/operating-system-linux/nice-process-bad-schedul-big-trouble/m-p/3452122#M71602</link>
      <description>Ranni,&lt;BR /&gt;&lt;BR /&gt;I'm afraid you have a software (Nastran) problem. But you don't have a Nastran license, which makes things a lot more serious.&lt;BR /&gt;&lt;BR /&gt;Let's try a few things.&lt;BR /&gt;&lt;BR /&gt;Can you run the process with less data, i.e., a less resource-consuming job?&lt;BR /&gt;&lt;BR /&gt;Can you check whether memory comes under pressure over the time the long-running process works?&lt;BR /&gt;&lt;BR /&gt;Maybe the software problem (bug) appears only when a very large amount of data has to be processed.&lt;BR /&gt;&lt;BR /&gt;So, since you don't have a license that would let you report a bug, I suggest that you break your long-running process into two or more, if possible.&lt;BR /&gt;&lt;BR /&gt;I know it's not a great help, but it's all I can do for you.&lt;BR /&gt;&lt;BR /&gt;Good luck.&lt;BR /&gt;Xyko</description>
      <pubDate>Wed, 29 Dec 2004 10:58:32 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/nice-process-bad-schedul-big-trouble/m-p/3452122#M71602</guid>
      <dc:creator>xyko_1</dc:creator>
      <dc:date>2004-12-29T10:58:32Z</dc:date>
    </item>
  </channel>
</rss>

